Probability and its Applications A Series of the Applied Probability Trust
Editors: J. Gani, C.C. Heyde, T.G. Kurtz
Springer New York Berlin Heidelberg Hong Kong London Milan Paris Tokyo
D.J. Daley
D. Vere-Jones
An Introduction to the Theory of Point Processes Volume I: Elementary Theory and Methods Second Edition
D.J. Daley Centre for Mathematics and its Applications Mathematical Sciences Institute Australian National University Canberra, ACT 0200, Australia
[email protected]
Series Editors: J. Gani Stochastic Analysis Group, CMA Australian National University Canberra, ACT 0200 Australia
D. Vere-Jones School of Mathematical and Computing Sciences Victoria University of Wellington Wellington, New Zealand
[email protected]
C.C. Heyde Stochastic Analysis Group, CMA Australian National University Canberra, ACT 0200 Australia
T.G. Kurtz Department of Mathematics University of Wisconsin 480 Lincoln Drive Madison, WI 53706 USA
Library of Congress Cataloging-in-Publication Data Daley, Daryl J. An introduction to the theory of point processes / D.J. Daley, D. Vere-Jones. p. cm. Includes bibliographical references and index. Contents: v. 1. Elementary theory and methods ISBN 0-387-95541-0 (alk. paper) 1. Point processes. I. Vere-Jones, D. (David) II. Title QA274.42.D35 2002 519.2´3—dc21 2002026666 ISBN 0-387-95541-0
Printed on acid-free paper.
© 2003, 1988 by the Applied Probability Trust. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. Printed in the United States of America. 9 8 7 6 5 4 3 2 1
SPIN 10885680
Typesetting: Photocomposed pages prepared by the authors using plain TeX files. www.springer-ny.com Springer-Verlag New York Berlin Heidelberg A member of BertelsmannSpringer Science+Business Media GmbH
◆ To Nola, and in memory of Mary ◆
Preface to the Second Edition
In preparing this second edition, we have taken the opportunity to reshape the book, partly in response to the further explosion of material on point processes that has occurred in the last decade but partly also in the hope of making some of the material in later chapters of the first edition more accessible to readers primarily interested in models and applications. Topics such as conditional intensities and spatial processes, which appeared relatively advanced and technically difficult at the time of the first edition, have now been so extensively used and developed that they warrant inclusion in the earlier introductory part of the text. Although the original aim of the book— to present an introduction to the theory in as broad a manner as we are able—has remained unchanged, it now seems to us best accomplished in two volumes, the first concentrating on introductory material and models and the second on structure and general theory. The major revisions in this volume, as well as the main new material, are to be found in Chapters 6–8. The rest of the book has been revised to take these changes into account, to correct errors in the first edition, and to bring in a range of new ideas and examples. Even at the time of the first edition, we were struggling to do justice to the variety of directions, applications and links with other material that the theory of point processes had acquired. The situation now is a great deal more daunting. The mathematical ideas, particularly the links to statistical mechanics and with regard to inference for point processes, have extended considerably. Simulation and related computational methods have developed even more rapidly, transforming the range and nature of the problems under active investigation and development. 
Applications to spatial point patterns, especially in connection with image analysis but also in many other scientific disciplines, have also exploded, frequently acquiring special language and techniques in the different fields of application. Marked point processes, which were clamouring for greater attention even at the time of the first edition, have acquired a central position in many of these new applications, influencing both the direction of growth and the centre of gravity of the theory.
We are sadly conscious of our inability to do justice to this wealth of new material. Even less than at the time of the first edition can the book claim to provide a comprehensive, up-to-the-minute treatment of the subject. Nor are we able to provide more than a sketch of how the ideas of the subject have evolved. Nevertheless, we hope that the attempt to provide an introduction to the main lines of development, backed by a succinct yet rigorous treatment of the theory, will prove of value to readers in both theoretical and applied fields and a possible starting point for the development of lecture courses on different facets of the subject. As with the first edition, we have endeavoured to make the material as self-contained as possible, with references to background mathematical concepts summarized in the appendices, which appear in this edition at the end of Volume I. We would like to express our gratitude to the readers who drew our attention to some of the major errors and omissions of the first edition and will be glad to receive similar notice of those that remain or have been newly introduced. Space precludes our listing these many helpers, but we would like to acknowledge our indebtedness to Rick Schoenberg, Robin Milne, Volker Schmidt, Günter Last, Peter Glynn, Olav Kallenberg, Martin Kalinke, Jim Pitman, Tim Brown and Steve Evans for particular comments and careful reading of the original or revised texts (or both). Finally, it is a pleasure to thank John Kimmel of Springer-Verlag for his patience and encouragement, and especially Eileen Dallwitz for undertaking the painful task of rekeying the text of the first edition.
The support of our two universities has been as unflagging for this endeavour as for the first edition; we would add thanks to host institutions of visits to the Technical University of Munich (supported by a Humboldt Foundation Award), University College London (supported by a grant from the Engineering and Physical Sciences Research Council) and the Institute of Mathematics and its Applications at the University of Minnesota. Daryl Daley Canberra, Australia
David Vere-Jones Wellington, New Zealand
Preface to the First Edition
This book has developed over many years—too many, as our colleagues and families would doubtless aver. It was conceived as a sequel to the review paper that we wrote for the Point Process Conference organized by Peter Lewis in 1971. Since that time the subject has kept running away from us faster than we could organize our attempts to set it down on paper. The last two decades have seen the rise and rapid development of martingale methods, the surge of interest in stochastic geometry following Rollo Davidson’s work, and the forging of close links between point processes and equilibrium problems in statistical mechanics. Our intention at the beginning was to write a text that would provide a survey of point process theory accessible to beginning graduate students and workers in applied fields. With this in mind we adopted a partly historical approach, starting with an informal introduction followed by a more detailed discussion of the most familiar and important examples, and then moving gradually into topics of increased abstraction and generality. This is still the basic pattern of the book. Chapters 1–4 provide historical background and treat fundamental special cases (Poisson processes, stationary processes on the line, and renewal processes). Chapter 5, on finite point processes, has a bridging character, while Chapters 6–14 develop aspects of the general theory. The main difficulty we had with this approach was to decide when and how far to introduce the abstract concepts of functional analysis. With some regret, we finally decided that it was idle to pretend that a general treatment of point processes could be developed without this background, mainly because the problems of existence and convergence lead inexorably to the theory of measures on metric spaces. 
This being so, one might as well take advantage of the metric space framework from the outset and let the point process itself be defined on a space of this character: at least this obviates the tedium of having continually to specify the dimensions of the Euclidean space, while in the context of complete separable metric spaces—and this is the greatest
generality we contemplate—intuitive spatial notions still provide a reasonable guide to basic properties. For these reasons the general results from Chapter 6 onward are couched in the language of this setting, although the examples continue to be drawn mainly from the one- or two-dimensional Euclidean spaces R1 and R2 . Two appendices collect together the main results we need from measure theory and the theory of measures on metric spaces. We hope that their inclusion will help to make the book more readily usable by applied workers who wish to understand the main ideas of the general theory without themselves becoming experts in these fields. Chapter 13, on the martingale approach, is a special case. Here the context is again the real line, but we added a third appendix that attempts to summarize the main ideas needed from martingale theory and the general theory of processes. Such special treatment seems to us warranted by the exceptional importance of these ideas in handling the problems of inference for point processes. In style, our guiding star has been the texts of Feller, however many lightyears we may be from achieving that goal. In particular, we have tried to follow his format of motivating and illustrating the general theory with a range of examples, sometimes didactical in character, but more often taken from real applications of importance. In this sense we have tried to strike a mean between the rigorous, abstract treatments of texts such as those by Matthes, Kerstan and Mecke (1974/1978/1982) and Kallenberg (1975, 1983), and practically motivated but informal treatments such as Cox and Lewis (1966) and Cox and Isham (1980). Numbering Conventions. Each chapter is divided into sections, with consecutive labelling within each of equations, statements (encompassing Definitions, Conditions, Lemmas, Propositions, Theorems), examples, and the exercises collected at the end of each section. 
Thus, in Section 1.2, (1.2.3) is the third equation, Statement 1.2.III is the third statement, Example 1.2(c) is the third example, and Exercise 1.2.3 is the third exercise. The exercises are varied in both content and intention and form a significant part of the text. Usually, they indicate extensions or applications (or both) of the theory and examples developed in the main text, elaborated by hints or references intended to help the reader seeking to make use of them. The symbol □ denotes the end of a proof. Instead of a name index, the listed references carry page number(s) where they are cited. A general outline of the notation used has been included before the main text.

It remains to acknowledge our indebtedness to many persons and institutions. Any reader familiar with the development of point process theory over the last two decades will have no difficulty in appreciating our dependence on the fundamental monographs already noted by Matthes, Kerstan and Mecke in its three editions (our use of the abbreviation MKM for the 1978 English edition is as much a mark of respect as convenience) and Kallenberg in its two editions. We have been very conscious of their generous interest in our efforts from the outset and are grateful to Olav Kallenberg in particular for saving us from some major blunders. A number of other colleagues, notably
David Brillinger, David Cox, Klaus Krickeberg, Robin Milne, Dietrich Stoyan, Mark Westcott, and Deng Yonglu, have also provided valuable comments and advice for which we are very grateful. Our two universities have responded generously with seemingly unending streams of requests to visit one another at various stages during more intensive periods of writing the manuscript. We also note visits to the University of California at Berkeley, to the Center for Stochastic Processes at the University of North Carolina at Chapel Hill, and to Zhongshan University at Guangzhou. For secretarial assistance we wish to thank particularly Beryl Cranston, Sue Watson, June Wilson, Ann Milligan, and Shelley Carlyle for their excellent and painstaking typing of difficult manuscript. Finally, we must acknowledge the long-enduring support of our families, and especially our wives, throughout: they are not alone in welcoming the speed and efficiency of Springer-Verlag in completing this project. Daryl Daley Canberra, Australia
David Vere-Jones Wellington, New Zealand
Contents

Preface to the Second Edition
Preface to the First Edition
Principal Notation
Concordance of Statements from the First Edition

1  Early History
   1.1  Life Tables and Renewal Theory
   1.2  Counting Problems
   1.3  Some More Recent Developments

2  Basic Properties of the Poisson Process
   2.1  The Stationary Poisson Process
   2.2  Characterizations of the Stationary Poisson Process: I. Complete Randomness
   2.3  Characterizations of the Stationary Poisson Process: II. The Form of the Distribution
   2.4  The General Poisson Process

3  Simple Results for Stationary Point Processes on the Line
   3.1  Specification of a Point Process on the Line
   3.2  Stationarity: Definitions
   3.3  Mean Density, Intensity, and Batch-Size Distribution
   3.4  Palm–Khinchin Equations
   3.5  Ergodicity and an Elementary Renewal Theorem Analogue
   3.6  Subadditive and Superadditive Functions

4  Renewal Processes
   4.1  Basic Properties
   4.2  Stationarity and Recurrence Times
   4.3  Operations and Characterizations
   4.4  Renewal Theorems
   4.5  Neighbours of the Renewal Process: Wold Processes
   4.6  Stieltjes-Integral Calculus and Hazard Measures

5  Finite Point Processes
   5.1  An Elementary Example: Independently and Identically Distributed Clusters
   5.2  Factorial Moments, Cumulants, and Generating Function Relations for Discrete Distributions
   5.3  The General Finite Point Process: Definitions and Distributions
   5.4  Moment Measures and Product Densities
   5.5  Generating Functionals and Their Expansions

6  Models Constructed via Conditioning: Cox, Cluster, and Marked Point Processes
   6.1  Infinite Point Families and Random Measures
   6.2  Cox (Doubly Stochastic Poisson) Processes
   6.3  Cluster Processes
   6.4  Marked Point Processes

7  Conditional Intensities and Likelihoods
   7.1  Likelihoods and Janossy Densities
   7.2  Conditional Intensities, Likelihoods, and Compensators
   7.3  Conditional Intensities for Marked Point Processes
   7.4  Random Time Change and a Goodness-of-Fit Test
   7.5  Simulation and Prediction Algorithms
   7.6  Information Gain and Probability Forecasts

8  Second-Order Properties of Stationary Point Processes
   8.1  Second-Moment and Covariance Measures
   8.2  The Bartlett Spectrum
   8.3  Multivariate and Marked Point Processes
   8.4  Spectral Representation
   8.5  Linear Filters and Prediction
   8.6  P.P.D. Measures

A1  A Review of Some Basic Concepts of Topology and Measure Theory
   A1.1  Set Theory
   A1.2  Topologies
   A1.3  Finitely and Countably Additive Set Functions
   A1.4  Measurable Functions and Integrals
   A1.5  Product Spaces
   A1.6  Dissecting Systems and Atomic Measures

A2  Measures on Metric Spaces
   A2.1  Borel Sets and the Support of Measures
   A2.2  Regular and Tight Measures
   A2.3  Weak Convergence of Measures
   A2.4  Compactness Criteria for Weak Convergence
   A2.5  Metric Properties of the Space M_X
   A2.6  Boundedly Finite Measures and the Space M#_X
   A2.7  Measures on Topological Groups
   A2.8  Fourier Transforms

A3  Conditional Expectations, Stopping Times, and Martingales
   A3.1  Conditional Expectations
   A3.2  Convergence Concepts
   A3.3  Processes and Stopping Times
   A3.4  Martingales

References with Index
Subject Index

Chapter Titles for Volume II
9   General Theory of Point Processes and Random Measures
10  Special Classes of Processes
11  Convergence Concepts and Limit Theorems
12  Ergodic Theory and Stationary Processes
13  Palm Theory
14  Evolutionary Processes and Predictability
15  Spatial Point Processes
Principal Notation
Very little of the general notation used in Appendices 1–3 is given below. Also, notation that is largely confined to one or two sections of the same chapter is mostly excluded, so that neither all the symbols used nor all the uses of the symbols shown are given. The repeated use of some symbols occurs as a result of point process theory embracing a variety of topics from the theory of stochastic processes. Where they are given, page numbers indicate the first or significant use of the notation. Generally, the particular interpretation of symbols with more than one use is clear from the context. Throughout the lists below, N denotes a point process and ξ denotes a random measure.
Spaces

C            complex numbers
R^d          d-dimensional Euclidean space
R = R^1      real line
R_+          nonnegative numbers
S            circle group and its representation as (0, 2π]
U^d_2α       d-dimensional cube of side length 2α and vertices (±α, ..., ±α)
Z, Z_+       integers of R, R_+
X            state space of N or ξ; often X = R^d; always X is a c.s.m.s. (complete separable metric space)
Ω            space of probability elements ω
∅, ∅(·)      null set, null measure
E            measurable sets in probability space
(Ω, E, P)    basic probability space on which N and ξ are defined
X^(n)        n-fold product space X × ··· × X
X^∪          = X^(0) ∪ X^(1) ∪ ···
B(X)               Borel σ-field generated by open spheres of c.s.m.s. X  34
B_X = B(X), B = B_R = B(R)  34, 374
B_X^(n) = B(X^(n))  product σ-field on product space X^(n)  129
BM(X)              measurable functions of bounded support  161
BM_+(X)            measurable nonnegative functions of bounded support  161
K                  mark space for marked point process (MPP)  194
M_X (N_X)          totally finite (counting) measures on c.s.m.s. X  158, 398
M#_X               boundedly finite measures on c.s.m.s. X  158, 398
N#_X               boundedly finite counting measures on c.s.m.s. X  131
P_+                p.p.d. (positive positive-definite) measures  359
S                  infinitely differentiable functions of rapid decay  357
U                  complex-valued Borel measurable functions on X of modulus ≤ 1  144
U ⊗ V              product topology on product space X × Y of topological spaces (X, U), (Y, V)  378
V = V(X)           [0, 1]-valued measurable functions h(x) with 1 − h(x) of bounded support in X  149, 152
General

Unless otherwise specified, A ∈ B_X, k and n ∈ Z_+, t and x ∈ R, h ∈ V(X), and z ∈ C.

˜                ν̃, F̃ = Fourier–Stieltjes transforms of measure ν or d.f. F  411–412;
                 φ̃ = Fourier transform of Lebesgue integrable function φ  357
˘                reduced (ordinary or factorial) (moment or cumulant) measure, for counting measures  160
#                extension of concept from totally finite to boundedly finite measure space  158
‖µ‖              variation norm of measure µ  374
a.e. µ, µ-a.e.   almost everywhere with respect to measure µ  376
a.s., P-a.s.     almost sure, P-almost surely  376
A^(n)            n-fold product set A × ··· × A  130
A                family of sets generating B; semiring of bounded Borel sets generating B_X  31, 368
B_u (T_u)        backward (forward) recurrence time at u  58, 76
c_k, c_[k]       kth cumulant, kth factorial cumulant, of distribution {p_n}  116
c(x) = c(y, y + x)   covariance density of stationary mean square continuous process on R^d  160, 358
C_[k](·), c_[k](·)    factorial cumulant measure and density
C̆_2(·)                reduced covariance measure of stationary N or ξ
c̆(·)                  reduced covariance density of stationary N or ξ
δ(·)                  Dirac delta function
δ_x(A)                Dirac measure, = ∫_A δ(u − x) du = I_A(x)
∆F(x) = F(x) − F(x−)  jump at x in right-continuous function F
e_λ(x) = (λ/2)^d exp(−λ Σ_{i=1}^d |x_i|)   two-sided exponential density in R^d
F                     renewal process lifetime d.f.
F^{n*}                n-fold convolution power of measure or d.f. F
F(·;·)                finite-dimensional (fidi) distribution
F                     history
Φ(·)                  characteristic functional
G[h]                  probability generating functional (p.g.fl.) of N
G[h | x]              member of measurable family of p.g.fl.s
G_c[·], G_m[· | x]    p.g.fl.s of cluster centre and cluster member processes N_c and N_m(· | x)
G, G_I                expected information gain (per interval) of stationary N on R
Γ(·), γ(·)            Bartlett spectrum, its density when it exists
H(P; µ)               generalized entropy
H, H*                 internal history of ξ on R_+, R
I_A(x) = δ_x(A)       indicator function of element x in set A
I_n(x)                modified Bessel function of order n
J_n(A_1 × ··· × A_n)  Janossy measure
j_n(x_1, ..., x_n)    Janossy density
J_n(· | A)            local Janossy measure
K                     compact set
K_n(·), k_n(·)        Khinchin measure and density
ℓ(·)                  Lebesgue measure in B(R^d), Haar measure on σ-group
L_u = B_u + T_u       current lifetime of point process on R
L[f] (f ∈ BM_+(X))    Laplace functional of ξ
L_ξ[1 − h]            p.g.fl. of Cox process directed by ξ
L_2(ξ_0), L_2(Γ)      Hilbert spaces of square integrable r.v.s ξ_0, and of functions square integrable w.r.t. measure Γ
L_A(x_1, ..., x_n) = j_N(x_1, ..., x_N | A)   likelihood, local Janossy density, N ≡ N(A)
λ                     rate of N, especially intensity of stationary N
λ*(t)                 conditional intensity function
m_k (m_[k])           kth (factorial) moment of distribution {p_n}
m̆_2, M̆_2             reduced second-order moment density, measure, of stationary N
m_g                   mean density of ground process N_g of MPP N
N(A)                  number of points in A
N(a, b]               number of points in half-open interval (a, b], = N((a, b])
N(t)                  = N(0, t] = N((0, t])
N_c                   cluster centre process
N(· | x)              cluster member or component process
{(p_n, Π_n)}          elements of probability measure for finite point process
P(z)                  probability generating function (p.g.f.) of distribution {p_n}
P(x, A)               Markov transition kernel
P_0(A)                avoidance function
P_jk                  set of j-partitions of {1, ..., k}
P                     probability measure of stationary N on R; probability measure of N or ξ on c.s.m.s. X
{π_k}                 batch-size distribution
q(x) = f(x)/[1 − F(x)]   hazard function for lifetime d.f. F
Q(z)                  = −log P(z)
Q(·), Q(t)            hazard measure, integrated hazard function (IHF)
ρ(x, y)               metric for x, y in metric space
{S_n}                 random walk, sequence of partial sums
S(x) = 1 − F(x)       survivor function of d.f. F
S_r(x)                sphere of radius r, centre x, in metric space X
t(x) = Π_{i=1}^d (1 − |x_i|)_+   triangular density in R^d
T_u                   forward recurrence time at u
T = {S_1(T), ..., S_j(T)}   a j-partition of k
T = {T_n} = {{A_ni}}  dissecting system of nested partitions
U(A) = E[N(A)]        renewal measure
U(x) = U([0, x])      expectation function, renewal function (U(x) = 1 + U_0(x))
V(A) = var N(A), V(x) = V((0, x])   variance function for stationary N or ξ on R
{X_n}                 components of random walk {S_n}, intervals of Wold process
Concordance of Statements from the First Edition

The table below lists the identifying numbers of formal statements of the first edition (1988) of this book and their identification in this volume.

1988 edition        this volume
2.2.I–III           2.2.I–III
2.3.III             2.3.I
2.4.I–II            2.4.I–II
2.4.V–VIII          2.4.III–VI
3.2.I–II            3.2.I–II
3.3.I–IX            3.3.I–IX
3.4.I–II            3.4.I–II
3.5.I–III           3.5.I–III
3.6.I–V             3.6.I–V
4.2.I–II            4.2.I–II
4.3.I–III           4.3.I–III
4.4.I–VI            4.4.I–VI
4.5.I–VI            4.5.I–VI
4.6.I–V             4.6.I–V
5.2.I–VII           5.2.I–VII
5.3.I–III           5.3.I–III
5.4.I–III           5.4.I–III
5.4.IV–VI           5.4.V–VII
5.5.I               5.5.I
7.1.XII–XIII        6.4.I(a)–(b)
8.1.II              6.1.II, IV
8.2.I               6.3.I
8.2.II              6.3.II, (6.3.6)
8.3.I–III           6.3.III–V
8.5.I–III           6.2.II
11.1.I–V            8.6.I–V
11.2.I–II           8.2.I–II
11.3.I–VIII         8.4.I–VIII
11.4.I–IV           8.5.I–IV
11.4.V–VI           8.5.VI–VII
13.1.I–III          7.1.I–III
13.1.IV–VI          7.2.I–III
13.1.VII            7.1.IV
13.4.III            7.6.I
A1.1.I–5.IV         A1.1.I–5.IV
A2.1.I–III          A2.1.I–III
A2.1.IV             A1.6.I
A2.1.V–VI           A2.1.IV–V
A2.2.I–7.III        A2.2.I–7.III
A3.1.I–4.IX         A3.1.I–4.IX
CHAPTER 1
Early History
The ancient origins of the modern theory of point processes are not easy to trace, nor is it our aim to give here an account with claims to being definitive. But any retrospective survey of a subject must inevitably give some focus to those past activities that can be seen to embody concepts in common with the modern theory. Accordingly, this first chapter is a historical indulgence but with the added benefit of describing certain fundamental concepts informally and in a heuristic fashion prior to possibly obscuring them with a plethora of mathematical jargon and techniques. These essentially simple ideas appear to have emerged from four distinguishable strands of enquiry—although our division of material may sometimes be a little arbitrary. These are

(i) life tables and the theory of self-renewing aggregates;
(ii) counting problems;
(iii) particle physics and population processes; and
(iv) communication engineering.
The first two of these strands could have been discerned in centuries past and are discussed in the first two sections. The remaining two essentially belong to the twentieth century, and our comments are briefer in the remaining section.
1.1. Life Tables and Renewal Theory

Of all the threads that are woven into the modern theory of point processes, the one with the longest history is that associated with intervals between events. This includes, in particular, renewal theory, which could be defined in a narrow sense as the study of the sequence of intervals between successive replacements of a component that is liable to failure and is replaced by a new
component every time a failure occurs. As such, it is a subject that developed during the 1930s and reached a definitive stage with the work of Feller, Smith, and others in the period following World War II. But its roots extend back much further than this, through the study of ‘self-renewing aggregates’ to problems of statistical demography, insurance, and mortality tables—in short, to one of the founding impulses of probability theory itself.

It is not easy to point with confidence to any intermediate stage in this chronicle that recommends itself as the natural starting point either of renewal theory or of point process theory more generally. Accordingly, we start from the beginning, with a brief discussion of life tables themselves. The connection with point processes may seem distant at first sight, but in fact the theory of life tables provides not only the source of much current terminology but also the setting for a range of problems concerning the evolution of populations in time and space, which, in their full complexity, are only now coming within the scope of current mathematical techniques.

In its basic form, a life table consists of a list of the number of individuals, usually from an initial group of 1000 individuals so that the numbers are effectively proportions, who survive to a given age in a given population. The most important parameters are the number ℓ_x surviving to age x, the number d_x dying between the ages x and x + 1 (d_x = ℓ_x − ℓ_{x+1}), and the proportion q_x of those surviving to age x who die before reaching age x + 1 (q_x = d_x/ℓ_x). In practice, the tables are given for discrete ages, with the unit of time usually taken as 1 year. For our purposes, it is more appropriate to replace the discrete time parameter by a continuous one and to replace numbers by probabilities for a single individual. Corresponding to ℓ_x we have then the survivor function
S(x) = Pr{lifetime > x}.
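The discrete bookkeeping behind a life table is simple enough to check directly. The sketch below (Python, with cohort numbers invented purely for illustration) computes d_x and q_x from a fragment of a table of survivors ℓ_x:

```python
# Toy life-table fragment: l[x] is the number surviving to age x out of an
# initial cohort of 1000. The numbers are invented purely for illustration.
l = {70: 1000, 71: 950, 72: 890, 73: 820}

# d_x = l_x - l_{x+1}: deaths between ages x and x + 1.
d = {x: l[x] - l[x + 1] for x in (70, 71, 72)}

# q_x = d_x / l_x: proportion of those alive at age x dying before x + 1.
q = {x: d[x] / l[x] for x in d}

assert d[70] == 50
assert abs(q[70] - 0.05) < 1e-12
```

In the continuous-time formulation that follows, ℓ_x/ℓ_0 is replaced by the survivor function S(x), and q_x by the hazard function q(x).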
To d_x corresponds f(x), the density of the lifetime distribution function, where
f(x) dx = Pr{lifetime terminates between x and x + dx},
while to q_x corresponds q(x), the hazard function, where
q(x) dx = Pr{lifetime terminates between x and x + dx | it does not terminate before x}.
Denoting the lifetime distribution function itself by F(x), we have the following important relations between the functions above:

S(x) = 1 − F(x) = ∫_x^∞ f(y) dy = exp(−∫_0^x q(y) dy),    (1.1.1)

f(x) = dF/dx = −dS/dx,    (1.1.2)

q(x) = f(x)/S(x) = −(d/dx) log S(x) = −(d/dx) log[1 − F(x)].    (1.1.3)
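A quick numerical sanity check of the relations (1.1.1)–(1.1.3) may be helpful. The sketch below (Python; the particular lifetime law F(x) = 1 − exp(−x²) is chosen arbitrarily for illustration) verifies that the survivor function equals the exponentiated negative integrated hazard:

```python
import math

# Illustrative lifetime law (arbitrary choice): F(x) = 1 - exp(-x^2), so
# f(x) = 2x exp(-x^2) and the hazard works out to q(x) = 2x.

def F(x): return 1.0 - math.exp(-x * x)      # lifetime d.f.
def f(x): return 2.0 * x * math.exp(-x * x)  # density
def S(x): return 1.0 - F(x)                  # survivor function, (1.1.1)
def q(x): return f(x) / S(x)                 # hazard function, (1.1.3)

def integral_q(x, n=10000):
    """Midpoint-rule approximation of the integrated hazard over [0, x]."""
    h = x / n
    return h * sum(q((i + 0.5) * h) for i in range(n))

# (1.1.1): S(x) should equal exp(-integral of q over [0, x]).
x = 1.3
assert abs(S(x) - math.exp(-integral_q(x))) < 1e-6
# For this particular law the hazard is exactly 2x.
assert abs(q(x) - 2.0 * x) < 1e-12
```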
The first life table appeared, in a rather crude form, in John Graunt’s (1662) Observations on the London Bills of Mortality. This work is a landmark in the early history of statistics, much as the famous correspondence between Pascal and Fermat, which took place in 1654 but was not published until 1679, is a landmark in the early history of formal probability. The coincidence in dates lends weight to the thesis (see e.g. Maistrov, 1967) that mathematical scholars studied games of chance not only for their own interest but for the opportunity they gave for clarifying the basic notions of chance, frequency, and expectation, already actively in use in mortality, insurance, and population movement contexts.

An improved life table was constructed in 1693 by the astronomer Halley, using data from the smaller city of Breslau, which was not subject to the same problems of disease, immigration, and incomplete records with which Graunt struggled in the London data. Graunt’s table was also discussed by Huyghens (1629–1695), to whom the notion of expected length of life is due. A. de Moivre (1667–1754) suggested that for human populations the function S(x) could be taken to decrease with equal yearly decrements between the ages 22 and 86. This corresponds to a uniform density over this period and a hazard function that increases to infinity as x approaches 86. The analysis leading to (1.1.1) and (1.1.2), with further elaborations to take into account different sources of mortality, would appear to be due to Laplace (1749–1827). It is interesting that in A Philosophical Essay on Probabilities (1814), where the classical definition of probability based on equiprobable events is laid down, Laplace gave a discussion of mortality tables in terms of probabilities of a totally different kind. Euler (1707–1783) also studied a variety of problems of statistical demography.
From the mathematical point of view, the paradigm distribution function for lifetimes is the exponential function, which has a constant hazard independent of age: for x > 0, we have

f(x) = λe^{−λx},   q(x) = λ,   S(x) = e^{−λx},   F(x) = 1 − e^{−λx}.    (1.1.4)
The usefulness of this distribution, particularly as an approximation for purposes of interpolation, was stressed by Gompertz (1779–1865), who also suggested, as a closer approximation, the distribution function corresponding to an exponentially increasing hazard of the form

q(x) = Ae^{αx}    (A > 0, α > 0, x > 0).    (1.1.5)
With the addition of a further constant [i.e. q(x) = B + Ae^{αx}], this is known in demography as the Gompertz–Makeham law and is possibly still the most widely used function for interpolating or graduating a life table. Other forms commonly used for modelling the lifetime distribution in different contexts are the Weibull, gamma, and lognormal distributions, corresponding, respectively, to the formulae

q(x) = βλx^{β−1}   with   S(x) = exp(−λx^β)    (λ > 0, β > 0),    (1.1.6)
f(x) = λ^α x^{α−1} e^{−λx} / Γ(α),    (1.1.7)
f(x) = (σx√(2π))^{−1} e^{−[(log x − µ)/σ]²/2}.    (1.1.8)
The Weibull distribution was introduced by Weibull (1939a, b) as a model for brittle fracture. Both this and the preceding distribution have an interpretation in terms of extreme value theory (see e.g. Exercise 1.1.2), but it should be emphasized that as a general rule the same distribution may arise from several models (see Exercise 1.1.3). The gamma distribution has a long history and arises in many different contexts. When α = k/2 and λ = 1/2, it is nothing other than the chi-squared distribution with k degrees of freedom, with well-known applications in mathematical statistics. When α = 1, it reduces to the exponential distribution, and when α = 3/2, it reduces to the Maxwell distribution for the distribution of energies of molecules in a perfect gas. The most important special cases in the context of life tables arise when α is a positive integer, say α = k. It then has an interpretation as the sum of k independent random variables, each having an exponential distribution. Although commonly known as the Erlang distribution, after the Danish engineer and mathematician who introduced it as a model for telephone service and intercall distributions in the 1920s, this special form and its derivation were known much earlier. One of the earliest derivations, if not the first, is due to the English mathematician R.L. Ellis (1817–1859) in a remarkable paper in 1844 that could well be hailed as one of the early landmarks in stochastic process theory, although in fact it is rarely quoted. In addition to establishing the above-mentioned result as a special case, Ellis studied a general renewal process and in that context established the asymptotic normality of the sum of a number of independent nonnegative random variables. It is particularly remarkable in that he used Fourier methods; in other words, essentially the modern characteristic function proof (with a few lacunae from a modern standpoint) of the central limit theorem.
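The interpretation of the Erlang special case as a sum of k independent exponential lifetimes is easy to illustrate by simulation. The sketch below (parameter values and sample size are arbitrary choices) checks the first two moments, k/λ and k/λ², of the sum:

```python
import random

random.seed(1)
k, lam, n = 3, 2.0, 200_000  # shape, rate, and sample size (illustrative)

# Sum of k independent exponential(λ) lifetimes: Erlang (gamma with integer shape).
samples = [sum(random.expovariate(lam) for _ in range(k)) for _ in range(n)]

mean = sum(samples) / n
var = sum((s - mean) ** 2 for s in samples) / n

assert abs(mean - k / lam) < 0.02      # theoretical mean k/λ = 1.5
assert abs(var - k / lam ** 2) < 0.02  # theoretical variance k/λ² = 0.75
```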
An equally interesting aspect of Ellis’ paper is the problem that inspired the study. This takes us back a century and a half to an even less familiar statistician in the guise of Sir Isaac Newton (1642–1727). For much of his later life, Newton’s spare time was devoted to theological problems, one of which was to reconcile the ancient Greek and Hebrew chronologies. In both chronologies, periods of unknown length are spanned by a list of successive rulers. Newton proposed to estimate such periods, and hence to relate the two chronologies, by supposing each ruler to reign for a standard period of 22 years. This figure was obtained by a judicious comparison of averages from a miscellany of historical data for which more or less reliable lengths of reigns were known. It is a statistical inference in the same sense as many of Graunt’s inferences from the London Bills of Mortality: a plausible value based on the best or only evidence available and supported by as many cross-checks as can be devised. How far it was explicitly present in Newton’s mind that he was dealing with a statistical problem and whether he made any attempts
to assess the likely errors of his results himself are questions we have not been able to answer with any certainty. In an informal summary of his work, Newton (1728) wrote: “I do not pretend to be exact to a year: there may be errors of five or ten years, and sometimes twenty, and not much above.” However, it appears unlikely that these figures were obtained by any theory of compounding of errors. It is tempting to conjecture that he may have discussed the problems with such friends and Fellows of the Royal Society as Halley, whose paper to the Royal Society would have been presented while Newton was president, and de Moivre, who dedicated the first edition of The Doctrine of Chances to Newton, but if records of such discussions exist, we have not found them.

Up until the middle of the nineteenth century, as will be clear even from the brief review presented above, mathematical problems deriving from life tables not only occupied a major place in the subject matter of probability and statistics but also attracted the attention of many leading mathematicians of the time. From the middle of the nineteenth century onward, however, actuarial mathematics (together, it may be added, with many other probabilistic notions), while important in providing employment for mathematicians, became somewhat disreputable mathematically, a situation from which it has not fully recovered. (How many elementary textbooks in statistics, for example, even mention life tables, let alone such useful descriptive tools as the hazard function?) The result was that when, as was inevitably the case, new applications arose that made use of the same basic concepts, the links with earlier work were lost or only partially recognized. Moreover, the new developments themselves often took place independently or with only a partial realization of the extent of common material. In the twentieth century, at least three such areas of application may be distinguished.
The first, historically, was queueing theory, more specifically the theory of telephone trunking problems. Erlang’s (1909) first paper on this subject contains a derivation of the Poisson distribution for the number of calls in a fixed time interval. It is evident from his comments that even before that time the possibility of using probabilistic methods in that context was being considered by engineers in several countries. The work here appears to be quite independent of earlier contributions. In later work, the analysis was extended to cover queueing systems with more general input and service distributions.

Mathematical interest in actuarial problems as such re-emerged in the 1910s and 1920s in connection with the differential and integral equations of population growth. Here at least there is a bridge between the classical theory of life tables on the one hand and the modern treatments of renewal processes on the other. It is provided by the theory of ‘self-renewing aggregates’ [to borrow a phrase from the review by Lotka (1939), which provides a useful survey of early work in this field], a term that refers to a population (portfolio in the insurance context) of individuals subject to death but also able to regenerate themselves so that a stable population can be achieved.
As a typical illustration, consider the evolution of a human population for which it is assumed that each female of age x has a probability φ(x) dt of giving birth to a daughter in a time interval of length dt, independently of the behaviour of other females in the population and also of any previous children she may have had. Let S(x) denote the survivor function for the (female) life distribution and n(t) the expected female birth rate at time t. Then n(t) satisfies the integral equation
n(t) = ∫₀ᵗ n(t − x)S(x)φ(x) dx,
which represents a breakdown of the total female birth rate by age of parent. If the population is started at time zero with an initial age distribution having density r(x), the equation can be rewritten in the form
n(t) = n₀(t) + ∫₀ᵗ n(t − x)S(x)φ(x) dx,
where

n₀(t) = ∫₀^∞ r(x) [S(t + x)/S(x)] φ(t + x) dx
is the contribution to the birth rate at time t from the initial population. In this form, the analogy with the integral equation of renewal theory is clear. Indeed, the latter equation corresponds to the special case where at death each individual is replaced by another of age zero and no other ‘births’ are possible. The population size then remains constant, and it is enough to consider a population with just one member. In place of n(t), we then have the renewal density m(t), with m(t) dt representing the probability that a replacement will be required in the small time interval (t, t + dt); also, φ(x) becomes the hazard function h(x) for the life distribution, and the combination S(x)h(x) can be replaced by the probability density function f (x) as in (1.1.3). Thus, we obtain the renewal equation in the form
m(t) = n₀(t) + ∫₀ᵗ m(t − u)f(u) du.
If, finally, the process is started with a new component in place at time 0, then n₀(t) = f(t) and we have the standard form
mₛ(t) = f(t) + ∫₀ᵗ mₛ(t − u)f(u) du.
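In the absence of a closed form, the standard-form equation can be solved numerically by discretizing the convolution. The Python sketch below (trapezoidal rule; step size, horizon, and rate are arbitrary choices) uses the exponential case, for which the renewal density is known to be constant, mₛ(t) = λ, as a check:

```python
import math

lam, h, T = 1.0, 0.005, 4.0  # rate, grid step, and horizon (illustrative)
n = int(T / h)

f = [lam * math.exp(-lam * i * h) for i in range(n + 1)]  # exponential density on the grid

# Solve m(t) = f(t) + ∫₀ᵗ m(t−u) f(u) du by the trapezoidal rule,
# solving each step for the implicit m[i] term at u = 0.
m = [f[0]]
for i in range(1, n + 1):
    inner = sum(m[i - j] * f[j] for j in range(1, i)) + 0.5 * m[0] * f[i]
    m.append((f[i] + h * inner) / (1.0 - 0.5 * h * f[0]))

# Exponential lifetimes give a Poisson process, whose renewal density is λ.
assert abs(m[-1] - lam) < 0.01
```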
The third field to mention is reliability theory. A few problems in this field, including Weibull’s discussion of brittle fracture, appeared before World War II, but its systematic development relates to the post-war period and the rapid growth of the electronics industry. Typical problems are the calculation
of lifetime distributions of systems of elements connected in series (‘weakest link’ model) or in parallel. Weibull’s analysis is an example of the first type of model, which typically leads to an extreme-value distribution with a long right tail. An early example of a parallel model is Daniels’ (1945) treatment of the failure of fibre bundles; the distributions in this case have an asymptotically normal character. In between and extending these two primary cases lie an infinite variety of further failure models, in all of which the concepts and terminology invented to cover the life table problem play a central role.

In retrospect, it is easy to see that the three fields referred to are closely interconnected. Together, they provide one of the main areas of application and development of point process theory. Of course, they do not represent the only fields where life table methods have been applied with success. An early paper by Watanabe (1933) gives a life table analysis of the times between major earthquake disasters, a technique that has been resurrected by several more recent writers under the name of theory of durability. An important recent field of application has been the study of trains of nerve impulses in neurophysiology. In fact, the tools are available and relevant for any phenomenon in which the events occur along a time axis and the intervals between the time points are important and meaningful quantities.
Exercises and Complements to Section 1.1

1.1.1 A nonnegative random variable (r.v.) X with distribution function (d.f.) F has an increasing failure rate (abbreviated to IFR) if the conditional d.f.s

Fₓ(u) = Pr{X ≤ x + u | X > x} = [F(x + u) − F(x)]/[1 − F(x)]    (u, x ≥ 0)

are increasing functions of x for every fixed u in 0 < u < ∞. It has a decreasing mean residual life (DMRL) if E(X − x | X > x) decreases with increasing x, and it is new better than used in expectation (NBUE) if E(X − x | X > x) ≤ EX (all x > 0). Show that IFR implies DMRL, DMRL implies NBUE, and NBUE implies that var X ≤ (EX)² [see Stoyan (1983, Section 1.6)].

1.1.2 Let X₁, X₂, . . . be a sequence of independent identically distributed r.v.s with d.f. F(·). Then, for any fixed nonnegative integer n,
Pr{max_{1≤j≤n} X_j ≤ u} = (F(u))ⁿ.

Replacing n by a Poisson-distributed r.v. N with mean µ yields

G(u) ≡ Pr{max_{1≤j≤N} X_j ≤ u} = e^{−µ} Σ_{k=0}^∞ µᵏ(k!)^{−1}(F(u))ᵏ = e^{−µ(1−F(u))}.

When F(u) = 1 − e^{−λu}, G is the Gumbel d.f., while when F(u) = 1 − λu^{−α}, G is the Fréchet d.f. [In the forms indicated, these extreme-value distributions include location and/or scale parameters; see e.g. Johnson and Kotz (1970, p. 272).]
1.1.3 Let X₁, X₂, . . . be as in the previous exercise with F(u) = 1 − e^{−λu}. Show that Y ≡ max(X₁, . . . , Xₙ) has the same distribution as Σ_{j=1}^n X_j/j.
[Hint: Regard X₁, . . . , Xₙ as lifetimes in a linear death process with death rate λ, so that Y is the time to extinction of the process. Exercise 2.1.2 gives more general properties.]

1.1.4 Suppose that the lifetimes of rulers are independent r.v.s with common d.f. F and that, conditional on reaching age 21 years, a ruler has a son (with lifetime d.f. F) every two years for up to six sons, with the eldest surviving son succeeding him. Conditional on there being a succession, what is the d.f. of the age at succession and the expected time that the successor reigns (assuming a reign terminated by death from natural causes)? What types of error would be involved in matching chronologies from a knowledge of the orders of two sets of rulers (see the reference to Newton’s work in the text)? How would such chronologies be matched in the light of developments in statistical techniques subsequent to Newton?

1.1.5 Investigate the integral equation for the stationary age distribution in a supercritical age-dependent branching process. Using a suitable metric, evaluate the difference between this stationary age distribution and the backward recurrence time distribution of a stationary renewal process with the same lifetime distribution, as a function of the mean of the offspring distribution. Note that Euler worked on the age distribution in exponentially growing populations.
1.2. Counting Problems

The other basic approach to point process phenomena, and the only systematic approach yet available in spaces of higher dimension, is to count the numbers of events in intervals or regions of various types. In this approach, the machinery of discrete distributions plays a central role. Since in probability theory discrete problems are usually easier to handle than continuous problems, it might be thought that the development of general models for a discrete distribution would precede those for a continuous distribution, but in fact the reverse seems to be the case. Although particular examples, such as the Bernoulli distribution and the negative binomial distribution, occurred at a very early stage in the discussion of games of chance, there seems to be no discussion of discrete distributions as such until well into the nineteenth century. We may take as a starting point Poisson’s (1837) text, which included a derivation of the Poisson distribution by passage to the limit from the binomial (the claim that he was anticipated in this by de Moivre is a little exaggerated in our view: it is true that de Moivre appends a limit result to the discussion of a certain card problem, but it can hardly be said that the resulting formula was considered by de Moivre as a distribution, which may be the key point). Even Poisson’s result does not seem to have been widely noted at the time, and it is not derived in a counting process context. The first discussions of counting problems known to us are by Seidel (1876) and Abbé (1879),
who treated the occurrence of thunderstorms and the number of blood cells in haemocytometer squares, respectively, and both apparently independently of Poisson’s work. Indeed, Poisson’s discovery of the distribution seems to have been lost sight of until attention was drawn to it in von Bortkiewicz’s (1898) monograph Das Gesetz der kleinen Zahlen, which includes a systematic account of phenomena that fit the Poisson distribution, including, of course, the famous example of the number of deaths from horse kicks in the Prussian army. Lyon and Thoma (1881), on Abbé’s data, and Student (1907) gave further discussions of the blood cell problem, the latter paper being famous as one of the earliest applications of the chi-square goodness-of-fit test.

Shortly afterward, the Poisson process arose simultaneously in two very important contexts. Erlang (1909) derived the Poisson distribution for the number of incoming calls to a telephone trunking system by supposing the numbers in disjoint intervals to be independent and considering the limit behaviour when the interval of observation is divided into an increasing number of equally sized subintervals. This effectively reproduces the Poisson distribution as the limit of the binomial, but Erlang was not aware of Poisson’s work at the time, although he corrected the omission in later papers. Then, in 1910, Bateman, brought in as mathematical consultant by Rutherford and Geiger in connection with their classical experiment on the counting of α particles, obtained the Poisson probabilities as solutions to the family of differential equations

p′ₙ(t) = −λpₙ(t) + λpₙ₋₁(t)    (n ≥ 1),
p′₀(t) = −λp₀(t).
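The solution of these equations starting from p₀(0) = 1 is the Poisson distribution pₙ(t) = e^{−λt}(λt)ⁿ/n!, a fact that is easily confirmed numerically; the Euler scheme below (rate, step size, and horizon are arbitrary choices) is illustrative only:

```python
import math

lam, h, T, N = 1.5, 0.0002, 2.0, 20  # rate, step, horizon, truncation level

# Euler integration of p'_0 = −λp_0 and p'_n = −λp_n + λp_{n−1} (n ≥ 1),
# starting from p_0(0) = 1.
p = [1.0] + [0.0] * N
for _ in range(int(T / h)):
    new = p[:]
    new[0] = p[0] - h * lam * p[0]
    for n in range(1, N + 1):
        new[n] = p[n] + h * (lam * p[n - 1] - lam * p[n])
    p = new

# Compare with the Poisson probabilities e^(−λT)(λT)^n / n!.
for n in range(6):
    poisson = math.exp(-lam * T) * (lam * T) ** n / math.factorial(n)
    assert abs(p[n] - poisson) < 1e-3
```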
[Concerning the relation p₀(t) = e^{−λt}, Bateman (1910) commented that it “has been known for some time (Whitworth’s Choice and Chance, 4th Ed., Proposition LI),” while Haight (1967) mentioned the result as a theorem of Boltzmann (1868) and quoted the reference to Whitworth, who does not indicate the sources of his results; in a Gresham lecture reproduced in Whitworth (1897, p. xxxiii), he wrote of Proposition LI as “a general theorem which I published in 1886, which met with rather rough treatment at the hands of a reviewer in The Academy.” Whitworth’s (1867) book evolved through five editions. It is easy to envisage repeated independent discovery of his Proposition LI.] These equations represent a formulation in terms of a pure birth process and the first step in the rapid development of the theory of birth and death processes during the next two decades, with notable early papers by McKendrick (1914, 1926) and Yule (1924). This work preceded the general formulation of birth and death processes as Markov processes (themselves first studied by Markov more than a decade earlier) in the 1930s and is not of immediate concern, despite the close connection with point process problems. A similar remark can be made about branching processes, studied first by Bienaymé (see Heyde and Seneta, 1977) and of course by Galton and Watson
(1874). There are close links with point processes, particularly in the general case, but the early studies used special techniques that again lie a little outside the scope of our present discussion, and it was only from the 1940s onward that the links became important. Closer in line with our immediate interests is the work on alternatives to the Poisson distribution. In many problems in ecology and elsewhere, it is found that the observed distribution of counts frequently shows a higher dispersion (i.e. a higher variance for a given value of the mean) than can be accounted for satisfactorily by the Poisson distribution, for which the variance/mean ratio is identically unity. The earliest and perhaps still the most widely used alternative is the negative binomial distribution, which figures in early papers by Student (1907), McKendrick (1914), and others. A particularly important paper for the sequel was the study by Greenwood and Yule (1920) of accident statistics, which provided an important model for the negative binomial, and in so doing sparked a controversy, still not entirely resolved, concerning the identifiability of the model describing accident occurrence. Since the accident process is a kind of point process in time, and since shades of the same controversy will appear in our own models, we briefly paraphrase their derivation. Before doing so, however, it is convenient to summarize some of the machinery for handling discrete distributions. The principal tool is the probability generating function (p.g.f.) defined for nonnegative integer-valued random variables X by the equation P (z) =
Σ_{n=0}^∞ pₙzⁿ,

where pₙ = Pr{X = n}. It is worth mentioning that although generating functions have been used in connection with difference equations at least since the time of Laplace, their application to this kind of problem in the 1920s and 1930s was hailed as something of a technological breakthrough. In Chapter 5, relations between the p.g.f., factorial moments, and cumulants are discussed. For the present, we content ourselves with the observation that the negative binomial distribution can be characterized by the form of its p.g.f.,

P(z) = (µ/(1 + µ − z))^α    (α > 0, µ > 0),    (1.2.1)

corresponding to the values of the probabilities themselves,

pₙ = [(α − 1 + n)!/((α − 1)! n!)] (µ/(1 + µ))^α (1/(1 + µ))ⁿ.

¹ Note that there is a lack of agreement on terminology. Other authors, for example Johnson and Kotz (1969), would label this as a compound Poisson and would call the distribution we treat below under that name a generalized Poisson. The terminology we use is perhaps more common in texts on probability and stochastic processes; the alternative terminology is more common in the statistical literature.
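The agreement between the p.g.f. (1.2.1) and the stated probabilities can be checked by summing the series numerically. In the Python sketch below (parameter values arbitrary), the factorial ratio is written via the gamma function so that non-integer α is covered:

```python
import math

alpha, mu = 2.0, 1.5  # illustrative parameter values

def p(n):
    # (α−1+n)!/((α−1)! n!) written as Γ(α+n)/(Γ(α) n!) via log-gamma
    log_binom = math.lgamma(alpha + n) - math.lgamma(alpha) - math.lgamma(n + 1)
    return math.exp(log_binom + alpha * math.log(mu / (1 + mu)) - n * math.log(1 + mu))

# Sum p_n z^n and compare with P(z) = (μ/(1+μ−z))^α at a few points.
for z in [0.0, 0.25, 0.5, 0.9]:
    series = sum(p(n) * z ** n for n in range(200))
    assert abs(series - (mu / (1 + mu - z)) ** alpha) < 1e-10
```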
Greenwood and Yule derived this distribution as an example of what we call a mixed Poisson¹ distribution; that is, it can be obtained from a Poisson distribution pₙ = e^{−λ}λⁿ/n! by treating the parameter λ as a random variable. If, in particular, λ is assumed to have the gamma distribution

dF(λ) = µ^α λ^{α−1} e^{−µλ} dλ/Γ(α),

then the resultant discrete distribution has p.g.f.

P(z) = ∫₀^∞ e^{λ(z−1)} dF(λ) = (µ/(1 + µ − z))^α,

e^{λ(z−1)} being the p.g.f. of the Poisson distribution with parameter λ. It is not difficult to verify that the mean and variance of this negative binomial distribution equal α/µ and (α/µ)(1 + µ⁻¹), so that the variance/mean ratio of the distribution equals 1 + µ⁻¹, exceeding by µ⁻¹ the corresponding ratio for a Poisson distribution. Greenwood and Yule interpreted the variable parameter λ of the underlying Poisson distribution as a measure of individual ‘accident proneness,’ which was then averaged over all individuals in the population.

The difficulty for the sequel is that, as was soon recognized, many other models also give rise to the negative binomial, and these may have quite contradictory interpretations in regard to accidents. Lüders (1934) showed that the same distribution could be derived as an example of a compound Poisson distribution, meaning a random sum of independent random variables in which the number of terms in the sum has a Poisson distribution. If each term is itself discrete and has a logarithmic distribution with p.g.f.
log(1 + µ − z) , log µ
(1.2.2)
and if the number of terms has a Poisson distribution with parameter α log(1 + µ⁻¹), then the resultant distribution has the identical p.g.f. (1.2.1) for the negative binomial (see Exercise 1.2.1). The interpretation here would be that all individuals are identical but subject to accidents in batches. Even before this, Eggenberger and Pólya (1923) and Pólya (1931) had introduced a whole family of distributions, for which they coined the term ‘contagious distributions’ to describe situations where the occurrence of a number of events enhances the probability of the occurrence of a further event, and had shown that the negative binomial distribution could be obtained in this way. If the mixed and compound models can be distinguished in principle by examining the joint distributions of the number of accidents in nonoverlapping intervals of a person’s life, Cane (1974, 1977) has shown that there is no way in which the mixed Poisson and Pólya models can be distinguished from observations on individual case histories, for they lead to identical conditional distributions (see Exercise 1.2.2).
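The compound-Poisson route to the negative binomial can also be checked directly on the generating functions. In the sketch below (parameter values arbitrary), the logarithmic p.g.f. is written out explicitly, and the Poisson parameter is taken as α log(1 + µ⁻¹), the value for which the identity with (1.2.1) is exact:

```python
import math

alpha, mu = 2.5, 0.8  # illustrative parameter values

def g(z):
    # logarithmic p.g.f.: log((1+μ−z)/(1+μ)) / log(μ/(1+μ))
    return (math.log(1 + mu - z) - math.log(1 + mu)) / (math.log(mu) - math.log(1 + mu))

nu = alpha * math.log(1 + 1 / mu)  # Poisson parameter of the random sum

# The p.g.f. of the random sum, exp(ν(g(z) − 1)), equals (μ/(1+μ−z))^α.
for z in [0.0, 0.3, 0.7, 0.99]:
    compound = math.exp(nu * (g(z) - 1))
    negbin = (mu / (1 + mu - z)) ** alpha
    assert abs(compound - negbin) < 1e-12
```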
Another important contribution in this field is the work of Neyman (1939), who introduced a further family of discrete distributions, derived from consideration of a cluster model. Specifically, Neyman was concerned with distributions of beetle larvae in space, supposing these to have crawled some small distance from their initial locations in clusters of eggs. Further analysis of this problem resulted in a series of papers, written by Neyman in collaboration with E.L. Scott and other writers, which treated many different statistical questions relating to clustering processes in ecology, astronomy, and other subjects (see e.g. Neyman and Scott, 1958). Many of these questions can be treated most conveniently by the use of generating functionals and moment densities, a theory that had been developing simultaneously as a tool for describing the evolution of particle showers and related problems in theoretical physics. The beginnings of such a general theory appear in the work of the French physicist Yvon (1935), but the main developments relate to the post-war period, and we therefore defer a further discussion to the following section.
Exercises and Complements to Section 1.2

1.2.1 Poisson mixture of logarithmic distributions is negative binomial. Verify that if X₁, X₂, . . . are independent r.v.s with the logarithmic distribution whose p.g.f. is in (1.2.2), and if N, independent of X₁, X₂, . . . , is a Poisson r.v. with mean α log(1 + µ⁻¹), then X₁ + · · · + X_N has the negative binomial distribution in (1.2.1).

1.2.2 Nonidentifiability in a model for accident proneness. Suppose that an individual has n accidents in the time interval (0, T) at t₁ < t₂ < · · · < tₙ. Evaluate the likelihood function for these n times for the two models:
(i) accidents occur at the epochs of a Poisson process at rate λ, where λ is fixed for each individual but may vary between individuals;
(ii) conditional on having experienced j accidents in (0, t), an individual has probability (k + j)µ dt/(1 + µt) of an accident in (t, t + dt), independent of the occurrence times of the j accidents in (0, t); each individual has probability kµ dt of an accident in (0, dt).
Show that the probabilities of n events in (0, T) are Poisson and negative binomial, respectively, and deduce that the conditional likelihood, given n, is the same for (i) and (ii). See Cane (1974) for discussion.
1.2.3 The negative binomial distribution can also arise as the limit of the Pólya–Eggenberger distribution, defined for integers n and α, β > 0 by

p_k = \binom{n}{k} Γ(α + k)Γ(β + n − k)Γ(α + β)/[Γ(α + β + n)Γ(α)Γ(β)] = (−1)^k \binom{−α}{k} Γ(α + β) n! Γ(β + n − k)/[Γ(β)(n − k)! Γ(β + n + α)].

When β and n → ∞ with β/n → µ, a constant, and α fixed, show that {p_k} has the p.g.f. in (1.2.1). [For further properties, see Johnson and Kotz (1969) and the papers cited in the text.]
1.2.4 Neyman’s Type A distribution (e.g. Johnson and Kotz, 1969) has a p.g.f. of the form

exp{µ(Σᵢ αᵢ exp[−λᵢ(1 − z)] − 1)},

where αᵢ ≥ 0, Σᵢ αᵢ = 1, λᵢ > 0, and µ > 0, and arises as a cluster model. Give such a cluster model interpretation for the simplest case αᵢ = 1 for i = 1, αᵢ = 0 otherwise, and general λ ≡ λ₁ and µ.

1.2.5 Suppose that a (large) population evolves according to a one-type Galton–Watson branching process in which the distribution of the number of children has p.g.f. P(z). Choose an individual at random in a particular generation. Show that the distribution of the number of sibs (sisters, say) of this randomly chosen individual has p.g.f. P′(z)/P′(1) and that this is the same as for the number of aunts, or great-aunts, of this individual. [Hint: Attempting to estimate the offspring distribution by using the observed family size distribution, when based on sampling via the children, leads to the distribution with p.g.f. zP′(z)/P′(1) and is an example of the length-biased sampling that underlies the waiting-time paradox referred to in Sections 3.2 and 3.4. The p.g.f. for the number of great-aunts is used in Chapter 11.]
1.3. Some More Recent Developments

The period during and following World War II saw an explosive growth in theory and applications of stochastic processes. On the one hand, many new applications were introduced and existing fields of application were extended and deepened; on the other hand, there was also an attempt to unify the subject by defining more clearly the basic theoretical concepts. The monographs by Feller (1950) and Bartlett (1955) (preceded by mimeographed lecture notes from 1947) played an important role in stressing common techniques and exploring the mathematical similarities in different applications; both remain remarkably succinct and wide-ranging surveys. From such a busy scene it is difficult to pick out clearly marked lines of development, and any selection of topics is bound to be influenced by personal preferences. Bearing such reservations in mind, we can attempt to follow through some of the more important themes into the post-war period.

On the queueing theory side, a paper of fundamental importance is Conny Palm’s (1943) study of intensity fluctuations in traffic theory, a title that embraces topics ranging from the foundation of a general theory of the input stream to the detailed analysis of particular telephone trunking systems. Three of his themes, in particular, were important for the future of point processes. The first is the systematic description of properties of a renewal process, as a first generalization of the Poisson process as input to a service system. The notion of a regeneration point, a time instant at which the system reverts to a specified state with the property that the future evolution is independent of how the state was reached, has proved exceptionally fruitful in many different applications. In Palm’s terminology, the Poisson process
is characterized by the property that every instant is a regeneration point, whereas for a general renewal process only those instants at which a new interval is started form regeneration points. Hence, he called a Poisson process a process without aftereffects and a renewal process a process with limited aftereffects.

Another important idea was his realization that two types of distribution function are important in describing a stationary point process—the distribution of the time to the next event from a fixed but arbitrary origin and the distribution of the time to the next event from an arbitrary event of the process. The relations between the two sets of distributions are given by a set of equations now commonly called the Palm–Khinchin equations, Palm himself having exhibited only the simplest special case.

A third important contribution was his (incomplete) proof of the first limit theorem for point processes: namely, that superposition of a large number of independent sparse renewal processes leads to a Poisson process in the limit. Finally, it may be worth mentioning that it was in Palm’s paper that the term ‘point processes’ (Punktprozesse) was first used as such—at least to the best of our knowledge.

All these ideas have led to important further development. H. Wold (1948, 1949), also a Swedish mathematician, was one of the first to take up Palm’s work, studying processes with Markov-dependent intervals that, he suggested, would form the next most complex alternative to the renewal model. Bartlett (1954) reviewed some of this early work. Of the reworkings of Palm’s theory, however, the most influential was the monograph by Khinchin (1955), which provided a more complete and rigorous account of Palm’s work, notably extended it in several directions, and had the very important effect of bringing the subject to the attention of pure mathematicians.
Thus, Khinchin’s book became the inspiration of much theoretical work, particularly in the Soviet Union and Eastern Europe. Ryll-Nardzewski’s (1961) paper set out fundamental properties of point processes and provided a new and more general approach to Palm probabilities. Starting in the early 1960s, Matthes and co-workers developed many aspects concerned with infinitely divisible point processes and related questions. The book by Kerstan, Matthes and Mecke (1974) represented the culmination of the first decade of such work; extensive revisions and new material were incorporated into the later editions in English (1978) (referred to as MKM in this book) and in Russian (1982). In applications, these ideas have been useful not only in queueing theory [for continuing development in this field, see the monographs of Franken et al. (1981) and Brémaud (1981)] but also in the study of level-crossing problems. Here the pioneering work was due to Rice (1944) and McFadden (1956, 1958). More rigorous treatments, using some of the Palm–Khinchin theory, were given by Leadbetter and other writers [see e.g. Leadbetter (1972) and the monographs by Cramér and Leadbetter (1967) and Leadbetter, Lindgren and Rootzén (1983)].

On a personal note in respect of much of this work, it is appropriate to remark that Belyaev, Franken, Grigelionis, König, Matthes, and one of us,
1.3. Some More Recent Developments
among others, were affected by the lectures and personal influence of Gnedenko (see Vere-Jones, 1997), who was a student of Khinchin. Meanwhile, there was also rapid development on the theoretical physics front. The principal ideas here were the characteristic and generating functionals and product densities. As early as 1935, Kolmogorov suggested the use of the characteristic functional Φ(ξ) = E(e^{i⟨X,ξ⟩}) as a tool in the study of random elements X from a linear space L; ξ is then an element from the space of linear functionals on L. The study of probability measures on abstract spaces remained a favourite theme of the Russian school of probability theory and led to the development of the weak convergence theory for measures on metric spaces by Prohorov (1956) and others, which in turn preceded the general study of random measures [e.g. Jiřina (1966) and later writers including the Swedish mathematicians Jagers (1974) and Kallenberg (1975)]. After the war, the characteristic functional was discussed by LeCam (1947) for stochastic processes and Bochner (1947) for random interval functions. Bochner’s (1955) monograph, in particular, contains many original ideas that have only partially been followed up, for example, by Brillinger (1972). Kendall (1949) and Bartlett and Kendall (1951) appear to be the first to have used the characteristic functional in the study of specific population models. Of more immediate relevance to point processes is the related concept of a probability generating functional (p.g.fl.) defined by

G[h] = E[ ∏_i h(x_i) ] = E[ exp( ∫ log h(x) N(dx) ) ],
where h(x) is a suitable test function and the xi are the points at which population members are located, that is, the atoms of the counting measures N (·). The p.g.fl. is the natural extension of the p.g.f., and, like the p.g.f., it has an expansion, when the total population is finite, in terms of the probabilities of the number of particles in the population and the probability densities of their locations. There is also an expansion, analogous to the expansion of the p.g.f. in terms of factorial moments, in terms of certain factorial moment density functions, or product densities as they are commonly called in the physical literature. Following the early work of Yvon noted at the end of Section 1.2, the p.g.fl. and product densities were used by Bogoliubov (1946), while properties of product densities were further explored in important papers by Bhabha (1950) and Ramakrishnan (1950). Ramakrishnan, in particular, gave formulae expressing the moments of the number of particles in a given set in terms of the product densities and Stirling numbers. Later, these ideas were considerably extended by Ramakrishnan, Janossy, Srinivasan, and others; an extensive literature exists on their application to cosmic ray showers summarized in the monographs by Janossy (1948) and Srinivasan (1969, 1974).
1. Early History
This brings us to another key point in the mathematical theory of point processes, namely the fundamental paper by Moyal (1962a). Drawing principally on the physical and ecological contexts, Moyal for the first time set out clearly the mathematical constructs needed for a theory of point processes on a general state space, clarifying the relations between such quantities as the product densities, finite-dimensional distributions, and probability generating functionals and pointing out a number of important applications. Independently, Harris (1963) set out similar ideas in his monograph on branching processes, subsequently (Harris, 1968, 1971) contributing important ideas to the general theory of point processes and the more complex subject of interacting particle systems. In principle, the same techniques are applicable to other contexts where population models are important, but in practice the discussions in such contexts have tended to use more elementary, ad hoc tools. In forestry, for example, a key problem is the assessment of the number of diseased or other special kinds of trees in a given region. Since a complete count may be physically very difficult to carry out and expensive, emphasis has been on statistical sampling techniques, particularly of transects (line segments drawn through the region) and nearest-neighbour distances. Matérn’s (1960) monograph brought together many ideas, models, and statistical techniques of importance in such fields and includes an account of point process aspects. Ripley’s (1981) monograph covers some more recent developments. On the statistical side, Cox’s (1955) paper contained seeds leading to the treatment of many statistical questions concerning data generated by point processes and discussed various models, including the important class of doubly stochastic Poisson processes.
A further range of techniques was introduced by Bartlett (1963), who showed how to adapt methods of time series analysis to a point process context and brought together a variety of different models. This work was extended to processes in higher dimensions in a second paper (Bartlett, 1964). Lewis (1964a) used similar techniques to discuss the instants of failure of a computer. The subsequent monograph by Cox and Lewis (1966) was a further important development that, perhaps for the first time, showed clearly the wide range of applications of point processes as well as extending many of the probabilistic and statistical aspects of such processes. In the 1970s, perhaps the most important development was the rapid growth of interest in point processes in communications engineering (see e.g. Snyder, 1975). It is a remarkable fact that in nature, for example in nerve systems, the transfer of information is more often effected by pulse signals than by continuous signals. This fact seems to be associated with the high signal/noise ratios that it is possible to achieve by these means; for the same reason, pulse techniques are becoming increasingly important in communication applications. For such processes, just as for continuous processes, it is meaningful to pose questions concerning the prediction, interpolation, and estimation of signals, and the detection of signals against background noise (in this context, of random pulses). Since the signals are intrinsically nonnegative, the distributions cannot be Gaussian, so linear models are not in general appropriate. Thus, the development of a suitable theory for point processes is closely linked to the development of nonlinear techniques in other branches of stochastic process theory. As in the applications to processes of diffusion type, martingale methods provide a powerful tool in the discussion of these problems, yielding, for example, structural information about the process and its likelihood function as well as more technical convergence results. Amongst other books, developments in this area were surveyed in Liptser and Shiryayev (1974; English translation 1977, 1978; 2nd ed. 2000), Brémaud (1981), and Jacobsen (1982). The last quarter-century has seen both the emergence of new fields of applications and the consolidation of older ones. Here we shall attempt no more than a brief indication of major directions, with references to texts that can be consulted for more substantive treatments. Spatial point processes, or spatial point patterns as they are often called, have become a burgeoning subject in their own right. The many fields of application include environmental studies, ecology, geography, astrophysics, fisheries and forestry, as well as substantially new topics such as image processing and spatial epidemic theory. Ripley (1981) and Diggle (1983) discuss both models and statistical procedures, while Cressie (1991) gives a broad overview with the emphasis on applications in biology and ecology. Image processing is discussed in the now classical work of Serra (1982). Theoretical aspects of spatial point patterns link closely with the fields of stereology and stochastic geometry, stemming from the seminal work of Roger Miles and, particularly, Rollo Davidson (see Harding and Kendall, 1974) and surveyed in Stoyan, Kendall and Mecke (1987, 2nd ed. 1995) and Stoyan and Stoyan (1994).
There are also close links with the newly developing subject of random set theory; see Matheron (1975) and Molchanov (1997). The broad-ranging set of papers in Barndorff-Nielsen et al. (1998) covers many of these applications and associated theory. Time, space–time, and marked space–time point processes have continued to receive considerable attention. As well as in the earlier applications to queueing theory, reliability, and electrical engineering, they have found important uses in geophysics, neurophysiology, cardiology, finance, and economics. Applications in queueing theory and reliability were developed in the 1980s by Brémaud (1981) and Franken et al. (1981). Baccelli and Brémaud (1994) contains a more recent account. Second-order methods for the statistical analysis of such data, including spectral theory, are outlined in the now classic text of Cox and Lewis (1966) and in Brillinger (1975b). Snyder and Miller (1991) describe some of the more recent applications in medical fields. Extreme-value ideas in finance are discussed, from a rather different point of view than in Leadbetter et al. (1983) and Resnick (1987), in Embrechts et al. (1997). Prediction methods for point processes have assumed growing importance in seismological applications, in which context they are reviewed in Vere-Jones (1995).
Survival analysis has emerged as another closely related major topic, with applications in epidemiology, medicine, mortality, quality control, reliability, and other fields. Here the study of a single point process is usually replaced by the study of many individual processes, sometimes with only a small number of events in each, evolving simultaneously. Starting points include the early papers of Cox (1972b) and Aalen (1975). Andersen et al. (1993) give a major survey of modelling and inference problems in this field; their treatment includes an excellent introduction to point process concepts in general, emphasizing martingale concepts for inference, and the use of product-integral formulae. The growing range of applications has led to an upsurge of interest in inference problems for point process models. Many of the texts referred to above devote a substantial part of their discussion to the practical implementation of inference procedures. General principles of inference for point processes are treated in the text by Liptser and Shiryayev already mentioned and in Kutoyants (1980, 1984), Karr (1986, 2nd ed. 1991), and Kutoyants (1998). Theoretical aspects have also continued to flourish, particularly in the connections with statistical mechanics and stochastic geometry. Recent texts on basic theory include Kingman’s (1993) beautiful discussion of the Poisson process and Last and Brandt’s (1995) exposition of marked point processes. There are close connections between point processes and infinite particle systems (Liggett, 1999), while Georgii (1988) outlines ideas related to spatial processes and phase changes. Branching processes in higher-dimensional spaces exhibit many remarkable characteristics, some of which are outlined in Dawson et al. (2000). 
Very recently, Coram and Diaconis (2002), exploiting Diaconis and Evans (2000, 2001), have studied similarities between finite point processes of n points on the unit circle constructed from the eigenvalues of random unitary matrices from the unitary group Un , and blocks of n successive zeros of the Riemann zeta function, where n depends on the distance from the real axis of the block of zeros.
CHAPTER 2
Basic Properties of the Poisson Process
The archetypal point processes are the Poisson and renewal processes. Their importance is so great, not only historically but also in illustrating and motivating more general results, that we prefer to give an account of some of their more elementary properties in this and the next two chapters before proceeding to more complex examples and the general theory of point processes. For our present purposes, we shall understand by a point process some method of randomly allocating points to intervals of the real line or (occasionally) to rectangles or hyper-rectangles in a d-dimensional Euclidean space Rd . It is intuitively clear and will be made rigorous in Chapters 5 and 9 that a point process is completely defined if the joint probability distributions are known for the number of events in all finite families of disjoint intervals (or rectangles, etc.). We call these joint or finite-dimensional distributions fidi distributions for short.
2.1. The Stationary Poisson Process

With the understanding just enunciated, the stationary Poisson process on the line is completely defined by the following equation, in which we use N(a_i, b_i] to denote the number of events of the process falling in the half-open interval (a_i, b_i] with a_i < b_i ≤ a_{i+1}:

Pr{N(a_i, b_i] = n_i, i = 1, . . . , k} = ∏_{i=1}^k {[λ(b_i − a_i)]^{n_i} / n_i!} e^{−λ(b_i − a_i)}.   (2.1.1)
This definition embodies three important features:
(i) the number of points in each finite interval (a_i, b_i] has a Poisson distribution;
(ii) the numbers of points in disjoint intervals are independent random variables; and
(iii) the distributions are stationary: they depend only on the lengths b_i − a_i of the intervals.
Thus, the joint distributions are multivariate Poisson of the special type in which the variates are independent.
Let us first summarize a number of properties that follow directly from (2.1.1). The mean M(a, b] and variance V(a, b] of the number of points falling in the interval (a, b] are given by

M(a, b] = λ(b − a) = V(a, b].   (2.1.2)

The constant λ here can be interpreted as the mean rate or mean density of points of the process. It also coincides with the intensity of the process as defined following Proposition 3.3.I. The facts that the mean and variance are equal and that both are proportional to the length of the interval provide a useful diagnostic test for the stationary Poisson process: estimate the mean M(a, b] and the variance V(a, b] for half-open intervals (a, b] over a range of different lengths, and plot the ratios V(a, b]/(b − a). The estimates should be approximately constant for a stationary Poisson process and equal to the mean rate. Any systematic departure from this constant value indicates some departure either from the Poisson assumption or from stationarity [see Exercise 2.1.1 and Cox and Lewis (1966, Section 6.3) for more discussion]. Now consider the relation, following directly from (2.1.1), that

Pr{N(0, τ] = 0} = e^{−λτ}   (2.1.3)
is the probability of finding no points in an interval of length τ . This may also be interpreted as the probability that the random interval extending from the origin to the point first appearing to the right of the origin has length exceeding τ . In other words, it gives nothing other than the survivor function for the length of this interval. Equation (2.1.3) therefore shows that the interval under consideration has an exponential distribution. From stationarity, the same result applies to the length of the interval to the first point of the process to the right of any arbitrarily chosen origin and then equally to the interval to the first point to the left of any arbitrarily chosen origin. In this book, we follow queueing terminology in calling these two intervals the forward and backward recurrence times; thus, for a Poisson process both forward and backward recurrence times are exponentially distributed with mean 1/λ. Using the independence property, we can extend this result to the distribution of the time interval between any two consecutive points of the process, for the conditional distribution of the time to the next point to the right of the origin, given a point in (−∆, 0], has the same exponential form, which, being independent of ∆, is therefore the limiting form of this conditional distribution as ∆ → 0. When such a unique limiting form exists, it can be
identified with the distribution of the time interval between two arbitrary points of the process (see also Section 3.4 in the next chapter). Similarly, by considering the limiting forms of more complicated joint distributions, we can show that successive intervals are independently distributed as well as having exponential distributions (see Exercises 2.1.2–4 and, for extensions to R^2 and R^3, Exercises 2.1.7–8). On the other hand, the particular interval containing the origin is not exponentially distributed. Indeed, since it is equal to the sum of the forward and backward recurrence times, and each of these has an exponential distribution and is independent of the other, its distribution must have an Erlang (or gamma) distribution with density λ^2 x e^{−λx}. This result has been referred to as the ‘waiting-time paradox’ because it describes the predicament of a passenger arriving at a bus stop when the bus service follows a Poisson pattern. The intuitive explanation is that since the position of the origin (the passenger’s arrival) is unrelated to the process governing the buses, it may be treated as effectively uniform over any given time interval; hence, it is more likely to fall in a large rather than a small interval. See Sections 3.2 and 3.4 for more detail and references.
Now let t_k, k = 1, 2, . . . , denote the time from the origin t_0 = 0 to the kth point of the process to the right of the origin. Then we have

{t_k > x} = {N(0, x] < k}   (2.1.4)

in the sense that the expressions in braces describe identical events. Hence, in particular, their probabilities are equal. But the probability of the event on the right is given directly by (2.1.1), so we have

Pr{t_k > x} = Pr{N(0, x] < k} = Σ_{j=0}^{k−1} [(λx)^j / j!] e^{−λx}.   (2.1.5)
Differentiating this expression, which gives the survivor function for the time to the kth point, we obtain the corresponding density function

f_k(x) = [λ^k x^{k−1} / (k − 1)!] e^{−λx},   (2.1.6)
which is again an Erlang distribution. Since the time to the kth event can be considered as the sum of the lengths of the k random intervals (t0 , t1 ], (t1 , t2 ], . . . , (tk−1 , tk ], which as above are independently and exponentially distributed, this provides an indirect proof of the result that the sum of k independent exponential random variables has the Erlang distribution. In much the same vein, we can obtain the likelihood of a finite realization of a Poisson process. This may be defined as the probability of obtaining the given number of observations in the observation period, times the joint conditional density for the positions of those observations, given their number. Suppose that there are N observations on (0, T ] at time points t1 , . . . , tN . From (2.1.1), we can write down immediately the probability of obtaining
single events in (t_i − ∆, t_i] and no points on the remaining part of (0, T]: it is just

e^{−λT} ∏_{j=1}^N λ∆.

Dividing by ∆^N and letting ∆ → 0, to obtain the density, we find as the required likelihood function

L_{(0,T]}(N; t_1, . . . , t_N) = λ^N e^{−λT}.   (2.1.7)
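Since the likelihood (2.1.7) depends on the data only through N, maximizing log L = N log λ − λT gives the estimate λ̂ = N/T. As a minimal numerical sketch (the function names are our own, not from the text), one can simulate a stationary Poisson process by cumulating i.i.d. exponential gaps and recover the rate:

```python
import math
import random

def simulate_poisson(rate, T, rng):
    """Points of a stationary Poisson process on (0, T]:
    partial sums of i.i.d. exponential(rate) intervals."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate)
        if t > T:
            return times
        times.append(t)

def log_likelihood(n_points, rate, T):
    """Log of (2.1.7): N log(lambda) - lambda * T."""
    return n_points * math.log(rate) - rate * T

def rate_mle(n_points, T):
    """Maximizer of (2.1.7) in lambda: lambda_hat = N / T."""
    return n_points / T

rng = random.Random(42)
pts = simulate_poisson(2.0, 1000.0, rng)
lam_hat = rate_mle(len(pts), 1000.0)  # should be near the true rate 2.0
```

Because log L is concave in λ, the estimate λ̂ beats any nearby rate, which is easy to confirm by evaluating `log_likelihood` at perturbed values.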
Since the probability of obtaining precisely N events in (0, T] is equal to [(λT)^N / N!] e^{−λT}, this implies inter alia that the conditional density of obtaining points at (t_1, . . . , t_N), given N points in the interval, is just N!/T^N, corresponding to a uniform distribution over the hyperoctant 0 ≤ t_1 ≤ · · · ≤ t_N ≤ T.
One point about this result is worth stressing. It corresponds to treating the points as indistinguishable apart from their locations. In physical contexts, however, we may be concerned with the positions of N physically distinguishable particles. The factor N!, which arises in the first instance as the volume of the unit hyperoctant, can then be interpreted also as the combinatorial factor representing the number of ways the N distinct particles can be allocated to the N distinct time points. The individual particles are then to be thought of as uniformly and independently distributed over (0, T]. It is in this sense that the conditional distributions for the Poisson process are said to correspond to the distributions of N particles laid down uniformly at random on the interval (0, T] (see Exercise 2.1.5).
Furthermore, either from this result or directly from (2.1.1), we obtain

Pr{N(0, x] = k | N(0, T] = N} = Pr{N(0, x] = k, N(x, T] = N − k} / Pr{N(0, T] = N}
    = (N choose k) p_{x,T}^k (1 − p_{x,T})^{N−k},   (2.1.8)

where p_{x,T} = x/T, representing a binomial distribution for the number in the subinterval (0, x], given the number in the larger interval (0, T].
Most of the results in this section extend both to higher dimensions and to nonstationary processes (see Exercises 2.1.6–8). We conclude the present section by mentioning the simple but important extension to a Poisson process with time-varying rate λ(t), commonly called the nonhomogeneous or inhomogeneous Poisson process. The process can be defined exactly as in (2.1.1), with the quantities λ(b_i − a_i) = ∫_{a_i}^{b_i} λ dx replaced wherever they occur by the quantities

Λ(a_i, b_i] = ∫_{a_i}^{b_i} λ(x) dx.
Thus, the joint distributions are still Poisson, and the independence property still holds. Furthermore, conditional distributions now correspond to particles
independently distributed on (0, T] with a common distribution having density function λ(x)/Λ(0, T] (0 ≤ x ≤ T). The construction of sample realizations is described in Exercise 2.1.6, while the likelihood function takes the more general form

L_{(0,T]}(N; t_1, . . . , t_N) = e^{−Λ(0,T]} ∏_{i=1}^N λ(t_i)
    = exp( −∫_0^T λ(t) dt + ∫_0^T log λ(t) N(dt) ).   (2.1.9)

From this expression, we can see that results for the nonstationary Poisson process can be derived from those for the stationary case by a deterministic time change t → u(t) ≡ Λ(0, t]. In other words, if we write N(t) = N(0, t] (all t ≥ 0) and define a new point process Ñ by

Ñ(t) = N(u^{−1}(t)),

then Ñ has the rate quantity Λ̃(0, t] = u(u^{−1}(t)) = t and is therefore a stationary Poisson process at unit rate. In Chapters 7 and 14, we shall meet a remarkable extension of this last result, due to Papangelou (1972a, b): any point process satisfying a simple continuity condition can be transformed into a Poisson process if we allow a random time change in which Λ[0, t] depends on the past of the process up to time t. Papangelou’s result also implies that (2.1.9) represents the typical form of the likelihood for a point process: in the general case, all that is needed is to replace the absolute rate λ(t) in (2.1.9) by a conditional rate that is allowed to depend on the past of the process. Other extensions lead to the class of mixed Poisson processes (see Exercise 2.1.9) and Cox processes treated in Chapter 6.
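The time-change correspondence can also be sketched numerically: map the points of a unit-rate Poisson process back through u^{−1} to obtain a process with integrated rate Λ(0, t] = u(t). The example below is our own (λ(t) = 2t, so u(t) = t² and u^{−1}(s) = √s, chosen for an explicit inverse), not a construction given in the text:

```python
import math
import random

def simulate_unit_poisson(T, rng):
    """Unit-rate Poisson process on (0, T] via cumulative exponential gaps."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(1.0)
        if t > T:
            return times
        times.append(t)

def inhomogeneous_by_time_change(u_inverse, total_mass, rng):
    """If s_1 < s_2 < ... is a unit-rate Poisson process on
    (0, Lambda(0,T]], then the u^{-1}(s_i) form a Poisson process
    with integrated rate Lambda(0, t] = u(t)."""
    return [u_inverse(s) for s in simulate_unit_poisson(total_mass, rng)]

# lambda(t) = 2t on (0, 10]: u(t) = t**2, u^{-1}(s) = sqrt(s),
# and the total mass is Lambda(0, 10] = 100.
rng = random.Random(1)
pts = inhomogeneous_by_time_change(math.sqrt, 100.0, rng)
```

With an increasing rate, most of the roughly Λ(0, 10] = 100 points fall in the second half of the interval, as a quick count confirms.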
Exercises and Complements to Section 2.1

2.1.1 Let N_1, . . . , N_n be i.i.d. like the Poisson r.v. N with mean µ = EN, and write N̄ = (N_1 + · · · + N_n)/n for the sample mean. When µ is sufficiently large, indicate why the sample index of dispersion

Z = Σ_{j=1}^n (N_j − N̄)^2 / N̄

has a distribution approximating that of a χ^2_{n−1} r.v. Darwin (1957) found approximations to the distribution of Z for a general distribution for N based on its cumulants, illustrating his work via the Neyman, negative binomial, and Thomas distributions (see also Kathirgamatamby, 1953).

2.1.2 Exponential distribution order properties. Let X_1, . . . , X_n be i.i.d. exponential r.v.s on (0, ∞) with Pr{X_1 > x} = e^{−λx} (x ≥ 0) for some positive finite λ.
(a) Let X_(1) < · · · < X_(n) be the order statistics of X_1, . . . , X_n. Then (X_(1), . . . , X_(n)) has the same distribution as the vector whose kth component is

X_n/n + X_{n−1}/(n − 1) + · · · + X_{n−k+1}/(n − k + 1).
(b) Write Y = X_1 + · · · + X_n and set Y_(k) = (X_1 + · · · + X_k)/Y. Then Y_(1), . . . , Y_(n−1) are the order statistics of n − 1 i.i.d. r.v.s uniformly distributed on (0, 1).
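Part (a) can be checked by simulation: the representation implies E X_(k) = λ^{−1} Σ_{j=n−k+1}^n 1/j. A rough Monte Carlo sketch (function names are our own):

```python
import random

def expected_order_stat(k, n, lam):
    """E X_(k) implied by Exercise 2.1.2(a):
    X_(k) =d X_n/n + X_{n-1}/(n-1) + ... + X_{n-k+1}/(n-k+1),
    so E X_(k) = (1/lam) * (1/n + ... + 1/(n-k+1))."""
    return sum(1.0 / j for j in range(n - k + 1, n + 1)) / lam

def mc_order_stat_mean(k, n, lam, reps, rng):
    """Monte Carlo mean of the kth order statistic of n i.i.d. exponentials."""
    total = 0.0
    for _ in range(reps):
        sample = sorted(rng.expovariate(lam) for _ in range(n))
        total += sample[k - 1]
    return total / reps

rng = random.Random(7)
approx = mc_order_stat_mean(1, 5, 2.0, 20000, rng)  # minimum of 5, rate 2
exact = expected_order_stat(1, 5, 2.0)              # = 1/(5 * 2) = 0.1
```

The k = 1 case recovers the familiar fact that the minimum of n exponentials of rate λ is exponential with rate nλ.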
2.1.3 Exponential r.v.s have no memory. Let X be exponentially distributed as in Exercise 2.1.2, and for any nonnegative r.v. Y that is independent of X, define an r.v. X_Y as any r.v. whose d.f. has as its tail

R(z) ≡ Pr{X_Y > z} = Pr{X > Y + z | X > Y}.

Then X_Y and X have the same d.f. [There exist innumerable characterizations of exponential r.v.s via their lack of memory properties; many are surveyed in Galambos and Kotz (1978).]

2.1.4 A process satisfying (2.1.1) has

Pr{N(t − x − ∆, t − ∆] = 0, N(t − ∆, t] = 1, N(t, t + y] = 0 | N(t − ∆, t] > 0} → e^{−λx} e^{−λy}   (∆ → 0),

showing the stochastic independence of successive intervals between points of the process.

2.1.5 Order statistics property of Poisson process. Denote the points of a stationary Poisson process on R_+ by t_1 < t_2 < · · · < t_{N(T)} < · · ·, where for any positive T, t_{N(T)} ≤ T < t_{N(T)+1}. Let u_(1) < · · · < u_(n) be the order statistics of n i.i.d. points uniformly distributed on [0, T]. Show that, conditional on N(T) = n, the distributions of {u_(i): i = 1, . . . , n} and {t_i: i = 1, . . . , n} coincide.

2.1.6 Conditional properties of inhomogeneous Poisson processes. Given a finite measure Λ(·) on a c.s.m.s. X, let {t_1, . . . , t_{N(X)}} be a realization of an inhomogeneous Poisson process on X with parameter measure Λ(·).
(a) I.i.d. property. Let r.v.s U_1, . . . , U_n be i.i.d. on X with probability distribution Λ(·)/Λ(X). Show that the joint distributions of {U_i} coincide with those of {t_i} conditional on N(X) = n.
(b) Binomial distribution. When X = (0, T], show that (2.1.8) still holds for the process N(·) with p_{x,T} = Λ(x)/Λ(T).
(c) Thinning construction. To construct a realization on (0, T] of an inhomogeneous Poisson process Π_1 for which the local intensity λ(·) satisfies 0 ≤ λ(u) ≤ λ_max (0 < u ≤ T) for some finite positive constant λ_max, first construct a realization of a stationary Poisson process with rate λ_max (using the fact that successive intervals are i.i.d. exponential r.v.s with mean 1/λ_max), yielding the points 0 < t_1 < t_2 < · · ·, say. Then, independently for each k = 1, 2, . . . , retain t_k as a point of Π_1 with probability λ(t_k)/λ_max and otherwise delete it. Verify that the residual set of points satisfies the independence axiom and that

E(#{j: 0 < t_j < u, t_j ∈ Π_1}) = ∫_0^u λ(v) dv.

[See also Lewis and Shedler (1976) and Algorithm 7.5.II.]
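The thinning construction in part (c) translates directly into code. A sketch under the stated assumptions; the particular intensity λ(t) = 1 + sin²t with λ_max = 2 is our own example, not from the text:

```python
import math
import random

def thin_poisson(intensity, rate_max, T, rng):
    """Thinning as in Exercise 2.1.6(c): generate a stationary Poisson
    process at rate rate_max on (0, T], then retain each point t
    independently with probability intensity(t) / rate_max."""
    kept, t = [], 0.0
    while True:
        t += rng.expovariate(rate_max)
        if t > T:
            return kept
        if rng.random() < intensity(t) / rate_max:
            kept.append(t)

rng = random.Random(3)
pts = thin_poisson(lambda t: 1.0 + math.sin(t) ** 2, 2.0, 200.0, rng)
# The expected count is the integral of 1 + sin^2 over (0, 200], about 300.
```

The method wastes effort when the intensity is far below λ_max, but it needs only pointwise evaluation of λ(·), which is what makes it so widely applicable.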
2.1.7 Avoidance functions of Poisson process in R^d. The distance X of the point closest to the origin of a Poisson process in R^d with rate λ satisfies Pr{X > y} = exp(−λ v_d(y)), where v_d(y) = y^d v_d(1) is the volume of a sphere of radius y in R^d. In particular,
(i) in R^1, Pr{X > y} = e^{−2λy};
(ii) in R^2, Pr{X > y} = e^{−πλy^2};
(iii) in R^3, Pr{X > y} = e^{−(4π/3)λy^3}.
These same expressions also hold for the nearest-neighbour distance of an arbitrarily chosen point of the process.

2.1.8 Simulating a Poisson process in R^d. Using the notation of Exercise 2.1.6, we can construct a realization of a Poisson process Π_d in a neighbourhood of the origin in R^d by adapting Exercises 2.1.6 and 2.1.7 to give an inhomogeneous Poisson process on (0, T) with intensity λ (d/dy) v_d(y) and then, denoting these points by r_1, r_2, . . . , taking the points of Π_d as having polar coordinates (r_j, θ_j), where the θ_j are points independently and uniformly distributed over the surface of the unit sphere in R^d. [An alternative construction for the r_j is to use the fact that λ(v_d(r_j) − v_d(r_{j−1})), with r_0 = 0, are i.i.d. exponential r.v.s with unit mean. See also Quine and Watson (1984). The efficient simulation of a Poisson process in a d-dimensional hypersphere, at least for small d, is to choose a point at random in a d-dimensional hypercube containing the hypersphere and use a rejection method of which Exercise 2.1.6(c) is an example.]

2.1.9 (a) Mixed Poisson process. A point process whose joint distributions are given by integrating λ in the right-hand side of (2.1.1) with respect to some d.f. defines a mixed Poisson process since the distributions come from regarding λ as a random variable. Verify that

N(0, t]/t →_{a.s.} λ   (t → ∞),
EN(0, t] = (Eλ)t,
var N(0, t] = (Eλ)t + (var λ)t^2 ≥ EN(0, t],

with strict inequality unless var λ = 0.
(b) Compound Poisson process. Let Y, Y_1, Y_2, . . . be i.i.d. nonnegative integer-valued r.v.s with probability generating function g(z) = E z^Y (|z| ≤ 1), and let them be independent of a Poisson process N_c at rate λ; write N_c(t) = N_c(0, t]. Then

N(0, t] ≡ Σ_{i=1}^{N_c(t)} Y_i

defines the counting function of a compound Poisson process for which

E z^{N(0,t]} = exp[−λt(1 − g(z))],
EN(0, t] = λ(EY)t,
var N(0, t] = λ(var Y)t + λ(EY)^2 t = λ[E(Y^2)]t = [EN_c(t)](var Y) + [var N_c(t)](EY)^2 ≥ EN(0, t],

with strict inequality unless E[Y(Y − 1)] = 0, i.e. Y = 0 or 1 a.s.
[Both the mixed and compound Poisson processes are in general overdispersed compared with a Poisson process in the sense that (var N(0, t])/EN(0, t] ≥ 1, with equality holding only in the exceptional cases as noted.]
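The moment formulae for the compound Poisson process are easy to verify by simulation. A minimal sketch (our own construction, with Y uniform on {1, 2}, so EY = 1.5 and E(Y²) = 2.5, giving EN(0, t] = λ(EY)t and var N(0, t] = λE(Y²)t):

```python
import random

def poisson_draw(mu, rng):
    """Poisson(mu) variate via exponential inter-arrival times
    (adequate for small mu)."""
    t, n = 0.0, 0
    while True:
        t += rng.expovariate(1.0)
        if t > mu:
            return n
        n += 1

def compound_poisson_count(lam, t, sample_y, rng):
    """One draw of N(0, t] = Y_1 + ... + Y_{Nc(t)}, Nc(t) ~ Poisson(lam t)."""
    return sum(sample_y(rng) for _ in range(poisson_draw(lam * t, rng)))

rng = random.Random(11)
lam, t = 2.0, 1.0
draws = [compound_poisson_count(lam, t, lambda r: 1 + (r.random() < 0.5), rng)
         for _ in range(20000)]
mean = sum(draws) / len(draws)                           # ~ lam * EY * t = 3.0
var = sum((d - mean) ** 2 for d in draws) / len(draws)   # ~ lam * E(Y^2) * t = 5.0
```

The sample variance visibly exceeds the sample mean, which is the overdispersion noted in the bracketed remark above.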
2.1.10 For a Poisson process with the cyclic intensity function

λ(t) = λ exp[κ sin(ω_0 t + θ)]/I_0(κ)   (κ ≥ 0, ω_0 > 0, 0 ≤ θ < 2π, λ > 0),

where I_0(κ) = ∫_0^{2π} exp(κ sin u) du is the modified Bessel function of the first kind of zero order, the likelihood [see (2.1.9) above] of the realization t_1, . . . , t_N on the interval (0, T), where, for convenience of simplifying the integral below, T is a multiple of the period 2π/ω_0, equals

exp( −∫_0^T λ exp[κ sin(ω_0 t + θ)]/I_0(κ) dt ) [λ/I_0(κ)]^N exp( κ Σ_{i=1}^N sin(ω_0 t_i + θ) )
    = e^{−λT/2π} [λ/I_0(κ)]^N exp( κ Σ_{i=1}^N sin(ω_0 t_i + θ) ).

Consequently, N is a sufficient statistic for λ, and, when the frequency ω_0 is known,

( N, Σ_{i=1}^N sin ω_0 t_i, Σ_{i=1}^N cos ω_0 t_i ) ≡ (N, S, C)   say,

are jointly sufficient statistics for the parameters (λ, κ, θ), the maximum likelihood estimates (λ̂, κ̂, θ̂) being determined by λ̂ = 2πN/T, tan θ̂ = C/S, and (d/dκ) log I_0(κ)|_{κ=κ̂} = S/(N cos θ̂) = (S^2 + C^2)^{1/2}/N (the constraints that κ̂ ≥ 0 and that S and cos θ̂ are of the same sign determine which root θ̂ is taken). [See Lewis (1970) and Kutoyants (1984, Chapter 4) for more details.]
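As a small computational sketch of these estimates (function names are our own; only the closed-form parts of the MLE are shown, not the numerical root-finding for κ̂):

```python
import math

def sufficient_stats(times, omega0):
    """The jointly sufficient statistics (N, S, C) of Exercise 2.1.10."""
    S = sum(math.sin(omega0 * t) for t in times)
    C = sum(math.cos(omega0 * t) for t in times)
    return len(times), S, C

def lambda_hat(n, T):
    """lambda_hat = 2 pi N / T, valid when T is a whole number of periods."""
    return 2.0 * math.pi * n / T

def theta_hat(S, C):
    """Solves tan(theta) = C/S, with cos(theta) taking the sign of S
    (the root-selection constraint stated in the exercise)."""
    return math.atan2(C, S)
```

Using `atan2(C, S)` rather than `atan(C/S)` automatically picks the root for which cos θ̂ and S share a sign.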
2.2. Characterizations of the Stationary Poisson Process: I. Complete Randomness

In applications, the Poisson process is sometimes referred to simply as a random distribution of points on a line (as if there were no alternative random processes!) or slightly more specifically as a purely random or completely random process. In all these terminologies, what is in view is the fundamental independence property referred to in (ii) under (2.1.1). We start our discussion of characterizations by examining how far this property alone is capable of characterizing the Poisson process. More precisely, let us assume that we are given a point process satisfying the assumptions below and examine how far the distributions are determined by them.

Assumptions 2.2.I.
(i) The number of points in any finite interval is finite and not identically zero.
(ii) The numbers in disjoint intervals are independent random variables.
(iii) The distribution of N(a + t, b + t] is independent of t.
For brevity, we speak of a process satisfying (i) as boundedly finite and nonnull, while property (ii) may be referred to as complete independence and (iii) as (crude) stationarity.

Theorem 2.2.II. Under Assumptions 2.2.I, the probability generating function (p.g.f.) P(z, τ) = E(z^{N(0,τ]}) can be written uniquely in the form

P(z, τ) = e^{−λτ[1−Π(z)]},   (2.2.1)

where λ is a positive constant and Π(z) = Σ_{n=1}^∞ π_n z^n is the p.g.f. of a discrete distribution having no zero term.

Remark. From the stationarity and independence assumptions, all the joint distributions can be written down once the form of (2.2.1) is given, so that (2.2.1) is in fact sufficient to specify the process completely. Hence, the assumption of crude stationarity suffices in the case of the Poisson process to ensure its (complete) stationarity (see Definition 3.2.I below).

Proof. Since N(a, b] is a monotonically increasing function of b, it is clear that P(z, τ) is a monotonically decreasing function of τ for any fixed z with 0 ≤ z ≤ 1, while Q(z, τ) = −log P(z, τ), finite because of Assumption 2.2.I(i), is a monotonically increasing nonnegative function of τ. Also, since N(0, τ_1 + τ_2] = N(0, τ_1] + N(τ_1, τ_1 + τ_2], it follows from the stationarity and independence assumptions that

P(z, τ_1 + τ_2) = P(z, τ_1) P(z, τ_2),   Q(z, τ_1 + τ_2) = Q(z, τ_1) + Q(z, τ_2).   (2.2.2)
Now it is well known (see e.g. Lemma 3.6.III) that the only monotonic solutions of the functional equation (2.2.2) are of the form Q(z, τ) = constant × τ, where in this case the constant is a function of z, C(z) say. Thus, for all τ > 0 we can write

    P(z, τ) = e^{−τ C(z)}    (2.2.3)

for some uniquely determined function C(z). Consider first the case z = 0. From Assumption 2.2.I(i), N(0, τ] ≢ 0, so P(0, τ) ≢ 1, and hence C(0) ≠ 0. Now

    {N(0, 1] ≥ n} ⊇ ⋂_{k=1}^n {N((k − 1)/n, k/n] ≥ 1},

so using the independence assumption and (2.2.3), we have

    Pr{N(0, 1] ≥ n} ≥ (Pr{N(0, 1/n] ≥ 1})^n = (1 − e^{−C(0)/n})^n.
2. Basic Properties of the Poisson Process
If now C(0) = ∞, then Pr{N(0, 1] ≥ n} = 1 (all n = 1, 2, . . .), contradicting Assumption 2.2.I(i) that N(0, 1] is a.s. finite. Thus, we conclude that

    0 < C(0) < ∞.    (2.2.4)
Define quantities λ and Π(z) by

    λ = C(0)   and   Π(z) = (C(0) − C(z))/C(0) = (log P(z, τ) − log P(0, τ))/(−log P(0, τ)),
the finiteness and nonnegativity of Π(z) on 0 ≤ z ≤ 1 being ensured by the monotonicity in z of P(z, ·). From (2.2.3) and (2.2.4), it follows that P(z, τ) → 1 (τ → 0) for every fixed z in 0 ≤ z ≤ 1, so from (2.2.3) we have

    τ C(z) = 1 − P(z, τ) + o(τ)    (τ ↓ 0),

from which also

    Π(z) = lim_{τ↓0} (P(z, τ) − P(0, τ))/(1 − P(0, τ)).
This representation expresses Π(·) as the limit of p.g.f.s, namely the p.g.f.s of the conditional probabilities π_{k|τ} ≡ Pr{N(0, τ] = k | N(0, τ] > 0}. The definition of Π(z) shows that it inherits from P(z, τ) the property of continuity as z ↑ 1, and therefore the continuity theorem for p.g.f.s (see e.g. Feller, 1968, Section XI.6) ensures that Π(z) must also be a p.g.f., Π(z) = ∑_k π_k z^k say, where

    π_k = lim_{τ↓0} π_{k|τ} = lim_{τ↓0} Pr{N(0, τ] = k | N(0, τ] > 0}    (k = 0, 1, . . .).    (2.2.5)
In particular, π0 = Π(0) = 0. We have thus established the required form of the representation in (2.2.1). Uniqueness follows from the uniqueness of P(z, τ), which defines C(z) by (2.2.3), and C(z) in turn defines λ and Π(z).

The process defined by Assumptions 2.2.I is clearly more general than the Poisson process, to which it reduces only in the case π1 = 1, π_k = 0 (k ≠ 1). The clue to its interpretation comes from the limit relation (2.2.5), which suggests that {π_k} should be interpreted as a 'batch-size' distribution, where 'batch' refers to a collection of points of the process located at the same time point. None of our initial assumptions precludes the possibility of such batches. The distribution of the number of such batches in (0, 1] is found by replacing Π(z) by z in (2.2.1), and therefore it is Poisson with rate λ. Thus, the general process defined by Assumptions 2.2.I can be described as consisting of a succession of batches, the successive batch sizes or multiplicities being independent random variables [as follows readily from Assumption 2.2.I(ii)] having the common distribution {π_k}, and the number of batches following
a Poisson process with constant rate λ. Recognizing that (2.2.1) specifies the p.g.f. of a compound Poisson distribution, we refer to the process as the compound Poisson process [see the footnote on p.10 regarding terminology].

Processes with batches represent an extension of the intuitive notion of a point process as a random placing of points over a region. They are variously referred to as nonorderly processes, processes with multiple points, compound processes, processes with positive integer marks, and so on. For a general proof of the existence of a batch-size distribution for stationary point processes, see Proposition 3.3.VII.

It should be noted that the uniqueness of the representation (2.2.1) breaks down once we drop the convention π0 = 0. Indeed, given any p.g.f. Π(·) as in (2.2.1), let π0* be any number in 0 ≤ π0* < 1, and define λ* = λ/(1 − π0*), π_n* = (1 − π0*)π_n. Then

    Π*(z) ≡ ∑_{n=0}^∞ π_n* z^n = π0* + (1 − π0*)Π(z),

and

    λ*[1 − Π*(z)] = λ(1 − π0*)^{−1}{(1 − π0*)[1 − Π(z)]} = λ[1 − Π(z)].

The interpretation of this nonuniqueness is that if we increase the rate of occurrence of batches, we may compensate for this increase by observing only those batches with nonzero batch size.

We obtain an alternative interpretation of the process by writing (2.2.1) in the form

    P(z, τ) = ∏_{k=1}^∞ exp[−λπ_k τ(1 − z^k)],
corresponding to a representation of the total as the sum of independent contributions from a countable family of simpler processes, the kth of which may be regarded as a modified Poisson process in which the rate of occurrence of points is equal to λπ_k and each such point is treated as a batch of fixed size k. In this representation, the process is regarded as a superposition of independent component processes, each of Poisson type but with fixed batch size. Since both interpretations lead to the same joint distributions and hence to the same probability structures, they must be regarded as equivalent.

Theorem 2.2.II may also be regarded as a special case of the more general theorem of Lévy on the structure of processes with stationary independent increments (see e.g. Loève, 1963, Section 37). In our case, there can be no Gaussian component (since the realizations are monotonic), no drift component (since the realizations are integer-valued), and the Poisson components must have positive integral jumps. Because a process has independent increments if and only if the distributions of the increment over any finite interval are infinitely divisible, (2.2.1) also gives the general form of an infinitely divisible distribution taking values only on the nonnegative integers [see Exercise 2.2.2 and Feller (1968, Section XII.2)].

Analytically, the condition corresponding to the requirement of no batches, or points occurring one at a time, is clearly π1 = 1, or equivalently

    Pr{N(0, τ] > 1} = o(Pr{N(0, τ] > 0}) = o(1 − e^{−λτ}) = o(τ)    (τ ↓ 0).    (2.2.6)
More generally, a stationary process satisfying this condition was called by Khinchin (1955) an orderly process (Russian ordinarnii), and we follow this terminology for the time being, as contrasted with the sample path terminology of a simple point process. The relations between analytical and sample path properties are discussed later in Section 3.3 and Chapter 9. For the present, suffice it to be noted that the analytical condition (2.2.6) is equivalent to the absence of batches with probability 1 (see Exercise 2.2.4). Using the notion of an orderly process, we obtain the following characterization of the Poisson process as a corollary to Theorem 2.2.II. Theorem 2.2.III. A stationary point process satisfying Assumption 2.2.I(i) is a Poisson process if and only if (a) it has the complete independence property 2.2.I(ii) and (b) it is orderly.
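The batch structure established in Theorem 2.2.II is straightforward to simulate: the number of batches in (0, τ] is Poisson with mean λτ, and the batch sizes are i.i.d. draws from {π_k}. The sketch below (rate, horizon, and batch distribution are illustrative choices, not taken from the text) checks the mean count E[N(0, τ]] = λτ ∑_k k π_k empirically:

```python
import math
import random

def poisson_rv(mean, rng):
    # Knuth's method: count uniform factors until the product drops below e^{-mean}
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def compound_poisson_count(lam, batch_probs, t, rng):
    """N(0, t] for the batch process of Theorem 2.2.II.

    batch_probs[k-1] = pi_k (k = 1, 2, ...); pi_0 = 0 by convention.
    """
    n_batches = poisson_rv(lam * t, rng)          # number of batches ~ Poisson(lam*t)
    sizes = rng.choices(range(1, len(batch_probs) + 1),
                        weights=batch_probs, k=n_batches)
    return sum(sizes)

# Empirical check: E[N(0, t]] = lam * t * (mean batch size)
rng = random.Random(42)
lam, t, probs = 2.0, 1.0, [0.5, 0.3, 0.2]         # pi_1, pi_2, pi_3 (illustrative)
mean_batch = sum(k * p for k, p in zip((1, 2, 3), probs))
samples = [compound_poisson_count(lam, probs, t, rng) for _ in range(20000)]
emp_mean = sum(samples) / len(samples)
print(round(emp_mean, 2), round(lam * t * mean_batch, 2))
```

Replacing Π(z) by z, i.e. forcing every batch to have size 1, recovers the ordinary Poisson process of batch epochs, as noted above.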
Exercises and Complements to Section 2.2

2.2.1 In equation (2.2.3), P(z, τ) → 1 (z → 1) for every finite τ (why?), and equation (2.2.2) and λτ > 0 suffice to check that Π(1) = 1. (A general proof, using only stationarity and not the Poisson assumption, is given in Proposition 3.3.VIII below.)

2.2.2 Call the p.g.f. P(z) infinitely divisible when for 0 ≤ z ≤ 1 its uniquely defined nonnegative kth root P_{1/k}(z) ≡ (P(z))^{1/k} is a p.g.f. for every positive integer k. Then show that unless P(z) = 1 for all 0 ≤ z ≤ 1:
(a) p0 = P(0) > 0;
(b) (P(z)/p0)^{1/k} → 1 (k → ∞);
(c) (log P(z) − log P(0))/(−log P(0)) = lim_{k↑∞} (P_{1/k}(z) − P_{1/k}(0))/(1 − P_{1/k}(0));
(d) the left-hand side of (c) represents a p.g.f. on {1, 2, . . .}.
Hence, deduce that every nontrivial infinitely divisible p.g.f. is of the form exp[−λ(1 − Π(z))] for finite λ (in fact, p0 = e^{−λ}) and p.g.f. Π(z) = ∑_{n=1}^∞ π_n z^n [for details see e.g. Feller (1968, Section XII.2)].

2.2.3 (Continuation). Show that an r-variate p.g.f. P(z1, . . . , zr), which is nontrivial in the sense that P(z1, . . . , zr) ≢ 1 when ∑_{j=1}^r |1 − z_j| > 0, is infinitely divisible if and only if it is expressible in the form exp[−λ(1 − Π(z1, . . . , zr))] for some p.g.f.

    Π(z1, . . . , zr) = ∑_{n1=0}^∞ · · · ∑_{nr=0}^∞ π_{n1,...,nr} z1^{n1} · · · zr^{nr}

for which π_{0...0} = 0.

2.2.4 If a point process N has N((k − 1)/n, k/n] ≤ 1 for k = 1, . . . , n, then there can be no batches on (0, 1]. Use the complete independence property in Assumption 2.2.I(ii) and the fact that (1 − o(1/n))^n → 1 (n → ∞) to show that a Poisson process satisfying the analytic orderliness property in (2.2.6) has a.s. no batches on the unit interval, and hence on R.
2.3. Characterizations of the Stationary Poisson Process: II. The Form of the Distribution

The discussion to this point has stressed the independence property, and it has been shown that the Poisson character of the finite-dimensional distributions is really a consequence of this property. To what extent is it possible to work in the opposite direction and derive the independence property from the Poisson form of the distributions? Observe that for any partition A1, . . . , Ar of a Borel set A, the avoidance probability P0(A) of a Poisson process satisfies

    P0(A) = Pr{N(A) = 0} = exp(−λℓ(A)) = ∏_{i=1}^r exp(−λℓ(Ai)) = ∏_{i=1}^r P0(Ai),    (2.3.1)

so the events {N(Ai) = 0} are independent [in (2.3.1), ℓ(·) denotes Lebesgue measure]. Rényi (1967) weakened this assumption by requiring (2.3.1) to hold merely on all sets A that are finite unions of finite intervals, and then, adding the requirement that N be orderly, he deduced that N must be Poisson.

In the converse direction, it is not enough to take A to be the class of unions of any fixed number of intervals: in particular, it is not enough to know that N(A) has a Poisson distribution for all single intervals A = [a, b], as shown in a series of counterexamples provided by Shepp in Goldman (1967), Moran (1967, 1976a, b), Lee (1968), Szász (1970), and Oakes (1974); two such counterexamples are described in Exercises 2.3.1 and 4.5.12.

Theorem 2.3.I. Let N be an orderly point process on R. Then, for N to be a stationary Poisson process it is necessary and sufficient that for all sets A that can be represented as the union of a finite number of finite intervals,

    P0(A) = e^{−λℓ(A)}.    (2.3.2)
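The avoidance-probability formula in Theorem 2.3.I is easy to probe by simulation: for a stationary Poisson process, the empirical frequency of realizations with no point in a finite union of intervals should match e^{−λℓ(A)}. A minimal sketch (the rate and the two intervals are illustrative choices):

```python
import math
import random

def poisson_points(lam, t_max, rng):
    """Points of a rate-lam stationary Poisson process on (0, t_max],
    generated via i.i.d. exponential gaps."""
    pts, t = [], 0.0
    while True:
        t += rng.expovariate(lam)
        if t > t_max:
            return pts
        pts.append(t)

def avoids(points, intervals):
    """True when no point falls in the union of the half-open intervals."""
    return not any(a < x <= b for x in points for (a, b) in intervals)

rng = random.Random(1)
lam, A = 1.5, [(1.0, 2.0), (4.0, 4.5)]            # illustrative union of two intervals
length = sum(b - a for a, b in A)                 # Lebesgue measure of A
runs = 20000
hits = sum(avoids(poisson_points(lam, 5.0, rng), A) for _ in range(runs))
emp, theory = hits / runs, math.exp(-lam * length)
print(round(emp, 3), round(theory, 3))
```

The two printed values should agree to within Monte Carlo error; the theorem's content is that, together with orderliness, this single set function already pins down the whole process.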
It is as easy to prove a more general result for a Poisson process that is not necessarily stationary. To this end, define a simple Poisson process in d-dimensional space R^d as a point process N for which the joint distributions of the counts N(Ai) on bounded disjoint Borel sets Ai satisfy [see equation (2.1.1)]

    Pr{N(Ai) = ki (i = 1, . . . , r)} = ∏_{i=1}^r ([µ(Ai)]^{ki} / ki!) e^{−µ(Ai)}    (r = 1, 2, . . .)

for some nonatomic measure µ(·) that is bounded on bounded sets. Thus, the N(Ai) are Poisson-distributed and independent, E[N(A)] = µ(A), and µ being nonatomic, µ(An) → 0 for any monotonic sequence of bounded sets An ↓ ∅ or {x} for any singleton set {x} (see Lemma A1.6.II). It is an elementary property of the Poisson distribution that this then implies that Pr{N(An) ≥ 2} / Pr{N(An) ≥ 1} → 0 for the same sequence {An}; thus, N has the property of orderliness noted below (2.2.6).
Theorem 2.3.II. Let µ be a nonatomic measure on R^d, finite on bounded sets, and suppose that the simple point process N is such that for any set A that is a finite union of rectangles,

    Pr{N(A) = 0} = e^{−µ(A)}.    (2.3.3)
Then N is a Poisson process with mean µ(A).

Proof. We use the idea of a dissecting system (see Appendix A1.6). For any set A as in (2.3.3), let the set T_n of rectangles {A_{ni}: i = 1, . . . , r_n} be an element of a dissecting system {T_n} of partitions for A [so, for given n, the union of the A_{ni} is A, A_{ni} and A_{nj} are disjoint for i ≠ j, each A_{nj} is the union of some subset A_{n+1,i_s} (s = 1, . . . , r_{n,i}) of T_{n+1}, and for any x ∈ A, there is a sequence {A_n(x)}, A_n(x) ∈ T_n, with ⋂_n A_n(x) = {x}]. Since µ is nonatomic, µ(A_n(x)) → 0 as n → ∞. Given a partition T_n, define the indicator random variables

    I_{ni} = 1 if N(A_{ni}) > 0,   I_{ni} = 0 otherwise,

and set N_n(A) = ∑_{i=1}^{r_n} I_{ni}. Because the sets A_{ni} are disjoint, the random variables of the set {I_{ni_j}: j = 1, . . . , s} are mutually independent because they are {0, 1}-valued and

    Pr{I_{ni_j} = 0 (j = 1, . . . , s)} = Pr{N(A_{ni_j}) = 0 (j = 1, . . . , s)}
        = Pr{N(⋃_{j=1}^s A_{ni_j}) = 0}
        = exp[−µ(⋃_{j=1}^s A_{ni_j})]
        = ∏_{j=1}^s exp[−µ(A_{ni_j})].

Also, E(z^{I_{ni}}) = 1 − (1 − z)(1 − e^{−µ(A_{ni})}), so N_n(A) has p.g.f.

    E(z^{N_n(A)}) = ∏_i E(z^{I_{ni}}) = ∏_i [1 − (1 − z)(1 − e^{−µ(A_{ni})})].

Because µ is nonatomic, sup_i µ(A_{ni}) ≡ ε_n → 0 as n → ∞ (see Lemma A1.6.II), and thus, using 1 − δ < e^{−δ} < 1 − δ + δ² for all δ sufficiently small, the p.g.f. of N_n(A) converges to exp[−(1 − z)µ(A)] as n → ∞. Since N is simple, for each realization there exists n0 such that, for all n ≥ n0, each of the N(A) points x_j is in a distinct set A_{nj}, say. Then, for n ≥ n0, N_n(A) = N(A). Also, the random variables N_n(A) are monotonically increasing in n and thus have the a.s. limit N(A). It follows that E(z^{N(A)}) = exp[−(1 − z)µ(A)]; i.e. N(A) is Poisson-distributed with mean µ(A) for sets A as in the theorem.
Next, let {A_j} be a finite family of disjoint sets that are unions of rectangles. Repeating the argument above shows that the random variables {N(A_j)} are mutually independent Poisson random variables with means µ(A_j).

Now let A be an open set. Then there is a sequence of families T_n′ of rectangles A_{ni}′ that are disjoint, as for T_n, with union a subset of A and the unions converging monotonically to A. Analysis similar to that just given shows that N(A) is Poisson-distributed with mean µ(A). Similarly, for a finite family of disjoint open sets A_j, the random variables N(A_j) are independent.

Finally, we extend these properties to arbitrary disjoint bounded Borel sets A_j by using generating functionals (see Definition 5.5.I) with functions that equal 1 on open sets contained by A_j, vanish on a closed set containing A_j, and are continuous (and between 0 and 1). Such approximating functions yield generating functions that are those of Poisson variables and that decompose into products of the separate functions (for each distinct A_j), so the N(A_j) are Poisson-distributed and independent.

Theorem 2.3.II is due to Rényi (1967); the proof above is adapted from Kingman (1993). This result includes Theorem 2.3.I as a special case, while in the other direction, it is a corollary of a more general result, proved in Chapter 9 and due to Kurtz, that for a simple point process N, it is enough to know the avoidance probabilities P0(A) on a sufficiently rich class of sets A in order to determine its distribution. In turn, this leads to a characterization of those set functions P0(A) that can be avoidance functions.
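The key limit in the proof — the p.g.f. of N_n(A) converging to exp[−(1 − z)µ(A)] as the dissecting partition refines — can be seen numerically. For an equal-mass partition of a set with µ(A) = 1 (an illustrative choice), the finite product approaches the Poisson p.g.f. as the cells shrink:

```python
import math

def pgf_product(z, total_mass, n_cells):
    """p.g.f. of N_n(A) = sum of cell indicators over an equal-mass partition:
    each cell carries mass mu_i = total_mass / n_cells."""
    mu_i = total_mass / n_cells
    return (1 - (1 - z) * (1 - math.exp(-mu_i))) ** n_cells

z, mu_A = 0.3, 1.0                          # illustrative evaluation point and mass
target = math.exp(-(1 - z) * mu_A)          # exp[-(1 - z) mu(A)], the Poisson p.g.f.
approx = [pgf_product(z, mu_A, 2 ** n) for n in range(1, 12)]
print(round(approx[-1], 6), round(target, 6))
```

The error decreases roughly like sup_i µ(A_{ni}), in line with the 1 − δ < e^{−δ} < 1 − δ + δ² bound used in the proof.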
Exercises and Complements to Section 2.3

2.3.1 (see Theorem 2.3.II). Let N(·) be a point process on R having as its fidi distributions those of a stationary Poisson process of unit rate except for the following eight probabilities relating to the interval (0, 4]:

    p_{0010} = p_{0101} = p_{1011} = p_{1100} = e^{−4} + ε,
    p_{0100} = p_{1010} = p_{1101} = p_{0011} = e^{−4} − ε,

where p_{ijkl} = Pr{N(0, 1] = i, N(1, 2] = j, N(2, 3] = k, N(3, 4] = l}, 0 < ε < e^{−4}, and, conditional on N(a, a + 1] = 1 for a = 0, 1, 2, 3, that point is uniformly distributed over that unit interval. Verify that N(I) is Poisson-distributed for any interval I, but N(·) is not a Poisson process (Lee, 1968).

2.3.2 (a) Raikov's theorem. Let Z be a Poisson r.v. expressible as the sum Z = X + Y of independent nondegenerate, nonnegative r.v.s X and Y. Then X and Y are Poisson r.v.s [see e.g. Loève (1963, Section 19.2) or Moran (1968, p. 408)].
(b) Let N be a Poisson process for which N = N′ + N″ for nontrivial independent point processes N′, N″. Show that each of N′ and N″ is a Poisson process.
2.3.3 (see Theorem 2.3.III). Suppose a stationary orderly point process satisfies (2.3.1). Since orderliness implies that

    Pr{N((0, 1] \ ((k − 1)/n, k/n]) = 0} − Pr{N(0, 1] = 0}
        = Pr{N((0, 1] \ ((k − 1)/n, k/n]) = 0, N((k − 1)/n, k/n] = 1} + o(1/n),

deduce that Pr{N(0, 1] = 1} = lim_{n→∞} n(e^{−λ(1−1/n)} − e^{−λ} − o(1/n)) = λe^{−λ}. Extend this argument to show that Pr{N(0, 1] = j} = λ^j e^{−λ}/j!.

2.3.4 (a) Random thinning. Let N(·) be an orderly inhomogeneous Poisson process on R^d with rate λ(·). Form a new process N′(·) by treating each point of a realization {x_i} independently of all other points; namely (∗) either retain x_i with probability p(x_i) or delete it with probability 1 − p(x_i), where p(·) is a measurable function with 0 ≤ p(x) ≤ 1 for all x. Show that N′(·) is a Poisson process with rate p(x)λ(x).
(b) Random translation. Repeat part (a) but instead of (∗) use (†): translate x_i to x_i + Y_i, where the Y_i are independent identically distributed random variables with distribution function F(·). Show that the resulting point process, N″(·) say, is Poisson with rate ∫_{R^d} λ(x − y) F(dy).
(c) What conditions on λ(·) and p(·) make N′(·) stationary? What conditions make N″(·) stationary?
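The thinning result of Exercise 2.3.4(a) can be checked by simulation: retaining each point with probability p(x) should leave counts that are Poisson with mean ∫ λ p(x) dx, so the empirical mean and variance should agree. A sketch with illustrative choices λ = 3 on (0, T] with T = 2 and p(x) = x/T, so the mean retained count is λT/2 = 3:

```python
import random

def poisson_points(lam, t_max, rng):
    """Rate-lam stationary Poisson process on (0, t_max] via exponential gaps."""
    pts, t = [], 0.0
    while True:
        t += rng.expovariate(lam)
        if t > t_max:
            return pts
        pts.append(t)

def thin(points, p, rng):
    """Independent thinning: keep each point x with probability p(x)."""
    return [x for x in points if rng.random() < p(x)]

rng = random.Random(7)
lam, T = 3.0, 2.0
p = lambda x: x / T                       # illustrative retention probability
counts = [len(thin(poisson_points(lam, T, rng), p, rng)) for _ in range(20000)]
emp_mean = sum(counts) / len(counts)
emp_var = sum((c - emp_mean) ** 2 for c in counts) / len(counts)
print(round(emp_mean, 2), round(emp_var, 2))   # both should be near lam*T/2 = 3.0
```

Equality of mean and variance is of course only a necessary symptom of the Poisson law; the exercise asks for the full distributional statement.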
2.4. The General Poisson Process

We suppose in this section that the point process takes its values in a complete separable metric space (c.s.m.s.) X, thereby anticipating the context of Chapter 9, and without necessarily being stationary, homogeneous, or isotropic. The cases of frequent occurrence are those in which X is two- or three-dimensional Euclidean space (see the exercises), while the setting includes spatial point processes as in Section 5.3 and Chapter 15, for example.

We suppose throughout that N(A), the number of points in the set A, is defined and finite for every bounded set A in the Borel σ-field B(X) ≡ B_X generated by the open spheres of X. We may express this more succinctly by saying that (with probability 1) the trajectories N(·) are boundedly finite [recall Assumption 2.2.I(i)]. The Poisson process can then be defined by assuming that there exists a boundedly finite Borel measure Λ(·) such that for every finite family of disjoint bounded Borel sets {A_i, i = 1, . . . , k},

    Pr{N(A_i) = n_i, i = 1, . . . , k} = ∏_{i=1}^k ([Λ(A_i)]^{n_i} / n_i!) e^{−Λ(A_i)}.    (2.4.1)
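Definition (2.4.1) also suggests a direct way to simulate such a process on a bounded set A: draw N ~ Poisson(Λ(A)) and then, given N = n, place the n points i.i.d. with distribution Λ(·)/Λ(A) (a standard conditional property of the Poisson process; see Section 2.1). A sketch for the illustrative intensity λ(x) = 2x on (0, 1], for which Λ(0, x] = x² and the normalized inverse c.d.f. is √u:

```python
import math
import random

def poisson_rv(mean, rng):
    # Knuth's method for a Poisson variate with the given mean
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def sample_poisson_process(total_mass, inverse_cdf, rng):
    """One realization on (0, 1]: N ~ Poisson(Lambda(0,1]); given N = n,
    the points are i.i.d. with distribution Lambda / Lambda(0,1]."""
    n = poisson_rv(total_mass, rng)
    return sorted(inverse_cdf(rng.random()) for _ in range(n))

rng = random.Random(3)
realizations = [sample_poisson_process(1.0, math.sqrt, rng) for _ in range(20000)]
# Check: N(0, 1/2] should be Poisson with mean Lambda(0, 1/2] = 1/4
emp = sum(sum(1 for x in r if x <= 0.5) for r in realizations) / 20000
print(round(emp, 3))
```

The same recipe works for any boundedly finite Λ on a bounded set, provided one can sample from the normalized measure.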
The measure Λ(·) is called the parameter measure of the process. Note that when X is the real line, (2.4.1) includes as special cases the two examples given in Section 2.1: for the homogeneous process Λ(A) = λℓ(A), and for the inhomogeneous process, Λ(A) = ∫_A λ(x) dx. Equation (2.4.1) embraces a
nontrivial increase in generality because, in general, the parameter measure may have both a discrete (or atomic) component and a continuous singular component.

In this general setting, we first clarify the role of the discrete component of Λ(·). Suppose, in particular, that Λ(·) has an atom of mass λ0 at the point x0. Since the single-point set {x0} is a Borel set, it follows at once from (2.4.1) that N{x0} ≡ N({x0}) must have a Poisson distribution with parameter λ0. We say that any point x0 with the property Pr{N{x0} > 0} > 0 is a fixed atom of the process. Thus, we conclude that every atom of Λ(·) is a fixed atom of N(·). Conversely, if x0 is a fixed atom of N(·), then N{x0} must have a Poisson distribution with nonzero parameter λ0, say. From this, it follows that x0 is an atom of Λ(·) with mass λ0. Hence, the following is true.

Lemma 2.4.I. The point x0 is an atom of the parameter measure Λ if and only if it is a fixed atom of the process.

Note that whether a given point x0 represents a fixed atom of the process is not discernible from a single realization: any point of the process is an atom of its particular realization. For x0 to constitute a fixed atom, there must be positive probability of it recurring over a whole family of realizations. Thus, the fixed atoms relate to the probability structure of the process, not to the structure of individual realizations.

In the Poisson case, the fixed atoms are also the key to the question of orderliness. The definition given earlier in (2.2.6) is most naturally extended to the present context by requiring

    Pr{N(S_ε(x)) > 1} = o(Pr{N(S_ε(x)) > 0})    (ε → 0)    (2.4.2)

for each x ∈ X, where S_ε(x) denotes the open sphere with radius ε and centre x. In the case of a Poisson process, N(S_ε(x)) has a Poisson distribution, with parameter Λ(S_ε(x)) = Λ_ε, say, so that

    Pr{N(S_ε(x)) > 0} = 1 − e^{−Λ_ε},
    Pr{N(S_ε(x)) > 1} = 1 − e^{−Λ_ε} − Λ_ε e^{−Λ_ε}.

Now if x is a fixed atom of Λ, Λ_ε → Λ0 = Λ{x} > 0 as ε ↓ 0, whereas if x is not a fixed atom, Λ_ε → 0. In the first case, the ratio Pr{N(S_ε(x)) > 1}/Pr{N(S_ε(x)) > 0} tends to the positive constant 1 − Λ0/(e^{Λ0} − 1), whereas in the second case it tends to zero. Thus, the process is orderly, in the sense of (2.4.2), if and only if Λ(·) has no atoms.

Theorem 2.4.II. The Poisson process defined by (2.4.1) is orderly if and only if it has no fixed atoms; equivalently, if and only if the parameter measure has no discrete component.

When X is the real line, the distribution function F_Λ(x) ≡ Λ(0, x] is continuous if and only if Λ has no discrete component, so in this case Λ itself could
be called continuous. One should beware of claiming any such conclusions for more general X, however, for even though Λ(·) may have no atoms, it may well have concentrations on lines, surfaces, or other lower-dimensional subsets that may cause an associated distribution function to be discontinuous. In such situations, in contrast to the case of a homogeneous Poisson process, there will be some positive probability of points of the process appearing on such lines, surfaces, and so on.

We turn next to the slightly more difficult problem of extending the characterizations based on the complete independence property stated below.

Assumption 2.4.III. For each finite family of bounded, disjoint Borel sets {A_i, i = 1, . . . , k}, the random variables N(A_1), . . . , N(A_k) are mutually independent.

The most important result is contained in the following lemma.

Lemma 2.4.IV. Suppose (i) N is boundedly finite a.s. and has no fixed atoms, and (ii) N has the complete independence property of Assumption 2.4.III. Then, there exists a boundedly finite nonatomic Borel measure Λ(·) such that

    P0(A) = Pr{N(A) = 0} = e^{−Λ(A)}    (all bounded Borel sets A).

Proof. Set Q(A) = −log P0(A), observing immediately that Q(A) ≥ 0 and that by (ii) it is finitely additive. Countable additivity is equivalent to having Q(A_n) → 0 for any decreasing sequence {A_n} of bounded Borel sets for which Q(A_n) < ∞ and A_n ↓ ∅. For A_n ↓ ∅, we must have N(A_n) → 0 a.s., and thus e^{−Q(A_n)} = P0(A_n) = Pr{N(A_n) = 0} → 1, establishing Q(A_n) → 0 as required. To show that Q(·) is nonatomic, observe that, by (i), 0 = Pr{N{x} > 0} = 1 − e^{−Q({x})}, so that Q({x}) = 0 for every x.

It remains to show that Q(·) is boundedly finite, which is equivalent to P0(A) > 0 for any bounded Borel set A. Suppose the contrary for some set A, which without loss of generality we may assume to be closed, for if not, 0 ≤ P0(Ā) ≤ P0(A) = 0, whence P0(Ā) = 0. Since X is separable, A can be covered by a countable number of disjoint Borel sets A_n, each with diameter less than 1, so A = ⋃_{n=1}^∞ A_n. Let p_n = Pr{N(A_n) > 0}, so that N(A) = 0 only if N(A_n) = 0 for all n, and thus 0 = P0(A) = ∏_{n=1}^∞ (1 − p_n). This infinite product vanishes only if p_n = 1 for some n, or else ∑_{n=1}^∞ p_n diverges. In the latter event, the Borel–Cantelli lemma implies that a.s. infinitely many N(A_n) are nonzero, and hence N(A) = ∞ a.s., contradicting the assumption that N(·) is boundedly finite. Consequently, we must have p_n = 1 for some set A_n, A^{(1)} say, and A^{(1)} has diameter less than 1 and as with A may be assumed to be closed. By repeating the argument, we can find a closed set A^{(2)} with diameter less than 2^{−1} such that P0(A^{(2)}) = 0. Proceeding by induction, a
sequence {A^{(n)}} of nested closed sets is constructed with diameters → 0 and P0(A^{(n)}) = 0 (all n). Choose x_n ∈ A^{(n)}, so that {x_n} is a Cauchy sequence, x_n → x_0 say, and, each A^{(n)} being closed, x_0 ∈ A^{(n)}, and therefore A^{(n)} ↓ {x_0}. Then N(A^{(n)}) ↓ N({x_0}), and by monotone convergence, P0({x_0}) = lim_{n→∞} P0(A^{(n)}) = 0. Equivalently, Pr{N{x_0} > 0} = 1, so that x_0 is a fixed atom of the process, contradicting (i).

Now suppose that the process is orderly in addition to satisfying the conditions of Lemma 2.4.IV. Then, it follows from Theorem 2.3.II that we have a Poisson process without fixed atoms. Thus, the following theorem, due to Prékopa (1957a, b), is true.

Theorem 2.4.V. Let N(·) be a.s. boundedly finite and without fixed atoms. Then N(·) is a Poisson process if and only if (i) it is orderly, and (ii) it has the complete independence property of Assumption 2.4.III.

To extend this result to the nonorderly case, consider for fixed real z in 0 ≤ z ≤ 1 the set function

    Q_z(A) ≡ −log E(z^{N(A)}) ≡ −log P_z(A)

defined over the Borel sets A. It follows immediately that 0 ≤ Q_z(A) < Q(A), and using also the argument of Lemma 2.4.IV, it follows that Q_z(·) is a measure, absolutely continuous with respect to Q(·). Consequently, there exists a density, q_z(x) say, such that

    Q_z(A) = ∫_A q_z(x) Q(dx),    (2.4.3)

and for Q-almost-all x,

    q_z(x) = lim_{ε↓0} Q_z(S_ε(x)) / Q(S_ε(x)),

where S_ε(x) is as in (2.4.2); see also e.g. Lemma A1.6.III for this property of Radon–Nikodym derivatives. If we continue to assume that the process has no fixed atoms, Q(S_ε(x)) and hence also Q_z(S_ε(x)) both → 0 as ε → 0, for then S_ε(x) → {x}. We can then imitate the argument leading to Theorem 2.2.II and write for Q-almost-all x

    Π_z(x) = 1 − q_z(x) = lim_{ε↓0} (P_z(S_ε(x)) − P0(S_ε(x))) / (1 − P0(S_ε(x))).    (2.4.4)

Now, for fixed A, Q_z(A) is monotonically decreasing in z for 0 ≤ z ≤ 1, so by taking a countably dense set of z values in [0, 1], (2.4.4) holds for such z except possibly on a Q-null set formed by the union of the Q-null sets where it may fail for the separate values of z.
For each ε, (2.4.4) is the p.g.f. of the conditional distribution Pr{N(S_ε(x)) = k | N(S_ε(x)) > 0}. Now a sequence of p.g.f.s converging on a countably dense set of z values in [0, 1) converges for all 0 ≤ z < 1, with the limit being a p.g.f. of a possibly dishonest distribution. In the present case, the limit is in fact Q-a.e. honest because by monotone convergence and (2.4.3),

    0 = log P_1(A) = lim_{z↑1} Q_z(A) = lim_{z↑1} ∫_A q_z(x) Q(dx),

implying that lim_{z→1} q_z(x) = 0 Q-a.e. Consequently, except for a Q-null set, (2.4.4) holds for all 0 ≤ z ≤ 1, and for the limit q_z(x), 1 − q_z(x) is the p.g.f. of a proper distribution, {π_k(x)} say, for which

    π_0(x) = 0,    Π_z(x) = ∑_{k=1}^∞ π_k(x) z^k,

and

    P_z(A) = exp( −∫_A [1 − Π_z(x)] Q(dx) ).    (2.4.5)

There is the alternative form for (2.4.5),

    P_z(A) = exp( −Q(A)[1 − Π_z(A)] ),

in which there appears the p.g.f. Π_z(A) of the 'averaged' probabilities

    π_k(A) = (1/Q(A)) ∫_A π_k(x) Q(dx).

Thus, the distributions in this process still have the compound Poisson form.

Finally, suppose we reinstate the fixed atoms of the process. Note that these are also atoms of Q(·) and can therefore be at most countable in number, and also that the number of points of the process at each fixed atom must be a discrete random variable independent of the rest of the process. We thus arrive at the following structure theorem for the general point process satisfying the complete independence property.

Theorem 2.4.VI. Let N(·) be a point process that has the complete independence property of Assumption 2.4.III. Then N(·) can be written in the form of a superposition N = N_1 + N_2, where N_1 and N_2 are independent and
(i) N_1 consists of a finite or countable family of fixed atoms, {x_1, x_2, . . .}, where for each i, N_1{x_i} has a proper, discrete distribution and is independent of the rest of the process; and
(ii) N_2 is a process without fixed atoms, which can be represented in the compound Poisson form (2.4.5), where Q(·) is a fixed, boundedly finite, nonatomic measure, and for Q-almost-all x, Π_z(x) is the p.g.f. of a proper discrete distribution, satisfying Π_0(x) = 0.
We remark that, analogously to the situation described by Theorem 2.2.II, the realizations of N2 consist a.s. of random batches of points, where the number of batches is governed by a Poisson process with parameter measure Q(·) and, conditional on a batch occurring at x, its probability distribution is given by {πk (x)}. These sample-path results can be established directly for this special case, but we prefer to treat them as special cases of the theorems established in Chapter 3.
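The computation below Theorem 2.4.II is purely a property of the Poisson distribution and can be verified numerically: the ratio Pr{N > 1}/Pr{N > 0} tends to 0 as the parameter Λ_ε → 0 (orderliness), and to 1 − Λ0/(e^{Λ0} − 1) when Λ_ε → Λ0 > 0 at a fixed atom. The atom mass 0.7 below is an illustrative value:

```python
import math

def ratio(mass):
    """Pr{N > 1} / Pr{N > 0} for a Poisson count N with the given mean."""
    p_pos = 1 - math.exp(-mass)              # Pr{N > 0}
    p_multi = p_pos - mass * math.exp(-mass) # Pr{N > 1}
    return p_multi / p_pos

# No atom: Lambda(S_eps(x)) -> 0, and the ratio vanishes
small = [ratio(10.0 ** -k) for k in range(1, 8)]
# Fixed atom of mass 0.7: the ratio equals 1 - Lambda_0/(e^{Lambda_0} - 1)
atom = 0.7
limit = 1 - atom / (math.exp(atom) - 1)
print(round(small[-1], 8), round(ratio(atom), 6), round(limit, 6))
```

For small masses the ratio behaves like Λ_ε/2, which is the quantitative content of the o(·) statement in (2.4.2).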
Exercises and Complements to Section 2.4

2.4.1 Let N_1, N_2 be independent Poisson processes with parameter measures Λ_1, Λ_2. Show that N_1 + N_2 is a Poisson process with parameter measure Λ_1 + Λ_2.

2.4.2 Poisson process on the surface of a sphere. There is an area-preserving map of the surface of a sphere of radius r onto the curved surface of a cylinder of radius r and height 2r. Conclude that a homogeneous Poisson process on the surface of such a sphere can be represented as a Poisson process on a rectangle with side-lengths 2r and 2πr. How may a homogeneous Poisson process on the surface of an oblate or prolate elliptical spheroid be constructed? [Hint: An oblate spheroid is the solid of revolution obtained by rotating an ellipse with major and minor axes of lengths 2a and 2b, respectively, about its minor axis, so it has the same surface area as the curved surface of a cylinder of radius a and height

    2 ∫_0^{π/2} cos θ √(a² sin²θ + b² cos²θ) dθ.

For a prolate spheroid, use a cylinder of radius b and height

    2 ∫_0^{π/2} sin θ √(a² sin²θ + b² cos²θ) dθ.]
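The area-preserving map in Exercise 2.4.2 (project each point of the sphere horizontally onto the enclosing cylinder) gives a simple sampler: take (z, φ) uniform on the rectangle [−r, r] × (0, 2π] and lift back to the sphere. A sketch, with a spherical-cap check (the rate and radius are illustrative):

```python
import math
import random

def poisson_rv(mean, rng):
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def poisson_on_sphere(rate, r, rng):
    """Homogeneous Poisson process on a sphere of radius r via the
    area-preserving cylinder map: (z, phi) -> point at height z, longitude phi."""
    n = poisson_rv(rate * 4 * math.pi * r * r, rng)  # mean = rate * surface area
    pts = []
    for _ in range(n):
        z = rng.uniform(-r, r)
        phi = rng.uniform(0.0, 2 * math.pi)
        rho = math.sqrt(max(r * r - z * z, 0.0))     # radius of the latitude circle
        pts.append((rho * math.cos(phi), rho * math.sin(phi), z))
    return pts

rng = random.Random(11)
pts = [p for _ in range(2000) for p in poisson_on_sphere(1.0, 1.0, rng)]
# The cap z > r/2 has area 2*pi*r*(r/2), i.e. one quarter of the sphere
frac = sum(p[2] > 0.5 for p in pts) / len(pts)
print(round(frac, 3))
```

That the cap fraction comes out as the area ratio is exactly Archimedes' hat-box theorem, which underlies the exercise.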
2.4.3 Poisson process on a lattice. A homogeneous Poisson process with density λ on a given (countably infinite) lattice of points, {z_i} say, is a sequence of i.i.d. Poisson r.v.s, {N_i} say, with common mean λ. A homogeneous binary process on such a lattice is a sequence, {Y_i} say, of i.i.d. {0, 1}-valued r.v.s for which Pr{Y_i = 1} = p for some p ∈ (0, 1). It is only approximately Poisson, and then only for small p.

2.4.4 Define a homogeneous Poisson process on a cylinder of unit radius as a Poisson process of points {(x_i, θ_i)} on the doubly infinite strip R × (0, 2π] at rate λ dx dθ. Such a point process can also be interpreted as a Poisson process of directed lines in the plane since any such line is specified by its orientation relative to a given direction and its distance from the origin (negative if the origin is to the left of the line rather than the right).
(a) In this line-process interpretation, check that the largest circle that can be drawn around a randomly chosen point in the plane without intersecting a line has radius R with distribution

    Pr{R > y} = Pr{strip of width 2y has no point (x_i, θ_i)} = exp(−2πλy).

(b) Show that the expected number of intersections lying within the circle S_R(0) between the line (x, 0) and lines of the process, where 0 < x < R, equals 4 ∫_x^R 2λ arsin(y/R) dy. Deduce that the expected number of intersections between any two lines of the process and lying in a circle of radius R equals

    2π ∫_0^R 2λ dx ∫_x^R 8λ arsin(y/R) dy = (2λπR)².
Observe that such a point process (from line intersections) cannot be Poisson because with probability 1, given any two points, there are infinitely many other points collinear with the two given points.
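The final count in Exercise 2.4.4(b) can be estimated by Monte Carlo: only lines with |x_i| ≤ R can meet the disk of radius R, so each realization needs a Poisson(4πλR) number of lines, and the meeting point of two lines p · (cos θ, sin θ) = x is found by solving a 2 × 2 linear system. A sketch (the rate and radius are illustrative):

```python
import math
import random

def poisson_rv(mean, rng):
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def intersections_in_disk(lam, R, rng):
    """Pairwise intersections, inside the disk of radius R, of the directed
    lines (x_i, theta_i) with |x_i| <= R, at rate lam dx dtheta on R x (0, 2*pi]."""
    n = poisson_rv(lam * 2 * R * 2 * math.pi, rng)
    lines = [(rng.uniform(-R, R), rng.uniform(0.0, 2 * math.pi)) for _ in range(n)]
    count = 0
    for i in range(n):
        x1, t1 = lines[i]
        for j in range(i + 1, n):
            x2, t2 = lines[j]
            d = math.sin(t2 - t1)         # determinant of the 2x2 system
            if abs(d) < 1e-12:
                continue                  # (almost surely never) parallel lines
            px = (x1 * math.sin(t2) - x2 * math.sin(t1)) / d
            py = (x2 * math.cos(t1) - x1 * math.cos(t2)) / d
            count += px * px + py * py <= R * R
    return count

rng = random.Random(5)
lam, R = 0.2, 1.0
runs = 20000
mean_count = sum(intersections_in_disk(lam, R, rng) for _ in range(runs)) / runs
print(round(mean_count, 2), round((2 * lam * math.pi * R) ** 2, 2))
```

The empirical mean should approach (2λπR)², the value stated in the exercise.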
2.4.5 Poisson process in Hilbert space.
(i) Find an example of a Hilbert-space-valued random variable that does not have its distribution concentrated on a finite-dimensional subspace. [Hint: Consider a series of the form Y = ∑_k a_k U_k e_k, where the a_k form a scalar series, the U_k are i.i.d., and e_k is the unit vector in the kth dimension. Other examples follow from the Hilbert-space Gaussian measures discussed in Chapter 9.] By combining copies of this probability measure suitably, build up examples of σ-finite measures.
(ii) Using the measures above, construct examples of well-defined Poisson processes on a Hilbert space. Discuss the nature of the realizations in increasing sequences of spheres or cubes.
(iii) Show that if a σ-finite measure is invariant under Hilbert-space translations, then it cannot be boundedly finite. Hence, show that no Poisson process can exist that is invariant under the full set of Hilbert-space translations.
CHAPTER 3
Simple Results for Stationary Point Processes on the Line
The object of this chapter is to give an account of some of the distinctive aspects of stationary point processes on the line without falling back on the measure-theoretic foundations that are given in Chapter 9. Some aspects that are intuitively reasonable and that can in fact be given a rigorous basis are taken at face value in order that the basic ideas may be exposed without the burden of too much mathematical detail. Thus, the results presented in this chapter may be regarded as being made logically complete when combined with the results of Chapter 9. Ideas introduced here concerning second-order properties are treated at greater length in Chapters 8 and 12, and Palm theory in Chapter 13.
3.1. Specification of a Point Process on the Line
A point process on the line may be taken as modelling the occurrences of some phenomenon at the time epochs {t_i} with i in some suitable index set. For such a process, there are four equivalent descriptions of the sample paths:
(i) counting measures;
(ii) nondecreasing integer-valued step functions;
(iii) sequences of points; and
(iv) sequences of intervals.
In describing a point process as a counting measure, it does not matter that the process is on the real line. However, for the three other methods of describing the process, the order properties of the reals are used in an essential way. While the methods of description may be capable of extension into higher dimensions, they become less natural and, in the case of (iv), decidedly artificial.
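For a finite stretch of sample path, the passage between these four descriptions is purely mechanical. The following Python sketch (an illustration added here, not part of the text, with an arbitrary realization) encodes one fixed realization in each of the four forms and checks the relation t_i ≤ t if and only if N(t) ≥ i of (3.1.9) below.

```python
# Illustrative sketch (not from the text): one finite realization of a point
# process on (0, infinity) in each of the four equivalent descriptions.
from bisect import bisect_right

t = [0.7, 1.3, 2.9, 3.1]                  # (iii) sequence of points t_1 < t_2 < ...

def N_set(a, b):                          # (i) counting measure, N((a, b])
    return bisect_right(t, b) - bisect_right(t, a)

def N_step(x):                            # (ii) step function, N(t) = N((0, t])
    return bisect_right(t, x)

tau = [t[0]] + [b - a for a, b in zip(t, t[1:])]   # (iv) intervals, with t_0 = 0

assert N_set(1.0, 3.0) == 2               # points 1.3 and 2.9 lie in (1, 3]
assert all((t[i - 1] <= 2.0) == (N_step(2.0) >= i) for i in range(1, 5))  # (3.1.9)
assert abs(sum(tau) - t[-1]) < 1e-12      # partial sums of intervals recover points
```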
In Chapters 1 and 2, we mostly used the intuitive notion of a point process as a counting measure. To make this notion precise, take any subset A of the real line and let N(A) denote the number of occurrences of the process in the set A; i.e.
    N(A) = number of indices i for which t_i lies in A = #{i: t_i ∈ A}.    (3.1.1)
When A is expressed as the union of the disjoint sets A_1, . . . , A_r, say, that is,
    A = ∪_{i=1}^r A_i    where A_i ∩ A_j = ∅ for i ≠ j,
it is a consequence of (3.1.1) that
    N(∪_{i=1}^r A_i) = Σ_{i=1}^r N(A_i)    for mutually disjoint A_1, . . . , A_r.    (3.1.2)
It also follows from (3.1.1) that
    N(A) is nonnegative integer- (possibly ∞-) valued.    (3.1.3)
In order that we may operate conveniently on N(A) for different sets A—in particular, in order that the probability of events specified in terms of N(A) may be well defined—we must impose a restriction on the sets A that we are prepared to consider. Since we want to include intervals and unions thereof, the usual constraint is that
    N(A) is defined for all Borel subsets A of the real line.    (3.1.4)
Finally, in order to exclude the possibility of ‘too many’ points occurring ‘too close’ together, we insist that, for the point processes we consider,
    N(A) is finite for bounded sets A.    (3.1.5)
The assumptions in (3.1.2–5), with (3.1.2) extended to allow r = ∞, are precisely those that make N(·) a counting measure on the σ-field B_ℝ of all Borel subsets of the real line ℝ. The constraint in (3.1.3) that N(·) be integer-valued distinguishes it as a counting measure from other more general nonnegative measures. To be consistent with N(·) being a set function, we ought to write, for example, N((a, b]) when A is the half-open interval (a, b]; our preference for the less cumbersome abbreviation N(a, b] should lead to no confusion. We have already used in Chapters 1 and 2 the further contraction
    N(t) = N(0, t] = N((0, t])    (0 < t ≤ ∞);    (3.1.6)
the difference in argument should suffice to distinguish the real function N(t) (t > 0) from the set function N(A). This function N(t) is nondecreasing, right-continuous, and integer-valued, and hence a step function. For point processes on the positive half-line, knowledge of N(t) for all t ≥ 0 suffices to determine N(A) for Borel sets A ⊂ (0, ∞) in precisely the same manner as a distribution function determines a probability measure on Borel sets. When the point process is defined on the whole line, we extend the definition (3.1.6) to
    N(t) = N((0, t])     (t > 0),
         = 0             (t = 0),    (3.1.7)
         = −N((t, 0])    (t < 0).
In this way, N(t) retains the properties of being a right-continuous integer-valued function on the whole line. Moreover, N(t) determines N(A) for all Borel sets A and hence describes the point process via a step function. Thus, instead of starting with N(A) (all A ∈ B), we could just as well have specified the sample path as a right-continuous function N(t) (−∞ < t < ∞) that is nonnegative and integer-valued for t > 0, nonpositive and integer-valued for t < 0, and has N(0) = 0.
The simplest case of the third method listed above occurs where the process is defined on the half-line t > 0. Setting
    t_i = inf{t > 0: N(t) ≥ i}    (i = 1, 2, . . .),    (3.1.8)
it follows that for i = 1, 2, . . ., we have the seemingly obvious but most important relation
    t_i ≤ t if and only if N(t) ≥ i.    (3.1.9)
This relation makes it clear that specifying the sequence of points {t_i} is equivalent to specifying the function N(t) in the case where N(−∞, 0] = 0. It should be noted that the set of points {t_i} in (3.1.8) is in increasing order; such a restriction is not necessarily implied in talking of a set of time epochs {t_i} as at the beginning of the present section. If the point process has points on the whole line and not just the positive axis, the simplest extension consistent with (3.1.8) is obtained by defining
    t_i = inf{t: N(t) ≥ i} = inf{t > 0: N(0, t] ≥ i}        (i = 1, 2, . . .),
                           = −inf{t > 0: N(−t, 0] ≥ −i + 1}  (i = 0, −1, . . .).    (3.1.10)
Such a doubly infinite sequence of points has the properties that
    t_i ≤ t_{i+1} (all i)    and    t_0 ≤ 0 < t_1.    (3.1.11)
Finally, by setting
    τ_i = t_i − t_{i−1}    with {t_i} as in (3.1.10)    (3.1.12)
[or else, in the case of only a half-line as in (3.1.8), with the added conventions that t_0 = 0 and τ_i is defined only for i = 1, 2, . . .], the process is fully described by the sequence of intervals {τ_i} and one of the points {t_i}, usually t_0. Observe that τ_i ≥ 0 and that if N(t) → ∞ as t → ∞, then Σ_{i=1}^n τ_i → ∞ as n → ∞, while if N(t) does not → ∞ as t → ∞, then τ_i is not defined for i > lim_{t→∞} N(t). We now make the intuitively plausible assumption that there exists a probability space on which the functions N(A), N(t), t_i, τ_i are well-defined random variables and furthermore that we can impose various constraints on these random variables in a manner consistent with that assumption. The question of the existence of such a probability space is discussed in Chapter 9.
Exercises and Complements to Section 3.1
3.1.1 Suppose that the r.v.s {t_i} in (3.1.8) are such that Pr{t_{i+1} > t_i} = 1, and define G_i(x) = Pr{t_i ≤ x}.
(a) Show that lim_{x→0} G_i(x) = 0 for all integers i > 0.
(b) Show that the assumption in (3.1.5) of N(·) being boundedly finite implies that, for all real x > 0, lim_{i→∞} G_i(x) = 0.
3.1.2 (Continuation). Show that for x > 0, M(x) ≡ E N(x) = Σ_{i=1}^∞ G_i(x) and, more generally, that
    E([N(x)]^r) = Σ_{i=1}^∞ (i^r − (i − 1)^r) G_i(x) = Σ_{i=1}^∞ i^r (G_i(x) − G_{i+1}(x))
in the sense that either both sides are infinite or, if one is finite, so is the other and the two sides are equal.
3.1.3 (Continuation). Show that for |z| ≤ 1 and x > 0,
    P(x; z) ≡ E z^{N(x)} = 1 + (z − 1) Σ_{i=0}^∞ G_{i+1}(x) z^i.
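The identity of Exercise 3.1.3 can be checked numerically in a special case. The sketch below is an added illustration, not part of the text; it assumes a Poisson process of rate λ, for which G_i(x) = Pr{t_i ≤ x} = Pr{N(x) ≥ i} has the familiar Poisson tail form, and compares the truncated series with the known p.g.f. E z^{N(x)} = exp(λx(z − 1)).

```python
# Numerical check of Exercise 3.1.3 for a Poisson process of rate lam, where
# G_i(x) = Pr{t_i <= x} = Pr{N(x) >= i} (Poisson special case assumed; this
# is an illustration, not part of the text).
import math

lam, x, z = 2.0, 1.5, 0.4

def G(i):
    # Pr{N(0, x] >= i} for a Poisson(lam * x) distributed count
    return 1.0 - sum(math.exp(-lam * x) * (lam * x)**j / math.factorial(j)
                     for j in range(i))

# truncate the series at 60 terms; with |z| < 1 the tail is negligible
rhs = 1.0 + (z - 1.0) * sum(G(i + 1) * z**i for i in range(60))
lhs = math.exp(lam * x * (z - 1.0))       # the known Poisson p.g.f. E z^N(x)
assert abs(lhs - rhs) < 1e-10
```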
3.2. Stationarity: Definitions The notion of stationarity of a point process at first sight appears to be a simple matter: at the very least, it means that the distribution of the number of points lying in an interval depends on its length but not its location; that is, pk (x) ≡ Pr{N (t, t + x] = k} (x > 0, k = 0, 1, . . .) depends on the length x but not the location t. Lawrance (1970) called this property simple stationarity, while we follow Chung (1972) in calling it crude stationarity. It is in fact weaker than the full force of the definition below (see Exercise 3.2.1).
Definition 3.2.I. A point process is stationary when for every r = 1, 2, . . . and all bounded Borel subsets A_1, . . . , A_r of the real line, the joint distribution of {N(A_1 + t), . . . , N(A_r + t)} does not depend on t (−∞ < t < ∞). In the case where the point process is defined only on the positive half-line, the sets A_i must be Borel subsets of (0, ∞) and we require t > 0. There is also the intuitive feeling that the intervals {τ_i} should be stationary, and accordingly we introduce the following definition.
Definition 3.2.II. A point process is interval stationary when for every r = 1, 2, . . . and all integers i_1, . . . , i_r, the joint distribution of {τ_{i_1+k}, . . . , τ_{i_r+k}} does not depend on k (k = 0, ±1, . . .).
Note that this definition makes no reference to the point t_0 required to complete the specification of a sample path as below (3.1.12). It is most natural to take t_0 = 0 [see (3.1.11)]. Such processes may then be regarded as a generalization of renewal processes in that the intervals between occurrences, instead of being mutually independent and identically distributed, constitute merely a stationary sequence. The relation that exists between the probability distributions for interval stationarity on the one hand and stationarity on the other is taken up in Section 3.4 and elsewhere, notably Chapter 13, under its usual heading of Palm–Khinchin theory.
Some authors speak of arbitrary times and arbitrary points in connection with point processes. A probability distribution with respect to an arbitrary time epoch of a stationary point process is one that is stationary as under Definition 3.2.I; a probability distribution with respect to an arbitrary point of a point process is one determined by the interval stationary distributions as under Definition 3.2.II. The importance of maintaining a distinction between interval stationarity and ordinary stationarity is underlined by the waiting-time paradox.
If in some town buses run exactly on schedule every ∆ minutes and a stranger arrives at a random time to wait for the next bus, then his expected waiting time EW is ½∆ minutes. If, on the other hand, buses run haphazardly according to a Poisson process with an average time ∆ between buses, then the expected waiting time of the same stranger is ∆. The core of the so-called paradox lies in the use of ∆ as an average interval length from the arrival of one bus to the next, and the waiting time EW being half the mean interval between bus arrivals when the probabilities of different intervals being chosen are proportional to their lengths. In renewal theory, the resolution of the paradox is known as length-biased sampling [see Feller (1966, Section I.4), Exercise 1.2.5 above, and (3.4.17) below].
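A small Monte Carlo experiment reproduces both halves of the paradox. The sketch below is an added illustration with arbitrary parameter choices, not part of the text; the Poisson case exploits the lack-of-memory property, under which the forward waiting time is again exponential with mean ∆.

```python
# Monte Carlo sketch of the waiting-time paradox: exactly scheduled buses
# every Delta minutes give mean wait Delta/2, while Poisson buses with mean
# spacing Delta give mean wait Delta.
import random

random.seed(1)
Delta, n = 10.0, 100_000

# Scheduled buses: the stranger arrives uniformly within a cycle of length Delta.
wait_sched = sum(Delta - random.uniform(0, Delta) for _ in range(n)) / n

# Poisson buses: by lack of memory, the forward wait is Exp with mean Delta.
wait_pois = sum(random.expovariate(1.0 / Delta) for _ in range(n)) / n

assert abs(wait_sched - Delta / 2) < 0.1   # close to 5 minutes
assert abs(wait_pois - Delta) < 0.3        # close to 10 minutes
```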
Exercises and Complements to Section 3.2 3.2.1 (a) Construct an example of a crudely stationary point process that is not stationary (for one example, see Exercise 2.3.1). (b) Let N (·) be crudely stationary. Is it necessarily true that Pr{N ({t}) ≥ 2 for some t in (−1, 0]} = Pr{N ({t}) ≥ 2 for some t in (0, 1]} ? [See the proof of Proposition 3.3.VI, where equality is shown to hold when the probabilities equal zero.]
3.3. Mean Density, Intensity, and Batch-Size Distribution
A natural way of measuring the average density of points of a point process is via its mean, or in the case of a stationary point process, its mean density, which we define as
    m = E(N(0, 1]).    (3.3.1)
Defining the function
    M(x) = E(N(0, x]),    (3.3.2)
it is a consequence of the additivity properties of N(·) as in (3.1.2) and of expectations of sums, and of the crude stationarity property in (3.2.1), that for x, y ≥ 0,
    M(x + y) = E N(0, x + y] = E(N(0, x] + N(x, x + y])
             = E N(0, x] + E N(x, x + y] = E N(0, x] + E N(0, y] = M(x) + M(y).
In other words, M(·) is a nonnegative function satisfying Cauchy’s functional equation
    M(x + y) = M(x) + M(y)    (0 ≤ x, y < ∞).
Consequently, by Lemma 3.6.III,
    M(x) = M(1)x = mx    (0 ≤ x < ∞),    (3.3.3)
irrespective of whether M(x) is finite or infinite for finite x > 0.
There is another natural way of measuring the rate of occurrence of points of a stationary point process, due originally to Khinchin (1955).
Proposition 3.3.I (Khinchin’s Existence Theorem). For a stationary (or even crudely stationary) point process, the limit
    λ = lim_{h↓0} Pr{N(0, h] > 0} / h    (3.3.4)
exists, though it may be infinite.
Proof. Introduce the function
    φ(x) = Pr{N(0, x] > 0}.    (3.3.5)
Then φ(x) ↓ 0 as x ↓ 0, and φ(·) is subadditive on (0, ∞) because for x, y > 0,
    φ(x + y) = Pr{N(0, x + y] > 0}
             = Pr{N(0, x] > 0} + Pr{N(0, x] = 0, N(x, x + y] > 0}
             ≤ Pr{N(0, x] > 0} + Pr{N(x, x + y] > 0} = φ(x) + φ(y).
The assertion of the proposition now follows from the subadditive function Lemma 3.6.I.
The parameter λ is called the intensity of the point process, for when it is finite, it makes sense to rewrite (3.3.4) as
    Pr{N(x, x + h] > 0} = Pr{there is at least one point in (x, x + h]} = λh + o(h)    (h ↓ 0).    (3.3.6)
Examples of a point process with λ = ∞ are given in Exercises 3.3.2–3.
These two measures of the ‘rate’ of a stationary point process coincide when the point process has the following property.
Definition 3.3.II. A point process is simple when
    Pr{N({t}) = 0 or 1 for all t} = 1.    (3.3.7)
Daley (1974) called this sample-path property almost sure orderliness to contrast it with the following analytic property due to Khinchin (1955).
Definition 3.3.III. A crudely stationary point process is orderly when
    Pr{N(0, h] ≥ 2} = o(h)    (h ↓ 0).    (3.3.8)
Notice that stationarity plays no role in the definition of a simple point process, nor does it matter whether the point process is defined on the real line or even a Euclidean space. While orderliness can be defined for point processes that either are nonstationary or are on some space different from the real line, the defining equation (3.3.8) must then be suitably amended [see Exercise 3.3.1, Chapter 9, and Daley (1974) for further discussion and references]. It is a consequence of Korolyuk’s theorem and Dobrushin’s lemma, given below, that for stationary point processes with finite intensity, Definitions 3.3.II and 3.3.III coincide. Proposition 3.3.IV (Korolyuk’s Theorem). For a crudely stationary simple point process, λ = m, finite or infinite.
Remark. In Khinchin’s (1955, Section 11) original statement of this proposition, the point process was assumed to be orderly rather than simple. In view of the possible generalizations of the result to nonstationary point processes and to processes on spaces other than the real line where any definition of orderliness may be more cumbersome, it seems sensible to follow Leadbetter (1972) in connecting the present result with Korolyuk’s name.
Proof. We use a sequence of nested intervals that in fact constitute a dissecting system (see Section A1.6 and the proof of Theorem 2.3.II). For any positive integer n and i = 1, . . . , n, define indicator random variables
    I_{ni} = 1 or 0 according as N((i − 1)/n, i/n] > 0 or = 0.    (3.3.10)
Then, as n → ∞ through the integers 2^p, p = 1, 2, . . . ,
    Σ_{i=1}^n I_{ni} ↑ N(0, 1]    (3.3.11)
for those realizations N(·) for which N(0, 1] < ∞ and N({t}) = 0 or 1 for all 0 < t ≤ 1; that is, in view of (3.1.5) and (3.3.7), (3.3.11) holds a.s. Then
    m = E N(0, 1] = E(lim_{n→∞} Σ_{i=1}^n I_{ni})
      = lim_{n→∞} E(Σ_{i=1}^n I_{ni})    by Lebesgue’s monotone convergence theorem,
      = lim_{n→∞} nφ(n^{−1})            by (3.3.5), (3.3.10), and crude stationarity,
      = λ                               by Khinchin’s existence theorem.
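For the stationary Poisson process with rate λ both quantities have closed forms, so Khinchin's limit and Korolyuk's identity λ = m can be verified directly: φ(h) = Pr{N(0, h] > 0} = 1 − e^{−λh}, whose ratio to h increases to λ as h ↓ 0, while m = E N(0, 1] = λ. The short sketch below is an added illustration of this special case, not part of the text.

```python
# Poisson special case: phi(h)/h = (1 - exp(-lam*h))/h increases to the
# intensity lam as h decreases to 0 (Proposition 3.3.I), and this limit
# equals the mean density m = E N(0,1] = lam (Proposition 3.3.IV).
import math

lam = 3.0
ratios = [(1 - math.exp(-lam * h)) / h for h in (1.0, 0.1, 0.01, 0.001)]
assert all(a < b for a, b in zip(ratios, ratios[1:]))   # monotone as h decreases
assert abs(ratios[-1] - lam) < 0.01                     # approaches lam = m
```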
Proposition 3.3.V (Dobrushin’s Lemma). A crudely stationary simple point process of finite intensity is orderly.
Proof. For any positive integer n, E(N(0, 1]) = n E(N(0, n^{−1}]) by crude stationarity, so
    m = E(N(0, 1]) = n Σ_{j=1}^∞ Pr{N(0, n^{−1}] ≥ j} ≥ nφ(n^{−1}) + n Pr{N(0, n^{−1}] ≥ 2}.    (3.3.12)
Being crudely stationary, Khinchin’s existence theorem applies, so nφ(n−1 ) → λ as n → ∞, and being simple also, Korolyuk’s theorem applies, so λ = m. Combining these facts with (3.3.12), n Pr{N (0, n−1 ] ≥ 2} → 0 as n → ∞, which by (3.3.8) is the same as orderliness. Dobrushin’s lemma is a partial converse of the following result in which there is no finiteness restriction on the intensity. Proposition 3.3.VI. A crudely stationary orderly point process is simple.
Proof. Simpleness is equivalent to
    0 = Σ_{r=−∞}^∞ Pr{N({t}) ≥ 2 for some t in (r, r + 1]},
which in turn is equivalent to
    0 = Pr{N({t}) ≥ 2 for some t in (r, r + 1]}    (r = 0, ±1, . . .).    (3.3.13)
For every positive integer n,
    Pr{N({t}) ≥ 2 for some t in (0, 1]} ≤ Σ_{i=1}^n Pr{N((i − 1)/n, i/n] ≥ 2}
        = n Pr{N(0, n^{−1}] ≥ 2}    by crude stationarity,
        → 0    (n → ∞)    when N(·) is orderly,
so (3.3.13) holds for r = 0 and, by trite changes, for all r.
In the results just given, a prominent role is played by orderliness, which stems from the notion that the points {t_i} can indeed be ordered; that is, in the notation of (3.1.10), we have t_i < t_{i+1} for all i. Without orderliness, we are led to the idea of batches of points: we proceed as follows.
Proposition 3.3.VII. For a crudely stationary point process, the limits
    λ_k = lim_{h↓0} Pr{0 < N(0, h] ≤ k} / h    (3.3.14)
exist for k = 1, 2, . . . , and
    λ_k ↑ λ    (k → ∞), finite or infinite;    (3.3.15)
when λ is finite,
    π_k ≡ (λ_k − λ_{k−1})/λ = lim_{h↓0} Pr{N(0, h] = k | N(0, h] > 0}    (3.3.16)
is a probability distribution on k = 1, 2, . . . .
Proof. Define, by analogy with (3.3.5),
    φ_k(x) = Pr{0 < N(0, x] ≤ k}    (x > 0, k = 1, 2, . . .).    (3.3.17)
Then, like φ(·), φ_k(x) → 0 for x ↓ 0 and it is subadditive on (0, ∞) because, for x, y > 0,
    φ_k(x + y) = Pr{0 < N(0, x] ≤ k, N(x, x + y] = 0}
                   + Pr{N(0, x] ≤ k − N(x, x + y], 0 < N(x, x + y] ≤ k}
               ≤ Pr{0 < N(0, x] ≤ k} + Pr{0 < N(x, x + y] ≤ k} = φ_k(x) + φ_k(y),
invoking crude stationarity at the last step. Thus, (3.3.14) follows from the subadditive function lemma, which is also invoked in writing
    λ = sup_{h>0} sup_{k>0} φ_k(h)/h = sup_{k>0} sup_{h>0} φ_k(h)/h = sup_{k>0} λ_k.
The monotonicity of λ_k in k is obvious from (3.3.14), so (3.3.15) is now proved. Equation (3.3.16) follows from (3.3.14), (3.3.15), and (3.3.17).
The limit of the conditional probability in (3.3.16) can be rewritten in the form
    Pr{N(0, h] = k} = λπ_k h + o(h)    (h ↓ 0, k = 1, 2, . . .).    (3.3.18)
This equation and (3.3.16) suggest that the points {t_i} of sample paths occur in batches of size k = 1, 2, . . . with respective intensities λπ_k. To make this idea precise, recall that for bounded Borel sets A we have assumed N(A) to be integer-valued and finite so that we can define
    N_k(A) = #{distinct t ∈ A: N({t}) = k}    (k = 1, 2, . . .)
and thereby express N(A) as
    N(A) = Σ_{k=1}^∞ k N_k(A).    (3.3.19)
By definition, these point processes N_k(·) are simple and stationary, and for them we can define indicator random variables I^{(k)}_{ni}, analogous to I_{ni} in (3.3.10), by
    I^{(k)}_{ni} = 1 or 0 according as N((i − 1)/n, i/n] = k or ≠ k.    (3.3.20)
By letting n → ∞ through n = 2^p for p = 1, 2, . . . , it follows from (3.3.20) and the construction of N_k(·) that
    N_k(0, 1] = lim_{n→∞} Σ_{i=1}^n I^{(k)}_{ni}    a.s.    (3.3.21)
Now I^{(k)}_{ni} ≤ I_{ni}, so when λ < ∞, it follows from (3.3.21) by using dominated convergence that E(N_k(0, 1]) < ∞, being given by
    E(N_k(0, 1]) = lim_{n→∞} E(Σ_{i=1}^n I^{(k)}_{ni}) = lim_{n→∞} n[φ_k(n^{−1}) − φ_{k−1}(n^{−1})] = λπ_k.    (3.3.22)
The sample-path definition of N_k(·) having intensity λπ_k as in (3.3.22) warrants the use of the term batch-size distribution for the probability distribution {π_k}. Note that a stationary orderly point process has the degenerate batch-size distribution for which π_1 = 1, π_k = 0 (all k ≠ 1). Otherwise, the sample paths are appropriately described as having multiple points; this terminology is reflected in the frequently used description of a simple point process as one without multiple points.
The moments of the distribution {π_k} can be related to those of N(·) as in the next two propositions, in which we call equation (3.3.23) a generalized Korolyuk equation.
Proposition 3.3.VIII. For a crudely stationary point process of finite intensity,
    m = E(N(0, 1]) = λ Σ_{k=1}^∞ k π_k,    finite or infinite.    (3.3.23)
Proof. Take expectations in (3.3.19) with A = (0, 1] and use Fubini’s theorem and (3.3.22) to deduce (3.3.23).
Proposition 3.3.IX. For a crudely stationary point process of finite intensity λ and finite γth moment, γ ≥ 1,
    lim_{h↓0} E[(N(0, h])^γ] / h    exists and equals    λ Σ_{k=1}^∞ k^γ π_k.    (3.3.24)
Proof. Introduce
    M_γ(x) = E(N^γ(0, x]),
and observe that for x, y > 0, using γ ≥ 1,
    M_γ(x + y) = E((N(0, x] + N(x, x + y])^γ) ≥ E(N^γ(0, x]) + E(N^γ(x, x + y]) = M_γ(x) + M_γ(y);
that is, the function M_γ(x) is superadditive for x > 0. When M_γ(x) is finite for 0 < x < ∞, M_γ(x) → 0 (x ↓ 0), so the subadditive function Lemma 3.6.IV applied to −M_γ(x) proves the existence part of (3.3.24). Since
    N^γ(0, 1] ≥ Σ_{i=1}^n Σ_{k=1}^∞ k^γ I^{(k)}_{ni} → Σ_{k=1}^∞ k^γ N_k(0, 1]    a.s.    (n → ∞),
we can use dominated convergence and crude stationarity to conclude that
    lim_{n→∞} n M_γ(n^{−1}) = E(Σ_{k=1}^∞ k^γ N_k(0, 1]) = λ Σ_{k=1}^∞ k^γ π_k.
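The interpretation of {π_k} as a batch-size distribution can be illustrated by simulation. The sketch below is an added illustration, not part of the text, with arbitrary parameters λ = 2 and (π_1, π_2, π_3) = (0.5, 0.3, 0.2); it plants i.i.d. batch sizes at Poisson epochs and checks the generalized Korolyuk equation (3.3.23), m = λ Σ_k kπ_k.

```python
# Simulation sketch of (3.3.23): batches arrive at Poisson epochs of rate lam
# with i.i.d. batch sizes distributed as {pi_k}; the mean density m = E N(0,1]
# should be lam times the mean batch size.
import random

random.seed(7)
lam = 2.0
sizes, probs = [1, 2, 3], [0.5, 0.3, 0.2]               # batch-size distribution
mean_batch = sum(k * p for k, p in zip(sizes, probs))   # = 1.7

trials, total = 100_000, 0
for _ in range(trials):
    t = random.expovariate(lam)
    while t <= 1.0:                                     # batch epochs in (0, 1]
        total += random.choices(sizes, weights=probs)[0]
        t += random.expovariate(lam)

m_hat = total / trials                                  # estimate of m = E N(0,1]
assert abs(m_hat - lam * mean_batch) < 0.05             # m = 3.4
```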
Exercises and Complements to Section 3.3
3.3.1 Verify that a simple point process (Definition 3.3.II) can be defined equivalently as one for which the distances between points of a realization are a.s. positive. [Hint: When the realization consists of the points {t_n}, (3.3.7) is equivalent (Vasil’ev, 1965) to the relation Pr{|t_i − t_j| > 0 (all i ≠ j)} = 1.]
3.3.2 Show that a mixed Poisson process for which
    Pr{N(0, t] = j} = ∫_1^∞ (e^{−λt}(λt)^j / j!) ½λ^{−3/2} dλ
is simple but not orderly. A mixed Poisson process with
    Pr{N(0, t] = j} = ∫_1^∞ (e^{−λt}(λt)^j / j!) λ^{−2} dλ
also has infinite intensity, but it does satisfy the orderliness property (3.3.8).
3.3.3 (a) Let the r.v. X be distributed on (0, ∞) with distribution function F(·) and, conditional on X, let the r.v. Y be uniformly distributed on (0, X). Now define a point process to consist of the set of points {nX + Y: n = 0, ±1, . . .}. Verify that such a process is stationary and that
    Pr{N(0, h] = 0} = ∫_h^∞ (1 − h/x) dF(x) = 1 − h ∫_h^∞ x^{−2} F(x) dx,
    Pr{N(0, h] ≥ 2} = h ∫_{h/2}^h x^{−2} F(x) dx.
When F(x) = x for 0 < x < 1, show that (i) the intensity λ = ∞; (ii) the process is not orderly; and (iii) it has the Khinchin orderliness property [Khinchin (1956); see also Leadbetter (1972) and Daley (1974)]
    Pr{N(0, h] ≥ 2 | N(0, h] ≥ 1} → 0    (h → 0).    (3.3.25)
(b) Let the realizations of a stationary point process come, with probability ½ each, either from a process of doublets consisting of two points at each of {n + Y: n = 0, ±1, . . .}, where Y is uniformly distributed on (0, 1), or from a simple point process as in part (a). Then Pr{N({t}) ≤ 1 for all t} = ½, so the process is not simple, but it does have the Khinchin orderliness property in (3.3.25).
3.3.4 Suppose that N(·) is a simple point process on (0, ∞) with finite first moment M(x) = E N(x), and suppose that M(·) is absolutely continuous in the sense that M(x) = ∫_0^x m(y) dy (x > 0) for some density function m(·). Show that the distribution functions G_i(·) of Exercise 3.1.1 are also absolutely continuous with density functions g_i(·), where
    G_i(x) = ∫_0^x g_i(y) dy,    and    m(x) = Σ_{i=1}^∞ g_i(x)    a.e.
3.3.5 (Continuation). Now define G_i(x; t) as the d.f. of the ith forward recurrence time after t, i.e. G_i(x; t) is the d.f. of inf{u > t: N(t, u] ≥ i}. Supposing that N(·) has finite first moment and is absolutely continuous in the sense of Exercise 3.3.4, show that when N(·) is simple,
    g_1(0; t) = m(t),    g_i(0; t) = 0    (i ≥ 2).
Use these results to give an alternative proof of Korolyuk’s Theorem 3.3.IV. Show also that when the rth moment of N(·) is finite,
    lim_{h↓0} E[(N(t, t + h])^r] / h = m(t).
3.3.6 Given any point process with sample realizations N, define another point process with sample realization N* by means of
    N*(A) = #{distinct x ∈ A: N({x}) ≥ 1}    (all Borel sets A)
(in the setting of marked point processes in Section 6.4 below, N* here is an example of a ground process, denoted N_g there). Show that if, for any real finite s > 0,
    E(e^{−sN(A)}) ≥ E(e^{−sN*(A)})    (all Borel sets A),
then N is simple. Irrespective of whether or not it is simple, N(A) = 0 iff N*(A) = 0. Show that if N is a compound Poisson process as in Theorem 2.2.II, then N* is a stationary Poisson process with rate λ.
3.3.7 Consider a compound Poisson process as in Theorem 2.2.II, and suppose that the mean batch size Π′(1) = Σ_k kπ_k is infinite. Let the points of the process be subject to independent shifts with a common distribution that has no atoms. The resulting process is no longer Poisson, is simple, and has infinite intensity. When the shifts are i.i.d. and uniform on (0, 1), show that, for 0 < h < 1,
    Pr{N(0, h] = 0} = exp(−λ(1 + h) + λ(1 − h)Π(1 − h) + 2λ ∫_0^h Π(1 − u) du).
3.4. Palm–Khinchin Equations Throughout this section, we use P to denote the probability measure of a stationary point process (Definition 3.2.I). Our aim is to describe an elementary approach to the problem raised by the intuitively reasonable idea that the stationarity of a point process as in Definition 3.2.I should imply some equivalent interval stationarity property as in Definition 3.2.II. For example, for positive x and y and small positive h, stationarity of the point process N (·) implies that P{N (t, t + h] = N (t + x, t + x + h] = N (t + x + y, t + x + y + h] = 1, N (t, t + x + y + h] = 3} = P{N (−h, 0] = N (x − h, x] = N (x + y − h, x + y] = 1, N (−h, x + y] = 3} ≡ P{Ax,y,h }, say. (3.4.1) Now the event Ax,y,h describes a sample path with a point near the origin
and intervals of about x and y, respectively, to the next two points. Our intuition suggests that, as far as the dependence on the variables x and y is concerned, P{A_{x,y,h}} should be related to the probability measure P_0(·) for an interval stationary point process; that is, there should be a simple relation between P{A_{x,y,h}} and P_0{τ_1 ≤ x, τ_2 ≤ y}. We proceed to describe the partial solution that has its roots in Khinchin’s monograph (1955) and that connects P{N(0, x] ≤ j} to what we shall show is a distribution function
    R_j(x) = lim_{h↓0} P{N(0, x] ≥ j | N(−h, 0] > 0}    (j = 1, 2, . . .).    (3.4.2)
What emerges from the deeper considerations of Chapter 13 is that, granted orderliness, there exists an interval stationary point process {τ_j} with probability measure P_0, so P_0{t_0 = 0} = 1, for which we can indeed set
    P_0(·) = lim_{h↓0} P(· | N(−h, 0] > 0).
It then follows, for example, that
    P_0{τ_1 + · · · + τ_j ≤ x} = R_j(x)    (3.4.3)
[see (3.4.2) and (3.1.9)], thereby identifying a random variable having R_j(·) as its distribution function.
Instead of the expression in (3.4.1), we consider first the probability
    ψ_j(x, h) ≡ P{N(0, x] ≤ j, N(−h, 0] > 0}    (3.4.4)
and prove the following proposition.
Proposition 3.4.I. For a stationary point process of finite intensity, the limit
    Q_j(x) = lim_{h↓0} P{N(0, x] ≤ j | N(−h, 0] > 0}    (3.4.5)
exists for x > 0 and j = 0, 1, . . . , being right-continuous and nonincreasing in x with Qj (0) = 1. Proof. Observe that for u, v > 0, ψj (x, u + v) = P{N (0, x] ≤ j, N (−u, 0] > 0} + P{N (0, x] ≤ j, N (−u, 0] = 0, N (−u − v, −u] > 0}. In the last term, {N (0, x] ≤ j, N (−u, 0] = 0} = {N (−u, x] ≤ j, N (−u, 0] = 0} ⊆ {N (−u, x] ≤ j} ⊆ {N (−u, x − u] ≤ j},
and then using stationarity of P(·), we have ψ_j(x, u + v) ≤ ψ_j(x, u) + ψ_j(x, v). Consequently, the subadditivity lemma implies that the limit as h → 0 of ψ_j(x, h)/h exists, being bounded by λ [because ψ_j(x, h) ≤ φ(h)], so by writing
    P{N(0, x] ≤ j | N(−h, 0] > 0} = ψ_j(x, h)/φ(h) = (ψ_j(x, h)/h) / (φ(h)/h),
we can let h → 0 to prove the assertion in (3.4.5) concerning existence. By subadditivity, and right-continuity and monotonicity in x of ψ_j(x, h),
    Q_j(x) = sup_{h>0} ψ_j(x, h)/(λh) = sup_{h>0} sup_{y>x} ψ_j(y, h)/(λh) = sup_{y>x} Q_j(y),
so Q_j(x) is right-continuous and nonincreasing in x, with Q_j(0) = 1 since ψ_j(0, h) = φ(h).
It follows from this result that every
    R_j(x) ≡ 1 − Q_{j−1}(x)    (j = 1, 2, . . .)    (3.4.6)
is a d.f. on (0, ∞) except for the possibility, to be excluded later under the conditions of Theorem 3.4.II, that lim_{x→∞} R_j(x) may be less than 1. The plausible interpretation of (3.4.5), or equivalently, of (3.4.6), is that R_j(x) represents the conditional probability (in which the conditioning event has zero probability)
    P{N(0, x] ≥ j | N({0}) > 0} = P{τ_1 + · · · + τ_j ≤ x | t_0 = 0, t_1 > 0}.    (3.4.7)
Example 3.4(a) Renewal process. Consistent with (3.4.7), for a renewal process starting at 0 with lifetime d.f. F for which F(0+) = 0, R_j(x) = F^{j*}(x), where F^{n*}(·) is the n-fold convolution of F. In this case then, R_j(·) is the d.f. of the sum of j random variables that are not merely stationary but also independent. On the other hand, if we have a renewal process with a point at 0 and having lifetime d.f. F for which 0 < F(0+) < 1, then the constraint in (3.4.7) that τ_1 = t_1 − t_0 > 0 means that τ_1 has d.f. F_+(x) = (F(x) − F(0+))/(1 − F(0+)), while τ_2, τ_3, . . . have d.f. F and
    R_j(x) = ∫_0^x F^{(j−1)*}(x − u) dF_+(u)    (j = 1, 2, . . .).
Thus, Rj (x) is here the d.f. of the sum of nonstationary r.v.s, and so for a renewal process we have the stationarity property at (3.4.3) only when F (0+) = 0; that is, when the process is orderly (or equivalently, simple).
This last assumption is also what enables us to proceed simply in general [but, note the remarks around (3.4.12) below].
Theorem 3.4.II. For an orderly stationary point process of finite intensity λ and such that
    P{N(−∞, 0] = N(0, ∞) = ∞} = 1,    (3.4.8)
it holds that
    P{N(0, x] ≤ j} = λ ∫_x^∞ q_j(u) du    (j = 0, 1, . . .),    (3.4.9)
where
    q_j(x) = lim_{h↓0} P{N(0, x] = j | N(−h, 0] > 0},    (3.4.10)
and R_j(x) = 1 − Σ_{k=0}^{j−1} q_k(x) is a distribution function on (0, ∞) with mean jλ^{−1} for each j = 1, 2, . . . .
Proof. Set
    P_j(x) = P{N(0, x] ≤ j}
and observe by Proposition 3.4.I and the assumption of orderliness that
    P_j(x + h) = Σ_{i=0}^j P{N(0, x] ≤ j − i, N(−h, 0] = i}
               = P{N(0, x] ≤ j} − P{N(0, x] ≤ j, N(−h, 0] > 0}
                   + P{N(0, x] ≤ j − 1, N(−h, 0] = 1} + o(h).
Thus,
    P_j(x + h) − P_j(x) = P{N(0, x] ≤ j − 1, N(−h, 0] ≥ 1} − P{N(0, x] ≤ j, N(−h, 0] > 0} + o(h)
                        = −λh q_j(x) + o(h),
where the existence of q_j(x) in (3.4.10) is assured by (3.4.5) directly for j = 0 and then by induction for j = 1, 2, . . . . Using D^+ to denote the right-hand derivative operator, it follows that D^+ P_j(x) = −λq_j(x). Setting Q_{−1}(x) ≡ 0, the nonnegative function q_j(x) = Q_j(x) − Q_{j−1}(x) is the difference of two bounded nonincreasing functions and hence is integrable on bounded intervals with
    P_j(x) − P_j(y) = λ ∫_x^y q_j(u) du.    (3.4.11)
The assumption in (3.4.8) implies that P_j(y) → 0 for y → ∞, so (3.4.9) now follows from (3.4.11).
Letting x ↓ 0 in (3.4.9), it follows that
    λ^{−1} = ∫_0^∞ q_j(u) du    (j = 0, 1, . . .),
and hence, using (3.4.6) as well, that for j = 1, 2, . . . ,
    ∫_0^∞ [1 − R_j(u)] du = ∫_0^∞ Q_{j−1}(u) du = jλ^{−1}.
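For the stationary Poisson process with rate λ the Palm–Khinchin equation (3.4.9) can be verified directly, since there q_j(u) = e^{−λu}(λu)^j/j!; that the Palm and stationary distributions coincide for the Poisson process is a standard consequence of the lack-of-memory property, assumed for this illustration rather than derived in this section. The sketch below (an added illustration, not part of the text) checks (3.4.9) by crude numerical integration.

```python
# Numerical check of the Palm-Khinchin equation (3.4.9) in the Poisson case,
# where q_j(u) = exp(-lam*u) (lam*u)**j / j!: the right-hand side
# lam * integral_x^infinity q_j(u) du should equal P{N(0, x] <= j}.
import math

lam, x, j = 1.7, 0.8, 3

def q(u):
    return math.exp(-lam * u) * (lam * u)**j / math.factorial(j)

# midpoint-rule integral of q over (x, x + 40/lam); the remaining tail is negligible
du, upper = 1e-4, x + 40 / lam
steps = int((upper - x) / du)
integral = sum(q(x + (i + 0.5) * du) for i in range(steps)) * du

# P{N(0, x] <= j} for a Poisson(lam * x) count
lhs = sum(math.exp(-lam * x) * (lam * x)**k / math.factorial(k) for k in range(j + 1))
assert abs(lhs - lam * integral) < 1e-6
```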
There is a most instructive heuristic derivation of (3.4.9) as follows. By virtue of (3.4.8), if we look backward from a point x, there will always be some point u < x for which N(u, x] ≤ j and N[u, x] > j. In fact, because of orderliness, we can write (with probability 1)
    {N(0, x] ≤ j} = ∪_{u≤0} {N(u, x] = j, N({u}) = 1},
in which we observe that the right-hand side is the union of the mutually exclusive events that the (j + 1)th point of N(·) looking backward from x occurs at some u ≤ 0. Consequently, we can add their ‘probabilities’, which by (3.4.7), (3.3.4), and orderliness equal q_j(x − u) λ du, yielding the Palm–Khinchin equation (3.4.9) in the form
    P_j(x) = λ ∫_{−∞}^0 q_j(x − u) du.
Without the orderliness assumption, made from (3.4.8) onward above, we can proceed as follows. First (see Proposition 3.4.I), we show that the function
    ψ_{j|i}(x, h) ≡ P{N(0, x] ≤ j, 0 < N(−h, 0] ≤ i}    (3.4.12)
is subadditive in h and so deduce that, for those i for which πi > 0 [see (3.3.16)], there exists the limit
    Q_{j|i}(x) = lim_{h↓0} P{N(0, x] ≤ j | N(−h, 0] = i},    (3.4.13)
with
    P{N(0, x] ≤ j, N(−h, 0] = i} = λπi Q_{j|i}(x)h + o(h)    (h ↓ 0)
irrespective of whether πi > 0 or πi = 0, by setting Q_{j|i}(x) ≡ 0 when πi = 0. Then, the argument of the proof of Theorem 3.4.II can be mimicked in establishing that
    Pj(x) = λ ∫_x^∞ Σ_{i=1}^∞ πi [Q_{j|i}(u) − Q_{j−i|i}(u)] du,    (3.4.14)
setting Q_{k|i}(u) ≡ 0 for k < 0, and it can also be shown that, when πi > 0,
    R_{j|i}(x) ≡ 1 − Q_{j−1|i}(x) ≡ 1 − Σ_{k=0}^{j−1} q_{k|i}(x)
is a proper distribution function on (0, ∞).
For any point process N, the random variable
    Tu ≡ inf{t > 0: N(u, u + t] > 0}    (3.4.15)
is the forward recurrence time r.v. For a stationary point process, Tu =_d T0 for all u, and we can study its distribution via the Palm–Khinchin equations since {T0 > x} = {N(0, x] = 0}. Assuming that (3.4.8) holds,
    P{T0 > x} = λ ∫_x^∞ q0(u) du    (3.4.16)
when N(·) is orderly as in Theorem 3.4.II. Recall that q0(·) is the tail of the d.f. R1(·), which can be interpreted as the d.f. of the length τ1 of an arbitrarily chosen interval. Then, still assuming that (3.4.8) holds,
    ET0 = ∫_0^∞ P{T0 > x} dx = λ ∫_0^∞ u q0(u) du = λ ∫_0^∞ u[1 − R1(u)] du = ½λE(τ1^2).    (3.4.17)
When all intervals are of the same length, ∆ say, λ = ∆^{−1} and ET0 = ½∆, whereas for a Poisson process, τ1 has mean ∆ and second moment Eτ1^2 = 2∆^2, so then ET0 = ∆. These remarks amplify the comments on the waiting-time paradox at the end of Section 3.2.
In both Theorem 3.4.II and the discussion of the forward recurrence time r.v. Tu, the caveat that P{N(0, ∞) = ∞} = 1 has been added. This is because stationary point processes on the real line R have the property that
    P{N(0, ∞) = ∞ = N(−∞, 0)} = 1 − P{N(R) = 0},    (3.4.18)
which is equivalent to
    P{0 < N(R) < ∞} = 0.    (3.4.19)
A similar property in a more general setting is proved in Chapter 12. Inspection of the statements onward from (3.4.8) shows that they are either conditional probability statements (including limits of such statements), which in view of (3.4.18) reduce to being conditional also on {N (R) = ∞}, or unconditional statements, which without (3.4.8) need further elaboration. This is quickly given: (3.4.8) is equivalent by (3.4.18) to P{T0 < ∞} = 1, and without (3.4.8), equations (3.4.16) and (3.4.17) must be replaced by assertions
of the form
    P{T0 > x} = λ ∫_x^∞ q0(u) du + 1 − ϖ,    (3.4.20)
    E(T0 | T0 < ∞) = ½λE(τ1^2),    (3.4.21)
where ϖ = P{N(R) = ∞} = P{T0 < ∞}.
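The deterministic and Poisson special cases of (3.4.17) are easy to check by simulation: build one long realization, choose observation times uniformly, and average the forward recurrence times. A rough sketch (the interval laws and sample sizes are arbitrary choices):

```python
import bisect
import random

def mean_forward_recurrence(draw_interval, n_points=100_000, n_obs=20_000, seed=1):
    # Average T_u over observation times u chosen uniformly in a long
    # realization; this approximates ET_0 = (lam/2) E(tau^2) of (3.4.17).
    rng = random.Random(seed)
    times, t = [], 0.0
    for _ in range(n_points):
        t += draw_interval(rng)
        times.append(t)
    horizon = times[-2]            # keep u away from the end of the record
    total = 0.0
    for _ in range(n_obs):
        u = rng.uniform(0.0, horizon)
        total += times[bisect.bisect_right(times, u)] - u
    return total / n_obs

delta = 1.0
det = mean_forward_recurrence(lambda rng: delta)                       # expect delta/2
poi = mean_forward_recurrence(lambda rng: rng.expovariate(1 / delta))  # expect delta
print(round(det, 2), round(poi, 2))
```

The first estimate is near ∆/2 and the second near ∆, the waiting-time paradox in miniature.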
Exercises and Complements to Section 3.4
3.4.1 Analogously to (3.4.15), define a backward recurrence time r.v. Bu ≡ inf{t > 0: N(u − t, u] > 0} (assuming this to be finite a.s.). Show that when N(·) is a stationary point process, Bu =_d B0 =_d T0. The r.v. Lu = Bu + Tu denotes the current lifetime r.v.; when N is orderly and stationary, show that EL0 = (Eτ1^2)/(Eτ1) [see (3.4.16)] and that
    P{L0 < x} = λ ∫_0^x [q0(u) − q0(x)] du = λ ∫_0^x u dR1(u).
3.4.2 Use Palm–Khinchin equations to show that when the hazard functions q and r of the interval and forward recurrence r.v.s τ0 and T0, respectively, are such that r(x) = r(0) + ∫_0^x r′(u) du for some density function r′, then q and r are related by r(x) = q(x) + r′(x)/r(x) (x > 0).
3.4.3 Show that for an orderly point process,
    EN(0, 1] = ∫_0^1 P{N(dx) ≥ 1},
where the right-hand side is to be interpreted as a Burkill integral [see Fieger (1971) for further details].
3.4.4 For a point process N on R, define the event
    Bk ≡ Bk((xi, ji): i = 1, . . . , k) = {N(0, xi] ≤ ji (i = 1, . . . , k)}
for positive xi, nonnegative integers ji (i = 1, . . . , k), and any fixed finite positive integer k.
(a) When N is stationary with finite intensity λ, ψ(Bk, h) = P(Bk ∩ {N(−h, 0] > 0}) is subadditive in h > 0, the limit Q(Bk) = lim_{h↓0} P(Bk | {N(−h, 0] > 0}) exists finite, is right-continuous and nonincreasing in each xi and nondecreasing in ji, is invariant under permutations of (x1, j1), . . . , (xk, jk), satisfies the consistency conditions
    Q(Bk) = Q(Bk+1((0, jk+1), (xi, ji) (i = 1, . . . , k))) = Q(Bk+1((xk+1, ∞), (xi, ji) (i = 1, . . . , k))),
and
    Q(Bk) = lim_{h↓0} ψ(Bk, h)/λh = sup_{h>0} ψ(Bk, h)/λh.
(b) Define a shift operator Sh (h > 0) and a difference operator ∆ on Bk by
    Sh Bk = Bk((xi + h, ji) (i = 1, . . . , k)),    ∆Bk = Bk((xi, ji − 1) (i = 1, . . . , k)),
and put q(Bk) = Q(Bk) − Q(∆Bk), with the convention that if any ji = 0, then ∆Bk is a null set with Q(∆Bk) = 0. Under the condition (3.4.8) of Theorem 3.4.II, the right-hand derivative D^+ P(Bk) exists in the sense that D^+ P(Sh Bk)|_{h=0} = −λq(Bk), and
    P(Bk) − P(Sx Bk) = λ ∫_0^x q(Su Bk) du.
[See Daley and Vere-Jones (1972, Section 7) and Slivnyak (1962, 1966). Note that Slivnyak used a slightly different operator S_h^0 defined by S_h^0 Bk = Bk+1((h, 0), (xi + h, ji) (i = 1, . . . , k)), so that ψ(Bk, h) = P(Bk) − P(S_h^0 Bk), and deduced the existence of a derivative in h of P(S_h^0 Bk) from the convexity in h of this function, assuming stationarity of N but not necessarily that it has finite intensity.]
3.5. Ergodicity and an Elementary Renewal Theorem Analogue

Let N(·) be a stationary point process with finite mean density m = EN(0, 1]. Then, the sequence {Xn} of random variables defined by
    Xn = N(n − 1, n]    (n = 0, ±1, . . .)
is stationary with finite first moment m = EXn (all n), and by the strong law for stationary random sequences,
    (X1 + · · · + Xn)/n = N(0, n]/n → ξ    a.s.
for some random variable ξ for which Eξ = m. Using ⌊x⌋ to denote the largest integer ≤ x, it then follows on letting x → ∞ in the inequalities
    (⌊x⌋/x) · N(0, ⌊x⌋]/⌊x⌋ ≤ N(0, x]/x ≤ (⌊x + 1⌋/x) · N(0, ⌊x + 1⌋]/⌊x + 1⌋    (x ≥ 1)
that we have proved the following proposition.

Proposition 3.5.I. For a stationary point process with finite mean density m = EN(0, 1], ζ ≡ lim_{x→∞} N(0, x]/x exists a.s. and is a random variable with Eζ = m.
In our discussion of limit properties of stationary point processes we shall have cause to use various concepts of ergodicity; for the present we simply use the following definition.

Definition 3.5.II. A stationary point process with finite mean density m is ergodic when
    P{N(0, x]/x → m (x → ∞)} = 1.

Suppose that in addition to being ergodic, the second moment E[(N(0, 1])^2] is finite, so by stationarity and the Cauchy–Schwarz inequality, E[(N(0, x])^2] < ∞ for all finite positive x. Then, we can use an argument similar to that leading to Proposition 3.5.I to deduce from the convergence in mean square of (X1 + · · · + Xn)/n = N(0, n]/n to the same limit [see e.g. (2.15) of Doob (1953, p. 471) or Chapter 12 below] that
    var(N(0, x]/x) = E[(N(0, x]/x − m)^2] → 0    (x → ∞)    (3.5.1)
when N (·) is ergodic with finite second moment. This is one of the key probabilistic steps in the proof of the next theorem, in which the asymptotic result in (3.5.3), combined with the remarks that follow, is an analogue of the elementary renewal theorem [see Exercise 4.1.1(b) and Section 4.4 below]. The function U (·), called the expectation function in Daley (1971), is the analogue of the renewal function. Theorem 3.5.III. For a stationary ergodic point process with finite second moment and mean density m, the second-moment function
    M2(x) ≡ E[(N(0, x])^2] = ∫_0^x [2U(u) − 1] m du    (3.5.2)
for some nondecreasing function U(·) for which
    U(x)/x → m    (x → ∞);    (3.5.3)
when the process is orderly,
    U(x) = Σ_{j=0}^∞ Rj(x).    (3.5.4)
Remarks. (1) It is consistent with the interpretation of Rj(·) in (3.4.3) as the d.f. of the sum Sj = τ1 + · · · + τj that
    U(x) = lim_{h↓0} E(N(0, x] + 1 | N(−h, 0] > 0)
in the case where N (·) is orderly. In the nonorderly case, it emerges that, given an ergodic stationary sequence {τj } of nonnegative random variables
with Eτj = 1/m and partial sums {Sn} given by
    S0 = 0,    Sn = τ1 + · · · + τn,    S−n = −(τ0 + · · · + τ−(n−1))    (n = 1, 2, . . .),
we can interpret U(·) as
    2U(x) − 1 = E#{n = 0, ±1, . . . : |Sn| ≤ x} = Σ_{n=−∞}^∞ Pr{|Sn| ≤ x}.    (3.5.5)
In the case where the random variables {τj} are independent and identically distributed,
    U(x) = Σ_{n=0}^∞ F^{n*}(x)    (3.5.6)
and hence U(·) is then the renewal function.
(2) It follows from (3.5.2) that
    var N(0, x] = ∫_0^x {2[U(u) − mu] − 1} m du.    (3.5.7)
(3) It is a simple corollary of (3.5.3) that for every fixed finite y,
    U(x + y)/U(x) → 1    (x → ∞).    (3.5.8)
Proof of Theorem 3.5.III. From the definition in (3.5.2) with N(x) = N(0, x],
    M2(x) = E[(N(x))^2] = var N(x) + [EN(x)]^2 = x^2 [var(N(x)/x) + m^2] ∼ m^2 x^2    (x → ∞)
when N(·) is ergodic, by (3.5.1). If we can assume that M2(·) is absolutely continuous and that the function U(·), which can then be defined as in (3.5.2), is monotonically nondecreasing, we can appeal to a Tauberian theorem (e.g. Feller, 1966, p. 421) and conclude that (3.5.3) holds.
It remains then to establish (3.5.2), for which purpose we assume first that N(·) is orderly so that the representation (3.4.9) is at our disposal. It is a matter of elementary algebra that
    M2(x) + mx = E[N(x)(N(x) + 1)] = Σ_{j=1}^∞ j(j + 1) P{N(x) = j}
               = 2 Σ_{k=1}^∞ k P{N(x) ≥ k}
               = 2 Σ_{k=0}^∞ (k + 1) ∫_0^x qk(u) λ du
               = 2 ∫_0^x [1 + Σ_{j=0}^∞ (1 − Qj(u))] λ du = 2 ∫_0^x Σ_{j=0}^∞ Rj(u) λ du,
where R0(u) ≡ 1. Thus, we have (3.5.2) in the case of orderly N(·) with the additional identification that
    U(x) = Σ_{j=0}^∞ Rj(x),    (3.5.9)
of which (3.5.6) is a special case. Note in (3.5.9) that the nondecreasing nature of each Rj(·) ensures the same property for U(·).
When N(·) is no longer orderly, we must appeal to (3.4.14) in writing
    M2(x) + mx = 2 Σ_{k=0}^∞ (k + 1)[1 − Pk(x)]
               = 2 Σ_{k=0}^∞ (k + 1) ∫_0^x Σ_{i=1}^∞ πi [Q_{k|i}(u) − Q_{k−i|i}(u)] λ du.    (3.5.10)
Without loss of generality, we may set Q_{k|i}(x) ≡ 1 when πi = 0. Fubini’s theorem is then applicable as before in the manipulations below:
    2 Σ_{k=0}^∞ (k + 1) Σ_{i=1}^∞ πi Σ_{j=(k−i+1)^+}^{k} q_{j|i}(u)
        = 2 Σ_{i=1}^∞ πi Σ_{k=0}^∞ (k + 1) Σ_{j=(k−i+1)^+}^{k} q_{j|i}(u)
        = Σ_{i=1}^∞ πi Σ_{j=0}^∞ i(2j + i + 1) q_{j|i}(u)
        = Σ_{i=1}^∞ iπi [i + 1 + 2 Σ_{j=0}^∞ (1 − Q_{j|i}(u))].    (3.5.11)
Substitute (3.5.11) in (3.5.10) and recall that Q_{j|i}(u) is nonincreasing; this establishes the existence of nondecreasing U(·) in (3.5.2) as required.
Exercises and Complements to Section 3.5
3.5.1 (see Theorem 3.5.III). Use the Cauchy–Schwarz inequality to show that, when M2(x) ≡ E[N^2(0, x]] < ∞ for finite x, (M2(x))^{1/2} is subadditive in x > 0 and hence that there is then a finite constant λ2 ≥ m^2 such that M2(x) ∼ λ2 x^2 (x → ∞).
3.5.2 Let N(·) be a stationary mixed Poisson process with
    P{N(0, t] = j} = ½ e^{−t} t^j/j! + ½ e^{−2t} (2t)^j/j!.
Show that λ = 3/2 = m < U(t)/t = 5/3 + 1/t (all t > 0) (cf. Theorem 3.5.III; this process is not ergodic) and that N(0, t]/t → ξ (t → ∞), where ξ = 1 or 2 with probability ½ each.
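The failure of ergodicity in Exercise 3.5.2 is visible in simulation: each realization first picks a rate (1 or 2, equally likely) and then runs a Poisson process at that rate, so N(0, t]/t settles at 1 or at 2, never at the mean density 3/2. A sketch (the horizon and number of realizations are arbitrary choices):

```python
import random

def mixed_poisson_rate_estimate(t, rng):
    # Choose the random rate once per realization, then count Poisson
    # events in (0, t]; N(0, t]/t converges to the chosen rate.
    rate = rng.choice([1.0, 2.0])
    n, s = 0, rng.expovariate(rate)
    while s <= t:
        n += 1
        s += rng.expovariate(rate)
    return n / t

rng = random.Random(7)
limits = [mixed_poisson_rate_estimate(10_000.0, rng) for _ in range(20)]
print([round(v) for v in limits])   # each entry is 1 or 2, not 3/2
```

Averaging the 20 limits recovers something near m = 3/2, exactly as Eζ = m in Proposition 3.5.I allows for a non-ergodic process.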
3.6. Subadditive and Superadditive Functions

We have referred earlier in this chapter to properties of subadditive and superadditive functions, and for convenience we now establish these properties in a suitable form. For a more extensive discussion of such functions, see Hille and Phillips (1957).
A function g(x) defined for 0 ≤ x < a ≤ ∞ is subadditive when
    g(x + y) ≤ g(x) + g(y)    (3.6.1)
holds throughout its domain of definition; similarly, a function h(x) for which
    h(x + y) ≥ h(x) + h(y)    (3.6.2)
holds is superadditive. A function f(x) for which
    f(x + y) = f(x) + f(y)    (3.6.3)
holds is additive, and (3.6.3) is known as Cauchy’s functional equation or (see e.g. Feller, 1966, Section IV.4) the Hamel equation.

Lemma 3.6.I. For a subadditive function g(·) that is bounded on finite intervals, µ ≡ inf_{x>0} g(x)/x is finite or −∞, and
    g(x)/x → µ    (x → ∞).    (3.6.4)

Proof. For any µ′ > µ there exists y for which g(y)/y < µ′. Given any x, there is a unique integer n for which x = ny + η, where 0 ≤ η < y, and n → ∞ as x → ∞. Then
    g(x)/x ≤ [g(ny) + g(η)]/x ≤ ng(y)/(ny + η) + g(η)/x = g(y)/(y + η/n) + g(η)/x → g(y)/y    (x → ∞).
Thus, lim sup_{x→∞} g(x)/x ≤ µ′, and µ′ being an arbitrary quantity > µ, this proves the lemma.

The function −h(x) is subadditive when h(·) is superadditive, and an additive function is both subadditive and superadditive, so Lemma 3.6.I implies both of the following results.

Lemma 3.6.II. For a superadditive function h(·) that is bounded on finite intervals, µ ≡ sup_{x>0} h(x)/x is finite or +∞ and
    h(x)/x → µ    (x → ∞).    (3.6.5)
Lemma 3.6.III. An additive function f(·) that is bounded on finite intervals satisfies
    f(x) = f(1)x    (0 ≤ x < ∞).    (3.6.6)

In passing, note that there do exist additive functions that do not have the linearity property (3.6.6): they are unbounded on every finite interval and moreover are not measurable (see e.g. Hewitt and Zuckerman, 1969). Observe also that nonnegative additive functions satisfy (3.6.6) with the understanding that f(1) = ∞ is allowed.
The behaviour near 0 of subadditive and superadditive functions requires the stronger condition of continuity at 0 in order to derive a useful result [a counterexample when f(·) is not continuous at 0 is indicated in Hille and Phillips (1957, Section 7.11)].

Lemma 3.6.IV. Let g(x) be subadditive on [0, a] for some a > 0, and let g(x) → 0 as x → 0. Then λ ≡ sup_{x>0} g(x)/x is finite or +∞, and
    g(x)/x → λ    (x → 0).    (3.6.7)

Proof. The finiteness of g(x) for some x > 0 precludes the possibility that λ = −∞. Consider first the case where 0 < λ < ∞, and suppose that g(an)/an < λ − 2ε for some ε > 0 for all members of a sequence {an} with an → 0 as n → ∞. For any given x > 0, we can find an sufficiently small that sup_{0≤δ<an} g(δ) < εx; writing x = man + δ with integer m and 0 ≤ δ < an, subadditivity then gives g(x) ≤ mg(an) + g(δ) < man(λ − 2ε) + εx ≤ (λ − ε)x, so that g(x)/x ≤ λ − ε, contradicting the definition of λ. The case −∞ < λ ≤ 0 is established by considering g1(x) ≡ g(x) + λ′x for some finite λ′ > −λ. Finally, the case λ = ∞ is proved by contradiction starting from the supposition that g(an)/an → λ′ < ∞ for some {an} with an → 0.

Lemma 3.6.V. Let h(x) be superadditive on [0, a] for some a > 0, and let h(x) → 0 as x → 0. Then λ ≡ inf_{x>0} h(x)/x is finite or −∞, and
    h(x)/x → λ    (x → 0).    (3.6.8)
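Lemma 3.6.I is easy to see in action for a concrete subadditive function; here g(x) = x + √x (an arbitrary choice), for which µ = inf_{x>0} g(x)/x = 1:

```python
import math
import random

def g(x):
    # g(x) = x + sqrt(x) is subadditive on (0, infinity), since
    # sqrt(x + y) <= sqrt(x) + sqrt(y) gives g(x + y) <= g(x) + g(y).
    return x + math.sqrt(x)

rng = random.Random(0)
# Spot-check the subadditivity inequality (3.6.1) on random pairs.
ok = all(g(a + b) <= g(a) + g(b) + 1e-9
         for a, b in ((rng.uniform(0, 100), rng.uniform(0, 100))
                      for _ in range(10_000)))
print(ok)

# Lemma 3.6.I: g(x)/x decreases to mu = 1 as x -> infinity.
for x in (1.0, 100.0, 10_000.0, 1_000_000.0):
    print(x, round(g(x) / x, 4))
```

The printed ratios 2.0, 1.1, 1.01, 1.001 decrease toward the infimum µ = 1, as (3.6.4) asserts.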
CHAPTER 4
Renewal Processes
The renewal process and variants of it have been the subject of much study, both as a model in many fields of application (see e.g. Cox, 1962; Cox and Lewis, 1966; Cox and Isham, 1980) and as a source of important theoretical problems. It is not the aim of this chapter to repeat much of the material that is available, for example, in Volume II of Feller (1966); rather, we have selected some features that are either complementary to Feller’s treatment or relevant to more general point processes. The first two sections are concerned with basic properties, setting these where possible into a point process context. The third section is concerned with some characterization theorems and the fourth section with aspects of the renewal theorem, a topic so important and with such far-reaching applications that it can hardly be omitted. Two versions of the theorem are discussed, corresponding to different forms of convergence of the renewal measure to Lebesgue measure. Some small indication of the range of applications is given in Section 4.5, which is concerned with ‘neighbours’ of the renewal process, notably the Wold process of correlated intervals. A final section is concerned with the concept of a hazard measure for the lifetime distribution, a topic that is of interest in its own right and of central importance to the discussion of compensators and conditional intensity functions in Chapters 7 and 14.
4.1. Basic Properties

Let X, X1, X2, . . . be independent identically distributed nonnegative random variables, and define the partial sums
    S0 = 0,    Sn = Sn−1 + Xn = X1 + · · · + Xn    (n = 1, 2, . . .).    (4.1.1)
For Borel subsets A of (0, ∞), we attempt to define the counting measure of a point process by setting
    N(A) = #{n: Sn ∈ A}.    (4.1.2)
Even if we exclude the trivial case X = 0 a.s., as we do throughout this chapter, it may not be completely obvious that (4.1.2) is finite. To see that this is so, observe that for X ≠ 0 a.s. there must exist positive ε, δ such that Pr{X > ε} > δ, so that with probability 1 the event {Xn > ε} must occur infinitely often (by the Borel–Cantelli lemmas) and hence Sn → ∞ a.s. It follows that the right-hand side of (4.1.2) is a.s. finite whenever A is bounded, thus justifying the definition (4.1.2). (Here we ignore measurability aspects, for which see Chapter 9.) The process so defined is the (ordinary) renewal process.
In the notation and terminology of Chapter 3, provided X1 > 0, we have ti = Si and τi = Xi for i = 1, 2, . . . , while the assumption that the {Xn} are i.i.d. implies that N(·) is interval stationary. Orderliness of the process here means Sn+1 > Sn for n = 0, 1, . . . ; that is, Xn > 0 for all n ≥ 1, all with probability 1. But the probability that Xn > 0 for n = 1, . . . , N is equal to (Pr{X > 0})^N → 0 as N → ∞ unless Pr{X > 0} = 1. Thus, the process is orderly if and only if Pr{X > 0} = 1; that is, if and only if the lifetime distribution has zero mass at the origin.
Taking expectations of (4.1.2) yields the renewal measure
    U(A) = E(#{n: Sn ∈ A, n = 0, 1, 2, . . .}) = E[N(A)],    (4.1.3)
an equation that remains valid even if A includes the origin. U(A) is just the first moment or expectation measure of N(·). Writing F(·) for the common lifetime distribution and F^{k*} for its k-fold convolution (which is thus the distribution function for Sk), and immediately abusing the notation by writing F(·) for the measure induced on the Borel sets of R by F, we have
    U(A) = E[Σ_{k=0}^∞ I{Sk ∈ A}] = δ0(A) + Σ_{k=1}^∞ F^{k*}(A).    (4.1.4)
We note in passing that the higher moments of N(A) can also be expressed in terms of U(·) (see Exercise 4.1.2). The quantity most commonly studied is the cumulative function, commonly called the renewal function,
    U(x) ≡ U([0, x]) = 1 + Σ_{k=1}^∞ F^{k*}(x)    (x ≥ 0).    (4.1.5)
Again, U(x) is always finite. To see this, choose any δ > 0 for which F(δ) < 1 (possible since we exclude the case X = 0 a.s.). Then, since F(0−) = 0, we have for any positive integers i, j and x, y > 0,
    1 − F^{(i+j)*}(x + y) ≥ [1 − F^{i*}(x)][1 − F^{j*}(y)],
and for 0 < y < x,
    F^{i*}(x − y) F^{j*}(y) ≤ F^{(i+j)*}(x) ≤ F^{i*}(x) F^{j*}(x).
Thus, F^{k*}(δ) ≤ (F(δ))^k < 1, and therefore the series in (4.1.5) certainly converges for x < δ. For general x in 0 < x < ∞, there exists finite positive k for which x/k < δ. For given x and such k, 1 − F^{k*}(x) > [1 − F(x/k)]^k > 0, so
    U(x) ≤ [1 + F(x) + · · · + F^{(k−1)*}(x)] Σ_{n=0}^∞ F^{nk*}(x)
         ≤ [1 + F(x) + · · · + F^{(k−1)*}(x)] / [1 − F^{k*}(x)] < ∞.
Thus, (4.1.5) converges for all x > 0.
Taking Laplace–Stieltjes transforms in (4.1.5), we have for Re(θ) > 0
    χ(θ) ≡ ∫_0^∞ e^{−θx} dU(x) = Σ_{k=0}^∞ [ψ(θ)]^k = 1/(1 − ψ(θ)),    (4.1.6)
where ψ(θ) = ∫_0^∞ e^{−θx} dF(x). Equivalently, for Re(θ) > 0,
    ψ(θ) = 1 − 1/χ(θ),
which shows (using the uniqueness theorem for Laplace–Stieltjes transforms) that U determines F uniquely and hence that there is a one-to-one correspondence between lifetime distributions F and renewal functions U. From (4.1.5), we have for x > 0
    U(x) = 1 + ∫_0^x U(x − y) dF(y),    (4.1.7)
this being the most important special case of the general renewal equation
    Z(x) = z(x) + ∫_0^x Z(x − y) dF(y)    (x > 0),    (4.1.8)
where the solution function Z is generated by the initial function z. If the function z(x) is measurable and bounded on finite intervals, one solution to (4.1.8) is given by
    Z0(x) = z(x) + Σ_{k=1}^∞ ∫_0^x z(x − y) dF^{k*}(y) = ∫_0^x z(x − y) dU(y),    (4.1.9)
the convergence of the series in the middle member being justified by comparison with (4.1.5). Using the monotonicity of the relation z → Z0, we easily see that if z ≥ 0, (4.1.9) is the minimal nonnegative solution to (4.1.8).
In fact, considerably more is true, for if z(x) is merely measurable and bounded on finite intervals, the difference D(x) between any two solutions of (4.1.8) with the same property satisfies
    D(x) = ∫_0^x D(x − y) dF^{k*}(y)    for each k = 1, 2, . . . ;
hence, D(x) ≡ 0 from the fact that F^{k*}(x) → 0 as k → ∞ and the assumed boundedness of D. We summarize as follows.

Lemma 4.1.I (Renewal Equation Solution). When z(x) is measurable and bounded on finite intervals, the general renewal equation (4.1.8) has a unique measurable solution that is also bounded on finite intervals, and it is given by (4.1.9). In particular, U(x) is the unique monotonic and finite-valued solution of (4.1.7).

Example 4.1(a) Exponential intervals. The lack of memory property of the exponential distribution bequeaths on the renewal process that it generates the additional independence properties of the Poisson process. Suppose specifically that
    F(x) = 1 − e^{−λx}    (λ > 0, 0 ≤ x < ∞).
The renewal function for the corresponding Poisson process is U(x) = 1 + λx, as can be checked either by using the transform equation in (4.1.6), by summing the convolution powers as in (4.1.5), or by direct verification in the integral equation in (4.1.7).

Example 4.1(b) Forward recurrence time. We gave below (3.4.15) an expression for the distribution of the forward recurrence time r.v. Tu of a stationary point process. The definition at (3.4.15) does not require stationarity, and in the present case of a renewal process, it can be written as
    Tu = inf{Sn: Sn > u} − u = inf{Sn − u: Sn − u > 0}
       = X1 − u    if X1 > u,
       = inf{Sn − X1: Sn − X1 > u − X1} − (u − X1)    otherwise.
Now when X1 ≤ u, Tu has the same distribution as the forward recurrence time r.v. T′_{u−X1} defined on the renewal process with lifetime r.v.s {X′n} ≡ {Xn+1}, so
    Pr{Tu > y} = Pr{X1 > y + u} + ∫_0^u Pr{T_{u−v} > y} dF(v).    (4.1.10)
But this equation is of the form (4.1.8), with z(x) = Pr{X1 > y + x} = 1 − F(y + x), so by (4.1.9)
    Pr{Tu > y} = ∫_{0−}^u [1 − F(y + u − v)] dU(v).    (4.1.11)
In particular, putting y = 0, we recover the identity that is implicit in (4.1.5),
    1 = ∫_{0−}^x [1 − F(x − v)] dU(v)    (all x ≥ 0).    (4.1.12)
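Lemma 4.1.I also suggests a direct numerical check of Example 4.1(a): discretize (4.1.7) with a trapezoidal rule and solve forward on a grid. A minimal sketch (an absolutely continuous lifetime d.f. with density f is assumed; the step size is an arbitrary choice):

```python
import math

def renewal_function(f, x_max, h=0.01):
    # Solve U(x) = 1 + \int_0^x U(x - y) f(y) dy, i.e. (4.1.7) with
    # lifetime density f, by the trapezoidal rule on the grid 0, h, 2h, ...
    n = int(round(x_max / h))
    U = [1.0]
    for i in range(1, n + 1):
        s = 0.5 * U[0] * f(i * h) + sum(U[i - j] * f(j * h) for j in range(1, i))
        U.append((1.0 + h * s) / (1.0 - 0.5 * h * f(0.0)))
    return U

lam = 2.0
U = renewal_function(lambda y: lam * math.exp(-lam * y), x_max=3.0)
print(round(U[-1], 1))   # exponential lifetimes: U(3) = 1 + 3*lam = 7
```

The same routine applied to a non-exponential density would exhibit the generally nonlinear behaviour of U away from the Poisson case.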
Example 4.1(c) Renewal equation with linear solution. As another important application of (4.1.8), consider the generator z(·) that corresponds to the solution Z(x) = λx (all x > 0), assuming such a solution function exists, and that λ^{−1} = EXn = ∫_0^∞ [1 − F(x)] dx is finite. Rearranging (4.1.8) yields
    z(x) = λx − λ ∫_0^x (x − y) dF(y) = λ ∫_0^x [1 − F(y)] dy.
We can recognize this expression as the distribution function of the forward recurrence time of a stationary point process. This argument identifies the only initial distribution for which the delayed renewal function is linear.
We conclude this section with a few brief remarks concerning the more general case where the random variables Xn are not necessarily nonnegative or even one-dimensional; thus we admit the possibility that the Xn are d-dimensional vectors for some integer d > 1. In such cases, the sequence {Sn} constitutes a random walk. Such a walk is said to be transient if (4.1.2) is finite for all bounded Borel sets A; otherwise, it is recurrent, in which case the walk revisits any nonempty open set infinitely often. Thus, it is only for transient random walks that (4.1.2) can be used to define a point process, which we shall call the random walk point process. In R^1, it is known that a random walk is transient if the mean E(X) is finite and nonzero; if E(X) exists but E(X) = 0, the random walk is recurrent. If the expectation is not defined (the integral diverges), examples of both kinds can occur. In R^2, the random walk can be transient even if E(X) = 0, but only if the variance is infinite. In higher dimensions, every random walk is transient unless perhaps it is concentrated on a one- or two-dimensional subspace. Proofs and further details are given, for example, in Feller (1966).
Most of the renewal equation results also carry over to this context with only nominal changes of statement but often more difficult proofs. Thus, the expectation or renewal measure may still be defined as in (4.1.4), namely
    U(A) = δ0(A) + Σ_{k=1}^∞ F^{k*}{A},    (4.1.4′)
and is finite for bounded Borel sets whenever the random walk is transient (but not otherwise, at least if A has nonempty interior). Furthermore, if z(x) is bounded, measurable, and vanishes outside a bounded set, we may consider the function
    Z0(x) = z(x) + Σ_{k=1}^∞ ∫_{R^d} z(x − y) F^{k*}(dy) = ∫_{R^d} z(x − y) U(dy),    (4.1.13)
which is then a solution, bounded on finite intervals, of the generalized renewal equation
    Z(x) = z(x) + ∫_{R^d} Z(x − y) F(dy).    (4.1.14)
Note that in (4.1.8) we were constrained not only to distributions F(·) concentrated on the half-line but also to functions z(x) and solutions Z(x) that could be taken as zero for x < 0. Without such constraints, the proof of uniqueness becomes considerably more subtle: one possible approach is outlined in Exercise 4.1.4. Note too that both (4.1.13) and (4.1.14) remain valid on replacing the argument x by a bounded Borel set A, provided Z(·) is then a set function uniformly bounded under translation for such A.

Example 4.1(d) Random walks with symmetric stable distributions. Here we define the symmetric stable distributions to be those distributions in R with characteristic functions of the form
    φα(s) = exp(−c|s|^α)    (0 < α ≤ 2).
Let us consider the associated random walks for the cases α ≤ 1, for which the first moment does not exist. The case α = 1 corresponds to the Cauchy distribution with density function, for some finite positive c,
    f(x) = c/[π(c^2 + x^2)]    (−∞ < x < ∞).
The nth convolution is again a Cauchy distribution with parameter cn = nc. If the renewal measure were well defined, we would expect it to have a renewal density
    u(x) = Σ_{n=1}^∞ f^{n*}(x) = (1/π) Σ_{n=1}^∞ cn/(c^2 n^2 + x^2).
The individual terms are O(n^{−1}) as n → ∞, so the series diverges. It follows readily that the first-moment measure is infinite, so the associated random walk is recurrent.
For α < 1, it is difficult to obtain a convenient explicit form for the density, but standard results for stable distributions imply that f^{n*} and f differ only by a scale factor,
    fα^{n*}(x) = n^{−1/α} fα(x n^{−1/α}),
so that, assuming fα is continuous at zero, fα^{n*}(x) ∼ n^{−1/α} fα(0) as n → ∞. Thus, the series Σ_n fα^{n*}(x) is convergent for 1/α > 1 (i.e. for α < 1) and divergent otherwise, so the associated random walk is transient only for α < 1.

Example 4.1(e) A renewal process in two dimensions. We consider independent pairs (Xn, Yn) where each pair has a bivariate exponential distribution with density vanishing except for x ≥ 0, y ≥ 0, where
    f(x, y) = [λ1λ2/(1 − ρ)] exp(−(λ1x + λ2y)/(1 − ρ)) I0(2(ρλ1λ2xy)^{1/2}/(1 − ρ)),
λ1, λ2, and ρ are positive constants, 0 ≤ ρ < 1, and In(x) is the modified Bessel function of order n defined by the series
    In(x) = Σ_{k=0}^∞ (x/2)^{2k+n}/[k! (k + n)!].    (4.1.15)
The marginal distributions are exponential with parameters λ1, λ2; ρ is the correlation between X1 and Y1; and the joint distribution has bivariate Laplace–Stieltjes transform
    ψ(θ, φ) = {(1 + θ/λ1)(1 + φ/λ2) − ρθφ/(λ1λ2)}^{−1}.
Much as in the one-dimensional case, the renewal function can be defined as
    U(x, y) = E(#{n: Sn ≤ x, Tn ≤ y}),
where Sn = Σ_{k=1}^n Xk and Tn = Σ_{k=1}^n Yk, and it has Laplace–Stieltjes transform χ(θ, φ) given by
    χ(θ, φ) = 1/(1 − ψ(θ, φ)).
Substituting for ψ(θ, φ) and simplifying, we obtain
    χ(θ, φ) − 1 = [θ/λ1 + φ/λ2 + (1 − ρ)θφ/(λ1λ2)]^{−1},
corresponding to the renewal density
    u(x, y) = [λ1λ2/(1 − ρ)] exp(−(λ1x + λ2y)/(1 − ρ)) I0(2(λ1λ2xy)^{1/2}/(1 − ρ))    (x > 0, y > 0).
It should be noted that while the renewal density has uniform marginals, corresponding to the fact that each marginal process is Poisson, the bivariate renewal density is far from uniform, and in fact as x → ∞ and y → ∞, it becomes relatively more and more intensely peaked around the line λ1x = λ2y, as one might anticipate from the central limit theorem. The example is taken from Hunter (1974a, b), where more general results can be found together with a bibliography of earlier papers on bivariate renewal processes. See also Exercise 4.1.5.
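Both examples admit quick numerical checks: the partial sums of the would-be Cauchy renewal density in Example 4.1(d) grow roughly like (log N)/π, while in Example 4.1(e) integrating u(x, y) over y should return λ1, since setting φ = 0 in χ(θ, φ) − 1 gives λ1/θ. A sketch (all parameter values, grids, and truncations are arbitrary choices):

```python
import math

# (i) Example 4.1(d): partial sums of u(x) = (1/pi) sum_n c n/(c^2 n^2 + x^2).
# The terms are O(1/n), so the sums grow without bound, like (log N)/pi.
c, x = 1.0, 0.5
s = 0.0
for N in (10, 1_000, 100_000):
    s = sum(c * n / (c * c * n * n + x * x) for n in range(1, N + 1)) / math.pi
    print(N, round(s, 2))

# (ii) Example 4.1(e): \int_0^infinity u(x, y) dy should equal lam1.
def bessel_i0(z, terms=80):
    # I_0(z) from the series (4.1.15) with n = 0, summed by recurrence.
    term, total, q = 1.0, 1.0, (z / 2.0) ** 2
    for k in range(1, terms):
        term *= q / (k * k)
        total += term
    return total

def u2(x, y, lam1, lam2, rho):
    # Bivariate renewal density of Example 4.1(e).
    arg = 2.0 * math.sqrt(lam1 * lam2 * x * y) / (1.0 - rho)
    return (lam1 * lam2 / (1.0 - rho)
            * math.exp(-(lam1 * x + lam2 * y) / (1.0 - rho)) * bessel_i0(arg))

lam1, lam2, rho, x0 = 1.0, 2.0, 0.5, 0.7
h, n = 0.002, 10_000   # trapezoidal rule on [0, 20]; the tail beyond is negligible
integral = h * (0.5 * u2(x0, 0.0, lam1, lam2, rho)
                + 0.5 * u2(x0, n * h, lam1, lam2, rho)
                + sum(u2(x0, i * h, lam1, lam2, rho) for i in range(1, n)))
print(round(integral, 3))   # close to lam1 = 1.0
```

The divergent partial sums illustrate recurrence of the Cauchy walk; the marginal integral illustrates the ‘uniform marginals’ remark above.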
Exercises and Complements to Section 4.1
4.1.1 (a) Using a sandwich argument and the strong law of large numbers for the i.i.d. sequence of lifetimes, prove that N(x)/x → λ a.s. as x → ∞.
(b) Deduce from (a) the Elementary Renewal Theorem: The renewal function U(x) satisfies U(x)/x → λ as x → ∞, i.e. U(x) ∼ λx. [Hint: See Smith (1958) and Doob (1948). This is not the only possible proof.]
(c) Similarly, if the lifetime distribution has finite second moment with variance σ^2, deduce from the central limit theorem for the Xn that as x → ∞, (N(x) − λx)/(λσ√(λx)) converges in distribution to a standard N(0, 1) random variable. [Hint: N(x) ≥ n if and only if Sn ≤ x, and if n, x → ∞ such that (x − n/λ)/(σ√n) → z for finite z, then λx/n → 1.]
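Part (a) can be illustrated by simulation for a concrete lifetime distribution. A rough sketch (the shifted-exponential lifetime law and run length are arbitrary choices):

```python
import random

# Monte Carlo sketch of 4.1.1(a): lifetimes are 0.25 + Exp(2), so
# E X = 0.25 + 0.5 = 0.75 and lambda = 1/0.75 = 4/3; N(x)/x -> lambda a.s.
rng = random.Random(42)
x_big, t, n = 50_000.0, 0.0, 0
while True:
    t += 0.25 + rng.expovariate(2.0)
    if t > x_big:
        break
    n += 1
rate = n / x_big
print(round(rate, 3))   # close to 4/3
```

By part (b), averaging such counts over many realizations would estimate U(x)/x just as well.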
4.1.2 Higher moments of the number of renewals.
(a) Show that for 0 < x < y < ∞, E[N(dx) N(dy)] = U(dx) U(dy − x), where U is the renewal measure. Similarly, for any finite sequence 0 < x1 < x2 < · · · < xk < ∞,
    E[N(dx1) · · · N(dxk)] = U(dx1) U(dx2 − x1) · · · U(dxk − xk−1).
[These are differential forms for the moment measures. When the densities exist, they reduce to the moment or product densities as discussed in Chapter 5; see, in particular, Example 5.4(b).]
(b) Prove directly that E[(N(0, x])^{[k]}] ≤ k! [U0(x)]^k < ∞, where n^{[k]} = n(n − 1) · · · (n − k + 1) and U0(x) = U(x) − 1.
(c) In terms of the renewal function U(x), use (a) to show that
    E[(N[0, x])^2] = U(x) + 2 ∫_{0−}^x U0(x − y) dU(y)
and hence that when the renewal process is simple,
    var N[0, x] = var N(0, x] = U0(x) + 2 ∫_{0+}^x [U0(x − y) − U0(y)] dU0(y).
Check that in the case of a Poisson process at rate λ, E[(N[0, x])^2] = 1 + 3λx + λ^2 x^2 and var N(0, x] = λx.
4.1.3 Let Q(z; x) = Σ_{n=0}^∞ z^n Pr{N[0, x] ≥ n}. Show that
    Q(z; x) = 1 + z ∫_0^x Q(z; x − y) dF(y)
and hence that the Laplace–Stieltjes transform is given by
    Q̃(z; θ) = ∫_{0−}^∞ e^{−θx} dx Q(z; x) = 1/(1 − zψ(θ)),
where ψ(θ) is the Laplace–Stieltjes transform of F. Obtain corresponding results for the p.g.f. P(z; x) = Σ_{n=0}^∞ z^n Pr{N[0, x] = n}. Deduce that the factorial moment E[(N[0, x])^{[k]}] is the k-fold convolution of U(x) − 1.
4.1.4 For the one-dimensional random walk with nonlattice step distribution F, prove that the only bounded measurable solutions of the equation
    D(x) = ∫_{−∞}^∞ D(x − y) F(dy)
are constant. An outline of one method is as follows.
(1°) Let Yn = D(−Sn), where Sn = Σ_{i=1}^n Xi. Use the equation to show that for any bounded measurable solution D, the random variables {Yn} constitute a bounded martingale (see Appendix 3) and hence converge a.s. to some limit random variable Y∞.
(2°) Since Y∞ is defined on the tail σ-algebra of the i.i.d. sequence {Xn}, it must be degenerate; that is, Y∞ = c for some finite real number c.
(3°) Since, with X1 independent of Sn, D(−X1 − Sn) =_d D(−Sn+1) → c a.s., deduce that E(D(−X1 − Sn) | X1) → c and hence, using the equation again, that D(−X1) = c a.s., whence also D(−Sn) = c a.s. for n = 1, 2, . . . . Thus, finally, D(x) = c a.e. whenever X has a nonlattice distribution. [Hint: See Doob, Snell and Williamson (1960); for an alternative proof, see Feller (1966, Section XI.2), and for a review, see Rao and Shanbhag (1986).]
4.1.5 Two-dimensional renewal process. In the context of Example 4.1(e), let N(x, y) = #{n: S_n ≤ x, T_n ≤ y}, where S_n = Σ_{i=1}^n X_i and T_n = Σ_{i=1}^n Y_i, and put

Q(z; x, y) = Σ_{n=0}^∞ z^n Pr{N(x, y) ≥ n},   P(z; x, y) = Σ_{n=0}^∞ z^n Pr{N(x, y) = n}.

Extend the result of Exercise 4.1.3 to show that the double Laplace–Stieltjes transform of P(z; x, y) is given by

P̃(z; θ, φ) = [1 − ψ(θ, φ)] / [1 − zψ(θ, φ)],   ψ(θ, φ) = ∫_0^∞ ∫_0^∞ e^{−θx−φy} d_{x,y}F(x, y).

For the particular bivariate exponential distribution in Example 4.1(e), the renewal measure has the density Σ_{n=1}^∞ f^{n∗}, where for x, y > 0,

f^{n∗}(x, y) = f(x, y) (ζ/ρ)^{n−1} I_{n−1}(2ζ/(1 − ρ)) / I_0(2ζ/(1 − ρ)),   ζ = (ρλ_1λ_2 xy)^{1/2}.
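As a quick numerical sanity check on the Poisson identities quoted before Exercise 4.1.3 (this sketch is our own illustration, not from the text; the function names and parameter choices are assumptions), the moments E[(N[0, x])²] = 1 + 3λx + λ²x² and var N(0, x] = λx can be estimated by simulation using only the standard library:

```python
import random

def poisson_count(lam, x, rng):
    """Draw N(0, x] for a rate-lam Poisson process by summing
    i.i.d. exponential lifetimes until the partial sum exceeds x."""
    t, n = 0.0, 0
    while True:
        t += rng.expovariate(lam)
        if t > x:
            return n
        n += 1

rng = random.Random(42)
lam, x = 2.0, 3.0
counts = [poisson_count(lam, x, rng) for _ in range(20000)]

# N[0, x] includes the renewal at the origin, so N[0, x] = 1 + N(0, x].
mean_sq = sum((1 + n) ** 2 for n in counts) / len(counts)
predicted = 1 + 3 * lam * x + (lam * x) ** 2   # 1 + 3*lam*x + (lam*x)^2

mean = sum(counts) / len(counts)
var = sum((n - mean) ** 2 for n in counts) / len(counts)   # should be near lam*x
print(mean_sq, predicted)
print(var, lam * x)
```

With 20,000 replications the two estimates typically agree with the closed forms to within a percent or so.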
4.2. Stationarity and Recurrence Times

A modified or delayed renewal process, {S_n'} say, is defined much as in (4.1.1) but with X_1 replaced by X_1', which is independent of, but not necessarily identically distributed with, the remaining variables X_2, X_3, . . . . Let F_1(x) = Pr{X_1' ≤ x}. Then, in terms of a forward recurrence time r.v. T_u for a renewal process as in Example 4.1(b), the forward recurrence time r.v. T_u' for such a process {S_n'} is defined by T_u' = inf{S_n': S_n' > u} − u and satisfies

T_u' =_d X_1' − u  if X_1' > u,   T_u' =_d T_{u−X_1'}  otherwise,   (4.2.1)

hence (see (4.1.10))

Pr{T_u' > y} = 1 − F_1(y + u) + ∫_0^u Pr{T_{u−v} > y} dF_1(v).   (4.2.2)
The most important delayed renewal process arises when X_1' has the probability density function

f_1(x) = λ[1 − F(x)]   (x ≥ 0),   λ^{−1} = E(X),   (4.2.3)

for then the resulting point process in (0, ∞), with counting measure N'(A) = #{n: S_n' ∈ A}, is stationary, as we might anticipate from (3.4.16) and Example 4.1(c). Note that here we are dealing with stationarity on the half-line, in the sense that Definition 3.2.I is required to hold only for Borel subsets of (0, ∞) and for shifts t ≥ 0.

To establish this stationarity property more formally, define another delayed renewal process, {S_n''} say, with initial lifetime r.v. X_1'' = T_u' that is followed by a further sequence of i.i.d. random variables with common d.f. F. Stationarity of {S_n'} is proved by showing that the distributions of the two sequences {S_n'} and {S_n''} coincide. From the assumed independence and distributional properties, it is enough to show that the distributions of the two initial intervals X_1' and X_1'' coincide; i.e. Pr{X_1' > y} = Pr{T_u' > y} for all nonnegative u and y. Using (4.2.2) and (4.1.11), Pr{T_u' > y} equals

λ ∫_{y+u}^∞ [1 − F(x)] dx + ∫_0^u ( ∫_{0−}^{u−v} [1 − F(y + u − v − w)] dU(w) ) λ[1 − F(v)] dv,   (4.2.4)

and the last term here equals

λ ∫_{0−}^u dU(w) ∫_0^{u−w} [1 − F(v)][1 − F(y + u − v − w)] dv
  = λ ∫_{0−}^u dU(w) ∫_0^{u−w} [1 − F(u − w − v)][1 − F(y + v)] dv
  = λ ∫_0^u [1 − F(y + v)] dv ∫_{0−}^{u−v} [1 − F(u − v − w)] dU(w)
  = λ ∫_0^u [1 − F(y + v)] dv,   using (4.1.12).

Substituting back in (4.2.4) and simplifying leads by (4.2.3) to Pr{T_u' > y} = λ ∫_y^∞ [1 − F(x)] dx = Pr{X_1' > y}, as required. These remarks prove the first part of the following proposition (see Exercise 4.2.2 for an alternative proof of this part).

Proposition 4.2.I. If the lifetime d.f. has finite first moment λ^{−1}, then the delayed renewal process with initial density (4.2.3) is stationary, and for all u > 0 the forward recurrence time T_u' has this density. If the mean of the lifetime distribution is infinite, then no delayed renewal process with this lifetime distribution can be stationary.

Proof. To prove the last statement, start by noting from the key renewal theorem, proved later as Theorem 4.4.II, that the forward recurrence time r.v. T_u for a renewal process {S_n} whose lifetime distribution has infinite
mean satisfies (see also Example 4.4(a)), for every finite y,

lim_{u→∞} Pr{T_u ≤ y} = 0.
Then, by dominated convergence, letting u → ∞ in (4.2.2) shows that, irrespective of the distribution F_1 of X_1', Pr{T_u' > y} → 1 for every y, so no stationary form for the distribution of T_u' is possible.

The intuitive interpretation of this somewhat paradoxical limit statement is that if λ^{−1} = ∞, we shall spend an ever greater proportion of time traversing intervals of exceptional length and find ourselves in a situation where the current interval has a length greater than y still to run.

Now recall from Exercise 3.4.1 the definition of a backward recurrence time r.v. B_u as a companion to the forward recurrence time r.v. T_u:

T_u = inf{y: N(u, u + y] > 0},   B_u = inf{x: N(u − x, u] > 0}.   (4.2.5)
Note that there is an asymmetry in the definitions of B_u and T_u: because N(·) is a.s. finite on bounded intervals, T_u > 0 a.s., but it is quite possible to have Pr{B_u = 0} > 0. The current lifetime r.v. L_u can then be defined by L_u ≡ B_u + T_u. The joint distribution of any two of these r.v.s thus gives the distribution of all three: the simplest is that of B_u and T_u, for which, when N(·) is stationary and orderly,

Pr{B_u > x, T_u > y} = Pr{N(u − x, u + y] = 0} = Pr{N(u, u + x + y] = 0}
  = Pr{T_u > x + y} = λ ∫_{x+y}^∞ [1 − F(v)] dv.   (4.2.6)
Note that under stationarity and orderliness, B_u has the same marginal d.f. as T_u, while

Pr{L_u > z} = ∫_0^z Pr{T_u > z − x, B_u ∈ (x, x + dx)} + Pr{B_u > z}
  = λ ∫_0^z [1 − F(x + z − x)] dx + λ ∫_z^∞ [1 − F(v)] dv
  = λ ∫_0^∞ [1 − F(max(v, z))] dv.   (4.2.7)

Thus,

E L_u = 2 E T_u = 2 E B_u = λ E X² = E X²/E X ≥ E X,   (4.2.8)
with equality only in the case where X = EX a.s.; that is, all lifetimes are equal to the same constant, when the renewal process is variously called a deterministic renewal process or a process of equidistant points. By identifying 1 − F (·) with q0 (·) in (3.4.9), equations (4.2.6–8) continue to hold for any stationary orderly point process as discussed in Section 3.4.
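The inequality E L_u ≥ E X in (4.2.8) is the familiar length-biased sampling (inspection paradox) effect, and it is easy to observe numerically. The sketch below is our own illustration, not from the text; the Gamma(2, 1) lifetime choice (E X = 2, E X² = 6, so E X²/E X = 3) and all names are assumptions:

```python
import random

def covering_interval(u, rng):
    """Return the length of the lifetime interval that covers time u
    in a renewal process with Gamma(2, 1) lifetimes (EX = 2, EX^2 = 6)."""
    t = 0.0
    while True:
        x = rng.gammavariate(2.0, 1.0)
        if t + x > u:
            return x
        t += x

rng = random.Random(1)
u = 50.0   # far enough out for the process to be close to stationary
mean_L = sum(covering_interval(u, rng) for _ in range(20000)) / 20000
# (4.2.8) predicts E L_u -> E X^2 / E X = 3, strictly above E X = 2.
print(mean_L)
```

The empirical mean of the covering interval settles near 3, half again as large as the mean lifetime 2, because long intervals are more likely to cover a fixed inspection time.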
Without the assumption of stationarity, we may use the alternative definition for B_u, namely B_u = u − sup{S_n: S_n ≤ u} (u ≥ 0). Arguing as in (4.1.10), it is not difficult to show (see Exercise 4.2.1) that for the basic renewal process {S_n},

Pr{B_u > x, T_u > y} = ∫_0^{(u−x)+} [1 − F(u + y − v)] dU(v).   (4.2.9)
In the case of a Poisson process, we have F(x) = 1 − e^{−λx}, and it is then not difficult to check from these relations that

E X < ∞ and the distribution of T_u is independent of u;   (4.2.10a)
E X < ∞ and B_u and T_u are independent for each u > 0;   (4.2.10b)
E T_u < ∞ (all u) and is independent of u.   (4.2.10c)
Properties such as (4.2.10) have been used to characterize the Poisson process amongst renewal processes, as detailed in part in Galambos and Kotz (1978). For example, when E T_u < ∞, integration of (4.1.10) shows that

E T_u = ∫_u^∞ [1 − F(y)] dy + ∫_0^u E(T_{u−v}) dF(v),

so that when (4.2.10c) holds,

[1 − F(u)] E T_u = [1 − F(u)] E T_0 = ∫_u^∞ [1 − F(y)] dy   (all u > 0).

Thus, F(y) = 1 − c e^{−λy} for some constant c = 1 − F(0+); since F(0+) = 0 for an orderly renewal process, c = 1. The proof of the rest of Proposition 4.2.II is indicated in Exercises 4.2.3–4.

Proposition 4.2.II. Any one of the statements (4.2.10a), (4.2.10b), and (4.2.10c) characterizes the Poisson process amongst orderly renewal processes.
Exercises and Complements to Section 4.2

4.2.1 By following the argument leading to (4.2.3), show that for an orderly renewal process N(·) for which N({0}) = 1 a.s.,

Pr{B_u > x, T_u > y} = Pr{N(u − x, u + y] = 0} = ∫_{0−}^{(u−x)+} [1 − F(y + u − v)] dU(v),

Pr{L_u > z} = ∫_{0−}^u [1 − F(max(z, u − v))] dU(v).
4.2.2 Suppose that the delayed renewal process {S_n'} with counting function N'(·) and lifetime distribution F(·) with finite mean λ^{−1} is stationary. Show that X_1' must have the density (4.2.3). [Hint: Stationarity implies that EN'(0, x] = λx (all x > 0); now use Example 4.1(c).]

4.2.3 Use (4.1.10) to show that (4.2.10a) characterizes the Poisson process among orderly renewal processes.
4.2.4 Use (4.2.9) with x ↑ u to deduce that when (4.2.10b) holds,

Pr{T_u > y} = [1 − F(y + u)] / [1 − F(u)]

for each u and y ≥ 0. Consequently, for all v in the support of U(·),

[1 − F(0+)][1 − F(y + v)] = [1 − F(y)][1 − F(v)],

so that F(·) is either geometric or exponential. If F(x) is constant for 0 < x < δ, then B_u and T_u cannot be independent; hence the characterization in Proposition 4.2.II via (4.2.10b).

4.2.5 For a renewal process with lifetime d.f. F(x) = 1 − (1 + µx)e^{−µx}, evaluate the renewal function as U(x) = 1 + ½µx − ¼(1 − e^{−2µx}), and hence derive the d.f.s of the forward and backward recurrence time r.v.s T_u and B_u. Verify their asymptotic properties for u → ∞.
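The closed form in Exercise 4.2.5 can be checked by simulation. In the sketch below (our own check, not part of the text; the names and parameters are assumptions), the lifetime d.f. 1 − (1 + µx)e^{−µx} is the Gamma(2, 1/µ) distribution, and U(x) is estimated as the mean number of renewal epochs S_n ∈ [0, x], counting S_0 = 0:

```python
import math
import random

def renewal_count(x, mu, rng):
    """Number of renewal epochs S_n in [0, x] (S_0 = 0 included) for
    lifetime d.f. F(x) = 1 - (1 + mu*x)*exp(-mu*x), i.e. Gamma(2, 1/mu)."""
    t, n = 0.0, 1
    while True:
        t += rng.gammavariate(2.0, 1.0 / mu)
        if t > x:
            return n
        n += 1

mu, x = 1.0, 5.0
rng = random.Random(7)
estimate = sum(renewal_count(x, mu, rng) for _ in range(20000)) / 20000
closed_form = 1 + mu * x / 2 - (1 - math.exp(-2 * mu * x)) / 4   # U(x) of Ex. 4.2.5
print(estimate, closed_form)
```

For µ = 1 and x = 5 both values come out near 3.25, illustrating the exercise's formula.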
4.3. Operations and Characterizations

Because a single d.f. F suffices to describe a renewal or stationary renewal process, it is of interest to ask, in various contexts involving the manipulation of point processes, what conditions lead again to a renewal process as a result of the transformation or operation concerned. More often than not, the solution to such a question is a characterization of the Poisson process, a conclusion that can be disappointing when it might otherwise be hoped that more general renewal processes could be realized. Roughly speaking, when such a Poisson process characterization holds, it indicates that the interval independence property of a renewal process can be preserved only as a corollary of the stronger lack-of-memory property of the Poisson process. We have already given examples of characterizations of the Poisson process in Proposition 4.2.II. The three operations considered in this section concern thinning, superposition, and infinite divisibility.

Example 4.3(a) Thinning of renewal processes. Given a renewal process {S_n}, let each point S_n for n = 1, 2, . . . be omitted from the sequence with probability 1 − α and retained with probability α for some constant α with 0 < α < 1, each such point S_n being treated independently. This independence property means that if {S_{n(r)}, r = 1, 2, . . .} is the sequence of retained points with 0 = n(0) < n(1) < n(2) < · · · , then {N_r} ≡ {n(r) − n(r − 1)} is a family of i.i.d. positive integer-valued r.v.s with Pr{N_r = j} = α(1 − α)^{j−1} for j = 1, 2, . . . , and hence

{Y_r} ≡ {S_{n(r)} − S_{n(r−1)}}   (4.3.1)
is a family of i.i.d. r.v.s with d.f.

Pr{Y_r ≤ x} = Σ_{j=1}^∞ α(1 − α)^{j−1} F^{j∗}(x).
Consequently, {S_{n(r)}} is still a renewal process, and it is not hard to verify that its renewal function, U_α say, is related to that of {S_n} by rescaling as in

U_α(x) − 1 = α[U(x) − 1].   (4.3.2)

It is readily seen that whenever {N_r} here is a family of i.i.d. positive integer-valued r.v.s, {S_{n(r)}} is a renewal process, but it is only for the geometric distribution for N_r that (4.3.2) holds. In connection with this equation, the converse question can be asked as to when it can be taken as defining a renewal function for α > 1. In general, for a given renewal function U, there is a finite largest α ≥ 1 for which 1 + α(U(x) − 1) is a renewal function, although there is a class of lifetime d.f.s, including the exponential and others besides, for which 1 + α(U(x) − 1) is a renewal function for all finite positive α [Daley (1965); see also van Harn (1978) and Exercise 4.3.1].

Any renewal function U satisfies U(x)/λx → 1 as x → ∞, and consequently the renewal function U_α of the thinned renewal process {S_{n(r)}}, when rescaled so as to have the same mean lifetime, becomes U_α^s, say, defined by

U_α^s(x) − 1 = α[U(x/α) − 1] → λx   (α ↓ 0).

Thus, if U_α^s is independent of α, it must equal the renewal function of a Poisson process, which is therefore the only renewal process whose renewal function is preserved under thinning and rescaling, i.e. with U_α^s = U (all 0 < α < 1).
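Relation (4.3.2) is also easy to check numerically. The sketch below is our own illustration (names and parameters assumed, not from the text); it reuses the Gamma(2, 1) lifetime of Exercise 4.2.5, for which U(x) = 1 + x/2 − (1 − e^{−2x})/4, and retains each renewal epoch independently with probability α:

```python
import math
import random

def thinned_count(x, alpha, rng):
    """Number of retained points in (0, x] when each epoch of a
    Gamma(2, 1)-lifetime renewal process is kept independently w.p. alpha."""
    t, kept = 0.0, 0
    while True:
        t += rng.gammavariate(2.0, 1.0)
        if t > x:
            return kept
        if rng.random() < alpha:
            kept += 1

alpha, x = 0.4, 5.0
rng = random.Random(3)
est_U_alpha = 1 + sum(thinned_count(x, alpha, rng) for _ in range(20000)) / 20000
U = 1 + x / 2 - (1 - math.exp(-2 * x)) / 4   # renewal function for mu = 1
# (4.3.2): U_alpha(x) - 1 should equal alpha * (U(x) - 1)
print(est_U_alpha - 1, alpha * (U - 1))
```

Both sides of (4.3.2) agree here because independent thinning keeps each of the expected U(x) − 1 renewals in (0, x] with probability α; the deeper content of the example is that only the geometric retention pattern preserves the renewal property in this way.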
Lukacs, 1970) independent random variables can have their sum Poisson-distributed only if every component of the sum is Poisson-distributed also, it follows from writing N (A) = N1 (A) + · · · + Nr (A) (all Borel sets A) and appealing to Renyi’s characterization in Theorem 2.3.II that if N is a Poisson process, then so also is each Nj . Because a renewal process is characterized by its renewal function, and this is linear only if the process is Poisson, one way of proving each of the two assertions below is to show that the renewal function concerned is linear. Proposition 4.3.I. A stationary renewal process is the superposition of two independent nontrivial stationary renewal processes only if the processes are Poisson.
Proposition 4.3.II. A stationary renewal process is the superposition of r ≥ 2 independent identically distributed stationary renewal processes only if the processes are Poisson.

Proof. We start by allowing the renewal processes N_j to have possibly different lifetime d.f.s F_j, denoting each mean by λ_j^{−1}, so by Proposition 4.1.I, each λ_j is finite and positive. Write λ = λ_1 + · · · + λ_r, p_j = λ_j/λ, π_j = F_j(0+), and π = F(0+), where F is the lifetime d.f. of the superposed process N. For any such renewal process, we have, for small h > 0 and |z| ≤ 1,

E(z^{N(0,h]}) = 1 − λh(1 − z)/[(1 − π)(1 − zπ)] + o(h)
  = Π_{j=1}^r E(z^{N_j(0,h]}) = Π_{j=1}^r ( 1 − λ_j h(1 − z)/[(1 − π_j)(1 − zπ_j)] + o(h) ).

It follows by equating powers of z that for i = 1, 2, . . . ,

lim_{h↓0} Pr{N(0, h] = i | N(0, h] > 0} = π^{i−1}(1 − π) = (1 − π) λ^{−1} Σ_{j=1}^r λ_j π_j^{i−1}.
All these equations can hold for nonzero π and π_j (and nonzero λ) only if π = π_j for j = 1, . . . , r; that is, only if all renewal processes concerned have the same probability of zero lifetimes. Consequently, it is enough to establish the propositions in the orderly case, which we assume to hold from here on. In place of the renewal function U in (4.1.5), we use

H(x) = Σ_{n=1}^∞ F^{n∗}(x),  so H(x) = λx for a Poisson process.   (4.3.4)

Then, from (3.5.3), for a stationary renewal process N,

var N(0, x) = var N(0, x] = λ ∫_0^x [2H(u) + 1] du − (λx)² = λ ∫_0^x ( 2[H(u) − λu] + 1 ) du ≡ V(x),

and thus

cov(N[−x, 0), N(0, y]) = ½[V(x + y) − V(x) − V(y)] = λ ∫_0^y [G(x + u) − G(u)] du,

where G(x) = H(x) − λx. It is convenient to write below, for r.v.s Y for which the limits exist,

E_0(Y) = lim_{h↓0} E(Y | N(0, h] > 0).

Since p_j = lim_{h↓0} Pr{N_j(0, h] > 0 | N(0, h] > 0},
H(x) = E_0(N(0, x] | N({0}) > 0)
  = lim_{h→0} Σ_{j=1}^r [ Pr{N_j(−h, 0] > 0} / Pr{N(−h, 0] > 0} ] [1 + o(1)] E( Σ_{i=1}^r N_i(0, x] | N_j({0}) > 0 )
  = Σ_{j=1}^r p_j ( H_j(x) + Σ_{i≠j} λ_i x ),   (4.3.5)

so G(x) = Σ_{j=1}^r p_j G_j(x). Similar, somewhat lengthier, algebra leads to

G(x, y) ≡ lim_{h→0} E_0( [N(−x, 0) − λx][N(0, y) − λy] | N({0}) > 0 )
  = Σ_{j=1}^r p_j G_j(x, y) + λ ∫_0^y ( [G(x + u) − G(u)] − Σ_{j=1}^r p_j² [G_j(x + u) − G_j(u)] ) du.

Thus, when N_1, . . . , N_r are identically distributed, p_j = 1/r, G_j(x) = G_1(x) (all j), and G_1(x) = G(x). Also, for a renewal process, G(x, y) = G(x)G(y), so

G(x)G(y) = G(x)G(y) + λ(1 − 1/r) ∫_0^y [G(x + u) − G(u)] du.

It follows that G(x + y) = G(y) = G(0) (all x, y > 0). Thus, H(x) = λx, and Proposition 4.3.II is proved.

On the other hand, for r = 2 and possibly different F_1 and F_2, replacing G(x, y) by G(x)G(y) with G(x) = p_1G_1(x) + p_2G_2(x), p_1 + p_2 = 1, leads to

−p_1p_2 [G_1(x) − G_2(x)][G_1(y) − G_2(y)] = λ p_1p_2 ∫_0^y [G_1(x + u) + G_2(x + u) − G_1(u) − G_2(u)] du.
The function K(y) ≡ G_1(y) − G_2(y) thus has a right-derivative k(·) given by

−K(x)k(y) = λ[G_1(x + y) + G_2(x + y) − G_1(y) − G_2(y)].

Either K(x) ≡ 0, in which case G_1 = G_2 and the earlier argument shows that G(x) = 0, or else, by letting y ↓ 0 and using G_1(0) = G_2(0) = 0, it follows that G_1(x) is proportional to G_2(x), with G_1(x) having the derivative g_1(x), say. Consequently, g_1(x)g_1(y) = αg_1(x + y) for some nonzero α, so g_1(x) = αe^{−βx} for some 0 < β < ∞ because G_1(x)/x → 0 as x → ∞. Transform calculus now shows that each 1 − F_j(u) = e^{−b_j u}.

An earlier version of Proposition 4.3.I is in McFadden and Weissblum (1963), and a different proof is in Mecke (1969). Another argument is used in Mecke (1967) to prove the following result (the proof is omitted here).

Proposition 4.3.III. Let the stationary renewal process N be the superposition of the independent stationary point processes N_1 and N_2 with N_1 renewal. If the lifetime d.f.s F and F_1 of N and N_1 have density functions that are continuous on (0, ∞) and right-continuous at 0, then N_1 is a Poisson process.
By taking N_1 to be Poisson with rate parameter λ and N_2 to be an alternating renewal process with exponential distributions for the alternating lifetime d.f.s, their parameters α and β being such that λ² = αβ, Daley (1973a) furnished an example showing that Mecke's result cannot characterize N_2 as a Poisson process. If only the differentiability assumptions could be omitted, the restriction in Proposition 4.3.II that the components N_j of the sum N at (4.3.3) should be identically distributed could be dropped.

Example 4.3(c) Infinite divisibility. A natural complement to Example 4.3(b) is to ask whether there are any stationary renewal processes other than the Poisson that are infinitely divisible. Here we ask whether, for (any or all) integers r, the stationary renewal process N in (4.3.3) is expressible as the superposition of i.i.d. stationary point processes N_1, . . . , N_r. Assuming that the lifetime distribution concerned has a density function, [MKM] state that Häberlund (1975) proved that the Poisson process is the only one, while under the additional assumption of the existence of density functions for all the joint distributions of the component process N_1, Ito (1980) has asserted the stronger result that if N is expressible as N = N_1 + · · · + N_r for one integer r ≥ 2, then it is Poisson and hence infinitely divisible.

There are innumerable characterizations of the exponential distribution and Poisson process (see reviews in Galambos and Kotz (1978) and Johnson and Kotz (1994, Section 19.8)). Fosam and Shanbhag (1997) have a useful list of papers exploiting variants of the Choquet–Deny functional equation approach.
Exercises and Complements to Section 4.3

4.3.1 (a) When F(x) = 1 − (1 + x)e^{−x}, show (e.g. by using Laplace–Stieltjes transforms) that 1 + α(U(x) − 1) is a renewal function if and only if 0 < α ≤ 1.
(b) Let {X(t): t ≥ 0} be a stochastic process with X(0) = 0 and stationary nonnegative independent increments, with Lévy–Khinchin representation E(e^{−θX(t)}) = e^{tψ(θ)}, where

ψ(θ) = −θµ_0 + ∫_{(0,∞)} (e^{−θx} − 1) µ(dx),

with µ_0 ≥ 0 and µ(·) a nonnegative measure on (0, ∞) satisfying ∫_{(0,∞)} min(x, 1) µ(dx) < ∞, and µ(0, ∞) = ∞ if µ_0 = 0. Let 0 = t_0 < t_1 < · · · be the successive epochs of a Poisson process in (0, ∞) with unit intensity, so that the r.v.s X(t_n) − X(t_{n−1}) are i.i.d. with d.f. F(x) = ∫_0^∞ F(x, t)e^{−t} dt, where F(x, t) = Pr{X(t) ≤ x}. Show that, with U(·) the renewal function corresponding to F and U_0(x) = U(x) − 1, 1 + αU_0(x) is a renewal function for all 0 < α < ∞, and that U_0(x) is subadditive (see Kingman, 1972, p. 100).
4.3.2 Let the stationary point process N1 arise as the jump epochs of a Markov process on countable state space, and let N2 be a stationary Poisson process independent of N1 . Daley (1975b) showed that for N ≡ N1 + N2 to be a stationary renewal process different from Poisson, not only must the Markov chain transition rates underlying N1 have a particular structure but also there is a unique rate λ for N2 for which N can have the renewal property.
4.4. Renewal Theorems

Considerable effort has been expended in the mathematics of renewal theory on establishing Theorem 4.4.I below and its equivalents; they are stronger statements than the elementary renewal theorem [i.e. the property U(x) ∼ λx given in Exercise 4.1.1(b), of which there is a generalization in (3.5.3)]. Theorem 4.4.I is variously known as Blackwell's renewal theorem or the key renewal theorem, depending basically on how it is formulated.

Theorem 4.4.I (Blackwell's Renewal Theorem). For fixed positive y, restricted to finite multiples of the span of the lattice when the lifetime d.f. is lattice, and otherwise arbitrary,

U(x + y) − U(x) → λy   (x → ∞).   (4.4.1)
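Blackwell's theorem is easy to illustrate numerically. The following sketch is our own, not part of the text; the Uniform(0, 2) lifetime (nonlattice, with mean 1 and hence λ = 1) and all names are assumptions. It estimates U(x + y) − U(x) as the mean number of renewals in a window far from the origin:

```python
import random

def window_count(x, y, rng):
    """Number of renewal epochs in (x, x + y] for i.i.d. Uniform(0, 2)
    lifetimes (mean 1, so lambda = 1)."""
    t, n = 0.0, 0
    while t <= x + y:
        t += rng.uniform(0.0, 2.0)
        if x < t <= x + y:
            n += 1
    return n

rng = random.Random(11)
x, y = 100.0, 2.5
est = sum(window_count(x, y, rng) for _ in range(5000)) / 5000
# Blackwell: U(x + y) - U(x) -> lambda * y = 2.5 as x -> infinity
print(est)
```

For x = 100 the estimate is already very close to λy, even though the uniform lifetime is far from exponential.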
Equation (4.4.1) says roughly that the renewal measure ultimately behaves like a multiple of Lebesgue measure. To make this more precise, let S_tU denote the shifted version of the renewal measure U so that S_tU(A) = U(t + A). Then (4.4.1) implies that on any finite interval (0, M), S_tU converges weakly to the multiple λℓ of Lebesgue measure ℓ(·) (or, equivalently, S_tU as a whole converges vaguely to λℓ; see Section A2.3 for definitions and discussion of weak and vague convergence). Blackwell's theorem represents the 'set' form of the criterion for weak convergence, while the key renewal theorem (Theorem 4.4.II below) represents a strengthened version of the corresponding 'function' form, the strengthening taking advantage of the special character of the limit measure and its approximants. On the other hand, the theorem is not so strong as to assert anything concerning a density u(·) for U. Such results require further assumptions about the lifetime distributions and are explored, together with further strengthenings of Blackwell's theorem, following Theorem 4.4.II.

Proof of Theorem 4.4.I. The proof given here is probabilistic and uses a coupling method [see Lindvall (1977, 1992) and Thorisson (2000, Section 2.8)]. We compare each sample path {S_n} with the sample path {S_n'} of a stationary renewal process as defined in Section 4.2, {S_n} and {S_n'} being defined on a common probability space (Ω, F, P) so as to be mutually independent. For each ω ∈ Ω, and every integer i ≥ 0, define for {S_n} the forward recurrence time r.v.s Z_i(ω) = T'_{S_i(ω)} so that

Z_i(ω) = min{S_j'(ω) − S_i(ω): S_j'(ω) > S_i(ω)}.

Because the sequence {S_{i+n} − S_i} has a distribution independent of i and is independent of {S_n'}, and because T_u' is stationary, it follows that the sequence {Z_i} is also stationary. Thus, the events

A_i ≡ {Z_j < δ for some j ≥ i},
which we define for any fixed δ > 0, have the same probability for each i = 0, 1, . . . , and in particular therefore P(A_0) = P(A_∞), where

A_0 ⊇ A_1 ⊇ · · · ⊇ A_∞ ≡ ∩_{i=1}^∞ A_i = {Z_i < δ i.o.}.
Now A_∞ is a tail event on the conditional σ-field (namely, conditional on X_1') of the i.i.d. r.v.s {X_1, X_1', X_2, X_2', . . .} and therefore, by the zero–one law for tail events (see e.g. Feller, 1966, Section IV.6), for ℓ-a.e. x,

P(A_∞ | X_1' = x) = 0 or 1   (0 < x < ∞).

Because F is nonlattice, P{u − x < S_j − X_1 < u − x + δ for some j} is positive for all sufficiently large u for fixed δ > 0 (see Feller, 1966, Section V.4a, Lemma 2), and hence P(A_0 | X_1' = x) > 0 for every x. Thus, the equations

0 < λ ∫_0^∞ P(A_0 | X_1' = x)[1 − F(x)] dx = P(A_0) = P(A_∞) = λ ∫_0^∞ P(A_∞ | X_1' = x)[1 − F(x)] dx

force P(A_∞ | X_1' = x) = 1 for every x for which F(x) < 1. Hence, P(A_∞) = 1 = P(A_0), so that for every δ > 0, P{Z_i < δ for some i} = 1.

To establish (4.4.1), it is enough to show that, for any δ > 0, we can find x_0 such that x ≥ x_0 implies that |EN(x, x + y] − λy| ≤ δ. Observe that λy = EN'(x, x + y], where N' is the counting function for the stationary renewal process with intervals {X_n'}. Let I_δ = inf{i: Z_i < δ}, so that P{I_δ < ∞} = 1. Defining J ≡ inf{j: S_j'(ω) > S_{I_δ}(ω)}, we then have

0 < Z_{I_δ}(ω) = S_J'(ω) − S_{I_δ}(ω) < δ.

Define a new point process by means of the sequence of intervals

{X_1, . . . , X_{I_δ}, X'_{J+1}, X'_{J+2}, . . .},
and denote its counting function by N'' so that, for any Borel set A,

N''(A) = N(A ∩ (0, S_{I_δ})) + N'((A + Z_{I_δ}) ∩ (S_J', ∞))
  = N(A ∩ (0, S_{I_δ})) + N'(A + Z_{I_δ}) − N'((A + Z_{I_δ}) ∩ (0, S_J')).

When A is the interval (x, x + y], the shifted interval A + Z_{I_δ} has EN'(A + Z_{I_δ}) lying between λ(y − δ) and λ(y + δ) because

(x + δ, x + y] ⊆ (x + Z_{I_δ}, x + y + Z_{I_δ}] ⊆ (x, x + y + δ].

For every x, the r.v.s N(x, x + y] are stochastically dominated by the r.v. 1 + N(0, y], and since this has finite expectation, {N(x, x + y]: x ≥ 0} is a
uniformly integrable family of r.v.s. This ensures that, as x → ∞,

E( N(x, x + y] I_{x < S_{I_δ}} ) → 0,

since then P{x < S_{I_δ}} → 0. Similarly, N'(x + Z_{I_δ}, x + y + Z_{I_δ}] is stochastically dominated by 1 + N'(0, y] and P{x < S_J'} → 0 as x → ∞, so E( N'(x + Z_{I_δ}, x + y + Z_{I_δ}] I_{x < S_J'} ) → 0. Consequently, for x sufficiently large, U(x + y) − U(x) = EN(x, x + y] is arbitrarily close to EN'(A + Z_{I_δ}), and since δ is an arbitrary positive number, (4.4.1) is established.

We now turn to an equivalent but very important form of Theorem 4.4.I for nonlattice lifetimes. A function g(·) defined on [0, ∞) is directly Riemann integrable there when, for any h > 0, the normalized sums

h Σ_{n=1}^∞ g_−^h(nh)   and   h Σ_{n=1}^∞ g_+^h(nh)

converge to a common finite limit as h → 0; here,

g_−^h(x) = inf_{0≤δ≤h} g(x − δ),   g_+^h(x) = sup_{0≤δ≤h} g(x − δ).
Exercise 4.4.1 states sufficient conditions for g to be directly Riemann integrable. For such a function, with U(x) ≡ 0 for x < 0 and monotonically increasing on x ≥ 0,

Σ_{n=1}^∞ g_−^h(nh) [U(x − (n − 1)h) − U(x − nh)] ≤ ∫_0^x g(x − y) dU(y) ≤ Σ_{n=1}^∞ g_+^h(nh) [U(x − (n − 1)h) − U(x − nh)].

These sums can be truncated to finite sums, with truncation error bounded by

∫_0^{x−C} |g(x − y)| dU(y) ≤ Σ_{n=1}^{[x−C]} |g|_+^1(C + n) [U(x + 1 − C − n) − U(x − C − n)] ≤ U(1) Σ_{n=1}^∞ |g|_+^1(C + n),

which can be made arbitrarily small, uniformly in x > 0, by taking C sufficiently large. Thus, the sums are approximated by

Σ_{n=1}^{[C/h]} g_−^h(nh) [U(x − nh + h) − U(x − nh)] ≤ ∫_{x−C}^x g(x − y) dU(y) ≤ Σ_{n=1}^{[C/h]} g_+^h(nh) [U(x − nh + h) − U(x − nh)],

and the bounding sums satisfy

Σ_{n=1}^{[C/h]} g_±^h(nh) [U(x − nh + h) − U(x − nh)] → λh Σ_{n=1}^{[C/h]} g_±^h(nh)   (x → ∞)
  → λ ∫_0^C g(u) du   (h → 0).
The following equivalent form of Theorem 4.4.I can now be given.

Theorem 4.4.II (Key Renewal Theorem). For nonlattice lifetime distributions and directly Riemann integrable functions g(·),

∫_0^x g(x − y) dU(y) → λ ∫_0^∞ g(y) dy   (x → ∞).   (4.4.2)
Some results for monotonically decreasing but not necessarily integrable functions g(·) are sketched in Exercise 4.4.5(c). The following examples may serve as prototypes for the application of the renewal theorem to problems of convergence to equilibrium.

Example 4.4(a) Convergence of the forward recurrence time distribution. Our starting point is (4.1.11), which after subtracting from (4.1.12) can be written

F_u(y) ≡ Pr{T_u ≤ y} = ∫_{0−}^u [F(y + u − v) − F(u − v)] dU(v).   (4.4.3)

This is in the form (4.4.2) with g(x) = F(y + x) − F(x). This function is integrable and of bounded variation over the whole half-line; it then follows easily (see Exercise 4.4.1) that the function is directly Riemann integrable, so that the theorem can be applied. It asserts that, provided the lifetime distribution is nonlattice,

F_u(y) → λ ∫_0^∞ [F(y + x) − F(x)] dx = λ ∫_0^y [1 − F(v)] dv   (u → ∞).
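This limit is easy to observe by simulation. In the sketch below (our own check, not from the text; the lifetime choice and all names are assumptions), the lifetimes are Gamma(2, 1), so λ = 1/2 and 1 − F(v) = (1 + v)e^{−v}, and the limiting d.f. evaluates in closed form to 1 − (1 + y/2)e^{−y}:

```python
import math
import random

def forward_recurrence(u, rng):
    """T_u = S_N - u for the first renewal epoch S_N > u,
    with Gamma(2, 1) lifetimes."""
    t = 0.0
    while t <= u:
        t += rng.gammavariate(2.0, 1.0)
    return t - u

rng = random.Random(5)
u, y = 40.0, 1.0
frac = sum(forward_recurrence(u, rng) <= y for _ in range(20000)) / 20000
# limit: lambda * int_0^y [1 - F(v)] dv = 1 - (1 + y/2) * exp(-y)
limit = 1 - (1 + y / 2) * math.exp(-y)
print(frac, limit)
```

At u = 40 the empirical d.f. of T_u is already indistinguishable, at Monte Carlo accuracy, from the integrated-tail limit.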
If λ^{−1} < ∞, this is the usual form of the length-biased distribution associated with F, the fact that the distribution is proper following from the identity 1 = λ ∫_0^∞ [1 − F(v)] dv. In this case, (4.4.2) asserts directly that the forward recurrence time distribution converges weakly to its limit form. The extension of this result to a delayed renewal process with arbitrary initial distribution then follows from (4.4.4). When λ^{−1} = ∞, F_u(y) → 0 for all y and no stationary form can exist.

Example 4.4(b) Convergence of the renewal density. As a further corollary, we shall prove (see Feller, 1966, Section XI.4) that if the lifetime distribution F has finite mean and bounded density f(t), then U(t) has a density u(t) such that

u(t) − f(t) → λ.   (4.4.4)

This follows from the fact that u(t), when it exists, satisfies the renewal equation in its traditional form

u(t) = f(t) + ∫_0^t u(t − x) f(x) dx.

[To check this, note that equation (4.1.9) implies that the solution has the form u(s) = ∫_0^s f(s − x) dU(x), which on integrating yields ∫_0^t u(s) ds = U(t) − 1.]
Moreover, the function

u(t) − f(t) = Σ_{k=2}^∞ f^{k∗}(t)

satisfies the renewal equation

u(t) − f(t) = f^{2∗}(t) + ∫_0^t [u(t − x) − f(t − x)] f(x) dx.   (4.4.5)

Now if f(t) is bounded, f^{2∗}(t) is directly Riemann integrable. Indeed, as the convolution of a bounded and an integrable function, it is uniformly continuous (Exercise 4.4.2), while the inequality

f^{2∗}(t) = ∫_0^{t/2} f(t − y)f(y) dy + ∫_{t/2}^t f(t − y)f(y) dy = 2 ∫_0^{t/2} f(t − y)f(y) dy ≤ 2C[1 − F(½t)],

where C = sup |f(t)|, shows that when µ = λ^{−1} < ∞, f^{2∗}(t) is also bounded above by an integrable monotonic function and is therefore directly Riemann integrable by Exercise 4.4.1(c). Thus, Theorem 4.4.II applies, yielding (4.4.4). The argument can be extended to the case where, if not f itself, at least one of its convolution powers has a bounded density (see Exercise 4.4.3).

Even a partial assumption of absolute continuity allows the conclusions of the renewal theorems to be substantially strengthened: for example, from local weak convergence of the renewal measure to local convergence in variation norm, namely

‖S_tU − λℓ‖_M → 0,   (4.4.6)

where ‖µ‖_M is the variation norm of the (signed) measure µ over [0, M]. Equation (4.4.6) would imply that, in Blackwell's theorem, U(t + A) → λℓ(A) not only for A an interval, as in (4.4.1), but for any bounded Borel A, a strengthening considered by Breiman (1965) [see Feller (1966, Section XI.1) for counterexamples]. An appropriate condition is embodied in the following definition.

Definition 4.4.III. A probability distribution F is spread out if there exists a positive integer n_0 such that F^{n_0∗} has a nonzero absolutely continuous component with respect to Lebesgue measure.

The definition implies that F^{n_0∗} can be written in the form

F^{n_0∗} = Σ + A,   (4.4.7)
where Σ is singular and A is absolutely continuous with respect to Lebesgue measure, and A has a nonzero density a(x), so that

σ = ‖Σ‖ = 1 − ∫_0^∞ a(x) dx < 1.

Since the convolution of A with any power of F or Σ is again absolutely continuous, it follows that the total masses of the absolutely continuous components of F^{n∗} can only increase as n → ∞, and in fact must approach 1, since ‖Σ^{k∗}‖ = σ^k → 0. Thus, we might anticipate that the asymptotic behaviour of the renewal measure for a spread out distribution would approximate the behaviour to be expected when a density exists. This is the broad content of the following proposition (see Stone, 1966), from which our further results will follow as corollaries.

Proposition 4.4.IV. Let F be spread out, U the renewal measure associated with F, and U_G = G ∗ U the renewal measure associated with the corresponding delayed renewal process with initial distribution G. Then U_G can be written in the form

U_G = U_{1G} + U_{2G},   (4.4.8)

where U_{1G} is absolutely continuous with density u_{1G}(x) satisfying

u_{1G}(x) → λ,   λ^{−1} = ∫_0^∞ x dF(x),   (4.4.9)

and U_{2G} is totally finite.

Proof. Consider first the ordinary renewal measure U associated with F. Since the convolution of A with itself can always be taken to dominate a uniformly continuous function (Exercise 4.4.2), there is no loss of generality in supposing that the density a(x) of A in (4.4.7) is continuous, bounded, and vanishes outside some finite interval (0, M). With this understanding, let U_3 denote the renewal measure associated with the distribution F^{n_0∗} so that we may write

U_3 = δ_0 + F^{n_0∗} + F^{2n_0∗} + · · ·

and

U = [δ_0 + F + F^{2∗} + · · · + F^{(n_0−1)∗}] ∗ U_3 = ρ ∗ U_3,

where ρ has total mass n_0. Also, since U_3 satisfies the renewal equation U_3 = δ_0 + F^{n_0∗} ∗ U_3 = δ_0 + (Σ + A) ∗ U_3, we have

U_3 ∗ (δ_0 − Σ) = δ_0 + A ∗ U_3.

Since Σ has total mass σ < 1, the factor δ_0 − Σ may be inverted to yield

U_3 = U_σ + A ∗ U_σ ∗ U_3,

where U_σ = δ_0 + Σ + Σ^{2∗} + · · · has total mass (1 − σ)^{−1}. Thus, we obtain for U, and then for U_G,

U_G = G ∗ ρ ∗ U_σ + A ∗ G ∗ ρ ∗ U_σ ∗ U_3.

This will serve as the required decomposition, with U_{2G} = G ∗ ρ ∗ U_σ totally finite and U_{1G} = A ∗ G ∗ ρ ∗ U_σ ∗ U_3 absolutely continuous, since it is a
convolution in which one of the terms is absolutely continuous. To show that its density has the required properties, we note first that the key renewal theorem applies to U_3 in the form

(U_3 ∗ g)(t) → (λ/n_0) ∫_0^∞ g(x) dx

whenever g is directly Riemann integrable. But then a similar result applies also to H = G ∗ ρ ∗ U_σ ∗ U_3, which is simply a type of delayed renewal measure in which the initial 'distribution' G ∗ ρ ∗ U_σ has total mass 1 × n_0 × (1 − σ)^{−1}, so that

(H ∗ g)(t) → [λ/(1 − σ)] ∫_0^∞ g(x) dx   (t → ∞).

Finally, since the density of A is continuous and vanishes outside a bounded set, we can take g(t) = a(t), in which case the left-hand side of the last equation reduces to u_{1G}(t) and we obtain

u_{1G}(t) → [λ/(1 − σ)] ∫_0^∞ a(x) dx = λ.
We have the following corollary (see Arjas, Nummelin and Tweedie, 1978).

Corollary 4.4.V. If F is spread out and g ≥ 0 is bounded, integrable, and satisfies g(x) → 0 as x → ∞, then

    sup_{|f|≤g} | (U_G ∗ f)(t) − λ ∫_0^∞ f(x) dx | → 0   (t → ∞).   (4.4.10)
Proof. We consider separately the convolution of g with each of the two components in the decomposition (4.4.8) of U_G. Taking first the a.c. component, and setting u_{1G}(x) = 0 for x < 0, we have

    sup_{|f|≤g} | ∫_0^t u_{1G}(t − x) f(x) dx − λ ∫_0^∞ f(x) dx | ≤ ∫_0^∞ |u_{1G}(t − x) − λ| g(x) dx.

Now u_{1G}(t) → λ, so it is bounded for sufficiently large t, |u_{1G}(t) − λ| ≤ C say, for t > T, and we can write the last integral as

    ∫_0^{t−T} g(x) |u_{1G}(t − x) − λ| dx + ∫_0^T |u_{1G}(s) − λ| g(t − s) ds,

where the first integral tends to zero by dominated convergence because |u_{1G}(t − x) − λ| is bounded, u_{1G}(t − x) → λ for each fixed x, and g(x) is integrable, while the second tends to zero by dominated convergence since |u_{1G}(s) − λ| has finite total mass over (0, T) and by assumption g(t − s) → 0 for each fixed s.
Similarly, the integral against the second component is dominated for all |f| ≤ g by

    ∫_0^t g(t − x) dU_{2G}(x),

where again the integrand is bounded and tends to zero for each fixed x, while U_{2G} has finite total mass, so the integral tends to zero by dominated convergence.

Corollary 4.4.VI. If F is spread out, then for each finite interval (0, M),

    sup_{B ⊆ (0,M)} | U_G(t + B) − λ ℓ(B) | → 0   (t → ∞),

where the supremum is taken over Borel subsets B of (0, M), t + B denotes the translate of B through t, and ℓ is Lebesgue measure; that is, the shifted renewal measure converges to λℓ in variation norm over intervals of fixed length.

The version of the renewal theorem summarized by these results has the double advantage of not only strengthening the form of convergence but also replacing the rather awkward condition of direct Riemann integrability by the simpler conditions of Proposition 4.4.IV. Further variants are discussed in Exercise 4.4.4 and in the paper by Arjas et al. (1978). With further conditions on the lifetime distributions—for example, the existence of moments—it is possible to obtain bounds on the rate of convergence in the renewal theorem. For results of this type, see Stone (1966), Schäl (1971), and Bretagnolle and Dacunha-Castelle (1967); for a very simple case, see Exercise 4.4.5(a).
Exercises and Complements to Section 4.4

4.4.1 Conditions for direct Riemann integrability. Let z(x) be a measurable function defined on [0, ∞). Show that each of the following conditions is sufficient to make z(·) directly Riemann integrable (see also Feller, 1966).
(a) z(x) is nonnegative, monotonically decreasing, and Lebesgue integrable.
(b) z(x) is continuous, and, setting α_n = sup_{n<x≤n+1} |z(x)|, Σ_n α_n < ∞. [Hint: z(x) is Riemann integrable on any finite interval, and the remainder term outside this interval provides a contribution that tends to zero.]
(c) z(x) ≥ 0, z(x) is uniformly continuous and bounded above by a monotonically decreasing integrable function.

4.4.2 (a) If g is bounded and continuous and f is integrable, then their convolution product (f ∗ g)(t) = ∫_R g(t − x) f(x) dx is uniformly continuous.
(b) Extend this to the case where g is any bounded measurable function by approximating g by bounded continuous functions. In particular, therefore, ∫_A f(t − x) dx is uniformly continuous whenever A is a measurable set.
(c) Let F have a.c. component f; show from (b) that F ∗ F has an a.c. component f_2, which dominates a uniformly continuous function and hence a bounded function that vanishes outside a bounded set and is twice continuously differentiable.

4.4.3 Apply the key renewal theorem as around (4.4.5) to show that if F has density f with f^{k∗} bounded, and if λ^{−1} = ∫_0^∞ x dF(x) < ∞, then the renewal density u(x) exists and satisfies
    u(x) − Σ_{j=1}^{2k−1} f^{j∗}(x) → λ.
[Hint: u(x) − Σ_{j=1}^{2k−1} f^{j∗}(x) = Σ_{j=2k}^∞ f^{j∗}(x) satisfies the renewal equation with z(x) = f^{2k∗}(x), which is uniformly continuous and bounded above by an integrable function. Necessary and sufficient conditions for u(x) itself to converge are given in Smith (1962); see also Feller (1966, Section XI.4).]

4.4.4 Strong convergence counterexample. Let G_u denote the distribution of the forward recurrence time at t = u, and G_∞ its limit, if it exists, of a renewal process N(·) with lifetime distribution F with mean 1/λ.
(a) Suppose that F has discrete support but is nonlattice. Show that G_u(x) → G_∞(x) = λ ∫_0^x [1 − F(u)] du, but that ‖G_u − G_∞‖ = 2 (all finite u). [Hence, G_u does not converge in variation norm ‖·‖, i.e. strong convergence fails.]
(b) Show that ‖G_u − G_∞‖ → 0 (u → ∞) when F is spread out.

4.4.5 Rate of convergence in renewal theorems.
(a) Consider (4.1.8) with z(t) = λ ∫_t^∞ F̄(y) dy, where F̄(y) = 1 − F(y) and F has second moment σ² + µ². Deduce that Z, the solution of (4.1.8) with such z, equals φ(t) ≡ U(t) − λt. Use the key renewal theorem to conclude that for nonlattice F,

    0 ≤ φ(t) = λ ∫_0^t ∫_{t−u}^∞ F̄(v) dv dU(u) → ½ λ²(σ² + µ²)   (0 ≤ t → ∞).

(b) Let the r.v.s T_1, T_2 be independent with Pr{T_1 > t} = z(t) as in (a). Use the subadditivity of the renewal function U(·) to give, for all t ≥ 0, U(2t) ≤ 2EU(t + T_1 − T_2), and hence deduce from EU(t − T_1) = λt (cf. Example 4.1(c) and Proposition 4.2.I) that 2λt ≤ U(2t) ≤ 2λt + λ²σ² + 1. [See Carlsson and Nerman (1986) for details and earlier references.]
(c) Suppose that the generator z(·) in the general renewal equation (4.1.8) is positive and decreases monotonically. Show that J_1(t) ≡ ∫_0^t z(u) λ du → ∞ (t → ∞) if and only if J_2(t) ≡ ∫_0^t z(t − u) dU(u) → ∞ (t → ∞), and that then lim_{t→∞} J_1(t)/J_2(t) = 1. Deduce that, when F(·) has infinite second moment, U(t) − λt ∼ λ² ∫_0^∞ min(v, t) F̄(v) dv ≡ G(t) (Sgibnev, 1981).
For an alternative proof, show that φ(t) ≤ ∫_0^∞ U(min(v, t)) λ F̄(v) dv ≡ G_U(t), and that G_U(t) ≥ G(t) by the elementary renewal theorem. Use Blackwell's theorem to show that lim sup_{t→∞} G_U(t)/G(t) ≤ 1. When F(·) has finite second moment and is nonarithmetic, show that lim_{t→∞} [J_1(t) − J_2(t)] = 0.
(d) Use the asymptotics of φ(·) to deduce that for a stationary orderly renewal process N(·), var N(0, t] ∼ (var λX)(λt) when the lifetime d.f. has finite second moment, and var N(0, t] ∼ λ²t² − λ³ ∫_0^t (t − v)² F̄(v) dv otherwise. [Hint: First, find var N(0, t] from (3.5.2) and (3.5.6).]
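The limit in Exercise 4.4.5(a) is also easy to explore numerically. The sketch below (our own illustration, with a lifetime law of our choosing) estimates φ(t) = U(t) − λt for Gamma(2, 1) lifetimes, for which µ = 2 and σ² = 2, so the predicted limit ½λ²(σ² + µ²) equals 3/4:

```python
import random

random.seed(0)

def expected_renewals(t, sample_lifetime, n_runs=20000):
    """Monte Carlo estimate of U(t) = sum_{n>=0} F^{n*}(t),
    i.e. 1 (for the atom at 0) plus E[#renewal epochs in (0, t]]."""
    total = 0
    for _ in range(n_runs):
        s = sample_lifetime()
        while s <= t:
            total += 1
            s += sample_lifetime()
    return 1.0 + total / n_runs

mu, sigma2 = 2.0, 2.0               # Gamma(2,1): mean 2, variance 2
lam = 1.0 / mu
t = 60.0
U_t = expected_renewals(t, lambda: random.gammavariate(2.0, 1.0))
phi_est = U_t - lam * t
limit = 0.5 * lam**2 * (sigma2 + mu**2)    # = 0.75
# For this Erlang lifetime law one can check U(t) = t/2 + 3/4 + e^{-2t}/4,
# so the limit 3/4 is already essentially exact at t = 60; the remaining
# discrepancy is Monte Carlo noise.
```

Here t = 60 is chosen so the transient term is negligible; the Monte Carlo standard error of phi_est is of order a few hundredths.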
4.5. Neighbours of the Renewal Process: Wold Processes

The specification of a renewal process via independent identically distributed intervals raises the possibility of specifying other point processes via intervals that are one step removed from independence. In this section, we consider point processes for which the successive intervals {X_n} form a Markov chain, so that the distribution of X_{n+1} given X_n, X_{n−1}, . . . in fact depends only on X_n. Such processes seem to have been considered first by Wold (1948); accordingly, we call them Wold processes.

Example 4.5(a) A first-order exponential autoregressive process. Suppose that the family {X_n} of intervals satisfies the relation

    X_{n+1} = ρ X_n + ε_n   (4.5.1)

for some 0 ≤ ρ < 1 and family {ε_n} of i.i.d. nonnegative random variables (note {X_n} is itself i.i.d. if ρ = 0). For the particular distribution given by

    Pr{ε_n = 0} = ρ  and  Pr{ε_n > y} = (1 − ρ) e^{−y}   (y > 0),

taking Laplace transforms of (4.5.1) shows that if a stationary sequence of intervals is to exist, the common distribution F of the {X_n} must have its Laplace–Stieltjes transform F̃ satisfy the functional equation

    F̃(s) = F̃(ρs) (1 + ρs) / (1 + s).

The only solution of this equation for which F̃(0+) = 1 is F̃(s) = (1 + s)^{−1}. Thus, a stationary version of the Markov chain exists, and the marginal distribution for the intervals is exponential, as for a Poisson process. The parameter ρ controls the degree of association between the intervals. For ρ > 0, a realization of the process consists of a sequence of intervals, each one of which is an exact fraction of the preceding one, followed by an interval independently chosen from the same exponential distribution. The construction can be extended to more general types of gamma distribution and has been studied extensively by P.A.W. Lewis and co-authors: see, for example, Gaver and Lewis (1980). They have advocated its use as an alternative to the Poisson process, partly on the grounds of the very simple behaviour of the spectrum of the interval process. Other aspects are more intractable, however, and from a point process viewpoint its partly deterministic behaviour gives it a rather special character (see Exercises 4.5.2 and 4.5.9).

In general, the interval structure of a Wold process is determined by a Markov transition kernel P(x, A); that is, a family {P(x, ·): 0 ≤ x < ∞} of probability measures on [0, ∞), with P(·, A) measurable for each fixed Borel set A ⊆ [0, ∞), together with the distribution, P_0(·) say, of the initial interval X_0. When the chain {X_n} is irreducible [see e.g. Harris (1956), Orey (1971) or Meyn and
Tweedie (1993) for discussions of the precise meaning of irreducibility] and admits a stationary distribution, π(·) say, so that for all such Borel subsets A

    π(A) = ∫_{0−}^∞ P(x, A) π(dx),   (4.5.2)

an interval sequence {X_n} with a stationary distribution can be specified. The following construction then leads to a counting process N(·) that is stationary in the sense of Definition 3.2.I. First, let {X_0, X_1, . . .} be a realization of the Markov chain for which X_0 has the initial distribution

    P_0(dx) ≡ Pr{X_0 ∈ (x, x + dx)} = x π(dx) / ∫_{0−}^∞ u π(du),   (4.5.3a)

where we suppose both π{0} = 0 and finiteness of the normalizing factor; i.e.

    λ^{−1} ≡ ∫_{0−}^∞ x π(dx) = ∫_0^∞ π(u, ∞) du < ∞.   (4.5.3b)
Next, conditional on X_0, let X_0′ be uniformly distributed on (0, X_0), and determine N by N(0, x] = #{n: S_n ≤ x}, where

    S_1 = X_0′,   S_{n+1} = S_n + X_n   (n = 1, 2, . . .).
The relation (4.5.3), in conjunction with the definition of S_n, states that the origin is located uniformly at random within an interval selected according to the length-biased distribution with increment around x proportional to x π(dx). Since π{0} = 0, the normalizing constant λ is just the intensity of the process. Note that the distributions here are consistent with the relations found in Exercise 3.4.1 for the stationary distributions for the forward recurrence time and the length of the current interval. Indeed, the construction here can be rephrased usefully in terms of the bivariate, continuous-time Markov process

    X(t) = (L(t), R(t)),   (4.5.4)

where L(t) is the length of the interval containing t and R(t) is the forward recurrence time at time t. The Markovian character of X(t) follows readily from that of the sequence of intervals. Moreover, it is clear that the process N(t) is uniquely determined by X(t) and vice versa. By starting the Markov process with its stationary distribution, we ensure that it remains stationary in its further evolution, and the same property then holds for the point process. An immediate point of contrast to the ordinary point process is that it is not necessary, in (4.5.2), to have ∫_{R+} π(dx) < ∞. If the underlying Markov chain is null recurrent, a stationary regime can exist for the point process (though not for its intervals) in which, because of the dependence between the lengths
of successive intervals, long runs of very short intervals intervene between the occurrences of longer intervals; in such situations, divergence of ∫_{R+} π(dx) can coexist with convergence of ∫_{R+} x π(dx) (i.e. near the origin, π may integrate x but not 1). This leads to the possibility of constructing stationary Wold processes with infinite intensity but finite mean interval length. One such construction is given in Daley (1982); another is outlined in Exercise 4.5.1. With such examples in mind, it is evident that the problem of formulating analogues of the renewal theorems for the Wold process needs to be approached with some care. One possible approach is through the family of renewal measures U(A | x) = E[#{n: S_n ∈ A} | X_0 = x] and their associated cumulative processes U(t | x) ≡ U([0, t] | x). The latter functions satisfy the renewal-type equations

    U(t | x) = I_{t≥x}(t) + ∫_0^∞ U(t − x | y) P(x, dy).   (4.5.5)

Unfortunately, these equations seem rather intractable in general. The analogy with the renewal equations of Section 4.4 becomes clearer on taking Laplace–Stieltjes transforms of (4.5.5) with respect to t. Introducing the integral operator T_θ with kernel t_θ(dy, x) = e^{−θx} P(x, dy), the transform versions of equation (4.5.5) become

    U_θ(x) ≡ ∫_0^∞ e^{−θt} U(dt | x) = e^{−θx} + (T_θ U_θ)(x)

with the formal solution U_θ = (1 − T_θ)^{−1} e_θ, where (e_θ)(x) ≡ e^{−θx}, which may be compared to equation (4.1.6).

Example 4.5(b) Discrete Wold processes. Consider a simple point process ({0, 1}-valued process) on the lattice of integers {0, 1, . . .}; the kernel P(x, dy) here becomes a matrix p_{ij}, and in place of the cumulative form in (4.5.5) it is more natural to consider the renewal functions u(j | i) = Pr{N{j} = 1 | X_0 = i}. Then

    u(j | i) = δ_{ij} + Σ_{k=1}^∞ p_{ik} u(j − i | k),
taking the right-hand side here to be zero for j < i. By introducing the transforms u_i(z) = Σ_{k=i}^∞ z^k u(k | i), these become

    u_i(z) = z^i + Σ_{k=1}^∞ p_{ik} z^i u_k(z),

or in matrix-vector form

    u(z) = ζ + P_z u(z),
where P_z = {p_{ik} z^i}, u(z) = {u_i(z)}, and ζ = (1, z, z², . . .). The asymptotic behaviour of u(j | i) as j → ∞ is therefore related to the behaviour of the resolvent-type matrix (I − P_z)^{−1} as z → 1. When P is finite, this can be discussed in classical eigenvector/eigenvalue terms; see Exercise 4.5.4 and, for further details, Vere-Jones (1975). A particular question that arises relates to periodicity of the process: nonzero values of u(j | i) may be restricted to a sublattice of the integers. This phenomenon is not directly related to periodicity of the underlying Markov chain; again, see Exercise 4.5.4 for some examples. A more general approach, which can be extended to the denumerable case and anticipates the general discussion to be given below, is to consider the discrete version of the Markov chain X(t) in (4.5.4). When this bivariate chain is aperiodic and recurrent, returns to any given state pair—for example, time points at which an interval of specified length i_0 is just commencing—constitute an imbedded renewal process for X(t) and allow standard renewal theory results to be applied.

Example 4.5(c) Transition kernels specified by a diagonal expansion. Lancaster (1963) investigates the class of bivariate probability densities that can be represented by an expansion of the kind

    f(x, y) = f_X(x) f_Y(y) [1 + Σ_{n=1}^∞ ρ_n L_n(x) M_n(y)],
where f_X(·), f_Y(·) are the marginal densities and L_n(x), M_n(y) are families of complete orthonormal functions defined with respect to the marginal distributions f_X(·), f_Y(·), respectively. When f_X and f_Y coincide (so L_n = M_n), the bivariate density can be used to define the density of the transition kernel of a stationary Markov chain with specified stationary distribution f_X(x): just put

    p(x, y) = f(x, y) / f_X(x) = f_X(y) [1 + Σ_{n=1}^∞ ρ_n L_n(x) L_n(y)].

For many of the standard distributions, this leads to expansions in terms of classical orthogonal polynomials (see e.g. Tyan and Thomas, 1975). In particular, when f_X(x) and f_Y(y) are both taken as gamma distributions,

    f_X(x) = x^{α−1} e^{−x} / Γ(α),  say,

the L_n(x) become the Laguerre polynomials of order α. The bivariate exponential density of Example 4.1(e) is a case in point when α = 1 and ρ_n = ρ^n. The resulting Wold process then has exponential intervals, but in contrast to Example 4.5(a), the realizations have no deterministic properties but simply appear as clustered groups of small or large intervals, the degree of clustering being controlled by the parameter ρ. Lampard (1968) describes an electrical counter system that produces correlated exponential intervals. More generally, when α = ½d, such correlated gamma distributions can be simulated from bivariate normal distributions with random variables in common; this leads to the possibility of simulating Wold processes with correlated gamma intervals starting from a sequence of i.i.d. normal variates (see Exercise 4.5.7). Even in such a favourable situation, the analytic study of the renewal functions remains relatively intractable. Lai (1978) studies the exponential case in detail and provides a perturbation expansion for the renewal function and (count) spectral density of the process in terms of the parameter ρ.

As such examples illustrate, explicit computations for the Wold process are often surprisingly difficult. However, a useful and general approach to the asymptotic results can be developed by identifying a sequence of regeneration points within the evolution of the process and by applying to this sequence the renewal theorems of Section 4.4. It is by no means obvious that any such sequence of regeneration points exists, but the ‘splitting’ techniques developed for Markov chains with general state space by Nummelin (1978) and Athreya and Ney (1978) allow such a sequence to be constructed for a wide class of examples. The essence of this idea is to identify a particular set A_0 in the state space and a particular distribution φ on A_0 such that whenever the process enters A_0, it has a certain probability of doing so ‘according to φ’, when its future evolution will be just the same as when it last entered A_0 ‘according to φ’. In effect, returns to A_0 according to φ can be treated as if they are returns to a fixed atom in the state space and provide the regeneration points we seek. The following conditions summarize the requirements on the transition kernel for this to be possible (see Athreya and Ney, 1978).

Conditions 4.5.I. (Regenerative Homing Set Conditions). For the Markov chain {X_n} on state space S ⊆ [0, ∞) ≡ R+, there exists a homing set A_0 ∈ B(R+), A_0 ⊆ S, a probability measure φ on A_0, and a positive constant c such that for all x ∈ S,
(i) Pr{X_n ∈ A_0 for some n = 1, 2, . . . | X_0 = x} = 1; and
(ii) for every Borel subset B of A_0, P(x, B) ≥ c φ(B).

The first of these conditions embodies a rather strong recurrence condition; indeed, Athreya and Ney call a chain satisfying Conditions 4.5.I ‘strongly aperiodic recurrent’, since the conditions imply aperiodicity as well as recurrence. The second condition is more akin to an absolute continuity requirement on the transition kernel. In particular, it is satisfied whenever the following simpler but more stringent condition holds.

Condition 4.5.I′. (ii′) For all x ∈ A_0, P(x, B) has density p(x, y) on A_0 with respect to φ such that p(x, y) ≥ c > 0 for all y ∈ A_0.

Typically, A_0 is a set with positive Lebesgue measure and φ the uniform distribution on A_0 (i.e. a multiple of Lebesgue measure scaled to give A_0 total mass unity). In the discrete case, 4.5.I(ii) is equivalent to the assumption that the matrix of transition probabilities has at least one positive diagonal element.
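To make the splitting idea concrete, the following sketch implements it for the exponential autoregressive kernel of Example 4.5(a) with ρ = ½. The homing set A_0 = [0, 1], the measure φ (an Exp(1) law conditioned to [ρ, ∞)) and the constant c = (1 − ρ)e^{−ρ} are our own illustrative choices satisfying P(x, B) ≥ cφ(B) on A_0; a retrospective coin toss then decides whether a transition counts as an entry ‘according to φ’.

```python
import math
import random

random.seed(7)

RHO, A0_MAX = 0.5, 1.0

def step_with_regeneration(x):
    """One EAR(1) step X' = RHO*x + eps, flagging phi-entries.

    For x in A0 = [0, 1] the absolutely continuous part of the kernel,
    density (1-RHO)*exp(RHO*x - y) for y > RHO*x, dominates c*phi with
    phi = Exp(1) conditioned to [RHO, inf) and c = (1-RHO)*exp(-RHO).
    Retrospectively, a jump through the exponential branch that lands at
    y >= RHO is an entry 'according to phi' w.p. exp(-RHO*x)."""
    if random.random() < RHO:
        return RHO * x, False        # the atom at RHO*x: never a phi-entry
    y = RHO * x + random.expovariate(1.0)
    regen = (x <= A0_MAX and y >= RHO
             and random.random() < math.exp(-RHO * x))
    return y, regen

x, regen_states = 0.5, []
for _ in range(200000):
    x, regen = step_with_regeneration(x)
    if regen:
        regen_states.append(x)

mean_regen = sum(regen_states) / len(regen_states)
# States observed at phi-entries should be phi-distributed, i.e.
# RHO + Exp(1) by memorylessness, with mean 1 + RHO = 1.5.
```

A short check of the algebra: given x ∈ A_0, the probability of a flagged step landing in B is (1 − ρ)∫_B e^{−(y−ρx)} e^{−ρx} I(y ≥ ρ) dy = cφ(B), independent of x, which is exactly the minorization required by Condition 4.5.I(ii).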
Conditions 4.5.I are trivially satisfied in the independent (renewal) case if we take S to be the support of the lifetime distribution F and put A_0 = S, φ = F, and c = 1. Under Conditions 4.5.I, Athreya and Ney (1978) show that the chain is recurrent in the sense of Harris (1956) and admits a unique finite invariant measure π(·). The important feature for our purposes is not so much the existence of the invariant measure as its relation to the sequence {ν_k} of ‘returns to A_0 according to φ’. This aspect is made explicit in the following proposition [see Athreya and Ney (1978) and Nummelin (1978) for proof].

Proposition 4.5.II. Conditions 4.5.I imply that for the Markov chain {X_n},
(a) there exists a stopping time ν ≥ 1 with respect to the σ-fields generated by {X_n} such that for Borel subsets B of A_0,

    Pr{X_ν ∈ B | X_0, . . . , X_{ν−1}; ν} = φ(B);   (4.5.6)

(b) {X_n} has an invariant measure π(·) related to φ by

    π(B) = E_φ[ Σ_{n=0}^{ν−1} I_B(X_n) ]   (all B ∈ B(R+)),   (4.5.7)

where E_φ refers to expectations under the initial condition that X_0 has distribution φ on A_0, i.e. Pr{X_0 ∈ B} = φ(B ∩ A_0) for B ∈ B(R+).

Equation (4.5.7) can be extended by linearity and approximation by simple functions to

    ∫_{R+} f(x) π(dx) = E_φ[ Σ_{n=0}^{ν−1} f(X_n) ]   (4.5.8)

whenever f is Borel-measurable and either nonnegative or π-integrable. Special cases of (4.5.8) include

    E_φ(ν) = ∫_{R+} π(dx)   (4.5.9a)

and

    E_φ(X_0 + X_1 + · · · + X_{ν−1}) = ∫_{R+} x π(dx).   (4.5.9b)
Now let S_n = Σ_{i=1}^n X_i, and let {T_k} = {S_{ν_k −1}} denote the sequence of times at which the process returns to A_0 according to φ. These T_k form the regeneration points that we seek. If G(·) denotes the distribution function of the successive differences T_k − T_{k−1}, so that in particular

    G(u) = E_φ{I_{S_{ν−1} ≤ u}} = Pr_φ{S_{ν−1} ≤ u},   (4.5.10)

then the T_k form the instants of a renewal process with lifetime distribution G. We apply this fact, with the theorems of Section 4.4, to determine the asymptotic behaviour of the Wold process.
The results are stated for the renewal function

    U_φ(C × T_t B) = E_φ #{n: X_n ∈ C, S_n ∈ T_t B},   (4.5.11)

where T_t B is the translate of B through time t. If the process is started from a general distribution κ for X_0, we write U_κ(·) for the corresponding renewal function. The analogue of Blackwell's renewal theorem for this function reads, for B = (0, h) and λ as in (4.5.3b), U_φ(C × T_t B) → λ π(C) ℓ(B), where ℓ denotes Lebesgue measure. We approach these results through an extended version of the key renewal theorem, fixing a bounded measurable function h(x, y) with support in the positive quadrant x ≥ 0, y ≥ 0, and setting for t > 0

    Z(t) = E_φ[ Σ_{n=0}^{N(t)} h(X_n, t − S_n) ] = ∫_0^∞ ∫_0^t h(x, t − u) U_φ(dx × du).   (4.5.12)
! ν−1
"
h(Xn , t − Sn )
T
h(XN (u) , t − u) dN (u) . (4.5.13)
= Eφ 0
n=0
If then we can show that z(t) satisfies the condition of direct Riemann integrability (for Feller’s form of the key renewal theorem in 4.4.II) or the conditions in 4.4.III for the Breiman form of the theorem, we shall be able to assert that Z(t) → λ
∞
(t → ∞).
z(t) dt 0
To evaluate the integral, we make use of (4.5.8) so that formally
∞
∞
z(t) dt = 0
Eφ 0
" h(Xn , t − Sn ) dt
n=0
= Eφ
! ν−1
! ν−1
n=0
∞ ∞
∞
" h(Xn , t − Sn ) dt
Sn
h(x, t) π(dx) dt,
= 0
= Eφ
! ν−1
n=0
"
∞
h(Xn , u) du
0
(4.5.14)
0
the formal operations being justified by Fubini’s theorem whenever h ≥ 0 or h is (π × )-integrable.
Direct Riemann integrability can be established directly in simple cases, to which we add the following general sufficient condition. For δ > 0, any α in 0 ≤ α < δ, and I_j(δ) ≡ (jδ, (j + 1)δ], define

    m̄_δ(x, α) = Σ_{j=0}^∞ sup_{t∈I_j(δ)} h(x, t + α)  and  m̄_δ(x) = sup_{0≤α<δ} m̄_δ(x, α),

and similarly the lower sums m_δ(x, α) and m_δ(x) by replacing sup by inf. For any y, there is a unique α_δ(y) in [0, δ) such that y = j′δ + α_δ(y) for some integer j′. Then

    Σ_{j=0}^∞ sup_{t∈I_j(δ)} h(x, t − y) = m̄_δ(x, α_δ(−y)).

Using first Fatou's lemma and then Fubini's theorem,

    Σ_{j=0}^∞ sup_{t∈I_j(δ)} z(t) ≤ E_φ[ Σ_{n=0}^{ν−1} m̄_δ(X_n, α_δ(−S_n)) ] ≤ E_φ[ Σ_{n=0}^{ν−1} m̄_δ(X_n) ] = ∫_0^∞ m̄_δ(x) π(dx).

A similar lower bound, with sup and m̄_δ replaced by inf and m_δ, respectively, holds. Thus, a sufficient condition for the direct Riemann integrability of z(t) is that, as δ ↓ 0,

    δ ∫_0^∞ [m̄_δ(x) − m_δ(x)] π(dx) → 0.   (4.5.15)
If, alternatively, G is spread out, then it is enough to show that z(t) is integrable and tends to zero as t → ∞. Simple sufficient conditions for the latter (not the most general possible) are that

    h(x, t) → 0  as  t → ∞  for each fixed x   (4.5.16a)

and

    |h(x, t)| ≤ h_0(x),   (4.5.16b)

where h_0(x) is π-integrable. This follows readily from (4.5.13) and an application of the dominated convergence theorem. Summarizing these results, we have the following theorem.

Theorem 4.5.III. Suppose that the Markov transition kernel associated with a Wold process satisfies the regenerative homing set Conditions 4.5.I and that its invariant measure π has a finite normalizing factor λ^{−1} as in (4.5.3b). Also let h(x, t) be a fixed measurable function, vanishing outside the positive quadrant in R² and (π × ℓ)-integrable on R+ × R+, and define G, U_φ, Z_φ, and z_φ by (4.5.10–13), respectively. If either
(i) G is nonlattice and z_φ is directly Riemann integrable, or
(ii) G is spread out and z_φ(t) is bounded and → 0 as t → ∞,
then

    Z_φ(t) = ∫_0^∞ ∫_0^t h(x, t − u) U_φ(dx × du) → λ ∫_0^∞ ∫_0^∞ h(x, u) π(dx) du   (t → ∞).   (4.5.17)
In particular, (4.5.15) implies Condition (i) and (4.5.16) Condition (ii).

We now apply this theorem to some important special cases. Consider first the Blackwell-type result, where h(x, t) = I_A(x) I_{(0,h)}(t). In general, h(x, t) is only (π × ℓ)-integrable if A is bounded away from zero. Then, since I_{(0,h)}(t) has only two points of discontinuity, each of unit height, it is easy to see that for all x ∈ R+, m̄_δ(x) − m_δ(x) ≤ 2 I_A(x), so that both (4.5.15) and (4.5.16) are satisfied. Equation (4.5.16) also holds if the interval (0, h) is replaced by any bounded Borel set B. Finally, if π(·) is totally finite, the condition on A can be dropped and the same results hold. Thus, we have the following corollary.

Corollary 4.5.IV. Let A, B be Borel subsets of R+. If G is nonlattice, then

    U_φ(A × T_t B) → λ π(A) ℓ(B)   (t → ∞)   (4.5.18)

(ℓ Lebesgue measure) whenever B is a finite interval (0, h) and A ⊆ [ε, ∞) for some ε > 0. If G is spread out, the same result holds for B any bounded Borel set. If π(·) is totally finite, these results hold without any further condition on A.

We next extend the results to an arbitrary initial distribution, κ say, for X_0. If we denote the corresponding renewal functions by U_κ, Z_κ, then Z_κ satisfies

    Z_κ(t) = z_κ(t) + ∫_0^t Z_φ(t − u) G(du)   (4.5.19)

with

    z_κ(t) = E_κ[ Σ_{n=0}^{ν′−1} h(X_n′, t − S_n′) ],   (4.5.20)

where X_n′, S_n′ refer to the sequence of interval lengths and renewals for the process with initial distribution κ, and ν′ is the time of the first entry to A_0 according to φ, again starting from X_0 distributed according to κ. It follows
from Condition 4.5.I(i) that this entry is certain, so ν′ is finite with probability 1. It then follows from (4.5.19) that Z_κ(t) − Z_φ(t) = z_κ(t) − z_φ(t), so that we need conditions to ensure the convergence of the right-hand side to zero. This will follow from (4.5.20) if E_κ(ν′) < ∞ and h is bounded and satisfies (4.5.16a).

Corollary 4.5.V. Suppose that (4.5.17) holds for U_φ and that κ is an arbitrary initial distribution for X_0. Then (4.5.17) continues to hold with U_κ in place of U_φ if and only if z_κ(t) − z_φ(t) → 0, in particular if h is bounded and satisfies (4.5.16a), and E_κ(ν′) < ∞, E_φ(ν) = ∫_{R+} π(dx) < ∞.

Finally, we turn to the question of the weak convergence of the process X(t) in (4.5.4). It somewhat simplifies the algebraic details to work with the bivariate process Y(t) = (L(t), L(t) − R(t)), i.e. with the backward recurrence time L(t) − R(t) in place of the forward one. If then ξ(x, y) is any bounded continuous function of x, y in R+ × R+, we consider ξ(Y(t)), which we may write in the form

    ξ(Y(t)) = Σ_{n=0}^∞ h(L_n, t − S_n),

where

    h(x, t) = ξ(x, t)  (0 ≤ t ≤ x),   h(x, t) = 0  (t > x),

since in fact only the term with n = N(t) contributes to the sum. Suppose first that G is nonlattice, and define the modulus of continuity ω(x, δ) of h(·) by

    ω(x, δ) = sup_{0≤t≤x−δ} sup_{0≤u≤δ} |h(x, t) − h(x, t + u)|.

Then, for the particular choice of h given above,

    m̄_δ(x) − m_δ(x) ≤ (x/δ) ω(x, δ),

so that

    δ ∫_{R+} [m̄_δ(x) − m_δ(x)] π(dx) ≤ ∫_{R+} x ω(x, δ) π(dx).

For each fixed x > 0, h(x, t) is continuous and nonvanishing on a finite closed interval, so it is uniformly continuous, and hence ω(x, δ) → 0. Also, ω(x, δ) is uniformly bounded in x and δ, so by dominated convergence, the integral on the right converges to zero as δ → 0; that is, (4.5.15) holds. Also,

    |z_κ(t)| ≤ E_κ[ |ξ(Y(t))|; T > t ] ≤ C P_κ{T > t},

where the last term tends to zero from the recurrence property assumed in Condition 4.5.I(i). Consequently, the conditions for Corollary 4.5.V hold. If, furthermore, G is spread out, then this result alone is sufficient to ensure the truth of the Riemann-type theorem. This means the continuity condition on ξ can be dropped, implying that the weak convergence of Y(t) to its limit can be replaced by convergence in variation norm.
Proposition 4.5.VI. Let P_{κ,t} denote the distribution of X(t) supposing X_0 has initial distribution κ, and π_∞ the stationary distribution for X(t) with elementary mass λ π(dx) dy over the region 0 ≤ y ≤ x < ∞. If G is nonlattice and λ^{−1} = ∫_{R+} x π(dx) < ∞, then P_{κ,t} → π_∞ weakly. If, furthermore, G is spread out, then P_{κ,t} → π_∞ in variation norm.

Throughout our discussion, we have assumed finiteness of the mean λ^{−1} [see (4.5.3b)]. When the mean is infinite, further types of behaviour are possible, some of which are sketched in Athreya, Tweedie and Vere-Jones (1980).
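Proposition 4.5.VI can be checked by simulation. For the exponential autoregressive kernel of Example 4.5(a) (our running illustration), π is Exp(1), so the stationary law of L(t) has the length-biased density λ x π(x) = x e^{−x}, a Gamma(2, 1) law with mean 2; starting instead from a point-mass initial distribution κ, the mean of L(t) should approach 2:

```python
import random

random.seed(3)

def ear1_next(x, rho=0.5):
    """EAR(1) kernel of Example 4.5(a); Exp(1) is its stationary law."""
    eps = 0.0 if random.random() < rho else random.expovariate(1.0)
    return rho * x + eps

def length_covering(t, x0=3.0):
    """Start with a fixed initial interval x0 (kappa = point mass at x0)
    and return L(t), the length of the interval covering time t."""
    x = x0
    s = x0                      # first renewal epoch
    while s <= t:
        x = ear1_next(x)
        s += x
    return x

samples = [length_covering(100.0) for _ in range(10000)]
mean_L = sum(samples) / len(samples)
# Convergence to the stationary law on 0 <= y <= x implies L(t) is
# asymptotically length-biased exponential, so mean_L should be near 2.
```

Here t = 100 is far beyond the mixing time of the chain (regenerations occur every few intervals), so the residual bias from the nonstationary start is well below the Monte Carlo noise.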
Exercises and Complements to Section 4.5

4.5.1 A Wold process with infinite intensity. Consider a symmetric random walk {X_n} with reflecting barrier at the origin, supposing the walk to have density and be null recurrent; for example, the single-step distribution could be N(0, 1). Then, the invariant measure for X_n is Lebesgue measure on (0, ∞). Now transform the state space by setting Y_n = T(X_n), where for y > 0

    x = T^{−1}(y) = y^{−β} (1 + y)^{−α}   (α > 0, β > 0);

note that under T the origin is mapped into the point at infinity and vice versa. Then, the transformed process Y_n is Markovian with invariant measure having density π(y), where near the origin π(y) ∼ y^{−(1+β)} and near infinity π(y) ∼ y^{−(α+β+1)}. Choose α and β so that 0 < β < 1 and α + β > 1; then ∫_0^∞ y π(y) dy < ∞ but ∫_0^1 π(y) dy = ∞. Complete the construction of a stationary version of the corresponding Wold process by using the joint distribution of the current interval and forward recurrence time as indicated in the text following (4.5.4).

4.5.2 Infinitely divisible autoregressive process. Let X ≥ 0 have an infinitely divisible distribution with representation of the form
    ψ(θ) = E(e^{−θX}) = exp( −∫_0^∞ [1 − e^{−θx}] M(dx) )   (Re(θ) > 0),

where ∫_{(0,∞)} min(x, 1) M(dx) < ∞. Show that there exists a stationary sequence {X_n} satisfying the autoregressive equation

    X_{n+1} = ρX_n + ε_n   (ε_n independent of X_n)

and having marginal distribution with Laplace–Stieltjes transform ψ(θ), whenever M is absolutely continuous with monotonically decreasing density m(x), hence in particular whenever the X_n are gamma distributed.
[Hint: If ε_n is also infinitely divisible, its Laplace–Stieltjes transform, φ(θ) say, must satisfy φ(θ) = ψ(θ)/ψ(ρθ) = exp( ∫_0^∞ (e^{−θx} − 1) [M(dx) − M(ρ^{−1} dx)] ).]

4.5.3 Let F(t; x, y) be the distribution function of the bivariate process Y(t) = (L(t), L(t) − R(t)), conditional on an event at the origin and L(0−) = s. Then, if F has a density f(t; x, y) ≡ f(t; x, y | s), it satisfies for 0 < y < min(x, t)

    ∂F/∂t + ∂F/∂y = ∫_0^t f(t; u, u) P(u, (0, x]) du − ∫_0^y f(t; u, u) du,

and if also the density function is sufficiently regular, then for the same x, y, t,

    ∂f/∂t + ∂f/∂y = 0.
Argue on probabilistic grounds that f(t; x, y) = f(t − v; x, y − v) for 0 < y − v < min(x, t − v), so f(t; x, x) = f(t − x; x, 0+) for 0 < x < t, and that

    f(t; x, 0+) = p(s, t) p(t, x) + ∫_0^t f(t; u, u) p(u, x) du.   (4.5.21)

When the p.d.f.s p(u, x) are independent of u, this reduces to the renewal density function equation. Assuming that the conditions for the limits of Theorem 4.5.III and its corollaries are satisfied, identify f(x, y) ≡ lim_{t→∞} f(t; x, y) with the density function π(x) for the stationary measure π(·) of the theorem, and deduce the density version of equation (4.5.2) by taking the limit in (4.5.21). Now let L(0−) be an r.v. with p.d.f. λsπ(s), with λ as in the theorem. Interpret ∫_0^∞ dx ∫_0^t y f(t; x, y | s) λ s π(s) ds as the density of the expectation function U(·) of the Wold process. [Lai (1978) has other discussion and references.]

4.5.4 Discrete Wold processes.
(a) Suppose integer-valued intervals are generated by a finite Markov chain on {1, 2, 3} with transition matrices of the forms
(i) P = ( 0 1 0 ; 0 0 1 ; ½ ½ 0 );   (ii) P = ( 0 0 1 ; 1 0 0 ; 0 ½ ½ );   (iii) P = ( 0 ½ ½ ; 1 0 0 ; 0 1 0 ),

where each matrix is written row by row.
For which of these P do the corresponding Wold processes show lattice behaviour? What is the relation of periodicity of P to lattice behaviour of the associated Wold process?
(b) Define m_ij(n) = Pr{interval of length j starts at n | X_0 = i} and show that, for n ≥ 0,

m_ij(n) = δ_ij δ_0n + Σ_k m_ik(n − k) p_kj = δ_ij δ_0n + Σ_k p_ik m_kj(n − i),

where we interpret m_ij(n) = 0 for n < 0. In matrix form, the p.g.f.s are given by

M̂(z) = {m̂_ij(z)} ≡ {Σ_{n=0}^∞ m_ij(n) z^n} = (I − H(z))^{−1},

where H(z) = (h_ij(z)) ≡ (z^i p_ij).
(c) If the Wold process is nonlattice and P is irreducible,

(1 − z)[I − H(z)]^{−1} = λΠ + (1 − z)Q(z),

where Π is the one-dimensional projection onto the null space of I − P and Q(z) is analytic within some disk |z| ≤ 1 + ε, ε > 0 (see Vere-Jones, 1975).

4.5.5 Denumerable discrete Wold processes. Consider the bivariate process X(n) = (L(n), R(n)) [or Y(n) = (L(n), L(n) − R(n))] as a Markov chain with an augmented space. Show that the Wold process is nonlattice if and only if this augmented chain is aperiodic, and that if the original Markov chain is
positive recurrent with stationary distribution {π_j}, having finite mean, the augmented chain X(n) is positive recurrent with stationary distribution

π(h, j) = Pr{L_n = j, R_n = h} = λπ_j   (h = 1, . . . , j),  and 0 otherwise,

where λ^{−1} = Σ_j jπ_j < ∞ as before.
4.5.6 Markov chains with kernels generated by a power diagonal expansion. (a) If {X_n} is generated by a kernel with the structure

p(x, y) = f(y) Σ_{n=1}^∞ ρ^n L_n(x) L_n(y)

for an orthogonal family of functions L_n(·), then the m-step transition kernel p^(m)(x, y) is generated by a kernel with similar structure and ρ replaced by ρ_m = ρ^m. (b) In the particular case where f(·) is exponential and the {L_n(x)} are Laguerre polynomials, a key role is played by the Hille–Hardy formula

Σ_{n=0}^∞ L_n(x) L_n(y) ρ^n = (1 − ρ)^{−1} e^{−(x+y)ρ/(1−ρ)} I_0(2√(xyρ)/(1 − ρ)).
Use this to show the following [see Lai (1978) for details]:
(i) Convergence to the stationary limit as m → ∞ is not uniform in x.
(ii) For every x > 0, the conditional d.f.s F(h | x) = ∫_0^h p(x, y) dy are bounded by a common function α(h), where α(h) < 1 for h < ∞.
(iii) If A(θ) is the integral operator on L_1[0, ∞) with kernel p(x, y)e^{−θx}, then for all θ with Re(θ) ≥ 0, θ ≠ 0, ‖A²(θ)‖ < 1, so the inverse [I − A(θ)]^{−1} exists and is defined by an absolutely convergent series of powers of A(θ).

4.5.7 Simulation of Wold process with χ² interval distribution. Let Z_0, Z_1, . . . be a sequence of i.i.d. N(0, σ²) variables; define successively Y_1 = Z_0/√(1 − ρ²) and Y_{i+1} = ρY_i + Z_i (i = 1, 2, . . .). Then {Y_i} is a stationary sequence of normal r.v.s with first-order autoregressive structure. Construct d independent realizations of such autocorrelated normal series, {Y_{1i}, . . . , Y_{di}; i = 1, 2, . . .} say, and generate a stationary sequence of autocorrelated gamma r.v.s {X_i} by setting

X_i = Σ_{k=1}^d Y_{ki}²,

so EX_i = dσ²/(1 − ρ²) ≡ λ^{−1}, var X_i = 2dσ⁴/(1 − ρ²)², and cov(X_i, X_{i+1}) = dσ⁴(1 + ρ²)/(1 − ρ²)². These X_i can be used as the intervals of a point process, but the process so obtained is not initially stationary: to obtain a stationary version, the length-biased distribution may be approximated by choosing T ≫ λ^{−1}, selecting a time origin uniformly on (0, T ), and taking the initial interval to be the one containing the origin so selected, and the subsequent intervals to be X_1, X_2, and so on.
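The construction in Exercise 4.5.7 is straightforward to try on a computer. The sketch below is mine (the function name, parameter values, and the use of numpy are all assumptions, not part of the text): it builds d independent stationary AR(1) Gaussian series and sums their squares, so the sample mean of the resulting intervals can be compared with EX_i = dσ²/(1 − ρ²).

```python
import numpy as np

def wold_gamma_intervals(n, d=2, rho=0.5, sigma=1.0, seed=0):
    """Autocorrelated gamma-type intervals as in Exercise 4.5.7:
    sums of squares of d independent stationary AR(1) normal series."""
    rng = np.random.default_rng(seed)
    Y = np.empty((d, n))
    # stationary start: Y_1 = Z_0 / sqrt(1 - rho^2)
    Y[:, 0] = rng.normal(0.0, sigma, d) / np.sqrt(1.0 - rho**2)
    for i in range(1, n):
        # AR(1) recursion Y_{i+1} = rho * Y_i + Z_i
        Y[:, i] = rho * Y[:, i - 1] + rng.normal(0.0, sigma, d)
    # X_i = sum_k Y_{ki}^2: gamma marginal, positive autocorrelation
    return (Y**2).sum(axis=0)

X = wold_gamma_intervals(200_000)
# theoretical mean here: d * sigma^2 / (1 - rho^2) = 2 / 0.75
```

With d = 2 the marginal is a scaled χ² on two degrees of freedom (an exponential); other values of d give other gamma shapes.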
4.5.8 Wold processes with intervals conditionally exponentially distributed. Let p(x, y) be of the form λ(x)e^{−λ(x)y}. (a) When λ(x) = λx^{−1/2}, the marginal density π(x) can be found via Mellin transforms (Wold, 1948). (b) When λ(x) = λ + αx, the density π(x) is given by π(x) = c(λ + αx)^{−1} e^{−λx} for finite c > 0 [see Cox (1955), Cox and Isham (1980, pp. 60–62), and Daley (1982); the model has a simple form of likelihood function and has been used to illustrate problems of inference for Poisson processes when the alternative is a Wold process, in particular of the type under discussion].

4.5.9 Time-reversed exponential autoregression. Let the intervals Y_n of a point process be stationary and satisfy Y_{n+1} = min(Y_n/ρ, η_n) for i.i.d. nonnegative η_n and 0 < ρ < 1. Show that when η_n is exponentially distributed, so also is Y_n, with corr(Y_0, Y_n) = ρ^{|n|}. Furthermore, {Y_n} =_d {X_{−n}}, where the X_n are as in Example 4.5(a) with Pr{ε_n > y} = (1 − ρ)e^{−y} [see Chernick et al. (1988), where it is also shown that this identification of {X_n} as the time-reversed process of {Y_n} characterizes the exponential distribution].

4.5.10 Lampard's reversible counter system [see Lampard (1968) and Takács (1976)]. Consider a system with two counters, one of which is initially empty but accumulates particles according to a Poisson process of rate λ, the other of which has an initial content ξ_0 + r particles and loses particles according to a Poisson process of rate µ until it is empty. At that point, the roles of the two counters are reversed; an additional r particles are added to the number ξ_1 accumulated in the first counter, which then begins to lose particles at rate µ, while the second counter begins to accumulate particles again at rate λ. We take X_0, X_1, . . . to be the intervals between successive reversals of the counters. Then, the {X_i} form a Markov chain that has a stationary distribution if and only if µ > λ.

4.5.11 mth-order dependence.
Suppose that the intervals {X_i} of a point process form an mth-order Markov chain. Then, in place of the process (L(t), R(t)), we may consider the process X(t) = (L_{−m+1}(t), . . . , L_{−1}(t), L(t), R(t)), where the state is defined as the set of m − 1 preceding intervals, the current interval, and the forward recurrence time. The regenerative homing set conditions can be applied to the discrete time vector process with state U_n = (X_{n−m+1}, . . . , X_{n−1}, X_n), which is Markovian in the simple sense. Establish analogues to Theorem 4.5.III and its corollaries. [See Chong (1981) for details.]

4.5.12 A non-Poisson process with exponentially distributed intervals. Let the intervals τ_1, τ_2, . . . of a point process on R_+ be defined pairwise by i.i.d. pairs {(τ_{2n−1}, τ_{2n})}, n = 1, 2, . . . , as follows. For each pair, the joint density function is f(u, v) = e^{−u−v} + g(u, v), where g(u, v) = 0 except for (u, v) in the set A = {0 < u < 2 and 2 < v < 4, or 0 < v < 2 and 2 < u < 4}; there, for some sufficiently small ε > 0, g equals ε for u ∈ (0, 1) and v ∈ (2, 3); u ∈ (1, 2) and v ∈ (3, 4); v ∈ (0, 1) and u ∈ (3, 4); and v ∈ (1, 2) and u ∈ (2, 3); and g = −ε on the complement in A of these four unit squares. Check that τ_{2n−1} and τ_{2n} are not independent, that each τ_i is exponentially distributed with unit mean, and that every pair (τ_i, τ_{i+1}) has Pr{τ_i + τ_{i+1} ≤ y} = ∫_0^y w e^{−w} dw. Conclude that for any k = 1, 2, . . . , the length of k consecutive intervals has the same distribution as for a Poisson process at unit rate and hence that N(a, b] for a < b is Poisson-distributed with mean b − a. [This counterexample to Theorem 2.3.II is due to Moran (1967).]
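The cancellations that drive Moran's counterexample are easy to check numerically. In the sketch below, g and eps are my own labels for the perturbation and its magnitude; a midpoint rule confirms that the perturbation integrates to zero in each coordinate, so the exponential marginals are undisturbed.

```python
def g(u, v, eps=0.05):
    """Perturbation of the product density e^{-u-v}: +eps on four unit
    squares, -eps on the rest of the set A, zero outside A."""
    in_A = (0 < u < 2 and 2 < v < 4) or (0 < v < 2 and 2 < u < 4)
    if not in_A:
        return 0.0
    plus = [(0, 1, 2, 3), (1, 2, 3, 4), (3, 4, 0, 1), (2, 3, 1, 2)]
    for a, b, c, d in plus:
        if a < u < b and c < v < d:
            return eps
    return -eps

def marginal(u, n=4000):
    """Midpoint-rule approximation to the marginal perturbation of the
    v-coordinate, i.e. the integral of g(u, v) over 0 < v < 4."""
    h = 4.0 / n
    return sum(g(u, (i + 0.5) * h) for i in range(n)) * h
```

For each fixed u the +eps and -eps regions have equal length, so the marginal perturbation vanishes and each τ_i stays exponential.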
4.5.13 A stationary point process N with finite second moment is long-range dependent when

lim sup_{x→∞} var N(0, x] / x = ∞.

(a) A renewal process is long-range dependent if and only if the lifetime distribution has infinite second moment (Teugels, 1968; Daley, 1999).
(b) Construct an example of a stationary Wold process that is long-range dependent but for which the marginal distribution of intervals has finite second moment. [Daley, Rolski and Vesilo (2000) note two examples.]
4.6. Stieltjes-Integral Calculus and Hazard Measures

The results in this section can be regarded as a prelude to the general discussion of conditional intensities and compensators in Chapters 7 and 14. The simplest case concerns a renewal process whose lifetime distribution function F(·) is absolutely continuous with density f(·). An important role is played by the hazard function q(x) = f(x)/S(x) [see (1.1.3)], particularly in applications to forecasting, because we can interpret q(x) as the risk of an event occurring in the next short time interval, given the time elapsed since the last renewal; that is,

q(x) dt = Pr{event in (t, t + dt) | last event at t − x}.

Example 4.6(a) Prediction of the time to the next event in a renewal process. Suppose a renewal process has hazard function q(·) as just described and that at time t the time back to the last event is observed to be x. Then, the distribution of the time to the next event has hazard function

q_x(y) = q(x + y)   (y ≥ 0),

corresponding to a d.f. with tail (i.e. conditional survivor function)

S_x(y) = 1 − F_x(y) = exp(−∫_0^y q(x + u) du) = (1 − F(x + y))/(1 − F(x)).
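As a quick numerical illustration (mine, not from the text), the two expressions for the conditional survivor function, the ratio (1 − F(x + y))/(1 − F(x)) and the integrated-hazard form exp(−∫_0^y q(x + u) du), can be checked against each other for a Weibull lifetime; the exponential case k = 1 recovers the memoryless property.

```python
import math

def weibull_sf(t, k=2.0, lam=1.0):
    """Survivor function S(t) = exp(-(t/lam)^k) of a Weibull lifetime."""
    return math.exp(-((t / lam) ** k))

def weibull_hazard(t, k=2.0, lam=1.0):
    """Hazard function q(t) = f(t)/S(t) = (k/lam)(t/lam)^{k-1}."""
    return (k / lam) * (t / lam) ** (k - 1)

def cond_sf_ratio(x, y, k=2.0, lam=1.0):
    """S_x(y) = (1 - F(x+y)) / (1 - F(x)): tail of the time to the next
    event given that time x has elapsed since the last one."""
    return weibull_sf(x + y, k, lam) / weibull_sf(x, k, lam)

def cond_sf_integral(x, y, k=2.0, lam=1.0, n=10_000):
    """The same quantity via exp(-integral of q(x+u) over 0<u<y), midpoint rule."""
    h = y / n
    total = sum(weibull_hazard(x + (i + 0.5) * h, k, lam) for i in range(n)) * h
    return math.exp(-total)
```

For k > 1 the hazard increases with age, so S_x(y) decreases in x: an old component is more at risk than a new one.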
Note that x here denotes an observation, and that for a stationary Poisson process, the risk q_x(y) is everywhere constant. What of the non-absolutely continuous case in this example? An appropriate extension of the hazard function is the hazard measure Q(·) in Definition 4.6.IV below. Our discussion of Q(·) is facilitated by two results for Lebesgue–Stieltjes integrals. The first is just the formula for integration by parts in the Lebesgue–Stieltjes calculus. The second is much more remarkable: it is the exponential formula, which has been used mainly in connection with martingale theory without its being in any sense a martingale result; it is in fact a straightforward (if unexpected) theorem in classical real analysis.

Lemma 4.6.I (Integration-by-Parts Formula). Let F(x) and G(x) be monotonically increasing right-continuous functions of x ∈ R. Then

∫_a^b F(x) dG(x) = F(b)G(b) − F(a)G(a) − ∫_a^b G(x−) dF(x).   (4.6.1)

This is a standard result on Lebesgue–Stieltjes integrals; it can be proved directly from first principles or as an application of Fubini's theorem (see e.g. Brémaud, 1981, p. 336). Note that the last term of (4.6.1) contains the left-continuous function G(x−); also, recall the convention for Lebesgue–Stieltjes integrals that

∫_a^b u(x) dG(x) = ∫_{−∞}^∞ I_{(a,b]}(x) u(x) dG(x);

if we wish to include the contribution from a jump of G at a itself, then we write the integral as ∫_{a−}^b u(x) dG(x); similarly, ∫_a^{b−} u(x) dG(x) excludes the effect of any jump of G at b.

Lemma 4.6.II (Exponential Formula). Suppose F(x) is a monotonically increasing right-continuous function of x ∈ R and that u(x) is a measurable function for which ∫_0^t |u(x)| dF(x) < ∞ for each t > 0. Let {x_i} be the set of discontinuities of F in [0, ∞); set ∆F(x_i) = F(x_i) − F(x_i−) and write F_c(x) = F(x) − Σ_{0<x_i≤x} ∆F(x_i) for the continuous part of F(·). Then, the function

H(t) = H(0) [Π_{0<x_i≤t} (1 + u(x_i)∆F(x_i))] exp(∫_0^t u(x) dF_c(x))   (4.6.2)

is the unique solution in t ≥ 0 of the integral equation

H(t) = H(0) + ∫_0^t H(x−)u(x) dF(x)   (4.6.3)

satisfying sup_{0≤s≤t} |H(s)| < ∞ for each t > 0.
Proof. We outline a proof (see Brémaud, 1981, pp. 336–339; Andersen et al., 1993, Theorem II.6.1). Write

G_1(t) = H(0) Π_{0<x_i≤t} (1 + u(x_i) ∆F(x_i))   and   G_2(t) = exp(∫_0^t u(x) dF_c(x)).

Then, the relation between (4.6.2) and (4.6.3) is just an application of the integration-by-parts formula to obtain an expression for G_1(t)G_2(t), noting that G_1(·) increases by jumps only at the points t = x_i, where in fact the jump is equal to

G_1(x_i) − G_1(x_i−) = (1 + u(x_i)) G_1(x_i−) − G_1(x_i−) = u(x_i) G_1(x_i−).

To show that (4.6.2) is the unique bounded solution to (4.6.3), let D(t) = H_1(t) − H_2(t) be the difference between any two bounded solutions. Then D(t) itself is bounded in every finite interval, and we can form the estimate, using (4.6.3) and for fixed finite s and t with 0 < s < t,

|D(s)| ≤ ∫_0^s |D(x−)| |u(x)| dF(x) ≤ M ∫_0^s |u(x)| dF(x),

where M = sup_{0≤s≤t} |D(s)|. Now feeding this estimate back into (4.6.3) yields

|D(s)| ≤ M ∫_0^s (∫_0^x |u(y)| dF(y)) |u(x)| dF(x) ≤ (M/2) (∫_0^s |u(x)| dF(x))².

Evidently, this iteration may be continued and yields for general n ≥ 1

|D(s)| ≤ (M/n!) (∫_0^s |u(x)| dF(x))^n.

This last expression converges to zero as n → ∞, so D(s) ≡ 0.

Corollary 4.6.III. Lemmas 4.6.I and 4.6.II remain true when the functions F and G are of bounded variation on finite intervals.

Proof. For Lemma 4.6.I, use the fact that any function of bounded variation is the difference of two monotonically increasing right-continuous functions. For Lemma 4.6.II, observe that the argument depends only on the use of the formula for integration by parts and the estimate, for any bounded interval A,

|∫_A u(x) dF(x)| ≤ ∫_A |u(x)| dV_F(x),

where V_F is the total variation of F.

We now specialize these results to the case where F is a distribution function of a positive random variable, so F(0+) = 0, F(∞) = lim_{x→∞} F(x) ≤ 1.
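The exponential formula can be illustrated numerically. In the sketch below the distribution function is my own toy choice: continuous part F_c(x) = 0.5(1 − e^{−x}) plus jumps of 0.3 at x = 1 and 0.2 at x = 2. With u ≡ −1, H is computed once from the product form (4.6.2) and once by stepping through a discretized version of the integral equation (4.6.3), and the two agree.

```python
import math

JUMPS = [(1.0, 0.3), (2.0, 0.2)]  # (location x_i, jump size dF(x_i))

def Fc(x):
    """Continuous part of the toy d.f.: 0.5 * (1 - e^{-x})."""
    return 0.5 * (1.0 - math.exp(-x)) if x > 0 else 0.0

def H_product(t, u=-1.0, H0=1.0):
    """Exponential formula (4.6.2): product over jumps times the
    exponential of the integral of u against the continuous part."""
    prod = 1.0
    for xi, dF in JUMPS:
        if xi <= t:
            prod *= 1.0 + u * dF
    return H0 * prod * math.exp(u * Fc(t))

def H_euler(t, u=-1.0, H0=1.0, n=100_000):
    """Forward-stepping solution of the integral equation (4.6.3)."""
    h = t / n
    H = H0
    for i in range(n):
        x0, x1 = i * h, (i + 1) * h
        H += H * u * (Fc(x1) - Fc(x0))   # continuous increment
        for xi, dF in JUMPS:
            if x0 < xi <= x1:
                H += H * u * dF          # jump uses the left limit H(x-)
    return H
```

The multiplicative jump factors 1 + u(x_i)∆F(x_i) are exactly what the stepping scheme produces at the discontinuities, which is the content of the lemma.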
Definition 4.6.IV. The hazard measure Q(·) associated with the distribution F on [0, ∞) is the measure on [0, ∞) for which

Q(dx) = F(dx)/S(x−) = F(dx)/(1 − F(x−));

in integrated form, the integrated hazard function (IHF) is the function

Q(t) = ∫_0^t dF(x)/(1 − F(x−)).

In the case where F has a density f, we have simply

Q(t) = ∫_0^t q(x) dx = −log S(t),

where q(x) = f(x)/S(x) is the hazard function and S(x) = 1 − F(x) the survivor function of F. However, this logarithmic relation holds only in the continuous case; in the discrete case, it must be replaced by a relation analogous to (4.6.2) [see Kotz and Shanbhag (1980) or Andersen et al. (1993, Theorem II.6.6)].

Proposition 4.6.V. The IHF of a right-continuous d.f. F is monotonically increasing and right-continuous, and at each discontinuity x_i of F it has a jump of height

∆Q(x_i) = ∆F(x_i)/S(x_i−) ≤ 1.

Conversely, any monotonically increasing right-continuous nonnegative function Q with discontinuities of magnitude < 1, except perhaps for a final discontinuity of size 1, can be the IHF of some d.f. F given by the inversion formula

S(t) = 1 − F(t) = [Π_{0≤x_i≤t} (1 − ∆Q(x_i))] exp(−∫_0^t dQ_c(x)),   (4.6.4)

where ∆Q(x_i) is the jump of Q at its discontinuity x_i and Q_c is the continuous part of Q.

Proof. Given a d.f. F on [0, ∞), observe first that when F has a jump ∆F(x_i) at the discontinuity x_i, the corresponding jump in the IHF is ∆F(x_i)/S(x_i−) by Definition 4.6.IV. Since

∆F(x_i) = F(x_i) − F(x_i−) ≤ 1 − F(x_i−) = S(x_i−),

with equality if and only if F(x_i) = 1—that is, x_i is a discontinuity of F and is the supremum of the support of F—we must have ∆Q(x_i) ≤ 1, with equality possible only for such x_i. The inversion formula (4.6.4) is an immediate application of the exponential formula. To see this, we have from Definition 4.6.IV

dF(x_i) = S(x_i−) dQ(x_i)
with

S(t) = 1 − F(t) = 1 − ∫_0^t dF(x) = 1 − ∫_0^t S(x−) dQ(x).

Taking u(x) = −1 in (4.6.3), S(·) is the unique solution of this equation satisfying ∫_0^t |S(x)| dQ(x) < ∞ for t < ∞, so (4.6.4) holds.

Corollary 4.6.VI. The d.f. F is uniquely determined by its IHF and conversely.

This corollary is simply a formalization and extension of the fact that a renewal process is determined entirely by its lifetime d.f. The fact that the hazard measure is also the central concept in estimating the time to the next renewal has been shown already in Example 4.6(a), which we now continue but without any assumption of absolute continuity.

Example 4.6(a) (continued). Recall the setting leading to the density q_x(y) earlier. If the lifetime distribution has a jump at x, then we should think of the risk as having a δ-function component at x, the weight associated with the δ-function being given by ∆Q(x) as above. Then, in place of the survivor function S_x(y) given earlier, we now appeal to the corresponding modification of (4.6.4), namely

S_x(y) = [Π_{x≤x_i≤x+y} (1 − ∆Q(x_i))] exp(−∫_x^{x+y} dQ_c(u)).
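In the purely discrete case the inversion formula (4.6.4) can be verified exactly. The sketch below uses a geometric lifetime (my toy choice) and exact rational arithmetic: the hazard jumps are ∆Q(k) = ∆F(k)/S(k−), and the survivor function is recovered as the product of the factors 1 − ∆Q(k).

```python
from fractions import Fraction

p = Fraction(1, 3)   # success probability of a geometric lifetime on {1, 2, ...}
n_max = 12

pmf = {k: (1 - p) ** (k - 1) * p for k in range(1, n_max + 1)}   # jumps dF(k)
S_left = {k: (1 - p) ** (k - 1) for k in range(1, n_max + 1)}    # S(k-) = Pr{X >= k}
dQ = {k: pmf[k] / S_left[k] for k in pmf}                        # hazard jumps

def S_from_Q(t):
    """Discrete version of (4.6.4): S(t) is the product of 1 - dQ(x_i)
    over the discontinuities x_i <= t (no continuous part here)."""
    out = Fraction(1)
    for k, q in dQ.items():
        if k <= t:
            out *= 1 - q
    return out
```

For the geometric distribution the hazard jumps are constant (all equal to p), the discrete analogue of the constant hazard of the exponential distribution.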
In a Wold process, the risk has to be conditioned not only by the time since the last event but also by the length of the most recently observed complete interval, as in the following example.

Example 4.6(b) Wold process with exponential conditional distributions (see Exercise 4.5.8). Wold (1948) and Cox (1955) both considered processes with Markov-dependent intervals, where the transition kernel has the form

P(x, dy) = p(x, y) dy = λ(x) exp[−λ(x)y] dy   (x, y > 0),

corresponding to the assumption that, conditional on the length x of the last interval, the current interval is exponentially distributed with parameter λ(x). In this case, if we observe the process at time t and the length of the last completed interval is x, the risk is constant at λ(x) until the occurrence of the next event. As a stochastic process, the conditional risk appears as a step function, constant over intervals, the constant for any one interval being a function of the length of the preceding interval. Clearly, the ideas in these two examples can be generalized to situations where the dependence on the past extends to more than just the time since the last event or the length of the last completed interval. Such extensions and further examples are explored in Chapters 7 and 14.
CHAPTER 5
Finite Point Processes
The Poisson process can be generalized in many directions. We have already discussed some consequences of relaxing the independency assumptions while retaining those of stationarity and orderliness of a point process on the line. In this chapter we examine generalizations in another direction, stemming from the observation in Chapter 2 that, for a Poisson process, conditional on the total number of points in a bounded region of time or space, the individual points can be treated as independently and identically distributed over the region. This prompts an alternative approach to specifying the structure of point processes in a bounded domain or, more generally, of any point process in which the total number of points is finite with probability 1. Such a process is called a finite point process. Such finite point processes arise naturally as models for populations of animals, insects, and plants in the ecological field and as models for particle processes in physics, which was also the context of the first general theory of point processes given by Moyal (1962a) following earlier work by Yvon (1935), Bogoliubov (1946), Janossy (1950), Bhabha (1950) and Ramakrishnan (1950). More recently, spatial point processes have been extensively studied with an emphasis on finite models. Useful reviews can be found in Ripley (1981), Diggle (1983), Stoyan, Kendall and Mecke (1987, 1995), Baddeley and Møller (1989), Cressie (1991), Stoyan and Stoyan (1994), Baddeley et al. (1996), and Barndorff-Nielsen (1998), amongst others. In this chapter, we give a somewhat informal introduction to concepts and structure theorems for finite point processes, with a sketch of some of their applications. In contrast to the methods of the previous two chapters, the order properties of the real line here play no role in the discussion, and the theory can be developed as easily for a general state space as it can for the real line. 
In this sense, the present chapter serves as a precursor to the general theory developed more systematically in Volume Two.
The approach we take is first to specify the distribution of the total number N of points, and then, given N, to specify the joint distribution of the N points over the region. This leads to a treatment of point process probabilities as probability measures over the space X^∪ introduced formally above Proposition 5.3.II and of the associated battery of Janossy measures, moment measures, cumulant measures, etc., all of which are recurrent themes in the development of the general theory. A special feature of the treatment of finite point processes is its dependence on combinatorial arguments. The reader may find it helpful to brush up on the definitions of binomial and multinomial coefficients and their relation to the number of ways of sorting a set of objects into various subsets. Closely related to these ideas are the results collected together in Section 5.2 concerning some basic tools for handling discrete distributions: factorial moments and cumulants and their relation with probability generating functions. The importance of this material for the theory of point processes would be hard to overemphasize. Most of the results of this chapter, and much of the general theory also, may be seen as extensions of the results for discrete distributions summarized in that section.
5.1. An Elementary Example: Independently and Identically Distributed Clusters

We start with an elementary example that may help to illustrate and motivate the more general discussion. Let a random number N of particles be independently and identically distributed (i.i.d.) over a Euclidean space X according to some common probability measure F(·) on the Borel sets of X. Then, given N, the number of particles in any subregion A is found by 'binomial sampling': each particle, independently of the others, may fall in A with probability p = F(A), so, conditional on N, the number of particles in A has the binomial distribution

p(n; A | N) = \binom{N}{n} (F(A))^n (1 − F(A))^{N−n}.

Similarly, given any finite partition A_1, . . . , A_k of X, the joint distribution of the numbers of particles is given by the multinomial probability

p(n_1, . . . , n_k; A_1, . . . , A_k | N) = \binom{N}{n_1 · · · n_k} (F(A_1))^{n_1} · · · (F(A_k))^{n_k}.

Unconditionally, the joint distribution of the numbers N(A_1), . . . , N(A_k) of particles in A_1, . . . , A_k is found by averaging over N:

Pr{N(A_i) = n_i (i = 1, . . . , k)} = Σ_{n=0}^∞ Pr{N = n} p(n_1, . . . , n_k; A_1, . . . , A_k | n).
The procedure just outlined is most readily carried out in terms of probability generating functions (p.g.f.s). Let P_N(z) = E(z^N), and write for convenience p_i = F(A_i). Then, the joint p.g.f. of the N(A_i) (i = 1, . . . , k) is

P(A_1, . . . , A_k; z_1, . . . , z_k) ≡ E(z_1^{N(A_1)} · · · z_k^{N(A_k)}) = P_N(p_1 z_1 + · · · + p_k z_k).   (5.1.1)

More generally, for A_1, . . . , A_k just a set of mutually disjoint subregions,

P(A_1, . . . , A_k; z_1, . . . , z_k) = P_N(p_1 z_1 + · · · + p_k z_k + (1 − p_1 − · · · − p_k));   (5.1.2)

in effect, we have introduced a further subset A_{k+1} = (A_1 ∪ · · · ∪ A_k)^c and set z_{k+1} = 1 on A_{k+1}. As special cases, when N is Poisson-distributed with parameter λ, the N(A_i) are independent Poisson random variables with parameters λF(A_i). In this case, (5.1.1) reduces to the identity

P(A_1, . . . , A_k; z_1, . . . , z_k) = exp{λ[Σ_{i=1}^k z_i F(A_i) − 1]} = Π_{i=1}^k exp[λF(A_i)(z_i − 1)].

When N has a negative binomial distribution on {0, 1, . . .}, so that P_N(z) = (1 + µ(1 − z))^{−α} for some µ, α > 0, {N(A_i)} is a set of mutually correlated negative binomial random variables with joint p.g.f.

P(A_1, . . . , A_k; z_1, . . . , z_k) = [1 + µ Σ_{i=1}^k F(A_i)(1 − z_i)]^{−α}.
In particular, from (5.1.2), the distribution of N(A_i) itself has the p.g.f.

P(A_i; z) = [1 + µF(A_i)(1 − z)]^{−α}

and is again negative binomial with parameters µF(A_i), α.

It is not only the distributions of the N(A_i) that may be of interest but also their moments. Consider, for example, the problem of finding the covariance of the numbers of points in two complementary subsets A_1, A_2 = A_1^c. For any given N, we have from the binomial sampling property that

E[N(A_1)N(A_2) | N] = N(N − 1)F(A_1)(1 − F(A_1)) = N(N − 1)F(A_1)F(A_2).

Hence,

E(N(A_1)N(A_2)) = m[2] F(A_1)F(A_2)   (5.1.3)

and

cov(N(A_1), N(A_2)) = c[2] F(A_1)F(A_2),   (5.1.4)

where m[2] is the second factorial moment, and c[2] the second factorial cumulant, of the total number N of points. In the Poisson case, the covariance is zero, and in the negative binomial case it is positive; both contrast with the more familiar case of fixed N, when the covariance is clearly negative. Note that both the second moment and the covariance have the form of a measure evaluated on the product set A_1 × A_2. This is also the case in general and anticipates the introduction of the factorial moment and cumulant measures in Section 5.4.
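Identity (5.1.3) can be confirmed by exact enumeration. In the sketch below the distribution chosen for N (a binomial) and all parameter values are illustrative: the left-hand side is computed by summing E[N(A_1)N(A_2) | N = n] over the binomial split of the n points, the right-hand side from the second factorial moment.

```python
from math import comb

pN = [comb(6, n) * 0.4**n * 0.6 ** (6 - n) for n in range(7)]  # toy law of N
p = 0.3  # p = F(A1); A2 is the complement, so F(A2) = 1 - p

# E[N(A1) N(A2)] by enumerating N and the binomial split of its points
lhs = sum(
    pN[n]
    * sum(comb(n, k) * p**k * (1 - p) ** (n - k) * k * (n - k) for k in range(n + 1))
    for n in range(7)
)

# m_[2] F(A1) F(A2), with m_[2] = E[N(N - 1)]
m2 = sum(pN[n] * n * (n - 1) for n in range(7))
rhs = m2 * p * (1 - p)
```

The inner sum is just E[K(n − K)] for K binomial(n, p), which equals n(n − 1)p(1 − p), so the two sides agree term by term.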
5.2. Factorial Moments, Cumulants, and Generating Function Relations for Discrete Distributions

Factorial moments and cumulants are natural tools for handling nonnegative integer-valued random variables, a characteristic they bequeath to their offspring, the factorial moment and cumulant measures, in the point process context. We begin by recalling some basic definitions. For any integers n and r, the factorial powers of n, written n[r], may be defined by

n[r] = n(n − 1) · · · (n − r + 1)   (r = 0, . . . , n),
n[r] = 0   (r > n).

We then have the following definition.

Definition 5.2.I. For r = 0, 1, . . . , the rth factorial moment m[r] of the nonnegative integer-valued random variable N is m[r] ≡ E(N[r]). Thus, when N has probability distribution {p_n} = {Pr{N = n}},

m[r] = Σ_{n=0}^∞ n[r] p_n.   (5.2.1)

Consequently, when the distribution is concentrated on a finite range 0, 1, . . . , n_0, all factorial moments of order larger than n_0 are zero. It is useful to be able to convert from factorial moments to ordinary moments and back again. The coefficients that arise in these conversions are the Stirling numbers of the first and second kinds, defined, respectively, as the coefficients arising in the expansion of x[r] and x^r in powers or factorial powers of x, where, by analogy with the definition of n[r], x[r] = x(x − 1) · · · (x − r + 1) for any real x and positive integer r. We follow the notation of David and Barton (1962) in denoting them by D_{j,r} and ∆_{j,r}.

Definition 5.2.II. The Stirling numbers of the first kind D_{j,r} and second kind ∆_{j,r} are defined by the relations

n[r] = Σ_{j=1}^r (−1)^{r−j} D_{j,r} n^j   (n ≥ r)   (5.2.2)
and

n^r = Σ_{j=1}^r ∆_{j,r} n[j]   (n ≥ r).   (5.2.3)
Replacing n in (5.2.2) and (5.2.3) by the random variable N and taking expectations, we obtain the corresponding relations between moments:

m[r] = Σ_{j=1}^r (−1)^{r−j} D_{j,r} m_j,   (5.2.4)

m_r ≡ E(N^r) = Σ_{j=1}^r ∆_{j,r} m[j].   (5.2.5)
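Relations (5.2.2) and (5.2.3) are easy to check by computing the Stirling numbers from their standard recurrences (the function names below are mine; D(j, r) is unsigned, the sign (−1)^{r−j} being supplied in (5.2.2)):

```python
from functools import lru_cache
from math import prod

@lru_cache(maxsize=None)
def D(j, r):
    """Unsigned Stirling numbers of the first kind D_{j,r}."""
    if r == 0:
        return 1 if j == 0 else 0
    if j < 1:
        return 0
    return D(j - 1, r - 1) + (r - 1) * D(j, r - 1)

@lru_cache(maxsize=None)
def Delta(j, r):
    """Stirling numbers of the second kind Delta_{j,r}."""
    if r == 0:
        return 1 if j == 0 else 0
    if j < 1:
        return 0
    return Delta(j - 1, r - 1) + j * Delta(j, r - 1)

def falling(n, r):
    """Factorial power n[r] = n(n-1)...(n-r+1)."""
    return prod(n - i for i in range(r))
```

For example, n[3] = n³ − 3n² + 2n, so D_{3,3} = 1, D_{2,3} = 3, D_{1,3} = 2.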
It is clear that, for a nonnegative random variable, the rth factorial moment is finite if and only if the ordinary rth moment is finite. Some useful recurrence relations for the Stirling numbers are given in Exercise 5.2.1. For further properties, relation to Bernoulli numbers, and so on, see David and Barton (1962, Chapter 15) and texts on finite differences.

The factorial moments of the random variable N are related to the Taylor series expansion of the p.g.f.

P(z) = E(z^N)   (|z| ≤ 1)

about z = 1 in much the same way as the ordinary moments arise in the expansion of the characteristic or moment generating function about the origin.

Proposition 5.2.III. For a nonnegative integer-valued random variable N whose kth factorial moment is finite, the p.g.f. is expressible as

P(1 + η) = 1 + Σ_{r=1}^k m[r] η^r / r! + o(η^k)   (5.2.6)
for all η such that |1 + η| ≤ 1. The complete Taylor series expansion of the p.g.f.,

P(1 + η) = 1 + Σ_{r=1}^∞ m[r] η^r / r!,   (5.2.7)

is valid for some nonzero η if and only if all moments exist and the series in (5.2.7) has nonzero radius of convergence in η; equivalently, if and only if the p.g.f. P(z) is analytic in a disk |z| < 1 + ε for some ε > 0. Equation (5.2.7) then holds for |η| < ε.

Proof. To establish (5.2.6), write

(1 + η)^N = 1 + Σ_{r=1}^k N[r] η^r / r! + R_k(N, η)   (k = 1, 2, . . .)

for remainder terms R_k(N, η) that we now investigate. For k = 0, set R_0(N, η) = (1 + η)^N − 1
and observe that |R_0(N, η)| ≤ 2 under the condition of the theorem that |1 + η| ≤ 1. For general k = 1, 2, . . . , repeated integration of R_0(N, ·) shows that

|R_k(N, η)/η^k| ≤ 2N[k]/k!   (|1 + η| ≤ 1).

Since the left-hand side of this inequality → 0 (η → 0) for each fixed N and the right-hand side has finite expectation under the assumption of the theorem, it follows by dominated convergence that E[R_k(N, η)] = o(η^k), which is the result required. To establish (5.2.7), consider the binomial expansion

(1 + η)^N = 1 + Σ_{r=1}^∞ N[r] η^r / r!.
For η > 0, the finiteness of the expectation on the left is equivalent to requiring the p.g.f. to be analytic for |z| < 1 + η. When this condition is satisfied, it follows from Fubini's theorem that for such η the expectation can be taken inside the summation on the right, leading to the right-hand side of (5.2.7). Conversely, suppose all moments exist and that the sum on the right-hand side of (5.2.7) is at least conditionally convergent for some nonzero η_0. Then m[r] η_0^r / r! → 0 as r → ∞, and it follows from a standard power series argument that the series in (5.2.7) is absolutely convergent for |η| < |η_0| and so defines an analytic function of η there. Since each m[r] = E(N[r]) is nonnegative, we can now take any positive η < |η_0| and use Fubini's theorem to reverse the argument used earlier to deduce that because (5.2.7) holds for all 0 ≤ η ≤ |η_0|, P(z), being a power series with nonnegative coefficients, has its first singularity on the positive half-line outside |z| < 1 + |η_0|.

In the sequel, we also require the version of Proposition 5.2.III in which the remainder term is bounded by a term proportional to the (k + 1)th moment. The proof, which is along similar lines, is left to the reader. An alternative approach is indicated in Exercise 5.2.2.

A similar expansion holds for log P(1 + η), the coefficients of η^r/r! being the factorial cumulants c[r] (r = 1, 2, . . .). If P(·) is analytic in a disk as below (5.2.7), then the infinite expansion

log P(1 + η) = Σ_{r=1}^∞ c[r] η^r / r!   (5.2.8a)

is valid, while under the more limited assumption that m_k < ∞, we have the finite Taylor series expansion

log P(1 + η) = Σ_{r=1}^k c[r] η^r / r! + o(η^k)   (η → 0),   (5.2.8b)

valid for |1 + η| < 1; verification is left to the reader.
The factorial cumulants are related to the factorial moments by the same relations as hold between the ordinary cumulants and moments. The first few relations between the ordinary cumulants c_r, central moments m_r, and factorial moments and cumulants are useful to list as below:

c[1] = c_1 = µ = m[1],   (5.2.9a)

c[2] = c_2 − c_1 = σ² − µ = m[2] − m[1]²,   (5.2.9b)

c[3] = c_3 − 3c_2 + 2c_1 = m_3 − 3σ² + 2µ = m[3] − 3m[2]m[1] + 2m[1]³.   (5.2.9c)
Generally, the factorial moments and cumulants provide a much simpler description of the moment properties of a discrete distribution than do the ordinary moments. In particular, for the Poisson distribution {p_n(λ)},

m[r] = λ^r,   c[1] = λ,   c[r] = 0   (r = 2, 3, . . .).
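These Poisson relations can be confirmed directly from definition (5.2.1); the truncation point and names in the sketch below are my choices:

```python
from math import exp, factorial

lam = 2.5
N_MAX = 120  # truncation point; the Poisson tail beyond this is negligible

pmf = [exp(-lam) * lam**n / factorial(n) for n in range(N_MAX)]

def m_fact(r):
    """Factorial moment m_[r] = sum over n of n[r] p_n; for the Poisson law it is lam^r."""
    total = 0.0
    for n in range(r, N_MAX):
        f = 1.0
        for i in range(r):
            f *= n - i
        total += f * pmf[n]
    return total
```

In particular m[2] − m[1]² = c[2] vanishes, as asserted in the text.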
This vanishing of the factorial cumulants of the Poisson distribution is reminiscent of the vanishing of the ordinary cumulants of the normal distribution and is perhaps one indication of why the Poisson process plays such an outstanding role in the theory of point processes.

There are in fact four expansions of the p.g.f. of possible interest, according to whether we expand P(z) itself or its logarithm and whether the expansion is about z = 0 or z = 1. The expansions about z = 1 yield the factorial moments and factorial cumulants, and the expansion of P(z) about z = 0 yields the probability distribution {p_n}. This leaves the expansion of log P(z) about z = 0, an expansion that, while rarely used, has an important interpretation in the case of an infinitely divisible (compound Poisson) distribution. Since the analogous expansion for the probability generating functional (p.g.fl.) of a point process is also important, again in the context of infinite divisibility, we now consider the last case in some detail.

Proposition 5.2.IV. If p_0 > 0, the p.g.f. P(·) can be written in the form

log P(z) = −q_0 + Σ_{n=1}^∞ q_n z^n   (|z| < R),   (5.2.10)

where p_0 = e^{−q_0} and R is the distance from the origin to the nearest zero or singularity of P(z). When P(·) is the p.g.f. of a compound Poisson distribution, the terms q_n are nonnegative and q_0 = Σ_{n=1}^∞ q_n, so the sequence {π_n : n = 1, 2, . . .} ≡ {q_n/q_0} can be interpreted as the probability distribution of the cluster size, given that the cluster is nonempty; in this case, (5.2.10) can be rewritten as

log P(z) = −q_0 Σ_{n=1}^∞ π_n (1 − z^n)   (|z| < R).
Proof. The structure of the compound Poisson distribution follows from analysis in Chapter 2 (see Theorem 2.2.II and Exercise 2.2.2). The other remarks are standard properties of power series expansions of analytic functions.
118
5. Finite Point Processes
Example 5.2(a) Negative binomial distribution and generating functions. To illustrate these various expansions, consider the p.g.f. of the negative binomial distribution,
P(z) = [1 + µ(1 − z)]^{−α}   (µ > 0, α > 0, |z| ≤ 1).
Putting z = 1 + η, we find
P(1 + η) = (1 − µη)^{−α} = 1 + Σ_{r=1}^∞ C(α+r−1, r) µ^r η^r,
so that m[r] = α(α + 1) · · · (α + r − 1)µ^r. Taking logarithms,
log P(1 + η) = −α log(1 − µη) = α Σ_{r=1}^∞ µ^r η^r / r,
and hence c[r] = (r − 1)! αµ^r. For the expansions about z = 0, we have
P(z) = (1 + µ)^{−α} (1 − µz/(1+µ))^{−α} = (1 + µ)^{−α} Σ_{n=0}^∞ C(α+n−1, n) (µz/(1+µ))^n,
so
p_n = C(α+n−1, n) (1 + µ)^{−α} (µ/(1+µ))^n,
and
log P(z) = −α log(1 + µ) − α log(1 − µz/(1+µ)) = −[α log(1 + µ)] (1 − Σ_{n=1}^∞ π_n z^n),
where π_n = [n log(1 + µ)]^{−1} [µ/(1 + µ)]^n. Clearly, these {π_n} constitute a probability distribution, namely the logarithmic distribution, illustrating the well-known fact that the negative binomial is infinitely divisible and hence must be expressible as a compound Poisson distribution.
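The expansions in this example can be checked numerically. The following sketch in Python (the values of α and µ are arbitrary test choices, and the distribution is truncated at n = 400) verifies m[r] = α(α+1)···(α+r−1)µ^r and, via the relations (5.2.9), the factorial cumulants c[r] = (r−1)! αµ^r:

```python
from math import lgamma, exp, log

def nb_pmf(n, alpha, mu):
    # p_n = C(alpha+n-1, n) (1+mu)^(-alpha) (mu/(1+mu))^n, via log-gamma for stability
    return exp(lgamma(alpha + n) - lgamma(alpha) - lgamma(n + 1)
               - alpha * log(1 + mu) + n * (log(mu) - log(1 + mu)))

alpha, mu = 2.5, 1.5
p = [nb_pmf(n, alpha, mu) for n in range(400)]

def fact_moment(r):
    # m[r] = sum_n n(n-1)...(n-r+1) p_n
    total = 0.0
    for n, pn in enumerate(p):
        prod = 1.0
        for i in range(r):
            prod *= n - i
        total += prod * pn
    return total

m1, m2, m3 = fact_moment(1), fact_moment(2), fact_moment(3)
# m[r] = alpha(alpha+1)...(alpha+r-1) mu^r
assert abs(m1 - alpha * mu) < 1e-6
assert abs(m2 - alpha * (alpha + 1) * mu ** 2) < 1e-5
# factorial cumulants via (5.2.9): c[r] = (r-1)! alpha mu^r
c2 = m2 - m1 ** 2
c3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
assert abs(c2 - alpha * mu ** 2) < 1e-5
assert abs(c3 - 2 * alpha * mu ** 3) < 1e-4
```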
Corresponding to the four possible expansions referred to above, there are twelve sets of conversion relations between the different coefficients. One of these, the expression for factorial moments in terms of the probabilities, is a matter of definition: what can be said about the others? Formally, either expansion about z = 1 can be converted to an expansion about z = 0 by a change of variable and expansion, for example, in (formally) expressing the probabilities in terms of the factorial moments via
P(z) = 1 + Σ_{r=1}^∞ m[r] (z − 1)^r / r! ;
5.2.
Factorial Moments, Cumulants, and Generating Function Relations
119
expanding (z − 1)^r and equating coefficients of z^n, we obtain
p_n = Σ_{r≥n} (−1)^{r−n} (m[r]/r!) C(r, n)
or, in the more symmetrical form,
n! p_n = Σ_{r=n}^∞ (−1)^{r−n} m[r]/(r − n)! = Σ_{r=0}^∞ (−1)^r m[n+r]/r! .   (5.2.11)
This relation may be compared with its converse
m[r] = Σ_{n=r}^∞ n^[r] p_n = Σ_{n=0}^∞ J_{r+n}/n! ,   (5.2.12)
where J_{n+r} = (n + r)! p_{n+r}. Thus, to display the symmetry in these (formal) relations to best advantage, we need to use the quantities J_n, which are analogues of the Janossy measures to be introduced in Section 5.3.
Under what circumstances can the converse relation (5.2.11) be established rigorously? For the derivation above to be valid, we must be able to expand P(z) about z = 1 in a disk |z − 1| < 1 + ε for some ε > 0, requiring P(z) itself to be analytic at all points on the line segment (−ε, 2 + ε). Since P(z) has nonnegative coefficients, its radius of convergence is determined by the first singularity on the positive real axis. Consequently, in order for (5.2.11) to hold for all r = 1, 2, . . . , it is sufficient that P(z) should be analytic in the disk |z| < 2 + ε for some ε > 0. A finite version of (5.2.11) with remainder term is due to Fréchet (1940); extensions are given in Takács (1967) and Galambos (1975) (see also Daley and Narayan, 1980). We give a simple result in the proposition below, with some extensions left to Exercises 5.2.2–4.
Proposition 5.2.V. If the distribution {p_n} has all its moments finite and its p.g.f. P(z) is convergent in a disk |z| < 2 + ε for some ε > 0, then (5.2.11) holds. Without assuming such analyticity, the finiteness of m[k] ensures that for integers n = 0, 1, . . . , k − 1,
n! p_n = Σ_{r=n}^{k−1} (−1)^{r−n} m[r]/(r − n)! + R_k^(n),   (5.2.13a)
where
0 ≤ (−1)^{k−n} R_k^(n) ≤ m[k]/(k − n)! .   (5.2.13b)
If all moments are finite and for some integer n0
m[k] = o((k − n0)!)   (k → ∞),   (5.2.14a)
then
lim_{k→∞} Σ_{r=n}^k (−1)^{r−n} m[r]/(r − n)!   (5.2.14b)
exists for n = 0, 1, . . . , n0 and the formal relation (5.2.11) holds for such n.
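As a concrete illustration of (5.2.11), the second form n! p_n = Σ_{r≥0} (−1)^r m[n+r]/r! can be summed directly when the factorial moments are known. A Python sketch for the Poisson case, where m[r] = λ^r (the rate λ and the truncation point are arbitrary choices):

```python
from math import exp, factorial

lam = 2.0  # Poisson rate; the factorial moments are m[r] = lam**r

def p_from_factorial_moments(n, terms=60):
    # n! p_n = sum_{r>=0} (-1)^r m[n+r] / r!   (second form of (5.2.11))
    s = sum((-1) ** r * lam ** (n + r) / factorial(r) for r in range(terms))
    return s / factorial(n)

for n in range(6):
    exact = exp(-lam) * lam ** n / factorial(n)
    assert abs(p_from_factorial_moments(n) - exact) < 1e-12
```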
Proof. When P(z) is analytic for |z| < 2 + ε, the expansion
P(z) = Σ_{r=0}^∞ m[r] (z − 1)^r / r!
is valid for |z − 1| < 1 + ε, within which region, and at z = 0 in particular, it can be differentiated n times, leading at once to (5.2.11).
Under the weaker condition that m[k] < ∞, n-fold differentiation in the definition P(z) = E(z^N) is possible for all |z| ≤ 1 for n = 1, . . . , k, leading to P^(n)(z) = E(N^(n) z^{N−n}). Now P^(n)(z) is (k − n) times differentiable in |z| ≤ 1, so the Taylor series expansion
P^(n)(z) = Σ_{r=0}^{k−n−1} (z − 1)^r P^(n+r)(1)/r! + (z − 1)^{k−n} P^(k)(1 + (z − 1)ν)/(k − n)!
holds for real z in |z| ≤ 1 for some ν ≡ ν(z) in (0, 1). In particular, (5.2.13a) results on putting z = 0 with
R_k^(n) = (−1)^{k−n} E(N^(k)(1 − ν)^{N−k})/(k − n)! ,
from which relation the inequalities in (5.2.13b) follow. When (5.2.14) holds, R_k^(n) → 0 (k → ∞) for each fixed n, and hence (5.2.11) holds in the sense indicated.
Special cases of (5.2.13) give the Bonferroni inequalities (see Exercise 5.2.5). Similar relations can be obtained between the factorial cumulants and the quantities π_n of Proposition 5.2.IV. Thus, when log P(z) is analytic in a disk |z| < 1 + ε for some ε > 0, r-fold differentiation of (5.2.10) and then setting z = 1 yields
c[r] = Σ_{n=r}^∞ q_n n^[r] = q0 µ[r] ,   (5.2.15)
where µ[r] in the case of a compound Poisson process is the rth factorial moment of the cluster-size distribution. Reversing the exercise, when log P(z) is analytic in the disk |z| < 2 + ε, we have [see the derivation of (5.2.11)]
n! q_n = Σ_{r=n}^∞ (−1)^{r−n} c[r]/(r − n)! .   (5.2.16)
The most difficult relations to treat in a general form are those between the moments and cumulants, or between the {pn } and the {qn }; these arise from taking exponentials or logarithms of a given series and expanding it by formal manipulation. The feature of these relations is that they involve partitions. For given positive integers j and k with j ≤ k, we define a j-partition of k as a partition of the set of k numbers {1, . . . , k} into j nonempty subsets.
Let P_jk denote the collection of all such j-partitions and write T = {S1(T), . . . , Sj(T)} for an element of P_jk, noting that the order in which the subsets S_i(T) are labelled or written is immaterial. Thus, for example, the collection of sets {1, 2, 4}, {3, 5}, {6, 8}, {7} is a 4-partition of 8 and is the same as {1, 2, 4}, {6, 8}, {7}, {3, 5}. The following lemma is basic (see e.g. Andrews, 1976); in it, |S_i(T)| denotes the number of elements in S_i(T) ⊂ {1, . . . , k}.
Lemma 5.2.VI. Let {c_j: j = 1, 2, . . .} be a sequence satisfying Σ_{j=1}^∞ |c_j|/j! < ∞. Then
exp( Σ_{j=1}^∞ c_j z^j / j! ) = Σ_{k=0}^∞ d_k z^k / k!   (all |z| ≤ 1),   (5.2.17)
where d0 = 1 and for k = 1, 2, . . . ,
d_k = Σ_{j=1}^k Σ_{T∈P_jk} Π_{i=1}^j c_{|S_i(T)|} ,   (5.2.18)
c_k = Σ_{j=1}^k (−1)^{j−1} (j − 1)! Σ_{T∈P_jk} Π_{i=1}^j d_{|S_i(T)|} .   (5.2.19)
Proof. Establishing (5.2.18) and (5.2.19) is essentially a matter of counting terms. For (5.2.18), consider the expansion 1 + Σ + Σ²/2! + · · · of the exponential function in (5.2.17) (here, Σ = Σ_{j=1}^∞ c_j z^j/j!), and concentrate attention on all the terms in a specified product of coefficients such as c3 c2² c1. Observe first that such terms involve z to the power of the sum of the indices, here 3 + 2 + 2 + 1 = 8, and thus they contribute to the term d8. Second, if we transfer the coefficient 1/k! of d_k z^k to the multiplier k! on the opposite side, each particular term c3 c2² c1 is then multiplied by the ratio of factorials 8!/(3! 2! 2! 1!) arising from the factorials associated with the c_j and d_k. Third, the number of such terms obtained from expanding Σ⁴ equals the multinomial coefficient 4!/(1! 2! 1!), which on division by the factorial 4! from the expansion of exp(Σ) leaves the factor 1/(1! 2! 1!). Thus, altogether the contribution of the coefficient of c3 c2² c1 to d8 is 8!/{(3! 2! 2! 1!)(1! 2! 1!)}. On the other hand, in the expression asserted for d_k in (5.2.18), we have to look at 4-partitions of 8 into subsets of sizes 3, 2, 2, 1. The number of such partitions is just 8!/(3! 2! 2! 1!), which must be divided by 2! because there are two subsets of size 2. Thus, the coefficient of c3 c2² c1 is of the form implied by (5.2.18). Arguing this way in general establishes (5.2.18), and a similar kind of argument leads to (5.2.19).
We remark that the advantage of working with j-partitions, rather than with additive partitions as in David and Barton (1962), is that the counting procedure automatically takes into account repeated terms without requiring explicit notation for the number of repetitions; such notation would make (5.2.18) and (5.2.19) appear much more cumbersome. Examples of full expansions are given in Exercises 5.2.6–8.
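Relation (5.2.18) lends itself to direct computation. The sketch below (Python; the coefficient values c_j are arbitrary test data) enumerates the j-partitions of {1, . . . , k} as set partitions, evaluates d_k by (5.2.18), and checks the result against an independent power-series exponentiation of the left side of (5.2.17):

```python
from math import factorial

def set_partitions(elements):
    # recursively generate all partitions of a list into nonempty blocks
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in set_partitions(rest):
        # put `first` in its own block, or into each existing block in turn
        yield [[first]] + smaller
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]

def d_from_c(k, c):
    # (5.2.18): d_k = sum over all partitions T of {1..k} of prod_i c_{|S_i(T)|}
    total = 0.0
    for T in set_partitions(list(range(1, k + 1))):
        prod = 1.0
        for block in T:
            prod *= c[len(block)]
        total += prod
    return total

# independent check: exponentiate the power series sum_j c_j z^j / j! directly
c = {1: 0.7, 2: -0.3, 3: 1.1, 4: 0.25, 5: 0.0, 6: 0.0}
K = 6
a = [0.0] * (K + 1)
for j in range(1, K + 1):
    a[j] = c[j] / factorial(j)
e = [0.0] * (K + 1)          # coefficients of exp(series)
e[0] = 1.0
for n in range(1, K + 1):    # d/dz exp(A) = A' exp(A) gives this convolution recurrence
    e[n] = sum(j * a[j] * e[n - j] for j in range(1, n + 1)) / n
for k in range(1, 5):
    assert abs(d_from_c(k, c) - factorial(k) * e[k]) < 1e-9
```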
Corollary 5.2.VII. (a) Factorial moments m[k] and factorial cumulants c[k] are related as in (5.2.18) and (5.2.19) via the substitutions c_j = c[j] and d_k = m[k].
(b) In equation (5.2.10), the probabilities p_n and q_n are also related as at (5.2.18) and (5.2.19) with c_j = j! q_j and d_k = k! p_k/p0.
Exercises and Complements to Section 5.2
5.2.1 Recurrence relations for Stirling numbers. Use n^[r+1] = (n − r)n^[r] to show that
∆_{j,r+1} = j∆_{j,r} + ∆_{j−1,r},   ∆_{1r} = 1 (r ≥ 1),   ∆_{j0} = 0 (j ≥ 1),
D_{j,r+1} = rD_{j,r} + D_{j−1,r},   D_{0r} = 0 (r ≥ 1),   D_{11} = 1, D_{j1} = 0 (j ≥ 2).
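The recurrences tabulate immediately; in the Python sketch below, the boundary conditions are seeded as stated, and the resulting arrays are compared with known values of the Stirling numbers of the second kind and the unsigned Stirling numbers of the first kind (this identification is suggested by the recurrences themselves, and is an assumption here rather than something the exercise states):

```python
def stirling_tables(rmax):
    # Delta[r][j]: second-kind recurrence  Delta_{j,r+1} = j*Delta_{j,r} + Delta_{j-1,r}
    # D[r][j]:     first-kind recurrence   D_{j,r+1}     = r*D_{j,r}     + D_{j-1,r}
    Delta = [[0] * (rmax + 1) for _ in range(rmax + 1)]
    D = [[0] * (rmax + 1) for _ in range(rmax + 1)]
    Delta[1][1] = 1   # boundary: Delta_{1r} = 1 propagates from here
    D[1][1] = 1       # D_{11} = 1, D_{j1} = 0 for j >= 2
    for r in range(1, rmax):
        for j in range(1, rmax + 1):
            Delta[r + 1][j] = j * Delta[r][j] + Delta[r][j - 1]
            D[r + 1][j] = r * D[r][j] + D[r][j - 1]
    return Delta, D

Delta, D = stirling_tables(6)
assert [Delta[4][j] for j in range(1, 5)] == [1, 7, 6, 1]   # S(4, j)
assert [D[4][j] for j in range(1, 5)] == [6, 11, 6, 1]      # |s(4, j)|
```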
5.2.2 Show that when P(z) is any p.g.f. with finite first moment P′(1), the function (1 − P(z))/(P′(1)(1 − z)) is also a p.g.f. Use this fact in an induction argument to show that (see Proposition 5.2.III) when m[k] = P^(k)(1) < ∞, the function m_k(z) in the expansion
P(z) = 1 + Σ_{r=1}^{k−1} (z − 1)^r m[r]/r! + (z − 1)^k m_k(z)/k!
equals m[k] times a p.g.f. Since m_k(z) = m[k] + o(1) as z → 1 through values |z| ≤ 1, (5.2.6) follows, as well as the alternative version with remainder bounded by m[k]. Equations (5.2.13) can also be derived by n-fold differentiation of an expansion to k − n terms (e.g. Daley and Narayan, 1980).
5.2.3 Let the nonnegative integer-valued r.v. N have all factorial moments m[r] finite and lim sup_{r→∞} (m[r]/r!)^{1/r} = 1/ε for some ε > 0. Show that the p.g.f. P(z) of N has radius of convergence 1 + ε, and hence deduce that the moments m[r] determine the distribution of N uniquely. Relate P(z) to a moment generating function and deduce that 1 + ε = e^δ, where 1/δ ≡ lim sup_{r→∞} (m_r/r!)^{1/r}.
5.2.4 (Continuation). By using an analytic continuation technique (see Takács, 1965), show that when ε > 0, then for any z > −2^{−1} (in particular, for any nonnegative z),
p_n = Σ_{r=n}^∞ C(r, n) (1 + z)^{−(r+1)} Σ_{s=n}^r (−1)^{s−n} C(r − n, s − n) z^{r−s} m[s]/s! .
5.2.5 Bonferroni inequalities. Let the r.v. N count the number of occurrences amongst a given set of ν events A1, . . . , Aν. Show that
S_r ≡ Σ_(r) Pr(A_i ∩ A_j ∩ · · ·) = E(N^(r))/r! ,
where the summation Σ_(r) extends over all C(ν, r) distinct subsets {i, j, . . .} of size r from the index set {1, . . . , ν}. [Hint: Using indicator r.v.s, write
N^(r) = r! Σ_(r) I(A_i ∩ A_j ∩ · · ·),
where the term r! arises from the r! ordered subsets of {1, . . . , ν} yielding the same (unordered) subset {i, j, . . .} containing r indices.] Deduce from (5.2.13) the Bonferroni inequalities
0 ≤ S_n − C(n+1, 1)S_{n+1} + · · · + C(n+k, k)S_{n+k} − p_n ≤ C(n+k+1, k+1)S_{n+k+1} ,
where k is an even integer (see e.g. Moran, 1968, pp. 25–31).
5.2.6 For given positive integers j and k with j ≤ k, define P(j, k) = {positive integers {r1, . . . , rp} and {π1, . . . , πp} such that Σ_{i=1}^p π_i = j, Σ_{i=1}^p π_i r_i = k} = set of all j-partitions of k. Write the series (5.2.7) in the form P = 1 + Σ so that log P(z) = Σ − Σ²/2 + Σ³/3 − · · · , and expand the series Σ^n as a multinomial expansion. By equating coefficients of z^k, show formally that the factorial cumulants in (5.2.8) are given by
c[k] = k! Σ_{j=1}^k (−1)^{j−1} (j − 1)! Σ_{P(j,k)} (1/π1!)(m[r1]/r1!)^{π1} · · · (1/πp!)(m[rp]/rp!)^{πp} .
5.2.7 Apply Lemma 5.2.VI to show that
c[4] = m[4] − 4m[3]m[1] − 3m[2]² + 12m[2]m[1]² − 6m[1]⁴ ,
m[4] = c[4] + 4c[3]c[1] + 3c[2]² + 6c[2]c[1]² + c[1]⁴ .
5.2.8 Investigate the use of Lemma 5.2.VI in deriving explicit expressions for probabilities of (i) the ‘doubly Poisson’ compound Poisson distribution with p.g.f. P(z) = exp{−µ[1 − exp(−λ(1 − z))]}; (ii) the Hermite distribution with p.g.f. P(z) = exp(az + bz²) for appropriate constants a and b (see Milne and Westcott, 1993).
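The identities of Exercise 5.2.7 can be verified mechanically: compute the factorial moments of an arbitrary test distribution, recover the factorial cumulants by inverting the exponential-series recurrence implicit in P(1 + η) = exp(Σ_r c[r] η^r/r!), and compare. A Python sketch (the test pmf is an arbitrary choice):

```python
from math import factorial

# test distribution: an arbitrary pmf on {0,...,6}
w = [1, 3, 2, 4, 1, 2, 1]
p = [x / sum(w) for x in w]

def m_fact(r):
    # factorial moment m[r] = sum_n n(n-1)...(n-r+1) p_n
    s = 0.0
    for n, pn in enumerate(p):
        prod = 1.0
        for i in range(r):
            prod *= n - i
        s += prod * pn
    return s

m = [1.0] + [m_fact(r) for r in range(1, 5)]
e = [m[r] / factorial(r) for r in range(5)]      # coefficients of P(1+eta)
a = [0.0] * 5                                    # coefficients of log P(1+eta)
for n in range(1, 5):                            # invert the exp-series recurrence
    a[n] = e[n] - sum(j * a[j] * e[n - j] for j in range(1, n)) / n
c = [a[r] * factorial(r) for r in range(5)]      # factorial cumulants c[r]

lhs = c[4]
rhs = m[4] - 4 * m[3] * m[1] - 3 * m[2] ** 2 + 12 * m[2] * m[1] ** 2 - 6 * m[1] ** 4
assert abs(lhs - rhs) < 1e-8
assert abs(m[4] - (c[4] + 4 * c[3] * c[1] + 3 * c[2] ** 2
                   + 6 * c[2] * c[1] ** 2 + c[1] ** 4)) < 1e-8
```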
5.3. The General Finite Point Process: Definitions and Distributions
We now drop any special assumptions and suppose only that the following conditions hold concerning a finite point process.
Conditions 5.3.I. (a) The points are located in a complete separable metric space (c.s.m.s.) X, as, for example, X = R^d.
(b) A distribution {p_n} (n = 0, 1, . . .) is given determining the total number of points in the population, with Σ_{n=0}^∞ p_n = 1.
(c) For each integer n ≥ 1, a probability distribution Π_n(·) is given on the Borel sets of X^(n) ≡ X × · · · × X, and it determines the joint distribution of the positions of the points of the process, given that their total number is n.
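Conditions 5.3.I amount to a two-stage sampling recipe, and can be sketched directly in code. In the sketch below (Python), the number distribution {p_n} and the choice of Π_n as a product of i.i.d. uniform coordinates are arbitrary illustrative assumptions:

```python
import random

def simulate_finite_process(p, sample_point, rng=random):
    # p: distribution {p_n} of the total number of points (Condition 5.3.I(b))
    # sample_point: draws one coordinate; i.i.d. coordinates model Pi_n
    # as a symmetric product measure (an illustrative assumption)
    u, n = rng.random(), 0
    while u > p[n]:              # invert the c.d.f. of {p_n}
        u -= p[n]
        n += 1
    return [sample_point() for _ in range(n)]   # empty list when n = 0

random.seed(1)
p = [0.1, 0.4, 0.3, 0.2]                        # arbitrary finite support
points = simulate_finite_process(p, lambda: random.uniform(0.0, 1.0))
assert 0 <= len(points) <= 3
assert all(0.0 <= x <= 1.0 for x in points)
```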
Such a definition is both natural and powerful. In particular, it provides a constructive definition that could be used to simulate the process: first, generate a random number N according to the distribution {p_n} (and note that Pr{0 ≤ N < ∞} = 1), and then, supposing N = n and excepting the case n = 0 in which case there is nothing else to do, generate a random vector (x1, . . . , xn) according to the distribution Π_n(·).
At this stage, the distinction between ordered and unordered sets of points should be clarified. In talking of stochastic point processes, we make the tacit assumption that we are dealing with unordered sets of points: points play the role of locations at which a given set of particles might be found. We talk of the probability of finding a given number k of points in a set A: we do not give names to the individual points and ask for the probability of finding k specified individuals within the set A. Nevertheless, this latter approach is quite possible (indeed, natural) in contexts where the points refer to individual particles, animals, plants, and so on. Moreover, it is actually this latter point of view that is implicit in Conditions 5.3.I, for as yet there is nothing in them to prevent x1, say—that is, the first point or particle named—from taking its place preferentially in some part of the space, leaving the other particles to distribute themselves elsewhere. To be consistent with treating point processes as a theory of unordered sets, we stipulate that the distributions Π_n(·) should give equal weight to all n! permutations of the coordinates (x1, . . . , xn), i.e. Π_n(·) should be symmetric. If this is not already the case in Condition 5.3.I(c), it is easily achieved by introducing the symmetrized form: for any partition (A1, . . . , An) of X,
Π_n^sym(A1 × · · · × An) = (1/n!) Σ_perm Π_n(A_{i1} × · · · × A_{in}),   (5.3.1)
where the summation Σ_perm is taken over all n! permutations (i1, . . . , in) of the integers (1, . . . , n) and the normalizing factor 1/n! ensures that the resulting measure still has total mass unity.
An alternative notation, which has some advantages in simplifying combinatorial formulae, utilizes the nonprobability measures
J_n(A1 × · · · × An) = p_n Σ_perm Π_n(A_{i1} × · · · × A_{in}) = n! p_n Π_n^sym(A1 × · · · × An).   (5.3.2)
We follow Srinivasan (1969) in referring to these as Janossy measures after their introduction by Janossy (1950) in the context of particle showers. By contrast, Yvon (1935), Bogoliubov (1946) and Bhabha (1950) worked with the form (5.3.1), as have also Macchi (1975) and co-workers, who refer to quantities such as Π_n^sym(·) in (5.3.1) as exclusion probabilities.
An important feature of Janossy measures is their simple interpretation when derivatives exist. If X = R^d and j_n(x1, . . . , xn) denotes the density of
J_n(·) with respect to Lebesgue measure on (R^d)^(n) with x_i ≠ x_j for i ≠ j, then
j_n(x1, . . . , xn) dx1 · · · dxn = Pr{there are exactly n points in the process, one in each of the n distinct infinitesimal regions (x_i, x_i + dx_i)}.
This interpretation gives the Janossy densities a fundamental role in the structural description and likelihood analysis of finite point processes. Thus, they appear as likelihoods in Chapter 7, where they play a key role in the study of spatial point patterns (see also Chapter 15 and references there) and also in pseudolikelihoods. They are well adapted to describing the behaviour on observational regions, which, being finite, are typically bounded.
Example 5.3(a) I.i.d. clusters (continued from Section 5.1). In this case, X = R^d and, assuming F(A) = ∫_A f(x) dx for some density function f(·), the joint density function for the ordered sequence of n points at x1, . . . , xn is π_n(x1, . . . , xn) = f(x1) · · · f(xn), which is already in symmetric form. Here
j_n(x1, . . . , xn) = p_n n! f(x1) · · · f(xn),
and it is j_n(· · ·), not π_n(· · ·), that gives the probability density of finding one particle at each of the n points (x1, . . . , xn), the factorial term giving the number of ways the particles can be allocated to these locations.
Example 5.3(b) Finite renewal processes and random walks. Suppose X = R^1 and that, given N = n, the points of the process are determined by the successive points S1, . . . , Sn of a simple renewal process for which the common distribution of the lifetimes S_j − S_{j−1} (where S0 ≡ 0 and j = 1, . . . , n) has a density function f(·). Then
π_n(S1, . . . , Sn) = Π_{j=1}^n f(S_j − S_{j−1}).   (5.3.3)
In moving to the symmetrized form, some care is needed. For any x1, . . . , xn, we have, formally,
π_n^sym(x1, . . . , xn) = (1/n!) Σ_perm f(x_{i1}) f(x_{i2} − x_{i1}) · · · f(x_{in} − x_{i_{n−1}}).
Let x_(1), . . . , x_(n) denote the set {x1, . . . , xn} in ascending order. Then, at least one term in each product in Σ_perm will vanish (since f(x) = 0 for x < 0) unless we already have x1, . . . , xn ordered; that is, x_j = x_(j) for j = 1, . . . , n. Hence,
π_n^sym(x1, . . . , xn) = (1/n!) f(x_(1)) f(x_(2) − x_(1)) · · · f(x_(n) − x_(n−1)).   (5.3.4)
Comparing (5.3.3) and (5.3.4), 1/n! in the latter is seemingly a discrepant factor. The reconciliation lies in the fact that (5.3.3) vanishes outside the hyperoctant x1 < x2 < · · · < xn, whereas (5.3.4) repeats itself symmetrically in all n! hyperoctants. Finally, the Janossy densities are given by
j_n(x1, . . . , xn) = p_n f(x_(1)) f(x_(2) − x_(1)) · · · f(x_(n) − x_(n−1)),   (5.3.5a)
where as before p_n is the probability that the process contains just n points. Again it is to be noted that (5.3.3) vanishes outside the first hyperoctant, whereas (5.3.5) gives positive measure to all hyperoctants.
Once the unidirectional character of each step is lost, these simplifications do not occur. What is then available for a general random walk is confined to the forms (5.3.3) and the corresponding expression
j_n(x1, . . . , xn) = p_n Σ_perm f(x_{i1}) f(x_{i2} − x_{i1}) · · · f(x_{in} − x_{i_{n−1}}).   (5.3.5b)
The simplest renewal example occurs when f has an exponential density. The joint density (5.3.3) then reduces to
π_n(x1, . . . , xn) = λ^n exp(−λx_n)   (0 ≤ x1 ≤ · · · ≤ xn),  and 0 otherwise,
or in terms of (5.3.5),
j_n(x1, . . . , xn) = p_n λ^n e^{−λx_(n)}.
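The collapse of the permutation sum (5.3.5b) to this single-term form in the exponential case can be checked numerically; in the Python sketch below (λ and the test points are arbitrary, and p_n is set to 1 for the comparison), only the increasing arrangement contributes:

```python
from itertools import permutations
from math import exp

lam = 1.3

def f(x):
    # exponential lifetime density
    return lam * exp(-lam * x) if x >= 0 else 0.0

def j_by_permutations(xs):
    # sum over all orderings of the chained density f(x_{i1}) f(x_{i2}-x_{i1}) ...
    total = 0.0
    for order in permutations(xs):
        prod = f(order[0])
        for a, b in zip(order, order[1:]):
            prod *= f(b - a)
        total += prod
    return total

xs = [0.7, 0.2, 1.9, 0.5]
closed_form = lam ** len(xs) * exp(-lam * max(xs))
assert abs(j_by_permutations(xs) - closed_form) < 1e-12
```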
Remarkably, the joint distribution depends only on the position of the extreme value x_(n); given this value, the other points are distributed uniformly over (0, x_(n)).
The simplest example of a symmetric random walk is probably that for which the individual steps are normally distributed N(0, 1). The successive S_i are then the partial sums of a sequence of independent normal variates,
S_i = Σ_{j=1}^i Z_j ,
and for any given n are therefore jointly normally distributed with zero mean vector and covariance matrix having elements
σ_ij = min(i, j)   (1 ≤ i, j ≤ n).
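The covariance structure σ_ij = min(i, j) follows from writing S = AZ with A the lower-triangular matrix of ones; a small deterministic check in Python:

```python
n = 5
# S = A Z with A the lower-triangular matrix of ones, so Cov(S) = A A^T
A = [[1 if j <= i else 0 for j in range(n)] for i in range(n)]
cov = [[sum(A[i][k] * A[j][k] for k in range(n)) for j in range(n)]
       for i in range(n)]
for i in range(n):
    for j in range(n):
        assert cov[i][j] == min(i + 1, j + 1)   # sigma_ij = min(i, j), 1-based
```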
No dramatic simplifications seem possible, but some further details are given in Exercise 5.3.1. Example 5.3(c) Gibbs processes: processes generated by interaction potentials. A fundamental class of point processes arising in statistical physics is described by means of forces acting on and between particles. The total
potential energy corresponding to a given configuration of particles is assumed to be decomposable into terms representing the interactions between the particles taken in pairs, triples, and so on; first-order terms representing the potential energies of the individual particles due to the action of an external force field may also be included. This leads to a representation of the total potential energy for a configuration of n particles at x1, . . . , xn by a series of the form
U(x1, . . . , xn) = Σ_{r=1}^n Σ_{1≤i1<···<ir≤n} ψ_r(x_{i1}, . . . , x_{ir}),   (5.3.6)
where ψ_r(·) is the rth-order interaction potential. Frequently, it is supposed that only the first- and second-order terms need be included, so that the process is determined by the point pair potentials, and
U(x1, . . . , xn) = Σ_{i=1}^n ψ1(x_i) + Σ_{i=1}^{n−1} Σ_{j=i+1}^n ψ2(x_i, x_j).   (5.3.7)
It is then one of the fundamental principles of statistical mechanics that in equilibrium the probability density of a particular configuration is inversely proportional to the exponential of the potential energy. In terms of Janossy densities, this means that
j_n(x1, . . . , xn) = C(θ) exp[−θU(x1, . . . , xn)]   (5.3.8)
for some constant of proportionality C(θ) and parameter θ related to the temperature of the system. The normalizing constant is referred to as the partition function. The major difficulty in handling processes of this type lies in expressing the partition function as a function of θ (or, indeed, of any other parameters that may occur in the description of the system).
It is important to note that for finite point processes for which the Janossy densities exist, there is a converse to equation (5.3.8) where the densities j_n(·) are expressed in terms of the interaction potentials ψ_r(·) via the function U(·). Specifically, Exercise 5.3.7 describes ψ_k(·) in terms of j_r(·) (r = 1, . . . , k).
In fact, two slightly different situations may be considered. In the first of these, the canonical ensemble, the number n of particles is regarded as fixed and the normalizing constant is chosen to satisfy
1/C(θ) = ∫_{X^(n)} exp[−θU(x1, . . . , xn)] dx1 · · · dxn .
In the second, the grand canonical ensemble, both the number of particles and their locations are regarded as variable, and the partition function has to be chosen to satisfy (5.3.9) below. Here we examine two special cases; further discussion is given around Examples 7.1(c)–(f).
(i) No interactions (ideal gas). Here,
j_n(x1, . . . , xn) = C(θ) exp(−θ Σ_{i=1}^n ψ(x_i)) = C(θ) Π_{i=1}^n exp[−θψ(x_i)].
Integrating over (x1, . . . , xn) ∈ X^(n) and summing, using (5.3.9), we obtain
1 = C(θ) Σ_{n=0}^∞ [Λ(θ)]^n/n! = C(θ) e^{Λ(θ)},
setting j0 = J0 = C(θ) and Λ(θ) = ∫_X e^{−θψ(x)} dx. Thus, C(θ) = e^{−Λ(θ)} and the process is just an inhomogeneous Poisson process with intensity e^{−θψ(x)}.
(ii) Repulsive interactions. Consider next the case of a homogeneous process in which the potential is specified entirely by the pairwise interactions ψ2(x, y), which are assumed to be a function φ(r) of the distance r = |x − y| between the pair of points. A large variety of special forms have been considered for the function φ(·) both in the statistical mechanics literature (e.g. Ruelle, 1969; Preston, 1976) and more recently as models for spatial point processes in other contexts (see e.g. Ripley, 1977; Ogata and Tanemura, 1984). Typical examples include
φ1(r) = − log(1 − e^{−(r/σ)²}),
φ2(r) = (σ/|r|)^n   (n = 4, 6, etc.),
φ3(r) = ∞ or 0   as r ≤ or > σ.
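The pair potentials above are straightforward to evaluate in code. The following Python sketch (θ and σ are arbitrary choices, and ψ1 ≡ 0 is assumed) computes U by (5.3.7) and the unnormalized Janossy density exp(−θU) of (5.3.8), confirming that the hard-core model assigns zero density to configurations with a pair closer than σ:

```python
from math import exp, log, inf, hypot

theta, sigma = 1.0, 0.5   # arbitrary temperature parameter and range

def phi_hard(r):
    # hard-core pair potential phi_3: infinite inside separation sigma
    return inf if r <= sigma else 0.0

def phi_soft(r):
    # soft-core pair potential phi_1(r) = -log(1 - exp(-(r/sigma)^2))
    return -log(1.0 - exp(-(r / sigma) ** 2))

def U(points, phi):
    # pairwise potential energy (5.3.7) with psi_1 = 0
    total = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            total += phi(hypot(points[i][0] - points[j][0],
                               points[i][1] - points[j][1]))
    return total

def unnormalized_janossy(points, phi):
    # j_n up to the partition-function constant C(theta), as in (5.3.8)
    u = U(points, phi)
    return 0.0 if u == inf else exp(-theta * u)

close = [(0.0, 0.0), (0.3, 0.0)]   # separation 0.3 < sigma
far = [(0.0, 0.0), (2.0, 0.0)]
assert unnormalized_janossy(close, phi_hard) == 0.0
assert unnormalized_janossy(far, phi_hard) == 1.0
assert unnormalized_janossy(close, phi_soft) < unnormalized_janossy(far, phi_soft)
```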
The function φ1(·) represents relatively weak repulsive forces, even for r near zero, and it is therefore described as a ‘soft-core’ model. φ3(·) corresponds to the ‘hard-core’ model; every point pair has a separation > σ, and no other interaction occurs. The second model is of intermediate type, approximating the behaviour of the hard-core model for large n. None of these models is easy to handle analytically, and special expansion techniques have been developed to approximate the partition functions.
For the subsequent discussions, we use mainly the Janossy measures. In this formulation, the normalization condition Σ_n p_n = 1 takes the form
Σ_{n=0}^∞ J_n(X^(n))/n! = 1   (5.3.9)
since we may interpret J0(X^(0)) = p0 and, for n ≥ 1, we have
J_n(X^(n)) = p_n Σ_perm Π_n(X^(n)) = p_n n! .
It is clear that from any family of symmetric measures Jn (·) satisfying (5.3.9), we can construct a probability distribution {pn } and a set of symmetric probability measures {Πsym n (·)} satisfying Conditions 5.3.I, and conversely.
Either specification is equivalent to specifying a global probability measure P on the Borel sets of the countable union (with X^(0) interpreted as an isolated point)
X^∪ = X^(0) ∪ X^(1) ∪ X^(2) ∪ · · · ;   (5.3.10)
Moyal (1962a) takes (X^∪, P) as the canonical probability space of a finite point process. Given such a measure P, the measure p_n Π_n^sym, or equivalently (n!)^{−1} J_n, appears as the restriction of P to the component X^(n). The situation is summarized in the following proposition.
Proposition 5.3.II. Let X be a complete separable metric space, and let B_X^(n) be the product σ-field on X^(n), with the added convention that the set X^(0) denotes an ideal point such that X^(0) × X = X × X^(0) = X. Then, the following specifications are equivalent, and each suffices to define a finite point process on X:
(i) a probability distribution {p_n} on the nonnegative integers and a family of symmetric probability distributions Π_n^sym(·) on B_X^(n), n ≥ 1;
(ii) a family of nonnegative symmetric measures J_n(·) on B_X^(n), n ≥ 1, satisfying the normalization condition (5.3.9) and with J0(X^(0)) = p0;
(iii) a symmetric probability measure P on the symmetric Borel sets of the countable union in (5.3.10).
There is one point of principle to be noted here concerning the canonical choice of state space for a finite point process. To be consistent with treating a point process as a set of unordered points, a realization with, say, k points should be thought of not as a point in X^(k) but as a point in the quotient space of X^(k) with respect to the group of permutations amongst the k coordinates. For example, when X = R and k = 2, then in place of all pairs (x1, x2), with (x1, x2) and (x2, x1) being treated as equivalent, we should consider some representation of the quotient space such as the set {(x1, x2): x1 ≤ x2}. The difficulty with this approach in general is that it is often hard to find a convenient concrete representation of the quotient space (consider, for example, the case just cited with R replaced by the unit circle or sphere), with the attendant problems of visualizing the results and bringing geometric intuition to bear. We have therefore preferred the redundant representation, which allows a distinction between the points but then gives all permutations amongst the labelling of the points equal weight in the measure. It must be borne in mind that there is then a many–one relation between the points in the space X^∪ and the set of all totally finite counting measures.
Another way of treating the same problem is to introduce the σ-algebra of symmetric sets in X^(k), that is, the sets invariant under permutations of the coordinate axes. A symmetric set in X^∪ is a set whose projections onto X^(k) are symmetric for each positive integer k.
Then, any event defined on the point process represents a symmetric set in X ∪ , and thus the natural σ-algebra to use in discussing point process properties is this σ-algebra of symmetric sets. We do not emphasize this
approach because our main development in Chapter 9 is given in terms of counting measures; we merely refer the reader seeking details to Moyal (1962a) and Macchi (1975) (see also Exercises 5.3.4–6).
Now let us turn to the problem of expressing in terms of Janossy measures (or one of their equivalents) the probability distributions of the random variables N(A_i). If (A1, . . . , Ak) represents a finite partition of X, the probability of finding exactly n_i points in A_i (i = 1, . . . , k) can be written, with n1 + · · · + nk = n, as
P_k(A1, . . . , Ak; n1, . . . , nk) = J_n(A1^(n1) × · · · × Ak^(nk))/(n1! · · · nk!)
  = p_n [n!/(n1! · · · nk!)] Π_n^sym(A1^(n1) × · · · × Ak^(nk)),   (5.3.11)
where the multinomial coefficient can be interpreted as the number of ways of grouping the n points so that n_i lie in A_i (i = 1, . . . , k). It is important in (5.3.11) both that the sets A_i are disjoint and that they have union X (i.e. they are a partition of X). For any i for which n_i = 0, the corresponding term is omitted from the right-hand side. From (5.3.11), it follows in particular that the probability of finding n points in A, irrespective of the number in its complement A^c, is given by
n! P1(A; n) = Σ_{r=0}^∞ J_{n+r}(A^(n) × (A^c)^(r))/r! .   (5.3.12)
Similarly, if A1, . . . , Ak are any k disjoint Borel sets, C = (A1 ∪ · · · ∪ Ak)^c, and n = n1 + · · · + nk, the probability of finding just n_i points in A_i, i = 1, . . . , k, is given by
n1! · · · nk! P_k(A1, . . . , Ak; n1, . . . , nk) = Σ_{r=0}^∞ J_{n+r}(A1^(n1) × · · · × Ak^(nk) × C^(r))/r! .   (5.3.13)
These probabilities are in fact the joint distributions of the random variables N(A_i), i = 1, . . . , k. The fact that they do form a consistent set of finite-dimensional (fidi) distributions is implicit in their derivation, but it can also be verified directly, as we show following the discussion of such conditions in Chapter 9.
An alternative approach, following Moyal (1962a), starts from the observation that each realization can be represented as a random vector Y ∈ X^(n) for some n ≥ 0. Any such vector defines a counting measure on X, through N(A) = #{i: y_i ∈ A}, where the y_i are the components of the random vector Y. The random vector thus gives rise to a mapping from X^(n) into the space N_X^# of all counting measures on X. It is easy to see that this mapping is measurable so it defines a point process (see Chapter 9). This being true for every n, the whole
process is a point process, and since (5.3.13) are its fidi distributions, they are necessarily consistent. As Moyal pointed out, this approach to the existence of finite point processes can be extended to more general cases by considering the restrictions of the process to an increasing family of Borel sets (spheres, say) chosen so that they expand to fill the whole space but with probability 1 have only a finite number of points in each. The main difficulty with this approach from our point of view is that it does not extend readily to random measures, which we require for their own sake and for applications in later chapters. We conclude this section with a lemma that will play a useful role in simplifying the relations amongst various measures introduced in the sequel. It is needed in particular in checking that the distributions defined by (5.3.13) satisfy the consistency conditions of Chapter 9. Lemma 5.3.III. Let A be a Borel subset of X and S a symmetric measure defined on X (n) for some n > 0. Then, for any partition {A1 , . . . , Ak } of A,
S(A^(n)) = Σ [n!/(j1! · · · jk!)] S(A1^(j1) × · · · × Ak^(jk)),   (5.3.14)
where the summation extends over all nonnegative integers j1, . . . , jk for which j1 + · · · + jk = n.
Proof. Equation (5.3.14) expresses the fact that the partitioning of A induces a partitioning of A^(n) into k^n subsets, which are grouped together into classes that are identified by vectors (j1, . . . , jk): within any given class, each constituent subset has A_i appearing as a coordinate or ‘edge’ j_i times. The symmetry of S implies that all subsets in the same class have the same S measure; hence, (5.3.14) follows.
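The lemma can be checked numerically for a product (hence symmetric) measure. In the Python sketch below, S is taken, purely for illustration, as the n-fold product of a discrete measure F, so that S(A1^(j1) × · · · × Ak^(jk)) factorizes and (5.3.14) becomes the multinomial theorem:

```python
from itertools import product
from math import factorial

# F: a discrete (non-probability) measure on points 0..4; A = {0,...,4}
F = {0: 0.2, 1: 0.5, 2: 0.3, 3: 0.4, 4: 0.1}
parts = [{0, 1}, {2}, {3, 4}]    # a partition of A into A1, A2, A3
n = 3

def S(block_sizes):
    # S(A1^(j1) x A2^(j2) x A3^(j3)) for the product measure S = F^n
    val = 1.0
    for part, j in zip(parts, block_sizes):
        val *= sum(F[x] for x in part) ** j
    return val

lhs = sum(F.values()) ** n       # S(A^(n)) = F(A)^n
rhs = sum(factorial(n) // (factorial(j1) * factorial(j2) * factorial(j3)) * S((j1, j2, j3))
          for j1, j2, j3 in product(range(n + 1), repeat=3)
          if j1 + j2 + j3 == n)
assert abs(lhs - rhs) < 1e-12
```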
Exercises and Complements to Section 5.3
5.3.1 [see Example 5.3(b)]. For a finite random walk with normally distributed N(0, 1) steps, show that

π2^sym(x, y) = (e^{−x²/2} + e^{−y²/2}) e^{−(x−y)²/2} / (4π)

and

π3^sym(x, y, z) = [f(x, y, z) + f(y, z, x) + f(z, x, y)] / (12π(2π)^{1/2}),

where f(x, y, z) = e^{−(x² + (y−z)²)/2} (e^{−(y−x)²/2} + e^{−(z−x)²/2}).
5.3.2 Check Proposition 5.3.II in detail.
5.3.3 Show that, by a suitable choice of metric, X^∪ in (5.3.10) becomes a c.s.m.s. [Recall the assumption, made in Condition 5.3.I(a), that X is a c.s.m.s.]
5.3.4 Let A^(k) denote the k-fold product A × ··· × A. Show that a symmetric measure on the Borel sets of X^(2) is determined by its values on sets of the form A^(2) but that the corresponding statement for X^(k) with k ≥ 3 is false. [Hint: Consider first X = {1, 2} and k = 2, 3.]
5.3.5 (Continuation). Let B^(k)_sym be the smallest σ-algebra containing the sets A^(k) for Borel subsets A of X. Show that B^(k)_sym consists of all symmetric Borel subsets of X^(k) and that any symmetric measure µ on B^(k) is completely determined by its values on B^(k)_sym. Show also that a symmetric measure µ on B^(k) is completely determined by integrals of the form

∫_{X^(k)} ζ(x1) ··· ζ(xk) µ(dx1 × ··· × dxk)

for functions ζ in the class U of Definition 5.5.I.
5.3.6 Let X0^(n) denote the quotient space X^(n)/Π^(n), where Π^(n) is the permutation group over the coordinates of a point in X^(n). Prove that there is a one-to-one correspondence between measures on the Borel subsets of X0^(n) and symmetric measures on the Borel subsets of X^(n). [Macchi (1975) uses ∪_{n=0}^∞ X0^(n) in place of X^∪ in (5.3.10) as the sample space for finite point processes.]
5.3.7 Let {jk(·): k = 1, 2, . . .} be a family of positive Janossy densities for an a.s. finite point process. Define functions ψ1(x) = − log j1(x),

ψk(x1, . . . , xk) = − log jk(x1, . . . , xk) − Σ_{r=1}^{k−1} Σ_{1≤i1<···<ir≤k} ψr(xi1, . . . , xir).
Show that {jk(·)} thereby defines recursively a unique family of interaction potentials for a Gibbs process [see Example 5.3(c), especially (5.3.8)].
5.3.8 Let f(·) be a bounded or nonnegative functional of an a.s. finite point process with Janossy measures Jn(·). Show that

E[f(N)] = Σ_{n=0}^∞ (1/n!) ∫_{X^(n)} f(δx1 + ··· + δxn) Jn(dx1 × ··· × dxn).
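For a concrete instance of the formula in Exercise 5.3.8, consider (as an illustrative assumption, not an example worked in the text) a Poisson process of rate λ on X = [0, 1], for which the total Janossy mass is Jn(X^(n)) = e^{−λ} λ^n. Taking f(N) = N(X)², the integrand depends only on the number of atoms, and the sum should reproduce the Poisson second moment λ + λ²:

```python
# E[f(N)] = sum_n (1/n!) \int_{X^(n)} f(dirac_x1 + ... + dirac_xn) J_n(...),
# evaluated for an assumed rate-lam Poisson process on [0,1], where the total
# mass J_n(X^(n)) is exp(-lam) * lam**n and f counts atoms squared.
from math import exp, factorial

lam = 1.7
E_f = sum((1.0 / factorial(n)) * (n ** 2) * exp(-lam) * lam ** n
          for n in range(100))
print(E_f, lam + lam ** 2)   # second moment of a Poisson(lam) count
```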
5.4. Moment Measures and Product Densities

We now investigate the moment structure of finite point processes, extending to counting measures the notions of ordinary and factorial moments and cumulants developed for nonnegative integer-valued r.v.s in Section 5.2. In fact, because we require a general point process to be finite a.s. on bounded sets, the definitions can be extended almost immediately to the general case (these extensions are treated in Chapter 9). Suppose then that the total population has finite kth moment µk = E{[N(X)]^k} for some k = 1, 2, . . . . Then, for any Borel set A ∈ B_X, define

Mk(A^(k)) = E{[N(A)]^k},   (5.4.1)

where we choose to regard the left-hand side as the value on the product set A^(k) of a set function defined on the product σ-field B_X^(k) in X^(k). In
particular, if the total population has finite mean µ1 = E[N(X)], we can define the expectation measure M(·) by

M(A) ≡ M1(A) = E[N(A)]   (A ∈ B_X).   (5.4.2)

Here it is clear from Fubini’s theorem that M(·) inherits countable additivity from N(·) so that it does in fact define a measure on B_X. For k > 1, we can extend the definition of Mk to arbitrary rectangle sets of the form

A1^(k1) × ··· × Ar^(kr),

where {k1, . . . , kr} is a partition of k (so each ki ≥ 1 and k1 + ··· + kr = k) and the Ai are disjoint sets of B_X, by setting

Mk(A1^(k1) × ··· × Ar^(kr)) = E{[N(A1)]^{k1} ··· [N(Ar)]^{kr}}.   (5.4.3)
It is not difficult to check that Mk is countably additive on these k-dimensional rectangle sets and hence can be extended to a measure on the Borel sets B_X^(k). In fact, Mk can be regarded as the expectation measure of a point process on X^(k): the point process consists of all k-tuples (allowing repetitions and distinguishing the order in this k-tuple) of points from the original realization; that is, it consists of the k-fold product N^(k) of N with itself. Thus, Mk gives the expected number of such k-tuples in arbitrary sets from B_X^(k). Since N^(k) is a symmetric measure on X^(k), so too is its expectation measure Mk. We call Mk the kth moment measure of N.

Similarly, we can introduce the kth factorial moment measure M[k]. Here, M[1] = M1 = M, and for k > 1 the ordinary powers inside the expectation in (5.4.3) are replaced by factorial powers: with Ai and ki as in (5.4.3), we set

M[k](A1^(k1) × ··· × Ar^(kr)) = E{[N(A1)]^[k1] ··· [N(Ar)]^[kr]}.   (5.4.4)

As for Mk, the set function on the left-hand side of this defining relation is countably additive on rectangle sets in X^(k) and can be interpreted as the expectation measure of a certain point process in X^(k). In this case, the realizations of the new process consist of all k-tuples of distinct points from the original process, still distinguishing the order within the k-tuple but not allowing repetitions. (Note that if the original process N has multiple points, each such point is to be enumerated according to its multiplicity: for example, a double point of N should be regarded as two distinct points having the same coordinates when constructing the k-tuples.) Then M[k](A) represents the expected number of such k-tuples falling in A ∈ B_X^(k).

Proposition 5.4.I. If µk = E([N(X)]^k) < ∞, the set functions Mk and M[k] defined by (5.4.3) and (5.4.4) are countably additive on rectangle sets and have unique extensions to symmetric measures Mk and M[k], respectively, on B_X^(k).
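The k-tuple interpretation above can be made concrete: for any single realization, the number of ordered pairs of distinct points lying in A × A is exactly N(A)[N(A) − 1], the factorial power appearing in (5.4.4). A minimal Python check on an arbitrary hypothetical realization:

```python
# M_[2] is the expectation measure of the process of ordered pairs of
# *distinct* points; per realization, pairs in A x A number N(A)[N(A) - 1].
points = [0.1, 0.25, 0.4, 0.8, 0.9]     # an assumed simple realization on [0,1]
in_A = lambda x: 0.0 < x < 0.5           # the test set A = (0, 0.5)

N_A = sum(in_A(x) for x in points)
ordered_distinct_pairs = sum(
    1 for i, x in enumerate(points) for j, y in enumerate(points)
    if i != j and in_A(x) and in_A(y))
print(N_A, ordered_distinct_pairs)       # 3 and 6 = 3 * 2
```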
Using the identities (5.2.2) and (5.2.3) that relate ordinary and factorial powers, it is possible to write down explicit expressions for Mk on certain sets in terms of {M[j], j = 1, . . . , k} and for M[k] in terms of {Mj, j = 1, . . . , k}. Directly from (5.2.5), we have the important special case

E{[N(A)]^k} = Mk(A^(k)) = Σ_{j=1}^k ∆_{j,k} M[j](A^(j)).   (5.4.5)

Such relations are particularly useful when the factorial moment measures are absolutely continuous so that the right-hand side of (5.4.5) can be expressed as a sum of integrals of the product densities introduced below Lemma 5.4.III. Note also relations such as

M[2](A × B) = E[N(A)N(B)] − E[N(A ∩ B)] = M2(A × B) − M(A ∩ B)   (A, B ∈ B_X)   (5.4.6)
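Relation (5.4.6) rests on a per-realization counting identity: for a simple realization, the ordered pairs of distinct points (x, y) with x ∈ A and y ∈ B number N(A)N(B) − N(A ∩ B), since the excluded diagonal pairs are exactly the points of A ∩ B. A small Python check with overlapping intervals (the realization and sets are arbitrary illustrations):

```python
# Counting identity behind (5.4.6), checked on one fixed simple realization.
points = [0.1, 0.3, 0.45, 0.6, 0.85]     # an assumed simple realization
A = lambda x: 0.0 < x < 0.5              # A = (0, 0.5)
B = lambda x: 0.3 < x < 0.9              # B = (0.3, 0.9), overlapping A

N_A = sum(A(x) for x in points)
N_B = sum(B(x) for x in points)
N_AB = sum(A(x) and B(x) for x in points)
pairs = sum(1 for i, x in enumerate(points) for j, y in enumerate(points)
            if i != j and A(x) and B(y))
print(pairs, N_A * N_B - N_AB)           # 8 and 8
```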
(see Exercises 5.4.1–6 for a more systematic exposition of such relations). Applications of these moment measures appear in subsequent chapters; here we explore their relation to the Janossy measures and their interpretation in terms of product densities.

Since (5.4.4) is simply the factorial moment of a fidi distribution, which can be expressed in terms of the Janossy measures by means of (5.3.11), we can obtain an expression for M[k](·) in terms of Janossy measures. To examine this expression, we return to the case where A1, . . . , Ar is a partition of X. Assuming E([N(X)]^[k]) < ∞, we have directly from the definitions, when k1 + ··· + kr = k, that

M[k](A1^(k1) × ··· × Ar^(kr)) = Σ_{ji ≥ ki, i=1,...,r} j1^[k1] ··· jr^[kr] Pr(A1, . . . , Ar; j1, . . . , jr)

= Σ_{ji ≥ ki} J_{j1+···+jr}(A1^(j1) × ··· × Ar^(jr)) / Π_{i=1}^r (ji − ki)!.

To simplify the last sum, put ni = ji − ki and group together the terms for which n1 + ··· + nr = n. Setting k = k1 + ··· + kr, we obtain

M[k](A1^(k1) × ··· × Ar^(kr)) = Σ_{n=0}^∞ (1/n!) Σ_{Σni=n} n!/(n1! ··· nr!) · J_{k+n}(A1^(k1+n1) × ··· × Ar^(kr+nr)).

The inner sum can be reduced by Lemma 5.3.III, taking A = X and defining S by

S(B) = J_{k+n}(A1^(k1) × ··· × Ar^(kr) × B)   (B ∈ B_X^(n)),

thereby yielding the equation

M[k](A1^(k1) × ··· × Ar^(kr)) = Σ_{n=0}^∞ J_{k+n}(A1^(k1) × ··· × Ar^(kr) × X^(n)) / n!.
Using the countable additivity of both sides, this extends to the following elegant generalization of (5.2.12),

M[k](B) = Σ_{n=0}^∞ J_{k+n}(B × X^(n)) / n!   (all B ∈ B_X^(k)).   (5.4.7)
To obtain the inverse relation, suppose that all factorial moments µ[k] of N(X) exist and that the p.g.f.

P(1 + η) = Σ_{k=0}^∞ µ[k] η^k / k!   (5.4.8)
is convergent in a disk |η| < 1 + ε for some ε > 0 [equivalently, that P(z) = E(z^{N(X)}) is analytic in some disk |z| < 2 + ε]. Then, the inverse relation (5.2.1) can be applied to yield, with the same notation as in (5.4.7) and following a parallel route,

Jn(A1^(k1) × ··· × Ar^(kr)) = Σ_{k=0}^∞ (−1)^k M[n+k](A1^(k1) × ··· × Ar^(kr) × X^(k)) / k!

= Σ_{ji ≥ ki} [ Π_{i=1}^r (−1)^{ji−ki} / (ji − ki)! ] M[j1+···+jr](A1^(j1) × ··· × Ar^(jr)),

so that for general B ∈ B_X^(n),

Jn(B) = Σ_{k=0}^∞ (−1)^k M[n+k](B × X^(k)) / k!.   (5.4.9)
These results may be summarized for reference in the following theorem.

Theorem 5.4.II. If the total population size has finite kth moment, then the kth factorial moment measure is defined and finite and can be represented in terms of the Janossy measures by (5.4.7). Conversely, if all moments are finite and for some ε > 0 the p.g.f. (5.4.8) is convergent for |η| < 1 + ε, then the Janossy measures can be represented in terms of the factorial moment measures by (5.4.9).

Example 5.4(a) Avoidance function. To illustrate the application of Theorem 5.4.II, consider the set function P0(A) ≡ Pr{N(A) = 0} = P1(A; 0); that is, the probability of finding no points in a given subset A of X, or, equivalently, the probability that the support of N avoids A. Taking n = 0 in (5.4.9) and restricting X to A itself, we obtain immediately

P0(A) = J0(A) = Σ_{k=0}^∞ (−1)^k M[k](A^(k)) / k!.   (5.4.10)
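To see (5.4.10) in action, take the standard Poisson case (an assumption for illustration, not derived in the text): for a Poisson process of rate λ, M[k](A^(k)) = (λ|A|)^k, and the alternating series collapses to the familiar void probability e^{−λ|A|}:

```python
# Avoidance function (5.4.10) for an assumed Poisson process of rate lam:
# P0(A) = sum_k (-1)^k (lam |A|)^k / k! = exp(-lam |A|).
from math import exp, factorial

lam, length = 2.0, 1.5          # illustrative rate and |A|
mu = lam * length

P0 = sum((-1) ** k * mu ** k / factorial(k) for k in range(60))
print(P0, exp(-mu))
```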
An important feature of (5.4.10) is that it is not necessary to know anything about the nature of the moment measure outside A to determine the probability. In the case X = R and A equal to the interval (0, t], the result in (5.4.10) gives the survivor function for the forward recurrence time in terms of the moment measures on (0, t]. Of course, from another point of view, (5.4.10) is just a special case of equation (5.2.11) giving the probabilities of a discrete distribution in terms of the factorial moments.

We now turn to consider densities for the moment measures, assuming X to be a real Euclidean space (or well-behaved subset thereof). Recall the standard result, which follows from Fubini’s theorem, that if a totally finite measure can be represented as the superposition of a finite or countably infinite family of component measures, then it is absolutely continuous with respect to a given measure if and only if each component is absolutely continuous, the density of the superposition being represented a.e. by the sum of the densities. Applied to the representation (5.4.7), this yields immediately the following lemma.

Lemma 5.4.III. If the kth factorial moment measure M[k](·) exists, then it is absolutely continuous if and only if the Janossy measures Jn(·) are absolutely continuous for all n ≥ k, in which case the densities m[k](·) and jn(·) are related by the equations, for k = 1, 2, . . . ,

m[k](x1, . . . , xk) = Σ_{n=0}^∞ (1/n!) ∫_X ··· ∫_X j_{k+n}(x1, . . . , xk, y1, . . . , yn) dy1 ··· dyn.

The inverse relation follows in a similar way: if all the factorial moment measures exist and are absolutely continuous, and if the series (5.4.9) is absolutely convergent, then the corresponding Janossy measure is absolutely continuous with density given by

jn(x1, . . . , xn) = Σ_{k=0}^∞ ((−1)^k / k!) ∫_X ··· ∫_X m[n+k](x1, . . . , xn, y1, . . . , yk) dy1 ··· dyk.   (5.4.11)

Historically, the introduction of factorial moment densities, also referred to as product densities in Bhabha (1950) and Ramakrishnan (1950) and as coincidence densities in Macchi (1975), considerably preceded the more general treatment as above using factorial moment measures. This is easily understood in view of the simple physical interpretation of the densities: equations (5.4.7) and (5.3.9) imply that if m[k](x1, . . . , xk) is bounded in a neighbourhood of (x1, . . . , xk), then we can write

m[k](x1, . . . , xk) dx1 ··· dxk = Σ_{n=0}^∞ J_{k+n}(dx1 × ··· × dxk × X^(n)) / n!
  = Pr{one particle located in each of the infinitesimal subsets dxi (i = 1, . . . , k)},   (5.4.12)

where dxi denotes both the infinitesimal set (xi, xi + dxi) and its Lebesgue
measure. This interpretation may be contrasted with that for the density

jk(x1, . . . , xk) dx1 ··· dxk = Pr{exactly k points in the realization, one in each subset dxi (i = 1, . . . , k), and none elsewhere}.   (5.4.13)
From an experimental point of view, (5.4.12) can be estimated from the results of k observations at specific times or places, whereas the Janossy measure requires indefinitely many observations to determine the exact (total) number of occurrences. For this reason, the densities (5.4.12) are in principle amenable to experimental determination (through ‘coincidence’ experiments, hence the name coincidence densities) in a way that Janossy measures are not, at least in the context of counting particles. However, as Macchi (1975) has stressed, the Janossy measures, and hence the joint distributions, can be determined by the converse relations (5.4.9) and (5.4.11). Moment measures also have the important feature, in common with relations such as (5.4.10), that they are global in character, in contrast to the local character of the Janossy measures. We mean by this that the form of the moment measures is not influenced by the nature of the region of observations: if two observation regions overlap, the moment measures coincide over their common region. On the other hand, the Janossy measures depend critically on the observation regions: just as the number of points observed in the region depends on its size and shape, so also the Janossy measures are exactly tailored to the particular region. This feature lends further importance to the converse relations (5.4.9) and (5.4.11): knowing the moment densities, the Janossy densities for any observation region A can be calculated by taking X = A in (5.4.11), a remark that continues to have force even when the point process is not totally finite over the whole of X . Thus, the one set of moment measures suffices to determine the Janossy measures for as many observation regions as one cares to nominate. When the region of interest is indeed a bounded subset A of the space X where the point process is defined, we introduce the following definition. Definition 5.4.IV (Local Janossy Measures and Densities). 
Given any bounded Borel set A, the Janossy measures localized to A are the measures Jn(· | A) (n = 1, 2, . . .) satisfying, for locations xi ∈ A (i = 1, . . . , n),

Jn(dx1 × ··· × dxn | A) = Pr{exactly n points in A, at locations dx1, . . . , dxn}.

When these measures have densities, they define the local Janossy densities. Such local functions have particular importance when the process is no longer a.s. finite-valued on the whole space X. For these local functions the identities in (5.4.9) and (5.4.11) continue to hold with X^(k) replaced by A^(k)
(and the local functions on the respective left-hand sides), as for example

jn(x1, . . . , xn | A) = Σ_{k=0}^∞ ((−1)^k / k!) ∫_A ··· ∫_A m[n+k](x1, . . . , xn, y1, . . . , yk) dy1 ··· dyk.   (5.4.14)
What is remarkable about such a relation is that by merely changing the range of integration of a function defined globally, we can recover the local probabilistic structure when all the moments exist [see Example 5.5(b)]. Local Janossy densities jn(x1, . . . , xn | A) feature prominently in the discussion of point process likelihoods in Section 7.1.

The existence of densities is closely linked to the concept of orderliness, or more properly, simplicity, in the sense of Chapter 3, that with probability 1 there are no coincidences amongst the points. Suppose on the contrary that, for some population size n, the probability that two points coincide is positive. In terms of the measure Jn(·), the necessary and sufficient condition for this probability to be positive is that Jn(·) should allot nonzero mass to at least one (and hence all) of the diagonal sets {xi = xj}, where xi is a point in the ith coordinate space. Thus, we have the following proposition.

Proposition 5.4.V. (a) A necessary and sufficient condition for a point process to be simple is that, for all n = 1, 2, . . . , the associated Janossy measure Jn(·) allots zero mass to the ‘diagonals’ {xi = xj}. (b) When X = R^d, the process is simple if for all such n the Janossy measures have densities jn(·) with respect to (nd)-dimensional Lebesgue measure.

It is more convenient to frame an analogous condition in terms of the moment measures (assuming they exist). From the preceding result and the representation (5.4.7), we have immediately the following proposition.

Proposition 5.4.VI. Suppose the second factorial moment measure M[2](·) exists. Then, a necessary and sufficient condition for the point process to be simple is that M[2](·) allots zero mass to the ‘diagonal’ set {xi = xj}. In particular, for X = R^d, the process is simple whenever M[2](·) has a density m[2](·) with respect to 2d-dimensional Lebesgue measure.
An alternative approach to this proposition can be given in the context of random measures: for the stationary case, see Proposition 8.1.IV and its Corollary 8.1.V.

In some applications, we may wish to verify that a given family of densities constitutes the product densities of some point process. The following result gives a simple sufficient condition, which, however, is far from necessary (see remarks after the proof).

Proposition 5.4.VII. Let m[k](·) on X^(k) (k = 1, 2, . . .) be a family of symmetric nonnegative functions with finite total integrals

µ[k] = ∫_{X^(k)} m[k](x) dx,
and suppose that for some ε > 0 the series Σ_{k=1}^∞ µ[k] z^k / k! is convergent for |z| < 1 + ε. Then, a necessary and sufficient condition for the family {m[k](·)} to be factorial moment densities of a finite point process is that the integrals in (5.4.11) should be nonnegative for every n = 1, 2, . . . and every vector x = (x1, . . . , xn). These factorial moment densities then determine the process uniquely.

Proof. The integrals are convergent by assumption and clearly define a family of nonnegative symmetric functions. The only other requirement needed for them to form a set of Janossy functions is the normalization condition (5.4.9). On integrating (5.4.11) over x1, . . . , xn, the required condition is seen to be equivalent to demanding that if we define {pn} by

n! pn = Σ_{k=0}^∞ (−1)^k µ[k+n] / k!,

then the {pn} should sum to unity. But this reduces to the condition µ[0] = m[0] = 1, which may be assumed without loss of generality.

Remarks. The constraint that Σ_{k=1}^∞ µ[k] z^k / k! converges for |z| < 1 + ε is stronger than is needed: it is enough that lim sup_{r→∞} (µ[r]/r!)^{1/r} < ∞, but a more complicated definition of pn may then be needed (see Exercises 5.4.3–4). Also, for the product densities to define a point process that is not necessarily a finite point process, it is enough for the result to hold (with either the given or modified conditions on {µ[r]}) with the state space X replaced by a sequence {An} of bounded sets for which An ↑ X as n → ∞.

Example 5.4(b) Moment densities of a renewal process (Macchi, 1971a). It is well known (see Chapter 4) that the moment properties of a renewal process are completely specified by the renewal function. Although the renewal process is not a finite point process, the machinery developed in this section can be carried over to give a particularly succinct formulation of this result in terms of the factorial moment densities, where for ease of exposition it is assumed that the renewal density exists, u(·) say. In these terms, and assuming stationarity, the renewal density is just a multiple of the second-moment density since for s < t and with m = M[1]((0, 1]),

m[2](s, t) ds dt = Pr{renewals in (s, s + ds) and (t, t + dt)} = m ds u(t − s) dt.

Similarly, exploiting the regenerative property, we have for t1 < ··· < tk that

m[k](t1, . . . , tk) dt1 ··· dtk = Pr{renewals in (ti, ti + dti), 1 ≤ i ≤ k}
  = m dt1 u(t2 − t1) dt2 ··· u(tk − tk−1) dtk.   (5.4.15)

Thus, when the moment densities exist, a necessary condition for a point process to be a stationary renewal process is that the densities be expressible in the product form (5.4.15).
This condition is also sufficient. To see this, assume (5.4.15) holds for some constant m and some function u(·) for each k = 1, 2, . . . . From the cases k = 1, 2, first the constant m and then the function u(·) are identified in terms of first- and second-moment densities. From (5.4.11), we can obtain an expression for the density of the interval distribution by taking X = [0, t] and requiring exactly two events, one at 0 and one at t, thus yielding for the lifetime density f(·) the relation

m f(t) = m Σ_{k=0}^∞ ((−1)^k / k!) ∫ ··· ∫_{[0,t]^(k)} u(x1) u(x2 − x1) ··· u(t − xk) dx1 ··· dxk
  = m Σ_{k=0}^∞ (−1)^k ∫ ··· ∫_{0<x1<···<xk<t} u(x1) u(x2 − x1) ··· u(t − xk) dx1 ··· dxk.
This identifies f(·) as the solution to an inverse of the renewal equation in the form f = u − f ∗ u. Finally, uniqueness follows from the fact that the moment measures, which coincide with those constructed from a renewal process with this density f(·), determine the process uniquely.

Example 5.4(c) The fermion process (Macchi, 1975). The renewal process of the previous example generally produces a spacing or ‘antibunching’ effect, at least if its lifetime distribution has its coefficient of variation less than unity. Such behaviour is characteristic of fermions (e.g. electrons) as distinct from bosons (e.g. photons) in the elementary particle context. Benard and Macchi (1973) and Macchi (1975) developed a remarkable dual theory for both types of particles. This theory, while derived in the first instance from considerations of quantum mechanics, leads to a dual family of point processes of considerable general interest. The first family coincides with the family of renewal processes under suitable conditions; we describe a typical member shortly. The dual family is described in Example 6.2(b) and consists of doubly stochastic processes.

A striking application concerns the zeros of the Riemann zeta function. Coram and Diaconis (2002) provide statistical tests that illustrate aspects of a considerable literature on close connections between blocks of n ‘adjacent’ zeros and eigenvalues of random unitary matrices in the unitary group Un furnished with Haar measure, for suitably chosen n. This statistical work includes comparisons of spacings (between adjacent zeros and eigenvalues), traces (of blocks of n zeros and eigenvalues of random elements of Un), and correlation studies of points in intervals. D.E. Littlewood’s immanants (e.g.
Littlewood, 1950, Chapter 6), of which permanents and determinants as linear forms of all n-fold products over n points of the kernel C(·, ·) below are extremes, can be viewed as interpolating between boson and fermion point processes, respectively, via the group characters (Diaconis and Evans, 2001). Given C(·, ·) and a character group, the immanant, if positive, is proportional to the Janossy density of a simple point process with n points.
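Returning briefly to Example 5.4(b): the identity f = u − f ∗ u can be checked numerically in the simplest case, a Poisson process of rate λ (an assumed special case), where the renewal density is constant, u ≡ λ, and the lifetime density is exponential:

```python
# Check f = u - f*u at a single t for the Poisson renewal process:
# u(x) = lam, f(x) = lam * exp(-lam x), so (f*u)(t) = lam (1 - exp(-lam t)).
from math import exp

lam, t, n = 0.8, 2.0, 4000
dx = t / n
f = lambda x: lam * exp(-lam * x)

# trapezoidal rule for (f*u)(t) = integral_0^t f(s) u(t-s) ds with u constant
conv = dx * (0.5 * (f(0.0) + f(t)) * lam
             + sum(f(i * dx) * lam for i in range(1, n)))
lhs, rhs = f(t), lam - conv
print(lhs, rhs)
```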
Our state space X is a general d-dimensional Euclidean space, and we use A to denote a closed bounded subset (e.g. a rectangle) within X. Let C(x, y) be a covariance function defined on X, and write

C[x1, . . . , xk; y1, . . . , yk] = det( C(xi, yj) )_{i,j=1,...,k},

so that the symmetric determinant satisfies

C[x1, . . . , xk; x1, . . . , xk] ≥ 0.

In general, C(·, ·) may be complex-valued and therefore Hermitian, so that C(y, x) is the complex conjugate of C(x, y), but for ease of writing we assume here that C(·, ·) is real. It follows from nonnegativity that for λ > 0 the function

m[k](x1, . . . , xk) = λ^k C[x1, . . . , xk; x1, . . . , xk]   (5.4.16)

is at least a possible candidate for the kth factorial moment density of some orderly point process on X. To decide whether this is a legitimate choice, we need to investigate whether the corresponding Janossy densities, given formally by (5.4.11), are well defined and nonnegative. In fact, the Janossy densities have a representation parallel to (5.4.16) in terms of the solution Rλ(x, y) of the resolvent equation

Rλ(x, y) − λ ∫_A C(x, u) Rλ(u, y) du = C(x, y).   (5.4.17)

It is well known in the theory of integral equations (see e.g. Pogorzelski, 1966, p. 47) that Rλ(x, y) can be expressed as a series in λ with terms involving (5.4.16); specifically, λRλ(x, y) equals

(1/d(λ)) [ λ C(x, y) + λ Σ_{j=1}^∞ ((−λ)^j / j!) ∫_A ··· ∫_A C[x, x1, . . . , xj; y, x1, . . . , xj] dx1 ··· dxj ],

where

d(λ) = 1 + Σ_{j=1}^∞ ((−λ)^j / j!) ∫_A ··· ∫_A C[u1, . . . , uj; u1, . . . , uj] du1 ··· duj

is the Fredholm determinant associated with equation (5.4.17). More generally, the k × k ‘Fredholm minor’ associated with this equation, obtained by replacing C by Rλ in the basic determinant (5.4.16), is given by

λ^k Rλ[x1, . . . , xk; y1, . . . , yk]
  = (1/d(λ)) [ λ^k C[x1, . . . , xk; y1, . . . , yk]
      + λ^k Σ_{j=1}^∞ ((−λ)^j / j!) ∫_A ··· ∫_A C[x1, . . . , xk, u1, . . . , uj; y1, . . . , yk, u1, . . . , uj] du1 ··· duj ]   (5.4.18)
(see e.g. Pogorzelski, 1966, p. 52). Now (5.4.18) has the same form as (5.4.11) if we identify the factorial moment densities by (5.4.16) and the Janossy densities by

jk(x1, . . . , xk) = λ^k d(λ) Rλ[x1, . . . , xk; x1, . . . , xk].   (5.4.19)

The convergence of (5.4.18) is ensured by the general theory, using the Hadamard inequality to bound the determinants appearing therein. Thus, only the nonnegativity of the functions (5.4.19) needs to be checked. While these functions need not be nonnegative in general, an appropriate sufficient condition can easily be stated in terms of λ and the eigenvalues of (5.4.17); that is, the values of λ for which the homogeneous equation corresponding to (5.4.17) [i.e. (5.4.17) with the right-hand side replaced by zero] admits a nontrivial solution. In fact, the determinant in (5.4.19) is nonnegative if the function Rλ is itself a covariance function, for which it suffices that the eigenvalues µi(λ) of Rλ be nonnegative. Now these eigenvalues are related to those of C by the equation µi(λ) = λi − λ, so a necessary and sufficient condition for Rλ to be a covariance function is that λ < min{λi}, in which case d(λ) is also nonnegative. It is now easy to check that this condition is necessary and sufficient for the existence of a well-defined point process with factorial moment and Janossy densities given by (5.4.16) and (5.4.19).

A great virtue of this model is that it provides a rather general model for ‘antibunching’, with repulsive rather than attractive points, for which moment and probability densities can be given explicitly, or at least be computed numerically, and it is not restricted to the state space R. Further details of the process, including a discussion of the corresponding discrete process in which the integral operator is replaced by a matrix, are given in Exercises 5.4.7–10.
Exercises and Complements to Section 5.4
5.4.1 (see Proposition 5.4.I). Show that for disjoint sets A and B,

M[2]((A ∪ B)^(2)) = M[2](A^(2)) + M[2](B^(2)) + 2M[2](A × B).

5.4.2 Establish the analogues below of (5.4.6), where Σ* denotes summation over all distinct terms of like kind:

M[3](A1 × A2 × A3) = E[N(A1)N(A2)N(A3)] − Σ* E[N(A1)N(A2 ∩ A3)] + 2E[N(A1 ∩ A2 ∩ A3)],

M[4](A1 × A2 × A3 × A4) = E[N(A1)N(A2)N(A3)N(A4)] − Σ* E[N(A1)N(A2)N(A3 ∩ A4)]
  + Σ* E[N(A1 ∩ A2)N(A3 ∩ A4)] + 2 Σ* E[N(A1)N(A2 ∩ A3 ∩ A4)] − 6E[N(A1 ∩ A2 ∩ A3 ∩ A4)].
5.4.3 (Continuation). Find the generalization for M[k](A1 × ··· × Ak) for general k, and discuss the relation to the Stirling numbers Dj,k. Observe that the relation is essentially one between the ordinary product counting measure N^(k) and the modified product counting measure consisting of distinct ordered k-tuplets.
5.4.4 Show that M3(dx1 × dx2 × dx3) equals

M[3](dx1 × dx2 × dx3) + Σ* M[2](dx1 × dx2) δ(x2, x3) + M[1](dx1) δ(x1, x2, x3),

where δ(x1, x2) and δ(x1, x2, x3) vanish outside the hyperplanes x1 = x2 and x1 = x2 = x3, respectively, and Σ* is as in Exercise 5.4.2.
5.4.5 (Continuation). Show that in general

Mk(dx1 × ··· × dxk) = Σ_{j=1}^k Σ_V M[j](dy1(V) × ··· × dyj(V)) δ(V),

where the inner sum is taken over all partitions V of the k coordinates into j nonempty subsets, the yi(V) constitute an arbitrary selection of one coordinate from each subset, and δ(V) is a δ-function that equals zero unless equality holds among the coordinates in each of the nonempty subsets of V (see Krickeberg, 1974).
5.4.6 (Continuation). Show that if a point process is simple, the moment measure Mk completely determines Mj for j ≤ k. [Hint: Consider the representation of Mk in terms of the factorial moment measures M[j] with j ≤ k. If the process is simple, each diagonal term for Mk can be identified with one of the M[j].] Provide a counterexample showing that for point processes that are not simple, two distinct processes may have the same M2 but different M1 (see Krickeberg, 1974, Theorem 3, Corollary 3).
5.4.7 Discrete fermion process. As an analogue of Example 5.4(c), let X be a discrete space of K points labelled 1, . . . , K, and for k ≥ 1 set
mk(i1, . . . , ik) = E[N{i1} ··· N{ik}] ≡ λ^k C[i1, . . . , ik; i1, . . . , ik],

where the k × k determinant is built from the entries cij of a covariance matrix. Observe that the determinant on the right vanishes if an index is repeated (and hence, in particular, if k > K), so that the function mk(·) is nonzero only for combinations of distinct indices. Define

P(1 + η1, . . . , 1 + ηK) = 1 + Σ_{k=1}^K λ^k Σ_comb C[i1, . . . , ik; i1, . . . , ik] ηi1 ··· ηik = det(I + λ Dη C),

where Dη = diag(η1, . . . , ηK), C is the K × K matrix with elements cij, and Σ_comb is taken over all distinct combinations of k indices from 1, . . . , K. Show that, with zi = 1 + ηi, P(·) is a proper multivariate p.g.f. [Hint: Use the identity

(I + λ Dz Rλ)(I − λC) = I + λ Dη C,
where Rλ = C(I − λC)^{−1}, leading to P(z1, . . . , zK) = d(λ) det(I + λ Dz Rλ), where d(λ) = det(I − λC), and thus
jk(i1, . . . , ik) = d(λ) λ^k Rλ[i1, . . . , ik; i1, . . . , ik]
  = Pr{N{i1} = ··· = N{ik} = 1, N{j} = 0 (j ∉ {i1, . . . , ik})}.

Check that this expression is nonnegative provided 0 < λ < min{λi}, where the λi solve d(λ) = 0.]
5.4.8 (Continuation). Show that the process of Exercise 5.4.7 satisfies the following:
(i) The process is simple, i.e. Pr{N{j} = 0 or 1 for j = 1, . . . , K} = 1;
(ii) E[N{i}N{j}] = λ^2 (cii cjj − |cij|^2) < λ^2 cii cjj = E[N{i}] E[N{j}], and hence the values are negatively correlated for all i, j;
(iii) N(X), the total number of points on X, has p.g.f. d(λ(1 − z));
(iv) Pr{N(X) = 0} = d(λ).
For a dual model, see Exercises 6.2.3–5.
5.4.9 Derive the results asserted in Example 5.4(c) by a passage to the limit from the discrete analogue described in the preceding exercises, assuming C(x, y) is bounded and continuous on A and imitating the proofs of the Fredholm theory approach to integral equations. For a dual model, see Exercise 6.2.6.
5.4.10 For the special case of Example 5.4(c) with X = R and C(x, y) = ρ e^{−|x−y|/L}, the fermion process reduces to a stationary renewal process with interval distributions having density

f(x) = (2ρ / √(1 − 2ρL)) e^{−x/L} sinh[(x/L)√(1 − 2ρL)]

(see Macchi, 1971b). More generally, a reduction to a renewal process is possible whenever

C(x, y) C(y, z) = C(x, z) C(y, y)   (x ≤ y ≤ z).
For a dual model, see Exercise 6.2.7.
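The determinantal identities in Exercises 5.4.7–8 are easy to verify numerically. The sketch below (in Python, with a hypothetical 3 × 3 covariance matrix C and a value of λ below the smallest root of d(λ) = 0, both chosen only for illustration) checks that P(1, . . . , 1) = 1, so that P is a proper p.g.f., and that Pr{N(X) = 0} = d(λ), as in part (iv) of Exercise 5.4.8.

```python
import numpy as np

def pgf(z, C, lam):
    # P(z_1, ..., z_K) = d(lam) * det(I + lam * D_z * R_lam),
    # with R_lam = C (I - lam C)^{-1} and d(lam) = det(I - lam C)
    K = len(z)
    I = np.eye(K)
    R = C @ np.linalg.inv(I - lam * C)
    return np.linalg.det(I - lam * C) * np.linalg.det(I + lam * np.diag(z) @ R)

# A toy covariance matrix; its largest eigenvalue is about 1.42,
# so lam = 0.4 lies below min{lam_i} = 1/1.42 ~ 0.70.
C = np.array([[1.0, 0.3, 0.1],
              [0.3, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
lam = 0.4

print(pgf(np.ones(3), C, lam))             # 1.0: a proper p.g.f.
print(pgf(np.zeros(3), C, lam))            # Pr{N(X) = 0}
print(np.linalg.det(np.eye(3) - lam * C))  # = d(lam), the same value
```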
5.5. Generating Functionals and Their Expansions
The factorial moment measures are closely linked, as are the factorial moments in the univariate and finite multivariate cases, to an appropriate version of the generating function concept. In the point process context, the appropriate generalization is the probability generating functional, which we introduce as follows. Let ζ(·) be any bounded complex-valued Borel measurable function; then, for a realization {x_i: i = 1, . . . , N} of a finite point process, the (random) product ∏_{i=1}^N ζ(x_i) is well defined, and on imposing the further requirement that |ζ(x)| ≤ 1 (all x ∈ X), its expectation will exist and be finite. When p.g.fl.s reappear in Chapter 9, they are first defined much as here and then extended.
Definition 5.5.I. Let U be the class of complex-valued Borel measurable functions ζ: X → C satisfying the condition |ζ(x)| ≤ 1. Then, for a finite point process, the probability generating functional (p.g.fl.) is defined for ζ ∈ U by

    G[ζ] = E[ ∏_{i=1}^N ζ(x_i) ],    (5.5.1)

where the product is zero if N > 0 and ζ(x_i) = 0 for some i, and is unity if N = 0.
We can get some feel for the p.g.fl. by taking A_1, . . . , A_r to be a measurable partition of X and setting

    ζ(x) = Σ_{i=1}^r z_i I_{A_i}(x),    (5.5.2)

where I_A(x) is the indicator function of the set A and |z_i| ≤ 1 for i = 1, . . . , r. The function ζ in (5.5.2) belongs to U, and substitution in (5.5.1) leads to

    G[ Σ_{i=1}^r z_i I_{A_i}(·) ] = E[ ∏_{i=1}^r z_i^{N(A_i)} ],
which is just the multivariate p.g.f. of the number of points in the sets of the given partition. The case of a general function ζ ∈ U may be regarded as a limiting form of this result, where every infinitesimal region dx is treated as a separate set in a grand partition of X, and ζ(x) is the coefficient (z value) of the corresponding indicator function in (5.5.2). In this way, the p.g.fl. provides a portmanteau description of the p.g.f.s of all possible finite or infinite families of counting r.v.s N(·).
As in the case of an ordinary discrete distribution, the p.g.fl. provides a useful way of summarizing and illuminating the complex combinatorial results associated with the moments and a convenient formal tool for deriving relations between them. In further analogy to the univariate case, there are two useful expansions of the p.g.fl., the first about ζ ≡ 0 and the second about ζ ≡ 1. The first results directly from the definition (5.5.1) when the expectation is written out in terms of the elements {(p_n, Π_n)} of the point process or, equivalently, in terms of the Janossy measures J_n(·) [see Conditions 5.3.I and equation (5.3.11)]. For all ζ ∈ U, we have

    G[ζ] = p_0 + Σ_{n=1}^∞ p_n ∫_{X^{(n)}} ζ(x_1) · · · ζ(x_n) Π_n(dx_1 × · · · × dx_n)    (5.5.3a)
         = J_0 + Σ_{n=1}^∞ (1/n!) ∫_{X^{(n)}} ζ(x_1) · · · ζ(x_n) J_n(dx_1 × · · · × dx_n).    (5.5.3b)
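Expansion (5.5.3a) is just the defining expectation (5.5.1) written out over the number of points, and it can be checked by simulation. The Python sketch below uses a hypothetical toy process, not one from the text: N is Poisson with mean μ and the points are i.i.d. uniform on [0, 1], for which G[ζ] = exp{μ(∫ ζ(x) dx − 1)}; the Monte Carlo average of the product of ζ over the points should reproduce this value.

```python
import math, random

random.seed(42)
mu = 2.0                              # mean number of points (assumed)
zeta = lambda x: 0.5 + 0.4 * x        # a test function with |zeta(x)| <= 1

def poisson(mean):
    # Knuth's product method; adequate for small means
    L, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

trials = 200000
est = 0.0
for _ in range(trials):
    prod = 1.0
    for _ in range(poisson(mu)):      # product over the points of one realization
        prod *= zeta(random.random())
    est += prod
est /= trials

exact = math.exp(mu * (0.7 - 1.0))    # int_0^1 zeta(x) dx = 0.7
print(est, exact)                     # the estimate is close to exp(-0.6)
```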
The second expansion can be derived as a generalization from the case where ζ has the particular form (5.5.2), when the p.g.fl. reduces to a multivariate p.g.f., and the expansion can be expressed in terms of the multivariate factorial moments. Assuming as in (5.4.8) that the series Σ_{k=0}^∞ μ_[k] z^k is convergent for |z| < ε for some ε > 0, and expressing the factorial moments of the counting r.v.s in terms of the factorial moment measures (5.4.4), we obtain

    G[ Σ_{i=1}^r z_i I_{A_i} ] = G[ 1 + Σ_{i=1}^r (z_i − 1) I_{A_i} ]
        = 1 + Σ_{k=1}^∞ (1/k!) Σ_{k_1+···+k_r=k} ( k!/(k_1! · · · k_r!) ) ∏_{i=1}^r (z_i − 1)^{k_i} M_[k](A_1^{(k_1)} × · · · × A_r^{(k_r)}).

The final sum here can be identified with the integral with respect to M_[k](·) of the product ∏_{j=1}^k η(x_j), so we have

    G[1 + η] = 1 + Σ_{k=1}^∞ (1/k!) ∫_{X^{(k)}} η(x_1) · · · η(x_k) M_[k](dx_1 × · · · × dx_k),    (5.5.4)

where η(x) = Σ_{i=1}^r (z_i − 1) I_{A_i}(x) in the special case considered. Since any Borel measurable function can be approximated by simple functions such as η, the general result follows by familiar continuity arguments, using the dominated convergence theorem and the assumed convergence of Σ μ_[k] z^k in |z| < ε, supposing that |η(x)| < ε for x ∈ X.
By taking logarithms of the expansions in (5.5.3) and (5.5.4), we can obtain expansions analogous to those in (5.2.10) and (5.2.8). The first of these takes the form, under the condition that J_0 > 0,
    log G[ζ] = −K_0 + Σ_{n=1}^∞ (1/n!) ∫_{X^{(n)}} ζ(x_1) · · · ζ(x_n) K_n(dx_1 × · · · × dx_n),    (5.5.5)

where J_0 = exp(−K_0) and the K_n(·) (n = 1, 2, . . .) are symmetric signed measures, which, following Bol'shakov (1969), we call Khinchin measures. This expansion is important when the point process is infinitely divisible and can be given a cluster interpretation generalizing that of the compound Poisson distribution (see Section 6.3). Here we note that in this case the measures K_n(·)/K_0 can be identified as the Janossy measures of the process characterizing the clusters, so K_0 = Σ_{n=1}^∞ K_n(X^{(n)})/n!, and the expansion can be rewritten in the form

    log G[ζ] = Σ_{n=1}^∞ (1/n!) ∫_{X^{(n)}} [ζ(x_1) · · · ζ(x_n) − 1] K_n(dx_1 × · · · × dx_n).    (5.5.6)

Taking logarithms of the expansion (5.5.4) leads to a development in terms of factorial cumulant measures C_[k], namely

    log G[1 + η] = Σ_{k=1}^∞ (1/k!) ∫_{X^{(k)}} η(x_1) · · · η(x_k) C_[k](dx_1 × · · · × dx_k).    (5.5.7)
This expansion converges under the same conditions as (5.5.4) itself, namely that the factorial moments μ_[k] of the total population size should satisfy Σ_k μ_[k] ε^k < ∞ for some ε > 0 or, equivalently, that the p.g.f. of the total population size should be analytic within a disk |z| < 1 + ε. Note that the scope of application of these results can be increased considerably by recalling that X itself can be deliberately restricted to a subspace, such as a finite interval or rectangle of the original space, in which the process may not even be finite.
Relations between the factorial cumulant measures and factorial moment measures can be derived from the expansions (5.5.4) and (5.5.7) by formal substitution or by recalling that the measures appearing in those expansions are symmetric: without this restriction, they are not uniquely defined by integral representations such as (5.5.7). For example, by comparing the terms linear and quadratic in ζ, we have

    ∫_X ζ(x_1) C_[1](dx_1) = ∫_X ζ(x_1) M_[1](dx_1),    (5.5.8a)

    ∫_{X^{(2)}} ζ(x_1)ζ(x_2) C_[2](dx_1 × dx_2)
        = ∫_{X^{(2)}} ζ(x_1)ζ(x_2) M_[2](dx_1 × dx_2) − ∫_X ζ(x_1) M_[1](dx_1) ∫_X ζ(x_2) M_[1](dx_2),    (5.5.8b)

which can be abbreviated to

    C_[1](dx_1) = M_[1](dx_1),    (5.5.8c)
    C_[2](dx_1 × dx_2) = M_[2](dx_1 × dx_2) − M_[1](dx_1) M_[1](dx_2).    (5.5.8d)

The latter statement follows because any Borel measure on X^{(2)} is determined by its values on rectangles A × B, which in the case of a symmetric measure may be taken to be squares A × A, for which the indicator functions have the form ζ(x_1)ζ(x_2). In the sequel, we repeatedly use such infinitesimal notation to represent equality of measures on product spaces. Using this notation, the general relation between C_[k] and the factorial moment measures M_[j] for j ≤ k is most conveniently written in the form, analogous to (5.2.19),

    C_[k](dx_1 × · · · × dx_k) = Σ_{j=1}^k (−1)^{j−1} (j − 1)! Σ_{T∈P_{jk}} ∏_{i=1}^j M_[|S_i(T)|](dx_{i1} × · · · × dx_{i,|S_i(T)|}).    (5.5.9)
To check that (5.5.9) holds, apply Lemma 5.2.VI to the expansions (5.5.4) for the p.g.fl. and (5.5.7) for its logarithm. Note that in (5.5.9), unlike in (5.2.19), we must take explicit note of the elements x_{i1}, . . . , x_{i,|S_i(T)|} of each constituent set S_i(T) in each partition T in P_{jk}.
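In the univariate case (all points lumped into the single set X), (5.5.9) and its inverse (5.5.11) reduce to the classical set-partition conversions between factorial moments and factorial cumulants. The Python sketch below (an illustrative aid, not part of the text) checks the partition combinatorics: the two conversions are mutually inverse, and a Poisson total size, with μ_[k] = λ^k, has all factorial cumulants beyond the first equal to zero.

```python
import math

def set_partitions(s):
    # enumerate all partitions of the list s into nonempty blocks
    if not s:
        yield []
        return
    first, rest = s[0], s[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

def cum_from_mom(m):
    # scalar version of (5.5.9): c[n] = sum over partitions T of {1..n}
    # of (-1)^{j-1} (j-1)! prod_i m[|S_i(T)|]; m[0] is unused
    c = [0.0] * len(m)
    for n in range(1, len(m)):
        for part in set_partitions(list(range(n))):
            j = len(part)
            term = (-1) ** (j - 1) * math.factorial(j - 1)
            for block in part:
                term *= m[len(block)]
            c[n] += term
    return c

def mom_from_cum(c):
    # scalar version of (5.5.11): m[n] = sum over partitions of prod_i c[|S_i(T)|]
    m = [1.0] + [0.0] * (len(c) - 1)
    for n in range(1, len(c)):
        for part in set_partitions(list(range(n))):
            term = 1.0
            for block in part:
                term *= c[len(block)]
            m[n] += term
    return m

lam = 1.5
poisson_moments = [1.0] + [lam ** k for k in range(1, 5)]   # mu_[k] = lam^k
print(cum_from_mom(poisson_moments))     # -> [0.0, 1.5, 0.0, 0.0, 0.0]
```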
In practice, it is convenient to group together those partitions T in P_{jk} that have common numbers of elements in their subsets: using Σ* to denote summation over such groups, (5.5.9) then yields, for example when k = 4,

    C_[4](dx_1 × · · · × dx_4)
        = M_[4](dx_1 × · · · × dx_4) − Σ* M_[1](dx_1) M_[3](dx_2 × dx_3 × dx_4)
          − Σ* M_[2](dx_1 × dx_2) M_[2](dx_3 × dx_4)
          + 2 Σ* M_[1](dx_1) M_[1](dx_2) M_[2](dx_3 × dx_4) − 6 M_[1](dx_1) · · · M_[1](dx_4).    (5.5.10)

Here, the first two Σ* sums come from P_{24}, with four terms in the former and three in the latter, while the third Σ* sum comes from P_{34} and has six terms. This expression then compares immediately with the relation in Exercise 5.2.7. Inverse relations can be derived in the same way and take the form

    M_[k](dx_1 × · · · × dx_k) = Σ_{j=1}^k Σ_{T∈P_{jk}} ∏_{i=1}^j C_[|S_i(T)|](dx_{i1} × · · · × dx_{i,|S_i(T)|}).    (5.5.11)

Just as with integer-valued r.v.s, expansions such as (5.4.9) and (5.5.11) can in principle be combined to provide expressions for the Janossy measures in terms of the factorial cumulant measures and vice versa. While they may appear to be too clumsy to be of any great practical value, when one or more of the entities concerned has a relatively simple structure, as occurs for example with the Poisson process, they can in fact provide a usable theoretical tool (see e.g. Proposition 7.1.III). Similar comments apply to the relations between the Khinchin measures and the factorial moment measures.
For ease of reference, we give at the end of this section a summary of the various expansions of the p.g.fl. G[·] of an a.s. finite point process N, together with the corresponding relations between the associated families of measures. First, we illustrate uses of the p.g.fl. in three examples; for the third of these, concerning branching processes, it is convenient to present here a range of results needed later in the book.

Example 5.5(a) I.i.d. clusters [continued from Section 5.1 and Example 5.3(a)]. Returning to our initial example, we see that equation (5.1.1) for the joint p.g.f. of this example is a special case of the general form for the p.g.fl.

    G[ζ] = P_N( ∫_X ζ(x) F(dx) ),    (5.5.12)

where as before P_N(·) is the p.g.f. of the cluster size and F(·) is the distribution of the individual cluster members about the origin. The case where P_N(·) has the compound Poisson form (see Theorem 2.2.II)

    P_N(z) = e^{−λ[1−Π(z)]}
and Π(·) is the p.g.f. of the compounding distribution, is of interest. Expanding log G[ζ], we have

    log G[ζ] = λ[ Π( ∫_X ζ(x) F(dx) ) − 1 ] = λ Σ_{n=1}^∞ π_n [ ( ∫_X ζ(x) F(dx) )^n − 1 ];

hence, K_0 = λ and, for n = 1, 2, . . . ,

    K_n(dx_1 × · · · × dx_n) = λ π_n n! F(dx_1) · · · F(dx_n).

This can be compared with the form for the Janossy measures, for which J_0 = e^{−λ} and, for n = 1, 2, . . . ,

    J_n(dx_1 × · · · × dx_n) = p_n n! F(dx_1) · · · F(dx_n),

the interpretation being as follows. The process can be regarded as the superposition of ν i.i.d. nonempty subclusters, where ν has a Poisson distribution with mean λ, and for each subcluster, K_n(dx_1 × · · · × dx_n)/K_0 is the probability that the subcluster consists of n points and that they are located at {x_1, . . . , x_n}. The Janossy measure yields as J_n(dx_1 × · · · × dx_n) the probability that the superposition of the ν subclusters results in n points in all, with these points being located at {x_1, . . . , x_n}. In this particular case, the measures J_n(·) and K_n(·) for n = 1, 2, . . . differ only by a scale factor that depends on n: this is a consequence of the i.i.d. nature of the locations of the points. In the more complex examples studied in Chapters 6 and 10, this need no longer hold [see also Example 7.1(e)].

Example 5.5(b) P.g.fl. for the local process on A. Let V(A) denote the space of all measurable functions h on A satisfying 0 ≤ h ≤ 1, and for h ∈ V(A) extend h to all of X by putting h*(x) = h(x) I_A(x). Then, the p.g.fl. G_A[h] of the local process on A is defined in terms of the global p.g.fl. G by the equation

    G_A[h] = G[1 − I_A + h*]    (h ∈ V(A)).    (5.5.13)
This representation follows immediately from the interpretation of the p.g.fl. as the expectation

    G_A[h] = E[ ∏_{x_i ∈ A} h(x_i) ] = E[ ∏_{x_i ∈ X} (1 − I_A(x_i) + h*(x_i)) ].

Thus, the local Janossy measures can be obtained from an expansion of the p.g.fl. about the function 1 − I_A(·) rather than about 0. Specifically,

    G_A[ρh] = G[1 − I_A + ρh*]
            = p_0(A) + Σ_{n=1}^∞ (ρ^n/n!) ∫_{A^{(n)}} h(x_1) · · · h(x_n) J_n(dx_1 × · · · × dx_n | A).    (5.5.14)
A similar comment applies to the Khinchin measures arising from the expansion of the log p.g.fl. We can introduce local Khinchin measures, K_n(· | A) say, via the expansion [see equation (5.5.5)] of log G_A[ρh] as

    log G[1 − I_A + ρh*] = −K_0(A) + Σ_{n=1}^∞ (ρ^n/n!) ∫_{A^{(n)}} h(x_1) · · · h(x_n) K_n(dx_1 × · · · × dx_n | A),    (5.5.15)

where p_0(A) = exp[−K_0(A)].

Example 5.5(c) General branching processes; multiplicative population chains. This basic model stimulated much of the early discussion of generating functionals and moment measures (see e.g. Bartlett and Kendall, 1951; Moyal, 1962a, b) and may be described as follows. A population evolves in discrete time or generations t = 0, 1, . . . . The members of each generation are characterized by both their total number and their locations in the state space X, in such a way that the population constituting the tth generation can be described by a finite point process on X. The fundamental multiplicative property of the process expresses the fact that the population at the (t + 1)th generation is built up as the sum or, more properly, the superposition of the contributing processes representing the offspring from each of the members of the tth generation. Here we shall assume that, given the number Z_t and the locations {x_ti: i = 1, . . . , Z_t} of the members of the tth generation, the contributing processes to the (t + 1)th generation are mutually independent and independent of both Z_t and all generations prior to t. This relation is then expressible in the form

    N_{t+1}(A) = Σ_{i=1}^{Z_t} N(A | x_ti)    (A ∈ B_X, t = 0, 1, . . .),    (5.5.16)
where the Z_t finite point processes {N(· | x_ti): i = 1, . . . , Z_t} are mutually independent. The distributions of the contributing or offspring processes N(· | x) may depend on the location x of the parent. They can be specified by probability distributions {p_n(x): n = 0, 1, . . .} and symmetric distributions Π_n(· | x) as in Conditions 5.3.I, with the additional requirement that, for fixed values of their other arguments, the p_n(x) and Π_n(· | x) are all assumed to be measurable functions of x for each n = 0, 1, . . . . Then, the offspring p.g.fl., G[ζ | x] say, will also be a measurable function, and the relation (5.5.16) can be expressed as

    G_{t+1}[ζ | N_t] = ∏_{i=1}^{Z_t} G[ζ | x_ti],    (5.5.17)

where the left-hand side represents the conditional p.g.fl. for the (t + 1)th generation given the number and locations of the members of the tth generation
as specified by the point process N_t. It is clear that the right-hand side is a measurable function of {Z_t, x_ti (i = 1, . . . , Z_t)} and hence that the left-hand side is a measurable function of the finite process N_t. We may therefore take expectations over the left-hand side with respect to N_t, thus obtaining the relation

    G_{t+1}[ζ] = G_t[ G[ζ | · ] ],    (5.5.18)

where G[ζ | · ] is to be treated as the argument of G_t (note that G[ζ | · ] ∈ U whenever ζ ∈ U). Equation (5.5.18) is a far-reaching generalization of the functional iteration relation for the p.g.f.s of the number of offspring in successive generations of the Galton–Watson process (see also Exercise 5.5.3).
Analogous formulae for the factorial moment measures can be established by similar conditioning arguments or else more formally by expanding the p.g.fl. in powers of ζ and equating like terms. We illustrate these procedures for the expectation measures, denoting by M(· | x) the expectation measure for the offspring process N(· | x) with a parent at x, and by M_{(t)}(·) the expectation measure for the population at the tth generation. Corresponding to (5.5.17), we have

    M_{(t+1)}(A | N_t) = Σ_{i=1}^{Z_t} M(A | x_ti) = ∫_X M(A | x) N_t(dx),    (5.5.19)

where again the measurability of M(A | x) as a function of x is clear from the assumptions. Taking expectations with respect to N_t, we then have

    M_{(t+1)}(A) = ∫_X M(A | x) M_{(t)}(dx),    (5.5.20)
showing that the expectation measures for successive generations are obtained by operating on M_{(0)}(·) by successive powers of the integral operator with kernel M(· | x). As in the case of a multitype Galton–Watson process (which indeed is the special case when the state space consists of a finite number of discrete points), this operator governs the asymptotic behaviour of the process. In particular, its maximum eigenvalue determines the asymptotic rate of growth (or decay) of the mean population size. These and many other properties are discussed in standard references on general branching processes (see e.g. Moyal, 1962b; Harris, 1963; Athreya and Ney, 1972; Jagers, 1975). Most attention has been given to the case where X is compact, which results in behaviour similar to that of the finite multitype case. New types of behaviour occur in the noncompact case: for example, M(A | ·) may be the kernel of a transient Markov chain, in which case the total mass is preserved but, in contrast to the compact case, the population need not necessarily become extinct; it may continue 'moving' indefinitely across the state space as a kind of population wave. Some further aspects and examples are taken up in the exercises [see also Chapter 12 of MKM (1978) and Liemant et al. (1988)].
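On a finite (discretized) state space, the recursion (5.5.20) becomes matrix iteration: M(A | x) reduces to a K × K matrix of offspring means, the expectation measure of generation t is the row vector m_0 M^t, and the maximum eigenvalue of M governs growth or decay. A Python sketch, with a hypothetical subcritical kernel:

```python
import numpy as np

M = np.array([[0.5, 0.3, 0.0],    # hypothetical mean-offspring kernel:
              [0.2, 0.5, 0.2],    # row x gives the expected numbers of
              [0.0, 0.3, 0.5]])   # offspring in each state for a parent at x
m = np.array([1.0, 0.0, 0.0])     # one ancestor in state 0

for _ in range(50):
    m = m @ M                     # one application of (5.5.20)

rho = max(abs(np.linalg.eigvals(M)))
print(rho)                        # < 1: subcritical
print(m.sum())                    # mean total population after 50 generations
```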
For an alternative derivation of (5.5.20), write ζ = 1 + η in (5.5.18) and expand the two sides. We have

    1 + ∫_X η(x) M_{(t+1)}(dx) + · · ·
        = 1 + ∫_X (G[1 + η | x] − 1) M_{(t)}(dx) + · · ·
        = 1 + ∫_X M_{(t)}(dx) ∫_X η(u) M(du | x) + · · · ,

where all omitted terms involve products of values of η. Equating the measures with respect to which η is integrated on each side of the equation, we obtain (5.5.20). This brief illustration is a typical example of the fact that the p.g.fl. acts as a portmanteau device for condensing a broad range of formulae (see also Exercise 5.5.4).
We conclude this section with a summary of the various expansions of the p.g.fl. G[·] of an a.s. finite point process N, together with the corresponding relations between the associated families of measures. For brevity of notation, the latter are written in density form: they can easily be translated into measure notation [for example, equation (5.5.11) is a measure-notation analogue of expansions such as (5.5.28)]. For point processes that are not a.s. finite, the expansions must be applied to the local process on A, N(· ∩ A) say, for any bounded A ∈ B_X [see Example 5.5(b)]. Some statements below have already been proved; proofs of the rest are left to the reader.

(I) G[h]: Janossy measures
(II) G[1 + η]: Factorial moment measures
(III) log G[h]: Khinchin measures
(IV) log G[1 + η]: Factorial cumulant measures
(A) Definitions, Ranges of Validity
For suitable measurable functions h and a family of measures {μ_n: n = 0, 1, . . .} with μ_0 a constant and μ_n defined on B(X^{(n)}), write

    Y[h, {μ_n}] = Σ_{n=1}^∞ (1/n!) ∫_{X^{(n)}} h(x_1) · · · h(x_n) μ_n(dx_1 × · · · × dx_n).    (5.5.21)

Here V denotes the class of measurable functions h: X → [0, 1] such that h(x) = 1 for x outside some bounded Borel set, and R denotes the radius of convergence of the p.g.f. P(z) = Σ_{n=0}^∞ p_n z^n = E(z^{N(X)}). Always, R ≥ 1.
(I) Janossy Measures {J_n}.

    G[h] = J_0 + Y[h, {J_n}],    (5.5.22)

valid for h ∈ V and subject to {J_n} satisfying the normalizing condition

    1 = G[1] = J_0 + Σ_{n=1}^∞ J_n(X^{(n)})/n!.    (5.5.23)

{J_n(·)/n!} is a probability measure on X^∪ = ∪_{n=0}^∞ X^{(n)}, with p_n = J_n(X^{(n)})/n! (n = 0, 1, . . .).
(II) Factorial Moment Measures {M_[n]}.

    G[1 + η] = 1 + Y[η, {M_[n]}],    (5.5.24)

valid for |1 + η| ∈ V with |η(x)| < ε (all x), provided R ≥ 1 + ε > 1, which implies that all M_[n](X^{(n)}) < ∞; M_[0] = 1.

(III) Khinchin Measures {K_n}.

    log G[h] = −K_0 + Y[h, {K_n}],    (5.5.25)

valid for h ∈ V with K_0 > 0 and {K_n} satisfying the normalizing condition

    K_0 = Σ_{n=1}^∞ K_n(X^{(n)})/n!.    (5.5.26)

For n ≥ 1, K_n(·) need not necessarily be nonnegative; if every K_n(·) ≥ 0, then N is infinitely divisible.

(IV) Factorial Cumulant Measures {C_[n]}.

    log G[1 + η] = Y[η, {C_[n]}],    (5.5.27)
valid for η as in (II), with R ≥ 1 + ε > 1 implying that |C_[n](X^{(n)})| < ∞ for all n; C_[0] = 0.

(B) Relations Between Measures in Different Expansions
The conditions given for validity are sufficient but not always necessary.
(I) → (II). This is a matter of definition! For n such that M_[n](X^{(n)}) < ∞,

    m_[n](x_1, . . . , x_n) = Σ_{r=0}^∞ (1/r!) ∫_{X^{(r)}} j_{n+r}(x_1, . . . , x_n, y_1, . . . , y_r) dy_1 · · · dy_r.    (5.5.28)

(II) → (I). For R > 2,

    j_n(x_1, . . . , x_n) = Σ_{r=0}^∞ ((−1)^r/r!) ∫_{X^{(r)}} m_[n+r](x_1, . . . , x_n, y_1, . . . , y_r) dy_1 · · · dy_r.    (5.5.29)
(I) → (III). K_0 = −log J_0 (and hence needs J_0 > 0) and R > 1.

    k_n(x_1, . . . , x_n) = Σ_{r=1}^n (−1)^{r−1} (r − 1)! Σ_{T∈P_{rn}} ∏_{i=1}^r j_{|S_i(T)|}(x_{i1}, . . . , x_{i,|S_i(T)|}).    (5.5.30)

(III) → (I). J_0 = exp(−K_0) (and hence needs K_0 < ∞) and R > 1.

    j_n(x_1, . . . , x_n) = J_0 [ Σ_{r=1}^n Σ_{T∈P_{rn}} ∏_{i=1}^r k_{|S_i(T)|}(x_{i1}, . . . , x_{i,|S_i(T)|}) ].    (5.5.31)
(III) → (IV) and (IV) → (III). These are the direct analogues of the relations between (I) and (II), noting that C_[0] = 0. Valid for R > 2.
(II) → (IV) and (IV) → (II). These are the direct analogues of the relations between (I) and (III), noting that M_[0] = 1. Valid for R > 2.
Exercises and Complements to Section 5.5
5.5.1 [Section 5.1 and Examples 5.3(a) and 5.5(a)]. Derive (5.1.1) from (5.5.12) by putting ζ(x) = Σ_{i=1}^j z_i I_{A_i}(x), where {A_1, . . . , A_j} is a finite partition of X. Put ζ = 1 + η to establish the formal relation

    G[1 + η] = 1 + Σ_{k=1}^∞ (μ_[k]/k!) ∫_X · · · ∫_X η(x_1) · · · η(x_k) Π(dx_1) · · · Π(dx_k),

and hence, when μ_[k] = E[N(N − 1) · · · (N − k + 1)] < ∞,

    M_[k](dx_1 × · · · × dx_k) = μ_[k] Π(dx_1) · · · Π(dx_k),

of which the case k = 2 appears in (5.1.3).

5.5.2 For a Gibbs process as in Example 5.3(c), express the Khinchin densities in terms of the interaction potentials ψ_r(·). More generally, for finite point processes for which the Janossy densities exist, explore the relationship between Khinchin densities and the interaction potentials ψ_r(·) (see Exercise 5.3.7).

5.5.3 Branching process [continued from Example 5.5(c)]. Let G_t[ζ | x] denote the p.g.fl. for the point process N_t(· | x) describing the points that constitute the tth generation of the process of Example 5.5(c) starting from a single ancestor at x; so, G_1[ζ | x] = G[ζ | x]. Show that for all k = 1, . . . , t − 1,

    G_t[ζ | x] = G_{t−k}[ G_k[ζ | · ] | x ] = G^{(t)}[ζ | x],

where G^{(t)}[ζ | x] is the tth functional iterate of G[· | ·] [see (5.5.18)].
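The functional iteration of Exercise 5.5.3, and the extinction recursion q_{t+1}(x) = G[q_t(·) | x] developed in Exercise 5.5.4 below, are easy to explore numerically when the state space is finite. In the Python sketch it is assumed, purely for illustration, that an individual at state x produces offspring at state y as independent Poisson counts with means mu[x, y], so that G[q | x] = exp{−Σ_y mu[x, y](1 − q(y))}; iterating from q ≡ 0 converges upward to the smallest nonnegative root.

```python
import numpy as np

mu = np.array([[0.8, 0.4],     # hypothetical mean-offspring matrix;
               [0.3, 0.6]])    # spectral radius ~ 1.06, so supercritical

q = np.zeros(2)                # q_0(x) = 0: cannot be extinct at generation 0
for _ in range(2000):
    q = np.exp(-(mu * (1.0 - q)).sum(axis=1))   # q_{t+1}(x) = G[q_t | x]

print(q)                       # extinction probabilities, strictly below 1
```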
5.5.4 (Continuation). Let q_t(x) denote the probability of extinction within t generations starting from a single ancestor at x, so that q_t(x) = Pr{N_t(X | x) = 0}. Show that for each fixed x ∈ X, {q_t(x): t = 0, 1, . . .} is a monotonically increasing sequence and that, for k = 1, . . . , t − 1, q_t(x) = G_{t−k}[q_k(·) | x], so, in particular, q_{t+1}(x) = G[q_t(·) | x]. Deduce that the probability of ultimate extinction starting from an initial ancestor at x, q(x) say, is the smallest nonnegative solution of the equation q(x) = G[q(·) | x].

5.5.5 (Continuation). Show that the first-moment measure M^{(t)}(· | x) of N_t(· | x) and the second factorial cumulant measure, C^{(t)}_[2](A × B | x) say, of N_t(· | x) satisfy the recurrence relations (with M ≡ M^{(1)})
    M^{(t+1)}(A | x) = ∫_X M^{(t)}(A | y) M(dy | x),

    C^{(t+1)}_[2](A × B | x) = ∫_{X^{(2)}} M^{(t)}(A | y) M^{(t)}(B | z) C_[2](dy × dz | x)
                               + ∫_X C^{(t)}_[2](A × B | y) M(dy | x).
[Hint: Use N_{t+1}(A | x) =_d Σ_{x_i} N_t(A | x_i), where the {x_i} denote the individuals of the first generation; see also equations (6.3.3–5).]

5.5.6 (Continuation). Let H_t[ζ | x] denote the p.g.fl. for all individuals up to and including those in the tth generation, starting from an initial ancestor at x. Show that these p.g.fl.s satisfy the recurrence relations

    H_{t+1}[ζ | x] = ζ(x) G[ H_t[ζ | · ] | x ].

Show also that, if extinction is certain, the total population over all generations has p.g.fl. H[ζ | · ], which for 0 < ζ < 1 is the smallest nonnegative solution to the functional equation H[ζ | x] = ζ(x) G[ H[ζ | · ] | x ], and find equations for the corresponding first two moment measures.

5.5.7 Model for the spread of infection. Take X = R^d, and suppose that any individual infected at x in turn gives rise to infected individuals according to a Poisson process with parameter measure μ(· | x) = μ(· − x | 0) ≡ μ(· − x), where ∫_X μ(du) = ν < 1. Show that the total number N(X | 0) of infected individuals, starting from one individual infected at 0, is finite with probability 1, and that the p.g.fl. H[· | ·] for the entire population of infected individuals satisfies the functional equation

    H[ζ | 0] = ζ(0) exp( − ∫_X (1 − H[ζ | u]) μ(du) ),

where H[ζ | u] = H[T_u ζ | 0] and T_u ζ(v) = ζ(v + u).
Deduce, in particular, the following:
(i) The p.g.f. of N(X | 0) satisfies f(z) ≡ E[z^{N(X | 0)}] = z exp[−ν(1 − f(z))].
(ii) The expectation measure M(· | 0) for the total population of infected individuals, given an initial infected individual at the origin, satisfies

    M(A | 0) = δ_0(A) + ∫_X M(A − u | 0) μ(du) = δ_0(A) + μ(A) + μ^{2*}(A) + · · · .

(iii) The second factorial moment measure M_[2](A × B | 0) of N(· | 0) satisfies

    M_[2](A × B | 0) = M(A | 0) M(B | 0) + ∫_X M_[2](A − u, B − u | 0) μ(du) − δ_0(A) δ_0(B).

(iv) The Fourier transforms for M(· | 0) and M_[2](· | 0) are expressible in terms of μ̃(θ) = ∫_X e^{iθ·x} μ(dx) thus:

    M̃(θ | 0) = ∫_X e^{iθ·x} M(dx | 0) = 1/(1 − μ̃(θ)),

    M̃_[2](θ, φ | 0) = ∫ e^{i(θ·x + φ·y)} M_[2](dx × dy | 0) = [ M̃(θ | 0) M̃(φ | 0) − 1 ] / (1 − μ̃(θ + φ)).
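Parts (i) and (ii) can be checked numerically: for each z, fixed-point iteration solves f(z) = z exp[−ν(1 − f(z))], and a numerical derivative at z = 1 should recover the mean total progeny M(X | 0) = Σ_k ν^k = 1/(1 − ν) from the geometric series in (ii). A Python sketch with ν = 0.5 chosen arbitrarily:

```python
import math

nu = 0.5                     # mean offspring count per individual, nu < 1 (assumed)

def f(z, iters=200):
    # fixed-point iteration for f(z) = z * exp(-nu * (1 - f(z)))
    w = z
    for _ in range(iters):
        w = z * math.exp(-nu * (1.0 - w))
    return w

print(f(1.0))                       # 1.0: extinction is certain when nu < 1
h = 1e-6
print((f(1.0) - f(1.0 - h)) / h)    # close to 1/(1 - nu) = 2.0
```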
5.5.8 Age-dependent branching process. Let X = R, and suppose that an individual born at time u produces offspring according to a Poisson process with parameter measure μ(· | u) = μ(· − u | 0) ≡ μ(· − u) for some boundedly finite measure μ(·) that vanishes on (−∞, 0]. Let G_t[h | 0] denote the p.g.fl. for the ages of individuals present in the population at time t, starting from a single newly born individual at time 0.
(a) Show that G_t satisfies the equation

    G_t[h | 0] = h(t) exp( − ∫_0^t (1 − G_t[h | u]) μ(du) ),

where G_t[h | u] = G_{t−u}[h | 0] for 0 < u < t.
(b) When μ(A) = μ ℓ(A ∩ R_+), with ℓ denoting Lebesgue measure and μ a positive constant, show that

    G_t[h | 0] = h(t) { 1 + μ ∫_0^t [1 − h(u)] e^{μ(t−u)} du }^{−1}.
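The closed form in (b) can be tested against the functional equation in (a). Taking h ≡ z constant, so that the p.g.fl. reduces to the p.g.f. g(t) = G_t[z | 0] of the population size at time t, (b) gives g(t) = z/{1 + (1 − z)(e^{μt} − 1)}, while (a) requires g(t) = z exp{−μ ∫_0^t (1 − g(s)) ds}. A Python sketch with arbitrary values μ = 1, z = 0.6, t = 1, using the midpoint rule for the integral:

```python
import math

mu, z, t = 1.0, 0.6, 1.0

def g(s):
    # the closed form from part (b) with h identically equal to z
    return z / (1.0 + (1.0 - z) * (math.exp(mu * s) - 1.0))

n = 100000                                    # midpoint-rule subintervals
step = t / n
integral = sum((1.0 - g((i + 0.5) * step)) * step for i in range(n))

print(g(t), z * math.exp(-mu * integral))     # the two sides agree (~0.3556)
```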
5.5.9 Equation (5.5.29) expresses Janossy densities in terms of factorial moment densities when R > 2. Investigate whether the relation in Exercise 5.2.4 has an analogue for densities valid when only R > 1.
CHAPTER 6
Models Constructed via Conditioning: Cox, Cluster, and Marked Point Processes
In this chapter, we bring together a number of the most widely used classes of point process models. Their common theme is the generation of the final model by a two-stage construction: first, the generation of an indexed family of processes, and then an operation applied to members of the family to produce the final process. The first two classes (Cox and cluster processes) extend the simple Poisson process in much the same way that the mixed and compound Poisson distributions extend the basic Poisson distribution. Independence plays a central role and leads to elegant results for moment and generating functional relationships. Both processes are used typically in contexts where the realizations are stationary and therefore define infinite collections of points. To deal with these issues, we anticipate the transition from finite to general point processes to be carried out in Chapter 9 and present in Section 6.1 a short review of some key results for more general point processes and random measures. The third class of processes considered in this chapter represents a generalization in a different direction. In many situations, events are characterized by both a location and a weight or other distinguishing attribute. Such processes are already covered formally by the general theory, as they can be represented as a special type of point process on a product space. However, marked point processes are deserving of study in their own right because of their wide range of applications, such as in queueing theory, and their conceptual importance in contexts such as Palm theory (see [MKM] especially).
6.1. Infinite Point Families and Random Measures Although the framework developed for finite point processes in Chapter 5 needs to be extended, it nevertheless contains the essential ingredients of the 157
more general theory. We retain the assumption that the points are located within a complete, separable metric space (c.s.m.s.) X, and will generally interpret X as either R^1 or R^2. The space X^∪ as in (5.3.10) is no longer the appropriate space for defining the realizations; instead we move to a description of the realizations in terms of counting measures, meaning measures whose values on Borel sets are nonnegative integers. The interpretation is that the value of the measure on such a set counts the number of points falling inside that set. A basic assumption, which really defines the extent of current point process theory, is that the measures are boundedly finite: only a finite number of points fall inside any bounded set (i.e. there are no finite accumulation points). In the martingale language of Chapters 7 and 14, this is equivalent to requiring the realizations to be 'nonexplosive'. The space X^∪ is then replaced by the space¹ N_X^# of all boundedly finite counting measures on X. A remarkable feature is that a relatively simple and natural distance between counting measures can be defined and allows N_X^# to be interpreted as a metric space in its own right. It then acquires a natural topology and a natural family of Borel sets B(N_X^#) that can be used to define measures on N_X^#. We shall not give details here but refer to Chapter 9 and Appendix A2.6.
Thus, the way is open to formally introducing a point process on X as a random counting measure on X, meaning technically a measurable mapping from a probability space (Ω, E, P) into the space (N_X^#, B(N_X^#)). Often, the latter space itself is taken as the canonical probability space for a point process on X. Every distinct probability measure on (N_X^#, B(N_X^#)) defines a distinct point process. As in the finite case, specific examples of point processes are commonly specified by their finite-dimensional distributions, or fidi distributions for short. These can no longer be defined globally, as was done through the Janossy measures for a finite point process, but are introduced by specifying consistent joint distributions

    P_k(A_1, . . . , A_k; n_1, . . . , n_k) = Pr{N(A_1) = n_1, . . . , N(A_k) = n_k}    (6.1.1)
for the number of points in finite families of bounded Borel sets. Indeed, this was the way we introduced the Poisson process in Chapter 2. Consistency here combines conditions of two types: first, the usual conditions (analogous to those for any stochastic process) for consistency of marginal distributions and invariance under simultaneous permutation of the sets and the numbers falling into them; second, conditions to ensure that the realizations are almost surely measures, namely that

    N(A ∪ B) = N(A) + N(B)  a.s.    and    N(A_n) → 0  a.s.    (6.1.2)

¹ In this edition, we use M_X^# (and N_X^#) to denote spaces of boundedly finite (counting) measures on X, where in the first edition we used M̂_X (and N̂_X), respectively.
for (respectively) all disjoint Borel sets A, B, and all sequences {A_n} of Borel sets with A_n ↓ ∅. These two conditions reduce to the requirements on the fidi distributions that, for all finite families of disjoint bounded Borel sets (A_1, . . . , A_k),

    Σ_{r=0}^n P_k(A_1, A_2, A_3, . . . , A_k; n − r, r, n_3, . . . , n_k) = P_{k−1}(A_1 ∪ A_2, A_3, . . . , A_k; n, n_3, . . . , n_k),    (6.1.3)

and

    P_1(A_k; 0) → 1    (6.1.4)

for all sequences of bounded Borel sets {A_k} with A_k ↓ ∅. Moreover, for point processes defined on Euclidean spaces, it is enough for these relationships to hold when the sets are bounded intervals.
for all sequences of bounded Borel sets {Ak } with Ak ↓ ∅. Moreover, for point processes defined on Euclidean spaces, it is enough for these relationships to hold when the sets are bounded intervals. Example 6.1(a) Simple Poisson process on R. Recall equation (2.2.1): Pr{N (ai , bi ] = ni , i = 1, . . . , k} =
k [λ(bi − ai )]ni i=1
ni !
e−λ(bi −ai ) .
(6.1.5)
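For these distributions, the additivity requirement (6.1.3) amounts to the convolution identity for independent Poisson counts, which can be checked numerically. A minimal sketch (ours, not from the text; parameter values are illustrative):

```python
from math import exp, factorial

def poisson_pmf(n, mean):
    """P{N = n} for a Poisson count with the given mean."""
    return mean ** n * exp(-mean) / factorial(n)

# Disjoint intervals A1 = (0, 2] and A2 = (2, 5]; rate lambda = 1.3.
lam, len1, len2 = 1.3, 2.0, 3.0
n = 4

# Left-hand side of (6.1.3): sum over the ways of splitting n points
# between A1 and A2.
lhs = sum(poisson_pmf(n - r, lam * len1) * poisson_pmf(r, lam * len2)
          for r in range(n + 1))

# Right-hand side: n points in the union A1 ∪ A2, an interval of length 5.
rhs = poisson_pmf(n, lam * (len1 + len2))

assert abs(lhs - rhs) < 1e-12
```

The check succeeds because the sum of independent Poisson variables is again Poisson, with the means adding.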
Consistency of the marginals means that if one of the variables, say $N(a_1, b_1]$, is integrated out (by summing over $n_1$), the resulting quantity is the joint probability corresponding to the remaining variables. Invariance under permutations of the variables means that if the sets and the numbers of points falling into them are written down in a different order, the resulting probability is not affected. In the present example, both conditions are obvious from the product form of the joint distributions. The additivity requirement (6.1.3) comes from the additivity property of the Poisson distribution: for Poisson random variables $N_1$ and $N_2$ that are independent (as is implied here by the product form of the distributions), their sum again has a Poisson distribution. Finally, (6.1.4) follows from the property $e^{-\delta_n} \to 1$ when $\delta_n \to 0$.

Moment measures, factorial moment measures, and probability generating functionals can be defined as in Sections 5.4 and 5.5. The main differences are that in defining the moment measures we should restrict ourselves to bounded sets, and that in defining the p.g.fl. we should confine ourselves to functions $h$ in $\mathcal{V}(\mathcal{X})$, the space of nonnegative measurable functions bounded by unity and such that $1 - h(x)$ vanishes outside some bounded set. Within these constraints, the relations between generating functionals, moment measures, and all the various quantities derived from these in Chapter 5 hold much as they did there. A more detailed account, examining existence and convergence conditions, is given in Chapter 9.

For many of the examples that we consider, the point processes will be defined on a Euclidean space and stationary, meaning that their fidi distributions are invariant under simultaneous shifts of their arguments: writing
6. Models Constructed via Conditioning
$A + u = \{x + u : x \in A\}$, stationarity means that, for all real $u$,
$$P_k(A_1, \ldots, A_k;\, n_1, \ldots, n_k) = P_k(A_1 + u, \ldots, A_k + u;\, n_1, \ldots, n_k). \tag{6.1.6}$$

The full consequences of this assumption are quite profound (see the foretaste in Chapter 3), but for the present it is enough to note the following.

Proposition 6.1.I (Stationarity Properties).
(i) A point process with p.g.fl. $G[h]$ is stationary if and only if, for all real $u$, $G[S_u h] = G[h]$, where $(S_u h)(x) = h(x - u)$.
(ii) If a point process is stationary and the first-moment measure $M_1$ exists, then $M_1$ reduces to a multiple of the uniform measure (Lebesgue measure), $M_1(dx) = m\,\ell(dx) = m\,dx$, say.
(iii) If a point process is stationary and the second-moment measure $M_2$ exists, then $M_2$ reduces to the product of a Lebesgue component along the diagonal $x = y$ and a reduced component², $\breve{M}_2(du)$ say, where $u = x - y$, orthogonal to the diagonal.

Proof. The fidi distributions as above are determined by the p.g.fl. and can be evaluated by taking $h$ to be a sum of simple functions on disjoint sets; conversely, the fidi distributions determine the p.g.fl., which has the shift-invariance property under stationarity. Property (ii) can be proved from Cauchy's functional equation (see Section 3.6), while property (iii) is the measure analogue of the familiar fact that the covariance function of a stationary time series is a function of the difference of the arguments only: $c(x, y) = \breve{c}(x - y)$.

Similar expressions for the moment densities follow from property (iii) whenever the moment measures have densities, but in general they have a singular component along the diagonal $x = y$, which reappears as an atom at the origin in the reduced measure $\breve{M}_2(\cdot)$ (see also Section 8.1). General routes to these reduced measures are provided by the factorization theorems in Section A2.7 or by the disintegration theory outlined in Section A1.4 (see Chapter 8 for further discussion and examples).
Estimation of these reduced moment measures and their Fourier transforms (spectral measures) is a key issue in the statistical analysis of point process data; it is taken further in Chapter 8 and in more detail in Chapter 12.

We shall also need the idea of a random measure, so we note some elementary properties. The general theory of random measures is so closely interwoven with point process theory that the two can hardly be separated. Point processes are indeed only a special class (the integer-valued one) of the former,

[² In this edition, we use $\breve{M}_2(\cdot)$ and $\breve{C}_2(\cdot)$ to denote reduced second moment and covariance measures (and $\breve{m}$ and $\breve{c}$ for their densities), where in the first edition we wrote $\widetilde{M}_2(\cdot)$, $\widetilde{C}(\cdot)$, etc.]
and much of the general theory runs in parallel for both cases, a fact exploited more systematically in Chapter 9. Here we provide just sufficient background to handle some simple applications.

The formal definition of a random measure $\xi(\cdot)$ proceeds much as in the discussion for point processes given above. Once again, the realizations $\xi(\cdot)$ are required to be a.s. boundedly finite and countably additive, and their distributional properties are completely specified by their finite-dimensional distributions. Since the values of the measure are no longer integer-valued in general (although still nonnegative), these take the more general form
$$F_k(A_1, \ldots, A_k;\, x_1, \ldots, x_k) = \Pr\{\xi(A_i) \le x_i,\ i = 1, \ldots, k\}. \tag{6.1.7}$$
The moment measures are defined as for point processes, although the special role played by the factorial moment measures is not sustained, particularly when the realizations are continuous. In place of the p.g.fl., the most useful transform is the Laplace functional, defined for $f \in \mathrm{BM}_+(\mathcal{X})$, the space of all nonnegative $f \in \mathrm{BM}(\mathcal{X})$, by
$$L[f] \equiv L_\xi[f] = \mathrm{E}\left[\exp\left(-\int_{\mathcal{X}} f(x)\,\xi(dx)\right)\right]. \tag{6.1.8}$$
[We sometimes write $L_\xi$ as a reminder of the random measure $\xi$ to which the Laplace functional $L$ relates, and $\int f\,d\xi$ as shorthand for the integral in (6.1.8).] Of course, the Laplace functional can also be defined for point processes and is therefore the natural tool when both are discussed together. Although $L_\xi$ defines (the fidi distributions of) a random measure $\xi$ uniquely, via appropriate inversion theorems, there is no easy counterpart to the expansion of the p.g.fl. about the zero function as in equations (5.5.3). There is, however, a Taylor series expansion for the Laplace functional about $f \equiv 0$, corresponding to the p.g.fl. expansion about $h \equiv 1$. It takes the form
$$L[sf] = 1 - s\int_{\mathcal{X}} f(x)\,M_1(dx) + \frac{s^2}{2!}\int_{\mathcal{X}^{(2)}} f(x_1)f(x_2)\,M_2(dx_1 \times dx_2) - \cdots + \frac{(-s)^r}{r!}\int_{\mathcal{X}^{(r)}} f(x_1)\cdots f(x_r)\,M_r(dx_1 \times \cdots \times dx_r) + \cdots. \tag{6.1.9}$$
This expression is just the expectation of the expansion of the ordinary Laplace transform of the linear functional $Y = \int_{\mathcal{X}} f(x)\,\xi(dx)$. Its validity depends first on the existence of all moments of the random measure $\xi$, and second on convergence, typically in a disk around the origin $s = 0$ with radius determined by the length of the largest interval $(0, r)$ within which the Laplace transform is analytic. Finite Taylor series expansions, when just a limited number of moment measures exist, are possible for imaginary values of $s$, corresponding to the use of the characteristic functional, and are set out in Chapter 9.
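As a one-dimensional illustration (ours, not the text's): take $f = I_A$, so that $\int f\,d\xi = \xi(A)$. If $\xi(A)$ has a gamma distribution with shape $a$ and scale $\lambda$ (as for the gamma random measures of Example 6.1(b) below), then $L[sf] = (1 + \lambda s)^{-a}$, and the partial sums of (6.1.9) reduce to the moment expansion of this Laplace transform:

```python
from math import factorial, prod

lam, a, s = 0.5, 2.0, 0.1   # scale, shape, small expansion point

def raw_moment(r):
    """E[xi(A)^r] for a Gamma(shape=a, scale=lam) random mass:
    lam^r * a*(a+1)*...*(a+r-1)."""
    return lam ** r * prod(a + j for j in range(r))

exact = (1.0 + lam * s) ** (-a)        # the Laplace transform at s
partial = sum((-s) ** r / factorial(r) * raw_moment(r) for r in range(12))

assert abs(exact - partial) < 1e-9
```

Convergence here requires $|\lambda s| < 1$, in line with the remark about the radius of convergence.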
Example 6.1(b) Gamma random measures (stationary case). Suppose that the random variables $\xi(A_i)$ in (6.1.7) are independent for disjoint Borel sets $A_i$ in $\mathbb{R}^d$ and have the gamma distributions with Laplace–Stieltjes transforms
$$\mathrm{E}\big(e^{-s\xi(A_i)}\big) = \psi(A_i, s) = (1 + \lambda s)^{-\alpha\,\ell(A_i)} \qquad (\lambda > 0,\ \alpha > 0,\ \operatorname{Re}(s) \ge 0), \tag{6.1.10}$$
where $\ell(\cdot)$ denotes Lebesgue measure. By inspection, $\psi(A_i, s) \to 1$ as $s \to 0$, showing that $\xi(A)$ is a.s. finite for any fixed bounded set $A$. Then, since $\mathcal{X}$ is separable, it can be represented as a denumerable union $\bigcup A_i$ of such sets, and
$$\Pr\{\text{at least one } \xi(A_i) \text{ is infinite}\} \le \sum_{i=1}^{\infty} \Pr\{\xi(A_i) = \infty\} = 0.$$
As in the case of a Poisson process, additivity of $\xi$ is a consequence of independence and the additivity property of the gamma distribution. Also, $\psi(A_i, s) \to 1$ as $\ell(A_i) \to 0$, implying the equivalent of (6.1.4), which guarantees countable additivity for $\xi$ and is equivalent to stochastic continuity of the cumulative process $\xi((0, t])$ when the process is on $\mathbb{R}^1$. The Laplace functional of $\xi$ can be found by extending (6.1.10) to the case where $f$ is a linear combination of indicator functions and generalizing: it takes the form
$$L[f] = \exp\left(-\alpha \int_{\mathcal{X}} \log[1 + \lambda f(x)]\,\ell(dx)\right).$$
Expanding this expression as in (6.1.9) and examining the first and second coefficients, we find
$$\mathrm{E}[\xi(dx)] = \lambda\alpha\,\ell(dx), \qquad \mathrm{E}[\xi(dx)\,\xi(dy)] = \lambda^2\alpha^2\,\ell(dx)\,\ell(dy) + \delta(x - y)\,\lambda^2\alpha\,\ell(dx). \tag{6.1.11}$$
Thus, the covariance measure for $\xi(\cdot)$ vanishes except for the diagonal component along $x = y$; equivalently, the reduced covariance measure is just an atom of mass $\lambda^2\alpha$ at the origin. These features are consequences of the independence of the increments and the purely atomic nature of the sample paths $\xi(\cdot)$, equivalent when $\mathcal{X} = \mathbb{R}^1$ to the pure jump character of the cumulative process (see Section 8.3 for further discussion). From these results, we can also confirm the expressions for the moments that follow directly from (6.1.10), namely
$$\mathrm{E}\,\xi(A) = \lambda\alpha\,\ell(A) \qquad \text{and} \qquad \operatorname{var}\xi(A) = \lambda^2\alpha\,\ell(A).$$
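Because the increments over disjoint sets are independent gamma variables, the measure is easy to simulate on a grid. The following sketch (ours, not the text's; illustrative parameters) checks the mean and variance formulae by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(42)
lam, alpha = 2.0, 1.5          # scale and shape-density parameters
n_cells, cell_len = 10, 0.5    # partition A into 10 cells of length 0.5
n_reps = 200_000

# Independent gamma increments: shape alpha * ell(cell), scale lam.
incr = rng.gamma(shape=alpha * cell_len, scale=lam, size=(n_reps, n_cells))
xi_A = incr.sum(axis=1)        # xi(A), with ell(A) = 5.0, by additivity

ell_A = n_cells * cell_len
assert abs(xi_A.mean() - lam * alpha * ell_A) < 0.1        # ~ 15.0
assert abs(xi_A.var() - lam ** 2 * alpha * ell_A) < 1.0    # ~ 30.0
```

The additivity property of the gamma distribution is what makes the partition into cells immaterial: any refinement yields the same distribution for $\xi(A)$.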
Exercise 6.1.1 gives a more general version of a gamma random measure.

Example 6.1(c) Quadratic random measure. Let $Z(t)$ be a Gaussian process with a.s. continuous trajectories, and consider, for any Borel set $A$, the set function
$$\xi(A) = \int_A Z^2(u)\,du.$$
Since $Z$ is a.s. continuous, so is $Z^2$, so the integral is a.s. well defined and is additive on disjoint sets. In particular, when $Z$ has zero mean, each value $Z^2(t)$ is proportional to a chi-squared random variable, so $\xi(A)$ for suitably 'small' sets $A$ is also approximately a chi-squared r.v. Generally, $\xi(A)$ can be defined (being an integral) as a limit of linear combinations of $Z^2(t_i)$ for points $t_i$ that become dense in $A$, and this is quadratic in the $Z$, hence the name. The random measure properties of $\xi$ are discussed in more detail in Chapter 9. See Exercise 6.1.3 for the first two moments of $\xi$.

The next example has a long history. It was originally introduced in early work by Campbell (1909) to describe the properties of thermionic noise in vacuum tubes. Moran (1968, pp. 417–423) gives further details and references. In his work, Campbell developed formulae for the moments, such as
$$\mathrm{E}\left[\int g(x)\,N(dx)\right] = \int g(x)\,M(dx),$$
which led Matthes et al. (1978) to adopt the term Campbell measure for the concept that underlies their treatment of moments and Palm distributions (see also Chapter 13). Since that time, the ideas have appeared repeatedly in applications [see e.g. Vere-Jones and Davies (1966), where the model is referred to as a 'trigger process' and used to describe earthquake clustering]. Here we introduce it as a prelude to the major theme of this chapter. It is, like the other models in the chapter, a two-stage model, of which we consider here only the first stage.

Example 6.1(d) Intensity of a shot-noise process. A model for a shot-noise process is that the observations are those of a Poisson point process with a random intensity $\lambda(\cdot)$ having the following structure.
A stochastic process $\lambda(t)$ is formed as a filtered version of a simple stationary Poisson process $N(\cdot)$ on $\mathbb{R}$ at rate $\nu$ with typical realization $\{t_i\}$, the filtering being effected by (1) a nonnegative function $g$ that integrates to unity and vanishes on $(-\infty, 0]$, and (2) random 'multiplier' effects $\{Y_i\}$, a series of i.i.d. nonnegative random variables with common distribution $F(\cdot)$. We then define $\lambda(t)$ by
$$\lambda(t) = \sum_{i:\, t_i < t} Y_i\, g(t - t_i) = \int_{-\infty}^{t} Y(u)\, g(t - u)\, N(du), \tag{6.1.12}$$
where $Y(u)$ is a (fictitious) process of i.i.d. variables with distribution $F$. Since $\lambda(t)$, when finite, is stationary in $t$ and is measurable, it is locally integrable; indeed, since its summands are nonnegative, if it has finite expectation it must be finite a.s. For Borel sets $A$, the integral
$$\xi(A) \equiv \int_A \lambda(u)\,du = \sum_i Y_i \int_{A - t_i} g(u)\,du$$
is then well defined, though possibly infinite (see Exercise 6.1.4).
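A realization of $\lambda(t)$ is easy to simulate. The sketch below (ours, not the text's) takes the one-sided exponential filter $g(u) = \rho e^{-\rho u}$ for $u > 0$, which integrates to unity and vanishes on $(-\infty, 0]$, together with exponential marks, and checks that the time average of $\lambda$ is close to $\mathrm{E}\,\lambda(t) = \nu\mu_1$:

```python
import numpy as np

rng = np.random.default_rng(1)
nu, rho, mu1 = 3.0, 2.0, 1.0   # Poisson rate, filter decay rate, E[Y]
T = 2000.0                     # long window; evaluate away from the edges

# Stationary Poisson points on (0, T] with i.i.d. exponential marks.
n_pts = rng.poisson(nu * T)
t_i = np.sort(rng.uniform(0.0, T, n_pts))
Y_i = rng.exponential(mu1, n_pts)

def g(u):
    """One-sided exponential filter: zero on (-inf, 0], integrates to 1."""
    u = np.asarray(u, dtype=float)
    return np.where(u > 0, rho * np.exp(-rho * np.clip(u, 0.0, None)), 0.0)

def lam(t):
    """Shot-noise intensity (6.1.12) at time t."""
    return float(np.sum(Y_i * g(t - t_i)))

grid = np.linspace(100.0, T - 100.0, 4000)
avg = np.mean([lam(t) for t in grid])
assert abs(avg - nu * mu1) < 0.5   # time average near nu * mu1 = 3.0
```

The ergodic time average stands in for the ensemble mean here; the clipping inside `g` simply avoids overflow in the exponential for large negative arguments.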
The Laplace functional of $\xi$ can be evaluated as follows. We require
$$L[f] = \mathrm{E}\left[\exp\left(-\int_{\mathbb{R}} f(u)\,\lambda(u)\,du\right)\right].$$
Now, from (6.1.12), the integral can be written as a sum of terms
$$\int_{\mathbb{R}} f(u)\,\lambda(u)\,du = \sum_i Y_i \int_{\mathbb{R}} f(u)\,g(u - t_i)\,du \equiv \sum_i Z_i, \quad \text{say}.$$
If the points $t_i$ are treated as given (i.e. fixed), then the $Z_i$ are independent and, with $\phi(\cdot)$ denoting the common Laplace–Stieltjes transform of the $Y_i$, $Z_i = Y_i \int_{\mathbb{R}} f(u)\,g(u - t_i)\,du$ has the transform
$$\mathrm{E}(e^{-Z_i}) = \mathrm{E}\left[\exp\left(-Y_i \int_{\mathbb{R}} f(u)\,g(u - t_i)\,du\right)\right] = \phi\left(\int_{\mathbb{R}} f(u)\,g(u - t_i)\,du\right) \equiv \zeta(t_i), \quad \text{say},$$
which lies in $(0, 1]$ because $f$, $g$ and the $Y_i$ are all nonnegative. Proceeding formally, the last three equations give us
$$L[f] = \mathrm{E}\left[\prod_{t_i \in N} \zeta(t_i)\right] = G_N[\zeta], \quad \text{by definition of a p.g.fl.,}$$
$$= \exp\left(\nu \int_{\mathbb{R}} [\zeta(t) - 1]\,dt\right), \quad \text{since } G_N \text{ is the p.g.fl. of a Poisson process,}$$
$$= \exp\left(\nu \int_{\mathbb{R}} \left[\phi\left(\int_{\mathbb{R}} f(u)\,g(u - t)\,du\right) - 1\right] dt\right).$$
It is clear from the random measure analogue of Proposition 6.1.I that the random measure $\xi(\cdot)$ here is stationary (we can easily check that $L[S_u f] = L[f]$). With a view to applying the expansion (6.1.9), we find after some manipulation that $L[f] - 1$ equals
$$\nu \int \left[-\mu_1 \int f(u)\,g(u - t)\,du + \tfrac{1}{2}\mu_2 \int f(u)\,g(u - t)\,du \int f(v)\,g(v - t)\,dv - \cdots\right] dt + \tfrac{1}{2}\nu^2 \mu_1^2 \iint \left[\int f(u)\,g(u - t)\,du \int f(v)\,g(v - s)\,dv\right] dt\,ds + \cdots,$$
where $\mu_j = \mathrm{E}(Y^j)$ for $j = 1, 2$. Collecting terms, identifying the measures associated with the first and second powers of $f(\cdot)$, and recalling that $\int_{-\infty}^{\infty} g(u)\,du = 1$ and $g(u) = 0$ for $u < 0$, we obtain
$$M_1(dt) = \nu\mu_1\,dt, \qquad M_2(ds \times dt) = \left[\nu^2\mu_1^2 + \nu\mu_2 \int_{-\infty}^{\min(s,t)} g(s - u)\,g(t - u)\,du\right] ds\,dt,$$
so that $M_1$ has constant density $\nu\mu_1$ and $M_2$ has the density
$$m(s, t) = \breve{m}_2(v) = \nu^2\mu_1^2 + \nu\mu_2 \int_0^{\infty} g(y)\,g(y + |v|)\,dy, \qquad \text{where } v = s - t.$$
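For a concrete check (ours, not the text's): with the exponential filter $g(u) = \rho e^{-\rho u}$ on $u > 0$ and exponential marks (so $\mu_2 = 2\mu_1^2$), the integral in the density evaluates to $(\rho/2)e^{-\rho|v|}$, and the formula can be compared with a long-run time average of $\lambda(t)\lambda(t+v)$:

```python
import numpy as np

rng = np.random.default_rng(7)
nu, rho, mu1 = 2.0, 1.5, 1.0
mu2 = 2.0 * mu1 ** 2          # E[Y^2] when the marks are exponential
T = 4000.0

n_pts = rng.poisson(nu * T)
t_i = rng.uniform(0.0, T, n_pts)
Y_i = rng.exponential(mu1, n_pts)

def lam(t):
    """Shot-noise intensity with g(u) = rho * exp(-rho u) for u > 0."""
    u = t - t_i
    w = np.where(u > 0, np.exp(-rho * np.clip(u, 0.0, None)), 0.0)
    return float(np.sum(Y_i * rho * w))

v = 0.4
grid = np.linspace(50.0, T - 50.0, 12000)
emp = np.mean([lam(t) * lam(t + v) for t in grid])

# int_0^inf g(y) g(y + v) dy = (rho/2) exp(-rho v) for this filter, so
# m2(v) = nu^2 mu1^2 + nu mu2 (rho/2) exp(-rho v).
theory = nu ** 2 * mu1 ** 2 + nu * mu2 * (rho / 2.0) * np.exp(-rho * v)
assert abs(emp - theory) < 1.0
```

The time average again substitutes for the ensemble second moment via stationarity.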
The fact that $M_2$ is absolutely continuous stems from the absolute continuity of the trajectories. The appearance of the reduced density $\breve{m}_2$ here is characteristic of the stationary form of the moment measures (see Proposition 8.1.I and onward).

While these arguments appear intuitively reasonable, to make them rigorous we must check two further points. First, we must establish that the random measure $\xi$ is well defined in the sense that, despite the infinite sums in the definition, the realizations are a.s. boundedly finite; see Exercise 6.1.4. Second, the implicit conditioning step, consisting here of being given a realization $\{t_i\}$ of the Poisson process and then taking expectations over such realizations, needs to be justified. In a more general context, this task hinges on the technical concept of measurability; it is the subject of the next proposition and appears repeatedly in this and later chapters.

As in Example 6.1(d), the models considered in this chapter are defined in two steps: first, an initial process is laid down, and then a secondary process is defined, with distributions conditional on the realization of the initial process. The existence and other properties of such processes depend on extensions of standard theorems concerning the structure of bivariate distributions. Because a realization of a point process (or indeed a more general random measure) can be thought of as a point in a metric space, the same basic apparatus is available for describing the distributions conditional on the realization of a random measure as for dealing with bivariate distributions in $\mathbb{R}^2$. A general discussion of conditions for a bivariate random system in which each component takes its values in a c.s.m.s. is given in Proposition A1.5.II. To apply the concepts in a point process context, the key idea we utilize is that of a measurable family of point processes or random measures.
Suppose there is given a family $\{N(\cdot \mid y) : y \in \mathcal{Y}\}$ of point processes taking their values in the c.s.m.s. $\mathcal{X}$ and indexed by the elements $y$ of the c.s.m.s. $\mathcal{Y}$. This family forms a measurable family if, for each set $A$ in $\mathcal{B}(\mathcal{N}^{\#}_{\mathcal{X}})$, the function $\mathcal{P}(A \mid y)$ is $\mathcal{B}(\mathcal{Y})$-measurable, where
$$\mathcal{P}(A \mid y) = \Pr\{N(\cdot \mid y) \in A\}. \tag{6.1.13}$$
As in Proposition A1.5.II, we average across a measurable family of point processes to form a new point process as a mixture of the originals.

Proposition 6.1.II. Suppose there are given (a) a measurable family of point processes $\mathcal{P}(A \mid y)$, defined on the c.s.m.s. $\mathcal{X}$ and indexed by elements of $\mathcal{Y}$, and (b) a $\mathcal{Y}$-valued random variable $Y$ with distribution $\Pi$ on $\mathcal{B}(\mathcal{Y})$. Then the integrals
$$\mathcal{P}(A) = \mathrm{E}[\mathcal{P}(A \mid Y)] = \int_{\mathcal{Y}} \mathcal{P}(A \mid y)\,\Pi(dy) \tag{6.1.14}$$
define a probability measure $\mathcal{P}$ on $\mathcal{B}(\mathcal{N}^{\#}_{\mathcal{X}})$ and hence a point process on $\mathcal{X}$.

Corresponding concepts can readily be defined for random measures and are set out in Exercise 6.1.5.
The next lemma gives simple sufficient conditions for checking whether an indexed family of point processes forms a measurable family.

Lemma 6.1.III. Each of the following conditions is necessary and sufficient for an indexed family of point processes on a Euclidean space to be a measurable family:
(a) for all choices of positive integer $k$, finite unions of disjoint intervals $(B_1, \ldots, B_k)$, and nonnegative integers $(n_1, \ldots, n_k)$, the fidi probabilities $P_k(B_1, \ldots, B_k;\, n_1, \ldots, n_k \mid y)$ are $\mathcal{B}(\mathcal{Y})$-measurable functions of $y$;
(b) for all functions $h$ in the space $\mathcal{V}(\mathcal{X})$, the p.g.fl. $G[h \mid y]$ is a $\mathcal{B}(\mathcal{Y})$-measurable function of $y$.

Proof. Denote by $\mathcal{A}$ the class of subsets $A$ of $\mathcal{N}^{\#}_{\mathcal{X}}$ for which $\mathcal{P}(A \mid y)$ is measurable in $y$ with respect to $\mathcal{B}(\mathcal{Y})$. If (a) holds, then $\mathcal{A}$ contains the cylinder sets used in defining the fidi probabilities. It follows from the closure properties of families of measurable functions (see Appendix A1.4) that the class $\mathcal{A}$ is closed under monotone limits and therefore contains the σ-field of subsets of $\mathcal{N}^{\#}_{\mathcal{X}}$ generated by the cylinder sets; that is, $\mathcal{A} \supseteq \mathcal{B}(\mathcal{N}^{\#}_{\mathcal{X}})$. Hence the given family of point processes forms a measurable family. If, alternatively, (b) holds, then by taking $h$ to be a linear combination of indicator functions and differentiating, we can recover the fidi distributions. Differentiation and the other operations involved preserve measurability, so the result follows from (a). The necessity of (a) is obvious, and that of (b) follows on observing that $G[h \mid y]$ for a general $h \in \mathcal{V}(\mathcal{X})$ can be obtained from the case where $h$ is a linear combination of indicator functions by operations that preserve the measurability in $y$.

We can immediately apply this lemma to give sufficient conditions that are simpler to check than those of Proposition 6.1.II.

Corollary 6.1.IV. Suppose there are given a $\mathcal{Y}$-valued random variable $Y$ with distribution $\Pi$ on $\mathcal{B}(\mathcal{Y})$ and either
(a) a family of fidi probabilities $P_k(B_1, \ldots, B_k;\, n_1, \ldots, n_k \mid y)$ satisfying condition (a) of Lemma 6.1.III, or
(b) a family of p.g.fl.s $G[h \mid y]$ satisfying condition (b) of Lemma 6.1.III.
In each case, there exists a well-defined point process on $\mathcal{X}$ for which, in case (a), the fidi probabilities are given by
$$P_k(B_1, \ldots, B_k;\, n_1, \ldots, n_k) = \mathrm{E}\big[P_k(B_1, \ldots, B_k;\, n_1, \ldots, n_k \mid Y)\big] = \int_{\mathcal{Y}} P_k(B_1, \ldots, B_k;\, n_1, \ldots, n_k \mid y)\,\Pi(dy) \tag{6.1.15a}$$
and, in case (b), the p.g.fl. is given by
$$G[h] = \mathrm{E}\big[G[h \mid Y]\big] = \int_{\mathcal{Y}} G[h \mid y]\,\Pi(dy). \tag{6.1.15b}$$
The following is perhaps the simplest example to which these ideas apply; their applications will be explored more systematically in the next two sections.
Example 6.1(e) Mixed Poisson process. Take the distributions (6.1.5) as a candidate for a measurable family, with the role of $y$ played by $\lambda$ and that of $\mathcal{Y}$ played by the half-line $\mathbb{R}_+ = [0, \infty)$. For a fixed set of half-open intervals, the function (6.1.5) is a continuous, hence measurable, function of $\lambda$, so condition (a) of Lemma 6.1.III is satisfied. Thus, the simple Poisson processes form a measurable family with respect to the real variable $\lambda$. Consequently, we can mix (average) them with respect to a distribution $\Pi$ for $\lambda$ to obtain the fidi distributions of a new point process. If, for example, $\Pi$ is the exponential distribution with density $\mu e^{-\mu\lambda}\,d\lambda$, then the number of points falling into any given set $A$ has the geometric distribution $p_n = p\,q^n$ with parameter $p = \mu/(\mu + |A|)$, $q = 1 - p$. Moreover, the locations of the points in $A$, given the number of events in $A$, are uniformly distributed over $A$.

Alternatively, we could work from the p.g.fl. of the Poisson process, namely $G[h] = \exp\left(-\lambda\int [1 - h(u)]\,du\right)$, and take expectations over $\lambda$ using condition (b) of the lemma and Corollary 6.1.IV. The resultant process has p.g.fl.
$$G[h] = \int_0^{\infty} \exp\left(-\lambda \int [1 - h(u)]\,du\right) \Pi(d\lambda) = \Pi^*\left(\int [1 - h(u)]\,du\right), \tag{6.1.16}$$
where $\Pi^*(\theta) = \mathrm{E}(e^{-\theta Y})$ is the Laplace–Stieltjes transform of an r.v. $Y$ with distribution $\Pi$. In particular, when $\Pi$ is exponential with mean $1/\mu$, the p.g.fl. reduces to
$$G[h] = \frac{\mu}{\mu + \int [1 - h(u)]\,du}.$$
This reduces to the p.g.f.
$$\frac{\mu}{\mu + |A|(1 - z)}$$
of the geometric distribution described above when we set $h(u) = 1 - (1 - z)I_A(u)$.
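The geometric form of the counts is easy to confirm by two-stage simulation. A sketch (ours, not the text's; illustrative parameters):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, A_len = 2.0, 3.0           # exponential mixing parameter and |A|
n_reps = 200_000

# Stage 1: lambda ~ exponential with density mu * exp(-mu * lam).
lam = rng.exponential(1.0 / mu, n_reps)
# Stage 2: N(A) | lambda ~ Poisson(lambda * |A|).
counts = rng.poisson(lam * A_len)

p = mu / (mu + A_len)          # = 0.4 here
q = 1.0 - p
for n in range(5):
    emp = float(np.mean(counts == n))
    assert abs(emp - p * q ** n) < 0.01   # geometric p_n = p * q^n
```

The same two-stage recipe implements the mixing in Proposition 6.1.II for any distribution $\Pi$ in place of the exponential.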
Exercises and Complements to Section 6.1

6.1.1 A general gamma random measure on the c.s.m.s. $\mathcal{X}$ can be constructed as a process with independent nonnegative increments for which the increment $\xi(A)$ on the bounded Borel set $A$ has a gamma distribution with Laplace transform $\mathrm{E}(e^{-s\xi(A)}) = (1 + \lambda s)^{-\alpha(A)}$, where the scale parameter $\lambda$ is finite and positive and the shape parameter measure $\alpha(\cdot)$ is a boundedly finite measure on $\mathcal{B}_{\mathcal{X}}$.
(a) Verify that these marginal distributions, coupled with the independent increment property, lead to a well-defined random measure.
(b) In the case $\mathcal{X} = \mathbb{R}$, show that $\xi(\cdot)$ may be regarded as the increments of an underlying nondecreasing stochastic process $X(t)$, which with positive probability is discontinuous at $t$ if and only if $\alpha(\{t\}) > 0$.
(c) Show that $\xi$ has as its Laplace functional
$$L[f] = \exp\left(-\int_{\mathcal{X}} \log(1 + \lambda f(x))\,\alpha(dx)\right) \qquad (f \in \mathrm{BM}_+(\mathcal{X})).$$
[Hint: See Chapter 9 for more detail, especially parts (b) and (c).]
6.1.2 Stable random measure. Consider a random measure $\xi$ for which $\mathrm{E}(e^{-s\xi(A)}) = [\exp(-s^\alpha)]^{\Lambda(A)} = \exp(-\Lambda(A)\,s^\alpha)$ for some fixed measure $\Lambda(\cdot)$, and that has independence properties as in Example 6.1(a). Verify that for $0 < \alpha < 1$, there is a well-defined random measure with marginal distributions as stated.

6.1.3 Let $\xi$ be the quadratic random measure of Example 6.1(c) in which the Gaussian process $Z$ is stationary with zero mean, variance $\sigma^2$, and $\operatorname{cov}(Z(s), Z(t)) = c(s - t)$. Show that for bounded Borel sets $A$ and $B$,
$$\mathrm{E}[\xi(A)] = \sigma^2 \ell(A), \qquad \operatorname{cov}(\xi(A), \xi(B)) = 2\int_A \int_B c^2(u - t)\,du\,dt.$$
6.1.4 Random measure and shot noise. Denote by $\{x_j\}$ the points of a stationary Poisson process on $\mathbb{R}$ with rate parameter $\nu$, and let $\{Y_j : j = 0, \pm 1, \ldots\}$ denote a sequence of i.i.d. r.v.s independent of $\{x_j\}$. Let the function $g$ be as in Example 6.1(d). Investigate conditions under which the formally defined process
$$Y(t) = \sum_{x_j \le t} Y_j\, g(t - x_j)$$
is indeed well defined (e.g. by demanding that the series be absolutely convergent a.s.). Show that sufficient conditions are that (a) $\mathrm{E}|Y| < \infty$, or else (b) $g(\cdot)$ is nonincreasing on $\mathbb{R}_+$ and there is an increasing nonnegative function $\tilde{g}(\cdot)$ with $\tilde{g}(t) \to \infty$ as $t \to \infty$ such that $\int_0^{\infty} \tilde{g}(t)\,g(t)\,dt < \infty$ and whose inverse $\tilde{g}^{-1}(\cdot)$ satisfies $\mathrm{E}[\tilde{g}^{-1}(|Y|)] < \infty$ [see also Daley (1981)].

6.1.5 Write down conditions, analogous to (6.1.13), for a measurable family of random measures, and establish the analogue of Proposition 6.1.II for random measures. Frame sufficient conditions for the existence of a two-stage process similar to those in Lemma 6.1.III and Corollary 6.1.IV but using the Laplace functional in place of the p.g.fl.

6.1.6 Let $\xi$ be a random measure on $\mathcal{X} = \mathbb{R}^d$. For a nonnegative bounded measurable function $g$, define $G(A) = \int_A g(x)\,\ell(dx)$ $(A \in \mathcal{B}_{\mathcal{X}})$, where $\ell$ denotes Lebesgue measure on $\mathbb{R}^d$, and
$$\eta(A) = \int_{\mathcal{X}} G(A - x)\,\xi(dx).$$
(a) Show that $\eta(A)$ is an a.s. finite-valued r.v. for bounded $A \in \mathcal{B}_{\mathcal{X}}$ and that it is a.s. countably additive on $\mathcal{B}_{\mathcal{X}}$. The existence theorems in Chapter 9 can then be invoked to show that $\eta$ is a well-defined random measure.
(b) Show that if $\xi$ has moment measures up to order $k$, so does $\eta$, and find the relation between them. Verify that the $k$th moment measure of $\eta$ is absolutely continuous with respect to Lebesgue measure on $(\mathbb{R}^d)^{(k)}$.
(c) Denoting the characteristic functionals of $\xi$ and $\eta$ by $\Phi_\xi[\cdot]$ and $\Phi_\eta[\cdot]$, show that, for $f \in \mathrm{BM}_+(\mathcal{X})$,
$$h(x) = \int_{\mathcal{X}} f(y)\,g(y - x)\,dy$$
is also in $\mathrm{BM}_+(\mathcal{X})$, and $\Phi_\eta[f] = \Phi_\xi[h]$.
6.1.7 (Continuation). By its very definition, $\eta$ is a.s. absolutely continuous with respect to Lebesgue measure, and when $\xi$ is completely random, its density
$$Y(t) \equiv \int_{\mathcal{X}} g(t - x)\,\xi(dx)$$
is called a linear process. [The shot-noise process noted in (6.1.12) is an example; for other references, see e.g. Westcott (1970).] Find the characteristic functional of $Y$ when $\xi$ is a stationary gamma random measure.
6.2. Cox (Doubly Stochastic Poisson) Processes

The doubly stochastic Poisson process, or more briefly the Cox process, so named in recognition of its appearance in a seminal paper of Cox (1955), is obtained by randomizing the parameter measure in a Poisson process. It is thus a direct generalization of the mixed Poisson process of Example 6.1(e). We first give a definition, then discuss the consequences of the structural features it incorporates, and finally in Proposition 6.2.II give a more mathematical definition together with a list of properties.

Definition 6.2.I. Let $\xi$ be a random measure on $\mathcal{X}$. A point process $N$ on $\mathcal{X}$ is a Cox process directed by $\xi$ when, conditional on $\xi$, the realizations of $N$ are those of a Poisson process $N(\cdot \mid \xi)$ on $\mathcal{X}$ with parameter measure $\xi$.

We must check that such a process is indeed well defined. The probabilities in the Poisson process $N(\cdot \mid \xi)$ are readily seen to be measurable functions of $\xi$; for example, $P(A; n) = [\xi(A)]^n e^{-\xi(A)}/n!$ is a measurable function of $\xi(A)$, which in turn is a measurable function of $\xi$ as an element of the metric space $\mathcal{M}^{\#}_{\mathcal{X}}$ of boundedly finite measures on $\mathcal{X}$; hence, we can apply Corollary 6.1.IV(a) and take expectations with respect to the distribution of $\xi$ to obtain a well-defined 'mixed' point process on $\mathcal{X}$. The finite-dimensional (i.e. fidi) distributions are easily obtained in terms of the distributions of the underlying directing measure $\xi$ and are all of mixed Poisson type. Thus, for example,
$$P(A; k) = \Pr\{N(A) = k\} = \mathrm{E}\left[\frac{[\xi(A)]^k}{k!}\,e^{-\xi(A)}\right] = \int_0^{\infty} \frac{x^k}{k!}\,e^{-x}\,F_A(dx), \tag{6.2.1}$$
where $F_A$ is the distribution function of the random mass $\xi(A)$.

The factorial moment measures of the Cox process turn out to be the ordinary moment measures of the directing measure; this is because the factorial moment measures of the Poisson process are powers of the directing measure. Thus, denoting by $\mu_k$ and $\gamma_k$ the ordinary moment and cumulant measures for $\xi$, we have for $k = 2$
$$M_{[2]}(A \times A) = \mathrm{E}\big[\mathrm{E}[N(A)(N(A) - 1) \mid \xi]\big] = \mathrm{E}\big[[\xi(A)]^2\big] = \mu_2(A \times A).$$
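The two-stage structure of Definition 6.2.I translates directly into a two-stage simulation. A minimal sketch (ours, not the text's) takes $\xi(A)$ gamma-distributed, as for the gamma random measures of Example 6.1(b), and checks the mean together with the overdispersion identity $\operatorname{var} N(A) = \mathrm{E}\,\xi(A) + \operatorname{var}\xi(A)$ discussed below:

```python
import numpy as np

rng = np.random.default_rng(11)
scale, shape_A = 1.5, 2.0      # gamma directing mass: shape 2, scale 1.5
n_reps = 300_000

# Stage 1: draw the random mass xi(A); stage 2: Poisson counts given xi.
xi_A = rng.gamma(shape_A, scale, n_reps)
N_A = rng.poisson(xi_A)

mean_xi = shape_A * scale              # E xi(A) = 3.0
var_xi = shape_A * scale ** 2          # var xi(A) = 4.5

assert abs(N_A.mean() - mean_xi) < 0.05
# Overdispersion: var N(A) = E xi(A) + var xi(A) = 7.5 > E N(A).
assert abs(N_A.var() - (mean_xi + var_xi)) < 0.2
```

With this gamma directing mass, the resulting counts are in fact negative binomial, the classical mixed Poisson example.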
Similarly, for the covariance measures, $C_{[2]}(A \times A) = \gamma_2(A \times A)$.

The algebraic details are most easily handled via the p.g.fl. approach outlined in Corollary 6.1.IV(b). As a function of the parameter measure $\xi$, the p.g.fl. of the Poisson process can be written, for $h \in \mathcal{V}(\mathcal{X})$, as
$$G[h \mid \xi] = \exp\left(-\int_{\mathcal{X}} [1 - h(x)]\,\xi(dx)\right). \tag{6.2.2}$$
For fixed $h$, this is a measurable function of $\xi$ as an element of $\mathcal{M}^{\#}_{\mathcal{X}}$. Thus, the family of p.g.fl.s (6.2.2) is a measurable family in the sense of Corollary 6.1.IV(b), which implies that we can indeed construct the p.g.fl. of a point process by taking expectations in (6.2.2) with respect to any probability measure for $\xi$ on $\mathcal{M}^{\#}_{\mathcal{X}}$. The expectation
$$\mathrm{E}\left[\exp\left(-\int_{\mathcal{X}} [1 - h(x)]\,\xi(dx)\right)\right],$$
however, can be identified with the Laplace functional [see (6.1.8)] of the random measure $\xi$, evaluated at the function $1 - h(x)$. This establishes the first part of the proposition below. The remaining parts are illustrated above for particular cases and are left for the reader to check in general.

Proposition 6.2.II. Let $\xi$ be a random measure on the c.s.m.s. $\mathcal{X}$ and $L_\xi$ its Laplace functional. Then the p.g.fl. of the Cox process directed by the random measure $\xi$ is given by
$$G[h] = \mathrm{E}\left[\exp\left(\int_{\mathcal{X}} [h(x) - 1]\,\xi(dx)\right)\right] = L_\xi[1 - h]. \tag{6.2.3}$$
The fidi distributions of a Cox process are of mixed Poisson type, as in (6.2.1); its moment measures exist up to order $n$ if and only if the same is true for $\xi$. When finite, the $k$th factorial moment measure $M_{[k]}$ of the Cox process equals the corresponding ordinary moment measure $\mu_k$ of $\xi$. Similarly, the $k$th factorial cumulant measure $C_{[k]}$ of the Cox process equals the corresponding ordinary cumulant measure $\gamma_k$ of $\xi$.

Note that this last result implies that the second cumulant measure of a Cox process is nonnegative-definite (see Chapter 8). Also, for bounded $A \in \mathcal{B}_{\mathcal{X}}$,
$$\operatorname{var} N(A) = M_{[1]}(A) + C_{[2]}(A \times A) = M_{[1]}(A) + \operatorname{var}\xi(A) \ge M_{[1]}(A) = \mathrm{E}\,N(A),$$
so a Cox process, like a Poisson cluster process, is overdispersed relative to the Poisson process.

Example 6.2(a) Shot-noise or trigger process [see Example 6.1(d) and Lowen and Teich (1990)]. We continue the discussion of this example by supposing the (random) function
$$\lambda(t) = \sum_{i:\, x_i < t} Y_i\, g(t - x_i) \tag{6.2.4}$$
to be the density of the random measure directing the observed Poisson process. In more picturesque language, the epochs $\{x_i\}$ are trigger events with respective sizes (or weights) $\{Y_i\}$ that decay according to the function $g$. Note that in this definition it is not necessary to assume that $g$ decays monotonically: integrability is sufficient (see also Exercise 6.1.4).

Now we use the generating functional formalism to obtain some elementary properties of the shot-noise process. Conditional on the sequence $\{(x_i, Y_i)\}$, we can appeal to (6.2.2) and write
$$G[h \mid \{(x_i, Y_i)\}] = \exp\left(-\sum_i Y_i \int_{x_i}^{\infty} [1 - h(t)]\,g(t - x_i)\,dt\right). \tag{6.2.5}$$
Write $\phi(\theta) = \mathrm{E}(e^{-\theta Y_1})$ for the common Laplace–Stieltjes transform of the $\{Y_i\}$. Taking expectations in (6.2.5), first with respect to $\{Y_i\}$ and then with respect to $\{x_i\}$, we have for the p.g.fl. of the process
$$G[h] = \mathrm{E}\left[\prod_i \phi\left(\int_{x_i}^{\infty} [1 - h(t)]\,g(t - x_i)\,dt\right)\right] = \exp\left(\nu \int_{\mathbb{R}} \left[\phi\left(\int_x^{\infty} [1 - h(t)]\,g(t - x)\,dt\right) - 1\right] dx\right). \tag{6.2.6}$$
By taking logarithms in this expression and expanding, it follows that the point process has factorial cumulant measures existing to as many orders as the r.v.s $Y_i$ have finite moments, consistent with Proposition 6.2.II. It also follows that these cumulant measures are absolutely continuous, with densities
$$m_1 = \nu\mu_1 \int_0^{\infty} g(u)\,du,$$
$$c_{[2]}(t_1, t_2) = \breve{c}_{[2]}(t_1 - t_2) \equiv \breve{c}_{[2]}(t_1') = \nu\mu_2 \int_0^{\infty} g(u)\,g(t_1' + u)\,du,$$
$$c_{[k]}(t_1, \ldots, t_k) = \breve{c}_{[k]}(t_1', \ldots, t_{k-1}') = \nu\mu_k \int_0^{\infty} g(u)\,g(t_1' + u) \cdots g(t_{k-1}' + u)\,du,$$
where $t_j' = t_j - t_k$ $(j = 1, \ldots, k - 1)$ and $\mu_k = \mathrm{E}(Y^k)$. These relations are analogues of Campbell's formulae in the theory of shot noise (see the references preceding Example 6.1(d)), while the first two illustrate the proposition insofar as the right-hand sides represent the ordinary cumulants of the directing shot-noise process. The fact that they are absolutely continuous reflects the same property in the realizations of $\xi$.

The representation (6.2.6) shows that the process can equally be regarded as a Neyman–Scott Poisson cluster process [see Example 6.3(a)]. The fact that the shot-noise process and the associated Neyman–Scott process have
the same p.g.fl. means that they are identical as point processes: no measurements on the point process can distinguish the clustering and doubly stochastic (or Cox) interpretations. This ambiguity of interpretation is an extension of the corresponding ambiguity concerning the dual interpretation of contagious distributions alluded to in Exercise 1.2.3. The possibility of such dual interpretations is not restricted to cluster processes: for example, Exercise 6.2.1 sketches a nontrivial characterization of the class of renewal processes that can be represented as Cox processes. Example 6.2(b) Boson processes (Macchi, 1971a, 1975) [see Example 5.4(c)]. In optical problems concerning light beams of low density, the particulate aspects of light are important, and the emission or reception of individual photons (or more generally bosons) can be treated as a point process in time, or space, or both. A standard approach to modelling this situation is to treat the photon process as a Cox process directed by the fluctuating intensity of the light beam, with this latter phenomenon modelled as the squared modulus of a complex Gaussian process. Thus, for the (density of the) random intensity, we take the function λ(t) = λ|X(t)|2 (λ > 0), (6.2.7) where X(·) is a complex Gaussian process with zero mean and complex covariance function C(s, t). The process λ(·) is similar to the quadratic random measure discussed in Example 6.1(c) with appropriate attention given to the conventions regarding a complex Gaussian process. These require that X(t) = U (t) + iV (t), where U (·) and V (·) are real Gaussian processes such that E U (s)U (t) = E V (s)V (t) = C1 (s, t), E U (s)V (t) = −E U (t)V (s) = C2 (s, t), C(s, t) = E X(s)X(t) = 2 C1 (s, t) + iC2 (s, t) . Here it is to be understood that C1 is real, symmetric, and nonnegativedefinite, while C2 is antisymmetric (so, in particular, C2 (s, s) = 0, and E[X(s)X(t)] = 0 for all s, t). 
The moments of the process λ(·) are given by a classical result concerning the even moments of a complex Gaussian process (see e.g. Goodman and Dubman, 1969):

E[X̄(s₁) ⋯ X̄(s_k) X(t₁) ⋯ X(t_k)] = per [C(s_i, t_j)]_{i,j=1,…,k} = C⁺(s₁, …, s_k; t₁, …, t_k),   (6.2.8)

where the permanent per B ≡ ⁺|B|⁺ of a matrix B contains the same terms as the corresponding determinant det B but with constant positive signs for each product of matrix elements in place of the alternating positive and negative signs of the determinant, so, for example,

⁺| a  b ; c  d |⁺ = ad + bc.
It can be shown (see Minc, 1978) that for any nonnegative-definite Hermitian matrix B, per B ≥ det B. Equations (6.2.7) and (6.2.8), taken together with Proposition 6.2.I, show that the factorial moment densities for the boson process are given by

m[k](t₁, …, t_k) = E[λ(t₁) ⋯ λ(t_k)] = λ^k C⁺(t₁, …, t_k; t₁, …, t_k).   (6.2.9)

This result paves the way for a discussion that exactly parallels the discussion of the fermion process of Example 5.4(c). In place of the expansion of the Fredholm determinant d(λ) used there, we have here an analogous expansion of the function

d⁺(λ) = 1 + Σ_{k=1}^∞ ((−λ)^k / k!) ∫_A ⋯ ∫_A C⁺(u₁, …, u_k; u₁, …, u_k) du₁ ⋯ du_k,

where as before the observation region A is a closed, bounded set in a general Euclidean space R^d. Corresponding to the expression (5.4.18) for the Fredholm minor is the expression

λ^k R⁺_{−λ}(x₁, …, x_k; y₁, …, y_k)
   = (1 / d⁺(λ)) [ λ^k C⁺(x₁, …, x_k; y₁, …, y_k)
      + Σ_{j=1}^∞ λ^k ((−λ)^j / j!) ∫_A ⋯ ∫_A C⁺(x₁, …, x_k, u₁, …, u_j; y₁, …, y_k, u₁, …, u_j) du₁ ⋯ du_j ].   (6.2.10)

This shows that the Janossy measures for the photon process have densities

j_k(x₁, …, x_k) = λ^k d⁺(λ) R⁺_{−λ}(x₁, …, x_k; x₁, …, x_k)   (k = 1, 2, . . .).   (6.2.11)

Macchi (1971a) established (6.2.11) directly by evaluating the expectation

j_k(x₁, …, x_k) = E[ λ(x₁) ⋯ λ(x_k) exp( −∫_A λ(u) du ) ]
[see also Grandell (1976) and Exercises 6.2.5–6 for further discussion].
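The permanent per B figures in all of the formulae above, and for small matrices it can be evaluated directly from its defining expansion. The sketch below (our own illustrative code, not from the text) computes permanents and determinants by brute force and checks both the 2 × 2 example above and Minc's inequality per B ≥ det B for a nonnegative-definite Hermitian matrix.

```python
from itertools import permutations

def _parity(p):
    """+1 for an even permutation, -1 for an odd one (via inversion count)."""
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1

def _prod(factors):
    out = 1
    for f in factors:
        out *= f
    return out

def per(B):
    """Permanent: the determinant expansion with every product signed +."""
    n = len(B)
    return sum(_prod(B[i][s[i]] for i in range(n)) for s in permutations(range(n)))

def det(B):
    """Determinant via the signed permutation expansion (for comparison)."""
    n = len(B)
    return sum(_parity(s) * _prod(B[i][s[i]] for i in range(n))
               for s in permutations(range(n)))

# the 2 x 2 example: per = ad + bc, whereas det = ad - bc
assert per([[1, 2], [3, 4]]) == 1 * 4 + 2 * 3
# Minc's inequality per B >= det B for a nonnegative-definite Hermitian B
B = [[2, 1j], [-1j, 2]]
assert per(B).real >= det(B).real
```

The factorial-permutation expansion is exponential in the matrix size, so this is only suitable for the small illustrative matrices appearing in the examples.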
Example 6.2(c) A pseudo-Cox process: the Gauss–Poisson process. The Gauss–Poisson process will be introduced as a two-point cluster process in Example 6.3(d) in the next section. Here we wish only to point out that the p.g.fl. G[h] in (6.3.30) for such a process, if the measures Q1 and Q2 there are absolutely continuous with respect to Lebesgue measure, equals

exp{ −∫_X [1 − h(x)] m(x) dx + ½ ∫_X ∫_X [1 − h(x)][1 − h(y)] c(x, y) dx dy },

where, in the notation of (6.3.30), in which Q2(·) is symmetric,

m(x) dx = Q1(dx) + 2Q2(dx × X)   and   c(x, y) dx dy = 2Q2(dx × dy).
This expression is identical in form with the expression L∗ [1 − h] for the Laplace functional of a Gaussian process, {X(t): t ∈ R} say, with mean m(t) = EX(t) and covariance c(t, u) = cov(X(t), X(u)), provided only that the function c(t, u) is positive-definite. On the other hand, the process is not an example of the construction described in Definition 6.2.I because, a.s., a realization of a Gaussian process takes both positive and negative values, so the notion of a Poisson process with parameter measure with density equal to the realization of such a Gaussian process is void. Newman (1970) coined the name ‘Gauss–Poisson’ because of this formal property of the p.g.fl. This example also serves to illustrate that while the conditions of 6.2.II are sufficient for a functional L∗ [1 − h] to represent the p.g.fl. of a point process, they are not necessary because the functional displayed at the outset of Example 6.2(c) is not the Laplace functional of a random measure.
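The algebraic identity underlying this example is easy to verify numerically on a discretized space: the two-point-cluster form of the log p.g.fl. in (6.3.30) coincides with the quadratic 'Gaussian' form once m and c are defined as above. A minimal sketch (weights and test function chosen arbitrarily; all names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6                                   # points of a small discretised space
Q1 = rng.uniform(0.1, 1.0, n)           # the measure Q1 as a vector of weights
A = rng.uniform(0.0, 0.5, (n, n))
Q2 = (A + A.T) / 2                      # Q2 as a symmetric matrix of weights
h = rng.uniform(0.2, 0.9, n)            # a test function with 0 < h < 1
g = 1.0 - h

# Two-point-cluster form: log G[h] = sum (h - 1) Q1 + sum sum (h h - 1) Q2
cluster_form = ((h - 1) * Q1).sum() + ((np.outer(h, h) - 1) * Q2).sum()

# Quadratic 'Gaussian' form, with m = Q1 + 2 Q2(. x X) and c = 2 Q2
m = Q1 + 2 * Q2.sum(axis=1)
c = 2 * Q2
quadratic_form = -(g * m).sum() + 0.5 * (np.outer(g, g) * c).sum()

assert np.isclose(cluster_form, quadratic_form)
```

The equality is exact (up to floating-point error) for any symmetric Q2, which is the content of the rearrangement h(x)h(y) − 1 = −[1 − h(x)] − [1 − h(y)] + [1 − h(x)][1 − h(y)].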
Exercises and Complements to Section 6.2

6.2.1 Let {I_n} = {(a_n, b_n]: n = 1, 2, . . .} be a sequence of random intervals on R₊ of lengths X_n = b_n − a_n > 0 a.s. and having gaps Y_n = a_{n+1} − b_n > 0 a.s., with {X_n} i.i.d. exponential r.v.s, {Y_n} i.i.d. r.v.s independent of {X_n} and with finite mean, and a₁ = 0. Let a Cox process N on R₊ be directed by a random measure ξ, which has density λ on the set ⋃_{n=1}^∞ I_n and zero elsewhere. Show that N(·) + δ₀(·) is a renewal process. [The points of the set {a_n, b_n: n = 1, 2, . . .} are those of an alternating renewal process with exponential lifetimes for one of the underlying lifetime distributions. Kingman (1964) showed, effectively, that any stationary Cox process that is also a stationary renewal process must be directed by the stationary version of the random measure described.]

6.2.2 Discrete boson process. Let C ≡ (c_ij) be a (real or complex) covariance matrix. The discrete counterpart of Example 5.4(c) and its associated exercises is the mixed Poisson process obtained by taking N(i) (i = 1, . . . , K) to be Poisson with random parameter λ|Z_i|², where Z = (Z₁, . . . , Z_K) has the multivariate normal distribution N(0, C). For K = 1, this reduces to a geometric distribution with p.g.f. P(1 + η) = 1/(1 − λc₁₁η). For K > 1, the multivariate p.g.f. has the form

P(1 + η₁, . . . , 1 + η_K) = 1 / det(I − λD_η C),   (6.2.12)

where D_η = diag(η₁, . . . , η_K).
The factorial moment relations corresponding to (6.2.9) may be written down as follows. For any k > 0, let r₁, . . . , r_K be nonnegative integers such that r₁ + ⋯ + r_K = k; here, r_j is to be interpreted as the number of repetitions of the index j in defining the factorial moment m[k](i₁, . . . , i_k) = E(N(1)^[r₁] ⋯ N(K)^[r_K]), where the set (i₁, . . . , i_k) consists of the index j repeated r_j times (j = 1, . . . , K). We then have

m[k](i₁, . . . , i_k) = λ^k C⁺(i₁, . . . , i_k; i₁, . . . , i_k).   (6.2.13)
6.2.3 (Continuation). The relations (6.2.12) and (6.2.13) of Exercise 6.2.2 are together equivalent to the identity for the reciprocal of the characteristic polynomial

1 / det(I − λD_η C) = 1 + Σ_{k=1}^∞ (λ^k / k!) Σ_perm C⁺(i₁, . . . , i_k; i₁, . . . , i_k) η_{i₁} ⋯ η_{i_k},

where the inner summation extends over all distinct permutations of k indices from the set i₁, . . . , i_k allowing repetitions [this is related to the Master Theorem of MacMahon (1915, Sections 63–66); see also Vere-Jones (1984, 1997)].

6.2.4 (Continuation). Using (6.2.12), we have also

P(z₁, . . . , z_K) = d⁺(λ) [det(I − λD_z R_{−λ})]^{−1},

where R_{−λ} = C(I + λC)^{−1} and d⁺(λ) = [det(I + λC)]^{−1}. From this p.g.f., we obtain the multivariate probabilities in the form (using the notation of preceding exercises)

π_k(i₁, . . . , i_k) = Pr{N(j) = r_j (j = 1, . . . , K)} = λ^k d⁺(λ) · R⁺_{−λ}(i₁, . . . , i_k; i₁, . . . , i_k) / (r₁! ⋯ r_K!).
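For K = 1 the formulae of Exercise 6.2.4, as reconstructed here with d⁺(λ) = [det(I + λC)]^{−1}, should reduce to the geometric law of Exercise 6.2.2: the permanent of an r × r matrix with constant entry R is r! R^r, so π_r collapses to (λc₁₁)^r/(1 + λc₁₁)^{r+1}. A minimal numerical sketch (parameter values are illustrative):

```python
from math import factorial

lam, c11 = 1.3, 0.7        # illustrative values of lambda and c_11

# K = 1: d_plus = det(I + lam C)^(-1) and R_(-lam) = C(I + lam C)^(-1) are scalars
d_plus = 1.0 / (1.0 + lam * c11)
R = c11 / (1.0 + lam * c11)

for r in range(12):
    per_R = factorial(r) * R ** r                    # permanent of the r x r matrix (R)
    pi_r = lam ** r * d_plus * per_R / factorial(r)  # Exercise 6.2.4 formula, K = 1
    geometric = (lam * c11) ** r / (1.0 + lam * c11) ** (r + 1)
    assert abs(pi_r - geometric) < 1e-12
```

The agreement for every r also confirms that the probabilities sum to 1, since the geometric series does.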
6.2.5 (Continuation). Derive the results of Example 6.2(b) by a suitable passage to the limit of the last three exercises. [An alternative route to these results uses the expansion of Z(t) in an orthogonal series over A: see Macchi (1971a) and Grandell (1976).]

6.2.6 (Continuation). When C(s, t) = σ²e^{−α|s−t|} in Example 6.2(b), show that with β² = α(α + 4σ²),

Pr{N(0, T] = 0} = e^{αT} [cosh βT + (α + 2σ²)β^{−1} sinh βT]^{−1}.
6.3. Cluster Processes Cluster processes form one of the most important and widely used models in point process studies, whether applied or theoretical. They are natural
models for the locations of objects in the plane or in three-dimensional space, in a remarkable range of contexts: for example, plants, molecules, protozoa, human settlements, stars, galaxies, and earthquake epicentres. Along the time axis, they have been used to model photoelectric emissions, volcano eruptions, arrivals and departures at queueing systems, nerve signals, faults in computer systems, and many other phenomena. The cluster mechanism is also a natural way to describe the locations of individuals from consecutive generations of a branching process, an application with unexpectedly rich mathematical structure as well as its obvious practical applications. The intuitive motivation of such processes involves two components: the locations of clusters and the locations of elements within a cluster. The superposition of the latter constitutes the ‘observed’ process. To model the cluster elements, we specify a countable family of point processes N (· | yi ) indexed by the cluster centres {yi } (a ‘cluster field’ in [MKM]). To model the cluster locations, we suppose there is given a process Nc of cluster centres, often unobserved, whose generic realization consists of the points {yi } ⊂ Y. More often than not, we have Y = X ; it is useful to preserve the notational distinction as a reminder of the structure of the process. The centres yi act as the germs (= ancestors in the branching process context) for the clusters they generate; it is supposed in general that there are no special features attaching to the points of a given cluster that would allow them to be distinguished from the points in some other cluster. More formally, we have the following definition. Definition 6.3.I. N is a cluster process on the c.s.m.s. X , with centre process Nc on the c.s.m.s. Y and component processes the measurable family of point processes {N ( · | y): y ∈ Y}, when for every bounded A ∈ BX ,
N(A) = ∫_Y N(A | y) N_c(dy) = Σ_{y_i ∈ N_c(·)} N(A | y_i) < ∞   a.s.   (6.3.1)
The definition requires the superposition of the clusters to be almost surely boundedly finite. There is, however, no requirement in general that the individual clusters must themselves be a.s. finite [i.e. the condition N (X | y) < ∞ a.s. is not necessary], although it is a natural constraint in many examples. A general cluster random measure can be introduced in the same way by allowing the component processes to be random measures (see Exercise 6.3.1). For the remainder of this section, we require the component processes to be mutually independent. We shall then speak of the component processes as coming from an independent measurable family and thereby defining an independent cluster process. In this definition, it is to be understood that multiple independent copies of N (· | y) are taken when Nc {y} > 1. If Y = X (i.e. the cluster centre process and the component processes are all defined on the same space X and X admits translations), then the further constraint that the translated components N (A − y | y) are identically distributed may be added, thus producing a natural candidate for a stationary version of the process.
Conditions for the existence of the resultant point process are not so easily obtained as for the Cox process, even though the superposition of the cluster member processes involves only operations that are clearly measurable. The difficulty revolves around the finiteness requirement embodied in equation (6.3.1). The number of clusters that are potentially able to contribute points to a given bounded set soars as the dimension of the state space increases, imposing delicate constraints that have to be met by any proposed existence theorem. For independent cluster processes, the finiteness condition can be rephrased somewhat more formally as follows. Lemma 6.3.II. An independent cluster process exists if and only if, for any bounded set A ∈ BX ,
∫_Y p_A(y) N_c(dy) = Σ_{y_i ∈ N_c} p_A(y_i) < ∞   Π_c-a.s.,   (6.3.2)
where pA (y) = Pr{N (A | y) > 0} for y ∈ Y and A ∈ BX , and Πc is the probability measure for the process of cluster centres. Proof. The sum (6.3.2) is required to converge a.s. as part of the definition of a cluster process. The converse, for given Nc , is an application of the second Borel–Cantelli lemma to the sequence of events Ei = {cluster i contributes at least one point to the set A}. The condition of Lemma 6.3.II can alternatively be rephrased in terms of generating functionals (see Exercise 6.3.2). When the components of the process are stationary (i.e. their cluster centre process is stationary and the distribution of the cluster members depends only on their positions relative to the cluster centre), a simple sufficient condition for the resultant cluster process to exist is that the mean cluster size be finite; even in the Poisson case, however, this condition is not necessary (see Exercise 6.3.5 for details). The moments are easier to handle. Thus, taking expectations conditional on the cluster centres yields
E[N(A) | N_c] = Σ_{y_i ∈ N_c} M₁(A | y_i) = ∫_Y M₁(A | y) N_c(dy),

where M₁(· | y) denotes the expectation measure of the cluster member process with centre at y, assuming this latter exists. From the assumption that the cluster member processes form a measurable family, it follows also that whenever M₁(A | y) exists, it defines a measurable kernel (a measure in A for each y and a measurable function of y for each fixed Borel set A ∈ B_X). Then we can take expectations with respect to the cluster centre process to obtain

E[N(A)] = ∫_Y M₁(A | y) M^c(dy),   (6.3.3)

finite or infinite, where M^c(·) = E[N_c(·)] is the expectation measure for the process of cluster centres. From this representation, it is clear that the first-moment measure of the resultant process exists if and only if the integral in (6.3.3) is finite for all bounded Borel sets A.

Similar representations hold for the higher-order moment measures. In the case of the second factorial moment measure, for example, we need to consider all possible ways in which two distinct points from the superposition of clusters could fall into the product set A × B (A, B ∈ B_X). Here there are two possibilities: either both points come from the same cluster or they come from distinct clusters. Incorporating both cases, supposing the cluster centre process is given, we obtain

E[N^[2](A × B) | N_c] = ∫_Y M[2](A × B | y) N_c(dy) + ∫_{Y^(2)} M₁(A | y₁) M₁(B | y₂) N_c^[2](dy₁ × dy₂),

where the superscript in N^[2] denotes the process of distinct pairs from N and in the second integral we have used the assumption of independent clusters. Taking expectations with respect to the cluster centre process, we obtain for the second factorial moment of the cluster process

M[2](A × B) = ∫_Y M[2](A × B | y) M^c(dy) + ∫_{Y^(2)} M₁(A | y₁) M₁(B | y₂) M^c_[2](dy₁ × dy₂).   (6.3.4)

Again, the second factorial moment measure of the cluster process exists if and only if the component measures exist and the integrals in (6.3.4) converge. Restated in terms of the factorial cumulant measure, equation (6.3.4) reads

C[2](A × B) = ∫_{Y^(2)} M₁(A | y₁) M₁(B | y₂) C^c_[2](dy₁ × dy₂) + ∫_Y M[2](A × B | y) M^c(dy).   (6.3.5)

Many of these relationships are derived most easily, if somewhat mechanically, from the portmanteau relation for the probability generating functionals, which takes the form, for h ∈ V(X) and exploiting the independent cluster assumptions,

G[h] = E{G[h | N_c]} = E exp( −∫_Y {−log G_m[h | y]} N_c(dy) ) = G_c[G_m[h | ·]],   (6.3.6)

where G_m[h | y] for h ∈ V(X) is the p.g.fl. of N(· | y), and

G[h | N_c] = Π_{y_i ∈ N_c} G_m[h | y_i] = exp( −∫_Y {−log G_m[h | y]} N_c(dy) )   (6.3.7)
is the conditional p.g.fl. of N given N_c. The a.s. convergence of the infinite product in (6.3.7) is equivalent to the a.s. convergence of the sum in Lemma 6.3.II by Exercise 6.3.2. The measurable family requirements of the family of p.g.fl.s for the cluster centres follow from the initial assumptions for the process. Thus, the p.g.fl. representation is valid whenever the cluster process exists.

One class of cluster processes occurs so frequently in applications, and is so important in the theory, that it warrants special attention. In this class, (1°) the cluster centres are the points of a Poisson process, and (2°) the clusters are independent and finite with probability 1. Whenever condition (1°) holds, we speak of a Poisson cluster process. The basic existence and moment results for Poisson cluster processes are summarized in the proposition below.

Proposition 6.3.III. Suppose that the cluster centre process is Poisson with parameter measure µ_c(·) and that the cluster member processes form an independent measurable family. Then, using the notation above,
(i) a necessary and sufficient condition for the existence of the resultant process is the convergence for each bounded A ∈ B_X of the integrals

∫_Y p_A(y) µ_c(dy);   (6.3.8)

(ii) when the process exists, its p.g.fl. is given by the expression

G[h] = exp( −∫_Y [1 − G_m[h | y]] µ_c(dy) );   (6.3.9)

(iii) the resultant process has first and second factorial moment measures and second factorial cumulant measure given, respectively, for A, B ∈ B_X, by

M₁(A) = M[1](A) = ∫_Y M[1](A | y) µ_c(dy),   (6.3.10)
M[2](A × B) = ∫_Y M[2](A × B | y) µ_c(dy) + M₁(A)M₁(B),   (6.3.11)
C[2](A × B) = ∫_Y M[2](A × B | y) µ_c(dy);   (6.3.12)

(iv) when X = R^d, the distribution function F of the distance from the origin to the nearest point of the process is given by

1 − F(r) = exp( −∫_Y p_{S_r(0)}(y) µ_c(dy) ),   (6.3.13)

where S_r(0) is the sphere in X = R^d of radius r and centre at 0.

Proof. Since E[N_c(dy)] = M^c(dy) = µ_c(dy) for a Poisson cluster process, condition (6.3.8) implies the a.s. convergence of (6.3.2) and hence the existence of the process. If the process exists, then, since for h̄ ∈ V(Y), G_c[h̄] =
exp( −∫_Y [1 − h̄(y)] µ_c(dy) ), equation (6.3.9) is just the appropriate special form of (6.3.6) with h̄(y) = G_m[h | y] for h ∈ V(X), and so it holds. Putting h(x) = 1 − I_A(x), the integral in (6.3.9) reduces to

∫_Y {1 − G_m[1 − I_A(·) | y]} µ_c(dy) = ∫_Y p_A(y) µ_c(dy),

from which the necessity of (6.3.8) is obvious. The moment relations are just restatements of equations (6.3.3–5) for the special case of the Poisson process, where M^c(dy) = µ_c(dy) and C^c_[2](dy₁ × dy₂) ≡ 0. The final equation (6.3.13) is a consequence of the fact that if R is the distance from the origin to the nearest point of the process, then R > r if and only if the sphere S_r(0) contains no point of the process, which yields (6.3.13) as the special case of (6.3.9) with h(x) = 1 − I_{S_r(0)}(x).

If X = Y = R^d and the process is stationary, and the factorial measures entering into equations (6.3.10–12) have densities, then the latter equations simplify further. In this case, the cluster centre process reduces to a Poisson process with constant intensity µ_c, say, and the first-moment density for the cluster member process can be written

m₁(x | y) = m₁(x − y | 0) ≡ ρ₁(x − y),   say.

Similarly, the second factorial moment and cumulant densities can be written

m[2](x₁, x₂ | y) = m[2](x₁ − y, x₂ − y | 0) ≡ ρ[2](x₁ − y, x₂ − y),
c[2](x₁, x₂ | y) = c[2](x₁ − y, x₂ − y | 0) ≡ γ[2](x₁ − y, x₂ − y).

Substituting, we obtain simplified forms for the corresponding densities of the cluster process:

m = µ_c ∫_X ρ₁(u) du = µ_c M₁(X | 0) = µ_c E[N_m(X | 0)],
m̆[2](u) = m[2](y, y + u) = µ_c ∫_X ρ[2](w, u + w) dw + m²,   (6.3.14)
c̆[2](u) = µ_c ∫_X ρ[2](w, u + w) dw.
A more systematic treatment of such reduced densities m ˘ [2] and c˘[2] is given in Section 8.1. The particularly simple form of these expressions means that it is often possible to obtain explicit expressions for the second moments of the counting process in such examples. Note also that since the cumulant density c˘[2] (u) is everywhere nonnegative, the resultant process is generally overdispersed relative to a Poisson process with the same first-moment measure (i.e. it shows greater variance in the number of counts). The alternative terms in the first line of (6.3.14) illustrate the sufficient condition for the existence of the process mentioned earlier and in Exercise 6.3.5: if the mean cluster size M1 (X | 0)
is finite, then the first-moment measure of the resultant process exists, and a fortiori the resultant process itself exists.

Other aspects of the process, such as interval properties, are generally less easy to obtain. Nevertheless, some partial results may be obtained in this direction via equation (6.3.13). Suppose that X = Y = R. Then, from (6.3.13) but using the half-interval (0, t) in place of the 'sphere' (−t, t), the survivor function S(t) [see below (2.1.3)] for the length of the interval from 0 to the first point of the process in R₊ is given by

S(t) = exp( −∫_R p(t | y) µ_c(dy) ),   (6.3.15)

where p(t | y) = p_(0,t)(y), a special case of the function p_A(y) in (6.3.2). Taking logarithms of (6.3.15) and differentiating, we see that the hazard function r(t) for this first interval is given by

r(t) = ∫_R (∂p(t | y)/∂t) µ_c(dy).

When the process is stationary, a further differentiation gives the hazard function q(·) of the distribution of the interval between two consecutive points of the process, as in Exercise 3.4.2. In higher dimensions, a similar approach may be used for the nearest-neighbour distributions, although explicit expressions here seem harder to determine (see Chapter 15).

In all of Examples 6.3(a)–(e) below, the spaces X and Y of Definition 6.3.I are the same.

Example 6.3(a) The Neyman–Scott process: centre-satellite process; process of i.i.d. clusters (Neyman and Scott, 1958, 1972; Thompson, 1955; Warren, 1962, 1971). Suppose that the individual cluster members are independently and identically distributed; that is, we are dealing with i.i.d. clusters as in Section 5.1 [see also Examples 5.3(a) and 5.5(a)]. Write F(dx | y) for the probability distribution of the cluster members with cluster centre at y and Q(z | y) for the p.g.f. of the total cluster size (assumed finite). Then, the cluster member p.g.fl. is given by (5.5.12), which in the notation above becomes

G_m[h | y] = Q( ∫_X h(x) F(dx | y) | y ),   (6.3.16)
while the corresponding factorial measures take the form

M[k](dx₁ × ⋯ × dx_k | y) = µ[k](y) Π_{i=1}^k F(dx_i | y),   (6.3.17)
where µ[k] (y) is the kth factorial moment for the cluster size distribution when the cluster centre is at y. Note that if F is degenerate at y, we obtain the compound Poisson process discussed in Example 2.1.10(b) and again in the next section, while if every cluster has exactly one point [so Q(z | y) = z], we have random translations, first mentioned above at Exercise 2.3.4(b).
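The i.i.d.-cluster structure is straightforward to simulate directly. The sketch below (all parameter values and names are our own illustrative choices) generates a one-dimensional Neyman–Scott process with Poisson cluster sizes and Gaussian displacements, and checks the empirical mean rate against the product of the centre rate and the mean cluster size:

```python
import numpy as np

rng = np.random.default_rng(42)
mu_c, mean_size, sigma = 2.0, 3.0, 0.1   # centre rate, E(cluster size), displacement scale
T = 2000.0                               # observation window (0, T) on the line

# Cluster centres: Poisson at rate mu_c on a padded window, so that
# clusters centred just outside (0, T) can still contribute points.
pad = 10 * sigma
centres = rng.uniform(-pad, T + pad, rng.poisson(mu_c * (T + 2 * pad)))

# i.i.d. clusters: Poisson(mean_size) members with Gaussian displacements
points = []
for y in centres:
    k = rng.poisson(mean_size)
    points.extend(y + sigma * rng.standard_normal(k))
points = np.asarray(points)

m_hat = ((points > 0) & (points < T)).sum() / T
assert abs(m_hat - mu_c * mean_size) / (mu_c * mean_size) < 0.08
```

The tolerance reflects Monte Carlo error only; the counts are overdispersed relative to a Poisson process of the same rate, as noted below (6.3.14).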
In many practical applications with X = R^d, the cluster centre process is stationary Poisson at rate µ_c, Q(z | y) and µ[k](y) are independent of y, and F(dx | y) is a function of the vector distance x − y alone and has density function f(x | y) = f̆(x − y) = (d/dx)F̆(x − y). With these simplifying assumptions, the resultant p.g.fl. takes the compact form

G[h] = exp( µ_c ∫_{R^d} [ Q( ∫_{R^d} h(y + x) F̆(dx) ) − 1 ] dy ),   (6.3.18)

while from the densities in (6.3.14), the mean rate and second factorial cumulant measures for the resultant process are given by

m = µ_c µ[1]   and   c̆[2](u) = µ_c µ[2] ∫_{R^d} f̆(y + u) f̆(y) dy,   (6.3.19)

respectively. Also, for the survivor function S(t) of the interval to the first point in the case d = 1, we obtain

−log S(t) = µ_c ∫_R [ 1 − Q(1 − F̆(y + t) + F̆(y)) ] dy,   (6.3.20)

with a pleasing simplification when F̆(·) is the exponential distribution (see Exercise 6.3.7). Exercise 6.3.10 sketches a two-dimensional extension.

Example 6.3(b) Bartlett–Lewis model: random walk cluster process; Poisson branching process (Bartlett, 1963; Lewis, 1964a, b). In this example, we take X = Y = R^d and suppose that the points in a cluster are the successive end points in a finite random walk, starting from and including the cluster centre. The special case where the random walk has unidirectional steps in R¹ (i.e. forms a finite renewal process) was used as a road traffic model in Bartlett (1963) and studied in depth by Lewis (1964a) as a model for computer failures. A closed-form expression for G_m[h | y] does not appear to exist, although for the special case where both the step lengths and the number of steps are independent of the position of the cluster centre, it can be represented in the form

h(y) [ q₀ + q₁ ∫_X h(y + x₁) F(dx₁) + q₂ ∫_{X^(2)} h(y + x₁) h(y + x₁ + x₂) F(dx₁) F(dx₂) + ⋯ ],   (6.3.21)
where q_j is the probability that the walk terminates after j steps and F is the common step-length distribution. Assuming also a constant intensity µ_c for the Poisson process of cluster centres, the mean density takes the form

m = µ_c Σ_{j=0}^∞ (j + 1) q_j = µ_c (1 + m[1]),   (6.3.22)

while the reduced form for the second factorial cumulant measure is given by

C̆[2](du) = µ_c Σ_{j=1}^∞ q_j Σ_{k=1}^j (j − k + 1) [F^{k*}(du) + F^{k*}(−du)].   (6.3.23)
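A direct simulation of the Bartlett–Lewis mechanism provides a check on the mean density (6.3.22). In this sketch (all parameter choices are ours), each cluster consists of its centre together with a renewal-type walk with exponential steps, the number of steps having distribution {q_j}:

```python
import numpy as np

rng = np.random.default_rng(7)
mu_c, T = 1.5, 3000.0                  # centre rate and observation window (0, T)
q = np.array([0.4, 0.3, 0.2, 0.1])     # q_j = P(walk takes j steps), j = 0,...,3
m1 = (np.arange(4) * q).sum()          # mean number of steps m_[1] = 1.0

centres = rng.uniform(0, T, rng.poisson(mu_c * T))
points = list(centres)                 # each cluster includes its centre
for y in centres:
    j = rng.choice(4, p=q)             # number of steps in this cluster's walk
    points.extend(y + np.cumsum(rng.exponential(1.0, j)))
points = np.asarray(points)

m_hat = ((points > 0) & (points < T)).sum() / T
assert abs(m_hat - mu_c * (1 + m1)) / (mu_c * (1 + m1)) < 0.08
```

Points of walks launched near the right-hand end of the window fall outside it, but with T large relative to the mean walk extent this edge effect is negligible against the Monte Carlo tolerance.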
Expressions for the nearest point and nearest-neighbour distance can be obtained at least for the case X = R and unidirectional F(·). Under these conditions, the probability p(t | y) that a cluster with centre at y has a point in the interval (0, t) is given by

p(t | y) = 0   for y > t,
p(t | y) = 1   for 0 ≤ y ≤ t,
p(t | y) = Σ_{i=0}^∞ r_{i+1} ∫_0^∞ [F(|y| + t − x) − F(|y| − x)] dF^{i*}(x)   for y < 0,

where r_i = Σ_{j=i}^∞ q_j. Substituting in (6.3.15) and simplifying, we obtain for the log survivor and hazard functions

−log S(t) = µ_c t + µ_c m[1] ∫_0^t [1 − F(x)] dx = mt − µ_c m[1] ∫_0^t F(x) dx,   (6.3.24a)
r(t) = µ_c + µ_c m[1] [1 − F(t)],   (6.3.24b)
where 1 + m[1] = m/µc as in (6.3.22) (see also Exercise 6.3.9). The next model, the Hawkes process, figures widely in applications of point processes to seismology, neurophysiology, epidemiology, and reliability. It is also an important model from the theoretical point of view and will figure repeatedly in later sections of this book. One reason for its versatility and popularity is that it combines in the one model both a cluster process representation and a simple conditional intensity representation, which is moreover linear. It comes closest to fulfilling, for point processes, the kind of role that the autoregressive model plays for conventional time series. However, the class of processes that can be approximated by Hawkes processes is more restricted than the class of time series models that can be approximated by autoregressive models. In particular, its representation as a cluster process means that the Hawkes process can only be used in situations that are overdispersed relative to the Poisson model. In introducing the model, Hawkes (1971a, b, 1972) stressed the linear representation aspect from which the term ‘self-exciting’ derives. Here we derive its cluster process representation, following Hawkes and Oakes (1974), mainly because this approach leads directly to extensions in higher dimensional spaces but also because it simplifies study of the model. Example 6.3(c) Hawkes process: self-exciting process; infectivity model [see also Examples 6.4(c) (marked Hawkes process), 7.2(b) (conditional intensity representation), 8.2(e) (Bartlett spectrum), 8.5(d) (mutually exciting point
processes) and 8.3(c) (linear prediction formulae)]. The points {x_i} of a Hawkes process are of two types: 'immigrants' without extant parents in the process, and 'offspring' that are produced by existing points. An evolutionary construction of the points is as follows. Immigrants {y_j}, say, arrive according to a Poisson process at constant rate µ_c, while the offspring arise as elements of a finite Poisson process that is associated with some point already constructed. Any point of the process, located at x′ say, has the potential to produce further points whose locations are those of a (finite) Poisson process with intensity measure µ(A − x′); we assume that µ(·) has total mass ν ≡ µ(X) < 1 and that all these finite Poisson processes are mutually independent and, given the point that generates them, identically distributed (modulo the shift as noted) and independent of the immigrant process as well. Consequently, each immigrant has the potential to produce descendants whose numbers in successive generations constitute a Galton–Watson branching process with Poisson offspring distribution whose mean is ν. Since ν < 1, this branching process is subcritical and therefore of finite total size, with mean 1/(1 − ν) < ∞ if we include the initial immigrant member. Regard the totality of all progeny of a given immigrant point y_j as a cluster; then the totality of all such immigrant points and their clusters constitutes a Hawkes process. An important task is to find conditions that ensure the existence of a stationary Hawkes process (i.e. of realizations of point sets {x_i} on the whole space X = R^d having the structure above and with distributions invariant under translation). Since the immigrant process is stationary, a sufficient condition, by Exercise 6.3.5, is that the mean cluster size be finite [or else, since the immigrant process is Poisson, Proposition 6.3.III(i) can be invoked].
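The branching construction just described translates directly into a simulation: generate the immigrants, then grow each generation of offspring until extinction. The sketch below (the exponential infectivity kernel and all parameter values are illustrative choices of ours) checks the empirical rate against the mean density µ_c/(1 − ν) of the stationary process:

```python
import numpy as np

rng = np.random.default_rng(1)
mu_c, nu, alpha, T = 0.5, 0.6, 2.0, 5000.0   # immigrant rate, branching ratio nu < 1, kernel rate

# Immigrants arrive as a Poisson process at rate mu_c on (0, T); each point
# produces Poisson(nu) offspring displaced by Exp(alpha) amounts, i.e. the
# offspring intensity measure is mu(dx) = nu * alpha * exp(-alpha x) dx.
immigrants = rng.uniform(0, T, rng.poisson(mu_c * T))
points, generation = list(immigrants), np.asarray(immigrants)
while generation.size:                        # grow generations until extinction
    counts = rng.poisson(nu, generation.size)
    children = np.repeat(generation, counts) + rng.exponential(1 / alpha, counts.sum())
    points.extend(children)
    generation = children

points = np.asarray(points)
m_hat = ((points > 0) & (points < T)).sum() / T
assert abs(m_hat - mu_c / (1 - nu)) / (mu_c / (1 - nu)) < 0.1
```

Because ν < 1 every cluster is finite almost surely, so the generation loop terminates; the vectorized `np.repeat` step handles a whole generation of parents at once.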
The cluster centres may be regarded as 'infected immigrants' from outside the system and the clusters they generate as the process of new infections they produce. Then, µ(dx) is a measure of the infectivity at the point x due to an infected individual at the origin. The key characteristics of any cluster are the first- and second-moment measures for the total progeny. From Exercise 5.5.6, the first of these is given by

M₁(A | 0) = δ₀(A) + µ(A) + µ^{2*}(A) + ⋯   (bounded A ∈ B_X),

while the second satisfies the integral equation

∫_X M[2](dy, y + A | 0) = ∫_X M₁(y + A | 0) M₁(dy | 0) − δ₀(A) + ∫_X ∫_X M[2](du, u + A | 0) µ(dv),

so that

(1 − ν) ∫_X M[2](dy, y + A | 0) = ∫_X M₁(y + A | 0) M₁(dy | 0) − δ₀(A).   (6.3.25)
From the general results (6.3.10–12), it now follows that the mean density of the resultant cluster process is given by

m = µ_c M₁(X | 0) = µ_c/(1 − ν),   (6.3.26)

while for its factorial covariance measure we have

C̆[2](A) = µ_c ∫_X M[2](dy, y + A | 0) = (µ_c/(1 − ν)) [ ∫_X M₁(y + A | 0) M₁(dy | 0) − δ₀(A) ].   (6.3.27)

This corresponds to the reduced density

c̆[2](x) = (µ_c/(1 − ν)) [ ∫_X m₁(y) m₁(x + y) dy − δ₀(x) ]

when M₁(A | 0) is absolutely continuous with density m₁(x), say, apart from the δ-function at the origin. An important feature of these formulae is that they lead to simple Fourier transforms, and we exploit this fact later in illustrating the spectral theory in Example 8.2(e).

For a parametric example, with X = R and µ(·) with support in R₊, suppose that for some α > 0 and 0 < ν < 1,

µ(dx) = ναe^{−αx} dx   for x ≥ 0,   and µ(dx) = 0 otherwise.

Then M₁(·) is absolutely continuous apart from an atom at the origin; for its density m₁(·), we find on x ≥ 0 that

m₁(x) = δ(x) + ναe^{−α(1−ν)x}.

It follows that C̆[2](·) is absolutely continuous also, and by substituting in (6.3.26) and (6.3.27), we find that the covariance density of the stationary process is given by

c̆[2](y) = (µ_c αν(1 − ½ν)/(1 − ν)²) e^{−α(1−ν)|y|}.   (6.3.28)

Example 6.3(d) The Gauss–Poisson process: process of correlated pairs (Bol'shakov, 1969; Newman, 1970; Milne and Westcott, 1972). This process has the curious distinction of being simultaneously a Neyman–Scott process, a Bartlett–Lewis process, and a pseudo-Cox process [Example 6.2(c)]. Its essential characteristic is that the clusters contain either one or two points (so it exists if and only if the cluster centre process exists). Let one point be taken as the cluster centre, let F(dx | y) denote the distribution of the second point relative to the first, and let q₁(y), q₂(y) be the probabilities of 1 and 2 points, respectively, when the centre is at y. Then, we may regard the process as a special case of Example 6.3(b) with

G_m[h | y] = q₁(y) h(y) + q₂(y) h(y) ∫_Y h(x) F(dx | y)
so that for the resultant process (and recall that X = Y = R^d here),

log G[h] = ∫_X [h(y) − 1] q₁(y) µ(dy) + ∫_X ∫_X [h(x)h(y) − 1] q₂(y) µ(dy) F(dx | y).   (6.3.29)
This is not quite in standard form because the measure q₂(y) µ(dy) F(dx | y) is not symmetric in general. However, the value of the p.g.fl. is unaltered when we replace this measure by its symmetrized form Q₂(dx × dy), say, so without loss of generality we may write the p.g.fl. in the form

log G[h] = ∫_X [h(x) − 1] Q₁(dx) + ∫_{X^(2)} [h(x)h(y) − 1] Q₂(dx × dy),   (6.3.30)

where Q₁ and Q₂ are boundedly finite and Q₂ is symmetric with boundedly finite marginals. If now we define Q̃₂ = 2Q₂ and substitute in (6.3.30), we obtain the standard form in (6.3.32) below using Khinchin measures. Conversely, given any two such measures Q₁ and Q₂, any expression of the form (6.3.30) represents the p.g.fl. of a process of correlated points because we can first define a measure µ by µ(A) = Q₁(A) + Q₂(A × X), then appeal to the Radon–Nikodym theorem to assert the existence µ-a.e. of nonnegative functions q₁(·), q₂(·) with q₁(x) + q₂(x) = 1 satisfying, for all bounded A ∈ B_X,

Q₁(A) = ∫_A q₁(x) µ(dx)   and   Q₂(A × X) = ∫_A q₂(x) µ(dx),

and finally use Proposition A1.5.III concerning regular conditional probabilities to define a family of probability measures {F(· | x): x ∈ X} by

Q₂(A × B) = ∫_A F(B | x) Q₂(dx × X) = ∫_A F(B | x) q₂(x) µ(dx)
for all bounded $A$ and all $B \in \mathcal B_{\mathcal X}$.

This discussion characterizes the p.g.fl. of such two-point cluster processes, but Milne and Westcott (1972) give the following stronger result.

Proposition 6.3.IV. For (6.3.30) to represent the p.g.fl. of a point process, it is necessary and sufficient that (i) $Q_1$ and $Q_2$ be nonnegative and boundedly finite, and (ii) $Q_2$ have boundedly finite marginals.

Proof. The additional point to be proved is that (6.3.30) fails to be a p.g.fl. if either $Q_1$ or $Q_2$ is a signed measure with nontrivial negative part. Exercise 6.3.11 sketches details [see also Example 6.2(c) and Exercises 6.3.12–13].
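The two-point-cluster construction above is easy to simulate. The following minimal sketch (ours; the parameter values $\mu = 2$, $q_2 = 0.4$ and the Gaussian choice of $F(dx \mid y)$ are purely illustrative) checks that the mean rate of the superposed process is $(q_1 + 2q_2)\mu = (1 + q_2)\mu$:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_gauss_poisson(mu, q2, sigma, T, rng):
    """Two-point-cluster (Gauss-Poisson) process on [0, T]: centres form a
    Poisson process of rate mu; each centre stays single (prob 1-q2) or gains
    a partner displaced by a N(0, sigma^2) offset (F(dx|y) Gaussian)."""
    n = rng.poisson(mu * T)
    centres = rng.uniform(0.0, T, n)
    paired = rng.random(n) < q2
    partners = centres[paired] + rng.normal(0.0, sigma, int(paired.sum()))
    return np.sort(np.concatenate([centres, partners]))

mu, q2, T = 2.0, 0.4, 10_000.0
pts = simulate_gauss_poisson(mu, q2, 1.0, T, rng)
# Mean rate should be (1 + q2)*mu = 2.8 here, in line with the
# first-moment formula (6.3.31a) below.
print(len(pts) / T)
```

With the fixed seed, the empirical rate lands within a few hundredths of 2.8 over a window of this length.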
6.3.
Cluster Processes
187
Observe that for the process with p.g.fl. given by (6.3.30), the expectation and second cumulant measures exist and are given, respectively, by
$$M(dx) = Q_1(dx) + Q_2(dx \times \mathcal X) + Q_2(\mathcal X \times dx), \qquad(6.3.31a)$$
$$C_{[2]}(dx_1 \times dx_2) = Q_2(dx_1 \times dx_2) + Q_2(dx_2 \times dx_1), \qquad(6.3.31b)$$
the representation holding whether or not $Q_2$ is given in its symmetric version.

It appears to be an open problem to determine conditions similar to those in Proposition 6.3.IV for an expansion such as (6.3.30) with just $k$ terms ($k \ge 3$) to represent the log p.g.fl. of a point process [see Milne and Westcott (1993) for discussion].

Example 6.3(e) A bivariate Poisson process [see also Examples 7.3(a) (intensity functions and associated martingales), 7.4(e) (random-time transformation to a unit-rate Poisson process) and 8.3(a) (spectral properties), and Exercise 8.3.7 (joint forward recurrence time d.f.)]. A bivariate process can be represented as a process on the product space $\mathcal X \times \{1,2\}$, where the indices (or marks) 1, 2 label the two component processes. The p.g.fl. expansions are most conveniently written out with the integrals over each component space taken separately. Consider, in particular, a Poisson cluster process on $\mathcal X \times \{1,2\}$ in which the clusters may be of three possible types only: a single point in process 1, a single point in process 2, or a pair of points, one from each process. Arguments analogous to those in the preceding example show that the joint p.g.fl. can be written in the form
$$\log G[h_1, h_2] = \int_{\mathcal X} [h_1(x) - 1]\,Q_1(dx) + \int_{\mathcal X} [h_2(x) - 1]\,Q_2(dx) + \int_{\mathcal X^{(2)}} [h_1(x_1)h_2(x_2) - 1]\,Q_3(dx_1 \times dx_2),$$
where $Q_1$, $Q_2$ and $Q_3$ are boundedly finite and $Q_3$ has boundedly finite marginals. The marginal p.g.fl. for process 1 can be found by setting $h_2 = 1$; process 1 is therefore a Poisson process with parameter measure $\mu_1(dx) = Q_1(dx) + Q_3(dx \times \mathcal X)$. Similarly, the process with mark 2 is also Poisson, with parameter measure $\mu_2(dx) = Q_2(dx) + Q_3(\mathcal X \times dx)$. Finally, the superposition of the two processes is of Gauss–Poisson type, with
$$\widetilde Q_1(dx) = Q_1(dx) + Q_2(dx)$$
and (taking the symmetric form)
$$\widetilde Q_2(dx_1 \times dx_2) = \tfrac12\,\bigl[Q_3(dx_1 \times dx_2) + Q_3(dx_2 \times dx_1)\bigr].$$
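When $Q_1$, $Q_2$ and $Q_3$ are totally finite, the total counts satisfy $N_1 = X_1 + X_3$, $N_2 = X_2 + X_3$ with $X_1, X_2, X_3$ independent Poisson variables (singles of type 1, singles of type 2, pairs), and the joint p.g.f. predicted by the log p.g.fl. is $\exp\{\lambda_1(z_1-1) + \lambda_2(z_2-1) + \lambda_3(z_1 z_2 - 1)\}$. The following sketch (ours; the $\lambda_i$ are arbitrary illustrative totals) verifies this numerically against the joint pmf computed by direct convolution:

```python
import numpy as np
from math import exp, factorial

# Illustrative totals: singles type 1, singles type 2, pairs.
lam1, lam2, lam3 = 0.7, 1.1, 0.5

def pois(lam, n):
    return exp(-lam) * lam**n / factorial(n)

K = 40  # truncation point for the joint pmf of (N1, N2)
pmf = np.zeros((K, K))
for k in range(K):            # number of pairs
    for m in range(K - k):    # singles of type 1
        for n in range(K - k):
            pmf[m + k, n + k] += pois(lam1, m) * pois(lam2, n) * pois(lam3, k)

# Compare the enumerated p.g.f. with the closed form from the log p.g.fl.
z1, z2 = 0.3, 0.8
lhs = sum(pmf[m, n] * z1**m * z2**n for m in range(K) for n in range(K))
rhs = exp(lam1 * (z1 - 1) + lam2 * (z2 - 1) + lam3 * (z1 * z2 - 1))
print(abs(lhs - rhs) < 1e-9)  # True
```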
Evidently, this is the most general example of a bivariate Poisson cluster process with Poisson marginals, since clusters of any higher order would introduce higher-order clusters in the marginals and hence destroy the Poisson property. The resulting fidi distributions are infinitely divisible bivariate Poisson distributions of the kind studied by Holgate (1964) and Milne (1974); see also Griffiths, Milne and Wood (1979). The particular bivariate distribution studied by Dwass and Teicher (1957) corresponds to the situation where the pairs must occur for both processes at the same location $x$; the resultant process is then not only infinitely divisible but also has complete independence.

Example 6.3(e) appears in many guises, for example as the joint process of the input and output streams of the M/M/∞ queue. It is closely related to the Gauss–Poisson process, which is nothing other than the 'ground process' (see Section 6.4) of the bivariate example above. We shall use it repeatedly to illustrate the structure of multivariate processes: their moments, spectra, conditional intensities, and compensators. See in particular Example 7.3(a). There are, of course, many examples of bivariate Poisson processes that are not infinitely divisible; one class may be obtained by mixing over the relative proportions of pairs and single points in the example above (see Exercise 6.3.12). A queueing example is given in Daley (1972a).

The previous examples illustrate the point that the same process can be represented in several equivalent ways as a Poisson cluster process: the Gauss–Poisson process, for example, can be represented either as a Neyman–Scott process or as a Bartlett–Lewis type process for appropriately chosen special cases of those models. This same example also points the way to an intrinsic characterization of Poisson cluster processes. In the next result, the measures $K_k(\cdot)$ are extended versions of the Khinchin measures defined for finite processes by (5.5.5).
Proposition 6.3.V. The p.g.fl. of every Poisson cluster process with a.s. finite clusters can be uniquely represented in the form
$$\log G[h] = \sum_{k=1}^{\infty} \frac{1}{k!} \int_{\mathcal X^{(k)}} \bigl[h(x_1)\cdots h(x_k) - 1\bigr]\,K_k(dx_1 \times \cdots \times dx_k), \qquad(6.3.32)$$
where the $\{K_k\}$ form a family of symmetric, boundedly finite measures on $\mathcal B(\mathcal X^{(k)})$ such that each $K_k(\cdot)$ has boundedly finite marginals $K_k(\,\cdot \times \mathcal X^{(k-1)})$, and the sum
$$\sum_{k=1}^{\infty} \frac{1}{k!} \sum_{i=1}^{k} \binom{k}{i} K_k\bigl(A^{(i)} \times (A^c)^{(k-i)}\bigr) \qquad(6.3.33)$$
is finite for bounded $A \in \mathcal B_{\mathcal X}$. Conversely, given any such family of measures $\{K_k\colon k \ge 1\}$, the expression (6.3.32) represents the p.g.fl. of a Poisson cluster process.

Proof. Suppose there is given a Poisson cluster process with cluster centres defined on the space $\mathcal Y$ and having parameter measure $\mu_c(\cdot)$. Suppose also
that the clusters are a.s. finite, so that they can be represented in terms of a family of Janossy measures $J_k(\cdot \mid y)$ (see Section 5.3), conditioned by the location $y$ of the cluster centre. Note that by definition these measures are symmetric. Consequently, we consider the quantities $K_k(\cdot)$ defined by setting
$$K_k(B) = \int_{\mathcal Y} J_k(B \mid y)\,\mu_c(dy) \qquad \bigl(B \in \mathcal B(\mathcal X^{(k)})\bigr)$$
and check that they are in fact boundedly finite measures. From Proposition 6.3.III, we know that the integral $\int_{\mathcal Y} p_A(y)\,\mu_c(dy)$ converges for each bounded set $A \in \mathcal B_{\mathcal X}$. Here, $p_A(y)$ is just the sum over $k \ge 1$ of the probabilities that the cluster has $k$ members of which at least one falls into the set $A$, so that, referring to (5.3.10), $p_A(y)$ equals
$$\sum_{k=1}^{\infty} \frac{1}{k!} \sum_{i=1}^{k} \binom{k}{i} J_k\bigl(A^{(i)} \times (A^c)^{(k-i)} \mid y\bigr) = \sum_{k=1}^{\infty} \frac{J_k(\mathcal X^{(k)} \mid y) - J_k\bigl((A^c)^{(k)} \mid y\bigr)}{k!}.$$
The finiteness of $K_k(B)$ follows when $B$ is of the form $A^{(k)}$ for bounded $A$. Similarly, by taking the term in the sum with $i = 1$, we deduce the bounded finiteness of the marginals. Finally, (6.3.33) is just a restatement of the necessary and sufficient condition that (6.3.8) be finite. We can then obtain the representation (6.3.32) from the standard representation of a Poisson cluster p.g.fl.,
$$\log G[h] = \int_{\mathcal Y} \bigl(G[h \mid y] - 1\bigr)\,\mu_c(dy) \qquad \bigl(h \in \mathcal V(\mathcal X)\bigr),$$
by expressing $G[h \mid y]$ in terms of the associated Janossy measures as in equation (5.5.3) and rearranging the integrations. Note that the term with $k = 0$ drops out of the summation. Uniqueness follows from standard results concerning the uniqueness of the expansion of the p.g.fl. and its logarithm about the origin.

Now suppose conversely that a family of measures $K_k$ satisfying the stated conditions is given. We wish to construct at least one Poisson cluster process that has the p.g.fl. representation (6.3.32). Take $\mathcal X = \mathcal Y$, and let the measure $\mu_0(\cdot)$ be defined over bounded $A \in \mathcal B_{\mathcal X}$ by
$$\mu_0(A) = \sum_{k=1}^{\infty} K_k\bigl(A \times \mathcal X^{(k-1)}\bigr)/k! \qquad(6.3.34)$$
as the parameter measure for the cluster centre process. Note that the finiteness condition (6.3.33) entails the finiteness of (6.3.34), because
$$\sum_{i=1}^{k} \binom{k}{i} K_k\bigl(A^{(i)} \times (A^c)^{(k-i)}\bigr) = \sum_{i=1}^{k} \frac{k}{i} \binom{k-1}{i-1} K_k\bigl(A \times A^{(i-1)} \times (A^c)^{(k-i)}\bigr) \ge K_k\bigl(A \times \mathcal X^{(k-1)}\bigr).$$
As in the Gauss–Poisson case, we can define $\mu_0$-a.e. a probability distribution $\{q_k(y)\}$ on $k = 1, 2, \ldots$ as the Radon–Nikodym derivatives in
$$\int_A q_k(y)\,\mu_0(dy) = \frac{K_k\bigl(A \times \mathcal X^{(k-1)}\bigr)}{k!},$$
these probabilities $\{q_k(y)\}$ determining the number of points $k$ in a cluster with centre $y$. The cluster member structure can be defined by taking one point as the cluster centre and locating the positions of the others relative to it through the distribution $P_{k-1}(B \mid y)$, defined $\mu_0$-a.e. over $B \in \mathcal B(\mathcal X^{(k-1)})$ by
$$\int_A P_{k-1}(B \mid y)\,K_k\bigl(dy \times \mathcal X^{(k-1)}\bigr) = K_k(A \times B),$$
appealing again to the existence of regular conditional probabilities. We can now check that the process with these components has the p.g.fl. representation (6.3.32) and that the existence condition (6.3.33) is satisfied.

Note that there are many other processes that could be constructed from the same ingredients. In particular (see below Theorem 2.2.II), we can introduce an arbitrary probability $\tilde q_0(y)$ of empty clusters, with $0 \le \tilde q_0(y) < 1$ (all $y$), by redefining
$$\tilde q_k(y) = \bigl[1 - \tilde q_0(y)\bigr]\,q_k(y) \qquad (k = 1, 2, \ldots)$$
and setting
$$\tilde\mu_c(dy) = \bigl[1 - \tilde q_0(y)\bigr]^{-1} \mu_c(dy).$$
The p.g.fl. is unaltered by this transformation, and the resultant processes are equivalent; we record this formally.

Corollary 6.3.VI. The probability of a zero cluster is not an estimable parameter in any Poisson cluster model.

A similar range of possibilities exists for the way the cluster centre $x$ is defined relative to the joint distributions $P_k(\cdot)$ of the points in the cluster. In the construction above, we have chosen to fix the centre at an arbitrary point of the cluster. The measures $J_k(\cdot \mid y)$ are then related to the $P_k(\cdot \mid y)$ by $J_1(A) = P_1(A)$ and, for $k \ge 2$, the symmetrization relations
$$J_k(A_1 \times A_2 \times \cdots \times A_k \mid y) = k^{-1} \sum_{\text{sym}} \delta_y(A_1)\,P_{k-1}(A_2 \times \cdots \times A_k \mid y).$$
Alternatively, we might prefer to locate the cluster centre at the multivariate centre of mass of the distribution (assuming this to be defined) or else in some other manner. This can be done without altering the final form of the p.g.fl. If it is necessary to select one particular form of representation for the process, we shall choose the one used in the proof above and refer to it as the regular representation of the given process. The proposition implies that there is a one-to-one correspondence between the measures on $\mathcal B(\mathcal M^\#_{\mathcal X})$ induced by Poisson cluster processes and the elements in their regular representations.
Exercises and Complements to Section 6.3

6.3.1 LeCam's precipitation process. Formulate a definition of a general cluster random measure $\zeta$ analogous to Definition 6.3.I by replacing $\{N(\cdot \mid y)\}$ by a measurable family of random measures $\{\xi(\cdot \mid y)\}$. When these components are independent and $L_\xi[f \mid y]$ denotes the Laplace functional of $\xi(\cdot \mid y)$ defined over $f \in \mathrm{BM}_+(\mathcal X)$ [see around (6.1.8)], the Laplace functional $L_\zeta$ of $\zeta$ is related to $\{L_\xi[f \mid y]\}$ and the p.g.fl. $G_c$ of the cluster centre process by $L_\zeta = G_c\bigl[L_\xi[f \mid \cdot\,]\bigr]$, provided $\zeta$ is well defined. [This model is discussed in LeCam (1961), who was motivated by the problem of modelling precipitation.]

6.3.2 Show that an independent cluster process exists if and only if, for each $h \in \mathcal V(\mathcal X)$, the infinite product $G[h \mid N_c] = \prod_i G_m[h \mid y_i]$ converges $\Pi_c$-a.s.

6.3.3 Frequently, it may be desired specifically to include the cluster centres among the points generated by the cluster member processes with p.g.fl. $G_m[h \mid y]$. Show that the modified process has p.g.fl. $G_c\bigl[h(\cdot)\,G_m[h \mid \cdot\,]\bigr]$.

6.3.4 Moment measures for a cluster process. For a cluster process, the r.v. $X_f \equiv \int_{\mathcal X} f(y)\,N(dy)$ can be expressed as the sum $\sum_i Y_f(y_i)$, where the $y_i$ are the cluster centres and $Y_f(y) = \int_{\mathcal X} f(x)\,N_m(dx \mid y)$ is the potential contribution to $X_f$ from a cluster member with centre at $y$. Assume that for $f \in \mathrm{BM}_+(\mathcal X)$,
$$M_{1,f}(y) \equiv E[Y_f(y)] = \int_{\mathcal X} f(x)\,M_1(dx \mid y) < \infty,$$
$$M_{2,f}(y) \equiv E[Y_f^2(y)] = \int_{\mathcal X^{(2)}} f(x_1)f(x_2)\,M_2(dx_1 \times dx_2 \mid y) < \infty.$$
Use a conditioning argument to obtain the basic relations
$$EX_f = \int_{\mathcal Y} E[Y_f(y)]\,M_c(dy) = \int_{\mathcal Y} M_{1,f}(y)\,M_c(dy) = \int_{\mathcal Y} \int_{\mathcal X} f(x)\,M_1(dx \mid y)\,M_c(dy),$$
$$EX_f^2 = \int_{\mathcal Y} V_2(y)\,M_c(dy) + \int_{\mathcal Y^{(2)}} M_{1,f}(y)\,M_{1,f}(z)\,M_2^c(dy \times dz),$$
$$\operatorname{var} X_f = \int_{\mathcal Y} V_2(y)\,M_c(dy) + \int_{\mathcal Y^{(2)}} M_{1,f}(y)\,M_{1,f}(z)\,C_2^c(dy \times dz),$$
where $V_2(y) = M_{2,f}(y) - (M_{1,f}(y))^2 = \operatorname{var} Y_f(y)$. Derive equations (6.3.3–5) by considering also $\operatorname{cov}(X_f, X_g)$ and setting $f(\cdot) = I_A(\cdot)$, $g(\cdot) = I_B(\cdot)$. [Hint: Take care in passing from ordinary to factorial moments.]

6.3.5 (a) Show that a sufficient condition for the existence of a stationary cluster process is that the mean cluster size be finite.
(b) Show by counterexample that the condition is not necessary, even for a Poisson cluster process.
[Hint: For part (a), show first that in the stationary case,
$$M_1(A) = \mu_c \int_{\mathcal X} M_1(A \mid x)\,dx = \mu_c \int_{\mathcal X} M_1(A - x \mid 0)\,dx = m\,\ell(A),$$
where $\ell$ denotes Lebesgue measure, and then observe that $p(A \mid x) \le M_1(A \mid x)$. For part (b), consider a compound Poisson process with infinite mean batch size.]

6.3.6 (a) Show that a stationary Poisson cluster process is simple if and only if each cluster member process is simple.
(b) When this condition is satisfied, show that the d.f. $F$ corresponding to an interval between successive points of the process has coefficient of variation $\ge 1$.
[Hint: Show that $R(t) \equiv -\log S(t)$ in (6.3.8) is subadditive in $t > 0$ and hence that $S(t) \ge \exp\bigl(-R'(0+)\,t\bigr)$. Use Korolyuk's theorem to identify $1/R'(0+)$ as the first moment of $F$, and use a hazard function argument (see Exercise 3.4.2) to identify the second moment of $F$ with $(2/R'(0+)) \int_0^\infty S(t)\,dt$. Exercise 6.3.9(b) below gives a special case.]

6.3.7 For a Neyman–Scott Poisson cluster process as around (6.3.20) with $\mathcal Y = \mathcal X = \mathbb R$, suppose $F(x)$ has an exponential distribution. Use (6.3.20) to show (see Vere-Jones, 1970) that the hazard function below (6.3.15) for the distance from the origin to the nearest point of the process is given by
$$r(t) = \frac{\mu_c \bigl(1 - Q(e^{-\lambda t})\bigr)}{1 - e^{-\lambda t}}.$$
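The hazard function of Exercise 6.3.7 has instructive limits: $r(0+) = \mu_c Q'(1)$ (the overall intensity) and $r(t) \to \mu_c(1 - Q(0))$ (the rate of nonempty clusters) as $t \to \infty$. A quick numerical check (our sketch; the Poisson(3) cluster-size p.g.f. and the rates are illustrative choices):

```python
from math import exp

mu_c, lam = 2.0, 1.5
Q = lambda z: exp(3.0 * (z - 1.0))   # Poisson(3) cluster-size p.g.f. (our choice)

def r(t):
    """Hazard of the distance from the origin to the nearest point."""
    return mu_c * (1.0 - Q(exp(-lam * t))) / (1.0 - exp(-lam * t))

# r(0+) = mu_c * Q'(1) = mu_c * (mean cluster size); here 2.0 * 3.0 = 6.0.
print(r(1e-8))
# r(t) -> mu_c * (1 - Q(0)) as t -> infinity; here 2.0 * (1 - exp(-3)).
print(r(50.0))
```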
6.3.8 Consider a Neyman–Scott cluster process in which the cluster centres $y_i$ are the points of a Poisson process at rate $\mu_c$, and for each such point a Poisson-distributed random number $n_i$ of points, with mean $Y_i$ for an i.i.d. sequence of r.v.s $\{Y_i\}$, are located at $\{y_i + x_{ij}\colon j = 1, \ldots, n_i\}$, where the $x_{ij}$ are i.i.d. with probability density $g(\cdot)$. Show that such a process $\{y_i + x_{ij}\colon j = 1, \ldots, n_i,\ \text{all } i\}$ is identical with the shot-noise process of Example 6.2(a).

6.3.9 (a) Evaluate the first-moment measure of the interval $(0, t]$ for a cluster with centre $y$ in a Bartlett–Lewis process as
$$M_c\bigl((0, t] \mid y\bigr) = \begin{cases} 0 & y > t, \\ 1 + \sum_{i=1}^{\infty} r_i\,F^{i*}(t - y) & 0 < y \le t, \\ \sum_{i=1}^{\infty} r_i\,\bigl[F^{i*}(t + |y|) - F^{i*}(|y|)\bigr] & y \le 0. \end{cases}$$
(b) Show that the hazard function for the interval distribution in the process corresponding to (6.3.24) is
$$r(t) = \mu_c + \mu_c m_{[1]}\bigl(1 - F(t)\bigr) - \frac{m_{[1]}\,f(t)}{1 + m_{[1]}\bigl(1 - F(t)\bigr)},$$
where $f(t)$ is the density corresponding to $F(t)$. Now verify Exercise 6.3.6(b): the interval distribution has coefficient of variation $\ge 1$ (Lewis, 1964a).

6.3.10 Suppose the common d.f. in a Neyman–Scott type process in $\mathbb R^2$ is circular normal with density $f(x, y) = (2\pi)^{-1} \exp[-\tfrac12(x^2 + y^2)]$. Show that the probability that a particular point of a given cluster falls in the circle of radius $r$ centred at the origin, when the cluster centre is at distance $\rho$ from the origin, equals
$$P(r \mid \rho) \equiv e^{-\rho^2/2} \int_0^r u\,e^{-u^2/2}\,I_0(u\rho)\,du,$$
where $I_0$ is the modified Bessel function of zero order. Then the log survivor function of the distance from the origin to the nearest point of such a Neyman–Scott Poisson cluster process, with cluster p.g.f. $Q(z)$, is given by
$$-\log S(r) = 2\pi\mu_c \int_0^{\infty} \bigl[1 - Q\bigl(1 - P(r \mid \rho)\bigr)\bigr]\,\rho\,d\rho.$$
In particular, if the number in each cluster has a Poisson distribution with mean $\lambda$,
$$-\log S(r) = 2\pi\mu_c \int_0^{\infty} \bigl(1 - e^{-\lambda P(r \mid \rho)}\bigr)\,\rho\,d\rho.$$
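The integrals in Exercise 6.3.10 are straightforward to evaluate numerically. The sketch below (ours; all parameter values are illustrative) uses NumPy's built-in $I_0$ (`np.i0`) and checks the sanity condition $P(r \mid \rho) \to 1$ as $r \to \infty$, since the cluster point must fall somewhere:

```python
import numpy as np

def trap(y, x):
    """Plain trapezoidal rule (avoids version-specific NumPy names)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def P(r, rho, n=4000):
    """P(r | rho) of Exercise 6.3.10, by quadrature with np.i0."""
    u = np.linspace(0.0, r, n)
    return np.exp(-rho**2 / 2.0) * trap(u * np.exp(-u**2 / 2.0) * np.i0(u * rho), u)

# As r -> infinity, P(r | rho) -> 1.
print(P(15.0, 2.0))

def neg_log_S(r, mu_c, lam, rho_max=15.0, n=400):
    """-log S(r) for Poisson(lam) cluster sizes (illustrative parameters)."""
    rhos = np.linspace(0.0, rho_max, n)
    vals = (1.0 - np.exp(-lam * np.array([P(r, rho) for rho in rhos]))) * rhos
    return 2.0 * np.pi * mu_c * trap(vals, rhos)

print(neg_log_S(1.0, mu_c=0.1, lam=3.0))
```

As expected, $-\log S(r)$ computed this way is positive and increasing in $r$.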
6.3.11 Show that $P(z) = \exp\{q_1(z - 1) + q_2(z^2 - 1)\}$ is a univariate p.g.f. if and only if $q_1 \ge 0$ and $q_2 \ge 0$, and hence complete the proof of Proposition 6.3.IV. [Hint: To be a p.g.f., $P(z)$ must have nonnegative coefficients as a power series in $z$, while by virtue of its representation, $P(z)$ is an entire function. Hence, show that $\log P(z)$ must be well defined and nondecreasing on the whole positive half-line $z > 0$, and deduce that both $q_1 \ge 0$ and $q_2 \ge 0$.]

6.3.12 Show that a point process $N$ is Gauss–Poisson if and only if the first two Khinchin measures are nonnegative with boundedly finite marginals and all remaining Khinchin measures vanish. [This is a rephrasing of Proposition 6.3.IV and Examples 6.2(c) and 6.3(d).]

6.3.13 Show that the functional of (possibly signed) measures $Q_1(\cdot)$ and $Q_2(\cdot \times \cdot)$
$$\int_{\mathcal X} [h(x) - 1]\,Q_1(dx) + \tfrac12 \int_{\mathcal X^{(2)}} [h(x) - 1]\,[h(y) - 1]\,Q_2(dx \times dy)$$
equals the logarithm of the p.g.fl. of some point process if and only if $Q_1$ is nonnegative and the symmetrized version
$$Q_2^s(A \times B) = \tfrac12\bigl(Q_2(A \times B) + Q_2(B \times A)\bigr)$$
is nonnegative and bounded as in $Q_2^s(A \times B) \le \min\bigl(Q_1(A), Q_1(B)\bigr)$ for bounded $A, B \in \mathcal B_{\mathcal X}$. [Hint: Reduce the functional above to the form of (6.3.30) and appeal to Proposition 6.3.IV. See also Example 6.2(d).]

6.3.14 Proposition 6.3.V represents a Poisson cluster process with a.s. finite clusters. Realize a cluster of size $k$ and choose one of its points, $Y$ say, at random. Show that
$$\Pr\{Y \in A\} = \frac{K_k\bigl(A \times \mathcal X^{(k-1)}\bigr)}{K_k\bigl(\mathcal X^{(k)}\bigr)},$$
but
$$\Pr\{\text{a cluster realization of size } k \text{ has a point in } A\} = \sum_{i=1}^{k} \binom{k}{i} \frac{K_k\bigl(A^{(i)} \times (A^c)^{(k-i)}\bigr)}{K_k\bigl(\mathcal X^{(k)}\bigr)}.$$

6.3.15 The factorial cumulant measures $C_{[k]}$ of a Gauss–Poisson process vanish for $k = 3, 4, \ldots$. Show in general that for a Poisson cluster process with clusters of size not exceeding $k_0$, $C_{[k]}$ vanishes for $k > k_0$. [Hint: Use (6.3.32) and write $1 + h$ for $h$.]
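Exercise 6.3.11 admits a quick numerical illustration (our sketch). Writing $E(z) = \exp(q_1 z + q_2 z^2)$, the coefficients satisfy the recurrence $n e_n = q_1 e_{n-1} + 2 q_2 e_{n-2}$, and a negative $q_2$ forces a negative coefficient:

```python
from math import exp

def pgf_coeffs(q1, q2, n_max):
    """Power-series coefficients of P(z) = exp(q1*(z-1) + q2*(z^2-1)),
    via n*e_n = q1*e_{n-1} + 2*q2*e_{n-2} for E(z) = exp(q1*z + q2*z^2)."""
    e = [1.0]
    for n in range(1, n_max + 1):
        e.append((q1 * e[n - 1] + (2.0 * q2 * e[n - 2] if n >= 2 else 0.0)) / n)
    c = exp(-q1 - q2)  # P(z) = e^{-q1-q2} * E(z)
    return [c * en for en in e]

print(all(p >= 0 for p in pgf_coeffs(0.5, 0.3, 30)))  # True: a genuine p.g.f.
print(min(pgf_coeffs(0.5, -0.3, 30)))  # negative, so P is not a p.g.f.
```

For example, with $q_1 = 0.5$, $q_2 = -0.3$ the coefficient of $z^2$ is already $e^{-q_1-q_2}(q_1^2/2 + q_2) < 0$.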
6.4. Marked Point Processes

In many stochastic process models, a point process arises not as the primary object of study but as a component of a more complex model; often, the point process is the component that carries the information about the locations in time or space of objects that may themselves have a stochastic structure and stochastic dependency relations. From the point of view of point process theory, many such models can be subsumed under the heading of marked point processes. In this section, we provide an initial study of such processes, particularly those with links to the Cox and cluster processes described in the two preceding sections.

For any marked point process, the locations $\{x_i\}$ where the events occur constitute an important process in their own right (the $x_i$ may denote times but could also be two- or three-dimensional, for example). We shall refer to this process as the ground process and accordingly denote it by $N_g$.

Definitions 6.4.I. (a) A marked point process (MPP), with locations in the c.s.m.s. $\mathcal X$ and marks in the c.s.m.s. $\mathcal K$, is a point process $\{(x_i, \kappa_i)\}$ on $\mathcal X \times \mathcal K$ with the additional property that the ground process $N_g(\cdot)$ is itself a point process; i.e. for bounded $A \in \mathcal B_{\mathcal X}$, $N_g(A) = N(A \times \mathcal K) < \infty$.
(b) A multivariate (or multitype) point process is a marked point process whose mark space is the finite set $\{1, \ldots, m\}$ for some finite integer $m$.

If a marked point process $N$ is regarded as a process on the product space $\mathcal X \times \mathcal K$, then the ground process $N_g$ is the marginal process of locations. However, it is a consequence of Definition 6.4.I(a) that not all point processes on product spaces are marked point processes. For example, the bivariate Poisson process on $\mathbb R^2$ with parameter measure $\mu\,dx\,dy$ cannot be represented as an MPP on $\mathbb R \times \mathbb R$, because such a Poisson process has $N(A \times \mathbb R) = \infty$ a.s. for Borel sets $A$ of positive Lebesgue measure.

However, in the special case of a multivariate point process, the extra condition is redundant, since the finiteness of the mark space immediately implies that each component process $N_i(\cdot) = N(\cdot \times \{i\})$ is boundedly finite, and we can write
$$N_g(\cdot) = N\bigl(\cdot \times \{1, \ldots, m\}\bigr) = \sum_{i=1}^{m} N_i(\cdot). \qquad(6.4.1)$$

In general, an MPP can be regarded either as a point process in the product space $\mathcal X \times \mathcal K$, subject to the finiteness constraint on the ground process $N_g$ as set out above, or as an ordinary (not necessarily simple) point process $\{x_i\}$ in $\mathcal X$ with an associated sequence of random variables $\{\kappa_i\}$ taking their values in $\mathcal K$. Either approach leads to the representation of the MPP as a set of pairs $\{(x_i, \kappa_i)\}$ in the product space. They are equivalent whenever it can be shown that the marks $\kappa_i$ in an MPP are well-defined random variables, which is certainly the case when the ground process has finite intensity, but there are subtleties in general: see Section 8.3 and Chapter 9 for further discussion.
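The multivariate case and the superposition identity (6.4.1) can be illustrated with a small simulation (ours; the rate, window, and marking probability are illustrative) of a two-type point process on an interval, with each ground point independently assigned mark 1 or 2:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative multitype point process on [0, T): ground process
# Poisson(rate); each point independently gets mark 1 (prob p1) or 2.
T, rate, p1 = 100.0, 3.0, 0.25
n = rng.poisson(rate * T)
locs = np.sort(rng.uniform(0.0, T, n))
marks = np.where(rng.random(n) < p1, 1, 2)

def count(lo, hi, mark=None):
    """N(A x {mark}) for A = [lo, hi); mark=None gives the ground count N_g(A)."""
    in_A = (locs >= lo) & (locs < hi)
    if mark is not None:
        in_A = in_A & (marks == mark)
    return int(in_A.sum())

# (6.4.1): the ground process is the superposition of the component processes.
A = (10.0, 60.0)
print(count(*A) == count(*A, 1) + count(*A, 2))  # True
```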
The class of MPPs is a great deal richer than might at first appear. This is due to the great variety of forms that can be taken by the marks and to the variety of dependence relations that can exist between the marks themselves and their locations. When $\mathcal X = \mathbb R$, for example, many remarkable results can be obtained by taking the mark at an event $x_i$ to represent some feature of the history of the process up to $x_i$. A careful study of such MPPs lies at the heart of the fundamental researches of Matthes, Mecke, and co-workers.

Extending the concepts of earlier chapters, we define for MPPs the following two classes of point processes.

Definition 6.4.II. (a) The MPP $N$ is simple if the ground process $N_g$ is simple.
(b) The MPP $N$ on $\mathcal X = \mathbb R^d$ is stationary (homogeneous) if the probability structure of the process is invariant under shifts in $\mathcal X$.

The structure of an MPP may be spelled out in a variety of ways. If the ground process $N_g$ is not necessarily simple, it can be thought of as a cluster process in which the cluster centres $x_i$ are the distinct locations in $\mathcal X$ and the cluster members are all pairs in $\mathcal X \times \mathcal K$ of the form $(x_i, \kappa_{ij})$, where the $\kappa_{ij}$ are the marks of the points with common location $x_i$. Equally, however, the family $\{\kappa_{ij}\}$ could be thought of as a single, compound mark in the space $\mathcal K^\cup$ defined as in (5.3.8). This last comment implies that, by suitably redefining the marks, any MPP on $\mathcal X$ can be represented as an MPP on $\mathcal X$ for which the ground process $N_g$ is simple. For many applications, though not for all, we may therefore assume that the MPPs we encounter are simple.

The next pair of definitions characterizes two important types of independence relating to the mark structure of MPPs. Observe in part (b) that a crucial feature is the role of order in the location space: it reflects the evolutionary property that we associate with a time-like dimension.

Definition 6.4.III (Independent marks and unpredictable marks). Let the MPP $N = \{(x_i, \kappa_i)\}$ on $\mathcal X \times \mathcal K$ be given.
(a) $N$ has independent marks if, given the ground process $N_g = \{x_i\}$, the $\{\kappa_i\}$ are mutually independent random variables such that the distribution of $\kappa_i$ depends only on the corresponding location $x_i$.
(b) For $\mathcal X = \mathbb R$, $N$ has unpredictable marks if the distribution of the mark at $x_i$ is independent of the locations and marks $\{(x_j, \kappa_j)\}$ for which $x_j < x_i$.

The most common case of an MPP with independent marks occurs when the $\kappa_i$ are in fact i.i.d. Similarly, the most common case of a process with unpredictable marks occurs when the marks are conditionally i.i.d. given the past of the process (though the marks may influence the future of $N_g$). The next proposition outlines the basic structure of processes with independent marks, introducing in particular the mark kernel $F(\cdot \mid \cdot)$, the distribution of the mark at a specified location. P.g.fl.s for MPPs are defined over the space $\mathcal V(\mathcal X \times \mathcal K)$ of measurable functions $h(x, \kappa)$ that lie between 0 and 1 and are such that, for some bounded set $A$, $h(x, \kappa) = 1$ for all $\kappa \in \mathcal K$ and $x \notin A$.
Proposition 6.4.IV (Structure of an MPP with independent marks). Let $N$ be an MPP with independent marks.
(a) The probability structure of $N$ is completely defined by the distribution of the ground process $N_g$ and the mark kernel $\{F(K \mid x)\colon K \in \mathcal B(\mathcal K),\ x \in \mathcal X\}$, representing the conditional distribution of the mark given the location $x$.
(b) The p.g.fl. for $N$ takes the form
$$G[h] = G_g[h_F] \qquad \bigl(h \in \mathcal V(\mathcal X \times \mathcal K)\bigr), \qquad(6.4.2)$$
where $G_g$ is the p.g.fl. of $N_g$ and $h_F(x) = \int_{\mathcal K} h(x, \kappa)\,F(d\kappa \mid x)$.
(c) The moment measure $M_k$ of order $k$ for $N$ exists if and only if the corresponding moment measure $M_k^g$ exists for the ground process $N_g$, in which case
$$M_k(dx_1 \times \cdots \times dx_k \times d\kappa_1 \times \cdots \times d\kappa_k) = M_k^g(dx_1 \times \cdots \times dx_k) \prod_{i=1}^{k} F(d\kappa_i \mid x_i). \qquad(6.4.3)$$
Similar representations hold for factorial and cumulant measures.

Proof. All the statements above are corollaries of the general results for conditional point processes outlined in Section 6.1. In the present case, we deduce statements for the process of pairs $\{(x_i, \kappa_i)\}$ from their distribution conditional on the process of locations $\{x_i\}$, using the conditional independence of the $\kappa_i$. Because of the independence properties, it is easiest to approach the statements via the p.g.fl. Given the locations $x_i$, the p.g.fl. of the pairs $(x_i, \kappa_i)$ takes the form
$$G[h(x, \kappa) \mid N_g] = \prod_i \int_{\mathcal K} h(x_i, \kappa)\,F(d\kappa \mid x_i) = \prod_i h_F(x_i). \qquad(6.4.4)$$
Note that $h_F \in \mathcal V(\mathcal X)$ when $h \in \mathcal V(\mathcal X \times \mathcal K)$, because for some bounded set $A$, $h(x, \kappa) = 1$ for $x \notin A$ and all $\kappa \in \mathcal K$, and hence for such $x$, $h_F(x) = \int_{\mathcal K} h(x, \kappa)\,F(d\kappa \mid x) = 1$. Provided then that $N_g$ exists, the final product is well defined for $h \in \mathcal V(\mathcal X \times \mathcal K)$ and defines a measurable function of $N_g$. We thus have a measurable family satisfying Lemma 6.1.III(b); taking expectations over the locations, we obtain (6.4.2). Since the p.g.fl. is well defined, so are the fidi distributions and hence the probability structure of the process.

To justify the expressions for the moment measures, consider an integral of the form
$$\int h(x_1, \ldots, x_k, \kappa_1, \ldots, \kappa_k)\,N(dx_1 \times d\kappa_1) \cdots N(dx_k \times d\kappa_k).$$
Conditional on the locations $\{x_i\}$, its expectation can be written
$$\int_{\mathcal K} \cdots \int_{\mathcal K} h(x_1, \ldots, x_k, \kappa_1, \ldots, \kappa_k)\,F(d\kappa_1 \mid x_1) \cdots F(d\kappa_k \mid x_k). \qquad(6.4.5)$$
Now taking expectations over the locations, assuming the moment measure to exist for the ground process, we obtain (6.4.3), finite or infinite according to whether the integrals converge. But convergence of the integrals for all appropriate $h$ is the necessary and sufficient condition for the existence of the moment measures, so statement (c) follows.

In many applications, $\mathcal K = \mathbb R_+$ and interest centres on the random measure defined by
$$\xi(A) = \int_{A \times \mathcal K} \kappa\,N(dx \times d\kappa) = \sum_{x_i \in A} \kappa_i. \qquad(6.4.6)$$
Its properties when $\xi$ has independent marks are summarized below. Observe that if $\kappa_i = \kappa$ a.s. for all $i$, then $\xi(A) = \kappa N_g(A)$.

Proposition 6.4.V. If $\mathcal K = \mathbb R_+$ and the MPP $N$ has independent marks, then $\xi$ in (6.4.6) defines a purely atomic random measure on $\mathcal X$ with only finitely many atoms on any bounded set $A \in \mathcal B_{\mathcal X}$. It has Laplace functional
$$L_\xi[h] = G_g[\phi_h] \qquad \bigl(h \in \mathrm{BM}_+(\mathcal X)\bigr), \qquad(6.4.7)$$
where $\phi_h(x) = \int_{\mathcal K} e^{-\kappa h(x)}\,F(d\kappa \mid x)$ and $G_g$ is as in (6.4.2). The moment measure $M_k^\xi$ of order $k$ for $\xi$ exists if (i) the moment measure $M_k^g$ of order $k$ exists for the ground process $N_g$, (ii) the $k$th moment of the mark distribution, $\mu_k(x) = \int_{\mathbb R_+} \kappa^k\,F(d\kappa \mid x)$, exists $M_1^g$-a.e., and (iii) the integrals defining $M_k^\xi$ in terms of $\mu_r$ and $M_s^g$ for $r, s = 1, \ldots, k$ converge. When they exist, the first- and second-moment measures are given, for bounded $A, B \in \mathcal B_{\mathcal X}$, by
$$M_1^\xi(B) = \int_B \mu_1(x)\,M_1^g(dx), \qquad(6.4.8)$$
$$M_2^\xi(A \times B) = \int_{A \times B} \mu_1(x_1)\,\mu_1(x_2)\,M_{[2]}^g(dx_1 \times dx_2) + \int_{A \cap B} \mu_2(x)\,M_1^g(dx). \qquad(6.4.9)$$

Proof. The statements follow from reasoning similar to that used in Proposition 6.4.IV. The integral in (6.4.6) is a.s. finite when $A$ is bounded (since the sum is then over an a.s. finite number of terms) and is easily seen to have the additivity properties required of a random measure. Its Laplace functional and moment measures can again be found by first conditioning on the locations. Thus, $L_\xi(h \mid N_g)$ equals
$$E\left[\exp\left(-\int_{\mathcal X} h(x)\,\xi(dx)\right) \,\middle|\, N_g\right] = \prod_i \int_{\mathbb R_+} e^{-\kappa h(x_i)}\,F(d\kappa \mid x_i).$$
Equation (6.4.7) follows on taking expectations over the locations. Note that when $h \in \mathrm{BM}_+(\mathcal X)$, the Laplace–Stieltjes transform $\phi_h \in \mathcal V(\mathcal X)$, as is required for a p.g.fl. Equation (6.4.8) is derived similarly. To obtain (6.4.9), we have to condition on the location of pairs $(x_i, x_j)$ defined by the product counting measure $N_g \times N_g$. Note the special attention given to the diagonal pairs $(x_i, x_i)$: $M_2^\xi(A \times B)$ equals
$$E\left[\int_A \int_B \int_{\mathcal K} \int_{\mathcal K} \kappa_1 \kappa_2\,F(d\kappa_1 \mid x_1)\,F(d\kappa_2 \mid x_2)\,N_g(dx_1)\,N_g(dx_2) + \int_{A \cap B} \int_{\mathcal K} \kappa^2\,F(d\kappa \mid x)\,N_g(dx)\right]$$
$$= \int_{A \times B} \mu_1(x_1)\,\mu_1(x_2)\,M_{[2]}^g(dx_1 \times dx_2) + \int_{A \cap B} \mu_2(x)\,M_1^g(dx).$$
These expressions can be checked by expanding the functionals and transforms concerned (see Exercise 6.4.1 for the case $k = 3$).

As for cluster processes, the results simplify if the process is stationary and the relevant factorial moment densities exist. Stationarity implies that the mark kernel is independent of $x$, $F(\cdot \mid x) = F(\cdot)$ say, so that $\phi_h$ in (6.4.7) becomes $\phi_h(x) = \int_{\mathcal K} e^{-\kappa h(x)}\,F(d\kappa)$, the usual Laplace–Stieltjes transform of the distribution $F$ evaluated at $h(x) \in \mathrm{BM}_+(\mathcal X)$. Given the existence of the reduced densities $\breve m_{[2]}^g(\cdot)$ and $\breve c_{[2]}^g(\cdot)$, and writing $\mu_k = \int_{\mathcal K} \kappa^k\,F(d\kappa)$, (6.4.8) and (6.4.9) lead to
$$m = \mu_1 m_g, \qquad(6.4.10)$$
$$\breve m_2(u) = (\mu_1)^2\,\breve m_{[2]}^g(u) + \delta(u)\,\mu_2\,m_g, \qquad(6.4.11a)$$
$$\breve c_2(u) = (\mu_1)^2\,\breve c_{[2]}^g(u) + \delta(u)\,\mu_2\,m_g. \qquad(6.4.11b)$$
The appearance of the $\delta$-function in (6.4.11) is a reminder that the $\xi$ process, as well as the process $N_g$, is purely atomic and therefore has a diagonal concentration (see Section 8.1 below). Equation (6.4.11b) leads to the well-known expression for the variance of a random sum of i.i.d. r.v.s,
$$\operatorname{var} \xi(A) = [E(\kappa)]^2 \operatorname{var} N_g(A) + E[N_g(A)]\,\operatorname{var} \kappa. \qquad(6.4.12)$$
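The identity (6.4.12) is easy to check against a seeded simulation (ours; a Poisson ground process and exponential marks are illustrative choices, for which the right-hand side collapses to $\mu\,E[\kappa^2]$):

```python
import numpy as np

rng = np.random.default_rng(1)

# Compound Poisson sketch: Ng(A) ~ Poisson(mu), marks i.i.d. Exp(mean beta).
mu, beta, n_rep = 5.0, 2.0, 100_000
totals = np.empty(n_rep)
for j in range(n_rep):
    totals[j] = rng.exponential(beta, rng.poisson(mu)).sum()

# (6.4.12): var xi(A) = [E kappa]^2 var Ng(A) + E[Ng(A)] var kappa.
# For Poisson Ng(A), var Ng(A) = E[Ng(A)] = mu, so the two terms give
# mu*(beta^2 + beta^2) = mu * E[kappa^2] = 40 here.
predicted = beta**2 * mu + mu * beta**2
print(predicted, totals.var())  # the empirical variance lands close to 40
```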
Extension of the discussion above to the mark space $\mathcal K = \mathbb R$ is possible but leads to signed measures and requires the use of characteristic functionals in place of Laplace functionals; see Exercise 6.4.2.

An important special case arises when the ground process $N_g$ is Poisson. We call such a process a compound Poisson process; as such, it extends the compound Poisson process introduced in Section 2.2, where $\mathcal K = \mathbb Z_+$. For this (generalized) compound Poisson process, the marks often represent a weight associated with the point, such as a monetary value in financial applications, an energy or seismic moment in seismology, a weight or volume in forestry or
geological prospecting, and so on. In such cases, $\xi$ measures the total value, energy, weight, volume, etc., accumulating within a certain time interval or region. We give some examples shortly but first present a simple, important structural property that foreshadows results for more general classes of MPPs.

Lemma 6.4.VI. A compound Poisson process with mark kernel $F(\cdot \mid \cdot)$, for which the Poisson ground process $N_g$ has intensity measure $\mu(\cdot)$, is equivalent to a Poisson process on the product space $\mathcal X \times \mathcal K$ with intensity measure $\Lambda(dx \times d\kappa) = \mu(dx)\,F(d\kappa \mid x)$.

Proof. We examine the p.g.fl.s. Substituting in (6.4.2) the p.g.fl. of the Poisson process for $N_g$ and rearranging, we have, using notation from (6.4.2),
$$G[h] = \exp\left(\int_{\mathcal X} [h_F(x) - 1]\,\mu(dx)\right) = \exp\left(\int_{\mathcal X} \int_{\mathcal K} [h(x, \kappa) - 1]\,F(d\kappa \mid x)\,\mu(dx)\right),$$
where the last expression can be identified with the p.g.fl. of the Poisson process on the product space.

Many classical stochastic models are rooted in the compound Poisson process. One famous example is as follows.

Example 6.4(a) Lundberg's collective risk model (Lundberg, 1903; Cramér, 1930). Suppose that claims $W_i$ against an insurer are made at times $t_i$, the points of a Poisson process with rate $\mu$, and let $\xi(t)$ represent the accumulated claims in $(0, t]$, so that $\xi(t) = \sum_{i\colon 0 < t_i \le t} W_i$. If premiums are collected continuously at rate $\alpha$ and the initial reserve is $U_0$, the insurer's capital at time $t$ is $U(t) = U_0 + \alpha t - \xi(t)$, and ruin occurs if $U(t)$ ever becomes negative. If $\alpha \le \mu E(W)$, ruin is ultimately certain, whereas if $\alpha > \mu E(W)$, ruin may be avoided, and interest centres around estimating the probability of ruin, say $\eta$. In both cases, important information may be derived from the observation that, if
τi = ti − ti−1, then the random variables Zi = Wi − ατi are independent, so that the process

Un − U0 = −Σ_{i=1}^n Zi = αtn − ξ(tn)
constitutes a random walk. In particular, this observation, coupled with a standard martingale argument, leads to the classical Cramér bound on the probability of ultimate ruin. The argument is outlined in Exercise 6.4.3 (or else see e.g. Embrechts et al., 1997, Section 1.1).

Example 6.4(b) Negative binomial processes. The negative binomial distribution is a common choice for the count random variables N(A) in applications to processes N(·) where a clustering alternative is preferred to the Poisson process. It is somewhat surprising that the only known examples of processes yielding the negative binomial form for the distributions of N(A) are both extreme cases: a compound Poisson process that has the complete independence property and in which all the clusters are concentrated at single points, and a mixed Poisson process in which the individual realizations are indistinguishable from those of a Poisson process. The usefulness of the negative binomial distribution in practice stems more from its relative simplicity and tractability than its link to organic physical models, although it will of course be true that for long time intervals, when the time scale of clustering is short relative to the time scale of observation, the compound Poisson model may be an adequate approximation. We describe these two models; see also Grégoire (1984) and the review article of Diggle and Milne (1983).

(i) Compound Poisson process leading to negative binomial distributions. Suppose there is given a compound Poisson process with constant intensity µ and discrete mark distribution that is independent of the location x. If N(A) is to have a negative binomial distribution, then we know from Example 5.2(a) that the cluster size distribution should have the logarithmic form πn(x) = ρ^n / {n log[1/(1 − ρ)]}. Taking this as the mark distribution, we find that the p.g.fl.
for the resulting random measure ξ, which in this case is again a point process but nonorderly, now has the form

G[h] = exp{ (1/log(1 − ρ)) ∫_X log( [1 − ρh(x)]/(1 − ρ) ) µ(dx) }    (h ∈ V(X)).

This corresponds to the multivariate p.g.f. for the fidi distributions on disjoint sets A1, . . . , Ak,

Pk(A1, . . . , Ak; z1, . . . , zk) = Π_{i=1}^k ( (1 − ρ)/(1 − ρzi) )^{−µ(Ai)/log(1−ρ)},
representing one simple type of multivariate negative binomial distribution. The factorial cumulant measures can be obtained from the expansion

log G[1 + η] = (1/log(1 − ρ)) ∫_X log( 1 − ρη(x)/(1 − ρ) ) µ(dx)
            = −(1/log(1 − ρ)) Σ_{k=1}^∞ (1/k) ( ρ/(1 − ρ) )^k ∫_X [η(x)]^k µ(dx),

so that C[k](·) for k ≥ 2 is a singular measure with a concentration c[k] µ(·) on the diagonal x1 = · · · = xk, where c[k] is the kth factorial moment of the logarithmic distribution, or, equivalently, c[k]/log[1/(1 − ρ)] is the kth factorial cumulant of the negative binomial distribution.

Recall the p.g.f. of the negative binomial distribution in Example 5.2(a) and the p.g.fl. for a local process on a bounded Borel set A as in Example 5.5(b). The p.g.fl. for the type (i) negative binomial process applied to Example 5.5(b) gives us (since the integral over A^c vanishes)

G_A[1 − I_A + h*] = exp{ (1/log(1 − ρ)) ∫_A log( [1 − ρh]/(1 − ρ) ) µ(dx) },

where h*(x) = h(x)I_A(x). Thus, the localized process is still a negative binomial process. The local Janossy measures can be found from the expansion

log( [1 − ρh]/(1 − ρ) ) = −log(1 − ρ) − Σ_{n=1}^∞ (ρ^n/n) h^n,

from which we deduce that p0(A) = exp[−µ(A)] and

J1(dx | A) = ρ p0(A) µ(dx),
J2(dx1 × dx2 | A) = ρ² p0(A) [µ(dx1)µ(dx2) + δ(x1, x2)µ(dx1)],

where the two terms in J2 represent contributions from two single-point clusters at x1 and x2 (x1 ≠ x2) and a two-point cluster at x1 = x2.

(ii) Mixed Poisson process leading to negative binomial distributions. Take the mixing distribution Π with Laplace–Stieltjes transform Π* as in (6.1.16), now generalized to the nonstationary case, to have the gamma distribution Γ(α, λ) with Laplace–Stieltjes transform (1 + s/λ)^{−α}. Then

G[h] = Π*( ∫_X [1 − h(x)] µ(dx) ) = ( 1 + (1/λ) ∫_X [1 − h(x)] µ(dx) )^{−α},

so that the multivariate p.g.f. has the form

Pk(A1, . . . , Ak; z1, . . . , zk) = ( 1 + (1/λ) Σ_{i=1}^k (1 − zi) µ(Ai) )^{−α}.
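The gamma-mixed construction just described is easy to check by simulation. The sketch below uses illustrative parameter values (α = 3, λ = 2, µ(A) = 1.5, none taken from the text): it draws the gamma mixing variable Y with Laplace–Stieltjes transform (1 + s/λ)^{−α}, then a conditionally Poisson count, and compares the empirical mean and variance of N(A) with the negative binomial values implied by the p.g.f. above.

```python
import numpy as np

# Monte Carlo sketch of construction (ii): Y ~ Gamma(shape alpha, rate lam),
# N(A) | Y ~ Poisson(Y * mu(A)).  The p.g.f. above then gives a negative
# binomial N(A) with mean alpha*mu(A)/lam and variance mean*(1 + mu(A)/lam).
# Parameter values are illustrative only.

rng = np.random.default_rng(0)
alpha, lam, muA = 3.0, 2.0, 1.5
n_samples = 200_000

Y = rng.gamma(shape=alpha, scale=1.0 / lam, size=n_samples)  # rate lam -> scale 1/lam
N = rng.poisson(Y * muA)                                     # conditionally Poisson counts

mean_th = alpha * muA / lam
var_th = mean_th * (1.0 + muA / lam)

print(N.mean(), mean_th)   # empirical vs theoretical mean
print(N.var(), var_th)     # empirical vs theoretical variance
```

The overdispersion var/mean = 1 + µ(A)/λ > 1 is exactly the clustering signature that motivates the negative binomial alternative to the Poisson model.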
The factorial cumulants can be obtained from the expansion

log G[1 + η] = −α log( 1 − (1/λ) ∫_X η(x) µ(dx) ) = α Σ_{k=1}^∞ (1/k) ( (1/λ) ∫_X η(x) µ(dx) )^k,

so C[k](dx1 × · · · × dxk) = αλ^{−k} (k − 1)! µ(dx1) · · · µ(dxk), where we can recognize the coefficient of the product measure on the right-hand side as the kth cumulant measure of the negative binomial distribution. Note that Example 5.2(a) corresponds to the case where the measure µ(·) is totally finite, in which case µ(X)/λ here equals the parameter µ there.

Most of the examples of point processes that we have considered in earlier sections can be adorned with marks in a way similar to the Poisson process in Examples 6.4(a) and (b) above. The choice of underlying model will depend on the context and anticipated dependence structure. The most interesting extensions appear when we drop the assumption of completely independent marks and consider ways in which either the marks can influence the future development of the process or the current state of the process can influence the distribution of marks, or both. The Hawkes process of Example 6.3(c), extended as below, illustrates some of the many possible issues that can arise.

Example 6.4(c) Marked Hawkes process. Marked versions of the Hawkes process of Example 6.3(c) are best known from Hawkes (1971b, 1972), who considered the multivariate case in detail, with an application in Hawkes and Adamopoulos (1973), though Kerstan (1964) considered them at length. We consider here the case of unpredictable marks; for a more general multivariate extension, see Example 8.3(c). Both extensions have important applications in seismology [see also Example 6.4(d) below], epidemiology, neurophysiology, and teletraffic (see e.g. Brémaud and Massoulié, 1996). In extending the Hawkes process of Example 6.3(c) to an MPP {(xi, κi)}, we interpret the marks κi as the 'type' of an individual in a multitype branching process. Recall that, in the branching process interpretation, points in a Hawkes process are either 'immigrants' without parents or 'offspring' of another point in the process.
This (multitype) model now incorporates the following assumptions: (i) immigrants arrive according to a compound Poisson process N (dy × dκ) with constant rate µc and fixed mark distribution F (dκ); (ii) each individual in the process, whether an immigrant or not, has the potential to act as an ancestor and thereby yield first-generation offspring according to an ordinary Poisson process with intensity measure µ(du | κ) = ψ(κ) µ(du) that depends only on the mark κ of the ancestor event and the distance u of the offspring from the ancestor; and (iii) the marks of the offspring form an i.i.d. sequence with the same d.f. F as the immigrants.
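Assumptions (i)–(iii) translate directly into a generation-by-generation simulation of a single immigrant's cluster. The sketch below uses illustrative choices not fixed by the text — marks κ ~ Exp(1), ψ(κ) = aκ with a < 1 (so that ρ = E[ψ(κ)] = a), and lag distribution µ = Exp(1) — and checks the standard subcritical-branching fact, used shortly in the text, that the mean total cluster size is 1/(1 − ρ).

```python
import numpy as np

# Branching simulation of one immigrant's cluster in a marked Hawkes process
# with unpredictable marks.  Illustrative assumptions (not from the text):
# marks kappa ~ Exp(1); psi(kappa) = a*kappa, so rho = E[psi(kappa)] = a;
# offspring lags drawn from the probability measure mu = Exp(1).

rng = np.random.default_rng(1)
a = 0.5                                   # branching ratio rho = a < 1 (subcritical)

def simulate_cluster(t0=0.0):
    """Return the times of one immigrant's whole cluster (ancestor included)."""
    times = [t0]
    frontier = [(t0, rng.exponential(1.0))]      # (time, mark) of current generation
    while frontier:
        next_gen = []
        for t, kappa in frontier:
            # Poisson(psi(kappa)) direct offspring, per assumption (ii)
            for _ in range(rng.poisson(a * kappa)):
                s = t + rng.exponential(1.0)     # lag ~ mu
                times.append(s)
                next_gen.append((s, rng.exponential(1.0)))  # i.i.d. mark, assumption (iii)
        frontier = next_gen
    return times

sizes = np.array([len(simulate_cluster()) for _ in range(20_000)])
print(sizes.mean(), 1.0 / (1.0 - a))      # mean total progeny vs 1/(1 - rho)
```

Superposing such clusters over immigrants arriving at rate µc would give a realization of the full process; the single-cluster view is enough to see why ρ < 1 is the stability condition.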
The factor ψ(κ) determines the relative average sizes of families with different marks, while the measure µ(·) determines how the family members are spread out along the time axis. For a stable process, µ(X) must be finite, and for the sake of definiteness, we assume that µ(X) = 1 so that ψ(κ) becomes the expected number of direct offspring with mark κ. In principle, the analysis of such a process requires the general theory of multiple type branching processes with a continuous range of types. However, the assumption of i.i.d. marks (i.e. offspring types) greatly simplifies the analysis. Indeed, the assumptions above imply that the ground process Ng for this marked point process can be described as an ordinary Hawkes process with immigration rate µc and infectivity measure

µg(du) = ρ µ(du),    where ρ = E[ψ(κ)] = ∫_K ψ(κ) F(dκ) < ∞.
If ρ < 1, then the total number of progeny is a.s. finite with finite mean 1/(1 − ρ), so that the ground process is well defined and has a stationary version (see Exercise 6.3.5). Since the overall process may itself be regarded as a Poisson cluster process taking its values in X × K, a second application of Exercise 6.3.5 implies that the overall process has a well-defined stationary version. We state this formally for reference.

Proposition 6.4.VII. Using the notation above, sufficient conditions for the existence of a stationary version of the marked Hawkes process with unpredictable marks are
(i) the intensity measure µ(·) is totally finite (and then taken to be a probability measure); and
(ii) ρ = E[ψ(κ)] < 1.

First- and second-order properties of the process can be obtained by combining results for branching processes with results for cluster processes and are given in Chapter 8. The p.g.fl. is difficult to obtain explicitly; one approach is suggested in Exercise 6.4.4. Many variations and extensions of this model are possible. Example 7.3(b) will show that the conditional intensity for this process has a very simple and powerful linear form, which lends itself to various types of generalization. The mark can be expanded to include a spatial as well as a size component, as for the spatial ETAS model described below. The assumption of unpredictable marks can also be weakened in several ways, for example by allowing the distributions of the marks of the offspring to depend on either the mark of the ancestor or the offspring's distance from the ancestor, or both. See Example 8.3(e) for a somewhat simpler model illustrating such dependence. If the branching structure is critical rather than subcritical (i.e. ρ = 1), further types of behaviour can occur. For example, if the infectivity function is sufficiently long-tailed, Brémaud and Massoulié (2001) provide examples of stationary Hawkes processes without immigration (i.e. of a Hawkes process
whose clusters overlap at such large distances that the process maintains a stationary regime). Further details are given in Chapter 10.

Example 6.4(d) Ordinary and spatial ETAS models. Ogata (1988) introduced the ETAS (Epidemic Type After-Shock) model to describe earthquake occurrence, following earlier applications of the Hawkes model to this context by Hawkes and Adamopoulos (1973) and Vere-Jones and Ozaki (1982). It corresponds to the special case of the marked Hawkes process where X = K = R, the xi are interpreted as the occurrence times of the earthquakes and the κi as their magnitudes, and the following specific choices are made:

ψ(κ) = A e^{α(κ−κ0)} I_{κ>κ0}(κ),
µ(du) = [ K/(c + u)^{1+p} ] I_{u>0}(u) du,
F(dκ) = β e^{−β(κ−κ0)} I_{κ>κ0}(κ) dκ.

These choices are dictated largely by seismological considerations: thus, the mark distribution cited above corresponds to the Gutenberg–Richter frequency–magnitude law, while the power-law form for µ follows the empirical Omori law for aftershock sequences. The free parameters are β, α, c, A and p; K = p c^p is a normalizing constant chosen to ensure ∫_0^∞ µ(du) = 1. In this case, sufficient conditions for a stationary process are that

p > 0,    β > α,    and    ρ = Aβ/(β − α) < 1.
The last condition in particular is physically somewhat unrealistic since it is well known that the frequency–magnitude distribution cannot retain the pure exponential form indefinitely, but must drop to zero much more quickly for very large magnitudes. An important extension involves adding locations to the description of the offspring so that the branching structure evolves in both space and time. Then, one obvious way of extending the model is to have the ground process include both space and time coordinates, retaining the same mark space K. From the computational point of view, however, and especially for the conditional intensity and likelihood analyses to be described in Chapter 7, there are advantages in keeping the ground process to the set of time points and regarding the spatial coordinates as additional dimensions of the mark. The weight (magnitude) component of the mark retains its unpredictable character (so the weights are i.i.d. given the past), but we allow the spatial component of the mark to be affected by the spatial location of its ancestor. No matter which of these descriptions we adopt, the cluster structure evolves over both space and time, offspring events occurring at various distances away from the initial ancestor, just as they follow it in time. When the branching structure is spatially homogeneous, the infectivity measure µ(dt × dx) depends both on the time delay u = t − t0 and the displacement y = x − x0 from the time and location of the ancestor (t0 , x0 ).
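The two closed forms just quoted, K = p c^p and ρ = Aβ/(β − α), are easy to confirm numerically. The sketch below uses illustrative parameter values (not from any fitted earthquake catalogue) and a simple trapezoidal rule on truncated grids.

```python
import numpy as np

# Numerical sanity checks on the ETAS ingredients.  Parameter values below
# are illustrative only: K = p*c^p normalizes the Omori kernel to a
# probability measure on (0, inf), and the branching ratio integrates to
# rho = A*beta/(beta - alpha) when beta > alpha.

def trapezoid(y, x):
    """Plain trapezoidal rule on a (possibly non-uniform) grid."""
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

A, alpha, beta, c, p = 0.4, 1.0, 2.0, 0.01, 1.1
kappa0 = 4.0
K = p * c**p

# Mass of mu(du) = K (c + u)^{-(1+p)} du; log-spaced grid resolves the peak at 0.
u = np.logspace(-8, 4, 200_001)
omori_mass = trapezoid(K / (c + u) ** (1 + p), u)

# rho = int_{kappa0}^inf psi(kappa) F(dkappa)
#     = int A*beta * exp((alpha - beta)(kappa - kappa0)) dkappa.
k = np.linspace(kappa0, kappa0 + 60.0, 200_001)
rho_num = trapezoid(A * beta * np.exp((alpha - beta) * (k - kappa0)), k)
rho_exact = A * beta / (beta - alpha)

print(omori_mass)           # close to 1
print(rho_num, rho_exact)   # close to 0.8 here: subcritical, so stationary
```

Raising α toward β drives ρ toward infinity, which is one numerical way to see why the exponential productivity law ψ must eventually fail for very large magnitudes.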
Various branching mechanisms of this type have been proposed in the literature [see e.g. Ogata (1998) for a review]. Thus, Vere-Jones and Musmeci (1992) suggests a space–time diffusion with infectivity density

µ(du × dy × dz) = βe^{−βu} · (1/(2πu σy σz)) exp{ −(1/(2u)) ( y²/σy² + z²/σz² ) } du dy dz,

whereas Ogata's space–time ETAS model uses a simpler product form for the space and time terms. Many choices are possible for the components of the model without affecting the underlying cluster character. In some applications, the assumption of spatial homogeneity may not be appropriate, so the infectivity or mark distribution may depend on the absolute location of the offspring as well as its separation from the ancestor. In all of this wide diversity of models, the basic sufficient condition for the existence of a stationary version of the model, essentially the subcriticality of the offspring branching process, is affected only insofar as the integral of the infectivity measure needs to be extended over space as well as time.

We conclude this section with a preliminary foray into the fascinating and also practically important realm of stochastic geometry. Marked point processes play an important role here as models for finite or denumerable families of random geometrical objects. The objects may be of many kinds: triplets or quadruplets of points (then, the process would be a special case of a cluster process), circles, line segments, triangles, spheres, and so on.

Definition 6.4.VIII (Particle process). A particle process is a point process with state space ΣX equal to the class of nonempty compact sets in X.

Thus, a typical realization of a particle process is a sequence, ordered in some way, of compact sets {K1, K2, . . .} from the c.s.m.s. X. An underlying difficulty with such a definition is that of finding a convenient metric for the space ΣX. One possibility is the Hausdorff metric defined by

ρ(K1, K2) = inf{ε > 0: K1 ⊆ K2^ε and K2 ⊆ K1^ε},

where K^ε is the halo set ∪_{x∈K} Sε(x) (see Appendix A2.2); for further references and discussion, see Stoyan et al. (1995), Stoyan and Stoyan (1994), and Molchanov (1997), amongst others. In special cases, when the elements are more specific geometrical objects such as spheres or line segments, this difficulty does not arise, as there are many suitable metrics at hand. Very often, interest centres on the union set or coverage process Ξ = ∪_i Si (see Hall, 1988), which is then an example of a random closed set in X. Now let us suppose that X = R^d and that for each compact set S ⊂ X we can identify a unique centre y(S), for example its centre of gravity. Then, we
may introduce an equivalence relation among the sets in ΣX by defining two compact sets to belong to the same equivalence class if they differ only by a translation. The sets in Σ^o ≡ Σ^o_X, the compact subsets of X with their centres at the origin, index the equivalence classes, so that every set S ∈ ΣX can be represented as the pair (y, S^o), where y ∈ X and S^o ∈ Σ^o, and S = y + S^o (set addition). This opens the way to defining the particle process as an MPP {yi, Si}, where the {yi} form a point process in X and the marks {Si} take their values in Σ^o. Once again, there is the problem of identifying a convenient metric on Σ^o, but this point aside, we have represented the original particle process as an example of a so-called germ–grain model in which the {yi} are the germs and the {Si} are the grains. The next example illustrates one of the most straightforward and widely used models of this type.

Example 6.4(e) Boolean model. This is the compound Poisson analogue for germ–grain models. We suppose that the locations {yi} form a Poisson process in X and that the compact sets S^o_i are i.i.d. and independent of the location process; write Si = yi + S^o_i. Two derived processes suggest themselves for special attention. One is the random measure Υ(·) formed by superposing the compact sets Si. With the addition of random weights Wi, this gives the bounded set A the (random) mass

Υ(A) = Σ_i Wi ℓ(A ∩ Si)    (A ∈ B_X),    (6.4.13)

where ℓ(·) is the reference measure on X (e.g. Lebesgue measure, or counting measure on a lattice). The other is the localized measure of the union set Ξ described above, which gives the bounded set A the (random) mass

Ψ(A) = ℓ(A ∩ Ξ) ≡ ℓ( ∪_i (A ∩ Si) ).    (6.4.14)

For example, (6.4.13) might represent the total mass of ejected material falling within the set A from a series of volcanic eruptions at different locations; then (6.4.14) would represent the area of A covered by the ejected material. In both cases, the processes can be represented in terms of densities forming random processes (random fields) on X. Thus, (6.4.13) and (6.4.14) have respective densities

υ(x) = Σ_i Wi I_{Si}(x)    (6.4.15)

and

ψ(x) = I_{∪_i Si}(x).    (6.4.16)
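For a single realization, the densities (6.4.15) and (6.4.16) can be evaluated pointwise without any calculus. The sketch below builds a planar Boolean model with disk grains; all distributional choices (Poisson germ count on the unit square, uniform radii, exponential weights) are illustrative.

```python
import numpy as np

# Pointwise evaluation of (6.4.15)-(6.4.16) for one realization of a planar
# Boolean model with disk grains.  All distributional choices here are
# illustrative: Poisson(lam) germs on the unit square, radii ~ U(0.05, 0.15),
# weights ~ Exp(1).

rng = np.random.default_rng(2)
lam = 40.0
n = rng.poisson(lam)                            # number of germs
centres = rng.uniform(0.0, 1.0, size=(n, 2))    # germ locations y_i
radii = rng.uniform(0.05, 0.15, size=n)         # disk radii R_i
weights = rng.exponential(1.0, size=n)          # weights W_i

def upsilon(x):
    """Weighted density (6.4.15): sum of W_i over the disks S_i covering x."""
    covered = np.linalg.norm(centres - x, axis=1) <= radii
    return float(weights[covered].sum())

def psi(x):
    """Union-set indicator (6.4.16): 1 if x lies in some disk S_i, else 0."""
    return float(np.any(np.linalg.norm(centres - x, axis=1) <= radii))

x = np.array([0.5, 0.5])
print(upsilon(x), psi(x))
```

With a.s. positive weights, υ(x) > 0 exactly where ψ(x) = 1, which is the pointwise form of the relation between the weighted measure Υ and the union set Ξ.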
Many aspects of these and related processes are studied in the stochastic geometry literature such as Matheron (1975), Stoyan et al. (1995) and Molchanov (1997). Here we restrict ourselves to a consideration of the mean and covariance functions of (6.4.15) and (6.4.16) under the more explicit assumptions that X = R², that the location process Ng of centres {y(Si)} = {yi} is a simple Poisson process with constant intensity λ, and that each Si is a
disk of random radius Ri and has weight Wi that may depend on Ri but that the pairs (Ri, Wi) are mutually independent and independent also of the centres {yi}. Consistent with our earlier description, we thus have an MPP on R², with mark space R+ × R+, and hence a point process N on R² × R²₊. The mean and covariance function for υ(x) can be found by first conditioning on the ground process Ng as in earlier examples. Thus, writing υ(x) as

υ(x) = ∫_{R²×R²₊} w I_{r ≥ ‖y−x‖}(r, y) N(dy × dr × dw)    (6.4.17)

and taking expectations, the independence assumptions coupled with the stationarity of the Poisson process yield

E[υ(x)] = λ E[ W ∫_{R²} I_{R ≥ ‖y‖}(R, y) dy ] = λ E[ W ∫_0^{2π} ∫_0^R r dr dθ ] = λπ E(W R²).

The second moment E[υ(x1)υ(x2)] can be found similarly by first conditioning on the {yi}. Terms involving both pairs of distinct locations and coincident locations (arising from the diagonal term in the second-moment measure of the location process) are involved. However, as for Poisson cluster processes, we find that the covariance cov[υ(x1), υ(x2)] depends only on the term involving coincident locations: it equals

E[ ∫_{R²×R+×R+} w² I_{r ≥ ‖y−x1‖, r ≥ ‖y−x2‖}(r, y) N(dy × dr × dw) ]
  = λ E[ W² ∫_{R²} I_{R ≥ max(‖y−x1‖, ‖y−x2‖)}(R, y) dy ]
  = 2λ E[ W² ( R² arccos(u/R) − u √(R² − u²) ) I_{R≥u}(R) ],

where u = ½‖x1 − x2‖. Note that the first moment is independent of x and the covariance is a function only of x1 − x2, as we should expect from the stationary, isotropic character of the generating process. Note also that if the radius R is fixed, the covariance vanishes for ‖x1 − x2‖ > 2R. The resemblance of these formulae to those for Poisson cluster processes is hardly coincidental. From a more general point of view, the process is a special case of LeCam's precipitation model in Exercise 6.3.1, where the Poisson cluster structure is generalized to cluster random measures. Some details and extensions are indicated in Exercise 6.4.6. The corresponding formulae for the union process present quite different and, in general, much harder problems since we lose the additive structure for the independent contributions to the sum process. The first moment E[ψ(x)] represents the volume fraction of space (in this case area) occupied
by the union set Ξ. It can be approached by the following argument, which is characteristic for properties of the Boolean model. First, note that

1 − E[ψ(x)] = 1 − Pr{Ξ ∋ x} = Pr{Ξ ∌ x} = E[ Π_i (1 − I_{Si}(x)) ].

Conditioning on the locations {yi} (i.e. on the ground process Ng), we can write

Pr{Ξ ∌ x | Ng} = Π_i Pr{Ri < ‖x − yi‖} = Π_i h(yi; x),

say, where h(y; x) = E[I_{[0, ‖y−x‖)}(R)] and R has the common distribution of the i.i.d. radii Ri. Removing the conditioning, we have

1 − E[ψ(x)] = E[ Π_i h(yi; x) ] = Gg[h(· ; x)] = exp( −λ ∫_{R²} [1 − h(y; x)] dy ).

Substituting for h(y; x) and simplifying, we obtain for the mean density the constant

p* ≡ E[ψ(x)] = 1 − e^{−λ E(πR²)}.    (6.4.18)

For the second product moment, using similar reasoning, we have

m2(x1, x2) = E[ψ(x1)ψ(x2)] = Pr{Ξ ∋ x1, Ξ ∋ x2}
  = Pr{Ξ ∋ x1} + Pr{Ξ ∋ x2} − [1 − Pr{Ξ ∌ x1, Ξ ∌ x2}]
  = 2p* − 1 + Gg[h(· ; x1, x2)],

say, where h(y; x1, x2) = E[I_{[0, min(‖y−x1‖, ‖y−x2‖))}(R)]. Substituting for the p.g.fl. of the Poisson ground process, putting u = ½‖x1 − x2‖ and simplifying, we find that m2(x1, x2) equals

2p* − 1 + exp{ −λ E[ πR²(1 + I_{R<u}(R)) + 2 I_{R≥u}(R) ( R² arcsin(u/R) + u √(R² − u²) ) ] }.
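The volume-fraction formula (6.4.18) lends itself to a direct Monte Carlo check: a fixed point x is covered if and only if some germ within distance R_max of x carries a disk reaching x, so it suffices to simulate the germs in a bounded window around x. The sketch below uses the illustrative choices λ = 1 and R ~ U(0, 1), for which E(πR²) = π/3.

```python
import numpy as np

# Monte Carlo check of p* = 1 - exp(-lam * E(pi R^2)) from (6.4.18),
# for a planar Boolean model with illustrative choices lam = 1 and
# R ~ U(0, 1), so R_max = 1 and only germs in [-1, 1]^2 can cover x = 0.

rng = np.random.default_rng(3)
lam, trials = 1.0, 100_000

counts = rng.poisson(lam * 4.0, size=trials)          # germ count per trial in [-1,1]^2
total = counts.sum()
pos = rng.uniform(-1.0, 1.0, size=(total, 2))          # germ positions, all trials pooled
R = rng.uniform(0.0, 1.0, size=total)                  # i.i.d. radii
covers = np.linalg.norm(pos, axis=1) <= R              # does this grain cover the origin?

trial_of = np.repeat(np.arange(trials), counts)        # map each grain back to its trial
covered = np.bincount(trial_of, weights=covers, minlength=trials) > 0

p_emp = covered.mean()
p_star = 1.0 - np.exp(-lam * np.pi / 3.0)              # E(pi R^2) = pi/3 for R ~ U(0,1)
print(p_emp, p_star)
```

The same pooled-simulation layout extends to the second product moment by testing coverage of two points at separation 2u simultaneously.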
Exercises and Complements to Section 6.4

6.4.1 For the atomic random measure ξ with independent marks as in Proposition 6.4.V, show that the third-order moment measure M3ξ(A1 × A2 × A3) equals

∫_{A1×A2×A3} µ1(x1) µ1(x2) µ1(x3) M^g_[3](dx1 × dx2 × dx3)
  + ( ∫_{A1×A23} + ∫_{A2×A31} + ∫_{A3×A12} ) µ1(x1) µ2(x2) M^g_[2](dx1 × dx2)
  + ∫_{A1∩A2∩A3} µ3(x1) M^g_1(dx1),

where Aij = Ai ∩ Aj for i ≠ j.
[Hint: Each side is the coefficient of (1/6)s³ in the respective expansions of (6.4.7) with argument sh(·), using (6.1.9) for the Laplace functional and (5.5.4) for the p.g.fl., with φ_{sh} as in (6.4.7) and

η(x) = φ_{sh}(x) − 1 = −sh(x)µ1(x) + ½s²[h(x)]²µ2(x) − (1/6)s³[h(x)]³µ3(x) + · · · ,

where µr(x) = ∫_K κ^r F(dκ | x), r = 1, 2, 3. The general case now follows by appealing to the symmetry (invariance under permutations of the axes) of the moment measures.]

6.4.2 Develop formulae, analogous to those of Proposition 6.4.V, for characteristic functionals of MPPs with marks in R. Use these to extend the results of Proposition 6.4.V to the case where ξ may be a signed measure.

6.4.3 Cramér bound on probability of ruin. For the compound risk process, verify the following results [with notation as for Example 6.4(a)].
(i) The sequence Un − U0 forms a random walk whose steps have mean α/µ − E(W).
(ii) If ruin occurs, then it does so at the first time point tn for which Un < 0.
(iii) If α ≤ µE(W), then ruin is certain, but if α > µE(W), then there is positive probability that ruin will never occur.
(iv) In the latter case, if the Laplace–Stieltjes transform E(e^{−sW}) is an entire function of s, then there exists a positive real s* such that E(e^{s*Z1}) = 1, where Zi = Wi − ατi as in Example 6.4(a).
(v) The sequence {ζn} = {exp(−s*Un)} constitutes a martingale for which the time of ruin is a stopping time.
(vi) Let pM denote the probability that ruin occurs before the accumulated reserves reach a large number M. Deduce from the martingale property that

pM E[exp(s*∆0) | ruin at 0] + (1 − pM) E[exp(−s*∆M) | reserves reach M] = exp(−s*U0),

where −∆0 and ∆M are the respective overshoots at 0 and M.
(vii) Hence, obtain the Cramér bound for the probability of ultimate ruin,

p = lim_{M→∞} pM ≤ exp(−s*U0).
6.4.4 Find first and second factorial moment measures for the ground processes of the marked and space–time Hawkes processes described in Example 6.4(c). [Hint: Use the cluster process representation much as in Example 6.3(c).]

6.4.5 Study the Laplace functional and moment measures for the random measure ξ for a Hawkes process with unpredictable marks. [Hint: Use the cluster representation to get a general form for the p.g.fl. of the process as a process on X × K. From it, develop equations for the first and second moments.] Are explicit results available?

6.4.6 Formulate the process Υ(A) in (6.4.13) as an example of a LeCam process (see Exercise 6.3.1). Show that in the special case considered in (6.4.17), when the random sets are spheres [= disks in R²] with random radii, we can write
Lξ[f | x] = E[ exp( −W ∫_{R²} f(y) I_{R ≥ ‖x−y‖}(y) dy ) ].
Derive expressions for the mean and covariance functions of υ(x) as corollaries.
6.4.7 Higher-order moments of the union set. In the context of the union set Ξ of the Boolean model of Example 6.4(e), show that the kth product moment E[ψ(x1) · · · ψ(xk)] = Pr{Ξ ∋ xj (j = 1, . . . , k)}, for k distinct points x1, . . . , xk in X = R², equals

1 + Σ_{r=1}^k (−1)^r Σ^(r) q(xj1, . . . , xjr),

where Σ^(r) denotes the sum over all distinct r-tuplets of the set {x1, . . . , xk}, q(x1, . . . , xr) = Gg[h(· ; x1, . . . , xr)], and the function h(y ; x1, . . . , xr) = Pr{R < min_{1≤j≤r} ‖xj − y‖}.
[Hint: The relation arises from taking expectations in the expansion of products of indicator random variables

I{Ξ ∋ all xj} = Π_j I{Ξ ∋ xj} = Π_j (1 − I{Ξ ∌ xj}) = 1 + Σ_{r=1}^k (−1)^r Σ^(r) Π_{ℓ=1}^r I{Ξ ∌ xjℓ}

and

Π_{ℓ=1}^r I{Ξ ∌ xjℓ} = Π_{ℓ=1}^r Π_i I{Si ∌ xjℓ} = Π_i Π_{ℓ=1}^r I{Si ∌ xjℓ},

and the conditional expectation of the last product, given the locations {yi}, equals Π_i h(yi ; xj1, . . . , xjr), as indicated.]
CHAPTER 7
Conditional Intensities and Likelihoods
A notable absence from the previous chapter was any discussion of likelihood functions. There is a good reason for this absence: the likelihood functions for most of the processes discussed in that chapter are relatively intractable. This difficulty was a block to the application of general point process models until the late 1960s, when a quite different approach was introduced in papers on filtering theory pioneered by the electrical engineers: see for example Yashin (1970), Snyder (1972), Boel, Varaiya and Wong (1975), Snyder (1975; 2nd ed. Snyder and Miller, 1991), and Kailath and Segall (1975). This approach led to the concept of the conditional intensity function. Once recognised, its role in elucidating the structure of point process likelihoods was soon exploited. General definitions of the conditional intensity function were given in Rubin (1972) and especially by Brémaud (1972), in whose work conditional intensity functions were rigorously defined and applied to likelihood and other problems (see also Brémaud, 1981). Even earlier, Gaver (1963) had introduced what is essentially the same concept through his notion of a random hazard function. Many of these ideas came together in the 1971 Point Process Conference (Lewis, 1972), as a result of which the links between likelihoods, conditional intensities, the theoretical work of Watanabe (1964) and Kunita and Watanabe (1967), and the more practical approaches of Gaver, Hawkes (1971a, b) and Cox (1972a) became more evident. Later, Liptser and Shiryayev (1974, 1977, 1978; 2nd ed. 2000) gave a comprehensive theoretical treatment, while Brémaud (1981) gave a more accessible account that emphasises applications to queueing theory; the same emphasis appears in Baccelli and Brémaud (1994). The last two decades have seen the systematic development and application of these ideas to applied problems in many fields, perhaps especially in conjunction with techniques for simulating and predicting point processes.
Throughout this chapter runs the theme of delineating classes of models for which the conditional intensity function, and hence the likelihood, has a relatively simple form. A key requirement is that the point process should have an evolutionary character: at any time, the current risk—which is just informal terminology for the conditional intensity function—should be explicitly expressible in terms of the past of the process. Many simple point processes in time, including stationary and nonstationary Poisson processes, renewal and Wold processes, and Hawkes processes, fall into this category. So too do many marked point processes in time and also space–time processes, provided that the current distributions of the marks and spatial locations, as well as the current risk, are explicitly expressible in terms of the past. Purely spatial processes—so-called spatial point patterns—cannot be handled so readily this way because they lack a time-like, evolutionary dimension. Nor can processes such as the Neyman–Scott cluster process, in which estimation of the current risk requires averaging over complex combinations of circumstances. However, in some cases of this type, filtering and related iterative techniques can sometimes provide a route forward; they are discussed further in Chapters 14 and 15 alongside the more careful theoretical analysis required to handle conditional intensity functions in a general context. This chapter provides an informal treatment of these issues. We start with a brief introduction to point process likelihoods for a.s. finite point processes, based on the Janossy densities introduced in Chapter 5. In principle the methods can be applied to observations on a general point process observed within a bounded observation region, but in practice the usefulness of this approach is severely curtailed by the difficulty of writing down the Janossy densities for the process within the observation region in terms of a global specification of the process. In Section 7.2, we move to the representation of the likelihood of a simple point process evolving in time.
Here the technique of successive conditionings on the past, as the process evolves in time, reduces the difficulty above to that of specifying initial conditions for the process. It leads to a simple and powerful representation of the likelihood in terms of the conditional intensity function. Then, in Section 7.3 we examine the extension of these ideas to marked and space–time point processes, where the process retains an evolutionary character along the time axis. Section 7.4 is devoted to the discussion of intensity-based random time changes, which have the effect of reducing a general initial process to a simple or compound Poisson process. The time changes are motivated by their applications to goodness-of-fit procedures based on the technique of ‘residual point process analysis’. The concluding Sections 7.5 and 7.6 are concerned with uses of the conditional intensity for testing, simulating, and forecasting such processes, and with the links between point process entropy and the evaluation of probability forecasts.
7.1. Likelihoods and Janossy Densities

In the abstract at least, there are no special difficulties involved in the notion of a point process likelihood. Granted a realization (x1, . . . , xn) in some subset
A of the state space X, we require the joint probability density of the xi with respect to a convenient reference measure, which when X = Rd is commonly the n-fold product of Lebesgue measure on Rd. As usual, the likelihood should be considered as a function of the parameters defining the joint density and not as a function of the xi and n, which are taken as given. The density here is for an unordered set of points; it represents loosely the probability of finding particles at each of the locations xi and nowhere else within A, and so it is nothing other than the local Janossy density (Definition 5.4.IV) jn(x1, . . . , xn | A) for the point process restricted to A. These considerations are formalized in the following two definitions.

Definition 7.1.I. (a) Given a bounded Borel set A ⊆ Rd, a point process N on X = Rd is regular on A if for all integers k ≥ 1 the local Janossy measures Jk(dx1 × · · · × dxk | A) of Section 5.4 are absolutely continuous on A^(k) with respect to Lebesgue measure in X^(k).
(b) It is regular if it is regular on A for all bounded A ∈ B(Rd).

Proposition 5.4.V implies that a regular point process is necessarily simple.

Definition 7.1.II. The likelihood of a realization x1, . . . , xn of a regular point process N on a bounded Borel set A ⊆ Rd, where n = N(A), is the local Janossy density

L_A(x1, . . . , xn) = jn(x1, . . . , xn | A).    (7.1.1)
For convenience, we often abbreviate L_A to L. When the whole point process is a.s. finite, and the set A coincides with the space X, the situation is particularly simple. In many cases, the likelihood can be written down immediately from the definition; some examples follow.

Example 7.1(a) Finite inhomogeneous Poisson process in A ⊂ R^d. Suppose the process has intensity measure Λ(·) with density λ(x) with respect to Lebesgue measure on R^d. It follows from the results in Section 2.4 that the total number of points in A has a Poisson distribution with mean Λ(A) and that, conditional on the number N of such points, the points themselves are i.i.d. on A with common density λ(x)/Λ(A). Suppose we observe the points {x_1, …, x_n} within A, with n = N(A). In this case, we may assume X = A without any effective loss of generality, as the complete independence property ensures that the behaviour within A is unaffected by the realization of the process outside A. Then, taking logs of the Janossy density gives for the log likelihood the formula

    log L(x_1, …, x_n) = Σ_{i=1}^n log λ(x_i) − ∫_A λ(x) dx,        (7.1.2)

of which (2.1.9) is the special case X = R. This example continues shortly.
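As a numerical footnote to (7.1.2), the sketch below evaluates the log likelihood for a hypothetical intensity λ(x) = C·φ(x) on A = [0, 1], with φ(x) = 2x a probability density on A; the density, the realization, and the grid of trial values of C are all invented for illustration. It also checks the elementary fact, exploited when this example is continued below, that the profiled term N log C − C is largest at C = N.

```python
import math

def poisson_loglik(points, lam, integral_lam_A):
    # log L of (7.1.2): sum_i log lam(x_i) - integral of lam over A
    return sum(math.log(lam(x)) for x in points) - integral_lam_A

# Illustration on A = [0, 1]: lam(x) = C * phi(x) with phi(x) = 2x, a
# density on A, so the integral of lam over A is just C.
# (phi and the realization below are invented for illustration.)
points = [0.3, 0.55, 0.9]          # a hypothetical realization, n = N(A) = 3

def profile_ll(C):
    return poisson_loglik(points, lambda x: C * 2.0 * x, C)

# The maximum likelihood estimate of C is N = len(points):
# N log C - C is maximized at C = N.
best = max([1.0, 2.0, 3.0, 4.0, 5.0], key=profile_ll)
```

The same function evaluates (7.1.2) for any intensity once the integral of λ over A is available, in closed form or by quadrature.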
Equation (7.1.2) is basic to the likelihood theory of evolutionary processes. As we shall see in the next section, it extends to a wide range of such processes, provided the rate λ(t) is interpreted in a sufficiently broad manner. Another important use for the likelihood in (7.1.2) is as a reference measure for the more general concept of the likelihood ratio. Let N, N′ be two point processes defined on a common state space X and with probability measures P, P′, respectively, on some common probability space (Ω, E). By a mild abuse of language, we shall say that N is absolutely continuous with respect to N′, denoting it N ≪ N′, if P is absolutely continuous with respect to P′. In talking about a finite point process on a bounded Borel subset A of R^d, the appropriate probability space is A^∪ [see (5.3.8)], and an appropriate reference measure is that of a Poisson process on A with constant intensity. In this context, we have the following result.

Proposition 7.1.III. Let N, N′ be point processes defined on the c.s.m.s. X = R^d, and let A be a bounded Borel set ⊂ R^d. Then N ≪ N′ on A if and only if for each k > 0 the local Janossy measures J_k(· | A) and J′_k(· | A) associated with N and N′, respectively, satisfy J_k(· | A) ≪ J′_k(· | A). In particular, if N′ is the Poisson process with constant intensity λ > 0, then N ≪ N′ if and only if N is regular on A.

Proof. If N′ vanishes identically on A, the conclusion is trivial, so we suppose this is not the case. Recall from the discussion around Proposition 5.3.II that an event E from A^∪ has the structure E = ⋃_{k=0}^∞ S_k, where each S_k is a symmetric set, i.e. an element of B^{(k)}_{sym}(A) (see Exercise 5.3.5). To establish the absolute continuity N ≪ N′ on A, we have to show that if P, P′ are the probability measures induced on A^∪ by N, N′, then P(E) = 0 whenever P′(E) = 0. Since N′ is not identically zero, P′(E) = 0 only if S_0 = ∅ and P′(S_k) = 0 for all k > 0. It is enough here to suppose that S_k is the symmetrized form of a product set A_1 × · · · × A_k, where the A_i form a partition of A, since product sets of this form generate the symmetric sets in A^{(k)}. Then, from the definition of the local Janossy measures,

    k! P(S_k) = J_k(A_1 × · · · × A_k | A) = J_k(S_k | A).

Similarly, k! P′(S_k) = J′_k(A_1 × · · · × A_k | A). Thus, if P′(E) = 0, then for each k, J′_k(S_k | A) = 0, and if J_k(· | A) ≪ J′_k(· | A), then J_k(S_k | A) = k! P(S_k) = 0 as well, so P(E) = 0. The same equivalences establish the converse relation. If, in particular, N′ is the Poisson process on A with constant intensity λ, then

    J′_k(S_k | A) = k! P′(S_k) = Π_{i=1}^k λℓ(A_i) · e^{−λℓ(A)},
7.1.
Likelihoods and Janossy Densities
215
where ℓ is Lebesgue measure in R^d. Thus, each local Janossy measure J′_k(· | A) is proportional to Lebesgue measure in (R^d)^k, so J_k(· | A) ≪ J′_k(· | A) for all k > 0 if and only if N is regular.

When densities are known explicitly for both processes, the likelihood ratio for a realization {x_1, …, x_n} within A is the ratio of the two Janossy densities of order n for the process on A. When the reference measure is that of a Poisson process with unit intensity, P^# say, this can be written

    L_A / L_A^# = e^{ℓ(A)} j_n(x_1, …, x_n | A).        (7.1.3a)
In other words, it is directly proportional to the Janossy measure itself. Alternatively, (7.1.3a), or more properly the collection of such expressions for all integers n, can be regarded simply as the density of the given point process on A^∪ relative to the Poisson process measure as a reference measure. Written out in full, the Radon–Nikodym derivative for the two measures on A^∪ takes the form (see Exercise 5.3.8)

    (dP/dP^#)(ω) = e^{ℓ(A)} [ j_0 I_{{N(A)=0}} + Σ_{n=1}^∞ j_n(x_1, …, x_n) I_{{N(A)=n}} ].        (7.1.3b)

We look again at the inhomogeneous Poisson process example in this light.

Example 7.1(a) (continued). As in (7.1.2), P_A denotes the distribution associated with an inhomogeneous Poisson process with intensity λ(x). Then, the log likelihood ratio relative to the unit-rate Poisson takes the form

    log(L_A / L_A^#) = Σ_{i=1}^N log λ(x_i) − ∫_A [λ(x) − 1] dx.
One further manipulation of this equation is worth pointing out. Suppose that λ(x) has the form λ(x) = Cφ(x), where C is a positive scale parameter and φ(x) is normalized so that ∫_A φ(x) dx = 1. Then (7.1.3) becomes

    log(L_A / L_A^#) = N log C + Σ_{i=1}^N log φ(x_i) − C + ℓ(A).
Differentiation with respect to C yields the maximum likelihood estimate

    Ĉ = N,

and it is clear that here N is a sufficient statistic for C. Moreover, substituting this value back into the likelihood yields L̂_A, say, and the ratio becomes

    log(L̂_A / L_A^#) = N log N − N + ℓ(A) + Σ_{i=1}^N log φ(x_i).
Apart from a constant term, this is the same expression as would be obtained by first conditioning on N, when the likelihood reduces to that for N independent observations on the distribution with density φ. Clearly, in this situation, estimates based on Poisson observations with variable N yield the same results as estimates obtained by first conditioning on N, a statement that is not true with other distributions even asymptotically. Finally, consider the model with constant but arbitrary (unknown) rate C, so that λ(x) = C/ℓ(A), with likelihood L_A^0, say. We find as a special case of the above

    log(L̂_A^0 / L_A^#) = N log N − N + ℓ(A) − N log ℓ(A),

from which

    log(L̂_A / L̂_A^0) = Σ_{i=1}^N log φ(x_i) + N log ℓ(A).
Thus, the term on the right-hand side is the increment to the log likelihood ratio achieved by fitting a model with density proportional to φ(x) over a model with constant density. This elementary observation often provides a useful reduction in the complexity of numerical computations involving Poisson models.

The next three examples form some of the key models in representing spatial point patterns within finite regions. Although the likelihoods can be given in more or less explicit form, explicit analytic forms for other characteristics of the process—moment and covariance densities, for example—are not easy to find, mainly because of the intricate links between the numbers and locations of particles within a given region. Another major problem is that, in many important examples, the characteristics of the process are not given directly in terms of the local Janossy measures for the process on A but in terms of global characteristics from which the local characteristics have to be derived. If the process is defined directly in terms of the local Janossy measures, then it is assumed, either tacitly or otherwise, that any effects from points outside the observation region A have been incorporated into the definitions or ignored. If this is not the case—if, for example, one wishes to fit a stationary version of a process with specified interaction potentials—the situation becomes considerably more complex. Allowing for the influence exerted in an average sense by points outside A amounts to nothing less than a generalized version of the Ising problem, where the issue was first posed in the context of magnetized particles in a one-dimensional continuum. The issue is discussed further around Example 7.1(e) and in Chapter 15. In the next three examples, this difficulty is avoided by assuming that the process is totally finite on X and that X = A.

Example 7.1(b) Finite Gibbs processes on X; pairwise interaction systems [see Example 5.3(c)]. An important class of examples from theoretical physics
was introduced in Example 5.3(c), with Janossy densities and hence likelihoods of the form

    L(x_1, …, x_n) = C(θ) exp[−θU(x_1, …, x_n)],        (7.1.4)

where U can be expressed as a sum of interaction potentials, and the partition function C(θ) is chosen to satisfy the normalization condition of equation (5.3.7). In the practically important case of pairwise interactions, only first- and second-order interaction terms are present, and U takes the form

    U(x_1, …, x_n) = Σ_{i=1}^n ψ_1(x_i) + Σ_{1 ≤ i < j ≤ n} ψ_2(x_i, x_j).
Although such models have a valuable flexibility in modelling different types of spatial interactions, their initial attractiveness is somewhat countered by the difficulty of expressing the partition function C(θ) in terms of the other parameters of the model. In fact, exact expressions for the likelihood do not seem to be available in any cases where the second-order term is nontrivial. Ogata and Tanemura (1981) advocate using the approximations (virial expansions) developed by physicists for this purpose, but even so the computations are laborious and their accuracy uncertain. Diggle et al. (1994) compare different numerical approximations. More recent work has focussed on Markov chain Monte Carlo (MCMC) approximations, where the equilibrium solution is obtained numerically as a long-term average of simulations of a Markov chain having the required distribution as its stationary distribution (see e.g. Häggström et al., 1999; Andersson and Britton, 2000, Chapter 11). By judicious choice of the Markov chain transition probabilities, the normalizing constant can be made to disappear from the estimates (e.g. Exercise 7.1.7). Another technique that obviates the need to evaluate the normalizing constant explicitly is to replace the true likelihood L by the pseudolikelihood L† defined by

    L†(x_1, …, x_n) = Π_{k=1}^n [ j_n(x_1, …, x_n) / j_{n−1}({x_1, …, x_n} \ x_k) ].
Since this involves a ratio of Janossy densities, the normalizing constant disappears. It is very much easier, therefore, to derive the pseudolikelihood estimates for a model of this kind than it is to derive the true maximum likelihood estimates. On the other hand, the properties of estimates obtained by maximizing the pseudolikelihood, for example their consistency or asymptotic normality, are currently only partially resolved. In practice, they behave in much the same way as standard maximum likelihood estimates, and it seems likely that in time the theory of both will be subsumed under a more general umbrella. See Baddeley (2001) for examples and further discussion.

Example 7.1(c) Strauss processes; hard-core models (Strauss, 1975; Kelly and Ripley, 1976). Strauss processes are the special cases of the model above
when ψ_1 is a constant α and ψ_2(x_i, x_j) has a fixed value β within the range ‖x_i − x_j‖ < R, for some fixed R < ∞, and is zero outside it. In this case, the Janossy density takes the form

    j_n(x_1, …, x_n) = C(α, β, R) α^n β^m,

where m = m(x_1, …, x_n) is the number of distinct pairs x_i, x_j for which ‖x_i − x_j‖ < R. The Janossy density is constant on hypercylinders around the diagonals x_i = x_j and their intersections in X^{(n)}. For the process to be well defined, the sum of the Janossy measures must converge [see equation (5.3.9)], which occurs if and only if either β < 1, or β = 1 and α ≤ 1 (cf. Exercise 7.1.8). The condition β < 1 implies some degree of repulsion between points, implying underdispersion relative to the Poisson process. In particular, the choice β = 0 corresponds to a so-called hard-core model, in which points cannot come closer than a distance R to each other. Other examples of hard-core models appear in Section 8.3. For other values of α and β, the series of Janossy measures diverges, so that they no longer correspond to a well-defined finite point process. Thus, the process cannot be used directly to model clustering, but modified Strauss processes with β > 1 can be produced by weighting the Janossy densities with a sequence of constants, w_n say, chosen to ensure convergence of the Janossy measures. The most extreme case, setting w_n = 1 for some selected value of n and w_n = 0 otherwise, corresponds to conditioning on an outcome of fixed size n. See Kelly and Ripley (1976) and Exercise 7.1.8 for details.

Example 7.1(d) Markov point processes (Ripley and Kelly, 1977). In order to introduce some concept of Markovianity into the unordered context of spatial point processes, Ripley and Kelly first assume the existence of a relationship ∼ among the points {x_i} of a realization. When x_i ∼ x_j, the points (x_i, x_j) are said to belong to the same clique or neighbourhood class.
Given any realization of the process, the points may be uniquely divided up into cliques, where a point x_i forms a clique by itself if there are no other points x_j in the realization for which x_i ∼ x_j. Let ϕ: X^∪ → R_+ be a function defined on cliques V and taking real positive values. Then, a finite point process is said to be a Markov point process if the Janossy density for a realization with a total of N points coming from V cliques V_k with N_k points in V_k takes the form

    j_N(x_1, …, x_N | A) = C Π_{k=1}^V ϕ(V_k),        (7.1.5)

where N = Σ_k N_k and C is a normalization constant chosen to ensure the Janossy measures satisfy condition (5.3.7). This is equivalent to requiring that the density relative to a unit-rate Poisson process is always proportional to the product Π_{k=1}^V ϕ(V_k), no matter how many points the realization may contain.
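The hard-core and Strauss computations above reduce to counting close pairs and cliques. The sketch below (all coordinates and the values of α, β, R are invented for illustration) evaluates the Strauss Janossy density up to its normalizing constant, shows that a ratio j_n/j_{n−1} of the kind entering the pseudolikelihood is free of that constant, and splits a realization into cliques so that the product in (7.1.5) can be formed.

```python
import math

def n_close_pairs(pts, R):
    # m(x_1,...,x_n): number of distinct pairs closer than R
    n = len(pts)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if math.dist(pts[i], pts[j]) < R)

def strauss_unnorm(pts, alpha, beta, R):
    # Strauss Janossy density alpha^n * beta^m, up to C(alpha, beta, R)
    return alpha ** len(pts) * beta ** n_close_pairs(pts, R)

def cliques(pts, R):
    # split a realization into cliques: connected components of the
    # graph with an edge whenever ||x_i - x_j|| < R
    unvisited, comps = set(range(len(pts))), []
    while unvisited:
        stack, comp = [unvisited.pop()], []
        while stack:
            i = stack.pop()
            comp.append(i)
            near = {j for j in unvisited if math.dist(pts[i], pts[j]) < R}
            unvisited -= near
            stack.extend(near)
        comps.append(sorted(comp))
    return comps

def markov_unnorm(pts, R, phi):
    # the product over cliques in (7.1.5), up to the constant C
    out = 1.0
    for comp in cliques(pts, R):
        out *= phi([pts[i] for i in comp])
    return out

pts = [(0.10, 0.10), (0.15, 0.10), (0.90, 0.90)]   # invented realization
alpha, beta, R = 2.0, 0.5, 0.1

# Ratio j_n / j_{n-1}: the normalizing constant cancels, as in the
# pseudolikelihood; removing the last (isolated) point leaves just alpha.
ratio = (strauss_unnorm(pts, alpha, beta, R)
         / strauss_unnorm(pts[:-1], alpha, beta, R))

# Hard-core choice of phi: zero on any clique of two or more points.
hard_core_phi = lambda V: 0.0 if len(V) >= 2 else alpha
```

With the hard-core ϕ, the product vanishes as soon as any two points fall within R of each other, exactly as the hard-core Strauss density does.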
A common choice is to take x_i ∼ x_j if ‖x_i − x_j‖ < R. We leave the reader to verify that this leads to a well-defined equivalence relation and that if ϕ(V) = 0 when N(V) ≥ 2 and ϕ(V) = α otherwise, then we recover the hard-core version of the Strauss model. Many other important examples of spatial point processes may be put into this form, although the appropriate definitions of clique and the function ϕ may take some teasing out. A more extended discussion of Markov point processes is given in Chapter 10.

In some examples, it is possible to take advantage of a simple expression for the log p.g.fl.; this generally leads to simple expressions for the Khinchin measures, which can then be used to construct the Janossy measures via the combinatorial formulae (5.5.31). The simplest example is the Poisson process, for which only the first Khinchin measure is nonzero, so in the notation of Exercise 5.5.8 we have, say,

    K_0 = − log p_0(A) = ∫_A λ(x) dx = Λ(A),        k_1(x | A) = λ(x).

Then, from (5.5.31), we have j_n(x_1, …, x_n) = p_0(A) Π_{i=1}^n λ(x_i), as used in (7.1.3a). The next most complicated example of this type is the Gauss–Poisson process described in detail in Example 6.3(d), for which just the first two of the Khinchin measures are nonzero.

At this point, we meet an example of the difficulty referred to in the discussion preceding Example 7.1(b). The defining quantities for the Gauss–Poisson process are the measures Q_1(dx) and Q_2(dx_1 × dx_2) described in Proposition 6.3.IV. If the process is observed on a bounded set A, then we have to determine whether these quantities are given explicitly for the process on A or quite generally for the process on the whole of R. In the former case, the analysis can proceed directly and is outlined in Example 7.1(e)(i) below. In the latter case, however, and specifically in the case where we want to fit a model with densities q_1(x) ≡ q_1, q_2(x_1, x_2) = q(x_1 − x_2) corresponding to a stationary version of the process, it is not clear how to allow for the interactions with points of the process lying outside of A and hence unobserved. It turns out that, for this particular model, explicit corrections for the average influence of such outside points can be made, and amount to modifying the parameters for the process observed on A. This discussion is outlined in Example 7.1(e)(ii).

Example 7.1(e) (i) Gauss–Poisson process on a bounded Borel set A. From (6.3.30) or Exercise 6.3.12, we know that the log p.g.fl. of a Gauss–Poisson process defined on a bounded Borel set A as state space has the expansion

    − log G[h] = ∫_A [1 − h(x)] K_1(dx) + ∫_{A^{(2)}} [1 − h(x)h(y)] K_2(dx × dy).
Assume that K_1(dx) = µ(x) dx and K_2(dx × dy) = ½ q(x − y) dx dy for some function µ(·) and some symmetric function q(·). Then, the Khinchin densities k_r are given by

    k_1(x) = µ(x),        k_2(x, y) = q(x − y),        k_r(·) = 0  (all r = 3, 4, …),

and

    K_0 = − log p_0(A) = ∫_A µ(x) dx + ½ ∫_A ∫_A q(x − y) dx dy
        = ∫_A k_1(x) dx + ½ ∫_A ∫_A k_2(x, y) dx dy.
We turn to the expansion of the Janossy densities in terms of Khinchin densities given by equation (5.5.31), namely

    j_n(x_1, …, x_n | A) = exp(−K_0) Σ_{r=1}^n Σ_{T∈P_{rn}} Π_{i=1}^r k_{|S_i(T)|}(x_{i1}, …, x_{i,|S_i(T)|}),

where the inner summation is taken over all partitions T of x_1, …, x_n into r subsets as described above Lemma 5.2.VI. The only nonzero terms arising in this summation are those relating to partitions into sets of sizes 1 and 2 exclusively. This leads to the form for the Janossy densities

    j_n(x_1, …, x_n | A) = p_0(A) Σ_{k=0}^{[n/2]} Σ* µ(x_{ī_1}) · · · µ(x_{ī_{n−2k}}) q(x_{i_1} − x_{i_2}) · · · q(x_{i_{2k−1}} − x_{i_{2k}}),        (7.1.6)
where the summation Σ* extends over the n!/[(n − 2k)! 2^k k!] distinct sets of k pairs of different indices (i_1, i_2), …, (i_{2k−1}, i_{2k}) from {1, …, n} satisfying i_{2j−1} < i_{2j} (j = 1, …, k) and i_1 < i_3 < · · · < i_{2k−1}, and {ī_1, …, ī_{n−2k}} is the complementary set of indices. Given a realization x_1, …, x_n of a Gauss–Poisson process on a set A, its likelihood is then j_n(x_1, …, x_n | A), which is in principle computable but in practice is somewhat complex as soon as n is of moderate size. Newman (1970) established (7.1.6) by an induction argument.

(ii) Stationary Gauss–Poisson process. In the specific case of a stationary (translation-invariant) Gauss–Poisson process, we can proceed as follows. The global process is defined by two global parameters, a mean density, say m, and a factorial covariance measure C̆_[2], which we shall assume to have density q(x − y). From these we can obtain versions of the local Khinchin densities from equations analogous to (5.4.11),

    k_1(x | A) = c_{[1]}(x) + Σ_{j=1}^∞ [(−1)^j / j!] ∫_{A^{(j)}} c_{[1+j]}(x, y_1, …, y_j) dy_1 · · · dy_j,
which here reduces to

    k_1(x | A) = m − ∫_A q(x − y) dy ≡ µ(x)        (x ∈ A),

and

    k_2(x_1, x_2 | A) = q(x_1 − x_2)        (x_1, x_2 ∈ A),
while all higher-order Khinchin measures vanish. Since these two densities define the two measures Q1 , Q2 characterizing a Gauss–Poisson process [see Example 6.3(d)], we see firstly that the process on A is still a Gauss–Poisson process and secondly that its defining measures, unlike the moment measures, depend explicitly on the locations within the observation set A. In other words, although the local process on A is still a process of correlated pairs, its properties are no longer constant across A but depend in general on the proximity to the boundary of A. From this discussion, we see that there is no loss of generality in assuming that X = A, although to obviate the need for edge corrections we shall have to assume that the defining measures are not stationary, even though the global process may be so (see also Brix and Kendall, 2002). In principle, it is possible to write down expressions even more complicated than (7.1.6) for cluster processes with up to 3, 4, . . . points in each cluster. Baudin (1981) developed an equivalent systematic procedure for writing down the likelihood of a Neyman–Scott cluster process, but again it is of substantial combinatorial complexity: see Exercises 7.1.5–6 for details (see also Baddeley, 1998). The difficulty of finding the local Janossy measures in terms of global parameters of the model varies greatly with the model. In a few simple cases, such as the Poisson and Gauss–Poisson examples just considered, explicit expressions may be obtained. In other examples, finding exact solutions raises difficulties of principle as much as technical difficulty. Only the evolutionary processes, considered in the later sections of this chapter, provide a substantial class of models for which a ready solution exists and then only by taking special advantage of the order properties of the time-like dimension. Further discussion of the general problem is deferred until Chapter 15. 
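For small n, the "in principle computable" pairing sum (7.1.6) can be evaluated by brute-force enumeration of the ways of splitting the index set into pairs and singletons. In the sketch below, µ, q and p_0(A) are supplied as plain callables and a number; the test values in the checks are invented for illustration.

```python
def gauss_poisson_janossy(xs, mu, q, p0):
    """Brute-force evaluation of the pairing sum (7.1.6): over all ways of
    splitting {1,...,n} into k pairs and n-2k singletons, accumulate the
    product of mu over singletons times q(x_i - x_j) over pairs; multiply
    by p0 = p_0(A)."""
    idx = list(range(len(xs)))

    def pairings(free):
        # yield (pairs, singles) decompositions of the index list `free`
        if not free:
            yield [], []
            return
        i, rest = free[0], free[1:]
        # i left as a singleton
        for pairs, singles in pairings(rest):
            yield pairs, [i] + singles
        # i paired with some later index j
        for k in range(len(rest)):
            j, rem = rest[k], rest[:k] + rest[k + 1:]
            for pairs, singles in pairings(rem):
                yield [(i, j)] + pairs, singles

    total = 0.0
    for pairs, singles in pairings(idx):
        term = 1.0
        for i in singles:
            term *= mu(xs[i])
        for i, j in pairs:
            term *= q(xs[i] - xs[j])
        total += term
    return p0 * total
```

The number of terms grows like the pairing count noted after (7.1.6), so this is only feasible for n of, say, a dozen or less, which is precisely the practical complexity remarked on above.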
At the practical level, the difficulty can be alleviated to some extent by the use of so-called plus sampling or minus sampling. This consists of either adding to (‘plus’) or subtracting from (‘minus’) the original sampling region A a buffer region in which the points contribute indirectly to the likelihood by virtue of their effects on the probability density of the points in the inner region but are not included as part of the realization as such. Of course, the points in the buffer region do not play their full weight in the analysis, and the corrections so obtained are only approximate. There is clearly some delicacy in choosing the buffer region large enough to improve accuracy by reducing bias (arising from edge effects) but not so large that the improvement is offset by the loss of information due to not making full use of the data points in the buffer region. Edge effects are discussed again at the end of Section 8.1.
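A minimal illustration of minus sampling on the unit square (the region, the buffer width, and the points are all invented): the realization is split into an inner region, whose points are treated as the data proper, and a buffer whose points can enter the likelihood only indirectly.

```python
def minus_sample(pts, b, A=((0.0, 1.0), (0.0, 1.0))):
    # 'minus' sampling: trim a buffer of width b off each side of the
    # rectangle A; inner points form the data, buffer points may still
    # contribute indirectly to the likelihood of the inner points.
    (x0, x1), (y0, y1) = A
    inner = [(x, y) for (x, y) in pts
             if x0 + b <= x <= x1 - b and y0 + b <= y <= y1 - b]
    buffer_pts = [p for p in pts if p not in inner]
    return inner, buffer_pts

inner, buf = minus_sample([(0.05, 0.5), (0.5, 0.5), (0.97, 0.2)], 0.1)
```

Choosing b trades bias from edge effects against the information lost by demoting the buffer points, as discussed above.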
Another possible strategy is to introduce ‘periodic boundary effects’, essentially by wrapping the time interval around a circle, in the case of a one-dimensional problem, or, for a rectangular region in the plane, by repeating the original region (with the original data) at all contiguous positions in a rectangular tiling of the plane with the original region as base set. The rationale behind the procedure is that the missing data in a neighbourhood of the original observation will be replaced by data that may be expected to have similar statistical properties in general terms. Further discussion of these and similar techniques can be found in the texts by Ripley (1981), Cressie (1991), and Stoyan and Stoyan (1994).

Example 7.1(f) Fermion and boson processes [see Examples 5.4(c) and 6.2(b)]. Each of these processes is completely specified by a global covariance function c(x, y), and the local Janossy densities appear as either determinants [for the fermion process: see (5.4.19)] or permanents [for the boson process: see (6.2.11)]. In each case, the densities are derived from a resolvent kernel of the integral equation on A with kernel c(·, ·). As for the Gauss–Poisson process, the resulting explicit expressions for the Janossy densities (and thus the likelihoods) incorporate the requisite adjustments for boundary effects.

We conclude this section with an excursion into the realm of hypothesis testing; it has the incidental advantage of illustrating further the role of the Khinchin density functions. A commonly occurring need in practice is to test the null hypothesis of a Poisson process against some appropriate class of alternatives, and it is then pertinent to enquire as to the form of the optimal, or at least locally optimal, test statistic for this purpose. This question has been examined by Davies (1977), whose general approach we follow.
The locally optimal test statistic is just the derivative of the log likelihood function, calculated at the parameter values corresponding to the null hypothesis. Davies' principal result is that this quantity has a representation as a sum of orthogonal terms, containing contributions from the factorial cumulants of successively higher orders. The formal statement is as follows (note that we return here to the general case of an observation region A ⊂ X = R^d).

Proposition 7.1.IV. For a bounded Borel subset A of R^d, let the distributions {P_θ} correspond to a family of orderly point processes on R^d indexed by a single real parameter θ such that
(i) for θ = 0 the process is a Poisson process with constant intensity µ, and
(ii) for all θ in some neighbourhood V of the origin, all factorial moment and cumulant densities m_{[k]} and c_{[k]} exist, are differentiable functions of θ, and are such that for each s = 1, 2, … the series

    Σ_{k=1}^∞ (1/k!) ∫_A · · · ∫_A c_{[k+s]}(x_1, …, x_s, y_1, …, y_k; θ) dy_1 · · · dy_k        (7.1.7)
is uniformly convergent for θ ∈ V, and the series

    Σ_{k=1}^∞ [(1 + δ)^k / k!] ∫_A · · · ∫_A c_{[k]}(y_1, …, y_k; θ) dy_1 · · · dy_k        (7.1.8)

converges for some δ > 0. Then, the efficient score statistic ∂ log L/∂θ |_{θ=0} can be represented as the sum

    D ≡ ∂ log L/∂θ |_{θ=0} = Σ_{k=1}^∞ D_k,        (7.1.9)

where, with I(y_1, …, y_k) = 1 if no arguments coincide and = 0 otherwise, and Z(dy) = N(dy) − µ dy,

    D_k = [1/(µ^k k!)] ∫_A · · · ∫_A I(y_1, …, y_k) c′_{[k]}(y_1, …, y_k; 0) Z(dy_1) · · · Z(dy_k).        (7.1.10)

Under the null hypothesis θ = 0 and for j > k ≥ 1,

    E(D_k) = E(D_k D_j) = 0,        (7.1.11a)
    var D_k = [1/(µ^k k!)] ∫_A · · · ∫_A [c′_{[k]}(y_1, …, y_k; 0)]^2 dy_1 · · · dy_k.        (7.1.11b)
Proof. We again use the machinery for finite point processes, starting with the expression for the likelihood L ≡ L_θ = j_n(x_{1(1)n}; θ) of the realization {x_1, …, x_n} ≡ {x_{1(1)n}} on the set A in the form [see (5.5.31)]

    L = exp(−K_0(θ)) Σ_{j=1}^n Σ_{T∈P_{jn}} Π_{i=1}^j k_{|S_i(T)|}(x_{i,1}, …, x_{i,|S_i(T)|}; θ),        (7.1.12)

where the k_r(·) denote Khinchin densities and the inner summation extends over the set P_{jn} of all j-partitions T of the realization {x_{1(1)n}}. Because θ = 0 corresponds to a Poisson process, K_0(0) = µℓ(A) and k_r(y_{1(1)r}; 0) = 0 unless r = 1, when k_1(y; 0) = µ. Consequently, (7.1.12) for θ = 0 reduces to L_0 = µ^n exp(−µℓ(A)), as it should. This fact simplifies the differentiation of (7.1.12) because, assuming (as we justify later) the existence of the derivatives

    k′_r(y_{1(1)r}; 0) ≡ (∂/∂θ) k_r(y_{1(1)r}; θ) |_{θ=0},

in differentiating the product term in (7.1.12), nonzero terms remain on setting θ = 0 only if at most one set S_i(T) has |S_i(T)| > 1 and all other j − 1 sets have |S_i(T)| = 1. Thus,

    (log L)′ ≡ ∂ log L/∂θ |_{θ=0} = −K′_0(0) + Σ_{j=1}^n (µ^{j−1}/µ^n) Σ* k′_{n−j+1}(x_{r_1}, …, x_{r_{n−j+1}}; 0)
             = −K′_0(0) + Σ_{i=1}^n µ^{−i} Σ* k′_i(x_{r_1}, …, x_{r_i}; 0),
where the summation Σ* extends over all distinct selections of size i from the set {x_{1(1)n}}. Since this set is a realization of the process N(·) over A, the sum Σ* is expressible as the integral

    (1/i!) ∫_A · · · ∫_A I(y_{1(1)i}) k′_i(y_{1(1)i}; 0) N(dy_1) · · · N(dy_i),

where the factor I(y_{1(1)i}) avoids repeated indices and division by i! compensates for the i! recurrences of the same set of indices in different orders. This leads to the representation

    (log L)′ = −K′_0(0) + Σ_{i=1}^∞ [1/(µ^i i!)] ∫_A · · · ∫_A I(y_{1(1)i}) k′_i(y_{1(1)i}; 0) N(dy_1) · · · N(dy_i),        (7.1.13)

now valid on an infinite range for i as the sum terminates after N(A) terms. When the Khinchin measures are known explicitly, (7.1.13) can be used directly. Otherwise, use the expansion akin to (5.5.29) of k_i(·) in terms of factorial cumulant densities,

    k_i(y_{1(1)i}; θ) = Σ_{j=0}^∞ [(−1)^j / j!] ∫_A · · · ∫_A c_{[i+j]}(y_{1(1)i}, u_{1(1)j}; θ) du_1 · · · du_j,

which, in view of the assumption in (7.1.7), both shows that the k_i(·) are differentiable as assumed earlier and justifies term-by-term differentiation. Because of (7.1.12), the same is also true of L_θ. Also, since by (5.5.26) K_0(θ) is a weighted sum of all other Khinchin measures, substitution for k_i(·) yields

    K_0(θ) = Σ_{i=1}^∞ Σ_{j=0}^∞ (1/i!) [(−1)^j / j!] ∫_A · · · ∫_A c_{[i+j]}(y_{1(1)i}, u_{1(1)j}; θ) du_1 · · · du_j dy_1 · · · dy_i,

which on replacing j by j − i, inverting the order of summation, and using Σ_{i=1}^j (−1)^{j−i}/[i! (j − i)!] = −(−1)^j/j! gives for θ = 0

    K′_0(0) = − Σ_{j=1}^∞ [(−1)^j / j!] ∫_A · · · ∫_A c′_{[j]}(u_{1(1)j}; 0) du_1 · · · du_j.

Similar substitution after differentiation into (7.1.13), rearrangement of the order of summation, and substitution for −K′_0(0) yields

    (log L)′ = Σ_{j=1}^∞ [1/(µ^j j!)] Σ_{i=0}^j [j! (−µ)^{j−i} / (i! (j − i)!)] ∫_A · · · ∫_A c′_{[j]}(y_{1(1)i}, u_{1(1)j−i}; 0) N(dy_1) · · · N(dy_i) du_1 · · · du_{j−i}.
Here we recognize that the inner sum can arise from an expansion of Π_{l=1}^j [N(dv_l) − µ dv_l], the symmetry of the densities c_{[j]}(·) implying equality of their integrals with respect to any reordering of the indices in a differential expansion such as N(dv_1) · · · N(dv_i) dv_{i+1} · · · dv_j. Inserting this product form leads to (7.1.9) and (7.1.10). Verification of equations (7.1.11a) and (7.1.11b) under the null hypothesis is straightforward.

Example 7.1(g) Poisson cluster processes with bounded cluster size. Suppose the size of the clusters is limited to M, so that only the first M terms are present in the expansions in terms of Khinchin or cumulant densities; the Gauss–Poisson case of Example 7.1(e) corresponds to M = 2. Then, for θ > 0, we may define the process as the superposition of a stationary Poisson process with parameter µ and a Poisson cluster process with clusters of size 2, …, M, with Khinchin measures having densities θk_j(y_1, …, y_j) taken from the p.g.fl. representation (6.3.32) (i.e. k_j is the density of the measure K_j there). Then, the Khinchin densities in the resultant process have the form (identifying the state space X with the set A)

    K_0(θ) = µℓ(A) + θ Σ_{j=1}^M (1/j!) ∫_A · · · ∫_A k_j(x_1, …, x_j) dx_1 · · · dx_j,
    k_1(x; θ) = µ + θk_1(x),        k_j(x_1, …, x_j; θ) = θk_j(x_1, …, x_j)        (j = 2, …, M).
From (7.1.13), we have the expansion

    ∂ log L/∂θ |_{θ=0} = Σ_{j=1}^M [1/(µ^j j!)] ∫_A · · · ∫_A I(y_1, …, y_j) k_j(y_1, …, y_j) N(dy_1) · · · N(dy_j)
                       = Σ_{j=1}^M µ^{−j} Σ* k_j(x_{r_1}, …, x_{r_j}).

This expression exhibits the efficient score ∂ log L/∂θ |_{θ=0} as the sum of first-, second-, …, Mth-order statistics in the observed points x_1, …, x_N. In the Gauss–Poisson case, only the first- and second-order terms are needed. The derivation here implies that the form of the cluster process, up to and including the detailed specification of the K_j, is known a priori. The situation if the structure is not known is much more complex but would in effect involve taking a supremum over an appropriate family of functions K_j. An alternative representation is available through (7.1.9) and (7.1.10). This has the advantage that the cumulant densities can be specified globally, so that no implicit assumptions about boundary effects are needed. It follows from (6.3.32) (see Exercise 6.3.17) that only the first M factorial cumulant densities c_{[j]} need be considered and (since the c_{[j]} are derived from linear combinations
of the k_j) that the same kind of structure holds for the c_{[j]}, namely

    c_{[1]}(x; θ) = µ + θc_{[1]}(x),        c_{[j]}(x_1, …, x_j; θ) = θc_{[j]}(x_1, …, x_j)        (j = 2, …, M).

Then (7.1.9) leads to a similar expansion in terms of linear, quadratic, … statistics, namely

    D_k = [1/(µ^k k!)] ∫_A · · · ∫_A I(y_1, …, y_k) c_{[k]}(y_1, …, y_k) Z(dy_1) · · · Z(dy_k).
For further examples, asymptotic behaviour in the stationary case, and the possibility of representing the Dk in terms of spectral measures, see Davies (1977) and Exercises 7.1.8–10.
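The statistics D_k of (7.1.10) can be sketched numerically for k = 1, 2 on A = [0, 1], expanding Z(dy) = N(dy) − µ dy and using midpoint quadrature; the factor I(·) only affects the purely atomic part, leaving the sum over unequal pairs of observed points. The functions c1, c2 below stand for the cumulant-derivative factors entering (7.1.10) and, like the point pattern and µ, are invented for illustration (duplicate points are not handled).

```python
def D1(points, mu, c1, A=(0.0, 1.0), m=2000):
    # D_1 = (1/mu) [ sum_i c1(x_i) - mu * integral of c1 over A ]
    a, b = A
    h = (b - a) / m
    integral = sum(c1(a + (i + 0.5) * h) for i in range(m)) * h
    return (sum(c1(x) for x in points) - mu * integral) / mu

def D2(points, mu, c2, A=(0.0, 1.0), m=400):
    # D_2 = [1/(2 mu^2)] * double integral of I(y1,y2) c2(y1,y2) Z(dy1) Z(dy2),
    # expanded for Z(dy) = N(dy) - mu dy and symmetric c2:
    #   sum over unequal pairs of points
    #   - 2 mu * (point x grid cross term)
    #   + mu^2 * (grid x grid double integral).
    a, b = A
    h = (b - a) / m
    grid = [a + (i + 0.5) * h for i in range(m)]
    atom = sum(c2(x, y) for x in points for y in points if x != y)
    cross = sum(c2(x, u) for x in points for u in grid) * h
    both = sum(c2(u, v) for u in grid for v in grid) * h * h
    return (atom - 2.0 * mu * cross + mu * mu * both) / (2.0 * mu ** 2)
```

With c2 constant the quadrature is exact, which makes the sketch easy to sanity-check; D1 then simply compares N(A) with its null expectation µℓ(A).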
Exercises and Complements to Section 7.1

7.1.1 Let N_1, N_2 be two finite Poisson processes with intensity measures Λ_1, Λ_2, respectively. Show that N_1 ≪ N_2 if and only if Λ_1 ≪ Λ_2 (see above Proposition 7.1.III for the meaning of N_1 ≪ N_2).

7.1.2 Exercise 2.1.9 discusses the likelihood of a cyclic Poisson process with rate parameter µ(t) = exp[α + β sin(ω_0 t + θ)], though the parametric form is different: e^α here equals λ/I_0(κ) there. The derivation of maximum likelihood estimators given there assumes ω_0 is known; here we extend the discussion to the case where ω_0 is unknown.
(a) Show that the supremum of the likelihood function in general is approached by a sequence of arbitrarily large values of ω_0 for which sin ω_0 t_i ≈ constant and cos ω_0 t_i ≈ constant for every t_i of a given realization. A global maximum of the likelihood is attainable if the parameters are constrained to a compact set.
(b) Suppose the observation interval T → ∞, and constrain ω_0 to an interval [0, ω_T], where ω_T/T^{1−ε} → 0 (T → ∞) for some ε > 0. Then, the sequence of estimators ω̂_0(T) is consistent. [See Vere-Jones (1982) for details.]

7.1.3 Another cyclic Poisson process model assumes µ(t) = α + β[1 + sin(ω_0 t + θ)]. Investigate maximum likelihood estimators for the parameters [see earlier references and Chapter 4 of Kutoyants (1980, 1984)].

7.1.4 Suppose that the density µ(·) of an inhomogeneous Poisson process on the bounded Borel set A, such as the unit interval (or rectangle or cuboid, etc.), can be expanded as a finite series of polynomials orthogonal with respect to some weight function w(·) so that
µ(x) = αw(x)[1 + Σ_{j=1}^r βj vj(x)] ≡ αw(x)ψ(x),
where ∫_A w(x) dx = 1, ∫_A w(x)vj(x) dx = 0, ∫_A w(x)vj(x)vk(x) dx = δjk (j, k = 1, . . . , r). Show that the problem of maximizing the log likelihood ratio log(L/L0), where L0 refers to a Poisson process with density w(x), is equivalent to the problem of maximizing Σ_{i=1}^N log ψ(xi) subject to the constraint that ψ(x) ≥ 0 on A. This maximization has to be done numerically; the main difficulty arises from the nonnegativity constraint.
7.1.5 Use the relations in equation (5.5.31) between the Janossy and Khinchin densities to provide a representation of the likelihood of a Poisson cluster process in terms of the Janossy densities of the cluster member process. [Hint: Suppose first that the process is a.s. totally finite. Expand log G[h] = ∫_X (G[h | y] − 1) µc(dy) (h ∈ V(X)) and obtain
kn(x1, . . . , xn) = ∫_X jn(x1, . . . , xn | y) µc(dy).
In the general case, proceed from the p.g.fl. expansion of the local process on A as in (5.5.14) and (5.5.15).]
7.1.6 (Continuation). When the cluster structure is that of a stationary Neyman–Scott process with µc(dy) = µc dy as in Example 6.3(a) so that
G[h | y] = Σ_{j=0}^∞ pj (∫_X h(y + u) F(du))^j ≡ Q(∫_X h(y + u)f(u) du), say,
deduce that the Janossy densities for the local process on A are given by
jn(x1, . . . , xn | A) = exp{µc ∫_X [Q(1 − F(A − y)) − 1] dy}
× Σ_{b∈B01} Π_{i=1}^{2^n − 1} {µc ∫_X Q^(|ai|)(1 − F(A − y)) Π_{j=1}^n [f(xj − y)]^{aij} dy}^{b(ai)},
where ai = (ai1, . . . , ain) is the binary expansion of i = 1, . . . , 2^n − 1, |ai| = #{j: aij = 1}, and B01 is the class of all {0, 1}-valued functions b(·) defined on {ai: i = 1, . . . , 2^n − 1} such that Σ_i b(ai) ai = (1, . . . , 1). [Thus, any b(·) has b(ai) = 0 except for at most n of the ai, which then correspond to the subsets of a partition of {1, . . . , n}, so the sum over b here is equivalent to the sum over partitions in (5.5.31). Baudin (1981) used a combinatorial lemma in Ammann and Thall (1979) to deduce the expression above and commented on the impracticality of its use for even a moderate number of points!]
7.1.7 Suppose that for each n the function U ≡ Un of (7.1.4) satisfies Un(x1, . . . , xn) ≥ −cn for some finite positive constant c. Show that a distribution is well defined (i.e. that a finite normalizing constant exists).
7.1.8 Clustered version of the Strauss process. In the basic Strauss model of Example 7.1(c), if β > 1, the Janossy densities, and hence also their integrals over the observation region, will tend to increase as the number of points in
the region increases. Suppose that the densities are taken proportional to wn α^n β^{m(n)}, where m(n) is as defined in the example. Then, the integrals are dominated by the quantities C wn α^n β^{n(n−1)}, and a sufficient condition for the process to be well defined is that
Σ_n wn α^n β^{n(n−1)} < ∞.
Show that this condition is not satisfied if wn ≡ 1, and investigate conditions on the wn to make it hold. Note that such modifications will not affect the sampling patterns for fixed n but only the probabilities pn controlling the relative frequency of patterns with different numbers of events. See Kelly and Ripley (1976) for further discussion.
7.1.9 (a) For a stationary Gauss–Poisson process [see Example 7.1(e)] for which c[1](u) = µ + θ and c[2](u, v) = θγ(u − v) for some symmetric p.d.f. γ(·) representing the distribution of the signed distance between the points of a two-point cluster, show that its efficient score statistic D (see Proposition 7.1.IV) is expressible as D = D1 + D2, where
D1 = N(A) − µ ℓ(A) ≡ Z(A),
D2 = ∫_A ∫_A γ(x − y) Z(dx) Z(dy).
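For a concrete feel for these statistics, D1 and D2 can be approximated directly from a realization by discretizing Z(dx) = N(dx) − µ dx on a grid; a minimal sketch (the Gaussian choice of γ, the sample points, and all parameter values are illustrative, not from the text):

```python
import math

def gauss_pdf(u, sigma):
    # illustrative choice for the symmetric p.d.f. gamma(.)
    return math.exp(-0.5 * (u / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def score_statistic(points, mu, a, b, sigma, ncells=400):
    """Discretized D1 = Z(A) and D2 = int_A int_A gamma(x - y) Z(dx) Z(dy),
    with Z(dx) = N(dx) - mu dx approximated on a regular grid over A = (a, b)."""
    h = (b - a) / ncells
    centres = [a + (k + 0.5) * h for k in range(ncells)]
    counts = [0] * ncells
    for x in points:
        k = min(int((x - a) / h), ncells - 1)
        counts[k] += 1
    z = [counts[k] - mu * h for k in range(ncells)]
    d1 = sum(z)                      # Z(A) = N(A) - mu * ell(A)
    d2 = sum(gauss_pdf(cx - cy, sigma) * zx * zy
             for cx, zx in zip(centres, z)
             for cy, zy in zip(centres, z))
    return d1, d2

pts = [0.50, 0.55, 3.10, 3.18, 7.40]   # clustered pairs, as a Gauss-Poisson path might look
d1, d2 = score_statistic(pts, mu=0.5, a=0.0, b=10.0, sigma=0.2)
```

With µ chosen so that µℓ(A) = N(A), the statistic D1 vanishes, as part (b) of the exercise anticipates.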
(b) In practice, µ̂ is estimated by N(A)/ℓ(A), so D1 vanishes, and in the second term, Z is replaced by Z̃(·) = N(·) − µ̂ ℓ(·). Davies (1977) shows that the asymptotic results remain valid with this modification, so the efficiency of other second-order statistics can be compared with the locally optimum form D2. Write the variance estimator in the form
(r − 1)^{−1} Σ_{j=1}^r [N(∆j) − µ̂ ℓ(∆j)]²,
where ∆1 ∪ · · · ∪ ∆r is a partition of the observation region A into subregions of equal Lebesgue measure, in a form similar to D2, and investigate the variance-to-mean ratio as a test for the Gauss–Poisson alternative to a Poisson process. [Davies suggested that the asymptotic local efficiency is bounded by 2/3.]
7.1.10 (Continuation). In the case of a Neyman–Scott process with Poisson cluster size distribution, all terms Dk in the expansion in (7.1.9) are present, and D2 dominates D only if the cluster dimensions are small compared with the mean distance between cluster centres.
7.1.11 When the Poisson cluster process of Example 7.1(g) for X = R is stationary and A = (0, t],
Dj ≈ (1/(t^{j+1} j! µ^j)) Σ_{l1+···+lj=0} φj(l1/t, . . . , lj/t) gj(λ1, . . . , λj; t),
where
φj(λ1, . . . , λj) = ∫_R · · · ∫_R kj(t1, . . . , tj) exp(2πi Σ_{r=1}^j λr tr) dt2 · · · dtj
with λ1 + · · · + λj = 0, and gj(λ1, . . . , λj; t) equals
∫_0^t · · · ∫_0^t I(t1, . . . , tj) exp(2πi Σ_{r=1}^j λr tr) Z(dt1) · · · Z(dtj).
[Hint: Use Parseval-type relations to show that t^{−1} E(|Dj − D̃j|²) → 0 as t → ∞, where D̃j denotes the approximating statistic above. See also Theorem 3.1 of Davies (1977).]
7.2. Conditional Intensities, Likelihoods, and Compensators
If the discussion in the previous section suggests that there are no easy methods for evaluating point process likelihoods on general spaces, it is all the more remarkable, and fortunate, that in the special and important case X = R there is available an alternative approach of considerable power and generality. The essence of this approach is the use of a causal description of the process through successive conditionings. A full development of this approach is deferred to Chapter 14; here we seek to provide an introduction to the topic and to establish its links to representations in terms of Janossy densities.
For simplicity, suppose observation of the process occurs over the time interval A = [0, T] so that results may be described in terms of a point process on R+. Denote by {t1, . . . , tN(T)} the ordered set of points occurring in the fixed interval (0, T). As in the discussion around equation (3.1.8), the ti, as well as the intervals τi = ti − ti−1, i ≥ 1, t0 = 0, are taken to be well-defined random variables. Suppose also that the point process is regular on (0, T), so that the Janossy densities jk(·) all exist (recall Definition 7.1.I). We suppose that if there is any dependence on events before t = 0, it is already incorporated into the Janossy densities. For ease of writing, we use jn(t1, . . . , tn | u) for the local Janossy density on the interval (0, u), and J0(u) for J0((0, u)).
Now introduce the conditional survivor functions
Sk(u | t1, . . . , tk−1) = Pr{τk > u | t1, . . . , tk−1}
and observe that these can be represented recursively in terms of the (local) Janossy functions through the equations
S1(u) = J0(u) (0 < u < T),
S2(u | t1) p1(t1) = j1(t1 | t1 + u) (0 < t1 < t1 + u < T),
S3(u | t1, t2) p2(t2 | t1) = j2(t1, t2 | t2 + u) (0 < t1 < t2 < t2 + u < T),
and so on, where p1 (t), p2 (t | t1 ), . . . are the probability densities corresponding to the survivor functions S1 (u), S2 (u | t1 ), . . . . The fact that these densities exist is a corollary of the assumed regularity of the process. This can be
seen more explicitly by noting identities such as (for S1(·))
J0(t) = J0(T) + Σ_{k=1}^∞ (1/k!) ∫_t^T · · · ∫_t^T jk(u1, . . . , uk | T) du1 · · · duk,
from which
p1(t) = j1(t | T) + Σ_{k=2}^∞ (1/(k − 1)!) ∫_t^T · · · ∫_t^T jk(t, u2, . . . , uk | T) du2 · · · duk,
an expression that is actually independent of T for T > t. Similarly, for S2 we find (for t1 < t < T)
p1(t1) S2(t | t1) = j1(t1 | T) + Σ_{k=2}^∞ (1/(k − 1)!) ∫_t^T · · · ∫_t^T jk(t1, u2, . . . , uk | T) du2 · · · duk
= j1(t1 | t),
from which it follows that p1(t1) p2(t | t1) equals
j2(t1, t | T) + Σ_{k=3}^∞ (1/(k − 2)!) ∫_t^T · · · ∫_t^T jk(t1, t, u3, . . . , uk | T) du3 · · · duk,
again establishing the absolute continuity of S2(t | t1). Further results follow by an inductive argument, the details of which we leave to the reader. Together they suffice to establish the first part of the following proposition.
Proposition 7.2.I. For a regular point process on X = R+, there exists a uniquely determined family of conditional probability density functions pn(t | t1, . . . , tn−1) and associated survivor functions
Sn(t | t1, . . . , tn−1) = 1 − ∫_{tn−1}^t pn(u | t1, . . . , tn−1) du (t > tn−1)
defined on 0 < t1 < · · · < tn−1 < t such that each pn(· | t1, . . . , tn−1) has support carried by the half-line (tn−1, ∞), and for all n ≥ 1 and all finite intervals [0, T] with T > 0,
J0(T) = S1(T), (7.2.1a)
jn(t1, . . . , tn | T) ≡ jn(t1, . . . , tn | (0, T)) = p1(t1) p2(t2 | t1) · · · pn(tn | t1, . . . , tn−1) × Sn+1(T | t1, . . . , tn), (7.2.1b)
where 0 < t1 < · · · < tn < T can be regarded as the order statistics of the points of a realization of the point process on [0, T ]. Conversely, given any such family of conditional densities for all t > 0, equations (7.2.1a) and (7.2.1b) specify uniquely the distribution of a regular point process on R+ .
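Equations (7.2.1a) and (7.2.1b) can be checked in the one case where everything is available in closed form: for a homogeneous Poisson process with rate λ, pn(t | t1, . . . , tn−1) = λe^{−λ(t−tn−1)} and Sn(t | ·) = e^{−λ(t−tn−1)}, so the product in (7.2.1b) telescopes to jn(t1, . . . , tn | T) = λ^n e^{−λT}. A short sketch verifying the telescoping numerically (rate and points illustrative):

```python
import math

lam, T = 1.7, 5.0

def p(t, prev):
    # conditional density of the next point at t given the last point prev
    return lam * math.exp(-lam * (t - prev))

def S(t, prev):
    # matching conditional survivor function
    return math.exp(-lam * (t - prev))

def janossy(ts, T):
    """j_n(t_1, ..., t_n | T) assembled from the conditional densities as in (7.2.1b)."""
    val, prev = 1.0, 0.0
    for t in ts:
        val *= p(t, prev)
        prev = t
    return val * S(T, prev)   # trailing survivor factor S_{n+1}(T | t_1, ..., t_n)

ts = [0.4, 1.3, 2.9]
jn = janossy(ts, T)
closed_form = lam ** len(ts) * math.exp(-lam * T)   # lambda^n e^{-lambda T} for the Poisson case
```

The empty realization gives janossy([], T) = S1(T), which is exactly (7.2.1a).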
Proof. Only the converse requires a brief comment. Given a family of conditional densities pn, both J0(T) and symmetric densities jk(· | T) can be defined by (7.2.1), and we can verify that they satisfy
J0(T) + Σ_{n=1}^∞ (1/n!) ∫_0^T · · · ∫_0^T jn(t1, . . . , tn | T) dt1 · · · dtn
= J0(T) + Σ_{n=1}^∞ ∫ · · · ∫_{0<t1<···<tn<T} jn(t1, . . . , tn | T) dt1 · · · dtn = 1.
It follows from Proposition 5.3.II that there exists a well-defined point process with these densities. Since the point process is uniquely determined by the Janossy measures and these are equivalent to the conditional densities pn(t | t1, . . . , tn−1) for a regular point process, there is a one-to-one correspondence between regular point processes and families pn(· | ·), as described.
We now make a seemingly innocuous but critical shift of view. Instead of specifying the conditional densities pn(· | ·) directly, we express them in terms of their hazard functions
hn(t | t1, . . . , tn−1) = pn(t | t1, . . . , tn−1) / Sn(t | t1, . . . , tn−1)
so that
pn(t | t1, . . . , tn−1) = hn(t | t1, . . . , tn−1) exp(−∫_{tn−1}^t hn(u | t1, . . . , tn−1) du). (7.2.2)
Given a sequence {ti} with 0 < t1 < · · · < tn < · · · , we define an amalgam of the hazard functions by
λ∗(t) = h1(t) (0 < t ≤ t1); λ∗(t) = hn(t | t1, . . . , tn−1) (tn−1 < t ≤ tn, n ≥ 2). (7.2.3)
Definition 7.2.II. The conditional intensity function for a regular point process on R+ = [0, ∞) is the representative function λ∗(·) defined piecewise by (7.2.3).
Note on terminology. In the general definition of conditional intensities, care must be taken to specify the information on which the conditioning is based. This is conveniently summarized by a σ-algebra of events. In the conditional intensity defined above, the conditioning is taken with respect to the minimal σ-algebra consistent with observations on the process, namely the σ-algebra generated by the observed past of the process. More general versions may include information about exogenous variables or processes, as illustrated around Examples 7.2(d)–(e). The conditional intensity introduced here follows the terminology of Brémaud (1981) and related references in the
electrical engineering literature; it should be carefully distinguished from the conditional intensity used in more recent discussions of spatial point patterns (see e.g. Baddeley and Turner, 2000), where it is a special case of the Papangelou intensity introduced in Chapter 15. This Papangelou conditional intensity relates to the effect of adding an additional point within the observation region; Definition 7.2.II refers to adding an additional point within an extension of the observation region.
The intuitive content of the notion of a conditional intensity function is well expressed through the suggestive relation
λ∗(t) dt ≈ E[N(dt) | Ht−], (7.2.3′)
where Ht− is the σ-algebra of events occurring at times up to but not including t. Thus, the conditional intensity can be interpreted as the conditional risk of the occurrence of an event at t, given the realization of the process over the interval [0, t). Strictly, the notation should reflect the fact that λ∗(·) is a function λ∗(· | t1, . . . , tN(t)) of the point history, or, even more generally, that it is itself a stochastic process λ∗(t, ω) depending on ω through the realization {t1(ω), . . . , tN(t)(ω)} of the history up to time t. The terms conditional risk (or rate or hazard) function, or even these terms omitting the word ‘conditional’, have also been used to describe λ∗(·) as defined in (7.2.3). It is the key both to the likelihood analysis and to solving problems of prediction, filtering, and simulating point processes on a half-line.
Just as the density function of a probability distribution can in principle be specified only up to its values on a set of Lebesgue measure zero, so also a lack of uniqueness arises in defining λ∗(·). In all practical situations, the densities pn(· | ·) will be at least piecewise continuous, and uniqueness can then be ensured by (for example) taking the left-continuous modification λ∗(t−) for λ∗(t). The reason for using left continuity is connected with predictability: if the conditional intensity has a discontinuity at a point of the process, then its value at that point should be defined by the history before that point, not by what happens at the point itself. This is implicit in the way the hazard functions are defined and crucial to the correct definition of the likelihood, since it is the density for the interval preceding a point that figures in the likelihood, not the new density that comes into play once the point has occurred.
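Relation (7.2.2), which underlies these definitions, can be checked numerically for any specific hazard; a sketch assuming an illustrative hazard h(t) = 2t (so the interval density is 2t e^{−t²}), confirming that the resulting density has total mass 1:

```python
import math

def hazard(t):
    return 2.0 * t                       # illustrative hazard, not from the text

def density(t):
    # p(t) = h(t) exp(-int_0^t h(u) du), eq. (7.2.2) with t_{n-1} = 0
    return hazard(t) * math.exp(-t * t)  # since int_0^t 2u du = t^2

# trapezoidal check that the density integrates to ~1 over (0, 6)
n, b = 20000, 6.0
step = b / n
mass = 0.5 * step * (density(0.0) + density(b))
mass += step * sum(density(k * step) for k in range(1, n))
```

The same construction applied piecewise between successive points is what assembles λ∗(·) in (7.2.3).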
A rigorous discussion of these issues leads to the concept of a predictable σ-algebra and to the existence of predictable versions of the conditional intensity; see comments later in this chapter and Chapter 14. In the remainder of this section, unless stated otherwise, it is tacitly assumed that a left-continuous version of λ∗(·) exists and is being used.
Proposition 7.2.III. Let N be a regular point process on [0, T] for some finite positive T, and let t1, . . . , tN(T) denote a realization of N over [0, T]. Then, the likelihood L of such N is expressible in the form
L = [Π_{i=1}^{N(T)} λ∗(ti)] exp(−∫_0^T λ∗(u) du), (7.2.4)
and its log likelihood ratio relative to the Poisson process on [0, T] with constant rate 1 is expressible as
log(L/L0) = Σ_{i=1}^{N(T)} log λ∗(ti) − ∫_0^T [λ∗(u) − 1] du. (7.2.5)
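Expression (7.2.5) is straightforward to evaluate once λ∗ is specified; a minimal sketch, assuming for illustration the inhomogeneous-Poisson intensity λ∗(t) = a + bt, for which the compensating integral is available in closed form:

```python
import math

def loglik_ratio(points, lam, Lam, T):
    # eq. (7.2.5): sum of log lambda*(t_i) minus int_0^T [lambda*(u) - 1] du
    return sum(math.log(lam(t)) for t in points) - (Lam(T) - T)

a, b, T = 0.8, 0.3, 10.0
lam = lambda t: a + b * t                  # illustrative intensity
Lam = lambda t: a * t + 0.5 * b * t * t    # its integral from 0 to t
pts = [1.0, 4.2, 7.5, 9.1]
llr = loglik_ratio(pts, lam, Lam, T)
```

When no closed form for the integral is available, the second term is replaced by numerical quadrature; the product form (7.2.4) itself is recovered as exp of the log likelihood.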
Proof. To establish (7.2.4), it is enough to express the Janossy densities in terms of the conditional densities pn(t | t1, . . . , tn−1) and then express each of these in terms of their hazard functions and hence of λ∗(·). Details are left to the reader: see Exercise 7.2.1.
An important consequence of the construction used in the proof above is that the conditional intensity function determines the family of conditional hazard functions at (7.2.3) and that these in turn determine the Janossy densities. This can be summarized as below.
Proposition 7.2.IV. Let N be a regular point process on R+. Then, the conditional intensity function determines the probability structure of the point process uniquely.
Our first example illustrates these ideas in the context of a Wold process.
Example 7.2(a) Wold process of correlated intervals (see Section 4.5). Suppose the Markov process of successive interval lengths {In} ≡ {tn − tn−1} (n = 1, 2, . . .), with t0 ≡ 0, is governed by the transition kernel with density p(y | x) for the length y of the interval In given the length x of the interval In−1. For n ≥ 3, the conditional distribution has the density pn(t | t1, . . . , tn−1) = p(t − tn−1 | tn−1 − tn−2), so that in terms of the hazard function h(y | x) = p(y | x)/S(y | x), where S(y | x) = 1 − ∫_0^y p(u | x) du, we have
λ∗(t) = h(t − tN(t) | tN(t) − tN(t)−1).
Here, tN(t) and tN(t)−1 are the first and second points to the left of t, and it is assumed that N(t) ≥ 2. To specify λ∗(·) at the beginning of the observation period (i.e. in {t > 0: N(t) ≤ 1}), some further description of the initial conditions is needed. If observations are started from an event of the process as origin, it is enough to be given the distribution of the initial interval (0, t1) [e.g. it may be the stationary density π(·) satisfying π(y) = ∫_0^∞ p(y | x) π(x) dx, if such π(·) exists].
Otherwise, the length of the interval terminating at t1 may be an additional parameter in the likelihood and we may seek to estimate it, or we may impose further description of both the interval terminating at t1 and the interval (t1 , t2 ). See Exercise 7.2.3 for a particular case. Example 7.2(b) Hawkes process [continued from Example 6.3(c)]. Suppose that the infectivity measure µ(dx) has a density µ(dx) = µ(x) dx, say. Then,
each event at ti < t contributes an amount µ(t − ti) to the risk at t. There is also a risk, λ say, of a new arrival at t. Assuming no contributions to the risk from the negative half-line, λ∗(·) is expressible in the simple form
λ∗(t) = λ + Σ_{i: ti < t} µ(t − ti) = λ + ∫_0^t µ(t − u) N(du). (7.2.6)
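When µ(t) = βe^{−αt} (essentially the K = 0 case of the parametrization discussed next, with illustrative parameter names), the sum in (7.2.6) need not be recomputed from scratch at each event: A(ti) = Σ_{j<i} e^{−α(ti−tj)} satisfies A(ti) = e^{−α(ti−ti−1)}[1 + A(ti−1)], so all N intensity values cost O(N). A sketch comparing the recursion with direct evaluation:

```python
import math

def hawkes_intensity(t, events, lam0, alpha, beta):
    # direct evaluation of (7.2.6) with mu(u) = beta * exp(-alpha * u)
    return lam0 + sum(beta * math.exp(-alpha * (t - ti)) for ti in events if ti < t)

def hawkes_intensity_at_events(events, lam0, alpha, beta):
    """Left-continuous values lambda*(t_i) at the events themselves, via the
    Markovian recursion A_i = exp(-alpha (t_i - t_{i-1})) (1 + A_{i-1})."""
    out, A, prev = [], 0.0, None
    for t in events:
        if prev is not None:
            A = math.exp(-alpha * (t - prev)) * (1.0 + A)
        out.append(lam0 + beta * A)
        prev = t
    return out

ev = [0.5, 1.1, 1.3, 4.0]                                  # illustrative event times
rec = hawkes_intensity_at_events(ev, lam0=0.7, alpha=2.0, beta=0.8)
direct = [hawkes_intensity(t, ev, 0.7, 2.0, 0.8) for t in ev]
```

This Markovian character of λ∗(t) for the exponential kernel is the same property exploited below in discussing equilibrium initial conditions.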
In applications, it is desirable to give µ(·) some convenient parametric form. Ogata and Akaike (1982) and Vere-Jones and Ozaki (1982) discuss likelihood estimation for this process using a parametrization of the form
µ(t) = Σ_{k=0}^K bk Lk(t) e^{−αt} (t > 0), µ(t) = 0 (t ≤ 0), (7.2.7)
where the functions Lk(t) are Laguerre polynomials defined on t > 0; detailed computations are given in the quoted papers. Combinations of exponential terms with different decay parameters could also be considered, but pragmatic problems of estimability arise: even estimating α in (7.2.7) can be difficult.
Initial conditions also pose a problem. It is simplest to suppose that λ∗(0) = 0 so that any influence from the past is excluded. If this is not the case, then it may be possible to condition on information prior to time t = 0; in the technical language of Chapter 14, this means passing from the internal history to a more general intrinsic history. If neither of these options is available, then we are faced with a minor version of the Ising problem, as discussed around Examples 7.1(b) and 7.1(e). In principle, we should take the joint distribution of the observations (on (0, T), say) and the entire past and then average over all possible past histories. In simple cases, this may be explicitly possible. For example, if K = 0 in (7.2.7), any contribution from events before t = 0 decays exponentially at the uniform rate exp(−αt), and in fact the whole process λ∗(t) is Markovian. In the equilibrium case, we can then integrate over the equilibrium distribution of λ∗(0) to obtain the appropriate averaged likelihood. Further details on this special case are given in Exercise 7.2.5.
If we assume that ν = ∫_0^∞ µ(x) dx < 1 so that a unique stationary process exists [see Example 6.3(c)], it can be shown that the process converges toward equilibrium as t → ∞ (see Chapter 13).
In this case, the conditional intensity approaches the complete intensity function λ†(t), which is the analogue of λ∗(t) for the process defined on R and not merely on R+; that is, events of the process are no longer confined to t > 0. Equation (7.2.6) is then replaced by
λ†(t) = λ + ∫_{−∞}^t µ(t − u) N(du).
This linear form also arises from second-order theory and suggests that for this example the optimal (least squares) linear predictor coincides with the optimal nonlinear predictor, at least as far as the immediate future is concerned. For further discussion of this issue, see Example 8.5(d).
Note that in this and similar examples, finding the initial conditions required to make the ensuing process stationary resolves for such a process the problem described in the previous section of expressing the local Janossy densities in terms of the global process. In a one-dimensional point process observed over a finite interval, boundary effects can arise only at the two ends of the interval, while the causal character of the time dimension implies that there are no backward effects from points occurring later than the end of the observation interval. For a stationary point process in time, therefore, the only issue to be resolved is finding the right initial conditions to ensure that the resulting process is stationary. The form (7.2.7) taken with (7.2.6) gives an example of a linearly parameterized intensity. The general usefulness of this model suggests that, in practical applications, it may be more convenient to choose a flexible family of models that are readily amenable to processing in much the same way that ARMA models can be used in conventional time series analysis rather than seeking the conditional intensity of a model that is given a priori. To this end, we look for examples in which the conditional intensity has a convenient parametric form. Two broad classes of such models are described below. Example 7.2(c) Processes with linear or log-linear conditional intensity functions. The assumption in these models is that the conditional intensity function can be written in one of the forms
λ∗(t) = Σ_k bk Q∗k(t), (7.2.8)
log λ∗(t) = Σ_k bk R∗k(t), (7.2.9)
236
7. Conditional Intensities and Likelihoods
So far, we have mainly assumed that the history controlling the conditional intensity is the history of the process itself (i.e. its ‘internal history’), or in economics jargon that there are no exogenous variables that may influence the behaviour of the process. In many situations, this is not the case: to define the future progress of the process properly, the observations must include variables over and above the previous points of the process. In the previous example, one can well imagine that some of the terms in the linear combination might depend on external variables in addition to variables defined by the past points of the process itself. Likelihoods and predictions will then depend on just what information is in fact available. In the case of a Cox process, for example, prediction of the process takes on a very different character if the observations available to the predictor include knowledge of the random intensity function. Ideas of this kind are developed in the general theory of processes (see Appendix A3.3 for a brief introduction and further references), in which a history (or filtration) for the process is defined as a nested, increasing family H of σ-algebras Ht such that N (t) is Ht -measurable for all t. N (t) is then said to be H-adapted. Conditional intensities can be found for any history of the process and will usually have different forms according to the history chosen. In such a situation, the full likelihood of the process will cover the joint distributions of the point process and also of the additional variables that may influence the process through the dependence on past histories. Often, this is not available or is too complex to be used for practical inference or prediction. In such cases, some kind of partial likelihood, treating the observed values of explanatory variables as constants, may still be used for estimation purposes (see e.g. Cox, 1975). 
Such partial likelihoods have the same structural form as (7.2.4) provided the proper version of the conditional intensity (incorporating the new explanatory variables as they occur) is used. In this context, where new explanatory variables may arise, it is helpful to view the basic form (7.2.4) as an extension of the likelihood for the Poisson process. Because of the complete independence property of the Poisson process, its likelihood corresponds to a continuous version of the multiplicative property for independent events: for example, Pr(A ∩ B ∩ C) = Pr(A)Pr(B)Pr(C). When the events are not independent, this can be replaced by the chain rule formula Pr(A ∩ B ∩ C) = Pr(A) Pr(B | A) Pr(C | A ∩ B), which still represents the joint probability of the three events as a product. Equation (7.2.4), even in the form allowing general histories, can be regarded as an analogous extension of the original Poisson likelihood. The situation is more transparent for processes in discrete time, as in the simple example below.
7.2.
Conditional Intensities, Likelihoods, and Compensators
237
Example 7.2(d) Binary processes: discrete-time logistic regression model. We consider a discrete-time process with realizations of the form {0, 0, 1, 0, 0, 0, 1, 1, 0, . . . }. In this context, the equivalent of an inhomogeneous Poisson process is a process with independent, nonidentical Bernoulli trials Yi with success probabilities pi = Pr{Yi = 1}. The likelihood of a realization (Y1 , . . . , Yn ) with n trials can be written as log L(Y1 , . . . , Yn ; p1 , . . . , pn ) =
i:Yi =1
pi − log(1 − pi ). 1 − pi 1 n
log
(7.2.10)
Now suppose that the Yi are no longer independent but have probabilities p∗i = Pr{Yi = 1 | Y1 , . . . , Yi−1 }, which can depend on the past history of the process. Then, by the same chain rule argument referred to earlier, (7.2.10) remains valid if the pi are replaced by the p∗i . But there is no essential requirement here to restrict the conditioning to events defined on the previous values of the Yi . We can add in dependence on additional past variables without affecting the validity of the chain rule formula. This is equivalent to extending the sequence of σ-algebras Hi (histories) to include all events generated by the relevant random variables before time i, including but not restricted to values of the sequence Yi itself. To take a more concrete example, the probabilities p∗i might depend on the last few values of some explanatory variable Ui . This dependence might be modelled through a logistic regression, such as the explicit representation of p∗i = E(Yi | Hi ) = E(Yi | U1 , U2 , . . .) by an equation of the form log
r
p∗i = α + αj Ui−j . 0 1 − p∗i j=1
This is nothing other than the discrete-time version of a model with loglinear intensity, as described in Example 7.2(c), but with the explanatory variables now a selection of lagged versions of the external variables Ui . The art of the modeller here lies in constructing a form of dependence on the past that captures as much as possible of the true dynamics of the process being modelled. Example 7.2(e) Simple and modulated renewal process. From Example 7.2(a) (see also Exercise 7.2.3), it follows that for a renewal process N (t) denoting the number of renewals in (0, t) and whose lifetime distribution has a hazard function h(·), the conditional intensity has the form h(t − tN (t) ). Suppose that in addition to the renewal instants {ti } corresponding to the basic point process N (t), we also observe a (vector) family of stochastic processes {X(t): 0 < t < ∞} ≡ {X1 (t), . . . , Xk (t): 0 < t < ∞}, and suppose that as the defining history for the process we take the σ-algebras Ft of the form Ft = HtN ∨ HtX ,
238
7. Conditional Intensities and Likelihoods
thus combining the internal history of {N (t): 0 < t < ∞} with that of {X(t): 0 < t < ∞}. Now suppose that the hazard function in successive intervals is modified in a multiplicative fashion by some nonnegative function ψ(X1 (t), . . . , Xk (t)) of the current values of the {Xi (t)}; that is, we take λ∗ (t) = h(t − tN (t) ) ψ(X1 (t), . . . , Xk (t)). Cox (1972a) posed the problem of estimating parameters β1 , . . . , βk when ψ(·) has the log-linear form k
βj Xj . log ψ(X1 , . . . , Xk ) = j=1
There is a close analogy with the problem of estimating the parameters in a model for lifetime distributions when the lifetimes of different individuals may be affected by different values of concomitant variables X1 , . . . , Xk ; this is the Cox regression model described in Cox (1972b) and now the subject of a considerable literature (see e.g. Aalen, 1975, 1978; Jacobsen, 1982; and Andersen et al., 1993). Exercise 7.2.7 sketches a specific example. Example 7.2(f) Processes with unpredictable marks (see Definition 6.4.III). Conditional intensities for marked point processes will be considered more systematically in Section 7.3. In the special case of processes with unpredictable marks, however, the marks occur independently of the past of the process and can be treated as a sequence of independent random variables. Without necessarily assuming stationarity and supposing that the mark distribution at time t has density f ∗ (κ | t), the conditional intensity factorizes into the form [see Lemma 7.3.V(iii)] λ∗ (t, κ) = λ∗g (t)f ∗ (κ | t). Consequently, the log likelihood can be written as the sum of two terms log L = log L1 + log L2 , where
Ng (T )
log L1 =
log λ∗g (ti ) −
T
λ∗g (u) du
(7.2.11a)
0
i=1
and
Ng (T )
log L2 =
f ∗ (κi | ti ).
(7.2.11b)
i=1
The first term is in the standard form for a univariate point process on (0, T ) except for the fact that the ground intensity λ∗g (t) may depend on the marks κi for events occurring before t as well as on the ti themselves. In this sense, the ground process has the structure of a point process whose evolution depends on the evolution of a parallel, extrinsic process, namely the process of marks. The second term is the usual sum for a set of independent observations.
7.2.
Conditional Intensities, Likelihoods, and Compensators
239
If the mark distribution has no parameters in common with the distribution of the ground process, then the two terms can be maximized separately and give the full likelihood estimates. If the marks are treated as a set of given values, about whose structure or distribution we have no information, then the first term could still be maximized as a partial likelihood. Several of the simpler models for earthquake occurrence and neural impulses are of this form, where the size or strength of the event is treated as an independent mark, but can nevertheless influence the future evolution of the process. A typical example is the ETAS model [see Example 6.4(d) for notation and details], for which the conditional intensity of the ground process has the form λ∗g (t) = µc + D
i:ti
eα(κi −κ0 )
1 . (c + t − ti )1+p
Here D = AK is a constant that controls the criticality of the underlying branching process. This form can be substituted into L1 above and used to evaluate the parameters µc , α, c, p and D without reference to the mark distribution. Conflicts will arise only if there is some departure from the assumption of unpredictable marks or if the mark distribution has some parameter in common with those specified above. See Example 7.3(c) for illustrations and further discussion. The stress-release model, considered below, is another example of this general type. It is an example also of a further class of models with the characteristic feature that the conditional intensity is governed by a Markov process that in general is only partially observable. The simplest examples of this type are doubly stochastic processes in which the underlying Markov process governs the stochastic intensity function. Here explicit expressions for the likelihood are not usually available, but an approach to likelihood estimation can nevertheless be made through adaptations of the Baum–Welch or E-M algorithms (see Exercise 7.2.8) or via the general filtering techniques discussed in Chapter 14. In the stress-release model, the occurrence times and marks of the events influence the Markov process itself so that the doubly stochastic character is lost, but in compensation the realization of the Markov process can be reconstructed from the data, given the model parameters and an initial value X(0), so that an explicit form for the likelihood can be obtained. Example 7.2(g) Self-correcting or stress-release model. The model was first investigated by Isham and Westcott (1979) as an example of a process that automatically corrects a deviation from its mean. 
Motivated by quite different applications in seismology, Knopoff (1971) and Vere-Jones (1978b) introduced essentially the same model as an elementary stochastic version of the so-called elastic rebound theory of earthquake formation, in which context it has undergone substantial further study and elaboration (e.g. Ogata and Vere-Jones, 1984; Zheng, 1991; Zheng and Vere-Jones, 1994; Lu et al., 1999; Bebbington
and Harte, 2001). Processes analogous to the stress-release model also arise in storage and insurance applications—wherever there is a process of steady accumulation and random release. Vere-Jones (1988) discusses an insurance interpretation. The model is defined by an unobserved jump-type Markov chain X(t) that increases linearly between events and decreases by a random amount (its mark) when an event occurs. Let the event times and associated marks be denoted by (t_i, κ_i), where it is supposed that the κ_i are nonnegative. Then, for t ≥ 0, X(t) has the representation

X(t) = X(0) + νt − Σ_{i: 0 < t_i ≤ t} κ_i.
Now suppose that the risk of an event occurring is an increasing function Ψ(x) of the value x of X(t). Given an initial value X(0), and treating the κ_i as known quantities, the conditional intensity for the ground process (all events {t_i}) can be written

λ*_g(t) = Ψ[X(t)].  (7.2.12)

One of the remarkable features of this process is that, apart from the value of X(0), the conditional intensity is fully determined by the parameters of the model and the observations (t_i, κ_i). In other words, (7.2.12) is an H-intensity (internal intensity), in marked contrast to the doubly stochastic models, where one has to distinguish carefully between the internal intensity (conditioning on the observed event times and sizes only) and the intensity with respect to the full history (conditioning on both the events and the realization of the Markov process up to time t), and generally neither is very useful, the former being intractable and the latter inaccessible.

If (as is commonly the case) it is assumed that the event sizes form an i.i.d. sequence, the model again falls into the class of processes with unpredictable marks. The first term of the likelihood, (7.2.11a), is then sufficient to determine the parameter ν and any additional parameters arising in the specification of the function Ψ. In the particularly tractable special case where Ψ(x) = exp(α + ρx), the conditional intensity can then be represented in the log-linear form

λ*_g(t) = exp{α + ρ[X(0) + νt − Σ_{i: 0 < t_i ≤ t} κ_i]}.
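The reconstruction of X(t) from the observed pairs (t_i, κ_i) and the evaluation of the log-linear conditional intensity can be coded directly. The sketch below is illustrative only: the function name and all parameter values are our own choices, not estimates from any data set.

```python
import math

def stress_release_intensity(t, events, x0, nu, alpha, rho):
    """lambda*_g(t) = exp(alpha + rho * X(t)), where
    X(t) = X(0) + nu*t - sum of the marks kappa_i of events before t."""
    x = x0 + nu * t - sum(kappa for (ti, kappa) in events if ti < t)
    return math.exp(alpha + rho * x)

# toy history of (time, mark) pairs
events = [(1.0, 0.5), (2.5, 1.2)]
lam = stress_release_intensity(3.0, events, x0=0.0, nu=0.4, alpha=-1.0, rho=0.8)
```

Substituting such a function into the first term of the likelihood (7.2.11a) then allows ν, α and ρ to be estimated without reference to the mark distribution.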
Assuming that the mark distribution has finite mean µ and that the function Ψ is monotonically increasing, the essential condition (see Zheng, 1991, Proposition 4.3; Vere-Jones, 1988) is that

lim_{x→−∞} Ψ(x) < ν/µ < lim_{x→+∞} Ψ(x).  (7.2.13)
These two inequalities on ν/µ ensure that the process X(t) drifts neither toward −∞ nor toward +∞. Some further properties are developed in Exercises 7.2.8–10.

The integral of the conditional intensity function over time also plays an important role in the general theory. It is known as the compensator of the point process, relative to some given history F, on account of the following key property.

Lemma 7.2.V. Suppose {N(t): 0 ≤ t < ∞} is adapted to the history F and admits a left-continuous F-intensity λ*(t). Define Λ*(t) as the pointwise integral

Λ*(t) = ∫_0^t λ*(u) du.

Then, the process M(t) = N(t) − Λ*(t) is an F-martingale: for every s > t > 0, E[M(s) | F_t] = M(t).

Proof. The idea behind the proof is simple. Consider the increment in the counting process N(t) over an interval (t, t + ∆). We have approximately

E{[N(t + ∆) − N(t)] − [Λ*(t + ∆) − Λ*(t)] | H_t} ≈ E[N(t + ∆) − N(t) | H_t] − λ*(t)∆ ≈ λ*(t)∆ − λ*(t)∆ = 0.

However, the simplicity of this argument is deceptive in that the identification E[N(dt) | H_t] = λ*(t) dt on which it depends, while intuitively clear, is tantamount to accepting the martingale property as a first premise. When F = H, the internal history, the challenge is to derive this seemingly simple statement from the definition of the conditional intensity in terms of a family of hazard functions. Exercise 7.2.2 gives a simple special case. A formal proof starts from the Doob–Meyer decomposition of a submartingale into an increasing, predictable part and a martingale (see Proposition A3.4.IX). The predictable part is identified with the compensator and shown to equal the integral of the conditional intensity function when such a function exists. See Chapter 14 for details.

Lemma 7.2.V characterizes the compensator as the process that must be subtracted from the increasing process N(t) to make it a martingale. It is increasing and, as holds for the conditional intensity, it is required to have a predictability property that in practice (at least when a conditional intensity exists) reduces to continuity. It increases continuously even though the process N(t) is a step function with irregularly spaced steps.
By contrast, the martingale component includes jumps and is sometimes referred to as the innovations process. It may be compared with the Brownian motion term in a stochastic differential equation. However, it is only in very special situations (notably the Poisson process) that the innovations process for a point process has independent increments. In a renewal process, for example, the compensator is a sum of log survivor functions or, more generally, integrated hazard functions (IHFs) as in Section 4.6, and the martingale component consists of a combination of continuous segments, predictable when the last point is known, and unpredictable jumps (see Exercise 7.2.11). Another remarkable property of the compensator is embodied in the random time-change theorem outlined in Section 7.4. It provides a far-reaching generalization of the assertion (see Exercise 2.4.4) that a nonstationary Poisson process can be transformed back into a stationary one by stretching the time axis, specifically by setting τ = Λ(t) = ∫_0^t λ(u) du.
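The time-change assertion can be checked numerically. The sketch below (parameter values purely illustrative) simulates a nonstationary Poisson process by thinning and verifies that the transformed times τ_i = Λ(t_i) have gaps of roughly unit mean, as for a unit-rate Poisson process:

```python
import math
import random

random.seed(42)

def simulate_inhom_poisson(rate, rate_max, T):
    """Lewis-Shedler thinning: propose events at constant rate rate_max,
    accept a proposal at time t with probability rate(t)/rate_max."""
    t, pts = 0.0, []
    while True:
        t += random.expovariate(rate_max)
        if t > T:
            return pts
        if random.random() < rate(t) / rate_max:
            pts.append(t)

a, b, T = 2.0, 0.5, 200.0
rate = lambda t: a + b * t                     # deterministic intensity
Lambda = lambda t: a * t + 0.5 * b * t * t     # its compensator
pts = simulate_inhom_poisson(rate, a + b * T, T)
taus = [Lambda(t) for t in pts]                # random time change
gaps = [u - v for v, u in zip([0.0] + taus[:-1], taus)]
mean_gap = sum(gaps) / len(gaps)               # should be close to 1
```

The same transformation applied with a conditional intensity λ*(t) in place of the deterministic rate is the subject of Section 7.4.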
Exercises and Complements to Section 7.2
7.2.1 Complete the details of the proof of Proposition 7.2.III. [Hint: Use (7.2.1b), (7.2.2) and (7.2.3).]
7.2.2 Consider a one-point process with its point t_1 uniformly distributed over (0, T) for some positive T. Show that the conditional intensity is given by

λ*(t) = 1/(T − t)  (0 < t ≤ t_1),
λ*(t) = 0  (t_1 < t ≤ T).

Find also the corresponding compensator Λ*(t) and check that E[Λ*(t)] = t/T = E[N(t)] < 1 for 0 < t < T.
7.2.3 (a) For a d.f. F with density f, write h(x) = f(x)/F̄(x) for its hazard function, where F̄(x) = 1 − F(x). Verify that a renewal process with lifetime d.f. F on R_+, with realization 0 = t_0 < t_1 < ··· < t_n < ··· and N(t) = sup{n: t_n < t} (note that N(t) is then left-continuous), has conditional intensity

λ*(t) = h(t − t_{N(t)})  (7.2.14)

and likelihood f(t_1) f(t_2 − t_1) ··· f(t_{N(t)} − t_{N(t)−1}) F̄(t − t_{N(t)}) [see Example 5.3(b)].
(b) Now let N(·) denote the counting function on R_+ of a delayed renewal process in which t_1 has d.f. G with density g and otherwise the lifetime d.f. is F with mean λ^{−1} as in (a). Show that λ*(t) = g(t)/Ḡ(t) if N(t) = 0 and otherwise (7.2.14) holds, and that the likelihood function equals Ḡ(t) if N(t) = 0 and otherwise equals g(t_1) (∏_{i=1}^{N(t)−1} f(t_{i+1} − t_i)) F̄(t − t_{N(t)}).
(c) For a stationary renewal process, put g(t) = λF̄(t) in (b).
(d) Evaluate the expressions in (a) and (c) when (i) F(x) = 1 − e^{−λx} (x > 0); (ii) F(x) = 1 − (1 + λx)e^{−λx} (x > 0).
7.2.4 Let 0 = t_0 < t_1 < ··· be a realization on (0, t] of the Wold process detailed in Exercise 4.5.8. Write down its likelihood function and its hazard function. Investigate both these functions when the process is stationary (so that then t_0 < 0 in general). See Lai (1978) for another example.
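The hazard-function representation (7.2.14) of Exercise 7.2.3 can be sketched numerically for the lifetime d.f. of part (d)(ii); the helper names below are ours, not the text's:

```python
import math

lam = 1.5
f = lambda x: lam * lam * x * math.exp(-lam * x)   # density of (d)(ii)
S = lambda x: (1 + lam * x) * math.exp(-lam * x)   # survivor function 1 - F
h = lambda x: f(x) / S(x)                          # hazard function

def renewal_intensity(t, times):
    """lambda*(t) = h(t - t_{N(t)}) as in (7.2.14), with t_0 = 0."""
    last = max([0.0] + [ti for ti in times if ti < t])
    return h(t - last)

def renewal_loglik(times, T):
    """log of f(t1) f(t2 - t1) ... f(t_n - t_{n-1}) S(T - t_n)."""
    gaps = [u - v for v, u in zip([0.0] + list(times[:-1]), times)]
    return sum(math.log(f(g)) for g in gaps) + math.log(S(T - times[-1]))
```

Here h(x) = λ²x/(1 + λx), so the intensity drops to zero immediately after each event and then rises toward λ, reproducing the dead-time behaviour of this renewal model.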
7.2.5 Hawkes model with exponential decay. Consider the model in (7.2.7) with K = 0, writing it in the form

λ*(t) = λ + ν ∫_0^t αe^{−α(t−u)} N(du) = λ + να Σ_{t_i ≤ t} e^{−α(t−t_i)},

where ν = ∫_0^∞ µ(t) dt. Establish the properties below.
(i) The process Y(t) = ∫_0^t e^{−α(t−u)} N(du) is Markovian; hence also λ*(t) = λ + ναY(t), with infinitesimal transitions and rates

Y(t + dt) = Y(t) + 1 with probability [λ + ναY(t)] dt,
Y(t + dt) = (1 − α dt)Y(t) with probability 1 − [λ + ναY(t)] dt.
(ii) The distribution function F_t(y) = Pr{Y(t) ≤ y} satisfies the forward Kolmogorov equation

∂F_t(y)/∂t = αy ∂F_t(y)/∂y − ∫_{(y−1)+}^{y} (λ + ναu) F_t(du).  (7.2.15)
(iii) If ν < 1, an equilibrium distribution exists, with density π(x) say, that satisfies

αyπ(y) = ∫_{(y−1)+}^{y} (λ + ναu) π(u) du,

for which π(y) = π(1) e^{ν(y−1)} y^{(λ/α)−1} for 0 < y < 1 and, for real θ ≥ 0,

φ(θ) ≡ ∫_0^∞ e^{−θy} π(y) dy = exp[ (λ/α) ∫_{exp(−θ)}^{1} (1 − w) dw / (ν(1 − w) + log w) ].
(iv) The likelihood for a set of observations 0 < t_1 < ··· < t_{N(T)} from the equilibrium process on (0, T) is given by ∫_0^∞ L_y π(y) dy, where L_y is formed in the usual way from the modified conditional intensity

λ*_y(t) = ναy e^{−αt} + λ + ν ∫_0^t αe^{−α(t−u)} N(du).
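The Markovian representation in (i) gives an exact thinning simulation, since λ*(t) is non-increasing between events and so its current value bounds all future values up to the next event. A sketch (parameter values illustrative):

```python
import math
import random

random.seed(1)

def simulate_hawkes_exp(lam0, nu, alpha, T):
    """Exact thinning for lambda*(t) = lam0 + nu*alpha*Y(t), where
    Y(t) = sum exp(-alpha (t - t_i)) decays between events and jumps
    by 1 at each event."""
    t, Y, events = 0.0, 0.0, []
    while True:
        bound = lam0 + nu * alpha * Y          # dominates the intensity until the next event
        t_next = t + random.expovariate(bound)
        if t_next > T:
            return events
        Y *= math.exp(-alpha * (t_next - t))   # decay the state to the candidate time
        t = t_next
        if random.random() < (lam0 + nu * alpha * Y) / bound:
            Y += 1.0
            events.append(t)

ev = simulate_hawkes_exp(lam0=0.5, nu=0.5, alpha=1.0, T=500.0)
rate_hat = len(ev) / 500.0   # stationary mean rate should be near lam0/(1 - nu) = 1
```

The empirical rate can be compared with the branching-process mean rate λ/(1 − ν) for ν < 1.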
7.2.6 (a) For each of the models implied by (7.2.8) and (7.2.9) with r parameters b_1, …, b_r, check that

Σ_{j=1}^{r} Σ_{k=1}^{r} v_j v_k ∂² log L / ∂b_j ∂b_k ≤ 0  (all real v_j, j = 1, …, r).

Deduce that if a solution of the equations ∂L/∂b_j = 0 (j = 1, …, r) is found, then it is unique.
(b) For the log-linear model, show that along any ray {(ρb_1, …, ρb_r): −∞ < ρ < ∞}, log L → −∞ as |ρ| → ∞, so that a maximum on the ray exists, and hence a global maximum for log L exists.
[See Ogata and Vere-Jones (1984) for an example. In the linear model, there is no guarantee that, with parameters {b̂_j} so determined, any other set of observations will necessarily have positive likelihood, nor is it even necessarily the case that the intensity at every point in the realization is positive! In general, it is necessary to treat the problem as one of constrained optimization: see e.g. Ogata (1983) and the discussion by Berman (1983).]
7.2.7 Poisson process in a random environment [see Example 7.2(e)]. As a simple example of a modulated renewal process, suppose that the rate λ(t) of a simple Poisson process takes different values λ_1, …, λ_K in response to environmental factors X(t); thus, we can write

λ*(t) = Σ_{k=1}^{K} λ_k I_{A_k}(X(t)),
where A_k denotes the range of values of X(t) on which λ takes on the value λ_k. If X(t) is an observed, continuous function of t but the λ_k are unknown parameters of the process, write down the likelihood conditional on a knowledge of X(t) at time t. Hence, obtain an estimate of λ_k in terms of the proportion of time spent by X(t) in A_k. Is the result affected if instead of being an external variable, X(t) is a function of the backward recurrence time (i.e. of the age of the ‘component’ in place at time t)?
7.2.8 E–M algorithm applied to a Cox process with a Markovian rate function. In contrast to the previous exercise, suppose that the process X(t) governing the rate of occurrence of points is not observed but is known to be a continuous-time Markov chain with finite state space K = {1, …, K} and Q-matrix Q = {q_kl; k, l ∈ K}, and that when X(t) = k, points occur according to a Poisson process with rate λ_k. The aim is to estimate the parameters q_kl and λ_k from observations on N(·) alone. Approximate the continuous-time process by a discrete skeleton X(nδ); then the resulting Markov chain has transition probabilities given approximately (for δ small) by p_kk = 1 − q_kk δ, p_kl = −q_kl δ (k ≠ l). Observations on the process consist of the counts Y_n = N(nδ, (n+1)δ], treated as Poisson or even binomial (presence or absence of points). Write down and implement iterative procedures for estimating the parameters of the discrete approximation, and hence of the underlying continuous process, using the E–M methodology. [Hint: This example has been widely discussed in the literature on point process filtering and will be reviewed further in Chapter 14. Since the Markov process is unobserved, the example can be treated as a ‘hidden Markov model’ and is thus a natural candidate for analysis via the Baum–Welch and E–M algorithms—see Dempster et al. (1977), Elliott et al. (1995), and MacDonald and Zucchini (1997).
The full likelihood is the likelihood for both the realization of the Markov chain and the observed counts; the restricted likelihood is the likelihood for the observed counts only, averaged over the possible realizations of the Markov chain. The references cited give general accounts of the form of the averaging (E-step) and estimation (M-step) procedures that can be employed to pass from the full to the restricted likelihoods and obtain the resulting estimates.]
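A compact sketch of one such E–M (Baum–Welch) iteration for the discrete skeleton, with Poisson-distributed counts, is given below; the state space, count sequence, and starting values are all illustrative choices, not data from the text:

```python
import math

def poisson_pmf(y, mu):
    return math.exp(-mu) * mu ** y / math.factorial(y)

def em_step(counts, P, mus, pi0):
    """One Baum-Welch (E-M) iteration for a hidden finite-state chain with
    Poisson counts: scaled forward-backward (E-step), then closed-form
    re-estimation (M-step).  Returns updated parameters and the log
    likelihood of the current parameters."""
    K, n = len(mus), len(counts)
    b = [[poisson_pmf(y, mus[k]) for k in range(K)] for y in counts]
    # forward pass: alpha[t][k] = Pr{X_t = k | Y_1, ..., Y_t}
    alpha, scale = [None] * n, [0.0] * n
    prev = [pi0[k] * b[0][k] for k in range(K)]
    for t in range(n):
        if t > 0:
            prev = [sum(alpha[t - 1][j] * P[j][k] for j in range(K)) * b[t][k]
                    for k in range(K)]
        scale[t] = sum(prev)
        alpha[t] = [p / scale[t] for p in prev]
    # backward pass, scaled by the same constants
    beta = [[1.0] * K for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = [sum(P[k][j] * b[t + 1][j] * beta[t + 1][j] for j in range(K))
                   / scale[t + 1] for k in range(K)]
    # gamma[t][k] = Pr{X_t = k | all observations}
    gamma = [[alpha[t][k] * beta[t][k] for k in range(K)] for t in range(n)]
    # M-step: re-estimate transition matrix, Poisson means, initial law
    newP = [[sum(alpha[t][j] * P[j][k] * b[t + 1][k] * beta[t + 1][k] / scale[t + 1]
                 for t in range(n - 1))
             / sum(gamma[t][j] for t in range(n - 1))
             for k in range(K)] for j in range(K)]
    newmus = [sum(gamma[t][k] * counts[t] for t in range(n))
              / sum(gamma[t][k] for t in range(n)) for k in range(K)]
    return newP, newmus, gamma[0], sum(math.log(c) for c in scale)

counts = [0, 0, 1, 0, 5, 6, 4, 0, 1, 0, 0, 7, 5, 0]
P = [[0.9, 0.1], [0.2, 0.8]]
mus, pi0 = [0.5, 4.0], [0.5, 0.5]
P, mus, pi0, ll1 = em_step(counts, P, mus, pi0)
P, mus, pi0, ll2 = em_step(counts, P, mus, pi0)
```

Iterating until the restricted log likelihood stabilizes, and then mapping p_kl back to q_kl via the δ-skeleton, gives the estimates for the continuous-time chain.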
7.2.9 Stress-release model: Stationary behaviour. In Example 7.2(g), let F(x, t) = Pr{X(t) ≤ x}, S(u) = Pr{κ > u}.
(i) Show, using the notation of the example, that the forward equations for the Markov process X(t) take the form

∂F/∂t + ν ∂F/∂x = ∫_x^∞ Ψ(y) S(y − x) F(dy, t).
(ii) Deduce that, if it exists, the density π(x) of the stationary distribution for X(t) satisfies

νπ(x) = ∫_x^∞ Ψ(y) S(y − x) π(y) dy,

and that its characteristic function ϕ(s) = ∫ e^{isx} π(x) dx = E(e^{isX(·)}) satisfies ϕ(s) = γ(s)ϕ_Ψ(s), where, with µ = E(κ), γ(s) and ϕ_Ψ(s) are the characteristic functions of the distributions with densities S(x)/µ and (µ/ν)Ψ(x)π(x), respectively, and E[Ψ(X(t))] = ν/µ by stationarity.
(iii) If the mark distribution is exponential with mean µ, then

π(x) = A exp( x/µ − (1/ν) ∫_0^x Ψ(u) du ).
(iv) If Ψ(x) = exp[β(x − x_0)], the equation for ϕ(s) above takes the form

ϕ(s) = c γ(s) ϕ(s − iβ),   c = e^{−βx_0} µ/ν,

which admits the solution in infinite product form

ϕ(s) = e^{isR} γ(s) ∏_{k=1}^∞ e^{is/(βk)} γ(s − ikβ)/γ(−ikβ),
where R = x_0 + (log(βν) − γ_0)/β and γ_0 = 0.5772… is Euler's constant.
(v) Show that, in the stationary regime, if the jump distribution has moment generating function m(s), the risk Ψ[X(t)] has moments

E([Ψ(X)]^k) = ν/µ  (k = 1),
E([Ψ(X)]^k) = (ν/µ) (νβ)^{k−1} (k − 1)! / ∏_{ℓ=1}^{k−1} [m(ℓβ) − 1]  (k = 2, 3, …),

E([Ψ(X)]^{−k}) = ∏_{ℓ=1}^{k} [1 − m(−ℓβ)] / (νβℓ)  (k = 1, 2, …).
[Hint: See Vere-Jones (1988) and Borovkov and Vere-Jones (2000).] 7.2.10 (Continuation) Variance properties. (a) Let N (t) denote the number of jumps (events in the ground process) for the stress-release model. Show that in the stationary case, if X(t) has finite second moment, then var N (t) is bounded uniformly in t if and only if the jump distribution is degenerate at a single point.
[Hint: In this case, X(t) has bounded variance and the forward result is trivial; for the converse, consider a bivariate version of Wald's identity using the joint characteristic function for the intervals T_i and numbers of jumps N_i between successive crossings of a fixed level for X(t).]
(b) Under similar conditions, the mean rate and reduced second factorial moment density for the stress-release model can be expressed in the forms
m = ∫ Ψ(x)π(x) dx,   m̆_[2](u) = ∫ Ψ(x)π(x) dx ∫ j(y − x) dy ∫ Ψ(z) F_u(y, dz),
where j is the density of the jump distribution and the transition kernel F_u(y, z) = Pr{X(u) ≤ z | X(0+) = y}.
(c) In general, the difficulty of solving the forward equations to obtain the transition kernel F_u(y, ·) renders the equations above of relatively academic interest. However, if Ψ(x) = σ for x > 0 and 0 otherwise, the process alternates between ‘periods of prosperity’ when X(t) > 0 and ‘periods of recovery’ when X(t) < 0, the terminology being suggested by the analogy of a collective risk model. Then, an argument similar to that used for the M/G/1 queue and analogous storage problems can be used to show that the reduced covariance density c̆_[2](u) has Laplace transform of the form c*_[2](s) = [1 + ω(s)]^{−1}, where ω(s) is the unique solution in Re(θ) > 0 of the equation θ − s = σ[1 − j*(θ)] and j* is the Laplace transform of the jump density.
7.2.11 Renewal process compensators. (a) By integrating the conditional intensity function in (7.2.14), show that when the lifetime distribution of a renewal process has a density f, the compensator has the form

Λ*(t) = − Σ_{n=1}^{N(t)} log S(T_n − T_{n−1}) − log S(t − T_{N(t)}),
where S(·) is the survivor function for the lifetime d.f. with density f . (b) Verify directly that Λ∗ (t) as defined makes N (t) − Λ∗ (t) a martingale. (c) Show that (b) continues to hold for a general renewal process whose lifetime r.v.s are positive a.s., provided the log survivor function is replaced by the integrated hazard function (IHF).
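Part (b) can be checked by simulation. The sketch below (Gamma lifetimes; all parameter values illustrative) verifies numerically that N(t) − Λ*(t) averages to zero, as the martingale property requires:

```python
import math
import random

random.seed(7)
lam = 1.0
S = lambda x: (1 + lam * x) * math.exp(-lam * x)   # Gamma(2) survivor function
IHF = lambda x: -math.log(S(x))                    # integrated hazard function

def compensator(t, times):
    """Lambda*(t): IHFs of the completed gaps plus the IHF of the current age
    (times is the ordered list of event times in (0, t])."""
    gaps = [u - v for v, u in zip([0.0] + times[:-1], times)]
    age = t - (times[-1] if times else 0.0)
    return sum(IHF(g) for g in gaps) + IHF(age)

def sample_renewal(T):
    times, t = [], 0.0
    while True:
        t += random.gammavariate(2.0, 1.0 / lam)   # Gamma(2) lifetimes
        if t > T:
            return times
        times.append(t)

t0, reps, acc = 5.0, 4000, 0.0
for _ in range(reps):
    w = sample_renewal(t0)
    acc += len(w) - compensator(t0, w)
diff = acc / reps   # Monte Carlo estimate of E[N(t) - Lambda*(t)], near 0
```

Replacing −log S by a general IHF covers part (c) for lifetimes without a density.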
7.3. Conditional Intensities for Marked Point Processes The extension of conditional intensity models to higher dimensions is surprisingly straightforward provided that a causal, time-like character is retained for the principal dimension. When this is present, as in space–time processes, the development of conditional intensities and likelihoods can proceed along
much the same lines as was developed for one-dimensional simple point processes in the preceding sections. When it is absent, as in purely spatial point patterns, analysis is still possible in the finite case (compare the discussions in Chapter 5 and Section 7.1) but raises major problems for nonfinite cases such as occur for homogeneous processes in the plane. In this section, we examine the extension of the ideas of Section 7.2 to MPPs in time and space–time point processes. A more general and rigorous discussion of conditional intensities and related topics, for both simple and marked point processes in time, is given in Chapter 14. An approach to likelihood methods for spatial processes, based on the Papangelou intensity, is in Chapter 15. The groundwork for the material in the present section was laid in the basic paper by Jacod (1975); among many other references, Karr (1986) gives both a review of inference procedures for MPPs and a range of examples and applications.

Consider then an MPP on [0, ∞) × K, where, as in Section 6.4, K denotes the mark space, which may be discrete (for multivariate point processes), the positive half-line (if the marks represent weights or energies), two- or three-dimensional Euclidean space (for space–time processes), or more general spaces [e.g. for the Boolean model of Example 6.4(d)]. In order to define likelihoods for MPPs, we need first to fix on a measure in the mark space (K, B_K) to serve as a reference measure in forming densities. We shall denote this reference measure by ℓ_K(·), using ℓ(·) to denote Lebesgue measure on R^d. When K is also some Euclidean space, it will often be convenient to take ℓ_K to be Lebesgue measure on that space, but not always so; for example, in some situations it may be simpler to take ℓ_K to be a probability measure on K.
Similarly, when the mark space is discrete, it will often be convenient to take the reference measure to be counting measure, but in some situations it may again be more convenient to choose the reference measure to be a probability measure.

Once the reference measure ℓ_K has been fixed, we can extend the notion of a regular point process from simple to marked point processes. As in Definition 7.1.I, we shall say that an MPP on X = R^d × K is regular on A, for a bounded Borel set A ∈ B_X, if for all n ≥ 1 the Janossy measure J_n is absolutely continuous with respect to the n-fold product of ℓ × ℓ_K, and regular if it is regular on A for all bounded A ∈ B_X. Thus, when the MPP is regular on A, for every n > 0 there exists a well-defined Janossy density j_n(· | A × K) with the interpretation

j_n(x_1, …, x_n, κ_1, …, κ_n | A × K) dx_1 … dx_n ℓ_K(dκ_1) … ℓ_K(dκ_n)
  = Pr{points around (x_1, …, x_n) with marks around (κ_1, …, κ_n)}.

The following equivalences extend to MPPs the discussion around Proposition 7.1.III.

Proposition 7.3.I. Let N(·) be an MPP on R^d × K, let ℓ denote Lebesgue measure on (R^d, B_{R^d}) and ℓ_K the reference measure on (K, B_K), and let A be a bounded set in B_{R^d}. Then, conditions (i)–(iv) below are equivalent.
(i) N(·) is regular on A.
(ii) The probability measure induced by N(·) on Z_A^∪, where Z_A = A × K, is absolutely continuous with respect to the measure induced by ℓ × ℓ_K on Z_A^∪.
(iii) The ground process N_g(·) is regular on A, and for each n > 0 the conditional distribution of the marks (κ_1, …, κ_n), for a given realization (x_1, …, x_n) of the locations within A, is absolutely continuous with respect to ℓ_K with density f_{A,n}(κ_1, …, κ_n | x_1, …, x_n), say.
(iv) If Π(·) is a probability measure equivalent to ℓ_K on (K, B_K), then N(·) is absolutely continuous with respect to the compound Poisson process N_0(·) for which the ground process N_0g has positive intensity λ on A and the marks are i.i.d. with common probability distribution Π.

Proof. The four statements are just alternative ways of stating the fact that the Janossy measures J_n(·) in the proposition have appropriate densities on all components of X^∪. When any one of the conditions is satisfied, the Radon–Nikodym derivative of the probability measure P for N with respect to the probability measure P_0 of the compound Poisson process N_0 in (iv) has the form [see (7.1.3b)]

dP/dP_0 = e^{λℓ(A)} [ J_0 I_{N(T)=0} + Σ_{n=1}^∞ I_{N(T)=n} j_n^g(x_1, …, x_n | A) f_{A,n}(κ_1, …, κ_n | x_1, …, x_n) / (λ^n π(κ_1) ⋯ π(κ_n)) ],  (7.3.1a)

in which π(κ) = (dΠ/dℓ_K)(κ); (7.3.1a) is itself a portmanteau expression of the statements that, given a realization (x_1, κ_1), …, (x_n, κ_n) with N(T) = n, the likelihood ratio of N with respect to N_0 is given by

L/L_0 = j_n^g(x_1, …, x_n | A) f_{A,n}(κ_1, …, κ_n | x_1, …, x_n) / [λ^n π(κ_1) ⋯ π(κ_n)].  (7.3.1b)

Much as in the discussion leading to Proposition 7.2.I, we now rewrite the Janossy densities in a way that takes advantage of the directional character of time. Thus, the Janossy densities for the first few pairs may be represented in the form

J_0(T) = S_1(T),
j_1(t_1, κ_1 | T) = p_1(t_1, κ_1) = p_1(t_1) f_1(κ_1 | t_1)  (0 < t_1 < T),
j_2(t_1, t_2, κ_1, κ_2 | T) = p_1(t_1) f_1(κ_1 | t_1) p_2(t_2 | (t_1, κ_1)) f_2(κ_2 | (t_1, κ_1), t_2)  (0 < t_1 < t_2 < T),

where the p_i(·) refer to the densities, suitably conditioned, for the locations in the ground process, and the f_i(·) refer to the densities, again suitably conditioned, for the marks. There is a subtle difference between the conditioning incorporated into the conditional densities f_n(κ_n | (t_1, κ_1), …, (t_{n−1}, κ_{n−1}), t_n) that appear in the equations above and those that appear in the proposition. In the equations above we condition the distribution of the current mark, as
time progresses, on both marks and time points of all preceding events; in the proposition, we condition on the full set of time points in (0, T), irrespective of the marks and of their relative positions in time.

Once again, the dependence of the left-hand side on T is illusory, and the densities for the locations can be expressed in terms of corresponding hazard functions. The conditioning in the hazard functions may now include the values of the preceding marks as well as the length of the current and preceding intervals. All this information is collected into the internal history H ≡ {H_t: t ≥ 0} of the process, so that the amalgam of hazard functions and mark densities can be represented as a single composite function for the MPP, namely

λ*(t, κ) = h_1(t) f_1(κ | t)  (0 < t ≤ t_1),
λ*(t, κ) = h_n(t | (t_1, κ_1), …, (t_{n−1}, κ_{n−1})) f_n(κ | (t_1, κ_1), …, (t_{n−1}, κ_{n−1}), t)  (t_{n−1} < t ≤ t_n, n ≥ 2),  (7.3.2)

where h_1(t) is the hazard function for the location of the initial point, h_2(t | (t_1, κ_1)) the hazard function for the location of the second point conditioned by the location of the first point and the value of the first mark, and so on, while f_1(κ | t) is the density for the first mark given its location, and so on.

Definition 7.3.II. Let N be a regular MPP on R_+ × K. The conditional intensity function for N, with respect to its internal history H, is the representative function λ*(t, κ) defined piecewise by (7.3.2).

Predictability is again important in that the hazard functions refer to the risk at the end of a time interval, not at the beginning of the next time interval, so left-continuity should be preferred where there is a jump in the conditional intensity. Similarly, the conditional mark density refers to the distribution to be anticipated at the end of a time interval, not immediately after the next interval has begun.
More formal and more general discussions of predictability in the MPP context will be given in Chapter 14. It is often convenient to write

λ*(t, κ) = λ*_g(t) f*(κ | t),  (7.3.3)
where λ∗g (t) is the H-intensity of the ground process (i.e. of the locations {ti } of the events), and f ∗ (κ | t) is the conditional density of a mark at t given Ht− (the reader will note that we use the ∗ notation as a reminder that the ‘functions’ concerned are also random variables dependent in general on the random past history of the process). The two terms in (7.3.3) correspond to the first and second factors in (7.3.2). Heuristically, equations (7.3.2) and (7.3.3) can be summarized in the form λ∗ (t, κ) dt dκ ≈ E[N (dt × dκ) | Ht− ] ≈ λ∗g (t) f ∗ (κ | t) dt dκ .
(7.3.4)
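The factorization (7.3.3)–(7.3.4) can be illustrated numerically: given a joint intensity λ*(t, κ), integrating out the mark recovers λ*_g(t), and the ratio gives the proper density f*(κ | t). The particular intensity chosen below is purely illustrative:

```python
import math

def lam_joint(t, kappa):
    """Toy joint conditional intensity: ground rate 2 + t, marks exponential
    with time-varying mean 1 + t/2 (all choices illustrative)."""
    mean_mark = 1.0 + 0.5 * t
    return (2.0 + t) * math.exp(-kappa / mean_mark) / mean_mark

def lam_ground(t, ngrid=4000, kmax=200.0):
    """lambda*_g(t): integrate the joint intensity over the mark space
    (midpoint rule on a truncated range)."""
    h = kmax / ngrid
    return h * sum(lam_joint(t, (i + 0.5) * h) for i in range(ngrid))

def f_mark(kappa, t):
    """f*(kappa | t) = lambda*(t, kappa) / lambda*_g(t), a proper density."""
    return lam_joint(t, kappa) / lam_ground(t)

lg = lam_ground(1.0)   # should recover the ground rate 2 + t = 3
```

For a genuinely history-dependent model, the same two functions would simply take the observed past (t_i, κ_i) as an extra argument.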
Notice that the H-intensity λ*_g(t) is not in general the same as the conditional intensity λ_g(t) of the ground process with respect to its own internal history H_g: H incorporates information about the values of the marks, whereas H_g does not. The example below illustrates the difference in a simple special case.

Example 7.3(a) Bivariate Poisson process [see Example 6.3(e)]. We consider a bivariate Poisson process initiated at time 0 rather than the stationary version considered earlier. We consider also just the process of linked pairs, in which the points {t_i} of component I form the ‘parents’ and arrive according to a simple Poisson process with rate λ, while the points {s_j} of component II represent the process of ‘offspring’. We assume each parent has just one offspring, delayed by nonnegative random times {τ_i} forming an i.i.d. sequence, independent also of the times {t_i}, with common exponential distribution 1 − e^{−µτ}. We shall treat this process as a special case of an MPP with mark space having two discrete points, corresponding to components I and II. The internal history, H, for the full process records the occurrence times and marks for both types of events but does not record which event in component II is associated with which event in component I. Suppose that, at time t, N_I(t) = n, N_II(t) = m, where necessarily m ≤ n. The full H-intensity is given by

λ*(t, κ) = λ  (κ = I),
λ*(t, κ) = (n − m)µ  (κ = II).

Let H_I, H_II, and H_g denote the internal histories of the component I process, the component II process, and the ground process. The H_I-intensity of component I is clearly equal to its H-intensity λ_I ≡ λ. To find the H_II-intensity of component II, we have to average over the n ≥ m points of component I. For a given value of n, the locations t_i may be treated as n i.i.d. variables uniformly distributed over (0, t). The probability that any one such point produces an offspring that appears only after time t is given by

p(t) = ∫_0^t e^{−µ(t−s)} ds/t = (1 − e^{−µt})/(µt).

The k = n − m parent points that fail to produce offspring in the interval (0, t) then form a ‘thinned’ version of the original, Poisson-distributed number n of the component I points in (0, t), the selected and nonselected points forming two independent streams. Independently of the number m of successes, the expected number of points with offspring still pending is thus λtp(t), and we obtain for the H_II-intensity of the component II process

λ_II(t) = E[(n − m)µ | N_II(t) = m] = µλt(1 − e^{−µt})/(µt) = λ(1 − e^{−µt}).

This is a nonrandom function of t, and we recognize it as the conditional intensity of a nonstationary Poisson process. Thus, the two components separately are Poisson, and the rate of the component II process approaches that of component I as t → ∞. The ground process has H-intensity λ + (n − m)µ
and H_II-intensity λ(2 − e^{−µt}); its H_g-intensity is that of a Gauss–Poisson process; see Exercise 7.3.1. Similar distinctions need to be borne in mind with respect to the various compensators and martingales that can be formed with the two component processes. Thus, N_I(t) − λt is both an H- and an H_I-martingale, the process N_II(t) − µ∫_0^t [N_I(u) − N_II(u)] du is an H-martingale, and N_II(t) − ∫_0^t λ(1 − e^{−µu}) du is an H_II-martingale.

We now turn to an MPP extension of Proposition 7.2.III, expressing the likelihood of a simple point process in terms of its conditional intensity. As there, reversing the construction that leads from the point process distributions to the H-intensity in (7.3.2) yields an explicit expression for the Janossy density of the MPP in terms of its conditional intensity (see below). Details of the proof are left to Exercise 7.3.2.

Proposition 7.3.III. Let N be a regular MPP on [0, T] × K for some finite positive T, and let (t_1, κ_1), …, (t_{N_g(T)}, κ_{N_g(T)}) be a realization of N over the interval [0, T]. Then, the likelihood L of such a realization is expressible in the form

L = [ ∏_{i=1}^{N_g(T)} λ*(t_i, κ_i) ] exp( − ∫_0^T ∫_K λ*(u, κ) ℓ_K(dκ) du )
  = [ ∏_{i=1}^{N_g(T)} λ*_g(t_i) ] [ ∏_{i=1}^{N_g(T)} f*(κ_i | t_i) ] exp( − ∫_0^T λ*_g(u) du ),  (7.3.5)

where ℓ_K is the reference measure on K. Its log likelihood ratio on [0, T] relative to the compound Poisson process N_0 with constant intensity λ and i.i.d. mark distribution with density π(·) is expressible as

log (L/L_0) = Σ_{i=1}^{N_g(T)} log [ λ*(t_i, κ_i) / (λπ(κ_i)) ] − ∫_0^T ∫_K [λ*(u, κ) − λπ(κ)] ℓ_K(dκ) du
  = Σ_{i=1}^{N_g(T)} log [ λ*_g(t_i)/λ ] − ∫_0^T [λ*_g(u) − λ] du + Σ_{i=1}^{N_g(T)} log [ f*(κ_i | t_i) / π(κ_i) ].  (7.3.6)
The second form in equations (7.3.5) and (7.3.6) follows from the assumption that the densities over the mark space are proper (i.e. integrate to unity). The reversibility of the arguments leading to the representation of the conditional intensity function in (7.3.2) (see Exercise 7.3.2) implies the following MPP analogue of Proposition 7.2.IV.

Proposition 7.3.IV. Let N be a regular MPP as in Proposition 7.3.III. Then, the conditional intensity function with respect to the internal history H determines the probability structure of N uniquely.

The next proposition gives specific examples of such characterizations, making more explicit the distinction between point processes with independent and unpredictable marks introduced already in Section 6.4.
Proposition 7.3.V. Let N be a regular MPP on R+ × K with H-intensity expressible as λ∗ (t, κ) = λ∗g (t)f ∗ (κ | t), (7.3.7) where λ∗g (t) is the H-intensity of the ground process. Then N is (i) a compound Poisson process if λ∗g (t) = λ(t) and f ∗ (κ | t) = f (κ | t) for deterministic functions λ(t) and f (κ | t); (ii) a process with independent marks if λ∗g (t) equals the Hg -intensity for the ground process and f ∗ (κ | t) = f (κ | t) as in (i); and (iii) a process with unpredictable marks if f ∗ (κ | t) = f (κ | t) as in (i). Proof. In a process with independent marks, the ground process and the marks are completely decoupled (i.e. they are independent processes), whereas for a process with unpredictable marks, the marks can influence the subsequent evolution of the process, though the ground process does not influence the distribution of the marks. The compound Poisson process is the special case of a Poisson process with independent marks. The forms of the conditional intensities follow readily from these comments, which merely reflect the definitions of these three types of MPP given in Definition 6.4.III and preceding Lemma 6.4.VI. The lemma is then a consequence of the uniqueness assertion in Proposition 7.3.IV. Some details and examples are given in Exercise 7.3.5. The following nonlinear generalization of the Hawkes process is important for its range of applications. It has been used as a model for neuron firing in Br´emaud and Massouli´e (1994, 1996), and it also embraces a range of other examples, including both ordinary and space–time versions of the ETAS model [Examples 6.4(d) and 7.2(f)]. Example 7.3(b) Nonlinear, marked Hawkes processes [see Example 7.2(b)]. We start by extending the basic Hawkes process N to a nonlinear version with conditional intensity [see (7.2.6)] t ∗ λ (t) = Φ λ + µ(t − u) N (du) , (7.3.8) 0
where the nonnegative function Φ is in general nonlinear but satisfies certain boundedness and continuity conditions; in particular, it is required to be Lipschitz with Lipschitz constant α ≤ 1. Such a nonlinear Hawkes process can immediately be extended to a nonlinear marked Hawkes process by giving the points independent marks with density f(κ), so that the conditional intensity function for the marked version is

λ∗(t, κ) = Φ( λ + ∫₀ᵗ µ(t − u) Ng(du) ) f(κ) = λ∗g(t) f(κ).   (7.3.9)
The marks here make no contribution to the current risk, nor to the evolution of the ground process, which therefore has the same structure as the process N of (7.3.8). Consequently, in (7.3.9) we have Ng = N .
7.3. Conditional Intensities for Marked Point Processes
By contrast, generalizing the ETAS model of Example 6.4(d) and using its notation, we may equally well consider extensions in which the conditional intensity has the form

λ∗(t, κ) = Φ( λ + ∫_{(0,t)×K} ψ(χ) µ(t − u) N(du × dχ) ) f(κ),   (7.3.10)
where ψ(χ) modifies the strength of the infectivity density µ(·) according to the mark χ. In this case, the process has unpredictable marks that, depending on the form of ψ(·), can influence substantially the evolution of the ground process. In both cases, the likelihood for a finite observation period [0, T] decouples and, following the second form in (7.3.6), can be written as

log L = Σ_{i: 0≤ti≤T} log λ∗g(ti) − ∫₀ᵀ λ∗g(u) du + Σ_{i: 0≤ti≤T} log f(κi)
      ≡ log L1 + log L2,

where

λ∗g(t) = Φ( λ + ∫_{(0,t)×K} ψ(κ) µ(t − u) N(du × dκ) ).
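The decoupling of log L into log L1 + log L2 can be made concrete with a short numerical sketch. The snippet below evaluates both terms for the linear choice of Φ (the identity), an exponential infectivity density µe^{−µx}, mark-dependent weight ψ(κ) = νκ, and i.i.d. exponential marks with density βe^{−βκ}; these concrete functional forms and parameter values are illustrative assumptions, not specifications from the text.

```python
import math

def hawkes_marked_loglik(times, marks, T, lam=0.5, nu=0.8, mu=1.0, beta=1.0):
    """Decoupled log likelihood log L = log L1 + log L2 for a marked Hawkes
    model with linear Phi (the identity), infectivity density mu*exp(-mu*x),
    mark-dependent weight psi(kappa) = nu*kappa, and i.i.d. exponential
    marks with density beta*exp(-beta*kappa).  All functional forms and
    parameter values are illustrative assumptions."""
    # log L1: sum of log ground intensities at the observed events ...
    logL1 = 0.0
    for i, ti in enumerate(times):
        rate = lam + sum(nu * kj * mu * math.exp(-mu * (ti - tj))
                         for tj, kj in zip(times[:i], marks[:i]))
        logL1 += math.log(rate)
    # ... minus the integrated ground intensity over [0, T]
    logL1 -= lam * T + sum(nu * kj * (1.0 - math.exp(-mu * (T - tj)))
                           for tj, kj in zip(times, marks))
    # log L2: the mark density involves no history, so it decouples completely
    logL2 = sum(math.log(beta) - beta * kj for kj in marks)
    return logL1, logL2
```

Because no parameter appears in both terms here, each of logL1 and logL2 could be maximized separately, as remarked in the text.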
In many parametric models, no parameter appears in both L1 and L2, so each term can be maximized separately. It is not necessary here to limit the mark to a measure of the size of the accompanying event. As suggested in Example 6.4(d), elements in the mark space may comprise both size and spatial components, κ ∈ K and y ∈ Y, say. Then we can write, for example,

λ∗(t, κ, x) = Φ( λ + ∫_{(0,t)×K×Y} ψ(χ) µ(t − u) g(x − y) N(du × dχ × dy) ) f(κ),
where the spatial density g(·), like f(·), has been normalized to have unit integral and determines the positions of the offspring about the ancestor. Because of the independent sizes κi here, the log likelihood again separates into two terms, the first of which is analogous to log L1 above but includes an integration over both space and time.

From a model-building point of view, it is of critical importance to establish conditions for the existence of stationary versions of the process and for convergence to equilibrium. General conditions are given by Brémaud and Massoulié (1996) and discussed further in Chapters 13 and 14. In the special case corresponding to the space–time ETAS model, where the function Φ is linear (and can be taken to be the identity function), the process retains the basic branching structure, and a sufficient condition for the existence of a stationary version is the subcriticality of the underlying branching component, as outlined already in Example 6.4(d). It is, of course, quite possible to devise models where the mark distributions are dependent on the evolution of the process. A simple example is given below.
Example 7.3(c) Processes governed by a Markovian rate process. Several models for both simple and marked processes are governed by an underlying Markov process, X(t) say, which both influences and is influenced by the evolving point process. Typically, in the marked case, both the ground process intensity and the mark distribution depend on the current value of X(t). Two simple models of this type are the simple stress-release model in Example 7.2(g) and the Cox process with Markovian rate function considered in Exercise 7.2.7. To illustrate possible ramifications of such models, consider first a Hawkes process with exponential infectivity density µ(x) = µe^{−µx}. In this case, the Markovian process X(t) is given by the sum

X(t) = µ Σ_{i: 0<ti<t} e^{−µ(t−ti)},

so that

λ∗(t) = Φ[X(t)],
where in the simplest case, Φ(x) = λ + νx for some λ > 0 and 0 < ν < 1 as in Exercise 7.2.5. Next, we could consider a marked version of such a process, with random event sizes Si = ψ(κi ), defining X(t) by
X(t) = µ Σ_{i: 0<ti<t} ψ(κi) e^{−µ(t−ti)}.   (7.3.11)
In the simplest case of independent marks, with common density f(κ),

λ∗(t, κ) = Φ[X(t)] f(κ),   (7.3.12)
corresponding to an ETAS-type model but with exponential rather than power-law decay function. It might well be natural, however, to suppose that not only the rate λ∗(t) but also the density f(κ) of the mark distribution could be affected by the value of X(t), in which case f(κ) would be replaced by f(κ | X(t)). To take a particular parametric example, let the mark distribution have an exponential density βe^{−βκ}, and set β = a + bX(t) so that the conditional intensity takes the form

λ∗(t, κ) = e^{−λ−νX(t)} · [a + bX(t)] e^{−[a+bX(t)]κ},

with X(t) given by (7.3.11). In this case, the log likelihood can still be written as the sum of two terms, log L = log L1 + log L2, say, where the second term equals Σ_i log f(κi | X(ti)), but it is no longer possible to decouple the two parts of the likelihood completely because the parameters relating to X(t) appear in both parts. In the specific example considered, log L equals

Σ_i log Φ[X(ti)] − ∫₀ᵀ Φ[X(u)] du + Σ_i log f(κi | X(ti)) = log L1 + log L2,
where the parameters λ and ν appear in L1 only, the parameters a and b appear in L2 only, but the parameter µ, as well as any parameter involved in the definition of the function ψ, appears in both L1 and L2 .
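The practical attraction of the exponential infectivity density is the Markov property of X(t): between events it decays deterministically, and at each event it jumps by µψ(κi), so the intensity can be updated in constant time per event. A minimal sketch of this recursion (µ and ψ are illustrative placeholders):

```python
import math

def state_at_events(times, marks, mu=1.0, psi=lambda k: k):
    """Values X(ti-) of the Markov state (7.3.11) entering lambda*(ti):
    between events X(t) decays by the factor exp(-mu*dt); at an event ti
    it jumps by mu*psi(kappa_i).  mu and psi are illustrative placeholders."""
    X, t_prev, out = 0.0, 0.0, []
    for ti, ki in zip(times, marks):
        X *= math.exp(-mu * (ti - t_prev))  # deterministic decay since the last event
        out.append(X)                       # X(ti-), to be fed into Phi[X(ti)]
        X += mu * psi(ki)                   # jump contributed by the new event
        t_prev = ti
    return out
```

This one-step update is what makes likelihood evaluation O(n) for the exponential kernel, in contrast to the O(n²) sums required by a general decay function.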
Example 7.3(d) Linked stress-release model. This is a multivariate version of the basic model outlined in Example 7.2(g). We consider a finite number of distinct regions or components i = 1, . . . , I, say, each with its own stress level Xi(t) and with the property that a proportion θij of a stress drop occurring in region i is transferred to region j (but we do not necessarily require either θij ≥ 0 or Σ_i θij = 1). The evolution of stress Xi(t) in the ith region can thus be expressed in the form

Xi(t) = Xi(0) + ρi t − Σ_j θij S(j)(t),   (7.3.13)
where S(j)(t) is the accumulated stress release in region j over the period [0, t) and ρi is the rate of stress input into region i. The process of events is doubly marked: by the region i and by the size of the stress drop κ. We suppose that both the risk functions (i.e. stress levels) and the jump distributions are functions of the vector X(t). The assumptions imply that the process X(t) controls the evolution of the point process and is itself Markovian. They lead to a conditional intensity of the form

λ∗(t, i, κ) = Ψi[X(t)] fi[κ | X(t)],   (7.3.14)

where fi[κ | X(t)] is the density for the distribution of stress drop for an event that occurs in region i at a time when the vector of stress levels is X(t), and Ψi gives the risk in region i as a function of the vector of stress levels X(t). Typically, Ψi(X) = exp(µi + νi Xi), so that only the stress level in the region under consideration affects the risk in that region. Then, the conditional intensity function can be written in the reparameterized form
λ∗(t, i, κ) = exp{ αi + νi [ρi t − Σ_j θij S(j)(t)] } fi[κ | X(t)],
where αi = µi + νi Xi(0), and νi, ρi, and θij (i ≠ j) are the parameters to be estimated, apart from those involved in the density function for the stress drops, and we set θii = 1. As in Example 7.3(c), the likelihood can be expressed as the sum of two terms, the first relating to the times and the second to the stress drops of the events, but it only fully decouples when the stress drops are i.i.d. In the present context, an appealing candidate for the mark distribution is the tapered Pareto, or Kagan, distribution with survivor function

1 − F(κ) = ( c/(c + κ) )^α e^{−βκ}.   (7.3.15)

Typically, β is taken very small so that for small and intermediate values of κ, the density is close to a power-law form, but for large κ it is dominated by the exponential taper. Distributions of this general type have recently been considered in several contexts where it is desirable for the body of the distribution to have a power-law character but for the moments to remain
finite (see e.g. Kagan, 1999; Kagan and Schoenberg, 2001; Vere-Jones et al., 2001). For the present example, we might take α as fixed and equal to unity and allow β to decrease with the value of X(t) in such a way that the upper turning point 1/β increases to ∞. In this case, the tail of the distribution would progressively lengthen (admitting larger and larger events) as the stress level increased, while its mean approached +∞. For applications of the linked stress-release model to earthquake data, using generally independent marks with exponential distribution, see e.g. Liu et al. (1999), Bebbington and Harte (2001), and Lu and Vere-Jones (2000). See Exercise 7.3.6 for stability properties of the model.

Example 7.3(e) Cumulative processes. Let N ≡ {(ti, κi)} be a regular MPP defined over the time interval (0, T), and consider the random measure derived from N as in (6.4.6) and characterized through the cumulative process

ξ(t) = ∫_{(0,t)×K} κ N(du × dκ).
We do not insist here that the process have independent or unpredictable marks. Although ξ(t) corresponds to a random measure rather than a point process, it is still germane to ask questions about its internal history, its likelihood, and its conditional intensity. The following points are straightforward to verify and are left to the reader.
(i) The internal history of ξ coincides with the internal history for the underlying MPP N.
(ii) The likelihood for ξ(t) over an interval (0, T) coincides with the likelihood for N over the same period.
(iii) A conditional intensity µ∗ξ(t) for ξ(t) can be defined by

µ∗ξ(t) dt ≡ E[dξ(t) | H^N_{t−}] = λ∗g(t) E[κ | H^N_{t−}] dt = dt ∫_K κ λ∗(t, κ) dκ,

where λ∗(t, κ) is the H^N-conditional intensity for the MPP N.
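The cumulative process itself is elementary to compute from a realization {(ti, κi)}; the sketch below returns ξ(t) as a callable step function (the function name and the boundary convention are ours, matching the open interval (0, t) in the definition):

```python
import bisect

def cumulative_process(times, marks):
    """xi(t) = integral of kappa over (0, t) x K: the sum of the marks of
    events occurring strictly before t, returned as a callable step
    function.  The strict inequality matches the open interval (0, t);
    the function name is ours."""
    partial, s = [], 0.0
    for k in marks:
        s += k
        partial.append(s)          # partial[i] = kappa_1 + ... + kappa_{i+1}
    def xi(t):
        i = bisect.bisect_left(times, t)   # number of events with ti < t
        return partial[i - 1] if i else 0.0
    return xi
```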
Exercises and Complements to Section 7.3

7.3.1 Further properties of the bivariate Poisson process [see Example 7.3(a)].
(a) Discuss the F0-intensity for the process of the example. [The difficulty here, as for other cluster processes, is in averaging over the different possible ways that parents and offspring may be associated; see the comments on Example 6.2(c) concerning the Gauss–Poisson process.]
(b) Verify the martingale properties asserted at the end of the example.

7.3.2 Write out explicitly the construction leading back from (7.3.2) to the Janossy densities, and hence complete the proof of Proposition 7.3.III.

7.3.3 Define a one-point MPP on (0, T) × (0, T) as follows. For any realization {(t1, κ1)}, say, the point t1 has the density f(·) on (0, T) and, given t1, κ1 is uniformly distributed on (0, T − t1). Find the conditional intensities for this MPP and for the bivariate point process {(t1, t1 + κ1)}. What are the corresponding compensators?
7.3.4 Verify the forms of conditional intensity in Proposition 7.3.V for compound Poisson processes and processes with independent or unpredictable marks.

7.3.5 Accelerated moment release model. Let tf denote the time of a major earthquake, and ξ(t) = Σ_{i=1}^{N(t)} κi the cumulative release of seismic moments of small or moderate earthquakes up until time t < tf. According to Varnes (1989) and Main (1996), there are physical grounds for supposing that ξ(t) increases hyperbolically before the major event; i.e.

ξ(t) ≈ A + B(tf − t)^{−m},

where A, B and m are positive constants. Suggest an appropriate conditional intensity model and associated likelihood, assuming that the relationship refers to E[ξ(t)] and that the increase is due to either (i) an increase in the frequency of events but not their average size; or (ii) an increase in the average size of the events but not their frequency; or (iii) an increase in both frequency and average size. [Hint: In Vere-Jones et al. (2001), both exponential and tapered Pareto distributions are used to model the event sizes, and a maximum entropy argument is used to suggest that the increase in moment should be partitioned between the mean size and mean frequency of events in such a way that each takes up the square root of the overall increase.]

7.3.6 Stability results for the linked stress-release model [see Example 7.3(d)].
(a) Suppose that a stationary regime exists; for events in region i, let li = E(Ψi[X(t)]) denote their rate of occurrence and mi their mean size. Establish the balance equations

ρi = Σ_j θij lj mj   (i = 1, . . . , I).
(b) Let Ri(x) = Ψi(x) E[κi | X(t) = x], and write R(x) for the vector with components Ri(x), with x in domain D. Then, a matrix analogue of condition (7.2.13) takes the form

lim inf_{x∈D} R(x) ≤ [2I − Θ]^{−1} ρ ≤ lim sup_{x∈D} R(x),

where ρ is the vector of input rates. Investigate possible sufficient conditions for the existence of a stationary version of the process.
7.4. Random Time Change and a Goodness-of-Fit Test

The proposition below has been part of the folklore of point process theory for many years. In essence, it goes back to the work of Watanabe (1964), who first recognised that the Poisson process could be characterized by the form of its compensator (the deterministic function λt), and Meyer (1971). It was first clearly stated and proved by Papangelou (1974), who describes it in the following terms:
“Suppose that, starting at 0 say, we trace R+ in such a way that at the time we are passing position t our speed is 1/λ∗ (t), which can be ∞. (The value λ∗ (t) is determined by the past, i.e. by what happened up to t.) Then the time instants at which we shall meet all the points in R+ of the process form a homogeneous Poisson process.”
In other language, the random time transformation τ = Λ∗(t) = ∫₀ᵗ λ∗(u) du takes the point process with conditional intensity function λ∗(t) into a unit-rate Poisson process.

Theorem 7.4.I. Let N be a simple point process adapted to a history F with bounded, strictly positive conditional F-intensity λ∗(t) and F-compensator Λ∗(t) = ∫₀ᵗ λ∗(u) du that is not a.s. bounded. Under the random time change t → Λ∗(t), the transformed process

Ñ(t) = N(Λ∗⁻¹(t))   (7.4.1)

is a Poisson process with unit rate.
Conversely, suppose there is given a history G, a G-adapted cumulative process M(t) with a.s. finite, monotonically increasing and continuous trajectories, and a G-adapted simple Poisson process N0(t). Let F denote the history of σ-algebras Ft = G_{M(t)}. Then N(t) = N0(M(t)) is a simple point process that is F-adapted and has F-compensator M(t).

Proof. The essence of this theorem is a generalization of the well-known result, crucial to many simulation algorithms, that if the random variable X has a continuous distribution function F(x), then Y = F(X) has a uniform distribution on the unit interval. We first restate this result in a form that will make the analogy more transparent.

Lemma 7.4.II. Let X be a random variable with continuous distribution function F(·) and integrated hazard function H(x) = − log[1 − F(x)]. Then Y = H(X) has a unit exponential distribution (i.e. with unit mean). Conversely, if Y is a random variable with unit exponential distribution, then X = H⁻¹(Y) has distribution function F(·).

If, therefore, we have a sequence of interval lengths X1, X2, . . . with continuous distributions F1(t), F2(t), . . . , the corresponding sequence of transformed random variables Y1 = H1(X1), Y2 = H2(X2), . . . is a sequence of unit exponential random variables. Now recall the construction of the conditional intensity function as an amalgam of hazard functions hn(u | t1, . . . , tn−1) in equation (7.2.3), and set F1(x) = 1 − exp[−H1(x)], F2(x) = 1 − exp[−H2(x)], . . . , where for brevity of notation we have written Hn(x) = ∫₀ˣ hn(u | t1, . . . , tn−1) du. If the intervals X1, X2, . . . represent the sequence of intervals for a point process with conditional intensity function λ∗(t) that can be represented in terms of integrated hazard functions as above, then the joint distribution of any finite sequence of these intervals is the product of the distribution functions Fi(t), and the joint distribution of the corresponding transformed random variables H1(X1), H2(X2), . . . is the product of unit exponential distributions and therefore represents the joint distribution of a set of i.i.d. unit exponential random variables. But such a point process is just a unit-rate Poisson process.

This argument lies behind a possible proof of the direct part of the theorem in the case where F is the internal history of N(·). The converse part, again for the special case of the internal history, follows by a reversed argument using the converse part of the lemma. The proof in the general case requires the same kind of attention to questions of predictability that we have mentioned in earlier discussions of the conditional intensity function and its integral. We sketch the general argument below, leaving a fuller discussion to Chapter 14.

Under the stated conditions, Λ∗(t) and its inverse are both continuous, so that the process Ñ, like N itself, can increase only by unit jumps. It is also clear that the family of σ-algebras Ft is mapped into the family of σ-algebras Gt = F_{Λ∗⁻¹(t)}, say, for the transformed process. (A rigorous definition of these, and a strict proof, requires use of the optional-sampling theorem as in Appendix A3.3.III.) Furthermore,

E[dÑ(t) | Gt−] = E[ dN(Λ∗⁻¹(t)) | F_{Λ∗⁻¹(t)} ] ≈ λ∗(Λ∗⁻¹(t)) d(Λ∗⁻¹(t)) = dt,   (7.4.2)

which shows that the process Ñ has the lack-of-memory property and is therefore the Poisson process (Theorem 2.2.III).

The converse is a further application of the optional-sampling theorem. Since each T = M(t) is a stopping time for N0(t), the σ-algebras Ft = G_{M(t)} are well defined, and N(t) = N0(M(t)) is F-adapted. Note the crucial importance that G should contain the history of the process N0; indeed, the minimal form of the theorem requires only that M(t) be adapted to the internal history of N0. N(t) is also a.s. finite and monotonically increasing with unit jumps; hence, it defines a simple point process. The optional-sampling theorem and the martingale property for N0(t) − t then imply that, for t > s, T = M(t) > S = M(s),

E[N(t) − M(t) | Fs] = E[N0(T) − T | G_S] = N0(S) − S = N(s) − M(s).

Thus, N(t) − M(t) is an F-martingale, from which it follows that M must be the F-compensator for N.

Because of this result, a simple point process with continuous compensator is sometimes called a process of Poisson type. The theorem implies that all such processes can be derived from a simple Poisson process by a random time transformation.

Example 7.4(a) Renewal process [see Exercises 7.2.3(a) and 7.2.11]. We consider an ordinary renewal process started with an event at the origin. We know from Exercise 7.2.11 that the conditional intensity function for this
process is just the hazard function for the interval distribution, evaluated at the backward recurrence time Bt, namely the time elapsed since the most recent event before the present time t. Also, the compensator A(·) satisfies

A(t) = A(t − Bt) − log[1 − F(Bt)].

On the transformed time scale, the time interval τ from one event to the next is given by τ = − log[1 − F(X)], where X is the length of the interval on the original time scale. As in Lemma 7.4.II, the transformation takes successive intervals into a sequence of i.i.d. exponentially distributed intervals (i.e. into a unit-rate Poisson process). The general case with internal history is a generalization of this argument to the situation where the distributions of successive intervals are conditioned by the previous history of the process.

The requirement in Theorem 7.4.I that the compensator Λ∗(t) should increase without bound ensures that there is no last point in the process. The basic result remains valid without it, except insofar as the final interval is then infinite and so cannot belong to a unit-rate Poisson process. The extreme case in the next example makes the point.

Example 7.4(b) One-point process (see Exercises 7.2.2 and 7.4.1). Let a point process on (0, ∞) have exactly one point, at t1, say, where Pr{t1 ≤ x} = F(x), and we assume that the d.f. F is continuous. Then

Λ∗(t) = ∫₀^{min(t,t1)} dF(u)/[1 − F(u)] = − log{1 − F[min(t, t1)]}.

The initial interval transforms, as in the previous example, to an interval with unit exponential distribution; the transformed process then terminates.

The converse part of Theorem 7.4.I contains within it the basis for one general approach to simulating point processes. Using the notation X1, X2, . . . , F1(·), F2(·), . . . and H1(·), H2(·) as in the proof of that theorem and Lemma 7.4.II, it may be summarized as follows.

Algorithm 7.4.III. Simulation of point processes by the inverse method.
1. Simulate a sequence Y1, Y2, . . .
of unit exponential random variables (respectively, a sequence U1, U2, . . . of uniform U(0, 1) random variables).
2. Transform to the sequence of successive interval lengths X1 = H1⁻¹(Y1), X2 = H2⁻¹(Y2), . . . (respectively, the sequence F1⁻¹(U1), F2⁻¹(U2), . . .).
3. Form the point process (t1, t2, . . .) by setting t1 = X1, t2 = X1 + X2, . . . .

The use of exponential or uniform random variables to initiate the algorithm is immaterial in that both lead to point processes with identical properties. The use of the exponential variates shows more clearly the relation to the Poisson process and may be marginally more convenient when the process is specified through its conditional intensity function, because t1, t2, . . . then solve the successive equations

∫₀^{t1} λ∗(u) du = Y1,   ∫_{t1}^{t2} λ∗(u) du = Y2,
and so on. The main constraint in the use of this algorithm is the common need to introduce an iterative numerical method to find the inverse of the integrated hazard or distribution function. In principle, the method may be extended to situations where the interval distributions are conditioned by external as well as internal variables, provided that all the relevant conditioning information is available at the beginning of each new interval. A second important application of Theorem 7.4.I is the technique sometimes referred to as point process residual analysis (see e.g. Ogata, 1988); it uses the time transformation in testing the goodness-of-fit of a point process model. It depends on the fact that if the compensator used for the transformation is that of the true model, then the transformed process will be unit-rate Poisson, whereas if the wrong compensator is used, the transformed process will show some systematic departure from the unit-rate Poisson process. This means that the problem of testing for goodness-of-fit for a given, perhaps quite complex, model can be reduced to the well-studied and much simpler problem of testing for a unit-rate Poisson process (e.g. Cox and Lewis, 1966). This device fills what is otherwise something of a gap for point process inference. While estimation and model comparison procedures can be based on standard likelihood methods, and a variety of statistical tests on specific characteristics, such as the interval lengths or the second-order properties of count numbers, are also available [the now classical monograph by Cox and Lewis (1966) remains an excellent introduction to a range of techniques of this kind], the one feature not obviously present there is a general purpose goodness-of-fit test for assessing the adequacy of a model overall. Before outlining the method, we present a minor rephrasing and extension of the basic theorem. Proposition 7.4.IV. 
Let {0 < t1 < t2 < · · ·} be an unbounded, increasing sequence of time points in the half-line (0, ∞), N a simple point process with internal history H, and monotonic, continuous H-compensator Λ∗(t) such that Λ∗(t) → ∞ a.s. Then, with probability 1, the transformed sequence {τi = Λ∗(ti)} is a realization of a unit-rate Poisson process if and only if the original sequence {ti} is a realization from the point process defined by Λ∗(t).

Proof. This proposition extends Theorem 7.4.I by incorporating the assertion that the character of the transformed process can (with probability 1) be unambiguously determined from a realization on the half-line R+. This can be regarded as a consequence of the ergodic theorem (see Chapter 12): for a stationary process, the probability of any of the events appearing in the fidi distributions can be recovered as a limiting ratio. If the processes are not identical, there must be at least one such event to which the two processes ascribe different probabilities. Thus, the limiting ratios, and hence the observation sequence, must be able to discriminate between the two processes. Granted this assertion, the result is a corollary of Theorem 7.4.I.
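Algorithm 7.4.III above can be sketched in code. To keep the sketch self-contained, it takes the special case of a deterministic intensity λ(t), i.e. an inhomogeneous Poisson process, and solves the successive equations ∫ λ∗(u) du = Yi by crude grid accumulation rather than by exact inversion of the integrated hazard; the function name, step size, and grid scheme are illustrative choices, not part of the algorithm as stated.

```python
import random

def simulate_inverse(intensity, T, step=1e-3, rng=random):
    """Algorithm 7.4.III for a deterministic intensity lambda(t) (an
    inhomogeneous Poisson process).  Unit exponentials Y1, Y2, ... are
    drawn and the equations  int lambda(u) du = Y_i  are solved by
    accumulating the compensator on a grid of width `step` instead of
    inverting the integrated hazard exactly."""
    points, t, acc = [], 0.0, 0.0
    target = rng.expovariate(1.0)          # Y_1
    while t < T:
        acc += intensity(t) * step         # left-endpoint quadrature of the integral
        t += step
        if acc >= target:                  # the compensator has absorbed one exponential
            points.append(t)
            acc = 0.0
            target = rng.expovariate(1.0)  # next Y_i
    return points
```

For a genuinely history-dependent λ∗, the same loop applies with `intensity` re-evaluated against the points generated so far, since all conditioning information is available at the start of each new interval.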
Now suppose there is given a realization {t1, . . . , tN(T)} on a finite observation interval (0, T) to which has been fitted a point process model with compensator Λ∗(t). The procedure outlined below makes use of Proposition 7.4.IV to define a goodness-of-fit test for point process models for which the conditional intensity function, and hence the compensator, is explicitly known.

Algorithm 7.4.V. Goodness-of-fit test based on the residual point process.
1. Form the transformed time sequence {τi = Λ∗(ti), i = 1, . . . , N(T)}.
2. Plot the cumulative step-function Y(x) through the points (xi, yi) = (τi/T, i/N(T)) in the unit square 0 ≤ x, y ≤ 1.
3. Plot confidence lines y = x ± Z_{1−α/2}/√T, where, with Φ denoting the standard normal distribution function, Φ(Zp) = p.
4. Implement an approximate 100(1−α)% test of the hypothesis that the {τi} come from a unit-rate Poisson process by observing whether the empirical process Y(x) falls outside the confidence band drawn in step 3.

At step 4, this procedure uses the maximum deviation from the expected rate curve in the transformed time domain to check for departures from the unit rate expected under the model. It is analogous in this context to the Kolmogorov–Smirnov test. The test is approximate in two respects. First, it is a large sample test, based on the Brownian motion approximation to the Poisson process. Second, and perhaps more importantly, it does not take into account the effect of estimating the parameters from the same data as are used to check the model. While both are typical large sample approximations, the bias resulting from the latter in moderate-sized data sets may be considerable, as shown for example in Schoenberg (2002), particularly when the process has strong time-dependence features that reduce the effective amount of information available in the data.
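The transformation-and-test idea can be sketched as follows. Rather than the band of step 3, this sketch conditions on N(T) and applies the standard Kolmogorov–Smirnov band crit/√n to the scaled transformed times τi/Λ∗(T), with crit = 1.36 the stock 5% KS constant; it is a variant of, not a transcription of, the algorithm above.

```python
import math

def residual_test(times, Lambda, T, crit=1.36):
    """Residual analysis in the spirit of Algorithm 7.4.V: transform the
    observed times by the fitted compensator Lambda(.) and test whether
    tau_i / Lambda(T) behave as uniform order statistics, using the
    Kolmogorov-Smirnov band crit/sqrt(n) (crit = 1.36 for a 5% test).
    Conditioning on N(T) makes this a variant of step 3, not a copy."""
    taus = sorted(Lambda(t) for t in times)
    total = Lambda(T)
    n = len(taus)
    # two-sided KS distance between the empirical curve and the diagonal
    d = max(max(i / n - tau / total, tau / total - (i - 1) / n)
            for i, tau in enumerate(taus, start=1))
    return d, d > crit / math.sqrt(n)
```

Transforming with the true compensator should leave the statistic inside the band; a misspecified compensator distorts the scaled times away from uniformity and is flagged.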
As with any portmanteau test, the test above has the further disadvantage, offset by its wide range of applicability, that its effectiveness (power) against different types of alternatives may be very variable. For more specific alternatives, there are many other tests of Poisson character that could be substituted for the Kolmogorov–Smirnov-type test suggested above [see e.g. Cox and Lewis (1966), as already noted]. Such tests are likely to be more powerful than the test above when the nature of the expected deviation from Poisson character is known. One advantage of the residual analysis is that it leads to a visual display (step 2 in the algorithm above) that can be useful in gaining a qualitative impression of the goodness-of-fit whether or not a formal test is applied. Ogata has made ingenious uses of this feature for visually detecting departures from a standard model, as illustrated below. Example 7.4(c) Use of residual analysis to detect the return to normal background activity. The rate of occurrence of events in aftershock sequences to (large) earthquakes is traditionally modelled by a Poisson process whose intensity function decays as a power law, known as the modified Omori law
in the seismology literature,

λ(t) = A/(t + c)^{1+p}   (t > 0),

where A, c and p are nonnegative parameters and p is commonly close to zero. It is a delicate question to determine the time at which the aftershocks merge indistinguishably into the general background activity for the region. Leaving aside the problem of defining precisely what is meant by this statement, the visual pattern can be much enhanced by first transforming the time scale by the compensator

Λ∗(t) = (A/p)[c^{−p} − (t + c)^{−p}]   (t ≥ 0)

of the model above. When the rate of aftershock activity has decayed to about the level of the background activity, the dominant factor in the observed rate changes from the aftershock decay term to the steady background rate, increasing the observed rate above what would be expected from modelling the aftershock sequence. The change point is hard to pinpoint visually on the original time scale, but on the transformed time scale, it shows up relatively clearly as a deviation above the diagonal y = x near the end of the observation sequence. See e.g. Ogata (1988) and Utsu et al. (1995) for illustrations and further details. Residual analysis can also be adapted to more specific problems as below.

Example 7.4(d) Using the ETAS model to test for relative quiescence in seismic data. At shallow depths (0–20 km or so), the ETAS model of Example 6.4(d) usually provides a reasonable first approximation to the time–magnitude history of moderate or small-size earthquake events in an observation region. For this reason, departures from the ETAS model, or changes in its apparent parameter values, can be used as an indicator of anomalous seismic activity that may be associated with the genesis of a forthcoming large event. In particular, a reduction in activity below that anticipated by the ETAS model may signify the onset of a period of seismic quiescence, a much debated indicator of a larger event.
The task of searching for changes in rate is here complicated by the high level of clustering characteristic of earthquake activity, which makes the evaluation of appropriate confidence levels particularly difficult. Again, the task can be much facilitated by first transforming the occurrence times according to the best-fitting ETAS model and then carrying out the change-point test on the transformed data. The problem is then reduced to that of testing for a change point in a constant-rate Poisson process, a relatively straightforward and well-studied problem. Ogata (1988, 1992, 2001) has developed detailed procedures, including a modification to the usual AIC criterion, to take into account the nonstandard character of the change-point problem (the additional parameters are absent in the null hypothesis rather than being fitted to a special numerical value; Davies’ (1987) work on the problem of hypothesis testing when parameters vanish under H0 is pertinent). Some further details are given in the exercises. Exercise 7.4.2 indicates extensions to the marked point process case.
As with the other procedures we have illustrated in this chapter, the results on random time changes can be generalized relatively straightforwardly to other types of evolutionary point processes (notably multivariate and marked point processes) but only with more difficulty to spatial point patterns (see Chapter 14). We indicate below the extensions to multivariate and marked point processes; for more discussion, see e.g. Brown and Nair (1988). These extensions hinge on the uniqueness of the compensator with respect to the internal history H; see Proposition 7.3.IV for regular MPPs.

Consider first a multivariate point process. Here each component could be transformed by its own compensator, as a result of which we would obtain a multivariate Poisson process in which each component has unit rate. But would these components then be independent? The answer to this question depends crucially on the histories used to define the compensators. If the full internal history is used for each component, then any dependence between the original components is taken into account and a Poisson process with independent, unit-rate components is obtained. On the other hand, if each component is transformed according to its own internal history, the components of the resulting multivariate Poisson process will have equal (unit) rates but in general will not be independent. The next example provides a simple illustration.

Example 7.4(e) Bivariate Poisson process [see Example 7.3(a)]. The model consists of an initial stream of input points from a Poisson process at constant rate λ and an associated stream of output points formed by delaying the initial points by random times exponentially distributed with mean 1/µ independently for each initial point.
Integrating the full H-conditional intensities at (7.3.2), the corresponding compensators are for component I a line of constant slope λ and for component II a broken straight line, with segments whose slopes are nonnegative multiples of µ, the breaks in the line occurring at the points of both processes, the slope increasing by µ whenever a component I point occurs and decreasing by µ whenever a component II point occurs. The transformed points from component I are identical with the original points apart from an overall linear change of scale. The time transformation for component II is more complex: the distances between points are stretched just after a component I point and shrunk after a component II point. Further, if for any t all points of component I have been cleared (i.e. their associated component II points have already occurred), the transformed time remains fixed until the next component I point arrives. In this way, the dependence between the two components is broken, and both component processes are transformed into unit-rate Poisson processes. A similar conclusion holds even if either or both components is augmented by the addition of the points from an independent Poisson process or processes: the relative scales of the time changes compensate for any differences in the original component rates, producing always a unit rate in the transformed process, while any dependence between the two components is still broken as explained above.
Consider now the case of a regular MPP. If the support of the mark distribution is no longer finite, then effectively we have an infinite family of different components; clearly it is not possible to turn them all into unit-rate Poisson processes and hope to retain an MPP as output. To achieve such a result, at least the rates of the components should be adjusted to produce a transformed process with finite ground rate. Here is one way of proceeding. Suppose that the H-conditional intensity of the original process can be represented in the form λ∗(t, κ) = λ∗g(t) f∗(κ | t), where f∗(κ | t) is a probability density with respect to the reference measure ℓ_K(·), which we take here to be itself a probability measure so that ∫_K f∗(κ | t) ℓ_K(dκ) = 1. Let A(t, U) = ∫_U ∫_0^t λ∗(s, κ) ds ℓ_K(dκ) be the full H-compensator for the process, and write A_κ(t) = ∫_0^t λ∗(s, κ) ds. To avoid complications in defining the inverse functions, we suppose both λ∗g(t) and f∗(κ | t) are strictly positive for all t and κ. Now consider the transformation that takes the pair (t, κ) into the pair (A_κ(t), κ). We claim that the transformed process is a stationary compound Poisson process with unit ground rate and mark distribution ℓ_K(·). To establish this result, we appeal to the uniqueness theorem for compensators (Proposition 7.3.IV). The crucial computation, corresponding to equation (7.4.2), is
E[Ñ(dτ × dκ)] = E[N(dy × dκ)] ≈ λ∗(y, κ) dy ℓ_K(dκ) = dτ ℓ_K(dκ),
where y = A_κ^{−1}(τ), so that dy = dτ/λ∗(y, κ). The last form can be identified with the compensator for a stationary compound Poisson process with ground rate λ̃g = 1 and mark distribution ℓ_K(·). The uniqueness theorem completes the proof. The results for both multivariate and marked point processes are summarized in the following proposition (a more careful discussion of the arguments above is given in Chapter 14).
Proposition 7.4.VI. (a) Let {Nj(t): j = 1, . . . , J} be a multivariate point process defined on [0, ∞) with a finite set of components, full internal history H, and left-continuous H-intensities λ∗j(t). Suppose that for j = 1, . . . , J, the conditional intensities are strictly positive and that Λ∗j(t) = ∫_0^t λ∗j(s) ds → ∞ as t → ∞. Then, under the simultaneous random time transformations
t → Λ∗j(t)   (j = 1, . . . , J),
the process {(N1(t), . . . , NJ(t)): t ≥ 0} is transformed into a multivariate Poisson process with independent components each having unit rate.
(b) Let N(t, κ) be an MPP defined on [0, ∞) × K, where K is a c.s.m.s. with Borel sets B_K and reference probability measure ℓ_K(·), and let H denote the full internal history. Suppose that the H-conditional intensity λ∗(t, κ) = λ∗g(t) f∗(κ | t) exists, is ℓ_K-a.e. left-continuous in t and strictly positive on [0, ∞) × K, and that Λ∗κ(t) = ∫_0^t λ∗(s, κ) ds → ∞ as t → ∞ ℓ_K-a.e. Then, under the random time transformations (t, κ) → (Λ∗κ(t), κ), the MPP N is transformed into a compound Poisson process Ñ with unit ground rate and stationary mark distribution ℓ_K(·).
Example 7.4(f) ETAS model [see Example 6.4(d)]. This can serve as a typical example of a process with unpredictable marks. The conditional intensity factorizes into the form [see equation (7.3.10)]
λ∗(t, κ) = [λ0 + ν ∫_{(−∞,t)×K} e^{α(χ−κ0)} g(t − s) N(ds × dχ)] f(κ) ≡ λ∗g(t) f(κ),
where f(·), the density of the magnitude distribution, is commonly assumed to have an exponential form on K = [0, ∞). For stationarity, we require ρ = ν ∫_0^∞ e^{ακ} f(κ) dκ < 1. Under these conditions, it is natural to take the reference measure on K to be f itself, in which case all the densities relative to the reference measure are equal to unity. Consequently, the multiple time changes here all reduce to the same form:
(t, κ) → (Λ∗g(t), κ),   where Λ∗g(t) = ∫_0^t λ∗g(s) ds.
In other words, under the random time change associated with the ground process, the original ETAS process is transformed into a compound Poisson process with unit ground rate and stationary mark density f . Such transformations open the way to corresponding extensions of the procedures described earlier for testing the process. In particular, checking the constancy of the mark distribution simplifies the detection of changes in the relative rates of events of different magnitudes. Similar remarks apply to other examples with unpredictable marks, such as the stress-release models of Examples 7.2(g) and 7.3(d). Schoenberg (1999) gives a random-time change for transforming spatial point processes to Poisson.
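To make the ground-process time change concrete, the sketch below computes Λ∗g for an ETAS-type intensity. The catalogue and parameter values are made up, and the Omori-law kernel g is replaced, purely so that the compensator has a closed form, by an exponential density g(u) = βe^{−βu}; none of this is the fitted ETAS machinery itself.

```python
import math

# Illustrative catalogue of (time, magnitude) pairs and ETAS-type
# parameters; an exponential kernel g(u) = beta*exp(-beta*u) is
# assumed here so that the integral below is closed-form (the ETAS
# model proper uses a modified Omori power law instead).
events = [(1.0, 4.5), (2.5, 5.1), (2.7, 4.2), (6.0, 4.8)]
lam0, nu, alpha, kappa0, beta = 0.1, 0.5, 0.8, 4.0, 1.0

def ground_compensator(t):
    """Integral of the ground intensity lam*_g over (0, t]."""
    total = lam0 * t
    for ti, mi in events:
        if ti < t:
            # each past event contributes nu*exp(alpha*(m_i - kappa0))
            # times the integrated kernel 1 - exp(-beta*(t - t_i))
            total += nu * math.exp(alpha * (mi - kappa0)) * (
                1.0 - math.exp(-beta * (t - ti)))
    return total

# Residual (transformed) occurrence times of the events themselves:
# under a well-fitting model these behave like a unit-rate Poisson.
residuals = [ground_compensator(ti) for ti, _ in events]
```

The marks are carried across unchanged by the transformation, so their empirical distribution can be tested against f separately from the Poisson test on the residual times.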
Exercises and Complements to Section 7.4
7.4.1 Consider a two-point process t1, t2 (t1 < t2) on [0, T], where (t1, t2 − t1) has continuous bivariate d.f. F(t, u). Find the compensator and define the random time change explicitly in terms of F. The Poisson process here has to be conditioned on the occurrence of two points within the interval [0, T]. [Hint: Example 7.4(b) treats the one-point case.]
7.4.2 Marked point process extension of Algorithm 7.4.III. Following the discussion around equation (7.3.2), suppose there is given a family of conditional hazard functions hn(u | (t1, κ1), . . . , (tn−1, κn−1)) and corresponding conditional mark distributions fn(κ | (t1, κ1), . . . , (tn−1, κn−1); u). Formulate in detail a sequence of simulation steps to solve successively the pairs of equations
∫_{tn−1}^{tn} hn(u | (t1, κ1), . . . , (tn−1, κn−1)) du = Yn,
∫_0^{κn} fn(κ | (t1, κ1), . . . , (tn−1, κn−1); u) dκ = Un.
7.4.3 (Continuation). Using steps analogous to the simulation argument above, provide an alternative, constructive proof of Proposition 7.4.VI.
7.4.4 Extension of Ogata’s residual analysis to multivariate and marked point processes. Develop algorithms, analogous to those in Algorithm 7.4.V, for testing multivariate and marked point processes. [Hint: In the multivariate case, test both (a) that the ground process for the transformed process is a unit-rate Poisson process and (b) that the marks are i.i.d. with equal probabilities. In the marked case, take the reference measure to be, say, a unit exponential distribution, and replace (b) with a test for a set of i.i.d. unit exponential variates.]
7.5. Simulation and Prediction Algorithms
In the next two sections, we broach the topics of simulation, prediction, and prediction assessment. In modelling, the existence of a logically consistent simulation algorithm for some process is tantamount to a constructive proof that the process exists. Furthermore, simulation methods have become a key component in evaluating the numerical characteristics of a model, in checking both qualitative and quantitative features of the model, and in the centrally important task of model-based prediction. A brief survey of the principal approaches to point process simulation and of the theoretical principles on which these approaches are based therefore seemed to us an important complement to the rest of the text.
This section provides a brief introduction to simulation methods for evolutionary models; that is, for models retaining a time-like dimension that then dictates the probability structure through the conditional intensity function. Simulation methods can be developed also for spatial point patterns (see Chapter 15), but considerable conceptual simplicity results from the ability to order the evolution of the process in ‘time’. The growth in importance of Markov chain Monte Carlo methods for simulating spatial processes is a tacit acknowledgement of the fact that such methods introduce an artificial time dimension even into problems where no such dimension is originally present.
Two general approaches are commonly used for simulating point processes in time. The first we have already considered in Algorithm 7.4.III; it involves simulating the successive intervals, making use of the description of the
conditional intensity function as a family of hazard functions as in equation (7.2.3). Its main disadvantage as a general method is that it requires repeated numerical solution of the equation defining the inverse. The thinning methods outlined in the present section, by contrast, require only evaluations of the conditional intensity function. Although the difference in computational time between these two methods is not huge, it is the main reason why the thinning method is given greater prominence in this section. In addition, the theoretical basis behind thinning methods is of interest in its own right. The most important theoretical result is a construction, originating in Kerstan (1964) and refined and extended in Brémaud and Massoulié (1996), that has something of the character of a converse to Proposition 7.4.I. There we transformed a point process with general conditional intensity to a Poisson process; here we convert a Poisson process back into a process with general conditional intensity. For this purpose, we use an auxiliary coordinate in the state space, so we consider a unit-intensity Poisson process, Ñ say, on the product space X = R × R+. The realizations of Ñ consist of pairs (xj, yj). Also, let Ht denote the σ-algebra of events defined on a simple point process over the interval [0, t) and H the history {Ht}. The critical assumption below is that λ∗ is H-adapted.
Proposition 7.5.I. Let Ñ, H be defined as above, let λ∗(t) be a nonnegative, left-continuous, H-adapted process, and define the point process N on R by
N(dt) = Ñ(dt × (0, λ∗(t)]).   (7.5.1)
Then N has H-conditional intensity λ∗(t).
Proof. Arguing heuristically, it is enough to note that
E[N(dt) | Ht−] = E[Ñ(dt × (0, λ∗(t)]) | Ht−] = λ∗(t) dt.
There is no requirement in this proposition that the conditional intensity be a.s. uniformly bounded as was required in the original Shedler–Lewis algorithm. When such a bound exists, it leads to straightforward versions of the thinning algorithm, as in Algorithm 7.5.II below. The result can be further extended in various ways, for example to situations where more general histories are permitted or where the initial process is not Poisson but has a conditional intensity function that almost surely bounds that of the process to be simulated; see Exercises 7.5.1–2.
Example 7.5(a) Standard renewal process on [0, ∞). We suppose the process starts with an event at t = 0. Let h(u) denote the hazard function for the lifetime distribution of intervals between successive points, so that [see Exercise 7.2.3(a)] the conditional intensity function has the form
λ∗(t) = h(t − tN(t))   (t ≥ 0),
where tN(t) is the time of occurrence of the last event before time t. However, λ∗(t) should be defined on the history of Ñ rather than on N. To this end, we first define the sequence of points ti in terms of Ñ. With t0 = 0, define sequentially
tn+1 = min{xi : xi > tn and yi < h(xi − tn)}   (n = 0, 1, . . .)
and then define λ∗ (t) as above. Notice that the right-hand side of this expression is Ft -measurable and the whole process is F -adapted. Thinning algorithms generally follow much the same lines as in Proposition 7.5.I and the example above. The main difficulty arises from the range of yi being unbounded, which provides a flexibility that is difficult to match in practice. The original Shedler–Lewis algorithm (Lewis and Shedler, 1976; see also Exercise 2.1.6) was for an inhomogeneous Poisson process in a time interval where the intensity is bounded above by some constant, M say. Then, the auxiliary dimension can be taken as the bounded interval (0, M ) rather than the whole of R+ , or equivalently the yi could be considered i.i.d. uniformly distributed random variables on the interval (0, M ). Equivalently again, the time intensity could be increased from unity to M and the yi taken as i.i.d. uniform on (0, 1), which leads to the basic form of the thinning algorithm outlined in the algorithm below. In discussing the simulation algorithms below, it is convenient to introduce the term list-history to stand for the actual record of times, or times and marks, of events observed or simulated up until the current time t. We shall denote such a list-history by H, or Ht if it is important to record the current time in the notation. Thus, a list-history H is just a vector of times {t1 , . . . , tN (t) } or a matrix of times and marks {(t1 , κ1 ), . . . , (tN (t) , κN (t) )}. We shall denote the operation of adding a newly observed or generated term to the list-history by H → H ∪ tj or H → H ∪ (tj , κj ). In the discussion of conditioning relations such as occur in the conditional intensity, the listhistory Ht bears to the σ-algebra Ht a relationship similar to that between an observed value x of a random variable X and the random variable X itself. 
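The construction of Proposition 7.5.I, as specialized to the renewal process of Example 7.5(a), can be sketched directly for a hazard bounded by 1; the hazard h(u) = u/(1 + u) below is an arbitrary illustrative choice. The planar unit-rate Poisson process is realized on the strip [0, T] × (0, 1), and a candidate xi is kept exactly when its companion coordinate yi falls below the current hazard.

```python
import random

random.seed(11)

def h(u):
    # Illustrative hazard, bounded by 1, belonging to the interval
    # distribution with survivor function S(u) = (1 + u) * exp(-u),
    # whose mean interval length is 2.
    return u / (1.0 + u)

# Points of a unit-rate Poisson process on the strip [0, T] x (0, 1):
# x-coordinates arrive at rate 1, y-coordinates are uniform on (0, 1).
T = 5000.0
t_last = 0.0
points = [0.0]                     # the process starts with an event at 0
x = 0.0
while True:
    x += random.expovariate(1.0)   # next candidate x_i
    if x > T:
        break
    y = random.random()            # companion coordinate y_i
    if y < h(x - t_last):          # keep x_i: it becomes the next t_n
        points.append(x)
        t_last = x

intervals = [b - a for a, b in zip(points, points[1:])]
mean_interval = sum(intervals) / len(intervals)
```

For this particular hazard the mean interval is exactly 2, so the empirical mean provides a quick check on the construction.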
The algorithms require an extension of Proposition 7.5.I to the situation where the process may depend on an initial history H0 ; we omit detail but note the following. Such a history will be reflected in the list-history by a set of times or times and marks of events observed prior to the beginning of the simulation. This is an important feature when we come to prediction algorithms and wish to start the simulation at the ‘present’, taking into account the real observations that have been observed up until that time. It is also important in the simulation of stationary processes, for which the simulation may be allowed to run for some initial period (−B, 0) before simulation proper begins. The purpose is to allow the effects of any transients from the initial conditions to become negligible. Finding the optimal length of such a preliminary ‘burn-in’ period is an important question in its own right. Its solution depends on the rate at which the given process converges toward equilibrium
from the initial state, but in general this is a delicate question that is affected by the choice of initial state as well as decay parameters characteristic of the process as a whole.
Suppose, then, that the process to be simulated is specified through its conditional intensity λ∗(t), that there exists a finite bound M such that λ∗(t) ≤ M for all possible past histories, and that the process is to be simulated over a finite interval [0, A) given some initial list-history H0.
Algorithm 7.5.II. Shedler–Lewis Thinning Algorithm for processes with bounded conditional intensity.
1. Simulate x1, . . . , xi according to a Poisson process with rate M (for example, by simulating successive interval lengths as i.i.d. exponential variables with mean 1/M), stopping as soon as xi > A.
2. Simulate y1, . . . , yi as a set of i.i.d. uniform (0, 1) random variables.
3. Set k = 1, j = 1.
4. If xk > A, terminate. Otherwise, evaluate λ∗(xk) = λ(xk | Hxk).
5. If yk ≤ λ∗(xk)/M, set tj = xk, update H to H ∪ tj, and advance j to j + 1.
6. Advance k to k + 1 and return to step 4.
7. The output consists of the list {j; t1, . . . , tj}.
This algorithm is relatively simple to describe. In the more elaborate versions that appear shortly, it is convenient to include a termination condition (or conditions), of which steps 1 and 4 above are simple examples. In general, we may need some limit on the number of points to be generated that lies outside the raison d’être of the algorithm. While this algorithm works well enough in its original context of fixed intensity functions, its main drawback in applications to processes with random conditional intensities is the need for a bound on the intensity that holds not only over (0, A) but also over all histories of the process up to time A. To meet this difficulty, Ogata (1981) suggested a sequential variant of the algorithm, requiring only a local boundedness condition on the conditional intensity.
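Before turning to that variant, Algorithm 7.5.II itself can be transcribed almost line for line. The conditional intensity used below, a one-event-memory form λ∗(t) = 0.5 + 1.5 e^{−(t − t_last)} bounded by M = 2 (with an implicit initial event at t = 0), is purely an illustrative assumption.

```python
import math
import random

random.seed(3)

A, M = 200.0, 2.0     # simulation horizon and global bound on lam*

def cond_intensity(t, history):
    # Illustrative conditional intensity depending only on the time of
    # the last event (taken to be 0 before any event has occurred);
    # it is bounded above by M = 2 for every possible history.
    t_last = history[-1] if history else 0.0
    return 0.5 + 1.5 * math.exp(-(t - t_last))

# Steps 1-2: candidate points from a rate-M Poisson process on [0, A],
# with companion i.i.d. uniform (0, 1) variables.
xs = []
x = 0.0
while True:
    x += random.expovariate(M)
    if x > A:
        break
    xs.append(x)
ys = [random.random() for _ in xs]

# Steps 3-7: thin each candidate, keeping x_k with probability
# lam*(x_k)/M and updating the list-history H as points accumulate.
H = []
for xk, yk in zip(xs, ys):
    if yk <= cond_intensity(xk, H) / M:
        H.append(xk)
```

Because the candidates are processed in time order, the list-history H available when x_k is examined contains exactly the accepted points before x_k, as the algorithm requires.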
A minor variant of his approach is outlined in Algorithm 7.5.IV. For the sake of clarity, we return to the representation of the conditional intensity function in terms of successive hazard functions, much as in Definition 7.2.II, but allowing all such functions to depend on an initial history H0 , namely hn (s | H0 , t1 , . . . , tn−1 ), for 0 < t1 < · · · < tn−1 < s ≤ A. For every t in (0, A) and associated σ-algebra Ht , we suppose there are given two quantities, a local bound M (t | Ht ) and a time interval of length L(t | Ht ), satisfying the following conditions.
Condition 7.5.III. There exist functions M(t | Ht), L(t | Ht) such that, for all initial histories H0, all t ∈ [0, ∞), for every n = 1, 2, . . . , and all sequences t1, . . . , tn−1 with 0 < t1 < · · · < tn−1 < t, the hazard functions satisfy
hn(t + u | H0, t1, . . . , tn−1) ≤ M(t | Ht)   (0 ≤ u < L(t | Ht)).
Placing the bound on the hazard function is equivalent to placing the bound on the conditional intensity function under the constraint that no additional points of the process occur in the interval (t, t + u) under scrutiny. As soon as a new point does occur, in general the hazard function will change and a new bound will be required. Thus, the bound holds until either the time step L(·) has elapsed or a new point of the process occurs. For the algorithm below, the list-history Ht consists of {H0, t1, . . . , tN(t)}, where N(t) is the number of points ti satisfying 0 ≤ ti < t. For brevity, we mostly write M(t) and L(t) for M(t | Ht) and L(t | Ht). Ogata (1981) gives extended discussion and variants of the procedure.
Algorithm 7.5.IV. Ogata’s modified thinning algorithm.
1. Set t = 0, i = 0.
2. Stop if the termination condition is met; otherwise, compute M(t | Ht) and L(t | Ht).
3. Generate an exponential r.v. T with mean 1/M(t) and an r.v. U uniformly distributed on (0, 1).
4. If T > L(t), set t = t + L(t) and return to step 2.
5. If T ≤ L(t) and λ∗(t + T)/M(t) < U, replace t by t + T and return to step 2.
6. Otherwise, advance i by 1, set ti = t + T, replace t by ti, update H to H ∪ ti, and return to step 2.
7. The output is the list {i; t1, . . . , ti}.
The technical difficulties of calculating suitable values for M(t) and L(t) vary greatly according to the character of the process being simulated. In an example such as a Hawkes process, at least when the hazard functions decrease monotonically after an event, it would be enough in principle to consider only t = ti (i.e. points of the process) and set M(ti) = λ∗(ti+). This leads to a very inefficient algorithm, however, since the hazard decreases rapidly and a large number of rejected trial points could be generated. A simple modification is to set M(t) = λ∗(t+) and L(t) = 1/(2λ∗(t+)), half the current mean interval length, irrespective of whether or not t is a point of the process.
Such a choice gives a reasonable compromise between setting the bound too high, and so generating excessive trial points, and setting it too low, thus requiring too many iterations of step 3. The next example is a process with an increasing hazard, where the intervention of step 3 is virtually mandatory. Example 7.5(b) Self-correcting or stress-release model. We discuss the simulation of the model of Example 7.2(g). As described there, points {ti } occur
at a rate governed by the conditional intensity function λ∗(t) = Ψ[X(t)], where X(t) is an unobserved Markov jump process that increases linearly between jump times ti at which it decreases by an amount κi, so that
X(t) = X(0) + νt − Σ_{i: ti < t} κi.
Given an initial history H0, we can now simulate the process using Algorithm 7.5.IV: for example, we could take L(t) = 2/Ψ[X(t)] and M(t) = Ψ[X(t) + νL(t)]. With high probability, the next event would occur within twice the mean interval length at the start of the interval, and because of the increasing nature of the hazard function, a simple bound would be its value at the end of the search interval.
Algorithm 7.5.IV can be extended to cover the situation where the evolution of the conditional hazard function depends on additional random processes, themselves evolving jointly with the given point process. The immediate requirements are for the existence of explicit algorithms for calculating the intensity and for finding local bounds L(·) and M(·) that take into account current and past values of the auxiliary variables. A deeper difficulty, however, relates to the need to simulate forward not only the point process but also the auxiliary variables on which it depends. For auxiliary variables that change only slowly, this may not be a serious handicap, but for longer-term predictions, a full model is needed from which the point process and auxiliary variables can be jointly simulated.
Extension of the simulation algorithms to marked point processes, including even space–time processes, presents no significant difficulty. Once again, the evolutionary character of the process makes a sequential approach straightforward and natural. First, a candidate for the next time point of the process is selected and either accepted or rejected by the thinning algorithm using the full H-intensity for the overall sequence of time points. Once a new time point is selected, the corresponding mark, whether a weight, a spatial location, or some further characteristic, is simulated, using the conditional density f∗(κ | t) for the mark distribution.
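Returning for a moment to Example 7.5(b), Algorithm 7.5.IV applied to the stress-release model can be sketched as follows. The choices Ψ(x) = e^x and unit-mean exponential stress drops are illustrative assumptions, and the search window L is additionally capped for numerical safety (any window over which the bound M remains valid is admissible).

```python
import math
import random

random.seed(5)

nu = 1.0                      # stress loading rate (assumed)
Psi = math.exp                # illustrative choice of Psi
T_end = 100.0

def stress(t, x0, events):
    # X(t) = X(0) + nu*t minus the drops kappa_i at events up to t
    return x0 + nu * t - sum(k for ti, k in events if ti <= t)

x0 = 0.0
events = []                   # list-history of (t_i, kappa_i) pairs
t = 0.0
while True:
    X = stress(t, x0, events)
    L = min(2.0 / Psi(X), 2.0)    # search window, capped at 2 time units
    M = Psi(X + nu * L)           # hazard increases on the window, so
                                  # its value at the window end bounds it
    T = random.expovariate(M)
    U = random.random()
    if t + min(T, L) > T_end:
        break
    if T > L:
        t += L                    # no event in the window; slide forward
    elif Psi(stress(t + T, x0, events)) / M < U:
        t += T                    # rejected trial point; move on
    else:
        t += T                    # accepted: event with exponential drop
        events.append((t, random.expovariate(1.0)))

times = [ti for ti, _ in events]
```

Since each event removes on average one unit of stress while stress accrues at rate ν = 1, the long-run event rate is close to 1, which gives a quick sanity check on the output.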
The situation is particularly simple if the process has independent or unpredictable marks, as the mark distribution is then independent of the history of the process. In general, the mark distribution can depend on the past history of the process, including both the past locations and marks, and the simulation will be tractable provided this dependence can be captured in a reasonably simple explicit manner. For convenience, an outline algorithm is summarized more formally below. In it, we use the same notation as for Algorithm 7.5.IV. The local bounds M (t) and L(t) must be chosen for the full internal intensity λ∗g (t) of the ground process. Subject to replacing λ∗ (t) by λ∗g (t) at step 5, the first part of the algorithm is just a restatement of the steps in Algorithm 7.5.IV. Note that
we have paid particular attention to the need to update the list-history H. If simulation is to be applied to point process prediction, it is essential to allow the history at time 0 (corresponding to the present) to be nontrivial, in this case including all relevant information on observations of the actual process up to the time when simulation commences.
Algorithm 7.5.V. Thinning algorithm for marked point processes.
1. Set t = 0, i = 0, H = H0.
2. Stop if the termination condition is met. Otherwise, calculate M(t), L(t) for the ground intensity λ∗g(t).
3. Generate an exponential r.v. T with mean 1/M(t) and an r.v. U uniformly distributed on (0, 1).
4. If T > L(t), set t = t + L(t), update the list-history H, and return to step 2.
5. If T ≤ L(t) and λ∗g(t + T)/M(t) < U, replace t by t + T, update the list-history H, and return to step 2.
6. Advance i by 1, set ti = t + T, replace t by ti, and generate a mark κi from the distribution with density f∗(κ | ti).
7. Update the list-history H to H ∪ (ti, κi), and return to step 2.
8. The output is the list {i; (t1, κ1), . . . , (ti, κi)}.
In Example 7.5(b) above, for example, simulation proceeds as if the process has nonanticipating marks until step 6 is reached, at which point the appropriate value φ[X(t)] must be read into the simulation routine for producing values according to the tapered Pareto distribution. By way of illustrating Algorithm 7.5.V, we consider the extension of Example 7.5(b) to the linked stress-release model.
Example 7.5(c) Simulating the linked stress-release model [see Example 7.3(d)]. In this model, there are two types of marks: the region in which the event occurs (as a surrogate for spatial location) and the size of the event. The basic form of the conditional intensity is given in equation (7.3.14). A key step in the simulation is updating the list-history.
This will consist of a matrix or list type object with one column for each coordinate of the events being described: here the times ti , their regions Ki , and their magnitudes Mi . When the simulation is started, the list-history may contain information from real or simulated data from the past in order to allow the simulation to join ‘seamlessly’ onto the past. Each time a new event is simulated, its coordinates are added to the list-history. Since the simulation of the next event depends only on the form of the conditional intensity, as determined by the current list-history, and additional random numbers, it can proceed on an event-by-event basis. First, the time of the next event in the ground process is simulated, then the region is selected with probabilities proportional to the relative values of the conditional intensities for the different regions at that time, and then a magnitude is selected from the standard magnitude distribution (this distribution is fixed in the standard model, but it can also be made stress- or region-dependent).
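A minimal sketch of Algorithm 7.5.V follows, for a marked Hawkes-type process with unpredictable marks rather than the linked stress-release model itself; the exponential kernel, the parameter values, and the unit-exponential mark density are all illustrative stand-ins.

```python
import math
import random

random.seed(13)

mu, a, b = 0.5, 0.5, 1.0      # ground intensity mu + sum a*exp(-b*(t - t_i))
T_end = 100.0

def ground_intensity(t, H):
    return mu + sum(a * math.exp(-b * (t - ti)) for ti, _ in H if ti < t)

H = []                         # list-history of (t_i, kappa_i) pairs
t = 0.0
while True:
    # Between events the ground intensity decreases, so its value just
    # after t bounds it on any window ahead; the extra term a covers the
    # contribution of an event occurring exactly at t.
    M = ground_intensity(t, H) + a
    L = 1.0                    # any fixed window is admissible here
    T = random.expovariate(M)
    U = random.random()
    if t + min(T, L) > T_end:
        break
    if T > L:
        t += L                 # slide to the end of the window
    elif ground_intensity(t + T, H) / M < U:
        t += T                 # rejected trial point
    else:
        t += T                 # accepted: draw a mark, update list-history
        kappa = random.expovariate(1.0)   # unpredictable exponential mark
        H.append((t, kappa))

times = [ti for ti, _ in H]
marks = [k for _, k in H]
```

For the linked stress-release model, step 6 would instead first select the region with probabilities proportional to the regional conditional intensities and then draw a magnitude, exactly as described in Example 7.5(c).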
The prediction of point processes, in all but a few very special cases where explicit algorithms are available, goes hand-in-hand with simulation. The quantities that one would like to predict, such as the time to the next event, the probability of an event occurring within a given interval in the future, or the costs caused by events in the future, are commonly nonlinear functionals of the future of the process. They rarely fall into any general category for which analytic expressions are available. Since, on the other hand, simulation of a point process is relatively straightforward once its conditional intensity function is known, and moreover can be extended to situations where an arbitrary initial history can be incorporated into the conditional intensity, it is indeed natural to see prediction as an application and extension of the preceding procedures. Suppose there is given a realization of the point process on some finite interval (a, b). To link up with the preceding algorithms, we identify the origin t = 0 with the end point b of the interval so that, in our earlier notation, the realization on (a, b) forms part of the initial history H0 . Suppose for the sake of definiteness that our aim is to predict a particular quantity V that can be represented as a functional of a finite segment of the future of the process. To fulfil our aim, we estimate the distribution of V . An outline of a prediction procedure is as follows. 1. Choose a time horizon (0, A) sufficient to encompass the predicted quantity of interest (we need not insist here that A be a fixed number, provided the stopping rule is clearly defined and can be incorporated into the simulation algorithm). 2. Simulate the process forward over (0, A) using the known structure of the conditional intensity function and initial history H0 . 3. Extract from the simulation the value V of the functional that it is required to predict. 4. 
Repeat steps 2 and 3 sufficiently often to obtain the required precision for the prediction. 5. The output consists of the empirical distribution of the values of V obtained from the successive simulations. In step 5 above, it is often convenient to summarize the empirical distribution by key characteristics, such as its mean, standard deviation, and selected quantiles. Not all prediction exercises fit exactly into this schema, but many are variations on it. Example 7.5(d) Prediction of a Wold process with exponential intervals [see Exercise 4.5.8 and Example 4.6(b)]. In the notation used previously, let an interval preceded by an interval of length x have parameter λ(x) [and hence mean 1/λ(x)]. Suppose that we wish to predict the time X0 to the next event and the length X1 of the ensuing complete interval, given the current list-history consisting of the times t0 , t−1 , . . . of the preceding events, where 0 denotes the present time so 0 > t0 > t−1 > · · · .
The quantity V of the preceding discussion is the pair (X0, X1). The particular specification of the model here implies that the joint density function of (X0, X1) equals
λ(|t0 − t−1|) e^{−λ(|t0−t−1|)X0} · λ(|t0| + X0) e^{−λ(|t0|+X0)X1};
then simulation via the model should lead to a joint histogram that in principle is an approximation to this function. For pragmatic purposes, we may be satisfied with the first moments
E(X0 | H0) = 1/λ(|t0 − t−1|)
and
E(X1 | H0) = ∫_0^∞ [λ(|t0 − t−1|)/λ(|t0| + u)] e^{−λ(|t0−t−1|)u} du.
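The prediction scheme of Example 7.5(d) can be sketched by simulation. The rate function λ(x) = 1/(1 + x) and the list-history t0 = −0.5, t−1 = −2.0 are arbitrary illustrative choices, made so that the Monte Carlo means can be checked against the analytic values.

```python
import random

random.seed(1)

def lam(x):
    # Illustrative Wold-process rate: an interval preceded by an
    # interval of length x is exponential with parameter 1/(1 + x).
    return 1.0 / (1.0 + x)

t0, t_minus1 = -0.5, -2.0       # list-history: the two most recent events
prev_len = abs(t0 - t_minus1)   # length of the last complete interval

n = 20000
sum_x0 = sum_x1 = 0.0
for _ in range(n):
    # X0: residual time to the next event. The current interval is
    # exponential with rate lam(prev_len) and hence memoryless, so the
    # residual time is again exponential with the same rate.
    x0 = random.expovariate(lam(prev_len))
    # X1: the ensuing complete interval; its rate depends on the length
    # |t0| + X0 of the interval just completed.
    x1 = random.expovariate(lam(abs(t0) + x0))
    sum_x0 += x0
    sum_x1 += x1

mean_x0 = sum_x0 / n            # analytically 1/lam(1.5) = 2.5
mean_x1 = sum_x1 / n            # analytically 1.5 + E(X0) = 4.0
```

Retaining the simulated pairs (x0, x1), rather than only their means, yields the joint histogram referred to in the example, together with any quantiles required for interval forecasts.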
Exercises and Complements to Section 7.5
7.5.1 Extended form of Proposition 7.5.I. Let F be a history on [0, ∞), let λ1(t), λ2(t) be two nonnegative, left-continuous (or more generally predictable), history-dependent candidates for conditional intensity functions, and let N∗(dt × ds) be an F-adapted unit-rate Poisson process on R+ × R that is unpredictable in the sense that its evolution after time t is independent of the history up to t. Let N(t) on R+ consist of the time coordinates ti from those points of N∗ lying in the region min{λ1(t), λ2(t)} < s < max{λ1(t), λ2(t)}. Then N is F-adapted and has conditional intensity |λ1(t) − λ2(t)|. [In most cases, as in Proposition 7.5.I, the history will be that generated by the Poisson process itself, but the generalization opens the way to conditioning on external variables. See Brémaud and Massoulié (1996) and Massoulié (1998).]
7.5.2 Extension of thinning Algorithm 7.5.II. In the setup for Algorithm 7.5.II, suppose that the xi are simulated from a process with conditional intensity λ+(t) that satisfies a.s. λ+(t) ≥ λ∗(t) (0 < t < T) and that the thinning probability at time t is equal to the ratio λ∗(t)/λ+(t). Show that the thinned process is again the point process with intensity λ∗(t). [See Ogata (1981).]
7.5.3 Simulation algorithms for Boolean models. Devise a simulation procedure for the Boolean model of Example 6.4(d) with a view to describing distributions of functionals such as the intensity function or a joint intensity (‘correlation’).
7.5.4 Show how Algorithm 7.5.V can be applied to a pure linear birth process.
7.5.5 Simulation of cluster processes. Brix and Kendall (2002) describe a technique for the perfect simulation of a cluster point process in a given region A (hence, the simulations have no edge effects—this is an analogue of having no ‘burn-in’ period).
The crucial step is to replace the parent process Nc , say, by a process which has at least one offspring point in the observation region.
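The thinning construction of Exercise 7.5.2 can be sketched in a few lines. The code below is a minimal illustration rather than Algorithm 7.5.II itself: it assumes a constant dominating rate M in place of a general $\lambda^+(t)$, and the self-exciting intensity used as an example is invented.

```python
import math, random

def thin_simulate(lam_star, M, T, rng=random):
    """Simulate a process with conditional intensity lam_star(t, history) on
    (0, T) by thinning a dominating Poisson stream of constant rate M.
    Requires lam_star(t, h) <= M for all t and histories h."""
    t, hist = 0.0, []
    while True:
        t += rng.expovariate(M)                    # next candidate point
        if t >= T:
            return hist
        if rng.random() * M < lam_star(t, hist):   # retain with prob. lam_star/M
            hist.append(t)

# Invented example: a self-exciting intensity, safely dominated by M = 5.
def excite(t, hist):
    return min(5.0, 1.0 + sum(0.5 * math.exp(-(t - s)) for s in hist[-50:]))

points = thin_simulate(excite, 5.0, 100.0, random.Random(2))
print(len(points))
```

Retaining a candidate at t with probability $\lambda^*(t)/M$ reproduces the target conditional intensity, which is the content of the exercise.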
276
7. Conditional Intensities and Likelihoods
7.6. Information Gain and Probability Forecasts

We come now to the problem of assessing probability forecasts of the type described in the previous section. A distinction needs to be made here between assessing the probability forecast as such and assessing a decision procedure based on the probability forecast. Commonly, when probability forecasts for weather and other phenomena are being assessed, a threshold probability level is established, and the forecast is counted as a ‘success’ if either the forecast probability rises above the threshold level and a target event occurs within the forecasting period or region, or the forecast probability falls below the threshold level and no event occurs. The assessment is then based on the 2 × 2 table of observed and forecast successes and failures, and a variety of scores for this purpose have been developed and studied (see, e.g., Shi et al., 2001). In effect, such a procedure converts the probability forecast into a decision rule, and it is the decision rule rather than the forecast that is assessed.

In fact, many decision rules can be based on the same probability forecast, depending on the application in view. For example, in earthquake forecasts, one relevant decision for a government might be whether or not to issue a public earthquake warning; but other potential users, such as insurance companies, emergency service coordinators, and managers of gas, power, or transport companies, might prefer to initiate actions at quite different probability levels and would therefore score the forecasts quite differently.

Our concern is with assessing the probability forecasts as such. The basic criterion we shall use for this purpose is the binomial or entropy score, in which the forecast is scored by the negative logarithm $-\log \hat p_k$ of the forecast probability $\hat p_k$ of the outcome k that actually occurs.
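The behaviour of the entropy score can be illustrated numerically before it is formalized. In the sketch below, the three-outcome distribution and the competing forecast are invented for illustration; the point is simply that forecasting with the true probabilities achieves the smallest average score.

```python
import math, random

rng = random.Random(0)
p_true = [0.7, 0.2, 0.1]                      # hypothetical outcome probabilities
outcomes = rng.choices(range(3), weights=p_true, k=50_000)

def mean_entropy_score(forecast):
    """Average of -log(forecast probability) over the outcomes that occurred."""
    return sum(-math.log(forecast[k]) for k in outcomes) / len(outcomes)

honest = mean_entropy_score(p_true)           # forecast with the true probabilities
biased = mean_entropy_score([0.4, 0.3, 0.3])  # any other forecast does worse on average
entropy = -sum(p * math.log(p) for p in p_true)
print(honest, biased, entropy)
```

The honest forecaster's average score approximates the entropy of the true distribution, while the mis-specified forecast scores strictly worse.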
If outcome k has true probability $p_k$ of occurring, then a ‘good’ set of forecasts should have $\hat p_k \approx p_k$ for outcome k, and therefore the expected score is approximately $-\sum_k p_k \log p_k$, which is just the entropy of the distribution $\{p_k\}$ (up to a multiplicative factor in not using logarithms to base 2). This leads us to a preliminary discussion of the entropy of point process models, a study taken further in Chapter 14.

The entropy score itself, summed over a sequence of forecasts based on a specific parametric model, is nothing other than the log likelihood of the model. In this sense, the discussion highlights an alternative interpretation of the likelihood principle. Maximizing the likelihood from within a family of models amounts to finding the model with the best forecast performance in the sense of the entropy score. Equally, testing the model on the basis of its forecasting performance amounts to testing the model on the basis of its likelihood. Other criteria, such as the goodness-of-fit of first- and second-moment properties, may be less relevant to selecting a model for its forecasting ability. In any case, the analysis and assessment of probability forecasts is a topic of importance in its own right, and it is this point of view that motivates the present discussion.

To bring some of the underlying issues into focus, consider first the simpler problem of producing and assessing probability forecasts for a sequence of
i.i.d. multinomial trials in which observation $Y_i$, for $i = 1, \ldots, N$, may have one of K different outcomes $1, \ldots, K$, say, with respective true probabilities $p_k = \Pr\{\text{outcome is } k\} = \Pr\{Y_i = k\}$ (the trials are i.i.d.) and $\sum_{k=1}^K p_k = 1$. Suppose that there is a record available of observations $\{Y_1, \ldots, Y_N\}$ on N independent trials, and write $\hat p_k = N^{-1}\sum_{i=1}^N \delta_{Y_i k} \equiv N_k/N$ for the sample proportion of outcomes equal to k ($k = 1, \ldots, K$). What should be our forecast for trial N + 1?

In accordance with our general prescription, the forecast should be in the form of a set of probabilities based on an assumed model (i.e. a model for which the underlying probabilities are assumed known). In this simple situation, it is intuitively obvious that the $\{p_k\}$ are also the probabilities that we would use to forecast the different possible outcomes of the next event. However, it is also possible to base this choice on somewhat more objective grounds, namely that our choice should maximize some expected score, suitably chosen. Denote the candidate probabilities for the forecast by $a_k$. In accordance with the discussion above, we consider here the likelihood ratio score
$$S_{LR} = \sum_{i=1}^N \log\frac{a_{Y_i}}{\pi_{Y_i}} = N \sum_{k=1}^K \hat p_k \log\frac{a_k}{\pi_k}, \tag{7.6.1}$$
where $\{\pi_k\}$ is a set of reference probabilities. The use of the logarithm of the ratio $a_k/\pi_k$ rather than the simple logarithm $\log a_k$ has two benefits: it introduces a natural standard against which the forecasts using the given model can be compared; and it overcomes dimensionality problems in the passage from discrete to continuous contexts (Exercise 7.6.1 gives some further discussion). This score function has the character of a skill score, for which higher values show greater skills. Taking expected values has the effect of replacing the empirical frequencies $\hat p_k$ by $p_k$ in the second form of (7.6.1). Elementary computations then show that the score $S_{LR}$ is optimized by the choice $a_k = p_k$; i.e. the procedure that optimizes the expected score is to use the model probabilities as the forecasting probabilities. Specifically, the optimum values achieved by following the procedure above are given by
$$E(S_{LR}) = N\, H(P;\, \Pi), \tag{7.6.2}$$
where P , Π denote the distributions with elements pk , πk , respectively, and H(·) is the relative or generalized entropy or Kullback–Leibler distance between the two distributions. The appearance of the entropy here should not come as a surprise, as it is nothing other than the expected value of (minus) a log probability, or more generally a log likelihood. In terms of SLR , the distribution that is hardest to predict is the discrete uniform distribution, which has maximum entropy amongst distributions on K points. If we use the uniform as the reference distribution {πk }, the change in the expected score as the model distribution moves away from the maximum
entropy distribution will be referred to as the expected information gain. It represents the improvement in the predictability of the model used relative to the reference model. The greatest expected gains, corresponding to the most effective predictions, will be achieved when the model distribution is largely concentrated on one or a small number of distinguished values. The ratio $p_k/\pi_k$ of the model probability $p_k$ to the reference probability $\pi_k$ for any particular distinguished value k is sometimes called the probability gain for k.

Now let us examine how these ideas carry over to the point process context. We start with a discrete-time framework, such as would arise if the forecasts were being made regularly, after the elapse of a fixed time interval (weekly, monthly, etc.). We also assume that the process is marked, with the marks taking one of the finite set of values $\{1, \ldots, K\}$. In effect, this merely extends the discussion from the case of independent to dependent trials, with the assumption that the trials are indexed by a time parameter so that the evolutionary character is maintained. Alternatively, and more conveniently for our purposes, we may consider the model as a multivariate point process in discrete time. Rather than using the sequence of marks $Y_n$ ($n = 1, 2, \ldots$) as before, introduce $X_{kn} = \delta_{Y_n k}$, and let the K component simple point processes $N_k(n)$ count the number of points with mark k up to ‘time’ n, with $N_k(0) = 0$ for each k, so $N_k(n) = \sum_{i=1}^n X_{ki}$. An argument similar to that given previously shows that the forecasting probability that optimizes the expected value of the score at step n, given the history $\mathcal H_{n-1}$ up to time n − 1, is $p^*_{kn} = E(X_{kn} \mid \mathcal H_{n-1})$, where $\mathcal H$ is the full history of the process, recording information on the marks as well as the occurrence times. If, as a reference process, we take the process of i.i.d. trials having fixed probabilities $\pi_{kn} = f_k$, then the total entropy score over a period of T time units can be written
$$\log\frac{L}{L_0} = \sum_{n=1}^T \sum_{k=1}^K X_{kn} \log\frac{p^*_{kn}}{\pi_{kn}}, \tag{7.6.3}$$
which is just the likelihood ratio for the given process relative to the reference process. This formulation shows clearly that the total entropy score for the multivariate process is the sum of the entropy scores of the component processes. There is no implication here that the component processes are independent; dependence comes through the joint dependence of the components on the full past history. In the case of a univariate process, for which the only possible outcomes are 0 and 1, the formula in (7.6.3) simplifies to the binomial score
$$\log\frac{L}{L_0} = \sum_{n=1}^T \left[ X_n \log\frac{p^*_n}{\pi_n} + (1 - X_n)\log\frac{1 - p^*_n}{1 - \pi_n} \right]. \tag{7.6.4}$$
Equation (7.6.3) assumes a form closer to that used previously for the likelihood of a multivariate point process if we reserve one mark, 0 say, for
the null event; that is, the event that no event of any other type occurs. Let us assume in addition that the ground process is simple, so that at most one nonnull event can occur in any one time instant, and introduce the notations $p^*_n = \sum_k p^*_{kn}$ for the conditional intensity of the ground process, $f^*_{k|n} = p^*_{kn}/p^*_n$ for the conditional distribution of the mark, given the past history and the occurrence of an event at n, and $X_n = \sum_{k=1}^K X_{kn}$ for the ground process itself. Let us also choose the reference probabilities in the form $\pi_{kn} = f_k \pi_n$ for $k \ne 0$, $\pi_{0n} = 1 - \pi_n$, corresponding to a discrete-time analogue of a continuous-time compound Poisson process. Then we can rewrite (7.6.3) as
$$\log\frac{L}{L_0} = \sum_{n=1}^T \left[ \sum_{k=1}^K X_{kn} \log\frac{p^*_n f^*_{k|n}}{f_k \pi_n} + (1 - X_n)\log\frac{1 - p^*_n}{1 - \pi_n} \right]$$
$$= \sum_{n=1}^T \left[ X_n \log\frac{p^*_n}{\pi_n} + (1 - X_n)\log\frac{1 - p^*_n}{1 - \pi_n} + \sum_{k=1}^K X_{kn}\log\frac{f^*_{k|n}}{f_k} \right]. \tag{7.6.5}$$
Taking expectations of the nth term, given the past up to time n − 1, gives the conditional relative entropy or conditional information gain
$$I_n = \sum_{k=1}^K p^*_{kn} \log\frac{p^*_{kn}}{\pi_{kn}} + (1 - p^*_n)\log\frac{1 - p^*_n}{1 - \pi_n}$$
$$= p^*_n \log\frac{p^*_n}{\pi_n} + (1 - p^*_n)\log\frac{1 - p^*_n}{1 - \pi_n} + p^*_n \sum_{k=1}^K f^*_{k|n}\log\frac{f^*_{k|n}}{f_k}. \tag{7.6.6}$$
It is the conditional relative entropy of the nth observation, given the information available prior to the nth step. Note that this quantity is still a random variable, since it depends on the random past through the conditioning σ-algebra $\mathcal H_{n-1}$. It reduces to the zero random variable when $p^*_{kn} = \pi_{kn}$ but is otherwise positive, as follows from Jensen's inequality. In the special case of a univariate process, it reduces to
$$I^*_n = p^*_n \log\frac{p^*_n}{\pi_n} + (1 - p^*_n)\log\frac{1 - p^*_n}{1 - \pi_n}. \tag{7.6.7}$$
The relation
$$E[(I^*_{n+1} + I^*_n) \mid \mathcal H_{n-1}] = E(I^*_{n+1} \mid \mathcal H_{n-1}) + I^*_n$$
yields the joint conditional entropy of $X_n$ and $X_{n+1}$, given the information available at the (n − 1)th step. Continuing in this way, we obtain
$$E\left[ \sum_{n=1}^N I^*_n \,\Big|\, \mathcal H_0 \right] = \sum_{n=1}^N E(I^*_n \mid \mathcal H_0) = E\left[ \log\frac{L}{L_0} \,\Big|\, \mathcal H_0 \right], \tag{7.6.8}$$
the joint entropy of the full set of observations, conditional on the information available at the beginning of the observation period. Dividing this quantity
by N, we obtain the average expected information gain per time step. This quantity is of particular interest when the whole setup is stationary and the expectations in (7.6.8) have the same value, namely the expected information gain per unit time. We shall denote this quantity by G. In this situation, we expect the log likelihood to increase roughly linearly with the number of observations, with the expected increment being equal to G. To avoid difficulties with transient effects near n = 0, the histories in the stationary case should cover the infinite past rather than the past since some fixed starting time. Following the notation in later chapters, write $p^\dagger_n = E[X_{n+1} \mid \mathcal H_{(-\infty,n]}]$ and set $\pi_n = E(p^\dagger_n) = E(X_n) = p$, say. Then G can be expressed as
$$G = E\left[ p^\dagger_n \log\frac{p^\dagger_n}{p} + (1 - p^\dagger_n)\log\frac{1 - p^\dagger_n}{1 - p} + p^\dagger_n \sum_{k=1}^K f^\dagger_{k|n}\log\frac{f^\dagger_{k|n}}{f_k} \right]. \tag{7.6.9}$$
The first term represents the information gain from the ground process and the second the additional information gain that comes from predicting the values of the marks, given the ground process. Overall, G represents the expected improvement in forecasting skill, as measured by the entropy score, if we move from using the background probabilities as the forecast to using the time-varying model probabilities. G ranges from 0, when the trials are i.i.d. and the model probabilities coincide with those of the reference model, to a maximum when the model trials are completely predictable, related to the absolute entropy of the independent trials model.

To see this last point, suppose, to take a specific case, that the background model is for i.i.d. trials with equal probabilities 1/K for each outcome. Now write G in the form
$$G = E\left[ p^\dagger_n \log p^\dagger_n + (1 - p^\dagger_n)\log(1 - p^\dagger_n) - \left( p^\dagger_n \log p + (1 - p^\dagger_n)\log(1 - p) \right) + p^\dagger_n \sum_{k=1}^K f^\dagger_{k|n}\log\frac{f^\dagger_{k|n}}{f_k} \right] \tag{7.6.10}$$
and suppose that, with high probability, $p^\dagger_n$ is close to either one or zero and that one of the $f^\dagger_{k|n}$ is also close to one, so that the process is highly predictable. Then the first two terms in the first sum above are very small, while in the second sum either $p^\dagger_n$ itself is very small or it is close to one and the remaining sum is close to the value $-\log(1/K)$. After taking expectations, recalling $E(p^\dagger_n) = p$, G reduces to approximately $-[p\log p + (1 - p)\log(1 - p) + p\log(1/K)]$, the absolute entropy of the independent trials model with equal probabilities for each outcome. In general, the final term will be of the form $p\,E[\log f_{k^\dagger}]$, where $f_{k^\dagger}$ is the background probability of the outcome $k^\dagger$ that is successfully predicted. In summary, we have the following statement.
Proposition 7.6.I. For a stationary, multivariate, discrete-time process, with full internal history F, overall occurrence rate p, and background model as defined above, G, the expected information gain per time step, is given by (7.6.9) above. It is a characteristic of the model and lies in the range
$$0 \le G \le -[p\log p + (1 - p)\log(1 - p) + p\,E(\log f_{k^\dagger})],$$
where $f_{k^\dagger}$ is the background probability of the outcome $k^\dagger$ that is successfully predicted. G takes the lower end point of the range when the increments $X_{nk}$ are independent and the upper end point when perfect prediction is possible.

Example 7.6(a) Discrete Hawkes process: logistic autoregression. This model defines a univariate process in which $p^*_n$ has the general form
$$\log\frac{p^*_n}{1 - p^*_n} = a_0 + \sum_{i=1}^K a_i X_{n-i} = a_0 + \sum_{i=1}^K a_i I\{X_{n-i} = 1\}, \tag{7.6.11}$$
where the $a_i$ are parameters and, to accommodate the stationarity requirement, F is taken to be the complete history $\mathcal H^\dagger$, so that $\mathcal H^\dagger_n$ is generated by the $X_i$ with $-\infty < i \le n$. For simplicity, we examine just the case of a first-order autoregression; there are then just two parameters, $a_0$ and $a_1$, in 1 : 1 correspondence with the probabilities $\pi_{1|0} = \Pr\{X_n = 1 \mid X_{n-1} = 0\}$ and $\pi_{1|1} = \Pr\{X_n = 1 \mid X_{n-1} = 1\}$, respectively.

Three extreme cases arise. If $\pi_{1|0}$ is close to 0 and $\pi_{1|1}$ is close to 1, then a realization will consist of long sequences of 0s followed by long sequences of 1s, and any prediction should approximate the weatherman's rule: tomorrow's weather will be the same as today's. If $\pi_{1|1}$ is close to 0 and $\pi_{1|0}$ is close to 1, then the realization will be an almost perfect alternation of 0s and 1s, and any prediction rule should approximate the antiweatherman's rule: tomorrow's weather will be the opposite of today's. In the third case, $\pi_{1|0}$ and $\pi_{1|1}$ are both close to $\frac12$, the sequence will consist of more or less random occurrences of 0s and 1s, and no good prediction rule will be possible.

To examine such effects quantitatively, let us choose the parameters $a_0$, $a_1$ so that $\pi_{1|0}$ and $\pi_{1|1}$ can be written
$$\pi_{1|0} = \epsilon, \qquad \pi_{1|1} = 1 - \rho\epsilon.$$
The stationary probability p solves the equation $p = p\pi_{1|1} + (1 - p)\pi_{1|0}$, so $p = 1/(1 + \rho)$. Thus, the parameter $\epsilon$ controls the mean length of runs of the same digit, and the parameter $\rho$ controls the relative probabilities of 0s and 1s. We examine the behaviour of the predictions for small $\epsilon$. When $X_{n-1} = 0$, we take as our prediction $p^*_n = \pi_{1|0} = \epsilon$, and when $X_{n-1} = 1$ we take $p^*_n = \pi_{1|1} = 1 - \rho\epsilon$. The information gain when $X_{n-1} = r$, for r = 0, 1, is then
$$J_r = \pi_{1|r}\log\frac{\pi_{1|r}}{p} + (1 - \pi_{1|r})\log\frac{1 - \pi_{1|r}}{1 - p}.$$
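This two-state chain lends itself to a direct simulation check: the empirical average log likelihood ratio per step, scored against the constant reference probability p, should approach the expected information gain per forecast, $G = pJ_1 + (1 - p)J_0$. The parameter values $\epsilon = 0.05$, $\rho = 2$ below are arbitrary choices.

```python
import math, random

eps, rho = 0.05, 2.0                      # pi_{1|0} = eps, pi_{1|1} = 1 - rho*eps
p = 1.0 / (1.0 + rho)                     # stationary probability of a 1

def J(pi):
    """Information gain J_r for the transition probability pi = pi_{1|r}."""
    return pi * math.log(pi / p) + (1 - pi) * math.log((1 - pi) / (1 - p))

G = p * J(1 - rho * eps) + (1 - p) * J(eps)   # G = p*J_1 + (1-p)*J_0

# Empirical check: average log likelihood ratio per step over a long run.
rng = random.Random(3)
x, total, N = 1, 0.0, 400_000
for _ in range(N):
    pi = 1 - rho * eps if x == 1 else eps     # one-step-ahead forecast p*_n
    x = 1 if rng.random() < pi else 0
    total += math.log(pi / p) if x == 1 else math.log((1 - pi) / (1 - p))
print(G, total / N)
```

Over a long run the sample average agrees with the analytic G to a few decimal places.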
The expected information gain per forecast is $G = pJ_1 + (1 - p)J_0$. Substituting for $\pi_{1|0}$, $\pi_{1|1}$ and p, we find that, for small $\epsilon$,
$$G = H_p + \frac{2\rho}{1 + \rho}\,\epsilon\log\epsilon + O(\epsilon),$$
where $H_p$ is as in Proposition 7.6.I. As $\epsilon$ decreases, the expected information gain approaches $H_p$, whereas if $\epsilon = 1/(1 + \rho)$, then $\pi_{1|0} = \pi_{1|1} = 1/(1 + \rho)$ and G = 0.

We have stressed that the expected information gain is a function of the model: it is an indicator of its inherent predictability. In practice, other factors may intervene to produce an observed mean information gain that is well below that predicted by the model. This may happen, in particular, if the data are being fitted by a poor model. There would then be substantial long-run discrepancies between the actual data and the data that would be produced by simulation from the model. In such a case, the average information gain over a long sequence of trials could be well below the expected model value. In this sense, the mean information gain, representing the average likelihood per observation, forms the basis for a kind of goodness-of-fit test for the model.

We turn now to the problem of transferring these ideas to the continuous-time, point process context. In practice, forecasts cannot be issued continuously but only after intervals of greater or smaller length. We therefore adopt the following framework. Suppose there is given a finite interval (0, T) and a partition $\mathcal T$ into subintervals $\{0 < t_{\mathcal T,1} < \cdots < t_{\mathcal T,N} = T\}$. Forecasts are to be made at the end of each subinterval (i.e. at the time points $\{t_{\mathcal T,k}\}$) for the probability of an event occurring in the next subinterval. Suppose further that the given partition is a member of a dissecting family of partitions $\mathcal T_n$ in the sense of Appendix A1.6: as $n \to \infty$, the norm $\|\mathcal T\| = \max_k |t_{\mathcal T,k} - t_{\mathcal T,k-1}| \to 0$, so that the partitions ultimately distinguish points of (0, T), and the intervals appearing in the partitions are rich enough in total to generate the Borel sets of (0, T).
Our aim is to relate the performance of the forecasts on the finite partition to the underlying properties of the point process. For this purpose, Lemmas A1.6.IV, on convergence to a Radon–Nikodym derivative, and A1.6.V, on the relative entropy of probability measures on nested partitions, play a key role. To apply these lemmas, we must relate the partitions of the interval (0, T) to the partitions of the measurable space $(\Omega, \mathcal E)$ on which the probabilities are defined. Here it is enough to note that a partition of the interval into N subintervals induces a partition of $(\Omega, \mathcal E)$ into the $(K + 1)^N$ events corresponding to all possible sequences obtained by noting whether or not the subinterval contains a point of the process and, if so, noting the mark of the first point occurring within the subinterval. From Lemma A1.6.IV, it follows that, as the partitions are refined, the probability gains $p^*_{nk}/\pi_{nk}$ converge $(P \times \ell)$-a.e. to the corresponding ratio of intensities $\lambda^*(t, k)/\lambda_0(t, k)$. Lemma A1.6.V then implies that the corresponding relative entropies increase to a limit bounded above by the point process
relative entropy. The latter can be obtained directly by taking expectations of the point process likelihood ratio. Specifically, starting from the MPP log likelihood ratio at (7.6.5) and taking expectations when the reference measure corresponds to a compound Poisson process with constant rate $\lambda_0$ and mark distribution $f_k$, the relative entropy $H(\mathcal P_T;\, \mathcal P_{0,T})$ equals
$$E\left[ \int_0^T \sum_{k=1}^K \lambda^*(t,k)\log\frac{\lambda^*(t,k)}{\lambda_0 f_k}\,dt - \int_0^T [\lambda^*_g(t) - \lambda_0]\,dt \right] \tag{7.6.12}$$
$$= E\left[ \int_0^T \lambda^*_g(t)\log\frac{\lambda^*_g(t)}{\lambda_0}\,dt - \int_0^T [\lambda^*_g(t) - \lambda_0]\,dt + \int_0^T \lambda^*_g(t)\sum_{k=1}^K f^*_k(t)\log\frac{f^*_k(t)}{f_k}\,dt \right], \tag{7.6.13}$$
where $\lambda^*_g(t)$ is the conditional intensity for the ground process. A proof of this result for the univariate case, when H is the internal history and the likelihood reduces to the Janossy density, is outlined in Exercise 7.6.3. The general case, as well as a more complete discussion of the convergence of the $p^*_{nk}$ to $\lambda^*(t,k)$, is taken up in Chapter 14.

When the process is stationary and $\lambda^*$ is replaced by $\lambda^\dagger$ (i.e. the conditioning is taken with respect to the infinite past), the relative entropy in (7.6.12) reduces to a multiple of T. If further we assume that $\lambda_0 = E[\lambda^\dagger_g(0)] \equiv m_g$, then (7.6.12) can be written
$$H(\mathcal P_T;\, \mathcal P_{0,T}) = T\left\{ E\left[ \lambda^\dagger_g(0)\log\frac{\lambda^\dagger_g(0)}{\lambda_0} \right] + m_g\,E\left[ \sum_{k=1}^K f^\dagger_{k|0}\log\frac{f^\dagger_{k|0}}{f_k} \right] \right\}. \tag{7.6.14}$$
Again, we can write G for the coefficient of T and refer to it as the mean entropy or expected information gain per unit time. It is worth noting that here G can be written in the two alternative forms
$$G = E\left[ \lambda^\dagger_g(0)\log\frac{\lambda^\dagger_g(0)}{\lambda_0} \right] + m_g\,E\left[ \sum_{k=1}^K f^\dagger_{k|0}\log\frac{f^\dagger_{k|0}}{f_k} \right] = \sum_{k=1}^K E\left[ \lambda^\dagger_k(0)\log\frac{\lambda^\dagger_k(0)}{\lambda_k} \right],$$
where $\lambda^\dagger_k(0) = \lambda^\dagger_g(0) f^\dagger_{k|0}$ and $\lambda_k = m_g f_k$. The first form represents a division of the information gain into components due to forecasting the occurrence times of the points and their marks, while the second represents a division of the information gain into components corresponding to the individual marks. This equality does not hold in general for the approximating discrete-time processes, because the two forms then correspond to different ways of scoring situations where more than one point of the process falls into a single time step. As in the discrete case, the quantity G is a characteristic of the model. It represents an upper bound to the expected information gains per unit time that could be obtained from any approximating discrete model. The results are summarized in the proposition below.
Proposition 7.6.II. Let N(t, κ) be a stationary regular MPP, let
$$\lambda^\dagger(t,\kappa)\,dt = \lambda^\dagger_g(t)\,f^\dagger_{\kappa|t}\,dt = E[\,dt\,N(t,\kappa) \mid \mathcal H^\dagger_{t-}]$$
denote its complete $\mathcal H^\dagger$-conditional intensity, and suppose that
$$G = E\left[ \lambda^\dagger_g(0)\log\frac{\lambda^\dagger_g(0)}{m_g} \right] < \infty,$$
where $m_g = E[\lambda^\dagger_g(0)]$. If $\mathcal T$ is any finite partition of the interval (0, T) and $G_{\mathcal T}$ the associated average expected information gain per unit time, then $G_{\mathcal T} \le G$ and, as $\mathcal T_n$ increases through any nested sequence of partitions generating the Borel sets in (0, T), $G_{\mathcal T_n} \uparrow G^\dagger \equiv \lim_{n\to\infty} G_{\mathcal T_n} \le G$.

Proof. The result follows from further applications of Lemmas A1.6.IV and A1.6.V, but a formal proof requires a more careful discussion of conditioning and predictability than given here and is deferred to Chapter 14.

Since G here is a property of the model, it can be evaluated analytically or numerically (by simulation). The model value of G can then be compared with the mean likelihood $T^{-1}\log L$ obtained by applying the model to a set of data, this latter being just the mean entropy score per unit time for the given model with the given data. If the model is close to the true model for the data, the estimate of G obtained in this way should be close to the model G. When the data do not match the model well, the predictive power of the model should be below that obtained when the model is applied to matching data and hence below the theoretical G of the model. In such a situation, the estimated G from the likelihood will generally come out well below the true G of the model (as well as below the unknown G of the true model). The difference between the model and estimated values of G can therefore serve as a basis for model testing and is in fact so used in contingency table contexts, corresponding roughly to the discrete-time models considered earlier in this section. Some of these points are illustrated in the following two examples.

Example 7.6(b) Renewal process. Consider a stationary renewal process with interval distribution having density f(x), assumed at least left-continuous. Then $\lambda^\dagger(t) = f(B_t)/S(B_t)$, where $B_t$ has the distribution of a stationary backward recurrence time. For the mean rate and the expected information gain per unit time, we obtain, respectively,
$$m = E[\lambda^\dagger(t)] = E\left[ \frac{f(B_t)}{S(B_t)} \right],$$
$$G = E\left[ \lambda^\dagger(t)\log\frac{\lambda^\dagger(t)}{m} \right] = E\left[ \frac{f(B_t)}{S(B_t)}\log\frac{f(B_t)}{m\,S(B_t)} \right], \tag{7.6.15}$$
the two expectations on the extreme right-hand sides being with respect to the distribution of $B_t$, which has density $\bigl(\int_y^\infty f(u)\,du\bigr)\big/\mu$, where $\mu$ is the mean interval length [see (4.2.5) or Exercise 3.4.1]. Substituting and simplifying, we find $m = 1/\mu$ and
$$G = m\left[ 1 + \int_0^\infty f(y)\log\frac{f(y)}{m}\,dy \right]. \tag{7.6.16}$$
The same result can be obtained from the general result that, for a stationary process, the expected information gain per unit time is just m times the expected information gain per interval, where the latter is defined to be
$$G_I = E\left[ \int_0^\infty f^\dagger(x)\log\frac{f^\dagger(x)}{f_0(x)}\,dx \right],$$
with $f^\dagger(x)$ the density of the distribution of an interval given the history up to its start, and $f_0(x)$ the density of an interval under the reference measure. Here, given m, the exponential distribution with mean 1/m has maximum entropy, so we take $f_0(x) = m e^{-mx}$ in the expression above, corresponding precisely to the choice of the Poisson process with rate m used in the counting process description.

Now suppose that probability forecasts are made for a forecasting period of length ∆ ahead. The probability of an event occurring in the interval (t, t + ∆), given the past history $\mathcal F^\dagger_t$, is given by
$$p^*(\Delta \mid X) = \frac{S(X) - S(X + \Delta)}{S(X)},$$
where S(x) is the survivor function for the interval distribution, and X is the backward recurrence time. In the stationary case, writing $p_0 = 1 - e^{-m\Delta}$ and taking expectations with respect to the stationary form of the backward recurrence time distribution, we consider the quantity
$$G_\Delta = \frac{1}{\Delta}\,E\left[ p^*(\Delta \mid X)\log\frac{p^*(\Delta \mid X)}{p_0} + [1 - p^*(\Delta \mid X)]\log\frac{1 - p^*(\Delta \mid X)}{1 - p_0} \right]. \tag{7.6.17}$$
It represents the average expected information gain for forecasts of length ∆, is independent of t, and can be shown to satisfy $G_\Delta \le G = \lim_{\Delta\to 0} G_\Delta$. See Exercise 7.6.4 for details and some numerical illustrations.

The next model both illustrates the ideas of Proposition 7.6.II in a relatively simple context and adds a cautionary note to the discussion of probability forecasts for point processes.

Example 7.6(c) Marked Hawkes process with exponential infectivity function [see Example 7.3(b)]. Consider an MPP with complete conditional intensity of the form
$$\lambda^\dagger(t,\kappa) = \left[ \mu_0 + \sum_{i:\,t_i < t} \psi(\kappa_i)\,\beta e^{-\beta(t - t_i)} \right] f(\kappa).$$
In common with the ETAS model, where the marks κ are commonly denoted by M for magnitudes, it has unpredictable marks, and its ground intensity is just the term in square brackets above. The ground intensity can be written in the form
$$\lambda^\dagger_g(t) = \mu_0 + A(t), \qquad A(t) = \sum_{i:\,t_i < t} \psi(\kappa_i)\,\beta e^{-\beta(t - t_i)}.$$
Now, although the sum defining A(t) goes back into the indefinite past, it is in fact a Markov process, its future evolution depending only on its present value (discounted exponentially in the gaps between events) and the sizes of future events, which are chosen independently of the past. Thus
$$E[A(t)] = m_g - \mu_0 \qquad\text{and}\qquad G = E\left[ (\mu_0 + A(t))\log\frac{\mu_0 + A(t)}{m} \right]$$
are both fully determined once the equilibrium distribution for the Markov process A(t) is determined.

In this example, the observed performance of predictions based on the true model is likely to be worse than predictions based on a Poisson process with the same mean rate m. This is because the rate in intervals between points is assessed as $\mu_0$ by the model and as m by the Poisson process. When an event occurs, however, it is likely to be followed by several others within the same prediction interval, all of which are likely to be scored badly. In fact, this is one example where the distinction between the scores $S_{LR}$ at (7.6.1) and $S_Q$ at Exercise 7.6.1 makes a crucial difference in the estimation of the performance of the model. A related example, with numerical details from simulations, is given in Vere-Jones (1999).
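The Markov property of A(t) makes the ground process straightforward to simulate by thinning, since the intensity only decays between events. The sketch below uses invented parameter values and assumes, purely for illustration, that $\psi(\kappa)$ is exponentially distributed with mean 0.4; the comparison at the end uses the standard subcritical Hawkes mean rate $\mu_0/(1 - E[\psi(\kappa)])$.

```python
import math, random

mu0, beta, psi_mean = 0.5, 2.0, 0.4      # invented parameters; E[psi] < 1 (subcritical)
rng = random.Random(7)

def simulate_ground(T):
    """Thinning simulation of the ground process lambda_g(t) = mu0 + A(t).
    Between events A(t) only decays, so the current value mu0 + A bounds the
    intensity until the next event; at an accepted event A jumps by psi*beta."""
    t, A, times = 0.0, 0.0, []
    while True:
        bound = mu0 + A
        w = rng.expovariate(bound)
        A *= math.exp(-beta * w)                  # exponential decay over the gap
        t += w
        if t >= T:
            return times
        if rng.random() * bound < mu0 + A:        # thinning step
            A += rng.expovariate(1 / psi_mean) * beta   # psi(kappa) ~ Exp, assumed
            times.append(t)

T = 50_000.0
times = simulate_ground(T)
m_hat = len(times) / T
print(m_hat, mu0 / (1 - psi_mean))   # empirical vs. theoretical mean rate
```

Time-averaging functionals of $\mu_0 + A(t)$ along such a run gives a numerical route to the equilibrium quantities E[A(t)] and G mentioned above.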
Exercises and Complements to Section 7.6

7.6.1 As a possible alternative to the likelihood score $S_{LR}$ in (7.6.1) for assessing probability forecasts, define the quadratic score $S_Q$ by
$$S_Q = \sum_{i=1}^N \sum_{k=1}^K (\delta_{X_i,k} - a_k)^2 = N\left[ 1 - 2\sum_{k=1}^K \hat p_k a_k + \sum_{k=1}^K a_k^2 \right].$$
Show that, just as for $S_{LR}$, the optimal result is achieved by using the model probabilities as the forecast probabilities. Show also that when these probabilities are used, $E(S_Q) = N[1 - K^{-1} - \operatorname{var} p_X]$, where $\operatorname{var} p_X = \sum_k (p_k - \bar p)^2$ and $\bar p = \sum_k p_k/K = 1/K$.

7.6.2 (Continuation). Consider the effect on $S_Q$ of the limit procedure that passes from a discrete probability to a continuous density. How should a reference measure be introduced so as to secure a meaningful passage to a limit?
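The identity for $E(S_Q)$ in Exercise 7.6.1 is easy to check numerically; the three-outcome distribution below is an arbitrary choice.

```python
import random

rng = random.Random(5)
p = [0.5, 0.3, 0.2]                      # arbitrary model probabilities
K, N = len(p), 100_000
outcomes = rng.choices(range(K), weights=p, k=N)

# Quadratic score with the model probabilities used as forecasts (a_k = p_k).
s_q = sum(sum(((1 if k == y else 0) - p[k]) ** 2 for k in range(K)) for y in outcomes)

p_bar = 1.0 / K
var_p = sum((pk - p_bar) ** 2 for pk in p)
print(s_q / N, 1 - 1.0 / K - var_p)      # the two values should nearly coincide
```

Both quantities reduce to $1 - \sum_k p_k^2$, which the sample average reproduces to within sampling error.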
7.6.3 Entropy of a regular finite point process.
(a) For a regular finite point process, define the point process entropy H(P) as the expected value $E[\log(L/L_0)]$ of the likelihood ratio. Express L in terms of Janossy densities, and use the representation (i) of Theorem 5.3.II to show (see Rudemo, 1964; McFadden, 1965) that H(P) equals
$$-\sum_k p_k \log p_k - \sum_k \int \pi_k^{\mathrm{sym}}(x_1, \ldots, x_k)\log[k!\,\pi_k(x_1, \ldots, x_k)]\,dx_1 \cdots dx_k,$$
where $p_k = \Pr\{N(\mathcal X) = k\}$.
(b) Now take $\mathcal X$ to be the interval (0, T) and represent the Janossy densities in terms of hazard functions and hence the internal conditional intensity. Hence, derive (7.6.14).
where pk = Pr{N (X ) = k}. (b) Now take X to be the interval (0, T ) and represent the Janossy densities in terms of hazard functions and hence the internal conditional intensity. Hence, derive (7.6.14). 7.6.4 Forecasts for renewal processes [see Example 7.6(b)]. (a) Recall that the backward recurrence time has density mS(x) in the notation of Example 7.6(b). Hence, simplify the expectation in (7.6.17) and verify the inequality for G∆ using a convexity argument. (b) Uniformly distributed intervals. Examine the special case
S(x) =
1−x
(0 < x < 1),
1
(x ≥ 1).
Substitute in (7.6.15) and (7.6.16) and investigate the result in (a) numerically. 7.6.5 Information gain for the Wold process with exponential intervals [see Exercise 4.5.8 and Example 7.5(c)]. Using the earlier notation, show that the information gain per unit time can be expressed as
G = E log
λ0 , λ(X)
where the expectation is over the stationary distribution for an interval length X, and λ0 = 1/E(X).
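For Exercise 7.6.4(b), formula (7.6.16) can be evaluated by simple quadrature. For intervals uniform on (0, 1) we have f = 1 on the support, $\mu = 1/2$ and $m = 2$, and the integral collapses to the closed form $G = 2(1 - \log 2)$, which the numerical evaluation below confirms.

```python
import math

def info_gain_renewal(f, m, upper=1.0, grid=100_000):
    """Midpoint-rule evaluation of (7.6.16):
    G = m * (1 + integral of f(y) * log(f(y)/m) over the support of f)."""
    h = upper / grid
    integral = sum(f((i + 0.5) * h) * math.log(f((i + 0.5) * h) / m) * h
                   for i in range(grid))
    return m * (1 + integral)

# Intervals uniform on (0, 1): f = 1 there, mu = 1/2, so m = 1/mu = 2.
G = info_gain_renewal(lambda y: 1.0, 2.0)
print(G, 2 * (1 - math.log(2)))          # closed form 2(1 - log 2)
```

The same routine accepts any interval density with bounded support, so it can also be used for the numerical investigation requested in part (a).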
APPENDIX 1
A Review of Some Basic Concepts of Topology and Measure Theory
In this appendix, we summarize, mainly without proof, some standard results from topology and measure theory. The aims are to establish terminology and notation, to set out results needed at various stages in the text in some specific form for convenient reference, and to provide some brief perspectives on the development of the theory. For proofs and further details, the reader should refer, in particular, to Kingman and Taylor (1966, Chapters 1–6), whose development and terminology we have followed rather closely.
A1.1. Set Theory

A set A of a space X is a collection of elements or points of X. When x is an element of the set A, we write x ∈ A (x belongs to or is included in A). The set of points of X not included in A is the complement of A, written $A^c$. If A, B are two sets of points from X, their union, written A ∪ B, is the set of points in either A or B or both; their symmetric difference, written A △ B, is the set of points in A or B but not both. If every element of B is also an element of A, we say B is included in A (B ⊆ A) or A contains B (A ⊇ B). In this case, the proper difference of A and B, written either A − B or A \ B, is the set of points of A but not B. More generally, we use A − B for $A \cap B^c$, so A − B = A △ B only when A ⊃ B.

The operations ∩ and △ on subsets of X are commutative, associative and distributive. The class of all such subsets thus forms an algebra with respect to these operations, where ∅, the empty set, plays the role of identity for △ and X the role of identity for ∩. The special relation A ∩ A = A implies that the algebra is Boolean. More generally, any class of sets closed under the operations of ∩ and △ is called a ring, or an algebra if X itself is a member of the class. A semiring is a class of sets A with the properties (i) A is closed under intersections and (ii) every symmetric difference of sets in A can be
represented as a finite union of disjoint sets in A. The ring generated by an arbitrary family of sets F is the smallest ring containing F or, equivalently, the intersection of all rings containing F. Every element in the ring generated by a semiring A can be represented as a union of disjoint sets of A. If R is a finite ring, there exists a basis of disjoint elements of R such that every element in R can be represented uniquely as a union of disjoint elements of the basis.
The notions of union and intersection can be extended to arbitrary classes of sets. If {An: n = 1, 2, . . .} is a sequence of sets, write An ↑ A = lim An if An ⊆ An+1 (n = 1, 2, . . .) and A = ⋃_{n=1}^∞ An; similarly, if An ⊇ An+1, write An ↓ A = lim An if A = ⋂_{n=1}^∞ An. A monotone class is a class of sets closed under monotonically increasing sequences. A ring or algebra that is closed under countable unions is called a σ-ring or σ-algebra, respectively. The σ-ring generated by a class of sets C, written σ(C), is the smallest σ-ring containing C. A σ-ring is countably generated if it can be generated by a countable class of sets. The following result, linking σ-rings to monotone classes, is useful in identifying the σ-ring generated by certain classes of sets.
Proposition A1.1.I (Monotone Class Theorem). If R is a ring and C is a monotone class containing R, then C contains σ(R).
A closely related result uses the concept of a Dynkin system D, meaning that
(i) X ∈ D;
(ii) D is closed under proper differences; and
(iii) D is closed under monotonically increasing limits.
Proposition A1.1.II (Dynkin System Theorem). If S is a class of sets closed under finite intersections, and D is a Dynkin system containing S, then D contains σ(S).
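On a finite space every σ-ring is itself finite, so the generated σ-algebra σ(C) can be computed by brute force. The following sketch (an illustration only; the function name and the example class are ours, not the text's) closes a class of subsets under complements and finite unions until nothing new appears:

```python
# Illustrative sketch: on a finite space X, the sigma-algebra generated by a
# class C of subsets can be obtained by repeatedly closing the family under
# complements and (here finite = countable) unions until it stabilizes.
def generated_sigma_algebra(X, C):
    X = frozenset(X)
    family = {frozenset(), X} | {frozenset(c) for c in C}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for A in current:
            comp = X - A                      # close under complements
            if comp not in family:
                family.add(comp)
                changed = True
        for A in current:
            for B in current:
                u = A | B                     # close under unions
                if u not in family:
                    family.add(u)
                    changed = True
    return family

X = {1, 2, 3, 4}
sigma = generated_sigma_algebra(X, [{1}, {1, 2}])
# the result is the sigma-algebra with atoms {1}, {2}, {3, 4}: eight sets
```

Intersections are obtained automatically via De Morgan's laws, since the family is closed under complements and unions.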
A1.2. Topologies
A topology U on a space X is a class of subsets of X that is closed under arbitrary unions and finite intersections and that includes the empty set ∅ and the whole space X; the members of U are open sets, while their complements are closed sets. The pair (X, U) is a topological space. The closure of an arbitrary set A from X, written Ā, is the smallest closed set (equivalently, the intersection of all closed sets) containing A. The interior of A, written A°, is the largest open set (equivalently, the union of all open sets) contained within A. The boundary of A, written ∂A, is the difference Ā \ A°. The following elementary properties of boundaries are needed in the discussion of weak convergence of measures.
Proposition A1.2.I.
(a) ∂(A ∪ B) ⊆ ∂A ∪ ∂B;
(b) ∂(A ∩ B) ⊆ ∂A ∪ ∂B;
(c) ∂A^c = ∂A.
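The inclusions of Proposition A1.2.I can be checked directly on a small finite topological space, where closure, interior and boundary can be computed from their definitions. The space and topology below are our own illustrative choices:

```python
# A small check of Proposition A1.2.I on a finite topological space.
# X = {1,2,3,4} with a nested topology U (closed under arbitrary unions and
# finite intersections, and containing the empty set and X).
X = frozenset({1, 2, 3, 4})
U = [frozenset(s) for s in [set(), {1}, {1, 2}, {1, 2, 3}, {1, 2, 3, 4}]]

def interior(A):
    # union of all open sets contained in A
    pts = set()
    for O in U:
        if O <= A:
            pts |= O
    return frozenset(pts)

def closure(A):
    # complement of the interior of the complement
    return X - interior(X - A)

def boundary(A):
    return closure(A) - interior(A)

A, B = frozenset({1, 3}), frozenset({2})
assert boundary(A | B) <= boundary(A) | boundary(B)   # (a)
assert boundary(A & B) <= boundary(A) | boundary(B)   # (b)
assert boundary(X - A) == boundary(A)                 # (c)
```

Here boundary(A) = {2, 3, 4} while boundary(A ∪ B) = {4}, so the inclusion in (a) is strict.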
A neighbourhood of the point x ∈ X with respect to the topology U (or, more briefly, a U-neighbourhood of x) is an open set from U containing x. U is a Hausdorff or T2-topology if the open sets separate points; that is, if for x ≠ y, x and y possess disjoint neighbourhoods. A family of sets F forms a basis for the topology U if every U ∈ U can be represented as a union of sets in F and F ⊆ U. U is then said to be generated by F. U is second countable if it has a countable basis.
A sufficient condition for a family of sets to form a basis for some topology is that, if F1 ∈ F, F2 ∈ F and x ∈ F1 ∩ F2, then there exists F3 ∈ F such that x ∈ F3 ⊆ F1 ∩ F2. The topology generated by F is then uniquely defined and consists of all unions of sets in F. Two bases F and G, say, are equivalent if they generate the same topology. A necessary and sufficient condition for F and G to be equivalent is that for each F ∈ F and x ∈ F, there exists G ∈ G with x ∈ G ⊆ F, and similarly for each G ∈ G and y ∈ G, there exists F ∈ F such that y ∈ F ⊆ G.
Given a topology U on X, a notion of convergence of sequences (or more generally nets, but we do not need the latter concept) can be introduced by saying xn → x in the topology U if, given any U-neighbourhood of x, Ux, there exists an integer N (depending on the neighbourhood in general) such that xn ∈ Ux for n ≥ N. Conversely, nearly all the important types of convergence can be described in terms of a suitable topology. In this book, the overwhelming emphasis is on metric topologies, where the open sets are defined in terms of a metric or distance function ρ(·, ·) that satisfies the conditions, for arbitrary x, y, z ∈ X,
(i) ρ(x, y) = ρ(y, x);
(ii) ρ(x, y) ≥ 0 and ρ(x, y) = 0 if and only if x = y; and
(iii) (triangle inequality) ρ(x, y) + ρ(y, z) ≥ ρ(x, z).
With respect to a given distance function ρ, the open sphere Sε(x) is the set {y: ρ(x, y) < ε}, being defined for any ε > 0.
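The metric axioms (i)–(iii) and membership in an open sphere can be checked numerically; the sketch below (our own illustration, using the Euclidean metric on the plane) verifies them on a few sample points:

```python
import math

# A small numerical check of the three metric axioms for the Euclidean
# metric on the plane, and of membership in an open sphere S_eps(x).
def rho(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

pts = [(0.0, 0.0), (3.0, 4.0), (1.0, -1.0)]
for x in pts:
    for y in pts:
        assert rho(x, y) == rho(y, x)                          # (i) symmetry
        assert rho(x, y) >= 0 and (rho(x, y) == 0) == (x == y) # (ii)
        for z in pts:
            # (iii) triangle inequality, with a tiny float tolerance
            assert rho(x, y) + rho(y, z) + 1e-12 >= rho(x, z)

def in_open_sphere(y, centre, eps):
    return rho(centre, y) < eps           # strict inequality: open sphere

assert in_open_sphere((3.0, 4.0), (0.0, 0.0), 5.1)
assert not in_open_sphere((3.0, 4.0), (0.0, 0.0), 5.0)   # boundary excluded
```

The last assertion reflects the strict inequality in the definition: a point at distance exactly ε from the centre is not in the open sphere.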
For any set A, define its diameter by
diam A = 2 inf{r: Sr(x) ⊇ A for some x}.
The metric topology generated by ρ is the smallest topology containing the open spheres; it is necessarily Hausdorff. A set is open in this topology if and only if every point in the set can be enclosed by an open sphere lying wholly within the set. A sequence of points {xn} converges to x in this topology if and only if ρ(xn, x) → 0. A limit point y of a set A is a limit of a sequence of points xn ∈ A with xn ≠ y; y need not necessarily be in A. The closure of A in the metric topology is the union of A and its limit points.
A space X with topology U is metrizable if a distance function ρ can be found such that U is equivalent to the metric topology generated by ρ. Two metrics on the same space X are equivalent if they each generate the same topology on X.
A sequence of points {xn: n ≥ 1} in a metric space is a Cauchy sequence if ρ(xn, xm) → 0 as n, m → ∞. The space is complete if every Cauchy sequence has a limit; i.e. if for every Cauchy sequence {xn} there exists x ∈ X such
that ρ(xn, x) → 0. A set D is dense in X if, for every ε > 0, every point in X can be approximated by points in D; i.e. given x ∈ X, there exists d ∈ D such that ρ(x, d) < ε. The space X is separable if there exists a countable dense set, also called a separability set. If X is a separable metric space, the spheres with rational radii and centres on a countable dense set form a countable base for the topology.
Given two topological spaces (X1, U1) and (X2, U2), a mapping f(·) from (X1, U1) to (X2, U2) is continuous if the inverse image f⁻¹(U) of every open set U ∈ U2 is an open set in U1. If both spaces are metric spaces, the mapping is continuous if and only if for every x ∈ X1 and every ε > 0, there exists δ > 0 such that ρ2(f(x′), f(x)) < ε whenever ρ1(x′, x) < δ, where ρi is the metric in Xi for i = 1, 2; we can express this more loosely as f(x′) → f(x) whenever x′ → x. A homeomorphism is a one-to-one continuous-both-ways mapping between two topological spaces.
A famous theorem of Urysohn asserts that any complete separable metric space (c.s.m.s.) can be mapped homeomorphically into a countable product of unit intervals. A Polish space is a space that can be mapped homeomorphically into an open subset of a c.s.m.s. The theory developed in Appendix 2 can be carried through for an arbitrary Polish space with only minor changes, but we do not seek this greater generality.
A set K in a topological space (X, U) is compact if every covering of K by a family of open sets contains a finite subcovering; i.e. K ⊆ ⋃_α Uα, Uα ∈ U, implies the existence of N < ∞ and α1, . . . , αN such that K ⊆ ⋃_{i=1}^N Uαi. It is relatively compact if its closure K̄ is compact. In a separable space, every open covering contains a countable subcovering, and consequently it is sufficient to check the compactness property for sequences of open sets rather than general families. More generally, for a c.s.m.s., the following important characterizations of compact sets are equivalent.
Proposition A1.2.II (Metric Compactness Theorem). Let X be a c.s.m.s. Then, the following properties of a subset K of X are equivalent and each is equivalent to the compactness of K.
(i) (Heine–Borel property) Every countable open covering of K contains a finite subcovering.
(ii) (Bolzano–Weierstrass property) Every infinite sequence of points in K contains a convergent subsequence with its limit in K.
(iii) (Total boundedness and closure) K is closed, and for every ε > 0, K can be covered by a finite number of spheres of radius ε.
(iv) Every sequence {Fn} of closed subsets of K with nonempty finite intersections (i.e. ⋂_{n=1}^N Fn ≠ ∅ for all N < ∞, the finite intersection property) has nonempty total intersection (i.e. ⋂_{n=1}^∞ Fn ≠ ∅).
The space X itself is compact if the compactness criterion applies with X in place of K. It is locally compact if every point of X has a neighbourhood with compact closure. A space with a locally compact second countable topology
is always metrizable. In a c.s.m.s., local compactness implies σ-compactness: the whole space can be represented as a countable union of compact sets (take the compact closures of the neighbourhoods of any countable dense set). Any finite-dimensional Euclidean space is σ-compact, but the same does not apply to infinite-dimensional spaces such as C[0, 1] or the infinite-dimensional Hilbert space ℓ2. A useful corollary of Proposition A1.2.II is that any closed subset F of a compact set K in a complete metric space is again compact, for by (ii) any infinite sequence of points of F has a limit point in K, and by closure the limit point is also in F; hence, F is compact.
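Property (iii) of Proposition A1.2.II can be made concrete for K = [0, 1]: for any ε > 0 a finite family of open spheres of radius ε covers K. The sketch below (our own construction; centres spaced ε apart are not the minimal cover, merely a convenient finite one) checks this on a fine grid:

```python
import math

# Illustration of total boundedness (property (iii) of Prop. A1.2.II) for
# K = [0, 1]: finitely many open spheres of radius eps cover K.  Centres
# spaced eps apart suffice, since every point of [0, 1] then lies strictly
# within eps of some centre.
def finite_eps_cover(eps):
    n = math.ceil(1 / eps) + 1
    return [min(k * eps, 1.0) for k in range(n)]

def is_covered(x, centres, eps):
    return any(abs(x - c) < eps for c in centres)

eps = 0.05
centres = finite_eps_cover(eps)
assert len(centres) == 21                  # a finite cover
assert all(is_covered(k / 1000, centres, eps) for k in range(1001))
```

By contrast, the open interval (0, 1) is totally bounded but not closed, so it fails the closure half of (iii) and is not compact.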
A1.3. Finitely and Countably Additive Set Functions
Let A be a class of sets in X, and ξ(·) a real- or complex-valued function defined on A. ξ(·) is finitely additive on A if for finite families {A1, . . . , AN} of disjoint sets from A, with their union also in A, there holds
ξ(⋃_{i=1}^N Ai) = Σ_{i=1}^N ξ(Ai).
If a similar result holds for sequences of sets {Ai: i = 1, 2, . . .}, then ξ is countably additive (equivalently, σ-additive) on A. A countably additive set function on A is a measure if it is nonnegative; a signed measure if it is real-valued but not necessarily nonnegative; and a complex measure if it is not necessarily real-valued.
A determining class for a particular type of set function is a class of sets with the property that if two set functions of the given type agree on the determining class, then they coincide. In this case, we can say that the set function is determined by its values on the determining class in question. The following proposition gives two simple results on determining classes. The first is a consequence of the representation of any element in a ring of sets as a disjoint union of the sets in any generating semiring; the second can be proved using a monotone class argument and the continuity lemma A1.3.II immediately following.
Proposition A1.3.I.
(a) A finitely additive, real- or complex-valued set function defined on a ring A is determined by its values on any semiring generating A.
(b) A countably additive real- or complex-valued set function defined on a σ-ring S is determined by its values on any ring generating S.
Proposition A1.3.II (Continuity Lemma). Let µ(·) be a finite real- or complex-valued, finitely additive set function defined on a ring A. Then, µ is countably additive on A if and only if for every decreasing sequence {An: n = 1, 2, . . .} of sets with An ↓ ∅, µ(An) → 0.
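The continuity condition of Proposition A1.3.II can be illustrated numerically for length measure on the ring of finite unions of disjoint half-open intervals: the sets An = (0, 1/n] decrease to ∅ and their measures tend to 0. This is an illustration of the necessary condition only, not a proof:

```python
from fractions import Fraction

# Illustration of the continuity lemma for length (Lebesgue) measure on the
# ring of finite unions of disjoint half-open intervals (a, b]: the sets
# A_n = (0, 1/n] decrease to the empty set, and mu(A_n) -> 0, as countable
# additivity requires.
def mu(intervals):
    # total length of a finite union of disjoint half-open intervals
    return sum(b - a for a, b in intervals)

A = lambda n: [(Fraction(0), Fraction(1, n))]   # the set A_n = (0, 1/n]

values = [mu(A(n)) for n in range(1, 6)]
assert values == [Fraction(1), Fraction(1, 2), Fraction(1, 3),
                  Fraction(1, 4), Fraction(1, 5)]
assert mu(A(10**6)) == Fraction(1, 10**6)       # mu(A_n) -> 0
```

A purely finitely additive set function can fail this condition, which is precisely how countable additivity can break down.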
So far, we have assumed that the set functions take finite values on all the sets for which they are defined. It is frequently convenient to allow a nonnegative set function to take the value +∞; this leads to few ambiguities and simplifies many statements. We then say that a finitely additive set function ξ(·) defined on an algebra or σ-algebra A is totally finite if, for all unions of disjoint sets A1, . . . , AN in A, there exists M < ∞ such that
Σ_{i=1}^N |ξ(Ai)| ≤ M.
In particular, a nonnegative, additive set function µ is totally finite if and only if µ(X) < ∞. A finitely additive set function is σ-finite if there exists a sequence of sets {An: n = 1, 2, . . .} in A such that X ⊆ ⋃_{n=1}^∞ An and for each n the restriction of ξ to An, defined by the equation
ξ̂(A) = ξ(A ∩ An)   (A ∈ A),
is totally finite, a situation we describe more briefly by saying that ξ is totally finite on each An. The continuity lemma extends to σ-finite set functions with the proviso that we consider only sequences for which |µ(An)| < ∞ for some n < ∞. (This simple condition, extending the validity of Proposition A1.3.II to σ-finite set functions, fails in the general case, however, and it is then better to refer to continuity from below.)
We state next the basic extension theorem used to establish the existence of measures on σ-rings. Note that it follows from Proposition A1.3.I that when such an extension exists, it must be unique.
Theorem A1.3.III (Extension Theorem). A finitely additive, nonnegative set function defined on a ring R can be extended to a measure on σ(R) if and only if it is countably additive on R.
As an example of the use of the theorem, we cite the well-known result that a right-continuous monotonically increasing function F(·) on R can be used to define a measure on the Borel sets of R (the sets in the smallest σ-ring containing the intervals) through the following sequence of steps.
(i) Define a nonnegative set function on the semiring of half-open intervals (a, b] by setting µF(a, b] = F(b) − F(a).
(ii) Extend µF by additivity to all sets in the ring generated by such intervals (this ring consists, in fact, of all finite disjoint unions of such half-open intervals).
(iii) Establish countable additivity on this ring by appealing to compactness properties of finite closed intervals.
(iv) Use the extension theorem to assert the existence of a measure extending the definition of µF to the σ-ring generated by the half-open intervals, that is, the Borel sets.
The intrusion of the topological notion of compactness into this otherwise measure-theoretic sequence is a reminder that in most applications there is a
close link between open and measurable sets. Generalizing the corresponding concept for the real line, the Borel sets in a topological space are the sets in the smallest σ-ring (necessarily a σ-algebra) BX containing the open sets. A Borel measure is any measure defined on the Borel sets. The properties of such measures when X is a c.s.m.s. are explored in Appendix 2.
Returning to the general discussion, we note that no simple generalization of the extension theorem is known for signed measures. However, there is an important result that shows that in some respects the study of signed measures can always be reduced to the study of measures.
Theorem A1.3.IV (Jordan–Hahn Decomposition). Let ξ be a signed measure defined on a σ-algebra S. Then, ξ can be written as the difference ξ = ξ⁺ − ξ⁻ of two measures ξ⁺, ξ⁻ on S, and X can be written as the union of two disjoint sets U⁺, U⁻ in S such that, for all E ∈ S,
ξ⁺(E) = ξ(E ∩ U⁺)   and   ξ⁻(E) = −ξ(E ∩ U⁻),
and hence in particular, ξ⁺(U⁻) = ξ⁻(U⁺) = 0.
The measures ξ⁺ and ξ⁻ appearing in this theorem are called the upper and lower variations of ξ, respectively. The total variation of ξ is their sum
Vξ(A) = ξ⁺(A) + ξ⁻(A).
It is clear from Theorem A1.3.IV that
Vξ(A) = sup_P Σ_{i=1}^{n(P)} |ξ(Ai)|,
where the supremum is taken over all finite partitions P = {A1, . . . , A_{n(P)}} of A into disjoint measurable sets. Thus, ξ is totally bounded if and only if Vξ(X) < ∞. In this case, Vξ(A) acts as a norm on the space of totally bounded signed measures ξ on S; it is referred to as the variation norm and sometimes written Vξ(X) = ‖ξ‖.
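On a finite space the Jordan–Hahn decomposition and the total variation can be computed explicitly: take U⁺ to be the set of points with positive mass. The example below is a sketch with data of our own choosing:

```python
# Sketch of the Jordan-Hahn decomposition for a signed measure on a finite
# space: U+ is the set of points with positive mass, U- its complement;
# then xi+(E) = xi(E & U+), xi-(E) = -xi(E & U-), and the total variation
# of a set is xi+(E) + xi-(E).
mass = {'a': 2.0, 'b': -1.5, 'c': 0.5, 'd': -0.25}   # signed point masses

def xi(E):
    return sum(mass[x] for x in E)

U_plus = {x for x, m in mass.items() if m > 0}
U_minus = set(mass) - U_plus

def xi_plus(E):  return xi(set(E) & U_plus)     # upper variation
def xi_minus(E): return -xi(set(E) & U_minus)   # lower variation
def V(E):        return xi_plus(E) + xi_minus(E)  # total variation

E = {'a', 'b', 'c'}
assert xi(E) == xi_plus(E) - xi_minus(E) == 1.0
assert V(set(mass)) == 4.25                      # the variation norm of xi
assert xi_plus(U_minus) == 0.0 and xi_minus(U_plus) == 0.0
```

The last assertion is exactly the conclusion ξ⁺(U⁻) = ξ⁻(U⁺) = 0 of Theorem A1.3.IV.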
A1.4. Measurable Functions and Integrals
A measurable space is a pair (X, F), where X is the space and F a σ-ring of sets defined on it. A mapping f from a measurable space (X, F) into a measurable space (Y, G) is G-measurable (or measurable for short) if, for all A ∈ G, f⁻¹(A) ∈ F. Note that the inverse images in X of sets in G form a σ-ring H = f⁻¹(G), say, and the requirement for measurability is that H ⊆ F. By specializing to the case where Y is the real line R with G the σ-algebra of Borel sets generated by the intervals, BR, the criterion for measurability simplifies as follows.
Proposition A1.4.I. A real-valued function f: (X, F) → (R, BR) is Borel measurable if and only if the set {x: f(x) ≤ c} is a set in F for every real c.
The family of real-valued (Borel) measurable functions on a measurable space (X, F) has many striking properties. It is closed under the operations of addition, subtraction, multiplication, and (with due attention to zeros) division. Moreover, any monotone limit of measurable functions is measurable. If X is a topological space and F the Borel σ-field on X, then every continuous function on X is measurable.
The next proposition provides an important approximation result for measurable functions. Here a simple function is a finite linear combination of indicator functions of measurable sets; that is, a function of the form s(x) = Σ_{k=1}^N ck I_{Ak}(x), where c1, . . . , cN are real and A1, . . . , AN are measurable sets.
Proposition A1.4.II. A nonnegative function f: (X, F) → (R⁺, B_{R⁺}) is measurable if and only if it can be represented as the limit of a monotonically increasing sequence of simple functions.
Now let µ be a measure on F. We call the triple (X, F, µ) a finite or σ-finite measure space according to whether µ has the corresponding property; in the special case of a probability space, when µ has total mass unity, the triple is more usually written (Ω, E, P), where the sets of the σ-algebra E are interpreted as events, a measurable function on (Ω, E) is a random variable, and P is a probability measure.
We turn to the problem of defining an integral (or in the probability case an expectation) with respect to the measure µ. If s = Σ_{k=1}^N ck I_{Ak} is a nonnegative simple function, set
∫_X s(x) µ(dx) = ∫_X s dµ = Σ_{k=1}^N ck µ(Ak),
where we allow +∞ as a possible value of the integral. Next, for any nonnegative measurable function f and any sequence of simple functions {sn} approximating f from below, set
∫_X f dµ = lim_{n→∞} ∫_X sn dµ
and prove that the limit is independent of the particular sequence of simple functions used. Finally, for any measurable function f, write
f⁺(x) = max(f(x), 0),   f⁻(x) = f⁺(x) − f(x),
and if ∫_X f⁺ dµ and ∫_X f⁻ dµ are both finite (equivalently, ∫_X |f| dµ is finite), say that f is integrable and then define, for any integrable function f,
∫_X f dµ = ∫_X f⁺ dµ − ∫_X f⁻ dµ.
The resulting abstract Lebesgue integral is well defined, additive, linear, order-preserving, and enjoys strikingly elegant continuity properties. These
last are set out in the theorem below, where we say fn → f µ-almost everywhere (µ-a.e., or a.e. µ) if the (necessarily measurable) set on which fn(x) fails to converge to f(x) has µ-measure zero. In the probability case, we refer to almost sure (a.s.) rather than a.e. convergence.
Theorem A1.4.III (Lebesgue Convergence Theorems). The following results hold for a sequence of measurable functions {fn: n = 1, 2, . . .} defined on the measure space (X, F, µ):
(a) (Fatou's Lemma) If fn ≥ 0,
∫_X lim inf_{n→∞} fn(x) µ(dx) ≤ lim inf_{n→∞} ∫_X fn(x) µ(dx).
(b) (Monotone Convergence Theorem) If fn ≥ 0 and fn ↑ f µ-a.e., then f is measurable and
lim_{n→∞} ∫_X fn dµ = ∫_X f dµ
in the sense that either both sides are finite, and then equal, or both are infinite.
(c) (Dominated Convergence Theorem) If |fn(x)| ≤ g(x) where g(·) is integrable, and fn → f µ-a.e., then
lim_{n→∞} ∫_X fn dµ = ∫_X f dµ.
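The monotone convergence theorem can be watched in action on a finite measure space, where integrals reduce to sums. The sketch below (our own example, with counting measure on ten points and the truncations fn = min(f, n) increasing to f) is an illustration, not a proof:

```python
# Numerical illustration of the monotone convergence theorem on the measure
# space ({0,...,9}, all subsets, counting measure): the truncations
# f_n = min(f, n) increase pointwise to f, and their integrals (here, sums)
# increase to the integral of f.
X = range(10)

def integral(f):
    return sum(f(x) for x in X)            # integral against counting measure

f = lambda x: x * x
f_n = lambda n: (lambda x: min(f(x), n))   # f_n increases to f as n grows

vals = [integral(f_n(n)) for n in (1, 10, 100)]
assert vals[0] <= vals[1] <= vals[2]       # integrals increase with n
assert integral(f_n(81)) == integral(f) == 285   # and attain the limit
```

Fatou's lemma holds here with equality, since the truncations converge monotonically; strict inequality in Fatou requires mass escaping, e.g. indicator functions of moving sets.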
If f is an integrable function, the indefinite integral of f over any measurable subset A can be defined by
ξf(A) =def ∫_A f dµ = ∫_X I_A f dµ,
where I_A is the indicator function of A. It is clear that ξf is totally finite and finitely additive on S. Moreover, it follows from the dominated convergence theorem that if An ∈ S and An ↓ ∅, then I_{An} f → 0 and hence ξf(An) → 0. Thus, ξf is also countably additive; that is, a signed measure on S.
This raises the question of which signed measures can be represented as indefinite integrals with respect to a given µ. The essential feature is that the ξ-measure of a set should tend to zero with the µ-measure. More specifically, ξ is absolutely continuous with respect to µ whenever µ(A) = 0 implies ξ(A) = 0; we then have the following theorem.
Theorem A1.4.IV (Radon–Nikodym Theorem). Let (X, F, µ) be a σ-finite measure space and ξ a totally finite measure or signed measure on F. Then, there exists a measurable integrable function f such that
ξ(A) = ∫_A f(x) µ(dx)   (all A ∈ F)   (A1.4.1)
if and only if ξ is absolutely continuous with respect to µ; moreover, f is a.e. uniquely determined by (A1.4.1), in the sense that any two functions satisfying (A1.4.1) for all A ∈ F must be equal µ-a.e.
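On a finite space carried by atoms of positive µ-mass, the Radon–Nikodym derivative is simply the ratio of point masses, and integrating it against µ recovers ξ. The data below are our own illustrative choices:

```python
# Sketch of the Radon-Nikodym theorem on a finite measure space: when every
# point has mu-mass > 0, the derivative f = d(xi)/d(mu) is the ratio of the
# point masses, and integrating f against mu recovers xi.
mu = {'a': 0.5, 'b': 0.25, 'c': 0.25}
xi = {'a': 1.0, 'b': 0.5, 'c': 0.0}       # absolutely continuous w.r.t. mu

f = {x: xi[x] / mu[x] for x in mu}        # the Radon-Nikodym derivative

def xi_from_f(A):
    # integral of f over A against mu, as in (A1.4.1)
    return sum(f[x] * mu[x] for x in A)

assert xi_from_f({'a', 'b'}) == xi['a'] + xi['b'] == 1.5
assert f == {'a': 2.0, 'b': 2.0, 'c': 0.0}
```

If some point carried ξ-mass but zero µ-mass, absolute continuity would fail and no such f could exist, in line with the theorem.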
The function f appearing in (A1.4.1) is usually referred to as a Radon–Nikodym derivative of ξ with respect to µ, written dξ/dµ. Lemma A1.6.III below shows one way in which the Radon–Nikodym derivative can be expressed as a limiting ratio. There is an obvious extension of Theorem A1.4.IV to the case where ξ is σ-finite; in this extension, (A1.4.1) holds for subsets A of any member of the denumerable family of measurable sets on which ξ is totally finite.
Finally, we consider the relation between a fixed σ-finite measure µ and an arbitrary σ-finite signed measure ξ. ξ is said to be singular with respect to µ if there is a set E in F such that µ(E) = 0 and for all A ∈ F, ξ(A) = ξ(E ∩ A), so that also ξ(E^c) = 0 and ξ(A) = ξ(A ∩ E). We then have the following theorem.
Theorem A1.4.V (Lebesgue Decomposition Theorem). Let (X, F, µ) be a σ-finite measure space and ξ(·) a finite or σ-finite signed measure on F. Then, there exists a unique decomposition of ξ,
ξ = ξs + ξac,
into components that are, respectively, singular and absolutely continuous with respect to µ.
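On a finite space the Lebesgue decomposition is immediate: split ξ into the part carried by the µ-null points (the singular component) and the rest (the absolutely continuous component). A sketch with data of our own choosing:

```python
# Sketch of the Lebesgue decomposition on a finite space: xi_s is the part
# of xi carried by the mu-null set E, and xi_ac the remainder, which is
# absolutely continuous with respect to mu.
mu = {'a': 0.5, 'b': 0.5, 'c': 0.0}
xi = {'a': 1.0, 'b': 0.0, 'c': 2.0}

E = {x for x in mu if mu[x] == 0}                       # mu-null carrier set
xi_s = {x: (xi[x] if x in E else 0.0) for x in xi}      # singular part
xi_ac = {x: (0.0 if x in E else xi[x]) for x in xi}     # abs. continuous part

assert all(xi[x] == xi_s[x] + xi_ac[x] for x in xi)     # xi = xi_s + xi_ac
assert sum(mu[x] for x in E) == 0.0                     # xi_s lives on a mu-null set
assert all(xi_ac[x] == 0.0 for x in xi if mu[x] == 0)   # absolute continuity
```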
A1.5. Product Spaces
If X, Y are two spaces, the Cartesian product X × Y is the set of ordered pairs {(x, y): x ∈ X, y ∈ Y}. If X and Y are either topological or measure spaces, there is a natural way of combining the original structures to produce a structure in the product space.
Consider first the topological case. If U, V are neighbourhoods of the points x ∈ X, y ∈ Y with respect to topologies U, V, define a neighbourhood of the pair (x, y) as the product set U × V. The class of product sets of this kind is closed under finite intersections because
(U × V) ∩ (A × B) = (U ∩ A) × (V ∩ B).
It can therefore be taken as the basis of a topology in X × Y; it is called the product topology and denoted X ⊗ Y [we follow e.g. Brémaud (1981) in using a distinctive product sign as a reminder that the product entity here is generated by the elements of the factors]. Most properties enjoyed by the component (or coordinate) topologies are passed on to the product topology. In particular, if X, Y are both c.s.m.s.s, then X × Y is also a c.s.m.s. with respect to any one of a number of equivalent metrics, of which perhaps the simplest is
ρ((x, y), (u, v)) = max(ρX(x, u), ρY(y, v)).
More generally, if {Xt: t ∈ T} is a family of spaces, the Cartesian product
X = ×_{t∈T} Xt
may be defined as the set of all functions x: T → ⋃_t Xt such that x(t) ∈ Xt.
A cylinder set in this space is a set in which restrictions are placed on a finite subset of the coordinates, on x(t1), . . . , x(tN), say, the values of the other coordinates being unrestricted in their appropriate spaces. A family of basic open sets in X can be defined by choosing open sets {Ui ⊆ X_{ti}, i = 1, . . . , N} and requiring x(ti) ∈ Ui, i = 1, . . . , N. The topology generated by the class of cylinder sets of this form is called the product topology in X. A remarkable property of this topology is that if the coordinate spaces Xt are individually compact in their respective topologies, then X is compact in the product topology. On the other hand, if the individual Xt are metric spaces, there are again many ways in which X can be made into a metric space [e.g. by using the supremum of the distances ρt(x(t), y(t))], but the topologies they generate are not in general equivalent among themselves nor to the product topology defined earlier.
Turning now to the measure context, let (X, F, µ) and (Y, G, ν) be two measure spaces. The product σ-ring F ⊗ G is the σ-ring generated by the semiring of measurable rectangles A × B with A ∈ F, B ∈ G. The product measure µ × ν is the extension to the σ-ring of the countably additive set function defined on such rectangles by
(µ × ν)(A × B) = µ(A) ν(B)
and extended by additivity to the ring of all finite disjoint unions of such rectangles. If µ, ν are both finite, then so is µ × ν; similarly, if µ, ν are σ-finite, so is µ × ν. The product measurable space is the space (X × Y, F ⊗ G), and the product measure space is the space (X × Y, F ⊗ G, µ × ν). All the definitions extend easily to the products of finite families of measure spaces. In the probability context, they form the natural framework for the discussion of independence.
In the context of integration theory, the most important results pertain to the evaluation of double integrals, the question we take up next. Let H = F ⊗ G and π = µ × ν. If C is H-measurable, its sections
Cx = {y: (x, y) ∈ C},   C^y = {x: (x, y) ∈ C}
are, respectively, G-measurable for each fixed x and F-measurable for each fixed y. (The converse to this result, that a set whose sections are measurable is H-measurable, is false, however.) Similarly, if f(x, y) is H-measurable, then regarded as a function of y, it is G-measurable for each fixed x, and regarded as a function of x, it is F-measurable for each fixed y.
Introducing integrals with respect to µ, ν, write
s(x) = ∫_Y f(x, y) ν(dy) if the integrand is ν-integrable, and s(x) = +∞ otherwise;
t(y) = ∫_X f(x, y) µ(dx) if the integrand is µ-integrable, and t(y) = +∞ otherwise.
We then have the following theorem.
Theorem A1.5.I (Fubini’s Theorem). Let (X, F, µ) and (Y, G, ν) be σ-finite measure spaces, and let (Z, H, π) denote the product measure space.
(a) If f is H-measurable and π-integrable, then s(x) is F-measurable and µ-integrable, t(y) is G-measurable and ν-integrable, and
∫_Z f dπ = ∫_X s dµ = ∫_Y t dν.
(b) If f is H-measurable and f ≥ 0, it is necessary and sufficient for f to be π-integrable that either s be µ-integrable or t be ν-integrable.
Not all the important measures on a product space are product measures; in the probability context, in particular, it is necessary to study general bivariate probability measures and their relations to the marginal and conditional measures they induce. Thus, if π is a probability measure on (X × Y, F ⊗ G), we define the marginal probability measures πX and πY to be the projections of π onto (X, F) and (Y, G), respectively; i.e. the measures defined by
πX(A) = π(A × Y)   and   πY(B) = π(X × B).
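On a finite product space Fubini's theorem reduces to exchanging the order of two finite sums, which can be checked exactly with rational arithmetic. The measures and integrand below are our own illustrative choices:

```python
from fractions import Fraction

# Check of Fubini's theorem on a finite product space: for the product
# measure pi = mu x nu, the double integral of f can be evaluated in
# either order (integrate y first, then x, or the reverse).
mu = {0: Fraction(1, 2), 1: Fraction(1, 2)}
nu = {0: Fraction(1, 3), 1: Fraction(2, 3)}
f = lambda x, y: x + 2 * y

s = lambda x: sum(f(x, y) * nu[y] for y in nu)   # s(x) = integral over Y
t = lambda y: sum(f(x, y) * mu[x] for x in mu)   # t(y) = integral over X

int_xy = sum(s(x) * mu[x] for x in mu)           # integral of s against mu
int_yx = sum(t(y) * nu[y] for y in nu)           # integral of t against nu

assert int_xy == int_yx == Fraction(11, 6)
```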
We next investigate the possibility of writing a measure on the product space as an integral (or a mixture of conditional probabilities), say
π(A × B) = ∫_A Q(B | x) πX(dx),   (A1.5.1)
where Q(B | x) may be regarded as the conditional probability of observing the event B given the occurrence of x. Such a family is also known as a disintegration of π. Proposition A1.5.II. Given a family {Q(· | x): x ∈ X } of probability measures on (Y, G) and a probability measure πX on (X , F), the necessary and sufficient condition that (A1.5.1) should define a probability measure on the product space (Z, H) is that, as a function of x, Q(B | x) be F-measurable for each fixed B ∈ G. When this condition is satisfied, for every H-measurable, nonnegative function f (·, ·),
∫_Z f dπ = ∫_X πX(dx) ∫_Y f(x, y) Q(dy | x).   (A1.5.2)
Indeed, the integral in (A1.5.1) is not defined unless Q(B | ·) is F-measurable. When it is, the right-hand side of (A1.5.2) can be extended to a finitely additive set function on the ring of finite unions of disjoint rectangle sets. Countable additivity and the extension to a measure for which (A1.5.2) holds then follow along standard lines using monotone approximation arguments.
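The disintegration (A1.5.1) and the mixture it defines can be computed exactly on finite spaces. The marginal and kernel below are our own illustrative choices:

```python
from fractions import Fraction

# Sketch of the disintegration (A1.5.1) on finite spaces: a bivariate
# probability pi built from a marginal pi_X and a kernel Q(. | x); summing
# the kernel against pi_X recovers pi and its Y-marginal (the mixture).
pi_X = {0: Fraction(1, 4), 1: Fraction(3, 4)}
Q = {0: {0: Fraction(1, 2), 1: Fraction(1, 2)},   # Q(. | x = 0)
     1: {0: Fraction(1, 3), 1: Fraction(2, 3)}}   # Q(. | x = 1)

def pi(A, B):
    # pi(A x B) = sum over x in A of Q(B | x) pi_X({x}), as in (A1.5.1)
    return sum(sum(Q[x][y] for y in B) * pi_X[x] for x in A)

# the mixture of Q(. | x) with respect to pi_X is the Y-marginal of pi
pi_Y = {y: pi(pi_X.keys(), {y}) for y in (0, 1)}

assert pi({0, 1}, {0, 1}) == 1                    # pi is a probability measure
assert pi_Y[0] == Fraction(1, 4) * Fraction(1, 2) + Fraction(3, 4) * Fraction(1, 3)
```

Each Q(· | x) here is a probability measure and Q(B | ·) is trivially measurable, so the condition of Proposition A1.5.II is satisfied.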
The projection of π onto the space (Y, G), i.e. the measure defined by
πY(B) = ∫_X Q(B | x) πX(dx),
is known as the mixture of Q(· | x) with respect to πX.
The converse problem, of establishing the existence of a family of measures satisfying (A1.5.1) from a given measure and its marginal, is a special case of the problem of regular conditional probabilities (see e.g. Ash, 1972, Section 6.6). For any fixed B ∈ G, π(· × B) may be regarded as a measure on (X, F) that is clearly absolutely continuous with respect to the marginal πX. Hence, there exists a Radon–Nikodym derivative, QR(B | x) say, that is F-measurable, satisfies (A1.5.1), and should therefore be a candidate for the disintegration of π. The difficulty is that we can guarantee the behaviour of QR only for fixed sets B, and it is not clear whether, for x fixed and B varying, the family QR(B | x) will have the additivity and continuity properties of a measure. If {A1, . . . , AN} is a fixed family of disjoint sets in G or if {Bn: n ≥ 1} is a fixed sequence in G with Bn ↓ ∅, then it is not difficult to show that
QR(⋃_{i=1}^N Ai | x) = Σ_{i=1}^N QR(Ai | x)   πX-a.e.,
QR(Bn | x) → 0 (n → ∞)   πX-a.e.,
respectively, but because there are uncountably many such relations to be checked, it is not obvious that the exceptional sets of measure zero can be combined into a single such set. The problem, in fact, is formally identical to establishing the existence of random measures and is developed further in Chapter 9. The following result is a partial converse to Proposition A1.5.II.
Proposition A1.5.III (Existence of Regular Conditional Probabilities). Let (Y, G) be a c.s.m.s. with its associated σ-algebra of Borel sets, (X, F) an arbitrary measurable space, and π a probability measure on the product space (Z, H). Then, with πX(A) = π(A × Y) for all A ∈ F, there exists a family of kernels Q(B | x) such that
(i) Q(· | x) is a probability measure on G for each fixed x ∈ X;
(ii) Q(B | ·) is an F-measurable function on X for each fixed B ∈ G; and
(iii) π(A × B) = ∫_A Q(B | x) πX(dx) for all A ∈ F and B ∈ G.
We consider finally the product of a general family of measurable spaces, {(Xt, Ft): t ∈ T}, where T is an arbitrary (finite, countable, or uncountable) indexing set. Once again, the cylinder sets play a basic role. A measurable cylinder set in X = ×_{t∈T} Xt is a set of the form
C(t1, . . . , tN; B1, . . . , BN) = {x: x(ti) ∈ Bi, i = 1, . . . , N},
where Bi ∈ F_{ti} for each i = 1, . . . , N. Such sets form a semiring, their finite disjoint unions form a ring, and the generated σ-ring we denote by
F∞ = ⊗_{t∈T} Ft.
This construction can be used to define a product measure on F∞, but greater interest centres on the extension problem: given a system of measures π(σ) defined on finite subfamilies F(σ) = F_{t1} ⊗ F_{t2} ⊗ · · · ⊗ F_{tN}, where (σ) = {t1, . . . , tN} is a finite selection of indices from T, when can they be extended to a measure on F∞? It follows from the extension theorem A1.3.III that the necessary and sufficient condition for this to be possible is that the given measures must give rise to a countably additive set function on the ring generated by the measurable cylinder sets. As with the previous result, countable additivity cannot be established without some additional assumptions; again it is convenient to put these in topological form by requiring each of the Xt to be a c.s.m.s. Countable additivity then follows by a variant of the usual compactness argument, and the only remaining requirement is that the given measures should satisfy the obviously necessary consistency conditions stated in the theorem below.
Theorem A1.5.IV (Kolmogorov Extension Theorem). Let T be an arbitrary index set, and for t ∈ T suppose (Xt, Ft) is a c.s.m.s. with its associated Borel σ-algebra. Suppose further that for each finite subfamily (σ) = {t1, . . . , tN} of indices from T, there is given a probability measure π(σ) on F(σ) = F_{t1} ⊗ · · · ⊗ F_{tN}. In order that there exist a measure π on F∞ such that for all (σ), π(σ) is the projection of π onto F(σ), it is necessary and sufficient that for all (σ), (σ1), (σ2),
(i) π(σ) depends only on the choice of indices in (σ), not on the order in which they are written down; and
(ii) if (σ1) ⊆ (σ2), then π(σ1) is the projection of π(σ2) onto F(σ1).
Written out more explicitly in terms of distribution functions, condition (i) becomes (in an obvious notation) the condition of invariance under simultaneous permutations: if p1, . . . , pN is a permutation of the integers 1, . . . , N, then
F^{(N)}_{t1,...,tN}(x1, . . . , xN) = F^{(N)}_{t_{p1},...,t_{pN}}(x_{p1}, . . . , x_{pN}).
Similarly, condition (ii) becomes the condition of consistency of marginal distributions, namely that (N +k)
(N )
Ft1 ,...,tN ,s1 ,...,sk (x1 , . . . , xN , ∞, . . . , ∞) = Ft1 ,...,tN (x1 , . . . , xN ). The measure π induced on F∞ by the fidi distributions is called their projective limit. Clearly, if stochastic processes have the same fidi distributions, they must also have the same projective limit. Such processes may be described as being equivalent or versions of one another. See Parthasarathy (1967, Sections 5.1–5) for discussion of Theorem A1.5.IV in a slightly more general form and for proof and further details.
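The two consistency conditions can be checked mechanically in the simplest case. The sketch below is hypothetical code, not from the text: it takes the fidi distributions of an i.i.d. family with a standard exponential marginal (an assumed choice), for which F_{t1,...,tN}(x1, . . . , xN) = Π_i F(xi), so that both conditions hold by construction.

```python
# Sketch (hypothetical example): the fidi distributions of an i.i.d. process,
# F_{t1,...,tN}(x1,...,xN) = prod_i F(xi), satisfy Kolmogorov's two
# consistency conditions.  Here F is the Exp(1) c.d.f. (an assumption).
from math import exp, inf, prod

def F(x):
    return 1.0 - exp(-x) if x > 0 else 0.0

def fidi(ts, xs):
    """F_{t1,...,tN}(x1,...,xN) for the i.i.d. family; ts is unused because
    the fidi distributions do not depend on the index points at all."""
    return prod(F(x) for x in xs)

ts, xs = [0.5, 1.7, 3.0], [0.2, 1.0, 2.5]

# (i) invariance under simultaneous permutation of indices and arguments:
perm = [2, 0, 1]
assert abs(fidi(ts, xs)
           - fidi([ts[p] for p in perm], [xs[p] for p in perm])) < 1e-12

# (ii) consistency of marginals: setting the extra argument to infinity
# projects the (N+1)-dimensional distribution down to the N-dimensional one.
assert abs(fidi(ts + [9.9], xs + [inf]) - fidi(ts, xs)) < 1e-12
```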
APPENDIX 1. Some Basic Topology and Measure Theory Concepts
A1.6. Dissecting Systems and Atomic Measures
The notion of a dissecting system in Definition A1.6.I depends only on topological ideas of separation and distinguishing one point from another by means of distinct sets, though we use it mainly in the context of a metric space where its development is simpler.
Definition A1.6.I (Dissecting System). The sequence T = {Tn} of finite partitions Tn = {Ani: i = 1, . . . , kn} (n = 1, 2, . . .) consisting of Borel sets in the space X is a dissecting system for X when
(i) (partition properties) Ani ∩ Anj = ∅ for i ≠ j and An1 ∪ · · · ∪ Ankn = X;
(ii) (nesting property) An−1,i ∩ Anj = Anj or ∅; and
(iii) (point-separating property) given distinct x, y ∈ X, there exists an integer n = n(x, y) such that x ∈ Ani implies y ∉ Ani.
Given a dissecting system T for X, properties (i) and (ii) of Definition A1.6.I imply that there is a well-defined nested sequence {Tn(x)} ⊂ T such that
⋂_{n=1}^∞ Tn(x) = {x},  so  µ(Tn(x)) → µ{x}  (n → ∞)
because µ is a measure and {Tn(x)} is a monotone sequence. Call x ∈ X an atom of µ if µ({x}) ≡ µ{x} > 0. It follows that x is an atom of µ if and only if µ(Tn(x)) > ε (all n) for some ε > 0; indeed, any ε in 0 < ε ≤ µ{x} will do. We use δx(·) to denote Dirac measure at x, defined on Borel sets A by
δx(A) = 1 if x ∈ A,  and  δx(A) = 0 otherwise.
More generally, an atom of a measure µ on a measurable space (X, F) is any set F ∈ F with µ(F) > 0 such that, if G ∈ F and G ⊆ F, then either µ(G) = 0 or µ(G) = µ(F). However, when X is a separable metric space, it is a consequence of Proposition A2.1.IV below that the only possible atoms of a measure µ on (X, F) are singleton sets. A measure with only atoms is purely atomic; a diffuse measure has no atoms. Given ε > 0, we can identify all atoms of µ of mass µ{x} ≥ ε, and then, using a sequence {εj} with εj ↓ 0 as j → ∞, all atoms of µ can be identified. Because µ is σ-finite, it can have at most countably many atoms, so identifying them as {xj: j = 1, 2, . . .}, say, and writing bj = µ{xj}, the measure
µa(·) ≡ Σ_{j=1}^∞ bj δxj(·),
which clearly consists only of atoms, is the atomic component of the measure µ. The measure
µd(·) ≡ µ(·) − µa(·) = µ(·) − Σ_{j=1}^∞ bj δxj(·)
has no atoms and is the diffuse component of µ. Thus, any measure µ as above has a unique decomposition into atomic and diffuse components.
Lemma A1.6.II. Let µ be a nonatomic measure and {Tn} a dissecting system for a set A with µ(A) < ∞. Then εn ≡ sup_i µ(Ani) → 0 as n → ∞.
Proof. Suppose not. Then there exist δ > 0 and, for each n, some set An,in, say, with An,in ∈ Tn and µ(An,in) > δ. Because Tn is a dissecting system, the nesting implies that there exists An−1,in−1 ∈ Tn−1 containing An,in, so µ(An−1,in−1) > δ. Consequently, we can assume there exists a nested sequence of sets An,in for which µ(An,in) > δ, and hence
δ ≤ lim_n µ(An,in) = µ(lim_n An,in),
equality holding here because µ is a measure and {An,in} is monotone. But, because Tn is a dissecting system, lim_n An,in is either empty or a singleton set, {x} say. Thus, the right-hand side is either µ(∅) = 0 or µ({x}) = 0 because µ is nonatomic; that is, δ ≤ 0, which is a contradiction.
Dissecting systems can be used to construct approximations to Radon–Nikodym derivatives as follows (e.g. Chung, 1974, Chapter 9.5, Example VIII).
Lemma A1.6.III (Approximation of Radon–Nikodym Derivative). Let T = {Tn} = {{Ani: i = 1, . . . , kn}} be a nested family of measurable partitions of the measure space (Ω, E, µ), generating E, and let ν be a measure absolutely continuous with respect to µ, with Radon–Nikodym derivative dν/dµ. Define
λn(ω) = Σ_{i=1}^{kn} I_{Ani}(ω) ν(Ani)/µ(Ani)   (ω ∈ Ω).
Then, as n → ∞, λn → dν/dµ, µ-a.e. and in L1(µ) norm.
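As a concrete sketch of Lemma A1.6.III (an illustration under assumed data, not from the text), take Ω = [0, 1) with µ Lebesgue measure, the dyadic partitions as dissecting system, and ν(dx) = 2x dx, so that dν/dµ(x) = 2x; on the dyadic cell [a, b) containing x, the ratio ν(Ani)/µ(Ani) reduces to a + b, which converges to 2x as the cells shrink:

```python
# Sketch of Lemma A1.6.III on ([0,1), Borel, Lebesgue) with ν(dx) = 2x dx
# (assumed example data): the cell ratios converge to dν/dµ(x) = 2x.

def lam(x, n):
    """λn(x) = ν(Ani)/µ(Ani) on the dyadic cell Ani = [a, b) containing x;
    here ν([a,b)) = b**2 - a**2 and µ([a,b)) = b - a, so the ratio is a + b."""
    i = int(x * 2**n)
    a, b = i / 2**n, (i + 1) / 2**n
    return (b**2 - a**2) / (b - a)

x = 0.4
print([lam(x, n) for n in (2, 5, 10)])   # approaches dν/dµ(x) = 2x = 0.8
```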
As a final result involving dissecting systems, given two probability measures P and P0 on (Ω, E), define the relative entropy of the restrictions of P and P0 to a partition T = {Ai} of (Ω, E) by
H(P; P0) = Σ_i P(Ai) log [P(Ai)/P0(Ai)].
Additivity of measures, convexity of x log x on x > 0, and the inequality (a1 + a2)/(b1 + b2) ≤ a1/b1 + a2/b2, valid for nonnegative ar and positive br, r = 1, 2, establish the result below.
Lemma A1.6.IV. Let T1, T2 be measurable partitions of (Ω, E) with T1 ⊆ T2 (i.e. T2 refines T1), and P, P0 two probability measures on (Ω, E). Then, the relative entropies of the restrictions of P, P0 to Tr satisfy H1(P; P0) ≤ H2(P; P0).
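A quick numerical check of Lemma A1.6.IV (a hypothetical example, not from the text): merging cells of the finer partition T2 into the coarser T1 can only decrease the relative entropy.

```python
# Numerical sketch of Lemma A1.6.IV: refining a partition can only increase
# H(P; P0) = sum_i P(Ai) log[P(Ai)/P0(Ai)].  The four-cell example is assumed.
from math import log

def rel_entropy(P, P0):
    return sum(p * log(p / q) for p, q in zip(P, P0) if p > 0)

# T2 has four cells; T1 merges cells (0,1) and cells (2,3) of T2.
P2, Q2 = [0.1, 0.4, 0.2, 0.3], [0.25, 0.25, 0.25, 0.25]
P1 = [P2[0] + P2[1], P2[2] + P2[3]]
Q1 = [Q2[0] + Q2[1], Q2[2] + Q2[3]]

H1, H2 = rel_entropy(P1, Q1), rel_entropy(P2, Q2)
assert H1 <= H2          # the assertion of Lemma A1.6.IV
print(H1, H2)
```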
APPENDIX 2
Measures on Metric Spaces
A2.1. Borel Sets and the Support of Measures
If (X, U) is a topological space, the smallest σ-algebra containing the open sets is called the Borel σ-algebra. If f: X → R is any real-valued continuous function, then the set {x: f(x) < c} is open in U and hence measurable; it follows that f is measurable. Thus, every continuous function is measurable with respect to the Borel σ-algebra.
It is necessary to clarify the relation between the Borel sets and various other candidates for useful σ-algebras that suggest themselves, such as
(a) the Baire sets, belonging to the smallest σ-field with respect to which the continuous functions are measurable;
(b) the Borelian sets, generated by the compact sets in X; and
(c) if X is a metric space, the σ-algebra generated by the open spheres.
We show that, with a minor reservation concerning (b), all three concepts coincide when X is a c.s.m.s. More precisely, we have the following result.
Proposition A2.1.I. Let X be a metric space and U the topology induced by the metric. Then
(i) the Baire sets and the Borel sets coincide;
(ii) if X is separable, then the Borel σ-algebra is the smallest σ-algebra containing the open spheres;
(iii) a Borel set is Borelian if and only if it is σ-compact; that is, if it can be covered by a countable union of compact sets. In particular, the Borel sets and the Borelian sets coincide if and only if the whole space is σ-compact.
Proof. Part (i) depends on Lemma A2.1.II below, of interest in its own right; (ii) depends on the fact that when X is separable, every open set can be represented as a countable union of open spheres; (iii) follows from the fact that all closed subsets of a compact set are compact and hence Borelian.
Lemma A2.1.II. Let F be a closed set in the metric space X, U an open set containing F, and IF(·) the indicator function of F. Then, there exists a sequence of continuous functions {fn(x)} such that
(i) 0 ≤ fn(x) ≤ 1 (x ∈ X);
(ii) fn(x) = 0 outside U;
(iii) fn(x) ↓ IF(x) as n → ∞.
Proof. Let fn(x) = ρ(x, U^c)/[ρ(x, U^c) + 2n ρ(x, F)], where for any set C,
ρ(x, C) = inf_{y∈C} ρ(x, y).
Then, the sequence {fn(x)} has the required properties.
It is clear that in a separable metric space the Borel sets are countably generated. Lemma A2.1.III exhibits a simple example of a countable semiring of open sets generating the Borel sets.
Lemma A2.1.III. Let X be a c.s.m.s., D a countable dense set in X, and S0 the class of all finite intersections of open spheres Sr(d) with centres d ∈ D and rational radii r. Then
(i) S0 and the ring A0 generated by S0 are countable; and
(ii) S0 generates the Borel σ-algebra in X.
It is also a property of the Borel sets in a separable metric space, and of considerable importance in the analysis of sample-path properties of point processes and random measures, that they include a dissecting system as defined in Definition A1.6.I.
Proposition A2.1.IV. Every separable metric space X contains a dissecting system.
Proof. Let {d1, d2, . . .} = D be a separability set for X (i.e. D is a countable dense set in X). Take any pair of distinct points x, y ∈ X; their distance apart equals 2δ ≡ ρ(x, y) > 0. We can then find dm, dn in D such that ρ(dm, x) < δ, ρ(dn, y) < δ, so the spheres Sδ(dm), Sδ(dn), which are Borel sets, certainly separate x and y. We have essentially to embed such separating spheres into a sequence of sets covering the whole space.
For the next part of the proof, it is convenient to identify one particular element in each Tn (it may possibly be a null set for all n sufficiently large) as An0; this entails no loss of generality. Define the initial partition {A1i} by A11 = S1(d1), A10 = X \ A11. Observe that X is covered by the countably infinite sequence {S1(dn)}, so the sequence
of sets {An0} defined by An0 = X \ ⋃_{r=1}^n S1(dr) converges to the null set. For n = 2, 3, . . . and i = 1, . . . , n, define
Bni = S_{1/2^{n−i}}(di),   Bn0 = (⋃_{i=1}^n Bni)^c,
so that {Bni: i = 0, . . . , n} covers X. By setting Cn0 = Bn0, Cn1 = Bn1, and Cni = Bni \ (Bn1 ∪ · · · ∪ Bn,i−1), it is clear that {Cni: i = 0, 1, . . . , n} is a partition of X. Let the family {Ani} consist of all nonempty intersections of the form An−1,j ∩ Cnk, setting in particular An0 = An−1,0 ∩ Cn0, consistent with the earlier definition of An0. Then {{Ani}: n = 1, 2, . . .} clearly consists of nested partitions of X by Borel sets, and only the separation property has to be established.
Take distinct points x, y ∈ X, and write δ = ρ(x, y) as before. Fix the integer r ≥ 0 by 2^{−r} ≤ min(1, δ) < 2^{−r+1}, and locate a separability point dm such that ρ(dm, x) < 2^{−r}. Then x ∈ S_{1/2^r}(dm) = Bm+r,m, and consequently x ∈ Cm+r,j for some j = 1, . . . , m. But by the triangle inequality, for any z ∈ Cm+r,j,
ρ(x, z) < 2 · 2^{−(m+r−j)} ≤ 2 · 2^{−r} ≤ 2δ = ρ(x, y),
so the partition {Cm+r,i}, and hence also {Am+r,j}, separates x and y.
Trivially, if T is a dissecting system for X, the nonempty sets of T ∩ A (in an obvious notation) constitute a dissecting system for any A ∈ BX. If A is also compact, the construction of a dissecting system for A is simplified by applying the Heine–Borel theorem to extract a finite covering of A from the countable covering {S_{2^{−n}}(dr): r = 1, 2, . . .}.
Definition A2.1.V. The ring of sets generated by finitely many intersections and unions of elements of a dissecting system is a dissecting ring.
A2.2. Regular and Tight Measures
In this section, we examine the extent to which the values of a finitely or countably additive set function defined on some class of sets can be approximated by its values on either closed or compact sets.
Definition A2.2.I. (i) A finitely or countably additive, nonnegative set function µ defined on the Borel sets is regular if, given any Borel set A and ε > 0, there exist open and closed sets G and F, respectively, such that F ⊆ A ⊆ G, µ(G − A) < ε, and µ(A − F) < ε.
(ii) It is compact regular if, given any Borel set A and ε > 0, there exists a compact set C such that C ⊆ A and µ(A − C) < ε.
We first establish the following.
Proposition A2.2.II. If X is a metric space, then all totally finite measures on BX are regular.
Proof. Let µ be a totally finite, additive, nonnegative set function defined on BX. Call any A ∈ BX µ-regular if µ(A) can be approximated by the values of µ on open and closed sets in the manner of Definition A2.2.I. The class of µ-regular sets is obviously closed under complementation. It then follows from the inclusion relations
⋃_α Gα − ⋃_α Fα ⊆ ⋃_α (Gα − Fα)   (A2.2.1a)
and
⋂_α Gα − ⋂_α Fα ⊆ ⋃_α (Gα − Fα)   (A2.2.1b)
that the class is an algebra if µ is finitely additive and a σ-algebra if µ is countably additive. In the latter case, the countable union ⋃_α Fα in (A2.2.1a) may not be closed, but we can approximate µ(⋃_α Fα) by µ(⋃_{i=1}^N Fαi) to obtain a set that is closed and has the required properties; similarly, in (A2.2.1b) we can approximate µ(⋂_α Gα) by µ(⋂_{i=1}^N Gαi). Moreover, if µ is σ-additive, the class also contains all closed sets, for if F is closed, the halo sets
F^ε = ⋃_{x∈F} Sε(x) = {x: ρ(x, F) < ε}   (A2.2.2)
form, for a sequence of values of ε tending to zero, a family of open sets with the property F^ε ↓ F; hence, it follows from the continuity lemma A1.3.II that µ(F^ε) → µ(F). In summary, if µ is countably additive, the µ-regular sets form a σ-algebra containing the closed sets, and therefore the class must coincide with the Borel sets themselves. Note that this proof does not require either completeness or separability.
Compact regularity is a corollary of this result and the notion of a tight measure.
Definition A2.2.III (Tightness). A finitely or countably additive set function µ is tight if, given ε > 0, there exists a compact set K such that µ(X − K) is defined and µ(X − K) < ε.
Lemma A2.2.IV. If X is a complete metric space, a Borel measure is compact regular if and only if it is tight.
Proof. Given any Borel set A, it follows from Proposition A2.2.II that there exists a closed set C ⊆ A with µ(A − C) < ε/2. If µ is tight, choose K so that µ(X − K) < ε/2. Then, the set C ∩ K is a closed subset of the compact set K and hence is itself compact; it also satisfies
µ(A − C ∩ K) ≤ µ(A − C) + µ(A − K) < ε,
which establishes the compact regularity of µ. If, conversely, µ is compact regular, tightness follows on taking A = X.
Proposition A2.2.V. If X is a c.s.m.s., every Borel measure µ is tight and hence compact regular.
Proof. Let D be a separability set for X; then for fixed n, ⋃_{d∈D} S_{1/n}(d) = X, and so by the continuity lemma A1.3.II, there is a finite set d1, . . . , dk(n) such that
µ(X − ⋃_{i=1}^{k(n)} S_{1/n}(di)) < ε/2^n.
Now consider K = ⋂_n ⋃_{i=1}^{k(n)} S̄_{1/n}(di), where S̄_{1/n}(di) denotes the corresponding closed sphere. It is not difficult to see that K is closed and totally bounded, and hence compact, by Proposition A1.2.II, and that µ(X − K) < ε. Hence, µ is tight.
The results above establish compact regularity as a necessary condition for a finitely additive set function to be countably additive. The next proposition asserts its sufficiency. The method of proof provides a pattern that is used with minor variations at several important points in the further development of the theory.
Proposition A2.2.VI. Let A be a ring of sets from the c.s.m.s. X and µ a finitely additive, nonnegative set function defined and finite on A. A sufficient condition for µ to be countably additive on A is that, for every A ∈ A and ε > 0, there exists a compact set C ⊆ A such that µ(A − C) < ε.
Proof. Let {An} be a decreasing sequence of sets in A with An ↓ ∅; to establish countable additivity for µ, it is enough to show that µ(An) → 0 for every such sequence. Suppose to the contrary that µ(An) ≥ α > 0. By assumption, there exists for each n a compact set Cn for which Cn ⊆ An and µ(An − Cn) < α/2^{n+1}. By (A2.2.1),
An − ⋂_{k=1}^n Ck ⊆ ⋃_{k=1}^n (Ak − Ck).
Since A is a ring, every finite union ⋃_{k=1}^n (Ak − Ck) is an element of A, so from the finite additivity of µ,
µ(An − ⋂_{k=1}^n Ck) ≤ Σ_{k=1}^n α/2^{k+1} < α/2.
Thus µ(⋂_{k=1}^n Ck) ≥ α/2 > 0, so the intersection ⋂_{k=1}^n Ck is nonempty for each n, and it follows from the finite intersection part of Proposition A1.2.II that ⋂_{k=1}^∞ Ck is nonempty. This gives us the required contradiction to the assumption An ↓ ∅.
Corollary A2.2.VII. A finite, finitely additive, nonnegative set function defined on the Borel sets of X is countably additive if and only if it is compact regular.
We can now prove an extension of Proposition A2.2.VI that plays an important role in developing the existence theorems of Chapter 9. It is based on the notion of a self-approximating ring and is a generalization of the concept of a covering ring given in Kallenberg (1975).
Definition A2.2.VIII (Self-Approximating Ring). A ring A of sets of the c.s.m.s. X is a self-approximating ring if, for every A ∈ A and ε > 0, there exists a sequence of closed sets {Fk(A; ε)} such that
(i) Fk(A; ε) ∈ A (k = 1, 2, . . .);
(ii) each set Fk(A; ε) is contained within a sphere of radius ε; and
(iii) ⋃_{k=1}^∞ Fk(A; ε) = A.
Kallenberg uses the context where X is locally compact, in which case it is possible to require the covering to be finite so that the lemma below effectively reduces to Proposition A2.2.VI. The general version is based on an argument in Harris (1968). The point is that it allows checking for countable additivity to be reduced to a denumerable set of conditions.
Lemma A2.2.IX. Let A be a self-approximating ring of subsets of the c.s.m.s. X and µ a finitely additive, nonnegative set function defined on A. In order that µ have an extension as a measure on σ(A), it is necessary and sufficient that for each A ∈ A and ε > 0, using the notation of Definition A2.2.VIII,
lim_{m→∞} µ(⋃_{i=1}^m Fi(A; ε)) = µ(A).   (A2.2.3)
Proof. Necessity follows from the continuity lemma. We establish sufficiency by contradiction: suppose that µ is finitely additive and satisfies (A2.2.3) but that µ cannot be extended to a measure on σ(A). From the continuity lemma, it again follows that there exist α > 0 and a sequence of sets An ∈ A, with An ↓ ∅, such that
µ(An) ≥ α.   (A2.2.4)
For each k, use (A2.2.3) to choose a set Fk = ⋃_{i=1}^{mk} Fi(Ak; k^{−1}) that is closed, can be covered by a finite number of k^{−1}-spheres, and satisfies µ(Ak − Fk) ≤ α/2^{k+1}. From (A2.2.1), we have Ak − ⋂_{j=1}^k Fj ⊆ ⋃_{j=1}^k (Aj − Fj), which, with the additivity of µ, implies that
µ(⋂_{j=1}^k Fj) ≥ α/2 > 0.
Thus, the sets Fj have the finite intersection property. To show that their complete intersection is nonempty, choose any xk ∈ ⋂_{j=1}^k Fj. Since F1 can be covered by a finite number of 1-spheres, there exists a subsequence {x′k} that is wholly contained within a sphere of radius 1. Turning to F2, we can select a further subsequence {x″k}, which for k ≥ 2 lies wholly within a sphere of radius 1/2. Proceeding in this way by induction, we finally obtain by a diagonal selection argument a subsequence {xkj} such that for j ≥ j0 all terms are contained within a sphere of radius 1/j0. This is enough to show that {xkj} is a Cauchy sequence which, since X is complete, has a limit x̄, say. For each k, the xkj are in ⋂_{n=1}^k Fn for all sufficiently large j. Since the sets are closed, this implies that x̄ ∈ Fk for every k. But this implies also that x̄ ∈ Ak and hence x̄ ∈ ⋂_{k=1}^∞ Ak, which contradicts the assumption that An ↓ ∅. The contradiction shows that (A2.2.4) cannot hold and so completes the proof of the lemma.
Let us observe finally that self-approximating rings do exist. A standard example, which is denumerable and generating as well as self-approximating, is the ring C generated by the closed spheres with rational radii and centres on a countable dense set. To see this, consider the class D of all sets that can be approximated by finite unions of closed sets in C in the sense required by condition (iii) of Definition A2.2.VIII. This class contains all open sets because any open set G can be written as a denumerable union of closed spheres, with their centres at points of the countable dense set lying within G, and rational radii bounded by the nonzero distance from the given point of the countable dense set to the boundary of G. D also contains all closed spheres in C: for example, given ε, choose any positive rational δ < ε, and take the closed spheres with centres at points of the countable dense set lying within the given sphere and having radii δ.
These are all elements of C, and therefore so are their intersections with the given closed sphere. These intersections form a countable family of closed sets satisfying (iii) of Definition A2.2.VIII for the given closed sphere. It is obvious that D is closed under finite unions and that, from the relation
(⋃_{j=1}^∞ Fj) ∩ (⋃_{k=1}^∞ Fk) = ⋃_{j=1}^∞ ⋃_{k=1}^∞ (Fj ∩ Fk),
D is also closed under finite intersections. Since D contains all closed spheres and their complements (which are open sets), D contains C. Thus, every set in C can be approximated by closed spheres in C, so C is self-approximating as required.
A2.3. Weak Convergence of Measures
We make reference to the following notions of convergence of a sequence of measures on a metric space (see Section A1.3 for the definition of the variation norm ∥ · ∥).
Definition A2.3.I. Let {µn: n ≥ 1} and µ be totally finite measures in the metric space X.
(i) µn → µ weakly if ∫ f dµn → ∫ f dµ for all bounded continuous functions f on X.
(ii) µn → µ vaguely if ∫ f dµn → ∫ f dµ for all bounded continuous functions f on X vanishing outside a compact set.
(iii) µn → µ strongly (or in variation norm) if ∥µn − µ∥ → 0.
The last definition corresponds to strong convergence in the Banach space of all totally finite signed measures on X, for which the total variation metric constitutes a genuine norm. The first definition does not correspond exactly to weak convergence in the Banach-space sense, but it reduces to weak star (weak*) convergence when X is compact (say, the unit interval) and the space of signed measures on X can be identified with the adjoint space to the space of all bounded continuous functions on X. Vague convergence is particularly useful in the discussion of locally compact spaces; in our discussion, a somewhat analogous role is played by the notion of weak hash convergence (w#-convergence; see around Proposition A2.6.II below); it is equivalent to vague convergence when the space is locally compact.
Undoubtedly, the central concept for our purposes is the concept of weak convergence. Not only does it lead to a convenient and internally consistent topologization of the space of realizations of a random measure, but it also provides an appropriate framework for discussing the convergence of random measures conceived as probability distributions on this space of realizations. In this section, we give a brief treatment of some basic properties of weak convergence, following closely the discussion in Billingsley (1968), to which we refer for further details.
Theorem A2.3.II. Let X be a metric space and {µn: n ≥ 1} and µ measures on BX. Then, the following statements are equivalent.
(i) µn → µ weakly.
(ii) µn(X) → µ(X) and lim sup_{n→∞} µn(F) ≤ µ(F) for all closed F ∈ BX.
(iii) µn(X) → µ(X) and lim inf_{n→∞} µn(G) ≥ µ(G) for all open G ∈ BX.
(iv) µn(A) → µ(A) for all Borel sets A with µ(∂A) = 0 (i.e. all µ-continuity sets).
Proof. We show that (i) ⇒ (ii) ⇔ (iii) ⇒ (iv) ⇒ (i). Given a closed set F, choose any fixed ν > 0 and construct a [0, 1]-valued continuous function f that equals 1 on F and vanishes outside F^ν [see (A2.2.2) and Lemma A2.1.II]. We have for each n ≥ 1
µn(F) ≤ ∫ f dµn ≤ µn(F^ν),
so if (i) holds,
lim sup_{n→∞} µn(F) ≤ ∫ f dµ ≤ µ(F^ν).
But F^ν ↓ F as ν ↓ 0, and by the continuity Lemma A1.3.II we can choose ν so that, given any ε > 0, µ(F^ν) ≤ µ(F) + ε. Since ε is arbitrary, the second statement in (ii) follows, while the first is trivial if we take f = 1. Taking complements shows that (ii) and (iii) are equivalent.
When A is a µ-continuity set, µ(A°) = µ(Ā), so supposing that (iii) holds, and hence (ii) also, we have on applying (ii) to Ā and (iii) to A° that
lim sup µn(A) ≤ lim sup µn(Ā) ≤ µ(Ā) = µ(A°) ≤ lim inf µn(A°) ≤ lim inf µn(A).
Thus, equality holds throughout and µn(A) → µ(A), so (iv) holds.
Finally, suppose that (iv) holds. Let f be any bounded continuous function on X, and let the bounded interval [α′, α″] be such that α′ < f(x) < α″ for all x ∈ X. Call α ∈ [α′, α″] a regular value of f if µ{x: f(x) = α} = 0. At most a countable number of values can be irregular, while for any α, β that are regular values, {x: α < f(x) ≤ β} is a µ-continuity set. From the boundedness of f on X, given any ε > 0, we can partition [α′, α″] by a finite set of points α0 = α′, . . . , αN = α″ with αi−1 < αi ≤ αi−1 + ε for i = 1, . . . , N, and from the countability of the set of irregular points (if any), we can moreover assume that these αi are all regular points of f. Defining Ai = {x: αi−1 < f(x) ≤ αi} for i = 1, . . . , N and then
fL(x) = Σ_{i=1}^N αi−1 I_{Ai}(x),   fU(x) = Σ_{i=1}^N αi I_{Ai}(x),
each Ai is a µ-continuity set, fL(x) ≤ f(x) ≤ fU(x), and by (iv),
∫ fL dµ = Σ_{i=1}^N αi−1 µ(Ai) = lim_{n→∞} Σ_{i=1}^N αi−1 µn(Ai) = lim_{n→∞} ∫ fL dµn ≤ lim_{n→∞} ∫ fU dµn = ∫ fU dµ,
the extreme terms here differing by at most ε µ(X). Since ε is arbitrary and ∫ fL dµn ≤ ∫ f dµn ≤ ∫ fU dµn, it follows that we must have ∫ f dµn → ∫ f dµ for all bounded continuous f; that is, µn → µ weakly.
Since the functions used in the proof that (i) implies (ii) are uniformly continuous, we can extract from the proof the following useful condition for weak convergence.
Corollary A2.3.III. µn → µ weakly if and only if ∫ f dµn → ∫ f dµ for all bounded and uniformly continuous functions f: X → R.
Billingsley calls a class C of sets with the property that µn(C) → µ(C)
(all C ∈ C) implies µn → µ weakly   (A2.3.1)
a convergence-determining class. In this terminology, (iv) of Theorem A2.3.II asserts that the µ-continuity sets form a convergence-determining class. Any convergence-determining class is necessarily a determining class, but the converse need not be true. In particular circumstances, it may be of considerable importance to find a convergence-determining class that is smaller than the classes in Theorem A2.3.II. While such classes often have to be constructed to take advantage of particular features of the metric space in question, the general result below is also of value. In it, a covering semiring is a semiring with the property that every open set can be represented as a finite or countable union of sets from the semiring. If X is separable, an important example of such a semiring is obtained by first taking the open spheres S_{rj}(dk) with centres at the points {dk} of a countable dense set and radii {rj} forming a countable dense set in (0, 1), then forming finite intersections, and finally taking proper differences.
Proposition A2.3.IV. Any covering semiring, together with the whole space X, forms a convergence-determining class.
Proof. Let G be an open set so that by assumption we have
G = ⋃_{i=1}^∞ Ci   for some Ci ∈ S,
where S is the covering semiring. Since the limit µ in (A2.3.1) is a measure, given ε > 0, we can choose a finite integer K such that
µ(G − ⋃_{i=1}^K Ci) ≤ ε/2,   i.e.   µ(G) ≤ µ(⋃_{i=1}^K Ci) + ε/2.
Further, since S is a semiring, ⋃_{i=1}^K Ci can be represented as a finite union of disjoint sets in S. From (A2.3.1), it therefore follows that there exists N such that, for n ≥ N,
µ(⋃_{i=1}^K Ci) ≤ µn(⋃_{i=1}^K Ci) + ε/2.
Hence,
µ(G) ≤ lim inf_{n→∞} µn(⋃_{i=1}^K Ci) + ε ≤ lim inf_{n→∞} µn(G) + ε.
Since ε is arbitrary, (iii) of Theorem A2.3.II is satisfied, and therefore µn → µ weakly.
We investigate next the preservation of weak convergence under mappings from one metric space into another. Let X, Y be two metric spaces with associated Borel σ-algebras BX, BY, and f a measurable mapping from (X, BX) into (Y, BY) [recall that f is continuous at x if ρY(f(x′), f(x)) → 0 whenever ρX(x′, x) → 0].
Proposition A2.3.V. Let (X, BX), (Y, BY) be metric spaces and f a measurable mapping of (X, BX) into (Y, BY) whose set of discontinuity points is Df. Suppose that µn → µ weakly on X and µ(Df) = 0; then µn f^{−1} → µ f^{−1} weakly.
Proof. Let B be any Borel set in BY and x any point in the closure of f^{−1}(B). For any sequence of points xn ∈ f^{−1}(B) such that xn → x, either x ∈ Df or f(xn) → f(x), in which case x ∈ f^{−1}(B̄). Arguing similarly for the complement,
∂{f^{−1}(B)} ⊆ f^{−1}(∂B) ∪ Df.   (A2.3.2)
Now suppose that µn → µ weakly on BX, and consider the image measures µn f^{−1}, µ f^{−1} on BY. Let B be any continuity set for µ f^{−1}. It follows from (A2.3.2) and the assumptions of the proposition that f^{−1}(B) is a continuity set for µ. Hence, for all such B,
(µn f^{−1})(B) = µn(f^{−1}(B)) → µ(f^{−1}(B)) = (µ f^{−1})(B);
that is, µn f^{−1} → µ f^{−1} weakly.
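A minimal sketch of Proposition A2.3.V in the simplest case µn = δ_{1/n} → µ = δ_0 on R (the functions f and g below are illustrative assumptions, not from the text): when the discontinuity set Df has µ-measure 0, the image measures δ_{f(1/n)} converge weakly to δ_{f(0)}; when µ(Df) > 0, the conclusion can fail.

```python
# Sketch of Proposition A2.3.V with µn = delta_{1/n} → µ = delta_0 on R.
# f and g are assumed examples, not taken from the text.

def f(x):                  # continuous everywhere: Df is empty, µ(Df) = 0
    return x * x + 1.0

def g(x):                  # discontinuous at 0, where µ{0} = 1, so µ(Dg) = 1
    return 0.0 if x <= 0 else 1.0

# For f the conclusion holds: f(1/n) → f(0), so delta_{f(1/n)} → delta_{f(0)}.
print([f(1 / n) for n in (1, 10, 1000)])

# For g it fails: g(1/n) = 1 for every n, yet g(0) = 0.
print([g(1 / n) for n in (1, 10, 1000)], g(0.0))
```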
A2.4. Compactness Criteria for Weak Convergence
In this section, we call a set M of totally finite Borel measures on X relatively compact for weak convergence if every sequence of measures in M contains a weakly convergent subsequence. It is shown in Section A2.5 that weak convergence is equivalent to convergence with respect to a certain metric and that if X is a c.s.m.s., the space of all totally finite Borel measures on X is itself a c.s.m.s. with respect to this metric. We can then appeal to Proposition A1.2.II and conclude that a set of measures is compact (or relatively compact) if and only if it satisfies any of the criteria (i)–(iv) of that proposition. This section establishes the following criterion for compactness.
Theorem A2.4.I (Prohorov's Theorem). Let X be a c.s.m.s. Necessary and sufficient conditions for a set M of totally finite Borel measures on X to be relatively compact for weak convergence are
(i) the total masses µ(X) are uniformly bounded for µ ∈ M; and
(ii) M is uniformly tight—namely, given ε > 0, there exists a compact K such that, for all µ ∈ M,
µ(X − K) < ε.   (A2.4.1)
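Condition (ii) is easily seen to fail when mass escapes to infinity. The sketch below (hypothetical code, not from the text) records this for the family {δn: n ≥ 1} on R, which is uniformly bounded in total mass but not uniformly tight, and indeed has no weakly convergent subsequence.

```python
# Sketch: {delta_n : n >= 1} on R is not uniformly tight.  For any compact
# set inside [-K_max, K_max], delta_n assigns all its mass to the complement
# as soon as n > K_max, so no single compact K works for every member.

def mass_outside(n, K_max):
    """delta_n(R − [−K_max, K_max])."""
    return 0.0 if abs(n) <= K_max else 1.0

K_max = 100.0
print([mass_outside(n, K_max) for n in (1, 100, 101, 10**6)])
```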
Proof. We first establish that the uniform tightness condition is necessary, putting it in the following alternative form.
Lemma A2.4.II. A set M of measures is uniformly tight if and only if, for all ε > 0 and δ > 0, there exists a finite family of δ-spheres (i.e. of radius δ) S1, . . . , SN such that
µ(X − ⋃_{k=1}^N Sk) ≤ ε   (all µ ∈ M).   (A2.4.2)
Proof of Lemma. If the condition holds, we can find, for every k = 1, 2, . . . , a finite union Ak of spheres of radius 1/k such that µ(X − Ak) ≤ ε/2^k for all µ ∈ M. Then, the set K = ⋂_{k=1}^∞ Ak is totally bounded and hence compact, and for every µ ∈ M,
µ(X − K) ≤ Σ_{k=1}^∞ µ(X − Ak) < ε.
Thus, M is uniformly tight. Conversely, if M is uniformly tight and, given ε, we choose a compact K to satisfy (A2.4.1), then for any δ > 0, K can be covered by a finite set of δ-spheres, so (A2.4.2) holds.
Returning now to the main theorem, suppose if possible that M is relatively compact but (A2.4.2) fails for some ε > 0 and δ > 0. Since we assume X is separable, we can write X = ⋃_{k=1}^∞ Sk, where each Sk is a δ-sphere. On the other hand, for every finite n, we can find a measure µn ∈ M such that
µn(X − ⋃_{k=1}^n Sk) ≥ ε.   (A2.4.3a)
If in fact M is relatively compact, there exists a subsequence {µnj} that converges weakly to some limit µ*. From (A2.4.3a), we obtain via (ii) of Theorem A2.3.II that, for all N > 0,
µ*(X − ⋃_{k=1}^N Sk) ≥ lim sup_{nj→∞} µnj(X − ⋃_{k=1}^N Sk) ≥ ε.
This contradicts the requirement that, because X − ⋃_{k=1}^N Sk ↓ ∅, we must have µ*(X − ⋃_{k=1}^N Sk) → 0. Thus, the uniform tightness condition is necessary. As it is clear that no sequence {µn} with µn(X) → ∞ can have a weakly convergent subsequence, condition (i) is necessary also.
Turning to the converse, we again give a proof based on separability, although in fact the result is true without this restriction. We start by constructing a countable ring R from the open spheres with rational radii and centres in a countable dense set by taking first finite intersections and then proper differences, thus forming a semiring, and finally taking all finite disjoint unions of such differences. Now suppose that {µn: n ≥ 1} is any sequence of measures from M. We have to show that {µn} contains a weakly convergent subsequence. For any A ∈ R, condition (i) implies that {µn(A)} is a bounded sequence of real numbers and therefore contains a convergent subsequence. Using a diagonal selection argument, we can extract a subsequence {µnj} for which µnj(A) approaches a finite limit for each of the countable number of sets A ∈ R. Let us write µ*(A) for the limit and for brevity of notation set µnj = µj.
Thus, we have µj (A) → µ∗ (A)
(all A ∈ R).
(A2.4.3b)
396
APPENDIX 2. Measures on Metric Spaces
This might seem enough to set up a proof, for it is easy to see that µ* inherits finite additivity from the µ_j, and one might anticipate that the uniform tightness condition could be used to establish countable additivity. The difficulty is that we have no guarantee that the sets A ∈ R are continuity sets for µ*, so (A2.4.3b) cannot be relied on to give the correct value to the limit measure. To get over this difficulty, we have to develop a more elaborate argument incorporating the notion of a continuity set. For this purpose, we introduce the class C of Borel sets that are µ*-regular in the following sense: given C ∈ C, we can find a sequence {A_n} of sets in R and an associated sequence of open sets G_n such that A_n ⊇ G_n ⊇ C, and similarly a sequence of sets B_n ∈ R and closed sets F_n with C ⊇ F_n ⊇ B_n, the two sequences {A_n}, {B_n} having the property

    lim inf_{n→∞} µ*(A_n) = lim sup_{n→∞} µ*(B_n) = µ(C),  say.        (A2.4.4)
We establish the following properties of the class C.
(1°) C is a ring: Let C, C′ be any two sets in C, and consider, for example, the difference C − C′. If {A_n}, {G_n}, {B_n}, {F_n} and {A′_n}, {G′_n}, {B′_n}, {F′_n} are the sequences for C and C′, respectively, then

    A_n − B′_n ⊇ G_n − F′_n ⊇ C − C′ ⊇ F_n − G′_n ⊇ B_n − A′_n,

with G_n − F′_n open, F_n − G′_n closed, and the outer sets elements of R since R is a ring. From the inclusion

    (A_n − B′_n) − (B_n − A′_n) ⊆ (A_n − B_n) ∪ (A′_n − B′_n),

we find that µ*(A_n − B′_n) and µ*(B_n − A′_n) have a common limit value, which we take to be the value of µ(C − C′). Thus, C is closed under differences, and similar arguments show that C is closed also under finite unions and intersections.
(2°) C is a covering ring: Let d be any element in the countable dense set used to construct R, and for rational values of r define h(r) = µ*(S_r(d)). Then h(r) is monotonically increasing, bounded above, and can be uniquely extended to a monotonically increasing function defined for all positive values of r and continuous at all except a countable set of values of r. It is clear that if r is any continuity point of h(·), the corresponding sphere S_r(d) belongs to C. Hence, for each d, we can find a sequence of spheres S_{ε_n}(d) ∈ C with radii ε_n → 0. Since any open set in X can be represented as a countable union of these spheres, C must be a covering class.
(3°) For every C ∈ C, µ_j(C) → µ(C): Indeed, with the usual notation, we have

    µ*(A_n) = lim_{j→∞} µ_j(A_n) ≥ lim sup_{j→∞} µ_j(C) ≥ lim inf_{j→∞} µ_j(C) ≥ lim_{j→∞} µ_j(B_n) = µ*(B_n).
Since the two extreme members can be made as close as we please to µ(C), the two inner members must coincide and equal µ(C).
(4°) µ is finitely additive on C: This follows from (3°) and the finite additivity of the µ_j.
(5°) If M is uniformly tight, then µ is countably additive on C: Suppose that {C_k} is a sequence of sets from C with C_k ↓ ∅ but µ(C_k) ≥ α > 0. From the definition of C, we can find for each C_k a set B_k ∈ R and a closed set F_k such that C_k ⊇ F_k ⊇ B_k and µ*(B_k) > µ(C_k) − α/2^{k+1}. Then

    lim inf_{j→∞} µ_j(F_k) ≥ lim_{j→∞} µ_j(B_k) = µ*(B_k) ≥ α − α/2^{k+1},

and µ(C_k) − lim inf_{j→∞} µ_j(∩_{n=1}^k F_n) equals

    lim sup_{j→∞} µ_j(C_k − ∩_{n=1}^k F_n) ≤ Σ_{n=1}^k lim sup_{j→∞} µ_j(C_n − F_n)
        ≤ Σ_{n=1}^k [µ(C_n) − lim inf_{j→∞} µ_j(F_n)] ≤ ½α;

hence,

    lim inf_{j→∞} µ_j(∩_{n=1}^k F_n) ≥ ½α    (all k).

If now M is uniformly tight, there exists a compact set K such that µ(X − K) < ¼α for all µ ∈ M. In particular, therefore,

    µ_j(∩_{n=1}^k F_n) − µ_j(∩_{n=1}^k (F_n ∩ K)) < ¼α,   so   lim inf_{j→∞} µ_j(∩_{n=1}^k (F_n ∩ K)) ≥ ¼α.

But this is enough to show that, for each k, the sets ∩_{n=1}^k (F_n ∩ K) are nonempty, and since (if X is complete) each is a closed subset of the compact set K, it follows from Theorem A1.2.II that their total intersection is nonempty. Since their total intersection is contained in ∩_{n=1}^∞ C_n, this set is also nonempty, contradicting the assumption that C_n ↓ ∅.
We can now complete the proof of the theorem without difficulty. From the countable additivity of µ on C, it follows that there is a unique extension of µ to a measure on B_X. Since C is a covering class and µ_j(C) → µ(C) for C ∈ C, it follows from Proposition A2.3.III that µ_j → µ weakly or, in other words, that the original sequence {µ_n} contains a weakly convergent subsequence, as required.
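The role of the uniform tightness condition (A2.4.1) can be seen in a small numeric sketch. The Gaussian families below are illustrative assumptions, not taken from the text: the family N(0, 1 + 1/n) keeps all but an arbitrarily small amount of its mass in one compact interval, uniformly in n, whereas the shifted family N(n, 1) lets its mass escape every fixed compact set.

```python
import math

def normal_tail_outside(m, s, K):
    """P(|X| > K) for X ~ N(m, s^2), computed via the error function."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
    return 1.0 - (cdf(K) - cdf(-K))

# Uniformly tight family: N(0, 1 + 1/n).  One compact set [-10, 10]
# works simultaneously for every member of the family.
tight = [normal_tail_outside(0.0, math.sqrt(1.0 + 1.0 / n), 10.0)
         for n in range(1, 101)]

# Non-tight family: N(n, 1).  For any fixed compact set, the mass
# outside it tends to 1 along the family, so (A2.4.1) must fail.
escaping = [normal_tail_outside(float(n), 1.0, 10.0) for n in range(1, 101)]

print(max(tight))      # uniformly tiny
print(max(escaping))   # close to 1
```

This mirrors the necessity argument above: a sequence whose mass drifts out of every compact set, like N(n, 1), can have no weakly convergent subsequence.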
A2.5. Metric Properties of the Space M_X
Denote by M_X the space of all totally finite measures on B_X, and consider the following candidate (the Prohorov distance) for a metric on M_X, where F^ε is a halo set as in (A2.2.2):

    d(µ, ν) = inf{ε: ε ≥ 0, and for all closed F ⊆ X, µ(F) ≤ ν(F^ε) + ε and ν(F) ≤ µ(F^ε) + ε}.        (A2.5.1)
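For finite discrete measures, the infimum in (A2.5.1) can be computed directly. The sketch below is an illustration under assumptions not made in the text (measures given as dictionaries of atoms on the line): for such measures it suffices to test the halo inequalities with F ranging over subsets of the supports, and the critical ε is then located by bisection, since both sides of the inequalities are monotone in ε.

```python
from itertools import combinations

def prohorov(mu, nu, tol=1e-9):
    """Prohorov distance between two finite discrete measures on the line.

    mu, nu: dicts mapping support point -> mass.  Enlarging F off the
    support of the left-hand measure only helps the halo side, so closed
    sets F need only range over subsets of the supports.
    """
    def ok(eps, a, b):
        # check a(F) <= b(F^eps) + eps for every nonempty F within supp(a)
        pts = list(a)
        for r in range(1, len(pts) + 1):
            for F in combinations(pts, r):
                aF = sum(a[x] for x in F)
                halo = sum(m for y, m in b.items()
                           if min(abs(y - x) for x in F) < eps)  # open halo
                if aF > halo + eps + 1e-12:
                    return False
        return True

    lo, hi = 0.0, max(sum(mu.values()), sum(nu.values())) + 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ok(mid, mu, nu) and ok(mid, nu, mu):
            hi = mid
        else:
            lo = mid
    return hi

d1 = prohorov({0.0: 1.0}, {1.0: 1.0})             # two unit point masses
d2 = prohorov({0.0: 1.0}, {0.0: 0.5, 1.0: 0.5})   # half the mass displaced
```

For the two unit point masses a distance 1 apart, the infimum is 1; moving only half the mass a distance 1 gives distance ½, illustrating how (A2.5.1) balances displaced mass against displacement.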
If d(µ, ν) = 0, then µ(F) = ν(F) for all closed F, so µ(·) and ν(·) coincide. If d(λ, µ) = δ and d(µ, ν) = ε, then, writing cl F^δ for the closure of the halo F^δ,

    λ(F) ≤ µ(F^δ) + δ ≤ µ(cl F^δ) + δ ≤ ν((cl F^δ)^ε) + δ + ε ≤ ν(F^{δ+ε}) + δ + ε,

with similar inequalities holding when λ and ν are interchanged. Thus, the triangle inequality holds for d, showing that d is indeed a metric.
The main objects of this section are to show that the topology generated by this metric coincides with the topology of weak convergence and to establish various properties of M_X as a metric space in its own right. We start with an extension of Theorem A2.3.II.
Proposition A2.5.I. Let X be a c.s.m.s. and M_X the space of all totally finite measures on B_X. Then, each of the following families of sets in M_X is a basis, and the topologies generated by these three bases coincide:
(i) the sets {ν: d(ν, µ) < ε} for all ε > 0 and µ ∈ M_X;
(ii) the sets {ν: ν(F_i) < µ(F_i) + ε for i = 1, . . . , k, |ν(X) − µ(X)| < ε} for all ε > 0, finite families of closed sets F_1, . . . , F_k, and µ ∈ M_X;
(iii) the sets {ν: ν(G_i) > µ(G_i) − ε for i = 1, . . . , k, |ν(X) − µ(X)| < ε} for all ε > 0, finite families of open sets G_1, . . . , G_k, and µ ∈ M_X.
Proof. Each of the three families represents a family of neighbourhoods of a measure µ ∈ M_X. To show that each family forms a basis, we need to verify that, if G, H are neighbourhoods of µ, ν in the given family, and η ∈ G ∩ H, then we can find a member J of the family such that η ∈ J ⊆ G ∩ H. Suppose, for example, that G, H are neighbourhoods of µ, ν in the family (ii) [(ii)-neighbourhoods for short], corresponding to closed sets F_1, . . . , F_n and F′_1, . . . , F′_m, respectively, and with respective bounds ε, ε′, and that η is any measure in the intersection G ∩ H. Then we must find closed sets C_i and a bound δ, defining a (ii)-neighbourhood J of η, such that, for any ρ ∈ J,

    ρ(F_i) < µ(F_i) + ε        (i = 1, . . . , n),
    ρ(F′_j) < ν(F′_j) + ε′     (j = 1, . . . , m),

|ρ(X) − µ(X)| < ε, and |ρ(X) − ν(X)| < ε′. For this purpose, we may take C_i = F_i (i = 1, . . . , n), C_{n+j} = F′_j (j = 1, . . . , m), and

    δ = min{δ_1, . . . , δ_n; δ′_1, . . . , δ′_m; ε − |η(X) − µ(X)|; ε′ − |η(X) − ν(X)|},

where

    δ_i = µ(F_i) + ε − η(F_i)        (i = 1, . . . , n),
    δ′_j = ν(F′_j) + ε′ − η(F′_j)    (j = 1, . . . , m).

For ρ ∈ J thus defined, we have, for i = 1, . . . , n,

    ρ(F_i) < η(F_i) + δ ≤ η(F_i) + δ_i = µ(F_i) + ε,

while |ρ(X) − µ(X)| ≤ |ρ(X) − η(X)| + |η(X) − µ(X)| < ε. Thus J ⊆ G, and similarly J ⊆ H. The proof for family (iii) follows similar lines, while that for family (i) is standard.
To check that the three topologies are equivalent, we show that for any µ ∈ M_X, any (iii)-neighbourhood of µ contains a (ii)-neighbourhood, which in turn contains a (i)-neighbourhood, and that this in turn contains a (iii)-neighbourhood.
Suppose there is given, then, a (iii)-neighbourhood of µ, as defined in (iii) of the proposition, and construct a (ii)-neighbourhood by setting F_i = G_i^c, i = 1, . . . , n, and taking ½ε in place of ε. Then, for any ν in this (ii)-neighbourhood,

    ν(G_i) = ν(X) − ν(G_i^c) > µ(X) − ½ε − µ(G_i^c) − ½ε = µ(G_i) − ε.

Since the condition on |ν(X) − µ(X)| carries across directly, this is enough to show that ν lies within the given (iii)-neighbourhood of µ.
Given next a (ii)-neighbourhood, defined as in the proposition, we can find a δ with 0 < δ < ½ε for which, for i = 1, . . . , n, µ(F_i^δ) < µ(F_i) + ½ε. Consider the sphere in M_X with centre µ and radius δ, using the weak-convergence metric d. For any ν in this sphere,

    ν(F_i) < µ(F_i^δ) + δ < µ(F_i) + ½ε + ½ε = µ(F_i) + ε,

while taking F = X in the defining relation for d gives ν(X) − ½ε < µ(X) < ν(X) + ½ε; thus ν also lies within the given (ii)-neighbourhood.
Finally, suppose there is given a (i)-neighbourhood of µ, S_ε(µ) say, defined by the relations, holding for all closed F and given ε > 0,

    {ν: ν(F) < µ(F^ε) + ε; µ(F) < ν(F^ε) + ε}.

We have to construct a (iii)-neighbourhood of µ that lies within S_ε(µ). To this end, we first use the separability of X to cover X with a countable union of spheres S_1, S_2, . . . , each of radius ⅓ε or less, and each a continuity set for µ. Then, choose N large enough so that R_N = X − ∪_{i=1}^N S_i, which is also a continuity set for µ, satisfies µ(R_N) < ⅓ε.
We now define a (iii)-neighbourhood of µ by taking the finite family A of sets consisting of all finite unions of the S_i, i = 1, . . . , N, all finite unions of the closures of their complements, and R_N, and setting

    G_µ = {ν: ν(A) < µ(A) + ⅓ε for A ∈ A, |ν(X) − µ(X)| < ⅓ε}.

Given an arbitrary closed F in X, denote by F* the union of all elements of A that intersect F, so that F* ∈ A and F − R_N ⊆ F* ⊆ F^ε. Then, for ν ∈ G_µ,
    ν(F) ≤ ν(F*) + ν(R_N) < µ(F*) + ⅓ε + ν(R_N) < µ(F*) + ⅓ε + µ(R_N) + ⅓ε < µ(F^ε) + ε.

Further,

    µ(F) ≤ µ(F*) + µ(R_N) < µ(F*) + ⅓ε = µ(X) − µ[(F*)^c] + ⅓ε.

But µ(X) < ν(X) + ⅓ε and µ[(F*)^c] ≥ ν[(F*)^c] − ⅓ε, so that on substituting,

    µ(F) < ν(X) − ν[(F*)^c] + ε = ν(F*) + ε ≤ ν(F^ε) + ε.

These inequalities show that ν ∈ S_ε(µ) and hence G_µ ⊆ S_ε(µ).
The weak convergence of µ_n to µ is equivalent by Theorem A2.3.II to µ_n → µ in each of the topologies (ii) and (iii) and hence, by the proposition, to d(µ_n, µ) → 0. The converse also holds, so we have the following.
Corollary A2.5.II. For µ_n and µ ∈ M_X, µ_n → µ weakly if and only if d(µ_n, µ) → 0.
If A is a continuity set for µ, then we have also µ_n(A) → µ(A). However, it does not appear that there is a basis, analogous to (ii) and (iii) of Proposition A2.5.I, corresponding to this form of the convergence.
Having established the fact that the weak topology is a metric topology, it makes sense to ask whether M_X is separable or complete in this topology.
Proposition A2.5.III. If X is a c.s.m.s. and M_X is given the topology of weak convergence, then M_X is also a c.s.m.s.
Proof. We first establish completeness by using the compactness criteria of the preceding section. Let {µ_n} be a Cauchy sequence in M_X; we show that it is uniformly tight. Let positive ε and δ be given, and choose positive η < min(⅓ε, ½δ). From the Cauchy property, there is an N for which d(µ_n, µ_N) < η for n ≥ N. Since µ_N itself is tight, X can be covered by a sequence of spheres S_1, S_2, . . . of radius η, and there is a finite K for which

    µ_N(X) − µ_N(∪_{i=1}^K S_i) < η.

For n > N, since d(µ_n, µ_N) < η, we have |µ_n(X) − µ_N(X)| < η and

    µ_N(∪_{i=1}^K S_i) < µ_n((∪_{i=1}^K S_i)^η) + η ≤ µ_n(∪_{i=1}^K S_i^η) + η,

where S_i^η denotes the sphere concentric with S_i of radius 2η; hence

    µ_n(X) − µ_n(∪_{i=1}^K S_i^η) ≤ |µ_n(X) − µ_N(X)| + |µ_N(X) − µ_N(∪_{i=1}^K S_i)| + η ≤ 3η < ε.

It follows that for every ε and δ we can find a finite family of δ-spheres whose union has µ_n measure within ε of µ_n(X), uniformly in n. Hence, the sequence {µ_n} is uniformly tight by Lemma A2.4.II and relatively compact by Theorem A2.4.I [since it is clear that the quantities µ_n(X) are bounded when {µ_n} is
a Cauchy sequence]. Thus, there exists a limit measure µ such that µ_n → µ weakly, which implies by Corollary A2.5.II that d(µ_n, µ) → 0.
Separability is easier to establish, as a suitable dense set is already at hand in the form of the measures with finite support (i.e. those that are purely atomic with only a finite set of atoms). Restricting the atoms to the points of a separability set D for X and their masses to rational numbers, we obtain a countable family of measures, D′ say, which we now show to be dense in M_X by proving that any sphere S_ε(µ) ⊆ M_X contains an element of D′. To this end, first choose a compact set K such that µ(X − K) < ½ε, which is possible because µ is tight. Now cover K with a finite family of disjoint sets A_1, . . . , A_N, each with nonempty interior and of radius ε or less. [One way of constructing such a covering is as follows. First, cover K with a finite family of open spheres S_1, . . . , S_m, say, each of radius ε. Take A_1 = S̄_1, A_2 = S̄_2 ∩ A_1^c, A_3 = S̄_3 ∩ (A_1 ∪ A_2)^c, and so on, retaining only the nonempty sets in this construction. Here S_2 ∩ A_1^c is open and either empty, in which case S_2 ⊆ A_1 so that S̄_2 ⊆ Ā_1 and A_2 can be discarded, or has nonempty interior. It is evident that each A_i has radius ε or less and that the A_i are disjoint.] For each i, since A_i has nonempty interior, we can choose an element x_i of the separability set for X with x_i ∈ A_i and give x_i rational mass µ_i such that

    µ(A_i) ≥ µ_i ≥ µ(A_i) − ε/(2N),

and let µ′ denote the purely atomic measure with an atom of mass µ_i at each x_i. Then, for an arbitrary closed set F, denoting by Σ′ a sum taken over {i: x_i ∈ F}, we have

    µ′(F) = Σ′ µ_i ≤ Σ′ µ(A_i) ≤ µ(F^ε),

where we have used the fact that ∪_{i: x_i ∈ F} A_i ⊆ F^ε because each A_i has radius at most ε. Furthermore,

    µ(F) < µ(K ∩ F) + ½ε ≤ Σ″ µ(F ∩ A_i) + ½ε,

where Σ″ denotes a sum taken over {i: A_i ∩ F ≠ ∅}, so

    µ(F) ≤ Σ″ µ(A_i) + ½ε ≤ Σ″ µ_i + ½ε + ½ε ≤ µ′(F^ε) + ε,

the middle step using Σ″ [µ(A_i) − µ_i] ≤ N · ε/(2N) = ½ε, and the last the fact that x_i ∈ A_i ⊆ F^ε whenever A_i ∩ F ≠ ∅. Consequently, d(µ, µ′) ≤ ε, or equivalently, µ′ ∈ S_ε(µ), as required.
Denote the Borel σ-algebra on M_X by B(M_X); from the results just established, it is the smallest σ-algebra containing any of the three bases listed in Proposition A2.5.I. We use this fact to characterize B(M_X).
Proposition A2.5.IV. Let S be a semiring generating the Borel sets B_X of X. Then B(M_X) is the smallest σ-algebra of subsets of M_X with respect to which the mappings Φ_A: M_X → R defined by Φ_A(µ) = µ(A) are measurable for A ∈ S. In particular, B(M_X) is the smallest σ-algebra with respect to which the Φ_A are measurable for all A ∈ B_X.
Proof. Start by considering the class C of subsets A of X for which Φ_A is B(M_X)-measurable. Since Φ_{A∪B} = Φ_A + Φ_B for disjoint A and B, and the sum of two measurable functions is measurable, C is closed under finite disjoint unions. Similarly, since Φ_{A−B} = Φ_A − Φ_B for A ⊇ B, C is closed under proper differences and hence in particular under complementation. Finally, since a monotone sequence of measurable functions has a measurable limit, and Φ_{A_n} ↑ Φ_A whenever A_n ↑ A, it follows that C is a monotone class.
Let F be any closed set in X and y any positive number. Choose µ ∈ M_X such that µ(F) < y, and set ε = y − µ(F). We can then write

    {ν: Φ_F(ν) < y} = {ν: ν(F) < y} = {ν: ν(F) < µ(F) + ε},

showing that this set of measures is an element of the basis (ii) of Proposition A2.5.I and hence an open set in M_X and therefore an element of B(M_X). Thus, C contains all closed sets, and therefore also C contains all open sets. From these properties of C, it now follows that C contains the ring of all finite disjoint unions of differences of open sets in X, and since C is a monotone class, it must contain all sets in B_X. This shows that Φ_A is B(M_X)-measurable for all Borel sets A and hence a fortiori for all sets in any semiring S generating the Borel sets.
It remains to show that B(M_X) is the smallest σ-algebra in M_X with this property. Let S be given, and let R be any σ-ring with respect to which Φ_A is measurable for all A ∈ S. By arguing as above, it follows that Φ_A is also R-measurable for all A in the σ-ring generated by S, which by assumption is B_X. Now suppose we are given ε > 0, a measure µ ∈ M_X, and a finite family F_1, . . . , F_n of closed sets. Then, the set

    {ν: ν(F_i) < µ(F_i) + ε for i = 1, . . . , n, and |ν(X) − µ(X)| < ε}

is an intersection of sets of R and hence is an element of R. But this shows that R contains a basis for the open sets of M_X.
Since MX is separable, every open set can be represented as a countable union of basic sets, and thus all open sets are in R. Thus, R contains B(MX ), completing the proof.
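The bracketed disjointification used in the separability proof of Proposition A2.5.III — pass over an overlapping cover by spheres, keeping from each only the part not already covered, then place a rational atom in each retained piece — can be sketched concretely for intervals on the line. The specific cover of K = [0, 1] below is an illustrative assumption, not part of the text.

```python
from fractions import Fraction

# Disjointify an overlapping cover of K = [0, 1] by radius-eps intervals,
# as in the construction A1 = cl S1, A2 = cl S2 ∩ A1^c, A3 = cl S3 ∩ (A1 ∪ A2)^c, ...
# For intervals on the line, each retained piece is again an interval.
eps = 0.1
centres = [k * eps for k in range(11)]            # sphere centres from a dense set
spheres = [(c - eps, c + eps) for c in centres]

pieces, covered_up_to = [], 0.0
for a, b in spheres:
    lo, hi = max(a, covered_up_to), min(b, 1.0)   # strip the part already covered
    if lo < hi:                                   # retain only nonempty pieces
        pieces.append((lo, hi))
        covered_up_to = hi

# mu = Lebesgue measure on [0, 1]; place an atom of rational mass
# mu_i ~ mu(A_i) at a point of each piece A_i.
N = len(pieces)
atoms = [((a + b) / 2, Fraction(b - a).limit_denominator(20 * N))
         for a, b in pieces]

disjoint = all(pieces[i][1] <= pieces[i + 1][0] for i in range(N - 1))
total = float(sum(m for _, m in atoms))           # ~ mu([0, 1]) = 1
```

The resulting purely atomic measure has finite support, rational masses, and total mass within any prescribed tolerance of µ([0, 1]), exactly the kind of approximant the countable dense family D′ consists of.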
A2.6. Boundedly Finite Measures and the Space M#_X
For applications to random measures, we need to consider not only totally finite measures on B_X but also σ-finite measures with the strong local finiteness condition contained in the following definition.
Definition A2.6.I. A Borel measure µ on the c.s.m.s. X is boundedly finite if µ(A) < ∞ for every bounded Borel set A.
We write M#_X for the space of boundedly finite Borel measures on X and generally use the # notation for concepts taken over from finite to boundedly
finite measures. The object of this section is to extend to M#_X the results previously obtained for M_X; while most of these extensions are routine, they are given here for the sake of completeness.
Consider first the extension of the concept of weak convergence. Taking a fixed origin x_0 ∈ X, let S_r = S_r(x_0) for 0 < r < ∞, and introduce a distance function d# on M#_X by setting

    d#(µ, ν) = ∫_0^∞ e^{−r} [d_r(µ^(r), ν^(r)) / (1 + d_r(µ^(r), ν^(r)))] dr,        (A2.6.1)

where µ^(r), ν^(r) are the totally finite restrictions of µ, ν to S_r and d_r is the Prohorov distance between these restrictions. Examining (A2.5.1), where this distance is defined, we see that the infimum cannot decrease as r increases and the family of closed sets to be scrutinized grows, so as a function of r, d_r is monotonic and thus a measurable function. Since the ratio d_r/(1 + d_r) ≤ 1, the integral in (A2.6.1) is defined and finite for all µ, ν. The triangle inequality is preserved under the mapping x → x/(1 + x), while d#(µ, ν) = 0 if and only if µ and ν coincide on a sequence of spheres expanding to the whole of X, in which case they are identical. We call the metric topology generated by d# the w#-topology (‘weak hash’ topology) and write µ_k →_{w#} µ for convergence with respect to this topology. Some equivalent conditions for w#-convergence are given in the next result.
Proposition A2.6.II. Let {µ_k: k = 1, 2, . . .} and µ be measures in M#_X; then the following conditions are equivalent.
(i) µ_k →_{w#} µ.
(ii) ∫_X f(x) µ_k(dx) → ∫_X f(x) µ(dx) for all bounded continuous functions f(·) on X vanishing outside a bounded set.
(iii) There exists a sequence of spheres S^(n) ↑ X such that, if µ_k^(n), µ^(n) denote the restrictions of the measures µ_k, µ to subsets of S^(n), then µ_k^(n) → µ^(n) weakly as k → ∞ for n = 1, 2, . . . .
(iv) µ_k(A) → µ(A) for all bounded A ∈ B_X for which µ(∂A) = 0.
Proof. We show that (i) ⇒ (iii) ⇒ (ii) ⇒ (iv) ⇒ (i). Write the integral in (A2.6.1) for the measures µ_k and µ as

    d#(µ_k, µ) = ∫_0^∞ e^{−r} g_k(r) dr,

so that, for each k, g_k(r) increases with r and is bounded above by 1. Thus, there exists a subsequence {k_n} and a limit function g(·) such that g_{k_n}(r) → g(r) at all continuity points of g [this is just a version of the compactness criterion for vague convergence on R: regard each g_k(r) as the distribution function of a probability measure so that there exists a vaguely convergent subsequence; see Corollary A2.6.V or any standard proof of the Helly–Bray results]. By dominated convergence, ∫_0^∞ e^{−r} g(r) dr = 0 and hence, since g(·)
is monotonic, g(r) = 0 for all finite r > 0. This being true for all convergent subsequences, it follows that g_k(r) → 0 for such r and thus, for these r,

    d_r(µ_k^(r), µ^(r)) → 0        (k → ∞).

In particular, this is true for an increasing sequence of values r_n, corresponding to spheres {S_{r_n}} ≡ {S^(n)}, say, on which therefore µ_k^(r_n) → µ^(r_n) weakly. Thus, (i) implies (iii).
Suppose next that (iii) holds and that f is bounded, continuous, and vanishes outside some bounded set. Then, the support of f is contained in some S^(n), and hence ∫ f dµ_k^(n) → ∫ f dµ^(n), which is equivalent to (ii).
When (ii) holds, the argument used to establish (iv) of Theorem A2.3.II shows that µ_k(C) → µ(C) whenever C is a bounded Borel set with µ(∂C) = 0.
Finally, if (iv) holds and S_r is any sphere that is a continuity set for µ, then by the same theorem µ_k^(r) → µ^(r) weakly in S_r. But since µ(S_r) increases monotonically in r, S_r is a continuity set for almost all r, so the convergence to zero of d#(µ_k, µ) follows from the dominated convergence theorem.
Note that we cannot find a universal sequence of spheres, {S^(n)} say, for which (i) and (iii) are equivalent, because the requirement of weak convergence on S^(n), namely that µ_k(S^(n)) → µ(S^(n)), cannot be guaranteed unless µ(∂S^(n)) = 0.
While the distance function d# of (A2.6.1) depends on the centre x_0 of the family {S_r} of spheres used there, the w#-topology does not depend on the choice of x_0. To see this, let {S′_n} be any sequence of spheres expanding to X, so that to any S′_n we can first find n′ for which S′_n ⊆ S_{r_{n′}} and then find n″ for which S_{r_{n′}} ⊆ S′_{n″}. Now weak convergence within a given sphere is subsumed by weak convergence in a larger sphere containing it, from which the asserted equivalence follows. It should also be noted that for locally compact X, w#-convergence coincides with vague convergence.
The next theorem extends to w#-convergence the results in Propositions A2.5.III and A2.5.IV.
Theorem A2.6.III. (i) M#_X with the w#-topology is a c.s.m.s.
(ii) The Borel σ-algebra B(M#_X) is the smallest σ-algebra with respect to which the mappings Φ_A: M#_X → R given by
Φ_A(µ) = µ(A) are measurable for all sets A in a semiring S of bounded Borel sets generating B_X, and in particular for all bounded Borel sets A.
Proof. To prove separability, recall first that the measures with rational masses and finite support in a separability set D for X form a separability set D′ for the totally finite measures on each S_n under the weak topology. Given ε > 0, choose R so that ∫_R^∞ e^{−r} dr < ½ε. For any µ ∈ M#_X, choose an atomic measure µ_R from the separability set for S_R such that µ_R has support in S_R
and d_R(µ_R, µ^(R)) < ½ε. Clearly, for r < R, we also have d_r(µ_R^(r), µ^(r)) < ½ε. Substitution in the expression for d# shows that d#(µ_R, µ) < ε, establishing that the union of these separability sets is a separability set for measures in M#_X.
To show completeness, let {µ_k} be a Cauchy sequence for d#. Then, each sequence of restrictions {µ_k^(r)} forms a Cauchy sequence for d_r and so has a limit ν_r by Proposition A2.5.III. The family {ν_r} of measures so obtained is clearly consistent in the sense that ν_r(A) = ν_s(A) for s ≤ r and Borel sets A of S_s. Then, the set function µ(A) = ν_r(A) is uniquely defined on Borel sets A of S_r and is nonnegative and countably additive on the restriction of B_X to each S_r. We now extend the definition of µ to all Borel sets by setting

    µ(A) = lim_{r→∞} ν_r(A ∩ S_r),
the sequence on the right being monotonically increasing and hence having a limit (finite or infinite) for all A. It is then easily checked that µ(·) is finitely additive and continuous from below, therefore countably additive, and so a boundedly finite Borel measure. Finally, it follows from (ii) of Proposition A2.6.II that µ_k →_{w#} µ.
To establish part (ii) of the theorem, examine the proof of Proposition A2.5.IV. Let C be the class of sets A for which Φ_A is a B(M#_X)-measurable mapping into [0, ∞). Again, C is a monotone class containing all bounded open and closed sets on X and hence B_X, as well as any ring or semiring generating B_X. Also, if S is a semiring of bounded sets generating B_X and Φ_A is R-measurable for A ∈ S and some σ-ring R of sets on M#_X, then Φ_A is R-measurable for A ∈ B_X. The proposition now implies that R^(r), the σ-algebra formed by projecting the measures in sets of R onto S_r, contains B(M_{S_r}). Equivalently, R contains the inverse image of B(M_{S_r}) under this projection. The definition of B(M#_X) implies that it is the smallest σ-algebra containing each of these inverse images. Hence, R contains B(M#_X).
The final extension is of the compactness criterion of Theorem A2.4.I.
Proposition A2.6.IV. A family of measures {µ_α} in M#_X is relatively compact in the w#-topology on M#_X if and only if their restrictions {µ_α^(n)} to a sequence of closed spheres S̄_n ↑ X are relatively compact in the weak topology on M_{S̄_n}, in which case the restrictions {µ_α^F} to any closed bounded F are relatively compact in the weak topology on M_F.
Proof. Suppose first that {µ_α} is relatively compact in the w#-topology on M#_X and that F is a closed bounded subset of X. Given any sequence of the µ_α^F, there exists by assumption a w#-convergent subsequence, µ_{α_k} →_{w#} µ say. From Proposition A2.6.II, arguing as in the proof of Theorem A2.3.II, it follows that for
all bounded closed sets C, lim sup_{k→∞} µ_{α_k}(C) ≤ µ(C). Hence, in particular, the values of µ_{α_k}(F) are bounded above. Moreover, the restrictions {µ_{α_k}^F} are uniformly tight, this property being inherited from their uniform tightness on a closed bounded sphere containing F. Therefore, the restrictions are relatively compact as measures on F, and there exists a further subsequence converging weakly on F to some limit measure, µ^F say, on F. This is enough to show that the µ_α^F themselves are relatively compact.
Conversely, suppose that there exists a family of spheres S_n, closed or otherwise, such that {µ_α^(n)} is relatively compact for each n. By diagonal selection, we may choose a subsequence {α_k} such that µ_{α_k}^(n) → µ^(n) weakly for every n, and therefore, if f is any bounded continuous function vanishing outside a bounded set, ∫ f dµ_{α_k}^(n) → ∫ f dµ^(n). It is then easy to see that the µ^(n) form a consistent family (i.e. µ^(n) coincides with µ^(m) on S_m for n ≥ m) and so define a unique element µ of M#_X such that µ_{α_k} →_{w#} µ. The criterion for weak convergence on each S_n can be spelled out in detail from Prohorov’s Theorem A2.4.I.
A particularly neat result holds in the case where X is locally (and hence countably) compact, when the following terminology is standard. A Radon measure in a locally compact space is a measure taking finite values on compact sets. A sequence {µ_k} of such measures converges vaguely to µ if ∫ f dµ_k → ∫ f dµ for each continuous f vanishing outside a compact set. Now any locally compact space with a countable base is metrizable, but the space is not necessarily complete in the metric so obtained. If, however, the space is both locally compact and a c.s.m.s., it can be represented as the union of a sequence of compact sets K_n with K_n ⊆ K°_{n+1}, the interior of K_{n+1}, and then, by changing to an equivalent metric if necessary, we can ensure that the spheres S_n are compact as well as closed (see e.g.
Hocking and Young, 1961, Proposition 2.61); we assume this is so. Then, a Borel measure is a Radon measure if and only if it is boundedly finite, and vague convergence coincides with w# -convergence. The discussion around (A2.6.1) shows that the vague topology is metrizable and suggests one form for a suitable metric. Finally, Proposition A2.6.IV takes the following form. Corollary A2.6.V. If X is a locally compact c.s.m.s., then the family {µα } of Radon measures on BX is relatively compact in the vague topology if and only if the values {µα (A)} are bounded for each bounded Borel set A. Proof. Assume the metric is so chosen that closed bounded sets are compact. Then, if the µα (·) are relatively compact on each Sn , it follows from condition (i) of Theorem A2.4.I that the µα (Sn ) are bounded and hence that the µα (A) are bounded for any bounded Borel set A. Conversely, suppose the boundedness condition holds. Then, in particular, it holds for Sn , which is compact so the tightness condition (ii) of Theorem A2.4.I is satisfied trivially. Thus, the {µα } are relatively compact on each Sn and so by Proposition A2.6.IV are relatively compact in the w# - (i.e. vague) topology.
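Two toy point-mass examples (illustrative assumptions, not from the text) show the boundary condition in Proposition A2.6.II(iv) at work, and the difference between vague and weak convergence that Corollary A2.6.V trades on.

```python
# Example 1: mu_k = delta_{1/k} converges weakly (hence w#) to delta_0 on R,
# but mu_k(A) -> mu(A) can fail when the limit puts mass on the boundary of A.
def delta_mass(atom, a, b, closed_left=True, closed_right=True):
    """Mass that the point mass at `atom` assigns to an interval from a to b."""
    left_ok = a < atom or (closed_left and a == atom)
    right_ok = atom < b or (closed_right and atom == b)
    return 1.0 if (left_ok and right_ok) else 0.0

ks = [1, 10, 100, 1000]
good = [delta_mass(1.0 / k, -1.0, 1.0) for k in ks]                   # A = [-1, 1]
bad = [delta_mass(1.0 / k, 0.0, 1.0, closed_left=False) for k in ks]  # A = (0, 1]
# good stays at 1.0 = delta_0([-1, 1]): the boundary {-1, 1} carries no limit mass.
# bad also stays at 1.0, yet delta_0((0, 1]) = 0: here 0 lies on the boundary
# of A and carries all the limit mass, so condition (iv) is not applicable.

# Example 2: on the locally compact space R, nu_n = delta_n converges vaguely
# to the zero measure (the mass escapes to infinity), although nu_n(R) = 1
# for every n, so {nu_n} has no weakly convergent subsequence.
def f(x):                      # continuous, vanishing outside [-1, 1]
    return max(0.0, 1.0 - abs(x))

vague = [f(n) for n in [2, 5, 10, 100]]   # integral of f against nu_n is f(n)
```

The second example is exactly the situation excluded by condition (i) of Theorem A2.4.I for weak compactness, while being perfectly compatible with relative compactness in the vague topology.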
A2.7. Measures on Topological Groups
A group G is a set on which is defined a binary operation G × G → G with the following properties.
(i) (Associative law) For all g_1, g_2, g_3 ∈ G, (g_1 g_2)g_3 = g_1(g_2 g_3).
(ii) There exists an identity element e (necessarily unique) such that for all g ∈ G, ge = eg = g.
(iii) For every g ∈ G, there exists a unique inverse g^{−1} such that g^{−1}g = gg^{−1} = e.
The group is Abelian if it also has the property
(iv) (Commutative law) For all g_1, g_2 ∈ G, g_1 g_2 = g_2 g_1.
A homomorphism between groups is a mapping T that preserves the group operations in the sense that (Tg_1)(Tg_2) = T(g_1 g_2) and (Tg)^{−1} = T(g^{−1}). If the mapping is also one-to-one, it is an isomorphism. An automorphism is an isomorphism of the group onto itself.
A subgroup H of G is a subset of G that is closed under the group operations and so forms a group in its own right. If H is nontrivial (i.e. neither {e} nor the whole of G), its action on G splits G into equivalence classes, where g_1 ≡ g_2 if there exists h ∈ H such that g_2 = g_1 h. These classes form the left cosets of G relative to H; they may also be described as the (left) quotient space G/H of G with respect to H. Similarly, H splits G into right cosets, which in general will not be the same as the left cosets. If G is Abelian, however, or more generally if H is a normal (or invariant) subgroup, meaning that g^{−1}hg ∈ H for every g ∈ G and h ∈ H, the right and left cosets coincide, and the product of two elements, one from each of two given cosets, falls into a uniquely defined third coset. With this definition of multiplication, the cosets then form a group in their own right, namely the quotient group. The natural map taking an element of G into the coset containing it is then a homomorphism of G onto G/H, of which H is the kernel, that is, the inverse image of the identity in the image space G/H.
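These definitions can be checked mechanically on a small example. The choice below, Z_6 under addition with the normal subgroup H = {0, 3}, is an illustrative assumption, not from the text; the sketch exhibits the coset partition of G, the induced quotient multiplication, and H as the kernel of the natural map.

```python
# G = Z_6 under addition mod 6; H = {0, 3} is a subgroup (normal, G Abelian).
G = list(range(6))
H = [0, 3]
op = lambda a, b: (a + b) % 6

# The left cosets gH partition G into equivalence classes.
cosets = sorted({tuple(sorted(op(g, h) for h in H)) for g in G})

# The natural map g -> coset containing g; multiplying cosets through
# representatives lands in a uniquely defined third coset, so the map
# is a homomorphism onto the quotient group G/H (isomorphic to Z_3).
coset_of = {g: next(c for c in cosets if g in c) for g in G}
hom_ok = all(
    coset_of[op(a, b)] ==
    next(c for c in cosets
         if op(min(coset_of[a]), min(coset_of[b])) in c)
    for a in G for b in G)

# The kernel of the natural map is the inverse image of the identity coset.
kernel = [g for g in G if coset_of[g] == coset_of[0]]
```

Here `cosets` comes out as the three classes {0, 3}, {1, 4}, {2, 5}, and `kernel` recovers H, illustrating the statement that H is the kernel of the natural map onto G/H.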
The direct product of two groups G and H, written G × H, consists of the Cartesian products of G and H with the group operation (g1 , h1 )(g2 , h2 ) = (g1 g2 , h1 h2 ), identity (eG , eH ), and inverse (g, h)−1 = (g −1 , h−1 ). In particular, if G is a group and H a normal subgroup, then G is isomorphic to the direct product H × G/H. G is a topological group if it has a topology U with respect to which the mapping (g1 , g2 ) → g1 g2−1 from G × G (with the product topology) into G is continuous. This condition makes the operations of left (and right) multiplication by a fixed element of G, and of inversion, continuous. A theory with wide applications results if the topology U is taken to be locally compact and second countable. It is then metrizable but not necessarily complete in the resulting metric. In keeping with our previous discussion, however, we frequently assume that G is a complete separable metric group (c.s.m.g.) as well
as being locally compact. If, as may always be done by a change of metric, the closed bounded sets of G are compact, we refer to G as a σ-group.
Definition A2.7.I. A σ-group is a locally compact, complete separable metric group with the metric so chosen that closed bounded sets are compact.
In this context, boundedly finite measures are Radon measures, and the concepts of weak and vague convergence coincide. A boundedly finite measure µ on the σ-group is left-invariant if (writing gA = {gx: x ∈ A})

    µ(gA) = µ(A)        (g ∈ G, A ∈ B_G),        (A2.7.1)

or equivalently,

    ∫_G f(g^{−1}x) µ(dx) = ∫_G f(x) µ(dx)        (A2.7.2)
for all f ∈ BC(G), the class of continuous functions vanishing outside a bounded (in this case compact) set. Right-invariance is defined similarly.
A fundamental theorem for locally compact groups asserts that, up to scale factors, they admit unique left- and right-invariant measures, called Haar measures. If the group is Abelian, the left and right Haar measures coincide, as they do also when the group is compact, in which case the Haar measure is totally finite and is uniquely specified when normalized to have total mass unity. On the real line, or more generally on R^d, the Haar measure is just Lebesgue measure ℓ(·), and the uniqueness referred to above is effectively a restatement of results on the Cauchy functional equation.
If G is a topological group and H a subgroup, the quotient topology on G/H is the largest topology on G/H making the natural map from G into G/H continuous. It is then also an open map (i.e. takes open sets into open sets). If H is closed, then the quotient topology for G/H inherits properties from the topology for G: it is Hausdorff, or compact, or locally compact if and only if G has the same property.
These concepts extend to the more general context where X is a c.s.m.s. and H defines a group of one-to-one bounded continuous maps T_h of X into itself such that T_{h_1}(T_{h_2}(x)) = T_{h_1 h_2}(x). Again we assume that H is a σ-group and that the {T_h} act continuously on X, meaning that the mapping (h, x) → T_h(x) is continuous from H × X into X. The action of H splits X into equivalence classes, where x_1 ≡ x_2 if there exists h ∈ H such that x_2 = T_h(x_1). H acts transitively on X if for every x_1, x_2 ∈ X there exists an h such that T_h maps x_1 into x_2. In this case, the equivalence relation is trivial: there exists only one equivalence class, the whole space X.
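A numeric sketch of the invariance relation (A2.7.2) for two classical cases; the test function and integration ranges below are illustrative assumptions. Lebesgue measure is the Haar measure of the additive group R (invariance under x → g + x), while dx/x is the Haar measure of the multiplicative group (0, ∞) (invariance under x → ax).

```python
import math

def integrate(g, a, b, n=100000):
    """Midpoint-rule approximation of the integral of g over (a, b)."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

# A smooth bump supported (up to negligible tails) near [1, 2]: a stand-in
# for an f in BC(G), continuous and vanishing outside a bounded set.
f = lambda x: math.exp(-50.0 * (x - 1.5) ** 2)

# Additive group R, Haar measure = Lebesgue: shift by g = 2.
lebesgue_gap = abs(integrate(lambda x: f(x - 2.0), -100.0, 100.0)
                   - integrate(f, -100.0, 100.0))

# Multiplicative group (0, oo), Haar measure dx/x: scale change x -> 3x,
# i.e. replace f(x) by f(x/3) in the left-hand side of (A2.7.2).
haar_gap = abs(integrate(lambda x: f(x / 3.0) / x, 0.1, 100.0)
               - integrate(lambda x: f(x) / x, 0.1, 100.0))
```

Both gaps vanish up to discretization error, reflecting that the substitution y = x/3 carries the integrand f(x/3)·dx/x exactly onto f(y)·dy/y, while the plain substitution y = x − 2 does the same for Lebesgue measure.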
In general, the equivalence classes define a quotient space Q, which may be given the quotient topology; with this topology, the natural map taking x into the equivalence class containing it is again both continuous and open. If the original topology on H is not adjusted to the group action, however, the quotient topology may not be adequate for a detailed discussion of invariant measures.
Example A2.7(a). Consider R¹ under the action of scale changes: x → αx (0 < α < ∞). Here H may be identified with the positive half-line (0, ∞) with multiplication as the group action. There are three equivalence classes, (−∞, 0), {0}, and (0, ∞), which we may identify with the three-point space Q = {−1, 0, 1}. The quotient topology is trivial (only ∅ and the whole of Q), whereas the natural topology for further discussion is the discrete topology on Q, making each of the three points both open and closed in Q. With this topology, the natural map is open but not continuous. It does have, however, a continuous (albeit trivial) restriction to each of the three equivalence classes and therefore defines a Borel mapping of X into Q.

An important problem is to determine the structure of boundedly finite measures on X that are invariant under the group of mappings {T_h}. In many cases, some or all of the equivalence classes of X under H can be identified with replicas of H so that we may expect the restriction of the invariant measures to such cosets to be proportional to Haar measure. When such an identification is possible, the following simple lemma can be used; it allows us to deal with most of the situations arising from concrete examples of invariant measures [see e.g. Bourbaki (1963) for further background].

Lemma A2.7.II (Factorization Lemma). Let X = H × Y, where H is a σ-group and Y is a c.s.m.s., and suppose that µ ∈ M#_X is invariant under left multiplication by elements of H in the sense that, for A ∈ B_H and B ∈ B_Y,

µ(hA × B) = µ(A × B).   (A2.7.3)

Then µ = ℓ × κ, where ℓ is a multiple of left Haar measure on H and κ ∈ M#_Y is uniquely determined up to a scalar multiple.

Proof. Consider the set function µ_B(·) defined on B_H for fixed B ∈ B_Y by µ_B(A) = µ(A × B). Then µ_B inherits from µ the properties of countable additivity and bounded finiteness and so defines an element of M#_H. But then, from (A2.7.3),

µ_B(hA) = µ(hA × B) = µ(A × B) = µ_B(A),

implying that µ_B is invariant under left multiplication by elements of H. It therefore reduces to a multiple of left Haar measure on H,

µ_B(A) = κ(B)ℓ(A),   say.

Now the family of constants κ(B) may be regarded as a set function on B_Y, and, as for µ_B, this function is both countably additive and boundedly finite. Consequently, κ(·) ∈ M#_Y, and it follows that

µ(A × B) = µ_B(A) = ℓ(A)κ(B).

In other words, µ reduces to the required product form on product sets, and since these generate B_X, µ and the product measure ℓ × κ coincide.

To apply this result to specific examples, it is often necessary to find a suitable product representation for the space on which the transformations act. The situation is formalized in the following statement.
Proposition A2.7.III. Let X be a c.s.m.s. acted on measurably by a group of transformations {T_h: h ∈ H}, where H is a σ-group. Suppose, furthermore, that there exists a mapping ψ: H × Y → X, where Y is a c.s.m.s. and ψ is one-to-one, both ways measurable, takes bounded sets into bounded sets, and preserves the transformations {T_h} in the sense that

T_h ψ(h′, y) = ψ(hh′, y)   (h, h′ ∈ H).   (A2.7.4)

Let µ be a measure in M#_X that is invariant under the transformations T_h. Then there exists a unique invariant measure κ ∈ M#_Y such that, for B_X-measurable nonnegative functions f,

∫_X f(x) µ(dx) = ∫_Y κ(dy) ∫_H f(ψ(h, y)) ℓ(dh),   (A2.7.5)

where ℓ denotes left Haar measure on H.

Proof. Let µ̃ be the image of µ induced on H × Y by the mapping ψ; that is, µ̃(A × B) = µ(ψ(A × B)). Then

µ̃(hA × B) = µ(ψ(hA × B)) = µ(T_h ψ(A × B)) = µ(ψ(A × B)) = µ̃(A × B),

so that µ̃ is invariant under the action of h ∈ H on the first argument. Moreover, if A and B are bounded sets in H and Y, respectively, then by assumption ψ(A × B) is bounded in X, so that µ̃ is boundedly finite whenever µ is boundedly finite. Lemma A2.7.II can now be applied and yields the result that µ̃(A × B) = ℓ(A)κ(B) for some unique boundedly finite measure κ in M#_Y. This relation establishes the truth of (A2.7.5) for indicator functions I_{ψ(A×B)}(x) for A ∈ B_H and B ∈ B_Y. Using the usual approximation arguments, the result extends to simple functions f and thence to limits of these. It therefore holds for all nonnegative f such that f ∘ ψ is measurable on H × Y. But this is true for any f that is B_X-measurable and so proves (A2.7.5).

Example A2.7(b). Let µ be a measure on R² that is invariant under rotations about the origin. These may be written T_θ for θ ∈ S, S denoting the circumference of the unit disk with addition modulo 2π. The equivalence classes consist of circles of varying radii centred on the origin, together with the isolated point {0}. The mapping (r, θ) → (r cos θ, r sin θ) takes the product space S × R⁺ into R² \ {0} and is a representation of the required kind for R² \ {0}. We therefore write µ as the sum of a point mass at the origin and a measure on R² \ {0} that is invariant under rotations and can therefore be represented as the image of the product of the uniform distribution around the circle and a measure κ on the positive half-line. Integration with respect to µ takes the form [see (A2.7.5)]

∫_{R²} f(x) µ(dx) = f(0)µ({0}) + ∫_{0+}^{∞} κ(dr) ∫_{0}^{2π} f(r cos θ, r sin θ) dθ/2π.
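The factorization in Example A2.7(b) can be checked numerically. A minimal sketch (an illustration, not part of the text): the standard bivariate Gaussian distribution is rotation-invariant, so under the polar representation its angular component should be uniform on the circle and independent of the radial component.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# The standard bivariate Gaussian is invariant under rotations about the
# origin, so by Example A2.7(b) it should split as (uniform distribution
# for the angle) x (radial measure kappa) under (x, y) -> (r, theta).
x, y = rng.standard_normal(n), rng.standard_normal(n)
theta = np.arctan2(y, x) % (2 * np.pi)
r = np.hypot(x, y)

# The angle is approximately uniform: each of 8 sectors carries ~1/8 of the mass.
sector_freq, _ = np.histogram(theta, bins=8, range=(0.0, 2 * np.pi))
assert np.allclose(sector_freq / n, 1 / 8, atol=0.01)

# Independence of angle and radius: the radial distribution restricted to one
# sector matches the overall radial distribution (medians compared here).
in_sector = theta < np.pi / 4
assert abs(np.median(r[in_sector]) - np.median(r)) < 0.05
```

Here the radial measure κ is the Rayleigh distribution; any other rotation-invariant choice would factor in the same way.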
A2.8. Fourier Transforms

In this section, we collect together a few basic facts from classical Fourier transform theory. For brevity, most results are stated for Fourier transforms of functions on R ≡ R¹; the corresponding results for R^d can be obtained by no more than changes in the domain of integration and appropriate bookkeeping with multiples of 2π. Both the R^d theory and the theory of Fourier series, which can be regarded as Fourier transforms of functions defined on the unit circle, are subsumed under the concluding comments concerned with Fourier transforms of functions defined on locally compact Abelian groups. We refer to texts such as Titchmarsh (1937) for more specific material on these topics.

For any real- or complex-valued measurable (Lebesgue) integrable function f(·), its Fourier transform f̃(·) is defined by

f̃(ω) = ∫_{−∞}^{∞} e^{iωx} f(x) dx   (ω ∈ R).   (A2.8.1)

If f is real and symmetric, then so is f̃. In any case, f̃ is bounded and continuous, while the Riemann–Lebesgue lemma asserts that f̃(ω) → 0 as |ω| → ∞. Furthermore, if f̃ is integrable, then the inverse relation

f(x) = (1/2π) ∫_{−∞}^{∞} e^{−ixω} f̃(ω) dω   (A2.8.2)

holds. The theory is not symmetric with respect to f and f̃: for a more detailed account of the representation of a function by its inverse Fourier transform, see, for example, Titchmarsh (1937).

A symmetrical theory results if we consider (real- or complex-valued) functions that are square integrable. We have the Plancherel identities for square integrable functions f and g,

∫_{−∞}^{∞} f(x) \overline{g(x)} dx = (1/2π) ∫_{−∞}^{∞} f̃(ω) \overline{g̃(ω)} dω,   (A2.8.3)

and, with g = f,

∫_{−∞}^{∞} |f(x)|² dx = (1/2π) ∫_{−∞}^{∞} |f̃(ω)|² dω.   (A2.8.4)
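As a numerical sanity check of (A2.8.1) and (A2.8.4) (an illustration only; the Gaussian test function is chosen for convenience), the transform of f(x) = exp(−x²/2) under the e^{iωx} convention is √(2π) exp(−ω²/2), and both sides of the Plancherel identity equal √π.

```python
import numpy as np

x = np.linspace(-20, 20, 4001)
dx = x[1] - x[0]
f = np.exp(-x**2 / 2)

def ftilde(w):
    # (A2.8.1) by a Riemann sum on a uniform grid; the Gaussian decays so
    # fast that truncation to [-20, 20] is harmless.
    return np.sum(np.exp(1j * w * x) * f) * dx

w_grid = np.linspace(-20, 20, 2001)
ft = np.array([ftilde(w) for w in w_grid])
assert np.allclose(ft.imag, 0.0, atol=1e-8)   # f real and symmetric => ftilde real
assert abs(ftilde(1.0) - np.sqrt(2 * np.pi) * np.exp(-0.5)) < 1e-6

# Plancherel (A2.8.4): int |f|^2 dx = (1/2 pi) int |ftilde|^2 dw (= sqrt(pi)).
lhs = np.sum(np.abs(f) ** 2) * dx
rhs = np.sum(np.abs(ft) ** 2) * (w_grid[1] - w_grid[0]) / (2 * np.pi)
assert abs(lhs - np.sqrt(np.pi)) < 1e-6
assert abs(lhs - rhs) < 1e-4
```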
Here the Fourier transform cannot be obtained directly from (A2.8.1) but can be represented as a mean square limit

f̃(ω) = l.i.m._{T→∞} ∫_{−T}^{T} e^{iωx} f(x) dx,   (A2.8.5)
the existence of the finite integral following readily from the Schwarz inequality. Since the limit is defined only up to an equivalence, the theory is strictly a correspondence between equivalence classes of functions—that is, elements of the Hilbert space L²(R)—rather than between individual functions.

An important version for probability theory is concerned with the Fourier transforms of totally finite measures (or signed measures). If G is such a measure, its Fourier–Stieltjes transform g̃ is the bounded uniformly continuous function

g̃(ω) = ∫_{−∞}^{∞} e^{iωx} G(dx).   (A2.8.6)
If G is a probability measure, g̃(ω) is its characteristic function and g̃ is then a positive-definite function: for arbitrary finite families of real numbers ω₁, . . . , ω_r and complex numbers α₁, . . . , α_r,

Σ_{i=1}^{r} Σ_{j=1}^{r} α_i ᾱ_j g̃(ω_i − ω_j) ≥ 0.   (A2.8.7)
Conversely, Bochner’s theorem asserts that any function continuous at ω = 0 and satisfying (A2.8.7) can be represented as the Fourier transform of a totally finite measure G on R with G(R) = g̃(0). If we take any real or complex integrable function f with any totally finite signed measure G and apply Fubini’s theorem to the double integral

∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{iωx} f(ω) G(dx) dω,

which is certainly well defined, we obtain Parseval’s identity:

∫_{−∞}^{∞} f̃(x) G(dx) = ∫_{−∞}^{∞} f(ω) g̃(ω) dω.   (A2.8.8)
This identity is of basic importance because it shows that G is uniquely determined by g̃. Various more specific inversion theorems can be obtained by taking suitable choices of f followed by a passage to the limit: this approach is outlined in Feller (1966, Section XV.3), for example. In particular, the following two forms are traditional.

(i) For continuity intervals (a, b) of G,

G((a, b)) = lim_{T→∞} (1/2π) ∫_{−T}^{T} [(e^{−iωa} − e^{−iωb})/(iω)] g̃(ω) dω.

(ii) For an atom a of G,

G({a}) = lim_{T→∞} (1/2T) ∫_{−T}^{T} e^{−iωa} g̃(ω) dω.
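The atom formula (ii) lends itself to a direct numerical check. In this sketch (the Bernoulli example is an assumption for illustration, not from the text), G is the Bernoulli(p) distribution, with characteristic function g̃(ω) = (1 − p) + p e^{iω}; the averaged integral recovers the mass 1 − p of the atom at 0 and returns 0 at a non-atom.

```python
import numpy as np

p = 0.3                      # assumed Bernoulli parameter for this illustration
T = 2000.0
w = np.linspace(-T, T, 400_001)
dw = w[1] - w[0]
gtilde = (1 - p) + p * np.exp(1j * w)   # characteristic function of Bernoulli(p)

# Formula (ii): G({a}) = lim_{T->inf} (1/2T) int_{-T}^{T} e^{-iwa} gtilde(w) dw.
atom_mass = np.sum(np.exp(-1j * w * 0.0) * gtilde).real * dw / (2 * T)
assert abs(atom_mass - (1 - p)) < 0.01   # mass of the atom at a = 0 is 1 - p

# At a point carrying no atom (a = 0.5) the same limit is 0.
half = np.sum(np.exp(-1j * w * 0.5) * gtilde).real * dw / (2 * T)
assert abs(half) < 0.01
```

The oscillatory term p e^{iω} contributes p sin(T)/T, which vanishes as T grows; only the constant part, the atom at 0, survives the averaging.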
Much of the preceding theory can be extended without difficulty from R to the case of a locally compact Abelian topological group G. The characters of such a group are the continuous homomorphisms of the group into the complex numbers of modulus 1. If χ₁, χ₂ are characters, then so are χ₁χ₂ and χ₁⁻¹. Thus, the characters form a group in their own right, Ĝ say, the dual group for G. There is a natural topology on Ĝ, namely the smallest making the evaluation mapping e_g(χ) ≡ χ(g) continuous for each g ∈ G, and with this topology Ĝ also is a locally compact Abelian topological group. If G = R, the characters are of the form e^{iωx} (ω ∈ R), and Ĝ can be identified with another version of R. If G = Z, the group of integers, Ĝ is the circle group, and vice versa. In any case, the original group reappears as the dual of the dual group, and if G is compact, Ĝ is discrete, and conversely.

Now let H and Ĥ denote Haar measure on G and Ĝ, respectively. If f: G → R is measurable and H-integrable, its Fourier transform f̃ is the function defined on Ĝ by

f̃(χ) = ∫_G χ(g) f(g) H(dg).   (A2.8.9)

If also f̃ is Ĥ-integrable, then the inverse relation

f(g) = ∫_Ĝ \overline{χ(g)} f̃(χ) Ĥ(dχ)   (A2.8.10)

holds, provided that Ĥ is normed appropriately [otherwise, a normalizing constant such as 1/(2π) in (A2.8.2) is needed]. Assuming that such a norming has been adopted, the appropriate analogues of (A2.8.4–8) remain true. In particular, we note the generalized Plancherel identity

∫_G |f(g)|² H(dg) = ∫_Ĝ |f̃(χ)|² Ĥ(dχ).   (A2.8.11)
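For a finite Abelian group these statements reduce to familiar facts about the discrete Fourier transform. A small sketch (illustration only): on G = Z_n the characters are χ_k(g) = e^{2πikg/n}; taking Haar measure to be counting measure and dual Haar measure to be (counting measure)/n, the generalized Plancherel identity (A2.8.11) is the discrete Parseval identity.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64
f = rng.standard_normal(n)

# np.fft.fft computes F[k] = sum_g f(g) * conj(chi_k(g)), one standard DFT
# convention; the conjugate does not affect the squared moduli below.
F = np.fft.fft(f)

lhs = np.sum(np.abs(f) ** 2)        # int_G |f|^2 H(dg), H = counting measure
rhs = np.sum(np.abs(F) ** 2) / n    # int over the dual group, Hhat = counting/n
assert abs(lhs - rhs) < 1e-9        # (A2.8.11) for G = Z_n
```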
APPENDIX 3
Conditional Expectations, Stopping Times, and Martingales
This appendix contains mainly background material for Chapter 14. For further discussion and most proofs, we refer the reader to Ash (1972), Chung (1974), Brémaud (1981), and to various references cited in the text.
A3.1. Conditional Expectations

Let (Ω, E, P) be a probability space (see Section A1.4), X a random variable (r.v.) with E|X| = ∫_Ω |X| P(dω) < ∞, and G a sub-σ-algebra of events from E. The conditional expectation of X with respect to G, written E(X | G) or E_{X|G}(ω), is the G-measurable function (i.e. a random variable) defined up to values on a set of G of P-measure zero as the Radon–Nikodym derivative

E(X | G) = E_{X|G}(ω) = ξ_X^{(G)}(dω)/P^{(G)}(dω),

where ξ_X(A) = ∫_A X(ω) P(dω) is the indefinite integral of X and the superscript (G) indicates that the set functions are to be restricted to G. The G-measurability of E(X | G) implies that

∫_U X(ω) P(dω) = ∫_U E_{X|G}(ω) P(dω)   (all U ∈ G),   (A3.1.1)

an equation, usually taken as the defining relation, that determines the conditional expectation uniquely. Extending (A3.1.1) from G-measurable indicator functions I_U(ω) to more general G-measurable functions Y, we have, whenever E(|X|) and E(|XY|) exist,

E(XY) = ∫_Ω Y(ω)X(ω) P(dω) = ∫_Ω Y(ω)E_{X|G}(ω) P(dω) = E[Y E(X | G)].   (A3.1.2)
Now replacing Y by Y I_U for U ∈ G and using (A3.1.1), there follows the factorization property of conditional expectations: for G-measurable r.v.s Y for which both E(|X|) and E(|XY|) exist,

E(XY | G) = Y E(X | G)   a.s.   (A3.1.3)
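When G is generated by a discrete r.v. D, the conditional expectation is simply the within-atom average of X, and (A3.1.2) and (A3.1.3) can be checked exactly on a finite sample (a numerical illustration, not part of the text).

```python
import numpy as np

rng = np.random.default_rng(2)
m = 10_000
D = rng.integers(0, 5, size=m)          # generates the conditioning sigma-algebra G
X = rng.standard_normal(m) + D          # X depends on D
Y = np.cos(D)                           # a G-measurable r.v.

# E(X | G) is the within-atom mean of X, itself a G-measurable r.v.
group_mean = {d: X[D == d].mean() for d in range(5)}
EXg = np.array([group_mean[d] for d in D])

# (A3.1.2): E(XY) = E[Y E(X | G)] -- exact for the empirical measure.
assert abs(np.mean(X * Y) - np.mean(Y * EXg)) < 1e-9

# (A3.1.3): E(XY | G) = Y E(X | G), checked atom by atom.
for d in range(5):
    assert abs((X * Y)[D == d].mean() - np.cos(d) * group_mean[d]) < 1e-9
```

Both identities hold here to rounding error, not merely approximately, because averaging within the atoms of D is the empirical Radon–Nikodym derivative.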
Conditional expectations inherit many standard properties of ordinary expectations:

Linearity: E(Σ_{j=1}^{k} α_j X_j | G) = Σ_{j=1}^{k} α_j E(X_j | G);   (A3.1.4)

Monotonicity: X ≤ Y a.s. implies E(X | G) ≤ E(Y | G) a.s.;   (A3.1.5)

Monotone convergence: X_n ≥ 0 and X_n ↑ Y a.s. imply that E(X_n | G) ↑ E(Y | G) a.s.;   (A3.1.6)

Jensen’s inequality: for convex measurable functions f: R → R for which E[|f(X)|] < ∞,

f(E[X | G]) ≤ E[f(X) | G]   a.s.   (A3.1.7)

[in (A3.1.7), convexity means that f(½(x + y)) ≤ ½[f(x) + f(y)]].

If G₁ and G₂ are two sub-σ-algebras with G₁ ⊆ G₂ ⊆ E and E(|X|) < ∞ as before, the repeated conditioning theorem holds:

E[E(X | G₁) | G₂] = E[E(X | G₂) | G₁] = E(X | G₁),   (A3.1.8)

yielding as the special case when G₁ = {∅, Ω}

E[E(X | G)] = E(X).   (A3.1.9)

Two σ-algebras G and H are independent if, for all A ∈ G and B ∈ H, P(A ∩ B) = P(A)P(B). Given such G and H, if X is G-measurable and we seek E(X | H), we may expect it to reduce to yield

E(X | H) = E(X).   (A3.1.10)

This is a special case of the principle of redundant conditioning: if the r.v. X is independent of H [i.e. σ(X) and H are independent σ-algebras] and G is independent of H, then

E(X | G ∨ H) = E(X | G),   (A3.1.11)

reducing to (A3.1.10) for trivial G.

Let X be a c.s.m.s. and X an X-valued r.v. on (Ω, E, P). Given a sub-σ-algebra G of E, the conditional distribution of X given G is defined by analogy with (A3.1.1) by

P(X ∈ A | G) = E(I_A(X) | G)   (A ∈ B_X).   (A3.1.12)
As in Section A1.5, the question of the existence of regular conditional distributions arises. In our present context, we seek a kernel function

Q(A, ω)   (A ∈ B(X), ω ∈ Ω)

such that for fixed A, Q(A, ·) is a G-measurable function of ω [and we identify this with (A3.1.12)], while for fixed ω, we want Q(·, ω) to be a probability measure on B(X). Introduce the set function π(·) defined initially for product sets A × U for A ∈ B(X) and U ∈ G by

π(A × U) = ∫_U I_A(X(ω)) P(dω).   (A3.1.13)

Since π(·) is countably additive on such sets, it can be extended to a measure, clearly a probability, on (X × Ω, B(X) ⊗ G). Then Proposition A1.5.III can be applied and yields the following formal statement in which we identify the kernel function Q(·, ·) sought above with P(X ∈ A | G).

Proposition A3.1.I. Let X be a c.s.m.s., (Ω, E, P) a probability space, and X an X-valued r.v. defined on (Ω, E, P). If G is a sub-σ-algebra of E, then there exists a regular version of the conditional distribution P_{X∈·|G}(ω) such that
(i) P_{X∈·|G}(ω) is a probability measure on B(X) for each fixed ω;
(ii) P_{X∈A|G}(·) is a G-measurable function of ω for fixed A ∈ B(X); and
(iii) for each U ∈ G and A ∈ B(X),

∫_U P_{X∈A|G}(ω) P(dω) = ∫_U I_A(X(ω)) P(dω).   (A3.1.14)
Observe that if G = E, then the conditional distribution P_{X∈·|G}(ω) is the degenerate distribution concentrated on the point X(ω). In general, the conditional distribution represents a blurred image of this degenerate distribution, the blurring arising as a result of the incomplete information concerning X carried by the sub-σ-algebra G.

The following question is of the nature of a converse to the proposition. Given (X, B(X)), (Ω, E, P) and a regular kernel Q(A, ω), can we find a refinement E′ ⊇ E and an E′-measurable X-valued r.v. X such that Q(A, ω) coincides with P_{X∈A|G}(ω)? If we confine ourselves to the original space, this may not necessarily be possible, but by extending Ω we can accomplish our aim. Take the probability space (Ω′, E′, P′) given by Ω′ = X × Ω, E′ = B(X) ⊗ E and P′ = π as constructed via (A3.1.13) (identifying G there with E here), and consider the r.v. X: X × Ω → X for which X(ω′) = X(x, ω) = x. With the mapping T: Ω′ → Ω for which T(ω′) = T(x, ω) = ω, so that T⁻¹(E) is a sub-σ-algebra of E′, we then have

P_{X∈A|T⁻¹(E)}(ω′) = Q(A, T(ω′)) = Q(A, ω)   (A ∈ B(X)).   (A3.1.15)
Often the conditioning σ-algebra G is itself generated by some real- or (more generally) c.s.m.s.-valued r.v. Y. Then E(X | G) is called the conditional expectation of X given Y and P(X ∈ A | G) the conditional distribution of X given Y, together with the suggestive notation E(X | Y) or E_{X|Y}(ω) and P(X ∈ A | Y) or P_{X∈A|Y}(ω). Equation (A3.1.3) then implies, for any Borel-measurable function h(·) such that the unconditional expectations exist,

E[Xh(Y) | Y] = h(Y) E(X | Y).   (A3.1.16)

The terminology suggests that, although E(X | Y) is defined as an r.v., its value should depend on ω only through Y(ω). Thus, if Y takes its values in a c.s.m.s. Y, we should look for a real-valued B(Y)-measurable function h_{X|Y}(y) such that

E_{X|Y}(ω) = h_{X|Y}(Y(ω))   a.s.   (A3.1.17)

That such a function exists is the assertion of the Doob representation theorem (e.g. Doob, 1953). It can be established by applying the argument around (A3.1.1) to the measures induced on B(Y) by the equations

P_Y(B) = P(Y⁻¹(B))   (B ∈ B(Y)),
ξ_X(B) = ∫_{Y⁻¹(B)} X(ω) P(dω),

and, noting that ξ_X ≪ P_Y on B(Y), by applying the Radon–Nikodym theorem. Since the product of a finite or denumerably infinite number of c.s.m.s.s can itself be regarded as a c.s.m.s., we state the theorem in the following general form.

Proposition A3.1.II. Let (Ω, E, P) be a probability space, X an integrable real-valued r.v. on Ω, and G a sub-σ-algebra of E generated by a countable family of r.v.s Y = {Y₁, Y₂, . . .} taking their values in the c.s.m.s.s Y₁, Y₂, . . . respectively. Then, there exists a Borel-measurable function h_{X|Y}(·): Y₁ × Y₂ × · · · → R such that

E_{X|G}(ω) = h_{X|Y}(Y₁(ω), Y₂(ω), . . .)   P-a.s.   (A3.1.18)

The proposition concerning regular conditional distributions can be transformed in a similar way, yielding a kernel P_{X∈A|Y}(y₁, y₂, . . .), which is a probability distribution in A for each vector (y₁, y₂, . . .), a Borel-measurable function of the family (y₁, y₂, . . .) for each A, and satisfies

P_{X∈A|G}(ω) = P_{X∈A|Y}(Y₁(ω), Y₂(ω), . . .)   P-a.s.

When densities exist with respect to some underlying measure µ such as Lebesgue measure on R^d, the conditional distributions have the form

P_{X∈A|Y}(y₁, y₂, . . .) = ∫_A f_{X,Y}(x, y₁, y₂, . . .) µ(dx) / ∫_X f_{X,Y}(x, y₁, y₂, . . .) µ(dx),

where f_{X,Y}(·) is the joint density for X, Y₁, Y₂, . . . in the product space X × Y₁ × Y₂ × · · ·, and a similar representation holds for the conditional expectation h_{X|Y}(·).
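The density-ratio representation can be checked numerically. In this sketch (the bivariate Gaussian example is an assumption for illustration, not from the text), h_{X|Y}(y) computed as a ratio of integrals of the joint density agrees with the exact conditional mean E(X | Y = y) = ρy.

```python
import numpy as np

rho = 0.6
xs = np.linspace(-10, 10, 2001)
dx = xs[1] - xs[0]

def h_cond(y):
    # joint density f_{X,Y}(x, y) of a standard bivariate normal with
    # correlation rho, up to constants that cancel in the ratio
    fxy = np.exp(-(xs**2 - 2 * rho * xs * y + y**2) / (2 * (1 - rho**2)))
    # h_{X|Y}(y) = int x f(x, y) dx / int f(x, y) dx
    return np.sum(xs * fxy) * dx / (np.sum(fxy) * dx)

for y in (-1.5, 0.0, 2.0):
    assert abs(h_cond(y) - rho * y) < 1e-6   # exact answer: E(X | Y = y) = rho * y
```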
A3.2. Convergence Concepts

Most of the different notions of convergence and uniform integrability mentioned below are standard. Stable convergence is less familiar and is discussed in more detail.

A sequence of r.v.s {X_n: n = 1, 2, . . .} on a common probability space (Ω, E, P) converges in probability to a limit r.v. X, also defined on (Ω, E, P), if for all ε > 0,

P{|X_n − X| > ε} → 0   (n → ∞).   (A3.2.1)

The sequence converges almost surely to X if

1 = P{ω: X_n(ω) → X(ω) (n → ∞)}
  = P(⋂_{r=1}^{∞} ⋃_{n=1}^{∞} ⋂_{m≥n} {ω: |X_m(ω) − X(ω)| < 1/r})
  = P(⋂_{r=1}^{∞} ⋃_{n=1}^{∞} ⋂_{m≥n} {ω: |X_m(ω) − X_n(ω)| < 1/r}).   (A3.2.2)

Both these concepts readily generalize to the case where the r.v.s X and X_n are X-valued for some c.s.m.s. X by simply replacing the Euclidean distance |X − Y| by the metric ρ(X, Y) for X, Y ∈ X. The a.s. convergence in (A3.2.2) implies convergence in probability; convergence in probability implies the existence of a subsequence {X_{n_k}} that converges a.s. to the same limit.

Returning to the real-valued case, for any given real p ≥ 1, {X_n} converges in the mean of order p (or in pth mean, or in L^p norm) if the pth moments exist and

‖X_n − X‖_p ≡ [E(|X_n − X|^p)]^{1/p} → 0   (n → ∞),   (A3.2.3)

the norm here denoting the norm in the Banach space L^p(Ω, E, P) of equivalence classes of r.v.s with finite pth moments. Mean square convergence—i.e. convergence in L² norm—has its own notation l.i.m. (Doob, 1953, p. 8) as in Section 8.4. For p = ∞, the space L^∞(Ω, E, P) consists of P-essentially bounded r.v.s X; that is, r.v.s X for which |X| ≤ M a.s. for some M < ∞; then

‖X‖_∞ = ess sup |X(ω)| = inf{M: |X(ω)| ≤ M a.s.}.   (A3.2.4)

If X_n → X in pth mean, then E(X_n^p) → E(X^p) (n → ∞). Chebyshev’s inequality, in the form for an L^p r.v. X,

P{|X − a| > ε} ≤ ε^{−p} E(|X − a|^p)   (ε > 0, real a),   (A3.2.5)
shows that convergence in L^p norm implies convergence in probability. The converse requires the additional condition of uniform integrability.

Definition A3.2.I. A family of real-valued r.v.s {X_t: t ∈ T} defined on the common probability space (Ω, E, P) is uniformly integrable if, given ε > 0, there exists M < ∞ such that

∫_{|X_t|>M} |X_t(ω)| P(dω) < ε   (all t ∈ T).   (A3.2.6)
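A standard counterexample shows why a condition of this kind is needed. In this sketch (an illustration, not from the text), X_n = n I{U < 1/n} with U uniform on (0, 1) converges to 0 in probability while E(X_n) = 1 for every n, so convergence in L¹ norm fails and the family is not uniformly integrable.

```python
import numpy as np

rng = np.random.default_rng(3)
m = 1_000_000
U = rng.uniform(size=m)

for n in (10, 100, 1000):
    Xn = n * (U < 1 / n)                 # X_n = n on a set of probability 1/n
    assert np.mean(Xn > 0.5) < 2 / n     # X_n -> 0 in probability
    assert abs(Xn.mean() - 1.0) < 0.2    # but E(X_n) = 1 for every n
    # The tail integral in (A3.2.6) with M = 5 is the whole of E(X_n),
    # so it does not tend to 0 uniformly in n: the family is not u.i.
    tail = np.mean(Xn * (Xn > 5.0))
    assert tail > 0.7
```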
Proposition A3.2.II. Let the r.v.s {X_n: n = 1, 2, . . .} and X be defined on a common probability space (Ω, E, P) and be such that X_n → X in probability. Then, a necessary and sufficient condition for the means to exist and for X_n → X in L¹ norm is that the sequence {X_n} be uniformly integrable.

Applied to the sequence {X_n^p} and noting the inequality E(|X_n − X|^p) ≤ 2^{p−1}[E(|X_n|^p) + E(|X|^p)] (1 ≤ p < ∞), the proposition extends in an obvious way to convergence in L^p norm for 1 ≤ p < ∞.

A weaker concept than convergence in L^p norm [i.e. strong convergence in the Banach space L^p(Ω, E, P)] is that of weak L^p convergence, namely, that if X_n and X ∈ L^p, then E(X_n Y) → E(XY) (n → ∞) for all Y ∈ L^q, where p⁻¹ + q⁻¹ = 1.

Let X_n be X-valued for a c.s.m.s. X with metric ρ. X_n converges in distribution if P{X_n ∈ A} → P{X ∈ A} for all A ∈ B(X) for which P{X ∈ ∂A} = 0. This type of convergence is not so much a constraint on the r.v.s as a constraint on the distributions they induce on B(X): indeed, it is precisely weak convergence of their induced distributions. If X_n → X in probability (or, a fortiori, if X_n → X a.s. or in L^p norm), then from the inequalities

P{X_n ∈ A} − P{X ∈ A} ≤ P({X_n ∈ A} ∩ {X ∉ A})
  ≤ P({X_n ∈ A} ∩ {X ∈ (A^ε)^c}) + P{X ∈ A^ε} − P{X ∈ A}
  ≤ P{ρ(X_n, X) > ε} + P{X ∈ A^ε} − P{X ∈ A},

it follows that X_n → X in distribution, also written X_n →_d X. No general converse statement is possible except when X is degenerate; that is, X = a a.s. for some a ∈ X. For this exceptional case, X_n →_d a means that for any positive ε,

P{ρ(X_n, a) < ε} = P{X_n ∈ S_ε(a)} → 1   (n → ∞),

and this is the same as X_n → a in probability.

A hybrid concept, in the sense that it depends partly on the r.v.s X_n themselves and partly on their distributions, is that of stable convergence.

Definition A3.2.III. If {X_n: n = 1, 2, . . .} and X are X-valued r.v.s on (Ω, E, P) and F is a sub-σ-algebra of E, then X_n → X (F-stably) in distribution if for all U ∈ F and all A ∈ B(X) with P{X ∈ ∂A} = 0,

P({X_n ∈ A} ∩ U) → P({X ∈ A} ∩ U)   (n → ∞).   (A3.2.7)
The hybrid nature of stable convergence is well illustrated by the facts that when F = {∅, Ω}, F-stable convergence is convergence in distribution, whereas when F ⊇ σ(X), we have convergence in probability, because the regular version P_{X∈A|F}(ω) of the conditional distribution appearing in P({X ∈ A} ∩ U) = ∫_U P_{X∈A|F}(ω) P(dω) can be taken as being {0, 1}-valued, and when such degenerate distributions for the limit r.v. occur, the concepts of convergence in distribution and in probability coincide, as already noted.
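A numerical sketch of (A3.2.7) in the classical central limit setting (all choices here are assumptions for illustration, not from the text): with iid N(0, 1) summands, F = σ(X₁) and U = {X₁ > 0}, the normalized sum S_n/√n has a limit independent of F, so P({S_n/√n ≤ a} ∩ U) → Φ(a)P(U); equivalently, conditioning on U does not change the limit distribution.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(4)
m, n = 400_000, 10_000
X1 = rng.standard_normal(m)
rest = rng.standard_normal(m) * sqrt((n - 1) / n)   # (X_2 + ... + X_n)/sqrt(n)
Zn = X1 / sqrt(n) + rest                            # Z_n = S_n / sqrt(n)

U = X1 > 0                                          # U in F = sigma(X_1)
Phi = lambda a: 0.5 * (1 + erf(a / sqrt(2)))        # standard normal c.d.f.

for a in (-1.0, 0.0, 1.0):
    # P(Z_n <= a | U) is already close to Phi(a): the limit forgets X_1.
    assert abs(np.mean(Zn[U] <= a) - Phi(a)) < 0.02
```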
In general, stable convergence always implies weak convergence, and it may be regarded as a form of weak convergence of the conditional distributions P(X_n ∈ A | F). Just as weak convergence can be expressed in equivalent ways, so too can stable convergence as follows (see Aldous and Eagleson, 1978).

Proposition A3.2.IV. Let {X_n}, X and F be as in Definition A3.2.III. Then, the conditions (i)–(iv) below are equivalent.
(i) X_n → X (F-stably); that is, (A3.2.7) holds.
(ii) For all F-measurable P-essentially bounded r.v.s Z and all bounded continuous h: X → R,

E[Zh(X_n)] → E[Zh(X)]   (n → ∞).   (A3.2.8)

(iii) For all real-valued F-measurable r.v.s Y, the pairs (X_n, Y) converge jointly in distribution to the pair (X, Y).
(iv) For all bounded continuous functions g: X × R → R and all real-valued F-measurable r.v.s Y,

g(X_n, Y) → g(X, Y)   (F-stably).   (A3.2.9)

If X = R^d, then any of (i)–(iv) is equivalent to condition (v).
(v) For all real vectors t ∈ R^d and all P-essentially bounded F-measurable r.v.s Z,

E[Z exp(i t′X_n)] → E[Z exp(i t′X)]   (n → ∞).   (A3.2.10)
Proof. Equation (A3.2.7) is the special case of (A3.2.8) with Z = I_U(ω) and h(x) = I_A(x) for U ∈ F and A ∈ B(X), except that such h(·) is not in general continuous: as in the continuity theorem for weak convergence, (A3.2.8) can be extended to the case where h is bounded and Borel measurable and P{X ∈ ∂h} = 0, where ∂h is the set of discontinuities of h. When X = R^d, (A3.2.10) extends the well-known result that joint convergence of characteristic functions is equivalent to weak convergence of distributions. Note that all of (A3.2.7), (A3.2.8), and (A3.2.10) are contracted versions of the full statement of weak convergence in L¹ of the conditional distributions; namely, that

E(Z E[h(X_n) | F]) → E(Z E[h(X) | F])   (n → ∞)   (A3.2.11)

for arbitrary (not necessarily F-measurable) r.v.s Z. However, (A3.2.11) can immediately be reduced to the simpler contracted forms by using the repeated conditioning theorem, which shows first that it is enough to consider the case that Z is F-measurable and second that when Z is F-measurable, the conditioning on F can be dropped.

If Y is real-valued and F-measurable and in (A3.2.7) we set U = Y⁻¹(B) for B ∈ B(R), we obtain

P{(X_n, Y) ∈ A × B} → P{(X, Y) ∈ A × B},

from which (iii) follows. Conversely, taking Y = I_U in (iii) yields (A3.2.7).
Finally, for any two real-valued F-measurable r.v.s Y, Z, repeated application of (iii) shows that (X_n, Y, Z) converges weakly in distribution to the triple (X, Y, Z). Applying the continuous mapping theorem (Proposition A2.2.VII) yields the result that the pairs (g(X_n, Y), Z) converge weakly in distribution to (g(X, Y), Z), which is equivalent to the stable convergence of g(X_n, Y) to g(X, Y) by (iii). Since stable convergence implies weak convergence, (iv) implies (iii).

When the limit r.v. is independent of the conditioning σ-algebra F, we have a special case of some importance: (A3.2.7) and (A3.2.10) then reduce to the forms

P(X_n ∈ A | U) → P{X ∈ A}   (P(U) > 0)   (A3.2.12)

and

E[Z exp(i t′X_n)] → E(Z) E[exp(i t′X)],   (A3.2.13)

respectively. In this case, the X_n are said to converge F-mixing to X.

In applications, it is often the case that the left-hand sides of relations such as (A3.2.7) converge as n → ∞, but it is not immediately clear that the limit can be associated with the conditional distribution of a well-defined r.v. X. Indeed, in general there is no guarantee that such a limit r.v. will exist, but we can instead extend the probability space in such a way that on the extended space a new sequence of r.v.s can be defined with effectively the same conditional distributions as for the original r.v.s and for which there is F-stable convergence in the limit to a proper conditional distribution.

Lemma A3.2.V. Suppose that for each U ∈ F and for A in some covering ring generating B(X), the sequences {P({X_n ∈ A} ∩ U)} converge. Then, there exists a probability space (Ω′, E′, P′), a measurable mapping T: (Ω′, E′) → (Ω, E), and an r.v. X′ defined on (Ω′, E′) such that if F′ = T⁻¹F and X′_n(ω′) = X_n(Tω′), then X′_n → X′ (F′-stably).

Proof. Set Ω′ = X × Ω, and let E′ be the smallest σ-algebra of subsets of Ω′ containing both B(X) ⊗ F and also X × E. Defining T by T(x, ω) = ω, we see that T is measurable. Also, for each A ∈ B(X) and U ∈ F, the limit π(A × U) = lim_{n→∞} P({X_n ∈ A} ∩ U) exists by assumption and defines a countably additive set function on such product sets. Similarly, we can set π(X × B) = lim_{n→∞} P({X_n ∈ X} ∩ B) = P(B) for B ∈ E. Thus, π can be extended to a countably additive set function, P′ say, on E′. Observe that F′ = T⁻¹F consists of all sets X × U for U ∈ F. Define also X′(x, ω) = x. Then, for U′ = X × U ∈ F′,

P′({X′_n ∈ A} ∩ U′) = P({X_n ∈ A} ∩ U) → P′(A × U) = P′({X′ ∈ A} ∩ U′),

so that X′_n converges to X′ F′-stably.

Each of the conditions (i)–(v) of Proposition A3.2.IV consists of a family of sequences, involving r.v.s X_n converging in some sense, and the family of
the limits is identified with a family involving a limit r.v. X. It is left to the reader to verify via Lemma A3.2.V that if we are given only the convergence parts of any of these conditions, then the conditions are still equivalent, and it is possible to extend the probability space and construct a new sequence of r.v.s X′_n with the same joint probability distributions as the original X_n together with a limit r.v. X′ such that X′_n → X′, F′-stably, and so on. In a similar vein, there exists the following selection theorem for stable convergence.

Proposition A3.2.VI. Let {X_n} be a sequence of X-valued r.v.s on (Ω, E, P) and F a sub-σ-algebra of E. If
(i) either F is countably generated or F ⊇ σ(X₁, X₂, . . .), and
(ii) the distributions of the {X_n} converge weakly on B(X),
then there exists an extended probability space (Ω′, E′, P′), elements T, F′, X′_n defined as in Lemma A3.2.V, a sequence {n_k}, and a limit r.v. X′ such that {X′_{n_k}} converges to X′, F′-stably, as k → ∞.

Proof. Suppose first that F is countably generated, and denote by R some countable ring generating F. For each U ∈ R, the measures on B(X) defined by Q_n(A; U) = P({X_n ∈ A} ∩ U) are uniformly tight because they are strictly dominated by the uniformly tight measures P({X_n ∈ A}). Thus, they contain a weakly convergent subsequence. Using a diagonal selection argument, the subsequence can be so chosen that convergence holds simultaneously for all U ∈ R. Therefore, we can assume that the sequence {Q_{n_k}(A; U)} converges as k → ∞ to some limit Q(A; U) for all A that are continuity sets of this limit measure and for all U ∈ R.

Given ε > 0 and B ∈ F, there exist U, V ∈ R such that U ⊆ B ⊆ V and P(U) ≥ P(V) − ε. Then, the two extreme terms in the chain of inequalities

lim_{k→∞} Q_{n_k}(A; U) ≤ lim inf_{k→∞} P({X_{n_k} ∈ A} ∩ B)
  ≤ lim sup_{k→∞} P({X_{n_k} ∈ A} ∩ B) ≤ lim_{k→∞} Q_{n_k}(A; V)

differ by at most ε, so the sequence {P({X_{n_k} ∈ A} ∩ B)} also converges. The construction of an extended probability space (Ω′, E′, P′) and a limit r.v. X′ now follows as in the lemma, establishing the proposition in the case where F is countably generated.

To treat the case where F ⊇ σ(X₁, X₂, . . .), consider first the case where F = F₀ ≡ σ(X₁, X₂, . . .). This is countably generated because X is separable and only a countable family of r.v.s is involved. Applying the selection argument and extension of the probability space, we can conclude from (A3.2.10) that

E[Zh(X′_{n_k})] → E[Zh(X′)]   (any F′₀-measurable Z).   (A3.2.14)
Now let Z′ be any F′-measurable r.v. (where F′ ⊃ F′₀). Because h(X′_{n_k}) is F′₀-measurable, we can write

E[Z′ h(X′_{n_k})] = E[E(Z′ | F′₀) h(X′_{n_k})],

and the convergence follows from (A3.2.14) by the F′₀-measurability of E(Z′ | F′₀). Thus, for any such Z′, E[Z′ h(X′_{n_k})] → E[Z′ h(X′)], implying that X′_{n_k} → X′ (F′-stably).

A systematic account of the topology of stable convergence when F = E but no limit r.v. is assumed is given by Jacod and Memin (1984).
A3.3. Processes and Stopping Times

This section is primarily intended as background material for Chapter 14, where the focus is on certain real-valued stochastic processes denoted {Xt(ω)} = {X(t, ω)} = {X(t)} on the positive time axis, t ∈ (0, ∞) ≡ R+. Other time domains—finite intervals, or R, or (subsets of) the integers Z = {0, ±1, . . .}—can be considered: it is left to the reader to supply appropriate modifications to the theory as needed. Our aim here is to give just so much of the measure-theoretic framework as we hope will make our text intelligible. For a detailed discussion of this framework, texts such as Dellacherie (1972), Dellacherie and Meyer (1978) or Elliott (1982) should be consulted. Condensed accounts of selected results such as given here are also given in Brémaud (1981), Kallianpur (1980), and Liptser and Shiryayev (1977).
While a stochastic process X(t, ω) may be regarded as an indexed family of random variables on a common probability space (Ω, E, P), with index set here taken to be R+, it is more appropriate for our purposes, as in the general theory, to regard it as a function on the product space R+ × Ω. The stochastic process X: R+ × Ω → R is measurable when this mapping is measurable with respect to the product σ-algebra B(R+) ⊗ E; that is, for all A ∈ B(R),

{(t, ω): X(t, ω) ∈ A} ∈ B(R+) ⊗ E,   (A3.3.1)
where the right-hand side denotes the product σ-algebra of the two σ-algebras there. As a consequence of this measurability and Fubini’s theorem, X(·, ω): R+ → R is a.s. measurable, while for measurable functions h: R → R,

Y(ω) ≡ ∫R+ h(X(t, ω)) dt

is a random variable provided the integral exists. A stochastic process on R+, if defined merely as an indexed family of r.v.s on a common probability space, is necessarily measurable if, for example, the trajectories are either a.s. continuous or a.s. monotonic and right-continuous.
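The integral Y(ω) can be evaluated path by path. A minimal numerical sketch (ours, not from the text), assuming the trajectory is that of a counting process, so the integral reduces to a finite sum over the intervals on which the path is constant:

```python
import numpy as np

def path_integral_Y(jump_times, h, t_max):
    """Y(omega) = integral of h(X(t, omega)) over (0, t_max] for the path
    X(t) = #{jumps <= t}: piecewise constant, so the integral is a finite
    sum of h(level) times the length of each interval of constancy."""
    times = np.concatenate(([0.0], np.sort(jump_times), [t_max]))
    levels = np.arange(len(times) - 1)        # X = 0, 1, 2, ... between jumps
    lengths = np.diff(times)
    return float(np.sum(h(levels) * lengths))

rng = np.random.default_rng(0)
# one omega = one realisation of the jump times of a rate-3 Poisson process on (0, 1]
jumps = rng.uniform(0.0, 1.0, size=rng.poisson(3.0))
y = path_integral_Y(jumps, lambda x: x**2, 1.0)   # a single realisation of Y
```

For the constant path (no jumps) the routine returns h(0)·t_max exactly, as the definition requires.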
APPENDIX 3. Conditional Expectations, Stopping Times, Martingales
The main topic we treat concerns the evolution of a stochastic process; that is, we observe {X(s, ω): 0 < s ≤ t} for some (unknown) ω and finite time interval (0, t]. It is then natural to consider the σ-algebra

Ft(X) ≡ σ{X(s, ω): 0 < s ≤ t}

generated by all possible such evolutions. Clearly,

Fs(X) ⊆ Ft(X)   for 0 < s < t < ∞.

Of course, we may also have some foreknowledge of the process X, and this we represent by a σ-algebra F0. Quite generally, an expanding family F = {Ft: 0 ≤ t < ∞} of sub-σ-algebras of E is called a filtration or history, and we concentrate on those histories that incorporate information on the process X. For this purpose, we want the r.v. X(t, ω) to be Ft-measurable (all t); we then say that X is F-adapted. We adopt the special notation

H = {Ft(X): 0 ≤ t ≤ ∞} ≡ {Ht: 0 ≤ t ≤ ∞},

where F0(X) = ⋂t>0 Ft(X) = {∅, Ω} and F∞(X) = ⋁t>0 Ft(X), and call H the internal, minimal, or natural history of the process X, both of these last two names reflecting the fact that H is the smallest family of nested σ-algebras to which X is adapted. Any history of the form F = {F0 ∨ Ht: 0 ≤ t ≤ ∞} is called an intrinsic history.
Suppose X is measurable and F-adapted. An apparently stronger condition to impose on X is that of progressive measurability with respect to F, meaning that for every t ∈ R+ and any A ∈ B(R),

{(s, ω): 0 < s ≤ t, X(s, ω) ∈ A} ∈ B((0, t]) ⊗ Ft.   (A3.3.2)

Certainly, (A3.3.2) is more restrictive on X than (A3.3.1), and while (A3.3.2) implies (A3.3.1), the converse is not quite true. What can be shown, however, is that given any measurable F-adapted R-valued process X, we can find an F-progressively measurable process Y (that is therefore measurable and F-adapted) that is a modification of X in the sense of being defined (like X) on (Ω, E, P) and satisfying

P{ω: X(t, ω) = Y(t, ω)} = 1   (all t)   (A3.3.3)
(see e.g. Dellacherie and Meyer, 1978, Chapter IV, Theorems 29 and 30). The sets of the form [s, t] × U, 0 ≤ s < t, U ∈ Ft, generate a σ-algebra on R+ × Ω, which may be called the F-progressive σ-algebra. Then the requirement that the process X be F-progressively measurable may be rephrased as the requirement that X(t, ω) be measurable with respect to the F-progressive σ-algebra.
A more restrictive condition to impose on X is that it be F-predictable (the term F-previsible is also used). Call the sub-σ-algebra of B(R+) ⊗ E generated by product sets of the form (s, t] × U, where U ∈ Fs, t ≥ s, and 0 ≤ s < ∞, the predictable σ-algebra, denoted ΨF. (The terminology is well chosen because it reflects what can be predicted at some ‘future’ time t given the evolution of the process—as revealed by sets U ∈ Fs—up to the ‘present’ time s.) Then X is F-predictable when it is ΨF-measurable; that is, for any A ∈ B(R), {(t, ω): X(t, ω) ∈ A} ∈ ΨF. The archetypal F-predictable process is left-continuous, and this is reflected in Lemma A3.3.I below, in which the left-continuous history F(−) ≡ {Ft−} associated with F appears: here, F0− = F0 and Ft− = ⋁s<t Fs (t > 0).

Lemma A3.3.I. Every F-predictable process is F(−)-adapted.
Proof. Consider first the indicator process

X(t, ω) = I(a,b](t) IU(ω)   (0 < a < b < ∞, U ∈ Fa),   (A3.3.4)

which is F-predictable by construction of ΨF. For given t,

{ω: X(t, ω) = 1} = ∅ (if a ≥ t or b < t),  = U (if a < t ≤ b),
so X(t, ω) is Ft−-measurable. Since an arbitrary F-predictable function can be approximated by a linear combination of functions of this type, and since the class of F(−)-adapted processes is closed under linear combinations and monotone limits, standard extension arguments complete the proof.
Indicator functions as in (A3.3.4), and linear combinations of them, can be used to show that the F-predictable σ-algebra ΨF above can be characterized as the σ-algebra generated by the class of bounded left-continuous F-adapted processes (see e.g. Kallianpur, 1980, Lemma 3.1.1).
It is often important to examine the behaviour of a process not at a fixed time t but rather at a random time T = T(ω). Here the definition of stopping time is fundamental.

Definition A3.3.II. Given a history F, a nonnegative r.v. T: Ω → [0, ∞] is an F-stopping time if {ω: T(ω) ≤ t} ∈ Ft (0 ≤ t < ∞).

If S, T are stopping times, then so are S ∧ T and S ∨ T. Indeed, given a family {Tn: n = 1, 2, . . .} of stopping times, supn≥1 Tn is an F-stopping time, while inf n≥1 Tn is an F(+)-stopping time.
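As an elementary discrete-time illustration of Definition A3.3.II (ours, not from the text): for the level-crossing time of a nondecreasing trajectory, whether the event {T ≤ n} has occurred is decidable from the path up to time n alone, which is exactly the stopping-time property (and anticipates Lemma A3.3.III below).

```python
import numpy as np

def first_passage(path, level):
    """T(omega) = inf{n: X_n >= level}; len(path) stands in for 'never reached'."""
    hits = np.flatnonzero(np.asarray(path) >= level)
    return int(hits[0]) if hits.size else len(path)

def event_T_le_n(prefix, level):
    """Decide {T <= n} from the trajectory up to time n only -- the defining
    F_n-measurability of a stopping time."""
    return any(x >= level for x in prefix)

rng = np.random.default_rng(1)
path = np.cumsum(rng.exponential(1.0, size=50))   # one nondecreasing trajectory
T = first_passage(path, 20.0)
for n in range(len(path)):
    # the prefix-based decision agrees with what the full path says about {T <= n}
    assert event_T_le_n(path[: n + 1], 20.0) == (T <= n)
```

The closure properties quoted above are visible in the same setting: {S ∧ T ≤ n} = {S ≤ n} ∪ {T ≤ n} and {S ∨ T ≤ n} = {S ≤ n} ∩ {T ≤ n}, both decidable from the first n observations.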
Since {T(ω) = ∞} = ⋂n {T(ω) > n} ∈ F∞, we can also consider extended stopping times as those for which P{T(ω) < ∞} < 1.
While stopping times can be generated in various ways, the most common method is as a first passage time, which for a nondecreasing process usually arises as a level-crossing time.

Lemma A3.3.III. Let X be an F-adapted monotonically increasing right-continuous process, and let Y be an F0-measurable r.v. Then T(ω) ≡ inf{t: X(t, ω) ≥ Y(ω)} is an F-stopping time, possibly extended, while if X is F-predictable, then T is an (extended) F(−)-stopping time.
Proof. If Y is constant, X(t) ≥ Y if and only if T ≤ t, and since {ω: X(t, ω) ≥ Y} ∈ Ft, we also have {T(ω) ≤ t} ∈ Ft. More generally, X(t, ω) − Y(ω) is monotonically increasing, right-continuous, and F-adapted (because Y, being F0-measurable, is necessarily Ft-measurable for every t > 0). Then, by the same argument, {T(ω) ≤ t} = {ω: X(t, ω) − Y(ω) ≥ 0} ∈ Ft. Finally, when X is F-predictable, it is F(−)-adapted, and thus we can replace Ft by Ft− throughout.

The next result shows that a process stopped at an F-stopping time T inherits some of the regularity properties of the original process. Here we use the notation

X(t ∧ T) = X(t) (t ≤ T),  = X(T) (t > T).

Proposition A3.3.IV. Let F be a history, T an F-stopping time, and X a process. Then X(t ∧ T) is measurable, F-progressive, or F-predictable, according to whether X(t) itself is measurable, F-progressive, or F-predictable. In all these cases, if T < ∞ a.s., then X(T) is an F∞-measurable r.v.
Proof. The product σ-algebra B(R+) ⊗ E is generated by sets of the form (a, ∞) × B for real finite a and B ∈ E. Since

{(t, ω): (t ∧ T(ω), ω) ∈ (a, ∞) × B} = (a, ∞) × (B ∩ {T(ω) > a})

and B ∩ {T(ω) > a} ∈ E, if X is measurable, so is Y(t, ω) ≡ X(t ∧ T(ω), ω). The F-predictable σ-algebra ΨF is generated by sets of a similar product form but with B ∈ Fa.
Since {T(ω) > a} ∈ Fa, (a, ∞) × (B ∩ {T(ω) > a}) is also a set generating ΨF, and thus if X is F-predictable, so is Y as before. Suppose now that X is F-progressive so that for given t in 0 < t < ∞, {X(s, ω): 0 < s ≤ t} is measurable as a process on (0, t] with probability space (Ω, Ft, P). Then, the first argument shows that Y(s) ≡ X(s ∧ T) is a measurable process on this space; that is, X(t ∧ T) is F-progressive.
On the set {T < ∞}, X(t ∧ T) → X(T) as t → ∞, so when P{T < ∞} = 1, X(T) is an r.v. as asserted.
As an important corollary to this result, observe that if X is F-progressive and a.s. integrable on finite intervals, then

Y(t, ω) = ∫₀ᵗ X(s, ω) ds

is F-progressive, Y(T) is an r.v. if T < ∞ a.s., and Y(t ∧ T) is again F-progressive.
We conclude this section with some remarks about the possibility of a converse to Lemma A3.3.I. In the case of a quite general history, no result of this kind holds, as is shown by the discussion in Dellacherie and Meyer (1978), especially around Chapter IV, Section 97. On the other hand, it is shown in the same reference that when X is defined on the canonical measure space (M#[0,∞), B(M#[0,∞))), the two concepts of being F(−)-adapted and F-predictable can be identified, a fact exploited in the treatment by Jacobsen (1982). The situation can be illustrated further by the two indicator processes

VT−(t, ω) ≡ I{T(ω)<t}(t, ω),   VT+(t, ω) ≡ I{T(ω)≤t}(t, ω),
generated by an F-stopping time T. The trajectories of VT+ are right-continuous while those of VT− are left-continuous. Since {ω: T(ω) ≤ t} = {ω: VT+(t) = 1} ∈ Ft, it follows that VT+ is F-adapted. So too is VT−, because

{ω: VT−(t) = 1} = {ω: T(ω) < t} = ⋃n≥1 {ω: T(ω) ≤ t − 1/n} ∈ Ft.

Hence, both VT+ and VT− are F-progressively measurable [see the earlier comments or Brémaud (1981, Theorem A1.T33)]. Being left-continuous, VT− is F-predictable (e.g. Brémaud, 1981, Theorem 1.T9) and hence also F(−)-adapted.
No such statement can be made in general about VT+. However, suppose further that T is not only an F-stopping time but also an F(−)-stopping time, so that from the above, VT+ is F(−)-adapted. Can we assert that it is F-predictable? Suppose T is a countably-valued r.v., so for some countable set {tk} ⊂ R+,

T⁻¹({tk: k = 1, 2, . . .}) = ⋃k≥1 T⁻¹(tk) = ⋃k≥1 Uk, say, = Ω.

Then

{(t, ω): VT+(t, ω) = 1} = ⋃k≥1 [tk, ∞) × Uk.

By assumption, T being an F(−)-stopping time, Uk ∈ Ftk−, so Uk ∈ σ(⋃n Ftk−1/n), and hence VT+ is F-predictable. While it can be proved that any F-stopping time can be approximated from above by a sequence of stopping times taking only a countable set of values, this is not enough to treat the general case—indeed, the counterexample considered by Dellacherie and Meyer is just of this indicator function type.
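The countably-valued approximation just mentioned is usually realised by dyadic discretisation. A small sketch of it (ours, not from the text): Tₙ = ⌈2ⁿT⌉/2ⁿ takes values in the countable grid {k/2ⁿ}, satisfies T ≤ Tₙ, and decreases to T as n grows, and each Tₙ is again a stopping time since {Tₙ ≤ t} = {T ≤ ⌊2ⁿt⌋/2ⁿ} ∈ Ft.

```python
import math

def dyadic_approx(T, n):
    """T_n = ceil(2^n * T) / 2^n: a countably-valued approximation of the
    stopping time T from above, nonincreasing in n and converging to T."""
    return math.ceil((2 ** n) * T) / (2 ** n)

T = 0.7371
approximants = [dyadic_approx(T, n) for n in range(1, 12)]
assert all(a >= T for a in approximants)                              # from above
assert all(a >= b for a, b in zip(approximants, approximants[1:]))    # nonincreasing
assert approximants[-1] - T < 2 ** -10                                # converging
```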
A3.4. Martingales

Definition A3.4.I. Let (Ω, E, P) be a probability space, F a history on (Ω, E), and X(·) ≡ {X(t): 0 ≤ t < ∞} a real-valued process adapted to F and such that E(|X(t)|) < ∞ for 0 ≤ t < ∞. Then X is an F-martingale if, for 0 ≤ s < t < ∞,

E[X(t) | Fs] = X(s)   a.s.,   (A3.4.1)

an F-submartingale if

E[X(t) | Fs] ≥ X(s)   a.s.,   (A3.4.2)

and an F-supermartingale if the reverse inequality in (A3.4.2) holds.

Strictly, we should speak of X as a P-F-martingale: mostly, it is enough to call it a martingale since both P and F are clear from the context.
While the concept of a martingale had its origins in gambling strategies, it has come to play a dominant role in the modern theory of stochastic processes. In our text, we need only a small number of the many striking results concerning martingales and their relatives, principally those connected with stopping times and the Doob–Meyer decomposition.
An important example of a martingale is formed from an F∞-measurable r.v. X∞ with finite mean by taking successive conditional expectations with respect to F: define

X(t) = E(X∞ | Ft).   (A3.4.3)

Such a martingale is uniformly integrable. The converse statement is also true (see e.g. Liptser and Shiryayev, 1977, Theorem 3.6).

Proposition A3.4.II. Let X(·) be a uniformly integrable F-martingale. Then, there exists an F∞-measurable r.v. X∞ such that (A3.4.3) holds.

The following form of the well-known convergence theorem can be found in Liptser and Shiryayev (1977, Theorem 3.3).

Theorem A3.4.III. Let X(·) be an F-submartingale with a.s. right-continuous trajectories. If sup0≤t<∞ E[max(0, X(t))] < ∞, then there exists an F∞-measurable r.v. X∞ such that

X(t, ω) → X∞(ω)   (t → ∞)   a.s.
If also X(·) is uniformly integrable, then E(|X∞|) < ∞ and E(|X(t) − X∞|) → 0 as t → ∞; that is, X(t) → X∞ in L1 norm.
This theorem can be applied to the example in (A3.4.3) whether the family of σ-algebras {Ft} is increasing (as with a history F) or decreasing. For convenience, we state the result in terms of a two-sided history G = {Gt: −∞ < t < ∞}, defining G∞ as usual and G−∞ = ⋂−∞<t<∞ Gt.
Corollary A3.4.IV. If the r.v. Y is G∞-measurable, has finite first moment, and Y(t) ≡ E(Y | Gt) has a.s. right-continuous trajectories on −∞ < t < ∞ for some two-sided history G, then

E(Y | Gt) → Y (t → ∞),   E(Y | Gt) → E(Y | G−∞) (t → −∞),   (A3.4.4)

both a.s. and in L1 norm.

In most point process applications, the processes concerned are right-continuous by definition, so the sample-path conditions for the convergence results above are automatically satisfied. In the general theory of processes, it is shown that, if the history F is right-continuous and the σ-algebras are P-complete in the strong sense that F0 (and hence Ft for all t > 0) contains all P-null sets from F∞, there always exists a right-continuous modification of an F-submartingale, with the additional property that this modification also has left limits at each t > 0; that is, the (modified) process is càdlàg [see e.g. Liptser and Shiryayev (1977, pp. 55–59) or Dellacherie and Meyer (1980); Elliott (1982) uses corlol, the acronym of the English equivalent, continuous on right, limits on left].
In turning to properties of martingales with fixed times s, t replaced by stopping times S, T, say, we need the notion of σ-algebras consisting of events prior to (and including) the time T and also strictly prior to T.

Definition A3.4.V. Let F be a history and T an F-stopping time. The T-prior σ-algebra FT is the sub-σ-algebra of F∞ defined by

FT = {A: A ∈ F∞ and A ∩ {T ≤ t} ∈ Ft for every t};

the strict T-prior σ-algebra FT− is generated by the sets

{A: A ∈ F0} ∪ {A ∩ {T > t}: A ∈ Ft, t ≥ 0}.

Clearly, FT and FT− are somewhat different entities (see Dellacherie and Meyer, 1978, p. 117). It can be checked that T is both FT- and FT−-measurable. A contrast is provided in the next result.

Lemma A3.4.VI. Let F be a history, T an F-stopping time, and X(·) an F-progressive process. Then X(T) is FT-measurable. Further, if X(·) is F-predictable, then X(T) is FT−-measurable.
Proof.
Suppose X(·) is F-progressive. Setting Ax = {ω: X(T(ω), ω) ≤ x} for any x ∈ R, X(T) is FT-measurable if Ax ∩ {T ≤ t} ∈ Ft. But from Proposition A3.3.IV, X(t ∧ T) is F-progressive, and therefore F-adapted, so {ω: X(t ∧ T(ω), ω) ≤ x} ∈ Ft; hence

Ax ∩ {T ≤ t} = {ω: X(t ∧ T(ω), ω) ≤ x} ∩ {T ≤ t} ∈ Ft.
Now suppose that X(·) is F-predictable. To show the FT−-measurability of X(T), look at the inverse image under X(T): ω → X(T(ω), ω) ∈ R of a generating set (t, ∞) × A (A ∈ Ft) of the F-predictable σ-algebra ΨF, namely {ω: t < T(ω) < ∞} ∩ {ω: ω ∈ A}, which is a generating set for FT−.
The optional sampling theorem for martingales follows (see e.g. Liptser and Shiryayev, 1977, pp. 60–61).

Theorem A3.4.VII. Let F be a history, S and T F-stopping times with S ≤ T a.s., and X(·) an F-submartingale that is uniformly integrable and has right-continuous trajectories. Then FS ⊆ FT and

E[X(T) | FS] ≥ X(S)   a.s.,

where equality holds if X is an F-martingale.

Corollary A3.4.VIII. Let T be an F-stopping time. If X(·) is a uniformly integrable F-martingale (resp. submartingale), then so is X(t ∧ T).
Proof. For fixed s, t with s < t, s ∧ T and t ∧ T are two stopping times satisfying the conditions of the theorem, so E[X(t ∧ T) | Fs∧T] ≥ X(s ∧ T), and thus {X(t ∧ T)} is an {Ft∧T}-submartingale. To show the stronger property that it is an F-submartingale, note that Ft∧T ⊆ Ft so {X(t ∧ T)} is F-adapted, and it remains to show that

∫A Xt∧T P(dω) ≥ ∫A Xs∧T P(dω)   (all A ∈ Fs),   (A3.4.5)

knowing that it holds for all A ∈ Fs∧T. Express the left-hand side as the sum of integrals over A1 = A ∩ {T > s} and A2 = A ∩ {T ≤ s}. Certainly, A1 ∈ Fs, while

A1 ∩ {s ∧ T ≤ u} = A ∩ {T > s} ∩ {s ∧ T ≤ u} = ∅ ∈ Fu (if u < s),  = A1 ∈ Fs (if u ≥ s).

Now Fs ⊆ Fu, so by definition of Fs∧T, we have A1 ∈ Fs∧T, and (A3.4.5) holds for A1. On A2, t ≥ s ≥ T so X(t ∧ T) = X(s ∧ T) there, and (A3.4.5) holds for A2. By addition, we have shown (A3.4.5).
Finally, we quote the form of the Doob–Meyer decomposition theorem used in Chapter 14; see e.g. Liptser and Shiryayev (1977) for proof.

Theorem A3.4.IX (Doob–Meyer). Let F be a history and X(·) a bounded F-submartingale with right-continuous trajectories. Then, there exists a unique (up to equivalence) uniformly integrable F-martingale Y(·) and a unique F-predictable cumulative process A(·) such that

X(t) = Y(t) + A(t).   (A3.4.6)
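In discrete time the theorem reduces to the elementary Doob decomposition Xn = Yn + An with predictable increments An − An−1 = E[Xn − Xn−1 | Fn−1]. A self-contained check (ours, not from the text) for the submartingale Xn = Sn², with S a simple ±1 random walk, where An = n exactly:

```python
from itertools import product

n = 6
# the 2^n equally likely +-1 step sequences of a simple random walk S
paths = list(product([-1, 1], repeat=n))

def predictable_increment(prefix):
    """A_k - A_{k-1} = E[X_k - X_{k-1} | F_{k-1}] for X_k = S_k^2, computed by
    averaging over the two equally likely next steps given the prefix."""
    s = sum(prefix)
    return 0.5 * sum((s + e) ** 2 - s ** 2 for e in (-1, 1))

# every predictable increment equals 1, so A_k = k, and Y_k = S_k^2 - k
# is the martingale part; check the martingale property of Y as well
for path in paths:
    for k in range(1, n + 1):
        prefix = path[: k - 1]
        assert predictable_increment(prefix) == 1.0
        s = sum(prefix)
        cond_mean_Y = 0.5 * sum((s + e) ** 2 - k for e in (-1, 1))
        assert cond_mean_Y == s ** 2 - (k - 1)   # E[Y_k | F_{k-1}] = Y_{k-1}
```

Here A is predictable in the strongest possible sense (deterministic), and uniqueness of the split is immediate.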
For nondecreasing processes A(·) with right-continuous trajectories, it can be shown that F-predictability is equivalent to the property that for every bounded F-martingale Z(·) and positive u,

E[∫₀ᵘ Z(t) A(dt)] = E[∫₀ᵘ Z(t−) A(dt)].

Since for any F-adapted cumulative process ξ and any F-martingale Z, E[Z(u) ∫₀ᵘ ξ(dt)] = E[∫₀ᵘ Z(t) ξ(dt)], the property above is equivalent to

E[Z(u)A(u)] = E[∫₀ᵘ Z(t−) A(dt)].
A cumulative process with this property is referred to in many texts as a natural increasing process. The theorem can then be rephrased thus: every bounded submartingale has a unique decomposition into the sum of a uniformly integrable martingale and a natural increasing function. The relation between natural increasing and predictable processes is discussed in Dellacherie and Meyer (1980).
The boundedness condition in Theorem A3.4.IX is much stronger than is really necessary, and it is a special case of Liptser and Shiryayev’s (1977) ‘Class D’ condition for supermartingales; namely, that the family {X(T)} is uniformly integrable for all F-stopping times. More general results, of which the decomposition for point processes described in Chapter 13 is in fact a special case, relax the boundedness or uniform integrability conditions but weaken the conclusion by requiring Y(·) to be only a local martingale [i.e. the stopped processes Y(· ∧ Tn) are martingales for a suitable increasing sequence {Tn} of F-stopping times]. The Doob–Meyer theorem is often stated for supermartingales, in which case the natural increasing function should be subtracted from the martingale term, not added to it.
Given an F-martingale S, it is square integrable on [0, τ] for some τ ≤ ∞ if sup0≤t≤τ E[S(t)²] < ∞; in that case the Doob–Meyer theorem applied to the submartingale S²(·) yields the decomposition

S²(t) = Y2(t) + A2(t)   (0 ≤ t ≤ τ)   (A3.4.7)

for some F-martingale Y2(·) and F-predictable process A2(·). It is readily checked that for 0 ≤ s < t ≤ τ, A2(t) − A2(s) = E[(S(t) − S(s))² | Fs], hence the name quadratic variation process for A2(·). Equation (A3.4.7) can be established for any square-integrable martingale via the general Doob–Meyer theorem. A significant calculus for such processes, including applications to point processes, can be constructed as in Kunita and Watanabe (1967) and Brémaud (1981).
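As a concrete instance relevant to point processes (our sketch, not from the text): for the compensated Poisson process M(t) = N(t) − λt, the quadratic variation process is A2(t) = λt, so E[(M(t) − M(s))²] = λ(t − s), which a simulation of increments over (s, t] reproduces.

```python
import numpy as np

rng = np.random.default_rng(42)
lam, s, t, n_paths = 2.0, 1.0, 3.0, 200_000

# increments N(t) - N(s) of a rate-lam Poisson process are Poisson(lam*(t - s));
# M(u) = N(u) - lam*u is a martingale with quadratic variation A_2(u) = lam*u,
# so E[(M(t) - M(s))^2] should equal lam*(t - s) = 4.0 here
incr = rng.poisson(lam * (t - s), size=n_paths)
m_incr = incr - lam * (t - s)
est = float(np.mean(m_incr ** 2))
print(est)   # within sampling error of lam*(t - s)
```

The martingale property itself is visible in the same data: the sample mean of m_incr is close to zero.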
References with Index
[At the end of each reference entry is the page number or numbers where it is cited. A bibliography of about 600 references up to about 1970, although excluding much of the historical material of Chapter 1 of this book, is given in D.J. Daley and R.K. Milne (1972), The theory of point processes: A bibliography, Int. Statist. Rev. 41, 183–201.]
Aalen, O.O. (1975). Statistical Inference for a Family of Counting Processes. Ph.D. thesis, Statistics Dept., University of California, Berkeley. [17, 238]
—— (1978). Non-parametric inference for a family of counting processes. Ann. Statist. 6, 701–726. [238]
Abbé, E. (1879). Über Blutkörper-Zählung. Jena Z. Med. Naturwiss. 13 (New Series 6), 98–105. [8]
Aldous, D. and Eagleson, G.K. (1978). On mixing and stability of limit theorems. Ann. Probab. 6, 325–331. [420]
Ammann, L.P. and Thall, P.F. (1979). Count distributions, orderliness and invariance of Poisson cluster processes. J. Appl. Probab. 16, 261–273. [227]
Andersen, P.K., Borgan, Ø., Gill, R.D., and Keiding, N. (1993). Statistical Models Based on Counting Processes. Springer, New York. [17, 108–109, 238]
Andersson, H. and Britton, T. (2000). Stochastic Epidemic Models and Their Statistical Analysis, Lecture Notes in Statistics 151. Springer-Verlag, New York. [217]
Andrews, G.E. (1976). The Theory of Partitions. Addison–Wesley, Reading, MA. [121]
Argabright, L. and de Lamadrid, J.G. (1974). Fourier analysis of unbounded measures on locally compact abelian groups. Mem. Amer. Math. Soc. 145. [303, 358]
Arjas, E., Nummelin, E., and Tweedie, R.L. (1978). Uniform limit theorems for non-singular renewal and Markov renewal processes. J. Appl. Probab. 15, 112–125. [89–90]
Ash, R.B. (1972). Real Analysis and Probability. Academic Press, New York. [380, 414]
Athreya, K. and Ney, P.E. (1972). Branching Processes. Springer-Verlag, New York. [151]
—— and —— (1978). A new approach to the limit theory of recurrent Markov chains. Trans. Amer. Math. Soc. 245, 493–501. [96–97]
——, Tweedie, R.L., and Vere-Jones, D. (1980). Asymptotic behaviour of point processes with Markov-dependent intervals. Math. Nachr. 99, 301–313. [102]
Baccelli, F. and Brémaud, P. (1994). Elements of Queueing Theory. Springer-Verlag, Berlin. [17, 211]
Baddeley, A.J. (1998). A crash course on stochastic geometry. In Barndorff-Nielsen et al. (1998), 1–35. [221]
—— (2001). Likelihoods and pseudolikelihoods for Markov spatial processes. In de Gunst, M.C.M., Klaassen, C.A.J., and van der Vaart, W. (Eds.), State of the Art in Probability and Statistics: Festschrift for Willem R. van Zwet, Institute of Mathematical Statistics Monograph Series 36. IMS, Hayward, CA, pp. 21–49. [217]
—— and Møller, J. (1989). Nearest-neighbour Markov point processes and random sets. Int. Statist. Rev. 57, 89–121. [111]
—— and Turner, R. (2000). Practical maximum pseudo-likelihood for spatial point patterns (with Discussion). Aust. N.Z. J. Statist. 42, 283–322. [232]
——, van Lieshout, M.N.M., and Møller, J. (1996). Markov properties of cluster processes. Adv. Appl. Probab. 28, 346–355. [111]
Barndorff-Nielsen, O.E., Kendall, W.S., and van Lieshout, M.N.M. (Eds.) (1998). Stochastic Geometry: Likelihood and Computation. Chapman and Hall, London. [17, 111]
Bartlett, M.S. (1954). Processus stochastiques ponctuels. Ann. Inst. Henri Poincaré 14, 35–60. [14]
—— (1955). An Introduction to Stochastic Processes. Cambridge University Press, Cambridge [2nd ed. 1966; 3rd ed. 1978]. [13]
—— (1963). The spectral analysis of point processes. J. Roy. Statist. Soc. Ser. B 25, 264–296. [16, 182, 303, 305]
—— (1964). The spectral analysis of two-dimensional point processes. Biometrika 51, 299–311. [16]
—— (1967).
The spectral analysis of line processes. Proc. Fifth Berkeley Symp. Math. Statist. Probab. 3, 135–153. [329]
—— and Kendall, D.G. (1951). On the use of the characteristic functional in the analysis of some stochastic processes in physics and biology. Proc. Cambridge Philos. Soc. 47, 65–76. [15, 150]
Bateman, H. (1910). Note on the probability distribution of α-particles. Philos. Mag. 20 (6), 704–707. [Note to E. Rutherford and H. Geiger, The probability variations in the distribution of α-particles, Philos. Mag. 20 (6), 698–704.] [9]
Baudin, M. (1981). Likelihood and nearest-neighbor distance properties of multidimensional Poisson cluster processes. J. Appl. Probab. 18, 879–888. [221, 227]
Bebbington, M. and Harte, D. (2001). On the statistics of the linked stress release model. In Daley, D.J. (Ed.), Probability, Statistics and Seismology, J. Appl. Probab. 38A, 176–187. [240, 256]
Benard, C. and Macchi, O. (1973). Detection and emission processes of quantum particles in a chaotic state. J. Math. Phys. 14, 155–167. [140]
Berbée, H. (1983). A bound on the size of point clusters of a random walk with stationary increments. Ann. Probab. 11, 414–418. [301]
Berg, C. and Forst, G. (1975). Potential Theory on Locally Compact Abelian Groups, Ergebnisse der Mathematik und ihrer Grenzgebiete 87. Springer-Verlag, New York. [357]
Berman, M. (1983). Discussion of Ogata’s paper. Bull. Int. Statist. Inst. 50 (3), 412–422. [244]
Bhabha, H.J. (1950). On the stochastic theory of continuous parametric systems and its application to electron-photon cascades. Proc. Roy. Soc. London Ser. A 202, 301–332. [15, 111, 124, 136]
Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. [391]
Bloom, W.R. (1984). Translation bounded measures and the Orlicz–Paley–Sidon theorem. In Probability Measures on Groups, VII (Oberwolfach, 1983), Lecture Notes in Mathematics 1064. Springer, Berlin, pp. 1–9. [367]
Bochner, S. (1947). Stochastic processes. Ann. Math. 48, 1014–1061. [15]
—— (1955). Harmonic Analysis and the Theory of Probability. University of California Press, Berkeley. [15, 303, 338]
Boel, R., Varaiya, P., and Wong, E. (1975). Martingales on jump processes, I: Representation results, and II: Applications. SIAM J. Control 13, 999–1021 and 1022–1061. [211]
Bogoliubov, N.N. (1946). Problems of a Dynamical Theory in Statistical Physics (in Russian). Gostekhizdat, Moscow. [Translated by E.K. Gora in de Boer, J. and Uhlenbeck, G.E. (Eds.), Studies in Statistical Mechanics, Vol. 1, North-Holland, Amsterdam, 1962, pp. 5–116.] [15, 111, 124]
Bol’shakov, I.A. (1969). Statistical Problems in Isolating a Stream of Signals from Noise (in Russian). Sovyetskoye Radio, Moscow. [146, 185]
Boltzmann, L. (1868). Studien über das Gleichgewicht der lebendigen Kraft zwischen bewegten materiellen Punkten. Sitzungsber. Math. Naturwiss. Kl. Kais. Akad. Wiss.
58, 517–560. [9]
Borovkov, K. and Vere-Jones, D. (2000). Explicit formulae for stationary distributions of stress release models. J. Appl. Probab. 37, 315–321. [245]
Bourbaki, N. (1963). Éléments de Mathématique, Fasc. XXIX (Livre VI, Intégration, Chaps. 7 et 8), Actualités Scientifiques et Industrielles 1306. Hermann, Paris. [409]
Breiman, L. (1965). Some probabilistic aspects of the renewal theorem. In Transactions of the Fourth Prague Conference on Information Theory, Statistics and Decision Functions, Prague, 1965. Academia, Prague, pp. 255–261. [87]
Brémaud, P. (1972). A Martingale Approach to Point Processes. Ph.D. thesis, Electrical Engineering Dept., University of California, Berkeley. [211]
—— (1981). Point Processes and Queues: Martingale Dynamics. Springer-Verlag, New York. [14, 17, 107–108, 211, 231, 377, 414, 423, 427, 431]
—— and Massoulié, L. (1994). Imbedded construction of stationary point processes and sequences with a random memory. Queueing Syst. 17, 213–234. [252]
—— and —— (1996). Stability of nonlinear Hawkes processes. Ann. Probab. 24, 1563–1588. [202, 252–253, 268, 275]
Brémaud, P. and Massoulié, L. (2001). Hawkes branching point processes without ancestors. J. Appl. Probab. 38, 122–135. [203]
Bretagnolle, J. and Dacunha-Castelle, D. (1967). Sur une classe de marches aléatoires. Ann. Inst. Henri Poincaré 3, 403–431. [90]
Brillinger, D.R. (1972). The spectral analysis of stationary interval functions. Proc. Sixth Berkeley Symp. Math. Statist. Probab. 1, 483–513. [15, 303, 331, 338]
—— (1975a). The identification of point process systems. Ann. Probab. 3, 909–929. [318]
—— (1975b). Stochastic inference for stationary point processes. In Puri, M.L. (Ed.), Stochastic Processes and Related Topics. Academic Press, New York, pp. 55–99. [Reprinted in Brillinger (1981).] [17, 318]
—— (1978). Comparative aspects of the study of ordinary time series and point processes. In Krishnaiah, P.R. (Ed.), Developments in Statistics, Vol. I. Academic Press, New York, pp. 33–133. [303, 318, 337]
—— (1981). Time Series: Data Analysis and Theory, 2nd ed. Holden–Day, San Francisco. [318, 337]
—— (1992). Nerve cell spike train analysis: A progression of technique. J. Amer. Statist. Assoc. 87, 260–271. [318]
Brix, A. and Kendall, W.S. (2002). Simulation of cluster point processes without edge effects. Adv. Appl. Probab. 34, 267–280. [221, 275]
Brown, T.M. and Nair, M. (1988). A simple proof of the multivariate random time change theorem for point processes. J. Appl. Probab. 25, 210–214. [264]
Campbell, N.R. (1909). The study of discontinuous phenomena. Proc. Cambridge Philos. Soc. 15, 117–136. [163]
Cane, V.R. (1974). The concept of accident proneness. Izv. Mat. Inst. Bulgar. Akad. Sci. 15, 183–189. [11–12]
—— (1977). A class of non-identifiable stochastic models. J. Appl. Probab. 14, 475–482. [11]
Carlsson, H. and Nerman, O. (1986). An alternative proof of Lorden’s renewal inequality. Adv. Appl. Probab. 18, 1015–1016. [91]
Chernick, M.R., Daley, D.J., and Littlejohn, R.P. (1988).
A time-reversibility relationship between two Markov chains with exponential stationary distributions. J. Appl. Probab. 25, 418–422. [105]
Chong, F.S. (1981). A point process with second order Markov dependent intervals. Math. Nachr. 103, 155–163. [105, 298, 303]
Chung, K.L. (1972). Crudely stationary point processes. Amer. Math. Monthly 79, 867–877. [44]
—— (1974). A Course in Probability Theory, 2nd ed. Academic Press, New York. [383, 414]
Copson, E.C. (1935). An Introduction to the Theory of Functions of a Complex Variable. Oxford University Press, Oxford. [310, 336]
Coram, M. and Diaconis, P. (2002). New tests of the correspondence between unitary eigenvalues and the zeros of Riemann’s zeta function. J. Phys. A (to appear). [18, 140]
Cox, D.R. (1955). Some statistical methods connected with series of events (with Discussion). J. Roy. Statist. Soc. Ser. B 17, 129–164. [16, 105, 110, 169]
Cox, D.R. (1962). Renewal Theory. Methuen, London. [66]
—— (1972a). The statistical analysis of dependencies in point processes. In Lewis (1972), pp. 55–66. [211, 238]
—— (1972b). Regression models and life tables (with Discussion). J. Roy. Statist. Soc. Ser. B 34, 187–220. [17, 238]
—— (1975). Partial likelihood. Biometrika 62, 269–276. [236]
—— and Isham, V. (1980). Point Processes. Chapman and Hall, London. [66, 105, 295–296, 301–302]
—— and Lewis, P.A.W. (1966). The Statistical Analysis of Series of Events. Methuen, London. [16–17, 20, 66, 261–262, 296, 303]
Cramér, H. (1930). On the Mathematical Theory of Risk. Skandia Jubilee Volume, Stockholm. [Reprinted in Cramér, H. (1994). Collected Works (A. Martin-Löf, Ed.). Springer, Berlin.] [199]
—— and Leadbetter, M.R. (1967). Stationary and Related Stochastic Processes. Wiley, New York. [14, 333, 345]
Cressie, N.A.C. (1991). Statistics for Spatial Data. Wiley, New York. [Rev. ed. 1993.] [17, 111, 222, 320]
Daley, D.J. (1965). On a class of renewal functions. Proc. Cambridge Philos. Soc. 61, 519–526. [78]
—— (1971). Weakly stationary point processes and random measures. J. Roy. Statist. Soc. Ser. B 33, 406–428. [61, 301, 305, 331]
—— (1972a). A bivariate Poisson queueing process that is not infinitely divisible. Proc. Cambridge Philos. Soc. 72, 449–450. [188]
—— (1972b). Asymptotic properties of stationary point processes with generalized clusters. Z. Wahrs. 21, 65–76. [308]
—— (1973a). Poisson and alternating renewal processes with superposition a renewal process. Math. Nachr. 57, 359–369. [82]
—— (1973b). Markovian processes whose jump epochs constitute a renewal process. Quart. J. Math. Oxford Ser. (2) 24, 97–105. [82]
—— (1974). Various concepts of orderliness for point processes. In Harding and Kendall (1974), pp. 148–161. [47, 52]
—— (1981). The absolute convergence of weighted sums of dependent sequences of random variables. Z. Wahrs. 58, 199–203. [168]
—— (1982).
Stationary point processes with Markov-dependent intervals and infinite intensity. In Gani, J. and Hannan, E.J. (Eds.), Essays in Statistical Science, J. Appl. Probab. 19A, 313–320. [94, 105]
—— (1999). The Hurst index of long-range dependent renewal processes. Ann. Probab. 27, 2035–2041. [106]
—— and Milne, R.K. (1975). Orderliness, intensities and Palm–Khinchin equations for multivariate point processes. J. Appl. Probab. 12, 383–389. [331]
—— and Narayan, P. (1980). Series expansions of probability generating functions and bounds for the extinction probability of a branching process. J. Appl. Probab. 17, 939–947. [119, 122]
——, Rolski, T., and Vesilo, R. (2000). Long-range dependent point processes and their Palm–Khinchin distributions. Adv. Appl. Probab. 32, 1051–1063. [106]
References with Index
437
Daley, D.J. and Vere-Jones, D. (1972). A summary of the theory of point processes. In Lewis (1972), pp. 299–383. [60] Daniels, H.E. (1945). The statistical theory of the strength of bundles of threads, I. Proc. Roy. Soc. London Ser. A 183, 405–435. [7] Darwin, J.H. (1957). The power of the Poisson index of dispersion. Biometrika 44, 286–289. [23] David, F.N. and Barton, D.E. (1962). Combinatorial Chance. Griffin, London. [114–115, 121] Davidson, R. (1974). Construction of line processes: Second-order properties. In Harding and Kendall (1974), pp. 55–75. [Original publication (1970), Izv. Akad. Nauk Armen. SSR Ser. Mat. 5, 219–234.] [305] Davies, R.B. (1977). Testing the hypothesis that a point process is Poisson. Adv. Appl. Probab. 9, 724–746. [222, 226, 228–229] —— (1987). Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 74, 33–43. [263] Dawson, D.A., Fleischmann, K., and Mueller, C. (2000). Finite time extinction of superprocesses with catalysts. Ann. Probab. 28, 603–642. [18] Dellacherie, C. (1972). Capacités et Processus Stochastiques. Springer, Berlin. [423] —— and Meyer, P.-A. (1978). Probabilities and Potential. Hermann, Paris, and North-Holland, Amsterdam. [423–424, 427] —— and —— (1980). Probabilités et Potentiel, Chap. V–VIII, Théorie des Martingales. Hermann, Paris. [429, 431] Dempster, A.P., Laird, N.M., and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B 39, 1–22. [244] Diaconis, P. and Evans, S.N. (2000). Immanants and finite point processes. In memory of Gian-Carlo Rota. J. Combin. Theory Ser. A 91, 305–321. [18] —— and —— (2001). Linear functionals of eigenvalues of random matrices. Trans. Amer. Math. Soc. 353, 2615–2633. [18, 141] Diggle, P.J. (1983). Statistical Analysis of Spatial Point Patterns. Academic Press, London. [17, 111, 300] ——, Fiksel, T., Grabarnik, P., Ogata, Y., Stoyan, D., and Tanemura, M. (1994).
On parameter estimation for pairwise-interaction point processes. Internat. Statist. Rev. 62, 99–117. [217] —— and Milne, R.K. (1983). Negative binomial quadrat counts and point processes. Scand. J. Statist. 10, 257–267. [200] Doob, J.L. (1948). Renewal theory from the point of view of the theory of probability. Trans. Amer. Math. Soc. 63, 422–438. [72] —— (1949). Time series and harmonic analysis. In Neyman, J. (Ed.), Berkeley Symposium on Mathematical Statistics and Probability. University of California Press, Berkeley, pp. 303–343. [303, 331] —— (1953). Stochastic Processes. Wiley, New York. [61, 303–304, 333, 339, 417] ——, Snell, J.L., and Williamson, R.E. (1960). Application of boundary theory to sums of independent random variables. In Contributions to Probability and Statistics (Essays in Honor of H. Hotelling), Stanford University Press, Stanford, CA, pp. 182–197. [74] Dwass, M. and Teicher, H. (1957). On infinitely divisible random vectors. Ann. Math. Statist. 28, 461–470. [188]
Eggenberger, F. and Pólya, G. (1923). Über die Statistik verketteter Vorgänge. Z. Angew. Math. Mech. 3, 279–289. [11] Elliott, R.J. (1982). Stochastic Calculus and Applications. Springer-Verlag, New York. [423, 429] ——, Aggoun, L., and Moore, J.B. (1995). Hidden Markov Models. Springer, New York. [244] Ellis, R.L. (1844). On a question in the theory of probabilities. Cambridge Math. J. 4 (21), 127–133. [Reprinted in W. Walton (Ed.) (1863) The Mathematical and Other Writings of Robert Leslie Ellis, Deighton Bell, Cambridge, pp. 173–179.] [4] Embrechts, P., Klüppelberg, C., and Mikosch, T. (1997). Modelling Extremal Events. Springer, Berlin. [17, 200] Erlang, A.K. (1909). The theory of probabilities and telephone conversations. Nyt. Tidsskr. Mat. B 20, 33–41. [Reprinted in E. Brockmeyer, H.L. Halstrom and A. Jensen (1948), The Life and Works of A.K. Erlang, Copenhagen Telephone Company, Copenhagen, pp. 131–137.] [5, 9] Feller, W. (1950). An Introduction to Probability Theory and Its Applications, Vol. 1. Wiley, New York [2nd ed. 1957; 3rd ed. 1968]. [13] —— (1966). An Introduction to Probability Theory and Its Applications, Vol. 2. Wiley, New York [2nd ed. 1971]. [45, 62, 64, 66, 70, 74, 84, 86–87, 90–91, 412] —— (1968). [= 3rd ed. of Feller (1950).] [28–30] Fieger, W. (1971). Die Anzahl der γ-Niveau-Kreuzungspunkte von stochastischen Prozessen. Z. Wahrs. 18, 227–260. [59] Fosam, E.B. and Shanbhag, D.N. (1997). Variants of the Choquet–Deny theorem with applications. J. Appl. Probab. 34, 101–106. [82] Franken, P., König, D., Arndt, U., and Schmidt, V. (1981). Queues and Point Processes. Akademie-Verlag, Berlin. [14, 17] Fréchet, M. (1940). Les probabilités associées à un système d'événements compatibles et dépendants, Actualités Scientifiques et Industrielles 859. Hermann, Paris. [119] Galambos, J. (1975). Methods for proving Bonferroni inequalities. J. London Math. Soc. 9 (2), 561–564. [119] —— and Kotz, S. (1978).
Characterizations of Probability Distributions, Lecture Notes in Mathematics 675. Springer-Verlag, Berlin. [24, 77, 82] Galton, F. and Watson, H.W. (1874). On the probability of extinction of families. J. Roy. Anthropol. Inst. 4, 138–144. [9] Gaver, D.P. (1963). Random hazard in reliability problems. Technometrics 5, 211–216. [211] —— and Lewis, P.A.W. (1980). First-order autoregressive gamma sequences and point processes. Adv. Appl. Probab. 12, 727–745. [92] Georgii, H.-O. (1988). Gibbs Measures and Phase Transitions. W. de Gruyter, Berlin. [18] Glass, L. and Tobler, W.R. (1971). Uniform distribution of objects in a homogeneous field: Cities on a plain. Nature 233, 67–68. [298] Goldman, J.R. (1967). Stochastic point processes: Limit theorems. Ann. Math. Statist. 38, 771–779. [31]
Goodman, N.R. and Dubman, M.R. (1969). The theory of time-varying spectral analysis and complex Wishart matrix processes. In Krishnaiah, P.R. (Ed.), Multivariate Analysis II. Academic Press, New York, pp. 351–366. [172] Grandell, J. (1976). Doubly Stochastic Poisson Processes, Lecture Notes in Mathematics 529. Springer-Verlag, New York. [173, 175] Graunt, J. (1662). Natural and Political Observations Made Upon the Bills of Mortality. John Martin, London. [Reprinted in facsimile in The Earliest Classics: John Graunt and Gregory King (1973). Gregg International Publishers, Farnborough.] [3] Greenwood, M. and Yule, G.U. (1920). An enquiry into the nature of frequency distributions of multiple happenings, with particular reference to the occurrence of multiple attacks of disease or repeated accidents. J. Roy. Statist. Soc. 83, 255–279. [10] Grégoire, G. (1984). Negative binomial distribution for point processes. Stoch. Proc. Appl. 16, 179–188. [200] Greig-Smith, P. (1964). Quantitative Plant Ecology, 2nd ed. Butterworths, London. [296] Griffiths, R.C., Milne, R.K., and Wood, R. (1979). Aspects of correlation in bivariate Poisson distributions and processes. Aust. J. Statist. 21, 238–255. [188] Guttorp, P. (1995). Stochastic Modeling of Scientific Data. Chapman and Hall, London. [320] Häberlund, E. (1975). Infinitely divisible stationary recurrent point processes. Math. Nachr. 70, 259–264. [82] Häggström, O., van Lieshout, M.N.M., and Møller, J. (1999). Characterization results and Markov chain Monte Carlo algorithms including exact simulation for some spatial point processes. Bernoulli 5, 641–658. [217] Haight, F.A. (1967). Handbook of the Poisson Distribution. Wiley, New York. [9] Hall, P. (1988). An Introduction to Coverage Processes. Wiley, New York. [205] Hannan, E.J. (1970). Multiple Time Series. Wiley, New York. [347] Harding, E.J. and Kendall, D.G. (Eds.) (1974). Stochastic Geometry. Wiley, Chichester. [17] Harn, K. van (1978).
Classifying Infinitely Divisible Distributions by Functional Equations, Mathematical Centre Tract 103. Mathematisch Centrum, Amsterdam. [78] Harris, T.E. (1956). The existence of stationary measures for certain Markov processes. Proc. Third Berkeley Symp. Math. Statist. Probab. 2, 113–124. [92, 97] —— (1963). The Theory of Branching Processes. Springer-Verlag, Berlin. [16, 151] —— (1968). Counting measures, monotone random set functions. Z. Wahrs. 10, 102–119. [16, 389] —— (1971). Random measures and motions of point processes. Z. Wahrs. 18, 85– 115. [16] Hawkes, A.G. (1971a). Spectra of some self-exciting and mutually exciting point processes. Biometrika 58, 83–90. [183, 211] —— (1971b). Point spectra of some mutually exciting point processes. J. Roy. Statist. Soc. Ser. B 33, 438–443. [183, 202, 211, 309, 320, 322]
Hawkes, A.G. (1972). Spectra of some mutually exciting point processes with associated variables. In Lewis (1972), pp. 261–271. [183, 202, 320] —— and Adamopoulos, L. (1973). Cluster models for earthquakes—regional comparisons. Bull. Int. Statist. Inst. 45(3), 454–461. [202, 204, 309] —— and Oakes, D. (1974). A cluster representation of a self-exciting process. J. Appl. Probab. 11, 493–503. [183] Hayashi, T. (1986). Laws of large numbers in self-correcting point processes. Stoch. Proc. Appl. 23, 319–326. [240] Hewitt, E. and Zuckerman, H.S. (1969). Remarks on the functional equation f(x+y) = f(x) + f(y). Math. Mag. 42, 121–123. [64] Heyde, C.C. and Seneta, E. (1977). I.J. Bienaymé: Statistical Theory Anticipated. Springer-Verlag, New York. [9] Hille, E. and Phillips, R.S. (1957). Functional Analysis and Semi-Groups. American Mathematical Society, Providence, RI. [63–64] Hocking, J.G. and Young, G.S. (1961). Topology. Addison–Wesley, Reading, MA, and London. [406] Holgate, P. (1964). Estimation for the bivariate Poisson distribution. Biometrika 51, 241–245. [188] Hunter, J.J. (1974a). Renewal theory in two dimensions: Basic results. Adv. Appl. Probab. 6, 376–391. [72] —— (1974b). Renewal theory in two dimensions: Asymptotic results. Adv. Appl. Probab. 6, 546–562. [72] Isham, V. (1985). Marked point processes and their correlations. In Droesbecke, F. (Ed.), Spatial Processes and Spatial Time Series Analysis. Publications des Facultés Universitaires Saint-Louis, Bruxelles, pp. 63–75. [327] —— and Westcott, M. (1979). A self-correcting point process. Stoch. Proc. Appl. 8, 335–347. [239] Ito, Y. (1980). Renewal processes decomposable into i.i.d. components. Adv. Appl. Probab. 12, 672–688. [82] Jacobsen, M. (1982). Statistical Analysis of Counting Processes, Lecture Notes in Statistics 12. Springer-Verlag, New York. [17, 238, 427] Jacod, J. (1975). Multivariate point processes: Predictable projections, Radon–Nikodym derivatives, representation of martingales. Z.
Wahrs. 31, 235–253. [247] —— and Memin, J. (1981). Sur un type de convergence intermédiaire entre la convergence en loi et la convergence en probabilité. In Seminar on Probability, XV (Strasbourg, 1979/1980), Lecture Notes in Mathematics 850, Springer, Berlin, pp. 529–546. [Correction (1983) in Seminar on Probability, XVII, Lecture Notes in Mathematics 986, Springer, Berlin, pp. 509–511.] [423] Jagers, P. (1974). Aspects of random measures and point processes. In Ney, P. (Ed.), Advances in Probability and Related Topics, Vol. 3. Marcel Dekker, New York, pp. 179–239. [15] —— (1975). Branching Processes with Biological Applications. Wiley, London. [151] Janossy, L. (1948). Cosmic Rays. Oxford University Press, Oxford. [15] —— (1950). On the absorption of a nucleon cascade. Proc. Roy. Irish Acad. Sci. Sect. A 53, 181–188. [111, 124]
Jiřina, M. (1966). Asymptotic behaviour of measure-valued branching processes. Rozpr. Cesk. Akad. Ved., Rada Mat. Prir. Ved. 75(3). [15] Johnson, N.L. and Kotz, S. (1969). Distributions in Statistics, Vol. I: Discrete Distributions. Houghton Mifflin, Boston. [2nd ed. 1993. Wiley, New York.] [10, 12] —— and —— (1970). Distributions in Statistics, Vol. II: Continuous Univariate Distributions–1. Houghton Mifflin, Boston. [2nd ed. 1994. Wiley, New York.] [7] —— and —— (1994). = 2nd ed. of Johnson and Kotz (1970). [82] Jolivet, E. (1978). Caractérisation et test du caractère agrégatif des processus ponctuels stationnaires sur R2. In Dacunha-Castelle, D. and Cutsem, B. van (Eds.) Journées de Statistiques des Processus Stochastiques, Lecture Notes in Mathematics 636, Springer-Verlag, Berlin, pp. 1–25. [300] Jowett, J. and Vere-Jones, D. (1972). The prediction of stationary point processes. In Lewis (1972), pp. 405–435. [331] Kagan, Y.Y. (1999). Universality of the seismic moment-frequency relation. Pure Appl. Geophys. 155, 537–573. [256] —— and Schoenberg, F. (2001). Estimation of the upper cutoff parameter for the tapered Pareto distribution. In Daley, D.J. (Ed.), Probability, Statistics and Seismology, J. Appl. Probab. 38A, 158–175. [256] Kailath, T. and Segall, I. (1975). The modelling of random modulated jump processes. IEEE Trans. Inf. Theory IT-21 (2), 135–142. [211] Kallenberg, O. (1975). Random Measures. Akademie-Verlag, Berlin, and Academic Press, London. [3rd ed. 1983; reprinted with corrections as 4th ed. 1986]. [15, 389] —— (1983). = 3rd ed. of Kallenberg (1975). [292, 294] Kallianpur, G. (1980). Stochastic Filtering Theory. Springer-Verlag, New York. [423, 425] Karr, A.F. (1986). Point Processes and Their Statistical Inference. Marcel Dekker, New York. [2nd ed. 1991.] [18, 247] Kathirgamatamby, N. (1953). Note on the Poisson index of dispersion. Biometrika 40, 225–228. [23] Kelly, F.P. and Ripley, B.D. (1976). A note on Strauss's model for clustering.
Biometrika 63, 357–360. [217–218, 228] Kendall, D.G. (1949). Stochastic processes and population growth. J. Roy. Statist. Soc. Ser. B 11, 230–264. [15] Kerstan, J. (1964). Teilprozesse Poissonscher Prozesse. In Transactions of the Third Prague Conference on Information Theory, Statistical Decision Functions and Random Processes, Czech. Academy of Science, Prague, pp. 377–403. [202, 268] ——, Matthes, K., and Mecke, J. (1974). Unbegrenzt Teilbare Punktprozesse. Akademie-Verlag, Berlin. [14] ——, ——, and —— (1982). Infinitely Divisible Point Processes (in Russian). Nauka, Moscow. [= 3rd ed. of Kerstan et al. (1974).] [14] Khinchin, A.Ya. (1955). Mathematical Methods in the Theory of Queueing (in Russian). Trudy Mat. Inst. Steklov 49. [Translated (1960). Griffin, London.] [14, 30, 46–48, 54] —— (1956). On Poisson sequences of chance events. Teor. Veroyatnost. i Primenen. 1, 320–327. [Translation in Theory Probab. Appl. 1, 291–297.] [52]
Kingman, J.F.C. (1964). On doubly stochastic Poisson processes. Proc. Cambridge Philos. Soc. 60, 923–930. [174] —— (1972). Regenerative Phenomena. John Wiley, London. [82] —— (1993). Poisson Processes. Clarendon Press, Oxford. [18, 33] —— and Taylor, S.J. (1966). Introduction to Measure and Probability. Cambridge University Press, Cambridge. [Chapters 1–9 republished as Introduction to Measure and Integration (1973), same publisher.] [332, 368] Knopoff, L. (1971). A stochastic model for the occurrence of main sequence events. Rev. Geophys. Space Phys. 9, 175–188. [239] Kolmogorov, A.N. (1935). La transformation de Laplace dans les espaces linéaires. C. R. Acad. Sci. Paris 200, 1717–1718. [14] Kotz, S. and Shanbhag, D. (1980). Some new approaches to probability distributions. Adv. Appl. Probab. 12, 903–921. [109] Krickeberg, K. (1974). Moments of point processes. In Harding and Kendall (1974), pp. 89–113. [143] —— (1980). Statistical problems on point processes. In Mathematical Statistics, Banach Centre Publications 6, PWN, Warsaw, pp. 197–223. [300] Kunita, H. and Watanabe, S. (1967). On square-integrable martingales. Nagoya Math. J. 30, 209–245. [211, 431] Kutoyants, Y.A. (1980). Estimation of Parameters of Stochastic Processes (in Russian). Armenian Academy of Science, Erevan. [18, 226] —— (1984). Parameter Estimation for Stochastic Processes. Heldermann, Berlin. [Translated by B.L.S. Prakasa Rao and revised from Kutoyants (1980).] [18, 26, 226, 235] —— (1998). Statistical Inference for Spatial Poisson Processes, Lecture Notes in Statistics 134. Springer-Verlag, New York. [18] Lai, C.D. (1978). An example of Wold's point processes with Markov-dependent intervals. J. Appl. Probab. 15, 748–758. [96, 103–104, 243] Lampard, D.G. (1968). A stochastic process whose successive intervals between events form a first order Markov chain-I. J. Appl. Probab. 5, 648–668. [95, 105] Lancaster, H.O. (1963). Correlations and canonical forms of bivariate distribution functions.
Ann. Math. Statist. 34, 532–538. [95] Laplace, P.S. (1814). Essai Philosophique des Probabilités. Introduction (pp. i–cvi), Théorie Analytique des Probabilités, 2nd ed. [English Translation (1951), A Philosophical Essay on Probabilities. Dover, New York.] [3] Last, G. and Brandt, A. (1995). Marked Point Processes on the Real Line. Springer-Verlag, New York. [18] Lawrance, A.J. (1970). Selective interaction of a stationary point process and a renewal process. J. Appl. Probab. 7, 483–489. [44] Leadbetter, M.R. (1972). On basic results of point process theory. Proc. Sixth Berkeley Symp. Math. Statist. Probab. 3, 449–462. [14, 48, 52] ——, Lindgren, G., and Rootzén, H. (1983). Extremes and Related Properties of Random Sequences and Processes. Springer-Verlag, New York. [14, 17] LeCam, L. (1947). Un instrument d'étude des fonctions aléatoires: La fonctionnelle caractéristique. C. R. Acad. Sci. Paris 224, 710–711. [15]
LeCam, L. (1961). A stochastic theory of precipitation. Proc. Fourth Berkeley Symp. Math. Statist. Probab. 3, 165–186. [191] Lee, P.M. (1968). Some aspects of infinitely divisible point processes. Stud. Sci. Math. Hungar. 3, 219–224. [31, 33] Lewis, P.A.W. (1964a). A branching Poisson process model for the analysis of computer failure patterns (with Discussion). J. Roy. Statist. Soc. Ser. B 26, 398–456. [16, 182, 192] —— (1964b). The implications of a failure model for the use and maintenance of computers. J. Appl. Probab. 1, 347–368. [182] —— (1970). Remarks on the theory, computation and application of the spectral analysis of series of events. J. Sound Vib. 12 (3), 353–375. [26] —— (Ed.) (1972). Stochastic Point Processes. Wiley, New York. [211] —— and Shedler, G.S. (1976). Simulation of nonhomogeneous Poisson processes with log linear rate function. Biometrika 63, 501–506. [24, 269] Liemant, A., Matthes, K., and Wakolbinger, A. (1988). Equilibrium Distributions of Branching Processes, Mathematical Research 42. Akademie-Verlag, Berlin. [151] Liggett, T.M. (1999). Stochastic Interacting Systems: Contact, Voter and Exclusion Processes. Springer-Verlag, Berlin. [18] Lin, V.Ya. (1965). On equivalent norms in the space of square summable entire functions of exponential type (in Russian). Mat. Sb. (N.S.) 67(109), 586–608. [Translation (1969) Amer. Math. Soc. Transl. 79(2), 53–76.] [358, 364] Lindvall, T. (1977). A probabilistic proof of Blackwell's renewal theorem. Ann. Probab. 5, 482–485. [83] —— (1992). Lectures on the Coupling Method. Wiley, New York. [83] Liptser, R.S. and Shiryayev, A.N. (1974). Statistics of Random Processes (in Russian). Nauka, Moscow. [Translation (1977, 1978).] [17, 211] —— and —— (1977). Statistics of Random Processes, I: General Theory. Springer-Verlag, New York. [17, 211, 423, 428–431] —— and —— (1978). Statistics of Random Processes, II: Applications. Springer-Verlag, New York. [17, 211] —— and —— (2000). 2nd ed.
of Liptser and Shiryayev (1977, 1978). [17, 211] Littlewood, D.E. (1950). The Theory of Group Characters and Matrix Representations of Groups, 2nd ed. Clarendon Press, Oxford. [140] Liu, J., Chen, Y., Shi, Y., and Vere-Jones, D. (1999). Coupled stress release model for time dependent seismicity. Pure Appl. Geophys. 155, 649–667. [256] Loève, M. (1963). Probability Theory, 3rd ed. Van Nostrand, Princeton, NJ. [4th ed. (2 vols.) (1977, 1978). Springer-Verlag, New York.] [29, 33] Lotka, A.J. (1939). A contribution to the theory of self-renewing aggregates, with especial reference to industrial replacement. Ann. Math. Statist. 10, 1–25. [5] Lowen, S.B. and Teich, M.C. (1990). Power-law shot noise. IEEE Trans. Inf. Theory IT-36, 1302–1318. [170] Lu, C., Harte, D., and Bebbington, M. (1999). A linked stress release model for historical Japanese earthquakes. Coupling among major seismic regions. Earth Planets Space 51, 907–916. [240] —— and Vere-Jones, D. (2000). Application of linked stress release model to historical earthquake data: Comparison between two kinds of tectonic seismicity. Pure Appl. Geophys. 157, 2351–2364. [256]
Lüders, R. (1934). Die Statistik der seltenen Ereignisse. Biometrika 26, 108–128. [11] Lukacs, E. (1970). Characteristic Functions, 2nd ed. Griffin, London. [79] Lundberg, F. (1903). Approximerad framställning av sannolikhetsfunktionen. Återförsäkring av kollektivrisker. Akad. Afhandling. Almqvist och Wiksell, Uppsala. [199] Lyon, J.F. and Thoma, R. (1881). Ueber die Methode der Blutkörperzählung. Virchows Arch. Path. Anat. Physiol. 84, 131–154. [9] Macchi, O. (1971a). Distribution statistique des instants d'émission des photoélectrons d'une lumière thermique. C. R. Acad. Sci. Paris Ser. A 272, 437–440. [139, 172–173, 175] —— (1971b). Stochastic processes and multicoincidences. IEEE Trans. Inf. Theory IT-17 (1), 1–7. [144] —— (1975). The coincidence approach to stochastic point processes. Adv. Appl. Probab. 7, 83–122. [124, 130, 132, 136–137, 140, 172] MacDonald, I.L. and Zucchini, W. (1997). Hidden Markov and Other Models for Discrete-Valued Time Series. Chapman and Hall, London. [244] MacMahon, P.A. (1915). Combinatory Analysis, Vol. 1. Cambridge University Press, Cambridge. [175] Main, I.G. (1996). Statistical physics, seismogenesis, and seismic hazard. Rev. Geophys. 34, 433–462. [257] Maistrov, L.E. (1967). Probability Theory—An Historical Sketch (in Russian). Izdat. Nauka, Moscow. [Translated by S. Kotz (Ed.) (1974). Academic Press, New York.] [3] Massoulié, L. (1998). Stability results for a general class of interacting point process dynamics, and applications. Stoch. Proc. Appl. 75, 1–30. [275] Matérn, B. (1960). Spatial Variation. Meddelanden Stat. Skogsforsk. 49 (5), 1–144. [2nd ed. (1986). Lecture Notes in Statistics 36, Springer-Verlag, New York.] [16, 298] Matheron, G. (1975). Random Sets and Integral Geometry. Wiley, New York. [17, 206] Matthes, K., Kerstan, J., and Mecke, J. (1978). Infinitely Divisible Point Processes. Wiley, Chichester [= 2nd ed. of Kerstan, Matthes and Mecke (1974).] [See MKM] McFadden, J.A. (1956).
The axis-crossing intervals of random functions, I. Trans. Inst. Radio Engnrs. IT-2, 146–150. [14] —— (1958). The axis-crossing intervals of random functions, II. Trans. Inst. Radio Engnrs. IT-4, 14–24. [14] —— (1965). The entropy of a point process. J. SIAM 13, 988–994. [286] —— and Weissblum, W. (1963). Higher-order properties of a stationary point process. J. Roy. Statist. Soc. Ser. B 25, 413–431. [81] McKendrick, A.G. (1914). Studies on the theory of continuous probabilities with special reference to its bearing on natural phenomena of a progressive nature. Proc. London Math. Soc. 13(2), 401–416. [9–10] —— (1926). The application of mathematics to medical problems. Proc. Edinburgh Math. Soc. 44, 98–130. [9]
Mecke, J. (1967). Zum Problem der Zerlegbarkeit stationärer rekurrenter zufälliger Punktfolgen. Math. Nachr. 35, 311–321. [81] —— (1969). Verschärfung eines Satzes von McFadden. Wiss. Z. Friedrich-Schiller-Universität Jena 18, 387–392. [81] Meyer, P.A. (1971). Démonstration simplifiée d'un théorème de Knight. In Séminaire de Probabilités V, Université de Strasbourg, 1969–1970, Lecture Notes in Mathematics 191. Springer, Berlin, pp. 191–195. [257] Meyn, S.P. and Tweedie, R.L. (1993). Markov Chains and Stochastic Stability. Springer-Verlag, London. [93] Miles, R.E. (1974). On the elimination of edge effects in planar sampling. In Harding and Kendall (1974), pp. 228–247. [303] Milne, R.K. (1974). Infinitely divisible bivariate Poisson processes (Abstract). Adv. Appl. Probab. 6, 226–227. [188] —— and Westcott, M. (1972). Further results for Gauss–Poisson processes. Adv. Appl. Probab. 4, 151–176. [185–186, 315] —— and —— (1993). Generalized multivariate Hermite distributions and related point processes. Ann. Inst. Statist. Math. 45, 367–381. [123, 187] Minc, H. (1978). Permanents. Addison–Wesley, Reading, MA. [173] [MKM] (1978). [= Matthes, Kerstan and Mecke (1978).] [14, 82, 151, 157, 163, 176] [MKM] (1982). [See Kerstan, Matthes and Mecke (1982).] Molchanov, I. (1997). Statistics of the Boolean Model for Practitioners and Mathematicians. Wiley, Chichester. [17, 205–206] Moran, P.A.P. (1967). A non-Markovian quasi-Poisson process. Stud. Sci. Math. Hungar. 2, 425–429. [31, 106] —— (1968). An Introduction to Probability Theory. Clarendon Press, Oxford. [33, 123, 163] —— (1976a). A quasi-Poisson point process in the plane. Bull. London Math. Soc. 8, 69–70. [31] —— (1976b). Another quasi-Poisson plane point process. Z. Wahrs. 33, 269–272. [31] Moyal, J.E. (1962a). The general theory of stochastic population processes. Acta Math. 108, 1–31. [15, 111, 129–130, 150] —— (1962b). Multiplicative population chains. Proc. Roy. Soc. London Ser. A 266, 518–526.
[150–151] Neuts, M.F. (1979). A versatile Markovian point process. J. Appl. Probab. 16, 764–779. [306] Newman, D.S. (1970). A new family of point processes characterized by their second moment properties. J. Appl. Probab. 7, 338–358. [174, 185, 220] Newton, Sir Isaac (1728). The Chronology of Ancient Kingdoms Amended. [Published posthumously. See H. Zeitlinger (1927), A Newton bibliography. In W.J. Greenstreet (Ed.), Isaac Newton 1642–1727. Bell and Sons, London, pp. 148–170.] [5] Neyman, J. (1939). On a new class of ‘contagious’ distributions applicable in entomology and bacteriology. Ann. Math. Statist. 10, 35–57. [11] —— and Scott, E.L. (1958). Statistical approach to problems of cosmology (with Discussion). J. Roy. Statist. Soc. Ser. B 20, 1–43. [12, 181]
Neyman, J. and Scott, E.L. (1972). Processes of clustering and applications. In Lewis (1972), pp. 646–681. [181] Nummelin, E. (1978). A splitting technique for Harris recurrent Markov chains. Z. Wahrs. 43, 309–318. [96–97] Oakes, D. (1974). A generalization of Moran's quasi-Poisson process. Stud. Sci. Math. Hungar. 9, 433–437. [31] Ogata, Y. (1978). The asymptotic behaviour of maximum likelihood estimates for stationary point processes. Ann. Inst. Statist. Math. 30, 243–261. [235] —— (1981). On Lewis' simulation method for point processes. IEEE Trans. Inf. Theory IT-27, 23–31. [270–271, 275] —— (1983). Likelihood analysis of point processes and its applications to seismological data. Bull. Int. Statist. Inst. 50 (2), 943–961. [244] —— (1988). Statistical models for earthquake occurrences and residual analysis for point processes. J. Amer. Statist. Assoc. 83, 9–27. [204, 261, 263] —— (1992). Detection of precursory relative quiescence before great earthquakes through a statistical model. J. Geophys. Res. 97, 19845–19871. [263] —— (1998). Space–time point-process models for earthquake occurrences. Ann. Inst. Statist. Math. 50, 379–402. [204] —— (2001). Increased probability of large earthquakes near aftershock regions with relative quiescence. J. Geophys. Res. 106, 8729–8744. [263] —— and Akaike, H. (1982). On linear intensity models for mixed doubly stochastic Poisson and self-exciting point processes. J. Roy. Statist. Soc. Ser. B 44, 102–107. [234, 309] ——, Akaike, H., and Katsura, K. (1982). The application of linear intensity models to the investigation of causal relations between a point process and another stochastic process. Ann. Inst. Statist. Math. 34, 373–387. [309] —— and Katsura, K. (1986). Point-process models with linearly parametrized intensity for the application to earthquake catalogue. J. Appl. Probab. 23A, 231–240. [235] —— and Tanemura, M. (1981).
Estimation of interaction potentials of spatial point patterns through the maximum likelihood procedure. Ann. Inst. Statist. Math. 33B, 315–338. [217] —— and —— (1984). Likelihood analysis of spatial point patterns. J. Roy. Statist. Soc. Ser. B 46, 496–518. [128] —— and Vere-Jones, D. (1984). Inference for earthquake models: A self-correcting model. Stoch. Proc. Appl. 17, 337–347. [239–240, 244] Ohser, J. and Stoyan, D. (1981). On the second-order and orientation analysis of planar stationary point processes. Biom. J. 23, 523–533. [298] Orey, S. (1971). Limit Theorems for Markov Chain Transition Probabilities. Van Nostrand–Reinhold, London. [92] Ozaki, T. (1979). Maximum likelihood estimation of Hawkes' self-exciting point processes. Ann. Inst. Statist. Math. 31, 145–155. [309] Palm, C. (1943). Intensitätsschwankungen im Fernsprechverkehr, Ericsson Technics 44. [13, 328] Papangelou, F. (1972a). Summary of some results on point and line processes. In Lewis (1972), pp. 522–532. [23]
Papangelou, F. (1972b). Integrability of expected increments of point processes and a related random change of scale. Trans. Amer. Math. Soc. 165, 483–506. [23] —— (1974). On the Palm probabilities of processes of points and processes of lines. In Harding and Kendall (1974), pp. 114–147. [257] Parthasarathy, K.R. (1967). Probability Measures on Metric Spaces. Academic Press, New York. [381] Pogorzelski, W.A. (1966). Integral Equations and Their Applications. Pergamon Press, Oxford, and PWN, Warsaw. [142] Poisson, S.D. (1837). Recherches sur la Probabilité des Jugements en Matière Criminelle et en Matière Civile, Précédées des Règles Générales du Calcul des Probabilités. Bachelier, Paris. [8] Pólya, G. (1931). Sur quelques points de la théorie des probabilités. Ann. Inst. Henri Poincaré 1, 117–162. [11] Prékopa, A. (1957a). On the compound Poisson distribution. Acta Sci. Math. Szeged. 18, 23–28. [37] —— (1957b). On Poisson and composed Poisson stochastic set functions. Stud. Math. 16, 142–155. [37] Preston, C.J. (1976). Random Fields, Lecture Notes in Mathematics 534. Springer-Verlag, New York. [128] Prohorov, Yu.V. (1956). Convergence of random processes and limit theorems in probability theory (in Russian). Teor. Veroyatnost. i Primenen. 1, 177–238. [Translation in Theory Probab. Appl. 1, 157–214.] [15] Quine, M.P. and Watson, D.F. (1984). Radial simulation of n-dimensional Poisson processes. J. Appl. Probab. 21, 548–557. [25] Ramakrishnan, A. (1950). Stochastic processes relating to particles distributed in a continuous infinity of states. Proc. Cambridge Philos. Soc. 46, 595–602. [15, 111, 136] Rao, C.R. and Shanbhag, D.N. (1986). Recent results on characterizations of probability distributions: A unified approach through extensions of Deny's theorem. Adv. Appl. Probab. 18, 660–678. [74] Rényi, A. (1967). Remarks on the Poisson process. Stud. Sci. Math. Hungar. 5, 119–123. [31, 33] Resnick, S.I. (1987).
Extreme Values, Regular Variation, and Point Processes. Springer-Verlag, New York. [17] Rice, S.O. (1944). Mathematical analysis of random noise. Bell Syst. Tech. J. 23, 282–332 and 24, 46–156. [Reprinted in N. Wax (Ed.) (1954). Selected Papers on Noise and Stochastic Processes. Dover, New York, pp. 133–294.] [14] Ripley, B.D. (1976). The second-order analysis of spatial point processes. J. Appl. Probab. 13, 255–266. [297, 300] —— (1977). Modelling spatial patterns (with Discussion). J. Roy. Statist. Soc. Ser. B 39, 172–212. [128, 297] —— (1981). Spatial Statistics. Wiley, New York. [16–17, 111, 222, 297, 300] —— (1988). Statistical Inference for Spatial Processes. Cambridge University Press, Cambridge. [320] —— and Kelly, F.P. (1977). Markov point processes. J. London Math. Soc. 15, 188–192. [218]
Robertson, A.P. and Thornett, M.L. (1984). On translation bounded measures. J. Aust. Math. Soc. Ser. A 37, 139–142. [358, 367] Rubin, I. (1972). Regular point processes and their detection. IEEE Trans. Inf. Theory IT-18, 547–557. [211] Rudemo, M. (1964). Dimension and entropy for a class of stochastic processes. Magyar Tud. Akad. Mat. Kutató Int. Közl. 9, 73–87. [286] Ruelle, D. (1969). Statistical Mechanics: Rigorous Results. Benjamin, New York. [128] Ryll-Nardzewski, C. (1961). Remarks on processes of calls. Proc. Fourth Berkeley Symp. Math. Statist. Probab. 2, 455–465. [14] Schäl, M. (1971). Über Lösungen einer Erneuerungsgleichung. Abh. Math. Sem. Univ. Hamburg 36, 89–98. [90] Schlather, M. (2001). Second order characteristics of marked point processes. Bernoulli 7, 99–117. [327] Schoenberg, F. (1999). Transforming spatial point processes into Poisson processes. Stoch. Proc. Appl. 81, 155–164. [266] Schoenberg, F.P. (2002). On rescaled Poisson processes and the Brownian bridge. Ann. Inst. Statist. Math. 54, 445–457. [262] Schwartz, L. (1951). Théorie des Distributions, Vol. II. Hermann, Paris. [357] Seidel, H. (1876). Über die Probabilitäten solcher Ereignisse welche nur selten vorkommen, obgleich sie unbeschränkt oft möglich sind. Sitzungsber. Math. Phys. Cl. Akad. Wiss. München 6, 44–50. [8] Serra, J. (1982). Image Analysis and Mathematical Morphology. Academic Press, London. [17] Sgibnev, M.S. (1981). On the renewal theorem in the case of infinite variance. Sibirsk. Mat. Zh. 22 (5), 178–189. [Translation in Siberian Math. J. 22, 787–796.] [91] Shi, Y., Liu, J., and Zhang, S. (2001). An evaluation of Chinese annual earthquake predictions, 1990–1998. In Daley, D.J. (Ed.), Probability, Statistics and Seismology, J. Appl. Probab. 38A, 222–231. [276] Slivnyak, I.M. (1962). Some properties of stationary flows of homogeneous random events. Teor. Veroyatnost. i Primenen. 7, 347–352. [Translation in Theory Probab. Appl. 7, 336–341.]
[60] —— (1966). Stationary streams of homogeneous random events. Vestn. Harkov. Gos. Univ. Ser. Mech. Math. 32, 73–116. [60] Smith, W.L. (1958). Renewal theory and its ramifications (with Discussion). J. Roy. [72] Statist. Soc. Ser. B 20, 284–302. —— (1962). On necessary and sufficient conditions for the convergence of the renewal density. Trans. Amer. Math. Soc. 104, 79–100. [91] Snyder, D.L. (1972). Filtering and detection for doubly stochastic Poisson processes. IEEE Trans. Inf. Theory IT-18, 97–102. [211] —— (1975). Random Point Processes. Wiley, New York. [16, 211] —— and Miller, M.I. (1991). Random Point Processes in Time and Space. Wiley, New York. [= 2nd ed. of Snyder (1975).] [17, 211] Solomon, H. and Wang, P.C.C. (1972). Nonhomogeneous Poisson fields of random lines with applications to traffic flow. Proc. Sixth Berkeley Symp. Math. Statist. Probab. 3, 383–400. [329]
Srinivasan, S.K. (1969). Stochastic Theory and Cascade Processes. American Elsevier, New York. [15, 124]
—— (1974). Stochastic Point Processes and Their Applications. Griffin, London. [15]
Stone, C. (1966). On absolutely continuous components and renewal theory. Ann. Math. Statist. 37, 271–275. [88, 90]
Stoyan, D. (1983). Comparison Methods for Queues and Other Stochastic Models. Wiley, Chichester. [7]
—— (1984). On correlations of marked point processes. Math. Nachr. 116, 197–207. [327]
——, Kendall, W.S., and Mecke, J. (1987). Stochastic Geometry. Akademie-Verlag, Berlin, and Wiley, Chichester. [17, 111]
——, Kendall, W.S., and Mecke, J. (1995). Stochastic Geometry, 2nd ed. Wiley, Chichester. [1st ed. Stoyan et al. (1987).] [17, 111, 205–206, 222]
—— and Stoyan, H. (1994). Fractals, Random Shapes and Point Fields. Wiley, Chichester. [17, 111, 205, 222]
Strauss, D.J. (1975). A model for clustering. Biometrika 62, 467–475. [217]
‘Student’ (1907). On the error of counting with a haemacytometer. Biometrika 5, 351–360. [9–10]
Szász, D.O.H. (1970). Once more on the Poisson process. Stud. Sci. Math. Hungar. 5, 441–444. [31]
Takács, L. (1965). A moment problem. J. Aust. Math. Soc. 5, 487–490. [122]
—— (1967). On the method of inclusion and exclusion. J. Amer. Statist. Assoc. 62, 102–113. [119]
—— (1976). Some remarks on a counter process. J. Appl. Probab. 13, 623–627. [105]
Teugels, J.L. (1968). Renewal theorems when the first or the second moment is infinite. Ann. Math. Statist. 39, 1210–1219. [106]
Thedéen, T. (1964). A note on the Poisson tendency in traffic distribution. Ann. Math. Statist. 35, 1823–1824. [329]
Thompson, H.R. (1955). Spatial point processes with applications to ecology. Biometrika 42, 102–115. [181]
Thorisson, H. (2000). Coupling, Stationarity and Regeneration. Springer, New York. [83]
Thornett, M.L. (1979). A class of second-order stationary random measures. Stoch. Proc. Appl. 8, 323–334. [338–340, 342, 357–358, 367]
Titchmarsh, E.C. (1937). Introduction to the Theory of Fourier Integrals. Oxford University Press, Oxford. [411]
Tyan, S. and Thomas, J.B. (1975). Characterization of a class of bivariate distribution functions. J. Multivariate Anal. 5, 227–235. [95]
Utsu, T., Ogata, Y., and Matsu’ura, R.S. (1995). The centenary of the Omori formula for the decay law of aftershock activity. J. Phys. Earth 43, 1–33. [263]
Varnes, D.J. (1989). Predicting earthquakes by analyzing accelerating precursory seismic activity. Pure Appl. Geophys. 130, 661–686. [257]
Vasil’ev, P.I. (1965). On the question of ordinariness of a stationary stream. Kisinev. Gos. Univ. Ucen. Zap. 82, 44–48. [52]
Vere-Jones, D. (1970). Stochastic models for earthquake occurrences (with Discussion). J. Roy. Statist. Soc. Ser. B 32, 1–62. [192, 325, 344]
—— (1974). An elementary approach to the spectral theory of stationary random measures. In Harding and Kendall (1974), pp. 307–321. [331, 334, 341, 345, 357]
—— (1975). A renewal equation for point processes with Markov-dependent intervals. Math. Nachr. 68, 133–139. [95, 103]
—— (1978a). Space–time correlations for microearthquakes—a pilot study. Supplement to Adv. Appl. Probab. 10, 73–87. [297–298, 300, 303]
—— (1978b). Earthquake prediction—a statistician’s view. J. Phys. Earth 26, 129–146. [239]
—— (1982). On the estimation of frequency in point-process data. In Gani, J. and Hannan, E.J. (Eds.), Essays in Statistical Science, J. Appl. Probab. 19A, 383–394. [226]
—— (1984). An identity involving permanents. Linear Alg. Appl. 63, 267–270. [175]
—— (1988). On the variance properties of stress release models. Aust. J. Statist. 30A, 123–135. [240–241, 245]
—— (1995). Forecasting earthquakes and earthquake risk. Int. J. Forecasting 11, 503–538. [17]
—— (1997). Alpha-permanents and their applications to multivariate gamma, negative binomial and ordinary binomial distributions. N.Z. J. Math. 26, 125–149. [14, 140, 175]
—— (1999). Probabilities and information gain for earthquake forecasting. Comput. Seismol. 30 (Geodynamics and Seismology), 248–263. [286]
—— and Davies, R.B. (1966). A statistical survey of earthquakes in the main seismic region of New Zealand. Part II, Time Series Analysis. N.Z. J. Geol. Geophys. 9, 251–284. [163, 344]
—— and Musmeci, F. (1992). A space–time clustering model for historical earthquakes. Ann. Inst. Statist. Math. 44, 1–11. [204]
—— and Ogata, Y. (1984). On the moments of a self-correcting process. J. Appl. Probab. 21, 335–342. [240]
—— and Ozaki, T. (1982). Some examples of statistical inference applied to earthquake data. Ann. Inst. Statist. Math. 34, 189–207. Correction (1987), Ann. Inst. Statist. Math. 39, 243. [203, 234, 309, 337]
——, Robinson, R., and Yang, W. (2001). Remarks on the accelerated moment release model: Problems of model formulation, simulation and estimation. Geophys. J. Internat. 144, 517–531. [256–257]
Von Bortkiewicz, L. (1898). Das Gesetz der kleinen Zahlen. G. Teubner, Leipzig. [See M.P. Quine and E. Seneta (1987). Bortkiewicz’s data and the law of small numbers. Internat. Statist. Rev. 55, 173–181.] [9]
Warren, W.G. (1962). Contributions to the Study of Spatial Point Processes. Ph.D. thesis, University of North Carolina, Chapel Hill (Statistics Dept. Mimeo Series 337). [181]
—— (1971). The centre-satellite concept as a basis for ecological sampling. In Patil, G.P., Pielou, E.C., and Waters, W.E. (Eds.), Statistical Ecology, Vol. 2, Pennsylvania State University Press, University Park, PA, pp. 87–118. [181]
Watanabe, S. (1933). On the theory of durability. Geophys. Mag. (Tokyo) 7, 307–317. [7]
Watanabe, S. (1964). On discontinuous additive functionals and Lévy measures of a Markov process. Japanese J. Math. 34, 53–70. [211, 257]
Weibull, W. (1939a). A statistical theory of the strength of materials. Ing. Vetensk. Akad. Handl. Stockholm, No. 151. [4]
—— (1939b). The phenomenon of rupture in solids. Ing. Vetensk. Akad. Handl. Stockholm, No. 153. [4]
Westcott, M. (1970). Identifiability in linear processes. Z. Wahrs. 16, 39–46. [169]
Whitworth, W.A. (1867). Choice and Chance, an Elementary Treatise on Permutations, Combinations and Probability, with 300 Exercises. Cambridge. [2nd ed. (1870); 3rd ed. (1878); 4th ed. . . . with 640 Exercises (1886), Deighton Bell, Cambridge; 5th ed. Choice and Chance, with One Thousand Exercises (1901), reprinted (1942), Stechert, New York, and (1951), Hafner Publishing, New York.] [9]
—— (1897). DCC Exercises in Choice and Chance. Deighton Bell, Cambridge. [Reprinted (1959), Hafner Publishing, New York.] [9]
Wold, H. (1948). On stationary point processes and Markov chains. Skand. Aktuar. 31, 229–240. [14, 92, 105, 110]
—— (1949). Sur les processus stationnaires ponctuels. Coll. Int. CNRS 13, 75–86. [14]
Yaglom, A.Ya. (1961). Second-order homogeneous random fields. Proc. Fourth Berkeley Symp. Math. Statist. Probab. 2, 593–622. [303]
Yashin, A. (1970). Filtering of jump processes. Avtomat. i Telemekh. 1970(5), 52–58. [Translation in Automat. Remote Control 1970, 725–730.] [211]
Yule, G.U. (1924). A mathematical theory of evolution, based on the conclusions of Dr. J.C. Willis. Philos. Trans. B 213, 21–87. [9]
Yvon, J. (1935). La Théorie Statistique des Fluides et l’Équation d’État, Actualités Scientifiques et Industrielles 203. Hermann, Paris. [12, 111, 124]
Zheng, X. (1991). Ergodic theorems for stress release processes. Stoch. Proc. Appl. 37, 239–258. [240–241]
—— and Vere-Jones, D. (1994). Further applications of the stress release model to historical earthquake data. Tectonophysics 229, 101–121. [240]
Zygmund, A. (1968). Trigonometric Series, 2nd ed. Cambridge University Press, Cambridge. [341]
Subject Index
Absolute continuity, 214, 376 of measures, 376 in Lebesgue decomposition, 377 of point processes, 214 finite Poisson processes, 226 Abstract Lebesgue integral, 375 Accelerated moment release model, 257 Accident proneness, 11 Adapted processes, 236 Additive function on R Cauchy functional (= Hamel) equation, 64 Additive set function, 372 regular, compact regular, 386 totally finite, σ-finite, 373 Aftereffects in point process, 13 Algebra of sets, 368 generating ring, 369 covering ring, 389 Almost sure (a.s.) convergence, 418 Atomic component of random measure influence on moment measure, 292 of reduced covariance measure, 292 Atomic measure, 382 Autoregressive process, with exponential intervals, 92 infinitely divisible example, 102
Autoregressive representation of random measure, 351 of best linear predictor, 354 Avoidance function, 135 factorial moment measure representation, 135 Avoidance probability, 31 Poisson process characterization, 32 see also Avoidance function Backward recurrence time, 59 for Poisson process, 20 for renewal process, 76 hazard function of, 59 Baire sets, 384 Bartlett–Lewis process, 182, 192 Bartlett spectrum, 315 Bartlett spectrum of stationary point process or random measure, 303 —general properties absolutely continuous case, 305, 311 canonical factorization, 347 condition for nondeterministic process, 347 inversion formula, 304 spectral density, 305, 309 Bartlett spectrum—named processes or operation Bartlett–Lewis, 315 bivariate Poisson, 318
cluster, cluster formation, 307 Cox, doubly stochastic structure, 313 cyclic process on four points, 314 deterministic process, 307 Hawkes, 309 isotropic planar point processes, 310 multivariate random measure, 317 mutually exciting process, 322 Neyman–Scott, 314 isotropic case, 312 Poisson, 306 on Rd, 306 quadratic random measure, 313 random translations, 314 iterated translations, 314 renewal process, 306 superpositions, 313 Batch-size distribution moments, 51 point process on line, 46, 49 Poisson process, 29 Baum–Welch see E–M algorithm Bessel transform in Bartlett spectrum of isotropic planar process, 310 Best linear predictor, 353 ARMA representations, 354 point process with rational spectral density, 354 two-point cluster process, 356 Binary process, 237 logistic regression model, 237 Binning, 343 Binomial score, 278 Birth process, linear, simulation of, 275 Bivariate mark kernel for second-order stationary MPP, 325 Bivariate MPP Palm–Khinchin equations for, 331 Bivariate point processes from input–output process, 329 Bivariate Poisson distributions, 188 Bivariate Poisson process, 187 Bartlett spectrum, 318 forward recurrence times, 330 intensities for different histories, 250, 256 martingale properties, 256 random time transformations, 264 Blackwell renewal theorem, 83
Bochner’s theorem, 303, 412 Bolzano–Weierstrass property, 371 Bonferroni inequalities, 120, 122 Boolean algebra, 368 Boolean model for random set, 206 associated random fields, 206 moments of union set, 210 simulation, 275 Borel measurable function, 374 Borel measure, 374 on c.s.m.s. boundedly finite, 402 space of, 402 Borel sets, 384 countably generated in separable metric space, 385 in topological space, 374 Borel σ-algebra, 382, 384 Borelian sets, 384 Boson process, 172 discrete version, 174 Janossy densities, 222 Bounded convergence theorem, 376 Bounded variability process, 295, 301 Boundedly finite counting measures, 158 space of (= N_X^#), 158 measures, relatively compact family of, 405 space of (= M_X^#), 402 as a c.s.m.s., 403 weak convergence in, 403 w#-topology, 403 signed measure on Rd, 358 p.p.d., 358 positive-definite, 358 transformable, 358 translation-bounded, 358 Branching process age-dependent, 156 Galton–Watson, 13 sib distribution in, 13 model for spread of infection, 155 Branching process, general (= multiplicative population chain), 150 extinction probability in, 155 p.g.fl., moment measure relations, 150 for total population, 155 Burkill integral, 59 Burn-in period, in simulation, 269 as edge effect, 275
Càdlàg process, 429 Campbell measure, 163 Canonical ensemble, 127 Cartesian product, 377 Cauchy sequence, 370 Cauchy’s functional equation, 64 nonmeasurable solutions, 64 Central limit theorem, early proof by Ellis, 4 Centre-satellite process, 181 see Neyman–Scott process Change-point detection in residual analysis tests, 262, 263 Characteristic functional, 14 Characterizations of point processes Poisson, 26 renewal process, 77, 78 Chebyshev’s inequality, 418 Clique, in Markov point process, 218 Cluster models and processes, 11, 175 Bartlett spectrum, 307 centre and component processes, 176 independent clusters, 176 moment measure for, 191 p.g.fl., 178 second-order factorial moments, 178 sufficient condition for existence of stationary version, 191 see also Poisson cluster process Coherence in multivariate process spectrum, 318 Coincidence density, 136 product density, 136 Combinatorial arguments, 112 Compact regular measure, 387 iff tight measure in c.m.s., 387 Compact set in topological space, 371 Compensator, 241 defining random time change, 258 renewal process, 246 Complete history, 281 Complete independence Poisson process, 27 Complete intensity function, 234 Complete separable metric space (c.s.m.s.), 124, 371, 384 separability set in, 385, 388 tightness of Borel measure in, 388 Complete space, 370
Complete stationarity, 27 see stationarity Compound Poisson process, 25 conditional intensity characterization, 252, 257 definition via MPP, 198 infinite intensity example, 53 p.g.f., 27 random time transformation of MPP, 266 Conditional distributions, 415 regular version of, 416 Conditional expectation, 414, 417 Doob representation for, 417 repeated conditioning, 415 Conditional intensity function, 211, 231 as amalgam of hazard functions, 231 as random hazard function, 211 Papangelou intensity contrast, 232 terminology, 231 complete intensity function, 234 determine fidi distributions, 233 for MPP, 246 mark characterizations, 252, 257 of ground process, 249 history-dependent in bivariate process, 250, 256 in likelihood, 232 in nonlinear prediction, 267, 344 left-continuous version, 232 linear parametrizations, 235 Markov representations for, 239 of Cox process with Markovian rate process, 254 of renewal process, 237 use in thinning construction, 268 Conditional probability, 379 existence in regular case, 380 Conditional (second-order) intensity, 296 Conditional survivor functions, 229 Contagious distribution, 11 Continuity lemma for measures, 372 for σ-finite set function, 373 Continuous mapping theorem, 371 Controlled variability process see Bounded variability process
Convergence of conditional distributions, see Stable convergence of functions or r.v.s almost everywhere (a.e.), 376 almost sure (a.s.), 418 in Lp, 418 in probability, 418 stable, 419 of measures strong = in variation norm, 391 for renewal theorem, 90 vague, 391 weak, 391 w#, boundedly finite case, 403 Convergence-determining class of sets, 393 Corlol, = càdlàg, 429 Correlation function, radial, 298 Countable base, 371 Counting measure, point process on line, 42 Coupling method of proof Blackwell renewal theorem, 83 Coverage process, 205 Covering ring, 389 covering class, 396 covering semiring, 393 Cox process (= doubly stochastic Poisson process), 169 Bartlett spectrum, 313 conditions to be renewal process, 174 fidi distributions and moments, 170 Markovian rate functions, 244 p.g.fl., 170 Cox regression model, 238 Crude stationarity, 44 Poisson process, 27 C.s.m.s., 124 see Complete separable metric space Cumulative processes, 256 Current lifetime, 59 of renewal process, 76 Cyclic Poisson process, 26 likelihood, 226 Cyclic process on four points, 313 Bartlett spectrum, 314 Cylinder set in product space, 378 Delayed renewal process, 74
Determining class of set functions, 372 Deterministic process, 76 L2 sense, 345 process of equidistant points, 76 stationary, Bartlett spectrum of, 307 Diffuse measure, 382 Dirac measure, 382 Direct Riemann integrability, 85 conditions for, 90 Discrete point process binary process, 237 Hawkes process, 281 Wold process, 94, 103 Disintegration of measures, 379 Dissecting ring, 386 Dissecting system, 282, 382 existence in separable metric space, 385 nested family of partitions, 383 Dobrushin’s lemma, 48 Dominated convergence theorem, 376 Doob representation for conditional expectation, 417 Doob–Meyer decomposition of submartingale, 241, 430 Doubly stochastic Poisson process, see Cox process Doubly Poisson compound Poisson distribution, 123 Dynkin system (of sets), 369 Dynkin system theorem, 369 Earthquake models, see Epidemic type aftershock sequence (ETAS) model Stress-release model Edge effects in moment estimates, 299, 303 multivariate case, 320 in segment of stationary process, 216 in simulation, 275 periodic boundary effect, 222 plus and minus sampling, 221 Efficient score statistic factorial cumulant densities, 223 Gauss–Poisson process, 228 Neyman–Scott process, 228 point process on Rd, 222 Poisson cluster process, 225 Eigenvalues of random unitary matrices, 140
Elastic rebound theory, 239 Elementary renewal theorem, 72 analogue for process on R, 60 E–M algorithm, 239, 244 Entropy of finite point process, 287 score, 276 Epidemic type aftershock sequence (ETAS) model, 203 ground process, 239 nonlinear generalization, 253 spatial version, 205 under random time change, 266 Equivalent bases for topology, 392 metrics, 370 topological spaces, 370 Ergodic point process on R, 61 Ergodic theorems for point processes and random measures, 291 Erlang distribution, 4, 21 Essentially bounded r.v., 418 ETAS, see Epidemic type aftershock sequence Evolutionary dimension absent in spatial point pattern, 212 Evolutionary process likelihood theory for, 214 Exclusion probabilities, 124 Expectation function of stationary point process on R, 61 Expectation measure finite point process, 133 renewal process, 67 see also First moment measure Expected information gain, 277 linear and nonlinear predictors, 357 per time step, 280 per unit time, 283 Exponential autoregressive process, 92 density, two-sided, multivariate, 359 distribution lack of memory property, 24 transformation to, 258 formula for Lebesgue–Stieltjes integral, 107 Extension theorem for measures, 373 Extreme value distributions, 7
Factorial cumulant densities in efficient score statistics, 223 Factorial cumulant measures, 146 relation to other measures, 154 representation via factorial moment measures, 147 converse, 148 Factorial moment measures, 133 characterization of family of, 139 relation to other measures, 153 Factorial moments and cumulants, 114 Factorization lemma for measures invariant under σ-group of transformations, 409 rotation-invariant measure, 410 Fatou’s lemma, 376 Fermion process, 140 discrete, 143 Janossy densities, 222 renewal process example, 144 Fidi, see Finite-dimensional Filtration, 424 see History Finite Fourier transform of point process, 336 Finite inhomogeneous Poisson process likelihood, 213 likelihood ratio, 215 Finite intersection property, 371 of c.s.m.s., 371 Finite point process, 111, 123, 129 absolute continuity of Poisson, 226 canonical probability space for, 129 eigenvalues of random unitary matrix, 18, 140 expectation measure, 133 fidi distributions, 112 moment measures, 132 product density, 136 symmetric probability measures, 124, 129 Finite renewal process, 125 Finite-dimensional (fidi) distributions for point process, 130, 158 conditional density and survivor function representation, 230 for MPP, 247 consistency conditions, 158 for finite point process, 130
determined by conditional intensity, 233 for MPP, 251 Poisson process, 19, 159 Finitely additive set function, 372 condition to form measure, 388 continuity lemma, 372 countably or σ-additive, 372 measure when compact regular, 388 First passage time, 426 stopping time property, 426 First-order moment measures structure in stationary case, 289 for MPP, 322 for multivariate process, 316 see also Expectation measure Fixed atom of point process, 35 sample path family property, 35 Forecast of point process see Scores for probability forecast Forward recurrence time, 58 analyzed as MPP, 327 bivariate Poisson process, 330 convergence of distribution, 86 hazard function of, 59 Palm–Khinchin equation for, 58 Poisson process, 20 renewal process, 69 stationary renewal process, 75 Fourier transform, 411 inverse of, 411 inversion theorems for, 412 of Poisson process, 335 of p.p.d. measures, 357 of unbounded measures, 303, 357 Riemann–Lebesgue lemma for, 411 Fourier’s singular integral, 341 Fourier–Stieltjes transform, 412 Fredholm determinant, 141 Fubini’s theorem, 379 Functions of rapid decay, 332, 357 Gamma distribution, 3 Gamma random measure general, 167 stationary, 162 Gauss–Poisson process, 174, 185 efficient score statistic, 228 existence conditions, 185 Khinchin and Janossy measures, 219 marked, 331
on bounded set, 219 pseudo Cox process, 174 stationary, 220, 228 General Poisson process, 34 characterization by complete independence, 36 orderliness, 35 General renewal equation, 68 uniqueness of solution, 69 General theory of processes, 236 Generalized entropy see Relative entropy Generalized functions and p.p.d. measures, 357 Generating functional expansions relationships between, 153 Germ–grain model, 206 Gibbs process, 126 finite, 216 likelihood, pseudolikelihood, 217 ideal gas model, 128 interaction and point pair potentials, 127 soft- and hard-core models, 128 Gompertz–Makeham law, 3 Goodness-of-fit for point process, 261 algorithm for test of, 262 Grand canonical ensemble, 127 Ground process, 53, 194 conditional intensity λ*_g, 249 Group, 407 direct product, 408 dual, 413 topological, 407 equivalence classes on, 408 metrizable, 407 quotient topology, 408 Gumbel distribution, 7 Haar measure, 408 in factorization lemma, 409 on topological group and its dual, 413 Plancherel identity for, 413 Halo set, 387 Hamel equation, 64 Hard-core model, 128 Gibbs process, 128 Matérn’s models, 299 Strauss process, 217, 219 Hausdorff metric, 205 Hausdorff topology, 370
Hawkes process, 183 autoregressive process analogy, 309 Bartlett spectrum, 309 minimal p.p.d. measure for, 367 cluster construction of, 184 condition to be well-defined, 184, 234 conditional intensity for, 233 parametric forms, 234 representation by, 233 discrete, 281 infectivity function µ(·), 184 exponential, 185, 243 long-tailed, 203 linear prediction formula, 355 marked, 202 moments, 184 multivariate, see Mutually exciting nonlinear marked, 252 stationarity conditions, 252 self-exciting, 183 without immigration, 203 Hazard function, 2, 231, 242 in conditional intensity, 231 in life table, 2 of recurrence time r.v.s, 59 random, 211 role in simulation, 271 see also Integrated hazard function Hazard measure, 106 Heine–Borel property, 371 Hermite distribution, 123 Hilbert space, Poisson process on, 40 History of point process, 234, 424 complete, 281 filtration, 236 internal, 234, 424 for MPP, 249 intrinsic, 234, 424 list history, 269 minimal or natural, 424 Ideal gas model, 128 IHF, see Integrated hazard function I.i.d., see Independent identically distributed Immanants, 140 Independent σ-algebras, 415 redundant conditioning, 415 Independent cluster process, 176 conditions for existence, 177
Independent identically distributed (i.i.d.) clusters, 112, 125, 148 Janossy and other measures, 149 Janossy density, 125 negative binomial counts, 113 p.g.fl., 148 see also Neyman–Scott process Independent increments Poisson process, 29 Index of dispersion, 23 Infectivity model, 183 see Hawkes process Infinitely divisible p.g.f., 30 Information gain, 276 average, 279 conditional, 279 see also Expected information gain Inhomogeneous (= nonstationary) Poisson process, 22 conditional properties, 24 thinning construction, 24 Innovations process, 242 Input–output process cluster process example, 329 M/M/∞ queue example, 188 point process system, 319 Integrated hazard function (IHF), 108 exponential r.v. transformation, 258 in renewal process compensator, 246 Intensity function, inhomogeneous Poisson process, 22 see also Conditional intensity Intensity of point process on R, 47 infinite intensity example, 53 Interaction potential for Gibbs process, 127 Internal and intrinsic history, 234 see also History Inverse method of simulation, 260 Ising problem, 216 plus and minus sampling, 221 Isomorphisms of Hilbert spaces in spectral representations, 333 Isotropic planar point process, 297 Bartlett spectrum, 310 Bessel transform in, 310 Neyman–Scott example, 298, 302 Bartlett spectrum, 312 Ripley’s K-function, 297
Janossy measure and density, 125 local character of density, 136 moment measure representation, 135 converse, 135 relation to other measures, 153 Jensen’s inequality, 415 Jordan–Hahn decomposition of signed measure, 374 K-function, 297 Kagan (tapered Pareto) distribution, 255 Key renewal theorem, 86 applications, 86 Wold process analogue, 100 Khinchin existence theorem stationary point process on R, 46 Khinchin measures, 146 in likelihood, 219 relation to other measures, 154 use in efficient score statistics, 223 Khinchin orderliness, 52 Kolmogorov extension theorem, 381 projective limit, 381 Kolmogorov forward equations Hawkes process with exponential decay, 243 Kolmogorov–Smirnov test, 262 Korolyuk theorem, 47 generalized equation, 51 Kullback–Leibler distance, 277 Lp convergence, 418 Laguerre polynomials, in conditional intensity for Hawkes process, 234 Lampard reversible counter system, 106 Laplace functional for random measure, 161 Taylor series expansion, 161 Lebesgue bounded convergence theorem, 376 decomposition theorem, 377 integral, 375 monotone convergence theorem, 376 Lebesgue–Stieltjes integral exponential formula for, 107 integration by parts, 106 LeCam precipitation model, 191, 207, 209
Length-biased distribution for sibs in branching process, 13 in MPP, 326 in sampling, 45 see also waiting-time paradox Life table, 1 applications, 7 renewal equation from, 6 Likelihood for point process, 211, 213 as local Janossy density, 213 of Poisson process, 21 of regular MPP, 251 Likelihood ratio for point process, 214 inhomogeneous Poisson process, 215 score, 277 binomial score, 278 Line process Poisson, 39 representation as point process on cylinder, 39 Linear birth process simulation, 275 Linear filters acting on point processes and random measures, 342 Linear predictor, 344 best, 353 conditional intensity comparison, 344 Linear process from completely random measure, 169 Linearly parameterized intensities, 235 uniqueness of ML estimates, 235 Linked stress-release model, 255 simulation of, 273 List history, in simulation, 269 Local Janossy density, 137 as point process likelihood, 213 Janossy measure, 137 Khinchin measure, 150 process on A, p.g.fl., 149 Locally compact second countable topology, 371 topological space, 371 Logarithmic distribution p.g.f., 11 Logistic autoregression, 281 see Discrete Hawkes process Lognormal distribution, 3 Long-range dependent point process, 106 Lundberg’s collective risk model, 199 ruin probability, Cramér bound, 209
Mapping continuous, 371 measurable, 374 Marginal probability measures, 379 conditional probability, 379 Marginal process of locations in MPP, = ground process N_g, 194 Mark distributions in MPP, second-order properties, 323 Mark kernel for MPP, 195 Marked point process (MPP), 194 —general properties conditional intensity, 246 characterization of mark structure, 252, 257 ground process (= marginal process of locations), 194 simple MPP, 195 stationary, 195 internal history, 249 likelihood, 247 predictability, 249 reduced second moment measure distribution interpretation, 325 reference measure for, 247 regular, 247 second-order characteristics diverse nature, 325 MPP—mark-related properties evolutionary-dependent marks, 253 mark kernel, 195 structure of MPP with independent marks, 196 p.g.fl. and moment measures, 196 sum of marks as random measure, 197 with independent or unpredictable marks, 195, 238 conditional intensity characterization, 252, 257 MPP—named processes cluster, cluster-dependent marks, 326 Gauss–Poisson, 331 governed by Markovian rate function, 254 ground process with infinite mean density, 330 Hawkes, 202 expected information gain, 286 existence of stationary version, 203 functional, moment measure, 209
Markov chain on R+ homing set conditions for convergence, 96 existence of invariant measure, 97 application to Wold process, 100 intervals defining Wold process, 92 kernel with diagonal expansion, 104 Markov chain Monte Carlo, 217 Markov point processes, 218 Markov process governing MPP, 254 governing point process, 239 Martingale, 427 convergence theorem, 428 two-sided history version, 428 from Doob–Meyer decomposition, 430 in bivariate Poisson process, 256 representation of point process, 241 uniform integrability of, 428 Matérn’s models for underdispersion Model I in R, 298, 302 Model I in Rd, 302 Model II, 303 Maxwell distribution, 4 Mean density point process on line, 46 Mean square continuous process, 332, 348 integral of process with uncorrelated increments, 333 Measurable family of point process, 165 of random measures, 168 Measurable function, space, 374 closure under monotone limits, 376 Measure, 372 atomic and diffuse components, 383 Haar, 408 invariant under σ-group of transformations, 409 factorization lemma, 409 nonatomic, 383 on B_R, defined by right-continuous monotonic function, 373 on topological group, 407 positive-definite, 290, 358 reduced moment measure, 160, 289 regular, 386, 387 sequence of, uniform tightness, 394
signed, 372 symmetric, 290 tight, 387 compact regular, 387 transformable, 358 translation-bounded, 290, 358 Metric, metric topology, 370 compactness theorem, 371 complete, 370 distance function, 370 equivalent, 370 separable, 372 Metrizable space, 370 Minimal p.p.d. measures, 365 Hawkes process example, 367 Mixed Poisson distribution, 10 terminology, 10 Mixed Poisson process, 25, 167 orderliness counterexamples, 52 p.g.fl., 167 M/M/∞ queue input and output, 188 Modification of process, 424 Modulated renewal process, 237 Poisson process example, 244 Moment densities, 136 for renewal process, 139 Moment measure, 132 factorial, 133 Janossy measure representation, 134 for finite point process, 132 Janossy measure representation, 134 converse, 135 symmetry properties, 133 reduced, 290 of multivariate process, 316 Monotone class (of sets), 369 monotone class theorem, 369 Monotone convergence theorem, 376 Moving average representation of best linear predictor, 354 of random measure, 351 MPP, 194, see Marked point process µ-regular set, 387 Multiple points, 51 Multiplicative population chain, see Branching process, general Multivariate Neyman–Scott process moments, 329
Multivariate point process spectra coherence and phase, 318 Multivariate random measure Bartlett spectrum, 317 Multivariate triangular density, 359 Mutually exciting process, 320 Bartlett spectrum, 322 second-order moments, 321 Natural increasing process, 431 Negative binomial distribution, 10 counts in i.i.d. clusters, 113 p.g.f. expansions, 118 Pólya–Eggenberger, 12 Negative binomial process, 200 from compound Poisson, 200 from mixed Poisson, 201 Neighbourhood (w.r.t. a topology), 370 Neyman Type A distribution, 12 Neyman–Scott process, 181, 192 efficient score statistic, 228 likelihood, 221, 227 multivariate, moments of, 329 planar, 192, 298 isotropic, 302 shot-noise process, 192 Nonlinear marked Hawkes process, 252 Nonstationary Poisson see Inhomogeneous Poisson One-point process, 242 MPP, 256 random time change of, 260 Open sphere, 370 Optional sampling theorem, 429 in random time change, 259 Order statistics exponential distribution, 23 Poisson process, 24 Orderliness, 30, 47 general Poisson process, 35 Khinchin, 52 mixed Poisson simple but not orderly, 52 Poisson process, 30 renewal process, 67 simple but not Khinchin orderly, 52 simple nonorderly example, 52 stationary point process on R, 47 Palm process in reduced moment measure, 296
Palm–Khinchin equations, 14, 53 bivariate MPP, 331 interval stationarity, 53 renewal process, 55 Slivnyak’s derivation of, 59 stationary orderly point process, 53 Papangelou intensity contrast with conditional intensity function, 232 Parameter measure of Poisson process, 34 Pareto distribution, tapered, 255 Parseval equation or identity or relation, 304, 357 extended, for L1 (µ)-functions, 362 isotropic planar process, 311 p.p.d. measures, 357 one-to-one mapping, 362 random measure, 334 Particle process, 205 as random closed set, 205 coverage process, 205 union set, 205 volume fraction, 207 Partition function for Gibbs process, 127 Partitions nested family of, 383 in relative entropy, 383 of coordinate set, 143 of integer, 120 of interval set or space, 282 of set or space, 382 Perfect simulation, 275 Periodogram of point process, 336 Perron–Frobenius theorem use in Hawkes process analysis, 321 P.g.f., 10 see Probability generating function P.g.fl., 15 see Probability generating functional Phase in multivariate process spectrum, 318 Planar point processes, isotropic, moments, 297 Neyman–Scott, 298, 302 Ripley’s K-function, 298 two-dimensional renewal, 71 Plancherel identity, 413 Plus and minus sampling, 221
Point pair potential for Gibbs process, 127 Point process (see also individual entries) —basic properties absolute continuity, 214 canonical probability space, 158 definition as counting measure, 41 boundedly finite, 158 as sequence of intervals, 42 as set or sequence of points, 41 as step function, 41 exclusion probabilities, 124 fidi distributions, 158 Janossy measures, 124 measurable family of, 165 ordered v. unordered points, 124 orderly, 30, 47 origin of name, 14 second-moment function, 61 simple, 47 stationarity, 44, 160 with multiple points, 51 Point process—general properties best linear predictor, 353 efficient score statistic, 222 goodness-of-fit test, 261 likelihood, 211, 213 likelihood ratio for, 215 martingale representation, 241 periodogram for, 336 prediction via simulation, 274 relative entropy of, 283 residual analysis, 261 Point process—named (see also individual entries) Bartlett–Lewis, 182 Cox, 169 Gauss–Poisson, 174, 185 Gibbs, 126 Hawkes, 183 Neyman–Scott, 181 Poisson, 19 bivariate Poisson, 187 compound Poisson, 25 doubly stochastic Poisson, 169 mixed Poisson, 25 quasi Poisson, 31 Poisson cluster, 179 Wold, 92
Point process—types or classes of (see also individual entries) ARMA representations, 351 exponential intervals, 69 infinite intensity example, 53 long-range dependent, 106 of equidistant points, 76 on real line R, 41 stationarity, 44 Palm–Khinchin equations, 53 counting measure, 42 time to ith event, 44 regular, 213 system and system identification, 319 with complete independence, 34 structure theorem, 38 with or without aftereffects, 13 Poisson branching process, 182 see Bartlett–Lewis model Poisson cluster process, 179 bounded cluster size, 225 efficient score statistic, 225 existence and moments, 179 p.g.fl., canonical form, 188 point closest to the origin, 179 reduced factorial moment and cumulant densities, 180 representation of likelihood, 227 stationary second-order properties, 295 zero cluster probability not estimable, 190 Poisson distribution, 8 ‘compound’ or ‘generalized’ or ‘mixed’ terminology, 10 limit of binomial, 8 p.g.f., 10 Raikov theorem characterization, 32 Poisson process, 13, 19 (see also individual entries) —on real line R avoidance functions, 25 batch-size distribution, 28 characterization by complete randomness, 26 count distributions on unions of intervals, 31 forward recurrence time, 77 renewal process, 77 exponential intervals, 69 superposition, 80
superposition counterexample, 82 complete independence, 27 conditional distributions, 22 crude stationarity, 27 implies stationarity, 27 fidi distributions, 19 Fourier transform of, 335 from random time change, 257 in random environment, 244 independent increment process, 29 index of dispersion, 23 inhomogeneous (= nonstationary), 22 cyclic intensity, 26 time change to homogeneous, 23 intensity, 20 likelihood, 21 mean density, 20 order statistics for exponential distribution, 23 orderly, simple, 30 recurrence time, 20 backward, 27 stationary, 19 survivor function, 20 waiting-time paradox, 21 Poisson process—in Rd avoidance function, 32 characterization by, 32 Bartlett spectrum, 306 finite inhomogeneous, likelihood, 213 random thinning, 34 random translation, 34 simulation, 25 Poisson process—in other named spaces cylinder, 39 as Poisson line process, 39 Hilbert space, 40 lattice, 39 surface of sphere, 39 surface of spheroids, 39 Poisson process—in c.s.m.s. fixed atom, 35 Khinchin measures, 219 parameter measure, 34 atom of, 35 see also extension of R, 22 Poisson summation formula, 367 Poisson tendency in vehicular traffic, 329 Polish space, 371 P´ olya–Eggenberger distribution, 12
Positive measure, 290 Positive positive-definite (p.p.d.) measure, 290, 303, 357 closure under products, 359 nonunique ‘square root’, 359 decomposition of, 365 density of, 367 Fourier transform of, 357, 359 minimal, 365 Hawkes process example, 367 of counting measure, 359 Parseval equations, one-to-one mapping, 362 symmetry of, 360 tempered measure property, 367 translation-bounded property, 360 use of Parseval identities, 357 Positive-definite function, 412 measure, 290, 358 sequence, 366 Power series expansions of p.g.f., 117 P.p.d., see Positive positive-definite Predictability, predictable σ-algebra, 425 characterization of, 425 conditional intensity function, 232, 241 in random time change, 259 of MPP, 249 of process, 425 Prediction of point process, 267 use of simulation in, 274 Previsibility, 425 Prior σ-algebra, 429 see T -prior σ-algebra Probability forecast, 276 see also Scores for Probability gain, 278 see also Expected information gain Probability generating function (p.g.f.), 10 compound Poisson process, 27–29 discrete distribution, 115 for i.i.d. cluster, 113 infinitely divisible, 30 negative binomial, 10 power series expansions, 117 Taylor series expansions, 115
Probability generating functional (p.g.fl.), 15 cluster process, 178 Cox process, 170 factorial moment measure representation, 146 finite point process, 144 i.i.d. clusters, 148 Janossy measure representation, 145 mixed Poisson process, 167 Probability space, 375 product space, 377 conditional probability, 379 independence, 378 marginal probability measures, 379 Process governed by Markov process conditional intensity function, 253 MPP, 254 Process of correlated pairs, 185 see Gauss–Poisson process Process of Poisson type, 259 Process with marks, see Marked point process Process with orthogonal increments, 333 Processes with stationary increments spectral theory, 303 Product density, 136 finite point process, 136 coincidence density, 136 Product measurable space, 378 disintegration, 379 double integrals, 378 Fubini theorem, 379 setting for independence, 378 Product measure, σ-ring, 378 extension problem, 382 projective limit, 382 Product space, 377 of measure spaces, 378 of topological spaces, 377 Product space, topology, 377 cylinder set, 378 Progressive measurability, 424 Prohorov distance, 398 weak convergence theorem, 394 Pseudolikelihood, 217 Purely nondeterministic process, 345 Bartlett spectrum condition, 347
Quadratic random measure, 162 Bartlett spectrum, 313 moments, 168 Quadratic score for probability forecast, 286 variation process of martingale, 431 Radial correlation function, 298 Radon–Nikodym derivative, 377 approximation to, 383 as conditional expectation, 414 Radon–Nikodym theorem, 376 Raikov’s theorem, 32 Random hazard function, 211 Random measure, 160 ARMA representations, 351 best linear predictor, 353 as sum of marks in MPP, 197 atomic, from MPP, 197 gamma, 162, 167 see named entry Laplace functional, 161 measurable family of, 168 quadratic, 164 see named entry shot-noise process, 168 smoothing of, 168 as linear process, 169 stationary, second-order moment structure, 289 wide-sense, 339 Random sampling of random process, 337 Random signed measure as mean-corrected random measure, 292 wide-sense spectral theory, 339 characterization of spectral measure, 342 Random thinning, 24, 34, 78 see also Thinning operation Random time change, 257 multivariate, 265 for multivariate and MPP, 265 transformation to Poisson process, 258 Random translation Bartlett spectrum, 314 Poisson process, 34 Random variable, formal definition, 375
Random walk as a point process, 70 generalized renewal equation, 70 nonlattice step distribution, 73 symmetric stable distribution, 71 transience and recurrence, 70 two-dimensional, 71, 74 cluster process, 182 see Bartlett–Lewis process finite, normally distributed steps, 131 Rapid decay, functions of, 357 Rational spectral density, 348 canonical factorization, 348 Hawkes process example, 309 linear predictor, 354 renewal process, 357 Recurrence time r.v.s, 58, 75, 331 MPP stationary d.f. derivation, 327 Reduced covariance measure, 292 properties, 292 structure, atomic component, 292 simple point process characterization, 294 Reduced moment and cumulant measures, 160 Reduced moment measures estimates for, 299, 303 multivariate case, 320 Reduced second-moment measure, 290 characterization problem, 305, 315 for multivariate process, 317 for MPP, 322 bivariate mark kernel, 325 interpretations, 324 Palm process interpretation, 296 Reference probability measure for MPP, 247 in likelihood ratio score, 277 Regeneration point, 13 Regular measure, 386, 387 Regular point process, 213 conditional densities in one-to-one relation, 230, 232 MPP case, 247 defined uniquely by conditional intensity, 251 likelihood, 251 Relative compactness of measures, 394 of Radon measures on locally compact c.s.m.s., 406
Relative entropy, 277, 383 of point processes, 283 Relative second-order intensity, 297 Reliability theory, 6 failure rate classification of distributions, 7 Renewal equation, 6, 68 general, 68 linear solution, 70 unique solution, 69 Renewal function, 67 asymptotic discrepancy from linearity, 91 for Erlang distribution, 78 thinning, 76 rescaling characterization, 79, 82 see also Renewal theorem Renewal measure, 67 Renewal process, 67 —general properties compensator, 246 conditional intensity function, 237 construction by thinning, 268 delayed or modified, 74 expected information gain, 284, 287 exponential intervals, 69 finite, 125 Janossy densities for, 126 first moment measure for, 67 forward recurrence time, 69, 75 from fermion process, 144 from Matern’s Model I, 302 higher moments, 73 lifetime, 67 current lifetime, 76 likelihood, 242 linear and nonlinear predictors, 357 with rational spectral density, 357 modulated, 237 moment densities for, 139 orderliness, 67 ordinary, 67 Palm–Khinchin equation setting, 55 interval distributions, 55 prediction of time to next event, 110 process with limited aftereffects, 13 recurrence times, 58, 74 two-dimensional, 71, 74
Renewal process—stationary, 75 Bartlett spectrum, 306 transformation to Poisson process, 259 characterizations of Poisson process, 77, 80 conditions to be Cox process, 174 infinite divisibility conditions, 82 recurrence times, current lifetime, 75 superposition of, 79 thinning of, 78 Renewal theorem Blackwell, 83 convergence in variation norm, 90 counterexample, 91 for forward recurrence time, 86 for renewal density, 86 key, 86 rate of convergence, 91 uniform convergence, 90 Renewal theory, 1, 67 in life tables, 1 Repulsive interaction, 128, 142 Residual analysis for point process, 261 for multivariate and MPP, 267 tests for return to normal intensity, 262 for relative quiescence, 263 see also Goodness-of-fit Ring of sets, 368 covering ring, 389 generating ring, 369 self-approximating, 389 existence of, 390 finite and σ-additive, 389 Ripley’s K-function, 297 Score for probability forecast binomial, 278 entropy, 276 likelihood, 277 quadratic, 286 Second-order intensity, 296 relative, 296 Second-order properties of point processes and random measures, 288 complementarity of count and interval properties, 288 moment measures, 61, 289 structure in stationary case, 289
for multivariate process, 317 for MPP, 322 Second-order stationarity, 289, 334 Self-approximating ring, 389 existence of, 390 Self-correcting point process, 239 see also Stress-release model Self-exciting process, 183 see Hawkes process Semiring, 368 Separability set (of metric space), 371 Set closure, 369 boundary, interior, 369 Shot-noise process, 163, 170 as Neyman–Scott process, 192 Campbell measure, 163 conditions for existence, 168 intensity of, 163 p.g.fl. and factorial cumulants, 170 random measure, 168 σ-additive set function, 372, 387 determining class for, 372 see also Measure σ-algebra of sets, 369 countably generated, 369 independent, 415 σ-compactness in c.s.m.s., 372 σ-compact space, 372 σ-finite set function, 373 σ-group, 408 of scale changes, 409 of rotations, 410 σ-ring, 369 countably generated, 369 σ-compactness in c.s.m.s., 372 Signed measure, 373 Jordan–Hahn decomposition for, 374 variation norm for, 374 Simple function, 375 Simple point process, 47 characterization via Janossy measure, 138 moment measure, 139 reduced covariance measure, 294 with continuous compensator, 259 Simple Poisson process fidi distributions, 159
Simulation of point process, 260, 267 by inverse method, 260 MPP extension, 267 by thinning method, 268 MPP extension, 273 Ogata, 271 Shedler–Lewis, 270, 275 perfect, 275 use in prediction, 274 Simulation—named processes cluster process, 275 linear birth process, 275 Poisson process in Rd , 25 renewal process, 268 stress-release models, 271, 273 Wold process, 274 Singularity of measures, 377 Soft-core model, 128 Spatial point pattern, 17, 212 can lack evolutionary dimension, 212 Spectral density of point process, 305 see also Rational spectral density Spectral measure point process, see Bartlett spectrum stationary process, 305 Spectral representation, 331 of random measure, 331 isomorphisms of Hilbert spaces, 333 for randomly sampled process, 337 via second-moment measure, 341 Spread-out distribution, 87 use in renewal theory, 88 Stable convergence, 419 equivalent conditions for, 420 F -mixing convergence, 421 selection theorem for, 422 topology of, 423 Stable random measure, 168 Stationarity, 41, 45, 159 crude, 44 interval, 45 reduced moment and cumulant measures, 160 second-order, 289 simple, 44 see also individual entries for named processes Stationary interval function, 331
Stationary mark distribution, 323 ergodic and Palm probability interpretation, 323 Stationary random measure deterministic and purely nondeterministic, 345 Stirling numbers, 114 first and second kind, 114 in factorial moment representations, 142 recurrence relations, 122 Stochastic geometry, 17, 205 Stochastic process, 423 as function on Ω × X , 423 F(−) -adapted, 426 measurable, 424 modification of, 424 predictable, 425 progressively measurable, 425 Stopping time, 425 extended, 425 first passage time construction, 426 in random time change, 259 T -prior σ-algebra, 429 Strauss process, 217 cluster version, 227 likelihood, 217 Stress-release model, 239 forward Kolmogorov equations, 245 linked, 255 conditional intensity function, 255 stability results, 257 risk and moments of, 245 simulation of, 271 variance of stress, 245 Sub- and superadditive functions, 63 applications of, 46–59 limit properties, 64 Sub- and supermartingale, 428 see also Martingale Subgroup, 407 invariant, 407 normal, 407 Survival analysis, 17 Survivor function, 2 Poisson process, 20 conditional, 229 determine fidi distributions, 230 Symmetric difference of sets, 368
Symmetric measure, 290 p.p.d. measure property, 360 Symmetric probability measure, 124, 129 Symmetric sets and measures, 129, 131 System identification for point processes, 319 cluster process example, 329 for input–output process, 329 T -prior σ-algebra, 429 strict, 429 Tapered Pareto distribution, 255 Taylor series expansions of p.g.f., 115 Thinning operation Poisson process, 24, 34 renewal process, 78 simulation algorithms, 268–275 Tight measure, 387 Topological group, measure on, 407 locally compact, 408 Abelian, characters of, 413 dual group, 413 Topology, topological space, 369 basis for, 370 compact set in, 372 countable base for, 370 equivalent bases for, 370 Hausdorff, 370 locally compact, 372 metric, 370 product, 377 relative compactness, 372 second countable, 370 Totally bounded space, 373 Totally finite additive set function, 373 Totally finite measures regular on metric space, 387 metric properties of, 398 space of (= MX ), 398 c.s.m.s. under weak convergence topology, 400 equivalent topologies for, 398 mapping characterization of σ-algebra, 401 Prohorov’s metric on, 398 Transformable measure, 358 property of p.p.d. measure, 362 sequences, 366 translation-bounded counterexample, 366
Translation-bounded measure, 290, 358 integrability characterization of, 367 property of p.p.d. measure, 360 Triangular density, 359 multivariate extension, 359 Trigger process, see Shot-noise, 163 Two-dimensional process, see Planar Two-point cluster process, 348 Bartlett spectrum factorization, 348 best linear predictor, 356 Two-point process, 266 Two-sided exponential density, 359 multivariate extension, 359 Unbounded measures Fourier transform, 303, 357 Uniform integrability, 418 equivalent to L1 convergence, 419 Unitary matrix group eigenvalues of random element as finite point process, 18, 140 Unpredictable marks, process with, 238 Urysohn’s theorem for c.s.m.s., 371 Variance function of stationary point process, 294, 301 bounded variability process, 295 Fourier representation, 305 simple point process, 62, 295 Variation norm for signed measure, 374 Variation of function upper, lower, total, 374 Vehicles on a road, 328 Volume fraction of union set, 207 Waiting-time paradox, 21, 45 Weak convergence of measures, 390 compactness criterion for, 394 on metric space, equivalent conditions for, 391
functional condition for, 392 preservation under mapping, 394 relative compactness of, 394 Weibull distribution, 3, 7 Wide-sense theory, 339, 345 Wold decomposition theorem, 344 extension to random measures, 345 Wold process, 92 —general properties of conditional intensity for, 233 convergence in variation norm, 102 intervals as Markov chain homing set conditions for, 96 Markov transition kernel for, 92 diagonal expansion specification, 95, 104 key renewal theorem analogue, 100 likelihood and hazard function, 242 mth order, 105 stationary distribution, 93 homing set condition for, 96 Wold process—named examples χ2 distributed intervals, 104 conditionally exponentially distributed intervals, 95, 105, 110 information gain, 287 prediction of, 274 discrete, 94, 103 first-order exponential autoregressive process, 92 infinite intensity example, 102 infinitely divisible intervals, 102 intervals as autoregressive process time-reversed example, 105 Lampard’s reversible counter system, 106 long-range dependent example, 106 non-Poisson process with exponential intervals and Poisson counts, 105
Probability and Its Applications A Series of the Applied Probability Trust
Editors: J. Gani, C.C. Heyde, P. Jagers, T.G. Kurtz
Probability and Its Applications
Azencott et al.: Series of Irregular Observations. Forecasting and Model Building
Bass: Diffusions and Elliptic Operators
Bass: Probabilistic Techniques in Analysis
Berglund/Gentz: Noise-Induced Phenomena in Slow-Fast Dynamical Systems: A Sample-Paths Approach
Chen: Eigenvalues, Inequalities and Ergodic Theory
Costa/Fragoso/Marques: Discrete-Time Markov Jump Linear Systems
Daley/Vere-Jones: An Introduction to the Theory of Point Processes I: Elementary Theory and Methods, Second Edition
Daley/Vere-Jones: An Introduction to the Theory of Point Processes II: General Theory and Structure, Second Edition
de la Peña/Giné: Decoupling: From Dependence to Independence. Randomly Stopped Processes, U-Statistics and Processes, Martingales and Beyond
Del Moral: Feynman-Kac Formulae. Genealogical and Interacting Particle Systems with Applications
Durrett: Probability Models for DNA Sequence Evolution
Galambos/Simonelli: Bonferroni-type Inequalities with Applications
Gani (ed.): The Craft of Probabilistic Modelling. A Collection of Personal Accounts
Guyon: Random Fields on a Network. Modeling, Statistics and Applications
Kallenberg: Foundations of Modern Probability, 2nd edition
Kallenberg: Probabilistic Symmetries and Invariance Principles
Last/Brandt: Marked Point Processes on the Real Line
Molchanov: Theory of Random Sets
Nualart: The Malliavin Calculus and Related Topics, 2nd edition
Shedler: Regeneration and Networks of Queues
Silvestrov: Limit Theorems for Randomly Stopped Stochastic Processes
Rachev/Rüschendorf: Mass Transportation Problems. Volume I: Theory
Rachev/Rüschendorf: Mass Transportation Problems. Volume II: Applications
Resnick: Extreme Values, Regular Variation and Point Processes
Thorisson: Coupling, Stationarity and Regeneration
D.J. Daley
D. Vere-Jones
An Introduction to the Theory of Point Processes Volume II: General Theory and Structure Second Edition
D.J. Daley Centre for Mathematics and its Applications Mathematical Sciences Institute Australian National University Canberra, ACT 0200, Australia [email protected]
D. Vere-Jones School of Mathematics, Statistics and Computing Science Victoria University of Wellington Wellington, New Zealand [email protected]
Series Editors: J. Gani Stochastic Analysis Group, CMA Australian National University Canberra, ACT 0200 Australia
P. Jagers Department of Mathematical Sciences Chalmers University of Technology and Göteborg (Gothenburg) SE-412 96 Göteborg Sweden
C.C. Heyde Stochastic Analysis Group, CMA Australian National University Canberra, ACT 0200 Australia
T.G. Kurtz Department of Mathematics University of Wisconsin 480 Lincoln Drive Madison, WI 53706 USA
ISBN 978-0-387-21337-8
e-ISBN 978-0-387-49835-5
Library of Congress Control Number: 2007936157 © 2008, 1988 by the Applied Probability Trust. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. Printed on acid-free paper. 9 8 7 6 5 4 3 2 1 springer.com
To Nola, and in memory of Mary
Preface to Volume II, Second Edition
In this second volume, we set out a general framework for the theory of point processes, starting from their interpretation as random measures. The material represents a reorganized version of those parts of Chapters 6–14 of the first edition not already covered in Volume I, together with a significant amount of new material. Contrary to our initial expectations, growth in the theoretical aspects of the subject has at least matched the growth in applications. Much of the original text has been substantially revised in order to present a more consistent treatment of marked as well as simple point processes. This applies particularly to the material on stationary processes in Chapter 12, the Palm theory covered in Chapter 13, and the discussion of martingales and conditional intensities in Chapter 14. Chapter 15, on spatial point processes, has also been significantly modified and extended. Essentially new sections include Sections 10.3 and 10.4 on point processes defined by Markov chains and Markov point processes in space; Sections 12.7 on long-range dependence and 12.8 on scale invariance and self-similarity; Sections 13.4 on marked point processes and convergence to equilibrium and 13.6 on fractal dimensions; Sections 14.6 on random time changes and 14.7 on Poisson embedding and convergence to equilibrium; much of the material in Sections 15.1–15.4 on spatial processes is substantially new or revised; and some recent material on point maps and point stationarity has been included in Section 13.3. As in the first edition, much of the general theory has been developed in the context of a complete separable metric space (c.s.m.s. throughout this volume). Critical to this choice of context is the existence of a well-developed theory of measures on metric spaces, as set out, for example, in Parthasarathy
(1967) or Billingsley (1968). We use this theory at two levels. First, we establish results concerning the spaces¹ M#X and N#X of realizations of random measures and point processes, showing that these spaces themselves can be regarded as c.s.m.s.s, and paying particular attention to sample path properties such as the existence of atoms. Second, leaning on these results, we use the same framework to discuss the convergence of random measures and point processes. The fact that the same theory appears at both levels lends unity and economy to the development, although care needs to be taken in discriminating between the two levels. The text of this volume necessarily assumes greater familiarity with aspects of measure theory and topology than was the case in Volume I, and the first two appendices at the end of Volume I are aimed at helping the reader in this regard. The third appendix reviews some of the material from martingale theory and the general theory of processes that underlies the discussion of predictability and conditional intensities in Chapter 14. As was the case in Volume I, we are very much indebted to the friends, critics, reviewers, and readers who have supplied us with comments, suggestions, and corrections at various stages in the preparation of this volume. The list is too long to include in full, but we would like to mention in particular the continuing support and advice we have had from Robin Milne, Val Isham, Rick Schoenberg, Gunther Last, and our long-suffering colleagues in Canberra and Wellington. The patience and expertise of Springer Verlag, as mediated through our long-continued contacts with John Kimmel, are also very much appreciated. Daryl Daley Canberra, Australia
David Vere-Jones Wellington, New Zealand
¹ In this edition we use M#X (and N#X) to denote spaces of boundedly finite (counting) measures on X, where in the first edition we used MX (and NX), respectively.
Contents
Preface to Volume II, Second Edition   vii
Principal Notation   xii
Concordance of Statements from the First Edition   xvi
9   Basic Theory of Random Measures and Point Processes   1
    9.1   Definitions and Examples   2
    9.2   Finite-Dimensional Distributions and the Existence Theorem   25
    9.3   Sample Path Properties: Atoms and Orderliness   38
    9.4   Functionals: Definitions and Basic Properties   52
    9.5   Moment Measures and Expansions of Functionals   65
10   Special Classes of Processes   76
    10.1   Completely Random Measures   77
    10.2   Infinitely Divisible Point Processes   87
    10.3   Point Processes Defined by Markov Chains   95
    10.4   Markov Point Processes   118
11   Convergence Concepts and Limit Theorems   131
    11.1   Modes of Convergence for Random Measures and Point Processes   132
    11.2   Limit Theorems for Superpositions   146
    11.3   Thinned Point Processes   155
    11.4   Random Translations   166
12   Stationary Point Processes and Random Measures   176
    12.1   Stationarity: Basic Concepts   177
    12.2   Ergodic Theorems   194
    12.3   Mixing Conditions   206
    12.4   Stationary Infinitely Divisible Point Processes   216
    12.5   Asymptotic Stationarity and Convergence to Equilibrium   222
    12.6   Moment Stationarity and Higher-order Ergodic Theorems   236
    12.7   Long-range Dependence   249
    12.8   Scale-invariance and Self-similarity   255
13   Palm Theory   268
    13.1   Campbell Measures and Palm Distributions   269
    13.2   Palm Theory for Stationary Random Measures   284
    13.3   Interval- and Point-stationarity   299
    13.4   Marked Point Processes, Ergodic Theorems, and Convergence to Equilibrium   317
    13.5   Cluster Iterates   334
    13.6   Fractal Dimensions   340
14   Evolutionary Processes and Predictability   355
    14.1   Compensators and Martingales   356
    14.2   Campbell Measure and Predictability   376
    14.3   Conditional Intensities   390
    14.4   Filters and Likelihood Ratios   400
    14.5   A Central Limit Theorem   412
    14.6   Random Time Change   418
    14.7   Poisson Embedding and Existence Theorems   426
    14.8   Point Process Entropy and a Shannon–MacMillan Theorem   440
15   Spatial Point Processes   457
    15.1   Descriptive Aspects: Distance Properties   458
    15.2   Directional Properties and Isotropy   466
    15.3   Stationary Line Processes in the Plane   471
    15.4   Space–Time Processes   485
    15.5   The Papangelou Intensity and Finite Point Patterns   506
    15.6   Modified Campbell Measures and Papangelou Kernels   518
    15.7   The Papangelou Intensity Measure and Exvisibility   526
References with Index   537
Subject Index   557
Chapter Titles for Volume I
1   Early History   1
2   Basic Properties of the Poisson Process   19
3   Simple Results for Stationary Point Processes on the Line   41
4   Renewal Processes   66
5   Finite Point Processes   111
6   Models Constructed via Conditioning: Cox, Cluster, and Marked Point Processes   157
7   Conditional Intensities and Likelihoods   211
8   Second-Order Properties of Stationary Point Processes   288
A1   A Review of Some Basic Concepts of Topology and Measure Theory   368
A2   Measures on Metric Spaces   384
A3   Conditional Expectations, Stopping Times and Martingales   414
Principal Notation
Very little of the general notation used in Appendices 1–3 in Volume I is given below. Also, notation that is largely confined to one or two sections of the same chapter is mostly excluded, so that neither all the symbols used nor all the uses of the symbols shown are given. The repeated use of some symbols occurs as a result of point process theory embracing a variety of topics from the theory of stochastic processes. Generally, the particular interpretation of symbols with more than one use is clear from the context. Where they are given, page numbers indicate the first or significant use of the notation. Page numbers in slant font, such as 158, refer to Volume I, but such references are not intended to be comprehensive. Throughout the lists below, N denotes a point process, ξ a random measure (or sometimes, as on p. 358, a cumulative process on R+ ), and X a c.s.m.s.
Spaces
C   complex numbers
R = R1   real line
R+, R0+   nonnegative numbers, positive numbers   358
Rd   d-dimensional Euclidean space
S   circle group and its representation as (0, 2π]
Ud2α   d-dimensional cube of side length 2α and vertices (±α, . . . , ±α)
X   countable state space for Markov chain   96
Z, Z+   integers of R, R+
X   state space of N or ξ; often X = Rd; always X is c.s.m.s. (complete separable metric space)   158, 7
Ω   space of probability elements ω
E   measurable sets in probability space
(Ω, E, P)   basic probability space on which N and ξ are defined
X(n)   n-fold product space X × · · · × X   129
X∪   = X(0) ∪ X(1) ∪ · · ·   129
B(X)   Borel σ-field generated by open spheres of X   34
BX = B(X), B = BR = B(R)   34, 374
B(n)X = B(X(n))   product σ-field on product space X(n)   129
BM(X)   bounded measurable functions of bounded support   161, 52
BM+(X)   nonnegative functions f ∈ BM(X)   57
B̄M+(X)   limits of monotone sequences from BM+(X)   57
K   mark space for marked point process (MPP)   194, 7
MX, NX   totally finite (counting) measures on X   158, 3
M#X   boundedly finite measures on X   158, 3
N#X   boundedly finite counting measures on X   131, 3
N#0   = N#X \ {∅}   90
N#∗X   simple counting measures in N#X   24
N#∗0   subset of N#∗X with N{0} > 0   24, 290
S∞+   doubly infinite sequences of positive numbers {t0, t±1, t±2, . . .} with ∑n≥1 tn = ∑n≥1 t−n = ∞   14
U   linear space; complex-valued Borel measurable functions ζ on X with |ζ| ≤ 1   52; 57
U ⊗ V   product topology on product space X × Y of topological spaces (X, U), (Y, V)   378
V = V(X)   [0, 1]-valued measurable functions h(·) with 1 − h(·) of bounded support in X   59
V0(X)   = {h ∈ V(X): infx h(x) > 0}, i.e., − log h ∈ BM+(X)   59
V̄(X)   limits of monotone sequences from V(X)   59
W = X × M#X   product space supporting Campbell measure CP   269
General

Unless otherwise specified, A ∈ B_X, k and n ∈ Z_+, t and x ∈ R, h ∈ V(X), and z ∈ C.

˘ (breve) — reduced measure (by factorization) [160, 183]
# — extension of concept from totally finite to boundedly finite measure space [158, viii]
F^{n*} — n-fold convolution power of measure or d.f. F [55]
a, g — suffixes for atomic measure, ground process of MPP [4, 3]
‖µ‖ — variation norm of (signed) measure µ [374]
a.e. µ, µ-a.e. — almost everywhere with respect to measure µ [376]
a.s., P-a.s. — almost sure, P-almost surely [376]
A(·), A_F(·) — F-compensator for ξ on R_+ [358]
A^(n) — n-fold product set A × ··· × A [130]
A — family of sets generating B; semiring of bounded Borel sets generating B_X [31, 368]
c_k, c_[k] — kth cumulant, kth factorial cumulant, of distribution {p_n} [116]
c(x) = c(y, y + x) — covariance density of stationary mean square continuous process on R^d [160, 69]
C_k(·), C_[k](·) — cumulant, factorial cumulant measure [147, 69]
C_2(A × B) = cov(ξ(A), ξ(B)) — covariance measure of ξ [191, 69]
C̆_2(·) — reduced covariance measure of stationary N or ξ [292, 238]
C_P, C_P^! — Campbell measure, modified Campbell measure [269, 270]
C̆_P(·) — reduced Campbell measure (= Palm measure) [287, 331]
δ(·) — Dirac delta function
δ_x(A) — Dirac measure, = ∫_A δ(u − x) du = I_A(x) [382, 3]
D_α — Dirichlet process [12]
ΔF(x) = F(x) − F(x−) — jump at x in right-continuous function F [107]
Δ_L, Δ_R — left- and right-hand discontinuity operators [376]
F(· ; ·) — finite-dimensional (fidi) distributions [158, 26]
F; F† — history on R_+; R [236, 356; 394]
Φ(·) — characteristic functional [15, 54]
G[h] (h ∈ V) — probability generating functional (p.g.fl.) of N [144, 59]
G_c[·] — p.g.fl. of cluster centre process N_c [178]
G_m[· | x] — p.g.fl. of cluster member process N_m(· | x) [178, 192]
G — expected information gain of stationary N on R [280, 442]
Γ(·) — Bartlett spectrum for stationary ξ on R^d [304, 205]
H(t) — integrated hazard function (IHF) [Q(t) in Vol. I] [109, 361]
H(P; µ) — generalized entropy [277, 441]
H; H† — internal history of ξ on R_+; R [236, 358; 395]
I_A(x) = δ_x(A) — indicator function of element x in set A [194]
I — σ-field of events invariant under shift operator S_u
J_n(A_1 × ··· × A_n) — Janossy measure [124]
J_n(· | A) — local Janossy measure [137, 73]
j_n(x_1, ..., x_n) — Janossy density [125, 119, 506]
K — compact set; generic Borel set in mark space K [371, 8]
ℓ(·) — Lebesgue measure in B(R^d) [31]
ℓ_K(·) — reference measure on mark space [401]
L[f] (f ∈ BM_+(X)) — Laplace functional of ξ [161, 57]
L_ξ[1 − h] — p.g.fl. of Cox process directed by ξ [170]
λ(·), λ — intensity measure of N, intensity of stationary N [44, 46]
λ*(t, ω) — conditional intensity function [231, 390]
λ†(t, κ, ω) — complete intensity function for stationary MPP on R [394]
m_k(·) (m_[k](·)) — kth (factorial) moment density [136]
m̆_2, M̆_2 — reduced second-order moment density, measure, of stationary N [289]
m_g — mean density of ground process N_g of MPP N [198, 323]
M(A) — expectation measure E[ξ(A)] [65]
M_k(·) — kth-order moment measure E[ξ^(k)(·)] [66]
N(A) — number of points in A [42]
N(a, b] = N((a, b]) — number of points in half-open interval (a, b] [19, 42]
N(t) — = N(0, t] = N((0, t]) [42]
N_c — cluster centre process [176]
N_m(· | x) — cluster member or component process [176]
N_g — ground process of MPP [194, 7]
N* — support counting measure of N [4]
{(p_n, Π_n)} — probability measure elements for finite point process [123]
P = (p_ij) — matrix of one-step transition probabilities p_ij of discrete-time Markov chain on countable state space X [96]
P(z) — probability generating function (p.g.f.) of distribution {p_n} [10, 115]
P_0(A) — avoidance function [31, 33]
P — probability measure of N or ξ on c.s.m.s. X [158, 6]
P_0(·) — Palm distribution for stationary N or ξ on R [288]
P_0 — averaged (= mean) Palm measure for stationary MPP [319]
P_(0,κ)(·) — Palm measure for κ ∈ K [318]
P_{x,κ}(·) — local Palm measure for (x, κ) ∈ X × K [318]
{π_k} — stationary distribution for (p_ij) [96]
∅, ∅(·) — empty set; null measure [17; 88, 292]
Q = (q_ij) — Q-matrix of transition rates q_ij for continuous-time Markov chain on countable state space X [97]
ρ(x, y) — metric for x, y in metric space [370]
ρ(y | x) — Papangelou (conditional) intensity [120, 506]
{S_i} — nested bounded sets, S_i ↑ X (i → ∞) [16]
{S_n} — random walk, sequence of partial sums [66]
S_r(x) — sphere of radius r, centre x, in metric space X [371, 5]
S_r — = S_r(0) [459]
{t_i(N)}, {t_i} — successive points of N on R, t_{−1} < 0 ≤ t_0 [15]
{τ_i} — intervals between points of N on R, τ_i = t_i − t_{i−1} [15]
T = {T_n} = {{A_ni}} — dissecting system of nested partitions [382, 10]; tiling [16]
T_∞ — tail σ-algebra of process on R^d [208]
U(A) = E[N(A)] — renewal measure [67]
Concordance of Statements from the First Edition
The table below lists the identifying number of formal statements of the first edition (1988) of this book and their identification in both volumes of this second edition. Entries are given in the form "1988 edition → this edition".
2.2.I–III → 2.2.I–III
2.3.III → 2.3.I
2.4.I–II → 2.4.I–II
2.4.V–VIII → 2.4.III–VI
3.2.I–6.V → 3.2.I–6.V
4.2.I–6.V → 4.2.I–6.V
5.2.I–VII → 5.2.I–VII
5.3.I–III → 5.3.I–III
5.4.I–III → 5.4.I–III
5.4.IV–VI → 5.4.V–VII
5.5.I → 5.5.I
6.1.I → 9.1.I
6.1.II and 7 → 9.1.VI
6.1.III–IV and 7 → 9.1.VIII–IX
6.1.V–VII and 7 → 9.1.XIV–XVI
6.2.I–IX → 9.2.I–IX
6.3.I–VI → 9.3.I–VI
6.3.VII–IX → 10.1.II–IV
6.4.I–II → 9.4.I–II
6.4.III → 9.5.I
6.4.IV–V → 9.5.IV–V
6.4.VI → 9.4.III
6.4.VII–IX → 9.4.VI–VIII
7.1.I–II → 9.1.II–III
7.1.III → 9.1.VII
7.1.IV–VI → 9.1.IV
7.1.VII → 9.1.V
7.1.VIII and 6 → 9.1.VIII
7.1.IX–X → 9.1.XV, XII
7.1.XI → 9.2.X
7.1.XII–XIII → 6.4.I(a)–(b)
7.2.I → 9.3.VII
7.2.II → 9.3.VIII–IX
7.2.III → 9.3.VIII
7.2.IV → 9.3.X
7.2.V–VIII → 9.3.XII–XV
7.3.I–V → 9.2.XI–XV
7.4.I–II → 9.4.IV–V
7.4.III → 9.5.VI
7.4.IV–V → 9.4.VII–VIII
7.4.VI → 9.5.II
7.4.VII → 9.4.IX
8.1.I → (6.1.13)
8.1.II → 6.1.II, IV
8.2.I → 6.3.I
8.2.II → 6.3.II, (6.3.6)
8.2.III → Ex.12.1.6
8.2.IV → (6.3.6)
8.3.I–III → 6.3.III–V
8.4.I–VIII → 10.2.I–VIII
8.5.I–III → 6.2.II
9.1.I–VII → 11.1.II–VIII
9.1.IX → 11.1.IX
9.1.X → 11.1.XI
9.1.XI → 12.3.X
Ex.9.1.3 → 11.1.X
9.2.I–VI → 11.2.I–VI
(9.2.12) → 11.2.VII
9.2.VII → 10.2.IX
9.2.VIII → 11.2.VIII
9.3.I–V → 11.3.I–V
9.4.I–V → 11.4.I–V
9.4.VI–IX → 13.6.I–IV
10.1.I–III → 12.1.I–III
10.1.IV → 12.1.VI
10.1.V–VI → 12.4.I–II
10.2.I–V → 12.2.I–V
10.2.VI → 12.2.IV
10.2.VII–VIII → 12.2.VI–VII
10.3.I–IX → 12.3.I–IX
10.3.X–XII → 12.4.III–V
10.4.I–II → 12.6.I–II
10.4.III → 12.1.IV
10.4.IV–VII → 12.6.III–VI
10.5.II–III → 15.2.I–II
10.6.I–VIII → 15.3.I–VIII
11.1.I–V → 8.6.I–V
11.2.I–II → 8.2.I–II
11.3.I–VIII → 8.4.I–VIII
11.4.I–IV → 8.5.I–IV
11.4.V–VI → 8.5.VI–VII
12.1.I–III → 13.1.I–III
12.1.IV–VI → 13.1.V–VII
12.2.I → 13.2.I
12.2.II–V → 13.2.III–VI
12.2.VI–VIII → 13.4.I–III
12.3.I → 13.1.IV
12.3.II–VI → 13.3.I–V
12.4.I → 13.4.IV
12.4.II → Ex.13.4.7
12.4.III → 13.4.V
12.4.IV–VI → 13.4.VII–IX
13.1.I–III → 7.1.I–III
13.1.IV–VI → 7.2.I–III
13.1.VII → 7.1.IV
13.2.I–II → 14.1.I–II
Ex.13.2(b) → 14.1.III
13.2.III–IV → 14.1.IV–V
13.2.V → 14.1.VII
13.2.VI → 14.2.I
13.2.VII–IX → 14.2.VII
13.3.I → 14.2.II
13.3.II–IV → 14.3.I–III
13.3.V–VIII → 14.4.I–IV
13.3.IX–XI → 14.5.I–III
13.4.I–III → 14.6.I–III
13.4.III → 7.4.I
13.4.IV → 14.2.VIII
13.4.V → 14.2.VII
13.4.VI → 14.2.IX
13.5.I–II → pp. 394–396
13.5.III → 14.3.IV
13.5.IV–V → 14.3.V–VI
13.5.VI → 14.8.I
13.5.VII–IX → 14.8.V–VII
13.5.X → (14.8.9)
13.5.XI → 14.8.VIII
14.2.I–V → 15.6.I–III, V–VI
14.2.VI–VII → 15.7.I–II
14.3.I–III → 15.7.III–V
Appendices: identical except for A2.1.IV → A1.6.I and A2.1.V–VI → A2.1.IV–V.
CHAPTER 8
Second-Order Properties of Stationary Point Processes
Second-order properties are extremely important in the statistical analysis of point processes, not least because of the relative ease with which they can be estimated in both spatial and temporal contexts. However, there are several shortcomings when compared with, for example, the second-order properties of classical time series. There are ambiguities in the point process context as to just which second-order aspects of the process are in view. The second-order properties of the intervals, in a point process on R, are far from equivalent to the second-order properties of the counts, as already noted in Chapter 3 and elsewhere. In this chapter, our concern is solely with random measure or counting properties, broadly interpreted.

A more important difficulty, however, is that the defining property of a point process—that its realizations are integer-valued measures—is not clearly reflected in properties of the moment measures. It does imply the presence of diagonal singularities in the moment measures, but this property is shared with other random measures possessing an atomic component. Nor does there seem to exist a class of tractable point processes, analogous to Gaussian processes, whose second-order properties are coextensive with those of point processes in general. Indeed, there are still open questions concerning the class of measures that can appear as moment measures for point processes or for random measures more generally. Gibbs processes defined by point–pair interactions come close to the generality required for a Gaussian process analogue but have neither the same appeal nor the same tractability as the Gaussian processes. Other examples, such as Hawkes processes, also come close to this role without fulfilling it entirely.

Ultimately, these problems are related to the nonlinearity of key features of point processes such as positivity and integer counts. Thus, the second-order theory, with its associated toolkit of linear
prediction and filtering methods, although still important, is of less general utility for point processes than for classical time series. Nevertheless, it seems worthwhile to set out systematically both the aspects of practical importance and their underpinning mathematical properties. Such a programme is the aim of the present chapter, which includes a discussion of both time-domain and frequency-domain techniques for second-order stationary point processes and random measures. Deeper theoretical issues, such as ergodicity, the general structure of moment measures for stationary random measures, and invariance under wider classes of transformations, are taken up in Chapter 12. Spatial processes are treated briefly here, reappearing in Chapters 12 and 15. To avoid encumbering the main text with tools and arguments that are hardly used elsewhere in the book, the main technical arguments relating to the Fourier transforms of second-moment measures are placed in the final section, Section 8.6.

We shall assume throughout the chapter that the basic point processes are simple. For multivariate and marked point processes, we take this to mean that the ground process is simple. As we have already remarked in Chapter 6, there is no significant loss of generality in making this assumption, since the batch size in a nonsimple point process can always be treated as an additional mark, and the properties of the original process can then be derived from those for marked point processes.
8.1. Second-Moment and Covariance Measures

Second-order properties of stationary processes have already made brief appearances in Section 3.5 and Proposition 6.1.I. Here we take as our starting point the second and third properties listed in Proposition 6.1.I. For the purposes of this chapter, these can be restated as follows.

Proposition 8.1.I (Stationary random measure: second-order moment structure). Let ξ be a stationary random measure on X = R^d for which the second-order moment measure exists.
(a) The first-moment measure M_1(·) is a multiple of Lebesgue measure ℓ(·); i.e. M_1(dx) = m ℓ(dx) for a nonnegative constant m, the mean density.
(b) The second-moment measure M_2(·) is expressible as the product of a Lebesgue component ℓ(dx) along the diagonal x = y and a reduced measure, M̆_2(du) say, along u = x − y, or in integral form, for bounded measurable functions f of bounded support,
$$\int_{X^{(2)}} f(s, t)\, M_2(ds \times dt) = \int_{X} \ell(dx) \int_{X} f(x, x + u)\, \breve{M}_2(du). \tag{8.1.1a}$$
In particular, by taking f(x, y) = I_{U^d}(x) I_B(y − x),
$$\breve{M}_2(B) = E\left[\int_{U^d} \xi(x + B)\, \xi(dx)\right]. \tag{8.1.1b}$$
A point process or random measure for which the first- and second-moment measures exist and satisfy (a) and (b) of Proposition 8.1.I will be referred to as being second-order stationary. We should note, however, that a point process for which the first- and second-order moments satisfy the stationarity assumptions above is not necessarily stationary: nonstationary processes can have stationary first and second moments (see Exercises 8.1.1 and 8.1.2).

We retain the accent ˘ to denote reduced measures formed by dropping one component from the moment measures of stationary processes as a consequence of a factorization of the form (8.1.1). Thus, M̆_[2](·), C̆_2(·), and C̆_[2] stand, respectively, for the reduced forms of the second factorial moment measure, covariance measure, and factorial covariance measure. A proof of such factorization can be based on the observation that, under stationarity, M_2(dx, d(x + u)) is independent of x and so should have the form ℓ(dx) × Q(du) for some measure Q(·); see Chapter 12 and Proposition A2.7.III for details and background.

Our principal aim in this section is to study the properties of these reduced measures and the relations between their properties and those of the point processes or random measures from which they derive. We start with a discussion of M̆_2, which is arguably the most fundamental if not always the most convenient of the various forms.

Proposition 8.1.II. Let M̆_2(·) be the reduced second-moment measure of a nonzero, second-order stationary point process or random measure ξ on R^d with mean density m. Then M̆_2 is
(i) symmetric: M̆_2(A) = M̆_2(−A);
(ii) positive: M̆_2(A) ≥ 0, with strict inequality at least when 0 ∈ A and either ξ has an atomic component or A is an open set;
(iii) positive-definite: for all bounded measurable functions ψ of bounded support,
$$\int_{\mathbb{R}^d} (\psi * \psi^*)(x)\, \breve{M}_2(dx) \ge 0, \tag{8.1.2}$$
where $(\psi * \phi)(x) = \int_{\mathbb{R}^d} \psi(y)\, \phi(x - y)\, dy$ and $\psi^*(x) = \psi(-x)$;
(iv) translation-bounded: for every bounded Borel set A in R^d, there exists a finite constant K_A such that
$$\breve{M}_2(x + A) \le K_A \qquad (\text{all } x \in \mathbb{R}^d). \tag{8.1.3}$$
(v) If also ξ is ergodic and the bounded convex Borel set A increases in such a way that r(A) = sup{r: A ⊇ S_r(0)} → ∞, where S_r(0) denotes the ball in R^d of radius r and centre at 0, then in this limit, for all bounded Borel sets B,
$$\breve{M}_2(A)\,/\,\ell(A) \to m^2 \tag{8.1.4}$$
and
$$\frac{1}{\ell(A)} \int_A \xi(x + B)\, \xi(dx) \to \breve{M}_2(B) \qquad \text{a.s.} \tag{8.1.5}$$
Proof. Symmetry follows from the symmetry of M_2 so that, in shorthand form,
$$\begin{aligned}
\breve{M}_2(du)\, \ell(dx) &= M_2\big(dx \times d(x + u)\big) = M_2\big(d(x + u) \times dx\big) \\
&= M_2\big(dy \times d(y - u)\big) = \breve{M}_2(-du)\, \ell(dy),
\end{aligned}$$
which establishes (i). Nonnegativity of M̆_2(A) follows directly from (8.1.1b). Positivity for A ∋ 0 when ξ has an atomic component follows from Proposition 8.1.IV below, while for the other case, since A is open so that A ⊇ S_{2ε}(0) for some sphere of radius 2ε > 0, we can choose ε < ½ and then
$$\begin{aligned}
\breve{M}_2(A) \ge \breve{M}_2\big(S_{2\varepsilon}(0)\big)
&= E\int_{U^d} \xi\big(x + S_{2\varepsilon}(0)\big)\, \xi(dx) \\
&\ge E\int_{S_\varepsilon(0)} \xi\big(S_{2\varepsilon}(x)\big)\, \xi(dx) \qquad \text{since } U^d \supset S_\varepsilon(0), \\
&\ge E\int_{S_\varepsilon(0)} \xi\big(S_\varepsilon(0)\big)\, \xi(dx) \qquad \text{since } S_{2\varepsilon}(x) \supset S_\varepsilon(0) \text{ for } x \in S_\varepsilon(0), \\
&= M_2\big(S_\varepsilon(0) \times S_\varepsilon(0)\big) \ge \big[m\, \ell(S_\varepsilon(0))\big]^2 > 0 \qquad \text{since } \varepsilon > 0.
\end{aligned}$$
Positive-definiteness is a consequence of
$$0 \le E\left|\int_{X} \psi(x)\, \xi(dx)\right|^2 = \int_{X} \breve{M}_2(du) \int_{X} \psi(x)\, \psi(x + u)\, \ell(dx) = \int_{X} \breve{M}_2(du) \int_{X} \psi^*(u - w)\, \psi(w)\, \ell(dw).$$
Properties (ii) and (iii) together show that M̆_2 is a positive, positive-definite (p.p.d.) measure; (iv) is then a consequence of general properties of p.p.d. measures, as set out in Section 8.6.

The final two assertions follow from the ergodic theorems developed in Chapter 11. In particular, a simple form of ergodic theorem for point processes and random measures ξ on R^d asserts that, for sets A satisfying the conditions outlined in (v), as r(A) → ∞, ξ(A)/ℓ(A) → m a.s. and in L^1-norm. If second moments exist, then also $E\big|\xi(A)/\ell(A) - m\big|^2 \to 0$. From these results, it is easy to show that, provided both r(A) and r(B) → ∞, M_2(A × A)/[ℓ(A)]^2 → m^2 and, more generally, M_2(A × B)/[ℓ(A)\,ℓ(B)] → m^2. Approximating further, we find that M_2(U)/(ℓ × ℓ)(U) → m^2 for a wide class of sets U ∈ X^{(2)}, including cylinder sets such as U(A, r) = {(x, y): x ∈ S_r(0), y ∈ x + A}. But
$$\int_{U(A, r)} M_2(ds \times dt) = \int_{S_r(0)} \ell(du) \int_A \breve{M}_2(dv) = \ell\big(S_r(0)\big)\, \breve{M}_2(A),$$
and so (8.1.4) follows after dividing by $(\ell \times \ell)\big(U(A, r)\big) = \ell(A)\, \ell\big(S_r(0)\big)$. Equation (8.1.5) can be established by similar arguments and is a simple special case of the higher-order ergodic theorems described in Chapter 11.
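The first ergodic statement quoted in the proof, ξ(A)/ℓ(A) → m a.s., is easy to see numerically in the most elementary case. The sketch below is not from the text: it assumes a stationary Poisson process with illustrative parameter values and computes the empirical density of a single realization over nested windows.

```python
import numpy as np

rng = np.random.default_rng(5)
lam, T = 3.0, 100000.0               # intensity and largest window; assumed values

n = rng.poisson(lam * T)
x = rng.uniform(0.0, T, n)           # one realization of a stationary Poisson process on (0, T]

# xi((0, L])/L should approach the mean density m = lam as the window grows
dens = {L: np.count_nonzero(x <= L) / L for L in (1000.0, 10000.0, 100000.0)}
```

With the seed fixed, the empirical densities settle near m = 3 as the window grows, the fluctuations shrinking roughly like ℓ(A)^{−1/2}.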
Most of the results above transfer directly or with minor modifications to the other reduced second-order measures. The most important of these is the reduced covariance measure, which can be defined here through the relation
$$\breve{C}_2(du) = \breve{M}_2(du) - m^2\, \ell(du). \tag{8.1.6}$$
The covariance measure itself can be regarded as the second-moment measure of the mean-corrected random signed measure
$$\tilde{\xi}(A) \equiv \xi(A) - m\, \ell(A); \tag{8.1.7}$$
note that ξ̃ is a.s. of bounded variation on bounded sets. The reduced form inherits the following properties from M̆_2(·).

Corollary 8.1.III. The reduced covariance measure C̆_2(·) of a second-order stationary random measure ξ is symmetric, positive-definite, and translation-bounded but in general is a signed measure rather than a measure. If ξ is ergodic, then for A, B and r(A) → ∞ as for (8.1.5), and ξ̃ as in (8.1.7),
$$\breve{C}_2(A)\,/\,\ell(A) \to 0, \tag{8.1.8}$$
$$\frac{1}{\ell(A)} \int_A \tilde{\xi}(x + B)\, \tilde{\xi}(dx) \to \breve{C}_2(B) = E\left[\int_{U^d} \tilde{\xi}(x + B)\, \tilde{\xi}(dx)\right] \qquad \text{a.s.} \tag{8.1.9}$$
For point processes, a characteristic feature of the reduced forms of both the moment and covariance measures is the atom at the origin. For a simple point process, this is removed by transferring to the corresponding reduced factorial measures M̆_[2](·) and C̆_[2](·). This is not the case, however, for more general point processes and random measures. The situation is summarized in the proposition below and its corollary (see also Kallenberg, 1983, Chapter 2).

Proposition 8.1.IV. Let ξ be a stationary second-order random measure or point process on R^d with mean density m and reduced covariance measure C̆_2. Then C̆_2(du) has a positive atom at u = 0 if and only if ξ has a nontrivial atomic component, in which case C̆_2({0}) = M̆_2({0}) and both equal
$$E\left[\int_{U^d} \xi(\{x\})\, \xi(dx)\right] = E\left[\sum_{i:\, x_i \in U^d} \big[\xi(\{x_i\})\big]^2\right]. \tag{8.1.10}$$
Moreover, there exists a σ-finite measure µ(·) on R_+ such that
(i) µ has finite mass outside any neighbourhood of the origin, and for every b > 0, the atoms of ξ with mass greater than b can be represented as a stationary marked point process on X × R_+ with ground rate µ(b, ∞) and stationary mark distribution Π_b(dκ) = µ(dκ)/µ(b, ∞) on κ > b;
(ii) µ(·) integrates κ on R_+, and $\int_{\mathbb{R}_+} \kappa\, \mu(d\kappa) \le m$;
(iii) ξ is purely atomic a.s. if and only if $m = \int_{\mathbb{R}_+} \kappa\, \mu(d\kappa)$; and
(iv) µ(·) integrates κ² on R_+, and $\int_{\mathbb{R}_+} \kappa^2\, \mu(d\kappa) = \breve{M}_2(\{0\}) = \breve{C}_2(\{0\})$.
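As a numerical illustration of (8.1.10) and part (iv) — a sketch, not from the text — take a stationary compound Poisson process on R with ground rate λ and i.i.d. batch sizes Z (the parameter values below are assumptions). The process is purely atomic, and the common value C̆_2({0}) = M̆_2({0}) equals λE[Z²]; it can be estimated by averaging the sum of squared atom masses over the unit interval:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0                            # ground Poisson rate; assumed value
z_vals = np.array([1.0, 2.0, 3.0])   # batch sizes Z, uniform on {1, 2, 3}; assumed
EZ2 = np.mean(z_vals ** 2)           # E[Z^2] = 14/3

reps = 20000
stat = np.empty(reps)
for k in range(reps):
    n = rng.poisson(lam)             # number of atoms of xi in the unit interval
    z = rng.choice(z_vals, size=n)   # their masses xi({x_i})
    stat[k] = np.sum(z ** 2)         # sum of squared atom masses, the summand in (8.1.10)

atom_est = stat.mean()               # estimates C2({0}) = M2({0}) = lam * E[Z^2]
```

Note that the statistic depends only on the atom masses falling in the unit interval, so the atom locations need not be simulated at all.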
Proof. Choose any monotonically decreasing sequence of nonempty sets A_n with diam A_n ↓ 0 and A_n ↓ {0}. Then, for any x ∈ X, ξ(x + A_n) ↓ ξ({x}) a.s. From (8.1.1b) and monotone convergence, we obtain
$$\breve{M}_2(A_n) = E\int_{U^d} \xi(x + A_n)\, \xi(dx) \downarrow E\int_{U^d} \xi(\{x\})\, \xi(dx) = E\left[\sum_{x_i \in U^d} \big[\xi(\{x_i\})\big]^2\right].$$
In particular, if ξ is a.s. continuous, it follows that both M̆_2 and C̆_2 are continuous at the origin, and conversely.

Suppose next that b > 0 is given, and consider the atoms of ξ with masses ξ({x}) > b. If ξ is second-order stationary, there can be at most a finite number of such atoms in any finite interval. The set of such atoms is therefore denumerable and can be represented as an ordered sequence of pairs {(x_i, κ_i)}, where x_i < x_j for −∞ < i < j < ∞ and b < κ_i = ξ({x_i}). As in Section 6.4, equation (6.4.6), the set of pairs therefore constitutes a marked point process, which we denote by ξ_b(·). Let m_b^g and Π_b(·) denote, respectively, the mean density of the ground process for ξ_b and its stationary mark distribution. Consistency of the ergodic limits requires that for b′ < b and B ⊆ (b, ∞),
$$m_b^g\, \Pi_b(B) = m_{b'}^g\, \Pi_{b'}(B) \equiv \mu(B). \tag{8.1.11}$$
This relation therefore defines µ consistently and uniquely as a σ-finite measure on all of R_+. Taking B = (b, ∞) in (8.1.11) then implies that µ(b, ∞) = m_b^g < ∞, establishing (i). Moreover, the mean density of ξ_b, m_b say, is given by
$$m_b = m_b^g \int_b^\infty \kappa\, \Pi_b(d\kappa) = \int_b^\infty \kappa\, \mu(d\kappa) = \int_0^\infty \kappa\, I_{\{\kappa > b\}}\, \mu(d\kappa).$$
Since m_b ≤ m < ∞ and for any A, ξ_b(A) ↑ ξ_a(A) as b → 0, where ξ_a denotes the atomic component of ξ, we must have m_b = E[ξ_b(U^d)] ↑ E[ξ_a(U^d)] ≡ m_a ≤ E[ξ(U^d)] ≡ m as b → 0. Hence,
$$m_a = \lim_{b \to 0} \int_0^\infty \kappa\, I_{\{\kappa > b\}}\, \mu(d\kappa) = \int_0^\infty \kappa\, \mu(d\kappa),$$
establishing (ii). Assertion (iii) is the same as the diffuse measure ξ − ξ_a having zero mean, implying that it is a.s. null. Finally, for any b > 0, consideration of the second moment of ξ_b yields the equations
$$m_b^g \int_b^\infty \kappa^2\, \Pi_b(d\kappa) = \int_b^\infty \kappa^2\, \mu(d\kappa) = E\left[\sum_{x_i \in U^d:\ \xi(\{x_i\}) > b} \big[\xi(\{x_i\})\big]^2\right].$$
Since the right-hand side is bounded above by M̆_2({0}) < ∞ and converges to M̆_2({0}) as b → 0, (iv) follows.
Condition (iii) above identifies purely atomic stationary random measures (see also Kallenberg, 1983). We would like to be able to use some property of µ to identify point processes (i.e. integer-valued random measures) and then simple point processes. The former identification is tantamount to a version of the moment problem: when do the moments of a measure [here µ(·)] suffice to identify the measure? This has no easy solution for our present purposes. The latter is much simpler.

Corollary 8.1.V. A second-order stationary point process N with density m is a simple point process if and only if C̆_2({0}) = M̆_2({0}) = m, which is equivalent to the reduced second-order factorial moment and covariance measures having no atom at the origin.

Proof. A stationary random measure ξ is a simple point process if and only if it is integer-valued and all its atoms have mass 1. The latter condition is satisfied if and only if $\int_1^\infty \kappa\, \mu(d\kappa) = \int_1^\infty \kappa^2\, \mu(d\kappa)$; i.e. µ has all its mass on {1}, or equivalently, m = M̆_2({0}). The equivalent form of the latter condition follows from the relation M̆_[2]({0}) = M̆_2({0}) − m.

Analytical derivations of the relations for $\int \kappa^r\, \mu(d\kappa)$ for positive integers r and stationary point processes have been given in Propositions 3.3.VIII and 3.3.IX. In Chapter 12, there is an analogue of Corollary 8.1.V in which the vanishing at {0} of a higher-order reduced factorial measure of a stationary point process is a condition for the process to have a bounded batch-size distribution, or equivalently for the factorial moment of the same order of µ(·) to vanish.

Returning to more general properties, results such as (8.1.4) and (8.1.8) can be rephrased in further equivalent ways. When X = R, for example, they reduce respectively to
$$E\big[\xi^2(0, x]\big] \sim m^2 x^2 \quad\text{and}\quad \operatorname{var} \xi(0, x] = o(x^2) \qquad (x \to \infty),$$
results already discussed for ergodic point processes in Section 3.4.

Other useful results follow as special cases of the general representations (8.1.1). These imply, for example, that
$$\operatorname{cov}\left(\int_{\mathbb{R}^d} g(x)\, \xi(dx),\ \int_{\mathbb{R}^d} h(y)\, \xi(dy)\right) = \int_{\mathbb{R}^d} \breve{C}_2(du) \int_{\mathbb{R}^d} g(x)\, h(x + u)\, \ell(dx). \tag{8.1.12}$$
In particular, (8.1.12) leads to the following expressions for the variance:
$$V(A) \equiv \operatorname{var} \xi(A) = \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} I_A(x)\, I_A(x + u)\, \ell(dx)\, \breve{C}_2(du) = \int_{\mathbb{R}^d} I_A(x)\, \ell(dx) \int_{\mathbb{R}^d} I_{A - x}(u)\, \breve{C}_2(du) = \int_A \breve{C}_2(A - x)\, \ell(dx). \tag{8.1.13a}$$
When X = R and A = (0, x], this becomes
$$V(x) \equiv \operatorname{var} \xi(0, x] = \int_{-x}^{x} (x - |u|)\, \breve{C}_2(du) = 2 \int_{0-}^{x} F_c(u)\, du, \tag{8.1.13b}$$
where for x > 0, $F_c(x) = \tfrac12 \breve{C}_2(\{0\}) + \breve{C}_2\big((0, x]\big) = \tfrac12 \breve{C}_2\big([-x, x]\big)$ is a symmetrized form of the distribution function corresponding to the reduced covariance measure. Properties of V(x) can be read off rather simply from this last representation: for example, it is absolutely continuous with a density function of which there exists a version that is continuous except perhaps for a countable number of finite discontinuities. Further details and an alternative approach in the point process case are outlined in Exercise 8.1.3. Note that, when it exists, the covariance density is a second derivative in (0, ∞) of V(x). See Exercise 8.1.4 for an analogue of (8.1.13b) in the case of a stationary isotropic point process in R².

The variance function V(A) is widely used in applications, often in the form of the ratio to the expected value M(A); for a simple point process, this is just
$$\frac{V(A)}{M(A)} = \frac{\int_A \breve{C}_2(A - x)\, \ell(dx)}{M(A)} = 1 + \frac{\int_A \breve{C}_{[2]}(A - x)\, \ell(dx)}{m\, \ell(A)}. \tag{8.1.14}$$
This ratio equals 1 for a Poisson process, while values larger than 1 indicate clustering and values less than 1 indicate repulsion or some tendency to regular spacing. For suitably small sets, for which diam A → 0, V(A)/M(A) → 1; that is, locally the process is like a Poisson process in having the variance-to-mean ratio ≈ 1 (see Exercise 8.1.5). As ℓ(A) → ∞, various possibilities for the behaviour of V(A)/M(A) exist and are realizable (see Exercise 8.1.6), but most commonly the covariance measure is totally finite, in which case
$$V(A)/M(A) \to 1 + m^{-1}\, \breve{C}_{[2]}(X) \qquad (A \uparrow X).$$
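Equation (8.1.13b) is easy to verify for a concrete reduced covariance measure. The sketch below is not from the text: it assumes a C̆_2 with atom C̆_2({0}) = m, as for a simple point process, plus a hypothetical covariance density c̆(u) = α e^{−β|u|}, and compares the two sides of (8.1.13b) — one evaluated in closed form, the other by quadrature of F_c.

```python
import numpy as np

m, alpha, beta = 2.0, 0.8, 1.5       # mean density; hypothetical density alpha*exp(-beta*|u|)

def F_c(u):
    # F_c(u) = (1/2)C2({0}) + C2((0, u]); atom of mass m at 0, density alpha*e^(-beta*u), u > 0
    return 0.5 * m + (alpha / beta) * (1.0 - np.exp(-beta * u))

def V_direct(x):
    # V(x) = int_{-x}^{x} (x - |u|) C2(du), in closed form for this C2
    return m * x + 2.0 * alpha * (x / beta - (1.0 - np.exp(-beta * x)) / beta ** 2)

def V_via_Fc(x, n=100000):
    # right-hand side of (8.1.13b): 2 * int_0^x F_c(u) du, by the trapezoidal rule
    u = np.linspace(0.0, x, n + 1)
    f = F_c(u)
    return 2.0 * float(np.sum((f[1:] + f[:-1]) * 0.5) * (x / n))
```

The atom contributes the Poisson-like term m x, the density the clustering excess; the two evaluations agree to quadrature accuracy.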
A stationary random measure is of bounded variability if V(A) itself remains bounded as ℓ(A) → ∞ as for (8.1.5) [see Exercises 7.2.10(a) and 8.1.6]. [This terminology is preferred to controlled variability (Cox and Isham, 1980, p. 94).]

Example 8.1(a) Stationary Poisson cluster processes. For a stationary Poisson cluster process and all values of the cluster centre x, monotone convergence shows that the cluster member process satisfies M_[2](A_n × A_n | x) → E[Z(Z − 1)] as ℓ(A_n) → ∞ through a convex averaging sequence {A_n}, where Z ≡ N_m(X | 0) denotes a generic r.v. for the total number of points in a cluster. Then, since (6.3.12) for large A gives C_[2](A × A) ∼ E[Z(Z − 1)] M_c(A), we have C̆_[2](X) = E[Z(Z − 1)] and thus
$$V(A)/M(A) \to 1 + E[Z(Z - 1)]/E[Z] = E[Z^2]/E[Z]. \tag{8.1.15}$$
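The limit (8.1.15) is readily reproduced by simulation. The sketch below is not from the text: it assumes a one-dimensional Neyman–Scott process with Poisson(ν) cluster sizes Z, for which E[Z²]/E[Z] = 1 + ν, and all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
mu_c, nu, sigma = 1.0, 3.0, 0.5      # centre rate, mean cluster size, offspring spread; assumed
T, pad, reps = 50.0, 5.0, 3000       # window A = (0, T]; pad keeps edge clusters in play

counts = np.empty(reps)
for k in range(reps):
    n_c = rng.poisson(mu_c * (T + 2.0 * pad))
    centres = rng.uniform(-pad, T + pad, n_c)
    sizes = rng.poisson(nu, n_c)                       # cluster sizes Z ~ Poisson(nu)
    pts = np.repeat(centres, sizes) + rng.normal(0.0, sigma, int(sizes.sum()))
    counts[k] = np.count_nonzero((pts > 0.0) & (pts <= T))

ratio = counts.var(ddof=1) / counts.mean()             # sample V(A)/M(A) for A = (0, T]
```

For a window long relative to the cluster spread, the sample variance-to-mean ratio sits near 1 + ν = 4 rather than the Poisson value 1.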
Characteristically, therefore, the variance-to-mean ratio for a Poisson cluster process increases from a value approximately equal to 1 for very small sets to a limiting value equal to the ratio of the mean square cluster size to the mean cluster size for very large sets [see the formula for the compound Poisson process in Exercise 2.1.8(b)]. The region of rapid growth of the ratio occurs as A passes through sets with dimensions comparable to those of (the spread of) individual clusters. These comments provide the background to diagnostic procedures such as plotting the ratio V(A)/M(A) against M(A) or ℓ(A) as ℓ(A) → ∞ and to the Greig-Smith method of nested quadrats, which uses a components-of-variance analysis to determine the characteristic dimensions at which clustering effects or local inhomogeneities begin to influence the variance [see Greig-Smith (1964) for further discussion].

The representation (8.1.1b) has important interpretations when ξ is a point process rather than a general random measure, and for the discussion in this section we assume that the process is orderly. In particular, it follows in this case that
$$\breve{M}_2(A) = E\big[\#\{\text{point-pairs } (x_i, x_j)\colon x_i \in U^d \text{ and } x_j \in x_i + A\}\big] \tag{8.1.16a}$$
$$= E\big[\text{rate of occurrence of point-pairs } (x_i, x_j)\colon x_j - x_i \in A\big]. \tag{8.1.16b}$$
Dividing by the mean density (= intensity = average rate of occurrence) m yields an interpretation of M̆_2 in terms of the expectation measure of the Palm process (see Section 3.4 and the discussion in Chapter 13) obtained by conditioning on the presence of a point at the origin:
$$E\big[\#\{\text{points } x_i \in A\} \mid \text{point at } x = 0\big] = \breve{M}_2(A)\,/\,m. \tag{8.1.17}$$
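The pair-rate interpretation (8.1.16b) suggests a direct estimator: count point-pairs with separation in A, per unit length of the observation window. The sketch below is not from the text (all parameters are assumptions); it does this for a simulated Poisson process, for which M̆_2((0, b]) = λ²b and the Palm expectation (8.1.17) is λb.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, T, b = 5.0, 2000.0, 0.5         # intensity, window (0, T], test set A = (0, b]; assumed

n = rng.poisson(lam * T)
x = np.sort(rng.uniform(0.0, T, n))

# rate of point-pairs (x_i, x_j) with x_j - x_i in (0, b], as in (8.1.16b);
# keeping x_i <= T - b ensures every admissible partner x_j lies inside the window
idx = np.flatnonzero(x <= T - b)
right = np.searchsorted(x, x[idx] + b, side="right")
pairs = int(np.sum(right - idx - 1))

M2_hat = pairs / (T - b)             # estimates M2((0, b]) = lam**2 * b for a Poisson process
palm_mean = M2_hat / lam             # (8.1.17): expected points in (0, b] given a point at 0
```

The same counting scheme, applied to a clustered or inhibited realization, would show M2_hat above or below λ²b respectively.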
It is even more useful to have density versions of (8.1.17), assuming (as we now do) that M̆_[2] is absolutely continuous, so $\breve{M}_{[2]}(A) = \int_A \breve{m}_{[2]}(x)\, dx$. This density is related to the corresponding covariance density by
$$\breve{m}_{[2]}(x) = \breve{c}_{[2]}(x) + m^2. \tag{8.1.18}$$
When the density exists, the ratio m̆_[2]/m has been called the intensity of the process (e.g. Cox and Lewis, 1966, p. 69) or the conditional intensity function (e.g. Cox and Isham, 1980, Section 2.5). We call it the second-order intensity and denote it by h̆_2(·), so that
$$\breve{h}_2(x) = \breve{m}_{[2]}(x)/m = m + \breve{c}_{[2]}(x)/m.$$
h̆_2(x) can also be interpreted as the intensity at x of the process conditional on a point at the origin; this is an interpretation taken up further in the discussion of Palm measures in Chapter 13. Notice that, in d = 1, for a renewal process as in Chapter 4 with renewal function U(x) (x > 0) that is absolutely continuous, we have h̆_2(x) = h̆_2(−x) = U′(|x|). We call the ratio
$$r_2(x) \equiv \frac{\breve{h}_2(x)}{m} = \frac{\breve{m}_{[2]}(x)}{m^2} \tag{8.1.19}$$
the relative second-order intensity [but note that in Vere-Jones (1978a) it is called the relative conditional intensity]. It equals 1 for a stationary Poisson process, while for other stationary processes it provides a useful indication of the strength and character of second-order dependence effects between pairs of points at different separations x ∈ R^d: for example, when r_2(x) > 1, point-pairs separated by the vector x are more common than in the purely random (Poisson) case, while if r_2(x) < 1 such point-pairs are less common.

In considering the reduced measures M̆_2(A) and related functions, spheres S_r(0) constitute a natural class of sets to use for A in dimension d ≥ 2; define
$$\breve{K}_2(r) = \breve{M}_2\big(S_r(0) \setminus \{0\}\big) = \breve{M}_{[2]}\big(S_r(0)\big), \tag{8.1.20}$$
the equivalent formulation here being a consequence of orderliness. Ripley (1976, 1977) introduced this function, though what is now commonly called Ripley's K-function (including Ripley, 1981) is the density-free version
$$K(r) = \frac{\breve{K}_2(r)}{m^2} = \frac{\breve{M}_2\big(S_r(0) \setminus \{0\}\big)}{m^2}, \tag{8.1.21}$$
so, since λ = m because of orderliness,
$$\lambda K(r) = E\big(\#\text{ of points within } r \text{ of the origin} \mid \text{point at the origin}\big), \tag{8.1.22}$$
where on the right-hand side the origin itself is excluded from the count. The function K(r) is monotonically nondecreasing on its range of definition r > 0 and converges to 0 as r → 0. As can be seen from the examples below and is discussed further in Chapter 12, this function is particularly useful in studying stationary isotropic point processes because it then provides a succinct summary of the second-order properties of the process. For a Poisson process, K(r) = ℓ(S_r(0)).

Recall the definition of K(r) in terms of the sphere S_r(0). Noting the interpretation in (8.1.22), we see that the derivative (d/dr)K̆_2(r) = K̆_2′(r) gives the conditional probability of a point on the surface of a spherical shell of radius r, conditional on a point at the centre of the shell. Consequently, for an isotropic process in R², the probability density that a point is located at distance r from a given point of the process and in the direction θ equals K′(r)/(2πr), independent of θ because of isotropy. In dimension d ≥ 3, the same equality holds on replacing the denominator 2πr by the surface area of S_r(0).
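A naive estimate of K(r) in the spirit of (8.1.22) counts, for each point, the further points within distance r, and divides the average by λ; restricting the 'centre' points to lie further than r from the boundary avoids edge bias. The sketch below is not from the text (parameters are assumptions); for the planar Poisson process simulated, K(r) = ℓ(S_r(0)) = πr².

```python
import numpy as np

rng = np.random.default_rng(3)
lam, side, r = 20.0, 10.0, 0.5       # intensity, square side, test radius; assumed values

n = rng.poisson(lam * side ** 2)
pts = rng.uniform(0.0, side, (n, 2))

# border correction: only points further than r from the boundary act as 'centres',
# so every disc S_r(centre) lies wholly inside the observation square
inner = np.all((pts > r) & (pts < side - r), axis=1)
centres = pts[inner]
d2 = ((centres[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2)
neighbours = (d2 <= r * r).sum(axis=1) - 1     # exclude the centre itself, cf. (8.1.22)

K_hat = neighbours.mean() / lam      # for a Poisson process, K(r) = pi * r**2
```

Practical K-estimators refine this border method with weighted edge corrections; the brute-force distance matrix here is only workable for a few thousand points.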
For stationary isotropic processes in R2 , the relative second-order intensity r2 (x), which → 1 as x → 0 when it is continuous there, is a function of |x| alone, and ρ(r) = r2 (x) − 1, where r = |x|, has been called the radial correlation function (see e.g. Glass and Tobler, 1971), though it may lack the positive-definiteness property of a true correlation function. The same quantity can be introduced, irrespective of isotropy, as a derivative of Ripley’s K-function K(r) in (8.1.21): write dK(r) K (r) ρ(r) = − 1 = − 1. (8.1.23) d(πr2 ) 2πr Examples of the use of m ˘ [2] (·) and ρ(r) are given in Vere-Jones (1978a), Chong (1981) and Ohser and Stoyan (1981), amongst many other references. Example 8.1(b) A two-dimensional Neyman–Scott process. By using the general results of Example 6.3(a), it can be shown that the reduced second factorial cumulant measure is given by ˘ F (u + A) F (du) = µc m[2] G(A), C[2] (A) = µc m[2] R2
where F is the probability distribution for the location of a cluster member about the cluster centre, G is the probability distribution for the difference of two i.i.d. random vectors with distribution F , µc is the Poisson density of cluster centres, and m[2] is the second factorial moment of the number of cluster members. For the K-function, we find K(r) = πr2 + [m[2] /(µc m21 )]G1 (r), where G1 (r) is the d.f. for the distance between two ‘offspring’ from the same ‘parent’, while ρ(r) = [m[2] /(µc m21 )]g1 (r), where g1 (r) = G1 (r) is the probability density function for the distance between two offspring from the same parent. Note that ρ is everywhere positive, an indication of overdispersion or clustering relative to the Poisson process, at all distances from an arbitrarily chosen point of the process. Some particular results for the case where F is a bivariate normal distribution are given in Exercise 8.1.7. Example 8.1(c) Mat´ern’s Model I for underdispersion (Mat´ern, 1960). Let {xn } denote a realization of a stationary Poisson process N on the line with intensity λ. Identify the subset {xn } of those points of the realization that are within a distance R of another such point, i.e. # $ {xn } = x ∈ {xn }: |x − y| < R for some y ∈ {xn } with y = x , and let {xn }\{xn } ≡ {xn } constitute a realization of a new point process N (note that N = {xn } is defined without using any Poisson properties of N ). The probability that any given point x of N will be absent from N is then the probability, 1 − e−2λR , that at least one further point of N is within a distance R of x. While these events are not mutually independent, they have
the same probability, so the mean density m″ for the modified process equals
$$m'' = \lambda e^{-2\lambda R} \le e^{-1}/(2R) \qquad \text{for all } \lambda;$$
the inequality is strict except for λR = ½. To find the second-order properties of N″, consider the probability q(v) that for a given pair of points distance v apart in N, both are also in N″. Then
$$q(v) = \begin{cases} 0 & (0 < v \le R),\\ \exp\big(-\lambda(2R + v)\big) & (R < v \le 2R),\\ \exp(-4\lambda R) & (v > 2R).\end{cases}$$
The factorial moment density of N″ is thus m̆_[2](x) = λ²q(x), and the relative second-order intensity [see (8.1.19)] is given by
$$r_2(x) = \begin{cases} 0 & (0 < x \le R),\\ e^{\lambda(2R - x)_+} & (x > R).\end{cases}$$
Thus, the process shows complete inhibition (as for any hard-core model) up to distance R and then a region of overdispersion for distances between R and 2R before settling down to Poisson-type behaviour for distances beyond 2R. The process is in fact of renewal type: the results above and others can be deduced from the renewal function for the process [see Exercise 8.1.9(a) for further details]. The model can readily be extended to point processes in the plane or space, but the analogues of the explicit expressions above become more cumbersome as the expression for the area or volume of the common intersection of circles or spheres becomes more complex (see Exercise 8.1.8). The set of rejected points {x′_n} is 'clustered' in the sense that every point has a nearest neighbour within a distance R [see Exercise 8.1.9(c)].
We conclude this section with some notes on possible estimates for reduced moment measures, guided by the interpretations of the model-defined quantities described above. Assume, as is usually the case, that we observe only a finite part of a single realization of an ergodic process. Let B denote a suitable test set, such as an interval on the line or a rectangle or disk in the plane, and A a (larger) observation region. Then, replacing U^d by A in the right-hand side of (8.1.1b) and allowing for the change to the second factorial moment, we obtain
$$\breve M_{[2]}(B) = \frac{1}{\ell(A)}\, E\bigg[\sum_{i:\, x_i \in A} N^*(x_i + B)\bigg], \qquad (8.1.24)$$
where N*(x + B) = N(x + B) − δ₀(B), so that N(x + B) is reduced by 1 when B contains the origin.
8. Second-Order Properties of Stationary Point Processes
The corresponding naïve estimate is obtained by dropping the expectation sign in the expression above (i.e. by taking each point x_i in A in turn as origin, counting the number of points in sets x_i + B having a common relative position to x_i but ignoring x_i itself if it happens to lie within the test region, and then dividing by the Lebesgue measure of the observation region); we denote it by
$$\hat M_{[2]}(B; A) = \frac{1}{\ell(A)} \sum_{i:\, x_i \in A} N^*(x_i + B). \qquad (8.1.25)$$
Note that in the case of a process with multiple points, the points at each x_i should be labelled x_i^{(1)}, …, x_i^{(n_i)}, and the definition of N* implies that we omit pairs (x_i^{(j)}, x_i^{(j)}) but not any pair (x_i^{(j)}, x_i^{(k)}) with j ≠ k. In principle, (8.1.1b) implies that this estimate is unbiased, while the assumed ergodicity of the process and the first assertion of (8.1.5) imply that it is consistent. In practice, however, difficulties arise with edge effects since N*(x_i + B) may not be observable if x_i lies near the boundary of A. Replacing it by N*[(x_i + B) ∩ A] introduces a bias that may be corrected in a variety of ways. For example, we may subtract an explicit correction factor [see Exercise 8.1.11(b)], or we may take observations over an extended region A + B ('plus sampling'), thereby ensuring that all necessary information is available but at the expense of the fullest use of the data. One commonly used correction replaces (8.1.25) by the form
$$\hat M^{\,c}_{[2]}(B; A) = \frac{N(A)\,\ell(B)}{\ell(A)}\; \frac{\sum_{x_i \in A} N^*[A \cap (x_i + B)]}{\sum_{x_i \in A} \ell[A \cap (x_i + B)]}, \qquad (8.1.26)$$
so that each observation count N*(x_i + B) is given a relative weight equal to that fraction of ℓ(x_i + B) that remains inside A; see also Exercise 8.1.11(a). Estimates of the reduced covariance measure, and hence of the variance function, can be obtained by subtracting appropriate multiples of ℓ(B), as noted in Exercise 8.1.11(c). These comments are included to suggest a basis for the systematic treatment of moment estimation for point processes; Krickeberg (1980) and Jolivet (1978) discuss some further issues and special problems, while applications are discussed by Ripley (1976, 1981), Diggle (1983), Vere-Jones (1978a), and many others.
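As a concrete illustration of the naïve estimate (8.1.25), the sketch below implements it for a process on the line. The function `naive_reduced_moment` is a hypothetical helper (not from the text), and edge effects are avoided simply by shrinking the set of base points, in the spirit of the 'plus sampling' device mentioned above.

```python
import numpy as np

def naive_reduced_moment(points, B, A):
    """Naive estimate (8.1.25) of the reduced second factorial moment
    measure M[2](B), from points observed around the interval A.
    B = (b_lo, b_hi) is the test interval, A = (a_lo, a_hi) the region
    of base points; the counts N*(x_i + B) exclude x_i itself."""
    a_lo, a_hi = A
    b_lo, b_hi = B
    pts = np.sort(np.asarray(points))
    inside = pts[(pts >= a_lo) & (pts <= a_hi)]
    total = 0
    for x in inside:
        # N(x + B): points y with x + b_lo < y <= x + b_hi
        n = np.count_nonzero((pts > x + b_lo) & (pts <= x + b_hi))
        # N*(x + B): drop the point at x itself when B contains the origin
        if b_lo < 0 <= b_hi:
            n -= 1
        total += n
    return total / (a_hi - a_lo)

# Sanity check against a Poisson process, for which the reduced second
# factorial moment measure is m^2 * ell(B)  (here m = rate, ell(B) = 1).
rng = np.random.default_rng(1)
rate, T = 2.0, 4000.0
pts = rng.uniform(0.0, T, rng.poisson(rate * T))
# base points taken on a slightly shrunken A so x_i + B stays inside the data
est = naive_reduced_moment(pts, B=(-0.5, 0.5), A=(1.0, T - 1.0))
print(est)  # should be close to rate**2 * 1.0 = 4.0
```

For a Poisson process the estimate should fluctuate around m²ℓ(B); clustered or inhibited processes would show systematic departures from that baseline.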
Exercises and Complements to Section 8.1
8.1.1 Consider a nonstationary Poisson cluster process on R with cluster centres having intensity µ_c(t) and a cluster with centre t having either a single point at t with probability p₁(t) or two points, one at t and the other at t + X, where the r.v. X has d.f. F. Show that p₁(·) and µ_c(·) can be chosen so that the process is first-order stationary but not second-order stationary.
8.1.2 Construct an example of a point process that has stationary covariance measure but nonstationary expectation measure. [Hint: Such a process is necessarily not simple: consider a compound Poisson process in which the rate of occurrence of groups and mean square group size are adjusted suitably.]
8.1.3 Let V(x) = var N(0, x] denote the variance function of a second-order stationary point process N(·) on the line, and write M₂(x) = E([N(0, x]]²) = V(x) + (mx)², where m = E N(0, 1].
(a) Show that M₂(x) is superadditive in x > 0 and hence that V′(0+) ≡ lim_{x↓0} V(x)/x exists, with V′(0+) ≥ m.
(b) Show that (M₂(x))^{1/2} is subadditive and hence that lim_{x→∞} V(x)/x² exists and is finite.
(c) When N(·) is crudely stationary (see Section 3.2), show that V′(0+) = m if and only if the process is simple.
(d) Construct an example of a second-order stationary point process for which the set of discontinuities of the left and right derivatives of V(·) is countably dense in (0, ∞).
(e) Writing M₂(x) = λ∫₀ˣ(1 + 2U(y)) dy, where λ is the intensity of N(·), show that lim_{x→∞} U(x)/λx exists and is ≥ 1.
(f) Show that sup_{x>0}(U(x + y) − U(x)) ≤ 2U(y) + m/λ.
(g) Use (8.1.13) to show that
$$V(x) = 2\int_0^x F_c(u)\,du, \quad \text{where, in terms of the reduced covariance measure } \breve C_2, \quad F_c(u) = \tfrac12 \breve C_2(\{0\}) + \breve C_2(0, u] = \tfrac12 \breve C_2[-u, u].$$
Deduce that, when it exists, the covariance density on R₊ is a second derivative of V(x).
[Hint: See Daley (1971) for (a)–(e) and Berbée (1983) for (f).]
8.1.4 Suppose N(·) is a simple stationary isotropic point process in R² with intensity λ, finite second-moment measure, and second-order intensity [see (8.1.18)] h̆₂(x) = h̆(|x|), say, for points distance |x| apart. Show that for a sphere S_r of radius r, V(S_r) ≡ var N(S_r) equals
$$\lambda\pi r^2 + \lambda \int_0^r 2\pi u\,du \int_{0+}^{r+u} \arccos\Big(\max\Big(\frac{u^2 + v^2 - r^2}{2uv},\; -1\Big)\Big)\, v\,\breve h(v)\,dv.$$
Suppose that h̆(u) → 0 monotonically for u large enough. Deduce that when lim_{r→∞} ∫₁^r u h̆(u) du < ∞, lim_{r→∞} V(S_r)/M(S_r) exists [see below (8.1.14)].
8.1.5 (a) If {I_n} is a nested decreasing sequence of intervals with ℓ(I_n) → 0 as n → ∞, show that for any second-order stationary simple point process on R, V(I_n)/M(I_n) → 1.
(b) Show that replacing {I_n} by more general nested sets {A_n} may lead to V(A_n)/M(A_n) ↛ 1. [Hint: Consider a stationary deterministic process at unit rate, and for some fixed integer j ≥ 2, let A_n = ∪_{i=1}^{j}(i, i + 1/n].]
(c) Let {A_n} be a nested decreasing sequence of sets in R^d with diam(A_n) → 0 as n → ∞. Show that V(A_n)/M(A_n) → 1 as n → ∞ for second-order stationary simple point processes on R^d.
8.1.6 Processes of bounded variability. Show that for a nontrivial stationary cluster point process on R with finite second-moment measure to be of bounded variability, the cluster centre process must be of bounded variability and all clusters must be of the same size.
As a special case, suppose the cluster centre process is deterministic and that points are randomly jittered with jitter distribution F, say. What conditions on F are needed for the jittered process to be of bounded variability? [See Cox and Isham (1980, Section 3.5) for more discussion.]
8.1.7 Isotropic Neyman–Scott process. In Example 8.1(b), suppose that the d.f. F is the bivariate normal distribution with zero mean and covariance matrix
$$\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}.$$
Then, the symmetrized d.f. G for the vector distance between two offspring from the same parent is bivariate normal also with zero mean vector and covariance matrix 2Σ. When σ₁² = σ₂² = σ², say, and ρ = 0, the process is isotropic and
$$K(r) = \pi r^2 + [m_{[2]}/(\mu_c m_1^2)]\big(1 - e^{-r^2/4\sigma^2}\big).$$
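The closed form for K(r) rests on the observation that the difference of two independent N(0, σ²I₂) displacements is N(0, 2σ²I₂), so that the inter-offspring distance has d.f. G₁(r) = 1 − e^{−r²/4σ²}. A quick Monte Carlo check (illustrative code, not part of the text):

```python
import numpy as np

# Distance between two offspring of the same parent when F is circular
# bivariate normal with variance sigma^2 per coordinate: the difference
# is N(0, 2*sigma^2*I), so the distance is Rayleigh distributed with
# d.f. G1(r) = 1 - exp(-r^2 / (4*sigma^2)).
rng = np.random.default_rng(0)
sigma, n = 1.5, 200_000
d = rng.normal(0, sigma, (n, 2)) - rng.normal(0, sigma, (n, 2))
dist = np.hypot(d[:, 0], d[:, 1])

r = 2.0 * sigma
empirical = np.mean(dist <= r)
theoretical = 1.0 - np.exp(-r**2 / (4 * sigma**2))
print(empirical, theoretical)  # both close to 1 - e^{-1} ≈ 0.632
```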
8.1.8 R^d-analogue of Matérn's Model I. Let v(R, a) denote the volume of the intersection of two R^d hyperspheres of radius R whose centres are distance a apart. Construct a point process in R^d analogous to the process in R of Example 8.1(c) and show that this R^d analogue has M(A) = λe^{−λv(R,0)}ℓ(A) and
$$\breve h_2(x) = \begin{cases} 0 & (0 < |x| \le R),\\ \lambda \exp\big(-\lambda[v(R, 0) - v(R, |x|)]\big) & (R < |x| \le 2R),\\ \lambda \exp\big(-\lambda v(R, 0)\big) & (2R < |x|).\end{cases}$$
[Hint: See Cox and Isham (1980, Exercise 6.3) for the case d = 2.]
8.1.9 Matérn's Model I: Further properties.
(a) Renewal process. Let {t_n: n = 1, 2, …} be the successive epochs in (1, ∞) of a Poisson process on R₊ at rate λ, and attach marks I(t_n) = 0 or 1 successively as follows, starting with t₁ initially unmarked. If t_n is unmarked, then I(t_n) = 0 if t_{n+1} < t_n + 1, in which case I(t_{n+1}) = 0 also; otherwise t_{n+1} > t_n + 1, I(t_n) = 1, and t_{n+1} is initially unmarked. If I(t_n) = 0, then I(t_{n+1}) = 0 if t_{n+1} < t_n + 1; otherwise t_{n+1} > t_n + 1 and t_{n+1} is initially unmarked. Show that {t′_n: n = 0, 1, …}, defined by t′₀ = 0 and t′_{n+1} = inf{t_j > t′_n: I(t_j) = 1} (n = 0, 1, …), are the epochs of a renewal process with a renewal density function h(·) that is ultimately constant, namely
$$h(x) = \begin{cases} 0 & (0 < x \le 1),\\ \lambda e^{-\lambda \min(x,\, 2)} & (x > 1).\end{cases}$$
(b) Show that Example 8.1(c) is a version of the corresponding stationary renewal process.
(c) The complementary set. Every point in the complementary set {x′_n} of 'rejected points' in the construction of Matérn's Model I in Example 8.1(c) shows clustering characteristics: for one thing, the nearest-neighbour distance of any x′_n is at most R. Investigate other properties of this process.
[Hint: Consider first the case d = 1; find its density, cluster structure, nearest-neighbour distribution, and covariance density. Which of these are accessible when d ≥ 2? What properties of {x′_n} can be deduced by complementarity with respect to a Poisson process of the underdispersed process of Example 8.1(c)?]
8.1.10 Matérn's Model II for underdispersion. Consider an independent marked Poisson process with realization {(x_i, κ_i)} in which the points {x_i} have intensity λ, say, and the independent marks have a common uniform distribution on (0, 1) (any absolutely continuous distribution will do). A point x_i is rejected if there is any other point within distance R and with mark larger than κ_i. Show that the retained points {x′_i}, say, have density (1 − e^{−2λR})/(2R) and that the relative second-order intensity r₂(x) vanishes for |x| < R, equals 1 for |x| > 2R, and for R < |x| < 2R,
$$r_2(x) = \frac{2R + (3R + x)e^{-\lambda(R+x)} - (5R + x)e^{-\lambda(3R+x)}}{R(R+x)(3R+x)} > 1.$$
Examine the R^d analogues of the model (see Exercise 8.1.8).
8.1.11 (a) Show that the weighted estimate M̂^c_[2](B; A) in (8.1.26) is unbiased.
(b) A simpler but cruder correction subtracts from (8.1.25) the expected bias when the observed process is Poisson with the same mean rate. Express this as a correction to M̂^c_[2](B). [Hint: See e.g. Miles (1974) and Vere-Jones (1978a, p. 80), who give explicit forms.]
(c) Although the cumulative forms given above admit consistent estimates, they are less easy to interpret than smoothed estimates of the corresponding densities. For example, in R², estimates of the radial correlation function and related quantities can be obtained by counting the number of points in an annulus about a given point of the realization, dividing by the area of the annulus, subtracting the appropriate mean, and regarding the resultant value as an estimate of ρ(r) at a distance r corresponding to the mid-radius of the annulus. Fill out the details behind these remarks. [Hint: See e.g. Vere-Jones (1978a) and Chong (1981) for applications.]
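The dependent thinning of Example 8.1(c) is straightforward to simulate; the sketch below (hypothetical code, with parameter values chosen arbitrarily) checks the Model I mean density λe^{−2λR} of the modified process on the line.

```python
import numpy as np

def matern_I(points, R):
    """Matern Model I thinning: delete every point that has another
    point of the realization within distance R of it."""
    pts = np.sort(np.asarray(points))
    gap_left = np.diff(pts, prepend=-np.inf)
    gap_right = np.diff(pts, append=np.inf)
    # a point survives iff both nearest-neighbour gaps are >= R
    keep = (gap_left >= R) & (gap_right >= R)
    return pts[keep]

rng = np.random.default_rng(2)
lam, R, T = 1.0, 0.25, 200_000.0
pts = rng.uniform(0.0, T, rng.poisson(lam * T))
retained = matern_I(pts, R)
density = len(retained) / T
print(density)  # should be close to lam * exp(-2*lam*R) ≈ 0.6065
```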
8.2. The Bartlett Spectrum
The spectral theory of point processes has two origins. On the theoretical side, the results can be derived from specializations of Doob's (1949, 1953) theory of processes with stationary increments and related treatments of generalized stochastic processes by Bochner (1955) and Yaglom (1961). The key features relevant to the practical analysis of point process data were identified by Bartlett (1963) and followed up by several authors, as summarized for example in Cox and Lewis (1966) and Brillinger (1972, 1978). The treatment given in this chapter is based on developments of the theory of Fourier transforms of unbounded measures (see e.g. Argabright and de Lamadrid, 1974). As such, it requires an extension, not quite trivial, of the classical Bochner theorem and related results used in standard time series analysis. We describe
this extension, concerned with properties of positive, positive-definite (p.p.d.) measures, in Section 8.6. Here in this section, we summarize and illustrate the properties that are most relevant to the practical analysis of point process models.
We saw in Proposition 8.1.II that the reduced second-moment measure M̆₂ of a stationary random measure is a p.p.d. measure, so that all the properties developed for such measures in Section 8.6 apply. In particular, M̆₂ is transformable, so that it possesses a well-defined Fourier transform (in the sense of generalized functions), which is again a measure, and for which the explicit versions of the Parseval relation and the inversion theorem, derived in that section, are valid. The reduced covariance measure C̆₂ is not itself a p.p.d. measure, but it differs from M̆₂ only by the term m²ℓ, which is also a p.p.d. measure [its Fourier transform is the multiple (m²/(2π)^d)δ₀ of the measure consisting of a single atom at the origin]. Thus, C̆₂ can be represented as a difference of two p.p.d. measures, so that the same results (existence of a Fourier transform that is a difference of two p.p.d. measures, Parseval relations, etc.) hold for it also. A similar remark applies to the reduced second factorial moment measure and the corresponding factorial cumulant measure, where it is a matter of subtracting an atom at the origin.
Any one of these four measures could be taken as the basis for further development of the spectral theory. It is convenient, and consistent with the standard convention in time series analysis, to choose as the spectrum of the process ξ the inverse Fourier transform of the (ordinary) covariance measure. The proposition below summarizes the main results pertaining to this transform; (8.2.1) and (8.2.2) are examples of Parseval relations.
Proposition 8.2.I. Let ξ be a second-order stationary point process or random measure on R^d with reduced covariance measure C̆₂. Then
(a) there exists a symmetric, translation-bounded measure Γ on B(R^d) such that, for all ψ in the space S of functions of rapid decay defined below (8.6.1),
$$\int_{\mathbb{R}^d} \psi(x)\, \breve C_2(dx) = \int_{\mathbb{R}^d} \tilde\psi(\omega)\, \Gamma(d\omega), \qquad (8.2.1)$$
where ψ̃(ω) = ∫_{R^d} e^{i(ω·u)} ψ(u) du (ω ∈ R^d);
(b) the inversion relations (8.6.6–10) and (8.6.12) hold, with µ identified as Γ and ν as C̆₂; and
(c) for bounded measurable φ with bounded support and also for φ ∈ S, if ζ_φ = ∫_{R^d} φ(x) ξ(dx), then
$$\operatorname{var} \zeta_\phi = \int_{\mathbb{R}^d} |\tilde\phi(\omega)|^2\, \Gamma(d\omega) = \int_{\mathbb{R}^d} (\phi * \phi^*)(u)\, \breve C_2(du) \ge 0, \qquad (8.2.2)$$
where φ*(u) = \overline{φ(−u)}.
Proof. The statements all follow from the p.p.d. properties noted in the opening paragraph and the results for p.p.d. measures outlined in Section 8.6. In particular, (8.2.2) follows from Proposition 8.6.IV.
Definition 8.2.II. The Bartlett spectrum of a second-order stationary point process or random measure ξ on R^d is the measure Γ(·) associated with the reduced covariance measure C̆₂ of ξ in Proposition 8.2.I.
Equations (8.2.1), usually in the form of (8.2.4) below, and (8.2.2) are generally the most convenient results to use in establishing the form of the Bartlett spectrum for a given process. Note in particular the special case for X = R and ψ the indicator function for (0, t],
$$\operatorname{var} \xi(0, t] = \int_{\mathbb{R}} \left(\frac{\sin \frac12 \omega t}{\frac12 \omega}\right)^2 \Gamma(d\omega), \qquad (8.2.3)$$
which is essentially Daley's (1971) representation for the variance function of a stationary point process or random measure [Daley uses a measure defined on R₊, while in (8.2.3), Γ(·) is a symmetric measure on R]. An alternative route to (8.2.3), exploiting a skeleton process, the standard Bochner representation, and weak convergence, is sketched in Exercise 8.2.1.
It is clear from Proposition 8.2.I that while the spectral measure Γ is positive, it is not in general a p.p.d. measure. However, since the reduced second-moment measure M̆₂ is positive and is the Fourier transform of the positive measure Γ(·) + [m²/(2π)^d]δ₀(·), Γ(·) can be made into a p.p.d. measure by the addition of a sufficiently large atom at the origin. In the point process case, the reduced covariance measure has an atom at the origin that transforms into a positive multiple of Lebesgue measure, and consequently the Bartlett spectrum of a point process is never totally finite. On the other hand, the factorial covariance measure is often both absolutely continuous and totally finite, and then Γ(·) is absolutely continuous with a density γ(·), which can be written (for the case d = 1)
$$2\pi\gamma(\omega) = m + \int_{-\infty}^{\infty} e^{-i\omega x}\, c_{[2]}(x)\, dx = m + \tilde c_{[2]}(-\omega) = m + \tilde c_{[2]}(\omega). \qquad (8.2.4)$$
It was in this form that the spectral measure was originally introduced by Bartlett (1963). It is not known whether every p.p.d. measure can arise as the second-moment measure of some random measure nor, when it does, how to construct a process yielding the given measure as its second-moment measure. The standard construction using Gaussian processes or measures is not available here, as such processes do not have nonnegative trajectories (see Wiener's homogeneous chaos example in Chapter 9). Some partial results arise from the examples considered below and from Exercises 8.2.11–12 and 8.4.6–7. Davidson (1974) provided a construction for identifying the second-moment measures of stationary random measures on the circle (see the further discussion in Chapter 12), but it relies on the finiteness of the invariant measure on a circle, and
it is not obvious how it might be extended to either point processes or random measures on the line. In the very special case of a discrete point process on the four points of the compass (NESW), with translation interpreted as rotation through π/2, the family of second-moment measures can be identified explicitly and is strictly contained in the class of p.p.d. measures; see Exercise 8.2.5 for details.
We now discuss the Bartlett spectrum for some basic point processes on R^d.
Example 8.2(a) Poisson process with constant intensity on R^d. Here C̆₂ consists only of the atom mδ₀(·), so Γ is absolutely continuous with density m/(2π)^d. This 'white-noise' spectrum is consistent with the completely random character of the process. Note that the Parseval relations (8.2.1) and (8.2.2) take, respectively, the special forms, with ζ_φ = ∫_{R^d} φ(x) N(dx),
$$m\psi(0) = \frac{m}{(2\pi)^d} \int_{\mathbb{R}^d} \tilde\psi(\omega)\, d\omega$$
and
$$\operatorname{var} \zeta_\phi = m \int_{\mathbb{R}^d} |\phi(x)|^2\, dx = \frac{m}{(2\pi)^d} \int_{\mathbb{R}^d} |\tilde\phi(\omega)|^2\, d\omega.$$
Example 8.2(b) Stationary renewal process. If the renewal density u(t) exists and the process is stationary with mean rate λ = 1/µ, where µ is the mean lifetime, we have from Example 5.4(b) that m̆_[2](x) = λu(|x|) and hence
$$\breve c_2(x) = \lambda\big\{\delta_0(x) + u(|x|) - \lambda\big\}.$$
If further the difference u(x) − λ is integrable on (0, ∞), (8.2.4) yields for ω ≠ 0
$$\gamma(\omega) = \frac{\lambda}{2\pi}\left(1 + \frac{\tilde F(\omega)}{1 - \tilde F(\omega)} + \frac{\tilde F(-\omega)}{1 - \tilde F(-\omega)}\right) = \frac{\lambda}{2\pi}\left(\frac{1}{1 - \tilde F(\omega)} + \frac{1}{1 - \tilde F(-\omega)} - 1\right), \qquad (8.2.5)$$
where F̃(ω) = ∫₀^∞ e^{iωx} dF(x) is the characteristic function of the lifetime distribution. For ω = 0, we obtain from the above or Exercise 4.4.5
$$\gamma(0) = \frac{\lambda}{2\pi}\left(\frac{\sigma^2 + \mu^2}{\mu^2} - 1\right) = \frac{\lambda}{2\pi}\left(1 + 2\int_0^\infty [u(x) - \lambda]\, dx\right).$$
Special cases, when lifetime distributions are of 'phase type' for example, yield rational functions for F̃ and hence rational spectral densities (see e.g. Neuts, 1979). Exercise 8.2.6 gives a simple nontrivial example.
Since a stationary renewal process has moment measures of all orders whenever it exists, the Bartlett spectrum exists for all such processes, but without the additional restriction it may not be absolutely continuous or (even if it is) γ(0) need not be finite as above. The extreme case described in the next example is worth particular mention.
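As a numerical illustration of (8.2.5) (hypothetical code; the Erlang-2 lifetime is an arbitrary choice), one can check that the spectral density approaches γ(0) = (λ/2π)σ²/µ² as ω → 0:

```python
import math

# Spectral density (8.2.5) for a stationary renewal process with
# Erlang-2 lifetimes (sum of two exponentials of rate beta), so that
# F~(omega) = (beta/(beta - i*omega))**2, mu = 2/beta, sigma^2 = 2/beta^2.
beta = 3.0
lam = beta / 2.0                     # mean rate 1/mu

def F(omega):
    return (beta / (beta - 1j * omega)) ** 2

def gamma(omega):
    # second form of (8.2.5); imaginary parts cancel by symmetry
    return (lam / (2 * math.pi)) * (1 / (1 - F(omega)) + 1 / (1 - F(-omega)) - 1).real

# the limit at omega -> 0 should equal (lam/2pi)*sigma^2/mu^2 = beta/(8*pi)
val = gamma(1e-4)
print(val, beta / (8 * math.pi))
```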
Example 8.2(c) Stationary deterministic process. Here, points occur on a regular lattice of span a, the whole lattice being randomly shifted so that the first point to the right of the origin is uniformly distributed on (0, a]. The measure M̆₂(·) is an infinite sum, with mass 1/a at each of the points ka (k = 0, ±1, …). Its Fourier transform has mass 1/a² at each of the points 2πj/a (j = 0, ±1, …). Moving to the Fourier transform of the covariance measure deletes the atom at j = 0, so that Γ(·) can be written in terms of Dirac measures as
$$\Gamma(A) = \frac{1}{a^2} \sum_{j=1}^{\infty} \big[\delta_{2\pi j/a}(A) + \delta_{-2\pi j/a}(A)\big]. \qquad (8.2.6)$$
Example 8.2(d) Cluster processes. For a general cluster process N in R^d, the variance of an integral ∫_{R^d} φ(x) N(dx) can be written (see Exercise 6.3.4)
$$\operatorname{var} \int_{\mathbb{R}^d} \phi(x)\, N(dx) = \int_{\mathbb{R}^d} V_\phi(u)\, M^c(du) + \int_{(\mathbb{R}^d)^{(2)}} m_\phi(u)\, m_\phi(v)\, C_2^c(du \times dv), \qquad (8.2.7)$$
where
$$m_\phi(u) = \int_{\mathbb{R}^d} \phi(x)\, M_1(dx \mid u), \qquad V_\phi(u) = \int_{(\mathbb{R}^d)^{(2)}} \phi(s)\phi(t)\, C_2(ds \times dt \mid u),$$
and we use the notation M^c(·) and C₂^c(·) from (6.3.4–5). In the stationary case, M^c(du) = m_c du, where m_c is the mean density of the cluster centre process, while C₂^c has a reduced form that can be written in terms of the Bartlett spectrum Γ_c of the cluster centre process. Since also C₂(ds × dt | y) depends only on the differences s − y and t − y, the first term in (8.2.7) can be written in terms of the measure B defined via bounded measurable h by
$$\int_{\mathbb{R}^d} h(y)\, B(dy) = \int_{(\mathbb{R}^d)^{(2)}} h(s - t)\, C_2(ds \times dt \mid 0).$$
Here the measure B is both positive-definite and totally finite (since the mean square cluster size is necessarily finite); it therefore has an ordinary Fourier transform B̃(ω) = (2π)^{−d} ∫_{R^d} e^{−i(ω·x)} B(dx), which can be written in the symmetric form
$$\tilde B(\omega) = \operatorname{var}\left(\int_{\mathbb{R}^d} e^{-i(\omega \cdot x)}\, N_m(dx \mid 0)\right),$$
where, it should be recalled, var Z = E|Z|² − |EZ|² for a complex-valued r.v. Z. Thus, writing
$$\tilde M_1(\omega \mid 0) = \int_{\mathbb{R}^d} e^{-i(\omega \cdot x)}\, M_1(dx \mid 0) = E \int_{\mathbb{R}^d} e^{-i(\omega \cdot x)}\, N_m(dx \mid 0),$$
we obtain from (8.2.7)
$$\operatorname{var} \int_{\mathbb{R}^d} \phi(x)\, N(dx) = \int_{\mathbb{R}^d} |\tilde\phi(\omega)|^2 \left[\frac{m_c}{(2\pi)^d}\, \tilde B(\omega)\, d\omega + |\tilde M_1(\omega \mid 0)|^2\, \Gamma_c(d\omega)\right].$$
This relation shows that the Bartlett spectrum of the cluster process N can be identified with the measure
$$\Gamma(d\omega) = (2\pi)^{-d} m_c\, \tilde B(\omega)\, d\omega + |\tilde M_1(\omega \mid 0)|^2\, \Gamma_c(d\omega). \qquad (8.2.8)$$
The first term can be regarded as the contribution to the spectrum from the internal cluster structure; the second term is a filtered version of the spectrum of the cluster centre process, with the filtering reflecting the mean distribution of the cluster, as in Daley (1972b).
For a stationary Poisson cluster process, further simplification occurs. Letting µ_c denote the intensity of the Poisson process of cluster centres, we find that Γ has a density γ, which has the simple alternative forms
$$\gamma(\omega) = \frac{\mu_c}{(2\pi)^d}\left[\int_{\mathbb{R}^d} M_1(dx \mid 0) + \int_{\mathbb{R}^d}\int_{\mathbb{R}^d} e^{i(x \cdot \omega)}\, e^{-i(y \cdot \omega)}\, M_{[2]}(dx \times dy \mid 0)\right] = \frac{\mu_c}{(2\pi)^d}\, E\left|\int_{\mathbb{R}^d} e^{i(x \cdot \omega)}\, N_m(dx \mid 0)\right|^2, \qquad (8.2.9)$$
which is easily recognized as the transformed version of (6.3.5). Specific results for the Neyman–Scott and Bartlett–Lewis processes follow readily from these equations (see Exercises 8.2.9 and 8.2.10). We shall see in Section 8.3 that, for filtering and prediction purposes, a particularly important role is played by point processes having a rational spectral density. Many common and useful examples fall into this class. By suitable specification of the components, both renewal and cluster processes can give rise to spectral measures with rational spectral densities. For example, it is clear from (8.2.5) that this will occur whenever the interval distribution of a renewal process has a rational Laplace transform, that is, whenever the distribution is expressible as a finite convolution or mixture of exponentials. Several types of cluster processes, as well as Cox processes, have rational spectral densities, in particular the Neyman–Scott process with an exponential or Erlang distribution for the distances of the cluster elements from the cluster centre [see also Exercise 8.2.7(b)]. The wide choice of such examples shows not only the richness of the class but also the relative lack of discrimination in the spectrum as a means of distinguishing between processes that in other respects may be quite dissimilar. One of the most important examples is the Hawkes process with suitably restricted response function (i.e. infectivity measure) as described below.
Example 8.2(e) Hawkes process with rational spectral density. From Example 6.3(c) and the results on branching processes in Exercise 5.5.6, we see that the Fourier transform M̃₁ of the first-moment measure of the total offspring process is a rational function of the Fourier–Stieltjes transform µ̃ of the infectivity measure, namely
$$\tilde M_1(\omega \mid 0) = \frac{1}{1 - \tilde\mu(\omega)}, \qquad \text{where} \quad \tilde\mu(\omega) = \int_0^\infty e^{i\omega x}\, \mu(dx).$$
Combining this result with the expressions for the mean rate and covariance density given by (6.3.26) and (6.3.27) and with the general form (8.2.8) for cluster processes, we obtain the spectral density for the Hawkes process in the form
$$\gamma(\omega) = \frac{\lambda/(2\pi)}{(1 - \nu)\,|1 - \tilde\mu(\omega)|^2}. \qquad (8.2.10)$$
Consequently, when µ̃(ω) is a rational function of ω, so too is M̃₁(ω). Because the form of (8.2.10) is similar to that of the spectral density of an autoregression in continuous time, one might hope that the Hawkes model could play a role similar to that of autoregressive models in the context of mean square continuous processes. This hope is frustrated by the special probabilistic structure of the Hawkes model, which requires that µ(·) ≥ 0. If this condition is violated, it is not clear that there exists any point process with the spectral form (8.2.10), and if such a process does exist, it certainly will not have the Poisson branching structure of a Hawkes process.
Despite this difficulty, the possibility of using the Hawkes process to approximate general point process spectra was explored by Hawkes (1971b), Hawkes and Adamopoulos (1973), Ozaki (1979) and, more deliberately, by Ogata and Akaike (1982), with an application in Ogata et al. (1982). Ogata and Akaike (1982) suggest taking for µ a measure on [0, ∞) with density function
$$\mu(t) = e^{-\alpha t} \sum_{k=0}^{K} b_k L_k(t)$$
for α > 0 and Laguerre polynomials L_k(t). This form leads automatically to processes with rational spectral densities since the Fourier transforms of the Laguerre polynomials are themselves rational. The simplest case occurs when K = 0 and b₀ = αν for 0 < ν < 1, so that µ̃(ω) = να/(α − iω) and
$$\gamma(\omega) = \frac{\lambda}{2\pi(1 - \nu)} \cdot \frac{\omega^2 + \alpha^2}{\omega^2 + \alpha^2(1 - \nu)^2}.$$
Note the characteristic feature for point processes with rational spectral density that the numerator and denominator are of equal degree. Further examples are given in the papers cited and in Vere-Jones and Ozaki (1982).
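For the exponential infectivity case, the general form (8.2.10) and the closed form it reduces to can be checked against each other numerically (illustrative code only):

```python
import math

# Hawkes spectral density (8.2.10) with exponential infectivity
# mu(t) = nu*alpha*exp(-alpha*t), so mu~(omega) = nu*alpha/(alpha - i*omega);
# (8.2.10) should then agree with the closed rational form quoted above.
lam, nu, alpha = 1.0, 0.5, 2.0

def gamma_general(w):
    mu_t = nu * alpha / (alpha - 1j * w)          # transform of the infectivity
    return (lam / (2 * math.pi)) / ((1 - nu) * abs(1 - mu_t) ** 2)

def gamma_closed(w):
    return (lam / (2 * math.pi * (1 - nu))) * (w**2 + alpha**2) / (w**2 + alpha**2 * (1 - nu)**2)

vals = [(gamma_general(w), gamma_closed(w)) for w in (0.0, 0.7, 3.0)]
print(vals)  # the two columns agree at every frequency
```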
To yield a valid model, the parameters should be constrained to ensure that the density of the infectivity measure (and hence the conditional intensity) is everywhere nonnegative; for stationarity, the infectivity measure should have total mass < 1. These conditions are relatively stringent and quite difficult to impose in estimation procedures. Within these constraints,
however, the Hawkes model is one of the most flexible models available in that it allows both the calculation of the form of the spectrum and the investigation of probabilistic aspects of the process.
The basic results described so far apply to stationary (translation-invariant) point processes in any general Euclidean space R^d. When d > 1, however, additional symmetries such as isotropy (invariance under rotations) become possible and have important implications for the structure of the spectral measures. As an illustration, we conclude this section with a brief discussion of isotropic random measures in R², this time looking at the Fourier transforms.
In the stationary, isotropic case, the second-order properties of a random measure in R² are fully defined by the mean density m and the function K̆₂(·) defined in (8.1.20). We examine the constraints on the Bartlett spectrum in R² implied by this isotropy condition and show how to represent the spectrum in terms of m and K̆₂(·).
Consider first the effect of the double Fourier transform on a function h: R² → R which, in addition to being bounded, measurable, and of bounded support, is circularly symmetric, i.e.
$$h(x, y) = h(r\cos\theta, r\sin\theta) = g(r) \qquad (\text{all } \theta)$$
for some function g. The transform is given by
$$\tilde h(\omega, \phi) \equiv \int_{\mathbb{R}^2} e^{i(\omega x + \phi y)}\, h(x, y)\, dx\, dy = \int_0^\infty r g(r)\, dr \int_0^{2\pi} e^{ir(\omega\cos\theta + \phi\sin\theta)}\, d\theta = \int_0^\infty r g(r)\, dr \int_0^{2\pi} e^{ir\rho\cos(\theta - \psi)}\, d\theta,$$
using (ρ, ψ) as polar coordinates in the (ω, φ) plane. Now the integral over θ is simply a Bessel function, J₀(u) = (1/2π)∫₀^{2π} e^{iu cos θ} dθ, so
$$\tilde h(\omega, \phi) = 2\pi \int_0^\infty r J_0(r\rho)\, g(r)\, dr \equiv \tilde g^B(\rho), \qquad \text{where } \rho = (\omega^2 + \phi^2)^{1/2}. \qquad (8.2.11)$$
Consequently, h̃(ω, φ) is again circularly symmetric, reducing to the function g̃^B(·), which we call the Bessel transform of g(·) (we have included the factor 2π; this is a departure from the usual definition); it is also called a Hankel transform (see e.g. Copson, 1935, p. 342). By arguing analogously from the inverse Fourier transform
$$h(x, y) = \frac{1}{(2\pi)^2} \int_{\mathbb{R}^2} e^{-i(\omega x + \phi y)}\, \tilde h(\omega, \phi)\, d\omega\, d\phi,$$
it follows that the Bessel transform is inverted as in
$$g(r) = \frac{1}{2\pi} \int_0^\infty \rho\, \tilde g^B(\rho)\, J_0(r\rho)\, d\rho. \qquad (8.2.12)$$
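The transform pair (8.2.11)–(8.2.12) is easy to verify numerically; the sketch below (hypothetical quadrature helpers, not from the text) checks the known pair g(r) = e^{−r²}, g̃^B(ρ) = πe^{−ρ²/4}.

```python
import numpy as np

def trapezoid(y, x):
    """Simple trapezoidal rule along the last axis."""
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * np.diff(x), axis=-1)

def J0(u):
    # J0(u) = (1/pi) * int_0^pi cos(u*sin(theta)) dtheta
    theta = np.linspace(0.0, np.pi, 1001)
    return trapezoid(np.cos(np.outer(u, np.sin(theta))), theta) / np.pi

def bessel_transform(g, rho, r_max=10.0, n=2001):
    # gB(rho) = 2*pi * int_0^inf r J0(r*rho) g(r) dr, as in (8.2.11)
    r = np.linspace(0.0, r_max, n)
    return np.array([2 * np.pi * trapezoid(r * J0(x * r) * g(r), r) for x in rho])

# For g(r) = exp(-r^2) the transform is known: gB(rho) = pi * exp(-rho^2/4)
rho = np.array([0.0, 1.0, 2.0])
num = bessel_transform(lambda r: np.exp(-r**2), rho)
exact = np.pi * np.exp(-rho**2 / 4)
print(num, exact)
```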
From this discussion, we should expect the Bartlett spectral density of a stationary isotropic process to be circularly symmetric in frequency space and to be related to the inverse Bessel transform of the density of K̆₂(r). To cover the situation where densities may not exist, the Bessel transform relation needs to be put into the form of a Parseval relation so that it can be extended to measures, as follows.
Proposition 8.2.III. Let Γ(·) be the Bartlett spectrum on R² associated with a simple stationary isotropic point process in R². Then Γ(·) is circularly symmetric and is expressible via (ω₁, ω₂) = (ρ cos ψ, ρ sin ψ) as
$$\Gamma(d\rho \times d\psi) = \left(\frac{m\rho\, d\rho}{2\pi} + m^2 \kappa(d\rho) - m^2\delta_0(d\rho)\right)\frac{d\psi}{2\pi}, \qquad (8.2.13)$$
where κ is related to the radial measure K̆₂(·) of (8.1.20) by the Parseval–Bessel equation
$$\int_0^\infty \tilde g^B(\rho)\, \kappa(d\rho) = \int_0^\infty g(r)\, \breve K_2(dr) \qquad (8.2.14)$$
for all bounded measurable g of finite support on R₊, and g̃^B is defined by (8.2.11).
Proof. Recall that the Bartlett spectrum is the Fourier transform in R² of the complete covariance measure C̆₂, which for disks S_r(0) takes the form
$$\breve C_2\big(S_r(0)\big) = m - m^2\pi r^2 + m^2 \breve K_2(r),$$
where the first term arises from the diagonal concentration associated with a simple point process; the second, the term involving the square of the mean, must be subtracted from the second moment to yield the covariance; and the third is the form of the reduced second factorial moment measure. Using mixed differential notation, this can be rewritten as
$$\breve C_2(dx \times dy) = m\,\delta_0(dx \times dy) - m^2\, dx\, dy + m^2\, \breve K_2(dr)\, \frac{d\theta}{2\pi}.$$
The first and second terms have the following inverse Fourier transforms, respectively:
$$\frac{m\, d\omega_1\, d\omega_2}{(2\pi)^2} = \frac{m\rho\, d\rho\, d\psi}{(2\pi)^2} = \frac{m\rho\, d\rho}{2\pi} \cdot \frac{d\psi}{2\pi}, \qquad m^2\,\delta_0(d\omega_1 \times d\omega_2) = m^2\,\delta_0(d\rho) \cdot \frac{d\psi}{2\pi}.$$
Denoting the double Fourier transform of the measure K̆₂(dr) dθ/(2π) by L(dω₁ × dω₂), the Parseval relation for such transforms implies that, with h and h̃ as earlier,
$$\int_{\mathbb{R}^2} \tilde h(\omega_1, \omega_2)\, L(d\omega_1 \times d\omega_2) = \int_0^\infty \int_0^{2\pi} h(r\cos\theta, r\sin\theta)\, \breve K_2(dr)\, \frac{d\theta}{2\pi}.$$
8. Second-Order Properties of Stationary Point Processes
Now
$$\int_0^{2\pi} h(r\cos\theta, r\sin\theta)\,\frac{d\theta}{2\pi} = \frac{1}{(2\pi)^2}\int_0^\infty d\rho \int_0^{2\pi}\!\int_0^{2\pi} e^{-i\rho r\cos(\theta-\psi)}\,\rho\,\tilde h(\rho\cos\psi, \rho\sin\psi)\,d\psi\,d\theta = \frac{1}{2\pi}\int_0^\infty d\rho\,\rho J_0(\rho r)\int_0^{2\pi} \tilde h(\rho\cos\psi, \rho\sin\psi)\,d\psi,$$
where as before the invariance of integrating θ over any interval of length 2π has been used. If, in particular, we take h̃(ω₁, ω₂) to have the product form g̃^B(ρ)f(ψ), we obtain from this relation and the Bessel transform equation (8.2.12) that
$$\int_{(0,\infty)\times(0,2\pi)} \tilde g^B(\rho)\,f(\psi)\,L(d\rho \times d\psi) = \int_0^{2\pi} f(\psi)\,\frac{d\psi}{2\pi}\,\int_0^\infty g(r)\,\breve K_2(dr).$$
Since the integral here depends on f only through its integral over (0, 2π), a uniqueness argument implies that L(·) has a disintegration of the form L(dρ × dψ) = κ(dρ) [dψ/(2π)], where κ(·) satisfies (8.2.14).
Note that (8.2.14) defines (1/r) dK̆₂/dr (and not K̆₂ itself), in the sense of generalized functions, as the Bessel transform of (1/ρ) dκ/dρ.

Example 8.2(f) An isotropic Neyman–Scott process. Consider the circularly symmetric case from Example 8.1(b) and Exercise 8.1.7, for which we have
$$\breve K_2(dr) = 2\pi r\,dr + \frac{m_{[2]}}{\mu m_1^2}\,\frac{r}{2\sigma^2}\,e^{-r^2/4\sigma^2}\,dr.$$
It is easy to check from (8.2.14) that the measure 2πr dr on R₊ is the Parseval–Bessel transform of the measure consisting of a unit atom at the origin. The second term is a density, and it can be derived (via the Fourier transform in R² or otherwise) as the Parseval–Bessel transform of the density
$$\kappa(\rho) = \frac{m_{[2]}}{2\pi\mu m_1^2}\,\rho\,e^{-\sigma^2\rho^2}.$$
Consequently, for this isotropic Neyman–Scott model, the Bartlett spectrum is absolutely continuous with spectral density
$$\gamma(\omega, \phi) = \frac{\mu m_1}{4\pi^2} + \frac{\mu m_{[2]}}{4\pi^2}\,e^{-\sigma^2(\omega^2+\phi^2)} \equiv \frac{\beta(\rho)}{2\pi\rho},$$
where the function β(·) as just defined exhibits the Bartlett spectrum in the polar form β(ρ) dρ [dψ/(2π)].
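The Bessel-transform pair underlying Example 8.2(f), namely that a radial density proportional to ρe^{−σ²ρ²} transforms to one proportional to e^{−r²/4σ²} (up to the normalizing constants absorbed into (8.2.14)), can be checked numerically. The sketch below is not from the text; it simply evaluates the Hankel-type integral ∫₀^∞ J₀(ρr) ρ e^{−σ²ρ²} dρ by quadrature and compares it with the closed form e^{−r²/(4σ²)}/(2σ²), for arbitrary illustrative parameter values.

```python
import numpy as np

def trap(y, x):
    """Trapezoidal rule along the last axis."""
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * np.diff(x), axis=-1)

def j0(x):
    """Bessel J0 via the integral representation J0(x) = (1/pi) int_0^pi cos(x sin t) dt."""
    t = np.linspace(0.0, np.pi, 801)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return trap(np.cos(x[:, None] * np.sin(t)[None, :]), t) / np.pi

def bessel_transform_of_gaussian(r, sigma):
    """Numerically evaluate int_0^inf J0(rho r) rho exp(-sigma^2 rho^2) d rho."""
    rho = np.linspace(0.0, 8.0 / sigma, 3001)   # integrand is negligible beyond 8/sigma
    return trap(j0(rho * r) * rho * np.exp(-(sigma * rho) ** 2), rho)

sigma = 1.0
for r in (0.5, 1.0, 2.0):
    lhs = bessel_transform_of_gaussian(r, sigma)
    rhs = np.exp(-r**2 / (4 * sigma**2)) / (2 * sigma**2)   # closed-form Weber integral
    print(r, lhs, rhs)
```

The agreement (to roughly four decimal places at these quadrature resolutions) confirms the transform pair used to pass from K̆₂ to κ.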
8.2. The Bartlett Spectrum
Exercises and Complements to Section 8.2
8.2.1 Given a second-order stationary point process N, the relation Xh(n) = N(nh, (n + 1)h] defines a second-order stationary discrete-time series. Express var N(0, nh] in terms of the second-moment structure of {Xh(n)}. Use the standard spectral representation of the second moments of a discrete-time process to give a spectral representation for var N(0, nh], and argue that for h → 0 there is a weak limit as in (8.2.3).
8.2.2 Superposition. Show that if ξ₁, ξ₂ are independent second-order stationary random measures with Bartlett spectra Γ₁, Γ₂, respectively, then ξ₁ + ξ₂ has spectrum Γ₁ + Γ₂. More generally, if ξ₁, ξ₂, . . . are independent second-order stationary random measures such that the L² limit ξ = ξ₁ + ξ₂ + · · · exists, then ξ has Bartlett spectrum Γ₁ + Γ₂ + · · · .
8.2.3 Cox process. Let ξ be a second-order stationary random measure on R^d with Bartlett spectrum Γ and mean density m. Show that the Cox process directed by ξ has Bartlett spectrum Γ(·) + m(2π)^{−d}ℓ(·), where ℓ(·) denotes Lebesgue measure on R^d.
8.2.4 Quadratic random measure [see Example 6.1(c) and Exercise 6.1.3]. (a) Let Zᵢ(t) (i = 1, 2) be independent mean square continuous second-order stationary random processes on R with respective spectral d.f.s Fᵢ and zero mean. Show that the product Z₁Z₂ is a mean square continuous second-order stationary process with spectral measure F₁ ∗ F₂. (b) If Z is a mean square continuous stationary Gaussian process with spectral d.f. F and zero mean, then the quadratic random measure whose sample paths have density Z²(·) has covariance density 2|c(·)|² and Bartlett spectrum 2F ∗ F, where c(x) = cov(Z(0), Z(x)). (c) Investigate what changes are needed in (a) and (b) when the zero-mean assumption is omitted.
8.2.5 Cyclic point process on four points. Consider a {0, 1}-valued process on the four compass points NESW that is stationary (i.e. invariant under cyclic permutations).
Denote the probabilities of the six basic configurations 0000, 1000, 1100, 1010, 1110, and 1111 by {p₀, p₁, . . . , p₅}, respectively.
(i) Show that the mean density and reduced second-moment measure are given respectively by
$$m = \tfrac14 p_1 + \tfrac12(p_2 + p_3) + \tfrac34 p_4 + p_5, \qquad \breve M_2 = \{a, b, c, d\},$$
where a = m, b = d = ¼p₂ + ½p₄ + p₅, and c = ½p₃ + ½p₄ + p₅. Show that M̆₂ is a p.p.d. measure with Fourier transform proportional to (a + c + 2b, a − c, a + c − 2b, a − c).
(ii) Renormalize the probabilities so that m = 1 (equivalent to looking at the Palm measure and its first moment) and the second-moment measure has
standardized form {1, β, γ, β}. Show that this is a p.p.d. measure if and only if β, γ are nonnegative and γ ≤ 1, 1 + γ ≥ 2β. However, this is the second-moment measure of a point process on NESW if and only if, in addition, 1 + β ≥ 2γ. [Hint: Write x = ½p₄ + p₅, y = ¼p₁ + ¼p₄, so that x < min(β, γ) and (x, y) lies on the line y = 3x − K, where K = 2β + 2γ − 1. Nonnegative solutions x, y exist if and only if ⅓K ≤ min(β, γ), which yields both the p.p.d. condition and the additional condition.]
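The Fourier-transform claim in Exercise 8.2.5(i) is easy to verify numerically: for any valid set of configuration probabilities, the 4-point DFT of the sequence {a, b, c, b} should reproduce (a + c + 2b, a − c, a + c − 2b, a − c) and be nonnegative. The sketch below uses arbitrary illustrative values of p₀, . . . , p₅ (chosen to sum to 1).

```python
import numpy as np

# Hypothetical orbit probabilities p0..p5 for configurations 0000,1000,1100,1010,1110,1111
p0, p1, p2, p3, p4, p5 = 0.1, 0.2, 0.2, 0.1, 0.2, 0.2

m = 0.25 * p1 + 0.5 * (p2 + p3) + 0.75 * p4 + p5   # mean density
a = m
b = 0.25 * p2 + 0.5 * p4 + p5                       # lag-1 (= lag-3) entry
c = 0.5 * p3 + 0.5 * p4 + p5                        # lag-2 entry

M2 = np.array([a, b, c, b])                         # reduced second-moment measure
ft = np.fft.fft(M2)                                 # 4-point DFT

expected = np.array([a + c + 2 * b, a - c, a + c - 2 * b, a - c])
print(np.round(ft.real, 6), expected)
```

Since the sequence is symmetric (its lag-1 and lag-3 entries coincide), the DFT is real, and its nonnegativity is exactly the p.p.d. condition of part (ii).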
8.2.6 Stationary renewal process. Let the lifetime d.f. F(·) of the process as in Example 8.2(b) be the convolution of two exponentially distributed random variables with means 1/µⱼ (j = 1, 2). Evaluate (8.2.5) explicitly.
8.2.7 Random translations. Let the point process N be second-order stationary with Bartlett spectrum Γ and mean density m. If the points of N are subjected to independent random translation with common d.f. F, show that the resultant point process N_T has Bartlett spectrum [see (8.2.8)]
$$\Gamma_T(d\omega) = |\tilde F(\omega)|^2\,\Gamma(d\omega) + m(2\pi)^{-d}\bigl(1 - |\tilde F(\omega)|^2\bigr)\,\ell(d\omega).$$
8.2.8 Iterated random translations. Let the independent translation of points of N as in Exercise 8.2.7 be iterated n times. Show that the Bartlett spectrum Γₙ of the resulting process satisfies
$$\Gamma_n(d\omega) = |\tilde F(\omega)|^2\,\Gamma_{n-1}(d\omega) + m(2\pi)^{-d}\bigl(1 - |\tilde F(\omega)|^2\bigr)\,\ell(d\omega) = |\tilde F(\omega)|^{2n}\,\Gamma(d\omega) + m(2\pi)^{-d}\bigl(1 - |\tilde F(\omega)|^{2n}\bigr)\,\ell(d\omega),$$
and hence give conditions for Γₙ(·) to converge weakly to m(2π)^{−d}ℓ(·). (See Chapter 11.)
8.2.9 Neyman–Scott process [continued from Example 6.3(a)]. (a) Show that the Bartlett spectrum for a Neyman–Scott process on R, with (Poisson) cluster centre process at rate µ_c, first two factorial moments m_[1] and m_[2] of the cluster size distribution, and common d.f. F for the distances of the points of a cluster from their centre, has density γ_NS(ω) given by
$$\gamma_{NS}(\omega) = \frac{\mu_c}{2\pi}\bigl[m_{[1]} + m_{[2]}\,|\tilde F(\omega)|^2\bigr], \qquad \text{where } \tilde F(\omega) = \int_{-\infty}^\infty e^{ix\omega}\,F(dx).$$
(b) In the particular case where F(x) = 1 − e^{−αx} (x ≥ 0), deduce that γ_NS(·) is the rational function
$$\gamma_{NS}(\omega) = \frac{\mu_c m_{[1]}}{2\pi}\left[1 + \frac{\alpha^2 m_{[2]}/m_{[1]}}{\alpha^2 + \omega^2}\right].$$
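The rational form in part (b) is just part (a) specialized to the exponential d.f., for which F̃(ω) = α/(α − iω) and hence |F̃(ω)|² = α²/(α² + ω²). A short numerical check (with arbitrary hypothetical parameter values) confirms the two expressions agree:

```python
import numpy as np

# Hypothetical parameter values for the Neyman-Scott process of Exercise 8.2.9
mu_c, m1, m2, alpha = 1.3, 2.0, 5.0, 0.7

omega = np.linspace(-10, 10, 401)
F = alpha / (alpha - 1j * omega)     # characteristic function of the Exp(alpha) displacement d.f.

# Part (a): generic spectral density with |F(omega)|^2
generic = (mu_c / (2 * np.pi)) * (m1 + m2 * np.abs(F) ** 2)
# Part (b): rational form
rational = (mu_c * m1 / (2 * np.pi)) * (1 + (alpha**2 * m2 / m1) / (alpha**2 + omega**2))

print(np.max(np.abs(generic - rational)))   # agreement to machine precision
```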
(c) When the Neyman–Scott process is as above on R^d, show that
$$\gamma_{NS}(\omega) = \frac{\mu_c m_{[1]}}{(2\pi)^d}\bigl[1 + (m_{[2]}/m_{[1]})\,|\tilde F(\omega)|^2\bigr]$$
with F̃(ω) = ∫_{R^d} e^{ix·ω} F(dx). Deduce that when d = 2 and F(·) is a bivariate normal d.f. with zero mean and the usual second-moment parameters σ₁², σ₂² and ρσ₁σ₂, the spectrum has density
$$\gamma_{NS}(\omega_1, \omega_2) = \frac{\mu_c m_{[1]}}{4\pi^2}\left[1 + \frac{m_{[2]}}{m_{[1]}}\exp\bigl(-\sigma_1^2\omega_1^2 - 2\rho\sigma_1\sigma_2\omega_1\omega_2 - \sigma_2^2\omega_2^2\bigr)\right].$$
(d) Show that if in (a) the cluster structure is modified to include the cluster centre, then
$$\gamma_{NS}(\omega) = \frac{\mu_c}{2\pi}\bigl[1 + m_{[1]}\bigl(1 + \tilde F(\omega) + \tilde F(-\omega)\bigr) + m_{[2]}\,|\tilde F(\omega)|^2\bigr].$$
(e) Show that if in (a) the cluster centre process is a general stationary point process with mean intensity µ_c and Bartlett spectrum Γ_c(·), then the Bartlett spectrum Γ_NS(·) of the cluster process is given by
$$\Gamma_{NS}(d\omega) = |m_{[1]}\tilde F(\omega)|^2\,\Gamma_c(d\omega) + \frac{\mu_c}{2\pi}\bigl[m_{[1]} + (m_{[2]} - m_{[1]}^2)\,|\tilde F(\omega)|^2\bigr]\,\ell(d\omega).$$
[Hint: Except for (d), the results can be derived first by compounding and then by using random translations as in Exercise 8.2.7; otherwise, see (8.2.8).]
8.2.10 Bartlett–Lewis model [continued from Example 6.3(b)]. (a) Use (6.3.23) to show that the Bartlett spectrum has density γ_BL(·) given by
$$\gamma_{BL}(\omega) = \frac{\mu_c}{2\pi}\left[\sum_{j=0}^\infty (j+1)q_j + \sum_{j=1}^\infty \sum_{k=j}^\infty (k+1-j)\,q_k\,\bigl(\tilde F^j(\omega) + \tilde F^j(-\omega)\bigr)\right].$$
Observe that γ_BL(ω) = γ_NS(ω) as in Exercise 8.2.9(d) in the cases q₁ = 1 and m_[1] = 1, m_[2] = 0, respectively.
(b) Show that when q_j = (1 − α)α^j (j = 0, 1, . . .) with 0 < α < 1, so that each cluster is a transient renewal process,
$$\gamma_{BL}(\omega) = \frac{\mu_c}{2\pi(1-\alpha)}\left[\frac{1}{1 - \alpha\tilde F(\omega)} + \frac{1}{1 - \alpha\tilde F(-\omega)} - 1\right],$$
while when q₀ = 0, q_j = (1 − α)α^{j−1} (j = 1, 2, . . .),
$$\gamma_{BL}(\omega) = \frac{\mu_c}{2\pi\alpha(1-\alpha)}\left[\frac{1}{1 - \alpha\tilde F(\omega)} + \frac{1}{1 - \alpha\tilde F(-\omega)} - 1 - (1-\alpha)^2\right].$$
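For the geometric cluster-size law q_j = (1 − α)α^j, the first closed form in part (b) can be checked directly against a truncation of the double series in part (a). The sketch below uses arbitrary hypothetical parameters, with F̃ taken as the characteristic function of an Exp(rate) interval d.f. purely for illustration:

```python
import numpy as np

mu_c, a, rate = 1.0, 0.5, 1.0       # hypothetical: geometric q_j = (1-a) a^j, Exp(rate) intervals
omega = 0.8
z = rate / (rate - 1j * omega)      # F~(omega) for the Exp(rate) d.f.

J = 600                             # truncation point; a^J is negligible
j = np.arange(J)
q = (1 - a) * a**j

# Direct truncation of the double series in part (a)
series = np.sum((j + 1) * q)
for jj in range(1, J):
    tail = np.sum((np.arange(jj, J) + 1 - jj) * q[jj:J])
    series = series + tail * (z**jj + np.conj(z) ** jj)
direct = (mu_c / (2 * np.pi)) * series.real

# Closed geometric form of part (b)
closed = (mu_c / (2 * np.pi * (1 - a))) * (1 / (1 - a * z) + 1 / (1 - a * np.conj(z)) - 1).real
print(direct, closed)
```

The inner geometric tail sums collapse analytically to α^j/(1 − α), which is exactly how the closed form arises.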
*
+
∞ ∞
µc γBL (ω) = jqj + (k − j)qk (Fj (ω) + Fj (−ω)) 2π
*
j=1
k=j+1
+
j−1 ∞
µc = jqj + qj (j − k)(Fk (ω) + Fk (−ω)) . 2π ∞
j=1
j=2
k=1
8.2.11 Let M₂ be a p.p.d. measure on B_R with density m₂. Show that if 0 < a ≤ m₂(x) ≤ b < ∞ (all x), then there exists a zero-mean Gaussian process X(t) such that m₂(x) = E[X²(t)X²(t + x)], and hence that M₂ is the reduced second-moment measure of the process ξ(A) = ∫_A X²(t) dt (A ∈ B_R). Deduce that any p.p.d. function c₂(·) can be a reduced covariance density; i.e. there is some a > 0 such that a + c₂(x) is the second-moment density of some second-order stationary random measure.
8.2.12 Let F be any totally bounded symmetric measure on R^d. Show that F can be a covariance measure. [Hint: Construct a Gauss–Poisson process and refer to Proposition 6.3.IV. See Milne and Westcott (1972) for further details.]
8.3. Multivariate and Marked Point Processes
This section provides a first introduction to the wide range of extensions of the previous theory, incorporating both time-domain and frequency-domain aspects. We look first at multivariate and marked point processes, with stationarity in time (i.e. translation invariance) still playing the central role. The results given thus far for second-order stationary random measures and point processes on R^d extend easily to multivariate processes on R^d, though for convenience we discuss mostly the case d = 1. The first-moment measure in Proposition 8.1.I(a) becomes a vector of first-moment measures
$$M_i(A) = E[\xi_i(A)] \qquad (i = 1, \ldots, K;\ A \in \mathcal B_{\mathbb R}),$$
one for each of the K components. Under stationarity, which means translation invariance of the joint probability structure, not just of each component separately, this reduces to a vector of mean densities {m_i, i = 1, . . . , K}. Similarly, the second-order moment and covariance measures in the univariate case are replaced by matrices M and C of auto- and cross-moment (or covariance) measures with elements, for i, j = 1, . . . , K and A, B ∈ B_R,
$$M_{ij}(A \times B) = E[\xi_i(A)\xi_j(B)], \qquad C_{ij}(A \times B) = M_{ij}(A \times B) - M_i(A)M_j(B).$$
Under stationarity, the diagonal components M_ii are invariant under simultaneous shifts in both coordinates and so possess reduced forms M̆_ii, which inherit the properties of the reduced moment measures listed in Proposition 8.1.II. More than this is true, however. Since every linear combination $\sum_{i=1}^k \alpha_i \xi_i(A_i)$ is again stationary, we find on taking expectations of the squares that the quadratic forms $\sum_{i=1}^k \sum_{j=1}^k \alpha_i\alpha_j M_{ij}(A_i \times A_j)$ are all stationary under diagonal shifts and therefore possess diagonal factorizations. From this there follows the existence of reduced forms, M̆_ij(·), C̆_ij(·), say, for the off-diagonal as well as the diagonal components of the matrices.
In the point process case, the off-diagonal components M̆_ij, C̆_ij (i ≠ j) will not have the atom at the origin characteristic of the diagonal components unless there is positive probability of pairs of points occurring simultaneously in both the i and j streams. In particular, if the ground process $N_g(\cdot) = \sum_{i=1}^K N_i(\cdot)$ is orderly, both the matrix of reduced factorial moment measures
$$\breve{\mathbf M}(A) = \bigl(\breve M_{[i,j]}(A)\bigr) = \bigl(\breve M_{ij}(A) - \delta_{ij}\,m_i\,\delta_0(A)\bigr)$$
and the corresponding matrix of reduced factorial covariance measures with elements
$$\breve C_{[i,j]}(A) = \breve M_{[i,j]}(A) - m_i m_j\,\ell(A)$$
will be free from atoms at the origin.
Whether or not such atoms exist, the matrix M̆ enjoys matrix versions of the properties listed in Proposition 8.1.II; we state them for clarity.
8.3.
Multivariate and Marked Point Processes
317
Proposition 8.3.I (Stationary multivariate random measure: Second-order moment properties).
(i) M̆(A) ≥ 0, with M̆_ii(A) > 0 if A ∋ 0 and either N_i has an atomic component or A is an open set;
(ii) M̆(A) = M̆ᵀ(−A);
(iii) M̆ is positive-definite: for all finite sequences {f_i} of bounded measurable complex functions of bounded support,
$$\sum_{i=1}^K \sum_{j=1}^K \int_{\mathbb R}\int_{\mathbb R} f_i(x)\,\overline{f_j(x+u)}\,dx\;\breve M_{ij}(du) \ge 0; \tag{8.3.1}$$
(iv) M̆ is translation-bounded: for given A, there exists a constant K_A such that $\|\breve{\mathbf M}(x + A)\| = \sum_{i,j=1}^K |\breve M_{ij}(x + A)| < K_A$;
(v) If also the process is ergodic as for equations (8.1.4–5), then as r(A) → ∞, $\breve{\mathbf M}(A)/\ell(A) \to \breve{\mathbf M}_\infty \equiv (m_i m_j)$, and for all bounded Borel sets B,
$$\frac{1}{\ell(A)}\int_A \xi_i(x + B)\,\xi_j(dx) \to \breve M_{ij}(B).$$
The properties follow readily from the same device of applying the univariate results to linear combinations of the components (see Exercise 8.3.1). Note that property (ii) implies that the diagonal measures are symmetric, while for the off-diagonal measures M̆_ij(A) = M̆_ji(−A), confirming the importance of order in specifying the cross-moments.
The spectral theory also extends easily to multivariate processes on R. For any linear combination of the components, the basic p.p.d. properties (i) and (iii) above are interchanged by the Fourier transform map, implying that the moment measures can be represented by a matrix of spectral measures, which again enjoys the properties listed above (see Exercise 8.3.2). For practical purposes, the multivariate extension of the Bartlett spectrum (Definition 8.2.II) is of greatest importance. This comprises the matrix Γ of auto- and cross-spectral measures Γ_ij(·) in which the diagonal elements Γ_ii(·) have the properties described in Section 8.2 and the matrix as a whole has the positive-definiteness property in (8.3.1). Indeed, (8.3.1) can be regarded as being derived from the filtered form
$$X(t) = \sum_{i=1}^k \int_{-\infty}^\infty f_i(t - u)\,\xi_i(du), \tag{8.3.2}$$
for which the spectral measure Γ_X has the form
$$\Gamma_X(d\omega) = \sum_{i=1}^k \sum_{j=1}^k \tilde f_i(\omega)\,\overline{\tilde f_j(\omega)}\,\Gamma_{ij}(d\omega). \tag{8.3.3}$$
In the generality considered here, the components ξ_i at (8.3.2) may be point processes or random measures. If the latter are absolutely continuous, the appropriate components of the matrix Γ then reduce to the usual spectra
and cross-spectra of the stationary processes formed by their densities. In this way, the theory embraces both point and continuous processes as well as mixed versions. If the continuous process has varying sign, as occurs with a Gaussian process, or is given in the wide sense only, then the appropriate framework is the matrix extension of the wide-sense theory summarized after Definition 8.4.VII. From the practical viewpoint, these remarks mean that the interaction of point process systems, or mixtures of point process and continuous systems, can be studied in the frequency domain very much as if they were all continuous systems. The essential difference is that each point process component leads to a δ-function component in the diagonal term C̆_ii(·), to which there is then a corresponding nonzero constant contribution in the spectral measure Γ_ii(·). Bearing this in mind, all the standard concepts of multivariate spectral theory, such as coherence and phase, or real and quadratic spectra, carry over with minor variations to this more general context and provide valuable tools for the descriptive analysis of multivariate point processes and mixed systems. Brillinger (1975a, b, 1978, 1981) outlines both differences and similarities; for an example studied in depth, see Brillinger (1992). The next two examples illustrate simple special cases of these ideas.
Example 8.3(a) A bivariate Poisson process [continued from Example 6.3(e)]. The stationary bivariate point process described earlier is determined by three parameters: rates µ₁ and µ₂ for the occurrence of single points in processes 1 and 2, respectively, and a boundedly finite measure Q₃(du) = µ₃G(du) on R, in which µ₃ is the rate of occurrence of pairs of points, one in each process, and G(du) is a probability distribution for the signed distance u from the process 1 point to the other point. It is convenient for the rest of the example to have G(du) = g(u) du for some probability density function g(·) on R.
Since the two component processes are both Poisson, the only nonzero second-order factorial cumulant measure is the cross-covariance term, with
$$\breve C_{[12]}(A) = \mu_3 \int_A g(u)\,du = \breve C_{[21]}(-A).$$
The matrices m̆(u), c̆(u) of densities for the matrices M̆, C̆ of reduced second-moment measures are given respectively by
$$\breve m(u) = \begin{pmatrix} \mu_1 + \mu_3 & 0 \\ 0 & \mu_2 + \mu_3 \end{pmatrix}\delta_0(u) + \begin{pmatrix} (\mu_1+\mu_3)^2 & (\mu_1+\mu_3)(\mu_2+\mu_3) + \mu_3 g(u) \\ (\mu_1+\mu_3)(\mu_2+\mu_3) + \mu_3 g(-u) & (\mu_2+\mu_3)^2 \end{pmatrix}$$
and
$$\breve c(u) = \begin{pmatrix} \mu_1 + \mu_3 & 0 \\ 0 & \mu_2 + \mu_3 \end{pmatrix}\delta_0(u) + \begin{pmatrix} 0 & \mu_3 g(u) \\ \mu_3 g(-u) & 0 \end{pmatrix}.$$
The corresponding Bartlett spectra are all absolutely continuous, the densities γ_ij(ω) of the matrix Γ being given by
$$\frac{1}{2\pi}\begin{pmatrix} \mu_1 + \mu_3 & \mu_3\tilde G(\omega) \\ \mu_3\tilde G(-\omega) & \mu_2 + \mu_3 \end{pmatrix}, \tag{8.3.4}$$
where G̃(ω) = ∫_R e^{−iuω} g(u) du. The coherence of the two processes, at frequency ω, is the ratio
$$\rho_{12}(\omega) = \frac{\mu_3\,|\tilde G(\omega)|}{\sqrt{(\mu_1+\mu_3)(\mu_2+\mu_3)}},$$
while their phase at the same frequency is
$$\theta_{12}(\omega) = \arctan\left[\frac{\operatorname{Im}\tilde G(\omega)}{\operatorname{Re}\tilde G(\omega)}\right].$$
Example 8.3(b) System identification: a special case. In the previous example, the spectral densities completely determine the parameters of the process. This leads to the more general problem of determining the characteristics of a point process system, meaning some mechanism for producing a point process output from a point process input. Deletions (or thinnings), delays (or translations), and triggering of clusters can all be regarded as examples of point process systems. The problem of system identification then consists of determining the mechanism, or at least its main features, from measurements on its input and output.
The two components of the previous example can be regarded as the input and output of a system specified as follows: a proportion π₁ = µ₁/(µ₁ + µ₃) of the input points is randomly deleted, while each point in the remaining proportion π₂ = 1 − π₁ is transmitted after an independent delay with d.f. G [such a specification requires G(·) to be concentrated on a half-line], and this transmitted output is contaminated with 'noise' consisting of the points of a Poisson process at rate µ₂. It is evident from the spectral representation in (8.3.4) that the three system parameters π₁, G and µ₂ can be identified by measuring the response of the system to a Poisson input process and finding the joint first- and second-order properties of the input and output. It is equally evident that this identification is impossible on the basis of separate observations of the input and output.
Suppose now that the Poisson input process is replaced by any simple stationary input process with mean density m and spectral density γ(·) in place of (µ₁ + µ₃)/(2π). Then, in place of the matrix with components at (8.3.4), we would have the matrix
$$\begin{pmatrix} \gamma(\omega) & \pi_2\,\gamma(\omega)\tilde G(\omega) \\ \pi_2\,\gamma(\omega)\tilde G(-\omega) & \dfrac{m\pi_2 + \mu_2}{2\pi} + \pi_2^2\,|\tilde G(\omega)|^2\left(\gamma(\omega) - \dfrac{m}{2\pi}\right) \end{pmatrix}. \tag{8.3.5}$$
Once more it is evident that in principle the parameters π₁, G and µ₂ can be identified from this matrix of spectral densities.
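The consistency of the two representations can be checked numerically. The sketch below (hypothetical parameter values throughout) encodes one reading of the general-input matrix, with cross-spectrum π₂γ(ω)G̃(ω) and output spectrum (mπ₂ + µ₂)/(2π) + π₂²|G̃(ω)|²(γ(ω) − m/(2π)), and confirms that for a Poisson input, where γ(ω) = m/(2π), these entries collapse to those of (8.3.4):

```python
import numpy as np

# Hypothetical parameters for the system of Example 8.3(b); all values are illustrative.
mu1, mu2, mu3, alpha = 2.0, 0.5, 1.5, 1.2
m = mu1 + mu3                    # mean density of the Poisson input
pi2 = mu3 / m                    # transmitted proportion (pi1 = 1 - pi2 deleted)

omega = np.linspace(-5.0, 5.0, 201)
G = alpha / (alpha - 1j * omega)                  # ch.f. of an Exp(alpha) delay (half-line d.f.)
gamma_in = np.full_like(omega, m / (2 * np.pi))   # Poisson input spectral density

# General-input entries (one reading of (8.3.5)):
cross = pi2 * gamma_in * G
gamma_out = (m * pi2 + mu2) / (2 * np.pi) + pi2**2 * np.abs(G) ** 2 * (gamma_in - m / (2 * np.pi))

# For a Poisson input these must reduce to the entries of (8.3.4):
err_cross = np.max(np.abs(cross - mu3 * G / (2 * np.pi)))
err_out = np.max(np.abs(gamma_out - (mu2 + mu3) / (2 * np.pi)))
print(err_cross, err_out)   # both ~ 0 (machine precision)
```

The reduction uses only µ₃ = π₂m, i.e. the rate of transmitted pairs is the transmitted fraction of the input rate.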
Many applications of multivariate point process models arise as extensions of contingency table models when more precise data become available concerning the occurrence times of the registered events. Typical examples arise in the analysis of medical or epidemiological data collected by different local authorities. If the only data available represent counts of occurrences for each region and within crude (e.g. yearly) time intervals, then methods of categorical data analysis may help to uncover and interpret spatial and temporal dependences. If, however, the data are extended to record the times of each individual occurrence, then marked point process methods may be more appropriate. Several recent books, such as Cressie (1991), Ripley (1988) and Guttorp (1995), provide useful introductions to and examples of such studies. The interpretation of the marks, however, is by no means restricted to such spatial examples. Examples abound in neurophysiology, geology, physics, astronomy, and so on in which interest centres on the evolution and interdependencies of sequences of events involving different types of events. The first stages in the point process analysis of such data are likely to involve descriptive studies, which have the aim of mapping basic characteristics and dependences. Here, while they may be followed later by model-fitting and testing exercises, nonparametric estimates of the first- and second-order characteristics are of particular importance. Such estimates closely follow the univariate forms described earlier [see in particular (8.1.4–5) and (8.1.16–17)]. They take their cue from (8.1.5) in Proposition 8.1.II. Since we are considering MPPs with time as the underlying dimension, estimates such as (8.1.16) for the reduced moment measures here take the form
$$\breve M_{jk}\bigl((0, \tau]\bigr) = \frac{1}{T} \sum_{i:\,0 \le t_{ik} \le T} N_j\bigl(t_{ik}, t_{ik} + \tau\bigr]. \tag{8.3.6}$$
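As a small illustration of (8.3.6), the sketch below (toy event times invented for the purpose, no edge correction) averages over the type-k events in (0, T] the number of type-j events falling within τ of each:

```python
import numpy as np

def cross_moment_estimate(t_k, t_j, tau, T):
    """Estimate M_jk((0, tau]) as in (8.3.6): sum over type-k events in (0, T]
    of the number of type-j events in (t, t + tau], divided by T (no edge correction)."""
    t_k = np.asarray(t_k)
    t_j = np.asarray(t_j)
    total = 0
    for t in t_k[(t_k > 0) & (t_k <= T)]:
        total += np.sum((t_j > t) & (t_j <= t + tau))
    return total / T

# Toy event times (hypothetical, for illustration only)
type_k = [0.5, 1.2, 3.0, 4.4]
type_j = [0.7, 1.0, 1.5, 3.2, 4.9]
print(cross_moment_estimate(type_k, type_j, 1.0, 5.0))   # -> 1.2 (6 pairs counted, T = 5)
```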
In the cross terms, the sum is extended over events of type k while the counts are for events of type j. Edge corrections of the type (8.1.26) can be incorporated, or more simply one could apply the plus sampling modification, which in the one-dimensional context would amount to including within the sum the full contributions N (tik , tik + τ ] initiated by events of type k with tik < T < tik + τ . Models for such processes typically involve extensions and modifications of the basic univariate models. In particular, it is very easy to develop extensions of the standard cluster models in which the cluster members may be events of different types (see Exercise 8.3.3). More complex versions allow events of any one type to produce ‘offspring’ of other types. Perhaps the most important such example is the multivariate extension of the Hawkes process considered below. Example 8.3(c) Mutually exciting point processes. Hawkes (1971b, 1972) generalized the model described in Examples 6.4(c) and 7.2(b) to both the multivariate and marked point process cases. We give here the multivariate model but via a cluster process representation, where the branching process
now consists of points of K different types, and for each i, j = 1, . . . , K there is a Poisson process of offspring of type j generated by an ancestor of type i at time t, governed by the parameter measure µ_ij(· | t), all these processes being independent and each new offspring generating its own Poisson process. Assume homogeneity of such offspring processes by setting µ_ij(s | t) = µ_ij(s − t) as earlier in Example 6.4(c) and, to ensure that any given individual has a.s. only finitely many descendants, that the eigenvalue of largest modulus of the matrix (µ_ij(R)), which by Perron–Frobenius theory is necessarily positive, is smaller than 1. Finally, suppose that type i points enter the system from outside as ancestors in a Poisson process at rate λ_i (i = 1, . . . , K).
For notational simplicity, we confine attention to the case where the µ_kl(·) have densities (i.e. µ_kl(dv) = µ_kl(v) dv, say). Then, results from branching processes in Section 5.5 (see e.g. Exercise 5.5.7) show for the cluster member processes first that the first-moment measures M_ki(·) have densities m_ki(·) for which
$$m_{ki}(x) = \delta_{ik}\,\delta_0(x) + \sum_{l=1}^K \int_{\mathbb R} \mu_{kl}(v)\,m_{li}(x - v)\,dv, \tag{8.3.7}$$
and for the second-order measures we have the densities
$$m_{k,ij}(x, y) = m_{ki}(x)\,m_{kj}(y) + \sum_{l=1}^K \int_{\mathbb R} \mu_{kl}(v)\,m_{l,ij}(x - v, y - v)\,dv. \tag{8.3.8}$$
The first- and second-moment densities, which incorporate an appropriate δ-function, can be interpreted as
$$m_{ki}(x)\,dx = \Pr\{\text{ancestor of type } k \text{ born at } 0 \text{ has type } i \text{ descendant born in } (x, x + dx)\},$$
$$m_{k,ij}(x, y)\,dx\,dy = \Pr\{\text{ancestor of type } k \text{ born at } 0 \text{ has type } i \text{ and } j \text{ descendants born in } (x, x + dx) \text{ and } (y, y + dy), \text{ respectively}\}.$$
Thus, the mean density of type i points, assuming stationarity, is given by
$$m_i \equiv \sum_{k=1}^K \lambda_k \int_{\mathbb R} m_{ki}(x)\,dx. \tag{8.3.9}$$
The integral in (8.3.9) can be found by solving (8.3.7) after integration, but for later use it is better now to introduce the Fourier transforms
$$\hat m_{ij}(\omega) = \int_{\mathbb R} e^{ix\omega}\,m_{ij}(x)\,dx, \qquad \hat\mu_{ij}(\omega) = \int_{\mathbb R} e^{ix\omega}\,\mu_{ij}(x)\,dx,$$
so that the matrices $\hat{\mathbf m}(\omega) \equiv (\hat m_{ij}(\omega))$ and $\hat{\boldsymbol\mu}(\omega) \equiv (\hat\mu_{ij}(\omega))$ are related by
$$\hat{\mathbf m}(\omega) = \bigl[I - \hat{\boldsymbol\mu}(\omega)\bigr]^{-1}, \tag{8.3.10}$$
and the column vector $(m_1, \ldots, m_K)^T = \hat{\mathbf m}^T(0)\,(\lambda_1, \ldots, \lambda_K)^T$. The inverse at
(8.3.10) is well defined because the largest eigenvalue of the matrix $(\mu_{ij}(\mathbb R)) = \hat{\boldsymbol\mu}(0)$ is by assumption less than 1.
Similar lengthier analysis starting from (8.3.8), and using the multitype extension of the relation in (6.3.14) for the reduced covariance density in terms of the second-order cluster member densities, leads to
$$\breve c_{ij}(u) = \sum_{k=1}^K \lambda_k \int_{\mathbb R} m_{k,ij}(x, x + u)\,dx,$$
in which the m_{k,ij}(·) are multitype analogues of ρ_[2](·) in (6.3.14). This leads ultimately to the matrix of spectral densities as
$$\bigl(\gamma_{ij}(\omega)\bigr) = \frac{1}{2\pi}\int_{\mathbb R} e^{iu\omega}\,\bigl(\breve c_{ij}(u)\bigr)\,du = \frac{1}{2\pi}\,\hat{\mathbf m}^T(-\omega)\,\operatorname{diag}(m_1, \ldots, m_K)\,\hat{\mathbf m}(\omega) = \frac{1}{2\pi}\bigl[I - \hat{\boldsymbol\mu}^T(-\omega)\bigr]^{-1}\operatorname{diag}(m_1, \ldots, m_K)\,\bigl[I - \hat{\boldsymbol\mu}(\omega)\bigr]^{-1}, \tag{8.3.11}$$
which generalizes (8.2.10). Hawkes (1971b) derived (8.3.11) using a Wiener–Hopf argument and the linear intensity structure
$$\lambda_i^*(t) = \lambda_i + \sum_k \int_{-\infty}^t \mu_{ki}(t - s)\,dN_k(s).$$
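The spectral matrix (8.3.11) is straightforward to evaluate for a concrete kernel. The sketch below (a hypothetical two-type model with exponential kernels µ_ij(v) = A_ij β e^{−βv} for v > 0, so that µ̂_ij(ω) = A_ij β/(β − iω)) computes the stationary rates via (8.3.10) and checks that the resulting matrix is Hermitian with nonnegative eigenvalues, as a spectral density matrix must be:

```python
import numpy as np

# Hypothetical 2-type mutually exciting (Hawkes) process with exponential kernels:
# mu_ij(v) = A[i, j] * beta * exp(-beta v), v > 0, giving mu_hat_ij(w) = A[i, j] * beta/(beta - i w).
A = np.array([[0.3, 0.2],
              [0.1, 0.4]])          # branching matrix (mu_ij(R)), spectral radius < 1
beta = 1.5
lam = np.array([0.5, 1.0])          # immigration rates lambda_i

def mu_hat(w):
    return A * (beta / (beta - 1j * w))

I = np.eye(2)
m = np.linalg.inv(I - mu_hat(0.0)).T @ lam      # stationary rates, from (8.3.10)
D = np.diag(m.real)

def gamma(w):
    """Spectral density matrix (8.3.11)."""
    B = np.linalg.inv(I - mu_hat(-w).T)
    C = np.linalg.inv(I - mu_hat(w))
    return (B @ D @ C) / (2 * np.pi)

g = gamma(0.7)
print(m.real)                                    # stationary rates
print(np.max(np.abs(g - g.conj().T)))            # Hermitian: ~ 0 (machine precision)
print(np.linalg.eigvalsh(g))                     # real, nonnegative eigenvalues
```

For these parameter values the stationary rates work out to (1.0, 2.0); the Hermitian factorization γ = B D Bᴴ/(2π) is what guarantees nonnegative-definiteness.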
A range of further models can be obtained by varying the character of the cluster centre process while keeping the mutually exciting form for the cluster members (see Exercise 8.3.4).
We now turn to the second-order properties of MPPs with general mark space. We consider point processes taking their values in X = R × K for some c.s.m.s. K, so that the process consists of pairs (t_i, κ_i), where t_i ∈ R and κ_i ∈ K. We assume stationarity along the time axis R and suppose that the first- and second-moment measures exist as boundedly finite measures in X and X^{(2)}. The main emphasis is on time-domain properties—that is, on the moment and covariance measures themselves—rather than on their Fourier transforms. Much of this theory can be extended immediately to homogeneous point processes in R^d, but mostly we leave such extensions to follow the more systematic analysis of homogeneous processes in Chapter 12. Although we have met already several examples of processes of this type, particularly in Chapter 6, it may still be helpful to start by listing formally the basic properties of their first- and second-order moment measures.
Proposition 8.3.II (Moment Structure of Stationary MPP). Let N(·) on R × K be a simple stationary marked point process for which the first- and second-moment measures exist. Then, defining u = t₂ − t₁, the first- and second-moment measures have respective factorizations
$$M_1(dt \times d\kappa) = F(d\kappa)\,dt, \tag{8.3.12}$$
$$M_2(dt_1 \times dt_2 \times d\kappa_1 \times d\kappa_2) = \breve M_2(du \times d\kappa_1 \times d\kappa_2)\,dt_1, \tag{8.3.13}$$
corresponding, respectively, to the following integral relations, valid for bounded measurable h with bounded support:
$$\int_{\mathbb R \times \mathcal K} h(t, \kappa)\,M_1(dt \times d\kappa) = \int_{\mathbb R} dt \int_{\mathcal K} h(t, \kappa)\,F(d\kappa), \tag{8.3.14}$$
$$\int_{(\mathbb R \times \mathcal K)^{(2)}} h(t_1, t_2, \kappa_1, \kappa_2)\,M_2(dt_1 \times dt_2 \times d\kappa_1 \times d\kappa_2) = \int_{\mathbb R} dt \int_{\mathbb R \times \mathcal K \times \mathcal K} h(t, t + u, \kappa_1, \kappa_2)\,\breve M_2(du \times d\kappa_1 \times d\kappa_2). \tag{8.3.15}$$
Proof. Both statements are straightforward applications of the factorization Lemma A2.7.II, the second after taking coordinates in the space X^{(2)} so that (t₁, t₂, κ₁, κ₂) → (t₁, t₁ + u, κ₁, κ₂) (see Exercise 8.3.5).
If the ground process has a finite mean density m_g = E[N((0, 1] × K)], then the measure F is totally finite with F(K) = m_g, and we can thus introduce a probability measure Π on (K, B(K)) by setting
$$\Pi(A) = F(A)/F(\mathcal K) \qquad (A \in \mathcal B(\mathcal K)). \tag{8.3.16}$$
Π(A) can then be interpreted as the stationary distribution of marks. The assumption m_g < ∞ is not implied directly by the assumption that the first-moment measure exists (i.e. defines a boundedly finite measure in R × K), though to our knowledge all extant counterexamples are nonergodic in character (see Exercise 8.3.6).
The distribution Π has two further important interpretations. First, it is an ergodic probability in the sense (see Chapter 12) that, if the process is ergodic and T → ∞,
$$\frac{\#\{(t_i, \kappa_i)\colon 0 < t_i < T,\ \kappa_i \in A\}}{T} = \frac{N((0, T] \times A)}{T} \to m_g\,\Pi(A) \qquad \text{a.s.}$$
Second, it can be interpreted as the distribution of the mark associated with an arbitrary (loosely, randomly selected) time point (event) t_i of the process. Equivalently, it is the distribution of the mark associated with an event at the origin, given that an event of some kind occurs at the origin. This is the interpretation as a Palm probability, as intimated in Chapter 6 and developed in greater detail in Chapter 13.
The reduced second-moment measure M̆₂(du × dκ₁ × dκ₂) also has a range of important interpretations. For u ≠ 0, it represents the rate of occurrence of pairs of points u time units apart, the first having its mark in (κ₁, κ₁ + dκ₁) and the second, at the (signed) distance u from the first, having its mark in (κ₂, κ₂ + dκ₂). Note that the order of marks can be distinguished; when u ≠ 0 and the density m̆₂(u) exists, we have
$$\breve m_2(u, \kappa_1, \kappa_2) = \breve m_2(-u, \kappa_2, \kappa_1) \ne \breve m_2(u, \kappa_2, \kappa_1) \quad \text{in general.}$$
Again, there is an interpretation as an ergodic limit: for T → ∞,
$$\frac{\#\{\text{pairs } (t_i, \kappa_i), (t_j, \kappa_j)\colon 0 < t_i < T,\ 0 < t_j - t_i < u,\ \kappa_i \in A,\ \kappa_j \in B\}}{T} \to \breve M_2\bigl((0, u] \times A \times B\bigr) \quad \text{a.s.}$$
Several different interpretations as a Palm measure are possible, depending on whether one conditions on a point at the origin, without any condition on the mark; on a point at the origin with specified mark; or on two points at a given separation u apart, with the first at the origin. In particular,
$$\breve M_2(B \mid u, \kappa_1) = \frac{\breve M_2(du \times d\kappa_1 \times B)}{du\,F(d\kappa_1)} = \frac{\int_B \breve m_2(u, \kappa_1, \kappa_2)\,d\kappa_2}{m_g\,f(\kappa_1)} \quad \text{if the densities exist,} \tag{8.3.17}$$
representing the rate of occurrence of points with marks in B conditional on the occurrence of a point with mark κ₁ at a time origin u time units previously. It has the character of a cross-intensity. Further variants are set out in Lemma 8.3.III.
The results so far have been stated in terms of the ordinary rather than the factorial moment measures. When the ground process is simple (as we are assuming throughout this chapter), the only differences arise when u = 0, in which case the reduced form of the ordinary second-moment measure includes a double δ-function term δ(u) δ[ρ(κ₁, κ₂)] (here, ρ(·) represents the distance function in the mark space), a term that is missing from the corresponding factorial moment density. Even if u = 0, the complete moment density m̆₂(0, κ₁, κ₂) can still exist (and is then zero) if κ₁ ≠ κ₂. For u ≠ 0, the densities m̆₂(u, κ₁, κ₂) and the corresponding covariance densities c̆₂(u, κ₁, κ₂) (or normalized versions of them) are usually the main objects of investigation in a second-order analysis of a stationary marked or multivariate point process.
Example 8.3(d) Stationary process with independent marks (see Proposition 6.4.IV). Let the simple point process N on R have mean density m, and suppose that marks are allocated independently according to the probability distribution F(·). Then, F(·) coincides with the stationary mark distribution Π(·) at (8.3.16) and with the mark kernel F(· | t) introduced in Proposition 6.4.IV (and here independent of t, from stationarity). For u ≠ 0, the reduced moment measure M̆₂ takes the form
$$\breve M_2(du \times d\kappa_1 \times d\kappa_2) = \breve M^g_{[2]}(du) \times F(d\kappa_1) \times F(d\kappa_2),$$
and for the covariance measure,
$$\breve C_2(du \times d\kappa_1 \times d\kappa_2) = \breve C^g_2(du) \times F(d\kappa_1) \times F(d\kappa_2),$$
where M̆₂^g and C̆₂^g are the reduced moment and cumulant measures of the initial process N, which here acts as the ground process N_g. Such a simple model may be useful as a null hypothesis in testing for more complex interactions, as, for example, in the discussion of earthquake magnitudes in Vere-Jones (1970).
Another focus of practical interest is the bivariate distribution of the marks from two points at a given separation from each other. One is typically interested in how the properties of this distribution vary as a function of the distance between the two points. The existence of such distributions, while not a direct corollary of Proposition 8.3.II, does follow from it via a further application of the disintegration theory outlined in Appendix A1.5. We state the result for MPPs with state space X = R; note that the extensions to stationary (homogeneous) processes on X = R^d are immediate (see also Exercise 8.3.7).
Lemma 8.3.III. Let N(·) satisfy the conditions of Proposition 8.3.II, and suppose in addition that for its ground process the second-moment measure exists and has reduced form M̆₂^g(·). Then, there exists a bivariate mark kernel Π₂(K₁ × K₂ | u), where K₁, K₂ ∈ B(K), such that
(i) for M̆₂^g-almost-all u, Π₂(· | u) is a probability distribution on K^{(2)};
(ii) Π₂(K₁ × K₂ | u) is a Borel measurable function of u for fixed K₁, K₂;
(iii) M̆₂ has the factorization
$$\breve M_2(du \times d\kappa_1 \times d\kappa_2) = \breve M_2^g(du)\,\Pi_2(d\kappa_1 \times d\kappa_2 \mid u),$$
or in integral form, for bounded Borel functions h on X × K^{(2)} with bounded support in X,
$$\int_{\mathbb R \times \mathcal K^{(2)}} h(u, \kappa_1, \kappa_2)\,\breve M_{[2]}(du \times d\kappa_1 \times d\kappa_2) = \int_{\mathbb R} \breve M_2^g(du) \int_{\mathcal K^{(2)}} h(u, \kappa_1, \kappa_2)\,\Pi_2(d\kappa_1 \times d\kappa_2 \mid u).$$
Proof. The proof is a straightforward application of the disintegration theorems A1.5.II and A1.5.III, starting from the observation that for fixed K1 and ˘ [2] (du × K1 × K2 ) is absolutely continuous with respect to K2 , the measure M ˘ g (du) of the ground process. the moment measure M 2 A point to note here is that the univariate mark distributions arising as the marginals in the bivariate distribution above are not in general equal to the stationary mark distribution: the former stem from an analysis of secondorder moments, while the latter comes from first-order moments. Nor is it necessarily the case that the bivariate distributions are symmetric. These points are illustrated in Exercise 8.3.8 and Example 8.3(e) below. Assuming that the conditions of the lemma hold, various characteristics of the bivariate mark kernel Π2 (· | u) can be studied as functions of u. The
326
8. Second-Order Properties of Stationary Point Processes
most important are the covariance and the correlation, which we may denote by covK (u) and corrK (u), respectively. Exactly parallel concepts can be introduced for spatial processes, with the simplification, when the process is isotropic as well as homogeneous, that the functions depend only on the distance |u|. Example 8.3(e) Marked cluster process with cluster-dependent marks. We consider cluster processes in which both the cluster centre process and the cluster member processes carry marks and such that the mark, K say, for a given cluster centre controls both the spatial and the mark distributions of the cluster members. In the example that follows, we suppose for simplicity that all marks are nonnegative integers. Take a Neyman–Scott type MPP in which the cluster centre process has realizations (xi , Ki ), say, where {xi } are the points of a Poisson process at rate λc and the marks Ki are i.i.d. with Pr{Ki ≥ k} = sk (all i). For a given cluster centre with mark K, say, let the number of cluster members Nm , say, have a negative binomial distribution with parameters (α, K/(1 + K)) so that the conditional mean and variance of the cluster size are αK and αK(K + 1), respectively. Suppose also that the associated marks for the cluster members, given the parent mark K, are i.i.d. with discrete uniform distribution on the integers 1, . . . , K. Thus, the larger the parent mark K, the larger both the number of offspring and their marks. Assume that offspring points are distributed at i.i.d. distances from the parent with common distribution F with density f . The MPP we consider is the collection of all offspring points and associated marks. Consider first the process of points having a given mark k ≥ 1. Only clusters with parent mark K ≥ k can contribute to this process. Given Nm , the number of cluster members having mark k from such a cluster is found by binomial sampling, with probability of success 1/K, from the Nm cluster members. 
The resulting number of cluster members with mark k again has a negative binomial distribution with parameters (α, 12 ), independent of k, provided K ≥ k, and with mean α. Overall, the mean density of points with mark k is therefore λc αsk . For every positive k, the process of points with mark k is well defined. Moreover, the process as a whole is a well-defined point process on R × Z+ . On the other hand, in order to be an MPP as defined in Section 6.4, the ground process (meaning the set of all offspring points) must be well defined (i.e. only finitely many points a.s. in bounded sets). Since the cluster centre process is Poisson, and clusters are i.i.d., a sufficient condition for the cluster process to be well defined is that the mean number of events per cluster is finite [see Exercise 6.3.5(a)]. Here the meannumber ∞ of points per cluster for the ground process is given by E(K) = k=1 sk , which is finite if and only if K has finite first moment. When this condition is satisfied, the stationary distribution of marks overall has the length-biased form πk = sk /E(K). Consider next the process of pairs of points, with marks k1 , k2 , separated by distance u > 0. The second-order moment density has the form
8.3.
Multivariate and Marked Point Processes
327
m ˘ 2 (u; k1 , k2 ) = λ2 α2 sk1 sk2 ∞ Hk1 (K)Hk2 (K)Nm (Nm − 1) + λE f˘(x)f˘(x + u) dx, (8.3.18) K2 0 where Hk (j) = 1 if j ≥ k, 0 otherwise, and the integral follows the notation of equation (6.3.19). The first term here represents the product of the means, while the second is the contribution to the second moment from pairs belonging to the same cluster. Note that Hk1 (K) Hk2 (K) = Hmax(k1 ,k2 ) (K); taking expectations with respect to the parent cluster mark in the second term yields m ˘ 2 (u; k1 , k2 ) = λ2 α2 sk1 sk2 + λα(α + 1)smax(k1 ,k2 ) φ(u),
(8.3.19)
where φ(u) denotes the integral in (8.3.18). This quantity exists for the marked process without any further restrictions, but the second-moment mea s sure does not exist for the ground process unless the sum k1 k2 max(k1 ,k2 ) = k (2k + 1)sk converges, equivalent to the existence of a second moment for the parent mark distribution. When this condition is satisfied, the bivariate mark kernel at separation u, Π2 (k1 , k2 | u), can be found by renormalizing [i.e. by dividing (8.3.19) by the double sum just described]. Even if we sum out one variable, the marginal distribution of the other does not reduce to the stationary mark distribution because of the intervention of the second term. Expressions for the mark covariance and mark correlation at separation u can be found from the bivariate mark kernel: details are left to the reader. The assumption of i.i.d. marks within a cluster implies that there is no dependence on the separation u except through the term φ(u). This implies in particular that the bivariate mark kernel is symmetric in u. It would, however, be quite natural in some modelling situations to incorporate an explicit dependence of the mark distribution on the distance from the cluster centre, in which case a further dependence on u would arise, causing the bivariate distribution to be asymmetric in general. MPPs can give rise to a diverse range of second-order characteristics (see e.g. Stoyan, 1984; Isham, 1985): the ‘simple’ case of a finite mark space in Proposition 8.3.I bears this out. Schlather (2001) gives a valuable survey. From a theoretical viewpoint, some of the most interesting applications of stationary MPPs are to situations where the marks are not merely statistically dependent on the past evolution of the process but are direct functions of it. As an extreme case, the mark at time t can be taken as the whole past history of the point process up to time t. This idea lies behind one approach to the Palm theory of Chapter 13. 
The following elementary example gives some insight into this application. Example 8.3(f) Forward recurrence times. Assume there is given a simple stationary point process on R, and associate with any point ti of the process the length Li = inf{u: N (ti − u, ti ) ≥ 1} of the previous interval. Then, the MPP consisting of the pairs (ti , Li ) is stationary. Assuming that N has a
328
8. Second-Order Properties of Stationary Point Processes
finite mean density m, it follows from Proposition 8.3.II and (8.3.16) that a stationary probability distribution ΠL (·) exists for the interoccurrence times. The integral relation (8.3.14) then leads to important relations involving ΠL (·) as for example in the following deduction of the distribution of the stationary forward recurrence time random variable. The distance of the point nearest to the right of the origin, t1 say, has this distribution, with t1 = inf{ti : ti > 0}. If i is the index of this point, then 0 < t1 = ti ≤ Li . Take any bounded measurable function g(·) of bounded support and define h(t, κ) = g(t) if 0 ≤ τ ≤ κ, h(t, κ) = 0 otherwise. The left-hand side of (8.3.14) equals R×R+
h(t, κ) M1 (dt × dκ) = E * =E
R×R+
h(t, κ) N (dt × dκ) +
h(ti , κi ) = E[g(t1 ]
i:t)i>0
since h(t, κ) = 0 for t > t1 ; evaluating the right-hand side as below gives E[g(t1 )]
=m
∞
g(u) du 0
∞
ΠL (dκ) = m u
∞
[1 − FL (u)]g(u) du,
0
t where FL (t) = 0 ΠL (du) is the distribution function for the interval length. Since g is an arbitrary measurable function of bounded support, we can for example choose g(t) = I(0,x] (t) and obtain Pr{t1 ≤ x} on the left-hand side, x equal to m 0 [1 − FL (u)] du from the right-hand side; thus, the distribution for the point t1 immediately following the origin (i.e. the distribution for the forward recurrence time) has the density f1 (x) = m[1 − FL (x)] = [1 − FL (x)]/µL , where µL is the mean interval length [see (4.2.3) and Proposition 4.2.I]. This simple derivation of a Palm–Khinchin relation uses an argument similar to the original work of Palm (1943). Example 8.3(g) Vehicles on a road. We consider a spatially stationary distribution of cars along a long straight road, the car at xi having a (constant) velocity vi , with vi = vj in general. Our aim is to determine the evolution in time, if any, of characteristics of the process. The family of transformations that concerns us is given by (xi , vi ) → (xi + tvi , vi )
(real t).
Denote by mt , Πt (·), and ct (u, v1 , v2 ) the mean density, the stationary (in space) velocity distribution, and the spatial covariance density at time t. We can refer moments at time t to moments at time 0 on account of the following
8.3.
Multivariate and Marked Point Processes
329
reasoning. From (8.3.14), we have for the space–velocity mean density at time t, Mt (dx × dv) say, h(x, v) Mt (dx × dv) = h(x + tv, v)m0 dx Π0 (dv) R×R+ R×R+ h(y, v)m0 dy Π0 (dv), = R×R+
so that the mean vehicle density and velocity distribution remain constant in time whatever their initial forms. Applying a similar argument to the second-order integrals implies that if the covariance densities ct (u, v1 , v2 ) exist for t = 0, they exist for all t > 0 and are given by ct (u, v1 , v2 ) = c0 u + t(v2 − v1 ), v1 , v2 . The asymptotic covariance properties of ct (·) at t → ∞ thus depend on the behaviour of c0 (u, v1 , v2 ) for large u. In most practical cases, a mixing condition holds and implies that for all v1 , v2 , c0 (u, v1 , v2 ) → 0 as |u| → ∞. Under these conditions, any correlation structure tends to die out, this being an illustration of the ‘Poisson tendency’ of vehicular traffic (Thedeen, 1964). This example can also be treated as a line process and extended in various ways (see e.g. Bartlett, 1967; Solomon and Wang, 1972).
Exercises and Complements to Section 8.3 8.3.1 Detail the argument that establishes Proposition 8.3.I by applying Proposition 8.1.I to the linear combinations ai ξi (·). ˘ ij (·)) of nonnegative measures be positive-definite as in 8.3.2 Let the matrix (M (8.3.1). Show that the matrix of Fourier transforms (Fij (·)) consists of nonnegative measures with the same positive-definite property. 8.3.3 Consider a multivariate Neyman–Scott process in which cluster centres occur in time at rate µc and cluster members may be of different types with joint density p(k, u) = πk fk (u), πk = 1 = fk (u)du (k = 1, . . . , K). Find expressions, generalizing those of Example 6.3(c), for the means and covariance densities of the different component streams and the corresponding multivariate Bartlett spectra. 8.3.4 Consider a cluster process in which the cluster centres form a simple stationary point process with mean density λc and Bartlett spectrum with density γ11 (·), while the clusters have the Hawkes branching structure of Example 8.3(c). Regard the resultant process as the output of a system with the cluster centre process the input and the generation of cluster members representing a type of positive feedback with the linear structure characteristic of a Hawkes process. (a) Arguing from the general relations for the second-order properties of a cluster process, show that the output process here has the spectral density γ22 (ω) =
[λc /(2π)]((1 − ν)−1 − 1) + γ11 (ω) , |1 − µ (ω)|2
330
8. Second-Order Properties of Stationary Point Processes
(0), which [see (8.3.11)] is a different generalization of (8.2.10). where ν = µ The only contributions to the cross-covariance terms are from the cluster centre to cluster members, leading to c12 (u) = λc m1 (u | 0) (see the notation in Exercise 5.5.6), and thus γ12 (ω) =
λc /(2π) −1
(1 − µ (ω))
= γ21 (−ω).
(b) By specializing γ11 (·), more specific examples of input/output systems are obtained. For example, the input may be a Cox process directed by a continuous nonnegative process X(·), in which case we have a continuous input process X(·) causally affecting an output point process. If, moreover, X(·) is itself a shot-noise process generated by some primary point process, we recover a somewhat more general case of mutually exciting point processes. 8.3.5 Explicitly state the mappings and show their use in applying the factorization Lemma A2.7.II to prove Proposition 8.3.II. 8.3.6 MPPs with infinite mean ground density. Suppose given a countable infinity of stationary (R × K)-valued MPPs Nj , j = 1, 2, . . . , defined on some common probability space and K ⊆ R+ . Suppose that Nj has finite mean density mj and each point of Nj has the positive-valued mark κj , say, and there is a probability distribution {πj } with πj > 0 for j = 1, 2, . . . such that π m = ∞. j j j (a) Let the MPP N equal Nj with probability πj for j = 1, 2, . . . . Then N is nonergodic: limT →∞ N ((0, T ] × K)/T = limT →∞ Nj ((0, T ] × K)/T = mj with probability πj . Since each Nj is well defined, so is N , and its mean ground density equals π m = ∞. Denoting a realization of N j j j by {(xi , κi )}, consider the stationary random measure ξ(A) = κ. xi ∈A i κ is independent of j a.s., and that Show that ξ(·) is nonergodic unless m j j its mean density equals π m κ , which can be finite or infinite. j j j j (b) Now suppose that the Nj are mutually independent marked Poisson processes. (i) Show that the superposition of any specified finite collection of the Nj is an MPP with finite mean density. (ii) Let J be a countably infinite subset of {1, 2, . . .}, and consider N = j∈J Nj . Then, N is not an MPP because N ((0, 1]×K) = ∞ a.s., contradicting the finiteness condition in Definition 6.4.I(a). (c) Suppose in (b) that the Nj are mutually independent simple stationary MPPs (not necessarily Poisson). 
Do the conclusions (i) and (ii) continue to hold? 8.3.7 Let the bivariate simple Poisson process model of Example 8.3(a) be stationary so that it can be described in terms of three rate functions µ1 , µ2 , µ3 and a distribution function G(·) of the signed distance between a pair of related points, taking a type 1 point as the initial point. Show that in terms of these quantities, m 1 = µ1 + µ3 , m 2 = µ2 + u 3 , ˘[2] (−du; 2, 1). ˘ C[2] (du; 1, 2) = µ3 G(du) = C
8.4.
Spectral Representation
331
Use the p.g.fl. or otherwise to show that when X = R, the joint distribution of the distances T1 and T2 from an arbitrary origin to the nearest points of types 1 and 2, respectively, is given by log Pr{T1 > x, T2 > y}
x+y
= −2m1 x − 2m2 y + µ3
( min(x, y − v) − max(−x, −y − v)) G(dv), −x−y
while the joint distribution of the forward recurrence times T1+ , T2+ from the origin to the nearest points in the positive direction is given by log Pr{T1+ > x, T2+ > y}
y
= −m1 x − m2 y + µ3
( min(x, y − v) − max(0, −v)) G(dv). −x
Consider extensions to the case X = Rd . 8.3.8 Gauss–Poisson process with asymmetric bivariate mark distribution. In a marked process of correlated pairs (marked Gauss–Poisson process), suppose that the joint distribution of the marks corresponding to the two points in a pair depends on the separation of the two points and that the mark of the first occurring point in the pair is (say) always the larger. Construct an explicit example for which the bivariate mark distribution at separation u depends explicitly on u and is asymmetric. 8.3.9 Bivariate forward recurrence time. Extend the argument of Example 8.3(f) to the case of a bivariate point process by using an MPP in which the mark at a point ti of the process is of the form (ji ; L1i , L2i ), where ji is the type of the point and L1i , L2i are the backward occurrence times to the last points of types 1 and 2, respectively. Obtain a bivariate extension of the Palm– Khinchin equations, and compare these with the extensions to nonorderly point processes discussed in (3.4.14). Hence or otherwise, obtain expressions for the joint distributions of the intervals between an arbitrary point of type i (i = 1, 2) and the next occurring points of types 1 and 2 in Example 8.3(a). [Daley and Milne (1975) use a different approach that exploits methods similar to those of Chapter 3].
8.4. Spectral Representation We take up next the possibility of developing a Cram´er-type spectral representation for stationary point processes and random measures. In R, such a representation is essentially a corollary of the spectral representation for processes with stationary increments given by Doob (1949) and for stationary interval functions given by Brillinger (1972). No essentially new points arise, although minor refinements are possible as a result of the additional properties available for p.p.d. measures. We give a brief but essentially selfcontained account of the representation theory for random measures in Rd following the general lines of the approach in Vere-Jones (1974). The relation to spectral representations for stationary generalized processes is discussed in Daley (1971) and Jowett and Vere-Jones (1972).
332
8. Second-Order Properties of Stationary Point Processes
In order to be consistent with the representation theory for continuous-time processes, we work throughout with the mean-corrected process ξ 0 (dx) = ξ(dx) − m dx
(8.4.1)
with zero mean, where ξ is a second-order stationary random measure with mean density m. Thus, we are concerned with properties of the Bartlett spectrum. An equivalent and perhaps slightly more direct theory could be built up from the properties of ξ(·) and the second-moment measure: the differences are outlined in Exercise 8.4.1. The essence of the Cram´er representation is an isomorphism between two Hilbert spaces, one of random variables defined on a probability space and the other of functions on the state space X = Rd . In the present context, we use the notation L2 (ξ 0 ) to denote the Hilbert space of (equivalence classes of) random variables formed from linear combinations of the second-order random variables ξ 0 (A) (bounded A ∈ BX ) and their mean square limits, while L2 (Γ) denotes the Hilbert space of (equivalence classes of) measurable functions square integrable with respect to Γ. Since Γ is not in general totally finite, we cannot apply directly the theory for mean square continuous processes. Rather, there are two possible routes to the required representations: we can exploit the results already available for continuous processes by means of smoothing techniques such as those used in Section 8.5, or we can develop the theory from first principles, using appropriate modifications of the classical proofs where necessary. We adopt the latter approach, although we only sketch the arguments where they directly mimic the standard theory. A convenient starting point is the following lemma in which S again denotes the space of functions of rapid decay in Rd . Lemma 8.4.I. Given any boundedly finite measure Γ in Rd , the space S is dense in L2 (Γ). Proof. The result is a minor modification of standard results [see e.g. Kingman and Taylor (1966, p.131) and Exercise 8.4.2]. The key step in establishing the isomorphism between the spaces L2 (ξ 0 ) and L2 (Γ) is a special case of Proposition 8.6.IV, which, with the notation f (x) ξ 0 (dx), (8.4.2) ζf = Rd
where f is a bounded Borel function of bounded support, can be stated in the form f˜ = |f˜(ω)|2 Γ(dω) = f (x)f (x + u) du C˘2 (dx) L2 (Γ)
Rd
= var(ζf ) = ζf L2 (ξ0 ) .
Rd
Rd
(8.4.3)
A first corollary of this equality of norms is the following counterpart of the lemma above.
8.4.
Spectral Representation
333
Lemma 8.4.II. For ψ ∈ S, the random integrals ζψ = dense in L2 (ξ 0 ).
Rd
ψ(x) ξ 0 (dx) are
Proof. It is enough to show that for any given bounded A ∈ B(Rd ), ξ 0 (A) can be approximated in mean square by elements ζψn with ψn ∈ S. Working from the Fourier transform side, it follows from (8.4.3) that I˜A ∈ L2 (Γ) and thus by Lemma 8.4.I that I˜A can be approximated by a sequence of functions in S. Now S is invariant under the Fourier transform map, so this sequence can be written as ψ˜n with ψn ∈ S. Applying (8.4.3) with ψ = IA ψn leads to I˜A − ψ˜n L2 (Γ) = ξ 0 (A) − ζψn L2 (ξ0 ) . By construction, the left-hand side → 0 as n → ∞, and hence also the righthand side → 0, which from our opening remark is all that is required. Lemmas 8.4.I and 8.4.II show that for ψ ∈ S there is a correspondence ψ˜ ↔ ζψ between elements ψ˜ of a set dense in L2 (Γ) and elements ζψ of a set dense in L2 (ξ 0 ). The correspondence is one-to-one between equivalence classes of functions and is norm-preserving. From this last fact, it follows that the correspondence can be extended to an isometric isomorphism between the full Hilbert spaces L2 (Γ) and L2 (ξ 0 ) (see Exercise 8.4.3 for details), thus establishing the following proposition. Proposition 8.4.III. There is an isometric isomorphism between L2 (Γ) and L2 (ξ 0 ) in which, for ψ ∈ S, the integral ζψ in (8.4.2) ∈ L2 (ξ 0 ) and the Fourier transform ψ˜ ∈ L2 (Γ) are corresponding elements. The main weakness of this proposition is that it does not give an explicit Fourier representation of the random measure and associated integrals ζψ . To overcome this deficiency, we adopt the standard procedure of introducing a mean square integral with respect to a certain wide-sense random signed measure with uncorrelated values on disjoint sets. For any bounded A ∈ B(Rd ), let Z(A) denote the random element in L2 (ξ 0 ) ˜ corresponding to ψ(ω) ≡ IA (ω) in L2 (Γ). For disjoint sets A1 , A2 , it follows from the polarized form of (8.4.2) (obtained by expressing inner products in terms of norms) that E Z(A1 ) Z(A2 ) =
Rd
IA1 (ω)IA2 (ω) Γ(dω) = 0,
(8.4.4)
so that the Z(·) are indeed uncorrelated on disjoint sets (or, in the setting of the real line, have orthogonal increments). The definition of a mean square integral with respect to such a family is a standard procedure (see e.g. Doob, 1953; Cram´er and Leadbetter, 1967) and leads to the conclusion that for every g ∈ L2 (Γ) the integral g(ω) Z(dω) Rd
334
8. Second-Order Properties of Stationary Point Processes
can be defined uniquely as a mean square limit of integrals of simple functions and can be identified with the unique random variable associated with g in the isomorphism theorem described by Proposition 8.4.III. In particular, for g = ψ˜ ∈ S, the integral below can be identified with the random element ζψ ; that is, ˜ ψ(ω) Z(dω) = Rd
ψ(x) ξ 0 (dx). Rd
Also, referring to the convergence property displayed in the proof of Lemma 8.4.II (and this defines an equivalence relation as noted), the limit relation can be written as ξ 0 (A) = l.i.m. ζψn n→∞
(see e.g. Doob, 1953, p. 8). More generally, it follows from Proposition 8.6.IV and (8.4.3) that the same conclusion holds for any bounded ψ of bounded support. Thus, we have the following result, which is a slight strengthening, as well as an extension to Rd , of the corresponding result in Vere-Jones (1974). Theorem 8.4.IV. Let ξ be a second-order stationary random measure or point process in Rd with Bartlett spectrum Γ. Then, there exists a secondorder wide-sense random measure Z(·) defined on bounded A ∈ B(Rd ) for which (i) EZ(A) = 0 = E[Z(A)Z(B) ] for bounded disjoint A, B ∈ B(Rd ); (8.4.4 ) (ii) var Z(A) = E(|Z(A)|2 ) = Γ(A);
(8.4.5)
to g in the isomor(iii) for all g ∈ L2 (Γ), the random variable ζ corresponding phism of Proposition 8.4.III is expressible as ζ = Rd g(ω) Z(dω); and (iv) for all ψ ∈ S and all bounded measurable ψ of bounded support, ζψ ≡
˜ ψ(ω) Z(dω)
0
ψ(x) ξ (dx) = Rd
a.s.
(8.4.6)
Rd
Observe that in the Parseval relation in (8.4.6) the left-hand side represents the usual random integral defined on a realization by realization basis, whereas the right-hand side is a mean square integral that does not have a meaning in this sense. The two most important classes of functions ψ are covered by the theorem. In Exercise 8.4.4, we indicate how (8.4.6) can be extended to somewhat wider classes of functions and, in particular, (8.4.6) continues to hold whenever ψ is Lebesgue integrable and ψ˜ ∈ L2 (Γ). An alternative approach to the substance of part (iv) of this theorem is simply to define the integral on the left-hand side of (8.4.6) to be equal to the right-hand side there for all ψ˜ ∈ L2 (Γ), but this begs the question as to when this definition coincides with the a.s. definition of the integral used until now. More explicit representation theorems can be obtained as corollaries to (8.4.6). In particular, taking ψ(x) = IA (x), we have the following.
8.4.
Spectral Representation
335
Corollary 8.4.V. For all bounded A ∈ B(Rd ), I˜A (ω) Z(dω) ξ 0 (A) =
a.s.
(8.4.7)
Rd
We cannot immediately obtain an inversion theorem for Z(·) in this form because the corresponding integral (2π)−d Rd I˜B (−x) ξ 0 (dx) need not exist. The finite integral over UdT presents no difficulties, however, and leads to the second corollary. Corollary 8.4.VI. For all bounded A ∈ Rd that are Γ-continuity sets, 1 Z(A) = l.i.m. (8.4.8) I˜A (−x) ξ 0 (dx). T →∞ (2π)d Ud T Proof. From the theorem, the finite integral in (8.4.8) can be transformed into the expression [for θ = (θ1 , . . . , θd ) and ω = (ω1 , . . . , ωd ) ∈ Rd ] + * d sin T (ωi − θi ) dθ. Z(dω) ωi − θi Rd A i=1 Provided A is a continuity set for Γ, the integrand convolved with IA (ω) converges in L2 (Γ) to IA (ω) as T → ∞ (see Exercise 8.4.5: the proof is straightforward for intervals A but not so direct for general bounded A), and hence the integral converges in mean square to Z(A). In very simple cases, Corollary 8.4.VI can be used to calculate directly the process Z(·) having orthogonal increments. Such an example is given below, partly to illustrate the potential dangers of using the second-order representation for anything other than second-order properties. Example 8.4(a) The Fourier transform of the Poisson process. Let ξ be a Poisson process on R with constant rate λ. Then, it follows from (8.4.8) that T ixa 1 e − eixb N (dx) − λ dx . Z((a, b]) = l.i.m. T →∞ 2πi −T x Consider in particular the process Ua (ω) ≡ Z(ω + a) − Z(ω − a) = l.i.m. T →∞
1 π
T
−T
e−iωx sin ax N (dx) − λ dx . x
Using standard results from Chapter 9 for the characteristic functional of the Poisson process, we find Φ(ω, s) ≡ E exp(isUa (ω) ∞ ise−iωx sin ax ise−iωx sin ax −1− dx exp = exp λ x x −∞ ∞ sin ax 2 sin ax 3 = exp λ − 12 s2 cos ωx dx + O(s3 ) x x −∞ $ # = exp − 12 πλas2 + O(s3 )
336
8. Second-Order Properties of Stationary Point Processes
uniformly in ω [see e.g. Copson (1935, p. 153) for evaluation of the integral]. It follows that the variance of Ua (ω) is proportional to the length of the interval and independent of its location, corresponding to the presumption that Z(·) in this case must be a process with orthogonal and second-order stationary increments. On the other hand, Z(·) clearly does not have strictly stationary increments, for the full form of the characteristic function depends nontrivially on ω. Similarly, it can be checked from the joint characteristic function that Z does not have independent increments. Indeed, as follows from inspecting its characteristic function, Ua (ω) has an infinitely divisible distribution of pure jump type, with a subtle dependence of the jump distribution on a and ω that produces the requisite characteristics of the second-order properties. The spectral representation for stationary random measures and point processes plays a similar role in guiding intuition and aiding computation as it does for classical time series. We illustrate its use below by establishing basic procedures for estimating the Bartlett spectrum in two practically important cases: simple point processes and random (point process) sampling of a stationary continuous process. Further examples arise in Section 8.5, where we examine linear filters and prediction. Example 8.4(b) Finite Fourier transform and point process periodogram. Estimates of the Bartlett spectrum provide a powerful means of checking for periodicity in point process data as well as for investigating other features reflected in the second-order properties. The basic tool for estimating the spectrum is the point process periodogram, defined much as in the continuous case through the finite Fourier transform of the realization of a point process on a finite time interval (0, T ), namely T N (T )
1 − e−iωT JT (ω) = e−iωt [N (dt) − m dt] = e−iωtk − m , (8.4.9) iω 0 k=1
in terms of which the periodogram is then defined as 1 (ω ∈ R). (8.4.10) |JT (ω)|2 IT (ω) = 2πT Express JT (ω) in the form of the left-hand side of (8.4.6) by setting ψ(t) = e−iωt I(0,T ) (t), which is certainly bounded and of bounded support. Then, it follows from Proposition 8.4.IV(iv) that iT (ω −ω) e −1 JT (ω) = Z(dω ) a.s. − ω) i(ω R The orthogonality properties of Z now imply that &2 & iT (ω −ω) &e − 1 && 1 & Γ(dω ) (8.4.11a) E[IT (ω)] = 2πT R & i(ω − ω) & sin 12 T (ω − ω) 2 T = Γ(dω ). (8.4.11b) 1 − ω) 2π R T (ω 2
8.4.
Spectral Representation
337
If Γ(·) has an atom at ω, then it follows from (8.4.11a) that IT (ω) ∼ T Γ({ω}). On the other hand, if Γ(·) has a continuous density γ(ω ) in a neighbourhood of ω, then it follows from (8.4.11b) that E[IT (ω)] → γ(ω). Thus, the periodogram is an asymptotically unbiased estimate of the spectral density wherever the density exists. The contrast between the two cases is the basis of tests for periodic effects, meaning here some periodic fluctuation in the rate of occurrence of events. Consistency is another story, however, and some degree of smoothing must be introduced to obtain consistent estimates of the spectral density. The theory here parallels the standard theory except insofar as the observations are not Gaussian and some spectral mass is carried at arbitrarily large frequencies. The latter feature is a consequence of assuming that the points {tk } of the process are observed with complete precision, which is a fiction in any real context: in reality, only limited precision is possible, amounting to some smoothing or rounding of the observations, which then induces a tapering of the spectrum at very high frequencies. Nevertheless, the lack of any natural upper bound to the observed frequency range, even from a finite set of observations, causes difficulties in tackling questions such as the detection and estimation of an unknown periodicity modulating the occurrence times of the observed points. Indeed, the very definition of such a modulation, except for specific models such as the Poisson process (when it can appear as a periodic modulation of the intensity), is a matter of some difficulty. The crux of the matter for the spectral theory is that, whatever the form of modulation may be, it should induce a periodic variation in the reduced covariance measure. Vere-Jones and Ozaki (1982) discuss some of these issues in simple special contexts; the general problem of testing for unknown frequencies in point process models appears to lack any definitive treatment. 
8. Second-Order Properties of Stationary Point Processes

Brillinger (1978, 1981) gives a systematic overview of the differences between ordinary time series and point process analogues.

Example 8.4(c) Random sampling of a random process. A situation of some practical importance arises when a stationary continuous-time stochastic process X(t) is sampled at the epochs {t_i} of a stationary point process. The resultant process can be considered in two ways: either as a discrete-time process Y_i = X(t_i), or as a random measure with jump increments ξ(dt) = X(t) N(dt). Neither operation is linear, but the second is just a multiplication of the two processes and leads to the more tractable results. Neither N(·) nor ξ(·) is a process with zero mean; to express the latter as a process with zero mean, suppose for simplicity that X(·) has zero mean, and then write

ξ(dt) = X(t) N⁰(dt) + mX(t) dt,

where N⁰(dt) = N(dt) − m dt and m = E N(0, 1] is the mean rate of the sampling process. Proceeding formally leads to

∫_R φ(t) ξ(dt) = ∫_R ∫_R φ̃(u − v) Z_X(du) Z_N(dv) + m ∫_R φ̃(u) Z_X(du),
corresponding to a representation of the measure Z_ξ as a convolution of Z_X and Z_N, with an additional term for the mean. Leaving aside the general case, suppose that the processes X(·) and N(·) are independent. Then we find

var ∫_R φ(t) ξ(dt) = ∫_R ∫_R |φ̃(u − v)|² γ_X(du) γ_N(dv) + m² ∫_R |φ̃(u)|² γ_X(du),

from which we deduce that

γ_ξ(dω) = ∫_R γ_X(dω − u) γ_N(du) + m² γ_X(dω).

Hence, for the covariance measures we have

C̆_ξ(du) = c̆_X(u)[m² du + C̆_N(du)] = c̆_X(u) M̆₂(du).

Of course, the last result can easily be derived directly by considering

E[X(t) N(t, t + dt] · X(t + u) N(t + u, t + u + du]].

In practice, one generally must estimate the spectrum γ_X(·) given a (finite portion of a) realization of ξ(·). When N is a Poisson process at rate m,

γ_ξ(dω) = (m/2π)(var X) dω + m² γ_X(dω),

so γ_X can be obtained quite easily from γ_ξ. In general, however, a deconvolution procedure may be needed, and the problem is complicated further by the fact that the spectral measures concerned are not totally finite. Consequently, numerical Fourier transform routines cannot be applied without some further manipulations [see Brillinger (1972) for further details].

Only partial results are available for the extension of the spectral theory to random signed measures. One approach, which we outline briefly below, follows Thornett (1979) in defining a second-order random measure as a family of random variables {W(A)}, indexed by the Borel sets, whose first and second moments satisfy the same additivity and continuity requirements as the first- and second-moment measures of a stationary random measure. The resulting theory may be regarded as a natural generalization to R^d of the theory of random interval functions developed by Bochner (1955) and extended and applied in a statistical context by Brillinger (1972).
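The covariance relation in Example 8.4(c) can be spot-checked by simulation. The sketch below is not from the book; the sampled process, rate, and intervals are arbitrary choices. It samples X(t) = cos(t + Φ), with Φ uniform on (0, 2π) so that c̆_X(u) = ½ cos u, at the points of an independent Poisson process of rate m, and compares the empirical covariance of ξ(A), ξ(B) for disjoint A, B with the off-diagonal prediction m² ∫_A ∫_B c̆_X(t − s) dt ds implied by C̆_ξ(du) = c̆_X(u) M̆₂(du):

```python
import numpy as np

# Simulation sketch (arbitrary choices): xi(A) = sum of X(t_i) over the
# Poisson points t_i in A, with X(t) = cos(t + phi) independent of the
# sampling process, cX(u) = cos(u)/2, A = (0,1], B = (2,3], m = 2.
rng = np.random.default_rng(3)
m, reps = 2.0, 40_000
xiA = np.empty(reps)
xiB = np.empty(reps)
for i in range(reps):
    phi = rng.uniform(0.0, 2.0 * np.pi)
    tA = rng.uniform(0.0, 1.0, rng.poisson(m))   # Poisson points in A
    tB = rng.uniform(2.0, 3.0, rng.poisson(m))   # Poisson points in B
    xiA[i] = np.cos(tA + phi).sum()
    xiB[i] = np.cos(tB + phi).sum()
emp = np.cov(xiA, xiB)[0, 1]

# exact value of m^2 * double integral of cos(t-s)/2 over (0,1] x (2,3]
exact = m ** 2 * 0.5 * (2 * np.cos(2.0) - np.cos(1.0) - np.cos(3.0))
print(emp, exact)
```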
8.4. Spectral Representation
Definition 8.4.VII. A wide-sense second-order stationary random measure on X = R^d is a jointly distributed family of real- or complex-valued random variables {ξ(A): A ∈ B_X} satisfying the following conditions for bounded A, {A_n} and B ∈ B_X:
(i) Eξ(A) = m(A), var ξ(A) < ∞;
(ii) var((S_x ξ)(A)) = var ξ(T_x A) = var ξ(A);
(iii) ξ(A ∪ B) = ξ(A) + ξ(B) a.s. for disjoint A, B; and
(iv) ξ(A_n) → 0 in mean square when A_n ↓ ∅ as n → ∞.

If the random variables ξ(·) here are nonnegative, then (iii) reduces to the first part of (6.1.2) and implies that in (iv) the random variables ξ(A_n) decrease monotonically a.s.; that is, ξ(A_{n+1}) ≤ ξ(A_n) a.s., so that (iv) can be strengthened to ξ(A_n) → 0 a.s. when A_n ↓ ∅ as n → ∞ [see the second part of (6.1.2)]. We then know from Chapter 9 that there exists a strict-sense random measure that can be taken as a version of ξ(·), so that nothing new is obtained. Thus, the essence of the extension in Definition 8.4.VII is to random signed measures. For the sequel, we work only with the mean-corrected version, taking m = 0 in the definition.

Given such a family, we can always find a Gaussian family with the same first- and second-moment properties: the construction is standard and needs no detailed explanation (see Doob, 1953; Thornett, 1979). For example, the Poisson process, corrected to have zero mean, has var ξ(A) = λ(A), where λ is the intensity; this function is the same as the variance function for the Wiener chaos process in Chapter 9. While the definition refers only to variances, covariances are defined by implication from the relation, valid for real-valued ξ(·),

2 cov(ξ(A), ξ(B)) = var ξ(A ∪ B) + var ξ(A ∩ B) − var ξ(A \ B) − var ξ(B \ A),

which is readily verified first for disjoint A and B, and then for general A and B by substituting in the expansion of

cov(ξ(A), ξ(B)) = cov(ξ(A ∩ B) + ξ(A \ B), ξ(A ∩ B) + ξ(B \ A)).
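The covariance identity just displayed is a purely algebraic consequence of the additivity property (iii), so it also holds exactly for empirical variances and covariances. A quick numerical illustration (not from the book; the rate and the interval partition are arbitrary choices), writing ξ(A) = X + Y and ξ(B) = Y + Z for the counts on A \ B, A ∩ B, B \ A:

```python
import numpy as np

# Sketch: verify 2 cov(xi(A), xi(B)) = var xi(A u B) + var xi(A n B)
#                                      - var xi(A \ B) - var xi(B \ A)
# for Poisson counts, with A = (0, 0.6], B = (0.4, 1] (so the partition
# pieces have lengths 0.4 / 0.2 / 0.4).  The identity is sample-by-sample
# algebra, so the two estimates agree to floating-point accuracy.
rng = np.random.default_rng(1)
lam, n = 10.0, 50_000
X = rng.poisson(lam * 0.4, n)    # xi(A \ B)
Y = rng.poisson(lam * 0.2, n)    # xi(A n B)
Z = rng.poisson(lam * 0.4, n)    # xi(B \ A)
xiA, xiB, xiU = X + Y, Y + Z, X + Y + Z

lhs = 2 * np.cov(xiA, xiB, ddof=1)[0, 1]
rhs = (xiU.var(ddof=1) + Y.var(ddof=1)
       - X.var(ddof=1) - Z.var(ddof=1))
print(lhs, rhs)   # agree up to floating-point rounding
```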
Although we can obtain in this way a covariance function C(A × B), defined on products of bounded A, B ∈ B_X, it is not obvious that it can be extended to a signed measure on B(X⁽²⁾). Consequently, it is not clear whether or not a covariance measure exists for such a family. When it does, the further theory can be developed much as earlier. Irrespective of such existence, it is still possible to define both a spectrum for the process and an associated spectral representation. Thus, for any bounded Borel set A, consider the process X_A(x) ≡ ξ(T_x A). Mean square continuity follows from condition (iv), so X_A(·) has a spectral measure Γ_A(·), and we can define

Γ(dω) = |Ĩ_A(ω)|⁻² Γ_A(dω)
for all ω such that Ĩ_A(ω) ≠ 0. Since we cannot ensure that Ĩ_A(ω) ≠ 0 for all ω, some care is needed in showing that the resultant measure Γ(·) can in fact be consistently defined for a sufficiently rich class of sets A [one approach is outlined by Thornett (1979) and given as Exercise 8.4.6]. Just as before, the measure Γ is translation-bounded and hence integrates (1 + ω²)⁻¹, for example. On the other hand, it is not positive-definite in general, and not all the explicit inversion theorems can be carried over. Nevertheless, for all bounded A ∈ B_X, we certainly have

var ξ(A) = ∫ |Ĩ_A(ω)|² Γ(dω)    (8.4.12)

and its covariance extension

cov(ξ(A), ξ(B)) = ∫ Ĩ_A(ω) Ĩ_B(ω) Γ(dω).    (8.4.13)
Since the indicator functions are dense in L₂(Γ), more general integrals of the form ∫ φ(x) ξ(dx) can be defined as mean square limits of linear combinations of the random variables ξ(A), at least when φ̃ ∈ L₂(Γ). For such integrals, the more general formulae

var ∫ φ(x) ξ(dx) = ∫ |φ̃(ω)|² Γ(dω)

and

cov( ∫ φ(x) ξ(dx), ∫ ψ(x) ξ(dx) ) = ∫ φ̃(ω) ψ̃(ω) Γ(dω)

are available, but it is not clear whether the integrals make sense other than in this mean square sense. As noted earlier, it is also an open question as to whether Γ is necessarily the Fourier transform of some measure, which we could then interpret as a reduced covariance measure.

The isomorphism result in Proposition 8.4.III can be extended to this wider context with only minor changes in the argument: it asserts the isomorphism between L₂(ξ) and L₂(Γ) and provides a spectral representation: for bounded A ∈ B_X,

ξ(A) = ∫ Ĩ_A(ω) Z(dω)   a.s.,    (8.4.14)

just as in the previous discussion. To summarize, we have the following theorem, the remaining details of whose proof are left to the reader.

Theorem 8.4.VIII. Let {ξ(·)} be a wide-sense second-order stationary random measure as in Definition 8.4.VII. Then there exists a spectral measure Γ(·) and a process Z(·) of orthogonal increments with var Z(dω) = Γ(dω) such that (8.4.12–14) hold.
Exercises and Complements to Section 8.4

8.4.1 Representation in terms of the second-moment measure. Show that the effect of working with the Fourier transform of the second moment rather than the Bartlett spectrum would be to set up an isomorphism between the spaces L₂(ξ) generated by all linear combinations of the r.v.s ξ(A) and L₂(ν), where ν is the inverse Fourier transform of M̆₂. Show that the representation

∫_{R^d} φ(x) ξ(dx) = ∫_{R^d} φ̃(ω) Z₁(dω)

holds for functions φ in a suitably restricted class, where Z₁(A) = mδ₀(A) + Z(A), and Z and Z₁ differ only by an atom at ω = 0.

8.4.2 Let Γ be a nontrivial boundedly finite measure. Show the following:
(a) Simple functions of the form Σ_k a_k I_{A_k} [bounded A_k ∈ B(R^d)] are dense in L₂(Γ).
(b) For bounded A ∈ B(R^d), there exist open sets U_n ∈ B(R^d) with U_n ⊇ A and Γ(U_n) ↓ Γ(A).
(c) Any such U_n is the countable union of hyper-rectangles of the form {α_i < x_i ≤ β_i, i = 1, . . . , d}.
(d) Indicator functions on such hyper-rectangles can be approximated by sequences of infinitely differentiable functions of bounded support.
Now complete the proof of Lemma 8.4.I.

8.4.3 Given ψ̃ ∈ L₂(Γ), choose ψ_n ∈ S such that ‖ψ̃ − ψ̃_n‖_{L₂(Γ)} → 0 (n → ∞), and deduce that {Z_{ψ_n}} is a Cauchy sequence in L₂(ξ⁰). Show that there is a unique r.v. ζ ∈ L₂(ξ⁰) such that Z_{ψ_n} → ζ in mean square. Interchange the roles of L₂(Γ) and L₂(ξ⁰) and deduce the assertion of Proposition 8.4.III.

8.4.4 Show that (8.4.6) can be extended to all L₁ functions φ such that φ̃ ∈ L₂(Γ). [Hint: The left- and right-hand sides can be represented, respectively, as an a.s. limit of integrals of bounded functions of bounded support and as a mean square limit. When both limits exist, they must be equal a.s. This argument establishes a conjecture in Vere-Jones (1974).]

8.4.5 Establish the following properties of the function h_T(ω) = ω⁻¹ sin ωT (they are needed in a proof of Corollary 8.4.IV).
(a) ∫_{−∞}^{∞} h_T(ω) dω = π.
(b) For any continuous function φ with bounded support, the function

φ_T(ω) ≡ ∫_{−∞}^{∞} φ(ω − u) h_T(u) du → φ(ω)   pointwise as T → ∞

[this is an application of Fourier's single integral (see Zygmund, 1968, Section 16.1)]. Show that the result still holds if only φ ∈ L₁(ξ) and φ is of bounded variation in any closed interval contained in its support.
(c) φ_T(ω) → φ(ω) in L₂(Γ) for any p.p.d. measure (or for any Bartlett spectrum) Γ. [Hint: |φ_T(ω)| ≤ constant/|ω| for large |ω|, while sup_ω |φ_T(ω)| < ∞; these properties are enough to ensure that |φ_T(ω)|² ≤ g(ω) for some Γ-integrable function g.]
(d) Interpret the convergence in (c) as

∫_R |φ_T(ω)|² Γ(dω) = ∫_R [ ∫_R h_T(ω − u)φ(u) du ][ ∫_R h_T(ω − v)φ(v) dv ] Γ(dω)
  = ∫∫_{R²} φ(u) φ(v) [ ∫_R h_T(ω − u) h_T(ω − v) Γ(dω) ] du dv
  = ∫∫_{R²} φ(u) φ(v) Γ*_T(du × dv)
  → ∫∫_{R²} φ(u) φ(v) Γ*(du × dv) = ∫_R |φ(ω)|² Γ(dω),

where Γ*_T(du × dv) and Γ* are measures in B(R²), the former with density ∫_R h_T(ω − u) h_T(ω − v) Γ(dω), while the latter reduces to Γ along the diagonal u = v. These results are enough to establish that Γ*_T → Γ* vaguely in R² and hence that a similar result holds when φ(·) is replaced by the indicator function of a bounded Borel set in R¹ that is a continuity set for Γ.
8.4.6 Show that for Γ to be the spectral measure of a wide-sense second-order stationary random measure, it is necessary and sufficient that Γ integrate all functions |Ĩ_A(ω)|² for bounded Borel sets A. Deduce that any translation-bounded measure can be a spectral measure. [Hint: Use a Gaussian construction for the sufficiency; then use Lin's lemma. See also Thornett (1979).]

8.4.7 (a) Show that if a wide-sense second-order stationary process has a reduced covariance measure C̆(·), then C̆({0}) = lim_{T→∞} Γ((−T, T])/(2T) continues to hold (see Theorem 8.6.III).
(b) Use Exercise 8.2.4 to show that not all spectral measures are transforms; that is, not all wide-sense processes have an associated reduced covariance measure (see also Exercise 8.6.3).
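Part (a) of Exercise 8.4.5 is the classical Dirichlet integral and is easy to check numerically; the sketch below is not from the book, and the truncation point and step size are ad hoc accuracy choices:

```python
import numpy as np

# Numerical sketch of Exercise 8.4.5(a):
#   int_{-inf}^{inf} h_T(w) dw = pi,  where h_T(w) = sin(wT)/w.
# By symmetry this is 2 * int_0^X sin(wT)/w dw, up to a tail of order 1/(X*T);
# a midpoint rule on (0, X) avoids the removable singularity at w = 0.
T, X, dx = 3.0, 2000.0, 1e-3
w = np.arange(dx / 2, X, dx)                 # midpoint grid on (0, X)
integral = 2.0 * np.sum(np.sin(w * T) / w) * dx
print(integral, np.pi)
```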
8.5. Linear Filters and Prediction

One of the most important uses of spectral representation theory is to obtain the spectral characteristics of processes acted on by a linear filter, meaning here any time-invariant linear combination of values of the process, or any mean square limit of such combinations. This use carries over formally unchanged from mean square continuous processes to second-order point processes and random measures, and includes the procedures for developing optimal linear predictors for future values of the process. Obtaining the precise conditions for these extensions, and their character, requires some care, however, and forms the main content of the present section.
Let ξ(·) be a second-order stationary random measure and ψ ∈ L₁ a smoothing function; consider the smoothed process defined by

X(t) = ∫_{−∞}^{∞} ψ(t − u) ξ(du).    (8.5.1)

Substituting from the Parseval relation (8.4.6), and recalling that the Fourier transform of the shifted function ψ(t − u) is ψ̃(−ω)e^{iωt}, we find

X(t) = ∫_{−∞}^{∞} e^{iωt} ψ̃(−ω) Z(dω).    (8.5.2)

The spectrum Γ_X(·) of the transformed process is

Γ_X(dω) = |ψ̃(−ω)|² Γ(dω).    (8.5.3)

This will be totally finite, which implies that X(·) is a mean square continuous process, provided ψ̃ ∈ L₂(Γ). The relation (8.5.1) can be interpreted even more broadly; for example, if A(·) is a totally finite measure, the convolution A ∗ ξ still defines a.s. a random measure, and (8.5.2) and (8.5.3) continue to hold. Thus, (8.5.1) continues to make sense, with a generalized function interpretation of ψ, provided the outcome defines a.s. a random measure. However, the situation becomes decidedly more complex when, as is often necessary in applications to prediction, signed measures intervene; then at best the wide-sense theory can be used, and the character of the filtered process, in a realization-by-realization sense, has to be ascertained post hoc.

Example 8.5(a) Binning. A special case of practical importance arises when X = R and the measure ξ is ‘binned’; that is, integrated over intervals of constant length ∆, say. Considering first the continuous-time process X(t) ≡ ξ(t − ½∆, t + ½∆], (8.5.2) yields

X(t) = ∫_{−∞}^{∞} e^{iωt} [ (sin ½ω∆) / (½ω) ] Z(dω),

hence

Γ_X(dω) = [ (sin ½ω∆) / (½ω) ]² Γ(dω).
It is commonly the case that the binned process is sampled only at the lattice points {n∆: n = 0, ±1, . . .}. The sampled process can then be represented in the aliased form

Y(n) ≡ X(n∆) = ∫_0^{2π/∆} e^{in∆θ} Σ_{k=−∞}^{∞} Z_X(2kπ/∆ + dθ).

Taking ∆ as the unit of time, we see from this representation that the discrete-time process {Y(n)} has spectral measure G_Y(·) on (0, 2π] given by

G_Y(dθ) = 4 sin² ½θ Σ_{k=−∞}^{∞} Γ(2kπ + dθ) / (θ + 2kπ)².    (8.5.4)
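The Poisson reduction below rests on a classical partial-fraction expansion of the cosecant; a quick numerical sketch (not from the book; the value of θ and the truncation K are arbitrary choices):

```python
import math

# Check the series identity behind the flat aliased Poisson spectrum:
#   sum_{k=-K..K} 1/(theta + 2*k*pi)^2  ~=  (1/4) cosec^2(theta/2),
# so that 4 sin^2(theta/2) times the sum is identically 1.
theta, K = 1.3, 200_000
s = sum(1.0 / (theta + 2.0 * math.pi * k) ** 2 for k in range(-K, K + 1))
target = 0.25 / math.sin(theta / 2.0) ** 2
print(s, target)
```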
In the simplest case of a Poisson process, Γ(dω) = [µ/(2π)] dω, so that

G_Y(dθ) = 4 sin² ½θ Σ_{k=−∞}^{∞} [µ/(2π)] dθ / (θ + 2kπ)² = (µ/2π) dθ,

since the infinite series is just an expansion of ¼ cosec² ½θ. This reduction reflects the fact that the random variables Y(n) are then independent with common variance µ. Binning is widely used in practical applications of time series methods to point process data, and even where it is not explicitly invoked, it is present implicitly in the rounding of observations to a fixed number of decimal places. Indeed, the point process results themselves can be regarded as the limit when the binsize approaches zero and the character of the process Y(n) approaches that of a sequence of δ-functions in continuous time. See e.g. Vere-Jones and Davies (1966) and Vere-Jones (1970), where these ideas are applied in the earthquake context.

Perhaps the most important examples of linear filtering come in the form of linear predictions of a time series or point process. By a linear predictor we mean a predictor of the form ∫_{−∞}^{t} f(t − u) ξ(du); that is, a linear functional of the past, with the quantity to be predicted a linear functional of the future. In the point process case, the problem commonly reduces to predicting, as a linear functional of the past, the mean intensity at some time point in the future. When the process has a mean square continuous density, this corresponds exactly to the classical problem of predicting a future value of the process as a linear functional of its past. Thus, our task is essentially to check when the classical procedures can be carried over to random measures, and to write out the forms that they take in random measure terms.

It is important to contrast the linear predictors obtained in this way with the conditional intensity functions we described in Chapter 7. The conditional intensity function comprises the best nonlinear predictor of the mean rate at a point just ahead of the present.
It is best out of all possible functionals of the past, linear or nonlinear, subject only to the measurability and nonanticipating characteristics described in Chapter 7. The linear predictors are best out of the more restricted class of linear functionals of the past. They are difficult to use effectively in predicting nonlinear features such as a maximum or the time to the next event in a point process. On the other hand, they perform well enough in predicting large-scale features where the law of large numbers tilts the distributions toward normality. They are generally easy to combine and manipulate and can sometimes be obtained when the full conditional intensity is inaccessible. The Wold decomposition theorem plays an important role in finding the best linear predictor for mean square continuous processes, and we start with an extension of this theorem for random measures. As in Section 8.4, we use ξ and ξ⁰ to denote a second-order stationary random measure and its zero mean
form, respectively, with the additional understanding that X = R. Since the results to be developed depend only on the spectral representation theorems, ξ can be either a strict- or wide-sense random measure. We continue to use L₂(ξ⁰) to denote the Hilbert space of equivalence classes of random variables formed from linear combinations of ξ⁰(A) for bounded A ∈ B and their mean square limits. Similarly, L₂(ξ⁰; t) denotes the Hilbert space formed from ξ⁰(A) with the further constraint that A ⊂ (−∞, t].

Definition 8.5.I. The second-order strict- or wide-sense stationary random measure ξ is deterministic if ⋂_{t∈R} L₂(ξ⁰; t) = L₂(ξ⁰), and purely nondeterministic if ⋂_{t∈R} L₂(ξ⁰; t) = {0}.

The following extension of Wold's theorem holds (Vere-Jones, 1974).

Theorem 8.5.II. For any second-order stationary random measure ξ, the zero mean process ξ⁰ can be written uniquely in the form ξ⁰ = ξ₁⁰ + ξ₂⁰, where ξ₁⁰ and ξ₂⁰ are mutually orthogonal, stationary, wide-sense zero-mean random measures, ξ₁⁰ being deterministic and ξ₂⁰ purely nondeterministic.

Proof. Again we start from the known theorems for mean square continuous processes [see e.g. Cramér and Leadbetter (1967), especially Chapters 5–7] and use smoothing arguments similar to those around (8.5.1) to extend them to the random measure context. To this end, set

X(t) = ∫_{−∞}^{t} e^{−(t−u)} ξ⁰(du),    (8.5.5)
where the integral can be understood, whether ξ⁰ is a strict- or wide-sense random measure, as a mean square limit of linear combinations of indicator functions. These indicator functions can all be taken of sets ⊆ (−∞, t], so we have X(t) ∈ L₂(ξ⁰; t) and, more generally, X(s) ∈ L₂(ξ⁰; t) for any s ≤ t, so L₂(X; t) ⊆ L₂(ξ⁰; t). To show that we have equality here, we write

X(t + h) − e^{−h} X(t) − ξ⁰(t, t + h] = ∫_t^{t+h} [e^{−(t+h−u)} − 1] ξ⁰(du)
  = ∫_{−∞}^{∞} e^{iωt} [ (e^{iωh} − e^{−h})/(1 + iω) − (e^{iωh} − 1)/(iω) ] Z(dω),

where Z is the process of orthogonal increments associated with ξ⁰ as in Theorem 8.4.IV. Subdividing any finite interval (a, a + ∆] into n subintervals of length h = ∆/n, we obtain

Σ_{k=1}^{n} { X(a + kh) − e^{−h} X(a + (k−1)h) } − ξ⁰(a, a + ∆]
  = ∫_{−∞}^{∞} ( Σ_{k=1}^{n} e^{iω(a+(k−1)h)} ) [ (e^{iωh} − e^{−h})/(1 + iω) − (e^{iωh} − 1)/(iω) ] Z(dω).
The variance of the left-hand side therefore equals

∫_{−∞}^{∞} | 1 − e^{−h} − (e^{iωh} − 1)/(iω) |² ( sin ½ω∆ / sin ½ωh )² Γ(dω)/(1 + ω²).

The measure (1 + ω²)⁻¹ Γ(dω) is totally finite (see Exercise 8.6.5); the term |·|² is uniformly bounded in ω by 4h², and for fixed ω it is o(h²) as h → 0; and the ratio-of-sines factor is bounded by (∆/h)², and for fixed ω equals const. × h⁻²(1 + o(1)) as h → 0. The dominated convergence theorem can therefore be applied to conclude that this variance → 0 as h → 0, and hence that ξ⁰(a, b] can be approximated in mean square by linear combinations of {X(t): t ≤ b}. This shows that L₂(ξ⁰; t) ⊆ L₂(X; t), and thus L₂(ξ⁰; t) = L₂(X; t) must hold.

The Wold decomposition for X(t) takes the form X(t) = X₁(t) + X₂(t), where X₁(·) is deterministic and X₂(·) purely nondeterministic. The decomposition reflects an orthogonal decomposition of L₂(X), and hence of L₂(ξ⁰) also, into two orthogonal subspaces such that X₁(t) is the projection of X(t) onto one and X₂(t) the projection onto the other. Then ξ₁⁰(A) and ξ₂⁰(A) may be defined as the projections of ξ⁰(A) onto these same subspaces. Furthermore, ξ₁⁰(a, b] and ξ₂⁰(a, b] can be expressed as mean square limits of linear combinations of X₁(t) and X₂(t) in exactly the same way as ξ⁰(a, b] is expressed above in terms of X(t): the deterministic and purely nondeterministic properties of X₁(·) and X₂(·), respectively, carry over to ξ₁⁰(·) and ξ₂⁰(·). Uniqueness is a consequence of the uniqueness of any orthogonal decomposition.

To verify the additivity property of both ξ₁⁰(·) and ξ₂⁰(·), take a sequence {A_n} of disjoint bounded Borel sets with bounded union. From the a.s. countable additivity of ξ⁰, which is equivalent to property (iv) of Definition 8.4.VII, we have

ξ⁰( ⋃_{n=1}^{∞} A_n ) = Σ_{n=1}^{∞} ξ⁰(A_n)   a.s.;

hence,

ξ₁⁰( ⋃_{n=1}^{∞} A_n ) − Σ_{n=1}^{∞} ξ₁⁰(A_n) = Σ_{n=1}^{∞} ξ₂⁰(A_n) − ξ₂⁰( ⋃_{n=1}^{∞} A_n )   a.s.
Since the expressions on the two sides of this equation belong to orthogonal subspaces, both must reduce a.s. to the zero random variable. Properties (i)–(iii) of Definition 8.4.VII are readily checked, so it follows that both ξ₁⁰(·) and ξ₂⁰(·) are wide-sense second-order stationary random measures. But note that even when ξ⁰ is known to be a strict-sense random measure, the argument above shows only that ξ₁⁰ and ξ₂⁰ are wide-sense random measures.

The classical results that relate the presence of a deterministic component to properties of the spectral measure can also be carried over from X(·) to the random measure ξ(·). They are set out in the following theorem.
Theorem 8.5.III. Let ξ(·) be a strict- or wide-sense second-order stationary random measure with Bartlett spectrum Γ. Then ξ(·) is purely nondeterministic if and only if Γ is absolutely continuous and its density γ satisfies the condition

∫_{−∞}^{∞} [log γ(ω)/(1 + ω²)] dω > −∞.    (8.5.6)

This condition is equivalent to the existence of a factorization

γ(ω) = |g̃(ω)|²,    (8.5.7)

where g̃(·) is the Fourier transform of a (real) generalized function with support on [0, ∞), and can be written in the form g̃(ω) = (1 − iω) g̃₁(ω), where g̃₁(·) is the Fourier transform of an L₂(R) function with its support in R₊. The function g̃(·) can be characterized uniquely among all possible factorizations by the requirement that it have an analytic continuation into the upper half-plane Im(ω) > 0, where it is zero-free and satisfies the normalization condition

g̃(i) = exp( (1/2π) ∫_{−∞}^{∞} [log γ(ω)/(1 + ω²)] dω ).    (8.5.8)

Proof. Since ξ is purely nondeterministic if and only if X defined at (8.5.5) is purely nondeterministic, the results follow from those for the continuous-time process X(·) as set out, for example, in Hannan (1970, Section 3.4). From Sections 8.2 and 8.6, it follows that the spectral measure Γ_X of X(·) is related to the Bartlett spectrum Γ of ξ by Γ_X(dω) = (1 + ω²)⁻¹ Γ(dω), so Γ_X has a density γ_X if and only if Γ has a density, and the density γ satisfies (8.5.6) if and only if γ_X does, because the discrepancy ∫_{−∞}^{∞} (1 + ω²)⁻¹ log(1 + ω²) dω is finite. Similarly, if γ_X(ω) = |g̃_X(ω)|², where g̃_X(·) is the Fourier transform of an L₂(R) function with support in R₊, we can set g₁ = g_X so that (8.5.7) holds, together with the assertions immediately following it. Finally, (8.5.8) follows from the corresponding relation for g₁ since

g̃(i) = 2 g̃₁(i) = 2 exp( (1/2π) ∫_{−∞}^{∞} [log γ_X(ω)/(1 + ω²)] dω ) = exp( (1/2π) ∫_{−∞}^{∞} [log γ(ω)/(1 + ω²)] dω ),

using the identity

∫_{−∞}^{∞} [log(1 + ω²)/(1 + ω²)] dω = 2π log 2.
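The identity quoted at the end of the proof can be confirmed numerically via the substitution ω = tan t, under which the Jacobian cancels the factor (1 + ω²)⁻¹ and the integral becomes ∫_{−π/2}^{π/2} log(1 + tan²t) dt. A sketch (not from the book; the grid spacing is an ad hoc accuracy choice):

```python
import numpy as np

# Midpoint-rule check of  int_R log(1+w^2)/(1+w^2) dw = 2*pi*log(2),
# rewritten via w = tan(t) as int_{-pi/2}^{pi/2} log(1 + tan(t)^2) dt;
# the endpoint singularities are logarithmic and hence integrable.
dt = 1e-6
t = np.arange(-np.pi / 2 + dt / 2, np.pi / 2, dt)
integral = float(np.sum(np.log1p(np.tan(t) ** 2)) * dt)
print(integral, 2 * np.pi * np.log(2.0))
```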
These extensions from ΓX to Γ are to be expected because the criteria are analytic and relate to the factorization of the function γ rather than to its behaviour as ω → ±∞. We illustrate the results by two examples.
Example 8.5(b) Two-point Poisson cluster process. Suppose that clusters occur at the instants of a Poisson process with parameter µ, and that each cluster contains exactly two members, one at the cluster centre and the other at a fixed time h after the first. Then the reduced covariance measure has just three atoms: one of mass 2µ at 0, and the others at ±h, each of mass µ. The Bartlett spectrum is therefore absolutely continuous with density

γ(ω) = µ(1 + cos ωh)/π = (2µ/π) cos² ½ωh.

In seeking a factorization of the form (8.5.7), it is natural to try (2µ/π)^{1/2} cos ½ωh as a candidate, but checking the normalization condition (8.5.8) reveals a discrepancy: using the relation

∫_{−∞}^{∞} [log(cos² ½ωh)/(1 + ω²)] dω = 2π log ½(1 + e^{−h})

leads to (2µ/π)^{1/2}(1 + e^{−h})/2 for the right-hand side of (8.5.8), while the candidate gives g̃(i) = (2µ/π)^{1/2} cosh ½h. It is not difficult to see that the correct factorization is

g̃(ω) = (2µ/π)^{1/2} (1 + e^{iωh})/2 = (2µ/π)^{1/2} e^{iωh/2} cos ½ωh.

In this form, we can recognize g̃(·) as the Fourier transform of a measure with atoms [µ/(2π)]^{1/2} at t = 0 and t = h, whereas the unsuccessful candidate function is the transform of a measure with atoms of the same mass but at t = ±½h; that is, the support is not contained in [0, ∞).

Example 8.5(c) Random measures with rational spectral density. When the spectral density is expressible as a rational function, and hence of the form
γ(ω) = ∏_{j=1}^{m} (ω² + α_j²) / ∏_{j=1}^{n} (ω² + β_j²)

for nonnegative integers m, n with m ≤ n, and real α_j, β_j, the identification of the canonical factorization is much simpler, because it is uniquely determined (up to a constant of unit modulus) by the requirements that g̃(ω) be analytic and zero-free in the upper half-plane. Two situations commonly occur, according to whether m < n or m = n. In the former case, the process has a mean square continuous density x(·) and Γ(·) is a totally finite measure. The problem reduces to the classical one of identifying the canonical factorization of the spectrum for the density of the process. For point processes, however, the δ-function in the covariance measure produces a term that does not converge to zero as |ω| → ∞, implying that m = n; the same situation obtains whenever the random measure has a purely atomic component.
As an example of the latter form, recall the comments preceding Example 8.2(e) concerning point process models with spectral densities of the form

γ(ω) = A²(α² + ω²)/(β² + ω²).

The canonical factorization here takes the form (with A, α, and β real and positive)

g̃(ω) = A(α − iω)/(β − iω) = A[ 1 + (α − β)/(β − iω) ],

corresponding to the time-domain representation

g(t) = A[ δ₀(t) + (α − β) I_[0,∞)(t) e^{−βt} ].

Similar forms occur in more general point process models, with a sum of products of exponential and polynomial factors in place of the exponential.

The main thrust of these factorization results is that they lead to a time-domain representation that can be used to develop explicit prediction formulae. The fact that the canonical factor g̃(ω) is in general the transform not of a function but only of a generalized function leads to some specific difficulties. However, much of the argument is not affected by this fact, as we now indicate.

Let Z(·) be the process of orthogonal increments arising in the spectral representation of ξ⁰, and g̃(·) the canonical factor described in Theorem 8.5.III. Introduce a further process U(·) with orthogonal increments by scaling the Z(·) process to have stationary increments, as in

Z(dω) = g̃*(ω) U(dω),
(8.5.9)
where the invertibility of g̃ implies that for all real ω

E|U(dω)|² = |g̃(ω)|⁻² E|Z(dω)|² = dω.

Note that the use of the complex conjugate g̃*(ω) of g̃(ω) in (8.5.9) is purely for convenience: it simplifies the resulting moving average representation in the time domain. Corresponding to U in the frequency domain, we may, in the usual way, define a new process V in the time domain through the Parseval relations, so

∫_{−∞}^{∞} φ(t) V(dt) = ∫_{−∞}^{∞} φ̃(ω) U(dω),    (8.5.10)
which in this case can be extended to all functions φ ∈ L2 (R). It can be verified that V (·) also has orthogonal and stationary increments, with E|V (dt)|2 = 2π dt,
350
8. Second-Order Properties of Stationary Point Processes
corresponding to the more complete statement ∞ ∞ φ(t) V (dt) = 2π |φ(t)|2 dt var −∞ −∞ ∞ 2 |φ(ω)| dω = var = −∞
∞
˜ φ(ω) U (dω) .
−∞
On the other hand, from the Parseval relation for the ξ 0 process, we have for integrable φ, for which φ˜ ∈ L2 (Γ),
∞
φ(t) ξ 0 (dt) =
−∞
∞
˜ φ(ω) Z(dω) =
−∞
∞
˜ g˜(ω) U (dω). φ(ω)
(8.5.11)
−∞
Thus, if we could identify φ˜g¯˜ with the Fourier transform of some function φ ∗ g ∗ in the time domain, it would be possible to write
∞
0
∞
φ(t) ξ (dt) = −∞
−∞
∗
(φ ∗ g )(s) V (ds) =
∞
t
φ(t) dt −∞
−∞
g(t − s) V (ds),
corresponding to the moving average representation
t
ξ 0 (dt) = −∞
g(t − s) V (ds) dt.
Because g(·) is not, in general, a function, these last steps have a purely formal character. They are valid in the case of a process ξ 0 having a mean square continuous density, but in general we need to impose further conditions before obtaining any meaningful results. In most point process examples, the generalized function g(·) can be represented as a measure, but it is an open question as to whether this is true for all second-order random measures. We proceed by imposing conditions that, although restrictive, are at least general enough to cover the case of a point process with rational spectral density. They correspond to assuming that the reduced factorial cumulant ,[2] is totally finite, so that the spectral density can be written in measure C the form γ(ω) = (2π)−1 m + c˜[2] (ω) . Specifically, assume that
g˜(ω) = A 1 + c˜(ω)
(8.5.12)
for some positive constant A and function c˜ ∈ L2 (R). Then, the generalized function aspect of g(·) is limited to a δ-function at the origin, and there exists an L2 (R) function c(·) such that A δ0 (t) + c(t) (t ≥ 0), g(t) = 0 (t < 0).
8.5.
Linear Filters and Prediction
351
Under the same conditions, the reciprocal 1/˜ g (ω) can be written ˜ 1/˜ g (ω) = A−1 1 − d(ω) , ˜ where d(ω) = c˜(ω)/(1 + c˜(ω)), and from ∞ 2 2 ˜ |d(ω)| γ(ω) dω = A −∞
∞
−∞
|˜ c(ω)|2 dω < ∞
it follows that d˜ ∈ L2 (γ). Often, we have L2 (γ) ⊆ L2 (R), in which case d˜ ∈ L2 (R), implying the existence of a representation of a Fourier inverse of 1/˜ g (ω) as −1 δ0 (t) − d(t) (t ≥ 0) A (8.5.13) 0 (t < 0) for some function d ∈ L2 (R). Proposition 8.5.IV (Moving Average and Autoregressive (ARMA) Representations). Suppose (8.5.12) holds for some c˜ ∈ L2 (R). Then, using the notation of (8.5.12–13), for φ ∈ L1 (R) such that φ˜ ∈ L2 (R), the zero-mean process ξ 0 (·) is expressible as 0 ˜ ˜ φ(t) ξ (dt) = φ(t) V (dt) + φ(t)X(t) dt a.s., (8.5.14) R
R
R
where V (·) is a zero-mean process with stationary orthogonal increments such that E|V (dt)|2 = 2πA2 dt (8.5.15) and X(·) is a mean square continuous process that can be written in the moving average form t X(t) = c(t − u) V (du) a.s. (8.5.16) −∞
or, if furthermore d˜ ∈ L2 (R), in the autoregressive form t d(t − u) ξ 0 (du) a.s. X(t) =
(8.5.17)
−∞
Proof. Under the stated assumptions, it follows from (8.5.11) that ˜ ˜ c(ω) U (dω) a.s. (8.5.18) φ(t) ξ 0 (dt) = A φ(ω) U (dω) + A φ(ω)˜ R
R
R
Consider now the process X(·) defined by the spectral representation X(t) = eitω c˜(ω) U (dω) = eitω ZX (dω) a.s., (8.5.19) R
352
8. Second-Order Properties of Stationary Point Processes
where ZX has orthogonal increments and satisfies E |Z(dω)|2 = γX (ω) dω = |˜ c(ω)|2 dω. To ensure that R X(t)φ(t) dt can be validly interpreted as a mean square integral, it is enough to show that φ˜ ∈ L2 (γX ), as in the discussion ˜ around (8.5.3). But φ ∈ L1 (R) implies that |φ(ω)| is bounded for ω ∈ R, and then the assumption that c˜ ∈ L2 (R) implies that 2 2 ˜ ˜ |φ(ω)| |˜ c(ω)|2 dω = |φ(ω)| |γX (ω) dω < ∞, R
as required. The terms on the right-hand side of (8.5.18) can now be replaced by their corresponding time-domain versions. Thus, we have ˜ A φ(ω) U (dω) = φ(t) V (dt), R
R
absorbing the constant A into the definition of the orthogonal-increment process V as in (8.5.10), while the discussion above implies that the last term in ˜ (8.5.18) can be replaced by R φ(t)X(t) dt, with X(t) defined as in (8.5.16). This establishes the representation (8.5.14). To establish the autoregressive form in (8.5.17), observe that itω ˜ ˜ Y (t) ≡ e d(ω) Z(dω) = A eitω d(ω) 1 + c˜(ω) U (dω) R R itω = A e c˜(ω) U (dω) = X(t), R
the integrals being well defined and equal a.s. from the assumption that c˜ ∈ L2 (R), from which it follows that d˜ ∈ L2 (Γ). If ξ 0 is a strict-sense random measure, then the time-domain integral (8.5.17) is well defined for φ ∈ L
1 (R) and can be identified a.s. with its frequency-domain version Y (t) above. If ξ 0 is merely a wide-sense process, then (8.5.17) can be defined only as a mean square limit, which will exist whenever d˜ ∈ L2 (Γ). In either case, therefore, X(t) = Y (t) a.s. Equation (8.5.14) can be combined with equations (8.5.16) and (8.5.17) to yield the abbreviated but suggestive forms set out below; they embody the essential content of the moving average and autoregressive representations in the present context. Corollary 8.5.V. With the same assumptions and notation as in Proposition 8.5.IV, t− c(t − u) V (du) dt a.s., (8.5.20) ξ 0 (dt) = V (dt) + −∞ t−
ξ 0 (dt) = V (dt) +
−∞
d(t − u) ξ 0 (du) dt
a.s.
(8.5.21)
8.5.
Linear Filters and Prediction
353
There is a close analogy between (8.5.20) and the martingale decomposition of the cumulative process outlined in the previous chapter: the first term in (8.5.20) corresponds to the martingale term, or innovation, while the second corresponds to the conditional intensity. The difference lies in the fact that the second term in (8.5.20) is necessarily representable as a linear combination of past values, whereas the conditional intensity, its analogue in the general situation, is not normally a linear combination of this type.

Finally, we can use the results of the proposition to establish the forms of the best linear predictors when the assumptions of Proposition 8.5.IV hold. Consider specifically the problem of predicting forward the integral
$$Q \equiv \int_{\mathbb{R}} \phi(s)\,\xi^0(ds) \quad \text{a.s.} \tag{8.5.22}$$
from observations on $\xi^0(\cdot)$ up to time $t$. The best linear predictor, in the mean square sense, is just the projection of $\phi$ onto the Hilbert space $L_2(\xi^0; t)$. From equations (8.5.14) and (8.5.20), we see that it can be written as
$$\hat Q_t = \int_{-\infty}^{t} \phi(s)\,\xi^0(ds) + \int_{t}^{\infty} \phi(s)\hat X_t(s)\,ds \quad \text{a.s.}, \tag{8.5.23}$$
where for $s > t$,
$$\hat X_t(s) = \int_{-\infty}^{t} c(s-u)\,V(du) \quad \text{a.s.} \tag{8.5.24}$$
The truncated function
$$c_{st}(u) = \begin{cases} c(u) & (u > s-t), \\ 0 & (u \le s-t), \end{cases}$$
is in $L_2(\mathbb{R})$ when $c$ is, and the same is therefore true of its Fourier transform. Consequently, the random integrals in the definitions of $\hat X_t(s)$ and $\hat Q_t$ are well defined by the same argument as used in proving Proposition 8.5.IV.

Equation (8.5.24) already gives an explicit form for the predictor, but it is not convenient for direct use since it requires the computation of $V(\cdot)$. In practice, the autoregressive representation of $\hat X_t(s)$ is more useful. To find it, observe that
$$\hat X_t(s) = \int_{-\infty}^{t} c_{st}(s-u)\,V(du) = \int_{\mathbb{R}} \tilde c_{st}(\omega)\,U(d\omega) = \int_{\mathbb{R}} \tilde c_{st}(\omega)\bigl[1 - \tilde d(\omega)\bigr]\,Z(d\omega)$$
$$= \int_{-\infty}^{t} \left( c(s-u) - \int_{0}^{t-u} c(s-u-v)d(v)\,dv \right) \xi^0(du) \quad \text{a.s.} \tag{8.5.25}$$
The integral is well defined not only in the mean square sense but also in the a.s. sense if $d \in L_1(\mathbb{R})$. In this case, the integrand in (8.5.25) can also be written in the form
$$d(s-u) + \int_{t-u}^{s-u} c(s-u-v)d(v)\,dv,$$
which is then the sum of two $L_1(\mathbb{R})$ functions, both of which can be integrated against $\xi^0$. These arguments are enough to establish the validity of the autoregressive form (8.5.25) as an alternative to (8.5.24). It is important to emphasize that $\hat X_t(s)$ is to be interpreted as the predictor of the intensity of the $\xi^0$ process at time $s > t$, or in abbreviated notation,
$$\hat X_t(s)\,ds = \mathrm{E}[\xi^0(ds) \mid \mathcal{H}_t] = \mathrm{E}[\lambda(s) \mid \mathcal{H}_t], \tag{8.5.26}$$
where both expectations are to be understood only in the sense of Hilbert-space projections. Thus, the assumptions of Proposition 8.5.IV imply that the intensity is predicted forward as a mean square continuous function of the past. In contrast to the case where the process itself is mean square continuous, when the predictors may involve differentiations, here they are always smoothing operators.

The discussion can be summarized as follows.

Proposition 8.5.VI. Under the conditions of Proposition 8.5.IV, the best linear predictor of the functional $Q$ in (8.5.22), given the history $\mathcal{H}_t$ of the $\xi^0$ process on $(-\infty, t]$, is as in (8.5.23), in which the mean square continuous process $\hat X_t(s)$ may be regarded as the best linear predictor of the ‘intensity’ $\xi^0(ds)/ds$ for $s > t$ and has the moving average representation (8.5.24) and the autoregressive representation
$$\hat X_t(s) = \int_{-\infty}^{t} h_t(s-u)\,\xi^0(du),$$
where
$$h_t(s-u) = c(s-u) - \int_{0}^{t-u} c(s-u-v)d(v)\,dv = d(s-u) + \int_{t-u}^{s-u} c(s-u-w)d(w)\,dw. \tag{8.5.27}$$
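As a plausibility check on (8.5.27), the two expressions for $h_t(s-u)$ can be compared numerically for the exponential kernels $c(t) = (\alpha-\beta)e^{-\beta t}$, $d(t) = (\alpha-\beta)e^{-\alpha t}$ of the rational-spectrum example treated below. The following sketch is not part of the original text; the parameter values are illustrative, and a simple midpoint quadrature stands in for the exact integrals.

```python
import math

alpha, beta = 2.0, 0.5          # illustrative parameters with alpha > beta > 0
s, t, u = 3.0, 2.0, 1.0         # prediction time s > t, history point u < t

c = lambda v: (alpha - beta) * math.exp(-beta * v)   # moving average kernel
d = lambda v: (alpha - beta) * math.exp(-alpha * v)  # autoregressive kernel

def integrate(f, a, b, n=20000):
    """Midpoint-rule quadrature, accurate enough for these smooth integrands."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# First form in (8.5.27): c(s-u) - int_0^{t-u} c(s-u-v) d(v) dv
form1 = c(s - u) - integrate(lambda v: c(s - u - v) * d(v), 0.0, t - u)
# Second form: d(s-u) + int_{t-u}^{s-u} c(s-u-w) d(w) dw
form2 = d(s - u) + integrate(lambda w: c(s - u - w) * d(w), t - u, s - u)
# Closed form obtained in Example 8.5(d): (alpha-beta) e^{-beta(s-t)} e^{-alpha(t-u)}
closed = (alpha - beta) * math.exp(-beta * (s - t)) * math.exp(-alpha * (t - u))

print(form1, form2, closed)
```

All three values agree to quadrature accuracy, confirming that the two representations in (8.5.27) describe the same kernel.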
Returning to the original random measure $\xi$ (as distinct from $\xi^0$), we obtain the following straightforward corollary, stated in the abbreviated form analogous to (8.5.26).

Corollary 8.5.VII. The random measure $\xi$ can be predicted forward with predicted intensity at $s > t$ given by
$$\mathrm{E}[\xi(ds) \mid \mathcal{H}_t] = \bigl(m + \hat X_t(s)\bigr)\,ds,$$
where the conditional expectation is to be understood in the sense of a Hilbert-space projection.

Example 8.5(d) A point process with rational spectral density [continued from Example 8.5(c)]. Consider the case where
$$\gamma(\omega) = \frac{A^2(\alpha^2 + \omega^2)}{\beta^2 + \omega^2}. \tag{8.5.28}$$
From the form of $\tilde g(\omega)$ as earlier, it follows that
$$\tilde c(\omega) = \frac{\alpha-\beta}{\beta - i\omega}, \qquad \tilde d(\omega) = \frac{\alpha-\beta}{\alpha - i\omega},$$
$$c(t) = (\alpha-\beta)e^{-\beta t}, \qquad d(t) = (\alpha-\beta)e^{-\alpha t} \qquad (t \ge 0).$$
Substituting into (8.5.27), we find
$$h_t(s-u) = (\alpha-\beta)e^{-\beta(s-u)} - (\alpha-\beta)^2 e^{-\beta(s-u)} \int_0^{t-u} e^{-(\alpha-\beta)v}\,dv = (\alpha-\beta)e^{-\beta(s-t)} e^{-\alpha(t-u)},$$
so that
$$\hat X_t(s) = (\alpha-\beta)e^{-\beta(s-t)} \int_{-\infty}^{t} e^{-\alpha(t-u)}\,\xi^0(du) \quad \text{a.s.} \tag{8.5.29}$$
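The exponential smoothing in (8.5.29) is straightforward to realize computationally. The sketch below is not from the original text: the event times and parameters are illustrative, the integral over $(-\infty, w_0]$ is truncated away, and the mean-rate correction converting the counting measure $N$ into $\xi^0$ is applied over the finite observation window.

```python
import math

alpha, beta = 2.0, 0.5      # illustrative; beta = alpha(1 - nu) for a Hawkes process
m = 1.0                     # mean rate, so xi^0(du) = N(du) - m du
w0, t = 0.0, 10.0           # observation window (w0, t]
events = [1.3, 2.7, 4.1, 4.2, 7.9, 9.5]   # hypothetical observed event times <= t

def xhat(s):
    """Best linear predictor (8.5.29) of the zero-mean intensity at s > t."""
    smoothed_N = sum(math.exp(-alpha * (t - ti)) for ti in events)
    smoothed_leb = (m / alpha) * (1.0 - math.exp(-alpha * (t - w0)))
    return (alpha - beta) * math.exp(-beta * (s - t)) * (smoothed_N - smoothed_leb)

# The predicted intensity of xi itself is m + xhat(s); the prediction decays
# geometrically in the lead time s - t at rate beta:
r = xhat(12.0) / xhat(11.0)
print(xhat(11.0), xhat(12.0), r)
```

The ratio `r` equals $e^{-\beta}$ exactly, reflecting the pure exponential relaxation of the predictor toward the mean rate.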
Thus, the predictor here is a form of exponential smoothing of the past. How well it performs relative to the full predictor, based on complete information about the past, depends on the particular process that is under consideration. The most instructive and tractable example is again the Hawkes process, which, in order to reproduce the second-order properties above, should have a complete conditional intensity of the special form as in Exercise 7.2.5,
$$\lambda^*(t) = \lambda + \nu \int_{-\infty}^{t-} \alpha e^{-\alpha(t-u)}\,N(du) \equiv \lambda + \nu\alpha Y(t), \quad \text{say}, \tag{8.5.30}$$
which leads to (8.5.28) with $A^2 = \lambda/2\pi$, $\beta = \alpha(1-\nu)$ [see equation (8.2.10)]. The full predictor can be found by taking advantage of the special form of the intensity, which implies that the quantity $Y(t)$ as above and in Exercise 7.2.5 is Markovian. Defining $m(t) = \mathrm{E}[Y(t)] = \int_0^\infty y\,F_t(dy)$, we find by integrating (7.2.12) that $m(t)$ satisfies the ordinary differential equation
$$\frac{dm(t)}{dt} = -\beta m(t) + \lambda$$
with solution
$$m(t) = \frac{\lambda}{\beta} + \left( m(0) - \frac{\lambda}{\beta} \right) e^{-\beta t}.$$
To apply this result to the nonlinear prediction problem analogous to that solved by $\hat X_t(s)$ in the linear case, we should set $m(0) = Y(t)$ and consider $m(s-t)$, which gives the solution
$$\hat X_t^*(s) \equiv \mathrm{E}[\lambda^*(s) \mid \mathcal{H}_t] = \lambda + \nu\alpha \mathrm{E}[Y(s) \mid Y(t)] = \lambda + \nu\alpha\, m(s-t) = \frac{\lambda}{1-\nu} + \nu\alpha \left( Y(t) - \frac{\lambda}{\beta} \right) e^{-\beta(s-t)}.$$
Replacing $Y(t)$ by its representation in terms of the past of the process as in (8.5.30) leads back to (8.5.29). Thus, for a Hawkes process with exponential infectivity function, the best linear predictor of the future intensity equals the best nonlinear predictor of the future intensity. It appears to be an open question whether this result extends to other Hawkes processes or to other stationary point processes. Linear and nonlinear predictors for an example of a renewal process with rational spectral density are discussed in Exercise 8.5.2.

Example 8.5(e) Two-point Poisson cluster process [continued from Example 8.5(b)]. While this example does not satisfy the assumptions of the preceding discussion, it is simple enough to handle directly. From the expression for $\tilde g(\omega)$ given earlier, the moving average representation can be written in the form
$$\xi^0(dt) = (\mu/2\pi)^{1/2}\{V(dt) + V(dt-h)\}.$$
The reciprocal has the form
$$1/\tilde g(\omega) = (2\pi/\mu)^{1/2}(1 + e^{i\omega h})^{-1},$$
which, if we proceed formally, can be regarded as being the sum of an infinite series corresponding to the time-domain representation
$$V(dt) = \sqrt{2\pi/\mu}\,\bigl[\xi^0(dt) - \xi^0(dt-h) + \xi^0(dt-2h) - \cdots\bigr].$$
In fact, the sum is a.s. finite and has the effect of retaining in $V$ only those atoms in $\xi^0$ that are not preceded by a further atom $h$ time units previously; that is, of retaining the atoms at cluster centres but rejecting their cluster companions. From this, it is clear that the process $V(\cdot)$ is just a scaled version of the zero-mean version of the original Poisson process of cluster centres, and the moving average representation is simply a statement of how the clusters are formed. It is now easy to form linear predictors: we have
$$\hat\xi^0(ds \mid \mathcal{H}_t) = \begin{cases} 0 & (s-t > h), \\ (\mu/2\pi)^{1/2}\,V(ds-h) & (0 \le s-t \le h), \end{cases}$$
and on $0 \le s-t \le h$ we also have
$$\hat\xi^0(ds \mid \mathcal{H}_t) = \sum_{j=1}^{\infty} (-1)^{j-1}\,\xi^0(ds - jh).$$
The effect of the last formula is to scan the past to see if there is an atom at $s-h$ not preceded by a further atom at $s-2h$: the predictor predicts an atom at $s$ when this is the case and nothing otherwise.
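The scanning rule just described can be checked against the alternating sum directly. The sketch below is not from the original text; it works with the counting measure $N$ rather than its mean-corrected version $\xi^0$ (which only shifts the predictor by a constant), and the atom configuration is a hypothetical realization with two clusters, centred at 0 and 3 with companion lag $h = 1$.

```python
h = 1.0                                   # cluster companion lag
atoms = {0.0: 1, 1.0: 1, 3.0: 1, 4.0: 1}  # atom position -> multiplicity
                                          # (clusters {0, 1} and {3, 4})

def N(x):
    """Multiplicity of the observed atom at x (0 if no atom there)."""
    return atoms.get(x, 0)

def predict(s, terms=10):
    """Alternating sum sum_{j>=1} (-1)^{j-1} N(s - jh) from Example 8.5(e)."""
    return sum((-1) ** (j - 1) * N(s - j * h) for j in range(1, terms + 1))

print(predict(4.0), predict(5.0), predict(2.0))
```

The sum predicts an atom at $s = 4$ (the atom at 3 is a cluster centre, so its companion is due), but nothing at $s = 5$ or $s = 2$ (the atoms at 4 and 1 are themselves companions).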
Exercises and Complements to Section 8.5

8.5.1 Renewal processes with rational spectral density. Show that the Bartlett spectrum for the renewal process considered in Exercise 4.2.4 with interval density $\mu^2 x e^{-\mu x}$ has the form
$$\gamma(\omega) = \frac{\mu}{4\pi}\,\frac{\omega^2 + 2\mu^2}{\omega^2 + 4\mu^2}.$$

8.5.2 Linear and nonlinear prediction of a renewal process.
(a) Show that for any renewal process the best nonlinear predictor $\mathrm{E}[\lambda^*(t+s) \mid \mathcal{H}_t]$ for the intensity is the renewal density for the delayed renewal process in which the initial lifetime has d.f. $[F(B_t + s) - F(B_t)]/[1 - F(B_t)]$, where $B_t$ is the backward recurrence time at time $t$.
(b) Find explicitly the best predictor for the process in Exercise 8.5.1.
(c) Find the canonical factorization of the spectrum of the renewal process in Exercise 8.5.1, and find the best linear predictor $\hat X_t(s)$ in terms of the backward recurrence time $B_t$ at $t$. When does it coincide with the best nonlinear predictor in (b)?
(d) Investigate the expected information gain per event based on the use of the linear and nonlinear predictors outlined above.
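The closed form in Exercise 8.5.1 can be checked numerically. The sketch below is not part of the text; it assumes the standard renewal-spectrum formula $\gamma(\omega) = (\lambda/2\pi)\,\mathrm{Re}\bigl[(1+\tilde f(\omega))/(1-\tilde f(\omega))\bigr]$ for a renewal process with rate $\lambda$ and interval characteristic function $\tilde f$. Here $f(x)=\mu^2 x e^{-\mu x}$ gives $\tilde f(\omega) = \mu^2/(\mu - i\omega)^2$ and $\lambda = \mu/2$ (the mean interval is $2/\mu$).

```python
import cmath, math

mu = 1.7  # illustrative parameter of the Erlang-2 interval density

def gamma_formula(w):
    """gamma(omega) via the renewal-spectrum formula (assumed; see lead-in)."""
    f = mu**2 / (mu - 1j * w) ** 2    # characteristic function of mu^2 x e^{-mu x}
    lam = mu / 2.0                    # rate = 1 / mean interval
    return (lam / (2 * math.pi)) * ((1 + f) / (1 - f)).real

def gamma_closed(w):
    """Closed form stated in Exercise 8.5.1."""
    return (mu / (4 * math.pi)) * (w**2 + 2 * mu**2) / (w**2 + 4 * mu**2)

vals = [(gamma_formula(w), gamma_closed(w)) for w in (0.3, 1.0, 2.5, 10.0)]
print(vals)
```

The two evaluations agree to machine precision at every test frequency (away from $\omega = 0$, where the formula is singular).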
8.6. P.P.D. Measures

In this section, we briefly develop the properties of p.p.d. measures required for the earlier sections of this chapter. We follow mainly the work of Vere-Jones (1974) and Thornett (1979); related material, in a more abstract setting, is in Berg and Forst (1975). No significant complications arise in developing the theory for $\mathbb{R}^d$ rather than for the line, so we follow this practice, although most of the examples are taken from the one-dimensional context.

Since the measures we deal with are not totally finite in general, we must first define what is meant by a Fourier transform in this context. As in the theory of generalized functions (see e.g. Schwartz, 1951), we make extensive use of Parseval identities
$$\int_{\mathbb{R}^d} \psi(x)\,\nu(dx) = \int_{\mathbb{R}^d} \tilde\psi(\omega)\,\mu(d\omega) \tag{8.6.1}$$
to identify the measure $\nu$ as the Fourier transform of the measure $\mu$ in (8.6.1). Here
$$\tilde\psi(\omega) = \int_{\mathbb{R}^d} e^{ix\cdot\omega}\psi(x)\,dx$$
is the ordinary ($d$-dimensional) Fourier transform of $\psi(\cdot)$, but such functions must be suitably restricted. A convenient domain for $\psi$ is the space $\mathcal{S}$ of real or complex functions of rapid decay; that is, of infinitely differentiable functions that, together with their derivatives, satisfy inequalities of the form
$$\left| \frac{\partial^k \psi(x)}{\partial x_1^{k_1} \cdots \partial x_d^{k_d}} \right| \le \frac{C(k, r)}{(1 + |x|)^r}$$
for some constants $C(k, r) < \infty$, all positive integers $r$, and all finite families of nonnegative integers $(k_1, \ldots, k_d)$ with $k_1 + \cdots + k_d = k$.

The space $\mathcal{S}$ has certain relevant properties, proofs of which are sketched in Exercise 8.6.1:
(i) $\mathcal{S}$ is invariant under the Fourier transformation taking $\psi$ into $\tilde\psi$.
(ii) $\mathcal{S}$ is invariant under multiplication or convolution by real- or complex-valued integrable functions $g$ on $\mathbb{R}^d$ such that both $g$ and $\tilde g$ are zero-free.
(iii) Integrals with respect to all functions $\psi \in \mathcal{S}$ uniquely determine any boundedly finite measure on $\mathbb{R}^d$.

The following definitions collect together some properties of boundedly finite measures that are important in the sequel. We use the notation, for complex-valued functions $\psi$ and $\phi$,
$$(\psi * \phi)(x) = \int_{\mathbb{R}^d} \psi(y)\phi(x-y)\,dy, \qquad \psi^*(x) = \overline{\psi(-x)},$$
so that
$$(\psi * \psi^*)(x) = \int_{\mathbb{R}^d} \psi(y)\overline{\psi(y-x)}\,dy.$$

Definition 8.6.I. A boundedly finite signed measure $\mu(\cdot)$ on $\mathbb{R}^d$ is
(i) translation-bounded if for all $h > 0$ and $x \in \mathbb{R}^d$ there exists a finite constant $K_h$ such that, for every sphere $S_h(x)$ with centre $x \in \mathbb{R}^d$ and radius $h$,
$$\bigl|\mu\bigl(S_h(x)\bigr)\bigr| \le K_h; \tag{8.6.2}$$
(ii) positive-definite if for all bounded measurable functions $\psi$ of bounded support,
$$\int_{\mathbb{R}^d} (\psi * \psi^*)(x)\,\mu(dx) \ge 0; \tag{8.6.3}$$
(iii) transformable if there exists a boundedly finite measure $\nu$ on $\mathbb{R}^d$ such that (8.6.1) holds for all $\psi \in \mathcal{S}$;
(iv) a p.p.d. measure if it is nonnegative (i.e. a measure rather than a signed measure) and positive-definite.

A few comments on these definitions are in order. The concept of translation boundedness appears naturally in this context and is discussed further by Lin (1965), Argabright and de Lamadrid (1974), Thornett (1979), and Robertson and Thornett (1984). If $\mu$ is nonnegative, then it is clear that if (8.6.2) holds for some $h > 0$ it holds for all such $h$. The notion of positive-definiteness in (8.6.3) is a direct extension of the same notion for continuous functions; indeed, if $\mu$ is absolutely continuous, then it is positive-definite in the sense of (8.6.3) if and only if its density is a positive-definite function in the usual sense. Concerning the Parseval relation in (8.6.1), it is important to note that if the measure $\mu$ is transformable, then $\nu$ is uniquely determined by $\mu$ and conversely. Equation (8.6.1) generalises the relation
$$c(x) = \int_{\mathbb{R}^d} e^{i\omega\cdot x}\,F(d\omega)$$
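Definition 8.6.I(ii) can be probed numerically with step functions $\psi$. For the p.p.d. measure $\mu = \text{(Lebesgue measure)} + \text{(unit atom at }0)$, the defining integral reduces to $|\int\psi|^2 + \int|\psi|^2 \ge 0$. The discrete sketch below is not from the text; it verifies this identity for an arbitrary real step function, approximating the convolution $\psi*\psi^*$ on a lag grid.

```python
import random

random.seed(42)
dx = 0.1                                            # grid step of the step function
psi = [random.uniform(-1, 1) for _ in range(50)]    # step heights (real-valued)
n = len(psi)

def conv_star(k):
    """(psi * psi^*)(k dx) ~ dx * sum_j psi_j psi_{j-k} for real psi."""
    return dx * sum(psi[j] * psi[j - k] for j in range(n) if 0 <= j - k < n)

# integral of (psi * psi^*) against mu = Lebesgue + delta_0:
lebesgue_part = dx * sum(conv_star(k) for k in range(-(n - 1), n))
atom_part = conv_star(0)
total = lebesgue_part + atom_part

# closed-form identity for this mu: |int psi|^2 + int |psi|^2
check = (dx * sum(psi)) ** 2 + dx * sum(v * v for v in psi)
print(total, check)
```

The two evaluations agree and are nonnegative, as (8.6.3) requires.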
for the covariance density in terms of the spectral measure $F$ of a mean square continuous process, to which it reduces (with the appropriate identifications) when the random measure and associated covariance measure are absolutely continuous.

Our main interest is in the class of p.p.d. measures on $\mathbb{R}^d$, denoted below by $\mathcal{P}^+$. Some examples may help to indicate the scope and character of $\mathcal{P}^+$.

Example 8.6(a) Some examples of p.p.d. measures.
(1°) A simple counterexample. The measure on $\mathbb{R}$ with unit mass at each of the two points $\pm 1$ is not a p.p.d. measure because its Fourier transform $2\cos\omega$ can take negative values and it thus fails to be positive-definite. On the other hand, the convolution of this measure with itself (i.e. the measure with unit mass at $\pm 2$ and mass of two units at $0$) is a p.p.d. measure, and its Fourier transform is the boundedly finite (but not totally bounded) measure with density $4\cos^2\omega$. This also shows that the convolution square root measure of a p.p.d. measure need not be p.p.d.
(2°) Absolutely continuous p.p.d. measures. Every nonnegative positive-definite function defines the density of an absolutely continuous p.p.d. measure.
(3°) Counting measure. Let $\mu$ have unit mass at every $2\pi j$ for $j = 0, \pm 1, \ldots$. Then, for $\psi \in \mathcal{S}$, (8.6.1) reduces to the Poisson summation formula (see Exercise 8.6.4 for details)
$$\sum_{n=-\infty}^{\infty} \psi(n) = \sum_{j=-\infty}^{\infty} \tilde\psi(2\pi j);$$
that is, $\mu$ has as its Fourier transform the measure $\nu$ with unit mass at each of the integers $n = 0, \pm 1, \ldots$. It also shows that $\nu$, and thus $\mu$ as well, is positive-definite (take $\psi$ a function of the form $\phi * \phi^*$ so that the right-hand side becomes $\sum_j |\tilde\phi(2\pi j)|^2 \ge 0$).
(4°) Closure under product. Let $\mu_1, \ldots, \mu_d$ be p.p.d. measures on $\mathbb{R}$ with Fourier transforms $\tilde\mu_1, \ldots, \tilde\mu_d$. Then, the product measure $\mu_1 \times \cdots \times \mu_d$ is a p.p.d. measure on $\mathbb{R}^d$ with Fourier transform $\tilde\mu_1 \times \cdots \times \tilde\mu_d$.

A simple and elegant theory for measures in $\mathcal{P}^+$ and their Fourier transforms can be developed by the standard device of approximating $\mu$ by a smoothed version obtained by convoluting $\mu$ with a suitable smoothing function such as the symmetric probability densities
$$t(x) = (1 - |x|)^+ \quad \text{(triangular density)}, \qquad e_\lambda(x) = \tfrac{1}{2}\lambda e^{-\lambda|x|} \quad \text{(two-sided exponential density)},$$
and their multivariate extensions
$$t(x) = \prod_{i=1}^{d} (1 - |x_i|)^+, \tag{8.6.4a}$$
$$e_\lambda(x) = \left(\tfrac{1}{2}\lambda\right)^d \exp\left( -\lambda \sum_{i=1}^{d} |x_i| \right). \tag{8.6.4b}$$
Observe that
$$t(x) = \int_{\mathbb{R}^d} I_{U^d}(x - y)\, I_{U^d}(-y)\,dy. \tag{8.6.4a$'$}$$
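The Poisson summation identity in Example 8.6(a)(3°) is easy to verify for a rapidly decaying $\psi$. The sketch below is not from the text; it uses a Gaussian, for which the transform under the convention $\tilde\psi(\omega)=\int e^{ix\omega}\psi(x)\,dx$ has the known closed form $\tilde\psi(\omega)=\sqrt{2\pi}\,e^{-\omega^2/2}$.

```python
import math

def psi(x):
    """Gaussian test function psi(x) = exp(-x^2/2)."""
    return math.exp(-x * x / 2.0)

def psi_tilde(w):
    """Its Fourier transform sqrt(2 pi) exp(-w^2/2)."""
    return math.sqrt(2 * math.pi) * math.exp(-w * w / 2.0)

lhs = sum(psi(n) for n in range(-50, 51))                    # sum_n psi(n)
rhs = sum(psi_tilde(2 * math.pi * j) for j in range(-5, 6))  # sum_j psi-tilde(2 pi j)
print(lhs, rhs)
```

Both sums evaluate to about $2.50662\ldots$; the truncation errors are far below the displayed precision because of the Gaussian decay.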
We are now in a position to establish the basic properties of $\mathcal{P}^+$.

Proposition 8.6.II. (a) $\mathcal{P}^+$ is a closed positive cone in $\mathcal{M}^\#(\mathbb{R}^d)$.
(b) Every p.p.d. measure is symmetric and translation-bounded.

Proof. In (a), we mean by ‘a positive cone’ a set closed under the formation of positive linear combinations. Then (a) is just the statement that if a sequence of boundedly finite measures in $\mathbb{R}^d$ converges vaguely to a limit, and if each measure in the sequence is positive-definite, then so is the limit. This follows directly from the definition of vague convergence and the defining relation (8.6.3).

Now let $\mu$ be a p.p.d. measure on $\mathbb{R}^d$, and convolve it with $t(\cdot)$ as in (8.6.4a) so that the convolution is well defined. The resultant function
$$c(x) \equiv \int_{\mathbb{R}^d} t(x - y)\,\mu(dy) \tag{8.6.5}$$
is real-valued, continuous, and for all bounded measurable $\psi$ of bounded support it satisfies, because of (8.6.4a$'$),
$$\int_{\mathbb{R}^d} c(u)(\psi * \psi^*)(u)\,du = \int_{\mathbb{R}^d} \bigl[(\psi * I_{U^d}) * (\psi * I_{U^d})^*\bigr](y)\,\mu(dy) \ge 0;$$
note that (8.6.3) applies because $\psi * I_{U^d}$ is measurable and bounded with bounded support whenever $\psi$ is. In other words, the function $c(\cdot)$ is real-valued and positive-definite and hence, from standard properties of such functions, also symmetric and bounded. Since $t(\cdot)$ is symmetric, it is clear that $c(\cdot)$ is symmetric if and only if $\mu$ is symmetric, which must therefore hold. Finally, it follows from the positivity of $\mu$ and the inequality $t(x) \ge 2^{-d}$ for $\|x\| \le \frac{1}{4}$ that if $K$ is a bound for $c(\cdot)$,
$$\mu\bigl(S_{1/4}(x)\bigr) \le 2^d \int_{S_{1/4}(x)} c(y)\,dy \le 2^d K < \infty.$$
Inequality (8.6.2) is thus established for the case $h = \frac{1}{4}$, and since $\mu$ is nonnegative, its validity for any other value of $h$ is now apparent.

The Fourier transform properties can be established by similar arguments, though it is now more convenient to work with the double exponential function $e_\lambda(\cdot)$ because its Fourier transform
$$\tilde e_\lambda(\omega) = \prod_{i=1}^{d} \frac{\lambda^2}{\lambda^2 + \omega_i^2}$$
has no real zeros. The existence of the convolution $\mu * e_\lambda$ follows from the translation boundedness just established. The relation
$$d_\lambda(x) = \int_{\mathbb{R}^d} e_\lambda(x - y)\,\mu(dy)$$
again defines a continuous positive-definite function. By Bochner's theorem in $\mathbb{R}^d$, it can therefore be represented as the Fourier transform
$$d_\lambda(x) = \int_{\mathbb{R}^d} e^{i\omega\cdot x}\,G_\lambda(d\omega)$$
for some totally finite measure $G_\lambda(\cdot)$. Now let $\psi(\omega)$ be an arbitrary element of $\mathcal{S}$, and consider the function $\tilde\kappa(\omega)$ defined by
$$\tilde\kappa(\omega) = (1 + \omega^2)\psi(-\omega)/(2\pi)^d.$$
Then $\tilde\kappa \in \mathcal{S}$ also, and hence $\tilde\kappa$ is the Fourier transform of some integrable function $\kappa$ satisfying $\psi(y) = (\kappa * e_1)(y)$. From the Fourier representation of $d_1$, we have
$$\int_{\mathbb{R}^d} \kappa(x)\,d_1(x)\,dx = \int_{\mathbb{R}^d} \tilde\kappa(\omega)\,G_1(d\omega)$$
for all integrable $\kappa$ and hence in particular for the function $\kappa$ just constructed. Substituting for $\kappa$, we obtain, for all $\psi \in \mathcal{S}$,
$$\int_{\mathbb{R}^d} \psi(y)\,\mu(dy) = \int_{\mathbb{R}^d} (\kappa * e_1)(y)\,\mu(dy) = \int_{\mathbb{R}^d} \kappa(x)\,d_1(x)\,dx = \int_{\mathbb{R}^d} \tilde\kappa(\omega)\,G_1(d\omega) = \frac{1}{(2\pi)^d}\int_{\mathbb{R}^d} \psi(\omega)(1 + \omega^2)\,G_1(-d\omega).$$
We now define the measure $\nu$ by
$$\nu(d\omega) = (2\pi)^{-d}(1 + \omega^2)\,G_1(-d\omega)$$
and observe that $\nu$ is boundedly finite and satisfies the equation (8.6.1), which represents $\nu$ as the Fourier transform of $\mu$. Thus, we have shown that any p.p.d. measure $\mu$ is transformable.

Recall that $\mathcal{S}$ is preserved under the mapping $\psi \to \tilde\psi$. Then, interchanging the roles of $\psi$ and $\tilde\psi$ in (8.6.1) shows that every p.p.d. measure is itself a transform and hence that $\nu$ is positive-definite as well as positive; that is, it is itself a p.p.d. measure. Since the determining properties of $\mathcal{S}$ imply that each of the two measures in (8.6.1) is uniquely determined by the other, we have established the principal result of the following theorem.
Theorem 8.6.III. Every p.p.d. measure $\mu(\cdot)$ is transformable, and the Parseval equation (8.6.1) establishes a one-to-one mapping of $\mathcal{P}^+$ onto itself. This mapping can also be represented by the inversion formulae: for bounded $\nu$-continuity sets $A$,
$$\nu(A) = \lim_{\lambda\to\infty} \int_{\mathbb{R}^d} \tilde I_A(\omega)\,\tilde e_\lambda(\omega)\,\mu(d\omega); \tag{8.6.6}$$
for bounded $\mu$-continuity sets $B$,
$$\mu(B) = \lim_{\lambda\to\infty} \frac{1}{(2\pi)^d} \int_{\mathbb{R}^d} \tilde I_B(-x)\,\tilde e_\lambda(-x)\,\nu(dx); \tag{8.6.7}$$
$$\nu(\{a\}) = \lim_{T\to\infty} \frac{1}{(2\pi T)^d} \int_{U_T^d} e^{-i\omega\cdot a}\,\mu(d\omega); \tag{8.6.8}$$
$$\mu(\{b\}) = \lim_{T\to\infty} \frac{1}{(2\pi T)^d} \int_{U_T^d} e^{ix\cdot b}\,\nu(dx). \tag{8.6.9}$$
For all Lebesgue integrable $\phi$ for which $\tilde\phi$ is $\mu$-integrable, there holds the extended Parseval relation
$$\int_{\mathbb{R}^d} \phi(x + y)\,\nu(dy) = \int_{\mathbb{R}^d} e^{i\omega\cdot x}\,\tilde\phi(\omega)\,\mu(d\omega) \quad \text{(a.e. } x\text{)}. \tag{8.6.10}$$
Proof. It remains to establish the formulae (8.6.6–10), all of which are effectively corollaries of the basic identity (8.6.1). Suppose first that $A$ is a bounded continuity set for $\nu(\cdot)$ and hence a fortiori for the smoothed version $\nu * e_\lambda$. Then, for all finite $\lambda$, it is a consequence of the Parseval theorem that
$$(\nu * e_\lambda)(A) = \int_{\mathbb{R}^d} \tilde I_A(\omega)\,\tilde e_\lambda(\omega)\,\mu(d\omega).$$
Now letting $\lambda \to \infty$, the left-hand side $\to \nu(A)$ by standard properties of weak convergence since it is clear that $\nu * e_\lambda \to \nu$ weakly on the closure $\bar A$ of $A$. This proves (8.6.6), and a dual argument gives (8.6.7).

To establish (8.6.8), consider again the convolution with the triangular density $t(\cdot)$. Changing the base of the triangle from $(-1, 1)$ to $(-h, h)$ ensures that the Fourier transform $\tilde t(\omega)$ does not vanish at $\omega = a$ for any given $a$. Now check via the Parseval identity that the totally finite spectral measure corresponding to the continuous function $c(x)$ in (8.6.5) can be identified with $\tilde t(\omega)\nu(d\omega)$. Then, standard properties of continuous positive-definite functions imply
$$\tilde t(a)\,\nu(\{a\}) = \lim_{T\to\infty} \frac{1}{(4\pi T)^d} \int_{U_{2T}^d} e^{-ia\cdot x} c(x)\,dx. \tag{8.6.11}$$
Consider
$$D_T \equiv \tilde t(a) \int_{U_{2T}^d} e^{-ia\cdot x}\,\mu(dx) - \int_{U_{2T}^d} e^{-ia\cdot x} c(x)\,dx,$$
which on using the definition of $c(\cdot)$ as the convolution $t * \mu$ yields
$$D_T = \int_{\mathbb{R}^d} e^{-ia\cdot x}\,\mu(dx) \left\{ \tilde t(a)\, I_{U_{2T}^d}(x) - \int_{-T-x_1}^{T-x_1} \cdots \int_{-T-x_d}^{T-x_d} e^{-ia\cdot y}\,t(y)\,dy \right\}.$$
The expression inside the braces vanishes both inside the hypercube with vertices $\pm(T-h), \ldots, \pm(T-h)$, since the second integral then reduces to $\tilde t(a)$, and outside the hypercube with vertices $\pm(T+h), \ldots, \pm(T+h)$, since both terms are then zero. Because $\mu$ is translation-bounded, there is an upper bound, $K_h$ say, on the mass it allots to any hypercube with edge of length $2h$. The number of such hypercubes needed to cover the region where the integrand is nonzero is certainly bounded by $2d(2 + T/h)^{d-1}$, within which region the integrand is bounded by $M$, say. Thus,
$$\frac{|D_T|}{(4\pi T)^d} \le \frac{2d}{(4\pi)^d}\left(\frac{1}{h} + \frac{2}{T}\right)^{d-1} \frac{M K_h}{T} \to 0 \qquad (T\to\infty).$$
Equation (8.6.8) now follows from (8.6.11), and (8.6.9) follows by a dual argument with the roles of $\mu$ and $\nu$ interchanged.

It is already evident by analogy with the argument used in constructing $\nu(\cdot)$ that the Parseval relation (8.6.1) holds not only for $\psi \in \mathcal{S}$ but also for any function of the form $(\phi * e_\lambda)(x)$, where $\phi$ is integrable. In particular, any function of the form
$$\theta(x) = \int_{\mathbb{R}^d} \phi(y)\psi(x-y)\,dy = (\phi * \psi)(x)$$
has this form for $\psi \in \mathcal{S}$ and $\phi$ integrable. Hence, for all $\psi \in \mathcal{S}$,
$$\int_{\mathbb{R}^d} \psi(x)\,dx \int_{\mathbb{R}^d} \phi(x + y)\,\nu(dy) = \int_{\mathbb{R}^d} \tilde\psi(\omega)\tilde\phi(\omega)\,\mu(d\omega).$$
If, furthermore, $\tilde\phi$ is $\mu$-integrable, we can rewrite the right-hand side of this equation in the form
$$\int_{\mathbb{R}^d} \psi(x)\,dx \int_{\mathbb{R}^d} e^{i\omega\cdot x}\,\tilde\phi(\omega)\,\mu(d\omega).$$
Since equality holds for all $\psi \in \mathcal{S}$, the coefficients of $\psi(x)$ in the two integrals must be a.e. equal, which gives (8.6.10).

Many variants on the inversion results given above are possible: the essential point is that $\mu$ and $\nu$ determine each other uniquely through the Parseval relation (8.6.1). A number of further extensions of this relation can be deduced from (8.6.10), including the following important result.

Proposition 8.6.IV. For all p.p.d. measures $\mu$ with Fourier transform $\nu$ as in (8.6.1), and for all bounded functions $f$ of bounded support,
$$\int_{\mathbb{R}^d} (f * f^*)(x)\,\nu(dx) = \int_{\mathbb{R}^d} |\tilde f(\omega)|^2\,\mu(d\omega). \tag{8.6.12}$$
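Identity (8.6.12) can be illustrated with the transform pair from Example 8.6(a)(3°): $\mu$ with unit atoms at $2\pi\mathbb{Z}$ and $\nu$ with unit atoms at $\mathbb{Z}$, so that (8.6.12) reads $\sum_n (f*f^*)(n) = \sum_j |\tilde f(2\pi j)|^2$. The sketch below is not from the text; it uses a Gaussian $f$, whose rapid decay stands in for the bounded support assumed in the proposition, and uses the closed forms $(f*f^*)(x) = \sqrt{\pi}\,e^{-x^2/4}$ and $\tilde f(\omega) = \sqrt{2\pi}\,e^{-\omega^2/2}$ for $f(x) = e^{-x^2/2}$.

```python
import math

def autocorr(x):
    """(f * f^*)(x) for f(x) = exp(-x^2/2), in closed form."""
    return math.sqrt(math.pi) * math.exp(-x * x / 4.0)

def f_tilde_sq(w):
    """|f-tilde(w)|^2 = 2 pi exp(-w^2)."""
    return 2 * math.pi * math.exp(-w * w)

lhs = sum(autocorr(n) for n in range(-60, 61))               # integral against nu
rhs = sum(f_tilde_sq(2 * math.pi * j) for j in range(-5, 6)) # integral against mu
print(lhs, rhs)
```

Both sums come out at about $2\pi$, the $j = 0$ term dominating the right-hand side.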
Proof. Examining (8.6.10), we see that the assumed integrability condition implies that the right-hand side there is continuous in $x$ and consequently that the two sides are equal for any value of $x$ at which the left-hand side is also continuous (note that the a.e. condition cannot be dropped in general because altering $\phi$ at a single point will alter the left-hand side whenever $\nu$ has atoms while the right-hand side will remain unchanged). Thus, to check (8.6.12), it is enough to establish the continuity of the left-hand side and the integrability of $|\tilde f(\omega)|^2$ with respect to $\mu$ on the right-hand side. Appealing to the dominated convergence theorem shows first that $\int_{\mathbb{R}^d} f(u)f(x+u)\,du$ is a continuous function of $x$ and second, since this function vanishes outside a bounded set within which $\nu(\cdot)$ is finite, that the integral
$$\int_{\mathbb{R}^d} (f * f^*)(x + y)\,\nu(dy)$$
also defines a continuous function of $x$.

To establish that $|\tilde f(\omega)|^2$ is $\mu$-integrable, we use Lemma 8.6.V given shortly (the lemma is also of interest in its own right). Specifically, express the integral on the right-hand side of (8.6.12) as a sum of integrals over regions $B_k$ as in the lemma. For each term, we then have
$$\int_{B_k} |\tilde f(\omega)|^2\,\mu(d\omega) \le b_k\,\mu(B_k) \le K b_k$$
for some finite constant $K$, using the property of translation boundedness. Finiteness of the integral follows on summing over $k$ and using (8.6.13).

Lemma 8.6.V (Lin, 1965). Let $A$ be a bounded set in $\mathbb{R}^d$, $h$ a positive constant, and $\theta(x)$ a square integrable function with respect to Lebesgue measure on $A$. For $k = (k_1, \ldots, k_d)$, let $B_k$ be the half-open cube $\{k_i h < x_i \le k_i h + h;\ i = 1, \ldots, d\}$, and set
$$b_k = \sup_{\omega \in B_k} |\tilde\theta(\omega)|^2.$$
Then, for all such $\theta(\cdot)$, there exists a finite constant $K(h, A)$, independent of $\theta(\cdot)$, such that
$$\sum_k b_k \le K(h, A) \int_A |\theta(x)|^2\,dx, \tag{8.6.13}$$
where summation extends over all integers $k_1, \ldots, k_d = 0, \pm 1, \ldots$.

Proof. For simplicity, we sketch the proof for $d = 1$, $h = 1$, $A = [-1, 1]$, leaving it to the reader to supply the details needed to extend the result to the general case. Write
$$\alpha_k = \tfrac{1}{2}\int_{-1}^{1} e^{i\pi k x}\theta(x)\,dx$$
for the $k$th Fourier coefficient of $\theta$ as a function on the interval $(-1, 1)$. Then, from standard properties of Fourier series, we have
$$\sum_{j=-\infty}^{\infty} |\alpha_j|^2 = \tfrac{1}{2}\int_{-1}^{1} |\theta(x)|^2\,dx < \infty. \tag{8.6.14}$$
Now let $\omega_k$ be any point in $B_k = (k, k+1]$, and consider the Taylor series expansion of $\tilde\theta(\omega)$ at $\omega_k$. Since $A$ is bounded, $\tilde\theta$ is an entire function, and hence the Taylor series about the point $k$ converges throughout $B_k$, and we can write
$$\sum_{k=-\infty}^{\infty} |\tilde\theta(\omega_k)|^2 = \sum_{k=-\infty}^{\infty} \left| \sum_{n=0}^{\infty} \frac{(\omega_k - k)^n}{n!}\,\tilde\theta^{(n)}(k) \right|^2 \le \sum_{k=-\infty}^{\infty} \left( \sum_{n=0}^{\infty} \frac{|\omega_k - k|^{2n}}{n!} \right) \left( \sum_{n=0}^{\infty} \frac{|\tilde\theta^{(n)}(k)|^2}{n!} \right)$$
from the Cauchy inequality. The first series is dominated by $\sum_{n=0}^{\infty} 1/n! = e$ for all choices of $\omega_k$; hence, by analogy with (8.6.14), we obtain
$$\sum_{k=-\infty}^{\infty} |\tilde\theta(\omega_k)|^2 \le e \sum_{k=-\infty}^{\infty} \sum_{n=0}^{\infty} \frac{|\tilde\theta^{(n)}(k)|^2}{n!} = e \sum_{n=0}^{\infty} \frac{1}{n!} \int_{-1}^{1} |x^n\theta(x)|^2\,dx \le e^2 \int_{-1}^{1} |\theta(x)|^2\,dx.$$
In particular, choosing $\omega_k$ in $B_k$ to maximize $|\tilde\theta(\omega_k)|^2$ and so give $b_k$, (8.6.13) now follows.

Another integrability result is noted in Exercise 8.6.8.

A simple and characteristic property of a p.p.d. measure is that it remains a p.p.d. measure after the addition of an atom of positive mass at the origin. Equally, passing over to the Fourier transforms, it remains a p.p.d. measure after the addition of an arbitrary positive multiple of Lebesgue measure. Now suppose that, starting from a given p.p.d. measure $\mu$, we repeatedly subtract multiples of Lebesgue measure in alternation, first from the p.p.d. measure itself and then from its Fourier transform, until one of these measures ceases to be nonnegative. Evidently, certain maximum multiples of Lebesgue measure will be defined by this process, leaving, after subtraction, a p.p.d. measure $\nu$ with the additional property that no nonzero multiple of Lebesgue measure can be subtracted from $\nu$ or its Fourier transform without destroying the p.p.d. property. Let us call such a measure a minimal p.p.d. measure. This leads us to the following elementary structure theorem.

Proposition 8.6.VI. Every p.p.d. measure $\mu$ on $\mathbb{R}^d$ can be uniquely represented as the sum of a minimal p.p.d. measure, a positive multiple of Lebesgue measure on $\mathbb{R}^d$, and an atom of positive mass at the origin.
Very little is known about the structure of minimal p.p.d. measures, even when $d = 1$. See Exercise 8.6.9.

Example 8.6(b). As a simple illustration of (8.6.12), let $f(x)$ be the indicator function of the hyper-rectangle $(0, T_1] \times \cdots \times (0, T_d]$. It then follows that
$$\int_{\mathbb{R}^d} \prod_{i=1}^{d} (T_i - |x_i|)^+\,\nu(dx) = \int_{\mathbb{R}^d} \prod_{i=1}^{d} \left( \frac{\sin(\omega_i T_i/2)}{\omega_i/2} \right)^2 \mu(d\omega).$$
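In one dimension and with the counting-measure pair of Example 8.6(a)(3°) ($\mu$ with atoms at $2\pi\mathbb{Z}$, $\nu$ with atoms at $\mathbb{Z}$), the identity of Example 8.6(b) becomes $\sum_{n}(T-|n|)^+ = T^2 + 2\sum_{j\ge 1}\bigl(\sin(\pi j T)/(\pi j)\bigr)^2$, the $T^2$ being the $\omega = 0$ term. The sketch below is not from the text; it checks this numerically with a truncated series.

```python
import math

def lhs(T):
    """sum_n (T - |n|)^+ over the integer atoms of nu."""
    nmax = int(math.ceil(T))
    return sum(max(T - abs(n), 0.0) for n in range(-nmax, nmax + 1))

def rhs(T, jmax=200000):
    """T^2 plus the truncated sum over the nonzero atoms 2 pi j of mu."""
    tail = sum((math.sin(math.pi * j * T) / (math.pi * j)) ** 2
               for j in range(1, jmax + 1))
    return T * T + 2.0 * tail

print(lhs(1.5), rhs(1.5), lhs(1.0), rhs(1.0))
```

For $T = 1.5$ both sides equal $2.5$ (the series converges slowly, like $1/j^2$), while for $T = 1$ the nonzero-frequency terms vanish and both sides equal $1$.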
Exercises and Complements to Section 8.6

8.6.1 The space $\mathcal{S}$.
(a) Show that if $\mathcal{X} = \mathbb{R}$ and $\psi\colon \mathbb{R} \to \mathbb{R}$ has an integrable $k$th derivative, then $|\omega^k \tilde\psi(\omega)| \to 0$ as $|\omega| \to \infty$, and that, conversely, if $\int_{-\infty}^{\infty} |x|^k |\psi(x)|\,dx < \infty$, then $\tilde\psi(\omega)$ is $k$ times differentiable. Deduce that $\mathcal{S}$ is invariant under the Fourier mapping taking $\psi$ into $\tilde\psi$. Extend the result to $\mathbb{R}^d$.
(b) Let $g\colon \mathbb{R}^d \to \mathbb{R}$ be an integrable function with Fourier transform $\tilde g$ such that both $g$ and $\tilde g$ are zero-free on $\mathbb{R}^d$. Show that both the mappings $\psi \to \psi * g$ and $\psi \to \psi g$ are one-to-one mappings of $\mathcal{S}$ onto itself. In particular, deduce that this result holds when $g(\cdot)$ has the double exponential form $e_\lambda(\cdot)$ of (8.6.4b).
(c) Show that if $\mu$, $\nu$ are boundedly finite measures on $\mathbb{R}$ such that $\int_{\mathbb{R}} \psi\,d\mu = \int_{\mathbb{R}} \psi\,d\nu$ for all $\psi \in \mathcal{S}$, then $\mu = \nu$. [Hint: Consider $\psi \in \mathcal{S}$ of bounded support and approximate indicator functions.] Extend to $\mathbb{R}^d$.
8.6.2 Let $\{c_n\colon n = 0, \pm 1, \ldots\}$ denote a doubly infinite sequence of reals. Call $\{c_n\}$
(i) transformable if $c_n = \int_0^{2\pi} e^{i\omega n}\,\nu(d\omega)$ for some measure $\nu$ on $[0, 2\pi]$; and
(ii) positive-definite if for all finite families $\{\alpha_1, \ldots, \alpha_k\}$ of complex numbers,
$$\sum_{i=1}^{k}\sum_{j=1}^{k} \alpha_i\bar\alpha_j\, c_{i-j} \ge 0.$$
Let $\mathcal{P}^+(\mathbb{Z})$ denote the class of all p.p.d. sequences and $\mathcal{P}^+(0, 2\pi]$ the class of all p.p.d. measures on $(0, 2\pi]$. Show that every $\{c_n\} \in \mathcal{P}^+(\mathbb{Z})$ is bounded, transformable, and symmetric [i.e. $c_n = c_{-n}$ (all $n$)] and that a one-to-one mapping between $\mathcal{P}^+(\mathbb{Z})$ and $\mathcal{P}^+(0, 2\pi]$ is defined when the Parseval relation
$$\sum_{j=1}^{k} a_j c_j = \int_0^{2\pi} \tilde a(\omega)\,\nu(d\omega)$$
holds for all $\tilde a(\omega) = \sum_{j=1}^{k} a_j e^{i\omega j}$, with $a_1, \ldots, a_k$ any finite sequence of reals.
8.6.3 Show that not all translation-bounded sequences are transformable. [Hint: Let $\mathcal{X} = \mathbb{R}$ and exhibit a sequence that is bounded but for which $T^{-1}\sum_{j=-T}^{T} c_j$ does not converge to a limit as $T \to \infty$. Use this to define an atomic measure on $\mathbb{R}$ that is not transformable.]
8.6.4 Poisson summation formula. Show that if both $\psi$ and $\tilde\psi$ are integrable on $\mathbb{R}$, then
$$\sum_{k=-\infty}^{\infty} \psi(2\pi k + x) = \sum_{j=-\infty}^{\infty} \tilde\psi(j)\,e^{-ijx}$$
whenever the left-hand side defines a continuous function of $x$. [Hint: Under the stated conditions, the left-hand side, $a(x)$ say, is a bounded continuous function of $x$. Denote by $a_n = (2\pi)^{-1}\int_0^{2\pi} e^{inx}a(x)\,dx$ its $n$th Fourier coefficient, and show by rearrangement that $a_n = \tilde\psi(-n)$. Then, the relation is just the representation of $a(\cdot)$ in terms of its Fourier series. Observe that the conditions hold for $\psi \in \mathcal{S}$ and that the formula in Example 8.6(a)(3°) is the special case $x = 0$.]

8.6.5 Show that any p.p.d. measure on $\mathbb{R}$ integrates $(1 + \omega^2)^{-\alpha}$ for $\alpha > \frac{1}{2}$, and hence conclude that any p.p.d. measure is a tempered measure in the language of generalized functions.

8.6.6 (a) Let $c(x) = |x|^{-1/2}$ for $|x| \le 1$, $c(x) = 0$ elsewhere, and define $g(\omega) = 4 - \int_{-\infty}^{\infty} e^{i\omega x} c(x)\,dx$. Show that the measure $G$ with density $g$ is nonnegative and translation-bounded but cannot be made into a p.p.d. measure by adding an atom at the origin.
(b) Show that
$$\nu(A) = \int_A \frac{dx}{2 - \sin|x|} \qquad (\text{bounded } A \in \mathcal{B})$$
defines a measure that is a spectral measure but not a transform (Thornett, 1979).

8.6.7 Show that for $1 < \gamma < 2$ the following functions are densities of p.p.d. measures in $\mathbb{R}^2$, and find their spectral measures:
(a) $c_1(x, y) = \{\sin(\gamma\pi/2)\,\Gamma(\gamma + 1)/2\pi\}^2\, |xy|^{1-\gamma}$;
(b) $c_2(x, y) = 2^{2(\gamma-2)}\, \pi^{\gamma-3}\, (\Gamma(2 - \gamma))^{-1}\, |x^2 + y^2|^{1-\gamma}$.
[Hint: Both spectral measures are absolutely continuous with densities
$$g_1(\omega_1, \omega_2) = [\tfrac{1}{2}\gamma(\gamma - 1)]^2\, |\omega_1\omega_2|^{\gamma-2}, \qquad g_2(\omega_1, \omega_2) = \pi^{\gamma-2}/\bigl[\Gamma(\gamma - 1)\,|\omega_1^2 + \omega_2^2|^{2-\gamma}\bigr],$$
respectively. Thornett (1979) has formulae for similar p.p.d. measures in $\mathbb{R}^d$.]

8.6.8 Translation-boundedness characterization. A nonnegative Borel measure $\mu$ on $\mathcal{B}(\mathbb{R}^d)$ satisfies
$$\int_{\mathbb{R}^d} |\tilde I_A(\omega)|^2\,\mu(d\omega) < \infty$$
for all bounded A ∈ B(Rd ), if and only if the measure µ is translation-bounded. [Hint: Establish a converse to Lemma 8.6.V of the form
$$\int_{\mathbb{R}^d} |\tilde f(\omega)|^2\,\mu(d\omega) \le K^2 \sup_{x\in A} |f(x)|^2,$$
where f, with Fourier transform $\tilde f$, is any bounded measurable function vanishing outside the bounded Borel set A, and K is a constant that may depend on µ and the set A but not on f. See Robertson and Thornett (1984) for further details. Other results and references for such measures, but on locally compact Abelian groups, are given in Bloom (1984).]
8.6.9 Find the minimal p.p.d. measures corresponding to the Hawkes process with Bartlett spectrum (8.1.10).
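The Poisson summation formula of Exercise 8.6.4 is easy to check numerically. The sketch below is illustrative only: it assumes the transform convention $\tilde\psi(\omega) = (2\pi)^{-1}\int e^{-i\omega x}\psi(x)\,dx$ and uses an even Gaussian test function (which lies in S), so that the sign of the exponent in the right-hand sum is immaterial.

```python
import cmath
import math

# Test function psi(x) = exp(-x^2/2).  Under the (assumed) convention
# psi_tilde(w) = (2*pi)^{-1} * Integral e^{-iwx} psi(x) dx, its transform
# is psi_tilde(w) = exp(-w^2/2) / sqrt(2*pi).
def psi(x):
    return math.exp(-x * x / 2)

def psi_tilde(w):
    return math.exp(-w * w / 2) / math.sqrt(2 * math.pi)

x = 0.7
# left-hand side: sum of translates psi(2*pi*k + x)
lhs = sum(psi(2 * math.pi * k + x) for k in range(-50, 51))
# right-hand side: Fourier series built from the transform at the integers
rhs = sum(psi_tilde(j) * cmath.exp(-1j * j * x) for j in range(-50, 51)).real
assert abs(lhs - rhs) < 1e-10
```

Both truncated sums converge extremely fast here, so the two sides agree to rounding error; with a different transform normalization a compensating constant would appear.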
CHAPTER 9
Basic Theory of Random Measures and Point Processes
9.1 9.2 9.3 9.4 9.5
Definitions and Examples Finite-Dimensional Distributions and the Existence Theorem Sample Path Properties: Atoms and Orderliness Functionals: Definitions and Basic Properties Moment Measures and Expansions of Functionals
2 25 38 52 65
This chapter sets out a framework for developing point process theory as part of a general theory of random measures. This framework was developed during the 1940s and 1950s, and reached a definitive form in the now classic treatments by Moyal (1962) and Harris (1963). It still provides the basic framework for describing point processes both on the line and in higher-dimensional spaces, including especially the treatment of finite-dimensional distributions, moment structure, and generating functionals. In the intervening decades, many important alternative approaches have been developed for more specialized classes of processes, particularly those with an evolutionary structure, and we come to some at least of these in later chapters. As far as is convenient, we develop the theory in a dual setting, stating results for general random measures alongside the more specific versions for point processes, so as to bring out more clearly the features that are peculiar to point processes. Thus, for results that hold in this unified context, proofs are usually given only in the former, more general, setting. Furthermore, the setting for point processes also handles many of the topics of this chapter for marked point processes (MPPs): an MPP in state space X with mark space K can be regarded as a point process on the product space X × K so far as fidi distributions, generating functionals, and moment measures are concerned. It is only when we consider particular cases, such as Poisson and compound Poisson processes or purely atomic random measures, that distinctions begin to emerge, and become more apparent as we move to
discuss stationary processes in Chapter 12 and Palm theory and martingale properties in Chapters 13 and 14. The other major approach to point process theory is through random sequences of points. We note that this is equivalent to our approach through random measures, at least in our setting that includes point processes in finite-dimensional Euclidean space R^d. Section 9.1 sets out some basic definitions and illustrates them with a variety of examples. The second section introduces the finite-dimensional (fidi) distributions and establishes both basic existence theorems and a version of Rényi's theorem that simple point processes are completely characterized by the behaviour of the avoidance function (vacuity function, empty space function), viz. the probability P0(A) ≡ P{N(A) = 0}, over a suitably rich class of Borel sets A. Section 9.3 is concerned with the sample path properties of random measures and point processes, and includes a detailed discussion of simplicity (orderliness) for point processes. The final two sections treat generating functionals and moment properties, extending the treatment for finite point processes given in Chapter 5.
9.1. Definitions and Examples
Let X be an arbitrary complete separable metric space (c.s.m.s.) and BX = B(X) the σ-field of its Borel sets. Except for case (v) of Definition 9.1.II, all the measures that we consider on (X, BX) are required to satisfy the boundedness condition set out in Definition 9.1.I. It extends to general measures the property required of counting measures in Volume I, that bounded sets have finite counting measure and hence, as point sets, they contain only finitely many points and therefore have no finite accumulation points.
Definition 9.1.I. A Borel measure µ on the c.s.m.s. X is boundedly finite if µ(A) < ∞ for every bounded Borel set A.
This constraint is incorporated into the definitions below of the spaces which form the main arena for the analysis in this volume. They incorporate the basic metric properties of spaces of measures summarized in Appendix A2 of Volume I. In particular we use from that appendix the following.
(1) The concept of weak convergence of totally finite measures on X, namely that µn → µ weakly if and only if $\int f\,d\mu_n \to \int f\,d\mu$ for all bounded continuous f on X (see Section A2.3).
(2) The extension of weak convergence of totally finite measures to w# (weak-hash) convergence of boundedly finite measures, defined by $\int f\,d\mu_n \to \int f\,d\mu$ for all bounded continuous f on X vanishing outside a bounded set (Section A2.6).
(3) The fact that both weak and weak-hash convergence are equivalent to forms of metric convergence, namely convergence in the Prohorov metric
at equation (A2.5.1) and its extension to the boundedly finite case given by equation (A2.6.1), respectively.
Exercise 9.1.1 shows that for sequences of totally finite measures, weak and weak-hash convergence are not equivalent.
Many of our results are concerned with one or other of the first two spaces defined below. Both are closed in the sense of the w#-topology referred to above, and in fact are c.s.m.s.s in their own right (Proposition 9.1.IV). At the same time it is convenient to introduce four further families of measures which play an important role in the sequel.
Definition 9.1.II.
(i) M#_X is the space of all boundedly finite measures on BX.
(ii) N#_X is the space of all boundedly finite integer-valued measures N ∈ M#_X, called counting measures for short.
(iii) N#*_X is the family of all simple counting measures, consisting of all those elements of N#_X for which
$$N\{x\} \equiv N(\{x\}) = 0 \text{ or } 1 \qquad (\text{all } x \in \mathcal{X}). \tag{9.1.1}$$
(iv) N#g_{X×K} is the family of all boundedly finite counting measures defined on the product space B(X × K), where K is a c.s.m.s. of marks, subject to the additional requirement that the ground measure Ng defined by
$$N_g(A) \equiv N(A \times \mathcal{K}) \qquad (\text{all } A \in \mathcal{B}_{\mathcal{X}}) \tag{9.1.2}$$
is a boundedly finite simple counting measure, i.e. Ng ∈ N#*_X.
(v) M#_{X,a} is the family of boundedly finite purely atomic measures ξ ∈ M#_X.
(vi) MX (respectively, NX) is the family of all totally finite (integer-valued) measures on BX.
We introduce the family N#g_{X×K} to accommodate our Definition 9.1.VI(iv) of a marked point process (MPP) (as a process on X with marks in K). In it we require the ground process Ng to be both simple and boundedly finite. Note that in general a simple boundedly finite counting measure on B(X × K) need not be an element of this family N#g_{X×K}. For example, taking X = K = R, realizations of a homogeneous Poisson process on the plane would have ground process elements failing to be members of N#_R. See also Exercises 9.1.3 and 9.1.6. Note also that although a purely atomic boundedly finite measure can have at most countably many atoms, these atoms may have accumulation points, so representing such measures as a countable set {(x_i, κ_i)} of pairs of locations and sizes of the atoms can give a counting measure on X × R+ that need not be in either N#g_{X×R+} or even N#_{X×R+} [cf. Proposition 9.1.III(v) below].
In investigating the closure properties of M#_X and N#_X (Lemma 9.1.V below), we use Dirac measures (see Section A1.6) defined for every x ∈ X by
$$\delta_x(A) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{otherwise.} \end{cases} \tag{9.1.3}$$
Proposition 9.1.III. Let X be a c.s.m.s., and µ a boundedly finite measure on BX (i.e., µ ∈ M#_X).
(i) The measure µ is uniquely decomposable as
$$\mu = \mu_a + \mu_d, \tag{9.1.4}$$
where
$$\mu_a = \sum_i \kappa_i \delta_{x_i} \tag{9.1.5}$$
is a purely atomic measure, expressed in terms of the uniquely determined countable set {(x_i, κ_i)} ⊂ X × R+_0, and µ_d is a diffuse measure (i.e., it has no atoms).
(ii) A boundedly finite measure N on BX is a counting measure (i.e., N ∈ M#_X belongs to N#_X) if and only if in part (i) its diffuse component is null, and in (9.1.5) all κ_i are positive integers, κ_i = k_i say, and {x_i} is a countable set with at most finitely many x_i in any bounded Borel set; that is,
$$N = \sum_i k_i \delta_{x_i}. \tag{9.1.6}$$
(iii) For any N ∈ N#_X,
$$N^* = \sum_i \delta_{x_i} \tag{9.1.7}$$
defines the support counting measure N*. Then N* ∈ N#*_X; N belongs to N#*_X if and only if at (9.1.6) k_i = 1 (all i); equivalently, N coincides with its support counting measure.
(iv) Any counting measure N ∈ N#_X may be represented as a counting measure Ñ ∈ N#g_{X×Z+} with representation {(x_i, k_i)}, in which the ground measure Ñg is equal to the support counting measure N* and the positive integer-valued marks k_i represent the multiplicities of the atoms of N.
(v) There exists a one-to-one, both ways measurable, correspondence between purely atomic boundedly finite measures µ on BX and counting measures N_µ(A × K) = Σ_i δ_{(x_i,κ_i)}(A × K) on the Borel sets of X × R+_0 satisfying the additional requirement that, for all bounded A ∈ BX,
$$\int_{A\times\mathbb{R}_+} \kappa\, N_\mu(dx \times d\kappa) = \sum_{i: x_i \in A} \kappa_i < \infty; \tag{9.1.8}$$
the correspondence is given for bounded A ∈ BX and any K ∈ B(R+_0) that is bounded away from 0 by
$$\mu(A) = \int_{A\times\mathbb{R}^+_0} \kappa\, N_\mu(dx \times d\kappa) \tag{9.1.9a}$$
and
$$N_\mu(A \times K) = \lim_{n\to\infty} \sum_j I_K[\mu(A_{nj})], \tag{9.1.9b}$$
where {Anj } is a dissecting system of measurable subsets of A.
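Parts (ii)–(iv) of the proposition lend themselves to a small computational sketch: a counting measure stored as its atoms with integer multiplicities, the support measure N* that forgets those multiplicities, and the marked representation {(x_i, k_i)}. The dictionary representation and the names below are our own illustration, not anything from the text.

```python
# A counting measure N = sum_i k_i * delta_{x_i} stored as its atoms {x_i: k_i}.
def measure(atoms, A):
    """N(A): total multiplicity of atoms x_i lying in the set A (a predicate)."""
    return sum(k for x, k in atoms.items() if A(x))

def support_measure(atoms, A):
    """N*(A) as in (9.1.7): each atom location counted exactly once."""
    return sum(1 for x in atoms if A(x))

atoms = {0.5: 3, 1.2: 1, 2.7: 2}        # N = 3*delta_{0.5} + delta_{1.2} + 2*delta_{2.7}
in_A = lambda x: 0.0 <= x <= 2.0         # a bounded (interval) Borel set A = [0, 2]

assert measure(atoms, in_A) == 4         # N(A) = 3 + 1
assert support_measure(atoms, in_A) == 2 # N*(A): two atom locations in A
marked = sorted(atoms.items())           # the MPP representation {(x_i, k_i)} of part (iv)
assert marked == [(0.5, 3), (1.2, 1), (2.7, 2)]
```

N is simple precisely when `measure` and `support_measure` agree on every set, i.e. when all multiplicities equal 1.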
Remarks. The measure N_µ in (v) above, in terms of the representation as Σ_i δ_{(x_i,κ_i)}, may have an accumulation point (x, 0) ∈ X × R+ and therefore fail to be boundedly finite, even though for such a measure N_µ we do have Σ_i κ_i < ∞. In this case we have, for some bounded set A, N_g(A) = ∞, whereas for ε > 0, #{(x_i, κ_i): x_i ∈ A, κ_i > ε} < ∞. See Definition 9.1.VI(vi).
Proof. Part (i) is a standard property of σ-finite measures on BX: see the definition of atomic and diffuse measures in Appendix A1.6.
In part (ii), it is clear from (i) that if the diffuse component is null and the κ_i are positive integers, then the measure is a counting measure. Conversely, if N is integer-valued, any atom of N must have positive integral mass; and because N is boundedly finite there can be at most a finite number of such atoms within any bounded set, and at most countably many in all because we can cover X by a countable number of bounded sets. Hence, to complete the proof, it is enough to show that N has no nonatomic component. Let y be an arbitrary point of X, and {ε_j: j = 1, 2, ...} a monotonic sequence of positive reals decreasing to zero, so that the spheres S_{ε_j}(y) ↓ {y} as j → ∞. Then by the continuity lemma for measures (Proposition A1.3.II),
$$N\{y\} \equiv N(\{y\}) = \lim_{j\to\infty} N\big(S_{\varepsilon_j}(y)\big).$$
Each term on the right-hand side is a nonnegative integer; the same therefore applies to N{y}. Thus, if y is not an atom of N, it must be the limit of a sequence of open spheres for which N(S_{ε_j}(y)) = 0, and hence, in particular, the centre of an open sphere with this property. This shows that the support of N (the complement of the largest open set with zero measure) consists exclusively of the atoms of N, or equivalently, that N is purely atomic. Equation (9.1.6) now follows from (9.1.5).
The properties of N* in (iii) follow from the representations in (i) and (ii). Part (iv) follows from Definition 9.1.II(iv) and part (ii) because δ_{(x_i,k_i)} can be identified with an atom in the product space.
For part (v), (9.1.9a) is a restatement of (9.1.5). The condition at (9.1.8) is a restatement of the requirement that the measure µ be boundedly finite. The representation in (9.1.9b) mimics the construction using decreasing spheres to prove (ii), but with decreasing sets from a sequence of partitions from the dissecting system. Because K is bounded away from 0, there are at most a finite number of atoms with locations in A and values in K. As the sets in the dissecting system shrink, each of these atoms will ultimately be isolated in one of the subsets, leading to the representation (9.1.9b). Notice that this proof makes essential use of the topological structure of X; Moyal (1962) discusses some of the difficulties that arise in extending it to more general contexts.
Basic properties of M#_X are set out in Section A2.6 of Appendix 2, from which the key points for our purposes are set out below, together with their counterparts for N#_X.
Proposition 9.1.IV.
(i) Under the w#-topology, M#_X is a c.s.m.s. in its own right.
(ii) The corresponding Borel σ-algebra, B(M#_X) say, is the smallest σ-algebra on M#_X with respect to which the mappings µ → µ(A) are measurable for all A ∈ BX.
(iii) Under the w#-topology, N#_X is a c.s.m.s. in its own right, and its Borel sets coincide with the Borel sets of N#_X as a subset of M#_X.
(iv) B(N#_X) is the smallest σ-algebra with respect to which the mappings N → N(A) are measurable for each A ∈ BX.
Statements (i) and (ii) form Theorem A2.6.III. The extensions to counting measures follow from the next lemma.
Lemma 9.1.V. N#_X is a closed subset of M#_X.
Proof. Let {N_k} be a sequence of counting measures converging to some limit measure N in the w#-topology in M#_X. As in the proof of Proposition 9.1.III, let y be an arbitrary point of X, and S_{ε_j}(y) a sequence of spheres, contracting to {y}, with the additional property that
$$N\big(\partial S_{\varepsilon_j}(y)\big) = 0 \qquad (j = 1, 2, \ldots)$$
[this is always possible because N(S_ε(y)), as a function of ε, has jumps for at most countably many values of ε, and thus, the complementary set of values of ε being dense, the ε_j can be chosen in the complementary set]. For each such sphere it follows from the properties of w#-convergence (Proposition A2.6.II) that
$$N_k\big(S_{\varepsilon_j}(y)\big) \to N\big(S_{\varepsilon_j}(y)\big).$$
Once again the terms on the left-hand side are all nonnegative integers, so the same is true for the term on the right-hand side. As in the previous proof, it then follows that N is purely atomic. This argument shows that N#_X is sequentially closed in M#_X, and hence closed because M#_X is separable (Theorem A2.6.III).
Exercise 9.1.2 shows that the spaces in Definitions 9.1.II(iii)–(iv), although measurable subsets of M#_X, are not closed in either M#_X or N#_X. Similarly, the space M#_{X,a} is not closed in the weak or weak-hash topologies: the sequence of purely atomic measures with atoms of mass 1/n at {1/n, 2/n, ..., 1} converges weakly to Lebesgue measure on (0, 1).
Properties (ii) and (iv) of Proposition 9.1.IV open the way to defining random measures and point processes as measurable mappings involving the spaces of Definition 9.1.II, and lead to simple characterizations of random measures and point processes.
Definition 9.1.VI.
(i) A random measure ξ with phase or state space X is a measurable mapping from a probability space (Ω, E, P) into (M#_X, B(M#_X)).
(ii) A point process N on state space X is a measurable mapping from a probability space (Ω, E, P) into (N#_X, B(N#_X)).
(iii) A point process N is simple when
$$P\{N \in \mathcal{N}^{\#*}_{\mathcal{X}}\} = 1. \tag{9.1.10}$$
(iv) A marked point process on X with marks in K is a point process N on B(X × K) for which
$$P\{N \in \mathcal{N}^{\#g}_{\mathcal{X}\times\mathcal{K}}\} = 1; \tag{9.1.11}$$
its ground process is given by Ng(·) ≡ N(· × K).
(v) A purely atomic random measure ξ is a measurable mapping from a probability space (Ω, E, P) into (M#_{X,a}, B(M#_{X,a})).
(vi) An extended MPP with positive marks is a point process on B(X × R+) which is finite-valued on all sets of the form A × K for bounded A ∈ BX and Borel sets K ⊂ (ε, 1/ε) for some ε > 0.
The notation of Definition 9.1.VI(i) is intended to imply that with every sample point ω ∈ Ω, we associate a particular realization that is a boundedly finite Borel measure on X; we denote it by ξ(·, ω) or just ξ(·) (or even ξ) when we have no need to draw attention to the underlying spaces. Similar statements can be made for counting measures and point processes N(·, ω), N(·), and so on. A consequence of Definitions 9.1.VI(i)–(ii) above and Definition 9.1.II(ii) for N#_X is that a random measure is a point process if and only if its realizations are a.s. integer-valued.
Observe that by choosing the state space X appropriately, Definitions 9.1.VI(i)–(ii) can be made to include not only a number of important special cases but also a number of apparent generalizations. In the case X = R, discussion of one-dimensional random measures is essentially equivalent, as we note in Example 9.1(c), to the discussion of processes with nonnegative increments. The cases X = R^d, d ≥ 2, correspond to multidimensional random measures. If X has the product form Y × K, where K is a finite set, {1, ..., d} say, and we define distance in Y × K by (for example)
$$d\big((x, i), (y, j)\big) = \rho(x, y) + |i - j|,$$
the resulting process is a multivariate random measure; each of its d components is a random measure on Y.
This itself is a special case of a point process defined on a product space; when both components are metric spaces, any one of a number of combinations of the two individual metrics—the additive form above is one convenient choice—will make the product space into a metric space and so allow the basic machinery to be applied. The assumption that
such a choice can and has been made underlies the introduction of MPPs in Section 6.4 and Definition 9.1.VI(iv) which, as already noted, is equivalent when coupled with Definition 9.1.II(iv) to the requirement that for the ground process, Ng(A) ≡ N(A × K) < ∞ a.s. for all bounded A ∈ BX. Exercise 9.1.4 indicates that for a marked point process N and any Borel set K ∈ BK, the ‘K-marginal’ process NK(·) = N(· × K) is a well-defined simple point process. That Ng is boundedly finite and simple follows from Definition 9.1.II(iv).
Exercise 9.1.5 sets out in greater detail the extension described in Definition 9.1.VI(vi). The motivation behind this definition is the construction of the counting measure Nµ in Proposition 9.1.III(v) for a purely atomic measure µ. Using this definition and the measurability assertion in Proposition 9.1.III(v) leads to the following equivalence for purely atomic random measures.
Lemma 9.1.VII. Equations (9.1.9a, b) establish a one-to-one correspondence between purely atomic random measures ξ and extended MPPs with positive marks, Nξ say, satisfying the condition (9.1.8).
A realization of a random measure ξ has the value ξ(A, ω) [or we may write just ξ(A)] on the Borel set A ∈ BX [and, similarly, N(A) for a point process N]. For each fixed A, ξA ≡ ξ(A, ·) is a function mapping Ω into R+, and thus it is a candidate for a nonnegative random variable; that it is indeed such is shown in the following proposition.
Proposition 9.1.VIII. Let ξ (respectively, N) be a mapping from a probability space into M#_X (N#_X) and A a semiring of bounded Borel sets generating BX. Then ξ is a random measure (N is a point process) if and only if ξA [N(A)] is a random variable for each A ∈ A.
Proof. Let U be the σ-algebra of subsets of M#_X whose inverse images under ξ are events, and let ΦA denote the mapping taking a measure µ ∈ M#_X into µ(A) [hence, in particular, ΦA: ξ(·, ω) → ξ(A, ω)].
Because ξA(ω) = ξ(A, ω) = ΦA(ξ(·, ω)) as in Figure 9.1, we have for any B ∈ B(R+)
$$\xi_A^{-1}(B) = \xi^{-1}\big(\Phi_A^{-1}(B)\big).$$
When ξA is a random variable, (ξA)^{−1}(B) ∈ E, and then by definition we have ΦA^{−1}(B) ∈ U. It now follows from Theorem A2.6.III that B(M#_X) ⊆ U and hence that ξ is a random measure.
Conversely, by definition of B(M#_X), ΦA^{−1}(B) ∈ B(M#_X), and when ξ is a random measure, ξ^{−1}(ΦA^{−1}(B)) ∈ E, so then ξA is a random variable.
Taking for A the semiring of all bounded sets in BX we obtain the following corollary.
Corollary 9.1.IX. ξ: Ω → M#_X is a random measure (respectively, N: Ω → N#_X is a point process) if and only if ξ(A) [N(A)] is a random variable for each bounded A ∈ BX.
[Figure 9.1. The mappings ξ: ω → ξ(·, ω) from Ω into M#_X, ΦA: µ → µ(A) from M#_X into R+, and their composition ξA: ω → ξ(A, ω).]
One useful consequence of Proposition 9.1.VIII is that we may justifiably use ξ(A) to denote the random variable ξA as well as the value ξ(A, ω) of the realization of the random measure ξ.
Definitions 9.1.VI on their own do not lend themselves easily to the construction of particular random measures or point processes: for this the most powerful tool is the existence Theorem 9.2.VII below. Nevertheless, using Proposition 9.1.VIII or its corollary, we can handle some simple special cases as below in Examples 9.1(a)–(e) and Exercises 9.1.7–8.
Example 9.1(a) Uniform random measure. Let X be the real line, or more generally any Euclidean space R^d, and define ξ(A) = Θℓ(A), where ℓ(·) denotes Lebesgue measure on X and Θ is a random multiplier that is nonnegative. To set this up formally, take Ω to be the half-line [0, ∞), E the Borel σ-algebra on Ω, and P any probability measure on Ω, for example, the measure with gamma density x^{α−1}e^{−x}/Γ(α). This serves as the distribution of Θ. For each particular value of Θ, the corresponding realization of the random measure is the Θ multiple of Lebesgue measure. (Note that this process is random only in a rather artificial sense. Given only one realization of the process, we would have no means of knowing whether it is random. Randomness would only appear if we were to observe many realizations. In the language of Chapter 12, the process is stationary but not ergodic.)
We are left with one task, to verify that the mapping into (M#_X, B(M#_X)) is indeed measurable. By Proposition 9.1.VIII, it is sufficient to verify that the mappings ξ(A) are random variables for each fixed A ∈ BX. In our case, ξ(A) is a multiple of the random variable Θ, so the verification is trivial. Thus, we have an example of a random measure.
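The uniform random measure is trivially simulable: a single draw of Θ fixes the whole realization, and every set mass is Θ times its Lebesgue measure. In the sketch below, sets are represented (our choice, purely for illustration) as finite unions of disjoint intervals, and Θ is given the gamma distribution of the example.

```python
import random

# Sketch of Example 9.1(a): xi(A) = Theta * ell(A), Theta ~ Gamma(alpha).
random.seed(42)
alpha = 2.0
theta = random.gammavariate(alpha, 1.0)  # one draw fixes the whole realization

def ell(intervals):
    """Lebesgue measure of a finite union of disjoint intervals [(a, b), ...]."""
    return sum(b - a for a, b in intervals)

def xi(intervals):
    return theta * ell(intervals)

A = [(0.0, 1.0)]
B = [(1.0, 3.5)]
# finite additivity on disjoint sets, as required of any measure:
assert abs(xi(A + B) - (xi(A) + xi(B))) < 1e-12
assert theta > 0 and xi(A) >= 0
```

Repeating the experiment with fresh seeds produces different multiples of Lebesgue measure, which is exactly the "randomness only across realizations" remarked on above.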
Take X = R, and choose any Gaussian process Z(t, ω) with a.s. continuous trajectories. [Sufficient conditions for this can be expressed in terms of the covariance function: for example, it is enough for Z(·) to be stationary with covariance function c(u) that is continuous at
u = 0; Cramér and Leadbetter (1967) give further results of this kind.] Then set
$$\xi(A) = \begin{cases} \int_A Z^2(t)\,dt & \text{for continuous } Z(\cdot, \omega), \\ 0 & \text{otherwise.} \end{cases}$$
Let us prove more formally that this construction defines a random measure. Because Z²(t) ≥ 0, it is clear that ξ(A) ≥ 0, and countable additivity is a standard property of indefinite integrals. Moreover, because Z²(·) is a.s. continuous, it is bounded on bounded sets, and so ξ(·) is boundedly finite. For almost all ω, therefore, ξ(·, ω) is a boundedly finite Borel measure.
To complete the proof that ξ(·) is a random measure we check that the condition of Proposition 9.1.VIII is met. Let A be any finite half-open interval (left-open right-closed for definiteness), and let T_n = {A_ni: i = 1, ..., k_n} be a sequence of partitions of A into subintervals with lengths 1/n or less. If t_ni is a representative point from A_ni, it follows from standard properties of the Riemann integral that as n → ∞,
$$\xi_n(A) \equiv \sum_{i=1}^{k_n} Z^2(t_{ni})\,\ell(A_{ni}) \to \int_A Z^2(t)\,dt = \xi(A) \qquad \text{a.s.}$$
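The Riemann-sum approximation above is directly computable. A minimal numerical sketch follows, with Z built as a crude moving average of white noise on a grid (our own choice of a smooth Gaussian process, not one from the text); it exhibits the nonnegativity and additivity of the approximating measure ξ_n.

```python
import random

# Sketch of Example 9.1(b): xi(A) = Integral_A Z(t)^2 dt, approximated by the
# Riemann sum sum_i Z(t_ni)^2 * ell(A_ni) on a grid of mesh h over [0, 2].
random.seed(1)
h = 0.001
grid = [i * h for i in range(2001)]
noise = [random.gauss(0.0, 1.0) for _ in grid]
w = 50  # moving-average window: a cheap way to get a continuous-looking Z
Z = [sum(noise[max(0, i - w):i + 1]) / (w + 1) ** 0.5 for i in range(len(grid))]

def xi(a, b):
    """Riemann-sum approximation to Integral_a^b Z(t)^2 dt."""
    return sum(Z[i] ** 2 * h for i, t in enumerate(grid) if a <= t < b)

# xi is nonnegative and, on the common grid, additive over disjoint intervals:
assert xi(0.0, 2.0) >= 0.0
assert abs(xi(0.0, 2.0) - (xi(0.0, 1.0) + xi(1.0, 2.0))) < 1e-9
```

Refining the grid plays the role of letting n → ∞ in ξ_n(A) above.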
Each Z(t) is a random variable by assumption, and therefore so too is ξn(A) (as a linear combination of random variables) and ξ(A) (as the limit of a sequence of random variables). It is then clear that ξ(A) is a random variable for every set A in the semiring of finite unions of left-open right-closed intervals. It now follows from Proposition 9.1.VIII that ξ(·) is a random measure.
The distributions of ξ(A) are nearly but not quite of gamma form: each particular value Z²(t) has a gamma distribution and is proportional to a χ² random variable with one degree of freedom, so that the integral defining ξ(A) behaves as a linear combination of gamma variables. Its characteristic function can be obtained in the form of an infinite product of rational factors, each associated with a characteristic root of the integral operator with kernel c²(u − t) on A × A. Exercise 6.1.3 asserts that when Z(·) is stationary, so too is ξ(·), and its moments have been given there [see also Example 9.5(a)], whereas Example 9.3(a) discusses sample-path properties.
Because a quadratic random measure has a gamma process as its density, it is a candidate for the directing measure of the class of Cox processes called negative binomial processes in Barndorff-Nielsen and Yeo (1969). More generally, sums of independent quadratic random measures have gamma process densities, and also therefore meet Barndorff-Nielsen and Yeo's definition, but, as they noted, although these point processes have computable moment properties, their distributional properties are not so readily accessible other than in degenerate cases (cf. Exercise 9.1.9). These negative binomial processes differ from those of Example 6.4(b), where the distributions are exactly of
negative binomial form, but, in contrast to this example, the earlier processes are either non-orderly or non-ergodic.
Example 9.1(c) Processes with nonnegative increments. Let X = R. It seems obvious that any stochastic process X(t), defined for t ∈ R and possessing a.s. finite-valued monotonic increasing trajectories, should define a random measure through the relation
$$\xi(a, b] = X(b) - X(a). \tag{9.1.12}$$
We show that this is the case at least when X(t) is also a.s. right-continuous. In any case, (9.1.12) certainly induces, for each realization, a finitely additive set function on the ring of finite unions of half-open intervals. Right-continuity enters as the condition required to secure countable additivity on this ring (compare the conditions in Proposition A2.2.VI and Corollary A2.2.VII). Then the set function defined by (9.1.12) can be extended a.s. to a boundedly finite measure, which we may continue to denote by ξ, on BR; ξ now represents a mapping from the probability space into M#_R. Because X(t) is a stochastic process, ξ(a, b] is a random variable for each half-open interval (a, b]. Proposition 9.1.VIII now implies that ξ is a random measure.
The condition of right-continuity can always be assumed when X(t) is stochastically continuous, that is, whenever for each ε > 0,
$$\Pr\{|X(t + h) - X(t)| > \varepsilon\} \to 0 \qquad \text{as } h \to 0, \tag{9.1.13}$$
because we may then define a new process by setting X*(t) = X(t + 0), and it is easy to verify that X*(t) is a version or copy of X(t), in the sense that it has the same fidi distributions. This condition is satisfied in particular by processes with stationary independent increments, giving rise to stationary random measures with the completely random property of Section 2.2. The next example illustrates the type of behaviour to be expected; Section 10.1 gives a more complete discussion of completely random measures.
Example 9.1(d) Gamma random measures—stationary case. We indicated in Example 6.1(b) (see also Exercise 6.1.1) that a stationary random measure is defined by r.v.s ξ(Ai) that are mutually independent for disjoint Borel sets Ai in R^d and have Laplace–Stieltjes transforms
$$\psi(A_i; s) = (1 + \lambda s)^{-\alpha\ell(A_i)} \qquad (\lambda > 0,\ \alpha > 0,\ \operatorname{Re}(s) \ge 0);$$
these transforms show that the ξ(Ai) are gamma distributed. The convergence ψ(Ai; s) → 1 as ℓ(Ai) → 0 shows that the process is stochastically continuous, so we can assume right-continuity of the sample paths. The discussion around Proposition 9.1.VIII then implies that the resulting family of random variables can be extended to a random measure.
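The independence and gamma form can be mimicked directly by simulation: on disjoint intervals, draw independent gamma variables with shape proportional to interval length; gamma additivity then guarantees that the mass of a union has the gamma distribution belonging to the union. The sketch below checks the first two moments by Monte Carlo, with illustrative parameter values of our own choosing.

```python
import random

# Sketch of Example 9.1(d): xi(A) ~ Gamma(shape = alpha * ell(A), scale = lam),
# independently over disjoint sets.  Summing over a partition of (0, 2] should
# reproduce a Gamma(2*alpha, lam) total mass; we check mean and variance.
random.seed(7)
alpha, lam = 1.5, 2.0
n = 100_000
totals = [random.gammavariate(alpha * 1.0, lam)    # xi((0, 1]), ell = 1
          + random.gammavariate(alpha * 1.0, lam)  # xi((1, 2]), ell = 1
          for _ in range(n)]
mean = sum(totals) / n
var = sum((t - mean) ** 2 for t in totals) / n
# Gamma(2*alpha, lam): mean = 2*alpha*lam = 6, variance = 2*alpha*lam**2 = 12
assert abs(mean - 2 * alpha * lam) < 0.1
assert abs(var - 2 * alpha * lam ** 2) < 0.5
```

The same additivity argument, iterated over finer partitions, is what lies behind the consistency of the fidi distributions used in the existence theorem of Section 9.2.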
Despite the condition of stochastic continuity, the measures ξ(·) here are not absolutely continuous, but on the contrary have a purely atomic character. This follows from the Lévy representation theorem, which asserts that a process with independent increments can be represented as the sum of a shift, a Gaussian component, and an integral of Poisson components indexed according to the heights of the jumps with which they are associated [see e.g. Feller (1966, Section XVII.2), Bertoin (1996), or Theorem 10.1.III below]. The existence of a Gaussian component is ruled out by the monotonic character of the realizations, which also implies that the jumps are all positive. Thus, the random measure can be represented as a weighted superposition of Poisson processes, in the same kind of way as the compound Poisson process of Section 2.2. In the present case, however, the random measure has a countable rather than a finite number of atoms in any finite interval, but most atoms are so small that the total mass in such an interval is still a.s. finite. See also Example 9.1(g) and the discussion preceding it.
Example 9.1(e) Random probability distributions and Dirichlet processes. Random probability distributions play an important role in the theory of statistical inference, in particular, as prior distributions in nonparametric inference. Here we outline one method that has been proposed for constructing such distributions. Further constructions are in Exercises 9.1.10 and 9.3.4. Suppose given a random measure ξ on the c.s.m.s. X, with ξ a.s. totally finite and nonzero, and define
$$\zeta(A) = \xi(A)/\xi(\mathcal{X}) \qquad (A \in \mathcal{B}_{\mathcal{X}}) \tag{9.1.14}$$
[in full, ζ(A, ω) = ξ(A, ω)/ξ(X, ω)]. Proposition 9.1.VIII shows that ζ is a random measure; because ζ(X) = 1 a.s., it is a random probability measure.
The Dirichlet process Dα is the random measure ζ defined by the ratio at (9.1.14) when ξ is a gamma random measure as in the previous example and Exercise 6.1.1. Straightforward algebra shows that ζ(A) has a beta distribution and, more generally, that the fidi distributions of ζ(Ai) over disjoint sets Ai, i = 1, ..., r, are multivariate beta distributions. Exercise 9.5.1 gives moment measures of the process.
For a Dirichlet process on X = R, write Fζ(·) for the random d.f. associated with the random probability distribution ζ. Then the random variables Zi = Fζ(xi) − Fζ(xi−1), i = 1, ..., r, where −∞ = x0 < x1 < ··· < xr−1 < xr = ∞, are such that, if each αi = α((xi−1, xi]) > 0, their joint distribution is singular with respect to r-dimensional Lebesgue measure but absolutely continuous with respect to (r − 1)-dimensional Lebesgue measure on the simplex {(z1, ..., zr−1): z1 + ··· + zr−1 = 1 − zr ≤ 1}, where it has the density function
$$f(z_1, \ldots, z_{r-1} \mid \alpha_1, \ldots, \alpha_r) = \Gamma\Big(\sum_{i=1}^r \alpha_i\Big) \prod_{i=1}^r \frac{z_i^{\alpha_i - 1}}{\Gamma(\alpha_i)}.$$
Ferguson (1973) supposes that an (unobserved) realization ζ of Dα governs independent observations X1, ..., Xn for which the parameter α specifies the prior distribution of ζ. He shows that, conditional on (X1, ..., Xn) =
(x1, ..., xn), the posterior distribution of ζ is again that of a Dirichlet process but has parameter $\alpha + \sum_{i=1}^n \delta_{x_i}$. Our later discussion of completely random measures around (10.1.4) implies that both ζ and ξ have realizations that are purely atomic, and hence that the possible d.f.s in such a random distribution are a.s. purely discrete. This is actually an advantage in the above discussion, as it is the feature that allows prior and posterior to have the same distributional form. See also Exercise 9.1.11, where the gamma random measure and the Dirichlet distribution are used to define a prior distribution for an inhomogeneous Poisson process.
Concerning the existence of point processes, two basic approaches are widely used in the literature. A point process is defined sometimes as an integer-valued random measure N(·, ω), as above, and sometimes as a sequence of random variables {yi}. When are these approaches equivalent? In one direction the argument is straightforward and covered in the next result, where we start from certain finite or countably infinite sequences {yi: i = 1, 2, ...} of X-valued random elements.
Proposition 9.1.X. Let {yi} be a sequence of X-valued random elements defined on a probability space (Ω, E, P), and suppose that there exists an event E0 ∈ E such that P(E0) = 0 and ω ∉ E0 implies that for any bounded set A ∈ BX, only a finite number of the elements of {yi(ω)} lie within A. Define N(·) to be the zero measure on E0, and otherwise set
$$N(A) = \#\{y_i \in A\} = \sum_i \delta_{y_i}(A) \qquad (A \in \mathcal{B}_{\mathcal{X}}). \tag{9.1.15}$$
Then N(·) is a point process.
Proof. The set function N(A) is clearly a.s. integer-valued and finitely additive on Borel sets. Given a sequence {yi}, choose y ∉ {yi} and let {Aj} be a sequence of bounded Borel sets decreasing to {y}. Because any Aj is bounded, N(Aj) < ∞ a.s. Then for each yi ∈ Aj, there exists finite ji such that yi ∉ Aj for j > ji. Therefore, for some finite j′, N(Ak) = 0 for all k > j′, and hence N(⋂_{j=1}^∞ Aj) = 0 a.s.; that is, N(Aj) → 0 a.s. Thus, N(·) must be not just finitely but a.s. countably additive. This is enough to show that the sequence {yi} induces a counting measure on X, and so sets up a mapping from its probability space into N_X^#. To show that N(·) is a point process, the critical step is to show that this mapping is measurable. From Proposition 9.1.VIII it is enough to show that for each Borel set A, N(A) is a random variable. To this end, for each k = 1, 2, . . . and ω ∉ E0, write
Nk(A, ω) = ∑_{i=1}^k δ_{yi(ω)}(A).   (9.1.16)
9. Basic Theory of Random Measures and Point Processes
Because yi is an X-valued random variable, and A is a Borel subset of X, each δ_{yi}(A) is a random variable for i = 1, . . . , k, and therefore so too is Nk(A). By monotonicity, N(A) = lim_{k→∞} Nk(A) is well defined and therefore a random variable as required.
The main problem with this approach to point processes arises from the need to express, in terms of the distributions of the yi, the condition that with probability 1 the counting measures are boundedly finite. Suppose, for example, that the yi are generated as the successive states in a Markov chain with state space X and that the transition function for the chain satisfies an irreducibility condition sufficient to imply the usual classification into recurrent and transient chains. Then the local finiteness condition is satisfied if and only if the chain is transient. Exercises 9.1.12–13 illustrate this point.
We now turn to the more difficult question of constructing a sequence of X-valued r.v.s from a given point process. Recall from Proposition 9.1.III that the realizations of the random counting measure N(·) determine a.s. a countable set of atoms without any finite accumulation point, but this ignores the problem of finding a meaningful ordering of the atoms, without which the interpretation of the locations of the atoms as random variables is unresolved. The idea of fixing some enumeration of the points, in a measurable way, prompts the following definition.
Definition 9.1.XI. Let N be a point process on a c.s.m.s. X as in Definition 9.1.VI(ii). A measurable enumeration of N is a sequence of X-valued r.v.s {yi(N) ≡ yi(N(·, ω)): i = 1, 2, . . .} such that for every bounded Borel set A,
N(A, ω) = ∑_{i=1}^∞ δ_{yi(N(·,ω))}(A)   a.s.
As a first illustration of how a sequence of random variables {yi} can be extracted from the counting measure N(·), we formalize the discussion at the end of Section 3.1 concerning the relation between counting and interval properties of a simple point process N ∈ N_R^{#∗}.
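Before turning to that interval representation, the two constructions just described can be sketched concretely. The Python fragment below (a toy illustration of ours; none of the names comes from the text) builds the counting measure N(A) = ∑_i δ_{yi}(A) of (9.1.15) from a finite list of points in R², and produces one possible measurable enumeration by ordering the points by distance from the origin with a lexicographic tie-break, in the spirit of Exercise 9.1.15.

```python
import math

def counting_measure(points):
    """Return N with N(A) = sum_i delta_{y_i}(A), cf. (9.1.15).
    A bounded Borel set A is passed as an indicator function."""
    def N(indicator):
        return sum(1 for y in points if indicator(y))
    return N

def enumerate_points(points):
    """One measurable enumeration for points in R^2: order by distance
    from the origin, breaking ties lexicographically (Exercise 9.1.15)."""
    return sorted(points, key=lambda y: (math.hypot(y[0], y[1]), y))

pts = [(1.0, 0.0), (0.0, 1.0), (-2.0, 0.5), (0.2, 0.1)]
N = counting_measure(pts)
ball = lambda y: math.hypot(y[0], y[1]) <= 1.0   # the bounded set A = closed unit disk
print(N(ball))                    # -> 3
print(enumerate_points(pts)[0])   # nearest point to the origin first
```

The enumeration depends only on the realization, which is the measurability requirement of Definition 9.1.XI in miniature.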
Retaining the notation of that section, recall the definitions
N(t) = N((0, t])   (t > 0),
     = 0            (t = 0),
     = −N((t, 0])   (t < 0),   (9.1.17)
and for i = 0, ±1, . . . ,
ti(N) = inf{t: N(t) ≥ i},   (9.1.18a)
τi(N) = ti(N) − ti−1(N)   (9.1.18b)
(Figure 9.2 illustrates the relationship between {ti} and {τi}). Let S^+ denote the space of all sequences {τ0, τ±1, τ±2, . . . ; x} of positive numbers τi satisfying 0 ≤ x < τ0 and
∑_{i=1}^∞ τi = ∑_{i=1}^∞ τ−i = +∞.   (9.1.19)
Figure 9.2 Intervals τ1, . . . , τn, . . . between successive points t0, t1, . . . , tn−1, tn, . . . . When applicable, 0 satisfies t−1 < 0 ≤ t0, so that the interval of length τ0 contains 0.
Then adding the relation x(N) = −t0(N) to the τi at (9.1.18) defines a mapping R: N_R^{#∗} → S^+. The inverse mapping R^{−1} is defined for s^+ ∈ S^+ by
t0(s^+) = −x(s^+),
ti(s^+) = ti−1(s^+) + τi   (i ≥ 1),
ti(s^+) = ti+1(s^+) − τi+1   (i < 0).   (9.1.20)
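In the simplest finite-window setting, the passage between points and intervals in (9.1.18) and (9.1.20) is just differencing and cumulative summation. A minimal sketch (our own, for a finite increasing sequence rather than the doubly infinite one treated in the text):

```python
from itertools import accumulate

def points_to_intervals(points):
    """Finite-window analogue of the map R at (9.1.18):
    an increasing sequence of points -> (first point, interval lengths)."""
    t = sorted(points)
    return t[0], [b - a for a, b in zip(t, t[1:])]

def intervals_to_points(t0, taus):
    """Inverse map, cf. (9.1.20): rebuild the points by cumulative sums."""
    return list(accumulate([t0] + taus))

t0, taus = points_to_intervals([0.4, 1.1, 2.5, 2.9])
recovered = intervals_to_points(t0, taus)
print(recovered)  # recovers the original points (up to float rounding)
```

The extra datum t0 (equivalently, the length x of the initial interval) is exactly the piece of information that the interval sequence alone does not carry, which is the point made after Proposition 9.1.XII below.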
We give S^+ the Borel σ-algebra B(S^+) obtained in the usual way as the product of σ-algebras on each copy of R_+.
Proposition 9.1.XII. The mapping R at (9.1.18) provides a one-to-one, both-ways measurable mapping of N_R^{#∗} into S^+. In particular,
(i) the quantities τi(N) and x(N) are well-defined random variables when N is a simple point process; and
(ii) there is a one-to-one correspondence between the probability distributions P^∗ of simple point processes on N_R^{#∗} and probability distributions on the space S^+.
Proof. The relations (9.1.18a) define the ti(N) as stopping times for the increasing, right-continuous process N(·) (see Definition A3.3.II and Lemma A3.3.III). Hence the ti(N), and therefore also the τi(N), are random variables whenever N is a simple point process. To establish the converse, observe that N can be defined in terms of the sequence of random variables ti, noting that the requirement (9.1.19) implies that the resulting point process is boundedly finite. It then follows from Proposition 9.1.X that N is a well-defined, simple point process.
Note, in particular, the intervention of the initial interval (0, t1], the length x of which must be given separately: there is not a one-to-one correspondence between intervals alone and simple counting measures (Exercise 9.1.14 provides a further example of this type and a counterexample involving the subset N0 ⊂ N_R^{#∗} with an atom at 0 and the subset S0^+ for which t0 = x = 0). Sigman's (1995) Appendix D also discusses these questions.
There is no analogous simple representation for point processes in R^d, d = 2, 3, . . . , although Exercise 9.1.15 sketches a possible construction based on distances of points from an origin (see also the discussion below Lemma 13.3.III). A general construction that exploits the separability property of the c.s.m.s. X was suggested by Nguyen and Zessin (1976) and incorporated into Theorem 1.11.5 of MKM (1982). It is based on an ordered system of tilings of the c.s.m.s. X, meaning a countable family T = {Tn} of 'infinite' dissecting systems of
Figure 9.3 Tilings Tn = {Ani }, Tn+1 = {An+1,i } with sets containing x, y.
X: each tiling Tn = {Ani: i = 1, 2, . . .} consists of countably many disjoint bounded sets Ani satisfying
(i) (Partition and tiling properties) Ani ∩ Anj = ∅ for i ≠ j, and ⋃_i Ani = X;
(ii) (Nesting property) An−1,i ∩ Anj = Anj or ∅, and any An−1,i ∈ Tn−1 is expressible as An−1,i = ⋃_{j∈Jn−1,i} Anj for some finite set of indices Jn−1,i;
(iii) (Point-separating property) given distinct x, y ∈ X, there exists an integer n(x, y) such that for n ≥ n(x, y), x ∈ Ani implies y ∉ Ani; and
(iv) (Enumeration consistency property) given distinct x, y ∈ X, when Tm is such that x ∈ Ami ∌ y, for all n ≥ m the sets An,in ∋ x, An+1,in+1 ∋ x and An,jn ∋ y, An+1,jn+1 ∋ y are such that jn − in and jn+1 − in+1 have the same sign.
The first three properties above are analogues of the properties of a dissecting system (Definition A1.6.I). We know from Proposition A2.1.IV that on any bounded Borel set in B_X there exists a dissecting system. We also know that a c.s.m.s. is covered by the union of a countable increasing family of boundedly finite sets, Si say; then {A1i} = {Si+1 \ Si} is a tiling. Now introduce dissecting systems on each A1i, and enumerate their members in one sequence {A2i} so as to satisfy property (iv); the result is again a tiling of X.
Given an integer-valued measure N on B_X, N(A1i) is a finite integer for each i, and for each i = 1, 2, . . . we can enumerate the atomic support of N within those A1i for which N(A1i) ≥ 1 via the tilings. Moreover, if x, y ∈ X are such that N({x}) ≥ 1 and N({y}) ≥ 1, then x and y will be enumerated (with appropriate multiplicity if either inequality is strict) in a finite number of operations starting from some Si that contains both x and y. The determination of such x or y proceeds via a sequence of nested sets An,in, say, for which N(An,in) ≥ N(An+1,in+1) ≥ 1 for all n, with ⋂_{n=1}^∞ An,in = {x} say. It follows that an enumeration of the points of N is thereby determined, for a sequence y1, y2, . . .
, within a finite number of steps for each yr even before its precise location is known. Now write yr = lim_j Aj,ij,r for some monotonic decreasing sequence of sets Aj,ij,r for which N(Aj,ij,r) = 1 for all sufficiently large j. For any given finite enumeration of points, the position in the enumeration is found from a finite number of elements of dissecting systems, with all the associated counting measures N(·) measurable. The limit is therefore measurable; that is, the enumeration is measurable as required. This outline argument leads to the following assertion.
Lemma 9.1.XIII. Given a point process N on a c.s.m.s. X as in Definition 9.1.VI(ii), there exists a measurable enumeration of X-valued random
elements {yi} satisfying (9.1.16). This enumeration is uniquely determined by a given family of bounded sets {Si} with Si ↑ X and of dissecting systems for each Si \ Si−1.
A random measure may be regarded as a family of random variables indexed by the Borel sets of X, but it is considerably more than this. The additivity and continuity properties of measures require at least the truth of
ξ(A ∪ B) = ξ(A) + ξ(B)   a.s.   (9.1.21)
for all pairs of disjoint Borel sets A, B in X, and
ξ(An) → 0   a.s.   (9.1.22)
for all sequences of bounded Borel sets An such that An ↓ ∅. It is not quite trivial to prove, but fundamental for the resulting theory, that these conditions are in fact sufficient for the family to form a random measure. The difficulty is associated with the exceptional sets of measure zero: because there is an uncountable family of relations (9.1.21) and (9.1.22), it is not clear that the exceptional sets can be combined to form a single set that is still of measure zero. The next lemma indicates one way around the difficulty.
Lemma 9.1.XIV. Let A be a countable ring of bounded Borel sets with the self-approximating property of Definition A2.2.VIII, and ξA(ω) a family of nonnegative random variables indexed by the sets A ∈ A. In order that, with probability 1, the ξA(ω) should admit an extension to a measure on σ(A), it is necessary and sufficient that (9.1.21) hold for all disjoint pairs (A, B) of sets in A and that (9.1.22) hold for all sequences {An} of sets in A with An ↓ ∅.
Proof. The number of sets in A being countable, it follows immediately from (9.1.21) that the ξA(ω) are a.s. finitely additive there. To establish countable additivity we use the covering property of Lemma A2.2.IX, from which it follows that it is enough to know that
lim_{n→∞} ξ(⋃_{i=1}^n Fi(A; 1/k)) = ξ(A)   (9.1.23)
simultaneously for all sets A ∈ A and integers k < ∞. Because each such relation holds a.s. from (9.1.22), and because the number of sets A ∈ A and integers k < ∞ is countable, this requirement is satisfied almost surely. Then Lemma A2.2.IX implies that the ξA can be a.s. extended to a measure on σ(A).
The necessity of both conditions follows directly from the additivity and continuity properties of a measure.
As an immediate corollary we obtain the following theorem, in which the point process analogues of (9.1.21) and (9.1.22), for A, B, and {An} as above, are
N(A ∪ B) = N(A) + N(B)   a.s.,   (9.1.24a)
N(An) → 0   a.s.   (9.1.24b)
Theorem 9.1.XV. Let {ξA(ω)} [respectively, {NA(ω)}] be a family of nonnegative random variables indexed by the sets of B_X and a.s. finite-valued (finite integer-valued) on bounded Borel sets. In order that there exist a random measure ξ^∗(A, ω) (point process N) such that, for all A ∈ B_X,
ξ^∗(A, ω) = ξA(ω)   a.s.   [N(A) = NA   a.s.],   (9.1.25)
it is necessary and sufficient that (9.1.21) [(9.1.24a)] hold for all pairs A, B of disjoint bounded Borel sets and that (9.1.22) [(9.1.24b)] hold for all sequences {An} of bounded Borel sets with An ↓ ∅.
Proof. Let A be any countable generating ring of bounded Borel sets with the self-approximating property of Definition A2.2.VIII, as, for example, the ring C following Lemma A2.2.IX. If (9.1.21) and (9.1.22) hold for Borel sets in general, they certainly hold for sets in A. Thus, the conditions of Lemma 9.1.XIV are satisfied, and we can assert that with probability 1 the ξA(ω), initially defined for A ∈ A, can be extended to measures ξ^∗(A, ω) defined for all A ∈ σ(A) = B_X. For ω in the P-null set, U say, where the measures cannot be so extended, set ξ^∗(A, ω) = 0. Then ξ^∗(A, ω) is a random measure which coincides a.s. with the original random variables ξA(ω) at least on A.
It is not immediately obvious, nor indeed is it necessarily true, that the extensions ξ^∗(A, ω) coincide with the original random variables ξA(ω) for A ∉ A, even outside the exceptional set U of probability zero where the extension may fail. The best we can do is to show that they are a.s. equal for each particular Borel set A. The exceptional sets may be different for different A, and we do not claim that they can be combined into a single exceptional set of measure zero.
Consider the class of sets on which ξ^∗ and ξ coincide a.s. This class includes A, and from the relations (9.1.22) it is closed under monotone limits. By the monotone class theorem it therefore includes σ(A), which by assumption is B_X. This proves (9.1.25), and hence also the sufficiency part of the theorem. Necessity is an easy corollary of the additivity and continuity properties of a measure. The arguments apply equally to the case where ξ is a.s. a counting measure, leading to the analogous result for a point process.
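The two conditions of Theorem 9.1.XV can be illustrated numerically. The sketch below (an illustration only; the sampling scheme and all names are ours) simulates one realization of a rate-5 Poisson process on (0, 10] and checks finite additivity (9.1.24a) on two disjoint intervals, together with the continuity behaviour (9.1.24b) along a sequence of intervals shrinking to ∅:

```python
import math, random

rng = random.Random(42)

def poisson_rv(lam):
    # Knuth's multiplication method for a Poisson(lam) variate
    L, k, p = math.exp(-lam), 0, 1.0
    while p > L:
        p *= rng.random()
        k += 1
    return k - 1

# One realization of a rate-5 Poisson process on (0, 10]:
points = sorted(10 * (1 - rng.random()) for _ in range(poisson_rv(50)))
N = lambda a, b: sum(1 for t in points if a < t <= b)  # N((a, b])

# (9.1.24a): additivity over the disjoint sets (0, 3] and (5, 7]
union_count = sum(1 for t in points if 0 < t <= 3 or 5 < t <= 7)
assert N(0, 3) + N(5, 7) == union_count

# (9.1.24b): N((0, 1/k]) reaches 0 once (0, 1/k] excludes the smallest point
k = int(1 / min(points)) + 1
print(N(0, 1 / k))  # -> 0
```

Of course, a finite simulation can only exhibit the conditions on one realization; the content of the theorem is that the a.s. versions of these relations suffice for a genuine random measure to exist.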
As a sample application of Theorem 9.1.XV, we outline an approach to the definition of what, loosely speaking, might be termed a conditional random measure. It can be used to provide alternative proofs for the existence of the doubly stochastic and cluster processes introduced in Chapter 6.
Proposition 9.1.XVI. Let ξ be a random measure defined on the probability space (Ω, E, P) with some c.s.m.s. X as state space, and let F be a sub-σ-algebra of E. Then there exists a version η(A, ω) = E[ξ(A) | F](ω) of the conditional expectation such that
(i) for each A ∈ B_X, η(A, ·) is an F-measurable r.v.; and
(ii) η is a random measure with state space X.
Proof. It is easy to see from standard properties of conditional expectations that the additivity and continuity relations (9.1.21) and (9.1.22) are both satisfied by the conditional expectations η(A, ω). Furthermore, we may take the probability space here to be (Ω, F, P_F) rather than (Ω, E, P), where P_F denotes the restriction of P to sets of F, because by definition the conditional expectations are all F-measurable. It now follows directly from Theorem 9.1.XV that there exists an F-measurable random measure η^∗ such that η^∗(A) = η(A) a.s.
An almost identical argument leads to the classical result on the existence of regular conditional distributions given a σ-algebra (see Exercise 9.1.16 for variants on this theme).
To conclude this section, we would again emphasize the essential role played by the assumptions in the definitions. For example, the truth of Proposition 9.1.VIII and Theorem 9.1.XV depends in an essential manner on the assumption of nonnegativity. Corresponding statements for random signed measures are false in general: this is shown by the next example which, superficially, might be regarded as a random measure.
Example 9.1(f) Wiener's homogeneous chaos. For A ∈ B_X, let ξ(A) have a normal N(0, µ(A)) distribution, where µ(A) is a fixed, boundedly finite, Borel measure on X, and suppose that the ξ(A) are independent for disjoint sets. These two requirements immediately allow the joint distributions of finite families ξ(A1), . . . , ξ(Ak) to be written down, and it is easy to check that these joint distributions satisfy the consistency requirements of the Kolmogorov theorem. Thus, there does exist a probability space Ω on which the ξ(A) may be simultaneously defined as random variables. Now consider the random variable W = ξ(A1 ∪ A2) − ξ(A1) − ξ(A2), where A1, A2 are disjoint bounded Borel sets.
It is readily checked that E(W) = 0 and
var W = µ(A1 ∪ A2) + µ(A1) + µ(A2) − 2µ(A1) − 2µ(A2) = 0,
so W = 0 a.s. Next consider a sequence {Aj} of disjoint bounded Borel sets with A = ⋃_{j=1}^∞ Aj, where A is also bounded, and set
Wn ≡ ξ(⋃_{j=1}^n Aj) = ∑_{j=1}^n ξ(Aj)   a.s.,
where the last equality follows by induction from the previous result. Then
var Wn = µ(⋃_{j=1}^n Aj) = ∑_{j=1}^n µ(Aj),
and if W = ξ(⋃_{j=1}^∞ Aj), we must have var(Wn − W) → 0. This shows that
∑_{j=1}^n ξ(Aj) → ξ(A)
in quadratic mean, and because the ξ(Aj) are independent, the partial sums converge to ξ(A) almost surely as well [see, e.g., Moran (1968, Theorem 8.24)]. We have shown that the family {ξ(Aj)} satisfies both (9.1.21) and (9.1.22).
On the other hand it is not true that for almost all ω the realizations are signed measures. To see this, let {A1, . . . , An} be a finite partition of A and set
Yn = ∑_{j=1}^n |ξ(Aj)|.
If the realizations ξ(·) were signed measures, the Yn would remain uniformly bounded a.s. over all possible partitions. But
E(Yn) = ∑_{j=1}^n E|ξ(Aj)| = (2/π)^{1/2} ∑_{j=1}^n (µ(Aj))^{1/2},
and
∑_{j=1}^n (µ(Aj))^{1/2} ≥ ∑_{j=1}^n µ(Aj) / (max_{1≤j≤n} µ(Aj))^{1/2} = µ(A) / (max_{1≤j≤n} µ(Aj))^{1/2},
so E(Yn) can be made arbitrarily large by choosing a partition for which max_{1≤j≤n} µ(Aj) is sufficiently small. Because var Yn ≤ µ(A) for every partition, an application of Chebyshev's inequality shows that for any given finite y, a partition can be found for which Pr{Yn ≥ y} can be made arbitrarily close to 1. This is impossible if the Yn are a.s. bounded.
Other examples may fail to be a random measure or point process because they fail to satisfy the bounded finiteness condition, as occurs for example with the jump points of many Lévy processes and for certain point sets for which Mandelbrot (1982, p. 78) proposed the term dust. A dust is a point set with infinitely many points in some bounded set and which has topological dimension D = 0 (Mandelbrot, 1982, pp. 15, 409–412). [Stoyan and Stoyan (1994, p. 4) describe a dust as an uncountable point set containing no piece of any curve; the uncountability assumption seems unnecessarily restrictive.] For example, the rationals on [0, 1] constitute an everywhere dense countable dust, whereas the Cantor set (or Cantor dust) on [0, 1] [see, e.g., Halmos (1950, Exercise 15.5)] is an uncountable nowhere dense dust.
Example 9.1(g) Lévy dust. Mandelbrot (1982, p. 240) includes the set of zeroes of Brownian motion B(·) as an example of a Lévy dust. Here we use the term to mean the class of dusts defined via subordinators of a Brownian motion process [by a subordinator η(·) we mean a nonnegative
Figure 9.4 A space–time skeleton at {t = j/n: j = 1, . . . , 20n} of points of Lévy dust {(xi, B(xi))} for standard Brownian motion B(·) at jump points {ti} of a gamma random measure subordinator η(·), with xi = η(ti−). [n = 50, but η(j/n) − η([j − 1]/n) < 10^{−7} for over half the skeletal points.]
Lévy process with zero drift coefficient as in, e.g., Bertoin (1996, Chapter III and p. 16)], so that a Lévy dust consists of the set of values of the process y(t) = B[η(t)], where B(·) is a one- or two-dimensional Brownian motion. A subordinator η(·), being nondecreasing and a pure jump process with independent increments on R_+, has a countably infinite number of jump points, {ti} say, on any bounded interval of positive length. When a Markov process is used as the subordinand, the resultant process is again Markovian, and its range remains a countable set of points which, in the case of Brownian motion B(·), is just the countable set {yi} = {B(η(ti))}. Indeed, the process B(η(·)) is again a Lévy process (Bertoin's Exercise III.6.1). Figure 9.4 depicts a space–time skeleton of a sample of these points by plotting {(xj, yj)} = {(η(j/n), B(xj)): j = 1, . . . , 20n} in the case of a stationary gamma random measure η(t) = ξ(0, t] of Example 9.1(d) with α = 1.
For Lévy dusts on R^2, when B(t) = [Z1(t), Z2(t)] and Zj(t) (j = 1, 2) are standard independent one-dimensional Brownian motions, a useful approximation for simulation and illustrative purposes is to ignore the ultra-fine structure (infinitely many exceedingly small increments) and treat the process as a random walk {Xn} in R^2 whose steps Yn = Xn+1 − Xn have an isotropic distribution in R^2. For example, when η(·) is a nonnegative stable process, the step lengths follow approximately the Pareto form
Pr{|Yn| > r | |Yn| > δ} = (δ/r)^α   (r > δ)
[see, e.g., Ogata and Katsura (1991) and the more extended discussion in Martínez and Saar (2002) of astrophysical applications, where the approximation is also known as Rayleigh–Lévy dust or Rayleigh–Lévy flights]. Two illustrations of such approximating point sets are shown in Figure 9.5.
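A skeleton like that of Figure 9.4 can be generated in a few lines of Python (a rough sketch under our own naming; the gamma and normal variates come from the standard library). The gamma subordinator is advanced in steps of 1/n, and the Brownian motion is advanced by a normal increment whose variance equals the subordinator's increment:

```python
import math, random

rng = random.Random(7)
n, T = 50, 20                 # skeleton at t = j/n, j = 1, ..., nT
eta, B = 0.0, 0.0
skeleton = []
for j in range(n * T):
    d_eta = rng.gammavariate(1.0 / n, 1.0)  # gamma subordinator increment over 1/n
    B += rng.gauss(0.0, math.sqrt(d_eta))   # Brownian motion run in the new time eta
    eta += d_eta
    skeleton.append((eta, B))               # the space-time point (eta, B(eta))

print(len(skeleton))  # -> 1000
```

As the bracketed remark in the caption of Figure 9.4 notes, most gamma increments are exceedingly small, so the skeleton clusters near the jump points of η(·), which is precisely the dust-like feature of the construction.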
Figure 9.5 Two realizations of 1000 points of 'Rayleigh–Lévy dust', that is, a random walk with isotropic steps with d.f. tail (r_min/r)^α on r > r_min = 0.001, α = 1.05 and 0.9 (left- and right-hand figures respectively), locations reduced modulo 1 to the unit square.
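Realizations like those of Figure 9.5 are straightforward to simulate: step lengths are drawn from the Pareto tail (r_min/r)^α by inversion, directions uniformly on (0, 2π], and locations reduced modulo 1 to the unit square. A sketch (parameter and function names ours):

```python
import math, random

rng = random.Random(3)

def pareto_step(alpha, r_min):
    # Inversion of Pr{R > r} = (r_min / r)**alpha on r > r_min
    return r_min * (1 - rng.random()) ** (-1.0 / alpha)

def rayleigh_levy_flight(n_points, alpha, r_min=0.001):
    x = y = 0.5
    pts = []
    for _ in range(n_points):
        r = pareto_step(alpha, r_min)
        theta = 2 * math.pi * rng.random()
        x = (x + r * math.cos(theta)) % 1.0  # reduce modulo 1 to the unit square
        y = (y + r * math.sin(theta)) % 1.0
        pts.append((x, y))
    return pts

pts = rayleigh_levy_flight(1000, alpha=1.05)
print(len(pts))  # -> 1000
```

Smaller α makes the heavy tail heavier, producing the more scattered clusters of the right-hand panel of Figure 9.5.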
Exercises and Complements to Section 9.1
9.1.1 By definition, a sequence of totally finite measures {µk} that converges weakly to a totally finite measure µ converges in the w# sense to the same limit. Check that the converse need not be true by taking µk to be Lebesgue measure on [0, k]. [Hint: As k → ∞, µk does not converge weakly but does converge w# to Lebesgue measure on [0, ∞).]
9.1.2 (a) The sequence of measures {Nk} on B_R defined by Nk = δ0 + δ_{1/k} converges weakly to the measure N = 2δ0; each Nk ∈ N_R^{#∗} but not the limit measure.
(b) For k = 1, 2, . . . , let the measure ηk on X × K, with X = R and K = Z_+, have unit atoms at all the points {(i + j/2^k, 2^k): i = 1, 2, . . . ; j = 0, 1, . . . , 2^k − 1}, so ηk has boundedly finite support in R × Z_+, and let Nr = ∑_{k=1}^r ηk. Show that each Nr is an element of N_{X×K}^{#∗} but that their limit is not.
9.1.3 Show that an MPP can be simple even if its ground process is not simple. [Hint: Simplicity of the MPP implies only that no single location has two identical marks.]
9.1.4 Let N be a simple point process on X × K, and K a fixed bounded Borel set in K. Show that NK(A) = N(A × K) (bounded A ∈ B_X) defines a simple point process. Deduce that the ground process Ng of Definition 9.1.V(iv) is well defined.
9.1.5 (a) Show that an extended MPP in the sense of Definition 9.1.VI(vi) may fail to satisfy the requirements either of an MPP with marks in R_+ or of a point process on R × R_+. [Hint: Consider as a counterexample the Poisson process arising in the Lévy representation of the gamma random measure of Example 9.1(d). The problem lies in satisfying the bounded finiteness properties as we have defined them.]
(b) Show that the mapping (xi, κi) → (xi, log κi) defines a one-to-one mapping between the realizations of an extended MPP with marks in R_+ and the
space N_{X×R}^#. Use this mapping to define and explore the properties of a form of weak convergence for sequences of extended MPPs.
9.1.6 Show that, except for a set of P-measure zero, a realization of a marked point process (Section 6.4) can be regarded as a simple point process on the product space X × K∪, where K∪ = K^(1) ∪ K^(2) ∪ · · · , each K^(k) consisting of all ordered sets of k-tuples of elements of the k-fold product set of K with itself, and the measure on each A × K^(k) is symmetric in the subsets of K^(k). Conclude that any MPP is equivalent to another MPP whose ground process is simple.
9.1.7 Let {X(t): t ∈ R} be a measurable nonnegative stochastic process on (Ω, E, P). Show that, when the integrals concerned are finite, the relation
ξ(A, ω) = ∫_A X(t, ω) dt   (bounded A ∈ B_R)
defines a random measure ξ: Ω → M_R^#. [Hint: Start by considering X(t, ω) of the form ∑_j cj I_{Aj}(t) I_{Ej}(ω).]
9.1.8 Let N be a well-defined point process on X = R^2. With each point yi in a realization of N associate a geometric object in one of the following ways.
(a) Construct a disk Sr(yi) with centre yi and radius r, and let
ξ(A) = ∑_i ℓ(A ∩ Sr(yi))   (bounded A ∈ B(R^2))
represent the total area of the disks intersecting any Borel set A [ℓ(·) here denoting Lebesgue measure]. Use Proposition 9.1.VIII to verify that ξ is a well-defined random measure on R^2.
(b) If the radius of each disk is also a random variable, leading to S_{Ri}(xi) say, a conditioning argument as in Example 6.4(e), coupled with some condition ensuring the a.s. finiteness of the defining sum ∑_i ℓ(A ∩ S_{Ri}(xi)), is needed.
(c) Instead of disks, construct from yi as endpoint a finite line segment Li of length d and random orientation θi say, for some random variables {θi} that are i.i.d. on (0, 2π]. For any bounded Borel set A ⊂ R^2 let ℓ(A ∩ L) now denote the Lebesgue measure (in R^1) of the intersection of a line L with A. Again use a conditioning argument to show that
ξL(A) ≡ ∑_i ℓ(A ∩ Li)   (bounded A ∈ B(R^2))
is a well-defined random measure.
9.1.9 Let N(A) denote the number of points in A ∈ B_X of the negative binomial process of Example 9.1(b), involving a Cox process directed by the random measure ξ(A) = ∫_A η(u) du for η(·) a gamma process. Show that E(z^{N(A)}) = E(exp[−(1 − z)ξ(A)]). Derive a negative binomial approximation for suitably small sets A, and relate the first two moments of N(·) to those of ξ(·).
9.1.10 Random probability distributions.
(a) Let ξ be a boundedly finite but not totally finite measure on R_+. Use the distribution function Fη(x) ≡ 1 − exp(−ξ[0, x]) to define a measure η on R_+, with η(R_+) = 1. Show that when ξ is a random measure, η is a random probability measure on R_+.
(b) When ξ is completely random (see Section 10.1), η is a 'neutral process' in the terminology of Doksum (1974). Show that when ξ has no deterministic component (see Theorem 10.1.III), the distribution η is a.s. purely atomic.
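The mixed-Poisson identity E(z^{N(A)}) = E(exp[−(1 − z)ξ(A)]) of Exercise 9.1.9 makes N(A) negative binomial when ξ(A) has a gamma distribution. A numerical check (the parameter values k = 2, θ = 1 are our own illustrative choices) compares the gamma-mixed Poisson probabilities, computed by midpoint-rule integration, with the closed-form negative binomial p.m.f.:

```python
import math

k, theta = 2, 1.0     # xi(A) ~ Gamma(shape k, scale theta), integer k here

def mixed_poisson_pmf(j, upper=60.0, m=40000):
    # P{N = j} = integral over lam of e^{-lam} lam^j / j! times the gamma density
    h = upper / m
    total = 0.0
    for i in range(m):
        lam = (i + 0.5) * h  # midpoint rule
        gamma_pdf = lam ** (k - 1) * math.exp(-lam / theta) / (math.gamma(k) * theta ** k)
        total += math.exp(-lam) * lam ** j / math.factorial(j) * gamma_pdf * h
    return total

def negbin_pmf(j):
    p = 1.0 / (1.0 + theta)  # negative binomial success probability
    return math.comb(j + k - 1, j) * p ** k * (1 - p) ** j

for j in range(5):
    assert abs(mixed_poisson_pmf(j) - negbin_pmf(j)) < 1e-6
print("gamma-mixed Poisson matches the negative binomial p.m.f.")
```

For example, with k = 2 and θ = 1 both sides give P{N = 0} = 1/4, since ∫ λ e^{−2λ} dλ = 1/4.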
9.1.11 Prior and posterior distributions for an inhomogeneous Poisson process. Suppose it is desired to fit an inhomogeneous but totally finite Poisson process to one or more sets of observations over X. Instead of assuming a specific parametric form for the intensity measure, suppose it is a gamma random measure Λ governed by a constant λ and some totally finite measure α(·) (see Exercise 6.1.1). Given a realization (x1, . . . , xn), show that the posterior distribution for Λ is again a gamma random measure, governed by the constant λ + 1 and the totally finite measure α + ∑_{i=1}^n δ_{xi}. Equivalently, we may take Λ = C·F, where the prior distribution for the constant C is Γ(α(X), λ), and the prior distribution for the probability distribution F has the Dirichlet form D_α.
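The conjugate update α → α + ∑ δ_{xi} underlying Exercise 9.1.11 (and Ferguson's posterior for the Dirichlet process) is what drives the familiar Pólya-urn scheme for sampling observations governed by a Dirichlet-process prior: each new observation is a fresh draw from the normalized base measure with probability α(X)/(α(X) + m), and otherwise repeats a uniformly chosen earlier observation. A sketch (function names ours):

```python
import random

rng = random.Random(0)

def polya_urn(n, alpha_total, base_sampler):
    # X_{m+1} is drawn from alpha / alpha(X) with prob alpha(X) / (alpha(X) + m),
    # and otherwise repeats a uniformly chosen earlier observation.
    xs = []
    for m in range(n):
        if rng.random() < alpha_total / (alpha_total + m):
            xs.append(base_sampler())  # new atom from the base measure
        else:
            xs.append(rng.choice(xs))  # tie: the realized d.f. is purely atomic
    return xs

xs = polya_urn(200, alpha_total=5.0, base_sampler=rng.random)
print(len(set(xs)))  # far fewer distinct values than 200: ties are typical
```

The abundance of ties is the sampling-side reflection of the a.s. discreteness of realizations noted after Ferguson's theorem earlier in this section.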
9.1.12 Let {Xn} be a stationary ergodic real-valued Markov chain whose first absolute moment is finite, and define Yn = X1 + · · · + Xn. If EXn ≠ 0, then by the ergodic theorem {Yn} obeys the strong law of large numbers and therefore satisfies the conditions of Proposition 9.1.X.
9.1.13 Let {Yn: n = 0, 1, . . .} be a random walk in R^d; that is, Y0 = 0 and the R^d-valued r.v.s Xn ≡ Yn − Yn−1, n = 1, 2, . . . , are i.i.d. Show that the conditions of Proposition 9.1.X are satisfied if either d ≥ 3, or else d = 1 or 2 and E|Xn| < ∞, EXn ≠ 0. [Hint: Under the stated conditions, a random walk in R^d is transient.] Note that a renewal process is the special case d = 1 and Xn ≥ 0 a.s. (with P{Xn = 0} < 1), so that for some d.f. F on R_+, with 0 = F(0−) ≤ F(0+) < 1 = lim_{x→∞} F(x), and any positive integer r,
P{Xi ∈ (xi, xi + dxi], i = 1, . . . , r} = ∏_{i=1}^r [F(xi + dxi) − F(xi)].
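The transience condition in Exercise 9.1.13 is easy to see empirically: a simple random walk on Z^3 spends only finitely many steps in any bounded set, so its trajectory defines a boundedly finite counting measure as required by Proposition 9.1.X. A sketch (our own illustration):

```python
import random

rng = random.Random(11)

# Simple random walk on Z^3 (transient, d = 3): count visits to the ball |y| <= 2
y = [0, 0, 0]
visits, last_visit, steps = 0, 0, 50000
for t in range(1, steps + 1):
    axis = rng.randrange(3)
    y[axis] += rng.choice((-1, 1))
    if y[0] ** 2 + y[1] ** 2 + y[2] ** 2 <= 4:
        visits += 1
        last_visit = t

print(visits, last_visit)  # typically: a handful of visits, all early in the run
```

Rerunning with a one-dimensional zero-mean walk in place of the three-dimensional one would show the opposite, recurrent behaviour: visits to the ball keep accumulating, and the bounded finiteness condition fails.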
9.1.14 Given a nonnull counting measure N ∈ N_R^#, define {Yn: n = 0, ±1, . . .} or a subset of this doubly infinite sequence by
N(0, Yn) < n ≤ N(0, Yn]   (n = 1, 2, . . .),
N(Yn, 0] < −n + 1 ≤ N[Yn, 0]   (n = 0, −1, . . .).
Show that if N is a point process then {Yn: −N(−∞, 0] + 1 ≤ n ≤ N(0, ∞)} is a set of well-defined r.v.s. Now let N0 be the subspace of N_R^{#∗} consisting of simple counting measures on R with a point at the origin, so N ∈ N0 is boundedly finite, simple, and N{0} = 1. Show that if the atoms of such N yield the ordered set {. . . , t−1, t0 = 0, t1, . . .} and τi = ti − ti−1, then the mapping Θ: N0 → S0^+ which takes the counting measure N into the space S0^+ of doubly infinite positive sequences {. . . , τ−1, τ0, τ1, . . .} is one-to-one and both ways measurable with respect to the usual σ-fields in N0 and S0^+. Hence, probability measures on N0 and S0^+ are in one-to-one correspondence.
9.1.15 To establish a measurable enumeration of the points of a point process on X ⊆ R^d, first locate an initial point in the sequence as the point closest to some spatial origin, with the proviso that in the event of there being two or more points equidistant
from 0 [i.e., for some sphere Sr(0) and integer k ≥ 2, N(Sr(0)) = k and N(S_{r−ε}(0)) = 0 for every ε > 0], these k points are ordered lexicographically in terms of a coordinate system for X, yielding y1, . . . , yk say. The remaining points can be found on a sequence of progressively larger spheres centred on 0, using a similar tie-breaking rule as needed. Show that such a sequence of points is a sequence of X-valued r.v.s as required in Definition 9.1.XI. Compare this construction with the discussion of finite point processes summarized in Proposition 5.3.II.
9.1.16 (a) Mimic Theorem 9.1.XV to establish the existence of regular conditional probabilities on a product space X × Y, where (X, E) is an arbitrary measurable space and Y is a c.s.m.s. (cf. Proposition A1.5.III). [Hint: Let π be a probability measure on the product space and π_X the marginal distribution on (X, E). For fixed disjoint A, B ∈ B_Y show the existence of Radon–Nikodym derivatives Q(A | x), Q(B | x), Q(A ∪ B | x) such that
Q(A ∪ B | x) = Q(A | x) + Q(B | x)   (π_X-a.e. x).
Now identify Q(A | x) with ξA(ω) of the theorem and verify the continuity condition (9.1.22). For alternative approaches see Ash (1972, Section 6.6) and Feller (1966, Section V.10).]
(b) Extend the above argument to the case of µ, a boundedly finite measure on X × Y, where X, Y are c.s.m.s.s and there exists a boundedly finite measure λ on B_X such that µ(· × B) is absolutely continuous with respect to λ for bounded sets B ∈ B_Y; that is, establish the existence of a family of measures µ(· | x) on B_Y for all x ∈ X such that µ(B | ·) is measurable for each bounded B ∈ B_Y, and for bounded sets A ∈ B_X, B ∈ B_Y,
µ(A × B) = ∫_A µ(B | x) λ(dx).
[Hint: Normalize λ so that it is a probability measure on A.]
9.2. Finite-Dimensional Distributions and the Existence Theorem
Only statements about the distributions of a process are amenable, via frequency counts and the like, to direct comparison with observations. This is some justification for the view that the theory of random measures and point processes can be reduced to the study of the measures they induce on (M_X^#, B(M_X^#)) and (N_X^#, B(N_X^#)) respectively. The deeper reason, however, is the unity and clarity that this point of view brings to questions concerning the existence of random measures and point processes. Using the characterization of such distributions through their finite-dimensional (fidi) distributions, as set out in Proposition 9.2.III below, we have an unequivocal answer to the problem of how to establish the existence of a particular class of point processes: can we write down for the class
9. Basic Theory of Random Measures and Point Processes
a unique and consistent family of fidi distributions? This is the underlying reason for the importance of the studies by Moyal (1962) and Harris (1963), who were the first to set up a systematic theory of point processes in these terms. It also opens up the way for extensions to point processes on general types of spaces.
On the other hand, the fidi distributions do not always provide the most convenient framework for examining the structure of particular models. For finite processes, the Janossy densities introduced in Chapter 5 are usually the most effective tool; likewise, for evolutionary processes, the conditional intensities introduced in Chapter 7 may prove extremely useful. But in all such cases, basic questions of existence can be referred back to the possibility of constructing a consistent family of fidi distributions.
Definition 9.2.I. The distribution of a random measure or point process is the probability measure it induces on (M#_X, B(M#_X)) or (N#_X, B(N#_X)), respectively.
Definition 9.2.II. The finite-dimensional distributions (fidi distributions for short) of a random measure ξ are the joint distributions, for all finite families of bounded Borel sets A1, . . . , Ak, of the random variables ξ(A1), . . . , ξ(Ak), that is, the family of proper distribution functions
Fk(A1, . . . , Ak; x1, . . . , xk) = P{ξ(Ai) ≤ xi (i = 1, . . . , k)}.
(9.2.1)
Let us say that the distribution of a random measure is completely determined by some quantities ψ if, whenever two random measures give the same values for ψ, their distributions coincide. Analogously to Theorem A2.6.III and Proposition 9.1.VIII, we have the following result.
Proposition 9.2.III. The distribution of a random measure is completely determined by the fidi distributions (9.2.1) for all finite families (A1, . . . , Ak) of disjoint sets from a semiring A of bounded sets generating BX.
Proof. Let R denote the ring generated by A. Then any element A of R can be represented as the finite union of disjoint sets from A, A = ∪_{i=1}^k Ai say, and thus, because
ξ(A) = Σ_{i=1}^k ξ(Ai),    (9.2.2)
the distribution of ξ(A) can be written down in terms of (9.2.1) for disjoint Ai. A similar result holds for the joint distributions of the ξ(A) for any finite family of sets Ai in R. Now consider the class of subsets of M#_X of the form of cylinder sets,
{ξ: ξ(Ai) ∈ Bi (i = 1, . . . , k)},
(9.2.3)
where the Ai are chosen from R, and the Bi are Borel sets of the real line R. These cylinder sets form a ring, and it follows from Theorem A2.5.III
that this ring generates B(M#_X). But the probabilities of all such sets can be determined from the joint distributions (9.2.1). Thus, the distribution of ξ is known on a ring generating B(M#_X) and it follows from Proposition A1.3.I(b) that it is determined uniquely.
In the terminology of Billingsley (1968, p. 15), Proposition 9.2.III asserts that finite families of disjoint sets from a semiring A generating BX form a determining class for random measures on (X, BX). Of course we also have the following corollary.
Corollary 9.2.IV. The distribution of a random measure is completely determined by its fidi distributions.
For a point process, that is, an integer-valued random measure, it is simplest to specify the fidi distributions in the notation of (5.3.9), namely, for bounded Borel sets A1, A2, . . . and nonnegative integers n1, n2, . . . ,
Pk(A1, . . . , Ak; n1, . . . , nk) = P{N(Ai) = ni (i = 1, . . . , k)}.
(9.2.4)
As in Proposition 9.1.VIII, the distribution of a point process, meaning the measure induced on (N#_X, B(N#_X)), is completely specified by the fidi distributions of N(A) for A in a countable ring generating the Borel sets.
We turn now to the main problem of this section: to find necessary and sufficient conditions on a set of fidi distributions (9.2.1) that will ensure that they are the fidi distributions of a random measure. The conditions fall into two groups: first the consistency requirements of the Kolmogorov existence theorem, and then the supplementary requirements of additivity and continuity needed to ensure that the realizations are measures.
Conditions 9.2.V (Kolmogorov Consistency Conditions).
(a) Invariance under index permutations. For all integers k > 0 and all permutations i1, . . . , ik of the integers 1, . . . , k,
Fk(A1, . . . , Ak; x1, . . . , xk) = Fk(Ai1, . . . , Aik; xi1, . . . , xik).
(b) Consistency of marginals. For all k ≥ 1,
Fk+1(A1, . . . , Ak, Ak+1; x1, . . . , xk, ∞) = Fk(A1, . . . , Ak; x1, . . . , xk).
The first of these conditions is a notational requirement: it reflects the fact that the quantity Fk(A1, . . . , Ak; x1, . . . , xk) measures the probability of an event {ω: ξ(Ai) ≤ xi (i = 1, . . . , k)} that is independent of the order in which the random variables are written down. The second embodies an essential requirement: it must be satisfied if there is to exist a single probability space Ω on which the random variables can be jointly defined.
The other group of conditions captures in distribution function terms the conditions (9.1.21) and (9.1.22), which express the fact that the random variables so produced must fit together as measures.
Conditions 9.2.VI (Measure Requirements).
(a) Additivity. For every pair A1, A2 of disjoint Borel sets from BX, the distribution F3(A1, A2, A1 ∪ A2; x1, x2, x3) is concentrated on the diagonal x1 + x2 = x3.
(b) Continuity. For every sequence {An: n ≥ 1} of bounded Borel sets decreasing to ∅, and all ε > 0,
1 − F1(An; ε) → 0
(n → ∞).
(9.2.5)
Conditions 9.2.V imply the existence of a probability space on which the random variables ξ(A), A ∈ BX, can be jointly defined. Then Condition 9.2.VI(a) implies
P{ξ(A1) + ξ(A2) = ξ(A1 ∪ A2)} = 1.    (9.2.6)
It follows by induction that a similar relation holds for the members of any finite family of Borel sets. For any given sequence of sets, Condition 9.2.VI(a) implies a.s. finite additivity, and then Condition 9.2.VI(b) allows this finite additivity to be extended to countable additivity.
This leads us to the existence theorem itself: it asserts that, in the case of nonnegative realizations, the Conditions 9.2.V and 9.2.VI are not only necessary but also sufficient to ensure that the fidi distributions can be associated with a random measure. Note that Example 9.1(f) implies that without nonnegativity, the sufficiency argument breaks down. It appears to be an open problem to find necessary and sufficient conditions on the fidi distributions that ensure that they belong to a random signed measure. See Exercise 9.2.4.
Theorem 9.2.VII. Let Fk(· ; ·) be a family of distributions satisfying the Consistency Conditions 9.2.V. In order that the Fk(·) be the fidi distributions of a random measure, it is necessary and sufficient that
(i) the distributions Fk(·) be supported by the nonnegative half-line; and
(ii) the Fk(·) satisfy the Measure Conditions 9.2.VI.
Proof. Necessity is clear from the necessity part of Theorem 9.1.XIV, so we proceed to sufficiency. Because the Fk(·) satisfy the Kolmogorov conditions, there exists a probability space (Ω, E, P) and a family of random variables ξA for bounded A ∈ BX, related to the given fidi distributions by (9.2.1). Condition (i) above implies ξA ≥ 0 a.s., and condition (ii) that the random variables ξA satisfy (9.1.21) for each fixed pair of bounded Borel sets. Now the random variables ξ(An) are a.s. monotonically decreasing, so (9.2.5) implies the truth of (9.1.22) for each fixed sequence of bounded Borel sets An with An ↓ ∅. As in earlier discussions, the whole difficulty of the proof revolves around the fact that in general there is
an uncountable number of conditions to be checked, so that even though each individual condition is satisfied with probability 1, it cannot be concluded from this that the set on which they are simultaneously satisfied also has probability 1. To overcome this difficulty, we invoke Theorem 9.1.XIV. It is clear from the earlier discussion that both conditions of Theorem 9.1.XIV are satisfied, so that we can deduce the existence of a random measure ξ* such that ξ*(A) and ξA coincide a.s. for every Borel set A. But this implies that ξ* and ξ have the same fidi distributions, and so completes the proof.
Corollary 9.2.VIII. There is a one-to-one correspondence between probability measures on B(M#_X) and families of fidi distributions satisfying Conditions 9.2.V and 9.2.VI.
In practice, the fidi distributions are given most often for disjoint sets, so that Condition 9.2.VI(a) cannot be verified directly. In this situation it is important to know what conditions on the joint distributions of the ξ(A) for disjoint sets will allow such distributions to be extended to a family satisfying Condition 9.2.VI(a).
Lemma 9.2.IX. Let Fk be the family of fidi distributions defined for finite families of disjoint Borel sets and satisfying for such families the Kolmogorov Conditions 9.2.V. In order for there to exist an extension (necessarily unique) to a full set of fidi distributions satisfying Condition 9.2.VI(a) as well as 9.2.V, it is necessary and sufficient that for all integers k ≥ 2, and finite families of disjoint Borel sets {A1, A2, . . . , Ak},
∫_0^z Fk(A1, A2, A3, . . . , Ak; dx1, z − x1, x3, . . . , xk) = Fk−1(A1 ∪ A2, A3, . . . , Ak; z, x3, . . . , xk).    (9.2.7)
Proof. The condition (9.2.7) is clearly a corollary of Condition 9.2.VI(a) and therefore necessary. We show that it is also sufficient.
Let us first point out how the extension from disjoint to arbitrary families of sets can be made. Let {B1, . . . , Bn} be any such arbitrary family. Then there exists a minimal family {A1, . . . , Ak} of disjoint sets (formed from the nonempty intersections of the Bi and their complements Bi^c) such that each Bi can be represented as a finite union of some of the Aj. The joint distribution Fk(A1, . . . , Ak; x1, . . . , xk) will be among those originally specified. Using this distribution, together with the representations of each ξ(Bi) as a sum of the corresponding ξ(Aj), we can write down the joint distribution of any combination of the ξ(Bi) in terms of Fk. It is clear from the construction that the resultant joint distributions will satisfy Condition 9.2.VI(a) and that only this construction will satisfy this requirement.
To complete the proof it is necessary to check that the extended family of distributions continues to satisfy Condition 9.2.V(b). We establish this by induction on the index k of the minimal family of disjoint sets generating the given fidi distribution. Suppose first that there are just two sets A1,
A2 in this family. The new distributions defined by our construction are F2(A1, A1 ∪ A2), F2(A2, A1 ∪ A2), and F3(A1, A2, A1 ∪ A2). Consistency with the original distributions F2(A1, A2), F1(A1), and F1(A2) is guaranteed by the construction and by the marginal consistency for distributions of disjoint sets. Only the marginal consistency with F1(A1 ∪ A2) introduces a new element. Noting that by construction we have
F2(A1, A1 ∪ A2; x, y) = ∫_0^{min(x,y)} F2(A1, A2; du, y − u),
and letting x → ∞, we see that this requirement reduces precisely to (9.2.7) with k = 2. Similarly, for k > 2, marginal consistency reduces to checking points covered by the construction, by preceding steps in the induction, by Condition 9.2.V(b) for disjoint sets, or by (9.2.7).
Example 9.2(a) Stationary gamma random measure [see Example 9.1(d)]. The Laplace transform relation
ψ1(A; s) ≡ ψ(A; s) = (1 + λs)^{−α(A)}
determines the one-dimensional distributions, and the independent increments property on disjoint sets implies the relation
ψk(A1, . . . , Ak; s1, . . . , sk) = Π_{i=1}^k (1 + λsi)^{−α(Ai)},
which determines their joint distributions. Consistency of marginals here reduces to the requirement
ψk−1(A1, . . . , Ak−1; s1, . . . , sk−1) = ψk(A1, . . . , Ak; s1, . . . , sk−1, 0),
which is trivially satisfied. Also, if An ↓ ∅, α(An) → 0 by continuity of Lebesgue measure, and thus ψ1(An; s) = (1 + λs)^{−α(An)} → 1, which is equivalent to Condition 9.2.VI(b). Finally, to check (9.2.6) we should verify that for disjoint A1 and A2,
ψ1(A1 ∪ A2; s) = ψ2(A1, A2; s, s),
which is a simple consequence of additivity of Lebesgue measure. These arguments establish the consistency conditions when the sets occurring in the fidi distributions are disjoint, and it follows from Lemma 9.2.IX that there is a unique consistent extension to arbitrary Borel sets.
The basic existence theorem for point processes is somewhat simpler than Theorem 9.2.VII for general random measures, as we now indicate.
Theorem 9.2.X (Kolmogorov Existence Theorem for Point Processes). In order that a family Pk(A1, . . . , Ak; n1, . . . , nk) of discrete fidi distributions defined on bounded Borel sets be the fidi distributions of a point process, it is necessary and sufficient that
(i) for any permutation i1, . . . , ik of the indices 1, . . . , k,
Pk(A1, . . . , Ak; n1, . . . , nk) = Pk(Ai1, . . . , Aik; ni1, . . . , nik);
(ii) Σ_{r=0}^∞ Pk+1(A1, . . . , Ak, Ak+1; n1, . . . , nk, r) = Pk(A1, . . . , Ak; n1, . . . , nk);
(iii) for each disjoint pair of bounded Borel sets A1, A2, P3(A1, A2, A1 ∪ A2; n1, n2, n3) has zero mass outside the set where n1 + n2 = n3; and
(iv) for sequences {An} of bounded Borel sets with An ↓ ∅, P1(An; 0) → 1.
The task of checking the conditions in detail here can be lightened by taking advantage of Lemma 9.2.IX, from which it follows that if the consistency conditions (i) and (ii) are satisfied for disjoint Borel sets, and if for such disjoint sets the equations
Σ_{r=0}^n Pk(A1, A2, A3, . . . , Ak; r, n − r, n3, . . . , nk) = Pk−1(A1 ∪ A2, A3, . . . , Ak; n, n3, . . . , nk)    (9.2.8)
hold, then there is a unique consistent extension to a full set of fidi distributions satisfying (iii).
Example 9.2(b) The Poisson process with parameter measure µ [see Section 2.4]. Here the fidi distributions for disjoint Borel sets are readily specified by the generating function relations
Πk(A1, . . . , Ak; z1, . . . , zk) = Π_{i=1}^k exp[−µ(Ai)(1 − zi)],    (9.2.9)
where Πk is the generating function associated with the distribution Pk. Condition (ii) is readily checked by setting zk = 1; then the term 1 − zk vanishes and reduces the product to the appropriate form for Πk−1. In generating function terms, equation (9.2.6) becomes, for k = 2,
Π2(A1, A2; z, z) = Π1(A1 ∪ A2; z),
which expresses the additivity of the Poisson distribution. Finally, to check condition (iv) we require Π1(An; 0) → 1, that is, exp[−µ(An)] → 1, which is a corollary of the assumption that µ is a measure, so µ(An) → 0 as An ↓ ∅. It should be noted that the form (9.2.9) does not hold for arbitrary sets but has to be replaced by such forms as
Π2(A1, A2; z1, z2) = exp[−µ(A1)(1 − z1) − µ(A2)(1 − z2) + µ(A1 ∩ A2)(1 − z1)(1 − z2)]
when the sets overlap. The extension to arbitrary families of nondisjoint sets is unique, but laborious, and need not be pursued in detail.
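The overlap-corrected form of Π2 can be checked by elementary algebra: split A1 and A2 into the disjoint pieces A1 \ A2, A2 \ A1 and A1 ∩ A2, use independent increments, and note that (1 − z1) + (1 − z2) − (1 − z1)(1 − z2) = 1 − z1z2. The sketch below (Python; the three µ-masses and the arguments z1, z2 are arbitrary illustrative values) performs the check numerically.

```python
import math

# mu-masses of the disjoint pieces A1\A2, A2\A1 and A1 n A2 (illustrative)
m1, m2, m12 = 0.8, 1.3, 0.4
z1, z2 = 0.35, 0.6

# independent increments on disjoint pieces: N(A1) = N(A1\A2) + N(A1 n A2),
# so E[z1^N(A1) z2^N(A2)] factorizes over the three pieces
direct = (math.exp(-m1 * (1 - z1)) * math.exp(-m2 * (1 - z2))
          * math.exp(-m12 * (1 - z1 * z2)))

# the stated overlapping-set form, with mu(A1) = m1 + m12, etc.
formula = math.exp(-(m1 + m12) * (1 - z1) - (m2 + m12) * (1 - z2)
                   + m12 * (1 - z1) * (1 - z2))

print(direct, formula)    # the two expressions agree
```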
Example 9.2(c) Finite point processes. If the distribution of a finite point process is specified in any of the ways described in Proposition 5.3.II, in particular, say, by its Janossy measures [see around (5.3.2)], then the fidi distributions are given by (5.3.13), namely
n1! · · · nk! Pk(A1, . . . , Ak; n1, . . . , nk) = Σ_{r=0}^∞ Jn+r(A1^{(n1)} × · · · × Ak^{(nk)} × C^{(r)}) / r!,    (9.2.10)
where C is the complement of the union of the disjoint sets A1, . . . , Ak and n = n1 + · · · + nk. Although we can infer on other grounds that the point process is well defined, and hence that the fidi distributions must be consistent, it is of interest to check the consistency conditions directly. Because (9.2.10) is restricted to disjoint sets, the appropriate conditions are (i), (ii), and (iv) of Theorem 9.2.X together with (9.2.8). The permutation condition (i) follows from the symmetry of the Janossy measures. Also, condition (iv) reduces to
P1(An; 0) = Σ_{r=0}^∞ Jr((X \ An)^{(r)}) / r! → 1   if An ↓ ∅.
But then X \ An ↑ X, and the result follows from dominated convergence, the fact that the Jr(·) are themselves measures, and the normalization condition Σ_{r=0}^∞ Jr(X^{(r)}) / r! = 1 as in (5.3.9).
The additivity requirement (9.2.8) follows from identities of the type
Σ_{n1+···+nk=n} Jn+r(A1^{(n1)} × · · · × Ak^{(nk)} × C^{(r)}) / (n1! · · · nk!) = Jn+r((A1 ∪ · · · ∪ Ak)^{(n)} × C^{(r)}) / n!,
which are immediate applications of Lemma 5.3.III. Similarly, the marginal condition (ii) reduces to checking the equations
Σ_{nk=0}^∞ Σ_{r=0}^∞ Jν+nk+r(A1^{(n1)} × · · · × Ak−1^{(nk−1)} × Ak^{(nk)} × C^{(r)}) / (nk! r!)
= Σ_{s=0}^∞ (1/s!) Σ_{t=0}^s (s!/(t!(s−t)!)) Jν+s(A1^{(n1)} × · · · × Ak−1^{(nk−1)} × Ak^{(t)} × C^{(s−t)})
= Σ_{s=0}^∞ Jν+s(A1^{(n1)} × · · · × Ak−1^{(nk−1)} × (Ak ∪ C)^{(s)}) / s!,
where ν = n1 + · · · + nk−1 , the first equation is a regrouping of terms, and the second equation is a further application of Lemma 5.3.III.
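As a concrete check of (9.2.10), take the finite process consisting of exactly n i.i.d. points uniform on [0, 1], for which Jm vanishes unless m = n and Jn(A1^{(n1)} × · · · × Ak^{(nk)}) = n! Π |Ai|^{ni} for disjoint Ai. The sketch below (Python; n = 5 and |A| = 0.3 are illustrative choices, not taken from the text) assembles P1(A; m) from the Janossy formula and recovers the binomial distribution that direct reasoning gives for this process.

```python
from math import comb, factorial

n, p = 5, 0.3   # five i.i.d. uniform points; p = Lebesgue measure of A

def janossy(m, r):
    """J_{m+r}(A^{(m)} x C^{(r)}) for the fixed-n i.i.d. model: zero unless
    m + r = n, and otherwise n! p^m (1-p)^r, with C the complement of A."""
    if m + r != n:
        return 0.0
    return factorial(n) * p**m * (1 - p)**r

def P1(m):
    """P1(A; m) assembled from (9.2.10) with k = 1."""
    total = sum(janossy(m, r) / factorial(r) for r in range(n + 1))
    return total / factorial(m)

for m in range(n + 1):
    print(m, P1(m), comb(n, m) * p**m * (1 - p)**(n - m))
```

The agreement of the two columns is exactly the statement that the Janossy specification and the elementary binomial computation describe the same fidi distribution.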
Underlying Example 9.2(c) is a mapping of X^∪ → N_X; see also Propositions 9.1.XI–XII. The space X^{∪*} = ∪_{n=0}^∞ X^{(n)*}, where X^{(n)*} is the n-fold product of the c.s.m.s. X subject to the constraint that any unordered set x = {x1, . . . , xn} ∈ X^{(n)*} satisfies xi ≠ xj for i ≠ j [cf. (5.3.10)], is a candidate space for describing finite simple point processes [cf. Chapter 5 and Definition 9.1.II(iii)].
Important questions relate to the characterization of subclasses of point processes through their fidi distributions. Characterizing simple point processes leads to the discussion of orderliness taken up in Section 9.3. Marked point processes can be treated as point processes on product spaces, with the defining sets for the fidi distributions restricted to ‘rectangle sets’ Ai × Ki where Ai ∈ BX, Ki ∈ BK. The fidi distributions of a process on the product space correspond to those of a marked point process if the distributions of the ground process, obtained by setting all Ki = K in the product sets, as in
Pk^g(A1, . . . , Ak; n1, . . . , nk) = Pk(A1 × K, . . . , Ak × K; n1, . . . , nk),
are proper, and satisfy the conditions of Theorem 9.2.X.
To conclude the present section, we outline an extension of Rényi's (1967) result, quoted at Theorem 2.3.II, that a simple Poisson process in R^d, whether homogeneous or not, is determined by the values of its avoidance function
P0(A) = P{N(A) = 0}
(9.2.11)
on a suitably rich class A of Borel sets. The essence of this result is that, for a simple point process, the avoidance function alone is enough to determine the full set of fidi distributions. Our aim is to describe an interaction of structural properties of the space X and the function P0(·) which are enough for P0(·) to retain this determining character without the strong probabilistic assumptions of the Poisson process.
Concerning terminology, Kendall (1974) used the term avoidance function in a more general (stochastic geometry) context, reflecting the fact that P0(A) gives the probability of the support of a random set function avoiding a prescribed set A; other possible terms include zero function, avoidance probability function, and vacuity function [McMillan (1953)]. Extensions of Rényi's result are due to Mönch (1971), who showed that the Poisson assumption is not needed [see also Kallenberg (1973, 1975)], and a characterization of the avoidance function due to Kurtz (1974). Unpublished work of Karbe (1973) is presumably the basis of some discussion in MKM (1978, Section 1.4). Much of the work, largely couched in algebraic language, was developed by McMillan (1953), who used the term vacuity function in lectures in Berkeley in 1981.
If only the state space X of the simple point process N(·) were countable, Rényi's result would be almost trivial, for with i, j, . . . denoting distinct points
of X, we should have for the first few fidi distributions
P1({i}; 0) = 1 − P1({i}; 1) = P0({i}),
P2({i}, {j}; 0, 0) = P0({i, j}),
P2({i}, {j}; 0, 1) = P0({i}) − P0({i, j}),
P2({i}, {j}; 1, 1) = 1 − P0({i}) − P0({j}) + P0({i, j}).
Continuing in this way, all the fidi distributions could be built up through a sequence of differencing operations applied to P0(·), and it is clear that the avoidance function would thereby determine the fidi distributions uniquely. Our task here is to extend this argument to a general c.s.m.s. X as state space.
Following Kurtz (1974), the equations
∆(A)ψ(B) = ψ(B) − ψ(A ∪ B),    (9.2.12a)
∆(A1, . . . , Ak, Ak+1)ψ(B) = ∆(Ak+1)[∆(A1, . . . , Ak)ψ(B)]   (k = 1, 2, . . .),    (9.2.12b)
define a difference operator ∆(A) and its iterates acting on any set function ψ(·) for A, A1, A2, . . . , B in a ring of sets on which ψ(·) is defined. This operator is tailored to the needs of (9.2.16) in the lemma below; the sign convention in its definition here is opposite that of Kurtz and Kallenberg.
Lemma 9.2.XI. For every integer k ≥ 1 and all Borel sets A1, A2, . . . , B,
∆(A1, . . . , Ak)P0(B) = P{N(Ai) > 0 (i = 1, . . . , k), N(B) = 0}.    (9.2.13)
Proof. For k = 1 we have P{N (A1 ) > 0, N (B) = 0} = P0 (B) − P0 (A1 ∪ B) = ∆(A1 )P0 (B). The general form follows by an induction argument (see Exercise 9.2.5). As a special case of (9.2.13) with B the null set, ∆(A1 , . . . , Ak )P0 (∅) = P{N (Ai ) > 0 (i = 1, . . . , k)}.
(9.2.14)
The nonnegativity of the differences at (9.2.13) appears later in Theorem 9.2.XV in a characterization of the avoidance function. In the meantime, Lemma 9.2.XI provides a useful notational convention and serves as a reminder that the probability of the complex event on the right-hand side of (9.2.13) can be expressed immediately in terms of the avoidance function. The basic idea motivating the introduction of the operator ∆ at (9.2.12) is that it leads to a succinct description of the fidi distributions of a point process when, for suitable sets Ai, N(Ai) is ‘small’ in the sense of having P{N(Ai) = 0 or 1 (all i)} ≈ 1. Such an approximation can be realized only if N is simple and the class of sets on which the values of the avoidance function are known contains a dissecting system for X (see Definition A1.6.I). Fortunately, on a c.s.m.s. X the Borel sets BX necessarily contain a dissecting system and hence a dissecting ring (Definition A2.1.V).
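The operator defined at (9.2.12) is easy to experiment with numerically. In the sketch below (Python; the intensity and the representation of Borel sets as finite unions of unit cells are illustrative assumptions), ∆ is iterated against the avoidance function P0(A) = exp[−µ(A)] of a Poisson process, and the special case (9.2.14) is checked against the product Π_i (1 − e^{−µ(Ai)}) that independence over disjoint sets gives.

```python
import math

rate = 1.2   # intensity per unit cell of a hypothetical Poisson process

def mu(a):                   # parameter measure of a union of unit cells
    return rate * len(a)

def P0(a):                   # Poisson avoidance function P{N(A) = 0}
    return math.exp(-mu(a))

def delta(sets, psi, B=frozenset()):
    """Iterated difference operator of (9.2.12); each set is a frozenset
    of cells, and Delta(A1,...,Ak) psi(B) is computed recursively via
    Delta(A1,...,Ak+1) psi(B) = Delta(A1,...,Ak) psi(B)
                              - Delta(A1,...,Ak) psi(Ak+1 u B)."""
    if len(sets) == 1:
        return psi(B) - psi(sets[0] | B)
    return delta(sets[:-1], psi, B) - delta(sets[:-1], psi, sets[-1] | B)

A1, A2, A3 = frozenset({0}), frozenset({1, 2}), frozenset({3})

# (9.2.14): for disjoint sets, P{N(Ai) > 0 (i = 1, 2, 3)} factorizes
lhs = delta([A1, A2, A3], P0)
rhs = math.prod(1 - math.exp(-mu(a)) for a in (A1, A2, A3))
print(lhs, rhs)
```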
Theorem 9.2.XII (Rényi, 1967; Mönch, 1971). The distribution of a simple point process N on a c.s.m.s. X is determined by the values of the avoidance function P0 on the bounded sets of a dissecting ring A for X.
Proof. It is enough to show that the fidi distributions of a point process as at (9.2.4), involving only bounded subsets of X, are determined by the avoidance function. We use an indicator function Z(B), B ∈ A, and a dissecting system T as in and below (9.3.12), for which the r.v.s
ζn(A) = Σ_{i=1}^{kn} Z(Ani)   (n = 1, 2, . . .)    (9.2.15)
count the numbers of sets in Tn containing points of N(·). Because every Ani is the union of elements in Tn+1, and the r.v.s Z(·) are subadditive set functions, it follows that {ζn(A)} is a nondecreasing sequence. Moreover, because N is simple and {Tn} is a dissecting system, the limit
N(A) ≡ lim_{n→∞} ζn(A)    (9.2.16)
exists a.s. Now the joint distribution of the Z(Ani), and hence of ζn(A), and (more generally) of {ζn(Ai) (i = 1, . . . , k)}, is expressible directly in terms of the avoidance function: for example,
P{ζn(A) = r} = Σ_{{i1,...,ir}} ∆(Ani1, . . . , Anir) P0(A \ ∪_{j=1}^r Anij),    (9.2.17)
where the sum is taken over all (kn choose r) distinct combinations of r sets from the kn (≥ r) sets in the partition Tn of A. Rather more cumbersome formulae give the joint distributions of the ζn(Ai). Because the convergence of {ζn} to its limit is monotonic, the sequence of events {ζn(Ai) ≤ ni (i = 1, . . . , k)} is also monotone decreasing in n, and thus
P{ζn(Ai) ≤ ni (i = 1, . . . , k)} → P{N(Ai) ≤ ni (i = 1, . . . , k)}.
Thus, P0 determines the fidi distributions as asserted.
Corollary 9.2.XIII. Let N1, N2 be two point processes on X whose avoidance functions coincide on the bounded sets of a dissecting ring for X. Then their support point processes N1* and N2* are equivalent.
Versions of these results that apply to random measures can be given (see Exercise 9.2.7): the avoidance functions are replaced by the Laplace transforms E(e^{−sξ(A)}) for fixed s > 0.
We turn finally to a characterization problem.
Definition 9.2.XIV. A set function ψ defined on a ring R of sets is completely monotone on R if for every sequence {A, A1, A2, . . .} of members of R,
∆(A1, . . . , An) ψ(A) ≥ 0   (every n = 1, 2, . . .).
Note that this definition of complete monotonicity differs from conventional usage by the omission of a factor (−1)^n on the left-hand side of the inequality [see also the definition of ∆ at (9.2.12)]. Using Definition 9.2.XIV, Lemma 9.2.XI asserts that the avoidance function of a point process is completely monotone on BX. Complete monotonicity of a set function is not sufficient on its own to characterize an avoidance function.
Theorem 9.2.XV (Kurtz, 1974). Let ψ be a set function defined on the members of a dissecting ring R covering the c.s.m.s. X. In order that there exist a point process on X with avoidance function ψ, it is necessary and sufficient that
(i) ψ be completely monotone;
(ii) ψ(∅) = 1;
(iii) ψ(An) → 1 for any bounded sequence {An} in R for which An → ∅ (n → ∞); and
(iv) for every bounded A ∈ R,
lim_{r→∞} lim_{n→∞} [ψ(A) + Σ_{k=1}^r Σ_{{i1,...,ik}} ∆(Ani1, . . . , Anik) ψ(A \ ∪_{j=1}^k Anij)] = 1,
where {Tn} = {Ani: i = 1, . . . , kn} is a dissecting system for A, {Tn} ⊆ R, and the inner summation is over all distinct combinations of k sets from the kn sets in the partition Tn for A.
Proof. The necessity of (i) has been noted in Lemma 9.2.XI, condition (ii) is self-evident, and condition (iii) here is the same as (iv) of Theorem 9.2.X. Condition (iv) here follows most readily from (9.2.17) when written in the form
lim_{r→∞} lim_{n→∞} Σ_{k=0}^r P{ζn(A) = k} = lim_{r→∞} P{N(A) ≤ r} = 1,
and expresses the fact that a point process N is boundedly finite.
For the sufficiency, it is clear from (i) and (ii) that we can construct an indicator process Z′ on bounded A ∈ R with fidi distributions (for any finite number k of disjoint bounded A1, . . . , Ak ∈ R)
Pr{Z′(A1) = 0} = 1 − Pr{Z′(A1) = 1} = ψ(A1),   Pr{Z′(A1) = 1} = ∆(A1) ψ(∅),    (9.2.18a)
Pr{Z′(Ai) = 1 (i = 1, . . . , k)} = ∆(A1, . . . , Ak) ψ(∅),
Pr{Z′(Aj) = 0, Z′(Ai) = 1 (all i ≠ j)} = ∆(A1, . . . , Aj−1, Aj+1, . . . , Ak) ψ(Aj),    (9.2.18b)
Pr{Z′(Ai) = 0 (all i)} = ψ(∪_{i=1}^k Ai);
nonnegativity is ensured by (i), summation to unity by (ii), and marginal consistency reduces to
∆(A1, . . . , Ak+1)ψ(B) + ∆(A1, . . . , Ak)ψ(B ∪ Ak+1) = ∆(A1, . . . , Ak)ψ(B).
In other words, we have a family of fidi distributions that, being consistent in the sense of the Kolmogorov existence theorem [e.g., Parthasarathy (1967, Chapter V)], enable us to assert the existence of a probability space (Z, E, P′) on which are jointly defined {0, 1}-valued r.v.s {Z′(A): bounded A ∈ R}, and P′ is related to ψ via relations such as (9.2.18) (with Pr replaced by P′). We now introduce r.v.s ζn(A) (bounded A ∈ R) much as at (9.2.15) and observe that the subadditivity of Z′ implies that the ζn(A) are a.s. monotone nondecreasing under refinement as before, so
N(A) ≡ lim_{n→∞} ζn(A)    (9.2.19)
exists a.s. and, being the limit of an integer-valued sequence, is itself integer-valued or infinite. From the last relation at (9.2.18b), we have P′{ζn(A) = 0} = ψ(A) for all n, so
P′{N(A) = 0} = ψ(A)   (all bounded A ∈ R).    (9.2.20)
The a.s. finiteness condition that N must satisfy on bounded A ∈ R is equivalent to demanding that
lim_{y→∞} lim_{n→∞} P′{ζn(A) ≤ y} = 1,
which, expressed in terms of the function ψ via relations such as (9.2.17) and (9.2.18) (with P′ and ψ replacing P and P0), reduces to condition (iv).
For bounded disjoint A, B ∈ R, we find by using a dissecting system for A ∪ B containing dissecting systems for A and B separately that
N(A ∪ B) = lim_{n→∞} ζn(A ∪ B) = lim_{n→∞} [ζn(A) + ζn(B)] = N(A) + N(B)   a.s.,
and thus N is finitely additive on R.
Let {Ai} be any disjoint sequence in R with bounded union A ≡ ∪_{i=1}^∞ Ai ∈ R; we seek to show that N(A) = Σ_{i=1}^∞ N(Ai). Let Br = ∪_{i=r+1}^∞ Ai = A \ ∪_{i=1}^r Ai, so that Br is bounded, Br ∈ R, and Br ↓ ∅ (r → ∞); thus P′{N(Br) = 0} = ψ(Br) ↑ 1, that is, N(Br) → 0 a.s. Define events Cr ∈ E for r = 0, 1, . . . by
C0 = {N: N(A) = 0}   and   Cr = {N: N(Br) = 0 < N(Br−1)}.
Then P′(C0 ∪ C1 ∪ · · ·) = 1, and N(A) = Σ_{i=1}^r N(Ai) + N(Br) on Cr. Also, on Cr it follows from 0 = N(Br) = lim_{n→∞} ζn(Br) that N(Ai) = 0 for i ≥ r + 1, and hence Σ_{i=r+1}^∞ N(Ai) = 0 on Cr. Because P′(∪_{r=0}^∞ Cr) = 1, it now follows that N is countably additive on R. Then by the usual extension theorem for measures, N can be extended a.s. to a countably additive boundedly finite nonnegative integer-valued measure on BX. This extension, with the appropriate modification on the P′-null set where the extension may fail, provides the required example of a point process with avoidance function P′{N(A) = 0} = ψ(A) (A ∈ R) satisfying conditions (i)–(iv).
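The device common to Theorem 9.2.XII and the sufficiency proof just completed — counting the cells of finer and finer partitions that contain points — can be watched on a simulation. The sketch below (Python; a homogeneous Poisson process on [0, 1) with an illustrative mean, and the dyadic intervals as dissecting system) shows ζn(A) nondecreasing in n and stabilizing at N(A) once every point of the (a.s. simple) realization occupies its own cell.

```python
import numpy as np

rng = np.random.default_rng(1)
mean = 6.0                                         # E N([0,1)), illustrative
points = rng.uniform(0.0, 1.0, rng.poisson(mean))  # one simple realization

def zeta(n):
    """zeta_n(A): number of dyadic cells [i/2^n, (i+1)/2^n) hit by N."""
    return len(set(np.floor(points * 2**n).astype(int)))

counts = [zeta(n) for n in range(1, 15)]
print(counts, len(points))
# zeta_n increases under refinement and is bounded by N(A) = len(points);
# for fine n the two coincide a.s., since distinct uniforms eventually
# fall in distinct cells
```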
Exercises and Complements to Section 9.2
9.2.1 Give an example of a family of fidi distributions satisfying (9.2.6) for disjoint sets and all other requirements of Theorem 9.2.VII apart from Condition 9.2.VI(a), which is not satisfied. [Hint: Let X be a two-point space, {x, y} say, and construct a r.v. Z and a random set function ξ for which Z has the distribution of ξ({x, y}) but Z ≠ ξ({x}) + ξ({y}).]
9.2.2 Give an example of a family of fidi distributions that satisfy (9.2.6) for k = 2 but not for some k ≥ 3, and hence do not satisfy the consistency Condition 9.2.V(b). [Hint: Modify the previous example.]
9.2.3 Show that the joint distributions of the Dirichlet process of Example 9.1(e) are consistent.
9.2.4 Let Ψ = ξ1 − ξ2 be the difference of two random measures. For each ω, let Ψ = Ψ+ − Ψ− be the Jordan–Hahn decomposition of Ψ(ω) (Theorem A1.3.IV). Determine conditions under which the mappings Ψ+ and Ψ− are measurable and hence define random measures. Investigate the extent to which such conditions can be extended to more general settings.
9.2.5 Let A1, . . . , An be disjoint, A = ∪_{i=1}^n Ai, and ψ(∅) = 1. Verify that the operator ∆ at (9.2.12) satisfies (a) and (b) below, and complete the induction proof of (9.2.13).
(a) ∆(A)ψ(B) = Σ_{k=1}^n Σ_{1≤i1<···<ik≤n} ∆(Ai1, . . . , Aik) ψ(B ∪ (A \ ∪_{j=1}^k Aij)),
(b) 1 − ψ(A) = Σ_{k=1}^n Σ_{1≤i1<···<ik≤n} ∆(Ai1, . . . , Aik) ψ(A \ ∪_{j=1}^k Aij).
9.2.6 Let A1, A2, . . . be disjoint, A = ∪_{i=1}^∞ Ai, and ψ(∅) = 1. Show that if
Σ_{k=1}^∞ Σ_{m≤i1<···<ik} ∆(Ai1, . . . , Aik) ψ(A \ ∪_{j=1}^k Aij) → 0   (m → ∞),
then
1 − ψ(A) = Σ_{k=1}^∞ Σ_{1≤i1<···<ik} ∆(Ai1, . . . , Aik) ψ(A \ ∪_{j=1}^k Aij).
9.2.7 Let ξ be a random measure. Show that for each fixed s > 0 the transform ϕs(A) ≡ E(e^{−sξ(A)}) is completely monotone in the argument A. Use this to develop a characterization theorem for random measures parallel to Theorem 9.2.XV. [Hint: ϕs(A) is the avoidance function of a Cox process with intensity measure sξ(·). See Kallenberg (1975, Section 5.3).]
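A numerical companion to Exercise 9.2.7 (a sketch only, with illustrative parameter values): for the stationary gamma random measure of Example 9.2(a), ϕs(A) = E(e^{−sξ(A)}) = (1 + λs)^{−α(A)}, and the iterated differences of (9.2.12) applied to ϕs should all be nonnegative — ϕs is the avoidance function of a Cox process, so Lemma 9.2.XI applies verbatim, overlapping sets included.

```python
import math
from itertools import combinations

alpha, lam, s = 2.0, 1.5, 0.7            # illustrative parameter values
c = (1 + lam * s) ** (-alpha)            # transform of one unit cell

def phi(B):
    """phi_s(B) = (1 + lam*s)^(-alpha*l(B)), B a frozenset of unit cells."""
    return c ** len(B)

def delta(sets, psi, B=frozenset()):
    # iterated difference operator of (9.2.12)
    if len(sets) == 1:
        return psi(B) - psi(sets[0] | B)
    return delta(sets[:-1], psi, B) - delta(sets[:-1], psi, sets[-1] | B)

pairs = [frozenset(t) for t in combinations(range(4), 2)]  # overlapping sets
vals = [delta(list(fam), phi, frozenset({9}))
        for fam in combinations(pairs, 3)]
print(min(vals))   # complete monotonicity: every iterated difference >= 0
```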
9.3. Sample Path Properties: Atoms and Orderliness
The outstanding feature of the sample paths or realizations of a random measure is their countable additivity. Nevertheless, some additional questions remain: for example, under what conditions on the fidi distributions will
the realizations be almost surely purely atomic? Or purely nonatomic? Or absolutely continuous with respect to some given measure? Similarly, in the point process context, we may ask for analytic conditions equivalent to the sample-path property of simplicity. This section discusses some basic questions of this kind; Section 10.1 takes the discussion further in the particular context of completely random measures.

A technique that runs through our analysis here and later, and has been exploited, for example, by Leadbetter (1968, 1972), Kallenberg (1975), and others, is the use of dissecting systems. Recall from Definition A1.6.I that a dissecting system T = {Tn: n = 1, 2, ...} for the space X is a nested sequence of finite partitions Tn = {Ani: i = 1, ..., kn} of Borel sets Ani that ultimately separate points of X; that is, given any two distinct points x and y, there exists an n such that x and y are contained in distinct members of Tn (and hence in distinct members of Tn' for all n' ≥ n). We note also that for any A ∈ BX, T ∩ A is a dissecting system for A. Proposition A2.1.IV asserts that such systems exist for any c.s.m.s. X. If, additionally, X is locally compact (such as when X is Euclidean), a sufficient condition for a family of nested partitions T to be dissecting for a bounded Borel set A is that
max_{1≤i≤kn} diam(Ani) → 0    (n → ∞).    (9.3.1)
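The dyadic intervals of (0, 1] provide a concrete instance. The short sketch below (purely illustrative, not part of the text) checks the separation property and the diameter condition (9.3.1) for this system.

```python
import math

# Illustrative sketch (not from the text): the dyadic partitions T_n of
# A = (0,1], with cells A_ni = ((i-1)/2^n, i/2^n], form a dissecting
# system: they are nested, they eventually separate distinct points,
# and max_i diam(A_ni) = 2^-n -> 0, as required by (9.3.1).

def dyadic_cell(x, n):
    """Index i of the level-n cell ((i-1)/2^n, i/2^n] containing x."""
    return math.ceil(x * 2 ** n)

def separation_level(x, y):
    """Smallest n at which distinct x, y lie in distinct cells of T_n."""
    n = 1
    while dyadic_cell(x, n) == dyadic_cell(y, n):
        n += 1
    return n

n0 = separation_level(0.500001, 0.500002)
# Once separated, the points stay separated at every finer level,
# because the partitions are nested.
assert all(dyadic_cell(0.500001, n) != dyadic_cell(0.500002, n)
           for n in range(n0, n0 + 10))
# The maximum cell diameter at level n is exactly 2^-n.
assert max((i + 1) / 2 ** 10 - i / 2 ** 10 for i in range(2 ** 10)) == 2 ** -10
```

The nesting of the partitions is what makes "eventually separated" permanent, mirroring the parenthetical remark above.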
Returning to sample paths, we start with a consideration of the fixed atoms, abbreviating the notation ξ({x}) to ξ{x}.

Definition 9.3.I. The point x0 is a fixed atom of the random measure ξ if P{ξ{x0} > 0} > 0.

It must be remarked that the adjective 'fixed' here refers to the locations of the atoms of the realizations ξ; by contrast, if a realization ξ has ξ{x} > 0 but P{ξ{x} > 0} = 0, then such an atom is termed a random atom.

Lemma 9.3.II. A random measure ξ is free of fixed atoms on the bounded Borel set A if and only if for some (and then every) dissecting system T for A, for every fixed ε > 0,
max_{1≤i≤kn} P{ξ(Ani) > ε} → 0    (n → ∞).    (9.3.2)
Proof. Assume first that for some dissecting system T for A, some ε > 0, and some η > 0,
P{ξ(Anj) ≥ ε} ≥ η    (9.3.3)
for at least one set Anj in each Tn, n = 1, 2, .... For each n let Jn denote all such index pairs (n, j); set J = J1 ∪ J2 ∪ ···. Because the partitions Tn are nested, every A_{n+1,j} ⊆ A_{n,j'(j)} for some j'(j), and because ξ is a measure, ξ(A_{n+1,j}) ≤ ξ(A_{n,j'(j)}) a.s. Consequently, the infinite set J is such that every Jn contains at least one pair (n, j) for which Anj ⊇ A_{n',j'} for infinitely many pairs (n', j') ∈ J. Use this property to construct a sequence {(n, j(n)): n = 1, 2, ...} such that A_{n,j(n)} ⊇ A_{n+1,j(n+1)} and (n, j(n)) ∈ Jn
9. Basic Theory of Random Measures and Point Processes
for each n. By the separation property of T, A∞ ≡ lim_{n→∞} A_{n,j(n)} is either the empty set ∅ or a singleton set, {x0} say. Because A is bounded, so is A_{1,j(1)}, and therefore ∞ > ξ(A_{n,j(n)}). Because ξ is a random measure, we then have
ξ(A_{n,j(n)}) ↓ ξ(A∞)    (n → ∞),
and therefore P{ξ(A∞) ≥ ε} ≥ η. Because ξ(∅) = 0 a.s. and ε > 0, we must have A∞ = {x0}, and thus ξ has at least one fixed atom.

Conversely, if ξ has a fixed atom x0 ∈ A, there exists ε' > 0 for which 0 < P{ξ{x0} > ε'} ≡ η'. Then, given any dissecting system T for A, there exists a set A_{n,j(n)} in each Tn such that x0 ∈ A_{n,j(n)}, and
P{ξ(A_{n,j(n)}) > ε'} ≥ P{ξ{x0} > ε'} = η'    (all n),
so (9.3.2) fails for any dissecting system T for A.

Once the fixed atoms have been identified, one would anticipate representing the random measure as the superposition of two components, the first containing all the fixed atoms and the second free from fixed atoms. It is not immediately clear, however, that this procedure corresponds to a measurable operation on the original process. The following lemma establishes this fact.

Lemma 9.3.III. The set D of fixed atoms of a random measure is at most countably infinite.

Proof. Suppose on the contrary that D is uncountable. Because X can be covered by the union of at most countably many bounded Borel sets, there exists a bounded set, A say, containing uncountably many fixed atoms. Define the subset D_ε of D ∩ A by
D_ε = {x: P{ξ{x} > ε} > ε},    (9.3.4)
observing by monotonicity that D ∩ A = lim_{ε↓0} D_ε. If D_ε were finite for every ε > 0, then by a familiar construction we could deduce that D ∩ A is countable; so for some positive ε, which we fix for the remainder of the proof, D_ε is infinite. We can extract from D_ε an infinite sequence of distinct points {x1, x2, ...} for which the events En ≡ {ξ: ξ{xn} > ε} have P(En) > ε. Because ξ is boundedly finite,
0 = P{ξ(A) = ∞} ≥ P{ξ{x} > ε for infinitely many x ∈ D_ε} ≥ P{infinitely many En occur}
  = P(∩_{n=1}^∞ ∪_{k=n}^∞ Ek) = lim_{n→∞} P(∪_{k=n}^∞ Ek) ≥ ε > 0,
thereby yielding a contradiction.
It is convenient to represent the countable set D by {xk} and to write Uk ≡ Uk(ω) for the random variable ξ{xk}. Using the Dirac measure δx as in (9.1.3), the set function ξc defined for bounded Borel sets A by
ξc(A, ω) = ξ(A, ω) − Σ_{xk∈D} Uk(ω) δ_{xk}(A)
is nonnegative and countably additive in A, and for every such A it defines a random variable. Thus, it defines a new random measure that is clearly free from fixed atoms, and we have proved the following extension of Proposition 9.1.III(i), from properties of a fixed measure µ to those of a random measure ξ.

Proposition 9.3.IV. Every random measure ξ can be written in the form
ξ(·, ω) = ξc(·, ω) + Σ_{k=1}^∞ Uk(ω) δ_{xk}(·),
where ξc is a random measure without fixed atoms, the sequence {xk: k = 1, 2, ...} constitutes the set D of all fixed atoms of ξ, and {Uk: k = 1, 2, ...} is a sequence of nonnegative random variables.

Consider next the more general question of finding conditions for the trajectories to be a.s. nonatomic. As before, let A be a bounded Borel set and T = {Tn: n = 1, 2, ...} a dissecting system for A. For any given ε > 0, we can 'trap' any atoms of ξ with mass ε or greater by the following construction. For each n, set
N_ε^(n)(A) = #{i: Ani ∈ Tn, ξ(Ani) ≥ ε}.    (9.3.5)
Then each N_ε^(n)(A) is a.s. finite, being bounded uniformly in n by ξ(A)/ε. Moreover, as n → ∞, N_ε^(n)(A) converges a.s. to a limit r.v., N_ε(A) say, which is independent of the particular dissecting system T and which represents the number of atoms in A with mass ε or greater (see Exercise 9.3.2 for a more formal treatment of these assertions). Consequently, ξ is a.s. nonatomic on A if and only if for each ε > 0, N_ε(A) = 0 a.s. Because N_ε^(n)(A) converges a.s. to N_ε(A) irrespective of the value of the latter, a necessary and sufficient condition for N_ε = 0 a.s. is that N_ε^(n) → 0 in probability. This leads to the following criterion.

Lemma 9.3.V. The random measure ξ is a.s. nonatomic on bounded A ∈ BX if and only if for every ε > 0 and for some (and then every) dissecting system T for A,
P{#{i: ξ(Ani) ≥ ε} > 0} → 0    (n → ∞).    (9.3.6)

Corollary 9.3.VI. A sufficient condition for ξ to be a.s. nonatomic on bounded A ∈ BX is that for some dissecting system for A and every ε > 0,
Σ_{i=1}^{kn} P{ξ(Ani) ≥ ε} → 0    (n → ∞).    (9.3.7)
If ξ is a completely random measure then this condition is also necessary.
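As a small numerical illustration of the trapping counts N_ε^(n) behind these criteria (a sketch using an invented, deterministic measure with two atoms plus a uniform component, not an example from the text): the counts stabilize at the number of atoms of mass at least ε, so they tend to zero exactly when no such atoms are present.

```python
# Hypothetical illustrative measure on (0,1]: two atoms plus a small
# diffuse (Lebesgue) component.  Not from the text.
ATOMS = {1 / 3: 0.5, 0.7: 0.2}
DIFFUSE = 0.3  # total mass spread uniformly over (0,1]

def mu(a, b):
    """Mass of the half-open interval (a, b]."""
    atom_mass = sum(m for x, m in ATOMS.items() if a < x <= b)
    return atom_mass + DIFFUSE * (b - a)

def n_eps(n, eps):
    """N_eps^(n)(A): number of level-n dyadic cells with mass >= eps."""
    k = 2 ** n
    return sum(1 for i in range(k) if mu(i / k, (i + 1) / k) >= eps)

# For fine partitions the count equals the number of atoms of mass >= eps:
# here 2 atoms of mass >= 0.1, and only 1 of mass >= 0.4.
assert n_eps(12, 0.1) == 2
assert n_eps(12, 0.4) == 1
```

With the atoms removed (ATOMS empty) every cell mass would be DIFFUSE·2^(−n), so the counts would vanish for large n, in line with (9.3.6).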
Proof. Equation (9.3.7) is sufficient for (9.3.6) to hold because
P{∪_{i=1}^{kn} {ξ(Ani) ≥ ε}} ≤ Σ_{i=1}^{kn} P{ξ(Ani) ≥ ε}.
When ξ is completely random (Definition 10.1.I), the r.v.s ξ(Ani) are mutually independent and hence
1 − P{∪_{i=1}^{kn} {ξ(Ani) ≥ ε}} = P{∩_{i=1}^{kn} {ξ(Ani) < ε}}
  = Π_{i=1}^{kn} P{ξ(Ani) < ε} = Π_{i=1}^{kn} (1 − P{ξ(Ani) ≥ ε}).    (9.3.8)
If now ξ is a.s. nonatomic, then by (9.3.6) the left-hand side of (9.3.8) converges to 1 as n → ∞. Finally, the convergence to 1 of the product on the right-hand side of (9.3.8) implies (9.3.7).

Exercise 9.3.3 shows that (9.3.7) is not necessary for a random measure to be absolutely continuous. Indeed, it appears to be an open problem to find simple sufficient conditions, analogous to Corollary 9.3.VI, for the realizations of a random measure to be a.s. absolutely continuous with respect to a given measure (see also Exercise 9.1.7).

Example 9.3(a) Quadratic random measures [see also Example 9.1(b)]. Take A to be the unit interval (0, 1], and for n = 1, 2, ... divide this into kn = 2^n subintervals each of length 2^{−n} to obtain suitable partitions for a dissecting system T. Each ξ(Ani) can be represented in the form
ξni ≡ ξ(Ani) = ∫_{i/kn}^{(i+1)/kn} Z²(t) dt ≈ (1/kn) Z²(i/kn).
Because Z²(i/kn) has a χ² distribution on one degree of freedom, we deduce (as may be shown by a more careful analysis) that
Pr{ξni > ε} = Pr{Z²(i/kn) > εkn}(1 + o(1)) ≤ C exp(−½εkn)
for some finite constant C. Then
Σ_{i=1}^{kn} Pr{ξni > ε} ≤ C kn e^{−½εkn} → 0    (n → ∞).
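The cell-mass computation in this example can be mimicked numerically. The sketch below is purely illustrative and makes simplifying assumptions not in the text: Z is approximated by a random-walk version of Brownian motion (any a.s. continuous Gaussian process would serve), and each ξ(Ani) by a Riemann sum on a fine grid.

```python
import random

# Sketch of Example 9.3(a) under stated assumptions: approximate an a.s.
# continuous Gaussian process Z on (0,1] by a scaled random walk, and
# each cell mass xi(A_ni) = \int_{A_ni} Z^2(t) dt by a Riemann sum.
random.seed(1)
M = 2 ** 12                                    # grid points
dt = 1.0 / M
z, path = 0.0, []
for _ in range(M):
    z += random.gauss(0.0, 1.0) * dt ** 0.5    # Brownian increments
    path.append(z)
masses = [p * p * dt for p in path]            # grid values of Z^2 dt

def max_cell_mass(n):
    """max_i xi(A_ni) over the 2^n dyadic cells of (0,1]."""
    w = M // 2 ** n
    return max(sum(masses[i * w:(i + 1) * w]) for i in range(2 ** n))

# Refining the partition can only decrease the maximum cell mass, and for
# this continuous-integrand measure it decreases strictly: no atoms.
vals = [max_cell_mass(n) for n in range(1, 9)]
assert all(a >= b for a, b in zip(vals, vals[1:]))
assert vals[-1] < vals[0]
```

The shrinking maxima are the numerical counterpart of Σ_i Pr{ξni > ε} → 0 above.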
Thus ξ, being the integral of an a.s. continuous integrand, has a.s. nonatomic trajectories.

We turn now to point processes; ultimately we generalize the results of Section 3.3. Simplicity is again a sample-path property; in terms of Proposition 9.1.III, it occurs when N coincides with its support counting measure N*, although this is not the only way it can be described. For MPPs, we note as in Exercise 9.1.6 that any MPP that is not simple can be redefined as a simple MPP with marks in the compound space K^∪. In particular, a nonsimple point process on X can be regarded as a simple point process on X × {1}^∪ = X × Z+ [recall also Proposition 9.1.III(iv)].

In practice, it is arguably more useful to have analytic conditions for a point process to be simple. The main approach in developing such conditions is again via dissecting systems. We start from the representation (9.1.6) of a counting measure N, writing it now in the form
N = Σ_{k=1}^∞ k N_k*,    (9.3.9)
where
N_k*(A) = #{xi ∈ A: N{xi} = k}    (k = 1, 2, ...).    (9.3.10)
Then for the support counting measure N* of N we can write [cf. (9.1.7)]
N* = Σ_{k=1}^∞ N_k*.    (9.3.11)
We would like to regard (9.3.9) and (9.3.11) as statements concerning point processes as well as statements about individual realizations. To this end we use Proposition 9.1.VIII. It is clear from the construction that N* and each N_k* are elements of N_X^{#*}; the essential point is to show that for any bounded Borel set A, N*(A) and N_k*(A) are r.v.s. We establish this by a construction which plays an important role in later arguments. Suppose then that N is a point process, and for bounded B ∈ BX define the indicator functions
Zk(B) = I{1 ≤ N(B) ≤ k} = 1 if 1 ≤ N(B) ≤ k, and 0 otherwise,    (9.3.12a)
Z(B) = I{N(B) ≥ 1} = 1 if N(B) ≥ 1, and 0 otherwise,    (9.3.12b)
which are r.v.s because the N(B) are r.v.s. All of the Zk, as well as Z, are subadditive set functions, meaning, for example, that for A, B ∈ BX, Zk(A ∪ B) ≤ Zk(A) + Zk(B). Such subadditive set functions have the important property that
P{1 ≤ N(A ∪ B) ≤ k} = E[Zk(A ∪ B)] ≤ P{1 ≤ N(A) ≤ k} + P{1 ≤ N(B) ≤ k},    (9.3.13)
with similar inequalities for P{N(B) ≥ 1}.
As around (9.3.9), let T = {Tn} = {Ani} be a dissecting system for any given bounded A ∈ BX. Then
ζ_k^(n)(A) = Σ_{i: Ani∈Tn} Zk(Ani),    ζ^(n)(A) = Σ_{i: Ani∈Tn} Z(Ani)
are further r.v.s. Here, for example, ζ^(n) counts the number of subsets in the nth partition containing at least one point of N. The ζ_k^(n) are nondecreasing in n, as follows from (9.3.13) and the fact that T_{n+1} partitions each Ani ∈ Tn. So for fixed k and A,
ζ_k^(n)(A) ≤ ζ_k^(n+1)(A) ≤ N(A) < ∞    (n = 1, 2, ...);
hence
ζ_k(A) = lim_{n→∞} ζ_k^(n)(A)    and    ζ(A) = lim_{n→∞} ζ^(n)(A),    (9.3.14)
being monotone limits of bounded sequences, exist a.s. and again define r.v.s. Although spheres are used in the derivation of (9.1.5), we could equally use elements of T, thereby showing explicitly that ζ1(A) = N_1*(A) and that lim_{k→∞} ζk(A), which exists because ζk(A) ≤ ζ_{k+1}(A) ≤ N(A) (k = 1, 2, ...), equals N*(A). Thus N_1*(A), N*(A), and
N_k*(A) = ζk(A) − ζ_{k−1}(A)    (k = 2, 3, ...)
are all random variables. It follows from Proposition 9.1.VIII that, whenever N is a point process, the random set functions N_k*(·) and N*(·) are also point processes. We summarize this discussion in the following proposition.

Proposition 9.3.VII. For any point process N, the constructions at (9.1.7) and (9.3.10) applied to realizations of N define simple point processes N* and N_k*, and the relations (9.3.9) and (9.3.11) hold as relations between point processes. In particular, N is simple if and only if N_k* = 0 a.s. (k = 2, 3, ...).

The next result generalizes Proposition 3.3.I and identifies a general form of the intensity with the first moment measure of N*.

Definition 9.3.VIII. The intensity measure of a point process N is the set function
λ(A) ≡ sup_{Tn∈T(A)} Σ_{i: Ani∈Tn} P{N(Ani) ≥ 1}    (A ∈ BX).
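For a finite realization the relations (9.3.9) and (9.3.11) can be checked directly. The sketch below uses an invented realization with multiplicities (illustrative only):

```python
from collections import Counter

# Illustrative sketch: a finite counting measure N on (0,1] with
# multiplicities, and the derived support measures N* and N_k^*.
points = [0.2, 0.2, 0.2, 0.5, 0.9, 0.9]   # N{0.2}=3, N{0.5}=1, N{0.9}=2
mult = Counter(points)

def N(a, b):            # N(a,b]: total points, counted with multiplicity
    return sum(m for x, m in mult.items() if a < x <= b)

def N_star(a, b):       # N*(a,b]: number of distinct locations
    return sum(1 for x in mult if a < x <= b)

def N_k_star(k, a, b):  # N_k^*(a,b]: locations of multiplicity exactly k
    return sum(1 for x, m in mult.items() if m == k and a < x <= b)

A = (0.0, 1.0)
assert N(*A) == 6 and N_star(*A) == 3
# (9.3.9): N = sum_k k N_k^*,  and  (9.3.11): N* = sum_k N_k^*
assert N(*A) == sum(k * N_k_star(k, *A) for k in range(1, 10))
assert N_star(*A) == sum(N_k_star(k, *A) for k in range(1, 10))
```

The realization is simple precisely when N_k_star vanishes for every k ≥ 2, matching Proposition 9.3.VII.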
Proposition 9.3.IX (Khinchin's Existence Theorem). Whether finite or infinite,
λ(A) = E[N*(A)] ≡ M*(A);    (9.3.15)
it is independent of the choice of dissecting system T, and defines a measure when it is boundedly finite.

Proof. Using the monotonicity of ζ^(n) and the interchange of limits that monotone convergence permits, we find that for any given dissecting system for A,
M*(A) ≡ E[N*(A)] = E[lim_{n→∞} ζ^(n)(A)] = lim_{n→∞} E[ζ^(n)(A)]
  = lim_{n→∞} Σ_{i: Ani∈Tn} P{N(Ani) ≥ 1} = sup_{Tn} Σ_{i: Ani∈Tn} P{N(Ani) ≥ 1} = λ(A).
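For a homogeneous Poisson process (an illustrative special case, chosen here only because the supremum in Definition 9.3.VIII can be evaluated in closed form), the dyadic partial sums increase to the intensity, exactly as the theorem asserts:

```python
import math

# Sketch for a homogeneous Poisson process of rate mu on A = (0,1]
# (an assumption made for illustration): over the 2^n-cell dyadic
# partition, each cell count is Poisson with mean mu/2^n, so
#   sum_i P{N(A_ni) >= 1} = k_n (1 - exp(-mu/k_n)),  k_n = 2^n,
# which increases to lambda(A) = mu = E[N*(A)], as in (9.3.15).
mu = 3.7

def partial_intensity(n):
    k = 2 ** n
    return k * (1.0 - math.exp(-mu / k))

vals = [partial_intensity(n) for n in range(1, 25)]
assert all(a <= b for a, b in zip(vals, vals[1:]))   # monotone in refinement
assert abs(vals[-1] - mu) < 1e-5                     # the supremum equals mu
```

The monotone approach from below is the numerical face of the monotone-convergence step in the proof.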
The relation N*(A) = lim_{n→∞} ζ^(n)(A), and hence also λ(A) itself, is independent of the particular choice of dissecting system. The rest of the proposition follows easily from the fact that the first moment measure M(·) of a point process is indeed a measure [see the discussion around (5.4.1)].

A consequence of the equality of λ(·) with M*(·) is that it is a well-defined measure, possibly 'extended' in the sense that we may have λ(A) = ∞ for some bounded A ∈ BX. Notice also that Definition 9.3.VIII and the subsequent proposition form a particular case of a subadditive set function yielding a measure by addition under refinement (see Exercise 9.3.5).

In the next result the direct part is now trivial; it generalizes Proposition 3.3.IV and so may be called Korolyuk's theorem. The converse may be referred to as Dobrushin's lemma (cf. Proposition 3.3.V), because it extends naturally a result first given in Volkonski (1960) for stationary point processes.

Proposition 9.3.X. For a simple point process N,
λ(A) = M(A)    (all A ∈ BX).    (9.3.16)
Conversely, if (9.3.16) holds and λ(A) < ∞ for all bounded A, then N is simple.

Proof. When N is not simple, there is some bounded A ∈ BX for which ∆ ≡ P{N(A) ≠ N*(A)} > 0. Then
M(A) = E[N(A)] ≥ ∆ + E[N*(A)] = ∆ + λ(A),
and (9.3.16) cannot hold when λ(A) < ∞.
Proposition 9.3.VII asserts that each N_k* is a simple point process and thus has an intensity measure λ_k* = E[N_k*]. From (9.3.9) we may therefore deduce the generalized Korolyuk equation
M(A) = E[N(A)] = Σ_{k=1}^∞ k λ_k*(A) = λ(A) + Σ_{k=1}^∞ [λ(A) − λ_k(A)],    (9.3.17)
where λk = λ∗1 + · · · + λ∗k is the intensity measure of the simple point process ζk . Exercise 9.3.6 notes a version of (9.3.17) applicable to atomic random measures. Further analytic conditions for simplicity involve infinitesimals directly, and usually bear the name orderly or ordinary, the latter deriving from transliteration rather than translation of Khinchin’s original terminology. Definition 9.3.XI. A point process N is ordinary when, given any bounded
A ∈ BX, there is a dissecting system T = {Tn} = {Ani: i = 1, ..., kn} for A such that
inf_{Tn} Σ_{i=1}^{kn} P{N(Ani) ≥ 2} = 0.    (9.3.18)
Thus, for an ordinary point process, given ε > 0, there exist a dissecting system for bounded A and some n_ε such that for n ≥ n_ε,
ε > Σ_{i=1}^{kn} P{N(Ani) ≥ 2} ≥ P{∪_{i=1}^{kn} {N(Ani) ≥ 2}} ≡ P(Bn).
Now the sequence of sets {Bn} is monotone decreasing, so for any ε > 0,
ε > P{lim_{n→∞} Bn} = P{N{x} ≥ 2 for some x ∈ A},
which establishes the direct part of the following result. The proof of the converse is left as Exercise 9.3.7.

Proposition 9.3.XII. An ordinary point process is necessarily simple. Conversely, a simple point process of finite intensity is ordinary, but one of infinite intensity need not be ordinary.

The following two analytic conditions have a pointwise character, and hence may be simpler to check in practice than the property of being ordinary. To describe them, take any bounded set A containing a given point x, let T be a dissecting system for A, and for each n = 1, 2, ..., let An(x) denote the member of Tn = {Ani} that contains x.

Definition 9.3.XIII. (a) A point process on the c.s.m.s. X is µ-orderly at x when µ is a boundedly finite measure and
fn(x) ≡ P{N(An(x)) ≥ 2} / µ(An(x)) → 0    (n → ∞),    (9.3.19)
where if µ(An(x)) = 0 then fn(x) = 0 or ∞ according as P{N(An(x)) ≥ 2} = 0 or > 0.
9.3.
Sample Path Properties: Atoms and Orderliness
47
(b) The process is Khinchin orderly at x if
gn(x) ≡ P{N(An(x)) ≥ 2 | N(An(x)) ≥ 1} → 0    (9.3.20)
as n → ∞, where gn(x) = 0 if P{N(An(x)) ≥ 1} = 0.

In many situations the state space X is a locally compact group with a boundedly finite invariant measure ν. If a point process is ν-orderly for a dissecting system based on spheres, we speak of the process as being orderly. Such usage is consistent with Khinchin's (1955) original use of the term for stationary point processes on R; a point process on R that is uniformly analytically orderly in Daley's (1974) terminology is orderly in the present sense.

Proposition 9.3.XIV. Suppose that for bounded A ∈ BX the point process N is µ-orderly at x for x ∈ A, and satisfies
sup_n sup_{x∈A} fn(x) < ∞,    (9.3.21)
where fn(·) is defined by (9.3.19). Then N is simple on A.

Proof. From the definition at (9.3.19),
Σ_{i=1}^{kn} P{N(Ani) ≥ 2} = ∫_A fn(x) µ(dx).
Here, fn(x) → 0 pointwise, and by using (9.3.21) to justify an appeal to the dominated convergence theorem, the integral → 0 as n → ∞. The process is thus ordinary, and Proposition 9.3.XII completes the proof.

When λ is boundedly finite, the hypotheses here can be weakened by dropping the requirement (9.3.21) and demanding merely that (9.3.19) and (9.3.20) should hold for µ-a.e. x in A. This observation is the key to obtaining a partial converse to the proposition: we give it in the context of Khinchin orderliness.

Proposition 9.3.XV. A point process N with boundedly finite intensity measure λ(·) is simple if and only if it is Khinchin orderly for λ-a.e. x on X.

Proof. It suffices to restrict attention to a bounded set A. To prove sufficiency, use the fact that P{N(Ani) ≥ 1} ≤ λ(Ani) to write
Σ_{i=1}^{kn} P{N(Ani) ≥ 2} ≤ ∫_A gn(x) λ(dx) → 0    (n → ∞)
because from (9.3.20), 1 ≥ gn(x) → 0 for λ-a.e. x and λ(A) < ∞.

To prove the necessity, suppose that N is simple: we first show that
hn(x) ≡ P{N(An(x)) ≥ 1} / λ(An(x)) → 1    (n → ∞) for λ-a.e. x.    (9.3.22)
48
9. Basic Theory of Random Measures and Point Processes
We establish this convergence property via a martingale argument, much as in the treatment of Radon–Nikodym derivatives [see Lemma A1.6.III or, e.g., Chung (1974, Section 9.5(VIII))]. Construct a sequence of r.v.s {Xn} ≡ {Xn(ω)} on a probability space (A, BA, IP), where IP(·) = λ(·)/λ(A), by introducing the indicator r.v.s Ini(x) = 1 for x ∈ Ani, = 0 otherwise, and setting
Xn(x) = Σ_{i=1}^{kn} hn(x) Ini(x).    (9.3.23)
Let Fn denote the σ-field generated by the sets of T1 ∪ ··· ∪ Tn; because {Tn} is a nested system of partitions, Fn has a particularly simple structure. Then {Fn} is an increasing sequence of σ-fields, and on the set An(x) = Ani say,
E[Xn+1 | Fn] = Σ_{j: A_{n+1,j}⊆Ani} [λ(A_{n+1,j}) / λ(Ani)] · [P{N(A_{n+1,j}) ≥ 1} / λ(A_{n+1,j})]
  ≥ [P{N(Ani) ≥ 1} / λ(Ani)] Ini(x) = Xn    IP-a.s.,
where the inequality comes from the subadditivity of P{N(·) ≥ 1}, so {Xn} is a submartingale. Now hn(x) ≤ 1 (all x), so Xn(ω) ≤ 1 (all ω), and we can apply the submartingale convergence theorem [Theorem A3.4.III or, e.g., Chung (1974, Theorem 9.4.4)] and conclude that Xn(x) converges IP-a.s. Equivalently, lim_{n→∞} hn(x) exists λ-a.e. on A, so to complete the proof of (9.3.22) it remains to identify the limit. For this, it is enough to show that lim sup_{n→∞} hn(x) = 1 λ-a.e., and this last fact follows from hn(x) ≤ 1 (all x) and the chain of relations
λ(A) = lim_{n→∞} ∫_A hn(x) λ(dx) ≤ ∫_A lim sup_{n→∞} hn(x) λ(dx) ≤ λ(A),
in which we have used the lim sup version of Fatou's lemma. The same martingale argument can be applied to the function
hn^(1)(x) ≡ P{N(An(x)) = 1} / λ(An(x)) if λ(An(x)) > 0, and hn^(1)(x) ≡ 0 otherwise,
because the set function P{N(·) = 1} is again subadditive. Now for a simple point process with boundedly finite λ,
Σ_{i=1}^{kn} P{N(Ani) = 1} → λ_1*(A) = λ(A)    (n → ∞),
so it again follows that
hn^(1)(x) → 1    λ-a.e.    (9.3.24)
Combining (9.3.22) and (9.3.24), gn(x) = 1 − hn^(1)(x)/hn(x) → 0 as n → ∞ for λ-a.e. x.
Exercises 9.3.9–10 show that the λ-a.e. qualification cannot be relaxed. Clearly, from (9.3.22) and (9.3.24), the proposition could equally well be phrased in terms of λ-orderliness rather than Khinchin orderliness. In this form the significance of the result is more readily grasped, namely, that for a simple point process with boundedly finite λ, not only are M(·) and λ(·) interchangeable but also, for suitably 'small' sets δA and λ-a.e., we can interpret
M(δA) = λ(δA) = P{N(δA) = 1}(1 + o(1)).    (9.3.25)
Note that for any x with λ{x} > 0, λ{x} = P{N {x} = 1} = P{N {x} ≥ 1} because N is simple. Equation (9.3.25) provides a link between the statements in Chapter 3 of conditional probabilities as elementary limits and those in Chapter 13 derived by direct appeal to the Radon–Nikodym theorem. The converse parts of the last few propositions in this section have included the proviso that the intensity measure be boundedly finite; without this proviso, the assertions may be false (see Exercise 9.3.11 and the references there).
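For instance (an illustrative computation assuming a homogeneous Poisson process of rate µ on R, not a case treated at this point in the text): with δA = (0, h] one has M(δA) = λ(δA) = µh while P{N(δA) = 1} = µh e^{−µh}, so the ratio in (9.3.25) tends to 1 as h ↓ 0.

```python
import math

# Sketch of (9.3.25) for a homogeneous Poisson process of rate mu
# (an assumption for illustration): for dA = (0,h],
#   lambda(dA) = mu*h   and   P{N(dA) = 1} = mu*h*exp(-mu*h),
# so P{N(dA) = 1} / lambda(dA) -> 1 as h -> 0.
mu = 2.0

def ratio(h):
    lam = mu * h
    p_one = mu * h * math.exp(-mu * h)
    return p_one / lam

errs = [abs(ratio(10.0 ** -j) - 1.0) for j in range(1, 7)]
assert all(a > b for a, b in zip(errs, errs[1:]))   # error shrinks with h
assert errs[-1] < 1e-5
```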
Exercises and Complements to Section 9.3

9.3.1 Use Lemma 9.3.II to show that a general gamma random measure [see Example 9.1(d)] has no fixed atoms if and only if its shape parameter is a nonatomic measure. [See also the remark in Example 9.5(c).]

9.3.2 An elaboration of the argument leading to Lemma 9.3.V is as follows. Suppose given µ ∈ M_X^#, bounded A ∈ BX, a dissecting system T = {Tn} for A, and ε > 0. Define
ν_ε^(n)(A) = #{i: Ani ∈ Tn, µ(Ani) ≥ ε}.
Let x1, ..., xk be atoms in A whose masses u1, ..., uk are at least ε [and, because µ(A) < ∞, k ≡ k(ε) is certainly finite]. Verify the following.
(a) k = 0 if and only if ν_ε^(n)(A) = 0 for all sufficiently large n (cf. Lemma 9.3.III).
(b) More generally, there exists n_ε < ∞ such that ν_ε^(n)(A) = k for all n ≥ n_ε.
(c) Because k is independent of T, so is ν_ε(A) ≡ lim_{n→∞} ν_ε^(n)(A).
(d) Let A vary over the bounded Borel sets. Then ν_ε(·) is a measure (in fact, ν_ε ∈ N_X^{#*} of Definition 9.1.II).
(e) Given a random measure ξ, so that ξ(·, ω) ∈ M_X^#, denote by N_ε(·, ω) the counting measure that corresponds to ν_ε defined from µ = ξ(·, ω). Use Proposition 9.1.VIII to verify that N_ε is a random measure [indeed, by (d) and Definition 9.1.V, it is a simple point process].
(f) For each ε > 0 the relation µ_ε(A) = Σ_{i=1}^{k(ε)} ui δ_{xi}(A) defines a measure. Then lim_{ε↓0} µ_ε(A) = µ(A) if and only if µ is purely atomic.

9.3.3 For every A ∈ B([0, 1]) let ξ(A, ω) = ∫_A f(u, ω) du, where for 0 ≤ u ≤ 1, f(u, ω) = r(r + 1)u(1 − u)^{r−1} with probability 1/[r(r + 1)] for r = 1, 2, .... Note that ξ([0, 1], ω) = 1 a.s., but there is no interval of positive length in the neighbourhood of 0 where ξ is 'small' a.s. Investigate whether (9.3.7) is satisfied.
9.3.4 Absolutely continuous random distributions [cf. Example 9.1(e)]. Construct a sequence {Fn(·)} of distribution functions on (0, 1] as follows: Fn(0) = 0, Fn(1) = 1 (n = 1, 2, ...), F1(1/2) = Fn(1/2) = U11 (all n), and for odd k = 1, 3, ..., 2^n − 1,
Fn(k/2^n) = (1 − Unk) F_{n−1}((k − 1)/2^n) + Unk F_{n−1}((k + 1)/2^n) = F_{n+r}(k/2^n)    (n = 1, 2, ...; r = 1, 2, ...),
where Un ≡ {Unk: k = 1, ..., 2^n − 1} is a family of i.i.d. (0, 1)-valued r.v.s with EUnk = 1/2 and σn² = var Unk, the families {Un} are mutually independent, and Fn(x) is obtained by linear interpolation between Fn(jn(x)/2^n) and Fn((jn(x) + 1)/2^n), where jn(x) = largest integer ≤ 2^n x. With Fn(x) so defined on 0 ≤ x ≤ 1, the derivative fn(·) of Fn(·) is well defined except at x = anj ≡ j/2^n (j = 0, 1, ..., 2^n), where we adopt the convention that fn(anj) = 1 (all n, all anj).
(a) Show that there is a d.f. F on 0 ≤ x ≤ 1 such that
Pr{Fn(x) → F(x) (0 ≤ x ≤ 1) as n → ∞} = 1.
(b) Provided the Unj are sufficiently likely to be close to 1/2, for which the condition Σ_{n=1}^∞ σn² < ∞ is sufficient, show that
Pr{fn(x) → f(x) for 0 < x < 1} = 1
for some density function f(·) for which F(x) = ∫_0^x f(u) du a.s. for 0 < x < 1. Thus, the random d.f. F(·) is a.s. absolutely continuous with respect to Lebesgue measure. [Hint: Let the r.v. W be uniformly distributed on [0, 1] and independent of {Un}. Show that {fn(W)} is a martingale, and assuming Σ σn² < ∞, use the mean square martingale convergence theorem to deduce that fn(W) converges a.s. and that F is the integral of its limit.]
(c) Investigate other conditions, such as Σ_{n=1}^∞ Pr{|2Unk − 1| < 1 − ε} = ∞ for some ε > 0, or lim inf_{n→∞} σn² < 1/4, that may be sufficient to imply either that F(·) is continuous on (0, 1) or else that F(·) has jumps. [Remarks: For constructions related to the above by Kraft (1964), see Dubins and Freedman (1967), whose random d.f.s on [0, 1] are a.s. singular continuous, and Métivier (1971), where the construction leads to random d.f.s on [0, 1] that are a.s. absolutely continuous.]

9.3.5 Let the nonnegative set function ψ defined on the Borel subsets of a space X be subadditive under refinement. Define a set function, λψ say, much as in Definition 9.3.VIII. Show that if λψ(A) is finite on bounded A ∈ BX, then λψ is a measure. [Hint: Check that λψ is continuous at the empty set.]

9.3.6 Korolyuk equation for a purely atomic random measure. Let N be a marked point process on X × (0, ∞) for which the ground process Ng has boundedly finite intensity measure λ. Denote by {(xi, κi)} the points of a realization of N, and consider the random measure ξ_X = Σ_i κi δ_{xi} [see also (6.4.6) and (9.1.5)]. Show that for each finite x > 0 the set function
λ_x(A) ≡ sup_{Tn} Σ_{i=1}^{kn} P{0 < ξ_X(Ani) ≤ x}    (bounded A ∈ BX)
is an intensity measure for those points with marks ≤ x, and that [cf. (9.3.17)]
E[ξ_X(A)] = ∫_0^∞ [λ(A) − λ_x(A)] dx,    finite or infinite.
9.3.7 Use equation (9.3.16) to show that a simple point process with boundedly finite intensity measure is ordinary.

9.3.8 Show that a Poisson process is simple if and only if it is ordinary. [Hint: Show that being simple is equivalent to the parameter measure being nonatomic.]

9.3.9 Let the point process N on R+ have points located at {n²U: n = 1, 2, ...}, where the r.v. U is uniformly distributed on (0, 1). Show that for 0 < h < 1,
P{N(0, h] ≥ k} / P{N(0, h] ≥ 1} = 1/k²,    P{N(0, h] ≥ k} / λ((0, h]) = (6/π²)/k².
Conclude that the λ-a.e. constraint in Proposition 9.3.XV cannot be relaxed.

9.3.10 Suppose that the simple point process N on X has a boundedly finite intensity measure λ(·) that is absolutely continuous with respect to a measure µ(·) on X, and that there is a version of the Radon–Nikodym derivative dλ/dµ coinciding µ-a.e. with a continuous function. Use the techniques of the proof of Proposition 9.3.XV to show that, in the notation of (9.3.22),
P{N(An(x)) ≥ 1} / µ(An(x)) → dλ/dµ    (n → ∞) for µ-a.e. x.
Deduce in particular that when X = R^d and N is stationary,
P{N(An(x)) ≥ 1} / ℓ(An(x)) → const.    for ℓ-a.e. x,
where ℓ denotes Lebesgue measure, and then use stationarity both to identify the constant and to eliminate the ℓ-a.e. condition (cf. Theorem 13.3.IV).

9.3.11 When N is a stationary mixed Poisson process on R with mixing distribution function F(·),
P{N(0, x] = k} = ∫_0^∞ e^{−λx} (λx)^k / k! dF(λ)    (0 < x < ∞, k = 0, 1, ...).
Prove the following.
(i) N is simple [because F(∞−) = 1].
(ii) The intensity of N is finite or infinite according as lim_{y→∞} F1(y) is finite or infinite, where
F1(y) = ∫_0^y [1 − F(u)] du.
(iii) Whether the intensity is finite or not, the conditional probabilities at (9.3.20) converge to zero when F1 is slowly varying [i.e., F1 (2y)/F1 (y) → 1 as y → ∞]. (iv) N is orderly when y[1 − F (y)] → 0 as y → ∞. (v) N is ordinary when lim inf y→∞ y[1 − F (y)] = 0. [Hint: It follows from results in Daley (1982b) that these results are in fact necessary and sufficient. It is not difficult to find d.f.s F showing that none of the implications (iii) ⇒ (iv) ⇒ (v) ⇒ (i) can be reversed; for other examples see Exercise 3.3.2, Daley (1974, 1982a), and MKM (1978, pp. 68, 371), but note that MKM use the term orderly for what we have called ordinary.]
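The distributional claims in Exercise 9.3.9 above are easy to check numerically (a Monte Carlo sketch, not part of the text): since N(0, h] = #{n: n²U ≤ h} equals the integer part of √(h/U), the tail probabilities h/k² can be estimated by simulation.

```python
import math
import random

# Monte Carlo sketch for Exercise 9.3.9: the process has points at n^2 U
# (n = 1, 2, ...), with U uniform on (0,1), so the number of points in
# (0,h] is floor(sqrt(h/U)) and P{N(0,h] >= k} = h/k^2 for 0 < h < 1.
random.seed(7)
h, trials = 0.25, 200_000
ge = [0] * 6                  # ge[k] counts realizations with N(0,h] >= k
for _ in range(trials):
    n_pts = int(math.sqrt(h / random.random()))
    for k in range(1, 6):
        ge[k] += n_pts >= k

p = [g / trials for g in ge]
# The ratios P{N >= k} / P{N >= 1} should be close to 1/k^2 ...
assert abs(p[2] / p[1] - 1 / 4) < 0.02
assert abs(p[3] / p[1] - 1 / 9) < 0.02
# ... while P{N(0,h] >= 1} itself is close to h.
assert abs(p[1] - h) < 0.01
```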
9.4. Functionals: Definitions and Basic Properties

In the study of families of independent random measures and point processes, transforms analogous to Laplace–Stieltjes transforms and probability generating functions of nonnegative and integer-valued r.v.s play a central role. In this section we outline a general setting in the context of random measures and give general characterization results, before extending the discussion in Chapter 5 of the properties of probability generating functionals (p.g.fl.s) of point processes. Then in Section 9.5 we discuss moment measures and some of their connections with functionals of random measures and point processes.

Let f be a Borel measurable function, defined on the same c.s.m.s. X as the random measure ξ. No special definitions are needed to introduce the integral
ξf = ∫_X f(x) ξ(dx) ≡ ∫ f dξ,    (9.4.1)
for by assumption each realization of ξ is a boundedly finite Borel measure, and the usual theory of the Lebesgue integral applies on a realization-by-realization basis. In particular, if we introduce the space BM(X) of bounded measurable functions which vanish outside a bounded set in X, then with probability 1 the integral exists and is finite. It is also a random variable; this can be seen by first taking f to be the indicator function of a bounded Borel set and then applying the usual approximation arguments using linear combinations and monotone limits. Now the class of r.v.s is closed under both these operations, and ξf = ξ(A) is certainly an r.v. when f = I_A, so it follows that ξf is a (proper) r.v. for any f ∈ BM(X).

The study of such random integrals, which are evidently linear in f, links the theory of random measures with a whole hierarchy of theories of random linear functionals, of which the theory of random distributions is perhaps the most important, and is relevant in discussing second-order properties (see Chapter 8). We pause, therefore, to give a brief introduction to such general theories.

Given any linear space U on which the notions of addition and scalar multiplication are defined, the concept of a linear functional, that is, a mapping γ from U into the real line satisfying
γ(αu + βv) = αγ(u) + βγ(v)
(α, β ∈ R; u, v ∈ U)
(9.4.2)
makes sense, and we may consider the space of all such linear functionals on a given U. Furthermore, if U has a topology conformable with the linear operations (i.e., one making these continuous), we may consider the smaller space of continuous linear functionals on U. Many different possibilities arise, depending on the choice of U and of the topology on U with respect to which continuity is defined. With any such choice there are several ways in which we may associate a random structure with the given space of linear functionals. Of these we
distinguish two general classes, which we call strict sense and broad sense random linear functionals. A natural σ-algebra in the space C_U of continuous linear functionals on U is the smallest σ-algebra with respect to which the mappings γ → γ(u) are measurable for each u ∈ U. Endowing C_U with this σ-algebra, we may define a strict sense random linear functional on U as a measurable mapping Γ(·) from a probability space into C_U. This ensures, as a minimal property, that Γ(u) is a random variable for each u ∈ U. On the other hand, it is often difficult to determine conditions on the distributions of a family of r.v.s {Γ_u}, indexed by the elements of U, that will allow us to conclude that the family {Γ_u} can be identified a.s. with a random functional Γ(u) in this strict sense. The same difficulty arises if we attempt to define a random linear functional as a probability distribution on C_U. How can we tell, from the fidi distributions or otherwise, whether such a distribution does indeed correspond to such an object? Even in the random measure case this discussion is not trivial, and in many other situations it remains unresolved.

The alternative, broad sense, approach is to accept that a random linear functional cannot be treated as anything more than a family of r.v.s indexed by the elements of U, and to impose on this family appropriate linearity and continuity requirements. Thus, we might require that
Γ_{αu+βv} = αΓ_u + βΓ_v    a.s.    (9.4.2')
and, if un → u in the given topology on U,
Γ_{un} → Γ_u    a.s.    (9.4.3)
or, at (9.4.3), we could merely use convergence in probability or in quadratic mean. If Γ_u = Γ(u) for all u ∈ U, where Γ(·) is a strict sense random linear functional, then of course both (9.4.2) and (9.4.3) hold a.s. Dudley (1969) reviews some deeper results pertaining to random linear functionals.
Example 9.4(a) Generalized random processes (random Schwartz distributions). Take X = R, and let U be the space of all infinitely differentiable functions on R that vanish outside some finite interval; that is, U is a space of test functions on R. Introduce a topology on U by setting u_n → u if and only if the {u_n} vanish outside some common finite interval and, for all k ≥ 0, the kth derivatives {u_n^{(k)}} converge uniformly to u^{(k)}. Then C_U, the space of all functionals on U satisfying (i) the linearity condition (9.4.2), and (ii) the continuity condition γ(u_n) → γ(u) whenever u_n → u in U, is identified with the space of generalized functions, or more precisely Schwartz distributions. Any ordinary continuous function g defines such a distribution through the relation

γ(u) = ∫_{−∞}^{∞} g(x)u(x) dx,
9. Basic Theory of Random Measures and Point Processes
the continuity condition (ii) following from the boundedness of g on the finite interval outside which the u_n vanish and the uniform convergence of the u_n themselves. Similarly, any bounded finite measure G on R defines a distribution by the relation

γ(u) = ∫_R u(x) G(dx).
However, many further types of Schwartz distribution are possible, relating, for example, to linear operations on the derivatives of u. The corresponding strict sense theory has been relatively little used, but the broad sense theory plays a central role in the second-order theory of stationary generalized processes, of which the second-order theory of stationary point processes and random measures forms a special case. A similar theory for random generalized fields can be developed by taking test functions on R^d in place of test functions on R. Gelfand and Vilenkin (1964) and Yaglom (1961) give systematic treatments of these broad sense theories.
A natural tool for handling any type of random linear functional is the characteristic functional, defined by

Φ_Γ[g] = E[exp(iΓ_g)]    (g ∈ U),    (9.4.4)
where Γ_g is a random linear functional (strict or broad sense) on U. It can be described as the characteristic function E(e^{isΓ_g}) of Γ_g, evaluated at the arbitrary value s = 1 and treated as a function of g rather than s.
Example 9.4(b) Gaussian measures on Hilbert space. Random variables taking their values in a Hilbert space H can be placed within the general framework of random linear functionals by taking advantage of the fact that the space of continuous linear functionals on a Hilbert space can be identified with the given Hilbert space itself. In this interpretation Γ(u) is identified with the inner product ⟨Γ, u⟩ for u ∈ H. When H is finite-dimensional, the characteristic functional reduces to the multivariate characteristic function

E[exp(i⟨Γ, u⟩)] = E[exp(i Σ_{k=1}^n Γ_k u_k)] = φ(u_1, . . . , u_n) = φ(u),

where Γ_k and u_k are the coordinates of Γ and u. In this case a Gaussian measure is just the ordinary multivariate normal distribution: setting the mean terms equal to zero for simplicity, the characteristic function has the form

φ(u) = exp(−½ u′Au),

where u′Au is the quadratic form associated with the nonnegative definite (positive semidefinite) symmetric matrix A. This suggests the generalization to infinite-dimensional Hilbert space of

Φ[u] = E[exp(iΓ_u)] = exp(−½⟨u, Au⟩),    (9.4.5)
where A is now a positive definite self-adjoint linear operator. The finite-dimensional distributions of Γ_{u_1}, . . . , Γ_{u_n} for arbitrary u_1, . . . , u_n in H can be determined by setting Σ_{k=1}^n s_k u_k in place of u in (9.4.5): they are of multivariate normal form with n × n covariance matrix having elements ⟨u_i, Au_j⟩. From this representation the consistency conditions are readily checked, as well as the linearity requirements (9.4.2). If u_n → u (i.e., ‖u_n − u‖ → 0), it follows from the boundedness of A that ⟨u_n − u, A(u_n − u)⟩ → 0 and hence that Γ_{u_n} → Γ_u in probability and in quadratic mean. These arguments suffice to show that (9.4.5) defines a broad sense random linear functional on H, but they are not sufficient to imply that (9.4.5) defines a strict sense random linear functional. For this, more stringent requirements are needed; these have their roots in the fact that a probability measure on H must be tight, and hence in a loose sense approximately concentrated on a finite-dimensional subset of H. It is known [see, e.g., Parthasarathy (1967, Chapter 8)] that the necessary and sufficient condition for (9.4.5) to be the characteristic functional of a strict sense random linear functional on H [so that we can write Γ_u = Γ(u) = ⟨Γ, u⟩], or, equivalently, of a probability measure on H itself, is that the operator A be of Hilbert–Schmidt type. In this case the characteristic functional has the more special form

Φ[u] = exp(−½ Σ_k λ_k ⟨h_k, u⟩²),

where {h_k} is a complete set of eigenvectors for A, {λ_k} is the set of corresponding eigenvalues, and Σ_k λ_k² < ∞.
Returning to the random measure context, let us first note that random measures can just as easily be characterized by the values of the integrals (9.4.1) as they can by their evaluations on Borel sets; indeed, the latter are just a special case of the former when f is an indicator function.
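The finite-dimensional computation in Example 9.4(b) is easy to check by simulation. The sketch below (Python; the matrix A and vector u are arbitrary illustrative choices, not from the text) draws Γ ~ N(0, A) on H = R³ — so that E[⟨Γ, u⟩²] = u′Au — and compares the empirical value of E[exp(i⟨Γ, u⟩)] with exp(−½u′Au):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite-dimensional illustration of (9.4.5): H = R^3,
# A a positive definite symmetric matrix, Gamma ~ N(0, A); then
# E[exp(i<Gamma, u>)] should equal exp(-0.5 * u'Au).
A = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])
u = np.array([0.4, -0.2, 0.1])

samples = rng.multivariate_normal(np.zeros(3), A, size=200_000)
empirical = np.mean(np.exp(1j * samples @ u))   # Monte Carlo E[exp(i<Gamma,u>)]
theoretical = np.exp(-0.5 * u @ A @ u)

print(abs(empirical - theoretical))             # small Monte Carlo error
```

The agreement (up to Monte Carlo error) reflects nothing deeper than the multivariate normal characteristic function, but it makes the passage from r.v.s to functionals concrete.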
It follows at once that B(M#_X) is the smallest σ-algebra with respect to which the random integrals ∫ f dξ are measurable for each f ∈ BM(X), and that a mapping ξ from a probability space into M#_X is a random measure if and only if ∫ f dξ is a random variable for each f ∈ BM(X) [a smaller class of functions f suffices: Kallenberg (1983a, Exercise 3.1) indicates a stronger version of this result]. A more useful result is the following analogue of Proposition 9.1.VIII, the proof of which is left to Exercise 9.4.1.
Proposition 9.4.I. Let {ξ_f} be a family of random variables, indexed by the elements f of BM(X). Then there exists a random measure ξ such that ξ_f = ∫ f dξ a.s. if and only if
(i) ξ_{αf+βg} = αξ_f + βξ_g a.s. for all scalars α, β and f, g ∈ BM(X); and
(ii) ξ_{f_n} → ξ_f a.s. as n → ∞ for all monotonically converging nonnegative sequences {f_n} ⊂ BM(X) (i.e., f_n ≥ 0 and f_n ↑ f).
Conditions (i) and (ii) are, of course, just the conditions (9.4.2) and (9.4.3) in a form suitable for random measures; the importance of the proposition is that it implies that the broad and strict sense approaches are equivalent for random measures. From this point it is easy to move to a characterization of the fidi distributions of the integrals ∫ f dξ for a random measure; we state this in the form of a characterization theorem for characteristic functionals, which we define by

Φ_ξ[f] = E[exp(i ∫ f dξ)]    (f ∈ BM(X)),    (9.4.6)

as the appropriate special form of (9.4.4).
Theorem 9.4.II. Let the functional Φ[f] be real- or complex-valued, defined for all f ∈ BM(X). Then Φ is the characteristic functional of a random measure ξ on X if and only if
(i) for every finite family f_1, . . . , f_n of functions f_k ∈ BM(X), the function

φ_n(f_1, . . . , f_n; s_1, . . . , s_n) = Φ[Σ_{k=1}^n s_k f_k]    (9.4.7)

is the multivariate characteristic function of proper random variables ξ_{f_1}, . . . , ξ_{f_n}, which are nonnegative a.s. when the functions f_1, . . . , f_n are nonnegative;
(ii) for every sequence {f_n} ⊂ BM(X) with f_n ≥ 0 and f_n ↑ f pointwise,

Φ[f_n] → Φ[f];    (9.4.8)

and
(iii) Φ[0] = 1, where 0 here denotes the zero function in BM(X).
Moreover, when the conditions are satisfied, the functional Φ uniquely determines the distribution of ξ.
Proof. If ξ is a random measure, conditions (i) and (ii) are immediate, and imply that the fidi distributions of ξ, and hence the distribution of ξ itself, are uniquely determined by Φ. If f_n → f pointwise, and the f_n are either monotonic or bounded by a common element of BM(X), it follows from the Lebesgue convergence theorems that for each realization,

ξ_{f_n} = ∫ f_n dξ → ∫ f dξ = ξ_f,

so that ξ_{f_n} → ξ_f a.s. Equation (9.4.8) then follows from a further application of the dominated convergence theorem, using |exp(i ∫ f_n dξ)| ≤ 1.
Suppose next that conditions (i)–(iii) are satisfied. Condition (i) subsumes both Kolmogorov consistency conditions for the fidi distributions of the r.v.s ξ_f defined by (9.4.7) for f ∈ BM(X). For example, in characteristic function
terms, the requirement of marginal consistency reduces to the trivial verification that

Φ[Σ_{i=1}^{n−1} s_i f_i + 0 · f_n] = Φ[Σ_{i=1}^{n−1} s_i f_i].
Thus, we may assume the existence of a jointly distributed family of r.v.s {ξ_f, f ∈ BM(X)}. Condition (i) also implies the linearity property (i) of Proposition 9.4.I, for the condition ξ_{f_3} = ξ_{f_1} + ξ_{f_2} a.s. is equivalent to the identity

Φ[s_1 f_1 + s_2 f_2 + s_3 f_3] = Φ[(s_1 + s_3)f_1 + (s_2 + s_3)f_2],

which will certainly be valid if f_3 = f_1 + f_2. A similar argument applies when scalar multipliers α, β are included. Finally, condition (ii) of the theorem implies that the distribution of ξ_{f−f_n} approaches the distribution degenerate at 0, and hence that ξ_{f−f_n} converges in probability to zero. From the linearity property of the ξ_f we deduce that

ξ_{f_n} → ξ_f    in probability.    (9.4.9)
However, because we assume in condition (ii) that the sequence {f_n} is monotonic increasing, it follows from condition (i) that the sequence {ξ_{f_n}} is a.s. monotonic increasing. Because ξ_{f_n} ≤ ξ_f a.s. by similar reasoning, ξ_{f_n} converges a.s. to a proper limit r.v., X say. But then (9.4.9) implies that X = ξ_f a.s., so condition (ii) of Proposition 9.4.I is also satisfied. The existence of a random measure with the required properties now follows from Proposition 9.4.I and the part of the theorem already proved.
Variants of condition (ii) above are indicated in Exercise 9.4.2.
As described above, the characteristic functional emerges naturally from the context of random linear functionals, but in the study of random measures and point processes, which are nonnegative by definition, it is enough to use real-variable counterparts. The Laplace functional is defined for f ∈ BM_+(X), the space of all nonnegative f ∈ BM(X), by

L_ξ[f] = E[exp(−∫ f dξ)]    (f ∈ BM_+(X))    (9.4.10)
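As a concrete illustration (ahead of its formal derivation in the text): for a stationary Poisson process of rate λ, the Laplace functional is known to take the closed form L[f] = exp(−λ ∫ (1 − e^{−f(x)}) dx), and this is easy to verify by Monte Carlo. The rate, window, and the function f below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo check of the Laplace functional (9.4.10) for a stationary
# Poisson process of rate lam on [0, T]; the closed form used for comparison,
# L[f] = exp(-lam * int_0^T (1 - exp(-f(x))) dx), is the standard Poisson
# formula, quoted here purely for checking purposes.
lam, T = 2.0, 5.0
f = lambda x: np.exp(-x)                  # arbitrary element of BM_+

n_rep = 50_000
vals = np.empty(n_rep)
for i in range(n_rep):
    pts = rng.uniform(0.0, T, size=rng.poisson(lam * T))
    vals[i] = np.exp(-f(pts).sum())       # exp(- integral of f dN)
empirical = vals.mean()

# midpoint-rule approximation of the integral in the closed form
xs = np.linspace(0.0, T, 20_000, endpoint=False) + T / 40_000
theoretical = np.exp(-lam * np.mean(1.0 - np.exp(-f(xs))) * T)
print(empirical, theoretical)
```

The two values agree up to Monte Carlo error, giving a first feel for how L_ξ encodes the whole distribution of ξ.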
[this is the same as (6.1.8)]. An exact counterpart of Theorem 9.4.II holds for Laplace functionals; see Exercise 9.4.3 for some detail. Observe, in particular, that the theorem implies that the distribution of a random measure is completely determined by its Laplace functional. The class BM_+(X) of functions on which the Laplace functional is defined can be restrictive for some applications, as, for example, in discussing the mixing properties of cluster processes in Section 12.3. We therefore define the extended Laplace functional for use in such contexts much as for L, but on the larger space BM̄_+(X) consisting of functions f that are expressible as
58
9. Basic Theory of Random Measures and Point Processes
the monotone limit of an increasing sequence of functions {f_n} ⊂ BM_+(X). Then by the monotone convergence theorem,

∫_X f_n(x) ξ(dx) ↑ ∫_X f(x) ξ(dx)    a.s.,
whether the limit is finite or infinite, and then by dominated convergence,

L_ξ[f_n] → L_ξ[f] ≡ E[exp(−∫_X f dξ)]    (f ∈ BM̄_+(X)),    (9.4.11)

where we use L_ξ or, more briefly, L, both for the functional as originally defined and for its extension to BM̄_+(X). The extended Laplace functional, as defined over functions f ∈ BM̄_+(X), has the continuity properties given below, but it need not be continuous for monotone sequences {f_n} ⊂ BM̄_+(X): take f_ε(x) = ε (all x ∈ X) and ξ(X) = ∞ a.s.; then for all ε > 0, L[f_ε] = 0 ≠ 1 = L[0].
Proposition 9.4.III. The extended Laplace functional L[·] satisfies

L[f_n] → L[f]    (f_n, f ∈ BM̄_+(X))

whenever either (a) f_n(x) ↑ f(x), or (b) f_n(x) → f(x) and there exists a nonnegative measurable function ∆(·) such that ∫_X ∆(x) ξ(dx) < ∞ a.s. and |f_n(x) − f(x)| ≤ ∆(x) for all sufficiently large n.
Proof. If f_n ↑ f, then it is easy to construct a monotone sequence of functions {f_n′} ⊂ BM_+(X) with f_n′(x) ↑ f(x), and (9.4.11) holds by definition. In the other case, we have for all n ≥ some n_0,

f_n(x) ≤ f(x) + ∆(x) ≤ f_{n_0}(x) + 2∆(x),

so ∫_X f_n(x) ξ(dx) ≤ ∫_X [f_{n_0}(x) + 2∆(x)] ξ(dx) < ∞ a.s., and by dominated convergence applied to the sequence {f_n(·)},

∫_X f_n(x) ξ(dx) → ∫_X f(x) ξ(dx) < ∞    a.s.
A second appeal to dominated convergence now implies (9.4.11).
Under condition (b) here, it follows that L[εf] → 1 = L[0] as ε → 0, so for such f_n and f ∈ BM̄_+(X), the extended Laplace functional has all the properties of the ordinary Laplace functional (9.4.10).
Because random measures are inherently nonnegative, the Laplace functional is often the most appropriate tool for handling random measures, just as the Laplace–Stieltjes transform is generally the most useful tool for handling nonnegative random variables. It is only when the r.v. is integer-valued, or, in our context, when the random measure is a point process, that
there are advantages in moving to the probability generating function (p.g.f.) or its counterpart, the probability generating functional (p.g.fl.). We have already discussed the p.g.fl. of finite point processes in Chapter 5, defined there on the class U of complex-valued Borel measurable functions ζ satisfying the condition |ζ(x)| ≤ 1 (all x). But just as the Laplace functional is discussed more advantageously as a functional over a space of real-valued functions, so is the p.g.fl. better discussed over a narrower class of functions. We largely follow Westcott's (1972) general treatment.
Definition 9.4.IV. V(X) denotes the class of all real-valued Borel functions h defined on the c.s.m.s. X with 1 − h vanishing outside some bounded set and satisfying 0 ≤ h(x) ≤ 1 (all x ∈ X). V_0(X) is the subset of h ∈ V(X) satisfying inf_{x∈X} h(x) > 0. V̄(X) is the space of functions h expressible as limits of monotone sequences h_n ∈ V(X).
Extending Definition 5.5.I, the probability generating functional (p.g.fl.) of a (general) point process N on the c.s.m.s. X is defined by

G[h] ≡ G_N[h] = E[exp(∫_X log h(x) N(dx))]    (h ∈ V(X)).    (9.4.12)
Because a point process is a.s. finite on the bounded set where 1 − h does not vanish, the exponential of the integral at (9.4.12) can legitimately be written in the product form

G_N[h] = E[∏_i h(x_i)],    (9.4.13)

where the product is taken over the points of each realization of N (recall Proposition 9.1.V), with the understanding that it takes the value zero if h(x_i) = 0 for any x_i, and unity if there are no points of N within the support of 1 − h. If h is such that −log h ∈ BM_+(X), then the equation

G[h] = L_N[−log h]    (9.4.14)

relates the p.g.fl. to the Laplace functional of the point process [cf. (9.4.10)]. Indeed, −log h ∈ BM_+(X) implies that the values of h lie within a closed subset of (0, 1], so because the distribution of a random measure is determined by its Laplace functional on BM_+(X), the distribution of a point process is determined by its p.g.fl. on V_0(X). Although results for point processes need to be proved only with this more restricted class of functions, it is only in our discussion of mixing properties of cluster processes (see Proposition 12.3.IX) that we need the constraint, so mostly we use V(X). Note that if {h_n(x)} is a pointwise convergent sequence of functions ∈ V_0(X) with the support of each 1 − h_n contained in some fixed bounded set, then the pointwise limit, h say, has h ∈ V(X) but not necessarily h ∈ V_0(X). To this extent, V(X) is a simpler class with which to work.
By putting additional restrictions on the point process, the p.g.fl. can be defined for more general classes of functions h. For example, if the expectation measure M of N exists [cf. around (9.5.1) below], and the nonnegative function h is so chosen that

∫_X |log h(x)| M(dx) < ∞,    (9.4.15)

then the integral in (9.4.12) converges a.s. to a finite quantity, and the expectation exists.
The p.g.fl. of a point process can be characterized as in Theorem 9.4.V. It is an exact analogue of Theorem 9.4.II, so the proof is left to the reader.
Theorem 9.4.V. Let the functional G[h] be real-valued, defined for all h ∈ V(X). Then G is the p.g.fl. of a point process N on X if and only if
(i) for every h of the form

1 − h(x) = Σ_{k=1}^n (1 − z_k) I_{A_k}(x),
where the bounded Borel sets A_1, . . . , A_n are disjoint and |z_k| ≤ 1, the p.g.fl. G[h] reduces to the joint p.g.f. P_n(A_1, . . . , A_n; z_1, . . . , z_n) of an n-dimensional integer-valued random variable;
(ii) for every sequence {h_n} ⊂ V(X) with h_n ↓ h pointwise, G[h_n] → G[h] whenever 1 − h has bounded support; and
(iii) G[1] = 1, where 1 denotes the function identically equal to unity on X.
Moreover, when these conditions are satisfied, the functional G uniquely determines the distribution of N.
Variants on the continuity condition (ii) are again possible, although more is needed than just pointwise convergence (see Exercise 9.4.5). Indeed, we shall have need for the extended p.g.fl. Ḡ[·], defined by analogy with the extended Laplace functional at (9.4.11) as

Ḡ[h] ≡ E[exp(∫_X log h(x) N(dx))]    (h ∈ V̄(X))

[see Definition 9.4.IV for V̄(X)]. Further details are given in Exercise 9.4.6.
Example 9.4(c) Poisson and compound Poisson processes. The form of the p.g.fl. for the Poisson process is already implicit in the form of the p.g.f. obtained in Chapter 2 and its multivariate extension [see (2.4.5) and (9.2.9)]

Π_k(A_1, . . . , A_k; z_1, . . . , z_k) = exp(−Σ_{j=1}^k (1 − z_j) µ(A_j)),    (9.4.16)

where A_1, . . . , A_k are disjoint and µ(·) is the parameter measure of the process. Writing h(x) = 1 − Σ_{j=1}^k (1 − z_j) I_{A_j}(x), so that h(x) = z_j on A_j and h(x) = 1 outside the union of all the A_j, (9.4.16) is expressible as

G[h] = exp(−∫_X [1 − h(x)] µ(dx)),    (9.4.17)
which is evidently of the required form for the p.g.fl.
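Formula (9.4.17), together with the product form (9.4.13), can be verified numerically. In the sketch below the rate, window, and the particular function h are arbitrary illustrative choices; 1 − h is supported on [0, 4] as required:

```python
import numpy as np

rng = np.random.default_rng(2)

# Check G[h] = E[prod_i h(x_i)] against exp(-int (1-h) dmu) for a Poisson
# process with mu = lam * Lebesgue measure on [0, 4]; h is an arbitrary
# element of V(R) with h = 1 outside [0, 4].
lam = 1.5
h = lambda x: 1.0 - 0.5 * np.cos(np.pi * x / 8.0)   # 1 - h vanishes at x = 4

n_rep = 50_000
vals = np.empty(n_rep)
for i in range(n_rep):
    pts = rng.uniform(0.0, 4.0, size=rng.poisson(4.0 * lam))
    vals[i] = np.prod(h(pts))       # product over the points; empty product = 1
empirical = vals.mean()

# int_0^4 (1 - h(x)) dx = 0.5 * (8/pi) * sin(pi/2) = 4/pi, computed exactly
theoretical = np.exp(-lam * 4.0 / np.pi)
print(empirical, theoretical)
```

Note that NumPy's convention np.prod([]) = 1 matches the convention in (9.4.13) for realizations with no points in the support of 1 − h.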
The following heuristic derivation of (9.4.17) also throws light on the character of the p.g.fl. Suppose that the support of 1 − h(x) is partitioned into small subsets ∆A_i, in each of which is a 'representative point' x_i. Then, approximately,

∫_X log h(x) N(dx) ≈ Σ_i log h(x_i) N(∆A_i),

where, because the r.v.s N(∆A_i) are independent (by assumption),

E[exp(∫_X log h(x) N(dx))] ≈ E[exp(Σ_i log h(x_i) N(∆A_i))]
    = ∏_i E[h(x_i)^{N(∆A_i)}] = exp(−Σ_i [1 − h(x_i)] µ(∆A_i))
    ≈ exp(−∫_X [1 − h(x)] µ(dx)).
The corresponding expression for the p.g.fl. of a compound Poisson process, understood in the narrow sense of Section 2.4 as a nonorderly point process, is

G[h] = exp(−∫_X [1 − Π_{h(x)}(x)] µ(dx))
     = exp(−∫_X Σ_n π_n(x) (1 − [h(x)]^n) µ(dx));    (9.4.18)
this reduces to (2.4.5) for the univariate p.g.f. when h(x) = 1 − (1 − z)I_A(x). It is not difficult to verify that G satisfies the conditions of Theorem 9.4.V, and therefore it represents the p.g.fl. of a point process, one which enjoys the complete independence property discussed further in Section 10.1. Indeed, it is not difficult to check from the representation in (9.4.18) that the compound Poisson process can be characterized as a completely random measure which has no drift component, is free of fixed atoms, and has random atoms of positive integral mass.
The generalized compound Poisson process described in Section 6.4 corresponds to a marked Poisson process with independent nonnegative marks. As a point process on X × K (with K = R_+) it has a Laplace functional of the form

L[f] = exp(−∫_X ∫_K (1 − e^{−f(x,κ)}) π(dκ) µ(dx))    (f ∈ BM_+(X × K)).

Here the interpretation as a Poisson process on the product space is immediately evident. Furthermore, when K = R_+, the cumulative process ξ(A) = Σ_{i: x_i ∈ A} κ_i has the Laplace functional, now defined over f ∈ BM_+(X),

L[f] = E[e^{−Σ_i κ_i f(x_i)}] = exp(−∫_X ∫_K (1 − e^{−κf(x)}) π(dκ) µ(dx)).
This exhibits the process as a completely random measure (Definition 10.1.I) with neither any fixed atoms nor a drift component.
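The cumulative-process formula specializes pleasantly when the marks are exponential. The sketch below (hypothetical parameters: ground rate λ on [0, 1], i.i.d. Exp(1) marks, f = s·I_{[0,1]}) uses ∫_0^∞ (1 − e^{−sκ}) e^{−κ} dκ = s/(1 + s) and checks the resulting closed form E[e^{−sξ([0,1])}] = exp(−λs/(1 + s)) by simulation:

```python
import numpy as np

rng = np.random.default_rng(3)

# Cumulative process of a marked Poisson process: ground process Poisson
# with rate lam on [0, 1], i.i.d. Exp(1) marks kappa_i, xi(A) = sum kappa_i.
# With f = s * I_[0,1] the Laplace functional above gives
#   L[f] = exp(-lam * s / (1 + s)).
lam, s = 3.0, 0.7
n_rep = 100_000
counts = rng.poisson(lam, size=n_rep)
xi = np.array([rng.exponential(1.0, size=n).sum() for n in counts])  # xi([0,1])
empirical = np.mean(np.exp(-s * xi))
theoretical = np.exp(-lam * s / (1.0 + s))
print(empirical, theoretical)
```

Since f here depends on the realization only through the marks falling in [0, 1], the point locations never enter the computation — an instance of the product-space Poisson interpretation just described.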
Example 9.4(d) Mixed Poisson process. Referring to (9.4.16), denote the fidi distributions of a Poisson process by P_k(· | µ) for short. Then N is a mixed Poisson process when, for some r.v. Λ and boundedly finite measure µ, its fidi distributions P_k(·) are given by

P_k(·) = E[P_k(· | Λµ)],    (9.4.19)

the expectation being with respect to Λ. Write L(s) = E(e^{−sΛ}) [Re(s) ≥ 0] for the Laplace–Stieltjes transform of Λ. It then follows from (9.4.19) and (9.4.17) that the p.g.fl. of a mixed Poisson process is given by

G[h] = E[exp(−Λ ∫_X [1 − h(x)] µ(dx))] = L(∫_X [1 − h(x)] µ(dx)).
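For instance (an illustrative special case, not part of the text's development): if Λ has a gamma distribution with shape α and rate β, then L(s) = (1 + s/β)^{−α}, and taking h = 1 − (1 − z)I_A with µ(A) = m, the formula above predicts the negative binomial p.g.f. for N(A). A quick Monte Carlo check:

```python
import numpy as np

rng = np.random.default_rng(4)

# Mixed Poisson sanity check: with Lambda ~ Gamma(shape=alpha, rate=beta)
# and mu(A) = m, the p.g.fl. formula gives
#   E[z^{N(A)}] = L((1 - z) * m) = (1 + (1 - z) * m / beta)^(-alpha),
# the negative binomial p.g.f.  (Hypothetical parameter choices.)
alpha, beta, m, z = 2.0, 1.5, 3.0, 0.6
n_rep = 200_000
Lam = rng.gamma(alpha, 1.0 / beta, size=n_rep)   # NumPy takes scale = 1/rate
N = rng.poisson(Lam * m)                         # conditionally Poisson counts
empirical = np.mean(z ** N)
theoretical = (1.0 + (1.0 - z) * m / beta) ** (-alpha)
print(empirical, theoretical)
```

The same two-stage sampling (draw Λ, then a Poisson realization with measure Λµ) is exactly the construction behind (9.4.19).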
We have already remarked that one of the most important properties of the transforms of r.v.s is the simplification they afford in handling problems involving sums of independent r.v.s. The summation operator for random measures is defined by

(ξ_1 + ξ_2)(A) = ξ_1(A) + ξ_2(A)    (all A ∈ B_X),    (9.4.20)

and it is both obvious and important that it extends the notion of superposition of point processes. Note that (9.4.20) has the equivalent form

∫ f d(ξ_1 + ξ_2) = ∫ f dξ_1 + ∫ f dξ_2    (all f ∈ BM_+(X)).    (9.4.20′)

Now suppose that {ξ_i: i = 1, 2, . . .} is an infinite sequence of random measures, each defined on (Ω, E, P), such that

ζ(A) ≡ Σ_{i=1}^∞ ξ_i(A)    (9.4.21)
is a.s. finite on all bounded A ∈ BX . It is well known and easy to check that a countable sum of measures is again a measure. Thus, ζ(·) is a boundedly finite measure, at least on the ω set where the ξi are simultaneously measures, which set has probability 1 by assumption and the fact that only a countable family is involved. Redefining ζ to be zero on the complementary ω set of P-measure zero, and observing that a countable sum of r.v.s is again a r.v., we obtain a mapping from (Ω, E, P) into M# X satisfying the condition of Corollary 9.1.IX and which is therefore a random measure. Thus, we have the following lemma. Lemma 9.4.VI. ζ(·) defined at (9.4.21) is a random measure if and only if the infinite sum at (9.4.21) converges for all bounded A ∈ BX . No new concepts arise in the following definition of independence of two random measures; it extends to the mutual independence of both finite and infinite families of random measures in the usual way.
Definition 9.4.VII. The random measures ξ_1 and ξ_2 are independent when they are defined on a common space (Ω, E, P) and are such that P(F_1 ∩ F_2) = P(F_1)P(F_2) for all finite families F_i of events defined on ξ_i (i = 1, 2).
Let ξ_i have characteristic functional Φ_i and Laplace functional L_i. Writing ζ_n = ξ_1 + · · · + ξ_n, the following assertions are simple consequences of the definitions and Lemma 9.4.VI, and can be proved by methods exploited already (see Exercise 9.4.7).
Proposition 9.4.VIII. When the random measures ξ_1, ξ_2, . . . are mutually independent, the sum ζ_n = ξ_1 + · · · + ξ_n has the characteristic functional

Φ_{ζ_n}[f] = ∏_{i=1}^n Φ_i[f]    (all f ∈ BM(X))    (9.4.22a)

and Laplace functional

L_{ζ_n}[f] = ∏_{i=1}^n L_i[f]    (all f ∈ BM_+(X)).    (9.4.22b)
L_{ζ_n}[f] converges as n → ∞ to a nonzero limit L[f] for each f ∈ BM_+(X) if and only if the infinite sum at (9.4.21) is finite on bounded A ∈ B_X, and then L[·] is the Laplace functional of the random measure ζ at (9.4.21).
The analogue of this result for the p.g.fl. G of the superposition of the independent point processes N_1, . . . , N_n with p.g.fl.s G_1, . . . , G_n is easily given (see Exercise 9.4.8 for the proof).
Proposition 9.4.IX. When the point processes N_1, N_2, . . . are mutually independent, the superposition N_1 + · · · + N_n has p.g.fl.

G[h] = ∏_{i=1}^n G_i[h]    (h ∈ V(X)).

This sequence of finite products converges as n → ∞ if and only if the infinite sum

N(A) = Σ_{i=1}^∞ N_i(A)    (9.4.23)
is a.s. finite on bounded A ∈ BX , and the infinite product is then the p.g.fl. of the point process N at (9.4.23).
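A minimal numerical illustration of Proposition 9.4.IX: taking h = 1 − (1 − z)I_A reduces each p.g.fl. to the p.g.f. of a count, so for independent processes the p.g.f. of the superposed count should factorize. Below N1 is a Poisson count and N2 a mixed Poisson count on the same (hypothetical) set A; all parameters are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(5)

# Proposition 9.4.IX in miniature: for h = 1 - (1 - z) * I_A the p.g.fl. is
# the p.g.f. of the count on A, so for independent processes the p.g.f. of
# the superposition's count factorizes.
z = 0.4
n_rep = 200_000
N1 = rng.poisson(2.0, size=n_rep)                   # Poisson counts on A
N2 = rng.poisson(rng.gamma(3.0, 0.5, size=n_rep))   # mixed Poisson counts on A
empirical = np.mean(z ** (N1 + N2))                 # p.g.f. of superposed count
factorized = np.mean(z ** N1) * np.mean(z ** N2)    # product of the two p.g.f.s
print(empirical, factorized)
```

The factorization holds only because the two count arrays are drawn independently; dependent processes would show a discrepancy here.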
Exercises and Complements to Section 9.4
9.4.1 Use linear combinations of indicator functions and their limits to prove that conditions (i) and (ii) of Proposition 9.4.I imply the a.s. form of (9.4.1).
9.4.2 Condition (ii) of Theorem 9.4.II requires the continuity of characteristic functionals,

Φ[f_n] → Φ[f],    (9.4.24)

for f, f_n ∈ BM(X) when f_n(x) → f(x) (x ∈ X) pointwise monotonically from below, with f, f_n in fact ∈ BM_+(X). Show that (9.4.24) holds without this monotonicity of convergence or nonnegativity of f if either
(i) P{ξ(X) < ∞} = 1, with f and f_n bounded measurable and

sup_{x∈X} |f(x) − f_n(x)| → 0    (n → ∞);    (9.4.25)

or
(ii) f and f_n ∈ BM(X), the union of their supports is a bounded set, and (9.4.25) holds.
Give an example of a random measure ξ and functions f, f_n ∈ BM(X) satisfying (9.4.25) for which (9.4.24) fails. [Hint: Consider a stationary Poisson process on R_+ with f(x) = 0 (all x ∈ R_+), f_n(x) = n^{−1} I_{[0,n]}(x).]
9.4.3 Laplace functional analogues of various results for characteristic functionals are available, subject to modifications reflecting the different domain of definition; below, f ∈ BM_+(X).
(a) [See Theorem 9.4.II.] Show that {L[f]: all f} uniquely determines the distribution of a random measure ξ.
(b) [See Exercise 9.4.2.] For sequences f_n, the convergence L[f_n] → L[f] holds as sup_{x∈X} |f_n(x) − f(x)| → 0 if (i) ξ is totally bounded; or (ii) the pointwise convergence f_n → f is monotonic; or (iii) there is a bounded Borel set containing the support of every f_n. Give examples to show that, otherwise, the convergence L[f_n] → L[f] may fail.
9.4.4 For a random measure ξ on the c.s.m.s. X with Laplace functional L[·], show that for any bounded A ∈ B_X, P{ξ(A) = 0} = lim_{s→∞} L[sI_A], whereas for any A ∈ B_X, P{ξ(A) < ∞} = lim_{s↓0} lim_{n→∞} L[sI_{A_n}], where {A_n} is an increasing sequence of bounded sets in B_X for which A = lim_{n→∞} A_n (the case A = X is of obvious interest).
9.4.5 Suppose that the functions h_n ∈ V(X) and that h_n(x) → h(x) (n → ∞) for every x ∈ X. Show that G[h_n] → G[h] (n → ∞) if, in place of the conditions at (ii) of Theorem 9.4.V, either (a) N is a.s. totally finite, or (b) N has a finite first moment measure M and ∫_X |h_n(x) − h(x)| M(dx) → 0 as n → ∞.
Let h_n(x) = 1 − n^{−1} for |x| < n, = 1 for |x| ≥ n, so that h_n(x) → 1 (all x) as n → ∞. Show that for a stationary Poisson process at rate λ, G[h_n] = e^{−2λ} ≠ 1 = G[lim_{n→∞} h_n(·)] (cf. Exercise 9.4.2).
9.4.6 Let {h_n} ⊂ V̄(X) have h_n(x) → h(x) pointwise as n → ∞. Show that the extended p.g.fl. convergence result Ḡ[h_n] → Ḡ[h] holds if any one of the following conditions holds.
(a) h is the monotone limit of functions h_n ∈ V(X).
(b) |log[h_n(x)/h(x)]| ≤ ε(x) (all n) and ∫_X ε(x) N(dx) < ∞ a.s.
(c) inf_{x∈X} h_n(x) > c for some c > 0 and all sufficiently large n, and ∫_X |h_n(x) − h(x)| N(dx) < ∞ a.s.
[Hint: This is the p.g.fl. analogue of Proposition 9.4.III. For part (b), use the method of proof of Proposition 9.4.VIII. Part (c) follows from (b). See Daley and Vere-Jones (1987).]
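The counterexample in Exercise 9.4.5 can be watched numerically: for each n, G[h_n] is just the Poisson p.g.f. E[(1 − 1/n)^{N((−n,n))}] = exp(−2λ), so the values never approach G[1] = 1 (λ below is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(6)

# Exercise 9.4.5 counterexample: h_n = 1 - 1/n on (-n, n), h_n = 1 outside.
# For a stationary Poisson process at rate lam, G[h_n] = exp(-2*lam) for
# every n, even though h_n -> 1 pointwise and G[1] = 1.
lam = 0.8
g_hn = {}
for n in (5, 50, 500):
    counts = rng.poisson(lam * 2 * n, size=50_000)     # simulate N((-n, n))
    g_hn[n] = np.mean((1.0 - 1.0 / n) ** counts)       # Monte Carlo G[h_n]
print(g_hn)   # every entry stays near exp(-1.6), nowhere near 1
```

The support of 1 − h_n grows exactly fast enough to cancel the shrinking of 1/n, which is why condition (ii) of Theorem 9.4.V insists on a common bounded support.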
9.4.7 For the partial sums ζ_n of random measures (cf. Proposition 9.4.VIII), show that L_{ζ_n}[f] has a nonzero limit for f ∈ BM_+(X) if and only if ∏_{i=1}^∞ exp[−ξ_i(A)] > 0 a.s., that is, when the infinite series at (9.4.21) converges. Hence, complete the proof of the proposition.
9.4.8 For an infinite sequence N_1, N_2, . . . of independent point processes, show that the necessary and sufficient condition for the infinite superposition Σ_{i=1}^∞ N_i to be a well-defined point process is the convergence, for every bounded A ∈ B_X, of the sum Σ_{i=1}^∞ p_i(A), where p_i(A) = Pr{N_i(A) > 0}. Hence, establish Proposition 9.4.IX.
9.4.9 Let N be a renewal process with lifetime d.f. F, and denote by G_{b|a}[h] the p.g.fl. of the process on the interval [a, b] conditional on the occurrence of a point of the process at a. Much as in Bol'shakov (1969), show that this conditional p.g.fl. satisfies the integral equation

G_{b|a}[h] = [1 − F(b − a)] + ∫_a^b h(x) G_{b|x}[h] d_x F(x − a).

Similarly, if G_{a|b}[h] denotes the p.g.fl. conditional on a point at b,

G_{a|b}[h] = [1 − F(b − a)] + ∫_a^b h(x) G_{a|x}[h] |d_x F(b − x)|.
Find extensions of these equations to the case where the renewal process is replaced by a Wold process.
9.5. Moment Measures and Expansions of Functionals
As with ordinary random variables and characteristic functions, the characteristic functional is closely associated with the moment structure of the random measure, which, as in the point process case studied in Chapter 5, is expressed through a family of moment measures. In particular, for any random measure ξ on the c.s.m.s. X and any Borel set A, consider the expectation

M(A) = E[ξ(A)]    (finite or infinite).    (9.5.1)
Clearly, M inherits the property of finite additivity from the underlying random measure ξ. Moreover, if the sequence {A_n} of Borel sets is monotonic increasing to A, then by monotone convergence M(A_n) ↑ M(A). Thus, M(·) is continuous from below and therefore a measure. In general, it need not take finite values, even on bounded sets, but when it does, we say that the expectation measure of ξ exists and is given by (9.5.1). The expectation measure M(·) may also be called the first moment measure of ξ.
When M exists, the above argument can readily be extended to the random integrals ∫ f dξ for f ∈ BM(X). Thus, if f is the indicator function of the bounded Borel set A, E[∫ f dξ] = M(A). Extending in the usual way through linear combinations and monotone limits, it follows that

E[∫ f dξ] = ∫ f dM    (f ∈ BM(X)).    (9.5.2)
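Equation (9.5.2) is easy to check by simulation in the Poisson case, where M(dx) = λ dx; the rate and the function f below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(7)

# Check of E[integral f dN] = integral f dM (9.5.2) for a Poisson process
# of rate lam on [0, 2*pi], so that M(dx) = lam dx:
#   E[sum_i f(x_i)] = lam * int_0^{2pi} sin(x)^2 dx = lam * pi.
lam = 4.0
T = 2.0 * np.pi
f = lambda x: np.sin(x) ** 2

n_rep = 50_000
sums = np.empty(n_rep)
for i in range(n_rep):
    pts = rng.uniform(0.0, T, size=rng.poisson(lam * T))
    sums[i] = f(pts).sum()          # integral of f against the realization
empirical = sums.mean()
theoretical = lam * np.pi           # int_0^{2pi} sin^2(x) dx = pi
print(empirical, theoretical)
```

This is the shot-noise mean computation mentioned in the next paragraph, stripped to its essentials.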
Equations of the form (9.5.2) have been included under the name Campbell theorem [see, e.g., Matthes (1972), MKM (1978)] after early work by Campbell (1909) on the shot-noise process in thermionic vacuum tubes [see also Moran (1968, pp. 417–423)]. Campbell measures, which we discuss in Chapter 13, constitute a significant extension of this simple concept.
Consider next the k-fold product of ξ with itself, that is, the measure defined a.s. for Borel rectangles A_1 × · · · × A_k by

ξ^{(k)}(A_1 × · · · × A_k) = ∏_{i=1}^k ξ(A_i)    (9.5.3)
and extended to a measure, necessarily symmetric, on the product Borel σ-algebra in X^{(k)}. Now the rectangles form a semiring generating this σ-algebra, and (9.5.3) defines a random variable for every set in this semiring, so it follows from Proposition 9.1.VIII that ξ^{(k)} is a random measure on X^{(k)}.
Definition 9.5.I. The kth order moment measure M_k(·) of ξ is the expectation measure of ξ^{(k)}, whenever this expectation measure exists.
This identification is illustrated forcefully in the proof below of a result already given at Proposition 5.4.VI, in which a key role is played by

diag A^{(k)} ≡ {(x_1, . . . , x_k) ∈ X^{(k)}: x_1 = · · · = x_k = x ∈ A}.    (9.5.4)
Proposition 9.5.II. A point process N with boundedly finite second moment measure has M_2(diag A^{(2)}) ≥ M(A) for all bounded A ∈ B_X; equality holds if and only if N is simple.
Proof. Let T be a dissecting system for A, so diag A^{(2)} equals the monotone limit lim_{n→∞} ⋃_{i=1}^{k_n} (A_{ni} × A_{ni}). Because M_2 is a measure and M_2(A^{(2)}) < ∞,

M_2(diag A^{(2)}) = M_2(lim_{n→∞} ⋃_{i=1}^{k_n} (A_{ni} × A_{ni})) = lim_{n→∞} Σ_{i=1}^{k_n} M_2(A_{ni} × A_{ni})
    = lim_{n→∞} E[Σ_{i=1}^{k_n} N²(A_{ni})] = M(A) + lim_{n→∞} E[Σ_{i=1}^{k_n} N(A_{ni})[N(A_{ni}) − 1]].

Write the last term as E(X_n). From the nesting property of the T_n, the r.v.s X_n are a.s. nonincreasing and ≥ 0, so by monotone convergence

M_2(diag A^{(2)}) = M(A) + E[lim_{n→∞} X_n],

and lim_{n→∞} X_n = 0 if and only if lim_{n→∞} sup_i N(A_{ni}) ≤ 1 a.s.; that is, P{N({x}) ≤ 1 for all x ∈ A} = 1; equivalently, N = N* a.s.
The following kindred property of random measures is proved similarly; see Exercise 9.5.5(b), and Section 9.3 and Exercise 9.5.6 for related results.
Proposition 9.5.III. A random measure ξ with boundedly finite second moment measure M_2 is a.s. nonatomic if and only if M_2(diag X^{(2)}) = 0.
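The dissection argument in the proof of Proposition 9.5.II can be watched numerically for a Poisson process, which is simple: over k equal bins of A = [0, 1], E[Σ_i N(A_ni)²] = λ + λ²/k, which decreases to M(A) = λ as the dissection refines (the rate below is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(8)

# For a rate-lam Poisson process on A = [0, 1], dissected into k equal bins
# A_ni: E[sum_i N(A_ni)^2] = k * (lam/k + (lam/k)^2) = lam + lam**2 / k,
# which decreases to M(A) = lam as k -> infinity, since the process is simple.
lam = 3.0
n_rep = 20_000
second_moments = {}
for k in (1, 4, 16, 64):
    totals = np.empty(n_rep)
    for i in range(n_rep):
        pts = rng.uniform(0.0, 1.0, size=rng.poisson(lam))
        counts, _ = np.histogram(pts, bins=k, range=(0.0, 1.0))
        totals[i] = np.sum(counts ** 2)     # sum of squared bin counts
    second_moments[k] = totals.mean()
print(second_moments)   # approaches lam = 3 as k grows
```

For a non-simple process (e.g. a Poisson process with every point doubled) the corresponding limit would exceed M(A), in line with the strict inequality of the proposition.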
9.5. Moment Measures and Expansions of Functionals
Example 9.5(a) Mixtures of quadratic random measures. A Gaussian r.v. has moments of all orders, from which it follows that the same is true for the stationary quadratic random measure in Example 9.1(b). In particular, its first and second moment measures are defined by the equations

    M(A) = E[ξ(A)] = E ∫_A Z^2(x) dx = σ^2 ℓ(A),        (9.5.5a)

    M_2(A × B) = E[ξ(A)ξ(B)] = E ∫_A ∫_B Z^2(x)Z^2(y) dx dy = ∫_A ∫_B [σ^4 + 2c^2(x − y)] dx dy,        (9.5.5b)

where ℓ denotes Lebesgue measure.
From these representations it is clear that M and M_2 are both absolutely continuous with respect to Lebesgue measure on R and R^2, with derivatives σ^2 and σ^4 + 2c^2(x − y), respectively, where c(·) is the covariance function of Z. Similar representations can be obtained for higher moments.

Example 9.5(b) Mixed random measure. Let Λ be a positive r.v., independent of the random measure ξ(·), and set ξ_Λ(A) = Λξ(A). Using independence, the kth order moment measures M_Λ^{(k)} for ξ_Λ are related to those of ξ by the equations

    M_Λ^{(k)}(·) = E(Λ^k) M_k(·).

Thus, if Λ has infinite moments of order k and higher, the same will be true for the moment measures of ξ_Λ; conversely, if the kth moment of ξ_Λ is finite, the kth order moment measure of ξ exists. This particular example is nonergodic [meaning that the values of M(A) cannot be determined from observations on a single realization of the process], but this is not a necessary feature of examples with infinite moment measures: in place of the r.v. Λ, we could multiply ξ by any continuous ergodic process λ(t) with infinite moments and integrate to obtain a random measure with similar moment properties. This procedure of mixing, or randomizing, with respect to a given parameter of a process is a rich source of examples.

Example 9.5(c) Moments of completely random measures (see Sections 2.2 and 10.1). If ξ(·) is completely random, the first and second moment measures (assuming these are finite) are given by relations of the type

    M(A) = E[ξ(A)] = µ(A),
    M_2(A × B) = E[ξ(A)ξ(B)] = µ(A)µ(B) + var ξ(A ∩ B).        (9.5.6)
Particular interest here centres on the variance term: it vanishes unless A ∩ B ≠ ∅, so this term represents a measure concentrated along the
9. Basic Theory of Random Measures and Point Processes
diagonal. For the stationary gamma random measure studied in Example 9.1(d), var ξ(A) = λ^2 α ℓ(A), and so

    M_2(A × B) = λ^2 α ℓ(A ∩ B) + λ^2 α^2 ℓ(A)ℓ(B),        (9.5.7)

where ℓ denotes Lebesgue measure. Thus, M_2 has a constant areal density λ^2 α^2 off the diagonal and a concentration with linear density λ^2 α along it. Such concentrations are associated with the a.s. atomic character of the random measure (see Proposition 9.5.III) and should be contrasted with the absolutely continuous moment measures of Example 9.5(a), in which the realizations themselves are a.s. absolutely continuous measures.

The next lemma summarizes the relation between moment measures and the moments of random integrals, and is useful in discussing expansions of functionals. Note also the identification property at Exercise 9.5.7.

Lemma 9.5.IV. Let the kth moment measure M_k of the random measure ξ exist. Then for all f ∈ BM(X), the random integral ∫ f dξ has finite kth moment satisfying

    E[(∫ f dξ)^k] = ∫_{X^{(k)}} f(x_1) . . . f(x_k) M_k(dx_1 × ··· × dx_k).        (9.5.8)
Proof. Apply (9.5.2) to the product measure ξ^{(k)}, for which M_k is the expectation measure. This gives

    ∫_{X^{(k)}} h(x_1, . . . , x_k) M_k(dx_1 × ··· × dx_k) = E[∫_X ··· ∫_X h(x_1, . . . , x_k) ξ(dx_1) . . . ξ(dx_k)]        (9.5.9)

for all bounded Borel measurable functions h: X^{(k)} → R. Then (9.5.8) is the special case h(x_1, . . . , x_k) = ∏_{i=1}^{k} f(x_i).

We now consider the finite Taylor series expansion of the characteristic functional (see Exercise 9.5.8 for an expansion for the Laplace functional).

Proposition 9.5.V. Let Φ be the characteristic functional of the random measure ξ, and suppose that the kth moment measure of ξ exists for some k ≥ 1. Then for each fixed f ∈ BM(X) and real s → 0,

    Φ[sf] = 1 + ∑_{r=1}^{k} ((is)^r/r!) ∫_{X^{(r)}} f(x_1) . . . f(x_r) M_r(dx_1 × ··· × dx_r) + o(|s|^k).        (9.5.10)

Furthermore, if the (k + 1)th moment exists, the remainder term o(|s|^k) is bounded by

    (|s|^{k+1}/(k + 1)!) C_f^{k+1} M_{k+1}(A_f^{(k+1)}),        (9.5.11)

where C_f is a bound for |f|, and f vanishes outside the bounded Borel set A_f.
Proof. Because Φ[sf] = φ_f(s), where φ_f is the ordinary characteristic function of the r.v. ∫ f dξ, both assertions follow from (9.5.8) and the corresponding Taylor series results for ordinary characteristic functions. The bound (9.5.11), with C_f and A_f as defined there, is derived from

    E|∫ f dξ|^{k+1} ≤ E(∫ |f| dξ)^{k+1} ≤ E[C_f ξ(A_f)]^{k+1}.

The analogy with the finite-dimensional situation may be strengthened by noting that the moment measures can be identified with successive Fréchet derivatives of Φ. Specifically, we can write formally M_k(A) = Φ^{(k)}(I_A). The difficulty with such expressions is that they rarely give much information, either theoretical or computational, concerning the analytic form or other characteristics of the moment measures.

The corresponding expression for the logarithm of the characteristic functional leads to a new family of measures associated with ξ, the cumulant measures. The first cumulant measure coincides with the expectation measure, whereas the second is the covariance measure defined by

    C_2(A × B) = M_2(A × B) − M(A)M(B) = cov(ξ(A), ξ(B)).        (9.5.12)

In Example 9.5(a), the covariance measure is absolutely continuous with respect to two-dimensional Lebesgue measure, the covariance density being given by c_2(x, y) = 2c^2(x − y). The covariance density for this random measure is just the ordinary covariance function of the process forming the density of the random measure. Similar relations between the moment and cumulant densities of the random measure, and the moment and cumulant functions of its density, hold whenever the random measure can be represented as the integral of an underlying process. By contrast, in Example 9.5(c), the covariance measure is singular, consisting entirely of the concentration along the diagonal y = x with linear density λ^2 α. The relation can be expressed conveniently using the Dirac delta function as c_ξ(x, y) = λ^2 α δ(x − y).
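The covariance measure (9.5.12) is easy to check by simulation in the completely random case of Example 9.5(c). For a homogeneous Poisson process with rate λ it is concentrated on the diagonal, so cov(N(A), N(B)) = λ ℓ(A ∩ B); the sketch below (rate and sets are arbitrary choices, not from the text) compares an empirical covariance with this value.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, n_rep = 4.0, 50_000

NA = np.empty(n_rep)
NB = np.empty(n_rep)
for k in range(n_rep):
    pts = rng.uniform(0.0, 1.0, rng.poisson(lam))  # Poisson process on [0, 1]
    NA[k] = np.count_nonzero(pts < 0.6)            # N(A), A = [0, 0.6)
    NB[k] = np.count_nonzero(pts >= 0.4)           # N(B), B = [0.4, 1)

cov_emp = np.cov(NA, NB)[0, 1]
cov_theory = lam * 0.2   # lam * ell(A ∩ B), with A ∩ B = [0.4, 0.6)
print(cov_emp, cov_theory)
```

Only the overlap A ∩ B contributes, exactly as the diagonal concentration of C_2 predicts.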
The general relation between the moment measures and the cumulant measures is formally identical to the relation between the factorial moment measures and factorial cumulant measures studied in Chapter 5, inasmuch as both are derived by taking logarithms of an expression of the type (9.5.9). Finally we consider analogues of Proposition 9.5.V for point processes that no longer need be finite as in the discussion in Section 5.4. The advantages of working with factorial moment measures M[k] remain: the same definition
at (5.4.2) holds when we require the sets A_i there to be bounded. Then M_{[k]} exists (i.e., is boundedly finite) if and only if M_k exists, and the definitions (5.4.3) and (5.4.4) continue, now with bounded sets A_i. In the proof below, N^{[k]}(·) denotes the k-fold factorial product measure as described below (5.4.4) (the notation recalls the factorial power used around Definition 5.2.I). It has expectation measure M_{[k]}(·). For simple point processes, integration with respect to N^{[k]}(dx_1 × ··· × dx_k) is the same as integration with respect to N(dx_1) . . . N(dx_k) if we add the restriction that x_1, . . . , x_k must all be distinct. Thus, the integral at (7.1.13) could be written without the coincidence annihilating function I(·) there if instead the product measure is replaced by the factorial product measure.

Proposition 9.5.VI. Let G be the p.g.fl. of a point process whose kth order moment measure exists for some positive integer k. Then for 1 − η ∈ V(X) and 0 < ρ < 1,

    G[1 − ρη] = 1 + ∑_{j=1}^{k} ((−ρ)^j/j!) ∫_{X^{(j)}} η(x_1) . . . η(x_j) M_{[j]}(dx_1 × ··· × dx_j) + o(ρ^k).        (9.5.13)
Proof. Some care is needed in evaluating the difference between G[1 − ρη] and the finite sum on the right-hand side of (9.5.13). For fixed η and a given realization {y_i} of the point process, consider the expressions

    S_m(ρ) = 1 + ∑_{j=1}^{m} ((−ρ)^j/j!) ∫_{X^{(j)}} η(x_1) . . . η(x_j) N^{[j]}(dx_1 × ··· × dx_j),

where N^{[j]} is the modified product counting measure formed by taking all possible ordered j-tuples of different points of the realization of N (with the convention that if {y_i} has multiple points these should be treated as different points with the same state space coordinates, that is, as if they represented distinct particles). Each integral then reduces to a sum Q_j = ∑ η(y_{i_1}) . . . η(y_{i_j}) over all such j-tuples. Each sum is a.s. finite, because with probability 1 only a finite number of points of the process fall within the support of 1 − η. Moreover, the sum vanishes whenever j is larger than the number of points in this support. Because N^{[j]} includes all possible orderings of a given j-tuple, each distinct term in the sum occurs j! times, so that we can write Q_j = j! q_j, where the sum defining q_j extends over all distinct combinations of j points from {y_i} (with the same convention as before regarding multiple points). In this notation we have

    S_m(ρ) = 1 + ∑_{j=1}^{m} (−ρ)^j q_j,        (9.5.14)
and it is not difficult to verify (e.g., by induction) that for all m and η,

    S_{2m+1}(ρ) ≤ ∏_i [1 − ρη(y_i)] ≡ Π(ρ) ≤ S_{2m}(ρ),        (9.5.15)

the product being taken over {y_i}; Exercise 9.5.9 interprets (9.5.15) in terms of Bonferroni inequalities. Equation (9.5.14) implies that |S_k − S_{k−1}| ≤ ρ^k q_k, which with (9.5.15) and its implication that the sums S_k(ρ) are alternately above and below Π(ρ) implies both

    |S_k(ρ) − Π(ρ)| ≤ ρ^k q_k    and    |S_k(ρ) − Π(ρ)| ≤ ρ^{k+1} q_{k+1}.        (9.5.16)

Now suppose that M_k, and hence M_{[k]}, exist. The first inequality at (9.5.16) implies that [S_k(ρ) − Π(ρ)]/ρ^k is bounded by a random variable with finite expectation

    E(q_k) = (1/k!) ∫_{X^{(k)}} η(x_1) . . . η(x_k) M_{[k]}(dx_1 × ··· × dx_k),

because M_{[k]} is just the expectation of N^{[k]}. The second inequality at (9.5.16) implies that [S_k(ρ) − Π(ρ)]/ρ^k → 0 a.s. as ρ → 0. The limit behaviour of the remainder term in (9.5.13) now follows by dominated convergence. Uniqueness of the expansion (9.5.13) follows from the uniqueness of the coefficients in a power series expansion and the fact that, as symmetric measures, the moment measures are uniquely specified by integrals of the type appearing in the expansion (see Exercise 9.5.7).

Taking expectations in (9.5.16) yields the following corollary.

Corollary 9.5.VII. When M_{k+1} exists, the remainder term in (9.5.13) is bounded by

    (ρ^{k+1}/(k + 1)!) ∫_{X^{(k+1)}} η(x_1) . . . η(x_{k+1}) M_{[k+1]}(dx_1 × ··· × dx_{k+1}).

On taking logarithms of the expression (9.5.13) and using the expansion log(1 − y) = −∑_{j=1}^{k} y^j/j + o(|y|^k) (y → 0), equation (9.5.17) below follows.

Corollary 9.5.VIII. Under the conditions of Proposition 9.5.VI, the p.g.fl. can be expressed in terms of the factorial cumulant measures C_{[j]}, for ρ → 0, as

    log G[1 − ρη] = ∑_{j=1}^{k} ((−ρ)^j/j!) ∫_{X^{(j)}} η(x_1) . . . η(x_j) C_{[j]}(dx_1 × ··· × dx_j) + o(ρ^k).        (9.5.17)

Equation (9.5.17) serves to define the cumulant measures, which can be expressed explicitly in terms of the measures M_{[k]} as in Chapter 5. Unfortunately it does not seem possible to provide a simple bound for the remainder term in (9.5.17) analogous to that of Corollary 9.5.VII (but see Exercise 9.5.10).
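For a Poisson process the expansion (9.5.13) and the bound of Corollary 9.5.VII can be verified in closed form: with M_{[j]} equal to the j-fold product of µ, G[1 − ρη] = exp(−ρ ∫η dµ), and the expansion reduces to the Taylor series of the exponential. A sketch with an illustrative value of ∫η dµ (an assumption, not from the text):

```python
import math

eta_mu = 1.7   # the integral of eta with respect to mu (illustrative choice)
rho = 0.3
exact = math.exp(-rho * eta_mu)   # G[1 - rho*eta] for a Poisson process

for k in range(1, 6):
    # partial sum of the factorial-moment expansion (9.5.13)
    S_k = sum((-rho * eta_mu) ** j / math.factorial(j) for j in range(k + 1))
    # remainder bound of Corollary 9.5.VII
    bound = (rho * eta_mu) ** (k + 1) / math.factorial(k + 1)
    assert abs(exact - S_k) <= bound
    print(k, abs(exact - S_k), bound)
```

The remainder sits inside the bound at every order, as the alternating structure behind (9.5.15) and (9.5.16) guarantees.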
Example 9.5(d) Moment measures of Poisson and compound and mixed Poisson processes [continued from Examples 9.4(c)–(d)]. Expanding the p.g.fl. at (9.4.17) formally, in terms of η with 1 − η ∈ V(X),

    G[1 + η] = 1 + ∑_{k=1}^{∞} (1/k!) (∫_X η(x) µ(dx))^k
             = 1 + ∑_{k=1}^{∞} (1/k!) ∫_X ··· ∫_X η(x_1) . . . η(x_k) µ(dx_1) . . . µ(dx_k).

Thus, for the Poisson process with parameter measure µ, the kth order factorial moment measure M_{[k]} is the k-fold product measure of µ with itself. The situation with the cumulant measures is even simpler: here, log G[1 + η] = ∫_X η(x) µ(dx), so that for a Poisson process the second and all higher factorial cumulant measures vanish. This last result is in marked contrast with the situation for compound Poisson processes, for which log G[1 + η] equals

    ∫_X ∑_n {[1 + η(x)]^n − 1} π_n(x) µ(dx) = ∑_{k=1}^{∞} ∫_X ([η(x)]^k/k!) ∑_{n=k}^{∞} n^{[k]} π_n(x) µ(dx)
        = ∑_{k=1}^{∞} (1/k!) ∫_X [η(x)]^k m_{[k]}(x) µ(dx),

where m_{[k]}(x) is the kth factorial moment of the batch-size distribution {π_n(x)} at the point x, assuming the moment exists. This representation implies that C_{[k]} is concentrated on the diagonal elements (x, . . . , x), where it reduces to a measure with density m_{[k]}(x) with respect to µ(·).

For the mixed Poisson process, suppose that Λ has a finite kth moment. Then in a neighbourhood of s = 0 with Re(s) ≥ 0, the Laplace–Stieltjes transform

    L(s) = 1 + ∑_{j=1}^{k} ((−s)^j E(Λ^j)/j!) + o(|s|^k).
Then for η with 1 − η ∈ V(X), G[1 − ρη] equals

    1 + ∑_{j=1}^{k} ((−ρ)^j E(Λ^j)/j!) ∫_X ··· ∫_X η(x_1) . . . η(x_j) µ(dx_1) . . . µ(dx_j) + o(ρ^k),

and thus the factorial moment measures of the process are given by

    M_{[j]}(dx_1 × ··· × dx_j) = E(Λ^j) µ(dx_1) . . . µ(dx_j)    (j ≤ k).        (9.5.18)

If in particular X = R^d and µ(·) is Lebesgue measure on R^d, then M_{[j]} has a density m_{[j]} with respect to such Lebesgue measure given by

    m_{[j]}(x_1, . . . , x_j) = E(Λ^j)    (j ≤ k).
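A quick Monte Carlo check of (9.5.18), with a gamma mixing law and a value of µ(A) chosen arbitrarily for illustration: for a mixed Poisson count with N(A) | Λ ~ Poisson(Λµ(A)), the first two factorial moments should be E(Λ)µ(A) and E(Λ²)µ(A)².

```python
import numpy as np

rng = np.random.default_rng(2)
shape, scale, muA, n_rep = 3.0, 0.5, 2.0, 400_000

Lam = rng.gamma(shape, scale, n_rep)   # mixing r.v. Lambda
N = rng.poisson(Lam * muA)             # mixed Poisson counts N(A)

m1_emp = N.mean()
m2_emp = (N * (N - 1.0)).mean()                        # empirical E[N(N-1)]
m1_theory = shape * scale * muA                        # E(Lambda) mu(A)
m2_theory = shape * (shape + 1) * scale**2 * muA**2    # E(Lambda^2) mu(A)^2
print(m1_emp, m1_theory)
print(m2_emp, m2_theory)
```

The product form of the Poisson factorial moments survives the mixing; only the scalar factors E(Λ^j) change.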
Thus, the factorial moment measures for the mixed Poisson process retain the product form of the Poisson case but are multiplied by the scalar factors E(Λ^j). McFadden (1965a) and Davidson (1974c) established the following converse to this result. Let {M_{[j]}(·)} be a sequence of product measures of the form (9.5.18) with {E(Λ^j)} replaced by a sequence {γ_j}. Then the M_{[j]}(·) are the factorial moment measures of a point process if and only if {γ_j} is the moment sequence of some nonnegative r.v. Λ_0. A sufficient condition for the resulting process to be uniquely defined is that lim sup_{j→∞} j^{−1} γ_j^{1/j} < ∞, in which case it is necessarily the mixed Poisson process with parameter measure Λ_0 µ, where Λ_0 has a uniquely defined distribution with moments {γ_j}. Proposition 5.4.VII implies a weaker version of this result (see Exercise 9.5.11).

For completeness, recall here that in Example 6.4(b) we discussed negative binomial processes, meaning point processes N for which N(A) has a negative binomial distribution [but see also Example 9.1(b)]. In particular, we noted two mechanisms leading to such processes, one starting from compound Poisson processes and the other from mixed Poisson processes.

Not infrequently our concern with a point process may be with its structure only on some bounded region A of the state space. Within A (assumed to be Borel), N is a.s. finite-valued by assumption, and its probabilistic structure must be expressible in terms of some family of local probability distributions or Janossy measures as in Definition 5.4.IV and, for the p.g.fl., Example 5.5(b). However, because such point processes are in general a.s. infinite on the whole of X, no such measures exist for the process as a whole. We illustrate in the next example how local characteristics can be described, taking the negative binomial process and its local Janossy measures.

Example 9.5(e) Local properties of the negative binomial process.
Recall from Example 5.5(b) that, given the p.g.fl. G[·] of a point process and a bounded Borel set A, the p.g.fl. G_A[·] of the local process on A is given by

    G_A[h] = G[1 − I_A + h*]    [h ∈ V(A)],

where h*(x) = h(x)I_A(x), so that h* ∈ V(X). Example 6.4(b)(i) gives the p.g.fl.

    G[h] = exp( (1/log(1 − ρ)) ∫_X log{[1 − ρh(x)]/(1 − ρ)} µ(dx) )

for a negative binomial process coming from a Poisson cluster process with clusters degenerate at a point and cluster size following a logarithmic distribution. Then because the integral over A^c vanishes, we deduce that

    G_A[1 − I_A + h*] = exp( (1/log(1 − ρ)) ∫_A log{(1 − ρh)/(1 − ρ)} µ(dx) ).

Thus, the localized process is still a negative binomial process. The local Janossy measures can be found from the expansion

    log{(1 − ρh)/(1 − ρ)} = −log(1 − ρ) − ∑_{n=1}^{∞} (ρ^n/n) h^n,
from which we deduce that p_0(A) = exp[−µ(A)] and

    J_1(dx | A) = ρ p_0(A) µ(dx),
    J_2(dx_1 × dx_2 | A) = ρ^2 p_0(A)[µ(dx_1)µ(dx_2) + δ(x_1, x_2)µ(dx_1)],

where the two terms in J_2 represent contributions from two single-point clusters at x_1 and x_2 (x_1 ≠ x_2) and from a two-point cluster at x_1 = x_2.
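The identity p_0(A) = exp[−µ(A)] can be checked by simulating the cluster representation used above: a Poisson number of degenerate clusters in A, each carrying a logarithmic batch size. The values of µ(A) and ρ below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
muA, rho, n_rep = 1.2, 0.4, 50_000

N = np.empty(n_rep, dtype=int)
for k in range(n_rep):
    c = rng.poisson(muA)                              # cluster centres falling in A
    N[k] = rng.logseries(rho, c).sum() if c else 0    # logarithmic cluster sizes

p0_emp = np.mean(N == 0)
p0_theory = np.exp(-muA)   # N(A) = 0 iff no cluster centre falls in A
print(p0_emp, p0_theory)
```

The event {N(A) = 0} coincides with the event that no cluster centre falls in A, which is why p_0(A) depends on µ(A) alone and not on ρ.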
Exercises and Complements to Section 9.5

9.5.1 Moment measures of Dirichlet process. Let ξ be a random probability measure on X. Show that for every k, the kth moment measure exists and defines a probability measure on X^{(k)}. Find these measures for the Dirichlet process ζ of Example 9.1(e), showing in particular that

    Eζ(A) = α(A)/α(X),    var ζ(A) = (α(A)/α(X)) · (1 − α(A)/α(X))/(α(X) + 1).
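These two formulas can be confirmed with numpy's Dirichlet sampler on a finite partition of X; the parameter values α(A_i) below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)

alpha = np.array([2.0, 3.0, 5.0])        # alpha(A_1), alpha(A_2), alpha(A_3)
zeta = rng.dirichlet(alpha, size=200_000)
zeta_A = zeta[:, 0]                       # zeta(A) for A = A_1

a_A, a_X = alpha[0], alpha.sum()
mean_theory = a_A / a_X
var_theory = (a_A / a_X) * (1 - a_A / a_X) / (a_X + 1)
print(zeta_A.mean(), mean_theory)
print(zeta_A.var(), var_theory)
```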
9.5.2 For the random measure induced by the limit random d.f. of Exercise 9.3.4, show that the first moment measure is Lebesgue measure on [0, 1].

9.5.3 Let ξ be a random measure on X = R^d, and for g ∈ BM_+(X) define G(A) = ∫_A g(x) ℓ(dx), where ℓ denotes Lebesgue measure on R^d. Define η on B_X by

    η(A) = ∫_X G(A − x) ξ(dx).
(a) Show that η(A) is an a.s. finite-valued r.v. for bounded A ∈ B_X, that it is a.s. countably additive on B_X, and hence invoke Proposition 9.1.VIII to conclude that η is a well-defined random measure.
(b) Show that if ξ has moment measures up to order k, so does η, and find the relation between them. Verify that the kth moment measure of η is absolutely continuous with respect to Lebesgue measure on (R^d)^{(k)}.
(c) Denoting the characteristic functionals of ξ and η by Φ_ξ[·] and Φ_η[·], show that for f ∈ BM_+(X),

    h(x) = ∫_X f(y)g(y − x) dy

is also in BM_+(X), and Φ_η[f] = Φ_ξ[h].

9.5.4 (Continuation). By its very definition, η is a.s. absolutely continuous with respect to Lebesgue measure, and its density

    Y(t) ≡ ∫_X g(t − x) ξ(dx),

when ξ is completely random, is called a linear process. Find the characteristic functional of Y when ξ is a stationary gamma random measure. [Remark: Examples of linear processes are provided by equation (9.5.2) in connection with the original Campbell theorem and by the shot-noise process as in Examples 6.1(d) and 6.2(a). See Exercise 10.1.3(b) for the case that ξ is completely random; for other references see, e.g., Westcott (1970).]
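A minimal simulation of a linear (shot-noise) process of the kind referred to in the Remark, with ξ a Poisson process and g an exponential impulse response (all choices illustrative): by the Campbell theorem (9.5.2), E Y(t) = λ ∫ g(u) du away from the boundary.

```python
import numpy as np

rng = np.random.default_rng(5)
lam, T, t0, n_rep = 3.0, 20.0, 10.0, 2000

def g(u):
    # exponential impulse response; integrates to 1/2
    return np.where(u >= 0.0, np.exp(-2.0 * u), 0.0)

Y = np.empty(n_rep)
for k in range(n_rep):
    pts = rng.uniform(0.0, T, rng.poisson(lam * T))  # Poisson process xi on [0, T]
    Y[k] = g(t0 - pts).sum()                         # Y(t0) = sum of g(t0 - x_i)

print(Y.mean(), lam * 0.5)   # Campbell: E Y(t0) = lam * integral of g
```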
9.5.5 (a) For the random probability distribution on R_+ defined as in Exercise 9.1.4 by F_η(x) = 1 − exp(−ξ([0, x])), show that ξ is a.s. nonatomic if and only if the second moment measure M_2 of η has M_2(diag(R^{(2)})) = 0.
(b) Prove the more general Proposition 9.5.III.

9.5.6 (a) Use the intensity measure λ*_k of the simple point process N*_k in the decomposition at (9.3.9) [see also (9.3.17) and Proposition 9.5.II] to show that

    M_2(diag A^{(2)}) = M(A) + ∑_{k=2}^{∞} k(k − 1)λ*_k(A) = M(A) + M_{[2]}(diag A^{(2)}).

Conclude more generally that M_r(diag A^{(r)}) = ∑_{k=1}^{∞} k^r λ*_k(A).
(b) When M_r(A^{(r)}) < ∞, deduce that M_{[r]}(diag A^{(r)}) = 0 if and only if P{N({x}) ≤ r − 1 for all x ∈ A} = 1.
9.5.7 Let M^{(k)} be a symmetric measure on (X^{(k)}, B^{(k)}). By starting from indicator functions, show that M^{(k)} is uniquely determined by integrals of the form

    ∫_{X^{(k)}} η(x_1) . . . η(x_k) M^{(k)}(dx_1 × ··· × dx_k)    (1 − η ∈ V(X)).
9.5.8 Expand E(e^{−X−εY}) for nonnegative r.v.s X and Y to deduce that if a random measure ξ has a finite kth order moment measure, then for ε > 0 and for functions f, g ∈ BM_+(X), using ξf as at (9.5.1),

    L[f + εg] = L[f] − εE[ξg exp(−ξf)] + ··· + ((−ε)^k/k!) E[(ξg)^k exp(−ξf)] + o(ε^k).

9.5.9 Let Q_K = ∏_{i=1}^{K} (1 − α_i), where 0 < α_i < 1 for i = 1, . . . , K, and write

    q_k = ∑_{1≤i_1<···<i_k≤K} α_{i_1} . . . α_{i_k},

so that Q_K = 1 − q_1 + q_2 − ··· + (−1)^K q_K. By interpreting {α_i} as the set of probabilities of some independent events A_1, . . . , A_K, and using the Bonferroni inequalities (see Exercise 5.2.5), show that for all K and positive integers m ≤ K/2,

    S_K^{(2m−1)} ≤ Q_K ≤ S_K^{(2m)},

where S_K^{(k)} = 1 − q_1 + q_2 − ··· + (−1)^k q_k, k = 1, . . . , K. Hence, deduce (9.5.15). [Hint: See Westcott (1972).]

9.5.10 Give a Laplace functional analogue of Proposition 9.5.V. [Hint: Replace (is)^r there by (−s)^r, where now Re(s) ≥ 0.]
9.5.11 Suppose that for j = 1, 2, . . . , measures M_{[j]} on B(X^{(j)}) are defined by (9.5.18) with each E(Λ^j) replaced by some γ_j > 0 for which ∑_{j=1}^{∞} (1 + ε)^j γ_j < ∞ for some ε > 0. Use (5.4.8) to show that local Janossy measures are defined and that they determine a point process whenever {γ_j} is a moment sequence.
CHAPTER 10

Special Classes of Processes

10.1 Completely Random Measures
10.2 Infinitely Divisible Point Processes
10.3 Point Processes Defined by Markov Chains
10.4 Markov Point Processes
We have already discussed in Volume I a variety of particular models for point processes and random measures, and described many of their properties. With the added benefit of the basic theory in Chapter 9, we return here to the study of four important classes of models: completely random measures; infinitely divisible point processes; point processes generated by Markov chains; and Markov point processes in space. Each class has interest in its own right, and contains models which are widely used in applications. Although it is the intrinsic interest of the models that motivates the discourse, our immediate aims are to use the theory of the last chapter to establish structure theorems for these classes, to show that they are well-defined mathematical objects, and to establish some of their general properties. A key feature of the first two classes is their close link to the Poisson process. Indeed, they form natural extensions of the compound Poisson processes discussed in Chapters 2 and 9. Because of this feature, many of their properties can be handled compactly by p.g.fl. techniques, and we make extensive use of this approach. It should be borne in mind, however, that the main advantage of this approach lies precisely in its compactness: it quickly summarizes information that can still be derived quite readily without it and that in less tractable examples may not be so easily expressible in p.g.fl. form. The other two sections illustrate both the power and limitations of the ideas of Markov chains that so pervade applied probability. When the process has a temporal ingredient, it is natural to include this in any probabilistic description, so that it is the 'future' that is predicted (stochastically) on the basis of the 'present', any further knowledge of the past being superfluous for improving the prediction. In reality, this amounts to a factorization of the stochastic structure
of the evolution of the process as a product of probabilistic terms (densities) 'chained' through adjacent epochs in time. It is exactly a product probabilistic structure that underlies the class of so-called Markov point processes in space. The class of models that can be described in this way leads to representations of the type now known under the umbrella of Hammersley–Clifford theorems. Section 10.4 serves as a foretaste of further results in Chapter 15.
10.1. Completely Random Measures

This section represents both an illustration of the ideas of the sample-path properties expounded in Section 9.3 and an extension of the discussion of Section 2.4 on the general Poisson process. The principal result is the Representation Theorem 10.1.III, which comes from Kingman (1967); it is based on a careful study of sample-path structures, to which we proceed immediately. Although completely random measures have been referred to at several points in the text already, we state the following for the record.

Definition 10.1.I. A random measure ξ on the c.s.m.s. X is completely random if for all finite families of disjoint, bounded Borel sets {A_1, . . . , A_k}, the random variables {ξ(A_1), . . . , ξ(A_k)} are mutually independent.

Of course, the Poisson process discussed extensively in Chapter 2 and elsewhere is the prime example. A compound Poisson process with marks in the c.s.m.s. K is a completely random process on X × K with the additional requirement that the ground process N_g on X be well defined (and then a Poisson process in its own right). In one dimension, a random measure ξ is completely random if and only if the corresponding cumulative process η(t) = ∫_0^t ξ(dx) has independent increments.

Recall from Corollary 9.3.VI that a completely random measure ξ on X is nonatomic if and only if for some dissecting system {T_n} and every ε > 0, ∑_{i=1}^{k_n} P{ξ(A_{ni}) ≥ ε} → 0 (n → ∞).

A substantial step in the proof of the main representation result, equation (10.1.4) in Theorem 10.1.III, is the following result, which is of interest in its own right.

Proposition 10.1.II. If the completely random measure ξ is a.s. nonatomic, then there is a fixed nonatomic measure ν such that

    ξ(·) = ν(·)    a.s.        (10.1.1)

Proof. Let T = {T_n} = {A_{ni}: i = 1, . . . , k_n} be a dissecting system for any given bounded Borel set A, and define the transforms

    ψ_{ni}(s) = E exp[−sξ(A_{ni})]    (Re(s) ≥ 0).
Because ξ is completely random and ξ(A) = ∑_{i=1}^{k_n} ξ(A_{ni}), we have

    ψ_A(s) ≡ E exp[−sξ(A)] = ∏_{i=1}^{k_n} ψ_{ni}(s)    (n = 1, 2, . . .),

and

    1 − ψ_{ni}(s) = E{1 − exp[−sξ(A_{ni})]} = ∫_0^∞ s e^{−sy} P{ξ(A_{ni}) > y} dy.
Appealing to Corollary 9.3.VI and the dominated convergence theorem, ξ being nonatomic implies that

    max_{1≤i≤k_n} [1 − ψ_{ni}(s)] → 0    (n → ∞)

for every fixed real s ≥ 0. Using this result in an expansion of the logarithmic term below, it now follows that

    − log ψ_A(s) = − ∑_{i=1}^{k_n} log ψ_{ni}(s) = − lim_{n→∞} ∑_{i=1}^{k_n} log ψ_{ni}(s)
        = lim_{n→∞} ∑_{i=1}^{k_n} [1 − ψ_{ni}(s)] = lim_{n→∞} ∫_0^∞ (1 − e^{−sy}) G_n(dy),        (10.1.2)
where G_n(·) is the sum of the k_n individual probability measures of ξ(A_{ni}). Again from (9.3.7),

    ∫_ε^∞ G_n(dy) → 0    (n → ∞)        (10.1.3)

for every fixed ε > 0, and from the limit relation (10.1.2) it follows that ∫_0^ε y G_n(dy) remains bounded as n → ∞. Thus, the sequence of measures H_n(dy) ≡ min(1, y) G_n(dy) is not merely bounded in total mass but, from (10.1.3), is uniformly tight. Again, using (10.1.3), it follows that the only possible limit for {H_n} is a degenerate measure with its mass concentrated at the origin. Uniform tightness implies the existence of a convergent subsequence, so there must exist a constant ν ≡ ν(A) and a sequence {n_k} for which H_{n_k} → ν(A)δ_0 weakly, and therefore
    − log ψ_A(s) = lim_{k→∞} ∫_0^∞ (1 − e^{−sy}) G_{n_k}(dy) = lim_{k→∞} ∫_0^∞ [(1 − e^{−sy})/min(y, 1)] H_{n_k}(dy) = sν(A).
This result is equivalent to ξ(A) = ν(A) a.s. for the given bounded Borel set A. Because such a relation holds for any bounded Borel set A, the family ν(·) must be a boundedly finite Borel measure, and ξ = ν must hold for almost all realizations. Finally, ξ being a.s. free of atoms, the same must be true of ν also.

The major goal now is the following representation theorem.

Theorem 10.1.III (Kingman, 1967). Any completely random measure ξ on the c.s.m.s. X can be uniquely represented in the form

    ξ(A) = ∑_{k=1}^{∞} U_k δ_{x_k}(A) + ν(A) + ∫_0^∞ y N(A × dy),        (10.1.4)

where the sequence {x_k} enumerates a countable set of fixed atoms of ξ, {U_k} is a sequence of mutually independent nonnegative random variables determining (when positive) the masses at these atoms, ν(·) is a fixed nonatomic boundedly finite measure on X, and N(·) is a Poisson process on X × (0, ∞), independent of {U_k}, the parameter measure µ of which may be unbounded on sets of the form A × (0, ε) but satisfies

    ∫_{y>ε} µ(A × dy) < ∞,        (10.1.5a)
    ∫_0^ε y µ(A × dy) < ∞,        (10.1.5b)

for every bounded Borel set A and every 0 < ε < ∞, and, for all x ∈ X,

    µ({x} × (0, ∞)) = 0.        (10.1.5c)
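The third component of (10.1.4) is easy to simulate when µ = ℓ × Ψ with Ψ a totally finite measure on (0, ∞). The sketch below uses arbitrary illustrative choices (X = [0, 1], Poisson point count with mean 5, exponential atom masses); the resulting purely atomic measure is completely random, so the masses ξ(A), ξ(B) of disjoint sets should be uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(6)
n_rep, total_mass, mean_atom = 20_000, 5.0, 0.3

xiA = np.empty(n_rep)
xiB = np.empty(n_rep)
for k in range(n_rep):
    n = rng.poisson(total_mass)          # Poisson points (x_i, y_i) of N
    x = rng.uniform(0.0, 1.0, n)         # locations in X = [0, 1]
    y = rng.exponential(mean_atom, n)    # atom masses y_i drawn from Psi
    xiA[k] = y[x < 0.5].sum()            # xi(A) = sum of atoms in A = [0, 0.5)
    xiB[k] = y[x >= 0.5].sum()           # xi(B), B = [0.5, 1)

print(np.corrcoef(xiA, xiB)[0, 1])   # near 0, reflecting complete randomness
```

Letting Ψ have infinite total mass but satisfy (10.1.5b) would produce infinitely many small atoms in every set, which is the general case the theorem covers.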
Proof. The complete independence property shows that the random masses U_k in the first component in (10.1.4), whose identification is assured by Proposition 9.3.IV, are mutually independent and independent also of the sum of the other two terms. So by considering ξ − ∑_k U_k δ_{x_k}, we may assume that the completely random measure has no fixed atoms. Similar considerations imply (10.1.5c).

From Lemma 9.1.VII we can identify the component with random atoms as an extended MPP on X with positive marks [Definition 9.1.VI(vi)], Ñ say. To show that Ñ inherits the completely random property, let V_j = A_j × [a_j, b_j) (j = 1, 2) be any two disjoint product sets of the form described above. If A_1 ∩ A_2 = ∅, it is obvious that Ñ(V_1) and Ñ(V_2) are independent. Consider the other possibility, that A_1 = A_2 = A but [a_1, b_1) and [a_2, b_2) are disjoint. Let T be a dissecting system for A, and set

    X_{ni} = 1 if a_1 ≤ ξ(A_{ni}) < b_1, and 0 otherwise,
    Y_{ni} = 1 if a_2 ≤ ξ(A_{ni}) < b_2, and 0 otherwise,
    p_{ni} = P{X_{ni} = 1},    q_{ni} = P{Y_{ni} = 1}.

Then

    Ñ(V_1) = lim_{n→∞} ∑_{i=1}^{k_n} X_{ni}    a.s.,    Ñ(V_2) = lim_{n→∞} ∑_{i=1}^{k_n} Y_{ni}    a.s.,
and the complete independence property yields for the joint probability generating function (with |z_j| ≤ 1 for j = 1, 2)

    E(z_1^{Ñ(V_1)} z_2^{Ñ(V_2)}) = lim_{n→∞} ∏_{i=1}^{k_n} E(z_1^{X_{ni}} z_2^{Y_{ni}}) = lim_{n→∞} ∏_{i=1}^{k_n} [1 − p_{ni}(1 − z_1) − q_{ni}(1 − z_2)].        (10.1.6)

Similarly,

    E(z_1^{Ñ(V_1)}) E(z_2^{Ñ(V_2)}) = lim_{n→∞} ∏_{i=1}^{k_n} [1 − p_{ni}(1 − z_1)] [1 − q_{ni}(1 − z_2)].        (10.1.7)
(0 ≤ x < 1),
(10.1.8)
and write rni = pni (1 − z1 ) + qni (1 − z2 ), Then for 0 ≤ zj ≤ 1 in (10.1.6), 0≤−
kn
log(1 − rni ) −
i=1
≤ Rn ≡
kn
rni
i=1 kn i=1
kn 2 maxi rni rni ≤ rni . 1 − rni 1 − maxi rni i=1
(10.1.9)
Now from Lemma 9.3.II we must have maxi rni → 0 (n → ∞), and by (10.1.6) and the first inequality at (10.1.9), kn i=1
rni ≤ −
kn
N(V ) N(V ) log(1 − rni ) → − log E z1 1 z2 2 ,
i=1
which is finite, uniformly in n. So Rn → 0 as n → ∞. Similar estimates apply to the two generating functions at (10.1.7), and because the difference of the leading terms in (10.1.6) and (10.1.7) vanishes, we must have equality of the generating functions as required. By induction we can demonstrate the independence of any finite family (Vj ): j = 1, . . . , k} whenever the sets Vj are rectangular and disjoint as {N considered. Now by Proposition 9.2.III and its corollary, the distribution is determined by all the joint distributions {N (Vj ): j = 1, . . . , k}, and of N thus N (·) is completely random. By construction N is a simple point process,
10.1.
Completely Random Measures
81
is a Poisson process with parameter therefore Theorem 2.4.VII implies that N measure µ taking finite values on rectangles of the form already considered. The tighter boundedness properties in (10.1.5) can be established as fol A×( , ∞) is boundedly finite for fixed > 0 lows. First, because N (A) = N and (as just shown) is for fixed A a Poisson process in the range ∞ > > 0, the finiteness of (10.1.5a) is assured. For (10.1.5b), we observe moreover that for every > 0, (A × dy) ≤ ξ(A) < ∞ a.s., yN (10.1.10) 0
because the integral is simply a sum (possibly an infinite series) of the contribution of the random atoms of ξ to its total mass ξ(A) on A. To see that the convergence of the integral at (10.1.10) implies the convergence of its expectation as at (10.1.5b), partition the interval (0, ) into the sequence of subintervals {[ /(r + 1), /r) : r = 1, 2, . . .}. Then ∞
y µ(A × dy) ≤ yr µ(Ar ), where yr = and Ar = A × yr+1 , yr . r 0 r=1 Let {ζr } be a sequence of independent Poisson r.v.s with parameters µ(Ar ). Then for the Laplace transform of yr ζr we have ∞ ∞
− log E exp − s = − log yr ζr exp − µ(Ar )[1 − exp(−syr )] r=1
r=1
=
∞
µ(Ar )[1 − exp(−syr )] ≥
1 2s
r=1
∞
yr µ(Ar )
r=1
for 0 < s < −1 , because then0 < syr ≤ 1. ∞ Thus, the convergence of r=1 yr µ(Ar ) is implied by the a.s. convergence ∞ of r=1 yr ζr , concerning which we have ∞ ∞ ∞ ∞
(A × dy) ≥ ζr ≥ ζr ≥ 12 yr ζr = yN yr ζr . r r+1 0 r=1 r=1 r=1 r=1 The asserted finiteness at (10.1.5b) is now established. Finally observe that the measure ∞ (A × dy) ν˜(A) ≡ ξ(A) − Uk δxk (A) − yN 0
is a.s. nonatomic (by construction) and completely random, inasmuch as $\xi$ is by assumption and the other two terms have been demonstrated to have the property. Then by Proposition 10.1.I, $\tilde\nu = \nu$ a.s. for some fixed nonatomic measure $\nu$. The theorem is proved on noting that of the three terms in (10.1.4), the first consists of fixed atoms, the second is a constant measure, and the third is purely atomic, so uniqueness is assured. As a simple special case we obtain the Lévy-type representation for a process with nonnegative independent increments.
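The compound-Poisson term in the representation (10.1.4) lends itself to a quick numerical check of the Laplace-transform identity underlying the finiteness argument above. The sketch below is illustrative only: it assumes a totally finite intensity $\mu(A \times dy) = \lambda e^{-y}\,dy$ (so the atoms of $\xi$ on $A$ are a Poisson$(\lambda)$ number of jumps with Exp(1) heights), for which $-\log E[e^{-s\xi(A)}] = \lambda s/(1+s)$; the names and parameter values are ours, not the text's.

```python
import math
import random

def sample_xi_A(rate, rng):
    """One draw of xi(A): K ~ Poisson(rate) atoms with Exp(1) heights.
    (Illustrative choice mu(A x dy) = rate * e^{-y} dy; not from the text.)"""
    k, t = 0, 0.0
    while True:                       # Poisson count via exponential spacings
        t += rng.expovariate(1.0)
        if t > rate:
            break
        k += 1
    return sum(rng.expovariate(1.0) for _ in range(k))

rng = random.Random(42)
rate, s, n = 3.0, 1.0, 50_000
estimate = sum(math.exp(-s * sample_xi_A(rate, rng)) for _ in range(n)) / n
# closed form: exp(-rate * int_0^inf (1 - e^{-sy}) e^{-y} dy) = exp(-rate*s/(1+s))
closed = math.exp(-rate * s / (1.0 + s))
print(estimate, closed)
```

The Monte Carlo average of $e^{-s\xi(A)}$ should agree with the closed form to within sampling error.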
10. Special Classes of Processes
Example 10.1(a) Nonnegative Lévy processes. Here we use 'Lévy process' to mean a process $X(t)$ on the real line with independent increments; nonnegativity ensures that the corresponding set process is a measure and not a signed measure. Thus, this example excludes the Brownian motion process and its fractional derivatives. The form of the representation (10.1.4) is unchanged; all that is required is the identification of $\mathcal X$ with $\mathbf R$. In applications, it is common to require the process to have increments that are both stationary and independent. Stationarity then rules out the existence of fixed atoms, the fixed measure $\nu$ reduces to a multiple of Lebesgue measure, and the compound Poisson process in the representation (10.1.4) inherits the stationarity property from $X(t)$. Thus the representation takes the simpler form, for any finite interval $(a, b]$,
$$X(b) - X(a) = \nu(b - a) + \int_0^\infty y\, \tilde N\big((a, b] \times dy\big),$$
where $\nu$ is a nonnegative real constant, called the drift coefficient in, for example, Bertoin (1996, p. 16), and $\tilde N$ is an extended stationary compound Poisson process, meaning that the intensity measure of the corresponding Poisson process on $\mathbf R \times \mathbf R_+$ has the form $\mu = \ell \times \Psi$, where $\ell$ denotes Lebesgue measure and $\Psi$, although not necessarily totally finite, does have finite total mass beyond any $\epsilon > 0$ [corresponding to (10.1.5a)], and integrates $y$ at the origin [corresponding to (10.1.5b)].

It is often convenient to describe the representation of Theorem 10.1.III in terms of Laplace functionals, as in the proposition below [cf. Kingman (1967)]. Exercise 10.1.2 summarizes the corresponding representations of the Laplace–Stieltjes transforms for the process increments in the real line case; these are standard representations for the transforms of nonnegative infinitely divisible distributions.

Proposition 10.1.IV. In order that the family $\{\psi_A(\cdot),\ A \in \mathcal B_{\mathcal X}\}$ denote the Laplace transforms of the one-dimensional distributions of a completely random measure on $\mathcal X$, it is necessary and sufficient that $\psi_A(\cdot)$ have a representation of the form, for $\mathrm{Re}(s) \ge 0$,
$$\log \psi_A(s) = -\sum_{k=1}^\infty \theta_k(s)\,\delta_{x_k}(A) - \int_0^\infty \big(1 - e^{-sy}\big)\, \mu(A \times dy) - s\,\nu(A), \tag{10.1.11}$$
where $\{x_k\}$ is a fixed sequence of points, each $\theta_k(\cdot)$ is minus the logarithm of the Laplace transform of a positive random variable, and the measures $\nu$, $\mu$ have the same properties as in Theorem 10.1.III. Conversely, given any such family $\{x_k, \theta_k(\cdot), \nu, \mu\}$, there exists a completely random measure with one-dimensional Laplace transforms given by (10.1.11).

Proof. The representation (10.1.11) follows immediately on substituting for $\xi(A)$ from (10.1.4) in the expectation $\psi_A(s) = E\big(e^{-s\xi(A)}\big)$.
To prove the converse it is sufficient to show that the form (10.1.11), together with the definition of joint distributions through the completely random property, yields a consistent family of finite-dimensional distributions. The details of the verification are left as Exercise 10.1.1.

Example 10.1(b) Stable random measures; nonnegative stable processes. A special case of interest is the class of stable random measures, for which the measure $\mu$ of Theorem 10.1.III takes the form
$$\mu(dx \times dy) = \kappa(dx)\, y^{-(1+1/\alpha)}\, dy$$
for $1 < \alpha < \infty$ and some boundedly finite measure $\kappa(\cdot)$ on $\mathcal B_{\mathcal X}$. For such random measures the Laplace–Stieltjes transforms of the one-dimensional distributions take the form
$$\psi_A(s) = E\big[e^{-\xi(A)s}\big] = \exp\bigg(-\kappa(A) \int_0^\infty \frac{1 - e^{-sy}}{y^{1+1/\alpha}}\, dy\bigg) = \exp\big(-C_\alpha\, \kappa(A)\, s^{1/\alpha}\big), \tag{10.1.12}$$
where $C_\alpha = \alpha\,\Gamma\big([\alpha - 1]/\alpha\big)$ [see, e.g., Bertoin (1996, p. 73)]. An alternative representation for the Laplace–Stieltjes transforms of nonnegative stable processes, due to Kendall (1963), is given in Exercises 10.1.2–3. When $\mathcal X = \mathbf R^d$ and $\kappa$ reduces to Lebesgue measure, the process is both self-similar and stationary, with index of similarity $\alpha$ (we also discuss self-similar random measures in Section 12.8). For more detail concerning stable random measures and related processes, see Samorodnitsky and Taqqu (1994). The case when the fidi distributions are gamma distributions has already been discussed in Example 9.1(d). A related but more extended family of completely random measures is described below, following Brix (1999), whose exposition covers earlier material.
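The closed form in (10.1.12) can be checked numerically for a particular index. The sketch below (an illustration, not part of the text) takes $\alpha = 2$, so $C_2 = 2\Gamma(\tfrac12) = 2\sqrt\pi$, and evaluates the integral via the substitution $y = u^2$, which turns the integrand into $2(1 - e^{-su^2})/u^2$, bounded at the origin; plain trapezoidal quadrature plus an analytic tail estimate suffices.

```python
import math

def stable_exponent(s, upper=50.0, n=100_000):
    """Evaluate int_0^inf (1 - e^{-s y}) y^{-3/2} dy (the case alpha = 2 of
    (10.1.12)) by substituting y = u^2, giving int_0^inf 2(1 - e^{-s u^2})/u^2 du;
    trapezoidal rule on [0, upper] plus the analytic tail ~ 2/upper."""
    h = upper / n
    def f(u):
        if u == 0.0:
            return 2.0 * s                    # limiting value as u -> 0
        return 2.0 * (1.0 - math.exp(-s * u * u)) / (u * u)
    total = 0.5 * (f(0.0) + f(upper))
    for i in range(1, n):
        total += f(i * h)
    return total * h + 2.0 / upper            # tail of the transformed integrand

for s in (1.0, 2.0):
    closed = 2.0 * math.gamma(0.5) * math.sqrt(s)   # C_2 s^{1/2}, C_2 = 2*Gamma(1/2)
    print(s, stable_exponent(s), closed)
```

The quadrature agrees with $C_2 s^{1/2}$ to several decimal places, confirming the exponent in (10.1.12) for this case.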
Example 10.1(c) G-random measures. Here the one-dimensional distributions have Laplace transforms (10.1.11) of the form
$$\psi_A(s) = \exp\big(-\kappa(A)\,[(\theta + s)^\rho - \theta^\rho]/\rho\big),$$
where $\rho$ and $\theta$ are parameters satisfying either $\rho \le 0$ and $\theta > 0$, or $0 < \rho \le 1$ and $\theta \ge 0$. The case $\rho = 0$ can be obtained as a limit for $\rho \downarrow 0$, and gives back a gamma distribution for $\xi(A)$. When $\rho < 0$, the underlying measure $\mu$ of (10.1.11) is a product of Lebesgue measure and a gamma distribution (i.e., in this case the jumps have gamma-distributed heights). In the case $0 < \rho < 1$ the corresponding density is improper (its integral diverges) but still satisfies the condition $\int_0^\infty y f(y)\, dy < \infty$, implying that conditions (10.1.5) hold. The Lévy representation has density [cf. (10.1.11)]
$$\mu(A \times dy) = \frac{\kappa(A)}{\Gamma(1 - \rho)}\, y^{-\rho - 1} e^{-\theta y}\, dy.$$
Lee and Whitmore (1993) describe the Lévy processes corresponding to the one-dimensional versions of these processes as Hougaard processes.
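The assertion that $\rho \downarrow 0$ gives back a gamma distribution can be seen numerically: $[(\theta + s)^\rho - \theta^\rho]/\rho \to \log\big((\theta + s)/\theta\big)$ as $\rho \to 0$, so $\psi_A(s)$ tends to the gamma Laplace transform $(\theta/(\theta + s))^{\kappa(A)}$. A minimal sketch (parameter values arbitrary):

```python
import math

def psi(s, kappa, rho, theta):
    """Laplace transform of a G-random measure on a set with kappa(A) = kappa:
    psi_A(s) = exp(-kappa * [(theta + s)^rho - theta^rho] / rho)."""
    return math.exp(-kappa * ((theta + s) ** rho - theta ** rho) / rho)

def psi_gamma(s, kappa, theta):
    # the rho -> 0 limit: Laplace transform of a Gamma(shape=kappa, rate=theta) law
    return (theta / (theta + s)) ** kappa

kappa, theta = 2.5, 1.5
for s in (0.5, 1.0, 4.0):
    print(s, psi(s, kappa, 1e-8, theta), psi_gamma(s, kappa, theta))
```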
Brix (1999) also describes the use of these G-measures, or smoothed versions thereof, as directing measures for a Cox process. In one dimension, that is, when $\mathcal X = \mathbf R$, the smoothed version can then be made to correspond to a type of shot-noise process; Exercise 10.1.7 gives some details.

No essentially new ideas arise in extending the complete randomness property to marked point processes. We say that an MPP on $\mathcal X$ with marks in $\mathcal K$ is completely random when the associated point process on $\mathcal X \times \mathcal K$ is completely random. It is somewhat surprising that when the MPP has a simple ground process $N_g$, this condition is equivalent to the apparently weaker condition that the random variables $N(A_i \times K_i)$ should be mutually independent whenever the sets $A_i$ are disjoint, irrespective of whether the corresponding sets $K_i$ are disjoint. Because the construction of Exercise 9.1.6 indicates that, by adjusting the mark space if necessary, we can always find an equivalent description of an MPP as an MPP with simple ground process, the lemma below is rather generally applicable.

Lemma 10.1.V. An MPP with simple ground process $N_g$ is completely random if and only if for every finite $n$, bounded $A_i \in \mathcal B_{\mathcal X}$ and $K_i \in \mathcal B_{\mathcal K}$ ($i = 1, \ldots, n$), the random variables $N(A_i \times K_i)$ are mutually independent whenever the $A_i$ are mutually disjoint.

Proof. It is obvious that if the complete randomness property holds, the asserted independence property holds, because sets in a product space with disjoint marginals are disjoint. For the converse, suppose we are given two product sets in the product space, $A_j \times K_j$ say, for $j = 1, 2$. Consider first the case $A_1 = A_2 = A$ say, but $K_1 \cap K_2 = \emptyset$. We want to show that under the condition of $N_g$ being simple, the $N(A \times K_j)$ are independent. Let $\mathcal T$ be a dissecting system for $A$, and consider, for elements $A_{ni}$ of a partition of $A$ drawn from $\mathcal T$, the variables $X_{ni,j} = \min\big(1, N(A_{ni} \times K_j)\big)$. Simplicity of $N_g$ implies that $N(A \times K_j) = \lim_{n\to\infty} \sum_i X_{ni,j}$.
We can now imitate that part of the proof of Theorem 10.1.III around (10.1.6–9) to conclude that the $N(A \times K_j)$ are independent. In the general case, with both the intersection $A_{12} = A_1 \cap A_2$ and the differences $A'_j = A_j \setminus A_{12}$ ($j = 1, 2$) nonempty, the product sets $A'_j \times K_j$ are disjoint in their $\mathcal X$-components and therefore independent, and the $N(A_{12} \times K_j)$ are independent for disjoint $K_j$ by the case already considered. Independence of $N(V_j)$ ($j = 1, 2$) for arbitrary bounded Borel sets $V_j$ in the product space follows from their independence when the $V_j$ are product sets by standard extension arguments. The argument extends to any finite number of sets by induction.

We proceed to examine the structure of a completely random MPP. We know from Chapter 2 (or as a special case of Theorem 10.1.III) that a simple
completely random point process reduces to a Poisson process; in the present case the parameter measure $\mu_g$ of the ground process must satisfy
$$\mu_g(A) = E\big[N(A \times \mathcal K)\big] < \infty \tag{10.1.13}$$
for all bounded $A \in \mathcal B_{\mathcal X}$, inasmuch as $N_g$ is boundedly finite by assumption. Introduce a family of probability measures $P(K \mid x)$ on the mark space $(\mathcal K, \mathcal B(\mathcal K))$ by means of the Radon–Nikodym derivatives
$$E\big[N(A \times K)\big] \equiv \mu(A \times K) = \int_A P(K \mid x)\, \mu_g(dx). \tag{10.1.14}$$
Then the absolute continuity condition $\mu(\cdot \times K) \ll \mu_g$ is satisfied, and the property $P(\mathcal K \mid x) = 1$ a.s. follows from the definition of $\mu_g$. As in the discussion of regular conditional probability (Proposition A1.5.III), we can and do assume that the family $\{P(B \mid x)\colon B \in \mathcal B(\mathcal K),\ x \in \mathcal X\}$ is so chosen that $P(\cdot \mid x)$ is a probability measure on $\mathcal B(\mathcal K)$ for all $x \in \mathcal X$. With this understanding we arrive at the next proposition, which effectively implies that completely independent MPPs reduce to some general type of compound Poisson process.

Proposition 10.1.VI. A completely random MPP with simple ground process is fully specified by the two components: (i) a Poisson process of locations with parameter measure $\mu_g$; and (ii) a family of probability distributions $P(\cdot \mid x)$ giving the distribution of the mark in $\mathcal K$, with the property that $P(B \mid x)$ is measurable in $x$ for each fixed $B \in \mathcal B(\mathcal K)$. Conversely, given such $\mu_g$ and $P(\cdot \mid \cdot)$, there exists a completely random MPP having these as components.

Proof. For the converse, it suffices to construct a Poisson process on $\mathcal X \times \mathcal K$ with parameter measure (10.1.14); we leave it to Exercise 10.1.6 to verify that the resultant process is an MPP with the complete randomness property.
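Proposition 10.1.VI is also a recipe for simulation: generate Poisson locations with parameter measure $\mu_g$, then attach to each location $x$ an independent mark drawn from $P(\cdot \mid x)$. The sketch below assumes, purely for illustration, $\mathcal X = [0, T]$, a constant rate $\mu_g$, and the hypothetical mark kernel $P(\text{mark} = 1 \mid x) = x/T$; the expected number of 1-marked points is then $\mu_g \int_0^T (x/T)\, dx = \mu_g T/2$.

```python
import random

def sample_mpp(T, mu_g, mark_dist, rng):
    """One realization of a completely random MPP on [0, T]:
    (i) Poisson locations at constant rate mu_g;
    (ii) each location x receives an independent mark from P(. | x)."""
    points, t = [], 0.0
    while True:
        t += rng.expovariate(mu_g)
        if t > T:
            return points
        points.append((t, mark_dist(t, rng)))

rng = random.Random(1)
T, mu_g = 10.0, 3.0
# hypothetical mark kernel (illustration only): P(mark = 1 | x) = x / T
bernoulli_kernel = lambda x, r: 1 if r.random() < x / T else 0

reps = 4000
ones = sum(sum(m for _, m in sample_mpp(T, mu_g, bernoulli_kernel, rng))
           for _ in range(reps)) / reps
print(ones)   # E[N([0,T] x {1})] = mu_g * T / 2 = 15
```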
Exercises and Complements to Section 10.1

10.1.1 Imitate the discussion of Example 9.2(a) to verify that the fidi distributions of a completely random measure, constructed from the one-dimensional distributions as at (10.1.12), satisfy Conditions 9.2.V and 9.2.VI.

10.1.2 Let $\xi$ be a stationary, completely random measure on $\mathbf R$, and let $\psi_t(s)$ denote the Laplace–Stieltjes transform of $\xi(0, t]$.
(a) Deduce from Proposition 10.1.IV that
$$\psi_t(s) = \exp\bigg( -st\nu - t \int_{(0,\infty)} \big[1 - e^{-sy}\big]\, \Psi(dy) \bigg), \tag{10.1.15}$$
where $\nu$ is a positive constant, and the $\sigma$-finite measure $\Psi$ on $(0, \infty)$ satisfies, for some $\epsilon > 0$, $\int_0^\epsilon y\, \Psi(dy) < \infty$ and $\int_\epsilon^\infty \Psi(dy) < \infty$.
(b) Establish also the equivalent representation
$$\psi_t(s) = \exp\bigg( -st\nu - t \int_{(0,\infty]} \frac{1 - e^{-sy}}{1 - e^{-y}}\, G(dy) \bigg) \tag{10.1.16}$$
for some totally finite measure $G$ and some nonnegative finite constant $\nu$ [$\nu$ here equals $\nu(0, 1]$ in the notation of (10.1.11)]. [Remark: This form parallels that given in Kendall (1963); the measure $\mu$ of (10.1.5) and (10.1.11) satisfies $\int_{(0,\infty)} (1 - e^{-y})\, \mu(A \times dy) < \infty$.]

10.1.3 (Continuation). Using the representation above and the same notation, show that $\mathrm P\{\xi(0, t] = 0\} > 0$ if and only if both $\int_{(0,1)} y^{-1}\, G(dy) < \infty$ and $\nu = 0$. [Remark: The condition $\nu = 0$ precludes any positive linear trend; the other condition precludes the possibility of an everywhere dense set of atoms.]

10.1.4 For a given random measure $\xi$ let $\mathcal Y$ denote the family of measures $\eta$ satisfying $\mathrm P\{\xi(A) \ge \eta(A)\ (\text{all } A \in \mathcal B_{\mathcal X})\} = 1$.
(a) Define $\nu_d(A) = \sup_{\eta \in \mathcal Y} \eta(A)$; check that it is a measure, and confirm that $\xi - \nu_d$ is a random measure.
(b) Extract from $\xi - \nu_d$ the random measure $\zeta_a$ consisting of all the fixed atoms of $\xi - \nu_d$, leaving $\xi_r = \xi - \nu_d - \zeta_a$.
(c) The result of (a) and (b) is to effect a decomposition $\xi = \nu_d + \zeta_a + \xi_r$ of $\xi$ into a deterministic component $\nu_d$, a component of fixed atoms $\zeta_a$, and a random component $\xi_r$. Give an example showing that there may still be bounded $A \in \mathcal B_{\mathcal X}$ for which $\mathrm P\{\xi_r(A) \ge \epsilon\} = 1$ for some $\epsilon > 0$. [Hint: Let $\xi_r$ give mass 1 to either $U$ or $U + 1$, where $U$ is uniformly distributed on $(0, 1)$.]

10.1.5 Proposition 10.1.IV coupled with the independence property for disjoint sets $A$ implies that the Laplace functional (9.4.10) of a completely random measure is expressible for $f \in \mathrm{BM}_+(\mathcal X)$ as
$$-\log E\bigg[\exp\bigg( -\int_{\mathcal X} f(x)\, \xi(dx) \bigg)\bigg] = \int_{\mathcal X} f(x)\, \alpha(dx) + \int_{\mathcal X \times \mathbf R_+} \big(1 - e^{-y f(x)}\big)\, \mu(dx \times dy),$$
where $\alpha \in \mathcal M^\#_{\mathcal X}$ and $\mu \in \mathcal M^\#(\mathcal X \times \mathbf R_+)$ satisfies $\int_{\mathbf R_+} (1 - e^{-y})\, \mu(B \times dy) < \infty$ for all bounded $B \in \mathcal B_{\mathcal X}$. [Hint: Kallenberg (1983a, Chapter 7) gives an alternative proof. Compare also with Proposition 10.2.IX.]
10.1.6 Verify the assertion that a completely independent MPP has a simple ground process Ng if and only if (10.1.12) holds for every dissecting system T for bounded A ∈ BX . Without complete independence (10.1.12) need not hold [see (9.3.18) and Proposition 9.3.XII].
10.1.7 Cox processes directed by stationary G-processes. Let $\xi$ be a stationary G-random measure on $\mathbf R^d$ as in Example 10.1(c), so that in the notation of the example, $\kappa(A) = \kappa\,\ell(A)$ for some finite positive constant $\kappa$, with $\ell$ denoting Lebesgue measure.
(a) Show that if $\xi$ itself is used as the directing measure of a Cox process, then the realizations are stationary but a.s. not simple.
(b) In the case $d = 1$, suppose the directing process is not $\xi$ but the smoothed version $X(y) = \int_{-\infty}^y \phi(y - x)\, \xi(dx)$ for some continuous nonnegative integrable kernel function $\phi(\cdot)$; $X(\cdot)$ has a density and can be regarded as a type of general shot-noise process [cf. Examples 6.1(d) and 6.2(a)]. Show that the Cox process is well-defined, stationary, and a.s. simple, with finite mean rate $m = \kappa\,\theta^{\rho - 1} \int_{\mathbf R} \phi(u)\, du$ and reduced factorial covariance density
$$c_{[2]}(u) = \kappa\, \theta^{\rho - 2} (1 - \rho) \int_{\mathbf R_+} \phi(x)\, \phi(x + u)\, dx.$$
10.2. Infinitely Divisible Point Processes

Our aim in this section is to characterize the class of infinitely divisible point processes and random measures. In the point process case, this question is intimately bound up with characterizations of Poisson cluster processes. The role of infinitely divisible point processes and random measures in limit theorems for superpositions is taken up in the next chapter (Section 11.2).

Definition 10.2.I. A point process or random measure is infinitely divisible if, for every $k$, it can be represented as the superposition of $k$ independent, identically distributed, point process (or random measure) components. In symbols, a point process $N$ is infinitely divisible if, for every $k$, we can write
$$N = N_1^{(k)} + \cdots + N_k^{(k)}, \tag{10.2.1}$$
where the $N_i^{(k)}$ ($i = 1, \ldots, k$) are i.i.d. components. Using p.g.fl.s, the condition takes the form (in an obvious notation)
$$G[h] = \big(G^{1/k}[h]\big)^k \qquad \big(h \in \mathcal V(\mathcal X)\big). \tag{10.2.2}$$
The p.g.fl. is nonnegative for such h, so we can restate (10.2.2) as follows. A point process is infinitely divisible if and only if, for every k, the uniquely defined nonnegative kth root of its p.g.fl. is again a p.g.fl. Similarly for random measures the defining property can be restated as follows. A random measure is infinitely divisible if and only if, for every integer k > 0, the uniquely defined kth root of its Laplace functional is again a Laplace functional. From these remarks we may immediately verify that, for example, a Poisson process is infinitely divisible (replace the original parameter measure µ by the measure µ/k for each component), as, more generally, are the Poisson cluster
processes studied in Section 6.3 (replace the parameter measure $\mu_c$ for the cluster centre process by $\mu_c/k$ and leave the cluster structure unaltered). In the point process case, any fidi distribution has a joint p.g.f. expressible as $G[h_A]$, where $h_A(\cdot)$ is of the form
$$h_A(x) = 1 - \sum_{i=1}^n (1 - z_i)\, I_{A_i}(x) \tag{10.2.3}$$
for appropriate subsets $A_i$ of the set $A$. Then from (10.2.2) it follows that the fidi distributions of an infinitely divisible point process are themselves infinitely divisible. Conversely, when a point process has its fidi distributions infinitely divisible, (10.2.2) holds for functions $h$ of the form (10.2.3). Because such functions are dense in $\mathcal V(\mathcal X)$, it follows by continuity as in Theorem 9.4.V that p.g.f.s like $G[h_A]$ and $G^{1/k}[h_A]$ define p.g.fl.s. Similar arguments apply to the case of a random measure, thereby proving the following lemma.

Lemma 10.2.II. A point process or random measure is infinitely divisible if and only if its fidi distributions are infinitely divisible.

We now embark on a systematic exploitation of this remark and the results set out in earlier sections concerning the representation of infinitely divisible discrete distributions (see, in particular, Exercises 2.2.2–3). We first consider the case of a finite point process.

Proposition 10.2.III. Suppose that the point process $N$ with p.g.fl. $G[\cdot]$ is a.s. finite and infinitely divisible. Then there exist a uniquely defined, a.s. finite point process $\tilde N$, such that $\Pr\{\tilde N = \emptyset\} = 0$, and a finite positive number $\alpha$ such that
$$G[h] = \exp\big(\alpha(\tilde G[h] - 1)\big) \qquad \big(h \in \mathcal V(\mathcal X)\big), \tag{10.2.4}$$
where $\tilde G$ is the p.g.fl. of $\tilde N$ and $\emptyset$ denotes the null measure. Conversely, any functional of the form (10.2.4) represents the p.g.fl. of an a.s. finite point process that is infinitely divisible.

Proof. It is clear that any functional of the form (10.2.4) is a p.g.fl. and that the point process to which it corresponds is infinitely divisible (replace $\alpha$ by $\alpha/k$ and take $k$th powers). It is also a.s. finite if $\tilde N$ is a.s. finite, because if $\tilde G[\rho I_{\mathcal X}] \to 1$ as $\rho$ increases to 1, then also $G[\rho I_{\mathcal X}] \to 1$, implying $N$ is a.s. finite (see Exercise 9.4.5). Suppose conversely that $N$ is infinitely divisible and a.s. finite, and consider its p.g.fl. When $h$ has the special form $\sum_{i=1}^n z_i I_{A_i}(\cdot)$, where $A_1, \ldots, A_n$ is a measurable partition of $\mathcal X$, we know from Exercise 2.2.3 that $G[h]$, which then reduces to the multivariate p.g.f. $P(z_1, \ldots, z_n)$ of the random variables $N(A_1), \ldots, N(A_n)$, can be represented in the form
$$P(z_1, \ldots, z_n) = \exp\big(\alpha[Q(z_1, \ldots, z_n) - 1]\big),$$
where $Q$ is itself a p.g.f. with $Q(0, \ldots, 0) = 0$ and $\alpha$ is positive, independent of the choice of the partition, and equal to $-\log(\Pr\{N(\mathcal X) = 0\})$. Now consider the functional
$$\tilde G[h] = 1 + \alpha^{-1} \log G[h].$$
When $h$ has the above special form, $\tilde G[h]$ reduces to the multivariate p.g.f. $Q$. Also, $\tilde G$ inherits continuity from $G$. Hence, it is a p.g.fl. by Theorem 9.4.V. To show that the resulting process is a.s. finite consider the behaviour of $\tilde G[\rho I_{\mathcal X}]$ as $\rho$ increases to 1. Because $N$ itself is a.s. finite, $G[\rho I_{\mathcal X}] \to 1$ by Exercise 9.4.5. But then $\log G[\rho I_{\mathcal X}] \to 0$ and so $\tilde G[\rho I_{\mathcal X}] \to 1$, showing that $\tilde G$ is the p.g.fl. of a point process $\tilde N$ that is a.s. finite.

The representation (10.2.4) has a dual interpretation. It shows that any a.s. finite and infinitely divisible process $N$ can be regarded as the 'Poisson randomization' [borrowing a phrase from Milne (1971)] of a certain other point process $\tilde N$. In this interpretation, the process $N$ is constructed by first choosing a random integer $K$ according to the Poisson distribution with probabilities
$$p_n = e^{-\alpha} \alpha^n / n!\,,$$
and then, given $K$, taking the superposition of $K$ i.i.d. components each having the same distribution as $\tilde N$.

On the other hand, the process $N$ can also be related to the cluster processes of Section 6.3. To see this, first represent the p.g.fl. $\tilde G$ in terms of the Janossy measures for $\tilde N$, so that (10.2.4) becomes
$$\log G[h] = \alpha \bigg( \sum_{k=1}^\infty \frac{1}{k!} \int \cdots \int_{\mathcal X^{(k)}} h(x_1) \cdots h(x_k)\, J_k(dx_1 \times \cdots \times dx_k) - 1 \bigg).$$
This infinite sum can be rewritten as
$$\log G[h] = \sum_{k=1}^\infty \int \cdots \int_{\mathcal X^{(k)}} \big[h(x_1) \cdots h(x_k) - 1\big]\, Q_k(dx_1 \times \cdots \times dx_k), \tag{10.2.5}$$
where $Q_k(\cdot) = (\alpha/k!)\, J_k(\cdot)$. Observe finally that this last form is the log p.g.fl. of a Poisson cluster process, as in Proposition 6.3.V. Both interpretations above coexist for an a.s. finite process: they represent alternative constructions for the same process.

To investigate the behaviour when the a.s. finite condition is relaxed, we first observe that any infinitely divisible process remains infinitely divisible but becomes a.s. finite when we consider its restriction to any bounded Borel set. Its local representation therefore continues to have the form (10.2.4). Rewrite (10.2.4) for the special case that the bounded set is a (large) sphere, $S_n$ say, and introduce explicitly the distribution, $\tilde P_n$ say, of the process $\tilde N$
restricted to $\mathcal B(S_n)$. Writing $G_n[\cdot]$ for the corresponding p.g.fl. of $N$, we have from (10.2.4) that
$$G_n[h] = \exp\bigg( \alpha_n \int_{\tilde N \in \mathcal N^\#(S_n)} \bigg[ \exp\bigg( \int_{S_n} \log h(x)\, \tilde N(dx) \bigg) - 1 \bigg] \tilde P_n(d\tilde N) \bigg), \tag{10.2.6}$$
where we recall the convention that the inner exponential term is to be counted as unity if $\tilde N$ has no points in the region where $h$ differs from unity, and as zero if $\tilde N$ has any points in the region where $h$ vanishes. Bearing this in mind, we have in particular, from (10.2.6),
$$\tilde Q_n\{\tilde N\colon \tilde N(S_n) > 0\} \equiv \alpha_n \tilde P_n\{\tilde N\colon \tilde N(S_n) > 0\} = -\log \mathrm P\{N(S_n) = 0\}, \tag{10.2.7}$$
where $\tilde Q_n$ is an abbreviation for $\alpha_n \tilde P_n$, and we continue the convention that $\tilde P_n\{\tilde N\colon \tilde N(S_n) = 0\} = 0$, so that $e^{-\alpha_n}$ is just the probability that the original process has no points in $S_n$.

Each measure $\tilde Q_n$ may be used to induce a similar measure, $\tilde Q^*_n$ say, on the class of cylinder sets in the full space $\mathcal N^\#_{\mathcal X}$ determined by conditions on the behaviour of the counting process on $S_n$. Specifically, for $C$ a set in $\mathcal N^\#_{S_n}$ of the form
$$C = \{\tilde N \in \mathcal N^\#_{S_n}\colon \tilde N(A_i) = r_i,\ A_i \subseteq S_n,\ i = 1, \ldots, k\},$$
where the $r_i$ are nonnegative integers not all zero, we associate the set $C^*$ in $\mathcal N^\#_{\mathcal X}$ given by
$$C^* = \{\tilde N \in \mathcal N^\#_{\mathcal X}\colon \tilde N(A_i) = r_i,\ A_i \subseteq S_n,\ i = 1, \ldots, k\},$$
and put $\tilde Q^*_n(C^*) = \tilde Q_n(C)$.

This construction fails for the set in $\mathcal N^\#_{\mathcal X}$ for which $\tilde N(S_n) = 0$: for this reason we have to define $\tilde Q^*_n$ not on the full sub-$\sigma$-algebra of cylinder sets with base determined by conditions in $S_n$, but on the sub-$\sigma$-algebra generated by those cylinder sets incorporating the condition $\tilde N(S_n) > 0$. Let us denote this sub-$\sigma$-algebra by $\mathcal B_n$. Then it is clear that the $\mathcal B_n$ are monotonic increasing and that
$$\sigma\bigg( \bigcup_{n=1}^\infty \mathcal B_n \bigg) = \mathcal B\big(\mathcal N_0^\#(\mathcal X)\big),$$
where $\mathcal N_0^\#(\mathcal X)$ denotes the space $\mathcal N^\#_{\mathcal X}$ with the null measure $\emptyset(\cdot)$ omitted. On the union $\bigcup_{n=1}^\infty \mathcal B_n$, we can consistently define a set function $\tilde Q^*$, the projective limit of $\{\tilde Q^*_n\}$, by setting
$$\tilde Q^*(A) = \tilde Q^*_n(A)$$
whenever $A \in \mathcal B_n$. This is possible because $\tilde Q^*_m$ reduces to $\tilde Q^*_n$ whenever $m > n$ and we restrict attention to sets in $\mathcal B_n$. The set function $\tilde Q^*$ is countably additive on each of the $\mathcal B_n$ but not obviously so on their union. The situation, however, is similar to that of the Kolmogorov extension theorem
for stochastic processes, or to the extension theorem considered in Section 9.2, where countable additivity is ultimately a consequence of the metric assumptions imposed on the space $\mathcal X$. The same argument applies here also; we leave the details to Exercise 10.2.1. It implies that $\tilde Q^*$ has a unique extension to a measure $\tilde Q$ on the $\sigma$-algebra $\mathcal B\big(\mathcal N_0^\#(\mathcal X)\big)$.

In addition to the fact that it is defined on the sets of $\mathcal N_0^\#(\mathcal X)$ rather than $\mathcal N^\#_{\mathcal X}$, $\tilde Q$ enjoys one further special property. Equation (10.2.7) implies that for any bounded set $A$,
$$\tilde Q\{\tilde N\colon \tilde N(A) > 0\} < \infty. \tag{10.2.8}$$

Definition 10.2.IV. A boundedly finite measure $\tilde Q$ defined on the Borel sets of $\mathcal N_0^\#(\mathcal X) = \mathcal N^\#_{\mathcal X} \setminus \{\tilde N\colon \tilde N(\mathcal X) = 0\}$, and satisfying the additional property (10.2.8), is called a KLM measure.

The measure is so denoted for basic contributions to the present theory in Kerstan and Matthes (1964) and Lee (1964, 1967).

Theorem 10.2.V. A point process on the c.s.m.s. $\mathcal X$ is infinitely divisible if and only if its p.g.fl. can be represented in the form
$$G[h] = \exp\bigg( \int_{\mathcal N_0^\#(\mathcal X)} \bigg[ \exp\bigg( \int_{\mathcal X} \log h(x)\, \tilde N(dx) \bigg) - 1 \bigg] \tilde Q(d\tilde N) \bigg) \tag{10.2.9}$$
for some KLM measure $\tilde Q$. When such a representation exists, it is unique.

Proof. Suppose that the point process $N$ is infinitely divisible, and let $\tilde Q$ be the KLM measure constructed as above. When $h$ is equal to unity outside the sphere $S_n$, the representation (10.2.9) reduces to (10.2.6) from the construction of $\tilde Q$, and so the functional $G$ in (10.2.9) must coincide with the p.g.fl. of the original process.

Conversely, suppose a KLM measure $\tilde Q$ is given, and consider (10.2.9). If we set
$$\alpha_n = \tilde Q\{\tilde N\colon \tilde N(S_n) > 0\}$$
(finite by assumption in Definition 10.2.IV), (10.2.9) can again be reduced to the form (10.2.6) for functions $h$ that equal unity outside $S_n$, and therefore, by Proposition 10.2.III, it is the p.g.fl. of a local process defined on $S_n$. In particular, therefore, (10.2.9) reduces to a joint p.g.f. when $h$ has the form $1 - \sum_{i=1}^k (1 - z_i) I_{A_i}$. The continuity condition follows from the remark already made that (10.2.9) defines a local p.g.fl. when we restrict attention to the behaviour of the process on $S_n$. Thus, (10.2.9) is itself a p.g.fl. Infinite divisibility follows from Lemma 10.2.II and the remarks already made concerning the local behaviour of $G[h]$. Finally, uniqueness follows from the construction and the uniqueness part of Proposition 10.2.III.

Example 10.2(a) Poisson process [see Example 9.4(c)]. If (10.2.9) is to reduce to the p.g.fl. (9.4.17) of a Poisson process, each $\tilde N$ must be simple and have a
single point as its support; that is, we must have $\tilde Q\{\tilde N\colon \tilde N(\mathcal X) \ne 1\} = 0$, because for given $\tilde N$ the integrand for the integral at (10.2.9) with respect to $\tilde Q$ must reduce to $h(x) - 1$, where $\{x\}$ is the singleton support of $\tilde N$. In fact, the KLM measure $\tilde Q$ must be related to the parameter measure $\mu$ by
$$\tilde Q\{\tilde N\colon \tilde N(A) = 1\} = \tilde Q\{\tilde N\colon \tilde N(\mathcal X) = 1 = \tilde N(A)\} = \mu(A) \qquad (\text{bounded } A \in \mathcal B_{\mathcal X}).$$
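Infinite divisibility of the Poisson process in the above example can also be checked at the level of count distributions: the $k$th root of the p.g.f. $\exp(\mu(z - 1))$ is $\exp((\mu/k)(z - 1))$, again a Poisson p.g.f., so the $k$-fold convolution of Poisson$(\mu/k)$ laws must recover Poisson$(\mu)$. A small numerical confirmation (values arbitrary):

```python
import math

def poisson_pmf(mu, nmax):
    return [math.exp(-mu) * mu ** n / math.factorial(n) for n in range(nmax + 1)]

def convolve(p, q):
    r = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

mu, k = 3.0, 4
component = poisson_pmf(mu / k, 40)   # p.g.f. exp((mu/k)(z - 1)): the kth root
total = component
for _ in range(k - 1):
    total = convolve(total, component)
target = poisson_pmf(mu, 40)
print(max(abs(total[i] - target[i]) for i in range(41)))
```

The maximum discrepancy is at the level of floating-point rounding.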
Further insight into the structure of such processes can be obtained from a classification of the properties of their KLM measures. In particular, we make the following definitions.

Definition 10.2.VI. An infinitely divisible point process is regular if its KLM measure is carried by the set
$$\mathcal V_r \equiv \{\tilde N\colon \tilde N(\mathcal X) < \infty\} \tag{10.2.10a}$$
and singular if its KLM measure is carried by the complementary set
$$\mathcal V_s \equiv \{\tilde N\colon \tilde N(\mathcal X) = \infty\}. \tag{10.2.10b}$$
We now have the following decomposition result.

Proposition 10.2.VII. Every infinitely divisible point process can be represented as the superposition of a regular infinitely divisible process and a singular infinitely divisible process, the two components being independent.

Proof. This follows from the representation (10.2.9) on writing $\tilde Q = \tilde Q_r + \tilde Q_s$, where for each $A \in \mathcal B\big(\mathcal N_0^\#(\mathcal X)\big)$,
$$\tilde Q_r(A) = \tilde Q(A \cap \mathcal V_r), \qquad \tilde Q_s(A) = \tilde Q(A \cap \mathcal V_s).$$
Each of $\tilde Q_r(\cdot)$ and $\tilde Q_s(\cdot)$ is again a KLM measure, and because the original p.g.fl. appears as the product of the p.g.fl.s of the two components, the corresponding components themselves must be independent and their superposition must give back the original process.

Further characterizations of various classes of infinitely divisible point processes can be given in terms of their KLM measures, as set out below. Some refinements for the stationary case are given in Section 12.4.

Proposition 10.2.VIII. (i) An infinitely divisible point process is a.s. finite if and only if it is regular and its KLM measure is totally finite.
(ii) An infinitely divisible point process can be represented as a Poisson cluster process, with a.s. finite clusters, if and only if it is regular.
(iii) An infinitely divisible point process can be represented as a Poisson randomization if and only if its KLM measure is totally finite.

Proof. Part (i) is a restatement of Proposition 10.2.III, regularity coming from the assertion that the process $\tilde N$ is a.s. finite, and the total boundedness of the KLM measure from the fact that it can be represented in the form $\alpha \tilde P$, where $0 < \alpha < \infty$ and $\tilde P$ is a probability measure.

Part (ii) follows from the representation of Poisson cluster processes in Proposition 6.3.V, taking $\mathcal X = \mathcal Y$ there, so that the measures $Q_k(\cdot)$ in that proposition can be combined to give a measure on the space of all finite counting measures, as in Proposition 5.3.II. That this is a KLM measure follows from the absence of any $Q_0$ term, and the condition (6.3.33), which in the terminology of the present section can be rewritten [see (6.3.35)] as
$$\mu(A) = \tilde Q\{\tilde N\colon \tilde N(A) > 0\} < \infty \qquad (\text{bounded } A \in \mathcal B_{\mathcal X}).$$
Conversely, we can split the KLM measure of any regular infinitely divisible process into its components on the sets $\mathcal V_k = \{\tilde N\colon \tilde N(\mathcal X) = k\}$, in each of which it induces a measure $Q_k$ with the properties described in Proposition 6.3.V.

Finally, part (iii) follows from the observation that here also the KLM measure can be written in the form $\tilde Q = \alpha \tilde P$, where $\alpha$ is the parameter of the Poisson randomizing distribution and $\tilde P$ is the distribution of the point process being randomized, which we assume adjusted if necessary so that $\tilde P(\{\tilde N\colon \tilde N(\mathcal X) = 0\}) = 0$.

Extensions to infinitely divisible multivariate and marked point processes are considered in Exercises 10.2.2 and 10.2.5. An analogous representation holds also for infinitely divisible random measures, and is set out below.

Proposition 10.2.IX. A random measure on the c.s.m.s. $\mathcal X$ is infinitely divisible if and only if its Laplace functional can be represented in the form, for all $f \in \mathrm{BM}_+(\mathcal X)$,
$$-\log L[f] = \int_{\mathcal X} f(x)\, \alpha(dx) + \int_{\mathcal M_0^\#(\mathcal X)} \bigg[ 1 - \exp\bigg( -\int_{\mathcal X} f(x)\, \eta(dx) \bigg) \bigg] \Lambda(d\eta), \tag{10.2.11}$$
where $\alpha \in \mathcal M^\#_{\mathcal X}$, $\mathcal M_0^\#(\mathcal X) = \mathcal M^\#_{\mathcal X} \setminus \{\emptyset\}$, and $\Lambda$ is a $\sigma$-finite measure on $\mathcal M_0^\#(\mathcal X)$ satisfying, for every bounded Borel set $B \in \mathcal B_{\mathcal X}$ and distribution $F_1(B; x) \equiv \Lambda\{\eta\colon \eta(B) \le x\}$,
$$\int_{\mathbf R_+} (1 - e^{-x})\, F_1(B; dx) < \infty. \tag{10.2.12}$$
A proof involving the inductive limit of the corresponding representations for the Laplace transforms of the fidi distributions can be given along the same
lines as that of Theorem 10.2.V, with (10.2.11) reducing to (10.2.9) when ξ is a point process. See Exercise 10.2.5. Results about the convergence of infinitely divisible distributions and their role in limit theorems are reviewed in Chapter 11, notably Section 11.2. Results for the stationary case are outlined in Section 12.4.
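The Poisson randomization of Proposition 10.2.VIII(iii) is straightforward to simulate: draw $K$ from a Poisson$(\alpha)$ distribution and superpose $K$ i.i.d. copies of $\tilde N$. The sketch below makes the illustrative (not textual) choice that $\tilde N$ is a single uniform point on $(0, 1)$, in which case the randomization is a Poisson process at rate $\alpha$ and the total count should have mean and variance both equal to $\alpha$.

```python
import random

def poisson_count(alpha, rng):
    # K ~ Poisson(alpha) via exponential spacings
    k, t = 0, 0.0
    while True:
        t += rng.expovariate(1.0)
        if t > alpha:
            return k
        k += 1

def poisson_randomization(alpha, sample_component, rng):
    """Superpose K ~ Poisson(alpha) i.i.d. copies of the component process."""
    pts = []
    for _ in range(poisson_count(alpha, rng)):
        pts.extend(sample_component(rng))
    return pts

rng = random.Random(7)
alpha = 4.0
single_uniform = lambda r: [r.random()]   # illustrative component: one point
counts = [len(poisson_randomization(alpha, single_uniform, rng))
          for _ in range(20_000)]
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
print(mean, var)   # both should be near alpha = 4
```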
Exercises and Complements to Section 10.2

10.2.1 Kolmogorov extension theorem analogue for $\tilde Q^*$. Show that the measure $\tilde Q^*$ defined below (10.2.7) admits consistent fidi distributions in the sense that the quantities $\tilde Q^*\{\tilde N\colon \tilde N(A_i) = k_i,\ i = 1, \ldots, n\}$ satisfy the two consistency Conditions 9.2.V, namely, marginal consistency and symmetry under permutations. Show also that $\tilde Q^*$ is finitely additive and continuous in the sense that for disjoint $A_i$,
$$\tilde Q^*\bigg\{\tilde N\colon \tilde N\bigg(\bigcup_{i=1}^n A_i\bigg) \ne \sum_{i=1}^n \tilde N(A_i)\bigg\} = 0,$$
and
$$\tilde Q^*\{\tilde N\colon \tilde N(A_n) > 0\} \to 0 \qquad \text{when } A_n \downarrow \emptyset.$$
[For the latter, write $V_n = \{\tilde N\colon \tilde N(A_n) > 0\}$. Because $\tilde N(A_n) \downarrow 0$ for all $\tilde N \in \mathcal N_{\mathcal X}$, $\{V_n\}$ is a monotonic decreasing sequence of sets, say $V_n \downarrow V$. Supposing $\tilde N_0 \in V$, then $\tilde N_0(A_n) > 0$ for all $n$, giving a contradiction. $\tilde Q^*(V_n) \to 0$ follows from the countable additivity of $\tilde Q^*(\cdot)$ on $S_k$, because we may assume the existence of some $k$ for which $A_n \subseteq S_k$ ($n = 1, 2, \ldots$).] The same arguments as used in the proof of Lemma 9.2.IX now show that there exists a countably additive set function $\tilde Q$ defined on $\mathcal B(\mathcal N^\#_{\mathcal X})$ such that $\tilde Q(C) = \tilde Q^*(C)$ $\tilde Q^*$-a.s.; that is, $\tilde Q^*$ admits a countably additive extension $\tilde Q$ as required.

10.2.2 For an infinitely divisible multivariate point process [see Definition 6.4.I(a)], show that the KLM measure $\tilde Q$ is defined on $\mathcal X \times \{1, \ldots, m\}$ and satisfies, for bounded $A \in \mathcal B_{\mathcal X}$,
$$\tilde Q\{\tilde N = (\tilde N_1, \ldots, \tilde N_m)\colon \tilde N_1(A) + \cdots + \tilde N_m(A) > 0\} < \infty.$$

10.2.3 Show that the KLM measure of the Gauss–Poisson process of Example 6.3(d) is the sum of two components, namely, measures $Q_j$ concentrated on realizations with $\tilde N(\mathcal X) = j$ for $j = 1, 2$.

10.2.4 Let $N$ be an infinitely divisible marked point process on the space $\mathcal X$ with marks in $\mathcal K$, so $N_g(A) < \infty$ for each bounded $A \in \mathcal B_{\mathcal X}$.
(a) Write out the representation of $N$ as an infinitely divisible point process on $\mathcal Y \equiv \mathcal X \times \mathcal K$.
(b) Observe that the ground process $N_g$ is infinitely divisible, and write down its representation as an infinitely divisible point process on $\mathcal X$.
(c) Investigate the relation between the KLM measures $\tilde Q$ on $\mathcal B(\mathcal Y_0)$ for $N$ and $\tilde Q_g$ on $\mathcal B(\mathcal X_0)$ for $N_g$. In particular, investigate whether the KLM measure $\tilde Q$ has $\tilde Q\{\tilde N\colon \tilde N(A \times \mathcal K) > 0\} < \infty$ for such $A$.
(d) Investigate the cluster process representation of a regular infinitely divisible marked point process.
10.2.5 Proof of Proposition 10.2.IX. To establish (10.2.11), show the following. (a) ξ is infinitely divisible if and only if its fidi distributions are infinitely divisible. (b) Each fidi distribution has a standard representation similar to that of (10.2.11) subject to the constraint (10.2.12). (c) An inductive limit argument as in the proof of Proposition 10.2.III holds. [Hint: See Kallenberg (1975, Theorem 6.1).]
10.3. Point Processes Defined by Markov Chains

In many applications, point processes arise as an important—and observable—component of the process of primary interest, but not necessarily as the primary process itself. Very commonly, the principal object of study is a Markov process. Our aim in this section is to examine some of the ways Markov processes can give rise to point processes in time, and some of the issues arising in the discussion of such models.

We show first that a Markov or semi-Markov process on a finite or countable state space can be regarded equivalently as an MPP {(t_n, κ_n)}, where each t_n is the time of a state transition, and the associated mark κ_n denotes the state entered when the transition occurs. In the other direction, every point process with a conditional intensity can be represented as a Markov process, via its history, which can be thought of as a Markov process of jump–diffusion type on a general state space, the points of the state space representing past histories as viewed from the present. The process moves continuously between states during the intervals between events, and jumps to a new state whenever an event occurs. It is Markovian because both the timing of the jump (determined by the conditional intensity) and the nature of the jump (determined by the conditional mark distribution) are functions of the current state.

In such great generality, this observation may not have great value, although we shall return to it in Section 12.5 in developing a framework for the discussion of convergence to equilibrium for point processes. However, many important models arise as special cases, when the relevant history can be condensed into a compact and manageable form. A renewal process provides the simplest example, where the relevant history is just the backward recurrence time, which increases at unit rate between events, resets to zero whenever a jump occurs, and constitutes a simple jump–diffusion-type Markov process.
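The backward-recurrence-time mechanism just described is easy to visualize with a short simulation. The following is a minimal sketch, not from the text: the renewal process is taken to be Poisson purely for illustration, and the function name is ours.

```python
import random

def backward_recurrence_time(event_times, t):
    """U(t) = t - t_{N(t)}: time elapsed since the last event at or
    before t (taking t_0 = 0 when no event has yet occurred)."""
    last = 0.0
    for s in event_times:
        if s <= t:
            last = s
        else:
            break
    return t - last

# Between events U(t) increases at unit rate; at each event it resets to 0,
# so its path is the sawtooth of a simple jump-type Markov process.
random.seed(1)
T, rate, events, t = 10.0, 2.0, [], 0.0
while True:
    t += random.expovariate(rate)     # i.i.d. exponential interval lengths
    if t > T:
        break
    events.append(t)
sawtooth = [backward_recurrence_time(events, 0.01 * k) for k in range(1000)]
```

Tracking this single scalar is all the "history" a renewal process requires, which is what makes the Markovian reformulation useful here.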
For a Wold process, the backward recurrence time and the length of the last complete interval together constitute a Markov process driving the point process. Some other simple examples are described following the discussion of Markov and semi-Markov processes. We then note an important distinction between models such as the renewal process, in which the underlying Markov process can be directly constructed from observations on the point process, and those for which the point process
10. Special Classes of Processes
forms only part of a more complex Markovian system, which therefore remains at best partially observable if the only available information comes from the point process observations. This latter class includes hidden Markov models (HMMs) for point processes, such as the so-called Markov-modulated Poisson process (MMPP), and the Markovian arrival process (MAP) and batch Markovian arrival processes (BMAPs) developed by Neuts and co-workers, and now widely used in modelling internet protocol (IP) traffic and elsewhere. Difficult issues of parameter estimation arise for such processes: we outline the use of the expectation–minimization (E–M) algorithm for this purpose. Example 10.3(a) Point process structure of Markov renewal and semi-Markov processes [e.g. Asmussen (2003, Section VII.4); C ¸ inlar (1975, Chapter 10); Kulkarni (1995, Chapter 9)]. The probability structure of a Markov renewal or semi-Markov process on finite or countable state space X = {i, j, . . .} is defined 0 by the following ingredients: an initial distribution {pi : i ∈ X}; a matrix of transition probabilities (pij ) with j∈X pij = 1 (all i ∈ X); and two families of distribution functions Fij0 (u) and Fij (u) [(i, j) ∈ X × X and u ∈ (0, ∞)], defining the initial and subsequent lengths of time the process remains in state i, given that the next transition takes it to j. These ingredients can be used to construct a bivariate sequence {(κn , τn )} satisfying, for u ∈ R+ and sequences xr ∈ R+ and kr ∈ X for r = 0, 1, . . . , Pr{κ0 = k0 } = p0k0 ,
(10.3.1a)
Pr{τ1 ≤ u, κ1 = k1 | κ0 = k0 } = Fk00 k1 (u) pk0 k1 , and for n = 2, 3, . . . ,
Pr{τn ≤ u, κn = kn | κ0 = k0 , (κr , τr ) = (kr , xr ) (r = 1, . . . , n − 1)} (10.3.1b) = Pr{τn ≤ u, κn = kn | κn−1 = kn−1 } = Fkn−1 kn (u) pkn−1 kn . Letting u → ∞ here we see that {κn : n = 0, 1, . . .} is a discrete-time X-valued Markov chain with one-step transition matrix (pij ), and that for each n ≥ 1, τn conditional on (κn , κn+1 ) is independent of {(κr , τr ): r = 0, . . . , n − 1}. We assume that the process has neither instantaneous states nor explosions (see Exercise 10.3.1). Sufficient conditions, ensuring in particular that tn ≡ t0 + τ1 + · · · + τn → ∞ a.s., are that the matrix (pij )should be irreducible and have nontrivial invariant measure {πi } satisfying i πi µi < ∞, where µi =
j∈X
pij
∞
u Fij (du) = 0
j∈X
∞
u Gij (du) =
0
u Gi (du) 0
is the mean sojourn time in state i and Gi (·) =
j∈X
Gij (·) ≡
j∈X
∞
pij Fij (·).
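The construction (10.3.1) translates directly into a simulation recipe: from the current state i, draw the next state j from the row (p_ij), then draw the sojourn time from F_ij. A minimal sketch follows; the two-state chain and the exponential sojourn laws are illustrative assumptions of ours, not taken from the text.

```python
import random

def simulate_markov_renewal(p, sample_F, k0, n_steps, rng):
    """Generate the marked point sequence ((t_1, k_1), ..., (t_n, k_n)) of
    a Markov renewal process from the transition matrix p = (p_ij) and a
    sampler sample_F(i, j, rng) for the sojourn law F_ij."""
    path, t, k = [], 0.0, k0
    for _ in range(n_steps):
        u, j, acc = rng.random(), 0, p[k][0]   # next state from row p[k]
        while u > acc:
            j += 1
            acc += p[k][j]
        t += sample_F(k, j, rng)               # sojourn in k given jump to j
        k = j
        path.append((t, k))
    return path

rng = random.Random(42)
p = [[0.0, 1.0],                               # from state 0, always jump to 1
     [0.6, 0.4]]
sample_F = lambda i, j, rng: rng.expovariate(1.0 + i + j)  # illustrative F_ij
path = simulate_markov_renewal(p, sample_F, 0, 5, rng)
```

With exponential sojourn laws independent of j this reduces to a continuous-time Markov chain, in line with item (3) below (10.3.2).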
When card(X) ≥ 2 and p_ii = 0 (i ∈ X), there is a one-to-one measurable mapping between sequences {(t_n, κ_n)} and X-valued, right-continuous, piecewise-constant functions X(t) with at most finitely many change-points on any bounded interval, where t′ is a change-point of X if and only if X(t′−) ≠ X(t′) = X(t′+). Clearly, any such X(t) ∈ X for all t for some countable set of states X = {i, j, . . .}, and all change-points on [0, ∞) can be ordered as 0 = t_0 < t_1 < · · · < t_n → ∞ (n → ∞). Thus any such X(·) determines a marked point sequence {(t_n, κ_n): n = 0, 1, . . .} with κ_n ∈ X for all n. Conversely, such a marked point sequence determines X(·) via the relation

X(t) = ∑_{n=0}^∞ κ_n I_{[t_n, t_{n+1})}(t),    (10.3.2)
that is, X(t) = X(t_n + 0) = κ_n (t_n ≤ t < t_{n+1}). Write τ_n = t_n − t_{n−1} for all n for which t_n < ∞ (see Figure 9.2). We call the sequence of marked points {(t_n, κ_n)} a Markov renewal process, and {X(t)} a semi-Markov process. The discussion around (10.3.2) implies that the two are equivalent under the stated conditions. Observe the following.
(1) When X is a one-point set we have a renewal process, delayed if F^0 ≠ F.
(2) When all τ_n = 1, {X(n): n = 0, 1, . . .} is a discrete-time Markov chain on X with one-step transition probability matrix (p_ij).
(3) When F^0_ij(u) = F_ij(u) = 1 − e^{−q_i u} (0 < q_i < ∞) for all j, and q_ij = q_i p_ij, X(·) is a conservative continuous-time Markov chain with Q-matrix (q_ij).

A Markov renewal process {(t_n, κ_n)} as defined can just as well be interpreted as an MPP with mark space K = X and ground process defined by N_g(A) = #{n: t_n ∈ A} for bounded A ∈ B_{R_+}. By the equivalence just noted, a semi-Markov process can also be treated as an MPP.

Consider first its conditional intensity. For any finite t > 0, since N_g is finite on bounded subsets, either N_g[0, t) = 0 or there exists a largest t_n ∈ [0, t), defining t_prev say, with associated mark¹ κ_prev. Suppose that t_prev is defined, and that the transition distribution functions F_ij(u) have densities f_ij(u), so that there also exist transition-time density functions

g_ij(t) dt = Pr{τ_n ∈ (t, t + dt), κ_n = j | κ_{n−1} = i} = p_ij f_ij(t) dt.    (10.3.3)

Then the conditional intensity function λ_g(t) for the ground process depends only on the current state X(t−) (= κ_prev) and when it was entered, and we have

λ_g(t) = g_{κ_prev}(t − t_prev)/[1 − G_{κ_prev}(t − t_prev)].

The conditional intensity function λ∗ itself is expressible as

λ∗(t, κ | H_t) = g_{κ_prev, κ}(t − t_prev) / [1 − G_{κ_prev}(t − t_prev)] = λ_g(t) f(κ | t, H_t),    (10.3.4a)

¹ In terms of the pair (t_prev, κ_prev), a Poisson process depends on neither element, a renewal process depends on t_prev only, a Markov process depends on κ_prev only, and a semi-Markov process depends on the complete pair. See Exercise 10.3.2.
where the conditional mark distribution is given by

f(κ | t, H_t) = g_{κ_prev, κ}(t − t_prev) / ∑_{j∈X} g_{κ_prev, j}(t − t_prev).

For k ∈ X, each N_k(A) = #{n: t_n ∈ A and κ_n = k} is a point process on R for which N_k(t) ≡ N_k(0, t] counts the number of entries into the state k during (0, t]. In the Markov case, the Markov property implies that the conditional intensity λ∗(t, k) dt = E[dN_k(t) | H_t] depends on the past only through the state last entered, so that the ground intensity is given by λ∗_g(t) = q_{X(t−)} = q_{κ_prev}, whereas the conditional mark distribution is

f(k | t, H_t) = p_{κ_prev, k} = p_{X(t−), k}    (10.3.4b)

[cf. equation (7.3.3)]. The trajectories of the Markov process are easy to reconstruct from this MPP, because it carries details of which states were entered and for how long. The likelihood for a realization {(t_n, k_n): n = 1, . . . , N(T)}, starting in state k_0 at t = 0 and extending over an observation period (0, T), can be written in the form

L_T = e^{−q_{k_0} t_1} q_{k_0 k_1} e^{−q_{k_1}(t_2 − t_1)} q_{k_1 k_2} · · · e^{−q_{k_{N(T)}}(T − t_{N(T)})}.    (10.3.5)
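The factorization in (10.3.5) makes the log-likelihood a simple sum over the observed transitions, plus one censored term for the final incomplete sojourn. A minimal sketch (the two-state Q-matrix and the sample path are illustrative assumptions of ours):

```python
import math

def ctmc_log_likelihood(Q, k0, times, marks, T):
    """log L_T of (10.3.5): each completed sojourn of length u in state i
    followed by a jump to j contributes -q_i * u + log(q_ij); the final,
    censored sojourn contributes -q_i * (T - t_{N(T)})."""
    logL, t_prev, k_prev = 0.0, 0.0, k0
    for t, k in zip(times, marks):
        q_i = -Q[k_prev][k_prev]               # total exit rate from k_prev
        logL += -q_i * (t - t_prev) + math.log(Q[k_prev][k])
        t_prev, k_prev = t, k
    logL += Q[k_prev][k_prev] * (T - t_prev)   # q_ii = -q_i: censored term
    return logL

Q = [[-1.0, 1.0],
     [2.0, -2.0]]                              # illustrative two-state chain
ll = ctmc_log_likelihood(Q, 0, [0.5], [1], 1.0)
```

For the single jump 0 → 1 at t = 0.5 with T = 1, the terms are −q_0 · 0.5 + log q_01 − q_1 · 0.5 = −0.5 + 0 − 1.0 = −1.5, matching the product form of (10.3.5).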
In the semi-Markov case, the MPP again contains complete information about the evolution of the process, and allows the likelihood to be written down explicitly in terms of the g_ij(·); details are set out in Exercise 10.3.5.

As in the simple renewal process [see (4.1.5–10)], an important role in describing the properties of the semi-Markov process is played by the Markov renewal operator H(·) with elements H_ij(t) = E[N_j[0, t] | (t_0, κ_0) = (0, i)]. With G(t) = (G_ij(t)), H is given by the sum of the series of convolution powers

H = I + G + G ∗ G + G ∗ G ∗ G + · · · ,

or equivalently from the Markov renewal equation

H = I + G ∗ H = I + H ∗ G,

where convolution of matrices of nondecreasing functions is defined elementwise by

(A ∗ B)(t)_ij = ∑_{k∈X} ∫_0^t dA_ik(u) B_kj(t − u)

[if instead of nondecreasing functions we have matrices of nonnegative densities then define (a ∗ b)(t)_ij = ∑_k ∫_0^t a_ik(u) b_kj(t − u) du]. In particular, assuming densities as above and that F^0_ij = F_ij, the factorial moment densities for the ground process are given by the matrix products

m_g^{[r]}(u_1, . . . , u_r) = p^0 H′(u_1) H′(u_2 − u_1) · · · H′(u_r − u_{r−1}) 1    (10.3.6)

for r = 1, 2, . . . , where p^0 = (p^0_i) is the vector of initial probabilities, H′(t) = (h_ij(t)) = (H′_ij(t)), and 1 is a vector of ones. Ball and Milne (2005) generalize this result to a wide range of point processes defined by transitions between states or groups of states; see also Exercises 10.3.3–4. Let {π_i} be left-invariant for (p_ij). Then U_g(x) = ∑_{i∈X} π_i ∑_{j∈X} H_ij(x) is the analogue of the renewal function at (4.1.5). See Exercise 13.4.4.

The next three examples illustrate further cases where a point process can be expressed in terms of a relatively simple Markov process.

Example 10.3(b) Hawkes process with exponential decay [see also Exercise 7.2.5 and Example 7.3(c)]. This is the simplest of several such examples considered in Chapter 7. The governing Markov process is of shot-noise type, Y(t) = ∫_0^t e^{−α(t−u)} N(du), and is a linear function of past observations; the conditional intensity function takes the form

λ∗(t) = λ + ν ∫_0^t α e^{−α(t−u)} N(du) = λ + ναY(t)
when we add in a background rate λ. Exercise 7.2.5 details the forward Kolmogorov equation, the stationary distribution, and the likelihood function, assuming the process starts from the stationary distribution. Examples 7.3(b)–(c) consider various extensions, in particular to situations where the conditional intensity can be a more general function of Y(t).

Example 10.3(c) Birth-and-death process. Consider a simple birth-and-death process, with birth rate λ per individual and death rate µ per individual. Let N^+(t) and N^−(t) denote the numbers of births and deaths recorded up to time t. Then the population size at time t is just the difference N(t) = N(0) + N^+(t) − N^−(t). We can treat this as an MPP by supposing that the instants of births and deaths are recorded separately, together forming the ground process N_g(t) = N^+(t) + N^−(t), with the marks ‘+’ denoting a birth and ‘−’ a death. The process N(t) forms an ergodic Markov process if λ < µ and there are also new arrivals, or ‘births from an external source’ (i.e., immigrants), that occur according to a Poisson process of constant rate ν > 0. This is a classic example of a continuous-time Markov process, and it is well known that, for example, a stationary distribution exists when µ > λ and is negative binomial [see, e.g., Bartlett (1955, p. 78)] with generating function

G(z) = ∑_{j=0}^∞ z^j Pr{N(t) = j} = ((µ − λz)/(µ − λ))^{−ν/λ}.
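The negative binomial form, with pgf ((µ − λz)/(µ − λ))^{−ν/λ}, can be checked numerically against the detailed-balance relation π_{j+1}(j + 1)µ = π_j(ν + jλ) of the immigration–birth–death chain. A sketch with illustrative parameter values of our choosing:

```python
# Stationary probabilities from detailed balance, truncated at a level
# where the tail (successive ratio -> lambda/mu < 1) is negligible.
lam, mu, nu = 0.5, 1.0, 0.7
probs = [1.0]
for j in range(200):
    probs.append(probs[-1] * (nu + j * lam) / ((j + 1) * mu))
total = sum(probs)
probs = [p / total for p in probs]

# Compare the resulting pgf with the closed negative binomial form
# at an arbitrary test point z inside the unit interval.
z = 0.3
pgf_numeric = sum(p * z**j for j, p in enumerate(probs))
pgf_formula = ((mu - lam * z) / (mu - lam)) ** (-nu / lam)
```

The two evaluations agree to within the truncation error, which decays geometrically in the truncation level.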
In point process terms, the conditional mark distribution takes the form

f∗(+ | t) = [ν + λN(t−)] / λ∗_g(t),
f∗(− | t) = [µN(t−)] / λ∗_g(t),

where

λ∗_g(t) = ν + (λ + µ)N(t−)
[recall (7.3.3)]. The likelihood can be written down from the standard point process formulae (Proposition 7.3.III) provided either N(0) is known (e.g., the population starts from size zero), or its initial distribution is known [e.g., in the stationary regime, it should be the stationary distribution for N(t)]. Exercise 10.3.5 gives some details.

Example 10.3(d) Pure death process for software reliability [Jelinski and Moranda (1972); Singpurwalla and Wilson (1999)]. Consider a new or partially tested piece of software containing a number of errors (‘bugs’). Every time the software fails, one of the bugs is discovered and repaired. Suppose that every undiscovered bug contributes a constant component µ to the total risk, and denote the number of bugs still undiscovered at time t by X(t). Then the conditional intensity at time t is given by µX(t). The process X(t) is clearly Markovian; in fact it constitutes a pure death process with constant death rate µ. In principle, it is directly observable only if N = X(0) is observable, a somewhat unlikely circumstance in the given context. More commonly, therefore, N is treated as an unknown parameter [see Exercise 10.3.5 and, e.g., Singpurwalla and Wilson (1999)]. But unless µ is known, very little information about N is retrievable from the data on observed failure times, and standard likelihood estimates of N are either unstable or unobtainable. Hence a Bayesian approach is often preferred, with a prior distribution for the initial state which may be either given subjectively or constructed from experience of previous studies. The posterior distribution of N can then be used to obtain an estimate of the number of remaining bugs.

The last example shows very clearly, as is also true in earlier examples, that the underlying Markov process cannot be considered as fully observable unless the initial state is known.
If it is not known, the initial state plays a role similar to that of an unknown parameter, which can be either estimated along with the other parameters from the likelihood function (this rarely leads to a satisfactory estimate because the information in the data with any bearing on the initial value is usually limited), or specified in terms of a prior distribution. Such examples form simple cases of the more general situation where the observed point process carries only partial information about the underlying
Markov process. More commonly, not only the initial state but also the subsequent evolution of the Markov process remains unobserved. The birth-and-death process illustrates also how simple it is for a point process driven by an observable Markov process to transform into an example of this kind. In this example, the situation would be radically altered if the ‘+/−’ marks were no longer observed: for a history with n observed points, there would then be 2^n different ways in which the marks might be assigned, and obtaining the overall likelihood would entail averaging over 2^n likelihoods conditional on a given ordering.

At this point we enter the territory of hidden Markov models. The observed history, usually the internal history for the point process, is insufficient to allow the reconstruction of the Markov process driving the point process. Consequently the simple likelihoods characterizing the previous examples, which were dependent on knowing the driving process, are no longer available. Put in other terms, the conditional intensity based on the internal history is complex and difficult to handle directly, in contrast to the conditional intensity given the history of the driving process, which commonly has a simple form.

Starting from the 1960s, these ideas led to a filtering theory for point processes, motivated by the analogy with the Kalman filter, and making use of the martingale constructions described in Chapter 14. More recently, a new focus on the problems of parameter and state estimation for such processes has developed through the use of the E–M (expectation–maximization) algorithm, an approach we outline shortly. First, however, we explore directly the simplest example of an HMM with point process observations.
It played a key role in the early discussion of point process filtering [see, e.g., Yashin (1970); Galchuk and Rozovskii (1971); Snyder (1972); Jowett and Vere-Jones (1972); Rudemo (1972, 1973); Vere-Jones (1975); Brémaud (1981, Chapter 4)], and is also the starting point for the more elaborate HMMs which have come to be used extensively in communication theory and elsewhere.

Example 10.3(e) Cox process directed by a simple Markov chain; ‘telegraph signal’ process; Markov-modulated Poisson process. We sketched this model briefly in Exercise 7.2.8. The more extended discussion here generally follows Rudemo (1972). Suppose given a Markov process {X(t): t ≥ 0} on the finite state space {1, . . . , K} with Q-matrix Q = (q_ij), so that ∑_{j=1}^K q_ij = 0 (i = 1, . . . , K), q_i ≡ −q_ii, assumed positive for all i, governs the exponential holding times in state i, and p_ij = q_ij/q_i represents the probability that when a jump occurs from state i it is into a state j ≠ i. Then the matrix of transition probabilities P(t) ≡ (p_ij(t)) satisfies P(0) = I and the forward and backward equations

dP/dt = QP(t) = P(t)Q,

with solution P(t) = exp(tQ). Suppose further that while this Markov process is in state j, points of a Poisson process are generated at rate λ_j, and that
the observational data consist only of these points. The simplest nontrivial case of this set-up occurs with a process X(·) on two states with λ_2 ≈ 0 and λ_1 somewhat larger: we then have a model of a ‘telegraph signal’ process.

Several estimation problems arise in connection with this process. Let us consider in particular the problem of ‘tracking’ the unobserved Markov process X(·), given the observations on the point process. This requires maintaining and updating the family of probabilities

π_i(t) ≡ P({X(t) = i} | H_t).    (10.3.7)

Suppose the points observed on (0, T) are t_1 < · · · < t_{N(T)}, with t_{N(T)} < T. To obtain the π_i(t) above, we consider the evolution of the ‘joint statistics’ defined, on any subinterval of the form (0, t) for 0 < t < T and with t_{N(t)} < t ≤ t_{N(t)+1}, by

p_i(t; t_1, . . . , t_{N(t)}) dt_1 . . . dt_{N(t)}
    = Pr{X(t) = i and points occur in (t_j, t_j + dt_j), j = 1, . . . , N(t)}.    (10.3.8)

Call ‘either X(·) changes state or there is a point at t’ an event at t. Then, conditional on X(t+) = i, the time τ elapsing until the next event is exponentially distributed with Pr{τ > u} = e^{−(q_i + λ_i)u}, and, independently of τ, the event is either a point, with probability λ_i/(q_i + λ_i), or a transition of X(·) from i to j, with probability q_ij/(q_i + λ_i). Between observed points, therefore, the joint statistics evolve in a similar manner to the basic transition probabilities but with the matrix Q − Λ in place of Q, where Λ ≡ diag(λ_1, . . . , λ_K). When an observed point occurs, the joint statistics are weighted by factors λ_i, corresponding to multiplying the vector of joint statistics by the matrix Λ. It then follows that the vector p(·) ≡ (p_1(·), . . . , p_K(·)) of joint statistics can be expressed as the matrix product

p(t) ≡ p(t; t_1, . . . , t_{N(t)}) = p(0) R((0, t]; t_1, . . . , t_{N(t)}),    (10.3.9a)

where R((0, t]) = J(t) if N(t) = 0 and otherwise

R((0, t]) = J(t_1) Λ J(t_2 − t_1) Λ · · · J(t_{N(t)} − t_{N(t)−1}) Λ J(t − t_{N(t)}),    (10.3.9b)

p(0) = (p_1(0), . . . , p_K(0)) is the vector of initial probabilities for the process X(·), and J(x) = exp((Q − Λ)x). The probability π_i(t) that at any time t ∈ (0, T] the process is in state i, given the observations t_1, . . . , t_{N(t)} up to time t and the initial distribution, is the ratio

π_i(t) = p_i(t) / (p(t) 1).    (10.3.7′)
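The product formula (10.3.9) can be evaluated directly by alternating matrix exponentials J(x) = exp((Q − Λ)x) with multiplications by Λ. Below is a minimal, stdlib-only sketch; the series-based matrix exponential and the two-state parameter values are ours, for illustration (in practice a library routine such as scipy.linalg.expm would normally be used).

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(A, squarings=20, terms=20):
    """exp(A) via scaling and squaring of a truncated Taylor series."""
    n = len(A)
    S = [[A[i][j] / 2.0**squarings for j in range(n)] for i in range(n)]
    E = [[float(i == j) for j in range(n)] for i in range(n)]
    P = [[float(i == j) for j in range(n)] for i in range(n)]
    for m in range(1, terms):
        P = [[v / m for v in row] for row in mat_mul(P, S)]  # S^m / m!
        E = [[E[i][j] + P[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        E = mat_mul(E, E)
    return E

def joint_statistics(p0, Q, lam, event_times, t):
    """p(t) of (10.3.9a): propagate p(0) with J(x) = exp((Q - Lambda)x)
    between the observed points and weight by Lambda at each point."""
    K = len(p0)
    A = [[Q[i][j] - (lam[i] if i == j else 0.0) for j in range(K)]
         for i in range(K)]
    def step(v, x):                          # v -> v J(x)
        J = expm([[a * x for a in row] for row in A])
        return [sum(v[i] * J[i][j] for i in range(K)) for j in range(K)]
    p, t_prev = list(p0), 0.0
    for s in event_times:
        p = [pi * li for pi, li in zip(step(p, s - t_prev), lam)]
        t_prev = s
    return step(p, t - t_prev)

# Illustrative two-state 'telegraph' model: likelihood and filter at t = T.
Q = [[-0.5, 0.5], [1.0, -1.0]]
lam, p0, T = [2.0, 0.1], [2.0 / 3.0, 1.0 / 3.0], 1.0
pT = joint_statistics(p0, Q, lam, [0.3, 0.7], T)
likelihood = sum(pT)
post = [v / likelihood for v in pT]          # pi_i(T) of (10.3.7')
```

As a sanity check, with a single state (K = 1, Q = (0)) the computation collapses to the Poisson likelihood λ^{N(T)} e^{−λT}.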
Although (10.3.9) gives an explicit representation of the joint statistics, it may be just as convenient, particularly if an updating procedure is envisaged,
to represent their evolution in terms of the differential equations they satisfy between events, and the discrete jumps that occur at the events t_n on the trajectory. In terms of the p_i(t) these equations are linear in form as below, with D_t ≡ ∂/∂t:

D_t p_i(t) = −(λ_i + q_i)p_i(t) + ∑_{j≠i} p_j(t)q_ji    (t ≠ any t_n),    (10.3.10a)
∆p_i(t_n) = p_i(t_n+) − p_i(t_n−) = (λ_i − 1)p_i(t_n−).    (10.3.10b)

Similar equations can be written down for the conditional probabilities π_i(t), but in view of the ratios involved these are nonlinear, having the form

D_t π_i(t) = −π_i(t)[λ_i + q_i − λ_H(t)] + ∑_{j≠i} π_j(t)q_ji    (t ≠ any t_n),    (10.3.11a)
∆π_i(t_n) = (λ_i/λ_H(t_n−) − 1) π_i(t_n−),    (10.3.11b)

where

λ_H(t) = ∑_{i=1}^K λ_i π_i(t)    (10.3.12)
is the conditional intensity at time t, given only the internal history H.

If we consider the same example from the point of view of parameter estimation, (10.3.9) immediately yields the likelihood for observations over the observation period (0, T) in the form

L(t_1, . . . , t_{N(T)}) = p(T) 1 = p(0) R((0, T]) 1,    (10.3.13)

where 1 is the column vector of 1s. Because the likelihood is represented here as an explicit function of the Q-matrix of the hidden Markov process and the rates λ_i, it could in principle be used directly in maximization routines to find likelihood estimates of these quantities. However, this is a cumbersome and unstable process at best, and various techniques have been suggested to help stabilize the estimation procedures. One option, treated in detail in Vere-Jones (1975), is to extend the difference/differential equations from the joint probabilities (and hence the likelihood) to the means and variances of the parameter estimates; the results are still cumbersome and awkward to implement in practice. A more effective approach, indicated already in Exercise 7.2.8, is to bring the E–M algorithm to bear on the problem. This approach also reveals structural features that are important in discussing more general classes of models. The virtues of the E–M algorithm in this context are that it allows us to return to the simpler likelihood structures that occur when the underlying Markov process is known, thus making maximization (the M-step) easy, and that it replaces the rather unstable direct maximization of the likelihood by a more stable iterative procedure. Nevertheless, numerical issues
remain a major concern, and for the point process models many hundreds or even thousands of observations may be needed to gain stable parameter estimates; in unfavourable cases (e.g., when some state transitions appear only rarely), many more may be required.

To introduce these ideas, we first outline the basic steps in applying the E–M algorithm to an HMM in discrete time and space. The immediate point process application, treated in Example 10.3(f), assumes Poisson observations such as would result from binning the observed points in Example 10.3(e). Standard references for the E–M algorithm are Dempster et al. (1977), Elliott et al. (1995), and MacDonald and Zucchini (1997); particularly relevant examples are discussed, for example, in Qian and Titterington (1990), Deng and Mark (1993), and Turner et al. (1998). The pioneering work goes back to papers by Baum and Petrie (1966) and Baum and Eagon (1967).

To describe an HMM in discrete time and space we suppose given the matrix P = (p_ij), i, j = 1, . . . , K, of one-step transition probabilities of the underlying Markov chain, which for simplicity we assume is aperiodic and irreducible. We also suppose given a family of probability distributions: when the chain is in state j it generates an observation with density f_j(z) (j = 1, . . . , K and z real). The procedures start by introducing the forward and backward probabilities defined, respectively, by

α_n(j) = Pr{X_n = j; Z_1 = z_1, . . . , Z_n = z_n},    (10.3.14a)
β_n(j) = Pr{Z_{n+1} = z_{n+1}, . . . , Z_N = z_N | X_n = j}    (10.3.14b)
[strictly, α_n(j) = α(j; z_1, . . . , z_n) and β_n(j) = β(j; z_{n+1}, . . . , z_N)]. For n = 1, . . . , N − 1, these probabilities satisfy the recurrence relations

α_{n+1}(j) = ∑_{i=1}^K α_n(i) p_ij f_j(z_{n+1}),    (10.3.15a)
β_n(j) = ∑_{k=1}^K p_jk f_k(z_{n+1}) β_{n+1}(k),    (10.3.15b)

where the p_ij are the one-step transition probabilities of the discrete-time chain [so (p_ij) = P_∆t in our case], the f_j(·) are probability densities (Poisson probabilities in our case) for the observations when the chain is in state j, α_1(·) is the vector of initial probabilities, and β_N(·) = 1. Matrix versions of these equations are discussed in Exercise 10.3.7. It is assumed that the f_j are unchanging over the observation period, and that successive observations are conditionally independent, given the corresponding states. For every 1 ≤ n ≤ N, the likelihood L_N of the N observations, obtained by averaging over the possible state sequences, can be written in the form L_N = ∑_{i=1}^K α_n(i)β_n(i); in particular,

L_N = ∑_{i=1}^K α_N(i).    (10.3.16)
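The recursions (10.3.15) and the likelihood identity L_N = ∑_i α_n(i)β_n(i), valid for every n, take only a few lines to implement. A sketch with Poisson observation densities; the two-state parameter values are illustrative assumptions of ours, and we use the common convention α_1(j) = π_j f_j(z_1) for the starting vector.

```python
import math

def poisson_pmf(z, mu):
    return math.exp(-mu) * mu**z / math.factorial(z)

def forward(P, mus, pi, Z):
    """alpha_n(j) of (10.3.15a), with alpha_1(j) = pi_j f_j(z_1)."""
    K = len(P)
    alphas = [[pi[j] * poisson_pmf(Z[0], mus[j]) for j in range(K)]]
    for z in Z[1:]:
        prev = alphas[-1]
        alphas.append([sum(prev[i] * P[i][j] for i in range(K))
                       * poisson_pmf(z, mus[j]) for j in range(K)])
    return alphas

def backward(P, mus, Z):
    """beta_n(j) of (10.3.15b), with beta_N(j) = 1."""
    K = len(P)
    betas = [[1.0] * K]
    for z in reversed(Z[1:]):
        nxt = betas[0]
        betas.insert(0, [sum(P[j][k] * poisson_pmf(z, mus[k]) * nxt[k]
                             for k in range(K)) for j in range(K)])
    return betas

P = [[0.9, 0.1], [0.2, 0.8]]      # illustrative two-state chain
mus, pi = [0.5, 3.0], [0.5, 0.5]
Z = [0, 4, 1, 0]                  # binned counts
alphas, betas = forward(P, mus, pi, Z), backward(P, mus, Z)
L = sum(alphas[-1])               # likelihood (10.3.16)
```

Checking that ∑_i α_n(i)β_n(i) returns the same value of L for each n is a convenient way of verifying an implementation of both recursions at once.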
In practice it is generally desirable to renormalize the forward and backward probabilities at each step to avoid computations with extremely small numbers (see Exercise 10.3.7). Although the recurrence equations allow the forward and backward probabilities to be computed quickly, their underlying importance is embodied in the following lemma.

Lemma 10.3.I (State Estimation Lemma). For an HMM with P and {f_j(·)} as above, the conditional probabilities d_n(i) = Pr{X_n = i | Z_1, . . . , Z_N} and e_n(i, j) = Pr{X_n = i, X_{n+1} = j | Z_1, . . . , Z_N} are given, respectively, by

d_n(i) = α_n(i)β_n(i) / ∑_k α_n(k)β_n(k),    n = 1, . . . , N,    (10.3.17a)
e_n(i, j) = α_n(i) p_ij f_j(Z_{n+1}) β_{n+1}(j) / ∑_{k,ℓ} α_n(k) p_kℓ f_ℓ(Z_{n+1}) β_{n+1}(ℓ),    n = 1, . . . , N − 1.    (10.3.17b)
Proof. The proof of (10.3.17a), as of the likelihood representation (10.3.16), depends on the fact that we can use the Markovian property to break open the joint probability Pr{X_n = i; Z_1 = z_1, . . . , Z_N = z_N} and write it as the product of the two terms α_n(i), β_n(i) for any n in 0 ≤ n ≤ N. Start with

Pr{X_n = i; Z_1 = z_1, . . . , Z_N = z_N}
    = Pr{Z_1 = z_1, . . . , Z_N = z_N | X_n = i} Pr{X_n = i}.

Now, given the state i at time n, the distribution of the observations beyond n is independent of those before n and is nothing other than the backward probability β_n(i). That is because the Markovian character of the transition probabilities means that no relevant information about the future states can be transmitted past the present state i, once that is given. Likewise the distribution of the observations up to time n is independent of those which follow, given the state at time n. The right-hand side of the above equation therefore reduces to

[α_n(i)/Pr{X_n = i}] × β_n(i) × Pr{X_n = i} = α_n(i)β_n(i).

Rewriting the joint probability in terms of the conditional distribution of the state, given the observations, and normalizing, yields (10.3.17a). Similar reasoning gives (10.3.17b). See also Exercise 10.3.7 for a matrix proof.

The E–M algorithm for this setting involves finding those parameter values for the full model [i.e., both p_ij and f_j(·)] that maximize the expected value of the likelihood, given the observations and an initial set of parameter values. Finding the expected value of the full likelihood, given the observations, is the E-step; finding the parameter values which maximize this expected likelihood is the M-step. An argument based on Jensen's inequality shows that each iteration of the algorithm can only increase the likelihood. To illustrate the algorithm we turn to a discrete-time version of Example 10.3(e) [see also Fischer and Meier-Hellstern (1993), Davison and Ramesh (1993), and Rydén (1994, 1996)].
Example 10.3(f) Discrete-time HMMs with Poisson observations: E–M algorithm analysis. Because the standard procedures apply to discrete-time processes, a preliminary step in adapting them to the context of Example 10.3(e) is to bin the observations into small time intervals of length ∆t (preferably smaller than the mean interval length). The model then reduces to the discrete-time Markov chain with transition matrix P_∆t = e^{Q∆t}, whereas the observations consist of the sequence of counts Z_1, . . . , Z_N in successive bins n = 1, . . . , N. The counts can be modelled conveniently as Poisson random variables having parameter µ_i = λ_i ∆t when the Markov chain is in state i. It is assumed that changes of state occur only on the boundaries of the ∆t intervals.

We take the vector of parameters for this problem to be θ = {π_1, . . . , π_K; p_11, . . . , p_ij, . . . , p_KK; µ_1, . . . , µ_K}, where {µ_i} are the parameters of the distributions f_i(·). The complete likelihood L_c, given successive states {i_1, . . . , i_N} and observations {Z_1, . . . , Z_N}, is given by

log L_c(θ) = log π_{i_1} + ∑_{n=1}^{N−1} log p_{i_n, i_{n+1}} + ∑_{n=1}^N log f_{i_n}(Z_n).
In our case, inasmuch as the distributions are Poisson,

log f_{i_n}(Z_n) = Z_n log µ_{i_n} − µ_{i_n} − log(Z_n!).

It is convenient here to rewrite the likelihood by collecting the quantities multiplying a particular term such as log p_ij; then

log L_c(θ) = ∑_i δ_{i_0 i} log π_i + ∑_{i,j} E_ij log p_ij + ∑_i (G_i log µ_i − D_i µ_i) + C,    (10.3.18)

where D_i counts the total visits to state i, E_ij the number of transfers from state i to state j, G_i the number of points emitted while the chain is in state i, and C is a function of the observations Z_n only. Equation (10.3.18) shows that {δ_{i_0 i}, D_i, E_ij, G_i} is a set of sufficient statistics for the parameters θ.

Next, take expectations conditional on the observations Z = {Z_1, . . . , Z_N} and some initial parameter set θ∗, treating the running parameter values in (10.3.18) as fixed numbers. This requires taking an average over state histories of the complete log-likelihood (10.3.18) (specifically, of the various sufficient statistics), conditional on the observations and initial parameters. But these conditional expectations can all be written down in terms of the conditional probabilities of visits to or transfers between successive states, given the observations, and
hence in terms of the quantities appearing in Lemma 10.3.I. Specifically, we find

E[δ_{i_0, i} | Z] = π∗_i,        E[E_ij | Z] = ∑_{n=1}^{N−1} e∗_n(i, j) ≡ e∗_ij,
E[D_i | Z] = ∑_{n=1}^N d∗_n(i) ≡ d∗_i,    E[G_i | Z] = ∑_{n=1}^N Z_n d∗_n(i) ≡ g∗_i,

where ∗ indicates dependence on the initial parameters. Substitution leads to an expression for the conditioned log-likelihood which exactly replicates the form of (10.3.18), namely,

E[log L_c(θ) | Z] = ∑_i π∗_i log π_i + ∑_{i,j} e∗_ij log p_ij + ∑_i (g∗_i log µ_i − d∗_i µ_i) + C.    (10.3.19)

Equation (10.3.19) constitutes the E-step. To implement the maximization step we have to find the values of the current parameters that maximize (10.3.19) for given values of the observations and of the starred expressions. Recalling the constraints ∑_i π_i = 1 = ∑_j p_ij, we find for the new estimates

π̂_i = d∗_1(i),    p̂_ij = e∗_ij / d∗_i,    µ̂_i = g∗_i / d∗_i.    (10.3.20)

These equations, which are commonly referred to as the Baum–Welch re-estimation equations, constitute the M-step. Together with the results of Lemma 10.3.I, they summarize the application of the E–M algorithm in the given discrete context. The algorithm proceeds by successive application of the E- and M-steps, using the forward and backward probabilities to evaluate the quantities appearing on the right-hand sides in (10.3.20), and the equations themselves to evaluate the revised parameter estimates.

In practice, a special problem again revolves around estimating the initial distribution, as there is not usually enough information in the data to provide stable estimates for the probabilities π_i. If the π_i are taken to be the stationary probabilities for the chain being estimated, and hence functions of the p_ij, the simplicity of the updating equations is lost, and they become essentially intractable. A reasonable compromise [cf. the discussion in Turner et al. (1998)] is to evaluate the stationary distribution for the chain with the initial parameters θ_1, and regard this distribution as fixed in the M-step. Then only the transition probabilities and the distribution parameters need to be re-estimated, resulting in modified forms of (10.3.20) with the given initial distribution replacing the estimated initial distribution. The E–M algorithm may converge rather slowly as it approaches the maximum, in which case it may be more efficient to switch over for the final steps to direct maximization of the likelihood (10.3.16), using the E–M estimates as starting values.
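Given the state-estimation quantities d_n(i) and e_n(i, j) of Lemma 10.3.I, computed under the current parameters, the re-estimation equations (10.3.20) are just ratios of sums. A minimal sketch of the M-step; the toy inputs below, a path that visits states with certainty, are illustrative assumptions of ours.

```python
def baum_welch_update(d, e, Z):
    """M-step (10.3.20): d[n][i] plays the role of d*_n(i) and
    e[n][i][j] that of e*_n(i, j) from Lemma 10.3.I."""
    N, K = len(d), len(d[0])
    pi_hat = [float(v) for v in d[0]]                          # d*_1(i)
    d_star = [sum(d[n][i] for n in range(N)) for i in range(K)]
    e_star = [[sum(e[n][i][j] for n in range(N - 1)) for j in range(K)]
              for i in range(K)]
    g_star = [sum(Z[n] * d[n][i] for n in range(N)) for i in range(K)]
    p_hat = [[e_star[i][j] / d_star[i] for j in range(K)] for i in range(K)]
    mu_hat = [g_star[i] / d_star[i] for i in range(K)]
    return pi_hat, p_hat, mu_hat

# Toy check: a path that is in state 0, then 1, then 0 with certainty.
d = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
e = [[[0.0, 1.0], [0.0, 0.0]],       # n = 1: transfer 0 -> 1
     [[0.0, 0.0], [1.0, 0.0]]]       # n = 2: transfer 1 -> 0
Z = [2, 3, 4]
pi_hat, p_hat, mu_hat = baum_welch_update(d, e, Z)
```

A full iteration of the algorithm would alternate this update with a recomputation of d and e from the forward and backward probabilities under the new parameters.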
The parameters for the continuous-time model which motivated Example 10.3(f) can be related approximately to the parameters of the discrete-time version by the relations
    λ_i ≈ µ_i/∆t,      q_ii ≈ −(1 − p_ii)/∆t,      q_ij ≈ p_ij/∆t   (i ≠ j).
108
10. Special Classes of Processes
It is also possible to apply the E–M algorithm directly to the continuous-time process [see Rydén (1996)], and we now outline this alternative approach.
Example 10.3(g) Cox process directed by a finite Markov chain: Direct E–M analysis [continued from Example 10.3(e)]. We start from the definition (10.3.9a) for the joint statistic p_i(t), which we can identify as the continuous-time analogue of the forward probability α_n(i). To emphasize the analogy, we write for the remainder of this discussion, using N(t) as before,
    α_t(i) = Pr{X(t) = i and points occur in (0, t] at times t_1, …, t_{N(t)}} = p(0)′ R(0, t] δ_i = p_i(t),
where δ_i is a K-vector of 0s except for 1 in the ith component. Similarly, we can write down an analogue to the backward probability β_n(i), namely,
    β_t(i) = Pr{points occur in (t, T] at times t_{N(t)+1}, …, t_{N(T)} | X(t) = i} = δ′_i R(t, T] 1.
The differential versions (10.3.10–11) may be regarded as continuous-time analogues of the forward recurrence equations (10.3.15a) in the Baum–Welch formulation of HMMs. The quantities α_t(i), β_t(i) can be computed from the differential equations, or by direct evaluation of (10.3.9a) and its analogue for the backward probabilities. Once again the likelihood can be evaluated as
    L_c = Σ_i α_t(i)β_t(i) = Σ_i α_T(i)      (every t in 0 < t < T).
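The matrix-product evaluation of this likelihood can be sketched directly. The following Python fragment is an illustration, not from the text: it evaluates L_c as the product p(0)′ J(t_1) Λ J(t_2 − t_1) Λ ⋯ J(T − t_{N(T)}) 1, with J(u) = exp((Q − Λ)u) computed by a plain truncated Taylor series (adequate only for the small matrices and short intervals assumed here); all parameter values are assumed for illustration.

```python
def mat_exp(M, t, terms=60):
    # exp(Mt) by truncated Taylor series; adequate for small matrices and moderate Mt
    K = len(M)
    Mt = [[M[i][j] * t for j in range(K)] for i in range(K)]
    S = [[float(i == j) for j in range(K)] for i in range(K)]   # partial sum, starts at I
    term = [row[:] for row in S]                                # current term (Mt)^n / n!
    for n in range(1, terms):
        term = [[sum(term[i][k] * Mt[k][j] for k in range(K)) / n
                 for j in range(K)] for i in range(K)]
        S = [[S[i][j] + term[i][j] for j in range(K)] for i in range(K)]
    return S

def mmpp_likelihood(Q, lam, times, T, pi0):
    """L_c = pi0' J(t1) Lambda J(t2-t1) Lambda ... J(T-t_N) 1, J(u) = exp((Q-Lambda)u)."""
    K = len(Q)
    M = [[Q[i][j] - (lam[i] if i == j else 0.0) for j in range(K)] for i in range(K)]
    v, prev = list(pi0), 0.0                   # row vector carried through the product
    for t in times:
        J = mat_exp(M, t - prev)
        v = [sum(v[i] * J[i][j] for i in range(K)) for j in range(K)]  # v' J(interval)
        v = [v[j] * lam[j] for j in range(K)]                          # multiply by Lambda at the event
        prev = t
    J = mat_exp(M, T - prev)
    v = [sum(v[i] * J[i][j] for i in range(K)) for j in range(K)]
    return sum(v)
```

As a check on the construction, when all the λ_k are equal the chain is irrelevant and the product collapses to the ordinary Poisson likelihood λ^{N(T)} e^{−λT}.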
The Markov property again allows us to write down state estimation probabilities, analogous to those in Lemma 10.3.I. These take the forms
    d_t(i) = Pr{X(t) = i | points at t_1, …, t_{N(T)}} = α_t(i)β_t(i)/L_c,      (10.3.21a)
    e_t(i,j) dt = Pr{transition i to j occurs in (t, t + dt) | points at t_1, …, t_{N(T)}} = α_t(i) q_ij β_t(j) dt/L_c.      (10.3.21b)
The complete likelihood can be written down directly in terms of the epochs {s_ℓ: ℓ = 1, …, M} of state transitions, with s_0 = 0, the state transitions themselves {i_{ℓ−1} → i_ℓ}, and the event times {t_n}. Recalling the notation q_i = −q_ii, π_ij = q_ij/q_i, and taking logarithms, we can write the log-likelihood as
    log L_c = log π_{i_0} + Σ_{ℓ=0}^{M−1} { log q_{i_ℓ, i_{ℓ+1}} + [N(s_{ℓ+1}) − N(s_ℓ)] log λ_{i_ℓ} − (q_{i_ℓ} + λ_{i_ℓ})(s_{ℓ+1} − s_ℓ) } + [N(T) − N(s_M)] log λ_{i_M} − (q_{i_M} + λ_{i_M})(T − s_M)
            = Σ_i { δ_{i_0,i} log π_i − D_i(q_i + λ_i) + Σ_{j≠i} E_ij log q_ij + G_i log λ_i },
10.3.
Point Processes Defined by Markov Chains
109
where D_i accumulates the time spent in state i during (0, T], E_ij counts the number of transitions from i to j, and G_i counts the number of events which occur while the chain is in state i. Taking expectations after conditioning on the observations yields, for example,
    E[D_i | t_1, …, t_{N(T)}] ≡ D*_i = ∫_0^T d_t(i) dt,
with analogous integral expressions for E*_ij and G*_i, but with dt replaced by dN(t) in the latter (i.e., to give sums rather than integrals). Hence we obtain for the E-step, after substituting for the starred quantities,
    E(log L_c | H_{(0,T)}) = Σ_i { d_0(i) log π_i − (q_i + λ_i) ∫_0^T d_t(i) dt + Σ_{j≠i} log q_ij ∫_0^T e_t(i,j) dt + log λ_i ∫_0^T d_t(i) dN(t) },
where N(·) denotes the observed counting process. Then maximizing over the parameters π_i, q_i, q_ij, λ_i, we obtain for the M-step estimates
    π̂_i = d_0(i),      q̂_ij = ∫_0^T e_t(i,j) dt / ∫_0^T d_t(i) dt,
    λ̂_i = ∫_0^T d_t(i) dN(t) / ∫_0^T d_t(i) dt,      q̂_i = Σ_{j≠i} ∫_0^T e_t(i,j) dt / ∫_0^T d_t(i) dt.      (10.3.22)
The integrals in these equations can be evaluated with the aid of (10.3.9) by summation of integrals over intervals between the points 0, t_1, …, t_{N(T)}, T. From the numerical point of view, a serious problem is the calculation of the matrix exponentials J(t) in (10.3.9b). If a simple discretization is used for this purpose, one is effectively reverting to the discrete-time model previously discussed, and there are greater advantages in keeping to the discrete set-up all through. One alternative is to use matrix diagonalization methods; a further alternative, suggested by Rydén (1996), is the 'uniformization' algorithm outlined in Exercise 10.3.8. A recent discussion of numerical aspects can be found in Roberts, Ephraim, and Dieguez (2006).
This model has been the subject of considerable extension and elaboration. One generalization, which combines Example 10.3(g) with elements of the discrete HMM considered above, is to allow the observed process to be an MPP rather than a simple point process; a brief outline is given in Exercise 10.3.9. Example 10.3(f) can also be considered as a simple special case of the more general situation where the counting process N(t) is not itself Markovian, but forms one component in a more complex process which is
Markovian. Rudemo (1973) appears to be the first to have studied point processes of this kind. He considered a bivariate system (X(t), N(t)), where N(t) is the observed counting process, X(t) is a K-valued unobserved Markov process with Q-matrix Q, and counts may be produced either between state transitions, as in the previous example, or at the transitions themselves. The system remains Markovian, and similar issues arise: tracking the current state of the unobserved process X(t) from observations on the counting process, and estimating the parameters of the bivariate process. This framework also covers the extensive series of studies by Neuts and co-workers on Markovian and Batch Markovian Arrival Processes (so-called MAPs and BMAPs); see inter alia Neuts (1978, 1979, 1989), Ramaswami (1980), Asmussen et al. (1996), and Klemm et al. (2003). An important feature of these models is that, if they are used as input streams for single-server and other queueing systems, many characteristics, such as distributions of queue lengths and waiting times, can be represented as matrix-analytic analogues of the forms which they take in the simpler systems with Poisson or renewal input. Here, we concentrate on the point process properties, referring the reader to accounts of the wider range of applications in the cited references.
The Q-matrix for a two-component process (X(t), N(t)) has a block structure of the form
    ( Q_00  Q_01  Q_02  …  )
    ( Q_10  Q_11  Q_12  …  )
    ( Q_20  Q_21  Q_22  …  ),
    (  …     …     …       )
where each Q_rs is a K × K matrix that describes the transition rates of the unobserved process X(t) which may occur while the counting variable moves from r to s. In most cases backward transitions are not possible, so that Q_rs = 0 for r > s; also, the process X(t) has stationary transition probabilities and N(t) has stationary increments. The Q-matrix then becomes a block-type version of the Q-matrix of a pure birth process,
    ( Q_0  Q_1  Q_2  …  )
    (  0   Q_0  Q_1  …  )
    (  0    0   Q_0  …  )      (10.3.23)
    (  …    …    …      )
Example 10.3(g) is recovered if we take Q_0 = Q − Λ, Q_1 = Λ, Q_j = 0 for j > 1, where Q is the Q-matrix of X(t). That is to say, N(t) increases by 1 each time a point occurs, but the value of X(t) is unaltered, and the transitions of X(t) do not directly affect the value of N(t). More generally, processes of the type (10.3.23) are characterized by the matrix Q_0, which describes the transitions of the process X(t) in the absence of jumps, and the matrices Q_1, …, Q_L describing the transitions of X(t) which accompany jumps of size 1, …, L, respectively. One of the simplest point processes of this type, often used as a building block in constructing more complex models, is described below.
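As a concrete illustration (not part of the text), the following Python fragment assembles a finite truncation of the block matrix (10.3.23) from assumed blocks Q_0 and Q_1; with Q_0 = Q − Λ and Q_1 = Λ it reproduces the structure of Example 10.3(g), and every row of the assembled generator, apart from the truncated final level, sums to zero.

```python
def map_block_q(Q0, Q1, levels):
    # assemble a finite truncation of the block-triangular generator (10.3.23)
    K, n = len(Q0), len(Q0) * levels
    big = [[0.0] * n for _ in range(n)]
    for r in range(levels):
        for i in range(K):
            for j in range(K):
                big[r*K + i][r*K + j] = Q0[i][j]            # diagonal block Q_0
                if r + 1 < levels:
                    big[r*K + i][(r+1)*K + j] = Q1[i][j]    # superdiagonal block Q_1
    return big
```

The truncation level plays no structural role; it merely bounds the counting variable so that the generator is a finite matrix.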
Example 10.3(h) Renewal process of phase type; PH-distributions [Neuts (1978)]. This model is motivated by the situation in which points are recorded only when an underlying Markov process enters a particular state, say 0. It then becomes an example of an alternating renewal process, with one type of interval corresponding to sojourns in state 0, and the other to the periods while the process is traversing a path through the remaining states. As such it can be treated as a modified version of Example 10.3(a) as in Exercise 10.3.4. However, we wish to consider the limiting situation in which sojourns in state 0 are instantaneous, so that the observed process is actually a renewal process. This limiting process is then described by two components: a defective K × K Q-matrix with row sums Σ_{j=1}^K q_kj = −δ_k < 0, and a vector of re-entry probabilities, say π_k, describing the probability of entering state k at the same time as a renewal occurs. This corresponds to a model governed by a matrix of type (10.3.23) in which Q_0 = Q, Q_1 = δπ′, Q_ℓ = 0 for ℓ > 1. The interval between successive renewals can be represented as the sum of a random number of exponentially distributed waiting times, corresponding to the holding times in each of the states passed through before the next renewal, and mixed over the starting state. Its distribution takes the form
    1 − F(t) = π′ e^{Qt} 1,
(10.3.24)
which in Neuts' terminology represents a PH-distribution with representation (Q, π). Special cases, such as mixtures or sums of exponentials, correspond to giving special forms to Q and π; see Exercise 10.3.10 for examples. Expressions for the Laplace transform of the above density, as well as for the associated renewal function, are outlined in Exercise 10.3.10.
Consider next the BMAP, originally called a versatile Markovian point process by Neuts (1979). This can be considered both as a Markov process with Q-matrix of the form (10.3.23), and as an MPP. Both points of view are helpful in developing properties of the process, as we indicate below. The process accommodates not only batch arrivals, but also some dependence between the lengths of the intervals between arrivals, because the lengths of two consecutive intervals both depend on the state of X(t) at the arrival time by which they are separated. Variations on this model are extensively used in information-processing networks, where the batch size approximates packet length, and the arrival rate the packet frequency [see Klemm et al. (2003)].
Example 10.3(i) BMAP representations: E–M analysis [Neuts (1979); Rydén (1996); Klemm et al. (2003)]. We suppose again that the process is driven by an unobserved Markov process X(t) with K states and generator Q. In this case, however, the observed points correspond to batches of size 0 < ℓ ≤ L, and may or may not be associated with a change of state. To accommodate the latter possibilities, the matrix Q is split among the different batch sizes, and written as a sum Q = Σ_{ℓ=0}^L Q_ℓ, where the Q_ℓ appear in the representation (10.3.23), so the Q_ℓ govern the transitions associated with arrivals of batch
size ℓ (or no arrivals for Q_0). Write Q_ℓ = (q^{(ℓ)}_ij), so positive elements q^{(ℓ)}_kk with ℓ > 0 correspond to arrivals for which no change of state occurs. Although the BMAP is the archetypal process with a representation as at (10.3.23), for estimation purposes it is more fruitful to consider it as an MPP with bivariate marks, (k, ℓ) say, the two components recording the state k entered and the size ℓ of the associated arrival batch, including the possibility ℓ = 0. Marks of the form (k, 0) do not occur when X(t) is in state k, because they would correspond to transitions from state k to state k with no accompanying arrivals, and hence to no effective transition of any kind. Inasmuch as the structure of (10.3.23) already implies that the transition rates depend only on the state k and not on the accumulated number of arrivals, the conditional intensity for the associated MPP {t_n, (k_n, ℓ_n)} depends on the history only through the current state X(t), which we denote by ξ to ease the notation. In terms of the representation (10.3.23), we have then for the conditional intensity, for k = 1, …, K and ℓ = 0, …, L,
    λ*_{k,ℓ}(t) = 0   if ℓ = 0 and ξ = k;      λ*_{k,ℓ}(t) = q^{(ℓ)}_{ξk}   otherwise.
The ground intensity for the overall process (including transitions not associated with arrivals) is given by
    λ*_g(t) = E[dN_g(t) | ξ]/dt = −(Q_0)_{ξξ} = q_ξ,      (10.3.25a)
and the conditional mark distribution takes the form
    f*(k, ℓ | t) = π_{kℓ}(ξ) ≡ q^{(ℓ)}_{ξk}/q_ξ,      ℓ = 0, 1, …, L.      (10.3.25b)
The complete process is stationary if and only if the Markov process X(t) is stationary, because X(t) determines the occurrence probabilities for all types of transitions. Because the marginal process X(t) is Markovian, and governed by the matrix Q, X(t) is stationary if and only if it starts with the initial distribution π satisfying π′Q = 0′, which we assume to be well-determined and unique. Taking expectations under this distribution, we find for the mean rate of occurrence of all transitions,
    λ̄ = E[λ*_g(t)] = −Σ_{k=1}^K π_k q^{(0)}_{kk} = Σ_{k=1}^K π_k q_k      (all t).
The mean rate of arrival of batches of size ℓ > 0 is λ_ℓ = Σ_{k=1}^K π_k Σ_{j=1}^K q^{(ℓ)}_{kj}, and the overall rate of arrival of batches is
    λ = Σ_{k=1}^K π_k Σ_{ℓ=1}^L Σ_{j=1}^K q^{(ℓ)}_{kj} = λ̄ − λ_0,
where λ_0 = Σ_k π_k Σ_{j: j≠k} q^{(0)}_{kj} is the expected rate of transitions not accompanied by arrivals. Other characteristics of the process, including interval distributions and correlations, can be represented in matrix exponential terms along the lines of Example 10.3(h) (see Exercise 10.3.11).
The BMAP models share with the E–M algorithm the feature that elaborations of the forward–backward equations can be used for parameter estimation. Indeed, the steps follow a very similar pattern to those of Example 10.3(f), and we indicate them only in summary form; Klemm et al. (2003) can be consulted for further details and numerical aspects. Suppose that over the observation interval (0, T), the process has initial probability distribution {π^0_k}, starts from state k_0, and jumps at times {t_n} with marks {k_n, ℓ_n}, corresponding to transitions into states k_n associated with arrival batches of size ℓ_n (including the possibility ℓ_n = 0). Then the complete likelihood takes the form
    L_c = π^0_{k_0} q^{(ℓ_1)}_{k_0,k_1} e^{−q_{k_0} t_1} q^{(ℓ_2)}_{k_1,k_2} e^{−q_{k_1}(t_2 − t_1)} ⋯ .      (10.3.26)
Grouping together the terms associated with particular transitions, or sojourns in particular states, this can be rewritten as
    L_c = π^0_{k_0} { Π_{j,k,ℓ} (q^{(ℓ)}_{jk})^{N^{(ℓ)}_{jk}} } e^{−Σ_k q_k D_k},      (10.3.27)
where, with reference to the observation period (0, T), N^{(ℓ)}_{jk} counts the total number of transitions from state j into state k associated with an arrival batch of size ℓ, and D_k is the total length of time that X(t) is in state k.
We now turn to extensions of the backwards and forwards probabilities. In the present model, given the initial distribution π_0 and the set of observed arrival times and batch sizes {(τ_n, ℓ_n): n = 1, …, N(T)} (and note that the {τ_n} form in general only a subset of the set {t_n} used in describing the complete likelihood), the forward probabilities take the form
    α_t(k) = (π_0)′ e^{Q_0 τ_1} Q_{ℓ_1} e^{Q_0(τ_2 − τ_1)} Q_{ℓ_2} ⋯ e^{Q_0(t − τ_{N(t)})} e_k.      (10.3.28)
Likewise the backward probabilities β_t(k) can be written
    β_t(k) = e′_k e^{Q_0(τ_{N(t)+1} − t)} Q_{ℓ_{N(t)+1}} e^{Q_0(τ_{N(t)+2} − τ_{N(t)+1})} Q_{ℓ_{N(t)+2}} ⋯ e^{Q_0(T − τ_{N(T)})} 1.
It is evident that here also, for every t in (0, T), the incomplete likelihood can be expressed as
    L(T) = Σ_k α_t(k)β_t(k) = Σ_k α_T(k) = α′_T 1.      (10.3.29)
This expression can be directly maximized to find suitable parameter estimates. Alternatively, as in Example 10.3(h), we can seek more stable procedures based on the E–M algorithm. Adapting the latter approach, the forward and backward probabilities reappear in the appropriate extension of the state estimation Lemma 10.3.I. Equation (10.3.21a) retains the form
    d_t(i) ≡ Pr{X(t) = i | S} = α_t(i)β_t(i)/L(T),
whereas if N^{(0)}_{ij}(t) denotes the number of transitions from i to j with no associated arrivals in time t, (10.3.21b) becomes
    e^{(0)}_t(i,j) dt ≡ E[dN^{(0)}_{ij}(t) | S] = α_t(i) q^{(0)}_{ij} β_t(j) dt/L(T).
For the conditional rate at time t of j → k transitions associated with arrivals of batch size ℓ > 0 we have
    e^{(ℓ)}_t(j,k) = α_t(j) q^{(ℓ)}_{jk} β_t(k)/L(T).
Turning finally to the E- and M-steps, we obtain from (10.3.27),
    log L_c = log π^0_{i_0} − Σ_k q_k D_k + Σ_{j,k} Σ_{ℓ=0}^L N^{(ℓ)}_{jk} log q^{(ℓ)}_{jk}.      (10.3.30)
Taking expectations conditional on the observed sequence S ≡ {(τ_n, ℓ_n)} constitutes the E-step, and leads to an expression similar to (10.3.30) but with E[D_k | S] and E[N^{(ℓ)}_{jk} | S] replacing the corresponding expressions without the expectations. Maximizing with respect to the parameters is again straightforward, and leads to the updated estimates
    π̂^0_i = Pr{X(0) = i | S},      q̂^{(ℓ)}_{jk} = E(N^{(ℓ)}_{jk} | S) / E(D_j | S),      q̂_j = Σ_k Σ_{ℓ=0}^L q̂^{(ℓ)}_{jk}.      (10.3.31)
Thus the crucial difficulties are again in evaluating the conditional expectations which appear in these equations, and again these can be represented in terms of the forward and backward probabilities. We find
    Pr{X(0) = i | S} = d_0(i),      E(D_k | S) = ∫_0^T d_t(k) dt,      E(N^{(0)}_{jk} | S) = ∫_0^T e^{(0)}_t(j,k) dt.      (10.3.32a)
The conditional expectations for the cases with ℓ > 0 have a slightly different form because they correspond to known times and sizes of arrivals. However, similar reasoning leads to the results
    E(N^{(ℓ)}_{jk} | S) = ∫_0^T e^{(ℓ)}_t(j,k) dN_ℓ(t),      (10.3.32b)
where N_ℓ(t) counts the number of batches of arrivals of size ℓ. Evaluation of the matrix exponentials which arise in these formulae can again be tackled by diagonalization or by the uniformization approximation described in Exercise 10.3.8.
Exercises and Complements to Section 10.3
10.3.1 Existence. Any state i of a continuous-time Markov chain on countable state space with time-homogeneous transition probabilities (an MC, say) as in Part II of Chung (1967) is either stable or instantaneous according as the corresponding diagonal matrix element q_i = −q_ii is finite or infinite, respectively. An MC with only stable states and all of them nonabsorbing can nevertheless have infinitely many jumps in a finite interval, but for an MC to be regarded as a point process as defined in Chapter 9 we wish to exclude such a possibility. By using the role of the diagonal matrix elements q_i = −q_ii as intensities, argue that a stochastic condition for an MC to have a.s. finitely many jumps {t_n} in bounded subsets of R is that
    Σ_{t_n ∈ R_+} 1/q_{X(t_n−)} = ∞      a.s.
Verify that the pure birth process with quadratic birth rates fails this condition. [Hint: See, e.g., Feller (1968, Section XVII.4).]
10.3.2 (a) Use (t_prev, κ_prev) of Example 10.3(a) with the transition probabilities and distribution functions specifying a semi-Markov process to describe the conditional intensity at time t of (i) a Poisson process; (ii) a renewal process; (iii) a Markov process (on countable state space); and (iv) a semi-Markov process. Verify the assertion of the footnote in that Example.
(b) Given a Markov renewal process {(t_n, κ_n)} as around (10.3.1) with card(X) ≥ 2 but allowing κ_{n+1} = κ_n (so p_ii > 0 for at least one i ∈ X), X(·) at (10.3.2) is still well-defined. Define a subset {(t′_n, κ′_n)}, still a Markov renewal process, such that κ′_{n+1} ≠ κ′_n (all n); the relation (10.3.2) still holds (and the sample functions remain the same), and when the original transition matrix (p_ij) is irreducible, the transition matrix (p′_ij), where p′_ij = p_ij/(1 − p_ii) for i ≠ j, now leads to a one-to-one measurable mapping between the latter Markov renewal process and the semi-Markov process X(·).
10.3.3 Markov renewal or semi-Markov process observed on a subset [cf. Çinlar (1975, 10(1.13))]. Let Y = {(t_n, κ_n)} be a realization of an irreducible Markov renewal process on state space X, and let X′ be a nonempty proper subset of X. Construct the subset Y′ of Y via Y′ = {(t_n, κ_n): κ_n ∈ X′} and relabel it sequentially as {(t′_n, κ′_n)}.
(a) Show that Y′ is an irreducible Markov renewal process on X′.
(b) Let Y have matrix renewal function H(t) = (H_ij(t)). Show that the components of the matrix renewal function of Y′ are {H_ij(t): i, j ∈ X′}.
(c) When card(X′) ≡ R, say, is finite, find an expression for the R × R matrix G_{X′}(t) of transition distributions for the process observed only in X′ in terms of the original matrix G(t) = (G_ij(t)). In the special case where the original process is Markovian, show that the transition distributions are of the phase type discussed in Example 10.3(h).
[Hint: {κ_n} is a discrete-time Markov chain on X, ensuring that {κ′_n} is a similar process on X′, inheriting irreducibility from Y. Its one-step transition probabilities (p′_ij) say, for i, j ∈ X′, can be found from the p_ij via taboo probability versions of the Chapman–Kolmogorov equations, or else directly
from a renewal-type equation. The same is true of the components of the matrix of transition distributions G_{X′}.]
10.3.4 (Continuation). When Y = {(t_n, κ_n)} is a Markov renewal process, the process Y_2 = {(t_n, (κ_{n−1}, κ_n))} is also a Markov renewal process, now with state space X^{(2)}. The subset Y′_2 of observations of transitions of Y restricted to a subset X_2 ⊆ X^{(2)} is again a Markov renewal process by Exercise 10.3.3(a); for simplicity confine attention to the case t_0 = 0. Then the first moment H_{(i,j),(k,ℓ)}(t), say, for the number of jumps t_n in (0, t] for which k → ℓ, given a jump i → j at 0, is independent of i and satisfies the Markov renewal equation
    H_{j,(k,ℓ)}(t) = δ_jk G_{kℓ}(t) + Σ_{h∈X} ∫_0^t dG_{jh}(u) H_{h,(k,ℓ)}(t − u).
Then with H_{(k,ℓ)} = (H_{1,(k,ℓ)} ⋯ H_{j,(k,ℓ)} ⋯)′ and δ_k = (δ_1k ⋯ δ_jk ⋯)′,
    H_{(k,ℓ)}(t) = (H ∗ G_{kℓ})(t) δ_k,
provided (G^{(n∗)} ∗ H_{(k,ℓ)})(t) → 0 (n → ∞). The analogue m_g[r] for Y′_2 of the factorial moment densities at (10.3.6), when G has density g = (g_ij), in terms now of g_{X_2}(·) = (δ_{(i,j),X_2} g_ij(·)) and with u_0 = 0 < u_1 < ⋯ < u_r, is the product of matrices and vectors
    m_g[r](u_1, …, u_r) = p′_0 Π_{s=1}^r (H ∗ g_{X_2}(u_s − u_{s−1})) 1.
[Hint: Deduce the equation for H_{j,(k,ℓ)} from the Markovian nature of {κ_n} and a backwards Chapman–Kolmogorov decomposition. See Ball and Milne (2005); Darroch and Morris (1967) considered the Markovian case earlier.]
10.3.5 Likelihoods for semi-Markov processes. In the notation of Example 10.3(a), with t_0 = 0 and X(0) = κ_0, show that the likelihood L of a semi-Markov process X(t) observed on (0, T] as having successive jumps at t_1, …, t_{N(T)} into states κ_1, …, κ_{N(T)} is expressible as
    L = { Π_{n=1}^{N(T)} g_{κ_{n−1} κ_n}(t_n − t_{n−1}) } [1 − G_{κ_{N(T)}}(T − t_{N(T)})],
where G_k(u) = Σ_j ∫_0^u g_kj(v) dv = Σ_j p_kj F_kj(u). Write down the conditional intensity for the corresponding MPP, and verify that the above expression coincides with the usual form of the likelihood for an MPP.
10.3.6 Alternative treatments of the Jelinski–Moranda process [Example 10.3(d)].
(a) Formulate the likelihood when the initial state is treated as an unknown parameter.
(b) Outline a Bayesian approach to the estimation of parameters in the process by finding the form of the posterior distribution for both the initial state and the parameters of the death process.
10.3.7 Forward and backward equations for discrete time HMMs.
(a) Use induction to verify formally the forward and backward equations (10.3.17) for estimation in HMMs [cf. MacDonald and Zucchini (1997)].
(b) Alternatively, show that the forward and backward equations reduce to matrix iterations such as
    α′_{n+1} = α′_n P D_{n+1},
where P is the transition probability matrix, and D_n = diag{f_1(z_n), …, f_K(z_n)}. Hence, for example, we have the explicit form
    α′_n = π′_0 P D_1 P D_2 ⋯ P D_n,
with a similar expression for the backward probabilities. Use these to give straightforward proofs of the likelihood equation (10.3.16) and (10.3.17) from the state estimation Lemma 10.3.I.
(c) To introduce normalized forms of the forward and backward probabilities, set α†_n(i) = α_n(i)/Σ_j α_n(j), and similarly β†_n(i) = β_n(i)/Σ_j β_n(j). Reformulate equations (10.3.13) for these normalized forms. [Remark: These quantities are numerically more stable; constants ρ_n say are needed to recover the original α_n(·), and similarly for β_n(·).]
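The matrix iteration of part (b), combined with the normalization of part (c), can be sketched as follows. This Python fragment is an illustration, not from the text: dens[n−1][i] stands for the diagonal entries f_i(z_n) of D_n, and the log-likelihood is accumulated from the scaling constants ρ_n, which avoids the underflow of the unnormalized recursion.

```python
import math

def forward_loglik(pi0, P, dens):
    # implements alpha'_n = pi'_0 P D_1 ... P D_n in normalized form;
    # dens[n-1][i] = f_i(z_n), the diagonal entries of D_n
    K = len(pi0)
    alpha, loglik = list(pi0), 0.0
    for d in dens:
        alpha = [sum(alpha[i] * P[i][j] for i in range(K)) * d[j] for j in range(K)]
        rho = sum(alpha)                   # scaling constant rho_n
        alpha = [a / rho for a in alpha]   # normalized alpha_n^dagger
        loglik += math.log(rho)            # log-likelihood = sum of log rho_n
    return loglik, alpha
```

When every D_n is a scalar multiple of the identity the chain is irrelevant, so the log-likelihood reduces to the sum of the logarithms of those scalars; this gives a simple check on the recursion.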
10.3.8 Uniformization approximation for calculating matrix exponentials [Gross and Miller (1984)]. Show that if m > max_i{−q_ii} is chosen just larger than the maximum diagonal element, exp(Qt) can be represented in the form
    e^{Qt} = e^{−mt} e^{mtA},
where A denotes the nonnegative matrix Q/m + I, and hence that
    exp(Qt) = e^{−mt} Σ_{n=0}^∞ ((mt)^n/n!) A^n = Σ_{n=0}^∞ p_n(t) A^n,
where the {p_n(t)} are the probabilities in a Poi(mt) distribution. Sufficient iterates of the fixed matrix A, and a sufficient range of values of the Poisson probabilities, can then be computed to give an effective algorithm for determining values of the matrix exponential to high precision.
10.3.9 MPP extension of Cox process driven by Markov chain [see Examples 10.3(e) and 10.3(g)]. Let Q be the matrix of transition rates for a K-state Markov process X(t). Suppose that for k = 1, …, K, when X(t) = k, points are generated at rate λ_k with conditional mark distribution f_k(x). Write down the conditional intensity λ(t, x) of the corresponding MPP in terms of the current state ξ, and use it to find the complete likelihood for observations over an interval (0, T). Verify that the forward probabilities α_t(k) for the observations up to time t, given the sequence {(t_n, x_n): n = 1, …, N(t)}, can be represented as a matrix product α′_t = π′_0 R(0, t], where
    R(0, t] = J(t_1) Λ D(x_1) J(t_2 − t_1) Λ D(x_2) ⋯ J(t − t_{N(t)}),
and J(t) has the same interpretation as in (10.3.6b), Λ = diag(λ_1, …, λ_K), and D(x) = diag(f_1(x), …, f_K(x)). Use these representations to find a set of sufficient statistics for the complete likelihood, and the appropriate extension of the E- and M-steps of Example 10.3(g).
10.3.10 PH-distributions and their Laplace transforms.
(a) Write out the distribution (10.3.24) explicitly in the special case that Q is a 2 × 2 matrix.
(b) Show that both sums and mixtures of exponentials can be represented as PH-distributions, and find the elements in their representation.
[Hint: Restrict Q to the diagonal and superdiagonal; for a mixture of exponentials Q has pure diagonal form.]
(c) Show that the distribution (10.3.24) has Laplace transform
    ∫_0^∞ e^{−st} [1 − F(t)] dt = π′[sI − Q]^{−1} 1.
Use this representation to find the renewal function. Generalize to a matrix renewal function as in Example 10.3(a).
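A direct transcription of the uniformization algorithm of Exercise 10.3.8 follows (a Python sketch, not from the text). It assumes Q is a conservative generator with at least one q_ii < 0, and truncates the Poisson series once its accumulated mass reaches 1 − tol; since A = Q/m + I is then a stochastic matrix, the entrywise truncation error is at most tol.

```python
import math

def expm_uniformized(Q, t, tol=1e-12):
    # exp(Qt) = e^{-mt} sum_n ((mt)^n / n!) A^n with A = Q/m + I, as in Exercise 10.3.8
    K = len(Q)
    m = 1.01 * max(-Q[i][i] for i in range(K))          # m just larger than max{-q_ii}
    A = [[Q[i][j] / m + (1.0 if i == j else 0.0) for j in range(K)] for i in range(K)]
    power = [[float(i == j) for j in range(K)] for i in range(K)]   # A^0 = I
    p = math.exp(-m * t)                                # Poisson weight p_0(t)
    S = [[p * power[i][j] for j in range(K)] for i in range(K)]
    n, mass = 0, p
    while mass < 1.0 - tol:                             # accumulated Poisson mass
        n += 1
        power = [[sum(power[i][k] * A[k][j] for k in range(K)) for j in range(K)]
                 for i in range(K)]
        p *= m * t / n                                  # p_n(t) from p_{n-1}(t)
        mass += p
        for i in range(K):
            for j in range(K):
                S[i][j] += p * power[i][j]
    return S
```

Because every term in the series is nonnegative, the computation involves no cancellation, which is the main numerical advantage over a raw Taylor expansion of exp(Qt).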
10.3.11 Interval distributions and correlations for the stationary BMAP process. Find a matrix exponential (phase-type) representation for the time interval between one batch arrival and the next, first assuming the batches (of sizes ℓ_1 and ℓ_2, say) are associated with transitions into states k_1 and k_2 respectively, and then without making this assumption. Does this representation imply that the observed process can be regarded as a semi-Markov process with states ℓ = 1, …, L? Find also the correlations between successive intervals.
10.4. Markov Point Processes
Markov processes in time heavily influenced the growth of applied probability modelling in the second half of the twentieth century. Dobrushin (1968) successfully described a spatial Markov property, and the 1970s saw Hammersley and Clifford (unpublished, 1971), Moran (1973), and Besag (1974), for example, describing processes on two-dimensional lattices with a Markovian property, exploiting adjacency of points on a lattice to limit the range of stochastic dependence [Isham (1981) presents a broad review]. Ripley and Kelly (1977) gave a definitive description of Markov point processes, with an important sequel by Baddeley and Møller (1989) and subsequent expansion by Baddeley and co-workers; there is a consolidated account in van Lieshout (2000), and a broad exposition with examples in Møller and Waagepetersen (2004, Chapter 6 and Appendices F and G). Many mathematical properties of Gibbs distributions were anticipated earlier in statistical physics [Ruelle (1969, Chapter 3), Preston (1976)]. Georgii (1988) gives a probabilistic approach to Gibbs measures on multidimensional lattices.
The practical appeal of Markov models lies in the form of the joint probability distribution in many variables: it is expressible as the product of many conditional probabilities each in a small number of variables defined only on 'adjacent' time points. This then raises the possibility of specifying the model purely in terms of local conditional probabilities. The Papangelou conditional intensity function in Definition 10.4.I plays this role in point process modelling when coupled with an algebraic relationship property of 'neighbourliness' denoted below by ∼, although the paradigm example of a renewal process fails this relationship in general [see around (10.4.16)]. Consider then simple finite point processes on a c.s.m.s. X, often a bounded subset of R^2 or R^3.
Proposition 5.3.II gives a canonical space for finite point processes as the union X^∪ of all product spaces {X^(n): n = 0, 1, …} with generic element x = {x_1, …, x_n} for which n = card{x} = n(x). In Section 5.3 we used Janossy measures {J_n(·): n = 0, 1, …} to describe finite point processes; here we use their density functions {j_n(x): n = 1, 2, …} with respect to n-fold products of Lebesgue measure ℓ on X with ℓ(X) < ∞, symmetric as
around (5.3.1–2). It is sometimes more convenient to describe the distributions through the density function f = {f_n} with respect to the distribution π = {π_n} of a totally finite unit-rate Poisson process on the measurable space (X, B_X); if more generally this Poisson process has probability measure P_µ, then f is the likelihood of the process relative to P_µ as in (7.1.7). When X = R^d and µ = ℓ these two descriptions are virtually equivalent inasmuch as j_n(x)/f_n(x) = e^{−ℓ(X)} ℓ-a.e. (see Exercise 10.4.1). For Janossy densities, always symmetric, we have the interpretation as at (5.4.13), rewritten here in notation closer to the present setting, namely
    j_n(x) dx_1 ⋯ dx_n = Pr{exactly n points in a realization: one in each subset (x_i, x_i + dx_i) (i = 1, …, n), and none elsewhere}.      (10.4.1)
Even more generally, f (= a collection {f_n} of symmetric functions) may be a density with respect to any totally finite measure on the space X^{∪∗}. For example, if X is a finite set and we use counting measure on X as the reference measure, then for each n = 0, 1, …, #(X) we should want f_n(x) equal to n! times the probability mass associated with the point set x for which n = n(x). Realizations of finite point processes with densities as at (10.4.1) are a.s. simple, and for them we have x ∈ X^{∪∗} a.s., where X^{∪∗} denotes the subset of X^∪ containing in each component X^(n) only those x = {x_1, …, x_n} for which x_i ≠ x_j (i ≠ j). This description proves more convenient here than the integer-valued measures N used in Chapter 9, but they are equivalent because N(·) = Σ_{i=1}^{n(x)} δ_{x_i}(·), and N ∈ N*_X when x ∈ X^{∪∗} a.s. (cf. Proposition 9.1.X). We use x to denote both an element of X^{∪∗} and a subset of X. For y ∈ X we usually write x ∪ y rather than x ∪ {y}. For the difference we write variously x \ y = x \ {y} = x_y. An inclusion written y ⊂ x is strict.
With this understanding, a nonnegative measurable function f: X^{∪∗} → R_+ is a density function of a simple finite point process when f(x) = f_{n(x)}(x) for x ∈ X^{∪∗} and f is integrable as at (10.4.2b). For such a process it is convenient to write (with mixed notation), when the reference measure is π(·), for a measurable function g,
    E[g(N)] = ∫_{X^{∪∗}} g(x) f(x) π(dx) = Σ_{n=0}^∞ ∫_{X^(n)} g(x) f_n(x) π_n(dx),      (10.4.2a)
and f satisfies
    ∫_{X^{∪∗}} f(x) π(dx) ≡ Σ_{n=0}^∞ ∫_{X^(n)} f_n(x) π_n(dx) = 1.      (10.4.2b)
Often, the process of interest is on a subset of some Euclidean space R^d and it is regular in the sense of Definition 7.1.I, whereas the reference probability measure is a unit-rate Poisson process, in which case this Poisson process is on a compact subset of R^d.
Definition 10.4.I. Given a simple finite point process on X with a density f: X^{∪∗} → R_+ ≡ {f_n(x): x ∈ X^{(n)∗}, n = 1, 2, …}, the function
    ρ(y | x) = f_1(y)/f_0(∅)      (y ∈ X, x = ∅),
    ρ(y | x) = f_{n+1}(x ∪ y)/f_n(x) ≡ f(x ∪ y)/f(x)      (y ∈ X \ x, x ∈ X^{(n)∗}, n = 1, 2, …)      (10.4.3)
defines its Papangelou conditional intensity [set ρ(y | x) = 0 if f_n(x) = 0]. We remark that in view of our earlier comments, ρ(y | x) can just as easily be given in terms of Janossy densities: see Exercise 10.4.2.
Ripley and Kelly's definition of a Markov point process involves a concept of 'adjacency' of pairs of points and an analogue of a 'neighbourhood' based on such a notion of adjacency. This concept for points y, z ∈ X is embodied in some reflexive symmetric relation ∼ (meaning that y ∼ y and, for z ≠ y, y ∼ z if and only if z ∼ y), as, for example, y ∼ z if and only if |y − z| ≤ R for some finite positive R (but see Exercise 10.4.3). Any such relation ∼ defines a (∼)-neighbourhood (or just neighbourhood for short) b∼(y) of any y ∈ X by
    b∼(y) = {z ∈ X: z ∼ y}.      (10.4.4)
This definition is easily extended to sets y ∈ X^{∪∗} by setting b∼(y) = {z ∈ X: z ∼ y for some y ∈ y}.
Definition 10.4.II. A simple finite point process with density function f: X^{∪∗} → R_+ is a Markov point process if for every x with f(x) > 0 its Papangelou conditional intensity ρ(y | x) = f(x ∪ y)/f(x) satisfies
    ρ(y | x) = g(y, x ∩ b∼(y))      (y ∈ X \ x),      (10.4.5)
where g: X × X^{∪∗} → R_+. Call such f a Markov density function.
In other words, for f to be the density function of a Markov point process, we require that, for all x ∈ X^∪*, y ∈ X \ x and writing n = n(x), the (n + 1)-dimensional joint density function f_{n+1}(x ∪ y) be expressible as a product of the n-dimensional joint density function f_n(x) and some function g(·, ·) that depends only on y and those elements of x that lie in the (∼)-neighbourhood b^∼(y) of the 'extra' point y [i.e., g(y, ·) is independent of all elements of x and X that are not (∼)-neighbours of y]. In particular, for y, z ∈ x such that z ≁ y (hence, z ∉ b^∼(y)), we have x_y ∩ b^∼(y) = (x_z)_y ∩ b^∼(y). Thus, for such y and z, the relations

$$ \frac{f(\mathbf{x})}{f(\mathbf{x}_y)} = g\bigl(y,\ \mathbf{x}_y \cap b^{\sim}(y)\bigr) = g\bigl(y,\ (\mathbf{x}_z)_y \cap b^{\sim}(y)\bigr) = \frac{f(\mathbf{x}_z)}{f\bigl((\mathbf{x}_z)_y\bigr)} \qquad \text{if } z \not\sim y \qquad (10.4.6) $$

hold for the density function f of a Markov point process, so that

$$ f(\mathbf{x}) = \frac{f(\mathbf{x}_y)\, f(\mathbf{x}_z)}{f\bigl((\mathbf{x}_z)_y\bigr)} \qquad (y, z \in \mathbf{x},\ z \not\sim y). \qquad (10.4.6') $$
10.4. Markov Point Processes
In (10.4.6), f(x) is a compact notation for f_{n(x)}(x), so f(x_y) = f_{n(x_y)}(x_y) = f_{n(x)−1}(x_y). Theorem 10.4.V shows that there are far-reaching consequences of this condition, which states that the conditional intensity of adding an 'extra' point y to a set x_y is independent of any point z that is not in the neighbourhood b^∼(y).

Example 10.4(a). An inhomogeneous Poisson process with density µ(·) has j_n(x) = ∏_{x_i ∈ x} µ(x_i), so ρ(y | x) = f(x ∪ y)/f(x) = µ(y), which, being independent of x, is clearly of the required form (10.4.5) for the process to be a Markov point process.

A major property of Markov point processes is the Hammersley–Clifford Representation Theorem 10.4.V below. It is important practically because it expresses the joint density function of a point set x as a product of (conditional) probability density functions of many smaller subsets y ⊂ x. When it was first proved, the result also identified the classes of Markov random fields on the one hand with Gibbs states with nearest-neighbour potentials on the other, the latter being already well known in statistical physics [cf. Clifford (1990)]. Specifically, (10.4.7) expresses the density function f of a Markov point process as a product of terms involving another function φ: X^∪* → R+ which is simpler than f in that φ(x) = 1 as soon as the set x includes a pair of distinct elements, y, z say, for which y ≁ z. In other words, φ(x) can differ from 1 only when x is a clique as defined in 10.4.III(a) below.

Definition 10.4.III (Cliques). Let x, y be finite nonempty subsets of X (equivalently, x, y ∈ X^∪*), and ∼ a reflexive symmetric relation on elements of X.
(a) y is a clique [write (∼)-clique if distinction is needed] if it is the empty set or a singleton set or else y ∼ z for every two-point subset {y, z} ⊆ y.
(b) A clique y ⊂ x is a maximal clique of the set x when y ∪ {z} is not a clique for every z ∈ x \ y.
(c) Clq(x) [or (∼)-Clq(x) if needed] is the family of all cliques y ⊆ x.
To understand cliques better, we list below some of their properties, leaving their proofs to Exercise 10.4.5. In this list, ∼ is a reflexive symmetric binary relation on elements of the c.s.m.s. X containing points y, z, … and finite nonempty subsets x, y, … . Note [compare (v) and (vi)] that cliques do not in general yield equivalence relations (this was wrongly claimed on p. 219 of the first printing of Volume I).
(i) The empty set and all one-point sets {y} are cliques.
(ii) The two-point set {y, z} with y ≠ z is a clique if and only if y ∼ z.
(iii) If y ⊂ x and x is a clique, then so also is y.
(iv) ∼ is transitive within a clique.
(v) Distinct maximal cliques can overlap when ∼ is not transitive.
(vi) If ∼ is transitive then distinct maximal cliques cannot overlap, and the maximal cliques that are subsets of x provide a decomposition of x into equivalence classes.

Definition 10.4.IV. Let h, φ be nonnegative real-valued functions on X^∪*.
(a) A family H of subsets of X^∪* is hereditary if x ∈ H implies y ∈ H whenever y ⊂ x.
(b) h is an hereditary function if h(x) > 0 implies h(y) > 0 for every y ⊂ x.
(c) φ is a (∼)-interaction function if φ(x) = 1 whenever x is not a (∼)-clique.

Møller and Waagepetersen (2004, Example F.2) describe a process defined on a space with the hereditary family H a strict subset of X^∪* [i.e., H ⊊ X^∪*].

The following theorem is a key result for Markov point processes. Its first version is unpublished [see Besag (1974), the discussion by its originators there, and Clifford (1990)]; this version is due largely to Ripley and Kelly (1977).

Theorem 10.4.V (Hammersley–Clifford Representation). A probability density function f: X^∪* → R+ is the density function of a Markov point process if and only if there is a (∼)-interaction function φ such that, for nonempty sets x,

$$ f(\mathbf{x}) = \prod_{\mathbf{y} \subseteq \mathbf{x}} \varphi(\mathbf{y}) = \prod_{\mathbf{z} \in \mathrm{Clq}(\mathbf{x})} \varphi(\mathbf{z}) \qquad (\mathbf{x} \in \mathcal{X}^{\cup*}). \qquad (10.4.7) $$
Remark. Because ∅ ∈ Clq(x), φ(∅) appears as a factor of f(x) for every x, and hence acts as a multiplicative normalizing constant in this representation.

Proof. For a (∼)-interaction function φ, φ(z) = 1 when z is not a clique, so in (10.4.7) the second equality is trivial and only the first needs proof.

Given φ, define f̃(·) by either product in (10.4.7), and suppose that f̃ is integrable and hence can be and is normalized to be a density function on X^∪*. Then for x with f̃(x) > 0 and z ∈ X \ x,

$$ \tilde f(\mathbf{x} \cup z)/\tilde f(\mathbf{x}) = \prod_{\mathbf{y} \subseteq \mathbf{x}} \varphi(\mathbf{y} \cup z). $$

Because φ is an interaction function, it is possible for φ(y ∪ z) ≠ 1 only when y ∪ z is a clique, so that y ∪ z ⊆ b^∼(z); hence

$$ \prod_{\mathbf{y} \subset \mathbf{x},\ \mathbf{y} \cup z \in \mathrm{Clq}(\mathbf{x})} \varphi(\mathbf{y} \cup z) = g\bigl(z,\ \mathbf{x} \cap b^{\sim}(z)\bigr) $$

for some function g. We thus have the form at (10.4.5), and f̃ as defined is the density function of a Markov point process.

Conversely, let f be the density function of a Markov point process, and define ψ iteratively by

$$ \psi(\mathbf{x}) = \begin{cases} f(\emptyset) & \text{if } \mathbf{x} = \emptyset,\\ 1 & \text{if nonempty } \mathbf{x} \text{ is not a clique},\\ f(\mathbf{x}) \Big/ \prod_{\mathbf{y} \subset \mathbf{x}} \psi(\mathbf{y}) & \text{otherwise}, \end{cases} \qquad (10.4.8) $$

taking 0/0 = 1 if need be. Then ψ is an interaction function, and it remains to show that the representation (10.4.7) holds with φ = ψ. Suppose x is given, with n(x) = r ≥ 2, and that (10.4.7) has been proved for all x′ with n(x′) ≤ r − 1; note that it is true for r − 1 = 1.
First, if f(x) = 0 and ∏_{y⊂x} ψ(y) = 0, then ∏_{y⊆x} ψ(y) = 0 also and (10.4.7) holds.

Next, if ∏_{y⊂x} ψ(y) > 0, then for any z ⊂ x (hence, n(z) ≤ r − 1), we have f(z) = ∏_{y⊆z} ψ(y) > 0 because y ⊆ z ⊂ x, and the right-hand side of (10.4.6) is positive. If also f(x) = 0, then the left-hand side of (10.4.6) is zero, so we have reached a contradiction if x is not a clique, whereas if x is a clique then by the last case of (10.4.8), (10.4.7) holds.

Finally, when both f(x) > 0 and ∏_{y⊂x} ψ(y) > 0, either x is a clique and by the last case of (10.4.8), (10.4.7) holds, or else x is not a clique and therefore there exist y, z ∈ x such that z ≁ y. Because f is the density function of a Markov point process, (10.4.6′) holds with the arguments in its right-hand side there having at most r − 1 points, so that

$$ f(\mathbf{x}) = \frac{f(\mathbf{x}_y)\, f(\mathbf{x}_z)}{f\bigl((\mathbf{x}_z)_y\bigr)} = \frac{\prod_{\mathbf{w} \subseteq \mathbf{x}_y} \psi(\mathbf{w})\ \prod_{\mathbf{w} \subseteq \mathbf{x}_z} \psi(\mathbf{w})}{\prod_{\mathbf{w} \subseteq (\mathbf{x}_z)_y} \psi(\mathbf{w})}. \qquad (10.4.9) $$

Now ψ(x) = 1 because x is not a clique, and for any other w ⊂ x, either (i) w contains both y and z [and ψ(w) = 1 because it is not a clique], or (ii) w contains neither, in which case w ⊆ (x_y)_z, or (iii) w contains exactly one of y and z, so it is of the form w′ ∪ y or w′ ∪ z for w′ ⊆ (x_y)_z. These possibilities and facts imply that the right-hand side of (10.4.9) equals ∏_{w⊆x} ψ(w); that is, (10.4.7) holds when n(x) = r.

In typical applications, the Papangelou conditional intensity or the clique density function φ(·) may be known, in particular when n(x) is 'small' for most, if not all, cliques x.

Example 10.4(b) Strauss process [continued from Example 7.1(c) and Exercise 7.1.8]. In the notation of this chapter, the Janossy density of the Strauss model of Example 7.1(c), for which x ∼ y if and only if ‖x − y‖ ≤ R, is given, for 0 < β < ∞, 0 < γ ≤ 1, and α a normalizing constant, by

$$ j_n(\mathbf{x}) = \alpha \beta^{n(\mathbf{x})} \gamma^{m(\mathbf{x},R)}, $$

where m(x, R) is the number of distinct pairs of elements y, z ∈ x for which y ∼ z.
Then

$$ \rho(y \mid \mathbf{x}) = \beta^{n(\mathbf{x} \cup y) - n(\mathbf{x})}\, \gamma^{m(\mathbf{x} \cup y, R) - m(\mathbf{x}, R)} = \beta \gamma^{n(\mathbf{x} \cap S_R(y))}, $$

the exponent of γ being equal to the number of elements of x within distance R of y. Then ρ(y | x) is of the required form (10.4.5) for a Markov process, and therefore the representation (7.1.5) for the Janossy density follows from the Hammersley–Clifford theorem.

Kelly and Ripley (1976) showed that the Strauss process is uniquely characterized by two properties: its density function is hereditary, and its Papangelou conditional intensity is of the form, for x ∈ X^∪* and y ∈ X \ x,

$$ \rho(y \mid \mathbf{x}) = g\bigl(n(\mathbf{x} \cap S_R(y))\bigr), $$

where g: Z+ → R+ and S_R(y) is the closed ball with centre y and radius R.
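The identity ρ(y | x) = βγ^{n(x ∩ S_R(y))} can be checked directly against the ratio of unnormalized Strauss densities, since the normalizing constant α cancels. A minimal numerical sketch (function names and parameter values are ours, not the book's):

```python
import numpy as np

def m_pairs(x, R):
    """m(x, R): number of distinct pairs y, z in x with ||y - z|| <= R."""
    x = np.asarray(x)
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return int(np.sum(np.triu(d <= R, k=1)))

def strauss_density(x, beta, gamma, R):
    """Unnormalized Strauss density beta^n(x) * gamma^m(x,R); alpha cancels in ratios."""
    return beta ** len(x) * gamma ** m_pairs(x, R)

def papangelou(y, x, beta, gamma, R):
    """rho(y | x) = f(x u y) / f(x), computed from the density ratio."""
    return strauss_density(list(x) + [y], beta, gamma, R) / strauss_density(x, beta, gamma, R)

beta, gamma, R = 2.0, 0.5, 1.0
x = [(0.0, 0.0), (0.5, 0.0), (3.0, 3.0)]
y = (0.0, 0.4)   # within distance R of the first two points only
n_close = sum(np.hypot(y[0] - a, y[1] - b) <= R for a, b in x)
print(papangelou(y, x, beta, gamma, R), beta * gamma ** n_close)   # both 0.5
```

The exponent of γ in the ratio is exactly the number of points of x within distance R of the added point y, as the displayed identity asserts.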
Example 10.4(c) Area-interaction point process [Baddeley and van Lieshout (1995), van Lieshout (2000, Section 4.3)]. Suppose a simple finite point process on a compact subset X ⊂ R^d has density with respect to a Poisson process at unit rate on X given by

$$ f(\mathbf{x}) = \alpha \beta^{n(\mathbf{x})} \gamma^{-\ell(\mathcal{X} \cap U_R(\mathbf{x}))} \qquad (\mathbf{x} \in \mathcal{X}^{\cup*};\ \alpha, \beta, \gamma > 0), \qquad (10.4.10) $$

where ℓ denotes Lebesgue measure on R^d, U_R(x) = ⋃_{i=1}^{n(x)} S_R(x_i) is the union of n(x) spheres with centres x_i ∈ x and common radius R > 0, and α is a normalizing constant. In this form it is also called the penetrable spheres model, used by Widom and Rowlinson (1970) and others to study liquid–vapour equilibrium questions (note also the next example); the model in R is tractable as a Kingman regenerative phenomenon [Hammersley et al. (1975)]. Because f(·) at (10.4.10) is a density function with respect to a unit-rate Poisson distribution, Pr{N(X) = n} lies between

$$ e^{-\ell(\mathcal{X})}\, \frac{[\ell(\mathcal{X})]^n}{n!}\, \alpha \beta^n \gamma^{-\ell(\mathcal{X})} \qquad \text{and} \qquad e^{-\ell(\mathcal{X})}\, \frac{[\ell(\mathcal{X})]^n}{n!}\, \alpha \beta^n, $$

hence α lies between (e^{β−1}/γ)^{−ℓ(X)} and e^{−(β−1)ℓ(X)}, so N(X) is finite-valued a.s. Thus f is indeed the density function of a simple finite point process, being Poisson when γ = 1. Its Papangelou conditional intensity is given by

$$ \rho(y \mid \mathbf{x}) = \beta \gamma^{-\ell(\mathcal{X} \cap [S_R(y) \setminus U_R(\mathbf{x})])}, $$

and because the set difference here is a function of R, y and those x_i ∈ x for which |y − x_i| < 2R, a function g(y, x ∩ S_{2R}(y)) can be constructed to satisfy (10.4.5); that is, a process with density (10.4.10) is a Markov point process, with x ∼ y if and only if |x − y| ≤ 2R. It is attractive or repulsive as γ ≥ 1 or γ ≤ 1, respectively, and Poisson for γ = 1, where a simple point process with Papangelou conditional intensity ρ is called attractive (respectively, repulsive) whenever, for all x, y with x ⊂ y and z ∉ y, ρ(z | x) ≤ ρ(z | y) [respectively, ρ(z | x) ≥ ρ(z | y)].

Baddeley and van Lieshout allow a general version of this model by replacing Lebesgue measure in the exponent of (10.4.10) by a totally finite regular Borel measure ν, say, and the spheres S_R(x_i) by compact sets S(x_i) ⊂ X, where the map x ↦ ν(S(x)) is continuous and bounded. They also show how a birth-and-death process can have the model as a stationary distribution. Observe that for γ < 1 the process is not as 'aggressively' repulsive as the Strauss process, in which the exponent of γ can increase quadratically in n(x), whereas in (10.4.10) it changes at most linearly in n(x), with such changes becoming closer to zero with greater overlap between different spheres S_R(x_i) as n(x) increases.

Example 10.4(d) Penetrable spheres mixture model [Widom and Rowlinson (1970), van Lieshout (2000, Examples 2.8, 2.11)]. Consider a bivariate point process (x_1, x_2) constructed on a bounded set X by superposing two independent Poisson processes N_j at rates β_j (j = 1, 2) subject to every x ∈ x_1 being
at a minimum distance R from every y ∈ x_2; that is, d(x_1, x_2) > R, where for nonempty finite point sets x, y, d(x, y) = min_{x∈x, y∈y} d(x, y). Then the density relative to a unit-rate Poisson process on X equals

$$ f(\mathbf{x}_1, \mathbf{x}_2) = \alpha \beta_1^{n(\mathbf{x}_1)} \beta_2^{n(\mathbf{x}_2)}\, I\{d(\mathbf{x}_1, \mathbf{x}_2) > R\} \qquad (10.4.11) $$

for some normalizing constant α. This density is positive when, given x_1, all n(x_2) of the points of the second component avoid the union U_R(x_1) [cf. (10.4.10)] of circles of radius R around all the points of the first component. It follows, using Poisson process properties, that the marginal density of the first component equals

$$ \alpha \beta_1^{n(\mathbf{x}_1)} \sum_{i=0}^{\infty} \frac{e^{-\ell(\mathcal{X})}}{i!} \bigl[\ell\bigl(\mathcal{X} \setminus U_R(\mathbf{x}_1)\bigr)\bigr]^i \beta_2^i = \alpha \beta_1^{n(\mathbf{x}_1)} e^{(\beta_2 - 1)\ell(\mathcal{X})} e^{-\beta_2 \ell(\mathcal{X} \cap U_R(\mathbf{x}_1))}. $$

Thus, the marginal distributions in this mixture model are just the cases 1 < γ = e^{β_2} or e^{β_1} of the area-interaction model of Example 10.4(c).

Given a space X, a multitude of possible symmetric reflexive relations can be defined. When two such relations ∼^i (i = 1, 2) are given, the intersection relation ∼^∩, defined by y ∼^∩ z if and only if both y ∼^1 z and y ∼^2 z, and the union relation ∼^∪, defined by y ∼^∪ z if and only if at least one of y ∼^1 z and y ∼^2 z holds, are both well-defined symmetric reflexive relations. When the relations ∼^1 and ∼^2 are ordered in the sense that (say) y ∼^1 z implies y ∼^2 z for all y, z, it follows that we can identify ∼^∩ and ∼^∪ with ∼^1 and ∼^2, respectively.

Now suppose that {f′_n} and {f″_n} are Markov density functions for point sets on X with respect to ∼^1 and ∼^2, respectively, and that

$$ \frac{1}{c} = \int_{\mathcal{X}^{\cup*}} f'(\mathbf{x}) f''(\mathbf{x})\, \pi(d\mathbf{x}) < \infty \qquad (10.4.12) $$

for some finite constant c > 0, so that cf = cf′f″ is a density function, where f(·) = {f_n(·)} = {f′_n(·) f″_n(·)} and f(x) = f′_{n(x)}(x) f″_{n(x)}(x).

Proposition 10.4.VI. Let f = f′f″ be the product of two Markov density functions. When cf is a density function for some finite positive c, it is a Markov density function.

Proof. The Hammersley–Clifford representation applied to the densities f′ and f″ for n = n(x) implies that

$$ f_n(\mathbf{x}) = \prod_{\mathbf{y} \in (\sim^1)\text{-Clq}(\mathbf{x})} \varphi'(\mathbf{y}) \prod_{\mathbf{z} \in (\sim^2)\text{-Clq}(\mathbf{x})} \varphi''(\mathbf{z}), $$

where φ′ and φ″ are the interaction functions determined by the densities f′ and f″. Observe that if we define the function φ on X^∪* by

$$ \varphi(\mathbf{x}) = \begin{cases} \varphi'(\mathbf{x})\varphi''(\mathbf{x}) & \mathbf{x} \text{ is a } (\sim^{\cap})\text{-clique},\\ \varphi''(\mathbf{x}) & \mathbf{x} \text{ is a } (\sim^2)\text{-clique but not a } (\sim^1)\text{-clique},\\ \varphi'(\mathbf{x}) & \mathbf{x} \text{ is a } (\sim^1)\text{-clique but not a } (\sim^2)\text{-clique},\\ 1 & \text{otherwise [i.e., } \mathbf{x} \text{ is not a } (\sim^{\cup})\text{-clique]}, \end{cases} $$
then φ is a (∼^∪)-interaction function and the function f is expressible as in equation (10.4.7). The Hammersley–Clifford theorem now implies that there exists some finite positive c such that cf is the density function of a Markov point process on X.

Proposition 10.4.VI enables us to extend the Strauss model in such a way as to allow degrees of interaction that may depend on the distance between points, as sketched in Exercise 10.4.6. Such a model is still a Gibbs model.

Example 10.4(e) Spatial birth-and-death process [Preston (1977); see also van Lieshout (2000, pp. 83–87)]. This is a continuous-time space–time Markov process with state space X^∪* satisfying the following.
(a) The only transitions are 'births' (x → x ∪ y) and 'deaths' (x ∪ y → x), where x ∈ X^∪* and y ∈ X \ x.
(b) The probability of more than one transition in (t, t + h) is o(h).
(c) Given the state x at t, the probability of a death x → x \ y (y ∈ x) during (t, t + h) equals D(x \ y, y)h + o(h), where D(·, ·): X^∪* × X → R+ is a B(X^∪*) × B(X)-measurable function.
(d) Given the state x at t, the probability of a birth x → x ∪ y in (t, t + h), where y ∈ F ∈ B(X), equals B(x, F)h + o(h), where B(x, ·) is a finite measure on (X, B(X)).

Assume that B(x, ·) has a density b(x, ·) with respect to the finite measure λ(·) on (X, B(X)), so that, intuitively, b(x, y) is the transition rate for a birth x → x ∪ y. Let f be a Markov function that is the density of a finite point process on X. Ripley (1977) observed that if there exists a spatial birth-and-death process such that whenever f(x ∪ y) > 0 the detailed balance relation

$$ b(\mathbf{x}, y) f(\mathbf{x}) = D(\mathbf{x}, y) f(\mathbf{x} \cup y) > 0 \qquad (\mathbf{x} \in \mathcal{X}^{\cup*}) \qquad (10.4.13) $$

holds, then the birth-and-death process is indecomposable and time-reversible, and its unique equilibrium distribution is the point process with density f.
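Ripley's observation suggests a direct simulation recipe: take unit per-point death rate D ≡ 1 and birth density b(x, y) = ρ(y | x), so that (10.4.13) holds with f the (unnormalized) Strauss density of Example 10.4(b). The following sketch assumes γ ≤ 1, so that the constant β dominates ρ and births can be realized by thinning; all function names and parameter values are ours, not the book's:

```python
import numpy as np

rng = np.random.default_rng(1)

def n_close(y, x, R):
    """Number of points of x within distance R of y."""
    return sum(np.hypot(y[0] - a, y[1] - b) <= R for a, b in x)

def simulate_bd(beta, gamma, R, t_max, side=1.0):
    """Spatial birth-and-death chain on [0, side]^2 with unit per-point death
    rate and birth density b(x, y) = rho(y|x) = beta * gamma^{n(x cap S_R(y))},
    realized by thinning a dominating rate-(beta*side^2) birth stream (gamma <= 1).
    By detailed balance (10.4.13), the Strauss density is the equilibrium."""
    x, t = [], 0.0
    while True:
        rate = beta * side ** 2 + len(x)        # dominating birth rate + total death rate
        t += rng.exponential(1.0 / rate)
        if t > t_max:
            return x
        if rng.uniform() < beta * side ** 2 / rate:        # proposed birth
            y = tuple(rng.uniform(0.0, side, 2))
            if rng.uniform() < gamma ** n_close(y, x, R):  # accept w.p. rho(y|x)/beta
                x.append(y)
        else:                                              # death of a uniformly chosen point
            x.pop(rng.integers(len(x)))

# gamma = 1 reduces to a Poisson process with mean beta*side^2 points;
# gamma < 1 is repulsive and yields fewer points on average.
runs = [len(simulate_bd(2.0, 1.0, 0.1, 50.0)) for _ in range(200)]
print(np.mean(runs))   # close to 2.0
```

Rejected birth proposals leave the state unchanged, which is what makes the thinning construction valid; with γ = 1 the equilibrium count is Poisson with mean β·side², giving a quick sanity check on the simulation.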
Existence and convergence are guaranteed by the following amalgamation of Preston's (1977) Proposition 5.1 and Theorem 7.1 [see, e.g., Preston or van Lieshout (2000) for the proof].

Proposition 10.4.VII. Let B(·, ·): X^∪* × B(X) → R+ and D(·, ·): X^∪* × X → R+ be such that B(x, ·) is a finite measure on (X, B(X)) for each x ∈ X^∪*, B(·, F) is B(X^∪*)-measurable for each F ∈ B(X), and D(·, ·) is B(X^∪*) × B(X)-measurable. Define

$$ \beta_n = \sup_{\mathbf{x} \in \mathcal{X}^{(n)*}} B(\mathbf{x}, \mathcal{X}), \qquad \delta_n = \inf_{\mathbf{x} \in \mathcal{X}^{(n)*}} \sum_{y \in \mathbf{x}} D(\mathbf{x} \setminus y, y). \qquad (10.4.14) $$
Suppose that either (a) βn = 0 for all sufficiently large n ≥ 0 and δn > 0 for all n ≥ 1; or
(b) βn > 0 for all n ≥ 0, δn > 0 for all n ≥ 1, and

$$ \sum_{n=1}^{\infty} \frac{\beta_0 \cdots \beta_{n-1}}{\delta_1 \cdots \delta_n} < \infty, \qquad \sum_{n=1}^{\infty} \frac{\delta_1 \cdots \delta_n}{\beta_1 \cdots \beta_n} = \infty. \qquad (10.4.15) $$
Then there exists a unique spatial birth-and-death process for which B and D are the transition rates (the backward equations involving B and D have a unique solution). The process converges in distribution as t → ∞ to its unique equilibrium measure, independent of the initial state.

When either the birth- or the death-rate is constant, the equilibrium distribution is an area-interaction process (see Exercise 10.4.8).

Powerful as it is, Ripley and Kelly's definition of a Markov point process, Definition 10.4.II, poses problems for a stationary renewal process which, seemingly, should be the simplest nontrivial case of a Markov point process on R. To see this, suppose that on the interval (0, t) there are n points x = {x_i: i = 1, …, n} with 0 < x_1 < ⋯ < x_n < t, say, coming from a stationary renewal process whose lifetime d.f. F has support (0, a), density function f, and finite mean lifetime λ^{−1} = ∫_0^∞ x f(x) dx [so ∫_0^a f(x) dx = F(a) = 1 > F(a − h) for any h > 0]. Then it is a standard result (see, e.g., Exercise 7.2.3) that the Janossy density function j_n(x) is given by

$$ j_n(\mathbf{x}) = \lambda [1 - F(x_1)] \prod_{i=1}^{n-1} f(x_{i+1} - x_i)\, [1 - F(t - x_n)]. \qquad (10.4.16) $$
Consequently, for any x such that j_n(x) > 0 [and then, necessarily, max{x_1, max_{2≤i≤n}(x_i − x_{i−1}), t − x_n} ≤ a], we have

$$ \frac{j_{n+1}(\mathbf{x} \cup y)}{j_n(\mathbf{x})} = \begin{cases} [1 - F(y)]\, f(x_1 - y)/[1 - F(x_1)] & \text{if } y < x_1,\\ f(y - x_{i-1})\, f(x_i - y)/f(x_i - x_{i-1}) & \text{if } x_{i-1} < y < x_i,\ i = 2, \ldots, n,\\ f(y - x_n)\,[1 - F(t - y)]/[1 - F(t - x_n)] & \text{if } y > x_n. \end{cases} \qquad (10.4.17) $$

It is evident from (10.4.17) that if such a process is to be Markovian in terms of some 'adjacency' relation ∼, and therefore have a Hammersley–Clifford representation as in Theorem 10.4.V, then cliques on which interaction functions are defined can have at most two elements of a set x, and these elements must be nearest neighbours either to the right or to the left of a given element x ∈ x. Furthermore, even supposing that {x_i, x_{i+1}} is a clique in x, adjoining a point y ∉ x for which x_i < y < x_{i+1} would then change the status of {x_i, x_{i+1}} so that it would no longer be a clique. Within the setting of the earlier part of this section, this 'argument' suggests that a stationary renewal process does not generally fit the Ripley–Kelly definition of a Markov point set. However, a word of caution is apposite: a Poisson process in R is a renewal process,
and it also satisfies the Ripley–Kelly definition (because of the complete independence property).

We turn our attention therefore to the relation y ∼ z (or its negation) used earlier between elements y, z ∈ x ⊂ X. In all the examples noted here, and in the literature generally, the relation is in fact defined for any pair y, z ∈ X, independent of any set x to which either (or both or neither) may belong. The setting of a renewal process suggests we restrict attention to describing the elements of the pair {y, z} as satisfying a reflexive symmetric relation y ∼^x z only when the pair is a subset of x, and that the relation may depend crucially on x in the sense that we may have y ∼^x z for y, z ∈ x but, although for w ∉ x we necessarily have x ⊂ x ∪ {w}, it need not be the case that y ∼^{x∪w} z. For example, when y, z ∈ x ⊂ R and y ∼^x z means that z is the nearest right- or left-neighbour of y from the set x, incrementing the set x by a point w lying between y and z destroys this nearest right- or left-hand neighbour property.

Suppose then that such a reflexive symmetric relation ∼^x is defined for all two-point subsets of x ∈ X^∪*, subject to x being in an hereditary family H (see Definition 10.4.IV). When for y, z ∈ x the property y ∼^x z holds, say that y and z are neighbours within x, or (∼^x)-adjacent. For y ⊂ x the (∼^x)-neighbourhood is

$$ b_{\mathbf{x}}(\mathbf{y}) = \{z \in \mathbf{x}: z \sim^{\mathbf{x}} y \text{ for some } y \in \mathbf{y}\}. \qquad (10.4.18) $$

The set y is a (∼^x)-clique if y ∼^x z for all y, z ∈ y, and the (∼^x)-clique indicator function is

$$ I^{\mathbf{x}}(\mathbf{y}) = \begin{cases} 1 & \text{if } \mathbf{y} \subset \mathbf{x} \text{ is a } (\sim^{\mathbf{x}})\text{-clique},\\ 0 & \text{otherwise} \end{cases} \qquad (10.4.19) $$

[for strict analogy with (10.4.4) the notation b^{∼x}(y) and I^{∼x}(y) would be used here]. Using this notation, we have for example

I^x({y, z}) = 1 if and only if y ∼^x z.

Observe the status of the sets y and x in (10.4.18–19): y provides the points that are 'targeted' for adjacency, and x the 'environment' within which 'adjacency' is defined via the reflexive symmetric relation ∼^x.

Notice that for ∼^x, but not for ∼, expanding x can destroy the (∼^x)-adjacency of points y, z ∈ x (i.e., there can exist u ∉ x such that y ∼^x z but y ≁^{x∪u} z).
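The set-dependence just described is easy to verify mechanically for the nearest right- or left-neighbour relation on R; a minimal sketch (the helper name is ours):

```python
def nn_related(y, z, x):
    """y ~^x z: z equals y, or z is the nearest left- or right-neighbour of y within the set x."""
    xs = sorted(x)
    i = xs.index(y)
    neighbours = {xs[j] for j in (i - 1, i + 1) if 0 <= j < len(xs)}
    return y == z or z in neighbours

x = {1.0, 4.0}
print(nn_related(1.0, 4.0, x))          # True: 4.0 is 1.0's right-neighbour in x
print(nn_related(1.0, 4.0, x | {2.0}))  # False: the added point 2.0 intervenes
```

Because the relation is adjacency in the sorted order, it is symmetric and reflexive as required, yet enlarging the environment x can destroy it.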
Definition 10.4.VIII. A function f: H → R+ is a (∼^x)-Markov function if, for all x in the hereditary class H, the function f is hereditary, and for y ∈ X with x ∪ y ∈ H and f(x) > 0, the ratio ρ(y | x) depends only on y, b_{x∪y}(y), and the relations ∼^{x∪y} and ∼^x restricted to the neighbourhood set b_{x∪y}(y).

Recall that the Hammersley–Clifford Theorem 10.4.V gives a representation of joint densities in terms of simpler 'interaction' functions. For this set-dependent adjacency relation ∼^x the analogous function, and the extended theorem, are as follows; the proof is similar to that of Theorem 10.4.V and can be found in Baddeley and Møller (1989).
Definition 10.4.IX. Let the function φ: H → R+ be hereditary and such that for y ∉ x ∈ H, if φ(x) > 0 and φ(b_{x∪y}(y)) > 0, then φ(x ∪ y) > 0. A (∼^x)-interaction function is a function Φ defined in terms of such a φ by a relation of the form

$$ \Phi(\mathbf{y} \mid \mathbf{x}) = \varphi(\mathbf{y})^{I^{\mathbf{x}}(\mathbf{y})}, \qquad \text{where } 0^0 = 0. \qquad (10.4.20) $$
Theorem 10.4.X (Hammersley–Clifford, extended). Let the relation ∼^x satisfy the consistency conditions, for finite x ∈ H, w ⊂ z ∈ H, y, z ∈ X but y, z ∉ z, and x = z ∪ {y, z} ∈ H:
(C.1) I^z(w) ≠ I^{z∪{y}}(w) implies w ⊂ b_{z∪{y}}(y); and
(C.2) when y ∼^x z, I^{z∪{y}}(w) + I^{z∪{z}}(w) = I^z(w) + I^x(w).
Then f is a (∼^x)-Markov function if and only if

$$ f(\mathbf{x}) = \prod_{\mathbf{y} \subseteq \mathbf{x}} \Phi(\mathbf{y} \mid \mathbf{x}) \qquad (10.4.21) $$

for all x ∈ H, where Φ is a (∼^x)-interaction function.

Baddeley, van Lieshout and Møller (1996) considered Poisson cluster processes and showed them to be nearest-neighbour Markov processes as above when the clusters are uniformly bounded, or when the cluster centre process is Markov or nearest-neighbour Markov and the clusters are both uniformly bounded and a.s. nonempty. Thus, the nearest-neighbour Markov property is preserved under random translation but not under random thinning. Other extensions of ∼ have been suggested: Ord's process in Exercise 10.4.10 gives one; Chin and Baddeley (1999, 2000) looked first at a relation based on components exhibiting pairwise connectivity and then at interactions between components of point configurations; and van Lieshout (2006a) has considered a sequential definition in association with space–time processes.
Exercises and Complements to Section 10.4

10.4.1 Suppose that a simple finite point process on (a subset of) R^d has Janossy density {j_n(x)}, and that its density with respect to an inhomogeneous Poisson process with intensity λ(x) (x ∈ X) is {f_n(x)}. Show that j_n(x) = e^{−Λ(X)} f_n(x) ∏_{x_i ∈ x} λ(x_i), where Λ(X) = ∫_X λ(u) du.
10.4.2 Let the finite point process on X have Janossy densities {j_n(·)} as in Sections 5.4 and 7.1. Then the Papangelou conditional intensity of Definition 10.4.I is expressible, for some finite c > 0, as

$$ \rho(y \mid \mathbf{x}) = \begin{cases} c\, j_1(y) & (y \in \mathcal{X},\ \mathbf{x} = \emptyset),\\ j_{n+1}(\mathbf{x} \cup y)/j_n(\mathbf{x}) & (y \in \mathcal{X} \setminus \mathbf{x},\ \mathbf{x} \in \mathcal{X}^{(n)*},\ n = 1, 2, \ldots). \end{cases} $$
10.4.3 Suppose the reflexive symmetric relation ∼ of a Markov point process is given by y ∼ z if and only if |y − z| ≤ R for R = 0. Because the only cliques are then singletons, deduce that the point process must be Poisson. 10.4.4 Verify the properties (i)–(vi) of cliques listed after Definition 10.4.III, providing in particular a counterexample to property (v).
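A brute-force search over subsets supplies the counterexample to property (v) requested in Exercise 10.4.4; a sketch using the relation y ∼ z iff |y − z| ≤ R on the line (function names are ours):

```python
from itertools import combinations

def is_clique(points, rel):
    """A set is a clique if every two-point subset is related (empty/singleton sets trivially so)."""
    return all(rel(y, z) for y, z in combinations(points, 2))

def cliques(x, rel):
    """All subsets of x that are cliques: the family Clq(x)."""
    out = []
    for r in range(len(x) + 1):
        out.extend(s for s in combinations(x, r) if is_clique(s, rel))
    return out

def maximal_cliques(x, rel):
    """Cliques y subset of x that cannot be enlarged by any point of x \\ y."""
    cs = [set(s) for s in cliques(x, rel)]
    return [c for c in cs
            if not any(is_clique(tuple(c | {z}), rel) for z in set(x) - c)]

# y ~ z iff |y - z| <= R is reflexive and symmetric but not transitive,
# so distinct maximal cliques may overlap [property (v)].
R = 1.0
rel = lambda y, z: abs(y - z) <= R
x = (0.0, 0.9, 1.8)              # 0.0 ~ 0.9 ~ 1.8 but 0.0 is not related to 1.8
print(maximal_cliques(x, rel))   # two maximal cliques sharing the point 0.9
```

With ∼ transitive the same search would instead return disjoint maximal cliques, in line with property (vi).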
10.4.5 Check conditions for Examples 10.4(b) and (c) to be repulsive or attractive.

10.4.6 Extended Strauss models; multiscale processes. Suppose given, for a finite positive integer k, 0 = R_0 < R_1 < ⋯ < R_k < R_{k+1} = ∞, β_j ∈ (0, ∞) and γ_j ∈ (0, 1] (j = 1, …, k), and X a bounded Borel subset of R^d. For x ∈ X^∪* and R ∈ R+ let m(x, R) = #{{x_i, x_j} ⊆ x: ‖x_i − x_j‖ ≤ R} as in Example 10.4(b), and ∆m(x, R′, R″) = m(x, R″) − m(x, R′) for 0 ≤ R′ < R″ ≤ ∞.
(a) Let α_j be such that f^(j)(x) = α_j β_j^{n(x)} γ_j^{m(x,R_j)} is a Markov density function for each j = 1, …, k. Use Proposition 10.4.VI to deduce that

$$ f(\mathbf{x}) = \alpha \beta^{n(\mathbf{x})} \prod_{j=1}^{k} (\gamma_j \cdots \gamma_k)^{\Delta m(\mathbf{x}, R_{j-1}, R_j)} $$

is a Markov density function for suitable α and β.
(b) More generally, f(x) = αβ^{n(x)} ∏_{j=1}^{k} γ_j^{∆m(x, R_{j−1}, R_j)} is a Markov density function for some finite positive α and β; display its Papangelou intensity function. [Hint: Penttinen (1984) or Møller and Waagepetersen (2004, Example 6.2).]
(c) Define stochastic monotonicity of Markov point sets for such models.
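The rearrangement of exponents behind part (a) — regrouping the pair counts m(x, R_j) into the increments ∆m(x, R_{j−1}, R_j) — can be checked numerically; a sketch with hypothetical radii and interaction parameters:

```python
import numpy as np
from math import prod

def m(x, R):
    """m(x, R): number of distinct pairs of x at distance <= R."""
    x = np.asarray(x)
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return int(np.sum(np.triu(d <= R, k=1)))

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, (8, 2))     # a point set in the unit square
Rs = [0.0, 0.2, 0.4, 0.6]         # R_0 < R_1 < ... < R_k
gammas = [0.3, 0.5, 0.8]          # gamma_1, ..., gamma_k (illustrative values)

# Product of the single-scale factors gamma_j^{m(x, R_j)} ...
lhs = prod(g ** m(x, R) for g, R in zip(gammas, Rs[1:]))

# ... equals the multiscale form with exponents Delta m(x, R_{j-1}, R_j).
rhs = prod(prod(gammas[j:]) ** (m(x, Rs[j + 1]) - m(x, Rs[j]))
           for j in range(len(gammas)))

print(np.isclose(lhs, rhs, rtol=1e-9))   # True: the two factorizations agree
```

The equality holds because m(x, R_j) = Σ_{i≤j} ∆m(x, R_{i−1}, R_i), so each increment picks up the factor γ_j ⋯ γ_k.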
10.4.7 (a) In the setting of Example 10.4(e), verify that a nonnegative function f defined for all x ∈ X^∪* by (10.4.13), and integrable as below (10.4.2), is a Markov density function when conditions such as (10.4.15) are satisfied.
(b) When f is a Markov function, with (∼)-interaction function φ say, a representation such as (10.4.7) but with √φ holds for √f ≡ {√f_n}. Conclude that when √f is integrable as at (10.4.2b) it is a Markov function.
10.4.8 Suppose a birth-and-death process as in Example 10.4(e) has constant birth-rate B(x, A) = ℓ(X ∩ A) for A ∈ B(X) and death-rate D(x) = Σ_{y∈x} D(x \ y, y) for a B(X^∪*) × B(X)-measurable function D(·, ·) that satisfies condition (a) or (b) of Proposition 10.4.VII. Show that the equilibrium measure is an area-interaction process as in Example 10.4(c). [Hint: Baddeley and van Lieshout (1995, Section 4) also give a constant death-rate (but variable birth-rate) analogue of this property.]

10.4.9 Let X = {x_1, y_1, x_2, y_2, z, y_3} and suppose that x_1 ∼ y_1 ∼ x_2 ∼ y_2 ∼ z ∼ y_3 ∼ x_1 but u ≁ v for all other pairs {u, v} ⊂ X. For any {u, v} ⊂ X and w ⊂ X define u ∼^w v if either u ∼ v or else u ∼ w and w ∼ v for some w ∈ w.
(a) Consider the two sets x = X \ {z} and y = {y_1, y_2, y_3}, so y_2 ≁^x y_3. Then I^x(y) = 0 and I^{x∪z}(y) = 1, so I^x(y) ≠ I^{x∪z}(y). But y_1 ∉ b_{x∪z}(z), so condition (C.1) does not hold for such X and ∼^w as defined.
(b) Use y as above, but now put z = y ∪ {z} and let u, v = x_1, x_2. Then I^z(y) = I^{z∪u}(y) = I^{z∪v}(y) = 0 but I^{z∪{u,v}}(y) = 1, so (C.2) also fails.
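Both failures can be confirmed mechanically by encoding the six-cycle and the mediated relation ∼^w; a sketch (all names are ours):

```python
from itertools import combinations

# Base relation: the 6-cycle x1~y1~x2~y2~z~y3~x1 (plus reflexivity).
cycle = ['x1', 'y1', 'x2', 'y2', 'z', 'y3']
base = {frozenset(p) for p in zip(cycle, cycle[1:] + cycle[:1])}

def rel(u, v, w):
    """u ~^w v: u ~ v, or u ~ m and m ~ v for some mediator m in the set w."""
    if u == v or frozenset((u, v)) in base:
        return True
    return any(frozenset((u, m)) in base and frozenset((m, v)) in base for m in w)

def I(y, w):
    """Clique indicator I^w(y): 1 iff y is a (~^w)-clique."""
    return int(all(rel(u, v, w) for u, v in combinations(y, 2)))

y = {'y1', 'y2', 'y3'}
x = set(cycle) - {'z'}
print(I(y, x), I(y, x | {'z'}))    # 0 1 : the indicator changes, yet y1 is no neighbour of z
zz = y | {'z'}
print(I(y, zz), I(y, zz | {'x1'}), I(y, zz | {'x2'}), I(y, zz | {'x1', 'x2'}))  # 0 0 0 1
```

The first line exhibits the (C.1) failure of part (a); the second shows that the four indicator values in part (b) cannot satisfy the additivity required by (C.2).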
10.4.10 Ord's process. Consider the function f(x) = αβ^n ∏_{i=1}^n g(area of C^x(x_i)), where x lies in a bounded subregion of R^2, n = n(x), C^x(x_i) denotes the Voronoi cell associated with x_i ∈ x as in Example 10.4(c), and g is a function described shortly. If g is not constant then f is not a Markov function, because f(x ∪ y)/f(x) depends on neighbours of neighbours of y; but for positive bounded g, f is a Markov function w.r.t. the relation ∼₂^x defined by x_i ∼₂^x x_j if either x_i ∼^x x_j or else there exists x_k ∈ x such that both x_i ∼^x x_k and x_k ∼^x x_j.
CHAPTER 11
Convergence Concepts and Limit Theorems
11.1 Modes of Convergence for Random Measures and Point Processes 132
11.2 Limit Theorems for Superpositions 146
11.3 Thinned Point Processes 155
11.4 Random Translations 166
When random measures and point processes are regarded as probability measures on the appropriate c.s.m.s. M#_X or N#_X, they may be associated with concepts of both weak and strong convergence of measures on a metric space. In this chapter we examine these concepts more closely, finding necessary and sufficient conditions for weak convergence, relating this concept to other possible definitions of convergence, and applying it to some near-classical questions concerning the convergence of superpositions, thinnings, and translations of point processes.

A common theme in the limit theorems described in this chapter is the emergence of the Poisson process as the limit of repeated applications of some stochastic operation on an initial point process. In a loose sense, each of the operations of superposition, thinning, and random translation is entropy increasing; it is not surprising then that among point processes with fixed mean rate, the Poisson process has maximum entropy (see Section 7.6 and the further discussion in Chapter 14). These limit theorems help not only to explain the ubiquitous role of the Poisson process in applications but also to reveal its central place in the structural theory of point processes.

Of course these applications far from exhaust the role of convergence concepts in the general theory of random measures and point processes. Other important applications arise in the discussion of ergodic theorems and convergence to equilibrium in Chapter 12, and in various questions related to Palm theory in Chapter 13 and conditional intensities in Chapter 14.

In this chapter we mostly restrict attention to X = R^d, even though extensions to more general locally compact groups are usually possible. Many of these extensions are covered in MKM (1978) and especially MKM (1982), giving systematic extensions of earlier work to the context of a general locally compact group.
11.1. Modes of Convergence for Random Measures and Point Processes

In this section we examine different possible modes of convergence for a family of point processes or random measures. We generally suppose that the processes involved are to be thought of as distributions on M#_X or N#_X, where X as usual is a general c.s.m.s. Then the question is this: given a sequence of probability measures {P_n} on M#_X, in what sense should the statement P_n → P be understood?

Three types of convergence suggest themselves for immediate consideration: strong convergence of probability distributions on M#_X [i.e., ‖P_n − P‖ → 0, where ‖·‖ is the variation norm as defined at the end of Section A1.3]; weak convergence of probability distributions on M#_X [i.e., P_n → P weakly, meaning P_n → P in the sense of weak convergence of measures on the metric space M#_X; see Definition A2.3.I(i)]; and convergence of the finite-dimensional (fidi) distributions [i.e., for all suitable finite families of bounded Borel sets A_1, …, A_k, the joint distributions of the random variables ξ(A_1), …, ξ(A_k) under P_n converge weakly to their limit distribution under P].

The consequential matter of convergence of moments is noted before considering, in the final part of this section, some further questions that arise when we try to relate convergence of measures in M#_X and convergence of the associated cumulative processes (distribution functions) in the function space D(0, ∞) or its relatives [see the discussion following Example 11.1(c)].

We sometimes adopt a common abuse of terminology by stating that the random measures ξ_n converge weakly (or strongly) to a limit random variable ξ when all that is meant is the weak (or strong) convergence of their distributions on M#_X; in fact there are no requirements for convergence of the random measures themselves (as elements of, or mappings into, M#_X).
The same abuse applies to "point processes N_n converge to N."

We note first that strong convergence implies weak convergence. Indeed, for any set U ∈ B(M#_X), we have in the notation of Section A1.3

$$ \|\mathcal{P}_n - \mathcal{P}\| \ge V_{\mathcal{P}_n - \mathcal{P}}(U) \ge |\mathcal{P}_n(U) - \mathcal{P}(U)|. $$

It follows that strong convergence implies P_n(U) → P(U) for all U ∈ B(M#_X), which then implies weak convergence by Theorem A2.3.II. The converse is not true; Example 11.1(a) below serves as a counterexample. Indeed, strong convergence implies that any fixed atom for the limit probability must also be a fixed atom for its approximants.

One of the most important applications of convergence in variation norm concerns convergence to equilibrium, or stability properties, of stochastic processes, in which context it is frequently established through the concept of coupling. Two jointly defined stochastic processes X(t) and Y(t) are said to couple if there exists an a.s. finite random variable T (the coupling time) such that X(t) and Y(t) are a.s. equal for all t ≥ T. The basic lemma is as below [see, e.g., Lindvall (1992) or Thorisson (2000) for more extended discussion].
11.1. Modes of Convergence    133
Lemma 11.1.I. Let X(t, ω), Y(t, ω) be two stochastic processes, defined on a common probability space (Ω, E, P) and taking their values in a common c.s.m.s. V. Denote by Pt, Qt the distributions of X(t), Y(t), respectively, on (V, B_V). Suppose that X and Y couple, with coupling time T. Then
    ‖Pt − Qt‖ ≤ 2 P{T > t}.
Proof. To establish convergence in variation norm, we first note that
    ‖Pt − Qt‖ = sup_{f: ‖f‖ ≤ 1} | ∫_V f(v) Pt(dv) − ∫_V f(v) Qt(dv) |,
the supremum being taken over all bounded measurable functions f on (V, B_V) with ‖f‖ = sup_{v∈V} |f(v)| ≤ 1. (Indeed, the supremum is achieved when f = I_{U+} − I_{U−} for the Jordan–Hahn decomposition of Pt − Qt: see Theorem A1.3.IV.) We then have
    | ∫_V f(v) Pt(dv) − ∫_V f(v) Qt(dv) |
        ≤ ∫_Ω |f(X(t, ω)) − f(Y(t, ω))| P(dω)
        = ∫_{T ≤ t} |f(X(t, ω)) − f(Y(t, ω))| P(dω) + ∫_{T > t} |f(X(t, ω)) − f(Y(t, ω))| P(dω)
        ≤ 2 ‖f‖ P{T > t} ≤ 2 P{T > t},
the integral over {T ≤ t} vanishing because X(t) = Y(t) a.s. there.
This lemma can be applied to point processes by associating X(t) ∈ N#_X with the shifted version S_t N of a point process initially defined on R+: see the further discussion of convergence to equilibrium in Section 12.5, where the weaker concept of shift-coupling is also discussed (see around Lemma 12.5.IV). Although convergence in variation norm is generally the more difficult to establish, once available it is very convenient to use. This is because, in addition to its properties as a norm, it also respects convolution in the sense that ‖µ ∗ ν‖ ≤ ‖µ‖ ‖ν‖ (see Exercise 11.1.1). In practice, it is often convenient to work not with the norm on the full space M#_X, but rather with the family of norms on each of the spaces M_A of totally finite measures on bounded A ∈ B_X. We have already met examples of this approach in the discussion of convergence to equilibrium of renewal and Wold processes (see in particular Corollary 4.4.VI). Yet another possibility is to look at norm convergence rather than weak convergence for the fidi distributions, an issue that arises in applying the Stein–Chen approach to establishing convergence to Poisson distributions. Although the main emphasis in our discussion is on weak convergence, an illustration of the sort of analysis required is given in the second half of
134    11. Convergence Concepts and Limit Theorems
Section 11.3, where some of the preliminary inequalities are derived and used to strengthen the convergence statements in the discussion of thinning. Other examples are given from time to time in the exercises and elsewhere. We turn now to the main topic of this section, namely the weak convergence of random measures and point processes, and its relation to the weak convergence of the finite-dimensional distributions. In connection with the latter concept, we call the Borel set A a stochastic continuity set for the measure P if P{ξ(∂A) > 0} = 0, equivalently P{ξ(∂A) = 0} = 1. Without a restriction to sets that are continuity sets for the limit measure, convergence of the fidi distributions would be too strong a concept to be generally useful, as the following example shows.
Example 11.1(a) Convergence and continuity sets. Let ξn consist of exactly one point in each interval (k, k + 1), k = 0, ±1, ±2, ..., with each such point uniformly distributed over (k, k + 1/n). Then as n → ∞, we would like to say that the sequence converges to the deterministic point process with one point at each integer. However, if A = (0, 1), we have Pn{ξ(0, 1) > 0} = 1 but P{ξ(0, 1) > 0} = 0. Thus, we can expect difficulties to arise in the definition if the limit random measure has fixed atoms, and these atoms lie on the boundary of the set A considered in the finite-dimensional distribution. Similar but more general examples can readily be constructed.
Granted the need for the restriction, it is important to know that there are 'sufficiently many' stochastic continuity sets. Given P, let SP denote the class of stochastic continuity sets for P. From the elementary properties of set boundaries (see Proposition A1.2.I), it is clear that SP is an algebra (see Exercise 11.1.2). The following lemma is then sufficient for most purposes.
Lemma 11.1.II. Let X be a c.s.m.s., P a probability measure on B(M#_X), and SP the class of stochastic continuity sets for P.
Then for all x and, given x, for all but a countable set of values of r > 0, Sr(x) ∈ SP.
Proof. It is enough to show that for each finite positive ε, δ, and R, the set of r in [0, R] satisfying
    P{ξ(∂Sr(x)) > δ} > ε
is finite. Suppose the contrary, and let ε, δ, and R be such that for some countably infinite set {r1, r2, ...} of distinct values of r in 0 ≤ r ≤ R < ∞, P(Bi) > ε for i = 1, 2, ..., where Bi = {ξ: ξ(∂S_{ri}(x)) > δ}. Then
    ε ≤ lim sup_{i→∞} P(Bi) ≤ P(lim sup_{i→∞} Bi) ≤ P{ξ(S̄_R(x)) = ∞}
because
    ξ(S̄_R(x)) ≥ Σ_{i=1}^∞ ξ(∂S_{ri}(x)) = ∞
whenever ξ(∂S_{ri}(x)) > δ > 0 for an infinite number of values of i. This contradicts the bounded finiteness of ξ.
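The boundary effect that motivates the restriction to stochastic continuity sets can be seen directly in Example 11.1(a). In the sketch below (our illustration, not from the text), each ξn places one point uniformly on (k, k + 1/n): counts over the open interval (0, 1), whose boundary carries atoms of the limit process, equal 1 under every ξn even though the limit assigns that interval mass 0, while counts over the continuity set (0.5, 1.5) already agree with the limit.

```python
import random

def sample_xi_n(n, k_range=range(-2, 3), rng=None):
    """One realization of xi_n from Example 11.1(a): for each integer k in
    k_range, a single point uniformly distributed on (k, k + 1/n)."""
    rng = rng or random.Random(0)
    return [k + rng.random() / n for k in k_range]

def count_open(points, a, b):
    """Number of points in the open interval (a, b)."""
    return sum(1 for x in points if a < x < b)

rng = random.Random(42)
for n in (10, 100, 1000):
    for _ in range(200):
        pts = sample_xi_n(n, rng=rng)
        # (0,1) is NOT a continuity set of the limit: every xi_n charges it,
        # although the limit (one atom at each integer) gives it mass 0
        assert count_open(pts, 0.0, 1.0) == 1
        # (0.5, 1.5) IS a continuity set: xi_n(0.5, 1.5) = 1 = limit mass
        assert count_open(pts, 0.5, 1.5) == 1
```

The interval and lattice choices are merely the data of Example 11.1(a); the check shows why fidi convergence must be tested on continuity sets only.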
Corollary 11.1.III. The stochastic continuity sets of P form an algebra that contains both a dissecting ring and a covering ring.
Proof. Inspection shows that the constructions of a dissecting system in Proposition A2.1.IV and of a covering ring before Corollary A2.3.III are not affected by replacing any sphere Sα(di) that is not a stochastic continuity set by a marginally smaller sphere Sα′(di) that is. Because the remaining stages of the constructions involve only finite unions, intersections, and differences, they do not lead out of the algebra of such continuity sets.
We can now state the following more formal definition.
Definition 11.1.IV. The sequence {ξn} converges in the sense of convergence of fidi distributions if for every finite family {A1, ..., Ak} of bounded continuity sets Ai ∈ BX the joint distributions of {ξn(A1), ..., ξn(Ak)} converge weakly in B(R^k) to the joint distribution of {ξ(A1), ..., ξ(Ak)}.
The mapping that takes a general element ξ of M#_X into ξ(A), where A is a bounded Borel set, is measurable, essentially by definition of the σ-algebra B(M#_X), but need not be continuous. To see this last point it is enough to consider variants of Example 11.1(a), where the sequence of measures {ξn} converges in the w#-topology on M#_X to a limit measure ξ that has an atom on the boundary ∂A. However, only those measures ξ giving nonzero mass to ∂A can act in this way as discontinuity points of the mapping ξ → ξ(A), for if ξ(∂A) = 0 and ξn →w# ξ, then by Theorem A2.3.II (see also Proposition A2.6.II), ξn(A) → ξ(A).
Now let {Pn} be a sequence of probability distributions, and suppose that Pn → P weakly and that A is a stochastic continuity set for P. This last is just another way of saying that the set D of discontinuity points of the mapping fA: ξ → ξ(A) satisfies the condition P(D) = 0. It then follows from the extended form of the continuous mapping theorem (Proposition A2.3.V) that Pn(fA^{-1}) → P(fA^{-1}), or in other words that the distribution of ξ(A) under Pn converges to its distribution under P. A similar argument applies to any finite family of bounded Borel sets {A1, ..., Ak} satisfying P{ξ(∂Ai) = 0} = 1 for each i, and hence leads to the following lemma.
Lemma 11.1.V. Weak convergence implies weak convergence of the finite-dimensional distributions.
What is more surprising is that the converse of this statement is also true, so that for random measures and point processes, the concepts of weak convergence and convergence of fidi distributions are equivalent. This result, which constitutes the main theorem of this section, is proved as Theorem 11.1.VII. In preparation for this result, we set out in explicit form the conditions for a family of probability measures on B(M#_X) to be uniformly tight (cf. Appendix A2.4, in particular Theorem A2.4.I). In the proposition below T refers to an arbitrary index set, not necessarily countable.
Proposition 11.1.VI. For a family of probability measures {Pt, t ∈ T} on B(M#_X) to be uniformly tight, it is necessary and sufficient that, given any closed sphere S ⊂ X and any ε, δ > 0, there exist a real number M < ∞ and a compact set C ⊆ S such that, uniformly for t ∈ T,
    Pt{ξ(S) > M} < ε,          (11.1.1)
    Pt{ξ(S \ C) > δ} < ε.      (11.1.2)
If X is locally compact, and in particular if X = R^d, the second condition is redundant.
Proof. Uniform tightness means that, for each ε > 0, there exists a compact set K ∈ B(M#_X) such that Pt(K) > 1 − ε for all t ∈ T. From Proposition A2.6.IV and Theorem A2.4.I, K is compact if there exists a sequence of closed spheres S̄n ↑ X such that for each δ > 0 and n < ∞ there exist constants Mn and compact sets C_{n,δ} ⊆ S̄n such that for all ξ ∈ K,
    (a) ξ(S̄n) ≤ Mn, and (b) ξ(S̄n \ C_{n,δ}) ≤ δ.
Effectively, (11.1.1) and (11.1.2) are just reformulations of (a) and (b). Indeed, supposing first that (11.1.1) and (11.1.2) are satisfied, choose any sequence of closed spheres S̄n ↑ X such that each S̄n is a stochastic continuity set for P. From (11.1.1) we choose Mn such that
    Pt{ξ(S̄n) > Mn} < ε/2^{n+1},
and from (11.1.2) we choose the compact set C_{mn} ⊆ S̄n so that
    Pt{ξ(S̄n \ C_{mn}) > m^{-1}} < ε/2^{m+n+2}.
Define the sets, for n, m = 1, 2, ...,
    Qn = {ξ: ξ(S̄n) ≤ Mn},    Q_{mn} = {ξ: ξ(S̄n \ C_{mn}) ≤ m^{-1}},
    K = ∩_{n=1}^∞ ∩_{m=1}^∞ (Qn ∩ Q_{mn}).
By construction, (a) and (b) are satisfied, so K is compact, and
    Pt(K^c) ≤ Σ_{n=1}^∞ Pt(Qn^c) + Σ_{n=1}^∞ Σ_{m=1}^∞ Pt(Q_{mn}^c) ≤ Σ_{n=1}^∞ ε/2^{n+1} + Σ_{n=1}^∞ Σ_{m=1}^∞ ε/2^{m+n+2} ≤ ε.
Thus, K satisfies all the required conditions.
Suppose conversely that the measures Pt are uniformly tight. Given ε, choose compact K ⊂ M#_X and hence deduce the existence of spheres S̄n ↑ X such that there exist constants Mn for which (a) holds and, given δ, there exist C_{n,δ} such that (b) holds. Given any S, choose n so that S ⊆ S̄n, set
M = Mn so that (11.1.1) is true, and, given δ, set C = C_{n,δ} ∩ S so that ξ(S \ C) ≤ ξ(S̄n \ C_{n,δ}), and hence (11.1.2) holds.
Example 11.1(b) Convergence of one-point processes. Let ξn be the degenerate point process on R in which all the mass is concentrated on the counting measure with a single atom at the point n. Then (11.1.1) holds trivially for all S and ε with M = 2. In fact ξn → ξ∞ weakly, where ξ∞ has all its mass concentrated on the zero random measure. Thus, it is important to bear in mind that weak convergence here does not preclude the possibility that the limit point process may be everywhere zero.
Next let ηn = Σ_{k=1}^n ξk. Then for all n we have ηn(0, m] ≤ m, so that (11.1.1) still holds with M equal to the radius of the sphere S. In this case ηn → η∞ weakly, where η∞ is the deterministic point process at unit rate with an atom at each positive integer.
Finally, let ζn = Σ_{k=1}^n k ξ_{n−k}. Here condition (11.1.1) fails and no weak limit exists.
The next theorem is the main result of this section. It is a striking consequence of the integer-valued character and locally bounded nature of a point process.
Theorem 11.1.VII. Let X be a c.s.m.s. and P, {Pn: n = 1, 2, ...} distributions on (M#_X, B(M#_X)). Then Pn → P weakly if and only if the fidi distributions of Pn converge weakly to those of P.
Proof. The first part of the theorem has already been proved in Lemma 11.1.V. Because the set of all fidi distributions determines a probability measure uniquely, in order to prove the converse it suffices to show that the family {Pn} is uniformly tight, for then every sequence contains a weakly convergent subsequence, and from the convergence of the fidi distributions this must be the limit measure P; thus, all convergent subsequences have the same limit, and so the whole sequence converges.
To establish tightness, we use the assumption that the fidi distributions converge for stochastic continuity sets of P to show that (11.1.1) and (11.1.2) hold for any given S, ε, and δ. We start by choosing S′ ⊇ S to be a stochastic continuity set not only for P but also for each of the Pn, n = 1, 2, .... (Because by Lemma 11.1.II only countable sets of exceptional radii are involved, this choice can always be made.) Furthermore, we can choose a value M that is a continuity point of the distribution of ξ(S′) under P and for which P{ξ(S′) > M} < ½ε and Pn{ξ(S′) > M} → P{ξ(S′) > M} as n → ∞. Thus, for n > n0 say, we have Pn{ξ(S) > M} < ε, and by increasing M if necessary we can ensure that this inequality holds for all n. This establishes (11.1.1).
Again working only with spheres that are stochastic continuity sets for P, choose spheres S_{rj}(xi) ≡ S_{ij} centred on the points xi (i = 1, 2, ...) of a separability set and with radii rj ≤ 2^{-j}. Define C_{ij} = S̄_{ij} ∩ S′. Because
    ξ(∪_{i=1}^K C_{ij}) ↑ ξ(S′)    (K → ∞),
we can choose Kj so that, with Cj = ∪_{i=1}^{Kj} C_{ij},
    P{ξ(S′ − Cj) ≥ δj} ≤ ε/2^{j+1},
where δj ≤ δ/2^j is chosen to be a continuity point of the distribution of ξ(S′ − Cj) under P. Again using the weak convergence of the fidi distributions, and increasing the value of Kj if necessary, we can ensure as before that the similar inequality
    Pn{ξ(S′ − Cj) ≥ δj} ≤ ε/2^j    (11.1.3)
holds for all n. Now define C = ∩_{j=1}^∞ Cj. Then C is closed, and by construction it can be covered by a finite number of ε-spheres for every ε > 0, so by Proposition A2.2.II, C is compact. We have moreover from (11.1.3) that, for every n,
    Pn{ξ(S′) − ξ(C) > δ} = Pn{ξ(∪_{j=1}^∞ (S′ − Cj)) > δ}
        ≤ Σ_{j=1}^∞ Pn{ξ(S′ − Cj) > δ/2^j}
        ≤ Σ_{j=1}^∞ Pn{ξ(S′ − Cj) ≥ δj} ≤ Σ_{j=1}^∞ ε/2^j = ε,
thereby establishing (11.1.2). Thus, both conditions of Proposition 11.1.VI are satisfied, and we conclude that the family {Pn} is tight.
Several equivalent conditions for weak convergence can be derived as corollaries or minor extensions of the above theorem. The last condition represents a minor weakening of the full strength of convergence of fidi distributions.
Proposition 11.1.VIII. Each of the following conditions is equivalent to the weak convergence Pn → P, where in (i) and (ii) f ranges over the space of continuous functions vanishing outside a bounded set.
(i) The distribution of ∫_X f dξ under Pn converges weakly to its distribution under P.
(ii) The Laplace functionals Ln[f] ≡ E_{Pn}[exp(−∫_X f(x) ξ(dx))] converge pointwise to the limit functional L[f].
(iii) For point processes, the p.g.fl.s Gn[h] converge to G[h] for each continuous h ∈ V0.
(iv) For every finite family {A1, ..., Ak} from a covering semiring of bounded continuity sets for the limit random measure ξ, the joint distributions of {ξn(A1), ..., ξn(Ak)} converge weakly in B(R^k) to the joint distribution of {ξ(A1), ..., ξ(Ak)}.
Proof. For any f as described, the mapping defined by
    Φf(ξ) = ∫_X f(x) ξ(dx)
is continuous at ξ provided ξ(Z(f)) = 0, where Z(f) is the set of discontinuities of f. Hence, in particular, Φf is continuous for all ξ whenever f itself is continuous. Thus, the distributions of Φf(ξ) under Pn converge weakly to its distribution under P.
Now suppose that f is a function of the form Σ_i ci I_{Ai}(x), where Σ_i |ci| < ∞ and {Ai} is a bounded family of bounded Borel sets that are stochastic continuity sets for P. Convergence of the distributions of the integrals ∫_X f dξ for all such functions f is equivalent to the joint convergence in distribution of ξ(A1), ..., ξ(Ak) for every finite integer k, that is, to fidi convergence. Because such functions can be approximated by continuous functions, as, for example, in the proof of Theorem A2.3.II, it follows that (i) implies convergence of the fidi distributions and hence weak convergence. Condition (ii) is equivalent to (i) by well-known results on Laplace transforms. Because f(x) = −log h(x) is a function as in (i) if and only if h is continuous and h ∈ V0, (iii) is equivalent to (ii) when the distributions Pn correspond to point processes.
Establishing the sufficiency of the last condition is a matter of verifying that the constructions in the proof of Theorem 11.1.VII can be carried through with sets Ai drawn from the covering semiring (or, more generally, from the ring it generates, inasmuch as it is clear that the convergence carries over to sets taken from this generated ring). Because by definition of a covering ring each open sphere can be approximated by sets in the ring, both constructions in the first part of the proof can be so modified, implying that the sequence {Pn} is uniformly tight. If {P_{nk}} is any weakly convergent subsequence of {Pn}, with limit P′ say, then P and P′ must have the same fidi distributions for sets drawn from the covering semiring. Then from Proposition 9.2.III it follows that P = P′, and hence as before that Pn → P weakly.
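Condition (ii) of Proposition 11.1.VIII is often the easiest route in concrete examples. As a sketch (our illustration, not from the text), take Nn to be n i.i.d. points uniform on (0, 1], each retained independently with probability λ/n. Its Laplace functional at the step function f = c·I_{(0,1]} is (1 − (λ/n)(1 − e^{−c}))^n, which converges pointwise to exp(−λ(1 − e^{−c})), the Laplace functional of a Poisson process of rate λ on (0, 1]; the names and parameter values below are ours.

```python
import math

def laplace_fn_bernoulli(n, lam, c):
    """Laplace functional L_n[f] of the thinned binomial process
    (n i.i.d. uniform points on (0,1], each kept with prob. lam/n),
    evaluated at the step function f = c on (0,1]."""
    return (1.0 - (lam / n) * (1.0 - math.exp(-c))) ** n

def laplace_fn_poisson(lam, c):
    """Laplace functional of a Poisson process of rate lam on (0,1]
    at the same step function f."""
    return math.exp(-lam * (1.0 - math.exp(-c)))

lam, c = 2.0, 0.7
vals = [laplace_fn_bernoulli(n, lam, c) for n in (10, 100, 1000, 10000)]
target = laplace_fn_poisson(lam, c)
# pointwise convergence of the Laplace functionals, Prop. 11.1.VIII(ii)
assert abs(vals[-1] - target) < 1e-3
# (1 - x/n)^n increases to e^{-x}, so the error shrinks monotonically
assert all(abs(a - target) >= abs(b - target) - 1e-12
           for a, b in zip(vals, vals[1:]))
```

By Proposition 11.1.VIII this pointwise convergence of Laplace functionals at (enough) test functions is equivalent to weak convergence of the processes themselves.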
In the case of point processes, even sharper versions of condition (iv) are possible when the processes are simple, through the use of the avoidance function (see Theorem 9.2.XII). In this case the limit measure must correspond to a simple point process if it is to be uniquely characterized by the avoidance function, and condition (ii) below supplies such an additional requirement of asymptotic orderliness. Several variants are now possible; the following is perhaps the simplest.
Proposition 11.1.IX. Let {Pn: n = 1, 2, ...}, P be distributions on N#_X with P corresponding to a simple point process, and suppose that R ⊆ SP is a covering dissecting ring. In order that Pn → P weakly, it is sufficient that
(i) Pn{N(A) = 0} → P{N(A) = 0} as n → ∞ for all bounded A ∈ R; and
(ii) for all bounded A ∈ R and partitions Tr = {A_{ri}: i = 1, ..., kr} of A by sets of R,
    lim_{n→∞} sup_{Tr} Σ_{i=1}^{kr} Pn{N(A_{ri}) ≥ 2} = 0.    (11.1.4)
Proof. In view of Theorem 9.2.XII it is enough to show that under the stated conditions the family {Pn} is uniformly tight and that the limit of any weakly convergent subsequence must be a simple point process. Let S be a closed sphere in R, and in (ii) take A = S. Observing that {N(S) > kr} implies {N(A_{ri}) ≥ 2 for at least one i},
    Σ_{i=1}^{kr} Pn{N(A_{ri}) ≥ 2} ≥ Pn{N(S) > kr}.
Given ε > 0, (11.1.4) implies that the sum on the left-hand side here is bounded by ε for n ≥ n0, hence (by adjusting kr if necessary) for all n, and thus that the first condition for uniform tightness holds.
Condition (11.1.2) here can be stated in the following form: given ε > 0, there exists a compact set C such that Pn{N(S − C) = 0} > 1 − ε for n = 1, 2, .... Choose C so that for the limit distribution we have P{N(S − C) = 0} > 1 − ½ε. From assumption (i) we have Pn{N(S − C) = 0} → P{N(S − C) = 0} as n → ∞, from which the required inequality (11.1.2) holds for all sufficiently large n, and hence (by increasing C if necessary) for all n. Thus, both conditions for uniform tightness of {Pn} are satisfied.
Now let {P_{nk}} be any weakly convergent subsequence of the family {Pn}, with limit P′ say, so that from Theorem 11.1.VII all fidi distributions converge to those of P′; hence, from (i) of the proposition, for A ∈ R,
    P{N(A) = 0} = P′{N(A) = 0}.
From this result it follows not necessarily that P = P′ but merely that
    P{N(A) = 0} = P′{N*(A) = 0},    or    P = (P′)*,
where N* is the support point process of N and (P′)* is its distribution (see Corollary 9.2.XIII). However, we have
    Σ_{i=1}^{kr} P′{N(A_{ri}) ≥ 2} = Σ_{i=1}^{kr} lim_{k→∞} P_{nk}{N(A_{ri}) ≥ 2},
so that from (11.1.4)
    sup_{Tr} Σ_{i=1}^{kr} P′{N(A_{ri}) ≥ 2} ≤ lim_{k→∞} sup_{Tr} Σ_{i=1}^{kr} P_{nk}{N(A_{ri}) ≥ 2} = 0,
from which it follows (Proposition 9.3.XII) that P′ is simple, and hence that (P′)* = P′, so that P = P′. So all weakly convergent subsequences have limit P, and thus Pn → P weakly.
Thus far we have considered essentially convergence of the probability measures of point processes, but what of their moments? Uniform integrability is the analogous 'bounding' condition that ensures that the moment measures converge as well, as in the assertion below (the proof is standard and left to the reader; see also Exercise 11.1.3).
Proposition 11.1.X. Let {Nn} be a weakly convergent sequence of point processes on the c.s.m.s. X with limit N and for which the first moment measures Mn(A) = E[Nn(A)] are finite for bounded A ∈ BX. These moment measures converge in the w#-topology to the first moment measure M of N if and only if for some sequence of spheres Sk ↑ X,
    E[Nn(Sk) I{Nn(Sk) ≥ a}] → 0    (a → ∞),    uniformly in n.
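The necessity of the uniform integrability condition is made concrete by the construction of Exercise 11.1.3 below: the count Nn(0, 1] equals 1 with probability 1 − n^{-1} and n with probability n^{-1}, so Nn converges weakly to the randomized unit lattice process (count 1 a.s.) while E[Nn(0, 1]] = 2 − n^{-1} → 2 ≠ 1. A direct computation (our sketch of that construction):

```python
def count_distribution(n):
    """Distribution of N_n(0,1] for Exercise 11.1.3: with probability
    1 - 1/n, the randomized unit lattice (exactly one point in (0,1]);
    with probability 1/n, a lattice of spacing 1/n (exactly n points)."""
    return {1: 1.0 - 1.0 / n, n: 1.0 / n}

for n in (2, 10, 1000):
    pmf = count_distribution(n)
    mean = sum(k * p for k, p in pmf.items())
    # E[N_n(0,1]] = 2 - 1/n -> 2, although the weak limit has mean 1
    assert abs(mean - (2.0 - 1.0 / n)) < 1e-9
    # uniform integrability fails: the rare dense lattice carries one
    # unit of expected mass, E[N_n(0,1]; N_n(0,1] = n] = n * (1/n) = 1
    assert abs(n * pmf[n] - 1.0) < 1e-9
    # yet the count converges in distribution to the constant 1
    assert pmf[1] >= 1.0 - 2.0 / n
```

The escaping unit of expected mass is exactly what the condition E[Nn(Sk) I{Nn(Sk) ≥ a}] → 0 uniformly in n rules out.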
As an illuminating example of the use of weak convergence arguments, we outline below the proof used by Brémaud and Massoulié (2001) to establish the existence of a type of Hawkes process [Examples 6.3(c), 7.2(b)] with no immigration component: the population is both maintained and balanced purely via its own offspring and their locations.
Example 11.1(c) Existence of Hawkes process without ancestors [Brémaud and Massoulié (2001)]. We consider a sequence of ordinary Hawkes processes N_ε with infectivity functions (intensity measure of the offspring process) of the form
    µ_ε(x) = (1 − ε)µ(x),    ∫ µ(x) dx = 1,
and with associated immigration rates ν_ε = εν. All these processes have the same mean rate
    ν_ε / (1 − ∫ µ_ε(x) dx) = εν / (1 − (1 − ε)) = ν,
so it is plausible that as ε → 0, the processes may converge to some limit process with mean rate ν and conditional intensity of the form
    λ(t) = ∫_{−∞}^t µ(t − u) N(du).    (11.1.5)
Our aim is to establish that such convergence does indeed occur. In view of Theorem 11.1.VII, it is sufficient to show that the finite-dimensional distributions converge to a consistent limit. Fix an interval [a, b] and consider the total number N_ε(a, b) of points of the approximating process falling within this interval. From stationarity of the approximating process we have, from Markov's inequality, uniformly for all ε > 0,
    Pr{N_ε(a, b) > M} ≤ E[N_ε(a, b)]/M = ν(b − a)/M,
so that the left-hand side converges uniformly to 0 as M → ∞. It follows that the distributions of N_ε(a, b) are uniformly tight. It is easily seen from this that all fidi distributions for the processes N_ε restricted to (a, b) are uniformly tight. We can therefore extract a subsequence such that the fidi distributions on (a, b) converge weakly to some limit process on (a, b). By covering the real line with a family of such intervals, we can even find a subsequence along which all the fidi distributions converge weakly. Then it follows from Theorem 11.1.VII that the point processes themselves converge weakly to some limit point process. Exercise 11.1.4 gives a more general version of this argument.
In this model, the weak convergence just established also implies convergence of the expressions for the conditional intensities. To see this, consider expressions of the form which define the conditional intensity, namely
    E[N(a, b) I_A] = E[∫_a^b I_A λ(u) du],    (11.1.6)
where A ∈ H_a, the internal (minimal) history for the process (see the discussion in Chapter 7, or later in this volume in Chapter 14). In particular, it is enough to consider A of the form A = {N: N(C1) = n1, N(C2) = n2, ..., N(Ck) = nk} for integers k, n1, ..., nk and sets Ci ⊂ (−∞, a), because sets of this kind generate H_a. In this model, the existence of densities for the Poisson processes of new immigrants and offspring implies that all bounded Borel sets C ∈ R are continuity sets. It follows that the function N → N(a, b) I_A is continuous in the w#-topology (see the discussion following Definition 11.1.IV), and hence, from the continuity theorem (Proposition A2.3.V), that E[N_ε(a, b) I_A] → E[N(a, b) I_A]. Similar arguments apply also to the more complex expressions on the right-hand side of (11.1.6), which for the approximating process N_ε we can write as
    E[∫_a^b I_A λ_ε(x) dx] = E[∫_a^b I_A (εν + ∫_{−∞}^x µ_ε(x − u) N_ε(du)) dx].
Again the expectations converge to the corresponding form for the limit process, and serve to identify the conditional intensity of the limit process with the form (11.1.5).
Nothing in the argument so far precludes the possibility that the limit point process is a.s. equal to the zero counting measure. Indeed, Brémaud and Massoulié show that if the function µ(x) is 'light-tailed' (decays at an exponential rate or faster) then the only possible limit processes are degenerate (zero or infinite; see Exercise 11.1.5). Because E[N_ε(a, b)] = ν(b − a), a sufficient condition to ensure that the limit process is nontrivial is that the limit of the first moment measures should be the first moment measure of
the limit process. For this, a uniform integrability condition is needed for the quantities N_ε(a, b), as indicated in Exercise 11.1.3. Because for any random variable X ≥ 0, E[X I{X > M}] ≤ E[X²]/M, a sufficient further condition is boundedness of the variances var[N_ε(a, b)]. This requires a careful examination of the spectral properties of N_ε; for details see Exercise 11.1.6, where it is shown that the variances remain bounded provided the infectivity function satisfies the 'heavy-tail' conditions that, for some 0 < α < ½, t^{1+α}µ(t) is bounded on t ≥ 0 and approaches a finite limit as t → ∞.
We conclude this section with a few remarks concerning the relation between weak convergence of random measures with state space X = R+ and weak convergence of the associated cumulative processes as elements of D(0, ∞). Here the cumulative function associated with a measure µ on R+ is defined by
    Fµ(x) = µ((0, x])    (0 < x < ∞).
Such functions are monotonic increasing and right-continuous with left limits, and therefore define a subspace of D(0, ∞). Because the metrics in M#(R+) and D(0, ∞) are obtained by compounding the analogous metrics over finite intervals, it is sufficient to compare the behaviour of the two metrics over a common finite interval, which for convenience we take to be (0, 1]. In both cases we are effectively concerned with the distance between two cumulative functions F, G over (0, 1). Weak convergence of a family of measures on (0, 1] is equivalent to convergence of the cumulative functions with respect to the Lévy metric ρL, where ρL(F, G) is defined as the infimum of values ε such that for all x ∈ (0, 1],
    G(x − ε) − ε ≤ F(x) ≤ G(x + ε) + ε
[we take G(−y) = 0 and G(1 + y) = G(1) for y > 0 for the purposes of this definition].
On the other hand, convergence of the distribution functions in D(0, 1) is equivalent to convergence with respect to the Skorohod metric ρS, where ρS(F, G) is defined as the infimum of values ε such that there exists a continuous mapping λ of [0, 1] onto [0, 1], with λ(0) = 0 and λ(1) = 1, for which
    sup_{0≤x≤1} |x − λ(x)| < ε    and    sup_{0≤x≤1} |F(λ(x)) − G(x)| < ε.
The statements ρL(F, G) < ε and ρS(F, G) < ε both require F and G to be close in the sense that, uniformly for x ∈ (0, 1], the value of F(x) differs from a possibly slightly shifted value of G(x) by less than ε, the shift also not being allowed to exceed ε. In the case of the Skorohod metric, the degree of shift is controlled by the function λ(x), whereas in the Lévy case it is constrained not to exceed ε but is otherwise not related from one x to any other. In both cases the statement ρ(Fn, F) → 0 is equivalent to the requirement that Fn(x) → F(x) at all continuity points x of F (see Exercises 11.1.7–8).
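The contrast between the uniform metric and the Lévy metric is easiest to see for step functions. In the sketch below (our illustration, not from the text), F and G are unit steps at 0.5 and 0.5 + δ: the sup distance stays 1 however small δ is, while the Lévy distance, evaluated by brute force from its definition over a grid of candidate ε values, is of order δ, matching the intuition that measures with nearby atoms are weakly close.

```python
def levy_distance(F, G, grid, eps_grid):
    """Approximate Levy distance between cumulative functions F, G on (0,1]:
    the smallest eps in eps_grid with
    G(x - eps) - eps <= F(x) <= G(x + eps) + eps for all grid points x."""
    for eps in eps_grid:
        if all(G(max(x - eps, 0.0)) - eps <= F(x) <= G(min(x + eps, 1.0)) + eps
               for x in grid):
            return eps
    return float("inf")

step = lambda a: (lambda x: 1.0 if x >= a else 0.0)  # unit jump at a

grid = [i / 1000 for i in range(1001)]
eps_grid = [i / 1000 for i in range(1001)]
for delta in (0.3, 0.05, 0.002):
    F, G = step(0.5), step(0.5 + delta)
    sup_dist = max(abs(F(x) - G(x)) for x in grid)
    lv = levy_distance(F, G, grid, eps_grid)
    assert sup_dist == 1.0       # uniform metric ignores where the atoms sit
    assert lv <= delta + 0.002   # Levy distance is of order delta
```

The grid resolution (0.001) is an assumption of the sketch; the true Lévy distance of these two step functions is exactly min(δ, 1).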
Provided therefore that the discussion is restricted to the subspace of cumulative processes, the two types of convergence are equivalent. Equivalently, the mapping from M(0, 1) into D(0, 1), which takes the measure ξ into its cumulative function Fξ, is continuous in both directions. The continuous mapping Theorem A2.3.V therefore yields the following result.
Lemma 11.1.XI. A sequence of random measures {ξn} on M#(R+) converges weakly to a random measure ξ if and only if the corresponding sequence of cumulative processes F_{ξn} converges weakly in D(0, ∞) to the cumulative process Fξ.
Extensions to R^d can be obtained in terms of the concepts described by Straf (1972). From the point of view of weak convergence, it is therefore immaterial whether we deal with the random measures directly or with the stochastic processes defined by the associated cumulative functions. When rescaling is involved, as, for example, in the central limit theorem, the limits need no longer correspond to random measures, and functional limit theorems can be obtained. An example is given in Proposition 12.3.X.
Exercises and Complements to Section 11.1
11.1.1 For totally finite signed measures µ, ν on a c.s.m.s. X show that the variation norm ‖·‖ (see Section A1.3) has the following properties:
(a) ‖αµ‖ = |α| ‖µ‖ for every scalar α.
(b) ‖µ + ν‖ ≤ ‖µ‖ + ‖ν‖, with equality if the supports of µ and ν are disjoint.
(c) ‖µ ∗ ν‖ ≤ ‖µ‖ ‖ν‖, with equality if both measures are of constant sign (and thus ‖µ‖ = |µ(X)|).
11.1.2 (a) Let N be a point process on R and R the ring generated by half-open intervals such as (a, b], so that for A ∈ R, ∂A consists of the finite set of endpoints of the constituent intervals of A. Deduce that A is a stochastic continuity set unless any of the endpoints of its constituent intervals happen to be fixed atoms of the process.
(b) Deduce that if N on R is stationary, it has no fixed atoms.
(c) In general, when P is a probability measure on M#_X, the stochastic continuity sets for P form an algebra.
11.1.3 Convergence of moment measures. Let N be a point process that is the limit of the weakly convergent sequence {Nn} of point processes. Show that M(A) = E[N(A)] ≤ lim inf_{n→∞} E[Nn(A)] = lim inf_{n→∞} Mn(A). In order to have equality here, some condition such as the uniform integrability condition of Proposition 11.1.X is needed: weak convergence together with existence and boundedness of the moment measures is not enough to ensure their vague convergence. Suppose that the point processes Nn are on R and defined as follows. Choose with probability 1 − n^{-1} the randomized unit lattice point process with points at {r + U: r = 0, ±1, ...} and U a random variable uniformly distributed on (0, 1), and with probability n^{-1} the point process with points
on the lattice {r/n: r = 0, ±1, ±2, ...} in which the centre of the lattice is located uniformly at random over the unit interval. This sequence {Nn} has the following properties.
(i) It converges weakly to the randomized unit lattice point process.
(ii) E[Nn(A)] = (2 − n^{-1}) ℓ(A) for n = 1, 2, ..., where ℓ denotes Lebesgue measure.
(iii) E[N∞(A)] = ℓ(A) for all bounded Borel sets A.
11.1.4 When X = R^d, every sequence of random measures with locally uniformly bounded first moment measures is relatively weakly compact (i.e., contains a weakly convergent subsequence). In particular, any sequence of random measures with uniformly bounded mean densities is relatively weakly compact. [Hint: Use Markov's inequality to show that for each bounded Borel set A there exists a constant C_A < ∞ such that Pr{ξ(A) > M} ≤ C_A/M for all M > 0.]
11.1.5 For a Hawkes process as in Example 11.1(c), show that if ∫ x µ(x) dx < ∞ then a process with conditional intensity (11.1.5) cannot have a finite mean. [Hint: Pr{N(R+) = 0} = E[Pr{N(R+) = 0 | H_0}] ≥ exp(−λ ∫ t µ(t) dt); this yields a contradiction if the process is ergodic and nonzero.]
11.1.6 Hawkes process without ancestors [see Example 11.1(c)].
(i) Suppose the density µ satisfies the conditions ∫_0^∞ µ(t) dt = 1 and, for finite positive R and r, sup_{t>0} t^{1+α}µ(t) ≤ R and lim_{t→∞} t^{1+α}µ(t) = r. Show that
    lim_{ω→0} [µ̂(ω) − 1] ω^{−α} = r ∫_0^∞ (e^{iu} − 1)/u^{1+α} du.
(ii) Let V_ε(T) denote the variance of the approximating process N_ε over a fixed interval (0, T). Show that as ε → 0, V_ε(T) remains bounded, and
    V_ε(T) → (λ/2π) ∫_{−∞}^∞ |e^{iωT} − 1|² / (ω² |1 − µ̂(ω)|²) dω.
(iii) Show that the limit variance obtained above is, in fact, the variance of the limiting process described in the example. [Hint: For (ii), use (i) and equation (8.2.10) for the spectral density of a Hawkes process. For (iii), see Lemma 1 of Brémaud and Massoulié (2001).]
11.1.7 Let ρL, ρS refer to the Prohorov and Skorohod metrics, respectively, on the space of finite measures on (X, BX) (the Prohorov metric reduces to the Lévy metric on R). Let {Nn}, N be finite counting measures on X. Prove that ρL(Nn, N) → 0 if and only if, for all sufficiently large n, Nn has the same number of atoms as N, and the locations of the atoms of Nn converge to the locations of the atoms of N. Deduce that ρL(Nn, N) → 0 implies ρS(Nn, N) → 0. [Hint: See, e.g., Straf (1972).]
11.1.8 Write d0(F, G) = sup_{x∈R} |F(x) − G(x)| for the sup metric on the space of d.f.s F, G on R, and write µF, µG for the measures generated by such d.f.s. Prove that 2 d0(F, G) ≤ ‖µF − µG‖.
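For discrete distributions supported on a common finite grid, the inequality of Exercise 11.1.8 reduces to a comparison of partial sums and is easy to check numerically. A quick sketch (ours, with randomly generated probability vectors):

```python
import random

rng = random.Random(7)

def random_pmf(n=10):
    """A random probability vector on n grid points."""
    w = [rng.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

def cumulative(p):
    """Partial sums of a pmf, i.e. the d.f. values on the grid."""
    out, acc = [], 0.0
    for x in p:
        acc += x
        out.append(acc)
    return out

for _ in range(100):
    p, q = random_pmf(), random_pmf()
    # d0(F, G): largest gap between the two d.f.s
    d0 = max(abs(a - b) for a, b in zip(cumulative(p), cumulative(q)))
    # ||mu_F - mu_G||: total variation norm of the signed measure
    var_norm = sum(abs(a - b) for a, b in zip(p, q))
    assert 2 * d0 <= var_norm + 1e-12    # Exercise 11.1.8
```

The proof idea is visible in the check: since both measures have total mass 1, |F(x) − G(x)| equals both |Σ_{k≤x}(p_k − q_k)| and |Σ_{k>x}(p_k − q_k)|, and these two sums together are bounded by Σ|p_k − q_k|.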
146
11. Convergence Concepts and Limit Theorems
11.2. Limit Theorems for Superpositions

Limit theorems for superpositions of point processes go back at least as far as Palm (1943), and continued with Khinchin (1955) in developing a simple version of Proposition 11.2.VI below for the superposition of a large number of independent identically distributed stationary point processes on R. Under suitable conditions, rescaled versions of the resulting processes have a Poisson process limit. Extensions through work of Ososkov (1956), Franken (1963) and Grigelionis (1963) led ultimately to Theorems 11.2.III and 11.2.V, in which rescaling is subsumed by convergence of sums in a uniformly asymptotically negligible array.

The formal setting for studying the sum or superposition of a large number of point processes or random measures is a triangular array {ξ_ni: i = 1, . . . , m_n; n = 1, 2, . . .} and its associated row sums

    ξ_n = Σ_{i=1}^{m_n} ξ_ni,    n = 1, 2, . . . .

If for each n the processes {ξ_ni: i = 1, . . . , m_n} are mutually independent, we speak of an independent array, and when they satisfy the condition that for all ε > 0 and all bounded A ∈ B_X,

    lim_{n→∞} sup_i P{ξ_ni(A) > ε} = 0,    (11.2.1)

the array is uniformly asymptotically negligible, or u.a.n. for short. In the case of a triangular array of point processes, the u.a.n. condition (11.2.1) reduces to the simpler requirement that

    lim_{n→∞} sup_i P{N_ni(A) > 0} = 0.    (11.2.2)

Note that an independent u.a.n. array is called infinitesimal in MKM (1978, Section 3.4), a null-array in Feller (1966) and Kallenberg (1975, Chapter 6), and holospoudic in Chung (1974, Section 7.1). The terminology u.a.n. comes from Loève (1963) (Loève in fact wrote uan). Although this formal setting can be extended (see Exercise 11.2.3) and the notation simplified by taking m_n = ∞, from which the finite case is obtained by assuming all but finitely many elements to be zero, we retain the setting with m_n < ∞ for the sake of familiarity.

In the discussion that follows the reader will doubtless observe the very close analogy between the results developed for point processes and the classical theory for sums of i.i.d. random variables in R. This is hardly surprising, for a point process is just a particular type of random measure, and a random measure is just a random variable taking its values in the metric Abelian group of boundedly finite signed measures on the state space X. As such it comes under the extension of the classical theory developed, for example, in Parthasarathy (1967, Chapter 4). We develop results for point processes
directly; however, the reader may find it useful to bear the classical theory in mind as a guide, as for example at the end of this section in reviewing the corresponding results for random measures.

We start with a preliminary result on the convergence of infinitely divisible point processes, continuing with notation from Section 10.2. Let {Q̂_n: n = 1, 2, . . .} denote a sequence of KLM measures (Definition 10.2.IV) and Q̂ a limit measure. We cannot immediately speak of the weak convergence of Q̂_n to Q̂, firstly because the KLM measures are only σ-finite in general, and secondly because they are only defined on the Borel subsets of the space N₀^#(X) ≡ N_X^# \ {N: N(X) = 0}, which is not complete. To define an appropriate modification of weak convergence for the sequence {Q̂_n}, recall that for each bounded Borel set A the KLM measure Q̂ induces a totally finite measure Q̂_A on the space N₀^#(A) of nonzero counting measures on A. Extend Q̂_A to the whole of N_A^# by setting Q̂^(A) = Q̂_A on N₀^#(A) and Q̂^(A){N: N(A) = 0} = 0.

Definition 11.2.I. The sequence of KLM measures {Q̂_n: n = 1, 2, . . .} converges Q-weakly to the KLM measure Q̂ (i.e., Q̂_n → Q̂ Q-weakly), if for every bounded Borel set A that is a stochastic continuity set for Q̂, the extended measures Q̂_n^(A) converge weakly to Q̂^(A).

This requirement can be spelt out more explicitly in terms of fidi distributions or the convergence of functionals. Thus, it is equivalent to requiring that for every finite family of bounded Borel sets A_i ∈ S_Q̂ (i = 1, . . . , k; k ≥ 2),

    Q̂_n{N: N(A_i) = j_i (i = 1, . . . , k), Σ_{i=1}^k j_i > 0}
        → Q̂{N: N(A_i) = j_i (i = 1, . . . , k)}.    (11.2.3)

A more convenient form for our purposes is the following. For every continuous function h ∈ V(X) equal to one outside some bounded Borel set A, as n → ∞,

    ∫_{{N(A)>0}} exp(∫_X log h(x) N(dx)) Q̂_n(dN)
        → ∫_{{N(A)>0}} exp(∫_X log h(x) N(dx)) Q̂(dN).    (11.2.4)
These restatements are immediate consequences of the conditions for weak convergence developed in Section 11.1 applied to the measures Q̂_n^(A), which, although not probability measures, are totally finite, so that the framework for weak convergence remains intact.

Proposition 11.2.II. (a) The set of infinitely divisible distributions is closed in the topology of weak convergence in N_X^#.
(b) If {P_n: n = 1, 2, . . .} and P are infinitely divisible distributions on N_X^#, and {Q̂_n}, Q̂ the corresponding KLM measures, then P_n → P weakly if and only if Q̂_n → Q̂ Q-weakly.
Proof. Suppose that P_n → P weakly and that the P_n are infinitely divisible. Take any integer k, and observe that if P_n has p.g.fl. G_n[·], then (G_n[·])^{1/k} is also a p.g.fl. and corresponds to the KLM measure k⁻¹Q̂_n(·), where Q̂_n is the KLM measure of P_n. When G_n[h] → G[h] for all continuous h ∈ V(X), it follows that (G_n[h])^{1/k} → (G[h])^{1/k}, so that (G[h])^{1/k} is a p.g.fl. for every integer k, and hence P is infinitely divisible.

Using the p.g.fl. representation (10.2.9), we have for all continuous h ∈ V(X),

    ∫_{N₀^#(X)} [exp(∫_X log h(x) N(dx)) − 1] Q̂_n(dN)
        → ∫_{N₀^#(X)} [exp(∫_X log h(x) N(dx)) − 1] Q̂(dN).

Taking A to be a stochastic continuity set for Q̂, we can approximate the step function h(x) = 1 − I_A(x) arbitrarily closely by h ∈ S_Q̂ and thus conclude that

    Q̂_n{N: N(A) > 0} → Q̂{N: N(A) > 0},

as the exponential term above vanishes if N(A) > 0. By subtraction we then have that the integrals

    ∫_{{N(A)>0}} exp(∫_X log h(x) N(dx)) Q̂_n(dN)

converge as required at (11.2.4).
Conversely, when Q̂_n → Q̂ Q-weakly, the argument can be reversed.

Given a point process with distribution P and p.g.fl. G[·], define its Poisson approximant [corresponding to the accompanying law in the classical theory—see, e.g., Parthasarathy (1967, VI.6)] to be the Poisson randomization with distribution P* and p.g.fl. G*[·] given by

    G*[h] = exp(G[h] − 1)    (all h ∈ V(X)).    (11.2.5)

More generally, given a triangular array {N_ni}, the corresponding Poisson approximants are given by N*_ni, with distributions P*_ni and p.g.fl.s G*_ni[·]; when the N_ni are independent, take the N*_ni to be independent also. Because

    P*{N(A) > 0} = 1 − exp[−P{N(A) > 0}] ≤ P{N(A) > 0},

the triangular array {N*_ni} is u.a.n. whenever {N_ni} is a u.a.n. array (see also Exercises 11.2.1–2). The following theorem is basic for point processes.
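The approximant identity (11.2.5) can be checked concretely in the very simplest special case (a single test set A with a Bernoulli count; the parameters are illustrative, not from the text): if N(A) is Bernoulli(p) with p.g.f. g(z) = 1 − p + pz, then exp(g(z) − 1) = exp(p(z − 1)), so the approximant count N*(A) — a Poisson(1) number of i.i.d. copies of N(A) — is exactly Poisson(p). The sketch below verifies this by summing the compound distribution directly.

```python
import math

# Poisson approximant of a Bernoulli(p) count: a Poisson(1) number of
# i.i.d. Bernoulli(p) summands should be exactly Poisson(p), matching
# the p.g.f. identity exp(g(z) - 1) = exp(p(z - 1)).

def compound_poisson_bernoulli_pmf(p, k, jmax=80):
    """P{N*(A) = k} where N*(A) = sum of K i.i.d. Bernoulli(p), K ~ Poisson(1)."""
    total = 0.0
    for j in range(k, jmax):
        pois = math.exp(-1.0) / math.factorial(j)            # P{K = j}
        binom = math.comb(j, k) * p**k * (1 - p)**(j - k)    # P{Binomial(j,p) = k}
        total += pois * binom
    return total

p = 0.3   # illustrative retention probability
for k in range(6):
    exact = math.exp(-p) * p**k / math.factorial(k)          # Poisson(p) pmf
    assert abs(compound_poisson_bernoulli_pmf(p, k) - exact) < 1e-12
```

The agreement is exact (up to the truncation at jmax), since Σ_j e⁻¹/j! · C(j,k) pᵏ(1−p)^{j−k} = e⁻ᵖ pᵏ/k!.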
Theorem 11.2.III. Let {N_ni: i = 1, . . . , m_n; n = 1, 2, . . .} be an independent u.a.n. array, {N*_ni} an independent array of corresponding Poisson approximants, and N an infinitely divisible point process with KLM measure Q̂. Then the following assertions are equivalent.
(i) Σ_{i=1}^{m_n} N_ni → N weakly.
(ii) Σ_{i=1}^{m_n} N*_ni → N weakly.
(iii) Σ_{i=1}^{m_n} P_ni^{(0)} → Q̂ Q-weakly.
[In (iii), P_ni^{(0)} is the restriction of P_ni, the distribution of N_ni, to N₀^#(X).]
mn
(1 − αi ) −
i=1
mn
αi ≤
i=1
mn
αi2
≤
mn
i=1
αi
max αi . (11.2.6) i
i=1
We apply these inequalities with α_i = 1 − G_ni[h], where h(x) = 1 for x outside some stochastic continuity set A for P, the distribution of N. From the u.a.n. condition, 1 − G_ni[h] ≤ Pr{N_ni(A) > 0} ≤ ½ (i = 1, . . . , m_n) for n sufficiently large, and thus

    Σ_{i=1}^{m_n} (1 − G_ni[h]) ≤ Σ_{i=1}^{m_n} Pr{N_ni(A) > 0}
        ≤ −log Π_{i=1}^{m_n} [1 − Pr{N_ni(A) > 0}]
        = −log Pr{Σ_{i=1}^{m_n} N_ni(A) = 0}
        → −log Pr{N(A) = 0} < ∞,

so that the left-hand sum here is uniformly bounded (over subsets of A) for n sufficiently large. It follows from (11.2.6) that if one of

    Π_{i=1}^{m_n} G_ni[h]   and   exp(−Σ_{i=1}^{m_n} (1 − G_ni[h]))

converges to a finite nonzero limit, then so does the other, and the limits are equal. This implies the equivalence of (i) and (ii) of the theorem.
The processes N*_ni are infinitely divisible, with KLM measures P_ni^{(0)}, so the row sum Σ_i N*_ni is infinitely divisible, with KLM measure Σ_i P_ni^{(0)}. By appealing to Proposition 11.2.II, the equivalence of (ii) and (iii) follows.

The arguments used in the proof lead to an alternative formulation of the result in terms of p.g.fl.s (see Exercise 11.2.3). The following result is an easy corollary.

Proposition 11.2.IV. A point process is infinitely divisible if and only if it can be represented as the limit of the row sums of a u.a.n. array.

The most important application of Theorem 11.2.III is to finding conditions for convergence to a Poisson process.
Theorem 11.2.V. The triangular u.a.n. array {N_ni: i = 1, . . . , m_n; n = 1, 2, . . .} converges weakly to a Poisson process with parameter measure μ if and only if for all bounded Borel sets A with μ(∂A) = 0,

    Σ_{i=1}^{m_n} Pr{N_ni(A) ≥ 2} → 0    (n → ∞)    (11.2.7)

and

    Σ_{i=1}^{m_n} Pr{N_ni(A) ≥ 1} → μ(A)    (n → ∞).    (11.2.8)

Proof. Recall from Example 10.2(a) that for a Poisson process the KLM measure Q̂(·) is related to the parameter measure μ(·) by

    μ(A) = Q̂{N: N(A) > 0},

and that Q̂ itself is concentrated on one-point realizations, so that

    Q̂{N: N(X) > 1} = 0.

It follows from Theorem 11.2.III that if the array converges to a Poisson process, then

    Σ_{i=1}^{m_n} Pr{N_ni(A) > 0} → Q̂{N: N(A) > 0} = μ(A),

and

    Σ_{i=1}^{m_n} Pr{N_ni(A) ≥ 2} → Q̂{N: N(A) ≥ 2} = 0,

so the conditions (11.2.7) and (11.2.8) are necessary.
Conversely, if (11.2.7) holds for a sequence of sets A_n ↑ X, we must then have Q̂{N: N(X) > 1} = 0, so that the limit process must be Poisson, and (11.2.8) identifies the parameter measure as μ.

We remark that, as in other applications of weak convergence, it is sufficient, in checking the conditions of the theorem, to let A run through the sets of any covering semiring of continuity sets of μ [see Proposition 11.1.VIII(iv)].

The following special case was the first to be studied and can be regarded as the prototype limit theorem for point processes.

Proposition 11.2.VI. Let N be a simple stationary point process on X = R with finite intensity λ, and let N_n denote the point process obtained by superposing n independent replicates of N and dilating the scale of X by a factor n. Then as n → ∞, N_n converges weakly to a Poisson process with parameter measure λℓ(·), where ℓ(·) denotes Lebesgue measure on R.
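The conditions of Theorem 11.2.V can be seen in action in the simplest conceivable array (an illustration with hypothetical parameters, not a construction from the text): row n consists of m_n = n independent processes, each placing a single point in a fixed set A with probability λℓ(A)/n and no point otherwise, so that (11.2.7) holds trivially and the sum in (11.2.8) equals λℓ(A) exactly. The row-sum count is then Binomial(n, λℓ(A)/n), and its total-variation distance from the Poisson(λℓ(A)) limit can be computed directly.

```python
import math

# Total-variation distance between the row-sum count Binomial(n, mu/n)
# and its Poisson(mu) limit, with mu = lam * l(A).  The pmfs are built
# recursively to avoid overflow for large n.

def tv_to_poisson(n, mu):
    q = mu / n
    b = (1 - q) ** n                 # Binomial pmf at k = 0
    p = math.exp(-mu)                # Poisson pmf at k = 0
    diff_sum = abs(b - p)
    ptail = 1.0 - p
    for k in range(n):               # step both pmfs from k to k + 1
        b *= (n - k) / (k + 1) * q / (1 - q)
        p *= mu / (k + 1)
        diff_sum += abs(b - p)
        ptail -= p
    return 0.5 * (diff_sum + max(ptail, 0.0))   # ptail = Poisson mass above n

mu = 1.0   # illustrative value of lam * l(A)
assert tv_to_poisson(1000, mu) < tv_to_poisson(10, mu) < 0.1
assert tv_to_poisson(1000, mu) < 0.002
```

The distance shrinks at rate O(1/n), consistent with the variation-norm bounds mentioned later in this section (Exercises 11.2.1–2).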
Proof. Here we can envisage a triangular array situation in which each N_ni (i = 1, . . . , n) has the same distribution as the original process but on a dilated scale. Hence, using Propositions 3.3.I and 3.3.IV,

    Pr{N_ni(0, t] > 0} = Pr{N(0, t/n] > 0} = (λt/n)(1 + o(1)).

Summing on i = 1, . . . , n leads to (11.2.8) with μ(·) = λℓ(·). Similarly, from Proposition 3.3.V,

    Pr{N_ni(0, t] > 1} = Pr{N(0, t/n] > 1} = o(1/n),

and again summing on i leads to (11.2.7).

The statement and proof need change when X = R^d; see Exercise 11.2.4.

We conclude this section by briefly reviewing some extensions and further developments. Some of the results that we have handled by generating function arguments can be strengthened to give results concerning bounds in variation norm. In particular, there are elegant bounds that follow via the use of Poisson approximants (see Exercises 11.2.1–2). Extensions to the multivariate, nonorderly, and marked point process cases can generally be handled by applying the preceding results to the case where X has the product form X × K for an appropriate mark space K.

Example 11.2(a) Convergence to a multivariate independent Poisson process. Suppose there is given a point process in which each point is identifiable as one of a finite set of types 1, . . . , K say. The process can be described by multivariate processes with component processes N_ni^{(k)}(·) (k = 1, . . . , K). We seek conditions for weak convergence of the superpositions to a limit process in which the different types follow independent Poisson processes with parameter measures μ_k (k = 1, . . . , K). This last process can thus be regarded as a Poisson process on the space X × {1, . . . , K} with overall measure μ such that μ_k(·) = μ(· × {k}). Similarly, regard the family {N_ni^{(k)}: k = 1, . . . , K} as defining a process N_ni on X × {1, . . . , K}. To apply Theorem 11.2.V we have to interpret (11.2.7) and (11.2.8), which apply to the overall processes N_ni, in terms of the components.

We take A at (11.2.7) and (11.2.8) to be a product set of the form B × {1, . . . , K} for some bounded Borel set B that is a stochastic continuity set for each of μ_1, . . . , μ_K; that is,

    μ_k(∂B) = 0    (k = 1, . . . , K).    (11.2.9)

Then (11.2.7) becomes

    Σ_{i=1}^{m_n} Pr{Σ_{k=1}^K N_ni^{(k)}(B) ≥ 2} → 0    (n → ∞),    (11.2.10)
which incorporates the requirement (crucial if the limit process is to have independent components) that there should be zero limiting probability of two distinct components each contributing points to the same bounded B. Similarly, (11.2.8) takes the form that for all bounded B_k for which μ_k(∂B_k) = 0, k = 1, . . . , K,

    Σ_{i=1}^{m_n} Pr{Σ_{k=1}^K N_ni^{(k)}(B_k) ≥ 1} → Σ_{k=1}^K μ_k(B_k),

which in view of (11.2.10) is satisfied if and only if for each bounded B for which μ_k(∂B) = 0 (k = 1, . . . , K),

    Σ_{i=1}^{m_n} Pr{N_ni^{(k)}(B) ≥ 1} → μ_k(B).    (11.2.11)

Note also that the u.a.n. condition here becomes

    lim_{n→∞} sup_i Pr{Σ_{k=1}^K N_ni^{(k)}(B) > 0} = 0,

again apparently incorporating a constraint on the simultaneous occurrence of points of several types. However, because the mark space here consists only of a finite set of types, the u.a.n. condition is equivalent to a componentwise u.a.n. condition, namely,

    lim_{n→∞} sup_i Pr{N_ni^{(k)}(B) > 0} = 0    (k = 1, . . . , K).    (11.2.12)

Thus, from Theorem 11.2.V we have the following corollary.

Corollary 11.2.VII. For a K-variate independent triangular array satisfying the u.a.n. condition (11.2.12), the necessary and sufficient conditions for convergence to a K-variate Poisson process with independent components are that (11.2.10) and (11.2.11) hold.

Necessary and sufficient conditions for convergence to other types of infinitely divisible point processes, in particular the Poisson cluster process, can be derived by referring back to the general results of Theorem 11.2.III. The procedure is similar to that outlined in Theorem 11.2.V: first identify the KLM measure for the process of interest, and then use (iii) to obtain necessary and sufficient conditions on the component probabilities to ensure convergence to the appropriate limit. The particular case of convergence to a Gauss–Poisson process is outlined in Exercise 11.2.5. There are further complements to Example 11.2(a) in Exercises 11.2.6–7.

Finally, we return to the question of convergence of u.a.n. arrays of random measures broached at the beginning of this section. From the general structural form of an infinitely divisible random measure given in Proposition 10.2.IX, we have the following condition for the convergence of a u.a.n. array of random measures (see also Exercise 11.2.10).
Proposition 11.2.VIII. Let {ξ_ni: i = 1, . . . , m_n; n = 1, 2, . . .} be an independent u.a.n. array of random measures on the c.s.m.s. X, ξ_n = Σ_{i=1}^{m_n} ξ_ni, and ξ an infinitely divisible random measure with Laplace representation (10.2.11). Then necessary and sufficient conditions for ξ_n → ξ weakly are
(i) Σ_{i=1}^{m_n} P_ni^{(0)} → Λ Q-weakly on M₀^#(X);
(ii) for all bounded A ∈ B_X,

    lim_{ε→0} lim sup_{n→∞} Σ_{i=1}^{m_n} E[ξ_ni(A) I_{[0,ε]}(ξ_ni(A))]
        = lim_{ε→0} lim inf_{n→∞} Σ_{i=1}^{m_n} E[ξ_ni(A) I_{[0,ε]}(ξ_ni(A))] = α(A); and

(iii) lim_{r→∞} lim sup_{n→∞} Σ_{i=1}^{m_n} P{ξ_ni(A) > r} = 0.

Just as for point processes, the conditions (i)–(iii) of this proposition can be summarized more succinctly into the single requirement that the Laplace functionals should satisfy

    Σ_{i=1}^{m_n} (1 − L_ni[f]) → ∫_X f(x) α(dx) − ∫_{M₀^#(X)} [exp(−∫_X f(x) ξ(dx)) − 1] Λ(dξ).    (11.2.13)

A sketch proof, completely analogous to that of Theorem 11.2.III, is set out in Exercises 11.2.8–9. For a more detailed treatment see Kallenberg (1975, Chapter 6).
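Returning to Example 11.2(a), the role of condition (11.2.10) can be quantified in a toy two-type array (hypothetical, not from the text): if each of n sources emits at most one point, of type 1 with probability a/n or type 2 with probability b/n, the pair of type counts is trinomially distributed, and its correlation is of order 1/n. This is precisely why the limiting components can be independent Poisson variables, as in Corollary 11.2.VII.

```python
import math

# Exact correlation of the two type counts in a toy two-type u.a.n. row:
# (S1, S2, n - S1 - S2) ~ Multinomial(n; a/n, b/n, 1 - (a+b)/n).
# The correlation is negative (the types compete for sources) but
# vanishes like 1/n, so the limit components decorrelate.

def type_count_corr(n, a, b):
    p1, p2 = a / n, b / n
    e1 = e2 = e11 = e22 = e12 = 0.0
    for j1 in range(n + 1):
        for j2 in range(n - j1 + 1):
            w = (math.factorial(n)
                 / (math.factorial(j1) * math.factorial(j2)
                    * math.factorial(n - j1 - j2))
                 * p1**j1 * p2**j2 * (1 - p1 - p2)**(n - j1 - j2))
            e1 += w * j1; e2 += w * j2
            e11 += w * j1 * j1; e22 += w * j2 * j2; e12 += w * j1 * j2
    cov = e12 - e1 * e2
    return cov / math.sqrt((e11 - e1**2) * (e22 - e2**2))

c40 = type_count_corr(40, 1.5, 2.0)   # illustrative rates a = 1.5, b = 2.0
c80 = type_count_corr(80, 1.5, 2.0)
assert c40 < 0 and abs(c80) < abs(c40)   # negative, shrinking roughly like 1/n
```

The brute-force value agrees with the closed multinomial formula Corr(S1, S2) = −√(p1·p2/((1−p1)(1−p2))) ≈ −√(ab)/n.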
Exercises and Complements to Section 11.2

11.2.1 (a) Let P, P* be the distributions of a totally finite point process on X and its Poisson approximant, respectively. Show that

    ‖P − P*‖ ≤ 2(P{N(X) > 0})².

(b) Denoting the convolution of the measures P_ni by *_i P_ni, show that

    ‖*_i P_ni − *_i P*_ni‖ ≤ 2 Σ_i (P_ni{N(X) > 0})².

(c) Conclude that for independent u.a.n. arrays, the weak convergence of the probability measures in parts (i) and (ii) of Theorem 11.2.III can be replaced by their strong convergence on bounded Borel sets A.

11.2.2 For finite point processes P_j with Poisson approximants P_j* (j = 1, 2), show that ‖P₁* − P₂*‖ ≤ 2‖P₁ − P₂‖.
11.2.3 Restate Theorem 11.2.III as follows: the necessary and sufficient condition for convergence of an independent u.a.n. array is that

    Σ_{i=1}^{m_n} (1 − G_ni[h]) → −∫_{N₀^#(X)} [exp(∫_X log h(x) N(dx)) − 1] Q̂(dN)

for all h ∈ V(X), and that the right-hand side then equals −log G[h], where G is the p.g.fl. of the limit point process. Hence or otherwise deduce that Theorem 11.2.III remains valid in the case m_n = ∞ provided only that the resultant superpositions are well defined.

11.2.4 Extensions of the prototype limit result for superpositions.
(a) Formulate statements analogous to Proposition 11.2.VI when N is stationary but, instead of simple, it is (i) nonorderly; or (ii) a marked point process (does the limiting process have independent marks?).
(b) Consider the case X = R^d for some d ≥ 2 in which N_n is as stated in Proposition 11.2.VI except that any A ∈ B_X is now rescaled by a factor n^{1/d}. What conditions on N suffice for the weak convergence of N_n? What is needed to obtain nontrivial conclusions in the scenarios of (a)?
11.2.5 For a Gauss–Poisson process [see Example 6.3(d)] the KLM measure Q̂ is described in Exercise 10.2.3. Use this to deduce that if an independent u.a.n. array converges to a Gauss–Poisson process, then the following hold for bounded A, A₁, A₂ in B_X:
(a) Σ_i Pr{N_ni(A) ≥ 3} → 0;
(b) Σ_i Pr{N_ni(A) = 1} → Q̂₁{N(A) = 1} + 2Q̂₂{N(A) = 1};
(c) Σ_i Pr{N_ni(A₁) = 1, N_ni(A₂) = 1} → Q̂₂{N(A₁) = 1, N(A₂) = 1} for disjoint A₁, A₂.
11.2.6 Express the result of Example 11.2(a) in terms of multivariate p.g.fl.s.

11.2.7 Formulate a limit theorem for superpositions of independent marked point processes on the space X × K by regarding the components and the limit as point processes on X × K. [Hint: Compare with Example 11.2(a), and assume first that the marks are independent as in Proposition 6.4.IV.]

11.2.8 Compare the statements of Theorem 11.2.III and Proposition 11.2.VIII. To establish the latter, proceed as below.
(a) Introduce Poisson approximants and show that for a u.a.n. array their sum converges if the sum of the original summands converges.
(b) Prove an analogue of Proposition 11.2.II for the convergence of infinitely divisible random measures in terms of the convergence of their components α_n and Λ_n.
(c) Finally, apply part (b) to the Poisson approximants.

11.2.9 (Continuation). Show that the equivalence of (11.2.13) and Proposition 11.2.VIII can be regarded as a continuity theorem for Laplace functionals complicated by the detail of the behaviour near the zero measure.

11.2.10 Interpret the conditions in Proposition 11.2.VIII in terms of the setting of Chapter 4 of Parthasarathy (1967).
11.3. Thinned Point Processes

The notion of thinning a point process to construct another point process has been described in Example 4.3(a) for renewal processes in the simplest case where the thinning occurs independently for each point. The idea underlying the operation is that, in principle, points of a process may occur at any of a very large number of locations, but it is only at a relatively small proportion of such locations that points are observed. The limit theorems described below formulate sufficient conditions for this process of rarefaction (or thinning or deletion) to lead in the limit to a Poisson process.

Let N be a point process on X = R^d and p(·) a measurable function on X with 0 ≤ p(x) ≤ 1 (all x). N_{p(·)} is obtained from N by independent thinning according to p(·) when the following holds. Let the realization N(·, ω) consist of the countable set of points {x_i} (cf. Proposition 9.1.V); construct a subset of these points by taking each x_i in turn, deleting it with probability 1 − p(x_i) and retaining it with probability p(x_i), independently for each point; and regard the set of points so retained as defining a realization of the thinned point process N_{p(·)}(·, ω).

Some form of rescaling is needed before a limit theorem can emerge: the simplest set-up is the following. Take X = R and p(x) = p (all x), and after thinning contract the scale in X by an amount p, so that the point x ∈ R is mapped into px; equivalently, the point process, say Ñ_p(·), resulting from both thinning and scale-contraction, has Ñ_p(A) = k only if from the original process exactly k of the N(p⁻¹A) points in the set p⁻¹A are retained in the thinning process.

Proposition 11.3.I. Let Ñ_p(·) denote the sequence of point processes obtained by independent thinning and contraction at rate p = 1/T from a point process N(·) on X = R, and let N_∞ denote a stationary Poisson process at rate λ. Then

    Ñ_p(·) → N_∞ weakly

if and only if, as p → 0, for every bounded A ∈ B_X,

    pN(p⁻¹A) ≡ (1/T)N(TA) → λℓ(A)    in probability.    (11.3.1)

Proof. For independent thinnings, the mechanics are most easily described in terms of p.g.fl.s. Indeed, the thinned process can be regarded as an especially simple form of cluster process, in which each of the points of the original process may be regarded as the centre of a cluster, and the cluster itself is either empty (if the point is deleted) or has just one point at the site of the cluster centre. Suppose first that we are given a general thinning function p(x). Then for h ∈ V(R), the p.g.fl. of the cluster member process, given a centre at y, and in the notation of equations (6.3.6) and (6.3.7), is

    G_m[h | y] = p(y)h(y) + 1 − p(y).

Thus, the p.g.fl. G_{p(·)}[h] can be written in the abbreviated notation

    G_{p(·)}[h] = G[1 − p + ph],    (11.3.2)
where G, the p.g.fl. of the original process, here plays the role of the p.g.fl. of the cluster centre process. In particular, it follows easily from this representation that a Poisson process remains Poisson after independent deletions [see Exercise 11.3.1(a)].

Now suppose that the deletion function p(x) ≡ p (all x), and denote by G̃_p the p.g.fl. of the point process Ñ_p after deletion and rescaling. From equation (11.3.2) we obtain

    G̃_p[h] = E[exp(∫_X log h(x) Ñ_p(dx))]
            = E[exp(∫_X log{1 − p[1 − h(px)]} N(dx))]
            = E[exp(∫_X log{1 − p[1 − h(x)]} N(dx/p))].

The logarithmic term here equals −p[1 − h(x)](1 + O(p)) for p ↓ 0, so, using continuity of the generating function with respect to convergence in probability,

    G̃_p[h] = E[exp(−∫_X [1 − h(x)] p(1 + O(p)) N(dx/p))]
           → exp(−λ ∫_X [1 − h(x)] ℓ(dx))

if and only if (11.3.1) holds.

An equivalent proof in terms of the convergence of one-dimensional distributions and using the characterization result of Theorem 9.2.XII is given in Westcott (1976) [see Exercise 11.3.2(a) and Proposition 11.1.IX]. Proof in the case of a renewal process is simpler [see Example 4.3(a) and Exercise 11.3.1(b)].

Equation (11.3.1) requires the individual realizations to satisfy an almost sure averaging property with a deterministic limit; if instead of (11.3.1) we have

    pN(p⁻¹A, ω) → λ(ω)ℓ(A)    in probability    (11.3.3)

for some r.v. λ(·) defined on the space (Ω, F, P) on which N is defined [and, implicitly, (Ω, F, P) is assumed to be large enough to embrace the independent thinning process], the conclusion of Proposition 11.3.I is modified as below. This in turn is a special case of Theorem 11.3.III, so the proof is omitted.

Proposition 11.3.II. Ñ_p(·) converges weakly to a mixed Poisson process, with mixing random variable λ, if and only if (11.3.3) holds.

The formulation at (11.3.1) or (11.3.3) specifies a particular form of the measure approximated by λℓ(A), namely, pN(p⁻¹A). An alternative approach is simply to postulate the existence of a sequence of point processes {N_n(·)}
such that, given a sequence of thinning probability functions {p_n(x)} satisfying

    0 ≤ p_n(x) ≤ 1    (x ∈ X, n = 1, 2, . . .)    (11.3.4a)

and

    sup_{x∈X} p_n(x) → 0    (n → ∞),    (11.3.4b)

the sequence of random measures Λ_n, where Λ_n(A) = ∫_A p_n(x) N_n(dx), then satisfies

    Λ_n → Λ weakly    (11.3.5)

for some limit random measure Λ(·). Here, we may also allow the functions p_n(·) to be stochastic, subject to the constraints at (11.3.4) and (11.3.5). Note that the operation of ‘scale-contraction’ needs care when the space X = R^d say: the independent thinning operation implies that the expectation measure M(·) of N becomes pM(·) after thinning, so we should look for convergence of pM(A/p^{1/d}) in order to obtain a nontrivial d-dimensional analogue of the following basic result (see Exercise 11.3.4).

Theorem 11.3.III. Let {p_n(x): x ∈ X, n = 1, 2, . . .} be a sequence of measurable stochastic processes satisfying (11.3.4), {N_n: n = 1, 2, . . .} a sequence of point processes, and Ñ_n the process obtained from N_n by independent thinning according to p_n. Then there exists a point process N for which

    Ñ_n → N weakly

if and only if (11.3.5) holds for some random measure Λ, in which case N is the Cox process directed by Λ.
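A quick Monte Carlo sketch of the thinning mechanism of Proposition 11.3.I (all parameters illustrative, not from the text): a renewal process with Gamma(2, 1) interarrivals has rate λ = 1/2, and retaining each point independently with probability p while contracting the scale by p should make the count in (0, 1] — that is, the number of retained points of N in (0, p⁻¹] — approximately Poisson(λ).

```python
import math
import random

# Thin a Gamma(2,1)-interarrival renewal process (rate 1/2) at retention
# probability p and count the survivors in the original window (0, 1/p],
# i.e. the contracted window (0, 1].  The counts should be close to
# Poisson(1/2) in mean and in the probability of emptiness.

rng = random.Random(0)
p, lam, reps = 0.01, 0.5, 4000
counts = []
for _ in range(reps):
    t, kept = rng.gammavariate(2, 1), 0
    while t <= 1 / p:
        if rng.random() < p:          # independent thinning
            kept += 1
        t += rng.gammavariate(2, 1)
    counts.append(kept)

mean = sum(counts) / reps
p0 = counts.count(0) / reps
assert abs(mean - lam) < 0.05                 # E N~_p(0,1] -> lambda
assert abs(p0 - math.exp(-lam)) < 0.03        # P{N~_p(0,1] = 0} -> e^{-lambda}
```

Condition (11.3.1) holds here because the renewal process is ergodic, so pN(p⁻¹A) concentrates at λℓ(A); for a non-ergodic N the same experiment would instead exhibit the mixed Poisson limit of Proposition 11.3.II.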
When equations (11.3.4) are satisfied we can write − log 1 − pn (x)[1 − h(x)] = pn (x)[1 − h(x)][1 + Rn (x)],
158
11. Convergence Concepts and Limit Theorems
where |Rn (x)| ≤ 12 pn (x) and θn = supx∈X |Rn (x)| → 0 as n → ∞. We can therefore write log 1 − pn (x)[1 − h(x)] Nn (dx) = [1 − h(x)] 1 + Rn (x) Λn (dx). − X
X
If now (11.3.5) holds, then the random variables X [1 − h(x)] Λn (dx) converge in distribution to X [1 − h(x)] Λ(dx), so that their Laplace transforms, and n [h], converge to the Laplace transform of their limit, hence also the p.g.fl.s G namely, n [h] → E exp − G
X
[1 − h(x)] Λ(dx)
.
(11.3.6)
The right-hand side here is just the p.g.fl. of the Cox process directed by Λ, which completes the proof that (11.3.5) is sufficient. n converge. We first establish Suppose conversely that the point processes N that the random measures Λn are weakly compact. Referring to Proposition 11.1.V, let S be a closed sphere in X , and consider the random variables n } implies G n [h] → G∞ [h], say, Λn (S). Weak convergence of the sequence {N for h ∈ V, and in particular for h = hz , where 0 ≤ z ≤ 1 and hz (x) =
z 1
(x ∈ S), (x ∈ S).
But, assuming (11.3.4), this is equivalent to convergence of the Laplace transforms E exp[−(1 − z)Λn (S)] to the limit G∞ [hz ], which is continuous in z as z → 1. Then the continuity theorem for Laplace transforms implies that the limit is the Laplace transform of a proper distribution, and hence that the distributions of the random variables Λn (S) are uniformly tight. Thus, given > 0 we can find M < ∞ such that for all n, P{Λn (S) > M } < ; that is, (11.1.1) holds. n converge weakly then, given η > 0, there exists a As for (11.1.2), if the N compact C such that for n = 1, 2, . . . , n (S − C) > 0} < η P{N [i.e., we use the necessity of (11.1.2) for the point processes]. Now set h(x) = 0 or 1 as x ∈ or ∈ / S − C, and deduce as above that from (11.3.4), n (S − C) > 0} − E exp − Λn (S − C) → 0 1 − P{N
(n → ∞).
11.3.
Thinned Point Processes
159
Thus, for sufficiently large n, n (S − C) > 0} + η ≤ 2η. E 1 − exp[−Λn (S − C)] ≤ P{N
(11.3.7)
But by a basic inequality, because 1 − e−x is nonnegative and monotonic, P{Λn (S − C) > δ} ≤
E[1 − e−Λn (S−C) ] 2η ≤ . 1 − e−δ 1 − e−δ
Thus, no matter how small δ and , we can find C such that P{Λn (S −C) > δ} < for all sufficiently large n, and hence (by modifying C if necessary) for all n > 0. Thus, both conditions (11.1.1) and (11.1.2) hold for the sequence Λn . It is n that any limit now a simple matter to deduce from the convergence of the G random measure Λ must satisfy 1 − h(x) Λ(dx) = G∞ [h]. E exp − X
It follows that the limit Λ must be unique and that the Laplace functionals of the Λn converge to that of Λ, so that (11.3.5) holds. One of the important applications of point process methods, and of the concept of thinning in particular, is to the study of high-level crossings of a continuous stochastic process. We do not treat this topic in detail, for which see Leadbetter, Lindgren, and Rootzen (1983) and the earlier text by Cram´er and Leadbetter (1967), apart from briefly indicating one possible approach as follows. Consider a nonnegative, discrete time process {Xn } and associate with each n the point (n, Xn ) of a marked point process in R × R+ , where R+ plays the role of the mark space K in Definition 9.1.V. Let the underlying process of time points in R be thinned by rejecting all pairs (n, Xn ), for which (say) Xn ≤ M , and let the time axis be rescaled suitably. We may now seek conditions under which the rescaled process converges to a limit as M → ∞. In general, the thinnings here are not independent and the resulting process may exhibit substantial clustering properties. With suitable precautions, however, and assuming some asymptotic independence or mixing conditions, we may anticipate convergence to a Poisson limit. A richer theory might be expected to result if one could retain the values of the marks accepted, albeit themselves rescaled in an appropriate manner. We now give an example of this kind, in the especially simple case where the marks {Xn } are i.i.d.; yet another approach to dependent thinnings, where the probability of thinning is allowed to depend on the previous history, is outlined in Proposition 14.2.XI.
Example 11.3(a) Thinning by the tails of a regularly varying distribution. With the set-up just described, suppose that the initial points {t_i} form a stationary, ergodic process N₀ with finite mean rate m, and that the marks X_i are i.i.d. with regularly varying tails [see, e.g., Feller (1971, Section VIII.8) or Bingham, Goldie, and Teugels (1987)], so that for some α > 0,

    1 − F(x) = L(x)x^{−α}    (x → ∞),

where L(x) is slowly varying at infinity; that is, L(cx)/L(x) → 1 for all c > 0. Consider now a sequence of point processes on the space R × R₊ obtained in the following manner. For each n = 1, 2, . . . , set

    N_n((t₁, t₂] × (u, v]) = #{(t_i, x_i): nt₁ < t_i ≤ nt₂ and a_n u < x_i ≤ a_n v},

where the sequence of constants {a_n: n = 1, 2, . . .} is defined by

    1 − F(a_n) = 1/n,    equivalently,    a_n = F⁻¹(1 − 1/n),

and we assume for convenience that the distribution F is continuous so the inverse F⁻¹ is well defined.

Because the marks are independent, the p.g.fl. of the marked process on R × R₊ = X × K can be written, for suitable functions h(·), in the form

    G[h] = E[exp(∫_R log(∫_{R₊} h(t, y) dF(y)) N₀(dt))].

The function h here must of course lie in V(X × K) (see Proposition 6.4.IV), but, because the limit point process is boundedly finite only in subsets of the mark space bounded away from the origin, h should be equal to unity in a neighbourhood of the origin on the mark space R₊. In effect, the metric in the mark space should be modified so that the origin on the y-axis becomes a point at ∞. With this modification the p.g.fl. theory carries through without change.

For the rescaled process, we have to consider

    ∫_{R₊} h(t/n, y/a_n) dF(y) = ∫_{R₊} h(t/n, y) dF(a_n y)
        = 1 − ∫₀^∞ [1 − h(t/n, y)] dF(a_n y).

The assumption of regular variation of F is equivalent to the weak convergence of the measures nF(a_n ·) to the measure ν defined by ν(y, ∞) = y^{−α}, because for any interval (u, v] with 0 < u < v < ∞,

    n ∫_u^v dF(a_n y) = n[(1 − F(a_n u)) − (1 − F(a_n v))]
        = (1 − F(a_n u))/(1 − F(a_n)) − (1 − F(a_n v))/(1 − F(a_n))
        = [L(a_n u)/L(a_n)] u^{−α} − [L(a_n v)/L(a_n)] v^{−α}
        → u^{−α} − v^{−α}    (n → ∞).
11.3.
Thinned Point Processes
161
Consequently, the innermost integral in the expression for G[h] is expressible as ∞ t * ∞) t y −1 1 + o(1) dF (y) = 1 − n 1 − h , y dν(y); h , n an n 0 0 thus, the p.g.fl. of the rescaled process, Gn say, is given by t y F (dy) N0 (dt) log h , Gn [h] = E exp n an R R+ ) t * 1 − h , y ν(dy) N0 (dt) = E exp n−1 1 + o(1) n R R+ → E exp (1 − h(u, y)) ν(dy)m du as n → ∞, R
R+
using the ergodicity of $N_0$. The limit process is thus a Poisson process on $R \times R_+$ with intensity measure $m\ell(\cdot) \times \nu$, $\ell$ denoting Lebesgue measure. For any $c > 0$, the limit process restricted to points with marks above $c$ is a compound Poisson process where the marks are distributed on $(c, \infty)$ according to the distribution with d.f. $F_c(x) = 1 - (x/c)^{-\alpha}$. Strictly speaking, the overall process is not a compound Poisson process as defined in Section 6.4, because the ground process is not boundedly finite. Such extended compound Poisson processes appear also in the discussion of stable random measures (see Section 10.2) and of self-similar MPPs in Section 12.8. For further examples and applications see Resnick (1986, 1987).

We now indicate how the convergence in Theorem 11.3.III can be strengthened. The theorem asserts the weak convergence of the fidi distributions; we prove the stronger property that the fidi distributions converge in variation norm, and at the same time provide a bound on the rate of convergence.

For probability measures $P_1$, $P_2$ on the measurable space $(\Omega, \mathcal{E})$ the variation metric $d(P_1, P_2)$ can be defined by $d(P_1, P_2) = \sup_{B \in \mathcal{E}} |P_1(B) - P_2(B)|$. This metric has the probabilistic interpretation that $d(P_1, P_2) = \inf \Pr\{\omega\colon X(\omega) \ne Y(\omega)\}$, where the infimum is taken over all pairs of measurable functions $X$, $Y$ on $(\Omega, \mathcal{E}, \Pr)$ inducing the measures $P_1$, $P_2$, respectively. A pair $(X, Y)$ for which equality holds constitutes a maximal coupling for the probability measures $P_1$, $P_2$. The metric $d(\cdot, \cdot)$ differs by a factor of 2 from the variation distance because, in the notation of Appendix A1.3 where it is defined,
$$V_{P_1 - P_2} = \|P_1 - P_2\| = \int_\Omega |P_1(d\omega) - P_2(d\omega)| = \sup_{\mathcal{T}} \sum_{i=1}^{n(\mathcal{T})} |P_1(A_i) - P_2(A_i)| = 2\,d(P_1, P_2),$$
the supremum being taken over all finite measurable partitions $\mathcal{T} = \{A_1, \ldots, A_{n(\mathcal{T})}\}$ of $\Omega$.
This notation $\|\cdot\|$ is as in MKM (1978), where $\operatorname{Var}(\cdot) = \|\cdot\|$ is also used. Because the limit r.v. in Theorem 11.3.III is Poisson, our concern is with $d(P_n, P_\infty)$, where the limit probability measure $P_\infty$ is Poisson and nonatomic. The Rényi–Mönch Theorem 9.2.XII asserts that such measures are characterized by their one-dimensional distributions, and by Proposition 11.1.IX it is then enough here to consider the quantity
$$d\big(N_n(A), N(A)\big) \equiv d\big(\Pr\{N_n(A) \in \cdot\},\ \Pr\{N(A) \in \cdot\}\big)$$
for any bounded Borel set $A$ [we abuse the notation in replacing the distributions of r.v.s in $d(\cdot, \cdot)$ by the r.v.s themselves]. Furthermore, for nonnegative integer-valued r.v.s, $X$, $Y$ say, with distributions $\{p_k\}$, $\{q_k\}$ say, we have
$$d(X, Y) = d(\{p_k\}, \{q_k\}) = \sup_{A \subseteq Z_+}\Big|\sum_{k \in A}(p_k - q_k)\Big| = \tfrac12\sum_{k=0}^\infty |p_k - q_k| = \sum_{k=0}^\infty (p_k - q_k)^+.$$
Proposition 11.3.IV. In the setting of Theorem 11.3.III, for bounded $A \in \mathcal{B}_\mathcal{X}$,
$$d\big(N_n(A), N(A)\big) \le E\Big[\sup_{x \in A}|p_n(x)| + 1 - \exp\big(-|\Lambda_n(A) - \Lambda(A)|\big)\Big].$$

Proof. Observe that it is enough to prove the result in the context that the functions $p_n$ and measures $\Lambda_n$, $\Lambda$ are deterministic, for if otherwise, describe these entities as functions of $\omega'$ with distribution $\mu(\cdot)$. Then
$$d\big(N_{p_n}(A), N(A)\big) = \sup_B |P_1(B) - P_2(B)| = \sup_B\Big|\int_\Omega \big[P_1(B;\omega') - P_2(B;\omega')\big]\,\mu(d\omega')\Big| \le \int_\Omega \sup_B\big|P_1(B;\omega') - P_2(B;\omega')\big|\,\mu(d\omega') = \int_\Omega d\big(N_{p_n}(A), N(A) \mid \omega'\big)\,\mu(d\omega').$$
The first term in the bound comes from Lemma 11.3.V below, and the second comes from the fact that for Poisson r.v.s $X$, $Y$ with means $\lambda$, $\mu$,
$$d(X, Y) \le d(0, Z) = 1 - e^{-|\lambda - \mu|},$$
where $Z$ is Poisson with mean $|\lambda - \mu|$. The second term in the bound can be tightened: see Exercise 11.3.5.

Lemma 11.3.V. Let $X_1, \ldots, X_n$ be independent Bernoulli r.v.s with $p_j = \Pr\{X_j = 1\} = 1 - \Pr\{X_j = 0\}$, and $Y$ a Poisson r.v. with mean $\lambda = \sum_{j=1}^n p_j$.
Then the distributions of $Y$ and $S = \sum_{j=1}^n X_j$ have variation distance $d(S, Y)$ bounded as in
$$d(S, Y) \le \frac{C\sum_{j=1}^n p_j^2}{\sum_{j=1}^n p_j} \le C\max_j\{p_j\}, \tag{11.3.8}$$
where $C \le 0.71$ when $\max_j\{p_j\} \le 0.25$. Always, $C \le 1$.

Remarks. Although Stein–Chen methods have been much used to prove results like this lemma, we give a direct approach based on Fourier transforms. The method of proof below draws in part on work of Samuels (1965) and Kerstan (1964a). It shows that the value of the constant $C$ depends on the value of $\max\{p_j\}$ and can be reduced further by supposing that the maximum is smaller than 0.25. Its smallest value, when the maximum $\to 0$, is equal (by the method below) to 0.409, which is tighter than that quoted by Romanowska (1978) for the more restricted case of a simple binomial approximated by a Poisson. Denoting the middle term in (11.3.7) by $C$, the computations in the later part of the proof below can be tightened further by retaining $\xi$ in (11.3.8) as a function of $\theta$ and integrating numerically; then 0.61 can replace $0.70789 \approx 0.71$ in the theorem. The general result that $C \le 1$ follows from work of Barbour and Hall (1984) and is not discussed here.

Proof. The sum $S$ has a Poisson binomial distribution, $\{b_k\}$ say, for which the generating function is
$$\sum_{k=0}^n b_k z^k = \prod_{j=1}^n (1 - p_j + p_j z) \qquad (|z| \le 1).$$
An inequality of Newton used in Samuels (1965) implies that $c_k = b_k\big/\binom nk$ is a log-concave sequence; that is, $c_k^2 \ge c_{k-1}c_{k+1}$; equivalently,
$$b_k^2 \ge \big(1 + (n-k)^{-1}\big)\big(1 + k^{-1}\big)\,b_{k-1}b_{k+1} \qquad (k = 1, \ldots, n-1).$$
Write $\pi_k = \pi_k(\lambda) \equiv e^{-\lambda}\lambda^k/k!$. Then
$$d(S, Y) = \sum_{k=0}^n (b_k - \pi_k)^+ = \sum_{k=0}^n \pi_k(b_k/\pi_k - 1)^+,$$
and this summation will involve nonzero terms on a single interval of integers, $\{k_0 + 1, \ldots, k_1\}$ say, if the sequence of ratios $\{b_k/\pi_k\}$ is unimodal. For this it suffices that the ratio of ratios $(b_k/\pi_k)/(b_{k-1}/\pi_{k-1})$, which equals $kb_k/\lambda b_{k-1}$, be monotonic in $k$, and this is implied by the corollary to the Newton inequality (see Exercise 11.3.6). Consequently, the sup distance
$$d_0(S, Y) \equiv \sup_k\big|\Pr\{S \le k\} - \Pr\{Y \le k\}\big| = \max\Big(\sum_{k=0}^{k_0}(\pi_k - b_k),\ \sum_{k=k_1+1}^n(\pi_k - b_k)\Big).$$
By addition, we thus have $2d_0(S, Y) \ge d(S, Y) \ge d_0(S, Y)$, so to bound the variation distance $d$, it suffices to bound the sup distance $d_0$. The Fourier inversion relation gives
$$d_0(S, Y) = \sup_k\Big|\frac{1}{2\pi i}\int_{-\pi}^{\pi}\frac{E(e^{i\theta S}) - E(e^{i\theta Y})}{1 - e^{i\theta}}\,e^{-i\theta k}\,d\theta\Big| \le \frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{|E(e^{i\theta S}) - E(e^{i\theta Y})|}{|2\sin\frac12\theta|}\,d\theta.$$
The characteristic functions here are $E(e^{i\theta Y}) = \exp[-\lambda(1 - e^{i\theta})]$ and
$$E(e^{i\theta S}) = \prod_{j=1}^n\big[1 - p_j(1 - e^{i\theta})\big] = E(e^{i\theta Y})\prod_{j=1}^n \exp\big[p_j(1 - e^{i\theta})\big]\big[1 - p_j(1 - e^{i\theta})\big].$$
Each term in the product here is of the form
$$\big[1 - p(1 - e^{i\theta})\big]\exp\big[p(1 - e^{i\theta})\big] = 1 - \sum_{k=2}^\infty\frac{[p(1 - e^{i\theta})]^k(k-1)}{k!} \equiv 1 - p^2(1 - e^{i\theta})^2 f(p;\theta) \qquad \text{say},$$
so
$$|E(e^{i\theta S}) - E(e^{i\theta Y})| = |E(e^{i\theta Y})|\,\Big|\prod_{j=1}^n\big[1 - p_j^2(1 - e^{i\theta})^2 f(p_j;\theta)\big] - 1\Big|$$
$$= e^{-\lambda(1-\cos\theta)}\Big|\sum_{r=1}^n\big[-(1 - e^{i\theta})^2\big]^r\sum_{j_1 < \cdots < j_r}\prod_{k=1}^r p_{j_k}^2 f(p_{j_k};\theta)\Big|$$
$$\le e^{-\lambda(1-\cos\theta)}\Big(\prod_{j=1}^n\big[1 + 2(1 - \cos\theta)\,p_j^2\,|f(p_j;\theta)|\big] - 1\Big) \le e^{-\lambda(1-\cos\theta)}\big(\exp[\lambda\xi(\theta)(1 - \cos\theta)] - 1\big),$$
where
$$\xi(\theta) = \frac{2\sum_{j=1}^n p_j^2\,|f(p_j;\theta)|}{\sum_{j=1}^n p_j}$$
and
$$|f(p;\theta)| \le \sum_{k=2}^\infty\frac{|2p\sin\frac12\theta|^{k-2}(k-1)}{k!} = \frac{1 - (1 - \psi)e^\psi}{\psi^2}, \qquad \text{with } \psi = |2p\sin\tfrac12\theta|.$$
This bound is a maximum for $|\theta| \le \pi$ at $\theta = \pi$, where it equals $(1 - (1 - 2p)e^{2p})/4p^2$, which increases monotonically with $p$ in $p > 0$.
Write $\xi = \xi(\pi)$. Then using $d \le 2d_0$ and the bounds above gives
$$d(S, Y) \le \frac1\pi\int_0^\pi\frac{e^{-\lambda(1-\cos\theta)}\big[e^{\lambda\xi(1-\cos\theta)} - 1\big]}{\sin\frac12\theta}\,d\theta = \frac{2\lambda}{\pi}\int_0^\pi\sin\tfrac12\theta\int_0^\xi\exp\big[-2\lambda(1-y)\sin^2\tfrac12\theta\big]\,dy\,d\theta$$
$$= \frac{4\lambda}{\pi}\int_0^1 du\int_0^\xi\exp\big[-2\lambda(1-y)(1-u^2)\big]\,dy \le \frac{4\lambda}{\pi}\int_0^\xi dy\int_0^1\exp\big[-2\lambda(1-y)(1-u)\big]\,du$$
$$\le \frac2\pi\int_0^\xi\frac{dy}{1-y} = -\frac2\pi\log(1-\xi), \tag{11.3.9}$$
uniformly in $\lambda$. For $\max_j\{p_j\} \le 0.25$, $\xi \le 0.3513$, and thus
$$d(S, Y) \le \big[\big(2\pi^{-1}\log(1/0.6487)\big)/0.25\big]\,\bar p = 1.102\,\bar p,$$
where we follow Le Cam (1960) in writing $\bar p = \sum_{j=1}^n p_j^2\big/\sum_{j=1}^n p_j$. Alternatively, the inequality (11.3.8) can be replaced by the tighter bound based on
$$\sup_{\alpha > 0}\int_0^1\alpha\exp\big[-\alpha(1-u^2)\big]\,du = c = 0.642374$$
(this supremum is attained at $\alpha = 2.255$), leading to
$$d(S, Y) \le \frac{2c}{\pi}\int_0^\xi\frac{dy}{1-y} = -\frac{2c}{\pi}\log(1-\xi) < \frac{2c\,\xi}{\pi(1-\xi)}.$$
For $\max_j\{p_j\} \le 0.25$, this yields $d(S, Y) \le 0.70789\,\bar p$, and for fixed $n$,
$$\lim_{\max_j\{p_j\}\downarrow 0} d(S, Y) \le \frac{2c\,\xi}{\pi} = 0.40895\,\xi.$$
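The bound (11.3.8) can be checked numerically. In the sketch below (our own code, not part of the text; all names are ours), the Poisson binomial distribution of $S$ is built by sequential convolution and compared in variation distance with the Poisson distribution of the same mean; both the universal constant $C = 1$ and the constant 0.71 valid for $\max_j p_j \le 0.25$ are verified on one configuration.

```python
import math

def poisson_binomial(ps):
    """pmf of S = X_1 + ... + X_n for independent Bernoulli(p_j) summands."""
    b = [1.0]
    for p in ps:
        nb = [0.0] * (len(b) + 1)
        for k, mass in enumerate(b):
            nb[k] += (1.0 - p) * mass   # X_j = 0
            nb[k + 1] += p * mass       # X_j = 1
        b = nb
    return b

def d_variation(p, q):
    n = max(len(p), len(q))
    p, q = p + [0.0] * (n - len(p)), q + [0.0] * (n - len(q))
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

ps = [0.05 + 0.01 * j for j in range(20)]     # max_j p_j = 0.24 <= 0.25
lam = sum(ps)
b = poisson_binomial(ps)
pi = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(120)]
d = d_variation(b, pi)
ratio = sum(p * p for p in ps) / lam
assert d <= ratio            # C <= 1 always (Barbour and Hall)
assert d <= 0.71 * ratio     # C <= 0.71 when max_j p_j <= 0.25
```

The convolution loop is exact, so the comparison tests the lemma itself rather than a simulation of it.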
Exercises and Complements to Section 11.3

11.3.1 (a) Verify that if a Poisson process in $\mathcal{X} = R^d$ with intensity measure $\Lambda(\cdot)$ is subjected to independent thinning with retention function $p(x)$ for measurable $p(\cdot)$, then the thinned process is Poisson with intensity measure $\Lambda_p$, where
$$\Lambda_p(A) = \int_A p(x)\,\Lambda(dx).$$
(b) When $N$ in Proposition 11.3.I is a renewal process, use inter alia transform techniques to show the following [cf. Rényi (1956)].
(i) The rarefaction of any renewal process is a renewal process.
(ii) The only renewal process invariant under the operations is the Poisson.
(iii) Starting from any renewal process with finite mean lifetime, the limit under the operations is a Poisson process.
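Part (a) of Exercise 11.3.1 is easy to probe by simulation. The sketch below (our own code; the retention function $p(x) = x$ and all parameters are illustrative choices, not from the text) thins a rate-10 homogeneous Poisson process on $[0, 1]$ and checks that the thinned counts show the Poisson mean–variance signature with mean $\int_0^1 10x\,dx = 5$.

```python
import math, random

random.seed(42)
RATE, REPS = 10.0, 20000

def poisson_rv(lam):
    # Knuth's product-of-uniforms method; adequate for small lam
    thresh, k, prod = math.exp(-lam), 0, random.random()
    while prod > thresh:
        k += 1
        prod *= random.random()
    return k

def thinned_count():
    # given the total, Poisson points on [0, 1] are i.i.d. uniform
    n = poisson_rv(RATE)
    kept = 0
    for _ in range(n):
        x = random.random()          # point position
        if random.random() < x:      # retained with probability p(x) = x
            kept += 1
    return kept

counts = [thinned_count() for _ in range(REPS)]
mean = sum(counts) / REPS
var = sum((c - mean) ** 2 for c in counts) / REPS
assert abs(mean - 5.0) < 0.15    # Lambda_p([0,1]) = 5
assert abs(var - 5.0) < 0.4      # Poisson: variance equals mean
```

Equality of mean and variance is of course only a necessary symptom of the Poisson law; the exercise asks for a proof via the p.g.fl. or avoidance function.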
11.3.2 (a) Apply Proposition 11.1.IX to furnish an alternative proof of Proposition 11.3.I via avoidance functions.
(b) Give a direct proof of Proposition 11.3.II either via p.g.fl.s or by extension of part (a).

11.3.3 Use Theorem 11.3.III to show that a distribution $P$ on $\mathcal{N}^\#_\mathcal{X}$ is a Cox distribution if and only if for each $c$ in $0 < c < 1$ there exists a distribution $P_c$ which under independent random thinning with constant probability $c$ yields $P$ [see Mecke (1968)].

11.3.4 Extensions of the prototype limit result for thinnings.
(a) Formulate statements analogous to Proposition 11.3.I when $N$ is stationary but instead of simple it is (i) nonorderly, and either (i′) batches of points are treated in toto, with all points retained with probability $p$ and otherwise all deleted, or (i″) each point of a batch is thinned independently (with retention probability $p$ per point); or (ii) a marked point process. In case (ii) does the limit process have independent marks?
(b) Consider the case $\mathcal{X} = R^d$ for some $d \ge 2$ in which $N_n$ is as stated in Proposition 11.3.I except that any $A \in \mathcal{B}_\mathcal{X}$ is now rescaled by a factor $n^{1/d}$. What conditions on $N$ suffice for the weak convergence of $N_n$? How are the results of (a) affected by having $\mathcal{X} = R^d$?

11.3.5 (a) Show that $a_k \equiv \sup_{\lambda > 0}\pi_k(\lambda)\sqrt\lambda$ occurs for $\lambda = k + \frac12$ and that the sequence of ratios $\{a_{k+1}/a_k\}$ is monotonic in $k$. Show that $a_0 = 1/\sqrt{2e} \ge a_k > a_{k+1}$ for $k = 0, 1, \ldots$, and that $\pi_k(\lambda) \le 1/\sqrt{2e\lambda}$ (every $\lambda > 0$).
(b) Express $d(\{\pi_k(\lambda)\}, \{\pi_k(\mu)\})$ as an integral with density $\pi_j(\cdot)$ for some $j$, and hence deduce that this variation metric is bounded above by $\sqrt{2/e}\,|\sqrt\lambda - \sqrt\mu|$. Thus, except for small $\Lambda_n$, $\Lambda$, the last term in Proposition 11.3.IV can be tightened [see Daley (1987)].
11.3.6 (a) Let $b_k(n; p) = \binom nk p^k(1-p)^{n-k}$, $k = 0, 1, \ldots, n$, $0 < p < 1$, denote binomial probabilities, and write $\pi_k(\lambda)$ for Poisson probabilities as in the proof of Lemma 11.3.V. Show directly that $\{b_k(n; p)/\pi_k(\lambda)\}$ is a unimodal sequence in $k$ by virtue of the monotonicity of $(k+1)b_{k+1}/b_k$.
(b) Now let $\{b_k\} = \{b_k(n; p_1, \ldots, p_n)\}$ denote the Poisson binomial distribution of $S$ of the lemma. Show by induction on $n$ that $\{(k+1)b_{k+1}/b_k\}$ is monotonic in $k$.
(c) Deduce that $d(S, Y) \le \max_k\{b_k/\pi_k\} - 1$.
11.4. Random Translations

The remaining class of stochastic operations that we consider has as its prototype random translations: each point $x_i$ in the realization of some initial point process $N_0$ is shifted independently of its neighbours through a random vector $Y_i$, the $Y_i$ forming a sequence of i.i.d. random variables. Exercises 2.3.4(b) and 8.2.7 give simple cases. Generalizations occur if the translations are replaced by more general clustering mechanisms in which the mean number of points per cluster is held equal to one: we defer discussion to Section 13.5 in order to take advantage of some properties of Palm distributions.
One possible approach to such problems is to view the resultant process as the superposition of its individual clusters, one from each point of the initial realization, and to seek to apply the results of Section 11.2 on triangular arrays. Here the $n$th row in the array relates to the process derived from $n$ stages of clustering (translation); although the number of terms $N_{ni}$ in the $n$th row is infinite, this does not affect the validity of the criteria provided their superpositions are well defined (see Exercise 11.2.3), as they must be in this case for the resultant processes themselves to be well defined. In the case of random translations in $R^d$ this leads us to a result of the following kind.

Let $\nu(\cdot)$ denote the common distribution on $\mathcal{B}(R^d)$ for the translations $Y_i$, and $\nu_n(\cdot)$ the $n$-fold convolution of $\nu$, corresponding therefore to the effect of $n$ successive random translations. For $h \in \mathcal{V}$ and $\mathcal{X} = R^d$, the p.g.fl. after $n$ translations takes the form
$$G_n[h] = G_0\Big[\int_\mathcal{X} h(\cdot + y)\,\nu_n(dy)\Big] = E\exp\Big\{\int_\mathcal{X}\log\Big(\int_\mathcal{X} h(x + y)\,\nu_n(dy)\Big)\,N_0(dx)\Big\}, \tag{11.4.1}$$
where $G_0[\,\cdot\,]$ is the p.g.fl. of the initial process $N_0$. Each process $N_{ni}$ contains just one point, and the u.a.n. condition (11.2.2) reduces to the requirement
$$\sup_i \nu_n(A + x_i) = \sup_i \Pr\{Y_1 + \cdots + Y_n \in x_i + A\} \to 0 \qquad (n \to \infty). \tag{11.4.2}$$
To use Theorem 11.2.V to prove convergence to a Poisson limit, observe that condition (11.2.7) is trivial here because $\Pr\{N_{ni}(A) \ge 2\} \equiv 0$, and (11.2.8) translates into
$$\int_\mathcal{X} \nu_n(A + x)\,N_0(dx) = \sum_i \nu_n(A + x_i) \to \mu(A) \qquad (n \to \infty). \tag{11.4.3}$$
The problem then is to find conditions on the initial process $N_0$ and the distribution $\nu$ that will ensure the truth of (11.4.2) and (11.4.3). The former of these is closely related to the concept of the concentration function $Q_A(F)$ of a distribution $F$ on $\mathcal{B}(R^d)$, defined for bounded Borel sets $A$ by $Q_A(F) = \sup_{x \in R^d} F(x + A)$. Clearly the expression in (11.4.2) is bounded above by $Q_A(\nu_n)$, so the u.a.n. statement there is a direct consequence of the lemma below [for a non-Fourier analytic proof see Ibragimov and Linnik (1971, Chapter 15, Section 2)].

Lemma 11.4.I. Let $F$ be a distribution on $\mathcal{B}(R^d)$ and $F^{n*}$ its $n$th convolution power. For bounded $A$, $Q_A(F^{n*}) \to 0$ as $n \to \infty$ if and only if the support of $F$ contains at least two distinct points.
Proof. If $F$ is degenerate, then so is $F^{n*}$, and $Q_A(F) = Q_A(F^{n*}) = 1$ for every $n$ for every nonempty $A$. Otherwise, we show in fact that
$$Q_A(F^{n*}) \le c(F, A)/n^{1/2} \tag{11.4.4}$$
for some constant $c(F, A)$, observing that it is enough to prove this in the case $d = 1$ because for $d \ge 2$, noting that $Q_A(F)$ increases for monotonic increasing $A$, we can embed $A$ in a set corresponding to a marginal distribution. Exercise 11.4.2 shows that the order $n^{-1/2}$ of the bound in (11.4.4) is tight.

Write $\hat H(y) = (\sin\frac12 y/\frac12 y)^2$ for the c.f. of the probability measure $H(\cdot)$ with triangular density function $H'(x) = (1 - |x|)^+$. Then the Parseval relation (A2.8.8) yields for any d.f. $G$, positive $a$, and real $\gamma$,
$$\int_{-\infty}^{\infty}\hat H\big(a(x - \gamma)\big)\,G(dx) = \frac1a\int_{-\infty}^{\infty}\hat G(\omega)\,e^{-i\omega\gamma}\Big(1 - \frac{|\omega|}a\Big)^+\,d\omega. \tag{11.4.5}$$
Substitute $G = F^{n*}$, and recognize that the integral on the right-hand side here is over the interval $(-a, a)$, and the left-hand side is real, so that an upper bound that is uniform in $\gamma$ is given by
$$\frac1a\int_{-a}^a|\hat F(\omega)|^n\,d\omega \le \frac1a\int_{-a}^a\exp\big[-\tfrac12 n\big(1 - |\hat F(\omega)|^2\big)\big]\,d\omega,$$
where we have used the inequality $x = 1 + (x - 1) < e^{x-1}$ with $x = |\hat F(\omega)|^2$. Now $|\hat F|^2$ is the characteristic function of the d.f. $F_s$ of the symmetrized r.v. $X - X'$, where $X$ and $X'$ are i.i.d. like $X$ with d.f. $F$. Thus,
$$1 - |\hat F(\omega)|^2 = 1 - E\exp[i\omega(X - X')] = 1 - E\cos\omega(X - X') = 2E\sin^2\tfrac12\omega(X - X') \ge 2\int_{|y|>b}\sin^2\tfrac12\omega y\,F_s(dy)$$
for some positive $b$, which, because $F$ is nondegenerate, can be and is so chosen that
$$\int_{|y|>b}F_s(dy) = \Pr\{|X - X'| > b\} \equiv \eta > 0.$$
Use Jensen's inequality in the form $\exp E[f(Y)] \le E\exp[f(Y)]$ to write
$$\frac1a\int_{-a}^a|\hat F(\omega)|^n\,d\omega \le \frac1a\int_{-a}^a\int_{|y|>b}\exp\big[-n\eta\sin^2\tfrac12\omega y\big]\,\frac{F_s(dy)}{\eta}\,d\omega = \frac1a\int_{|y|>b}\frac{F_s(dy)}{\eta|y|}\int_{|z|<a|y|}\exp\big[-n\eta\sin^2\tfrac12 z\big]\,dz.$$
n∗
)≤
169
1 2 4 π (const.)
√
for any A ⊆
nη
π π − , . a a
Fix $a = \pi/\rho b$ for any $\rho > 1$, so that then for any $A \subseteq [-\rho b, \rho b]$, (11.4.4) holds. But $b$ is positive, so we can choose $\rho$ arbitrarily large, proving the lemma.

Condition (11.4.3) is more difficult to realize, although it is certainly satisfied when the initial realization is sufficiently regular. Suppose, as an extreme example, that $d = 1$ and that $N_0$ consists of the lattice process with a point at every integer. We can then proceed via an application of the Poisson summation formula (see Exercise 8.6.4), which for sufficiently smooth functions $g$ yields
$$\sum_{k=-\infty}^{\infty}(\nu_n * g)(k) = \sum_{j=-\infty}^{\infty}\tilde\nu_n(2\pi j)\,\tilde g(2\pi j),$$
where $\tilde\nu_n(\omega) = [\tilde\nu(\omega)]^n$ is the Fourier–Stieltjes transform of $\nu^{n*}$, $\tilde g$ is the ordinary Fourier transform of the continuous integrable function $g$, and we assume $g$ is so chosen that $\tilde g$ is integrable and $\sum_{j=-\infty}^{\infty}|\tilde g(2\pi j)|$ is convergent. Recall that a distribution in $R^1$ is nonlattice if it is not concentrated on a lattice (shifted multiple of the set of integers) in $R^1$, and that its characteristic function then satisfies
$$|\tilde\nu(\omega)| < 1 \qquad (\omega \ne 0). \tag{11.4.6}$$
Letting $n \to \infty$ in the Poisson summation formula above, we obtain by dominated convergence
$$\sum_{k=-\infty}^{\infty}(\nu_n * g)(k) \to \tilde\nu_n(0)\,\tilde g(0) = \int_R g(x)\,dx.$$
To show that this implies (11.4.3) for intervals it is enough to sandwich the indicator function of an interval $A$ between two functions $g_1$, $g_2$ with the properties required above, that is, in such a way that for given $\varepsilon > 0$,
$$\sup_x|g_2(x) - g_1(x)| < \varepsilon \qquad \text{and} \qquad g_1(x) \le I_A(x) \le g_2(x)$$
[this can be achieved, e.g., by taking for $g_2$ the function $I_{A^{\varepsilon/2}} * t_{\varepsilon/2}$ and for $g_1$ the function $I_{A^{-\varepsilon/2}} * t_{\varepsilon/2}$, where $t_\alpha$ is the triangular distribution on base $(-\alpha, \alpha)$ and $A^{\pm\varepsilon/2}$ denotes the interval $A$ expanded or contracted by $\varepsilon/2$ at each end]. Thus, (11.4.3) holds whenever $\nu$ is nonlattice and $A$ is an interval. Because it is enough in Theorem 11.2.V to let $A$ run through intervals, we can conclude from that theorem that if the initial process is lattice with a point on every integer but the distribution $\nu$ is nonlattice, then the processes $N_n$, obtained by successive random translations according to $\nu$, converge weakly to the stationary Poisson process with unit rate.

Obviously, the condition that the initial points lie on a lattice can be relaxed, but, unfortunately, it cannot be relaxed far enough to apply almost surely to the realizations of a typical stationary point process. Indeed, it follows from a theorem of Stone (1968) that the essential requirement on the initial process, if this is regarded as fixed, is that
$$N_0(x + A_n)/\ell(A_n) \to \text{const.} \qquad \text{uniformly in } x, \tag{11.4.7}$$
$\ell$ denoting Lebesgue measure,
where $\{A_n\}$ is the sequence of hypercubes $U_n^d$ of side $n$ in $R^d$ or, more generally, a convex averaging sequence in the sense of Definition 12.2.I. Such a condition is not satisfied almost surely even by the realizations of a stationary Poisson process. On the other hand, averaged forms of (11.4.7), that is, with convergence in $L_1$ or $L_2$, follow directly from the ergodic theorems of Section 12.2, and in these the uniformity is trivial when the initial process is stationary. We therefore seek an alternative approach that will bypass the probability one requirements on the initial configuration. For this purpose we return to a direct study of the p.g.fl. at (11.4.1) of the processes $N_n$.

To establish weak convergence of the translated versions $N_n$ to a Poisson limit with rate $m$, it is enough to show that for $h \in \mathcal{V}$,
$$-\int_\mathcal{X}\log\Big(\int_\mathcal{X} h(x + y)\,\nu_n(dy)\Big)\,N_0(dx) \;\xrightarrow{\;p\;}\; m\int_\mathcal{X}[1 - h(x)]\,dx,$$
because this implies convergence of the Laplace transforms of the random variables on the left-hand side, and hence of the p.g.fl.s. To ease the notation write $u(x) = 1 - h(x)$, so that $u$ vanishes outside a bounded set and satisfies $0 \le u \le 1$. Then the above requirement becomes
$$-\int_\mathcal{X}\log\Big(1 - \int_\mathcal{X} u(x + y)\,\nu_n(dy)\Big)\,N_0(dx) \;\xrightarrow{\;p\;}\; m\int_\mathcal{X} u(x)\,dx. \tag{11.4.8}$$
X
We can therefore approximate the logarithm by its leading term, with remainder, for sufficiently large n, bounded by " "" " " " u(x + y) ν (dy) + log 1 − u(x + y) ν (dy) u(x + y) νn (dy). ≤ θ n n n " " X
X
X
Suppose now that N0 is stationary with finite mean rate m. Then (11.4.8) is implied by the corresponding L1 convergence, which leads us to estimate the expected difference by " " " " " E"m u(x) dx + log 1 − u(x + y) νn (dy) N0 (dx)"" X X X " " " " " u(x) dx − u(x + y) νn (dy) N0 (dx)"" ≤ E"m X X " "X " " " u(x + y) νn (dy) + log[1 − u(x + y)] νn (dy) N0 (dx)"", + E" X
X
X
where the second expectation is bounded by u(x + y) νn (dy) N0 (dx) = mθn u(x) dx, θn E X
X
X
which tends to zero by Lemma 11.4.I. Thus, for weak convergence to the Poisson limit it is enough to show that for measurable $u$ with bounded support and $0 \le u \le 1$,
$$E\Big|m\int_\mathcal{X} u(x)\,dx - \int_\mathcal{X}\int_\mathcal{X} u(x + y)\,\nu_n(dy)\,N_0(dx)\Big| \to 0. \tag{11.4.9}$$
A complication arises here if the stationary distribution $N_0$ is nonergodic, for the ergodic theorems then assert convergence not to the constant $m$ but to the random variable (asymptotic density of $N_0$)
$$Y = E[N_0(U_d) \mid \mathcal{I}\,], \tag{11.4.10}$$
where $U_d$ is the unit cube in $R^d$, $\mathcal{I}$ is the invariant σ-algebra, and $E(Y) = m$ (see Theorem 12.2.IV). In this case (11.4.9) should be replaced by the more general requirement, justified by a completely analogous argument, that
$$E\Big|Y\int_\mathcal{X} u(x)\,dx - \int_\mathcal{X}\int_\mathcal{X} u(x + y)\,\nu_n(dy)\,N_0(dx)\Big| \to 0. \tag{11.4.11}$$
Note that (11.4.9) and (11.4.11) may be regarded as $L_1$ versions of (11.4.3). A full discussion of (11.4.11) involves further delicate analysis of the convolution powers of $\nu$; we content ourselves here with the much easier $L_2$ version, assuming the initial point process has boundedly finite second moment measure. This leads us to the following theorem.

Theorem 11.4.II. Let $N_0$ be a second-order stationary point process on $\mathcal{X} = R^d$ and $\nu$ a distribution on $R^d$ that is nonlattice. Then the sequence of point processes $\{N_n\}$, derived from $N_0$ by successive random translations according to $\nu$, converges weakly to the stationary mixed Poisson process with p.g.fl.
$$G[h] = E\exp\Big\{-Y\int_\mathcal{X}[1 - h(x)]\,dx\Big\}, \tag{11.4.12}$$
where $Y$ is given by (11.4.10).

Proof. We again use a Fourier argument, observing that in the ergodic case
$$E\Big|m\int_\mathcal{X} u(x)\,dx - \int_\mathcal{X}\int_\mathcal{X} u(x + y)\,\nu_n(dy)\,N_0(dx)\Big|^2 = \operatorname{var}\Big(\int_\mathcal{X}\int_\mathcal{X} u(x + y)\,\nu_n(dy)\,N_0(dx)\Big) = \int_\mathcal{X}|\tilde u(\omega)|^2\,|\tilde\nu(\omega)|^{2n}\,\Gamma(d\omega), \tag{11.4.13}$$
where $\Gamma(\cdot)$ is the Bartlett spectrum introduced in Definition 8.2.II. The validity of the above relation follows from the Parseval relation (8.6.10) and Lemma 8.6.V, ensuring the $\Gamma$-integrability of $|\tilde u(\omega)|^2$ and hence of $|\tilde\nu(\omega)|^{2n}|\tilde u(\omega)|^2$.
Because $|\tilde\nu(\omega)| < 1$ for $\omega \ne 0$ [this holds for nonlattice distributions in $R^d$ for arbitrary $d \ge 1$: see (11.4.6) for the case $d = 1$], the right-hand side of the identity above converges to $\Gamma\{0\}$, which, being equal to $\operatorname{var} Y$ where $Y$ is given by (11.4.10) (see Exercise 12.2.9), vanishes for an ergodic process. In the nonergodic case we have to replace $\Gamma$ by a modified measure $\Gamma^*$ introduced as the Fourier transform of the modified covariance measure
$$C^*(A \times B) = E\big[\big(N_0(A) - Y\ell(A)\big)\big(N_0(B) - Y\ell(B)\big)\big].$$
Then $\Gamma^*$ differs from $\Gamma$ precisely by the absence of the atom at zero. We can now argue as in the ergodic case and deduce that
$$E\Big|\int_\mathcal{X}\int_\mathcal{X} u(x + y)\,\nu_n(dy)\,N_0(dx) - Y\int_\mathcal{X} u(x)\,dx\Big|^2 \to \Gamma^*\{0\} = 0.$$
This result implies (11.4.11) and so completes the proof.

It is easily seen that both ordinary and mixed stationary Poisson processes are invariant under the operation of random translation (see Exercise 11.4.1). As a corollary to Theorem 11.4.II we now have the following converse.

Corollary 11.4.III. Suppose that $N$ is a stationary, second-order point process that is invariant under the operation of random translation according to a nonlattice distribution $\nu$. Then $N$ is a stationary mixed Poisson process.

Proof. Take $N$ as the initial distribution in the theorem, and observe that if $N$ is invariant the weak limit of the $N_n$ must coincide with $N$.

A second case of interest arises when the random translations $\nu_n$ are derived from the movements over time $n$ of particles with fixed but random and independently chosen velocities as in Example 8.3(g). If these velocities have a common distribution $\nu$, we can then write $\nu_n(dx) = \nu(n^{-1}dx)$ and observe that, from Exercise 11.4.5(d), $Q_A(\nu_n) = Q_{A/n}(\nu) \to 0$. Moreover, the integral at (11.4.13) becomes
$$\int_\mathcal{X}|\tilde u(\omega)|^2\,|\tilde\nu(n\omega)|^2\,\Gamma(d\omega).$$
Now if $\nu$ is absolutely continuous, $|\tilde\nu(n\omega)| \to 0$ as $n \to \infty$ for every $\omega \ne 0$ by the Riemann–Lebesgue lemma, so that the proof of (11.4.9) and its extension (11.4.11) to the ergodic case can be completed as in the previous discussion. The restriction to integer values $n$ is immaterial here, and we therefore obtain the following further result.

Theorem 11.4.IV. Let $N_0$ be as in Theorem 11.4.II, and for all $t \ge 0$ let the point processes $N_t$ be derived from $N_0$ by random translations through time $t$ by fixed but random velocities with common distribution $\nu$. If $\nu$ is absolutely continuous with respect to Lebesgue measure on $R^d$, the processes $N_t$ converge weakly to a mixed Poisson process as in (11.4.12).
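The convergence to a Poisson limit under repeated translation lends itself to a simple simulation. The sketch below (our own code, not from the text; a circle of circumference 30 replaces $R$ to avoid edge effects, and Gaussian steps stand in for a general nonlattice $\nu$) applies 40 i.i.d. translations to the integer lattice and checks that counts in a unit window approach the mean–variance signature of a unit-rate Poisson process.

```python
import random

random.seed(1)
CIRC, N_STEPS, REPS = 30, 40, 20000

def window_count():
    """Translate each lattice point by a sum of N_STEPS N(0,1) steps (mod CIRC)
    and count the points landing in the unit window [0, 1)."""
    c = 0
    for i in range(CIRC):
        pos = (i + random.gauss(0.0, N_STEPS ** 0.5)) % CIRC
        if pos < 1.0:
            c += 1
    return c

counts = [window_count() for _ in range(REPS)]
mean = sum(counts) / REPS
var = sum((c - mean) ** 2 for c in counts) / REPS
assert abs(mean - 1.0) < 0.05    # unit rate
assert abs(var - 1.0) < 0.12     # Poisson limit: variance approaches the mean
```

The mean is exactly one for every number of steps (each unit window receives total mass one from the smeared lattice), while the variance is slightly below one for finite $n$ and approaches it as the translations accumulate, consistent with weak convergence to the unit-rate Poisson process.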
Corollary 11.4.V. Let the point process $N_t$ represent the position at time $t$ of a system of particles moving in $R^d$ with fixed velocities chosen independently and randomly according to a distribution $\nu$ that is absolutely continuous with respect to Lebesgue measure in $R^d$. If the distribution of $N_t$ is independent of $t$, spatially homogeneous, and of second order, then $N_t$ is a mixed Poisson process as in (11.4.12).

A more general type of location-dependent random translation is illustrated in the following example.

Example 11.4(a) Markov shifts (random translations). Suppose given a point process on $\mathcal{X}$ with p.g.fl. $G[h]$ ($h \in \mathcal{V}(\mathcal{X})$) and that any particle of this process initially at $x$ is shifted into any $A \in \mathcal{B}_\mathcal{X}$ with probability $p(A \mid x)$, where
$$p(\mathcal{X} \mid x) = \int_\mathcal{X} p(dy \mid x) \le 1 \qquad (\text{all } x),$$
the shortfall $q(x) = 1 - p(\mathcal{X} \mid x)$ being the probability of deletion of the particle. Arguing as for Exercise 11.3.1 yields
$$G_m[h \mid x] = q(x) + \int_\mathcal{X} h(y)\,p(dy \mid x) = 1 - \int_\mathcal{X}[1 - h(y)]\,p(dy \mid x)$$
for the p.g.fl. of the (zero- or one-point) cluster associated with $x$, from which the resultant p.g.fl. for the translated process equals $G\big[G_m[h \mid \cdot\,]\big]$. The $k$th factorial moment $M^{\rm tr}_{[k]}$ for the shifted process is given in terms of the corresponding moment of the initial process by
$$\int_\mathcal{X}\cdots\int_\mathcal{X} p(dy_1 \mid x_1)\ldots p(dy_k \mid x_k)\,M_{[k]}(dx_1 \times \cdots \times dx_k).$$
When the initial process is Poisson with parameter measure $\mu(\cdot)$, so that $\log G[h] = -\int_\mathcal{X}[1 - h(x)]\,\mu(dx)$, the p.g.fl. of the shifted process equals
$$\exp\Big\{-\int_\mathcal{X}\int_\mathcal{X}[1 - h(y)]\,p(dy \mid x)\,\mu(dx)\Big\},$$
so the shifted process is Poisson also, with parameter measure
$$\mu^{\rm tr}(A) = \int_\mathcal{X} p(A \mid x)\,\mu(dx) \qquad (\text{bounded } A \in \mathcal{B}_\mathcal{X}).$$
A situation of particular interest arises if $\mu^{\rm tr} = \mu$; that is, $\mu$ is an invariant measure (not necessarily totally finite) for the Markov transition kernel $p(\cdot \mid \cdot)$. It follows from the last relation that a Poisson process with this parameter measure is invariant under the Markov shift operation, a result due to Derman (1955).
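Derman's invariance result can be illustrated numerically. In the sketch below (our own code; the Gaussian kernel and the circle of circumference 20 are illustrative stand-ins for a kernel with an infinite invariant measure), a rate-2 stationary Poisson process is shifted through the kernel $p(\cdot \mid x) = N(x, 1)$ (wrapped on the circle, for which the uniform measure is invariant), and the counts in a unit window retain the Poisson mean–variance signature at the same rate.

```python
import math, random

random.seed(7)
MU, CIRC, REPS = 2.0, 20.0, 20000   # rate, circle circumference, replications

def poisson_rv(lam):
    thresh, k, prod = math.exp(-lam), 0, random.random()
    while prod > thresh:
        k += 1
        prod *= random.random()
    return k

def shifted_window_count():
    n = poisson_rv(MU * CIRC)               # total points on the circle
    c = 0
    for _ in range(n):
        x = random.uniform(0.0, CIRC)       # Poisson points: i.i.d. uniform given n
        y = (x + random.gauss(0.0, 1.0)) % CIRC   # Markov shift p(. | x) = N(x, 1)
        if y < 1.0:
            c += 1
    return c

counts = [shifted_window_count() for _ in range(REPS)]
mean = sum(counts) / REPS
var = sum((c - mean) ** 2 for c in counts) / REPS
assert abs(mean - MU) < 0.07    # still rate MU after the shift
assert abs(var - MU) < 0.25     # and still Poisson: variance equals mean
```

Here invariance is exact, not merely asymptotic: shifting a Poisson process through a kernel that preserves its parameter measure returns a Poisson process with the same parameter measure.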
Consider finally the case of a pure shift (so that $q(x) = 0$ for all $x \in \mathcal{X}$). Suppose that $\mathcal{X} = R^d$ and $\mu(dx) = \mu\,\ell(dx)$, where on the right-hand side $\mu$ is a constant and $\ell$ denotes Lebesgue measure on $\mathcal{B}_{R^d}$. Then the initial process is stationary and $p(dy \mid x) = F\big(d(y - x)\big)$, meaning that the shifts are identically distributed about the positions of the initial points; that is, we have random translations of the points. Then $\mu^{\rm tr} = \mu$ and consequently a stationary Poisson process is invariant under a process of i.i.d. shifts.

Before leaving this topic we make a few remarks concerning the $L_1$ theory referred to briefly before Theorem 11.4.II. A key step here is to show that in both situations considered, the distributions $\nu_n$ satisfy the condition, for all bounded Borel sets $A$ in $R^d$,
$$\int_{R^d}|\nu_n(y + A) - \nu_n(x + y + A)|\,dy \to 0 \qquad \text{uniformly in } x. \tag{11.4.14}$$
This condition, or the apparently stronger but in fact equivalent condition
$$\|\nu_n * \gamma_1 - \nu_n * \gamma_2\| \to 0 \tag{11.4.15}$$
for all pairs $\gamma_1$, $\gamma_2$ of distributions absolutely continuous with respect to Lebesgue measure in $R^d$, is referred to in MKM (1978) as weak asymptotic uniformity of the sequence $\{\nu_n\}$. A particular example of such a sequence is the sequence of uniform distributions on the sets $\{A_n\}$ of a convex averaging sequence. The major technical difficulty is then to show that the standard form of conclusion of the mean ergodic theorem, which can be written as
$$E\Big|\int_{R^d}H_n(x + A)\,N_0(dx) - Y\ell(A)\Big| \to 0, \tag{11.4.16}$$
where $H_n$ is this special case of a weakly asymptotically uniform sequence, can be extended to the general case and therefore implies (11.4.11) in each of the two situations under consideration.

The definitive treatment of the $L_1$ case was given by Stone (1968), after earlier work by Dobrushin (1956) and Maruyama (1955) in the context of iterated random translation, and by Breiman (1963) and Thedéen (1964) for the random velocities scheme. Further extensions and generalizations occur in a series of papers by Matthes and co-workers; for details we refer to MKM (1978), especially Chapter 11 and the further references there. An algebraic treatment of (11.4.14) and related properties, when the $\nu_n$ are convolution powers, is contained in the papers by Stam (1967a, b). The second-order treatment used to prove Theorem 11.4.II is an extension of the discussion in Vere-Jones (1968). Some partial results concerning (11.4.14) and related topics are covered in Exercises 11.4.4–5.
Exercises and Complements to Section 11.4

11.4.1 Show that in the stationary case, both ordinary and mixed Poisson processes are invariant under the operation of random translation. [Hint: Use the p.g.fl. representations at (11.4.1) and (11.4.12).]
11.4.2 The binomial distribution $\{b_k(n; p)\} = \{\binom nk p^k(1-p)^{n-k}\}$ with $0 < p < 1$ is the $n$-fold convolution of the simplest nondegenerate d.f. $F$ that can arise with Lemma 11.4.I. The order $1/\sqrt n$ of the bound at (11.4.4) is tight because
$$\frac{1}{\sqrt{2\pi(n+1)p(1-p)}} \le Q_{\{0\}}\big(\{b_k(n; p)\}\big) \le \frac{1}{\sqrt{4(n+1)p(1-p)}}$$
[see, e.g., MKM (1978, pp. 476–477) or else Daley (1987)].

11.4.3 Let $P(x, A)$ denote a stochastic or substochastic kernel defined for all $x \in R^d$ and $A \in \mathcal{B}(R^d)$ such that it has an infinite invariant measure $\nu$. Consider the operation of random translation according to the kernel $P$ [i.e., a point initially at $x$ is translated to a new point $y$ according to the distribution $P(x, \cdot)$].
(a) The Poisson process with intensity measure $\nu$ is invariant under this operation.
(b) If $P$ is continuous, then the initial process $N_0$ is invariant under this operation if and only if it is a Cox process directed by $Y\nu$, where $Y$ is a nonnegative random variable.
(c) Investigate conditions under which the sequence of point processes $\{N_n\}$ obtained from an initial process $N_0$ by successive iteration of this operation will converge to a limit of the form described in (b).
[Hint: See Kerstan and Debes (1969) and Debes et al. (1971). Part (a) goes back to Derman (1955). The case where $\nu$ is totally finite is discussed in MKM (1978, Section 4.8).]

11.4.4 For a given distribution $F$ on $R^d$, let $S$ denote the set of points $a$ in $R^d$ such that for all intervals $A$,
$$\sup_x|F^{n*}(x + a + A) - F^{n*}(x + A)| \to 0 \qquad \text{as } n \to \infty. \tag{11.4.17}$$
Prove the following.
(a) S is an algebra.
(b) If a ∈ supp(F), then a ∈ S.
(c) If supp(F) is contained in no proper subalgebra of R^d, then S = R^d.

11.4.5 (Continuation). A sequence of measures {ν_n} is weakly asymptotically uniformly distributed if for all absolutely continuous distributions σ on B(R^d) and all x ∈ R^d,

  ‖σ ∗ ν_n ∗ δ_x − σ ∗ ν_n‖ → 0   (n → ∞).  (11.4.18)

(a) Show that (11.4.17) implies (11.4.18) in the special case that σ is the uniform distribution on the interval A and {ν_n} = {F^{n*}}.
(b) Extend this result and deduce that if F is nonlattice, the sequence of convolution powers of F is weakly asymptotically uniformly distributed.
(c) Prove that (11.4.18) is equivalent to

  ‖ν_n ∗ σ_1 − ν_n ∗ σ_2‖ → 0   (n → ∞)

for all pairs of absolutely continuous distributions σ_1 and σ_2.
(d) If (11.4.18) holds then Q_A(ν_n) → 0 (cf. Lemma 11.4.I).
[Hint: For further details and applications see MKM (1978, Chapter 11).]
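The bound quoted in Exercise 11.4.2 is easy to probe numerically. The sketch below is our own illustration, not part of the text: it reads Q_{{0}}({b_k}) as the maximal atom max_k b_k(n; p) (one natural interpretation of the concentration functional over the single-point set {0}, an assumption on our part), and checks the upper bound together with the claimed 1/√n order of decay; we deliberately do not test the lower constant, whose exact form we are less sure of.

```python
from math import comb, sqrt

def max_binomial_atom(n, p):
    """Largest probability b_k(n; p) = C(n, k) p^k (1-p)^(n-k), 0 <= k <= n."""
    return max(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1))

for n in (5, 20, 100):
    for p in (0.2, 0.5, 0.8):
        q = max_binomial_atom(n, p)
        upper = 1 / sqrt(4 * (n + 1) * p * (1 - p))
        # Upper bound of order 1/sqrt(n), as quoted in Exercise 11.4.2:
        assert q <= upper
        # Tightness of the 1/sqrt(n) order: q * sqrt((n+1)p(1-p)) stays
        # bounded away from 0 on this grid of (n, p) values.
        assert q * sqrt((n + 1) * p * (1 - p)) > 0.3
```

The second assertion is only an empirical check on a small grid, but it reflects why the bound at (11.4.4) cannot be improved in order.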
CHAPTER 12
Stationary Point Processes and Random Measures
12.1 Stationarity: Basic Concepts 177
12.2 Ergodic Theorems 194
12.3 Mixing Conditions 206
12.4 Stationary Infinitely Divisible Point Processes 216
12.5 Asymptotic Stationarity and Convergence to Equilibrium 222
12.6 Moment Stationarity and Higher-order Ergodic Theorems 236
12.7 Long-range Dependence 249
12.8 Scale-invariance and Self-similarity 255
Stationary point processes play an exceptionally important role in applications and lead also to a rich theory. They have appeared already in Chapters 3, 6 and 8, where some basic properties and applications were outlined, and they are central to the discussion not only in the present chapter, but also in Chapters 13 and 15, and to a lesser extent Chapter 14 also. Our main purpose in this chapter is to develop a systematic study of the theory of point processes and random measures that are invariant under shifts in d-dimensional Euclidean space R^d. Much of the theory is likely to appear familiar: some parts are merely variants of the corresponding theory of stationary continuous processes; the greater part can be deduced from the theory of stationary random distributions, or, in R^1, from the theory of processes with stationary increments; but some parts, especially applications, are peculiar to point processes and random measures.
Although we have chosen to develop the basic theory for shifts S_u acting on the canonical space M^#_X, with X = R^d, the underlying ideas are capable of extension in several directions. In the first instance this refers to point processes and random measures invariant under more general forms of group action, for example, to processes in two- or three-dimensional Euclidean space that are invariant under rotations (i.e., isotropy) as well as shifts (i.e., homogeneity), and to processes on other types of manifold, such as the surface of a sphere or cylinder. In fact, many of the topics included in this chapter can
be developed with almost equal facility (i.e., requiring little or nothing more than changes of wording or interpretation) for point processes on a locally compact metric group, and are so developed in the Russian edition of MKM (1982). We examine some such extensions in the present chapter, especially in connection with scale-invariance and self-similarity, whereas others, including isotropy, are taken up in Chapter 15 on spatial point processes.
At the same time, the shifts studied in this chapter are examples of a flow on a probability space (Ω, E, P), meaning in general a group {θ_g} of measurable one-to-one transformations of (Ω, E) onto itself. The probability measure P is invariant under the flow if for all E ∈ E and all θ_g, P(θ_g E) = P(E). This more general concept [see, e.g., Baccelli and Brémaud (2003)] is useful in unifying the treatment of processes, including marked point processes (MPPs) and Cox processes, where the canonical space M^#_X needs extending to accommodate information about the outcomes of auxiliary random variables or processes.
An important technical role in establishing the form of both probability and moment structures for stationary processes is played by the factorization theorems summarized in Appendix A2 as Lemma A2.7.II and Theorem A2.7.III. In their basic form they assert that if a measure µ on a product space R^d × K is invariant under shifts in the first component, then µ reduces to a product of Lebesgue measure in R^d and a fixed measure κ on K. These factorization results extend to more general contexts with R^d replaced by a σ-group H, and Lebesgue measure replaced by Haar measure on H. They underlie the structure not only of stationary MPPs, where they apply most obviously, but also of stationary Poisson and Poisson cluster processes, of the moment measures of stationary processes, and of the Palm theory which is the subject of Chapter 13.
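The defining properties of a flow of shifts are concrete enough to exhibit directly. The following sketch (our own illustration; the names are not the book's) represents a realization of a point process on R as a tuple of points, implements θ_u as the shift that moves each point x to x − u (matching (S_u µ)(A) = µ(A + u) for counting measures), and checks the group law θ_{u+v} = θ_u ∘ θ_v together with the identity and inverse properties.

```python
def theta(u):
    """Shift map on point configurations: each point x goes to x - u,
    so that the shifted measure gives to A the mass the original gave to A + u."""
    return lambda points: tuple(x - u for x in points)

config = (0.0, 1.5, 2.25, 7.0)   # a finite configuration, dyadic for exactness
u, v = 0.75, -2.0

assert theta(u + v)(config) == theta(u)(theta(v)(config))  # group law
assert theta(0.0)(config) == config                        # theta_0 = identity
assert theta(-u)(theta(u)(config)) == config               # inverse = theta_{-u}
```

The dyadic coordinates make the floating-point equalities exact; with arbitrary reals one would compare within a tolerance instead.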
After a first section on basic concepts and examples, the chapter covers ergodic theorems and mixing conditions (Sections 12.2-12.3), stationary infinitely divisible point processes (Section 12.4), asymptotic stationarity and convergence to equilibrium (Section 12.5), moment stationarity and higher-order ergodic theorems (Section 12.6), long-range dependence (Section 12.7), and scale-invariance and self-similarity (Section 12.8).
12.1. Stationarity: Basic Concepts

We consider first X = R^d and invariance properties with respect to translations (or shifts). For arbitrary u, x ∈ X and A ∈ B_X, write

  T_u x = x + u,   T_u A = A + u = {x + u: x ∈ A}.  (12.1.1)

Then T_u induces a transformation S_u of M^#_X (and also of N^#_X) through the equation¹

  (S_u µ)(A) = µ(T_u A)   (µ ∈ M^#_X, A ∈ B_X).  (12.1.2)

[¹ With the operators T_· and S_· as defined, the Dirac measure δ_·(·) has the property that (S_u δ_x)(·) = δ_{x−u}(·) and ∫ f(x) (S_u µ)(dx) = ∫ f(x) µ(d(x + u)) = ∫ f(x − u) µ(dx). Sometimes, an operator S_· for which S_u = S_{−u} is used instead. Then the + and − signs in the above equations are interchanged. The operator S_u we use is the same as T_u in MKM (1978, p. 258) and as T_{−u} in Kallenberg (1975 or 1983a, Exercise 10.10).]

It is clear that S_u µ ∈ M^#_X whenever µ ∈ M^#_X; that is, S_u maps M^#_X into (indeed, onto) itself. Moreover, S_u is continuous: to see this, let {µ_n} be a sequence of measures on B_X converging in the w#-topology to a limit µ, and let f be a bounded continuous function vanishing outside a bounded set. Then its translate f(x − u) has similar properties, so from properties of w#-convergence (Proposition A2.6.II),

  ∫_X f(x) (S_u µ_n)(dx) = ∫_X f(x − u) µ_n(dx) → ∫_X f(x − u) µ(dx) = ∫_X f(x) (S_u µ)(dx).

An application of the sufficiency half of Proposition A2.6.II shows that S_u µ_n →_{w#} S_u µ, and hence that S_u is continuous. Because a shifted counting measure is again a counting measure, and N^#_X is closed in M^#_X, the same conclusion holds for the effects of shifts T_u on counting measures. This establishes the following simple but important result.

Lemma 12.1.I. For X = R^d and u ∈ R^d, both the mappings S_u: M^#_X → M^#_X and S_u: N^#_X → N^#_X defined at (12.1.2) via the shift operator T_u are continuous (and hence measurable) and one-to-one.

It now follows that if ξ is a random measure or point process, then so is S_u ξ for every u ∈ R^d because S_u ξ is then the composition of two measurable mappings. This remark enables us to make the following definition.

Definition 12.1.II. A random measure or point process ξ with state space X = R^d is stationary if, for all u ∈ R^d, the fidi distributions of the random measures ξ and S_u ξ coincide.

If extra emphasis is needed, we call such random measures strictly stationary or stationary as a whole to distinguish them from random measures that are stationary in weaker senses such as second-order stationarity following Proposition 8.1.I. Note also that this definition is the natural extension to R^d and M^#_X of Definition 3.2.I, and lies behind the summary of stationarity properties in Proposition 6.1.I.

Definition 12.1.II can be stated in a compact form by defining a 'lifted' operator or transformation S̃_u that functions at a third level of abstraction, on measures P on the Borel sets of M^#_X. For B ∈ B(M^#_X) set

  S̃_u P(B) = P(S_u B),  (12.1.3)

where S_u B = {S_u µ: µ ∈ B}. The remark following Lemma 12.1.I implies that S_u maps B(M^#_X) (or B(N^#_X)) into itself, and an argument similar to the proof of
that lemma shows that the mapping is even continuous (see Exercise 12.1.1). Then Definition 12.1.II is equivalent to stating that a random measure on R^d is stationary if its distribution on M^#_{R^d} is invariant under the shifts S̃_u.
These concepts and results can be extended to the more general context of a flow referred to in the introduction. In this case the probability space (Ω, E, P) is not given an explicit structure, but it is supposed to be capable of supporting a family of one-to-one measurable transformations of Ω onto itself, {θ_u: u ∈ G} say, where G has a group structure, θ_{u+v} = θ_u ◦ θ_v, θ_0 is the identity, and (θ_u)^{−1} = θ_{−u}. The link to a transformation acting directly on the random measure ξ: (Ω, E) → (M^#_X, B(M^#_X)) is then provided by taking G = X = R^d and requiring that

  ξ(A, θ_u ω) = ξ(A + u, ω),   or, more briefly,   ξ(θ_u ω) = S_u ξ(ω).

Most of the examples below illustrate specific cases where the flow is defined for u ∈ G = R^d, but of course shifts in R^d are not the only actions which can be defined by flows. In the case of a marked point process, it is enough to take for Ω the space of counting measures on the product space R^d × K (i.e., N^#_{R^d×K}), and to consider for the flow the family of translations S_u acting on the first component only, so that S_u N(A × K) = N(T_u A × K). The MPP is stationary if its fidi distributions are invariant under the translations {S_u} (equivalently, its probability distribution on M^#_{R^d×K} is invariant under the lifted operators {S̃_u}). In discussing a stationary MPP N, we mostly write S_u N for the shifted process, the restriction of the shift to the R^d component being understood.
We proceed to a detailed study of stationarity of random measures in R^d, illustrating the constructions in terms of shifts S̃_u on the canonical probability space. The results of Chapter 9 imply that the distribution of a random measure is completely determined either by its fidi distributions or by its Laplace functional (see Propositions 9.2.III and 9.4.II). Similarly, the distribution of a point process is completely determined by its fidi distributions or by its p.g.fl. (Theorem 9.4.V) or, if the point process is simple, by its avoidance function (Theorem 9.2.XII). Applying these criteria to the definition of stationarity, we deduce that a random measure is stationary if and only if its fidi distributions are stationary, or, equivalently, if and only if its Laplace functional is stationary. Similarly, a point process is stationary if and only if its p.g.fl. is stationary, or, if it is simple, if and only if its avoidance function is stationary. Spelling out the details of these remarks yields the following theorem.

Theorem 12.1.III. Let ξ be a random measure on state space X = R^d. Each of the following conditions is necessary and sufficient for ξ to be stationary.
(i) For each u ∈ R^d and k = 1, 2, . . . , the fidi distributions satisfy

  F_k(A_1, . . . , A_k; x_1, . . . , x_k) = F_k(A_1 + u, . . . , A_k + u; x_1, . . . , x_k).  (12.1.4)
(ii) For each u ∈ R^d and f ∈ BM(R^d), the characteristic functional satisfies

  Φ[f(·)] = Φ[f(· − u)].  (12.1.5)

When ξ is a point process N the conditions are equivalent to the following.
(iii) For each u ∈ R^d and h ∈ V(R^d), the p.g.fl. G satisfies

  G[h(·)] = G[h(· − u)].  (12.1.6)

If also N is simple, the conditions are equivalent to the following.
(iv) For each u ∈ R^d and all bounded Borel sets A ∈ B(R^d), the avoidance function P_0(·) of N satisfies

  P_0(A) = P_0(A + u).  (12.1.7)
Furthermore, in (i) and (iv) it is sufficient for the results to hold for disjoint sets A_i and A from a dissecting semiring generating B(R^d).

The final statement of the theorem implies that it is enough in (i) and (iv) to have the statements holding for disjoint sets that can be represented as finite unions of half-open rectangles. It is not possible to relax this condition significantly: in R^1, Lee's counterexample quoted in Exercise 2.3.1 exhibits two processes with the same distributions for N(I) whenever I is an interval, one of these processes being stationary (indeed, a stationary Poisson process) and the other not. See also Exercise 12.1.2. Analogues of this proposition hold for other examples of flows. The details for MPPs are spelled out in Exercise 12.1.3. The next few examples illustrate some applications of Theorem 12.1.III and its extensions.

Example 12.1(a) Stationarity of Poisson and compound Poisson processes [continued from Example 9.4(c); see also Lemma 6.4.VI]. From the representation of the Poisson process p.g.fl. at (9.4.17) we have, for X = R^d,

  log G[h(· − u)] = ∫_X [h(x − u) − 1] µ(dx) = ∫_X [h(y) − 1] (S_u µ)(dy),  (12.1.8)

which under the assumption of stationarity at (12.1.6) is to be equal to

  ∫_X [h(y) − 1] µ(dy).

Because the measure µ is completely determined by its integrals of functions of the form h(y) − 1 for h ∈ V(R^d), it follows that a Poisson process is stationary if and only if its parameter measure is invariant under translation. Now the only measures on R^d invariant under translations are multiples of Lebesgue measure ℓ, so that for some constant µ ≥ 0, µ(·) = µ ℓ(·). Thus, a Poisson process on R^d is stationary if and only if it has a constant intensity with respect to Lebesgue measure on R^d.
Alternatively, we may first observe that a stationary random measure can have no fixed atoms, then use the fact (Theorem 2.4.II) that a Poisson process has no fixed atoms if and only if its parameter measure is nonatomic, in which case the process is simple, and finally appeal to (12.1.7), which yields e^{−µ(A)} = e^{−µ(A+u)}, implying the same result even more directly.
The compound Poisson process, in the general sense of Section 6.4, is an example of a marked point process, but essentially similar techniques can be applied. Using the p.g.fl. approach from Exercise 12.1.3(iii), we have to check, for a constant rate Poisson ground process and i.i.d. marks, that (12.1.8) holds for a function h(u, κ) of the two variables. In fact we have

  log G[h(· − u, ·)] = ∫_X ∫_K [h(x − u, κ) − 1] µ dx π(dκ)
                     = ∫_X ∫_K [h(y, κ) − 1] µ dy π(dκ) = log G[h(·, ·)].
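The conclusion of Example 12.1(a) is also easy to see empirically: for a homogeneous Poisson process, the counts N(A) and N(A + u) have the same distribution. The following simulation sketch is our own illustration (parameter choices arbitrary), generating the process on an interval via i.i.d. exponential gaps and comparing empirical mean counts in a set and its translate.

```python
import random

def poisson_points(lam, T, rng):
    """Homogeneous Poisson process on [0, T): partial sums of Exp(lam) gaps."""
    pts, t = [], 0.0
    while True:
        t += rng.expovariate(lam)
        if t >= T:
            return pts
        pts.append(t)

rng = random.Random(2003)
lam, T, u = 2.0, 50.0, 20.0
A = (3.0, 8.0)                 # |A| = 5, so N(A) should be Poisson(10)
reps = 4000
tot_A = tot_Au = 0
for _ in range(reps):
    pts = poisson_points(lam, T, rng)
    tot_A += sum(A[0] <= x < A[1] for x in pts)
    tot_Au += sum(A[0] + u <= x < A[1] + u for x in pts)

mean_A, mean_Au = tot_A / reps, tot_Au / reps
# Both empirical means should be near lam * |A| = 10, and close to each other,
# reflecting the translation invariance of the constant-intensity process.
assert abs(mean_A - 10.0) < 0.3
assert abs(mean_Au - 10.0) < 0.3
```

Of course a simulation only illustrates the identity in distribution; the p.g.fl. argument above is what proves it.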
Example 12.1(b) Stationarity is preserved by simple random thinnings and translations [continued from Sections 11.3 and 11.4]. Let N be a stationary point process and assume that each point x_i of a realization of N is independently and randomly shifted through a random distance X_i, where the {X_i} are identically distributed with common d.f. F(·); to accommodate deletions, we allow the distribution to be defective, and set q = 1 − F(R^d). Then from equation (11.4.1) the respective p.g.fl.s G and G_0 of the shifted process and N are related by

  G[h(·)] = G_0[q + ∫_X h(y) F(dy − ·)].

Much as in the previous example, when G_0 is itself stationary, G_0[h(·)] = G_0[h(· − u)] for all u ∈ R^d and h ∈ V(R^d). The right-hand side of the expression for G[·] then equals

  G_0[q + ∫_X h(y − u) F(dy − ·)] = G[h(· − u)],

so by (iii) the transformed process is again stationary. Pure translations occur when q = 0, pure random deletions when F is concentrated at 0.
The stationarity of Cox processes and some cluster processes can be verified by similar techniques. Cox processes are important as examples where the flow needs to be defined initially on an extension of the canonical space M^#_X.

Example 12.1(c) Mixed Poisson and Cox processes. For the case of a mixed Poisson process, we may take Ω = N^#_{R^d} × R_+. Then the pair (N, λ) corresponds to the choice of a counting measure N ∈ N^#_{R^d} and a rate λ ∈ R_+. The distribution P can be generated by conditioning as outlined in Section 6.1:

  P(V × A) = ∫_A Poi(V | λ) Π(dλ)   (A ∈ B_{R_+}),
where V is a set of realizations from N^#_{R^d}, Poi(· | λ) is the probability distribution on N^#_{R^d} of a Poisson process at rate λ, and Π is the distribution of λ on the Borel sets of R_+. As in Example 12.1(a), the flow is the family of shifts S_u on the first component. Stationarity is guaranteed, because invariance of the mixture P(·) is implied by invariance of each of the conditional distributions Poi(· | λ).
The case of a Cox process is only a little more complicated. Here we may take Ω = N^#_X × M^#_X, where the first component refers to the realizations of the point process and the second to the realizations of the directing random measure. The flow must now act simultaneously on both components, so that

  (N(A, θ_u ω), ξ(B, θ_u ω)) = (N(A + u, ω), ξ(B + u, ω)),  (12.1.9)

or in more compact notation

  θ_u(N, ξ) = (S_u N, S_u ξ).  (12.1.9′)

We have then for the distribution P of the Cox process on Ω as above,

  P(V × W) = ∫_W Poi(V | ξ) Q(dξ)   (W ∈ B(M^#_{R^d})),  (12.1.10)

where V is a set of counting measures, W is a set of directing measures ξ, Poi(· | ξ) is now the distribution of the inhomogeneous Poisson process with parameter measure ξ, and Q is the distribution of the directing random measure ξ. Exercise 12.1.4 indicates how to show that the resultant process is stationary if and only if Q is stationary. In this case the bivariate process (N(·), ξ(·)) is also stationary, its distribution being invariant under the same shift acting on both components.
where V a set of counting measures, W is a set of directing measures ξ, Poi(· | ξ) is now the distribution of the inhomogeneous Poisson process with parameter measure ξ, and Q is the distribution of the directing random measure ξ. Exercise 12.1.4 indicates how to show that the resultant process is stationary if and only if Q is stationary. In this case the bivariate process N (·), ξ(·) is also stationary, its distribution being invariant under the same shift acting on both components. Similar constructions are possible in other cases where the evolution of the random measure under study is associated with the evolution of some auxiliary process. For stationarity of a general cluster process see Exercise 12.1.6; the important example of a Poisson cluster process is summarized shortly in Proposition 12.1.V where for the first time we meet a measure in (k) Rd invariant under the group of diagonal shifts Dx defined for k ∈ Z+ and d x ∈ R by (12.1.11) Dx(k) (y1 , . . . , yk ) = (x + y1 , . . . , x + yk ), where y = (y1 , . . . , yk ) and yi ∈ Rd for i = 1, . . . , k, so first we examine the structure of such measures. As in Appendix A2.7, the cosets under this group of transformations are images of the main diagonal y1 = · · · = yk . The (k) action of Dx along any such coset is just a shift through the vector x. Thus, we should anticipate that any measure invariant under the diagonal shifts should reduce to a multiple of Lebesgue measure in each such coset. The next lemma makes this idea precise: by the diagonal subspace we mean the space {(y1 , . . . , yk ): y1 = · · · = yk ∈ Rd }.
Lemma 12.1.IV (Diagonal Shifts Lemma). Let µ be a boundedly finite Borel measure on X^{(k)} with X = R^d. Then µ is invariant under the diagonal shifts D_x^{(k)} of (12.1.11) if and only if it can be represented as a product of Lebesgue measure on the diagonal subspace and a reduced measure µ̆ on X^{(k−1)} such that, for any function f ∈ BM(X^{(k)}) and k > 0,

  ∫_{X^{(k)}} f(x_1, . . . , x_k) µ(dx_1 × · · · × dx_k)
    = ∫_X dx ∫_{X^{(k−1)}} f(x, x + y_1, . . . , x + y_{k−1}) µ̆(dy_1 × · · · × dy_{k−1}),  (12.1.12)

where in the case k = 1, µ̆(·) = m δ_0(·), in which δ_0 denotes Dirac measure, m = µ(U_d), and U_d is the unit d-dimensional hypercube.

Proof. Consider the mapping from X × X^{(k−1)} into X^{(k)} defined by

  (x, (y_1, . . . , y_{k−1})) → (x, x + y_1, . . . , x + y_{k−1}).  (12.1.13)

Given any (x_1, . . . , x_k) ∈ X^{(k)}, we have uniquely x = x_1 and y_i = x_{i+1} − x_1 (i = 1, . . . , k − 1), so the mapping is one-to-one and onto; it is clearly continuous and hence measurable. Under the mapping, the action of the diagonal shifts D_x^{(k)} on X^{(k)} is reduced to the ordinary shift T_x on the X component of the product X × X^{(k−1)}. We therefore have a representation of the original space X^{(k)} to which we can apply Lemma A2.7.II and assert that the image, µ* say, of µ induced by the mapping (12.1.13) reduces to a product of d-dimensional Lebesgue measure ℓ along X and some measure µ̆ on the other factor space X^{(k−1)}; that is, µ* = ℓ × µ̆. Then µ̆ and µ are related as at (12.1.12).

For an alternative approach see Exercises 12.1.8-9 and 12.6.1-2.

Proposition 12.1.V. A Poisson cluster process with a.s. finite clusters, and both cluster centres and cluster members in X = R^d, is stationary if and only if it can be represented in such a way that
(i) the cluster centres form a stationary Poisson process in R^d; and
(ii) the cluster members depend only on their positions relative to the cluster centre, and not on the location of the cluster centre itself.
In particular, the p.g.fl. of a stationary Poisson cluster process with a.s. finite clusters has a unique representation (its regular representation) of the form

  log G[h] = µ_c Σ_{k=1}^∞ π_k ∫_X dx ∫_{X^{(k−1)}} [h(x)h(x + y_1) . . . h(x + y_{k−1}) − 1] P_{k−1}(dy_1 × · · · × dy_{k−1}),  (12.1.14)

where µ_c is the intensity of the cluster centre process, {π_k: k ≥ 1} is a proper probability distribution of cluster sizes, P_0(·) = δ_0(·), and for k ≥ 2, P_{k−1}(·) is a symmetric probability distribution describing the locations of the remaining k − 1 cluster members relative to an arbitrary cluster member chosen as origin.
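The change of variables (12.1.13) at the heart of the Diagonal Shifts Lemma can be checked mechanically. The sketch below (our own illustration; the function names are not the book's) verifies that the mapping is a bijection and that it conjugates the diagonal shift D_a^{(k)}, which adds a to every coordinate, into the ordinary shift T_a acting on the first coordinate alone, leaving the relative coordinates y_i untouched.

```python
def phi(x, ys):
    """The mapping (12.1.13): (x, (y_1,...,y_{k-1})) -> (x, x+y_1, ..., x+y_{k-1})."""
    return (x,) + tuple(x + y for y in ys)

def phi_inv(xs):
    """Inverse: x = x_1 and y_i = x_{i+1} - x_1."""
    x = xs[0]
    return x, tuple(xi - x for xi in xs[1:])

def diagonal_shift(a, xs):
    """D_a^{(k)}: add a to every coordinate."""
    return tuple(x + a for x in xs)

x, ys, a = 1.5, (0.25, -3.0, 4.5), 2.75   # dyadic values, so equality is exact
xs = phi(x, ys)

# In the (x, y) coordinates, the diagonal shift becomes T_a on x only:
assert phi_inv(diagonal_shift(a, xs)) == (x + a, ys)
# and phi is a bijection:
assert phi(*phi_inv(xs)) == xs
```

This is exactly the reduction that lets Lemma A2.7.II deliver the product form ℓ × µ̆ in the proof.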
Remark. As already noted around Proposition 6.3.V, the representation of a cluster process in terms of cluster centre and cluster member processes is not unique. In the present context, it is even possible to construct a stationary Poisson cluster process from nonstationary components; that is, they do not satisfy conditions (i) and (ii) (see Exercise 12.1.5). What the proposition asserts is that, even in such cases, the process will have an alternative representation where the above conditions do hold, and that when there is more than one representation, the regular representation is always available as an option.

Proof. Recall from Proposition 6.3.V that a Poisson cluster process with a.s. finite clusters has a unique representation with p.g.fl. of the form

  log G[h] = Σ_{k=1}^∞ (1/k!) ∫_{X^{(k)}} [h(x_1) . . . h(x_k) − 1] K_k(dx_1 × · · · × dx_k),  (12.1.15)

where the Khinchin measure K_k has the representation

  K_k(B) = ∫_X J_k(B | y) µ_c(dy)  (12.1.16)

in terms of the Janossy measures J_k(· | y) of the cluster member process and the intensity measure µ_c of the Poisson cluster centre process. (Note that we here assume that both cluster centres and cluster members have points in the same space Y = X = R^d.)
If the process is stationary, uniqueness of the representation (12.1.15) implies that each of the measures K_k must be invariant under diagonal shifts (x_1, . . . , x_k) → (u + x_1, . . . , u + x_k), or equivalently

  K_k(T_u A_1 × · · · × T_u A_k) = K_k(A_1 × · · · × A_k).

The diagonal shifts Lemma 12.1.IV now implies that K_k(·) must reduce to a product of Lebesgue measure along the diagonal and a boundedly finite measure K̆_{k−1}(·) on B(X^{(k−1)}) such that (12.1.12) holds.
These ingredients can be used to construct a candidate process for the regular representation. We introduce Janossy measures J_0 = 0, J_1(dx | y) = δ_y(dx), and, for k > 1,

  J_k(dx_1 × dx_2 × · · · × dx_k | x) = (1/µ_c) δ_0(dy_1) K̆_{k−1}(dy_2 × · · · × dy_k),

where y_i = x_i − x for i = 1, . . . , k, and µ_c = Σ_{k=1}^∞ K̆_{k−1}(X^{(k−1)})/k! is the candidate intensity of a stationary Poisson cluster centre process. We interpret π_k = J_k(X^{(k)})/k! as the probability that a cluster has k members, and

  P_{k−1}(·) = (k! µ_c π_k)^{−1} K̆_{k−1}(·)
as the symmetric probability distribution describing the location of the remaining cluster members relative to a given cluster member as centre; from symmetry the cluster member selected as centre may be regarded as being chosen uniformly at random. We leave the reader to verify that back-substitution of these candidate elements results both in a Poisson cluster process with the required properties and in (12.1.14) being satisfied. This establishes the necessity of a representation satisfying the conditions (i) and (ii) of the proposition, as well as the form of the regular representation.
That the two conditions are sufficient to guarantee stationarity is a matter of verification. In p.g.fl. terms, condition (ii) of the proposition implies that for every x, G_m[h(·) | x] = G_m[h(· + x) | 0]. It is then straightforward to check that condition (12.1.6) in Theorem 12.1.III is met.

Exercise 12.1.6 extends this argument to more general cluster processes; an alternative approach using Radon-Nikodym derivatives and the disintegration of measures is sketched in Exercises 12.1.8-9. The next example examines a particular case of this representation in detail, and shows that it is not always the most natural or convenient for further manipulations.

Example 12.1(d) The regular representation of a stationary Neyman-Scott process [continued from Example 6.3(a)]. In the Neyman-Scott model the cluster members have a common distribution F(·) about the cluster centre. To obtain the regular representation we should refer the distribution of the cluster members to an arbitrarily chosen member of the cluster itself as origin. For clusters with just one element we have evidently

  P_0(A) = δ_0(A);

that is, the cluster is necessarily centred at its sole representative. For k = 2 we obtain

  P_1(A) = Pr{Y − X ∈ A} = ∫_X F(x + A) F(dx),

where X and Y are independent r.v.s with the distribution F. Similarly, for general k ≥ 2,

  P_{k−1}(A_2 × · · · × A_k) = ∫_X F(dx) F(x + A_2) . . . F(x + A_k).
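The two-point case is easy to check by simulation: P_1 is the law of Y − X for X, Y i.i.d. with distribution F. The sketch below is our own illustration with F standard normal (an arbitrary choice), for which Y − X is symmetric about 0 with variance 1 + 1 = 2.

```python
import random

rng = random.Random(42)
n = 50000

# Y - X for X, Y i.i.d. standard normal: N(0, 2) by independence.
diffs = [rng.gauss(0.0, 1.0) - rng.gauss(0.0, 1.0) for _ in range(n)]
m = sum(diffs) / n
v = sum((d - m) ** 2 for d in diffs) / n

assert abs(m) < 0.05          # symmetry of the relative-position law
assert abs(v - 2.0) < 0.1     # variances add for the difference
```

The symmetry seen here is what makes it immaterial which cluster member is designated as the origin in the regular representation.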
In the branching process interpretation of the cluster members, P_{k−1} gives the distribution of the locations of the other siblings given that the arbitrarily chosen member comes from a family of size k.

The use of (i) or (iv) rather than (ii) or (iii) of Theorem 12.1.III is indicated in the next example. It shows that a stationary measure of renewal type can be defined on N^#_R irrespective of whether the interval distribution has a finite or infinite first moment. Without a finite mean, this measure is not totally finite and so cannot be used directly to define a stationary point process; it is used in Exercise 12.4.6 to exhibit an example of a weakly singular infinitely divisible point process.
Example 12.1(e) Stationary regenerative measure and renewal process [see also Exercises 9.1.13-14]. Let µ be a measure, not necessarily totally finite, defined on the space of sequences {Y_n: n ∈ Z_+} satisfying Y_0 = 0 ≤ Y_1 ≤ · · · ≤ Y_n → ∞ (n → ∞), and put τ_n = Y_n − Y_{n−1} (n = 1, 2, . . .). We call µ regenerative when for any positive integer r and τ_i ∈ R_+ (i = 1, . . . , r),

  µ{τ_i ∈ (x_i, x_i + dx_i], i = 1, . . . , r} = µ_1(dx_1) Π_{i=2}^r dF(x_i),

where µ_1 is a boundedly finite measure on R_+ and F is a d.f. on R_+. Then it follows as in Exercise 9.1.14 that µ defines a measure on B(N^#_{R_+}).
Our first aim is to show that when µ_1(dx_1) = [1 − F(x_1)] dx_1, the measure µ is invariant under shifts S_u for u > 0, even if it is not a probability measure as in Definition 12.1.II. When the counting measure N on R_+ consists of unit atoms at Y_1, Y_2, . . . , with N(0, Y_r] = r for r = 1, 2, . . . as in Exercise 9.1.12, the successive atoms {Y′_n} of the counting measure S_u N for u > 0 are given by Y′_n = Y_{n+ν} − u, where the index ν = 0 if Y_1 > u, = sup{n: Y_n ≤ u} otherwise. Consequently, writing τ′_n = Y′_n − Y′_{n−1} (Y′_0 ≡ 0), the measure S_u µ on {Y′_n} is related to µ by

  (S_u µ){τ′_i ∈ (x_i, x_i + dx_i], i = 1, . . . , r}
    = µ{τ_1 ∈ (u + x_1, u + x_1 + dx_1], τ_i ∈ (x_i, x_i + dx_i], i = 2, . . . , r}
      + Σ_{j=1}^∞ µ{τ_1 + · · · + τ_j ≤ u, τ_1 + · · · + τ_{j+1} ∈ (u + x_1, u + x_1 + dx_1],
                    τ_{j+i} ∈ (x_i, x_i + dx_i], i = 2, . . . , r}
    = [µ{τ_1 ∈ (u + x_1, u + x_1 + dx_1]}
      + Σ_{j=1}^∞ µ{τ_1 + · · · + τ_j ≤ u, τ_1 + · · · + τ_{j+1} ∈ (u + x_1, u + x_1 + dx_1]}] × Π_{i=2}^r dF(x_i).
For convenience, integrate x_1 over (0, y] say, so that on the right-hand side, when µ_1(0, y] = ∫_0^y [1 − F(x_1)] dx_1, the (r − 1)-fold product of terms dF(·) has coefficient

  ∫_u^{u+y} [1 − F(x_1)] dx_1 + Σ_{j=1}^∞ ∫· · ·∫ [1 − F(t_1)] dt_1 dF(t_2) . . . dF(t_{j+1}),
where the multiple integral is over the set {t_1 + · · · + t_j ≤ u < t_1 + · · · + t_{j+1} ≤ u + y}. Writing U_0(x) = Σ_{j=1}^∞ F^{j*}(x), so that U_0 satisfies the renewal equation

  U_0(x) = F(x) + ∫_0^x F(x − y) dU_0(y),

the multiple integral can be expressed as

  ∫_0^u [1 − F(x_1)] dx_1 ∫_{u−x_1}^{u−x_1+y} dF(x_2)
    + ∫_0^u U_0(dv) ∫_0^{u−v} [1 − F(x_1)] dx_1 ∫_{u−x_1−v}^{u−x_1−v+y} dF(x_2)
  = ∫_0^u [1 − F(u − x)] dx ∫_x^{x+y} dF(z)
    + ∫_0^u U_0(dv) ∫_0^{u−v} [1 − F(u − v − x)] dx ∫_x^{x+y} dF(z).

Here, the second term equals

  ∫_0^u dx ∫_0^{u−x} [1 − F(u − x − v)] dU_0(v) ∫_x^{x+y} dF(z) = ∫_0^u F(u − x) dx ∫_x^{x+y} dF(z),
so the coefficient of the (r − 1)-fold product of terms dF(·) equals

  ∫_u^{u+y} [1 − F(x)] dx + ∫_0^u [F(y + x) − F(x)] dx = ∫_0^y [1 − F(x)] dx,
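The identity just used, ∫_u^{u+y} [1 − F(x)] dx + ∫_0^u [F(y + x) − F(x)] dx = ∫_0^y [1 − F(x)] dx, holds for any d.f. F on R_+, and a crude numerical check confirms it. The sketch below is our own illustration, with F exponential and a plain midpoint rule; any other continuous d.f. and quadrature scheme would serve equally well.

```python
from math import exp

def F(x):
    """Exponential d.f. F(x) = 1 - e^{-x}, an arbitrary illustrative choice."""
    return 1.0 - exp(-x)

def integral(g, a, b, steps=20000):
    """Midpoint rule; more than accurate enough for this smooth integrand."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

u, y = 2.0, 1.3
lhs = (integral(lambda x: 1.0 - F(x), u, u + y)
       + integral(lambda x: F(x + y) - F(x), 0.0, u))
rhs = integral(lambda x: 1.0 - F(x), 0.0, y)

assert abs(lhs - rhs) < 1e-6
```

For the exponential choice the common value is 1 − e^{−y}, independent of u, which is the invariance being established.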
showing that µ is invariant as required.
When ∫_0^∞ [1 − F(x)] dx ≡ λ^{−1} < ∞, λµ(·) is a probability measure, and so also is the measure it induces on B(N^#_{R_+}). We can then identify the counting measure N(·) with such a stationary distribution as a stationary renewal process.

We use the following proposition in discussing stationary infinitely divisible point processes; the result is of wider importance (see, e.g., Section 3.4 and the discussion of parallel lines in a stationary line process in Section 15.4). An analogue for MPPs is at Exercise 12.2.10.

Proposition 12.1.VI (Zero-infinity Dichotomy). For a stationary random measure ξ on X = R^d,

  P{ξ(X) = 0 or ∞} = 1.  (12.1.17)

Proof. The assertion is equivalent to showing that P{0 < ξ(X) < ∞} = 0. Supposing the contrary, it necessarily follows that there exist some positive
constants a and γ such that, for the hypercube U_d^γ with vertices (±½γ, . . . , ±½γ) and its complement (U_d^γ)^c,

  P{ξ: ξ(U_d^γ) > a, ξ((U_d^γ)^c) < a} = α > 0.

Write T_{γr} U_d^γ for the shift of U_d^γ through the vector γr = (γr_1, . . . , γr_d), where r has integer-valued components so r ∈ Z^d, and consider the events

  V_r = {ξ: ξ(T_{γr} U_d^γ) > a, ξ(T_{γr}(U_d^γ)^c) < a}.

By stationarity, P(V_r) = P(V_0) = α for all such r, and because the events V_r are disjoint for distinct r,

  P(∪_{r∈Z^d} V_r) = Σ_{r∈Z^d} P(V_r) = ∞ · α,

which is impossible when P is a probability measure unless α = 0.

Equation (12.1.17) prompts the following definition.

Definition 12.1.VII. A random measure ξ is nonnull when P{ξ(X) = 0} = 0.

It follows from (12.1.17) that a nonnull stationary random measure on X = R^d satisfies P{ξ(X) = ∞} = 1.
The discussion so far has centred on invariance with respect to shifts in R^d, but, as mentioned earlier, the ideas can be carried over with only nominal changes to processes invariant under other types of transformation, such as rotations, permutations of coordinates, or changes of scale. To conclude this section we examine one such example where, as in R^d, the state space itself is the group. In such cases we should anticipate that a basic role will be played by Haar measure which, analogous to the uniform distribution on the circle, or Lebesgue measure on the line, is the unique measure on the group invariant under the group actions. In the case of a Poisson process, for example, the properties of the process are determined by the parameter measure, which inherits the property of being invariant under the group actions from invariance under the corresponding flow. But the only measures invariant under the group actions are multiples of the Haar measure, and so the parameter measure itself must be a multiple of Haar measure [recall Examples 12.1(a) and (c)]. Even when there is no obvious governing measure, Haar measure will reappear in the moment measures, and lurks in the background behind the finite-dimensional distributions. Its role in the latter context can be seen most clearly when the state space is compact as in the next example.

Example 12.1(f) Stationary point process on the circle S. For a point process with state space the circle S, which we identify with angles θ modulo 2π, the compactness of S implies that the process necessarily has a.s. finite
12.1.
Stationarity: Basic Concepts
189
realizations, so explicit constructions in terms of Janossy measures are possible. Thus, supposing that the realization consists of exactly n points, defined by angles {θ_1, ..., θ_n}, its distribution can be described by the symmetrized probability measure (conditional on n)
$$\Pi_n(d\theta_1\times\cdots\times d\theta_n) = \frac{J_n(d\theta_1\times\cdots\times d\theta_n)}{J_n(S^{(n)})}.$$
Stationarity (invariance under rotations) implies that for all θ ∈ S and A_1, ..., A_n ∈ B(S),
$$\Pi_n(T_\theta A_1\times\cdots\times T_\theta A_n) = \Pi_n(A_1\times\cdots\times A_n),$$
so that we again have invariance under diagonal shifts. Here Lemma 12.1.IV implies that Π_n can be written in terms of a product of the uniform measure on S and a reduced probability measure $\breve\Pi_n$ on a space of n − 1 arguments φ_1, ..., φ_{n−1}: for g ∈ BM(S^{(n)}) we have
$$\int_{S^{(n)}} g(\theta_1,\ldots,\theta_n)\,\Pi_n(d\theta_1\times\cdots\times d\theta_n) = \int_S \frac{d\theta}{2\pi}\int_{S^{(n-1)}} g(\theta,\,\theta+\phi_1,\ldots,\theta+\phi_{n-1})\,\breve\Pi_n(d\phi_1\times\cdots\times d\phi_{n-1}). \tag{12.1.18}$$
The interpretation of this result is quite simple. If the distribution Π_n of n points is rotationally invariant, it can be described by locating one point uniformly around the circle and the other n − 1 points relative to it according to the reduced distribution $\breve\Pi_n$. The symmetry properties of Π_n imply that it is immaterial which point is designated as the one to be uniformly distributed, and stationarity (i.e., rotational invariance) implies that it is immaterial which point of the circle is chosen to play the role of origin. For example, if n = 2 and densities exist, the distribution of the two points is completely described by a symmetric density function f(·) such that
$$\Pi_2(d\theta_1\times d\theta_2) = (2\pi)^{-1} f(\theta_2-\theta_1)\,d\theta_1\,d\theta_2.$$
Note that here, as in general, it is a necessary consequence of stationarity that any one-dimensional marginal distribution such as Π_2(· × S) must be uniform.

As a more specific example of such a process, consider first any distribution g(θ) symmetric about the origin (pole) θ = 0 [see, e.g., Mardia and Jupp (2000) for examples]. Take any fixed or random number of points independently distributed about the origin, to form the circular analogue of a Neyman–Scott cluster, N(· | 0) say. Then shift the origin to an angle uniformly distributed over S. This is already a single-cluster, stationary process on S.
Finally, consider the superposition of N such processes, where N is Poisson distributed with mean ν. The result is a cluster process on S analogous to a Neyman–Scott process in time or space.
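The role of the uniform rotation can be checked numerically. The sketch below is not from the text: the wrapped-normal scatter is one convenient symmetric choice for the cluster distribution, and the function names and parameters are illustrative. It builds many independent single-cluster realizations and verifies that the one-dimensional marginal of the pooled angles is (approximately) uniform, as stationarity requires.

```python
import math
import random

def circle_cluster(n_points, spread, rng):
    """One realization of the single-cluster process on S: n_points angles
    scattered symmetrically about a cluster centre that is itself uniformly
    distributed on the circle (the uniform rotation yields stationarity)."""
    theta0 = rng.uniform(0.0, 2 * math.pi)
    return [(theta0 + rng.gauss(0.0, spread)) % (2 * math.pi)
            for _ in range(n_points)]

rng = random.Random(42)
# Pool the angles from many independent realizations; the one-dimensional
# marginal must be uniform on [0, 2*pi), so about a quarter of all angles
# should fall in the arc [0, pi/2).
angles = [th for _ in range(20000) for th in circle_cluster(3, 0.3, rng)]
frac = sum(1 for th in angles if th < math.pi / 2) / len(angles)
print(round(frac, 2))   # close to 0.25
```

Within a realization the angles are strongly dependent (they form one cluster), yet the marginal of each individual angle is exactly uniform; this is precisely the content of the reduction (12.1.18).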
190
12. Stationary Point Processes and Random Measures
˘ n (·) defined by In this case, unfortunately, the reduction of Πn (·) to Π (12.1.18) is of little direct value in computing its properties. Even if the realization consists of only two points, the density function f for the angular separation of the two points will be an awkward mixture of densities that arise from pairs of points coming from either single or different clusters. For stationary point processes in general, the locations of points relative to a given point of the process as origin are independent of where that point itself is located. This is the theme of the Palm theory for stationary point processes discussed in Section 13.3. Often the main difficulty in applying the group concepts relates to the fact that the group G of transformations may split the space into equivalence classes in quite a complex manner. By contrast, the shifts act transitively on the whole space (any point can be transformed by a member of G into any other point) so that the equivalence classes are trivial, the whole space forming the unique equivalence class. Marked point processes on Rd form the canonical example of the sort of structure to be expected in more general cases. Here the state space has the representation X = Rd × K, in which the first factor is the group and the second can be regarded as a representation of the space of equivalence classes. This product form is the desired endpoint of analyses based on Lemma A2.7.II and Proposition A2.7.III. Any measure on X that is invariant under the group actions can then be expressed as the product of Haar measure on the group and a measure on the other component K. Of course the probability measure defining the point process does not live on X itself, but on NX# , but once again the underlying factorization of measures on X generally carries with it some corresponding simplifications of the probability distributions and the moment measures. To illustrate, we consider an extension of the previous example to the marked case. 
Example 12.1(g) A stationary MPP on S. To extend Example 12.1(f) to an MPP, start by supposing that the number n of points in the ground process is fixed, where the ground process is again specified by locating an initial point uniformly at random around the circle, and then locating the other points relative to it according to a reduced (n − 1)-dimensional symmetric ˘ n (φ1 , . . . , φn−1 ) say. To take the specific case when n = 2 as an distribution Π example, the process can be specified by two components: (i) a distribution F (φ) for the angular separation φ (in a given direction, clockwise say); and (ii) a family of symmetric bivariate distributions, G2 (K1 , K2 | φ) say, for the marks (with Ki ∈ BK for a mark space K that is a c.s.m.s.), given the angular separation φ. More generally, the associated distribution of marks can be specified by a family of n-dimensional symmetric distributions on K, Gn (K1 , . . . , Kn | θ, θ + φ1 , . . . , θ + φn−1 ) say, indexed by the angular locations; in the stationary case each Gn is independent of θ ∈ S. The simplest case is that of independent
marks, but in general the joint distribution of the set of marks may depend on both the number and relative angles of the points in the ground process. Symmetry implies that the marginal distributions of the multivariate mark distribution are equal, and for a fixed number of points this can be taken as defining what is meant by the stationary mark distribution. However, in the general case of a random number of points, the marginal distributions may also depend on the value of n, so that the stationary mark distribution appears as a weighted average of the stationary mark distributions for realizations with different numbers of points (see Exercise 12.1.10).
Exercises and Complements to Section 12.1

12.1.1 (a) Modify the argument leading to Lemma 12.1.I to show that when X = R^d, S_u μ is jointly continuous as a mapping from M#_X × X into M#_X.
(b) Show that S_u defined at (12.1.3) is a continuous and hence measurable mapping of the space M#(M#_X) of boundedly finite measures on M#_X into itself, and that S_u preserves measure and hence maps the set of all probability measures on M#_X into itself. Verify that S_u acts on the space M#(N#_X) of all boundedly finite measures on N#_X in a similar way. [Hint: The induced operator inherits the properties of S_u in much the same way as S_u inherits the properties of T_u. Continuity depends ultimately on the upper continuity of a measure at the empty set (Proposition A1.3.II).]

12.1.2 Give examples of nonstationary point processes for which (a) the avoidance function is stationary; (b) the one-dimensional distributions are stationary. [Hint: For integer-valued r.v.s X, Y, and X + Y, with X, Y ≥ 0 a.s., find a bivariate distribution for dependent X and Y with the same marginal distribution for X + Y as though X, Y are independent. Take X = Z and define the fidi distributions of a point process by using the dependent and independent bivariate distributions for alternate pairs of integers (see Ripley, 1976).]

12.1.3 Stationarity conditions for marked point processes. Verify that the following conditions for MPPs on R^d (i.e., for point processes on state space R^d × K) are equivalent, each corresponding to stationarity.
(i) For each u ∈ R^d, k = 1, 2, ..., and families A_1, ..., A_k ∈ B_{R^d} and K_1, ..., K_k ∈ B_K, the fidi distributions satisfy
$$P\bigl(\{N(A_i\times K_i) = n_i\ (i=1,\ldots,k)\}\bigr) \equiv P_k(A_1\times K_1,\ldots,A_k\times K_k;\,n_1,\ldots,n_k) = P_k\bigl((A_1+u)\times K_1,\ldots,(A_k+u)\times K_k;\,n_1,\ldots,n_k\bigr).$$
(ii) For each u ∈ R^d and f ∈ BM(R^d × K), with S_u f(x, κ) = f(x − u, κ), the characteristic functional satisfies Φ[S_u f] = Φ[f].
(iii) For each u ∈ R^d and h ∈ V(R^d × K), the p.g.fl. G satisfies G[S_u h] = G[h].
12.1.4 Starting from a Cox process N which with its directing measure ξ satisfies (12.1.9), use the p.g.fl. of N and the Laplace functional Lξ of ξ, which are related by (6.2.3) of Proposition 6.2.II, to verify that a Cox process N on Rd directed by ξ is stationary if and only if ξ is stationary. Using a similar approach show that the joint process (N, ξ) is stationary. [Hint: Stationarity means invariance as indicated at (12.1.4). Check that this holds if and only if ξ is invariant.] 12.1.5 The following two examples show that a stationary cluster process can be realized from nonstationary components. (a) Random thinning with deletion probability µ(x)/[1 + µ(x)] at x ∈ R1 of an inhomogeneous Poisson process at rate [1 + µ(x)] dx, where µ(x) ≥ 0 (all x), yields a stationary Poisson process (cf. Exercise 11.3.1). (b) Take a simple point process on R with points at {2n + U : n = 0, ±1, . . .}, where the r.v. U is uniformly distributed on (0, 1), to be a (nonstationary) cluster centre process. Let clusters be independent and let them consist of precisely two points at distances X1 and 1+X2 from the cluster centre, where for each cluster X1 and X2 are i.i.d. r.v.s. Then the cluster process so constructed is the same as the random translation of a stationary deterministic process at unit rate. 12.1.6 Let N be a cluster process (see Definition 6.3.I) with stationary centre process Nc on Rd and independent component processes Nm (· | y) (y ∈ Rd ) for which the fidi distributions of Nm (· | y), relative to y, are independent of y. Denote the p.g.fl. of Nm by Gm [· | y]. Referring to Lemma 6.3.II and Exercise 6.3.2, show that a stationary cluster process N is well defined if and only if
$$\int_{\mathbb{R}^d} \bigl(1 - G_m[h\mid y]\bigr)\,N_c(dy) < \infty \quad\text{a.s.}\quad \bigl(h\in\mathcal{V}(\mathbb{R}^d)\bigr). \tag{12.1.19}$$
[Hint: Homogeneity of the N_m means that G_m[h(·) | y] = G_m[h(· + y) | 0].]

12.1.7 Stationary deterministic lattice processes [see Example 8.2(e) for the case d = 1]. Let the r.v. Y be uniformly distributed over the unit cube U^d in R^d, and let Z^d denote the set of all integer-valued lattice points in R^d. Show that the point process N with sample realizations {n + Y: n ∈ Z^d} is stationary. (Call N the stationary cubic lattice process at unit rate in R^d.) If the span of the lattice in the direction of the x_i-axis, i = 1, ..., d, is changed from 1 to a_i, where the positive reals a_i satisfy $\prod_{i=1}^d a_i = 1$, verify that stationarity at unit rate is retained.
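A quick Monte Carlo check of the lattice process in the case d = 1 (a hedged sketch, not part of the exercise; the helper name and settings are ours): the expected count in an interval of fixed length should not depend on where the interval is placed.

```python
import math
import random

def lattice_count(u, b, rng):
    """Count of points of the stationary lattice {n + Y: n in Z}, Y ~ U(0,1),
    falling in the interval (u, u + b]; integers n with u - Y < n <= u + b - Y."""
    y = rng.random()
    return math.floor(u + b - y) - math.floor(u - y)

rng = random.Random(1)
trials = 50000
means = []
for u in (0.0, 0.37, 12.9):          # arbitrary shifts of the counting window
    means.append(sum(lattice_count(u, 0.5, rng) for _ in range(trials)) / trials)
print([round(m, 2) for m in means])  # each mean close to 0.5 = interval length
```

The count depends on u and Y only through the fractional part of u − Y, which is again uniform, so the mean count equals the interval length for every shift u: unit rate and stationarity.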
12.1.8 (a) Let f(·) be a nonnegative measurable function on R satisfying, for each fixed u ∈ R,
$$f(x+u) = f(x) \qquad (\text{a.e. } x). \tag{12.1.20}$$
Show that there exists a finite constant α such that f(x) = α a.e. [Hint: $F(y) = \int_0^y f(x)\,dx$ satisfies the Hamel equation F(x + y) = F(x) + F(y).]
(b) Extend the result of (a) to R^d. [Hint: Apply (a) in a coordinatewise manner, deducing at the first step, for example, that in place of the constant α there is a measurable function α(x_{d−1}) (x_{d−1} ∈ R^{d−1}) satisfying (12.1.20).]
12.1.9 Radon–Nikodym approach to construction of stationary cluster elements.
(a) Check that each of the measures K_k^1(·) ≡ K_k(· × X^{(k−1)}) in Proposition 12.1.V reduces to a multiple of Lebesgue measure.
(b) Define the Radon–Nikodym derivatives P_{k−1}(· | x) as in the discussion under (6.3.34) by
$$\int_A P_{k-1}(B\mid x)\,K_k(dx\times\mathcal{X}^{(k-1)}) = K_k(A\times B) \qquad \bigl(A\in\mathcal{B}_{\mathcal{X}},\ B\in\mathcal{B}(\mathcal{X}^{(k-1)})\bigr),$$
and observe that, for each fixed u, P_{k−1}(T_u A_2 × ··· × T_u A_k | x + u) and P_{k−1}(A_2 × ··· × A_k | x) are versions of the same density and hence equal a.e.
(c) For fixed A_2, ..., A_k, show that the function P_{k−1}(T_u A_2 × ··· × T_u A_k | u) in part (b) is a measurable function of u, implying by Exercise 12.1.8(b) that it reduces a.e. to a constant P_{k−1}(A_2 × ··· × A_k | 0). [Hint: For fixed u, the Radon–Nikodym theorem shows that P_{k−1}(T_u A_2 × ··· × T_u A_k | x + u) = P_{k−1}(A_2 × ··· × A_k | x) for K_k^1-a.e. x. Integrate K_k(T_u A_1 × ··· × T_u A_k) over u and use Fubini's theorem to express the result as an integral whose density with respect to the product measure du × dx is P_{k−1}(T_u A_2 × ··· × T_u A_k | x + u), thereby showing via the Radon–Nikodym theorem its joint measurability in x and u. Hence, by putting x = 0, deduce that P_{k−1}(T_u A_2 × ··· × T_u A_k | u) is a measurable function of u that is a.e. equal to P_{k−1}(A_2 × ··· × A_k | 0).]
(d) Take a countable semiring A generating B(R^d) and show that P_{k−1} is countably additive on product sets of the form A_2 × ··· × A_k for A_i ∈ A and so can be extended uniquely to a measure P_{k−1} on B(R^{(k−1)d}) such that, for all product sets with A_i ∈ B(R^d),
$$P_{k-1}(T_uA_2\times\cdots\times T_uA_k\mid u) = P_{k-1}(A_2\times\cdots\times A_k) \qquad \text{a.e.}$$
12.1.10 In the setting and notation of Examples 12.1(f)–(g), put π_n = Pr{N(S × K) = n}. Verify that the stationary mark distribution, for K ∈ B_K, equals
$$\frac{1}{\sum_{n=1}^\infty n\pi_n}\sum_{n=1}^\infty n\pi_n \int_S \frac{d\theta}{2\pi}\int_{S^{(n-1)}} G_n(K,\mathcal{K},\ldots,\mathcal{K}\mid \theta,\,\theta+\phi_1,\ldots,\theta+\phi_{n-1})\,\breve\Pi_n(d\phi_1\times\cdots\times d\phi_{n-1}).$$
12.1.11 Renewal process and random walk on S. Suppose given a probability distribution G(dθ) on (0, 2π], interpreted as the length of a step in the clockwise direction around the circumference of a circle, and for n = 1, 2, ... let U_n(A) denote the expected number of visits within the first n steps to the set A ⊆ (0, 2π]. Find conditions on G such that
$$U_n(A)/n \to \ell(A)/(2\pi),$$
where ℓ denotes Lebesgue measure on S; the mean step length $\bar\theta = \int_0^{2\pi}\theta\,G(d\theta)$ is necessarily finite and bounded by 2π. Investigate the behaviour when the conditions fail. [Hint: Formulate versions of the direct Riemann integrability and spread-out conditions of Section 4.4 for G, and apply the results for the real line, then wrap around the circle.]
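As a numerical illustration under one concrete choice (a sketch with our own assumptions, not part of the exercise): take step lengths uniform on (0, 1), a spread-out distribution, so the wrapped positions equidistribute and the per-step visit rate to an arc A approaches ℓ(A)/(2π).

```python
import math
import random

def visit_rate(n_steps, arc_lo, arc_hi, rng):
    """Fraction of the first n_steps positions of the wrapped random walk
    landing in the arc (arc_lo, arc_hi]; step lengths are U(0, 1)."""
    pos, visits = 0.0, 0
    for _ in range(n_steps):
        pos = (pos + rng.random()) % (2 * math.pi)
        if arc_lo < pos <= arc_hi:
            visits += 1
    return visits / n_steps

rng = random.Random(7)
rate = visit_rate(400000, 1.0, 2.0, rng)    # arc of length 1
print(round(rate, 3))                        # close to 1/(2*pi) ≈ 0.159
```

When G is concentrated on a lattice (e.g., a single atom at a rational multiple of 2π) the positions cycle through finitely many angles, the equidistribution fails, and the limit depends on whether the arc meets the cycle, which is the "behaviour when the conditions fail".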
12.2. Ergodic Theorems

In this section we review some basic ergodic theorems and develop them for random measures and point processes. There are diverse examples of their application through the rest of this chapter and the next, and significant extensions of the theory in Sections 13.4–5.

Let (Ω, E, μ) be a measure space and S a measure-preserving operator on this space; that is, μ(S^{−1}E) = μ(E) for E ∈ E. The classical ergodic theorems assert the convergence, in some sense and under appropriate conditions, of the averages $n^{-1}\sum_{r=1}^n f(S^r\omega)$ to a limit function $\bar f(\omega)$, which is invariant under the action of S [i.e., $\bar f(S\omega) = \bar f(\omega)$] for a measurable function f. When f is μ-integrable, the limit function $\bar f$ is also μ-integrable and the individual ergodic theorem asserts convergence μ-a.e. When f ∈ L_p(μ) for some 1 ≤ p < ∞, $\bar f$ ∈ L_p(μ) also and the statistical ergodic theorem asserts convergence in the L_p norm. When μ is a probability measure, the limit function $\bar f(\omega)$ is a random variable that can be identified with the conditional expectation of f with respect to the σ-algebra I of invariant events under S, that is, of those sets E ∈ E for which $\mu(S^{-1}E \bigtriangleup E) = 0$. Writing X for f(ω), X_n for f(S^n ω), and Y_X = E(X | I) for $\bar f$, the individual ergodic theorem in the probability case can be written more graphically in the form
$$\frac{1}{n}\sum_{r=1}^n X_r \xrightarrow{\text{a.s.}} E(X\mid\mathcal{I}) \equiv Y_X. \tag{12.2.1}$$
An important special case arises when the probability measure is such that the events in I all have probability either 0 or 1. In this case the transformation S is said to be metrically transitive with respect to the measure μ, and the process {X_n}, or its distribution, is said to be ergodic. In such circumstances the only invariant functions are constants, the conditional expectation in (12.2.1) reduces to the ordinary expectation, and (12.2.1) takes the familiar form
$$\frac{1}{n}\sum_{r=1}^n X_r \xrightarrow{\text{a.s.}} m \equiv EX.$$
For a fuller discussion of these results with proofs and references, see, for example, Billingsley (1965).

One other prefatory remark is in order. Given a stationary process {X(t): t ∈ R}, define the two σ-fields I_1 and I of sets E ∈ E that are invariant under the shift transformations {S_n: n = 0, ±1, ...} and {S_t: t ∈ R}, respectively. In general I ≠ I_1, with I ⊆ I_1; of course, if I_1 is trivial, then so is I. This and the consequences of the sandwich relation below cover our main concerns.

We consider first the implications of these results for stationary random measures on R. Here we take Ω = M#_R and S as the shift through the unit distance. The measure-preserving character of S is then a corollary of
stationarity. The simplest choice for X is the random variable $X = \int_0^1 \xi(dx)$, which has finite expectation whenever ξ has finite mean intensity. Then
$$X_n = \int_0^1 \xi(n+dx) = \int_n^{n+1}\xi(dx),$$
and the assertion in (12.2.1) becomes
$$\frac{\xi(0,n]}{n} \xrightarrow{\text{a.s.}} E\biggl(\int_0^1\xi(dx)\ \bigg|\ \mathcal{I}\biggr). \tag{12.2.2}$$
If, in particular, ξ is ergodic then
$$\frac{\xi(0,n]}{n} \xrightarrow{\text{a.s.}} m. \tag{12.2.3}$$
The results (12.2.2) and (12.2.3) seem simple, but they can be applied to many more general situations, of which the simplest is to a continuous-time process. Observe first that from the simple sandwich relation
$$\frac{[T]}{T}\cdot\frac{\xi(0,[T]]}{[T]} \;\le\; \frac{\xi(0,T]}{T} \;\le\; \frac{[T+1]}{T}\cdot\frac{\xi(0,[T+1]]}{[T+1]}$$
we easily extend (12.2.2) to arbitrary intervals as for Proposition 3.5.I, so that
$$\frac{\xi(0,T]}{T} \xrightarrow{\text{a.s.}} E\biggl(\int_0^1\xi(dx)\ \bigg|\ \mathcal{I}\biggr). \tag{12.2.4}$$
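The ergodic limit (12.2.3) is easy to observe numerically for the simplest ergodic example, a stationary Poisson process (a hedged sketch; the simulation scheme, function name, and parameters are illustrative, not from the text).

```python
import random

def poisson_count(rate, t_max, rng):
    """Number of points of a homogeneous Poisson process in (0, t_max],
    generated from i.i.d. exponential inter-point gaps."""
    n, t = 0, rng.expovariate(rate)
    while t <= t_max:
        n += 1
        t += rng.expovariate(rate)
    return n

rng = random.Random(3)
m = 2.0                                  # mean intensity of the (ergodic) process
rates = [poisson_count(m, T, rng) / T for T in (10.0, 100.0, 10000.0)]
print([round(r, 2) for r in rates])      # the last entry is close to m = 2.0
```

For the Poisson process the invariant σ-algebra is trivial, so the conditional expectation in (12.2.2) reduces to the constant m, and the normalized count ξ(0, T]/T settles down near m as T grows.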
Because the limit is invariant under all shifts {S_t: t ∈ R}, it is I-measurable rather than just I_1-measurable, so that the conditional expectation can and will be taken with respect to I. As a corollary, consider the behaviour of a nonnegative measurable function f(·) on R, applied to a stationary measurable stochastic process X(·) on R. If $E f(X(t)) < \infty$, we can define a random measure ξ with finite mean intensity by setting
$$\xi(A) = \int_A f\bigl(X(t)\bigr)\,dt.$$
Applying (12.2.4) to such ξ yields the result
$$\frac{1}{T}\int_0^T f\bigl(X(t)\bigr)\,dt \xrightarrow{\text{a.s.}} E\bigl(f(X(t))\ \big|\ \mathcal{I}\bigr).$$
The only restrictive feature is the limitation to nonnegative functions f : this is not inherent in the ergodic problem but arises from our concern with random measures rather than random signed measures.
Similar results hold in higher-dimensional spaces and in the more general context of metric groups considered at the end of Section 12.1. The main point of difficulty concerns the choice of averaging sets to replace the intervals (0, n] in (12.2.2). Even in the plane it is not difficult to find sequences {A_n} with A_n ⊂ A_{n+1} and ℓ(A_n) → ∞ such that the analogue of (12.2.2) fails in some cases (see Exercise 12.2.1). To consider this question further, let (Ω, E, μ) be a measure space acted on measurably by the group of measurable transformations {S_g: g ∈ G}, meaning that (g, ω) → S_g ω is jointly measurable, where G is a σ-group with unique right-invariant Haar measure χ. Note the most important fact that the averaging in ergodic theorems takes place over sets in G and not the state space X. For example, the individual ergodic theorem takes the form that, for suitable sequences {A_n},
$$\frac{\int_{A_n} f(S_g\omega)\,\chi(dg)}{\chi(A_n)} \to \bar{f}(\omega) \qquad \mu\text{-a.e.}, \tag{12.2.5}$$
where, in the probability case, $\bar f(\omega)$ is the conditional expectation E(f | I) with respect to the σ-algebra of events invariant under the whole family {S_g: g ∈ G}. A thorough discussion of extensions of the classical ergodic theorems in this context is given by Tempel'man (1972) [see also Tempel'man (1986) and Sinai (2000, Chapter 4, Section 3.3)], who sets out a range of conditions on the sequence {A_n} (some necessary, others sufficient) for the validity both of (12.2.5) and of corresponding statistical ergodic theorems. For the present discussion we adopt only the simplest of the conditions he describes.

Definition 12.2.I. Let X = R^d. The sequence {A_n} of bounded Borel sets in R^d is a convex averaging sequence if
(i) each A_n is convex;
(ii) A_n ⊆ A_{n+1} for n = 1, 2, ...; and
(iii) r(A_n) → ∞ (n → ∞), where r(A) = sup{r: A contains a ball of radius r}.

Using this terminology, we set out versions of the individual and statistical ergodic theorems, referring to Tempel'man (1972) for proofs and further extensions.

Proposition 12.2.II. (a) (Individual Ergodic Theorem for d-dimensional Shifts). Let (Ω, E, P) be a probability space, {S_x: x ∈ R^d} a group of measure-preserving transformations acting measurably on (Ω, E, P) and indexed by the points of R^d, {A_n: n = 1, 2, ...} a convex averaging sequence in R^d, and I the σ-algebra of events in E that are invariant under the transformations {S_x}. Then for all measurable functions (random variables) f on (Ω, E, P) with E|f| < ∞,
$$\frac{\int_{A_n} f(S_x\omega)\,dx}{\ell(A_n)} \xrightarrow{\text{a.s.}} E(f\mid\mathcal{I}). \tag{12.2.6}$$
(b) (Statistical Ergodic Theorem for d-dimensional Shifts). Under the same conditions as in (a), and for p ≥ 1,
$$E\left|\frac{\int_{A_n} f(S_x\omega)\,dx}{\ell(A_n)} - E(f\mid\mathcal{I})\right|^p \to 0 \qquad \bigl(\text{all } f\in L_p(\mathcal{P})\bigr). \tag{12.2.7}$$
Remark. In general the statistical ergodic theorem holds under weaker conditions on the sequence {A_n} than the individual ergodic theorem. Versions of the theorem remain true when the probability measure P is replaced by a σ-finite measure μ, subject of course to the condition that $\int|f(\omega)|\,\mu(d\omega) < \infty$; see Proposition 12.4.V for an application.

Our task is to apply these theorems to stationary random measures on the c.s.m.s. X: we consider two cases, X = R^d and X = R^d × K, for unmarked and marked point processes, respectively, where the c.s.m.s. K is a space of marks. When X = R^d, we identify (Ω, E) with the space (M#_X, B(M#_X)) of boundedly finite measures ξ defined on B_X, and S_x with the shift taking ξ(·) into ξ(· + x). If ξ has finite first moment measure, stationarity requires that this should reduce to a constant multiple m ℓ(·) of Lebesgue measure on R^d. More generally, if X = R^d × K, we still take (Ω, E) = (M#_X, B(M#_X)) but identify {S_x} with shifts in the first coordinate only. Under stationarity, the first moment measure becomes a measure on the product space and it is invariant under shifts in the first component. The factorization Lemma A2.7.II then implies that the first moment measure has the product form ℓ × ν, where ν is a boundedly finite measure on K. If the ground process has finite first moment measure, then this must be a multiple m_g ℓ(·) of Lebesgue measure on R^d. In this case, ν(K) < ∞ and ν can be normalized to a probability measure π(·) on (K, B_K), the stationary mark distribution. The first moment measure is then of the form m_g ℓ × π. We proceed to an extension of these remarks to the conditional expectation of ξ with respect to the appropriate invariant σ-algebra I.

Lemma 12.2.III. Let ξ be a random measure on the product space X = R^d × K and I the σ-algebra of invariant events with respect to the shifts S_x in R^d.
When ξ is stationary with respect to these shifts and such that its expectation measure exists, there exists an I-measurable random measure ψ(·) on K such that for all nonnegative measurable functions f on X,
$$E\biggl(\int_{\mathbb{R}^d\times\mathcal{K}} f(x,\kappa)\,\xi(dx\times d\kappa)\ \bigg|\ \mathcal{I}\biggr) = \int_{\mathcal{K}}\psi(d\kappa)\int_{\mathbb{R}^d} f(x,\kappa)\,\ell(dx) \qquad \text{P-a.s.} \tag{12.2.8}$$
In particular, for bounded B ∈ B(R^d) and K ∈ B_K,
$$E\bigl(\xi(B\times K)\ \big|\ \mathcal{I}\bigr) = \ell(B)\,\psi(K) \qquad \text{P-a.s.} \tag{12.2.9}$$
Proof. Let X be any r.v. on (Ω, E, P) with finite expectation and G ∈ I an invariant set, so that $P(G \bigtriangleup S_xG) = 0$. Then
$$\int_G X(\omega)\,P(d\omega) = \int_{S_{-x}G} X(\omega)\,P(d\omega) = \int_G X(S_{-x}\omega)\,P\bigl(d(S_{-x}\omega)\bigr) = \int_G X(S_{-x}\omega)\,P(d\omega),$$
so for all x ∈ R^d,
$$E(X\mid\mathcal{I}) = E(S_xX\mid\mathcal{I}) \qquad \text{P-a.s.} \tag{12.2.10}$$
Take X = ξ(A), and recall from Proposition 9.1.XV that there is a version of the conditional expectation, E(ξ(·) | I) ≡ η(·) say, which is again a random measure. Then (12.2.10) asserts that η(S_x A) = η(A) P-a.s. Take A of the form B × K as at (12.2.9), and let B and K run through the members of countable rings generating B(R^d) and B_K, respectively, and x through a countable dense set in R^d. Because only a countable family of null sets is involved, we can assume that (12.2.10) holds simultaneously for all such B, K, x, and for ω outside a single set V with P(V) = 0. For ω ∉ V it now follows from Lemma A2.7.II that
$$\eta(B\times K,\,\omega) = \ell(B)\,\psi(K,\omega)$$
for some kernel ψ on K × Ω. But η(·) was chosen to be an I-measurable random measure, so for each K the left-hand side is an I-measurable r.v. (more precisely, it can be extended to all ω ∈ Ω in such a way as to form such a r.v.). Also, for fixed ω ∉ V, ψ(K, ω) is countably additive, and its extension to V can be constructed so as to retain this property. Thus, ψ(·) is a random measure on K, from Proposition 9.1.VIII. This establishes (12.2.9), and (12.2.8) follows by standard extension arguments.

Applying the definition in this lemma to a stationary MPP leads to an analogue of Proposition 12.1.VI (see Exercise 12.2.10). When X = R^d in Lemma 12.2.III, K reduces to a single point, and the random measure ψ is then an I-measurable random variable $Y = E\bigl(\xi(U^d)\mid\mathcal{I}\bigr)$, where U^d is the unit cube in R^d. Then (12.2.9) becomes the more familiar assertion that
$$E\bigl(\xi(A)\mid\mathcal{I}\bigr) = Y\,\ell(A) \qquad \text{P-a.s.} \tag{12.2.10$'$}$$
We can now state the main theorem of this section. It treats both marked and unmarked processes, and combines simple versions of both the individual and the statistical ergodic theorems. For more extensive results see MKM (1978, Section 6.2), Nguyen and Zessin (1979a), and the further discussion in Sections 13.4–5.
Theorem 12.2.IV. Let the random measure ξ on X = R^d × K for some c.s.m.s. K be stationary with respect to shifts on R^d and have boundedly finite expectation measure ℓ × ν. Let ψ be the invariant random measure defined as in Lemma 12.2.III. Then for any convex averaging sequence {A_n} on R^d and any ν-integrable function h on K,
$$\frac{1}{\ell(A_n)}\int_{\mathcal{K}} h(\kappa)\,\xi(A_n\times d\kappa) \to \int_{\mathcal{K}} h(\kappa)\,\psi(d\kappa) \qquad (n\to\infty) \tag{12.2.11}$$
a.s. and in L_1 norm. If the second moment measure exists and
$$E\biggl[\biggl(\int_{\mathcal{K}} h(\kappa)\,\xi(U^d\times d\kappa)\biggr)^{\!2}\,\biggr] < \infty,$$
then convergence at (12.2.11) also holds in mean square. For an unmarked process (K reduces to a single point) the statements are equivalent to
$$\frac{\xi(A_n)}{\ell(A_n)} \to Y = E\bigl[\xi(U^d)\mid\mathcal{I}\bigr] \qquad (n\to\infty) \tag{12.2.12}$$
a.s. and in L_1 mean, and also in mean square if the second moment measure of ξ exists.

Proof. We give details mainly for the unmarked case, and consider the proof of (12.2.12). For some fixed ε > 0, let g_ε be a continuous function on R^d such that
(i) g_ε(x) ≥ 0 and $\int_{\mathbb{R}^d} g_\varepsilon(x)\,dx = 1$; and
(ii) the support of g_ε(·) is contained in S_ε(0), the ball in R^d with centre at 0 and radius ε.
Now define a function f on M#_X by
$$f(\xi) = \int_{\mathbb{R}^d} g_\varepsilon(y)\,\xi(dy).$$
It is clear that f is measurable and, because ξ has finite expectation measure, f is a P-integrable function with
$$E(f) = m\int_{\mathbb{R}^d} g_\varepsilon(x)\,dx = m.$$
Observe that when {A_n} is a convex averaging sequence, so are the related sequences {A_n^ε} and {A_n^{−ε}} with elements defined by
$$A_n^{-\varepsilon} = \{x\colon S_\varepsilon(x)\subseteq A_n\} \qquad\text{and}\qquad A_n^{\varepsilon} = \bigcup_{x\in A_n} S_\varepsilon(x).$$
Also,
$$f(S_x\xi) = \int_{\mathbb{R}^d} g_\varepsilon(y)\,\xi(x+dy) = \int_{\mathbb{R}^d} g_\varepsilon(u-x)\,\xi(du).$$
This leads to the sandwich relation
$$\int_{A_n^{-\varepsilon}} f(S_x\xi)\,dx = \int_{\mathbb{R}^d}\xi(du)\int_{A_n^{-\varepsilon}} g_\varepsilon(u-x)\,dx \;\le\; \xi(A_n) \;\le\; \int_{\mathbb{R}^d}\xi(du)\int_{A_n^{\varepsilon}} g_\varepsilon(u-x)\,dx = \int_{A_n^{\varepsilon}} f(S_x\xi)\,dx,$$
where the inequalities are consequences of properties (i) and (ii) of g_ε(·), which further imply that
$$\int_{A_n^{-\varepsilon}} g_\varepsilon(u-x)\,dx = 0 \quad (u\notin A_n) \qquad\text{and}\qquad \int_{A_n^{\varepsilon}} g_\varepsilon(u-x)\,dx = 1 \quad (u\in A_n).$$
Invoking Proposition 12.2.II(a), we obtain
$$Y\liminf\frac{\ell(A_n^{-\varepsilon})}{\ell(A_n)} \;\le\; \liminf\frac{\xi(A_n)}{\ell(A_n)} \;\le\; \limsup\frac{\xi(A_n)}{\ell(A_n)} \;\le\; Y\limsup\frac{\ell(A_n^{\varepsilon})}{\ell(A_n)} \qquad \text{P-a.s.}$$
Because r(A_n) → ∞ and A_n is convex, ℓ(A_n^ε)/ℓ(A_n) → 1 and ℓ(A_n^{−ε})/ℓ(A_n) → 1 as n → ∞. This establishes the a.s. assertion in (12.2.12). Also, from Proposition 12.2.II(b) with p = 1, we have, with f as defined above and for n → ∞,
$$E\left|\frac{\int_{A_n^{-\varepsilon}} f(S_x\xi)\,dx}{\ell(A_n^{-\varepsilon})} - Y\right| \to 0 \qquad\text{and}\qquad E\left|\frac{\int_{A_n^{\varepsilon}} f(S_x\xi)\,dx}{\ell(A_n^{\varepsilon})} - Y\right| \to 0.$$
Denoting the first terms in these differences by L_n and U_n, respectively, these equations imply that E|U_n − L_n| → 0 as n → ∞. Furthermore,
$$\frac{\ell(A_n^{-\varepsilon})}{\ell(A_n)}\,L_n \;\le\; \frac{\xi(A_n)}{\ell(A_n)} \;\le\; \frac{\ell(A_n^{\varepsilon})}{\ell(A_n)}\,U_n,$$
where the coefficients of L_n and U_n converge to 1, so E|ξ(A_n)/ℓ(A_n) − Y| → 0 as n → ∞. This establishes the L_1 convergence in (12.2.12). When the second moment measure exists, a similar argument with the L_2 norm replacing the L_1 norm establishes the L_2 convergence.

Turning now to the general marked case, let h(·): K → R be measurable and ν-integrable. Define a function f on M#_X by
$$f(\xi) = \int_{\mathbb{R}^d\times\mathcal{K}} g_\varepsilon(y)\,h(\kappa)\,\xi(dy\times d\kappa),$$
and observe that f(S_x ξ) is the same integral with g_ε(y) replaced by g_ε(y − x). Then form the integrals
$$\int_{A_n^{-\varepsilon}} f(S_x\xi)\,dx \qquad\text{and}\qquad \int_{A_n^{\varepsilon}} f(S_x\xi)\,dx,$$
and invoke the general forms of Proposition 12.2.II and Lemma 12.2.III to assert that
$$\frac{\int_{A_n^{\varepsilon}} f(S_x\xi)\,dx}{\ell(A_n^{\varepsilon})} \to \int_{\mathcal{K}} h(\kappa)\,\psi(d\kappa),$$
with a similar statement holding with A_n^ε replaced by A_n^{−ε}; here ψ(·) is the invariant random measure defined by Lemma 12.2.III. Similar inequalities and arguments now apply as in the unmarked case, and yield (12.2.11) in its a.s., L_1 and L_2 forms.

As simple special cases of (12.2.12) and (12.2.11), the theorem yields the following corollaries.

Corollary 12.2.V. (a) When ξ is stationary and metrically transitive with finite mean density m,
$$\frac{\xi(A_n)}{\ell(A_n)} \to m \qquad \text{a.s. and in } L_1 \text{ norm}. \tag{12.2.13}$$
(b) If {(x_i, κ_i)} is the realization of a stationary ergodic MPP on R^d × K, then with ν as in Theorem 12.2.IV,
$$\frac{1}{\ell(A_n)}\sum_{i\colon x_i\in A_n} h(\kappa_i) \xrightarrow{\text{a.s.}} \int_{\mathcal{K}} h(\kappa)\,\nu(d\kappa). \tag{12.2.14}$$
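Part (b) can be illustrated by simulation. The sketch below is ours, not the text's: it takes a Poisson ground process on R at rate m_g = 3 with i.i.d. exponential(1) marks and h(κ) = κ², so that ν = m_g π and the limit in (12.2.14) is m_g · E[κ²] = 3 · 2 = 6.

```python
import random

rng = random.Random(5)
mg, T = 3.0, 50000.0                   # ground intensity; window (0, T]
total = 0.0                            # running sum of h(kappa_i) over points in (0, T]
t = rng.expovariate(mg)
while t <= T:
    kappa = rng.expovariate(1.0)       # i.i.d. exponential(1) mark at the point t
    total += kappa ** 2                # h(kappa) = kappa**2
    t += rng.expovariate(mg)
avg = total / T                        # (1/len(A_n)) * sum of h(kappa_i)
print(round(avg, 1))                   # close to mg * E[kappa^2] = 6
```

Note that the normalization is by the window length T, not by the number of points; that is why the ground intensity m_g appears in the limit through ν = m_g π.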
For versions of the L_2 norm results, see Exercises 12.2.7–8. Numerous other special cases and corollaries follow from Theorem 12.2.IV, such as the following (see also the exercises to this section).

Proposition 12.2.VI. Under the conditions of Theorem 12.2.IV, for any measurable integrable function h(·) on R^d and with Y as at (12.2.12),
$$\frac{\int_{\mathbb{R}^d} h(y)\,\xi(A_n+y)\,dy}{\ell(A_n)} = \frac{\int_{A_n} dx\int_{\mathbb{R}^d} h(u-x)\,\xi(du)}{\ell(A_n)} \to Y\int_{\mathbb{R}^d} h(y)\,dy \qquad \text{a.s. and in } L_1 \text{ norm}. \tag{12.2.15}$$
In all of these results, the convex averaging sequence {An } can of course be specialized to sequences of balls about the origin or nested hyper-rectangles whose smallest dimension → ∞. Higher-order ergodic theorems, requiring the existence of higher-order moment measures, are discussed in Section 12.6 and reappear in Chapter 13 in connection with higher-order Palm distributions. A different type of extension is outlined briefly in the proposition below. Proposition 12.2.VII (Weighted Averages). Let {an (·)} be a monotonic increasing sequence of nonnegative functions, convex upward, {An } a convex
averaging sequence in R^d, and ξ a stationary random measure on R^d with finite intensity m. Then as n → ∞,
$$\frac{\int_{A_n} a_n(x)\,\xi(dx)}{\int_{A_n} a_n(x)\,dx} \xrightarrow{\text{a.s.}} Y \equiv E\bigl(\xi(U^d)\mid\mathcal{I}\bigr). \tag{12.2.16}$$
Proof. Define an associated random measure ξ′ on R^d × R by ξ′(A × B) = ξ(A) ℓ(B). Then ξ′ is a stationary random measure in R^{d+1}, and the sets
$$A_n' = \{(x,u)\colon x\in A_n,\ 0\le u\le a_n(x)\}$$
are those of a convex averaging sequence in R^{d+1}. Equation (12.2.16) follows by applying Theorem 12.2.IV to ξ′.

Some other classes of weighting functions can be handled by using the more general averaging sequences considered by Tempel'man (1972). In particular, this includes the class a_n(x) = a(x/t_n), where the nonnegative measurable function a(·) has bounded support (e.g., the unit cube) and {t_n} is a sequence of nonnegative reals → ∞; a(·) need not be convex upward. Also, the assumptions of Proposition 12.2.VII can be trivially extended to the case where there exist positive constants b_n such that a_n(x) ≤ b_n a_{n+1}(x).

Ergodic theorems are important in nearly all branches of point process theory, whether in establishing properties of point process models, or in developing estimation and testing procedures in statistics, or in analyzing the behaviour of simulation routines for point process models. We conclude this section with an application to the frequency of occurrence of special configurations of points in a Poisson process. Discussions of particular configurations of points are closely related to one method of introducing Palm probabilities as ergodic limits (see in particular Theorem 13.2.VI). Such results can often be reduced to a direct application of Theorem 12.2.IV itself by introducing a suitable auxiliary random measure. We illustrate the procedure in a case where the expectation in the limit can be evaluated explicitly.
Consider first the configuration consisting of a single point of the process with no neighbours within a distance $a$. The general (estimation) procedure is to take a convex region $A$, which we suppose to be a member of a convex averaging sequence, and count the number of points in $A$ satisfying the required condition. Write this as the sum
\[
Y(A) = \sum_{i\colon x_i \in A} I_B(S_{x_i} N),
\]
where $B = \{N\colon N(\{0\}) = 1 = N(S_a(0))\}$. Evidently, the sum can also be written as the counting process integral
\[
Y(A) = \int_A I_B(S_x N)\, N(dx),
\]
12.2. Ergodic Theorems    203
and can be regarded as the value of a further point process $Y(\cdot)$ if $A$ is allowed to range more generally over bounded sets of $\mathcal{B}(\mathbb{R}^d)$. In fact, $Y(\cdot)$ here is just a dependent thinning of the original process (see Section 11.3). Applying Theorem 12.2.IV to $Y(\cdot)$ yields the result that for increasing $A$,
\[
\frac{Y(A)}{\ell(A)} \to E\Bigl[\int_{U^d} I_B(S_x N)\, N(dx)\Bigr] = \mu p_B, \tag{12.2.17}
\]
where $p_B$ may be regarded as the probability that a given point will be retained in the thinning process: later, we show that $p_B$ can be interpreted as the Palm probability of the event $B$.

In the special case considered here, we can evaluate the expectation by a simple approximation argument using the independence properties of the Poisson process as follows. The probability that there is a point in the small region $(x, x + \delta x)$ and none in the remainder of a ball $S_a(x)$ centred at $x$ is
\[
\mu\,\ell(\delta x)\, \exp\bigl\{-\mu\bigl[\ell(S_a(x)) - \ell(\delta x)\bigr]\bigr\},
\]
and because the process is simple this is also the expected number of such configurations associated with the element $(x, x + \delta x)$. Integration over $U^d$ gives the limit as $\mu e^{-\mu V(a)}$, where $V(a) = \ell(S_a(x))$ is the volume of a sphere of radius $a$. Theorem 12.2.IV here asserts that the average density of points in $A$ that have no neighbours closer than $a$ approaches this value as a limit when $\ell(A) \to \infty$ through a convex averaging sequence. Similarly, the average density of points in $A$ that have at least $k$ neighbours within a distance $a$ approaches the limit
\[
\mu\Bigl(1 - e^{-\mu V(a)}\bigl\{1 + \mu V(a) + \cdots + [\mu V(a)]^{k-1}/(k-1)!\bigr\}\Bigr).
\]
Finally, consider the numbers of pairs of points that lie within a distance $a$ of one another. Taking one point of any such pair as a reference origin, at $x$ say, the number of such pairs to which it belongs is just $N(S_a(x)) - 1$. Summing over all points in $A$ leads to the integral
\[
Y_2(A) = \tfrac{1}{2} \int_A \bigl[N(S_a(x)) - 1\bigr]\, N(dx),
\]
the factor $\frac{1}{2}$ arising (asymptotically when the edge effects from points near the boundary of $A$ become negligible) from the fact that each point of each pair is counted twice. Dividing by $\ell(A)$ and noting that $Y_2(\cdot)$ defines a random measure to which the theorems can be applied, we obtain the quantity $E[Y_2(U^d)]$ as the limiting value of such an average density of pairs; again this can be evaluated by an approximation argument that uses the independence properties of the Poisson process as
\[
E[Y_2(U^d)] = \lim_{\delta x \to 0} \tfrac{1}{2}\, E \int_{U^d} N\bigl(S_a(x) \setminus S_{\delta x}(x)\bigr)\, N(dx)
= \lim_{\delta x \to 0} \tfrac{1}{2} \int_{U^d} \mu\bigl[V(a) - V(\delta x)\bigr]\, \mu\, dx = \tfrac{1}{2} \mu^2 V(a).
\]
Of course, the expected number of pairs is related to the second factorial moment measure of the process, and the general version of the above argument leads to a higher-order ergodic theorem in which the reduced factorial moment measures appear as ergodic limits.
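The two limiting densities just derived, $\mu e^{-\mu V(a)}$ for isolated points and $\frac12 \mu^2 V(a)$ for close pairs, are easy to check by simulation. The sketch below is our own illustration, not part of the text; the window size, test radius, and seed are arbitrary choices. It simulates a planar Poisson process and forms the ergodic averages over a square window, trimming a margin of width $a$ to suppress edge effects.

```python
import numpy as np

rng = np.random.default_rng(42)
mu, a, L = 1.0, 0.5, 40.0        # intensity, test radius, window side (all arbitrary)
V = np.pi * a**2                 # V(a) = ell(S_a(x)), the disc area when d = 2

n = rng.poisson(mu * L * L)                  # number of points in [0, L]^2
pts = rng.uniform(0.0, L, size=(n, 2))

# All pairwise squared distances, with the diagonal masked out.
d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
np.fill_diagonal(d2, np.inf)

# Average density of points with no neighbour within a, computed over the
# inner window [a, L-a]^2 so each ball S_a(x) lies wholly inside the window.
inner = np.all((pts >= a) & (pts <= L - a), axis=1)
isolated = d2.min(axis=1) > a**2
iso_density = np.count_nonzero(inner & isolated) / (L - 2 * a) ** 2

# Average density of close pairs (each unordered pair counted once).
pair_density = np.count_nonzero(np.triu(d2 < a**2)) / L**2

print(iso_density, mu * np.exp(-mu * V))   # limit mu * exp(-mu V(a))
print(pair_density, 0.5 * mu**2 * V)       # limit (1/2) mu^2 V(a)
```

The pair estimate carries a small downward bias from pairs straddling the boundary, exactly the edge effect mentioned above in connection with the factor $\frac12$.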
Exercises and Complements to Section 12.2

12.2.1 Suppose the stationary point process $N(\cdot)$ in $\mathbb{R}^2$ has sample realizations $\{Y + (n_1, 2n_2)\colon n_1, n_2 = 0, \pm 1, \ldots\}$, where $Y$ is uniformly distributed on the rectangle $(0,1] \times (0,2]$. Contrast the a.s. limits of $N(A_n)/\ell(A_n)$ $(n \to \infty)$ when
(1°) $A_n = (0,n] \times (0,1]$;
(2°) $A_n = \bigcup_{j=1}^n (j-1, j] \times (j-1, j]$;
(3°) $A_n = (0,1] \times (0,n]$;
(4°) $A_n = (0,n] \times (0,n]$;
(5°) $A_n = \{(x,y)\colon x \in (0,n],\ 0 \le y \le 2x/n\}$; and
(6°) $A_n = \{(x,y)\colon x \in (0,n],\ x \le y \le x+1\}$.

12.2.2 Show that Theorem 12.2.IV, which in the text is deduced from Proposition 12.2.II, implies the latter in the sense that, if $\xi$ is a stationary random measure on $\mathbb{R}^d$ satisfying the assumptions of Theorem 12.2.IV and (12.2.12) holds, then (12.2.6) holds also. [Hint: For such $\xi$ and $f(\xi(\cdot))$ a functional with finite expectation, define a new random measure $\xi_f$ by
\[
\xi_f(A) = \int_A f\bigl(S_x \xi(\cdot)\bigr)\, dx \qquad (\text{bounded } A \in \mathcal{B}(\mathbb{R}^d)).
\]
Apply Theorem 12.2.IV to $\xi_f$ to deduce Proposition 12.2.II(a) as applied to the original random measure $\xi$.]

12.2.3 Show formally that (a) if $\xi$ has a trivial invariant $\sigma$-field, then the only invariant functions are the constant functions; and (b) metric transitivity implies ergodicity [i.e., (12.2.13) holds].

12.2.4 Let $\mathcal{F}$ be a $\sigma$-algebra in $\mathcal{B}(\mathcal{M}^\#_{\mathcal{X}})$ and $\xi(\cdot)$ a random measure on $\mathcal{X}$ with finite expectation measure.
(a) Use Theorem 9.1.XIV to show that there exists a random measure $\psi$ such that
\[
\psi(A) = E[\xi(A) \mid \mathcal{F}] \quad \text{a.s.} \qquad (\text{bounded } A \in \mathcal{B}_{\mathcal{X}}).
\]
(b) Show that if $\mathcal{F}$ is chosen to be the $\sigma$-algebra of events invariant under shifts $S_x$, then $\psi(\cdot)$ is invariant in the sense that
\[
S_x \psi(A) = \psi(A) \quad \text{a.s.} \qquad (\text{bounded } A \in \mathcal{B}_{\mathcal{X}}).
\]
[Hint: Show first that the indicator function of any event in $\mathcal{F}$ is invariant, and ultimately that any $\mathcal{F}$-measurable r.v. is invariant.]

12.2.5 Interpret the strong law result at Exercise 4.1.1 in the setting of Theorem 12.2.IV.

12.2.6 Establish statistical ergodic theorem versions of the individual ergodic theorems at Theorem 12.2.IV, Corollaries 12.2.V–VI and Proposition 12.2.VII. [Hint: Use Proposition 12.2.II(b).]

12.2.7 $L^2$ convergence of a stationary MPP in $\mathbb{R}^d$. Show that if the stationary ergodic MPP $\xi$ in $\mathbb{R}^d$ satisfies the conditions of Theorem 12.2.IV, including the existence of finite second moments, then for any convex averaging sequence $\{A_n\}$,
\[
\lim_{n \to \infty} E\biggl[\Bigl(\frac{\xi(A_n \times K)}{\ell(A_n)} - \psi(K)\Bigr)^2\biggr] = 0,
\]
where $\psi(\cdot)$ is as in Lemma 12.2.III.
12.2.8 Let the simple point process $N$ on $\mathbb{R}$ have stationary second-order distributions and finite second-order moment. Then the expectation function $U(\cdot)$ of (3.5.2) satisfies $U(x)/x \to \bar\lambda$ $(x \to \infty)$ for some $\bar\lambda \ge \lambda$ [Exercise 8.1.3(e) or Lemma 9 in Daley (1971)]. Examine the implications of Proposition 12.2.II for an $L^2$ norm result.

12.2.9 (a) Use Theorem 12.2.IV, the inversion result at (8.6.8), and the identification of the Bartlett spectrum $\Gamma(\cdot)$ as in Definition 8.2.II to show that for a nonergodic second-order stationary random measure $\xi$ on $\mathbb{R}^d$, $\Gamma(\{0\}) = \operatorname{var} Y$, where $Y$ is the $\mathcal{I}$-measurable r.v. as in (12.2.10) and (12.2.12).
(b) For a second-order stationary random measure $\xi$ on $\mathbb{R}$ and with $V(x) = \operatorname{var} \xi(0,x]$, recall from Exercise 8.1.3(b) that $\lim_{x\to\infty} x^{-2} V(x)$ exists and is finite. Defining $v(s) = \int_0^\infty e^{-sx}\, dV(x)$, use an Abelian theorem for Laplace–Stieltjes transforms [e.g., Widder (1941, p. 181)] to show that $\lim_{s\to 0} s^2 v(s) = \lim_{x\to\infty} 2x^{-2} V(x)$. Then show from (8.2.3) that
\[
\tfrac{1}{2} s^2 v(s) = \Gamma(\{0\}) + \int_{\mathbb{R}\setminus\{0\}} \frac{\Gamma(d\omega)}{1 + \omega^2/s^2},
\]
and conclude that $\operatorname{var} \xi(0,x] \sim x^2\, \Gamma(\{0\})$ $(x \to \infty)$.

12.2.10 Let $N$ be a marked point process on $\mathbb{R}^d \times \mathcal{K}$, stationary as in Lemma 12.2.III. Extend Proposition 12.1.VI to the statement that for each $K \in \mathcal{B}(\mathcal{K})$,
\[
P\{N(\mathbb{R}^d \times K) = \infty\} + P\{N(\mathbb{R}^d \times K) = 0\} = 1.
\]

12.2.11 A stationary nonisotropic point process in $\mathbb{R}^2$. For $i \in \mathbb{Z}$ let $\{N_i(\cdot)\}$ be independent copies of a simple stationary point process on $\mathbb{R}$ for which $\operatorname{var} N_i(0,1] < \infty$ and with generic realization $\{x_{ij}\colon j \in \mathbb{Z}\}$ at rate $\lambda_1$; let $\{y_i\colon i \in \mathbb{Z}\}$ be a realization of some other simple stationary point process $N^0(\cdot)$ on $\mathbb{R}$ at rate $\lambda_0$ and with finite second moment measure, independent of the $N_i$. Write $N(\cdot)$ for the counting measure of the point process in $\mathbb{R}^2$ with realizations $\{(x_{ij}, y_i)\colon i, j \in \mathbb{Z}\}$.
(a) Show that $N$ is a stationary point process in $\mathbb{R}^2$.
(b) Using the convex averaging sets $\{A^1_n\}$ and $\{A^2_n\}$ specified by $A^1_n = (0,n] \times (0,n^2]$ and $A^2_n = (0,n^2] \times (0,n]$, show that both $N(A^1_n)/\ell(A^1_n)$ and $N(A^2_n)/\ell(A^2_n)$ converge a.s. to $E[N(A^1_1)] = \lambda_0\lambda_1$.
(c) Use the representation $N((0,x] \times (0,y]) = \sum_{i=1}^{N^0(0,y]} N_i(0,x]$ for the rectangle $(0,x] \times (0,y]$ to show that $\operatorname{var} N((0,x] \times (0,y])$ equals
\[
\lambda_0 y\, \lambda_1 \int_0^x [1 + 2U(u)]\, du + \lambda_0 \lambda_1^2 x^2 \int_0^y 2[U^0(v) - \lambda_0 v]\, dv,
\]
where $U(\cdot)$ and $U^0(\cdot)$ are the expectation functions for the $N_i$ and $N^0$, respectively. Deduce that, when $\operatorname{var} N_1(0,x] = O(x)$ and $\operatorname{var} N^0(0,y] = O(y)$ for $x, y \to \infty$, and for the same convex averaging sets as in (b), $\operatorname{var} N(A^1_n) = O(n^4)$ but $\operatorname{var} N(A^2_n) = O(n^5)$, while $\ell(A^1_n) = n^3 = \ell(A^2_n)$.
(d) Suppose that both $N_i(\cdot)$ and $N^0(\cdot)$ have covariance density functions, $c(\cdot)$ and $c^0(\cdot)$ say. Show that
\[
\Pr\{N((x, x+dx) \times (y, y+dy)) > 0 \mid N(\{(0,0)\}) = 1\} =
\begin{cases}
\lambda[c(x) + \lambda]\, dx & \text{if } y = 0,\\
\lambda\lambda_0 [c^0(y) + \lambda_0]\, dx\, dy & \text{if } y \ne 0.
\end{cases}
\]
Does $N$ have a reduced covariance density, $c((x,y))$ say?
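Exercise 12.2.1 repays a numerical experiment. The sketch below is our own illustration, not part of the text; the fixed draw $Y = (0.3, 1.7)$ is an arbitrary choice. It counts the lattice points exactly and evaluates three of the averaging sequences, showing that case (1°) produces the random limit $\mathbf{1}\{Y_2 \in (0,1]\}$ while cases (3°) and (4°) both produce the intensity $\tfrac12$.

```python
import math

def count(nx, ny, Y):
    """Exact N((0,nx] x (0,ny]) for the lattice realization {Y + (n1, 2*n2)}."""
    y1, y2 = Y
    # points Y1 + n1 in (0, nx]: n1 = 0, ..., nx-1, since 0 < Y1 <= 1
    cols = nx
    # points Y2 + 2*n2 in (0, ny]: n2 = 0, ..., floor((ny - y2)/2), if any
    rows = max(0, math.floor((ny - y2) / 2) + 1)
    return cols * rows

Y = (0.3, 1.7)          # one fixed draw of Y, uniform on (0,1] x (0,2]
n = 10_000

r1 = count(n, 1, Y) / (n * 1)      # case (1 deg): A_n = (0,n] x (0,1]
r3 = count(1, n, Y) / (1 * n)      # case (3 deg): A_n = (0,1] x (0,n]
r4 = count(n, n, Y) / (n * n)      # case (4 deg): A_n = (0,n] x (0,n]

print(r1, r3, r4)   # 0.0 (since Y2 > 1), then 0.5 and 0.5
```

The thin horizontal strips of case (1°) are convex but their inradius stays bounded, which is why they escape the ergodic theorem and leave a realization-dependent limit.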
12.3. Mixing Conditions

In practice, the useful applications of the ergodic theorem are to those situations where the ergodic limit is constant or, in other words, where the process is metrically transitive (the invariant σ-algebra is trivial). It is therefore important to characterize as fully as possible the various classes of processes that have this property. Now the absence of nontrivial invariant events is closely related to the absence of long-term dependence, and thus, checking for metric transitivity is generally accomplished by verifying that some kind of asymptotic independence or mixing condition is satisfied. This section contains a review of such conditions and outlines some of their applications.

As in the previous section we suppose that either $\mathcal{X} = \mathbb{R}^d$ or $\mathcal{X} = \mathbb{R}^d \times \mathcal{K}$, and write $S_x$ for the operator on $\mathcal{M}^\#_{\mathcal{X}}$ defined by shifts as in (12.1.2) or, for $\mathcal{X} = \mathbb{R}^d \times \mathcal{K}$, by shifts in the first coordinate $S_x \xi(\cdot, K) = \xi(\cdot + x, K)$. We also write $U^d_{2a}$ for the hypercube in $\mathbb{R}^d$ with sides of length $2a$ and vertices $(\pm a, \ldots, \pm a)$, and $\mathcal{P}$ for the probability measure of a random measure on $\mathcal{M}^\#_{\mathcal{X}}$ or a point process on $\mathcal{N}^\#_{\mathcal{X}}$.

Definition 12.3.I. A stationary random measure (respectively, point process) on state space $\mathcal{X} = \mathbb{R}^d$ or $\mathbb{R}^d \times \mathcal{K}$ is
(i) ergodic if, for all $V$, $W$ in $\mathcal{B}(\mathcal{M}^\#_{\mathcal{X}})$ [respectively, $\mathcal{B}(\mathcal{N}^\#_{\mathcal{X}})$],
\[
\frac{1}{\ell(U^d_{2a})} \int_{U^d_{2a}} \bigl[\mathcal{P}(S_x V \cap W) - \mathcal{P}(V)\mathcal{P}(W)\bigr]\, dx \to 0 \qquad (a \to \infty); \tag{12.3.1}
\]
(ii) weakly mixing if for all such $V$, $W$,
\[
\frac{1}{\ell(U^d_{2a})} \int_{U^d_{2a}} \bigl|\mathcal{P}(S_x V \cap W) - \mathcal{P}(V)\mathcal{P}(W)\bigr|\, dx \to 0 \qquad (a \to \infty); \tag{12.3.2}
\]
(iii) mixing if for all such $V$, $W$,
\[
\mathcal{P}(S_x V \cap W) - \mathcal{P}(V)\mathcal{P}(W) \to 0 \qquad (x \to \infty); \tag{12.3.3}
\]
(iv) ψ-mixing (on $\mathbb{R}^1$) if for $u > 0$, $t \in \mathbb{R}$, and a function $\psi(u)$ with $\psi(u) \downarrow 0$ as $u \to \infty$,
\[
|\mathcal{P}(V \cap W) - \mathcal{P}(V)\mathcal{P}(W)| \le \psi(u) \tag{12.3.4}
\]
whenever $V \in \sigma\{\xi(A)\colon A \subseteq (-\infty, t]\}$ and $W \in \sigma\{\xi(B)\colon B \subseteq (t+u, \infty)\}$.
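The mixing condition (12.3.3) can be probed numerically for simple events. The sketch below is our own illustration, not part of the text; the rates, shift, and sample size are arbitrary. With $V = W = \{$no point in an interval of length $\tfrac12\}$ and shift $x = 3$, it estimates $\mathcal{P}(S_x V \cap W) - \mathcal{P}(V)\mathcal{P}(W)$ for a rate-1 Poisson process, where counts on disjoint sets are independent and the difference vanishes, and for a stationary unit-lattice process (cf. Exercise 12.1.7), where the difference stays near $\tfrac14$ at integer shifts, so that process cannot be mixing.

```python
import numpy as np

rng = np.random.default_rng(1)
nsim, x = 20_000, 3.0

# Rate-1 Poisson process on (0, x + 1/2].
A = np.zeros(nsim, dtype=bool)   # W     = {N(0, 1/2] = 0}
B = np.zeros(nsim, dtype=bool)   # S_x V = {N(x, x + 1/2] = 0}
for i in range(nsim):
    pts = rng.uniform(0.0, x + 0.5, rng.poisson(x + 0.5))
    A[i] = not np.any(pts <= 0.5)
    B[i] = not np.any((pts > x) & (pts <= x + 0.5))
pois_diff = np.mean(A & B) - np.mean(A) * np.mean(B)

# Stationary unit lattice: points at U + n, U uniform on (0, 1].
# N(0, 1/2] = 0 exactly when U > 1/2, and at an integer shift the
# event S_3 V = {N(3, 3.5] = 0} is the SAME event.
U = rng.uniform(0.0, 1.0, nsim)
A = U > 0.5
B = U > 0.5
lat_diff = np.mean(A & B) - np.mean(A) * np.mean(B)

print(pois_diff, lat_diff)   # near 0 for Poisson, near 0.25 for the lattice
```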
The conditions are written in order of increasing strength: it is clear that mixing implies weak mixing, which in turn implies ergodicity. Furthermore, any completely random measure, such as the Poisson process, clearly satisfies all four conditions. The first three conditions apply to point processes and random measures generally; the fourth is introduced specifically to illustrate the central limit theorem at Proposition 12.3.X. Before examining the conditions in more detail, we show that in general it is enough to check the properties on any semiring of events generating the Borel sets in $\mathcal{M}^\#_{\mathcal{X}}$: replacing $\mathcal{M}^\#_{\mathcal{X}}$ by $\mathcal{N}^\#_{\mathcal{X}}$ throughout leads to the same statement for point processes.

Lemma 12.3.II. For a stationary random measure the limits in (12.3.1–4) hold for all $V$, $W \in \mathcal{B}(\mathcal{M}^\#_{\mathcal{X}})$ if and only if they hold for $V$, $W$ in a semiring $\mathcal{S}$ generating $\mathcal{B}(\mathcal{M}^\#_{\mathcal{X}})$.

Proof. We establish the truth of the assertion for (12.3.3) (mixing); the other cases are proved similarly. Let $\mathcal{F} \subseteq \mathcal{B}(\mathcal{M}^\#_{\mathcal{X}})$ denote the class of sets for which (12.3.3) holds. It is clear that if (12.3.3) holds for finite families of disjoint sets $V_1, \ldots, V_j$ and $W_1, \ldots, W_k$, then it holds also for $V = \bigcup_{i=1}^j V_i$ and $W = \bigcup_{i=1}^k W_i$. So, if (12.3.3) holds for sets in a semiring $\mathcal{S}$, it holds for sets in the ring $\mathcal{R}$ generated by $\mathcal{S}$.

Suppose that $W \in \mathcal{F}$ and $V_n \in \mathcal{F}$ for $n = 1, 2, \ldots$ with $V_n \uparrow V$. Now
\[
\bigl|\mathcal{P}(S_x V \cap W) - \mathcal{P}(V)\mathcal{P}(W)\bigr|
\le \bigl|\mathcal{P}(S_x V \cap W) - \mathcal{P}(S_x V_n \cap W)\bigr| + \bigl|\mathcal{P}(S_x V_n \cap W) - \mathcal{P}(V_n)\mathcal{P}(W)\bigr| + \bigl|\mathcal{P}(V_n)\mathcal{P}(W) - \mathcal{P}(V)\mathcal{P}(W)\bigr|,
\]
in which the first term on the right-hand side is bounded above by
\[
\mathcal{P}\bigl(S_x(V \setminus V_n) \cap W\bigr) \le \mathcal{P}\bigl(S_x(V \setminus V_n)\bigr) = \mathcal{P}(V \setminus V_n),
\]
and the last term equals $|\mathcal{P}(V_n) - \mathcal{P}(V)|\,\mathcal{P}(W)$. By the continuity of $\mathcal{P}(\cdot)$, both these terms $\to 0$ as $n \to \infty$, so, given $\varepsilon > 0$, we can fix $n$ large enough that each term is $< \varepsilon$, uniformly in $x$. For the middle term, having fixed $n$, (12.3.3) holds for the pair $V_n$, $W$, so for $x >$ some $x_0$, this term is $< \varepsilon$ also. Thus,
\[
\bigl|\mathcal{P}(S_x V \cap W) - \mathcal{P}(V)\mathcal{P}(W)\bigr| < 3\varepsilon \qquad (\text{all } x > x_0).
\]
Similarly, we may also replace $W$ by a sequence $\{W_n\} \subseteq \mathcal{R}$ with $W_n \to W$, showing that $\mathcal{F}$ is closed under monotone limits. Thus, $\mathcal{F}$ is a monotone class which, because it includes $\mathcal{R}$, includes $\sigma\{\mathcal{R}\} = \mathcal{B}(\mathcal{M}^\#_{\mathcal{X}})$.

Our aim now is to establish links with the theorems of the previous section. The next proposition establishes the equivalence of metric transitivity (trivial invariant σ-algebra) with the ergodicity condition at (i) of Definition 12.3.I above. It implies that in talking of an ergodic point process we may use the two criteria indifferently.
Proposition 12.3.III. A stationary random measure or point process is ergodic if and only if it is metrically transitive; that is, the invariant σ-algebra $\mathcal{I}$ is trivial.

Proof. Let $\xi$ be ergodic as at (12.3.1) above and let $A$ be an invariant event. Putting $V = W = A$ in (12.3.1), observe from invariance that $\mathcal{P}(S_x A \cap A) = \mathcal{P}(A)$ and hence, using ergodicity, that
\[
\frac{1}{\ell(U^d_{2a})} \int_{U^d_{2a}} \bigl\{\mathcal{P}(A) - [\mathcal{P}(A)]^2\bigr\}\, dx \to 0 \qquad (a \to \infty),
\]
which is possible only if $\mathcal{P}(A) = 0$ or $1$.

Conversely, suppose that $\mathcal{I}$ is trivial, so that (12.2.6) takes the form, in the notation as there,
\[
\frac{1}{\ell(A_n)} \int_{A_n} f(S_x \xi)\, dx \to E[f(\xi)] \quad \text{a.s.}
\]
Let $V$, $W$ be as in Definition 12.3.I and take $f(\xi) = I_V(\xi)$, so that $E f(\xi) = \mathcal{P}(V)$ and (12.2.6) yields
\[
\frac{1}{\ell(A_n)} \int_{A_n} I_V(S_x \xi)\, dx \to \mathcal{P}(V) \quad \text{a.s.}
\]
Writing $g_n(\xi)$ for the left-hand side of this equation, observe that $0 \le g_n(\xi) \le 1$ and that $g_n(\xi)$ is a measurable function of $\xi$ in $\bigl(\mathcal{M}^\#_{\mathcal{X}}, \mathcal{B}(\mathcal{M}^\#_{\mathcal{X}})\bigr)$. Integrating over $W$ and using dominated convergence and Fubini's theorem, we obtain
\[
\int_W \frac{1}{\ell(A_n)} \int_{A_n} I_V(S_x \xi)\, dx\; \mathcal{P}(d\xi) \to \mathcal{P}(V)\mathcal{P}(W),
\]
which reduces to (12.3.1) in the special case $A_n = U^d_{2n}$.

Just as ergodicity is related to the invariant σ-algebra $\mathcal{I}$ being trivial, so mixing is related to the σ-algebra of tail events defined on the process $\xi$ being trivial (this σ-algebra being, in general, larger than $\mathcal{I}$). To define these events, denote by $\mathcal{T}_a$, for each $a > 0$, the σ-algebra of events defined by the behaviour of $\xi$ outside $U^d_{2a}$; that is, $\mathcal{T}_a$ is the smallest σ-algebra in $\mathcal{B}(\mathcal{M}^\#_{\mathcal{X}})$ with respect to which the $\xi(A)$ are measurable for $A \in \mathcal{B}(\mathcal{X} \setminus U^d_{2a})$.

Definition 12.3.IV. The tail σ-algebra of the process $\xi$ is the intersection $\mathcal{T}_\infty \equiv \bigcap_{a>0} \mathcal{T}_a = \bigcap_{n=1}^\infty \mathcal{T}_n$. An element of $\mathcal{T}_\infty$ is a tail event.

Thus, $\mathcal{T}_\infty$ defines the class of events that are determined by the behaviour of $\xi$ outside any bounded subset of $\mathcal{X}$. It is not difficult to show that, modulo sets of $\mathcal{P}$-measure zero, any invariant event is in the tail σ-algebra (see Exercise 12.3.2).
The converse is not true, however: periodic processes provide typical examples of processes that are ergodic but for which the tail σ-algebra is non-trivial (see Exercise 12.3.1).
The triviality result referred to above is set out below. Triviality of the tail σ-algebra is also closely related to the concept of short-range correlation; see Exercise 12.3.4 for a definition and details of the relationship. Note that the term short-range correlation is well established in the physics literature, although it relates to stochastic dependence rather than any second-order product-moment property; paradoxically, the terms long- and short-range dependence (see Section 12.7 for the point process setting) are established in the statistical literature as pertaining to a second-order or correlational (!) property.

Proposition 12.3.V. If the tail σ-algebra is trivial, then the random measure $\xi$ is mixing.

Proof. Let $V$ be any set in $\mathcal{B}(\mathcal{M}^\#_{\mathcal{X}})$. Because $\mathcal{T}_n \downarrow \mathcal{T}_\infty$, we have for any random variable with expectation, and hence in particular for the indicator function $I_V(\cdot)$,
\[
E(I_V \mid \mathcal{T}_n) \to E(I_V \mid \mathcal{T}_\infty) \quad \text{a.s.}
\]
[this is a standard result for backward martingales; see, e.g., Chung (1974, Theorem 9.4.7)]. When $\mathcal{T}_\infty$ is trivial, the right-hand side here reduces to $E(I_V) = \mathcal{P}(V)$ a.s. Consequently, given $\varepsilon > 0$, we can choose $n_0$ such that for $n \ge n_0$,
\[
\bigl|E(I_V \mid \mathcal{T}_n) - \mathcal{P}(V)\bigr| < \varepsilon \quad \text{a.s.} \tag{12.3.5}
\]
Now let $W$ be a cylinder set belonging to the σ-algebra generated by the family $\{\xi(A)\colon A \in \mathcal{B}(U^d_{2a})\}$. For $x$ sufficiently large, namely $x > d^{1/2} n$, $T_x U^d_{2a}$ lies in the complement of $U^d_{2n}$ and hence $S_x W \in \mathcal{T}_n$. For fixed $n$ and $x$ large enough, the indicator function $I_{S_x W}$ is therefore $\mathcal{T}_n$-measurable and so satisfies
\[
E(I_{S_x W} I_V \mid \mathcal{T}_n) = I_{S_x W}\, E(I_V \mid \mathcal{T}_n) \quad \text{a.s.}
\]
Taking expectations and using (12.3.5), we obtain
\[
\mathcal{P}(S_x W \cap V) = \mathcal{P}(S_x W)\mathcal{P}(V) + E(I_{S_x W} Y),
\]
where the r.v. $Y$ has $|Y| < \varepsilon$ a.s. This establishes the mixing property for arbitrary $V$ and any cylinder set $W$. Because the cylinder sets generate $\mathcal{B}(\mathcal{M}^\#_{\mathcal{X}})$, the proposition follows from Lemma 12.3.II.
Mixing is defined above in terms of probabilities of events; the conditions can equally be stated in terms of expectations of random variables defined on the process: see Exercises 12.3.5–6 for details in the context of convergence to equilibrium, which topic is discussed more fully in Section 12.5. We already observed in the proof of the last proposition that it is enough to verify the mixing or weak mixing or ergodicity conditions for cylinder sets, that is, for sets of the type that occur in the definition of the fidi distributions. Although this may sometimes be convenient, it is generally easier to check
the conditions in a form that relates to the generating functionals rather than directly to the fidi distributions. The next proposition provides such conditions: the discussion here, as in the applications that follow, is based on Westcott (1972). In the proposition below we use the shift operator $S_x$ defined on functions $h(\cdot)$ by $(S_x h)(y) = h(y + x)$ [compare with (12.1.1)].

Proposition 12.3.VI. (a) Let $\xi$ be a random measure, $L[\cdot]$ its Laplace functional, and $h_1$, $h_2$ functions in $\mathrm{BM}_+(\mathcal{X})$.
(i) $\xi$ is ergodic if and only if for all such $h_1$, $h_2$,
\[
\frac{1}{\ell(U^d_{2n})} \int_{U^d_{2n}} \bigl\{L[h_1 + S_x h_2] - L[h_1]L[h_2]\bigr\}\, dx \to 0 \qquad (n \to \infty). \tag{12.3.6}
\]
(ii) $\xi$ is weakly mixing if and only if for all such $h_1$, $h_2$,
\[
\frac{1}{\ell(U^d_{2n})} \int_{U^d_{2n}} \bigl|L[h_1 + S_x h_2] - L[h_1]L[h_2]\bigr|\, dx \to 0 \qquad (n \to \infty). \tag{12.3.7}
\]
(iii) $\xi$ is mixing if and only if for all such $h_1$, $h_2$,
\[
L[h_1 + S_x h_2] \to L[h_1]L[h_2] \qquad (x \to \infty). \tag{12.3.8}
\]
(b) Let $N$ be a point process, $G[\cdot]$ its p.g.fl., and $h_1$, $h_2$ functions in $\mathcal{V}(\mathcal{X})$.
(i) $N$ is ergodic if and only if for all such $h_1$, $h_2$,
\[
\frac{1}{\ell(U^d_{2n})} \int_{U^d_{2n}} \bigl\{G[h_1\, S_x h_2] - G[h_1]G[h_2]\bigr\}\, dx \to 0 \qquad (n \to \infty). \tag{12.3.9}
\]
(ii) $N$ is weakly mixing if and only if for all such $h_1$, $h_2$,
\[
\frac{1}{\ell(U^d_{2n})} \int_{U^d_{2n}} \bigl|G[h_1\, S_x h_2] - G[h_1]G[h_2]\bigr|\, dx \to 0 \qquad (n \to \infty). \tag{12.3.10}
\]
(iii) $N$ is mixing if and only if for all such $h_1$, $h_2$,
\[
G[h_1\, S_x h_2] \to G[h_1]G[h_2] \qquad (x \to \infty). \tag{12.3.11}
\]

Proof. (a) Let $\{A_j\colon j = 1, \ldots, J\}$ and $\{B_k\colon k = 1, \ldots, K\}$ be bounded sets in $\mathcal{X}$, and consider the family of random variables $\{\xi(A_j)\} \cup \{\xi(T_x B_k)\}$. If $\xi$ is mixing then (12.3.3) implies that the joint distribution of this family converges to the product of the two joint distributions of the families $\{\xi(A_j)\}$ and $\{\xi(B_k)\}$. It follows that the multivariate Laplace transforms of these joint distributions satisfy, for real nonnegative $\{\alpha_j\}$ and $\{\beta_k\}$,
\[
E\exp\Bigl(-\sum_{j=1}^J \alpha_j \xi(A_j) - \sum_{k=1}^K \beta_k \xi(T_x B_k)\Bigr)
\to E\exp\Bigl(-\sum_{j=1}^J \alpha_j \xi(A_j)\Bigr)\; E\exp\Bigl(-\sum_{k=1}^K \beta_k \xi(B_k)\Bigr).
\]
But this is just the statement (12.3.8) for the special case that the $h_i$ are the simple functions
\[
h_1(x) = \sum_{j=1}^J \alpha_j I_{A_j}(x), \qquad h_2(x) = \sum_{k=1}^K \beta_k I_{B_k}(x). \tag{12.3.12}
\]
Now any $h \in \mathrm{BM}_+(\mathcal{X})$ can be monotonically and uniformly approximated by simple functions of this form, so an argument similar to that of Lemma 12.3.II shows that (12.3.8) holds as stated.

Conversely, when (12.3.8) holds, take $h_1$, $h_2$ to be simple functions as at (12.3.12). Then it follows from the continuity theorem for Laplace transforms that the joint distributions of $\{\xi(A_j)\}$ and $\{\xi(T_x B_k)\}$ converge for all families $\{A_j\}$ and $\{B_k\}$, and hence that (12.3.3) holds for the corresponding cylinder sets, that is, a semiring generating $\mathcal{B}(\mathcal{M}^\#_{\mathcal{X}})$. Then by Lemma 12.3.II, (12.3.3) holds generally and so $\xi$ is mixing. Analogous statements hold in the other cases; we omit the details.

It is important for our proof below of Proposition 12.3.IX to observe that the convergence properties in equations (12.3.6–11) hold for wider classes of functions than those with bounded support. For example, when each $h_i$ is the monotone limit of a sequence $\{h_{in}\} \subset \mathrm{BM}_+(\mathcal{X})$, (12.3.8) holds provided we interpret the functionals as extended Laplace functionals [see (9.4.11)]. To see this, recall that any function in $\mathrm{BM}_+(\mathcal{X})$ is the monotone limit of simple functions, and first assume that the functions $h_1$ and $\{h_{2n}\}$ are simple. By the argument leading to (12.3.12) we then have, when $\xi$ is stationary and mixing, $L[h_1 + S_x h_{2n}] \to L[h_1]L[h_{2n}]$ as $x \to \infty$. Thus,
\[
0 \le \bigl|L[h_1]L[h_2] - L[h_1 + S_x h_2]\bigr|
\le \bigl|L[h_1]\bigl(L[h_2] - L[h_{2n}]\bigr)\bigr| + \bigl|L[h_1]L[h_{2n}] - L[h_1 + S_x h_{2n}]\bigr| + \bigl|L[h_1 + S_x h_{2n}] - L[h_1 + S_x h_2]\bigr|
\equiv \delta_{1n} + \delta_{2n}(x) + \delta_{3n}(x), \quad \text{say.}
\]
Now
\[
\delta_{3n}(x) = \Bigl|E\Bigl[\exp\Bigl(-\int h_1\, d\xi\Bigr)\Bigl(\exp\Bigl(-\int S_x h_{2n}\, d\xi\Bigr) - \exp\Bigl(-\int S_x h_2\, d\xi\Bigr)\Bigr)\Bigr]\Bigr|
\le \bigl|L[S_x h_{2n}] - L[S_x h_2]\bigr| \quad \text{(by nonnegativity and monotonicity)}
= \bigl|L[h_{2n}] - L[h_2]\bigr| \quad \text{(by stationarity)},
\]
and by monotone convergence, $L[h_{2n}] \downarrow L[h_2]$ as $n \to \infty$. Similarly, $\delta_{1n} \le |L[h_{2n}] - L[h_2]|$, and so, given $\varepsilon > 0$, we can make both $\delta_{1n} < \varepsilon$ and $\delta_{3n}(x) < \varepsilon$, uniformly in $x$, by choosing $n$ sufficiently large. Fixing such $n$, (12.3.12) now implies that we can make $\delta_{2n}(x) < \varepsilon$ by taking $x$ sufficiently large. Thus, (12.3.8) holds for simple $h_1 \in \mathrm{BM}_+(\mathcal{X})$ and general $h_2 \in \mathrm{BM}_+(\mathcal{X})$, and a similar argument establishes it for general $h_1 \in \mathrm{BM}_+(\mathcal{X})$ as well.
Extensions of the rest of (12.3.6–11) are established in a similar manner. It is also pertinent to note that in part (b), it is enough to restrict the functions $h_i$ to the subspace $\mathcal{V}_0(\mathcal{X}) \subset \mathcal{V}(\mathcal{X})$. This follows directly from part (a) and the remark following (9.4.14).

Using these results, we can investigate the mixing and ergodicity properties of some classes of point processes, namely Cox and cluster processes in Propositions 12.3.VII–IX, and interval properties (e.g., renewal processes) in Exercise 12.4.1 and Section 13.4.

Proposition 12.3.VII. A stationary Cox process $N$ on $\mathcal{X} = \mathbb{R}^d$ is mixing, weakly mixing, or ergodic if and only if the random measure $\Lambda$ directing $N$ has the same property.

Proof. Recall from Proposition 6.2.II that the p.g.fl. $G[\cdot]$ of a Cox process $N$ is related to the Laplace functional $L[\cdot]$ of the random measure $\Lambda$ directing $N$ by $G[h] = L[1-h]$ for $h \in \mathcal{V}(\mathcal{X})$. To verify the mixing property we start from the relation
\[
G[h_1\, S_x h_2] = L[1 - h_1 S_x h_2] = L\bigl[1 - h_1 + S_x(1 - h_2) - (1 - h_1)\, S_x(1 - h_2)\bigr].
\]
Because each $1 - h_i$ $(i = 1, 2)$ has bounded support, the last term vanishes for sufficiently large $x$, and for such $x$, appealing to (12.3.8) for $x \to \infty$,
\[
G[h_1\, S_x h_2] = L\bigl[(1 - h_1) + S_x(1 - h_2)\bigr] \to L[1 - h_1]\, L[1 - h_2] = G[h_1]\, G[h_2].
\]
This argument is reversible and proves the result concerning the mixing property. Proofs for the weakly mixing and ergodicity properties are similar.

Corollary 12.3.VIII. A stationary mixed Poisson process $N$ is mixing if and only if it is a simple Poisson process.

Proof. From Example 9.4(d) and Exercise 12.1.4, the directing measure must be a random multiple $\Lambda$ of Lebesgue measure, so for $h \in \mathcal{V}(\mathbb{R}^d)$ the p.g.fl. $G[h]$ of $N$ equals
\[
\phi_\Lambda\Bigl(\int_{\mathbb{R}^d} [1 - h(x)]\, dx\Bigr),
\]
where $\phi_\Lambda(\cdot)$ denotes the Laplace–Stieltjes transform of $\Lambda$. Now as $x \to \infty$,
\[
L\bigl[(1 - h_1) + S_x(1 - h_2)\bigr] \to \phi_\Lambda\Bigl(\int_{\mathbb{R}^d} [1 - h_1(x)]\, dx + \int_{\mathbb{R}^d} [1 - h_2(x)]\, dx\Bigr),
\]
which can equal the product of the $\phi_\Lambda\bigl(\int_{\mathbb{R}^d} [1 - h_i(x)]\, dx\bigr)$ for all $h_i \in \mathcal{V}(\mathbb{R}^d)$ if and only if $\phi_\Lambda$ is an exponential function, and hence the distribution of $\Lambda$ is concentrated at a single point.

By a stationary cluster process $N$ on $\mathbb{R}^d$, we mean a process as in Exercise 12.1.6 (or Proposition 12.1.V for a stationary Poisson cluster process).
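The negative half of Corollary 12.3.VIII is visible numerically: for a genuinely mixed Poisson process the empirical rate $N(0,n]/n$ of each realization converges to that realization's value of $\Lambda$, not to $E\Lambda$, so the ergodic limit is a nondegenerate random variable. A sketch of our own (the two-point distribution for $\Lambda$ and the sample sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)
n, nreal = 10_000, 8                       # observation window (0, n], realizations

lam = rng.choice([1.0, 3.0], size=nreal)   # directing r.v. Lambda, E[Lambda] = 2
counts = rng.poisson(lam * n)              # N(0, n] given Lambda
rates = counts / n                         # empirical rates N(0, n]/n

print(rates)   # each entry is near 1 or near 3, never near E[Lambda] = 2
```

Averaging across realizations recovers $E\Lambda = 2$, but no single realization does, which is exactly the failure of metric transitivity.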
Proposition 12.3.IX. A stationary cluster process is mixing, weakly mixing, or ergodic, whenever the cluster centre process has the same property.

Proof. We give details for the mixing case only; the other cases can be treated in a similar fashion. Also, although we generally follow the p.g.fl. proof of Westcott (1971) [for an alternative proof see MKM (1978, Proposition 11.1.4)], some further argument is needed as in Daley and Vere-Jones (1987). In particular, we use the idea of extended p.g.fl.s and use Proposition 12.3.VI(b) with functions $h_i \in \mathcal{V}_0(\mathcal{X})$ (see the last remark following that proposition). Then in view of (12.1.18) and (12.3.11) with $h_i \in \mathcal{V}_0(\mathcal{X})$, it is enough to deduce from $G_c[h_1 S_x h_2] \to G_c[h_1]\, G_c[h_2]$ as $x \to \infty$ that
\[
G_c\bigl[G_m[h_1 S_x h_2 \mid \cdot\,]\bigr] \to G_c\bigl[G_m[h_1 \mid \cdot\,]\bigr]\, G_c\bigl[G_m[h_2 \mid \cdot\,]\bigr]. \tag{12.3.13}
\]
Formally, the mixing property implies that the right-hand side here is the limit as $x \to \infty$ of
\[
G_c\bigl[G_m[h_1 \mid \cdot\,]\, S_x G_m[h_2 \mid \cdot\,]\bigr] \equiv G_c[\tilde h_1\, S_x \tilde h_2], \tag{12.3.14}
\]
where $\tilde h_i(y) = G_m[h_i \mid y]$; so, formally, it is enough to show that as $x \to \infty$,
\[
G_c[\tilde h_1\, S_x \tilde h_2] - G_c\bigl[G_m[h_1 S_x h_2 \mid \cdot\,]\bigr] \to 0. \tag{12.3.15}
\]
We have said 'formally' because, although $h_i \in \mathcal{V}_0(\mathbb{R}^d)$, the same need not necessarily be true of $\tilde h_i$. However, by replacing the generic cluster $N_m(\cdot \mid 0)$ by $N_{mn}(\cdot \mid 0) = N_m(\cdot \cap U^d_n \mid 0)$ and letting $n \to \infty$, each $\tilde h_i$ is expressed as the limit of the monotonic sequence $\{\tilde h_{in}\}$ for which $\tilde h_{in} \in \mathcal{V}_0(\mathbb{R}^d)$. Consequently, appealing to the convergence properties of extended p.g.fl.s established in Exercise 9.4.6(b) and the extended form of the mixing property (12.3.11) noted below Proposition 12.3.VI, it follows that when $N_c(\cdot)$ is mixing, $G_c[\tilde h_1\, S_x \tilde h_2] \to G_c[\tilde h_1]\, G_c[\tilde h_2]$ as $x \to \infty$.

To complete the proof, define
\[
\varepsilon_x(u) = G_m[h_1 S_x h_2 \mid u] - G_m[h_1 \mid u]\, S_x G_m[h_2 \mid u].
\]
As upper bounds on $\varepsilon_x(u)$ we have
\[
\varepsilon_x(u) \le G_m[S_x h_2 \mid u]\bigl(1 - G_m[h_1 \mid u]\bigr) \le 1 - G_m[h_1 \mid u],
\]
\[
\varepsilon_x(u) \le G_m[h_1 \mid u]\bigl(1 - G_m[S_x h_2 \mid u]\bigr) \le 1 - G_m[S_x h_2 \mid u];
\]
as lower bounds we have
\[
\varepsilon_x(u) \ge G_m[h_1 S_x h_2 \mid u] - G_m[S_x h_2 \mid u]
= -E\Bigl[\exp\Bigl(\int_{\mathcal{X}} \log[S_x h_2(y)]\, N_m(dy \mid u)\Bigr)\Bigl(1 - \exp\Bigl(\int_{\mathcal{X}} \log h_1(y)\, N_m(dy \mid u)\Bigr)\Bigr)\Bigr]
\ge -\bigl(1 - G_m[h_1 \mid u]\bigr)
\]
and, similarly, $\varepsilon_x(u) \ge -\bigl(1 - G_m[S_x h_2 \mid u]\bigr)$. Because $h_2 \in \mathcal{V}_0(\mathbb{R}^d)$ and $N_m(\mathbb{R}^d \mid u) < \infty$ a.s., $G_m[S_x h_2 \mid u] \to 1$ as $x \to \infty$ and thus $\varepsilon_x(u) \to 0$ as $x \to \infty$. Also, because the cluster process exists, (12.1.18) holds, and therefore $|\varepsilon_x(u)|$ is bounded above by $1 - G_m[h_1 \mid u]$, which is $N_c$-integrable a.s. Indeed, again because $h_1 \in \mathcal{V}_0(\mathbb{R}^d)$ and $N_m(\mathbb{R}^d \mid u) < \infty$ a.s., it also holds that $1 \ge G_m[h_1 \mid u] \ge c_1$, uniformly in $u$, for some positive constant $c_1$, and similarly for $G_m[S_x h_2 \mid u]$. Thus, $\chi_x(u) \equiv G_m[h_1 \mid u]\, S_x G_m[h_2 \mid u] \ge c > 0$ for some constant $c$, uniformly in $x$ and $u$. Now
\[
\bigl|G_c\bigl[G_m[h_1 S_x h_2 \mid \cdot\,]\bigr] - G_c\bigl[G_m[h_1 \mid \cdot\,]\, S_x G_m[h_2 \mid \cdot\,]\bigr]\bigr| = \bigl|G_c[\chi_x + \varepsilon_x] - G_c[\chi_x]\bigr|
\]
\[
= \Bigl|E\Bigl[\exp\Bigl(\int_{\mathbb{R}^d} \log[\chi_x(u) + \varepsilon_x(u)]\, N_c(du)\Bigr) - \exp\Bigl(\int_{\mathbb{R}^d} \log \chi_x(u)\, N_c(du)\Bigr)\Bigr]\Bigr|
\]
\[
\le 1 - E\exp\Bigl(-\int_{\mathbb{R}^d} \log\Bigl[1 + \frac{|\varepsilon_x(u)|}{\chi_x(u)}\Bigr]\, N_c(du)\Bigr).
\]
This expression $\to 0$ as $x \to \infty$ because $1 \ge \chi_x(u) \ge c > 0$ (all $x$ and $u$), $\varepsilon_x(u) \to 0$ pointwise, and $\varepsilon_x(u)$, and hence $c^{-1}\varepsilon_x(u)$ also, is $N_c$-integrable a.s. uniformly in $x$ (see Exercise 9.4.6). Thus, (12.3.13) is proved.

One of the classical applications of mixing conditions is in establishing conditions for a central limit theorem. Here, the point process or random measure character of the realizations plays an entirely minor role; it is not the local behaviour but the behaviour over large time spans that is important. Results for point processes and random measures can, indeed, be written down directly from the results in texts such as Billingsley (1968) for stochastic processes in general. Because rescaling is involved, the limits need no longer correspond to random measures, and convergence needs to be expressed in terms of convergence in a function space such as $D(0,1)$. For example, adapting Billingsley's Theorem 20.1 to random measures yields the following.

Proposition 12.3.X. Let the stationary random measure $\xi$ on $\mathcal{X} = \mathbb{R}$ be ψ-mixing for some continuous, monotonic, nonnegative function $\psi(\cdot)$ on $\mathbb{R}_+$ for which $\int_0^\infty [\psi(t)]^{1/2}\, dt < \infty$, and have boundedly finite first and second moment measures, with mean rate $m$ and reduced covariance measure $C(\cdot)$ satisfying $0 < \sigma^2 = C(\mathbb{R}) < \infty$. Then the sequence of random processes $\{Y_n\}$ defined by
\[
Y_n(t) = \frac{\xi(0, nt] - mnt}{\sigma n^{1/2}} \qquad (0 \le t \le 1)
\]
converges weakly in $D(0,1)$ to the Wiener process on $(0,1)$.
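For a rate-$m$ Poisson process, ψ-mixing holds trivially and the reduced covariance measure is $m\,\delta_0$, so $\sigma^2 = m$; the normalized variable $Y_n(1) = (N(0,n] - mn)/(\sigma n^{1/2})$ should then be approximately standard normal for large $n$. A quick Monte Carlo check (our own sketch; $m$, $n$, and the sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, nsim = 2.0, 400, 20_000
sigma = np.sqrt(m)                       # sigma^2 = C(R) = m for the Poisson process

counts = rng.poisson(m * n, size=nsim)   # N(0, n] across independent realizations
Yn1 = (counts - m * n) / (sigma * np.sqrt(n))

print(Yn1.mean(), Yn1.var())             # approximately 0 and 1
```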
Exercises and Complements to Section 12.3

12.3.1 Prove that a stationary renewal process can exist if and only if the lifetime distribution has a finite mean, and that such a process is ergodic but need not be mixing. Show that if the lifetime distribution is nonlattice then the process is mixing. [Hint: Use the renewal theorem in different forms. A periodic renewal process can be made into a stationary process by suitably distributing the initial point (see e.g. the stationary deterministic process at Exercise 12.1.7), but such a process is not mixing because, for example, the events $V = \{N(0, \frac12] > 0\}$ and $W = \{N(\frac12, 1] > 0\}$ do not satisfy the mixing property (12.3.3) when $x$ is an integer and the process has period 1.]

12.3.2 Show that any event in $\mathcal{I}$ is equal (modulo sets of measure zero) to an event in $\mathcal{T}_\infty$, but not conversely. [Hint: See Exercise 12.3.1 for the converse. For $W \in \mathcal{I}$, consider $W = \bigcap_{n=1}^\infty S_{x_n} W$, where $x_n \to \infty$ as $n \to \infty$.]

12.3.3 As an example of a cluster process that is mixing but for which the cluster centre process is not mixing, take the cluster centre process to be a mixture of two Neyman–Scott cluster processes with member distributions $F_1$ and $F_2$ about the centre, and the cluster member process again of Neyman–Scott type with distribution $F_3$ such that $F_1 * F_3 = F_2 * F_3$.

12.3.4 Say that a process $\xi$ has short-range correlation if for $W \in \mathcal{B}(\mathcal{M}^\#_{\mathcal{X}})$ with $\mathcal{P}(W) > 0$, and arbitrary $\varepsilon > 0$, there exists a bounded set $A \in \mathcal{B}_{\mathcal{X}}$ such that on the sub-σ-algebra of $\mathcal{B}(\mathcal{M}^\#_{\mathcal{X}})$ determined by $\{\xi(B)\colon B \in \mathcal{B}(\mathcal{X} \setminus A)\}$, the variation norm of the difference as below satisfies
\[
\bigl\|\mathcal{P}_{\mathcal{X}\setminus A}(\cdot \mid W) - \mathcal{P}_{\mathcal{X}\setminus A}(\cdot)\bigr\| < \varepsilon.
\]
Show that $\xi$ has short-range correlation if and only if the tail σ-algebra $\mathcal{T}_\infty$ is trivial. [Hint: Consider the sequence of Radon–Nikodym derivatives $p_n(\cdot) \equiv d\mathcal{P}_{\mathcal{X}\setminus A_n}(\cdot \mid W)/d\mathcal{P}_{\mathcal{X}\setminus A_n}(\cdot)$ for $A_n = U^d_{2n}$ (for example). Show that these functions $\{p_n(\cdot)\}$ constitute a martingale that converges to a limit $p_\infty(\cdot)$, which is $\mathcal{T}_\infty$-measurable, and that $p_\infty$ cannot be a.s. constant if $\|\mathcal{P}_{\mathcal{X}\setminus A}(\cdot \mid W) - \mathcal{P}_{\mathcal{X}\setminus A}(\cdot)\| > c > 0$ for some real $c$ and every bounded $A \in \mathcal{B}_{\mathcal{X}}$. The result is attributed to Lanford and Ruelle (1969) in MKM (1982, Theorem 1.10.1).]
12.3.5 Let r.v.s $X(\xi)$, $Y(\xi)$ be defined on the stationary random measure $\xi$ and have finite expectations. Show that $\xi$ is mixing if and only if for all such r.v.s $X$, $Y$,
\[
\int_{\mathcal{M}^\#_{\mathcal{X}}} X(S_x \xi)\, Y(\xi)\, \mathcal{P}(d\xi) \to E[X(\xi)]\, E[Y(\xi)] \qquad (x \to \infty).
\]

12.3.6 Stability condition for mixing process. Let $\mathcal{P}$, $\mathcal{P}_0$ be measures on $\mathcal{M}^\#_{\mathcal{X}}$ with $\mathcal{P}_0 \ll \mathcal{P}$. Show that if $\mathcal{P}$ is mixing then for $\mathcal{P}_x$ defined by
\[
\mathcal{P}_x(V) = \int_{\mathcal{M}^\#_{\mathcal{X}}} I_V(S_x \xi)\, \mathcal{P}_0(d\xi) \qquad (V \in \mathcal{B}(\mathcal{M}^\#_{\mathcal{X}})),
\]
$\mathcal{P}_x \to \mathcal{P}$ weakly as $x \to \infty$. [Hint: Let $p(\xi)$ be a measurable version of the Radon–Nikodym derivative; apply Exercise 12.3.5.]

12.3.7 A process is mixing of order $k$ if for all $V_0, \ldots, V_{k-1} \in \mathcal{B}(\mathcal{M}^\#_{\mathcal{X}})$,
\[
\mathcal{P}(V_0 \cap S_{x_1} V_1 \cap \cdots \cap S_{x_{k-1}} V_{k-1}) \to \mathcal{P}(V_0)\, \mathcal{P}(V_1) \cdots \mathcal{P}(V_{k-1})
\]
as $x_i \to \infty$ $(i = 1, \ldots, k-1)$ in such a way that $x_i - x_j \to \infty$ also for all $i \ne j$. Show that when $\mathcal{T}_\infty$ is trivial the process is mixing of all orders. [Hint: Use the method of proof of Proposition 12.3.V. See also MKM (1978, Theorem 6.3.6) and MKM (1982, Theorem 6.2.9).]
12.3.8 Verify the assertions of Proposition 12.3.IX concerning conditions for a stationary cluster process to be weakly mixing or ergodic.
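The expectation form of mixing in Exercise 12.3.5 can be checked numerically in the simplest case $X(\xi) = Y(\xi) = \xi(0,1]$: for a rate-$\lambda$ Poisson process, $E[N(x, x+1]\, N(0,1]]$ should equal $E[N(0,1]]^2 = \lambda^2$ once $x \ge 1$. A Monte Carlo sketch of our own (the values of $\lambda$, $x$, and the sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
lam, x, nsim = 2.0, 3.0, 20_000

prods = np.empty(nsim)
for i in range(nsim):
    pts = rng.uniform(0.0, x + 1.0, rng.poisson(lam * (x + 1.0)))
    n0 = np.count_nonzero(pts <= 1.0)                   # N(0, 1]
    nx = np.count_nonzero((pts > x) & (pts <= x + 1.0)) # N(x, x+1]
    prods[i] = n0 * nx

print(prods.mean(), lam**2)   # the product moment factorizes: both near 4
```

For the Poisson process the factorization is exact for every $x \ge 1$; for a merely mixing process it holds only in the limit $x \to \infty$.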
12.4. Stationary Infinitely Divisible Point Processes
This shorter section discusses stationarity and mixing conditions for infinitely divisible point processes, resuming the notation and terminology of Section 10.2. It can be viewed as an extended example concerning mixing conditions, at the same time yielding a more refined classification of such point processes.
Consider first stationarity. Let Q̃ denote the KLM measure of a stationary infinitely divisible process, so that from (10.2.9),

log G[h(· − u)] = ∫_{N_0^#(X)} [exp(∫_X log h(x − u) N(dx)) − 1] Q̃(dN)
               = ∫_{N_0^#(X)} [exp(∫_X log h(y) N(dy)) − 1] Q̃(S_u dN).   (12.4.1)

When N is stationary, this must coincide with the original form of (10.2.9),

log G[h] = ∫_{N_0^#(X)} [exp(∫_X log h(y) N(dy)) − 1] Q̃(dN).
As in (12.1.3), we can define a new measure S_u Q̃ by

(S_u Q̃)(B) = Q̃{N: S_u N ∈ B},

so that (12.4.1) could equally well be written in the form (10.2.9) with S_u Q̃ in place of Q̃. But by Theorem 10.2.V the KLM measure is unique, so Q̃ and S_u Q̃ must coincide, and we have established the following result (see Exercise 12.4.1 for a generalization).
Proposition 12.4.I. An infinitely divisible point process on R^d is stationary if and only if its KLM measure is stationary (i.e., invariant under the shifts S_u).
In Section 10.2 we established some relationships between total finiteness of the KLM measure and the representation of an infinitely divisible point process as a Poisson cluster process. This relationship can be sharpened when the point process is stationary.
Proposition 12.4.II. Let N be a stationary infinitely divisible point process on R^d.
(a) If N has a representation as a Poisson randomization then it is singular.
(b) N is regular if and only if it can be represented as a Poisson cluster process with a stationary Poisson process of cluster centres and a cluster structure that depends only on the relative locations of the points in a cluster and not on the location of the cluster itself.
Proof. Suppose first that the KLM measure Q̃ is totally finite, so that P̃(·) ≡ Q̃(·)/Q̃(N_0^#(X)) is the probability measure of a stationary point process Ñ. The special property (10.2.8) of the KLM measure now implies that P̃{Ñ(X) = 0} = 0, so coupled with Proposition 12.1.VI we must have that P̃{Ñ(X) = ∞} = 1, which with (10.2.10b) proves (a).
From the decomposition at Proposition 10.2.VII of an infinitely divisible point process into its regular and singular components, and from the fact that S_u maps the set {Ñ: Ñ(X) = ∞} into itself, it follows that we may discuss the effects of stationarity separately for each type of process. From the discussion around Proposition 10.2.VIII, we know already that a regular infinitely divisible point process has a representation as a Poisson cluster process. Statement (b) then follows from Proposition 12.1.V.
Observe that, from (a) of this proposition, the stationary singular infinitely divisible distributions can be classified into those with totally finite KLM measures, namely the Poisson randomizations, and those with unbounded KLM measure. An alternative and more interesting classification can be based on the Q̃-measures of the sets of trajectories with zero or positive asymptotic densities, as indicated below and in Example 13.2(c). First, however, note that Proposition 12.3.IX implies that a stationary Poisson cluster process, and hence any regular stationary infinitely divisible process, is necessarily mixing and hence ergodic. Interest centres therefore around the mixing properties of the singular stationary infinitely divisible processes. We follow essentially Kerstan and Matthes (1967) and MKM (1978, Chapter 6), starting from a simple general property.
Proposition 12.4.III. If the stationary random measure ξ on X = R^d is ergodic, it cannot be represented as a mixture of two distinct stationary processes.
Proof.
Suppose the contrary, so that P = αP1 + (1 − α)P2 say, where 0 < α < 1 and the other three terms are stationary probability measures on M_X^#. Evidently, P1 ≪ P and the Radon–Nikodym derivative dP1/dP is invariant under shifts S_x. When P is ergodic the only invariant functions are constants a.s., so P1 = cP; hence, c = 1 because both P and P1 are probability measures. Thus, P1 = P, and similarly P2 = P, showing that the decomposition is trivial.
As a converse to this result, by noting that a Poisson randomization is by definition a nontrivial discrete mixture of distinct components, we conclude as follows.
Corollary 12.4.IV. No stationary Poisson randomization, nor more generally any process that can be represented as the superposition of a stationary process and a Poisson randomization, is ergodic.
From this result it may seem plausible that no singular stationary infinitely divisible process could be ergodic. Such is not the case: the next result, due
to Kerstan and Matthes (1967), spells out in terms of the KLM measure those properties that lead to ergodicity or mixing, and we show that such properties include examples of singular infinitely divisible processes.
Proposition 12.4.V. Let N be a stationary infinitely divisible point process on X = R^d with KLM measure Q̃.
(a) N is ergodic if and only if for all bounded A, B ∈ B_X,

(1/ℓ(U_n^{(d)})) ∫_{U_n^{(d)}} Q̃{N: N(T_x A) > 0 and N(B) > 0} dx → 0   (n → ∞),   (12.4.2)

in which case N is also weakly mixing.
(b) N is mixing if and only if for all such A, B,

Q̃{N: N(T_x A) > 0 and N(B) > 0} → 0   (x → ∞).   (12.4.3)
Proof. We consider first part (b). We use the p.g.fl. formulation of the mixing condition, writing now

G̃[h] = log G[h] = ∫_{N_X^#} [exp(∫_X log h(x) N(dx)) − 1] Q̃(dN)   (h ∈ V(R^d)).

Then the mixing condition (12.3.11) is expressible as

G̃[h1 S_x h2] → G̃[h1] + G̃[h2]   (x → ∞).   (12.4.4)
To show that this condition is the same as (12.4.3), take h1 = 1 − I_B, h2 = 1 − I_A, and consider the difference

G̃[h1 S_x h2] − G̃[h1] − G̃[h2] = ∫_{N_X^#} (∏_i a_i b_i − ∏_i a_i − ∏_i b_i + 1) Q̃(dN),   (12.4.5)

where a_i = h1(x_i), b_i = h2(x_i − x), and the x_i are the points of the particular realization N. For the particular h1, h2, the integrand on the right vanishes except for realizations of N for which both N(B) > 0 and N(T_x A) > 0, when it reduces to unity. Thus, the left-hand side of (12.4.5) coincides with the expression at (12.4.3) for such h1, h2, and thus (12.4.4) implies (12.4.3). Conversely, when (12.4.3) holds, (12.4.4) holds whenever 1 − h_i are indicator functions as above. For more general h_i ∈ V(R^d), the difference at (12.4.5) is dominated by the corresponding difference when h1 and h2 are replaced by 1 − I_B and 1 − I_A, where B and A are the supports of 1 − h1 and 1 − h2. Thus, (12.4.3) also implies (12.4.4) for these more general h_i, and part (b) is established.
To prove part (a) we need to develop some auxiliary results. Because Q̃ is stationary, although perhaps only σ-finite, an extension of the ergodic
theorem (see comment following Proposition 12.2.II) can be applied to deduce the limit, for any convex averaging sequence {A_n},

lim_{n→∞} (1/ℓ(A_n)) ∫_{A_n} f(S_x N) dx = f̄(N)   Q̃-a.e.   (12.4.6)
) Q(dN ) < ∞. whenever the B(NX# )-measurable function f satisfies N # f (N X Let IV be the indicator function of the set V = {N : N (A) > 0} for bounded ) < ∞ by (10.2.8), so the limit in (12.4.6), I¯V (N ) say, A ∈ B(Rd ). Then Q(V exists Q-a.e. We assert that if the infinitely divisible process is ergodic, then this limit must be zero Q-a.e. To see this, consider for any fixed positive c : I¯V (N ) > c > 0}, which is measurable and invariant because the set Jc = {N I¯V (·) is. Furthermore, c) ≤ cQ(J
) Q(d N ) ≤ I¯V (N
Jc
≤ lim sup n→∞
1 (An )
An
# NX
# NX
) Q(d N ) I¯V (N
) Q(d N ) dx = Q(V ) < ∞. IV (Sx N
Consequently, J_c has finite Q̃ measure and is invariant. If in fact Q̃(J_c) > 0, we can construct a stationary probability measure P_V on N_X^# by setting

P_V(·) = Q̃(J_c ∩ ·)/Q̃(J_c),

and it then follows that the original process has the Poisson randomization of P_V as a convolution factor. But for an ergodic process this is impossible by Corollary 12.4.IV, so Q̃(J_c) = 0 for every c > 0, and Ī_V(N) = 0 Q̃-a.e. as asserted.
Now let B be any bounded set in B(R^d), write W = {N: N(B) > 0}, and consider the relations

(1/ℓ(A_n)) ∫_{A_n} Q̃{N: N(A) > 0 and N(B + x) > 0} dx
  = (1/ℓ(A_n)) ∫_{A_n} ∫_{N_X^#} I_V(N) I_{S_x W}(N) Q̃(dN) dx
  = ∫_{N_X^#} I_V(N) [(1/ℓ(A_n)) ∫_{A_n} I_{S_x W}(N) dx] Q̃(dN).

We have just shown that the inner integral here → 0 as n → ∞ Q̃-a.e., and because this integral is ≤ 1 and Q̃(V) < ∞, we can apply the dominated convergence theorem and conclude that the entire expression → 0 as n → ∞; that is, (12.4.2) holds.
The converse implication depends on the inequality [see under (12.4.5)]

0 ≤ G̃[h1 S_x h2] − G̃[h1] − G̃[h2] ≤ Q̃{N: N(A) > 0 and N(T_x B) > 0} = q(x), say,

where A, B are the supports of 1 − h1 and 1 − h2. Using the elementary inequality e^α − 1 ≤ αe^α (α > 0) and taking exponentials, we obtain

0 ≤ G[h1 S_x h2] − G[h1]G[h2] ≤ G[h1]G[h2] q(x) e^{q(x)}.   (12.4.7)

Because q(x) < Q̃{N: N(A) > 0} < ∞ uniformly in x, it follows that the difference in (12.4.7) is bounded by Kq(x) for some finite positive constant K. Then the integral at (12.3.9) is bounded by

(K/ℓ(U_n^{(d)})) ∫_{U_n^{(d)}} q(x) dx,

which → 0 as n → ∞ when (12.4.3) holds. Proposition 12.3.VI(b) part (i) now shows that the process must be ergodic, and because the difference at (12.4.7) is already nonnegative, the stronger convergence statement at (12.3.10) must hold, and the process is in fact weakly mixing as asserted.
The arguments used in the preceding proof can be taken further. Consider in particular the invariant function Ī_V(·) introduced below (12.4.6), putting A = U^{(d)} say for definiteness. The trajectories N can be classified Q̃-a.e. by defining

S_w = {N: Ī_V(N) = 0},   S_s = {N: Ī_V(N) > 0}.

Both these subsets of N_X^# are measurable, and their complement has zero Q̃ measure, so Q̃ can be decomposed as Q̃ = Q̃_w + Q̃_s, where

Q̃_w(·) = Q̃(S_w ∩ ·),   Q̃_s(·) = Q̃(S_s ∩ ·).

The notation here comes from the definition of a singular stationary infinitely divisible point process as weakly singular if Q̃_s vanishes and strongly singular if Q̃_w vanishes. Just as a general infinitely divisible point process can be represented as the superposition of regular and singular components, so in the stationary case the singular component can be further represented as the superposition of weakly singular and strongly singular components. Evidently, a Poisson randomization is strongly singular, and any stationary singular process that is ergodic must be weakly singular (in fact, the condition is necessary and sufficient: see Exercises 12.4.2–4). Examples of weakly singular processes are not easy to construct: one such construction that uses a modified randomization procedure starting from a renewal process with infinite mean interval length is indicated in Exercise 12.4.6. Other examples arise from the so-called stable cluster processes described in Section 13.5. See also Example 13.3(b). For additional results see MKM (1978, Sections 6.3 and 9.6) and MKM (1982, Section 6.2).
Exercises and Complements to Section 12.4
12.4.1 Show that if an infinitely divisible point process on the c.s.m.s. X is invariant under a σ-group {T_g: g ∈ G} of transformations (Definition A2.7.I), then its KLM measure is also invariant under the transformations S_g induced by {T_g}. [Hint: See Proposition 12.4.I.]
12.4.2 Show that a singular infinitely divisible process is strongly singular if and only if it can be represented as a countable superposition of Poisson randomizations. [Hint: Consider the restriction Q̃_n of Q̃ to the set J_n = {N: Ī_V(N) ∈ ((n + 1)^{−1}, n^{−1}]}, where V = {N: N(A) > 0} for some bounded A ∈ B(R^d) (cf. the proof of Proposition 12.4.V). Then Q̃_n is totally finite, so it can be the KLM measure of a Poisson randomization. The original process is equivalent to the convolution of these randomizations.]
12.4.3 Show that for a singular infinitely divisible point process the conditions below are equivalent.
(a) The process is weakly singular;
(b) The process is ergodic;
(c) If V ∈ I then Q̃(V) = 0 or ∞.
[Hint: To show (a) ⇔ (b), use the characterization of weak singularity in terms of Ī_V together with relevant parts of the proof of Proposition 12.4.V. Modify the argument of Exercise 12.4.2 to show that (a) ⇔ (c).]
12.4.4 Show that T∞ is trivial for every regular infinitely divisible point process. [Hint: Use the short-range correlation inequality of Exercise 12.3.4 and equation (12.4.7) to show that for some positive c,

|P(V ∩ W) − P(V)P(W)| ≤ c Q̃{N: N(B) > 0 and N(X\A) > 0},

where for bounded A, B ∈ B_X, V and W are in the sub-σ-algebras determined by {ξ(B′): B′ ∈ B_X, B′ ⊆ B} and {ξ(A′): A′ ∈ B_X, A′ ⊆ X\A}. Now use the regularity of ξ to show that the right-hand side → 0 as A ↑ X, and hence that events in T∞ have probability 0 or 1. MKM (1982, p. 97) attributes the result to K. Hermann.]
12.4.5 For n = 1, 2, . . .
let N_n be independent stationary infinitely divisible point processes on R that are Poisson randomizations (with mean number = 1) of Poisson processes at rate λ_n, where ∑_{n=1}^∞ λ_n < ∞. Verify that N* = ∑_{n=1}^∞ N_n is a well-defined stationary point process that is singular infinitely divisible with infinite KLM measure.
12.4.6 Let a stationary infinitely divisible point process on R_+ have as its (stationary) KLM measure a finite positive multiple αμ of the shift-invariant regenerative measure of Example 12.1(e), so that (using the notation from there)

Q̃{N(0, y] > 0} = α ∫_0^y [1 − F(u)] du,
which has a finite or infinite limit according as the d.f. F(·) has finite or infinite mean. Show that for x > y > 0 and z > 0,

Q̃({N(0, y] > 0} ∩ {N(x, x + z] > 0})
  = α ∫_0^y [1 − F(u)] (∫_{x−u}^{x−u+z} [1 − F(x − u + z − v)] U_0(dv)) du,
and that this quantity has limit zero when F has infinite mean. Hence, conclude that the point process is then weakly singular. [See MKM (1978, Section 9.6) for some other details.] 12.4.7 Show that if an infinitely divisible process is mixing then it is mixing of all orders as in Exercise 12.3.7.
12.5. Asymptotic Stationarity and Convergence to Equilibrium
The issue of convergence to equilibrium, or stability, for point process models has already surfaced in Chapter 4, where we illustrated the use of coupling arguments to obtain classical results for renewal and Wold processes, and in Exercise 12.3.6, where convergence was established assuming both mixing and an absolute continuity condition. This section treats convergence to equilibrium in the more general context of simple and marked point processes on R. Similar results can be developed for random measures with essentially only changes in terminology, but are left to the exercises. The special problem of convergence to equilibrium from the Palm distribution is taken up in Section 13.4.
The tools available to tackle these problems have been extended greatly by the development of coupling and shift-coupling methods, which we illustrate in the present section, and the work of Brémaud and Massoulié on Poisson embedding, which we outline in Section 14.7 and use there to develop conditions for convergence to equilibrium in terms of conditional intensities. Lindvall (1992) is a basic reference on coupling ideas; for shift-coupling see Aldous and Thorisson (1993) and Thorisson (1994, 2000). In the point process context, systematic treatments concentrating on applications to queues appear in Franken et al. (1981), Sigman (1995), and Baccelli and Brémaud (1994); convergence to equilibrium has been discussed, for example, by Lindvall (1988) for finite-memory processes, Sigman (1995) for shift-coupling and queues, Brémaud and Massoulié (1996) and Massoulié (1998) for generalizations of the Hawkes process, and Last (2004) for the stress-release model and its analogues. Additional references are given in these papers and below.
Apart from the basic theoretical interest of these issues, their importance has increased in recent years because of the greatly increased role of simulation methods, especially Markov chain Monte Carlo methods, and the associated
need to estimate the ‘burn-in times’ required for initial system values to approximate the equilibrium values that are usually the endpoint in view. When such problems arise in applications, we are commonly given a ‘law of evolution’ of the process and seek to establish, first, whether such a law is compatible with a stationary form for the process, and second, whether the process will converge to that stationary form when started at t = 0 from some ‘initial distribution’. To try to capture these ideas more precisely, consider first the action of shifts on a point process (simple or marked) defined on the half-line [0, ∞) = R_+. If the process has distribution P say, on B(N_{R_+×K}^#), then for u > 0, the action of the general shift operator S_u on P is well defined: it corresponds to shifting the time origin to u. Thus S_u P ≡ P_u is a probability measure defined on the Borel sets of N^#([−u, ∞) × K). Any such measure can be projected onto the smaller sub-σ-algebra B(N_{R_+×K}^#), on which there is then a measure that corresponds to a new (simple or marked) point process on R_+ × K. For the rest of this section the notation P_u = S_u P denotes the probability measures of the shifted point processes induced in this way on R_+ × K. We call an MPP on the half-line stationary if its distribution is invariant under these positive shifts. Trivially, the restriction to the half-line of an MPP stationary on the whole line is stationary on the half-line. Conversely, any MPP stationary on the half-line can be extended to a process stationary on the full line by shifting the origin forward (which does not change the distributions) and taking the limit of these shifted processes (Exercise 12.5.1). In the discussion below we use the notation (C, 1), as for Cesàro summability in analysis, to denote the behaviour of integral averages.
Definition 12.5.I.
Let N be an MPP on R_+ × K, and let P, P_u be the probability measures on B(N_{R_+×K}^#) associated as above with N and its shifted versions S_u N. Then N is asymptotically stationary (resp., (C, 1)-asymptotically stationary) if, as u → ∞, P_u (resp., P̄_u = u^{−1} ∫_0^u P_v dv) converges weakly to a limit P* corresponding to a limit MPP N*. It is strongly asymptotically stationary (resp., strongly (C, 1)-asymptotically stationary) if the convergence holds in variation norm.
Brémaud and Massoulié and others refer to asymptotic stationarity as a stability property, having in mind the analogy with the stability of differential equations.
Lemma 12.5.II. Any limit measure P* arising in Definition 12.5.I necessarily corresponds to a stationary point process on R_+.
Proof. When
S_u P → P*   weakly   (u → ∞),   (12.5.1)
then for all finite u, the sequences S_{x+u} P and S_x P must have the same weak limit P* as x → ∞, implying that P* = S_u P*.
A similar argument holds in the (C, 1) case if we apply the shift S_x to the integrals and note that the contributions from the integrals ∫_0^x and ∫_u^{u+x} on the end-intervals are asymptotically negligible.
The convergence at (12.5.1) is equivalent to requiring that for u > 0 the fidi distributions should satisfy

F_k(A_1 + u, . . . , A_k + u; x_1, . . . , x_k) → F_k^*(A_1, . . . , A_k; x_1, . . . , x_k),   (12.5.2)

either directly or in the (C, 1) sense, where on the right-hand side F* refers to the fidi distributions of P*. Thus, if the shifted distributions of the initial point process converge to limits which constitute a family of fidi distributions, then these limit distributions must be those of a stationary point process.
The underlying problem can be phrased alternatively as follows. Given a stationary MPP with associated probability measure P*, find the class of MPP distributions P for which (12.5.1) or (12.5.2) holds. In this sense, the problem of convergence to equilibrium can also be interpreted as a domain of attraction problem.
Consider then what more stringent requirements may be needed for convergence to equilibrium without the (C, 1)-averaging. It is convenient here to specify the distribution P of the initial point process on the half-line by treating it as the conjunction of two components: a distribution Π(·) of initial conditions Z defined on some appropriate c.s.m.s. Z, and a kernel P(· | z) that governs the evolution of the process N given the initial condition Z = z, so that

P(V) = Pr{N ∈ V} = ∫_Z P(V | z) Π(dz)   (V ∈ B(N_{R_+}^#)).   (12.5.3)
This formulation follows Brémaud and Massoulié (1994, 1996), which in turn has its origins in the fundamental paper of Kerstan (1964b). For u ≥ 0 let P_u(·) denote the distribution of S_u N (so that in our earlier notation, P_u = S_u P and P_0 = P), and suppose that there exists a family of distributions Π_u on the space Z of initial conditions such that for these u,

P_u(V) = ∫_Z P(V | z) Π_u(dz)   (V ∈ B(N_{R_+}^#))   (12.5.4)
with the same kernel P(· | z) as in (12.5.3). We can interpret Πu as the initial conditions that apply when the time origin is shifted to u. When such a representation is available, it is not difficult to show that convergence of the distributions Πu of the initial conditions implies convergence of the shifted distributions Pu . Proposition 12.5.III. Let Z be a c.s.m.s. of initial conditions Z for some point process and {Πu } a family of probability measures on BZ . Let P(V | z)
be a measurable family of point processes on R_+, and suppose that for all u ≥ 0 and V ∈ B(N_{R_+}^#),

P_u(V) ≡ (S_u P_0)(V) ≡ S_u [∫_Z P(V | z) Π_0(dz)] = ∫_Z P(V | z) Π_u(dz).
(a) If Π_u → Π* weakly, and the kernel P(· | z) takes continuous functions on N_{R_+}^# into continuous functions on Z, then P_u → P* weakly and P*(V) = ∫_Z P(V | z) Π*(dz) (i.e. the point process N with distribution P = P_0 is asymptotically stationary with limit distribution P*).
(b) If Π_u → Π* in variation norm, then the point process N of (a) is strongly asymptotically stationary with the same limit distribution P*, and

‖P_u − P*‖ ≤ ‖Π_u − Π*‖.
(12.5.5)
Proof. To establish asymptotic stationarity we have to prove weak convergence of the P_u. This means showing that ∫_{N^#(R_+)} h(N) P_u(dN) converges for any bounded, continuous function h(N) on N_{R_+}^#. But under assumption (a), the integral ∫_{N^#(R_+)} h(N) P(dN | z) defines a bounded continuous function of z ∈ Z, and then the weak convergence of Π_u to Π* implies

∫_{N^#(R_+)} h(N) P_u(dN) = ∫_Z ∫_{N^#(R_+)} h(N) P(dN | z) Π_u(dz)
  → ∫_Z ∫_{N^#(R_+)} h(N) P(dN | z) Π*(dz) = ∫_{N^#(R_+)} h(N) P*(dN).

Under assumption (b), the same relations hold under the weaker requirement that the integral ∫_{N^#(R_+)} h(N) P(dN | z) is merely bounded and measurable, which follows from the boundedness of h and the measurability condition on the family P(· | z). The bound in (12.5.5) follows from the fact that P(· | z) is a stochastic kernel, so that, for any totally finite signed measure Ψ on B_Z,

‖∫_Z P(· | z) Ψ(dz)‖ ≤ sup_{h: |h(N)|≤1} ∫_Z ∫_{N^#(R_+)} |h(N)| P(dN | z) [Ψ+(dz) + Ψ−(dz)] ≤ ‖Ψ‖,
where Ψ+, Ψ− are the positive and negative parts of Ψ from its Jordan–Hahn decomposition, and the supremum is taken over all measurable functions bounded by 1.
The major advantage of this form of representation is that the process of ‘initial conditions’ can usually be represented as a Markov process Z(t) say,
thus reducing the problem of convergence to equilibrium for point processes to the well-developed theory of convergence to equilibrium for Markov processes. Such a representation is always possible in principle because the initial condition can be taken as the history of the process up to time 0, and such a history can itself be regarded as a point in a c.s.m.s. This still leaves us within the framework of Markov processes on a c.s.m.s. (or more generally a Polish space). Consequently, problems for point processes concerning conditions for the existence of and convergence to equilibrium distributions are reduced to corresponding problems for general Markov chains [see, e.g., Meyn and Tweedie (1993a, b)]. Similarly, the notions of coupling and shift-coupling, which again have their origins in Markov chain theory, can be invoked and applied within the point process context. The treatment becomes greatly simplified if a condensed representation can be found, for example, one in which the space of initial conditions can be reduced to a low-dimensional Euclidean space. The goal in practice, then, is to identify a simple representation for the space Z of initial conditions, and to examine the conditions for convergence of the induced Markov process Z(·). To establish a representation (12.5.4) for a given point process model, the first step is to identify a convenient state space Z for the Markov process of initial conditions. The essential feature that such a representation should possess is that knowledge of the current value Z(u) should be sufficient to determine fully the fidi distributions for the evolution of the process beyond time u, any further information from the past being redundant. The other components needed to complete the representation are the form of the state transition probabilities for the Markov process Z(·) and the mappings P(· | z) determining the evolution of the process from a given value of z.
Most commonly, this is done by identifying a mapping Φ: N_{R_−}^# → Z say, that condenses the information from the past trajectory {N(t): t < 0} of the point process onto a particular value z = Z(0) ∈ Z. When this transformation is applied to S_u N, its value Φ(S_u N) = Z(u) represents the updated value of the initial condition, so that from there on its evolution is governed by P(· | Z(u)). Under these conditions the transition probabilities for the process Z(·) are given by

P_u(B | z) = P(S_u Φ^{−1}(B) | z) = (S_u P)(Φ^{−1}(B) | z)   (B ∈ B_Z).

Integrating over the further evolution from u to u + v yields the Chapman–Kolmogorov equation

P_{u+v}(B | z) = ∫_Z P(S_v Φ^{−1}(B) | z′) P_u(dz′ | z) = ∫_Z P_v(B | z′) P_u(dz′ | z).
Thus, once the mapping Φ has been identified, the provisions of Proposition 12.5.III, especially the bound (12.5.5), can be invoked to establish asymptotic stationarity for the associated point process.
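As a small numerical sketch of these ideas (not from the text): for an ordinary renewal process the natural condensing map Φ sends the past onto the forward recurrence time Z(u), and the convergence Π_u → Π* can be checked by simulation. The Uniform(0, 2) lifetime law used below is an arbitrary assumption; its equilibrium forward recurrence time has density (1 − F(x))/μ, with mean E[X²]/(2μ) = 2/3.

```python
import random

# Sketch (hypothetical lifetime law): renewal process with Uniform(0, 2)
# lifetimes, so mu = 1. Z(u) = forward recurrence time at time u plays the
# role of the Markov 'initial condition'; for large u its law Pi_u should
# be close to the equilibrium law Pi*, whose mean is E[X^2]/(2*mu) = 2/3.
random.seed(1)

def forward_recurrence_time(u):
    """Time from u to the next renewal, starting the process at the origin."""
    t = 0.0
    while t <= u:
        t += random.uniform(0.0, 2.0)  # draw the next lifetime
    return t - u

u = 50.0  # a 'burn-in' long enough for Pi_u to be near Pi*
samples = [forward_recurrence_time(u) for _ in range(20000)]
mean_z = sum(samples) / len(samples)
print(mean_z)  # close to the equilibrium mean 2/3
```

The empirical mean settles near 2/3 rather than the lifetime mean 1, reflecting the length-biased equilibrium form of Π* in Example 4.4(a).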
We now illustrate the proposition with some well-known examples.
Example 12.5(a) Convergence to equilibrium of renewal and Wold processes. Consider first the simpler case of the renewal process, supposing the lifetime distribution to be nonlattice, with finite mean μ = λ^{−1}. A convenient choice for Z here is the forward recurrence time defined as in (3.4.15). The space Z becomes the half-line R_+ and the conditional point process corresponding to P(· | z) is the ordinary renewal process (with initial point at 0) shifted through z. It remains to investigate the conditions in parts (a) and (b) of the proposition. Continuity of the integral ∫_{N^#(R_+)} h(N) P(dN | z) as a function of the initial condition, here the value z say of the forward recurrence time, follows from the observation (Lemma 12.1.I) that the shifted realization S_u N of an element N ∈ N_{R_+}^# depends continuously on u, so if h(·) is a bounded and continuous function of N(·), h(S_u N) is a bounded continuous function of u. Continuity of ∫_{N^#(R_+)} h(N) P(dN | Z = z) then follows by dominated convergence. The other requirement of condition (a) amounts here to the weak convergence of the distribution Π_u of the forward recurrence time at time u to its equilibrium form Π*. This was established for nonlattice lifetime distributions with finite mean in Example 4.4(a). In fact, the stronger result required for condition (b) holds when the lifetime distribution is spread out, as noted in Exercise 4.4.4. Thus a renewal process having a spread-out lifetime distribution with finite mean is both weakly and strongly asymptotically stationary, from an arbitrary initial condition.
Similar results for the Wold process can be obtained from the discussion in Section 4.5. Here it is convenient to describe the initial conditions in terms of the pair (X, Y) = (t_0(N), t_0(N) − t_{−1}(N)) representing the time to the first event and the length L_0(N) of the (unobserved) initial interval.
Let (Xu , Yu ) denote their values after a shift of the time origin through u. Conditions for the convergence of the pair (Xu , Yu ) are obtained in Proposition 4.5.VI. For example, if the recurrence time distribution G in that proposition is nonlattice but spread-out, and the stationary interval distribution has finite mean, then convergence holds in variation norm and the associated Wold process is strongly asymptotically stationary. In particular this is the case when the transition kernel satisfies the simpler Condition 4.5.I . The next example introduces two extensions of principle: the point process under consideration is now an MPP, and the underlying Markov process may not be directly observable. The extension to a marked process causes no essential difficulties; the function h in the proposition now needs to be taken as a function on the space NR#+ ×K rather than on NR#+ , with the shifts defined on the first component only, but no essentially new points arise. To accommodate the non-observable parts of the process, or more generally any random variables in addition to those described by the observed history of the point process, the probability space needs to be taken in the general form (Ω, E) and the explicitly defined shifts Su on NR#+ replaced by a general flow
θ_u on (Ω, E) which is compatible with the shifts [see the discussion following (12.1.3) and elsewhere in Section 12.1]. The initial conditions must then carry sufficient information to define the initial state of all components of the process, observable or otherwise, and it is again their evolution with u which governs the convergence of the point process.
Example 12.5(b) Convergence to equilibrium of BMAPs and MMPPs [see Examples 10.3(e), (h), (i)]. The BMAPs of Example 10.3(i) are treated there as examples of MPPs. But any BMAP shares with the simpler MMPP model of Examples 10.3(e), (g) the feature that its evolution is fully determined by the evolution of a hidden or at best partly observed Markov process, X(t) say. Indeed, equations (10.3.23) spell out the form of the conditional intensity as a function of the current state of the process. In such a situation, the space of initial conditions Z reduces to the state space X of the process X(t) (and in these examples X is just a finite set of points, {1, . . . , K} say). The family of distributions P(· | z) of Proposition 12.5.III becomes the family of distributions on (N_R^#, B(N_R^#)) of the BMAP started from the initial condition X(0) = z. The distribution Π_0 of the proposition is the distribution of X(0) on the finite set X, π_0 = {π_k(0)} say, and Π_u, corresponding to the distribution of X(u), is given by π_u = π_0 P^u, where P^u = e^{uQ} is the matrix of transition probabilities p_ij(u). Because X is finite, the transition probabilities P^u converge in variation norm to the limit matrix 1π_∞, each row of which equals the stationary distribution π_∞, as soon as the process X(t) is irreducible. It then follows from Proposition 12.5.III(b), and specifically from (12.5.5), that a BMAP with irreducible governing process X(t) is strongly asymptotically stationary, no matter what the initial condition.
In tackling more complex examples, several different approaches have been suggested.
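Returning for a moment to Example 12.5(b): the convergence P^u = e^{uQ} → 1π_∞ can be checked numerically. The two-state generator below (rates a, b) and the helper functions `matmul` and `expm2` are illustrative assumptions, not part of the text; the matrix exponential is computed by a truncated Taylor series with scaling and squaring.

```python
# Sketch with an assumed two-state generator Q = [[-a, a], [b, -b]]:
# the stationary distribution is pi_inf = (b, a)/(a + b), and both rows
# of P^u = exp(uQ) approach pi_inf as u grows (here u = 50).
a, b = 0.7, 0.3

def matmul(x, y):
    """2x2 matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm2(q, u, k=10, terms=30):
    """exp(uQ) for a 2x2 matrix via truncated Taylor series plus
    scaling and squaring: exp(uQ) = (exp(uQ / 2^k))^(2^k)."""
    m = [[q[i][j] * u / 2**k for j in range(2)] for i in range(2)]
    result = [[1.0, 0.0], [0.0, 1.0]]   # running sum, starts at I
    term = [[1.0, 0.0], [0.0, 1.0]]     # current term m^n / n!
    for n in range(1, terms):
        term = matmul(term, m)
        term = [[term[i][j] / n for j in range(2)] for i in range(2)]
        result = [[result[i][j] + term[i][j] for j in range(2)]
                  for i in range(2)]
    for _ in range(k):                  # square k times
        result = matmul(result, result)
    return result

q = [[-a, a], [b, -b]]
pi_inf = (b / (a + b), a / (a + b))  # stationary distribution (0.3, 0.7)

p50 = expm2(q, 50.0)
print(p50, pi_inf)
```

Both rows of `p50` agree with π_∞ to high accuracy, since the spectral gap here is a + b = 1, so the deviation decays like e^{−u}; this is the finite-state convergence that (12.5.5) converts into strong asymptotic stationarity of the BMAP.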
In most cases, as illustrated already for the renewal process, the process Zu takes the form of a continuous-time, mixed jump–diffusion process on a continuous state space such as Rd . Conditions for the convergence to equilibrium for such processes, based on extensions of Foster–Lyapunov conditions for stability, have been set out with a considerable degree of generality in Meyn and Tweedie (1993b); their book, Meyn and Tweedie (1993a), contains a compendium of related material on general state-space Markov chains. Application of the Markov process conditions to any particular point process example requires a careful analysis of the properties of the induced process {Zu }, and may be far from a trivial exercise (cf. Exercise 12.5.2). For example, in a comprehensive study, Last (2004) has used this approach to derive stability properties for a class of models that are related to the stress-release model of Example 7.2(g). Further examples include models in reliability for repairable systems [Kijima (1989), Last and Szekli (1998)] and work-load processes [Browne and Sigman (1992)]; see also Exercise 12.5.3. The difficulty revolves around the subtle behaviour of the Markov process itself. Zheng (1991) derived general conditions for irreducibility and positive recurrence of the simple stress-release model, and Last extended the discussion to more
12.5.
Asymptotic Stationarity and Convergence to Equilibrium
229
general processes, and gave estimates for the rate of convergence and convergence of moments. For the more complex linked stress-release model and its variants, as in Example 7.3(d), many basic questions remain open. Even sufficient conditions for ergodicity have been established only in very special cases [Bebbington and Borovkov (2003)], and no necessary and sufficient conditions are known.

Another general approach is to invoke coupling or shift-coupling results. These have the advantage that they directly yield convergence in variation norm. They may be applied to the process of initial conditions, to the point process itself, or to the process of intervals appearing in the Palm distributions. Early applications of coupling arguments to point processes appear in Hawkes and Oakes (1974), Berbee (1979), and Lindvall (1988); recent texts such as Lindvall (1992), Sigman (1995), and Thorisson (2000) cover much wider territory. We gave both the underlying definition of coupling and the basic coupling inequality in Lemma 11.1.I. To apply the inequality directly to point processes, we take the stochastic processes X(t), Y(t) in Lemma 11.1.I to be shifted versions S_t N, S_t N′ of two simple or marked point processes N, N′, so that the parameter t refers to the extent of the time shift. It is convenient to treat the processes as defined on a common probability space (Ω, E, Pr), and to denote by θ_t the flows associated with the shifts S_t.

We say that N and N′ couple if there exist versions Ñ, Ñ′ of N, N′, and a finite stopping time T, such that θ_T Ñ = θ_T Ñ′ a.s. The last equation means that the trajectories on [0, ∞) of the shifted versions are a.s. equal, so that θ_{T+t} Ñ = θ_{T+t} Ñ′ a.s. for t > 0. Similarly, we say that N and N′ shift-couple if there exist versions Ñ, Ñ′ of N, N′ respectively, and finite stopping times T, T′, such that θ_T Ñ = θ_{T′} Ñ′; that is, θ_{t+T} Ñ = θ_{t+T′} Ñ′ a.s. for t ≥ 0. In this context the basic coupling inequality of Lemma 11.1.I extends as follows.
Note that because the coupling equalities hold between shifts of versions of the original MPPs, rather than between the original MPPs themselves, the inequalities involve only the distributions of the original processes.

Lemma 12.5.IV (Coupling Inequalities). Let N, N′ be jointly distributed MPPs on R+ × K, denote by P, P′ the probability measures they induce on (N#(R+ × K), B(N#(R+ × K))), and let ‖·‖ denote the total variation norm.
(a) Suppose N and N′ couple, with coupling time T. Then

    ‖Ŝ_t P − Ŝ_t P′‖ ≤ 2 Pr{T > t}.   (12.5.6a)

(b) Suppose N and N′ shift-couple, with coupling times T, T′. Then

    ‖t^{−1} ∫_0^t Ŝ_u P du − t^{−1} ∫_0^t Ŝ_u P′ du‖ ≤ 2 Pr{max(T, T′) > Ut},   (12.5.6b)

where U is uniformly distributed on (0, 1), independent of T, T′, N, N′.
Proof. The first inequality is proved in Lemma 11.1.I. For the other we follow Thorisson (1994; 2000, Theorem 5.3.1). By definition of shift-coupling, there exist a.s. finite stopping times T, T′ such that θ_T Ñ = θ_{T′} Ñ′ is a successful coupling of N and N′. Let U be a r.v. uniformly distributed on (0, 1), independent of (T, T′, N, N′). Then so is 1 − U, and hence for any finite t > 0, θ_{Ut} N and θ_{(1−U)t} N are copies of the same process. For such U and t, (T + Ut) mod t = t(T/t + U − ⌊T/t + U⌋) is uniformly distributed on (0, t), independent of (T, T′, N, N′) (here ⌊x⌋ denotes the largest integer ≤ x). Therefore θ_{(T+Ut) mod t} N is also a copy of θ_{Ut} N. Similarly, θ_{(T′+Ut) mod t} N′ is a copy of the process θ_{Ut} N′. On the set C ≡ {Ut + max(T, T′) ≤ t}, (T + Ut) mod t = T + Ut, and thus

    θ_{(T+Ut) mod t} Ñ = θ_{Ut} θ_T Ñ = θ_{Ut} θ_{T′} Ñ′ = θ_{(T′+Ut) mod t} Ñ′   on C,

the central equality by the assumed shift-coupling. Thus on C we have a shift-coupling of θ_{Ut} N and θ_{Ut} N′, and therefore ‖Ŝ_{Ut} P − Ŝ_{Ut} P′‖ ≤ 2[1 − Pr(C)]. But

    1 − Pr(C) = Pr{Ut + max(T, T′) > t} = Pr{max(T, T′) > (1 − U)t},

which is the same as Pr{max(T, T′) > Ut}. Taking expectations over U in the displayed inequality yields the left-hand side of (12.5.6b).

As a corollary, we obtain the following conditions for asymptotic stationarity.

Proposition 12.5.V. Let N, N′ be jointly distributed MPPs on R+ × K, and suppose that N′ is stationary.
(a) If N couples with N′, then N is strongly asymptotically stationary, with limit process N′.
(b) If N shift-couples with N′, then N is strongly (C, 1)-asymptotically stationary, with limit process N′.

Proof. The proof follows from Lemma 12.5.IV and the observation that, when N′ is stationary, its distribution is invariant under the shifts Ŝ_u.

As a further corollary of these results it follows that the ergodic theorems from Section 12.2 extend to processes which are asymptotically stationary. Suppose that N is asymptotically stationary and that both N and the limit process N′ have boundedly finite first moment measures. The limit process N′ is stationary by assumption, so the ergodic Theorem 12.2.IV applies to N′, as also to any of its a.s. versions. If N shift-couples to N′, there are versions of both processes the realizations of which coincide a.s. after certain realization-dependent but finite time-shifts. The a.s. existence of the (C, 1) limits t^{−1} ∫_0^t f(S_u N′) du for the realizations of the version of the stationary process therefore implies the a.s. existence of the same (C, 1) limits for the version of the approximating process. But the a.s. existence of the (C, 1) limits does not depend on which version is used (the versions are equal outside a set of realizations with zero probability measure), and so the ergodic behaviour applies also to the original version of the approximating process.
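The content of the coupling inequality (12.5.6a) can be seen already in the simplest discrete setting, a two-state Markov chain, using the classical independent coupling in place of the point-process couplings above; the transition matrix and all numbers below are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative two-state chain in discrete time.
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])
pi = np.array([0.6, 0.4])            # stationary distribution: pi P = pi

t = 5
# Left-hand side of the coupling inequality, computed exactly:
nu_t = np.linalg.matrix_power(P, t)[0]   # law at time t, started in state 0
lhs = np.abs(nu_t - pi).sum()            # variation distance to pi

def coupling_time(max_steps=200):
    """Run the chain from state 0 alongside an independent stationary
    copy; the coupling time T is the first time the two paths meet."""
    x = rng.choice(2, p=pi)              # stationary version
    y = 0                                # version started from state 0
    for n in range(1, max_steps + 1):
        x = rng.choice(2, p=P[x])
        y = rng.choice(2, p=P[y])
        if x == y:
            return n
    return max_steps

# Monte Carlo estimate of the right-hand side 2 Pr{T > t}:
times = np.array([coupling_time() for _ in range(5000)])
rhs = 2.0 * np.mean(times > t)
```

Here lhs plays the role of ‖Ŝ_t P − Ŝ_t P′‖ in a discrete-time analogue of (12.5.6a); with these numbers the bound holds with room to spare.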
Corollary 12.5.VI. The a.s. ergodic results of Theorem 12.2.IV apply to asymptotically stationary MPPs whenever the original process and its stationary limit have boundedly finite first moment measures.

In fact, a great deal more follows from the work of Aldous and Thorisson, who showed in particular that coupling is strongly linked to mixing properties and the behaviour of the tail σ-field T∞, whereas shift-coupling is linked in a parallel way to ergodicity properties and the behaviour of the invariant σ-field I. The results depend on a deep analysis of the ergodic properties of Markov chains, which we do not develop in detail here, referring to the papers cited for proofs, and to Sigman (1995) for an elaboration of their applications to point processes. The most important results for our purposes are summarized in the proposition below [see in particular Thorisson (1994, Section 4)], where for the sake of completeness Proposition 12.5.V is included.

Proposition 12.5.VII. Suppose that N and N′ are two MPPs on X = R+ × K, and that N′ is stationary.
(A) (Coupling Equivalences) The following statements are equivalent.
(i) N and N′ couple.
(ii) N is strongly asymptotically stationary with limit process N′.
(iii) N and N′ induce the same probability distribution on the tail σ-algebra T∞.
(B) (Shift-Coupling Equivalences) The following statements are equivalent.
(i) N and N′ shift-couple.
(ii) N is strongly (C, 1)-asymptotically stationary with limit process N′.
(iii) N and N′ induce the same probability distribution on the invariant σ-algebra I.

Let us note at least one of the remarkable consequences of these results.

Corollary 12.5.VIII. Weak and strong (C, 1)-asymptotic stationarity coincide.

Proof. Suppose N is weakly (C, 1)-asymptotically stationary with limit process N′, and denote by P, P′ the probability measures they induce on (N#_X, B(N#_X)). Then for any invariant set E, P(E) = Ŝ_x P(E) → P′(E), so P and P′ coincide on I.
Thus condition B(iii) holds, and hence N and N′ shift-couple. The rest of the corollary follows from Proposition 12.5.VII.

In applications we are still left with the problem of identifying situations where coupling holds. For point processes, identifying shift-coupling is generally easier for the associated sequence of intervals than for the continuous-time process, because regeneration points, which often lie at the heart of coupling arguments, are more commonly linked to the intervals. Thus a common approach is to establish shift-coupling for the intervals, and then to refer to the
Palm equivalence between counting and interval properties to establish shift-coupling and associated ergodic results for the continuous-time process. See the further discussion in Section 13.5 and Sigman (1995).

As an example of coupling (rather than shift-coupling) arguments, we outline in the next example the basic argument used by Hawkes and Oakes (1974) to establish convergence to equilibrium of the simple Hawkes process. The extensive work on linear and nonlinear Hawkes processes by Brémaud, Massoulié, and colleagues is outlined in Section 14.7 in conjunction with their 'Poisson embedding' technique.

Example 12.5(c) Convergence to equilibrium of the Hawkes process [continued from Examples 6.3(c), 7.2(b)]. We introduced the Hawkes process at Example 6.3(c) as an example of a Poisson cluster process with a special kind of branching process cluster. Example 7.2(b) then shows that, when started with the zero initial condition at t = 0, the process is characterized by a conditional intensity function of the form
    λ_0(t) = ν + ∫_0^t µ(t − u) N_0(du),   (12.5.7)
where ν is an immigration rate and µ(u) is a density for the intensity measure, assumed to satisfy the condition
    ∫_0^∞ µ(x) dx ≡ ρ < 1.   (12.5.8)
The existence of a stationary version of the process follows from the Poisson clustering representation, or can be established by letting the origin retreat to −∞ in (12.5.7) (see Exercise 12.5.5). We now compare two versions of the process: N_0 starts from 0 at time 0 and follows (12.5.7), whereas N† is a stationary version with the complete conditional intensity

    λ†(t) = ν + ∫_{−∞}^t µ(t − u) N†(du)   (12.5.9)
and mean rate m = ν/(1 − ρ).

For both versions, we consider the effect of shifting the origin forward to s; equivalently, we consider the shifted versions S_s N_0 and S_s N† that bring the origin back to 0. S_s N† can be split into two components: one component has the same structure as S_s N_0, being built up from clusters initiated by immigrants arriving after time −s, and the other component, N†_{−s} say, consists of the 'offspring' of points that occurred before time −s. On R+ the contributions from the latter form a Poisson process whose intensity, conditional on H_{−s}, is given by

    λ†_{−s}(t) = ∫_{−∞}^{−s} µ(t − u) N†_{−s}(du).   (12.5.10)
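The branching (cluster) construction behind (12.5.7)–(12.5.9) can also be simulated directly. The sketch below assumes, purely for illustration, an exponential fertility density µ(x) = ρα e^{−αx}; immigrants arrive as a Poisson process at rate ν, each point produces a Poisson(ρ) number of offspring at exponential lags, and the empirical rate away from the origin is compared with the stationary mean m = ν/(1 − ρ):

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative parameters: immigration rate nu, branching ratio rho < 1
# as in (12.5.8), and exponential fertility density mu(x) = rho*alpha*exp(-alpha*x)
# (the exponential form is an assumption made for this sketch).
nu, rho, alpha = 1.0, 0.5, 2.0
T = 4000.0

# Immigrants (cluster centres) arrive as a Poisson process of rate nu.
points = list(rng.uniform(0.0, T, rng.poisson(nu * T)))

# Each point has a Poisson(rho) number of offspring at exponential lags:
# the branching-cluster representation of the Hawkes process.
queue = list(points)
while queue:
    parent = queue.pop()
    for _ in range(rng.poisson(rho)):
        child = parent + rng.exponential(1.0 / alpha)
        if child < T:
            points.append(child)
            queue.append(child)

# Away from the initial transient, the empirical rate should approach
# the stationary mean rate m = nu / (1 - rho).
pts = np.array(points)
rate = np.sum(pts > 100.0) / (T - 100.0)
m = nu / (1.0 - rho)
```

Discarding the initial stretch [0, 100] crudely imitates the passage to the stationary version N†; the zero initial condition corresponds to N_0.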
Note that from the integrability of µ as at (12.5.8),

    E[λ†_{−s}(t)] = m ∫_{t+s}^∞ µ(x) dx → 0   (s → ∞),

and more generally, for any T < ∞,

    Pr{N†_{−s}(0, T] > 0} = E[1 − exp(−∫_0^T λ†_{−s}(t) dt)]
        ≤ E[∫_0^T λ†_{−s}(t) dt]
        ≤ mT ∫_s^∞ µ(x) dx → 0   (s → ∞).   (12.5.11)
Consider now the corresponding probability measures on N#(R+), P_0 and P† say, but restrict attention to their projections onto N#([0, S]). Using the property ‖P ∗ Q‖ = ‖P‖ · ‖Q‖ of the variation norm, which implies that the contribution from the common distribution of S_s N_0 disappears from the difference below, we have

    ‖Ŝ_s P_0 − P†‖_{[0,S]} = ‖Ŝ_s P_0 − Ŝ_s P†‖_{[0,S]} ≤ Pr{N†_{−s}[0, S] > 0} → 0   (s → ∞).
This implies convergence of the fidi distributions of the two processes on [0, S] for any S > 0, hence their weak convergence, and hence the weak asymptotic stationarity of N_0.

We can extend this result to initial conditions specified by a distribution P^J of a point process N^J on R−, assuming that for t > 0 the process evolves according to the same dynamics as before. As in the previous discussion, for t > 0 the conditional intensity can then be written as the sum of two components, one corresponding to N_0 and deriving from immigrants arriving after t = 0, and the other the residual risk remaining from events occurring before t = 0. Denoting the shifted version of the latter process by N^J_{−s}, we can write its contribution to the conditional intensity as in (12.5.10) but with N^J_{−s} replacing N†_{−s}. Similar arguments to those used before show that, for a given realization of N^J on R−,

    Pr{N^J_{−s}[0, S] > 0} ≤ ∫_{t=0}^S ∫_{−∞}^{−s} µ(t − u) S_s N^J(du) dt = Σ_{i=1}^∞ ∆(s, i),

where ∆(s, i) = ∫_{s−u_i}^{s−u_i+S} µ(x) dx, and the u_i are the points of N^J in R−. If the initial distribution P^J is concentrated on an individual realization, it follows that the resultant process will be weakly asymptotically stationary provided Σ(s) ≡ Σ_i ∆(s, i) < ∞ and Σ(s) → 0 as s → ∞. For under these conditions, comparing the two projections on [0, S], we have

    ‖Ŝ_s P^J − P†‖_{[0,S]} ≤ ‖Ŝ_s P_0 − Ŝ_s P^J‖_{[0,S]} + ‖Ŝ_s P_0 − P†‖_{[0,S]}
        ≤ Pr{N†_{−s}(0, S] > 0} + Pr{N^J_{−s}[0, S] > 0} → 0   (s → ∞).
If the initial distribution P^J is such that N^J has mean density m(u) on R−, then a similar conclusion holds provided

    ∫_{u=−∞}^{−s} m(u) ∫_{t=0}^{s} µ(t − u) dt du → 0   (s → ∞),

which is certainly satisfied if, for example, m(u) is bounded.

Under slightly stronger conditions we can establish strong asymptotic stationarity as well. Suppose that, in addition to (12.5.8), µ(x) also satisfies the condition

    ∫_0^∞ x µ(x) dx < ∞.   (12.5.12)
This condition implies that, if a parent event has a direct offspring, the mean gestation period, that is to say, the mean time to the appearance of the offspring, is finite, and hence, because (12.5.8) implies that the mean number of offspring is also finite, that the random time T from the appearance of an ancestor (cluster centre) to the last of its descendants (last cluster member) is also finite; that is, E(T) < ∞. The probability that a cluster initiated at some time −u ≤ −s still produces some members in R+ is given by 1 − F_T(u) = Pr{T > u}. Treating this as a thinning probability, and using the Poisson character of the arrival of ancestors (cluster centres), the probability that (in the stationary process) at least one of the ancestors arriving before −s produces offspring in R+ is equal to

    Pr{N†_{−s}[0, ∞) > 0} = 1 − exp(−ν ∫_s^∞ [1 − F_T(u)] du).

Because ∫_0^∞ [1 − F_T(u)] du = E(T) < ∞, we have Pr{N†_{−s}[0, ∞) > 0} → 0. The occurrence time T_s of the last point in R+ of a cluster initiated before time −s is therefore a.s. finite, and acts as a coupling time between the processes S_s N_0 and N† on R+. The coupling time inequality (12.5.6a) then yields

    ‖Ŝ_s P_0 − P†‖_{[0,∞)} ≤ 2 Pr{T_s > 0} = 2 Pr{N†_{−s}[0, ∞) > 0} → 0   (s → ∞).
Thus, when the density µ satisfies both (12.5.8) and (12.5.12), the process starting from zero is strongly asymptotically stationary. Again the result can be extended to more general initial conditions N^J, for example, when

    ∫_{t=0}^∞ ∫_{−∞}^0 µ(t − u) N^J(du) dt = Σ_j σ(|u_j|) < ∞,   (12.5.14)

where σ(u) = ∫_u^∞ µ(x) dx and {u_j} enumerates the points of N^J over (−∞, 0). The arguments are similar to those used before, and are outlined in Exercise 12.5.6. Note that the argument used above to derive strong asymptotic stationarity is not peculiar to Hawkes processes, but holds for a wide range of Poisson cluster processes. Exercise 12.5.7 gives some details and examples.
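As a concrete instance of condition (12.5.14), take the exponential density µ(x) = ρα e^{−αx}, for which σ(u) = ρ e^{−αu}, and an initial realization with points u_j = −j; the sum Σ_j σ(|u_j|) is then a convergent geometric series. A quick numerical check (all choices illustrative):

```python
import numpy as np

# Illustrative choices: mu(x) = rho * alpha * exp(-alpha * x), so that
# sigma(u) = int_u^inf mu(x) dx = rho * exp(-alpha * u).
rho, alpha = 0.5, 2.0

def sigma(u):
    return rho * np.exp(-alpha * u)

u = np.arange(1, 10_000)            # |u_j| for the initial points u_j = -j
total = sigma(u).sum()              # partial sum of the series in (12.5.14)
exact = rho * np.exp(-alpha) / (1.0 - np.exp(-alpha))   # geometric limit
```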
Exercises and Complements to Section 12.5

12.5.1 Prove the statement above Definition 12.5.I, that a process stationary under positive shifts on the half-line can be extended to a process stationary on the whole line. [Hint: Consider {S_u P: u > 0}.]

12.5.2 Generalized stress release model. Following Last (2004), consider a piecewise linear Markov process X(t) determined by the following components: a positive constant ν representing the linear increase of X(t) between jumps; a locally bounded risk function Ψ(x): R → R+ representing the instantaneous rate of occurrence of jumps given that X(t−) = x; and a stochastic kernel J(x, B) representing the probability of a jump into B ∈ B_{R+} given X(t) = x ∈ R.
(a) Show that, as for the BMAP processes, the process X(t) uniquely determines the associated MPP, say N_X ≡ {(t_n, κ_n)}, of jump times and jump sizes, and that conversely knowledge of the MPP on R+ determines X(t) uniquely up to the initial value X(0).
(b) Show also that if X(t) is positive recurrent, in the sense that, if X(t) has distribution Π_t on R+, there exists a stationary distribution Π∗ such that ‖Π_t − Π∗‖ → 0 as t → ∞, then N_X is strongly asymptotically stationary.
(c) Show that if X(t) is 'geometrically ergodic' in the strong sense that ‖Π_t − Π∗‖ ≤ C exp(−βt) for positive constants C, β, then N_X is also geometrically asymptotically stationary in the sense that, in Definition 12.5.I, ‖P_u − P∗‖ ≤ C′ exp(−β′u) for some constants C′, β′. [Hint: Use the norm-preserving property of the stochastic kernel, as in establishing (12.5.5).]

12.5.3 (Continuation). Application to reliability models. For each of the examples listed below, identify the form of the components, and find sufficient conditions to ensure that the resulting process X(t) is (i) well-defined, (ii) positive recurrent, and (iii) geometrically ergodic.
(a) The simple stress release model of Example 7.2(g) and Exercises 7.2.9–10.
(b) The repairable system model [e.g., Block et al.
(1985), Kijima (1989), and Last and Szekli (1998)], in which X(t) denotes the ‘virtual age’ of a system subject to failure, and after every failure an instantaneous repair takes place, restoring the system to some fraction 0 ≤ θ < 1 of its ‘age’ before failure. (c) The workload-dependent queueing process [e.g., Browne and Sigman (1992), Meyn and Tweedie (1993b)], in which W (t) = max{−X(t), 0} decreases linearly between jumps until either another jump (upwards) occurs, or W (t) = 0 in which case it remains zero until the next jump occurs. W (t) here can be interpreted as the workload in a queueing system in which both the arrival and the service rates may depend on the current value of W (t). [Hint: The basic aim is to find variations on condition (7.1.3) that will allow Foster–Lyapunov drift conditions, as developed in Meyn and Tweedie, to be applied to the process in question. Once convergence of the Markov process has been established, Proposition 12.5.III can be invoked to transfer the results to the associated point process. Last (2004) gives a very general discussion, as well as applications to the specific examples mentioned.] 12.5.4 Investigate conditions under which asymptotic stationarity of the point process implies, in an appropriate sense, asymptotic convergence of its first,
second, and higher moment measures to the moment measures of the stationary version of the process.
12.5.5 Consider the Hawkes process of Example 12.5(c). Show that, for the process started from zero as in (12.5.7), there is an instantaneous mean rate m(t) = E[λ_0(t)] which is always finite and satisfies m(t) ≤ m = ν/(1 − ρ). Imitate the arguments leading to (12.5.11) to show that the family Ŝ_s P_0 forms a Cauchy sequence, in variation norm, of point process distributions on [0, S], and so implies the existence of a limit point process on [0, S], which in turn, because S is arbitrary, implies the existence of the distribution P† of an equilibrium process N† on R+ such that Ŝ_s P_0 → P† weakly as s → ∞.

12.5.6 Let N^J be an initial realization on R− satisfying (12.5.14), and suppose that (12.5.12) holds. Show that the process started from N^J is strongly asymptotically stationary. [Hint: Extend the arguments below (12.5.12) to show that, for both S_s N^J and S_s N†, the probability that the residual component on R+ is nonempty converges to zero, so that both couple to S_s N_0.]

12.5.7 Convergence to equilibrium of Poisson cluster processes.
(a) Consider a Poisson cluster process as in Proposition 6.3.III with constant intensity µ_c for the cluster centres, and stationary cluster structure such that the time T from the first to the last member of the cluster satisfies E(T) < ∞. Imitate the last part of the discussion in Example 12.5(c) to show that the process is strongly asymptotically stationary.
(b) Show that the conditions are satisfied for the Neyman–Scott process whenever both the number N of points and the distance X of a satellite point from the cluster centre have finite (absolute) first moments. Similarly, for the Bartlett–Lewis model, show that the conditions are satisfied whenever the number of points N and the distance X between successive cluster points both have finite means.

12.5.8 (Continuation).
Investigate conditions under which the strong asymptotic stationarity for Poisson cluster processes can be strengthened to geometric asymptotic stationarity as in Exercise 12.5.2(c).
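A minimal simulation sketch for the generalized stress-release model of Exercises 12.5.2–3 may help fix ideas. It takes the illustrative choices Ψ(x) = e^{βx} and i.i.d. exponential jump sizes (neither is prescribed by the exercise), and draws the waiting time to the next jump exactly by inverting the survival function exp(−∫_0^t Ψ(x_0 + νs) ds):

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative ingredients (not prescribed by the exercise): loading
# rate nu, risk function Psi(x) = exp(beta * x), exponential jump sizes.
nu, beta, mean_jump = 1.0, 1.0, 1.5

def next_jump_time(x0):
    """Exact waiting time to the next jump when X(t) = x0 + nu*t between
    jumps: invert exp(-Lambda(t)) with Lambda(t) = int_0^t Psi(x0 + nu*s) ds."""
    e = rng.exponential()            # unit exponential, plays the role of -log U
    return np.log1p(beta * nu * e * np.exp(-beta * x0)) / (beta * nu)

x, t, jump_times = 0.0, 0.0, []
while t < 500.0:
    w = next_jump_time(x)
    t += w
    x += nu * w                      # linear stress accumulation
    x -= rng.exponential(mean_jump)  # stress released at the jump
    jump_times.append(t)

# Long-run balance: jumps occur at a rate close to nu / mean_jump.
rate = len(jump_times) / t
```

The drift balance (loading ν against mean release) is the heuristic behind the Foster–Lyapunov conditions mentioned in the hint to Exercise 12.5.3.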
12.6. Moment Stationarity and Higher-order Ergodic Theorems

The essential simplification of the moment structure implied by stationarity derives from the application of Lemma A2.7.II as in the diagonal shifts Lemma 12.1.IV. It amounts to a diagonal factorization: each moment measure is represented as a product of Lebesgue measure along the main diagonal and a reduced measure in a complementary subspace. These reduced measures determine the moment structure of the process and have long been studied, usually as densities, in applications. They appear in many different guises, notably as the moment measures of the Palm distributions in Chapter 13, and in the higher-order ergodic theorems discussed at the end of this section and
again in Section 13.4. Their Fourier transforms provide the various spectra of the random measure, the second-order version (Bartlett spectrum) being discussed in detail in Chapter 8. The role of the factorization theorem in the present context emerged in the work of Brillinger (1972) and Vere-Jones (1971), being subsumed by the more general results on the disintegration of moment measures developed in Krickeberg (1974b).

We start with some definitions, supposing again that X = Rd, and writing T_x for the shift operator as in (12.1.1). Although we include MPPs in the exposition below, for which stationarity was noted below Definition 12.1.II, the resulting expressions are more involved and are left largely as exercises.

Definition 12.6.I. (a) A random measure or point process on X = Rd is kth-order stationary if its kth moment measure exists, and for each j = 1, . . . , k, bounded Borel sets A_1, . . . , A_j, and x ∈ Rd,

    M_j(T_x A_1 × · · · × T_x A_j) = M_j(A_1 × · · · × A_j).   (12.6.1)
(b) An MPP on Rd × K is kth-order stationary if its kth moment measure exists, and for each j = 1, . . . , k, bounded Borel sets A_1, . . . , A_j and K_1, . . . , K_j ∈ B_K, M_j(T_x A_1 × K_1 × · · · × T_x A_j × K_j) is independent of x ∈ Rd.

If ξ is a stationary random measure, the joint distributions of {ξ(T_x A_1), . . . , ξ(T_x A_j)} coincide with those of {ξ(A_1), . . . , ξ(A_j)} (see around Definition 12.1.II), so that a stationary random measure for which the kth-order moment measure exists is kth-order stationary. The converse implication is not true in general (see Exercise 8.1.1), but in particular parametric models, moment stationarity, even of relatively low order, generally requires stationarity of the process as a whole. For example, a Poisson process is stationary if and only if it is first-order stationary. The imposition of conditions on M_j for j < k in Definition 12.6.I is certainly redundant in the case of a simple point process, for the lower-order moment measures appear as diagonal concentrations (see Proposition 9.5.II) and are thereby identified uniquely (see Exercise 9.5.7). It may be redundant more generally, but the question appears to be open. It is relatively easy, however, to find a process for which the second cumulant measure is stationary but the expectation measure is not (see Exercise 8.1.2).

The case k = 1 of the condition (12.6.1) simply asserts that the expectation measure M_1(·) is invariant under shifts. It must therefore reduce to a multiple of the unique measure on Rd with this property, namely, Lebesgue measure. We thus have the following proposition, which incorporates parts of Propositions 6.1.I, 8.1.I and 8.3.II.

Proposition 12.6.II. (a) A random measure on Rd is first-order stationary if and only if its expectation measure is a finite positive multiple m (the mean density) of Lebesgue measure on Rd.
(b) A marked random measure or MPP on Rd × K is first-order stationary if and only if its expectation measure is a product ℓ × F of Lebesgue measure ℓ on Rd and a boundedly finite measure F on B_K; F is totally finite if and only if the expectation measure of the ground process has finite mean density m_g, and m_g = F(K).

The proportionality constant referred to in this proposition is usually denoted m and called either the mean density or the rate of the process. For k > 1, the conditions at (12.6.1) imply, via the generating properties of rectangle sets, that the whole measure M_k is invariant under the group of diagonal shifts D_x^{(k)} defined by (12.1.11). The diagonal shifts Lemma 12.1.IV now implies the following proposition for the unmarked case. The corresponding results for the marked case are outlined in Exercise 12.6.9.

Proposition 12.6.III. For any kth-order stationary random measure or point process on Rd, there exist reduced measures M̆_k, M̆_[k], C̆_k, and C̆_[k] related to the corresponding kth-order measures M_k, M_[k], C_k, and C_[k] through equations, valid for any function f ∈ BM(X^{(k)}), of the type

    ∫_{X^{(k)}} f(x_1, . . . , x_k) M_k(dx_1 × · · · × dx_k)
        = ∫_X dx ∫_{X^{(k−1)}} f(x, x + y_1, . . . , x + y_{k−1}) M̆_k(dy_1 × · · · × dy_{k−1}).   (12.6.2)
We call M̆_k, M̆_[k], C̆_k, and C̆_[k] the reduced kth-order moment measure, the reduced kth-order factorial moment measure, the reduced kth-order cumulant measure, and the reduced kth-order factorial cumulant measure, respectively; see Proposition 13.2.VI for their interpretation as moment measures of the Palm distribution. For k = 1 these reduced measures all coincide and equal the mean density m = M_1(U_d). For k = 2 we mostly use C̆_2, which we also call the reduced covariance measure. It is defined on B_X, and its properties and applications form the main content of Chapter 8. Note that the disintegration furnished by (12.6.2) is of the form

    M_k = ℓ × M̆_k,   (12.6.3)

where ℓ denotes standard Lebesgue measure on Rd (so ℓ(U_d) = 1), and thus any scale factors remain in the reduced measure. The same disintegration result can also be obtained via an argument involving Radon–Nikodym derivatives with respect to the first moment measure, as in Exercises 12.1.8–9. This alternative approach is outlined in Exercises 12.6.1–2 and leads to a decomposition of the form

    M_k = M_1 × (m^{−1} M̆_k)

with M_1 = mℓ. This and its role in Palm theory has led some authors to adopt m^{−1} M̆_k as the definition of the reduced measure; we have preferred not to adopt this convention, mainly because of its incompatibility with the usual definition of the stationary form of the covariance function when the measure is absolutely continuous.
Some of the more accessible properties of the reduced moment measures are given in the next proposition; analogous statements for factorial moment and cumulant measures can be given (see Exercise 12.6.3). A more extended list of properties for the case k = 2 is given in Proposition 8.1.II for unmarked point processes and in Proposition 8.3.II for MPPs.

Proposition 12.6.IV. Let M̆_k be the kth-order reduced moment measure for the kth-order stationary random measure ξ on X = Rd.
(i) M̆_k(·) is a symmetric measure on X^{(k−1)} (invariant under permutations of the arguments in the product space) and is invariant also under the 'shift reflection' transformation mapping (u_1, u_2, . . . , u_{k−1}) into (−u_1, u_2 − u_1, . . . , u_{k−1} − u_1).
(ii) When M_k is absolutely continuous with density m_k, M̆_k is also absolutely continuous, and its density m̆_k is related to m_k by

    m_k(x_1, x_2, . . . , x_k) = m̆_k(x_2 − x_1, . . . , x_k − x_1).   (12.6.4)

(iii) For all bounded Borel sets A_1, . . . , A_{k−1} ∈ B_X,

    M̆_k(A_1 × · · · × A_{k−1}) = E[ ∫_{U_d} ξ(x + A_1) . . . ξ(x + A_{k−1}) ξ(dx) ].   (12.6.5)
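Formula (12.6.5) suggests a direct Monte Carlo estimator for the reduced moment measures. For a stationary Poisson process on R with rate m, the reduced second moment measure away from its atom at the origin is m²ℓ(·); the sketch below (illustrative parameters) averages the statistic ∫_{U_d} ξ(x + A) ξ(dx) over independent realizations:

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative check of (12.6.5) for a stationary Poisson process on R
# with rate m: for a bounded set A avoiding the origin, the reduced
# second moment measure gives M2_red(A) = m^2 * ell(A).
m = 2.0
A = (0.5, 1.5)                        # the set A = (0.5, 1.5], ell(A) = 1
reps, total = 4000, 0.0
for _ in range(reps):
    # Simulate on [-1, 3] so that x + A stays inside the window for x in [0, 1).
    pts = rng.uniform(-1.0, 3.0, rng.poisson(m * 4.0))
    for x in pts[(pts >= 0.0) & (pts < 1.0)]:
        total += np.sum((pts > x + A[0]) & (pts <= x + A[1]))

estimate = total / reps               # Monte Carlo value of the rhs of (12.6.5)
exact = m * m * (A[1] - A[0])
```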
Proof. In (12.6.2) set f(x_1, . . . , x_k) = g(x_1) h(x_2 − x_1, . . . , x_k − x_1), where g(·) and h(·) are bounded Borel functions of bounded support on X and X^{(k−1)}, respectively, so that

    ∫_{X^{(k)}} g(x_1) h(x_2 − x_1, . . . , x_k − x_1) M_k(dx_1 × · · · × dx_k)
        = ∫_X g(x) dx ∫_{X^{(k−1)}} h(y_1, . . . , y_{k−1}) M̆_k(dy_1 × · · · × dy_{k−1}).   (12.6.6)

Now let the variables x_2, . . . , x_k in the argument of h(·) on the left-hand side be permuted. Because of the symmetry properties of M_k(·), this leaves the integral unaltered. Observe also that it corresponds to permuting the variables y_1, . . . , y_{k−1} in the argument of h(·) on the right-hand side of (12.6.6). Equivalently, it corresponds to leaving the variables in h(·) unaltered and permuting the variables in M̆_k. Because a measure on X^{(k−1)} is uniquely determined by the integrals of all such functions h(·), it follows that M̆_k must be invariant under permutations of its arguments; that is, it is symmetric.

Alternatively, if we interchange x_1 and x_2 on the left-hand side of (12.6.6), the integral is unaltered, and from (12.6.3) the right-hand side becomes

    ∫_X dx ∫_{X^{(k−1)}} g(x + y_1) h(−y_1, y_2 − y_1, . . . , y_{k−1} − y_1) M̆_k(dy_1 × · · · × dy_{k−1}).

But ∫_X g(x + y_1) dx = ∫_X g(x) dx, so we conclude that

    ∫_{X^{(k−1)}} h(y_1, . . . , y_{k−1}) M̆_k(dy_1 × · · · × dy_{k−1})
        = ∫_{X^{(k−1)}} h(−y_1, y_2 − y_1, . . . , y_{k−1} − y_1) M̆_k(dy_1 × · · · × dy_{k−1}),

from which there follows the shift reflection invariance assertion in (i).

If M_k has density m_k with respect to Lebesgue measure on Rdk, invariance of M_k implies that m_k(x, x + y_1, . . . , x + y_{k−1}) is independent of x, so on cancelling the factor ∫_X g(x) dx in (12.6.6) we obtain

    ∫_{X^{(k−1)}} h(y_1, . . . , y_{k−1}) M̆_k(dy_1 × · · · × dy_{k−1})
        = ∫_{X^{(k−1)}} h(y_1, . . . , y_{k−1}) m_k(x, x + y_1, . . . , x + y_{k−1}) dy_1 . . . dy_{k−1}.

Thus, M̆_k is absolutely continuous with density m_k(x, x + y_1, . . . , x + y_{k−1}), which is equivalent to the assertion (ii).

Finally, in (12.6.6) set g(x) = I_{U_d}(x) and h(y_1, . . . , y_{k−1}) = Π_{j=1}^{k−1} I_{A_j}(y_j). Then because ∫_X g(x) dx = ℓ(U_d) = 1, (12.6.5) follows directly.

Although the reduced measures M̆_k are necessarily nonnegative, the same is not true of the reduced cumulant measures C̆_k. In the simplest nontrivial case, C̆_2(A) = M̆_2(A) − m²ℓ(A) [see (8.1.6)], so that for its Jordan–Hahn decomposition C̆_2 = C̆_2^+ − C̆_2^− into positive and negative parts, we have

    C̆_2^−(A) ≤ m²ℓ(A) < ∞   (bounded A ∈ B_X).
For the simple (stationary) Poisson process, M̆_2(A) = m²ℓ(A) + mδ_0(A), so C̆_2^+(A) = mδ_0(A) and C̆_2^−(A) = 0; but for the stationary deterministic process on R with span a as in Example 8.3(e), C̆_2^+ consists of atoms of mass 1/a at the points ka (k ∈ Z), and C̆_2^−(A) = ℓ(A)/a² for A ∈ B_R. Thus, although this process has 0 ≤ var N(0, x] ≤ 1/4, neither of C̆_2^+ and C̆_2^− is totally finite.

The identification at (12.6.5) of M̆_k(·) as an expectation suggests the existence of higher-order ergodic theorems in which the reduced moment measures appear as the ergodic limits. To identify the limits in the nonergodic situation, we use the following application of Lemma 12.2.III, where I again denotes the σ-algebra of invariant events.

Lemma 12.6.V. Let ξ be a strictly stationary random measure with finite kth moment measure. Then there exists a symmetric I-measurable random measure ψ̆_k on X^{(k−1)}, invariant also under the shift reflections of Proposition 12.6.IV, such that for bounded Borel functions f of bounded support on X^{(k)},

    E[ ∫_{X^{(k)}} f(x_1, . . . , x_k) ξ(dx_1) . . . ξ(dx_k) | I ]
        = ∫_X dx ∫_{X^{(k−1)}} f(x, x + y_1, . . . , x + y_{k−1}) ψ̆_k(dy_1 × · · · × dy_{k−1}).   (12.6.7)
In particular, for bounded $A_1, \dots, A_{k-1} \in \mathcal{B}_X$,
$$\breve\psi_k(A_1 \times \cdots \times A_{k-1}) = \mathrm{E}\biggl[\int_{U_d} \xi(x+A_1)\dots\xi(x+A_{k-1})\,\xi(dx) \biggm| \mathcal{I}\biggr]. \tag{12.6.8}$$
Proof. Represent $X^{(k)}$ in the product form $X \times X^{(k-1)}$ via the mapping (12.1.13). On the space $X^{(k)}$, $\xi$ induces a new random measure, namely, the $k$-fold product $\xi^{(k)}$ of $\xi$ with itself, and $\xi^{(k)}$ is stationary with respect to diagonal shifts. Its image under (12.1.13) is therefore stationary with respect to shifts in the first component. We now have a situation to which the general result in Lemma 12.2.III applies, with $X^{(k-1)}$ playing the role of the mark space $K$. On the product space $X \times X^{(k-1)}$ there exists a $\sigma$-algebra of sets invariant under shifts in the first component, and the image of $\xi^{(k)}$ under (12.1.13) has a conditional expectation with respect to this $\sigma$-algebra, which factorizes into a product of Lebesgue measure on $X$ and an $\mathcal{I}$-measurable random measure $\breve\psi_k$ on $X^{(k-1)}$, which is readily checked to have the properties described in the lemma.

Before proceeding to the ergodic theorems, we give a further example, albeit somewhat artificial in character, to illustrate some of the types of behaviour that can occur in the nonergodic case.

Example 12.6(a) Poisson cluster process with dependent clusters. Suppose that $X = \mathbb{R}$ and that cluster centres occur at rate $\lambda$. Set up a common pattern for the clusters from a fixed realization $\{y_1,\dots,y_Z\}$ of a finite Poisson process on $\mathbb{R}$ with a nonatomic parameter measure $\mu(\cdot)$, so that $Z$ is a Poisson r.v. with mean $\mathrm{E}(Z) = \mu(\mathbb{R}) = \nu$ and, conditional on $Z$, the r.v.s $y_1,\dots,y_Z$ are i.i.d. with distribution $\mu(\cdot)/\nu$. Then, given a realization $\{x_i\}$ of the cluster centre process, we associate with the cluster centre $x_i$ the cluster $\{x_i + y_j : j = 1,\dots,Z\}$, so that the whole process has as its realization the points $\{x_i + y_j : i = 0, \pm1, \dots;\ j = 1,\dots,Z\}$. The r.v.s $\{Z, y_1,\dots,y_Z\}$ define a $\sigma$-algebra of events, which, in fact, coincides with the invariant $\sigma$-algebra $\mathcal{I}$ for the whole process. We can then compute moment characteristics of the process as follows.
(1°) For $k = 1$, the mean density given the invariant $\sigma$-algebra $\mathcal{I}$, that is, the r.v. $Y$ of (12.2.12), here equals $\lambda Z$; the mean density of the whole process equals $m = \mathrm{E}(\lambda Z) = \lambda\mu(\mathbb{R}) = \lambda\nu$.

(2°) For $k = 2$ and before reduction, the second-order moment measure given $\mathcal{I}$ has three components: a multiple $\lambda^2 Z^2$ of Lebesgue measure in the plane; a line concentration with density $\lambda Z$ along the main diagonal $x = y$; and line concentrations of density $\lambda$ along each of the $Z(Z-1)$ lines $y = x + y_i - y_j$, where $i \ne j$ but both orderings are permitted. Then the reduced moment measure $\breve\psi_2(\cdot)$ on $\mathcal{B}_{\mathbb{R}}$ can be written
$$\breve\psi_2(du) = \lambda^2 Z^2\,du + \lambda Z\,\delta(u)\,du + \lambda \sum_{i \ne j} \delta(u - y_i + y_j)\,du,$$
where the $\delta$-function terms represent atoms at 0 and the points $\pm|y_i - y_j|$ ($i \ne j$). Taking expectations leads to the reduced second moment measure:
$$\breve M_2(du) = \lambda^2\nu(\nu+1)\,du + \lambda\nu\,\delta(u)\,du + \lambda \int_{\mathbb{R}} \mu(x + du)\,\mu(dx).$$
(3°) Third- and higher-order moments can be built up in a similar way by considering all possible locations of triples of points $\{y_i\}$ and so on.

Observe that $\breve\psi_2$ is just the form that the reduced moment measure would take if the cluster structure were fixed for all realizations: the process would then be infinitely divisible and ergodic.

A variant of this Poisson cluster process, but having conditionally independent clusters, can be obtained by treating the clusters as a Cox process directed by some a.s. finite random measure $\xi$ replacing the fixed measure $\mu(\cdot)$ above: regard $\xi$ as fixed for any given realization, with the points in each cluster now being determined mutually independently according to a Poisson process with parameter measure $\xi(\cdot)$. Then $\breve M_2$ as above equals the random measure $\breve\psi_2$ of this process, and the new $\breve M_2$ is obtained by a further averaging over the realizations of $\xi$. Further variants of the model are possible.

We are now in a position to state the higher-order version of the ergodic Theorem 12.2.IV [see also Nguyen and Zessin (1976)]. Extensions to the marked case of Lemma 12.6.V and the result below are outlined in Exercise 12.6.10. For further extensions see Sections 13.4–5, especially Propositions 13.4.I and 13.4.III.

Theorem 12.6.VI. Let $\xi$ be a strictly stationary random measure for which the $k$th moment measure exists, $\breve\psi_k$ the invariant random measure defined by (12.6.8), and $B_1,\dots,B_{k-1}$ a family of bounded Borel sets in $\mathbb{R}^d$. Then, for any convex averaging sequence $\{A_n\}$ in $\mathbb{R}^d$, as $n \to \infty$,
$$\frac{1}{\ell(A_n)} \int_{A_n} \xi(x+B_1)\dots\xi(x+B_{k-1})\,\xi(dx) \;\overset{\text{a.s.}}{\to}\; \breve\psi_k(B_1 \times \cdots \times B_{k-1}). \tag{12.6.9}$$
In particular, if $\xi$ is ergodic,
$$\frac{1}{\ell(A_n)} \int_{A_n} \xi(x+B_1)\dots\xi(x+B_{k-1})\,\xi(dx) \;\overset{\text{a.s.}}{\to}\; \breve M_k(B_1 \times \cdots \times B_{k-1}). \tag{12.6.10}$$
Proof. Given a bounded Borel function $g$ of bounded support on $X^{(k)}$, consider the random function
$$f(\xi) \equiv \int_{X^{(k)}} g(x_1,\dots,x_k)\,\xi(dx_1)\dots\xi(dx_k),$$
noting that, by assumption, we have $\mathrm{E}[f(\xi)] < \infty$.
Appealing to Proposition 12.2.II(a) and evaluating the limit in (12.2.6) from (12.6.7), we obtain, for $n \to \infty$,
$$\frac{1}{\ell(A_n)} \int_{A_n} f(S_x\xi)\,dx = \frac{1}{\ell(A_n)} \int_{A_n} dx \int_{X^{(k)}} g(u_1 - x, \dots, u_k - x)\,\xi(du_1)\dots\xi(du_k)$$
$$\overset{\text{a.s.}}{\to} \int_X dx \int_{X^{(k-1)}} g(x, x+y_1, \dots, x+y_{k-1})\,\breve\psi_k(dy_1 \times \cdots \times dy_{k-1}). \tag{12.6.11}$$
In particular, by taking $g(x_1,\dots,x_k) = g_\varepsilon(x_1)\,h(x_2 - x_1, \dots, x_k - x_1)$, where $g_\varepsilon(\cdot)$ has the same properties as in the proof of Theorem 12.2.IV, it follows, for example, that $\int_{A_n^\varepsilon} f(S_x\xi)\,dx$ equals
$$\int_{X^{(k)}} \biggl(\int_{A_n^\varepsilon} g_\varepsilon(u_1 - x)\,dx\biggr) h(u_2 - u_1, \dots, u_k - u_1)\,\xi(du_1)\dots\xi(du_k)$$
$$= \int_X \int_{X^{(k-1)}} \biggl(\int_{A_n^\varepsilon} g_\varepsilon(u_1 - x)\,dx\biggr) h(v_1,\dots,v_{k-1})\,\xi(du_1)\,\xi(u_1 + dv_1)\dots\xi(u_1 + dv_{k-1})$$
$$\ge \int_{A_n} \xi(du_1) \int_{X^{(k-1)}} h(v_1,\dots,v_{k-1})\,\xi(u_1 + dv_1)\dots\xi(u_1 + dv_{k-1}).$$
Thus, we can use the approximation argument exploited in Theorem 12.2.IV to deduce that, for nonnegative bounded functions $h$ of bounded support in $X^{(k-1)}$, as $n \to \infty$,
$$\frac{1}{\ell(A_n)} \int_{A_n} \xi(du) \int_{X^{(k-1)}} h(v_1,\dots,v_{k-1})\,\xi(u + dv_1)\dots\xi(u + dv_{k-1}) \;\overset{\text{a.s.}}{\to}\; \int_{X^{(k-1)}} h(v_1,\dots,v_{k-1})\,\breve\psi_k(dv_1 \times \cdots \times dv_{k-1}). \tag{12.6.12}$$
Equations (12.6.9) and (12.6.10) are now easily derived as special cases of (12.6.12). It is, of course, a corollary of (12.6.8) that
$$\mathrm{E}\bigl[\breve\psi_k(B_1 \times \cdots \times B_{k-1})\bigr] = \breve M_k(B_1 \times \cdots \times B_{k-1}).$$
An $L^1$ version of Theorem 12.6.VI is given in Exercise 12.6.8. For point processes, the left-hand side of (12.6.10) suggests a natural class of nonparametric estimates for the reduced moment measures, as for example the estimate
$$\widehat M_{[2]}(B; A) = \frac{1}{\ell(A)} \sum_{i:\, x_i \in A} N^*(x_i + B), \tag{12.6.13}$$
where $N^*(B) = N(B) - \delta_0(B)$, introduced at (8.1.25) in discussing second-order factorial moment measures. In practice, estimates of this kind are
subject to serious biases arising from edge effects, as discussed briefly around (8.1.26) and in greater detail in texts such as SKM (1995). In such contexts, Theorem 12.6.VI provides a starting point for proving the consistency of the estimates, or of variants in which $\ell(A)$ is replaced by $N(A)$, itself representing an estimate of $M_1(A) = m\ell(A)$. The resulting quantity can be written
$$\widehat M_{[2]}(B; A) \approx \frac{m}{N(A)} \sum_{i=1}^{N(A)} N^*(x_i + B) \tag{12.6.14}$$
and represents $m$ times the average, over the points of $A$, of the counts of points in sets $B$ relative to the points of $A$ as origin. As such, the average is a point estimate of the first-order moment measure of the Palm process associated with $N$, as discussed further in Section 13.4. Again, Theorem 12.6.VI provides the basis for proving the consistency of such estimates, and is so used in the discussion of fractal dimension in Section 13.6.

Often, the natural interpretation of estimates such as (12.6.13) is in terms of point configurations. For example, in the case $k = 3$, the third-order factorial moment measure gives information about the occurrence of triplets of points of the realization, taking one point of the triplet as origin. In the discussion of Section 13.6, for example, use is made of the sets $B_{k-1,r} = \{(u_1,\dots,u_{k-1}): \max_j \|u_j\| < r\}$, which for $k = 3$ give information about the proportion of triplets in the realization with the property that all three points of the triplet lie within a maximum distance $r$ of one of the three points. Kagan (see Exercise 12.6.11) has used estimates of two-, three- and four-point configurations at ‘small’ scale in examining possible relations between shocks and aftershocks in earthquake studies.

As for finite processes in Chapter 5, moment densities are often used as an aid to understanding both qualitative and quantitative behaviour of models, as we illustrate in our concluding example.

Example 12.6(b) Interacting point processes. Suppose given two stationary simple point processes $N_j$ ($j = 0, 1$) with mean densities $m_j$. They evolve independently except that each successive point $t_i$ of the process $N_1$ is followed by a dead time $Z_i$, during which any point of the process $N_0$ is deleted. Suppose that $Z_i = \min(Y_i, t_{i+1} - t_i)$, where $\{Y_i\}$ is a sequence of i.i.d. nonnegative r.v.s independent of both $N_0$ and $N_1$. We observe $N_1$ and the thinned process $N_2$ consisting of those points of $N_0$ that are not deleted.
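The construction just described is easy to simulate. The Python sketch below is not part of the original development: all parameter values are illustrative assumptions, with both $N_0$ and $N_1$ taken to be Poisson and the $Y_i$ exponential (the special case of Exercise 12.6.12). It builds the dead times, forms the thinned process $N_2$, and compares its empirical rate with the fraction of time the indicator process is alive, times $m_0$.

```python
import numpy as np

rng = np.random.default_rng(42)

T = 200_000.0                      # observation window length (illustrative)
lam0, lam1, mu = 2.0, 1.0, 1.5     # rates of N0 and N1, and rate of the exponential Y_i

# Points of the two independent Poisson processes on (0, T].
t1 = np.cumsum(rng.exponential(1.0 / lam1, size=int(1.5 * lam1 * T)))
t1 = t1[t1 <= T]
t0 = np.cumsum(rng.exponential(1.0 / lam0, size=int(1.5 * lam0 * T)))
t0 = t0[t0 <= T]

# Dead times Z_i = min(Y_i, t_{i+1} - t_i); the final gap is truncated at T.
Y = rng.exponential(1.0 / mu, size=len(t1))
gaps = np.diff(np.append(t1, T))
Z = np.minimum(Y, gaps)

# A point of N0 is deleted iff it falls in some interval (t_i, t_i + Z_i].
idx = np.searchsorted(t1, t0, side="left") - 1   # last N1-point before each N0-point
dead = (idx >= 0) & (t0 - t1[idx] <= Z[idx])
t2 = t0[~dead]                                   # the thinned process N2

alpha_hat = 1.0 - Z.sum() / T                    # empirical alive fraction of J
m2_hat = len(t2) / T
print(alpha_hat, m2_hat, alpha_hat * lam0)
```

With these exponential choices the alive fraction should be close to $\mu/(\lambda_1+\mu)$, the value derived in Exercise 12.6.12, and the empirical rate of $N_2$ close to $\alpha m_0$, consistent with the mean-density calculation below.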
Our aim is to describe the first and second moment measures of the output $(N_1, N_2)$, particularly as they relate to the same measures for $N_0$. To this end it is convenient to use the $\{0,1\}$-valued process $J(t)$ for which
$$J(t) = \begin{cases} 0 & \text{if } 0 < t - t_i \le Z_i \text{ for some } i, \\ 1 & \text{otherwise (so that } t \in \bigcup_i (t_i + Z_i,\, t_{i+1}]). \end{cases}$$
Because the marked point process $\{t_i, (Y_i, t_{i+1} - t_i)\}$ on $\mathbb{R} \times \mathbb{R}_+^2$ is stationary whenever $N_1$ (with realizations the points $\{t_i\}$) is stationary, it follows that $J(\cdot)$ is then itself a stationary process, with first and second moments
$$\alpha = \mathrm{E}[J(t)] \quad (\text{all } t), \qquad \beta(u) = \mathrm{E}[J(t)J(t+u)] \quad (\text{all } t, u).$$
Furthermore, $J$ is determined by $N_1$ and $\{Y_i\}$, and is thus independent of $N_0$. Consequently, from $N_2(dx) = J(x)\,N_0(dx)$ it follows that $N_2$ has mean density $m_2$ given by
$$m_2 = \mathrm{E} N_2(0,1] = \mathrm{E}\biggl[\int_0^1 J(x)\,N_0(dx)\biggr] = \int_0^1 \mathrm{E}[J(x)]\,m_0\,dx = \alpha m_0.$$
In addition, for bounded measurable $h$ of bounded support in $\mathbb{R}^2$, and writing $D = \{(x,y): x = y\}$ for the diagonal of $\mathbb{R}^2$, the second factorial moment measure $M_{[2]}^{(2)}(\cdot)$ of $N_2$ satisfies
$$\int_{\mathbb{R}^2 \setminus D} h(x,y)\,M_{[2]}^{(2)}(dx \times dy) = \mathrm{E}\biggl[\int_{\mathbb{R}^2 \setminus D} h(x,y)\,N_2(dx)\,N_2(dy)\biggr] = \mathrm{E}\biggl[\int_{\mathbb{R}^2 \setminus D} h(x,y)\,J(x)J(y)\,N_0(dx)\,N_0(dy)\biggr].$$
Thus, when $N_0$ has a reduced factorial moment density $\breve m_{[2]}(\cdot)$, we have
$$\mathrm{E}\biggl[\int_{\mathbb{R}^2 \setminus D} h(x,y)\,J(x)J(y)\,\breve m_{[2]}(x-y)\,dx\,dy\biggr] = \int_{\mathbb{R}^2 \setminus D} h(x,y)\,\beta(x-y)\,\breve m_{[2]}(x-y)\,dx\,dy,$$
and $N_2$ has a reduced factorial moment measure which likewise has a density $\breve m_{[2]}^{(2)}(\cdot)$; it is given by
$$\breve m_{[2]}^{(2)}(u) = \beta(u)\,\breve m_{[2]}(u).$$
Finally, for the cross-intensity we find similarly (using differential notation for brevity) that
$$\mathrm{E}[N_1(dx)\,N_2(dy)] = m_0\,\gamma(y-x)\,dx\,dy, \qquad \text{where } \gamma(u)\,dx = \mathrm{E}[J(x+u)\,N_1(dx)].$$
Here, $\gamma(u)$ can be interpreted as the rate of occurrence of points $t_i$ of $N_1$ such that $t_i + u$ lies outside any dead-time interval.

Any more detailed evaluation of the quantities $\alpha$, $\beta(\cdot)$, and $\gamma(\cdot)$ of the process $N_1$ requires in turn more specific detail about its structure. Ergodicity of $N_1$ is enough to show via the ergodic theorem that
$$\alpha = \lim_{n\to\infty} \frac{\sum_{i=1}^n (X_i - Z_i)}{\sum_{i=1}^n X_i} = m_1 \mathrm{E}[(X_i - Y_i)_+] = m_1 \int_0^\infty G(v)\bigl[1 - F(v)\bigr]\,dv,$$
where $G$ is the common d.f. of the $Y_i$, and $X_i = t_{i+1} - t_i$ with common d.f. $F$ given by
$$F(x) = \lim_{h \downarrow 0} \Pr\{N_1(0,x] \ge 1 \mid N_1(-h,0] \ge 1\}$$
(see Chapters 3 and 13). The functions $\beta$ and $\gamma$ are necessarily more complex, involving joint distributions, as we now illustrate in the case that $N_1$ is a stationary renewal process with lifetime d.f. $F$ as just given. Writing
$$\pi(t) = \Pr\{J(t) = 0 \mid N_1 \text{ has a point at the origin}\}$$
for the conditional dead-time probability function, the regenerative properties of $N_1$ imply that $\pi(\cdot)$ satisfies the renewal equation
$$\pi(t) = \bigl[1 - G(t)\bigr]\bigl[1 - F(t)\bigr] + \int_0^t \pi(t-v)\,dF(v) = \int_0^t \bigl[1 - G(t-v)\bigr]\bigl[1 - F(t-v)\bigr]\,dU(v),$$
where $U(\cdot)$ is the renewal function generated by the d.f. $F$ [see (4.1.7)]. When $F$ is such that the nonlattice form of the renewal Theorem 4.4.I holds, (4.4.2) yields
$$\pi(t) \to m_1 \int_0^\infty \bigl[1 - G(v)\bigr]\bigl[1 - F(v)\bigr]\,dv \qquad (t \to \infty).$$
Thus, writing $B$ for a stationary backward recurrence time r.v. (see Section 4.2), we also have
$$\lim_{t\to\infty} \pi(t) = \Pr\{Z > B\} = 1 - \alpha.$$
For general $t$, we have $\gamma(t) = m_1[1 - \pi(t)]$, so it remains only to identify $\beta(\cdot)$. When $N_1$ is stationary, write $B_t$ and $T_t$ for the backward and forward recurrence time r.v.s at time $t$, noting that stationarity implies from (4.2.7) and (4.4.2) that their joint distribution has
$$\Pr\{B_t > x,\ T_t > y\} = m_1 \int_{x+y}^\infty \bigl[1 - F(v)\bigr]\,dv.$$
For $u > 0$ we have
$$\beta(u) = \mathrm{E}\bigl[J(t)J(t+u)\bigr] = \Pr\{B_t > Z,\ T_t > u\} + \int_0^u \bigl[1 - \pi(u-v)\bigr]\,\Pr\{B_t > Z,\ T_t \in (v, v+dv)\}$$
$$= \alpha - m_1 \int_0^\infty dG(z) \int_0^u \pi(u-v)\bigl[1 - F(z+v)\bigr]\,dv$$
$$\to \alpha^2 \qquad (u \to \infty) \quad \text{when } \pi(t) \to 1 - \alpha \ (t \to \infty).$$
The properties of β(u) and γ(u) noted for large u reflect asymptotic independence, but it is primarily the local properties, for u closer to zero, that are of interest in practice, because it is these that reflect any causal relation between the two processes N0 and N1 . Parametric models are then called for, but take us further from the theme of this section so we refer, for example, to Lawrance (1971) or Sampath and Srinivasan (1977) for specific details (see also Exercises 12.6.12–13).
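As a numerical aside (not part of the text), the formula $\alpha = m_1 \int_0^\infty G(v)[1-F(v)]\,dv$ derived above can be checked by direct quadrature in the exponential special case treated in Exercise 12.6.12 below, where it should reduce to $\mu/(\lambda_1+\mu)$; the truncation point and grid size in this sketch are arbitrary choices.

```python
import numpy as np

lam1, mu = 1.0, 2.0                   # illustrative rates: exponential F and G
v = np.linspace(0.0, 40.0, 400_001)   # truncate at 40; the integrand decays like e^{-v}
f = (1.0 - np.exp(-mu * v)) * np.exp(-lam1 * v)   # G(v) * [1 - F(v)]
integral = float(((f[1:] + f[:-1]) / 2 * np.diff(v)).sum())  # trapezoidal rule
alpha = lam1 * integral               # m_1 = lam1 for a Poisson process N_1
print(alpha, mu / (lam1 + mu))        # both close to 2/3
```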
Exercises and Complements to Section 12.6

12.6.1 For a random measure (not necessarily stationary) on the c.s.m.s. $X$ for which the $k$th moment measure $M_k(\cdot)$ ($k \ge 2$) is boundedly finite, so $M(\cdot) \equiv M_1(\cdot)$ exists, establish the existence of a Radon–Nikodym family $\breve M_k(B \mid x)$ with the properties below, including the ‘disintegration’ property in (c).
(a) For each bounded $B \in \mathcal{B}(X^{(k-1)})$, $\breve M_k(B \mid x)$ is a measurable function of $x$ that is integrable with respect to $M(\cdot)$ on bounded sets.
(b) For $M$-almost all $x$, $\breve M_k(\cdot \mid x)$ is a boundedly finite measure on $\mathcal{B}(X^{(k-1)})$.
(c) For $B \in \mathcal{B}(X^{(k-1)})$, use the fact that $M_k(\cdot \times B) \ll M(\cdot)$ to conclude that
$$M_k(A \times B) = \int_A \breve M_k(B \mid x)\,M(dx) \qquad [A \in \mathcal{B}(X)].$$
12.6.2 (Continuation). Arguing as in Exercises 12.1.8–9, deduce that when $X = \mathbb{R}^d$ and the process is $k$th-order stationary, there exists a version of $\breve M_k(\cdot \mid x)$ that is invariant under simultaneous translations; that is, $\breve M_k(D_y B \mid x + y) = \breve M_k(B \mid x)$. Hence, give an alternative proof of Proposition 12.6.III.

12.6.3 Give analogous statements to those of Proposition 12.6.IV for the reduced factorial moment measure and for the reduced ordinary and factorial cumulant measures. Investigate the analogue of (12.6.6) when $M_k$ is replaced by $M_{[k]}$ and $g$ and $h$ are indicator functions as in the proof of (12.6.5). Relate the case $k = 2$ to the ergodicity result underlying (12.6.13).

12.6.4 Find the reduced moment and cumulant measures for a stationary renewal process. In particular, show that if the renewal function has a density $h(\cdot)$, then reduced $k$th factorial moment measures exist for all $k = 2, 3, \dots$ and have densities
$$\breve m_{[k]}(x_1,\dots,x_{k-1}) = \lambda\, h(x_{(1)})\, h(x_{(2)} - x_{(1)}) \cdots h(x_{(k-1)} - x_{(k-2)}),$$
where $\{x_{(1)},\dots,x_{(k-1)}\}$ is the set $\{x_1,\dots,x_{k-1}\}$ arranged in ascending order. [Hint: See Example 5.4(b) and Exercise 7.2.3.]

12.6.5 (a) Show that when the reduced $k$th factorial cumulant measure of a $k$th-order stationary point process is totally finite, the $k$th cumulant of $N(A)$ is asymptotically proportional to $\ell(A)$ as $A \uparrow X$ through a convex averaging sequence.
(b) Show that a stationary Poisson cluster process for which the cluster size distribution has finite $k$th moment satisfies the conditions of (a).
(c) Show that the conditions of (a) are not satisfied for $k \ge 2$ by either (i) any nontrivial mixture of Poisson processes, or (ii) a stationary renewal process whose lifetime distribution has finite first moment but infinite second moment. [Hint: Compare Exercises 4.1.1–2 and 4.4.5(c).]
12.6.6 Let the stationary random measure $\xi$ on $\mathbb{R}^d$ have finite $k$th moment measure $M_k$. Show that for bounded Borel functions $f$ of bounded support in $(\mathbb{R}^d)^{(k-1)}$, as $n \to \infty$,
$$\mathrm{E}\biggl|\frac{1}{\ell(A_n)} \int_{A_n} \xi(dx) \int_{X^{(k-1)}} f(y_1,\dots,y_{k-1})\,\xi(x+dy_1)\dots\xi(x+dy_{k-1}) - \int_{X^{(k-1)}} f(y_1,\dots,y_{k-1})\,\breve\psi_k(dy_1 \times \cdots \times dy_{k-1})\biggr| \to 0.$$
12.6.7 Suppose the stationary random measure $\xi(\cdot)$ has the mean square continuous nonnegative random function $\eta(\cdot)$ as density; that is, $\xi(A) = \int_A \eta(u)\,du$ a.s.
(a) Prove that the covariance measure $C_2$ of $\xi(\cdot)$ has no atom at the origin.
(b) When $\eta(\cdot)$ is stationary and $\operatorname{cov}(\eta(x), \eta(y)) = \sigma^2\rho(|x-y|)$, show that
$$\operatorname{var}\xi(0,x] = 2\sigma^2 \int_0^x (x-y)\,\rho(y)\,dy \qquad (x > 0).$$
(c) Interpret the results of (a) and (b) in the (degenerate) case that $\xi(A) = Y\ell(A)$ for a r.v. $Y$ with $\mathrm{E}(Y^2) < \infty$.

12.6.8 $L^1$ version of Theorem 12.6.VI. Let the strictly stationary random measure $\xi$ of Theorem 12.6.VI be ergodic and have finite $k$th moment measure. Show that under the conditions of the theorem, as $n \to \infty$,
$$\mathrm{E}\biggl|\frac{1}{\ell(A_n)} \int_{A_n} \xi(x+B_1)\dots\xi(x+B_{k-1})\,\xi(dx) - \breve M_k(B_1 \times \cdots \times B_{k-1})\biggr| \to 0.$$
12.6.9 Higher-order moments for marked random measures. Show that for any $k$th-order stationary marked random measure or MPP on $\mathbb{R}^d$ with marks in $K$ there exists a reduced measure $\breve M_k$ such that for any function $f \in \mathrm{BM}(X^{(k)} \times K^{(k)})$, and writing $x_k$, $\kappa_k$ for $(x_1,\dots,x_k)$, $(\kappa_1,\dots,\kappa_k)$ and so on,
$$\int_{X^{(k)} \times K^{(k)}} f(x_k, \kappa_k)\,M_k(d(x_k \times \kappa_k)) = \int_X dx \int_{X^{(k-1)} \times K^{(k)}} f(x, x + y_{k-1}, \kappa_k)\,\breve M_k(d(y_{k-1} \times \kappa_k)).$$
In particular, use this reduced measure to imitate (12.6.5) for the marked case.
In particular, use this reduced measure to imitate (12.6.5) for the marked case. 12.6.10 Higher-order ergodic theorem for the marked case. (a) Extension of Lemma 12.6.VI. As in Lemma 12.2.III, let ξ be a random measure on the product space X ×K, and let I be the associated σ-algebra of events invariant under shifts Sx , x ∈ X = Rd . Establish the existence ˘ k (xk−1 , κk ) on X (k−1) × K(k) such of an I-measurable random measure Ψ that for bounded Borel functions f on X (k−1) ×K(k) with bounded support
E
X (k) ×K(k)
k
f (xk , κk )
=
dx X
ξ(dxi × dκi ) I
1
X (k−1) ×K(k)
˘ k (d(yk−1 × κk )). f (x, x + yk−1 , κk ) Ψ
(b) Establish a corresponding version of Proposition 12.6.VII for MPPs, with ˘ k replacing ψ˘k . Ψ
12.6.11 Kagan’s conjectures. On the basis of empirical evidence from current earthquake catalogues [see also Kagan and Knopoff (1980)], Kagan (1981a, b) conjectured that earthquakes in the crust have a type of self-similar distribution in which the second-, third-, and fourth-order factorial moment densities have the respective forms
$$m_{[2]}(x,y) \sim 1/D(x,y); \qquad m_{[3]}(x,y,z) \sim 1/A(x,y,z); \qquad m_{[4]}(w,x,y,z) \sim 1/V(w,x,y,z),$$
where $D$ is the distance, $A$ the area, and $V$ the volume described by the respective arguments of the densities.
(a) Investigate conditions under which the above conjectures are compatible with the existence of a stationary process with the prescribed densities. [Hint: Consider integrability conditions at the origin.]
(b) Investigate conditions under which they might be approximately true for processes of fractal type, meaning that they are concentrated in ‘random faults’ on lower-dimensional elements such as lines or surfaces.
[Remark: These conjectures are still unresolved; for further discussion see Kagan and Vere-Jones (1995).]

12.6.12 Suppose that the process $N_0$ in Example 12.6(b) is Poisson with rate parameter $\lambda_0$. Then the output process $N_2$ has covariance density
$$c(u) = \lambda_0^2 \operatorname{cov}(J(0), J(u)) = \lambda_0^2\bigl[\beta(u) - \alpha^2\bigr].$$
Show that, when $N_1$ is a Poisson process at rate $\lambda_1$ and the $Y_i$ are exponentially distributed with mean $1/\mu$,
$$\alpha = \frac{\mu}{\lambda_1 + \mu}, \qquad \pi(t) = \frac{\lambda_1 + \mu e^{-(\lambda_1+\mu)t}}{\lambda_1 + \mu},$$
$$\beta(u) = \frac{\mu\bigl(\lambda_1 e^{-(\lambda_1+\mu)u} + \mu\bigr)}{(\lambda_1 + \mu)^2}, \qquad \gamma(u) = \lambda_1\bigl[1 - \pi(u)\bigr] = \frac{\lambda_1\mu\bigl(1 - e^{-(\lambda_1+\mu)u}\bigr)}{\lambda_1 + \mu}.$$
12.6.13 Replace the inhibitory mechanism of Example 12.6(b) defined via the i.i.d. sequence $\{Y_i\}$ by $Z_i = \min(T_{t_i}, t_{i+1} - t_i)$, where $T_t$ is the forward recurrence time r.v. of the process $N_0$. Show that when $N_0$ and $N_1$ are independent stationary simple point processes with intensity $\lambda_0$, the output process has intensity
$$\lambda_2 = \lambda_0 \int_0^\infty G(t)\,dF(t),$$
where $G$ is now the d.f. of a lifetime r.v. $Y$ for the process $N_0$ and $F$ is the d.f. of the backward recurrence time r.v. of the process $N_1$.
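To see the estimator (12.6.13) at work in a case where the answer is known, the sketch below (a computational illustration, not from the text) applies it to a simulated Poisson process on an interval: for a Poisson process of rate $\lambda$ and $B = (0, r]$, the reduced second factorial moment measure of $B$ is $\lambda^2 r$. Trimming $A$ to $(0, L-r]$ is a crude guard against the edge effects discussed after (12.6.13); all numerical values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, L, r = 5.0, 4000.0, 1.0

# One realization of a Poisson process of rate lam on (0, L].
n = rng.poisson(lam * L)
x = np.sort(rng.uniform(0.0, L, size=n))

# \hat M_[2](B; A) = ell(A)^{-1} * sum over x_i in A of N(x_i, x_i + r],
# with A = (0, L - r] so that x_i + B always lies inside the window.
A_len = L - r
in_A = x <= A_len
# counts of points in (x_i, x_i + r] for each x_i in A (x_i itself excluded, as in N*)
hi = np.searchsorted(x, x[in_A] + r, side="right")
lo = np.searchsorted(x, x[in_A], side="right")
M2_hat = (hi - lo).sum() / A_len
print(M2_hat, lam**2 * r)   # estimate of the reduced measure of (0, r]; theory: 25.0
```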
12.7. Long-range Dependence

Long-range dependence (LRD) of a stochastic process could in principle relate to any of several characteristics of the process, but it has now generally come to be associated with second moments. For stationary point processes and random measures on $\mathbb{R}$, we define it via the following variance property given in Daley and Vesilo (1997) and already referred to in Exercise 4.5.13.
Definition 12.7.I. A second-order stationary point process or random measure $\xi$ on $\mathbb{R}$ is long-range dependent if
$$\limsup_{t\to\infty} \frac{\operatorname{var}\xi(0,t]}{t} = \infty. \tag{12.7.1}$$
When the lim sup here is finite, the random measure is short-range dependent.

Definition 12.7.I is less easy to apply in higher dimensions, although essentially the same ideas can be used. One approach, outlined in Exercise 12.7.3, is to consider the growth of $\operatorname{var}\xi(A_n)$ relative to $\ell(A_n)$ as $A_n$ increases through a convex averaging sequence of sets. In the treatment and examples below, we mostly restrict attention to point processes in $\mathbb{R}$.

Recall that the second moment measure of a stationary random measure on an interval $(0,t)$ in $\mathbb{R}$ cannot grow faster than $O(t^2)$ (see Exercise 12.7.1), so that
$$\limsup_{t\to\infty} t^{-2}\operatorname{var}\xi(0,t] < \infty. \tag{12.7.2}$$
This property implies that if in the denominator in (12.7.1) we replace $t$ by $t^\alpha$, the ratio can be infinite in the limit only if $\alpha < 2$. It is therefore convenient to delineate this range by an index as in the next definition [the name recalls early work on long-range dependence in flow records of the river Nile by the British engineer Hurst; see Beran (1994)].

Definition 12.7.II. A long-range dependent stationary random measure $\xi$ has Hurst index
$$H = \sup\Bigl\{h: \limsup_{t\to\infty} \frac{\operatorname{var}\xi(0,t]}{t^{2h}} = \infty\Bigr\}. \tag{12.7.3}$$
Then the Hurst index must lie in the interval $[0,1]$, although for long-range dependence it is more narrowly confined to $\frac12 \le H \le 1$. Furthermore, to have $H < 1$, we can immediately rule out the nonergodic case, for unless the invariant $\sigma$-field $\mathcal{I}$ is trivial, it follows from Exercise 12.2.9 that
$$\operatorname{var}\xi(0,x] \sim x^2\,\Gamma(\{0\}) \qquad (x \to \infty),$$
where $\Gamma(\{0\}) = \operatorname{var} Y$ and $Y = \mathrm{E}[\xi(U) \mid \mathcal{I}]$, so $H = 1$ when $\operatorname{var} Y > 0$.

For a stationary Poisson process at rate $\lambda$ the ratio at (12.7.1) equals $\lambda$ for all $t$, so it cannot be long-range dependent. The next few examples indicate further possibilities of short- and long-range dependence.

Example 12.7(a). For a stationary renewal process $N(\cdot)$ with renewal function $U(\cdot)$ as in Section 4.1, it follows from (3.5.7) that
$$\operatorname{var} N(0,t] = \lambda \int_0^t \bigl\{2[U(u) - \lambda u] - 1\bigr\}\,du \tag{12.7.4}$$
and [cf. Exercise 4.4.5(d)] that $\operatorname{var} N(0,t] \le (\text{const.})\,t$ for some finite constant if and only if $\sup_{t>0}[U(t) - \lambda t] < \infty$, which is the case if and only if the lifetime
distribution underlying $U(\cdot)$ has its second moment finite. In other words, the independence between intervals (of a renewal process) is not sufficient to eliminate the possibility of long-range dependence. The Hurst index for such $N$ is identified in Exercise 12.7.4.

Example 12.7(b) Cluster process. Let $V_c(x) = \operatorname{var} N_c(0,x]$ denote the variance function [see (8.1.13b)] of the stationary cluster centre process $N_c(\cdot)$ of a cluster process $N(\cdot)$ with independent identically distributed component processes $N_m(\cdot \mid \cdot)$ (see Definition 6.3.I). Equation (53) of Daley (1972) shows that when $N$ is stationary and $N_c$ is orderly at rate $\lambda_c$, and the second moment of the component process $N_m$ is finite, the variance function of $N$ is given by
$$\operatorname{var} N(0,x] = \lambda_c \int_{-\infty}^\infty \mathrm{E}\bigl\{[N_m((-u, x-u] \mid 0)]^2\bigr\}\,du + \int_{-\infty}^\infty\!\int_{-\infty}^\infty \bigl[-\lambda_c(x - |y-z|)_+ + V_c(y-z+x) - 2V_c(y-z) + V_c(y-z-x)\bigr]\,m_1(dy)\,m_1(dz),$$
where $m_1(y) = \mathrm{E} N_m((-\infty, y] \mid 0)$ and $V_c(x) = V_c(-x)$ for $x < 0$. The three variance terms here equal $\operatorname{cov}(N_c(0,x],\, N_c(y-z, y-z+x])$; this is dominated by $V_c(x)$, which implies that when $\lim_{x\to\infty} V_c(x)/x^\alpha$ exists finite and equals $\lambda_{2,\alpha}$ say, the dominated convergence theorem can be applied when $\alpha > 1$ to conclude that $\lim_{x\to\infty} x^{-\alpha}\operatorname{var} N(0,x] = m_1^2\lambda_{2,\alpha}$, where $m_1 = m_1(\infty)$.

In this example, long-range dependent behaviour of the cluster centre process [as shown by $V_c(x) \sim \text{const.}\,x^\alpha$] is carried over into the cluster process itself, magnified by the square of the mean cluster size. When $V_c(x)/x \to \lambda_2$ say (so $N_c$ is not long-range dependent), the same argument [e.g., Daley (1972)] gives instead
$$V(x) \sim m_1^2\lambda_2 x + \bigl(\operatorname{var} N_m(\mathbb{R} \mid 0)\bigr)\lambda_c x \qquad (x \to \infty);$$
that is, variability in the component processes is no longer swamped. Consistent with this fact, having heavy-tailed distributions for the distances from the cluster centre to the cluster members is not sufficient to cause long-range dependence in the process as a whole. For example, the variance in a Neyman–Scott process with Poisson centre process [see, e.g., (6.3.19) and (12.7.5) below] is given by
$$\frac{\operatorname{var} N(0,x]}{x} = \int_{-x}^x \Bigl(1 - \frac{|u|}{x}\Bigr)\breve C_2(du) = \lambda_c\Bigl[m_1 + m_{[2]}\int_{-x}^x \Bigl(1 - \frac{|u|}{x}\Bigr)(F{*}F^-)(du)\Bigr],$$
where $m_{[2]}$ is the second factorial moment of the cluster size distribution, and $F{*}F^-(\cdot)$ is the distribution function of the distance $X_1 - X_2$ between two members of a cluster. When $m_{[2]}$ is finite, this converges to a finite limit irrespective of the character of the distribution $F$.
Example 12.7(c) Superpositions. Suppose the stationary random measure $\xi$ is expressible as the sum of the two stationary random measures $\xi_1$ and $\xi_2$, and that all processes have finite second moments; we write $\xi(t) = \xi(0,t]$ and so on. Then for $t > 0$, $\operatorname{var}\xi(t) = \operatorname{var}\xi_1(t) + \operatorname{var}\xi_2(t) + 2\operatorname{cov}(\xi_1(t), \xi_2(t))$, and from the Cauchy–Schwarz inequality,
$$\operatorname{var}\xi(t) \ \le\ (\ge)\ \bigl[(\operatorname{var}\xi_1(t))^{1/2} \pm (\operatorname{var}\xi_2(t))^{1/2}\bigr]^2. \tag{12.7.5}$$
Let $H, H_1, H_2$ denote the Hurst indexes of $\xi, \xi_1, \xi_2$. Using (12.7.5), deduce that either $H < H_1 = H_2$ or $H = \max(H_1, H_2)$, and the latter holds when $\xi_1$ and $\xi_2$ are independent. Daley and Vesilo (2000) give details and applications to some queueing examples.

Because the variance properties are controlled by the reduced second moment measure $\breve C_2$, we should expect the distinction between long- and short-range dependence to be expressible in terms of this measure. The next lemma gives a partial resolution of this question; Example 12.7(e) indicates that the complement of these sufficient conditions does not yield a set of necessary conditions.

Lemma 12.7.III. Let $\xi$ be a second-order stationary random measure on $\mathbb{R}$, and write $\breve C_2 = \breve C_2^+ - \breve C_2^-$ for the Jordan–Hahn decomposition of its reduced covariance measure into its positive and negative parts.
(a) $\xi$ is short-range dependent if $\breve C_2$ is totally finite. When $\breve C_2^-$ is totally finite, $\xi$ is long-range dependent if and only if the positive part $\breve C_2^+$ is not totally finite.
(b) $\xi$ is short-range dependent if its Bartlett spectrum $\Gamma(\cdot)$ has a bounded density in a neighbourhood of the origin.

Proof. The results are proved from the relations [cf. equations (8.1.13) and (8.2.3)]
$$\frac{\operatorname{var}\xi(0,t]}{t} = \int_{-t}^t \Bigl(1 - \frac{|u|}{t}\Bigr)\breve C_2(du) = \int_{-\infty}^\infty \Bigl(1 - \frac{|u|}{t}\Bigr)_{\!+} \breve C_2(du) = \int_{-\infty}^\infty \Bigl(\frac{\sin\frac12\theta}{\frac12\theta}\Bigr)^{\!2}\, t\,\Gamma(t^{-1}d\theta), \tag{12.7.6}$$
expressing the middle integral as a difference of two integrals (involving $\breve C_2^+$ and $\breve C_2^-$, respectively) to which the monotone convergence theorem can be applied.

Similarly, if $\Gamma(\cdot)$ has a density $\gamma(\theta)$ which is bounded in a neighbourhood of $\theta = 0$, it follows from dominated convergence that the final integral remains bounded as $t \to \infty$.

Example 12.7(d) LRD Cox process. The Cox process $N(\cdot)$ directed by the stationary random measure $\xi(\cdot)$ on $\mathcal{B}_{\mathbb{R}}$ is stationary, and their variance functions are related by $\operatorname{var} N(0,t] = \mathrm{E}[\xi(0,1]]\,t + \operatorname{var}\xi(0,t]$ (Proposition 6.2.II). Thus, $N(\cdot)$ has exactly the same long-range dependence behaviour as $\xi(\cdot)$.
For example, when $\xi(\cdot)$ accumulates mass at unit rate during just one of the phases of an alternating renewal process with generic lifetimes $X_j$ ($j = 1, 2$) say, each with finite first moment, then whenever one of the $X_j$ has infinite second moment the random measure $\xi$ (and hence $N$ also) is long-range dependent. Details are left to Exercise 12.7.5.

Example 12.7(e) Deterministic point process. Example 8.2(e) shows that the reduced covariance measure $\breve C_2$ of a stationary deterministic point process $N$ with span $a$ has positive atoms $\breve C_2^+(\{ka\}) = 1/a$ for $k = \pm1, \pm2, \dots$, with $\breve C_2^+(A) = 0$ whenever $A \cap \{ka\} = \emptyset$ for all such $k$, and $\breve C_2^-(du) = \ell(du)/a^2$. This process has a periodic variance function, with $0 \le \operatorname{var} N(0,t] \le \frac14$, so it is not long-range dependent. On the other hand, the positive and negative parts of $\breve C_2$ are mutually singular [$\breve C_2^+$ is purely atomic and $\breve C_2^-$ is a multiple of Lebesgue measure], and both $\breve C_2^+(0,u]$ and $\breve C_2^-(0,u]$ increase indefinitely with the length $u$ of the interval, whereas $\breve C_2(0, ka] = 0$ for $k = 1, 2, \dots$. This example shows that we cannot exclude the possibility of the process being short-range dependent even when $\breve C_2$ fails to be totally finite.

Long-range dependence is frequently understood in terms of power-law decay of the correlation function. In the point process context, this means looking at the covariance density function rather than the variance function, of which it is the second derivative [Exercises 8.1.3(g), 12.7.2]. For example, Ogata and Katsura (1991) discuss situations in which the covariance density shows a power-law decay but describe it in terms of ‘fractal behaviour’. We note here the difficulty of maintaining a coherent and consistent naming system in situations where interest from the media and the general public preponderates. Fractals, with all the striking images associated with them in Mandelbrot's writings, form a case in point. Anything exhibiting some form of scaling behaviour or self-similarity, particularly when it is linked to power-law decay, is almost automatically labelled a ‘fractal’ in a nonmathematical context. By contrast, in this book we have tried to establish and maintain distinctions between the concepts of long-range dependence, representing (as in Exercise 4.5.13 and this section) a variance property related to power-law decay in a covariance or correlation function; heavy-tailed behaviour in a probability distribution; scale-invariance and self-similarity (discussed in the next section); and fractal behaviour, which we take up in Section 13.6 and relate to the behaviour of moment densities at very small distances or time intervals. In a similar vein, the Hurst index is indeed an index, and not dependent on any parametric setting.
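The effect described in Example 12.7(a) can be seen numerically. The sketch below (an illustration, not from the text) simulates a renewal process with Pareto lifetimes of moment index $\kappa = 1.5$, so that the mean is finite but the variance is not, and shows $\operatorname{var} N(0,t]/t$ growing with $t$, consistent with the Hurst index $\frac12(3-\kappa) = 0.75$ of Exercise 12.7.4. An ordinary rather than a stationary renewal process is used for simplicity, which does not affect the growth rate; all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
kappa = 1.5                          # moment index of the Pareto lifetime d.f.
nrep, t_small, t_large = 3000, 200.0, 2000.0

# Pareto(kappa) lifetimes on [1, inf): finite mean kappa/(kappa - 1) = 3,
# infinite variance, so the renewal process is long-range dependent.
m = 1200                             # lifetimes per replication; enough to pass t_large
X = rng.pareto(kappa, size=(nrep, m)) + 1.0
S = np.cumsum(X, axis=1)             # renewal epochs of an ordinary renewal process

N_small = (S <= t_small).sum(axis=1)
N_large = (S <= t_large).sum(axis=1)
ratio_small = N_small.var() / t_small
ratio_large = N_large.var() / t_large
print(ratio_small, ratio_large)      # var N(0, t]/t grows with t under LRD
```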
Exercises and Complements to Section 12.7

12.7.1 When the set $A \subset \mathbb{R}^d$ is a hyper-rectangle of side-lengths $n_1,\dots,n_d$ and $\xi$ is a second-order stationary random measure, express $\xi(A)$ as the sum of the measures on $\nu_A = \prod_{i=1}^d n_i$ unit cubes, and use the Cauchy–Schwarz inequality to conclude that the second moment measure $M_2$ of $\xi$ satisfies $M_2(A) \le \nu_A^2\,M_2(U_d)$. Compare with Exercise 8.1.3(b).
12.7.2 Show that a stationary point process on $\mathbb{R}$ with reduced covariance density $\breve c(x) \sim a/x^\gamma$ ($x \to \infty$) for $a > 0$ and $0 < \gamma < 1$ has Hurst index $1 - \frac12\gamma$.

12.7.3 Call a stationary random measure $\xi: \mathcal{B}(\mathbb{R}^d) \to \mathbb{R}_+$ long-range dependent whenever $\limsup_{n\to\infty} \operatorname{var}[\xi(A_n)]/\ell(A_n) = \infty$ for some convex averaging sequence $\{A_n;\ n = 1, 2, \dots\}$.
(a) Show that this definition is independent of the convex averaging sequence.
(b) Show, analogously to the first part of (12.7.6), that
$$\frac{\operatorname{var}\xi(A_n)}{\ell(A_n)} = \int_{\mathbb{R}^d} \frac{\ell(A_n \cap T_z A_n)}{\ell(A_n)}\,\breve C_2(dz).$$
Show that the ratio in the integrand is always bounded by 1, and is close to 1 for small $z$. Hence provide an extension of Lemma 12.7.III to $\mathbb{R}^d$.

12.7.4 Hurst index of an LRD renewal process. Let $N$ be a stationary renewal process as in Example 12.7(a) whose generic lifetime r.v. $X$ with d.f. $F$ and tail $\overline F(x) = 1 - F(x)$ has moment index $\kappa$ defined by
$$\kappa \equiv \inf\{k: \mathrm{E}(X^k) = \infty\} = \liminf_{x\to\infty} \frac{-\log \overline F(x)}{\log x}$$
[see, e.g., Daley and Goldie (2005) for the equality], for which $1 < \kappa < 2$. Use (12.7.4) and the asymptotic behaviour $U(t) - \lambda t \sim \lambda^2 \int_0^\infty \min(u,t)\,\overline F(u)\,du$ [Sgibnev (1981)] to deduce that its Hurst index $H = \frac12(3 - \kappa)$. [Hint: Daley (1999) gives a proof by contradiction; the proof via (12.7.4) and Exercise 4.4.5(c) as just indicated is direct.]
cov (T1 (t), T0 (t)) = cov (T1 (t), t − T1 (t)) = − var T1 (t) = − var T0 (t). Consequently, if the Cox process NCox directed by the ON phases of Nalt is LRD (i.e., lim supt→∞ t−1 [var T1 (t)] = ∞), then so too is the Cox process directed by the OFF phases of Nalt . Show that NCox , Nalt , and the stationary random measure ξ with I(·) as its density, all have the same Hurst index. [Hint: Daley (2007) studies this example further, showing that for t → ∞, ( var NCox (0, t])/( var Nalt (0, t]) has a limit if one of the lifetime distributions Fj (j = 0, 1) has finite second moment but if both have infinite second moment the ratio can oscillate indefinitely. Daley, Rolski and Vesilo (2007) extend this work to a Cox process driven by a LRD semi-Markov process.]
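The scaling claims in Exercises 12.7.2–12.7.4 can be probed by simulation. The sketch below is our own illustration (all names hypothetical, not part of the text): it simulates an ordinary renewal process whose Pareto lifetimes have moment index κ = 1.5 and fits the growth exponent 2H of var N(0, t], which for such a LRD renewal process should approach 3 − κ = 1.5 rather than the short-range value 1.

```python
import math
import random

def pareto_lifetime(rng, kappa):
    # lifetime with tail F̄(x) = x**(-kappa) for x >= 1, i.e. moment index kappa
    return rng.random() ** (-1.0 / kappa)

def renewal_counts(rng, ts, kappa):
    # N(0, t] for each t in the sorted list ts, for one renewal realization
    counts, t, n, i = [], 0.0, 0, 0
    while i < len(ts):
        t += pareto_lifetime(rng, kappa)
        while i < len(ts) and t > ts[i]:
            counts.append(n)   # all n renewals so far occurred at times <= ts[i]
            i += 1
        n += 1
    return counts

kappa = 1.5                      # 1 < kappa < 2: infinite variance, LRD counts
ts = [100.0, 400.0, 1600.0]
rng = random.Random(42)
reps = [renewal_counts(rng, ts, kappa) for _ in range(2000)]
logv = []
for j in range(len(ts)):
    xs = [r[j] for r in reps]
    m = sum(xs) / len(xs)
    logv.append(math.log(sum((x - m) ** 2 for x in xs) / (len(xs) - 1)))
# least-squares slope of log var N(0,t] against log t estimates 2H = 3 - kappa
lt = [math.log(t) for t in ts]
mlt, mlv = sum(lt) / len(lt), sum(logv) / len(logv)
slope = (sum((a - mlt) * (b - mlv) for a, b in zip(lt, logv))
         / sum((a - mlt) ** 2 for a in lt))
print(round(slope, 2))
```

With finite-sample corrections the fitted slope is only approximate, but it sits clearly above the short-range value 1.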
12.8. Scale-invariance and Self-similarity

In this last section we look at processes where invariance under multiplicative actions (scale changes) rather than additive actions (translations) plays the key role. Such processes include the important class of self-similar processes (also called auto-modelling in the older Russian literature). In the context of the present book, the most relevant concept is that of a self-similar random measure, where nonnegativity plays a dominant role, and the theory is rather different from that of the fractional Brownian motions and related two-sided processes which have become familiar in finance and other application areas. The self-similar random measures we consider are purely atomic, and can be described by the marked point process of the locations and sizes of the atoms. Moreover, self-similarity of the random measure can be restated in terms of a corresponding invariance property of this marked point process; we call this property biscale invariance.

We first consider the simpler case of point processes invariant under scale changes about a fixed origin. A process on X = R^d is called scale-invariant if it is invariant under the group of scale changes {T_α: 0 < α < ∞}, where for x ∈ R^d,

    T_α x = αx.    (12.8.1)

This group splits R^d into equivalence classes, one of which is the origin, and the others can be identified with rays originating from the origin. Now R^d \ {0} can be written as the product S_d × R_0^+, where S_d, the group of d-dimensional rotations, can in turn be identified with the surface of the d-dimensional unit sphere, and R_0^+ denotes the open half-line (0, ∞) = R+ \ {0}. Note that S_1 is just the two-point group T_2 = {−1, 1} under multiplication. R_0^+ is a group under multiplication, and it has the unique invariant measure h(dx) = dx/x. It is now obvious, but also follows formally from Lemma A2.7.II, that any measure on R^1 that is invariant under scale changes can be represented as the sum of a point mass at the origin, and the direct product of a two-point mass on T_2 and the measure h(·) on R_0^+. Similarly, a scale-invariant measure on R^d can be represented as the sum of a point mass at the origin and a measure κ(dθ) dr/r on R^d \ {0} = S_d × R_0^+, where κ(·) is an arbitrary totally finite measure on S_d. The position of the origin is unimportant here, but it is clear that the above structure is incompatible with translations in R^d, so that a measure on R^d cannot be invariant under both translations and scale changes, a result with important consequences for both scale-invariant and self-similar random measures.

Example 12.8(a) Scale-invariant Poisson processes on R^d. As in Example 12.1(a) we deduce that if a Poisson process on R^d is invariant under scale changes, its parameter measure must have the same property, and must therefore have the structure described above, namely, the sum of a point mass at the origin and a measure κ(dθ) dr/r on S_d × R_0^+.
Part (a) of the proposition below is an immediate consequence of this discussion. A similar argument applied to the first moment measure, whenever it exists, yields part (b).

Proposition 12.8.I. (a) A Poisson process on R^d cannot be simultaneously invariant under both scale changes and translations.
(b) No stationary random measure on R^d with finite expectation measure can be scale-invariant.

We turn next to a discussion of self-similarity for random measures. Here, a change in scale is balanced by a compensating change in mass.

Definition 12.8.II. Let D be a finite positive constant. A random measure is self-similar with similarity index D (self-similar or D-self-similar for short) if its distribution is invariant under the group of transformations {R_α^{(D)}: α ∈ R_0^+} defined on boundedly finite measures M#_X by

    (R_α^{(D)} µ)(A) = α^{−D} µ(αA)    (A ∈ B_X).    (12.8.2)
Zähle (1988) uses the shorter terminology above, and includes discussion of nonprobability measures on M#_X. Even more extensive work is covered in Zähle (1990a, b, 1991). Note that self-similarity, like scale-invariance, refers in the first instance to invariance relative to a fixed origin; it is only under stationarity, or some explicit rule describing how the invariance properties alter as we shift the origin, that the concept extends beyond this case.

The transformations R_α^{(D)} do not result directly from transformations of the phase space X into itself, but do still induce a group, the renormalization group, of bounded continuous transformations of M#(R^d) into itself (see Exercise 12.8.1). We start with two negative results. Because of the change in mass, the renormalization group does not map N#(R^d) into itself. This justifies the following.

Proposition 12.8.III. A point process cannot be self-similar.

The class of (deterministic) measures invariant under R_α^{(D)} is not a rich family: on R+ it is confined to measures with power-law densities [hyperbolic densities in the usage of Mandelbrot (1982, p. 204)]

    f_D(x) = C x^{D−1}    (C > 0, x > 0).

Only in the trivial case D = 1 is the measure invariant under both the similarity transformations R_α^{(D)} and translations; it reduces then to a multiple of Lebesgue measure. The situation for general random measures is more rewarding. To give a preview of the issues which arise, we examine first, without assuming stationarity, the consequences of self-similarity on the representation for completely random measures given in Theorem 10.1.III.
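The stated invariance is easy to confirm numerically; in the sketch below (our illustration, hypothetical constants) µ is the measure with density f_D(x) = C x^{D−1} on an interval [a, b], for which α^{−D} µ([αa, αb]) = µ([a, b]) exactly.

```python
def mu(a, b, C=2.0, D=1.7):
    # mu([a, b]) under the power-law density f_D(x) = C * x**(D - 1)
    return C * (b ** D - a ** D) / D

alpha, a, b, D = 3.5, 0.2, 5.0, 1.7
lhs = alpha ** (-D) * mu(alpha * a, alpha * b, D=D)   # (R_alpha^(D) mu)([a, b])
assert abs(lhs - mu(a, b, D=D)) < 1e-12
print("invariant under R_alpha^(D)")
```

With D = 1 the same density is constant, recovering the multiple of Lebesgue measure mentioned above.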
Suppose that ξ is a completely random measure defined on X = R^d, so that, as in (10.1.4), it can be represented in terms of a drift, a set of fixed atoms, and a Poisson process N in R^d × R_0^+. Operating on this representation by R_α^{(D)}, and using differential notation for brevity, we find

    R_α^{(D)} ξ(dx) = α^{−D} ν(α dx) + ∫_0^∞ y α^{−D} N(α dx × dy) + Σ_k U_k α^{−D} δ_{x_k/α}(dx)
                    = α^{−D} ν(α dx) + ∫_0^∞ y N(α dx × α^D dy) + Σ_k U_k α^{−D} δ_{x_k/α}(dx).
This last expression again corresponds to a completely random measure: the measure ν has been transformed by R_α^{(D)} as at (12.8.2), the Poisson process N has been subjected to the biscale transformation S_α^{(D)} on X × R_0^+ given by

    (S_α^{(D)} N)(A × K) = N(αA × α^D K),    (12.8.3)
and the fixed atoms have been transformed both in mass and in location. If the distribution of the completely random measure is to remain invariant under all transformations R_α^{(D)} (α > 0), then it is clear that there can be no fixed atoms, that the (deterministic) measure ν must be invariant under the transformations R_α^{(D)}, and that the parameter measure µ of the bivariate Poisson process N must be invariant under the transformations S_α^{(D)}. Thus, we have reduced the problem of characterizing the class of self-similar completely random measures to the problem of characterizing the classes of measures invariant under these two groups of transformations. For simplicity we consider only the case X = R^1; the details for X = R^d are similar (see Exercise 12.8.2 for the case d = 2).

As in Example 12.8(a), it is necessary to consider separately the action of the transformations on R_0^+, {0} and R_0^− = {x: −x ∈ R_0^+}. Because the processes have no fixed atoms, ν{0} = 0 and µ, the parameter measure of N, has µ({0} × R_0^+) = 0. Thus, we may consider the effect of R_α^{(D)} on measures ν acting on B_{R+}, and of S_α^{(D)} on measures µ on B_{R+ × R+}, with similar results following for the components on R− and R− × R+.

From (12.8.2), invariance of ν under R_α^{(D)} implies that ν is absolutely continuous on R_0^+; its density with respect to Lebesgue measure ℓ is given by

    dν/dℓ (x) = c1 x^{D−1}    (x ∈ R_0^+).

Similarly, on R_0^−,

    dν/dℓ (x) = c2 |x|^{D−1}    (x ∈ R_0^−).

Next, consider the representation of the parameter measure µ on the quadrant R_0^+ × R_0^+. Invariance of the distribution of N under (12.8.3) implies that
µ itself is invariant under the biscale shifts S_α^{(D)}, and hence is invariant under shifts along the curves x^D/y = constant. This suggests writing

    u = log x,    v = D log x − log y,

so that in the (u, v)-plane the transformation (12.8.3) becomes

    (u, v) → (u + log α, v).    (12.8.4)

We now deduce from Lemma A2.7.II that µ̃, the image of µ under this mapping, reduces to a product of Lebesgue measure along the u-axis and an arbitrary σ-finite measure ρ̃1 along the v-axis. Thus, integration with respect to µ in this quadrant can be represented in the form

    ∫∫_{R_0^+ × R_0^+} f(x, y) µ(dx × dy) = ∫_{−∞}^∞ ∫_{−∞}^∞ f(e^u, e^{Du−v}) du ρ̃1(dv).    (12.8.5)
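The effect of the substitution can be checked mechanically (our own sketch): in the (u, v) coordinates the biscale map (x, y) → (αx, α^D y) becomes the pure shift (u, v) → (u + log α, v).

```python
import math

def uv(x, y, D):
    # the substitution above: u = log x, v = D log x - log y
    return math.log(x), D * math.log(x) - math.log(y)

D, alpha, x, y = 2.5, 7.0, 0.4, 3.0
u, v = uv(x, y, D)
u2, v2 = uv(alpha * x, alpha ** D * y, D)   # image under the biscale map
assert abs(u2 - (u + math.log(alpha))) < 1e-12
assert abs(v2 - v) < 1e-12
print("biscale map acts as a shift in u only")
```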
Similar considerations may be applied on R_0^− × R_0^+, with a possibly different measure ρ̃2 replacing ρ̃1. The finiteness constraints (10.1.5), when expressed in the form

    ∫_0^∞ (1 − e^{−y}) µ(A × dy) < ∞    (bounded A ∈ B(R)),

lead to the requirements that for i = 1, 2,

    ∫_{−∞}^0 (1 + |v|) ρ̃i(dv) < ∞    and    ∫_0^∞ e^{−v} ρ̃i(dv) < ∞.    (12.8.6a)
If, in particular, ρ̃1 is absolutely continuous with respect to Lebesgue measure, then it is more convenient to write, for (x, y) ∈ R_0^+ × R_0^+ and with ρ̃1(dv) = ρ1(v) dv,

    µ(dx × dy) = η1(x^D/y) dx dy/(xy) ≡ ρ1(log(x^D/y)) dx dy/(xy)

for some nonnegative, locally integrable function η1 on R_0^+. If also a similar representation holds for (x, y) ∈ R_0^− × R_0^+, then an analogous representation holds for some similar function η2 on R_0^+ and |x| in place of x.

When such absolute continuity conditions hold, the Laplace functional L[f] for the random measure ξ can be written (f ∈ BM+(R))

    L[f] = exp(L1[f] + L2[f]),

where L1[f] equals

    −c1 ∫_0^∞ x^{D−1} f(x) dx + ∫_0^∞ ∫_0^∞ (e^{−yf(x)} − 1) η1(x^D/y) dx dy/(xy),    (12.8.6b)

and a similar expression holds for L2 with c1 replaced by c2, η1(x^D/y) by η2(|x|^D/y), and integration of x over (−∞, 0) in place of (0, ∞). Conditions (12.8.6a) transform to

    ∫_0^1 (1 + |log z|) η_i(z) dz/z < ∞    and    ∫_1^∞ η_i(z) dz/z² < ∞.    (12.8.7)

We thus have a complete answer to the representation problem in the one-dimensional case.
Proposition 12.8.IV. A completely random measure on R is self-similar if and only if, in terms of the representation in Theorem 10.1.III, there are no fixed atoms, and the measures ν, µ can be written, as in (12.8.5), in terms of positive constants c1, c2 and measures ρ̃1, ρ̃2 satisfying (12.8.6a). In particular, when these measures have densities, the Laplace functional of the random measure can be written in the form (12.8.6b) for nonnegative, locally integrable functions η1, η2 satisfying (12.8.7).

For applications we generally require the random measure to be stationary as well as self-similar, in which case the representation must also be invariant under shifts along the x-axis. Then the first term must vanish unless D = 1, when it reduces to a constant multiple of Lebesgue measure along the whole real axis. For the measures in the second term, the additional condition is easily seen to be satisfied if and only if η1(v) = ρv^{1/D} = η2(v), corresponding to

    µ(dx × dy) = ρ y^{−(1+1/D)} dx dy.    (12.8.8)

Then the constraints at (12.8.7) require D < ∞ and D > 1 respectively. Hence, it follows that the class of completely random measures that are both stationary and self-similar reduces, for D = 1, to the trivial example of a constant multiple of Lebesgue measure, and for 1 < D < ∞, to the stable processes with index α = 1/D and Laplace functional of the form [cf. Example XIII.7(c) of Feller (1966)]

    − log L[f] = ρ ∫_{−∞}^∞ dx ∫_0^∞ (1 − e^{−yf(x)}) dy/y^{1+1/D}
               = ρ D Γ(1 − D^{−1}) ∫_{−∞}^∞ [f(x)]^{1/D} dx.    (12.8.9)

Corollary 12.8.V. A completely random measure on R is both stationary and self-similar if and only if there is no drift or atomic component, and the Poisson process in representation (10.1.4) has density (12.8.8), with 1 < D < ∞.
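The second equality in (12.8.9) rests on the identity ∫_0^∞ (1 − e^{−sy}) y^{−(1+1/D)} dy = D Γ(1 − 1/D) s^{1/D}, valid for 1 < D < ∞. A numerical sketch (our own illustration, using a crude trapezoidal rule on a logarithmic grid):

```python
import math

def stable_integral(s, D, lo=1e-9, hi=1e9, n=200000):
    # trapezoidal rule in u = log y for ∫ (1 - exp(-s*y)) * y**(-1-1/D) dy;
    # the substitution contributes a factor y, leaving (1 - exp(-s*y)) * y**(-1/D) du
    a, b = math.log(lo), math.log(hi)
    h = (b - a) / n
    total = 0.0
    for k in range(n + 1):
        y = math.exp(a + k * h)
        f = (1.0 - math.exp(-s * y)) * y ** (-1.0 / D)
        total += f if 0 < k < n else 0.5 * f
    return total * h

D, s = 2.0, 1.3
closed = D * math.gamma(1.0 - 1.0 / D) * s ** (1.0 / D)   # = 2*sqrt(pi*s) when D = 2
approx = stable_integral(s, D)
assert abs(approx - closed) / closed < 1e-3
print(round(approx, 3), round(closed, 3))
```

The power s^{1/D} in the closed form is exactly what produces the stable-subordinator Laplace functional in (12.8.9).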
This suggests considering random measures constructed from the marks of an extended MPP via

    ξ(A) = ∫_{A × R_0^+} κ N(dt × dκ)    (12.8.10)

for some class of extended MPPs N more general than the Poisson process. The use of an extended MPP here can be avoided by altering the metric on R_0^+ to (say) d(x, y) = |log(x/y)|, thus effectively removing the origin to an inaccessible boundary point. With such a change of metric, the process remains boundedly finite, although its ground process remains undefined as a
point process. A change of metric of this kind corresponds to representing the mark on a logarithmic or decibel scale, as indeed is commonly done in physical applications, where, for example, the energy release might be the physically meaningful mark, for which some form of self-similarity could be claimed, although the most convenient measure of size is more often a measurement on a decibel scale. The same sort of transformation lies behind (12.8.4). Exercise 12.8.3 restates the requirements of self-similarity in terms of measurements on an associated logarithmic or decibel scale.

We proceed to a more systematic examination of the properties of random measures defined by (12.8.10). As in the independent case, a key role is played by the biscale transformations, and we adopt the following definition.

Definition 12.8.VI. An extended MPP on X = R^d × R_0^+ is biscale invariant with index D (D-biscale invariant for short) if, for every real constant α > 0, its fidi distributions are invariant under the biscale transformations S_α^{(D)} of (12.8.3).

Proposition 12.8.VII. Let N be an extended MPP on R^d with marks in R_0^+.
(a) In order that (12.8.10) should define a valid (boundedly finite) random measure ξ on R^d, it is necessary and sufficient that for all bounded A ∈ B_{R^d}, the integral (12.8.10) should converge a.s. If N has boundedly finite expectation measure M(· × ·), then it is sufficient that M should satisfy conditions (10.1.5a) and (10.1.5b).
(b) If N(·) is stationary (i.e., invariant under shifts in its first argument), then ξ is stationary.
(c) If N is D-biscale invariant, then ξ is D-self-similar.
(d) If the expectation measure M exists and (b) is satisfied, then there exists a σ-finite measure φ on R_0^+ such that M = ℓ × φ, where ℓ is Lebesgue measure on R^d. If also (c) is satisfied, with index D, then φ reduces to the power-law with density

    φ(κ) = c κ^{−(1+1/D)}    (c > 0, 1 < D < ∞).    (12.8.11)
Proof. Provided it is understood that K ∈ B_{R_0^+} is bounded away from both 0 and ∞, it follows from Proposition 9.1.VIII that

    ζ(A × K) = ∫_{A×K} κ N(dt × dκ)

defines a boundedly finite random measure ζ on R × K. Then a.s. convergence of the integral (12.8.10) to a finite value is enough to ensure that ξ, the ground measure for ζ (obtained by setting K equal to the whole mark space), is also a boundedly finite random measure.

If the expectation measure M for N exists, and A is bounded, then condition (10.1.5a) implies that the integral ∫_0^ε κ N(A × dκ) converges a.s., and condition (10.1.5b) implies that N(A × (ε, ∞)) is a.s. finite, the two together being sufficient to imply the convergence a.s. of (12.8.10).
Assertion (b) is an easy consequence of the definitions of stationarity, and (c) follows from the following equations, where we first consider the distribution of ξ(A) for some bounded set A ∈ B_{R^d}:

    R_α^{(D)} ξ(A) = ∫_0^∞ κ α^{−D} N(αA × dκ)
                   = ∫_0^∞ κ N(αA × α^D dκ)
                   = ∫_0^∞ κ N(A × dκ) = ξ(A),
in which the last line follows from the assumed biscale invariance of N. Thus the one-dimensional distributions of ξ are invariant under R_α^{(D)}. For all r > 1, similar arguments can be applied to the r-dimensional distributions of ξ(A1), . . . , ξ(Ar) for bounded Borel sets A1, . . . , Ar. Such fidi distributions are sufficient to determine the distributions of ξ and R_α^{(D)} ξ completely, so it follows that the two random measures must be equal in distribution.

Finally, (d) follows on taking expectations through the equations expressing invariance of N under shifts in its first component and under biscale invariance, and repeating the argument leading to (12.8.8).

The representation implies that, as in the Poisson case of Proposition 12.8.I, the power-law form (12.8.11) is the only possible form for the stationary mark distribution, even though it is unbounded and cannot be normalized to form a probability distribution. Of course, over any lower threshold κ0 > 0, it can be normalized to form a Pareto distribution with distribution function

    F(κ | κ0) = 1 − (κ/κ0)^{−1/D}    (κ ≥ κ0).
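Sampling from this Pareto law is immediate by inverse transform — note that since the tail index 1/D is less than 1, the distribution has infinite mean. A sketch (our own illustration):

```python
import random

def F(k, kappa0, D):
    # Pareto d.f. over threshold kappa0: F(k | kappa0) = 1 - (k/kappa0)**(-1/D)
    return 1.0 - (k / kappa0) ** (-1.0 / D)

def quantile(u, kappa0, D):
    # inverse of F: solving 1 - (k/kappa0)**(-1/D) = u gives k = kappa0*(1-u)**(-D)
    return kappa0 * (1.0 - u) ** (-D)

kappa0, D = 2.0, 1.5
for u in (0.05, 0.5, 0.95):
    assert abs(F(quantile(u, kappa0, D), kappa0, D) - u) < 1e-12

rng = random.Random(0)
marks = [quantile(rng.random(), kappa0, D) for _ in range(50000)]
median = quantile(0.5, kappa0, D)          # = kappa0 * 2**D
below = sum(1 for m in marks if m <= median) / len(marks)
print(round(below, 3))  # close to 0.5
```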
Our main interest now is in exhibiting random measures of the above type that are both self-similar and stationary. The stable processes form one such example: constructing additional examples is not a trivial exercise. We attempt this only for processes in one dimension (X = R), where we can specify the model via its conditional intensity function (see Section 7.3 and the more extended discussion in Chapter 14). Because we are concerned with processes with an infinite past, the appropriate version of the conditional intensity is the complete conditional intensity, λ∗ (t, κ), representing the current risk (of a point in [t, t + dt) with mark in [κ, κ + dκ)) given the whole past back to −∞. The next result gives necessary conditions which must be satisfied by the complete conditional intensity if the underlying point process is to be stationary and D-biscale invariant. In the present context it is desirable to reflect the dependence on the past explicitly in the notation, so anticipating what we use in Section 14.7, we therefore write λ∗ (t, κ) = ψt (Nt− , κ),
where ψ_t is a functional of the point process realization N_{t−} on (−∞, t) and the mark κ. We also use S_α^{(D)} H_t to denote the past at time t of a transformed process where, taking t as the time origin, the times of past events in the original process are inflated by a factor α and their marks by a factor α^D. Similarly, we use T_τ H_t to denote the past at time t of a version of the original process shifted through τ.

Lemma 12.8.VIII. Let N be an extended MPP with state space R × R_0^+, and complete conditional intensity function

    λ*(t, κ) dt dκ = E[N(dt × dκ) | H_{t−}] = ψ_t(N_{t−}, κ) dt dκ.

(a) If N is stationary then for all real t, ψ_t(N_{t−}, κ) = ψ_0(S_t N_{t−}, κ) ≡ ψ(S_t N_{t−}, κ) is independent of t, so that λ*(t, κ) = ψ(S_t N_{t−}, κ) for all t > 0.
(b) If N is also D-biscale invariant, then for all real α > 0,

    ψ(S_α^{(D)} N_{0−}, α^D κ) = α^{−(1+D)} ψ(N_{0−}, κ),

and for all t, α > 0,

    E[N(d(αt) × d(α^D κ)) | R_α^{(D)} H_t] = ψ(S_t N_{t−}, κ) dt dκ = λ*(t, κ) dt dκ.

Proof. Conditions (a) and (b) are to be understood as equalities of functionals of the infinite past, suitably adjusted where appropriate. Thus, condition (a) means that if there are previous occurrences at {t_i: t_i < t}, and the times are shifted so that these become {t_i + τ: t_i < t}, then the value of the conditional intensity for the shifted process at time t + τ coincides with the value of the conditional intensity for the original process at time t. Now, under the assumptions, the conditional intensities can be expressed in terms of the fidi distributions and vice versa, so the statement is equivalent to equality of the fidi distributions under shifts and hence is a consequence of stationarity. It can be satisfied only if the conditional intensity depends on the past occurrence times through the differences t − t_i and not on the absolute values t_i. It is a necessary condition (but not sufficient) for the conditional intensity itself to be a stationary process in time.
Similarly, to justify condition (b), consider a simultaneous inflation of the time scale (from origin t = 0 back into the past) by a factor α, and of the mark scale by a factor α^D. If the underlying process is D-biscale invariant, the conditional intensity at t = 0 for the inflated process must have the same value as the conditional intensity for the original process at t = 0, yielding the condition (b) for t = 0. It implies that, as a function of past events, the conditional intensity at time t = 0 must be a function of the ratios t_i/κ_i^{1/D}. Because by assumption the underlying process is also stationary, a similar
condition must hold for all t. Note that although the left-hand side of the last equation in condition (b) refers to a conditional intensity at time αt it cannot be equated with λ*(αt) [as wrongly asserted in Vere-Jones (2005)] because the conditioning histories are different.

Conditions (a) and (b) are both satisfied when the conditional intensity can be expressed in the form

    λ*(t, κ) = κ^{−(1+1/D)} h(κ/κ_i, (t − t_i)^D/κ_i),    (12.8.12)

where h is a function of the infinite set of pairs of arguments (κ/κ_i, (t − t_i)^D/κ_i) involving the times and marks t_i, κ_i of past events (i.e., with t_i < t). To understand the form of the arguments of h, note that, in general, the complete conditional intensity should be a function of (t, κ) and the infinite set of pairs (t_i, κ_i). Setting τ = −t in condition (a) shows that the arguments of h can be reduced to κ and the pairs (t − t_i, κ_i). Then setting α = 1/κ^{1/D} and using (b), the arguments reduce to the pairs ((t − t_i)/κ^{1/D}, κ_i/κ), which is equivalent to the form in (12.8.12). Note that the initial term κ^{−(1+1/D)} arises from the inflation of the infinitesimal elements dt dκ.

Unfortunately, the conditions of the lemma provide no guarantee that in any particular case a process with the proposed form of conditional intensity function exists, or, if it does exist, that it is uniquely specified and inherits the invariance properties of the conditional intensities. To illustrate the latter point, consider a Hawkes process, with conditional intensity as set out in Example 7.2(b), but with criticality constant ν ≥ 1. The proposed conditional intensity satisfies condition (a), but the only corresponding point process is explosive and does not admit any stationary version. In developing a potential model, therefore, it is necessary to check two points: that the proposed conditional intensity satisfies the conditions of the lemma, and that a process with this conditional intensity exists and is both stationary and self-similar.
The stable processes correspond to the choice h ≡ const. The next example demonstrates that the proposed class of processes is not limited to the stable processes.

Example 12.8(b). Self-similar ETAS model. Recall from Examples 6.4(d) and 7.3(b) that the standard ETAS model has conditional intensity of the form, for M > M0, p > 0,

    λ*(t, M) = β e^{−β(M−M0)} [µ_c + A Σ_{i: t_i<t} e^{α(M_i−M_0)} p c^p/(c + t − t_i)^{1+p}],    (12.8.13)

where the condition for stability (existence of a stationary version) is that

    ρ = A ∫_{M0}^∞ e^{α(M−M0)} β e^{−β(M−M0)} dM = Aβ/(β − α) < 1.    (12.8.14)
The mark here is in the logarithmic (magnitude) scale. To find a self-similar variant, we should convert back to the original (energy) scale κ = e^M, and investigate the effect of lowering the magnitude cut-off M0 = log κ0. In this formulation, the conditional intensity for the ETAS model takes the form

    λ*(t, κ) = (β/κ0)(κ/κ0)^{−(1+β)} [µ + A Σ_{i: t_i<t} (κ_i/κ0)^α p c^p/(c + t − t_i)^{1+p}].    (12.8.15)
This is reminiscent of the structural form (12.8.12), but several modifications are needed before it can represent the complete intensity of a stationary, biscale invariant process. The essential steps are the following.

1°. The threshold κ0 should disappear. To make this possible we need at least to set α = β.

2°. This now violates the stability condition (12.8.14); to ensure the existence of a suitable point process we replace the factor κ_i^α inside the summation by the stabilizing factor

    S(κ/κ_i) = [min(κ/κ_i, κ_i/κ)]^{δ/2}    (δ > 0).    (12.8.16)
3°. To ensure that all terms have a self-similar form, we replace the constant c in (12.8.13) by the function c(κ_i) = Cκ_i^{1/D} of the size of the generating event at t_i. The resulting complete intensity can be written in the form

    λ*(t, κ) = κ^{−(1+1/D)} [µ + η Σ_{i: t_i<t} S(κ/κ_i) (1 + (t − t_i)/(Cκ_i^{1/D}))^{−(1+p)}].    (12.8.17)
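As a computational sketch (our own code, with hypothetical parameter values, and with the infinite past truncated to a finite history), the complete intensity (12.8.17), together with the stabilizing factor (12.8.16), can be evaluated directly:

```python
def S(z, delta):
    # stabilizing factor (12.8.16): S(z) = min(z, 1/z) ** (delta / 2)
    return min(z, 1.0 / z) ** (delta / 2.0)

def lambda_star(t, kappa, history, mu, eta, C, p, delta, D):
    # complete intensity (12.8.17); history = [(t_i, kappa_i), ...] is a
    # (necessarily truncated) list of past times and energy-scale marks
    total = mu
    for ti, ki in history:
        if ti < t:
            total += (eta * S(kappa / ki, delta)
                      * (1.0 + (t - ti) / (C * ki ** (1.0 / D))) ** (-(1.0 + p)))
    return kappa ** (-(1.0 + 1.0 / D)) * total

params = dict(mu=0.5, eta=0.3, C=1.0, p=1.0, delta=1.0, D=2.0)
# no past events: the background term alone survives
assert abs(lambda_star(1.0, 4.0, [], **params) - 4.0 ** -1.5 * 0.5) < 1e-12
# one past event at (0, 1), evaluated at kappa = 1: S = 1, delay factor (1+2)^-2 = 1/9
val = lambda_star(2.0, 1.0, [(0.0, 1.0)], **params)
assert abs(val - (0.5 + 0.3 / 9.0)) < 1e-12
print(val)
```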
The quantities µ, η, C, p, δ, and D are positive constants and constitute the parameters of the model. The parameters µ and η are rates per unit time and energy, C has units time/energy^{1/D}, and the remaining parameters are dimensionless. The compatibility with (12.8.12) is now evident, but it is not yet clear whether there exists a well-defined process with this form of complete intensity.

To clarify this point, we revert to the cluster process interpretation of the Hawkes process (see Section 6.3). To accommodate the mark structure, the state space should be changed from R to R × R_0^+. Our candidate model then has cluster centres forming a Poisson process on R × R_0^+, with mark–time intensity µκ^{−(1+1/D)}, and the cluster members from a parent at (t_i, κ_i) are the total offspring, from all generations, of a branching process in which first generation offspring form an independent Poisson process with intensity

    θ(t, κ | t_i, κ_i) = (η/κ^{1+1/D}) S(κ/κ_i) (1 + (t − t_i)/(Cκ_i^{1/D}))^{−(1+p)}    (t > t_i, κ > 0),
and the offspring from this and all later generations independently follow the same Poisson process relative to their own parent. Using this formulation, we can make use of the general criterion for the existence of a Poisson cluster process at Proposition 6.3.III, namely, the convergence, for each bounded Borel set B ∈ X ≡ R × R_0^+ (the set B should also be bounded away from 0 in K = R_0^+), of the integral

    ∫_X Pr{N(B | x) > 0} µ_c(dx),    (12.8.18)
where N(· | x) is the cluster member process from a cluster centre at x and µ_c(·) the expectation measure for the process N_c(·) of cluster centres.

In addition to being nonnegative, the essential characteristic of the kernel θ is that it should admit the function ψ(t′, κ′) = (κ′)^{−(1+1/D)} as the eigenfunction corresponding to a positive (and hence maximum) eigenvalue ρ < 1. To clarify its behaviour in this regard, it is convenient to rewrite θ in the more general form

    θ(t, κ | t′, κ′) = ρ f(t − t′, κ′) P(κ, κ′) (κ/κ′)^{−(1+1/D)},    (12.8.19)

where f is normalized to be a probability density function in u = t − t′ and P is normalized to be a Markov transition kernel in κ. Straightforward computations show that this is achieved in the present instance by setting

    f(t − t′, κ′) = [p/(C(κ′)^{1/D})] (1 + (t − t′)/(C(κ′)^{1/D}))^{−(1+p)},
    P(κ, κ′) = (δ/(2κ)) [min(κ/κ′, κ′/κ)]^{δ/2},
    ρ = ηC/(δp).
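The eigenfunction property of ψ can be probed numerically. In the sketch below (our own illustration) the t′-integral is evaluated analytically, ∫_0^∞ (1 + u/(Cκ′^{1/D}))^{−(1+p)} du = Cκ′^{1/D}/p, after which the remaining κ′-integration of θ(t, κ | t′, κ′)ψ(κ′)/ψ(κ) no longer involves D; a log-grid trapezoid then checks that the resulting constant is the same for different values of κ (the value of the constant itself is not pinned down here).

```python
import math

def S(z, delta):
    # stabilizing factor: S(z) = min(z, 1/z) ** (delta / 2)
    return min(z, 1.0 / z) ** (delta / 2.0)

def eigen_ratio(kappa, eta, C, p, delta, lo=1e-10, hi=1e10, n=100000):
    # [∫∫ theta(t, k | t', k') psi(k') dt' dk'] / psi(kappa)
    #   = eta * (C/p) * ∫ S(kappa/k') dk'/k'
    # (time integral done exactly; the 1/D powers cancel, so D drops out)
    a, b = math.log(lo), math.log(hi)
    h = (b - a) / n
    total = 0.0
    for k in range(n + 1):
        kp = math.exp(a + k * h)
        w = 0.5 if k in (0, n) else 1.0
        total += w * S(kappa / kp, delta)    # dk'/k' = d(log k')
    return eta * (C / p) * total * h

args = dict(eta=0.3, C=1.0, p=1.0, delta=1.0)
r1, r2 = eigen_ratio(1.0, **args), eigen_ratio(37.0, **args)
assert abs(r1 - r2) / r1 < 1e-3   # same constant: psi is an eigenfunction
print(round(r1, 3), round(r2, 3))
```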
For a branching process with Poisson located offspring, the first generation of offspring from an ancestor at (t_c, κ_c) follows a Poisson process with intensity θ(t, κ | t_c, κ_c), the second generation follows a Poisson process with intensity

    θ^{(2)}(t, κ | t_c, κ_c) = ∫_X θ(t, κ | t′, κ′) θ(t′, κ′ | t_c, κ_c) dt′ dκ′,
and in general the kth generation follows a Poisson process with intensity given by the kth iterate of θ, say θ^{(k)}(· | ·). Then, treating t_c and κ_c as the coordinates of a cluster centre (independent arrival) and writing B for a Borel subset of R × R_0^+ that is both bounded and bounded away from 0 in its second argument, we can estimate the integral by

    ∫_X Pr{N(B | x) > 0} µ_c(dx) ≤ Σ_k ∫_X Pr{N_k(B | x) > 0} µ_c(dx)
                                 ≤ Σ_k ∫_X E[N_k(B | x)] µ_c(dx)
                                 = µ Σ_k ∫_R ∫_{R_0^+} θ^{(k)}(B | t_c, κ_c) ψ(t_c, κ_c) dt_c dκ_c,

where N_k(· | x) is the process of kth generation offspring, θ^{(k)}(B | ·) = ∫_B θ^{(k)}(t, κ | ·) dt dκ, and the intensity µ_c(dx) of cluster centres has been replaced by the specific form µψ(t_c, κ_c) dt_c dκ_c. But because ψ(t, κ) is an eigenfunction, this last sum reduces to

    M(B) = µ Σ_{k=0}^∞ ρ^k ∫_B κ^{−(1+1/D)} dt dκ.
If ρ < 1 and B is bounded away from 0 on the κ axis, this sum is certainly finite, and then represents the expected number of points falling into B, namely,

    M(B) = [µ/(1 − ρ)] ∫_B κ^{−(1+1/D)} dt dκ.    (12.8.20)

The argument shows that, although the number of cluster members is infinite in total, most are very small, and only a finite number fall into a bounded set B bounded away from 0 on the energy axis. A similar statement holds for the overall process. Finally, given that such a cluster process exists, and has the above mean rate, it follows that the series defining λ*(t, κ) converges almost surely, and then represents the total risk of an event in dt × dκ, given the locations of all points with t_i < t, that is, given the complete history of the process up to time t.

The model can be modified and extended in various ways. In particular, the power-law form for the density function f of time-delays is not an inevitable feature of self-similarity. It can just as well be replaced by an exponential (short-tailed) form without affecting either stationarity or self-similarity. An outline is given in Exercise 12.8.4. Thus there is no necessary connection between self-similarity and long-range dependence, as there is in the case of the fractional Brownian motions. On the other hand, self-similarity does imply power-law growth in some sense, as is apparent from the very definition. However, because the moments are infinite, this sense cannot be expressed in terms of the rate of growth of the moment functions.
A related and important approach to self-similarity for random measures is developed in the papers by Zähle already quoted. Zähle suggests basing the property, not on absolute locations relative to the state space X, but on locations relative to a given point of the realization, that is, on the Palm distributions of the process. This allows the treatment of some examples, such as Lévy dust [Example 9.1(g)], which lie outside the more restricted development in the text above. Zähle gives a general algebraic treatment, and develops many important properties of self-similar random measures, such as the dimensionality of the set of atoms, but he does not examine the probabilistic structure of the atoms from the point of view considered in this section.
Exercises and Complements to Section 12.8

12.8.1 Show that the renormalization group {R_α^{(D)}} acts boundedly and continuously on M#_X. [Hint: For suitable f, ∫ f(x) R_α^{(D)}µ(dx) = α^{−D} ∫ f(y/α) µ(dy). Now imitate the proof of continuity of the shifts S_x in Lemma 12.1.I.]

12.8.2 Develop a representation for self-similar completely random measures in R² analogous to that set out in Proposition 12.8.IV for such measures in R. Then consider the simplifications which occur on assuming that the random measure is in addition (a) homogeneous (i.e., stationary with respect to shifts in R²), (b) isotropic, or (c) both. [Hint: Consider the effects on the intensity of the Poisson process in the representation (10.1.4).]

12.8.3 Let N be a self-similar extended MPP on R^d × R+; transform the mark-scale by setting q = log κ. Show that D-self-similarity is equivalent to requiring the transformed process N* on R^d × R to have fidi distributions that are invariant under the transformations E_α^{(D)}(A × Q) = αA × (S_{D log α} Q). Restate Proposition 12.8.VII and Lemma 12.8.VIII in terms of the transformed process N*.

12.8.4 Investigate in detail the properties of a version of the self-similar ETAS model where the normalized density for the time delay in (12.8.19) has the exponential form f(t − t_i) = c_{κ_i} e^{−c_{κ_i}(t−t_i)}. In particular check that the form (12.8.12) can be sustained, and that the existence of a stationary process can be established as in Example 12.8(b).

12.8.5 Long-range dependence of self-similar ETAS model.
(a) In the model of Example 12.8(b), show that the first- and higher-order moment measures of self-similar random measures do not exist, so that long-range dependence in the sense of Section 12.7 cannot be defined.
(b) Investigate conditions for long-range dependence of the process N_K restricted to any mark set K bounded away from 0 and ∞.
CHAPTER 13
Palm Theory
13.1 Campbell Measures and Palm Distributions
13.2 Palm Theory for Stationary Random Measures
13.3 Interval- and Point-stationarity
13.4 Marked Point Processes, Ergodic Theorems, and Convergence to Equilibrium
13.5 Cluster Iterates
13.6 Fractal Dimensions
In Section 3.4 we gave a brief introduction to Palm–Khinchin equations and noted that, for a stationary point process on the line, they provide a link between counting and interval properties. In this chapter we study this link both in more detail and in a more general setting. It is a topic that continues to find new applications, both within point process theory itself and in the applications of that theory to ergodic theory, queueing theory, stochastic geometry, and many other fields. Its continuing relevance is linked to the shift of viewpoint that it entails: from an absolute frame of reference outside the process under study, to a frame of reference inside the process (meaning, for a point process, relative to a point of the process). Such a change of viewpoint is usually insightful, and sometimes essential, in seeking an understanding of point process properties.

Early contributions by Palm (1943) and Khinchin (1955) have already been noted in Chapter 3. Subsequently, the general theme was taken up by Kaplan (1955), who was influenced by Doob's (1948) work on renewal processes, and Slivnyak (1962, 1966). This work examined point processes in R with the property of interval-stationarity; successful extension of this idea to point processes in R^d has been much more recent. A critical development in the study of stationary random measures and point processes was the formulation by Kummer and Matthes (1970) of what they called Campbell measure, in essence a refinement of the Radon–Nikodym approach that Ryll-Nardzewski (1961) and Papangelou (1970, 1974a) used earlier. The later evolution of their work can be traced through the three editions—in German, English, and Russian—of Matthes, Kerstan, and Mecke,
referred to as MKM (1974, 1978, 1982). The relation with ergodic theory (the theory of flows and of flows under a function) was studied by Neveu (1968, 1976), Papangelou (1970), and Delasnerie (1977). Baccelli and Brémaud (1994) exploit the links between material in this chapter and the martingale approach outlined in Chapter 14, while Sigman (1995) and Thorisson (2000) use shift-coupling arguments in an alternative approach to the main limit theorems and their applications in queueing theory and elsewhere. More recently, Thorisson (2000), Timár (2004), Heveling and Last (2005), and others have shown how to extend the concept of interval-stationarity for a point process in R to a more general concept of point-stationarity in R^d for d ≥ 2, although the idea dates back at least to Mecke (1975). The theory has many applications, notably in queueing theory, where work was initiated by König and Matthes (1963); see also König, Matthes, and Nawrotski (1967), Franken (1975), Franken et al. (1981), and Brandt, Franken, and Lisek (1990), among many others. Related applications in stochastic geometry are presented in Stoyan and Mecke (1983) and, more profusely, in Stoyan, Kendall, and Mecke (1987, 1995) [SKM (1987, 1995) below].

In our discussion, which has been strongly influenced by MKM (1978), the major emphasis is on the stationary case. The main results are derived by a factorization of the Campbell measure, which parallels the factorization of the moment measures given in Section 12.6. Indeed, the reduced moment measures reappear in this chapter as multiples of the moment measures of the Palm distribution.

The definition of Campbell measure and a brief account of the Radon–Nikodym approach are given in Section 13.1. The main results for stationary random measures are set out in Section 13.2, and Section 13.3 develops the basic relationships between stationarity of the measure and stationarity relative to points of the process.
This includes the interpretation of the Palm distribution as the distribution ‘conditional on a point at the origin,’ the equivalence between stationarity of the point process and stationarity of the intervals for a one-dimensional point process, and its recent extensions to point-stationarity in higher dimensions. Ergodicity and convergence to equilibrium from the Palm distribution are discussed in Section 13.4, which also outlines extensions to MPPs. Section 13.5 gives the discussion of cluster iterates deferred from Chapter 11, and Section 13.6 looks at an interpretation of fractal dimensions in terms of moments of the Palm distribution.
13.1. Campbell Measures and Palm Distributions

For any random measure ξ, including possibly a point process, on the c.s.m.s. X, we introduce a measure C_P(· × ·) on the product space W ≡ X × M_X^# by setting, for A ∈ B_X, U ∈ B(M_X^#), and P the distribution of ξ on M_X^#,
C_P(A × U) = E[ξ(A) I_U(ξ)] = ∫_U ∫_A ξ(dx) P(dξ).   (13.1.1a)
It represents a refinement of the first moment measure M(A) = C_P(A × M_X^#), which results when U is expanded to cover the full space M_X^#.

Write B_W for the product Borel σ-field B_X ⊗ B(M_X^#); that is, B_W is generated by all product sets A × U with A ∈ B_X and U ∈ B(M_X^#). The set function C_P(·) is clearly countably additive on such product sets, but it is totally finite if and only if the first moment measure exists and is totally finite. To see that C_P(·) is always at least σ-finite, let {A_m} (m = 1, 2, ...) be a sequence of bounded Borel sets covering X, and define

U_mn = {ξ: ξ(A_m) ≤ n}   (n = 1, 2, ...).
Then the inequalities

C_P(A_m × U_mn) = ∫_{U_mn} ∫_{A_m} ξ(dx) P(dξ) ≤ n P(U_mn) ≤ n
imply that C_P is certainly finite on each set A_m × U_mn. These sets cover W, because for any given (x, ξ) ∈ W we can select A_m ∋ x and then, because any ξ ∈ M_X^# is a.s. boundedly finite, given A_m we can find n such that ξ(A_m) ≤ n, so (x, ξ) ∈ A_m × U_mn. It then follows that the set function C_P extends uniquely to a σ-finite measure on B_W. We continue to use C_P for this extension.

It is also convenient to introduce here the modified Campbell measure¹ C_P^!(· × ·), which plays an important role in the analysis of spatial point processes in Chapter 15. It is defined much as in (13.1.1a), but specifically for simple point processes N, and with the special feature of excluding the point at the origin which is characteristic of the ordinary Palm distributions. For A ∈ B_X and U ∈ B(N_X^{#*}), and P the distribution of N on N_X^{#*}, we write

C_P^!(A × U) = E[∫_A I_U(N \ x) N(dx)],   (13.1.1b)
where N \ x denotes the realization of N modified by the removal of any point that there may be at location x (sometimes N \ x is written loosely as N − δ_x).

Definition 13.1.I. (a) The Campbell measure C_P associated with the random measure ξ on the c.s.m.s. X, having distribution P on M_X^#, is the unique extension of the set function defined at (13.1.1a) to a σ-finite measure on B_W.

¹ Kallenberg (1983a, §12.3) uses the term 'reduced Campbell measure,' which we avoid here to eliminate any confusion with the term 'reduced moment measure' used, for example, onwards from Proposition 12.6.III.
(b) The modified Campbell measure C_P^! associated with the simple point process N on the c.s.m.s. X with distribution P on N_X^{#*} is the unique extension of the set function defined at (13.1.1b) to a σ-finite measure on B_W.

For the remainder of this chapter we deal only with the ordinary Campbell measure and associated Palm measures, leaving until Chapter 15 any results we need from the analogous development of modified Campbell and Palm measures (but see Exercise 13.2.7).

By following the usual route from indicator functions to simple functions and limits of simple functions, the quantity defined initially at (13.1.1a) extends to the following integral form.

Lemma 13.1.II. For B_W-measurable functions g(x, ξ) that are either nonnegative or C_P-integrable,

∫_W g(x, ξ) C_P(dx × dξ) = E[∫_X g(x, ξ) ξ(dx)] = ∫_{M_X^#} ∫_X g(x, ξ) ξ(dx) P(dξ).   (13.1.2)
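The identity (13.1.1a) lends itself to a direct simulation check. The sketch below is our own illustration, not part of the text: all numerical choices (a homogeneous Poisson process on [0, 1] with intensity λ = 5, A = [0, 0.5], and U = {ξ: ξ([0, 1]) ≤ 8}, a set of the form U_mn from the σ-finiteness argument) are arbitrary. It compares a Monte Carlo estimate of E[ξ(A) I_U(ξ)] with the exact value obtained from the independence of the Poisson counts on A and its complement.

```python
import math
import random

random.seed(1)

lam = 5.0        # intensity of a homogeneous Poisson process on [0, 1] (our choice)
A = (0.0, 0.5)   # the bounded set A
n_cap = 8        # U = {xi : xi([0, 1]) <= n_cap}, a set of the form U_mn

def poisson_points(rate):
    """Sample a homogeneous Poisson process on [0, 1]: Poisson count, uniform points."""
    u, p, cdf, n = random.random(), math.exp(-rate), math.exp(-rate), 0
    while u > cdf:
        n += 1
        p *= rate / n
        cdf += p
    return [random.random() for _ in range(n)]

trials = 200_000
acc = 0.0
for _ in range(trials):
    pts = poisson_points(lam)
    if len(pts) <= n_cap:                                  # indicator I_U(xi)
        acc += sum(1 for x in pts if A[0] <= x < A[1])     # xi(A)
mc = acc / trials                                          # estimates C_P(A x U)

# Exact value: xi(A) and xi([0,1] \ A) are independent Poisson(2.5) variates
mu, nu = lam * (A[1] - A[0]), lam * (1.0 - (A[1] - A[0]))
exact = sum(j * math.exp(-mu) * mu ** j / math.factorial(j)
              * math.exp(-nu) * nu ** k / math.factorial(k)
            for j in range(n_cap + 1) for k in range(n_cap + 1) if j + k <= n_cap)
print(round(mc, 2), round(exact, 2))   # the two estimates should agree closely
```

Truncating by a set U_mn is exactly what keeps the restricted Campbell measure finite even when nothing bounds E[ξ(A)] itself.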
We have already noted the connection between Campbell measure and the first moment measure which results on setting U = M_X^# in (13.1.1a); it yields C_P(A × M_X^#) = E[ξ(A)] = M(A) whenever the first moment measure M(·) exists. The link with Campbell's theorem noted around (9.5.2) follows most easily from (13.1.2). When M(·) exists, and g is a function of x only, (13.1.2) reduces to

E[∫_X g(x) ξ(dx)] = ∫_{M_X^#} ∫_X g(x) ξ(dx) P(dξ) = ∫_X g(x) M(dx),
that is, precisely (9.5.2), of which Campbell's (1909) original result is the special case for a stationary Poisson process. No doubt it was this link with (9.5.2) that Kummer and Matthes (1970) had in mind in coining the term Campbell measure for the measure C_P.

Several further comments should be made concerning this definition. As in Chapter 9, the role of the canonical probability space (M_X^#, B(M_X^#), P) can be replaced by a more general probability space (Ω, E, P) without altering the basic character of the definition, provided only that the probability space is rich enough to support the random measure. In this case the product measurable space W above is replaced by the product measurable space (W*, B_{W*}) ≡ (Ω × X, E ⊗ B_X), and the defining property (13.1.1a) of the Campbell measure becomes

C_P(A × U) = E[ξ(A) I_U] = ∫_U ∫_A ξ(dx, ω) P(dω)   (U ∈ E).
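The reduction to (9.5.2) — Campbell's theorem — can likewise be checked numerically. In the following sketch (our own; the intensity and test function are arbitrary choices, not from the text) E[Σ_i g(x_i)] is estimated for a homogeneous Poisson process on [0, 1] with λ = 10 and g(x) = x², and compared with ∫_0^1 g(x) M(dx) = λ/3.

```python
import math
import random

random.seed(2)

lam = 10.0               # first moment measure M(dx) = lam dx on [0, 1]
g = lambda x: x * x      # test function of x only

def poisson_count(rate):
    """Knuth-style Poisson variate."""
    L, k, p = math.exp(-rate), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

trials = 100_000
total = 0.0
for _ in range(trials):
    n = poisson_count(lam)
    total += sum(g(random.random()) for _ in range(n))   # sum of g over the points
mc = total / trials
exact = lam / 3.0        # integral of g(x) against M(dx) over [0, 1]
print(round(mc, 2), round(exact, 2))
```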
In this chapter, as in the last, we develop the basic theory for the canonical probability space. In Chapter 14, however, the more general definition as above is needed when we consider conditioning on random variables external to the point process itself.

A second point to note is that refinements of higher-order moment measures can be defined in a similar way to the Campbell measure itself [see, e.g., Kallenberg (1975, p. 69; 1983a, p. 103)]. For example, a second-order Campbell measure C_P^{(2)} can be defined on X^{(2)} × M_X^# by setting

C_P^{(2)}(A × B × U) = E[ξ(A) ξ(B) I_U(ξ)]   (13.1.3)

for A, B ∈ B_X, U ∈ B(M_X^#). Clearly, the second moment measure M_2(A × B), when it exists, appears as the marginal distribution on integrating out P (see Exercise 13.1.1).

Finally, observe that the construction is not restricted to measures P on M_X^# which have total mass one, but can be carried through for any measure Q on M_X^# for which (i) Q is σ-finite, and (ii) there exists a suitable family {A_m} covering X such that, for all (m, n), Q(U_mn) = Q({ξ: ξ(A_m) ≤ n}) < ∞ [see below (13.1.1a)]. Such a construction, of C_Q say, starting from C_Q(A × U) = ∫_U ∫_A ξ(dx) Q(dξ) with A and U as in (13.1.1a), is important (and always possible) for the KLM measures Q̃ associated with an infinitely divisible random measure (see Exercise 13.1.2). We briefly digress to examine its definition in this more general case, for while it is clear from the construction that Q determines C_Q uniquely, the converse is true only if some additional information is given. The situation is summarized in Lemma 13.1.III. A characterization of measures that can appear as Campbell measures, based on Wegmann (1977), is outlined in Exercise 13.1.3.

Lemma 13.1.III. When the measure Q on M_X^# satisfies (i) and (ii) above, the corresponding Campbell measure C_Q determines Q uniquely on M_X^# \ {∅}; it determines Q uniquely on M_X^# if, in particular, either Q is a probability measure or Q({∅}) = 0.

Proof. Using the assumptions, choose a bounded set A within a finite union of the A_m. Then ξ(A) is Q-a.e. finite, and setting V_x = {ξ: ξ(A) ≤ x} for arbitrary x > 0, we can define F_A(x) = Q(V_x) < ∞. Clearly F_A(·) is the d.f. of a probability distribution when Q is a probability measure, but will not be so in general. Then

C_Q(A × V_x) = ∫_0^x y dF_A(y) = G_A(x), say;
that is, the Campbell measure determines G_A(·), a weighted version of F_A(·). Furthermore, apart from the value of F_A(0+), F_A(·) can be recovered from C_Q via the relation dF_A(x) = x^{−1} dG_A(x). By varying the choice of A we see that C_Q determines all the one-dimensional 'distributions' of Q. Analogous arguments apply to the multivariate fidi distributions [recall that the basic sample functions ξ(·) are Q-random measures] and show that the Campbell measure determines all the fidi distributions of Q, hence Q itself, up to the value of Q({∅}). This last is clearly determined if Q(M_X^#) = 1, or if its value is explicitly prescribed.

The key to the introduction of Palm distributions in general is the relation between the Campbell measure C_P and the first moment measure M(·). Whenever M(·) exists as a boundedly finite measure, then for each fixed U ∈ B(M_X^#), C_P(· × U) is absolutely continuous with respect to M(·). We can thus introduce the Radon–Nikodym derivative as a B_X-measurable function P_x(U) satisfying, for each A ∈ B_X,

∫_A P_x(U) M(dx) = C_P(A × U),   (13.1.4)
and P_x(U) is defined uniquely up to values on sets of M-measure zero. Moreover, for each fixed bounded Borel set A,

C_P(A × U)/M(A) = C_P(A × U)/C_P(A × M_X^#)

is a probability measure in U on B(M_X^#). Just as in the discussion of regular conditional probabilities (see Proposition A1.5.III), it follows that the family {P_x(U)} can be chosen so that
(A) for each fixed U ∈ B(M_X^#), P_x(U) is a measurable function of x that is M-integrable on bounded subsets of X; and
(B) for each fixed x ∈ X, P_x(·) is a probability measure on B(M_X^#).
We call each such measure P_x(·) a local Palm distribution for ξ, and the family of such measures satisfying (A) and (B) the Palm kernel associated with ξ. The discussion above then implies the following result, in which (13.1.5) follows from (13.1.4) by the usual extension arguments.

Proposition 13.1.IV. Let ξ be a random measure whose first moment measure M exists. Then ξ admits a Palm kernel, that is, a regular family of local Palm distributions {P_x(·): x ∈ X}, defined uniquely up to values on M-null sets, such that for all B_W-measurable functions g that are either nonnegative or C_P-integrable,

E[∫_X g(x, ξ) ξ(dx)] = ∫_{X×M_X^#} g(x, ξ) C_P(dx × dξ) = ∫_X E_x[g(x, ξ)] M(dx),   (13.1.5)
where

E_x[g(x, ξ)] = ∫_{M_X^#} g(x, ξ) P_x(dξ)   (x ∈ X).
Note that this proposition holds equally for random measures and for point processes; nor does it require ξ to be stationary. When ξ is stationary, the local Palm distributions become translated versions of a single basic distribution, so that

P_x(S_x U) = P_0(U)   (ℓ-a.e. x).   (13.1.6)

A more general version of (13.1.6) is given in Section 13.2, where a factorization argument is used and there is no requirement for the existence of first moments. An outline proof in the present setting, using arguments similar to those of Exercise 12.1.9, is sketched in Exercise 13.1.4(a).

We turn to illustrate the nature of local Palm distributions, first for a random measure with a density.

Example 13.1(a) Palm distributions for a random measure with density. Suppose that the random measure ξ on R has trajectories with a.s. continuous, locally bounded derivatives dξ(x, ω)/dx = X(x, ω), and that the first moment measure M(·) has a continuous, locally bounded density dM(x)/dx = m(x) = E[X(x)]. Here we use ω ∈ Ω for the probability space, rather than ξ ∈ M_X^#, merely to avoid ambiguity of notation. In (13.1.5) let g(·) run through a sequence of functions of the form g_n(x, ω) = h_n(x) I_U(ω), where U ∈ B(Ω) is fixed and {h_n(x)} is a sequence of functions converging to δ_{x_0} for some x_0 ∈ R, with m and X continuous at x_0. From (13.1.5) we obtain

∫_U P(dω) ∫_X h_n(x) X(x, ω) dx = ∫_X m(x) h_n(x) P_x(U) dx.

Using the a.s. continuity and local boundedness, the left-hand side converges as n → ∞ to

∫_U X(x_0, ω) P(dω),

and the right-hand side converges to m(x_0) P_{x_0}(U), these functions being continuous in x at x_0 by assumption, so

P_{x_0}(U) = (1/m(x_0)) ∫_U X(x_0, ω) P(dω).   (13.1.7)

Thus, the measure P_{x_0}(·) appears as a reweighted version of the probability measure P, the weight for a particular realization ξ(ω) being taken proportional to the value of the density X(x_0, ω) at the chosen point x_0. Alternatively, if Y is any random variable defined on the process, and E_{x_0}(·) denotes expectation with respect to the Palm distribution at x_0, then (13.1.7) is equivalent to

E_{x_0}(Y) = E[X(x_0) Y]/E[X(x_0)].
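The reweighting formula E_{x_0}(Y) = E[X(x_0)Y]/E[X(x_0)] can be made concrete with a toy random measure. In the sketch below (entirely our own construction, not from the text) the density is a random constant W taking the values 1 and 3 with equal probability, so the Palm expectation of Y = W should equal E[W²]/E[W] = 5/2.

```python
import random

random.seed(3)

# Random measure xi(dx) = W dx with random level W: the density at any x0 is X(x0) = W.
trials = 100_000
num = den = 0.0
for _ in range(trials):
    W = 1.0 if random.random() < 0.5 else 3.0
    Y = W                      # test random variable Y defined on the process
    num += W * Y               # accumulates E[X(x0) Y]
    den += W                   # accumulates E[X(x0)]
palm_mean = num / den          # estimate of E_x0(Y) = E[W Y] / E[W]
print(round(palm_mean, 2))     # should be close to 2.5
```

The Palm expectation exceeds the unconditional mean E[W] = 2: realizations with larger density put more mass near x_0 and so are seen more often "from a point of the measure".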
Some condition such as the assumed continuity in x at x_0 of P_x(U) is essential, as can be shown by counterexamples where the right- and left-limits at x_0 of P_x(U) exist but are different [see Leadbetter (1972) for the point process context].

Greater interest attaches to random measures that are a.s. purely atomic. In this case the Campbell measure inherits a singular structure from ξ, in that its support is restricted to the subset of X × M_X^# defined by

U = {(x, ξ) ∈ X × M_X^#: ξ({x}) > 0}.   (13.1.8)

For point processes the support is further restricted to the subset of X × N_X^# defined by

V = {(x, N) ∈ X × N_X^#: N({x}) ≥ 1}.   (13.1.9)
The relevant properties of U and V are summarized in the next proposition.

Proposition 13.1.V. Let U and V be defined by (13.1.8–9).
(i) U is a Borel subset of X × M_X^# and V is a Borel subset of X × N_X^#.
(ii) A random measure ξ with distribution P on M_X^# is purely atomic if and only if its Campbell measure C_P satisfies C_P(U^c) = 0.
(iii) ξ is a point process if and only if C_P(V^c) = 0.

Proof. Consider for each n > 0 a partition of X into a countable family T_n of Borel subsets {A_nm: m = 1, 2, ...} of diameter ≤ n^{−1}, and set

V_nmk = {(x, ξ): x ∈ A_nm, ξ(A_nm) ≥ 1/k}.

Clearly, V_nmk ∈ B_X ⊗ B(M_X^#). We assert that
U = ⋃_{k=1}^∞ ⋂_{n=1}^∞ ⋃_{m=1}^∞ V_nmk,   (13.1.10)
implying inter alia that U is a measurable subset of X × M_X^#. To justify (13.1.10), consider any (x, ξ) ∈ U. Because ξ({x}) > 0 there exists k′ such that ξ({x}) ≥ 1/k′. For each n, x belongs to just one element of T_n, A_nm′ say, for which ξ(A_nm′) ≥ ξ({x}) ≥ 1/k′. Hence (x, ξ) ∈ V_nm′k′. Thus every element of U is an element of the right-hand side of (13.1.10). Conversely, for fixed k, suppose (x, ξ) ∈ ⋂_n ⋃_m V_nmk, and for each n let A_n(x) denote the unique A_nm containing x. Then

ξ({x}) = lim_{n→∞} ξ(A_n(x)) ≥ 1/k,
implying that (x, ξ) ∈ U, so (13.1.10) holds as asserted. Part (i) is shown.

To check (ii), let T_n = {A_nm: m = 1, ..., r_n} now be a dissecting system for an arbitrary bounded A ∈ B_X, and let U_A denote the analogue of U at
(13.1.10) with these redefined {A_nm}. Use the representation of ξ in terms of the MPP N_ξ to write, for such bounded A,

M(A) = E_P[ξ(A)] = E_{P*}[∫_{A×(0,∞)} κ N(dx × dκ)] < ∞,   (13.1.11)

where expectations are written with respect to the measures P for ξ and P* for N under the one-to-one measurable mapping ξ ↔ N_ξ [cf. Proposition 9.1.V(v)]. Set

V*_nmk = {(x, N): ∫_{A_nm×(1/k,∞)} κ N(dx × dκ) ≥ 1/k}.

We assert that for each n and k, C_{P*}(⋂_{m=1}^∞ (V*_nmk)^c) vanishes. Indeed,

C_{P*}(⋂_{m=1}^∞ (V*_nmk)^c) = E_{P*}[∫_{X×(0,∞)} ∏_{m=1}^∞ [1 − I_{V*_nmk}(x, N)] κ N(dx × dκ)],

and the right-hand side equals

Σ_{m=1}^∞ E_{P*}[∫_{A_nm×(0,∞)} [1 − I_{V*_nmk}(x, N)] κ N(dx × dκ)],

because for x ∈ A_nm, only the term involving A_nm can contribute a term different from unity to the infinite product, so that the integral of the product reduces to the sum of integrals as shown. But if (x, N) ∈ V*_nmk, then I_{V*_nmk}(x, N) = 1, so the integrand in that term, and hence its integral, vanishes. If, alternatively, (x, N) ∉ V*_nmk, then there are no atoms in A_nm of mass > 1/k [i.e., N(A_nm × (1/k, ∞)) = 0], so again the integral vanishes. Thus all terms in the sum vanish, and our assertion is justified. Let

U*_{A,k} = {(x, N): x ∈ A, N({x} × (1/k, ∞)) = 1}.

Then, as for (13.1.10), we can show that

U*_{A,k} = ⋂_n ⋃_{m=1}^{r_n} V*_nmk,

and it follows from the assertion just proved that

C_{P*}((U*_{A,k})^c) = C_{P*}(⋃_n ⋂_{m=1}^{r_n} (V*_nmk)^c) = 0.

Define U*_A analogously to U_A but in terms of the counting measures, that is, U*_A = ⋃_{k=1}^∞ ⋂_{n=1}^∞ ⋃_{m=1}^{r_n} V*_nmk, and consider the difference

C_{P*}(U*_A) − C_{P*}(U*_{A,k}) = E_{P*}[∫_{A×(0,1/k]} κ I_{U*_A}(x, N) N(dx × dκ)].
Letting k → ∞, the difference converges to zero by (13.1.11) and dominated convergence, so that C_{P*}((U*_A)^c) = 0, or equivalently, C_P((U_A)^c) = 0. Because the space X is separable, we can cover it by a countable family of bounded sets A_i on each of which C_P((U_{A_i})^c) = 0. Also, because

U^c = {(x, ξ): ξ({x}) = 0} = ⋃_i {(x, ξ): x ∈ A_i, ξ({x}) = 0} ⊆ ⋃_i (U_{A_i})^c,
assertion (ii) in the Proposition now follows. Assertion (iii) follows by a similar (and simpler!) argument applied to the space of counting measures.

The proposition above shows that for random measures that are a.s. purely atomic, each local Palm measure P_x inherits the singular structure of the Campbell measure, in the form of an atom at the point x selected as a local origin. In the point process case, this leads directly to the interpretation of the Palm distribution P_x as a distribution conditional on the occurrence of a point at x. For a direct approach to this definition, under some assumptions, see Exercise 13.1.9. The stationary case is discussed in detail in Section 13.3. More generally, for a purely atomic random measure, it follows from the proposition that the only contributions to the local Palm distribution P_x come from realizations of ξ which have an atom at x. The relationships are most easily explored in the case of a random measure with only finitely many atoms, or equivalently, a finite point process with positive marks, as in the next example. The special case of a nonsimple point process is illustrated in Exercise 13.1.6; the general relationship is sketched in Exercise 13.1.7.

Example 13.1(b) Purely atomic random measure with finitely many atoms. On a state space X we suppose given a random measure ξ that is purely atomic and a.s. has only finitely many atoms, so ξ(X) < ∞; we also assume that M(X) = E[ξ(X)] < ∞. Denote P{ξ has n atoms} = p_n for some probability distribution {p_n}. Let {x_i: i = 1, ..., n} denote the locations of the atoms and {κ_i} their masses, supposing that the locations are i.i.d. with distribution F(·), and that the masses are independently distributed with distributions Π(· | x) conditional on the locations. We can thus describe a realization ξ by means of the set {y_1, ..., y_n} for some finite integer n, where the pairs y_i = (x_i, κ_i) are i.i.d. on X × R_0^+ with distribution Ψ(dy) = Ψ(d(x, κ)) = Π(dκ | x) F(dx). With this notation, and recalling (9.1.4) and Proposition 9.1.III(v), the process can also be identified with an MPP N on X with positive marks, and for A ∈ B_X, ξ(A) = ∫_A ξ(dx) = Σ_{x_i∈A} κ_i.

To identify the Palm kernel for ξ, consider first the left-hand side of the defining equation (13.1.5), taking the function g(x, ξ) there to be of product form α(x)h(ξ), where h on Y^∪ is defined piecewise as h_n(y_1, ..., y_n) on Y^{(n)}, on which it is symmetric in the indices (1, ..., n) for positive n; h can be arbitrary
for n = 0 because the integral vanishes when ξ(X) = 0. Then

E[∫_X g(x, ξ) ξ(dx)] = Σ_{n=1}^∞ p_n ∫_{Y^{(n)}} Σ_{j=1}^n κ_j α(x_j) h_n(y_1, ..., y_n) ∏_{i=1}^n Ψ(dy_i).
Because h_n and the joint distribution are symmetric, this can be rewritten as

Σ_{n=1}^∞ n p_n ∫_Y κ α(x) Π(dκ | x) F(dx) ∫_{Y^{(n−1)}} h_n((x, κ), y_1, ..., y_{n−1}) ∏_{i=1}^{n−1} Ψ(dy_i),

where μ = Σ_{n=1}^∞ n p_n denotes the mean number of atoms. Introducing the mean atomic mass m(x) = ∫_{R_0^+} κ Π(dκ | x), the first moment measure is then given by M(dx) = μ m(x) F(dx). Inspecting the right-hand side of (13.1.5), define for x ∈ X, κ ∈ R_0^+, n ∈ Z_+,

p*_n = n p_n/μ   and   Π*(dκ | x) = κ Π(dκ | x)/m(x),

and identify E_x[g(x, ξ)] as
∞ n=1
p∗n
R+ 0
∗
Π (dκ | x)
Y (n−1)
n−1
κj hn−1 (y1 , . . . , yn−1 )
j=1
n−1
Ψ(dyi ).
i=1
We can now see that (13.1.5) is indeed satisfied if we take for P_x(·) on Y^{(n)}, for n = 1, 2, ..., the symmetrized version of the measure

P*_n(x; dy_1 × ··· × dy_n) = p*_n δ_x(dx_1) Π*(dκ_1 | x_1) ∏_{i=2}^n Ψ(dy_i).   (13.1.12)
In comparison with the corresponding component of the original measure, the above component is 'tilted' in two respects: the distributions of the realizations are weighted by the number of points they contain (a realization with n points has n different possibilities of locating a point at a given origin), and the distribution of the mass of the atom at x is weighted by the mass of the original atom at x. Exercise 13.1.7 extends this example to a more general setting.

Palm distributions for an MPP are best introduced by treating the MPP as a point process on X × K. Each local Palm distribution P_{(x,κ)} then represents the behaviour of the process given the occurrence of a point at x with mark κ. The relation of the local Palm distributions to the Palm distribution of the ground process, which we assume has finite first moment measure, can be deduced from the following representation, where the function g(·, ·) of (13.1.5) is taken to have the special form g(x, N) for x ∈ X and N ∈ N_{X×K}^#:
E[∫_{X×K} g(x, N) N(d(x, κ))]
  = ∫_{X×K} M(d(x, κ)) ∫_{N_{X×K}^#} g(x, N) P_{(x,κ)}(dN)
  = ∫_X M^g(dx) ∫_K Π(dκ | x) ∫_{N_{X×K}^#} g(x, N) P_{(x,κ)}(dN)
  = ∫_X M^g(dx) ∫_{N_{X×K}^#} g(x, N) ∫_K Π(dκ | x) P_{(x,κ)}(dN)
  = ∫_X M^g(dx) ∫_{N_{X×K}^#} g(x, N) P̄_x(dN).   (13.1.13)
In this chain of relations, M^g(·) = M(· × K) is the first-moment measure for the ground process, and we have used the disintegration M(d(x, κ)) = M^g(dx) Π(dκ | x). The measure P̄_x(·) defined at the last step can be interpreted as an 'average' local Palm distribution, representing the behaviour of the process given the occurrence of a point at x with unspecified mark. The ground measure of the Palm distribution can now be obtained as the projection of P̄_x(·), which lives on the Borel sets of N_{X×K}^#, onto the Borel sets of N_X^#. Indeed, denoting this projection by P̄_x^g and taking g(x, N) in (13.1.13) to be a function g*(x, N_g) of x and N_g only, (13.1.13) reduces to

E[∫_X g*(x, N_g) N_g(dx)] = ∫_X M^g(dx) ∫_{N_X^#} g*(x, N_g) P̄_x^g(dN_g).
Exercise 13.1.10 gives some further details.

To illustrate this situation, suppose that in Example 13.1(b) above we treat the process not as a random measure ξ but as an MPP. Then the representation (13.1.12) for the local Palm distribution of ξ should be replaced by

P*_n(y; dy_1 × ··· × dy_n) = p*_n δ_y(dy_1) ∏_{i=2}^n Ψ(dy_i)   (13.1.14a)

for the local Palm distribution of the MPP, and for its ground process,

P̄^g_n(x; dx_1 × ··· × dx_n) = p*_n δ_x(dx_1) ∏_{i=2}^n F(dx_i).   (13.1.14b)
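The number-tilting p*_n = n p_n/μ appearing in (13.1.12) and (13.1.14a–b) is ordinary size-biasing, and can be checked by simulation. The sketch below is our own (the Poisson choice for {p_n} is arbitrary, not from the text): it uses the identity Σ_n p*_n h(n) = E[N h(N)]/E[N] for a bounded test function h, and the fact that for Poisson p_n the size-biased law p*_n is that of 1 + N, so the two estimates should agree.

```python
import math
import random

random.seed(4)

mu0 = 3.0                        # atom-count distribution p_n = Poisson(mu0)
h = lambda n: 1.0 / (1 + n)      # bounded test function

def poisson_count(rate):
    L, k, p = math.exp(-rate), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

trials = 200_000
num = den = 0.0
for _ in range(trials):
    n = poisson_count(mu0)
    num += n * h(n)              # estimates E[N h(N)]
    den += n                     # estimates E[N] = mu
tilted = num / den               # empirical value of sum_n p*_n h(n)

# For Poisson(mu0), p*_n = n p_n / mu0 is the law of 1 + Poisson(mu0):
exact = sum(h(1 + n) * math.exp(-mu0) * mu0 ** n / math.factorial(n) for n in range(60))
print(round(tilted, 3), round(exact, 3))
```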
We conclude this section with a characterization of Palm distributions via Laplace functionals, and give some results that can be deduced via this characterization. Much as at (9.4.18), write L[f], where f ∈ BM_+(X), for the Laplace functional of a random measure with distribution P, and {L_x[f]} for the family of Laplace functionals derived from the associated Palm kernel at (13.1.5), so that for x ∈ X and f ∈ BM_+(X),

L_x[f] = ∫_{M_X^#} exp(−∫_X f(y) ξ(dy)) P_x(dξ).   (13.1.15)
Proposition 13.1.VI. Let ξ be a random measure with finite first moment measure M, and let L[f], L_x[f] be the Laplace functionals associated with the original random measure and its Palm kernel, respectively. Then the functionals L[f] and L_x[f] satisfy the relation, for f, g ∈ BM_+(X),

lim_{ε↓0} (L[f] − L[f + εg])/ε = ∫_X g(x) L_x[f] M(dx).   (13.1.16)

Conversely, if a family {L_x[f]} satisfies (13.1.16) for all f, g ∈ BM_+(X) and some random measure ξ with Laplace functional L[·] and first moment measure M(·), then the functionals {L_x[f]} coincide M-a.e. with the Laplace functionals of the Palm kernel associated with ξ.

Proof. Because the first moment measure exists, a finite Taylor expansion (see Exercise 9.5.8) for ε > 0 and f, g ∈ BM_+(X) yields

L[f + εg] = L[f] − ε E[∫_X g(x) exp(−∫_X f(y) ξ(dy)) ξ(dx)] + o(ε).   (13.1.17)

From (13.1.5),

E[∫_X g(x) exp(−∫_X f(y) ξ(dy)) ξ(dx)]
  = ∫_X g(x) M(dx) ∫_{M_X^#} exp(−∫_X f(y) ξ(dy)) P_x(dξ)
  = ∫_X g(x) L_x[f] M(dx);
substitution into (13.1.17) followed by rearrangement leads to (13.1.16).

To prove the converse, suppose that (13.1.16) holds for the family of functions {L̃_x[f]}; because (13.1.16) holds for all g ∈ BM_+(X), the measures L_x[f] M(dx) and L̃_x[f] M(dx) coincide, so L_x and L̃_x agree for M-a.e. x.

The relation at (13.1.16) is useful in identifying the form of the Palm kernel in some simple cases.

Example 13.1(c) The Palm kernel for a Poisson process. For a Poisson process with parameter measure µ(·), we have from below (9.4.18) that

log L[f] = −∫_X (1 − e^{−f(x)}) µ(dx).

Then

dL[f + εg]/dε = −L[f + εg] (d/dε) ∫_X (1 − e^{−f(x)−εg(x)}) µ(dx)
  = −L[f + εg] ∫_X g(x) e^{−f(x)−εg(x)} µ(dx)
  → −L[f] ∫_X g(x) e^{−f(x)} µ(dx)   (ε → 0).
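The limit just computed can be verified by a finite-difference calculation. The sketch below is our own illustration (µ(dx) = λ dx on [0, 1] and the choices of f, g, and ε are arbitrary): it evaluates the Poisson Laplace functional by numerical quadrature and compares (L[f] − L[f + εg])/ε with ∫ g(x) e^{−f(x)} L[f] µ(dx), as (13.1.16) requires with M = µ.

```python
import math

lam = 2.0               # parameter measure mu(dx) = lam dx on [0, 1]
f = lambda x: x
g = lambda x: 1.0

def integral(func, n=10_000):
    """Midpoint rule on [0, 1]."""
    h = 1.0 / n
    return sum(func((i + 0.5) * h) for i in range(n)) * h

def L(func):
    """Laplace functional of the Poisson process: exp(-int (1 - e^{-f}) dmu)."""
    return math.exp(-lam * integral(lambda x: 1.0 - math.exp(-func(x))))

eps = 1e-6
lhs = (L(f) - L(lambda x: f(x) + eps * g(x))) / eps
rhs = lam * integral(lambda x: g(x) * math.exp(-f(x))) * L(f)   # int g(x) L_x[f] M(dx)
print(round(lhs, 4), round(rhs, 4))   # agree to finite-difference accuracy
```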
This can be put in the form of (13.1.16) via the identification M(·) = µ(·) and

L_x[f] = e^{−f(x)} L[f] = L_{δ_x}[f] L[f],   (13.1.18)

where on the right-hand side, L_{δ_x}[·] denotes the Laplace functional of the degenerate random measure with an atom of unit mass at x and no other mass. The interpretation of (13.1.18) is that the local Palm distribution P_x(·) coincides with the distribution of the original process except for the addition of an extra point at x itself to each trajectory. Regarding the local Palm distribution as conditional on a point at x, the independence properties of the Poisson process then imply that, apart from the given point at x, the probability structure of the conditional process is identical to that of the original process.

The relation embodied in (13.1.18) can also be written in the form

P_x = P ∗ δ_x,   (13.1.19)

where δ_x denotes a degenerate random measure as in (13.1.18). Equation (13.1.19) is the focus of a characterization of the Poisson process.

Proposition 13.1.VII [Slivnyak (1962); Mecke (1967)]. The distribution of a random measure with finite first moment measure satisfies the functional relation (13.1.19) if and only if the random measure is a Poisson process.

Proof. The necessity of (13.1.19) has been shown above. For the converse, suppose that P_x satisfies (13.1.19). Then from (13.1.16) we obtain

dL[εf]/dε = −L[εf] ∫_X f(x) e^{−εf(x)} M(dx),

where M is the first moment measure, assumed to exist. Using log L[0] = log 1 = 0,

−log L[f] = ∫_X ∫_0^1 f(x) e^{−εf(x)} dε M(dx) = ∫_X (1 − e^{−f(x)}) M(dx),

so that L[f] is the Laplace functional of a Poisson process with parameter measure equal to M(·).
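The characterization (13.1.19) can also be tested empirically. Projected onto the total count N = N([0, 1]) of a homogeneous Poisson process with intensity λ, the relation P_x = P ∗ δ_x implies E[N h(N)] = λ E[h(N + 1)] for bounded h. The sketch below (our own, with arbitrary λ and h) checks the two sides by Monte Carlo.

```python
import math
import random

random.seed(6)

lam = 4.0
h = lambda n: 1.0 / (1 + n)     # bounded test function

def poisson_count(rate):
    L, k, p = math.exp(-rate), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

trials = 200_000
lhs = rhs = 0.0
for _ in range(trials):
    n = poisson_count(lam)
    lhs += n * h(n)             # E[N h(N)]: the "point of the process" side
    rhs += lam * h(n + 1)       # lam E[h(N + 1)]: the "law plus one point" side
lhs /= trials
rhs /= trials
print(round(lhs, 2), round(rhs, 2))
```

For a non-Poisson count distribution the two sides generally differ, which is the empirical face of the "only if" part of Proposition 13.1.VII.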
Exercises and Complements to Section 13.1

13.1.1 (a) Check that the modified Campbell measure C_P^!(·) has a unique extension from (13.1.1b) to sets in B_X ⊗ B(M_X^#).
(b) Show that the set function C_P^{(2)}(·) defined at (13.1.3) has a unique extension to a σ-finite measure on X × X × M_X^#. When the second moment measure exists, define a second-order family of local Palm distributions P^{(2)}_{x,y}(U) satisfying

∫_{A×B} P^{(2)}_{x,y}(U) M_2(dx × dy) = C_P^{(2)}(A × B × U).
13.1.2 Use the analogue of (13.1.1) to define the Campbell measure C_Q̃(·) of the KLM measure Q̃, using (10.2.8) to establish the σ-finiteness property of C_Q̃(·) on appropriate subsets of X × N_X^#. Appeal to Lemma 13.1.III to establish that C_Q̃(·) determines Q̃ uniquely.
13.1.3 A measure C(·) on X × M_X^# is the Campbell measure C_P of some random measure ξ with σ-finite first moment measure if and only if the following three conditions hold:
(i) C(A × M_X^#) < ∞ for bounded A ∈ B_X;
(ii) ∫_{X×M_X^#} g(x, η) C(dx × dη) = 0 whenever ∫_X g(x, η) η(dx) = 0 for each η ∈ M_X^#; and
(iii) 1 − φ_A ≡ ∫_{A×{η: η(A)>0}} [η(A)]^{−1} C(dx × dη) ≤ 1 for bounded A ∈ B_X.
When these conditions hold, inf_A φ_A = P{ξ = ∅}. [Hint: For the converse, define a measure P on M_X^# by

∫_{M_X^#} f(η) P(dη) = ∫_{X×M_X^#} k(x, S_{−x}η) f(S_{−x}η) C(dx × dη),

where k(·) satisfies (13.2.9); then verify that C and C_P coincide. See Wegmann (1977) for details.]

13.1.4 (a) Use arguments analogous to those of Exercise 12.1.9 to show that (13.1.6) holds when the process N is stationary and has boundedly finite first moment measure.
(b) What can be said about the local second-order Palm measure P^{(2)}_{x,y}(·) when the process is stationary? [Hint: Use Exercise 12.1.8 much as in Exercise 12.1.9.]

13.1.5 Let ξ be a random measure supported by Z = {0, ±1, ...} and with finite first moment measure. Describe its Palm and Campbell measures [see (13.1.4)]. When ξ is a simple point process, reinterpret the Palm measure as a conditional distribution.

13.1.6 Discuss two possible interpretations of the local Palm distributions for a nonsimple point process, the first based on the occurrence of a point at x, and the second based on the occurrence of a point of given multiplicity at x. Show that the first can be represented as an average of the second.

13.1.7 Let Ψ: M_X^# → N^#(X × R_0^+) denote the mapping of a purely atomic boundedly finite random measure ξ on X into an integer-valued extended random measure N_ξ on X × R_0^+ as in Proposition 9.1.III(v).
(a) Show that a measure P on M_X^# induces a measure P* on N^#(X × R_0^+), and conversely.
(b) Let C_P and C_{P*} denote the corresponding Campbell measures. Show that
    ∫_{X×M#_X} g(x, ξ) C_P(dx × dξ) = E_P[∫_X g(x, ξ) ξ(dx)] = E_{P*}[∫_{X×R+_0} κ g(x, Ψ(N)) N(dx × dκ)].
(c) Writing h(x, κ, N) = κ g(x, Ψ(N)) as in part (b), and Y = X × R+_0, show that
    E_{P*}[∫_{X×R+_0} h(x, κ, N) N(dx × dκ)]
        = ∫_{Y×N#_Y} h(x, κ, N) C_{P*}(dx × dκ × dN)
        = ∫_Y M*(dx × dκ) E_{(x,κ)}[h(x, κ, N)],
where M* is the first moment measure of the extended MPP on Y and
    E_{(x,κ)}[h(x, κ, N)] = ∫_{N#_Y} h(x, κ, N) P_{(x,κ)}(dN),
where the Palm kernel P_{(x,κ)}(·) is conditioned both on the location and the mass of a given atom. Simplifications of the relations above depend on particular features in the model as in Example 13.1(b) and (13.1.15).
13.1.8 Show that, although the arguments leading to Proposition 13.1.V can fail when C_P(A × M#_X) = M(A) = ∞ for some bounded A ∈ B_X, they can be recovered by introducing a nonnegative function h(·) such that
    ∫_{A×M#_X} h(x, ξ) C_P(dx × dξ) < ∞    (bounded A ∈ B_X),
and defining a modified Campbell measure H(dx × dξ) = h(x, ξ) C_P(dx × dξ). [Hint: Apply the arguments leading to Proposition 13.1.V to H, and use the results for H to derive corresponding results for C_P itself. A possible function for h is I_B(x)/[1 + ξ(B)] for some fixed bounded B ∈ B_X. Similar arguments occur in the discussion following (13.2.4).]
13.1.9 Establish conditions for a limit interpretation of the local Palm distributions P_x as conditional distributions, given the occurrence of a point in a neighbourhood of x. [Hint: As in Example 13.1(a), consider the behaviour of functions of the form g_n(x, ω) = h_n(x)I_U(ω), where h_n(x) is a δ-sequence. First take U to be the event that a point occurs within a neighbourhood of x, then a compound event incorporating the occurrence of a point near x with additional conditions. Find appropriate continuity conditions for the ratio to converge. The stationary case is discussed in detail in Section 13.3; see also Leadbetter (1972).]
13.1.10 Local Palm measures for an MPP and its ground process. Let N ≡ {(x_i, κ_i)} be an element of N#_{X×K}. Denote by ψ_g the projection of N onto the corresponding realization N_g ≡ {x_i} of its ground process, which has measure P_g [for this projection, regard N as the pair (N_g, S), where S is the ordered sequence of marks {κ_i} from the space K^∞ of such sequences].
(a) Verify that every probability measure P on B(N#_{X×K}) admits the disintegration P(dN) = P_g(dN_g) ν(dS | N_g), where ν is a regular probability kernel on the product space N#_X × K^∞.
(b) Use this kernel to express formally the link between the Palm measures P_0 and P^g_0 of an MPP and its ground process respectively, namely,
    ∫_{N#_{X×K}} g(x, κ, N) P_0(dN) = ∫_{N#_X} P^g_0(dN_g) ∫_{ψ_g^{−1}(N_g)} g(x, κ, N_g, S) ν(dS | N_g).
(c) Check that the measure P^g_x introduced below (13.1.13) is the marginal distribution, in the above sense, corresponding to the measure P_x.
13.1.11 Higher-order versions of local Palm measures.
(a) Let the random measure ξ have boundedly finite second-order moment measure M_2. Show that for any bounded measurable function g(x, y, ξ), the second-order kernel P^{(2)}_{x,y}(·) on M#_X, which exists by Exercise 13.1.1, satisfies
    ∫_{M#_X} P(dξ) ∫_{X^{(2)}} g(x, y, ξ) ξ(dx) ξ(dy) = ∫_{X^{(2)}} M_2(dx × dy) ∫_{M#_X} g(x, y, ξ) P^{(2)}_{x,y}(dξ).
(b) Show that when ξ is stationary, there exists a one-parameter kernel P̆^{(2)}_u(·) such that
    P^{(2)}_{x,y}(·) = S_{−x}P̆^{(2)}_{y−x}(·).
(c) Evaluate explicitly the forms of the kernels P^{(2)}_{x,y}(·) and P̆^{(2)}_u(·) for the simple i.i.d. model of Example 13.1(b).
13.2. Palm Theory for Stationary Random Measures

Throughout this section we suppose that X = R^d and that the random measure ξ is stationary (Definition 12.1.II) or, equivalently, that its distribution P is invariant under the transformations ξ → S_u ξ (all u ∈ X). Recall also the notation S̃_u P in (12.1.3) for the transformation on distributions induced by S_u:
    (S̃_u P)(B) = P(S_u B)    (B ∈ B(M#_X)).
In fact, the only properties of R^d that are critical for the discussion are the existence and uniqueness of the invariant measure, various standard results such as that any measure µ ∈ M#_X is determined by its integrals ∫_X f(x) µ(dx) for bounded nonnegative measurable functions f, and the factorization Lemma A2.7.II. Thus, the results hold equally when X is the circle group S or, more generally, whenever X is an Abelian σ-group as defined in Section A2.7. Explicit treatments from this more general point of view are given by Mecke (1967) and in MKM (1982). The results below are stated for general random measures, but most of the illustrations concern simple point processes. Extensions to stationary MPPs are given at the beginning of Section 13.4.
We start by investigating the effect of stationarity on the Campbell measure, again using W to denote the product space X × M#_X.
Proposition 13.2.I. Let ξ be a random measure on X = R^d with distribution P and C_P the associated Campbell measure on W. Then ξ is stationary if and only if C_P is invariant under the group of transformations Θ_u: W → W defined for each u ∈ X by
    Θ_u(x, ξ) = (x − u, S_u ξ).    (13.2.1a)
The interpretation of (13.2.1) is that, if we shift the origin to u, the Campbell measure of the shifted process relative to u is the same as the Campbell measure of the original process relative to 0. Note also that, by writing Θ̃_u for the mapping on measures on W induced by Θ_u, the assertion of the proposition is expressed more succinctly as
    P = S̃_u P (all u)  if and only if  C_P = Θ̃_u C_P (all u).    (13.2.1b)
Proof. When C_P is the Campbell measure associated with P, the Campbell measure associated with S̃_u P is to be found from [cf. (13.1.2)]
    ∫_{M#_X} ∫_X g(x, ξ) ξ(dx) P(S_u dξ) = ∫_{M#_X} ∫_X g(x, ξ) (S_u ξ)(d(x − u)) P(S_u dξ)
        = ∫_{X×M#_X} g(x, ξ) C_P(Θ_u(dx × dξ))
        = ∫_{X×M#_X} g(x, ξ) Θ̃_u C_P(dx × dξ),
and is thus equal to Θ̃_u C_P. Consequently, if P and S̃_u P coincide, so do C_P and Θ̃_u C_P.
Conversely, if C_P and Θ̃_u C_P coincide, then it follows from Lemma 13.1.III that the probabilities P and S̃_u P from which they are derived coincide.
An alternative criterion, due to Mecke (1975), can be deduced as a consequence of the above result.
Proposition 13.2.II. P as in Proposition 13.2.I is stationary if and only if its associated Campbell measure C_P satisfies
    ∫_X dx ∫_{X×M#_X} g(x, y, S_y ξ) C_P(dy × dξ) = ∫_X dx ∫_{X×M#_X} g(y, x, S_y ξ) C_P(dy × dξ)    (13.2.2)
for every nonnegative B_X ⊗ B_X ⊗ B(M#_X)-measurable function g(·).
Proof. The action of the mappings Θ̃_u on a measure Ψ on B_X ⊗ B(M#_X) can be represented by requiring the equations
    ∫_{X×M#_X} h(y, ξ) Θ̃_u Ψ(dy × dξ) = ∫_{X×M#_X} h(Θ_{−u}(y, ξ)) Ψ(dy × dξ) = ∫_{X×M#_X} h(y + u, S_{−u}ξ) Ψ(dy × dξ)    (13.2.3)
to hold for any measurable nonnegative h. P being stationary implies by Proposition 13.2.I that C_P on X × M#_X is stationary. Put Ψ = C_P in (13.2.3), set h(y, ξ) = g(x, y, S_y ξ) and u = −x, and then integrate with respect to x over X; this yields
    ∫_X dx ∫_{X×M#_X} g(x, y, S_y ξ) C_P(dy × dξ) = ∫_X dx ∫_{X×M#_X} g(x, y − x, S_y ξ) C_P(dy × dξ).
But because dx is the invariant measure on X, integration over the whole of X for any nonnegative measurable function m(·, ·) satisfies
    ∫_X m(x, y − x) dx = ∫_X m(y − x, x) dx.
Applying this and Fubini's theorem reduces the right-hand side of the previous equation to
    ∫_X dx ∫_{X×M#_X} g(y − x, x, S_y ξ) C_P(dy × dξ).
Then reversing the operation with Θ_u gives the right-hand side of (13.2.2).
Conversely, suppose that (13.2.2) holds; let j(x) be a measurable nonnegative function satisfying ∫_X j(x) dx = 1. Apply (13.2.3) with Ψ = C_P to general nonnegative h, multiply by j, and integrate over x; this yields
    ∫_{X×M#_X} h(y, ξ) Θ̃_u C_P(dy × dξ)
      = ∫_X dx ∫_{X×M#_X} j(x) h(y + u, S_{−(u+y)} S_y ξ) C_P(dy × dξ)
      = ∫_X dx ∫_{X×M#_X} j(y) h(x + u, S_{−(u+x)} S_y ξ) C_P(dy × dξ)    by (13.2.2),
      = ∫_X dx ∫_{X×M#_X} j(y) h(x, S_{−x} S_y ξ) C_P(dy × dξ)    by invariance of dx,
      = ∫_X dx ∫_{X×M#_X} j(x) h(y, ξ) C_P(dy × dξ)    by (13.2.2),
      = ∫_{X×M#_X} h(y, ξ) C_P(dy × dξ)    on integrating out x.
We now look for a product representation of W to which Lemma A2.7.II can be applied. Consider the transformation D: W → W for which D(x, ψ) = (x, S−x ψ) for (x, ψ) ∈ W. Much as in Exercise 12.1.1, this transformation is continuous and hence measurable. It is also one-to-one and onto, because the inverse mapping D−1 has D−1 (x, ξ) = (x, Sx ξ). Observe that Θu D(x, ψ) = Θu (x, S−x ψ) = (x − u, Su−x ψ) = D(x − u, ψ),
so that D provides a representation of W under which the actions of the group Θ_u reduce to shifts in the first argument. Thus, with C̃_P denoting the image of C_P under D, C̃_P is invariant under shifts in its first argument.
It may appear now that Lemma A2.7.II should be applicable and thereby yield the required decomposition of C̃_P. However, there is a technical difficulty: it is not obvious in general (and may not be true if the first moments are infinite) that C̃_P takes finite values on products of bounded subsets of W. To overcome this difficulty, we use the σ-finiteness of C̃_P to construct a modified measure, with the same invariance properties, to which we can apply Proposition 13.2.I. Indeed, the σ-finiteness of C̃_P implies the existence of a strictly positive function h(x, ψ) such that, provided P({∅}) < 1,
    0 < ∫_{X×M#_X} h(x, ψ) C̃_P(dx × dψ) < ∞    (13.2.4)
(see, e.g., the constructions in Exercises 13.1.8 and 13.2.3). Define α(ψ) = ∫_X h(x, ψ) dx = ∫_X h(x + y, ψ) dy (all x ∈ X), so that α(ψ) > 0, and let g(·) be any nonnegative Lebesgue-integrable function on X. By using the invariance properties of C̃_P we obtain
    ∫_{X×M#_X} g(x)α(ψ) C̃_P(dx × dψ) = ∫_{X×M#_X} ∫_X g(x)h(x + y, ψ) dy C̃_P(dx × dψ)
      = ∫_X dy ∫_{X×M#_X} g(u − y)h(u, ψ) C̃_P(du × dψ)
      = ∫_X g(u − y) dy ∫_{X×M#_X} h(u, ψ) C̃_P(du × dψ) < ∞.
The finiteness of the integral on the left-hand side for all such integrable g shows that the modified measure α(ψ) C̃_P(dx × dψ) takes finite values on products of bounded sets, and indeed on sets A × M#_X for bounded A. Inasmuch as the presence of the multiplier α(ψ) does not affect invariance of the measure with respect to shifts in x, the factorization Lemma A2.7.II can still be applied and yields
    α(ψ) C̃_P(dx × dψ) = (αC̆_P)(dψ) ℓ(dx)
for some uniquely defined finite measure αC̆_P on B(M#_X). We now define the Palm measure being sought by setting
    C̆_P(dψ) = (αC̆_P)(dψ)/α(ψ).
This measure is still σ-finite and satisfies the defining relation
    C̃_P(dx × dψ) = C̆_P(dψ) ℓ(dx).
Its properties are spelled out in more detail in Theorem 13.2.III, where the relations (13.2.5–6) are referred to as the refined Campbell theorem.
Theorem 13.2.III. (a) Let ξ be a stationary random measure on X = R^d. Then there exists a unique σ-finite measure C̆_P on M#_X such that for any B_X ⊗ B(M#_X)-measurable nonnegative, or C_P-integrable, function g(·),
    E[∫_X g(x, ξ) ξ(dx)] = ∫_{X×M#_X} g(x, ξ) C_P(dx × dξ) = ∫_X dx ∫_{M#_X} g(x, S_{−x}ψ) C̆_P(dψ);    (13.2.5)
equivalently,
    E[∫_X g(x, S_x ξ) ξ(dx)] = ∫_X dx ∫_{M#_X} g(x, ψ) C̆_P(dψ).    (13.2.6)
(b) The measure C̆_P is totally finite if and only if ξ has finite mean density m, in which case m = C̆_P(M#_X) and the measure m^{−1} C̆_P(·) coincides with the regular local Palm distribution P_x(S_{−x}(·)) for ℓ-a.e. x.
Proof. Equations (13.2.5–6) are applications to the present setting of the factorization theorem, specifically, of (A2.7.5). The σ-finiteness of C̆_P follows from the construction preceding the theorem. To prove part (b), set g(x, ξ) = I_A(x) for some bounded A ∈ B_X. Then (13.2.5) yields
    M(A) = E[ξ(A)] = ℓ(A) ∫_{M#_X} C̆_P(dψ),
so that if M(A) < ∞, C̆_P must be totally finite with C̆_P(M#_X) = M(A)/ℓ(A) = m say; conversely, if C̆_P(M#_X) < ∞ and equal to m say, then M(A) = m ℓ(A) < ∞. Furthermore, setting g(x, ξ) = f(x)h(S_x ξ) in (13.1.5) and (13.2.5) above yields
    m ∫_X f(x) ∫_{M#_X} h(S_x ξ) P_x(dξ) dx = ∫_X f(x) dx ∫_{M#_X} h(ψ) C̆_P(dψ)
for all measurable Lebesgue-integrable f, so for all C̆_P-integrable h it follows that
    m ∫_{M#_X} h(S_x ξ) P_x(dξ) = ∫_{M#_X} h(ψ) C̆_P(dψ)    (ℓ-a.e. x),
which in turn implies that mP_x(S_{−x} dξ) and C̆_P(dψ) must coincide for ℓ-a.e. x as measures.
Definition 13.2.IV. The Palm measure associated with the stationary random measure ξ on R^d, or with its distribution P, is the measure C̆_P(·) on B(M#_X) defined by (13.2.5); when the mean density m is finite, the probability measure P_0(·) = m^{−1} C̆_P(·) is the Palm distribution for ξ.
Remark. In the point process case, it follows from Proposition 13.1.V that the measure C̆_P is supported by the subspace of counting measures N on R^d
for which N{0} ≥ 1; we denote this space N#_0(R^d) in general, and N#∗_0 when the counting measures are also simple, with the space R^d omitted when the dimension d is clear from the context. Thus, a generic element N_0 ∈ N#∗_0 is boundedly finite, simple, and has N_0{0} = 1. If there is danger of ambiguity, we may use the phrase 'stationary Palm measure' (or 'distribution') to distinguish P_0(·) from the local versions described onwards from (13.1.4) in the last section.
As there, we can gain insight into the character and interpretation of the Palm measure by looking first at a finite point process.
Example 13.2(a) Stationary point process on S [see Example 12.1(f)]. A stationary point process on S has probability distributions that are invariant under rotation, and we showed in Example 12.1(f) that this implies that the symmetrized probability measures Π_n describing such a process have reduced forms Π̆_n satisfying (12.1.17); they describe the positions, relative to an arbitrarily selected initial point of the realization, of the other n − 1 points. It is obvious that the Palm measure for the process should be closely related to the reduced distributions Π̆_n. The diagonal decompositions summarized by (12.1.17) for a given value of n can be expressed in portmanteau form via a mapping of X^∪\{∅} → X × X^∪, similar to the mapping D preceding (13.2.4), where on the component X^{(n)}
    (θ_1, . . . , θ_n) → (θ_1; θ_2 − θ_1, . . . , θ_n − θ_1).
Any measure P on X^∪ satisfying P({∅}) = 0 is thereby mapped into a product of uniform measure on X and a reduced measure on X^∪, P̆ say, which consists of the measures Π̆_n with weightings p_n = P{ξ(X) = n}. However, this is not quite the Palm distribution, as we can see by reference to any of the previous formulae, such as (13.2.6), from which follows the relation
    C̆_P(Γ) = (1/2π) E[∫_0^{2π} I_Γ(S_θ N) N(dθ)].
Then for a set Γ determined by a realization with n points of the form (0, φ_1, . . . , φ_{n−1}),
    C̆_P(Γ) = Σ_n (p_n/2π) E[Σ_{i=1}^n I_Γ(S_{θ_i} N)]
            = Σ_n (p_n/2π) ∫_{X^{(n)}} Σ_{i=1}^n I_Γ(S_{θ_i} N) Π_n(dθ_1 × ··· × dθ_n)
            = Σ_n (np_n/2π) ∫_X dθ ∫_{X^{(n−1)}} I_Γ(N_0) Π̆_n(dφ_1 × ··· × dφ_{n−1}),
where now N_0 is a generic counting measure with points at 0, φ_1, . . . , φ_{n−1}. The factor n arises because the n terms in the sum give identical integrals
on account of the symmetry properties of Π_n. For such Γ we therefore have C̆_P(Γ) = np_n Π̆_n(Γ). Thus, just as in the more general situation of Example 13.1(b), the Palm measure requires a weighting by the factor n. The Palm distribution is then obtained by normalizing this weighted form, which requires Σ_{n=1}^∞ np_n = E[N(X)] < ∞. The intuitive explanation is as before: a realization with n points is n times more likely to locate a point at the origin than a realization with just one point, and must be weighted accordingly in taking the expectation.
Equations (13.2.5) and (13.2.6) yield a range of striking formulae as special cases and corollaries. One of the simplest is the following interpretation of the Palm probabilities for point processes.
Example 13.2(b) Palm probabilities as rates. An important interpretation of the Palm measure for a point process comes from (13.2.6), which yields, for Γ ∈ B(N#∗_0),
    C̆_P(Γ) = E[#{i: x_i ∈ U^d and S_{x_i} N ∈ Γ}],    (13.2.7)
where on the right-hand side it is to be understood that each x_i is a point of the realization N. Thus, C̆_P(Γ) is the expected number of points of the process in the unit cube (or, because the process is stationary, their expected rate) which, when the origin is transferred to the point in question, are associated with the occurrence of Γ. As Matthes and others have suggested, we can regard this as the rate of occurrence of marked points, where a point is marked if and only if Γ occurs when the origin is shifted to the point in question. The Palm distribution then appears as a ratio of rates
    P_0(Γ) = m(Γ)/m,    (13.2.8)
where m(Γ) is the rate of the marked process and m is the rate of the original process.
Next we use (13.2.5–6) to give more explicit formulae expressing the Palm measure C̆_P in terms of P and vice versa. Setting g(x, ξ) = j(x)h(ξ) in (13.2.5), where ∫_X j(x) dx = 1, we obtain at (13.2.10) below the expression for C̆_P in terms of P. For an inverse relation, more subtlety is needed: suppose that k: X × M#_X → R_+ is a normalizing kernel for the realizations ξ, that is, a nonnegative measurable function satisfying
    k(x, ∅) = 0  and  ∫_X k(x, ξ) ξ(dx) = 1  for each ξ ≠ ∅.    (13.2.9)
Substituting g(x, ξ) = k(x, ξ)f(ξ) for some nonnegative B(M#_X)-measurable f(·) in (13.2.5), equation (13.2.11) is obtained. We summarize as below.
Proposition 13.2.V. Let P be stationary on B(M#_X) for X = R^d and C̆_P its Palm measure.
(a) For any nonnegative B(M#_X)-measurable function h(·), and nonnegative B_X-measurable j with ∫_X j(x) dx = 1,
    ∫_{M#_X} h(ψ) C̆_P(dψ) = ∫_{M#_X} P(dξ) ∫_X j(x)h(S_x ξ) ξ(dx)    (13.2.10a)
                          = E_P[∫_X j(x)h(S_x ξ) ξ(dx)].    (13.2.10b)
(b) For any nonnegative B(M#_X)-measurable function f(·) and for k(·) satisfying (13.2.9),
    E_P[f(ξ)] = ∫_{M#_X\{∅}} ∫_X k(x, S_{−x}ψ)f(S_{−x}ψ) dx C̆_P(dψ) + f(∅)P{ξ = ∅},    (13.2.11)
where ∅ denotes the null measure.
As a particular corollary of the proposition, set h(ψ) = I_{{∅}}(ψ) in (13.2.10). Then
    C̆_P({∅}) = 0.    (13.2.12)
From (13.2.11) it can be seen that C̆_P determines P up to P{ξ = ∅}. For the rest of this section we assume also that P{ξ = ∅} = 0, in which case it follows from Proposition 12.1.VI that P{ξ(X) = ∞} = 1.
Equation (13.2.10) is often presented in the special case j(x) = I_{U^d}(x), when it takes the form
    ∫_{M#_X} h(ψ) C̆_P(dψ) = E_P[∫_{U^d} h(S_x ξ) ξ(dx)].
In the point process case this can be interpreted as follows: shift the origin successively to each point of the process in U^d and sum the values of h obtained from these shifted versions of ξ, thus obtaining an expected rate of contributions to the sum. When h is specialized to the indicator of some event U of the process, the value of the integral on the right-hand side is just the rate of occurrence of points for which U occurs when the origin is shifted to the point in question.
The two equations (13.2.10–11) can be specialized in various obvious ways. If we take h or f to be the indicator function of a set U ∈ B(M#_X), we obtain direct expressions for C̆_P(U) in terms of P and for P(U) in terms of C̆_P. When the mean density m exists, the equations can be put in more symmetrical forms using P_0 from Definition 13.2.IV:
    E_{P_0}[h(ψ)] = m^{−1} E_P[∫_{U^d} h(S_x ξ) ξ(dx)],    (13.2.13)
    E_P[f(ξ)] = m E_{P_0}[∫_X k(x, S_{−x}ψ)f(S_{−x}ψ) dx].    (13.2.14)
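The ratio-of-rates formula (13.2.8), together with the estimator implicit in (13.2.13), lends itself to direct simulation. The sketch below is an illustration only, not part of the text: it assumes a stationary Poisson process on the line with rate λ, for which the Palm distribution is known in closed form (by Slivnyak's theorem it is the law of the process with an extra point at the origin), and estimates P_0(Γ) for Γ = {another point lies within distance r of the origin}; the exact value is 1 − e^{−2λr}.

```python
import math
import random

random.seed(42)
lam, L, r = 2.0, 50_000.0, 0.3

# Stationary Poisson process on [0, L]: i.i.d. exponential gaps.
pts, t = [], random.expovariate(lam)
while t < L:
    pts.append(t)
    t += random.expovariate(lam)

# Mark a point if, with the origin shifted to it, the event
# Gamma = {another point lies within distance r} occurs.
marked = sum(
    1
    for i in range(1, len(pts) - 1)          # skip the two edge points
    if pts[i] - pts[i - 1] <= r or pts[i + 1] - pts[i] <= r
)
interior = len(pts) - 2

p0_hat = marked / interior                   # (13.2.8): ratio of rates
p0_exact = 1 - math.exp(-2 * lam * r)        # Slivnyak: Palm law = P + point at 0
print(p0_hat, p0_exact)
```

With roughly 10^5 points the estimated and exact values agree to two or three decimal places, which is all this sketch is meant to show.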
Specific examples of functions k(·) satisfying (13.2.9) are given in Section 13.3; Exercise 13.2.3 outlines Mecke's (1967) general construction.
We now consider the moment measures of the Palm distribution. There is no loss of generality here in speaking of moments of the Palm distribution, rather than of the Palm measure, because the existence of higher moments implies the existence of the first moment, which ensures that the Palm measure is totally finite and so can be normalized to yield the Palm distribution. Following the notation of the earlier sections, let ψ denote a stationary random measure, and write
    M̊_k(A_1 × ··· × A_k) = E_{P_0}[ψ(A_1) ··· ψ(A_k)]    (A_1, . . . , A_k ∈ B_X)
for the kth moment measure of P_0.
Proposition 13.2.VI. For k = 1, 2, . . . , the kth moment measure M̊_k of the Palm distribution exists if and only if the (k+1)th moment measure of the original random measure exists, in which case it is related to the reduced (k+1)th moment measure M̆_{k+1} by
    M̊_k(·) = m^{−1} M̆_{k+1}(·).    (13.2.15)
Proof. The result is a further application of (13.2.5) and Fubini's theorem. For nonnegative measurable g and h, with h(·) on X^{(k)} and g(·) integrable on X, set
    g(x, ξ) = g(x) ∫_X ··· ∫_X h(y_1 − x, . . . , y_k − x) ξ(dy_1) ··· ξ(dy_k).
Then the left-hand side of (13.2.5), using also (12.6.6), becomes
    ∫_{X^{(k+1)}} g(x)h(y_1 − x, . . . , y_k − x) M_{k+1}(dx × dy_1 × ··· × dy_k)
        = ∫_X g(x) dx ∫_{X^{(k)}} h(u_1, . . . , u_k) M̆_{k+1}(du_1 × ··· × du_k),
whereas using Fubini's theorem with the right-hand side yields
    m E_{P_0}[∫_{X^{(k)}} h(u_1, . . . , u_k) ψ(du_1) ··· ψ(du_k)] ∫_X g(x) dx
        = m ∫_X g(x) dx ∫_{X^{(k)}} h(u_1, . . . , u_k) M̊_k(du_1 × ··· × du_k).
But h ≥ 0 is arbitrary, so (13.2.15) follows, together with finiteness (because m is finite).
The results of Section 12.4 and above can be summed up in the following diagram:

    P(dξ)      ——moments——→   M_{k+1}
      | ξ(dx), reduction         | reduction
      ↓                          ↓
    m P_0(dξ)  ——moments——→   m M̊_k ≡ M̆_{k+1}
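Relation (13.2.15) with k = 1 can be illustrated numerically. The sketch below (an assumption-laden illustration, not from the text) takes a stationary Poisson process on R with rate λ, for which M̆_2(du) = m δ_0(du) + m² du and hence M̊_1((0, b]) = mb; the Palm expectation E_{P_0}[N((0, b])] is then estimated through the rate interpretation of (13.2.7) by shifting the origin to each point in turn and averaging the counts.

```python
import random
from bisect import bisect_right

random.seed(7)
lam, L, b = 2.0, 20_000.0, 1.0

# Stationary Poisson process on [0, L] via exponential gaps.
pts, t = [], random.expovariate(lam)
while t < L:
    pts.append(t)
    t += random.expovariate(lam)

# Rate form of the Palm expectation E_P0[N((0, b])]: shift the origin
# to each point x_i in turn and count the other points in (0, b].
total = used = 0
for i, x in enumerate(pts):
    if x + b > L:
        break                                  # avoid edge effects
    total += bisect_right(pts, x + b) - (i + 1)  # points in (x, x + b]
    used += 1

palm_mean = total / used
print(palm_mean, lam * b)
```

Both printed values are close to mb = 2, matching the first moment of the Palm distribution predicted by (13.2.15).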
Again the results are conveniently illustrated with respect to the finite case of a point process on a circle, as set out in Exercise 13.2.6.
Moment measures can also be defined for higher-order and modified Palm distributions derived from the corresponding Campbell measures introduced around (13.1.3). For example, the first-order moment measure for the second-order Palm distribution, M̊^{(2)}_1(dz | x, y), can be defined as the Radon–Nikodym derivative of the third-order moment measure with respect to the second-order moment measure:
    M̊^{(2)}_1(dz | x, y) = M_3(dz × dx × dy) / M_2(dx × dy).    (13.2.16)
When the process is stationary this reduces to a function of u = y − x and v = z − x. Further discussion of these quantities and of the corresponding moment measures for the modified Palm distributions is briefly set out in Exercise 13.2.9.
We turn finally to a characterization of Palm measures due to Mecke (1975). Keeping to Mecke's context, we first observe that the relation between a stationary measure P on B(M#_X) and its Palm measure C̆_P can be extended to general σ-finite measures P. The definition of stationarity carries over to this context, and equations (13.2.5–6) continue to define a σ-finite measure C̆_P in terms of an initial σ-finite measure P, whether or not the initial measure is totally finite. The only step that needs checking is that the argument establishing the σ-finiteness of C̆_P remains valid (see Exercise 13.2.8). In this context, neither P nor C_P need be totally finite, but both may be.
The previous results can be succinctly summarized for this extended context as follows.
Proposition 13.2.VII. Let Q be a σ-finite measure on B(M#_X) and C_Q its associated Campbell measure. Then the following assertions are equivalent:
(i) Q is stationary (invariant under the shifts S̃_u, u ∈ X).
(ii) C_Q is invariant under the transformations Θ̃_u, u ∈ X, defined at (13.2.1b).
(iii) C_Q factorizes as in (13.2.6) into the product of Lebesgue measure and an associated σ-finite measure R on B(M#_X), and R coincides with C̆_Q.
Proof. Only the last assertion, equivalent to asserting the uniqueness of the Palm factorization, requires further comment. Suppose in fact that two distinct measures R and R′ both satisfy (13.2.6) for the same Q. Then there must be some measurable, nonnegative f with ∫_{M#_X} f dR ≠ ∫_{M#_X} f dR′. But each of R and R′ is associated with an inversion formula (13.2.11), leading to a contradiction if both R and R′ are associated with the same Q.
The third assertion of the proposition invites the further question: which measures R can appear in a Palm factorization? Although the factorization lemma implies that any R inserted in the right-hand side of (13.2.6) yields a measure on W that is invariant under Θ̃_u, in general this measure will not
be a Campbell measure; that is, it will not be generated by some underlying stationary Q on M#_X. The constraint that R must satisfy in order to be a stationary Palm measure is given in the following result of Mecke (1975, Theorem 1.7).
Theorem 13.2.VIII. A measure R on B(M#_X), with X = R^d, is the Palm measure of some stationary, σ-finite (but not necessarily finite) measure Q on B(M#_X) if and only if the following three conditions hold:
(i) R is σ-finite;
(ii) R({∅}) = 0; and
(iii) for all measurable nonnegative h(·, ·) on X × M#_X,
    ∫_{M#_X} ∫_X h(−y, S_y ψ) ψ(dy) R(dψ) = ∫_{M#_X} ∫_X h(y, ψ) ψ(dy) R(dψ).    (13.2.17)
If, in addition, η = ∫_{M#_X} ∫_X k(y, S_{−y}ψ) dy R(dψ) ≤ 1 for some k(·) satisfying (13.2.9), then R is the Palm measure of a probability measure P on M#_X satisfying P{ξ = ∅} = 1 − η.
Proof. Suppose first that R = C̆_Q as in Proposition 13.2.VII for some stationary, σ-finite measure Q. Then (i) follows from Theorem 13.2.III, and (ii) is (13.2.12). To establish (iii), let j(x) be nonnegative and measurable with ∫_X j(x) dx = 1, and apply (13.2.10) to the left-hand side of (13.2.17), regarded as a function of ψ. This yields
    ∫_{M#_X} ∫_X ∫_X j(x)h(−y, S_y S_x ξ) (S_x ξ)(dy) ξ(dx) Q(dξ).
Because ∫_X f(y) (S_x ξ)(dy) = ∫_X f(y − x) ξ(dy) for any nonnegative measurable function f, the displayed expression can be rewritten as
    ∫_{M#_X} ∫_X ∫_X j(x)h(x − y, S_y ξ) ξ(dy) ξ(dx) Q(dξ),
and using now the reverse of this transformation on the integration over x gives
    ∫_{M#_X} ∫_X ∫_X j(x + y)h(x, S_y ξ) (S_y ξ)(dx) ξ(dy) Q(dξ).
Rewriting the integration over ψ and y in terms of the factorization as in the right-hand side of (13.2.6) (with y and ξ in place of x and ψ there) gives
    ∫_X ∫_{M#_X} ∫_X j(x + y)h(x, ξ) ξ(dx) R(dξ) dy.
Finally, using ∫_X j(x + y) dy = 1 and integrating out y yields the right-hand side of (13.2.17).
For the reverse argument, suppose that R satisfies conditions (i)–(iii) and that k is a normalizing kernel satisfying (13.2.9). Use R to define a measure Q on B(M#_X) by Q({∅}) = 0 and the first term in (13.2.11), so that
    ∫_{M#_X} f(ξ) Q(dξ) = ∫_{M#_X\{∅}} ∫_X k(y, S_{−y}ψ)f(S_{−y}ψ) dy R(dψ).
We insert this measure into the left-hand side of the fundamental equation (13.2.6) and obtain
    ∫_{M#_X} ∫_X g(x, S_x ξ) ξ(dx) Q(dξ)
      = ∫_{M#_X} ∫_X ∫_X g(x, S_{x−y}ψ)k(y, S_{−y}ψ) (S_{−y}ψ)(dx) dy R(dψ)
      = ∫_{M#_X} ∫_X ∫_X g(u + y, S_u ψ)k(y, S_{−y}ψ) ψ(du) dy R(dψ),    putting x = u + y,
      = ∫_{M#_X} ∫_X ∫_X g(v, S_u ψ)k(v − u, S_{u−v}ψ) ψ(du) dv R(dψ),    putting y = v − u,
where in the last equation we used the invariance of Lebesgue measure under shifts. In general this expression does not simplify, but when (iii) holds, then with h(u, ψ) = ∫_X g(v, S_u ψ) k(v − u, S_{u−v}ψ) dv, we can write for the right-hand side
    ∫_{M#_X} ∫_X ∫_X g(v, ψ)k(v + u, S_{−v}ψ) ψ(du) dv R(dψ) = ∫_{M#_X} ∫_X g(v, ψ) dv R(dψ),
because
    ∫_X k(v + u, S_{−v}ψ) ψ(du) = ∫_X k(x, S_{−v}ψ) (S_{−v}ψ)(dx) = 1.
The last equation but one shows that the measure Q defined from R factorizes as in Proposition 13.2.VII(iii), and hence first is stationary, and second identifies R as the reduced Campbell measure of Q. By extension of Definition 13.2.IV we call such an R a Palm measure also. Finally, the normalizing condition entailing η ≤ 1 comes from Proposition 13.2.V(b).
Example 13.2(c) Palm factorization of the KLM measure for a stationary infinitely divisible point process. From Proposition 12.4.I we know that an infinitely divisible point process is stationary if and only if its KLM measure Q is stationary. In that case, a Palm factorization can be applied to the measure
    C_Q(A × U) = ∫_U Ñ(A) Q(dÑ)    (U ∈ B(N#_X\{∅})),
giving
    ∫_X ∫_{N#_X\{∅}} g(x, S_x Ñ) Ñ(dx) Q(dÑ) = ∫_X dx ∫_{N#_0} g(x, Ñ_0) C̆_Q(dÑ_0),
where Ñ_0 here denotes a generic element of N#_0. Let us write for brevity
    Q_0 = C̆_Q,
and note that for a point process, as in the probability case, Q_0 is defined on B(N#_0); Q_0 may or may not be totally finite.
We can now examine the properties of Q_0 for the various types of stationary, infinitely divisible point processes.
(1°) Suppose that P is regular and therefore has a representation as the regular version of a stationary Poisson cluster process (Proposition 12.4.II). Here Q_0 is closely related to the symmetrized measures P_{k−1} used in defining the regular representation. Regard P_{k−1} not as a measure on X^{(k−1)} but as a measure on the set D_k of counting measures in N#_0 containing just k − 1 points in addition to the point at the origin. Then we have
    ∫_{D_k} h(Ñ_0) Q_0(dÑ_0) = k ∫_{X^{(k−1)}} h(δ_0 + Σ_{i=1}^{k−1} δ_{x_i}) P_{k−1}(dx_1 × ··· × dx_{k−1}),
the factor k arising here, as in Example 13.2(a), from the Ñ(du) integration in the Campbell measure. The normalized measures kp_k P_{k−1}(·)/m, where m = Σ_k kp_k and {p_k} describes the distribution of the cluster size, can be interpreted loosely as providing the conditional distribution of the other cluster members, given that the point at the origin is arbitrarily chosen from a randomly selected population of i.i.d. families of k members. We see that in the regular case, the measure Q_0 is supported by the set of counting measures in N#_0 with finite support.
(2°) If the process is strongly singular [see the definitions of S_s at the end of Section 12.4 and of I_V(Ñ) below (12.4.6)], the KLM measure itself is supported on the set of counting measures with positive ergodic limits [i.e., I_V(Ñ) > 0], and it follows, as in the discussion of Theorem 13.3.II below, that Q_0 is concentrated on the space of sequences that also have positive ergodic limits. This is the situation if, in particular, the process is a Poisson randomization, in which case Q_0 = µC̆_P, where µ is the parameter of the Poisson distribution and P is the point process that is being 'randomized'.
(3°) Finally, the weakly singular processes have Q_0 measures concentrated on the subset of N#_0 of counting measures that have infinite support but are asymptotically sparse in the sense that their ergodic limits are zero.
In summary we have proved the following statement.
Proposition 13.2.IX. A stationary infinitely divisible point process is regular, strongly singular, or weakly singular according to whether the Palm measure Q_0 is supported, respectively, by the finite counting measures, the counting measures with infinite support and positive ergodic limits, or the counting measures with infinite support but zero ergodic limits.
Exercise 13.2.10 exhibits the Palm measure of a stationary infinitely divisible random measure with finite first moment measure as a convolution of P with the reduced Campbell measure of its KLM measure.
Exercises and Complements to Section 13.2
13.2.1 Let the random measure ξ on the integers as in Exercise 13.1.5 be stationary. Describe ξ as a stationary sequence {X_n} ≡ {ξ{n}} of nonnegative r.v.s. What aspects of this sequence are described by P and P_0, respectively? When ξ is a simple point process, give analogues and simple cases of equations (13.2.2–3) and (13.2.6–7).
13.2.2 Let {P_n} be a sequence of probability measures on M#_X and P a limit measure. If {P_n} and P are stationary and P_n → P weakly, investigate under what conditions and in what sense there is convergence of the Campbell measures C_{P_n} to C_P. Consider also conditions under which, for a stationary process, weak convergence of the underlying probability measures implies weak convergence of the associated Palm distributions, and vice versa. [Hint: Consider first the convergence of the first moment measures, which is necessary for any meaningful sense of convergence for the Campbell measures. For the second part, use the representation theorems first to establish convergence of the fidi distributions, and then refer to Section 11.1.]
13.2.3 Let {A_n} be a covering of X by disjoint bounded Borel subsets of X. Define the functions a(·) on W \ (X × {∅}) and k(·) for ξ ≠ ∅ by
    a(x, ξ) = 2^{−n}/ξ(A_n)  if x ∈ A_n and ξ(A_n) ≠ 0,  and a(x, ξ) = 0 otherwise;
    k(x, ξ) = a(x, ξ) / ∫_X a(y, ξ) ξ(dy).
Verify that the function k(·) satisfies (13.2.9).
13.2.4 Let the stationary random measure ξ have finite first moment measure. Show that the Laplace functional result of Proposition 13.1.VI simplifies on writing L_x[f] = L_0[S_{−x}f].
13.2.5 (a) Let ξ′, ξ″ be stationary random measures with probability measures P′, P″. For nonnegative p, q with p + q = 1, the random measure ξ, which equals either ξ′ or ξ″ with probabilities p, q, respectively, has probability measure P = pP′ + qP″. Find its Campbell and Palm measures.
(b) Let ξ′, ξ″ as in (a) be independent random measures, and let ξ = ξ′ + ξ″. Using ∗ to denote convolution, show that the Palm measure P_0 of ξ is given by P_0 = P′_0 ∗ P″ + P′ ∗ P″_0.
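The normalizing-kernel construction referenced in Exercise 13.2.3 can be checked numerically. The sketch below is an illustration under stated assumptions: the covering {A_n} of X = R is taken to be the unit intervals enumerated as A_1 = [0, 1), A_2 = [−1, 0), A_3 = [1, 2), A_4 = [−2, −1), . . . , and a(x, ξ) is taken in the Mecke-style form 2^{−n}/ξ(A_n) for x ∈ A_n when ξ(A_n) > 0 (this explicit form of a(·) and the enumeration are assumptions of the sketch); the check is the defining property ∫_X k(x, ξ) ξ(dx) = 1 of (13.2.9).

```python
import math

# A purely atomic, boundedly finite measure xi on R: location -> mass.
xi = {-2.7: 1.5, -0.2: 0.5, 0.4: 2.0, 3.1: 1.0}

def cell_index(x):
    """n such that x lies in A_n, enumerating the unit intervals as
    A_1 = [0,1), A_2 = [-1,0), A_3 = [1,2), A_4 = [-2,-1), ..."""
    m = math.floor(x)
    return 2 * m + 1 if m >= 0 else -2 * m

def a(x):
    # Assumed form of a(., xi): 2^(-n)/xi(A_n) on A_n when xi(A_n) > 0.
    n = cell_index(x)
    mass_n = sum(w for y, w in xi.items() if cell_index(y) == n)
    return 2.0 ** (-n) / mass_n if mass_n > 0 else 0.0

denom = sum(a(y) * w for y, w in xi.items())   # integral of a(., xi) w.r.t. xi

def k(x):
    return a(x) / denom

# Defining property (13.2.9): the xi-integral of k(., xi) equals 1.
total = sum(k(y) * w for y, w in xi.items())
print(total)
```

Whatever atoms and masses are chosen, the final sum is 1 by construction, which is exactly what (13.2.9) demands.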
298
13. Palm Theory
13.2.6 Find explicit expressions for the reduced moment measures for a stationary point process on the circle in terms of the distributions {p_n}, Π̆_n(·) in the notation of Example 13.2(a). In particular, expanding the second moment measure by (5.4.7) and with p*_n = n p_n/m, m = ∑_n n p_n,

    M̆_{[2]}(B) dθ = ∑_{n=0}^∞ (n + 1)(n + 2) p_{n+2} Π_{n+2}(dθ × S_{−θ}B × X^{(n)})
                  = (m dθ/2π) ∑_{n=0}^∞ (n + 1) p*_{n+2} Π̆_{n+2}(B × X^{(n)})
                  = (m dθ/2π) E_{P_0}[N(B)].

13.2.7 Investigate properties of the modified Campbell measure C_P^† introduced in (13.1.1b) and the corresponding modifications of local Palm distributions. In particular, find the form of these distributions for the process considered in Example 13.1(b). [Hint: See Kallenberg (1983a, §12.3) and MKM (1978, §5.4).]
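As a spot check of the final identity in Exercise 13.2.6, consider (hypothetically) the special case p_k = 1 for a single k, with positions i.i.d. uniform on the circle; then E_{P_0}[N(B)] = (k − 1)ℓ(B)/2π and m = k, so m E_{P_0}[N(B)] = k(k − 1)ℓ(B)/2π should equal the expected number of ordered pairs (x, y) of points with y − x ∈ B. A minimal Monte Carlo sketch (all numerical choices here are illustrative, not from the text):

```python
import math
import random

# Hypothetical special case of Exercise 13.2.6: exactly k i.i.d. uniform
# points on the circle (p_k = 1), with B the arc (0, b) to the right of a
# typical point.  The expected number of ordered pairs (x, y) with
# (y - x) mod 2*pi in B should equal m * E_P0[N(B)] = k*(k-1)*b/(2*pi).
random.seed(0)
k, b, trials = 5, 1.0, 40_000
total = 0
for _ in range(trials):
    th = [random.uniform(0.0, 2.0 * math.pi) for _ in range(k)]
    total += sum(1 for x in th for y in th
                 if y != x and (y - x) % (2.0 * math.pi) < b)
emp = total / trials
theory = k * (k - 1) * b / (2.0 * math.pi)
print(round(emp, 2), round(theory, 2))   # agree within Monte Carlo error
```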
13.2.8 Extend the argument of Exercise 13.1.2 from the special case of KLM measures to show that if a stationary, σ-finite measure R on (M#_X, B(M#_X)) is associated with a Campbell measure C_R by (13.1.1), then C_R is again σ-finite. Show that in this case also, a function h can be found satisfying (13.2.4), and used to establish the existence of a σ-finite reduced version C̆_R satisfying (13.2.5) and (13.2.6) of Theorem 13.2.III.
13.2.9 (a) Define a second-order Campbell measure as in (13.1.3), and use it to define a family of second-order local Palm distributions P^{(2)}_{x,y}(·). What simplification can be expected if the underlying random measure is stationary?
(b) Suppose that first-, second-, and third-order moment measures M_1, M_2, M_3 exist for the random measure ξ. Define the reduced moment measure M̊_1^{(2)}(dz | x, y) as the Radon–Nikodym derivative in

    M_3(C, A, B) = ∫_{A×B} M̊_1^{(2)}(C | x, y) M_2(dx × dy),

and show that it can be interpreted as the first moment measure of the second-order local Palm distribution as in (a). What simplifications can be expected if the underlying random measure is stationary? [Hint: Both sides equal E[ξ(A)ξ(B)ξ(C)], assumed finite for bounded A, B, C ∈ B_X.]
(c) Investigate the forms of these measures for (i) Poisson, (ii) renewal, and (iii) two-point cluster Poisson processes.
13.2.10 Let P be the probability measure of a stationary infinitely divisible random measure with finite first moment measure, and let Q be its KLM measure. Prove that P_0 = Q_0 ∗ P, where Q_0 is the reduced Campbell measure C̆_Q. [Hint: Let P and Q have Laplace functionals L_P and L_Q. Relate log L_P[f] to L_Q[f] and deduce that for f, g ∈ BM_+(R^d) [see (13.1.15)],

    ∫_{R^d} g(x) L_P[S_{−x}f; 0] m_P dx = L_P[f] ∫_{R^d} g(x) L_Q[S_{−x}f; 0] m_Q dx. ]
13.3. Interval- and Point-stationarity
When the ideas of Section 13.2 are specialized to point processes, a number of new features arise; we review them in this section. In particular, we consider a number of results for the important special case where X is the real line, and the point process is simple with finite mean rate m. The central result here is the correspondence, foreshadowed already in Chapter 3, between such point processes and stationary sequences of intervals. Finding a counterpart for this result in spaces of higher dimension seemed an impossible task at the time of the first edition of this book, but the recent attack on the problem by Thorisson, Last, Heveling, and others has shown this not to be the case; their work is introduced at the end of the section.
As noted below Definition 13.2.IV, it follows from Proposition 13.1.V that, for a simple stationary point process, the support for its Palm measure can be taken to be the space N0#∗ of simple counting measures with a point at the origin. From Proposition 12.1.VI it follows, because N({0}) = 1, that the Palm measure is in fact supported by elements N ∈ N0#∗ which cannot be empty and must therefore satisfy N(R^d) = ∞. When the state space is the line, this means that N(−∞, 0] = N(0, ∞) = ∞.
For this special case, it is clear also that there is a one-to-one, both-ways measurable mapping Φ between the space N0#∗ [with the σ-algebra of Borel sets inherited from B(NR#∗)] and the space T+ of doubly infinite sequences of positive numbers. Denoting the points of a generic element N_0 ∈ N0#∗ by {. . . , t_{−1}(N_0), t_0(N_0) = 0, t_1(N_0), . . .} with t_i(N_0) < t_{i+1}(N_0) (i = 0, ±1, . . .), the mapping Φ associates the points of N_0 with the sequence of intervals τ_i ≡ t_i(N_0) − t_{i−1}(N_0); that is,

    ΦN_0 = {τ_i} ≡ {t_i(N_0) − t_{i−1}(N_0): i = 0, ±1, . . .}.

For measurability see Exercise 9.1.14.
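The mapping Φ and its inverse are easy to sketch in code. The version below is an illustration only: it works with a finite window of points rather than the doubly infinite sequences of the text, and the anchoring index passed to the inverse plays the role of locating t_0 = 0.

```python
# Sketch of the map Phi of the text: a configuration with a point at the
# origin  <->  its sequence of successive intervals.  Finite windows only;
# the text's configurations and interval sequences are doubly infinite.

def phi(points):
    """Sorted points (t_-k, ..., t_0 = 0, ..., t_m) -> intervals tau_i."""
    assert 0.0 in points and points == sorted(points)
    return [b - a for a, b in zip(points, points[1:])]

def phi_inverse(intervals, origin_index):
    """Rebuild the points, anchoring t_0 = 0 after `origin_index` intervals."""
    pts, t = [0.0], 0.0
    for tau in intervals[origin_index:]:            # points to the right of 0
        t += tau
        pts.append(t)
    t = 0.0
    for tau in reversed(intervals[:origin_index]):  # points to the left of 0
        t -= tau
        pts.insert(0, t)
    return pts

taus = phi([-3.0, -1.0, 0.0, 2.5, 4.0])
print(taus)                                  # -> [2.0, 1.0, 2.5, 1.5]
print(phi_inverse(taus, 2))                  # -> [-3.0, -1.0, 0.0, 2.5, 4.0]
```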
Every measure on (N0#∗, B(N0#∗)), and in particular every Palm measure C̆_P(·), then induces a measure on (T+, B(T+)), say (C̆_P Φ^{−1})(·). These remarks pave the way for the results setting up a correspondence between counting properties and interval properties. The correspondence is essentially a restatement of the equations representing the Palm measure in terms of the Campbell measure and hence of its underlying probability measure, and vice versa. We state the theorem first in its most striking form, for the case of finite mean density m.
Theorem 13.3.I [Ryll-Nardzewski (1961), Slivnyak (1962, 1966), Kaplan (1955)]. There is a one-to-one correspondence between the distributions P on B(NR#∗) of simple nonnull stationary point processes on the line with finite mean density m, and the distributions Π on B(T+) of stationary sequences of positive random variables with finite mean m^{−1}. If P_0 is the image in
B(N0#∗) under Φ^{−1} of the measure Π on B(T+), this relation is effected by the equations

    E_{P_0}[h(N_0)] = m^{−1} E_P[ ∑_{i=1}^{N(0,1]} h(S_{t_i}N) ]        (13.3.1)

for nonnegative B(N0#∗)-measurable h(·), and

    E_P[g(N)] = m E_{P_0}[ ∫_0^{t_1(N_0)} g(S_t N_0) dt ]
              = m ∫_0^∞ E_{P_0}[ g(S_t N_0) I_{{t_1(N_0) > t}}(N_0) ] dt        (13.3.2)
for nonnegative B(NR#∗)-measurable g(·).
Proof. Equations (13.3.1–2) are adaptations to the present context of equations (13.2.10–11), but further comments are required. In (13.3.1), the points t_i refer to points of N lying in the unit interval (0, 1]; each S_{t_i} satisfies (S_{t_i}N)({0}) = N({t_i}) = 1, and so with probability 1 it can be identified with an element of U_0 whenever h(S_{t_i}N) is well defined. In the exceptional P-null set, where (t_i, S_{t_i}N) ∉ U_0, the value of h(S_{t_i}N) can be defined arbitrarily (and set equal to zero, say). To derive (13.3.2) from (13.2.11), set there

    k(x, N) = { 1 if x = t_0(N),
                0 otherwise.        (13.3.3a)
Observe that for a simple nonnull counting measure (see Definition 12.1.VII),

    ∫_R k(x, N) N(dx) = N({t_0(N)}) = 1,        (13.3.3b)
so that k(·) satisfies (13.2.9) when also P corresponds to a simple point process, that is, P(NR#∗) = 1. Now substituting in (13.2.11), the term k(x, S_{−x}N_0) in the integral on the right-hand side of (13.2.11) equals unity provided x = t_0(S_{−x}N_0), which, because the counting measure S_{−x}N_0 for N_0 ∈ N0#∗ consists of atoms of unit mass at {x + t_i(N_0): i = 0, ±1, . . .}, is true for x ≤ 0 < x + t_1(N_0), that is, for −τ_1 < x ≤ 0. Changing the variable of integration from x to −x leads to (13.3.2).
We now show that if P is stationary on B(NR#∗) then Π is stationary on B(T+) and conversely. Following Franken et al. (1981) we argue as follows. First, using the stationarity of P we can extend (13.3.1) to give

    E_{P_0}[h(N_0)] = (mT)^{−1} E_P[ ∑_{i=1}^{N(0,T]} h(S_{t_i}N) ]        (0 < T < ∞).
Define the shift operator ϑ: T+ → T+ by {ϑτ_i} = {τ_{i−1}}. Then its image Θ: N0#∗ → N0#∗, where ΘN_0 = Φ^{−1}{ϑτ_i} = Φ^{−1}(ϑ(ΦN_0)), satisfies

    E_{P_0}[h(ΘN_0)] = (mT)^{−1} E_P[ ∑_{i=1}^{N(0,T]} h(S_{t_{i−1}}N) ],

from which we have

    |E_{P_0}[h(N_0)] − E_{P_0}[h(ΘN_0)]| ≤ (mT)^{−1} E_P[ |h(S_{t_0}N)| + |h(S_{t_{N′}}N)| ],

where N′ = N(0, T]. Then for all bounded h, the right-hand side → 0 as T → ∞. Consequently, the expectations on the left-hand side coincide for all bounded measurable h, so that the measures Π and Π ∘ ϑ, equivalent to P_0 and P_0 ∘ Θ, are therefore equal; that is, Π is invariant under ϑ, and thus its iterates are also invariant. Similarly, when Π is stationary, (13.3.2) can be extended by iteration to

    E_P[g(N)] = (m/r) E_{P_0}[ ∫_0^{t_r(N_0)} g(S_t N_0) dt ]        (r = 1, 2, . . .).

Replacing N by S_t N, subtracting, and letting r → ∞, we find in an analogous fashion that P is stationary under shifts S_t.
We come finally to the question of uniqueness. Suppose we are given a stationary measure Π on T+ and that P is constructed from Π via P_0 and (13.3.2). Then P, which is clearly a probability measure, has an associated Palm measure C̆_P that satisfies the equation analogous to (13.3.2); namely,

    E_P[g(N)] = ∫_{N0#∗} C̆_P(dN_0) ∫_0^{t_1(N_0)} g(S_t N_0) dt.        (13.3.4)
Substituting g(N) = h(S_{t_0(N)}N), the inner integral in (13.3.4) becomes

    ∫_0^{t_1(N_0)} h(S_{t_0(S_t N_0)} S_t N_0) dt.

Now for 0 < t < t_1(N_0), S_t N_0 has points at t_i(N_0) − t, so the point of S_t N_0 lying in (−∞, 0) and nearest to the origin is at −t. In this range of t, therefore, the argument of h reduces to N_0, and (13.3.2) yields for this g

    E_P[g(N)] = m E_{P_0}[ t_1(N_0) h(N_0) ].

Similarly, (13.3.4) yields

    E_P[g(N)] = ∫_{N0#∗} t_1(N_0) h(N_0) C̆_P(dN_0).
Both these equations hold for nonnegative B(N0#∗ )-measurable h, so it follows that the measures mt1 (N0 ) Π(dN0 ) and t1 (N0 ) C˘P (dN0 ) coincide, thereby
identifying mΠ as the Palm measure for P. Thus, Π is determined uniquely by P. Again, if P is given and Π is determined (through P_0) by (13.3.1), then we know already that Π is the Palm distribution of P, and hence from (13.3.2) that P is uniquely determined by Π. Either equation on its own is enough to imply a one-to-one correspondence.
Theorem 13.3.I is a substantial generalization of the Palm–Khinchin equations of Section 3.4, and it provides the most satisfactory approach to the determination of the point process associated with a given process of intervals. The intuitive content of (13.3.2) can be expressed as follows. To embed a stationary sequence of intervals {τ_n: n = 0, ±1, . . .} with distribution Π as in Theorem 13.3.I into a stationary point process on R, first select a realization {τ_n}, and choose a number X uniformly at random on a suitably large interval (0, T) say, with T large relative to the τ_n [roughly speaking, this is like taking r large in the display before (13.3.4)]. Then define a realization {t_n} of the point process on R by relabelling the sequence

    t′_r = −X + τ_1 + · · · + τ_r    (r = 0, 1, . . .),
    t′_r = −X − τ_0 − · · · − τ_{r+1}    (r = −1, −2, . . .),

as t_n = t′_{r′+n} (n = 0, ±1, . . .), where we identify r′ from t′_{r′} = inf{t′_r: t′_r > 0}. Then

    t_n − t_{n−1} = τ_{r′+n}    (all n).

From the choice of X, Pr{τ_{r′} > x} = m ∫_x^∞ u Π(du); that is, τ_{r′} has the length-biased version of the common distribution of the stationary sequence {τ_n}.
The next example utilizes a more direct construction that follows Palm's original suggestion. It incorporates the idea of a point chosen uniformly at random from within an initial interval selected by the length-biased form of the stationary interval distribution, spelled out in detail for the Wold process around (4.5.3a). Equation (13.3.2) is a more formal and general way of expressing the same ideas.
Example 13.3(a) Renewal and Wold processes. Suppose first that {. . . , L_{−1}, L_0, L_1, . . .} is a sequence of i.i.d.
positive r.v.s, which is therefore stationary and so describes a distribution Π on T+. Indeed, Π is just the product measure on (R+)^{(∞)} derived from multiple copies of the measure F(dx) associated with each of the L_i. To fit into the framework of the theorem we must have

    F(0+) = 0,    ∫_0^∞ x F(dx) = m^{−1} < ∞.

In (13.3.2) take g(N) = I_Γ(N), where Γ = Γ_1 ≡ {N: t_1(N) > x}. Then the term g(S_t N_0) on the right-hand side of (13.3.2) equals unity for 0 < t < t_1(N_0) − x and t_1(N_0) > x, and zero otherwise, so that

    E_P[g(N)] = P(Γ_1) = P{t_1(N) > x} = m E_Π[(L_1 − x) I_{{L_1 > x}}]
              = m ∫_x^∞ (y − x) F(dy) = m ∫_x^∞ [1 − F(y)] dy,
which is the first of the Palm–Khinchin equations (3.4.9) and shows in the renewal case that the first interval after a fixed origin (and in view of stationarity the choice of origin is immaterial) has the distribution of the forward recurrence time [see Example 4.1(c)]. Next, take Γ = Γ_2 ≡ {N: t_1(N) > x, t_2(N) − t_1(N) > y}. We obtain similarly

    P(Γ_2) ≡ P{t_1(N) > x, t_2(N) − t_1(N) > y} = m E_Π[(L_1 − x) I_{{L_1 > x, L_2 > y}}]
           = m ∫_x^∞ [1 − F(u)] du [1 − F(y)],
on account of the assumed independence of the {L_i}. The first equality here is the second of the Palm–Khinchin equations. The other equality shows that for a stationary renewal process, the length of the second interval after the origin is independent of the first.
In the case of a Wold process (see Section 4.5), the intervals {L_i} form a stationary Markov chain with stationary distribution π(·) say and transition kernel P(x, B) = Pr{L_{i+1} ∈ B | L_i = x}. Again we must assume that π({0}) = 0 = π((−∞, 0]) and ∫_0^∞ x π(dx) = m^{−1} < ∞. For Γ_1 and Γ_2 as above we find that t_1(N) has the same kind of forward recurrence time distribution with π(·) in place of F(·), and that t_1(N) and t_2(N) − t_1(N) have the joint distribution

    F_2(dx × dy) = m dx ∫_x^∞ π(du) P(u, dy).
Thus, the marginal distribution of t_2(N) − t_1(N) is now given by

    F_2(R+ × dy) = m ∫_0^∞ dx ∫_x^∞ π(du) P(u, dy),
and in general neither this interval nor any of the later intervals has exactly the stationary interval distribution. The analysis of Section 13.2 allows us to construct a Palm measure even for a process with infinite intensity. It is therefore natural to seek a version of Theorem 13.3.I valid even for processes with infinite mean rate; this is possible if Π is allowed to have infinite total mass. In fact, the proof of Theorem 13.3.I carries over with only notational changes as soon as we replace mΠ by the measure induced on T + by the Palm measure C˘P , which remains σ-finite but not necessarily totally finite. For brevity we state the theorem below in terms of a measure R on the space N0#∗ rather than a measure on the space T + of interval sequences.
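The forward recurrence time formula of Example 13.3(a) is easy to check by simulation, and the check also exercises the embedding described before the example. The sketch below makes an illustrative choice not taken from the text — F uniform on (0.5, 1.5), so that m = 1 — builds the stationary process by giving the interval covering the origin the length-biased distribution and placing the origin uniformly within it, and compares the empirical tail of t_1(N) with m∫_x^∞ [1 − F(y)] dy:

```python
import random

random.seed(1)

def length_biased_interval():
    # density proportional to y*f(y) on (0.5, 1.5): acceptance-rejection
    while True:
        y = random.uniform(0.5, 1.5)
        if random.random() < y / 1.5:
            return y

def forward_recurrence():
    # the origin falls uniformly inside the length-biased interval;
    # t_1(N) is the part of that interval to the right of the origin
    return random.random() * length_biased_interval()

n, x = 100_000, 0.6
emp = sum(forward_recurrence() > x for _ in range(n)) / n
# m * int_x^inf [1 - F(y)] dy with F uniform on (0.5, 1.5) and m = 1:
theory = (1.5 - x) ** 2 / 2        # valid for 0.5 <= x <= 1.5
print(round(emp, 3), round(theory, 3))
```

The two printed values agree to within Monte Carlo error, illustrating the first Palm–Khinchin equation for this renewal example.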
Theorem 13.3.II. There is a one-to-one correspondence between distributions P on B(NR#∗) of simple nonnull stationary point processes on R, and stationary σ-finite (but not necessarily totally finite) measures R on B(N0#∗) satisfying

    ∫_{N0#∗} t_1(N_0) R(dN_0) = ∫_{NR#∗} P(dN) = 1.        (13.3.5)

The correspondence is effected via nonnegative B(N0#∗)-measurable h and nonnegative B(NR#∗)-measurable g in the equations

    ∫_{N0#∗} h(N_0) R(dN_0) = ∫_{NR#∗} ∑_{i=1}^{N(0,1]} h(S_{t_i}N) P(dN)        (13.3.6a)

and

    ∫_{NR#∗} g(N) P(dN) = ∫_{N0#∗} R(dN_0) ∫_0^{t_1(N_0)} g(S_t N_0) dt.        (13.3.6b)
Proof. The normalization condition (13.3.5) follows from setting f(ξ) ≡ 1 in (13.2.11). The remaining results paraphrase those of Theorem 13.3.I; details of the proofs are left to the reader.
We return now to the more general context of point processes on R^d. In the absence of a total ordering on R^d, it is not immediately apparent what should be the exact counterparts of the preceding results. Some initial progress can be made by replacing the role of τ_1 above by the point of the realization, x*(N) say, that is closest to the origin. We first check that this concept is well defined.
Lemma 13.3.III. Let N be a simple nonnull stationary point process on X = R^d. Then the set {N: there exist x′ ≠ x″ with ‖x′‖ = ‖x″‖ and N({x′}) ≥ 1, N({x″}) ≥ 1} is B(NX#∗)-measurable and has P-measure zero.
Proof. The set J ⊂ X^{(2)} defined by J = {(x, y): ‖x‖ = ‖y‖, x ≠ y} is a measurable set in B(X^{(2)}) by inspection, and we can write

    ∫_{X^{(2)}} I_J(x, y) N(dx) N(dy) = ∫_X h(x, N) N(dx),

where

    h(x, N) = { 1 if N({y}) > 0 for some y ≠ x with ‖y‖ = ‖x‖,
                0 otherwise,
is measurable. Applying (13.2.5), we obtain

    E[ ∫_X h(x, N) N(dx) ] = ∫_{N0#∗} C̆_P(dN_0) ∫_X h(x, S_{−x}N_0) dx.

The function h(x, S_{−x}N_0) equals 1 only on the at most countable set of surfaces {y: ‖y + x_i‖ = ‖x_i‖, y ≠ 0} obtained by letting x_i run through the points of N_0. For d = 1, the surface consists of the single point y = −2x_i; for d > 1, it consists of a surface in R^d of dimension d − 1, and so is of zero R^d-Lebesgue measure. In either case, the inner integral vanishes for each N_0, and so the expectation is zero.
It follows from Lemma 13.3.III that with probability 1 the distances from the origin to the points of a realization of a nonnull stationary simple point process in R^d can be set out in a strictly increasing sequence 0 < r_1(N) < r_2(N) < · · · . In this case the quantities r_i(N) are well-defined random variables because

    {r_i(N) < a}   if and only if   N(S_a(0)) ≥ i,

and for given i there is a.s. a unique point of the process, x*_i(N) say, associated with a given distance r_i. In the exceptional set (of probability 0) where there is no such unique point, we can put all the x*_i(N) equal to zero. It follows that the x*_i(N) form a measurable enumeration of the points of the realization (Definition 9.1.XI), for the measurability of sets such as {N: x*_i(N) ∈ A} =
    ⋃_k { k/n ≤ r_i(N) < (k+1)/n; N(A ∩ S_{(k+1)/n}(0)) > 0 }        (A ∈ B(R^d))
implies that each x*_i(N) is a well-defined random element of R^d. In the sequel we mostly use the point of a realization N that is closest to the origin, which we denote for brevity by

    x*(N) = x*_1(N).        (13.3.7)

One immediate use for x*(N) is to develop an inversion formula extending (13.3.2) to the case X = R^d. For this we need the concept of a Voronoi polygon: for any given realization N of a nonnull simple point process, and any point u ∈ R^d, the Voronoi polygon V_u(N) with 'centre' u is the subset

    V_u(N) = {x: ‖x − u‖ < ‖x − x_j‖ for all x_j ∈ N with x_j ≠ u}

of points x ∈ R^d that lie closer to u than to any point x_j of N. Consequently,

    ∫_{V_u(N)} N(dx) = { 0 if u ∉ N,
                         1 when u ∈ N and N is simple.        (13.3.8)
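These definitions are easy to exercise numerically. The sketch below makes illustrative assumptions not found in the text: a Poisson process of unit rate on a 6 × 6 torus, with the Palm configuration realized — as Slivnyak's theorem permits for the Poisson process — by adjoining a point at the origin to the stationary process. It estimates the area of V_0(N_0) by test-point sampling; the average anticipates the fact, noted below, that the expected volume of the Voronoi polygon about the origin is m^{−1}.

```python
import math
import random

def poisson_rv(lam, rng):
    # Knuth's product method; adequate for moderate lam
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def torus_dist2(a, b, side):
    # squared distance with periodic boundary (a torus stands in for R^2)
    dx = abs(a[0] - b[0]); dx = min(dx, side - dx)
    dy = abs(a[1] - b[1]); dy = min(dy, side - dy)
    return dx * dx + dy * dy

rng = random.Random(7)
side, lam, trials, tests = 6.0, 1.0, 120, 500
total = 0.0
for _ in range(trials):
    # stationary Poisson field, plus the point at the origin (Slivnyak)
    pts = [(rng.uniform(0.0, side), rng.uniform(0.0, side))
           for _ in range(poisson_rv(lam * side * side, rng))]
    hits = 0
    for _ in range(tests):
        x = (rng.uniform(0.0, side), rng.uniform(0.0, side))
        d0 = torus_dist2(x, (0.0, 0.0), side)
        if all(torus_dist2(x, p, side) > d0 for p in pts):
            hits += 1
    total += side * side * hits / tests    # Monte Carlo area of V_0(N_0)
mean_area = total / trials
print(round(mean_area, 1))                 # close to 1/lam = 1.0
```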
In particular, if N_0 ∈ N0#∗, we write V_0(N_0) for the Voronoi polygon about the origin (which is a point of N_0), and note that

    ∫_{V_0(N_0)} N_0(dx) = N_0(V_0(N_0)) = 1,

so that I_{V_0(N_0)}(x) = k(x, N_0), where k(·) is a function as in (13.2.9) and used in the inversion formula (13.2.11) and its variants. The inversion formula itself takes the form, for nonnegative B(NX#∗)-measurable g(·),

    E_P[g(N)] = ∫_{N0#∗} [ ∫_{V_0(N_0)} g(S_x N_0) dx ] R(dN_0),        (13.3.9a)

where the measure P on B(NX#∗) is a probability measure provided R satisfies the normalizing condition

    ∫_{N0#∗} [ ∫_{V_0(N_0)} dx ] R(dN_0) = 1.        (13.3.9b)

If R is totally finite with mass m, say R = mP_0, the right-hand side can be written as

    m E_{P_0}[ ∫_{V_0(N_0)} g(S_x N_0) dx ],        (13.3.9c)
in which case the left-hand side of (13.3.9a) corresponds to the probability distribution of a stationary simple point process with finite mean density m and Palm distribution P_0 [the proof of this fact is similar to that of (13.3.2) and left to Exercise 13.3.4(a)].
The inversion formula (13.3.9a) is not as useful in R^d (d ≥ 2) as is (13.3.2) in R. This is chiefly a reflection of the increased structural complexity of the higher-dimensional Euclidean spaces, but a few simple results can be deduced from it, such as the intuitively obvious fact that the expected hypervolume (i.e., Lebesgue measure) of the Voronoi polygon about the origin equals m^{−1} [see Exercise 13.3.4(b)]. It does not supply a full counterpart in R^d of Theorems 13.3.I and 13.3.II, because it does not address the issue of defining a notion of stationarity for measures on N0#∗ extending that of interval-stationarity for point processes on the line; this extension must await the discussion of point-stationarity and Theorem 13.3.IX.
A further use of x*(N) is in the next theorem, which establishes the conditional probability interpretation of the Palm distribution for point processes in general Euclidean spaces R^d.
Theorem 13.3.IV. Let N be a simple stationary point process in R^d with finite mean rate m, distribution P, and Palm distribution P_0, and let {A_n: n = 1, 2, . . .} be a nested sequence of bounded Borel sets with nonempty interiors satisfying

    diam(A_n) → 0    (n → ∞).        (13.3.10)
Then as n → ∞,

    [ℓ(A_n)]^{−1} P{N(A_n) > 0} → m.        (13.3.11)

If, furthermore, the sets {A_n} are spheres in R^d centred at the origin, then for bounded continuous nonnegative Borel functions f on NX#∗,

    E_P[f(N) | N(A_n) > 0] → E_{P_0}[f(N_0)].        (13.3.12)
Proof. The first assertion (13.3.11) is a corollary to the discussion on intensities in Chapter 9 [see, in particular, (9.3.22) and Exercise 9.3.11]. A further corollary of the same discussion, which we need in the sequel, is that

    P{N(A_n) ≥ 2} / ℓ(A_n) → 0    (n → ∞)        (13.3.13)

(see Proposition 9.3.XV). We approach the assertion at (13.3.12) via the following result.
Proposition 13.3.V. Let N, P, P_0, m, and {A_n} be as in Theorem 13.3.IV and x*(N) as at (13.3.7). Then for bounded nonnegative B(NX#∗)-measurable f(·),

    E_P[f(S_{x*(N)}N) | N(A_n) > 0] → E_{P_0}[f(N_0)].        (13.3.14)

Proof. Note first that

    |E_P[f(S_{x*(N)}N) | N(A_n) > 0] − E_{P_0}[f(N_0)]|
        ≤ [m ℓ(A_n)]^{−1} |E_P[f(S_{x*(N)}N) I_{{N(A_n)>0}}] − m ℓ(A_n) E_{P_0}[f(N_0)]|
          + [m ℓ(A_n)]^{−1} E_P[f(S_{x*(N)}N) I_{{N(A_n)>0}}] |1 − m ℓ(A_n)/P{N(A_n) > 0}|
        ≡ J_1 + J_2,   say.

In J_2, the modulus of the difference converges to zero as n → ∞ by (13.3.11), and the multiplier remains finite because

    E_P[f(S_{x*(N)}N) I_{{N(A_n)>0}}] / [m ℓ(A_n)] ≤ sup_{N∈NX#∗} f(N) · P{N(A_n) > 0}/[m ℓ(A_n)]
                                                   → sup_{N∈NX#∗} f(N)    by (13.3.11),

and this supremum is finite by the boundedness assumption on f(·). Thus J_2 → 0 as n → ∞. For J_1, we note first from the proof of Theorem 13.3.I and (13.2.8) that

    m ℓ(A_n) E_{P_0}[f(N_0)] = E_P[ ∫_{A_n} f(S_x N) N(dx) ] = E_P[ ∑_{x_i∈A_n} f(S_{x_i}N) ].

It is thus enough to consider the difference

    [ℓ(A_n)]^{−1} | E_P[ ∑_{x_i∈A_n} f(S_{x_i}N) − f(S_{x*(N)}N) I_{{N(A_n)>0}} ] |,
which certainly vanishes when N(A_n) = 0. When N(A_n) > 0, then we have x*(N) ∈ A_n, implying that it can be identified with one of the x_i ∈ A_n, so the first term cancels with one of the elements of the sum. Consequently, the difference is dominated by (sup_{N∈NX#∗} f(N)) P{N(A_n) ≥ 2}/ℓ(A_n), which tends to zero by (13.3.13).
Resuming the proof of Theorem 13.3.IV, Proposition 13.3.V implies that it is enough to establish the convergence to zero of the difference

    |E_P[f(N) | N(A_n) > 0] − E_P[f(S_{x*(N)}N) | N(A_n) > 0]|
        ≤ E_P[ |f(N) − f(S_{x*(N)}N)| | N(A_n) > 0 ]        (13.3.15)
        ≤ E_P[ sup_{x∈A_n} |f(S_{x*(N)}N) − f(S_{x+x*(N)}N)| | N(A_n) > 0 ],
because x∗ (N ) ∈ An under the condition N (An ) > 0. Fixing the set An for the supremum as An0 say, and letting n → ∞ for the conditioning, this last expression converges by Proposition 13.3.V to EP0 supx∈An |f (N0 ) − f (Sx N0 )| . Inasmuch as f is uniformly bounded and continuous, and the shift operation is continuous also, the argument of the supremum converges to zero pointwise as n0 → ∞, and then by dominated convergence the expectation must converge to zero. We note that in both Theorem 13.3.IV and Proposition 13.3.V the convergence results are sufficient to imply that both P{· | N (An ) > 0} and P{Sx∗ (·) | N (An ) > 0} converge weakly to P0 {·} as n → ∞. In fact the convergence in Proposition 13.3.V can be strengthened: see Exercise 13.3.6. The results just proved also provide some kind of analogue to the differential form of the Palm–Khinchin equations given at (3.4.11) of Chapter 3. Even in the one-dimensional case, however, it is not easy to provide a completely satisfactory account by this differential approach [see Slivnyak (1962, 1966), Leadbetter (1972), and Exercise 3.4.4]. It is a far more difficult exercise to extend to Rd the interval-stationarity interpretation of a stationary point process on R. Results in this direction centre around the concepts of point maps and point-stationarity, initially considered by Mecke (1975) (the term point-stationarity is generally used in Rd for what in R1 has often been called cycle stationarity). One way of looking at the intervals in a one-dimensional point process is as mappings, each of which links two points of a realization. The underlying idea is that, even in higher dimensions, we may be able to define a family of mappings that link points of the process in such a way that invariance under these mappings provides a characterization of the Palm measure. 
What distinguishes the two cases R1 and Rd is that the mappings in R1 can be restricted to points that are left or right neighbours, whereas in Rd they need not have any particular proximity relation.
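One concrete family of such mappings, taken up in Example 13.3(c) below, matches mutual nearest neighbours. A minimal sketch on a finite planar configuration (a finite window only approximates the setting of the text, where configurations are infinite):

```python
import math

# Mutual-nearest-neighbour point map, sketched on a finite configuration.

def nearest(points, x):
    """Element of the configuration nearest to x (assumed unique), x excluded."""
    return min((p for p in points if p != x), key=lambda p: math.dist(p, x))

def psi(points, x):
    """Swap x with y when x, y are mutual nearest neighbours; otherwise fix x."""
    if x not in points:
        return x
    y = nearest(points, x)
    return y if nearest(points, y) == x else x

pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.5), (9.0, 0.0)]
for x in pts:
    assert psi(pts, psi(pts, x)) == x      # the map is self-inverse
print([psi(pts, x) for x in pts])
# -> [(1.0, 0.0), (0.0, 0.0), (5.0, 6.5), (5.0, 5.0), (9.0, 0.0)]
# (9.0, 0.0) has no mutual partner and is left fixed
```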
Extending the notation from the beginning of the section, with X = Rd , let {xi (N )} be a measurable enumeration of the points (of support) of N ∈ NX#∗ . We use x ∈ N as shorthand for x ∈ supp(N ), so x ∈ N ∈ NX#∗ implies that N ({x}) = 1. In view of Proposition 12.1.VI and the fact that the point process is stationary, we assume that P{N (X ) = ∞} = 1. Then we are in fact interested in that part of the subspace N0#∗ for which N (X ) = ∞, and the enumeration {xi (N )} is countably infinite. Let Ψ(N, x) be a measurable mapping from NX#∗ ×X to X . Call Ψ covariant when for all N ∈ NX#∗ and x, y ∈ X , Ψ(Sy N, Sy x) ≡ Ψ(N − y, x − y) = Ψ(N, x) − y ≡ Sy Ψ(N, x),
(13.3.16)
so for covariant Ψ, S_x Ψ(N, x) = Ψ(S_x N, 0).
Definition 13.3.VI. A point map is a covariant mapping Ψ: NX#∗ × X → X such that

    Ψ(N, x) { ∈ N if x ∈ N,
              = x if x ∉ N.        (13.3.17)

For covariant Ψ(N, ·), Ψ(N, x) = S_{−x}Ψ(S_x N, S_x x) = S_{−x}Ψ(S_x N, 0); moreover, Ψ being a point map and x ∈ N is the same as S_x N ∈ N0#∗, and the first (and critical) case of (13.3.17) is equivalent to requiring that, for N ∈ N0#∗,

    Ψ(N, 0) is again a point of N.        (13.3.18)
[Earlier work in Thorisson (1995, 2000) and Heveling and Last (2005) introduced a point map as a mapping from NX#∗ → X satisfying (13.3.18). Their subsequent work starts from Definition 13.3.VI.]
We define the composition of two point maps Ψ_1 and Ψ_2 as follows. Take x ∈ N ∈ N0#∗ and consider (Ψ_2 ∘ Ψ_1)(N, x) ≡ Ψ_2(N, Ψ_1(N, x)). Because y = Ψ_1(N, x) ∈ N by (13.3.18), it follows that z = Ψ_2(N, y) ∈ N also; that is, Ψ_2 ∘ Ψ_1 is well defined as a point map and, with N specified, it is indeed the usual composition of the two point maps. Consequently, starting from x_0 ≡ x ∈ N, defining x_n = Ψ(N, x_{n−1}) for n = 1, 2, . . . yields a sequence of points in N, but they need not be distinct, nor need they enumerate the elements of N when N is countable. Now our interest is in countably infinite point sets N ∈ N0#∗, so the sequence {x_n} just defined by the point map iterates of x_0 is well defined, and either constitutes an infinite chain or reduces to recurring cycles.
Call a point map bijective if for any N ∈ NX#∗, the function Ψ(N, ·) is a one-to-one mapping on X. It is evident that a bijective point map Ψ as at (13.3.17) maps N in a one-to-one manner onto itself, and for x ∉ N it is just the identity. We can then define the inverse point map Ψ^{−1} for z ∈ N as the solution x of z = Ψ(N, x), and write x = Ψ^{−1}(N, z). We associate with every point map Ψ the point shift S^Ψ on NX#∗ defined by

    S^Ψ N = S_{Ψ(N,0)} N.        (13.3.19)
Because Ψ(N, 0) = 0 for N ∉ N0#∗, S^Ψ N = N unless N ∈ N0#∗, in which case the effect of S^Ψ on N is to shift the points {x_i(N)} of N to {x_i(N) − Ψ(N, 0)}. Now by virtue of Ψ being a point map and 0 ∈ N ∈ N0#∗, Ψ(N, 0) ∈ N; that is, it is one of the points enumerated as {x_i(N)}, and this point therefore gives x_i(N) − Ψ(N, 0) = 0; that is, 0 ∈ S_{Ψ(N,0)}N when N ∈ N0#∗. Thus, S^Ψ N shifts the origin to Ψ(N, 0), and S^Ψ defines an operator on NX#∗ that maps the space N0#∗ into itself (although not necessarily onto itself). For any N ∈ NX#∗, the point sets N and its shifted version S_y N (any y ∈ X) consist of points whose relative positions in X are not changed by the shift. However, the point shift S^Ψ when operating on N ∈ N0#∗ yields a point set S^Ψ N ∈ N0#∗ that in general differs from the original point set. The most familiar examples are the right- and left-shifts for sequences in R^1, as in the next example.
Example 13.3(b) Right- and left-shifts as bijective point maps in R^1. For X = R, enumerate the points of N ∈ NX#∗ as {x_i} ≡ {x_i(N): i = 0, ±1, . . .} where x_i < x_{i+1} with x_0 ≤ 0 < x_1. For N ∈ N0#∗ with N(R−) = N(R+) = ∞, define Ψ(N, 0) = x_1(N) [i.e., for such N, Ψ(N, 0) is the first point of N to the right of the origin, x_1(N)], and for other N ∈ N0#∗ define Ψ(N, x) = x. Because Ψ is a point map and therefore covariant, Ψ(N, x_i(N)) = Ψ(N − x_i(N), 0) + x_i(N) = x_{i+1}(N). Again because Ψ is a point map, Ψ(N, x) = x whenever x ∉ N, and therefore we have shown that this point map is indeed simply a right-shift. When 0 ∈ N, the corresponding point shift S^Ψ shifts the origin for S^Ψ N to x_1(N), thus subtracting x_1(N) from each of the points of N, yielding x_i(S^Ψ N) = x_{i+1}(N) − x_1(N). For example, we then have Ψ(S^Ψ N, 0) = x_2(N) − x_1(N). For any x′ ∈ N, x_ℓ(N) say, we have Ψ(N, x′) = Ψ(N − x′, 0) + x′ = x_{ℓ+1}(N), so that Ψ^{−1}(N, x′) = x_{ℓ−1}(N).
It then follows that if we define Ψ^{n+1} = Ψ ∘ Ψ^n with Ψ^1 = Ψ (and thus Ψ^{−n} = (Ψ^{−1})^n), then {Ψ^n(N_0, x_0)} = {x_n(N_0)}; i.e., this sequence enumerates the points of N_0. Observe that the point map Ψ_k ≡ Ψ^k above has the property that Ψ_k(N, x) maps x to the kth point of N to the right of x, and is again a bijective point map.
The earlier theorems of this chapter can be reinterpreted as assertions that the Palm relations define a one-to-one correspondence between stationary point processes on NR#∗ and measures on N0#∗ which are invariant under the bijective point maps taking one point of the realization to the next. When X = R^d with d > 1, however, the problem of finding a suitable family of bijective point maps becomes nontrivial. The next example, due to Häggström and quoted in Thorisson (2000, §9.2.8), underlies the extension of Theorem 13.3.II that we describe shortly.
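A toy implementation of the right-shift of Example 13.3(b) and its point shift, on a finite configuration (so the shift is only meaningful away from the right edge of the window; in the text the configurations are doubly infinite):

```python
# Right-shift point map and its point shift S^Psi, on a finite window.

def psi(points, x):
    """Send a point x of N to the next point of N strictly to its right."""
    if x not in points:
        return x                          # identity off the support
    right = [p for p in points if p > x]
    return min(right) if right else x     # boundary caveat of the finite window

def point_shift(points):
    """S^Psi: shift the origin of N (assumed to contain 0) to Psi(N, 0)."""
    y = psi(points, 0.0)
    return sorted(p - y for p in points)

N = [-2.0, -0.5, 0.0, 1.5, 4.0]
print(point_shift(N))     # -> [-3.5, -2.0, -1.5, 0.0, 2.5]
assert psi(N, 7.0) == 7.0                 # identity off N
```

Note that the shifted configuration again contains the origin, but its relative spacings have changed — exactly the behaviour described above.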
Example 13.3(c) Mutual nearest neighbour matching. Given distinct points x′, x″ in N ∈ NX#∗, x′ and x″ are mutual nearest neighbours in N when the element of N closest to x′ is x″ and the element of N closest to x″ is x′. Define a point map for N ∈ N0#∗ by

    Ψ(N, 0) = { x if x and 0 are mutual nearest neighbours in N,
                0 otherwise.

Then as x runs through all the elements of N,

    Ψ(N, x) = { y_x if y_x ∈ N is the mutual nearest neighbour of x,
                x   otherwise,        (13.3.20)

so Ψ is a bijective point map. For Ψ defined in this way, we see that for x ∈ N, either x has a mutual nearest neighbour y_x ∈ N or not, and

    Ψ(N, Ψ(N, x)) = { Ψ(N, y_x) in the first case,
                      Ψ(N, x)   otherwise,
                  = x in either case,

so that Ψ^{−1} = Ψ; that is, this point map is self-inverse.
A simple case of a systematic approach to the construction of bijective maps in R^d due to Heveling and Last starts from the following definition.
Definition 13.3.VII. For any bounded Borel set B, a B-selective point map Ψ_B(N, x) is a mapping N0#∗ × X → X such that
(1) if N ∩ [B ∪ (−B)] consists of a single point x_B say, and if also 0 is the only point of N in (B + x_B) ∪ (−B + x_B), then Ψ_B(N, 0) = x_B;
(2) for x ∈ N, Ψ_B(N, x) is defined from (1) and the covariant property; and
(3) in all other cases, Ψ_B(N, x) = x.
Property (2) here ensures that Ψ_B(N, x) is in fact a point map: we call it the B-selective point map. Observe that the conditions ensure that a unique point of R^d is defined for every N and x. Note, in particular, that Ψ_B(N, 0) = 0 whenever there are two or more points of N in B ∪ (−B). Because B ∪ (−B) is symmetric and x_B = Ψ_B(N, 0) = x_{−B}, the map Ψ_B is in fact bijective. To see this, if a point x_B ≠ 0 exists as in (1) for given N ∈ N0#∗, the associated point shift S^{Ψ_B} interchanges 0 and x_B, leaving the other points of N unchanged. Ψ_B is also self-inverse in the sense that Ψ_B ∘ Ψ_B = I (identity).
The aim now is to build up a comprehensive set of mappings via an exhaustive area search by letting B vary over a sufficiently wide class of testing sets. To this end, let {T_n} = {B_{ni}} be a nested system of tilings of R^d, namely, a system of partitions of X like a dissecting system, each T_n consisting of a denumerable number of disjoint sets for which the norm ‖T_n‖ = sup_i{diam(B_{ni})} → 0 as n → ∞. As the sets get smaller, their ability to distinguish between points of the realization increases, until finally every
13. Palm Theory
point in the realization can be interchanged with the origin by one of the point shifts S^{Ψ_{B_{ni}}}. It is now plausible that requiring a measure on N_0^{#*} to be invariant under the denumerable family of point shifts
\[
S_{ni} \equiv S^{\Psi_{B_{ni}}} \tag{13.3.21}
\]
may be a condition comparable to interval stationarity in the one-dimensional case. This is a little too simple in general, because of the complexities of the possible configurations of points in a general element of N_X^{#*}. To avoid this difficulty, we impose on the point process a property similar to but stronger than simplicity. We say that the element N of N_0^{#*} in R^d is unaligned if it contains no equidistant collinear triples of points. Exercise 13.3.11 shows that the set U_X of all unaligned elements with X = R^d is measurable. Then the point process as a whole is unaligned if P(U_X) = 1, and we show that under this condition, invariance under the point shifts S_{ni} is enough to imply stationarity of the original point process. In the general case treated by Heveling and Last (2005), this collection of point shifts has to be augmented by additional point shifts that deal with situations where the configurations may contain finite chains of equidistant collinear points. In any case, the aim is to establish the existence of a sufficiently comprehensive set of bijective point maps to justify the far-reaching generalization of the one-dimensional results outlined in the definition and theorem which follow.
Definition 13.3.VIII. A σ-finite measure R on B(N_0^{#*}) is point-stationary when it is invariant under all bijective point maps.
Theorem 13.3.IX. If the measure P on B(N_X^{#*}) corresponds to a stationary point process on X = R^d, then its associated Palm measure R on B(N_0^{#*}) is point-stationary. Conversely, if the σ-finite measure R on B(N_0^{#*}) is unaligned, satisfies the normalizing condition (13.3.5), and is invariant under the family of point shifts {S_{ni}} associated with the sets B_{ni} from a nested sequence of symmetric tilings {T_n} satisfying ‖T_n‖ → 0, then it is the Palm measure associated with an unaligned stationary point process.
Proof.
Start by appealing to the general relations between stationary random measures and their associated Palm measures that are embodied in equation (13.2.6) and its specializations (13.2.10) and (13.2.11). We have to verify that if R is the Palm measure for a stationary point process on X = R^d and h(·) a bounded, measurable, nonnegative function on N_X^{#*}, the quasi Palm expectation operator E_R defined by E_R[h(N)] ≡ ∫_{N_0^{#*}} h(N) R(dN) satisfies
\[
E_R[h(S^\Psi N)] = E_R[h(N)] \quad \text{for any bijective point map } \Psi. \tag{13.3.22}
\]
13.3. Interval- and Point-stationarity
Writing j(x) = I_{U^d}(x) for the indicator function of the unit cube in R^d, (13.2.10b) yields
\[
\begin{aligned}
E_R[h(S^\Psi N)]
&= \int_{N_X^{\#*}} \int_X j(x)\, h\{S^\Psi(S_x N)\}\, N(dx)\, P(dN) \\
&= \int_{N_X^{\#*}} \int_X j(x)\, h\{S_{\Psi(S_x N,\, 0)}(S_x N)\}\, N(dx)\, P(dN) && \text{by (13.3.19),} \\
&= \int_{N_X^{\#*}} \int_X j(x)\, h\{S_{\Psi(N,x)-x}(S_x N)\}\, N(dx)\, P(dN) && \text{by (13.3.16),} \\
&= \int_{N_X^{\#*}} \int_X j(x)\, h\{S_{\Psi(N,x)} N\}\, N(dx)\, P(dN) && \text{because } S_{-x} S_x N = N, \\
&= \int_{N_X^{\#*}} \int_X j\big(\Psi^{-1}(N,y)\big)\, h(S_y N)\, N(dy)\, P(dN) && \text{setting } y = \Psi(N,x).
\end{aligned}
\]
Because Ψ is bijective, so is Ψ^{-1}. Returning to the basic relation (13.2.6), and using Fubini's theorem, the last term in the above chain can be evaluated as
\[
\int_{N_0^{\#*}} h(N) \int_X j\big(\Psi^{-1}(N,y) - y\big)\, dy\, R(dN) = E_R[h(N)].
\]
Thus, (13.3.22) holds and the direct part is proved.
To prove the converse part, call a family {Ψ_{ni}} of bijective point maps distinctive if, for fixed n, N ∈ N_0^{#*}, and x ∈ N, the unions ∪_r Ψ^r_{ni}(N, x) are disjoint for distinct i = 1, 2, . . . , and exhaustive if
\[
N(\cdot) = \lim_{n\to\infty} \sum_{i,r} \delta_{\Psi^r_{ni}(N,0)}(\cdot) \qquad (\text{every } N \in N_0^{\#*}) \tag{13.3.23}
\]
[here, Ψ^r denotes the r-fold product as below (13.3.18), else see Exercise 13.3.10]. The last condition implies that there are sufficiently many point maps to distinguish the points in any realization of a simple point process.
The rest of the proof consists of two stages.
(1°) If R is σ-finite, and invariant under a family of bijective point shifts which is both distinctive and exhaustive, then it is the Palm measure of some stationary measure.
Let Ψ be a bijective point map from the specified family, and define its cycle length relative to the realization N ∈ N_0^{#*} by C_Ψ(N) = inf{r: Ψ^r(N, 0) = 0}. For any given value k of the cycle length, for the successive points within that cycle, invariance under Ψ implies that for any nonnegative measurable function f(N, x),
\[
\int_{\{C_\Psi = k\}} f\big(S^\Psi N, -\Psi(N, 0)\big)\, R(dN) = \int_{\{C_\Psi = k\}} f\big(N, \Psi^{-1}(N, 0)\big)\, R(dN). \tag{13.3.24}
\]
For it is easy to check that the length of the cycle is invariant under S^Ψ, and applying S^{Ψ^{-1}} to the terms in the left-hand side leads directly to the form in the right-hand side [see Exercise 13.3.10(d)]. The same equation holds also for each of the iterates of Ψ, because the length of the cycle is not affected, and the remaining features again follow from properties of Ψ^{-1}.
Still continuing with Ψ fixed, let N_Ψ denote the (reduced) realization, derived from N ∈ N_0^{#*}, and comprising the points {Ψ^r(N, 0): r = 1, . . . , C_Ψ}. Applying (13.3.24) to each of the iterates in turn yields, for each k,
\[
\int_{\{C_\Psi = k\}} \int_X f(S_x N, -x)\, N_\Psi(dx)\, R(dN) = \int_{\{C_\Psi = k\}} \int_X f(N, x)\, N_\Psi(dx)\, R(dN).
\]
This result holds for all values of the cycle length, including the case k = ∞, so we can amalgamate the above equations for all k > 0 (omitting the zero iterate to avoid duplication) and obtain
\[
E_0\left[\int_X f(S_x N, -x)\, (N_\Psi - \delta_0)(dx)\right] = E_0\left[\int_X f(N, x)\, (N_\Psi - \delta_0)(dx)\right]. \tag{13.3.25}
\]
We next amalgamate these equations also over the mappings Ψ_{ni}, holding n fixed, and summing over i. Because the family is distinctive, and the origin is excluded, there are no overlaps. Let N_n denote the point process obtained by amalgamating all the points in all the (N_{Ψ_{ni}} − δ_0) for i = 1, 2, . . . , and including the origin just once. Then (13.3.25) holds in the form
\[
E_0\left[\int_X f(S_x N, -x)\, N_n(dx)\right] = E_0\left[\int_X f(N, x)\, N_n(dx)\right].
\]
Finally, because the family is exhaustive, we can let n → ∞, so that N_n → N for each N, leading to
\[
E_0\left[\int_X f(S_x N, -x)\, N(dx)\right] = E_0\left[\int_X f(N, x)\, N(dx)\right].
\]
It now follows from Theorem 13.2.VIII that R is the Palm measure on N_0^{#*} for some stationary measure; R is a probability measure when the normalization condition at (13.3.6) holds.
(2°) The family of B-selective bijective point maps Ψ_{ni} derived from the sets {B_{ni}} of the symmetric tilings T_n is both distinctive and exhaustive over the set of unaligned elements of N_0^{#*}.
Because the Ψ_{ni} are self-inverse, they are also idempotent, so that in considering the powers Ψ^r_{ni} required by the distinctiveness condition it is sufficient to take r = 1. In this case the condition merely requires the points x_B and x_{B′} identified by distinct B and B′ to be
distinct; this follows trivially from the fact that the subsets in a given level of the tiling are by definition disjoint. In considering exhaustiveness, the crucial point here is that when the point process is unaligned, S_{ni} ≡ S^{Ψ_{ni}} either leaves the point at the origin invariant, or interchanges the point at the origin with the unique point in B_{ni}. Under these circumstances, every point in the realization N will ultimately be exchanged with the origin by one of the S_{ni}, without altering other points in the configuration. Thus it will ultimately appear in the sum at (13.3.23). Note that this will not be the case if the realization contains equi-spaced sequences of more than two collinear points, for then at least one point pair x_1, −x_1 will be repeated in every set B_{ni} containing at least one of the two points, sending the value of Ψ_{ni} to 0, and hence ensuring that x_1 never appears in the union. The converse is proved.
The complete version of this theorem in Heveling and Last (i.e., without the restriction to unaligned processes) allows an equivalence statement for point processes in R^d analogous to that for point processes on the line given by Proposition 13.3.II, as follows (see their paper for the proof).
Theorem 13.3.X. There is a one-to-one correspondence between the distributions of simple stationary point processes in R^d (i.e., on N_X^{#*} with X = R^d) and point-stationary measures on N_0^{#*} satisfying the normalization conditions P{N = ∅} = 0 and (13.3.9b).
In fact even the restriction to probability measures can be dropped, as in the discussion around Theorem 13.2.VIII, and the statement presented in symmetric form between σ-finite stationary measures on N_X^{#*} and σ-finite point-stationary measures on N_0^{#*}.
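The mutual nearest-neighbour point map of Example 13.3(c) can be sketched numerically. The following is an illustrative sketch only (the finite configuration, names and parameters are assumptions, not part of the text): it checks that the map is self-inverse, hence bijective, on a random finite configuration in R^2.

```python
# Sketch (assumed setup): the mutual nearest-neighbour point map of
# Example 13.3(c), applied to a finite configuration standing in for N.
import numpy as np

rng = np.random.default_rng(1)
pts = rng.uniform(0.0, 10.0, size=(30, 2))   # finite stand-in for a realization N

def nearest(i):
    """Index of the element of the configuration closest to pts[i]."""
    d = np.linalg.norm(pts - pts[i], axis=1)
    d[i] = np.inf                 # a point is not its own neighbour
    return int(np.argmin(d))

def psi(i):
    """Psi(N, x): send x to y if x and y are mutual nearest neighbours, else to x."""
    j = nearest(i)
    return j if nearest(j) == i else i

perm = [psi(i) for i in range(len(pts))]
# Psi(N, Psi(N, x)) = x for every x: the map is self-inverse, hence bijective.
assert all(psi(k) == i for i, k in enumerate(perm))
# Equivalently, as a permutation it consists of fixed points and 2-cycles only.
assert sorted(perm) == list(range(len(pts)))
```

The two assertions mirror the displayed identity Ψ(N, Ψ(N, x)) = x: mutual nearest-neighbour pairs are swapped, all other points are fixed.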
Exercises and Complements to Section 13.3
Note: Assume below that N is nonnull, that is, P{N = ∅} = 0.
13.3.1 Variants of Theorems 13.3.I–II.
(a) [cf. (13.3.2) and (13.2.6)]. Substitute g(x, N) = h(x, N)k(x, N) with k(·) as in (13.3.3) to show that
\[
E_P\big(h(t_1(N), S_{t_1(N)} N)\big) = m\, E_{P_0}\left[\int_0^{t_1(N_0)} h(x, N_0)\, dx\right].
\]
(b) Use (13.3.2) to show that m^{-1} = E_{P_0}(t_1(N_0)), and hence write (13.3.2) as
\[
E_P(g(N)) = E_{P_0}\left[\int_0^{t_1(N_0)} g(S_t N_0)\, dt\right] \Big/ E_{P_0}\big(t_1(N_0)\big).
\]
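The inversion formula in (b) lends itself to a quick Monte Carlo sanity check. The sketch below is illustrative only (the Gamma(2,1) interval distribution and the choice g(N) = 1{t_1(N) > c} are assumptions made here, not part of the exercise): for that g, the Palm-side integral ∫_0^{t_1} g(S_t N_0) dt reduces to (τ_1 − c)_+, while the stationary side is the probability that the forward recurrence time exceeds c.

```python
# Sketch: check E_P(g(N)) = E_P0[ int_0^{t1} g(S_t N0) dt ] / E_P0(t1)
# for g(N) = 1{t1(N) > c}, with i.i.d. Gamma(2,1) intervals (an assumption).
import numpy as np

rng = np.random.default_rng(7)
tau = rng.gamma(2.0, 1.0, size=500_000)   # interval lengths under the Palm view
t = np.cumsum(tau)                        # point locations of the renewal process
c = 1.5

# Palm side: int_0^{tau} 1{tau - s > c} ds = (tau - c)_+, so the ratio is
palm_side = np.maximum(tau - c, 0.0).mean() / tau.mean()

# Stationary side: P{t1(N) > c}, estimated by inspecting the same path at
# 'arbitrary' uniform times and recording the forward recurrence time.
u = rng.uniform(0.0, 0.9 * t[-1], size=200_000)
forward = t[np.searchsorted(t, u)] - u    # distance to the next point after u
stationary_side = (forward > c).mean()

assert abs(palm_side - stationary_side) < 0.02
```

Both sides agree with the analytic value m ∫_c^∞ (1 − F(x)) dx for the forward recurrence time of a stationary renewal process.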
13.3.2 For simple stationary N use (13.3.2) to find the joint distribution of X = t1 (N ) and τ1 ≡ t1 (N ) − t0 (N ) in terms of the d.f. F (·) of τ1 . In particular, show the following.
(a) The joint distribution for (X, τ_1) has a density function representation
\[
f(u, v)\, du\, dv = \begin{cases} m v^{-1} (1 - F(v))\, du\, dv & (0 \le u \le v), \\ 0 & (u > v). \end{cases}
\]
(b) The conditional distribution of X given τ_1 is uniform on (0, τ_1).
(c) The conditional distribution of X given the whole sequence {τ_n: n = 0, ±1, . . .} is uniform on (0, τ_1). [Hint: Let A ∈ B(N_R^{#*}) belong to the sub-σ-field generated by the τ_i and start from the definition of conditional expectation, so that for measurable h, E(h(X) I_A | σ{τ_i}) = I_A({τ_i}) τ_1^{-1} ∫_0^{τ_1} h(x) dx.]
13.3.3 Use (13.3.2) and Exercises 13.3.1–2 to provide an alternative derivation of the formulae in Exercise 3.4.4 for Q(B_k) and Pr(B_k).
13.3.4 (a) Define k(x, N) = 1 if x = x^*(N), = 0 otherwise. Verify that k(·) so defined satisfies (13.2.9) and establish (13.3.9) from (13.2.14) as in the derivation of (13.3.2).
(b) For a simple stationary point process in R^d with mean rate m, use (13.3.9) to show that E[ℓ(V_0(N))] = E[∫_{V_0(N)} dx] = 1/m.
13.3.5 Supply an alternative derivation of (13.3.11) from (13.3.3) and (13.3.5).
13.3.6 The conclusion of Proposition 13.3.V can be strengthened to ‖P̂_n − P_0‖ → 0, where P̂_n is the measure on N_0^{#*} induced by the conditional probabilities E_P(f(S_{x^*(N)} N) | N(A_n) > 0) and convergence is with respect to the variation norm. [Hint: The basic inequalities depend on f only through ‖f‖ and hence hold uniformly in f for ‖f‖ ≤ 1 say.]
13.3.7 Use the inversion equation (13.3.9) to deduce that under the conditions of Theorem 13.3.IV, P{N(A_n) > 1} = o(ℓ(A_n)) ≡ ℓ(A_n) o(1) and P{N(A_n) = 1} = m ℓ(A_n)(1 + o(1)) [cf. Theorem 1.2.12 of Franken et al. (1981)]. For simple stationary point processes on R as in Theorem 13.3.I, the analogue of (13.3.9) is
\[
E_P(g(N)) = m\, E_{P_0}\left[\int_{t_{-1}(N_0)/2}^{t_1(N_0)/2} g(S_t N_0)\, dt\right].
\]
13.3.8 Under the assumptions that N is stationary, simple, and has finite mean rate m, show that in Theorem 13.3.IV there is a nested sequence {A_n} with ℓ(A_n) → 0, not satisfying (13.3.10), for which (13.3.11) fails. Investigate whether (13.3.11) holds with (13.3.10) but without {A_n} being nested, and whether (13.3.12) holds for more general sets A_n than spheres. [Hint: Let the realizations span a lattice, and let A_n be the union of two small spheres with centres at two lattice points.]
13.3.9 Consider a Poisson process in R^1 with a point at 0. Suppose that the origin is shifted to the point of the process closest to the origin. Show that the shifted point process cannot be Poisson. [Hint: The interval between the new origin and the old is shorter than the other interval with an endpoint at the old origin, so these intervals cannot be independently distributed (Thorisson, 2000, Example 9.2.1).]
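The effect in Exercise 13.3.9 shows up immediately in a small simulation (an illustrative sketch, not part of the exercise): starting a rate-1 Poisson process with a point at 0, the distance from the new origin x* back to the old origin is the minimum of two independent Exp(1) gaps, i.e. Exp(2) with mean 1/2, whereas for a Poisson process with a point at the origin every adjacent interval would be Exp(1) with mean 1.

```python
# Sketch of Exercise 13.3.9: the interval between the new origin x* and the
# old origin 0 is min(Exp(1), Exp(1)) ~ Exp(2), so its mean is 1/2, not 1.
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
gap_right = rng.exponential(1.0, n)   # distance from 0 to the nearest point on the right
gap_left = rng.exponential(1.0, n)    # distance from 0 to the nearest point on the left
dist_to_old = np.minimum(gap_left, gap_right)   # |x*|
print(round(dist_to_old.mean(), 3))   # close to 0.5
```

The shortened interval is tied to the length of the interval on the other side of x*, so the two intervals adjacent to the new origin cannot be independently distributed.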
13.3.10 Properties of point maps.
(a) Products. Given two point maps Ψ_1 and Ψ_2, define the product map Ψ_2 ◦ Ψ_1 as below (13.3.18). Give a proof or counterexample to decide whether the product is (i) associative, and (ii) commutative. [Hint: Mecke (1975, §2) asserts that it is associative and distributive but not commutative.] If Ψ_1 and Ψ_2 are bijective, check that Ψ_2 ◦ Ψ_1 is bijective. When Ψ_1 is bijective, can Ψ_2 ◦ Ψ_1 be bijective without Ψ_2 being bijective?
(b) Inverse. The inverse Ψ^{-1} of a bijective point map is well defined [see further below (13.3.18)]. More generally, given a point map Ψ, define Φ: N_X^{#*} × X → X, written Φ(N, z) = x, to be any solution x of z = Ψ(N, x). If there is a unique solution, check whether Φ is (i) covariant, (ii) a point map, (iii) bijective.
(c) When Ψ^{-1} is well defined, show that S^{Ψ^{-1}}(S^Ψ N) = S_0 N = N.
(d) For bijective Ψ, show that Ψ(S^Ψ N, 0) = −Ψ^{-1}(N, 0) [cf. equation (4.12) of Heveling and Last (2005)].
(e) When the cycle length C_Ψ(N) as above (13.3.24) is finite and equal to k say, verify that Ψ^r(N, x) = (Ψ^{-1})^{(k−r)}(N, x), where Ψ^r = Ψ ◦ · · · ◦ Ψ denotes the r-fold product map of Ψ.
13.4. Marked Point Processes, Ergodic Theorems, and Convergence to Equilibrium In this section we further examine the role of the stationary Palm distribution P0 , especially in questions related to ergodicity and convergence to equilibrium. Before doing so we outline briefly the extensions of the previous theory to marked random measures and MPPs. Initially we state the results for general random measures; in the examples and development subsequent to Example 13.4(b) we focus on MPPs. For convenience we generally assume that the ground process of any MPP has finite mean rate, and is simple. We first examine the extension of the Campbell theory results to the marked case. As already mentioned in Section 13.1, application of the Radon– Nikodym derivative approach leads to families of local Palm distributions P(x,κ) indexed by an element (x, κ) in the product space X × K. The Campbell measure itself becomes a marked Campbell measure on Borel sets of the space X × K × Ω, where Ω = M# X ×K in the canonical framework. When the MPP is stationary, the arguments of Section 12.2 show that the local families of Palm measures are invariant under shifts in the location, but the dependence on the mark remains. Thus we obtain a family of stationary Palm distributions P(0,κ) on B(M# X ×K ) which in the point process case can be interpreted as the behaviour of the process conditional on the occurrence of a point at the origin with mark κ. The other important ingredient in the marked case is the stationary mark distribution. This appeared first as the nonnormalized measure ν(·) on K introduced in the discussion of marked random measures above Lemma 12.2.III.
When the ground process has finite mean rate m_g, ν is a totally finite measure, and we can write E[ξ(A × K)] = ℓ(A) ν(K) = m_g ℓ(A) π(K), where π is the stationary mark distribution. In this situation the reduced Campbell measure itself takes the form
\[
\breve{C}_P(d\xi \times d\kappa) = \nu(d\kappa)\, P_{(0,\kappa)}(d\xi) = m_g\, \pi(d\kappa)\, P_{(0,\kappa)}(d\xi). \tag{13.4.1}
\]
The arguments are outlined in Exercise 13.4.1.
In the ergodic theorems of Sections 12.2 and 12.6, in particular Theorem 12.6.VI, the limits of products of random measures averaged over increasing sets are related to the reduced moment measures. Proposition 13.4.I below develops similar results for more general functionals of the random measure, with the limit identified as an integral against C̆_P. The proposition is stated for marked random measures; the special case of MPPs is examined in detail in the discussion around Proposition 13.4.IV. Results for the unmarked case follow by letting the functional g be independent of κ and integrating over κ.
Proposition 13.4.I. Let ξ be a strictly stationary, ergodic, marked random measure on R^d × K, with probability measure P on B(M^#_{R^d×K}), finite ground rate m_g, stationary mark distribution π, and reduced Campbell measure (i.e., Palm measure) C̆_P. Let g(ξ, κ) be a B(M^#_X × K)-measurable nonnegative function on M^#_X × K. Then for any convex averaging sequence {A_n}, as n → ∞,
\[
\frac{1}{\ell(A_n)} \int_{A_n} \int_K g(S_x \xi, \kappa)\, \xi(dx \times d\kappa) \to \int_K \int_{M^\#_X} g(\psi, \kappa)\, \breve{C}_P(d\psi \times d\kappa) \quad \text{P-a.s.}
= m_g \int_K \int_{M^\#_X} g(\psi, \kappa)\, P_{(0,\kappa)}(d\psi)\, \pi(d\kappa). \tag{13.4.2a}
\]
Proof. The result is an extension of the individual ergodic Theorem 12.2.II and the approximation arguments used in deriving Theorem 12.2.IV. As in the latter theorem, we give details mainly for the unmarked case.
Suppose first that g is a nonnegative measurable function on M^#_X satisfying ∫_{M^#_X} g(ξ) C̆_P(dξ) < ∞ P-a.s., and introduce the function
\[
f_\epsilon(\xi) = \int_X g(S_u \xi)\, g_\epsilon(u)\, \xi(du),
\]
where g_ε(·) is a continuous function ‘close’ to a δ-function and integrating to 1. As in the proof of Theorem 12.2.IV, we find
\[
\int_{A_n} f_\epsilon(S_x \xi)\, dx = \int_{A_n} \int_X g_\epsilon(y - x)\, g(S_y \xi)\, \xi(dy)\, dx \ge \int_{A_n^-} \int_X g_\epsilon(y - x)\, g(S_y \xi)\, \xi(dy)\, dx = \int_{A_n^-} f_\epsilon(S_x \xi)\, dx.
\]
From Theorem 12.2.II, using also (13.2.5), we have, as n → ∞,
\[
\frac{1}{\ell(A_n)} \int_{A_n} f_\epsilon(S_x \xi)\, dx \to E[f_\epsilon(\xi)] = \int_X g_\epsilon(u)\, du \int_{M^\#_X} g(\psi)\, \breve{C}_P(d\psi) = \int_{M^\#_X} g(\psi)\, \breve{C}_P(d\psi),
\]
with similar results if A_n is replaced by A_n^+ or A_n^-. Because ε is arbitrary, (13.4.2) follows when the limit is finite P-a.s. If not, then replace g(·) by an increasing sequence of functions {g_r(·)} for which g_r(ξ) ↑ g(ξ) (r → ∞), as, for example, g_r(ξ) = min(g(ξ), r α(ξ)), where α(·) is as in the discussion below (13.2.4). Then for every r,
\[
\liminf_{n\to\infty} \frac{1}{\ell(A_n)} \int_{A_n} g(S_x \xi)\, \xi(dx) \ge \int_{M^\#_X} g_r(\xi)\, \breve{C}_P(d\xi),
\]
and the right-hand side → ∞ as r → ∞.
For the general marked case we start from a nonnegative function g(ξ, κ) satisfying ∫_{M^#_{X×K}} g(ξ, κ) C̆_P(dξ × dκ) < ∞ P-a.s., and introduce the function
\[
f_\epsilon(\xi) = \int_{X \times K} g(S_u \xi, \kappa)\, g_\epsilon(u)\, \xi(du \times d\kappa).
\]
The rest of the proof then proceeds much as for the unmarked case.
Note that for g(ξ) a function of the realization ξ only, (13.4.1) yields
\[
\frac{1}{\ell(A_n)} \int_{A_n} g(S_x \xi)\, \xi(dx) \to \int_K \int_{M^\#_X} g(\psi)\, \breve{C}_P(d\psi \times d\kappa) \quad \text{P-a.s.}
= m_g \int_{M^\#_X} g(\psi)\, \overline{P}_0(d\psi), \tag{13.4.2b}
\]
where P̄_0 is not the Palm distribution of the ground process, but rather the averaged form ∫_K P_{(0,κ)}(·) π(dκ) ≡ P̄_0(·). The difference arises because the realizations ξ here are for a marked process, whereas the realizations of the ground process are unmarked. In the point process case the measure P̄_0 can be interpreted as the Palm distribution dependent on the occurrence of an arbitrary point (i.e., a point with an arbitrary mark) at the origin.
The example below illustrates how marks can affect ergodic limits in some simple cases.
Example 13.4(a) Ergodic limits for processes with independent and unpredictable marks. Consider first the case of a stationary marked Poisson process with nonnegative marks. Suppose the underlying Poisson process has intensity µ and the marks have distribution function F(·). Because of the total lack of memory, the Palm distributions P_{(0,κ)} are independent of κ and all reduce to the distribution of the original marked Poisson process, with m_g = µ and
ν(dx) = µ π(dx) = µ dF(x). Let the function g be the indicator function of the event Γ that T_M > τ, where T_M is the time from the origin to the first point of the process with mark greater than M. Then the ergodic limit on the right-hand side of (13.4.2) becomes
\[
\mu \int_K P_{(0,\kappa)}(\Gamma)\, \pi(d\kappa) = \mu \exp\{-\mu[1 - F(M)]\tau\}. \tag{13.4.3}
\]
Note that the function g here does not depend on the mark κ at the origin. Also, because, in approaching the ergodic limit, averages are taken over all points of the process, irrespective of the mark, the constant preceding the exponential in (13.4.3) is µ, and not µ[1 − F(M)]. The latter rate could arise in situations that allow g to depend explicitly on κ. Suppose, for example, we required the mark at the origin to be greater than M before a contribution to Γ could be counted, so that g(ξ, κ) = I_Γ(ξ) I_{(M,∞)}(κ); the limit on the right-hand side of (13.4.3) then becomes µ[1 − F(M)] exp{−µ[1 − F(M)]τ}. In this case we are just looking at the time intervals between points where the mark exceeds M, and the limit has the form we would expect from a Poisson process with rate µ[1 − F(M)].
The situation changes significantly if we allow extensions to processes with unpredictable marks. For example, consider a process of independent exponential intervals, in which the length of the interval following a point with mark κ is exponentially distributed with mean κ. In this case the ground process is a renewal process in which successive intervals are i.i.d. with common d.f. given by the mixture distribution
\[
G(x) = 1 - \int_0^\infty e^{-x/\kappa}\, dF(\kappa),
\]
with mean ∫_0^∞ κ F(dκ), which we again denote by 1/µ. The Palm distribution P_{(0,κ)} now depends crucially on the mark κ at the origin, inasmuch as the time to the next point of the process (i.e., the first interval X_1) is exponential with mean κ, and the remaining intervals X_2, X_3, . . . are i.i.d. with d.f. G. With no other constraints, and the same event Γ as before, P_{(0,κ)}(Γ) can be evaluated as the sum of the probabilities
\[
\Pr\{X_1 > \tau,\ \kappa_1 > M\} + \Pr\{X_1 < \tau,\ X_1 + X_2 > \tau,\ \kappa_1 < M,\ \kappa_2 > M\} + \cdots,
\]
where κ_1, κ_2, . . . are the marks of the successive points following the origin, and the special distribution of X_1 must be observed. If we look at the ergodic limit, still with g = I_Γ, the dependence on the value of the initial mark is lost. If the further constraint is added that the initial mark κ must exceed M before a contribution to Γ is counted, the initial factor µ on the right-hand side of (13.4.3) must first be multiplied by the rate factor 1 − F(M), as in the previous case, but the calculation of P_{(0,κ)}(Γ) must be modified to allow for the fact that the distribution of X_1 is now constrained by the requirement κ > M. Some further details and examples are given in Exercise 13.4.2.
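The first ergodic limit above can be sketched by direct simulation. The sketch below is illustrative only (the Uniform(0,1) mark distribution and the parameter values are assumptions, not from the text): it counts, per unit time, the points that are followed by a gap of length greater than τ before the next point with mark exceeding M, and compares the count with the limit in (13.4.3).

```python
# Sketch: Monte Carlo illustration of (13.4.3).  For a rate-mu Poisson process
# with i.i.d. Uniform(0,1) marks (so F(M) = M), the points satisfying T_M > tau
# accumulate at rate mu * exp(-mu * (1 - F(M)) * tau).
import numpy as np

rng = np.random.default_rng(11)
mu, M, tau, T = 2.0, 0.8, 1.0, 200_000.0

n = rng.poisson(mu * T)
x = np.sort(rng.uniform(0.0, T, n))        # point locations
kappa = rng.uniform(0.0, 1.0, n)           # marks
big = x[kappa > M]                         # locations of points with mark > M

# Time from each point to the next point with a large mark (the point itself
# excluded, matching the Palm interpretation for a Poisson process).
idx = np.searchsorted(big, x, side='right')
t_M = np.where(idx < big.size, big[np.minimum(idx, big.size - 1)] - x, np.inf)

keep = x < T - tau                         # avoid censoring at the right edge
rate_estimate = (t_M[keep] > tau).sum() / (T - tau)
target = mu * np.exp(-mu * (1.0 - M) * tau)
assert abs(rate_estimate - target) < 0.03
```

Restricting g to points whose own mark exceeds M (the second case discussed above) amounts to intersecting with `kappa > M`, which replaces the leading factor µ by µ[1 − F(M)].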
Higher moment results analogous to Proposition 13.2.VI can be developed for the marked case as special cases of the above proposition, but are relatively more complex. Reducing the second moment measure leads to a measure on X × K × K which can also be described as a first-order moment measure of the Palm distribution, taking the form M̊_1(du × dκ_1 × dκ_2) = m_g^{-1} M̆_2(du × dκ_1 × dκ_2). The relation here is between an initial point of the process taken as origin and a second point of the process, (0, κ_1) and (u, κ_2) say. If we standardize the second-order reduced measure by dividing by the reduced second moment measure for the ground process, we obtain a bivariate distribution π_2(dκ_1 × dκ_2 | u) for the marks at 0, u respectively, conditioned by the occurrence of points at 0 and at u (see Lemma 8.3.III). Then we can write
\[
\mathring{M}_1(du \times d\kappa_1 \times d\kappa_2) = \mathring{M}_1^g(du)\, \pi_2(d\kappa_1 \times d\kappa_2 \mid u), \tag{13.4.4}
\]
where M̊_1^g is the first moment measure for the Palm distribution of the ground process. As noted in the discussion of Lemma 8.3.III and Example 8.3(e), the bivariate distribution π_2 need not have marginals which reduce to the stationary distribution π; indeed, the distribution need not even be symmetric (see Exercise 13.4.3). The reason for the discrepancy is that we are not merely conditioning on a point with a given mark at the origin, but on the occurrence of two points, one at a specified distance from the other, and this additional conditioning alters the distributions in general.
Next we formulate extensions of the last proposition to the nonergodic case; these are summarized in the next two results, for which proofs are outlined in Exercise 13.4.5. To avoid notational confusion, we write ω for the element of the probability space even if this is the canonical space, and E(· | I)(ω) for conditional expectations with respect to the σ-algebra I of invariant events.
Lemma 13.4.II.
Let P be the distribution of a stationary marked random measure on R^d × K, C̆_P its reduced (marked) Campbell measure, and I the σ-algebra of invariant events under shifts in X = R^d. Then there exists an invariant random measure ζ(·, · | ω), defined on sets in B(M^#_{X×K}), such that for nonnegative and C̆_P-integrable functions g(x, κ, ξ),
\[
E\left[\int_{X \times K} g(x, \kappa, S_x \xi)\, \xi(dx \times d\kappa) \,\Big|\, I\right](\omega) = \int_X dx \int_{K \times M^\#_X} g(x, \kappa, \psi)\, \zeta(d\psi \times d\kappa \mid \omega), \tag{13.4.5a}
\]
with
\[
E[\zeta(d\kappa \times d\psi)] = \breve{C}_P(d\kappa \times d\psi) = \nu(d\kappa)\, P_{(0,\kappa)}(d\psi). \tag{13.4.5b}
\]
In particular, (13.4.5a) with g = I_{A×K×Γ}, A ∈ B_X, K ∈ B_K, Γ ∈ B(M^#_{X×K}) yields
\[
\zeta(K \times \Gamma \mid \omega) = \frac{1}{\ell(A)}\, E\left[\int_A I_\Gamma(S_x \xi)\, \xi(dx, K) \,\Big|\, I\right](\omega) \qquad (\ell(A) > 0).
\]
The next result establishes convergence of the sample path averages to the invariant random measure ζ. Note that ζ here is a refinement of the random measure ψ of Lemma 12.2.III. Both ζ and ψ are a.s. totally finite with random total mass Y as defined around (12.2.10). The measure ψ is the marginal measure of ζ on K after integrating out ξ, as in
\[
\psi(K) = \int_{M^\#_X} \zeta(K \times d\xi) \qquad (K \in \mathcal{B}_K;\ Y = \psi(K)).
\]
Theorem 13.4.III. Let ξ be a strictly stationary marked random measure on R^d × K, ζ the invariant random measure defined in Lemma 13.4.II, and with X = R^d, let h(·, ·) be a B(M^#_{X×K})-measurable nonnegative or C̆_P-integrable function on M^#_X × K. Then for any convex averaging sequence {A_n} and n → ∞,
\[
\frac{1}{\ell(A_n)} \int_{A_n \times K} h(S_x \xi, \kappa)\, \xi(dx \times d\kappa, \omega) \to \int_{M^\#_X \times K} h(\psi, \kappa)\, \zeta(d\psi \times d\kappa \mid \omega) \quad \text{P-a.s.} \tag{13.4.6a}
\]
In particular, for h(ψ, κ) = I_{Γ×K}(ψ, κ) with Γ ∈ B(M^#_{X×K}) and K ∈ B_K,
\[
\frac{1}{\ell(A_n)} \int_{A_n \times K} I_{\Gamma \times K}(S_x \xi, \kappa)\, \xi(dx \times d\kappa, \omega) \to \zeta(\Gamma \times K \mid \omega) \quad \text{P-a.s.} \tag{13.4.6b}
\]
Notice that the random measure ζ(·) is associated with the reduced Campbell measure (i.e., the Palm measure) rather than the Palm distribution. Thus, in combining limits over different ergodic components, it is the conditional Palm measures, rather than the normalized Palm distributions, that combine linearly according to (13.4.5b). In considering ergodic results, it may seem more natural to combine the normalized limits in a linear manner, as suggested by Sigman (1995); this leads to the ‘alternative Palm distribution’ described in Exercise 13.4.6, where some consequences and elementary results are given. The next example illustrates the point.
Example 13.4(b) A mixed Poisson process. Consider two stationary Poisson processes on R with parameters λ, µ (λ ≠ µ), selected with probabilities p and q = 1 − p, respectively. The invariant σ-field contains two nontrivial events J_1, J_2 say, corresponding to the choice of the λ and µ processes, respectively. Let Γ = {N(0, a] = 0} for some fixed a > 0. Then S_x Γ = {N(x, x + a] = 0}, and E[I_Γ(ξ)] = p e^{-λa} + q e^{-µa}, whereas
ζ(Γ, ω) = λe−λa IJ1 (ω) + µe−µa IJ2 (ω),
in which the indicator variables have multipliers representing the average rate of occurrence of points followed by an empty interval of length a. The measure C˘P (Γ) represents the overall average rate of occurrence of such points, weighted according to the probabilities of the two components, and so it is
given by
\[
\breve{C}_P(\Gamma) = p\lambda e^{-\lambda a} + q\mu e^{-\mu a}, \tag{13.4.7a}
\]
corresponding to (13.4.5a). The overall Palm probability of Γ equals
\[
\big(p\lambda e^{-\lambda a} + q\mu e^{-\mu a}\big) \big/ \big(p\lambda + q\mu\big), \tag{13.4.7b}
\]
the denominator representing the overall rate of occurrence of points. It can be interpreted as the limit of the conditional probability that the next point will occur after a lapse of time a, given that a point occurs in a small interval about the origin. This expression is to be contrasted with the situation for the Palm distributions: the Palm probability of Γ on J_1 is e^{-λa} and on J_2 it is e^{-µa}, but the overall Palm probability of Γ is not p e^{-λa} + q e^{-µa} as might have been expected. See Exercise 13.4.6 for the latter probability.
For simple ergodic point processes, the previous theorems, coupled with the interpretation of the Palm distribution as the distribution of the process at an ‘arbitrary point’, lead to a circle of important results on convergence to equilibrium. Specifically, starting from an ‘arbitrary point’ (i.e., with the Palm distribution for a simple point process) and translating through some x ∈ R^d, we seek conditions under which the translated distributions converge to the stationary distribution as x → ∞. Dually, starting from an ‘arbitrary location’ (i.e., the stationary distribution) and observing the process at the nth point nearest to that location, we seek conditions under which the distribution of the process, relative to that nth point as origin, converges to the Palm distribution as n → ∞.
To approach these ideas for MPPs, we first establish some relevant notation. For MPPs, the Palm measure P_{(0,κ)}(·) is supported by the subspace, N^#_{(0,κ)} say, of marked counting measures having a point with mark κ at the origin. The averaged form,
\[
\overline{P}_0(\cdot) = \int_K P_{(0,\kappa)}(\cdot)\, \pi(d\kappa), \tag{13.4.8}
\]
which we call the mean Palm distribution, occurs frequently in ergodic limits. It can be interpreted as a measure on the subspace N_0^{K#} = ∪_{κ∈K} N^#_{(0,κ)} of marked counting measures on R^d with a point at the origin whose mark κ ∈ K there is unspecified. We use N^K to denote a generic marked counting measure with marks in K, N_g^K to denote its associated ground counting measure, N_0^K to denote a marked counting measure having a point with unspecified mark from K at the origin, and N_{(0,κ)} to denote a marked counting measure with mark κ at the origin.
Using this notation, inversion theorems for MPPs on R^d corresponding to the results in Section 13.3 can be developed. Thus (13.3.1) can be extended to MPPs on R^d: for bounded, nonnegative functions h(·) of N_0^K,
\[
E_P[h(N_0^K)] = \int_K E_{P_{(0,\kappa)}}\big[h(N_{(0,\kappa)})\big]\, \pi(d\kappa) = \frac{1}{m_g}\, E_P\Big[\sum_{i:\, x_i \in U^d} h\big(S_{x_i} N^K\big)\Big], \tag{13.4.9}
\]
where {x_i ≡ x_i(N^K)} denotes some measurable enumeration of the points of the realization N^K. The analogue of (13.3.2) for R^d is (13.3.9b), which for MPPs becomes, for bounded nonnegative functions g(·) of N^K,
\[
E_P[g(N^K)] = m_g \int_K E_{P_{(0,\kappa)}}\left[\int_{V_0(N_{(0,\kappa)})} g[S_x N_{(0,\kappa)}]\, dx\right] \pi(d\kappa) = m_g\, E_{\overline{P}_0}\left[\int_{V_0(N_0^K)} g[S_x N_0^K]\, dx\right], \tag{13.4.10}
\]
where, for example, V_0(N_0^K) denotes the Voronoi polygon about the origin in R^d formed by the locations (points of the ground process) of the realization N_0^K. Exercise 13.4.1 sketches arguments that justify these inversion formulae.
Using these representations and notations, we obtain the following combination of results from Section 12.2 and Proposition 13.4.I.
Proposition 13.4.IV. Let P be the distribution of a stationary ergodic MPP on R^d with marks in K, finite ground rate m_g, and stationary mark distribution π(·); let P_{(0,κ)} be the associated family of stationary Palm distributions, and P̄_0 the mean Palm distribution defined at (13.4.8). If {A_n} is a convex averaging sequence in R^d satisfying ℓ(A_n)/ℓ(A_{n+1}) → 1 (n → ∞), and h(·) a bounded, measurable, nonnegative function of N^K, then
\[
\frac{1}{\ell(A_n)} \int_{A_n} h\big(S_x N_0^K\big)\, dx \to E_P\big[h(N^K)\big] \quad (n \to \infty) \quad \overline{P}_0\text{-a.s.} \tag{13.4.11}
\]
Furthermore, if x_j^* ≡ x_j^*(N^K) as above (13.3.7) and g(·) is a bounded, measurable, nonnegative function of N_0^K, then
\[
\frac{1}{n} \sum_{j=1}^n g\big[S_{x_j^*}(N^K)\big] \to E_{\overline{P}_0}\big[g(N_0^K)\big] \quad (n \to \infty) \quad \text{P-a.s.} \tag{13.4.12}
\]
Proof. We start by considering (13.4.12). In the present context of MPPs and with {A_n} a sequence of spheres, (13.4.2) of Proposition 13.4.I can be put in the form
\[
\frac{1}{\ell(A_n)} \sum_{j=1}^{N_g^K(A_n)} g\big[S_{x_j^*}(N^K)\big] \to m_g\, E_{\overline{P}_0}\big[g(N_0^K)\big] \quad \text{P-a.s.,} \tag{13.4.13}
\]
because the locations of points in A_n are precisely those of N_g^K with modulus r_j satisfying r_j = ‖x_j^*‖ ≤ radius(A_n). Taking g ≡ 1, it follows that N_g^K(A_n)/ℓ(A_n) → m_g, so in place of (13.4.13) we can write
\[
\frac{1}{N_g^K(A_n)} \sum_{j=1}^{N_g^K(A_n)} g\big(S_{x_j^*} N^K\big) \to E_{\overline{P}_0}\big[g(N_0^K)\big] \quad \text{P-a.s.} \tag{13.4.14}
\]
We can suppose that the {A_n} have been so chosen that ℓ(A_n)/ℓ(A_{n+1}) → 1 as n → ∞, in which case we have
\[
N_g^K(A_n) \big/ N_g^K(A_{n+1}) \to 1 \quad \text{P-a.s.}
\]
For brevity, write g_j = g(S_{x_j^*} N), and N_k = N_g^K(A_{n(k)}), where n(k) is chosen to satisfy x_k^* ∈ (A_{n(k)+1} \setminus A_{n(k)}). Then the inequalities
\[
\frac{1}{N_{k+1}} \sum_{j=1}^{N_k} g_j \le \frac{1}{k} \sum_{j=1}^{k} g_j \le \frac{1}{N_k} \sum_{j=1}^{N_{k+1}} g_j
\]
show that the limit relation (13.4.14) implies (13.4.12). Equation (13.4.11) has the same form as (12.2.6) in the ergodic case, with one notable exception: (12.2.6) holds outside a set of counting measures with P-measure zero, whereas in (13.4.11) the corresponding result is required to hold outside a subset of N0K , which itself has P-measure zero. Thus we cannot immediately draw conclusions from (12.2.6). To get over this difficulty, let Γ be the subset of NX# (X = Rd × K) on which (13.4.11) holds and Γ0 its restriction to N0K . From (13.4.9), taking h as the indicator of Γ0 , 1 P 0 (Γ0 ) = EP mg
K (Ud ) Ng
IΓ0 Sx∗i (N K )
i=1
1 EP IΓ0 Sx (N ) N (dx) = mg d U 1 IΓ (Sx N ) N (dx) P(dN ) = mg Γ U d 0
(13.4.15)
because $\Gamma$ has $\mathcal{P}$-measure one. Now $\Gamma$ is invariant, so $S_x N \in \Gamma$ whenever $N \in \Gamma$. But if $S_x N \in \Gamma_0$, as required by the indicator function $I_{\Gamma_0}$ in the integrand, we must have $N(\{x\}) = 1$, and so the last line in (13.4.15) can be rewritten
\[
\mathcal{P}^0(\Gamma_0) = \frac{1}{m_g} \int_{\Gamma} N(U^d)\, \mathcal{P}(\mathrm{d}N) = \frac{E_{\mathcal{P}}[N(U^d)]}{m_g} = 1.
\]
This establishes (13.4.11), the set of $\mathcal{P}^0$-measure zero being taken as the relative complement of $\Gamma_0$ in $\mathcal{N}_0^K$. In the nonergodic case, the argument leading to (13.4.15) can be extended to show that $\mathcal{P}$ and $\mathcal{P}^0$ induce the same measures on the invariant $\sigma$-algebra $\mathcal{I}$; see Exercise 13.4.7.

Equation (13.4.11) of Proposition 13.4.IV can be interpreted as an a.s. statement about convergence to equilibrium. To put the result in the context of Section 12.5, take expectations with respect to $\mathcal{P}^0$ and $\mathcal{P}$ in equations (13.4.11) and (13.4.12), respectively, and apply the dominated convergence theorem. Then conclude as follows.
13. Palm Theory
Corollary 13.4.V. Suppose that the conditions of Proposition 13.4.IV hold, with $g$ and $h$ bounded. Then
\[
\frac{1}{\ell(A_n)} \int_{A_n} E_{\mathcal{P}^0}[h(S_x N_0)]\,\mathrm{d}x \to E_{\mathcal{P}}[h(N)] \quad (n \to \infty), \tag{13.4.16}
\]
and
\[
\frac{1}{k} \sum_{j=1}^{k} E_{\mathcal{P}}[g(S_{x_j^*} N)] \to E_{\mathcal{P}^0}[g(N_0)] \quad (k \to \infty). \tag{13.4.17}
\]
Notice that although the left-hand side of (13.4.17) converges to the Palm distribution in the ergodic case, this is not so in the nonergodic case, when it converges rather to the 'alternative Palm distribution' of Exercise 13.4.6. Equation (13.4.16) can be interpreted as asserting weak convergence of the measures $\ell(A_n)^{-1} \int_{A_n} S_x \mathcal{P}^0\, \mathrm{d}x$ to the limit measure $\mathcal{P}$. In the one-dimensional context, if we consider intervals $A_n = [0, n)$ and project onto the half-line $\mathbb{R}_+$ (see Exercise 13.4.8 for details), this is nothing other than the weak $(C, 1)$-asymptotic stationarity of the measure $\mathcal{P}^0$. Corollary 12.6.VIII then implies that $\mathcal{P}^0$ is also strongly $(C, 1)$-asymptotically stationary, with the same limit measure $\mathcal{P}$. Because the weak and strong versions coincide, we call such a process simply $(C, 1)$-asymptotically stationary.

To develop a similar interpretation of equation (13.4.17), first recall that, in the one-dimensional case, every stationary MPP can be associated with a stationary marked interval process, and vice versa, where by a marked interval process we mean a sequence of pairs $\{(\tau_i, \kappa_{i-1})\}$, with $\tau_i$ the length of the $i$th interval and $\kappa_{i-1}$ the mark associated with its left-hand endpoint. As in the discussion in Section 13.3, the probability distribution of a marked interval process can be treated either as a distribution on the space of sequences $\{(\tau_i, \kappa_{i-1})\}$, or as a distribution on the subspace $\mathcal{N}_0^{\#}$ of MPPs with a point at the origin. Note that if the sequence of pairs is stationary, then its distribution on $\mathcal{N}_0^{\#}$ is the mean Palm distribution $\mathcal{P}^0$ for some stationary MPP with measure $\mathcal{P}$ related to $\mathcal{P}^0$ by (13.4.9–10). Note also that for a stationary MPP, the initial interval $(0, t_1(N))$ has the stationary interval distribution for the ground process, and the initial mark $\kappa_1$ has a form of stationary mark distribution which is different in general from the stationary mark distribution $\pi$.
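The distinction drawn here between the stationary measure $\mathcal{P}$ and the mean Palm measure $\mathcal{P}^0$ (time averages versus event averages) can be illustrated numerically. The following sketch, an illustration only with an arbitrarily chosen rate, simulates a Poisson process on the line and compares the event-averaged interval length, as in (13.4.12), with the length of the interval covering a uniformly chosen location, as in (13.4.11): the former estimates the Palm mean interval $1/\lambda$, the latter the length-biased mean $2/\lambda$ (the waiting-time paradox).

```python
import numpy as np

rng = np.random.default_rng(0)
lam, T = 2.0, 200_000.0

# Stationary Poisson process on (0, T]: points built from i.i.d. Exp(lam) gaps.
gaps = rng.exponential(1.0 / lam, size=int(1.2 * lam * T))
points = np.cumsum(gaps)
points = points[points < T]
intervals = np.diff(points)

# Event (Palm) average: mean interval length over points, cf. (13.4.12).
event_mean = intervals.mean()                   # estimates 1/lam

# Time (stationary) average: length of the interval covering a uniformly
# chosen location, cf. (13.4.11); this is the length-biased interval.
u = rng.uniform(0.0, points[-1], size=100_000)
idx = np.clip(np.searchsorted(points, u), 1, len(points) - 1)
length_biased_mean = intervals[idx - 1].mean()  # estimates 2/lam

print(event_mean, length_biased_mean)
```

With $\lambda = 2$ the two averages settle near $0.5$ and $1.0$ respectively, so a typical point sees shorter intervals than a typical location does.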
There is also a stationary joint distribution for the initial interval and the initial mark; see Exercise 13.4.9, which also sketches out a proof of the stationarity results summarized above. The concepts of coupling and shift-coupling, as of weak and strong $(C, 1)$-asymptotic stationarity, can be developed for an interval process, whether simple or marked, just as easily as, and in a parallel fashion to, those for a point process. As for point processes, the concepts of weak and strong $(C, 1)$-asymptotic stationarity coincide, and can be referred to simply as $(C, 1)$-asymptotic stationarity. Now (13.4.17) can be interpreted as asserting that, if we start an interval process from the first (or indeed the $k$th) point following
the origin in a stationary MPP, that interval process is $(C, 1)$-asymptotically stationary with limit process equal to the stationary interval process associated with the original MPP.

Sigman (1995) calls a point process $N$ 'event-stationary' when the sequence of intervals initiated by $t_1(N)$ is interval-stationary. In this terminology, the MPP corresponding to $\mathcal{P}^0$ is event-stationary. In general, a stationary point process will not be event-stationary in this sense, but Corollary 13.4.V implies that on $\mathbb{R}_+$, a stationary MPP is $(C, 1)$-asymptotically event-stationary, and conversely that an event-stationary MPP is $(C, 1)$-asymptotically stationary. Finally in this circle of ideas, these last conclusions can be extended to processes that are not themselves stationary, but only asymptotically stationary, as summarized below.

Proposition 13.4.VI. Suppose that the MPP $N^K$ is either stationary or $(C, 1)$-asymptotically stationary with limit measure $\mathcal{P}$. Then the interval process, starting from $t_1(N)$ as origin, is $(C, 1)$-asymptotically stationary with limit measure $\mathcal{P}^0$ associated with $\mathcal{P}$ through (13.4.9–10). Conversely, suppose that an MPP $N_0^K$ defined on $\mathcal{N}_0^{\#}$ represents an interval process which is either stationary or $(C, 1)$-asymptotically stationary, with limit measure $\mathcal{P}^0$. Then $N_0^K$ is $(C, 1)$-asymptotically stationary with limit measure $\mathcal{P}$ associated with $\mathcal{P}^0$ through (13.4.9–10).

Proof. We start from the observation that if two MPPs $N$ and $N'$ shift-couple, then the same is true of the associated interval processes $N_0$ and $N_0'$, started at $t_1(N)$ and $t_1(N')$, respectively, and vice versa. Indeed, if there are stopping times $T$, $T'$ and versions $\tilde N$, $\tilde N'$ of the MPPs such that $\tilde N(t + T)$ and $\tilde N'(t + T')$ are a.s. equal for $t \ge 0$, then the corresponding interval processes, say $\tilde N_0$ and $\tilde N_0'$, are equal after the discrete times $J = \tilde N(T) + 1$ and $J' = \tilde N'(T') + 1$, and so the interval processes shift-couple. Conversely, if the two interval processes, corresponding to MPPs $N_0$, $N_0'$, shift-couple, with coupling times $J$, $J'$ respectively, then $N_0$ and $N_0'$ shift-couple as point processes with coupling times $T = t_J(N)$, $T' = t_{J'}(N')$, respectively.

Suppose now that $N$ is $(C, 1)$-asymptotically stationary with limit process $N'$ having distribution $\mathcal{P}$, so that $N$ and $N'$ shift-couple. Then the corresponding interval processes, which we may associate with point processes $N_0$ and $N_0'$, also shift-couple, so that $N_0$ is $(C, 1)$-asymptotically stationary with limit process $N_0'$. This last process is not itself stationary, but corresponds to the process on the left-hand side of (13.4.16), which by Corollary 13.4.V is $(C, 1)$-asymptotically interval stationary with limit the stationary interval process, $N_0''$ say, associated with the stationary process $N'$; thus $N_0'$ shift-couples to $N_0''$. The transitivity of shift-coupling now implies that, as interval processes, $N_0$ shift-couples to $N_0''$ and is therefore $(C, 1)$-asymptotically stationary with limit process $N_0''$. But we already know that the distribution of a stationary point process is associated with the distribution of the corresponding stationary interval process through equations (13.4.9–10), and so the first statement of the
proposition follows. The second statement follows by an analogous argument with the roles of the point and interval processes reversed.

The last proposition finds applications in queueing theory and related fields, where it provides a starting point for a systematic approach to results on convergence to equilibrium and the forms of the resulting stationary distributions. Accounts are given in Franken et al. (1981), Baccelli and Brémaud (1994), and Sigman (1995), and we do not attempt to repeat the material here. However, to give the flavour of the applications, we indicate how they apply to regenerative processes.

Example 13.4(c) Regenerative processes via embeddings. Call an MPP $N^K$ on $\mathbb{R}_+$, with internal history $\mathcal{H}$, regenerative if the sequence of event times $\{t_i : i \ge 1\}$ includes an embedded renewal process. More precisely, we require the existence of a subsequence $\{\tau_j = t_{i_j} : j = 1, 2, \ldots\}$ (the regeneration points) such that (1) for $j \ge 1$, the intervals $\tau_{j+1} - \tau_j$ form an i.i.d. family with proper d.f. $F(x)$ (and in the sequel, we suppose that this distribution has a finite mean, so that the corresponding renewal process has a stationary version); and (2) between regeneration points, the successive families of marked points
\[
N^{(j)} = \big\{(0, \kappa_{i_j}),\, (t_{1+i_j} - t_{i_j}, \kappa_{1+i_j}),\, (t_{2+i_j} - t_{i_j}, \kappa_{2+i_j}),\, \ldots,\, (t_{i_{j+1}} - t_{i_j}, \kappa_{i_{j+1}})\big\}
\]
are i.i.d. versions of a finite MPP on $\mathcal{X} = [0, \infty) \times \mathcal{K}$.

To bring such processes within the purview of the previous theory, we first regard the sequence $\{(\tau_j, N^{(j)})\}$ as a marked renewal process with i.i.d. marks $N^{(j)}$, treating the latter as random variables on the portmanteau space $\mathcal{X}^{\cup}$ of Chapter 5 [equation (5.3.10)]. If the intervals in the renewal process have finite mean length, the resultant renewal process is stationary, and (having i.i.d. marks) so is the associated marked renewal process.
The first interval may be exceptional (i.e., the renewal process may be a delayed renewal process) but in any case the process is asymptotically stationary, hence $(C, 1)$-asymptotically stationary. The stationary mark distribution is nothing other than the common distribution, say $P_f$, of the finite MPPs $N^{(j)}$. These remarks imply not only that the embedded marked renewal process is asymptotically stationary, but also that the original MPP is asymptotically stationary. The underlying reason is that any bounded functional of the MPP can be represented, in terms of the sequence of renewal times and the i.i.d. sequence $N^{(j)}$, as a bounded functional of the marked renewal process. Hence the original MPP is stationary (expectations of bounded functionals invariant under shifts) if and only if the latter is stationary. It follows also that if the original MPP is started from a regeneration point, it is asymptotically stationary; that is, there is a shift-coupling with the asymptotic form of the original MPP. Moreover, if we take any stopping time $T$, defined in terms of the initial finite process $N^{(0)}$, the process started at $T$ also shift-couples to
the limit process (modify the shift). Thus the original MPP is asymptotically stationary if started from any such $T$, in particular, from any of the points of the MPP in the initial cycle.

A more explicit argument, exhibiting the form of the ergodic limit in terms of the $N^{(j)}$, is outlined by Sigman (1995, Section 2.6). For another approach, use the ideas of Lemma 12.5.III, taking the initial condition to be the history of the current process $N^{(j)}$ up to the time-point chosen as origin. As in other situations of this kind, although these general results are useful for discussing the convergence to stationarity and the existence of stationary versions of the particular process in view, the hard step is the evaluation of the form of the stationary distribution, which in the present case means evaluating the explicit form of the distribution of the i.i.d. components $N^{(j)}$. If, for example, the regenerative process is taken to be the arrival of a customer at an empty queue in an M/G/1 queueing system, and the marks of the original MPP record the number of customers waiting in the queue at the arrival of each new customer, explicit arguments are still needed to evaluate the form of the stationary distribution of the queue size.

Although the results have been stated for regenerative processes, the independence of the cycles has been used only tangentially. The essential points of the argument are the existence of an embedded sequence of points which are $(C, 1)$-asymptotically stationary, and a law of large numbers for the behaviour within cycles. For extensions in this direction see Sigman (1995) for references.

A stronger form of convergence holds when the process is mixing, and we can drop the $(C, 1)$-averaging that stems from the ergodic theorem. For a brief statement of such results, we return to unmarked processes, leaving the reader to formulate extensions.
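The role of the embedded renewal structure in Example 13.4(c) can be illustrated by a small simulation; the sketch below is an illustration only, and the cycle-length and within-cycle distributions are arbitrary choices rather than anything prescribed in the text. By the ergodic limits above (in their simplest renewal-reward form), the long-run rate of points of a regenerative MPP equals the mean number of points per cycle divided by the mean cycle length.

```python
import numpy as np

rng = np.random.default_rng(1)

# Regenerative sketch: i.i.d. Exp(1) cycle lengths (mean 1), and within each
# cycle an independent Poisson(3) number of marked points (standing in for
# the finite MPPs N^(j)).  The long-run rate of points is then
#   E[points per cycle] / E[cycle length] = 3 / 1 = 3.
n_cycles = 200_000
cycle_lengths = rng.exponential(1.0, size=n_cycles)   # tau_{j+1} - tau_j
points_per_cycle = rng.poisson(3.0, size=n_cycles)    # ground size of N^(j)

long_run_rate = points_per_cycle.sum() / cycle_lengths.sum()
print(long_run_rate)   # close to 3
```

The same ratio-of-means structure is what makes the evaluation of the stationary distribution reduce to the distribution of a single cycle.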
Proposition 13.4.VII. Let $\mathcal{P}$ be the distribution of a simple stationary mixing point process on $\mathbb{R}^d$, with finite density, and let $\mathcal{P}_0$ be the corresponding Palm distribution. Then
\[
S_x \mathcal{P}_0 \to \mathcal{P} \quad \text{weakly} \quad (x \to \infty). \tag{13.4.18}
\]
Proof. To establish weak convergence we need to show that, for all bounded continuous $f$ on $\mathcal{M}^{\#}_{\mathcal{X}}$, $E_{\mathcal{P}_0}[f(S_x N_0)] \to E_{\mathcal{P}}[f(N)]$ as $x \to \infty$. Proposition 13.3.V is a convenient starting point for the proof. From that result, for any given $\epsilon > 0$ and a sufficiently small sphere $A_n$,
\[
\big| E_{\mathcal{P}_0}[f(S_x N_0)] - E_{\mathcal{P}}\big[f(S_x S_{x^*(N)} N) \,\big|\, N(A_n) > 0\big] \big| < \tfrac{1}{2}\epsilon \tag{13.4.19}
\]
for each fixed $x \in \mathcal{X}$ [replace $f(\cdot)$ by $f(S_x\,\cdot)$ in (13.3.14)]. Inspection of the proof of (13.3.14) shows that the inequality is uniform in $x$ (see Exercise 13.3.6) because the two critical inequalities used in its proof depend only on
(13.3.11) and $\sup |f(\cdot)|$, both of which are independent of $x$. Consequently, we can fix $n$ from (13.4.19) and proceed by simply evaluating the difference
\[
\big| E_{\mathcal{P}}\big[f(S_x S_{x^*(N)} N) \,\big|\, N(A_n) > 0\big] - E_{\mathcal{P}}[f(N)] \big|
= \frac{\big| E_{\mathcal{P}}\big[f(S_x S_{x^*(N)} N)\, I_{\{N(A_n) > 0\}}(N)\big] - E_{\mathcal{P}}[f(N)]\, E_{\mathcal{P}}\big[I_{\{N(A_n) > 0\}}(N)\big] \big|}{\mathcal{P}\{N(A_n) > 0\}}\,.
\]
Now it is enough to apply the result of Exercise 12.3.5 with
\[
X(N) = f(S_{x^*(N)} N), \qquad Y(N) = I_{\{N(A_n) > 0\}}(N)
\]
to deduce that this difference can also be made less than $\tfrac{1}{2}\epsilon$ by taking $x$ sufficiently large.
To illustrate the close connection between these results and the classical renewal theorems, we prove a result for mixing second-order processes that Delasnerie (1977) attributes to Neveu. It asserts that for large $x$, the reduced second moment measure approximates its form under a Poisson process.

Theorem 13.4.VIII. Let $N$ be a stationary second-order point process in $\mathbb{R}^d$ with density $m$ and reduced second moment measure $\breve M_2(\cdot)$. If $N$ is mixing, then as $x \to \infty$,
\[
S_x \breve M_2(\cdot) \to_w m^2 \ell(\cdot). \tag{13.4.20}
\]

Proof. The formal connection here lies with the representation of the reduced moment measures as moments of the Palm distribution. However, we do not need to call on the Palm theory as such; it is enough to use the definition of the reduced moment measure, which yields, with $b^*(x) = b(-x)$ as in Chapter 12 and functions $a(\cdot)$, $b(\cdot)$ that vanish outside a bounded set,
\[
E\bigg[\int_{\mathcal{X}}\int_{\mathcal{X}} a(u - x)\, b(v)\, N(\mathrm{d}u)\, N(\mathrm{d}v)\bigg]
= \int_{\mathcal{X}} (a * b^*)(v - x)\, \breve M_2(\mathrm{d}v)
= \int_{\mathcal{X}} (a * b^*)(v)\, (S_x \breve M_2)(\mathrm{d}v).
\]
Now when $N$ is mixing, the first expectation converges as $x \to \infty$ to
\[
m^2 \int_{\mathcal{X}} a(u)\, \mathrm{d}u \int_{\mathcal{X}} b(v)\, \mathrm{d}v = m^2 \int_{\mathcal{X}} (a * b^*)(v)\, \mathrm{d}v,
\]
which, by letting $a(\cdot)$ run through bounded continuous functions, ensures the weak convergence of $\big(b * (S_x \breve M_2)\big)(\cdot)$ to $\big(b * m^2 \ell\big)(\cdot)$, from which a standard sandwich argument yields (13.4.20).

This result assumes a more familiar form in the case that $d = 1$ (i.e., $\mathcal{X} = \mathbb{R}$) when expressed in terms of the expectation function $U(\cdot)$ introduced in Theorem 3.5.III. Then, for a second-order stationary simple point process on $\mathbb{R}$ we have
\[
U(x) = 1 + \lim_{h \downarrow 0} E_{\mathcal{P}}\big[N(0, x] \,\big|\, N(-h, 0] > 0\big]
= 1 + E_{\mathcal{P}_0}\big[N(0, x]\big] = 1 + \breve M_2(0, x]\big/m,
\]
leading to the following corollary to Theorem 13.4.VIII.
Corollary 13.4.IX (Generalized Blackwell Theorem). Let $N$ be a simple stationary mixing point process on $\mathbb{R}$ with finite second moment measure. For all $h > 0$,
\[
U(x + h) - U(x) \to mh \quad (x \to \infty).
\]
Because a renewal process with $m < \infty$ has finite second moment measure and is mixing if its lifetime distribution $F$ is nonlattice, this corollary includes the standard version of Blackwell's theorem (Theorem 4.4.I) as a special case. However, the simplicity of the above argument as compared with the intricacies of Chapter 4 is misleading, because what is obscured here is the fact that to prove that a renewal process is mixing, a result close to Blackwell's theorem must be assumed. For this case, therefore, the corollary would become the conclusion of a somewhat circular argument. More generally, there is no very simple relation between mixing of the basic process and mixing of the sequence of intervals, in contrast to the case of ergodicity, for which the concepts coincide (see Exercise 13.4.7). For the Wold process, similar questions regarding the lattice structure have to be overcome as in Chapter 4; additionally, the function $U(\cdot)$ refers to expectations when the initial interval has the stationary distribution. Consequently, further extensions are needed to cover the case of a process starting with an arbitrary distribution for the initial interval.
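The Blackwell increment can be checked numerically in the renewal special case. The sketch below is an illustration under arbitrary parameter choices (Gamma(2, 1) lifetimes, so $\mu = 2$ and $m = 1/2$): it estimates $U(x + h) - U(x)$ for large $x$ by counting renewal epochs in $(x, x + h]$ over many replications started from a renewal at the origin, as in the Palm version; the increment should approach $mh$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Renewal process with Gamma(2, 1) lifetimes: mean mu = 2, so m = 1/2.
# Estimate U(x + h) - U(x) = E[# renewals in (x, x + h]] for large x.
mu, m = 2.0, 0.5
x, h = 60.0, 4.0
reps = 20_000

# 60 lifetimes per replication: their sum has mean 120 >> x + h = 64,
# so the window (x, x + h] is covered with overwhelming probability.
lifetimes = rng.gamma(2.0, 1.0, size=(reps, 60))
epochs = lifetimes.cumsum(axis=1)              # renewal epochs t_1 < t_2 < ...
counts = ((epochs > x) & (epochs <= x + h)).sum()
increment = counts / reps
print(increment)                               # approaches m * h = 2.0
```

For these lifetimes the renewal density settles to $m$ exponentially fast, so $x = 60$ is already far into the Blackwell regime.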
Exercises and Complements to Section 13.4

13.4.1 Reduced Campbell measure for marked random measures and MPPs.
(a) The marked Campbell measure on $\mathcal{X} \times \mathcal{K} \times \mathcal{M}^{\#}_{\mathcal{X} \times \mathcal{K}}$ is defined for $A \in \mathcal{B}_{\mathcal{X}}$, $K \in \mathcal{B}_{\mathcal{K}}$, and $U \in \mathcal{B}(\mathcal{M}^{\#}_{\mathcal{X} \times \mathcal{K}})$ by
\[
C_{\mathcal{P}}(A \times K \times U) = \int_U \xi(A \times K)\, \mathcal{P}(\mathrm{d}\xi).
\]
Show that if $\mathcal{P}$ is stationary under shifts on $\mathcal{X} = \mathbb{R}^d$, $C_{\mathcal{P}}$ is invariant under the actions of the transformations $\Theta_u(x, \kappa, \xi) = (x - u, \kappa, S_u \xi)$.
(b) Find a mapping on the triple product space analogous to $D$ in (13.2.4), and show that when $\mathcal{P}$ corresponds to a stationary random marked measure $\xi$, the marked Campbell measure factorizes into a product of Lebesgue measure on $\mathcal{X}$ and a reduced Campbell measure $\breve C_{\mathcal{P}}(\mathrm{d}\kappa \times \mathrm{d}\psi)$ defined by the following extension of (13.2.6):
\[
E\bigg[\int_{\mathcal{X} \times \mathcal{K}} g(x, \kappa, S_x \xi)\, \xi(\mathrm{d}x \times \mathrm{d}\kappa)\bigg]
= \int_{\mathcal{X}} \mathrm{d}x \int_{\mathcal{K} \times \mathcal{M}^{\#}_{\mathcal{X} \times \mathcal{K}}} g(x, \kappa, \psi)\, \breve C_{\mathcal{P}}(\mathrm{d}\kappa \times \mathrm{d}\psi).
\]
(c) If the ground process has finite mean rate $m_g$, with associated stationary mark distribution $\pi$, show (using a disintegration argument) that the right-hand side of the above equation can be written
\[
m_g \int_{\mathcal{X}} \mathrm{d}x \int_{\mathcal{K}} \pi(\mathrm{d}\kappa) \int_{\mathcal{M}^{\#}_{\mathcal{X} \times \mathcal{K}}} g(x, \kappa, \psi)\, \mathcal{P}_{(0,\kappa)}(\mathrm{d}\psi),
\]
where $\{\mathcal{P}_{(0,\kappa)}\}$ is a family of stationary Palm distributions conditional on the mark $\kappa$.
13.4.2 MPP with exponentially distributed intervals. Consider the second situation in Example 13.4(a), where successive intervals are exponentially distributed with mean length equal to the mark $\kappa$ of the point initiating the interval. Find a series expansion for $\mathcal{P}_{(0,\kappa)}(\Gamma)$ as a function of $M$ and $\kappa$. Is the averaged form $\mathcal{P}_0^*(\Gamma)$ smaller or larger than its value in the first situation, where the intervals are i.i.d. exponential with the same mean $\mu$ as in the mixed case? Does a version of the renewal theorem hold for the mixed process?

13.4.3 A marked Gauss–Poisson process. Consider a stationary Gauss–Poisson process [see, e.g., Exercise 7.1.9(a)], in which the parent point has mark $A$ and the offspring point, following it after a distance $u$ with d.f. $\mathrm{d}F(u) = f(u)\,\mathrm{d}u$ $(u \ge 0)$, has mark $a$. If parent points occur as a stationary Poisson process with rate $\lambda$, find the reduced second-order moment densities $m^{(2)}_{A,a}(u)$, $m^{(2)}_{a,A}(u)$, $m^{(2)}_{A,A}(u)$, $m^{(2)}_{a,a}(u)$, and the bivariate $(2 \times 2)$ distribution $\pi_2(\kappa_1, \kappa_2)$. Show in particular that $m^{(2)}_{A,a}(u) = m^{(2)}_{a,A}(-u) \ne m^{(2)}_{a,A}(u)$ in general.

13.4.4 Markov renewal process. Let $X(t)$ be a stationary semi-Markov process on countable state space $\mathcal{X}$, with counting function $N_g(\cdot)$, as in Example 10.3(a), and
\[
H_g(x) = E\big[N_g[0, x] \mid \text{transition in } X(t) \text{ at } t = 0\big] = E_{\mathcal{P}^0}\big(N_g[0, x]\big) = \sum_i \pi_i \sum_j H_{ij}(x)
\]
[cf. (13.4.8–9) and Example 10.3(a) for notation]. Use
\[
\pi_k = \sum_i \pi_i \sum_j \int_0^x H_{ij}(\mathrm{d}u) \int_{x-u}^{\infty} G_{jk}(\mathrm{d}z) \quad (\text{all } x > 0)
\]
to show that $H_g(\cdot)$ is subadditive; that is, $H_g(x + y) \le H_g(x) + H_g(y)$ for $x, y > 0$. [Hint: Daley, Rolski, and Vesilo (2007) gives a proof.]
13.4.5 Ergodic properties of nonergodic MPPs. Mimic the steps in the proofs of Lemma 12.2.VI and Theorem 13.2.III to establish Lemma 13.4.II and Theorem 13.4.III.

13.4.6 Alternative version of Palm distribution [cf. Sigman (1995, Section 4.4)].
(i) Establish a form of Corollary 13.4.V for the nonergodic case, and hence verify the comment below the corollary.
(ii) Apply this form to Example 13.4(b) and verify the comment at the end of the example.
(iii) Find a Radon–Nikodym interpretation of this distribution [Sigman p. 65 quotes Nieuwenhuis (1994) and Thorisson (1995)].
[Hint: The basic link comes from the r.v. $Y$ controlling the invariant distribution. For (i), use Exercise 13.4.4 to express the result of the corollary in terms of the invariant random measure. In particular, argue from (13.4.11) (in the unmarked case) that
\[
\frac{1}{k} \sum_{j=1}^{k} g(S_{t_j} N) \to \frac{1}{Y} \int g(N')\, \zeta(\mathrm{d}N' \mid \mathcal{I}),
\]
where we have written $\zeta(\cdot \mid \mathcal{I})$ to emphasize the dependence on the invariant $\sigma$-algebra. Now, following Sigman, write
\[
\frac{1}{k} \sum_{j=1}^{k} g(S_{t_j} N) \to E[g(N_0) \mid \mathcal{I}] = \int g(N_0)\, \mathcal{P}_0(\mathrm{d}N \mid \mathcal{I})
\]
for an ergodic measure relative to discrete shifts. Deduce that $E[\zeta(\mathrm{d}N \mid \mathcal{I})] = (1/m)\, E[Y \mathcal{P}_0(\mathrm{d}N \mid \mathcal{I})]$.]
13.4.7 Equivalence of invariant $\sigma$-algebras for $\mathcal{P}$ and $\mathcal{P}_0$. If $\mathcal{P}$ is stationary and $\Gamma$ is an invariant set in $\mathcal{I}$ with $\mathcal{P}(\Gamma) = p$, then $\Gamma_0 = \Gamma \cap \mathcal{N}^{\#}(\mathcal{X}_0)$ is invariant in $\mathcal{I}_0$ with $\mathcal{P}_0(\Gamma_0) = p$. Deduce that a stationary point process $N$ is ergodic if and only if the associated Palm distribution $\mathcal{P}_0$ is ergodic (e.g., when $\mathcal{X} = \mathbb{R}$, $N$ is ergodic if and only if the corresponding interval process is ergodic). [Hint: Replace the expectation in (13.4.15) by the conditional expectation with respect to $\mathcal{I}$; then the integral on the right-hand side of (13.4.15) is zero outside $\Gamma$. A similar argument holds inside $\Gamma$. To prove the converse, if $\Gamma_0 \in \mathcal{I}_0$, first put $\Gamma = \{N : S_{x_1(N)} N \in \Gamma_0\}$, then use (13.2.9) with $h$ the indicator of $\Gamma$. Put otherwise, ergodicity is equivalent to the requirement that all invariant sets have probability 0 or 1; now use Lemma 13.4.II, except that a converse to the lemma is needed: $\mathcal{P}_0(\Gamma_0) = 1$ implies the existence of $\Gamma$ in $\mathcal{B}(\mathcal{N}^{\#}_{\mathcal{X}})$ such that $\Gamma$ is invariant with $\mathcal{P}(\Gamma) = 1$ and $\Gamma_0 = \Gamma \cap \mathcal{N}_0^*$.]

13.4.8 Let $\mathcal{P}$ be the distribution of a stationary point process on $\mathbb{R}_+$ for which $\mathcal{P}\{N(\mathbb{R}) = \infty\} = 1$. Restate (13.4.16–17) in terms of integrals on $(0, t_n]$ with $t_n \to \infty$, where $t_n$ is the $n$th point of the process in $\mathbb{R}_+$.

13.4.9 Stationary marked interval process. Consider a stationary sequence of pairs $\{(\tau_i, \kappa_{i-1})\}$, as below (13.4.17), with stationary mean ground rate $m_g$. Identify its distribution with the distribution of an MPP with a point at the origin with mark distributed according to the stationary distribution of marks in the original sequence. Then find a one-dimensional version of (13.4.10) to define the distribution of an MPP $\mathcal{P}$ and verify that it is stationary. Find the distribution of the time to the first point occurring after the origin, and of the mark distribution of that point. Construct an example to illustrate that this may differ from the stationary mark distribution.
[Hint: Condition on the value of the mark of the point at the origin, which by assumption has the stationary mark distribution. For a counterexample, consider, e.g., an alternating renewal process. See also Sigman (1995, Section 3.4).]

13.4.10 Let the multivariate point process $N(\cdot) = (N_0(t), N_1(t), \ldots, N_k(t))$ in $t > 0$ be such that the successive points $t_{00} \le 0 < t_{01} < t_{02} < \cdots$ of the component process $N_0$ are regeneration epochs for $N(\cdot)$ in the sense that, for $u_i \ge 0$, the conditional distributions of $N(t + u_i)$ given $t_{0j} = t$ for some $j \ge 1$ are independent of $t$, and the interval lengths $\{t_{0,j+1} - t_{0j}\}$ are i.i.d. r.v.s with density function $f(\cdot)$. Define
\[
H(t) = E\big(N_1(t) \mid t_{00} = 0,\, t_{01} = t\big), \qquad V(t) = \operatorname{var}\big(N_1(t) \mid t_{00} = 0,\, t_{01} = t\big).
\]
Assuming the moments are finite, show that
\[
\lim_{t \to \infty} \frac{\operatorname{var}(N_1(t))}{E(N_1(t))} \;\ge\; \beta \equiv \frac{\int_0^{\infty} V(t) f(t)\, \mathrm{d}t}{\int_0^{\infty} H(t) f(t)\, \mathrm{d}t},
\]
with equality holding if and only if $H(t)/t$ is constant a.e. on the support of $f(\cdot)$. [See Berman (1978) for this and other results concerning such multivariate point processes that are regenerative in the sense of Smith (1955). Results concerning ergodicity and Palm distributions have been extended from regenerative processes to the context of marked point processes; there are details and several examples in Franken et al. (1981).]
13.5. Cluster Iterates

We turn next to the operation of iterated cluster formation, already mooted in Section 11.4, but postponed then because it makes essential use of Palm theory concepts. We consider clusters with mean size unity, but exclude the case $P\{N(\mathcal{X} \mid x) = 1\} = 1$ (all $x$) inasmuch as this is the case of random translations already considered in Section 11.4. Standard results on branching processes [e.g., Harris (1963, Chapter 1)] imply that for critical branching processes (mean cluster size $m = 1$) the offspring from a given ancestor eventually become extinct, or in other words the iterated clusters eventually collapse to the zero measure. In such circumstances it may seem surprising that stable limit behaviour can occur. The explanation is to be found in the infinite character of the initial distribution, which allows local depletion to be perpetually replenished by immigration from more successful clusters in distant parts of the state space. The higher the dimension of the space, the greater the opportunities for such replenishment become, so that stable behaviour is the norm for $d \ge 3$, whereas it is the exception for $d = 1$ or even $d = 2$. An earlier account may be found in Liemant, Matthes, and Wakolbinger (1988); Wakolbinger has several subsequent publications with various co-workers.

The nature of the limiting behaviour is most easily understood by studying first the situation where the initial process is Poisson, with mean density equal to unity say. Any limit, although no longer Poisson, is still infinitely divisible, so the discussion can be phrased in terms of the convergence of the associated KLM measures $\tilde Q_n$ to their limit (Proposition 11.2.II). Because the successive Poisson cluster processes formed by iterating the clustering operation are also stationary, the discussion can be further reduced to the study of the Palm measures $\tilde Q_0^{(n)}$ associated with these KLM measures [see Example 13.2(c), especially the discussion leading to Proposition 13.2.IX, and Exercise 13.2.10]. The essential question can now be phrased in terms of the Palm measures $\{\tilde Q_0^{(n)}\}$, namely, find conditions on the cluster mechanism such that $\tilde Q_0^{(n)}$ converges to some boundedly finite limit measure, $\tilde Q_0^{(\infty)}$ say.

The cluster mechanism itself can be conveniently specified in terms of
(A) a distribution $\{\pi_k : k \ge 0\}$ for the size of the cluster; and
(B) a family of symmetric d.f.s $P_k(\mathrm{d}x_1 \times \cdots \times \mathrm{d}x_k)$ $(k \ge 1)$ specifying the locations of the cluster members relative to the cluster centre at the origin.
The assumptions $m \equiv E\{N(\mathcal{X} \mid x)\} = 1$ and $P\{N(\mathcal{X} \mid x) = 1\} < 1$ imply both
\[
1 = \sum_{k=0}^{\infty} \pi_k = \sum_{k=0}^{\infty} k \pi_k \quad \text{and} \quad \pi_0 > 0.
\]
Also, because by assumption the cluster mechanisms are homogeneous in space, the locations relative to a cluster centre not at 0 are specified by the appropriately shifted versions of the Pk (·).
The KLM measure corresponding to the Poisson cluster process formed at the first stage of clustering is concentrated on the set of totally finite counting measures and allocates mass $\pi_k$ to those trajectories containing just $k$ points. Because $m = 1$, it may be considered as a probability distribution on $\mathcal{N}_0(\mathcal{X})$ in its own right. As in Example 13.2(c), the associated Palm measure is then defined in terms of a modified cluster structure, in which the cluster size is distributed according to $\{\tilde\pi_k\} = \{k \pi_k : k \ge 0\}$ (note that $\tilde\pi_0 = 0$), and the locations of the cluster members are specified by placing one cluster member at the origin and distributing the remaining $k - 1$ points about it according to the symmetrized measures
\[
\tilde P_{k-1}(A_2 \times \cdots \times A_k) = \int_{\mathcal{X}} P_k\big(\mathrm{d}y \times (y + A_2) \times \cdots \times (y + A_k)\big).
\]
Note that the Palm clusters considered here differ from the clusters arising in the regular representation (Propositions 12.1.V and 12.4.II) only through the relative weightings given to the different cluster sizes. Note also that the intensity measure for the underlying cluster process, given by
\[
\rho(\mathrm{d}x) = \sum_{k=1}^{\infty} k \pi_k \int_{\mathcal{X}^{(k-1)}} P_k(\mathrm{d}x \times \mathrm{d}y_2 \times \cdots \times \mathrm{d}y_k), \tag{13.5.1}
\]
is here a probability measure on $\mathcal{X}$, whereas the intensity measure for the Palm cluster process is given by
\[
\tilde\rho(\mathrm{d}x) = \delta_0(\mathrm{d}x) + \sum_{k=2}^{\infty} k(k-1) \pi_k \int_{\mathcal{X}^{(k-1)}} P_k\big((y_2 + \mathrm{d}x) \times \mathrm{d}y_2 \times \cdots \times \mathrm{d}y_k\big).
\]
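The size-biasing $\{\tilde\pi_k\} = \{k\pi_k\}$ that defines the Palm cluster can be seen directly in simulation. The sketch below uses an arbitrary critical example, $\pi_0 = \tfrac14$, $\pi_1 = \tfrac12$, $\pi_2 = \tfrac14$ (so $\sum_k k\pi_k = 1$): picking a point uniformly at random among all points of many independent clusters, the size of the cluster containing it is distributed as $\{k\pi_k\}$, here $P(1) = P(2) = \tfrac12$ and no mass at $0$.

```python
import numpy as np

rng = np.random.default_rng(3)

# Critical cluster-size law: pi_0 = 1/4, pi_1 = 1/2, pi_2 = 1/4 (mean 1).
# The Palm cluster size law is the size-biased version {k pi_k}:
# here it puts mass 1/2 on size 1 and 1/2 on size 2, and none on size 0.
pi = np.array([0.25, 0.5, 0.25])
sizes = rng.choice([0, 1, 2], size=500_000, p=pi)

# Pick a point uniformly among all cluster points: a cluster of size k is
# chosen with probability proportional to k, i.e. according to {k pi_k}.
weights = sizes / sizes.sum()
sampled = rng.choice(sizes, size=200_000, p=weights)

freq1 = np.mean(sampled == 1)
freq2 = np.mean(sampled == 2)
print(freq1, freq2)   # each close to 0.5
```

The same weighting is what converts the intensity measure $\rho$ of (13.5.1) into the heavier Palm intensity $\tilde\rho$.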
Now consider the Palm cluster resulting from two stages of clustering. To ease the notation only, we use here density notation, with corresponding lower case symbols. First note that the quantity
\[
k \pi_k\, p_k(y, x_2 + y, \ldots, x_k + y) \tag{13.5.2}
\]
can be interpreted as the joint density of locating the parent (cluster centre) at $-y$ and $k - 1$ siblings at $x_2, \ldots, x_k$, given one point of the cluster at the origin (cf. Exercise 1.2.5). The marginal density for the parent, given a point at the origin, is thus
\[
g(y) = \sum_{k=1}^{\infty} k \pi_k \int \cdots \int p_k(y, x_2 + y, \ldots, x_k + y)\, \mathrm{d}x_2 \ldots \mathrm{d}x_k = \rho(y),
\]
where we here write $\rho(y)$ for the density of the intensity measure (13.5.1). The members of the two-stage Palm cluster can now be classified into three groups: first, the point located at the origin; second, its immediate siblings,
jointly located with the cluster centre according to (13.5.2); and third, its 'cousins', found by locating a grandparent and a set of 'uncles' by (13.5.2), but given the parent at $-y$ rather than at the origin, and then superposing the clusters generated by each of the uncles. In symbols, we may write briefly
\[
\tilde N_2 = \delta_0 + N_1 + N_2.
\]
Evidently, this process, introduced by Kallenberg (1977a) and called by him the 'method of backward trees', can be continued. At each stage we move one generation further back, taking the location of what was previously the oldest ancestor as origin, locating the ancestor of next order and the siblings of the previously oldest ancestor by (13.5.2), then moving forward to add in to the current generation the superposition of clusters of appropriate order deriving from the siblings of the previously oldest ancestor.

The processes $\tilde N_n$ developed in this way have a monotonic character, because we can imagine them as defined on a common probability space and embedded into an indefinitely continued process of superposition of this kind. Whether the Palm measures $\tilde Q_0^{(n)}$ converge to some limit $\tilde Q_0^{(\infty)}$ thus reduces to the question of whether this process of superposition produces in the limit an a.s. boundedly finite limit measure. Because each stage is formed from its predecessor by an independent operation representing a shift of locations and corresponding augmentation of the number of branches by the distribution (13.5.2), it follows from the Hewitt–Savage zero–one law that the limit is boundedly finite either with probability 1 or with probability 0. This dichotomy allows us to make the following definition.

Definition 13.5.I. The cluster mechanism described by (A) and (B) is stable or unstable according as the sequence of processes $\tilde N_n$ described above converges a.s. to a boundedly finite limit or diverges a.s.
In complete generality, the problem of determining conditions that are necessary and sufficient for the stability of a given cluster mechanism appears to be still open. What is known is that the conditions are closely linked to the behaviour of the random walk with step-length distribution governed by the symmetrized form
\[
\sigma = \rho * \rho^-, \quad \text{where } \rho^-(A) = \rho(-A), \tag{13.5.3}
\]
of the intensity measure for the clusters. To see how this measure arises, suppose that the mean square cluster size $\sum_{k=1}^{\infty} k^2 \pi_k$ is finite; this ensures that the Palm clusters also have finite intensity, which we write in the form $\tilde\rho = \delta_0 + \hat\rho$, where
\[
\hat\rho(A) = \sum_{k=2}^{\infty} (k - 1)\, \tilde\pi_k\, \tilde P_{k-1}(A \times \mathcal{X} \times \cdots \times \mathcal{X}),
\]
and we note
\[
\hat\rho(\mathcal{X}) = \sum_{k=2}^{\infty} k(k-1) \pi_k.
\]
Consider now the differences $\tilde N_{n+1} - \tilde N_n$ between the Palm clusters at the $(n+1)$th and $n$th stages of clustering. To obtain the intensity measure of this increment, we should start from $\rho_-^{n*}$, representing the $n$ steps taken to the left (i.e., according to $\rho^-$) to locate the position of the $n$th generation ancestor, convolve this with $\hat\rho$ to obtain the locations of that ancestor's siblings, and finally convolve again with $\rho_+^{n*}$ (where $\rho_+ \equiv \rho$) to obtain the intensity measure of the superposition of the $n$th stage clusters generated by those siblings. Thus, if $\tilde\rho_n$ denotes the intensity measure for $\tilde N_n$, we have
\[
\tilde\rho_{n+1} = \tilde\rho_n + \rho_-^{n*} * \hat\rho * \rho_+^{n*} = \tilde\rho_n + \sigma^{n*} * \hat\rho;
\]
hence,
\[
\tilde\rho_n = \delta_0 + \hat\rho * \big(\delta_0 + \sigma + \sigma^{2*} + \cdots + \sigma^{(n-1)*}\big).
\]
The series on the right-hand side converges toward the renewal measure (boundedly finite or infinite) of the random walk with step-length distribution $\sigma$. It is boundedly finite if and only if the random walk is transient (see Exercise 9.1.11). On the other hand, it follows from the monotonic character of the $\tilde N_n$ that they converge to a boundedly finite limit with boundedly finite first moment measure if and only if the sequence $\tilde\rho_n$ so converges. We are thus led to the following result [see Liemant (1969, 1975)].

Proposition 13.5.II. A critical cluster member process with finite mean square cluster size is stable if and only if the random walk generated by the symmetrized intensity measure (13.5.3) is transient.

Recall that random walks in three or more dimensions are necessarily transient, so that a properly three-dimensional cluster process with finite mean square cluster size is always stable. In one or two dimensions, however, a random walk is not necessarily transient: it is if the step distribution has nonzero mean ($d = 1$ or 2) or infinite variance ($d = 2$), so it is only under particular conditions that the associated cluster process can be stable.
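The dichotomy in Proposition 13.5.II rests on transience of the symmetrized random walk, and the contrast between $d = 1$ and $d \ge 3$ is easy to exhibit numerically. In the sketch below, simple symmetric lattice walks stand in for a generic $\sigma$, and the walk counts and time horizon are arbitrary choices: in $d = 1$ the walk is recurrent and almost every path soon revisits the origin, while in $d = 3$ it is transient and the return probability is only about $0.34$.

```python
import numpy as np

rng = np.random.default_rng(4)

def return_frequency(d, n_walks=1500, n_steps=1000):
    """Fraction of simple symmetric walks on Z^d that revisit 0 within n_steps."""
    returned = 0
    for _ in range(n_walks):
        axes = rng.integers(0, d, size=n_steps)       # coordinate moved at each step
        signs = rng.choice([-1, 1], size=n_steps)     # direction of the move
        steps = np.zeros((n_steps, d), dtype=int)
        steps[np.arange(n_steps), axes] = signs
        path = steps.cumsum(axis=0)                   # positions after each step
        if np.any(~path.any(axis=1)):                 # some position equals 0
            returned += 1
    return returned / n_walks

f1, f3 = return_frequency(1), return_frequency(3)
print(f1, f3)
```

The recurrent $d = 1$ walk returns in nearly every sample path, mirroring the divergence of the renewal measure and hence instability, while the transient $d = 3$ walk returns in only about a third of the paths.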
In particular, for a process of Neyman–Scott type, transience of the random walk alone is necessary and sufficient for stability, but necessary and sufficient conditions for stability cannot be formulated in full generality solely in terms of conditions on the cluster size distribution and the cluster intensity measure.

Granted that the cluster mechanism is stable, convergence to a limiting process can be established by arguments similar to those used in discussing random translations. As above, write $Q_0^{(n)}$ for the Palm measure corresponding to the Poisson cluster process formed after $n$ stages of clustering from a Poisson process with unit rate, and suppose the clusters are stable, so that by hypothesis
$$Q_0^{(n)} \to Q_0^{(\infty)} \quad (n \to \infty) \quad (\text{weakly in } \mathcal{N}_0^\#)$$
for some limit distribution $Q_0^{(\infty)}$. This convergence implies the corresponding assertion for the associated KLM measures (see Exercise 13.5.3), namely,
$$\tilde Q_n \to \tilde Q_\infty \quad (n \to \infty) \quad \text{(weakly)}. \tag{13.5.4}$$
13. Palm Theory
Unlike the $Q_0^{(n)}$, $Q_0^{(\infty)}$ (and hence also $\tilde Q_\infty$) allots zero mass to the totally finite counting measures. To see this, observe that the successive increments $\tilde N_{n+1} - \tilde N_n$ are independent and nonnegative, and that for positive constants $c$, $c'$,
$$\mathrm{P}\{\tilde N_{n+1} - \tilde N_n \ge 1\} \ge c\,\mathrm{P}\{Z_n > 0\} \ge c'/n,$$
where $\{Z_n\}$ is a Galton–Watson branching process governed by the cluster size distribution $\{\pi_k\}$ (see Harris, 1963, Chapter 1). Because the sum of these terms diverges, it follows from the Borel–Cantelli lemmas that with probability 1, an infinite number of the events on the left-hand side occur. Thus, $\lim_{n\to\infty} \tilde N_n(\mathcal{X})$ is infinite a.s., which is equivalent to the assertion that $Q_0^{(\infty)}$ allots zero mass to the counting measures with finite total mass.

Now the various Poisson cluster processes formed from the initial Poisson process all have unit rate, so their distributions on $\mathcal{N}_{\mathcal{X}}^\#$ are weakly relatively compact, and from (13.5.4) above and Proposition 11.2.II it follows that the limit of any weakly converging subsequence must be infinitely divisible with KLM measure $\tilde Q_\infty$. This limit process must therefore be the overall weak limit. Recalling that an infinitely divisible point process is singular if its KLM measure is supported by the counting measures with infinite total mass, we can assert the following result.

Lemma 13.5.III. In the stable case, the Poisson cluster processes derived from an initial Poisson process of unit rate converge weakly to a limit point process that is stationary, singular infinitely divisible, and has KLM measure $\tilde Q_\infty$.

It can be shown further that the limit process is actually mixing and therefore weakly singular (Fleischmann, 1978). Also, if we start from an initial Poisson process of rate $\lambda$, the Poisson cluster processes converge to a limit point process that is infinitely divisible with KLM measure $\lambda \tilde Q_\infty$.

Granted that the Palm versions of the cluster iterates converge to the limit $\tilde Q_\infty$, we may raise more generally the question of convergence of the processes formed by successive clusterings from a general initial distribution. Because stability implies that the intensity measure $\rho(\cdot)$ of the cluster member process has at least two points in its support, we obtain an estimate for the p.g.fl.
$G_n[h]$ for the $n$th cluster iterate with cluster centre at the origin, namely,
$$\sup_x \big| G_n[T_x h] - 1 \big| \le \theta_n \equiv \sup_x \int_{\mathcal{X}} \rho^{n*}(x + \mathrm{d}y)\,\big[1 - h(y)\big] \qquad (h \in \mathcal{V}),$$
and the last quantity $\to 0$ as $n \to \infty$ by Lemma 11.4.I. Using this estimate, we can start the discussion along much the same lines as that of Theorem 11.4.II. As in that proof, the above estimate implies
$$\mathrm{E}\left| -\int_{\mathcal{X}} \log G_n[T_x h]\, N_0(\mathrm{d}x) - \int_{\mathcal{X}} \big(1 - G_n[T_x h]\big)\, N_0(\mathrm{d}x) \right| \to 0.$$
If now $Y$ is given by (11.4.10) and we write $G_\infty[h]$ for the putative limit of the terms in the exponent, namely,
$$G_\infty[h] = \int_{\mathcal{N}_0^\#(\mathcal{X})} \left[ \exp\left( \int_{\mathcal{X}} \log h(x)\, N(\mathrm{d}x) \right) - 1 \right] \tilde Q_\infty(\mathrm{d}N),$$
then to complete the proof along the lines of Theorem 11.4.II we need to show that
$$\mathrm{E}\left| \int_{\mathcal{X}} \big(1 - G_n[T_x h]\big)\, N_0(\mathrm{d}x) - Y\, G_\infty[h] \right|^2 \to 0.$$
Now we have already established that the KLM measures of the Poisson cluster process converge to $\tilde Q_\infty$, so we have further that
$$\int_{\mathcal{X}} \big(1 - G_n[T_x h]\big)\, \mathrm{d}x \to G_\infty[h],$$
and hence it is in fact sufficient to show that
$$\mathrm{E}\left| \int_{\mathcal{X}} \big(1 - G_n[T_x h]\big)\, N_0(\mathrm{d}x) - Y \int_{\mathcal{X}} \big(1 - G_n[T_x h]\big)\, \mathrm{d}x \right|^2 \to 0. \tag{13.5.5}$$
At this point we meet a difficulty, because we do not have the detailed information concerning the asymptotic behaviour of the cluster p.g.fl.s $1 - G_n[T_x h]$, in $x$ and $n$, which would correspond to the information concerning the convolution powers $1 - \int_{\mathcal{X}} h(y)\, \rho^{n*}(x + \mathrm{d}y)$ used to complete the proof of Theorem 11.4.II. Such information can in fact be obtained by the 'method of reduced trees' introduced by Fleischmann and Prehn (1974, 1975). The underlying idea here is that if the clustering survives a large number of generations, the offspring in the current generation come with high probability from a single line in the family tree, other lines having become extinct. In other words, the current offspring have a single common ancestor a few generations back, so that for all generations preceding that it is enough to track the positions of this single ancestor and its forebears. Each backward step in this reduced part of the tree corresponds to a step in a random walk governed by the distribution $\rho(\cdot)$, as discussed below (13.5.2). Hence, we may approximate the p.g.fl.s $G_n[T_x h]$ in (13.5.5) by the corresponding p.g.fl.s $\int_{\mathcal{X}} h(y)\, \rho^{n*}(x + \mathrm{d}y)$ for the random translations process governed by the distribution $\rho$. Then we may refer to the proof of Theorem 11.4.II again to deduce that for this process, assuming $\rho$ is nonlattice, the terms corresponding to those in (13.5.5) are asymptotically equal. These considerations lead to the limit result set out in Theorem 13.5.IV below, representing an extension of Theorem 11.4.II. For details of the argument, as well as extensions and strengthenings of the theorem and analogous results for subcritical branching mechanisms, see MKM (1978, Chapter 12) and the extended and updated version in MKM (1982, Chapter 10).
Theorem 13.5.IV. Let $N_0$ be a second-order stationary point process on $\mathcal{X} = \mathbb{R}^d$, and let $\{\pi_k, P_k\}$ be a stable cluster mechanism in the sense of Definition 13.5.I. Furthermore, let $\tilde Q_\infty$ denote the limiting KLM measure (13.5.4) associated with the iterates of the cluster mechanism and $\{N_n\}$ the sequence of point processes derived from $N_0$ by successive independent clusterings according to $\{\pi_k, P_k\}$. If the intensity measure $\rho$ for $\{\pi_k, P_k\}$ is nonlattice, then the sequence $\{N_n\}$ converges weakly to the limit point process with p.g.fl.
$$G[h] \equiv \mathrm{E}\left[ \exp\left( Y \int_{\mathcal{N}_0^\#(\mathcal{X})} \left\{ \exp\left( \int_{\mathcal{X}} \log h(x)\, N(\mathrm{d}x) \right) - 1 \right\} \tilde Q_\infty(\mathrm{d}N) \right) \right],$$
where $Y = \mathrm{E}[N_0(U_d) \mid \mathcal{I}]$.
Exercises and Complements to Section 13.5

13.5.1 Find the class of point processes invariant under the cluster operation of Theorem 13.5.IV (i.e., find an analogue of Corollary 11.4.III to Theorem 11.4.II for Theorem 13.5.IV).

13.5.2 Let the critical branching mechanism $\{\pi_k, P_k\}$ be neither stable nor a random translation. Show that the cluster iterates $\{N_n\}$ starting from a stationary second-order point process $N_0$ converge weakly to the zero point process. [Hint: Consider first the case where $N_0$ is a stationary Poisson process.]

13.5.3 Complete the convergence results relating KLM measures and the Palm measures of $n$-stage Poisson cluster processes asserted in the text following Proposition 13.5.III. [Hint: See the discussion of weak convergence of KLM measures preceding Proposition 11.2.I, and link to Exercise 13.5.2.]
13.6. Fractal Dimensions

Although the concept of a fractal set, and many related concepts associated with the names fractals and multifractals, were introduced principally in the work of Mandelbrot (1982), the fractal dimensions we consider here have their origins in earlier work of Rényi (1959), who at that time was studying generalizations of the concept of entropy. They are characteristics of a measure rather than of a set. Their link to point processes might seem obscure, because a point process, as a measure, has the dimension of a single point, namely, zero. The connection arises through estimation procedures for fractal dimensions, for example, box-counting and Grassberger–Procaccia estimates, which can be, and often are, applied to a wide range of point process data. In this case it turns out, as we describe below, that the estimates can be related to the moment measures of the Palm distribution of the underlying point process. Indeed, in such contexts, the estimation procedures might well be better directed toward a careful study of these moment measures than to limit properties which have relatively limited application and are fraught with practical difficulties.
The material for this section is based on Vere-Jones (1999). Falconer (1990) and Cutler (1991) provide basic references for the Rényi dimensions and the multifractal formalism, and Harte (2001) gives a broad overview of both concepts and statistical issues. The topic now comprises a major field in its own right, and has generated a huge literature; we consider here only the specific issues that arise in interpreting estimates of multifractal dimensions when the estimates are derived from point process data.

The Rényi or multifractal dimensions $D_q$ are defined for a measure $\mu$ on $\mathbb{R}^d$ as the limits
$$D_q(\mu) = \lim_{\delta \to 0} \frac{\log \int_{\mathcal{X}} \big[\mu(S_\delta(x))\big]^{q-1}\, \mu(\mathrm{d}x)}{(q-1)\log \delta} \tag{13.6.1a}$$
for $q \ne 1$, where $S_\delta(x)$ is a sphere of radius $\delta$ and centre $x$. In the special case $q = 1$,
$$D_1(\mu) = \lim_{\delta \to 0} \int_{\mathcal{X}} \frac{\log \mu(S_\delta(x))}{\log \delta}\, \mu(\mathrm{d}x), \tag{13.6.1b}$$
a quantity sometimes called the entropy dimension. For measures that have a bounded density with respect to Lebesgue measure over some bounded set $A$, and vanish outside that set, the Rényi dimensions are all equal to the dimension $d$ of the space on which the measure is defined. For singular measures, the situation can be more complicated. In the case of the Cantor measure, for example, the Rényi dimensions are still equal, but to a value less than the dimension of the space. Such a measure is unifractal. In still other examples, the growth rates of the measure may vary from point to point, resulting in different weights being given to the increment $\mu(\mathrm{d}x)$ for different $x$; in such examples, the Rényi dimensions will vary with $q$, and the measure is described as multifractal. Simple variants of the Cantor measure which possess this property are outlined in Exercise 13.6.1.

When $\mu$ is a probability measure over a bounded observation region $A$, one might attempt to ascertain the values of the Rényi dimensions empirically by generating i.i.d. observations according to $\mu$, and replacing $\mu$ in (13.6.1) by the corresponding empirical distribution $\hat\mu(B) = N(B)/N(A)$. In this context, integrals of the type appearing in the numerators of (13.6.1) are referred to as correlation integrals [cf. Harte (2001)], although strictly speaking the term refers to the particular case $q = 2$. To keep to quantities that can be related to point process moment measures, we restrict our discussion to multifractal dimensions of positive integral order $q = k \ge 2$. In such cases, a correlation integral, which we denote by $C_k(\cdot)$, has a particularly simple interpretation, namely,
$$C_k(\delta, A, \mu) = \int_A \big[\mu(S_\delta(x))\big]^{k-1}\, \mu(\mathrm{d}x) = \Pr\{M_k \le \delta\}, \tag{13.6.2}$$
where
$$M_k = \max\big( |X_1 - X_k|, |X_2 - X_k|, \ldots, |X_{k-1} - X_k| \big)$$
and the $X_j$ are i.i.d. with common distribution $\mu$, which vanishes outside $A$. It then follows (see Exercise 13.6.2) that the relation $D_k(\mu) = \eta$ is equivalent to the statement that, at the origin, the distribution of $M_k$ has power-law growth of order $\nu = (k-1)\eta$, so that $\Pr\{M_k \le y\} = \phi(y)\, y^{\nu}$ for some function $\phi(y)$ with $|\log \phi(y)| = o(|\log y|)$.

In applications, empirical correlation integrals of the above kind may be calculated for many different types of data, and used to obtain estimates of some quantity purporting to be a fractal dimension. In circumstances where there is no obvious generating measure $\mu$, however, it is not clear whether the quantities obtained empirically have any meaningful interpretation. The discussion which follows has the aim of elucidating this point in situations where the data are generated by a point process.

We start by writing the correlation integral in (13.6.1a) in a form more convenient for calculations with general point processes. Let $I_{k,\delta}(\cdot)$ denote the indicator function of the set
$$U_{k,\delta} = \{\mathbf{x}\colon \max_i |x_i - x_k| \le \delta\},$$
so that
$$\int_A \big[\mu(S_\delta(x))\big]^{k-1}\, \mu(\mathrm{d}x) = \int_{A^{(k)}} I_{k,\delta}(\mathbf{x})\, \mu^{(k)}(\mathrm{d}\mathbf{x}), \tag{13.6.3}$$
where $\mathbf{x}$ denotes a $k$-vector $(x_1, \ldots, x_k)$ with each component $x_i \in \mathbb{R}^d$. Replacing $\mu$ by the empirical measure $\hat\mu$ formed from the counting measure $N(\cdot)$ from a finite set of observations $\{x_1, \ldots, x_{N(A)}\}$ over some bounded region $A$, we obtain $C_k(\delta, A, \hat\mu)$, which we write as
$$\hat C_k(\delta, A) = \int_A \big[\hat\mu(S_\delta(u))\big]^{k-1}\, \hat\mu(\mathrm{d}u) = \frac{1}{[N(A)]^k} \int_{A^{(k)}} I_{k,\delta}(u_1, \ldots, u_k)\, N^{(k)}(\mathrm{d}u_1 \times \cdots \times \mathrm{d}u_k) = \frac{1}{[N(A)]^k} \sum_{\text{perm}} I_{k,\delta}(x_1^*, \ldots, x_k^*), \tag{13.6.4}$$
the last sum being taken over all permutations of the points $x_1, \ldots, x_{N(A)}$ of the realization in $A$, taken $k$ at a time. Points lying outside $A$ are ignored, a convention which can lead to bias if the measure of interest has support extending outside $A$. Because we are concerned with the limits for small testing sets, such edge effects are generally of smaller order than the main terms, but their possible existence needs to be borne in mind. In practice, they can be important even when the size of the testing set is as small as 10% of the observation region; see Harte (2001), especially Chapters 9 and 10, for discussion and illustration of these and related aspects.

For computations, it is convenient to consider contributions only from distinct pairs, triplets, and so on, taking advantage of the symmetries in the
counting measure to reduce the number of terms to be considered. In the present case, the function $I_{k,\delta}(x_1, \ldots, x_k)$ is symmetric in the first $k-1$ arguments, so that each combination of one point of the realization for $x_k$, and a set of $k-1$ points for $x_1, \ldots, x_{k-1}$, will be repeated $(k-1)!$ times, leading to the alternative estimate, a quasi-factorial moment estimator,
$$\hat C_{[k]}(\delta, A) = \left[ N(A) \binom{N(A) - 1}{k - 1} \right]^{-1} \sum_{j=1}^{N(A)} \sum_{\text{comb}} I_{k,\delta}(x_1^*, \ldots, x_k^*), \tag{13.6.5}$$
where the inner sum is taken over distinct combinations of one term $x_j^*$ and $k-1$ different terms $\{x_1^*, \ldots, x_k^*\} \setminus \{x_j^*\}$ from $x_1, \ldots, x_{N(A)}$. No combinations with repeated points from the realization appear in this representation, so that in taking expectations it should be written as an integral against the modified product counting measure $N^{[k]}$ used in defining the factorial moments (see Section 5.2). Another way of writing this last formula, which may help to show up its link to power-law growth, is outlined in Exercise 13.6.3.

We describe two general situations where the correlation integral estimates (13.6.4) and (13.6.5) do lead to consistent estimates of a multifractal dimension. Both situations relate to a space–time point process, and both use the same estimates, but the estimates are embedded in different limit processes, and the quantities that they estimate are likewise different. In one case the process is stationary in time and observations accumulate over a fixed spatial region, whereas in the other, the process is homogeneous in space, and observations are considered over an expanding sequence of spatial sets, the time span being held fixed. In the first situation, observations accumulate ever more densely over a bounded spatial region, until in the limit their spatial distribution approximates that of the first spatial moment measure over that region. In this situation the estimates can be related to the fractal dimensions of the first spatial moment measure. In the second situation, although we do not accumulate information about density variations in any particular spatial subregion, we do start to collect information about the behaviour of groups of points at various relative distances from each other, leading to the possibility of estimating the power-law growth of the reduced moment measures when spatial homogeneity is assumed.
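For $k = 2$, the estimators (13.6.4) and (13.6.5) reduce to simple pair counts. The sketch below (illustrative only, not from the text; names and parameters are our own) computes both forms for one-dimensional data and applies the distinct-pairs version to a sample from the Cantor measure, the unifractal example mentioned above. Comparing pair counts at two scales via a log-ratio (the device used in (13.6.11) below) recovers a dimension estimate near $\log 2 / \log 3 \approx 0.631$.

```python
import numpy as np

def c_hats_k2(x, delta):
    """k = 2 pair-count forms of the empirical correlation integrals for
    one-dimensional data: the (13.6.4)-style version over all ordered pairs
    (self-pairs included) and the (13.6.5)-style version over distinct pairs."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    close = np.abs(x[:, None] - x[None, :]) <= delta
    return close.sum() / n**2, (close.sum() - n) / (n * (n - 1))

# Sample the Cantor measure via random ternary digits in {0, 2}; all its
# Renyi dimensions equal log 2 / log 3.
rng = np.random.default_rng(0)
n, depth = 2000, 12
digits = 2 * rng.integers(0, 2, (n, depth))
x = (digits * 3.0 ** -np.arange(1, depth + 1)).sum(axis=1)

d1, d2 = 3.0 ** -6, 3.0 ** -2
c1 = c_hats_k2(x, d1)[1]
c2 = c_hats_k2(x, d2)[1]
dim_est = np.log(c2 / c1) / np.log(d2 / d1)   # close to log 2 / log 3
```

For the Cantor measure a pair lies within $3^{-j}$ of each other essentially when the two points share their first $j$ ternary digits, which happens with probability $2^{-j}$, so the two-scale ratio reproduces $\log 2 / \log 3$ almost exactly.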
The proposition below establishes consistency of the correlation integral estimates in these two cases. It provides a starting point for considering the limit behaviour of the multifractal dimension estimates themselves. We describe the process as space–time (cf. Section 15.4 below), this being a convenient nomenclature for the two components of the state space in which the point process exists. In part (b) the 'time' variable plays no role, so mention of it has been omitted.

Proposition 13.6.I. Let $N$ be a simple space–time point process with state space $\mathcal{X} = \mathbb{R} \times \mathbb{R}^d$, and $k > 1$ a positive integer.
(a) Suppose that $N$ is stationary and ergodic in time, and that its first spatial moment measure exists and has stationary spatial distribution $\mu$ over a
given spatial set $A$. Then, for $A$ fixed and time interval $T \to \infty$, both
$$\hat C_k(\delta, A) \to C_k(\delta, A, \mu) \quad \text{and} \quad \hat C_{[k]}(\delta, A) \to C_k(\delta, A, \mu)$$
a.s. and in $L^1$-norm.
(b) Suppose that $N$ is stationary on $\mathcal{X} = \mathbb{R}^d$, and that its moment measures exist up to order $k$. Let $\breve M_k(\cdot)$, $\breve M_{[k]}(\cdot)$ denote its reduced ordinary and factorial moment measures of order $k$; $\mathring M_{k-1}(\cdot)$ and $\mathring M_{[k-1]}(\cdot)$ the corresponding Palm moment measures; $m$ the mean spatial density; $\{A_n\colon n = 1, 2, \ldots\}$ a convex averaging sequence of sets in $\mathbb{R}^d$; $S_\delta^{(k)}$ the set $\{x_1, \ldots, x_k\colon \max_i |x_i| \le \delta\}$; and, with $N_n = N(A_n)$, $N_n^{[r]} = N_n! / (N_n - r)!$. Then as $n \to \infty$,
$$\hat C_k^*(\delta, A_n) \equiv N_n^{k-1}\, \hat C_k(\delta, A_n) \to m^{-1} \breve M_k(S_\delta^{(k-1)}) = \mathring M_{k-1}(S_\delta^{(k-1)}),$$
$$\hat C_{[k]}^*(\delta, A_n) \equiv N_n^{[k-1]}\, \hat C_{[k]}(\delta, A_n) \to m^{-1} \breve M_{[k]}(S_\delta^{(k-1)})$$
a.s. and in $L^1$-norm.

Proof. The proofs are exercises in using the ergodic theorems of Sections 12.2 and 12.6, and the link between reduced moment measures and the moment measures of the Palm distribution established in Section 13.4 and quoted in the proposition. For case (a) we need the extension of Proposition 12.2.IV to product integrals [see below (12.2.15)]. Write $\mathbf{x}$ for the $k$-vector $(x_1, \ldots, x_k)$ as earlier, $N_T(\cdot)$ for the projection of $N(\cdot \times (0, T))$ onto the spatial component $\mathbb{R}^d$ of $\mathcal{X}$, and $N_T(A)$ for the total number of points in the observation region $A$. The proof of the following lemma is sketched in Exercise 13.6.4.

Lemma 13.6.II. With the notation just given, suppose that the assumptions of Proposition 13.6.I(a) are satisfied and that $h(\cdot)$ is a $\mu^{(k)}$-integrable function on $(\mathbb{R}^d)^{(k)}$. Then as $T \to \infty$ with region $A$ fixed,
$$\frac{1}{T^k} \int_{A^{(k)}} h(\mathbf{x})\, N_T^{(k)}(\mathrm{d}\mathbf{x}) \to m^k \int_{A^{(k)}} h(\mathbf{x})\, \mu^{(k)}(\mathrm{d}\mathbf{x}). \tag{13.6.6}$$
If in addition $h$ is symmetric, then
$$\binom{N_T(A)}{k}^{-1} \sum_{\text{comb}} h(\mathbf{x}_i) \to \int_{A^{(k)}} h(\mathbf{x})\, \mu^{(k)}(\mathrm{d}\mathbf{x}), \tag{13.6.7}$$
where the sum is taken over all combinations $\mathbf{x}_i$ of $k$ distinct elements from the realization $(x_1, \ldots, x_{N_T(A)})$.

To establish part (a) of the proposition, apply (13.6.6) twice, the first time with $h = I_{k,\delta}$ and the second time with $h \equiv 1$, and take the ratio. Comparing the resulting equations with the expressions (13.6.3) and (13.6.4) yields the first statement of part (a) of the proposition. The second statement of part (a) follows similarly from (13.6.5), by applying the variant of (13.6.7) which holds for functions with symmetry only in the first $k - 1$ arguments.
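Part (a) says that, with $A$ held fixed, letting $T$ grow drives the empirical correlation integral to $C_k(\delta, A, \mu)$. A Monte Carlo sketch (all names and parameters are our own, illustrative choices) for $k = 2$, with a space–time Poisson process whose spatial distribution $\mu$ is uniform on the unit square: accumulating over time simply yields more i.i.d. spatial locations, and the distinct-pairs estimate settles down to the exact value $C_2(\delta, A, \mu) = \Pr\{|X_1 - X_2| \le \delta\} = \pi\delta^2 - \tfrac{8}{3}\delta^3 + \tfrac12\delta^4$, the standard formula for the distance distribution of two uniform points in the unit square.

```python
import numpy as np

rng = np.random.default_rng(2)
delta, lam = 0.1, 10.0                 # test radius; mean points per unit time

def c2_hat(pts):
    """Distinct-pairs correlation integral for k = 2, as in (13.6.5)."""
    n = len(pts)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    return ((d <= delta).sum() - n) / (n * (n - 1))

def spatial_sample(T):
    """Spatial locations, over (0, T), of a space-time Poisson process whose
    spatial distribution mu is uniform on the unit square: for fixed A,
    accumulating over time just yields more i.i.d. draws from mu."""
    n = rng.poisson(lam * T)
    return rng.uniform(0.0, 1.0, (n, 2))

# Exact C_2(delta, A, mu) for uniform mu on the unit square (0 <= delta <= 1)
truth = np.pi * delta**2 - (8 / 3) * delta**3 + 0.5 * delta**4
short = c2_hat(spatial_sample(2.0))    # short time span: noisy estimate
long_ = c2_hat(spatial_sample(150.0))  # long time span: settles close to truth
```

Here `short` is a noisy small-sample estimate, while `long_` typically lies within a fraction of a percent of `truth`, in line with the a.s. convergence asserted in part (a).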
It is noteworthy here that the same limit is obtained whether or not we allow repeated indices in the sums over permutations of the sample elements. This is because the contributions from terms with multiple points, corresponding to the diagonal concentrations in the moment measures, are of lower order in $N$ (or $T$) than the terms from sets of distinct points.

The argument for (b) rests on the higher-order ergodic Theorem 12.6.VI. Replacing the sets $B_i$ in (12.6.10) by spheres $S_\delta \equiv S_\delta(0) \subset \mathbb{R}^d$, we obtain
$$\frac{1}{\ell(A_n)} \int_{A_n} \big[N(x + S_\delta)\big]^{k-1}\, N(\mathrm{d}x) = \frac{1}{\ell(A_n)} \sum_{i=1}^{N_n} N^{(k-1)}\big((x_i + S_\delta)^{(k-1)}\big) \to \breve M_k(S_\delta^{(k-1)}). \tag{13.6.8}$$
This form also neglects edge effects, for it assumes that the process is observed not only within $A_n$ but also within any parts of the translated spheres $T_{x_i} S_\delta$ which happen to fall outside $A_n$, even for $x_i \in A_n$. A sandwiching argument shows that such edge effects are asymptotically negligible provided the measure of the set of points of $A_n$ lying within distance $\delta$ of its boundary is $o(\ell(A_n))$, which we assume.

Now the sum in the middle term of (13.6.8) is just another way of expressing $\sum_{\text{perm}} I_{k,\delta}(x_1^*, \ldots, x_k^*)$ from (13.6.4). Rewriting this in terms of $\hat C_k(\delta, A_n)$, adjusting the scaling factor [only a single integral over $A_n$ is involved in (13.6.8), whereas a multiple integral over $(0, T)^{(k)}$ is implicit in (13.6.4) and Lemma 13.6.II], and recalling that $N(A_n)/\ell(A_n) \to m$, we obtain the first statement in part (b) of the proposition.

For the second statement, we omit repeated points, thereby obtaining a representation in terms of the factorial product counting measures $N^{[r]}$ whose expectation defines the factorial moments. In place of (13.6.8) above we start from
$$\frac{1}{\ell(A_n)} \sum_{i=1}^{N_n} N^{[k-1]}\big[(x_i + S_\delta)^{(k-1)}\big] \to \breve M_{[k]}(S_\delta^{(k-1)}). \tag{13.6.9}$$
The left-hand side of (13.6.9) can be rewritten as
$$\frac{(k-1)!}{\ell(A_n)} \sum_{i=1}^{N_n} \sum_{\text{comb}} I_{k,\delta}(x_1^*, \ldots, x_k^*).$$
Replacing $\ell(A_n)$ by $N_n$, the result is of the same form as (13.6.5) except for the factor $N_n^{[k-1]}$. Incorporating this factor and rewriting the expression in terms of the sum in (13.6.5) completes the proof² of (b).

As already noted, the limits in (b) can equally well be written in terms of moments of the Palm probabilities, $\mathring M_{k-1}(S_\delta^{(k-1)}) = m^{-1} \breve M_k(S_\delta^{(k-1)})$, representing the moment measures (of one order lower) for the process conditioned

² Note two errata in Vere-Jones (1999): (a) the last term in equation (35) should start with $1/(q-1)!$, not $(q-1)!$; and (b) the scaling factor in equations (37) and (41) should read $(N_n - 1) \cdots (N_n - k + 1)$, and not $(N_n - 1)!$ or $[N(\mathcal{X}) - 1]!$.
on a point at the origin. This interpretation is a natural one in the present context because the construction is based on the maxima $M_k$ [see below (13.6.3)], which already single out one point of the group as a local origin.

A range of different estimates based on the correlation integrals have been proposed and their behaviour analyzed in different contexts. For example, the simplest, naïve, estimate is based directly on the definition (13.6.1a) and takes the form, for $k = 2, 3, \ldots$,
$$\hat D_{[k]}(\delta, A, T) = \frac{\log \hat C_{[k]}(\delta, A)}{(k-1)\log \delta}. \tag{13.6.10}$$
Its major drawback in practice is that the behaviour is unreliable when $\delta$ is small, because of measurement error and lack of data within very small spheres; on the other hand, for large $\delta$ it is likely to be significantly biased by edge effects, which we have ignored in the discussion above, and by any departures from simple power-law growth as the test region expands. To avoid at least some of these difficulties, Grassberger and Procaccia (1983) suggested replacing (13.6.10) by
$$\hat D_{[k]}^*(\delta, A, T) = \frac{\log \hat C_{[k]}(\delta_1, A) - \log \hat C_{[k]}(\delta_2, A)}{(k-1)\big[\log \delta_1 - \log \delta_2\big]}, \tag{13.6.11}$$
where the interval $(\delta_1, \delta_2)$ is chosen essentially by inspection, so as to pick out a portion of the graph of (13.6.10) that is linear, taking $\delta_1$ as small as seems reasonable.

Let us note in passing, however, that although the Grassberger–Procaccia and similar procedures may establish the existence of power-law growth over a certain distance range, something which may well be of importance in its own right, it is another matter to assert that the power-law index for this range necessarily coincides with the limiting value at vanishingly small distances. In applications, the difficulty of distinguishing the two situations has been one factor leading to confusion over whether fractal behaviour refers generally to power-law growth, or specifically to the limiting behaviour near zero.

Another approach, developed by Mikosch and Wang (1995), is based on the Hill (1975) estimate; it uses extreme quantiles to estimate the limiting power-law growth. Mikosch and Wang also advocate the use of a bootstrap method to obtain the dimension estimates and their confidence bounds. The Hill method is broadly similar to the approach of Takens (1985), which also bases the estimate on the behaviour in the extreme tail. The methods are reviewed and compared in Harte (1998, 2001), where the use of the Hill method is illustrated in practical situations in which bias from measurement error and finite boundary effects cannot be ignored.

Whatever form of estimate is adopted, it is a further nontrivial exercise to establish conditions for consistency of the estimates. Because the fractal dimension is itself defined as a limit, a double limit problem is involved: as
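In practice the two-point ratio (13.6.11) is often replaced by a least-squares fit of $\log \hat C_{[2]}(\delta, A)$ against $\log \delta$ over the chosen range $(\delta_1, \delta_2)$. The sketch below (our own illustrative variant, not a prescription from the text) applies such a fit to uniform points on the unit square, where the answer should be close to the spatial dimension 2, with a small downward bias from the edge effects discussed above.

```python
import numpy as np

def gp_dimension(points, deltas):
    """Least-squares slope of log C_2(delta) against log delta over a chosen
    range of scales: a Grassberger-Procaccia style estimate for k = 2."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    dists = d[np.triu_indices(n, k=1)]               # distinct pairs only
    logc = np.log([(dists <= delta).mean() for delta in deltas])
    slope, _ = np.polyfit(np.log(deltas), logc, 1)
    return slope

rng = np.random.default_rng(3)
pts = rng.uniform(0.0, 1.0, (1500, 2))               # uniform on the unit square
est = gp_dimension(pts, np.geomspace(0.02, 0.1, 8))  # close to 2
```

The choice of the scale range remains the delicate step, exactly as in the inspection-based choice of $(\delta_1, \delta_2)$ for (13.6.11).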
either $T$ or $N(T) \to \infty$, and as $\delta \to 0$. In general, the maximum rate at which $\delta \to 0$ will be constrained by the rate at which the study region expands in time or space. A slightly unusual form of limit process is required, in which one dimension shrinks to zero and the other expands to infinity. Results for the naïve estimate (13.6.10) are summarized in the theorem below, and in Exercises 13.6.5–6. Consistency of a Grassberger–Procaccia type of estimate, for the index of power-law growth over a predetermined range, is more easily established on the basis of Proposition 13.6.I; an outline is given in Exercise 13.6.7.

Theorem 13.6.III. (a) Suppose that the conditions of Proposition 13.6.I(a) hold, that the moment measures for the space–time process $N$ exist up to order $2k$, and that the stationary space distribution $\mu$ has $k$th fractal dimension $D_k(\mu)$. Defining
$$\hat D_{[k]}(\delta, A, T)^+ = \min\big\{ \hat D_{[k]}(\delta, A, T),\, d \big\},$$
if the controlled diagonal growth conditions of Exercise 13.6.5 hold, then $\hat D_{[k]}(\delta, A, T)^+$ is a mean-square consistent estimate of $D_k(\mu)$.
(b) Suppose that the conditions of Proposition 13.6.I(b) hold, that the moment measures of $N$ exist up to order $2k$, and that the reduced factorial moment measure $\breve M_{[k]}$ is such that the limit
$$D_{[k]}^* \equiv \lim_{\delta \to 0} \frac{\log \breve M_{[k]}(S_\delta^{(k-1)})}{(k-1)\log \delta} \tag{13.6.12}$$
exists and is finite. Setting
$$\hat D_{[k]}^*(\delta, A, T)^+ = \min\left\{ \frac{\log \hat C_{[k]}^*(\delta, A)}{(k-1)\log \delta},\, d \right\},$$
if the bounded growth conditions of Exercise 13.6.6 hold, then $\hat D_{[k]}^*(\delta, A, T)^+$ is a mean-square consistent estimate of $D_{[k]}^*$.

Proof. We make a few comments only, referring to Vere-Jones (1999) and Exercises 13.6.5–6 for details. The nature of the problems which arise is most easily illustrated by considering the expected value of the quantity which appears in the numerator of (13.6.5). Written out directly in terms of the modified product counting measure $N^{[k]}(\cdot)$, which omits repeated points, it takes the form
$$\int_{A^{(k)}} I_{k,\delta}(x_1^*, \ldots, x_k^*)\, N^{[k]}(\mathrm{d}x_1^*, \ldots, \mathrm{d}x_k^*),$$
where $A$ is the region over which points are observed.

In part (a), the spatial process in view is the ground process of the space–time process over the region $A \times (0, T)$, time being treated as a mark and then
ignored. Its expectation therefore reduces to the integral over $[A \times (0, T)]^{(k)}$ of the $k$th-order factorial moment measure $M_{[k]}(\cdot)$ of the space–time process:
$$\mathrm{E}\left[ \sum_{j=1}^{N(A)} \sum_{\text{comb}} I_{k,\delta}(x_1^*, \ldots, x_k^*) \right] = \int_{[A \times (0,T)]^{(k)}} I_{k,\delta}(x_1, \ldots, x_k)\, M_{[k]}(\mathrm{d}x_1 \times \mathrm{d}t_1 \times \cdots \times \mathrm{d}x_k \times \mathrm{d}t_k).$$
In case (a), we write
$$M_{[k]}(\mathrm{d}x_1 \times \mathrm{d}t_1 \times \cdots \times \mathrm{d}x_k \times \mathrm{d}t_k) = \prod_{i=1}^{k} M_1(\mathrm{d}x_i \times \mathrm{d}t_i) + \Delta(\mathrm{d}x_1 \times \mathrm{d}t_1 \times \cdots \times \mathrm{d}x_k \times \mathrm{d}t_k), \tag{13.6.13}$$
where, in view of stationarity, $M_1(\mathrm{d}x \times \mathrm{d}t) = m\, \mu(\mathrm{d}x)\, \mathrm{d}t$, $m$ being the mean rate of occurrence of points over the whole region $A$ and $\mu$ being normalized to a probability measure. The product term is what we would expect if we were looking for dimension estimates of $\mu$, whereas the term $\Delta(\cdot)$, which in general consists of an amalgam of lower-order factorial moment and cumulant measures, defines the error which must be controlled if consistent estimates are to be obtained. Indeed, the 'controlled diagonal growth' condition of Exercise 13.6.5 puts a bound on the growth of the expected value of the integral against $\Delta$, and the 'controlled bi-diagonal growth' condition puts a bound on the growth of its variance.

In part (b), the time variable plays no significant role, but the process is homogeneous in space. In this case, omitting the $\mathrm{d}t$ terms, we can write
$$\int_{A^{(k)}} I_{k,\delta}(x_1, \ldots, x_k)\, M_{[k]}(\mathrm{d}x_1 \times \cdots \times \mathrm{d}x_k) = \int_A \mathrm{d}x_k \int_{A^{(k-1)} - x_k} I\big(\max(|u_1|, \ldots, |u_{k-1}|) < \delta\big)\, \breve M_{[k]}(\mathrm{d}u_1 \times \cdots \times \mathrm{d}u_{k-1}) = \ell(A)\,(1 - \epsilon^*)\, \breve M_{[k]}(S_\delta^{(k-1)}),$$
where $\epsilon^*$ is a measure of the edge effects (which become negligible when $\delta$ is very small), but there is no other bias term. Here, only a constraint on the growth of the variance is needed to guarantee consistency, as indicated in Exercise 13.6.6.

Example 13.6(c) Space–time Poisson processes. Consider first the setting in part (a) of both Proposition 13.6.I and Theorem 13.6.III. Here it is natural to consider a stationary but spatially inhomogeneous Poisson process with intensity measure of the form $\Lambda(\mathrm{d}t \times \mathrm{d}x) = m\, \mu(\mathrm{d}x)\, \mathrm{d}t$, where $m$ is the mean rate per unit time and space, and the stationary distribution $\mu$ over space is normalized to form a probability distribution. This is essentially the classical case of finding the fractal dimension from i.i.d. observations with distribution
$\mu$. The conditions for part (a) hold for general $\mu$. Because all factorial cumulants of order 2 and greater vanish for a Poisson process, it is most convenient to deal with the combinatorial form (13.6.5). Then the only constraints on the rate at which $\delta \to 0$ are imposed by the growth behaviour of $\mu$. In particular, the rate of growth of the variance of the estimate depends on the rate of growth of the distributions of the random variables $M_2$ and $M_3$ in (13.6.2). Checking the conditions in Exercise 13.6.5, we find that a sufficient condition for the naïve estimate to be consistent is that $\delta \to 0$ no faster than $T^{-1/D_2}$, where $D_2$ is the Rényi dimension of order 2 of $\mu$. It will differ from the dimension of the space only in some extreme cases, as when $\mu$ is concentrated along a set of lines, or has some other singularity which affects the dimension.

If we take the other interpretation, that of part (b), then we must assume that the process is homogeneous in space. The factorial cumulants vanish, and the reduced factorial moment measure $\breve M_{[r]}(\cdot)$ is proportional to Lebesgue measure on $(\mathbb{R}^d)^{(r-1)}$. Hence the dimension is the dimension of the space.

The next example illustrates the type of situation more commonly met with in geophysical or similar applications, where dimension estimates are used to gain an impression of the character of small-scale clustering behaviour. We use it also to illustrate the possibility that different rates of power-law growth may occur over different ranges in time or space, and that the dimension estimates may pick up any one or a combination of these, according to the range selected.

Example 13.6(d) A space–time Neyman–Scott process with singular components [cf. Example 6.3(a) and Exercise 6.3.10]. We consider a two-dimensional spatial region, and suppose that cluster centres follow a Poisson process with space–time intensity $\Lambda(\mathrm{d}x \times \mathrm{d}t)$, and that cluster members are i.i.d.
about the cluster centre, with a spatial distribution described by the probability measure $B(\mathrm{d}y)$, $y$ being the position of the cluster member relative to the cluster centre as origin, and after a time lag governed by the probability measure $C(\mathrm{d}t)$. Suppose also that the number of points in a cluster has a discrete distribution $\{p_n\colon n > 0\}$ with finite factorial moments $\nu_{[j]}$ for $j = 1, \ldots, k$. We consider the behaviour of the correlation integral ($k = 2$) and its estimates under various assumptions concerning the components.

Consider first the behaviour under the assumptions of Theorem 13.6.III(a), so that $\Lambda(\mathrm{d}x \times \mathrm{d}t) = \lambda\, \mathrm{d}t\, \theta(\mathrm{d}x)$, where $\theta$ is a probability measure. After integrating out the time factor, we find for the ground (spatial) process
$$M_1^g(\mathrm{d}x) = \lambda T \nu_1 \int B(\mathrm{d}x - y)\, \theta(\mathrm{d}y). \tag{13.6.14}$$
Also for $k = 2$, the expansion (13.6.13) takes the particularly simple form [cf. equations (6.3.5), (6.3.17)]
$$M_{[2]}^g(\mathrm{d}x_1 \times \mathrm{d}x_2) = M_1^g(\mathrm{d}x_1)\, M_1^g(\mathrm{d}x_2) + C_{[2]}^g(\mathrm{d}x_1 \times \mathrm{d}x_2) = (\lambda T \nu_1)^2\, (\theta * B)(\mathrm{d}x_1)\, (\theta * B)(\mathrm{d}x_2) + \lambda T \nu_2 \int B(\mathrm{d}x_1 - y)\, B(\mathrm{d}x_2 - y)\, \theta(\mathrm{d}y). \tag{13.6.15}$$
13. Palm Theory
Integrating against $I_{2,\delta}$ we obtain, denoting the convolution of $\theta$ and $B$ by $\theta * B$,
$$\iint_{A \times A} I_{2,\delta}(x_1, x_2)\, M_{[2]}^g(\mathrm{d}x_1 \times \mathrm{d}x_2) = (\lambda T \nu_1)^2 \int_A (\theta * B)[S_\delta(x)]\, (\theta * B)(\mathrm{d}x) + \lambda T \nu_{[2]} \iint_{A \times A} I_{2,\delta}(x_1, x_2) \int B(\mathrm{d}x_1 - y)\, B(\mathrm{d}x_2 - y)\, \theta(\mathrm{d}y). \tag{13.6.16}$$
Bearing in mind that $\theta$ is a probability distribution, the second term on the right-hand side can be written in the form
$$\lambda T \nu_{[2]}\, \bigl[1 - \epsilon(\delta, A)\bigr] \int_A B[S_\delta(x)]\, B(\mathrm{d}x), \qquad 0 < \epsilon(\delta, A) < 1,$$
where the correction term becomes negligible as $\delta \to 0$. The orders of growth of the terms on the right-hand side of (13.6.16) are $T^2 \delta^{D_2(\theta * B)}$ for the first, and $T\, \delta^{D_2(B)}$ for the second. Their relative behaviour can be very different under different assumptions.
If both the cluster centre and cluster member processes have bounded densities, then all terms grow like small areas (i.e., proportional to $\delta^2$), so that the correlation dimension is 2. Moreover the second term in (13.6.16) is then $O(T)$ and the first term is $O(T^2)$, so the first term dominates and the controlled growth conditions are satisfied even for $\delta = O(1/T)$.
If the clusters are dispersed, so that $B$ has a bounded density $b$ say, but the cluster centres are concentrated along a set of lines in two-dimensional space, then the correlation dimension of $\theta$ is 1, but the first moment measure of the point process still has a smooth density $m_1(x) = \lambda T \nu_1 \int b(x - y)\, \theta(\mathrm{d}y)$, corresponding to a correlation dimension of 2. This is typically the situation when observations are contaminated by spatial measurement errors: the correlation integral grows as $O(\delta^2)$ until $\delta$ reaches the same order of magnitude as the limits of measurement error, or in our case the effective range of the distribution $B$, say $\delta_0$. When $\delta$ is increased beyond $\delta_0$, the linear concentration of $\theta$ starts to tell, and the correlation integral starts to grow as $O(\delta \times \delta_0)$ [i.e., as $O(\delta)$], corresponding to a fractal dimension estimate of 1 rather than 2.
If the converse situation obtains, so that cluster centres are smoothly (say uniformly) distributed over the observation area, but the cluster members are distributed along a line, then the first moment measure is now uniform, corresponding to growth rate $O(\delta^2)$, but the first term may no longer dominate the expression (13.6.16). Its overall contribution is $O(T^2 \delta^2)$, whereas that of the second term is $O(T \delta)$.
For fixed $T$, the correlation integral grows initially as $O(\delta)$ and then as $O(\delta^2)$, so it shows two regions of power-law growth. If we look for a consistent estimate, and take $\delta = O(T^{-(1+\eta)})$, then the first term will be $O(T^{-2\eta})$ and the second term $O(T^{-\eta})$. In this case the bounded growth conditions break down, and the dimension estimate is dominated by the behaviour of the local clustering. If we take $\delta = O(T^{-(1-\eta)})$, then the first
13.6.
Fractal Dimensions
term is $O(T^{2\eta})$ and the second is $O(T^{\eta})$, so the first-order term dominates; in this case the clusters accumulate sufficiently thickly over the observation region that their local linear structure is no longer the dominating feature, and the dimension estimates revert to estimates of the first moment measure. In general, the more extreme the concentrations of the local clusters are, the more slowly $\delta$ will need to approach zero before the conditions of the theorem are satisfied. In such situations, the correlation integral may show a sequence of ranges, all with different power-law growths: which of these is picked out as the dimension estimate will depend on the rate at which $\delta$ approaches 0 as $T \to \infty$ (see Exercise 13.6.8 for a more elaborate example).
Finally consider the behaviour under the assumptions of part (b). The same basic equations hold, and the estimates are still controlled by the growth behaviour of the second factorial moment in (13.6.16). Ignoring the time coordinate, and recalling that the process is assumed spatially homogeneous, (13.6.16) directly gives the reduced moment measure $\breve{M}_{[2]}(S_\delta(0))$, with the additional simplification that the first moment measure here averages out to a multiple of Lebesgue measure; $\theta$ in the second term is again proportional to Lebesgue measure. Then the same issues arise, but in a slightly different form, because the question now is to determine the correlation dimension of (13.6.16) as a whole, meaning therefore the initial growth rate, from whichever term that rate happens to derive. In the examples we have just been considering, when $B$ has a bounded density both terms are $O(\delta^2)$, so the correlation dimension is again 2. With locally linear clusters, it is the second term which will dominate for $\delta \to 0$, and so the dimension here will be 1.
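The two ranges of power-law growth just described can be seen in a small simulation. The sketch below is not from the text: the cluster configuration, all parameter values, and the two δ-ranges are illustrative choices, with cluster members spread along a short line segment so that the member distribution has a linear (one-dimensional) concentration while the centres are uniform in the unit square.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative Neyman-Scott configuration: uniform cluster centres, members
# displaced along a line segment of half-length `seg` (linear concentration).
n_centres, mean_size, seg = 400, 4, 0.01
centres = rng.random((n_centres, 2))
sizes = rng.poisson(mean_size, n_centres)
pts = np.concatenate([c + np.column_stack([rng.uniform(-seg, seg, m), np.zeros(m)])
                      for c, m in zip(centres, sizes) if m > 0])

# All distinct inter-point distances: the raw data of the naive correlation integral.
diff = pts[:, None, :] - pts[None, :, :]
pair_d = np.linalg.norm(diff, axis=-1)[np.triu_indices(len(pts), k=1)]

def log_slope(d1, d2):
    """Empirical power-law exponent of the correlation integral over (d1, d2)."""
    c1, c2 = np.mean(pair_d < d1), np.mean(pair_d < d2)
    return (np.log(c2) - np.log(c1)) / (np.log(d2) - np.log(d1))

# Below the segment length, within-cluster (linear) pairs dominate: exponent
# near 1.  Well above it, the uniform centres dominate: exponent near 2.
small_slope, large_slope = log_slope(0.002, 0.008), log_slope(0.05, 0.2)
print(round(small_slope, 2), round(large_slope, 2))
```

The two slopes illustrate how a dimension estimate depends on the δ-range selected, exactly as in the discussion above.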
We see that for this particular example, the effect of changing from the assumptions in (a) to those in (b) is to switch attention from the growth rate of the first moment measure to the growth rate of whatever feature of the cluster structure dominates the behaviour in the initial range of power-law growth.
Another commonly occurring and widely studied situation relates to processes generated by a dynamical system, where the measure $\mu$ of interest is an invariant measure for the process. The common examples are deterministic in character, but can be randomized by introducing a random starting point. We do not look at this example in great detail, referring the reader rather to Cutler (1991), Serinko (1994), and the review in Harte (2001) for further discussion and references.
Example 13.6(e) Point processes generated by a dynamical system [Serinko (1994)]. Let $\Theta$ be a measurable mapping taking the closed, bounded set $A \subset \mathbb{R}^d$ into itself, and let $\mu$ be a totally finite invariant measure for $\Theta$; we suppose $\mu$ normalized to form a probability measure. We consider the application of Theorem 13.6.III(a) to the point process formed by the sequence $\{x_n\} = \{\Theta^n x_0: n = 0, 1, \ldots\}$. If the initial value $x_0$ itself has distribution $\mu$, then the sequence $\{x_n\}$ is stationary; we suppose that it is also ergodic. The process may also be regarded as a space–time point process with discrete time variable. The results of Proposition 13.6.I and Theorem 13.6.III(a) are not
affected by the character of the time variable, because they all concern limits for large values of $T$. The task is then to find the moment measures of the resulting point process, and to verify the conditions of Theorem 13.6.III(a). For bounded measurable functions $h(x)$ on $A$ and $h_2(x, y)$ on $A^{(2)}$ we can characterize the actions of the first and second moment measures of the process on $A$, up to time $T$, by means of the equations
$$\int_A h(x)\, M_1(\mathrm{d}x) = \sum_{k=1}^{T} \int_A h(\Theta^k x)\, \mu(\mathrm{d}x) = T \int_A h(x)\, \mu(\mathrm{d}x)$$
and
$$\int_{A^{(2)}} h_2(x, y)\, M_2(\mathrm{d}x \times \mathrm{d}y) = \sum_{k=1}^{T} \sum_{\ell=1}^{T} \int_A h_2(\Theta^k x, \Theta^\ell x)\, \mu(\mathrm{d}x) = T \int_A h_2(x, x)\, \mu(\mathrm{d}x) + \sum_{r=1}^{T-1} (T - r) \int_A \bigl[h_2(x, \Theta^r x) + h_2(\Theta^r x, x)\bigr]\, \mu(\mathrm{d}x).$$
To verify the bounded diagonal growth condition for $M_2$ we should set $h_2$ in the second equation to be the indicator $I_{\|x - y\| \le \delta}$ and examine the behaviour of the integral as $T \to \infty$, $\delta \to 0$. Evidently, the critical feature will be the rate at which the distance $\|\Theta^r x - x\|$ increases with $r$. If the rate is fast enough, the bounded growth condition will hold. For large $r$, we expect the distribution of $\Theta^r x$ to approximate $\mu$, for $\mu$-almost all $x$. The rate at which this occurs is governed by mixing conditions on $\Theta$. When appropriate mixing conditions are satisfied, therefore, we expect the estimates to be consistent. Serinko (1994) gives details from a somewhat different point of view.
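For a concrete, purely illustrative instance of Example 13.6(e), the sketch below iterates the logistic map $\Theta(x) = 4x(1-x)$, whose invariant measure is the arcsine law with Rényi dimension $D_2 = 1$. The slope estimated over an accessible δ-range comes out below 1, because the endpoint singularities of the arcsine density inflate the correlation integral at moderate δ — itself an instance of the range-dependence of dimension estimates discussed above. The sample size, seed, and δ-range are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.random()              # random start: a stationary, ergodic orbit (a.s.)
orbit = np.empty(20000)
for n in range(orbit.size):
    orbit[n] = x
    x = 4.0 * x * (1.0 - x)   # logistic map Theta(x) = 4x(1 - x)

s = np.sort(orbit)

def corr_integral(delta):
    """Fraction of distinct pairs within delta, via binary search (1-d sample)."""
    hi = np.searchsorted(s, s + delta, side='left')
    lo = np.searchsorted(s, s - delta, side='right')
    pairs = (hi - lo - 1).sum() / 2.0            # distinct unordered pairs
    return pairs / (s.size * (s.size - 1) / 2.0)

d1, d2 = 0.01, 0.1
D2_hat = (np.log(corr_integral(d2)) - np.log(corr_integral(d1))) / np.log(d2 / d1)
print(round(D2_hat, 2))       # below the limiting value D2 = 1 at this range
```

The limit as δ → 0 is 1, but the logarithmic divergence of the squared arcsine density keeps finite-range estimates visibly lower — which of the effective exponents is reported depends on the range chosen.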
Exercises and Complements to Section 13.6
13.6.1 Multinomial measures. Let $b$ be a positive integer, and consider a division of the unit interval into $b$ successive subintervals of length $1/b$, then a further subdivision of each such subinterval into $b$ equal sub-subintervals, and so on. Starting with unit mass for the whole interval, at each stage of this process, divide the mass of any given interval among its component subintervals in proportions $\{p_1, p_2, \ldots, p_b\}$, with $\sum_{r=1}^{b} p_r = 1$. At the $n$th stage of this process, the subinterval corresponding to the $b$-adic expansion $0.\omega_1\omega_2\ldots\omega_n$ will have mass $\prod_{j=1}^{n} p_{\omega_j}$. Denote the corresponding probability distribution by $\mu_n$. By considering the values of the distribution function at points with finite $b$-adic expansions, or otherwise, show that the Rényi dimension of order $q$ of $\mu_n$ is approximately equal to $-\log_b\bigl[\sum_{r=1}^{b} p_r^q\bigr]/(q-1)$, and converges to this value as $n \to \infty$. Show also that as $n \to \infty$, the measures $\mu_n$ converge weakly to a limit $\mu$, and that the fractal dimensions converge to the fractal dimensions of the limit measure. Investigate conditions under which the $D_q$ are equal. [See, e.g., Harte (2001, Chapter 3).]
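The dimension formula in Exercise 13.6.1 can be checked numerically. At scale $\delta = b^{-n}$ the identity is in fact exact, since the interval masses are products and $\sum_i m_i^q = \bigl(\sum_r p_r^q\bigr)^n$. The sketch below compares the empirical value with the formula; $b$, $p$, $n$, and $q$ are illustrative choices.

```python
import numpy as np
from itertools import product

b, n, q = 2, 10, 2.0
p = np.array([0.3, 0.7])          # illustrative splitting proportions, sum to 1

# Masses of the b**n subintervals of the n-th stage multinomial measure:
# the word (w1, ..., wn) indexes the subinterval 0.w1 w2 ... wn (base b).
masses = np.array([np.prod(p[list(word)]) for word in product(range(b), repeat=n)])

delta = float(b) ** (-n)
Dq_empirical = np.log(np.sum(masses ** q)) / ((q - 1) * np.log(delta))
Dq_formula = -np.log(np.sum(p ** q)) / ((q - 1) * np.log(b))
print(round(Dq_empirical, 6), round(Dq_formula, 6))
```

Since $\sum_i m_i^q = (\sum_r p_r^q)^n$ by the multinomial expansion, the two printed values agree to floating-point precision at every stage $n$.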
13.6.2 Prove the equivalence of the relations [see below (13.6.2)] $D_k(\mu) = \eta$ and $\Pr\{M_k < y\} = \phi(y)\, y^{(k-1)\eta}$ for some function $\phi(y)$ such that $\log \phi(y)/\log y \to 0$ as $y \to 0$. [Hint: To establish the basic link between $C_k(\delta, A, \mu)$ and the distribution of $M_k$, condition on $X_k$ and then take expectations.]
13.6.3 For a given set $\{x_1, \ldots, x_k\}$ of $k$ distinct points, let the function
$$N_\delta(x_1, \ldots, x_k) = \sum_{i=1}^{k} \prod_{j \ne i} I_{S_\delta}(x_i - x_j)$$
count the number of points of the set with the property that the remaining $k - 1$ points of the set lie within distance $\delta$ of the selected point. Show that in this notation, (13.6.5) can be written as
$$C_{[k]}(\delta, A) = \sum_{\mathrm{comb}} \frac{N_\delta(x_1^*, \ldots, x_k^*)}{k} \bigg/ \binom{N(A)}{k},$$
the sum being taken over all distinct combinations $(x_1^*, \ldots, x_k^*)$ of observation points.
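A direct, brute-force implementation of this combinatorial estimate might look as follows; the point sample, $\delta$, and $k$ are illustrative choices, not from the text.

```python
import numpy as np
from itertools import combinations
from math import comb

rng = np.random.default_rng(3)
pts = [rng.random(2) for _ in range(30)]   # small illustrative sample in the unit square
delta, k = 0.3, 3

def N_delta(xs):
    """Number of points of the set whose companions all lie within delta of it."""
    return sum(
        all(np.linalg.norm(xs[i] - xs[j]) < delta for j in range(len(xs)) if j != i)
        for i in range(len(xs)))

# Sum N_delta/k over all distinct k-point combinations, normalized by C(N(A), k).
C_k = sum(N_delta(c) / k for c in combinations(pts, k)) / comb(len(pts), k)
print(round(C_k, 4))
```

For $k = 2$ the estimate reduces to the plain fraction of point pairs lying within distance $\delta$ of each other, which gives a quick consistency check on the implementation.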
13.6.4 Prove Lemma 13.6.II, assuming first that $h$ is of product form $\prod_{r=1}^{k} h_r(x_r)$ with each $h_r$ measurable and bounded on $A$, and extending via linear combinations to general $h$. [Hint: When $h$ is a product, (13.6.6) is just the product of limits from the one-dimensional case. To derive (13.6.7) take ratios of (13.6.5) first for the given $h$ and then for $h \equiv 1$. Because of the symmetry of $h$, each combination of distinct terms appears $k!$ times, and contributions from repeated arguments are of lower order and can be neglected in the limit.]
13.6.5 Suppose $N(\cdot \times \cdot)$ is an orderly space–time point process, stationary and ergodic with respect to time, observed over the bounded spatial region $A$, and with moment measures existing up to order $2k$. In Theorem 13.6.III(a) let $\Delta_k$ denote the signed measure $\Delta_k = M_k - m^k(\mu \times \ell)^{(k)}$ ($\ell$ denoting Lebesgue measure on the time axis), $\Delta_k^+$ its total variation, and let $V_{k,\delta}$ denote the restriction of the set $U_{k,\delta}$ defined in (13.6.3) to $A^{(k)}$. Say that $N$ has controlled diagonal growth of order $k$ if $\Delta_k$ satisfies the condition (controlling the bias in replacing the $k$th moment measure by its first-order approximation)
$$\Delta_k^+\bigl[V_{k,\delta} \times (0, T)^k\bigr] \le C\, T^{k-\alpha}\, \delta^{\beta}$$
for some positive constants $C$, $\alpha$, $\beta$ and sufficiently small $\delta$, $1/T$. Furthermore, say that $N$ has controlled bi-diagonal growth of order $k$ if $\Delta_{2k}$ satisfies the condition (controlling the growth of the variance)
$$\Delta_{2k}^+\bigl[V_{k,\delta}^{(2)} \times (0, T)^{2k}\bigr] \le C'\, T^{2k-2\eta}\, \delta^{2\nu}$$
for positive constants $C'$, $\eta$, $\nu$ and sufficiently small $\delta$, $1/T$. Show that if
(i) the stationary distribution $\mu$ over $A$ has $k$th order Rényi dimension $D_k(\mu)$,
(ii) $N$ has controlled diagonal and bi-diagonal growth of order $k$, with constants as above, and
(iii) $\delta$ is chosen to vary with $T$ in such a way that $\delta \to 0$ but $T^r \delta \to \infty$, where
$$r > \max\left\{ \frac{\alpha}{(k-1)D_k(\mu) - \beta},\ \frac{\eta}{(k-1)D_k(\mu) - \nu} \right\},$$
then the consistency result of Theorem 13.6.III(a) holds. [Hint: See Vere-Jones (1999, Proposition 2) for details.]
13.6.6 Let $N(\cdot \times \cdot)$ be orderly, stationary (homogeneous), and ergodic with respect to space; let $N$ be observed over a convex averaging sequence $\{A_n\}$ of sets in $\mathcal{X} = \mathbb{R}^d$, and suppose its moment measures up to order $2k$ exist. In Theorem 13.6.III(b), let $\Delta_{2k}^*$ denote the signed measure $\Delta_{2k}^* = M_{[2k]} - M_{[k]} \times M_{[k]}$, and $V_{k,\delta}^n$ the restriction of $U_{k,\delta}$ defined in (13.6.18) to the set $A_n$. Say that $N$ has controlled Palm growth of order $k$ if $\Delta_{2k}^*$ satisfies the condition (controlling the growth of the variance)
$$(\Delta_{2k}^*)^+\bigl(V_{k,\delta}^{n\,(2)}\bigr) \le K\, \ell(A_n)^{2k-2\eta}\, \delta^{2\nu}$$
for positive constants $(K, \eta, \nu)$ and sufficiently small $\delta$, $1/n$, where $\ell$ denotes Lebesgue measure. Show that if
(i) the limit $D_{[k]}^*$ of (13.6.12) exists,
(ii) $N$ has controlled Palm growth of order $k$, with constants as above, and
(iii) $\delta$ is chosen to vary with $T$ in such a way that $\delta \to 0$ but $T^r \delta \to \infty$, where
$$r > \frac{\eta}{(k-1)D_{[k]}^* - \nu},$$
then the consistency result of Theorem 13.6.III(b) holds. [Hint: See Vere-Jones (1999, Proposition 4) for details.]
13.6.7 Grassberger–Procaccia estimates.
(a) Suppose the correlation integral $C(k, \delta, \mu)$ of (13.6.2), where $\mu$ is the stationary spatial distribution under assumptions (a) of Proposition 13.6.I, shows power-law growth over a given interval $(a, b)$, so that
$$\frac{\log C(k, b, \mu) - \log C(k, a, \mu)}{\log b - \log a} = \eta, \quad \text{say}.$$
Show that under assumptions (a), replacing $C$ in the above expression by its sample counterpart (13.6.5) yields a consistent estimate of $\eta$.
(b) Formulate and prove a similar result under assumptions (b).
13.6.8 Further Neyman–Scott examples.
(a) Consider Example 13.6(d) but with homogeneous cluster centres and cluster structure determined by a spatial component $B$ that has a gamma distribution on a line with shape parameter $\alpha < \frac{1}{2}$, so that the density has a singularity at 0. Show that if $X$, $Y$ both have such a distribution, then $X - Y$ also has a singularity at 0, with initial power-law growth $\delta^{2\alpha}$. Show that the correlation integral (13.6.31) has three ranges of power-law growth, initially as $\delta^{2\alpha}$, then as $\delta$, and finally as $\delta^2$. Investigate the consequences for the correlation dimension estimates.
(b) Extend to $\mathbb{R}^3$, and investigate also the behaviour when both cluster centre and cluster structure components have linear or planar concentrations.
CHAPTER 14
Evolutionary Processes and Predictability
14.1 Compensators and Martingales 356
14.2 Campbell Measure and Predictability 376
14.3 Conditional Intensities 390
14.4 Filters and Likelihood Ratios 400
14.5 A Central Limit Theorem 412
14.6 Random Time Change 418
14.7 Poisson Embedding and Existence Theorems 426
14.8 Point Process Entropy and a Shannon–MacMillan Theorem 440
The ideas discussed in this chapter have transformed the study of point processes over the last few decades. They provide the background not only for the results on conditional intensities and likelihoods summarized in Chapter 7, but also for general theories of estimation, prediction and control that have been influential as much in the engineering as in the statistical communities. The introduction to Chapter 7 provides some references to the early literature. Last and Brandt (1995) give a thorough study of the theory for marked point processes; other recent texts covering related material include Asmussen (1987, 2003), Baccelli and Brémaud (1994, 2003) and Andersen et al. (1993).
Broadly speaking, the chapter provides a setting for the functionals that arise in describing the evolution, or 'dynamics', of a simple or marked point process evolving in time. This setting embraces certain broad structural features of a point process embodied in the Doob–Meyer decomposition, Theorem A3.4.IX, which is a basic tool. A point process $N(\cdot)$ on $\mathbb{R}^+$ is equivalent to the nondecreasing function $N(t) = N((0, t])$, for which it is plausible that we should be able to separate $N(\cdot)$ into a 'generally' increasing part $A(\cdot)$, its compensator (here, $A$ comes from accroissement or 'growing' part), and the 'unpredictable' variable part $M(\cdot)$; that is, $N(t) = A(t) + M(t)$. The essence of the Doob–Meyer result is that such a decomposition is possible with $A(\cdot)$ 'predictable' and $M(\cdot)$ a martingale.
The notion of predictability which arises here is crucial for the development of a rigorous theory; it forms one part of the so-called 'general theory of
processes’, as set out in Dellacherie and Meyer (1978), for example. A brief outline of some key points is provided in Appendix 3. Section 14.1 gives a general introduction to the notions of compensator and point-process martingale, and their links to the Doob–Meyer theorem. In Section 14.2 we extend these concepts to random measures and marked point processes. Section 14.3 re-introduces the conditional intensity, which appears here as a Radon–Nikodym derivative of Campbell measure on Ω × R, when a certain absolute continuity condition is satisfied. Later sections discuss various applications, including the likelihood and time-change theorems treated more informally in Sections 7.2 and 7.4, a martingale-type central limit theorem, and the notion of entropy rate which lies behind the discussion in Section 7.6.
14.1. Compensators and Martingales
The aim of this section is to introduce the basic ideas of the martingale approach to the study of point processes on the open half-line¹ $\mathbb{R}_0^+ \equiv (0, \infty)$. Many of the technicalities are summarized in Appendix 3, so as to allow scope here to stress the connections with other aspects of point process theory. In much of the discussion we are concerned with a random measure on $\mathbb{R}_0^+$, for although the case of point processes is of paramount importance, the more general theory can be covered with little extra effort. Thus, general results are stated in terms of the cumulative process $\xi(t) \equiv \xi(t, \omega) \equiv \xi((0, t], \omega)$ for some random measure $\xi(\cdot, \omega)$ (we use $\xi$ for both, but the abuse of notation should not lead to difficulties). Observe that such processes have trajectories that are a.s. monotonic increasing and right-continuous, as is true in particular for the counting processes $N(t) \equiv N(t, \omega) \equiv N((0, t], \omega)$ of a point process $N(\cdot, \omega)$ on $\mathbb{R}^+$. A more important extension, taken up in Section 14.2, is to multivariate and marked point processes. Because the mark may include a spatial location as well as a size or indicator variable, this extension also includes space–time processes. In such cases, it is necessary to consider, not just a single cumulative process, but a family of cumulative processes indexed by the bounded Borel sets of the mark space. To facilitate the discussion of martingale properties, we suppose throughout the chapter that, unless otherwise stated, the point processes and random measures in view have boundedly finite first moment measures, or finite mean rates in the case of a stationary process.
The 'information available at time $t$' is represented mathematically by a σ-algebra $\mathcal{F}_t$ of sets from the underlying probability space $(\Omega, \mathcal{E}, \mathrm{P})$. The
¹ We use $\mathbb{R}^+ = [0, \infty)$ and $\mathbb{R}_0^+ = (0, \infty)$ to distinguish the closed and open half-lines.
accumulation of information with time is reflected in $\mathcal{F}_t$ being a member of an increasing family $\mathcal{F} = \{\mathcal{F}_s: 0 \le s < \infty\}$ of σ-algebras; that is, $\mathcal{F}_s \subseteq \mathcal{F}_t$ for $0 \le s \le t < \infty$. $\mathcal{F}$ is called a history for the process $\xi$ provided $\xi(t)$ is $\mathcal{F}_t$-measurable for all $0 \le t < \infty$ (i.e., $\xi$ is $\mathcal{F}$-adapted). $\mathcal{F}_0$ plays the special role of subsuming all information available before observations commence at $0+$, and the σ-algebra $\mathcal{F}_\infty \equiv \bigvee_{t \ge 0} \mathcal{F}_t$ subsumes all information in the history $\mathcal{F}$. For the rest of this section 'the pair $(\xi, \mathcal{F})$' or 'the process $(\xi, \mathcal{F})$' always means a cumulative process $\xi$ and a history $\mathcal{F}$ such that $\xi$ is $\mathcal{F}$-adapted. The history $\mathcal{H}$ consisting of the σ-algebras $\mathcal{H}_t$ generated for each $t$ by $\{\xi(s): 0 < s \le t\}$ plays a special role: we call it the internal history (it is also called the natural or minimal history, reflecting the fact that $\mathcal{H}$ is the smallest family of nested σ-algebras to which the observed values of $\xi$ are adapted). Note that $\mathcal{H}_0 = \{\emptyset, \Omega\}$. Histories with the particular structure $\mathcal{F}_t = \mathcal{F}_0 \vee \mathcal{H}_t$, with $\mathcal{F}_0$ in general nontrivial, are called by Brémaud intrinsic histories; among other uses, they are important in the analysis of doubly stochastic processes. The special considerations which are associated with a stationary process observed over a finite or infinite past are examined as part of the discussion of complete conditional intensities in Sections 14.3 and 14.7. A history $\mathcal{F}$ is called right-continuous if $\mathcal{F}_t = \bigcap_{s > t} \mathcal{F}_s \equiv \mathcal{F}_{t+}$. Counting processes and other cumulative processes, being right-continuous and boundedly finite by assumption, necessarily yield internal histories that are right-continuous. In general, right-continuity represents a mild constraint on the admissible forms of conditioning information; in any case, whenever the process is adapted to the history $\mathcal{F}$, it is adapted also to the right-continuous history $\mathcal{F}_{(+)} \equiv \{\mathcal{F}_{t+}: 0 \le t < \infty\}$ (see Exercise 14.1.3). It is part of our basic framework that the realizations of the cumulative process are a.s.
finite for finite $t$, for this reflects our assumption that, as random measures, the trajectories are a.s. boundedly finite (elements of $\mathcal{M}^\#_{\mathbb{R}^+}$). This assumption rules out the possibility of explosions, and imposes a certain requirement on the sequence of $\mathcal{F}$-stopping times defined for $n = 0, 1, \ldots$ (see Definition A3.3.II and Lemma A3.3.III for background) by
$$T_n \equiv T_n(\omega) = \sup\{t: \xi(t, \omega) < n\} = \begin{cases} \infty & \text{if } \xi(t, \omega) < n \text{ for all } 0 < t < \infty, \\ \inf\{t: \xi(t, \omega) \ge n\} & \text{otherwise,} \end{cases} \tag{14.1.1}$$
namely, that $T_n \to \infty$ a.s. as $n \to \infty$. Exercise 14.1.5 addresses the case that the sequence is finite; otherwise we suppose the sequence continues indefinitely.
A feature of the general theory of processes is that the family $\xi(t, \omega)$ is regarded as a single real-valued mapping $\xi: \mathbb{R}^+ \times \Omega \to \mathbb{R}^+$ rather than as an indexed family of r.v.s. The product σ-algebra $\mathcal{B}(\mathbb{R}^+) \otimes \mathcal{E}$ of sets from the product space $\mathbb{R}^+ \times \Omega$ contains a hierarchy of important sub-σ-algebras, each of which is associated with a corresponding class of processes, namely, the measurable, progressively measurable, and predictable processes; these are defined and discussed briefly in Section A3.3.
In this section we need especially the concept of an F -predictable process X, which is a process measurable with respect to the F -predictable σ-algebra, which in turn is the sub-σ-algebra of B(R+ ) ⊗ E generated by all product sets of the form (s, t] × U for U ∈ Fs and 0 ≤ s < t < ∞ (see above Lemma A3.3.I). The main result of this section is a theorem which asserts that, for every pair (ξ, F ), where ξ is F -adapted, there exists an integrated form of the conditional intensity function described in Section 7.2 that is predictable, and is the key to a martingale property extending the result of Lemma 7.2.V. In general, what must be subtracted from an increasing process ξ to yield a martingale is called a compensator; it is formally defined as follows. Definition 14.1.I. Let ξ(t) be an F -adapted cumulative process on R+ . An F -compensator for ξ is a monotonic nondecreasing right-continuous predictable process A(·) such that for each n and F -stopping time Tn at (14.1.1), the stopped process {ξ(t ∧ Tn ) − A(t ∧ Tn ): 0 ≤ t < ∞} is an F -martingale. In passing, note that a process {D(t): 0 ≤ t < ∞} such that {D(t ∧ Tn ): 0 ≤ t < ∞} is a martingale for some sequence of stopping times {Tn } for which E|D(t ∧ Tn )| < ∞ (all t ≥ 0, n = 1, 2, . . .), is often called a local martingale. The notion occurs repeatedly in more general treatments of point processes [e.g., Liptser and Shiryaev (1974, 1977, 1978, 2000)]. See also Exercise 14.1.7. Example 14.1(a) below, although trivial, illustrates the fact that the compensator is effectively of interest only for processes with jumps: indeed, as subsequent examples illustrate, the compensator may be regarded as a device for smoothing out jumps and producing an a.s. diffuse random measure from a random measure that may have atoms, but has no fixed atoms. Example 14.1(a) Cumulative process with density. 
Suppose $\xi(\cdot)$ is the cumulative process of an absolutely continuous random measure, with density $x(t, \omega)$ some $\mathcal{F}$-progressively measurable nonnegative process. Then $\xi(t) = \int_0^t x(u)\, \mathrm{d}u$ is its own compensator, for, because $x(\cdot, \omega) \ge 0$, $\xi$ is monotonic nondecreasing and continuous, and for this reason predictable, inasmuch as it is both $\mathcal{F}$-adapted and left-continuous.
Given a pair $(\xi, \mathcal{F})$ as in Definition 14.1.I, the first problem is to give conditions that ensure that a compensator for $\xi$ exists. We start with the simplest example, a one-point process consisting of a single point whose location is defined by a positive r.v. $X$ with d.f. $F$ [see Example 7.4(b)]. The associated counting process is defined by
$$N(t, \omega) = I_{(0,t]}(X(\omega)) \qquad (0 < t < \infty,\ \omega \in \Omega). \tag{14.1.2}$$
If we let $\mathcal{F}$ coincide with the internal history of the process, that is, $\mathcal{H}_t \in \mathcal{H}$ is the σ-algebra generated by the sets $\bigl\{\{\omega: X(\omega) \le s\}: 0 < s \le t\bigr\}$, we can give a direct construction of the $\mathcal{H}$-compensator without any need to appeal to the deeper theorems of the general theory of processes.
Observe first that $N$ is monotonic nondecreasing, right-continuous, and even uniformly bounded, so there is no problem about the existence of moments. Next, because $N(t, \omega) = 1$ implies $N(t', \omega) = 1$ for all $t' \ge t$, the compensator, for the same $\omega$, must also be constant for such $t' \ge t$. On the other hand, if $N(t, \omega) = 0$, then we know that $X(\omega) > t$, and thus, in a small interval $(t, t + \mathrm{d}t)$, we can expect
$$\mathrm{E}[\mathrm{d}N(t, \omega)] \approx \frac{\mathrm{d}F(t)}{1 - F(t)}, \tag{14.1.3}$$
which equals $h(t)\, \mathrm{d}t$ if the d.f. has a density and hence a hazard function $h$. These heuristics are approximately correct; the key to obtaining a precise statement is the integrated hazard function (IHF) of Definition 4.6.IV.
Lemma 14.1.II. The one-point process $N$ at (14.1.2) generated by the positive r.v. $X$ has $\mathcal{H}$-compensator
$$A(t, \omega) = H(t \wedge X(\omega)) \qquad (0 < t < \infty,\ \omega \in \Omega), \tag{14.1.4a}$$
where $H$ is the IHF of $X$,
$$H(t) = \int_0^t \frac{\mathrm{d}F(x)}{1 - F(x-)}. \tag{14.1.4b}$$
The compensator $A(t)$ so defined is continuous except at jumps $u_i$ of $F$, where in terms of $\Delta F(u) = F(u) - F(u - 0)$, $A(\cdot)$ has jumps of height
$$a_i = \Delta F(u_i)/[1 - F(u_i-)] \le 1, \tag{14.1.5}$$
with equality if and only if $\Delta F(u_i) = 1 - F(u_i-)$.
Proof. Note that for each $\mathcal{H}_t$ the set $\{\omega: X(\omega) > t\}$ constitutes a large 'atom' (i.e., a subset of $\Omega$ that cannot be decomposed by the σ-algebra). Of course, $\mathcal{H}_0 = \{\emptyset, \Omega\}$, whereas $\mathcal{H}_\infty$ is the σ-algebra generated by the r.v. $X$. Like $H$ itself (see Definition 4.6.IV), $H(t \wedge X(\omega))$ is monotonic increasing and right-continuous in $t$. To verify that it is predictable, we first check that $X(t, \omega) \equiv t \wedge X(\omega)$ is predictable, so we study
$$\{(t, \omega): X(t, \omega) > x\} = \{t > x\} \times \{\omega: X(\omega) > x\}.$$
Now $\{\omega: X(\omega) > x\} \in \mathcal{H}_x$, so the set in $(t, \omega)$ has the form of a generating set for the predictable σ-algebra. Thus, $X(t, \omega)$ is predictable. The IHF $H(x)$ is monotonic increasing and right-continuous in $x$ and thus has a uniquely defined inverse $H^{-1}$ for which $H(x) \ge y$ if and only if $x \ge H^{-1}(y)$. In particular, $\{X(t, \omega) \ge H^{-1}(y)\}$ is a predictable set, so $H(X(t, \omega))$ is a predictable process.
It remains to verify the martingale property that for fixed $s$ and $t$ with $0 \le s < t$,
$$\mathrm{E}[N(t \wedge X) - H(t \wedge X) \mid \mathcal{H}_s] = N(s \wedge X) - H(s \wedge X) \quad \text{a.s.} \tag{14.1.6}$$
Note first that, because of the special structure of $\mathcal{H}_t$ here, we have for any bounded function $g(\cdot)$,
$$\mathrm{E}[g(X) \mid \mathcal{H}_t] = \begin{cases} g(X) & \text{on } \{X(\omega) \le t\}, \\[4pt] \dfrac{\mathrm{E}[g(X) I_{\{X > t\}}]}{\mathrm{E}(I_{\{X > t\}})} & \text{on } \{X(\omega) > t\}, \end{cases} \tag{14.1.7}$$
because when $X > t$, $\mathrm{E}[g(X) \mid \mathcal{H}_t] = \mathrm{E}[g(X) \mid X > t]$, so the second case of (14.1.7) can be written in terms of the d.f. $F(\cdot)$ of $X$ as
$$\mathrm{E}[g(X) \mid \mathcal{H}_t] = \frac{1}{1 - F(t)} \int_t^\infty g(u)\, F(\mathrm{d}u) \qquad \text{on } \{t < X(\omega)\}.$$
On $\{s \ge X(\omega)\}$, $N(t \wedge X) = N(s \wedge X) = 1$ and $H(t \wedge X) = H(s \wedge X) = H(X)$, which for $X \le s$ is $\mathcal{H}_s$-measurable, so (14.1.6) holds in this case. On the complement where $\{s < X(\omega)\}$, using (14.1.7),
$$\mathrm{E}[N(t \wedge X) \mid \mathcal{H}_s] = \frac{1}{1 - F(s)} \int_s^t F(\mathrm{d}u) = \frac{F(t) - F(s)}{1 - F(s)},$$
so that from Lemma 4.6.I we obtain
$$[1 - F(s)]\bigl[\mathrm{E}(H(t \wedge X) \mid \mathcal{H}_s) - H(s)\bigr] = \int_s^t [H(u) - H(s)]\, F(\mathrm{d}u) + [H(t) - H(s)][1 - F(t)] = F(t) - F(s).$$
Thus, $\mathrm{E}[N(t \wedge X) - H(t \wedge X) \mid \mathcal{H}_s] = -H(s)$ on $\{s < X(\omega)\}$, so (14.1.6) holds generally as asserted.
It is a standard property of the distribution function $F$ of an honest positive r.v. that $F(x) = F_a(x) + F_c(x)$ for a purely atomic function $F_a$ and a continuous function $F_c$ (see above Lemma A1.6.II). Similarly, the IHF $H$ of $F$, being a monotonic nondecreasing function, can be decomposed as $H(x) = H_a(x) + H_c(x)$, with
$$H_a(x) + H_c(x) = \int_0^x \frac{\mathrm{d}F_a(u) + \mathrm{d}F_c(u)}{1 - F(u-)} = \sum_{i: u_i \le x} a_i + \int_0^x \frac{\mathrm{d}F_c(u)}{1 - F(u-)},$$
where on the right-hand side the sum is atomic and the integral is a continuous function of $x$. The asserted nature of $A(\cdot)$ follows.
Example 14.1(b) One-point process with absolutely continuous or discontinuous $\mathcal{H}$-compensator. Suppose first $X$ above has an exponential distribution, say $F(x) = 1 - e^{-\lambda x}$, so that its IHF is $H(t) = \lambda t$. Then the corresponding one-point process has $A(t) = \lambda \min(t, X)$, which is differentiable except at $X$, and in any case is absolutely continuous with density
$$\lambda^*(t) = \begin{cases} \lambda & (t \le X), \\ 0 & (t > X). \end{cases}$$
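Since the stopped differences $N(t \wedge T_n) - A(t \wedge T_n)$ form a martingale starting from 0, the compensator in the exponential case satisfies $\mathrm{E}[N(t) - \lambda(t \wedge X)] = 0$ at each fixed $t$. A Monte Carlo sketch of this zero-mean property ($\lambda$, $t$, and the sample size are illustrative choices):

```python
import numpy as np

# Monte Carlo check that E[N(t) - A(t)] = 0 for the one-point exponential
# process: N(t) = I{X <= t} and A(t) = lambda * min(t, X).
rng = np.random.default_rng(4)
lam, t, n = 1.5, 2.0, 200_000
X = rng.exponential(1.0 / lam, n)

N_t = (X <= t).astype(float)         # counting process at time t
A_t = lam * np.minimum(t, X)         # compensator lambda * (t AND X)
print(round(np.mean(N_t - A_t), 3))  # close to 0
```

Analytically, $\mathrm{E}[N(t)] = 1 - e^{-\lambda t}$ and $\mathrm{E}[\lambda(t \wedge X)] = \lambda \cdot (1 - e^{-\lambda t})/\lambda$, so the two expectations agree exactly; the simulation only confirms this up to sampling error.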
Now suppose that $X = 1$ with probability $p$, and with probability $1 - p$ is exponential as above, so that $X$ has survivor function
$$S(t) = \begin{cases} p + (1 - p)e^{-\lambda t} & (t < 1), \\ (1 - p)e^{-\lambda t} & (t \ge 1). \end{cases}$$
Then from (4.6.4) $X$ has IHF
$$H(t) = \begin{cases} \lambda t - \log\bigl[1 + p(e^{\lambda t} - 1)\bigr] & (t < 1), \\ \lambda t & (t \ge 1), \end{cases}$$
with a jump of size $\Delta H = \log\bigl[1 + p(e^{\lambda} - 1)\bigr]$ at $t = 1$. The compensator is given by $A(t) = H(t \wedge X)$, and thus has a discontinuity at $t = 1$ if $X \ge 1$. The risk is reduced in the interval $(0, 1)$ because there is a positive probability that the event will occur at the end of the interval rather than randomly during the interval. Typically, discontinuities in the compensator are associated with the occurrence of deterministic elements such as fixed atoms as around (14.1.5). See Exercise 14.1.10 for further examples.
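A small numeric check of the displayed IHF (with illustrative values of $\lambda$ and $p$, not from the text): the two branches meet across the jump of size $\Delta H$ at $t = 1$, and on $(0, 1)$ the IHF lies strictly below the plain exponential IHF $\lambda t$, reflecting the reduced risk.

```python
import numpy as np

lam, p = 1.0, 0.3   # illustrative parameter values

def H_cont(t):
    # Continuous branch of the IHF on (0, 1): lam*t - log(1 + p(e^{lam t} - 1)).
    return lam * t - np.log1p(p * np.expm1(lam * t))

jump = np.log1p(p * np.expm1(lam))   # Delta H at t = 1

# The jump exactly bridges H(1-) to the value lam*1 of the t >= 1 branch,
# and on (0, 1) the IHF sits below the plain exponential IHF lam*t.
print(bool(abs(H_cont(1.0) + jump - lam) < 1e-12))   # True
ts = np.linspace(0.1, 0.9, 9)
print(bool(np.all(H_cont(ts) < lam * ts)))           # True
```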
An important extension of Example 14.1(b) can be given when $\mathcal{F}$ is an intrinsic history for the one-point process and so consists of σ-algebras of the form $\mathcal{F}_t = \mathcal{F}_0 \vee \mathcal{H}_t$. At least in the case that $X$ has a regular conditional probability distribution given $\mathcal{F}_0$, a version of which we denote $F(\cdot \mid \mathcal{F}_0)$, the influence of $\mathcal{F}_0$ can be described very simply: all we need to do in Lemma 14.1.II is to replace the distribution of $X$ and its IHF by this conditional distribution $F(\cdot \mid \mathcal{F}_0)$ and its associated IHF,
$$H(t \mid \mathcal{F}_0) = \int_0^t \frac{\mathrm{d}F(u \mid \mathcal{F}_0)}{1 - F(u- \mid \mathcal{F}_0)}.$$
Lemma 14.1.III. A one-point process with prior σ-algebra $\mathcal{F}_0$ and regular conditional distribution $F(\cdot \mid \mathcal{F}_0)$ for $X$ has compensator $H(t \mid \mathcal{F}_0)$ relative to the intrinsic history $\mathcal{F}_t = \mathcal{F}_0 \vee \mathcal{H}_t$.
Proof. Note first that because $\mathcal{F}_t \supseteq \mathcal{F}_0$, $\mathrm{E}(\,\cdot \mid \mathcal{F}_0) = \mathrm{E}\bigl[\mathrm{E}(\,\cdot \mid \mathcal{F}_t) \mid \mathcal{F}_0\bigr]$. We now claim that for nonnegative measurable functions $g: \mathbb{R} \to \mathbb{R}$,
$$\mathrm{E}[g(X) \mid \mathcal{F}_t] = \begin{cases} g(X) & \text{on } \{X(\omega) \le t\}, \\[4pt] \dfrac{\int_t^\infty g(u)\, F(\mathrm{d}u \mid \mathcal{F}_0)}{1 - F(t \mid \mathcal{F}_0)} & \text{on } \{X(\omega) > t\}. \end{cases} \tag{14.1.8}$$
The first part of (14.1.8) is obvious, while on $\{X(\omega) > t\}$, $\mathcal{F}_t$ consists entirely of sets of the form $U \cap \{X(\omega) > t\}$ for some $U \in \mathcal{F}_0$. In this case we can write
$$\int_{U \cap \{X > t\}} g(X(\omega))\, \mathrm{P}(\mathrm{d}\omega) = \int_U I_{\{X > t\}}\, g(X(\omega))\, \mathrm{P}(\mathrm{d}\omega) = \int_U \mathrm{E}\bigl[g(X) I_{\{X > t\}} \mid \mathcal{F}_0\bigr]\, \mathrm{P}(\mathrm{d}\omega) = \int_U I_{\{X > t\}}\, \frac{\mathrm{E}[g(X) I_{\{X > t\}} \mid \mathcal{F}_0]}{\mathrm{E}(I_{\{X > t\}} \mid \mathcal{F}_0)}\, \mathrm{P}(\mathrm{d}\omega). \tag{14.1.9}$$
The first expression in this chain reduces to the left-hand side of (14.1.8) and, from the assumption that $F(\cdot \mid \mathcal{F}_0)$ is a version of the regular conditional distribution, the last expression reduces to the right-hand side of (14.1.8) on $\{X(\omega) > t\}$ as asserted.
This result can now be used in place of (14.1.6) to establish the compensator property of the conditional IHF, provided at least that we can manipulate the conditional distributions in the same way as unconditional distributions: this is certainly the case when we can choose a regular version of the conditional distribution.
Example 14.1(c) One-point process with randomized hazard function. To take a specific example, suppose that $X$ has a negative exponential distribution with parameter $\lambda$, where $\lambda$ itself is a positive r.v. determined by $\mathcal{F}_0$ (i.e., $\lambda$ is $\mathcal{F}_0$-measurable). Then the $\mathcal{F}$-compensator, $A_{\mathcal{F}}(t, \omega)$ say, can be represented in terms of the IHF of the exponential($\lambda$) distribution, namely, $A_{\mathcal{F}}(t, \omega) = \lambda(t \wedge X(\omega))$. On the other hand, to find the $\mathcal{H}$-compensator we must first evaluate the survivor function for the resultant mixed exponential distribution. If, for example, $\lambda$ itself has a unit exponential distribution with density $e^{-\lambda}\, \mathrm{d}\lambda$, then the unconditional survivor function is
$$\int_0^\infty e^{-\lambda t}\, e^{-\lambda}\, \mathrm{d}\lambda = \frac{1}{1 + t}.$$
The IHF is therefore equal to $\log(1 + t)$, and for the $\mathcal{H}$-compensator we obtain $A_{\mathcal{H}}(t, \omega) = \log(1 + t \wedge X(\omega))$. Such examples show that the choice of prior σ-algebra can drastically affect the form of the compensator.
We can now construct the compensator for a simple point process with respect to the intrinsic history $\mathcal{F} = \{\mathcal{F}_0 \vee \mathcal{H}_t: 0 < t < \infty\}$; that is, we allow some initial conditioning as in the last example. Such a history $\mathcal{F}$ is completely described by the initial σ-algebra $\mathcal{F}_0$ and the family of stopping times $\{T_n\}$ as at (14.1.1): in view of the assumed simplicity, $\{T_n\}$ is a.s. a strictly increasing sequence. Given $\mathcal{F}_{(n-1)} \equiv \mathcal{F}_{T_{n-1}}$,
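Both compensators of Example 14.1(c) can be checked by Monte Carlo: at any fixed $t$, $\mathrm{E}[N(t) - A_{\mathcal{F}}(t)] = \mathrm{E}[N(t) - A_{\mathcal{H}}(t)] = 0$, even though the two processes $A_{\mathcal{F}}$ and $A_{\mathcal{H}}$ differ path by path. The sketch below is illustrative ($t$ and the sample size are arbitrary choices).

```python
import numpy as np

# Monte Carlo check for Example 14.1(c): lambda ~ Exp(1), X | lambda ~ Exp(lambda).
# F-compensator: A_F(t) = lambda * min(t, X);  H-compensator: A_H(t) = log(1 + min(t, X)).
rng = np.random.default_rng(5)
n, t = 400_000, 3.0
lam = rng.exponential(1.0, n)
X = rng.exponential(1.0 / lam)          # array scale: one X per lambda

N_t = (X <= t).astype(float)
A_F = lam * np.minimum(t, X)
A_H = np.log1p(np.minimum(t, X))
print(round(np.mean(N_t - A_F), 2), round(np.mean(N_t - A_H), 2))
```

Analytically, $\mathrm{E}[N(t)] = t/(1+t)$, and a short integration by parts shows $\mathrm{E}[\log(1 + t \wedge X)] = t/(1+t)$ as well; the simulation confirms both differences are zero up to sampling error.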
14.1. Compensators and Martingales
which means we are given F0 and T1, . . . , Tn−1, choose a family of regular conditional distributions Gn(x | F(n−1)) for the distributions of the successive differences τn = Tn − Tn−1 (n = 1, 2, . . . ; T0 ≡ 0). Writing

$$N(t) = \sum_{n=1}^{\infty} [N(t \wedge T_n) - N(t \wedge T_{n-1})] = \sum_{n=1}^{\infty} N^{(n)}(t) \quad\text{say},$$

each N^{(n)}(·) is a one-point process with a single point of increase at Tn. Defining now the IHFs Hn(·) ≡ Hn(· | F(n−1)) from the conditional d.f.s Gn(· | F(n−1)) by

$$H_n(x \mid \mathcal F_{(n-1)}) = \int_0^x \frac{G_n(du \mid \mathcal F_{(n-1)})}{1 - G_n(u- \mid \mathcal F_{(n-1)})}\,,$$

we assert that the F-compensator for N^{(n)}(·) has the form

$$A^{(n)}(t, \omega) = \begin{cases} 0 & \text{on } t < T_{n-1}(\omega),\\ H_n(t - T_{n-1}) & \text{on } T_{n-1}(\omega) \le t < T_n(\omega),\\ H_n(T_n - T_{n-1}) & \text{on } T_n(\omega) \le t. \end{cases} \qquad(14.1.10)$$

Then by additivity, N(·) has the F-compensator

$$A(t, \omega) = \sum_{n=1}^{\infty} A^{(n)}(t, \omega).$$
To establish (14.1.10), note that predictability of A^{(n)}(·) is established as in Lemma 14.1.II, so it remains to show that each difference Z^{(n)}(t, ω) ≡ N^{(n)}(t, ω) − A^{(n)}(t, ω) is an F-martingale. We establish the requisite equality

$$E[Z^{(n)}(t) \mid \mathcal F_s] = Z^{(n)}(s) \qquad(14.1.11)$$

for 0 < s ≤ t separately on the sets Bn = {ω: Tn−1(ω) ≤ s} and Bn^c, observing that both Bn and Bn^c ∈ F(n−1). Considering first the subsets of Bn, we have

$$\mathcal F_s \cap \{T_{n-1}(\omega) \le s < T_n(\omega)\} = \mathcal F_{(n-1)} \cap \{T_{n-1}(\omega) \le s < T_n(\omega)\}, \qquad(14.1.12)$$

which means that, given any C ∈ Fs, there exists C′ ∈ F(n−1) such that C ∩ Bn = C′ ∩ Bn and conversely: that this is so is clear from the structure of the σ-algebra Fs (because Fs ⊃ Hs) and a consideration of the basic sets such as {ω: N(s, ω) = k}. Now on Bn, the stopping time τn plays the same role for N^{(n)}(·) as X plays for the one-point process of Example 14.1(b), with
14. Evolutionary Processes and Predictability
F(n−1) here playing the role of F0 there. In particular, on Bn we have for any bounded measurable function f(·)

$$E[f(\tau_n) \mid \mathcal F_s] = \frac{E[f(\tau_n) I_{\{\tau_n > x_n\}} \mid \mathcal F_{(n-1)}]}{E[I_{\{\tau_n > x_n\}} \mid \mathcal F_{(n-1)}]} = \frac{\int_{x_n}^{\infty} f(u)\, G_n(du \mid \mathcal F_{(n-1)})}{1 - G_n(x_n \mid \mathcal F_{(n-1)})}\,,$$
where xn = s − Tn−1, necessarily ≥ 0 on Bn. In principle, this evaluation of the conditional expectation involves the extension of (14.1.8) to the case where t there (equal to xn here) is a r.v. measurable with respect to the prior σ-algebra F0 there (which is F(n−1) here). However, scrutiny of (14.1.9) and the surrounding argument shows that nothing need be altered there, with (14.1.8) remaining F0-measurable, so (14.1.9) is still valid. Thus, on Bn, proof of the martingale equality (14.1.11) follows as in Lemma 14.1.III. On the sets {s < t < Tn−1(ω)} and {s ≥ Tn(ω)} the equality is trivial because all terms are zero. There remains the case {s < Tn−1(ω) ≤ t}. Here we proceed by conditioning first on F(n−1), when equality follows as a special case of the above, because this equality is not affected by further conditioning on Fs. We summarize this discussion as follows.

Theorem 14.1.IV. Let N be the counting process of a simple point process on (0, ∞), F a history for N of the form {F0 ∨ Ht}, and {Tn} the sequence of stopping times at (14.1.1). Suppose there exist regular versions Gn(· | F(n−1)) of the conditional d.f.s of the intervals τn = Tn − Tn−1, given F(n−1), such that 1 − Gn(x−) > 0 for x > 0. Then a version of the F-compensator for N is given by

$$A(t, \omega) = \sum_{n=1}^{\infty} A^{(n)}(t, \omega), \qquad(14.1.13a)$$

where

$$A^{(n)}(t, \omega) = \begin{cases} 0 & (t \le T_{n-1}(\omega)),\\[4pt] \displaystyle\int_0^{\tau_n \wedge (t - T_{n-1})^+} \frac{G_n(du \mid \mathcal F_{(n-1)})}{1 - G_n(u- \mid \mathcal F_{(n-1)})} & (t > T_{n-1}(\omega)). \end{cases} \qquad(14.1.13b)$$
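To make the construction concrete (this illustration is not from the text): for a renewal process the conditional d.f.s Gn do not depend on F(n−1), and (14.1.13b) reduces to accumulating the lifetime IHF H(u) = −log(1 − G(u)) over each completed interval plus the current age. A minimal Python sketch, using the exponential and uniform(0, 2) lifetime d.f.s as assumed examples:

```python
import math

def compensator(t, event_times, ihf):
    """Evaluate A(t) as in (14.1.13b) for a renewal process: each completed
    interval contributes H(tau_n), and the current interval contributes
    H(t - T_{N(t)})."""
    A, prev = 0.0, 0.0
    for T in event_times:
        if T > t:
            break
        A += ihf(T - prev)       # completed interval of length tau_n
        prev = T
    if t > prev:
        A += ihf(t - prev)       # age of the interval in progress
    return A

# Exponential(lam) lifetimes: H(u) = lam*u, so A(t) = lam*t whatever the path.
lam = 2.0
H_exp = lambda u: lam * u
assert abs(compensator(2.0, [0.5, 1.2, 3.0], H_exp) - lam * 2.0) < 1e-9

# Uniform(0, 2) lifetimes, as in Figure 14.1: H(u) = -log(1 - u/2).
H_unif = lambda u: -math.log(1.0 - u / 2.0)
print(compensator(1.5, [0.6, 1.1], H_unif))
```

For the exponential case the path dependence cancels exactly, which is the content of Watanabe's theorem below.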
The following special case ties in the result above with the earlier discussion of Section 7.2, in particular with Proposition 7.2.I and Definition 7.2.II.

Corollary 14.1.V. The F-compensator A(·) at (14.1.13) is absolutely continuous a.s. if and only if the conditional d.f.s Gn(· | F(n−1)) have absolutely continuous versions, with densities gn(· | F(n−1)) say, in which case one version of the F-compensator is given by

$$A(t, \omega) = \int_0^t \lambda^*(u, \omega)\, du,$$

where

$$\lambda^*(t, \omega) = \sum_{n=1}^{\infty} \lambda_n^*(t, \omega) \equiv \sum_{n=1}^{\infty} \frac{g_n(t \wedge T_n - T_{n-1} \mid \mathcal F_{(n-1)})}{1 - G_n(t \wedge T_n - T_{n-1} \mid \mathcal F_{(n-1)})}\, I_{\{T_{n-1} < t \le T_n\}}.$$
In particular, the compensator has this form when N is regular; in this case λ∗(t, ω) defined above coincides a.s. and t-a.e. with the function denoted λ∗(·) in Definition 7.2.II.

The construction also yields the following important result.

Proposition 14.1.VI. Under the conditions of Theorem 14.1.IV, the point process defines the compensator uniquely, and, conversely, the compensator uniquely defines the process.

Proof. It is clear from the construction that the compensator is uniquely defined by the process. Within the class of processes referred to in Theorem 14.1.IV, the converse is also true, for, step by step, the compensator determines the functions Gn and hence ultimately the full set of fidi distributions for the process. Some further details are given in Exercises 14.1.13–14.

As noted in Proposition 7.2.IV, this result has the corollary that if the compensator is absolutely continuous, then the conditional intensity function also characterizes the point process uniquely.

As an example to illustrate how the reconstruction of the point process from its compensator proceeds, consider Watanabe's characterization of the Poisson process, the first and still one of the most striking applications of martingale ideas to point processes. Watanabe's proof used martingale calculus ideas of the type that we develop further in ensuing sections. Here we treat it as a simple special case of Proposition 14.1.VI and its corollary noted above.

Example 14.1(d) Watanabe's theorem (Watanabe, 1964). Suppose that the conditional intensity is a constant, λ∗(t) ≡ λ, so A(t, ω) = λt. Then there is no dependence on the past, and an induction argument shows that all the conditional d.f.s Gn(u | F(n−1)) of (14.1.13) have the same form 1 − e^{−λu}, and that the random variables to which they relate are mutually independent. Thus successive intervals are independently and exponentially distributed, and the process must be Poisson.
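A hedged numerical sketch of the reconstruction step (not from the text): starting from the compensator A(t) = λt, each interval has IHF H(u) = λu, and the relation G = 1 − e^{−H}, valid for continuous d.f.s, recovers the exponential interval d.f., as Watanabe's theorem asserts. The rate λ = 2 below is an arbitrary assumption.

```python
import math
import random

def interval_df_from_ihf(H, x):
    """Recover the d.f. value G(x) from a continuous IHF H via G = 1 - exp(-H)
    (for a continuous d.f., H(x) = -log(1 - G(x)), so this inverts the IHF)."""
    return 1.0 - math.exp(-H(x))

lam = 2.0
H = lambda u: lam * u          # interval IHF implied by A(t) = lam * t

# The recovered interval d.f. is exponential(lam).
for x in [0.1, 0.5, 1.0, 3.0]:
    G = interval_df_from_ihf(H, x)
    assert abs(G - (1.0 - math.exp(-lam * x))) < 1e-12

# Sampling intervals by inversion of G then reproduces a Poisson process:
# tau = -log(1 - U) / lam for U uniform(0, 1).
random.seed(1)
taus = [-math.log(1.0 - random.random()) / lam for _ in range(100000)]
print(sum(taus) / len(taus))   # close to the exponential mean 1/lam = 0.5
```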
Much of the preceding discussion can be extended in a straightforward manner to MPPs, but is deferred to the more extended discussion in the next section. For completely general histories it is not possible to provide a representation for the F -compensator analogous to that of Theorem 14.1.IV, even in the point process case: there are too many different ways in which the conditioning information can affect the duration of the interval lengths. In specific situations, however, when the history can be identified with the internal history for some larger process of which the point process forms part, it may not be too difficult to identify appropriate distributional forms.
Figure 14.1. Martingale N (t)−A(t) for compensator (——) of renewal process with uniform (0,2) lifetimes, with incorrect ‘martingales’ from a unit rate Poisson process (· · · · · ·) and triangular (0,2) lifetimes (– – –).
Example 14.1(e) Simple and modulated renewal process [continued from Example 7.2(e) and Exercise 7.2.11(c)]. For a simple renewal process with absolutely continuous lifetime distribution and hence a hazard function h(·) say, we know already that the conditional intensity has the form h(t − T_{N(t)}). Evidently, the corresponding compensator has the form
$$A(t) = \sum_{n=1}^{N(t)} H(T_n - T_{n-1}) + H(t - T_{N(t)}), \qquad(14.1.14)$$
where H(·) is the IHF corresponding to h (for t < T1, N(t) = 0 and the summation term vanishes). From Theorem 14.1.IV it is easy to see that this holds for a general renewal process whose lifetime r.v.s are positive a.s.

In Figure 14.1 we have plotted (as the continuous curve) the martingale M(t) = N(t) − A(t) that results from a realization of a renewal process on (0, 30) in which the lifetimes are uniformly distributed on (0, 2), so they have mean 1 and the IHF H(u) = −log(1 − u/2) (0 < u < 2). Two further 'martingales' are plotted in Figure 14.1 to show the effect of using (14.1.14) with incorrect compensators as a result of making an incorrect assumption as to the underlying lifetime distribution. The dotted curve results from using the IHF H(u) = u, as would hold for a unit-rate Poisson process, and the dashed curve from using H(u) = −log(1 − u²/2) for 0 < u ≤ 1, = log[2/(2 − u)²] for 1 ≤ u < 2, as holds for a renewal process whose lifetimes have the triangular density 1 − |u − 1| on (0, 2). See Exercise 14.1.11.

Suppose next that, as in Example 7.2(e), we also observe a family (vector) of stochastic processes {X(t): 0 < t < ∞} ≡ {X1(t), . . . , Xk(t): 0 < t < ∞}, and identify the history F with the joint history generated by the σ-algebras Ft = Ht^N ∨ Ht^X, where the terms on the right-hand side denote the σ-algebras of the internal histories of {N(t): 0 < t < ∞} and {X(t): 0 < t < ∞}. Returning to the
absolutely continuous case for ease of exposition, suppose that the hazard function in successive intervals is modified in a multiplicative fashion by some nonnegative function ψ(X1(·), . . . , Xk(·)); that is, we take λ∗(t) = h(t − T_{N(t)}) ψ(X1(t), . . . , Xk(t)). In this set-up, the F-compensator would be found by integrating λ∗(t) over successive intervals. As a very particular illustration, suppose k = 1, that ψ(·) is the step function ψ(x) = 2 if x > 0 and ψ(x) = 1 otherwise, and that h(x) = λ, corresponding to exponential interarrival times in the absence of the modulating factor. Then for any measurable X(·) we would have A(t) = λ(t + Yt), where the random variable Yt is the length of time for which X(s) > 0 during the interval 0 < s < t. This assumes that the process X(t) is observable; when this is not the case a filtering problem is involved, requiring averaging of the F-intensity over the coarser σ-algebras of the internal history, as discussed further in Sections 14.3 and 14.4.

For more general processes ξ(·) and histories F, even if we cannot establish explicit representations, many important results can still be derived from the Doob–Meyer decomposition; below, for example, we use it to show both the existence and uniqueness of the compensator for a cumulative process ξ with general history F, and again in discussing quadratic variation.

Theorem 14.1.VII. Let {ξ(t): t > 0} be a cumulative process adapted to the right-continuous history F. Then ξ(·) admits an F-compensator A(t, ω), which is uniquely defined P-a.e. in the sense that for any other compensator Ã(t, ω), P{A(t, ω) = Ã(t, ω) (all t)} = 1.

Proof. We again use the stopped process ξn(t) = ξ(t ∧ Tn), where the stopping times {Tn} are as at (14.1.1). Because ξn(t, ω) ≤ n, each ξn(·) is uniformly bounded in (t, ω), and also has bounded first moment, so that in addition it is uniformly integrable in t. Also, each ξn(t) has its trajectory a.s.
nondecreasing in t, so for 0 < s < t,

$$E[\xi_n(t) \mid \mathcal F_s] = E[\xi_n(t) - \xi_n(s) \mid \mathcal F_s] + E[\xi_n(s) \mid \mathcal F_s] \ge \xi_n(s) \quad\text{a.s.}$$
Thus, {ξn (t): 0 < t < ∞} is a right-continuous, bounded submartingale with respect to the history F , and the Doob–Meyer decomposition (Theorem A3.4.IX) can be applied. It implies that there exists a right-continuous nondecreasing F(n) -predictable process An (·) and an F -martingale Mn (·) such that ξn (t) = An (t) + Mn (t).
Moreover, the processes An , Mn are uniquely defined P-a.s. as functions on (0, ∞), which implies that the functions {An (t)} are a.s. nested in the sense that for m > n, An (t, ω) ≤ Am (t, ω) a.s. for t ≤ Tn (ω). Now the definition of a cumulative process in terms of a boundedly finite random measure requires Tn → ∞ a.s.; letting n → ∞ it follows that, a.s. for all t, a well-defined limit A(t, ω) exists and defines an F -adapted process such that for every n > 0, A(t, ω) = An (t, ω)
(t ≤ Tn(ω)).
Clearly, A(t, ω) inherits the monotonicity and right-continuity properties from each member of the sequence {An(t, ω)}. For predictability, observe that because Tn → ∞, the left-continuous F(n)-adapted processes generate left-continuous F-adapted processes, so A(t, ω), which is F(n)-predictable for each n, is also F-predictable. Finally, uniqueness of the overall decomposition ξ(t) = A(t) + M(t) follows from the uniqueness of the Doob–Meyer decomposition on each of the sets t ≤ Tn.

The results do not address directly the question of which predictable cumulative processes can be compensators, nor which nonnegative predictable processes can be conditional intensities. To make sense of this question we suppose first that the processes are defined on the canonical space X × N_X^#, with X = R+, where a predictable cumulative process A(t, N) takes the form of a function Ψ{N(t), T1(N), . . . , T_{N(t)}(N)} of the points of the realization occurring before time t. It is then a matter of finding conditions on the function Ψ that allow a consistent set of integrated hazard functions to be defined from it. These conditions must ensure that the hazard functions satisfy the requirements set out in Exercise 14.1.8. A brief outline of the argument is sketched in Exercises 14.1.13–14; a thorough discussion, incorporating also the marked case, is given by Last and Brandt (1995, Chapter 8).

Another important application of the Doob–Meyer decomposition is in proving the existence of the quadratic variation of a martingale when the martingale itself is square integrable. Let ξ(t) be an F-adapted cumulative process on R+ with finite second moments

$$E\{[\xi(t)]^2\} < \infty \qquad (0 < t < \infty),$$
write A(t) for its F-compensator (which then necessarily exists), and M(t) for the F-martingale ξ(t) − A(t). Then [M(t)]² is again an F-adapted process whose expected increments are nonnegative because

$$E\{[M(t)]^2 - [M(s)]^2 \mid \mathcal F_s\} = E\{[M(t) - M(s)]^2 \mid \mathcal F_s\} + 2E\{[M(t) - M(s)]M(s) \mid \mathcal F_s\} = E\{[M(t) - M(s)]^2 \mid \mathcal F_s\}, \qquad(14.1.15)$$
using the fact that M(·) is an F-martingale. Thus [M(t)]² is an F-submartingale, which therefore has a Doob–Meyer decomposition, say

$$[M(t)]^2 = Q(t) + M_2(t), \qquad(14.1.16)$$

where M2(t) is the F-martingale component, and the F-compensator Q(t) is called the quadratic variation process. The name stems from the fact, as follows from (14.1.16) on taking expectations and using the martingale property of M2, that

$$E[Q(t) - Q(s) \mid \mathcal F_s] = E\{[M(t)]^2 - [M(s)]^2 \mid \mathcal F_s\} - E[M_2(t) - M_2(s) \mid \mathcal F_s] = E\{[M(t) - M(s)]^2 \mid \mathcal F_s\} \qquad(14.1.17)$$

(cf. the last equation of A3.4, where, however, the conditional expectation on the left-hand side has been omitted). But E[M(t) − M(s) | Fs] = 0, so (14.1.17) shows that the increments in Q are the conditional variances of the increments in the martingale M, and hence the terminology.

The right-hand side of (14.1.17) also provides one approach to evaluating Q: write the argument as

$$[M(t) - M(s)]^2 = \int_s^t \int_s^t M(du)\, M(dv) = \iint_{(s,t] \times (s,t]} (M \times M)(du \times dv),$$

and consider the integral on the three regions D1 = {s < u < v ≤ t}, D2 = {s < v < u ≤ t}, and D3 = {s < u = v ≤ t}. On D1 the martingale property implies that

$$E[M(du)\, M(dv) \mid \mathcal F_s] = E\bigl\{E[M(du)\, M(dv) \mid \mathcal F_u] \bigm| \mathcal F_s\bigr\} = E\bigl\{M(du)\, E[M(dv) \mid \mathcal F_u] \bigm| \mathcal F_s\bigr\} = 0,$$

and similarly for the integral over D2. This leaves only the conditional expectation of the integral over D3, hence

$$E[Q(t) - Q(s) \mid \mathcal F_s] = E\biggl[\iint_{D_3} (M \times M)(du \times dv) \biggm| \mathcal F_s\biggr]. \qquad(14.1.18)$$

The diagonal component of the product measure M × M is zero except where M itself has atoms, namely, at jumps of either N or its compensator A (or both), ∆N and ∆A say. The case of a simple point process with continuous compensator is particularly simple, for here the diagonal component is zero except at the jumps of N, for which (∆N)² = ∆N. In this case we obtain

$$E[Q(t) - Q(s) \mid \mathcal F_s] = E\biggl[\int_s^t dN(u) \biggm| \mathcal F_s\biggr] = E[A(t) - A(s) \mid \mathcal F_s],$$

this last step following from the martingale property of N − A. It now follows from the uniqueness of the Doob–Meyer decomposition, and the fact that both Q and A are monotonic increasing and predictable, that Q = A a.s.
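The identity Q = A is easy to probe by simulation (an illustration, not from the text): for a rate-λ Poisson process the compensator λt is deterministic, so it should match the variance of M(t) = N(t) − λt. The parameter values below are arbitrary assumptions.

```python
import random

random.seed(42)
lam, t = 1.5, 4.0          # assumed rate and horizon

def poisson_count(lam, t):
    """Count points of a rate-lam Poisson process in (0, t] via exponential gaps."""
    n, s = 0, 0.0
    while True:
        s += random.expovariate(lam)
        if s > t:
            return n
        n += 1

samples = [poisson_count(lam, t) for _ in range(20000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)

# E[M(t)^2] = E[Q(t)] = E[A(t)] = lam * t: the variance of N(t) - lam*t
# matches the (here deterministic) compensator, illustrating Q = A.
print(mean, var)           # both should be close to lam * t = 6.0
```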
Proposition 14.1.VIII. Let N be a simple point process with internal history H, continuous H-compensator A(·), and H-quadratic variation Q(·). Then under the conditions of Theorem 14.1.IV the processes Q(t) and A(t) coincide a.s.; that is, P{Q(t) = A(t) (all t)} = 1.

This rather surprising result may be regarded as a further illustration of the locally Poisson character of a process with a conditional intensity, because it reflects the property that the mean and the variance of a Poisson distribution are equal. Then √λ∗(t) plays the role of a local standard deviation. Exercise 14.1.17 gives a useful application. The proposition extends easily to multivariate point processes, as follows.

Example 14.1(f) Quadratic variation for simple and multivariate point processes. Let N be a simple point process with jump-points {ti} and F-intensity λ∗(t) on 0 < t < ∞. Then the martingale M(t) = N(t) − A(t) is the difference of a pure jump process N and a continuous process A, and from Proposition 14.1.VIII, Q(t) = A(t) a.s. Thus both the compensator and the quadratic variation of the point process are continuous, and they reduce to the same process.

For an F-adapted multivariate point process Nj(t) (j = 1, . . . , J), the discussion can be extended to the cross terms Qjk(dt) = E[Mj(dt) Mk(dt)] (j ≠ k). Provided the ground process is simple and Nj has conditional intensity λ∗j say, the probability that both components have a point in the same interval of length dt is O[(dt)²], so that, writing the conditional expectation E{[Mj(t) − Mj(s)][Mk(t) − Mk(s)] | Fs} as the sum of integrals over the regions D1, D2, and D3 as below (14.1.17), all three terms now vanish. Thus the quadratic variation matrix equals diag(A1(·), . . . , AJ(·)), in which the jth term is the compensator for Nj and has conditional F-intensity λ∗j(·).
When the compensator for N is not continuous, it is more difficult to evaluate the expectation in (14.1.18) (see Exercise 14.1.16 for a simple example). An alternative, to which we now turn, is to imitate the direct analysis that led to the form of the one-point process established in Lemma 14.1.II [Elliott's (1976) derivation of Lemma 14.1.IX uses properties of orthogonal square integrable martingales, as also in the book form of Elliott (1982, Section 15.1)]. The quadratic variation for general point processes then follows from a sequence of extensions analogous to those used in establishing the general form of the point process compensator in Theorem 14.1.IV from Lemma 14.1.III. For further discussions see Elliott or Karatzas and Shreve (1988).

For a one-point process, as in general, the quadratic variation Q can be written as the sum Qc + Qa of continuous and atomic components, where Qc = Ac and Qa < Aa; specifically, in the notation of Lemma 14.1.II,

$$Q_a(t) = \sum_{u_i \le \min(t, X)} (1 - a_i) a_i \; < \sum_{u_i \le \min(t, X)} a_i = A_a(t). \qquad(14.1.19)$$
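The inequality in (14.1.19) is easy to check numerically (toy values, not from the text): from atoms ui with masses pi, the hazards are ai = pi/(1 − F(ui−)), and once every atom lies below min(t, X) the atomic parts are Aa = Σai and Qa = Σai(1 − ai).

```python
# Atomic lifetime d.f.: (u_i, P{X = u_i}) pairs; illustrative numbers only.
atoms = [(1.0, 0.3), (2.0, 0.4), (3.0, 0.3)]

F_left, hazards = 0.0, []
for u, p in atoms:
    hazards.append(p / (1.0 - F_left))   # a_i = p_i / (1 - F(u_i-))
    F_left += p

A_a = sum(hazards)                            # atomic part of the compensator
Q_a = sum(a * (1.0 - a) for a in hazards)     # atomic part of the quadratic variation
print(A_a, Q_a)
assert Q_a < A_a    # the strict inequality in (14.1.19), since some a_i is in (0, 1)
```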
Lemma 14.1.IX. Let the H-martingale M for the one-point process N at (14.1.2) be square integrable. Then N has H-predictable quadratic variation

$$Q(t) = H(t \wedge X) - H_2(t \wedge X), \qquad(14.1.20)$$

where H2(t) = Σ_{ui ≤ t} a_i², and H(·), ai, and ui are as in (14.1.4b) and (14.1.5).
Proof. Using the notation of Lemma 14.1.II we have N(t, ω) = H(min(t, X(ω))) + M(t, ω), where H is the IHF given by (14.1.4b) and M is the H-martingale

$$M(t, \omega) = \begin{cases} -H(t) & \text{if } X(\omega) > t,\\ 1 - H(X) & \text{if } X(\omega) \le t, \end{cases}$$

with H the intrinsic history determined by the positive r.v. X. Recall also from around (4.6.4) that the IHF H (denoted Q there) satisfies the equation

$$F(t) = \int_0^t [1 - F(u-)]\, dH(u) = [1 - F(t-)]H(t) + \int_0^t H(u)\, dF(u), \qquad(14.1.21a)$$

so that

$$[1 - F(t-)]H(t) = \int_0^t [1 - H(u)]\, dF(u). \qquad(14.1.21b)$$

The H-submartingale {[M(t, ω)]²} has the Doob–Meyer decomposition [M(t, ω)]² = Q(t, ω) + M2(t, ω) for some quadratic variation process Q and H-martingale M2. Equation (14.1.20) asserts that Q(t, ω) = Hc(t ∧ X) + Σ_{ui ≤ min(t,X)} ai(1 − ai), which is nondecreasing and H-predictable as in the proof of Lemma 14.1.II. So to prove the lemma it suffices to show that the H-adapted process

$$M_2(t, \omega) = \begin{cases} [H(t)]^2 - H(t) + H_2(t) & \text{if } X(\omega) > t,\\ [1 - H(X)]^2 - H(X) + H_2(X) & \text{if } X(\omega) \le t, \end{cases} \qquad(14.1.22)$$

is an H-martingale, for which it is enough to check that E[M2(t, ω) | Hs] = M2(s, ω) for t > s. From (14.1.22) this is trivially true on X(ω) ≤ s. On X(ω) > s we use (14.1.7) to evaluate J(s, t) ≡ E[M2(t, ω) − M2(s, ω) | X > s]. Then [1 − F(s)]J(s, t) equals

$$\int_s^t \bigl\{[H(u) - 1]^2 - H(u) + H_2(u)\bigr\}\, dF(u) + [1 - F(t)]\bigl\{H(t)[H(t) - 1] + H_2(t)\bigr\} - [1 - F(s)]\bigl\{H(s)[H(s) - 1] + H_2(s)\bigr\} = J_0 + J_1 + J_2 + J_3 \quad\text{say}, \qquad(14.1.23)$$
where J0 involves all the terms with H2 and is given by

$$J_0 = \int_s^t H_2(u)\, dF(u) + [1 - F(t)]H_2(t) - [1 - F(s)]H_2(s) = \int_s^t [1 - F(u-)]\, dH_2(u) = \sum_{s < u_i \le t} a_i^2 [1 - F(u_i-)],$$

J1 is the rest of the integral, and J2 and J3 are the boundary terms. The term J1 can be written as

$$J_1 = \int_s^t [1 - 2H(u)]\, dF(u) + \int_s^t [H(u) - 1]H(u)\, dF(u),$$

and using Lemma 4.6.I, the second integral equals

$$F(t)[H(t) - 1]H(t) - F(s)[H(s) - 1]H(s) - \int_s^t F(u-)\, d\bigl\{[H(u) - 1]H(u)\bigr\},$$

where, writing Hc for the continuous component of H and ai = ∆H(ui), the last term equals

$$-\int_s^t F(u-)[2H(u) - 1]\, dH_c(u) - \sum_{s < u_i \le t} F(u_i-)\bigl\{\Delta H(u_i)[2H(u_i) - 1] - a_i^2\bigr\}$$
$$= \sum_{s < u_i \le t} a_i^2 F(u_i-) - \int_s^t F(u-)[2H(u) - 1]\, dH(u)$$
$$= \sum_{s < u_i \le t} a_i^2 F(u_i-) + \int_s^t [2H(u) - 1]\, dF(u) - \int_s^t [2H(u) - 1]\, dH(u),$$

in which at the last step we have used dH(u) = dF(u)/[1 − F(u−)]. Now

$$\int_s^t 2H(u)\, dH(u) = [H(t)]^2 - [H(s)]^2 + \sum_{s < u_i \le t} a_i^2$$

(use Lemma 4.6.I or Exercise 14.1.19). Putting these results together we find that [1 − F(s)]J(s, t) vanishes identically for s < t as required.
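A hedged numerical check of Lemma 14.1.IX in expectation (not from the text): for a purely atomic lifetime d.f. we can enumerate the values of X exactly and verify E{[M(t)]²} = E[Q(t)], i.e. that M2 = M² − Q has constant (zero) mean. The atom positions and masses below are arbitrary toy values.

```python
import math

atoms = [(1.0, 0.5), (2.0, 0.3), (3.0, 0.2)]   # (u_i, P{X = u_i}); toy values

def hazards(atoms):
    F_left, out = 0.0, []
    for u, p in atoms:
        out.append((u, p / (1.0 - F_left)))    # a_i = p_i / (1 - F(u_i-))
        F_left += p
    return out

haz = hazards(atoms)
H  = lambda x: sum(a for u, a in haz if u <= x)        # IHF, (14.1.4b)
H2 = lambda x: sum(a * a for u, a in haz if u <= x)    # sum of squared jumps

for t in [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]:
    EM2 = EQ = 0.0
    for X, p in atoms:                                 # enumerate values of X
        M = (1.0 if X <= t else 0.0) - H(min(t, X))    # N(t) - A(t)
        Q = H(min(t, X)) - H2(min(t, X))               # (14.1.20)
        EM2 += p * M * M
        EQ  += p * Q
    assert math.isclose(EM2, EQ, abs_tol=1e-12), (t, EM2, EQ)
print("E[M(t)^2] = E[Q(t)] at all test times")
```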
Exercises and Complements to Section 14.1

14.1.1 Consider a simple point process N on X = Z+ ≡ {0, 1, . . .} adapted to the history F = {Fn: n ∈ Z+}. Show the following:
(a) A process {Xn: n ∈ Z+} is F-adapted if Xn is Fn-measurable for each n = 0, 1, . . . .
(b) {Xn} is F-predictable if each Xn is Fn−1-measurable for n = 1, 2, . . . .
(c) An F-adapted simple point process N on Z+ has F-compensator An given by

$$A_n = \sum_{k=1}^{n} E[N(\{k\}) \mid \mathcal F_{k-1}].$$
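A minimal sketch of the discrete-time compensator of part (c) (not from the text), for an ordinary renewal process with geometric lifetimes {fr}: there the hazard fr/qr is constant, so An = n(1 − ρ) whatever the path, which is what Exercise 14.1.2 below asserts. The path and ρ are arbitrary assumptions.

```python
rho = 0.6
f = lambda r: (1 - rho) * rho ** (r - 1)   # geometric lifetime distribution
q = lambda r: rho ** (r - 1)               # q_r = 1 - f_1 - ... - f_{r-1}

path = [0, 1, 0, 0, 1, 1, 0, 1]            # arbitrary 0/1 renewal indicators
A, last_renewal = 0.0, 0
for n, indicator in enumerate(path, start=1):
    age = n - last_renewal                 # age of current lifetime (F_{n-1} info)
    A += f(age) / q(age)                   # E[N({n}) | F_{n-1}]
    if indicator:
        last_renewal = n

print(A)   # n(1 - rho) = 8 * 0.4, the discrete analogue of the Poisson process
```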
14.1.2 (Continuation). Suppose that N is a discrete-time renewal process with lifetime distribution {fr} = {Pr{X = r}: r = 1, 2, . . .} (so f0 = 0). Then if N(n) = #{j = 1, . . . , n: renewal at j} = #{1 ≤ j ≤ n: N({j}) = 1} and T(k) = inf{j: N(j) = k},

$$\Delta A_n \equiv A_n - A_{n-1} = \begin{cases} f_{n-T(N(n-1))}\big/q_{n-T(N(n-1))} & \text{on } \{N(n-1) \ge 1\},\\[3pt] f_n^{(0)}\big/q_n^{(0)} & \text{on } \{N(n-1) = 0\}, \end{cases}$$

where qr = 1 − f1 − · · · − fr−1, and {f_r^{(0)}} is the distribution of the initial length with q_r^{(0)} defined analogously. Deduce that if {fr} is the geometric distribution {(1 − ρ)ρ^{r−1}: r = 1, 2, . . .}, then An = n(1 − ρ), representing a discrete analogue of the Poisson process.
14.1.3 Consider the history F(+) ≡ {Ft+: 0 ≤ t < ∞}, where Ft+ ≡ ⋂_{s>t} Fs. Prove that F(+) is right-continuous, and that if ξ(t) is F-adapted and right-continuous, then it is also F(+)-adapted.

14.1.4 Show that the history generated by a cumulative process or random measure on R+ need not necessarily be right-continuous. Investigate whether the origin is the only possible exceptional time. [Hint: Consider Lebesgue measure multiplied by a nondegenerate r.v.]

14.1.5 Let the sequence of stopping times at (14.1.1) satisfy Tn+1 = Tn for n ≥ n0 for some finite integer n0, so that T∞ ≡ lim_{n→∞} Tn < ∞.
(a) Prove that T∞ is a stopping time.
(b) Show that the conclusions of Lemma 14.1.II and Theorem 14.1.IV continue to hold for t satisfying 0 ≤ t < T∞. What can be said when Tn < Tn+1 ↑ T∞ < ∞?

14.1.6 (a) Let {X(t): t ≥ 0} be an F-submartingale. Show that if EX(t) = EX(0) for some t > 0, then Y(s, ω) = X(min(s, t), ω) is an F-martingale.
(b) Let S and T be two F-stopping times, and FS the S-prior σ-algebra (see Definition A3.4.V). For U ∈ FS and S ≤ T a.s., show that IU(ω) is F-predictable. [Hint: Suppose first that S and T are both countably-valued r.v.s, and then use the fact that general stopping times can be approximated from above by such countably-valued stopping times.]

14.1.7 Let ξ(t) be an F-adapted cumulative process with compensator A(t); define Z(t) = ξ(t) − A(t). Show that for any bounded F-predictable process Y(t), η(t) ≡
$\int_0^t Y(u)\, Z(du)$
is an F -local martingale; that is, η(t ∧ Tn ) is an F -martingale for each stopping time Tn as at (14.1.1). [Hint: The case Y (t, ω) = I{(s,t]×B} for B ∈ Fs and s < t, reduces to the defining requirement of the compensator. An extension argument completes the proof.]
14.1.8 Let N be a point process with internal history H. A process X(t, ω) on R+ is H-predictable if and only if for n = 0, 1, . . . there exist (B(R+) ⊗ HTn)-measurable functions fn(t, ω) such that X(t, ω) = fn(t, ω) on Tn < t ≤ Tn+1.
[Hint: First write X(t, ω) = Σ_{n=0}^∞ X(t, ω) I_{(Tn, Tn+1]}(t, ω). Then argue as around (14.1.8) that, on each such set, the predictability requirement implies that X(t, ω) must be HTn-measurable.]
14.1.9 (a) Verify that the quantity N(t) − A(t) in the one-point process of Lemma 14.1.II has a stopping-time representation as in Exercise 14.1.7.
(b) Extend the argument above to the one-point processes with prior σ-algebra treated in Lemma 14.1.III.

14.1.10 (a) For Example 14.1(c), compare the two compensators with the internal history, and the intrinsic history where F0 contains the event {X = 1}.
(b) For Example 8.5(b), write down the compensator for the bivariate Poisson process in which each parent point is followed after a fixed delay h by a single offspring point. Find the compensators for the joint process, for the parent points only, and for the offspring points only. For the last case, compare the compensators when the history does or does not contain information about the past of the parent process.
(c) Construct examples of discontinuous compensators first for the one-point process when the d.f. of X is a mixture of discrete and continuous components, and then [cf. (b)] extend to situations where the conditional hazard functions are themselves discontinuous.

14.1.11 In the setting of Example 14.1(e), the sum in (14.1.14) ∼ E[H(T1)]N(t) for large t. When the IHF H = Ha comes from a lifetime d.f. Fa different from the lifetime d.f. F underlying N(·),

$$E[H_a(T_1)] = \int_0^\infty \frac{\bar F(u)}{\bar F_a(u)}\, F_a(du).$$

This quantity equals 1 for the Poisson process, but as soon as it differs from 1, |N(t) − Aa(t)| = O(t) whereas |N(t) − A(t)| = o(t) for large t. In Figure 14.1, the upper tail behaviour of F̄a for the triangular lifetime distribution differs markedly from that of F̄.

14.1.12 Show that the compensator of a sum of independent cumulative processes is the sum of the compensators of the components. Also consider more general linear combinations. [Hint: Take care with the histories.]

14.1.13 Let {Q0(x), Qk(x; t1, . . . , tk): k = 1, 2, . . .} be a family of nonnegative functions defined on 0 ≤ x < ∞, 0 < t1 < · · · < tk < · · · , satisfying the following conditions.
(i) Qk(x; t1, . . . , tk) is measurable in t1, . . . , tk for fixed x, and monotonic nondecreasing and right-continuous in x for fixed t1, . . . , tk.
(ii) Q0(0) = 0 and Qk(x; t1, . . . , tk) = 0 for x ≤ tk.
(iii) At any discontinuity xi, say, of Qk, ∆Qk(xi) ≡ Qk(xi) − Qk(xi−) ≤ 1, with ∆Qk(xi) = 1 only if Qk(x; t1, . . . , tk) = Qk(xi; t1, . . . , tk) for all x > xi.
Interpret the {Qk(·)} as the IHFs for the successive intervals of a simple point process N on (N#(R+), B(N#(R+))) by showing that for given Qk the corresponding d.f.s are determined uniquely from the results in Section 4.6.
Deduce that N has H-compensator A(·) given by A(t) = Qk(t; t1, . . . , tk) for tk < t ≤ tk+1, and that only finitely many points can occur in any finite interval, provided also that (iv) the Janossy measures corresponding to the IHFs Qk are proper. Thus, the H-compensator A determines the distribution of N [cf. Boel et al. (1975), Davis (1976), Brémaud (1981)].

14.1.14 (Continuation). Replace the functions Qk(·) by a family of regular conditional IHFs given a prior σ-algebra F0. Extend the uniqueness statement to a point process N with history {F0 ∨ Ht}. Find an extension to the case where the given process N is conditioned by a point process N′ evolving simultaneously with N.
[Hint: Use a history with Ft = Ft^{(N)} ∨ Ft^{(N′)}; no essential differences arise, but the functions Qk are now functions of the points t′i, say, of the process N′ as well as the points ti of N, and can therefore change at both ti and t′i.]
14.1.15 Quadratic variation (i): discrete-time point process. Let {Nn: n = 1, 2, . . .} denote a discrete-time simple point process on Z+, so Nn = Σ_{i=1}^n Ii for {0, 1}-valued r.v.s Ii. Then [cf. Exercise 14.1.1(c)] its Doob–Meyer decomposition Nn = An + Mn for an F-martingale {Mn} and F-compensator {An} is determined by An = Σ_{i=1}^n E(Ii | Fi−1). Check that [Mn]² is an F-submartingale. The Doob–Meyer decomposition justifies the representation [Mn]² = Qn + M2,n, where {M2,n} is an F-martingale and

$$Q_n = \sum_{i=1}^{n} E(I_i \mid \mathcal F_{i-1})[1 - E(I_i \mid \mathcal F_{i-1})].$$

When Nn is a discrete-time renewal process with lifetime distribution {fj} (so fj = Pr{I1 = · · · = Ij−1 = 0, Ij = 1}), and Tn = sup{i ≤ n: Ii = 1}, E(In | Fn−1) = a_{n−T_{n−1}}, where aj = fj/(1 − Fj−1), Fj = f1 + · · · + fj. For such a renewal process, with successive lifetimes {τr} say, the quadratic variation is expressible in terms of H(2)(k) = Σ_{j=1}^k aj(1 − aj) [cf. (14.1.14)] as

$$Q_n = \sum_{i=1}^{n} a_{i-T_{i-1}}(1 - a_{i-T_{i-1}}) = \sum_{r=1}^{N_n} H_{(2)}(\tau_r) + H_{(2)}(n - T_n).$$
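A quick sketch of Exercise 14.1.15 in the simplest case (not from the text): for i.i.d. Bernoulli(p) indicators, E(Ii | Fi−1) = p, so An = np and Qn = np(1 − p), and Qn should match the variance of Mn = Nn − An. Parameter values are arbitrary assumptions.

```python
import random

random.seed(7)
p, n, reps = 0.3, 50, 20000
Qn = n * p * (1 - p)            # deterministic in the i.i.d. case

ms = []
for _ in range(reps):
    Nn = sum(1 for _ in range(n) if random.random() < p)
    ms.append(Nn - n * p)       # M_n = N_n - A_n

var = sum(m * m for m in ms) / reps
print(var, Qn)                  # sample variance of M_n should be close to Q_n
```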
14.1.16 Quadratic variation (ii): continuous-time examples.
(a) Use (14.1.18) to show generally, if informally, that for a simple point process, the quadratic variation Q(u) increases by dAc(u) as u passes through a point of continuity of the compensator A, and by the sum a(1 − a)² + (1 − a)a² = a(1 − a) when u passes through a point where A has a jump of height a, the two terms in the sum representing a weighted average of the jumps associated with the occurrence and nonoccurrence of a point at u. Verify that the result is compatible with Lemma 14.1.IX for the case of a one-point process.
(b) Consider the one-point process when X has the d.f.

$$F(x) = \begin{cases} 1 - e^{-\lambda x} & \text{if } x < 1,\\ 1 - (1 - a)e^{-\lambda x} & \text{if } 1 \le x, \end{cases}$$

for some a ∈ (0, 1), so the corresponding IHF equals λx + aI_{x≥1}. Show that the quadratic variation Q(x, ω) = λ min(x, X) + a(1 − a)I_{min(x,X)≥1}.
(c) Investigate the forms of the compensator and quadratic variation for a one-point multivariate point process.
14.1.17 Let N be a simple stationary point process on R with continuous compensator A, mean rate m, and finite second moment. Show that for real or complex f ∈ L1 ∩ L2 ,
$$\operatorname{var}\left(\int_{\mathbb R} f(t)\, [N(dt) - A(dt)]\right) = m \int_{\mathbb R} |f(t)|^2\, dt.$$
Investigate what changes are needed if the compensator is not continuous. [Hint: Use results on the quadratic variation process in Proposition 14.1.VIII. This idea is used in Brémaud, Massoulié and Ridolfi (2005) to establish results for the Bartlett spectrum of a process with a conditional intensity.]

14.1.18 Unbounded compensators.
(a) Suppose the d.f. F of a nonnegative r.v. is purely atomic with atoms of mass (1 − p)p^{n−2} at points un = 1 − 1/n for n = 2, 3, . . . . Show that a one-point process as in Lemma 14.1.II whose determining r.v. X has such F as its d.f. has an explosive compensator.
(b) Show a similar property for the d.f. F(x) = 1 − √(1 − x).

14.1.19 For real-valued functions F and G of locally bounded variation on R, show that for any finite open interval (a, b),
$$\int_{(a,b)} F(x)\, dG(x) + \int_{(a,b)} G(x)\, dF(x) = F(b-)G(b-) - F(a+)G(a+) + \sum_{a < x_i < b} [\Delta_L F(x_i)\Delta_L G(x_i) - \Delta_R F(x_i)\Delta_R G(x_i)],$$

where the sum extends over all discontinuities xi of such functions, and ∆LF(x) = F(x) − F(x−), ∆RF(x) = F(x+) − F(x). Recover Lemma 4.6.I. [Hint: See, e.g., Asplund and Bungart (1966, Proposition 8.5.5).]
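The identity of Exercise 14.1.19 is easy to verify numerically for step functions (an illustration, not from the text; the jump positions and values below are arbitrary toy data, with the integral at a jump taking the function's actual value there times the full jump mass):

```python
def make_step(jumps, start=0.0):
    """Build a step function as [(x_i, left_limit, value_at_x_i, right_limit)];
    jumps are (x, value_at_x, right_limit), left limit carried from `start`."""
    out, left = [], start
    for x, at, right in jumps:
        out.append((x, left, at, right))
        left = right
    return out

def evaluator(steps, start=0.0):
    """Pointwise evaluation: at a jump return the stored value; between jumps,
    the last right limit (piecewise constant)."""
    def ev(x):
        for xi, _l, at, _r in steps:
            if x == xi:
                return at
        v = start
        for xi, _l, _at, r in steps:
            if xi < x:
                v = r
        return v
    return ev

def integral(F_eval, G_steps, a, b):
    """int_(a,b) F dG: integrand at the jump point, dG the full jump mass."""
    return sum(F_eval(x) * (r - l) for x, l, _at, r in G_steps if a < x < b)

F = make_step([(0.5, 0.2, 0.3), (1.0, 0.6, 0.6), (1.5, 0.9, 1.0)])
G = make_step([(0.5, 0.1, 0.4), (1.0, 0.4, 0.7), (1.5, 0.8, 0.8)])
a, b = 0.0, 2.0

lhs = integral(evaluator(F), G, a, b) + integral(evaluator(G), F, a, b)
Fb, Gb = evaluator(F)(b - 1e-9), evaluator(G)(b - 1e-9)
Fa, Ga = evaluator(F)(a + 1e-9), evaluator(G)(a + 1e-9)
jump_sum = sum((atF - lF) * (atG - lG) - (rF - atF) * (rG - atG)
               for (x, lF, atF, rF), (_x, lG, atG, rG) in zip(F, G))
rhs = Fb * Gb - Fa * Ga + jump_sum

assert abs(lhs - rhs) < 1e-12, (lhs, rhs)
print(lhs, rhs)
```

Note that the one-sided jumps in F and G are chosen so both ∆L and ∆R terms are exercised.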
14.2. Campbell Measure and Predictability

The properties of compensators are closely linked with concepts from the general theory of processes, especially the Campbell measure on R+ × Ω defined on product sets B × U for B ∈ B(R+) and U ∈ E by

$$C_\xi(B \times U) = \int_U \xi(B, \omega)\, P(d\omega). \qquad(14.2.1)$$
U # In Chapter 13 we mostly took Ω to be the canonical space M# X or NX ; here we prefer a general Ω, because of the greater flexibility this allows in the choice of the history F . We explore this connection first for simple point processes,
14.2.
Campbell Measure and Predictability
377
and then extend the discussion to MPPs. As an application of these ideas we examine an extension of the limit theorems of Section 11.3 for thinned processes to situations where the thinning probability may depend on the past history of the process. As in Section 13.1, the set function Cξ can be extended to a σ-finite measure on the product σ-algebra B(R+ ) ⊗ E. This done, we focus attention on its restriction to the sets of the F -predictable σ-algebra ΨF of Section A3.3. In particular, for U ∈ Fs and B = (s, t] for 0 ≤ s < t < ∞, we have from the defining relation P(dω) ξ(dx, ω). Cξ ((s, t] × U ) = (s,t]
U
On the other hand, the martingale relation of Theorem 14.1.VII implies that for U ∈ F_s and n = 1, 2, . . . ,
    ∫_U [ξ(t ∧ T_n, ω) − A(t ∧ T_n, ω)] P(dω) = ∫_U [ξ(s ∧ T_n, ω) − A(s ∧ T_n, ω)] P(dω),
which on rearrangement gives
    ∫_U P(dω) ∫_{s∧T_n}^{t∧T_n} ξ(dx, ω) = ∫_U P(dω) ∫_{s∧T_n}^{t∧T_n} A(dx, ω).
Then monotone convergence can be used to let n → ∞, and we conclude that
    ∫_U P(dω) ∫_s^t ξ(dx, ω) = ∫_U P(dω) ∫_s^t A(dx, ω),
or, equivalently,
    E[∫_{R+} I_{(s,t]×U}(x, ω) ξ(dx, ω)] = E[∫_{R+} I_{(s,t]×U}(x, ω) A(dx, ω)].          (14.2.2)
Comparison with (13.1.2) shows that this is just the assertion that on Ψ_F the Campbell measures induced by ξ and its compensator A coincide. More generally, the indicator function in (14.2.2) can be replaced by any Ψ_F-measurable function Y(·). But such a function is just an F-predictable process. This argument can be reversed, and leads to the following alternative characterization of the compensator, equivalent to the martingale characterization, but couched here in terms of the general theory of processes.
Proposition 14.2.I. Given a cumulative process ξ adapted to the history F, its compensator is the unique F-predictable cumulative process A satisfying
    E[∫_{R+} Y(t, ω) ξ(dt, ω)] = E[∫_{R+} Y(t, ω) A(dt, ω)]          (14.2.3)
for every nonnegative F-predictable process Y(·).
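Identity (14.2.3) can be made concrete (a hedged sketch of mine, with illustrative parameters) for a homogeneous Poisson process N of rate m, compensator A(t) = mt, and the left-continuous, hence predictable, process Y(t) = N(t−): over (0, T] both sides equal E[N(T)(N(T) − 1)]/2 = (mT)²/2.

```python
import random

random.seed(3)
m, T, trials = 2.0, 3.0, 20_000
total = 0.0
for _ in range(trials):
    t, n = 0.0, 0
    while True:
        t += random.expovariate(m)   # next Poisson point
        if t > T:
            break
        total += n                   # Y(t) = N(t-) summed over jumps: ∫ Y dN
        n += 1

lhs = total / trials                 # estimates E ∫_0^T N(t-) dN(t)
rhs = (m * T) ** 2 / 2               # E ∫_0^T N(t-) m dt = m² T² / 2
print(round(lhs, 2), rhs)
```

Replacing N(t−) by the non-predictable N(t) would add E[N(T)] = mT to the left-hand side, which illustrates why predictability of Y matters in (14.2.3).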
It is on account of this relation that the compensator is referred to as the dual predictable projection in literature that adheres to the terminology of the general theory of processes. The use of the word ‘projection’ relates to the fact that any conditional expectation can be viewed as a projection from one space of functions to a subspace of functions having a coarser structure: see Krickeberg (1965, Chapter IV) for an introduction to this circle of ideas. The analogue is clearer in the next proposition, which introduces the predictable projection itself.
Proposition 14.2.II. Let X(t, ω) be a measurable stochastic process on the probability space (Ω, E, P) satisfying, for all 0 < t < ∞,
    E[∫_0^t |X(s, ω)| ds] < ∞,
and let F be a history for X. Then there exists an F-predictable process X^F(t, ω) such that for all bounded F-predictable processes Y(t, ω),
    E[∫_0^t Y(u, ω)X(u, ω) du] = E[∫_0^t Y(u, ω)X^F(u, ω) du]          (all 0 < t < ∞).          (14.2.4)
Moreover, X^F(·) is uniquely defined up to its values on a predictable set of (ℓ × P)-measure zero, ℓ denoting Lebesgue measure.
Proof. First let Y(u, ω) be the indicator function of the generating set (s, t] × U (0 ≤ s < t < ∞, U ∈ F_s) of Ψ_F, for which (14.2.4) takes the form of the defining relation for the Radon–Nikodym derivative
    ∫_{(s,t]×U} X(u, ω) du P(dω) = ∫_{(s,t]×U} X^F(u, ω) du P(dω).
Because the Radon–Nikodym derivative is unique up to values on null sets of the defining σ-algebra, this both identifies X^F as the Radon–Nikodym derivative and justifies the uniqueness statement in the proposition.
The process X^F can be called the F-predictable projection of X, or predictable projection for short. Note that the technique of proof is exactly that used in the study of conditional expectation, except that here the measure on R+ × Ω is no longer finite, nor is X necessarily integrable on R+ × Ω.
We turn now to the extension of these ideas to MPPs. A marked cumulative process is a random kernel ξ(t, K, ω): R+ × B_K × Ω → [0, ∞] such that
(i) for each fixed K ∈ B_K, the process ξ(·, K) is monotonic increasing and right-continuous in its first argument t, and
(ii) for each fixed t ∈ R+, ξ(t, ·) is a boundedly finite random measure on B_K.
Such marked cumulative processes arise, in particular, from counting the number of points with marks in K from an MPP observed over the time interval (0, t]. This is the only case we consider in detail.
Results for marked cumulative processes, analogous to the Doob–Meyer decomposition or the projection theorems just described, can be obtained by considering decompositions of the three-component product space R+ × K × Ω. The marked Campbell measure on this product space is defined by the relations
    C_ξ(B × K × U) = ∫_U ξ(B × K, ω) P(dω),          (14.2.5)
where B and K are bounded Borel sets from R+ and K, respectively.
For a given history F, the mark-predictable σ-algebra Ψ_F(K) is generated by product sets of the form (s, t] × K × U with 0 < s < t, K ∈ B_K and U ∈ F_s. Equivalently, it is the product of the predictable σ-algebra Ψ_F with B_K. A marked cumulative process is mark-predictable if, as a function of three arguments, it is measurable with respect to the mark-predictable σ-algebra.
Definition 14.2.III. The F-compensator of an MPP N on X with marks in K is any mark-predictable cumulative process A(t, K, ω) such that, for each K ∈ B_K, A(t, K, ω) is the F-compensator for the simple point process N_K(t) ≡ N((0, t] × K).
The existence and structure of compensators in the marked case are clarified by introducing the ground process N_g of Section 6.4. By assumption N_g is a well-defined point process in its own right. For the present context, we also require N_g to have finite first moment measure, and hence its own compensator A_g relative to the history F. For a given mark set K and history F, we can also define the compensator A(t, K) for N_K(t). For each such K, the associated Campbell measure satisfies
    C_{N_K}(dt × dω) ≪ C_{N_g}(dt × dω).
If we think of this result as a relation between measures on the product σ-algebra Ψ_F × E, it follows, as in Proposition A1.5.III for the existence of regular conditional probabilities, that we can write
    C_ξ((s, t] × K × U) = C_{N_K}((s, t] × U) = ∫_U ∫_s^t F(K | u, ω) C_{N_g}(du × dω)          (14.2.6)
for s < t, U ∈ F_s, and some predictable kernel F(K | t, ω), which is
(i) Ψ_F-measurable, and hence predictable, for each fixed K; and
(ii) a probability distribution on B_K for (ℓ × P)-almost all pairs (t, ω).
We now show that if A_g(t, ω) is the F-compensator for the ground process, then with F determined by (14.2.6), the process A(t, K, ω) defined by
    A(t, K, ω) = ∫_0^t F(K | u, ω) A_g(du, ω)          (14.2.7)
is an F-compensator for the marked process. Using the Doob–Meyer decomposition for N_g, we have, for any nonnegative predictable process X(t, ω),
    E[∫_{R+} X(t, ω) N_g(dt, ω)] = E[∫_{R+} X(t, ω) A_g(dt, ω)].
Take X(t, ω) = ∫_K Y(t, κ, ω) F(dκ | t, ω) for any given mark-predictable process Y(t, κ, ω), and observe that the properties of Y and the kernel F imply that X is indeed predictable. The equations in the chain below then hold (Fubini’s theorem implies finiteness of one and every element of the chain):
    E[∫_{R+×K} Y(t, κ, ω) ξ(dt × dκ, ω)]
        = ∫_{R+×K×Ω} Y(t, κ, ω) C_ξ(dt × dκ × dω)
        = ∫_{R+×Ω} [∫_K Y(t, κ, ω) F(dκ | t, ω)] C_{N_g}(dt × dω)
        = E[∫_{R+} X(t, ω) N_g(dt, ω)]
        = E[∫_{R+} X(t, ω) A_g(dt, ω)]
        = ∫_Ω ∫_{R+} ∫_K Y(t, κ, ω) F(dκ | t, ω) A_g(dt, ω) P(dω),
which both establishes the decomposition of the marked Campbell measure into the product of the predictable kernel F(dκ | t, ω) and the Campbell measure for the ground process, and the martingale property for the marked process. In particular, setting Y(t, κ, ω) = I_K(κ)X(t, ω) yields the equations
    ∫_{R+×Ω} X(t, ω) C_{N_K}(dt × dω) = ∫_Ω ∫_{R+} X(t, ω) F(K | t, ω) A_g(dt, ω) P(dω),          (14.2.8)
implying that the compensator for N_K can indeed be represented in the form (14.2.7). Uniqueness follows from the uniqueness of the Radon–Nikodym derivatives in (14.2.6).
In summary, we have established the first part of the next theorem. For the rest, in the special case that the history F is either the internal history H of the MPP or an intrinsic history F = F_0 ∨ H, we can develop a more explicit representation for the compensator extending that of Theorem 14.1.IV. Again we start by considering the canonical example of a one-point MPP defined by a bivariate distribution Γ on B(R+ × K) for the time X and mark Y of the unique point, and develop an analogue of Lemma 14.1.II [cf., e.g., Last and Brandt (1995, Proposition 1.7.1)]. The family of conditional distributions µ(K | t) = Pr{Y ∈ K | X = t} for (t, K) ∈ R+ × K is trivially predictable, because for every K it is a function of t that is either deterministic or determined entirely by the prior σ-algebra F_0. From this point onwards the development largely duplicates the discussion leading to Theorem 14.1.IV and the associated assertion in Proposition 14.1.VI that for internal or intrinsic histories, the compensator and the point process determine each other uniquely. The arguments are briefly outlined in Exercises 14.2.1–3 and the results summarized in part (b) of the theorem.
Theorem 14.2.IV. (a) Let N be an F-adapted marked point process with ground process N_g having finite first moment measure. Then an F-compensator for N exists, is (ℓ × P)-a.e. unique, and can be represented as in (14.2.7) in terms of a mark-predictable kernel F.
(b) Suppose, in particular, that F is a history for N of the form {F_0 ∨ H_t}, that {(T_n, κ_n)} is the sequence of stopping times and marks as at (14.1.1), and that there exist regular versions Γ_n(·, · | F_(n−1)) of the bivariate conditional distributions of the pairs (τ_n, κ_n), where τ_n = T_n − T_{n−1}, given the σ-algebras F_(n−1) generated by the history up to time T_{n−1}. Also let G_n(· | F_(n−1)) denote the marginal conditional distribution function for τ_n, and F_n(· | τ_n, F_(n−1)) the conditional distribution of κ_n given τ_n and F_(n−1), and suppose G_n satisfies 1 − G_n(x−) > 0 for x > 0. Then a version of the F-compensator A(t, K, ω) at (14.2.7) is given by
    A_g(t, ω) = Σ_{n=1}^∞ A_g^(n)(t, ω),          (14.2.9)
    F(K | t, ω) = Σ_{n=1}^∞ I_{(T_{n−1}(ω), T_n(ω)]}(t) F_n(K | t − T_{n−1}(ω), F_(n−1)),
where
    A_g^(n)(t, ω) = ∫_0^{τ_n ∧ (t − T_{n−1}(ω))^+} G_n(du | F_(n−1)) / [1 − G_n(u− | F_(n−1))].
(c) Under the conditions in part (b), the form of the compensator and the fidi distributions for the process determine each other uniquely.
Example 14.2(a) Semi-Markov processes [see Example 10.3(a)]. In viewing a semi-Markov process X(·) as an MPP in Section 10.3 we detailed its conditional intensity, its ground process, and its conditional mark distribution, in terms of the structural elements of the process given there, including the transition-time density functions g_jk(t) (j, k ∈ X, t > 0). In terms of a realization {(t_i, κ_i)} as before, σ-fields F_(n) are determined by {(t_i, κ_i): i ≤ n}, consistent with the setting also of Theorem 14.2.IV, and then with t_{n−1} < t ≤ t_n and τ_n = t_n − t_{n−1},
    G_n(t | F_(n−1)) = Σ_{j∈X} ∫_0^{t−t_{n−1}} g_{κ_{n−1},j}(u) du = G_{κ_{n−1}}(t − t_{n−1}),
    F_n({κ} | τ_n, F_(n−1)) = g_{κ_{n−1},κ}(τ_n) / Σ_{j∈X} g_{κ_{n−1},j}(τ_n),
    A_g^(n)(t, ω) = ∫_0^{t ∧ t_n(ω) − t_{n−1}(ω)} [Σ_{j∈X} g_{κ_{n−1},j}(u) / (1 − G_{κ_{n−1}}(u−))] du,
in which we observe that A_g^(n) is an IHF.
In the simpler case that X is a countable-state-space continuous-time Markov process with transition rate matrix (q_ij) and no instantaneous states, these functions simplify (with t and τ_n as above) to
    G_n(t | F_(n−1)) = 1 − exp(−q_{κ_{n−1}}(t − t_{n−1})),
    F_n({κ} | τ_n, F_(n−1)) = p_{κ_{n−1},κ} = q_{κ_{n−1},κ}/q_{κ_{n−1}},
    Γ_n((u, ∞) × {κ} | F_(n−1)) = Pr{τ_n > u, κ_n = κ | F_(n−1)} = p_{κ_{n−1},κ} e^{−q_{κ_{n−1}}u}
        = [1 − G_n(u | F_(n−1))] F_n({κ} | u, F_(n−1)),
and F({κ} | t, ω) = p_{X(t−),κ}. Further discussion is given in Last and Brandt (1995, Exercise 4.3.4).
The projection operator onto the mark-predictable functions, and its dual, give rise to projection theorems analogous to Propositions 14.2.I and 14.2.II, which we state below for completeness; proofs are sketched in Exercise 14.2.4. See also Exercise 14.2.5.
Proposition 14.2.V. Given a marked cumulative process ξ adapted to the history F, its compensator A is the unique F-predictable cumulative marked process satisfying, for every nonnegative mark-predictable process Y(·, ·),
    E[∫_{R+×K} Y(t, κ, ω) ξ(dt × dκ, ω)] = E[∫_{R+×K} Y(t, κ, ω) A(dt × dκ, ω)]
        = E[∫_{R+} ∫_K Y(t, κ, ω) F(dκ | t, ω) A_g(dt, ω)].          (14.2.10)
In the next theorem we assume the existence of a boundedly finite (hence σ-finite) reference measure ℓ_K on the Borel sets B_K of the mark space.
Proposition 14.2.VI. Let X(t, κ, ω) be a measurable marked stochastic process on the probability space (Ω, E, P) satisfying
    E[∫_0^t ∫_K |X(s, κ, ω)| ds ℓ_K(dκ)] < ∞          (all 0 < t < ∞, bounded K ∈ B_K),
and let F be a history for X. Then there exists an F-predictable marked process X^F(t, κ, ω) such that for all bounded F-mark-predictable processes Y(t, κ, ω), all 0 < t < ∞, and all K ∈ B_K,
    E[∫_0^t ∫_K Y(u, κ, ω) X(u, κ, ω) du ℓ_K(dκ)]
        = E[∫_0^t ∫_K Y(u, κ, ω) X^F(u, κ, ω) du ℓ_K(dκ)].          (14.2.11)
Moreover, X^F(·) is uniquely defined up to its values on a predictable set of (ℓ × ℓ_K × P)-measure zero.
We conclude this section with some results illustrating applications of the preceding ideas to limit theorems stated in terms of the convergence of compensators. The following technical lemma is a useful aid in these proofs.
Lemma 14.2.VII. Let F be a history, N(·) a simple point process that is F-predictable, and X(·) a nonnegative F-adapted process with finite first moment. Then the cumulative process
    η(t) = Σ_{i: t_i ≤ t} X(t_i) = ∫_0^t X(s+) dN(s),          (14.2.12)
where N(·) has jump points {t_i}, is F-adapted with F-compensator
    α(t) = Σ_{i: t_i ≤ t} ζ(t_i) ≡ Σ_{i: t_i ≤ t} E[X(t_i) | F_{t_i−}].          (14.2.13)
Proof. All the processes N, η, and α are clearly right-continuous and monotonic increasing, with their points of increase confined to {t_i} (or a subset of these jump points of N), where the sizes of their jumps are 1, X(t_i), and ζ(t_i), respectively. N is F-adapted by assumption, and it is easily checked that so too are η and α. Two more steps are needed to prove the lemma: we must show that α(·) is F-predictable, and that η(·) − α(·) is an F-local martingale.
Because α(·) can be written in the form
    α(t, ω) = Σ_{i=1}^∞ ζ(t_i) V^+_{t_i}(t, ω),
where V^+_T(t, ω) ≡ I_{{ω: T(ω)≤t}}(t, ω) is the right-continuous indicator process of the stopping time T (see the end of Appendix A3.3 for discussion), we limit ourselves here to establishing these steps in the special case of a one-point process, that is, where
    η(t, ω) = X(T) V^+_T(t, ω)          (14.2.14)
and
    α(t, ω) = ζ(T) V^+_T(t, ω),          (14.2.15)
with ζ(T) = E[X(T) | F_{T−}] for some F-stopping time T such that V^+_T(·) is predictable. The extension to the general case is left to Exercise 14.2.6.
Let Y be any F_{T−}-measurable r.v., and consider the process Y V^+_T(t). Taking Y first to be the indicator r.v. of some basic F_{T−}-measurable set of the form B_s ∩ {T(ω) > s} for some B_s ∈ F_s and s ∈ R+, the process Y V^+_T(t, ω) = 1 precisely on the intersection of the sets
    {(t, ω): T(ω) ≤ t}   and   {(t, ω): t > s, ω ∈ B_s ∩ {ω: T(ω) > s}}.
The first set here is in the predictable σ-algebra Ψ_F by assumption, and the second is of the form of a generating set for Ψ_F and is therefore predictable also. A similar argument holds when Y is the indicator r.v. of the other type of generating set for Ψ_F, namely, an element of F_0. Now the class of predictable processes is closed under the formation of nonnegative linear combinations and monotone limits, so it follows that Y V^+_T(·) is a predictable process whenever
Y is F_{T−}-measurable and nonnegative, and hence in particular when it has the form E[X(T) | F_{T−}].
For the second part of the proof we have to show that for every A_s ∈ F_s and s < t,
    ∫_{A_s} [η(t, ω) − α(t, ω)] P(dω) = ∫_{A_s} [η(s, ω) − α(s, ω)] P(dω),
with η and α as at (14.2.14) and (14.2.15), so that
    η(t, ω) − α(t, ω) = {X(T) − E[X(T) | F_{T−}]} I_{{T(ω)≤t}}(t, ω).
The indicator function vanishes on {ω: T(ω) > t}, and on {T(ω) ≤ s} for s < t, so it is enough to consider {ω: s < T(ω) ≤ t}, and hence show that, for T > s,
    ∫_{A_s∩{T>s}} {X(T) − E[X(T) | F_{T−}]} P(dω) = 0.          (14.2.16)
Because the set A_s ∩ {ω: T(ω) > s} is an element of both F_s and F_{T−}, the integrand can be replaced by its conditional expectation with respect to F_{T−}, and thus (14.2.16) holds.
The following result of Brown (1978, 1982, 1983) establishes convergence to Poisson and Cox processes via conditions on compensators.
Theorem 14.2.VIII. Let {F^(n): n = 1, 2, . . .} be a sequence of histories defined on a common probability space (Ω, E, P), and {N_n} a sequence of simple point processes such that, for each n, N_n is F^(n)-adapted and has F^(n)-compensator A_n. Suppose A(·) is a cumulative process defined on (Ω, E, P) with continuous trajectories and such that for each t > 0,
(i) A(t) is F_0^(n)-measurable for every n = 1, 2, . . . ; and
(ii) A_n(t) → A(t) (n → ∞) in probability.
Then N_n converges weakly to a Cox process directed by A.
Proof. We first show that conditions (i) and (ii) imply the tightness of the distributions on N#_{R+} induced by {N_n}, so that subsequences converge weakly to the distribution of some point process N, and then stable convergence is used to help in identifying the limit distribution.
Tightness in the present context amounts to showing that for each t ∈ R+, given ε > 0, we can find some a_0 and some subsequence {n_k} such that for all n ∈ {n_k},
    P{N_n(t) > a} < ε          (all a > a_0);          (14.2.17)
mostly, we take ‘n in a subsequence {n_k}’ as understood. Now for any event U ∈ E,
    |P(U | {A(t) ≤ M}) − P(U)| ≤ P{A(t) > M} / P{A(t) ≤ M} ≤ ε/3
for M sufficiently large. Consequently, we can establish (14.2.17) by showing that for all n,
    P({N_n(t) > a} | {A(t) ≤ M}) < 2ε/3          (all a > a_0).          (14.2.18)
Write N′_n(·) = I_B N_n(·), A′_n(·) = I_B A_n(·), A′(·) = I_B A(·), where I_B is the indicator r.v. of the set B = {A(t) ≤ M}. Equation (14.2.18) will be established if we can show that
    P{N′_n(t) > a} < 2ε/3          (all a > a_0).
From condition (ii) of the theorem we have immediately the property
(ii)′ A′_n(·) → A′(·) (n → ∞) in probability,
so by a diagonal selection argument we can find a subsequence {n_k} such that A′_{n_k}(s) → A′(s) a.s. for a countable set of values of s. But A′_{n_k}(s) and A′(s) are both monotone in s, finite for finite s, so
    sup_{0≤s≤t} |A′_{n_k}(s) − A′(s)| → 0          (k → ∞) a.s.
Then, identifying {n_k} with the subsequence in (14.2.17),
    sup_{0≤s≤t} |A′_n(s) − A′(s)| → 0 in probability.          (14.2.19)
By the continuity and F_0^(n)-measurability of A′(·),
    T_n ≡ inf{s: A′_n(s) > A′(t) + 1}
is an extended F^(n)-stopping time, with the property implied by (14.2.19) that P{T_n ≥ t} → 1 as n → ∞. Consequently, there exists ∆ such that, for all n,
    P{|N′_n(t ∧ T_n) − N′_n(t)| > ∆} < ε/3.
Now
    P{N′_n(t ∧ T_n) > a + ∆} ≤ E[N′_n(t ∧ T_n)]/(a + ∆) = E[A′_n(t ∧ T_n)]/(a + ∆) ≤ (M + 1)/(a + ∆) < ε/3
for a sufficiently large and all n. Equation (14.2.18), and hence (14.2.17), now follow.
Assume there is given a subsequence {n_k} for which the distributions induced by N_n(·) converge weakly to those of some limit point process N(·): our aim is to show that all such limit distributions coincide with those of a Cox process directed by A(·). This will follow from the characterization result in Theorem 14.6.I if we can show that there exists a version of N(·) having A(·) as its F-compensator, where F = {F_t}, F_t = F_0 ∨ F_t^(N), and F_0 = σ{A(s): 0 < s < ∞}. As in the first part of the proof, it is enough to fix some t < ∞ and to assume that the subsequence is so chosen and the point process so modified that A_n(s) → A(s) uniformly a.s. on (0, t], and that the compensators A_n(t) are a.s. uniformly bounded by some finite M, say.
Referring to the properties of F-stable convergence of distributions, with the space X of Definition A3.2.III taken to be N#_{(0,t]} and F = {F_t} as above, it follows from Proposition A3.2.VI that a version of N can be defined on the (possibly enlarged) space (Ω, E, P) in such a way that if U ∈ F_s for some s in (0, t],
    ∫_U N_n(s) P(dω) → ∫_U N(s) P(dω)          (14.2.20)
[see condition (iv) of Proposition A3.2.VI]. But the convergence of the compensators {A_n(·)} implies a result corresponding to (14.2.20) with A_n and A replacing N_n and N, so we also have
    ∫_U [N_n(s) − A_n(s)] P(dω) → ∫_U [N(s) − A(s)] P(dω).          (14.2.21)
Before we can invoke the martingale property for the left-hand side here, we have to show that (14.2.21) continues to hold when U on the left-hand side is replaced by some approximating set U_n from F_s^(n). It is enough to take U to be a generating set V ∩ W, where V ∈ F_0 and W = {ω: N(·, ω) ∈ B} for B ∈ B(N#_{(0,s]}). Define U_n = V ∩ W_n, where W_n is defined as W with N replaced by N_n. The properties of F-stable convergence imply that P(U ∆ U_n) → 0 (see Proposition A3.2.IV). Furthermore, the integrands at (14.2.21) are uniformly integrable because of the boundedness of their second moments, as follows from
    E[(N_n(s))²] ≤ E[(N_n(t))²] = E[(Σ_i ∆N_n(t_i))²]
        = E[Σ_i (∆N_n(t_i))² + 2 Σ_{t_i<t_j} ∆N_n(t_i) ∆N_n(t_j)]
        = E[N_n(t) + 2 ∫_0^t N_n(s−) dN_n(s)]          (because N_n is simple)
        = E[A_n(t) + 2 ∫_0^t N_n(s−) dA_n(s)]
        ≤ E[A_n(t) + 2N_n(t) A_n(t)] ≤ M + 2M².
Using these two facts, we may replace (14.2.21) by
    ∫_{U_n} [N_n(s) − A_n(s)] P(dω) → ∫_U [N(s) − A(s)] P(dω).          (14.2.22)
Take s = t in this equation, but in the definition of W_n, and hence of U_n, take B ∈ B(N#_{(0,s]}) with s < t. Then the martingale property of N_n implies
    ∫_{U_n} [N_n(t) − A_n(t)] P(dω) = ∫_{U_n} [N_n(s) − A_n(s)] P(dω).
Using (14.2.22), the same equality holds with N_n and A_n replaced by N and A, which is just the conditional expectation property identifying N(·) − A(·) as a martingale, and hence A(·) as the F-compensator for N(·).
This result is applicable, for example, to ‘dependent thinnings’, where the probability of a point being deleted is allowed to depend on either or both of the past of the original (unthinned) point process and the history of previous deletions. To see this, consider the special case of Lemma 14.2.VII in which X is a {0, 1}-valued process, so that η at (14.2.12) is a thinning of the process N, with ζ(t_i) = Pr{t_i from N retained in η}. Applying the theorem above leads to the corollary below.
Corollary 14.2.IX. Let {N_n, F^(n): n = 1, 2, . . .} be a sequence of simple point processes and associated histories, all defined on a common probability space (Ω, E, P) and such that N_n is F^(n)-predictable for each n. Let {X_n(·)} be a family of F^(n)-adapted {0, 1}-valued processes, and let η_n(·), ζ_n(·), and α_n(·) be defined as at (14.2.12) and (14.2.13). If for all t > 0
    α_n(t) → A(t)          (n → ∞) in probability,          (14.2.23)
where A(·) has continuous paths and is F_0^(n)-measurable for each n, then the thinned processes η_n(·) converge in distribution to a Cox process directed by A(·).
Example 14.2(b) A point process with controlled thinning. In quality control and similar contexts, the detection and elimination of errors may be regarded as a thinning operation on an original stream of errors. The error rate in production may vary between batches, so that in order to achieve a uniform low level of errors in the output, screening may need to be intensified according to current estimates of the error rate.
As a crude model, suppose the actual error rate equals some quantity λ ≡ λ(ω) per item produced; here, λ is constant within a batch but may vary between batches, and ω ∈ Ω for some probability space (Ω, E, P). Suppose that probabilistic screening of items occurs, the {0, 1}-valued r.v. X(t_i) indicating elimination of (X(t_i) = 0) or failure to detect (X(t_i) = 1) an error at t_i, with E[X(t_i) | F_{t_i−}] = ζ(t_i) for some process ζ(·). Let Y(t) and Z(t) denote, respectively, the numbers of errors detected and undetected in time (0, t), so that, assuming the time-scale unit is chosen according to the rate of production of items (whether faulty or not), [Y(t) + Z(t)]/t ≈ λ for large t assuming ergodicity. The aim is to choose ζ(·) (and this choice is assumed to be available to the producer) in such a way that Z(t)/t ≲ γ, asymptotically in t (i.e., over a long batch run), for some small residual error rate γ per item produced. One such possibility is to have
    1 − ζ(t, ω) = γt / [γt + Y*(t−, ω)],   Y*(t, ω) ≡ max(1, Y(t, ω)),          (14.2.24)
noting that this implies that ζ(t, ω) → 1 as t → 0, and that ζ(t, ω) remains close to 1 until (if ever) Y*(t−, ω)/(γt) is no longer significantly larger than 1; if λ ≫ γ we expect intuitively that this would not occur, because the aim is to make Z(t, ω)/(γt) asymptotically about 1, and so Y*(t, ω)/(γt) is like (λ − γ)/γ. We therefore study the asymptotic behaviour of Z(·) for γ → 0 by considering a sequence of schemes, indexed by γ, and show how Corollary 14.2.IX may be applicable.
Suppose that the realizations N(·) are kept fixed and that each of a sequence of thinning operations is applied to the same realizations, so that N_γ(t) = N(t), and Y_γ(t) and Z_γ(t) are the result of a dependent thinning via a {0, 1}-valued process X_γ(t) for which E[X_γ(t_i) | F_{t_i−}] = ζ_γ(t_i) and ζ_γ(·) is related to Y_γ(·) as at (14.2.24). Finally, change the time scale by setting τ = γt and defining the processes
    Z̃_γ(τ) = Z_γ(τ/γ),   Ỹ_γ(τ) = Y_γ(τ/γ),
    ζ̃_γ(τ) = ζ_γ(τ/γ) = 1 − τ / [τ + Y*_γ((τ/γ)−, ω)].
Set up a scheme of histories F^(γ) as follows. Take F_0^(γ) to include the information on Ñ_γ(τ) = N_γ(τ/γ) for every γ > 0, and define F_τ^(γ) = F_0^(γ) ∨ σ{X̃_γ(s): 0 < s < τ}. This choice ensures that for each γ we have F^(γ)-predictability of Ñ_γ and that X̃_γ is F^(γ)-adapted, leading to the expression, for the F^(γ)-compensator of the thinned (output) process Z̃_γ,
    Ã_γ(τ) = ∫_0^τ [1 − ζ̃_γ(u)] dÑ_γ(u) = ∫_0^{τ/γ} [1 − ζ_γ(u)] dN(u).
We must now determine whether the processes Ã_γ(·) converge in probability as γ → 0, and whether the limit function is the compensator of a Poisson process (because this is the plausible limit process under thinning). For an intermediate time τ_0 ∈ (0, τ), the triangle inequality gives
    |Ã_γ(τ) − τ| ≤ |(γ/λ)Ñ_γ(τ) − τ| + (γ/λ)Ñ_γ(τ_0)
        + ∫_0^{τ_0} [1 − ζ̃_γ(v)] dÑ_γ(v) + ∫_{τ_0}^τ |1 − ζ̃_γ(v) − γ/λ| dÑ_γ(v),          (14.2.25)
and as indicated in Exercise 14.2.7, provided N(t, ω)/t → λ a.s. (t → ∞), τ_0 can be chosen here to make each of these four terms converge to zero in probability as γ → 0. We therefore conclude that, in the asymptotic sense indicated by the change of time scale, the thinning procedure will be effective in reducing the output error rate to the required low level γ, irrespective of λ, and that the resultant process will be asymptotically Poisson in character.
Observe that although conceptually we regard N as being known a priori [i.e., σ(N) ⊆ F_0^(γ)], ζ(·) is in fact H^(Y)_(−)-adapted, where H^(Y) is the internal history of the observed process of errors.
Exercises and Complements to Section 14.2
14.2.1 One-point MPP compensator. Consider a one-point MPP determined by the bivariate distribution Γ(·, ·) on B_{R+} × B_K with marginal distribution function F_g(t) = Γ([0, t], K), and conditional distribution F(K | t) = Pr{Y ∈ K | X = t} as in Theorem 14.2.IV(b). Let H_g(t) = ∫_0^t dF_g(u)/[1 − F_g(u−)]. Show that A(t, K, ω) = ∫_0^{t∧X} F(K | u) dH_g(u) is an MPP compensator for the one-point process as in Definition 14.2.III. Extend to the case that Γ is itself determined by a prior σ-algebra F_0.
14.2.2 (Continuation). Use the previous exercise to justify the representations in (14.2.9) in Theorem 14.2.IV(b). [Hint: The representation of A_g(t) repeats the argument leading to Theorem 14.1.IV. The essential remaining step is to establish the predictability of F(K | t, ω) for fixed K ∈ B_K: this is done sequentially by successive conditioning on σ-algebras F_(n−1), much as in the discussion around (14.1.9).]
14.2.3 (Continuation). Check the assertion in Theorem 14.2.IV(c) by constructing the fidi distributions for the MPP in terms of the components {G_n} and {F_n} of the compensator, and vice versa.
14.2.4 Starting from the definition of the mark-predictable σ-algebra, argue as for Proposition 14.2.I to prove Proposition 14.2.V. Similarly, extend the Radon–Nikodym argument for Proposition 14.2.II to prove Proposition 14.2.VI.
14.2.5 Regard a marked cumulative process ξ as being measure-valued in the sense that for each t > 0, ξ(t, ·) is a measure on B_K. Consider the latter as an element in the ordered space of measures in which the sequence of measures is monotonic increasing in t. Investigate whether the projection Propositions 14.2.V–VI remain valid in this measure-valued context.
14.2.6 (a) To complete the proof of Lemma 14.2.VII, extend the discussion from the one-point case given after (14.2.14–15) to any simple point process N.
(b) Let F be a history, ξ an F-predictable cumulative process, and X a nonnegative F-adapted process with finite first moment. Show that if there exists an F-predictable version of the conditional expectation ζ(t) ≡ E[X(t) | F_{t−}], then the process η(t) ≡ ∫_0^t X(u) dξ(u) has F-compensator α(t) ≡ ∫_0^t ζ(u) dξ(u). [Hint: The required measurability and predictability properties are preserved in (a) because only linear combinations and a.s. limits are involved. To establish the compensator property in (b), use Fubini’s theorem and the argument in the text to show that for every A_s ∈ F_s and s < t,
    ∫_{A_s} P(dω) ∫_s^t (X(u) − E[X(u) | F_{u−}]) du = 0. ]
14.2.7 In Example 14.2(b), show that each of the four terms on the right-hand side of (14.2.25) converges in probability to zero as γ → 0, on the assumption that N(t)/t converges a.s. to some limit λ ≡ λ(ω) ∈ F_0^(γ) for all γ. [Hint: The convergence is shown directly for the first three terms; for the last, investigate sup_t
14.2.8 Extend Exercises 14.1.12–13 to the marked case by exhibiting the form of the H-marked compensator in terms of suitably defined conditional distributions and intensity functions. 14.2.9 Prove Theorem 14.2.VIII by imitating the construction and proof of Watanabe’s theorem at Proposition 14.6.I.
14.3. Conditional Intensities

In this section we examine in more detail the properties of both simple and marked point processes with absolutely continuous compensator, extending results for conditional intensities constructed via hazard functions as in Sections 7.2–3 and Corollary 14.1.V. Note that for MPPs we prefer to define the conditional intensity as a density not only with respect to time but also with respect to a reference measure in the mark space, rather than working with families of conditional intensities indexed by Borel sets in the mark space. The main reason for this choice is that in applications, it is usually the density form that suggests itself most naturally.

The main topics we consider are conditions for the existence of conditional intensities and, when they exist, the selection of predictable versions. We consider first processes defined on a half-line R+ as in Sections 14.1–2, and then look at complete intensities for stationary processes on the whole line R.

We start from the following definitions, covering successively the simple and marked cases. We again assume the existence of first moment measures, including in the marked case the first moment measure of the ground process.

Definition 14.3.I. (a) Let ξ be a cumulative process with history F and F-compensator A. An F-intensity for ξ is any F-adapted process λ∗(u, ω), measurable with respect to the product σ-algebra B_R+ × E, and such that, a.s. for all t,

A(t, ω) = ∫_0^t λ∗(u, ω) du.    (14.3.1a)
(b) Let ξ be a marked cumulative process, with mark space (K, B_K, ℓ_K) and F-compensator A. An F-intensity for ξ is any F-adapted process λ∗(u, κ, ω), measurable with respect to the three-fold product σ-algebra B_R+ × B_K × E, and such that, a.s. for all 0 < t < ∞ and K ∈ B_K,

A(t, K, ω) = ∫_(0,t]×K λ∗(u, κ, ω) du ℓ_K(dκ).    (14.3.1b)

In other words, we require ∫_0^t λ∗(u, ω) du to be P-indistinguishable from A(t, ω), with an equivalent statement in the marked case. The existence of the integrals on an a.s. basis is not automatic for a general measurable process λ∗, but is so if λ∗ is F-progressively measurable (see Section A3.3 for
the definition of progressively measurable unmarked processes; the extension to MPPs follows as in the definition of predictable MPPs).

In fact, it is clearly desirable, whenever possible, to choose a version of the intensity that is predictable, because the intensity is used to calculate risks before rather than after the occurrence of points. The projection Propositions 14.2.I–II and 14.2.V–VI help to clarify the circumstances under which this is possible. Regard the absolute continuity as referring not to the a.s. properties of the compensator for fixed ω but rather to a property of the Campbell measure on the appropriate product space. The Radon–Nikodym derivative on the restriction of the product σ-algebra to the predictable σ-algebra then provides us with the predictable version that we want.

These remarks are formalized below, where we state the results for the general marked case; those for simple point processes follow on collapsing the mark space to a single point (see Exercise 14.3.1 for an explicit statement). Unmarked point processes which are not simple can be treated as MPPs with integer marks.

Proposition 14.3.II. Let ξ be an F-adapted marked cumulative process with F-compensator A.
(a) A necessary and sufficient condition for the existence of an F-mark-predictable intensity λ∗ for A is that the marked Campbell measure at (14.2.5) be absolutely continuous with respect to ℓ × ℓ_K × P on Ψ_F^K.
(b) When the condition in (a) is satisfied, the Radon–Nikodym derivative dC_ξ(t, κ, ω)/d(ℓ × ℓ_K × P) = λ∗(t, κ, ω) is F-mark-predictable, and provides an F-mark-predictable version of the conditional intensity. This version then coincides, except possibly on a set of (ℓ × ℓ_K × P)-measure zero, with any F-intensity for ξ satisfying Definition 14.3.I(b).
(c) When an F-mark-predictable intensity exists, the ground process has an F-predictable intensity

λ_g(t, ω) = ∫_K λ∗(t, κ, ω) ℓ_K(dκ),    (14.3.2a)
and the F-predictable version of the conditional mark distribution F(· | t, ω), introduced in (14.2.7), is (ℓ × P)-a.e. absolutely continuous with respect to ℓ_K, with density f(κ | t, ω) = λ∗(t, κ, ω)/λ_g(t, ω), ℓ_K-a.e., so that for K ∈ B_K,

F(K | t, ω) = ∫_K f(κ | t, ω) ℓ_K(dκ) = ∫_K λ∗(t, κ, ω) ℓ_K(dκ) / λ_g(t, ω)    (ℓ × P)-a.e.    (14.3.2b)

Proof. Suppose first that an F-intensity exists. Consider any basic set (s, t] × K × U, with s < t, K ∈ B_K and U ∈ F_s, for the mark-predictable σ-algebra. Now (s, t] × K × U is an element of the threefold product σ-algebra, so it follows from Definition 14.3.I(b) that

∫_(s,t]×K×U λ∗(u, κ, ω) du ℓ_K(dκ) P(dω) = E[ I_U (A(t, K) − A(s, K)) ].
By the martingale property embodied in (14.2.3), the right-hand side here equals

E[ I_U (N(t, K) − N(s, K)) ] = C_P((s, t] × K × U).

It follows that the Campbell measure C_P is absolutely continuous on Ψ_F.

Now suppose conversely that absolute continuity as above holds. Then the Radon–Nikodym theorem implies the existence of at least one version of the density that is itself Ψ_F^K-measurable. For this version, the above equations continue to hold and show that the density is in fact an F-predictable intensity. This version is a fortiori measurable with respect to the full product σ-algebra, so the Campbell measure is absolutely continuous with respect to the product measure on this larger σ-algebra also. The uniqueness results associated with the Radon–Nikodym theorem then imply that this predictable version can differ from any other F-intensity at most on a subset of (ℓ × ℓ_K × P)-measure zero.

When the marked Campbell measure is absolutely continuous, the Campbell measure for the ground process, obtained by setting K equal to the whole mark space in the definition, is likewise absolutely continuous with respect to ℓ × P on the twofold product space X × Ω, and so a predictable version of the intensity exists also for the ground process. Indeed, the integral λ_g(t, ω) = ∫_K λ∗(t, κ, ω) ℓ_K(dκ) is still predictable, satisfies the requirements of a density for the Campbell measure of the ground process, and thus coincides with any other version of its predictable intensity (ℓ × P)-a.e. Finally, the ratio ∫_K λ∗(t, κ, ω) ℓ_K(dκ)/λ_g(t, ω) is predictable and satisfies the defining equation (14.2.6) for the predictable version of the conditional mark distribution.

Note that in the marked case it follows from standard results on product spaces (see the discussion around Fubini's Theorem A1.5.I) that for each fixed κ, the function λ∗(t, κ, ω) is F-predictable, and that for fixed K, λ∗_K(t, ω) = ∫_K λ∗(t, κ, ω) ℓ_K(dκ) is an F-conditional intensity for the point process N_K(·) = N(· × K). □
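When the mark space is finite and the reference measure ℓ_K is counting measure, the relations (14.3.2a–b) reduce to plain sums and are easy to check numerically. The sketch below (the particular marked intensity λ∗(t, κ) is a purely illustrative assumption, not drawn from the text) computes the ground intensity and the conditional mark density:

```python
# Sketch of (14.3.2a-b) for a finite mark space K = {0, 1, 2} with
# counting reference measure: the ground intensity is the sum of the
# marked intensity over marks, and the conditional mark density is the
# normalized marked intensity.

def marked_intensity(t, kappa):
    # hypothetical marked conditional intensity lambda*(t, kappa)
    base = [0.5, 1.0, 1.5]
    return base[kappa] * (1.0 + 0.1 * t)

def ground_intensity(t, marks):
    # (14.3.2a): lambda_g(t) = sum over kappa of lambda*(t, kappa)
    return sum(marked_intensity(t, k) for k in marks)

def mark_density(t, kappa, marks):
    # (14.3.2b): f(kappa | t) = lambda*(t, kappa) / lambda_g(t)
    return marked_intensity(t, kappa) / ground_intensity(t, marks)

marks = [0, 1, 2]
f = [mark_density(2.0, k, marks) for k in marks]
assert abs(sum(f) - 1.0) < 1e-12   # f(. | t) is a probability density
```

The same normalization works for any reference measure once the integral over the mark space is computable.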
Note also that throughout the discussion we require the marked conditional intensity to be a density not only in time but also with respect to the reference measure ℓ_K on the mark space; for an alternative approach, aimed at omitting this last requirement, see Exercise 14.3.2.

The proposition implies that the existence of any version of the F-conditional intensity, being enough to ensure absolute continuity of the Campbell measure, is enough to ensure also the existence of an F-predictable version of the intensity. This raises the question of the relation between the two intensities. In this situation the construction in Proposition 14.3.II reduces to the construction of the predictable projection (as in Propositions 14.2.V–VI) of the initial version of the conditional intensity.

As a specific example, let F and G be two histories for a process (ξ, F) with G ⊆ F in the sense that G_t ⊆ F_t for every t ≥ 0. A typical filtering problem is to find the G-intensity for ξ given its F-intensity. Again, this problem is readily solved by appeal to Propositions 14.3.II and 14.2.V: the G-predictable projection of the F-intensity is a G-intensity. Although this resolves the
question of existence of a G-intensity, actually computing the projection from its definition as a Radon–Nikodym derivative in the product space R+ × Ω may be difficult. A simpler relation such as

λ̃_G(t, κ) = E[λ_F(t, κ) | G_t−]    (14.3.3)
is intuitively plausible, but the difficulty is that there is no guarantee in general that the right-hand side has a version that is even measurable, let alone predictable. Fortunately, this is usually more of a theoretical difficulty than a practical problem, because in most practical applications a version of (14.3.3) can be found with continuity properties which allow it to be identified as a predictable version of the G-intensity. The lemma below covers the general situation.

Lemma 14.3.III. Suppose that the intensity λ̃_G(t, κ, ω) at (14.3.3), with G ⊆ F, admits a version whose trajectories are (ℓ_K × P)-a.e. left-continuous in t, or, more generally, which is G-mark-predictable. Then this version is also a version of the G-mark-predictable intensity.

Proof. Suppose such a version as described exists, µ(t, κ, ω) say, so that it necessarily satisfies

∫_B ∫_K µ(u, κ, ω) ℓ_K(dκ) P(dω) = ∫_B ∫_K λ_F(u, κ, ω) ℓ_K(dκ) P(dω)
for all B ∈ G_s ⊆ F_s, K ∈ B_K, and u > s, by definition of the F-intensity and the inclusion G_u ⊇ G_s. From the definition of µ the left-hand side is measurable in u and can be integrated over (s, t] to give

∫_(s,t]×B×K µ(u, κ, ω) du ℓ_K(dκ) P(dω) = ∫_(s,t]×B×K λ_F(u, κ, ω) du ℓ_K(dκ) P(dω).
These sets (s, t] × B × K with B ∈ G_s generate Ψ_G, so this is just the assertion that µ is a version of the G-predictable projection of λ_F. □

Most commonly, the coarser σ-algebra G is the internal history H, and the problem is to find the intensity for the internal history in terms of a larger history, such as some intrinsic history, for which the intensity is more easily calculated. The computation is effectively a form of Bayes' theorem; the following example is a convenient illustration.

Example 14.3(a) Mixed Poisson process intensities. Suppose there is given a realization t1, . . . , tN on (0, t] of a mixed Poisson process with rate parameter µ that is treated as a random variable with d.f. F on (0, ∞). Take F as the intrinsic history, so F_t = σ{µ} ∨ H_t. Then the F-intensity is just µ itself. To find the H-intensity we investigate the form of E(µ | H_t−), and assume for simplicity that F has a density f. The point about the internal history is that it has a simple structure and we can appeal to the existence of regular
conditional distributions expressed in the form of densities. Bayes' theorem here implies for the conditional density of µ given H_t−

p(µ | H_t−) = p(H_t− | µ) f(µ) / ∫_0^∞ p(H_t− | µ) f(µ) dµ = (µt)^N e^{−µt} f(µ) / ∫_0^∞ (µt)^N e^{−µt} f(µ) dµ,

where N = N(t−, ω), and

E(µ | H_t−) = ∫_0^∞ µ^{N+1} e^{−µt} f(µ) dµ / ∫_0^∞ µ^N e^{−µt} f(µ) dµ.

For example, if µ has the exponential density αe^{−αµ}, then

E(µ | H_t−) = (N(t−, ω) + 1)/(t + α) ≡ (N + 1)/(t + α),    (14.3.4)
and because this function is a.s. left-continuous in t, it can be taken as a version of the H-intensity.

For stationary point processes on the whole line R it is natural to use a form of conditional intensity that depends on the entire past rather than on the past since a fixed origin. Strictly speaking, such processes lie outside the framework we have considered hitherto, because our definition of compensator was restricted to processes on the half-line R+. However, the definition is readily extended to processes on R. Suppose ξ(t) is an F†-adapted process on (−∞, ∞), where F† is a history on R (earlier, our histories have been defined as families {F_t: t ∈ R+}). Then for any real t1 and t ≥ t1 we can consider the process ξ1(t) on [t1, ∞) with respect to the intrinsic history G_t^(1) = F†_t1 ∨ F†_(t1,t], and define for it a compensator A1(t) using the definition for a process on R+. Similarly we can define a compensator A2(t) for t ≥ t2 > t1 with respect to the intrinsic history G_t^(2) = F†_t2 ∨ F†_(t2,t]. Now for i = 1, 2, G_t^(i) = F†_t for t ≥ t_i, from which it follows that the compensators coincide a.s. on the segment [t2, ∞) where they overlap, and hence, by taking a sequence of values t_i extending back to −∞, that there exists a unique process A(t) on (−∞, ∞) such that A(t) coincides a.s. with A1(t) for any particular choice of t1.

From this point we can proceed as in the case of a half-line, defining an F†-conditional intensity as any F†-measurable process acting as a density for the compensator A(t) defined above. To preserve a distinctive notation, we use λ†(t) to denote such a conditional intensity, and call it a complete conditional intensity (continuing if need be to use λ_F to notate the specific history). Similarly in the marked case, we can write λ†(t, κ) = λ†_g(t) f(κ | t) for the ground intensity and conditional mark distribution relative to a complete history.
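The closed form (14.3.4) of Example 14.3(a) can be checked against direct numerical integration of the Bayes ratio ∫_0^∞ µ^{N+1} e^{−µt} f(µ) dµ / ∫_0^∞ µ^N e^{−µt} f(µ) dµ for the exponential prior f(µ) = αe^{−αµ}. A minimal sketch (the parameter values and the simple midpoint quadrature are arbitrary choices for illustration):

```python
import math

def posterior_mean_numeric(N, t, alpha, upper=50.0, steps=100000):
    # E(mu | H_{t-}) as a ratio of integrals, midpoint rule on [0, upper];
    # the tail beyond `upper` is negligible for these parameters.
    h = upper / steps
    num = den = 0.0
    for i in range(steps):
        mu = (i + 0.5) * h
        w = mu**N * math.exp(-mu * t) * alpha * math.exp(-alpha * mu)
        num += mu * w
        den += w
    return num / den

N, t, alpha = 3, 2.0, 1.5
exact = (N + 1) / (t + alpha)            # (14.3.4)
approx = posterior_mean_numeric(N, t, alpha)
assert abs(approx - exact) < 1e-4
```

The agreement reflects the Gamma-integral identity ∫_0^∞ µ^{N+1} e^{−(t+α)µ} dµ / ∫_0^∞ µ^N e^{−(t+α)µ} dµ = (N + 1)/(t + α).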
The necessary and sufficient condition for the existence of a complete intensity is the absolute continuity of the Campbell measure relative to the product measure ℓ × P on the predictable σ-algebra on the full space R × Ω, with
a corresponding extension in the marked case. When this condition is satisfied the Radon–Nikodym derivative provides a predictable version of the complete intensity. As in the earlier discussion, we can always obtain a predictable version from any other complete intensity by projection onto the predictable σ-algebra, and in the examples we assume that such a choice has been made.

Example 14.3(b) Renewal process with density [continued from Example 14.1(e)]. Here and for the Wold process, the complete intensity function has the form of the hazard function given in Example 14.1(e) but without the correction term due to the choice of a fixed origin; that is, for the renewal process, λ†(t) = h(t − t_N(t−)), where t_N(t−) = sup{t_i: t_i(ω) < t}. This illustrates the fact that the complete intensity function is frequently simpler analytically, as well as being amenable to probabilistic study. See Exercise 14.3.4 for the Wold process.

Example 14.3(c) Hawkes process [continued from Examples 6.3(b), 7.2(d) and 8.5(d)]. This process is a stationary Poisson cluster process with clusters that are finite branching processes described by an offspring intensity measure that has support in R+ and density function µ(·) of total mass ν < 1. From this Poisson character, and because the support of µ is contained in the half-line, the conditional behaviour of the process is very simple: given the history up to time t, the process of births in [t, ∞) to individuals themselves born before t is conditionally a Poisson process with intensity at time t + u given by

λ + Σ_{t_i ≤ t} µ(t + u − t_i),    (14.3.5)

which suffices to establish the right-hand side (at u = 0) as a version of the conditional intensity with respect to the internal history H† = {H_t: t ∈ R}. In practice, µ is usually continuous apart from a possible jump at zero, so by either defining µ(0) = 0 or restricting the summation at (14.3.5) to birth epochs t_i < t, we obtain a left-continuous version that can therefore be taken as the required predictable version of the complete intensity function.

As already remarked in Example 8.5(d), the linear representation here has exactly the same form as that derived from the second-order theory of Chapter 8 when the Hawkes process has rational spectral density. In this case at least, as for Gaussian processes, the 'best' predictors are linear, at least where prediction of the intensity is in view.

We proceed to a more detailed study of the complete intensity function for stationary processes. For simplicity we consider unmarked processes with just their internal histories, so that we can take for (Ω, E) the canonical
space (N#_R, B(N#_R)), with the history H† consisting of the σ-algebras H†_t = σ{N(u) − N(s): −∞ < s < u ≤ t} for −∞ < t < ∞.

Let us first consider the effect of stationarity on the conditional intensity. As in Proposition 13.2.I, stationarity of the underlying measure P implies invariance of the Campbell measure under the transformations Θ_u: (x, ξ) → (x − u, S_u ξ). Observing that these transformations commute with the operation of extracting the Radon–Nikodym derivative, we see that when a predictable complete intensity exists, it should satisfy

d(Θ_u C_P)/d(ℓ × P) = d(C_P)/d(ℓ × P) = Θ_u [d(C_P)/d(ℓ × P)],

so that

λ†(t, ξ) = Θ_u λ†(t, ξ) = λ†(t − u, S_u ξ) = λ†(0, S_t ξ).    (14.3.6)
Observe that in our notation λ†(t, ξ) we have chosen to emphasize the dependence of the intensity on the sample realization ξ (in the canonical space, as we are assuming). Because of predictability, the conditional intensity λ†(t, ξ) depends on ξ only through its past (i.e., its restriction to F†_t−). Specifically, appealing to Lemma A3.3.I and the converse statement for the canonical setup referred to in the discussion after Proposition A3.3.IV [see also Jacobsen (1982)], λ†(0, ξ) is H†_0−-measurable, and hence can be represented as a measurable function of the restriction of ξ to (−∞, 0). Then what (14.3.6) implies is that the form of the dependence of λ†(t) on the past up to t is the same as the form of dependence of λ†(0) on the past up to 0. For a stationary point process, this means that λ†(t) can be represented as a function of the sequence {t − t_i} of intervals from t back to events t_i with t_i < t. A similar statement can be made when the process is stationary and the history includes information on the past of an ancillary stationary process, but general statements in the noncanonical framework are less easy to formulate. Questions relating to the stationarity and uniqueness of a process whose conditional intensity can be represented as in (14.3.6) are taken up in Section 14.7.

Reasoning similar to that leading to (14.3.6) can be used to clarify the form of the absolute continuity condition on the Campbell measure when the process is stationary, under which condition the Campbell measure can be represented in terms of the associated Palm distribution P0 (see Definition 13.2.III and recall that we assume the existence of first moments). We may then anticipate that the absolute continuity condition should therefore be expressible in terms of P0.
To see what form this condition might take, start from (13.2.6) of the refined Campbell theorem relating P0 and the underlying stationary measure P, which here can be written for a general random measure on B_R as

m ∫_R ∫_{M#_R} h(t, ξ) P0(dξ) ℓ(dt) = ∫_{R×M#_R} h(t, S_{−t} ξ) C_P(dt × dξ)

for nonnegative measurable functions h on B(R × M#_R). Now suppose that the absolute continuity condition holds, so that for nonnegative predictable
functions h(x, ξ) we can also write

∫_{R×M#_R} h(t, S_{−t} ξ) C_P(dt × dξ) = ∫_R ∫_{M#_R} h(t, S_{−t} ξ) λ†(t, ξ) dt P(dξ)
    = ∫_R ∫_{M#_R} h(t, ξ) λ†(0, ξ) dt P(dS_t ξ)
    = ∫_R ∫_{M#_R} h(t, ξ) λ†(0, ξ) (S_{+t}P)(dξ) dt.
Because P is stationary, it is invariant under S_{+t}, and the above equations yield, for predictable h(t, ξ),

m ∫_R ∫_{M#_R} h(t, ξ) P0(dξ) ℓ(dt) = ∫_{R×M#_R} h(t, ξ) λ†(0, ξ) P(dξ) ℓ(dt).    (14.3.7)
The last equation may seem somewhat paradoxical, at least in the point process context, because P0 is then defined on a subspace of P-measure zero, namely, the realizations with a point at the origin. The explanation of this paradox lies in the predictability requirement, for on both sides of the equation the integration is in fact over the restrictions of the measures ξ to R− = (−∞, 0). Thus (14.3.7) asserts that the projection of mP0 onto M#(R−) is absolutely continuous with respect to the projection of P onto M#(R−), with λ†(0, ξ) acting as the Radon–Nikodym derivative. We summarize the preceding discussion as follows.

Proposition 14.3.IV. Let ξ be a stationary random measure or point process in R with finite mean rate m, distribution P on B(M#_R), and stationary Palm distribution P0. If a complete intensity function λ† exists, then it is stationary in the sense of (14.3.6), and mP0 ≪ P on H†_0− with Radon–Nikodym derivative equal to λ†(0, ξ) P-a.e.
Conversely, if mP0 ≪ P on H†_0−, the Radon–Nikodym derivative can be taken as one version of λ†(0, ξ), in which case (14.3.6) is a stationary H†-predictable version of the complete intensity.

Proof. The last assertion follows from the fact that if the Radon–Nikodym derivative is taken as defining λ†(0, ξ), then it may be supposed H†_0−-measurable, in which case λ†(t, ξ) defined by (14.3.6) is H†_t−-measurable for all t and is thus H†_(−)-adapted. But in the canonical framework, such a process is also H†-predictable (cf. the remarks in Appendix A3.3 already referenced). □

Returning specifically to the point process case, we next establish a hazard function representation for λ† analogous to that for λ∗ (on R+) given in Corollary 14.1.V. Note that because the past of the process can be represented as a point in a c.s.m.s., we can and do assume the existence of regular conditional probabilities, given the past.
In fact it is convenient to adopt the point of view of Theorem 13.3.I and treat the distributions as distributions on sequences of
intervals. This leads to the following formulation, where we use the notation T_u for the backward recurrence time at time u, τ(u) = {τ_{−1}(u), τ_{−2}(u), . . .} for the vector of intervals between consecutive points prior to u, and F(· | τ) for the conditional distribution function of the interval-length τ0 given the vector τ = (τ_{−1}, τ_{−2}, . . .) of the preceding intervals. Because we are dealing here with stationary processes, the distributions involved can all be derived from the stationary Palm distribution, as indicated in the proof.

Proposition 14.3.V. For a simple stationary point process on R, and in the notation above, a version of the H†-compensator is given by

dA(u) = [F(T_u + du | τ(u)) − F(T_u | τ(u))] / [1 − F(T_u− | τ(u))].

Proof. Because of stationarity, we can take u = 0 without loss of generality. Consider the three variables t1 (the first point in R+), the backward recurrence time T0, and the vector of intervals τ(0), all to be considered as functions of the realization N. Let h(t1, T0, τ(0)) be any jointly measurable function of these variables on the product space R+ × R+ × (R+)^(∞), which is given the usual Borel σ-algebra. From (13.3.2) and standard Palm theory we have

E_P[h(t1, T0, τ)] = m E_{P0}[ ∫_0^{τ0} h(x, τ0 − x, τ) dx ],
where τ0 = t1 + T0 is the length of the current interval. Recall that under P0, the sequences (τ0, τ) and τ have the same distribution, and that we can write, symbolically, E_{P0}(·) = E_{P0}[E_{τ0}(· | τ)]. We evaluate the inner conditional expectation via the conditional distribution F(· | τ), and thus obtain

E_P[h(t1, T0, τ)] = m E_{P0}[ ∫_0^∞ dx ∫_0^∞ h(x, y, τ) d_y F(x + y | τ) ].
This relation shows that the joint distribution of (t1, T0, τ) has the form (in infinitesimal notation)

m dx d_t F(x + t | τ) M0(dτ),

where M0 is the measure induced by P0 on (R+)^(∞), so for the distribution of t1 conditional on T0 and τ, we have

Pr{t1 ≤ t | T0, τ} = [F(T0 + t | τ) − F(T0− | τ)] / [1 − F(T0− | τ)].
Appealing to the properties of regular conditional probabilities, it now follows, as in the proof of Lemma 14.1.III, that the H†-compensator here has the form asserted. □

Corollary 14.3.VI. A complete intensity for a simple stationary point process exists if and only if the conditional distribution F(· | τ) is absolutely continuous with respect to Lebesgue measure ℓ(·), in which case, using f(· | τ) for a density for F(· | τ), a version of the H†-conditional
intensity is given by

λ†(u) = f(T_u | τ(u)) / [1 − F(T_u | τ(u))].    (14.3.8)

In examples, it is usually the case that either f(· | τ) is continuous or it can be chosen to be left-continuous, and in either circumstance (14.3.8) then gives a predictable version of the H†-intensity provided that at a point of the process, T_u is interpreted as the length of the preceding interval and not as zero. A corresponding statement for MPPs is outlined in Exercise 14.3.5(b).
Exercises and Complements to Section 14.3

14.3.1 Let ξ be an F-adapted cumulative process with F-compensator A.
(a) A necessary and sufficient condition for the existence of an F-predictable intensity λ∗ for A is that the Campbell measure at (14.2.1) be absolutely continuous with respect to ℓ × P on Ψ_F.
(b) When the condition in (a) is satisfied, the Radon–Nikodym derivative dC_ξ(t, ω)/d(ℓ × P) = λ∗(t, ω) is F-predictable, and provides an F-predictable version of the conditional intensity. This version coincides, except possibly on a set of (ℓ × P)-measure zero, with any F-intensity for ξ satisfying Definition 14.3.I(a).

14.3.2 Develop an alternative approach to the existence of mark-predictable intensities by starting from the definition of a conditional intensity measure: a kernel λ(t, ω, K) from the product space R+ × Ω onto Borel sets of K, which for each K is measurable with respect to the predictable σ-algebra. Then λ can be represented as λ_g(t, ω) F(K | t, ω), where F(K | t, ω) is a probability kernel and λ_g(t, ω) is a predictable intensity for the ground process. Establish projection theorems, definitions via Radon–Nikodym derivatives, and so on, much as in the text. See Brémaud (1981) and Jacod (1975) for details.

14.3.3 As an extension of Lemma 14.3.III, show that if the process X(t, ω) is left-continuous and integrable and the history F_t is also left-continuous, then there exists a predictable version of the conditional expectation E[X(t) | F_t] [see Mertens (1972)].

14.3.4 Follow reasoning similar to that used for the renewal process in Example 14.3(b) to find explicitly the form of the complete intensity function for a Wold process with transition kernel P(x, A).

14.3.5 (a) Formulate and prove an extension of Proposition 14.3.IV for MPPs, retaining the canonical framework. [Hint: In the marked case, the role of P0 is played by the bivariate measure ℓ_K(dκ) P_(0,κ).
λ†(0, κ, ξ) can be identified as the Radon–Nikodym derivative of this bivariate measure with respect to the product measure ℓ × ℓ_K × P on H†_(−).]
(b) Show that for a stationary MPP with finite mean ground rate, the statement equivalent to (14.3.8) takes the form

λ†(u, κ) = f(T_u, κ | σ(u)) / [1 − F(T_u, K | σ(u))],    (14.3.9)
where σ(u) is the family of pairs ((τ−1 (u), κ−1 (u)), (τ−2 (u), κ−2 (u)), . . .), with κi (u) the mark at the endpoint of the interval τi (u).
14.4. Filters and Likelihood Ratios

Much of the early work on the martingale approach to point processes was motivated by the communications engineering context, where the transfer of information by a pulsed signal, rather than a continuously modulated signal, had become a major consideration. The emphasis in the engineering literature was on extending the Kalman–Bucy updating algorithms, which allowed real-time estimation and control for processes with a linear Gaussian structure, to point processes ('jump processes') and other more general contexts. Noise pulses arising at various stages of the transmission process contaminate the received signal, which is typically a Poisson process with two components, namely, the original signal and the noise. The practical questions to be resolved concern the estimation of the original signal on the basis of the point process observed at the receiving end of the transmission line, and its use in control procedures.

The last two decades have seen the development of a much wider interest in these models and the estimation procedures that go with them, often in the terminology of the hidden Markov models discussed in Section 10.3. In most of these applications, a key problem is estimating the state of some unobserved system at a particular time, either from the information available up to that time (the filtering problem), or from information available over a longer time period (the smoothing problem). But whereas the discussion in Section 10.3 used intuitive arguments based on standard properties of discrete state Markov processes, here we provide an introduction to the more general martingale approach, making use of the concepts developed earlier in this chapter. Brémaud and Jacod (1977) provide a helpful informal guide to the relationships between the point process procedures and earlier work on Gaussian filters.
Fuller discussions of the filtering problems, with further references, can be found in Brémaud (1981) and Snyder and Miller (2000), and a very general treatment of the marked case is in Last and Brandt (1995, Chapter 11).

We start with a recapitulation of the simplest version of the hidden Markov models discussed in Section 10.3.

Example 14.4(a) Cox process directed by a finite state Markov process [continued from Example 10.3(d), Exercise 7.2.8]. In Example 10.3(d) we derived explicit estimates of the current state of a Markov chain X(t) which, when it is in state j, generates points of a Poisson process at rate λ_j. Such an example contains in embryonic form most of the features of the general problem of filtering for point processes. In particular, we adduced earlier an estimate π_i(t) of the probability of the current state, given the observations on the point process, as a ratio involving the 'joint statistics' or 'forward probabilities' p_i(t) [cf. (10.3.4)], and the likelihood, which is just their sum Σ_i p_i(t):

π_i(t) = Pr{X(t) = i | observations} = p_i(t)/‖p(t)‖_1.    (14.4.1a)

Let F denote the joint history of the point process and the Markov chain, and H the internal history of the point process. Then π_i(t) can be identified
as the conditional expectation of the indicator function, E[I{X(t) = i} | H_t]. Similarly, we can identify the F-conditional intensity as λ∗_F(t) = λ_{X(t−)}, and from Lemma 14.3.III we obtain the H-intensity as the projection

λ∗_H(t) = E[λ_{X(t−)} | H_t] = Σ_{i=1}^K π_i(t)λ_i = Σ_{i=1}^K p_i(t)λ_i / Σ_{i=1}^K p_i(t).    (14.4.1b)

Equations (14.4.1) are typical filtering equations, and one of the main goals of this section is to find their extensions to more general models. Note also the possibility of developing a succinct system of equations for updating such estimates as in (10.3.6–9). Because of the importance of obtaining real-time estimates of the state, it is the latter topic that is dominant in much of the earlier literature. Both the differential equations that hold between observed points, and the difference equations that hold when such points are traversed, can be incorporated into a single set of integral equations, and such integral equations then form the main object of study. They play a role here somewhat analogous to that of stochastic differential equations in models exploiting diffusion concepts.

Martingale representations fit naturally into this discussion, in which two main approaches can be identified: the 'innovations' approach, in which the martingale representations are studied via filtering results such as those described in Proposition 14.3.II and Lemma 14.3.III, and the method of 'reference probabilities', in which they enter via likelihoods. We outline an introduction to the second approach, based on Brémaud (1981), which may be consulted for a more extended account.

We start with a re-examination and extension of Propositions 7.2.III and 7.3.III concerning the structure of likelihood ratios for point processes which are regular (i.e., for which the Janossy measures have densities), where the conditional intensities are taken with respect to the internal histories.
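The updating scheme behind (14.4.1a–b) can be sketched numerically for a two-state chain: between observed points the unnormalized forward probabilities p_i(t) evolve by dp_i/dt = Σ_j q_{ji} p_j − λ_i p_i, and at each observed point p_i is multiplied by λ_i [cf. (10.3.6–9)]. The Euler step and all parameter values below are illustrative assumptions, not quantities from the text:

```python
# Two-state Markov-modulated Poisson filter: forward probabilities p_i(t),
# posterior pi_i(t) = p_i(t) / sum_j p_j(t) as in (14.4.1a), and
# H-intensity sum_i pi_i(t) * lambda_i as in (14.4.1b).

Q = [[-0.2, 0.2], [0.3, -0.3]]   # generator of the hidden chain X(t)
lam = [1.0, 5.0]                 # Poisson rates in states 0 and 1

def filter_step(p, dt):
    # Euler step of dp_i/dt = sum_j Q[j][i] p_j - lam[i] p_i
    return [p[i] + dt * (sum(Q[j][i] * p[j] for j in range(2))
                         - lam[i] * p[i]) for i in range(2)]

def run_filter(event_times, t_end, dt=1e-3):
    p = [0.5, 0.5]                       # flat prior over states
    events = sorted(event_times)
    t = 0.0
    while t < t_end:
        p = filter_step(p, dt)
        t += dt
        while events and events[0] <= t:
            events.pop(0)
            p = [lam[i] * p[i] for i in range(2)]   # update at a point
    s = sum(p)
    pi = [pi_i / s for pi_i in p]                    # (14.4.1a)
    return pi, sum(pi[i] * lam[i] for i in range(2)) # (14.4.1b)

pi, lam_H = run_filter([0.5, 0.6, 0.9], 1.0)
assert abs(sum(pi) - 1.0) < 1e-9
assert lam[0] <= lam_H <= lam[1]
```

A burst of three points in quick succession pushes the posterior toward the high-rate state, and the filtered intensity λ∗_H always lies between min_i λ_i and max_i λ_i, as a convex combination must.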
We state the result for MPPs; the simple point process appears as the special case that the mark space K reduces to a single point.

Proposition 14.4.I. Consider an MPP N, with ground process N_g and state space R+ × K, adapted to the internal history H on the probability space (Ω, E). For j = 1, 2 let P_j be a probability distribution on (Ω, E) such that for each P_j the ground process is boundedly finite (hence nonexplosive) with boundedly finite first moment measure, and for each t > 0, P_j^t denotes the restriction of P_j to H_t. Suppose that P1 has a strictly positive mark-predictable H-intensity λ_1^H(t, κ, ω) relative to the reference measure ℓ_K on B_K. Then the necessary and sufficient condition for P_2^t ≪ P_1^t for 0 < t ≤ ∞ is the existence of a nonnegative, mark-predictable process µ(t, κ, ω) such that under P2, N admits the mark-predictable intensity λ_2^H(t, κ, ω) = µ(t, κ, ω) λ_1^H(t, κ, ω). When this condition is satisfied, the likelihood ratio

L_t(ω) = dP_2^t/dP_1^t (N(ω))
has a right-continuous version with left limits, given P1-a.s. by L_t(ω) = 1 for t < t1(ω), and for t ≥ t1 it equals

L_t(ω) = [ ∏_{i: t_i ≤ t} µ(t_i, κ_i, ω) ] exp( − ∫_(0,t]×K [µ(s, κ, ω) − 1] λ_1^H(s, κ, ω) ds ℓ_K(dκ) ),    (14.4.2)

where {(t_i, κ_i): i = 1, 2, . . .} is an enumeration of the points of N. Moreover, if for j = 1, 2, λ_g^(j)(t) is the ground intensity and f^(j)(κ | t) is the density of the mark-predictable kernel, and µ_g(t, ω) = λ_g^(2)(t, ω)/λ_g^(1)(t, ω), then for each T > 0, L_t(ω) is the unique such solution of the integral equation

L_t(ω) = 1 + ∫_0^t L_{s−}(ω)[µ_g(s, ω) − 1] Z∗(ds, ω)    (0 ≤ t ≤ T),    (14.4.3)

where Z∗(t) is the (H, P1)-martingale defined by

dZ∗(t) = ∫_K [f^(2)(κ | t)/f^(1)(κ | t)] N(dt × dκ) − λ_g^(1)(t) dt.
Proof. Because we are concerned here only with the internal history intensities, a straightforward proof of (14.4.2) can be written down by starting from the representation of the likelihoods in terms of Janossy measures, as in the discussion leading to Proposition 7.2.III. By assumption, the Janossy measures under P1 are absolutely continuous (with respect to Lebesgue measure) with densities determined by the conditional intensity λ_1^H(t, κ, ω). Then the existence of the likelihood ratio implies absolute continuity for the Janossy measures under P2 also. In turn, the densities under P2 can be used to define the conditional intensity under P2, namely, λ_2^H(t, κ, ω). Finally, restating the Janossy densities in terms of the conditional intensities leads to the form of the likelihood ratio given in (14.4.2). The fact that the conditional hazard functions are measurable functions of (t_i, κ_i) for t_i ≤ t implies that, as a function on (0, t] × K × Ω, µ(s, κ, ω) is (B(0, t) ⊗ B_K ⊗ H_t)-measurable, and therefore, from the projection theorem, possesses an H-mark-predictable version. Use of this version in (14.4.2) leaves the likelihood unaltered except possibly on a set of (ℓ × ℓ_K × P1)-measure zero.

Conversely, suppose that P2 ≪ P1 on H_T = H_[0,T], and that the conditional intensity λ_1^H exists. We consider the likelihood ratio in the successive intervals (0, t1), [t1, t2), . . . . In the interval (0, t1), absolute continuity of the likelihoods means that the distribution function of the length of the first interval under P2 is absolutely continuous with respect to its distribution under P1. The latter has a positive density, so this is equivalent to the assertion that its distribution under P2 is absolutely continuous with respect to Lebesgue measure. The ratio of the two densities on the event t1 > t is also the ratio of the two conditional intensities, and so defines an appropriate form of µ on this
14.4. Filters and Likelihood Ratios
event. When $t_1 \le t < t_2$, a similar argument applied to the conditional distribution for the length of the second interval, given $t_1$, establishes the existence of an appropriate $\mu$ on this event also. Proceeding in this way, and observing that the process is free from explosions under both probabilities, shows that the $(H, P_2)$-conditional intensity exists in general. Both conditional intensities, and hence also their ratio, can be given predictable versions without altering the ratio except on sets of $(\ell \times \ell_K \times P_1)$-measure zero.
Finally, the equivalence of (14.4.2) and (14.4.3) is a consequence of the exponential formula (Lemma 4.6.II) if in that formula we take $Z^*(t)$ in place of the monotonic function $F(t)$, and the ratio $[\mu_g(t) - 1]$ in place of the function $u(t)$. The integrability requirement on $u$ here follows from the equations
$$E_1 \int_0^t \mu_g(s)\, \frac{f^{(2)}(\kappa \mid s)}{f^{(1)}(\kappa \mid s)}\; dN_g(s) = E_1 \int_{(0,t] \times K} \mu(s, \kappa)\, N(ds \times d\kappa) = E_1 \int_{(0,t] \times K} \mu(s, \kappa)\, \lambda^{(1)}(s, \kappa)\, ds\, \ell_K(d\kappa),$$
and the last expression equals $E_2 N_g(0, t] < \infty$, where for $j = 1, 2$, $E_j$ denotes expectation with respect to $P_j$. Equality of the right-hand sides of the first and second of these equations is also the essential calculation needed to establish the martingale property of $Z^*$, because it shows that $E[dZ^*(t) \mid \mathcal{F}_t] = 0$.
It is often convenient to write (14.4.2) in terms of the ground process $N_g(t)$ and conditional mark distributions $f^{(j)}(\kappa \mid t)$ (cf. Proposition 7.3.III) as
$$L_t(\omega) = \prod_{i=1}^{N_g(t)} \frac{\lambda^{(2)}_g(t_i, \omega)}{\lambda^{(1)}_g(t_i, \omega)} \prod_{i=1}^{N_g(t)} \frac{f^{(2)}(\kappa_i \mid t_i)}{f^{(1)}(\kappa_i \mid t_i)} \times \exp\Bigl(-\int_{(0,t] \times K} \bigl[\lambda^{(2)}_g(t, \omega) - \lambda^{(1)}_g(t, \omega)\bigr]\, dt\, \ell_K(d\kappa)\Bigr).$$
When $N$ is a simple point process, both expressions take the familiar form
$$L_t(\omega) = \prod_{i\colon t_i \le t} \mu(t_i, \omega)\, \exp\Bigl(-\int_0^t [\mu(s, \omega) - 1]\, \lambda^{(1)}(s, \omega)\, ds\Bigr) = \prod_{i\colon t_i \le t} \frac{\lambda^{(2)}(t_i, \omega)}{\lambda^{(1)}(t_i, \omega)}\, \exp\Bigl(-\int_0^t [\lambda^{(2)}(s, \omega) - \lambda^{(1)}(s, \omega)]\, ds\Bigr),$$
and the martingale $dZ^*(t)$ reduces to $dZ(t) = dN(t) - \lambda^{(1)}(t)\, dt$.
Corollary 14.4.II. For an MPP satisfying the conditions of Proposition 14.4.I, there exists a sequence of stopping times $S_n \to \infty$ a.s. as $n \to \infty$ such that for each $n$, $L_{t \wedge S_n}$ is an $H$-martingale under $P_1$.
14. Evolutionary Processes and Predictability
Proof. Because $L(t) \equiv L_t$ is left-continuous and nonnegative in each finite interval $(0, t]$, it is also bounded a.s. on such intervals. Furthermore, if $T_n$ is defined for the ground process $N_g$ as at (14.1.1),
$$E_1 \int_0^{t \wedge T_n} \mu_g(s)\, \lambda^{(1)}_g(s)\, ds = E_2[N_g(t \wedge T_n)] \le n < \infty,$$
and
$$E_1 \int_0^{t \wedge T_n} \lambda^{(1)}_g(s)\, ds = E_1[N_g(t \wedge T_n)] \le n < \infty.$$
Defining $S_n = T_n \wedge \inf\{t\colon L(t) \ge n\}$, we necessarily have $S_n \to \infty$ and thus
$$E_1 \int_0^{t \wedge S_n} \bigl|L(s-)\,[\mu_g(s) - 1]\bigr|\, \lambda^{(1)}_g(s)\, ds \le n^2.$$
Thus, the quantity $L(s-)\,[\mu_g(s) - 1]$ on the right-hand side of (14.4.3) is predictable, so because $Z^*(\cdot)$ is an $H$-martingale under $P_1$, the likelihoods $L(t \wedge S_n)$ must also form a martingale (see Exercise 14.1.7).
This corollary can also be rephrased as stating that under $P_1$, the likelihood ratios $L_t$ form an $H$-local martingale.
It should be emphasized that the treatment in the last few results involves some substantial simplifications. First, by restricting the discussion to boundedly finite processes, we rule out 'explosive' situations, where the sequence of time points $T_n$ may approach a finite limit point. In general, if this requirement is dropped, similar results hold on the interval up to the time of the first 'explosion'. See, for example, Liptser and Shiryaev (1978) or Last and Brandt (1995) for treatments when this constraint is relaxed. Second, the absolute continuity condition imposed upon the compensator under $P_1$ rules out situations where the compensators may have discontinuities. Such discontinuities correspond to atoms in the conditional distributions defining the compensator, as illustrated in Exercise 14.1.10(c). Likelihood ratios can certainly be considered for such processes, with the requirement that the compensator for the derived process has discontinuities only where the base process compensator has discontinuities. Then the general form of the IHF comes into play; a simple example, involving a Poisson process with fixed atoms, is given in Exercise 14.4.4, leading to additional factors of the form
$$\frac{1 - \Delta A^{(2)}_i}{1 - \Delta A^{(1)}_i}, \qquad (14.4.4)$$
where for $j = 1, 2$, the $\Delta A^{(j)}_i$ are the jumps in the compensator under $P_j$, $\Delta A^{(2)}_i = \mu(\tau_i)\, \Delta A^{(1)}_i$ where the $\tau_i$ are the corresponding time points, and, for
a simple point process, both sets of jumps are required to satisfy $\Delta A^{(j)}_i \le 1$. Again see, for example, Boel, Varaiya and Wong (1975); Jacod (1975); Liptser and Shiryaev (1978); Last and Brandt (1995, (10.1.14)) for further details. The third simplification is a consequence of considering only internal histories. Situations where one conditional intensity depends on information additional to the record of the occurrence times and the other does not, as happens in the hidden Markov schemes described earlier, require some filtering procedure to reduce the more complex intensity to a form of internal intensity, as we consider below Proposition 14.4.IV. In yet other situations, the point process likelihood may be only a partial likelihood, depending on the values of random variables treated as constants for the purposes of likelihood estimation.
We proceed to a consideration of additional issues from the filtering perspective, limiting the discussion to simple point processes to avoid too cumbersome a treatment. The form of the likelihood ratio prompts the following more general question. Given a point process on $\mathbb{R}_+$ with measure $P$ and $F$-intensity $\lambda^F$ under $P$, and some further nonnegative $F$-predictable function $\mu(\cdot)$, does the product $\mu(t, \omega)\, \lambda^F(t, \omega)$ represent the $F$-intensity of the point process under the new measure $P_1(d\omega) = L_t(\omega)\, P(d\omega)$? The answer in general is no, because under the new measure there is no guarantee, without some further constraints on $\mu(\cdot)$, that the new trajectories will be a.s. boundedly finite. In other words, the new measure may not necessarily be a probability measure, the possible mass deficiency corresponding to the probability that the realizations no longer lie within $\mathcal{N}^{\#}_{(0,t]}$. It seems rather difficult to find conditions directly on $\mu$ that will avert this possibility; it is obvious that boundedness of $\mu(t, \omega)\, \lambda^H(t, \omega)$ on $(0, t]$ is sufficient, but this is too restrictive to be generally useful.
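When $\mu(t, \omega)\, \lambda^F(t, \omega)$ is bounded — for instance, when both intensities are constant — the normalization $E[L_t] = 1$ does hold, and it is easy to check numerically. The following sketch is ours, not from the text: it simulates paths of a rate-$\lambda^{(1)}$ Poisson process, evaluates the product–exponential form of $L_t$ for a second constant rate $\lambda^{(2)}$, and averages; all numerical values are illustrative.

```python
import math
import random

def simulate_poisson(rate, t_end, rng):
    """Return the points of a rate-`rate` Poisson process on (0, t_end]."""
    pts, t = [], 0.0
    while True:
        t += rng.expovariate(rate)
        if t > t_end:
            return pts
        pts.append(t)

def likelihood_ratio(points, lam1, lam2, t_end):
    """L_t = prod(lam2/lam1 over points) * exp(-(lam2 - lam1) * t_end),
    the simple-point-process form with constant intensities."""
    log_l = len(points) * math.log(lam2 / lam1) - (lam2 - lam1) * t_end
    return math.exp(log_l)

rng = random.Random(42)
lam1, lam2, t_end, n_rep = 1.0, 1.4, 5.0, 20000
mean_L = sum(
    likelihood_ratio(simulate_poisson(lam1, t_end, rng), lam1, lam2, t_end)
    for _ in range(n_rep)
) / n_rep
# E_1[L_t] should be close to 1: L is a true martingale here
```

With constant intensities the product reduces to $(\lambda^{(2)}/\lambda^{(1)})^{N(t)}$, so only the count matters; for time-varying intensities the same check applies with the compensator integral computed numerically.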
An alternative stratagem is simply to require $P_1$ to form a probability measure, and this leads to the following result.
Proposition 14.4.III. Let $N$ be a point process on $\mathbb{R}_+$ defined on the probability space $(\Omega, \mathcal{E}, P)$, $F$ a right-continuous history for $N$ such that $N$ admits the intensity $\lambda^F(t, \omega)$, and $\mu(t, \omega)$ a nonnegative $F$-predictable process satisfying for some $t > 0$
$$\int_0^t \mu(s, \omega)\, \lambda^F(s, \omega)\, ds < \infty \qquad P\text{-a.s.}$$
Then for $L_t(\omega)$ defined by (14.4.2), the necessary and sufficient condition for $P_1(d\omega) = L_t(\omega)\, P(d\omega)$ to be a probability measure on $(\Omega, \mathcal{E})$ is that
$$E[L_t(\omega)] = \int_\Omega L_t(\omega)\, P(d\omega) = 1; \qquad (14.4.5)$$
then $N$ has an $F$-intensity under $P_1$ equal to $\mu(s, \omega)\, \lambda^F(s, \omega)$ for $0 \le s \le t$.
Proof. The relation at (14.4.5) is clearly necessary and sufficient for $P_1$ to be a probability measure, because it is simply the statement $\int_\Omega P_1(d\omega) = 1$.
The substantial part is to show that when it is satisfied, $\mu(s, \omega)\, \lambda^F(s, \omega)$ is the $F$-intensity on $(0, t]$ for $N$ under $P_1$, for which it is sufficient to show that for every nonnegative $F$-predictable process $Y(s, \omega)$ on $(0, t]$,
$$E_1\Bigl[\int_0^t Y(s)\, dN(s)\Bigr] = E_1\Bigl[\int_0^t Y(s)\, \mu(s)\, \lambda^F(s)\, ds\Bigr].$$
By definition of $P_1$, these expectations $E_1(\cdots)$ under $P_1$ can be rewritten as expectations $E(\cdots)$ with respect to $P$: writing $L(t) = L_t(\omega)$, they become
$$E\Bigl[L(t) \int_0^t Y(s)\, dN(s)\Bigr] \quad \text{and} \quad E\Bigl[L(t) \int_0^t Y(s)\, \mu(s)\, \lambda^F(s)\, ds\Bigr],$$
respectively. Now under condition (14.4.5), $L(s, \omega)$ is not merely a local martingale but is in fact a martingale [this follows from the defining relation (14.4.2), where it is necessarily a supermartingale, and the relation $E(L_t) = E(L_0) = 1$ then shows it to be a martingale]. From the results in Section A3.3 concerning the Doob–Meyer decomposition and natural increasing functions, these expectations $E(\cdots)$ can now be rewritten as
$$E\Bigl[\int_0^t L(s)\, Y(s)\, dN(s)\Bigr] \quad \text{and} \quad E\Bigl[\int_0^t L(s)\, Y(s)\, \mu(s)\, \lambda^F(s)\, ds\Bigr],$$
respectively. But then from the definition of $L(s)$ and the fact that $\lambda^F(\cdot)$ is the $F$-intensity of $N$ under $P$, the first of these integrals equals
$$E \int_0^t L(s-)\, Y(s)\, dN(s) = E \int_0^t L(s-)\, Y(s)\, \mu(s)\, \lambda^F(s)\, ds = E \int_0^t L(s)\, Y(s)\, \mu(s)\, \lambda^F(s)\, ds,$$
where the natural increasing property of $\int_0^s Y(u)\, \mu(u)\, \lambda^F(u)\, du$ justifies the last equality.
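Proposition 14.4.III can be seen at work numerically in the simplest possible setting (our own sketch, not from the text): take $\lambda^F \equiv 1$ under the base measure $P$ and $\mu$ constant, so the reweighted process should be Poisson with rate $\mu$; weighting functionals of a unit-rate path by $L_t$ should then reproduce expectations under the new measure.

```python
import math
import random

def poisson_path(rate, t_end, rng):
    """Points of a rate-`rate` Poisson process on (0, t_end]."""
    pts, t = [], 0.0
    while True:
        t += rng.expovariate(rate)
        if t > t_end:
            return pts
        pts.append(t)

mu, t_end, n_rep = 1.5, 4.0, 40000
rng = random.Random(1)

weighted_count = 0.0
for _ in range(n_rep):
    pts = poisson_path(1.0, t_end, rng)                  # base measure: unit rate
    L = mu ** len(pts) * math.exp(-(mu - 1.0) * t_end)   # (14.4.2) with constant mu
    weighted_count += L * len(pts)
weighted_count /= n_rep
# E[L_t N(t)] under P should equal E_1[N(t)] = mu * t_end under P_1
```

The same reweighting idea applies with predictable, path-dependent $\mu(\cdot)$, at the cost of computing the product and the compensator integral along each path.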
Next, we formulate a general version of the Bayesian type of formulae that appear in Examples 14.3(a) and 14.4(a). In this version, the role of the likelihood itself (i.e., the density with respect to Lebesgue measure) is taken by the likelihood ratio with respect to a 'reference probability' $P_1$, which we discuss shortly.
Proposition 14.4.IV. Let $G \subseteq F$ be two histories for the point process $N$ on $\mathbb{R}_+$ and $P_1$, $P_2$ two probability measures, all defined on the measurable space $(\Omega, \mathcal{E})$. Suppose that for $j = 1, 2$ and $t$ in some interval $(0, T]$, the restrictions $P_{jt}$ of $P_j$ to $\mathcal{F}_t$ satisfy $P_{2t} \ll P_{1t}$, and let $L_t = dP_{2t}/dP_{1t}$ be the likelihood ratio. Then for any real-valued bounded $F$-adapted process $Z_t$,
$$E_2(Z_t \mid \mathcal{G}_t) = \frac{E_1(Z_t L_t \mid \mathcal{G}_t)}{E_1(L_t \mid \mathcal{G}_t)} \qquad P_1\text{-a.s.} \qquad (14.4.6)$$
Proof. First we show that the product $E_1(L_t \mid \mathcal{G}_t)\, E_2(Z_t \mid \mathcal{G}_t)$ can be taken as a version of the conditional expectation $E_1(Z_t L_t \mid \mathcal{G}_t)$. By definition, for $B \in \mathcal{G}_t$,
$$\int_B E_1(Z_t L_t \mid \mathcal{G}_t)\, P_1(d\omega) = \int_B Z_t L_t\, P_1(d\omega) = \int_B Z_t\, P_2(d\omega),$$
and, similarly, because $E_2(Z_t \mid \mathcal{G}_t)$ is $\mathcal{G}_t$-measurable,
$$\int_B E_2(Z_t \mid \mathcal{G}_t)\, E_1(L_t \mid \mathcal{G}_t)\, P_1(d\omega) = \int_B E_2(Z_t \mid \mathcal{G}_t)\, P_2(d\omega) = \int_B Z_t\, P_2(d\omega),$$
implying the desired $P_1$-a.s. equality. Now the set $D$, where $E_1(L_t \mid \mathcal{G}_t) = 0$, is $\mathcal{G}_t$-measurable and satisfies
$$P_2(D) = \int_D L_t\, P_1(d\omega) = \int_D E_1(L_t \mid \mathcal{G}_t)\, P_1(d\omega) = 0,$$
so the $P_1$-a.s. equality remains true when put in the ratio form (14.4.6) whatever particular definition is used for the conditional expectations when the denominator of (14.4.6) is zero.
These results form the starting point of a general attack on the problem of filtering for point processes. The reference probability $P_1$ is generally the distribution of a Poisson process at unit rate, this process being assumed to be independent of the other random variables of interest, and in particular of those governing the signal process. The smaller history $G$ is commonly the internal history $H$, in which case the ratios at (14.4.6) are quite analogous to the ratios occurring earlier in the examples. Thus, in the mixed Poisson process of Example 14.3(a), the $F$-likelihood ratio equals $(\mu t)^N e^{-(\mu - 1)t}$, and the quantity under the expectation sign in the numerator at (14.4.6) equals $\mu (\mu t)^N e^{-(\mu - 1)t}$ which, on taking conditional expectations, cancelling the factor $t^N e^t$, and noting that $\mu$ is independent of $\mathcal{H}_t$ under $P_1$, leads back to the expression for the $H$-intensity obtained above (14.3.3). Similarly, the updating formulae in Example 14.4(a) follow from (14.4.6) by setting $Z_t = I\{X(t) = i\}$.
We now quote some more general results which can be obtained by this method. Essentially, we follow the development in Brémaud (1981), which gives more details. Updating formulae can be developed for both numerator and denominator of (14.4.6), essentially as corollaries to the integral equation (14.4.3) already obtained for the likelihood. The denominator requires the conditional expectation with respect to $\mathcal{G}_t$ of the $F$-likelihood ratio defined in terms of the $F$-intensity.
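The mixed Poisson calculation just described can be made concrete with a two-point prior for $\mu$, for which both conditional expectations in (14.4.6) are finite sums; after cancelling the factors not involving $\mu$, the likelihood ratio contributes $\mu^{N(t)} e^{-(\mu - 1)t}$. The code below is our own illustrative sketch, with hypothetical prior values.

```python
import math

def posterior_mean_rate(n_events, t, rates_and_priors):
    """E_2(mu | H_t) via the ratio (14.4.6): E_1(mu L_t | H_t) / E_1(L_t | H_t).

    For a mixed Poisson process, L_t(mu) is proportional to
    mu**N(t) * exp(-(mu - 1) * t), and with a finitely supported prior
    the conditional expectations reduce to finite sums."""
    num = sum(p * mu ** (n_events + 1) * math.exp(-(mu - 1.0) * t)
              for mu, p in rates_and_priors)
    den = sum(p * mu ** n_events * math.exp(-(mu - 1.0) * t)
              for mu, p in rates_and_priors)
    return num / den

prior = [(0.5, 0.5), (2.0, 0.5)]                    # P{mu = 0.5} = P{mu = 2} = 1/2
no_data = posterior_mean_rate(0, 0.0, prior)        # prior mean 1.25
many_events = posterior_mean_rate(30, 10.0, prior)  # empirical rate 3 favours mu = 2
```

As the observed rate exceeds both support points, the posterior mass piles onto the larger rate, so `many_events` is close to 2; with no data the ratio returns the prior mean.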
At first sight it may seem surprising that the result of this is simply to replace the F -intensity by the G-intensity in the expression for the likelihood ratio, which in the language of engineers implies the ‘separation of estimation and detection’ [recall that ‘detection’ is implemented through tests based on the likelihood ratio, and ‘estimation’ comes from the updating formulae based on (14.4.6)]. The result is most easily proved by first reverting to the integral equation form for the likelihood, taking conditional
expectations, replacing $\lambda^F$ by $\lambda^G$, and then returning to the basic form via a second application of the exponential formula (see Exercise 14.4.5 for details). Updating the numerator at (14.4.6) when $G = H$ is also effected by writing down the integral equation for the likelihood ratio, multiplying through by $Z_t$, and taking conditional expectations with respect to $H$.
Many of the most important examples in signal processing, and in many other fields also, relate to extensions of the Markov-directed Cox processes discussed earlier in this chapter and Section 10.3. In this case there occur a number of simplifications which allow the updating formulae for the numerator to be reduced to relatively manageable form, as we indicate below. Suppose first that $\mathcal{F}_t = \mathcal{F}_0 \vee \mathcal{H}_t$, where $\mathcal{F}_0$ incorporates full information about a prior or directing (signal) process $X_t$, so that $\mathcal{F}_0 \supseteq \sigma\{X_t\colon 0 \le t < \infty\}$. Then under $P_1$, $N$ is independent of $\sigma\{X_t\}$, and still using the $F$-predictable version of the intensity, we obtain
$$E_1(Z_t L_t \mid \mathcal{H}_t) = E_1(Z_t) + \int_0^t E_1\bigl(Z_t L_{s-}\, [\lambda^F(s) - 1] \,\big|\, \mathcal{H}_s\bigr)\, (dN(s) - ds). \qquad (14.4.7)$$
Suppose, in particular, that $N$ is a Cox process directed by some nonnegative measurable function $\mu$ of $X_t$ so that $\lambda(t) = \mu(X_t) \ge 0$ a.s., and that $X_t$ is a stationary Markov process on $\mathbb{R}$ with transition semigroup $P^t$. The main interest in the filtering context centres on estimating features of the directing process $X_t$, so we take $Z_t = f(X_t)$ for some bounded continuous function $f\colon \mathbb{R} \to \mathbb{R}$. The Markov property for $X_t$ then implies
$$E_1[L_t f(X_t)] = E_1[f(X_t)] + \int_0^t E_1\bigl(L(s-)\, [\lambda(s) - 1]\, P^{t-s} f(X_s) \,\big|\, \mathcal{H}_s\bigr)\, (dN(s) - ds).$$
Moreover, if the transition probabilities $P^t(x, B)$ ($x \in \mathbb{R}$, $B \in \mathcal{B}(\mathbb{R})$) are continuous functions of $x$ for $t > 0$, this expression can be written explicitly in terms of these probabilities and the associated distributions $P_t(B) = P\{X(t) \in B\}$ and $\Pi_t(B) = \Pi_t(B \mid N) \equiv E[L_t I_B(X_t) \mid \mathcal{H}_t]$, the latter being a regular version of the conditional probability expressed as a function of the realization $N$. The previous equation then takes the form
$$\Pi_t(B) = P_t(B) + \int_0^t \Bigl(\int_{\mathbb{R}} \Pi_s(dx)\, [\mu(x) - 1]\, P^{t-s}(x, B)\Bigr)\, (dN(s) - ds), \qquad (14.4.8)$$
from which the updating character is more readily apparent. Explicit expressions depend on the nature of the governing Markov process. The simplest case is that of a pure jump process with a denumerable set of states. The transition probabilities $P^{t-s}$ in (14.4.8) then take the matrix exponential form $e^{(t-s)Q}$ and the equations reduce to the updating formulae already noted in Example 14.4(a) and Section 10.3. The one point of difference is that, because we are here considering the likelihood ratio, it is the difference of the conditional intensities (the second being just unity as it corresponds
to a unit rate Poisson process) which appears in the integrand. This leads to a multiplicative factor which cancels in the ratio at (14.4.6). Even in the more difficult cases of diffusion and mixed jump-diffusion processes, a similar formal representation holds, but with the role of $Q$ taken by the infinitesimal generator. This is illustrated in the final example, which was a further starting point for point-process filtering theory in Snyder (1972, 1975).
Example 14.4(b) Cox process directed by a Markov diffusion process. Suppose there is given a diffusion process on $\mathbb{R}_+$, whose densities $p_t(y)$ satisfy the forward equation (with $D_t = \partial/\partial t$, $D_y = \partial/\partial y$, $D_y^2 = \partial^2/\partial y^2$)
$$D_t\, p_t(y) = -D_y[\beta(y)\, p_t(y)] + \tfrac{1}{2} D_y^2[\alpha(y)\, p_t(y)] \equiv \mathcal{L}\, p_t(y),$$
for drift term $\beta(\cdot)$, diffusion term $\alpha(\cdot)$, and diffusion operator $\mathcal{L}$ [see, e.g., Feller (1966, Chapter XIV)]. Denoting the density of $\Pi_t$ at (14.4.8) by $\pi_t$ and differentiating (14.4.8) between jumps, we obtain [recalling $p_0(x, y) = \delta(x - y)$ and using the linearity of $\mathcal{L}$]
$$D_t\, \pi_t(y) = \mathcal{L}\, p_t(y) + \int_0^t \Bigl(\int_{\mathbb{R}} \pi_s(x)\, [\mu(x) - 1]\, \mathcal{L}\, p_{t-s}(x, y)\, dx\Bigr)\, [dN(s) - ds] - \pi_t(y)\, [\mu(y) - 1]$$
$$= \mathcal{L}\, \pi_t(y) - \pi_t(y)\, [\mu(y) - 1], \qquad (14.4.9)$$
and at any jump $t$ of $N$ we have $\pi_{t+}(y) = \pi_{t-}(y)\, \mu(y)$. The differential equation (14.4.9) may be compared with the differential equation for the joint probabilities in Example 14.4(a). As in that example, it is just the forward differential equation for the transition probabilities modified by the extra term needed to preclude the occurrence of additional points during the interval under examination. Similarly, the expression for the jump term has the same general character as in that example. One point to reemphasize is that the updating formulae for the joint probabilities, derived from the numerator of (14.4.6), are much easier to handle than the nonlinear equations which arise in trying to update the state probabilities directly.
The main point about (14.4.8) and the various special cases that can be deduced from it is that it provides a very general formulation of the updating equations for the 'joint statistics' (forward probabilities) which first appeared in (10.3.6) and (10.3.13a). The existence of such updating formulae depends on the Markov structure of the unobserved process and the assumed independence of the observations (the points of the point process) given the current state of the Markov process. The resulting filtering theory has been mainly applied to the problem of state estimation. Whether there exist corresponding extensions to the E–M algorithm, used in Section 10.3 to provide a tractable approach to the problem of parameter estimation, is still a subject of current research. A variety of approaches has been suggested, from analytic results in special cases to various forms of Monte Carlo estimation, including particle filters, in more complex situations.
An alternative in such more complex situations is to revert to a discrete context, for example, by approximating a diffusion process by a random walk, and taking advantage of the extensive literature and software for discrete hidden Markov models. There are important potential applications in many fields, including finance, neurophysiology, meteorology, and geophysics, as well as in the original context of signal processing. See references under hidden Markov models in Section 10.3 for some more recent work.
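In the finite-state case the normalized filtering equations (those of Exercise 14.4.6 below) are simple enough to step numerically. The following is our own minimal sketch, not from the text: a two-state hidden chain with illustrative rates, Euler integration of the filter ODE between observed points, and the jump update $\hat{\pi}_i \propto \lambda_i \hat{\pi}_i$ at each observed point.

```python
Q = [[-0.2, 0.2],
     [0.4, -0.4]]          # generator of the hidden chain X(t) (illustrative)
lams = [0.5, 3.0]          # emission (Cox) rates in each state (illustrative)

def filter_step(pi, dt):
    """Euler step of d pi_i/dt = sum_j q_ji pi_j - (lam_i - lam_bar) pi_i,
    renormalizing to guard against discretization drift."""
    lam_bar = sum(l * p for l, p in zip(lams, pi))
    new = []
    for i in range(len(pi)):
        drift = sum(Q[j][i] * pi[j] for j in range(len(pi)))
        new.append(pi[i] + dt * (drift - (lams[i] - lam_bar) * pi[i]))
    s = sum(new)
    return [p / s for p in new]

def filter_jump(pi):
    """At an observed point: pi_i -> lam_i * pi_i / lam_bar."""
    s = sum(l * p for l, p in zip(lams, pi))
    return [l * p / s for l, p in zip(lams, pi)]

# run the filter on an illustrative burst of observed points
pi, t, dt = [0.5, 0.5], 0.0, 0.01
for t_ev in [1.0, 1.3, 1.5]:
    while t + dt <= t_ev:
        pi = filter_step(pi, dt)
        t += dt
    pi = filter_jump(pi)
after_burst = pi[1]   # posterior weight on the high-rate state after the burst
```

A burst of closely spaced points should push the posterior weight toward the high-rate state, while quiet stretches let it relax back toward the low-rate state; this is the discrete analogue of the drift and jump terms in (14.4.9).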
Exercises and Complements to Section 14.4
14.4.1 Extend Example 14.4(a) to the context where the underlying Markov process is bivariate, $(X(t), Y(t))$ say, with $X(t)$ $\{1, \ldots, K\}$-valued as before and unobserved, and $Y(t)$ observed with values in $\{\alpha, \beta, \ldots\}$, so that $\{(X(t), Y(t))\}$ has $Q$-matrix $(q_{(i,\alpha),(j,\beta)})$ say. Denote the internal history of the observed process $Y(\cdot)$ by $H = \{\mathcal{H}_t\} \equiv \{\sigma(\{Y(s)\colon 0 \le s \le t\})\}$. By analogy with (14.4.1) set $\hat{p}_X(t) \equiv (\hat{p}_1(t), \ldots, \hat{p}_K(t))$ with $\hat{p}_i(t) = P\{X(t) = i,\ \text{observed } Y(s)\ (0 \le s \le t)\}$, so that [cf. (10.3.6–7)] on intervals of constancy for $Y(\cdot)$, $Y(t) = \alpha$ say,
$$D_t\, \hat{p}_i(t) = -|q_{(i,\alpha),(i,\alpha)}|\, \hat{p}_i(t) + \sum_{k \ne i} q_{(k,\alpha),(i,\alpha)}\, \hat{p}_k(t),$$
and where $\alpha = Y(t-) \ne Y(t) = \beta$,
$$\hat{p}_i(t) = \hat{p}_i(t-)\, q_{(i,\alpha),(i,\beta)} \big/ |q_{(i,\alpha),(i,\alpha)}|.$$
(a) Recover Example 14.4(a) from Example 10.3(d) by taking $X(t)$ as earlier and $Y(t) = N(t)$.
(b) As another special case, derive the corresponding equations for the process in which $Y(t)$ simply counts the jumps in the process $X(\cdot)$, which is otherwise unobserved.
[Hint: See Rudemo (1972, 1973) for further details and special cases.]
14.4.2 Consider a Neyman–Scott cluster process with Poisson cluster centre process at rate $\mu$ and clusters of random size $\nu$, $\Pr\{\nu = j\} = q_j$ for $j = 0, 1, \ldots$, located at independent exponentially distributed distances, mean $1/\lambda$, from the cluster centre. The observed process $N(t)$ consists of both the cluster centres and the points of the clusters, without distinction, lying in the interval $(0, t]$. Write $X(t)$ for the number of points generated from centres $t_j < t$ at locations $t_i \ge t$. Then $X(\cdot)$ is an unobserved Markov process on $\{0, 1, \ldots\}$ governing the observed points, and $\lambda^*(t) = \mu + \lambda X(t-)$. Show that the process fits the context of Exercise 14.4.1, albeit with a countable state space for $X(\cdot)$, when $Y(t) = N(t)$ and the nonzero off-diagonal transition rates are given for all $i, j, r = 0, 1, \ldots$ by $q_{(i,r),(i+j,r+1)} = \mu q_j$ and $q_{(i,r),(i-1,r+1)} = \lambda i$. Deduce that the joint p.g.f. $P_t(w, z) = E(w^{X(t)} z^{Y(t)} \mid X(0))$ is given by
$$P_t(w, z) = [z + (w - z)e^{-\lambda t}]^{X(0)} \exp\Bigl(-\mu \int_0^t [1 - zQ(z + (w - z)e^{-\lambda u})]\, du\Bigr),$$
and that
$$\lim_{t \to \infty} E(w^{X(t)}) = \exp\Bigl(-\frac{\mu}{\lambda} \int_0^1 \frac{1 - Q(1 - (1 - w)v)}{v}\, dv\Bigr).$$
[Hint: See also Jowett and Vere-Jones (1972).]
14.4.3 To complete the proof of Proposition 14.4.I, suppose that $P_2 \ll P_1$, where $P_1$ has $H$-intensity $\lambda_1(\cdot)$. Arguing as in Proposition 7.1.III, this implies that $P_2$ is absolutely continuous with respect to a Poisson process at unit rate, and thus has $H$-intensity $\lambda_2(\cdot)$ say. Set $\mu(t, \omega) = \lambda_2(t, \omega)/\lambda_1(t, \omega)$ if $\lambda_1(t, \omega) \ne 0$, $= 1$ otherwise, and let $\tilde{\mu}$ be the predictable projection of $\mu$. Show that $L_t$ defined by (14.4.2) with $\tilde{\mu}$ in place of $\mu$ has the properties of the Radon–Nikodym derivative $dP_2/dP_1$.
14.4.4 Let $N = N_1 + N_2$ be the sum of two Poisson processes, for which the parameter measures are $\mu$ times Lebesgue measure, and a purely atomic measure with atoms of fixed mass $a$ at each integer. Write down the likelihood ratio for $N$ against a reference measure of the same type, with $\mu = a = 1$, and verify that it can be written in the form (14.4.4). [Hint: The term due to the atomic component has the same structure as a set of i.i.d. Poisson variables.]
14.4.5 (a) For a process with $F$-intensity $\lambda^F$ let $L_t$ be its likelihood ratio relative to a unit rate Poisson process [see around (14.4.7)]. If $\mathcal{G}_t \subseteq \mathcal{F}_t$ (all $t \ge 0$) for a history $G$, $E(L_t \mid \mathcal{G}_t)$ has the same form as $L_t$ but with $G$-intensity $E(\lambda^F_t \mid \mathcal{G}_{t-})$ in place of $\lambda^F$.
[Hint: From the integral equation for $L_t$ construct $E(L_t \mid \mathcal{G}_t)$. Deduce that, because $N_s$ is $\mathcal{G}_t$-measurable for $t > s$,
$$E_1\Bigl[\int_0^t L_{s-}\, [1 - \lambda^F(s)]\, (dN(s) - ds) \,\Big|\, \mathcal{G}_t\Bigr] = \int_0^t E_1\bigl[L_{s-}\, (1 - \lambda^F(s)) \,\big|\, \mathcal{G}_t\bigr]\, (dN(s) - ds).$$
Now $L_{s-}$ and $1 - \lambda^F(s)$ are $\mathcal{G}_{s-}$-measurable, so the integrand on the right-hand side equals $E_1(L_{s-} \mid \mathcal{G}_{s-})\, E_2(1 - \lambda^F(s) \mid \mathcal{G}_{s-})$ $P_2$-a.s.]
(b) Show similarly that (14.4.7) holds [cf. Brémaud (1981, Chapter VI.3)].
14.4.6 Derive the updating equations (10.3.6–7) of Example 10.3(d) from (14.4.8) by using standard Chapman–Kolmogorov equations for the derivatives of transition probabilities and considering (14.4.8) at and between jumps of $N(\cdot)$. Convert these equations into updating equations for the conditional probabilities $\Pi_t(\cdot)/L_t$, so that with $\hat{\pi}_i(t) = P\{X(t) = i \mid \mathcal{H}_t\}$, either $t$ is a point of the realization in $\mathcal{H}_t$ and $\hat{\pi}_i(t) = (\lambda_i/\hat{\lambda})\, \hat{\pi}_i(t-)$, or else
$$\frac{\partial \hat{\pi}_i(t)}{\partial t} = \sum_{j \ne i} q_{ji}\, \hat{\pi}_j(t) - (\lambda_i - \hat{\lambda})\, \hat{\pi}_i(t).$$
14.4.7 Suppose in Example 14.4(a) that the transition rates $(q_{ij})$ and the rates $\lambda_i$ are functions of some parameter $\alpha$, and give $\alpha$ some prior distribution. Extend the updating equations [cf. also Example 14.3(a)] to obtain the integral equation for the joint statistic $h_i(\alpha, t)$ for $X(t)$, $\alpha$, $N(\cdot)$, corresponding to (14.4.8),
$$h_i(\alpha, t) = \sum_j \int_0^t h_j(\alpha, s)\, (\mu_j(\alpha) - 1)\, p_{ij}(t - s; \alpha)\, dZ(s).$$
Consider the special case where $X(\cdot)$ is $\{1, 2\}$-valued, with unknown emission rates $\lambda_1$, $\lambda_2$ but known exponential holding times in each state. Investigate the consequences of assuming prior distributions for $\lambda_1$, $\lambda_2$ that are independent gamma distributions.
14.5. A Central Limit Theorem
If the Bayesian approach, with its close links with updating formulae, has been the main focus of attention in the engineering literature, there has also been a substantial development in the application of 'classical' statistical procedures to inference problems for point processes. In particular, the monograph of Kutoyants (1980), especially in its revised edition in English [Kutoyants (1984b)], develops the asymptotic results for maximum likelihood estimates based on the representations given earlier in this section. An important part of this development is establishing asymptotic normality for the likelihood derivatives $D_\theta \log L_t$ (as usual, $D_\theta \equiv \partial/\partial\theta$), where $\theta$ is the parameter under study. From the representation at (7.2.4) or (14.4.2) it is readily seen that under suitable regularity conditions these take the form
$$\int_0^t \frac{D_\theta\, \lambda^H(s)}{\lambda^H(s)}\, \bigl[dN(s) - \lambda^H(s)\, ds\bigr].$$
Evaluated at the true parameters, such integrals reduce to integrals with respect to the point process martingale $Z(s) = N(s) - A(s)$, and hence they are themselves martingales. It is then possible to apply to them general central limit theorems for martingales, as, for example, did Rebolledo (1980), or else to develop versions of such theorems specially tailored for the point process context, as did Kutoyants (1979, 1984b). We follow the latter approach and give a slight extension to Kutoyants' work to allow for the possibility of a random variance term that leads to a mixed normal distribution. A convenient framework for this extension is the concept of stable convergence in distribution, as described in Section A3.2. However, we do not use the full strength of stable convergence with respect to the σ-algebra $\mathcal{F}_\infty$ but only stable convergence with respect to the σ-algebra generated by the limit r.v. itself, as this is all that is needed here to discuss the convergence to mixtures of normals. Jarupskin (1984) obtained stronger forms, requiring further conditions.
Theorem 14.5.I.
Let $N$ be a simple point process on $\mathbb{R}_+$, $F$-adapted and with continuous $F$-compensator $A$. Suppose that for each $T > 0$ an $F$-predictable process $f_T(t)$ is given and that there exists a positive $\mathcal{F}_0$-measurable random variable $\eta$ such that
(i) $E \int_0^T [f_T(u)]^2\, dA(u) < \infty$;
(ii) as $T \to \infty$, $\int_0^T [f_T(u)]^2\, dA(u) \to \eta^2$ in probability; and
(iii) there exists $\delta > 0$ such that as $T \to \infty$,
$$E \int_0^T \bigl|f_T(u)\bigr|^{2+\delta}\, dA(u) \to 0.$$
Then the random integral
$$X_T = \int_0^T f_T(x)\, [dN(x) - dA(x)]$$
converges $\mathcal{F}_0$-stably in distribution to a limit random variable $U\eta$, where $U$ is independent of $\mathcal{F}_0$ and has a normal $N(0, 1)$ distribution.
Proof. It is most convenient to make use of the form (A3.2.10) for stable convergence, because the exponential formula can again be used to good effect to simplify the limiting form of expectations $E[Z \exp(iyX_T)]$. To this end, consider the process, for fixed real $y$,
$$\zeta_T(t, y) = \exp\Bigl(iy \int_0^t f_T(u)\, [dN(u) - dA(u)] + \tfrac{1}{2} y^2 \int_0^t [f_T(u)]^2\, dA(u)\Bigr).$$
Here, $A(t)$ and $N(t)$ are, respectively, continuous and pure jump processes, so $\zeta_T(t, y)$ can be written in terms of its continuous and jump components as
$$\zeta_T(t, y) = \exp\Bigl(\int_0^t \bigl(\tfrac{1}{2} y^2 [f_T(u)]^2 - iy f_T(u)\bigr)\, dA(u)\Bigr) \times \prod_i \bigl(1 + \bigl(\exp[iy f_T(t_i)] - 1\bigr)\, \Delta N(t_i)\bigr), \qquad (14.5.1)$$
where the product is taken over the jump points $t_i$ of the realization of $N$ over the interval $(0, t]$, and because $N$ is assumed simple, $\Delta N(t_i) = 1$ a.s. for all $i$. Comparing this expression for $\zeta_T(t, y)$ with the exponential formula at (4.6.2), it can be deduced that $\zeta_T(t, y)$ is the unique solution, bounded in $[0, T]$, of the integral equation
$$\zeta_T(t, y) - 1 = \int_0^t \zeta_T(u-, y)\, \bigl\{\bigl(\tfrac{1}{2} y^2 [f_T(u)]^2 - iy f_T(u)\bigr)\, dA(u) + \bigl(\exp[iy f_T(u)] - 1\bigr)\, dN(u)\bigr\}$$
$$= \int_0^t \zeta_T(u-, y)\, \bigl(\exp[iy f_T(u)] - 1\bigr)\, [dN(u) - dA(u)] + \int_0^t \zeta_T(u-, y)\, \bigl(\exp[iy f_T(u)] - 1 - iy f_T(u) + \tfrac{1}{2} y^2 [f_T(u)]^2\bigr)\, dA(u).$$
Now let $\tau$ denote the stopping time (recall that $\eta$ is $\mathcal{F}_0$-measurable)
$$\tau = \inf\Bigl\{t\colon \int_0^t [f_T(u)]^2\, dA(u) \ge \eta^2\Bigr\},$$
and let $Z$ be any $\mathcal{F}_0$-measurable, essentially bounded random variable. Setting $t = T \wedge \tau$ in the integral equation, multiplying by $Z$, and taking conditional expectations with respect to $\mathcal{F}_0$, the optional sampling theorem implies that
$$E\Bigl[Z \int_0^{T \wedge \tau} \zeta_T(u-, y)\, \bigl(\exp[iy f_T(u)] - 1\bigr)\, [dN(u) - dA(u)] \,\Big|\, \mathcal{F}_0\Bigr] = 0;$$
because both of $Z\zeta_T(u-, y)$ (which is left-continuous) and $Z\bigl(\exp[iy f_T(u)] - 1\bigr)$ are $F$-predictable, the latter function is bounded, and the integral on $(0, t)$ is an $F$-martingale. Thus, we obtain the estimate
$$\bigl|E[Z \zeta_T(T \wedge \tau) \mid \mathcal{F}_0] - Z\bigr| \le E\Bigl[|Z| \int_0^{T \wedge \tau} |\zeta_T(u-, y)|\, |R(f_T(u), y)|\, dA(u) \,\Big|\, \mathcal{F}_0\Bigr],$$
where $R(f_T(u), y) = \exp[iy f_T(u)] - 1 - iy f_T(u) + \tfrac{1}{2} y^2 [f_T(u)]^2$. If $\delta > 0$ is chosen as in condition (iii) of the theorem, then some finite $C(\delta)$ exists such that
$$\bigl|R(f_T(u), y)\bigr| \le C(\delta)\, |y|^{2+\delta}\, |f_T(u)|^{2+\delta}.$$
Furthermore, on $0 < u < T \wedge \tau$,
$$|\zeta_T(u-, y)| \le \exp\Bigl(\tfrac{1}{2} y^2 \int_0^{T \wedge \tau} [f_T(u)]^2\, dA(u)\Bigr) \le \exp\bigl(\tfrac{1}{2} y^2 \eta^2\bigr).$$
Making use of these inequalities, we obtain the further estimate, writing $\|Z\| = \operatorname{ess\,sup} |Z(\omega)|$,
$$\bigl|E[Z \zeta_T(T \wedge \tau) \mid \mathcal{F}_0] - Z\bigr| \le \|Z\|\, C(\delta)\, |y|^{2+\delta}\, E\Bigl[\int_0^{T \wedge \tau} |f_T(u)|^{2+\delta}\, dA(u) \,\Big|\, \mathcal{F}_0\Bigr]. \qquad (14.5.2)$$
Multiplying through by the $\mathcal{F}_0$-measurable function $\exp(-\tfrac{1}{2} y^2 \eta^2)$ shows that the right-hand side of (14.5.2) is an upper bound on
$$\bigl|E\bigl(Z\,[\rho_T\, e^{iy X_T} - e^{-y^2 \eta^2 / 2}] \,\big|\, \mathcal{F}_0\bigr)\bigr|, \qquad (14.5.3)$$
where $\rho_T$ equals
$$\exp\biggl(-iy \int_{T \wedge \tau}^T f_T(u)\, [dN(u) - dA(u)] - \tfrac{1}{2} y^2 \Bigl[\eta^2 - \int_0^T [f_T(u)]^2\, dA(u)\Bigr]_+\biggr).$$
Taking the expectation of (14.5.3) and its bound from (14.5.2), the bound converges to zero from assumption (iii). Also, because $|\rho_T| \le 1$ and $\rho_T \to 1$ in probability from assumption (ii), $E[Z(\rho_T - 1)\, e^{iy X_T}] \to 0$, so that for (14.5.3) we obtain the limit relation
$$E(Z e^{iy X_T}) \to E(Z e^{-y^2 \eta^2 / 2}).$$
It now follows from Proposition A3.2.IV that there exists a random variable $X$ such that $X_T \to X$ ($\mathcal{F}_0$-stably) and for each bounded, $\mathcal{F}_0$-measurable $Z$,
$$E(Z e^{iy X}) = E(Z e^{-y^2 \eta^2 / 2}).$$
This equality is equivalent to $E\bigl(e^{iyX} \mid \mathcal{F}_0\bigr) = e^{-y^2 \eta^2 / 2}$, and hence to $E\bigl(e^{iyX/\eta} \mid \mathcal{F}_0\bigr) = e^{-y^2/2}$. We deduce that $X/\eta = U$ is independent of $\mathcal{F}_0$ and has a unit normal distribution.
Corollary 14.5.II. Under the conditions of the theorem, the distributions of the random integrals $X_T$ converge weakly to the mixed normal distribution with characteristic function $\phi(y) = E(e^{-y^2 \eta^2 / 2})$. Thus, when $\eta$ is a.s. constant, the $X_T$ converge weakly in distribution to the normal distribution $N(0, \eta^2)$.
This corollary merely restates the fact that $\mathcal{F}_0$-stable convergence implies weak convergence.
Corollary 14.5.III. If $B_T^2 = \int_0^T [f_T(u)]^2\, dA(u) > 0$, then the randomly normed integrals $X_T / B_T$ converge $\mathcal{F}_0$-stably in distribution to the unit normal random variable $U$.
Proof. We use the result from Proposition A3.2.IV that if $X_n \to X$ ($F$-stably) then $g(X_n, Y) \to g(X, Y)$ ($F$-stably) for bounded continuous functions $g(\cdot)$. Supposing first that $Y$ is bounded away from zero and $X$ is essentially bounded, we can take $g(x, y) = x/y$ so that $X_n / Y \to X/Y$ ($F$-stably). The constraint on $X$ is immaterial in that it is given that $P\{|X| < \infty\} = 1$ (because $X$ is a well-defined r.v.). Now suppose also that $Y_n \to Y$ in probability, where each $Y_n$ is a.s. positive and $F$-measurable. Then $(X_n, Y_n) \to (X, Y)$ in distribution and thus, again, $X_n / Y_n \to X/Y$ ($F$-stably). Finally, by approximating $Y$ by a sequence of r.v.s bounded away from zero, the result extends to the case $Y > 0$ a.s. Taking $Y_n = B_{T_n}$ and $X_n = X_{T_n}$ for some sequence $T_n \to \infty$ and $F = \mathcal{F}_0$, the result follows.
The form of condition (iii) is not the most general possible. Kutoyants noted that it may be replaced by a Lindeberg type of condition, although the Liapounov type of condition suffices for most applications. Versions of the theorem for multivariate and MPPs can be given [see Kutoyants (1984a, b)]. The major application of the theorem is to the proof of the asymptotic normality of parameter estimates.
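The contrast between Corollaries 14.5.II and 14.5.III can be seen in a small simulation (our own sketch, seeded for reproducibility, with an illustrative two-point mixing law): for a mixed Poisson process the raw statistic $X_T = (N(T) - \mu T)/\sqrt{T}$ has empirical second moment near $E(\mu)$, reflecting the mixture of normals, while the randomly normed statistic has second moment near 1.

```python
import math
import random

rng = random.Random(7)

def poisson_count(rate, t, rng):
    """Number of points of a rate-`rate` Poisson process on (0, t]."""
    n, s = 0, rng.expovariate(rate)
    while s <= t:
        n += 1
        s += rng.expovariate(rate)
    return n

T, n_rep = 200.0, 3000
raw, normed = [], []
for _ in range(n_rep):
    mu = 1.0 if rng.random() < 0.5 else 4.0     # two-point mixing law for mu
    n = poisson_count(mu, T, rng)
    raw.append((n - mu * T) / math.sqrt(T))     # limit: eta * U with eta^2 = mu
    normed.append((n - mu * T) / math.sqrt(n))  # self-normalized, as in 14.5(a)

var_raw = sum(x * x for x in raw) / n_rep       # should approach E(mu) = 2.5
var_normed = sum(x * x for x in normed) / n_rep # should approach 1
```

Only the second moments are compared here; a histogram of `raw` would show the heavier-than-normal tails characteristic of a normal mixture, while `normed` is close to standard normal.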
This application is discussed and illustrated at length in Kutoyants (1980) for the case of inhomogeneous Poisson processes, and in Kutoyants (1984b) for more general processes. The next two examples may also help illustrate the range of applications for the theorem.
Example 14.5(a) Poisson and mixed Poisson processes. As the simplest possible example, let $N$ be a simple Poisson process with rate $\mu$. Successful application of the theorem relies on identifying the appropriate norming function $f_T(\cdot)$ for the quantity of interest. Here, to study $N(t)$, recall first that its $H$-compensator is $\mu t$. Thus, we need $f_T(\cdot)$ to satisfy
$$\int_0^T [f_T(u)]^2\, \mu\, du \to \text{const.} \qquad (T \to \infty),$$
so f_T(u) = T^{−1/2} is the simplest choice here, with the constant = µ and nonrandom. Then the left-hand side of (iii) reduces to µT^{−δ/2} → 0 (T → ∞) as required. Thus,

    [N(T) − µT] / T^{1/2} → µ^{1/2} U    in distribution,

with U a standard normal r.v. as in the theorem. If in fact the process is mixed Poisson with µ a r.v. as in Example 14.3(a), the same conclusion holds provided we use the F-compensator with F_t = F_0 ∨ H_t. Indeed, from Corollary 14.5.III we should have

    [N(T) − µT] / (µT)^{1/2} → U    in distribution.
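The mixed-Poisson convergence just displayed is easy to check numerically. The sketch below is ours, not the book's; the choice of Exp(1) for the mixing distribution of µ and the sample sizes are illustrative assumptions. It draws many independent paths and confirms that the standardized values (N(T) − µT)/(µT)^{1/2} have approximately zero mean and unit variance:

```python
import random
import statistics

def mixed_poisson_standardized(T, n_paths, seed=0):
    """For each path, draw mu ~ Exp(1) (the F0-measurable random rate),
    count the points of a rate-mu Poisson process on (0, T], and return
    the standardized values (N(T) - mu*T) / sqrt(mu*T)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_paths):
        mu = rng.expovariate(1.0)
        n, t = 0, rng.expovariate(mu)
        while t <= T:                      # count exponential inter-event gaps up to T
            n += 1
            t += rng.expovariate(mu)
        out.append((n - mu * T) / (mu * T) ** 0.5)
    return out

vals = mixed_poisson_standardized(T=500.0, n_paths=2000)
print(statistics.mean(vals), statistics.pstdev(vals))  # both close to (0, 1)
```

Note that the standardized values have exactly zero mean and unit variance conditionally on µ, so the check succeeds whatever mixing distribution is used.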
If we want to devise a result concerning an estimate of µ, it is preferable to express the left-hand side here as [N(T)/T − µ]/(µ/T)^{1/2} and then observe that, as T → ∞, we have N(T)/T → µ a.s. As a result of this, we can replace µ in the denominator and deduce further that

    [N(T)/T − µ] / ([N(T)]^{1/2}/T) → U    in distribution,
with µ the only quantity on the left-hand side that is unknown at T. A final possibility would be to use the H-compensator, which in the special case given in Example 14.3(a) has the form

    dA(t) = [N(t−) + 1] / (t + α) dt,
and leads to virtually the same conclusions. In examples of this kind, where F_0 is either trivial or very simple, there is little advantage in using the extensions to random norming. Only weak convergence is asserted, and the theorem sheds no light on whether the estimates converge H_∞-stably, for example. For detail on this question, which also underlies the next example, see Jarupskin (1984).

Example 14.5(b) Simple birth process. This is a standard example [see, e.g., Keiding (1975); Basawa and Scott (1983)] for showing 'nonergodic' behaviour in the sense that the asymptotic distribution of the maximum likelihood estimate is not normal but a mixture of normals. If the probability of an individual producing offspring in time (t, t + dt) is λ dt, and all individuals reproduce independently, it is known that, with N(t) denoting the sum of the initial number n_0 and the number of individuals born in (0, t], and q_t = 1 − p_t = e^{−λt},

    P{N(t) = n} = \binom{n−1}{n−n_0} q_t^{n_0} p_t^{n−n_0},    (14.5.4)
14.5. A Central Limit Theorem
that

    N(t)e^{−λt} → W    a.s.,    (14.5.5)

where W is a r.v. which, if n_0 = 1, has the unit exponential distribution, and that

    \hat{λ}_t = [N(t) − n_0] / \int_0^t N(u) du

is the maximum likelihood estimate of λ. Clearly, the process may be treated as a point process, and it is then of interest to see what light the present methods shed on the behaviour of the likelihood estimate. The conditional intensity of the process with respect to the internal history H generated by the N(t) themselves is just equal to λN(t−). If we use this history, the first derivative of the likelihood of the process on (0, T) is proportional to

    N(T) − λ \int_0^T N(t−) dt,
which has variance function

    λ \int_0^T N(t) dt ∼ E(e^{λT}).
This suggests that the norming factor k(T) = e^{−λT/2} is appropriate, but because W is not F_0-measurable with this choice of history, further discussion is required. In fact, what is needed is the F-intensity when F_0 = σ{W} and F_t = F_0 ∨ H_t^{(N)}. The history F is a refinement of the internal history, and to find the F-compensator we have to discuss the behaviour of the process conditional on the value of W. This can be computed by writing down from (14.5.4) the joint distribution of N(s) and N(t) for s > t, conditioning on N(s), and letting s → ∞, taking into account (14.5.5) and using Stirling's formula [cf. Keiding (1975)]. The result can be stated as follows. Given N(t) and W, the conditional distribution of N(s) − N(t) is Poisson with parameter

    λ(s | t, W) = W e^{λt}(e^{λ(s−t)} − 1).

Hence the F-intensity is

    λ_F(t) = λW e^{λt}.
Note that E[λ_F(t) | H_t] = λe^{λt} E[W | N(t−)] = λe^{λt} N(t−)e^{−λt} = λN(t−), which is just the H-intensity if a predictable version of N(t) is taken. We now consider the asymptotic behaviour of the scaled difference

    Δ(T) = λ^{−1} e^{λT/2} (\hat{λ}_T − λ) = \frac{e^{−λT/2} [N(T) − λ \int_0^T N(u) du]}{λ e^{−λT} \int_0^T N(u) du}.    (14.5.6)
Applying the theorem, we find after simple computations that the pair

    e^{−λT/2} \int_0^T [dN(u) − λW e^{λu} du] = e^{−λT/2} [N(T) − n_0 − W(e^{λT} − 1)]

and

    e^{−λT/2} \int_0^T λ(T − u) [dN(u) − λW e^{λu} du]
        = e^{−λT/2} [λ(\int_0^T N(u) du − n_0 T) − W(e^{λT} − 1 − λT)]

converges F_0-stably to the pair (Z_1 W^{1/2}, Z_2 W^{1/2}), where Z_1, Z_2 are independent of F_0 and jointly normally distributed with covariance matrix

    ( 1  1 )
    ( 1  2 ),

so that in fact we can write Z_2 = Z_1 − Z_1', where Z_1, Z_1' are independent unit normal r.v.s. Thus, the numerator in the term on the right-hand side of (14.5.6) converges F_0-stably to Z_1' W^{1/2}, and from (14.5.5) the denominator converges a.s. to W. Hence,

    Δ(T) ∼ Z_1' W^{−1/2}    (F_0-stably).

But an exponential r.v. W has just the form ½χ²_{(2)}, where χ²_{(2)} denotes a chi-square r.v. on two degrees of freedom, and so the ratio has a t-distribution on two degrees of freedom [again, see Keiding (1975)].
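The heavy-tailed, mixture-of-normals limit is visible in simulation. The sketch below is our own illustration (the parameter values and the tail diagnostic are arbitrary choices): it simulates the birth process, forms the MLE, and checks that the scaled errors have tails far heavier than normal, consistent with a limiting t-distribution on two degrees of freedom:

```python
import math
import random

def yule_scaled_mle_error(lam=1.0, T=8.0, n0=1, seed=0):
    """Simulate a simple birth (Yule) process with per-individual rate lam on
    (0, T] and return e^{lam*T/2} * (hat_lambda_T - lam), where
    hat_lambda_T = (N(T) - n0) / int_0^T N(u) du is the MLE."""
    rng = random.Random(seed)
    t, n, integral = 0.0, n0, 0.0
    while True:
        gap = rng.expovariate(n * lam)     # minimum of n independent birth clocks
        if t + gap > T:
            integral += (T - t) * n        # finish int_0^T N(u) du
            break
        integral += gap * n
        t += gap
        n += 1
    return math.exp(lam * T / 2) * ((n - n0) / integral - lam)

errs = sorted(yule_scaled_mle_error(seed=s) for s in range(400))
heavy_tail_frac = sum(1 for e in errs if abs(e) > 2.0) / len(errs)
print(heavy_tail_frac)   # markedly larger than the normal tail value of about 0.046
```

For a t-distribution on two degrees of freedom, P{|t_2| > 2} is roughly 0.18, about four times the normal value, which is the contrast the diagnostic above is designed to expose.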
Exercises and Complements to Section 14.5

14.5.1 For the self-correcting or stress-release model of Isham and Westcott (1979), for which λ(t) = exp{α + β[t − ρN(t)]}, show that conditions for estimators of β and ρ to have central limit theorem properties hold for β > 0 and ρ > 0 but fail when β = 0. [Hint: When β > 0 and ρ > 0, the process X(t) = t − ρN(t) is Markovian and the law of large numbers implies that condition (ii) of Theorem 14.5.I holds, but this fails when β = 0. See also Vere-Jones and Ogata (1984).]
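For Exercise 14.5.1, the process X(t) = t − ρN(t) can be simulated exactly: between events the compensator of λ(t) = exp{α + β[t − ρN(t)]} integrates in closed form, so the next inter-event wait can be obtained by inverting it against a unit exponential. The sketch below is ours (the parameter values are illustrative); it exhibits the ergodic behaviour underlying condition (ii) when β > 0 and ρ > 0, with the mean inter-event time settling near ρ:

```python
import math
import random

def stress_release_times(alpha=0.0, beta=1.0, rho=1.0, n_events=20000, seed=0):
    """Exact simulation of lambda(t) = exp{alpha + beta*[t - rho*N(t)]}:
    between events x = X(t) rises linearly, so the compensator
    Lambda(s) = e^{alpha + beta*x}(e^{beta*s} - 1)/beta is inverted in
    closed form against a unit exponential wait e."""
    rng = random.Random(seed)
    t, x, times = 0.0, 0.0, []
    for _ in range(n_events):
        e = rng.expovariate(1.0)
        s = math.log(1.0 + beta * e * math.exp(-alpha - beta * x)) / beta
        t += s
        x += s - rho          # X rises by s, then drops by rho at the event
        times.append(t)
    return times

times = stress_release_times()
print(times[-1] / len(times))   # mean inter-event time, near rho = 1
```

The self-correcting drift is visible in the inversion formula: large X shortens the next wait and small X lengthens it, which is what keeps X(t) stable for β > 0.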
14.6. Random Time Change

The topics of both this section and the next describe methods for reducing more general point processes to Poisson processes, emphasizing yet again the fundamental role played by Poisson processes in point process theory. We start by recapitulating some introductory material, including Watanabe's (1964) characterization of the Poisson process as a process with deterministic compensator. The time-change theorems in this section were introduced in Section 7.4 and linked there to Ogata's (1988) residual analysis for
checking the goodness-of-fit for a point process model [and now extended to the wider range of residual methods introduced for spatial point processes by Baddeley and co-workers, reviewed in Baddeley et al. (2005) and discussed briefly in Sections 15.4–5 below]. Our main goal here is the extension of the time-change theorem to MPPs, first proved for multivariate point processes by Meyer (1971) [but see also Doléans-Dade (1970)] using orthogonal martingale arguments, and later extended and simplified by Brown and Nair (1988). The proof we give is for general MPPs, and appears to be new, combining generating functional arguments with the use of the exponential formula much as in Brémaud (1981) and Brown and Nair, but avoiding the second-order theory used in the orthogonal martingale arguments. To illustrate the technique we first use it to give an extension of Watanabe's theorem to Cox processes; in essence the proof is a minor variation of those in Brown (1978) and Brémaud. Random time-change results for spatial processes are much more problematic: see Nair (1990) and Schoenberg (1999).

Theorem 14.6.I. Let N be a simple point process on R_+ adapted to the history F. If the F-compensator A of N is continuous and F_0-measurable, then N is a Cox process directed by A.

Proof. With a view to characterizing the process via its p.g.fl., take a fixed continuous nonnegative h ∈ V(R_+) (cf. Definition 9.4.IV), with h(0) = 1 and h(u) = 1 outside a finite interval [0, T), and for a given realization {t_i} of the process consider the expression

    Φ(h) = \prod_{0 ≤ t_i < ∞} h(t_i) · e^{−\int_0^∞ [h(u)−1] dA(u)}.    (14.6.1)

In fact both the product and the integral can be restricted to any finite interval (0, T) outside which h(u) = 1, without altering their values. Now identify the quantity

    H(t) = \prod_{0 ≤ t_i ≤ t} h(t_i) · e^{−\int_0^t [h(u)−1] dA(u)}    (14.6.2)
with the solution (4.6.2) of the integral equation (4.6.3), namely,

    H(t) = H(0) + \int_0^t H(x−) u(x) dF(x),    (4.6.3)

by setting F(x) = N(x) − A(x) (x > 0) and u(x) = h(x) − 1. Thus, H(t) satisfies

    H(t) = 1 + \int_0^t H(s−)[h(s) − 1] [dN(s) − dA(s)].    (14.6.3)
The integrand on the right-hand side is left-continuous and F -adapted, so it is F -predictable (see the discussion following Lemma A3.3.I). It now follows
from Exercise 14.1.7 that the integral is an F-martingale, so on taking conditional expectations with respect to F_0, we obtain E[H(t) | F_0] = 1. Letting t → ∞ and appealing to monotone convergence yields E[Φ(h) | F_0] = 1. By assumption the compensator A is F_0-measurable, so the term with the exponential [see (14.6.2)] can be taken outside the expectation, and this last equation can be rewritten as

    E( \prod_{t_i} h(t_i) | F_0 ) = exp( \int_0^∞ (h(u) − 1) dA(u) ),    (14.6.4)

which we recognize as identifying the p.g.fl. of the given point process, conditioned by F_0, with that of a Poisson process with (given random) mean A(t), so unconditionally it is a Cox process directed by A(·). The uniqueness theorem for p.g.fl.s completes the proof.

Corollary 14.6.II. Let N be a simple point process with internal history H and let its H-intensity be the deterministic function µ(·). Then N is a Poisson process with density function µ(·).

Watanabe discussed the special case µ(t) = µ (all t > 0). Both his result and its generalization are closely related to a theorem in Papangelou (1972), already stated in Theorem 7.4.I, that any simple point process with continuous compensator is locally Poisson in character, in the sense that there exists a local transformation of the time axis that converts the process into a Poisson process. We develop a proof based on a use of the exponential formula similar to that of Theorem 14.6.I.

Consider then a simple point process that is F-adapted for some general history F for which A is the F-compensator, and consider the time-change defined by τ = A(t) (t ∈ R_+), or equivalently, t = A^{−1}(τ) = inf{t: A(t) ≥ τ}. Note that if the compensator A is continuous, then A^{−1} is right-continuous (and monotonic, like A), with jumps at the (at most countable) set of values of τ corresponding to intervals of constancy of A, and A(A^{−1}(τ)) = τ for all τ > 0. It follows (see Lemma A3.3.III) that for every τ, A^{−1}(τ) is an F-stopping time. Moreover, the σ-algebra F̃_τ ≡ F_{A^{−1}(τ)} is well defined (Definition A3.4.V), with F̃_τ ⊆ F̃_υ for τ ≤ υ (Theorem A3.4.VII), and F̃ ≡ {F̃_τ: 0 < τ < ∞} constitutes a history for the process Ñ(τ) = N(A^{−1}(τ)).

We now imitate the proof of Theorem 14.6.I, but take the p.g.fl. in the transformed space rather than the original space. Write τ_i = A(t_i), and again suppose h(·) ∈ V(R_+) is continuous and equal to unity outside some finite interval (0, T).
In place of (14.6.1), consider

    Φ̃(h) = \prod_{0 ≤ τ_i < ∞} h(τ_i) · e^{−\int_0^∞ [h(τ)−1] dτ},    (14.6.5)
and identify it with the integral equation version

    Φ̃(h) = 1 + \int_0^∞ H̃(τ−)[h(τ) − 1] [dÑ(τ) − dτ],

where H̃(t) is defined as in (14.6.2) by the integral up to t. Now substitute τ = A(t), write H(t) = H̃(A(t)), and take expectations. This yields

    E[Φ̃(h)] = 1 + E( \int_0^∞ H(t−)[h(A(t)) − 1] [dN(t) − dA(t)] ).

Because h(A(t)) − 1 and H(t−) are still both F-predictable, the expectation of the integral above vanishes because of the martingale property of N − A, so that E[Φ̃(h)] = 1. Now if we take expectations in (14.6.5) we see that

    E( \prod_{0 ≤ τ_i < ∞} h(τ_i) ) = exp( −\int_0^∞ [1 − h(τ)] dτ ),

which shows that the p.g.fl. of the transformed time process is that of a unit-rate Poisson process. Hence, because the p.g.fl. determines the process uniquely, we have proved the following result.

Proposition 14.6.III. Let N be an F-adapted nonterminating simple point process on R_+ with continuous F-compensator A for some history F. Then the randomly rescaled process Ñ(τ) = N(A^{−1}(τ)) is a unit-rate Poisson process on R_+.

When N is an F-adapted simple MPP on R_+ × K, we can use a similar argument to establish a comparable time-change transformation of N into a marked compound Poisson process Ñ. In place of a continuous compensator we assume that N has a simple ground process with finite first moment measure and an F-intensity λ_F(t, κ, ω) underlying the F-compensator A(t, K, ω) whose existence is guaranteed by (14.2.7). We also assume that the MPP is ℓ_K-F nonterminating with respect to the reference measure ℓ_K on the mark space (K, B_K), meaning that

    a(t, κ, ω) ≡ \int_0^t λ_F(u, κ, ω) du → ∞    (t → ∞)    ℓ_K × P-a.e.,    (14.6.6)
and from Proposition 14.3.II(b),

    λ_F(t, κ, ω) = λ_{Fg}(t, ω) f(κ | t, ω)    (ℓ × ℓ_K × P)-a.e.

Recall from Lemma 6.4.VI that a compound Poisson process with mark kernel F(· | ·) and ground Poisson process with measure µ(·) has p.g.fl.

    G[h] = E( exp \int_{X×K} log h(x, κ) N(d(x, κ)) )

for nonnegative measurable functions h(x, κ) in V(X × K). We evaluate G[h] as

    exp( \int_X \int_K [h(x, κ) − 1] F(dκ | x) µ(dx) ) = exp( \int_X [h_F(x) − 1] µ(dx) ),

where h_F(x) = \int_K h(x, κ) F(dκ | x), and h_F ∈ V(X).
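Before the MPP version, the scalar result of Proposition 14.6.III can be illustrated numerically: for a deterministic (hence F_0-measurable) intensity, transforming the points by the compensator should yield unit-exponential gaps. A minimal sketch, ours rather than the book's, with an arbitrarily chosen rate function:

```python
import math
import random

def inhomogeneous_poisson(rate, rate_max, T, seed=0):
    """Points of an inhomogeneous Poisson process on (0, T], obtained by
    thinning a dominating homogeneous process of rate rate_max."""
    rng = random.Random(seed)
    t, pts = 0.0, []
    while True:
        t += rng.expovariate(rate_max)
        if t > T:
            return pts
        if rng.random() * rate_max <= rate(t):
            pts.append(t)

rate = lambda t: 1.5 + math.sin(t)            # deterministic, so A is F0-measurable
A = lambda t: 1.5 * t + 1.0 - math.cos(t)     # compensator A(t) = int_0^t rate(u) du
pts = inhomogeneous_poisson(rate, 2.5, T=4000.0)
taus = [A(t) for t in pts]                     # the rescaled points tau_i = A(t_i)
gaps = [b - a for a, b in zip([0.0] + taus[:-1], taus)]
mean_gap = sum(gaps) / len(gaps)
print(mean_gap)   # close to 1: the rescaled process is unit-rate Poisson
```

This is exactly the device behind Ogata's residual analysis mentioned at the start of the section: departures of the rescaled gaps from unit exponentials signal lack of fit.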
Theorem 14.6.IV (MPP Random Time-change). Let N be an F-adapted MPP on R_+ with simple ground process having finite first moment measure and whose F-compensator A admits an F-intensity λ_F(u, κ, ω) such that N is ℓ_K-F nonterminating. Let Ñ denote the rescaled MPP defined to have a point at (a(t, κ, ω), κ) if and only if N has a point at (t, κ), where a(t, κ, ω) is defined by (14.6.6). Then Ñ is a stationary compound Poisson process with unit ground intensity and stationary mark distribution ℓ_K.

Proof. In place of the function h of (14.6.1), consider a nonnegative function h(t, κ) which is jointly continuous in the two variables (t, κ) and satisfies h(t, κ) = 1 whenever t = 0, or t > T for some finite T. Mimicking (14.6.5) rather than (14.6.1), consider

    H^F(t) = \prod_{0 ≤ t_i ≤ t} h(a(t_i, κ_i, ω), κ_i) · exp( −\int_0^t \int_K [h(a(s, κ, ω), κ) − 1] λ_F(s, κ) ℓ_K(dκ) ds ).    (14.6.7)

In the exponential term here, (14.6.6) implies that h(a(s, κ), κ) = 1 for sufficiently large s for ℓ_K-almost all κ, so an integral there taken instead over (0, ∞) remains finite. First, however, use the factorization λ_F = λ_g f already noted and consider the integral of κ-dependent terms over K: define both

    u^F(s) = \int_K [h(a(s, κ), κ) − 1] f(κ | s) ℓ_K(dκ) = h̄(s) − 1,    (14.6.8a)

where h̄(s) = \int_K h(a(s, κ), κ) f(κ | s) ℓ_K(dκ), and a process F^F on R_+ by

    ΔF^F(t_i) = h(a(t_i, κ_i), κ_i)/h̄(t_i)    and    dF^F_c(s) = −λ_g(s) ds,    (14.6.8b)

where the (t_i, κ_i) are the points of the given realization N and s ∉ {t_i}. The product terms in the exponential formula (4.6.2) are then of the type

    (1 + u^F(t_i)) ΔF^F(t_i) = h̄(t_i) · h(a(t_i, κ_i), κ_i)/h̄(t_i) = h(a(t_i, κ_i), κ_i),

and the integral term in (4.6.2) is of the form

    \int_0^t u^F(s) dF^F_c(s) = −\int_0^t λ_g(s) \int_K [h(a(s, κ), κ) − 1] f(κ | s) ℓ_K(dκ) ds.

Hence both terms in (14.6.7) are recovered. Because h, and hence h̄, are bounded functions, and the expected number of points in any finite interval is finite (assumed throughout the chapter), the boundedness requirements of Lemma 4.6.II are satisfied, and we can identify H^F(t) of (14.6.7) with the solution of the integral equation (4.6.3), written here as

    H^F(t) = 1 + \int_0^t H^F(s−)[h̄(s) − 1] dF^F(s).    (14.6.9)
We assert that the process F^F(s) is an F-martingale; to see this, write it in the form

    dF^F(s) = \int_K [h(a(s, κ), κ)/h̄(s)] N(ds × dκ) − λ_g(s) ds.

For fixed κ, the function h(a(s, κ), κ) is F-predictable, as therefore is the integral h̄(s) = \int_K h(a(s, κ), κ) f(κ | s) ℓ_K(dκ). Taking expectations over a range (a, b) therefore yields

    E( \int_a^b dF^F(s) | F_a ) = E( \int_a^b [ \int_K (h(a(s, κ), κ)/h̄(s)) f(κ | s) ℓ_K(dκ) − 1 ] λ_g(s) ds | F_a ) = 0,

where we see from the definition of h̄ that the integrand vanishes for each s.

Next, let t → ∞ in (14.6.9), denoting the limit by Φ(h), and take expectations. Expanding the integral on the right-hand side of (14.6.9) and appealing to Fubini's theorem we obtain

    E( \int_0^∞ H^F(x−)[h̄(x) − 1] dF^F(x) )
        = E( \int_0^∞ H^F(x−) \int_K [h(a(x, κ), κ) − 1] f(κ | x) ℓ_K(dκ) dF^F(x) )
        = \int_K E( \int_0^∞ H^F(x−)[h(a(x, κ), κ) − 1] f(κ | x) dF^F(x) ) ℓ_K(dκ).

Again, for fixed κ, the functions in the integrand are F-predictable, and F^F(·) is a martingale, so that the expectation vanishes. This leaves us with the result

    1 = E[Φ(h)]
      = E( \prod h(a(t_i, κ_i), κ_i) · e^{−\int_0^∞ \int_K [h(a(t, κ), κ) − 1] f(κ | t) ℓ_K(dκ) λ_g(t) dt} )
      = E( \prod h(τ_i, κ_i) ) · e^{−\int_0^∞ \int_K [h(τ, κ) − 1] ℓ_K(dκ) dτ}
      ≡ E( \prod h(τ_i, κ_i) ) / G[h],

in which G[h] is the p.g.fl. of a compound Poisson process with unit-rate ground process and location-independent mark-kernel density ℓ_K. Here we have used the nonterminating property of the compensator to justify taking the integral over (0, ∞), and by (14.6.6) and (14.3.2b) the change of variable of integration has density f(κ | t)λ_g(t). This integral is nonrandom, so it can be taken outside the expectation, and then identified with the reciprocal of the p.g.fl. G[h] as just described. The theorem follows.

For an earlier attempted proof of this result, with further references, see Vere-Jones and Schoenberg (2004). When the mark space is finite, we obtain the following specialization to multivariate processes.
Corollary 14.6.V. Let N denote a nonterminating multivariate point process with components N_k, k = 1, ..., K, reference measure π = {π_1, ..., π_K} satisfying π_k > 0 (k = 1, ..., K) and \sum_{k=1}^K π_k = 1, simple ground process, and F-conditional intensities λ_k(t). Let a(t, k) = \int_0^t λ_k(s) ds and denote by Ñ the rescaled MPP defined to have a point at (a(t, k), k) if and only if the kth component of N contains a point at t. Then Ñ is a stationary compound Poisson process with unit intensity and stationary mark distribution π.

Equivalently, if the rescaling is performed so that Ñ has a point at (a(t, k)π_k, k) whenever the original process has a point at (t, k) (this is the same as the reference measure π assigning unit mass to each component), then the resultant process consists of K independent, unit-rate Poisson processes, one for each mark.

Suppose that the mark space is the real line; let F_K(m) ≡ \int_{κ<m} ℓ_K(dκ) be the cumulative measure corresponding to the probability measure ℓ_K, and suppose that F_K(m) is continuous as a function of m. Then a simple rescaling of the mark space, taking m* = F_K(m), converts the stationary mark distribution for the transformed process into the uniform distribution on [0, 1]. Now a compound Poisson process with constant rate and uniform mark distribution can equally be interpreted as a two-dimensional Poisson process on a strip; hence we obtain the following result.

Corollary 14.6.VI. Suppose that the MPP N has real marks, that the conditions of Theorem 14.6.IV hold, and that the reference probability measure ℓ_K admits a continuous cumulative version F_K. Then the doubly transformed process N*, defined to have a point at (a(t_i, m_i), F_K(m_i)) when the original process has a point at (t_i, m_i), is a two-dimensional Poisson process with unit rate over the half-strip R_+ × [0, 1].

The role of absolute continuity in Theorem 14.6.IV and its corollaries, in particular the requirement that the compensator A(t, K) have a density with respect to the mark κ as well as time t, is illustrated in the next example.

Example 14.6(a) Poisson process with alternating marks. Let {t_i} be a realization of a Poisson process at unit rate on R_+, and attach to the points {t_{2i−1}} the mark 1, and to the points {t_{2i}} the mark 2. Then, writing t_0 = 0 and using the notation of Corollary 14.6.V, for i = 1, 2, ...,

    (λ_1(u), λ_2(u)) = (1, 0)    for t_{2i−2} ≤ u < t_{2i−1},
                       (0, 1)    for t_{2i−1} ≤ u < t_{2i}.

Let U_i = t_i − t_{i−1} (i = 1, 2, ...), so that for t_{2i} ≤ t < t_{2i+2},

    a(t, k) = \int_0^t λ_k(u) du
            = U_1 + ··· + U_{2i−1} + min((t − t_{2i})_+, U_{2i+1})    (k = 1),
              U_2 + ··· + U_{2i}   + min((t − t_{2i+1})_+, U_{2i+2})    (k = 2).

It is immediately evident that each case on the right-hand side here is a sum of independent unit exponential random variables, and so corresponds to a
Poisson process with unit rate; furthermore, the two cases are independent. It follows that the conditional intensity of the initial MPP can be written in the form λ*(t, κ) = λ*_g(t) f(κ | t), where λ*_g(t) = 1 and f(2 | t) = i(t) = 1 − f(1 | t), with i(t) the parity of N(t), so that i(t) = 1 if N(t) is odd, and i(t) = 0 otherwise. The mark space here has just two points, and the natural reference measure has unit atoms at each point, so that the function f described above can be regarded as a density with respect to this measure.

So far, this example has illustrated the behaviour to be expected when the conditions of Corollary 14.6.V are satisfied. Now suppose, as a variant on this example where the conditions are not satisfied, that the sequence {t_i} is as before, but that t_{2i−1} has mark κ_{2i−1} = X_i uniformly distributed on (0, 1), independently of the other random variables, and κ_{2i} = 2 − X_i. With U_i = t_i − t_{i−1} as before, we can define the marked point process compensator for K ∈ B((0, 2)) in the form

    A(t, K) = U_1 + ··· + U_{2i−1} + (t − t_{2i}) U(K),          t_{2i} ≤ t < t_{2i+1},
              U_2 + ··· + U_{2i−2} + (t − t_{2i−1}) I_K(2 − X_i),    t_{2i−1} ≤ t < t_{2i},

where U denotes the uniform distribution on (0, 1). Here, in the first case, F(· | t) is just the uniform distribution on (0, 1), but in the second case, F(· | t) has an atom at 2 − X_i, because the value of X_i is now known at time t. For this model, therefore, the absolute continuity requirement is not satisfied, and, as the reader may check, the conclusion of Theorem 14.6.IV fails.
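The first part of Example 14.6(a) can be checked directly: after rescaling by a(t, k), each component's transformed gaps reduce to alternate interarrival times U_i, so they should behave as independent unit exponentials, as Corollary 14.6.V asserts. A sketch (ours; sample sizes are arbitrary):

```python
import random

def alternating_marks_rescaled(n_pairs=4000, seed=0):
    """Example 14.6(a): unit-rate Poisson times t_1 < t_2 < ... carry marks
    1, 2, 1, 2, ....  Each component k is rescaled by a(t, k) = int_0^t
    lambda_k(u) du, computed by accumulating only over the intervals on
    which lambda_k = 1.  Returns the gap sequences of the two rescaled
    components."""
    rng = random.Random(seed)
    a = [0.0, 0.0]              # a[k] = current value of a(t, k+1)
    last = [0.0, 0.0]           # value of a(., k+1) at the previous mark-(k+1) point
    gaps = [[], []]
    for i in range(2 * n_pairs):
        u = rng.expovariate(1.0)           # U_{i+1} = t_{i+1} - t_i
        k = i % 2                          # lambda_{k+1} = 1 on this interval
        a[k] += u                          # only component k+1 accumulates here
        gaps[k].append(a[k] - last[k])     # the interval ends at a mark-(k+1) point
        last[k] = a[k]
    return gaps[0], gaps[1]

g1, g2 = alternating_marks_rescaled()
m1, m2 = sum(g1) / len(g1), sum(g2) / len(g2)
print(m1, m2)   # both close to 1, as for independent unit-rate Poisson processes
```

The same experiment run on the variant with marks X_i and 2 − X_i would expose the failure of Theorem 14.6.IV described above, since the second component's marks are predictable from the first.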
Exercises and Complements to Section 14.6

14.6.1 In Example 14.6(a), show that each of the four terms on the right-hand side of (14.6.15) converges in probability to zero as γ → 0 on the assumption that N^{(γ)}(t)/t converges a.s. to some limit λ ≡ λ(ω) ∈ F_0 for all γ. [Hint: The convergence is shown directly for the first three terms; for the last, investigate sup_t
14.6.2 (a) Let the multivariate point process N have components N_k with continuous F-compensators A_k and martingales M_k = N_k − A_k, let t_i^{(k)} denote the successive points of the kth component, and for constants θ_{ki} ≥ 0 define

    H_{ki}(t) = exp(−θ_{ki}[A_k(t) − A_k(t_{i−1}^{(k)})])    for t_{i−1}^{(k)} < t ≤ t_i^{(k)},
    H_{ki}(t) = 0    otherwise,

and define τ_{ki} = A_k(t_i^{(k)}) − A_k(t_{i−1}^{(k)}), the duration of the ith interval of the kth component on the transformed time-scale (note that each component has a different transformation). Then by the monotonicity of A_k(·),

    0 ≤ | \int_{R_+} H_{ki}(t) M_k(dt) | ≤ \int_{R_+} H_{ki}(t) [N_k(dt) + A_k(dt)] ≤ 1 + θ_{ki}^{−1}[1 − e^{−θ_{ki}τ_{ki}}] < 1 + θ_{ki}^{−1} < ∞.

(b) Conclude that, because M_k is a martingale, E( \int_{R_+} H_{ki}(t) M_k(dt) ) = 0, with E( \int_{R_+} H_{ki}(t) N_k(dt) ) = E(e^{−θ_{ki}τ_{ki}}) and

    E( \int_{R_+} H_{ki}(t) dA_k(t) ) = E( \int_{A_k(t_{i−1}^{(k)})}^{A_k(t_i^{(k)})} e^{−θ_{ki}[u − A_k(t_{i−1}^{(k)})]} du ) = E(1 − e^{−θ_{ki}τ_{ki}}) / θ_{ki},

so E(e^{−θ_{ki}τ_{ki}}) = 1/(1 + θ_{ki}), and hence each τ_{ki} has a unit exponential distribution.

(c) Use the martingale property and integration over the diagonal D_3 and the two open half-planes D_1 and D_2, much as in the argument following (14.1.16), to show that

    E( \int_{R_+} H_{ki}(u) M_k(du) × \int_{R_+} H_{ki'}(v) M_k(dv) ) = 0    (i ≠ i'),
    E( \int_{R_+} H_{ji}(u) M_j(du) × \int_{R_+} H_{ki'}(v) M_k(dv) ) = 0    (j ≠ k).

(d) Use a double induction on i and k to show that for all positive integers m ≤ K and n,

    E( \prod_{i=1}^n \prod_{k=1}^m \int_{R_+} H_{ki}(t) M_k(dt) ) = 0,

which by parts (c) and (b) is equivalent to

    E( \prod_{i=1}^n \prod_{k=1}^m ( e^{−θ_{ki}τ_{ki}} − 1/(1 + θ_{ki}) ) ) = 0.

(e) Part (d) implies that E( \prod_{i=1}^n \prod_{k=1}^m e^{−θ_{ki}τ_{ki}} ) = \prod_{i=1}^n \prod_{k=1}^m 1/(1 + θ_{ki}) for all θ_{ki} ≥ 0, so the random variables τ_{ki} are mutually independent with unit exponential distributions. The time-changed processes are therefore independent Poisson processes.
14.7. Poisson Embedding and Existence Theorems

This section centres on the theme, already developed in a simulation context in Section 7.5, that a wide range of processes can be derived by thinning a Poisson process in a suitably history-dependent manner. One reason for returning to this topic here is to outline the Poisson embedding technique developed in a cycle of papers by Brémaud, Massoulié, and others. This technique
provides a powerful framework for discussing the existence, uniqueness, and convergence to equilibrium of stationary versions of a point process specified via its conditional intensity function. The discussion in this respect supplements the earlier discussion of convergence to equilibrium in Section 12.5.

We begin here by extending the thinning theorem of Proposition 7.5.I to MPPs, at the same time giving an explicit construction for the Poisson process from which a given point process can be derived by thinning. Like Watanabe's theorem, this proposition has a long history. The original results stem from Kerstan (1964b) and Grigelionis (1971); both preceded the independent development of thinning algorithms in Lewis and Shedler (1976). The extension to MPPs follows Brémaud and Massoulié (1994, 1996). We restrict attention to conditional intensities derived from internal histories, but Massoulié (1998) extends the results to situations where information from an external process can be incorporated into the history (see also Exercises 14.7.1–2).

Proposition 14.7.I. (a) Let Ñ be a Poisson process defined on the probability space (Ω, E, P), having state space R × K × R_+, with intensity measure

    µ̃(dt × dκ × dx) = dt × ℓ_K(dκ) × dx    (14.7.1)

equal to the reference measure, and let H̃ denote the internal history of Ñ. Suppose that λ*(t, κ) is a nonnegative H̃-adapted and H̃-mark-predictable process defined on the same probability space, and that the integrated process λ*_g(t) = \int_K λ*(t, κ) π(dκ) is a.s. finite and locally integrable on R. Define the point process N by

    N(dt × dκ) = Ñ( dt × dκ × (0, λ*(t, κ)] ).    (14.7.2)

Then N is an H̃-adapted MPP with H̃-conditional intensity λ*(t, κ) and ground intensity λ*_g(t).

(b) Let N(·, ·) be an MPP on R × K, with internal history H and H-predictable intensity λ*(t, κ). Then, extending the probability space if necessary, there exists a Poisson process Ñ on R × K × R_+, with intensity measure (14.7.1), such that Ñ is H-adapted and (14.7.2) holds.

Proof. (a) That N is H̃-adapted follows from its definition in terms of Ñ. Given any H̃-mark-predictable, nonnegative process Y(t, κ), and given also the predictability assumptions in the proposition, we then have

    E( \int_{R×K} Y(t, κ) N(dt × dκ) ) = E( \int_{R×K×R_+} Y(t, κ) 1{0 < x ≤ λ*(t, κ)} Ñ(dt × dκ × dx) )
      = E( \int_R \int_K \int_0^{λ*(t,κ)} Y(t, κ) dx π(dκ) dt )
      = E( \int_R \int_K Y(t, κ) λ*(t, κ) π(dκ) dt ),
which is enough to establish λ* as the conditional intensity for N. The assertions concerning the ground process follow simply.

(b) The construction depends on first defining a Poisson process N*(dt × dκ × du) on R × K × R_+ with intensity (14.7.1), and a sequence of i.i.d. uniform (0, 1) r.v.s U_k, such that N*, the given process N, and {U_k} are all mutually independent. This can always be done at the expense, if needed, of expanding the probability space on which the original point process is defined. With each point (t_i, κ_i) from the original process, associate the point (t_i, κ_i, U_i λ*(t_i, κ_i)) in the 'random strip' 0 ≤ x ≤ λ*(t, κ) in the third component of the extended space R × K × R_+. Build the required process Ñ by amalgamating these points from within the random strip with the points of N* from outside it. The probability of finding a point of the resultant process in (t, t + dt) × (κ, κ + dκ) × (x, x + dx), given the history H̃_{t−}, is then λ*(t, κ) dt π(dκ) λ*(t, κ)^{−1} dx = dt π(dκ) dx for points inside the random strip, and the same for points outside it. The resultant process is therefore a Poisson process with the required intensity measure.

Grigelionis (1971) describes representations of the type (14.7.2) as 'stochastic integrals with respect to a Poisson process'; the construction above shows that the general class of MPPs defined by conditional intensities can be represented as stochastic integrals in this way. In the French literature and elsewhere, a point process N satisfying (14.7.2) is said to satisfy a stochastic differential equation driven by the Poisson process Ñ.

In Proposition 14.7.I we do not require the function λ*(t) to depend only on the past events of the newly defined process N; it could depend on aspects of the behaviour of Ñ outside the strip; with minor changes it could be made to depend even on the past of ancillary processes.
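The construction of Proposition 14.7.I can be sketched concretely. The code below is ours, not the book's; the Hawkes-type intensity with exponential decay is an assumed example. It realizes the embedding lazily: dominating points (t, x) are generated at the current intensity bound and a point of N is kept exactly when x ≤ λ*(t), which, since the intensity here decreases between events, reduces to Ogata-style thinning:

```python
import math
import random

def hawkes_by_embedding(mu=0.5, alpha=0.5, T=2000.0, seed=0):
    """Keep a point of N at t whenever an embedded Poisson point (t, x) on
    R+ x (0, lam_bar] satisfies x <= lam_star(t), as in (14.7.2), for the
    (assumed) Hawkes intensity lam_star(t) = mu + alpha*sum_{t_i<t} e^{-(t-t_i)}.
    Between events lam_star decreases, so its current value dominates it."""
    rng = random.Random(seed)
    pts, t, s = [], 0.0, 0.0       # s = alpha * sum of exp(-(t - t_i))
    while True:
        lam_bar = mu + s
        w = rng.expovariate(lam_bar)
        t += w
        if t > T:
            return pts
        s *= math.exp(-w)          # decay the self-excitation over the wait
        if rng.random() * lam_bar <= mu + s:   # i.e. the embedded height x fell below lam_star(t)
            pts.append(t)
            s += alpha             # each accepted point excites the intensity

pts = hawkes_by_embedding()
print(len(pts) / 2000.0)   # empirical rate, near mu/(1 - alpha) = 1
```

With branching ratio α = 0.5 the process is subcritical and its stationary ground rate is µ/(1 − α), which the empirical rate above approximates.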
In applications, however, we generally want λ* to depend only on the past of N itself, that is, to be not H̃-adapted but H-adapted, where H is the internal history of N. This can be achieved by requiring λ* to be a functional ψ(·, ·) from the space N^{#*}_{R×K} × K into R_+, and then setting λ*(t, κ) = ψ(N(t−), κ), where N(t−) denotes the restriction to (−∞, t) × K of the realization of the process N itself, that is, the part preceding time t. The change of point of view here is that the conditional expectation in the definition of λ* is regarded not directly as a function of ω but as a function of ω through the corresponding realization in the conditioning space N^{#*}_{(−∞,t)×K}.

Here the c.s.m.s. property of N^{#*}_{R×K} is of critical importance in ensuring that the resulting function ψ is measurable. Brémaud, Massoulié, and coworkers use these ideas as the starting point for an extended discussion of existence theorems and convergence to equilibrium properties for point processes with a specified conditional intensity function. Brémaud and Massoulié (1996) and Massoulié (1998) are two key references; more recent papers by Brémaud, Nappo, and Torrisi (2002) and Torrisi (2002) extend the discussion to rates of convergence to equilibrium. In the present section we illustrate their approach in a simplified form which seems to us
more intuitive for applications to examples, such as the ETAS model, which have recurred in this book.

We first suppose that for t ≥ 0 the process is specified by a conditional intensity of the form

    λ*(t, κ) dt × dκ = E[N(dt × dκ) | F_{t−}]
                     = ψ(S_t N_−, κ) dt × dκ
                     = ψ({(t − t_i, κ_i): t_i < t}, κ) dt × dκ,    (14.7.3)

where ψ(N, κ) is a nonnegative function of κ and the restriction of the realization N of a simple point process to the half-line R_−; that is, it is a nonnegative functional on N^{#*}_{R_−×K} × K.

In terms of the discussion around Lemma 12.5.III, the representation in (14.7.3) allows us to interpret the MPP in terms of a Markov process on the space Z = N^{#*}_{R_−×K}, and is equivalent to requiring the resulting Markov process to have stationary transition probabilities (see Exercise 14.7.3). Thus the techniques developed in this section are an alternative to the direct analysis of such a Markov process, and are particularly effective when the conditional intensity retains something of the linear structure characteristic of Hawkes processes. Last (1996, 2004) discusses and illustrates the Markov process approach.

Note also that (14.7.3) is the form derived in (14.3.5) for the complete intensity when the process is stationary. However, unlike the situation for point processes defined by their internal conditional intensities on the half-line R_+, specifying a conditional intensity on R by (14.7.3) does not in general specify a unique point process; there may be a variety of transient solutions, and there may or may not exist a stationary solution.

Our first objective is to seek conditions for the existence of a unique MPP that (i) satisfies given initial conditions, which we suppose are specified in terms of a distribution of the point process over R_−, and (ii) evolves for t ≥ 0 according to the representation (14.7.3). We then look for the existence of stationary solutions and for conditions governing convergence to equilibrium.
We start by constructing a sequence of approximations to the final process, imitating the classical Picard proof for the existence of solutions to a differential equation, and making use of the Poisson embedding results to construct the successive approximations. The first of the approximations, N^0 say, is specified by the initial condition for t < 0 and for t ≥ 0 has λ^0(t, κ) ≡ 0, so N^0(R_+) = 0 a.s. For r ≥ 0, the subsequent approximations N^r have the same realization N_− on R_−, but for t ≥ 0 each N^{r+1} is defined on a pathwise basis via the recursions

    λ^{r+1}(t, κ) = ψ(S_t N^r_−, κ),    (14.7.4a)
    N^{r+1}(dt × dκ) = Ñ( dt × dκ × (0, λ^{r+1}(t, κ)] ),    (14.7.4b)

where Ñ is a realization of a Poisson process constructed as in Proposition 14.7.I(a).
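The recursions (14.7.4a, b) can be carried out pathwise on a computer once a realization of the driving process Ñ is fixed. In the sketch below (ours; ψ is an assumed Hawkes-type functional, and the initial condition is taken to be N_− = ∅), the approximations N^r increase monotonically and stabilize after finitely many iterations on a finite window:

```python
import math
import random

def driving_sheet(T, x_max, rng):
    """One realization of a unit-rate Poisson process on (0, T] x (0, x_max]:
    ground times at rate x_max, each with an independent uniform height."""
    pts, t = [], 0.0
    while True:
        t += rng.expovariate(x_max)
        if t > T:
            return pts
        pts.append((t, rng.random() * x_max))

def psi(history, t, mu, alpha):
    # assumed Hawkes-type functional psi(S_t N, .); marks are suppressed
    return mu + alpha * sum(math.exp(-(t - ti)) for ti in history if ti < t)

def picard_counts(mu=0.3, alpha=0.4, T=60.0, x_max=6.0, n_iter=14, seed=0):
    """Pathwise recursion (14.7.4): the driver is fixed once, N^0 is empty on
    (0, T], and N^{r+1} keeps the driver points (t, x) with
    x <= lambda^{r+1}(t) = psi(N^r, t).  (Intensities above x_max would be
    truncated, but stay well below it for these parameter values.)"""
    rng = random.Random(seed)
    sheet = driving_sheet(T, x_max, rng)
    current, counts = [], []
    for _ in range(n_iter):
        current = [t for (t, x) in sheet if x <= psi(current, t, mu, alpha)]
        counts.append(len(current))
    return counts

counts = picard_counts()
print(counts)   # nondecreasing, and constant once the recursion has converged
```

Because ψ is increasing in its first argument here, the approximations are monotone (each N^{r+1} contains N^r), so convergence of the counts really does signal convergence of the point sets; the Lipschitz condition introduced below controls the general, non-monotone case.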
Following Br´emaud and Massouli´e, we now impose a condition on ψ which plays a role in the subsequent discussion similar to that of a Lipschitz condition in the classical existence and stability proofs for differential equations; for convenience we therefore refer to it as a Lipschitz condition. It takes the general form h(−s, η, κ) (N ∆ N )(ds × dη), (14.7.5) |ψ(N, κ) − ψ(N , κ)| ≤ R− ×K
where the nonnegative mapping $h\colon \mathbb{R}_+\times K\times K \to \mathbb{R}_+$ is contractive in a sense defined more precisely below, and the MPP $N\,\Delta\,N'$ is defined from MPPs $N$, $N'$ with conditional intensities $\lambda_N$ and $\lambda_{N'}$ as the MPP with conditional intensity
$$\lambda_{N\Delta N'}(t,\kappa) = \begin{cases} 0 & \text{for } t<0, \\ |\lambda_N(t,\kappa) - \lambda_{N'}(t,\kappa)| & \text{for } t\ge 0.\end{cases}$$
Poisson embedding provides a natural approach to defining such a process. Assuming both $N$ and $N'$ can be and are defined on the same probability space, $N\,\Delta\,N'$ is formed from the points of the Poisson process $\tilde N$ of (14.7.2) whose third component lies between the two curves $x=\lambda_N(t,\kappa)$ and $x=\lambda_{N'}(t,\kappa)$. We use this construction repeatedly below.
We now seek conditions, on the kernel $h$ of (14.7.5) and the initial process $N_-$, that are sufficient to ensure the existence of a unique MPP satisfying the prescribed initial distribution and with conditional intensity described by the given functional $\psi$ for $t>0$. Two broad classes of conditions can be identified in the literature. One class, first studied in the seminal paper by Kerstan (1964b), imposes an overall bound on the functional $\psi$ and as a consequence can relax the conditions on $h$; Massoulié (1998, Theorem 1) gives a general version from this class, which is outlined in simplified form in Exercise 14.7.5. The second class, which we now follow, imposes no overall bound on $\psi$ but requires the kernel $h$ to define a contraction operator on a suitable space of functions. The specific conditions we use are adapted from those in Theorem 2 of Massoulié (1998); in particular, we omit the dependence on an external process allowed in Massoulié's treatment. Further variants are summarized in Exercises 14.7.5–6 and references noted there. The following are the conditions we impose on $h$.
Conditions 14.7.II. The kernel $h$ in the Lipschitz condition (14.7.5) satisfies
(a) for all pairs $(\eta,\kappa)\in K\times K$,
$$0 < \int_0^\infty h(t,\eta,\kappa)\,dt \equiv H(\eta,\kappa) < \infty; \quad\text{and} \qquad (14.7.6a)$$
(b) there exists a positive, $\mathcal{B}_K$-measurable, and $\ell_K$-integrable function $r(\kappa)$ satisfying, for some positive $\rho<1$, the subinvariance condition
$$\int_K H(\eta,\kappa)\,r(\eta)\,\ell_K(d\eta) \le \rho\,r(\kappa). \qquad (14.7.6b)$$
14.7. Poisson Embedding and Existence Theorems
The existence, for some $\rho>0$, of a $\rho$-subinvariant function for a positive kernel such as $H(\eta,\kappa)$ is only a mild constraint on the regularity of the kernel. When the mark space is finite, it is a direct consequence of the Perron–Frobenius theorem, when $\rho$ can be taken to be the maximum eigenvalue of the matrix, and equality rather than inequality holds in (14.7.6b). A similar result holds whenever $H$ defines a compact operator (Jentzsch's theorem). More general conditions can be established using generating function arguments, and are outlined, for example, in Vere-Jones (1968) for the denumerable case and more generally in Liggett (1985). The requirement $H(\eta,\kappa)>0$ for all $\eta,\kappa$, which implies also $r(\kappa)>0$ (all $\kappa$), is a strict irreducibility condition, imposed for convenience in order to avoid having to detail the possibilities when the kernel is reducible. Note that the kernel $h$ can be interpreted as the analogue of the matrix density function for a Markov renewal process [see Example 10.3(a)] for which the renewal distributions are defective (hence, the processes are transient). The major constraint is the scaling requirement $\rho<1$, which is analogous to the subcriticality requirement in the branching process interpretation of a Hawkes process. The main difference between our approach and that in Massoulié (1998) is in the controlling role we give to the subinvariant function $r(\cdot)$, including the assumption that it is $\ell_K$-integrable. This last assumption is associated with the requirement that the process have finite ground intensity, and could be relaxed if this requirement were dropped. These assumptions increase the transparency of the arguments at the cost of some generality.
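For a finite mark space the subinvariant function can be computed directly. In the following sketch (Python; the matrix is an arbitrary positive stand-in for $H(\eta,\kappa)$), $\rho$ is obtained as the Perron–Frobenius eigenvalue and $r$ as the associated positive eigenvector, for which (14.7.6b) holds with equality.

```python
import numpy as np

rng = np.random.default_rng(42)
m = 5                                   # size of the finite mark space
H = rng.uniform(0.1, 1.0, (m, m))       # H[e, k] stands in for H(eta_e, kappa_k) > 0
H *= 0.9 / np.abs(np.linalg.eigvals(H)).max()   # rescale so the Perron eigenvalue is 0.9 < 1

# Condition (14.7.6b) integrates over the first argument eta, so r is a
# positive eigenvector of the transpose for the Perron eigenvalue rho.
eigvals, eigvecs = np.linalg.eig(H.T)
k = np.argmax(eigvals.real)
rho = eigvals[k].real
r = np.abs(eigvecs[:, k].real)          # the Perron vector can be taken strictly positive

# subinvariance with equality: sum_eta H(eta, kappa) r(eta) = rho * r(kappa)
print(rho, np.max(np.abs(H.T @ r - rho * r)))
```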
The existence of the subinvariant function $r(\cdot)$ in (14.7.6b) implies that the kernel $H(\eta,\kappa)$ defines a bounded linear operator $H$ on the space $\mathcal{K}_1$, say, of measurable functions $f(\kappa)$ satisfying $\int_K |f(\kappa)|\,r(\kappa)\,\ell_K(d\kappa)<\infty$, and that its transpose $H^*$ defines a bounded linear operator on the space $\mathcal{K}_\infty$, say, of measurable functions $g(\eta)$ satisfying $\operatorname{ess\,sup}[g(\eta)/r(\eta)]<\infty$, through the respective actions
$$(Hf)(\eta) = \int_K H(\eta,\kappa)f(\kappa)\,\ell_K(d\kappa), \qquad (14.7.7a)$$
$$(H^*g)(\kappa) = \int_K H(\eta,\kappa)g(\eta)\,\ell_K(d\eta). \qquad (14.7.7b)$$
Moreover, under the given conditions, each of the operators $H$ and $H^*$ has its norm bounded by $\rho$, as indicated in Exercise 14.7.4. Among other consequences, the contractive condition implies that the sum $\sum_{n=0}^\infty H^n$ converges geometrically fast and defines a bounded limit operator, $R$ say, so that for functions $f\in\mathcal{K}_1$ and $g\in\mathcal{K}_\infty$,
$$\sum_{n=0}^\infty \int_K\int_K g(\eta)\,H^n(\eta,\kappa)\,f(\kappa)\,\ell_K(d\eta)\,\ell_K(d\kappa) = \int_K\int_K g(\eta)\,R(\eta,\kappa)\,f(\kappa)\,\ell_K(d\eta)\,\ell_K(d\kappa) < \infty. \qquad (14.7.8)$$
Our basic result can now be stated as follows.
Proposition 14.7.III. Suppose that the functional $\psi$ and kernel $h$ satisfy Conditions 14.7.II for some $\ell_K$-integrable function $r(\kappa)$, and that for some $C<\infty$,
$$\psi(\emptyset,\kappa) \le C\,r(\kappa). \qquad (14.7.9)$$
Suppose also that an initial MPP $N_-$ is given on $\mathbb{R}_-$ such that, for some $D<\infty$ and all $t\ge 0$,
$$\mathrm{E}\left[\int_{\mathbb{R}_-\times K} h(t-s,\eta,\kappa)\,N_-(ds\times d\eta)\right] \le D\,r(\kappa). \qquad (14.7.10)$$
Then there exists a unique MPP $N$ on $\mathbb{R}$, with $N=N_-$ on $\mathbb{R}_-$ and conditional intensity (14.7.3) on $\mathbb{R}_+$, and with finite, bounded mean ground rate on $\mathbb{R}_+$.
Proof. Without loss of generality we suppose here and throughout that
$$\int_K r(\kappa)\,\ell_K(d\kappa) = 1.$$
Then the subinvariant function $r(\cdot)$ plays the role of the density of a dominating stationary mark distribution.
We return to the sequence of approximations (14.7.4). Writing $\lambda^n_\Delta(t,\kappa) = |\lambda^{n+1}(t,\kappa) - \lambda^n(t,\kappa)|$, we have $\lambda^n_\Delta(t,\kappa)=0$ for $t<0$ because all approximations coincide with the initial condition $N_-$ on $\mathbb{R}_-$, and for $t\ge 0$ (14.7.5) implies that for $n=1,2,\ldots,$
$$\mathrm{E}[\lambda^n_\Delta(t,\kappa)] = \mathrm{E}\bigl|\psi(S_tN^n,\kappa) - \psi(S_tN^{n-1},\kappa)\bigr| \le \mathrm{E}\left[\int_{(-\infty,t)\times K} h(t-s,\eta,\kappa)\,(N^n\,\Delta\,N^{n-1})(ds\times d\eta)\right] = \mathrm{E}\left[\int_{-\infty}^t\int_K h(t-s,\eta,\kappa)\,\lambda^{n-1}_\Delta(s,\eta)\,ds\,\ell_K(d\eta)\right], \qquad (14.7.11)$$
the last equality following from the H-martingale property applied to the point process $N^n\,\Delta\,N^{n-1}$, which has conditional intensity $|\lambda^n - \lambda^{n-1}|$.
Writing $\phi^n(\kappa) = \sup_{t\in\mathbb{R}_+} \mathrm{E}[\lambda^n_\Delta(t,\kappa)]$, for $n=0$ we have from (14.7.9) and (14.7.10)
$$\phi^0(\kappa) \le \psi(\emptyset,\kappa) + \sup_{t\ge 0}\mathrm{E}\bigl|\psi(S_tN^0,\kappa) - \psi(S_t\emptyset,\kappa)\bigr| \le C\,r(\kappa) + D\,r(\kappa) = (C+D)\,r(\kappa),$$
the middle term being bounded, via (14.7.5), by the left-hand side of (14.7.10). For $n=1,2,\ldots,$ (14.7.11) now implies
$$\phi^n(\kappa) \le \int_K \phi^{n-1}(\eta)\,H(\eta,\kappa)\,\ell_K(d\eta),$$
so that on using the inequalities for $H^*$ following from Conditions 14.7.II,
$$M_n \equiv \sup_\kappa\,[\phi^n(\kappa)/r(\kappa)] \le \rho M_{n-1} \le \rho^n M_0,$$
where $M_0 \le C+D < \infty$. The series $\sum_{n=0}^\infty \mathrm{E}[\lambda^n_\Delta(t,\kappa)]$ therefore converges uniformly and absolutely in $(t,\kappa)$. Considering the ground processes $N_g^n$, it then follows that for any finite $T>0$,
$$\sum_{n=1}^\infty \mathrm{E}\bigl|N_g^n(0,T) - N_g^{n-1}(0,T)\bigr| = \sum_{n=1}^\infty \mathrm{E}\left[\int_0^T\int_K \lambda^{n-1}_\Delta(t,\kappa)\,dt\,\ell_K(d\kappa)\right] \le \frac{(C+D)T}{1-\rho}. \qquad (14.7.12)$$
Now the ground processes $N_g^n$ can differ by at most positive integer values, so at most a finite number of the differences $N_g^n(0,T) - N_g^{n-1}(0,T)$ can be nonzero, a.s. It follows that the ground processes $N_g^n$, and hence also the full processes $N^n$, must be a.s. all equal after a finite number of terms, and so must converge almost surely and in expectation to some limit process $N$. Such convergence implies in turn that the fidi distributions of the processes $N^n(\cdot)$ converge for all finite intervals, implying finally that the processes $N^n$ converge weakly to the limit process $N$ (Theorem 11.1.VI). Moreover, from the convergence of the series $\sum_{n=0}^\infty \phi^n(\kappa)$, we have also that
$$\sup_{t>0}\,\mathrm{E}[\lambda(t,\kappa)] \le (C+D)\,r(\kappa)/(1-\rho), \qquad (14.7.13)$$
implying that for $t>0$ the limit process has finite mean ground rate and mark distribution dominated by a multiple of $r(\kappa)$.
The limit point process $N$ starts from the same distribution of initial conditions on $\mathcal{N}^{\#*}_{\mathbb{R}_-}$ as the approximants $N^n$, and on $\mathbb{R}_+$, $N$ has conditional intensity $\lambda$ which, using Fatou's lemma and condition (14.7.6b), satisfies
$$\mathrm{E}\bigl|\lambda(t,\kappa) - \psi(S_tN,\kappa)\bigr| \le \lim_{n\to\infty} \mathrm{E}\bigl|\lambda^n(t,\kappa) - \psi(S_tN,\kappa)\bigr| = \lim_{n\to\infty} \mathrm{E}\bigl|\psi(S_tN^{n-1},\kappa) - \psi(S_tN,\kappa)\bigr| \le \lim_{n\to\infty} \mathrm{E}\left[\int_{-\infty}^t\int_K h(t-s,\eta,\kappa)\,(N^n\,\Delta\,N^{n-1})(ds\times d\eta)\right].$$
But $h(\cdot,\eta,\kappa)$ is integrable and $N^n\,\Delta\,N^{n-1}$ converges weakly to the zero process, so the last limit is zero, showing that $\lambda(t,\kappa) = \psi(S_tN,\kappa)$ a.s. Finally, uniqueness follows from Propositions 14.1.VI and 14.2.IV, because we may regard the realization on $\mathbb{R}_-$ as defining a prior $\sigma$-algebra $\mathcal{F}_0$, and the compensator defines the process uniquely when the history is an intrinsic history $\mathcal{H}_t \vee \mathcal{F}_0$.
We turn now to stationary processes and convergence to equilibrium, recalling from Section 12.5 the terms weak and strong asymptotic stationarity to describe MPPs converging weakly or strongly (i.e., in variation norm) from given initial conditions to a stationary process. The next result is again a variant of the corresponding results in Brémaud and Massoulié (1996) and Massoulié (1998), to which we refer for further discussion and extensions.
Theorem 14.7.IV. Suppose that the functional $\psi$ and kernel $h$ of (14.7.5) satisfy Conditions 14.7.II for some $\ell_K$-integrable function $r(\kappa)$, and that $0 < \psi(\emptyset,\kappa) < C\,r(\kappa)$ for some $C>0$. Then there exists a unique stationary MPP $N^\dagger$ whose complete intensity is given by (14.7.3), which has finite mean ground rate
$$m_g^\dagger = \mathrm{E}\left[\int_K \psi(S_0N^\dagger,\kappa)\,\ell_K(d\kappa)\right], \qquad (14.7.14)$$
and whose stationary mark distribution is dominated by $r(\kappa)$. Furthermore, given initial conditions satisfying (14.7.10), the unique MPP $N$ specified by Proposition 14.7.III is weakly asymptotically stationary with limit process $N^\dagger$. If in addition the function $h$ in (14.7.5) satisfies, for some $D<\infty$, the condition
$$\int_0^\infty\int_K t\,h(t,\eta,\kappa)\,r(\eta)\,dt\,\ell_K(d\eta) < D\,r(\kappa), \qquad (14.7.15)$$
then $N$ is strongly asymptotically stationary with the limit $N^\dagger$.
Proof. Again we start the existence proof by constructing a sequence of approximating processes, but in the present situation the processes are defined over the whole line $\mathbb{R}$, starting from $N^0$, which is taken to be the empty process over the whole line. This means that if $\psi(\emptyset,\kappa)\equiv 0$, then all subsequent approximations are also empty, and the construction fails to lead to any nontrivial stationary process. Otherwise, $N^1$ is a stationary compound Poisson process with intensity $\lambda^1(t,\kappa) = \lambda_g f(\kappa)$ for some finite, nonzero constant $\lambda_g$ and stationary probability density $f$ dominated by some multiple of $r$. The arguments leading to (14.7.12) now carry over with only minor changes, the supremum in the definition of $\phi^n(\kappa)$ now being taken over the whole real line. They lead as before to the existence of a well-defined limit MPP $N^\dagger$. In this case, however, it follows from the stationarity of $N^0$ and the time-invariant character of $\psi$ that $N^1$, and by induction all subsequent approximants, are also stationary, and hence that the limit process $N^\dagger$ is stationary. Moreover, the boundedness conditions on $\mathrm{E}[\lambda(t,\kappa)]$, which follow as in (14.7.13), imply here that $N^\dagger$ has finite, constant ground rate $m_g$ and stationary mark distribution dominated by some multiple of $r(\kappa)$.
Uniqueness of the stationary solution will follow if we can establish the asymptotic stationarity results, for any stationary solution $N^*$ of (14.7.3) satisfying the conditions of the theorem also satisfies the conditions of Proposition 14.7.III, (14.7.10) following here from the assumption that the stationary mark distribution is bounded by a multiple of $r(\cdot)$. Granted the weak asymptotic stationarity results, therefore, this second solution would be weakly asymptotically stationary with limit $N^\dagger$, which is possible only if $N^*$ and $N^\dagger$ coincide. It remains to prove the assertions concerning asymptotic stationarity.
We suppose $N$ and $N^\dagger$ are defined as in the theorem. From the Lipschitz condition we obtain, for the difference $\lambda_\Delta(t,\kappa) \equiv |\lambda(t,\kappa) - \lambda^\dagger(t,\kappa)|$ between the corresponding intensities,
$$\mathrm{E}[\lambda_\Delta(t,\kappa)] = \mathrm{E}\bigl|\psi(S_tN,\kappa) - \psi(S_tN^\dagger,\kappa)\bigr| \le \mathrm{E}\left[\int_{(-\infty,t)\times K} h(t-s,\eta,\kappa)\,(N\,\Delta\,N^\dagger)(ds\times d\eta)\right] = \mathrm{E}\left[\int_{-\infty}^t\int_K h(t-s,\eta,\kappa)\,\lambda_\Delta(s,\eta)\,ds\,\ell_K(d\eta)\right] = a(t,\kappa) + \mathrm{E}\left[\int_0^t\int_K h(t-s,\eta,\kappa)\,\lambda_\Delta(s,\eta)\,ds\,\ell_K(d\eta)\right], \qquad (14.7.16)$$
where
$$a(t,\kappa) = \mathrm{E}\left[\int_{-\infty}^0\int_K h(t-s,\eta,\kappa)\,\lambda_\Delta(s,\eta)\,ds\,\ell_K(d\eta)\right]. \qquad (14.7.17)$$
If we set $g(t,\kappa)=0$ for $t<0$ and $g(t,\kappa) = \mathrm{E}[\lambda_\Delta(t,\kappa)]$ for $t\ge 0$, (14.7.16) has the form of a Markov renewal equation for the joint density with respect to $ds\,\ell_K(d\kappa)$, namely,
$$g(t,\kappa) = a(t,\kappa) + \int_0^t\int_K h(t-s,\eta,\kappa)\,g(s,\eta)\,ds\,\ell_K(d\eta); \qquad (14.7.18)$$
formally this has the solution
$$g(t,\kappa) = \int_0^t\int_K a(t-s,\eta)\,R^*(s,\eta,\kappa)\,ds\,\ell_K(d\eta), \qquad (14.7.19)$$
in which $R^*(t,\eta,\kappa) = \sum_{n=0}^\infty h^{n*}(t,\eta,\kappa)$ is the sum of the iterates of $h$ under the Markov convolution operation
$$(h*g)(t,\eta,\kappa) = \int_0^t\int_K h(t-s,\eta,\nu)\,g(s,\nu,\kappa)\,ds\,\ell_K(d\nu).$$
Note that, in the notation of (14.7.8), $\int_0^\infty R^*(t,\eta,\kappa)\,dt = R(\eta,\kappa) < \infty$ for all $(\eta,\kappa)$, and moreover that, as a sum of iterates of the contraction operator $H$, $R$ also satisfies
$$\int_K R(\eta,\kappa)\,r(\eta)\,\ell_K(d\eta) = \int_{\mathbb{R}_+\times K} R^*(t,\eta,\kappa)\,r(\eta)\,dt\,\ell_K(d\eta) \le (1-\rho)^{-1}\,r(\kappa). \qquad (14.7.20)$$
Because both intensities $\lambda(t,\kappa)$ and $\lambda^\dagger(t,\kappa)$ are dominated uniformly in $t$ by multiples of $r(\kappa)$, it follows easily from (14.7.17) that $a(t,\kappa)$ is dominated by a multiple of $r(\cdot)$. Hence the integral in (14.7.19) converges, and from (14.7.18) and (14.7.20) we see that $g(t,\kappa)$ is also dominated by a multiple of $r(\cdot)$.
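For an unmarked kernel, the renewal equation (14.7.18) and its Neumann-series solution (14.7.19) are easy to check numerically. The sketch below (Python; the exponential kernel and forcing term are hypothetical choices) iterates $g \leftarrow a + h*g$ on a grid until the fixed point is reached, and confirms that the total mass of the solution satisfies $\int g = \int a/(1-\rho)$, the scalar analogue of applying (14.7.20).

```python
import numpy as np

dt, T = 0.02, 40.0
t = np.arange(0.0, T, dt)
h = 0.8 * 1.5 * np.exp(-1.5 * t)        # kernel with mass rho = 0.8 < 1
a = np.exp(-t)                          # forcing term a(t), mass 1

# Neumann-series iteration g <- a + h*g; converges geometrically at rate rho
g = np.zeros_like(t)
for _ in range(300):
    g_new = a + np.convolve(h, g)[: len(t)] * dt
    if np.max(np.abs(g_new - g)) < 1e-12:
        g = g_new
        break
    g = g_new

# total masses of the discretized kernel and forcing term
rho_d = h.sum() * dt
a_mass = a.sum() * dt
print(g.sum() * dt, a_mass / (1.0 - rho_d))   # the two should agree closely
```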
We now consider the ground process of $N_\Delta$, $N_{\Delta,g}$ say, over the finite interval $(t,t+T)$, with the aim of proving that it converges weakly to the empty measure. We have
$$\Pr\{N_{\Delta,g}(t,t+T) > 0\} \le \mathrm{E}[N_{\Delta,g}(t,t+T)] = \int_t^{t+T}\int_K g(u,\kappa)\,du\,\ell_K(d\kappa).$$
Substituting for $g(\cdot,\cdot)$ from (14.7.19) and using Fubini's theorem leads to the estimates
$$\int_t^{t+T}\int_K g(u,\kappa)\,du\,\ell_K(d\kappa) = \int_t^{t+T}\int_K\int_0^u\int_K a(u-s,\eta)\,R^*(s,\eta,\kappa)\,ds\,\ell_K(d\eta)\,du\,\ell_K(d\kappa) \le T\int_0^{t+T}\int_{K\times K} a(u-s,\eta)\,R^*(s,\eta,\kappa)\,ds\,\ell_K(d\eta)\,\ell_K(d\kappa) \le T\int_0^{t+T}\int_{K\times K} \frac{a(u-s,\eta)}{r(\eta)}\,R^*(s,\eta,\kappa)\,r(\eta)\,ds\,\ell_K(d\eta)\,\ell_K(d\kappa). \qquad (14.7.21)$$
Now $a(t,\kappa)/r(\kappa)$ is bounded, and as $t\to\infty$, writing $f(t,\kappa) \equiv \mathrm{E}[\lambda_\Delta(t,\kappa)]$ for all real $t$ (so that $f=g$ for $t\ge 0$) and using (14.7.17) and the inequality $f(t,\kappa) \le C'r(\kappa)$, we see that
$$a(t,\kappa) = \int_{-\infty}^0\int_K h(t-s,\eta,\kappa)\,f(s,\eta)\,ds\,\ell_K(d\eta) \le C'\int_t^\infty\int_K h(u,\eta,\kappa)\,r(\eta)\,du\,\ell_K(d\eta) \to 0$$
from the integrability of $h$. Also, as a function of $(s,\eta)$ after integrating out $\kappa$, the second integral in (14.7.21) converges from (14.7.20), and so (14.7.21) as a whole represents a moving average of the function $a(t,\kappa)/r(\kappa)$, which is bounded and converges to zero as $t\to\infty$. Thus $\Pr\{N_{\Delta,g}(t,t+T)>0\}\to 0$, from which we deduce that the fidi distributions of $S_tN_{\Delta,g}$, and hence those of $S_t(N\,\Delta\,N^\dagger)$ itself, converge to the fidi distributions of the empty process, and hence that the fidi distributions of $S_tN$ converge weakly to those of $S_tN^\dagger$. But this implies (by Theorem 11.1.VII again) the weak convergence of $S_tN$ to $S_tN^\dagger$, and hence the weak asymptotic stationarity of $N$.
This argument fails if $T=\infty$, but under the additional condition (14.7.15) the expected value of the total number of points in $N_\Delta$ is finite. Indeed, from the convolution representation (14.7.18), we have
$$\mathrm{E}[N_{\Delta,g}(0,\infty)] = \int_0^\infty\int_K f(t,\kappa)\,dt\,\ell_K(d\kappa) = \int_0^\infty\int_K\int_0^\infty\int_K a(u,\eta)\,R^*(s,\eta,\kappa)\,ds\,\ell_K(d\eta)\,du\,\ell_K(d\kappa) = \int_0^\infty\int_{K\times K} a(u,\eta)\,R(\eta,\kappa)\,\ell_K(d\eta)\,\ell_K(d\kappa)\,du.$$
Substituting for $a(u,\eta)$ from (14.7.17), using the inequality $f(t,\kappa) \le C'r(\kappa)$, and changing the order of integration, the last expression becomes
$$\int_0^\infty\int_{K\times K}\left[\int_u^\infty\int_K h(v,\nu,\eta)\,f(u-v,\nu)\,dv\,\ell_K(d\nu)\right] R(\eta,\kappa)\,du\,\ell_K(d\eta)\,\ell_K(d\kappa) \le C'\int_0^\infty\int_u^\infty\int_{K\times K\times K} h(v,\nu,\eta)\,r(\nu)\,R(\eta,\kappa)\,dv\,du\,\ell_K(d\nu)\,\ell_K(d\eta)\,\ell_K(d\kappa) = C'\int_{K\times K\times K}\left[\int_0^\infty v\,h(v,\nu,\eta)\,dv\right] r(\nu)\,\ell_K(d\nu)\,R(\eta,\kappa)\,\ell_K(d\eta)\,\ell_K(d\kappa).$$
Incorporating the condition (14.7.15), we obtain for the last integral, $J$ say, the telescoping sequence of reductions
$$J \le C'D\int_{K\times K} r(\eta)\,R(\eta,\kappa)\,\ell_K(d\eta)\,\ell_K(d\kappa) \le \frac{C'D}{1-\rho}\int_K r(\kappa)\,\ell_K(d\kappa) = \frac{C'D}{1-\rho} < \infty,$$
the second inequality following from (14.7.20).
But if the expected number of points of the difference process on $\mathbb{R}_+$ is finite, there must be (with probability 1) a finite last occurrence time, say $L$, of points in $N_{\Delta,g}$. This $L$ acts as a coupling time for the two processes $N$ and $N^\dagger$, and strong asymptotic stationarity then follows from the basic coupling inequality of Lemma 11.1.I.
The arguments are considerably simplified if the process is unmarked, and are outlined in Exercises 14.7.5–7.
Example 14.7(a) Nonlinear Hawkes and ETAS models [see Example 7.3(b)]. Recall that the simple Hawkes model has conditional intensity of the form $\lambda^*(t) = \lambda + \int_0^t \mu(t-s)\,N(ds)$, with nonlinear version
$$\lambda^*(t) = \Phi\left(\lambda + \int_0^t \mu(t-s)\,N(ds)\right). \qquad (14.7.22)$$
This falls within the ambit of Proposition 14.7.III provided $\Phi(\cdot)$ satisfies a standard Lipschitz condition of the form $|\Phi(x)-\Phi(y)| \le \alpha|x-y|$ $(x,y\ge 0)$. In this case conditions (14.7.6a–b) reduce, respectively, to $0 < \int_0^\infty \mu(t)\,dt < \infty$ and
$$\rho = \alpha\int_0^\infty \mu(u)\,du < 1.$$
When $\Phi(x)\equiv x$ this reduces to the usual stability requirement for the Hawkes process. The standard initial condition is to suppose the process empty for $t<0$, which certainly satisfies condition (14.7.10), whereas (14.7.9) reduces to $\Phi(\lambda)<\infty$. Thus a version of the nonlinear process, starting from the empty initial condition, exists provided that both $\Phi(\lambda)<\infty$ and $\alpha\int_0^\infty \mu(u)\,du < 1$. From Theorem 14.7.IV, the same conditions also imply the existence of a stationary version of the process, with complete conditional intensity $\lambda^\dagger(\cdot)$
as in (14.7.22) but with the integral taken from $-\infty$. The same conditions imply weak asymptotic stationarity of the process started from the empty initial condition. The extra condition (14.7.15) required for strong asymptotic stationarity here reduces to
$$\int_0^\infty t\,\mu(t)\,dt < \infty.$$
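For the linear case $\Phi(x)\equiv x$ these conditions can be illustrated by simulation. The sketch below (Python, hypothetical parameter values) uses the Poisson-embedding idea in its sequential (thinning) form: between events the intensity of a Hawkes process with a decreasing kernel is itself decreasing, so the current value $\lambda^*(t+)$ dominates the intensity until the next point and can serve as the local ceiling. With $\rho = \int_0^\infty \mu(u)\,du = \nu < 1$, the empirical rate should approach the stationary mean rate $\lambda/(1-\nu)$.

```python
import numpy as np

rng = np.random.default_rng(1)
lam0, nu, beta = 0.4, 0.5, 2.0          # baseline, branching ratio rho, kernel decay
T = 20000.0                              # long horizon for a stable rate estimate
# kernel mu(t) = nu * beta * exp(-beta t), so the integral of mu is nu < 1

t, excite, events = 0.0, 0.0, 0
while True:
    lam_bar = lam0 + excite              # dominates lambda*(s) for s > t (kernel decreasing)
    w = rng.exponential(1.0 / lam_bar)   # candidate waiting time at the ceiling rate
    excite *= np.exp(-beta * w)          # excitation decays over the waiting time
    t += w
    if t >= T:
        break
    if rng.uniform() * lam_bar <= lam0 + excite:   # thinning: accept w.p. lambda*/lam_bar
        events += 1
        excite += nu * beta              # each accepted point adds mu(0) = nu*beta
rate = events / T
print(rate, lam0 / (1.0 - nu))           # empirical vs stationary rate
```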
Consider next a nonlinear version of the ETAS model as a simple example of a nonlinear marked Hawkes process. As in (7.3.10) we write the conditional intensity for the nonlinear version in the form
$$\lambda^*(t,\kappa) = \Phi\left(\lambda + \int_{(0,t)\times K} \mu(t-s)A(\eta)\,N(ds\times d\eta)\right) f(\kappa),$$
where $f(\kappa)$ is the density of the distribution of the 'unpredictable marks' and $A(\eta)$ [alias $\psi(\eta)$ in (7.3.10)] measures the increase in 'productivity' with the increase in $\eta$. Assuming a simple Lipschitz condition on $\Phi$ as above, we can identify the kernel $h$ of (14.7.5) with
$$h(t,\eta,\kappa) = \alpha\,\mu(t)A(\eta)f(\kappa).$$
Then condition (14.7.6a) reduces to $\int_0^\infty \mu(t)\,dt < \infty$ and the crucial condition (14.7.6b), if we identify the subinvariant vector $r$ with $f$, to the requirement that
$$\rho = \alpha\int_K A(\eta)f(\eta)\,\ell_K(d\eta)\int_0^\infty \mu(t)\,dt < 1.$$
Note that this requires the convergence of the integral $\int_K A(\eta)f(\eta)\,\ell_K(d\eta)$, and that in this case the subinvariant vector is strictly invariant. This condition, together with the condition $\Phi(\lambda)<\infty$, implies both the existence of a stationary version and the existence and weak asymptotic stationarity of a version started from the empty initial condition. The extra condition for strong asymptotic stationarity is the same as in the unmarked case. In the more explicit spatial version of Example 6.4(b), without the nonlinear generalization, we leave the reader to check that the conditions reduce to those quoted in the example.
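As a numerical illustration (Python; the exponential forms and parameter values below are hypothetical stand-ins for the usual ETAS ingredients), with mark density $f(\kappa)=\beta e^{-\beta\kappa}$ and productivity $A(\eta)=e^{a\eta}$, $a<\beta$, the integral $\int_K A(\eta)f(\eta)\,\ell_K(d\eta)$ converges to $\beta/(\beta-a)$, and $\rho$ is the product of this factor with $\alpha\int_0^\infty\mu(t)\,dt$:

```python
import numpy as np

beta, a_prod = 2.0, 1.5          # mark-density decay and productivity exponent (need a_prod < beta)
alpha, mu_mass = 1.0, 0.15       # Lipschitz constant of Phi, and total mass of mu(t)

x = np.linspace(0.0, 40.0, 400001)
f = beta * np.exp(-beta * x)     # density of the unpredictable marks
A = np.exp(a_prod * x)           # productivity A(eta)
y = A * f
mark_factor = np.sum((y[:-1] + y[1:]) * 0.5 * np.diff(x))   # trapezoidal integral of A*f

rho = alpha * mark_factor * mu_mass
print(mark_factor, beta / (beta - a_prod), rho)   # numeric vs closed form; rho < 1 here
```

If $a \ge \beta$ the integral diverges and no $\ell_K$-integrable subinvariant function of this form exists, which is the familiar explosion scenario for exponential productivity against an exponential magnitude law.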
Exercises and Complements to Section 14.7
14.7.1 Poisson embedding results for general histories. Extend Proposition 14.7.I(a) and (b) to the situation where the history $\mathcal{F}_t$ can be represented in the form $\mathcal{F}_t = \mathcal{H}_t \vee \mathcal{G}_t$, where $\mathcal{H}_t$ refers to the internal history of the point process, and $\mathcal{G}_t$ to the history of a process evolving contemporaneously in parallel with the point process. [Hint: The crucial point for part (a) is to define the underlying Poisson process on a history sufficiently rich for both the point process and the auxiliary process to be H-adapted and H-predictable. See Massoulié (1998).]
14.7.2 (Continuation). Specialize Proposition 14.7.I to the case that the conditional intensity of the MPP can be represented in the form (14.7.3). [Hint: The main changes relate to the requirement of H-predictability, and the appropriate introduction of the functional $\psi$.]
14.7.3 Denote by $Z_N(t)$ the process whose value at time $t$ is the realization $S_tN_-$ on $\mathbb{R}_-$ of an MPP with conditional intensity function of the form (14.7.3). Verify the Markov property for $Z_N(t)$ and examine the form of its infinitesimal generator.
14.7.4 Verify that the operators $H$ and $H^*$ of (14.7.7) do indeed define bounded linear operators on the spaces of functions $\mathcal{K}_1$ and $\mathcal{K}_\infty$ defined in the text, and that their norms are bounded by $\rho$. [Hint: Use (14.7.6b) and a Fubini theorem argument applied to
$$\|Hf\| = \int_K |(Hf)(\eta)|\,r(\eta)\,\ell_K(d\eta) \le \int_{K\times K} H(\eta,\kappa)\,|f(\kappa)|\,r(\eta)\,\ell_K(d\eta)\,\ell_K(d\kappa),$$
$$\|H^*g\| = \operatorname{ess\,sup}_{\kappa\in K} \frac{1}{r(\kappa)}\left|\int_K H(\eta,\kappa)\,g(\eta)\,\ell_K(d\eta)\right|,$$
assuming for $H$ that $\|f\| \equiv \int_K |f(\kappa)|\,r(\kappa)\,\ell_K(d\kappa) < \infty$, and for $H^*$ that $\|g\| \equiv \operatorname{ess\,sup}[|g(\eta)|/r(\eta)] < \infty$.]
14.7.5 Check and prove directly the following restriction of Proposition 14.7.III to simple (unmarked) point processes. Let $\Psi(N)$ be a mapping $\Psi\colon \mathcal{N}^\#_{\mathbb{R}_-} \to \mathbb{R}_+$ satisfying $\Psi(\emptyset) = C < \infty$ and the Lipschitz condition
$$|\Psi(N) - \Psi(N')| \le \int_{\mathbb{R}_-} h(-s)\,(N\,\Delta\,N')(ds),$$
where $h\colon \mathbb{R}_+\to\mathbb{R}_+$ satisfies $0 < \int_0^\infty h(t)\,dt < \rho < 1$. Suppose also that the initial condition satisfies $\mathrm{E}\left[\int_{\mathbb{R}_-} h(t-s)\,N_-(ds)\right] \le D < \infty$ for all $t\ge 0$. Then there exists a unique point process $N$ with finite mean ground rate, initial condition $N_-$, and conditional intensity $\lambda^*(t) = \Psi(S_tN)$ for $t\ge 0$. State and prove a corresponding extension of Theorem 14.7.IV. [Brémaud and Massoulié (1996) give a version without the restriction to processes with finite mean rate.]
14.7.6 (Continuation). As a variant, prove that similar uniqueness theorems hold under the conditions that $\Psi$ is bounded overall, but the requirement that $\rho<1$ is weakened to $\rho<\infty$. Investigate an analogous theorem for MPPs under the assumption that $\Psi(N,\kappa) \le M\,r(\kappa)$ for some finite $M$ and function $r$ that acts as a $\rho$-subinvariant function for the kernel $h(t,\eta,\kappa)$. [For other variants, see also Kerstan (1964b), Brémaud and Massoulié (1996), and, for the marked case, Massoulié (1998).]
14.7.7 (Continuation). Say that a point process or MPP has bounded memory if the functional $\Psi$ depends on $N$ only through either (a) its past in the finite interval $(-a,0)$, or (b) a finite number of occurrence times $t_{-k}$ with $t_{-k}<0$. Investigate existence and stability theorems for point processes and MPPs with bounded memory. [See Lindvall (1988) for a treatment of case (b) based on a regeneration point argument. See also Brémaud and Massoulié (1996).]
14.8. Point Process Entropy and a Shannon–MacMillan Theorem
In this section we return to the discussion in Section 7.6 of point process entropy and information gain. Our aim is to extend and consolidate the theoretical background to the results presented there concerning entropy rates, likelihoods, and information gain. In particular, we complete the discussion of Proposition 7.6.II and establish an ergodic result for point process entropy analogous to the Shannon–MacMillan theorem for independent sequences.
In introducing any general concept of entropy it is important to bear in mind that entropy, like likelihood, is best regarded as defined relative to a reference measure, which can be a probability measure but need not be totally finite. The entropy of a discrete (i.e., purely atomic) distribution $\{p_k\colon k=0,1,\ldots\}$ can be defined directly as the expectation
$$H_a = \mathrm{E}(-\log p_k) = -\sum_k p_k\log p_k, \qquad (14.8.1a)$$
but the natural analogue for a continuous distribution, with density $f(x)$ say on $\mathcal{X}\subseteq\mathbb{R}$, namely,
$$H_c = -\int_{\mathcal{X}} f(x)\log f(x)\,dx, \qquad (14.8.1b)$$
is scale-dependent, and cannot be reached as the limit of approximating discrete distributions. For example, in approximating the continuous uniform distribution on $(0,1)$ by a sequence of discrete uniform distributions with mass $1/n$ at each of the points $(k-\frac12)/n$, the entropy of the discrete approximation equals $\log n$ and consequently diverges as $n\to\infty$. Similarly, for the uniform distribution on the unit hypercube in $\mathbb{R}^d$, the discrete approximation to the entropy equals $d\log n = \log(n^d) = \log(\#\text{ points used in the discrete approximation})$, and again diverges as $n\to\infty$. Indeed, it was just the differences in these rates of divergence that were recognized by Rényi (1959) as characterizing the dimension of the set on which the limit measure was carried, thus suggesting the definition of the Rényi dimensions introduced around (13.6.1). Intuitively, the infinite limits obtained from the discrete approximations can be regarded as stemming from the unreasonable requirement that observation of a real-valued random variable pins down its value precisely; that is, the observation specifies all the digits in its decimal representation, thus conveying infinite information.
One way of overcoming the apparent difficulties in linking discrete and continuous entropies is to consider each entropy relative to a reference measure on the relevant carrying space. Then (14.8.1a) is considered as an entropy relative to the discrete measure with unit masses at each integer, and the continuous version at (14.8.1b) is considered as an entropy relative to Lebesgue measure. This leads to the concept of the relative or generalized entropy, an approach that also permits the definition of the entropy of a distribution on a general probability space. Suppose that $(\Omega,\mathcal{E},\mu)$ is a measure space, and
$P \ll \mu$ is a probability distribution on this space. Then the generalized entropy of $P$ with respect to the reference measure $\mu$ is given by
$$H(P;\mu) = -\int_\Omega \Lambda(\omega)\log\Lambda(\omega)\,\mu(d\omega) = -\int_\Omega \log\Lambda(\omega)\,P(d\omega) = \mathrm{E}_P(-\log\Lambda), \qquad (14.8.2)$$
where $\Lambda(\omega) = (dP/d\mu)(\omega)$ is the Radon–Nikodym derivative of $P$ with respect to $\mu$. If $P$ is singular with respect to $\mu$, we set $H(P;\mu) = \infty$. If $\mu = Q$, where $Q$ is again a probability measure, we can rewrite (14.8.2) in the form
$$-H(P;Q) = \int_\Omega \log\frac{dP}{dQ}\,P(d\omega), \qquad (14.8.3)$$
which, apart from the negative sign, identifies the generalized entropy with the expected value of the log likelihood ratio $\log(dP/dQ)$ under the assumption that $P$ is the true distribution. This link to the likelihood ratio underlies the properties of the entropy scores introduced in Section 7.6. Indeed, the right-hand side of (14.8.3) is nothing other than the Kullback–Leibler distance between the two probability measures $P$ and $Q$. Convexity of the function $x\log x$, when applied to the first form in (14.8.2), guarantees that this quantity is always nonnegative, and equals zero if and only if the two measures coincide. When both distributions are absolutely continuous with respect to a reference measure $\mu$, then in the terminology of Section 7.6, the right-hand side of (14.8.3) represents the (expected) information gain, that is, the expected value of the logarithm of the probability gain, resulting from scoring outcomes by $-\log(dP/d\mu)$ rather than $-\log(dQ/d\mu)$, when the true distribution is really $P$. In this formulation the reference measure $\mu$ drops out of the comparison, or can be taken to be $Q$ itself as in (14.8.3).
Turning to point process entropies, we start from the entropy of a point process observed over a state space $\mathcal{X}$ which we take to be a bounded region $A \subseteq \mathbb{R}^d$. The distribution of the point process can then be regarded as a symmetric probability distribution on the countable union $\mathcal{X}^\cup$. Observation of a realization of the process conveys information of two kinds: the actual number of points observed, and the location of these points given their number.
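For discrete distributions the Kullback–Leibler quantity in (14.8.3) is immediate to compute. The sketch below (Python, with arbitrary example distributions) checks the two properties just cited: nonnegativity, with zero exactly when the two distributions coincide.

```python
import numpy as np

def kl(p, q):
    # -H(P;Q) = sum p log(p/q), the Kullback-Leibler distance of (14.8.3)
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.25, 0.25, 0.5])
print(kl(p, q), kl(q, p), kl(p, p))   # two positive values, then exactly zero
```

Note the asymmetry: $\mathrm{KL}(P;Q) \ne \mathrm{KL}(Q;P)$ in general, which is why the roles of true and reference measures must be kept distinct in the scoring interpretation.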
Assuming absolute continuity with respect to Lebesgue measure for the distribution of locations, and bearing in mind that the points are indistinguishable, we can write the probability density for the latter term in the form $k!\,\pi_k^{\mathrm{sym}}(x_1,\ldots,x_k;A)$, where $\pi_k^{\mathrm{sym}}$ denotes a symmetric probability density over $A^{(k)}$. This suggests defining the entropy of a realization $\{x_1,\ldots,x_N\}$ as
$$H \equiv H(N;x_1,\ldots,x_N) = H(N) + \mathrm{E}[H(x_1,\ldots,x_N\mid N)] = -\sum_{k=0}^\infty p_k\log p_k - \sum_{k=1}^\infty p_k\int_{\mathcal{X}^{(k)}} \pi_k^{\mathrm{sym}}(x_1,\ldots,x_k)\log[k!\,\pi_k^{\mathrm{sym}}(x_1,\ldots,x_k)]\,dx_1\ldots dx_k. \qquad (14.8.4)$$
This follows the notation of Section 5.3, with the factor $k!$ arising, just as in the discussion of likelihoods, from the fact that only unordered point sets can be distinguished, so that any given allocation of particles to points is repeated $k!$ times. Rudemo (1964) and McFadden (1965b) introduced point process entropy effectively in this form. From an entropy viewpoint, the extra factor $\log k!$ can also be regarded as the loss of information, for given $k$ and a given set of locations, about which particle is located at which location. Under the assumption of indistinguishable points, there are $k!$ permutations from which to choose, all of them equally likely, corresponding to a distribution with entropy $\log k!$. Notice that (14.8.4) can be written
$$H = -\mathrm{E}_P\bigl(\log[p_N\,N!\,\pi_N^{\mathrm{sym}}(x_1,\ldots,x_N)]\bigr) = -\mathrm{E}_P\bigl(\log j_N(x_1,\ldots,x_N)\bigr) = \mathrm{E}_P(-\log L),$$
where $L$ is the likelihood, identified as in Section 7.1 with the Janossy density. As with the entropy of distributions with a continuous density considered earlier, the definition at (14.8.4) is scale-dependent: approximating each of the densities $\pi_k$ by a discrete distribution on $n$ points results in a discrepancy which increases as $\sum_k p_k\,k\log n = \mathrm{E}(N)\log n$. For this reason $\mathrm{E}(N)$ is sometimes regarded as the dimension of the distribution of a finite point process on $\mathbb{R}$. The implicit reference measure here has product Lebesgue measure on each constituent space $\mathcal{X}^{(k)}$ and unit mass at each nonnegative integer. The alternative is to proceed as in Section 14.4 and take the reference measure to be the probability distribution of some standard process, usually the Poisson process with unit rate, so that $(dP/dQ)$ reduces to the likelihood ratio relative to this standard process. When this is the Poisson process, and $\mathcal{X}$ is a bounded set $A\subseteq\mathbb{R}^d$, the net effect is merely to add an extra term $\ell(A)$ to (14.8.4) (see Exercise 14.8.1).
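The identity $H = \mathrm{E}_P(-\log L)$ can be checked by direct sampling in the simplest case. For a Poisson process of rate $\mu$ on $(0,1)$ the Janossy density is $j_k(x_1,\ldots,x_k) = e^{-\mu}\mu^k$, so $-\log L = \mu - N\log\mu$ and $H = \mu - \mu\log\mu$, matching the entropy rate found for the Poisson process in Example 14.8(a) below. A Monte Carlo sketch (Python):

```python
import numpy as np

rng = np.random.default_rng(7)
mu, n = 2.0, 200000
# For a Poisson process of rate mu on (0,1), p_N N! pi_N^sym = e^{-mu} mu^N,
# so -log L = mu - N log mu and H = E[-log L] = mu - mu log mu.
N = rng.poisson(mu, n)
H_mc = np.mean(mu - N * np.log(mu))
print(H_mc, mu - mu * np.log(mu))   # Monte Carlo estimate vs closed form
```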
The expected value of such a log likelihood ratio is just the information gain as introduced in Section 7.6, which therefore appears as the negative of the corresponding generalized entropy. It takes the same form whether the points are treated as distinguishable or indistinguishable, and is given by
$$G = \sum_{k=0}^\infty p_k\log\frac{p_k}{q_k} + \sum_{k=1}^\infty p_k\int_{A^{(k)}} \pi_k^{\mathrm{sym}}\log\frac{\pi_k^{\mathrm{sym}}}{q_k^{\mathrm{sym}}}\,dx_1\ldots dx_k. \qquad (14.8.5)$$
The two expressions (14.8.4) and (14.8.5) illustrate the difference in intent between an absolute and a generalized entropy, giving the entropy relative to a measure reflecting the structure of the space on which it is defined, or else the information gain, which compares the entropies of two probability measures defined within similar structural constraints. For a simple or marked point process in time, entropy can be represented alternatively in terms of conditional intensities. Suppose that P (which we use
as a shorthand for PT , the probability measure restricted to the events generated by the point process in [0, T ]) corresponds to a MPP on [0, T ] × K, and that we represent the likelihoods in terms of conditional intensities λ∗ (t, κ) relative to the internal history H as in Definition 14.3.I(b). The generalized entropy and the information gain relative to an alternative probability measure P 0 can be found by taking expectations of the likelihood ratio as set out in (14.4.2). In particular, adopting the notation of (14.4.2), the expected information gain for MPPs over an interval (0, T ) can be written in the form GT (P; P 0 ) = EP [dP/dP 0 ] T 0 =E log µ(t, κ) N (dt × dκ) − [µ(t, κ) − 1]λ (t, κ) dt K (dκ) , 0
(0,T )×K
K
where µ(·) is the ratio of the H-conditional intensities λ*(·) and λ^0(·) under P and Q ≡ P^0, respectively. Taking predictable versions of the conditional intensity, the last expectation simplifies to give

    G_T(P; P^0) = E[ ∫_0^T ∫_K λ*(t, κ) log µ(t, κ) dt ℓ_K(dκ) ] − ∫_0^T [m_g(t) − m_g^0(t)] dt,   (14.8.6)

where we have written m_g(·) and m_g^0(·) for the ground rates under P and P^0 respectively. For simple point processes, a similar expression holds but without any marks κ or integral over K. To obtain the generalized entropy, it is necessary to identify the appropriate reference measure. When the point process is simple the reference measure Q_T is the nonnormalized measure corresponding to e^T Poi(1, T), where Poi(λ, T) is the probability measure of a Poisson process on (0, T) with constant rate λ. For an MPP the nonnormalized measure is a similar multiple of a compound Poisson process with unit rate and mark distribution π(dκ). Adopting these conventions, we obtain the corresponding generalized entropy in the form

    H_T = −E[ ∫_0^T ∫_K ( λ*(t, κ) log λ*(t, κ) − [λ*(t, κ) − 1] ) dt π(dκ) ].   (14.8.7)
In general, the expressions (14.8.6–7) are not easy to evaluate explicitly, although they are usually straightforward to obtain from simulations. They do simplify, however, when the MPP is stationary. In this case we use the extension of the likelihood (14.4.2) to intrinsic conditional intensities (i.e., conditioned on some initial σ-algebra G_0), and we take G_0 = H^0_{−∞}, so that for t > 0, λ^G(t) can be identified with the complete intensity λ†(t). In this case, for all t, we have E[λ†(t, κ)] = E[λ†(0, κ)] = m_g E[f†(κ | 0)], where m_g = E[λ†_g(0)] is the overall mean rate, which we assume finite. Then, for example, the expected information gain (14.8.6) takes the form

    G_T = T { E[ ∫_K λ†(0, κ) log µ†(0, κ) π(dκ) ] − [m_g − m_g^0] } ≡ T G,   (14.8.8a)
14. Evolutionary Processes and Predictability
where G is the expected information gain per unit time of (7.6.14).² It is often more usefully written as

    G = E[λ†_g(0) log λ†_g(0)] − [m_g − m_g^0] + E[ λ†_g(0) ∫_K f†(κ | 0) log ( f†(κ | 0) / f̄(κ) ) π(dκ) ].   (14.8.8b)

Then the first two terms represent the information gain due to the times of the points, and the third term the conditional information gain due to the marks of the points, given their occurrence times. Similarly we can define an entropy rate by

    H ≡ H_T / T = −E[ ∫_K λ†(0, κ) log λ†(0, κ) π(dκ) ] + [m_g − 1]
                = −E[ λ†_g(0) ( log λ†_g(0) + ∫_K f†(κ | 0) log f†(κ | 0) π(dκ) ) ] + [m_g − 1].   (14.8.9)
For reference, the last results are summarized in the proposition below.

Proposition 14.8.I. Let N be a simple or marked stationary point process on R, with complete intensity function (determined by the internal history and relative to a reference distribution π(·) on the mark space)

    λ†(t, κ) = λ†_g(t) f†(κ | t).

Suppose that E[λ†_g(0) log λ†_g(0)] < ∞, so that N has finite ground rate m_g = E[λ†_g(0)]. Then the expected information gain G, relative to a compound Poisson process with unit ground rate and strictly positive mark density f̄(κ), is given by (14.8.8), and the entropy rate H by (14.8.9).

Example 14.8(a). Simple, mixed, and compound Poisson processes. The simplest example is a simple Poisson process of constant rate µ. In this case

    G = µ log µ − (µ − 1),

representing the expected information gain per unit time over the unit-rate Poisson process. Note the implicit ratio in the first term: each µ is really the ratio µ/1 of rates between the true and reference processes. As a function of µ, G is 0 when µ = 1, that is, when the true process coincides with the reference process, and is otherwise strictly positive. The mean entropy rate is obtained by changing signs and omitting the −1 in the final bracket: H = µ − µ log µ. Exercise 14.8.2 shows that amongst all point processes with given rate λ, the Poisson process has maximum entropy rate.

In the case of a mixed Poisson process, where the rate λ is a random variable with distribution function F(·) say, the conditioning on G_0 must be taken into account, and the information gain becomes

    G = ∫_{R₊} λ[log λ − 1] dF(λ) + 1.
² Equation (7.6.14) in Volume I contains two errors: in the second term +m_g E[···] a factor f†_{k|0} is omitted, and it is wrongly assumed that the process has unpredictable marks, so that the term should instead read

    + E[ Σ_{k∈K} λ†_g(0) f†_{k|0} log ( f†_{k|0} / f_k ) ].

In the ensuing display, G equals the right-hand side but the middle expression is also flawed and should be deleted.
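The Poisson formulas of Example 14.8(a) are easily checked numerically. The sketch below (function names are ours, not the book's) evaluates G and H and confirms, by simulation, that G is the expected log likelihood ratio per unit time, using the fact that for a Poisson process log(dP/dP^0) over (0, T] equals N(0, T] log µ − (µ − 1)T:

```python
import math, random

def info_gain(mu):
    # Expected information gain per unit time of a rate-mu Poisson process
    # relative to the unit-rate reference: G = mu log mu - (mu - 1).
    return mu * math.log(mu) - (mu - 1)

def entropy_rate(mu):
    # Mean entropy rate, obtained by changing signs and omitting the -1.
    return mu - mu * math.log(mu)

def simulated_gain(mu, T=20000.0, seed=1):
    # Monte Carlo check: the log likelihood ratio over (0, T] is
    # N(0, T] log mu - (mu - 1) T, so its time average estimates G.
    rng = random.Random(seed)
    n, t = 0, rng.expovariate(mu)
    while t <= T:
        n += 1
        t += rng.expovariate(mu)
    return (n * math.log(mu) - (mu - 1) * T) / T
```

At µ = 1 the gain vanishes; for any other rate it is strictly positive, in line with the remarks above.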
Figure 14.2 First 40 points of realizations of four different gamma-distributed renewal processes, same mean but shape parameters κ = 0.2, 1, 5, 25, and (expected) information gains G as shown: G = 1.898 (κ = 0.2), G = 0 (Poisson, κ = 1), G = 0.456 (κ = 5), G = 1.204 (κ = 25).
Consider finally a compound Poisson process with ground rate µ and mark distribution {p_k: k ∈ Z₊}. The mark distribution is here discrete, and the reference distribution on the mark space can be any distribution {f̄_k} on the nonnegative integers for which all terms are strictly positive. Because the marks are chosen independently of the previous history of the process, (14.8.6) simplifies to

    G = µ log µ − (µ − 1) + µ Σ_{k=0}^∞ p_k log ( p_k / f̄_k ).
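The right-hand side separates into a counts term and a marks term, and is immediate to compute; a minimal sketch (function and variable names are ours):

```python
import math

def compound_poisson_gain(mu, p, fbar):
    # Expected information gain per unit time of a compound Poisson process
    # with ground rate mu and mark distribution p, relative to a unit-rate
    # compound Poisson reference with mark distribution fbar: a "counts"
    # term plus mu times the Kullback-Leibler divergence of the marks.
    counts = mu * math.log(mu) - (mu - 1)
    marks = mu * sum(pk * math.log(pk / fk)
                     for pk, fk in zip(p, fbar) if pk > 0)
    return counts + marks
```

The marks term vanishes when p = f̄, and the whole expression vanishes when in addition µ = 1.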
As does (14.8.6) more generally, this equation represents a decomposition of the expected information gain into two components: the gain due to modelling the counts, and the gain due to modelling the marks.

Example 14.8(b). Renewal processes with gamma interevent distributions. It is shown in Example 7.6(b) that the information gain for a renewal process with lifetime density f(·), relative to a Poisson process with the same mean rate m, takes the form

    G = m [ 1 + ∫_0^∞ f(y) log ( f(y) / m ) dy ].   (14.8.10)

Figure 14.2, adapted from Daley and Vere-Jones (2004), illustrates the way G can vary with the model as the character of the interevent distribution is changed. Here the mean rate is fixed but the shape parameter is allowed to vary. Details of the computations are sketched in Exercise 14.8.3.

We turn next to approximations of point process entropies by entropies of systems of discrete trials, with the aim of consolidating the ideas introduced in Section 7.6; we follow broadly the treatment in Daley and Vere-Jones (2004). Suppose first that the process is unmarked, and that the observation interval (0, T] is partitioned into subintervals A_i = (u_{i−1}, u_i], i = 1, ..., k_n, with u_0 = 0, u_{k_n} = T, for which forecasts are required. We observe either the values X_i = N(A_i) or the indicator variables Y_i = I_{N(A_i)>0}, but not the
individual points. Our aim is to show how the information gain and associated quantities for the point process can be approximated by the analogous quantities for the X_i or the Y_i, both of which may be thought of as sequences of dependent trials. In order to embed this situation into a limit process, we suppose that the given partition is a member of a dissecting family of partitions, say T_n, n = 1, 2, ..., as defined in Section A1.6. The crucial partition, however, is not the partition of the state space but the partition this induces on the probability space Ω. For example, a partition of (0, T] into r subintervals induces a partition of the probability space into 2^r events, each corresponding to a particular sequence of values of the indicator variables Y_i. We denote by A_n the algebra of events generated in this way by the partition T_n. To consider the effect of refining the partition, take a particular partition T_{n_0}, say, and define for each set A ∈ T_{n_0} the sequence of associated processes η^(n)(A), where for n < n_0, η^(n)(A) = 0, and for n ≥ n_0,

    η^(n)(A) = Σ_{i: A_i^(n) ⊆ A} Y_i^(n) = Σ_{i: A_i^(n) ⊆ A} I_{N(A_i^(n))>0}.   (14.8.11)
For n > n_0, η^(n)(A) counts the number of subintervals of A which belong to T_n and contain a point of the process. It is clear that, for increasing n, the η^(n) are nondecreasing. Indeed, because the partitions form a dissecting system, and the point process is assumed to be simple, each point of the process will ultimately be the sole contributor to one of the nonzero terms η^(n)(A), so that η^(n)(A) ↑ N(A). Thus, any event {N(A) = k} can be approximated by the corresponding events {η^(n)(A) = k}. More generally, any event defined by the simple point process in (0, T] can be approximated by events determined by the processes η^(n)(·), or equivalently H_(0,T] = ⋁_{n=1}^∞ A_n.

A similar argument holds also for marked point processes. Here we consider a dissecting family T_n of partitions of the product space (0, T] × K, each of which is of product form V_n × W_n, so that each element of T_n is a rectangle V × W, where V ∈ V_n, W ∈ W_n respectively. A family of processes ζ^(n)(·) can be defined much as in (14.8.11), but with the A_i^(n) in (14.8.11) interpreted as rectangles from (0, T] × K. In this case, if (0, T] is partitioned into r subintervals, and K into s components, the σ-algebra A_n will be generated by (s + 1)^r distinct events, each corresponding to a sequence of length r, each term in which can be any one of the s marks, or a special mark φ to allow for the possibility that no events occur in the subinterval in question. Once again we find that, for any rectangle set A × K from one of the partitions T_n, ζ^(n)(A × K) ↑ N(A × K), and consequently H_(0,T] = ⋁_{n=1}^∞ A_n. The following lemma summarizes the conclusions.

Lemma 14.8.II. Let N be a marked point process with simple ground process and let ζ^(n) be defined as in (14.8.11) with respect to a dissecting family of partitions of (0, T] × K. Then the processes ζ^(n)(A × K) generate the internal history H_(0,T].
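The monotone approximation η^(n)(A) ↑ N(A) is easy to visualize numerically. The sketch below uses the dyadic partitions of (0, 1] as the dissecting family and a fixed simple realization (all names are ours):

```python
import math

def eta(points, n):
    # eta^(n)((0,1]): the number of dyadic subintervals (i/2^n, (i+1)/2^n]
    # at depth n containing at least one point of the realization.
    k = 2 ** n
    occupied = set()
    for t in points:
        # index of the dyadic cell (a, b] with a < t <= b
        i = min(k - 1, int(math.ceil(t * k)) - 1)
        occupied.add(i)
    return len(occupied)

points = [0.11, 0.35, 0.62, 0.87]        # a fixed simple realization on (0, 1]
counts = [eta(points, n) for n in range(11)]
# counts is nondecreasing and equals N((0,1]) = 4 once the cells separate the points
```

Because the dyadic cells are nested, each refinement can only split occupied cells, so the sequence increases to the total number of points.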
To proceed further, we need to clarify the role of the underlying σ-algebra in defining the generalized entropy. Here we follow the treatment of Csiszár (1969) and Fritz (1969). Suppose given a measure space (Ω, E), a σ-algebra A ⊆ E, and two measures on (Ω, E), a probability measure P, and a reference measure (not necessarily a probability measure) Q; suppose also that P ≪ Q on A, with Radon–Nikodym derivative (dP/dQ)_A. As before, but explicitly recording the σ-algebra, we define the generalized entropy on (Ω, A) as

    H(P, Q; A) = −∫_Ω (dP/dQ)_A log (dP/dQ)_A Q(dω)
               = −∫_Ω log (dP/dQ)_A P(dω) = E_P[ −log (dP/dQ)_A ].   (14.8.12)

When Q(·) is itself a probability measure, we prefer to change the sign and refer instead to the information gain

    G(P, P^0; A) = ∫_Ω (dP/dP^0)_A log (dP/dP^0)_A P^0(dω) = ∫_Ω log (dP/dP^0)_A P(dω),   (14.8.13)

with G(P, P^0; A) ≥ 0, and it is set equal to +∞ if in fact P is not absolutely continuous with respect to P^0; note that the integral can diverge even when the absolute continuity condition holds. The following two properties of the generalized entropy, taken from Csiszár (1969) as in Fritz (1969) but restated here in terms of information gains, are of crucial importance (for a sketch of the proof and some related material, see Exercises 14.8.4–6).
Lemma 14.8.III. (a) The information gain G(P, P^0; A) is nondecreasing under refinement of the σ-algebra A.
(b) Let {A_α} be a family of σ-algebras generating A, and such that, to every A_{α1}, A_{α2}, there exists A_{α3} ∈ {A_α} for which A_{α1} ∪ A_{α2} ⊆ A_{α3}. Then

    G(P, P^0; A) = sup_α G(P, P^0; A_α).

Suppose, in particular, that A^(n) is the σ-algebra derived from the ζ^(n)(A) associated with a dissecting family of partitions as in Lemma 14.8.II. Then the condition of part (b) of Lemma 14.8.III holds as a consequence of the nested property of the partitions in a dissecting family, and because the family of σ-algebras A_n is also monotonic increasing with limit H_(0,T], the two lemmas together imply the first part of the following proposition, which provides an extension and minor strengthening of Proposition 7.6.II.

Proposition 14.8.IV. Consider an MPP N defined on (0, T] × K, with distribution P on (Ω, E), and internal history H, and let P^0 be an alternative probability measure on (Ω, E). Also let T_n = V_n × W_n be a dissecting family
of partitions of (0, T] × K, and A_n the σ-algebra of events induced by T_n through the approximations ζ^(n) of Lemma 14.8.II.
(a) If G = G(P_T, Q_T; H) and G_n = G(P_T, Q_T; A_n) denote the expected information gains associated with N and ζ^(n), respectively, then G_n ≤ G and G_n ↑ G as n → ∞.
(b) The same conclusions hold for the information gains relative to the intrinsic histories G = H ∨ G_0 and {G_n = A_n ∨ G_0}.

Proof. Part (a) is a direct consequence of Lemmas 14.8.II–III. Part (b) follows by first conditioning on G_0 and applying part (a), then taking expectations over G_0.

An important extension to these results is to situations where the point process evolves alongside another, stochastically related, process which can provide predictive information about the point process. It is in situations of this kind that more general conditional intensities λ^F arise, supposing that the joint history is available for observation, but only the point process needs predicting. For such situations it is also possible to consider information gains of the type (14.8.6–10), even though they lose their strict interpretation as expected values of likelihood ratios. We attempt only an informal sketch of this development.

We consider just the case where a simple point process N(t) evolves alongside a second cumulative process W(t), both processes being observed over (0, T], or in terms of their increments over a family of partitions of (0, T). Specifically, let Y_i^(n), W_i^(n) denote the observed values of the increments of the processes N(t), W(t) over the set V_i^(n) ∈ V_n defined as in the previous discussion, let p_i^(n)(Y | F_{i−1}^(n)) denote the conditional distribution of Y_i^(n) given observations (Y_j^(n), W_j^(n)) for 0 ≤ j ≤ i − 1, and p_i^(n)(Y | H_{i−1}^(n)) the distribution of Y_i^(n) given Y_j^(n) (0 ≤ j ≤ i − 1) only. This gives an information gain

    G̃_n = E[ Σ_i p_i^(n)(Y_i^(n) | F_{i−1}^(n)) log ( p_i^(n)(Y_i^(n) | F_{i−1}^(n)) / p_i^(n)(Y_i^(n) | H_{i−1}^(n)) ) ],   (14.8.14)

and the inequality G̃_n ≥ 0 holds essentially as a result of the well-known inequality for entropies, H(X | Y) ≤ H(X). Lemma 14.8.III can now be applied to show that, in this situation also, refinement of the partitions can only increase the information gain. Introducing a further partition point θ inside a subinterval (u_{i−1}, u_i] does not affect the information available to the two histories at the beginning of the first new subinterval, (u_{i−1}, θ], but in the second new subinterval, (θ, u_i], the full process obtains new information about both the point process and the explanatory variables, whereas the reference process obtains only the additional information about the point process. Hence the information gain can only increase with the introduction of the new partition.
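The monotone convergence G_n ↑ G of Proposition 14.8.IV can be checked in closed form for an unmarked Poisson process, for which the indicator variables over the 2^n dyadic cells are independent Bernoulli variables under both the true and the reference measure. The following sketch relies on those assumptions; the helper names are ours:

```python
import math

def kl_bernoulli(p, q):
    # Kullback-Leibler divergence between Bernoulli(p) and Bernoulli(q).
    def term(a, b):
        return a * math.log(a / b) if a > 0 else 0.0
    return term(p, q) + term(1 - p, 1 - q)

def gain_indicators(mu, T, n):
    # Information gain carried by the indicators Y_i over 2^n equal cells:
    # Y_i is Bernoulli(1 - e^{-mu*delta}) under P and Bernoulli(1 - e^{-delta})
    # under the unit-rate reference, independently across cells.
    k = 2 ** n
    delta = T / k
    return k * kl_bernoulli(1 - math.exp(-mu * delta), 1 - math.exp(-delta))

G = 2 * math.log(2) - 1                     # continuum value for mu = 2, T = 1
gains = [gain_indicators(2.0, 1.0, n) for n in range(1, 15)]
```

The sequence `gains` increases with n, stays below G, and approaches it as the dyadic cells shrink.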
If now we let n → ∞ in (14.8.14), and suppose that conditional intensities λ^F(t, κ), λ^H(t, κ) exist, we obtain the limit, which again is also an upper bound,

    G(P, P^0; A^F) = E[ ∫_0^T ∫_K λ^F(t, κ) log ( λ^F(t, κ) / λ^H(t, κ) ) dt ℓ_K(dκ) ],

the second term in (14.8.6) disappearing because the processes have the same expected rates. In particular, if the two processes are stationary, the information gain per unit time G̃ equals

    G̃ = E[ λ^F_g(0) log ( λ^F_g(0) / λ^H_g(0) ) + λ^F_g(0) ∫_K f^F(κ | 0) log ( f^F(κ | 0) / f^H(κ | 0) ) ℓ_K(dκ) ].   (14.8.15)

Notice that these equations have the same form as if we had taken expectations of a likelihood ratio, but in fact no true likelihood ratio exists here because the F-intensity by itself does not define a process but only the way the point process component is determined by the full process. See Exercise 14.8.7 for a hidden Markov model example.

The last question we take up in this section is that of approximating the expected entropy rate from finite samples, as in MacMillan's theorem for a discrete ergodic source with finite alphabet [see, e.g., Billingsley (1965)]. Here, the basic statements assert the convergence of the log likelihoods T⁻¹ log L_T, either a.s. or in L¹ norm. We consider simple point processes only, and follow Papangelou (1978) in deriving the L¹ version of this result. Extensions to the MPP context are sketched in Exercise 14.8.11. The main problem is that, although we have derived expressions for the entropy rate from a form of likelihood involving the complete intensity function, in reality the best that is likely to be available is the conditional intensity based on the internal history, starting from an empty history at time 0.
Thus there are two main steps in the proof: first, establish the convergence of the pseudolikelihoods in which the complete intensity plays the role of the intrinsic intensity over a finite interval, and second, show that the difference between the true and pseudolikelihoods is asymptotically negligible. These two steps are set out in the next two lemmas.

Lemma 14.8.V. Suppose E[λ†(0) log λ†(0)] < ∞ for a simple stationary point process N. Then as T → ∞,

    (1/T) ∫_0^T log λ†(t) dN(t) → E[λ†(0) log λ†(0) | I]   (14.8.16)

both a.s. and in L¹ norm, where I is the σ-algebra of invariant events.

Proof. Because N is stationary, the process λ†(t) is stationary. Also, λ†(t) is H†-predictable, so the set function

    ξ(A) = ∫_A log λ†(u) N(du)   (bounded A ∈ B)
may be regarded as a stationary random signed measure with mean density m = E[λ†(0) log λ†(0)]. The result (14.8.16) then follows from the a.s. and L¹ ergodic results of Proposition 12.2.IV. [This involves noting that the two processes

    ξ₊(A) = ∫_A [log λ†(u)]₊ N(du)   and   ξ₋(A) = ∫_A [log λ†(u)]₋ N(du)

are both nonnegative measures to which the theorems apply directly, with the finiteness of E|λ†(0) log λ†(0)| following from x log x ≥ −e⁻¹ (x ≥ 0) and the finiteness assumption in the lemma.]

In the next result the monotone family of σ-algebras {H_(−T,0)} generated by {N(t): −T < t < 0}, which increase as T → ∞ to H_{0−}, plays a key role. First, the nonnegativity of λ ≡ λ†(0) and assumed finiteness of E(λ log λ) ensure that E(λ) < ∞. Next, the family {λ_T: 0 < T < ∞}, where

    λ_T = E[λ†(0) | H_(−T,0)],   (14.8.17)

constitutes a martingale, which by definition and the finiteness of E(λ) is uniformly integrable, so λ_T → λ both a.s. and in L¹ norm by the martingale convergence theorem. But the function x log x is convex in x, so by Jensen's inequality,

    ∞ > E(λ log λ) = E[E(λ log λ | H_(−T,0))] ≥ E(λ_T log λ_T),

and because x log x ≥ −e⁻¹ (all x ≥ 0), it follows that E(λ_T log λ_T) is well-defined and finite.

Lemma 14.8.VI. Under the conditions of Lemma 14.8.V, with λ_T as at (14.8.17),

    E|λ†(0) log λ†(0) − λ†(0) log λ_T| → 0   (T → ∞).   (14.8.18)
Proof. Because λ_T → λ in distribution and ∞ > E(λ log λ) ≥ E(λ_T log λ_T), and also

    0 ≤ λ log(λ_T/λ) I{λ_T > λ} ≤ λ(λ_T/λ − 1) = λ_T − λ,

for which λ_T → λ in L¹ norm, it is enough to show that E(λ_T log λ_T) → E(λ log λ). Let x ≥ 1 be a continuity point of the distribution of λ; then for sufficiently large x we can certainly make E(λ I{λ>x} log λ) < ε for arbitrary ε > 0. Now

    E[max(x log x, λ_T log λ_T)] = E{ max ( x log x, E(λ | H_(−T,0)) log[E(λ | H_(−T,0))] ) } ≤ E[max(x log x, λ log λ)]
by Jensen’s inequality because max(x log x, y log y) is convex in y > 0 for x ≥ 0. Because x is a continuity point for λ with x ≥ 1, 0 ≤ E(λT I{λT >x} log λT ) = E[max(x log x, λT log λT )] − x log x Pr{λT ≤ x} ≤ E[max(x log x, λ log λ)] − x log x Pr{λT ≤ x} → E[max(x log x, λ log λ)] − x log x Pr{λ ≤ x} = E(λI{λ>x} log λ) < , with the convergence holding uniformly for T sufficiently large. Theorem 14.8.VII. Let the simple stationary point process N admit Hpredictable complete intensity λ† (t) and H-predictable conditional intensity λ∗ (t) on t ≥ 0 and be such that H ≡ −E λ† (0)[log λ† (0) − 1] is finite, so that m = E[λ† (0)] is finite also. Then as T → ∞, H(0,T ] →H T and
log L(0,T ) → E(Z | I) T
a.s.
(14.8.19)
in L1 norm,
(14.8.20)
where Z = λ† (0)[log λ† (0) − 1] and I denotes the σ-algebra of invariant events for N . Proof. Convergence as in (14.8.19) follows from the definition of H(0,T ] , H, and the conditions and results of the last two lemmas. To prove (14.8.20), consider the difference " " " " log L(0,T ) " − E(Z | I)"", (14.8.21) E" T which by virtue of the triangle inequality is dominated by T1 + T2 + T3 + T4 , where " " T " 1 "" T ∗ † log λ (t) dN (t) − log λ (t) dN (t)"", T1 = E" T 0 0 " " T " 1 "" T ∗ † λ (t) dt − λ (t) dt"", T2 = E" T 0 0 " T " "1 " log λ† (t) dN (t) − E λ† (0) log λ† (0) | I "", T3 = E"" T 0 " T " "1 † " † " λ (t) dt − E λ (0) | I "". T4 = E" T 0
Here, T₃ → 0 by Lemma 14.8.V, and applying the ergodic theorem to the stationary process λ†(t) implies that T₄ → 0. By assumption λ*(t) and λ†(t) are predictable on R₊ and R, respectively, so both are H-predictable on R₊, and thus T₁ is dominated by

    (1/T) E[ ∫_0^T | log λ*(t) − log λ†(t) | λ†(t) dt ].

Recall from the projection Theorem 14.2.II that λ*(t) can be replaced by a suitably chosen version of E[λ†(t) | H_{t−}] without altering the value of the integrals in T₁ and T₂. Using stationarity, replace (0, T) by (−T, 0), which leads to

    T₁ = (1/T) E[ ∫_{−T}^0 | log E[λ†(t) | H_(−T,t)] − log λ†(t) | λ†(t) dt ],
    T₂ = (1/T) E[ ∫_{−T}^0 | E[λ†(t) | H_(−T,t)] − λ†(t) | dt ].
1 log n
dFk (X1 , . . . , Xn ) dΠ(k) (X1 , . . . , Xn )
→ E(Z | I),
where the invariant r.v. Z has expectation (finite or infinite)
∞
E(Z) = E
∗ dF (x | H(−1) )
0
=E
∞
log 0
log
∗ dF (x | H(−1) )
Π(dx) ∗ ) dF (x | H(−1) Π(dx)
Π(dx)
Π(dx)
∗ dF (x | H(−1) ),
where F(· | H*_(−1)) is a regular version of the conditional distribution of X₀ given the sequence of past values {X₋₁, X₋₂, ...} which generate H*_(−1). This result can be applied directly to our context if we take Π(dx) = µe^{−µx} dx (x ≥ 0), that is, the stationary interval distribution on R₊ associated with the Poisson process with constant mean rate µ. This leads to the result that

    H_I(P; P_µ) = −E₀[ ∫_0^∞ f(x | τ) log f(x | τ) dx ] + log µ − µ/m,

where f(· | τ) is the conditional intensity introduced in Corollary 14.3.VI, E₀ is used to denote expectations over the vector of past intervals τ, and H_I denotes the 'interval entropy rate.' If, in particular, we take µ = 1 and use Q to denote the measure corresponding to Π(dx) = e^{−x+1/m} dx, we have similarly that

    H_I ≡ H_I(P; Q) = −E₀[ ∫_0^∞ f(x | τ) log f(x | τ) dx ].   (14.8.22)

This interval entropy rate is easily related to the entropy rate H by appealing to Corollary 14.3.VI. As in the proof of Proposition 14.3.V, for any function h(T, τ) of the backward recurrence time T and the past sequence τ,

    E[h(T, τ)] = m E₀[ ∫_0^{τ₀} h(τ₀ − x, τ) dx ] = m E₀[ ∫_0^∞ f(y | τ) dy ∫_0^y h(x, τ) dx ]
               = m E₀[ ∫_0^∞ h(x, τ)[1 − F(x | τ)] dx ].

Now by taking for h(T, τ) the function

    ( f(T | τ) / [1 − F(T | τ)] ) log ( f(T | τ) / [1 − F(T | τ)] ),

it follows from Corollary 14.8.V that

    E[λ†(0) log λ†(0)] = m E₀[ ∫_0^∞ f(x | τ) log f(x | τ) dx − ∫_0^∞ log[1 − F(x | τ)] f(x | τ) dx ]
                       = m E₀[ ∫_0^∞ f(x | τ) log f(x | τ) dx + 1 ],

and hence

    H = −E[ λ†(0) log λ†(0) − λ†(0) ] = −m E₀[ ∫_0^∞ f(x | τ) log f(x | τ) dx ] = m H_I.
Thus, H(P; Pµ ) = mHI (P; Pµ ), which leads to the following statement. Proposition 14.8.VIII. For a simple stationary point process with mean rate m, the entropy rate per unit time equals m times the entropy rate per interval.
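For a renewal process the intervals are i.i.d., so f(x | τ) = f(x) and the interval entropy rate reduces to −∫ f log f. The relation H = m·H_I can then be checked against the Poisson value H = µ − µ log µ of Example 14.8(a); the sketch below (with quadrature settings of our choosing) does this for an exponential lifetime density:

```python
import math

def interval_entropy(f, upper, steps=100000):
    # H_I = -int_0^upper f(x) log f(x) dx by a simple Riemann sum
    # (for a renewal process the conditioning on past intervals drops out).
    h = upper / steps
    total = 0.0
    for i in range(1, steps):
        fx = f(i * h)
        if fx > 0:
            total -= fx * math.log(fx) * h
    return total

m = 2.0                                    # mean rate, so mean lifetime 1/m
f_exp = lambda x: m * math.exp(-m * x)     # exponential lifetime density
H_I = interval_entropy(f_exp, upper=40.0)  # analytically 1 - log m
H = m * H_I                                # Proposition 14.8.VIII: H = m * H_I
```

Here H ≈ m − m log m, the entropy rate of the rate-m Poisson process, as it should be.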
Exercises and Complements to Section 14.8

14.8.1 Show that, in comparison with (14.8.4), the relative entropy relative to the Poisson process contains the additional term ℓ(A). [Hint: Substitute for the Poisson probabilities in the expression at (14.8.5) for E_P(log(dP/dQ)), so the term log q_k^sym reduces to log([ℓ(A)]^n).]
14.8.2 (a) Show that, subject to the conditions Σ_{k=0}^∞ p_k = 1, Σ_{k=1}^∞ k p_k = µ and p_k ≥ 0, the Poisson distribution maximizes the sum −Σ_{k=0}^∞ p_k log(k! p_k).
(b) Deduce that for a regular point process on a bounded interval D ⊂ R^d with E[N(D)] = µ = const., the point process entropy (14.8.4) is maximized when the process is Poisson with uniform mean rate over D. [Hint: Start by writing (14.8.4) in the form

    −H = Σ p_k log(k! p_k) + Σ p_k ∫_{D^(k)} π_k(y) log π_k(y) dy.

Now use (a) together with the fact that, conditional on k, the integral is maximized, subject to ∫ π_k(y) dy = 1, when π_k(y) reduces to a uniform distribution over D^(k).]

14.8.3 Entropy of renewal process with gamma lifetimes. For the gamma density function f_κ(x; a) = e^{−ax}(ax)^{κ−1} a / Γ(κ), with shape parameter κ and mean κ/a, so mean rate m = a/κ, the information gain G at (14.8.10) equals

    m ( 1 − log m + ∫_0^∞ f_κ(x; a) log f_κ(x; a) dx ).

Show that the integral equals −κ − log Γ(κ) + log a + (κ − 1)ψ(κ), where ψ(z) = Γ′(z)/Γ(z) is the digamma function. Known expansions for ψ yield
    G = m ( 1/2 + (1/2) log(κ/2π) + (1/3)κ⁻¹ + (1/12)κ⁻² + (1/90)κ⁻³ − (1/120)κ⁻⁴ + ··· ).
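Both the exact expression and the expansion can be checked numerically. The sketch below supplies an elementary digamma routine (recurrence plus asymptotic series) in place of a library call, and reproduces the information gains quoted in Figure 14.2; the function names are ours:

```python
import math

def digamma(x):
    # psi(x) via the recurrence psi(x) = psi(x+1) - 1/x and the standard
    # asymptotic expansion, adequate for the accuracy needed here.
    s = 0.0
    while x < 10.0:
        s -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return s + math.log(x) - 0.5 / x - inv2 * (1/12 - inv2 * (1/120 - inv2 / 252))

def gain_exact(kappa, m=1.0):
    # G = m(1 - log m + int f log f) with the closed form of the integral;
    # since a = kappa*m, the log m terms cancel, leaving the expression below.
    return m * (1 - kappa + math.log(kappa) - math.lgamma(kappa)
                + (kappa - 1) * digamma(kappa))

def gain_series(kappa, m=1.0):
    # The asymptotic expansion in descending powers of kappa.
    return m * (0.5 + 0.5 * math.log(kappa / (2 * math.pi))
                + kappa**-1 / 3 + kappa**-2 / 12 + kappa**-3 / 90 - kappa**-4 / 120)
```

For κ = 0.2, 5, 25 the exact formula gives G ≈ 1.898, 0.456, 1.204, matching Figure 14.2, and the series already agrees with the exact value to three decimal places at κ = 25.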
[Hint: Daley and Vere-Jones (2004) give more detail.]

14.8.4 Let (Ω, E, µ) be a measure space and P a probability measure on (Ω, E) with P ≪ µ. Let {A_α} be a family of finite or countable subalgebras of E. Define the generalized entropies H(P; µ), H(P_α; µ_α) by
    −H(P; µ) = ∫_Ω (dP/dµ) log(dP/dµ) µ(dω) = ∫_Ω log(dP/dµ) P(dω),
    −H(P_α; µ_α) = Σ_j P(U_{αj}) log [ P(U_{αj}) / µ(U_{αj}) ],

where {U_{αj}} is an irreducible (and countable) partition generating A_α.
(i) If A_α ⊆ A_β then H(P_α; µ_α) ≥ H(P_β; µ_β).
(ii) If {A_α} generates E, then −H(P; µ) = inf_α{−H(P_α; µ_α)}.
(iii) Let P_T, Q_T be defined as in Proposition 14.8.IV, and suppose that {T_α} is a family of finite or countable partitions of the interval (0, T], so that T_α = {A_{αi}: i = 1, ..., K_α}, such that {T_α} generates B((0, T]). Let A_α be the algebra of events generated by Z_{αi} = N(A_{αi}). Show that −H(P_T; Q_T) = inf_α{−H(P_{αT}; Q_{αT})}, where

    −H(P_{αT}; Q_{αT}) = Σ_n p_{αn} log ( p_{αn} / q_{αn} ),

n = (n₁, ..., n_{K_α}), p_{αn} = P{Z_{αi} = n_i, i = 1, ..., K_α}, and q_{αn} = Π_{i=1}^{K_α} [µ(A_{αi})]^{n_i}. [This result provides a discrete approximation to generalized entropy. See Csiszár (1969) and Fritz (1973).]

14.8.5 (a) Show that H(P_T; Q_T) can also be characterized as

    −H(P_T; Q_T) = inf_α { Σ_n p_{αn} log p_{αn} − T log δ_α },

where δ_α = max_i µ(A_{αi}).
(b) Show also that

    −H(P_T; Q_T) = inf_α { Σ_n π_{αn} log ( π_{αn} / q_{αn} ) + Σ_{i=1}^{K_α} P{Z_{αi} ≥ 1} log µ(A_{αi}) },

where π_{αn} = P{Y_{αi} = n_i, i = 1, ..., K_α} for n_i = 0 or 1 and Y_{αi} = I{N(A_{αi})>0}. [Hint: Show first that if T_α ⊆ T_β and each set A_{αi} of T_α is a union of no more than r sets of T_β, then

    Σ_n p_{αn} log p_{αn} ≤ Σ_n p_{βn} log p_{βn} ≤ Σ_n p_{αn} log p_{αn} + T log r.

See Fritz (1973) for further details. This result shows that in some sense E(N(0, T]) plays the role of a dimension for the process, as in Rényi's (1959) discussion of 'dimensional entropy.']

14.8.6 Generalize the results of Exercises 14.8.4–5 to the case of a general state space X with Poisson distribution having nonatomic parameter measure µ(·).

14.8.7 Information gain in a Markov modulated point process (MMPP).
(a) Let X(t) be a stationary continuous-time Markov process on finite state space X, with stationary distribution π_i = Pr{X(t) = i} (all t, i ∈ X), so the rate of jumps between states equals Σ_{i∈X} π_i q_i = λ say, where the q_i are as in Section 10.3 below (10.3.2). Write down an expression for the information gain of the point process consisting of jumps of the Markov process relative to a Poisson process at rate λ.
(b) Suppose that, when X(t) = i, points occur in a Poisson process at rate λ_i [i.e., the point process is a Cox process directed by the states of the hidden Markov process X(·)]. Find the information gain of this point process compared with a Poisson process at rate Σ_{i∈X} π_i λ_i.
(c) Compare the results of (a) and (b) and interpret.
(d) Consider extensions to (b) when it is an MPP that is directed by X(·), where both the mark distribution and the frequency of observed points may depend on X(·).
14.8.8 Calculate the entropy rate for a stationary renewal process whose lifetime p.d.f. has unit mean and (a) is exponential on (0, ∞); or else, for some a in (0, 1), is (b) uniform on (1 − a, 1 + a), or (c) triangular on (1 − a, 1 + a). (d) Interpret the limits from (b) and (c) when a → 0.
14.8.9 The entropy rate of a stationary renewal process with absolutely continuous lifetime d.f. with density f(·) equals −m ∫_0^∞ f(x) log f(x) dx, where m⁻¹ = ∫_0^∞ x f(x) dx. One technique used in approximating the behaviour of a stationary point process N on R (e.g., for simulation purposes) is to replace N by a renewal process whose lifetime d.f. coincides with the stationary interval distribution. Use Proposition 14.8.VIII to show that the entropy rate of such an approximating stationary renewal process is larger than that of N.

14.8.10 Investigate extensions to MPPs of the arguments used to prove Lemmas 14.8.V–VI and Theorem 14.8.VII. Show also that the entropy rate for a stationary MPP can be given in a form analogous to that of Proposition 14.8.VIII by considering the bivariate sequence {(τ_i, κ_{i−1})}, where κ_i is the mark associated with the point initiating the interval τ_i, and applying the corresponding version of Perez's result noted before Proposition 14.8.VIII.
CHAPTER 15
Spatial Point Processes
15.1 Descriptive Aspects: Distance Properties
15.2 Directional Properties and Isotropy
15.3 Stationary Line Processes in the Plane
15.4 Space–Time Processes
15.5 The Papangelou Intensity and Finite Point Patterns
15.6 Modified Campbell Measures and Papangelou Kernels
15.7 The Papangelou Intensity Measure and Exvisibility
This last chapter provides an introduction to spatial point processes, meaning for the most part results for point processes in R² and R³ where the order properties of the real line, which governed the development in the preceding chapter, are no longer available. During the last few decades, the rapid growth of interest in image processing has brought about substantial treatments of spatial models, including both engineering and statistical aspects; see in particular Ripley (1981), Baddeley (1998), and van Lieshout (1995). At the same time, the collection of improved quality spatial data in ecology, geography, forestry, geophysics, and astronomy, has maintained a steady demand for spatial statistical models, for which Stoyan, Kendall and Mecke's text SKM (1995) is an extensive general reference.

The material we present falls into two main components. In the first four sections we review mainly descriptive properties, distinguishing between distance and directional properties of spatial point patterns, starting from finite models, moving on to the moment properties of line processes, and then revisiting space–time models, where time reappears so that many of the modelling concepts in Chapter 14 are again available, but spatial patterns also play an important role. The three final sections of the chapter provide an introduction to modelling centred around the concept of the Papangelou intensity; we provide some background and motivation from the statistical and physical settings, then attempt an introduction to the more mathematical theory.
This chapter also includes an introduction, mainly through the treatment of line processes in Section 15.3, to the rich and diverse territory of stochastic geometry, pioneered by Rollo Davidson, David Kendall, and others in the 1970s [see especially Kendall's (1974) introduction to Harding and Kendall (1974)]. For present purposes, we may take stochastic geometry to mean the study of families of geometric objects randomly located in one-, two- or three-dimensional Euclidean space, where to qualify as an 'object' all we demand is that the entity can be specified by a finite (or perhaps countably infinite) set of real parameters that describe aspects such as location, size, and shape. To each such object there corresponds a point in a Euclidean parameter space of suitably high dimension, and random families of such objects can be defined as point processes on this parameter space as state space.

Because a characteristic feature of geometric objects is their invariance under rigid motions such as translation, rotation, and reflection, a key question is the implication of such invariance properties on the first and second moment properties of the process. The results for isotropic point processes considered in Section 15.2, and those for line processes in Section 15.3, both illustrate this general theme. The results are still applications of the factorization Lemma A2.7.II or equivalent disintegration results, but as the objects become more complex, so also do the disintegrations become more varied and more intricate.

The final three sections of the chapter introduce a rather different aspect of the theory of spatial point processes, where the underlying endeavour is to use the concepts of interior/exterior to provide some kind of weak counterpart to the ideas of past/future on the time axis.
Central to this endeavour is the concept of the Papangelou intensity, which underlies recent developments in inference for spatial point processes, such as pseudolikelihood methods and point process residuals. For finite point processes, the properties of the Papangelou intensity can be developed in a relatively elementary manner from the theory of Janossy densities outlined in Chapter 5, and this is undertaken in Section 15.5. The extension to general processes is altogether more demanding, requiring a combination of deep concepts from statistical mechanics and general point process theory. An introduction to this material is contained in Sections 15.6 and 15.7, and centred round the concept of exterior conditioning, meaning a conditioning of the point process on its behaviour outside a bounded set. In this sense the theory can be thought of as a kind of dual to the Palm theory, which is concerned with conditioning on the behaviour within a bounded set, as the dimensions of that set shrink to zero.
15.1. Descriptive Aspects: Distance Properties

Faced with a realization of a point process within a bounded region of R², or a spatial point pattern as we generally describe it in this chapter, a statistician’s first reaction is likely to be to seek some numerical characteristics with
which to describe its salient features. Spatial point patterns being, in general, objects of some complexity, a variety of different statistics has been developed for this purpose. In this section it is our aim to give a brief overview of some of these quantities, without getting too deeply involved with technical issues such as consistency, unbiasedness, or numerical stability. It is our concern rather to identify and place in context the model characteristics to which the statistics refer. More comprehensive introductions to spatial statistics, including point process models in particular, can be found in Ripley (1981), Diggle (1983, 2003), SKM (1987) and its second edition SKM (1995), Cressie (1991, 1993), van Lieshout (2000), Baddeley et al. (2005), and a broad collection of case studies in Baddeley et al. (2006). Crudely speaking, the models that are available to describe point processes in Rd for d = 2, 3, . . . are derived from Poisson processes, Gibbs processes, or (deterministic) lattice processes that may have undergone modification by translation, clustering, or inhibition. To the extent that features of these modelling mechanisms have been discussed earlier in the book we shall have little more to say. Nevertheless it is worth noting briefly some properties that have been developed to describe how particular models and/or datasets may deviate from the simplest underlying structure, meaning most commonly the ‘complete randomness’ of a Poisson process. Some of the earliest characteristics to be studied relate to nearest-neighbour distances, which have long been used to assist both in estimating areal densities and in classifying cluster properties. Indeed, they relate to some of the earliest applications of point process ideas in forestry, ecology, and elsewhere [see, e.g., Matérn (1960) and Warren (1962, 1971)].
The functions most frequently used relate to stationary point processes (Definition 12.1.II), often called homogeneous point processes in purely spatial contexts, in which case descriptions in terms of the Palm probability measure P0 can also be used. Homogeneity in space is a major simplifying factor in conceptual models, but is a rare phenomenon in the real world, so that from a practical point of view, a rudimentary understanding of the behaviour of characteristics to be expected when the true model departs in different ways from homogeneity is also important. The first such quantity we consider is a particular example of the avoidance function or avoidance probability [equations (2.3.1), (9.2.11) and Example 5.4(a)], namely, 1 − F (r) = P{N (Sr ) = 0}, where in this section we mostly write Sr = Sr (0) for the circle (or in Rd , the sphere) of radius r with centre at the origin. When N is stationary, the function F (r) is also known as the spherical contact distribution [denoted Hs (r) in SKM (1995, pp. 72, 80)], or empty space function, because F (r) = P{N (Sr ) > 0} is the probability that a sphere of radius r makes contact with a point of N . It is also the distribution function of the distance from the origin to the point x∗ (N ) of (13.3.7).
The other function which plays a central role in the context where N is stationary is the nearest-neighbour function itself,

G(r) = P₀{N(Sr \ {0}) > 0},

denoted D(r) in SKM (1995). It is the distribution function of the distance from an arbitrary point of the process, selected as origin, to the nearest other point of the process, or equivalently the Palm probability version of the spherical contact distribution. Its form for the Neyman–Scott process is given in Exercise 6.3.10 [see also Example 15.1(a) and Exercise 15.1.3]. The ratio of the two survivor functions,

J(r) = (1 − G(r))/(1 − F(r))   if F(r) < 1,
J(r) = 1                       if F(r) = 1,      (15.1.1)

is an indicator, relative to a Poisson process for which J(r) = 1 (all r), of clustering (when < 1) or ‘regularity’ or inhibition (when > 1) at varying distances r. The notation J(·) follows van Lieshout and Baddeley (1996); Diggle (1983) uses q∗(·) for the same function in the setting of a Poisson cluster process as in Example 15.1(a), but the concept dates at least to Warren (1971). Note too that for F(r) < 1, (15.1.1) has the alternative expression
J(r) = P₀{N(Sr \ {0}) = 0} / P{N(Sr) = 0}.      (15.1.1′)

The similar ratios

P₀{N(Sr \ {0}) = k} / P{N(Sr) = k}   and   P₀{N(Sr \ {0}) ≤ k} / P{N(Sr) ≤ k}      (15.1.2)

are both identically 1 in r > 0 for every k in any space R^d when N is Poisson; deviations from 1 for different k may indicate more detail than J(·) concerning clustering or inhibition. More generally, as in SKM (1995, Section 4.1), take a convex compact set B ∋ 0 and define the contact distribution function HB by HB(r) = P{N(rB) > 0}, irrespective of P being stationary or not. We consider these contact distributions only for the spherical case underlying J(r). In assessing the behaviour of empirical estimates of the J-function, it is important to bear in mind the possibility of non-stationarity, where the value may depend on the spatial origin. As a general rule, explicit expressions for the J-function are rather difficult to obtain. The first example below summarizes the more tractable results for Poisson cluster processes. Mase (1986, 1990) reviews difficulties in approximating Gibbs processes. Baddeley et al. (2005) and van Lieshout (2006a) examine a range of further results and examples.
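As a concrete illustration, the F-, G-, and J-statistics can be estimated by simulation. The following sketch (all parameters are illustrative choices, and the unit torus is used so that edge corrections can be ignored) estimates them for a homogeneous Poisson process, for which J(r) = 1 for all r:

```python
import numpy as np

# Monte Carlo sketch of the F-, G- and J-functions for a homogeneous
# Poisson process, simulated on the unit torus to sidestep edge effects.
# The intensity lam and the evaluation radius are illustrative choices.
rng = np.random.default_rng(0)
lam = 100.0

def torus_dist(a, b):
    # pairwise distances on the unit torus (wrap-around metric)
    d = np.abs(a[:, None, :] - b[None, :, :])
    d = np.minimum(d, 1.0 - d)
    return np.sqrt((d ** 2).sum(-1))

n = rng.poisson(lam)
pts = rng.random((n, 2))

# F(r): empty-space function, estimated from a grid of test locations
g = np.linspace(0.0, 1.0, 50, endpoint=False)
gx, gy = np.meshgrid(g, g)
grid = np.stack([gx.ravel(), gy.ravel()], axis=-1)
d_grid = torus_dist(grid, pts).min(axis=1)

# G(r): nearest-neighbour function (distance to nearest *other* point)
d_pp = torus_dist(pts, pts)
np.fill_diagonal(d_pp, np.inf)
d_nn = d_pp.min(axis=1)

def F_hat(r): return (d_grid <= r).mean()
def G_hat(r): return (d_nn <= r).mean()
def J_hat(r): return (1.0 - G_hat(r)) / (1.0 - F_hat(r))

# For a Poisson process J(r) = 1 for all r; the estimate should hover near 1.
r0 = 0.05
print(F_hat(r0), G_hat(r0), J_hat(r0))
```

With clustering, Ĝ would rise faster than F̂ and the estimated J would fall below 1; with inhibition the opposite occurs.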
Example 15.1(a) Two-dimensional stationary Poisson cluster process; centre-satellite process [see Example 6.3(a) and Exercise 6.3.10]. Equation (6.3.14) of Proposition 6.3.III gives the empty space function for a general Poisson cluster process in a form which here reduces to

1 − F(r) = exp( − ∫_R² pSr(y) μc dy ),      (15.1.3)
where µc is the intensity for the Poisson process of cluster centres (here assumed stationary) and pSr (y) is the probability that a cluster with centre at y has no members within Sr . To find the G-function, suppose given a point of the process at the origin, and consider separately the distance to the nearest point from the same cluster, and to the nearest point from a different cluster. For any given cluster structure, there will be a well-defined distribution function tail, Qcl (r) say, for the probability that within a distance r of some given point of a cluster there is no other point of the same cluster. The distance to the nearest point in a different cluster, however, has the same distribution F (r) as in (15.1.3). This implies [Diggle (1983, equation (4.6.5))] that 1 − G(r) = Qcl (r) [1 − F (r)],
(15.1.4)
and hence that J(r) = Qcl(r). Thus, for a stationary Poisson cluster process, J(r) is equal to the probability that no two points from the same cluster lie within a distance r of each other, and therefore satisfies 1 ≥ J(r) ↓ as r ↑. Under more specific assumptions, the functions pSr(y) and Qcl(r) can be evaluated explicitly. For example, for a Neyman–Scott process with isotropic normal distributions about the cluster centre, the function pSr(y) is evaluated in Exercise 6.3.10. If qj denotes the probability that a cluster contains j points, and mcl ≡ Σ_{j≥1} j qj < ∞, then Qcl(·) is evaluated over a cluster with j points with probability j qj/mcl; see Exercise 15.1.3 and Warren (1971). As a further instructive example, albeit special, consider a stationary Poisson cluster process whose clusters consist of exactly two points, one at the cluster centre and the other uniformly distributed on a circle of radius R around the cluster centre. Then it is impossible for two points from the same cluster to lie within a circle of radius less than ½R, and certain that they will do so for some circle of any larger radius. Because the clusters consist of exactly two points, it follows immediately that Qcl(r) = 1 or 0 according as r < or ≥ R. For circles of radius < ½R, the process looks like a Poisson process. Some further details are given in Exercise 15.1.5. This example can be adapted to furnish counterexamples in the case of anisotropy (cf. the next section).

When the point process is stationary, we can find expressions for the F- and G-functions in terms of the local Janossy measures of Definition 5.4.IV. Thus, the avoidance probability 1 − F(x) is exactly the local Janossy probability J₀(∅ | Sx(0)),
and this allows F(x) to be expressed in terms of the factorial moment densities (when these exist) as in (5.4.14):

1 − F(x) = J₀(∅ | Sx(0)) = Σ_{k=0}^∞ ((−1)^k / k!) ∫_{Sx} ··· ∫_{Sx} m[k](y1, . . . , yk) dy1 . . . dyk
         = Σ_{k=0}^∞ ((−1)^k / k!) M[k][(Sx)^{(k)}].      (15.1.5)
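For a homogeneous Poisson process at rate λ in the plane, M[k][(Sx)^{(k)}] = (λπx²)^k, so the alternating series in (15.1.5) should sum to the avoidance probability e^{−λπx²}. A quick numerical check (parameter values illustrative):

```python
import math

# Check of the inclusion-exclusion expansion (15.1.5) for a homogeneous
# planar Poisson process: M_[k][(S_x)^(k)] = (lam * pi * x**2)**k, so the
# alternating series should sum to the avoidance probability exp(-lam*pi*x**2).
lam, x = 2.0, 0.5
a = lam * math.pi * x ** 2          # expected number of points in S_x

series = sum((-1) ** k * a ** k / math.factorial(k) for k in range(60))
closed_form = math.exp(-a)          # avoidance probability 1 - F(x)
print(series, closed_form)
```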
A similar expansion holds for the nearest-neighbour function G(x) in terms of the empty space function and moment densities of the Palm distribution; note that such densities exclude the point at the origin. For a stationary process, this reduces to an expansion in terms of the reduced moment densities of the original process (cf. Proposition 13.2.VI). Thus for G(x), (15.1.5) continues to hold with M̊[k][(Sx)^{(k)}] = m⁻¹ M̆[k+1][(Sx)^{(k)}], where m is the mean rate, in place of M[k][(Sx)^{(k)}]. A careful discussion of these and related expansions is given in van Lieshout (2006b), relating the moment measures for the Palm process to the Papangelou intensities through the Georgii–Nguyen–Zessin formula; see also the discussion of these topics in Section 15.5.

The F-, G-, and J-functions can be extended to MPPs provided due care is taken to specify the marks of the points appearing in the definitions. The situation here is analogous to that encountered in defining the Palm distributions P(0,κ) of an MPP, where it is necessary to specify the mark κ of the point at the origin. Thus, in specifying the empty space function for an MPP we need to distinguish between the empty space function for the ground process, Fg(x) say, which determines the distance from an arbitrary origin to any point of the process, regardless of its mark, and the more general family of functions FB(x) determining the distance from such an origin to the first point with mark in the subset B ∈ BK.
For nearest-neighbour distances there are in principle four different options to consider: the distance from a point of the process with arbitrary mark to the nearest point with arbitrary mark (giving the nearest-neighbour distribution function Gg(x) for the ground process); the distance from a point with arbitrary mark at the origin to the nearest neighbour with mark in a specified set B [giving the distribution G(g,B)(x), say]; the distance from a point at the origin with specified mark κ to the nearest point of the process regardless of its mark [giving G(κ,g)(x) say]; and the distance from a point with mark κ at the origin to the nearest point with mark in the subset B ∈ BK [giving G(κ,B)(x)]. The next example examines these options for the simplest case of an MPP with independent marks.

Example 15.1(b) Processes with independent marks [see Definition 6.4.III]. Since the mark on the point at the origin is independent of the marks and locations of all further points, it has no effect on the nearest-neighbour distances. Thus, for independent marks we find

G(κ,g)(x) = Gg(x),      J(κ,g)(x) = Jg(x),
and similarly

G(κ,B)(x) = G(g,B)(x),      J(κ,B)(x) = J(g,B)(x).
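The first pair of identities can be checked by simulation: with marks assigned independently of locations, nearest-neighbour distances measured from points of one mark class are statistically indistinguishable from those measured from another. A minimal sketch (Poisson ground process and binary marks; all parameters illustrative):

```python
import numpy as np

# Independent marks do not affect nearest-neighbour distances:
# G_(kappa,g) = G_g.  Simulate a Poisson ground process on the unit torus,
# attach i.i.d. binary marks, and compare NN distances from each mark class.
rng = np.random.default_rng(1)
n = rng.poisson(500.0)
pts = rng.random((n, 2))
marks = rng.random(n) < 0.5          # i.i.d. marks, independent of locations

d = np.abs(pts[:, None, :] - pts[None, :, :])
d = np.minimum(d, 1.0 - d)           # torus metric, to avoid edge effects
d = np.sqrt((d ** 2).sum(-1))
np.fill_diagonal(d, np.inf)
nn = d.min(axis=1)                   # distance to nearest other point (any mark)

mean0 = nn[~marks].mean()            # reference points of mark class 0
mean1 = nn[marks].mean()             # reference points of mark class 1
print(mean0, mean1)
```

The two empirical means differ only by sampling noise, as the identity G(κ,g) = Gg predicts.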
The nearest distance from the origin to a point within the mark set B here corresponds to the same distance for a point process obtained from the original by independent thinnings (Section 11.2) with thinning probability p = ∫_B F(dκ), where F is the mark distribution. The effect is to multiply the kth factorial moment density in (15.1.5) by the factor p^k, so that we obtain

1 − FB(x) = Σ_{k=0}^∞ ((−p)^k / k!) ∫_{Sx} ··· ∫_{Sx} mg[k](y1, . . . , yk) dy1 . . . dyk
          = Σ_{k=0}^∞ ((−p)^k / k!) Mg[k][(Sx)^{(k)}],
where the moment densities refer to those of the ground process. A similar modification occurs for the corresponding G-function, irrespective of the mark at the origin.

Practical estimation of the F- and G-functions raises the usual problems of allowing for edge effects and possible biases arising from nonhomogeneity. Ripley (1988) and Stoyan and Stoyan (1994) are among the several texts which examine such problems in depth. Here we mention only the edge correction for estimates of the nearest-neighbour distribution proposed in Hanisch (1984). This has the advantage of preserving the monotonicity of the estimate as a function of r. It replaces the naïve estimate

Ĝ(r) = (1/N(W)) Σ_{k=1}^{N(W)} I{N[Sr(xk) \ {xk}] ≠ 0}

with the form

ĜH(r) = (ℓ(W)/N(W)) Σ_{k=1}^{N(W)} I{N[(Sr(xk) \ {xk}) ∩ W] ≠ 0} / ℓ(W^{−d(xk,∂W)}),      (15.1.6)

where d(x, ∂W) is the distance from the point x to the boundary ∂W of the observation region W, ℓ(·) denotes Lebesgue measure, and A^{−ε} = {x ∈ A: ρ(x, A^c) > ε} = ((A^c)^ε)^c denotes the ε-interior of A [cf. the ε-halo set at (A2.2.2); A^{−ε} is defined for convex A below (12.2.12)]. The interpretation is that when a point xk is too close to the boundary of W for the ball Sr(xk) to be wholly contained in W, the count from Sr(xk) ∩ W is inflated by the weight factor ℓ(W)/ℓ(W^{−d(xk,∂W)}). Monotonicity and unbiasedness properties are outlined in Exercise 15.1.8.

The other quantities we briefly mention in this section are the distributions of point-to-point distances whose properties are summarized in the moment measures, especially the second-order or two-point moment measure. Again we assume stationarity, so that the quantities of principal importance are the
reduced moment measures of Section 12.6, or their equivalent representations as moment measures of one order lower for the Palm distributions, as in Proposition 13.2.VI. For spatial processes these reduced moment measures are functions of vector differences u = x1 − x2, and so can be represented in terms of polar or spherical polar coordinates. Thus, for example, if u = (r, θ), M̆2(du) can be factorized (disintegrated) into a marginal measure K̆2(dr) and a family of conditional distributions Γ(dθ | r) describing the distribution of the angle θ for given r. Decompositions of this kind are examined in more detail in the next section. Here we note that Ripley’s K-function at (8.1.21), namely,

K(r) = (1/m²) K̆2(r) = (1/m²) M̆2(Sr \ {0}) = (1/m) M̊1(Sr \ {0}),      (15.1.7)

is widely used alongside the F-, G-, and J-functions as a useful descriptive characteristic of spatial point patterns. It measures the rate of growth of the reduced second moment measure with distance r from the origin, and can be defined for both isotropic and anisotropic processes, although the former is often assumed. The behaviour of K̆2(r) for the isotropic Neyman–Scott process referred to in Example 15.1(a) is described in Example 8.1(b); see also Exercise 15.1.2.
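A sketch of the naïve and Hanisch-corrected nearest-neighbour estimators for a square window follows. The intensity and radii are illustrative, and the weight ℓ(W)/ℓ(W^{−d}) uses the fact that the ε-interior of the unit square is the square [ε, 1−ε]²; this is an illustrative reading of (15.1.6), not a definitive implementation, but it exhibits the monotonicity-preserving property described above:

```python
import numpy as np

# Sketch of the naive and Hanisch edge-corrected estimators of the
# nearest-neighbour distribution G on the unit square W = [0,1]^2.
# For a square window, W^{-d} = [d, 1-d]^2 has area (1-2d)^2.
rng = np.random.default_rng(2)
n = rng.poisson(200.0)
pts = rng.random((n, 2))

diff = pts[:, None, :] - pts[None, :, :]
d = np.sqrt((diff ** 2).sum(-1))
np.fill_diagonal(d, np.inf)
nn = d.min(axis=1)                              # nearest neighbour inside W
bd = np.minimum(pts, 1.0 - pts).min(axis=1)     # distance to boundary of W

def G_naive(r):
    return (nn <= r).mean()

def G_hanisch(r):
    # weight factor ell(W)/ell(W^{-d(x_k, dW)}) for the unit square
    w = 1.0 / np.maximum(1.0 - 2.0 * bd, 1e-12) ** 2
    return (w * (nn <= r)).mean()

# the weights do not depend on r, so G_hanisch is monotone in r
rs = np.linspace(0.0, 0.1, 21)
gh = np.array([G_hanisch(r) for r in rs])
print(gh[-1], G_naive(0.1))
```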
Exercises and Complements to Section 15.1

15.1.1 The Slivnyak–Mecke Theorem 13.1.VII implies that for a Poisson process, J(r) = 1 for all r. Investigate whether there are analogues of the constructions in Exercises 2.3.1 and 4.5.12 showing that there exist non-Poisson processes with this property.

15.1.2 Divide R² into unit squares. Independently to each square allocate N points with the common distribution Pr{N = j} = 1/10, 8/9, and 1/90 for j = 0, 1, 10, respectively, and for each square distribute its N points uniformly over the square. Check that for any Borel set A, the first two moment measures M(A) and M2(A) for the process are the same as for a Poisson process at unit rate, and hence conclude that Ripley’s K-function is the same for both processes. [Hint: Baddeley and Silverman (1984) give this example with a plot of a realization, remarking that the plot is visually quite different from that of a Poisson process. They reference several other similar counterexamples.]

15.1.3 Consider a stationary Poisson cluster process in R² as in Example 15.1(a) when a typical cluster is as in a Neyman–Scott process, consisting of j points with probability qj such that mcl ≡ Σ_{j≥0} j qj < ∞, each point being i.i.d. about the cluster centre with a radially symmetric distribution for which Pr{point lies within distance r of centre} = ∫_0^r u f(u) du. Show that

Qcl(r) = Σ_{k=1}^∞ (k qk / mcl) ∫_0^∞ [1 − P(r | y)]^{k−1} y f(y) dy,
where

P(r | y) = ∫_0^{r−y} f(x) dx + ∫_{r−y}^{r+y} f(x) lr(x, y) dx      (r > y),
P(r | y) = ∫_{y−r}^{y+r} f(x) lr(x, y) dx                          (r < y),
denotes the probability that a particular point of the cluster lies within a circle of radius r and centre at distance y from the cluster centre, and lr(x, y) is the length of the segment of a circle of radius x and centre 0 intersected by a circle of radius r and centre at distance y from 0. For the spatial Neyman–Scott process of Exercise 6.3.10, conclude that the function Qcl(·) of (15.1.4) can be represented in the form

Qcl(r) = Σ_{k=1}^∞ (k qk / mcl) ∫_0^∞ e^{−x²/2} x [1 − P(r | x)]^{k−1} dx.
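Returning to Exercise 15.1.2, the moment-matching claim can be checked numerically. The sketch below samples the cell-count distribution directly (sample size an illustrative choice) and compares its first two factorial moments with the unit-rate Poisson values E[N] = E[N(N−1)] = 1:

```python
import numpy as np

# Numerical check for the Baddeley-Silverman cell process of Exercise 15.1.2:
# per unit square, N = 0, 1 or 10 with probabilities 1/10, 8/9, 1/90.
# The first two factorial moments then agree with a unit-rate Poisson:
# E[N] = 1 and E[N(N-1)] = 1.
rng = np.random.default_rng(3)
vals = np.array([0, 1, 10])
probs = np.array([1/10, 8/9, 1/90])

# exact moments of the cell-count distribution
mean_exact = (vals * probs).sum()
fact2_exact = (vals * (vals - 1) * probs).sum()

# and a simulation over many independent cells
N = rng.choice(vals, size=100_000, p=probs)
mean_sim = N.mean()
fact2_sim = (N * (N - 1)).mean()
print(mean_exact, fact2_exact, mean_sim, fact2_sim)
```

Higher moments, of course, differ, which is why realizations look so unlike a Poisson pattern.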
15.1.4 Show that for the Neyman–Scott process of Example 6.3(a), the function r2(x) in (8.1.19), a standardized conditional intensity for a point at x given a point at the origin, is given by

r2(x) = 1 + (m[2]/m) ∫_X f(x + u) f(u) du

when the distribution of cluster points from the cluster centre has density function f(·) and m, m[2] are the first two factorial moments of the cluster size distribution [cf. also (6.3.19)].

15.1.5 Consider the particular Poisson cluster process N with two-point clusters described at the end of Example 15.1(a), setting R = 1. Regard N = Nc + Ns as the superposition of two stationary dependent Poisson processes, each at rate μc, Nc consisting of the cluster centres and Ns of the points of the clusters at unit distance and random orientation relative to the centres.
(a) Show that the empty space function F(r) is given by

1 − F(r) = Pr{Nc(Sr) = 0, Ns(Sr) = 0} = Pr{Nc(Sr) = 0} Pr{Ns(Sr) = 0 | Nc(Sr) = 0}.

The first term in the product equals e^{−μc πr²}. The other term is the same for r ≤ ½ but for larger r we must evaluate Pr{no cluster centre outside Sr has a component point inside Sr}. Use standard Poisson process arguments to exclude points inside Sr from centres at distance y from the centre of Sr, with r < y < r + 1, to conclude that
1 − F(r) = e^{−2μc πr²}      (0 < r ≤ ½),
1 − F(r) = e^{−μc πr²} exp( −μc ∫_r^{r+1} [2 arccos((1 + y² − r²)/(2y)) / 2π] 2πy dy )      (½ < r).
As a check, evaluate the latter case in the limit r ↓ ½.
(b) Show that the first and second moment measures for this process are given by

Pr{N(dx1) = 1} = 2μc ℓ(dx1),
Pr{N(dx1) = 1, N(dx2) = 1} = 4μc² ℓ(dx1) ℓ(dx2)                if |x1 − x2| ≠ 1,
Pr{N(dx1) = 1, N(dx2) = 1} = 2μc ℓ(dx1) I{|x1−x2|=1} dθ/2π     otherwise,

and in the latter case, x2 is expressed in polar coordinates relative to x1.
(c) Evaluate the ratios at (15.1.2) for k = 1, 2, . . . and compare with J(r). [Remark: Since the clusters have two points, this is a Gauss–Poisson process.]
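The two-point cluster process of Exercise 15.1.5 is easy to simulate, and two of its features can be verified directly: every point has its cluster partner at distance exactly R, so that G(r) = 1 for r ≥ R; and for small r the empty space function takes the Poisson form e^{−2μcπr²}. A sketch (torus size and intensity are illustrative choices):

```python
import numpy as np

# Simulation sketch of the two-point cluster (Gauss-Poisson) process of
# Exercise 15.1.5: cluster centres form a Poisson process, and each centre
# carries one satellite at distance exactly R = 1 in a uniform direction.
# Window: a 10 x 10 torus, so that edge effects can be ignored.
rng = np.random.default_rng(4)
L, R, mu_c = 10.0, 1.0, 0.1

n_c = rng.poisson(mu_c * L * L)
centres = rng.random((n_c, 2)) * L
phi = rng.random(n_c) * 2 * np.pi
sats = (centres + R * np.stack([np.cos(phi), np.sin(phi)], -1)) % L
pts = np.vstack([centres, sats])

def tdist(a, b):
    d = np.abs(a[:, None, :] - b[None, :, :])
    d = np.minimum(d, L - d)
    return np.sqrt((d ** 2).sum(-1))

# every point has its cluster partner at distance exactly R, so the
# nearest-neighbour distance never exceeds R: G(r) = 1 for r >= R
d = tdist(pts, pts)
np.fill_diagonal(d, np.inf)
assert d.min(axis=1).max() <= R + 1e-9

# empty-space function for r <= R/2: 1 - F(r) = exp(-2 mu_c pi r^2)
r = 0.3
g = np.linspace(0.0, L, 50, endpoint=False)
gx, gy = np.meshgrid(g, g)
grid = np.stack([gx.ravel(), gy.ravel()], axis=-1)
one_minus_F = (tdist(grid, pts).min(axis=1) > r).mean()
print(one_minus_F, np.exp(-2 * mu_c * np.pi * r ** 2))
```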
15.1.6 For a stationary point process in R^d, express the survivor function 1 − G(·) of (15.1.1) as

1 − G(r) = lim_{δ↓0} [F̄δ(r) − F̄(r)] / F(δ),

where F(r) = Pr{N(Sr) > 0}, F̄δ(r) = Pr{N(Sr \ Sδ) = 0}, and F̄(r) = 1 − F(r) [cf. Chapter 3, Ambartzumian (1972), Paloheimo (1971)].

15.1.7 Consider Matérn’s Model I in R² [see Example 8.1(c) and Exercise 8.1.8]. Show that for 0 < r < R, G(r) = 0 and hence J(r) > 1 for such r. [The same holds true for any hard-core model as described in Example 5.3(c).]

15.1.8 Hanisch-type edge corrections [Hanisch (1984)] are of the form

M̊(A) = Σ_{xk ∈ W} N[(xk + A) ∩ W] / ℓ(W^{−d(xk,∂W)}),
where d(x, ∂W) is the distance from the point x to the boundary of the observation region W, A is a test set, and the function being estimated is the first-order moment of the Palm distribution.
(a) Show that for the edge-corrected estimate (15.1.6) of the nearest-neighbour distribution, and assuming a simple, stationary, homogeneous process with finite intensity λ (= the mean rate for a simple process),
(i) for r1 < r2, ĜH(r1) ≤ ĜH(r2); and
(ii) E( Σ_{k=1}^{N(W)} I{N[(Sr(xk) \ {xk}) ∩ W] ≠ 0} / ℓ(W^{−d(xk,∂W)}) ) = λG(r).
[Hint: For (ii), write the left-hand side as the expected value of an integral against N and use the basic formula (13.3.2) for the Palm distribution.]
(b) Define analogous edge-corrected estimates of the marked versions Gg(r), G(g,B)(r) defined above Example 15.1(b), and show that they have similar monotonicity and unbiasedness properties. [Hint: van Lieshout (2006b).]
(c) Investigate estimates of the same type for other functionals of the Palm process.
15.2. Directional Properties and Isotropy

Consider first a point process in R² whose probability structure is invariant under rotations about a fixed point in the plane. Such a process may represent, for example, the distribution of seedlings about a parent plant or of animals or other organisms about a nest or point of release (Byth, 1981). It is natural in such a case to take the fixed point as origin and to represent the points on the plane in terms of polar coordinates (r, θ), with 0 ≤ r < ∞ and 0 < θ ≤ 2π. By omitting the origin, which plays a special role, the state space can be represented as a product R₀⁺ × S, where R₀⁺ = (0, ∞) and S denotes both the circle group and its representation as (0, 2π]. Assuming that a.s. there are no points at the origin (and, we hasten to add, there is little difficulty in incorporating the contribution of an atom at
the origin if so desired), we have a process with the same kind of structure as a stationary MPP in time. The distance from the origin constitutes the mark and the angular distance θ from a fixed reference axis corresponds to the time coordinate. The factorization Lemma A2.7.II applies in a similar fashion and leads to the following representation of the first and second moment measures, analogous to Proposition 8.3.II; the proof is left to Exercise 15.2.1.

Proposition 15.2.I. Let N(·) be a point process in the plane R², invariant under rotations about the origin with N({0}) = 0 a.s., and having boundedly finite first and second moment measures. Then the first and second factorial moment measures have the respective factorizations

M1(dr × dθ) = K1(dr) dθ/2π,
M[2](dr1 × dr2 × dθ1 × dθ2) = M̆[2](dr1 × dr2 × dφ) dθ1/2π,

where φ ≡ θ2 − θ1 (mod 2π); these factorizations correspond to the integral relations, valid for bounded measurable h(·) with bounded support, namely,

E( ∫_{R⁺×S} h(r, θ) N(dr × dθ) ) = ∫_0^{2π} (dθ/2π) ∫_0^∞ h(r, θ) K1(dr),

∫_{(R⁺×S)^{(2)}} h(r1, r2, θ1, θ2) M[2](dr1 × dr2 × dθ1 × dθ2)
    = ∫_0^{2π} (dθ/2π) ∫_{(R⁺)^{(2)}×S} h(r1, r2, θ, θ + φ) M̆[2](dr1 × dr2 × dφ).
Even without isotropy, for such a ‘centred’ process it is frequently convenient to use polar coordinates and hence to represent the first moment measure (assuming it is boundedly finite) in the form M1(dr × dθ). Writing

K1(r) = E[N(Sr)] = ∫_0^r ∫_0^{2π} M1(ds × dθ)

for the expected number of points within a distance r of the origin, we can then define a directional rose as the Radon–Nikodym derivative
for the expected number of points within a distance r of the origin, we can then define a directional rose as the Radon–Nikodym derivative Γ(dθ | r) = M1 (dr × dθ)/K1 (dr).
(15.2.1)
Observe that Γ, in contrast to K1, is necessarily a probability distribution. In these terms, isotropy embodies two features: the directional rose is uniform over all angles (and equal to 1/2π), and independent of the radius r. When densities exist, we may wish to express M1(·) in Cartesian coordinates rather than polar coordinates. The densities in these two representations are related by

m(x, y) = r⁻¹ k1(r) γ(θ | r),
where r = (x² + y²)^{1/2}, θ = arctan(y/x), k1(r) = dK1(r)/dr, and γ(θ | r) = dΓ(θ | r)/dθ. In the isotropic case, γ(θ | r) = 1/2π. For the second-order analysis, we can introduce a factorial moment measure for the counting process on centred spheres N(Sr) by setting

K[2](dr1 × dr2) = ∫_{S^{(2)}} M[2](dr1 × dr2 × dθ1 × dθ2).

In the isotropic case we can then introduce a second-order directional rose Γ2(dφ | r1, r2) as the Radon–Nikodym derivative

Γ2(dφ | r1, r2) = M̆[2](dr1 × dr2 × dφ)/K[2](dr1 × dr2).      (15.2.2)
K[2](dr1 × dr2) represents the expected number of pairs of points located with one point at distance (r1, r1 + dr1) from the origin and the other at distance (r2, r2 + dr2), and Γ2(dφ | r1, r2) gives the conditional probability distribution of the angular separation φ between the points. The symmetry properties of the second-order moments again lead to Γ2(dφ | r1, r2) = Γ2(2π − dφ | r2, r1), but these quantities are in general different from Γ2(dφ | r2, r1).

Example 15.2(a) Isotropic centred Poisson process. Let N(·) be a Poisson process in R² whose rate parameter has density μ(x, y) = μ(r) for r = (x² + y²)^{1/2}. Then

k1(r) = 2πrμ(r),   k[2](r1, r2) = 4π²r1r2 μ(r1)μ(r2),   γ(θ | r) = 1/2π = γ2(φ | r1, r2).

Isotropy here implies that the angular separation (at 0) between pairs of points is uniformly distributed. It is also easy to verify that the counting process on centred spheres is a Poisson process on R⁺ with areal density 2πrμ(r).

Example 15.2(b) Isotropic centred Gauss–Poisson process. Define a Gauss–Poisson process, of pairs of points in R², by supposing that parent points are located around the origin O according to a Poisson process with density μ(·), as in Example 15.2(a), and that with each such parent point there is associated an offspring or secondary point whose location relative to a parent point on the circle of radius r1 is governed by the probability distribution with density function f(r2, φ | r1), where r2 is the distance of the secondary point from O and φ its angular separation (at O) from the parent point; for isotropy, suppose that this angular separation is independent of the parent’s angular coordinate. The overall intensity k1(r) at distance r from O is then the sum of two components, an intensity 2πrμ(r) of parent points and an intensity of secondary points obtained by averaging over all locations of parent points:

k1(r) = 2πrμ(r) + ∫_0^∞ 2πsμ(s) ds ∫_0^{2π} f(r, φ | s) dφ.
Similarly, for the second-order radial moment measure density we find

k[2](r1, r2) = k1(r1)k1(r2) + ∫_0^{2π} [2πr1μ(r1) f(r2, φ | r1) + 2πr2μ(r2) f(r1, 2π − φ | r2)] dφ.
The first-order directional rose is of course uniform, but not so the second-order rose, which in general depends on the form of the density function f(r2, φ | r1). If the factorization f(r2, φ | r1) = f(r2 | r1)g(φ) holds, then

k[2](r1, r2) = k1(r1)k1(r2) + 2π[r1μ(r1) f(r2 | r1) + r2μ(r2) f(r1 | r2)],
γ2(φ | r1, r2) = p(r1, r2)/2π + q(r1, r2) g(φ),

where p(r1, r2) = k1(r1)k1(r2)/k[2](r1, r2) and q(r1, r2) = 1 − p(r1, r2). Thus, the second-order directional rose is a mixture of two components: it reflects the relative proportions of pairs of points with radii r1, r2 coming from independent point pairs and parent–offspring point pairs, respectively. Note that if any given parent point at distance r1 from the origin has an offspring point with probability p(r1), then it is enough to change f(r2, φ | r1) into a subprobability density function with

∫_{R⁺×S} f(r2, φ | r1) dr2 dφ = p(r1).
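The intensity formula of Example 15.2(a) can be spot-checked by simulation. The sketch below uses an exponential radial density μ(r) = μ0 e^{−r} (an illustrative choice, not from the text), simulating by thinning a dominating homogeneous process and comparing the mean count in a centred disc with K1(r) = ∫_0^r 2πsμ(s) ds:

```python
import numpy as np

# Sketch for Example 15.2(a): an isotropic centred Poisson process with
# radial intensity mu(r) = mu0 * exp(-r), simulated by thinning a homogeneous
# Poisson process on a disc of radius R_max.  The counting process on centred
# circles then has K_1(r) = E N(S_r) = int_0^r 2 pi s mu(s) ds.
rng = np.random.default_rng(5)
mu0, R_max, reps = 5.0, 3.0, 200

def k1_integral(r):
    # int_0^r 2 pi s mu0 exp(-s) ds = 2 pi mu0 (1 - (1 + r) e^{-r})
    return 2 * np.pi * mu0 * (1.0 - (1.0 + r) * np.exp(-r))

counts = []
for _ in range(reps):
    n = rng.poisson(mu0 * np.pi * R_max ** 2)      # dominating homogeneous process
    r = R_max * np.sqrt(rng.random(n))             # radii of uniform points on the disc
    keep = rng.random(n) < np.exp(-r)              # thin with probability mu(r)/mu0
    counts.append((r[keep] <= 1.5).sum())          # N(S_1.5)

emp = np.mean(counts)
theory = k1_integral(1.5)
print(emp, theory)
```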
We turn now to consider those planar point processes that are both stationary and isotropic, so they are invariant under the group of rigid body motions in R². By Proposition 15.2.I the first moment measure is a multiple of Lebesgue measure in R² by virtue of stationarity alone, so the effect of isotropy shows up first on the second moment measure, that is, on pairs of points. The only property of a pair of points that is invariant under rigid body motions is the distance between them, so a natural coordinate transformation of R² × R² to consider is of the form

(x1, y1, x2, y2) → (x1, y1, x1 + r cos θ, y1 + r sin θ),

where 0 ≤ r < ∞ and 0 < θ ≤ 2π, with r the distance between the two points and θ the angle between the directed line joining them and a fixed reference axis. Assuming the point process to be simple and considering just the factorial and cumulant measures, {r = 0} has zero probability and so the second factorial cumulant measure can be represented as a measure on the space R² × R₀⁺ × S, corresponding to the coordinates x, y, r, θ just introduced. Stationarity implies invariance with respect to shifts in the first two coordinates and so yields the usual representation in terms of a reduced factorial measure, which we write in the form M̆[2](dr × dθ).
Without yet assuming isotropy, introduce a radial measure K̆2(dr) and a second-order directional rose Γ2(dθ | r) via

K̆2(dr) = ∫_0^{2π} M̆[2](dr × dθ),      Γ2(dθ | r) = M̆[2](dr × dθ)/K̆2(dr).

The function

K̆2(r) = ∫_0^r K̆2(ds)
can now be interpreted as the expected number of pairs of points separated by a distance r or less and for which the first point of the pair lies within a region of unit area. The directional rose Γ2(dθ | r) then represents the probability that, given the separation is r, the directed line joining the first point to the second makes an angle with the fixed reference axis falling in the interval (θ, θ + dθ). A more natural interpretation of K̆2(·) is in terms of the first moment measure of the Palm measure for the process: it equals the product of the mean density m and the expected number of points in a circle of radius r about the origin given a point at the origin, as noted around (15.1.7) and at (8.1.22) in the discussion of what is there called Ripley’s K-function.

Consider now the implication of isotropy. A rotation through α transforms the angle θ to a new angle θ′ that depends in general on x, y as well as α. Given any θ and θ′, we can find (x, y) and α such that θ is transformed into θ′. Rotational invariance implies therefore that Γ2(dθ | r) must be invariant under arbitrary shifts θ → θ′ and so reduces to the uniform distribution on S. We summarize all this as follows.

Proposition 15.2.II. Let M̆[2](·) denote the reduced second factorial moment measure of a simple stationary point process in the plane. Then M̆[2](·) can be expressed as

M̆[2](dr × dθ) = K̆2(dr) Γ2(dθ | r),      (15.2.3)

corresponding to the integral representation of the second factorial moment measure M[2](·) (for bounded measurable h with bounded support)

∫_{R²×R²} h(x1, y1, x2, y2) M[2](dx1 × dy1 × dx2 × dy2)
    = ∫_{R²} dx dy ∫_{R⁺×S} h(x, y, x + r cos θ, y + r sin θ) K̆2(dr) Γ2(dθ | r),
where K̆2(·) is a boundedly finite measure on R⁺ and, for each r > 0, Γ2(· | r) is a probability measure on S. If the process is also isotropic then

Γ2(dθ | r) = dθ/2π      (all r, θ).
The symmetry properties of M̆[2](·) here in the stationary case imply that Γ2(dθ | r) = Γ2(π + dθ | r), which is the analogue for the representation being used here of the property noted for Γ2(· | r1, r2) below (15.2.2).
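The uniformity of Γ2 under isotropy can be illustrated numerically: for a stationary isotropic process, the orientations of the vectors joining nearby pairs of points should be uniformly distributed. A sketch for a Poisson process on the unit torus (intensity and pair-distance cutoff are illustrative choices):

```python
import numpy as np

# For a stationary isotropic process, Proposition 15.2.II gives
# Gamma_2(d theta | r) = d theta / (2 pi): the orientation of the vector
# joining a pair of points is uniform.  Check on a Poisson process
# simulated on the unit torus.
rng = np.random.default_rng(6)
n = rng.poisson(300.0)
pts = rng.random((n, 2))

diff = pts[:, None, :] - pts[None, :, :]
diff = (diff + 0.5) % 1.0 - 0.5          # signed torus displacement in [-1/2, 1/2)
dist = np.sqrt((diff ** 2).sum(-1))
i, j = np.triu_indices(n, k=1)           # unordered pairs
close = dist[i, j] <= 0.1
theta = np.arctan2(diff[i, j][close, 1], diff[i, j][close, 0])

# uniform orientation (mod pi) makes cos(2 theta) and sin(2 theta) average ~0
c2, s2 = np.cos(2 * theta).mean(), np.sin(2 * theta).mean()
print(len(theta), c2, s2)
```

The doubled angle is used because unordered pairs only determine the orientation modulo π, matching the symmetry Γ2(dθ | r) = Γ2(π + dθ | r) noted above.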
15.3.
Stationary Line Processes in the Plane
471
Exercises and Complements to Section 15.2

15.2.1 Identify R² \ {0} with $\mathbb{R}_0^+ \times S$ and consider mappings that lead to invariance of these component factors. Now apply the factorization Lemma A2.7.II and hence complete the proof of Proposition 15.2.I. [Hint: In part (a) of the proposition identify R² \ {0} with the product $\mathbb{R}_0^+ \times S$ and consider invariance under the actions of S. For the second moment in part (b) use a diagonal factorization of the components of S × S.]

15.2.2 Let N be a point process on state space the surface of a cone with semi-angle α (the extremes α = 0 and ½π correspond to a cylinder and a disc, respectively).
(a) Use $\mathbb{R}_0^+ \times S$ to describe the points by the distance from the apex of the cone and the angle relative to a fixed plane through the axis of the cone subtended by a plane through the point and the axis (i.e., the 'longitude' of a point on the cone). Describe the first and second moment structure of a process invariant under rotations of the cone.
(b) For an alternative parameterization, cut the cone down a straight line from the apex and 'unwrap' it onto a plane, so that it fills the plane apart from a sector of angle 2π − θ, where θ = 2π sin α. Rotations of the cone correspond to rotations in the plane modulo θ, where the two edges of the missing sector are identified. Rephrase the results in (a) in terms of this parameterization. [See Byth (1981), who uses the term θ-stationary process.]

15.2.3 Exercise 8.1.7 gives some results for the isotropic case of a bivariate Neyman–Scott cluster process. Using the notation from there but now for the nonisotropic case, the directional rose has a density γ₂(θ | r) proportional to
$$2\pi m_1 + \frac{m_{[2]}\,\exp\big[-r^2 g(\theta,\Sigma)/4(1-\rho^2)\big]}{4\pi\sigma_1\sigma_2(1-\rho^2)^{1/2}}\,,
\qquad\text{where } g(\theta,\Sigma)=\frac{\cos^2\theta}{\sigma_1^2}-\frac{2\rho\cos\theta\sin\theta}{\sigma_1\sigma_2}+\frac{\sin^2\theta}{\sigma_2^2};$$
that is, γ₂(θ | r) = p(r)/2π + q(r) exp{· · ·}, where p(r), q(r) are nonnegative functions involving a Bessel function arising from the normalizing condition $\int_0^{2\pi}\gamma_2(\theta\mid r)\,d\theta = 1$, and p(r) → 1 as r → ∞.
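The Bessel function in Exercise 15.2.3 enters because g(θ, Σ) is a constant plus a pure second harmonic: g(θ, Σ) = ½(σ₁⁻² + σ₂⁻²) + ½(σ₁⁻² − σ₂⁻²) cos 2θ − (ρ/σ₁σ₂) sin 2θ, so after a phase shift the normalizing integral of the exponential term reduces to ∫₀²π exp(c cos 2θ) dθ = 2π I₀(c). A quick numerical check of this reduction (the amplitude c below is an arbitrary illustrative choice):

```python
import numpy as np

# (1/2pi) * integral over (0, 2pi) of exp(c * cos 2theta) equals I_0(c),
# the modified Bessel function of the first kind and order zero.
c = 0.8
theta = np.linspace(0, 2 * np.pi, 200000, endpoint=False)
avg = np.exp(c * np.cos(2 * theta)).mean()   # Riemann mean over one period
print(avg, np.i0(c))
```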
15.3. Stationary Line Processes in the Plane

Stationary line processes constitute the paradigm for many of the recent developments in stochastic geometry. In particular, Davidson's conjecture that all stationary isotropic line processes are doubly stochastic, and his imaginative early investigations of this question, inspired important further studies by Krickeberg, Papangelou, Kallenberg, and others, and these in turn laid the foundation for recent work on the relations between conditional intensities and Gibbs potentials in the models of statistical physics. Here we give a brief introduction to the properties of line processes, based largely on the early
sections of Harding and Kendall (1974) and the work of Roger Miles. The same circle of ideas is introduced in Stoyan and Mecke (1983, Chapter 7), and more extensively in SKM (1995, Chapters 8–9). It is convenient to characterize a directed line in the plane by its coordinates (p, θ), where θ satisfying 0 < θ ≤ 2π is the angle between the line and a fixed reference direction, and p is the signed perpendicular distance from the line to a fixed origin, being positive when, in looking in the direction of the line, the origin is to its left. We define a random process of directed lines to be a point process on the cylinder R×S, each (random) point on the cylinder being identified with the (random) line in R2 via its specification (p, θ). Thus, by the distribution of a stochastic line process, we mean a probability measure on the point process in R × S. In this text we assume this point process (and hence the line process) to be simple. For example, by a Poisson process of lines in the plane at rate λ, we mean a Poisson process on R × S at rate λ/2π, {(pi , θi )} say, representing directed lines whose directions are i.i.d. uniformly over (0, 2π] and whose signed perpendicular distances from a fixed origin form a Poisson process on R at rate λ [see also Exercise 15.3.1(a)]. Note that, given a planar Poisson process at rate λ and locating through each point a line with direction uniformly distributed over (0, 2π), we obtain infinitely many lines intersecting any unit interval with probability one (cf. Exercise 15.3.2). A process of undirected lines can be treated as a point process on either R+ × S or R × (0, π]: the latter fits more easily into our discussion and follows, for example, Stoyan and Mecke (1983) [see also Exercise 15.3.1(b)]. 
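The definition of the Poisson line process just given is easy to simulate. The sketch below (parameter values are arbitrary choices) generates the points {(pᵢ, θᵢ)} on a finite window of the cylinder and counts the lines meeting a disc of radius ρ centred at the origin; such a line must satisfy |p| < ρ, so the count is Poisson with mean (λ/2π) · 2ρ · 2π = 2λρ.

```python
import numpy as np

rng = np.random.default_rng(1)

lam = 3.0          # rate of the line process; N on R x S has rate lam / 2pi
half_width = 50.0  # simulate signed distances p in [-half_width, half_width)
rho = 2.0          # disc of radius rho centred at the origin

# One realization {(p_i, theta_i)} in the cylinder representation.
n = rng.poisson(lam * 2 * half_width)     # (lam/2pi) * (2 half_width) * (2 pi)
p = rng.uniform(-half_width, half_width, n)
theta = rng.uniform(0, 2 * np.pi, n)      # i.i.d. uniform directions

# A line meets the disc iff its distance from the origin is below rho; the
# direction theta plays no role for a disc centred at the origin.
print((np.abs(p) < rho).sum())

# Averaged over many realizations the hit count should be about 2*lam*rho.
counts = [
    (np.abs(rng.uniform(-half_width, half_width,
                        rng.poisson(lam * 2 * half_width))) < rho).sum()
    for _ in range(2000)
]
mean_hits = np.mean(counts)
print(mean_hits, 2 * lam * rho)
```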
Another representation of a line process is as a point process in R² in which the first coordinate, x say, denotes the intercept of the line on a fixed reference line, and the second equals x cot θ, its intercept on a line orthogonal to the reference line, where θ ∈ (0, π] is the angle that the line makes with the reference line [in terms of (p, θ) these two intercepts are (p sec θ, p cosec θ)]. Clearly, when p = 0 the direction θ is not determined by these two intercepts. However, either representation can be used to describe stationary processes (see below), for which the event p = 0 has probability zero. It will be evident that these two representations of a given line process lead to different distributions on different spaces.

As in Section 15.2 the principal questions we study relate to the effects of stationarity or isotropy on the moment structure of the process. By these conditions we mean of course invariance of the process of lines under translations and rotations in the plane, so our first task is to examine the effect of these motions on the coordinates (p, θ) of a line. Rotation through an angle α about the fixed origin corresponds to rotation of the cylinder through the same angle, (p, θ) → (p, θ + α). Translation of the plane a distance d in a direction making an angle φ with the fixed reference axis induces the transformation
$$(p, \theta) \to \big(p - d\sin(\theta - \varphi),\ \theta\big) \qquad (-\infty < d < \infty,\ 0 < \varphi \le \pi) \tag{15.3.1}$$
on the cylinder, corresponding to a shear whereby points on the cylinder are
displaced parallel to its axis through a distance varying (when d = 1) from −1 at θ = φ + ½π through 0 at θ − φ = 0 (mod π) to +1 at θ = φ − ½π (here, addition of angles is taken modulo 2π).

We start by showing that any Borel measure on the cylinder that is invariant under the action of the shears (but not necessarily under rotations) has the product form ℓ(dp) G(dθ), where ℓ(·) denotes Lebesgue measure on R. This is the first occasion on which we encounter an invariance result that cannot be handled via the factorization Lemma A2.7.II: this is so because shears do not generate translations of the cylinder along its axis. Nevertheless, the result we require is still a simple corollary of the more general theorems about the decomposition of invariant measures, which can be established via the general theory of the disintegration of measures, as for example in Krickeberg (1974b, Theorem 2). For the sake of completeness we sketch a simplified version of the theorem as it applies here.

Lemma 15.3.I. Let M(dp × dθ) be a boundedly finite Borel measure on the cylinder R × S, and let M(·) be invariant with respect to the action of the shears at (15.3.1). With ℓ(·) denoting Lebesgue measure on R, there then exists a totally finite Borel measure G on S such that
$$M(dp \times d\theta) = \ell(dp)\, G(d\theta). \tag{15.3.2}$$
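The action (15.3.1) of plane translations on the cylinder coordinates can be checked numerically. The sketch below adopts one concrete sign convention, p = x · (−sin θ, cos θ) for any point x on the line, chosen so as to reproduce (15.3.1); the text's own convention may differ by a sign, so this is an illustration rather than the book's construction.

```python
import numpy as np

rng = np.random.default_rng(2)

def p_coord(point, theta):
    # Signed distance coordinate of the line through `point` with direction
    # theta, using the normal n = (-sin theta, cos theta): p = point . n.
    return point @ np.array([-np.sin(theta), np.cos(theta)])

theta = rng.uniform(0, 2 * np.pi)   # direction of the line
p = rng.normal()                    # its signed distance coordinate
d = rng.uniform(0, 5)               # translation distance ...
phi = rng.uniform(0, 2 * np.pi)     # ... and direction

# Foot of the perpendicular from the origin, then translate the plane.
foot = p * np.array([-np.sin(theta), np.cos(theta)])
shifted = foot + d * np.array([np.cos(phi), np.sin(phi)])

# Translation leaves theta unchanged and shifts p as in (15.3.1).
p_new = p_coord(shifted, theta)
print(p_new, p - d * np.sin(theta - phi))
```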
Proof. In outline, we find a factorization of M of the form K(dp | θ) G₁(dθ), and then show that K(dp | θ) factorizes as λ(θ) ℓ(dp). To start with, there exists a function f(p), as, for example,
$$f(p) = e^{-|n|}/\max(1, M_n) \qquad (n < p \le n+1,\ n = 0, \pm 1, \ldots),$$
where $M_n = \int_{(n,n+1]\times S} M(dp \times d\theta)$, such that f(p) > 0 for all p and $\int_{R\times S} f(p)\, M(dp \times d\theta) < \infty$. Introduce the measure
$$G_1(d\theta) = \int_{p\in R} f(p)\, M(dp \times d\theta) \qquad (\theta \in S).$$
For all Borel sets A, M(A × dθ) ≪ G₁(dθ), so by appealing to the usual arguments leading to the existence of regular conditional probabilities, we deduce the existence of a kernel K(dp | θ) such that K(A | θ) is a measurable function of θ for each bounded Borel set A, K(· | θ) is a Borel measure on R for G₁-almost all θ, and
$$M(dp \times d\theta) = K(dp \mid \theta)\, G_1(d\theta).$$
Invariance under the action of a given shear (15.3.1), with parameters (d, φ) say, implies that for any bounded measurable h(·) of bounded support,
$$\int_{R\times S} h(p,\theta)\, M(dp \times d\theta) = \int_{R\times S} h(p,\theta)\, K(dp \mid \theta)\, G_1(d\theta)
= \int_{R\times S} h\big(p + d\cos(\theta-\varphi),\, \theta\big)\, K(dp \mid \theta)\, G_1(d\theta)
= \int_{R\times S} h(p,\theta)\, K\big(dp - d\cos(\theta-\varphi) \mid \theta\big)\, G_1(d\theta).$$
Because this is true for all such h, the measure K(· | θ) must coincide with its shifted version K(· − d cos(θ − φ) | θ) for G₁-almost all θ. By choosing two appropriate values of d and φ, we infer that for such θ, the measure K(· | θ) is invariant under the action of two incommensurate shifts. This in turn implies that K(· | θ) is a multiple of Lebesgue measure, λ(θ) ℓ(·) say (see Exercise 15.3.3 for details). Thus, K(A | θ) = ℓ(A) λ(θ), where the left-hand side is a measurable function of θ; hence there is a measurable version λ*(θ) of λ(θ) such that λ*(θ) = λ(θ) for G₁-almost all θ. Setting G(dθ) = λ*(θ) G₁(dθ) proves the lemma.

Corollary 15.3.II. Let a stationary line process have first moment measure M on R × S. Then M factorizes in the form (15.3.2).

Because the measure G is totally finite, it can be normalized to give a first-order directional rose Π(·) on S:
$$\Pi(d\theta) = \frac{G(d\theta)}{G(S)} = \frac{G(d\theta)}{\int_S G(d\theta')}.$$
Π(dθ) may be interpreted as the probability that an arbitrary line has orientation θ, and the total measure m ≡ G(S) has an interpretation as the mean density of the line measure induced by the process. To explain this idea, observe that for any line W and any closed bounded convex set A ⊂ R², there exists a well-defined length ℓ(W ∩ A). Given any configuration {Wᵢ} of lines on the plane, we can introduce a corresponding line measure
$$Z(A) = \sum_i \ell(W_i \cap A).$$
This set function Z(·) is clearly countably additive and extends to a measure on arbitrary Borel sets in the plane. Furthermore, if Wᵢ has coordinates (pᵢ, θᵢ) in the cylinder R × S, then the mapping (pᵢ, θᵢ) → ℓ(Wᵢ ∩ A) is measurable, so that if the {Wᵢ} constitute a realization of a stochastic line process, each Z(A) is a random variable. From Proposition 9.1.VIII it follows that Z(·) is a random measure, which we call the line measure associated with the original line process.

Proposition 15.3.III. Let Z be the line measure associated with a stationary line process W in R². Then Z is a stationary random measure on R², and if W has finite first moment measure M(·), Z has mean density
$$m = G(S) = \int_{(0,1]\times S} M(dp \times d\theta). \tag{15.3.3}$$
Proof. Writing ℓ_A(p, θ) = ℓ(W(p, θ) ∩ A) for a line with coordinates (p, θ) and any bounded Borel set A ⊂ R², we can express
$$Z(A) = \int_{R\times S} \ell_A(p, \theta)\, N(dp \times d\theta)$$
in terms of the point process N on the cylinder. Because ℓ_A ≥ 0, we have
$$E[Z(A)] = \int_{R\times S} \ell_A(p,\theta)\, M(dp \times d\theta) = \int_S G(d\theta) \int_R \ell_A(p,\theta)\, \ell(dp).$$
For fixed θ, the integral over R is simply the area of A evaluated as the integral of its cross-sections perpendicular to the direction θ. Writing ℓ₂ for Lebesgue measure in R², we have
$$E[Z(A)] = \int_S G(d\theta)\, \ell_2(A) = m\,\ell_2(A),$$
establishing (15.3.3). Stationarity of Z(·) follows from the stationarity of the line process defining Z(·).

Given a configuration {Wᵢ}, an alternative to the line measure Z(A) is the number of lines hitting A. This set function is subadditive but not in general additive over sets, and thus not a measure, although for each convex set A it defines a random variable whose distribution and moments can be investigated. If, however, A is itself a line, we obtain the point process on the line formed by its intersections with the lines Wᵢ, which is a random measure.

Proposition 15.3.IV. Let a stationary line process in R² be given with mean density m and directional rose Π(·).
(i) Let V be a fixed line in R² with coordinates (p_V, α), and let N_V(·) be the point process on V generated by its intersections with the line process. Then N_V is a stationary point process on V with mean density m_V given by
$$m_V = m \int_S |\cos(\theta - \alpha)|\, \Pi(d\theta). \tag{15.3.4}$$
If the line process is isotropic, m_V is independent of V with
$$m_V = 2m/\pi \qquad (\text{all } V). \tag{15.3.4$'$}$$
(ii) Let A be a closed bounded convex set in R², and let Y(A) be the number of distinct lines of the line process intersecting A. If the line process is isotropic then
$$E[Y(A)] = mL(A)/\pi, \tag{15.3.5}$$
where L(A) is the length of the perimeter of A.

Proof. For any bounded measurable function h of bounded support in R, we have
$$\int_R h(x)\, N_V(dx) = \int_{R\times S} h\big(x(p,\theta)\big)\, N(dp \times d\theta),$$
where x = x(p, θ) denotes the distance from a fixed origin on V to the point of intersection with the line having coordinates (p, θ), and N(·) refers to the point
process on the cylinder representing the given line process. Because of stationarity, there is no loss of generality in taking the fixed origin on V as the origin for the cylindrical (p, θ) coordinates. Then x(p, θ) = p sec(θ − α), and on taking expectations, we obtain
$$E\bigg[\int_R h(x)\, N_V(dx)\bigg] = \int_{R\times S} h\big(p \sec(\theta-\alpha)\big)\, \ell(dp)\, m\,\Pi(d\theta). \tag{15.3.6}$$
Substituting u = p sec(θ − α), ℓ(dp) = |cos(θ − α)| ℓ(du), from which (15.3.4) follows because N_V, being stationary, has E[N_V(dx)] = m_V ℓ(dx). In the isotropic case, Π(dθ) = dθ/2π and integration at (15.3.4) leads to 2m/π as asserted.

To prove (ii), suppose first that A is a convex polygon. Apply the result of (i) to each side of the polygon in turn, so that adding over all sides shows that the expected number of intersections of lines from the line process with the perimeter of A equals 2mL(A)/π. Convexity implies that each line intersecting the polygon does so exactly twice (except possibly for a set of lines of zero probability), so the factor 2 cancels and (15.3.5) is established for convex polygons. A limiting argument extends the result to any closed bounded convex set A.

Propositions 15.3.III and 15.3.IV can be extended to processes of random hyperplanes and more generally random 'flats' in $\mathbb{R}^d$: see Exercises 15.3.4–5 for some preliminary results and the extensive series of papers by Miles (1969, 1971, 1974). Krickeberg (1974b) sets out a general form of the required theory of moment measures. When further distributional aspects are specified, the results can be sharpened as in the basic example below. For extensions to higher dimensions see Exercise 15.3.6 and the cited papers by Miles.

Example 15.3(a). We define a stationary Poisson process of lines in terms of the associated point process N(·) on the cylinder R × S. For the line process to be stationary in R², the point process N(·) must be invariant under shears of the cylinder, and its first moment measure must decompose as at (15.3.2). But the first moment measure of a Poisson process coincides with its parameter measure µ(·), so
$$\mu(dp \times d\theta) = \mu\, \ell(dp)\, \Pi(d\theta) \tag{15.3.7}$$
must hold for some density µ and probability distribution (here, the directional rose of the line process) Π(·). We can therefore write for the p.g.fl.
of the point process N(·), with suitable functions h(·),
$$G[h] = E\bigg[\exp\bigg(\int_{R\times S} \log h(p,\theta)\, N(dp \times d\theta)\bigg)\bigg]
= \exp\bigg(\mu \int_R \ell(dp) \int_S \big[h(p,\theta) - 1\big]\, \Pi(d\theta)\bigg).$$
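Proposition 15.3.IV(i) and Example 15.3(a) can be checked together by simulation (an illustration with arbitrary parameter choices): counting intersections of an isotropic Poisson line process with a segment of length L on the reference line, using x(p, θ) = p sec θ as in the proof above, should give on average 2mL/π points.

```python
import numpy as np

rng = np.random.default_rng(3)

lam = 2.0    # mean line density m of the isotropic Poisson line process
L = 10.0     # segment [0, L] on the reference line V (alpha = 0)
trials = 1000
counts = np.empty(trials)

for t in range(trials):
    # Lines meeting the segment satisfy |p| = |u cos(theta)| <= L, so a
    # window p in [-L, L] suffices; the rate on the cylinder is lam / 2pi.
    n = rng.poisson(lam / (2 * np.pi) * (2 * L) * (2 * np.pi))
    p = rng.uniform(-L, L, n)
    theta = rng.uniform(0, 2 * np.pi, n)
    u = p / np.cos(theta)       # intersection point x(p, theta) = p sec(theta)
    counts[t] = ((u >= 0) & (u <= L)).sum()

# Mean density of intersections on V should be close to 2 m / pi (15.3.4').
mean_per_unit = counts.mean() / L
print(mean_per_unit, 2 * lam / np.pi)
```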
Thus, for example, the second factorial moment measure is given by
$$M_{[2]}(dp \times dp' \times d\theta \times d\theta') = \mu^2\, \ell(dp)\, \ell(dp')\, \Pi(d\theta)\, \Pi(d\theta') = M(dp \times d\theta)\, M(dp' \times d\theta'),$$
which is of course of the product form expected for a Poisson process.

The p.g.fl. for the point process N_V of intersections of the line process on a fixed line V follows from an extension of the reasoning leading to (15.3.6). In the notation used there,
$$G_{N_V}[h] = E\bigg[\exp\bigg(\int_R \log h(x)\, N_V(dx)\bigg)\bigg]
= E\bigg[\exp\bigg(\int_{R\times S} \log h\big(p\sec(\theta-\alpha)\big)\, N(dp \times d\theta)\bigg)\bigg]
= \exp\bigg(\mu \int_{R\times S} \big[h\big(p\sec(\theta-\alpha)\big) - 1\big]\, \ell(dp)\, \Pi(d\theta)\bigg)
= \exp\bigg(\mu \int_S |\cos(\theta-\alpha)|\, \Pi(d\theta) \int_R \big[h(u) - 1\big]\, du\bigg),$$
which we recognize as the p.g.fl. of a stationary Poisson process on V with density m_V as at (15.3.4). In particular, for a stationary isotropic Poisson line process, the density m_V = 2µ/π is independent of the orientation α of V, and the number of lines crossing a closed convex set A has a Poisson distribution with parameter µL(A)/π, with µ equal to the mean line density.

We now discuss second-order properties of line processes, confining our attention to stationary isotropic processes. Now it is clear that one invariant of a pair of lines, under both translations and rotations of the plane, is the angle φ between them, where 0 < φ ≤ 2π (this allows for directed lines), so we take coordinates in the form
$$(p, \theta, p', \theta') \to (p, \theta, p', \theta + \phi).$$
Invariance under rotations then implies that the second factorial moment measure M₍₂₎ of the point process N representing the stationary isotropic line process in R² factorizes into a product
$$M_{[2]}(dp \times dp' \times d\theta \times d\theta') = \breve{M}_{[2]}(dp \times dp' \times d\phi)\, d\theta/2\pi.$$
To handle the term $\breve{M}_{[2]}$ we proceed much as in Proposition 15.2.II to deduce that
$$\breve{M}_{[2]}(dp \times dp' \times d\phi) = K_{[2]}(dp \times dp' \mid \phi)\, G_{[2]}(d\phi).$$
Invariance of N under shears implies that for almost all φ and at least for (say) rational r and ψ = θ − α,
$$K_{[2]}(dp \times dp' \mid \phi) = K_{[2]}\big((dp - r\cos\psi) \times (dp' - r\cos(\psi + \phi)) \mid \phi\big).$$
Provided φ ≠ 0 or π, the equations r cos ψ = u and r cos(ψ + φ) = v have unique solutions (r, ψ) for all real u and v, and therefore K₍₂₎(dp × dp′ | φ) is invariant under at least a countable dense family of translations (p, p′) → (p + u, p′ + v), including incommensurate pairs (u, v), (u′, v′). For such values of φ then, K₍₂₎(dp × dp′ | φ) reduces to a multiple of Lebesgue measure ℓ₂ in R², λ(φ) ℓ₂(dp × dp′) say, where as earlier we can take λ(φ) to be a measurable function of φ.

The exceptional cases φ = 0 and π correspond to the occurrence of pairs of parallel and antiparallel lines, respectively. In both cases, the signed distance y between the lines is a further invariant of the motion. In the first case, invariance under translation implies that
$$K_{[2]}(dp \times dp' \mid 0) = K_{[2]}\big((dp + r\cos(\theta-\alpha)) \times (dp' + r\cos(\theta-\alpha)) \mid 0\big),$$
so that setting p′ = p + y, the measure factorizes into the form
$$K_{[2]}(dp \times dp' \mid 0) = K_+(dy)\, \ell(dp)$$
for some boundedly finite measure K₊ on R. Similarly, for φ = π, the measure K₍₂₎(dp × dp′ | π) is invariant under the transformations
$$(p, p') \to \big(p + r\cos(\theta-\alpha),\ p' - r\cos(\theta-\alpha)\big),$$
so that setting now p′ = y − p, where y is the distance between parallel lines oriented in opposite senses,
$$K_{[2]}(dp \times dp' \mid \pi) = K_-(dy)\, \ell(dp)$$
for boundedly finite K₋. Finally, because M₍₂₎ is symmetric under the transformation
$$(p, p', \theta, \theta') \to (p', p, \theta', \theta),$$
all three of G₍₂₎, K₊ and K₋ are symmetric under reflection in their respective origins. We have proved the following result.

Proposition 15.3.V [Davidson (1974a), Krickeberg (1974a, b)]. Let M₍₂₎ be the second factorial moment measure of a stationary isotropic line process in R². Then M₍₂₎ admits a representation in terms of the factors:
(i) a totally finite symmetric measure G₍₂₎(dφ) on (0, π) ∪ (π, 2π) governing the intensity of pairs of lines intersecting at an angle φ;
(ii) a boundedly finite symmetric measure K₊(dy) on R governing the intensity of pairs of parallel lines a distance y apart; and
(iii) a similar measure K₋(dy) governing the intensity of pairs of antiparallel lines a distance y apart.
The representation is realized by the integral relation, valid for bounded measurable nonnegative functions h of bounded support on R × R × S × S = R² × S²,
$$\int_{R^2\times S^2} h(p, p', \theta, \theta')\, M_{[2]}(dp \times dp' \times d\theta \times d\theta')
= \int_{(0,\pi)\cup(\pi,2\pi)} G_{[2]}(d\phi) \int_{R^2\times S} h(p, p', \theta, \theta + \phi)\, dp\, dp'\, d\theta
+ \int_R K_+(dy) \int_{R\times S} h(p, p + y, \theta, \theta)\, dp\, d\theta
+ \int_R K_-(dy) \int_{R\times S} h(p, y - p, \theta, \theta + \pi)\, dp\, d\theta. \tag{15.3.8}$$
A similar representation holds for the factorial covariance measure.

As Rollo Davidson (1974a) showed, many remarkable corollaries follow from the representation (15.3.8).

Corollary 15.3.VI. With probability 1, a stationary isotropic line process in R² either has no pairs of parallel or antiparallel lines, or has infinitely many pairs of parallel lines, or has infinitely many pairs of antiparallel lines, according as K₊(R) = K₋(R) = 0, or K₊(R) > 0, or K₋(R) > 0, respectively.

Proof. Let A be a bounded Borel set in R and V a fixed line in the plane. Then by the preceding discussion and (15.3.4′) we can interpret 2K₊(A)/π as the mean density of the stationary point process on V generated by its intersections with those lines of the process which have other lines of the process parallel to them and at separation y ∈ A. If K₊(R) = 0, any such process of intersections has zero mean density and is therefore a.s. empty. Letting A ↑ R, we deduce that with probability 1 no line of the process has another line of the process parallel to it. Conversely, if K₊(R) > 0, we can find a bounded Borel set A with K₊(A) > 0 and a line V such that the process of associated points on V is stationary with positive mean density and therefore has an infinite number of points (see Proposition 12.1.VI). The argument concerning antiparallel lines is similar.

Corollary 15.3.VII. M₍₂₎ is invariant under reflections if and only if the process has a.s. no pairs of antiparallel lines, in which case it is also invariant under translations p → p + y of the cylinder parallel to its axis.

Proof. If K₋(R) = 0, it follows from (15.3.8) that M₍₂₎ is invariant under the transformation
$$(p, p', \theta, \theta') \to (p, p', -\theta, -\theta')$$
on account of the symmetry properties of G₍₂₎. This mapping corresponds to reflection in the reference axis. Hence, by isotropy and stationarity, it
is invariant under any other reflection. A similar conclusion holds for the transformation
$$(p, p', \theta, \theta') \to (p + y, p' + y, \theta, \theta').$$
Conversely, if K₋(R) > 0, we can choose h(·) in (15.3.8) so that a contradiction arises if we assume that M₍₂₎ is invariant under either of these transformations.

Provided that $T^{-1}K_+((0,T])$ and $T^{-1}K_-((0,T])$ both vanish in the limit as T → ∞, we can show that the measure G₍₂₎ of Proposition 15.3.V is positive definite: we proceed under the more restrictive assumption that K₊(R) = K₋(R) = 0, that is, that there are a.s. no parallel or antiparallel pairs of lines. Let a(θ) be a bounded measurable function on (0, 2π) and in (15.3.8) set
$$h(p, p', \theta, \theta') = I_{(0,T]}(p)\, I_{(0,T]}(p')\, a(\theta)\, a(\theta') = h(p, \theta)\, h(p', \theta').$$
In place of M₍₂₎, consider the ordinary second moment measure M₂ for which M₂(A × B) = M₍₂₎(A × B) + M₁(A ∩ B). We have
$$0 \le E\bigg[\bigg(\int_{R\times S} h(p,\theta)\, N(dp \times d\theta)\bigg)^{\!2}\bigg]
= \int_{(R\times S)^2} h(p,\theta)\, h(p',\theta')\, M_{[2]}(dp \times dp' \times d\theta \times d\theta') + \int_{R\times S} h^2(p,\theta)\, M_1(dp \times d\theta)
= T^2 \int_S \int_S a(\theta)\, a(\theta+\phi)\, G_{[2]}(d\phi)\, d\theta + \int_S a^2(\theta)\, \frac{mT}{2\pi}\, d\theta.$$
Division by T² and rearrangement yield
$$\int_S \int_S a(\theta)\, a(\theta+\phi)\, G_{[2]}(d\phi)\, d\theta \ \ge\ -\int_S a^2(\theta)\, \frac{m}{2\pi T}\, d\theta \to 0 \qquad (T \to \infty);$$
that is, the measure G[2] (·) is positive definite (equivalently, in the terminology of Definition 8.6.I, it is a p.p.d. measure). This result immediately suggests asking whether G[2] (·) can be interpreted as a covariance measure; accordingly, we look for some random measure Y on S with which G[2] (·) may be associated. The appropriate candidate (as we now show) is the ergodic limit of the original point process on R × S with respect to translations of the cylinder parallel to its axis, or, equivalently, the conditional expectation E(N (·) | IS ) of the original process with respect to the invariant σ-algebra IS generated by these translations. That such ergodic
limits exist follows directly from Theorem 12.2.IV, and we then have, for A, B ∈ B(S),
$$E\big[Y(A)\,Y(B)\big] = \lim_{T\to\infty} \frac{1}{T^2} \int_{(R\times S)^2} I_{(0,T]}(p)\, I_{(0,T]}(p')\, I_A(\theta)\, I_B(\theta')\, M_{[2]}(dp \times dp' \times d\theta \times d\theta')
= \int_{S^2} I_A(\theta)\, I_B(\theta+\phi)\, G_{[2]}(d\phi)\, d\theta.$$
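The Cox construction used next can be illustrated with one concrete directing measure (my choice for illustration, not the text's): take Y(dθ) = m(1 + cos(θ − Φ)) dθ/2π with Φ uniform on (0, 2π], which is rotation-invariant in distribution, and check by simulation the first-moment identity E[N*(U × A)] = m ℓ(A)/2π = E[Y(A)] for the Cox process directed by ℓ × Y.

```python
import numpy as np

rng = np.random.default_rng(4)

m = 4.0                          # total mass of the directing measure on S
trials = 4000
A = (0.0, np.pi / 2)             # an arc A of S, of length pi/2
counts = np.empty(trials)

for t in range(trials):
    # Random directional measure Y(dtheta) = m (1 + cos(theta - Phi)) dtheta/2pi;
    # a uniform phase Phi makes Y rotation-invariant in distribution.
    Phi = rng.uniform(0, 2 * np.pi)
    # Given Phi, N* restricted to U x S is Poisson with total mean m; the
    # angles are i.i.d. with density (1 + cos(theta - Phi)) / 2pi (rejection).
    n = rng.poisson(m)
    theta = []
    while len(theta) < n:
        cand = rng.uniform(0, 2 * np.pi)
        if rng.uniform(0, 2) < 1 + np.cos(cand - Phi):
            theta.append(cand)
    theta = np.array(theta)
    counts[t] = ((theta >= A[0]) & (theta < A[1])).sum()

# E[N*(U x A)] should be m |A| / 2pi, here 4 * (pi/2) / (2 pi) = 1.
print(counts.mean(), m * (A[1] - A[0]) / (2 * np.pi))
```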
An even more surprising corollary is the following. Consider the Cox process N* on the cylinder R × S directed by the random measure ℓ × Y. It is readily checked that N* is invariant under rotations and translations of the cylinder and that, for A, B ∈ B(S) and U the unit interval,
$$E[N^*(U \times A)] = m\,\ell(A)/2\pi = E[Y(A)], \qquad M^*_{[2]}(U \times U \times A \times B) = E\big[Y(A)\,Y(B)\big].$$
Thus, N and N* have the same first and second moment measures. We summarize this as follows.

Proposition 15.3.VIII. If a stationary isotropic line process has boundedly finite second moment measure and has a.s. no parallel or antiparallel lines, then the reduced moment measure G₍₂₎(·) of Proposition 15.3.V is positive definite and can be identified with the second moment measure of the random measure on S
$$Y(A) = E\big[N(U \times A) \mid \mathcal{I}\big] = \lim_{T\to\infty} \frac{1}{T} \int_{p\in(0,T]} N(dp \times A) \quad \text{a.s.},$$
where $\mathcal{I}$ is the invariant σ-algebra associated with shifts of the cylinder parallel to its axis. Furthermore, N has the same second moment measure as the Cox process N* directed by ℓ × Y.

This proposition, coupled with his failure to find counterexamples, led Davidson to formulate his celebrated conjecture ['the big problem' of Davidson (1974b, p. 70)] that any stationary isotropic line process with a.s. no parallel or antiparallel lines and boundedly finite second moment measure must be a Cox process. Davidson showed that no counterexample could be constructed by taking a point process on a line and putting lines through its points [cf. Proposition 15.3.IV(i)], nor by taking a stationary point process in R² and putting lines through its points, nor seemingly 'by tinkering with a Poisson line process.' The structure of stationary isotropic line processes, as well as of the more general stationary hyperplane processes in spaces $\mathbb{R}^d$, differs radically from that of stationary point processes in R¹. That Davidson's conjecture is false was shown by Kallenberg (1977b), in which the main idea is
the construction of a process from a lattice configuration in a parametrization with respect to a fixed line. To describe this example, it is necessary to adopt the alternative representation of a line as a pair (x, y) ∈ R². Here x is the x-coordinate of the intercept of the line with the x-axis, and y = x cot θ as described earlier in this section. (Our attempt in the first edition to apply similar arguments to a lattice in the cylinder representation contained a fundamental flaw: see Exercise 15.3.10.)

Example 15.3(b) (Kallenberg's randomized lattice). We start from the line system specified in the alternative representation above by the square lattice of points in the plane with integer coordinates. We randomize the location of the lattice by translating it by a vector X uniformly distributed over the unit square, and rotating it by an angle Φ uniformly distributed over the interval (0, 2π]. The resulting point process has unit mean density, and the average density of points in a realization ('sample density') is also a.s. unity. It is still of lattice type, and is invariant under arbitrary rigid motions of the plane. The crucial requirement, however, is to produce from this randomized lattice a point process that is invariant under shears, Σα : (x, y) → (x, y + αx) say, for these correspond to translations in the space of lines. Because the process is already invariant under rotations, it is enough to consider just the shears Σα parallel to the y-axis. To this end we consider a sequence of further randomizations: first select α uniformly over the interval (−n, n), then let n → ∞. This yields a sequence of point processes in the plane, which become more and more nearly invariant under shifts as n → ∞. Moreover, each such process is still of lattice type (although no longer a square lattice), is invariant under translations and rotations, has mean density 1, and has mean sample density a.s. equal to 1.
Boundedness of the mean densities implies that the sequence of point processes is tight in the topology of weak convergence (see Exercise 11.1.2). Thus, there exists at least one weakly convergent subsequence. However, this argument does not preclude the possibility that the resulting limit measure might be null. To eliminate this last possibility, Kallenberg considers the corresponding Palm measures, and shows that these are tight, implying, because a Palm measure necessarily has a point at the origin, that the resultant limit is nonzero. The resultant line process is not locally bounded, but the line process obtained from considering just the points in a vertical strip will be so, and will still be invariant under vertical shears (corresponding to translations in the space of lines) and under vertical shifts (corresponding to rotations). The limit is not a Poisson process because it still has an a.s. lattice character, but from the construction it is invariant under both rotations and shears. Finally, its second factorial moment measure exists, and a further analysis shows that K₊(R) = K₋(R) = 0, and G₍₂₎(·) is uniformly distributed on S. This is the same second moment structure as for the simple Poisson process itself. Because it is not the Poisson process, it cannot be a Cox process either.
The corresponding line process is thus invariant under translations and rotations of the plane, has finite first and second moment measures, has a.s. no parallel or antiparallel lines, and is not a Cox process. It therefore refutes Davidson’s conjecture. Details of the proof and further remarkable properties of the process in the plane constructed in this way are given in Kallenberg (1977b) (which includes a further comment by Kingman), and in Mecke’s (1979) subsequent paper. In particular, Mecke gives both an explicit construction for the process and an algebraic characterization of it as the unique process having lattice character and invariant under all affine translations of the plane. The singular character is clearly evident from SKM (1995, Figure 8.3) illustrating the process.
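Kallenberg's example turns on the fact that translations of the plane act as shears in a line-space parametrization. A minimal check of that correspondence, using the intercept–slope parametrization (a, b) for the line v = a + bu (a stand-in for the (x, y) coordinates of the text; the exact coordinates differ, but the algebra is the same):

```python
import numpy as np

rng = np.random.default_rng(5)

# A non-vertical line u -> a + b*u is represented by the pair (a, b).
a, b = rng.normal(size=2)
s = rng.normal()                 # translate the plane by the vector (s, 0)

# The translated line is v = a + b(u - s), i.e. (a, b) -> (a - s*b, b):
# a shear of the parameter plane, as claimed for translations of lines.
a_new, b_new = a - s * b, b

# Compare the two lines pointwise at a few abscissae.
u = rng.normal(size=5)
v_translated = a + b * (u - s)   # original line evaluated at shifted abscissa
v_sheared = a_new + b_new * u
print(np.allclose(v_translated, v_sheared))
```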
Exercises and Complements to Section 15.3

15.3.1 (a) The line process in R² represented as a Poisson process on R × S is defined initially with respect to a specified origin and reference direction in R². Show that the line process in fact is homogeneous and isotropic in R². [Hint: Consider the transformation on (p, θ) effected by moving the origin as underlying (15.3.1). The Poisson process on the cylinder is preserved under both this transformation and change of the reference direction.]
(b) For each of the three point process representations below of a process of undirected lines, describe the point process that represents a line process that is (a) homogeneous; or (b) isotropic; or (c) both.
(i) Take the distance p > 0 from the origin to the line as one parameter and the angle made by the line and a fixed reference axis as the other. The line process is represented as a point process on R⁺ × S.
(ii) As in (i) except that the distance is signed and the angle is restricted to the range (0, π]. Then the line process is a point process on R × (0, π].
(iii) Describe a line by its intercepts on the x and y axes as parameters. Then a line process is represented as a point process in R².

15.3.2 Suppose given a Poisson process in R² at unit rate; let W_δ denote those of its points (x, y) lying in the wedge 0 < y/x < tan δ with δ < ½π. Independently through each point construct a line with orientation θ uniformly distributed on (0, 2π). Let S_ρ denote the circle with centre at the origin and radius ρ > 0. Show that Pr{no line through a point of W_δ intersects S_ρ} = 0. Conclude that with probability one, infinitely many lines intersect S_ρ.

15.3.3 Let µ be a measure on R invariant under shifts T_a and T_b for incommensurate a and b. Set F(x) = µ((0, x]) for x > 0, = −µ((x, 0]) for x ≤ 0, and let U be the set of points u ∈ R such that µ is invariant under shifts T_u.
Show that u ∈ U implies −u ∈ U, and that for u, v ∈ U, F(u + v) = F(u) + F(v), so that U is an additive group and thus contains all points of the form ja + kb for positive or negative integers j, k. Deduce that F(x) = αx for all x and some α ≥ 0. [This result, like Exercise 12.1.8, is a variant on the Hamel equation at (3.6.3).]

15.3.4 Random hyperplanes. A hyperplane is a (d − d′)-dimensional linear subspace of $\mathbb{R}^d$ shifted through some vector x ∈ $\mathbb{R}^d$, for some positive integer d′ < d.
15. Spatial Point Processes

(a) Show that a directed hyperplane is uniquely specified by a pair (p, θ), where θ lies on the d-dimensional unit sphere S_d, p ∈ R, and the sense of the hyperplane (whether the normal to the origin is directed toward or away from the hyperplane) is determined by the sign of p.
(b) A process of random (d − d′)-dimensional hyperplanes can be represented as a point process in S_d × R. Rotation of the original plane corresponds to rotation by an element of S_d; translation of the original plane through x corresponds to the transformation (p, θ) → (p + ⟨x, θ⟩, θ), where ⟨x, θ⟩ is the inner product in R^d.
(c) Such a process is homogeneous and isotropic if and only if the point process is invariant under both rotations and generalized shears as defined above, and its first moment measure, if it exists, is then a multiple of Lebesgue measure on S_d × R, m_d(·) say.
(d) Define a random measure ξ(A) for bounded Borel A in R^d as the sum of the hypervolumes of the intersections A ∩ S_i, where the particular hyperplanes of the process are denoted by {S_i}. Then m_d(·) is the mean density of this random measure.
(e) If L is an arbitrary fixed line in R^d, the points of intersection of L with {S_i} form a stationary point process with mean density $m\,\Gamma(\tfrac12 d)/(2\pi^{1/2})$. [Hint: See references preceding Example 15.3(a).]
15.3.5 The special case of random lines in R³ uses the representation of such lines as points in S₂ × R², where the component in S₂ determines the direction of the line and the point in R² its point of intersection with the orthogonal plane passing through the origin. Find analogues to (d) and (e) of Exercise 15.3.4 for the 'line density' of the process (a random measure in R³) and the point process generated by the points of intersection of the lines with an arbitrary plane in R³.

15.3.6 Extend the result of Example 15.3(a) to the context of Exercises 15.3.4(e) and 15.3.5 (i.e., show that if the original process is Poisson so are the induced processes on the line and plane, respectively).

15.3.7 Given a homogeneous isotropic Poisson directed line process in R², form a 'clustered' line process with pairs of lines in each cluster in one of the following three ways.
(a) Railway line process (i): To each line (p_i, θ_i) of the process, add the line (p_i + d, θ_i) for some fixed positive d. In the notation of Proposition 15.3.V, G_{[2]} and K_− are null, and K_+ has an atom at d. The process is invariant under translation, rotation, and reflection.
(b) Davidson's railway line process (ii): To each line (p_i, θ_i) add the antiparallel line (−p_i − d, π + θ_i). Then G_{[2]} and K_+ are null, K_− has an atom at d, and because of the built-in handedness, this process is not invariant under reflections of the plane.
(c) To each line (p_i, θ_i) add the line (p_i, θ_i + α) for some fixed α in 0 < α < π. The resulting line process is no longer translation invariant.
[Each process here is a possible analogue of the Poisson process of deterministic cluster pairs as in Bartlett (1963, p. 266) or Daley (1971, Example 5).]
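The cylinder representation of Exercise 15.3.1 lends itself to direct simulation. The following sketch (the unit rate, the truncation P and the disc radius r are arbitrary choices made for illustration) checks homogeneity empirically: the number of lines meeting a disc of radius r should be Poisson with the same mean wherever the disc is centred, because moving the origin to x merely shifts p to p − ⟨x, u(θ)⟩, the transformation underlying (15.3.1).

```python
import numpy as np

rng = np.random.default_rng(42)

# Unit-rate Poisson process on the cylinder R x [0, 2pi), restricted to
# |p| <= P.  Each point (p, theta) encodes the directed line
# {z in R^2 : <z, (cos theta, sin theta)> = p}.
P = 50.0
n = rng.poisson(2 * P * 2 * np.pi)      # mean = Lebesgue measure of the strip
p = rng.uniform(-P, P, n)
theta = rng.uniform(0.0, 2 * np.pi, n)

def hits_disc(p, theta, centre, r):
    """A line (p, theta) meets the disc of radius r about `centre` iff the
    shifted coordinate p - <centre, u(theta)> is at most r in modulus."""
    u = np.column_stack((np.cos(theta), np.sin(theta)))
    p_shifted = p - u @ centre           # the shift underlying (15.3.1)
    return np.abs(p_shifted) <= r

# Homogeneity: the count of lines meeting a fixed disc of radius 1 should be
# a Poisson(2 * 1 * 2pi ~ 12.57) variate wherever the disc is centred.
r = 1.0
counts = [int(hits_disc(p, theta, np.array(c), r).sum())
          for c in [(0.0, 0.0), (10.0, -5.0), (-20.0, 15.0)]]
print(counts)
```

The three counts are identically distributed (up to the truncation at |p| ≤ P, which is harmless for centres well inside the strip); repeating over many seeds would reproduce the common Poisson law.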
15.3.8 (a) Show that two distinct lines (p, θ) and (p′, θ′) intersect in a point inside the unit circle if and only if |p| < 1, |p′| < 1 and

$$p^2 + p'^2 - 2pp'\cos(\theta - \theta') < \sin^2(\theta - \theta').$$

The expected number of line pairs intersecting within the circle is thus found by integrating the second factorial moment measure over the region defined by these inequalities.
(b) More generally, the first moment measure M of the process of intersections is found from integrals of the form
$$\int_{\mathbb{R}^2} h(x, y)\, M(dx \times dy) = \int_{\mathbb{R}^2 \times \mathbb{S}^2} h\bigl(x(\mathbf{p}),\, y(\mathbf{p})\bigr)\, M_{[2]}(dp \times dp' \times d\theta \times d\theta'),$$

where h(·) is a bounded measurable function of bounded support, **p** denotes the vector of coordinates (p, p′, θ, θ′), and the two lines (p, θ) and (p′, θ′) intersect in the point (x(**p**), y(**p**)).
(c) When the line process is homogeneous and isotropic, M_{[2]} reduces to the form described in Proposition 15.3.V. Assuming there are a.s. no parallel or antiparallel lines, use the representation of (b) to show that the intersection process is stationary and has mean density given by

$$\frac{1}{4\pi} \int_{(0,\pi)} \sin\phi\; G_{[2]}(d\phi).$$
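The intersection criterion of 15.3.8(a) is easy to verify numerically: solving the pair of line equations ⟨z, u(θ)⟩ = p, ⟨z, u(θ′)⟩ = p′ directly and testing |z| < 1 must agree with the inequality. A small check (sample sizes and ranges are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def intersection(p1, th1, p2, th2):
    """Solve <z, u1> = p1, <z, u2> = p2 for the intersection point z of the
    two lines (p1, th1), (p2, th2) in the (p, theta) parameterization."""
    A = np.array([[np.cos(th1), np.sin(th1)],
                  [np.cos(th2), np.sin(th2)]])
    return np.linalg.solve(A, np.array([p1, p2]))

def criterion(p1, th1, p2, th2):
    """The inequality of Exercise 15.3.8(a)."""
    d = th1 - th2
    return (abs(p1) < 1 and abs(p2) < 1
            and p1**2 + p2**2 - 2 * p1 * p2 * np.cos(d) < np.sin(d)**2)

agree = True
for _ in range(1000):
    p1, p2 = rng.uniform(-2, 2, 2)
    th1, th2 = rng.uniform(0, 2 * np.pi, 2)
    if abs(np.sin(th1 - th2)) < 1e-6:    # skip (anti)parallel pairs
        continue
    inside = np.linalg.norm(intersection(p1, th1, p2, th2)) < 1.0
    agree &= (criterion(p1, th1, p2, th2) == inside)
print(agree)
```

The agreement is exact because |z|² = [p² + p′² − 2pp′cos(θ − θ′)]/sin²(θ − θ′), so the inequality is precisely |z| < 1 (the conditions |p| < 1, |p′| < 1 are then automatic).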
15.3.9 Show that if a stationary isotropic line process has a.s. no parallel or antiparallel lines, then it cannot be a Poisson cluster process. [Hint: Consider the form of the second factorial moment measure; a Poisson cluster process with nontrivial cluster distribution cannot factorize in the form of Proposition 15.3.V.]

15.3.10 (a) Consider a line process represented by a lattice on the cylinder R × S. Show that its properties are quite different from those of a line process represented by a lattice in R² under the alternative representation via the intercepts (x, y) = (p sec θ, p cosec θ).
(b) Let {(p_i, θ_i)} denote a stationary line process in R². Investigate whether the point process in R with realizations {p_i} is stationary [i.e., invariant under all shears at (15.3.1)]. [The question is due to Dietrich Stoyan.]
15.4. Space–Time Processes Space–time models combine elements from the evolutionary processes studied in Chapter 14 and from the descriptive properties of spatial patterns covered earlier in this chapter. Because the spatial location can always be considered as one component of a multi-dimensional mark, some aspects, such as the likelihood theory based on conditional intensity functions, are essentially special cases of the more general discussion of Chapter 14. For applications, on the other hand, the evolution of spatial features with time is often of special
interest. In this section we review some basic features of space–time point processes, trying to select those that most warrant more careful examination. The earliest statistical models for space–time processes of which we are aware were prompted by an agricultural setting. As agricultural trials continued on experimental stations, fluctuations in soil fertility were studied, and it was observed that the spatial correlations decayed remarkably slowly. Pioneer studies by Whittle (1954, 1962), using diffusion methods, showed that such long-term correlations could be caused by a sequence of perturbations (applications of fertilizer or other treatments) followed by gradual diffusion. Whittle also observed that space–time models are likely to be more insightful, by penetrating farther into the physical processes generating a spatial point pattern, than a purely static model for that pattern. Despite such considerations, studies of space–time models have lagged well behind those of simple temporal models, and even those of purely spatial models. No doubt the reasons have been largely practical, notably the difficulty of compiling good space–time datasets and the heavy computations needed to analyze them. Their importance, however, can only grow as time goes on and these difficulties are overcome. Another point to bear in mind about space–time models is their diversity. The models on which we focus here—models for earthquakes form a paradigm example—are for events which can be regarded as points in both time and space dimensions. With earthquakes and forest fires, a point pattern can be obtained only by accumulating events over time (e.g., the fires which have occurred over the last year). Models for particles moving through space constitute a different class. Although they can be viewed as spatial point patterns evolving in time, in space–time they form families of trajectories rather than families of points. 
A similar situation arises for models for storm centres or rainfall cells within a storm; in the discussion in Wheater et al. (2000), these phenomena are treated as points that persist a while until they disappear. A great deal of flexibility is added by moving from simple space–time point processes to space–time point processes with an associated mark. The spatial location itself may be viewed as a mark for a simple point process in time, thereby providing one route to likelihood analyses of space–time models. Further characteristics, such as magnitude, spatial extent, or even duration, can be added as additional marks. Deft use of this procedure, such as has been employed for some decades in applications to queueing systems and networks, can be very helpful in making complex models more tractable. Finally it should be observed that behind many observed space–time point processes lie evolving but unobserved spatial fields: earthquakes may be regarded as a response to some evolving stress field, forest fires as a response to some underlying spatial field determining the ignition potential, and so on. Thus, the study of space–time point processes leads almost inevitably to the more general study of evolving spatial fields, although practical modelling in this direction is still limited and very subject-specific. We turn to a more systematic study of space–time point processes and focus
on the most commonly occurring situation, namely, the process is stationary in time but not necessarily homogeneous¹ in space or in the mark distribution. The two main aspects we discuss are the first- and second-order moment properties, thereby revisiting and elaborating the discussion in Sections 8.3 and 12.3, and the extension to space–time processes of the conditional intensity and likelihood arguments of Chapter 14. For simplicity of exposition we suppose throughout that space here refers to Euclidean space R²; other options, such as point processes on the circle or the sphere, are indicated briefly in Exercises 15.4.1–2. For further examples, discussion and references, see Vere-Jones (2007).

We start with first moments.

Lemma 15.4.I. If a stationary marked space–time point process has finite overall ground rate m_g, then there exist a distribution Φ(dx) in space, and a family of conditional distributions Ψ(dν | x) for the residual mark ν, such that the first moment measure can be decomposed as

$$M(dt \times dx \times d\nu) = m_g\, dt\, \Phi(dx)\, \Psi(d\nu \mid x). \tag{15.4.1}$$
Proof. We know from Proposition 8.3.II that, if an MPP is stationary and has finite ground intensity m_g, then its first moment measure can be written in the general form M(dt × dκ) = m_g dt Π(dκ). The mark κ here has two components, the location x and a residual mark, ν say, so the measure Π here is a bivariate distribution on the product of the location space and the space of the residual mark. The decomposition (15.4.1) is then just the standard disintegration of Π(·) into the marginal distribution in space and a family of conditional distributions for the residual mark, given the spatial location.

In this lemma, we allow the distribution of the residual mark ν to depend on the spatial location x, but not (from stationarity) on the time t. The lemma implies in particular that a stationary space–time Poisson process must have an intensity measure Λ which can be disintegrated as in (15.4.1). Assuming densities exist, its intensity in (space–time, mark) space will be of the form

$$\lambda(t, x, \nu) = \lambda_g\, \phi(x)\, \psi(\nu \mid x),$$

where λ_g is the overall intensity (ground rate). [We follow convention by calling the process 'space–time', and write the components of a typical 'space–time' point (t, x) in reverse order.] The process can be otherwise interpreted as a space–time compound Poisson process with spatially varying space–time intensity λ(t, x) = λ_g φ(x) and spatially dependent mark distribution with density ψ(ν | x).

¹ For a space–time process on, e.g., R₊ × R² we adopt the convention of using stationarity to refer to invariance with respect to time-shifts, and homogeneity to refer to invariance with respect to shifts in space. Thus, a process that is invariant under shifts in both time and space is an homogeneous stationary space–time process.
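The factorized intensity λ(t, x, ν) = λ_g φ(x) ψ(ν | x) translates directly into a simulation recipe: draw a Poisson total, then independent times, locations and marks. A minimal sketch, with illustrative (not canonical) choices — φ a standard bivariate normal density and ψ(· | x) exponential with mean depending on |x|:

```python
import numpy as np

def simulate_marked_space_time_poisson(lam_g, T, rng):
    """Stationary space-time marked Poisson process on [0, T] with intensity
    lam_g * phi(x) * psi(nu | x); phi and psi are illustrative choices."""
    n = rng.poisson(lam_g * T)            # ground process: Poisson(lam_g * T)
    t = np.sort(rng.uniform(0.0, T, n))   # given n, the times are i.i.d. uniform
    x = rng.standard_normal((n, 2))       # locations drawn from phi
    # residual mark nu | x ~ exponential with spatially varying mean 1 + |x|
    nu = rng.exponential(1.0 + np.linalg.norm(x, axis=1))
    return t, x, nu

rng = np.random.default_rng(1)
t, x, nu = simulate_marked_space_time_poisson(100.0, 50.0, rng)
```

The same three-step pattern (ground total, locations, conditional marks) applies for any integrable φ and mark kernel ψ.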
Example 15.4(a) Models for persistent points; spatial M/G/∞ queue. This example illustrates in simple form some of the issues which have been mentioned. Although in essence it is a model for particles that persist in time, it can be reduced to a marked space–time point process, in the narrow sense in which we have defined it, by treating the duration, as well as the location, as part of the mark. It can be regarded as a spatial version of an M/G/∞ queue, but many models for population and other processes have a similar general structure: particles which arrive at times t_i and locations x_i, persist for some time τ_i and then die or otherwise disappear. The spatial birth-and-death process of Example 10.4(e) is a more complex example, where the duration τ_i may depend on the evolving history of the whole set of particles. It in turn is a special case of the wider class of branching diffusion models, which has a considerable literature of its own (see Section 13.5).

One basic approach to the process is to consider it as an MPP in time, say N(dt × dκ), with time points t_i and marks κ_i = (x_i, τ_i) embracing both the locations x_i and the durations τ_i. Equally, it may be regarded as a space–time point process, with locations (t_i, x_i) say, and associated marks τ_i. Once this underlying process has been specified, all other characteristics should be derivable from it. Observations, however, may be restricted to snapshots of the time-varying spatial point pattern N_t(dx) representing the locations of the particles extant at time t. Here N_t may be regarded as a stochastic process taking values in X^∪; the locations of the points at time t can also be represented as a vector x_t, anticipating the notation to be used in Section 15.5. A slightly different approach is to fix sets A_i in X and look at the joint evolution of the processes X_i(t) = N_t(A_i) as a multivariate time series.
In any case, one initial question is to find a representation of the spatial processes N_t in terms of the underlying process N, and to examine how far the structure of N can be reconstructed from observations on the N_t. The basic relation is of simple linear form:

$$N_t(A) = \int_{s=-\infty}^{t} \int_{A} \int_{\tau = t-s}^{\infty} N(ds \times dx \times d\tau).$$
This representation is immediately useful in obtaining the first moment measure for N_t(·). Suppose that the MPP N is stationary in time, and, adopting notation similar to Lemma 15.4.I, that its first moment measure has a density which can be written in the form m(t, x, τ) = m_g φ(x) ψ(τ | x), where φ(x) is a time-invariant probability density over the spatial region of interest, and ψ(τ | x) is the time-invariant probability density function for the life of a particle started at location x. Taking expectations in the expression for N_t we obtain E[N_t(dx)] = m_t(x) dx, where

$$m_t(x) = m_g\,\phi(x) \int_{s=-\infty}^{t} \int_{t-s}^{\infty} \psi(\tau \mid x)\, d\tau\, ds = m_g\,\phi(x) \int_{0}^{\infty} \tau\, \psi(\tau \mid x)\, d\tau$$
and $\int_0^\infty \tau\, \psi(\tau \mid x)\, d\tau \equiv L(x)$ is the mean lifetime of a particle started at location x. A similar (albeit more involved) representation for the second moment measure of N_t is outlined in Exercise 15.4.2.

Only the mean of the lifetime distribution can be obtained from the above expression, even in the case that the spatial mean L(x) is independent of x and the ground rate m_g could be independently estimated. If the aim were to obtain further information about the lifetime distribution, it would be necessary to combine observations on N_t over a sequence of values of t.

Let us then consider, as a second step, the first moment structure for observations on two snapshots N_{t_1} and N_{t_2}. The combined observations can be treated as a single realization of a multivariate point process on the location space X, with points of three different types: Type 1 is observed at t_1 but not t_2, Type 2 at both times, and Type 3 at t_2 but not t_1. (We assume, here and above, that the particles themselves cannot be distinguished by their ages: they are either present or not present.) Arguing much as above, and writing ∆ = t_2 − t_1, we obtain for the first moment measures of the three components

$$m_1(x) = m_g\,\phi(x) \int_{-\infty}^{t_1}\!\int_{t_1 - s}^{t_2 - s} \psi(\tau \mid x)\, d\tau\, ds = m_g\,\phi(x)\, A_1(\Delta \mid x),$$
$$m_2(x) = m_g\,\phi(x) \int_{-\infty}^{t_1}\!\int_{t_2 - s}^{\infty} \psi(\tau \mid x)\, d\tau\, ds = m_g\,\phi(x)\, A_2(\Delta \mid x),$$
$$m_3(x) = m_g\,\phi(x) \int_{t_1}^{t_2}\!\int_{t_2 - s}^{\infty} \psi(\tau \mid x)\, d\tau\, ds = m_g\,\phi(x)\, A_3(\Delta \mid x),$$

where

$$A_1(\Delta \mid x) = A_3(\Delta \mid x) = \int_0^{\Delta} \sigma\, \psi(\sigma \mid x)\, d\sigma + \Delta \int_{\Delta}^{\infty} \psi(\sigma \mid x)\, d\sigma,$$
$$A_2(\Delta \mid x) = \int_{\Delta}^{\infty} \sigma\, \psi(\sigma \mid x)\, d\sigma - \Delta \int_{\Delta}^{\infty} \psi(\sigma \mid x)\, d\sigma.$$
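For a concrete check of these formulas, take an exponential lifetime density ψ(σ | x) = e^{−σ/L}/L, independent of x (an illustrative choice only). The defining double integrals, after substituting v = t_1 − s, reduce to single integrals that can be evaluated numerically and compared with the closed forms A_1 = A_3 = L(1 − e^{−∆/L}) and A_2 = L e^{−∆/L}, which follow by direct integration:

```python
import numpy as np

L, Delta = 2.0, 1.5                        # illustrative mean lifetime, snapshot gap
Psi = lambda v: 1.0 - np.exp(-v / L)       # lifetime c.d.f. (exponential, mean L)

def trap(y, x):
    """Trapezoidal rule (kept explicit to avoid NumPy version differences)."""
    return float(np.sum((y[:-1] + y[1:]) * np.diff(x)) / 2.0)

# Substituting v = t1 - s turns the double integrals over arrival time s and
# lifetime tau into single integrals in v (truncated at 40 L):
v = np.linspace(0.0, 40.0 * L, 400001)
A1 = trap(Psi(v + Delta) - Psi(v), v)      # present at t1 but gone by t2
A2 = trap(1.0 - Psi(v + Delta), v)         # present at both snapshots
w = np.linspace(0.0, Delta, 40001)
A3 = trap(1.0 - Psi(w), w)                 # arrives in (t1, t2], present at t2

print(A1, L * (1 - np.exp(-Delta / L)))    # both ~ 1.055
print(A2, L * np.exp(-Delta / L))          # both ~ 0.945
print(A1 + A2)                             # = L: Types 1 and 2 recover E[N_{t1}]
```

Note the consistency check A_1 + A_2 = L(x): the Type 1 and Type 2 rates together reproduce the single-snapshot mean m_t(x) = m_g φ(x) L(x).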
Similar decompositions can be obtained for the first moments of larger numbers of snapshots, and begin to piece together information about ψ. Because of the simple linear relation between N_t and N, it is possible to extend the moment results into results for p.g.fls. For h ∈ V we obtain

$$G_t[h] = \mathrm{E}\biggl[\prod_{t_i} \bigl\{ I(t - t_i \ge \tau_i)\cdot 1 + I(t - t_i < \tau_i)\, h(x_i) \bigr\}\biggr] = \mathrm{E}\biggl[\,\mathrm{E}\Bigl\{ \prod_{t_i} \bigl(1 - I(t - t_i < \tau_i)\,[1 - h(x_i)]\bigr) \Bigm| \{(t_i, x_i)\} \Bigr\}\biggr],$$

provided the τ_i, which may depend on x_i, are independent of the past of the process up to t_i. We can then take expectations conditional on the (t_i, x_i) to obtain, using the Heaviside function H(u) = 0 or 1 as u < or ≥ 0,

$$G_t[h] = \mathrm{E}\biggl[\prod_i \bigl(1 - H(t - t_i)\,[1 - h(x_i)]\,[1 - \Psi(t - t_i \mid x_i)]\bigr)\biggr] = G_-[h_t],$$

where $\Psi(u \mid x) = \int_0^u \psi(v \mid x)\, dv$, G_− is the p.g.fl. of the times and locations only, and h_t is the function

$$h_t(u, x) = 1 - H(t - u)\,[1 - h(x)]\,[1 - \Psi(t - u \mid x)].$$

In view of stationarity, the last expression for G_t[h] is independent of t, because we can replace t − u by v without altering the value of the expectation. In simple cases, the p.g.fl. can be evaluated explicitly. If the initiating points (t_i, x_i) form a constant rate Poisson process with intensity λ_g φ(x), then the previous expression can be evaluated as

$$G_t[h] = \exp\biggl( -\lambda_g \int_X [1 - h(x)]\, \phi(x) \Bigl( \int_0^{\infty} [1 - \Psi(v \mid x)]\, dv \Bigr) dx \biggr) = \exp\biggl( -\lambda_g \int_X [1 - h(x)]\, L(x)\, \phi(x)\, dx \biggr).$$
The interpretation here is that, independently of t, the process of extant particles is a Poisson process over X with intensity λ(t, x) = λ_g L(x)φ(x).

Even in the case of the first moment measure, finding nonparametric estimates for the measure is a problem which needs to be approached with some caution. In addition to the boundary problems which inevitably arise in spatial processes, a particular difficulty in the space–time context is the problem of distinguishing between transient features, such as random clusters, and long-term spatial inhomogeneities. Suppose that some form of kernel estimate is adopted, say

$$\hat m(t, y) = \int_{t-s \in A,\; y-x \in B} h_1(t - s)\, h_2(y - x)\, N(ds \times dx),$$
where the temporal and spatial sets A and B must be selected to reflect appropriate ‘bandwidths’ for the two smoothing kernels h1 and h2 . Unless the bandwidths are chosen with particular care (and possibly even then) an estimate of the above kind will either reflect the transient clusters (bandwidths too small) or smooth over true inhomogeneities (bandwidths too large). This in turn means that some knowledge of the cluster structure is required before the bandwidths are chosen. On the other hand, determining the cluster structure equally requires some knowledge of the spatial or temporal inhomogeneities. This dilemma can be resolved only partially in practical situations, for example by making a preliminary estimate of the clustering and then using this to determine an initial choice of bandwidth. Vere-Jones (1992) and Musmeci and Vere-Jones (1987) describe two different ad hoc approaches to this problem, both requiring a preliminary estimate of the extent of clustering based on local variance/mean ratios. Moving to variable bandwidths generally leads to better visual representations of the data, but does not eliminate, indeed may only intensify, the above dilemma, and is in itself a nontrivial exercise.
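A bare-bones version of such a kernel estimate, with Gaussian kernels h_1, h_2 and bandwidths b_t, b_x standing in for the sets A and B (all names and parameter values here are illustrative):

```python
import numpy as np

def kernel_intensity(t0, y0, times, locs, b_t=1.0, b_x=0.1):
    """Kernel estimate m_hat(t0, y0) of the space-time intensity from events
    (times[i], locs[i]), using Gaussian smoothing kernels h1 and h2."""
    h1 = np.exp(-0.5 * ((t0 - times) / b_t) ** 2) / (np.sqrt(2 * np.pi) * b_t)
    d2 = np.sum((y0 - locs) ** 2, axis=1)
    h2 = np.exp(-0.5 * d2 / b_x ** 2) / (2 * np.pi * b_x ** 2)
    return float(np.sum(h1 * h2))

# Sanity check on homogeneous Poisson data: at an interior point the
# estimate should be close to the true rate for moderate bandwidths.
rng = np.random.default_rng(3)
rate, T = 50.0, 200.0                  # 50 events per unit time on [0, 1]^2
n = rng.poisson(rate * T)
times = rng.uniform(0, T, n)
locs = rng.uniform(0, 1, (n, 2))
est = kernel_intensity(100.0, np.array([0.5, 0.5]), times, locs,
                       b_t=5.0, b_x=0.2)
print(est)                              # ~ rate, up to boundary and sampling error
```

Shrinking b_t and b_x makes the estimate chase individual clusters; enlarging them smooths over genuine inhomogeneity — precisely the dilemma described above.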
Such difficulties raise the issue of whether, in fact, estimating the first moment measure in such complex situations is even a desirable goal. In many cases, the immediate concern is to find an informative visual display of the data, and many forms of kernel smoothing will achieve that. The question of which, if any, features are persistent spatial inhomogeneities may be better explored at a later stage, through the fitting of exploratory models. In this connection, Zhuang et al. (2002, 2004) describe a powerful technique, stochastic declustering, based on the assumption that the observed process can be approximated by a space–time ETAS model [Example 6.4(d)]. Suppose in the first instance that the process of initiating events is stationary in time but varying in space, and that the cluster parameters are constant in both time and space; then the fitted model can be used to estimate, for each observed event, the probability that that event is an initiating event (i.e., not the offspring of some earlier event). Smoothing these probabilities gives a first estimate of the first moment measure, not of the process as a whole, but of the process of initiating events. This estimate in turn can be used to give improved estimates of the cluster parameters, and the steps iterated until convergence is achieved. The resulting estimate for the first moment measure of the initiating events may be a more useful function than the first moment measure of the overall process. For example, it can play an important role as a diagnostic tool in identifying areas or time periods in which the process departs from its normal behaviour. This technique can be extended to provide, for every event, the probability that the event is an offspring of any given preceding event; in turn, these probabilities can be used as the basis for simulating the detailed cluster structure of the data (stochastic reconstruction). 
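The core of the iteration just described can be sketched in a few lines. Assume, purely for illustration, a fitted branching intensity of the form λ(t, x) = μ(x) + Σ_{t_j < t} g(t − t_j, x − x_j); the probability that event i is an initiating event is then φ_i = μ(x_i)/λ(t_i, x_i), and kernel-smoothing the φ_i gives an updated background estimate. The functional forms below are hypothetical stand-ins, not the fitted ETAS kernels of Zhuang et al.:

```python
import numpy as np

def background_probabilities(times, locs, mu, g):
    """phi[i] = mu(x_i) / lambda(t_i, x_i): probability that event i is an
    initiating (background) event under the fitted branching model."""
    n = len(times)
    phi = np.empty(n)
    for i in range(n):
        earlier = times < times[i]
        trig = g(times[i] - times[earlier], locs[i] - locs[earlier]).sum()
        phi[i] = mu(locs[i]) / (mu(locs[i]) + trig)
    return phi

def smooth_mu(locs, phi, b=0.15):
    """One declustering step: weight each event by phi and kernel-smooth to
    obtain an updated background intensity estimate mu_hat(x)."""
    def mu_hat(x):
        d2 = np.sum((x - locs) ** 2, axis=1)
        return float(np.sum(phi * np.exp(-0.5 * d2 / b**2)) / (2 * np.pi * b**2))
    return mu_hat

# Toy data: two isolated events and one immediate near-repeat of the second.
times = np.array([0.0, 100.0, 100.1])
locs = np.array([[0.0, 0.0], [5.0, 5.0], [5.0, 5.0]])
mu = lambda x: 0.1                                   # constant background (hypothetical)
g = lambda dt, dx: 5.0 * np.exp(-10.0 * dt - np.sum(dx**2, axis=-1))
phi = background_probabilities(times, locs, mu, g)   # ~ [1, 1, small]
mu_hat = smooth_mu(locs, phi)
```

In use one would alternate: refit the cluster parameters, recompute the φ_i, re-smooth μ, and iterate to convergence, as in the procedure described above.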
In the earthquake context, these procedures lead to a technique for determining the clusters that is free of the subjective criteria used in other procedures for such declustering. On the other hand, the reconstruction is not unique, but depends on the particular simulation, and is based on the assumption that the underlying process has an ETAS structure. Nevertheless, it has already proven of value as an analytical and diagnostic tool.

We turn next to a consideration of second moment measures in the space–time context. These can become quite complicated even in the simple case that the process is stationary and has no additional marks. Several equivalent representations are possible, all variations on the basic form given in Proposition 8.3.II, namely,

$$M_2(dt_1 \times d\kappa_1 \times dt_2 \times d\kappa_2) = \breve M_2(du \times d\kappa_1 \times d\kappa_2)\, dt_1, \tag{15.4.2}$$

where u = t_2 − t_1 and $\breve M_2$ is 'reduced' with respect to the time variable only. Suppose for definiteness that space here is R², so that each κ can be interpreted as a point x ∈ R². Then this reduced measure can itself be looked at and standardized in several different ways, as indicated in the next proposition, which both specializes and refines the earlier discussion around Lemma 8.3.III.
Proposition 15.4.II. (a) If a simple, stationary, space–time point process has boundedly finite ground process with finite second moment measure, then its reduced moment measure $\breve M_2$ as at (15.4.2) can be represented in either of the two forms

$$\breve M_2(du \times dx_1 \times dx_2) = \breve M_2^{\,g}(du)\, \Pi_2(dx_1 \times dx_2 \mid u), \tag{15.4.3}$$

where $\breve M_2^{\,g}(\cdot)$ is the reduced second moment measure for the ground process, and Π_2(· × · | u) is a bivariate probability distribution for the locations, given that the occurrence times are separated by an interval of length u; or

$$\breve M_2(du \times dx_1 \times dx_2) = \breve M_2(du \mid x_1, x_2)\, \Pi(dx_1)\, \Pi(dx_2), \tag{15.4.4}$$

where Π(·) is the stationary distribution in space, and $\breve M_2(du \mid x_1, x_2)$ is a reduced cross-moment measure for the occurrence of points at the distinct locations x_1, x_2.

(b) If the process is an homogeneous stationary space–time process with finite mean intensity m per space–time unit, then

$$M_2(dt_1 \times dx_1 \times dt_2 \times dx_2) = m\, \mathring M_1(du \times dy)\, dt_1\, dx_1, \tag{15.4.5}$$

where u = t_2 − t_1, y = x_2 − x_1 and $\mathring M_1(\cdot)$ is the first-moment measure of the Palm distribution at (13.2.15).

Proof. (15.4.3) is the form already treated in Lemma 8.3.III; it is a disintegration of $\breve M_2$ with respect to its marginal measure in u and the conditional distribution of the marks given u. Similarly (15.4.4) is a disintegration with respect to the product measure Π × Π in (x_1, x_2), justified by the absolute continuity of $\breve M_2(A \times dx_1 \times dx_2)$ with respect to Π(dx_1) × Π(dx_2) for all bounded A. In both these representations, the assumption that the ground process exists implies that N(A × R²) < ∞ a.s. for bounded A, so that either the spatial coordinates are themselves restricted to a bounded set, or the occurrence rate drops off rapidly away from the spatial origin. The final representation (15.4.5) is not constrained in this way, and is a consequence of the comments regarding moments of the Palm distribution summarized in Proposition 13.2.VI and in the discussion around (13.4.4). In general the local Palm distribution is conditioned by the value of the mark (here the location) at the point selected as origin, but when the process is homogeneous in space as well as time, the Palm distribution is independent of both the time and space coordinates of the point selected as origin, leading to the form (15.4.5) for its first moment measure.

Before leaving (15.4.3–5) we note some further aspects of these representations. In (15.4.3), Π_2 is not in general symmetric in (x_1, x_2) (the time sequence in which the two marks occur is important), nor do its marginals reduce in general to the stationary mark distribution. Rather, it provides a useful descriptive summary of how the spatial or mark distribution changes with the
time interval between the two points. For ergodic processes and large u, it should approximate to the product form Π(dx_1) × Π(dx_2); for small u, and for cluster processes in particular, it may be highly concentrated about the diagonal x_1 = x_2. Interpreting the marks as spatial coordinates, Example 8.3(e) illustrates some types of behaviour which can occur in a space–time cluster process under rather general conditions. A further illustration, focussing on second-order properties, is in Example 15.4(c) below.

The decomposition (15.4.4) is the natural form to use in connection with spectral analysis, for its Fourier transform with respect to u gives a quantity, loosely defined in the ergodic case as

$$\gamma(\omega \mid x_1, x_2) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-i\omega u}\, \bigl[\breve M_2(du \mid x_1, x_2) - m_g^2\, du\bigr],$$

which has the character of a cross-spectral density of the point processes associated with infinitesimal regions around the locations x_1 and x_2. Indeed, if integrated over two disjoint spatial regions A, B it gives precisely such a cross-spectrum, namely that between occurrences in regions A and B:

$$\gamma_{A,B}(\omega) = \int_A \int_B \gamma(\omega \mid x_1, x_2)\, \Pi(dx_1)\, \Pi(dx_2). \tag{15.4.6}$$
Some examples and further discussion are given following Propositions 8.2.I and 8.2.III, the latter giving the spectral representation for an isotropic point process in R².

The third decomposition (15.4.5) is the natural form to use when the stationary process is (spatially) homogeneous. It describes the expectation of finding a second point of the process after time lag u at distance y from a first point located at the space–time origin. If the process is also isotropic, then the first-order Palm measure $\mathring M_1(du \times dy)$ depends only on the length |y|. This leads to the possibility of defining a family of Ripley K-functions, K(r | u) say, indexed by the lag u and defined by a decomposition of the type

$$\mathring M_1(du \times S_r) = K(r \mid u)\, du \tag{15.4.7}$$

(S_r is the disc of radius r about the origin in R²). Such functions can be used to examine the change in the characteristics of the K-function with increasing time lags; for large lag u and ergodic processes, the initial choice of spatial origin is irrelevant and the form approximates that for the first moment, in this case the same as for an homogeneous Poisson process. An alternative procedure for examining the same question is to consider the behaviour of the density $\mathring m_1(u, y)$ as a function of the distance |y| for increasing values of u, that is, corresponding roughly to the behaviour of the second moment measure in annular regions about the spatial origin at increasing separations in time.
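An empirical counterpart of K(r | u) can be built directly from its definition as a Palm expectation: average, over points taken as origin, the number of other points within spatial distance r and with time lag in a small window about u. A rough sketch, with edge effects handled crudely by restricting the origins to interior points, and homogeneous Poisson data used as a check (for which K(r | u) ≈ m π r²); all parameter values are illustrative:

```python
import numpy as np

def K_hat(times, locs, r, u, du, origins):
    """Estimate K(r | u): mean number of points at time lag within du/2 of u
    and spatial distance <= r from a typical point, divided by du."""
    counts = []
    for i in origins:
        lag = times - times[i]
        dist = np.linalg.norm(locs - locs[i], axis=1)
        counts.append(int(((np.abs(lag - u) <= du / 2) & (dist <= r)).sum()))
    return float(np.mean(counts)) / du

rng = np.random.default_rng(7)
m, T = 100.0, 50.0                     # intensity per unit space-time, window [0,T]x[0,1]^2
n = rng.poisson(m * T)
times = rng.uniform(0, T, n)
locs = rng.uniform(0, 1, (n, 2))

r, u, du = 0.1, 2.0, 0.5
interior = np.where((times > 5) & (times < T - 5)
                    & (np.abs(locs - 0.5).max(axis=1) < 0.5 - r))[0]
K = K_hat(times, locs, r, u, du, interior[:1000])
print(K, m * np.pi * r**2)             # ~ 3.14 for Poisson data
```

Plotting K against u for clustered data would show the excess over m π r² decaying towards the Poisson benchmark as the lag increases, as described in the text.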
Without homogeneity, or in the presence of additional marks, there is no uniquely defined Palm distribution, but rather a family of Palm distributions indexed by the location, or by the value of the mark at the point selected as time origin. In such a situation, the most useful surrogate may be the average P⁰(·) of the Palm distributions relative to a point at the origin having a specified mark, as discussed around (13.4.2b).

Example 15.4(b) A random walk renewal process. We consider a point process in time and space, with successive points represented as pairs (t_n, x_n). It is assumed that the time points t_n form a renewal process (so that successive intervals t_{n+1} − t_n are i.i.d., with finite mean length µ say), and the x_n (independently) form a random walk, so that the differences x_{n+1} − x_n also form an i.i.d. sequence, independent of the sequence of time intervals. Although this process is not stationary (see Exercise 15.4.3), it is Palm stationary; indeed, its Palm structure, meaning the structure relative to a point pair (t_n, x_n) as origin, is very simple: it is just a random walk in the product space (time × space), with the selected point as origin. From this observation, the first moment measure $\mathring M_1(du \times dy)$ of the Palm distribution which figures in (15.4.5) can be deduced, even though (15.4.5) itself is not meaningful. Let f(·), g(·) denote density functions for the time and space intervals respectively. As in the case of the simple renewal process studied in Chapter 4, the probability density of a second point having time and space coordinates (u, y), given the occurrence of a point at the origin, can be written as a sum of convolution terms, so that for the density $\mathring m_1$ of $\mathring M_1$ we have

$$\mathring m_1(u, y) = \sum_{n=1}^{\infty} f^{n*}(u)\, g^{n*}(y), \tag{15.4.8}$$

f^{n*}(·), g^{n*}(·) denoting the nth convolution powers of f and g, respectively. If $h(u) = \sum_{n=1}^{\infty} f^{n*}(u)$ denotes the ordinary renewal density function in time, obtained by integrating out the spatial components, then we can write

$$\mathring m_1(u, y) = h(u)\, \pi(y \mid u) = h(u) \sum_{n=1}^{\infty} \pi_n(u)\, g^{n*}(y),$$

where π(y | u) denotes the conditional probability density that, if a second point is observed at time u, then its location is at y, and the final sum exhibits this density as the sum of terms π_n(u) = f^{n*}(u)/h(u) for the probability that this second point corresponds to the nth term in the random walk. For example, if the renewal process in time reduces to a Poisson process of rate λ, the weights are simply the Poisson probabilities that precisely n − 1 points have occurred in the time interval (0, u] (see Exercise 15.4.3). Although this discussion is quite general, the asymptotic behaviour of the second moment measure depends crucially on the character of the underlying space. If this is R², the random walk gradually diffuses away from the spatial
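For the Poisson-in-time, Gaussian-in-space special case these formulas are fully explicit: f^{n*} is the Erlang(n, λ) density, g^{n*} the N(0, nσ²) density, h(u) = λ identically, and π_n(u) is the Poisson(λu) probability of the value n − 1. A numerical sketch (λ, σ and the evaluation points are illustrative; the series is truncated at N terms):

```python
import math

lam, sigma, u = 2.0, 1.0, 3.0       # illustrative rate, step s.d., time lag

def erlang(n, u):                    # f^{n*}(u): Erlang(n, lam) density
    return lam**n * u**(n - 1) * math.exp(-lam * u) / math.factorial(n - 1)

def gauss(n, y):                     # g^{n*}(y): N(0, n sigma^2) density
    v = n * sigma**2
    return math.exp(-y**2 / (2 * v)) / math.sqrt(2 * math.pi * v)

N = 150                              # truncation of the series (15.4.8)
h_u = sum(erlang(n, u) for n in range(1, N))          # renewal density h(u)
pi_n = [erlang(n, u) / h_u for n in range(1, N)]      # weights pi_n(u)
m1 = sum(erlang(n, u) * gauss(n, 0.5) for n in range(1, N))   # m1(u, 0.5)

print(h_u)           # ~ 2.0: the renewal density of a Poisson process is lam
print(sum(pi_n))     # ~ 1.0
```

The check h(u) = λ reflects the fact that for a Poisson process the renewal density is constant, so all the u-dependence of $\mathring m_1(u, y)$ sits in the mixture weights π_n(u).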
origin, and h(t, x) → 0 for all x as t → ∞, even though h(t) → 1/µ. We have here an example for which the moment of the Palm distribution has a welldefined stationary form although no corresponding stationary point process exists. If the space is bounded, in particular if R2 is replaced by the circumference of the unit circle S or the surface of the unit sphere, the behaviour is quite different, and a stationary version of the process does exist. In the case of S for example, the convolution powers of a distribution g converge towards the uniform distribution on S, and the function h(t, θ) converges to the positive constant 1/(2πµ), corresponding to the first moment density of the stationary version of the process. See also Exercise 15.4.4. Example 15.4(c) Second-order properties of space–time Poisson cluster processes. As in the case of distance properties, the general results on Poisson cluster processes outlined in Proposition 6.3.III can be used as a starting point for examining the second-order properties of cluster processes in both spatial and space–time contexts. Let us assume that the process of cluster centres is a space–time Poisson process with spatially varying but time-stationary intensity Λ(dt × dx) = λc (x) dt dx, and that, conditional on a cluster centre at time t and location x, the second factorial moment density for the cluster member process, in time and space, can be described by the function ρ[2] (s, y1 , y2 | t, x) = ρ˘(s − t; y1 , y2 | x). These are the natural conditions for a process stationary in time but not necessarily homogeneous in space. Then from the general results of Proposition 6.3.III, we obtain for the ˘ 2 (·) of (15.4.2), density of the function M ρ[2] (u; y1 , y2 | x) λc (x) dx. m ˘ 2 (u; y1 , y2 ) = X
To obtain the decomposition at (15.4.3), note that, for u ≠ 0, and assuming the integrals are finite,

\[ \breve M_2^g(du) = \breve m_2^g(u)\, du = du \int_{\mathcal X \times \mathcal X} \breve m_2(u; y_1, y_2)\, dy_1\, dy_2, \]
so that the bivariate kernel Π_2 of (15.4.3) has a density π_2 given by

\[ \pi_2(y_1, y_2 \mid u) = \breve m_2(u; y_1, y_2) \big/ \breve m_2^g(u) \qquad (u \ne 0). \]

To obtain the further decomposition at (15.4.4), we need first to determine the overall ground rate m^g and the stationary distribution Π(dx) for the
496
15. Spatial Point Processes
spatial locations. Let m(y, u | x) denote the mean density of cluster elements at location y and after time lag u, from a cluster centre at location x. Then the mean cluster size for a cluster with centre x, and the overall ground rate, are given by

\[ \mu(x) = \int_{\mathbb R \times \mathcal X} m(y, u \mid x)\, du\, dy, \qquad m^g = \int_{\mathcal X} \lambda_c(x)\, \mu(x)\, dx. \]
For the density π of the stationary distribution Π of locations we have

\[ \pi(y) = \frac{1}{m^g} \int_{\mathbb R \times \mathcal X} \lambda_c(x)\, m(y, u \mid x)\, du\, dx. \]

Then the kernel M̆_2(du | x_1, x_2) of (15.4.4) has density

\[ \breve m_2(u \mid x_1, x_2) = \breve m_2(u; x_1, x_2) \big/ [\pi(x_1)\, \pi(x_2)]. \]

Whether these functions have convenient explicit forms depends on the particular assumptions made for the cluster structure and on the functions ρ_[2](s; y_1, y_2 | t, x) and m(y, u | x) which then arise. A space–time Neyman–Scott example is outlined in Exercise 15.4.5, and a space–time analogue of the Bartlett–Lewis model, based on a finite version of Example 15.4(b), in Exercise 15.4.6.

Nonparametric estimation of the second-order moment structure is again a difficult exercise, which we review only briefly. Suppose that the process can be assumed to be both stationary in time and homogeneous in space, so that an elementary estimate of the mean rate, such as N(W)/ℓ(W), where W is the space–time observation region and ℓ(W) its Lebesgue measure, is available, and one can proceed to the estimation of second-order properties. Under these assumptions, the most convenient description of the second-order properties is through the first moment measure M̊_1(du × dy) of the Palm distribution, as in (15.4.5). The following is a general approach to its estimation. Let E be a small test set in space–time, for example I_δ(τ) × S_ε(x), where I_δ(τ) is the interval (τ − δ, τ + δ) and S_ε(x) is the sphere with centre x and radius ε. Estimate the Palm moment measure M̊_1(E) by the average
\[ \hat{\mathring M}_1(E) = \frac{1}{N(W)} \sum_{i=1}^{N(W)} N(T_{z_i} E), \tag{15.4.9} \]
where T_z denotes a shift through z and the z_i = (t_i, x_i) denote the observed points. Dividing by the measure 2δπε^2 of the test set E, and varying the parameters τ and x determining the location of E relative to the origin, one can build up a picture of the behaviour of the density of the Palm distribution over dimensions up to the order of some fraction (perhaps a quarter) of the size of the observation region. In an early study of earthquake patterns in New Zealand, Chong (1983) used this kind of technique to examine how occurrence rates varied with time
and distance about a typical event characterized by its magnitude. By taking the test set E to be of annular form, a rough estimate was made of how the occurrence rate varied over various time and distance ranges about an event (earthquake) at the origin with magnitude in a specified small range. Instead of annular sets, a disc could have been used, of radius r say, thus giving an indication of how the form of the K-function varied with the time separation from the origin, as in (15.4.8). In addition to the usual problems of boundary effects and departures from spatial homogeneity, a particular difficulty in Chong's study is that the data are dominated by the clusters initiated by one or two large events, so that no very clear picture of the stationary behaviour can be obtained.

A superficially different (but ultimately equivalent) approach is to plot all pairs (t_i, x_i), (t_j, x_j) (i ≠ j) as points in the product space X × X and estimate the first moment measure of this product counting measure, using a kernel estimate. A recent development of this type of procedure is given in Tanaka and Ogata (2005), who assume that the pairs can be treated as the realization of a nonhomogeneous Poisson process in the product space, and use the Palm intensity for an isotropic process as a surrogate for the Poisson intensity.

Estimation of the Bartlett spectrum for space–time processes assumes stationarity in time, and starts from the sample periodogram, which in turn can be obtained from finite Fourier transforms such as

\[ J_i(\omega) = \int_0^T e^{-i\omega t}\, \big[ N_{A_i}(dt) - m_{A_i}\, dt \big], \]
where m_{A_i} denotes the mean rate of occurrence (in time) of points in A_i. For each partition of the spatial region into sets A_i we can then form the matrix of periodogram estimates

\[ I_{ij}(\omega) = \frac{1}{2\pi T}\, J_i(\omega)\, \bar J_j(\omega). \]

Each such quantity may be regarded as an estimate of the type of cross-periodogram defined in (15.4.6). Further analysis can then proceed by smoothing the periodogram in the frequency domain, and by refining the partition and then smoothing the resulting spectral functions in the spatial domain.

In all cases, boundary effects cause major problems. As already mentioned, a variety of procedures for correcting the biases arising from edge effects have been developed for spatial point processes, and many of these can be adapted for use in the space–time context. Perhaps the most generally useful, even if somewhat inefficient in terms of data use, is to carry out the analysis within an extended space–time region, comprising the inner observation region and a buffer zone surrounding it in space and preceding it in time. Averaging is carried out over points (t_i, x_i) within the observation region only; use of the extended region ensures that full contributions can be obtained even from test sets centred on points near the boundary.
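To make the construction concrete, the finite Fourier transforms and the matrix of periodogram estimates can be sketched numerically. This is our own illustration, not code from the text; all function names are assumptions, and the mean rate m_{A_i} is estimated by N_{A_i}(0, T]/T.

```python
import numpy as np

def finite_fourier(times, T, omega):
    """J(omega) = sum_k exp(-i omega t_k) - m * int_0^T exp(-i omega t) dt,
    the finite Fourier transform of N(dt) - m dt with m = N(0,T]/T."""
    times = np.asarray(times, dtype=float)
    m = times.size / T
    point_part = np.exp(-1j * omega * times).sum()
    if omega == 0.0:
        mean_part = m * T
    else:
        mean_part = m * (1.0 - np.exp(-1j * omega * T)) / (1j * omega)
    return point_part - mean_part

def periodogram_matrix(times_by_region, T, omega):
    """Matrix of (cross-)periodogram estimates I_ij = J_i * conj(J_j) / (2 pi T)."""
    J = np.array([finite_fourier(t, T, omega) for t in times_by_region])
    return np.outer(J, np.conj(J)) / (2.0 * np.pi * T)
```

For a single frequency the matrix is Hermitian with nonnegative real diagonal entries (the ordinary periodograms); smoothing over nearby frequencies would be the next step.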
In a parametric model, the second-order moments will in general be functions of the parameters of interest, so that in principle estimates of the parameters can be obtained by fitting the theoretical forms for the moments to their nonparametric estimates. This corresponds to a form of moment estimation for the model parameters. The difficulty, however, is that little is known about the efficiency of such estimates, and indeed one would expect the efficiency to depend crucially on the sensitivity of the moment density to changes in the parameter values. Channelling the estimation through the spectrum, as suggested many years ago by Whittle (1951), has the advantage of reducing the estimate to linear combinations of approximately independent terms, and was adapted to a point process context by Ogata and Katsura (1991), but even here the variance properties are not easy to determine.

We turn finally to methods based on the conditional intensity. As soon as time appears as a governing variable, the possibility arises of representing the spatial coordinate as a mark, and hence of appealing to the likelihood, simulation, and prediction methods described in Chapters 7 and 14. In the space–time context, the conditional intensity λ*(t, x) becomes a function of two variables with the property that, for every bounded spatial Borel set A, the quantity λ*_A(t) = ∫_A λ*(t, x) dx is a conditional intensity for the process of points with locations in A,

\[ N_A(t) = N_A(0, t] = \int_{(0,t] \times A} N(du \times dx) = N\big((0, t] \times A\big). \]

A suitable reference model in this context is the bivariate Poisson process with constant overall rate λ and independently distributed spatial locations with strictly positive density f(x). The log likelihood ratio then becomes

\[ \log \frac{L_1}{L_0} = \sum_{i=1}^{N^g(T)} \log \frac{\lambda^*(t_i, x_i)}{\lambda f(x_i)} - \int_0^T \!\!\int_V [\lambda^*(t, x) - \lambda f(x)]\, dt\, dx, \tag{15.4.10} \]
where V is the spatial region under consideration. If additional marks are needed, then the conditional intensity becomes a function of three variables, λ*_F(t, x, κ), and a Poisson process with intensity λ f(x) g(κ | x) could be used for the reference process. For computational purposes, the space–time likelihood is often written more conveniently in terms of the ground intensity λ*_g(t) for the ground process and the conditional mark (spatial) distribution f*(x | t), so that λ*(t, x) = λ*_g(t) f*(x | t). The star here indicates that both quantities are conditional on the history F_t up to time t. Provided f* is normalized to a probability density for any given past history, we can write

\[ \log \frac{L_1}{L_0} = \sum_{i=1}^{N^g(T)} \log \frac{\lambda_g^*(t_i)}{\lambda} - \int_0^T [\lambda_g^*(t) - \lambda]\, dt + \sum_{i=1}^{N^g(T)} \log \frac{f^*(x_i \mid t_i)}{f(x_i)}. \]
In many models the two terms have no parameters in common, in which case the optimization can be carried out for each term separately.
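As a sketch of this factorization (the function names and the simple trapezoidal quadrature are our own, not from the text), the log likelihood ratio splits into a ground-process term depending only on λ*_g and a mark term depending only on f*:

```python
import numpy as np

def loglik_ratio(times, locs, lam_g, f_cond, lam0, f0, T, n_grid=2000):
    """Log likelihood ratio against a Poisson reference with rate lam0 and
    location density f0, using lambda*(t, x) = lambda_g*(t) f*(x | t).
    Returns (total, ground_term, mark_term)."""
    tt = np.linspace(0.0, T, n_grid)
    vals = np.array([lam_g(t) - lam0 for t in tt])
    # trapezoidal approximation to int_0^T (lambda_g*(t) - lam0) dt
    integral = ((vals[:-1] + vals[1:]) / 2.0 * np.diff(tt)).sum()
    ground = sum(np.log(lam_g(t) / lam0) for t in times) - integral
    # the mark term has no integral part because f* is a probability density
    mark = sum(np.log(f_cond(x, t) / f0(x)) for t, x in zip(times, locs))
    return ground + mark, ground, mark
```

When the ground and mark models share no parameters, each term can be maximized on its own, exactly as the text observes.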
As in the estimation of moment measures, parameter estimates obtained via likelihood maximization can be seriously biased by problems associated with both initial values and boundary effects. In the rare cases where the data permit, a buffer zone around the observation region, in both time (preceding) and space (surrounding), may be introduced and the likelihood ratio computed just for the data points lying within the observation region; the buffer zone then provides the information needed to compute the conditional intensity within the observation region. In principle it could be replaced by a form of stationary distribution for the boundary effects, analogous to the forward recurrence time distribution for a renewal process, but finding the analytical form for such boundary terms is a problem of similar difficulty to the Ising problem (see also Section 15.6).

Both Bayesian and non-Bayesian approaches to parameter estimation exist. Software routines for likelihood estimation, based on direct maximization of the likelihood or the posterior density, are incorporated in (among other packages) David Harte's SSLib routines [Harte (2003), Brownrigg and Harte (2005)], where the procedures extend to point processes with i.i.d. marks. Some of the most notable studies in the Bayesian direction have been made by Ogata and colleagues in Tokyo, following the general approach to nonstationary time series modelling (through ABIC) suggested by Akaike. Recent descriptions of their methods applied to nonstationary versions of the ETAS model are given in Ogata et al. (2003), Ogata (2004), and Ogata and Zhuang (2006), and are briefly described below.

Example 15.4(d) Space–time ETAS model [see Example 6.4(d)]. After the Poisson model, this has become the best-known model in seismicity studies, where it is used as a first approximation to a wide range of catalogue data.
In particular, local departures from the fit of an overall ETAS model have become an important diagnostic tool in identifying anomalous regions or time intervals. The complete conditional intensity λ† for the spatial ETAS model has the three-dimensional form

\[ \lambda^{\dagger}(t, x, M) = \beta e^{-\beta(M - M_0)} \Big[ \mu f(x) + A \sum_{i:\, t_i < t} e^{\alpha(M_i - M_0)}\, g(t - t_i)\, h(x - x_i \mid M_i) \Big], \tag{15.4.11} \]

where µ is the arrival rate of 'immigrants', f(x) is the probability density for the location of a newly arrived immigrant, g and h are probability densities (in time and space, respectively) for the time and space coordinates of an 'offspring' event about its parent, the first exponential term βe^{−β(M−M_0)} describes the distribution of the 'magnitude' of a newly occurring event above a fixed threshold M_0, A is a constant determining the criticality of the process, and the other exponential term e^{α(M_i−M_0)} describes the factor by which the parent's magnitude M_i inflates the expected number of offspring. In typical earthquake applications [e.g., Ogata (1998)],
500
15. Spatial Point Processes
\[ g(t) = \frac{p-1}{c} \Big(1 + \frac{t}{c}\Big)^{-p} \qquad (t > 0), \]

and

\[ h(x \mid M) = \frac{q-1}{\pi D e^{\alpha M}} \Big(1 + \frac{|x|^2}{D e^{\alpha M}}\Big)^{-q}. \]
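A direct transcription of (15.4.11), with the g and h just displayed, can be sketched as follows. This is our own illustration, not code from the text: the function names and argument layout are assumptions, as is the convention of writing the spatial scale of h as De^{αM_i}. The helper evaluates the offspring mean Aβ/(β − α).

```python
import numpy as np

def etas_intensity(t, x, y, M, history, mu, f_bg, A, alpha, beta, p, c, D, q, M0):
    """Sketch of lambda_dagger(t, x, M) in (15.4.11): magnitude density times
    (background + triggered) space-time rate; history holds (t_i, x_i, y_i, M_i)."""
    mag_density = beta * np.exp(-beta * (M - M0))
    rate = mu * f_bg(x, y)
    for ti, xi, yi, Mi in history:
        if ti >= t:
            continue
        dt = t - ti
        g = (p - 1.0) / c * (1.0 + dt / c) ** (-p)      # temporal offspring density
        scale = D * np.exp(alpha * Mi)                   # spatial scale of h(. | M_i)
        r2 = (x - xi) ** 2 + (y - yi) ** 2
        h = (q - 1.0) / (np.pi * scale) * (1.0 + r2 / scale) ** (-q)
        rate += A * np.exp(alpha * (Mi - M0)) * g * h
    return mag_density * rate

def branching_ratio(A, alpha, beta):
    """Expected number of direct offspring per ancestor: A*beta/(beta - alpha),
    finite only when beta > alpha; the process is subcritical when this is < 1."""
    return A * beta / (beta - alpha)
```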
The condition for stability is ρ < 1, where

\[ \rho = A\beta \int_{M_0}^{\infty} e^{\alpha(M - M_0)}\, e^{-\beta(M - M_0)}\, dM = \frac{A\beta}{\beta - \alpha} \]

can be interpreted as the expected number of direct offspring per ancestor, averaged over the ancestor's magnitude. In a semiparametric version of this model, model parameters such as p, c, D, α, q are fitted separately to distinct time intervals and geographic regions. Then ABIC methods are used to select the parameters of the prior distributions for the model parameters. Here the prior distributions (through their 'hyperparameters') control the smoothness of the fitted parameters (i.e., the rate of change of their values with changes in time or space), and the ABIC procedure is used to select the hyperparameters that give an optimal degree of smoothing. Ogata's most recent studies use a Delaunay tessellation of the space–time observation region into cells, with each cell containing just one point of the observed process; three-dimensional splines are then used to effect the linking between cells. The general method follows Tanemura et al. (1983); recent applications are in the three papers cited earlier.

A rather different approach to space–time models is illustrated by the linked stress release model of Example 7.3(d). Here the observation region is subdivided into a small number of subregions, and the process is analyzed as a multivariate point process, the linkage between subregions being controlled in this example by a transfer to neighbouring regions of some proportion of the stress released by the occurrence of an event in any given region. Analogous models can be envisaged involving the spatial spread of infection for (spatial) epidemic models, or the extent of fire risk in forest fire models.

The next example illustrates a further type of model structure in which external variables are incorporated into the conditional intensity.

Example 15.4(e) Mutually exciting model for electric signals data.
The model used here is straightforward; it is a version of the mutually exciting Lin–Lin model described, for example, in Utsu and Ogata (1997). It points to some of the practical issues that may arise in attempting to develop explanatory models for spatially distributed data. The data consisted of a roughly 25-year list of times and locations of moderate-size earthquakes, M ≥ 4, occurring within a 200 km radius of Beijing, together with records of ultra-low-frequency electric signals from five stations at essentially arbitrary locations z_r (r = 1, . . . , 5) within the region. To simplify the analysis, both sets of
data were converted to daily {0, 1} values, and modelled as mutually exciting discrete-time point processes. The main question of interest is whether the electric signals show any predictive power. A general, continuous, version of the model used to model the earthquakes can be represented through a conditional intensity

\[ \lambda(t, x) = \mu(x) + \lambda_S(t, x) + \lambda_E(t, x), \tag{15.4.12} \]
where µ(x) is an underlying spatial density of background events, and λ_E(t, x) and λ_S(t, x) are the earthquake-clustering and signal-generated components of the intensity at time t and location x. It is assumed that these can be expressed in the form

\[ \lambda_S(t, x) = \int_{-\infty}^{t}\!\!\int h_S(t - u, x - w)\, N_S(du \times dw), \]
where N_S is the counting process for the target signals, and the response function h_S can be represented parametrically as (for example) a sum of Laguerre polynomials; there is a similar representation for λ_E via h_E and N_E. The first difficulty that arises is that although the earthquakes are well approximated by a space–time point process, the source of the signals is unknown, as indeed is the physical mechanism that causes them, if indeed they are associated with earthquakes at all. Thus the only available information about the signals is their occurrence or nonoccurrence at each of the recording stations. A visual inspection of the data suggests at most a loose association between the occurrence of a signal at one of the stations and the distance from that station to any temporally nearby earthquake. This suggests either ignoring the spatial dependence in the response term (making h_S a function of time only) or possibly introducing a weakly decaying spatial component with contributions from each station, h_S(t, x) = f(t) Σ_{r=1}^{5} g(x − z_r), thus assuming some cumulative effect if signals are observed simultaneously at several stations.

In the analysis described in Zhuang et al. (2005), locations were ignored in the initial model formulation, but this preliminary model was used as a diagnostic tool to examine the ability of the model to predict earthquake events in different parts of the study region, first using each station in turn as the source of the electric signals data and comparing the log probability gains for different classes of events, and then using signals from different combinations of the stations. The overall analyses showed that, although earthquake clustering (described by the h_E term) made the largest contribution to the conditional intensity, the electric signals terms gave significant additional predictive power.
The results reinforced the suggestion of a weak distance effect, but suggested that there were other, probably more important, effects masking any such dependence. On the other hand, when the roles of signals and earthquakes were reversed, the earthquakes showed no significant predictive power for the electric signals.
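A minimal discrete-time sketch of such a mutually exciting structure follows. It is our own illustration, not the fitted model: exponential-decay responses stand in for the Laguerre expansions, all names and parameter values are assumptions, and daily intensities are treated as Bernoulli probabilities when computing the log probability gain.

```python
import numpy as np

def daily_intensities(eq, sig, mu, a_e, b_e, a_s, b_s):
    """Discrete-time mutually exciting intensity for a 0/1 earthquake series `eq`
    and a 0/1 signal series `sig`; responses are a*exp(-b*lag) (hypothetical)."""
    n = len(eq)
    lam = np.full(n, float(mu))
    for t in range(n):
        for u in range(t):
            lag = t - u
            lam[t] += eq[u] * a_e * np.exp(-b_e * lag)   # earthquake clustering
            lam[t] += sig[u] * a_s * np.exp(-b_s * lag)  # signal-triggered part
    return lam

def log_prob_gain(eq, lam, lam_ref):
    """Log probability gain over a constant reference rate, with the Bernoulli
    approximation Pr{event on day t} ~ lam[t]."""
    lam = np.clip(lam, 1e-12, 1 - 1e-12)
    ll = np.where(eq == 1, np.log(lam), np.log(1 - lam)).sum()
    ll_ref = np.where(eq == 1, np.log(lam_ref), np.log(1 - lam_ref)).sum()
    return ll - ll_ref
```

Comparing the gain with and without the signal term (a_s = 0) mimics the comparison of log probability gains described above.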
In Section 7.5 we outlined one of the major advantages of the formulation in terms of conditional intensities, namely its role as the basis of simulation and prediction procedures, as well as in estimation and model selection. The simulation procedures apply as well to space–time models, regarded as MPPs, as to simple point processes in time (cf. Algorithm 7.5.V). Here we add a few further comments on the use of conditional intensity and simulation-based methods in model testing.

Leaving aside likelihood ratio tests and associated AIC procedures, which apply to point processes as to any other class of stochastic models, we turn to a class of diagnostic procedures which have been developed more specifically for point processes in recent years, first in the time domain, then more generally. We outlined Ogata's residual method, based on the time-change theorem, in Section 7.4. In principle it can be extended to multivariate processes and MPPs by the extended time-change results of Section 14.6. A more general family of diagnostic tests for space–time processes is based on checking for discrepancies in expressions of the form

\[ R(h) \equiv \int_{\mathbb R \times \mathbb R^2} h(t, \kappa)\, \big[ N(dt \times d\kappa) - \lambda^*(t, \kappa)\, dt\, d\kappa \big], \]
where h(t, κ) is a bounded, left-continuous, or more generally F-predictable, function which vanishes outside a bounded set. When λ* is the true conditional intensity, the terms in square brackets are martingale increments, so any such integral has expected value 0, and from a quadratic variation argument [cf. Exercise 14.1.16 and Zhuang (2006)]

\[ \operatorname{var} R(h) = \int_{\mathbb R \times \mathbb R^2} [h(t, \kappa)]^2\, \lambda^*(t, \kappa)\, dt\, d\kappa. \]
Consequently, R(h) can be made the basis for a rough test. Particular choices for h(·) typically take the form of a space–time window, the indicator function I_{(T_1,T_2] × A}(t, κ), weighted to give raw residuals (no further weighting), Pearson residuals (weighting by 1/√λ*(t, κ)), inverse-λ residuals (weighting by 1/λ*(t, κ)), and so on. In the next section we describe the corresponding residuals for spatial processes, which use the Papangelou intensity in place of the conditional intensity. In both contexts, the residual methods focus on local discrepancies between the data and the fitted model, such as might not show up in a global analysis or model selection procedure. Many further details are given for spatial processes in Baddeley et al. (2005); see also Schoenberg (2002) and Zhuang (2006), who extends the approach to second-order residuals that consider local departures from the expected behaviour of pairs of points.

One drawback to these procedures in their current form is that the data are most commonly compared to a conditional intensity based on the fitted
model, so that the procedures ignore the bias that comes from using the same data for both fitting and testing.

One final procedure we mention is the use of the entropy score, or average log likelihood ratio, as a test statistic. For observations over a finite time window, the log likelihood ratio is computed from the data for the given model against a reference model such as a constant-rate Poisson process. The same log likelihood ratio is then computed from data simulated over the same window from the given model. From the simulations, a histogram of values for the simulated log likelihood ratios can be obtained, and the observed value (from the data) located within this histogram. If the observed value lies in the extreme tails, the model can be rejected. It is usually preferable to average the likelihood ratio, by dividing by the number of data points or by the space–time volume of the observation window, to give an entropy score per observation or per space–time unit, as discussed for example in Daley and Vere-Jones (2004) or Bebbington (2005). This gives a type of overall, portmanteau, test, in contrast to the residual methods, which highlight localized areas of disagreement.

We conclude this section with a rather different space–time model, in which the role of time is simply to allow points to grow into sets.

Example 15.4(f) Lilypond protocol models. A diverse range of models in stochastic geometry stem from a protocol initially studied in Häggström and Meester (1996) and Daley, Stoyan, and Stoyan (1999) as germ–grain models (cf. the examples of particle systems in Section 6.4). Any particular model is in fact determined totally via a protocol acting on a point process N_g (in some c.s.m.s. X) so as to construct particles via a uniform growth mechanism that leads to a marked point process N = {(x_i, κ_i)} in which the marks κ_i are totally determined by the point set N_g.
This structure contrasts starkly with the cluster mechanisms of Section 6.2 and many other MPPs we have studied, where the marks are often independent of the process N_g. In the simple lilypond germ–grain model of the cited references, N_g is a stationary Poisson process in R^d. At time zero, d-dimensional hyperspheres ('grains') start growing, one around each point as centre, at the same unit rate for each and every grain; any particular grain stops growing when it touches another grain, which itself may have ceased growing earlier or else, because it touches the particular grain at the same instant, also then ceases growing. Ultimately, this leads to an MPP {(x_i, κ_i)} whose components are, respectively, the centres and radii of the grains. Because every grain touches at least one other grain, questions arise about how far this 'touching' mechanism extends, in addition to more obvious questions about the distribution of the size of a typical grain and, when each realization of N_g is a boundedly finite set of points, the fraction of R^d that is covered by some grain. The former question leads us to define 'clusters' of grains: each grain belongs to a unique cluster consisting of the union of itself and all grains that touch, or are touched by, some member of the cluster. Häggström and Meester showed that in R^d,
no matter what finite d, every cluster is of finite extent; that is, there is no infinite cluster or, in their language, no percolation. Daley, Stoyan, and Stoyan report numerical studies of distributional properties of the radii {κ_i} for dimensions d = 1, 2, 3; Daley, Mallows, and Shepp (2000) give algebraic formulae describing several properties when d = 1 (cf. Exercise 15.4.5): for example, when N_g is a Poisson process on R at unit rate, a generic grain has size V (i.e., length, so V_i = 2κ_i for all i) with distribution

\[ \Pr\{V > y\} = e^{-y} \exp(e^{-y} - 1). \tag{15.4.13} \]
Such germ–grain models can equally start from ground processes N_g other than Poisson, and may well have grains that are not necessarily spherical: they could, for example, be similarly oriented hypercubes [Daley (2004)]. Or N_g could be some point set that is a minor perturbation of a lattice set, in which case the resulting grains may all be of about the same size. Or again, for X = R^2, the grains could be randomly oriented finite line segments L_i, say, grown centrally about x_i and ceasing growth when one of their growing tips contacts another line segment. Simulation studies of these models with N_g a Poisson process indicate that the form of the distribution (15.4.13) holds approximately if V is interpreted as the d-dimensional volume of the grain or (in the case of line segments) the area of the circle with the line-segment length as diameter. The fact that such properties persist, albeit approximately, across dimensions and geometrical shapes points to some further manifestation of the strong influence of 'pure randomness' associated with Poisson processes.

Observe that because the grain radii {κ_i} are determinate given the germs {x_i} of N_g, the only model-fitting that might be possible would concern the distribution of N_g on the basis of the distribution of these radii. When N_g is Poisson, the only parameter to estimate is its intensity.
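In one dimension the growth protocol can be simulated exactly by an event-driven scheme (our own sketch, not from the text): only adjacent grains can touch, so at each step all active grains are advanced to the next contact time and the active members of the touching pair are frozen.

```python
import numpy as np

def lilypond_1d(points):
    """Event-driven lilypond growth on the line: every grain grows at unit rate
    and freezes the instant it touches another grain (only neighbours can touch)."""
    x = np.sort(np.asarray(points, dtype=float))
    n = len(x)
    r = np.zeros(n)
    active = np.ones(n, dtype=bool)
    while active.any():
        best, pair = np.inf, None
        for i in range(n - 1):
            if not (active[i] or active[i + 1]):
                continue
            gap = (x[i + 1] - x[i]) - (r[i] + r[i + 1])   # remaining separation
            speed = int(active[i]) + int(active[i + 1])   # closing speed: 1 or 2
            tau = max(gap, 0.0) / speed
            if tau < best:
                best, pair = tau, (i, i + 1)
        if pair is None:
            break          # a single isolated active grain would grow forever
        r[active] += best  # advance every still-growing grain to the event time
        active[pair[0]] = active[pair[1]] = False
    return x, r
```

The returned radii satisfy the hard-sphere constraint κ_i + κ_j ≤ d(x_i, x_j) with equality for at least one neighbour of each grain, as in the algebraic definition of Exercise 15.4.5.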
Exercises and Complements to Section 15.4

15.4.1 (a) In the M/G/∞ queueing model of Example 15.4(a), Poisson arrivals are served by an unlimited number of servers with a common service time distribution. Recast this in the form of the example to obtain expressions for the mean number of occupied servers and its p.g.fl. [Hint: Take X to be Z_+ and identify the delay with the service time distribution.]
(b) Similarly recast Example 10.4(e) in the form of Example 15.4(a) in the special case when the birth and death probabilities of an individual may depend on the location of the individual but not on the rest of the configuration.

15.4.2 (Continuation). Extend the arguments for the first moment measure in Example 15.4(a) to find an expression for the second factorial moment measure of the process N_t on the location space X in terms of the second-order moment measures of the underlying process N. Consider in particular the situation where the locations and lifetimes are independent of each other and i.i.d. [Hint: In any case, the aim is to find an expression for the probability that at time t, two particles are extant in disjoint subsets dx_1, dx_2 of X. This requires both particles to have been born at times s_1, s_2 before t.]
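For part (a) of Exercise 15.4.1, the classical fact the exercise recovers is that the number of occupied servers at time t, starting empty, is Poisson with mean λ∫₀^t (1 − G(u)) du. A small Monte Carlo check (our own sketch; names are assumptions):

```python
import numpy as np

def busy_servers(t, rate, service_sampler, rng):
    """Number of customers still in service at time t for an M/G/inf queue
    started empty: arrivals on (0, t] are Poisson(rate*t), scattered uniformly."""
    n = rng.poisson(rate * t)
    arrivals = rng.uniform(0.0, t, size=n)
    services = service_sampler(rng, n)
    return int(np.sum(arrivals + services > t))
```

With Exp(1) service times, G(u) = 1 − e^{−u}, so the target mean is λ(1 − e^{−t}).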
15.4.3 Space–time renewal process: moment measures [cf. Example 15.4(b)].
(a) Use the discussion in Example 15.4(b) to show that although the process there has a stationary ground process, it is not stationary when X = R^2. [Hint: Write the first moment density in the form m^g π(x | t) and show that π(x | t) → 0 as t → ∞.]
(b) Show that, by contrast, when X = S (the circle) a stationary version of the process does exist. Find the stationary mark distribution, the forms of the quantities M̆_2^g(du) and Π_2(dx_1 × dx_2 | u) of (15.4.3), and the quantity M̆_2(du | x_1, x_2) of (15.4.4). [Hint: Use the results of Exercise 12.1.4.]

15.4.4 Space–time cluster models [special cases of Example 15.4(b)].
(a) Neyman–Scott process [continued from Example 6.3(a)]. Suppose the number of elements in the cluster has first and second factorial moments µ_1 and µ_[2], and that individuals are distributed relative to the cluster centre according to a distribution with density f(u, x) on R × R^2, where u represents the time delay and x the displacement from the cluster centre. Check that the second cumulant measure for the resultant process still has density given by (6.3.19), namely,

\[ \breve c_{[2]}(u) = \mu_c\, \mu_{[2]} \int_{\mathbb R^d} \breve f(y + u)\, \breve f(y)\, dy, \]

with suitable interpretations of f̆, y, and u. Find expressions for the mean spatial density and the space–time covariance density, and compare with the forms given around Example 15.4(c). [Hint: Here the dependence on locations reduces to a function of the difference y_1 − y_2 between the spatial locations. The process will not be homogeneous unless, in addition, the process of cluster centres is homogeneous.]
(b) Bartlett–Lewis process [continued from Example 6.3(b)]. Suppose the cluster process has the form of a finite space–time random walk as in Example 15.4(b), where the component distributions are constant in time and space. Show that (6.3.23) continues to hold with an appropriate interpretation of the functions F. Use this to examine the second-order properties as in (a) above.
(c) Investigate extensions of the previous two examples to situations where the component distributions are location-dependent.

15.4.5 Lilypond systems.
(a) An algebraic definition of a lilypond system {(x_i, κ_i): i = 1, 2, . . .}, for (x_i, κ_i) ∈ R^d × R_+, is that the pairs should satisfy, with probability one: (i) {x_i} is a boundedly finite point set; (ii) κ_i + κ_j ≤ d(x_i, x_j) (all i ≠ j); and (iii) for every i, equality holds in (ii) for at least one and at most two j.
(b) A lilypond system {(x_i, κ_i)} has a descending chain if there exists a sequence of germs {x_j} such that |x_j − x_{j+1}| < |x_{j−1} − x_j| for j = 2, 3, . . . . Show that a system that has no descending chain cannot percolate. [Hint: See Daley and Last (2005) and Heveling and Last (2006).]
(c) In Example 15.4(f) with d = 1 and N_g a Poisson process at unit rate, first consider N_g only on R_+ and let Q(t) = Pr{origin is uncovered after
time t of growth}. Show that

\[ 1 - Q(t) = \int_0^t e^{-u} \cdot e^{-u} Q(u)\, du; \]

the corresponding differential equation has solution Q(t) = exp(−½(1 − e^{−2t})). By considering i.i.d. systems on R_− and R_+, deduce that for N_g on the whole of R, Pr{no grain covers the origin} = [Q(∞)]^2 = e^{−1}. The generic grain-length V at (15.4.13) has mean satisfying 1 − E(V) = e^{−1}: interpret.
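The identities in part (c) are easy to confirm numerically (our own check, using midpoint quadrature; function names are ours):

```python
import math

def Q(t):
    """Q(t) = exp(-(1 - e^{-2t})/2): probability the origin is still uncovered."""
    return math.exp(-0.5 * (1.0 - math.exp(-2.0 * t)))

def check_integral_equation(t, n=20000):
    """Residual of 1 - Q(t) = int_0^t e^{-2u} Q(u) du, by midpoint quadrature."""
    h = t / n
    integral = sum(math.exp(-2.0 * u) * Q(u) * h
                   for u in (h * (k + 0.5) for k in range(n)))
    return abs((1.0 - Q(t)) - integral)

def mean_grain_length(upper=40.0, n=200000):
    """E(V) = int_0^inf Pr{V > y} dy with the tail (15.4.13); equals 1 - 1/e."""
    h = upper / n
    return sum(math.exp(-y) * math.exp(math.exp(-y) - 1.0) * h
               for y in (h * (k + 0.5) for k in range(n)))
```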
15.5. The Papangelou Intensity and Finite Point Patterns

As already mentioned more than once in this chapter, the salient difficulty in handling spatial processes is the lack of a time-like dimension through which anything resembling process dynamics can be described. Much effort has gone into circumventing the consequent modelling problems, and in this section we focus on what is perhaps the most important approach, based around the notion of the Papangelou intensity ρ(y | N), introduced in Papangelou (1974b) following earlier work on lattice processes [see Besag (1974)]. The Papangelou intensity has statistical mechanical connections [see Example 15.5(a) and Section 15.6]. For our purposes here we start from Definition 10.4.I, which characterizes the appearance of ρ(· | ·) in the analysis of finite point patterns; it is given there in terms of the Janossy densities of Section 5.3. For a pattern² x = {x_1, . . . , x_k} containing exactly k points, we write

\[ \rho(y \mid x) = \frac{j_{k+1}(y, x_1, \ldots, x_k)}{j_k(x_1, \ldots, x_k)} = \frac{j_{k+1}(y, x)}{j_k(x)} = \frac{j(y \cup x)}{j(x)}, \tag{15.5.1a} \]

provided the test location y ∉ x. Roughly speaking, ρ(y | x) can be interpreted as the conditional intensity for finding a point of the realization at y, given the realization of the process throughout the remainder of the state space, which in this section we denote W to emphasize that it is most commonly a bounded observation region (W for Window) within two- or three-dimensional space. In any case,

\[ \Pr\{\text{realization contains } k+1 \text{ particles in } (y, y + dy),\ (x_i, x_i + dx_i)\ (i = 1, \ldots, k)\} = [1 + o(1)]\, j_{k+1}(y, x_1, \ldots, x_k)\, dy\, dx_1 \cdots dx_k, \]
\[ \Pr\{\text{realization contains } k+1 \text{ particles in } (y, y + dy),\ (x, x + dx)\} = [1 + o(1)]\, j_{k+1}(y, x)\, dy\, dx, \]

² We make much use of x below, as earlier in Section 10.4, where elsewhere in the book we have used N, stressing the (integer-valued) random measure representation: the choice is a matter of convenience and emphasis towards readability. If needed, we write n(x) for card(x) = #(x) without comment. Thus, j_k(x_1, . . . , x_k) = j_{n(x)}(x) no matter what finite k, and we write this last as j(x) where no ambiguity may arise, as in j(y ∪ x) = j_{n(y ∪ x)}(y ∪ x) and (15.5.1a). Also, in Section 10.4 we called ρ(· | ·) the Papangelou conditional intensity; the briefer term is more convenient here.
with a similar interpretation for j_k, so in heuristic terms it is clear that (15.5.1a) represents the required conditional density. In applications to inference, however, we are often concerned precisely with the value of ρ at observed points; in this case we consider ‘one fewer points’ and define the Papangelou intensity, for x_k ∈ x = {x₁, …, x_k}, by

$$\rho(x_k \mid x) = \frac{j_k(x_1, \ldots, x_k)}{j_{k-1}(x_1, \ldots, x_{k-1})} = \frac{j(x)}{j(x \setminus x_k)}. \tag{15.5.1b}$$

For an empty realization x = ∅, we set ρ(x | ∅) = j₁(x)/j₀ = ρ(x | {x}). The explicit dependence of ρ on the number k of points in the realization, which is a feature of its definition via Janossy densities, can be subsumed in its general dependence on the sample, so that ρ can be regarded as a sample-based function, using either the notation ρ(y | x) as above, or ρ(y | N) when it is more convenient to consider the realization as a counting measure. Keeping to the former notation, and adopting a similar notation for the Janossy measures themselves, we can write the definition of both first- and higher-order Papangelou intensities in the single form [cf. (15.5.1a) and (15.5.1b)], as we use shortly and which perhaps best reveals its underlying character,

$$\rho(y \mid x) = \frac{j(x \cup y)}{j(x \setminus y)} = \rho(y \setminus x \mid x)\, \rho(y \cap x \mid x). \tag{15.5.2}$$
Example 15.5(a) Gibbs process with pairwise interactions [cf. Examples 5.3(c) and 7.1(b)]. In Example 5.3(c) a Gibbs process on a bounded region W is specified through Janossy densities of the form (but in the present notation)

$$j_{n(x)}(x) = C \exp\left( \sum_{i=1}^{n(x)} \psi_1(x_i) + \sum_{i=2}^{n(x)} \sum_{j=1}^{i-1} \psi_2(x_i, x_j) \right), \tag{15.5.3}$$

where ψ₁(·) represents a potential energy due to an external force field and ψ₂(·) represents an interaction potential, whereas the expression in the exponent is some measure of the energy of the system, under equilibrium conditions, when it happens that the system has k = N(W) = n(x) particles and x describes their locations. For this model we have

$$\rho(y \mid x) = \exp\left( \psi_1(y) + \sum_{i=1}^{N(W)} \psi_2(y, x_i) \right) \qquad (y \notin x),$$

$$\rho(x_j \mid x) = \exp\left( \psi_1(x_j) + \sum_{i \neq j} \psi_2(x_j, x_i) \right) \qquad (x_j \in x).$$

Similarly, for the second-order Papangelou intensity we have, for example, for y₁, y₂ ∉ x,

$$\rho(y_1, y_2 \mid x) = \exp\left( \psi_1(y_1) + \psi_1(y_2) + \sum_{j=1}^{2} \sum_{i=1}^{N(W)} \psi_2(y_j, x_i) + \psi_2(y_1, y_2) \right).$$
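The first-order intensity in this example is straightforward to evaluate numerically. The sketch below assumes, purely for illustration, a constant external potential ψ₁ and a Strauss-type pair potential ψ₂ (inhibition within a fixed interaction range); these specific choices, and the sample pattern, are not taken from the text.

```python
import math

def rho(y, xs, psi1, psi2):
    """First-order Papangelou intensity rho(y | x) of the pairwise-interaction
    Gibbs model: exp(psi1(y) + sum_i psi2(y, x_i)), for a test location y not in x."""
    return math.exp(psi1(y) + sum(psi2(y, xi) for xi in xs))

# Illustrative (assumed) potentials: constant external field log(beta) and a
# Strauss-type interaction log(gamma) for pairs closer than r.
log_beta, log_gamma, r = math.log(2.0), math.log(0.5), 0.1
psi1 = lambda y: log_beta
psi2 = lambda y, x: log_gamma if math.dist(y, x) < r else 0.0

xs = [(0.2, 0.2), (0.25, 0.2), (0.8, 0.9)]          # a toy pattern in the unit square
print(rho((0.22, 0.21), xs, psi1, psi2))  # two neighbours within r: 2 * 0.5**2 = 0.5
print(rho((0.5, 0.5), xs, psi1, psi2))    # no neighbours: beta = 2.0
```

Adding a point near existing ones multiplies the intensity by one interaction factor per neighbour, which is exactly the exponent structure of (15.5.3).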
It is as a sample function that the Papangelou intensity appears most naturally in statistical applications, particularly in questions relating to the likelihood. Given a realization x, we know already that the likelihood is just the Janossy density j_{n(x)}(x). Taking some arbitrary enumeration {x₁, …, x_{n(x)}} of the points in the sample, writing x_r for the set of the first r elements in this enumeration, and repeatedly using the ratio in (15.5.1b), we have

$$\log L = \sum_{i=1}^{n(x)} \log \rho(x_i \mid x_{i-1}) + \log j_0. \tag{15.5.4}$$
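The sequential factorization (15.5.4) can be checked directly in the Poisson case, where ρ(y | x) = λ(y) for every x and log j₀ = −∫_W λ(u) du, so the sum reproduces the familiar Poisson log likelihood. A minimal sketch, in which the constant intensity λ = 2 on the unit square and the three sample points are illustrative assumptions:

```python
import math

def log_lik_sequential(points, rho, log_j0):
    """Log likelihood via (15.5.4): sum_i log rho(x_i | x_{i-1}) + log j0,
    where x_{i-1} is the set of the first i-1 points in the chosen enumeration."""
    total = log_j0
    for i, xi in enumerate(points):
        total += math.log(rho(xi, points[:i]))
    return total

# Poisson check (assumed model): lambda(u) = 2 on W = [0,1]^2, so
# rho(y | x) = 2 for every x and log j0 = -integral of lambda over W = -2.
lam = lambda u: 2.0
pts = [(0.1, 0.4), (0.7, 0.2), (0.3, 0.9)]
ll = log_lik_sequential(pts, lambda y, x: lam(y), log_j0=-2.0)
print(ll)  # 3*log 2 - 2
```

For a Gibbs model the same routine applies with the ρ of Example 15.5(a), but log j₀ is then the (hard-to-compute) log partition function discussed next.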
This form is quite simple, but it has the disadvantage of treating the observed process as a finite process defined wholly within the observation region. In fact the observations are more commonly the restriction to the observation region of a point process defined within a much larger region. As already discussed around Definition 7.1.II, the Janossy densities that in principle should be used in such a situation are the local Janossy densities of Definition 5.4.IV. Their determination even in such a relatively simple model as the Gibbs process just considered is a formidable task. Even in treating a Gibbs process as a finite point process in its own right, a significant problem arises in the calculation of the last term in (15.5.4), which reduces to the logarithm of the partition function. For these and other related reasons, parameter estimation via maximization of the exact likelihood of a spatial point process is commonly replaced by the maximization of the pseudolikelihood, introduced already in Example 7.1(b), and defined by analogy with the likelihood³ for a spatial Poisson process by

$$\log L^{\dagger}(x) = \sum_{x_i \in x} \log \rho(x_i \mid x) - \int_W \rho(y \mid x)\, dy. \tag{15.5.5}$$

³ Equation (15.5.5) is correct; the equation for L†(x) in Volume I (first impression) p. 217 is incomplete: it is missing a term exp(−∫_W ρ(y | x) dy).

Here the difficult term j₀ disappears and the pseudolikelihood is relatively easily calculated for the Gibbs models above and some Markov spatial processes of Section 10.4. See Exercise 15.5.1. For both time and space–time point processes, replacing the Poisson intensity by the conditional intensity still leads to an exact likelihood, but unfortunately the same is not true for spatial point processes when the Poisson intensity is replaced by the Papangelou intensity. Nevertheless many useful properties have been shown to hold for maximum pseudolikelihood estimates in the point process context, including consistency and asymptotic normality (Section 9.2 of Møller and Waagepetersen (2004) has a good survey). Moreover they are quite easily modified if data from a buffer region are available, much as discussed for estimation of the moment measures in Section 15.2. In such a case the Papangelou intensity is calculated for the realization in the
extended region, and this sharpened form for the Papangelou intensity is then substituted into (15.5.5), which is evaluated over the observation region only. Similar modifications can be incorporated for Hanisch-type edge corrections. One context where this difficulty is avoided is for point processes on a circle or sphere, for here the process can be at once both a.s. finite and stationary (invariant under rotations), and also there are no edges.

Example 15.5(b) Pairwise interaction process on a circle or sphere [Billiot and Goulard (2001)]. The study quoted in the heading was prompted by a desire to describe and interpret the variation between the angles of the separate strands in the root system of a maize plant. For this purpose, repeated observations on the solid angles between strands were recorded for a sample of maize plants. The data clearly show some tendency towards regular spacing, so as a basic model the authors suggest a pairwise interaction process as in Example 15.5(a), but assume in addition that the structure is invariant under rotations of the sphere (i.e., it is homogeneous). This implies that the first-order term must reduce to a constant, and that the pairwise interaction term must be a function only of the angle (length of great circle arc) θ_ij between the pair of points x_i, x_j under consideration. Now any pair of points can be joined by two great circle arcs, one longer than the other; we suppose in the sequel that 0 < θ_ij ≤ π, so that the smaller distance (angle) is always chosen. For definiteness we speak of points on a sphere, but it is clear (because no more than a pair of points needs to be considered in the first- and second-order potential functions) that a similar analysis could be carried out for points on a sphere in any finite-dimensional Euclidean space. Under the additional conditions quoted, the Papangelou intensity of the Gibbs process of Example 15.5(a) takes the form

$$\rho(x \mid x) = \exp\left( \phi_1 + \sum_{x_i \in x;\, x_i \neq x} \phi_2(\|x - x_i\|) \right), \tag{15.5.6}$$

where x is a general point on the sphere, φ₁ is a constant, φ₂(·) is a function of the angular separation θ, and we use ‖x − y‖ to denote the angular separation of the points x, y. Billiot and Goulard suggest a semiparametric approach, expanding φ₂ in a finite cosine series

$$\phi_2(\theta) = \sum_{m=1}^{M} \alpha_m \cos m\theta,$$

with the first-order term φ₁, and M and the coefficients α_m to be determined. A number of difficulties arise in this example. If only one realization is available, it is clearly necessary to suppose M ≪ N(S), the number of points in the realization. This suggests using an exact likelihood approach and using a criterion such as AIC to determine M. The main difficulty here is in the calculation of the normalization constant, although with M small enough this may still be feasible through numerical or Monte Carlo simulation. In the
situation considered by Billiot and Goulard, which is probably more typical of biological models, the repeat samples provide a greater abundance of data, and some version of the pseudolikelihood or Takacs–Fiksel methods (see the comments below and Exercise 15.5.2) looks more attractive. In fact, the authors allude to some difficulties with the usual pseudolikelihood estimates [see Billiot (1994)] and propose instead a variation based on a discrete approximation scheme derived from the pooled data from all sample members. We refer to their paper for computational and numerical details.

Further properties of the Papangelou intensity for finite spatial point patterns can be deduced quite readily from the definition in terms of Janossy densities at (15.5.2) which, it will be noted, encompasses Papangelou intensities of higher order. We review these properties as a prelude to discussing the general case in Sections 15.6–7. As always, we suppose that the definition of Papangelou intensities is restricted to situations where the denominators in the ratios at (15.5.2) are positive. The properties of greatest interest, which indicate the main results to be expected in the general case, are set out in the proposition below. We assume that the point process is regular (Definition 7.1.I) so that Janossy densities exist for all orders.

Proposition 15.5.I. Let N be a regular, finite point process defined on an observation space W ⊂ R². Then the Papangelou intensities for N satisfy the following relationships.

(i) Multiplicative relation: For mutually disjoint x, u, v,

$$\rho(u \cup v \mid x) = \rho(u \mid x \cup v)\, \rho(v \mid x), \tag{15.5.7a}$$

and, more generally, with ρ(y | x) = j(x ∪ y)/j(x \ y) as in (15.5.2),

$$\rho(u \cup v \mid x) = \rho(u \mid x \cup v)\, \rho(v \setminus u \mid x \setminus u) = \rho\big((u \cup v) \setminus x \mid x\big)\, \rho\big((u \cup v) \cap x \mid x\big). \tag{15.5.7b}$$

(ii) Conditional probability interpretation: For any bounded Borel set B ⊂ W, y = {y₁, …, y_{n(y)}} ⊂ B and x = {x₁, …, x_{n(x)}} ⊂ B^c, and with B^cN denoting N restricted to B^c, ρ(y | x) dy equals

$$\frac{\Pr\big\{N(B) = n(y) \text{ and } N(dy_j) = 1\ (j = 1, \ldots, n(y)) \mid {}^{B^c}\!N = \sum_{i=1}^{n(x)} \delta_{x_i}\big\}}{\Pr\big\{N(B) = 0 \mid {}^{B^c}\!N = \sum_{i=1}^{n(x)} \delta_{x_i}\big\}}. \tag{15.5.8}$$

(iii) Relation to Palm densities:

$$\rho(y \mid x) = q(x \mid y)/q(\varnothing \mid y), \tag{15.5.9}$$

where q(x | y) is the Janossy density on (W \ y)^∪ of the higher-order Palm distribution P_y^{(ℓ)}(·) defined as in Exercise 13.1.11.
(iv) Integral relations: For bounded Borel functions f(y, x),

$$\int_{W^{\cup} \times W^{\cup}} f(y, x)\, \rho(y \mid x)\, j(x)\, dx\, dy = \int_{W^{\cup} \times W^{\cup}} f(y, x)\, j(y \cup x)\, dx\, dy. \tag{15.5.10}$$

Proof. The multiplicative relations at (15.5.7a) and (15.5.7b) are direct consequences of the definition at (15.5.2). The probabilistic interpretation (15.5.8) becomes clear once it is recalled that the Janossy densities specify precisely the number of points in the realization. Thus, if the realization x has k points and is therefore specified by j_k(x), and if x ⊂ B^c, there can be no other points in W and hence none in B; similarly, if the realization has j + k points in all and k of them are in B^c, then there must be exactly j in B.

To understand (15.5.9), consider first the Janossy density for the usual (first-order) Palm distribution for a finite point process, supposing that the realization consists of a point at the origin y and a vector x of dimension n(x) of further points. In terms of the Janossy densities of the original process, we can write

$$q(x \mid y) = j(x \cup y)/m(y) \qquad (y \notin x),$$

where m(y) is the local mean density at y (i.e., the density of the first-order moment measure), which can be written here as

$$m(y) = \sum_{k=0}^{\infty} \frac{1}{k!} \int_{X^{(k)}} j_{k+1}(x_1, \ldots, x_k, y) \prod_{i=1}^{k} dx_i = \int_{W^{\cup}} \frac{j(x \cup y)}{n(x)!}\, dx. \tag{15.5.11}$$

Because also q(∅ | y) = j₁(y)/m(y), equation (15.5.9) in the case ℓ = 1 follows from

$$\rho(y \mid x) = j(x \cup y)/j(y) = q(x \mid y)/q(\varnothing \mid y).$$

The higher-order expressions can be derived in a similar way. From this point of view, the Papangelou intensities are rescaled versions of the Janossy densities of the Palm distribution. The final integral relation also has a portmanteau character, as it subsumes a series of relationships for fixed k, each of which follows directly from the ratio form of the definition in (15.5.2).

By recasting the integrals in the last of these properties in terms of expectations, we recover the Georgii–Nguyen–Zessin formula [Georgii (1976), Nguyen and Zessin (1979b)] for finite point processes. It is a key point of the theory and lies behind many of the statistical applications of the Papangelou intensity.

Proposition 15.5.II (Georgii–Nguyen–Zessin Equation). Let N be a finite point process on W satisfying the conditions of Proposition 15.5.I, K a fixed positive integer, and h(·, ·) a nonnegative, measurable, integrable function
of (u, v), where u has dimension K. For a given realization x of N, with card(x) ≥ K, partition x into components u and v = x \ u with card(u) = K. Let N^{[K]}(·) denote the modified form of the product counting measure omitting terms along the diagonals. Then, setting both expressions under the expectation signs equal to zero when the realization contains fewer than K points,

$$E\left[\int_{W^{(K)}} h(u, x \setminus u)\, N^{[K]}(du)\right] = E\left[\int_{W^{(K)}} h(u, x)\, \rho(u \mid x)\, du\right]. \tag{15.5.12}$$

Proof. We again start with the first-order result (K = 1), assuming that N has a mean density m(x) as in (15.5.11) and Papangelou intensity ρ(x | y). Then, with n(v) = k − 1, the left-hand side of (15.5.12) can be written as

$$E\left[\sum_{u \in x} h(u, x \setminus u)\right] = \sum_{k=1}^{\infty} \frac{1}{k!} \int_{W \times W^{(k-1)}} k\, j_k(u \cup v)\, h(u, v)\, du\, dv, \tag{15.5.13}$$

where the multiplier k arises because each element of x can appear in turn as u, giving (from symmetry) equal contributions to the integral. For u ∉ v, j_k(u ∪ v) = ρ(u | v) j_{k−1}(v), so that writing ℓ = k − 1, the expectation becomes

$$\sum_{\ell=0}^{\infty} \frac{1}{\ell!} \int_W \int_{W^{(\ell)}} \rho(u \mid v)\, h(u, v)\, j_{\ell}(v)\, dv\, du = E\left[\int_W \rho(u \mid x)\, h(u, x)\, du\right].$$
In the higher-order forms, the combinatorial factor k^{[K]} replaces the term k but the argument is otherwise similar.

The Georgii–Nguyen–Zessin formula was originally given for a finite Gibbs point process. The setting above, for a regular finite point process, is in fact no more general, as Exercise 5.3.7 shows. The formula appears in the literature in a variety of guises, mostly associated with the fact that the left-hand side of (15.5.12) can be written in terms of the modified Campbell measure [Definition 13.1.I(b)]. Thus, for K = 1,

$$E\left[\int_W h(u, N \setminus u)\, N(du)\right] = \int_{W \times \mathcal{N}_W^{\#*}} h(u, N)\, \mathcal{C}_P^{!}(du \times dN),$$

so that in this case (15.5.12) becomes

$$\int_{W \times \mathcal{N}_W^{\#*}} h(u, N)\, \mathcal{C}_P^{!}(du \times dN) = E\left[\int_W h(u, N)\, \rho(u \mid N)\, du\right], \tag{15.5.14}$$
immediately suggesting a definition of the Papangelou intensity as a Radon– Nikodym derivative, an approach we take up in Section 15.6.
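In the simplest case, a homogeneous Poisson process with ρ(u | x) ≡ λ, the first-order identity (15.5.12) can be checked by simulation: for h depending on the location only, both sides reduce to λ∫_W h(u) du. A Monte Carlo sketch under these illustrative assumptions (the rate, test function, and replication count are not from the text):

```python
import math, random

def poisson(mu, rng):
    """Sample Poisson(mu) by Knuth's product-of-uniforms method (adequate for small mu)."""
    limit, k, p = math.exp(-mu), 0, 1.0
    while p > limit:
        p *= rng.random()
        k += 1
    return k - 1

rng = random.Random(42)
lam, reps = 5.0, 5000
h = lambda u: u[0]                      # a test function of the location only

acc = 0.0
for _ in range(reps):
    # One realization of a rate-lam Poisson process on the unit square.
    pts = [(rng.random(), rng.random()) for _ in range(poisson(lam, rng))]
    acc += sum(h(u) for u in pts)       # one sample of  integral h(u) N(du)
lhs = acc / reps
rhs = lam * 0.5                         # E integral h(u) rho(u|x) du = lam * int_W u_1 du
print(lhs, rhs)                         # both sides close to 2.5
```

The empirical left-hand side agrees with the deterministic right-hand side up to Monte Carlo error, which is the content of (15.5.12) in this degenerate case.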
Furthermore, if the first moment measure M exists, then the modified Campbell measure of Definition 13.1.I(b) can in turn be expressed in terms of modified local Palm distributions P_u^!, namely,

$$\int_{W \times \mathcal{N}_W^{\#*}} h(u, N)\, \mathcal{C}_P^{!}(du \times dN) = \int_W \int_{\mathcal{N}_W^{\#*}} h(u, N)\, P_u^{!}(dN)\, M(du).$$

Finally, if the process is stationary—strictly speaking this is impossible in the context of this section unless the state space is the circle or other compact group, but it is covered in the more general context of the next section—then M(du) = m du and the left-hand side simplifies further as in (13.2.5), yielding

$$E\left[\int_W h(u, N)\, \rho(u \mid N)\, du\right] = m \int_{W \times \mathcal{N}_W^{\#*}} h(u, N)\, P_0^{!}\big(d(S_{-u}N)\big)\, du.$$
If h(u, v) is a function of u only, another, suggestive, way of writing (15.5.12) in the case K = 1 is as

$$E\left[\int_W h(u)\, \big[N(du) - \rho(u \mid v)\, du\big]\right] = 0, \tag{15.5.15}$$

so that in this sense the residual process [‘innovations process’ in Baddeley et al. (2005)]

$$\nu(B) = \int_B \big[N(du) - \rho(u \mid v)\, du\big] \tag{15.5.16}$$

plays a role analogous to that of the martingale N − A in the temporal case. Note, however, that we cannot extend (15.5.15) to general functions h(u, v) with complete impunity, just because the form in which h enters the left- and right-hand sides of (15.5.12) is different. This constraint is analogous to the requirement that in the relation (14.2.3) defining the compensator as the dual predictable projection, we cannot replace the predictable process Y(·) in that equation by a completely arbitrary process. Evidently, the necessary and sufficient condition to justify the replacement is that

$$E\left[\int_W h(u, N)\, N(du)\right] = E\left[\int_W h(u, N)\, \rho(u \mid N)\, du\right].$$

This idea is developed more fully in Section 15.7, where the concept of exvisibility is introduced as a spatial analogue of predictability. Meanwhile we use Proposition 15.5.II and (15.5.15) to establish the properties of some inference procedures for spatial point patterns in circumstances where h(u, v) satisfies the extended form of (15.5.15), namely,

$$E\left[\int_W h(u, v)\, \big[N(du) - \rho(u \mid v)\, du\big]\right] = E\left[\int_W h(u, v)\, \nu(du)\right] = 0. \tag{15.5.15′}$$
The class of such functions includes not only functions of u only, but also functions where the dependence on v enters only through the conditional intensity ρ(u | v) (see Corollary 15.7.VI). We mention first the Takacs–Fiksel estimation procedure [Takacs (1983), Fiksel (1988)], which may be regarded as a variation on the method of moments adapted to estimating the parameters in a spatial point process. It is based on equating the two sides of (15.5.12), or one of its equivalent forms such as (15.5.14), for selected functions h(u, v), using the locations of the observed points to evaluate the left-hand side, and the parametric form for the Papangelou intensity, also dependent on the sample through the second element v in h, to evaluate the right-hand side. Enough functions are selected to determine the parameters uniquely. The proper choice of estimating functions requires some care, and depends on the structure of the model. Edge effects again constitute a significant nuisance factor. The techniques have been applied especially for Gibbs and other Markov spatial point processes, because it is for just such processes that the Papangelou intensity provides a natural and accessible characterisation of the process. The case of pairwise interactions in Example 15.5(b) is typical. Diggle et al. (1994) has a useful review from an applied statistics perspective; Billiot (1997) establishes consistency and related asymptotic properties of the estimates, including also the case of repeated samples.

We turn next to diagnostic tests for spatial point pattern models. These have been developed recently [see, in particular, Baddeley et al. (2005) for an excellent review with examples and illustrations], and are based on the fact that the expectation of (15.5.16) is zero. Thus a significant deviation from zero of the sample equivalent of the integral in (15.5.15) or (15.5.15′) can be used as an indication of a departure from the model under test.
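The Takacs–Fiksel procedure described above can be sketched numerically: fix a function h, form the difference of the two sides of (15.5.12) as a function of the trial parameter, and solve for the root. The toy specification below (a homogeneous Poisson model ρ_θ(u | x) = θ with h ≡ 1, whose estimating equation has the closed-form root N(W)/ℓ(W)) is an assumption made purely to keep the sketch checkable:

```python
def takacs_fiksel_equation(theta, points, h, rho, grid, cell_area):
    """Difference of the two sides of (15.5.12) at a trial parameter theta:
    sum_i h(x_i, x \\ x_i) - integral h(u, x) rho_theta(u | x) du (grid approx.)."""
    lhs = sum(h(xi, [p for p in points if p != xi]) for xi in points)
    rhs = cell_area * sum(h(u, points) * rho(theta, u, points) for u in grid)
    return lhs - rhs

n = 20
grid = [((i + 0.5) / n, (j + 0.5) / n) for i in range(n) for j in range(n)]
pts = [(0.1, 0.2), (0.4, 0.9), (0.6, 0.5), (0.8, 0.1)]
h = lambda u, x: 1.0                     # one estimating function, here h = 1
rho = lambda theta, u, x: theta          # homogeneous Poisson model (assumed)

# Bisection for the root in theta; the equation here is 4 - theta.
lo, hi = 0.0, 100.0
for _ in range(60):
    mid = (lo + hi) / 2
    if takacs_fiksel_equation(mid, pts, h, rho, grid, 1.0 / n**2) > 0:
        lo = mid
    else:
        hi = mid
print(lo)  # about 4.0 = N(W) / ell(W)
```

For a model with several parameters, one such equation is formed for each chosen h and the system is solved simultaneously; the choice of the h's then matters, as noted in the text.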
Typically, the function h in such diagnostic tests has the form I_A(u) g(u, v), where A ∈ B_W is a subset of the observation region, and g is interpreted as a weight function characterizing one of several possible types of residuals. Significance of the observed value of the resulting sample function can be determined approximately by comparing the sample value to its estimated standard deviation. The next lemma, where ν is defined as in (15.5.16), is key both to such tests and to determining approximate confidence regions for the Takacs–Fiksel procedure.

Lemma 15.5.III. Let h be as in Proposition 15.5.II and such that (15.5.15′) holds. Then

$$\operatorname{var}\left[\int_W h(u, v)\, \nu(du)\right] = E\left[\int_W [h(u, v)]^2\, \rho(u \mid v)\, du\right]$$
$$\qquad + E\left[\int_{W \times W} \big[h(u_1, v \cup u_2)\, h(u_2, v \cup u_1) - 2\, h(u_1, v \cup u_2)\, h(u_2, v)\big]\, \rho(u_1, u_2 \mid v)\, du_1\, du_2\right]$$
$$\qquad + E\left[\int_{W \times W} h(u_1, v)\, h(u_2, v)\, \rho(u_1 \mid v)\, \rho(u_2 \mid v)\, du_1\, du_2\right].$$
Proof. Write var[∫_W h(u, v) ν(du)] = E[X²] + E[Y²] − 2E[XY], where X = ∫_W h(u, v) N(du) and Y = ∫_W h(u, v) ρ(u | v) du. To evaluate E[X²] we use the second-order version of Proposition 15.5.II, using as argument in the left-hand side of (15.5.12) the function of u₁, u₂ and v \ {u₁, u₂} equal to h(u₁, v \ u₁) h(u₂, v \ u₂), which gives

$$E\left[\sum_{i \neq j} h(x_i, x \setminus x_i)\, h(x_j, x \setminus x_j)\right] = E\left[\int_{W \times W} h(u_1, v \cup u_2)\, h(u_2, v \cup u_1)\, \rho(u_1, u_2 \mid v)\, du_1\, du_2\right].$$

The sum on the left-hand side omits the terms for which i = j; these can be evaluated directly using (15.5.12) as

$$E\left[\sum_i [h(x_i, x \setminus x_i)]^2\right] = E\left[\int_W [h(u, v)]^2\, \rho(u \mid v)\, du\right].$$

Also

$$E[Y^2] = E\left[\int_{W \times W} h(u_1, v)\, h(u_2, v)\, \rho(u_1 \mid v)\, \rho(u_2 \mid v)\, du_1\, du_2\right].$$

To evaluate the product term we write

$$E(XY) = E\left[\int_W h(u_1, x \setminus u_1)\, N(du_1) \int_W h(u_2, x)\, \rho(u_2 \mid x)\, du_2\right] = E\left[\int_W h(u_1, x)\, \rho(u_1 \mid x)\, du_1 \int_W h(u_2, x \cup u_1)\, \rho(u_2 \mid x \cup u_1)\, du_2\right]$$

on appealing to both (15.5.15′) and (15.5.12) in the last step. The double integral can be written as a product integral over W × W, and the product of the terms ρ(·) equals ρ(u₁, u₂ | v) by the multiplicative relation (15.5.7a). Collecting terms completes the proof.

Residuals over a set A ⊂ W correspond to setting h(u, v) = I_A(u) g(u, v), with g(u, v) ≡ 1 for raw residuals, g(u, v) = 1/ρ(u | v) for Stoyan–Grabarnik or inverse-λ residuals, and g(u, v) = 1/√(ρ(u | v)) for Pearson residuals. When (15.5.15′) holds, these residuals have zero means; their variances (cf. Lemma 15.5.III) are detailed in Example 15.5(c) and Exercise 15.5.5. When the model is parametrized by θ, pseudoscore residuals are obtained by taking g_θ(u, v) = ∂ log ρ_θ(u | v)/∂θ (the name comes from the analogy with the score statistic in conventional statistical analysis). To see this, take the partial derivative in θ of the log pseudolikelihood as in (15.5.5) of a realization of a point process with model parametrized by θ; straightforward algebra shows that this pseudoscore is expressible as

$$\int_W \frac{\partial \log \rho(u \mid x)}{\partial \theta}\, \nu(du), \tag{15.5.17}$$

which is of the form of the integral in (15.5.15′). Accordingly, this quantity is called a pseudoscore residual; we denote it (with integral over A) by R_ψ(A).
Example 15.5(c) Stoyan–Grabarnik residuals [Stoyan and Grabarnik (1991)]. We can write the residual R_SG(A) as

$$\sum_{x_i \in x \cap A} \frac{1}{\rho(x_i \mid x \setminus x_i)} - \int_A \frac{\rho(u \mid x)}{\rho(u \mid x)}\, du = \sum_{x_i \in x \cap A} \frac{1}{\rho(x_i \mid x \setminus x_i)} - \ell(A).$$

Using Lemma 15.5.III, its variance is given by

$$\operatorname{var} R_{SG}(A) = \operatorname{var}\left[\sum_{x_i \in x \cap A} \frac{1}{\rho(x_i \mid x \setminus x_i)}\right] = E\left[\int_A \frac{du}{\rho(u \mid v)}\right] + E\left[\int_{A \times A} \frac{\rho(u_1 \mid v)\, \rho(u_2 \mid v) - \rho(u_1, u_2 \mid v)}{\rho(u_1, u_2 \mid v)}\, du_1\, du_2\right].$$

In simple cases, we can evaluate this expression explicitly. For example, when N is Poisson, ρ(u | v) = λ(u) independently of the rest of the realization v, and because the second-order term ρ(u₁, u₂ | v) = λ(u₁)λ(u₂), the expression reduces to

$$\operatorname{var} R_{SG}(A) = \int_A \frac{du}{\lambda(u)}.$$

For a Gibbs process with pairwise interactions as in Example 15.5(a),

$$\operatorname{var} R_{SG}(A) = E\left[\int_A \exp\left(-\psi_1(u) - \sum_{i=1}^{N(W)} \psi_2(u, x_i)\right) du\right] - \int_{A \times A} \big(1 - e^{-\psi_2(u_1, u_2)}\big)\, du_1\, du_2.$$
It should be noted that in practice, the expressions used for the Papangelou intensities will be estimates only, whereas the above expressions assume the exact forms are known. This must result in some reduction of the variances; Baddeley et al. (2005) refer to the exact forms as innovations and reserve the term residuals for the estimated forms, giving several examples to illustrate the differences in the variances.
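The Stoyan–Grabarnik residual is simple to evaluate once a fitted ρ is available; a minimal sketch, in which the homogeneous Poisson fit λ = 2 on the unit square and the sample pattern are illustrative assumptions:

```python
def sg_residual(points, A_indicator, rho, area_A):
    """Stoyan-Grabarnik residual R_SG(A): sum of 1/rho(x_i | x \\ x_i) over the
    points of the pattern lying in A, minus the Lebesgue measure of A."""
    s = 0.0
    for xi in points:
        if A_indicator(xi):
            rest = [p for p in points if p != xi]
            s += 1.0 / rho(xi, rest)
    return s - area_A

# Poisson sketch (assumed fit): rho(u | x) = 2 on A = the unit square, so
# R_SG(A) = N(A)/2 - ell(A), with mean 0 and variance ell(A)/lambda = 0.5.
pts = [(0.1, 0.2), (0.4, 0.9), (0.6, 0.5)]
print(sg_residual(pts, lambda u: True, lambda u, x: 2.0, 1.0))  # 3/2 - 1 = 0.5
```

Note that no integral of ρ is needed here: the weighting by 1/ρ makes the compensating term reduce to the area of A, which is one practical attraction of this residual.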
Exercises and Complements to Section 15.5

15.5.1 Papangelou intensities and pseudolikelihood ratios.
(a) Evaluate the Papangelou intensity (i) for a simple Poisson process over W ⊂ R²; and (ii) for a Poisson process with linear spatial drift. Write down the pseudolikelihood ratio for testing the one against the other, and show that it coincides with the ordinary likelihood ratio.
(b) Extend to the likelihood ratio for a pairwise interaction process, as in Example 15.5(a), including a drift, against the alternative of a Poisson process with drift.
15.5.2 Second- and higher-order correlation functions; factorial cumulant densities. In the physics literature, ‘correlation’ has a wider meaning than that of a standardized centred bivariate product moment as in probability and statistics. For example, the two-point correlation function in Martínez and Saar (2002) is precisely the second-order product density m_[2](·, ·) of Section 5.4, and Buchler et al. (1998) speak of long-range correlations and the correlation length. Martínez and Saar discuss both a range of estimation procedures for this pair correlation function and higher-order correlation functions (these coincide with higher-order factorial product densities m_[k](·)) that have been used in astrophysical studies.

15.5.3 Covariance of residuals. Let f, g, and h be nonnegative measurable functions satisfying appropriate integrability conditions.
(a) Imitate the proof of Lemma 15.5.III to show that
$$\operatorname{cov}\left(\sum_i h(x_i, x \setminus x_i),\ \sum_j g(x_j, x \setminus x_j)\right) = E\left[\int_W h(u, x)\, g(u, x)\, \rho(u \mid x)\, du\right]$$
$$\qquad + E\left[\int_{W \times W} h(u, x \cup v)\, g(v, x \cup u)\, \rho(u, v \mid x)\, du\, dv\right] - E\left[\int_W h(u, x)\, \rho(u \mid x)\, du\right] E\left[\int_W g(v, y)\, \rho(v \mid y)\, dv\right].$$

Conclude that var(∫_W h(u, x) ν(du)) equals

$$\operatorname{var}\left[\int_W h(u, x)\, N(du)\right] + E\left[\int_{W \times W} h(u, x)\, h(v, x)\, \rho(u \mid x)\, \rho(v \mid x)\, du\, dv\right]$$
$$\qquad + \left(E\left[\int_W h(u, x)\, \rho(u \mid x)\, du\right]\right)^2 - 2E\left[\int_{W \times W} h(u, x)\, h(v, x \cup u)\, \rho(u, v \mid x)\, du\, dv\right].$$

(b) Use this expression to find the covariance of two integrals of the form ∫_W h(u, x) ν(du), ∫_W g(v, x) ν(dv), and write down conditions for the two integrals to be uncorrelated.
(c) Show also that

$$\operatorname{var}\left(\sum_i h(x_i, x \setminus x_i) - \int_W f(u, x)\, du\right) = \operatorname{var}\left(\sum_i h(x_i, x \setminus x_i)\right) + \int_{W \times W} \operatorname{cov}\big(f(u, x), f(v, x)\big)\, du\, dv$$
$$\qquad - 2\int_W E\left[h(u, x)\, \rho(u \mid x) \int_W \big(f(v, x \cup u) - E[f(v, y)]\big)\, dv\right] du.$$

[Hint: Recall that var(∫_A X(u) du) = ∫_{A×A} cov(X(u), X(v)) du dv.]
15.5.4 Takacs–Fiksel Method [see also Stoyan and Stoyan (1994, pp. 330–331)]. (a) Write down the Takacs–Fiksel equations in the special case of a Gibbs process for which the potential function U (·) [see Example 5.3(c)] is (i) limited to first- and second-order interaction terms; and then either (ii) spatially homogeneous, or (iii) isotropic.
(b) Check the specific form of the estimating equations when the Gibbs process is of either hard-core or general Strauss form, and h_k(x, v) = N(S_{r_k}(x)) exp[U(x, v)], for an increasing sequence of radii r_k. [Hint: See Fiksel (1988).]
(c) Use the results of Exercise 15.5.3(b) to find two uncorrelated test statistics to use in the Takacs–Fiksel method for estimating parameters in a hard-core model.
15.5.5 Residual variances [Baddeley et al. (2007)].
(a) Use Lemma 15.5.III and ν(·) as in Example 15.5(c) to verify the following expressions for the residual variances.
(i) Raw residual:

$$\operatorname{var} \nu(A) = E\left[\int_A \rho(u \mid N)\, du\right] + E\left[\int_{A \times A} \big[\rho(u_1 \mid N)\, \rho(u_2 \mid N) - \rho(u_1, u_2 \mid N)\big]\, du_1\, du_2\right].$$

(ii) Pearson residual R_Psn(A) = ∫_A [1/√(ρ(u | N))] ν(du):

$$\operatorname{var} R_{\mathrm{Psn}}(A) = \ell(A) + 2E\left[\int_{A \times A} \Big(\sqrt{\rho(u_1 \mid N)\, \rho(u_2 \mid N)} - \sqrt{\rho(u_1, u_2 \mid N)}\Big)\, du_1\, du_2\right].$$

(b) Show that for a Poisson process with intensity λ(x),

$$\operatorname{var} \nu(A) = \int_A \lambda(u)\, du \qquad \text{and} \qquad \operatorname{var} R_{\mathrm{Psn}}(A) = \ell(A),$$

and that when λ(u) = e^{θη(u)} the pseudoscore statistic R_ψ(A) at (15.5.17) has

$$\operatorname{var} R_{\psi}(A) = \int_A \eta^2(u)\, e^{\theta\eta(u)}\, du.$$

Investigate the form of these three variances for a Gibbs process with pairwise interactions [cf. Examples 15.5(a), (c)].
15.6. Modified Campbell Measures and Papangelou Kernels

The material in these last two sections may be regarded as an introduction to the general theory to be found, for example, in MKM (1982, Chapter 9) and subsequent developments in Kallenberg (1983a, Chapters 12–14) (some misprints are corrected in the 1986 reprinting). Connections with statistical mechanics are explored in Nguyen and Zessin (1979b), Matthes, Warmuth,
and Mecke (1979), Glötzl (1980), Rauchenschwandtner (1980), and Georgii (1988), among others. Our treatment has been much influenced by Kallenberg’s (1978) work, especially the informal account in Kallenberg (1984). In the discussion of finite spatial point processes in Section 15.5, the results are essentially extensions and refinements of the basic theory outlined in Chapter 5. In particular this is true of processes defined via interaction potentials as in Example 15.5(a): in the finite case such a description is equivalent to a description in terms of Janossy densities [see Example 5.3(c) and Exercise 5.3.7].

The situation changes radically, however, if we try to specify the distribution of an infinite particle system in terms of interaction potentials. Certainly, equations such as (15.5.3) can no longer be used to describe the distribution of the point process, because the Janossy densities, which exist in a local sense, no longer exist in a global sense, and in any case sums such as those in the exponent of (15.5.3) in general diverge for k → ∞. The approach adopted by physicists in this situation has been to take a bounded subset, B say, of the state space, and to suppose that the particles in B are in equilibrium with both the ‘external’ interaction forces exerted on them by the particles outside B and the ‘internal’ interaction forces generated among themselves. Equations such as (15.5.3) can then be used to describe the conditions for local equilibrium, conditional on the configuration of particles outside B. Taking expectations over all possible exterior configurations leads to a family of balance equations that must be satisfied by the overall equilibrium distribution of the process, if in fact such a distribution exists (and this last proviso is an important qualifier). For example, suppose that exactly one particle lies in B and that the process is to be specified through one- and two-point interaction potentials ψ₁ and ψ₂ as in Example 15.5(a).
Then, conditional on the external configuration, which we denote B^cN, the local Janossy densities for the process on B must be of the form

$$j_1^B(y \mid {}^{B^c}\!N) = C_B \exp\left( \psi_1(y) + \sum_{x_i \in \operatorname{supp}(B^cN)} \psi_2(y, x_i) \right) \qquad (y \in B), \tag{15.6.1}$$

where the normalization constant is now C_B, and ∫_{B^c} ψ₂(y, x) dN(x) is required to converge for all N in the space N^{#∗}(B^c) of boundedly finite simple point processes with support in B^c. To obtain a convenient form for the balance equations, multiply by a function f(y, η) mapping (y, η) ∈ B × N^{#∗}(B^c) into R and take expectations over B^cN. This leads to

$$\int_{\mathcal{N}^{\#*}(B^c)} \int_B f(y, \eta)\, j_1^B(y \mid \eta)\, dy\, P_{B^c}(d\eta) = \int_{\mathcal{N}^{\#*}(X)} \int_B f(y, {}^{B^c}\!N)\, N(dy)\, I_{\{N(B)=1\}}(N)\, P(dN), \tag{15.6.2}$$

where P is the equilibrium distribution assumed to exist for the process as a whole, and P_{B^c} is its projection onto N^{#∗}(B^c).
Given functions ψi (·), it is far from obvious whether there exists an equilibrium distribution P satisfying (15.6.2) and related equations. Indeed, it is nothing other than a general version of the Ising problem, which in its original and special form referred to the existence of an equilibrium distribution for a process on a one-dimensional lattice specified by interactions of pairs of points involving only nearest neighbours. The general problem, even in the lattice case, is still unsolved, although many partial results are available [see, e.g., Preston (1976) for a general formulation of the Ising problem and a review of results known then]. In particular, it is known that even when an equilibrium distribution exists, it may not be unique (leading to the possibility of ‘phase transitions’), and that even when the interaction potentials are spatially stationary in character (i.e., they depend only on the relative positions of the particles), the resulting solutions may not be stationary (this is known as ‘symmetry breakdown’). Within the class of Gibbs processes specified by pairwise interaction potentials [Example 15.5(a)], the existence and uniqueness of stationary versions of the process are known under rather strong constraints, in particular a sufficient condition is the existence of a finite interaction range R such that ψ2 (xi , xj ) = 0 for xi − xj > R. It is not our intention, however, to pursue the Ising problem as such, but rather to consider the question of how quantities analogous to the Papangelou intensities of Section 15.5 can be introduced and related to other characteristics of the point process. One special feature of (15.6.2) is worth noting at this stage. For given location y and {xi : i = 1, 2, . . .}, the function at (15.6.1) is dependent on the set B only through the normalization constant CB and the requirements that y ∈ B, xi ∈ B c (i = 1, 2, . . .). 
We are therefore led to let B ↓ {y} to recover, in the finite case, the functions ρ(y | x) that we could there interpret as the conditional intensity for the occurrence of a particle at y given the realization of the process throughout the remainder of the state space, that is, in X \ {y}. From the interaction potential viewpoint, the term in the exponent at (15.6.2) can be related to the work required to introduce a new particle into the position y, keeping the locations of the existing particles fixed.

But although these ideas lead to a straightforward definition in the finite case, the situation is more complicated in the general case. It is necessary to distinguish three different random measures, each of which embodies some aspect of the conditional intensity ρ(· | ·) of Section 15.5. These quantities are as follows.
(i) The Papangelou kernel R(· | ·) defined by integral relations extending (15.5.10).
(ii) A random measure π(·) describing, loosely speaking, the atomic part of these kernels.
(iii) The Papangelou intensity measure ζ(·) as originally introduced by Papangelou (1974b) in terms of the limit

    ζ(B) = lim_{n→∞} Σ_{i=1}^{k_n} E[N(I_ni) | I_ni^cN],        (15.6.3)
15.6.
Modified Campbell Measures and Papangelou Kernels
521
where T = {{I_ni: i = 1, . . . , k_n}: n = 1, 2, . . .} is a fixed dissecting system of partitions of B as in Definition A1.6.I.

In all approaches to this topic, Condition Σ below plays a fundamental role, and we work under it unless explicitly stated otherwise.

Definition 15.6.I. The simple point process N on the c.s.m.s. X satisfies Condition Σ if for all bounded Borel sets B,

    P{N(B) = 0 | B^cN} > 0   a.s.        (15.6.4)
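Condition Σ can be probed numerically in simple finite constructions. The sketch below (an illustration under an assumed setup, not from the text) estimates P{N(A) = 0 | N(A^c) = 0} for a process with exactly one point, where the conditional probability in (15.6.4) is identically zero — the mechanism behind the violations in Example 15.6(a):

```python
import random

random.seed(1)

def one_point_process():
    """Exactly one point, uniform on [0, 1] -- Condition Sigma fails."""
    return [random.random()]

A = (0.0, 0.5)                      # the bounded set B = A = [0, 0.5)
in_A = lambda x: A[0] <= x < A[1]

# Among realizations with no point outside A (the conditioning event
# {N(A^c) = 0}), count those that are also empty inside A.
cond, empty_inside = 0, 0
for _ in range(10_000):
    pts = one_point_process()
    if not any(not in_A(x) for x in pts):    # event {N(A^c) = 0}
        cond += 1
        if not any(in_A(x) for x in pts):    # event {N(A) = 0}
            empty_inside += 1

# With exactly one point, {N(A^c) = 0} forces the point into A, so the
# conditional probability is exactly zero: (15.6.4) is violated.
```

By contrast, for a Poisson process the same conditional probability equals the void probability e^{−μ(A)} > 0, so Condition Σ holds.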
This requirement generalizes the assumption of Section 15.5 that the Janossy densities be positive everywhere. Its essential role is to preclude situations where the behaviour inside B is deterministically controlled by the behaviour outside B, as can occur in (1°) and (2°) of Example 15.6(a).

Example 15.6(a) On Condition Σ. (1°) Let N be a point process on X for which P{N(X) = r} = 1 for some fixed integer r. Then for any nonempty set A ∈ B_X, P{N(A) = 0 | N(A^c) < r} = 0, and thus Condition Σ is violated.
(2°) Let N_1 be a point process with exactly one point uniformly distributed over the bounded state space X ∈ B(R^d), for example, a circle of unit area. Let N_2 be a Poisson process at unit rate on X with N_2 independent of N_1. Then the point process N = N_1 + N_2 violates Condition Σ because for any Borel set A ⊆ X of positive Lebesgue measure, P{N(A) = 0 | N(A^c) = 0} = 0.
(3°) With N_1 and N_2 as in (2°), the process N equal to N_1 with probability p and to N_2 with probability q = 1 − p, with pq > 0, satisfies Condition Σ (details are left to the reader).

In order to set down a general form of the integral equations (15.5.10) for the Papangelou kernel, denoted by R(A | N) for A ∈ B_X and N ∈ N_X^{#∗}, we start from (15.5.12) and (15.5.14), treating only the first-order case. As in (15.5.14), we can rewrite the left-hand side of this equation in terms of the modified Campbell measure C_P^! of Definition 13.1.I(b) to yield

    ∫_{X×N_X^{#∗}} h(u, N) C_P^!(du × dN) ≡ ∫_{X×N_X^{#∗}} h(u, N − δ_u) N(du) P(dN)
        = ∫_{X×N_X^{#∗}} h(u, N) ρ(u | N) du P(dN)
        = ∫_{X×N_X^{#∗}} h(u, N) R(du | N) P(dN).        (15.6.5)

For the finite case considered in Proposition 15.5.III, this exhibits R(· | N) as being derived from a disintegration of the second component of C_P^! with respect to P, namely, for all bounded A ∈ B_X and U ∈ B(N_X^{#∗}),

    ∫_U R(A | N) P(dN) = C_P^!(A × U),        (15.6.6)
and leads us to seek such a disintegration in general. To justify such a disintegration, we use the absolute continuity condition that

    C_P^!(A × ·) ≪ P(·)   for each fixed A ∈ B_X.        (15.6.7)

The proof of this condition in Lemma 15.6.III below gives us an immediate illustration of the role that Condition Σ plays. The proof also makes use of the following results, in which σ{N}, σ{B^cN} denote the σ-algebras generated by the random variables {N(A): A ∈ B_X}, {N(A): A ∈ B(B^c)}, and so on.

Lemma 15.6.II. (a) For any U ∈ σ{N} ⊆ B(N_X^{#∗}) and any B ∈ B_X, there exists U^∗ ∈ σ{B^cN} such that

    U ∩ {N(B) = 0} = U^∗ ∩ {N(B) = 0}.        (15.6.8)

(b) For any σ{N}-measurable function g(N) and any B ∈ B_X, there exists a σ{B^cN}-measurable function g_0(N) such that

    g(N) I_{N(B)=0}(N) = g_0(N) I_{N(B)=0}(N).        (15.6.9)
Furthermore, if E|g(N)| < ∞, then for any bounded σ{B^cN}-measurable Y,

    E[Y(N) g(N) I_{N(B)=0}(N)] = E[Y(N) g_0(N) P{N(B) = 0 | B^cN}].        (15.6.10)

Proof. Any U^∗ ∈ σ{B^cN} is generated by sets of the form {N(A_i) = j_i: A_i ∈ B(B^c), nonnegative integers j_i}, and for any U ∈ σ{N}, U ∩ {N(B) = 0} is generated by sets of the form {N(B) = 0, N(A_i) = j_i: A_i ∈ B(B^c), j_i = 0, 1, . . .}. This proves (a). Part (b) follows from (a) by a standard extension argument: start from indicator functions I_U(N) for U ∈ σ{N}, for which I_{U∩{N(B)=0}} = I_{U^∗} I_{N(B)=0} and I_{U^∗} is σ{B^cN}-measurable.

Lemma 15.6.III. Let N be a simple point process on the c.s.m.s. X satisfying Condition Σ. Then for all bounded A ∈ B_X, the absolute continuity condition (15.6.7) holds.

Proof. We have to show that for any U ∈ B(N_X^#) such that P(U) = 0, C_P^!(A × U) = 0 (all bounded A ∈ B_X). Suppose first that U ⊆ {N(A) = 0}, so that U = U ∩ {N(A) = 0}, and thus by Lemma 15.6.II(a), I_U(N) = I_{U^∗}(N) I_{N(A)=0}(N) for some U^∗ ∈ σ{A^cN}. Noting that for y ∈ A, N − δ_y ∈ U^∗ if and only if N ∈ U^∗ (i.e., the behaviour of N inside A is irrelevant to whether N ∈ U^∗),

    C_P^!(A × U) = ∫_{N_X^{#∗}} ∫_A I_U(N − δ_y) N(dy) P(dN)
        = ∫_{N_X^{#∗}} ∫_A I_{N(A)=0}(N − δ_y) N(dy) I_{U^∗}(N) P(dN)
        = ∫_{N_X^{#∗}} I_{N(A)=1}(N) I_{U^∗}(N) P(dN) ≤ P(U^∗).
Equally, using (15.6.10),

    0 = P(U) = E[I_{U^∗}(N) P{N(A) = 0 | A^cN}].

By Condition Σ, the coefficient of the bounded function I_{U^∗}(N) is positive a.s., so we have a contradiction unless I_{U^∗}(N) = 0 a.s.; that is, P(U^∗) = 0, and hence C_P^!(A × U) = 0 for such U.

Suppose next that P(U) = 0 for some U ⊆ {N: N(A) ≤ k} for some given integer k ≥ 1, and let T = {{A_ni: i = 1, . . . , k_n}: n = 1, 2, . . .} be a dissecting family of partitions for A. Write also

    U_ni = U ∩ {N(A_ni) = 0}   and   U′_ni = U \ U_ni.

Then for n = 1, 2, . . . we have

    C_P^!(A × U) = Σ_{i=1}^{k_n} C_P^!(A_ni × U)
        = Σ_{i=1}^{k_n} [C_P^!(A_ni × U_ni) + C_P^!(A_ni × U′_ni)]
        = Σ_{i=1}^{k_n} C_P^!(A_ni × U′_ni)

because by the earlier argument, C_P^!(A_ni × U_ni) = 0. For the last sum write

    Σ_{i=1}^{k_n} C_P^!(A_ni × U′_ni) = Σ_{i=1}^{k_n} ∫_{N_X^#} ∫_{A_ni} I_{U\U_ni}(N − δ_y) N(dy) P(dN)
        ≤ k Σ_{i=1}^{k_n} P{N(A_ni) ≥ 2, N(A) ≤ k}

(for this last step, N(A_ni) ≤ N(A) ≤ k for N ∈ U, and any y ∈ A_ni that is an atom of N can contribute to the integral only if also (N − δ_y)(A_ni) ≥ 1, and thus N(A_ni) ≥ 2 for such y). The assumption that N is simple implies that the last sum → 0 as n → ∞, and hence that C_P^!(A × U) = 0.

To complete the proof, use monotone convergence to deduce that, whenever P(U) = 0,

    C_P^!(A × U) = lim_{k→∞} C_P^!(A × (U ∩ {N: N(A) ≤ k})) = 0.
Standard arguments based on this result can now be used to establish the existence and uniqueness properties of the disintegration of the modified Campbell measure, leading to the following theorem, whose proof is left to Exercise 15.6.1. Kallenberg (1983a, Chapter 13) gives a more extended treatment; part (iv) of the theorem follows Glötzl (1980).
Theorem 15.6.IV. Suppose given a simple point process N defined on the c.s.m.s. X, with probability measure P, and satisfying Condition Σ. Then there exists a unique kernel R(A | N) satisfying
(i) for each bounded A ∈ B_X, R(A | ·) is a Borel-measurable function on N_X^{#∗};
(ii) for each N ∈ N_X^{#∗}, R(· | N) is a boundedly finite Borel measure on B_X; and
(iii) for all nonnegative, measurable functions h(u, N): X × N_X^{#∗} → R_+, vanishing for u outside a bounded Borel set of X,

    ∫_{X×N_X^{#∗}} h(u, N) C_P^!(du × dN) = ∫_{X×N_X^{#∗}} h(u, N) R(du | N) P(dN).        (15.6.11)

(iv) If also

    C_P^!(du × dN) ≪ ℓ(du) × P(dN)   on B_X × B(N_X^{#∗}),        (15.6.12)

where ℓ denotes Lebesgue measure, then there exists a B_X × B(N_X^{#∗})-measurable function ρ(u | N) such that (15.6.5) holds and for all A ∈ B_X,

    R(A | N) = ∫_A ρ(x | N) ℓ(dx)   (P-a.s. in N).        (15.6.13)
Definition 15.6.V. When they exist, the kernel R(· | ·) defined by (15.6.11) of Theorem 15.6.IV is the Papangelou kernel associated with the point process N of the theorem, and the density ρ(x | N) of (15.6.13) is the Papangelou intensity of N.

Strictly speaking, the kernel R(· | ·) is the first-order Papangelou kernel associated with N, because higher-order kernels can be defined via higher-order modified Campbell measures C_{P,k}^!. These we now introduce and discuss briefly, setting

    C_{P,k}^!(A_1 × ··· × A_k × U) = ∫_{N_X^#} ∫_{A_1×···×A_k} I_U(N − Σ_{i=1}^k δ_{y_i}) N^{[k]}(dy_1 × ··· × dy_k) P(dN)        (15.6.14)

for bounded A_1, . . . , A_k ∈ B_X and U ∈ B(N_X^{#∗}), where N^{[k]}(·) denotes the k-fold factorial product measure defined by N(·) as above Proposition 9.5.VI. Much as in Lemma 15.6.III, it can be shown that under Condition Σ, C_{P,k}^!(A_1 × ··· × A_k × ·) ≪ P(·), and hence a kth-order Papangelou kernel R_k(A_1 × ··· × A_k | N) is well-defined P-a.s. for bounded A_1, . . . , A_k ∈ B_X by

    ∫_U R_k(A_1 × ··· × A_k | N) P(dN) = C_{P,k}^!(A_1 × ··· × A_k × U)   (all U ∈ B(N_X^{#∗})).        (15.6.15)
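The k-fold factorial product measure N^{[k]} appearing in (15.6.14) simply places unit mass on each ordered k-tuple of distinct points of the simple realization N, so its total mass for an n-point realization is n!/(n − k)!. A small enumeration sketch (the helper name is hypothetical, not from the text):

```python
from itertools import permutations

def factorial_tuples(points, k):
    """Support of the k-fold factorial product measure N^[k]:
    all ordered k-tuples of distinct points of the realization."""
    return list(permutations(points, k))

pts = [0.2, 0.5, 0.7, 0.9]            # a realization with n = 4 points
pairs = factorial_tuples(pts, 2)      # N^[2] has total mass 4!/2! = 12
triples = factorial_tuples(pts, 3)    # N^[3] has total mass 4!/1! = 24
```

Integrating an indicator against N^{[k]}, as in (15.6.14), then amounts to summing over these tuples.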
Furthermore, regarding C_{P,k+ℓ}^! as a measure on X^{(ℓ)} × (X^{(k)} × N_X^{#∗}), it can be shown that C_{P,k+ℓ}^! ≪ C_{P,k}^! with respect to subsets of X^{(k)} × N_X^{#∗}. The corresponding kernel can be identified, up to the usual equivalence, with the kernel function

    R_ℓ(A_1 × ··· × A_ℓ | N + Σ_{i=1}^k δ_{x_i}),

thus justifying the extension of the multiplicative relation (15.5.7) to the form (using the simplicity of N in an essential way)

    R_ℓ(dy_1 × ··· × dy_ℓ | N + Σ_{i=1}^k δ_{x_i}) R_k(dx_1 × ··· × dx_k | N)
        = R_{k+ℓ}(dy_1 × ··· × dy_ℓ × dx_1 × ··· × dx_k | N).        (15.6.16)

Finally, defining R_0(· | N) = 1 for N(X) = 0 and zero otherwise, the Papangelou kernels of all orders can be combined into a portmanteau kernel on the space X^∪ when N(X) < ∞ via the equation

    G(V | N) = Σ_{k=0}^∞ R_k(V ∩ X^{(k)} | N) / k!   (V ∈ B(X^∪)).        (15.6.17)
Under Condition Σ, this Gibbs kernel G(· | ·) is a density for the portmanteau Campbell measure defined much as in (15.6.14) but allowing k to vary, so the resultant set A is any Borel subset of X^∪. Many properties of the Papangelou kernels can be subsumed under a general treatment of the Gibbs kernel: see Kallenberg (1983a, Chapter 13) for details. Moreover, we can take the disintegrations the other way, assuming for any given k that the kth-order factorial moment measure exists, and disintegrating the kth-order Campbell measure relative to this moment measure in just the same way as for the ordinary Palm measures in Chapter 13. Then all these disintegrations can be combined to give a decomposition of the portmanteau Campbell measure into a family of Palm measures and an associated portmanteau factorial moment measure. Again we refer to Kallenberg (1983a, 1984) for further details.

We conclude this section with another property of the first-order Papangelou kernel; it is an extension of the conditional probability relation (15.5.8).

Proposition 15.6.VI. Let N be a simple point process defined on the c.s.m.s. X satisfying Condition Σ. Then for any bounded A, B ∈ B_X with A ⊆ B,

    R(A | N) = P{N(A) = N(B) = 1 | B^cN} / P{N(B) = 0 | B^cN}   a.s. on {N: N(B) = 0}.        (15.6.18)

Proof. In (15.6.5) substitute

    h(u, N) = I_A(u) I_{N(B)=0}(N) I_U(N + δ_u),
where U ∈ σ{B^cN}. When u ∈ A ⊆ B, N + δ_u ∈ U if and only if N ∈ U, so (15.6.5) yields

    ∫_U R(A | N) I_{N(B)=0}(N) P(dN) = ∫_U N(A) I_{N(B)=1}(N) P(dN)
        = ∫_U E[N(A) I_{N(B)=1} | B^cN] P(dN)
        = ∫_U P{N(A) = N(B) = 1 | B^cN} P(dN).

On the other hand, using Lemma 15.6.II(b), noting that U ∈ σ{B^cN}, we can write

    E[R(A | N) I_{N(B)=0}(N)] = E[R_0(A | N) P{N(B) = 0 | B^cN}],

where R_0(A | N) is σ{B^cN}-measurable. Because we have equality for all U on {N(B) = 0},

    R_0(A | N) P{N(B) = 0 | B^cN} = P{N(A) = N(B) = 1 | B^cN}.

Moreover, by Condition Σ, the coefficient of R_0(A | N) is positive a.s., and on {N(B) = 0}, R_0(A | N) = R(A | N), so (15.6.18) follows.
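In the finite case the Papangelou kernel reduces, as in Proposition 15.5.III, to a ratio of Janossy measures. The sketch below evaluates this ratio for the discrete Bernoulli process treated in Example 15.7(c) below, where J_k = p^k q^{n−k} and the ratio is p/q throughout (function names are illustrative):

```python
def janossy(k, n, p):
    """Janossy measure of a configuration of k occupied sites out of n
    for the discrete Bernoulli process (each site occupied w.p. p)."""
    q = 1.0 - p
    return p**k * q**(n - k)

def papangelou_ratio(k, n, p):
    """R(y | x_1,...,x_k) = J_{k+1} / J_k; here independent of k and y."""
    return janossy(k + 1, n, p) / janossy(k, n, p)

p, n = 0.3, 10
r = papangelou_ratio(4, n, p)   # equals p/q = 3/7
```

That the ratio is constant reflects the independence between sites: adding a point at an unoccupied site always multiplies the Janossy measure by p/q.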
Exercises and Complements to Section 15.6
15.6.1 Prove Theorem 15.6.IV. [Hint: Cf. the proof of Lemma 15.6.III.]
15.7. The Papangelou Intensity Measure and Exvisibility
In this final section we turn to an investigation of the two other measures, π and ζ, referred to in the discussion around (15.6.3), and their relation to the first-order Papangelou kernel R(· | ·). Theorem 15.6.IV(iv) exhibits the Papangelou intensity as the Radon–Nikodym derivative with respect to Lebesgue measure of the kernel R(· | ·), under the condition that C_P^! is absolutely continuous with respect to both ℓ × P and P (the latter property holds by definition). Atoms of π are precisely what hinders the existence of a Papangelou intensity in general. If the atomic component is absent, then the Papangelou intensity of the previous section coincides with the intensity defined via exvisibility, that is, as a density of ζ.

Thus far, we have considered R(· | N) as a kernel on the canonical space N_X^{#∗}. It is more convenient for this section to regard R(A | N) as defined on the space (Ω, E, P) itself. By an abuse of notation we sometimes write

    R(A) = R(A, ω) = R(A | N(ω)),        (15.7.1)

and treat the quantity on the left-hand side as a random measure. That it is
a random measure follows from the measurability of R(A | N) as a function of N, which implies that R(A) is a random variable for each A ∈ B_X and hence that the requirements of Proposition 9.1.VIII are satisfied. As we show shortly, this random measure has some properties in common with the compensator for temporal processes, but difficulties arise with the atoms in particular, and we start on a different tack, following broadly the approach adopted by Papangelou (1974b) and Kallenberg (1983a). Suppose given a fixed dissecting system of partitions for X [recall (15.6.3)]

    T = {{I_nj: j = 1, . . . , k_n}, n = 1, 2, . . .}.

We need the following lemma.

Lemma 15.7.I. For bounded Borel sets A, B with A ⊆ B,

    P{· | A^cN} = P{· ∩ {N(B \ A) = 0} | B^cN} / P{N(B \ A) = 0 | B^cN}   a.s. on {N(B) = 0},        (15.7.2)

the denominator being a.s. positive on {N(B) = 0}.

Proof. Since E[I_{N(B)=0}(N) | B^cN] = P{N(B) = 0 | B^cN} a.s., if N ∈ {N(B) = 0}, then either N is in a set of measure zero, or else P{N(B) = 0 | B^cN} > 0, so that in either case the last probability is a.s. positive on {N(B) = 0}, and therefore the denominator in (15.7.2) is a.s. positive on {N(B) = 0}. The relation itself is just a version of the identity P(U | V ∩ W) = P(U ∩ V | W)/P(V | W): because A ⊆ B, we can take P(· | W) to be P(· | B^cN) and V to be {N(B \ A) = 0}.

In what follows, we take A, B to be elements of T and note that, because T is countable, we can assume that (15.7.2) holds simultaneously a.s. for all such choices of A, B ∈ T. First we examine any atomic component of R.

Proposition 15.7.II. Let N be a simple point process on the c.s.m.s. X. Let T be a dissecting system of partitions of X, x a general point of X, and {I_n(x)} a sequence of elements of T with I_n(x) ↓ {x} (n → ∞). Then the limit

    π{x} = lim_{n→∞} P{N(I_n(x)) ≥ 1 | I_n^c(x)N}        (15.7.3)

exists a.s., is independent of T, and can be identified a.s. on the set {N(I_n(x) \ {x}) = 0} with the ratio

    π{x} = P{N{x} = N(I_n(x)) = 1 | I_n^c(x)N} / P{N(I_n(x) \ {x}) = 0 | I_n^c(x)N},        (15.7.4)

the ratio being interpreted as 1 when both numerator and denominator vanish. The equation

    π(A) = π(A, N) ≡ Σ_{x_i ∈ supp(N)∩A} π{x_i} = Σ_i π{x_i} δ_{x_i}(A)        (15.7.5)

defines a random measure on X.
When Condition Σ holds, π{x} < 1 for all x, the atoms of π include the atoms of R, and

    R{x} = π{x} / (1 − π{x})        (15.7.6)

unless N{x} = 1, in which case R({x}) = 0.

Proof. In Lemma 15.7.I take A = I_n(x), B = I_m(x) with n > m. Omitting the dependence on x for notational brevity, (15.7.2) implies that a.s. on {N(I_m) = 0},

    P{N(I_n) > 0 | I_n^cN} = 1 − P{N(I_n) = 0 | I_n^cN}
        = 1 − P{N(I_m) = 0 | I_m^cN} / P{N(I_m \ I_n) = 0 | I_m^cN}.        (15.7.7)

For increasing n, the numerator here remains fixed, and the denominator decreases monotonically to the limit P{N(I_m \ {x}) = 0 | I_m^cN}. We deduce that the limit exists a.s. on the set {N(I_m \ I_n) = 0} for every n > m, hence a.s. on {N(I_m \ {x}) = 0}, and hence a.s. because N(I_m \ {x}) → 0 a.s. as m → ∞. Also the ratio equals (15.7.4) except possibly for realizations where the denominator vanishes. If the latter holds, the ratio for finite n will tend to ∞ unless in the limit the numerator also vanishes. In this case, we have P{N(I_n) = 0 | I_n^cN} → 0; that is, P{N(I_n) ≥ 1 | I_n^cN} → 1, implying that π{x} = 1, in accordance with the interpretation here that '0/0 = 1'.

Next, let T_1 and T_2 be dissecting systems with T_1 ⊆ T_2. Then the limits π_1{x}, π_2{x} say exist for each system, and taking {I_nj(x)} ⊆ T_j with I_nj(x) ↓ {x} (n_j → ∞) for j = 1, 2, so {I_n1(x)} ⊆ T_1 ⊆ T_2, it follows that π_1{x} = π_2{x} because we can always find I_n(x) ↓ {x} with successive terms taken alternately from {I_n1} and {I_n2}. In general, any two systems T_1, T_2 generate by their intersection a third system T_3 with T_3 ⊇ T_1 and T_3 ⊇ T_2, so π{x} is independent of T.

For bounded A ∈ B_X, N(A) < ∞, so the defining sum at (15.7.5) can be expressed as a finite sum of limits as at (15.7.3) over disjoint sets I_nj for sufficiently large n, and thus it is a random variable. From Proposition 9.1.VIII, we deduce that π(A, N(ω)) is a random measure.

When Condition Σ holds, if π{x} = 1 for some x, then for this x and all n,

    P{N(I_n(x)) ≥ 1 | I_n^c(x)N} = 1   a.s.

on account of the monotonicity of the denominator in (15.7.7). Consequently, P{N(I_n(x)) = 0 | I_n^c(x)N} = 0 a.s., contradicting Condition Σ. Thus, π{x} < 1. To demonstrate (15.7.6), refer to (15.6.18). By putting A = I_m(x) and B = I_n(x), deduce that on {N(I_n) = 0},

    R(I_m(x) | N) = P{N(I_m) = 1 = N(I_n) | I_n^cN} / P{N(I_n) = 0 | I_n^cN}.
Here, R(· | N) is a measure and I_m(x) ↓ {x} as m → ∞, so the left-hand side → R({x}). On the right-hand side, {N: N(I_m) = 1 = N(I_n)} ↓ {N: N{x} = 1 = N(I_n)}, so on {N(I_n) = 0},

    R{x} ≡ R({x}) = P{N{x} = 1 = N(I_n) | I_n^cN} / P{N(I_n) = 0 | I_n^cN}.

The numerator here coincides with that of (15.7.4), and the denominator equals

    P{N(I_n \ {x}) = 0 | I_n^cN} − P{N(I_n) = N{x} ≥ 1 | I_n^cN}.

Because N is simple, the last term equals P{N(I_n) = N{x} = 1 | I_n^cN}. Then (15.7.6) follows from (15.7.4) whenever {N(I_n) = 0} for sufficiently large n. On the complementary event, x is an atom of N because

    ∩_n {N(I_n) ≥ 1} = {N{x} ≥ 1}.

In this case, we choose some positive integer k and substitute h(y, N) = I_A(y) I_{N(B)≤k}(N) I_{N(A)≥1}(N) in the relation (15.6.5) with B = I_n(x), A = I_m(x) with m ≥ n. The left-hand side yields E[R(I_m) I_{N(I_n)≤k}(N) I_{N(I_m)≥1}(N)], and the right-hand side is bounded above by (k + 1) P{N(I_m) ≥ 2, N(I_n) ≤ k + 1}. Now repeat this with I_m replaced by a dissecting partition {I_nj} for I_m, and sum over j. Proceeding to the limit, the sum → 0 by simplicity, and the sum of the left-hand side converges to R{x} N{x}, the sum being taken over all atoms lying in I_m. Consequently, R{x} = 0 whenever N{x} > 0.

The following simple examples may help illustrate the nature of atoms of R and N, and especially the role played by π(·).

Example 15.7(a) On Condition Σ [continued from Example 15.6(a)]. Let N be a point process on X for which P{N(X) = r} = 1 for some fixed integer r ≥ 1 as earlier in (1°). For x ∈ supp(N) and a given dissecting system T let {I_n} ≡ {I_n(x)} be a sequence of elements of T contracting to {x}. Then

    P{N(I_n) ≥ 1 | N(I_n^c) ≤ r − 1} = 1 = lim_{n→∞} P{N(I_n) ≥ 1 | N(I_n^c) ≤ r − 1} = π{x}.

The set {N: N(I_n − {x}) = 0} ∩ {N(I_n^c) ≤ r − 1} consists precisely of those realizations N for which N(I_n^c) = r − 1, N(I_n) = N{x} = 1. If we assume that
the points of N are r points i.i.d. over X, which is a bounded Borel subset of R^d—for example, the interior of a circle or a sphere—then, assuming the sets I_n have positive Lebesgue measure, we should have

    P{N(I_n) = 1 | N(I_n^c) = r − 1} = 1,

while

    P{N(I_n) = 1 = N{x} | N(I_n^c) = r − 1} = 0 = P{N(I_n − {x}) = 0 | N(I_n^c) = r − 1}.

Thus, this example justifies the interpretation of π{x} = π({x}, N) as unity when the expression at (15.7.4) equals '0/0'.

Example 15.7(b) A particular Gauss–Poisson process. Let there be given a Gauss–Poisson process in the plane which, in its Poisson cluster process representation, consists of a cluster centre process that is Poisson at unit rate, and for which any point x in the centre process produces clusters of either zero or one additional points, with probability p for there being one point, which is then located at x + a for some fixed position a relative to the cluster centre. Consider such a process on a bounded subset X ∈ B(R^2). It can be checked that this process satisfies Condition Σ. Suppose the state space X contains all of x, x − a and x + a for some x; consider realizations N for which N{x} = 1, and let {I_n} ≡ {I_n(x + a)} be a sequence of sets of positive Lebesgue measure belonging to some dissecting system for X and ↓ {x + a}. On realizations N for which N{x} = 1 = N{x − a} and N(X \ ({x} ∪ {x − a} ∪ I_n)) = 0,

    P{N(I_n) ≥ 1 | I_n^cN} = ℓ(I_n)(1 + o(1)),

so the ratio at (15.7.4) → 0. On realizations for which N{x} = 1 = N(I_n^c),

    P{N(I_n) ≥ 1 | I_n^cN} = p + ℓ(I_n)(1 + o(1))

and

    P{N(I_n \ {x + a}) = 0 | I_n^cN} = 1 − ℓ(I_n)(1 + o(1)),

so the ratio at (15.7.4) → p = π{x + a} ≡ π({x + a}, N).

Example 15.7(c) Discrete Bernoulli process [cf. Example 7.2(d)]. Consider a simple point process N on the finite set X = {x_1, . . . , x_n} satisfying, independently for each point, N{x_i} = 0 or 1 with probabilities q and p = 1 − q, respectively. The Janossy measures are purely atomic, with J_0 = q^n and

    J_k(x_{r_1}, . . . , x_{r_k}) ≡ J_k({x_{r_1}}, . . . , {x_{r_k}}) = p^k q^{n−k}

for any subset S_k of k distinct points in X. Observe that

    Σ_{all S_k} J_k(S_k) = (n choose k) p^k q^{n−k}.

The integral equation defining R(· | ·) reduces to a definition as a ratio of Janossy measures,

    R(y | x_1, . . . , x_k) = J_{k+1}(y, x_1, . . . , x_k) / J_k(x_1, . . . , x_k) = p/q.
Also, if B = {y, x_1, . . . , x_k},

    P{N{y} = N({y, x_1, . . . , x_k}) = 1 | B^cN} / P{N({y, x_1, . . . , x_k}) = 0 | B^cN} = p q^k / q^{k+1} = p/q,

consistent with (15.6.18). On the other hand,

    π{y} = P{N{y} = N({y, x_1, . . . , x_k}) = 1 | B^cN} / P{N({x_1, . . . , x_k}) = 0 | B^cN} = p q^k / q^k = p,

and π{y}/(1 − π{y}) = p/(1 − p) = p/q. Finally, if N{y} = 1, π{y} is unchanged, but for R({y} | ·) we should have a ratio of Janossy measures with the argument y repeated in the numerator, which is zero on account of the process being simple.

The last of the three quantities mentioned around (15.6.3), the Papangelou measure ζ(·), is arguably the one that has the most important applications. Papangelou (1974b) devised it primarily as a means of tackling certain problems in stochastic geometry quite distinct from the present context [see Kallenberg (1983b) and related discussion for an informal account and references]. As we show shortly, its importance in statistical applications is that under weak conditions it has a density which can be identified with the Papangelou intensity. We start by establishing its existence under Condition Σ, relating it to the random measures R(· | ·) and π(·) discussed already.

Proposition 15.7.III. Let N, X, T, π, and R be as in Proposition 15.7.II, and suppose that Condition Σ holds. Then as n → ∞, the limit

    ζ(B) = lim_{n→∞} Σ_{j=1}^{k_n} P{N(B_nj) = 1 | B_nj^cN},   B_nj ≡ B ∩ I_nj,        (15.7.8)
exists a.s. for all bounded B ∈ B_X and defines a random measure given a.s. by

    ζ(·) = π(·) + R_d(· | ·),        (15.7.9)

where R_d(· | ·) is the diffuse (i.e., nonatomic) component of the random measure R(· | ·).

Proof. Without loss of generality assume that B ∈ T; then each B_nj = I_nj ∈ T (although we continue to write B_nj). Write

    Σ_{j=1}^{k_n} P{N(B_nj) = 1 | B_nj^cN} = Σ_{j=1}^{k_n} [ P{N(B_nj) = 1 | B_nj^cN} I_{N(B_nj)=0}
        + P{N(B_nj) ≥ 1 | B_nj^cN} I_{N(B_nj)≥1}
        − P{N(B_nj) ≥ 2 | B_nj^cN} I_{N(B_nj)≥1} ]
    ≡ Σ_0(n) + Σ_1(n) − Σ_2(n),   say.
For Σ_0(n), observe from (15.6.18) that

    P{N(B_nj) = 1 | B_nj^cN} = R(B_nj | B_nj^cN) P{N(B_nj) = 0 | B_nj^cN},

so we can rewrite Σ_0(n) in the form

    Σ_0(n) = ∫_B I_{B_n}(x) h_n(x) R(dx | B_nj^cN),

where, for fixed N, B_n is the union of those B_nj where N(B_nj) = 0, and h_n(x) = P{N(B_nj) = 0 | B_nj^cN} on B_nj. {B_n} is a monotonic increasing sequence of sets, with limit B \ {supp N}, and h_n(x) ↑ 1 − π{x} a.s. by Proposition 15.7.II. By monotone convergence, therefore,

    Σ_0(n) → ∫_{B\{supp N}} (1 − π{x}) [R_a(dx) + R_d(dx)],

with R_a denoting the atomic component of R. By using (15.7.3), R_a ≪ π_a a.s., so the first term here equals π(B \ {supp N}), and because π{x} = 0 R_d-a.s., and R{x} = 0 for x ∈ {supp N}, the second term equals R_d(B).

For Σ_1(n), for given N, the sum reduces for n sufficiently large to a sum over sets B_nj containing exactly one of the atoms of N(·) in B. By Proposition 15.7.II again, the limit as n → ∞ reduces to π(B ∩ {supp N}). Thus,

    Σ_0(n) + Σ_1(n) → R_d(B) + π(B) = ζ(B)   a.s.,

and it remains to prove that Σ_2(n) → 0 a.s. Just as for Σ_1(n), the sum reduces for n sufficiently large to a sum over precisely N(B) sets B_nj, say, where N(B_nj) = 1; that is,

    Σ_2(n) = Σ_j P{N(B_nj) ≥ 2 | B_nj^cN}
        = Σ_j P{N(B_nj) ≥ 2, N(B \ B_nj) = 0 | B^cN} / P{N(B \ B_nj) = 0 | B^cN}

on using Lemma 15.7.I. As n increases, each of the N(B) terms in the numerator → 0 because N is simple, and for the denominator, which is decreasing to P{N(B \ {x}) = 0 | B^cN} for some x ∈ {supp N}, Condition Σ implies that

    0 < P{N(B) = 0 | B^cN} ≤ P{N(B \ {x}) = 0 | B^cN}.

The Papangelou intensity measure is related to a first moment in much the same way as the first moment and intensity measures coincide (Propositions 9.3.IX–X).
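For a homogeneous Poisson process the limit (15.7.8) can be evaluated in closed form: by independence over disjoint sets, the conditioning is vacuous and P{N(B_nj) = 1 | B_nj^cN} = μ(B_nj) e^{−μ(B_nj)}, whose sum over a dissecting partition increases to μ(B), recovering ζ = μ. A quick numerical check under an assumed setup (rate λ on [0, 1], equal cells):

```python
import math

def zeta_partition_sum(lam, k):
    """Sum of P{N(cell) = 1} over a partition of [0, 1] into k equal
    cells for a homogeneous Poisson process of rate lam; conditioning
    on the outside configuration changes nothing, by independence."""
    h = lam / k                     # mean count per cell
    return k * h * math.exp(-h)

lam = 2.0
sums = [zeta_partition_sum(lam, k) for k in (1, 10, 1000)]
# the sums increase towards lam as the partition refines: zeta([0,1]) = lam
```

The Papangelou intensity of a Poisson process is thus its ordinary intensity, with π ≡ 0 and ζ = R_d in (15.7.9).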
Corollary 15.7.IV. Suppose the first moment measure EN(·) exists. Then for all bounded B ∈ B_X, ζ(B) at (15.7.8) also has the representation

    lim_{n→∞} Σ_{j=1}^{k_n} E[N(B_nj) | B_nj^cN] = ζ(B)   a.s. and in L¹ mean.        (15.7.10)
Proof. For the a.s. convergence, write

    E[N(B_nj) | B_nj^cN] = P{N(B_nj) = 1 | B_nj^cN} + E[N(B_nj) I_{N(B_nj)≥2}(N) | B_nj^cN],

and use an extension of Lemma 15.7.I to write

    Σ_{j=1}^{k_n} E[N(B_nj) I_{N(B_nj)≥2}(N) | B_nj^cN]
        = Σ_{j=1}^{k_n} E[N(B_nj) I_{N(B_nj)≥2} I_{N(B\B_nj)=0} | B^cN] / P{N(B \ B_nj) = 0 | B^cN}
        ≤ E[Σ_{j=1}^{k_n} N(B_nj) I_{N(B_nj)≥2}(N) | B^cN] / P{N(B) = 0 | B^cN}.

This conditional expectation is bounded above by E[N(B) | B^cN] < ∞ a.s., and N being simple implies that each term in the sum → 0 a.s. (cf. also Exercise 9.3.10), so by dominated convergence we have the required result.

To establish L¹ convergence, observe in the proof of the proposition that Σ_0(n) increases monotonically to its limit, which has expectation bounded by E[N(B)], so its a.s. convergence implies its L¹ convergence. Also,

    Σ_1(n) − Σ_2(n) = Σ_{j=1}^{k_n} P{N(B_nj) = 1 | B_nj^cN} I_{N(B_nj)≥1} ≤ Σ_{j=1}^{k_n} I_{N(B_nj)≥1} ≤ N(B),

so here the a.s. convergence of Σ_1(n) − Σ_2(n) implies its L¹ convergence by the dominated convergence theorem. Finally,

    E[ Σ_{j=1}^{k_n} E[N(B_nj) I_{N(B_nj)≥2}(N) | B_nj^cN] ] = E[ Σ_{j=1}^{k_n} N(B_nj) I_{N(B_nj)≥2} ] → 0   (n → ∞)

from simplicity and the assumption that E[N(B)] < ∞.
So far the development in this chapter has been based mainly on disintegrations and limits, having much in common with the material of Chapter 13. It is possible to base the derivation of the Papangelou intensity measure on arguments much closer to those used in the discussion of the compensator and its density, the conditional intensity λ*(·, ω). With a state space X more general than R or R_+ as in Chapter 14, the concept of predictability used there is replaced by that of exvisibility, due to Van der Hoeven (1982) [we follow the terminology of Kallenberg (1983a)]. Write σ̄{B^cN} for the completion of the B-external σ-algebra σ{B^cN} with respect to the null sets of σ{N} of the process N. Then on the product space X × Ω (or, more specifically, X × N_X^# in the canonical set-up), define the exvisible σ-algebra Z to be the σ-algebra generated by sets of the form B × U, where B ∈ B_X, U ∈ σ̄{B^cN}. A stochastic process on (Ω, E, P) is then exvisible if it is measurable with respect to Z on X × Ω. Given a random measure ξ, a 'dual exvisible projection' of ξ can be introduced as the unique random measure satisfying conditions (i)–(iii) of Proposition 15.7.V below. A direct proof of this assertion requires arguments from the general theory of processes analogous to those needed to give a direct proof of the properties of the compensator A(·) in Section 14.1 [see Van der Hoeven (1982, 1983)]. However, when ξ is a point process satisfying the special conditions assumed in this chapter (namely, it is simple and satisfies Condition Σ), it is not too difficult to see that the dual exvisible projection is nothing other than the Papangelou intensity measure ζ itself: we conclude with a formal statement and proof of this result.

Proposition 15.7.V. Under the conditions of Proposition 15.7.III, and assuming EN(·) exists, the Papangelou intensity measure ζ is the unique (up to equivalences) random measure satisfying the conditions
(i) ζ is determined by the point process N (i.e., ζ(B) is σ{N}-measurable for every B ∈ B_X);
(ii) the process Z(x) ≡ ζ{x} is exvisible; and
(iii) for every nonnegative exvisible process Y and bounded B ∈ B_X,

    E[∫_B Y(x) ζ(dx)] = E[∫_B Y(x) N(dx)].        (15.7.11)

Proof. The function ζ as defined at (15.7.8) is the limit of σ{N}-measurable random variables, and therefore (i) holds for ζ. To prove (ii), suppose x is an atom of ζ. Then, in the notation used in the proof of Proposition 15.7.III,

    ζ{x} = π{x} = lim_{n→∞} (1 − h_n(x))   a.s.,

where h_n(x) = Σ_{j=1}^{k_n} P{N(B_nj) = 0 | B_nj^cN} I_{B_nj}(x) is clearly exvisible. The limit is thus a.s. equal to an exvisible process, and if the σ-fields are
complete we can allow modifications on sets of measure zero without upsetting measurability, so all versions are exvisible. For (iii), take Y in (15.7.11) to have the special form

    Y(x, ω) = I_B(x) I_U(ζ)   for U ∈ σ{B^cN}.

Then Corollary 15.7.IV implies that, because U ∈ σ{B_nj^cN} for every B_nj,

    E[I_U N(B)] = E[ I_U Σ_{j=1}^{k_n} E[N(B_nj) | B_nj^cN] ] → E[I_U ζ(B)].

Because the left-hand side here is fixed, (15.7.11) follows for this particular function Y, and then for processes Y as described by standard extension arguments.

Thus, ζ satisfies (i)–(iii): suppose η is some other random measure satisfying the conditions. Whenever U ∈ σ{B_nj^cN}, (15.7.11) implies that

    E[ I_U E[η(B_nj) | B_nj^cN] ] = E[I_U η(B_nj)] = E[I_U N(B_nj)] = E[ I_U E[N(B_nj) | B_nj^cN] ],

from which it follows that

    E[η(B_nj) | B_nj^cN] = E[N(B_nj) | B_nj^cN]   a.s.,

and hence that

    lim_{n→∞} Σ_{j=1}^{k_n} E[η(B_nj) | B_nj^cN] = lim_{n→∞} Σ_{j=1}^{k_n} E[N(B_nj) | B_nj^cN] = ζ(B)   a.s.
Each of these two sums may be further analysed by the same procedure as used in forming the sums Σ_0(n), Σ_1(n), Σ_2(n) in the proof of Proposition 15.7.III. In particular, using the σ{B_nj^cN}-measurability of η(·) on {N(B_nj) = 0},

    Σ_{j=1}^{k_n} E[η(B_nj) I_{N(B_nj)=0}(N) | B_nj^cN] I_{N(B_nj)=0}
        = Σ_{j=1}^{k_n} η(B_nj) P{N(B_nj) = 0 | B_nj^cN} I_{N(B_nj)=0}
        → ∫_{B\{supp N}} (1 − π{x}) η(dx) = η_d(B) + Σ_i (1 − π{x_i}) η{x_i},

where in the second step we have used the limit behaviour of the function h_n(x) as in the proof of the earlier result, η_d(·) denotes the diffuse component of η, and summation is over the atoms in B of η(·). Thus, we have a.s.

    η_d(B) + Σ_i (1 − π{x_i}) η{x_i} = ζ_d(B) + Σ_i (1 − π{x_i}) ζ{x_i}.        (15.7.12)
Now it follows from conditions (ii) and (iii) that the atomic parts of $\eta$ and $\zeta$ must be equal. Writing $Z(x,\omega) = \eta(\{x\},\omega)$ and letting
$$V = \{(x,\omega) \in \mathcal{X}\times\Omega : Z(x,\omega) > \zeta(\{x\},\omega)\},$$
$V$ is an exvisible set, and from (15.7.11),
$$\int_\Omega \int_V \bigl[Z(x,\omega) - \zeta(\{x\},\omega)\bigr]\,\bigl[\eta(dx,\omega) - \zeta(dx,\omega)\bigr]\, \mathrm{P}(d\omega) = 0.$$
Only atoms of $\eta$ and $\zeta$ contribute to this integral, and indeed only those for which $\eta\{x_i\} - \zeta\{x_i\} > 0$. This leads to a contradiction unless $\eta\{x_i\} \le \zeta\{x_i\}$ a.s., and by reversing the argument we deduce that $\eta\{x_i\} = \zeta\{x_i\}$ a.s. for all atoms; that is, $\eta$ and $\zeta$ have the same atoms, and the atoms are of the same size a.s. Then (15.7.12) implies that the diffuse components agree a.s.; that is, $\eta$ and $\zeta$ coincide except possibly on a set of measure zero.
Now suppose that Condition Σ holds and that the point process $N$ admits a Papangelou intensity $\rho(x \mid N)$. This means that the random measure $R$ is a.s. absolutely continuous with $\rho$ as its density. Because its atomic component $\pi$ is null, it then follows from (15.7.9) that $\zeta$ also has no atomic component, and that the diffuse components of $\zeta$ and $R$ coincide. Hence, $\zeta$ also is a.s. absolutely continuous with density $\rho$. This gives the following corollary of Proposition 15.7.V, in which we repeat (15.7.11) for convenience.
Corollary 15.7.VI. If the simple point process $N$ admits a Papangelou intensity $\rho$ and satisfies Condition Σ, then the random measures $R$ and $\zeta$ coincide, and both have density $\rho$. Moreover, in this case, for all nonnegative, exvisible, $\mathcal{F}^N$-measurable processes $Y$,
$$\mathrm{E}\Bigl[\int_B Y(x)\, N(dx)\Bigr] = \mathrm{E}\Bigl[\int_B Y(x)\, \zeta(dx)\Bigr] = \mathrm{E}\Bigl[\int_B Y(x)\, \rho(x \mid N)\, dx\Bigr].$$
This equation may be compared with the extended form (15.5.15′) of the Georgii–Nguyen–Zessin formula. There the role of the exvisible process $Y$ is taken by the function $h(x, v)$, where $v$ denotes (the points of) a realization of $N$, so $h$ must embody any condition of $\mathcal{F}^N$-measurability. If $h$ does not depend on $v$, or depends on it only through the function $\rho(u \mid v)$, which is itself exvisible, then (15.5.15′) will hold. A further illustration is in Exercise 15.7.1.
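As a concrete numerical check of the identity in Corollary 15.7.VI (an illustration added here, not part of the text): for a homogeneous Poisson process on $[0,1]$ with rate $\lambda$, the Papangelou intensity is simply $\rho(x \mid N) = \lambda$, and for the deterministic (hence exvisible) integrand $Y(x) = x^2$ the identity reduces to Campbell's theorem, $\mathrm{E}\bigl[\sum_{x_i \in N} Y(x_i)\bigr] = \lambda \int_0^1 Y(x)\,dx = \lambda/3$. The Python sketch below (function names are illustrative) estimates the left-hand side by simulation.

```python
import math
import random

def _poisson(rng, lam):
    # Knuth's method: count uniform draws until the running product of
    # uniforms falls below exp(-lam); adequate for moderate lam.
    limit = math.exp(-lam)
    k, prod = 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1

def estimate_campbell_sum(lam, n_reps, seed=0):
    """Monte Carlo estimate of E[ sum_{x in N} x^2 ] for a rate-lam
    Poisson process N on [0, 1]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_reps):
        n = _poisson(rng, lam)                             # point count
        total += sum(rng.random() ** 2 for _ in range(n))  # Y(x) = x^2
    return total / n_reps

if __name__ == "__main__":
    lam = 3.0
    # Campbell's theorem (the reduced form of the identity) gives lam / 3
    print(estimate_campbell_sum(lam, 20000))
```

Any nonnegative exvisible $Y$ could replace $x^2$ here; a deterministic integrand keeps the check elementary, since both sides can then be evaluated in closed form.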
Exercises and Complements to Section 15.7
15.7.1 Show that if a process $Y(x)$ is a.s. continuous and $\mathcal{F}^N$-measurable, and vanishes outside a bounded set, then it is exvisible.
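The conditional void probabilities $h_n(x)$ appearing in the proof of Proposition 15.7.V can also be illustrated by simulation (an added sketch with illustrative names and parameters, not part of the text). Take $N$ to be a rate-$\lambda$ Poisson process on $[0,1]$ superposed with an independent point at $x_0 = 0.5$ that is retained with probability $p$. The two components are independent, so conditioning on the configuration outside the interval $B_\varepsilon = (x_0-\varepsilon, x_0+\varepsilon)$ changes nothing, and $\mathrm{P}\{N(B_\varepsilon) = 0 \mid B_\varepsilon^{c}N\} = (1-p)\,e^{-2\varepsilon\lambda} \to 1-p$ as $\varepsilon \to 0$, identifying $\pi\{x_0\} = p$ as the atom at $x_0$.

```python
import math
import random

def _poisson(rng, lam):
    # Knuth's method for a single Poisson(lam) draw.
    limit = math.exp(-lam)
    k, prod = 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1

def void_probability(lam, p, eps, n_reps, seed=0):
    """Empirical P{ no point of N in (0.5 - eps, 0.5 + eps) } for a
    rate-lam Poisson process on [0, 1] superposed with an independent
    atom at 0.5 retained with probability p."""
    rng = random.Random(seed)
    empty_count = 0
    for _ in range(n_reps):
        empty = True
        for _ in range(_poisson(rng, lam)):   # Poisson component
            if abs(rng.random() - 0.5) < eps:
                empty = False
        if rng.random() < p:                  # the possible atom at 0.5
            empty = False
        empty_count += empty
    return empty_count / n_reps

if __name__ == "__main__":
    lam, p = 2.0, 0.3
    for eps in (0.2, 0.1, 0.05):
        # theoretical value: (1 - p) * exp(-2 * eps * lam) -> 1 - p
        print(eps, void_probability(lam, p, eps, 20000))
```

As $\varepsilon$ shrinks, the estimates approach $1 - p = 0.7$, in line with $1 - h_n(x_0) \to \pi\{x_0\}$ in the proof.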
References with Index
[At the end of each reference entry is the page number or numbers where it is cited in this volume. A bibliography of about 600 references up to about 1970, although excluding much of the historical material of Chapter 1, is given in D.J. Daley and R.K. Milne (1972), The theory of point processes: A bibliography, Int. Statist. Rev. 41, 183–201.]
Aldous, D. and Thorisson, H. (1993). Shift coupling. Stoch. Proc. Appl. 44, 1–14. [222]
Ambartzumian, R.V. (1972). Palm distributions and superpositions of independent point processes in R^n. In Lewis, P.A.W. (Ed.), Stochastic Point Processes, Wiley, New York, pp. 626–645. [466]
Andersen, P.K., Borgan, Ø., Gill, R.D. and Keiding, N. (1993). Statistical Models Based on Counting Processes. Springer-Verlag, New York. [355]
Ash, R.B. (1972). Real Analysis and Probability. Academic Press, New York. [Revised ed., with C. Doléans-Dade (2000), Probability and Measure Theory. Harcourt/Academic Press, San Diego.] [25]
Asmussen, S. (1987). Applied Probability and Queues. John Wiley, Chichester. [355]
—— (2003). = Asmussen (1987), 2nd ed. Springer-Verlag, New York. [96, 355]
Asmussen, S., Nerman, O. and Olsson, M. (1996). Fitting phase-type distributions via the EM algorithm. Scand. J. Statist. 23, 419–441. [110]
Asplund, E. and Bungart, L. (1966). A First Course in Integration. Holt, Rinehart and Winston, New York. [376]
Baccelli, F. and Brémaud, P. (1994). Elements of Queueing Theory: Palm–Martingale Calculus and Stochastic Recurrences. Springer, Berlin. [222, 269, 328, 355]
—— and —— (2003). = Baccelli and Brémaud (1994), 2nd ed. [177, 355]
Baddeley, A.J. (1998). Spatial sampling and censoring. In Kendall, W., Lieshout, M.N.M. and Barndorff-Nielsen, O. (Eds.), Stochastic Geometry: Likelihood and Computation, Chapman and Hall, Boca Raton, FL, pp. 37–78. [457]
——, Gregori, P., Mateu, J., Stoica, R. and Stoyan, D. (Eds.) (2006). Case Studies in Spatial Point Process Modeling (Lecture Notes in Statistics 185). Springer, New York. [459]
—— and Møller, J. (1989). Nearest-neighbour Markov point processes and random sets. Internat. Statist. Review 57, 89–121. [118, 128]
——, —— and Pakes, A.G. (2007). Properties of residuals for spatial point processes. Ann. Inst. Statist. Math. 59 (to appear). [518]
—— and Silverman, B.W. (1984). A cautionary example on the use of second-order methods for analyzing point patterns. Biometrics 40, 1089–1094. [464]
——, Turner, R., Møller, J. and Hazelton, M. (2005). Residual analysis for spatial point processes (with Discussion). J. Roy. Statist. Soc. Ser. B 67, 617–666. [419, 459, 502, 513–516]
—— and van Lieshout, M.N.M. (1995). Area-interaction point processes. Ann. Inst. Statist. Math. 46, 601–619. [124, 130]
——, —— and Møller, J. (1996). Markov properties of cluster processes. Adv. Appl. Probab. 28, 346–355. [129]
Ball, F. and Milne, R.K. (2005). Simple derivations of properties of counting processes associated with Markov renewal processes. J. Appl. Probab. 42, 1031–1043. [99, 116]
Barbour, A.D. and Hall, P. (1984). On the rate of Poisson convergence. Math. Proc. Cambridge Philos. Soc. 95, 473–480. [163]
Barndorff-Nielsen, O. and Yeo, G.F. (1969). Negative binomial processes. J. Appl. Probab. 6, 633–647. [10]
Bartlett, M.S. (1955). An Introduction to Stochastic Processes. Cambridge University Press, Cambridge [2nd ed. 1966; 3rd ed. 1978]. [100]
—— (1963). The spectral analysis of point processes (with Discussion). J. Roy. Statist. Soc. Ser. B 25, 264–296. [484]
Basawa, I.V. and Scott, D.J. (1983). Asymptotic Inference for Non-ergodic Models (Springer Lecture Notes Statist. 17). Springer-Verlag, New York. [416]
Baum, L.E. and Eagon, J.A. (1967). An inequality with applications to statistical estimation for probabilistic functions of a Markov process and to a model for ecology. Bull. Amer. Math. Soc. 73, 360–363. [104]
—— and Petrie, T. (1966). Statistical inference for probabilistic functions of a Markov chain. Ann. Math. Statist. 37, 1554–1563. [104]
Bebbington, M.S. (2005). Information gains for stress release models. Pure Appl. Geophys. 162, 2299–2319. DOI: 10.1007/s00024-005-2777-5. [503]
—— and Borovkov, K. (2003). A stochastic two-node stress transfer model reproducing Omori’s law. Pure Appl. Geophys. 160, 1429–1455. [229]
Beran, J. (1994). Statistics for Long-Memory Processes. Chapman and Hall, New York. [250]
Berbée, H.C.P. (1979). Random Walks with Stationary Increments and Renewal Theory (Mathematical Centre Tracts 112). Mathematisch Centrum, Amsterdam. [229]
Berman, M. (1978). Regenerative multivariate point processes. Adv. Appl. Probab. 10, 411–430. [333]
Bertoin, J. (1996). Lévy Processes. Cambridge University Press, Cambridge. [Paperback edition, 1998, reprinted 2002.] [12, 21, 82, 83]
Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems (with Discussion). J. Roy. Statist. Soc. Ser. B 36, 192–236. [118, 122, 506]
Billingsley, P. (1965). Ergodic Theory and Information. Wiley, New York. [194, 449]
—— (1968). Convergence of Probability Measures. Wiley, New York. [27, 214]
Billiot, J.-M. (1994). Asymptotique de la méthode du maximum de pseudo-vraisemblance pour les processus ponctuels de Gibbs. (Thesis, Université de Montpellier II). [510]
—— (1997). Asymptotic properties of Takacs–Fiksel estimation method for Gibbs point processes. Statistics 30, 69–89. [514]
—— and Goulard, M. (2001). An estimation method of the pair potential function for Gibbs point processes on spheres. Scand. J. Statist. 28, 185–203. [509]
Bingham, N.H., Goldie, C.M. and Teugels, J. (1987). Regular Variation. Cambridge University Press, Cambridge. [Paperback edition (corrected printing), 1989.] [160]
Block, H.W., Borges, W.S. and Savits, T.H. (1985). Age-dependent minimal repair. J. Appl. Probab. 22, 370–385. [235]
Boel, R., Varaiya, P. and Wong, E. (1975). Martingales on jump processes, I: Representation results, and II: Applications. SIAM J. Control 13, 999–1021 and 1022–1061. [375, 405]
Bol’shakov, I.A. (1969). Statistical Problems in Isolating a Stream of Signals from Noise (in Russian). Sovyetskoye Radio, Moscow. [65]
Brandt, A., Franken, P. and Lisek, B. (1990). Stationary Stochastic Models. Wiley, Chichester. [269]
Breiman, L. (1963). The Poisson tendency in traffic distribution. Ann. Math. Statist. 34, 111. [174]
Brémaud, P. (1981). Point Processes and Queues: Martingale Dynamics. Springer-Verlag, New York. [101, 375, 399–419]
—— and Jacod, J. (1977). Processus ponctuels et martingales: résultats récents sur la modélisation et le filtrage. Adv. Appl. Probab. 9, 362–416. [400]
—— and Massoulié, L. (1994). Imbedded construction of stationary point processes and sequences with a random memory. Queueing Systems 17, 213–234. [224, 427]
—— and —— (1996). Stability of non-linear Hawkes processes. Ann. Probab. 24, 1563–1588. [222, 224, 427–439]
—— and —— (2001). Hawkes branching point processes without ancestors. J. Appl. Probab. 38, 122–135. [141, 145]
——, —— and Ridolfi, A. (2005). Power spectra of random spike fields and related processes. Adv. Appl. Probab. 37, 1116–1146. [376]
——, Nappo, G. and Torrisi, G.L. (2002). Rate of convergence to equilibrium of marked Hawkes processes. J. Appl. Probab. 39, 123–136. [428]
Brillinger, D.R. (1972). The spectral analysis of stationary interval functions. Proc. Sixth Berkeley Symp. Math. Statist. Probab. 1, 403–431. [237]
Brix, A. (1999). Generalized gamma measures and shot-noise Cox processes. Adv. Appl. Probab. 31, 929–953. [83, 84]
Brown, T.C. (1978). A martingale approach to the Poisson convergence of simple point processes. Ann. Probab. 6, 615–628. [384, 419]
—— (1982). Poisson approximations and exchangeable random variables. In Koch, G. and Spizzichino, F. (Eds.), Exchangeability in Probability and Statistics, North-Holland, Amsterdam, pp. 177–183. [384]
—— (1983). Some Poisson approximations using compensators. Ann. Probab. 11, 726–744. [384]
—— and Nair, M.G. (1988). A simple proof of the multivariate random time change theorem for point processes. J. Appl. Probab. 25, 210–214. [419, 425]
Browne, S. and Sigman, K. (1992). Work-modulated queues with applications to storage processes. J. Appl. Probab. 29, 699–712. [228, 235]
Brownrigg, R. and Harte, D.S. (2005). Using R for statistical seismology. R News 5(1), 31–35. URL: cran.r-project.org/doc/Rnews/Rnews 2005-1.pdf [499]
Buchler, J.R., Dufty, J.W. and Kandrup, H.E. (Eds.) (1998). Long-range Correlations in Astrophysical Systems (Ann. New York Acad. Sci. 848), New York. [517]
Byth, K. (1981). θ-stationary point processes and their second order analysis. J. Appl. Probab. 18, 864–878. [466, 471]
Campbell, N.R. (1909). The study of discontinuous phenomena. Proc. Cambridge Philos. Soc. 15, 117–136. [66, 271]
Chin, Y.C. and Baddeley, A.J. (1999). On connected component Markov point processes. Adv. Appl. Probab. 31, 279–282. [129]
—— and —— (2000). Markov interacting component processes. Adv. Appl. Probab. 32, 597–619. [129]
Chong, F.S. (1983). Time-space-magnitude interdependence of upper crustal earthquakes in the main seismic region of New Zealand. New Zealand J. Geol. Geophys. 26, 7–24. [496]
Chung, K.L. (1967). Markov Chains with Stationary Transition Probabilities, 2nd ed. Springer-Verlag, New York. [115]
—— (1974). A Course in Probability Theory, 2nd ed. Academic Press, New York. [48, 146, 209]
Çinlar, E. (1975). Introduction to Stochastic Processes. Prentice-Hall, Englewood Cliffs, NJ. [96, 115]
Clifford, P. (1990). Markov random fields in statistics. In Grimmett, G.R. and Welsh, D.J.A. (Eds.), Disorder in Physical Systems, Clarendon Press, Oxford, pp. 19–32. [121, 122]
Cramér, H. and Leadbetter, M.R. (1967). Stationary and Related Stochastic Processes. Wiley, New York. [10, 159]
Cressie, N.A.C. (1991). Statistics for Spatial Data. Wiley, New York. [459]
—— (1993). = Cressie (1991), revised ed. [459]
Csiszár, I. (1969). On generalized entropy. Stud. Sci. Math. Hungar. 4, 401–419. [447, 455]
Cutler, C.D. (1991). Some results on the behaviour and estimation of the fractal dimensions of distributions on attractors. J. Statist. Phys. 62, 651–708. [341, 351]
Daley, D.J. (1971). Weakly stationary point processes and random measures. J. Roy. Statist. Soc. Ser. B 33, 406–428. [205, 484]
Daley, D.J. (1972). Asymptotic properties of stationary point processes with generalized clusters. Z. Wahrs. 21, 65–76. [251]
—— (1974). Various concepts of orderliness for point processes. In Harding and Kendall (1974), pp. 148–161. [47, 51]
—— (1982a). Stationary point processes with Markov-dependent intervals and infinite intensity. In Gani, J. and Hannan, E.J. (Eds.), Essays in Statistical Science (J. Appl. Probab. 19A), pp. 313–320. [51]
—— (1982b). Infinite intensity mixtures of point processes. Math. Proc. Cambridge Philos. Soc. 92, 109–114. [51]
—— (1987). The variation distance between Poisson distributions. (Unpublished). [Abstract in Bull. Inst. Math. Statist. 16, 41.] [166, 175]
—— (1999). The Hurst index of long-range dependent renewal processes. Ann. Probab. 27, 2035–2041. [254]
—— (2004). Further results for the lilypond model. In Baddeley, A., Gregori, P., Mateu, J., Stoica, R. and Stoyan, D. (Eds.), Spatial Point Process Modelling and its Applications (Treballs d’Informàtica i Tecnologia 20), Universitat Jaume I, Castelló, pp. 55–65. [504]
—— (2007). Long-range dependence in a Cox process directed by an alternating renewal process. (Submitted). [254]
—— and Goldie, C.M. (2005). The moment index of minima (II). Statist. Probab. Letters 76, 831–837. [254]
—— and Last, G. (2005). Descending chains, the lilypond model, and mutual nearest-neighbour matching. Adv. Appl. Probab. 37, 604–628. [506]
——, Mallows, C.L. and Shepp, L. (2000). A one-dimensional Poisson growth model with non-overlapping intervals. Stoch. Proc. Appl. 90, 223–241. [504]
——, Rolski, T. and Vesilo, R. (2007). Long-range dependence in a Cox process driven by a Markov renewal process. J. Appl. Math. Decision Sci. (Special Issue Statistics and Applied Probability: A tribute to Jeffrey J. Hunter) (to appear). [254]
——, Stoyan, D. and Stoyan, H. (1999). The volume fraction of a Poisson germ model with maximally non-overlapping spherical grains. Adv. Appl. Probab. 31, 610–624. [503]
—— and Vere-Jones, D. (1987). The extended probability generating functional, with application to mixing properties of cluster point processes. Math. Nachr. 131, 311–319. [64, 213]
—— and —— (2004). Scoring probability forecasts for point processes: the entropy score and information gain. In Gani, J. and Seneta, E. (Eds.), Stochastic Methods and their Applications (Papers in honour of Chris Heyde; J. Appl. Probab. 41A), pp. 297–312. [445, 454, 503]
—— and Vesilo, R. (1997). Long-range dependence of point processes, with queueing examples. Stoch. Proc. Appl. 70, 265–282. [249]
—— and —— (2000). Long-range dependence of inputs and outputs of classical queues. In McDonald, D.R. and Turner, S.R.E. (Eds.), Analysis of Communication Networks: Call Centres, Traffic and Performance (Fields Institute Communications 28), pp. 179–186. [252]
Darroch, J.N. and Morris, K.W. (1967). Some passage-time generating functions for discrete-time and continuous-time finite Markov chains. J. Appl. Probab. 4, 496–507. [116]
Davidson, R. (1974a). Stochastic processes of flats and exchangeability (Part II, Ph.D. thesis, Cambridge University, 1968). In Harding and Kendall (1974), pp. 13–45. [478, 479]
—— (1974b). Construction of line processes: second-order properties. In Harding and Kendall (1974), pp. 55–75. [Original publication (1970), Izv. Akad. Nauk Armen. SSR Ser. Mat. 5, 219–234.] [481]
—— (1974c). Exchangeable point-processes. In Harding and Kendall (1974), pp. 46–51. [73]
Davis, M.H.A. (1976). The representation of martingales of jump processes. SIAM J. Control 14, 623–638. [375]
Davison, A.C. and Ramesh, N.I. (1993). A stochastic model for times of exposure to air pollution from a point source. In Barnett, V.D. and Turkman, K.F. (Eds.), Statistics for the Environment, Wiley, Chichester. [105]
Debes, H., Kerstan, J., Liemant, A. and Matthes, K. (1971). Verallgemeinerungen eines Satzes von Dobruschin, III. Math. Nachr. 50, 99–139. [175]
Delasnerie, M. (1977). Flots mélangeants et mesures de Palm. Ann. Inst. Henri Poincaré, Sec. B 8, 357–369. [269, 330]
Dellacherie, C. and Meyer, P.-A. (1978). Probabilities and Potential. Hermann, Paris, and North-Holland, Amsterdam. [356]
Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with Discussion). J. Roy. Statist. Soc. Ser. B 39, 1–38. [104]
Deng, L. and Mark, J.W. (1993). Parameter estimation for Markov-modulated Poisson processes via the E–M algorithm with time discretization. Telecommun. Systems 1, 321–338. [104]
Derman, C. (1955). Some contributions to the theory of denumerable Markov chains. Trans. Amer. Math. Soc. 79, 541–555. [173, 175]
Diggle, P.J. (1983). Statistical Analysis of Spatial Point Patterns. Academic Press, London. [459–461]
—— (2003). = Diggle (1983), 2nd ed. Oxford University Press, Oxford. [459]
——, Fiksel, T., Grabarnik, P., Ogata, Y., Stoyan, D. and Tanemura, M. (1994). On parameter estimation for pairwise-interaction point processes. Internat. Statist. Review 62, 99–117. [514]
Dobrushin, R.L. (1956). On the Poisson law for distributions of particles in space (in Russian). Ukr. Mat. Zh. 8, 127–134. [174]
—— (1968). The description of a random field by means of conditional probabilities and conditions of its regularity. Teor. Veroyat. Primenen. 13, 201–229. [Translation in Theory Probab. Applic. 13, 197–224.] [118]
Doksum, K. (1974). Tailfree and neutral random probabilities and their posterior distributions. Ann. Probab. 2, 183–201. [23]
Doléans-Dade, C. (1970). Quelques applications de la formule de changement de variables pour les semimartingales. Z. Wahrs. 16, 181–194. [419]
Doob, J.L. (1948). Renewal theory from the point of view of the theory of probability. Trans. Amer. Math. Soc. 63, 422–438. [268]
Dubins, L.E. and Freedman, D.A. (1967). Random distribution functions. Proc. Fifth Berkeley Symp. Math. Statist. Probab. 2 (Pt. 1), 183–214. [50]
Dudley, R.M. (1969). Random linear functionals. Trans. Amer. Math. Soc. 136, 1–24. [53]
Elliott, R.J. (1976). Stochastic integrals for martingales of a jump process with partially accessible jump times. Z. Wahrs. 36, 213–226. [370]
—— (1982). Stochastic Calculus and its Applications. Springer-Verlag, New York. [370]
——, Aggoun, L. and Moore, J.B. (1995). Hidden Markov Models. Springer, New York. [104]
Falconer, K. (1990). Fractal Geometry: Mathematical Foundations and Applications. Wiley, Chichester. [341]
Feller, W. (1966). An Introduction to Probability Theory and its Applications, Vol. 2. Wiley, New York. [12, 25, 146, 259, 409]
—— (1968). An Introduction to Probability Theory and its Applications, Vol. 1, 3rd ed. Wiley, New York. [115]
—— (1971). = Feller (1966), 2nd ed. [160]
Ferguson, T.S. (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist. 1, 209–230. [12]
Fiksel, T. (1988). Estimation of interaction potentials of Gibbsian point processes. Statistics 19, 77–86. [514, 518]
Fischer, D.R. and Meier-Hellstern, K.S. (1993). The Markov-modulated Poisson process (MMPP) cookbook. Performance Eval. 18, 149–171. [105]
Fleischmann, K. (1978). Mixing properties of cluster-invariant distributions. Litovsk. Mat. Sb. 18(3), 191–199. [338]
—— and Prehn, U. (1974). Ein Grenzwertsatz für subkritische Verzweigungsprozesse mit endlich vielen Typen von Teilchen. Math. Nachr. 64, 357–362. [339]
—— and —— (1975). Subkritische räumlich homogene Verzweigungsprozesse. Math. Nachr. 70, 231–250. [339]
Franken, P. (1963). Approximation durch Poissonsche Prozesse. Math. Nachr. 26, 101–114. [146]
—— (1975). Einige Anwendungen der Theorie zufälliger Punktprozesse in der Bedienungstheorie, I. Math. Nachr. 70, 303–319. [269]
——, König, D., Arndt, U. and Schmidt, V. (1981). Queues and Point Processes. Akademie-Verlag, Berlin. [222, 269, 300, 316, 328, 333]
Fritz, J. (1969). Entropy of point processes. Stud. Sci. Math. Hungar. 4, 389–399. [447]
—— (1973). An approach to the entropy of point processes. Period. Math. Hungar. 3, 73–83. [455]
Galchuk, L.I. and Rosovskii, B.L. (1971). The ‘disorder’ problem for a Poisson process. Teor. Veroyat. Primen. 16, 729–734. [Translation in Theory Probab. Appl. 16, 712–717.] [101]
Gel’fand, I.M. and Vilenkin, N.Ya. (1964). Generalized Functions, Vol. 4. Academic Press, New York. [54]
Georgii, H.-O. (1976). Canonical and grand canonical Gibbs states for continuum systems. Commun. Math. Phys. 48, 31–51. [511]
—— (1988). Gibbs Measures and Phase Transitions (Studies in Mathematics 9). de Gruyter, Berlin. [118, 519]
Glötzl, E. (1980). Lokale Energien und Potentiale für Punktprozesse. Math. Nachr. 96, 195–206. [519, 524]
Grassberger, P. and Procaccia, I. (1983). Measuring the strangeness of strange attractors. Physica D 9, 189–208. [346]
Grigelionis, B. (1963). On the convergence of sums of random step processes to a Poisson process (in Russian). Teor. Veroyat. Primen. 8, 189–194. [Translation in Theory Probab. Appl. 8, 177–182.] [146]
—— (1971). On the representation of integer-valued random measures by means of stochastic integrals with respect to Poisson measure. Litovsk. Mat. Sb. 11, 93–108. [427, 428]
Gross, D. and Miller, D.R. (1984). The randomization technique as a modeling tool and solution procedure for transient Markov processes. Operat. Res. 32, 345–361. [117]
Häggström, O. and Meester, R. (1996). Nearest neighbour and hard sphere models in continuum percolation. Random Struct. Algor. 9, 295–315. [503]
Halmos, P. (1950). Measure Theory. Van Nostrand, Princeton, NJ. [20]
Hammersley, J.M. and Clifford, P. (1971). Markov fields on finite fields and lattices (unpublished). See Besag (1974), Clifford (1990). [118]
——, Lewis, J.W.E. and Rowlinson, J.S. (1975). Relationships between the multinomial and Poisson models of stochastic processes, and between the canonical and grand canonical ensembles in statistical mechanics, with illustrations and Monte-Carlo methods for the penetrable sphere model of liquid–vapour equilibrium. Sankhyā A 37, 457–491. [124]
Hanisch, K.-H. (1984). Some remarks on estimators of the distribution function of nearest-neighbour distance in stationary spatial point-patterns. Math. Operat. Statist., Ser. Statistics 15, 409–412. [463, 466]
Harding, E.J. and Kendall, D.G. (Eds.) (1974). Stochastic Geometry. Wiley, Chichester. [458, 472]
Harris, T.E. (1963). The Theory of Branching Processes. Springer-Verlag, Berlin. [1, 26, 334, 338]
Harte, D.S. (1998). Dimension estimates of earthquake epicentres and hypocentres. J. Non-linear Sci. 8, 581–618. [346]
—— (2001). Multifractals: Theory and Applications. Chapman and Hall/CRC, Boca Raton, FL. [341–352]
—— (2003). Package “PtProcess”: Time Dependent Point Process Modelling. R statistical program routines. Statistics Research Associates, Wellington. URL: homepages.paradise.net.nz/david.harte/SSLib [499]
Hawkes, A.G. and Oakes, D. (1974). A cluster representation of a self-exciting process. J. Appl. Probab. 11, 493–503. [229, 232]
Heveling, M. and Last, G. (2005). Characterization of Palm measures via bijective point shifts. Ann. Probab. 33, 1698–1715. [269, 309–317]
—— and —— (2006). Existence, uniqueness, and algorithmic computation of general lilypond systems. Random Struct. Algor. 29, 338–350. [506]
Hill, B.M. (1975). A simple general approach to inference about the tail of a distribution. Ann. Statist. 3, 1163–1174. [346]
Ibragimov, I.A. and Linnik, Yu.V. (1971). Independent and Stationary Sequences of Random Variables. Wolters-Noordhoff, Groningen. [Translated from original in Russian (1965), Nauka, Moscow.] [167]
Isham, V. (1981). An introduction to spatial point processes and Markov random fields. Internat. Statist. Rev. 49, 21–43. [118]
—— and Westcott, M. (1979). A self-correcting point process. Stoch. Proc. Appl. 8, 335–347. [418]
Jacobsen, M. (1982). Statistical Analysis of Counting Processes (Springer Lecture Notes Statist. 12). Springer-Verlag, New York. [396]
Jacod, J. (1975). Multivariate point processes: Predictable projections, Radon–Nikodym derivatives, representation of martingales. Z. Wahrs. 31, 235–253. [399, 405]
Jarupskin, B.D.S. (1984). Maximum Likelihood and Related Estimation Methods in Point Processes. Ph.D. thesis, University of California, Berkeley. [412, 416]
Jelinski, Z. and Moranda, P. (1972). Software reliability research. In Freiberger, W. (Ed.), Statistical Computer Performance Evaluation, Academic Press, New York, pp. 465–484. [100]
Jowett, J. and Vere-Jones, D. (1972). The prediction of stationary point processes. In Lewis, P.A.W. (Ed.), Stochastic Point Processes, Wiley, New York, pp. 405–435. [101, 410]
Kagan, Y.Y. (1981a). Spatial distribution of earthquakes: The three-point correlation function. Geophys. J. Roy. Astr. Soc. 67, 697–717. [249]
—— (1981b). Spatial distribution of earthquakes: The four-point correlation function. Geophys. J. Roy. Astr. Soc. 67, 719–733. [249]
—— and Knopoff, L. (1980). Spatial distribution of earthquakes: The two-point correlation function. Geophys. J. Roy. Astr. Soc. 62, 303–320. [249]
—— and Vere-Jones, D. (1995). Problems in the modelling and statistical analysis of earthquakes. In Heyde, C.C., Prohorov, Yu.V., Pyke, R. and Rachev, S.T. (Eds.), Athens Conference on Applied Probability and Time Series, Volume 1: Applied Probability (Springer Lecture Notes in Statistics 114), Springer, New York, pp. 398–425. [249]
Kallenberg, O. (1973). Characterization and convergence of random measures and point processes. Z. Wahrs. 27, 9–21. [33]
—— (1975). Random Measures. Akademie-Verlag, Berlin, and Academic Press, London. [3rd ed. 1983, reprinted with corrections as 4th ed. 1986.] [33–39, 95, 146, 153, 178, 272]
—— (1977a). Stability of critical cluster fields. Math. Nachr. 77, 7–43. [336, 337]
—— (1977b). A counterexample to R. Davidson’s conjecture on line processes. Math. Proc. Cambridge Philos. Soc. 82, 301–307. [481, 483]
—— (1978). On conditional intensities of point processes. Z. Wahrs. 41, 205–220. [519]
—— (1983a). = Kallenberg (1975), 3rd ed. [55, 86, 178, 270, 272, 518–534]
Kallenberg, O. (1983b). On random processes of flats with special emphasis on the invariance problem. Bull. Int. Statist. Inst. 50(2), 854–862, and 50(3), 383–392. [531]
—— (1984). An informal guide to the theory of conditioning in point processes. Int. Statist. Review 52, 151–164. [519, 525]
—— (1986). = Kallenberg (1975), 4th ed. [518]
Kaplan, E.L. (1955). Transformations of stationary random sequences. Math. Scand. 3, 127–149. [268, 299]
Karatzas, I. and Shreve, S.E. (1988). Brownian Motion and Stochastic Calculus. Springer-Verlag, New York. [370]
Karbe, W. (1973). Konstruktion einfacher zufälliger Punktfolgen. Diplomarbeit, Friedrich-Schiller-Universität Jena, Sektion Mathematik. [33]
Keiding, N. (1975). Maximum likelihood estimation in birth and death processes. Ann. Statist. 3, 363–372. [416–418]
Kelly, F.P. and Ripley, B.D. (1976). A note on Strauss’s model for clustering. Biometrika 63, 357–360. [123]
Kendall, D.G. (1963). Extreme-point methods in stochastic analysis. Z. Wahrs. 1, 295–300. [83, 86]
—— (1974). Foundations of a theory of random sets. In Harding and Kendall (1974), pp. 322–376. [33, 458]
Kerstan, J. (1964a). Verallgemeinerung eines Satzes von Prochorov und Le Cam. Z. Wahrs. 2, 173–179. [163]
—— (1964b). Teilprozesse Poissonscher Prozesse. In Transactions of the Third Prague Conference on Information Theory, Statistical Decision Functions and Random Processes. Czech. Academy of Science, Prague, pp. 377–403. [224, 427, 430, 439]
—— and Debes, H. (1969). Zufällige Punktfolgen und Markoffsche Übergangsmatrizen ohne stationäre Verteilungsgesetze. Wiss. Z. Friedrich-Schiller-Universität Jena 18, 349–359. [175]
—— and Matthes, K. (1964). Stationäre zufällige Punktfolgen II. Jber. Deutsch. Math.-Verein. 66, 106–118. [91]
—— and —— (1967). Ergodische unbegrenzt teilbare stationäre Punktfolgen. Trans. Fourth Prague Conf. Inf. Theory Stat. Dec. Functions Random Proc., 399–415. [217, 218]
——, —— and Mecke, J. (1974). Unbegrenzt Teilbare Punktprozesse. Akademie-Verlag, Berlin. [2nd, 3rd ed. = MKM (1978, 1982) in English, Russian.] [268]
Khinchin, A.Ya. (1955). Mathematical Methods in the Theory of Queueing (in Russian). Trudy Mat. Inst. Steklov 49. [Translated (1960) Griffin, London.] [47, 146, 268]
Kijima, M. (1989). Some results for repairable systems. J. Appl. Probab. 26, 89–102. [228, 235]
Kingman, J.F.C. (1967). Completely random measures. Pacific J. Math. 21, 59–78. [77–82]
Klemm, A., Lindemann, C. and Lohmann, M. (2003). Modeling IP traffic using the batch Markovian arrival process. Performance Eval. 54, 149–173. [110–113]
König, D. and Matthes, K. (1963). Verallgemeinerung der Erlangschen Formeln, I. Math. Nachr. 26, 45–56. [269]
König, D., Matthes, K. and Nawrotzki, K. (1967). Verallgemeinerung der Erlangschen und Engsetschen Formeln (Eine Methode in der Bedienungstheorie). Akademie-Verlag, Berlin. [269]
Kraft, C.H. (1964). A class of distribution function processes which have derivatives. J. Appl. Probab. 1, 385–388. [50]
Krickeberg, K. (1965). Probability Theory. Addison-Wesley, Reading, MA. [378]
—— (1974a). Invariance properties of the correlation measure of line-processes. In Harding and Kendall (1974), pp. 76–88. [478]
—— (1974b). Moments of point processes. In Harding and Kendall (1974), pp. 89–113. [237, 473–478]
Kulkarni, V.G. (1995). Modeling and Analysis of Stochastic Systems. Chapman & Hall, London. [96]
Kummer, G. and Matthes, K. (1970). Verallgemeinerung eines Satzes von Sliwnjak, II. Rev. Roumaine Math. Pures Appl. 15, 845–870. [268, 271]
Kurtz, T.G. (1974). Point processes and completely monotone set functions. Z. Wahrs. 31, 57–67. [33–36]
Kutoyants, Y. (1979). Local asymptotic normality for Poisson type processes. Izv. Akad. Nauk Arm. SSR Ser. Mat. 14(1), 3–20 and 72. [412]
—— (1980). Estimation of Parameters of Stochastic Processes (in Russian). Armenian Academy of Science, Erevan. [412, 415]
—— (1984a). Parameter estimation for processes of Poisson type. Izv. Akad. Nauk Arm. SSR Ser. Mat. 19, 233–241. [415]
—— (1984b). Parameter Estimation for Stochastic Processes. Heldermann, Berlin. [Translated by B.L.S. Prakasa Rao and revised from Kutoyants (1980).] [412, 415]
Lanford, O.E. and Ruelle, D. (1969). Observables at infinity and states with short range correlations in statistical mechanics. Comm. Math. Phys. 13, 194–215. [215]
Last, G. (1996). Coupling with compensators. Stoch. Proc. Appl. 65, 147–170. [429]
—— (2004). Ergodicity properties of stress-release, repairable system and workload models. Adv. Appl. Probab. 36, 471–498. [222, 228, 235, 429]
—— and Brandt, A. (1995). Marked Point Processes on the Real Line. Springer, New York. [355, 368, 380, 382, 400–405]
—— and Szekli, R. (1998). Asymptotic and monotonicity properties of some repairable systems. Adv. Appl. Probab. 30, 1089–1110. [228, 235]
Lawrance, A.J. (1971). Selective interaction of a Poisson and a renewal process: The dependency structure of the intervals between the responses. J. Appl. Probab. 8, 170–173. [247]
Le Cam, L. (1960). An approximation theorem for the Poisson binomial distribution. Pacific J. Math. 10, 1181–1197. [165]
Leadbetter, M.R. (1968). On three basic results in the theory of stationary point processes. Proc. Amer. Math. Soc. 19, 115–117. [39]
—— (1972). On basic results of point process theory. Proc. Sixth Berkeley Symp. Math. Statist. Probab. 3, 449–462. [39, 275, 283, 308]
548
References with Index
Leadbetter, M.R., Lindgren, G. and Rootzén, H. (1983). Extremes and Related Properties of Random Sequences and Processes. Springer-Verlag, New York. [159]
Lee, M.-L.T. and Whitmore, G. (1993). Stochastic processes directed by randomized time. J. Appl. Probab. 30, 302–314. [83]
Lee, P.M. (1964). A structure theorem for infinitely divisible point processes. Address to I.A.S.P.S., Berne (unpublished) [cf. Lee, P.M. (1964), The superposition of point processes (Abstract), Ann. Math. Statist. 35, 1406–1407]. [91]
—— (1967). Infinitely divisible stochastic processes. Z. Wahrs. 7, 147–160. [91]
Lewis, P.A.W. and Shedler, G.S. (1976). Simulation of nonhomogeneous Poisson processes with log linear rate function. Biometrika 63, 501–506. [427]
Liemant, A. (1969). Invariante zufällige Punktfolgen. Wiss. Z. Friedrich-Schiller-Universität Jena 18, 361–372. [337]
—— (1975). Verallgemeinerungen eines Satzes von Dobruschin, V. Math. Nachr. 70, 387–390. [337]
——, Matthes, K. and Wakolbinger, A. (1988). Equilibrium Distributions of Branching Processes. Akademie-Verlag, Berlin. [334]
Liggett, T.M. (1985). Interacting Particle Systems. Springer, New York. [431]
Lindvall, T. (1988). Ergodicity and inequalities in a class of point processes. Stoch. Proc. Appl. 30, 121–131. [222, 229, 439]
—— (1992). Lectures on the Coupling Method. Wiley, New York. [132, 222, 229]
Liptser, R.S. and Shiryayev, A.N. (1974). Statistics of Random Processes (in Russian). Nauka, Moscow. [Translation = —— and —— (1977, 1978).] [358]
—— and —— (1977). Statistics of Random Processes, I: General Theory. Springer-Verlag, New York. [358]
—— and —— (1978). Statistics of Random Processes, II: Applications. Springer-Verlag, New York. [358, 404, 405]
—— and —— (2000). = Liptser and Shiryayev (1977, 1978), retypeset ed. [358]
Loève, M. (1963). Probability Theory, 3rd ed. Van Nostrand, New York. [146]
MacDonald, I.L. and Zucchini, W. (1997). Hidden Markov and Other Models for Discrete-Valued Time Series. Chapman and Hall, London. [104, 116]
Mandelbrot, B. (1982). The Fractal Geometry of Nature, 2nd ed. W.H. Freeman, San Francisco. [20, 256, 340]
Mardia, K.V. and Jupp, P.E. (2000). Directional Statistics. Wiley, Chichester. [= 2nd ed. of Mardia, K.V. (1972), Statistics of Directional Data. Academic Press, London.] [189]
Martínez, V.J. and Saar, E. (2002). Statistics of the Galaxy Distribution. Chapman & Hall/CRC, Boca Raton, FL. [21, 517]
Maruyama, G. (1955). On the Poisson distribution derived from independent random walks. Nat. Sci. Rep. Ochanomizu Univ. 6, 1–6. [174]
Mase, S. (1986). On the possible form of size distributions for Gibbsian processes of mutually non-intersecting discs. J. Appl. Probab. 23, 646–659. [461]
—— (1990). Mean characteristics of Gibbsian point processes. Ann. Inst. Statist. Math. 42, 203–220. [461]
Massoulié, L. (1998). Stability results for a general class of interacting point process dynamics, and applications. Stoch. Proc. Appl. 75, 1–30. [222, 427–439]
Matérn, B. (1960). Spatial Variation. Meddelanden Stat. Skogsforsk. 49(5), 1–144. [2nd ed. (1986). Lecture Notes in Statistics 36, Springer-Verlag, New York.] [459]
Matthes, K. (1972). Infinitely divisible point processes. In Stochastic Point Processes (P.A.W. Lewis, ed.), Wiley, New York, pp. 384–404. [66]
——, Kerstan, J. and Mecke, J. (1974). See Kerstan, Matthes and Mecke (1974). [269]
——, —— and —— (1978). Infinitely Divisible Point Processes. Wiley, Chichester. See MKM (1978). [33, 269]
——, —— and —— (1982). Bezgranichno Delimye Tochechnye Protsessy. Nauka, Moscow. See MKM (1982). [269]
——, Warmuth, W. and Mecke, J. (1979). Bemerkungen zu einer Arbeit von Nguyen Xuan Xanh und Hans Zessin. Math. Nachr. 88, 117–127. [519]
McFadden, J.A. (1965a). The mixed Poisson process. Sankhya A 27, 83–92. [73]
—— (1965b). The entropy of a point process. J. SIAM 13, 988–994. [442]
McMillan, B. (1953). Absolutely monotone functions. Ann. Math. 60, 467–501. [33]
Mecke, J. (1967). Stationäre zufällige Maße auf lokal-kompakten abelschen Gruppen. Z. Wahrs. 9, 36–58. [281–292]
—— (1968). Eine charakteristische Eigenschaft der doppelt stochastischen Poisson-Prozesse. Z. Wahrs. 11, 74–81. [166]
—— (1975). Invarianzeigenschaften allgemeiner Palmscher Maße. Math. Nachr. 65, 335–344. [269, 285–317]
—— (1979). An explicit description of Kallenberg's lattice-type point process. Math. Nachr. 89, 185–195. [483]
Mertens, J.-F. (1972). Théorie des processus stochastiques généraux: applications aux martingales. Z. Wahrs. 22, 45–68. [399]
Métivier, M. (1971). Sur la construction de mesures aléatoires presque sûrement absolument continues par rapport à une mesure donnée. Z. Wahrs. 20, 332–344. [50]
Meyer, P.A. (1971). Démonstration simplifiée d'un théorème de Knight. In Séminaire de Probabilités V (Springer Lecture Notes in Mathematics 191), Springer, New York, pp. 191–195. [419]
Meyn, S.P. and Tweedie, R.L. (1993a). Markov Chains and Stochastic Stability. Springer-Verlag, London. [226, 228]
—— and —— (1993b). Stability of Markovian processes II: Continuous-time processes and sampled chains. Adv. Appl. Probab. 25, 487–517. [226–235]
Mikosch, T. and Wang, Q. (1995). A Monte Carlo method for estimating the correlation exponent. J. Statist. Phys. 78, 799–813. [346]
Miles, R.E. (1969). Poisson flats in Euclidean spaces, Part I: A finite number of random uniform flats. Adv. Appl. Probab. 1, 211–237. [476]
—— (1971). Poisson flats in Euclidean spaces, Part II: Homogeneous Poisson flats and the complementary theorem. Adv. Appl. Probab. 3, 1–43. [476]
—— (1974). On the elimination of edge effects in planar sampling. In Harding and Kendall (1974), pp. 228–247. [476]
Milne, R.K. (1971). Stochastic analysis of multivariate point processes. Ph.D. thesis, Australian National University. [89]
MKM (1974). = Kerstan, Matthes and Mecke (1974). [269]
MKM (1978). = Matthes, Kerstan and Mecke (1978). [33, 51, 66, 131, 146, 162, 174–178, 198, 213–222, 269, 298, 339]
MKM (1982). = Matthes, Kerstan and Mecke (1982). [15, 215–221, 339, 519]
Møller, J. and Waagepetersen, R.P. (2004). Statistical Inference and Simulation for Spatial Point Processes. Chapman and Hall/CRC, Boca Raton, FL. [118–130, 508]
Mönch, G. (1971). Verallgemeinerung eines Satzes von A. Rényi. Stud. Sci. Math. Hungar. 6, 81–90. [33, 35]
Moran, P.A.P. (1968). An Introduction to Probability Theory. Clarendon Press, Oxford. [20, 66]
—— (1973). Necessary conditions for Markovian processes on a lattice. J. Appl. Probab. 10, 605–612. [118]
Moyal, J.E. (1962). The general theory of stochastic population processes. Acta Math. 108, 1–31. [1, 5, 26]
Musmeci and Vere-Jones (1987). A variable grid algorithm for smoothing of clustered data. Biometrics 42, 483–494. [490]
Nair, M.G. (1990). Random space change for multiparameter point processes. Ann. Probab. 18, 1222–1231. [419]
Neuts, M. (1978). Renewal processes of phase type. Naval Research Logistics Quarterly 25, 445–454. [110, 111]
—— (1979). A versatile Markovian point process. J. Appl. Probab. 16, 764–779. [110, 111]
—— (1989). Structured Markov Chains of the M/G/1 Type and Their Applications. Marcel Dekker, New York. [110]
Neveu, J. (1968). Sur la structure des processus ponctuels stationnaires. C. R. Acad. Sci. Paris Ser. A 267, 561–564. [269]
—— (1976). Processus ponctuels. In Springer Lecture Notes Math. 598, Springer, New York, pp. 249–445. [269]
Nguyen, X.X. and Zessin, H. (1976). Punktprozesse mit Wechselwirkung. Z. Wahrs. 37, 91–126. [15, 242]
—— and —— (1979a). Ergodic theorems for spatial point processes. Z. Wahrs. 48, 133–158. [198]
—— and —— (1979b). Integral and differential characterizations of Gibbs processes. Math. Nachr. 88, 105–115. [511, 518]
Nieuwenhuis, G. (1994). Bridging the gap between a stationary point process and its Palm distribution. Statist. Neerlandica 48, 37–62. [332]
Ogata, Y. (1988). Statistical models for earthquake occurrence and residual analysis for point processes. J. Amer. Statist. Assoc. 83, 9–27. [419]
—— (1998). Space–time point process models for earthquake occurrences. Ann. Inst. Statist. Math. 50, 379–402. [499]
—— (2004). Space–time model for regional seismicity and detection of crustal stress changes. J. Geophys. Res. 109, B03308, doi:10.1029/2003JB002621. [501]
—— and Katsura, K. (1991). Maximum likelihood estimates of the fractal dimension for random spatial patterns. Biometrika 78, 463–474. [21, 253, 498]
Ogata, Y., Katsura, K. and Tanemura, M. (2003). Modelling of heterogeneous space–time seismic activity and its residual analysis. Appl. Statist. 52, 499–509. [499]
—— and Zhuang, J. (2006). Space–time ETAS models and an improved extension. Tectonophysics 413, 13–23. [499]
Ososkov, G.A. (1956). A limit theorem for flows of similar events (in Russian). Teor. Veroyat. Primen. 1, 274–282. [Translation in Theory Probab. Appl. 1, 248–255.] [146]
Palm, C. (1943). Intensitätsschwankungen im Fernsprechverkehr. Ericsson Technics 44, 1–189. [146, 268]
Paloheimo, J.E. (1971). On a theory of search. Biometrika 58, 61–75. [466]
Papangelou, F. (1970). The Ambrose–Kakutani theorem and the Poisson process. In Springer Lecture Notes Math. 160, pp. 234–240. [268, 269]
—— (1972). Integrability of expected increments of point processes and a related random change of scale. Trans. Amer. Math. Soc. 165, 483–506. [420]
—— (1974a). On the Palm probabilities of processes of points and processes of lines. In Harding and Kendall (1974), pp. 114–147. [268]
—— (1974b). The conditional intensity of general point processes and an application to line processes. Z. Wahrs. 28, 207–226. [506, 520, 527, 531]
—— (1978). On the entropy rate of stationary point processes and its discrete approximation. Z. Wahrs. 44, 191–211. [449]
Parthasarathy, K.R. (1967). Probability Measures on Metric Spaces. Academic Press, New York. [37, 55, 146–154]
Penttinen, A.K. (1984). Modelling Interaction in Spatial Point Patterns: Parameter Estimation by the Maximum Likelihood Method. Jyväskylä Studies in Computer Science, Economics, and Statistics 7. University of Jyväskylä. [130]
Perez, A. (1959). Information theory with an abstract alphabet (Generalized forms of McMillan's limit theorems for the case of discrete and continuous time). Teor. Veroyat. Primen. 4, 105–109. [Translation in Theory Probab. Appl. 4, 99–102.] [452]
Preston, C.J. (1976). Random Fields (Springer Lecture Notes Math. 534). Springer-Verlag, New York. [118, 520]
—— (1977). Spatial birth-and-death processes. Bull. Internat. Statist. Inst. 46, 371–391. [126]
Qian, W. and Titterington, D.M. (1990). Parameter estimation for hidden Gibbs chains. Statist. Probab. Letters 10, 49–58. [104]
Ramaswami, V. (1980). The N/G/1 queue and its detailed analysis. Adv. Appl. Probab. 12, 222–261. [110]
Rauchenschwandtner, B. (1980). Gibbsprozesse und Papangeloukerne (Dissertations of the Johannes Kepler University of Linz, 17). Verb. Wiss. Gesellsch. Österreichs, Wien. [519]
Rebolledo, R. (1980). Central limit theorem for local martingales. Z. Wahrs. 51, 269–286. [412]
Rényi, A. (1956). A characterization of Poisson processes (in Hungarian; Russian and English summaries). Magyar Tud. Akad. Mat. Kutato Int. Kozl. 1, 519–527. [Translation in (1976) Selected Papers of Alfréd Rényi, Vol. 1 (P. Turán, ed.), pp. 622–628, Akadémiai Kiadó, Budapest.] [165]
—— (1959). On the dimension and entropy of probability distributions. Acta Math. Acad. Sci. Hungar. 10, 193–215. [340, 440, 455]
—— (1967). Remarks on the Poisson process. Stud. Sci. Math. Hungar. 5, 119–123. [33, 35]
Resnick, S.I. (1986). Point processes, regular variation and weak convergence. Adv. Appl. Probab. 18, 66–138. [161]
—— (1987). Extreme Values, Regular Variation, and Point Processes. Springer-Verlag, New York. [161]
Ripley, B.D. (1976). The second-order analysis of spatial point processes. J. Appl. Probab. 13, 255–266. [191]
—— (1977). Modelling spatial patterns (with Discussion). J. Roy. Statist. Soc. Ser. B 39, 172–212. [126]
—— (1981). Spatial Statistics. John Wiley & Sons, Chichester. [457, 459]
—— (1988). Statistical Inference for Spatial Processes. Cambridge University Press, Cambridge. [463]
—— and Kelly, F.P. (1977). Markov point processes. J. London Math. Soc. 15, 188–192. [118, 122]
Roberts, W.J.J., Ephraim, Y. and Dieguez, E. (2006). On Rydén's EM algorithm for estimating MMPPs. IEEE Signal Process. Lett. 13, 373–376. [109]
Romanowska, M. (1978). Poisson approximation of some probability distributions. Bull. Acad. Polon. Sci., Ser. Sci. Math. Astr. Phys. 26, 1023–1026. [163]
Rudemo, M. (1964). Dimension and entropy for a class of stochastic processes. Magyar Tud. Akad. Mat. Kutato Int. Kozl. 9, 73–87. [442]
—— (1972). Doubly stochastic Poisson processes and process control. Adv. Appl. Probab. 4, 318–338. [101, 410]
—— (1973). Point processes generated by transitions of Markov chains. Adv. Appl. Probab. 5, 262–286. [101, 109, 410]
Ruelle, D. (1969). Statistical Mechanics: Rigorous Results. W.A. Benjamin, Reading, MA. [118]
Rydén, T. (1994). Parameter estimation for Markov-modulated Poisson processes. Stoch. Models 10, 795–829. [105]
—— (1996). An EM algorithm for estimation in Markov-modulated Poisson processes. Comp. Statist. Data Anal. 21, 431–447. [105–111]
Ryll-Nardzewski, C. (1961). Remarks on processes of calls. Proc. Fourth Berkeley Symp. Math. Statist. Probab. 2, 455–465. [268, 299]
Samorodnitsky, G. and Taqqu, M.S. (1994). Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman and Hall, New York. [83]
Sampath, G. and Srinivasan, S.K. (1977). Stochastic Models for Spike Trains of Single Neurons (Springer Lecture Notes Biomath. 10). Springer-Verlag, Berlin. [247]
Samuels, S.M. (1965). On the number of successes in independent trials. Ann. Math. Statist. 36, 1272–1278. [163]
Schoenberg, F.P. (1999). Transforming spatial point processes into Poisson processes. Stoch. Proc. Appl. 81, 155–164. [419]
—— (2002). On rescaled Poisson processes and the Brownian bridge. Ann. Inst. Statist. Math. 54, 445–457. [502]
Serinko, R.J. (1994). A consistent approach to least squares estimation of correlation dimension in weak Bernoulli dynamical systems. Ann. Appl. Probab. 4, 1234–1254. [351, 352]
Sgibnev, M.S. (1981). On the renewal theorem in the case of infinite variance. Sibirsk. Mat. Zh. 22(5), 178–189. [Translation in Siberian Math. J. 22, 787–796.] [254]
Sigman, K. (1995). Stationary Marked Point Processes. Chapman and Hall, New York. [15, 222–231, 269, 322–333]
Sinai, Ya.G. (Ed.) (2000). Dynamical Systems, Ergodic Theory and Applications, Second, Expanded and Revised Edition. Springer, Berlin. [196]
Singpurwalla, N.D. and Wilson, S.P. (1999). Statistical Methods in Software Engineering, Reliability and Risk. Springer-Verlag, New York. [100]
SKM (1987). = Stoyan et al. (1987). [269, 459]
SKM (1995). = SKM (1987), 2nd ed. [244, 269, 457–460, 472, 483]
Slivnyak, I.M. (1962). Some properties of stationary flows of homogeneous random events. Teor. Veroyat. Primen. 7, 347–352. [Translation in Theory Probab. Appl. 7, 336–341.] [268, 281, 299, 308]
—— (1966). Stationary streams of homogeneous random events. Vest. Harkov. Gos. Univ. Ser. Mech. Math. 32, 73–116. [268, 299, 308]
Smith, W.L. (1955). Regenerative stochastic processes. Proc. Roy. Soc. London Ser. A 232, 6–31. [333]
Snyder, D.L. (1972). Filtering and detection for doubly stochastic Poisson processes. IEEE Trans. Inf. Theory IT-18, 97–102. [101, 409]
—— (1975). Random Point Processes. Wiley, New York. [409]
—— and Miller, M.I. (2000). Random Point Processes in Time and Space. Springer-Verlag, New York. [= Snyder (1975), 2nd ed.] [400]
Stam, A.J. (1967a). On shifting iterated convolutions, I. Compositio Math. 17, 268–280. [174]
—— (1967b). On shifting iterated convolutions, II. Compositio Math. 18, 201–228. [174]
Stone, C.J. (1968). On a theorem of Dobrushin. Ann. Math. Statist. 39, 1391–1401. [169, 174]
Stoyan, D. (2006). Personal communication, June 2006. [485]
—— and Grabarnik, P. (1991). Second-order characteristics for stochastic structures connected with Gibbs point processes. Math. Nachr. 151, 95–100. [516]
——, Kendall, W.S. and Mecke, J. (1987). Stochastic Geometry and its Applications. John Wiley & Sons, Chichester. [= SKM (1987); 2nd ed. = SKM (1995).] [269]
——, —— and —— (1995). = SKM (1995), = Stoyan, Kendall and Mecke (1987) 2nd ed. [269]
—— and Mecke, J. (1983). Stochastische Geometrie: Eine Einführung. Akademie-Verlag, Berlin. [Precursor of SKM (1987, 1995).] [269, 472]
Stoyan, D. and Stoyan, H. (1994). Fractals, Random Shapes and Point Fields. John Wiley & Sons, Chichester. [20, 463, 517]
Straf, M. (1972). Weak convergence of stochastic processes with several parameters. Proc. Sixth Berkeley Symp. Math. Statist. Probab. 2, 187–222. [145]
Takacs, R. (1983). Estimator for the pair potential of a Gibbsian point process. Johannes Kepler Univ. Linz, Inst. für Math. Inst. Ber. 238. [514]
Takens, F. (1985). On the numerical determination of the dimension of an attractor. In Braaksma, B.L.J., Broer, H.W. and Takens, F. (Eds.), Dynamical Systems and Bifurcations: Proceedings of a Workshop held in Groningen, Netherlands, April 16–20, 1984 (Lecture Notes in Mathematics 1125), Springer-Verlag, Berlin, pp. 99–106. [346]
Tanaka, U. and Ogata, Y. (2005). Estimation of parameters for the Neyman–Scott spatial cluster model. Talk to 73rd annual meeting of the Japan Statistical Society. [497]
Tanemura, M., Ogawa, T. and Ogata, Y. (1983). A new algorithm for three-dimensional Voronoi tessellation. J. Comput. Phys. 51, 191–207. [500]
Tempel'man, A.A. (1972). Ergodic theorems for general dynamical systems. Trudy Moskov. Mat. Obsc. 26, 95–132. [Translation in Trans. Moscow Math. Soc. 26, 94–132.] [196, 202]
—— (1986). Ergodic Theorems on Groups (in Russian). Mosklas, Vilnius. [Translated and revised (1992). Ergodic Theorems for Group Actions. Kluwer, Dordrecht.] [196]
Thedéen, T. (1964). A note on the Poisson tendency in traffic distribution. Ann. Math. Statist. 35, 1823–1824. [174]
Thorisson, H. (1994). Shift-coupling in continuous time. Probab. Theory Related Fields 99, 477–483. [222–231]
—— (1995). On time and cycle stationarity. Stoch. Proc. Appl. 55, 183–209. [309, 332]
—— (2000). Coupling, Stationarity, and Regeneration. Springer, New York. [132, 222–230, 309–316]
Timár, A. (2004). Tree and grid factors for general point processes. Electron. Commun. Probab. 9, 53–59. [269]
Torrisi, G.L. (2002). A class of interacting marked point processes: rate of convergence to equilibrium. J. Appl. Probab. 39, 137–160. [428]
Turner, T.R., Cameron, M.A. and Thomson, P.J. (1998). Hidden Markov chains in generalized linear models. Canadian J. Statist. 26, 107–125. [104, 107]
Utsu, T. and Ogata, Y. (1997). Statistical analysis of seismicity. In Algorithms for Earthquake Statistics and Prediction. IASPEI Software Library (Internat. Assoc. Seismology and Physics of the Earth's Interior) 6, 13–94. [500]
Van der Hoeven, P.C.T. (1982). Une projection de processus ponctuels. Z. Wahrs. 61, 483–499. [534]
—— (1983). On Point Processes (Mathematical Centre Tracts 167). Mathematisch Centrum, Amsterdam. [534]
van Lieshout, M.N.M. (1995). Stochastic Geometry Models in Image Analysis and Spatial Statistics. CWI Tract 108, Amsterdam. [457]
van Lieshout, M.N.M. (2000). Markov Point Processes and their Applications. Imperial College Press, London. [118–126, 459]
—— (2006a). Markovianity in space and time. In D. Denteneer, F. den Hollander and E. Verbitsky (Eds.), Dynamics & Stochastics: Festschrift in Honour of M.S. Keane (Lecture Notes—Monograph Series 48), Institute for Mathematical Statistics, Beachwood, pp. 154–168. [129, 460]
—— (2006b). A J-function for marked point patterns. Ann. Inst. Statist. Math. 58, 235–259. [462, 466]
—— and Baddeley, A.J. (1996). A nonparametric measure of spatial interaction in point patterns. Statist. Neerlandica 50, 344–361. [460]
Vere-Jones, D. (1968). Some applications of probability generating functionals to the study of input/output streams. J. Roy. Statist. Soc. Ser. B 30, 321–333. [174, 429, 431]
—— (1971). The covariance measure of a weakly stationary random measure. J. Roy. Statist. Soc. Ser. B 33, 426–428. [Appendix to Daley (1971).] [237]
—— (1975). On updating algorithms and inference for stochastic point processes. In Gani, J. (Ed.), Perspectives in Probability and Statistics, Applied Probability Trust, Sheffield, and Academic Press, London, pp. 239–259. [101, 103]
—— (1992). Statistical methods for the description and display of earthquake catalogs. In Walden, A.T. and Guttorp, P. (Eds.), Statistics in the Environmental & Earth Sciences, Edward Arnold, London, pp. 220–246. [490]
—— (1999). On the fractal dimensions of point patterns. Adv. Appl. Probab. 31, 643–663. [341–354]
—— (2005). A class of self-similar random measure. Adv. Appl. Probab. 37, 908–914. [263]
—— (2007). Some models and procedures for space–time point processes. Environ. Ecol. Statist. 14 [Special issue on forest fires] (to appear). [487]
—— and Ogata, Y. (1984). On the moments of a self-correcting process. J. Appl. Probab. 21, 335–342. [418]
—— and Schoenberg, F.P. (2004). Rescaling marked point processes. Aust. N. Z. J. Statist. 46, 133–143. [423]
Volkonski, V.A. (1960). An ergodic theorem on the distribution of fades. Teor. Veroyat. Primen. 5, 357–360. [Translation in Theory Probab. Appl. 5, 323–326.] [45]
Warren, W.G. (1962). Contributions to the study of spatial point processes. Ph.D. thesis, University of North Carolina, Chapel Hill (Statist. Dept. Mimeo Series 337). [459]
—— (1971). The centre-satellite concept as a basis for ecological sampling. In Patil, G.P., Pielou, E.C. and Waters, W.E. (Eds.), Statistical Ecology Vol. 2, Pennsylvania State University Press, University Park, PA, pp. 87–118. [459–461]
Watanabe, S. (1964). On discontinuous additive functionals and Lévy measures of a Markov process. Japanese J. Math. 34, 53–70. [365, 418]
Wegmann, H. (1977). Characterization of Palm distributions and infinitely divisible random measures. Z. Wahrs. 39, 257–262. [272, 282]
Westcott, M. (1970). Identifiability in linear processes. Z. Wahrs. 16, 39–46. [74]
—— (1971). On existence and mixing results for cluster point processes. J. Roy. Statist. Soc. Ser. B 33, 290–300. [213]
Westcott, M. (1972). The probability generating functional. J. Aust. Math. Soc. 14, 448–466. [59, 75, 210]
—— (1976). A simple proof of a result on thinned point processes. Ann. Probab. 4, 89–90. [156]
Wheater, H.S., Isham, V.S., Cox, D.R., Chandler, R.E., Kakou, A., Oh, L., Onof, C. and Rodriguez-Iturbe, I. (2000). Spatial-temporal rainfall fields: Modelling and statistical aspects. Hydrol. Earth Syst. Sci. 4, 581–601. [486]
Whittle, P. (1951). Hypothesis Testing in Time Series Analysis. Thesis, Uppsala University, Almqvist and Wicksell. [498]
—— (1954). On stationary processes in the plane. Biometrika 41, 434–449. [486]
—— (1962). Topographic correlation, power-law covariance functions, and diffusion. Biometrika 49, 305–314. [486]
Widder, D.V. (1941). The Laplace Transform. Princeton University Press, Princeton, NJ. [205]
Widom, B. and Rowlinson, J.S. (1970). A new model for the study of liquid–vapor phase transitions. J. Chem. Phys. 52, 1670–1684. [124]
Yaglom, A.Ya. (1961). Second-order homogeneous random fields. Proc. Fourth Berkeley Symp. Math. Statist. Probab. 2, 593–622. [54]
Yashin, A. (1970). Filtering of jump processes. Avtomat. i Telemekh. 1970(5), 52–58. [Translation in Automation and Remote Control 1970, 725–730.] [101]
Zähle, U. (1988). Self-similar random measures. I. Notion, carrying Hausdorff dimension, and hyperbolic distribution. Probab. Theory Related Fields 80, 79–100. [256]
—— (1990a). Self-similar random measures. II. A generalization to self-affine measures. Math. Nachr. 146, 85–98. [256]
—— (1990b). Self-similar random measures. IV. The recursive construction model of Falconer, Graf, and Mauldin and Williams. Math. Nachr. 149, 285–302. [256]
—— (1991). Self-similar random measures. III. Self-similar random processes. Math. Nachr. 151, 121–148. [256]
Zheng, X. (1991). Ergodic theorems for stress release processes. Stoch. Proc. Appl. 37, 239–258. [228]
Zhuang, J. (2006). Second-order residual analysis of spatiotemporal point processes and applications in model evaluation. J. Roy. Statist. Soc. Ser. B 68, 635–653. [502]
——, Ogata, Y. and Vere-Jones, D. (2002). Stochastic declustering of space–time earthquake occurrences. J. Amer. Statist. Assoc. 97, 369–380. [491]
——, —— and —— (2004). Analyzing earthquake clustering features by using stochastic declustering. J. Geophys. Res. 109, B05301, doi:10.1029/2003JB002879. [491]
——, Vere-Jones, D., Guan, H., Ogata, Y. and Li, M. (2005). Preliminary analysis of observations on the ultra-low frequency electric field in the Beijing region. In Vere-Jones, D., Ben-Zion, Y. and Zúniga, R. (Eds.), Statistical Seismology (Pure Appl. Geophys. (PAGEOPH) 162 (6, 7)), pp. 1367–1396. [501]
Subject Index
[Page references in slanted font, such as 382, are to Volume I, but these references are not intended to be comprehensive.]
Adjacency relation ∼, 120, 127
  binary relations ∼∪, ∼∩, 125
  set-dependent adjacency (∼x), 128
Area-interaction point process, 124
  as equilibrium birth-and-death process, 130
Asymptotic independence
  mixing, 206
  weakly mixing, 206
  ψ-mixing, 206
  ergodicity, 206
  expressed via functionals, 210
Asymptotic stationarity, 223
  (C, 1)-asymptotic stationarity, 223
    shift-coupling sufficient, 230
    weak and strong coincide, 326
  conditions for convergence of moment measures, 236
  strong asymptotic stationarity, 223
Atomic measure, 382
  counting measure correspondence, 4
  moment measure characterization, 66
Avoidance function, 2, 33, 459
  determines distribution of simple point process, 35
  of Cox process, 38
  use in limit theorems, 166
Avoidance probability, 459
Bartlett spectrum, 303
  atom at origin, 205
Batch Markovian arrival process (BMAP), 110
  convergence to equilibrium, 228
  E–M algorithm for, 114
  Q-matrix structure, 110
  representations, 111
  stationary interval distribution and correlation properties, 118
Bayesian-type formulae
  likelihood ratio, 406
Bernoulli process, discrete
  Papangelou kernel properties, 530
Bijective point map, 309
Binomial probability bounds, 166
Birth-and-death process, 99
  conditional mark distribution, 100
  death process in reliability, 100
  spatial, 126, 130
    as space–time process, 488
Blackwell renewal theorem, 83
  generalized, 331
BMAP, 110, see Batch Markovian arrival process
Bonferroni inequalities
  p.g.fl. bounds, 71
Borel measure, 384
Boundedly finite Borel measure, 2
Bounded measurable function space BM(X), 52
B-selective point map, 311
(C, 1)-asymptotic stationarity, 326
  see also Asymptotic stationarity
Campbell measure, 268
——, basic properties
  characterization, 272, 282
  definition via extension, 270
  factorization, 269
  first moment measure relation, 273
  for a.s. atomic random measure, 275
    structure is singular, 275
    characterization for point process, 275
  for KLM measure Q, 272, 282
  for random measures, 284
  higher-order analogues, 272
  'modified' v. 'reduced', 270
  Radon–Nikodym approach, 270
  refinement of first moment, 270
  terminology origin, 271
——, invariant, 285, 293
  factorization, 293
  Palm measure characterization, 294
  stationary random measure invariance characterization, 285
——, marked
  from marked cumulative process, 379
  of MPP, reduced, 331
  product of predictable kernel and Campbell measure for Ng, 380
  semi-Markov process example, 381
Campbell theorem, 271
  original, 66
    Campbell measure precursor, 66
  refined, 288
Cantor dust, 20
Central limit properties
  simple point process martingale, 412
    with random scaling, 413
Characteristic functional, 54
  Taylor series moment expansion, 68
    remainder terms, 71
Characteristic functions
  continuity condition, 63
  convergence condition, 64
Cliques, 121
  properties, 121, 129
  maximal, 121
Cluster iterates, 334
  Poisson case, 334
    infinitely divisible, 334
  method of backward trees, 336
  stable/unstable dichotomy, 336
    for critical cluster member process, 337
    for Neyman–Scott process, determined by random walk, 337
    for stable cluster member processes, weakly singular infinitely divisible limit, 338
Cluster process, stationary
  cluster components, construction, 193
  from nonstationary components, 192
  stochastic condition to be well-defined, 192
Compensator, 241, 358
  absolute continuity from conditional d.f.s, 364
  characterizations
    Cox process, 419
    Poisson process, 420
  continuous
    equals quadratic variation for simple point process, 370
    variance of integrated martingale, 376
  convergent sequence of
    condition for Poisson/Cox limit, 384
    dependent-thinning example, 387
  dual predictable projection, 378
    Campbell measure derivation, 377
  integral of conditional intensity, 390
  on whole of R, 394
    complete conditional intensity, 394
  one-point process, 358
  unbounded, 376
  uniquely defined, determines process, 365
Complete conditional intensity, 394
  Hawkes process, 395
  renewal process with density, 395
  stationary process, history H†, 396
    hazard function representation, 397
559
Configurations of points, ergodicity, 202
Contact distribution function, 460
  spherical, 459
Continuity condition for fidi distributions, 28
Controlled thinning of point process, 387
Convergence
  modes of, 131
  of Campbell measures, 297
  of fidi distributions, 132
  of KLM measures, 147
  of Laplace functionals
    weak convergence conditions, 138
  of moments, 141
  of moment measures, 144
  of p.g.fl.s
    weak convergence conditions, 138
  of point processes
  of probability distributions
    strong, 132
      implies weak convergence, 132
    weak, 132
      does not imply strong, 134
Convergence to equilibrium, 223
  as limit distributions
    interpretation via inversion theorem, 325
    via weak convergence, 326
    Palm, from limit of nth point, 323
    stationary, from t → ∞, 323
  conditional intensity conditions, 427
    via Poisson embedding, 428
  Foster–Lyapunov conditions, 228
  in variation norm, 223
    space Z of initial conditions, 226
    from Palm distribution, 323
  see also Asymptotic stationarity
Convex averaging sequence, 196
Copy or version of process
  same fidi distributions, 11
Counting measure on R
  one-to-one correspondence with sequence of intervals, 24
Coupling, 132, 229
  conditions for, 231
  coupling inequality, 133, 229
  coupling time, 132
  equivalences, 231
  with stationary process
    implies strong asymptotic stationarity, 230
  see also Shift-coupling
Covariance density, 69
Covariance measure of random measure, 69
  condition to be singular, 70
Covariant mapping, 309
Cox process
  avoidance function, 38
  class invariant under rarefaction, 166
  contraction of thinned process, 157
  convergence to
    conditions on compensators, 384
    via dependent-thinning, 387
  directed by Markov chain, 101
    likelihood, 102
      E–M algorithm, 103
    finite state space, 108
    MPP extension, 117
  directed by Markov diffusion, 409
  directed by partially observed Markov process, 410
    Neyman–Scott example, 410
  directed by stationary G-process, 87
  from scale-contraction of Rd, 157
  stationary, 181
    iff directing measure stationary, 192
    long-range dependent, 254
    preserved under thinning and translation, 181
C.s.m.s. (= Complete separable metric space), 124, 2
Cumulant measures, 69
Cumulative process, 356
  adapted to right-continuous history F
    has unique F-compensator, 367
  family for MPP, 356
  weak convergence, 143
  with density, 358
Cycle stationarity
  point-stationarity, R1 analogue, 308
Davidson's 'big problem', 481
  Kallenberg counterexample, 482
Determining class of set functions, 372, 27
Deterministic component of random measure, 86
Deterministic H-intensity
  characterizes Poisson process, 420
Deterministic lattice process in Rd, 192
Deterministic map of bounded set into itself, fractal dimension, 351
  moment growth conditions, 352
    consistent if mixing, 352
Deterministic point process, 76, 137
  in Rd, 192
  unit rate, 137
Diagonal shifts and stationarity, 182
  reduced measure, 183
Dirac measure, 382, 3
Directed lines, random process of, 472
  see also Line process
Directional rose, 467
  Ripley’s K-function, 467
  process in R2, moment factorization, 467
Dirichlet distribution, 24
Dirichlet process, 11
  random probability distribution, 11
  moment measures, 74
Discrete-time renewal process
  F-compensator, 373
Dissecting system, 382
  use in entropy approximation, 446
    monotone under refinement, 454
    information gain in limit, 454
  tiling, infinite analogue of, 311
  tool to study sample paths, 39
Dobrushin’s lemma, 45
Doob–Meyer decomposition, 241, 430, 355
Dust, 20
  see also Cantor, Lévy, 20
Edge-effects in nearest-neighbour estimation, 463
  Hanisch-type correction, 466
E–M algorithm, 101
  numerical procedures, 103
  uniformization algorithm, 109, 117
Empty space (F-)function, 2, 459
  estimation, edge-effects, 463
    Hanisch-type corrections, 466
  Poisson cluster process, 461
    via local Janossy measure, 461
  see also Avoidance function
Entropy, 440
  atomic distribution, 440
  continuous distribution, 440
  generalized or relative, 440
  Poisson has maximum entropy, 454
Entropy dimension, 341
Entropy rate (process in R1), 444
  finite sample approx’n, 449
    convergence conditions, 450
    L1 and strong convergence, 451
  mixed and compound Poisson, 444
  renewal process, 455
Entropy rate for intervals, 452
  relation to entropy rate, 453
Enumeration, measurable, 14
Ergodic theorem
  averaging over group, 196
  convex averaging sequence, 196
  for MPP, 197
  for random measure, 197
  for weighted averages, 201
  general, 199
  higher order, 242
  individual, 196
  statistical versions of, 204
  statistical, 197
ETAS model, 203
  nonlinear, 437
  as space–time model, 499
Event- (= interval-)stationarity, 327
Extended Laplace functional, 57
  sequence of, convergence, 58
Extended MPP, 7
  MPP counterexamples, 22
  purely atomic random measure, 8
Extended p.g.fl., 60
  convergence, 64
Exvisibility (spatial predictability), 513
  of Papangelou intensity measure, 534
Factorial moment measure, 133, 69
  advantages, 70
Family of probability measures
  uniformly tight, 136
F-adapted process, 236
  on Z+, 372
F-predictable process, 425, 358
  on Z+, 372
F-function, see Empty space function
Fidi distributions, 2, 25
  consistent family, 26
  convergence of random measures, 135
    equivalent to weak convergence, 135, 137
Filtering problem, 400
Finite-dimensional (fidi) distributions, 25, see Fidi distributions
Finite point process
  existence, 32
Fixed atom of random measure, 39
  at most a countable infinity, 39
  component of random measure, 86
Flow, 269
  on probability space, 177, 179
  stationary Cox process, 182
Fractal dimension, Rényi, 340
  controlled Palm growth, 354
  deterministic map of set into itself, 351
  from small-scale clustering, 349
  kth order, controlled diagonal growth, 347
    mean-square consistent estimate, 347
  multinomial measures, 352
Functionals, linear, 52
Gamma random measure, 167
  stationary case, 162, 11, 30
Gaussian measures on Hilbert space, 54
Gauss–Poisson process, 185, 465
  as limit of u.a.n. array, 154
  isotropic centred, 468
  KLM measure structure, 94
  marked, 332
    reduced moment densities, 332
  Papangelou kernel aspects, 530
Generalized Blackwell theorem, 331
Generalized compound Poisson process, 61
Generalized entropy
  expected likelihood ratio, 441
  Kullback–Leibler distance, 441
  reference measure in, 441
Generalized functions, random, 53
Georgii–Nguyen–Zessin formula, 462
  GNZ equation, 511
    extended form, 513, 536
Germ–grain models, 503
  see also Lilypond model, Particle process, 205
G-function, 460
  see also Nearest-neighbour function
Gibbs kernel, 525
  portmanteau Campbell measure, 525
Gibbs process with pairwise interactions, 507
  interaction potential, 507
  process on circle or sphere, 509
G-random measure, 83
  directing measure of Cox process, 84
  Lévy representation, 83
  shot-noise, 84, 87
Grassberger–Procaccia estimates, 355
Ground measure, 3
Ground process, 194, 7
  condition to be well-defined, 22
Haar measure, 408
  in transformation invariance, 409, 188
  MPPs as canonical example, 190
Hamel equation, 64, 192
  application in Rd, 192
  variant of, 482
Hammersley–Clifford theorem, 122
  extended, 129
Hawkes process, 183
  complete conditional intensity, 395
  convergence to equilibrium, 232, 236
    strong asymptotic stationarity, 234
  exponential decay, 99
  moments, 145
  nonlinear, 437
    Lipschitz condition, 437
  without ancestors, 141, 145
Hazard function, conditional
  rôle in likelihoods, 402
Hereditary class of subsets, 122
Hereditary function, 122, 128
Hidden Markov model (HMM), 101, 400
  Cox process, 103, 400
    Neyman–Scott example, 410
  Poisson observations for, 106
  state estimation, 105
Higher order ergodic theorem, 242
Hill estimate, 346
History, 357
  minimal H, 357
  right-continuous, 357
    F(+), 373
      counterexample, 373
HMM, see Hidden Markov model
Homogeneous point process, 459
Hougaard process, 83
Hurst index, 250
  cluster process, 251
  Cox process, 252, 254
  of superpositions, 252
  stationary renewal process, 254
Hyperplanes, directed, intersections of, 484
IHF, see Integrated hazard function
Independence of random measures, 63
Infinitely divisible distributions, nonnegative, 82
Infinitely divisible MPP, 94
  cluster process representation, 94
  ground process, 94
  infinite divisibility, 94
  KLM measure, 94
    structure, 94
Infinitely divisible point process, 87
  a.s. finite
    characterization, 92
  convergence, 147
    equivalent KLM Q-weak convergence, 147
  KLM measure, 89
  regular/singular components, 92
  representations
    cluster process, 89
    p.g.fl., 91
    Poisson randomization, 89
  transformation invariant iff KLM measure invariant, 221
——, stationary iff KLM measure stationary, 216
  KLM measure ergodicity, mixing conditions, 218
  Palm factorization, 295
  regular/singular dichotomy
    if regular then ergodic, 217
    regular iff Poisson cluster process representation, 216
    singular if Poisson randomization representation, 216
    weak/strong singular dichotomy, 220
    strongly singular iff Poisson randomization superposition exists, 221
    weakly singular example, 185
Infinitely divisible random measure, 87
  fidi distributions infinitely divisible, 88
    representation, 95
  Laplace functional representation, 93
Infinite particle systems
  external configuration, 519
Information gain, expected, 276
  generalized entropy, 441
  in discrete approx’n to entropy, 447
    limit of refinement, 454
  mixed and compound Poisson, 444
  MPP decomposition (points/marks), 444
  renewal process, 445
  rôle of conditional intensity, 443
  simplifies when stationary, 443
  two processes evolving simultaneously, 448
Inhomogeneous Poisson process, as Markov point process, 121
Innovations (‘residual’) process, 513
Integrated hazard function (IHF), 108, 359
  for one-point process, 359
Integration by parts for Lebesgue–Stieltjes integral, 107, 376
Intensity measure of point process, 44
  Korolyuk equation, 46
Interacting point processes, 244
Interaction function, 122
Interaction potential in Gibbs process, 127, 507
Internal history H, 234, 357
  H-predictability of process, 374
Intervals, process of, 302
  definition of point process, 13
  marked, stationary, 333
Interval-stationarity, 268, 299
Intrinsic history, 234, 357
Invariant σ-field I trivial
  invariant functions constant, 204
Inversion formulae, Palm measure
  stationary point process, 291, 300
Isotropy, 297, 466
  moment measure factorizations, 467
  rotationally invariant probability, 466
  MPP analogue, 467
Iterated convolutions
  conditions for Poisson limit, 169
  see also Cluster iterates, Random translations
Janossy density, 125
  in Papangelou intensity, 507
  Markov point process density, 119
J-function, 460
  clustering/regularity indicator, 460
  Poisson cluster process, 461
  Poisson process, 464
Jordan–Hahn decomposition, 374, 38, 252
Khinchin existence theorem, 45
Khinchin orderliness, 47
K-function, 464
  see Ripley’s K-function
KLM measure, 91
  extended, convergence of, 147
  Gauss–Poisson process, 94
  infinitely divisible MPP, 94
  Palm factorization for stationary infinitely divisible point process, 295
    regular, strongly or weakly singular, 295
  Q-weak convergence, 147
  see also Infinitely divisible point process
Kolmogorov consistency conditions, 27
Kolmogorov existence theorem
  for point process, 30
  for random measure, 28
Korolyuk’s theorem, 45
  generalized Korolyuk equation, 46
  purely atomic random measure, 50
Kullback–Leibler distance
  expected information gain, 441
  generalized entropy, 441
Laplace functional, 57
  characteristic functional analogue, 64
  completely random measure, 86
  convergence conditions, 64
  expansion, first-order, 75
    Taylor series, moments, 75
  extended, 57
  Palm kernel characterization, 280
  random measure characterization, 57
Lebesgue–Stieltjes integration by parts, 107, 376
Level-crossing, Poisson limit, 159
Lévy dust, 20
Lévy process
  nonnegative, 82
  subordinator, 20
Lifted operator, 178
Likelihood, MPP, point process
  existence when H-adapted, 401
  from ground process intensity and conditional mark density, 403
  reference probability measure, 401
Likelihood ratio, point process
  as Radon–Nikodym derivative, 411
  from H-conditional intensity, 402
  H-local martingale, 404
  in Bayesian-type formulae, 406
  Poisson process with mixed atomic and continuous parameter, 411
Lilypond protocol models, 503
  algebraic specification, 505
  germ–grain model, 503
    absence of percolation, 504
    volume fraction, 504
Linear functionals, 52
Linear process of random measure, 74
  shot-noise example, 74
Line process
  clustered, 484
  coordinate representations, 472
  directed/undirected, 472, 483
  in R3, 484
  line measure, 474
  Poisson, 472
    stationary, 476
      no parallel/antiparallel lines, 485
  railway line process, 484
——, stationary
  directional rose, 474
  isotropic, 474
    second-order properties, 477
  parallel/antiparallel lines, 478
  reflection invariant, 478
  mean density, 474
  moment structure, 473
  shear-invariant, 473, 482
    moment factors, 474
Line segment process
  from lilypond model, 504
  yielding random measure, 23
Lipschitz conditions on conditional intensity functions, 430
Local martingale, 358
  integral w.r.t. martingale, 373
Local Palm distribution P(x,κ), 273
  element of Palm kernel, 273
  for modified Campbell measure, 298
  for MPP, 317
  for nonsimple point process, 282
  for second-order Campbell measure, 298
  for stationary process, 274
  higher-order family, 281, 282, 284
  random measure with density, 274
Long-range dependence, 250
  renewal process, 250, 254
  cluster process, 251
  covariance measure decomposition, 252
  deterministic process, 253
  ‘power-law decay’, 254
MAP, see Markovian arrival process
Mark distribution, stationary
  in ergodic theorem for MPP, 197
Marked Campbell measure, 317, 379
Marked cumulative process, 378
  marked Campbell measure, 379
Marked point process (MPP), 194, 7
  embedded regeneration points, 328
    Palm moment inequalities, 333
  ergodic theorem, 199
    stationary mark distribution, 197
  extended MPP, 7
  individual ergodic theorem, 318
    nonergodic case extension, 321
    for marked random measure, 322
  information gain, 443
    points v. marks decomposition, 444
  likelihood
    existence when H-adapted, 401
  Palm distributions for
    mean Palm distribution P0, 323
    local Palm distributions P(x,κ), 278, 317
    for ground process, 279, 283
  representation on X × K∪, simple, 23
    with simple ground process, 22
  random rescaling to compound Poisson, 422
    rescaling counterexample, 425
    rescaling to two-dimensional Poisson process, 424
——, identified processes
  completely random, 84
    with simple ground process, 84
      structure of, 85
  exponentially distributed intervals, 332
  marked Gauss–Poisson process, 332
    reduced moment densities, 332
  on S, stationary, 190
    conditions for stationarity, 191
——, stationary, 197
  asymptotic stationarity, 223
    (C, 1)-asymptotic stationarity, 223
      weak and strong coincide, 231
    inversion theorem analogue, 327
  averaged (= mean) Palm distribution P0, 319, 323
  complete conditional intensity, 399
  convergence to equilibrium, 323
  coupling, shift-coupling equivalences, 230
  ergodic, L2 convergence for, 204
  higher-order mark distributions, 321
  independent unpredictable marks, ergodic limits, 319
  inversion theorem for Palm and stationary distributions, 324
  kth order stationarity, 237
  mark-dependent Palm distributions, 317
  on Rd × K
    a.s. zero–infinity dichotomy, 205
  P, P0 invariant σ-algebras equivalent, 333
  reduced Campbell measure, 318, 331
  reduced moment measures, 238
    higher-order ergodic theorem, 247
  stationary mark distribution, 197, 317
Markov density function, 120
Markov modulated Poisson process (MMPP), 101
  convergence to equilibrium, 228
Markov point process, 118
  density function for, 119
    Janossy densities, 119
  Hammersley–Clifford theorem, 122
    extended, 129
  products of Markov density functions, 125
  Papangelou conditional intensity, 120
  simple finite point process with density, 120
Markov renewal function, 115
  mean Palm, subadditive, 332
Markov renewal process (MRP), 96
  as MPP, 98
    point process properties, 98
  ground process, renewal function, 332
  factorial moment densities, 99
  Markov renewal function, 115
  Markov renewal operator, 98
  observed on subset, 115
  renewal function analogue, 99, 332
    subadditivity, 332
  semi-Markov process equivalence, 97
  see also Semi-Markov process
Markovian arrival process (MAP), 110
  Q-matrix structure, 110
Martingale, 427
  integral w.r.t., local martingale property, 373
  quadratic variation of, 368
    increments as conditional variances, 369
  see also Point process martingale
Maximal clique, 121
Mean Palm distribution P0 for MPP, 323
Measurable enumeration, 14, 24
Measure, decomposition of, 4
Method of backward trees for cluster iterates, 336
Method of reduced trees for cluster iterates, 336
Metric transitivity
  implies ergodicity, 204
  of transformation, 194
    ergodic theorem for, 201
M/G/∞ queue, spatial
  MPP in time, 488
  space–time model, 488
Minimal history H, 357
Mixed Poisson process
  conditional intensities on G ⊂ F, 393
  conditions to be ordinary, 51
  contraction of thinned process, 156
  defined by moments, condition, 73
  fidi distributions, 62
  limit of points moving with random velocities, 172
  limit of random translations of nonergodic process, 171
  moment measures, 72
  overall Palm probability in ergodic limit, 323
  p.g.fl., 61
  random translation invariant, 172
——, stationary
  mixing iff Poisson, 212
Mixed random measure, moments, 67
Mixing process, 206
  kth order mixing, 215
  trivial tail σ-algebra, 215
MMPP, see Markov modulated Poisson process
Modified Campbell measure, 271
  analysis of Papangelou kernel, 521
  disintegration of, 524
  in GNZ equation, 512
  local Palm distributions for, 298
  ‘reduced’ terminology, 270
Moment measure, 65
  Campbell theorem, 66
  expectation measure, 65
  higher-order, symmetric, 75
  isotropy, factorization, 467
    Ripley K-function and directional rose, 467
  kth order, 66
  of diagonal of power set, 75
    bound on multiplicity, 75
  stationary
    diagonal factorization, 237
    reduced moment estimation, 244
MPP, see Marked point process
MRP, see Markov renewal process
Multidimensional random measure, 7
Multivariate characteristic function, 54
  characteristic functional, 54
Multivariate infinite divisibility, 94
Multivariate Poisson process
  independent components, as limit, 152
Multivariate random measure, 7
Mutual nearest-neighbour matching, 311
Natural history, 357
Nearest-neighbour distances, 459
  stationary MPP, 462
    independent marks, 462
Nearest-neighbour (G-)function, 460
  as Palm–Khinchin limit, 466
  estimation, edge-effects, 463
    Hanisch-type corrections, 466
  Matérn’s Model I, 466
  stationary MPP, 462
    independent marks, 462
  via local Janossy measure, 461
Nearest-neighbour matching, 311
Negative binomial process, 10, 23, 73
  defined via two mechanisms, 73
  local properties, 73
Neighbourhood relation, see Adjacency
Neyman–Scott cluster process, 181
  analogue on circle S, 189
  bivariate, nonisotropic, isotropic, 471
  hidden structure, 410
  in R2, 464
    conditional intensity, 465
  space–time, singular components, 349
    different orders of growth, 350, 355
  stationary
    regular representation, 185
Nonergodic process
  Bartlett spectral mass at origin, 205
Nonnegative increment process, 11
Nonnegative infinitely divisible distributions, 82
Nonnegative Lévy process, 82
Nonnegative stable processes, 83
  Laplace–Stieltjes transform, 85
Nonstationary point process counterexamples
  avoidance function stationary, 191
  MMPP, 455
  one-dimensional distributions stationary, 191
One-point process
  convergence, 137
  H-compensator, 358
    absolutely continuous or continuous, 360
    randomized hazard function, 362
  square integrable
    quadratic variation, 371
  with prior σ-algebra F0
    compensator w.r.t. intrinsic history, 360
——, MPP
  compensator, 389
    for ground process, 389
Orderly point process, 46
  equivalent conditions, 51
  Khinchin orderly, 47
  µ-orderly, 46
  terminology: orderly, ordinary, 51
Ordinary point process, 46
  simple, 51
Ord’s process, 130
Palm theory, 268
  queueing theory applications, 269
  stochastic geometry applications, 269
Palm distributions, measure, 273
  for MPP, 277
  from Campbell measure, 273
  local, 273
  Palm kernel, 273
——, moment measures, 292
  as Radon–Nikodym derivatives, 293, 298
  relation to stationary moments, 292
Palm kernel, for Poisson process, 280
  Laplace functional characterization, 280
  see also Local Palm distribution
Palm measure, σ-finite
  Campbell measure characterization, 293
  factorization characterization, 294
——, for MPP
  P(x,κ), 317
  mean (‘average’) P0, 323
  Sigman’s alternative version, 332
——, stationary point process
  inversion formulae, 291, 300
  Palm probabilities as rates, 290
Palm–Khinchin equations
  generalization, 302
  point process in Rd, 308
Papangelou (conditional) intensity, 120, 506
  conditional probability interpretation, 507
  defined via Janossy densities, 507
  density of Papangelou kernel, 524
  density of Papangelou measure, 536
  higher-order, 507
  integral relations, 508
  Markov density function rôle, 120
  multiplicative relations, 510
    conditional Papangelou kernels, 525
  relation to Palm densities, 507
Papangelou (intensity) measure
  exvisibility property, 534
  relation to first moment, 533
Papangelou kernel, 520
  atomic part, 520, 527
    discrete Bernoulli example, 530
    Gauss–Poisson example, 530
  higher-order, 525
    Gibbs (portmanteau) kernel, 525
  Papangelou (intensity) measure, 520
    Papangelou intensity as density, 524
  viewed as random measure, 526
Penetrable spheres model, 124
P.g.fl., 59
  see Probability generating functional
PH-distributions, 111
  representation (Q, π), 111
    Laplace transforms, 117
Planar point process (= point process in R2)
  isotropic, 466
    Poisson, centred, 468
    Gauss–Poisson, centred, 468
    moments, polar coordinates and factorization, 467
  stationary isotropic, 469
    counterexample, 205
    moment measure factorization, 470
      Ripley K-function and directional rose, 470
Point map
  bijective, 309
  B-selective, 311
  covariant mapping, 309
  for point-stationarity, 308
  inverse, 309
    properties, 317
Point process (see also individual entries)
——, basic properties
  distribution, 26
    determined by fidi distributions, 26
  ergodic limit a.s. constant
    trivial invariant σ-algebra, 206, 208
    metrically transitive, 206
  formal definition, 7
    integer-valued random measure, 7
  intensity measure, 44
  interval sequence definition, 13, 302
  Kolmogorov consistency conditions, 30
  nonnull, 188
  orderly, ordinary, 46
  random variable mapping characterization, 8
  simple, simplicity of, 47, 7, 43
  unaligned, 312
——, general properties
  asymptotic independence
    expressed via functionals, 210
    mixing, 206
    ψ-mixing, 206
    weakly mixing, 206
  conditional intensity
    local Poisson character, 370
  entropy
    discrete trial approximation, 445
    dissecting partitions, 446
  filtering, 400, 407
    Cox process and HMM, 400
    unit rate Poisson as reference, 407
    mixed Poisson example, 407
    updating formulae, 407
    time series analogies, 401
  likelihood
    existence when H-adapted, 401
  likelihood ratio
    from H-conditional intensity, 402
    H-local martingale, 404
  martingale, for simple process, 247, 368, 412
    F-compensator via IHFs of conditional d.f.s, 364
    quadratic variation, 370
    randomly scaled to normal limit, 415
      (mixed) Poisson example, 416
——, stationary, in R, 178
  complete conditional intensity, 396
    MPP extension, 399
  kth order moments and k-point configurations, 244
  kth order stationarity, 237
    reduced moment measures, 238
  long-range dependence, 250
    Hurst index, 250
  Palm measure
    a.s. zero–infinity dichotomy, 299
    infinite intensity possible, 302
——, stationary, in Rd, 304
  points enumerated by distance from origin, 305
    a.s. no points equidistant from origin, 304
  Palm measure
    bijective point map invariant, 312
——, identified processes
  controlled thinning, 387
  in R2, see Planar point process
  multivariate, 7
    quadratic variation, 370
    see also MPP
  on circle S
    stationary, 289
      reduced moment measures, 298
  on surface of cone, 471
    two parameterizations, 471
  on Z+, 372
    F-adapted, -predictable, 372
    F-compensator, 372
    discrete-time renewal process, 373, 375
    quadratic variation, 375
Point-stationarity, 269, 299, 312
Poisson approximants
  inequality, 153
  tool for convergence conditions, 154
Poisson cluster process
  convergence to equilibrium, 236
    strong and geometric asymptotic stationarity, 236
  empty space function, 461
  stationary, regular representation, 183
    nonuniqueness, 184
    Khinchin measures, 184
    Neyman–Scott process, 185
Poisson distribution
  maximum entropy, 454
Poisson embedding
  for general history, 438
  history-dependent thinning, 426
Poisson probabilities
  bounds, 166
Poisson process, convergence to
  conditions on compensators, 384
    dependent-thinning example, 387
Poisson process
——, basic properties
  existence, 31
  infinitely divisible, 91
  isotropic centred, 468
  moment measures, 72
  ordinary iff simple, 51
  p.g.fl., 60
  stationary, 180
——, characterization
  class invariant under rarefaction, 165
  deterministic H-intensity, 420
    inverse compensator scaling, 421
  Palm kernel and Slivnyak–Mecke theorem, 281
  Watanabe’s, 418
——, general properties
  configurations of points, 202
  information gain, 444
  maximum entropy rate, 454
  Palm kernel, 280
  Poisson property lost under random shift, 316
——, limit properties
  conditions on compensators, 384
    dependent-thinning example, 387
  contraction of thinned process, 155
  multivariate, 151
  of high level-crossing, 159
  of dilated superposition, 150
    extensions, 154
    variation norm limit, 161
Poisson randomization
  representation for infinitely divisible process, 89
    KLM measure totally finite, 93
Power-law growth, 342
Predictability, 355
Probability generating functional (p.g.fl.), 144, 59
  characterization for point process, 60
  convergence conditions, 64
    counterexample, 64
  expansions of
    via factorial moments, 70
    factorial cumulant measure, 71
    moments, 70
  extended p.g.fl., 60
    condition for convergence, 64
  infinitely divisible point process, 91
Prohorov metric, 145
ψ-mixing stationary random measure
  central limit property, 214
Purely atomic measure, 3
  see also Random measure, purely atomic
Quadratic random measure, 9
  integral as random measure, 42
  moments, 67
Quadratic variation process of martingale, 368
  atomic and continuous components, 370, 375
  equals compensator if continuous
    for simple point process, 370
  for simple and multivariate point processes, 370
  square integrable one-point process, 371
Q-weak convergence, 147
Railway line process, 484
Random distribution, 11, 23
  absolutely continuous, 50
  nonatomic condition, 75
Random linear functionals
  strict and broad sense, 53
Random Markov shifts, 173
Random measure, 6
——, basic properties
  characterization
    characteristic functional, 56
    distribution, 26
      determined by fidi distributions, 26
    Laplace functional, 57
    random variables, 54
  sample path components, 86
    deterministic, 86
    fixed atom, 39, 86
      condition to be free of, 39
    nonatomic a.s., condition, 41
  set-indexed family of r.v.s, 17
  tail σ-algebra, 208
——, general properties
  superposition, 63
——, identified
  completely random, see Completely random measure
  integral of nonnegative process, 23
    as random sum of discs or lines, 23
  linear process of, 74
    shot-noise example, 74
  multidimensional, 7
  multivariate, 7
  nonnull, 188
  on R+, history for F-adapted process, 357
    internal (natural, minimal), 357
  with χ2 density, 9, 42
——, purely atomic, 7
  characterization
    as point process, 275
      via Campbell measure, 275
    via construction, 49
  extended MPP, 8
  finitely many atoms, example, 277
    Palm kernel, 277
  Korolyuk equation for, 50
  moment condition for, 66
  on countable state space, 282
    Campbell measure for, 282
——, stationary, 178
  Bartlett spectrum
    atom at origin, 205
      nonzero iff process nonergodic, 205
  Campbell measure characterizations, 285
  kth order stationarity, 237
  long-range dependence, 250, 254
    Hurst index, 250
  moment properties when absolutely continuous, 248
  on Rd, zero–infinity dichotomy, 187
  Palm measure, distribution, 288
  reduced moment measures, 238
Random probability distribution, see Random distribution
Random process of directed lines, 472
  see also Line process
Random Schwartz distributions, 53
Random time change (= Randomly rescaled point process)
  Poisson, 421
  proofs using p.g.fl., 419
  proofs via interval distributions, 426
  stationary compound Poisson process, 422
  see also Poisson embedding
Random thinning, see Thinning
Random translations, 166
  conditions for Poisson limit, 169
  iterated convolution, u.a.n. property, 167
Random variable mapping characterization of point process or random measure, 8
Random walk, point process
  boundedly finite, 24
Random walk renewal process
  as space–time process, 494
    moment measures, 505
Random walk on circle S, 193
Rarefaction, see Thinning
Rayleigh–Lévy dust or flight, 21
Reduced measure
  by diagonal factorization, 183
  Campbell measure for MPP, 331
  moment measures for stationary random measures, 238
    properties, 239
    scale factors in, 238
  see also Modified Campbell measure
Reference measure
  in generalized entropy, 441
  in likelihood, 401
Refined Campbell theorem, 288
Regeneration points in MPP, 328
Regenerative measure
  stationary, 186
Regular conditional probabilities, 380
  on product space, 25
Regular infinite divisibility, 92
  a.s. finite, 92
  KLM measure totally finite, 92
  Poisson cluster process representation, 93
Relative entropy, 440
Renewal process, 67
  class invariant under rarefaction, 165
  compensator and martingale for, 366
  convergence to equilibrium, 227
  entropy, gamma lifetimes, 454
  not a Markov point process, 127
  on circle S, 193
  phase type (PH-distributions), 111
  Poisson limit under rarefaction, 165
  recurrence relation for p.g.fl., 65
  stationary, 186
    Palm–Khinchin equations, 302
    reduced moment, cumulant measures, 247
  with density
    complete conditional intensity, 395
Renewal theorem, 83
  Blackwell, generalized, 331
  for random walk on S, 193
Rényi dimensions, 341
  correlation integrals, 341
    quasi factorial moment estimator, 343
      consistent estimators, 343
  discrete entropy approximation, 440
  multifractal, 341
    consistent estimator for, 343
  unifractal, 341
Rényi’s dimensional entropy, 455
Rényi–Mönch theorem, 35
  Poisson limit from one-dimensional distributions, 162
  use of variation norm, 162
Repairable system model, 235
Residual or ‘innovations’ process, 513
  exvisibility, 513
Right-continuity of histories, 357
  F(+), 373
    counterexample, 373
Ripley’s K-function, 297, 464
  radial component of moment factorization of process in R2, 467
Scale-invariance, 255
Schwartz distributions, random, 53
Self-similarity, 83, 255
  in point configurations, 249
  of random measure, 83
  stable random measures, 83
Semi-Markov process, 96
  compensator components, 381
    conditional intensity and mark distribution, 381
    Markov process simplification, 382
  equivalent Markov renewal process, 97
  likelihood, 116
  point process properties, 98
  see also Markov renewal process
Set function, subadditive under refinement, 50
Set-dependent adjacency (∼), 128
Shift-coupling, 229
  in limit theorems, 269
  with stationary process
    implies strong (C, 1)-asymptotic stationarity, 230
  see also Coupling
Shift transformations, 177
  shift operator, 178
Shot-noise process, 163, 170
  as linear process, 74
  from G-random measure, 84, 87
Signed random measure, 19
  Wiener motion counterexample, 19
Simple birth process
  maximum likelihood estimator
    asymptotically mixed normal, 416
Simple counting measure, 3
Simple point processes
  distribution determined by avoidance function, 35
  sample path property, 43
  second moment measure
    sufficient condition, 66
  sequence of, sufficient conditions for convergence, 140
Singular infinite divisibility, 92
Skorohod metric, 145
Slivnyak–Mecke Poisson process characterization theorem, 281
Smoothing problem, 400
Space of counting measures N_X^#, 3
  as c.s.m.s., 6
  closed subspace of M_X^#, 6
Space of measures, as c.s.m.s., 6
Space–time process
——, general, 485
  estimation in, 490
  evolving spatial field, 486
  residual analysis diagnostics, 502
  second moment estimation, 496
    Bartlett spectrum, 497
    boundary (edge) effects, 497
    conditional intensity, 498
  stochastic declustering, 491
    stochastic reconstruction, 491
  variety of processes, 486
  with associated mark, 486
——, models, 505
  cluster, Bartlett–Lewis, 505
  cluster, Neyman–Scott, 505
  ETAS model, 499
  M/G/∞ queue, 488
  Poisson process, 348
  spatial birth-and-death, 488
——, stationary
  family of Palm distributions, 494
  first moments, 487
    Poisson, 487
    Poisson cluster process, 495
  terminology, 487
  reduced second moments, 491
    alternative representations, 492
    Fourier transforms, 493
    simplified when homogeneous, 492
  stationary-time, homogeneous-space, 487
Spatial birth-and-death process, 126
Spatial point pattern, 458
  models, 459
  statistics, 459
    diagnostic tests, 514
      residual variances, 516, 518
    reduced moment measures, 464
Spherical contact distribution, 459
Stable convergence, 419
  identifying Poisson/Cox limit of convergent compensators, 384
    dependent-thinning example, 387
  point process martingale, 412
    randomly scaled to normal limit, 415
Stable random measures, 83
  self-similarity, 83
Stationarity, 178
  on half-line
    extension to whole line, 223, 235
  preserved by random thinning, 181
  preserved by random translation, 181
  strict v. weak, 178
  see also Asymptotic stationarity
Stationary cluster process
  asymptotic independence determined by cluster centre process, 213
Stationary cubic lattice process, 192
Stationary gamma random measure, 162, 11, 30
Stationary independent increment process, 81
Stationary isotropic planar point process, 469
Stationary MPP, 179
Stationary point process
  on Rd, 178
    a.s. zero–infinity dichotomy, 187
      MPP extension, 205
    functional equivalence, 180
  on circle S, 188
    extension to MPP, 190
  Palm measure inversion formulae, 291, 300
Stationary random measure
  ergodicity implies nontrivial mixture impossible, 217
  on Rd, 178
    a.s. zero–infinity dichotomy, 187
    functional equivalence, 180
Stein–Chen methods, 163
Stochastic continuity sets, 134
  form an algebra, 135
Stochastically continuous, 11
Stochastic declustering, space–time model, 491
  stochastic reconstruction, 491
Stochastic integral w.r.t. Poisson process, 428
  stochastic d.e. driven by Poisson process, 428
  thinning construction, 427
Stopping time sequence
  properties of limits, 373
Strauss process, 123
  characterization, 123
  extended, 130
Stress-release model, 239
  limit properties of estimators, 418
  MPP variant, 235
Subadditive set function, 43, 50
  under refinement, 50
Sums of independent random measures, 62
Superposition of point processes, 146
  convergence, 65
  p.g.fl. condition for convergence, 154
Superposition of random measures, 63
  conditions for weak limit, 153
  convergence, 65
Sup metric on space of d.f.s, 145
Support counting measure, 4
Tail σ-algebra, 208
  tail event, 208
  trivial, implies mixing, 209
Takács–Fiksel estimation procedure, 514
Telegraph signal process, 101
Thinning of point process, 155
  condition for Poisson limit, 155
  Cox process as limit, 157, 387
  dependent-thinning, limit via convergent compensators, 387
  random, nonstationary, with stationary output, 192
Tiling of c.s.m.s., 15, 311
  ‘infinite’ dissecting system, 311
Triangular array, 146
  conditions for Poisson limit, 150
  independent array, 146
  u.a.n. condition, 146
U.a.n., see Uniform asymptotic negligibility
Unaligned point set, 312
Uniform asymptotic negligibility (u.a.n.), 146
  sufficient for infinite divisibility, 149
Uniform integrability
  rôle in convergence, 141
Uniform random measure, 9
Uniform tightness of probability measures, 136
Updating formulae, likelihoods, 407
  estimation/detection separation, 407
  simplified in Markovian case, 408
Vacuity function, 2
  see also Avoidance function
Variation norm, 144
  Poisson limit property, 162
Version or copy of process
  same fidi distributions, 11
Voronoi polygon, 305
  about point at origin of N0, 306
Watanabe’s theorem (Poisson characterization), 365
  analogue of proof, 390
  basic form, 420
  extension to Cox process, 419
Weak convergence, 390
  totally finite measures, 2
  weak-hash (w#) convergence, 2
    equivalence and non-equivalence, 3
    example not weakly convergent, 22
——, of random measures, 132
  convergence of fidi distributions, 134
  equivalent convergence modes, 135, 137
  Laplace functional conditions, 138
  p.g.fl. conditions, 138
Weakly asymptotically uniformly distributed measures, 175
Weighted averages, ergodic theorem, 201
Wiener’s homogeneous chaos, 19
Wold process, 92
  complete conditional intensity, 399
  convergence to equilibrium, 227
  recurrence relation for p.g.fl., 65
  stationary
    Palm–Khinchin equations, 302
Workload-dependent queueing process, 235
Zero–infinity dichotomy, a.s.
  stationary random measure/point process on Rd, 187
  for MPP on Rd × K, 205