Show that

    b/c + b(b − 1)/(c(c − 1)) + b(b − 1)(b − 2)/(c(c − 1)(c − 2)) + · · · = b/(c − b + 1),

where b is a positive integer and c > b.

Solution 1.  Denoting the successive terms by u_1, u_2, u_3, . . . , we have

    u_1 = b/c;    u_2 = b(b − 1)/(c(c − 1));    u_3 = b(b − 1)(b − 2)/(c(c − 1)(c − 2));

and in general

    u_k = b(b − 1) · · · (b − k + 1)/(c(c − 1) · · · (c − k + 1)),        k = 1, 2, . . . b.

The series terminates with u_b, and its sum is b/(c − b + 1); adding 1, we have

    1 + b/c + b(b − 1)/(c(c − 1)) + · · · = (c + 1)/(c − b + 1).
Solution 2.  The polynomial

    S(x) = 1 + (b/c) x + (b(b − 1)/(c(c − 1))) x² + · · ·

can be presented in the form of a definite integral:

    S(x) = (c + 1) ∫_0^1 (1 − t(1 − x))^b (1 − t)^{c−b} dt,

whence

    S(1) = (c + 1) ∫_0^1 (1 − t)^{c−b} dt = (c + 1)/(c − b + 1) = (a + b)/a

if c = a + b − 1.

Find the approximate expressions for the probabilities P and Q in Prob. 15, page 36, when b is a large number.  Take for numerical application a = b = 50.
THEOREMS OF TOTAL AND COMPOUND PROBABILITY
Solution.  Since P + Q = 1, it suffices to seek the approximate expression for P − Q.  Now P − Q can be presented in the form of a definite integral.  To find the approximate expression of this integral, we introduce a new variable v, in terms of which u can be expressed as a power series:

    u = 2v/(2b + a − 1) − (4b + a − 1)v²/(2b + a − 1)³ + · · ·

Substituting the resulting expression of du/dv and integrating with respect to v between the limits 0 and ∞, we obtain for P − Q an asymptotic expansion whose first terms are

    P − Q = a/(2b + a − 1) [ 1 − (4b + a − 1)/(2b + a − 1)² + · · · ].

A more detailed discussion reveals that, provided b ≥ 12, the error of this approximate formula is less in absolute value than

    a[4(a − 1)² − 6b(a − 1) + 32b²]/(2b + a − 1)⁶.

For a = b = 50 the formula yields P − Q = 0.3318, whence

    P = 0.6659;    Q = 0.3341.
References

Jacob Bernoulli: "Ars conjectandi," 1713.
Abr. de Moivre: "Doctrine of Chances," 3d ed., 1756.
Laplace: "Théorie analytique des probabilités," Oeuvres VII, 1886.
J. Bertrand: "Calcul des probabilités," Paris, 1889.
E. Czuber: "Wahrscheinlichkeitsrechnung," 1, Leipzig, 1908.
Whitworth: "Choice and Chance," 5th ed., 1901.
Castelnuovo: "Calcolo delle probabilità," vol. 1, Bologna, 1925.
CHAPTER III

REPEATED TRIALS

1. In the theory of probability the word "trial" means an attempt to produce, in a manner precisely described, an event E which is not certain.  The outcome of a trial is called a "success" if E occurs, and a "failure" if E fails to occur.
For instance, if E represents the drawing of two cards of the same denomination from a full pack of cards, the "trial" consists in taking any two cards from the full pack, and we have a success or failure in this trial according to whether both cards are of the same denomination or not.

If trials can be repeated, they form a "series" of trials.  Regarding series of trials, the following two problems naturally arise:

a. What is the probability of a given number of successes in a given series of trials?

And as a generalization of this problem:

b. What is the probability that the number of successes will be contained between two given limits in a given series of trials?

Problems of this kind are among the most important in the theory of probability.

2. Trials are said to be "independent" in regard to an event E if
the probability of this event in any trial remains the same, whether the results of any number of other trials are known or not.
On the other
hand, trials are "dependent" if the probability of E in a certain trial varies according to the information we have about the outcome of one or more of the other trials. As an example of independent trials, imagine that several times in succession we draw one ball from an urn containing white and black balls in given proportion, after each trial returning the ball that has been drawn, and thoroughly mixing the balls before proceeding to the next trial.
With respect to the color of the balls taken, we may reasonably
assume that these trials are independent.
On the other hand, if the
balls already extracted are not returned to the urn, the above described trials are no longer independent.
To illustrate, suppose that the urn from which the balls are drawn originally contained 2 white and 3 black balls, and that 4 balls are drawn.  What is the probability that the third ball is white?  If nothing is known about the color of the three other balls, the probability is 2/5.  If we know that the first ball is white, but the colors of the second and fourth balls are unknown, this probability is 1/4.  In general, the probability for any ball to be white (or black)
depends essentially on the amount of information we possess about the color of the other balls.  Since the urn contains a limited number of balls, series of trials of this kind cannot be continued indefinitely.  As an example of an indefinite series of dependent trials, suppose that we have two urns, the first containing 1 white and 2 black balls, and the second, 1 black and 2 white balls, and the trials consist in taking one ball at a time from either urn, observing the following rules: (a) the first ball is taken from the first urn; (b) after a white ball, the next is taken from the first urn; after a black one, the next is taken from the second urn; (c) balls are returned to the same urns from which they were taken.

Following these rules, we evidently have a definite series of trials, which can be extended indefinitely, and these trials are dependent.  For if we know that a certain ball was white or black, the probability of the next ball being white is 1/3 or 2/3, respectively.  Assuming the independence of trials, the probability of an event E may remain constant or may vary from one trial to another.
If an
unbiased coin is tossed several times, we have a series of independent trials each with the same probability, 1/2, for heads.
It is easy to give
an example of a series of independent trials with variable probability for the same event.
Imagine, for instance, that we have an unlimited number of urns with white and black balls, but that the proportion of white and black balls varies from urn to urn.  One ball is drawn successively from each of these urns.  Evidently, here we have a series of trials independent in regard to the white color of the ball drawn, but with the probability of drawing a ball of this color varying from trial to trial.  In this chapter we shall discuss the simplest case of series of independent trials with constant probability.
They are often called "Bernoullian series of trials" in honor of Jacob Bernoulli who, in his classical book, "Ars conjectandi" (1713), made a profound study of such series and was led to the discovery of one of the most important theorems in the theory of probability.

3. Considering a series of n independent trials in which the probability
of an event E is p in every trial (that of the opposite event F being q = 1 − p), the first problem which presents itself is to find the probability that E will occur exactly m times, where m is one of the numbers 0, 1, 2, . . . n.  In what follows, we shall denote this probability by T_m.

In the extreme cases m = n and m = 0 it is easy to find T_n and T_0.  When m = n, the event E must occur n times in succession, so that T_n represents the probability of the compound event EEE . . . E with n identical components.  These components are independent events, since the trials are independent, and the probability of each of them is p.
INTRODUCTION TO MATHEMATICAL PROBABILITY        [CHAP. III
Hence, the compound probability is

    T_n = p · p · p · · · p    (n times)

or

    T_n = p^n.

The symbol T_0 denotes the probability that E will never occur in n trials, which is the same as to say that F will occur n times in succession.  Hence, for the same reasons as before,

    T_0 = q^n = (1 − p)^n.

When m is neither 0 nor n, the event consisting in m occurrences of E can materialize in several mutually exclusive forms, each of which may be represented by a definite succession of m letters E and n − m letters F.  For example, if n = 4 and m = 2, we can distinguish the following mutually exclusive forms corresponding to two occurrences of E:

    EEFF, EFEF, EFFE, FEEF, FEFE, FFEE.

To find the number of all the different successions consisting of m letters E and n − m letters F, we observe that any such succession is determined as soon as we know the places occupied by the letter E.  Now the number of ways to select m places out of the total number of n places is evidently the number of combinations out of n objects taken
m at a time.  Hence, the number of mutually exclusive ways to have m successes in n trials is

    C_n^m = n(n − 1) · · · (n − m + 1)/(1 · 2 · 3 · · · m).

The probability of each succession of m letters E and n − m letters F, by reason of independence of trials, is represented by the product of m factors p and n − m factors q, and since the product does not depend upon the order of factors, this probability will be

    p^m q^{n−m}

for each succession.  Hence, the total probability of m successes in n trials is given by this simple formula:

(1)    T_m = n(n − 1) · · · (n − m + 1)/(1 · 2 · 3 · · · m) · p^m q^{n−m},

which can also be presented thus:

(2)    T_m = n!/(m!(n − m)!) p^m q^{n−m}.
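Formula (2) is easy to check numerically.  The following Python sketch (the function name is ours, not the book's) computes T_m directly:

```python
from math import comb

def binomial_prob(n, m, p):
    """T_m = C(n, m) p^m q^(n - m): the probability of exactly m successes
    in n independent trials with constant probability p (formula (2))."""
    return comb(n, m) * p**m * (1 - p)**(n - m)

# 10 tosses of a fair coin, exactly 5 heads:
print(binomial_prob(10, 5, 0.5))                          # 0.24609375
# The probabilities T_0, ..., T_n always add up to 1:
print(sum(binomial_prob(10, m, 0.3) for m in range(11)))  # 1.0 (up to rounding)
```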
This second form can be used even for m = 0 or m = n if, as usual, we assume 0! = 1.  Either of the expressions (1) or (2) shows that T_m may be considered as the coefficient of t^m in the expansion of

    (q + pt)^n

according to ascending powers of an arbitrary variable t.  In other words, we have identically

    (q + pt)^n = T_0 + T_1 t + T_2 t² + · · · + T_n t^n.

For this reason the function (q + pt)^n is called the "generating function" of the probabilities T_0, T_1, T_2, . . . T_n.  By setting t = 1 we naturally obtain

    T_0 + T_1 + T_2 + · · · + T_n = 1.
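The identity (q + pt)^n = T_0 + T_1 t + · · · + T_n t^n can be verified by building the polynomial through repeated multiplication; a small illustrative sketch (names ours):

```python
def gen_function_coeffs(n, p):
    """Coefficients of (q + p t)^n, built by multiplying by (q + p t) n times.
    Coefficient m equals T_m, the probability of m successes."""
    q = 1 - p
    coeffs = [1.0]                        # the polynomial "1"
    for _ in range(n):
        new = [0.0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            new[i] += q * c               # contribution of the q term
            new[i + 1] += p * c           # contribution of the p*t term
        coeffs = new
    return coeffs

coeffs = gen_function_coeffs(10, 0.5)
print(coeffs[5])        # 0.24609375, the same T_5 as before
print(sum(coeffs))      # setting t = 1 gives 1.0
```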
4. The probability P(k, l) that the number of successes m will satisfy the inequalities (or, simply, the probability of these inequalities)

    k ≤ m ≤ l,

where k and l are two given integers, can easily be found by distinguishing the following mutually exclusive events:

    m = k,  or  m = k + 1, . . .  or  m = l.

Accordingly, by the theorem of total probability,

    P(k, l) = T_k + T_{k+1} + · · · + T_l

or, using expression (2),

    P(k, l) = Σ_{m=k}^{l} n!/(m!(n − m)!) p^m q^{n−m}.
In particular, the probability that the number of successes will not be greater than l is represented by the sum

    P(0, l) = q^n + (n/1) p q^{n−1} + (n(n − 1)/(1 · 2)) p² q^{n−2} + · · · + (n(n − 1) · · · (n − l + 1)/(1 · 2 · · · l)) p^l q^{n−l}.

Similarly, the probability that the number of successes in n trials will not be less than l can be presented thus:

    P(l, n) = (n(n − 1) · · · (n − l + 1)/(1 · 2 · · · l)) p^l q^{n−l} [ 1 + ((n − l)/(l + 1)) (p/q) + ((n − l)(n − l − 1)/((l + 1)(l + 2))) (p/q)² + · · · ],

where the series in the brackets ends by itself.
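P(k, l) is just a partial sum of binomial terms, which a few lines of Python make concrete (the function name is ours):

```python
from math import comb

def prob_between(n, p, k, l):
    """P(k, l): probability that the number of successes in n trials
    lies between k and l inclusive (theorem of total probability)."""
    q = 1 - p
    return sum(comb(n, m) * p**m * q**(n - m) for m in range(k, l + 1))

print(prob_between(10, 0.5, 0, 10))   # the whole range: 1.0
print(prob_between(4, 0.5, 2, 2))     # exactly 2 heads in 4 tosses: 0.375
```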
5. The application of the above established formulas to numerical examples does not present any difficulty so long as the numbers with which we have to deal are not large.

Example 1.  In tossing 10 coins, what is the probability of having exactly 5 heads?  Tossing 10 different coins at once is the same thing as tossing one coin 10 times, if all the coins are unbiased, which is assumed.  Hence, the required probability is given by formula (1), where we must take n = 10, m = 5, p = q = 1/2, and it is

    T_5 = (10 · 9 · 8 · 7 · 6)/(1 · 2 · 3 · 4 · 5) · (1/2)^{10} = 252/1024 = 0.24609.
Example 2.  If a person playing a certain game can win $1 with the probability 1/3 and lose twenty-five cents with the probability 2/3, what is the probability of winning at least $3 in 20 games?

Let m be the number of times the game is won.  The total gain (considering a loss as a negative gain) will be

    m − (1/4)(20 − m)  dollars,

and the condition of the problem requires that it should not be less than 3, whence

    m ≥ 6 2/5

or, since m is an integer, m ≥ 7.  That is, in 20 trials an event with the probability 1/3 must happen at least 7 times, and the probability for that is

    Σ_{m=7}^{20} 20!/(m!(20 − m)!) (1/3)^m (2/3)^{20−m}.

This sum contains 14 terms; but it can be expressed through another sum containing only 7 terms, because

    Σ_{m=7}^{20} 20!/(m!(20 − m)!) (1/3)^m (2/3)^{20−m} = 1 − Σ_{m=0}^{6} 20!/(m!(20 − m)!) (1/3)^m (2/3)^{20−m}.

Using the last expression, one easily gets 0.5207 for the required probability.
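The figure 0.5207 can be re-checked by carrying out the shorter complementary sum numerically:

```python
from math import comb

# Example 2 re-checked: P(m >= 7) for n = 20, p = 1/3,
# via the complementary sum 1 - P(m <= 6).
p, q, n = 1/3, 2/3, 20
prob = 1 - sum(comb(n, m) * p**m * q**(n - m) for m in range(7))
print(round(prob, 4))    # 0.5207
```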
6. In the series of probabilities

    T_0, T_1, T_2, . . . T_n

for 0, 1, 2, . . . n successes in n trials, the terms generally increase till the greatest term T_μ is reached, and then they steadily decrease.  For instance, if n = 10, p = q = 1/2, the values of the expression 2^{10} T_m for m = 0, 1, . . . 10 are

    1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1,

so that T_5 is the greatest term.  For obvious reasons the number μ (to which the greatest term T_μ in the series of probabilities T_0, T_1, . . . T_n corresponds) is called the "most probable" number of successes.  To prove this observation in general, and to find the rule for obtaining μ, we observe first that the quotient
    T_{m+1}/T_m = (n − m)/(m + 1) · p/q

decreases with increasing m, so that

(a)    T_1/T_0 > T_2/T_1 > T_3/T_2 > · · ·

The two extreme terms in (a) are

    T_1/T_0 = np/q,        T_n/T_{n−1} = p/(nq),

and if n is large enough, the first of them is > 1 and the last < 1.  To find exactly how large n must be, we notice that

    T_1/T_0 > 1    if    np > q = 1 − p,

whence

    n + 1 > 1/p.

Similarly,

    T_n/T_{n−1} < 1    if    p = 1 − q < nq,

whence

    n + 1 > 1/q.

Consequently, if n + 1 is greater than both 1/p and 1/q, the first term in (a) is > 1 and the last term is < 1.  As the terms of (a) form a decreasing sequence, there must be a last term which is ≥ 1.  Let it be
and
>
T,. Tn1
or, which is the same,
To< T1 < T2 < · · · < T,.1 � T,. T,. > T,.+1 > T,.+2 > · · · >T,.. In other words, the sequence of probabilities increases till the greatest term
T,.
is reached and steadily decreases from then on.
T,.1; namely, less than T,..
there may be another greatest term but all the other terms are certainly
when
Besides
T,.1
=
T,., T,.;
The number J.l. is
perfectly determined by the conditions
T,. T,.1
=
T,.+l T,.
n  p.+1 '!!. � 1 p. q  '
=
n p. '!!. < 1 q
p. +1
which are equivalent to the two inequalities
np  q < p.(p+q).
(n + 1)p � p.(p+q), These in turn can be presented thus:
p. � (n + 1)p < p.+1 and show that
p. is uniquely determined as the greatest integer contained in (n+1)p is an integer, then p. (n+1)p and T,. T,.1· That is, there are two greatest terms if, and only if, (n+1)p is an
(n +1)p.
If
=
=
integer. Let us consider now what happens if
1 n+1=:;; p In the first case, all the terms in tion of the first term
1
n + 1
=
Tt/To
1
· n+1 =::; ; q
or
(a)
are less than
1 with the single excep 1; namely, when
which may be equal to
Consequently,
·
p
To� T1 > T2 > · · · > T,. so that
To
If (n+1)p < 1 the greatest integer (n + 1)p is 0, and there is only one greatest term T0• If, 1, there are two terms To however, (n + 1)p T1 greater than is the greatest term.
contained in
=
=
others. If
(n+1)q � 1, all the terms in series (a) are >1 with the exception 1; namely, when (n+1)q 1.
of the last term, which may be equal to
=
Hence,
To< T1 < · · ·
<
T,._l � T,.
SEC.
REPEATED TRIALS
7)
51
so that T n is the greatest term, and the preceding term Tnl can be equal to it only if
(n + l)q
=
Now the condition
1.
(n + l)q is equivalent to
(n + l)p � n.
On the other hand, because
p
< 1,
(n + l)p Therefore
n
� 1
n +
<
1.
is the greatest integer contained in
(n + l)p.
Comparing the results obtained in the last two cases (excluded at first) with the general rule, we see that in all cases the greatest term
TP corresponds to JL = [(n +
l)p].
If (n + l)p is an integer, then there are two greatest terms TP and Tpl· This rule for determining the most probable number of successes is very simple and easy of application to numerical examples. Example 1.
Let
n = 20, p = %, q = %.
integer contained in this number is p. = 8.
Then
(n + l)p = 8.4, and the greatest
Hence, there is only one most probable
number of successes p. = 8 with the corresponding probability
T8 = Example 2. Consequently,
201 81121
(2)8(3)12 5
5
77
= 0.1 9 .
n = 110, p = %, q = %, and (n + l)p = 37, an integer.
Let
36 37 and
are the most probable numbers of successes with the corre
sponding probability
Tao =
T37 371731(1)3 37(2)7' 3 =
110!
=
0.0801.
m are large numbers, the evaluation of 7. When n, m, and n probability Tm by the exact formula 
Tm = ml(n
nl
� m)!pmqnm
becomes impracticable and it is necessary to resort to approximations. For approximate evaluation of large factorials we possess precious means in the famous "Stirling formula."
Referring the reader to Appendix I
where this formula is established, we shall use it here in the following form: log x! =log�+ x log x x + w(x) where 1
1 12(x +
i)
< w(x) < 12x·
52
INTRODUCTION TO MATHEMATICAL PROBABILITY
[CHAP. III
In the same appendix the following double inequality is proved:
1 1 1 12n 12m  l2l
<
w(n) w(m) w(l)
<
1 6 12n+
1 6 12m+ 1 12z+a·
Now from Stirling's formula
m!
and two similar expressions for them into Tm, we get two limits Tm < k
(4)
T "' >
(n  m) I
follow.
Substituting
( nq ) "' (( )) ( n nq m)nm
n 27rm(n  m) n 27rm(n  m)
� � l
(3)
and
np m np m
"'

"'
n  m
where
1 1 12(n m)+6 e12n+612m+6 e12n1 12m1 12(n1 m), 1
k
l When
n, m,
n
each other. Inequalities for T ....

m
(3)
=
=
are even moderately large k and and
(4)
l
differ little from
then give very close upper and lower limits
To evaluate powers
(np)"' '
m
(_!!:!]_)nm n m
with large exponents, sufficiently extensive logarithmic tables must be available. If such tables are lacking, then in cases which ordinarily occur when ratios np/m and nq/(n  m) are close to 1, we can use
special short tables to evaluate logarithms of these ratios or else resort to
series. 8. Another problem requiring the probability that the number of
successes will be contained between two given limits is much more complex in case the number of trials as well as the difference between given limits is a large number.
Ordinarily for approximate evaluation
of probability under such circumstances simple and convenient formulas are used.
These formulas are derived in Chap. VII.
Less known is
the ingenious use by Markoff· of continued fractions for that purpose. It suffices to devise a method for approximate evaluation of the probability that the number of successes will be greater than a given integer l which can be supposed
>np.
We shall denote this probability by
8)
SEC.
53
REPEATED TRIALS
A similar notation Q(l) will be used to denote the probability
P(l).
that the number of failures is > l where again l > nq. The probability P(k, l) of the inequalities k � m � l can be expressed as follows:
P(k, l) if
>
l
<
np and k
1
=
np; P(k, l)
if both
if both For
P(l)
=
k
and
are >
l
np; P(k, l) l are < np.
k and P(l) we
 P(l)  Q(n  k)
=
P(k  1)  P(l)
and finally =
Q (n  l
 1)  Q(n  k)
have the expression
' pl+ 1+n 1 (l+1) ( z 1)l lqnz(n
:
+
{ ��; �+  (� + ;��7; i) 2) (�Y+ · · · } 1
The first factor
nl l+l nl1 (l+1) !(n  l  1)1P q can be approximately evaluated by the method of the preceding section. The whole difficulty resides in the evaluation of the sum
s
 1+n l l 2 1 qp
_
+
+
(n  l
(l
 1)(n  l  2) (pq)2+
+ 2)(l + 3)
which is a particular case of the hypergeometric series R F(a,tJ,'Y,X )
=
In fact
a{3 a(a + 1)(3{(3 + 1) 2 x+ 1+ . x+ 1·2'Y('Y+1) 1 'Y
(
F n+ l + 1,1,l+ 2,
 �)
=
S.
Now, owing to this connection between S and hypergeometric series, S can be represented in the form of a continued fraction.
First, it is
easy to establish the following relations:
'Y
F(a,{3 + 1,
+ 1,x)
F(a+1,{3,'Y+ 1, x)
=
F(a, {3,'Y,x) + a('Y  {3) +x (a+1,{3+ 1, 'Y + 2, x) + 'Y('Y 1 F(a,{3, 'Y,x) +. f3('Y  a) (a+1,{3+ 1, 'Y+ 2, x). + x 'Y('Y +1
l
=
{
54
INTRODUCTION TO MATHEMATICAL PROBABILITY
Substituting
[CHAP.
III
a+n,{3 + n,'Y+2n and a+n,{3+n + 1,'Y + 2n + 1, a,{3,'Y in these relations and setting
respectively, for
X2n =F(a+n,{3+n,y+2n,x); X2n+l =F(a+n,{3+n+ 1,'Y+2n+ 1,x) a2"
�+��a+� (y+2n)('Y+2n1);
=
a2n+l =
�+��{3+� (y+2n)(y+2n + 1)
for brevity, we have
Xo = Xt  a1xX2 X1 = X2 a2xXa
whence
1 In our particular case X1 =F( n+ l+ 1, 1, l+2,x), and a2n2z1
Xo
=
1
= 0.
Hence, taking
x = _'/! and introducing new notations, we have a q
finite continued fraction (5)
Cn!1 dn11
1 +
1
where
Ck
(6)
_ 
(n k  l)(l + k)p . (l+2k1)(l+2k)q' 
Every one of the numbers
Ct.
k(n+k)p (l + 2k)(l + 2k + 1)q.
will be positive and < 1 if this is true for
Now
Ct = if
Ck
dk =
l
>
(nl 1)p (l+2)q 
<
1
np, and that is exactly what we suppose.
The above continued
55
REPEATED TRIALS
SEC. 9]
fraction can be used to obtain approximate values of S in excess or in defect, as we please. Let us denote the continued fraction
by Wk·
Then 0 < Wk < Ck1
which can be easily verified.
1
S=·1 1 W1
W1
Furthermore, C1 1
=
d1 +1;  "'2
and in general
Having selected k, depending on the degree of approximation we desire in the final result (but never too large; k = 5 or less generally suffices). we use the inequality
to obtain two limits in defect and in excess for"'"·
Using these limits, we
obtain similar limits for "'"1' "'k2, "'k3, . . . and, finally, for w1 and S. The series of operations will be better illustrated by an example.
9. Let us find approximately the probability that in 9,000 trials an event with the probability p = �will occur not more than 3,090 times and not less than 2,910 times.
To this end we must first seek the
probability of more than 3,090 occurrences, which involves, in the first place, the evaluation of
Tam
9000! =
3091! 5909!
1 (1)309 (2)6909
3
3
'
By using inequalities (3) and (4) of Sec. 7, we find 0.011286 < Tam < 0.011287. Next we turn to the continued fraction to evaluate the sumS. following table gives approximate values of c1, c2, c8 and d1, d2 •
to 5 decimals and less than the exact numbers
•
•
•
The •
•
d6
56
INTRODUCTION TO MATHEMATICAL PROBABILITY
n
c,.
d,.
1 2
0.95553 0.95444 0.95335 0.95227 0.95119 0.95010
0.00047
3 4 5 6
[CHAP. III
0.00094 0.00140 0.00187 0.00234
We start with the inequalities 0 < W& < 0.95011 and then proceed as follows:
d5 < 1.04711; 1.00234 < 1 + 1  W&
0.90839 < W5 < 0.94898
1.02041 < 1 +
1  W5
d4
< 1.03685;
0.91842 < W4 < 0.93324
1  W4
da
< 1.02113;
0.93362 <

1.01716 < 1 + 
wa
< 0.93728
d2 < 1.01514; 1.01416 < 1 + 1  wa
0.94020 < W2 < 0.94113
1.00785 < 1 +
0.94779 <
dl
1  W2 
< 1.00816;
1 . 0.05221 0.02161 <
<
s
<
�Ta091
W1
< 0.94810
1 0.05190 < 0.02175.
Hence, we know for certain that 0.02161 < P(3,090) < 0.02175. By a similar calculation it was found that 0.02129 < Q(6,090) < 0.02142, so that
0.04290 < P(3,090) + Q(6,090) < 0.04317.
The required probability P that the number of successes will be contained between 2,910 and 3,090 (limits included) lies between 0.95683 and 0.95710 so that, taking P
less than 1.7 X 104•
=
0.9570, the error in absolute value will be
Problems for Solution 1. What is the probability of having 12 three times in 100 tosses of 2 dice? Am.
c:oo(n)3(ft)97
=
0.2257.
57
REPEATED TRIALS
2. What is the probability for an event E to occur at least once, or twice, or three times, in a series of
n independent trials with the probability p ? (n  l)p]; (b) 1  (1  p)nI[l
(a)
1  (1  p)";
(c)
1  (1  p)n2 1
+
[
(n  2)p
+
(n  1) (n  2) 2
+
p2
]
Ans.
·
3. What is the probability of having 12 points with 2 dice at least three times in
Ans. 0.528.
100 throws?
4. In a series of 100 independent trials with the probability 73, what is the most
Ans. ,.. = Ta3 = 0.0844. probable number of successes and its probability? = 36.93869. NoTE: Log 1001 = 157.97000; Log 671 = 94.56195; Log 6. A player wins $1 if he throws heads two times in succession; otherwise he loses
331 33;
25 cents.
If this game is repeated 100 times, what is the probability that neither his
1
Or $5?
gain nor loss will exceed $ ?
(b)
100! 20! 801
(1)20(3)80[ 4
4
1
(1)20(3)80
1001
(a)
Ans.
4
201 801
+ 6
3
60
"
80 . 79 . 78
80 . 79
80
= 0 0493;
4
80 . 79 . 78 . 77
+ 63 . 66 + 63 . 66 . 69 + 63 . 66 . 69 . 72 +
60 . 57
60 . 57 . 54
60 . 57 . 54 . 5 1
+ 8( + 81 . 82 + 81 . 82 . 83 + 81 . 82 . 83 . 84
] =
0.4506·
NoTE: Log 201 18.38612; Log 801 = 118.85473. 6. Show that in a series of 2s trials with the probability � the most probable num =
ber of successes is
s
and the corresponding probability
T,
1· ·5 ··· (2s  1) 2·4·6 · 2s
3
=
•
·
·
Show also that
T, <
1
"\hs
HINT: T < .
. +
1
2·4·6
3
·5 7 ·
·
·
·
·
·
2s
(2s+l)
·
.
7. Prove the following theorem: If P and P' are probabilities of the most probable n and n + 1 trials, then P' ;:a; P, the equality. sign being excluded unless (n + 1)p is an integer. 8. Show that the probability T,. corresponding to the most probable number of number of successes, respectively, in
successes in
n trials, is asymptotic to (2wnpq)J.i, that is, limT,.
. 9. When p
=
� Tm <
if
as
=l
n>oo
•
�. the following inequality holds for every m:
/2 '\j;;,
e1'
��

• 2t2
n2
INTRODUCTION TO MATHEMATICAL PROBABILITY
58
10. What is the probability of 215 successes in 1,000 trials if
p
=
[CHAP. III
�?
Ans. 0.0154. 11. What is the probability that in 2,000 trials the number of successes will be X. Ans 0.9G4. contained between 460 and 540 (limits included) if 1!1. Two players A and B agree to play until one of them wins a certain number of games, the probabilities for A and B to win a single game being and However, they are forced to quit when A has games still to win, and B has games.
p
.
=
p
a
q 1 p. b =
How should they divide their total stake to be fair? This problem is known as "problema de parties," one of the first problems on
probability discussed and solved by Fermat and Pascal in their correspondence. Solution Let P denote the probability that A will win remaining games before
1. b a
a

b
B can win games, and let Q 1 P denote the probability for B to win games before A wins games. To be fair, the players must divide their common stake Min =
the ratio P:Q and leave the sum MP to A and the sum MQ to B. To find P, notice that A wins in the following mutually exclusive ways:
a. If he wins in exactly a games; probability p". b. If he wins in exactly a+ 1 games; probability Ipaq. a(a+ 1)paq•. exact 1y a+ 2 games; probab'l' 11ty If he wms 1 ·2 ·
c.
n.
·
m
a+ b  1 games; probability a(a+ 1) . (a+ b 2)paqlr1 1 . 2 . 3 . . . (b 1)
If he wins in exactly
·
Consequently
·
+ 1)q1+ P=pa [ 1+a1q+a(a1·2
and similarly Q
=
[ +ip+b(b1 : 1
)
!I' 1
2
Show directly that HINT:
dP
dQ
dp+ dp
P
pl
+
+ Q 1.
·
•
+b 2) lr1 + a(a+1 . 21) .. (b(a1) q ] •
·
•
.
+ b(b +1 1)2 ·
·
·
·
·
·
·
(b+ a 2)pal ] . (a 1)
=
0. =
Solution 2. The same problem can be solved in a different way. Whether A orB games. Now if the players continue to play until the number of games reaches the limit the number of games won by A must be not less than And conversely, if this number is not less than A will win games before B wins games. Therefore, P is the probability that in 1 games A wins not less than a times, or wins will be decided in not more than
a.
a+ b 1
a+ b 1,
a,
a+b p =(a+ b 1)1 .,..1[1+b 1 �+ (b 1){b 2) (!!) '+ ...+ a p q a+ 1 q (a+ 1)(a+ 2) q a!(b  1)1 (b 1){b 2) ...2 . 1 (p)•l] (a+ 1)(a+ 2) (a+b 1) q a
b
Show directly that both expressions for P are identical.
•
·
REPEATED TRIALS HINT: Proceed
88
59
before.
13. Prove the identity
n
P" + 1p"lq +
n(n
 1)
p"�· + . . . + 1 ·2
n(n  1) · · · (n  k + 1) p""q" 1·2·3···k
J:�:z:"k1(1
=
 :z;)Aid:z;
J:l:z: .."1(1  :z:)"d:z: HINT: Take derivatives with respect to p. 14. A and B have, respectively, n + 1 and n coins. If they toss their coins simultaneously, what is the probability that (a) A will have more heads than B? (b) A ·and B will have an equal number of heads? (c) B will have more heads than A? Solution. a. Let P,. be the probability for A to have more heads than B. This probability can be expressed
88
the double sum
n+I
p"
n
1 � �
+"O"'• _ n 22n+l� � (J'!n+1 z1
a0
t"' in
Considering the coefficient of
(1 +
{1 + t)ln+l
{ iY
t)n+ 1
+
t"
we have
Hence
n+l
22 "  !.  22n+l � ln+1  22n+l  2 :o = l _1_ � o•+"'
Pn
b. The probability Q,. for A and B to have an equal number of heads is n
Q.. =
1 � "' 0'! c..+1 .. 21.. 1 +
�
=
c;,.+1
21n+l
1 <
y;;;·
a=O
c. The probability R,. for B to have more heads than A is Rn =
_ c;n+1. 2! 22n+l
16. If each of n independent trials can result in one of the E,, E2, ...Em with the respective probabilities
p,, pa,
•
.
•
(p1 +
Pmi
show that the probability to have z, events z, + lot + n, is given by + Zm •
•
·
m
P2 + · · · + Pm =
E,,
Z2 events Et, . .
1), . Zm events Em where
=
p,,,,., . .. l.,
_

nl _ _ 11 11 z,l z.l .._. lm1P1 Pz __ _
•
•
•
incompatible events
1
p,:.
CHAPTER IV
PROBABILITIES OF HYPOTHESES AND BAYES' THEOREM 1. The nature of the problems with which we deal in this chapter may be illustrated by the following simple example: Urns 1 and 2 contain, respectively, 2 white and 3 black balls, and 4 white and 1 black balls. One of the urns is selected at random and one ball is drawn. to be white.
It happens
What is the probability that it came from the first urn?
Before the ball was drawn and its color revealed, the probability that the first urn would be chosen had been 1/2; but the indication of the color of the ball that was drawn altered this probability.
To find this new
probability, the following artifice can be used: Imagine that balls from both urns are put together in a third urn. To distinguish their origin, balls from the first urn are marked with 1 and those from the second urn are marked with 2.
Since there are 5
balls marked with 1 and the same number marked with 2, in taking one ball from the third urn we have equal chances to take one coming from either the first or the second urn, and the situation is exactly the same as if we chose one of the urns at random and drew one ball from it. If the ball drawn from the third urn happens to be white, this can happen in 2 + 4
=
6 equally likely cases.
Only in 2 of these cases will the
extracted ball have the mark 1.
Hence, the probability that the white
ball came from the first urn is .%
=
�.
The success of this artifice depends on the equality of the number of balls in both urns.
It can be applied to the case of an unequal number
of balls in the urns, but with some modifications; however, it seems preferable to follow a regular method for solving problems like the preceding one.
2. The problem just solved is a particular case of the following funda mental:
Problem 1.
An event A can occur only if one of the set of exhaustive
and incompatible events
occurs.
The probabilities of these events
corresponding to the total absence of any knowledge as to the occurrence 60
SEc.
PROBABILITIES OF HYPOTHESES AND BAYES' THEOREM 61
2]
or nonoccurrence of
A,
Known also, are the conditional
are known.
probabilities
(A, B,); for
A
i
=
1, 2,
.
B,
.
n
B,.
to occur, assuming the occurrence of
bility of
.
How does the proba
change with the additional information that
A
has actually
happened?
Solution. bility
The question amounts to finding the conditional proba
(B,, A).
The probability of the compound event
AB,
can be
presented in two forms
(AB,)
=
(B,)(A, B,)
or
(AB,)
(A)(B,, A).
=
Equating the righthand members, we derive the following expression for the unknown probability
(B;, A):
A
=
(B;)(A, B,), (A)
can materialize in the mutually exclusive forms
ABt, ABz, ...AB,., by applying the theorem of total probability, we get
(A)
=
(Bt)(A, Bt) + (Bz)(A, Bz) +
·
·
·
+ (B,.)(A, B,.).
It suffices now to introduce this expression into the preceding formula for
(B1, A)
to get the final expression
( 1) (B' A) '
=
(B,)(A, B,) (Bt)(A, Bt) + (Bz)(A, Bs) +
·
·
·
+ (B,.)(A, B,.)'
This formula, when described in words, constitutes the socalled "Bayes' theorem."
However, it is hardly necessary to describe its
content in words; symbols speak better for themselves. reason, we prefer to speak of theorem.
Bayes' formula
For that
rather than of Bayes'
Bayes' formula is also known as the "formula for probabilities
of hypotheses."
The reason for that name is that the events
B1, B2,
•
•
•
B. may be considered as hypotheses to account for the occurrence of A. It is customary to speak of probabilities
(Bt), (Bz), .
.(B,.)
as a priori probabilities of hypotheses
Bt, Bz, ...B., while probabilities
(B,, A);
i
=
1, 2, ..
.
n
62
INTRODUCTION TO MATHEMATICAL PROBABILITY
[CHAP. IV
are called a posteriori probabilities of the same hypotheses.
3. A few examples will help us to understand the meaning and the use of Bayes' formula.

Example 1. The contents of urns 1, 2, 3 are as follows:

Urn 1: 1 white, 2 black, 3 red balls
Urn 2: 2 white, 1 black, 1 red ball
Urn 3: 4 white, 5 black, 3 red balls

One urn is chosen at random and two balls drawn. They happen to be white and red. What is the probability that they came from urn 2 or 3?

Solution. The event A represents the fact that the two balls taken from the selected urn were of white and red color, respectively. To account for this fact, we have three hypotheses: the selected urn was 1, or 2, or 3. We shall represent these hypotheses, in the order indicated, by B_1, B_2, B_3. Since nothing distinguishes the urns, the probabilities of these hypotheses before anything was known about A are

(B_1) = (B_2) = (B_3) = 1/3.

The probabilities of A, assuming these hypotheses, are

(A, B_1) = 1/5;  (A, B_2) = 1/3;  (A, B_3) = 2/11.

It remains now to introduce these values into formula (1) to have the a posteriori probabilities

(B_2, A) = (1/3 · 1/3) / (1/3 · 1/5 + 1/3 · 1/3 + 1/3 · 2/11) = 55/118,
(B_3, A) = (1/3 · 2/11) / (1/3 · 1/5 + 1/3 · 1/3 + 1/3 · 2/11) = 30/118,

and also, naturally,

(B_1, A) = 1 − (B_2, A) − (B_3, A) = 33/118.
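Example 1 can be checked mechanically with exact rational arithmetic; the sketch below (variable names are illustrative) reproduces 55/118, 30/118 (= 15/59) and 33/118:

```python
from fractions import Fraction
from math import comb

# Urn contents as (white, black, red); (A, B_i) is the chance of drawing
# one white and one red ball out of two drawn without replacement.
urns = [(1, 2, 3), (2, 1, 1), (4, 5, 3)]
likelihoods = [Fraction(w * r, comb(w + b + r, 2)) for (w, b, r) in urns]
priors = [Fraction(1, 3)] * 3

total = sum(p, start=Fraction(0)) if False else sum(
    p * l for p, l in zip(priors, likelihoods))  # (A), total probability
posterior = [p * l / total for p, l in zip(priors, likelihoods)]
print(posterior)  # → [Fraction(33, 118), Fraction(55, 118), Fraction(15, 59)]
```

Note that Fraction reduces 30/118 automatically to 15/59.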
Example 2. It is known that an urn containing altogether 10 balls was filled in the following manner: a coin was tossed 10 times, and according as it showed heads or tails, one white or one black ball was put into the urn. Balls are drawn from this urn one at a time, 10 times in succession (always being returned before the next drawing), and every one turns out to be white. What is the probability that the urn contains nothing but white balls?

Solution. The event A consists in the fact that in 10 independent trials with a definite but unknown probability, only white balls appear. To account for this fact, we have 10 hypotheses regarding the number of white balls in the urn; namely, that this number is either 1, or 2, or 3, . . . or 10. The a priori probability of the hypothesis B_i that there are exactly i white balls in the urn, according to the manner in which the urn was filled, is the same as the probability of having i heads in 10 throws; that is,

(B_i) = 10! / (i!(10 − i)! 2^{10});  i = 1, 2, . . . 10.

Granted the hypothesis B_i, the probability of A is

(A, B_i) = (i/10)^{10}.
The problem requires us to find (B_10, A). The expression of this probability immediately results from Bayes' formula:

(B_10, A) = 1 / Σ_{i=1}^{10} C_10^i (i/10)^{10}.

The denominator of this fraction is 14.247. Hence

(B_10, A) = 0.0702.

This probability, although still small, is much greater than 1/1024, the a priori probability of having only white balls in the urn.

If, instead of 10 drawings, m drawings have been made and at each drawing white balls appeared, the probability (B_10, A) would be given by

(B_10, A) = 1 / Σ_{i=1}^{10} C_10^i (i/10)^m.

The denominator of this formula can be presented thus:

Σ_{i=1}^{10} C_10^i (i/10)^m = Σ_{i=0}^{9} C_10^i (1 − i/10)^m.

Now

(1 − i/10)^m < e^{−mi/10}

and so

Σ_{i=0}^{9} C_10^i (1 − i/10)^m < Σ_{i=0}^{10} C_10^i e^{−mi/10} = (1 + e^{−m/10})^{10}.

Hence

(B_10, A) > (1 + e^{−m/10})^{−10}.

This shows that with increasing m the probability (B_10, A) rapidly approaches 1. For instance, if m = 100,

(B_10, A) > (1 + e^{−10})^{−10} > (1.0000454)^{−10} > 0.99954.

Thus, after 100 drawings producing only white balls, it is almost certain that the urn contains nothing but white balls, a conclusion which mere common sense would dictate.
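The two numerical claims of Example 2 are easy to verify directly (the function name and the parameter N are illustrative):

```python
from math import comb

def posterior_all_white(m, N=10):
    """A posteriori probability (B_N, A) that the urn holds only white
    balls, after m drawings with replacement all gave white."""
    denom = sum(comb(N, i) * (i / N) ** m for i in range(1, N + 1))
    return 1 / denom

print(round(posterior_all_white(10), 4))   # → 0.0702, as computed above
print(posterior_all_white(100) > 0.99954)  # → True: near certainty at m = 100
```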
Example 3. Two urns, 1 and 2, contain respectively 2 white and 1 black ball, and 1 white and 5 black balls. One ball is transferred from urn 1 to urn 2 and then one ball is drawn from the latter. It happens to be white. What is the probability that the transferred ball was black?

Solution. Here we have two hypotheses: B_1, that the transferred ball was black, and B_2, that it was white. The a priori probabilities of these hypotheses are

(B_1) = 1/3;  (B_2) = 2/3.

The probabilities of drawing a white ball from urn 2, granted that B_1 or B_2 is true, are

(A, B_1) = 1/7;  (A, B_2) = 2/7.

The probability of B_1, after a white ball has been drawn from the second urn, results from Bayes' formula:

(B_1, A) = (1/3 · 1/7) / (1/3 · 1/7 + 2/3 · 2/7) = 1/5.

4. Problem 2.
Retaining the notations, conditions, and data of Prob. 1, find the probability of materialization of another event C, granted that A has actually occurred. The conditional probabilities

(C, AB_i);  i = 1, 2, . . . n

are supposed to be known.

Solution. Since the fact of the occurrence of A involves that of one, and only one, of the events B_1, B_2, . . . B_n, the event C (granted the occurrence of A) can materialize in the following mutually exclusive forms:

CB_1, CB_2, . . . CB_n.

Consequently, the probability (C, A) which we are seeking is given by

(C, A) = (CB_1, A) + (CB_2, A) + · · · + (CB_n, A).

Applying the theorem of compound probability, we have

(CB_i, A) = (B_i, A)(C, AB_i)

and

(C, A) = (B_1, A)(C, AB_1) + (B_2, A)(C, AB_2) + · · · + (B_n, A)(C, AB_n).

It suffices now to substitute for (B_i, A) its expression given by Bayes' formula, to find the final expression

(2)  (C, A) = [ Σ_{i=1}^{n} (B_i)(A, B_i)(C, AB_i) ] / [ Σ_{i=1}^{n} (B_i)(A, B_i) ].
It may happen that the materialization of hypothesis B_i makes C independent of A; then we have simply

(C, AB_i) = (C, B_i),

and instead of formula (2) we have the simplified formula

(3)  (C, A) = [ Σ_{i=1}^{n} (B_i)(A, B_i)(C, B_i) ] / [ Σ_{i=1}^{n} (B_i)(A, B_i) ].

The event C can be considered in regard to A as a future event. For that reason formulas (2) and (3) express probabilities of future events. For a better understanding of these commonly used technical terms, we shall consider a simple example.

Example 4. From an urn containing 3 white and 5 black balls, 4 balls are transferred into an empty urn. From this urn 2 balls are taken and they both happen to be white. What is the probability that the third ball taken from the same urn will be white?

Solution.
(a) Let us suppose that the two balls drawn in the first place are returned to the second urn. Analyzing this problem, we distinguish first the following hypotheses concerning the colors of the 4 balls transferred from the first urn. Among them, there are necessarily 2 white balls. Hence, there are only two possible hypotheses:

B_1: 2 white and 2 black balls;
B_2: 3 white and 1 black ball.

The a priori probabilities of these hypotheses are

(B_1) = C_3^2 C_5^2 / C_8^4 = 3/7;  (B_2) = C_3^3 C_5^1 / C_8^4 = 1/14.

The event A consists in the white color of both balls drawn from the second urn. The conditional probabilities (A, B_1) and (A, B_2) are

(A, B_1) = 1/6;  (A, B_2) = 1/2.

The future event C consists in the white color of the third ball. Since the 2 balls drawn at first are returned, C becomes independent of A as soon as it is known which one of the hypotheses has materialized. Hence

(C, AB_1) = (C, B_1) = 1/2;  (C, AB_2) = (C, B_2) = 3/4.

Substituting these various numbers in formula (3), we find that

(C, A) = (3/7 · 1/6 · 1/2 + 1/14 · 1/2 · 3/4) / (3/7 · 1/6 + 1/14 · 1/2) = 7/12.

(b) If the two balls drawn in the first place are not returned, we have

(C, AB_1) = 0;  (C, AB_2) = 1/2.

Then, making use of formula (2),

(C, A) = (1/14 · 1/2 · 1/2) / (3/7 · 1/6 + 1/14 · 1/2) = 1/6.
5. The following problem can easily be solved by direct application of Bayes' formula.

Problem 3. A series of trials is performed which, with certain additional data, would appear as independent trials in regard to an event E with a constant probability p. Lacking these data, all we know is that the unknown probability p must be one of the numbers

p_1, p_2, . . . p_k,

and we can assume these values with the respective probabilities

α_1, α_2, . . . α_k.

In n trials the event E actually occurred m times. What is the probability that p lies between the two given limits α and β (0 ≤ α < β ≤ 1); or else, what is the probability of the following inequalities:

α ≤ p ≤ β?

A particular case may illustrate the meaning of this problem. In a set of N urns, Nα_1 urns have white balls in proportion p_1 to the total number of balls; Nα_2 urns have white balls in proportion p_2; . . . Nα_k urns have white balls in proportion p_k. An urn is chosen at random and n drawings of one ball at a time are performed, the ball being returned each time before the next drawing so as to keep a constant proportion of white balls. It is found that altogether m white balls have appeared. What is the probability that one of the Nα_i urns with the proportion p_i of white balls was chosen? Evidently this is a particular case of the general problem, and here we possess knowledge of the necessary data, provided that the probability of selecting any one of the urns is the same.
Solution. We distinguish k exhaustive and mutually exclusive hypotheses: that the unknown probability is p_1, or p_2, . . . or p_k. The a priori probabilities of these hypotheses are, respectively, α_1, α_2, . . . α_k. Assuming the hypothesis p = p_i, the probability of the event E occurring m times in n trials is

C_n^m p_i^m (1 − p_i)^{n−m}.

Now, after E has actually happened m times in n trials, the a posteriori probability of the hypothesis p = p_i, by virtue of Bayes' formula, will be

C_n^m α_i p_i^m (1 − p_i)^{n−m} / Σ_{i=1}^{k} C_n^m α_i p_i^m (1 − p_i)^{n−m}

or, canceling C_n^m,

α_i p_i^m (1 − p_i)^{n−m} / Σ_{i=1}^{k} α_i p_i^m (1 − p_i)^{n−m}.

Now, applying the theorem of total probability, the probability P of the inequalities α ≤ p ≤ β will be given by

(4)  P = Σ′ α_i p_i^m (1 − p_i)^{n−m} / Σ_{i=1}^{k} α_i p_i^m (1 − p_i)^{n−m},

where the summation Σ′ in the numerator refers to all values of p_i lying between α and β, limits included.
An important particular case arises when the set of hypothetical probabilities is

p_1 = 1/k,  p_2 = 2/k,  . . .  p_k = 1,

and the a priori probabilities of these hypotheses are equal:

α_1 = α_2 = · · · = α_k = 1/k.

Then the fraction 1/k can be canceled in both numerator and denominator. The final formula for the probability of the inequalities

α ≤ p ≤ β

will be

(5)  P = Σ′ p_i^m (1 − p_i)^{n−m} / Σ_{i=1}^{k} p_i^m (1 − p_i)^{n−m},

the summation in the numerator being extended over all positive integers i satisfying the inequalities
kα ≤ i ≤ kβ.

In the limit, when k tends to infinity, the a priori probability of the inequalities is given simply by the length β − α of the interval (α, β). The a posteriori probability of the same inequalities is obtained as the limit of expression (5). Now, as k → ∞, the sums

(1/k) Σ′ p_i^m (1 − p_i)^{n−m}  and  (1/k) Σ_{i=1}^{k} p_i^m (1 − p_i)^{n−m}

tend to the definite integrals

∫_α^β x^m (1 − x)^{n−m} dx  and  ∫_0^1 x^m (1 − x)^{n−m} dx.

Therefore, in the limit, the a posteriori probability of the inequalities α ≤ p ≤ β is expressed by the ratio of two definite integrals

(6)  P = ∫_α^β x^m (1 − x)^{n−m} dx / ∫_0^1 x^m (1 − x)^{n−m} dx.
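The passage from the sum in (5) to the ratio of integrals in (6) can be watched numerically; for large k the two agree closely. The sample numbers n = 10, m = 7, and the limits 0.5 and 0.9 are illustrative:

```python
def P_discrete(k, n, m, alpha, beta):
    """Formula (5): posterior probability that p lies in [alpha, beta]
    when p ranges over 1/k, 2/k, ..., 1 with equal a priori weights."""
    def term(i):
        return (i / k) ** m * (1 - i / k) ** (n - m)
    num = sum(term(i) for i in range(1, k + 1) if alpha <= i / k <= beta)
    den = sum(term(i) for i in range(1, k + 1))
    return num / den

# With k = 10000 this is already very close to the integral ratio (6),
# whose exact value here is about 0.868.
print(P_discrete(10_000, 10, 7, 0.5, 0.9))  # ≈ 0.868
```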
This formula leads to the following conclusion: When the unknown probability p of an event E may have any value between 0 and 1, and the a priori probability of its being contained between limits α and β is β − α, then after n trials in which E occurred m times, the a posteriori probability of p being contained between α and β is given by formula (6).

6. Problem 4. Assumptions and data being the same as in Prob. 3, find the probability that in n_1 trials, following the n trials which produced E m times, the same event will occur m_1 times.
Solution. It suffices to take in formula (3)

(B_i) = α_i;  (A, B_i) = C_n^m p_i^m (1 − p_i)^{n−m};  (C, B_i) = C_{n_1}^{m_1} p_i^{m_1} (1 − p_i)^{n_1−m_1}

to find for the required probability Q this expression:

(7)  Q = C_{n_1}^{m_1} · [ Σ_{i=1}^{k} α_i p_i^{m+m_1} (1 − p_i)^{n+n_1−m−m_1} ] / [ Σ_{i=1}^{k} α_i p_i^{m} (1 − p_i)^{n−m} ].

Supposing again

p_1 = 1/k,  p_2 = 2/k,  . . .  p_k = 1;  α_1 = α_2 = · · · = α_k = 1/k,
and letting k → ∞, formula (7) in the limit becomes

(8)  Q = C_{n_1}^{m_1} · [ ∫_0^1 x^{m+m_1} (1 − x)^{n+n_1−m−m_1} dx ] / [ ∫_0^1 x^{m} (1 − x)^{n−m} dx ].

This formula leads to the following conclusion: When the unknown probability p of an event E may have any value between 0 and 1, and the a priori probability of its being contained between α and β is β − α (so that equal probabilities correspond to intervals of equal length), the probability that the event E will happen m_1 times in n_1 trials following n trials which produced E m times is given by formula (8).

In particular, for n_1 = m_1 = 1 (evaluating the integrals by the known formula), we have

Q = (m + 1) / (n + 2).

This is the much disputed "law of succession" established by Laplace.
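Formula (8) is easy to evaluate exactly, since ∫_0^1 x^a (1 − x)^b dx = a! b!/(a + b + 1)!. The sketch below (function names are illustrative) recovers Laplace's law of succession as the special case n_1 = m_1 = 1:

```python
from math import comb

def succession_prob(n, m, n1, m1):
    """Formula (8), using the exact value of the beta integral."""
    def beta_int(a, b):
        # integral of x^a (1-x)^b over [0, 1] equals a! b! / (a+b+1)!
        return 1 / ((a + b + 1) * comb(a + b, a))
    return comb(n1, m1) * beta_int(m + m1, n + n1 - m - m1) / beta_int(m, n - m)

# Law of succession: after 10 successes in 10 trials, one more success
# has probability (m + 1)/(n + 2) = 11/12.
print(succession_prob(n=10, m=10, n1=1, m1=1))  # → 0.91666...
```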
7. Bayes' formula, and other conclusions derived from it, are necessary consequences of fundamental concepts and theorems of the theory of probability.
Once we admit these fundamentals, we must admit Bayes'
formula and all that follows from it. But the question arises: When may the various results established in this chapter be legitimately applied?
In general, they may be applied
whenever all the conditions of their validity are fulfilled; and in some artificial theoretical problems like those considered in this chapter, they unquestionably are legitimately applied.
But in the case of practical
applications it is not easy to make sure that all the conditions of validity are fulfilled, though there are some practical problems in which the use of Bayes' formula is perfectly legitimate.1
In the history of probability
it has happened that even the most illustrious men, like Laplace and Poisson, went farther than they were entitled to go and made free use principally of formulas (6) and (8) in various important practical problems. Against the indiscriminate use of these formulas sharp objections
have been raised by a number of authors, especially in modern times. The first objection is of a general nature and hits the very existence of a priori probabilities.
If an urn is given to us and we know only that
it contains white and black balls, it is evident that no means are available to estimate a priori probabilities of various hypotheses as to the propor tion of white balls.
Hence, critics say, a priori probabilities do not exist
at all, and it is futile to attempt to apply Bayes' formula to an urn with an unknown proportion of balls.¹ At first this objection may appear very convincing, but its force is somewhat lessened by considering the peculiar mode of existence of mathematical objects. Some property of integers, unknown to me, is not present in my mind, but it is hardly permissible to say that it does not exist; for it does exist in the minds of those who discover this property and know how to prove it.

¹ One such problem can be found in an excellent book by Thornton C. Fry, "Probability and Its Engineering Uses," New York, 1928.
To this person the a priori
probabilities of various proportions of white and black balls might have been known.
To us they are unknown, but this should not prevent
us from attributing to them some potential mode of existence at least as a sort of belief. To admit a belief in the existence of certain unknown numbers is common to all sciences where mathematical analysis is applied to the world of reality.
If we are allowed to introduce the element of belief
into such "exact" sciences as astronomy and physics, it would be only fair to admit it in practical applications of probability. The second and very serious objection is directed against the use of formula (6), and for similar reasons against formula
(8).
Imagine,
again, that we are provided with an urn containing an enormous number of white and black balls in completely unknown proportion.
Our aim
is to find the probability that the proportion of white balls to the total number of balls is contained between two given limits.
To that end, we
make a long series of trials as described in Prob. 3 and find that actually
The probability we seek would
result from Bayes' formula, provided numerical values of a priori proba bilities, assumed on belief to be existent, were known.
Lacking such
knowledge, an arbitrary assumption is made, namely, that all the a priori probabilities have the same value.
Then, on account of the
enormous number of balls in our urn, formula (6) can be used as an approximate expression of P. positive number
e ,
It can be shown that, given an arbitrary
however small, the probability of the inequalities m 
n
e
< p <
m 
n
+e
can be made as near to 1 as we please by taking the number of trials greater than a certain number
N(e) depending upon E alone.
In other
words, with practical certainty we can expect the proportion of white. balls to the total number of balls in our urn to be contained within arbitrarily narrow limits
m n

E
and
m +E. n
A conclusion like this would certainly be of the greatest importance. But it is vitiated by the arbitrary assumption made at the beginning. The same is true of formula (8) and of Laplace's "law of succession." The objection against using formulas (6) and (8) in circumstances where we are not entitled to use them appears to us as irrefutable, and the numerical applications made by Laplace and others cannot inspire much confidence. As an example of the extremes to which the illegitimate use of formulas (6) and (8) may lead, we quote from Laplace:
En faisant, par exemple, remonter la plus ancienne époque de l'histoire à cinq mille ans, ou à 1,826,213 jours, et le Soleil s'étant levé constamment, dans cet intervalle, à chaque révolution de vingt-quatre heures, il y a 1,826,214 à parier contre un qu'il se lèvera encore demain.
It appears strange that as great a man as Laplace could make such a statement in earnest.
However, under proper conditions, it would not be so objectionable. If, from the enormous number N + 1 of urns containing each N black and white balls in all possible proportions, one urn is taken and 1,826,213 balls are drawn and returned, and they all turn out to be white, then nobody can deny that there are very nearly 1,826,214 chances against one that the next ball will also be white.

Problems for Solution
1. Three urns of the same appearance have the following proportions of white and black balls:

Urn 1: 1 white, 2 black balls
Urn 2: 2 white, 1 black ball
Urn 3: 2 white, 2 black balls

One of the urns is selected and one ball is drawn. It turns out to be white. What is the probability that the third urn was chosen?  Ans. 1/3.
2. Under the same conditions, what is the probability of drawing a white ball again, the first one not having been returned?  Ans. 1/3.

3. An urn containing 5 balls has been filled up by taking 5 balls from another urn, which originally had 5 white and 5 black balls. A ball is taken from the first urn, and it happens to be black. What is the probability of drawing a white ball from among the remaining 4?  Ans. 5/9.

4. From an urn containing 5 white and 5 black balls, 5 balls are transferred into an empty second urn. From there, 3 balls are transferred into an empty third urn and, finally, one ball is drawn from the latter. It turns out to be white. What is the probability that all 5 balls transferred from the first urn are white?  Ans. 1/126.
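Problem 4's answer can be checked exactly. Whatever the intermediate transfer, the ball finally drawn is a uniformly chosen one of the five balls first transferred, so (A, B_i) = i/5 under the hypothesis of i white balls among the five; this is why the answer does not depend on the size of the third group. Variable names below are illustrative:

```python
from fractions import Fraction as F
from math import comb

# Hypothesis B_i: i white balls among the 5 taken from an urn of 5W + 5B.
priors = [F(comb(5, i) * comb(5, 5 - i), comb(10, 5)) for i in range(6)]
# The drawn ball is a uniformly random one of those 5, so (A, B_i) = i/5.
likelihoods = [F(i, 5) for i in range(6)]

total = sum(p * l for p, l in zip(priors, likelihoods))
print(priors[5] * likelihoods[5] / total)  # → 1/126
```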
5. Conditions and notations being the same as in Prob. 3 (page 66), show that the probability for an event to occur in the (n + 1)st trial, granted that it has occurred in all the preceding n trials, is never less than the probability for the same event to occur in the nth trial, granting that it has occurred in the preceding n − 1 trials.
HINT: It must be proved that

(α_1 p_1^n + · · · + α_k p_k^n)² ≤ (α_1 p_1^{n−1} + · · · + α_k p_k^{n−1})(α_1 p_1^{n+1} + · · · + α_k p_k^{n+1}).

For that purpose, use Cauchy's inequality.
6. Assuming that the unknown probability p of an event E can have any value between 0 and 1 and that the a priori probability of its being contained in the interval (α, β) is equal to the length of this interval, prove the following theorem: The probability a posteriori of the inequality p ≤ σ after E has occurred m times in n trials is equal to the probability of at least m + 1 successes in n + 1 independent trials with constant probability σ. (See Prob. 13, page 59.)

7. Assumptions being the same as in the preceding problem, find approximately the probability a posteriori of the inequalities

0.475 ≤ p ≤ 0.575,

it being known that in 200 trials an event with the probability p has occurred 105 times. Ans. Using the preceding problem and applying Markoff's method, we find P = 0.846.

8. An urn contains N white and black balls in unknown proportion. The number of white balls hypothetically may be 0, 1, 2, . . . N, and all these hypotheses are considered as equally likely. Altogether n balls are taken from the urn, m of which turned out to be white. Without returning these balls, a new group of n_1 balls is taken, and it is required to find the probability that among them there are m_1 white balls. Naturally, the total number of balls is so large as to have n + n_1 < N. Ans. The required probability has the same expression

Q = C_{n_1}^{m_1} · [ ∫_0^1 x^{m+m_1} (1 − x)^{n+n_1−m−m_1} dx ] / [ ∫_0^1 x^{m} (1 − x)^{n−m} dx ]
as in Prob. 4, page 69.

Polynomials ordinarily called "Hermite's polynomials," although they were discovered by Laplace, are defined by

(−1)^k (d^k/dy^k) e^{−y²/2} = H_k(y) e^{−y²/2};  k = 0, 1, 2, . . . .

The first four of them are

H_1(y) = y;  H_2(y) = y² − 1;  H_3(y) = y³ − 3y;  H_4(y) = y⁴ − 6y² + 3.

They possess the remarkable property of orthogonality:

∫_{−∞}^{+∞} e^{−y²/2} H_k(y) H_l(y) dy = 0  when  k ≠ l,

while

∫_{−∞}^{+∞} e^{−y²/2} H_k(y)² dy = k! √(2π).

Under very general conditions, a function f(y) defined in the interval (−∞, +∞) can be represented by a series

f(y) = a_0 H_0(y) + a_1 H_1(y) + a_2 H_2(y) + · · ·

where in general

a_k = 1/(k! √(2π)) · ∫_{−∞}^{+∞} e^{−y²/2} f(y) H_k(y) dy.

Let

a = m/n  and  h² = n / (a(1 − a)),

provided 0 < a < 1.

9. Prove the validity of the following expansion indicated by Ch. Jordan:

x^m (1 − x)^{n−m} = (m!(n − m)!/(n + 1)!) · (h/√(2π)) · e^{−y²/2} · [ 1 − ((1 − 2a)/(n + 2)) h H_3(y) − ((2n − (11n + 6)a(1 − a))/(2n(n + 2)(n + 3))) h² H_4(y) + · · · ]

for 0 ≤ x ≤ 1, where y is a new variable connected to x by the equation

x = a + y/h.

HINT: Consider the development in a series of Hermite's polynomials of the function

f(y) = e^{y²/2} (a + y/h)^m (1 − a − y/h)^{n−m}  for  −ha ≤ y ≤ h(1 − a);
f(y) = 0  if either  y < −ha  or  y > h(1 − a).

10. Assuming that the conditions of validity of formula (6) are fulfilled, show that the a posteriori probability of the inequalities

m/n − t √(a(1 − a)/n) ≤ p ≤ m/n + t √(a(1 − a)/n);  a = m/n,

can be expanded into a convergent series

P = (2/√(2π)) ∫_0^t e^{−y²/2} dy − (2/√(2π)) · (((11n + 6)a(1 − a) − 2n)/(2(n + 2)(n + 3)a(1 − a))) · (t³ − 3t) e^{−t²/2} + · · · .

When n is large and a is not near either to 0 or to 1, two terms of this series suffice to give a good approximation to P (Ch. Jordan). Apply this to Prob. 7. Ans. 0.84585.
References

BAYES: An Essay toward Solving a Problem in the Doctrine of Chances, Philos. Trans., 1764, 1765.
LAPLACE: "Théorie analytique des probabilités," Oeuvres VII, pp. 370-409.
CHAPTER V USE OF DIFFERENCE EQUATIONS IN SOLVING PROBLEMS OF PROBABILITY
1. The combined use of the theorems of total and compound probability very often leads to an equation in finite differences which, together with the initial conditions supplied by a problem itself, serves to determine an unknown probability. This method of attack is very powerful, and it is often resorted to, especially in the more difficult cases. In this chapter the use of equations in finite differences, applied to a few selected and comparatively easy examples, will be shown; but in Chap. VIII we shall apply the method to a class of interesting and historically
important problems. Certain preliminary explanations are necessary at this point.
Again
we consider a series of trials resulting in an event E or its opposite, F, but this time we suppose that the trials are dependent, so that the probability of E at a certain trial may vary according to the available information concerning the results of some of the other trials. A simple and interesting case of dependent trials arises if we suppose that the probability of E in the (n + I)st trial receives a definite value a
if E has happened in the preceding nth trial, and this value does not
change whatever further information we may possess concerning the results of trials preceding the nth.
Also, the probability of E in the
(n + I)st trial receives another determined value {3 if E failed in the nth trial, no matter what happened in the trials preceding the nth. We have a simple illustration of this kind of dependence, if we suppose that drawings are made from an urn containing black and white balls in a known proportion, and that each ball drawn is returned to the urn, but only after the next drawing has been made.
It is obvious that the proba
bility that the (n.+ I)st ball drawn will be white, becomes perfectly definite if we know what was the color of the ball immediately preceding, and it remains the same no matter what we know about the colors of the I, 2, .
.
.
(n  I)st balls.
If the trials depend on each other in the abovedefined manner, we say th�t they constitute a "simple chain," to use the terminology of the late A. A. Markoff, who was the first to make a profound study of dependent trials of this and similar, but more complicated, types.
It is
implied in the definition of a simple chain that it breaks into two sepa rate parts as soon as the result of a certain trial becomes known.
For
SEc.
USE OF DIFFERENCE EQUATIONS IN SOLVING PROBLEMS 75
2)
instance, if the result of the fifth trial is known, trials 6, 7, 8, ... become independent of trials
1, 2, 3, 4, and the chain breaks into two distinct
parts: the trials preceding the fifth, and those following it.
If the results of trials 1, 2, 3, . . . (n − 1) remain unknown, the event E in the following nth trial has a certain probability which we shall denote by p_n. Also, if it becomes known that E happened at trial k, where k < n − 1, the probability of E happening in the nth trial receives a different value, p_n^{(k)}. It is important to find means to determine the probability p_n, the a priori probability of E in the nth trial when the results of the preceding trials remain unknown, as well as to determine the probability p_n^{(k)} of E in the nth trial when we possess the positive information that E has materialized in the kth (k < n − 1) trial.
Problem 1.
P1 of the event E in a simple
chain of trials being known, find the probability Pn of E in the nth trial when the results of the preceding trials remain completely unknown. Also, find the probability
pC�> of E in the nth trial when it is known that E has happened in the kth trial where k < n  1. Solution. In the nth trial the event E can happen either preceded by E in the (n  1)st trial, the probability of which is Pn1, or preceded by F in the (n  1)st trial, the probability of which is 1  p,._1• By the theorem of compound probability, the probability of the succession
EE is Pn1a, while the probability of the succession FE is (1 Hence, the total probability (1)
Pn
=

Pn1){3.
Pn is
aPn1 + {3(1  Pn1)
=
(a  {3)pn1 + {3.
This is an ordinary equation in finite differences.
It has a particular
solution
Pn where
c
=
c
const.
=
is determined by the equation
c
=
(a  {3)c + {3,
whence 8
c provided 1
If 1 + {J
means that never
1 + {3 E

a
� 0.1
=
On
{3 ::  a ::1:+the
other
the
corresponding
a = 0 or a {J = 1, we necessarily have a = 1, {J 0, which must occur in all the trials if it actually occurs in the first trial, and 

=
occurs if it does not actually occur at the outset.
extreme casein which interest.
hand,
a

{J
=

This case, as well as the other
1 can therefore be excluded as not possessing real
INTRODUCTION TO MATHEMATICAL PROBABILITY
76
[CHAP. V
homogeneous equation
Yn
=
(a  f3)Yn1
has a general solution
Yn
=
involving an arbitrary constant
C(a{3)n1 Adding to it the previously found
C.
particular solution, we obtain the general solution of (1) in the form Pn
The arbitrary constant
C(a {3)n1 + 1+ a'
=
;
is determined by the initial condition
C
{3 C+ + 1 {3
_
a
=
so that finally Pn
=
�
1+
If
a
_
+
Pl
we see that
p,.
1+
1 +
does not depend on
cause we may exclude the cases
a {3 is
� a) (a {3)n1,
(Pl
=
n
P1 _
{3 {3 a
and is constantly equal to
a {3
=
1 or
a {3
=
1,
p 1•
Be
so that
contained between 1 and 1, we may conclude from the above
expression that
p,.,
if not a constant, at any rate tends to the limit
{3 1+{3a as
n
increases indefinitely.
As to
p�k>
we find in a similar way that it satisfies the equation
(2) of the same form as equation (1).
But the initial condition in this
a because the probability of E happening in the (k + 1)st case is pJok_/.1 trial is a when it is known that E occurred in the preceding trial. The =
solution of
(2)
satisfying this initial condition is
p �>
{3 =
1 +{3
1 a + (a {3)nk a 1 +{3 a
·
As the second term in the right-hand member decreases with increasing
n
and finally becomes less than any given number, we see that the
positive information concerning the result of the kth trial has less and less
influence on the probability of E in the following trials, and in remote trials this influence becomes quite insignificant. An urn contains a white and b black balls, and a series of drawings of
Example.
one ball at a time is made, the ball removed being returned to the urn immediately after the taking of the next following ball. What is the probability that the nth ball drawn is white when: (a) nothing is known about the preceding drawings; (b) the kth ball drawn is white? In this particular problem we have and
a 1 a a a = = P1 = a+b  1, f3 a+b1 a+ b •

a {3 =pl. = 1+{3a a+b 
Thus
Pn
=
a a +b
P1 =
·
That is, the probability for any ball drawn to be white is the same as that for the first ball, nothing being known about the results of the previous drawings. expression for
p�k\
p�k) So, for instance, if
The
is, in this example,
=
a b + (1)"k( a+b)(a + b1)"• a+b
a = 1, b = 2,
n
p�l)
= 5, k = 3, 1 1 2 = + . 3 3 22 =2;
the information that the third ball was white raises to
7'2
the probability that the fifth
ball will be white; it would be �without such information.
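The recursion (1) and the conditional probability p_n^{(k)} of this example can be checked in a few lines; parameter names are illustrative:

```python
def chain_probs(p1, alpha, beta, n):
    """p_1, ..., p_n from recursion (1): p_k = (alpha - beta) p_{k-1} + beta."""
    probs = [p1]
    for _ in range(n - 1):
        probs.append((alpha - beta) * probs[-1] + beta)
    return probs

# The urn of the example: a white, b black, ball returned after the
# NEXT drawing, so alpha and beta refer to an urn one ball short.
a, b = 1, 2
p1 = a / (a + b)
alpha = (a - 1) / (a + b - 1)   # previous ball white, not yet returned
beta = a / (a + b - 1)          # previous ball black, not yet returned
for p in chain_probs(p1, alpha, beta, 6):
    assert abs(p - p1) < 1e-15  # p_n = a/(a+b) for every n, as shown above

# p_n^(k): the same recursion started from p_{k+1}^(k) = alpha.
p = alpha
for _ in range(5 - 3 - 1):      # advance from trial k+1 = 4 to trial n = 5
    p = (alpha - beta) * p + beta
print(p)  # → 0.5, matching p_5^(3) = 1/2 above
```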
3. The next problem chosen to illustrate the use of difference equations is interesting in several respects. It was first propounded and solved by de Moivre.
Problem 2. In a series of independent trials, an event E has the constant probability p. If, in this series, E occurs at least r times in succession, we say that there is a run of r successes. What is the probability of having a run of r successes in n trials, where naturally n > r?
Solution. Let us denote by y_n the unknown probability of a run of r in n trials. In n + 1 trials the probability of a run of r will then be y_{n+1}. Now, a run of r in n + 1 trials can happen in two mutually exclusive ways: first, if there is a run of r in the first n trials, and second, if such a run can be obtained only in n + 1 trials. The probability of the first hypothesis is y_n. To find the probability of the second hypothesis, we observe that it requires the simultaneous realization of the following conditions: (a) There is no run of r in the first n − r trials, the probability of which is 1 − y_{n−r}. (b) In the (n − r + 1)st trial, E does not occur, the probability of which is q = 1 − p. (c) Finally, E occurs in the remaining r trials, the probability of which is p^r. As (a), (b), (c) are independent events, their simultaneous materialization has the probability

(1 − y_{n−r})qp^r.

At the same time, this is the probability of the second hypothesis. Adding it to y_n, we must obtain the total probability y_{n+1}. Thus

(3)  y_{n+1} = y_n + (1 − y_{n−r})qp^r,

and this is an ordinary linear difference equation of the order r + 1. Together with the obvious initial conditions

y_0 = y_1 = · · · = y_{r−1} = 0;  y_r = p^r,

it serves to determine y_n completely for n = r + 1, r + 2, . . . . For instance, taking n = r in (3), we obtain

y_{r+1} = p^r + p^r q,

and again, taking n = r + 1, we derive from (3)

y_{r+2} = p^r + 2p^r q,

and so forth. Although, proceeding thus, step by step, we can find the required probability y_n for any given n, this method becomes very laborious for large n and does not supply us with information as to the behavior of y_n for large n. It is preferable, therefore, to apply known methods of solution to equation (3). First we can obtain a homogeneous equation by introducing z_n = 1 − y_n instead of y_n. The resulting equation in z_n is

(4)  z_{n+1} − z_n + qp^r z_{n−r} = 0,

and the corresponding initial conditions are

z_0 = z_1 = · · · = z_{r−1} = 1;  z_r = 1 − p^r.

We could use the method of particular solutions as in the preceding problem, but it is more convenient to use the method of generating functions. The power series in ξ

φ(ξ) = z_0 + z_1 ξ + z_2 ξ² + · · ·

is the so-called generating function of the sequence z_0, z_1, z_2, . . . . If we succeed in finding its sum as a definite function of ξ, the development of this function into a power series will have precisely z_n as the coefficient of ξ^n. To obtain φ(ξ), let us multiply both members of the preceding series by the polynomial

1 − ξ + qp^r ξ^{r+1}.

The multiplication performed, we have

(1 − ξ + qp^r ξ^{r+1}) φ(ξ) = z_0 + (z_1 − z_0)ξ + · · · + (z_{r−1} − z_{r−2})ξ^{r−1} + (z_r − z_{r−1})ξ^r + (z_{r+1} − z_r + qp^r z_0)ξ^{r+1} + · · · .

In the right-hand member the terms involving ξ^{r+1}, ξ^{r+2}, . . . have vanishing coefficients by virtue of equation (4); also z_k − z_{k−1} = 0 for k = 1, 2, 3, . . . r − 1, while

z_0 = 1  and  z_r − z_{r−1} = −p^r,

so that

φ(ξ) = (1 − p^r ξ^r) / (1 − ξ + qp^r ξ^{r+1}).

The generating function φ(ξ) thus is a rational function and can be developed into a power series of ξ according to the known rules. The coefficient of ξ^n gives the general expression for z_n. Without any difficulty, we find the following expression for z_n:

(5)  z_n = β_{n,r} − p^r β_{n−r,r},

where

β_{n,r} = Σ_{l ≤ n/(r+1)} (−1)^l C_{n−lr}^{l} (qp^r)^l

and β_{n−r,r} is obtained by substituting n − r instead of n. If n is not very large compared with r, formula (5) can be used to compute z_n and y_n = 1 − z_n. For instance, if n = 20, r = 5, and p = q = 1/2, we easily find

z_20 = 1 − 15/64 + 45/64² − 10/64³ − (1/32)(1 − 10/64 + 10/64²)

and hence

z_20 = 0.75013,

correct to five decimals; y_20 = 0.24987 is the probability of a run of 5 heads in 20 tossings of a coin.
4. But if
n
is large in comparison with r, formula (5) would require
so much labor that it is preferable to seek for an approximate expression for z,. which will be useful for large values of n.
It often happens, and
in many branches of mathematics, but especially so in the theory of probability, that exact solutions of problems in certain cases are not of any use.
That raises the question of how to supplant them by con
80
INTRODUCTION TO MATHEMATICAL PROBABILITY
[CHAP. V
venient approximate formulas that readily yield the required numbers. Therefore, it is an important problem to find approximate formulas where exact ones cease to work.
Owing to the general importance of approxi
mations, it will not be out of order to enter into a somewhat long and complicated investigation to obtain a workable approximate solution of our prob . lem in the interesting case of a large n. Since cp(�) is a rational function, the natural way to get an appropriate expression of
Zn
would be to resolve cp(�) into simple fractions, correspond
ing to various roots of the denominator, and expand those fractions in power series of �. However, to attain definite conclusions following this method, we must first seek information concerning roots of the equation 1
_
5. Let
� + qpr�rH
where =
0:
=
0.
pr (1  p).
When
p varies from 0 to 1, the maximum of pr(1
p
� 1 and is
=
r
rj (r+ 1)rH so that
r
a
�

p) is attained for
rj (r+ 1)r+1 in all cases.
r
To deal with the most interesting case, we shall assume
r p
(6) which involves a
rr
< (r + 1)r+1
and we leave it to the reader to discover how the following discussion should be modified if
p ;;; r
� 1·
When �starts to increase from 0, the function/(�) steadily increases and attains a positive maximum for �
=
(r+ 1)o:��
�o where
=
1
after which /(�) decreases steadily to negative infinity. are two positive roots of the equation Ja) ?'
+ 1, r
=
and another root greater than this number.
condition
(6) is fulfilled.
The remaining roots are all imaginary if negative root among them if
r is
Hence, there
0: �1, which is less than
This root is
1/p if
r is odd and there is one
even.
Now we shall prove that the absolute value of every imaginary or > 1/p Let p be the absolute value of any such root.
negative root is
.
SEc. 6]
USE OF DIFFERENCE EQUATIONS IN SOLl'ING PROBLEMS 81
We have first
f(p) =
  ap'+l
P
< 0
1
so that p belongs either to the interval (0, �1) or to the interval (1/p, + oo ) , and if we can show that p > �o then p can be only > 1/p. If the root we
p
consider is negative,
satisfies the equation
F(p) = 1 + p  apr+1 =
·
0
and since F(p) increases till a positive maximum for p = �0 is reached, and then decreases, the root of F(p) = 0 is necessarily > �0• If � = pei6 is an imaginary root of f(�) = 0 we have, equating imaginary parts,
(1· + 1)8 8
, sin
ap
(7) But, whatever
(� + 1)8 Slll 8
sin
1
the equality sign being excluded if sin
�
8
(r + 1)ap' which implies p > �o. 6. The equation
�
0.1
  a�r+l = 1
i6
1 Hence,
1
>
+ a�'= I pe
r+
1
can be exhibited in the form
=
·
The statement is thus completely proved.
�
�
1

8 may be
'
Substituting
_
sin
0
1.
here, and again equating imaginary parts, we get
ap'+l
sin
r8 =
sin
8
and, combining this with (7), P
1
=
(r
sin
+
1)8
sin 1·8
a=
;
m88 . . (m
. sin The extreme values of the rat1o
mteger > 1) correspond to certain
SID
roots of the equation
sin
cos
8 m8 � �7n�8�
m
=
sin
s
=
Vl
The equality sign is excluded if sin
8
+
r8)' sin 8 (1· + 1)8]r+1•
(sin [sin
cos
but for every root of this equation
m8 8, t : (m _ 8 m 1) sin2
differs from 0.
�
82
INTRODUCTION TO MATHEMATICAL PROBABILITY
[CHAP. V
If the imaginary part of � is positive, the argument 8 is contained between 0 and than
r

r
r.
�f
In this case, it cannot be less than For, if 0 < 8 < r8
�1
or greater
�1
r
sinr8
r
>
or
sin r( + 1)8 r( + 1)8
sinr8 r > __ . sin (r + 1)8 r + 1 At the same time
1 sin8 > __ sin (r + 1)8 r + 1
and hence a
=
{
sin r8 sin8 rr > sin r( + 1)8 sin r( + 1)8 r( + 1)r+1'
which is impossible.
}r
That 8 cannot be greater than
r
r
� 1 follows
simply, because in this case, sin r( + 1)8 and sinr8 would be of opposite signs andpwould be negative. As
11'
r + 1
� 8 ;;;;!
r
11'
1, we have
r +
psin8>psin r On the other hand, sin
x > 2x/r
if 0 <
P sm 8> •
x
�f <
r/2
and p >
1/p.
Hence,
2 (r + 1)p'
Thus, imaginary parts of all complex roots have the same lower bound
2 (r + 1)p
of their absolute values. 7. Denoting the roots of the equation/� ( ) �,.; we have
(k
=
=
0 by
1, 2, . . . r + 1)
r+l
V'W
=
1 P�r. � 1 1 1· � (1 p)�r.r( + 1 r�r.) �"
(
k=l
)
SEc.
USE OF DIFFERENCE EQUATIONS IN SOLVING PROBLEMS
7]
Hence, expanding each term into power series of coefficients of
�", we find
r+l
z
� 1 P�k
,.
=
since
�;;"
·
k=l
I
+ 1 "H (1  p�k)�" < r p + r� r(1  p) 1  k) (1  p)�k(r
I!�k  PI < 2p;
1�;;11 < p; If
� and collecting
� ( 1  p)�k r + 1 rE,;
For every imaginary root, we have
I
83
1
< lr + 1  r�kl
(r + 1)p, 2r
r is odd, there are r  1 imaginary roots and the part in the expression
of z,. due to them in absolute value is less than
(r + 1)(r 1) n+2 p r(1  p) The term corresponding to the root z
"
IBI
where
<
=
<
_r_ n+2
1 pp
·
1/p vanishes, so that finally
r �1" 1  p�1 , + 0__ p"H + 1 r�1 1 p 1 p �1 ( ) r
1 and �1 denotes the least positive root of the equation 1  � + qpr�r+1 0. =
If
r is
even, there is one negative root.
The part of z,. corresponding
to this root is less than
2p"+2 (1  p)r· The whole contribution due to imaginary and negative roots is less than
in absolute value.
(8)
z,.
=
Thus, no matter whether
r is odd or even, we have
1  P�1 . �" + O�"H; 1  p (1  p)�1 1' + 1  r�1
1
< 0 <
1.
This is the required expre�sion for z,., excellently adapted to the case of a large value for n, since then the remainder term involving 8 is completely negligi le in comparison with the first principal term,
�
84
[CHAP. V
INTRODUCTION TO MATHEMATICAL PROBABILITY
The root h can be found either by direct solution of the trinomial equation following Gauss' method, or by application of Lagrange's series. Applying Lagrange's series, we have
lr 1 + a + � (lr + 2)( + �? ..
h
=
·
·
·
(lr + l)al
1=2 ..
1 og
_ t <;1
+ l)(lr + 2) a+ � � (lr ll ·
·
·
(lr + l

1)
a
1
1=2
both series being convergent if
Ia I
satisfied.
< ?''/ (r
+
8. Let us apply the approximate formula and
r
=
10.
1)r+1 (8)
to the case p
�1 z,. n
=
q
=
�'2
Using Lagrange's series, we find that =
1.0004908
and
Hence, for
and this condition is
=
=
1.003937 . (1.0004908)" +
100, 1,000, 10,000, z,.
=
;�.
respectively,
0.9559; 0.6146; 0.0074
so that, for instance, the probabilities of a run of at least
100, 1,000, or 10,000 throws of
10
heads in
a coin are, respectively,
0.0441; 0.3854; 0.9926. Thus, in
10,000
10
or
the probability y,. tends to
1,
throws, it is quite likely that heads would turn up
more times in succession. In general, for a given?" and increasing
n,
so that in a very long series of trials, runs of any length are extremely likely to occur, a conclusion which at first sight seems paradoxical.
9. In the preceding examples, an unknown probability was deter mined by an ordinary equation in finite differences.
Very often, how
ever, probability as a function of two or more independent variables is defined by a partial difference equation in two or more independent variables, together with a set of initial conditions suggested by the problem itself.
A few examples will suffice to illustrate the use of
partial equations in finite differences and to give an idea of the two principal methods for their solution; namely,
Laplace's
method of
generating functions, and the less well known, but elegant, method proposed by Lagrange. We start with an analytical solution of the problem which was dis cussed in detail in Chap. III.
SEc. 9)
USE OF DIFFERENCE EQUATIONS IN SOLVING PROBLEMS 85
Problem 3.
Find the probability of exactly x successes in t inde
pendent trials with the constant probability p. Solution by Laplace's Method. Let us denote the required proba bility by Y�.t· To obtain x successes in t trials can be possible only in two mutually exclusive ways: (a) by obtaining x successes in t  1 trials and a failure at the last trial;
(b)
by obtaining success at the last trial  1 trials. The probability of
and x  1 successes in the preceding t case
(a)
is qy,.,11 and that of case
(b)
is PYz1,11·
The total probability
Y�.1 satisfies the equation Yz,l
(9)
=
PY:.1,11 + qy,.,t1
for all positive x and t. This equation alone does not determine y,.,1 completely, but it does so in connection with certain initial conditions. These conditions are Y�.o
=
0
if
X> 0,
Yo,1
=
q1
if
t � 0.
(10)
The first set of equations is obvious; the second set is the expression of the fact that if there are no successes in t trials, the failures occur t
times in succession, and the probability for that is q1• Following Laplace, we introduce for a given t the generating function of Yo,ti Y1,ti Y2,t,
..
. , that is, the power series
..
�PtW
=
Yo.t + Y1.1� + Y2.t�2 + · · ·
=
� Yz.���.
:.:=0
Taking t  1 instead of t, separating the first term and multiplying by q, we have
..
q�Pt1(�)
=
qyo,t1 +
� qy�,t1��;
:.:=1
and similarly 00
P�IPI1(�)
=
� PY�1,11��.
z=l
Adding and noting equation
(9)
(p� + q)�PI1(�)
we obtain =
1P1W + qyo,t1  Yo.t,
but because of (10) qyo,l1  Yo,1
and hence,
=
1 t q  q
=
0
INTRODUCTION TO MATHEMATICAL PROBABILITY
86
for every positive
Taking
t.
t
1, 2, 3, .
=
[CHAP. V
..and performing successive
substitutions, we get
c,oeW
(p� + q)ec,ooW
=
and it remains only to find
c,oo(�) But on account of
=
Yo.o + Y1.oe + Y2,oe2 +
(10), y.,,o
0
=
cpo(�)
for x >
while Yo,o
=
1.
Thus,
1
=
and
c,oeW
0,
(pe + q)e.
=
To find y.,,e it remains to develop the righthand member in a power series of e and to find the coefficient of e"'. The binomial theorem readily gives
Y .,, e 
t(t

_
1) . . . (t  :r; + 1) "' tz p q 1· 2 ...:r;
10. Poisson's Series of Trials.
•
The analytical method thus enables
us to find the same expression for probabilities in a Bernoullian series of trials as that obtained in Chap.III by elementary means. Considering how simple it is to arrive at this expression, it may appear that a new deduction of a known result is not a great gain. But one must bear in mind that a little modification of the problem may bring new difficulties which may be more easily overcome by the new method than by a general ization of the old one. Poisson substituted for the Bernoullian series another series of independent trials with probability varying from trial to trial, so that in trials
1, 2, 3, 4,
...the same event E has different
and correspondingly, the opposite event probabilities p1, p2, pa, p., . has probabilities q1, q2, qa, q., . .. where qk 1  Pk, in general. Now, for the Poisson series, the same question may be asked: what is the •
•
=
probability y.,,e of obtaining x successes in t trials? The solution of this generalized problem is easier and more elegant if we make use of differ ence equations. First, in the same manner as before, we can establish the equation in finite differences
Yz.e
(11)
PeYzl,t1 + qeYz,t1·
=
The corresponding set of initial conditions is
y .,,o q1q2
=
(12)
Yo,e
=
•
if
0 •
•
Yo,o
q, =
:r; >
if
0 t >
1.
Giving cp,(e) the same meaning as above, we have
0
SEc.
11) USE OF DIFFERENCE EQUATIONS IN SOLVING PROBLEMS 87 ..
� q,y.,,,_I�"'
qell'tI(�) = q,yo,tI +
z=l ..
Pe�IPe1(�) =
� PtYz1,e1�"', z=l
whence
(peE + q,)fPe1W = ll't(E) + qeYo,e1  Yo,ej but because of
(12)
q,yo,t1  Yo, t
=
q1q2
•
•
•
q,  q1q2
·
•
·
qe
=
0,
and thus whence again
lfrW
(PI� + qi)(p2� + q2) (p,� + q,)tpo(�). However, by virtue of (12), tp o{�) 1 so that finally =
·
·
·
=
!peW
=
(pi� + qi)(p2� + q2)
To find the probability of
x
·
·
·
(p,� + q,).
successes in t trials in Poisson's case, one
needs only to develop the product
(PI� + q1)(p2� + q2) ' ' ' (p,� + qe) according to ascending powers of �and to find the coefficient of �"'. We shall now apply to equa
11. Solution by Lagrange's Method.
tion (9) the ingenious method devised by Lagrange, with a slight modifica tion intended to bring into full light the fundamental idea underlying this method.
Equation (9) possesses particular solutions of the form
a"'fJ' if a and fJ are connected by the equation
afJ = p + qa. Solving this equation for fJ, we find infinitely many particular solutions
a"'(q + pa1)e where a is absolutely arbitrary.
Multiplying this expression by an
arbitrary function �p(a) and integrating between arbitrary limits, we obtain other solutions of equation (9).
Now the question arises of how
to choose �p(a) and the path of integration to satisfy not only equation (9) but also initial conditions
(10).
We shall assume that �p (a) is a regular
function of a complex variable a in a ring between two concentric circles, ·with their center at the origin, and that it can therefore be represented in this ring by Laurent's series ..
�p(a) =
� n oo
Cna".
88
INTRODUCTION TO MATHEMATICAL PROBABILITY
If cis a circle concentric with the regularity ring of
[CHAP. V
fP(a) and situated
inside it, the integral
Yz,l =
�i
azl(q + pa1)1fP(a)da
is perfectly determined and represents a solution of
(9).
To satisfy
the initial conditions, we have first the set of equations
� 2'11'�
i
azlfP(a)da=0
X=1, 2, 3, .
for
c
which show that all the coefficients c,. with negative subscripts vanish, and that
fP(a) is regular about the origin.
The second set of equations
obtained by setting x = 0
�i
�
fP a) d a = q1 (q + pa1)1
serves to determine
t =0,
for
1, 2,
If Eis a sufficiently small complex parameter,
fP(a).
this set of equations is entirely equivalent to a single equation:
1 27ri
i c
fP(a)da 1 = a E(p + qa) 1 Eq
'
Now the integrand within the circle c has a single pole
ao determined by
the equation
ao
=
E(p + qao)
and the corresponding residue is
'P(ao) 1 qE' At the same time, this is the value of the lefthand member of the above equation, so that
1 'P(ao) = 1qE 1qE or .
fP(ao) =1 for all sufficiently small
E or ao.
That is,
 i (
1 Yz,t  27ri is the required solution. that is, the coefficient of
c
fP(a)= 1 and
az1 q +
)
p ' da �
It remains to find the residue of the integrand;
1/a in the development of
SEc. 12)
USE OF DIFFERENCE EQUATIONS IN SOLVING PROBLEMS 89
in series of ascending powers of
That can be easily done, using the
a.
binomial development, and we obtain
Yz,t
=
tCfp"q z
as it should be.
12. Problem 4.
Two players, A and B, agree to play a series of
games on the condition that A wins the series if he succeeds in winning a games beforeB wins b games. The probability of winning a single game 1  p forB, so that each game must be won by either is p for A and q A or B. What is the probability that A will win the series? =
Solution. This historically important problem was proposed as an exercise (Prob. 1 2, page 58) with a brief indication of its solution based on elementary principles.
To solve it analytically, let us denote by
y.,,1 the probability that A will win when
x
games remain for him to win,
while his adversary B has t games left to win.
Considering the result
of the game immediately following, we distinguish two alternatives: (a) A wins the next game (probability p) and has to win x  1 games before B wins t games (probability Yzt,t)i (b) A loses the next game (probability q) and has to win x games before B can win t  1 games (probability Yz,t t). The probabilities of these two alternatives being PYzt,t and qy.,,,_1 their sum is the total probability y., ,,. Thus, y.,,,
satisfies the equation
y.,,,
(13) Now, y.,,o
=
PYz1.1 + qy.,,t1·
=
0 for
won all his games.
x > 0 , which means that A cannot win, B having 1 fort > 0, which means that A surely Also, Yo.t =
wins when he has no more games to win.
The initial conditions in our
problem are, therefore,
y.,,o
=
0
if
X> 0;
Yo,t
=
1
if
t > 0.
(14) The symbol Yo.o has no meaning as a probability, and remains undefined. 0. For the sake of simplicity we shall assume, however, that y0,0 =
Application of Laplace's Method.
Again, let
Yz,O + Yz,l� + Yz,2� 2 + be the generating function of the sequence y.,,o; y.,,1; y.,,s, responding to an arbitrary x > 0. We have !l'z(�)
=
'
..
q�"'.,w
=
�qy.,,,1�' 1=1 ..
P!l'z1(�)
=
PYs1, o + �PYiJ,t�� 1=1
•
•
•
•
•
cor
90
INTRODUCTION TO MATHEMATICAL PROBABILITY
and
[CHAP.
V
..
q�IPz(�) + PIPz1(�)
=
py.,_1,o +
� (PYz1,t + qy,.,e1W
1=1
or, because of {I3,) Q�ff,.(�) + PIPz1(�)
PYz 1,o  Yz,o + IP(z �.)
=
Now, for every x > 0
Yz,O
Yz1,0
=
=
0
in conformity with the first set of initial conditions, which allows us to present the preceding relation as follows:
whence
But
�PVW
=
o 1� + Yo,2�2 + Yo,o + Y,
·
·
·
=
� + � 2 + �3 +
·
·
·
=
I
��
and finally �P
i�)
�P"'
=
(I  �)(I  q�),.·
It remains to develop the righthand member in a power series of �and find the coefficient of �.� As �1 �
=
� + �2 + �3 + . . .
and
1 (1

q�)"'
=
+ 1) u2 1 + �q ,., + x(x . q ,. + I I 2
we readily get, multiplying these series according to the ordinary rules,
y.,, 1
=
p
.,[1
I · + �q + x(x + ) q 2 + ... + x(x + 1) I·2 I 1 2
·
·
·
·
·
·
(x + t 2) q (t  I)
1_1]
which coincides with the elementary solution indicated on page 58. Application of Lagrange's Method. Equation (I3) has particular solutions of the form
a"(j'
where
a(j
=
p(j + qa.
SEc. 12)
USE OF DIFFERENCE EQUATIONS IN SOLVING PROBLEMS
Hence, we can either express
a
by
{3
or
{3
by
91
Leaving it to the reader
a.
to follow the second alternative, we shall express
as a function of {3
a
and seek the required solution in the form
where
cp(f3)
is again supposed to be developable in Laurent's series in a
certain ring;
c
is a circle described about the origin and entirely within
Setting x
that ring.
=
0, we must have
i;.if.(31cp({3)d{3
=
1
t
for
=
1,
2, 3,
and this set of equations is satisfied if we take
cp(f3)
=
1 (32
+
1
(33
1

+
.
lf31
 {3({3 1)'
> 1.
Now we have
and fort
=
0 YJ,O
=
p"'
as it should be, because for
power series of
1/{3,
d{3
r 21ri)c(1  q{3 1)"'{3({3 lf31
>
p"' 211i
f(

1) =:'
1 the integrand can be developed into a
the term with 1/{3 being absent.
solution is given by Yz,t
=
0
1
_
(311d(3 q(31)"'({3
_
Thus, the required
1)
where c is a circle of radius > 1 described about the origin. expression for y,.,1 is obtained as the coefficient of of
·
(1 q{3 1)"'({3 into power series of
1/{3.

The final
1/{3 in the development.
1)
We obtain the same expression as before. Problems for Solution
1. Each of n urns contains
a
white and b black balls.
One ball is transferred from
the first urn into the second, another one from the second into the third, and so on. Finally, a ball is drawn from the nth urn.
What is the probability that it is white,
when it is known that the first ball transferred was white? Ans.
a b + (a+b + 1)1n. a+b a+b 

INTRODUCTION TO MATHEMATICAL PROBABILITY
92
[CHAP. V
2. Two urns contain, respectively, a white and b black, and b white and a black A series of drawings is made, according to the following rules: a. Each time only one ball is drawn and immediately returned to the same urn it
balls.
came from. b. If the ball drawn is white, the next drawing is made from the first urn. c.
If it is black, the next drawing is made from the second urn.
d. The first ball drawn comes from the first urn.
What is the probability that the nth ball drawn will be white?
� K: � :)"·
Ans. Pn =
+
3. Find the probability of a run of 5 in a series of 15 trials with constant probability p = 78. An s. Yu =23.36  70.312 =0.0314184. 4. How many throws of a coin suffice to give a probability of more than 0.999 for
An s. 1.76 · 1032 throws suffice. 6. What is the least number of trials assuring a probability of 6; � for a mn of at least 10 successes if p =q = �? Ans. 1,423. 6. Seven urns contain black and white balls in the following proportions: a mn of at least 100 heads?
2 1 3 4 6 Urns.............................. 5 7 1 1  White.............................
1
2
2
3
Black..............................
2
1
2
1
One ball is drawn from each urn.
2
5
3 2
4
5
What is the probability that there will be among
Ans. Coefficient of e in.
them exactly 3 white balls?
H! = 0.28025. 7. Two players, each possessing $2, agree to play a series of games.
The prob
ability of winning a single game is � for both, and the loser pays $1 to his adversary after each game.
Find the probability for each one of them to be ruined at or before
the nth game? Solution. Let y .. be the probability that after playing 2m games, neither of the players is ruined.
We have
Ym+l = �y •• and hence
Ym
1 2" '
The probability for one of the players to be ruined at or before the nth game is �  12 2"'+1 if n =2m or n =2m + 1. 8. Solve the same problem if each player enters the game with $3.
Ans. �  �(%)"'1 if n =2m  1 or n =2m. 9. Players A., A ., . . . A,.+• play a series of games in the following order: first A. plays with A2; the loser is out and the winner plays with the following player, A a; the loser is out again and the next game is played with A4, and so on; the loser always being out and his place taken by the next following player.
The probability of winning
a
USE OF DIFFERENCE EQUATIONS IN SOLVING PROBLEMS
93
single game is 72 for each player and the series is won by the player who succeeds in winning over all his adversaries in succession. series will stop exactly at the xth game?
What is the probability that the
What is the probability that the series will
stop before or at the xth game? Solution. Let Yr be the probability that the series terminates exactly at the xth game. That means that the player who won the game entered at the (x  n + 1)st game and won successively the n following games. Now, there are n  1 cases to be distinguished according as the player beaten at the (x  n + 1}st game has already won 1, 2, 3, ...n  1 games. Let Pk be the probability that the loser in the (x  n + 1)st game previously has won k games. series in this case is Pk/2•. On the other hand,
The probability of ending the
so that
Hence, for x > n
Yz
1 =
+
"?}/z1
1
;tJz2
+
Initial conditions:
Y1
=
Y2
=
·
·
·
=
Y n1
=
0;
Yn
1 =
·
2•1
The generating function of Yz:
and the generating function of the probability that the series will end before or at the xth game is
10. Three players, A, B, C, play a series of games, each game being won by one of If the probabilities for A, B, C to win a single game are p, q, r, find the prob ability of A winning a games before B and C win band c games, respectively. Solution. Let Az,u,• denote the probability for A to win the series when he has still to win x games, while B and C have to winy and z games, respectively. First, them.
we can establish the equation
Ar,u,•
=
pAx1,u,• + qAx, u1,• + rAz,u,•1·
Next, Ao,u,• = 1 for positivey, z, and Ar,o,r = 0 for positive x, z; A •.•. o = 0 for posi tive x, y. Besides, although this is only a formal simplification, we shall assume
94
INTRODUCTION TO MATHEMATICAL PROBABILITY
A,.,0,,
=
A,.,.,,,
0, A,.,11,0
=
0 when :�: or y or z ( E,
z vanishes.
[CHAP. V
For the generating function of
..
.,
)
:t A,.,.,,.E"'II'
=
11·•0
we find the equation
whence
[
The final answer is A,.,b,c
=
a 
p" 1 + l(q + r) +
a(a + 1) a(a +1)(a + 2) . (q + r)2 + (q + r)3 + 1 2 1. 2 3 •
·
·
1
the dash indicating that powers of qand rwith the exponents iii:; band iii:; care omitted. Obviously, the same method can be extended to any number of players, and leads to a perfectly analogous expression of probability. 11. Anurn contains nballs altogether, and among them awhite balls. In a series of drawings, each time one ball is drawn, whatever its color may be, it is replaced by a white ball. Find the probability Yz,r that after r drawings there are :�: white balls in the urn. Solution. The required probability satisfies the equation Yz,r+l
Besides, y .. ,o
1,
=
Yz,o
=
0
=
n:�:+1 :�: Yzl,r +y,.,,, n n if
X
;ol' a,
Yz,r
=
0
if
:1:
<
a.
From the preceding equation, combined with the initial conditions, we find suc cessively y.. ,,
=
(;)'
[( � ) (;)'] [( ) ( ) (�)'] 1 ' 
a
Ya+l,r
=
(n a>
Ya+l,r
=
(n a)(n a 1) 1 2 ·
a+ 2 ' n
_
2
a+ 1 ' + n n
and so on. 12. If, in the problem of runs, pis supposed to be >
ity of a run of
r in
ntrials is greater than 1
_
(
p  P1
r 
(r + 1)p ,
+
r
r+l
=
r+1
)
r(p + p,) p�+l 2 1  p,
where p1 < __ is a root of the equation
PHI  p,)
1 •prove
p' ( 1  p).
that the probabil
USE OF DIFFERENCE EQUATIONS IN ·SOLVING PROBLEMS 95 13. To find an asymptotic expression of probability for a run of r in n independent trials, if p �
___!:_ , the following proposition is of importance: Imaginary and nega
r+ 1
tive roots of the equation
(1  8):1:"
are
.
 X
+8
n 0<8S n1
Oj
=
in absolute value, greater than the root
R > 1 of the equation
2'11' (1  8)R"  R + s cos n

Prove the truth of this statement.
14. Given
0.
n of black and white balls in known
urns containing the same number
8
=
proportions, drawings are made in the following manner: first, a single ball is drawn out of every urn; second, the ball drawn from the first urn is placed into the second; that drawn from the second is placed in the third, and so on; finally, the ball drawn from the last urn is placed in the first, so that again every urn contains n balls. Sup posing that this operation is repeated t times, find the probability of drawing a white
ball from the xth urn. Solution. Let be the required probability.
First, it can be shown that it
y.,,,
satisfies the equation
Ys,l
1
+
( ;)y.,,l1 ;Yz1,11•
=

Y2,o, y.,o are known; and, moreover, the function The initial probabilities must satisfy a boundary condition of the periodic type, o Hence,
Yt,o, y.,,1 y.,, ( ) [
•
•
•
Y ,1 y.,1• =
applying Lagrange's method, the following solution is found
=
1

1
;
'
f(x) +
t
1 . (n
t(t  1) 1) + f 1 . 2(n 1)• (x
 1/(:c

y.,,O
when


2) +
where
j(ie)
=
and the definition is extended to
x
x>O
� 0 by setting
f( x)
=
f(s

:c).
If, to begin with, all urns contain the same number of white and black balls, so that f(x)
=
const.
=
p, we shall have, no matter what t is,
References DEMOJVRE: "Doctrine of Chances," 3d ed., 1756. LAPLACE: "Theorie analytique des probabilites," Oeuv1·es VII, pp. 49 if. LAGRANGE: "M6moire sur les suites r6currentes dont les termes varient de plusieurs manieres diff6rentes, etc.," Oeuvres IV, pp. 151 JJ.
CHAPTER VI BERNOULLI'S THEOREM 1. This chapter will be devoted to one of the most important and
beautiful theorems in the theory of probability, discovered by Jacob Bernoulli and published with a proof remarkably rigorous (save for some irrelevant limitations assumed in the proof) in his admirable posthumous book"Ars conjectandi" {1713).
This book is the first attempt at scien
tific exposition of the theory of probability
as
a separate branch of
mathematical science.
If, inn trials, an event E occurs m times, the number mis called the
"frequency" of E in n trials, and the ratio m/n receives the name of "relative frequency."
Bernoulli's theorem reveals an important proba
bility relation between the relative frequency of E and its probability p. Bernoulli's Theorem. With the probability approaching 1 or certainty lL8 near lL8 we please, we may expect that the relative frequency of an event E in a series of independent trials with constant probability p will differ from that probability by less than any given number E > 0, provided the number
of trials is taken sufficiently large. In other words, given two positive numbers
E
and .,, the probability
Pof the inequality
will be greater than 1 limit depending upon Proof.
E

11
and
if the number of trials is above a certain fl.
Several proofs of this important theorem are known which
are shorter and simpler but less natural than Bernoulli's original proof. It is his remarkable proof that we shall reproduce here in modernized form. a. Denoting by T m,
as
usual, the probability of m successes in n trials.
we shall show first that {1) if b > a and k > 0.
Since the ratio
Tz.r1 T.,
n x p = x+l q 96
SEc. 1)
97
BERNOULLI'S THEOREM
decreases as x increases we have forb >
a
or Changingb, a, respectively, intob + 1, a + 1;b + 2, a + 2; + k, it follows from the last inequality that
·
·
·
b + k,
a
that is, Tb+" Tb
Ta+".
<
T,.
b. Integers X and p. being determined by the inequalities X 1 < np �X,
the probabilities
A
p.
1 < np + nE � ,.

and a of the inequalities m
0 :S; p < E''  n
are represented, respectively, by the sums A=
T>o.
C=
T,.
+ +
the first of which contains p.
TA+l T,. +t

+ . . . + + + ·
·
·
X= g terms.
T,._l T,.
Combining terms of the
second sum into groups of g terms (the last group may consist of less than
g terms) and setting for brevity
+
At=
T,.
A2 =
T��+u
Aa =
T,.Hu
T ,.+l
+
+
+ + T,.Ho+l +
T,.+cl1
+
·
·
·
T��+u+t
·
·
·
·
+ T,.Hut + T,.+But
we shall have
a=
At
+
Az
Aa
+
+
and at the same time
A2
(2)
At
T,. , T>o.
<
The ratio At
A
=
T>o.+u T>o.
+ +
+ +
T>o.+u+l T>o.+t
is less than the greatest of numbers
·
•
·
•
·
+ T>o.+aut + T>o.+ut
98
INTRODUCTION TO MATHEMATICAL PROBABILITY
[CHAP. VI
But by inequality (1) T>.+o > T>.+o+l > T>.+1 T>.
hence A1 < T"
T>..
A
Similarly, A2 < Tl'+o, T" A1
and again by inequality (1) TP+o < T>.+u, T>. T"
Consequently A2 < T" , T).
Aa
A2
A1
and inequalities
(2)
<
T",
T>.
are established.
c. For x �"X
TT:
1 < 1.
It suffices to show that T>.+l=n"X'f!.
As "X� np
which shows that
T�:
1
"X+ 1 q
.
npq n  "X l! ::5: < 1 "X+ 1 q  npq + q < 1.
The inequality just established shows that in the following expression: T" T>.
=
Tpa+1 Tpa T>.+1 T" Tp1 Tl'1 . Tl'2 . . . Tpa . Tpa1 . . . T).
all the factors are
< 1.
Consequently, if we retain
only, replacing the others by 1, we get
Moreover,
a
� g first factors
SEC. 1)
99
BERNOULLI'S THEOREM
whence the following important inequality results:
)
(n p.+al!. a· \P a+1 q
T,. < Tx
(3)
Here a is an arbitrary positive integer �g. Now, let
e
be an arbitrary positive number.
Then we can show that
for n
(4) we have both (i)
� a(1 +E)

q
E(p +E)

n P.+ a'!!.. :S: _ P_ p.'a+1q p+E
and
(ii)
a � g.
Since p. � np+nE, it suffices to show that (i) is satisfied for p. If
p. = np +
= np+nE.
nE inequality (i) is equivalent to

nq
nE+a
np+ne
 a+
:S:q
1p+ e
or, after obvious simplifications, nE(p
a(1+E)
+ e) �
But this inequality follows from
 q.
To establish (ii), since
(4).
are integers, it suffices to show that
a
< g+ 1.
'X < np+1 and consequently g+ 1 > nE. lished if we can show that nE �
a
But
p+ E
a and g
np+nE,
Hence (ii) will be estab
which by virtue of
a(1+E)
p. �
 q =a
(4)
will be true if
>
that is, if a(1+E) q �
ap+ae
or aq
 q � 0 which is obviously true, a being a positive integer. d. The auxiliary integer a is still at our disposal. Given an arbitrary positive number 11 < 1 we shall determine a as the least integer satisfying
the inequality
(p/(p + ε))^a ≤ η
or
a ≥ log(1/η)/log(1 + ε/p).
At the same time
a − 1 < log(1/η)/log(1 + ε/p),
and since
log(1 + ε/p) > ε/(p + ε),
we shall have
a < 1 + (p + ε)/ε · log(1/η)
and
(a(1 + ε) − q)/(ε(p + ε)) < (1 + ε)/ε² · log(1/η) + 1/ε.
Consequently, if
(5)
n ≥ (1 + ε)/ε² · log(1/η) + 1/ε,
then by virtue of (i) and (3)
T_μ < T_λ · η,
and by virtue of (2)
A_1 < Aη;  A_2 < A_1η < Aη²;  A_3 < A_2η < Aη³;  · · ·
whence, A being a probability and hence < 1,
(6)  C < η + η² + η³ + · · · = η/(1 − η).
This inequality holds if n satisfies (5). No trace of the auxiliary integer a is left.
e. Let us now consider the inequalities
−ε < m/n − p < 0  and  m/n − p ≤ −ε
and introduce their respective probabilities B and D. These inequalities are equivalent to
0 < (n − m)/n − q < ε  and  (n − m)/n − q ≥ ε.
It is apparent that we can interpret B or D as the probabilities that the number m' = n − m of occurrences of the event F opposite to E in n trials will satisfy either the inequality 0 < m'/n − q < ε or m'/n − q ≥ ε. Since the right-hand side of (5) contains only the given numbers ε, η, it is clear that
(7)  D < η/(1 − η)
if (5) is satisfied. Now A + B = P is the probability of the inequality
−ε < m/n − p < ε,
and C + D = Q is the probability of the opposite inequality. Hence
P + Q = 1.
Moreover, by (6) and (7),
Q < 2η/(1 − η).
Consequently,
P + 2η/(1 − η) > 1,
or P > 1 − 2η/(1 − η), if only
n ≥ (1 + ε)/ε² · log(1/η) + 1/ε.
This completes the proof of Bernoulli's theorem. For example, if p = q = 1/2 and ε = 0.01, η = 0.001, we get from (5)
n ≥ 69,869,
which shows that in 69,869 trials or more there are at least 999 chances against 1 that the relative frequency will differ from 1/2 by less than 1/100.
The number 69,869 found as a lower limit of the number of trials is much too large.
A much smaller number of trials would suffice to fulfill
all the requirements.
From a practical standpoint, it is important to find as low a limit as possible for the necessary number of trials (given ε and η). With this problem we shall deal in the next chapter.
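The numerical example above can be reproduced directly from (5). The sketch below assumes the reconstructed form of the bound, n ≥ (1 + ε)/ε² · log(1/η) + 1/ε, with natural logarithms:

```python
import math

def bernoulli_bound(eps, eta):
    """Smallest integer n satisfying inequality (5):
    n >= (1 + eps)/eps**2 * log(1/eta) + 1/eps."""
    return math.ceil((1 + eps) / eps**2 * math.log(1 / eta) + 1 / eps)

print(bernoulli_bound(0.01, 0.001))  # 69869, the figure quoted in the text
```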
2. Bernoulli's theorem states that for arbitrarily given ε and η there exists a number n₀(ε, η) such that for any single value n > n₀(ε, η) the probability of the inequality
|m/n − p| < ε
will be greater than 1 − η. The question naturally arises whether for given ε and η a number N(ε, η), depending upon ε and η, can be found such that the probability of the simultaneous inequalities
|m/n − p| < ε  for all n > N(ε, η)
will still be greater than 1 − η. The following theorem due to Cantelli shows that this question can be answered positively.
Cantelli's Theorem. For given ε < 1, η < 1 let N be an integer satisfying the inequality
N ≥ (4/ε²) log(2/η) + (8/ε²) log(2/ε) + 2/ε.
The probability that the relative frequencies of an event E will differ from p by less than ε in the Nth and all the following trials is greater than 1 − η.
Proof. We shall prove first that the probability Q_n of the inequality
|m/n − p| ≥ ε
will always be less than 2e^{−nε²/4}. According to results proved in the preceding section, for any η > 0,
Q_n < 2η/(1 − η)
if
n ≥ (1 + ε)/ε² · log(1/η) + 1/ε.
This inequality, if we take η/(1 − η) = e^{−nε²/4} (so that 2η/(1 − η) = 2e^{−nε²/4}), becomes
n > (1 + ε)/4 · n + (1 + ε)/ε² · log 2 + 1/ε,
and in this form it is evident, since for ε < 1
1 − (1 + ε)/ε · log 2 < 1 − 2 log 2 < 0.
Hence, as stated,
(8)  Q_n < 2e^{−nε²/4}.
The event A, in which we are interested, consists in the simultaneous fulfillment of all the inequalities
|m/n − p| < ε  for n = N, N + 1, N + 2, . . . .
The opposite event B consists in the fulfillment of at least one of the inequalities
|m/n − p| ≥ ε,
where n can coincide either with N, or with N + 1, or with N + 2, . . . . The probability of B, which we shall denote by R, certainly does not exceed the sum of the probabilities of all the inequalities
|m/n − p| ≥ ε
for n = N, N + 1, N + 2, . . . . Consequently, referring to (8),
R < 2(e^{−Nε²/4} + e^{−(N+1)ε²/4} + · · ·) = 2e^{−Nε²/4}/(1 − e^{−ε²/4}).
To satisfy the inequality
R < η
it suffices to take
N > (4/ε²) log(2/η) + (4/ε²) log(1/(1 − e^{−ε²/4})).
Now
(4/ε²) log(1/(1 − e^{−ε²/4})) < (8/ε²) log(2/ε) + 2/ε.
Consequently, if
N ≥ (4/ε²) log(2/η) + (8/ε²) log(2/ε) + 2/ε,
we shall have R < η, and at the same time the probability of A will be greater than 1 − η, which proves Cantelli's theorem.
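Inequality (8) — in the reconstructed form Q_n < 2e^{−nε²/4} — can be spot-checked against the exact binomial distribution. The values of p, ε, and n below are illustrative, not from the text:

```python
from math import comb, exp

def q_n(n, p, eps):
    """Exact probability that |m/n - p| >= eps for m ~ Binomial(n, p)."""
    q = 1 - p
    return sum(comb(n, m) * p**m * q**(n - m)
               for m in range(n + 1)
               if abs(m / n - p) >= eps)

# Spot-check Q_n < 2 exp(-n*eps^2/4) for several n.
p, eps = 0.5, 0.2
for n in (10, 50, 100, 200):
    assert q_n(n, p, eps) < 2 * exp(-n * eps**2 / 4)
```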
SIGNIFICANCE OF BERNOULLI'S THEOREM
3. As was indicated in the Introduction, one of the most important problems in the theory of probability consists in the discovery of cases
where the probability is very near to 0 or, on the contrary, very near to 1,
because cases with very small or very "great" probability may have real practical interest.
In Bernoulli's theorem we have a case of this kind;
the theorem shows that with the probability approaching as near to 1 or certainty as we please, we may expect that in a sufficiently long series of independent trials with constant probability, the relative fre quency of an event will differ from that probability by less than any specified number, no matter how small.
But it lies in the nature of the
idea of mathematical probability, that when it is near 1, or, on the contrary, very small, we may consider an event with such probability as practically certain in the first case, and almost impossible in the second. The reason is purely empirical. To illustrate what we mean, let us consider an indefinite series of independent trials, in which the probability of a certain event remains constantly equal to 1/2. It can be shown that if the number of trials is, for instance, 40,000 or more, we may expect with a probability > 0.999 that the relative frequency of the event will differ from 1/2 by less than 0.01.
In other words, we are entitled to bet at least 999 against 1 that
the actual number of occurrences will lie between the limits 0.49n and 0.51n if n ≥ 40,000. If we could make a positive statement of this
If we could make a positive statement of this
kind without any mention of probability, we should be offering an ideal scientific prediction.
However, our knowledge in this case is incomplete
and all we are entitled to state is this: we are more sure to be right in predicting the above limits for the number of occurrences than in expecting to draw a white ball from an urn containing 999 white and only 1 black ball. In practical matters, where our actions almost never can be directed with perfect confidence, even incomplete knowledge may be taken as a sure guide. Whoever has tried to win on a single ticket out of 10,000 knows from experience that it is virtually impossible. Now the conviction of impossibility would be still greater if one tried to win on a single ticket out of 1,000,000. In the light of such examples, we understand what value may be attached to statements derived from Bernoulli's theorem: Although the fact we expect is not bound to happen, the probability of its happening is so great that it may really be considered as certain. Once in a great while facts may happen contrary to our expectations, but such rare exceptions cannot outweigh the advantages in everyday life of following the indications of Bernoulli's theorem.
And herein lies its immense practical
value and the justification of a science like the theory of probability. It should, however, be borne in mind that little, if any, value can be attached to practical applications of Bernoulli's theorem, unless the conditions presupposed in this theorem are at least approximately fulfilled: independence of trials and constant probability of an event for every trial. And in questions of application it is not easy to be sure whether one is entitled to make use of Bernoulli's theorem; consequently, it is too often used illegitimately.
It is easy to understand how essential it is to discover propositions
of the same character under more general conditions, paying especial attention to the possible dependence of trials. There have been valuable achievements in this direction. In the proper place, we shall discuss the more important generalizations of Bernoulli's theorem.
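The 40,000-trial claim made earlier in this section can be checked against the exact binomial distribution; the sketch below sums the fair-coin probabilities over the interval |m/n − 1/2| < 0.01 using log-gamma to avoid overflow:

```python
from math import lgamma, exp, log

# Exact probability that with n = 40,000 fair-coin tosses the relative
# frequency lies within 0.01 of 1/2, i.e. 19600 < m < 20400.
n = 40_000
log_half_pow = n * log(0.5)

def log_pmf(m):
    return lgamma(n + 1) - lgamma(m + 1) - lgamma(n - m + 1) + log_half_pow

prob = sum(exp(log_pmf(m)) for m in range(19601, 20400))
print(f"{prob:.4f}")  # about 0.9999, comfortably above 0.999
```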
4. When the probability of an event in a single experiment is known, Bernoulli's theorem may serve as a guide to indicate approximately how
often this event can be expected to occur if the same experiments are
repeated a considerable number of times under nearly the same conditions. When, on the contrary, the probability of an event is unknown
When, on the contrary, the probability of an event is unknown
and the number of experiments is very large, the relative frequency of that event may be taken as an approximate value of its probability. Bernoulli himself, in establishing his theorem, had in mind the approxi mate evaluation of unknown probabilities from repeated experiments.
That is evident from his explanations preceding the statement of the theorem itself and its proof.
Inasmuch as these explanations are interest
ing in themselves, and present the original thoughts of the great discoverer, we deem it advisable here to give a free translation from Bernoulli's book.
After calling attention to the fact that only in a few cases can
probabilities be found a priori, Bernoulli proceeds as follows: So, for example, the number of cases for dice is known.
Evidently there are
as many cases for each die as there are faces, and all these cases have an equal chance to materialize. For, by virtue of the similitude of faces and the uniform
distribution of weight in a die, there is no reason why one face should show up more readily than another, as there would be if the faces had a different shape or if one part of a die were made of heavier material than another.
So one knows
the number of cases when a white or a black ticket can be drawn from an urn, and besides, it is known that all these cases are equally possible, because the num bers of tickets of both kinds are determined and known, and there is no apparent reason why one of these tickets could be drawn more readily than any other. But, I ask you, who among mortals will ever be able to define as so many cases, the number, e.g., of the diseases which invade innumerable parts of the human body at any age and can cause our death?
And who can say how much more
easily one disease than another (plague than dropsy, dropsy than fever) can kill a man, to enable us to make conjectures about the future state of life or death?
Who, again, can register the innumerable cases of changes to which the
air is subject daily, to derive therefrom conjectures as to what will be its state after a month or even after a year?
Again, who has sufficient knowledge of the
nature of the human mind or of the admirable structure of our body to be able, in games depending on acuteness of mind or agility of body, to enumerate cases in which one or another of the participants will win?
Since such and similar
things depend upon completely hidden causes, which, besides, by reason of the innumerable variety of combinations will forever escape our efforts to detect them, it would plainly be an insane attempt to get any knowledge in this fashion. However, there is another way to obtain what we want.
And what is impossi
ble to get a priori, at least can be found a posteriori; that is, by registering the results of observations performed a great many times.
Because it must be pre
sumed that something may occur or not occur as many times as it had previously been observed to occur or not occur under similar conditions.
For instance, if,
in the past, 300 men of the same age and physical build as Titus is now, were investigated, and it were found that 200 of them had died within a decade, the others continuing to enjoy life past this term, one could pretty safely conclude that there are twice as many cases for Titus to pay his debt to nature within the next decade than to survive beyond this term.
So it is, if somebody for many
preceding years had observed the weather and noticed how many times it was fair or rainy; or if somebody attended games played by two persons a great many times and noticed how often one or the other won; by these very observations he would be able to discover the ratio of cases which in the future might favor the occurrence or failure of the same event under similar circumstances. And this empirical way of determining the number of cases by experiments is neither new nor unusual.
For the author of the book "Ars cogitandi," a man
of great acumen and ingenuity, in Chap. 12 recommends a similar procedure, and everybody does the same in daily practice.
Moreover, it cannot be con
cealed that for reasoning in this fashion about some event, it is not sufficient to
make a few experiments, but a great quantity of experiments is required; because even the most stupid ones by some natural instinct and without any previous instruction (which is rather remarkable) know that the more experiments are made, the less is the danger to miss the scope. Although this is naturally known to anyone, the proof based on scientific principles is by no means trivial, and it is our duty now to explain it.
However,
I would consider it a small achievement if I could only prove what everybody knows anyway. There remains something else to be considered, which perhaps nobody has even thought of. Namely, it remains to inquire, whether by thus augmenting the number of experiments the probability of getting a genuine ratio between numbers of cases, in which some event may occur or fail, also augments itself in such a manner as finally to surpass any given degree of certitude; or whether the problem, so to speak, has its own asymptote; that is, there exists a degree of certitude which never can be surpassed no matter how the observations are multiplied; for instance, that it never is possible to have a probability greater than 1/2, 2/3, or 3/4 that the real ratio has been attained.
To illustrate this by an
example, suppose that, without your knowledge, 3,000 white stones and 2,000 black stones are concealed in a certain urn, and you try to discover their numbers by drawing one stone after another (each time putting back the stone drawn before taking the next one, in order not to change the number of stones in the urn) and notice how often a white or a black stone appears. The question is, can you make so many drawings as to make it 10, or 100, or 1,000, etc., times more probable (that is, morally certain) that the ratio of frequencies of white and black stones will be 3 to 2, as is the case with the number of stones in the urn, than any other ratio different from that? If this were not true, I confess nothing would be left of our attempt to explore the number of cases by experiments. But if this can be attained and moral certitude can finally be acquired (how that can be done I shall show in the next chapter), we shall have cases enumerated a posteriori with almost the same confidence as if they were known a priori. And that, for practical purposes, where "morally certain" is taken for "absolutely certain" by Axiom 9, Chap. II, is abundantly sufficient to direct our conjectures in any contingent matter not less scientifically than in games of chance. For if instead of an urn we take the air or the human body, which contain in themselves sources of various changes or diseases as the urn contains stones, we shall be able in the same manner to determine by observations how much more likely one event is to happen than another in these subjects. To avoid misunderstanding, one must bear in mind that the ratio of cases which we want to determine by experiments should not be taken in the sense of a precise and indivisible ratio (for then just the contrary would happen, and the probability of attaining a true ratio would diminish with the increasing number of observations) but as an approximate one; that is, within two limits, which, however, can be taken as near as we wish to each other. For instance, if, in the case of the stones, we take pairs of ratios 301/200 and 299/200, or 3001/2000 and 2999/2000, etc., it can be shown that it is more probable, to any given degree of probability, that the ratio found in experiments will fall within these limits than outside of them.
Such, therefore, is the problem which we have decided to publish here, now that we have struggled with it for about twenty years. The novelty of this problem as well as its great utility, combined with equal difficulty, may add to the weight and value of other parts of this doctrine. "Ars Conjectandi," pars quarta, Cap. IV, pp. 224-227.
APPLICATION TO GAMES OF CHANCE
5. One of the cases in which the conditions for application of Bernoulli's theorem are fulfilled is that of games of chance. It is not out of place to discuss the question of the commercial values of games from the standpoint of Bernoulli's theorem. "Game of chance" is the term we apply to any enterprise which may give us profit or may cause us loss, depending on chance, the probabilities of gain or loss being known. The following considerations can be applied, therefore, to more serious questions and not only to games played for pastime or for the sake of gaining money, as in gambling. Suppose that, by the conditions of the game, a player can win a certain sum a of money with the probability p, or can lose another sum b with the probability q = 1 − p. If this game can be repeated any number of times under the same conditions, the question arises as to the probability for a player to gain or lose a sum of money not below a given limit. Let us denote by n the total number of games, and by m the number of times the player wins. Considering a loss as a negative gain, his total gain will be
K = ma − (n − m)b.
It is convenient to introduce instead of m another number δ defined by
δ = m − np
and called the "discrepancy." Expressed in terms of δ, the preceding expression for the gain becomes
K = n(pa − qb) + (a + b)δ.
The expression
E = pa − qb
entering as the coefficient of n has, as we shall see, an important bearing
on the conclusion as to the commercial value of the game. It is called the "mathematical expectation" of the player. Suppose at first that this expectation is positive. By Bernoulli's theorem the probability of a discrepancy less than −nε, ε being an arbitrary positive number, is smaller than any given number, provided, of course, the number of games is sufficiently large. At the same time, with the probability approaching 1 as near as we please, we may expect the discrepancy to be ≥ −nε. However, if this is the case, the total gain will surpass the number
n[E − ε(a + b)],
which, for sufficiently large n, is itself greater than any specified positive number. It is supposed, of course, that ε is small enough to make the difference
E − ε(a + b)
positive. And that means that the player whose mathematical expectation is positive may expect, with a probability approaching certainty as near as we please, to gain an arbitrarily large amount of money if nothing prevents him from playing a sufficient number of games. On the contrary, by a similar argument, we can see that in case of a negative mathematical expectation, the player has an arbitrarily small probability of escaping a loss of an arbitrarily large amount of money, again under the condition that he plays a sufficiently large number of games. Finally, if the mathematical expectation is 0, it is impossible to make any definite statement concerning the gain or loss by the player, except that it is very unlikely that the amount of gain or loss will be considerable compared with the number of games. It follows from this discussion that the game is certainly favorable for the player if his mathematical expectation is positive, and unfavorable if it is negative.
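The decomposition of the gain used above, K = n(pa − qb) + (a + b)δ, is an algebraic identity and can be checked by simulation. The stakes and probability below are illustrative values, not from the text:

```python
import random

# Check the identity K = m*a - (n - m)*b = n*(p*a - q*b) + (a + b)*delta,
# where delta = m - n*p is the discrepancy.
random.seed(1)
a, b = 3.0, 2.0          # win a, lose b
p = 0.4                  # probability of winning a single game
q = 1 - p
n = 10_000
m = sum(random.random() < p for _ in range(n))  # number of wins
delta = m - n * p
K_direct = m * a - (n - m) * b
K_split = n * (p * a - q * b) + (a + b) * delta
assert abs(K_direct - K_split) < 1e-6
```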
In case the mathematical expectation is 0, neither of the parties participating in the game has a decided advantage, and then the game is called equitable. Usually, games serving as amusements are equitable. On the contrary, all of the games operated for commercial purposes by individuals or corporations are expressly made to be profitable for the administration; that is, the mathematical expectation of the administration of a game operated for lucrative purposes is positive at each single turn of the game and, correspondingly, the expectation of any gambler is negative. This confirms the common observation that those gamblers who extend their gambling over large numbers of games are almost inevitably ruined. At the same time, the theory agrees with the fact that great profits are derived by the administrations of gaming places. A good illustration is afforded by the French lottery mentioned on page
19,
which, as is well known, was a very profitable enterprise operated
by the French government.
Now, if we consider the mathematical expectation of ticket holders in that lottery, we find that it was negative in all cases; namely, denoting by M the sum paid for tickets, we find the following expectations:

On 1 ticket  (15/18 − 1)M = −(1/6)M,
On 2 tickets  (540/801 − 1)M = −(29/89)M,
On 3 tickets  (5500/11748 − 1)M = −(142/267)M,

and so forth.
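These expectations can be recomputed exactly. The prize multiples 15, 270, and 5,500 for a bet on one, two, or three numbers are taken as assumptions here (they correspond to the description of the French lottery given earlier in the book), and the drawing is of 5 numbers out of 90:

```python
from fractions import Fraction
from math import comb

# Net expectation per unit stake M for a bet on k numbers in the French
# lottery (5 numbers drawn out of 90).  The prize multiples passed in
# below are assumed from the book's earlier description, not derived.
def expectation(k, prize):
    p = Fraction(comb(5, k), comb(90, k))  # probability all k numbers are drawn
    return prize * p - 1

print(expectation(1, 15))    # -1/6
print(expectation(2, 270))   # -29/89
print(expectation(3, 5500))  # -142/267
```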
On the other hand, the expectation of the administration was always positive, and because of the great number of persons taking part in this lottery, the number of games played by the administration was enormous, and it was assured of a steady and considerable income. This was an enterprise avowedly operated for the purpose of gambling, but the same principles underlie the operations of institutions having great public value, such as insurance companies, which, to secure their income, always reserve certain advantages for themselves.
EXPERIMENTAL VERIFICATION OF BERNOULLI'S THEOREM
6. Bernoulli's theorem, like any other mathematical proposition, is a deduction from ideal premises. To what extent these premises may be considered as a good approximation to reality can be decided only by experiments. Several experiments established for the purpose of testing various theoretical statements derived from general propositions of the theory of probability are reported by different authors. Here we shall discuss those purporting to test Bernoulli's theorem.
I. Buffon, the French naturalist of the eighteenth century, tossed a coin 4,040 times and obtained 2,048 heads and 1,992 tails. Assuming that his coin was ideal, we have a probability of 1/2 for either heads or tails. Now, the relative frequencies obtained in his experiments are:
2048/4040 = 0.507 for heads,
1992/4040 = 0.493 for tails,
and they differ very little from the corresponding probabilities, 0.500. In this case, the conclusions one might derive from Bernoulli's theorem are verified in a very satisfactory manner.
II. De Morgan, in his book "Budget of Paradoxes" (1872), reports the results of four similar experiments. In each of them a coin was tossed 2,048 times, and the observed frequencies of heads were, respectively, 1,061, 1,048, 1,017, 1,039. The relative frequencies corresponding to these numbers are
1061/2048 = 0.518;  1048/2048 = 0.512;  1017/2048 = 0.497;  1039/2048 = 0.507.
The agreement with the theory again is satisfactory.
III. Charlier, in his book "Grundzüge der mathematischen Statistik," reports the results of 10,000 drawings of one playing card out of a full deck. Each card drawn was returned to the deck before the next drawing. The actual result of these experiments was that black cards appeared 4,933 times, and consequently the frequency of red cards was 5,067. The relative frequencies in this instance are:
4933/10000 = 0.4933 for a black card,
5067/10000 = 0.5067 for a red card,
and they differ but slightly from the probability, 0.5000, that the card drawn will be black or red. The agreement between theory and experiment in this case, too, is satisfactory.
IV. The author of this book made the following experiment with playing cards: After excluding the
12 face cards from the pack, 4 cards were drawn at a time from the remaining 40, and the number of trials was carried to 7,000. The number of times in each thousand that the four cards belonged to different suits was:

I    II   III  IV   V    VI   VII
113  113  103  105  105  118  108

Altogether the frequency of such cases was 765 in 7,000 trials, whence we find for the relative frequency
765/7000 = 0.1093,
while the probability of taking 4 cards belonging to different suits is
1000/9139 = 0.1094.
V. In J. L. Coolidge's "Introduction to Mathematical Probability," one finds a reference to an experiment made by Lieutenant R. S. Hoar, U.S.A., but the reported results are incomplete. The author of this book
The author of this book
repeated the same experiment which consisted in 1,000 drawings of 5 cards
52 cards. The results were: 503 times the 5 cards were each of diff erent denominations; 436 times 2 were of the same denomination with 3 scattered; 45 times there were 2 pairs of 2 different denominations and 1 odd card; 14 times 3 were of the same denomination with 2 scattered; 2 times there were 2 of one denomination and 3 of another. The remaining possible combination, 4 cards of the same denomination with 1 odd, never appeared. The probabilities of these at a time, from a full pack of
different cases are, respectively,
ttH = 0.507; rlh = 0.021;
HH = 0.423; � = 0.001;
Nh = 0.048; rrn = 0.000.
507, 423, 48, 21, 1, 0, 503, 436, 45, 14, 2, 0. The dis crepancies are generally small and the greatest of them, 13, is still within The corresponding theoretical frequencies are
while the observed frequencies were reasonable limits.
Deeper investigation shows that the probability that
13 is about �; hence, the observed deviation 13 units cannot be considered abnormal.
a discrepancy will not exceed of
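The probabilities quoted in experiments IV and V follow from simple counting; a sketch:

```python
from math import comb

deck = comb(52, 5)  # 2,598,960 five-card hands

# Experiment V: classification of a 5-card hand by denominations.
counts = {
    "all different denominations": comb(13, 5) * 4**5,
    "one pair":        13 * comb(4, 2) * comb(12, 3) * 4**3,
    "two pairs":       comb(13, 2) * comb(4, 2)**2 * 44,
    "three of a kind": 13 * comb(4, 3) * comb(12, 2) * 4**2,
    "full house":      13 * comb(4, 3) * 12 * comb(4, 2),
    "four of a kind":  13 * 48,
}
assert sum(counts.values()) == deck  # the six cases exhaust all hands
for name, c in counts.items():
    print(f"{name}: {c / deck:.3f}")

# Experiment IV: 4 cards, one from each suit, out of a 40-card pack.
print(f"{10**4 / comb(40, 4):.4f}")  # 0.1094
```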
VI. Bancroft H. Brown published, in the American Mathematical Monthly, vol. 26, page 351, the results of a series of 9,900 games of craps. This game is played with two dice, and the caster wins unconditionally if he produces 7 or 11 points, which are called "naturals"; he loses the game in case of 2, 3, or 12 points, called "craps." But if he produces 4, 5, 6, 8, 9, or 10 "points," he does not win, but has the right to cast the dice an unlimited number of times until he throws the same number of points that he had before, or until he throws a 7. If he throws 7 before obtaining his point, he loses the game; otherwise he wins. It is a good exercise to find the probability of winning this game. It is
244/495 = 0.493,
that is, a little less than 1/2. Multiplying the number of games, in our case 9,900, by this probability, we find that the theoretical number of successes is 4,880 and of failures, 5,020. Now, according to Bancroft H. Brown, the actual numbers of successes and losses are, respectively, 4,871 and 5,029. The discrepancy
4871 − 4880 = −9
is extremely small, even smaller than could reasonably be expected. The same article gives the number of times "craps" were produced; namely, 2 appeared 259 times, 3 appeared 508 times, and 12 appeared 293 times, making the total number of craps 1,060. The probability of obtaining craps is
1/9;
hence, the theoretical number of craps should be 1,100. The discrepancy,
1060 − 1100 = −40,
is more considerable this time, but still lies within reasonable limits.
VII. E. Czuber made a complete investigation of lotteries operated on the same plan as the French lottery, in Prague between 1754 and 1886, and in Brünn between 1771 and 1886. The number of drawings was 2,854 in Prague and 2,703 in Brünn. The probability that in each drawing the sequence of numbers is either increasing or decreasing is
1/60 = 0.01667,
while the observed relative frequency of such cases was Prague: 0.01612; Brünn: 0.01739; and in both places combined, 0.01674. The probabilities that among five numbers in each drawing there is none, or only one, of the numbers 1, 2, 3, . . . 9 are, respectively,
0.58298 and 0.34070.
The corresponding relative frequencies were Prague: 0.58655 and 0.32656; Brünn: 0.57899 and 0.34591; and in both places combined, 0.58183 and 0.33587, respectively. The probability of drawing a determined number is 1/18. Now, according to Czuber, for the lottery in Prague the actual number of occurrences for single tickets varied from 138 (for No. 6) to 189 (for No. 83), so that for all tickets the discrepancy varied from −20 to +31. Besides, there were only 16 numbers with a discrepancy greater than 15 in absolute value.
only 16 numbers with a discrepancy greater than 15 in absolute value. All these results stand in good accord with the theory. VIII. One of the most striking experimental tests of Bernoulli's theorem was made in connection with a problem considered for the first time by Buffon.
A board is ruled with a series of equidistant parallel lines, and a very fine needle, which is shorter than the distance between lines, is thrown at random on the board. Denoting by l the length of the needle and by h the distance between lines, the probability that the needle will intersect one of the lines (the other possibility is that the needle will be completely contained within the strip between two lines) is found to be
p = 2l/(πh).
The remarkable thing about this expression is that it contains the number
π = 3.14159 · · ·
expressing the ratio of the circumference of a circle to its diameter. In the appendix we shall indicate how this expression can be obtained, because in this problem we deal with a different concept of probability. Suppose we throw the needle a great many times and count the number of times it cuts the lines.
By Bernoulli's theorem we may expect
that the relative frequency of intersections will not differ greatly from the theoretical probability, so that, equating them, we have the means of finding an approximate value of
π.
One series of experiments of this kind was performed by R. Wolf, astronomer in Zurich, between 1849 and 1853. In his experiments the width of the strips was 45 mm. and the length of the needle 36 mm. Thus the theoretical probability of intersection is
72/(45π) = 0.5093.
The needle was thrown 5,000 times and it cut the lines 2,532 times; whence, the relative frequency
2532/5000 = 0.5064.
The agreement between the two numbers is very satisfactory. If, relying on Bernoulli's theorem, we set the approximate equation
72/(45π) = 0.5064,
we should find the number 3.1596 for π, which differs from the known value of π by less than 0.02.
In another experiment of the same kind, reported by De Morgan in the aforementioned book, Ambrose Smith in 1855 made 3,204 trials with a needle the length of which was 3/5 of the distance between lines. There were 1,213 clear intersections, and 11 contacts on which it was difficult to decide. If, on this ground, we should consider half of them as intersections, we should obtain about 1,218 intersections in 3,204 trials, which would give the number 3.155 for π. If all of the contacts had been treated as intersections, the result would have been 3.1412, very close to the real value of π. In an excellent book, "Calcolo delle Probabilità," vol. 1, page 183, 1925, by G. Castelnuovo, reference is made to experiments performed by Professor Reina, under whose direction a needle 3 cm. in length was thrown 2,520 times, the distance between lines being 6 cm. Taking into account the thickness of the needle, the probability of intersection was found to be 0.345, while actual experiments gave the relative frequency of intersections as 0.341.
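A Monte Carlo rerun of the needle experiment, with Wolf's dimensions l = 36, h = 45 and a fixed seed for reproducibility:

```python
import math
import random

# x: distance of the needle's midpoint from the nearest line, uniform on
# (0, h/2); phi: acute angle with the perpendicular, uniform on (0, pi/2).
# The needle cuts a line when x < (l/2) cos(phi).
random.seed(7)
l, h = 36.0, 45.0
trials = 200_000
hits = sum(
    random.uniform(0, h / 2) < (l / 2) * math.cos(random.uniform(0, math.pi / 2))
    for _ in range(trials)
)
# Invert p = 2l/(pi*h) to estimate pi from the hit frequency.
pi_estimate = 2 * l * trials / (h * hits)
print(pi_estimate)  # typically a value near 3.14
```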
APPENDIX
Buffon's Needle Problem. Let h be the width of the strip between two lines and l < h the length of the needle. The position of the needle can be determined by the distance x of its middle point from the nearest line and the acute angle φ formed by the needle and a perpendicular dropped from the middle point to the line. It is apparent that x may vary from 0 to h/2 and φ varies within the limits 0 and π/2. We cannot define in the usual way the probability of the needle cutting the line, for there are infinitely many cases with respect to the position of the needle. However, it is possible to treat this problem as the limiting case of another problem with a finite number of possible cases, where the usual definition of probability can be applied. Suppose that h/2 is divided into an arbitrary number m of equal parts δ = h/2m and the right angle π/2 into n equal parts ω = π/2n. Suppose, further, that the distance x may have only the values
0, δ, 2δ, . . . mδ
114  INTRODUCTION TO MATHEMATICAL PROBABILITY  [CHAP. VI
and the angle φ the values

0, ω, 2ω, . . . nω.

This gives

N = (m + 1)(n + 1)

cases as to the position of the needle, and it is reasonable to assume that these cases are equally likely. To find the number of favorable cases, we notice that the needle cuts one of the lines if x and φ satisfy the inequality

x < (l/2) cos φ.

The number of favorable cases, therefore, is equal to the number of systems of integers i, j satisfying the inequality

(A)  iδ < (l/2) cos jω,

supposing that i may assume only the values 0, 1, 2, . . . m and j only the values 0, 1, 2, . . . n. Because we suppose l < h, the greatest value of i satisfying condition (A) is less than m, and we can disregard the requirement that i should be ≤ m. Now for given j there are k + 1 values of i satisfying (A) if k denotes the greatest integer which is less than (l/2δ) cos jω. In other words, k is an integer determined by the conditions

k < (l/2δ) cos jω ≤ k + 1.

The number of possible values for i corresponding to a given j can therefore be represented thus:

m_j = (l/2δ) cos jω + θ_j,

where θ_j may depend on j but for all j is ≥ 0 and < 1. Taking the sum of all the m_j corresponding to j = 0, 1, 2, . . . n, we obtain the number of favorable cases

M = (l/2δ)(1 + cos ω + cos 2ω + · · · + cos nω) + nθ,

where θ again is a number satisfying the inequalities 0 ≤ θ < 1.

SEC. 6]  BERNOULLI'S THEOREM  115

But, as is well known,

1 + cos ω + cos 2ω + · · · + cos nω = 1/2 + sin (n + 1/2)ω / (2 sin ½ω)

or, because ω = π/2n,

1 + cos ω + cos 2ω + · · · + cos nω = 1/2 + ½ cot ½ω;

therefore

M = (l/4δ) cot ½ω + l/4δ + nθ.

Dividing this by N = (m + 1)(n + 1) and substituting for δ and ω their expressions

δ = h/2m,  ω = π/2n,

we obtain the probability in the problem with a finite number of cases:

M/N = (l/2h)·(m/(m + 1))·(cot (π/4n))/(n + 1) + (l/2h)·(m/(m + 1))·1/(n + 1) + nθ/((n + 1)(m + 1)).

The probability in Buffon's problem will be obtained by making m and n increase indefinitely in the above expression. Now, since

lim m/(m + 1) = 1,  lim 1/(n + 1) = 0,  lim nθ/((n + 1)(m + 1)) = 0  (m, n → ∞)

and

lim (cot (π/4n))/(n + 1) = 4/π,

we have

lim M/N = 2l/(hπ).

Thus we arrive at the expression of probability

p = 2l/(hπ)

in Buffon's needle problem.
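The finite scheme just described can be carried out literally on a machine. In the hypothetical sketch below (the values l = 1, h = 2 are an arbitrary choice satisfying l < h), the favorable cases iδ < (l/2) cos jω are counted directly, and M/N is seen to approach 2l/(hπ) as m and n grow.

```python
import math

def finite_case_probability(m, n, l=1.0, h=2.0):
    """M/N for the discretized needle problem: x restricted to multiples of
    delta = h/(2m), phi to multiples of omega = pi/(2n)."""
    delta = h / (2 * m)
    omega = math.pi / (2 * n)
    favorable = sum(
        1
        for i in range(m + 1)
        for j in range(n + 1)
        if i * delta < (l / 2) * math.cos(j * omega)
    )
    return favorable / ((m + 1) * (n + 1))

limit = 2 * 1.0 / (math.pi * 2.0)   # 2l/(h*pi) with l = 1, h = 2
```

Already for m = n = 500 the finite ratio differs from 2l/(hπ) by a few thousandths, matching the error terms 1/(n + 1) and nθ/((n + 1)(m + 1)) exhibited above.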
Problems for Solution

Another very simple proof of Bernoulli's theorem, due to Tshebysheff (1821-1894), is based upon the following considerations:

1. Prove the following identities:

Σ_{m=0}^{n} T_m(m − np) = 0,   Σ_{m=0}^{n} T_m(m − np)² = npq.

Indication of the Proof. Differentiate the identity

(pe^u + q)^n e^{−npu} = Σ_{m=0}^{n} T_m e^{(m−np)u}

twice with respect to u and set u = 0.

2. If Q is the probability of the inequality |m − np| ≥ nε, prove that Q ≤ pq/(nε²).

Indication of the Proof. In the identity

Σ_{m=0}^{n} T_m(m − np)² = npq

drop all the terms in which |m − np| < nε and in the remaining terms replace (m − np)² by n²ε². The resulting inequality

n²ε² Σ_{|m−np| ≥ nε} T_m ≤ npq

is equivalent to the statement.
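Both identities and the resulting inequality can be verified numerically. The sketch below (the choice n = 40, p = 0.3, ε = 0.1 is arbitrary) computes the binomial probabilities T_m directly and checks Prob. 1 and Prob. 2.

```python
import math

n, p = 40, 0.3
q = 1 - p
T = [math.comb(n, m) * p**m * q**(n - m) for m in range(n + 1)]

# the two identities of Prob. 1
s1 = sum(T[m] * (m - n * p) for m in range(n + 1))
s2 = sum(T[m] * (m - n * p) ** 2 for m in range(n + 1))

# Tshebysheff's inequality of Prob. 2
eps = 0.1
Q = sum(T[m] for m in range(n + 1) if abs(m - n * p) >= n * eps)
bound = p * q / (n * eps**2)
```

Here s1 vanishes, s2 equals npq = 8.4, and Q stays below pq/(nε²) with a wide margin, as the proof predicts.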
3. Prove that P > 1 − η if n > pq/(ηε²).

Indication of the Proof. P = 1 − Q, Q < pq/(nε²), and pq/(nε²) < η if n > pq/(ηε²).

The following two problems show how probability considerations can be used in proving purely analytical propositions.

4. S. Bernstein's Proof of Weierstrass' Theorem.
The famous theorem due to Weier
strass states that for any continuous functionf(x) in a closed interval a � x � b there exists a polynomial P(x) such that lf(x)  P (x)l < for
a
�
x
� b where
u
u
is a:1 arbitrary positive number.
By a proper linear trans
formation the interval (a, b) can be transformed into the interval
to 8. Bernstein, the polynomial n
P (x) =
� C;:'xm(1 m=O
m (;)
x)n
f
(0, 1).
According
117
BERNOULLI'S THEOREM for sufficiently large n satisfies the inequality
lf(x) uniformly in the interval 0 � x � Indication of the Proof. For x
P(x) l < ,
1. 0 and
=
/( 1 )
=
It suffices to prove the statement for 0 < n
independent trials.
z =
1 we have /tO)
x
< 1.
Let
x. be
a constant probability in
We have
f(x)  P(x)
=
�
C:'xm(1 
[
x)"m J(x) 
m=O
1(;)}
By the property of continuous functions, there is a number positive number
tr
P(O) and
P(1).
..
(a)
=
E
corresponding to any
such that
1/(x')  f(x)l
<
tT
2
whenever
lx' xl Also, there exists a number
(a)
<
M such
(0 �
E
lf(x)l
that
x', x �
� 1).
M for
0 �
x
�
1.
we get
1/(x)  P(x)l where
P
�
tT
2,P + 2MR
and R are, respectively, the probabilities of the inequalities and
Now
P
<
1
and
R <, ifn >
1/4E,.
Take,
=
v/4M;
then
lf(x)  P(x)l
<
tr
1
(Castelnuovo).
if
&. Show that
.
provided 0 <
m
m
 e
> 0,
m 
n
+
E
<
From equation
INTRODUCTION TO MATHEMATICAL PROBABILITY
118
Indication of the Proof.
By Prob. 6, Chap. IV, page
72,
[CHAP. VI
the ratio
m
represents the probability
J: •x'"(1  x)..mtJx J/x'"(1  x) ..mtJx + 1
Q of at least m
successes in a series of n +
pendent trials with constant probability
p Set m
whence
+
1
1
inde
1n =

n
=
(n +
E,
1)p +
(n +
1)u
nm
e + e u > . + 1) p(1  p) < 1 . Q < (n + 1)u• + 1)e• =
n(n
But
4(n
Hence
m
Jon ·xm(1  x)..mtJx 1 < 1)e• ( + n Jolx'"(1  x)..mdx 4
and by a similar argument
.C x'"(1  x)..mdx ;;+•
Jolx'"(1  x) ..mdx
1
< 4(n +
References
1)e•'
1713.
JACOB BERNOULLI: "Ars Conjectandi," pars quarta, P. L. TsHEBYBHEFF: "Sur les valeurs moyennes," Oeuvres, I, pp. 687694. F. P. CANTELLI: "Sulla probabilitA come limite di frequenza," d. R. Accad.
Naz.
dei Linc6i,
26, 1917.
Rend.
A. MARKOFF: "Calculus of Probability," 4th Russian ed., Leningrad, 1924.
CHAP TER VII
APPROXIMATE EVALUATION OF PROBABILITIES IN BERNOULLIAN CASE 1. In connection with Bernoulli's theorem, the following important question arises: when the number of trials is large, how can one find, at least approximately, the probability of the inequality
where Eis a given number?
Or, in a more general form: How can one
find, approximately, the probability of the inequalities
l�m�l' where l and l' are given integers, the number of trials
n
being large?
The exact formula for this probability is ,_,,
p
=
� T,
•l
where T .,
as
before, represents the probability of
s
successes in
While this formula cannot be of any practical use when
n
trials.
and l'  l
n
are large numbers, yet it is precisely such cases that present the greatest theoretical and practical interest.
Hen!Je, the problem naturally .arises
of substituting for the exact expression of P an approximate formula which will be easy to use in practice
and
sufficiently close approximation to P.
which, for large
n,
will give a
De Moivre was the first suc
cessfully to attack this difficult problem.
After him, in essentially the
same way, but using more powerful analytical tools, Laplace succeeded in establishing a simple approximate formula which is given in all books on probability. When we use an approximate formula instead of an exact one, there is always this question to consider: How large is the committed error? If,
as
is usually done, this question is left unanswered, the derivation of
Laplace's formula becomes an easy matter.
However, to estimate the
error comparatively long and detailed investigation is required.
Except
for its length, this investigation is not very difficult.
2. First we shall present the probability T, in a convenient analytical The identity
form.
F(t)
=
(pt + q)"
=
To + T1t + T2t2 + 119
·
·
·
+ T,.t"
120
INTRODUCTION TO MATHEMATICAL PROBABILITY
after substituting t
F(e'"') Multiplying it by
=
e¥ becomes To+ T1e1"' + T2e2;.p +
=
·
·
·
e1'"' and integrating between J:,..ei•"'F(e;.p)dcp
=
[CHAP.
VII
+ Tne"¥. and
7r
1r,
we get
21rT,
because for an integral exponent k if
k¢0 k 0.
if
=
Thus
and this is the expression for T. suitable for our purposes. sum
To find the
•=l'
p
�T.
=
•=l
we observe first that
.=
' s= l
� e"'"' � •=l
eil

.
=
F(e1"')
P
=
•
·
cp •
cp
sm
number F(e¥) can be presented in
·
=
whence
1 211'
2
+ 1
2
2
On the other hand, the complex trigonometrical form, thus:
e
. (l' l ) .
Sln
_l+l';.p
f"" ...
i(al+l'
Reie
. (l' l ) + 1
sm
2
sm cp 2
cp dcp
•
or, because Pis real,
p
=
_!J""... 211'
Finally, because
R cos
) . (l' l ) l l' (e + 2
+ 1
sm
2
cp
.
sm
R is an even function of cp and
extend the integration over the interval 0,
1r
e
cp
cp dcp.
2
is an odd one, we can
on the condition that we
SEc.
3]
121
APPROXIMATE EVALUATION OF PROBABILITIES
!,.. 1
P
=
(l' l ) ip ' . l l (  ip) dip. 2 l'
Thus we obtain
double the result.
0
11'
R cos
sm
+
e
2
sin!!!._'
2
It is convenient to introduce instead of land defined by
where Bn
l
=
=
np
+
+1
!
+
two numbers
rl and r2
rlVB..,
Setting further
npq.
e
+
npip
=
x,
P can be presented as
p where
=
p2  pl
pl and p2 are obtained by taking r J
(1)
=
_!_ f 211' Jo
...
R
sin
=
rl and r
=
r2 in the integral
< r"(B:ip  x) dip.
!ip
sm
3. Our next aim is to establish upper and lower limits for R.
Evidently
R Now log p
=
=
(p 2
+
q2
+
2pq
cos
� ( 1  4pq �) log
sin2
ip)i
=
=
(  4pq �)� 1
2pq sin2 �
sin2
 � (4pq)2
sin4
 !(4pq)3 6
whence log p < Since
72ip
<
11'/2,
2pq sin2 P.. 2
we have sinf>P.
2
and consequently
11'
or
(2)
P < e
_2pq,,
,.,
=
pn.
�
sin6f 
2
122
INTRODUCTJON.TO MATHEMATICAL PROBABILITY
for all values of
in the interval of integration.
IP
[CHAP. VII
On the other hand, we
have
. tp 2
tp 2
1/)3 48
IP 2
IP2 4
IP' 48
sm>  >0
for
and sm2>  , •
which gives another upper bound for p:
(3) The corresponding upper bounds for R will be 2Bn 1
(4)
R<e ;rs., 24 2 R<e �.,.+�.,.
(5)
To find a lower bound for R we shall assume
IP
�
1r
/2.
We can
present log p thus: log p =
_pqtp2  !(4pq)2 2
4
sin4 !E. + 2
2pq
{(t2)2 
sin2
}
t 2
 !(4pq)3 6
sin6 !E. 
2
On the other hand,
!(4pq)3 6
sin8 !E. +
2
!(4pq)4 8
sin81£!. +
2
and
(t2)2 
sin2 !E. >
2
! 3
sin' !E.
2
so that
2pq
{(�Y
 sin2
�}  �(4pq)3  !(4pq)3 3
sin8
sin6!!!.
2
� =

•
•
2pq 3
•
sin'
>
2 q �
1£!.2{1

sin'
�_
32p2q2
sin2
and consequently log p >
_pqtp2  !(4pq)2 2 4
sin' I£!. >
2
_pqiP2 _ p2q21P' 2 4
�}
2
>0
SEC. 5]
1"frp
APPROXIMATE EVALUATION 0/t' PROBABILITIES
11" < 2" =
123
Hence,
(6) � 11"/2. 4. Let r be defined by
and this is valid forrp
r2 = 3B;t.
Assuming B,. $1; 25 from now on, we shall have,
�!
T2
and a fortiori r <
0 � rp � r.
because
1r /2.
Let us suppose now thatrp varies in the interval
By inequality
e"' 1
>
x
for
(6)
x
we shall have
> 0 and pq
� �.
On the other hand, using inequality (5), we find that
since BnT'
3
ie24 = te8
<
t.
From the two inequalities just established it follows that
(7) in the interval
5. We turn now to the angle e. S=narc tg
Evidently psinrp
q + pcosrp
=nw
where "' = arc tg
psinrp q+pcosrp
·
By successive derivations with respect torp we find
124
[CHAP. VII
INTRODUCTION TO MATHEMATICAL PROBABILITY
d2w dw pq(p  q) sin I{J p2 + pq cos rp di{J2 di{J  p2 + 2pq cos if+ q2' (p2 + 2pq cos I{J + q2)2 d3w 4pq + (1  2pq) cos I{J  2pq cos2 I{J pq (p q) di{J3 (p2 + 2pq cos I{J + q2)3 sin I{J[1 + 4pq+20p2q2+8pq(1 2pq) cos I{J4p2q2 cos2 I{J] d1w pq p q) d
_
_
_
_
and for
I{J
=
0
(d2w) di{J2
0
= o,
(daw) di{J3
0
= pq(p
q
0 �
I{J �
Furthermore, one easily verifies that in the interval
I�d�I ����
) ( 2 �)' 2pqJp  ql (1 4pq
). 1r
/2
9 pqJ  qJ 1 � < p 4pq sin2 !. =8
�
�
sin2
Hence, applying Taylor's formula and supposing
(8)
x =
I{J·
0 � I{J � r,
we get for
x
iBn(P q)� + MI{J6
where ( 9)
JMJ
<
fiB niP  q J(1  pqr2)',
or
(10)
X
=L�
where (11)
JL J
<
floBnJP  q J(1  pqr2)3.
Using inequalities (9) and (11), we easily find (12)
sin
(tVli:I{J  x)
=sin
where
0 � I{J � r. 6. To find an appropriate expression of the integral J we split it into two integrals, J1 and J2, taken respectively between limits 0, T and r, 11'.
provided
We have
SEC. 7)
1 25
APPROXIMATE EVALUATION OF PROBABILITIES
because sin
�>;
Let
.
r1
r �'
=
i,..Rd
1
But for positive
x
e
T1
<{)
_
2Bn'l'2 ,..•. d
_

<{)
(4)
e"'du u
JT '\[lfi.2 ..
·
the following inequality holds:
i
(14)
..
i,..Rdq;<() 1
R(
eu'du u

"'
consequently
Noting that
..
then by inequality
<
<
e"'' 
e!BnT1
Bnr2
·
2x2'
efy'ii;, =
3B! .
is a decreasing function of
R(
Hence,
� R (r) < Jety'ii;,.
T r 1Rd<() JT <{)
<
�
�
T1
� log! eiy'ii;,, 2
2
and combining this inequality with the one previously established, we have finally (15)
7. More elaborate considerations are necessary to separate the principal term and to estimate the error term in J 1·
Making use of the
inequality
we can present J1 thus:
Jl where
=
� rTR sin (?;VB,.
and, because R < %elB•
I AI
<
B;;l·
'T
321r sin 2
T
126
INTRODUCTION TO MATHEMATICAL PROBABILITY
Since
T2
[CHAP. VII
� % we find by direct numerical calculation
0.0205,
T
 <
327r sin � and so, finally,
l.:li
<
0.0205B;;1•
8. Referring now to inequality (7), we can write
2 27r
TR L
sin
0
2 crv'B,.cp x) dcp  cp 27r
where
l.:ltl
<
Te  >Bn'{' L 2
0
B {"' ! 'f'' � o e Bn cp3dcp 1 J
2
sin
B1 =
8;
<
CrvB:.cp x) dcp cp
+
A L.>l
0.04B;;1•
Combining this with the result of the preceding section, we can present
J1 thus (16) and
l.:l2l
<
0.0605B;;1. (16), we substitute Taking into account inequal
9. To simplify the integral in the right member of ity
(tvB,.cp x) (13), we get (17):
2 27r
0
for sin

L
T
where
But
n'{',
_,8
e •
sin
its expression
(12).
LT rT
2 (tv'IJ:.cp  x) , sin (t v'IJ:.cp) dcp   e!nn'{' dcp27r. 0 cp cp B n'f'' B  q) J e  2 cp2 cos (tyB,.cp)dcp + .:la  e:(p o _
SEc.
10]
APPROXIMATE EVALUATION OF PROBABILITIES
127
and so
laal Now
<
4
�B;Iip  ql(1  pqT2)' 6!r(p q)2{1 ...:... pqT2)8B;l. +
pq ;;i! �' 1'2 ;;i! %, B,. � 25,
consequently
( )
20 ' 1 B:1(1  pqT2)4 :S: 20� 17 4y2r .. 1
On the other hand,
1 pqTI � 153 and for positive
is attained for
x
x2
p q) 2 p q) 2} {( (2 2 +
=
<
0.0385.
17 3 20 + 20(p q)21
the maximum of
x(H + .hxs)8
=
1 }i8, whence it follows that
( )( )
9 9 17 1 55 8 647f'IP  ql {1  pqT2)8 ;;i! 6471' 33 51
<
0.051.
Taking into account all this, we have
laal
<
0.09Ip qiB;1•
10. As to integrals in the righthand member of (17) we can write
(18) (19)
where
and
because
128
INTRODUCTION TO MATHEMATICAL PROBABILITY
> 1, as can easily be proved. {16), {17), (18), (19), we get
for
x
J _! f I  21rJo + B .. (p  q) f"' Jo (� 2 B;6 l "'
{20)
61r
+
l g 4 o
since for
!+
B,.
elB·'�''
tp
B;l
B;l
Finally, taking into account
3

()'" vroDnl{l B )dtp
)
371' + 1rv'a
etv'B.
<
I
<
0.065
71' B;l + B;i B;l 2+6 371' + 71'v'3
log
!
!
2 2 'Bn'l',sin (s v'lJ:.tp)d e. 1{1 =21ro 21ro tp B,.(p  q) f "'e!B•'�'' 2 cos (s VJI:.tp)dtp tp 61r
+
0.09lp  ql +
B..
0.065 + 0.09lp  ql +!etv'B .. 2 B,. <
It now remains to evaluate definite integrals in
(22)
{15),
25 4
(21)
VII
(s VJI:tp)dtp +
' • e• B•'I' tp2 cos
+
?;
sin
[CHAP.
00
00
1
2'
(20) .
We have
e_,,u ,sin SUdu u __
=
Jo
..
p q r e�u2 cos tudu. tnrvB..Jo
=
Differentiating the wellknown integral
(a> 0) twice with respect to b, and after that substituting find for
{22) this expression: p
q
6y'2;Bn
a
=
,72, b
=
s,
we
(1  s2)eir•.
On the other hand, an integral of the type
L(a)
=
f Jo
aud u u
"' u•sin el
can be reduced to a socalled "probability integral." derivation with respect to
L'(a) and since
L{O)
=
0,
=
a gives
r"'
Jo
1
e 2"' cos
audu
=
a•
}y'2:;;:e 2
In fact, the
SEc.
APPROXIMATE EVALUATION OF PROBABILITIES
11)
Consequently, integral
129
(21) can be reduced to
�Lr
elu•du.
Having found an approximate expression of the integral J after sub stituting in it
r2 and rl for r and
taking the difference of the results, we
find the desired expression of P.
11. The result of this long and detailed investigation can be sum marized as follows:
Theorem. Let m be the number of occurrences of an event in a series of n independent trials with the constant probability p. The probability P of the inequalities
np + i +
rl.y?ipq
� m � np  i +
r2.y?ipq
where extreme members are integers, can be represented in the form r.• t•• r• ... 1 e 2 du + (23) P (1  t�)e 2 (1  tf)e 2 +w. /i'L. v 2'11' r• 6 2'11'npq =
_
�[
i
J
The error term w satisfies the inequality lwl
<
0.13 + 0.18lp  ql + ely;;pq npq
provided npq � 25. By slightly increasing the limit of the error term, this theorem can be put into more convenient form.
Let
t1 and t2 be two arbitrary real
numbers and let P denote the probability of the inequalities
np + t1ynpq � m � np + t2ynpq. If the greatest integers contained in
np + t2ynpq are respectively,
and
A2 and A1, the preceding inequalities are equivalent to n
A1 � m � A2.
To apply the theorem, we set
np i + t2ynpq np + i + t1ynpq
=
=
np + t2ynpq 82 A2 n  A1 np + t1ynpq + 81 =
=
82 and 81 being, respectively, the fractional parts of np + t2ynpq and nq  ttynpq.
Hence,
130
INTRODUCTION TO MATHEMATICAL PROBABILITY
Applying Taylor's formula, it is easy to verify that
1 
(!  81)e 1 i to  � d i r•e �due u
I y'2;q I 6� [ r,
P
y'2;
e.
(1  t�)e _r�·  (1  rf)e
.!"f]
_
h2
[CHAP. VII
2�tz2
< 0.061 npq � q p [ (1  t�)e � 6� < 0.06  qi (1 tf)e 2
(!  82)e
+
_
�]�
��q
whence, finally, we can draw the following conclusion: For any two
t1, t2,
real numbers
the probability of the inequalities
t1yn:pq
;;;i! m
np
;;;i!
t2yn:pq
can be expressed as follows:
P
=
1

y'2;
where
82
i,. h
and
eiu'du +
81 are the
(!.2  81)ei1''
+
(!.2  82)el1•'
I Ol provided
npq
;;;;
25.
In particular, if
<
t2
+ n
respective fractional parts of
nq t1vnpq
and and
+
y2·nnpq q  p [(1  t�)e1122  (1  tf)e1112] + 6�
0.20
=
+
t1
0. 2 5 lp  ql npq t,
=
+
e llynpq
the probability of the inequality
lm  npl
;;;i!
tvnpq
is expressed by
p
=
_2_ f telu'du + 1  81 82eit• + v2�npq .y2;Jo
with the same upper limit for n. is an integer in which case
82
=
n
Laplace, supposing that np + tvnpq 0 and 81 is a fraction less than (npq)�i,
gives for P the approximate expression
P
=
te •d + e it' 2 iu u y'2; �

L
:==
0
without indicating the limit of the error.
Evidently Laplace's formula
coincides with the formula obtained here by a rigorous analysis, save for terms of the same order as the error term n.
SEc.
12]
APPROXIMATE EVALUATION OF PROBABILITIES
131
To find an approximate expression for the probability P of the inequality
it suffices to take
Then
p
. .y2;L 2
=
... tn vpq
e
1  2...
du
+
0
and evidently P tends to 1 as
ne1
1  lh 
82 2pq e �
n increases indefinitely.
+ n
This is the second
proof of Bernoulli's theorem. Referring to the above expression for the probability of the inequalities
kVnpg_
�
m np
�
t2'\1"1ipq
n increases indefinitely while t2 remain fixed, we immediately perceive the truth of the following limit theorem: The probability of the inequalities and supposing that the number of trials
h and
np _ mynpq
t1 �
� t2 _
tends to the limit
as n tends to infinity. This limit theorem is a very particular case of an extremely general
theorem which we shall consider in Chap. XIV.
12. To form an idea of the accuracy to be expected by using the
foregoing approximate formulas, it is worth while to take· up a few numerical examples.
Let
n
=
200,
95 �
p
m
=
q
=
Y2 and
� 105.
The exact expression of the probability that qualities is
[
m
will satisfy these ine
(
2001 200 100 100. 99 100. 99. 98 2 1 + 2 p 101 + 101. 102 + 101. 102. 103 + 1001100! 100 . 99 . 98 . 97. 96 100 . 99. 98. 97 + 101. 102. 103 . 104. 105 + 101 . 102. 103. 104
)]
.
132
INTRODUCTION TO MATHEMATICAL PROBABILITY
The number in the brackets is found to be five decimals
9.995776 and its
[CHAP. VII
logarithm to
0.99982. The logarithm of the first factor, again to five decimals, is
2.75088,
whence log P
1.75070;
=
p
=
0.56325,
and this value may be regarded as correct to five decimals.
Let us see
now what result is obtained by using approximate formulas.
In our
example
tvnpq
=
tv'5o
and 2
y'2;
=
t
5;
ree � 2du Jo
=
=
1 V2
=
0.707107
0.52050.
The additional term eo.2o _I v 100'11"
=
0.04394
and by Laplace's formula p
=
0.56444.
This is greater than the true value of P by limit of the error is nearly
Th
=
0.00119.
Now, the theoretical
0.004
so that, actually, Laplace's formula gives an even closer approximation than can be expected theoretically. When npq is large, the second term in Laplace's formula ordinarily is omitted and the probability is computed by using a simpler expression: P
2 re � = y21r ) e 2du. o
In our case this expression would give p
instead of
8
0.56325
=
0.52050
with the error about
per cent of the exact number.
0.043,
which 'amounts to about
Such a comparatively large error is
explained by the fact that in our example npq
=
50
is not large enough.
In practice, when npq attains a few hundreds, the simplified expression for
SEc. 12)
APPROXIMATE EVALUATION Ofi' PROBABILITIES
133
P can be used when an accuracy of about two or three decimals is con sidered as satisfactory.
In general, the larger t is, the better approxima
tion can be expected. For the second example, let us evaluate the probability that in 6,520 trials the relative frequency of an event with the probability p = % will differ from that probability by less than E = �0• To find t, we have the equation
tynpq =En where
n =
p = t,
6520,
q
= t,
E =
ilf,
which gives
130.4
t =
VI564.8
= 3.2965 '
and, correspondingly,
2 y'2; f'e �2 du = 0.999021.
Jo
Since
m
satisfies the inequalities
�= � = m
3912  130.4 the fractions 81 and 82 are 81
82
0·2
0.4 and the additional term is
e5•4334
y3129.&r
3912 + 130.4
= 0.000009.
Hence, the approximate value of P is
p
=
0.999030.
To judge what is the error, we can apply Markoff's method of con tinued fractions to find the limits between which P lies.
0.999028
and
These limits are
0.999044.
The result obtained by using an approximate formula is unusually good, which can be explained by the fact that in our example t is a rather large number.
Even the simplified formula gives 0.999021, very near the
true value. Finally, let us apply our formulas to the solution of the inverse problem: How large should the number of trials be to secure a probability larger than a given fraction for the inequality
I� PI�
E?
134
INTRODUCTION TO MATHEMATICAL PROBABILITY
Let us take, for example, p = bility 0.999.
�.
e
[CHAP. VII
= 0.01 and the lower limit of proba
To find n approximately, we first determine t by the
equation 2 r�e �2du "\f'2;Jo
which
gives
= 0.999,
t = 3.291.
Hence, n =
pqt2 7
20000
.
= 9( 3.291)2 = 24,066, approximately.
We cannot be sure that this limit is precise, since an approximate formula was used.
But it can serve as an indication that for n exceeding this
limit by a comparatively small amount, the probability in question will be >0.999.
For instance, let us take n = 24,300.
The limits for m
being 8,100  243 � m � 8,100 + 243,
we find t from the equation
fn \jpq
t = E
and correspondingly
= 3.3068
2 r� e �2du .y2;Jo
= 0.999057.
The additional term in Laplace's formula being 0.000023, we find
p > 0.99908  0.00006 > 0.999. Thus, 24,300 trials surely satisfy all the requirements. Problems for Solution 1. Find approximately the probability that the number of successes will be con tained between 2,910 and 3,090 in 9,000 independent trials with constant probability Ans. 0.9570 with an error in absolute value <104 [using (23)]. �. 2. In Buffon's experiment a coin was tossed 4,040 times, with the result that heads turned up 2,048 times. What would be the probability of having more than 2,050 Ans. 0.337. or less than 1,990 heads? 3. R. Wolf threw a pair of dice 100,000 times and noted that 83,533 times the numbers of points on the two dice were different. What is the probability of having such an event occur not less than 83,533 and not more than 83,133 times? Does the result suggest a doubt that for each die the probability of any number of points was �? Ans. This probability is approximately 0.0898 and on account of its smallness some doubt may exist.
APPROXIMATE EVALUATION OF PROBABILITIES
135
4·, If the probability of an event E is 72, what number of trials guarantees a probability of more than 0.999 that the difference between the relative frequency of E and 72 will be in absolute value less than 0.01? Ans. 27,500. 6. If a man plays 10,000 equitable games, staking $1 in each game, what is the probability that the increase or decrease in his fortune will not exceed $20 or $50?
Ans. (a) 0.167; (b) 0.390. 100,000 games of craps and stakes 50 cents in each game, what the probability that he will lose less than $100? Ans. About 72oo· 6. If a man plays
is
'1. Following the method developed in this chapter, prove the following formula for the probability of exactly m successes in n independent trials with constant probability p: T,. =
1
e
�
�[ 2
1+
(q  p)(t3  3t) 6Vnpq
]
+�
where t is determined by the equation m
=
np+ tvnpq
and
I �I
<
 ql
0.15 + 0.25lp
(npq)l
+ e lynpq
provided npq � 25. 8. Developments of this chapter can be greatly simplified if p = q = 72 (sym metrical case). In this case one can prove the following statement: The probability of the inequalities
i + � + r. � �
m
�
can be expressed as follows: p=
1 
it,
eiu'du +
i  �+ r2�
(r23  r2)eu·.•  (rt3  r1)eU"•'
12v21rn
.y2; r, < 1/2n2 for n > 16.
+�
where 1�1 9. In case of "rare" events, the probability p may be so small that even for a large number of trials the quantity X np may be small; for example, 10 or less. In cases of this kind, approximation formulas of the type of Laplace's cannot be used with confidence. To meet such cases, Poisson proposed approximate formulas of a different character. Let P,. represent the probability that in n trials an event with the probability p will occur not more than m times. Show that =
P,. =
eX
[
1+
X
1

X"'
xz
+ + · · +
1·2
·
1·2·3 ..·
where
1�1 1�1
< (e"
< (e" 

1)Q,. if Q,.)
1)(1

and
1 X3 X++X=
4
2( n
n
X)
•
m
]
+ � = Q,. + �
136
INTRODUCTION TO MATHEMATICAL PROBABILITY
[
We have
Indication of the Proof.
P
m
q"
=
1.!. 1 +x + 1.2nx•l + 
.
Now, smce q
=
•
q
q
1
[CHAP. VII
•
•
},.

n
and
i1(
II �co
+x  k n X
)
1
[:>.)
( IT 1 +
�
lc=O
[ :>.) :>.lc � n:>.
k n  X
) <e�co
x
(:>.+!>"
� e2(n:>.>.
Consequently
m X"' Pm < 1 � ei����; 1 + �1 + � 1·2 + ·· · + 1·2·3···m
( )
[
..
n
But
},. "
whence
Pm <eXQmi
Qm
=
e
<e
1
}. :>.2n,
( )  [ 1+�1+ 1x•· 2 + 1;
:>.
•
.
On the other hand, ..
1
 Pm
whence and
n(n =
,.m+l
 1)
•
•
•
"'
(n  1£
+ 1)
]·
q""P"
.
xm + · ·3 12 · · ·m
]·
=
1 Pm <eX(1  Qm) eX. P m > eXQm + 1


The final statement follows immediately from both inequalities obtained for P ...
137
APPROXIMATE EVALUATION OF PROBABILITIES 10. With the usual notation, show that
where �
Q
6 n
=
[ (
(nm)A•_m(m1) 2n• 2n 1
Indication of the Proof.
_
(J
m)).l
Referring to Chap. I, page "'
T,. < 1 m!
T,.
>
" "'
). ( ).)  ( ). ( )  (
1 �
1
). " "'
;
m!
, '
0 < (J < 1.
we have
m1
2n
n
"'
23,
) ) �
)]
1
m1 2 .
But
( ).)nm
A+_!!!An
1 
<6
n
(nm)A• 2n1 ,
(
1 m
)m1
2n
<6
m(m1) 2n ,
whence "' T"' < 6 ).
m!
mA_(nm)A•_m(m1) 2n• 2n A •6 n

On the other hand,
).
( ) ( ( �) ; ( 1
nm
=

·n
m
1 
1
=
). )(nm) : ) ;
1 + 
1+
> 6
nX
_m
n
1
m
+ mA+(nm)A•_(nm)A• A A nA 2(nA)• 3(nA)1
>6
;�:��� .
Hence
( ).)nm( 1 
n
+mA_(nm)A•_m(m1) _(nm)A• ___ •_ m_ m m1 A >6 n 2n1 2n • 6 2 3(nA)• 2n(nm),
1 
n
)
and a fortiori
" "'
( ).)  ( 1 
n
(nm)A•_ m(m1) m m1 A+�2 >6 n 2n1 2n 1
)
[
1 
n
(n m)).l 3(nX)3 _
ma
2n(nm) If ). and m are both small in comparison to
near 1 .
n the
] ·
aboveintroduced factor Q will be
Under such circumstances we may be entitled to use an approximate formula
due to Poisson
T
m
=
).
"'
DA
mr .
INTRODUCTION TO MATHEMATICAL PROBABILITY
138
[ CHAP. VII
The preceding elementary analysis gives means to estimate the error incurred by using this formula. 11. Apply the preceding considerations to the case n = 1,000, p = X00, X = 10 and m = 10. Am. 0.1256 < T10 < 0.1258. Poisson's formula gives 0.1251a very good approximation.
Also, 0.5398 < P10 < 0.5453. Taking Pto = 0.542, the 10 a. By a more elaborate method it is
error in absolute value will be less than 3.3
·
found P10 = 0.5421.
References !.>E MoiVRE: "Miscellanea analytica," Lib. V, 1730. DE MoiVRE: "Doctrine of Chances," 3d ed., 1756.
LAPLAcE: "Theorie analytique des probabilit�," Oeuvres VII, pp. 28G285. Ca. DE
LA
VALLEEPoussiN: Demonstration nouvelle du theoreme de Bernoulli,
Ann. Soc. Sci: Bruxelles, 31. D. MIRIIIIANOFF: Le jeu de pile ou face et les formules de Laplace et de J. Eggenberger, Oommentarii Mathematici Helvetici 2, pp. 133168.
CHAPTER VIII . FURTHER CONSIDERATIONS ON GAMES OF CHANCE 1. When a person undertakes to play a very large number of games " under theoretically identical conditions, the inference to be drawn from Bernoulli's theorem is that that person will almost certainly be ruined if the mathematical expectation of his gain in a single game is negative. In case of a positive expectation, on the other hand, he is very likely to win as large a sum as he likes in a sufficiently long series of games. Finally, in an equitable game when the mathematical expectation of a gain is zero, the only inference to be drawn from Bernoulli's theorem is that his gain or loss will likely be small in comparison with the number of games played. These conclusions are appropriate however, only if it is possible to continue the series of games indefinitely, with an agreement to postpone the final settling of accounts until the end of the series.
But if the
settlement, as in ordinary gambling, is made at the end of each game, it may happen that even playing a profitable game one will lose all his money and will have to discontinue playing long before the number of games becomes large enough to enable him to realize the advantages which continuation of the games would bring to him. A whole series of new problems arises in this connection, known as problems on the duration of play or ruin of gamblers.
Since the science
of probability had its humble origin in computing chances of players in different games, the important question of the ruin of gamblers was discussed at a very early stage in the historical development of the theory of probability.
The simplest problem of this kind was solved by
·Huygens, who in this field had such great successors as de Moivre, Lagrange, and Laplace. 2. It is natural to attack the problem first in its simplest aspect, and then to proceed to more involved and difficult questions. Problem 1.
Two players A and B play a series of games, the proba
bility of winning a single game being p for A and q for B, and each game ends with a loss for one of them.
If the loser after each game gives his
adversary an amount representing a unit of money and the fortunes of A and B are measured by the whole numbers a and b, what is the proba bility that A (or B) will be ruined if no limit is set for the number of games? 139
140
INTRODUCTION TO MATHEMATICAL PROBABILITY
Solution.
[CHAP. VIII
It is necessary first to show how we can attach a definite
numerical value to the probability of the ruin of A if no limit is set for the number of games. As in many similar cases (see, for instance, Prob. 15, page 41) we start by supposing that a limit is set. Let n be this limit.
There is only a finite number of mutually exclusive ways in which
A can be ruined in n games; either he can be ruined just after the first game, or just after the second, and so on. Denoting by p1, p2, . . . p,. the probabilities for A to be ruined just after the first, second, . . . nth game, the probability of his ruin before or at the nth game is P1+ P2 + · ·
·
+ p,..
Now, this sum being a probability, must remain <1 whatever n is. On the other hand, each term of this sum is !?; 0 for the same reason. Both remarks combined, show that the series P1+ P2 + Pa + ·
is convergent.
·
·
We take its sum as the probability for A to be ruined
when nothing limits the number of games played.
So it is clear that this probability, although unknown, possesses a perfectly determined numerical value.
Let us denote by y_x the probability for A to be ruined when his fortune is x. The probability we seek is y_a. Obviously,

(1) y_0 = 1,

for A is certainly ruined if he has no money left. Similarly

(2) y_{a+b} = 0,

because if the fortune of A is a + b, it means that B has no money wherewith to play, and certainly the ruin of A is then impossible. Further, considering the result of the game immediately following the situation in which the fortune of A amounted to x, it is possible to establish an equation in finite differences which y_x satisfies. For, if A wins this game (the probability of which case is p), his fortune becomes x + 1 and the probability of being ruined later is y_{x+1}. By the theorem of compound probability, the probability of this case is p y_{x+1}. But if A loses (the probability of which is q), his fortune becomes x − 1 and the probability that the one possessing this fortune will be ruined is y_{x−1}. The probability of this case is q y_{x−1}. Now, applying the theorem of total probability, we arrive at the equation

(3) y_x = p y_{x+1} + q y_{x−1}.

This equation has a particular solution of the form α^x where α is a root of the equation

α = pα² + q.
FURTHER CONSIDERATIONS ON GAMES OF CHANCE

If p ≠ q there are two roots,

1 and q/p,

and, correspondingly, there are two distinct particular solutions of equation (3): 1 and (q/p)^x. Obviously,

y_x = C + D(q/p)^x

is also a solution of (3) for arbitrary C and D. Now, we can dispose of C and D so as to satisfy conditions (1) and (2). To this end we have the equations

C + D = 1;  p^{a+b} C + q^{a+b} D = 0,

whence

C = −q^{a+b}/(p^{a+b} − q^{a+b});  D = p^{a+b}/(p^{a+b} − q^{a+b})

and

y_x = (q^{a+b} p^x − p^{a+b} q^x)/(p^x (q^{a+b} − p^{a+b})).

It remains to take x = a to obtain the required probability

y_a = q^a (q^b − p^b)/(q^{a+b} − p^{a+b})

that the player A possessing the fortune a will be ruined. Similarly, the probability of the ruin of B is

z_b = p^b (p^a − q^a)/(p^{a+b} − q^{a+b}).

It turns out that

y_a + z_b = 1,

so that the probability that the series of games will continue indefinitely without A or B being ruined is 0.
The probability 0 does not show the impossibility of an eternal game, because this number was obtained, not by direct enumeration of cases, but by passage to the limit. Theoretically, an eternal game is not excluded. Actually, of course, this possibility can be disregarded.
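The two closed formulas can be checked mechanically against the difference equation itself. The sketch below (our own illustration, not the book's; exact rational arithmetic) solves (3) with boundary conditions (1) and (2) by shooting on the unknown value y_1 and compares the result with the closed form, also confirming y_a + z_b = 1:

```python
from fractions import Fraction

def ruin_closed(a, b, p):
    # Closed form from Prob. 1 (p != q): y_a = q^a (q^b - p^b) / (q^(a+b) - p^(a+b))
    q = 1 - p
    return q**a * (q**b - p**b) / (q**(a + b) - p**(a + b))

def ruin_shooting(a, b, p):
    # Solve y_x = p*y_{x+1} + q*y_{x-1} with y_0 = 1, y_{a+b} = 0 directly:
    # rewrite as y_{x+1} = (y_x - q*y_{x-1}) / p and shoot on the unknown y_1.
    q = 1 - p
    def final(y1):
        prev, cur = Fraction(1), y1
        for _ in range(a + b - 1):
            prev, cur = cur, (cur - q * prev) / p
        return cur                       # this is y_{a+b}
    # y_{a+b} depends affinely on y_1; choose y_1 so that y_{a+b} = 0
    f0, f1 = final(Fraction(0)), final(Fraction(1))
    y1 = -f0 / (f1 - f0)
    ys = [Fraction(1), y1]
    for _ in range(a + b - 1):
        ys.append((ys[-1] - q * ys[-2]) / p)
    return ys[a]

p = Fraction(2, 3)
a, b = 4, 6
ya = ruin_closed(a, b, p)            # ruin probability of A
zb = ruin_closed(b, a, 1 - p)        # ruin probability of B, by symmetry
assert ya == ruin_shooting(a, b, p)
assert ya + zb == 1                  # an eternal game has probability 0
```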
If p = q = 1/2, so that each single game is equitable, the preceding solution must be modified. In this case, the above quadratic equation in α has two coincident roots = 1, and we have only one particular solution of (3), y_x = 1. But another particular solution in this case is x, so that we can assume

y_x = C + Dx

and determine C and D from the equations

C = 1;  C + D(a + b) = 0.

Thus, we find that

y_x = 1 − x/(a + b)

and for x = a

y_a = b/(a + b).

Similarly, giving z_b the same meaning as above,

z_b = a/(a + b).

If, therefore, each single game is equitable, the probabilities of ruin are inversely proportional to the fortunes of the players. The practical conclusion to be derived from this theoretical result is sheer common sense: It is unwise to play indefinitely with an adversary whose fortune is very large without submitting oneself to the great risk of losing all one's money in the course of the games, even if each single game is equitable. Gamblers who gamble at an even game with any willing individual are in the same condition as if they were gambling with an infinitely rich adversary. Their ruin in the long run is practically certain.

If single games of the series are not equitable, that is, p ≠ q, the conclusion may be different.
Supposing p > q, we have a case when the expectation of A is positive; in each single game, A has an advantage over his adversary. The above expression for y_a may be written in the form

y_a = ((q/p)^a − (q/p)^{a+b})/(1 − (q/p)^{a+b})

and, because q/p < 1, it is easy to see that y_a remains always less than (q/p)^a and converges to this number when b becomes infinite. Thus, playing a series of advantageous games even against an infinitely rich adversary, the probability of escaping ruin is

1 − (q/p)^a.

If a is large enough, this can be made as near 1 as we please, so that a player with a large fortune has good reason to believe that in the course of the games he will never be ruined, but that actually he is very likely to win a large sum of money. This conclusion again is confirmed by experience. Big gambling institutions, like the Casino at Monte Carlo, always reserve certain advantages to themselves, and, although they are willing to play with practically everybody (as if they played against an infinitely rich adversary), the chance of their being ruined is slight because of the large capital in their possession.
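The approach of y_a to its limit (q/p)^a as b grows can be watched numerically; the values p = 0.55 and a = 10 below are illustrative assumptions, not taken from the text:

```python
p, q, a = 0.55, 0.45, 10          # an advantageous game for A (p > q)
r = q / p
limit = r**a                       # ruin probability against an infinitely rich B

prev = 0.0
for b in (1, 10, 100, 1000):
    # y_a in the form ((q/p)^a - (q/p)^(a+b)) / (1 - (q/p)^(a+b))
    ya = (r**a - r**(a + b)) / (1 - r**(a + b))
    assert prev <= ya <= limit     # increases with b, bounded by (q/p)^a
    prev = ya

assert limit - ya < 1e-12          # converges to (q/p)^a as b grows
```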
3. In the problem solved above the stakes of both players were supposed to be equal, and we took them as units to measure the fortunes of both players. Next it would be interesting to investigate the case in which the stakes of A and B are unequal. An exact solution of this modified problem, since it depends on a difference equation of higher order, would be too complicated to be of practical use. It is therefore extremely interesting that, following an ingenious method developed by A. A. Markoff, one can establish simple inequalities for the required probabilities which give a good approximation if the fortunes of the players are large in comparison with their stakes.

Problem 2. If the conditions presupposed in Prob. 1 are modified in that the stakes of A and B measured in a convenient unit are α and β and their respective fortunes are a and b, find the probabilities for A or B to be ruined in the sense that at a certain stage the capital of A will become less than α or that of B less than β.

Solution. Let y_x be the probability for A to be forced out of the game by the lack of sufficient money to set a full stake α when his fortune amounts to x and consequently that of his adversary is a + b − x. In the same way as before, we find that y_x is a solution of the equation in finite differences:

(4) y_x = p y_{x+β} + q y_{x−α}.

To determine y_x completely, in addition to (4), we have two sets of supplementary conditions:

(5) y_0 = y_1 = · · · = y_{α−1} = 1
(6) y_{a+b} = y_{a+b−1} = · · · = y_{a+b−(β−1)} = 0.
Equation (5) expresses the fact that if the fortune of A becomes less than his stake, it is certain that A must quit. On the contrary, equation (6) indicates the impossibility for A to be ruined if the other player B does not have enough money to continue gaming. Equation (4) is an ordinary equation in finite differences of the order α + β. It has particular solutions of the form θ^x where θ is a root of the equation

(7) pθ^{α+β} − θ^α + q = 0.

The left-hand member for θ = 0 is positive and with increasing θ decreases and attains a minimum when

θ = (α/(p(α + β)))^{1/β},

and then steadily increases and assumes positive values for large θ. This minimum must be negative or zero because θ = 1 is a root of (7). Now, if it is negative, there are two positive roots of (7). One of them is θ = 1 and another > or < 1 according as

pβ − qα < 0 or > 0.

That is, the positive root of (7) different from 1 is > 1 when single games are favorable to B and < 1 if they are favorable to A. In case of equitable games, both positive roots coincide and θ = 1 is a double root of (7). All the other roots of (7) are negative or imaginary.

The regular way to solve the problem would be to write down the general solution of (4) involving α + β arbitrary constants to be determined by conditions (5) and (6). As this method would lead to a complicated expression for y_x, we shall refrain from seeking the exact solution of our problem, and instead, following A. A. Markoff's ingenious remark, we shall establish simple lower and upper limits for y_x which are close enough if the fortunes of the players are large in comparison with their stakes.
Lemma. If y_x is a solution of equation (4) and none of the numbers

y_0, y_1, . . . y_{α−1}; y_{a+b}, y_{a+b−1}, . . . y_{a+b−β+1}

is negative, then y_x ≥ 0 for x = 0, 1, 2, . . . a + b.

Proof. Let u_x^{(k)} (k = 0, 1, 2, . . . α − 1) represent the probability that the player A whose actual fortune is x (and that of his adversary a + b − x) will be forced to quit when his fortune becomes exactly k. Evidently u_x^{(k)} is a solution of equation (4) satisfying the conditions

u_x^{(k)} = 0 for x = 0, 1, . . . k − 1, k + 1, . . . α − 1; a + b, a + b − 1, . . . a + b − β + 1;  u_k^{(k)} = 1.

Similarly, if v_x^{(l)} (l = 0, 1, 2, . . . β − 1) represents the probability that the player B will be forced to quit when the fortune of A becomes exactly a + b − l, v_x^{(l)} will be a solution of (4) satisfying the conditions

v_x^{(l)} = 0 for x = 0, 1, 2, . . . α − 1; a + b, . . . a + b − l + 1, a + b − l − 1, . . . a + b − β + 1;  v_{a+b−l}^{(l)} = 1.

Thus we get α + β particular solutions of (4), and it is almost evident that these solutions are independent. Moreover, since they represent probabilities,

u_x^{(k)} ≥ 0, v_x^{(l)} ≥ 0 for x = 0, 1, 2, . . . a + b.

Now, any solution y_x of (4) with given values of

y_0, y_1, . . . y_{α−1}; y_{a+b}, y_{a+b−1}, . . . y_{a+b−β+1}

can be represented thus:

y_x = Σ_{k=0}^{α−1} y_k u_x^{(k)} + Σ_{l=0}^{β−1} y_{a+b−l} v_x^{(l)}.

Hence, y_x ≥ 0 for x = 0, 1, 2, . . . a + b if none of the numbers y_0, y_1, . . . y_{α−1}; y_{a+b}, y_{a+b−1}, . . . y_{a+b−β+1} is negative. This interesting property of the solutions of equation (4), derived almost intuitively from the consideration of probabilities, can be established directly. (See Prob. 9, page 160.)

The lemma just proved yields almost immediately the following proposition: If for any two solutions y′_x and y″_x of equation (4) the inequality y′_x ≤ y″_x holds for x = 0, 1, 2, . . . α − 1; a + b, a + b − 1, . . . a + b − β + 1, the same inequality will be true for all x = 0, 1, 2, . . . a + b. It suffices to notice that y_x = y″_x − y′_x is a solution of the linear equation (4) and, by hypothesis, y_x ≥ 0 for x = 0, 1, 2, . . . α − 1; a + b, a + b − 1, . . . a + b − β + 1.

Now we can come back to our problem. First, if the mathematical expectation of A,

pβ − qα,
is different from 0, equation (7) has two positive roots: 1 and θ. With arbitrary constants C and D,

y′_x = C + Dθ^x

is a solution of (4). Whatever C and D may be, y′_x as a function of x varies monotonically. Therefore, if C and D are determined by the conditions

y′_0 = 1;  y′_{a+b−β+1} = 0,

we shall have

y′_x ≤ 1 if x = 0, 1, 2, . . . α − 1
y′_x ≤ 0 if x = a + b − β + 1, . . . a + b

and by the above established lemma, taking into account conditions (5) and (6), we shall have for the required probability the inequality y_x ≥ y′_x or, substituting the explicit expression for y′_x,

y_x ≥ (θ^{a+b−β+1} − θ^x)/(θ^{a+b−β+1} − 1).

If, on the contrary, C and D are determined by

y′_{α−1} = 1;  y′_{a+b} = 0,

we shall have

y′_x ≥ 1 if x = 0, 1, 2, . . . α − 1
y′_x ≥ 0 if x = a + b − β + 1, . . . a + b

and

y_x ≤ (θ^{a+b−α+1} − θ^{x−α+1})/(θ^{a+b−α+1} − 1).

Finally, taking x = a, we obtain the following limits for the initial probability y_a:

θ^a (θ^{b−β+1} − 1)/(θ^{a+b−β+1} − 1) ≤ y_a ≤ θ^{a−α+1} (θ^b − 1)/(θ^{a+b−α+1} − 1).

They give a sufficient approximation to y_a if a and b are large compared with α and β. If each single game is equitable, equation (4) has a solution with two arbitrary constants:

y′_x = C + Dx.

Proceeding in the same way as before, we obtain the inequalities

(b − β + 1)/(a + b − β + 1) ≤ y_a ≤ b/(a + b − α + 1).
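These bounds are easy to evaluate numerically. The sketch below (the helper functions are ours; the root θ of (7) is found by bisection, assuming games unfavorable to A so that θ > 1) reproduces the situation of stakes α = 3, β = 2 and fortunes a = 30, b = 20 used in the exercises at the end of the chapter:

```python
def theta_root(p, alpha, beta):
    # Positive root of p*t^(alpha+beta) - t^alpha + q = 0 other than t = 1,
    # assumed here to lie in (1, 2): games unfavorable to A, p*beta < q*alpha.
    q = 1 - p
    f = lambda t: p * t**(alpha + beta) - t**alpha + q
    lo, hi = 1.0 + 1e-12, 2.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def markoff_bounds(a, b, alpha, beta, p):
    # Lower and upper limits for y_a from the inequalities above
    t = theta_root(p, alpha, beta)
    low = t**a * (t**(b - beta + 1) - 1) / (t**(a + b - beta + 1) - 1)
    high = t**(a - alpha + 1) * (t**b - 1) / (t**(a + b - alpha + 1) - 1)
    return low, high

low, high = markoff_bounds(30, 20, 3, 2, 0.5)   # non-equitable case, p = 1/2
assert 0.95 < low < high < 0.97

# equitable case (p*beta = q*alpha, here p = 3/5): bounds from y' = C + Dx
low_eq = (20 - 2 + 1) / (30 + 20 - 2 + 1)       # (b - beta + 1)/(a + b - beta + 1)
high_eq = 20 / (30 + 20 - 3 + 1)                # b/(a + b - alpha + 1)
assert 0.38 < low_eq < high_eq < 0.42
```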
4. To simplify the analysis, it was supposed that nothing limited the number of games played by A and B so that an eternal game, although extremely improbable, was theoretically possible. We now turn to problems in which the number of games is limited.

Problem 3. Players A and B agree to play not more than n games. The probabilities of winning a single game are p and q, respectively, and the stakes are equal. Taking these stakes as monetary units, the fortune of A is measured by the whole number a and that of B is infinite or at least so large that he cannot be ruined in n games. What is the probability for A to be ruined in the course of n games?
Solution.
his fortune is measured by the number
x
and he cannot play more than
The reasoning we have used several times shows that y,.,,
t games.
satisfies a partial equation in finite differences:
(8)
y,.,,
=
PYz+t.t1
+ qy.,._1,t1·
Moreover, if A has no money left, his ruin is certain, which gives the condition
(9)
Yo.�
=
if
1
t � 0.
On the other hand, if A still possesses money and cannot play any more, his ruin is impossible, so that y,.,o
(10) Conditions
(9)
and
(10)
=
0
X> 0.
if
together with equation
completely for all positive values of
x
and t.
sion for y,.,, we shall use Lagrange's method.
(8)
determine y,.,,
To find an explicit expres Equation
(8)
has particular
solutions of the form
a"'fJ' where
a
and
fJ
satisfy the relation
afJ
=
pa2 +
We can solve this equation either for expressions of y.,,1•
q.
fJ or for a which leads to two different
Solving for fJ we have infinitely many particular
solutions
a"'(pa + qa1)' with an arbitrary form
a and
we can seek to obtain the required solution in the
148
INTRODUCTION TO MATHEMATICAL PROBABILITY
[CHAP. VIII
where f(α) is supposed to be developable in Laurent's series on a certain circle c. To satisfy (10) we must have

(1/2πi) ∫_c α^{x−1} f(α) dα = 0 for x = 1, 2, 3, . . .

which shows that f(α) is regular within the circle c. To determine f(α) completely, we must have, according to (9),

(1/2πi) ∫_c f(α)(pα + qα^{−1})^t dα/α = 1 for t = 0, 1, 2, . . .

All these equations are equivalent to a single equation

(1/2πi) ∫_c f(α) dα/(α − pεα² − qε) = 1/(1 − ε)

holding good for all sufficiently small ε. The integrand has a single pole α_0 within c defined by

α_0 − pεα_0² − qε = 0,

and the corresponding residue is

((q + pα_0²)/(q − pα_0²)) f(α_0).

But this must be equal to 1/(1 − ε) or, substituting for ε its expression in α_0,

(q + pα_0²)/(pα_0² − α_0 + q);

and hence for all sufficiently small α_0

f(α_0) = (q − pα_0²)/(pα_0² − α_0 + q);

that is, if

f(α) = (q − pα²)/(pα² − α + q)

all the requirements are satisfied. Taking into account that p + q = 1, we have

f(α) = 1/(1 − α) + pα/(q − pα)

and also

f(α) = 1 + Σ_{n=1}^∞ [1 + (p/q)^n] α^n.
The expression for y_{x,t} is therefore

y_{x,t} = (1/2πi) ∫_c α^{x−1}(pα + qα^{−1})^t Σ_{n=0}^∞ c_n α^n dα

where c_0 = 1 and c_n = 1 + (p/q)^n if n ≥ 1. It remains to find the coefficient of 1/α in the development of the integrand in a series of descending powers of α. Since

α^{x−1}(pα + qα^{−1})^t = Σ_{l=0}^t C_t^l p^l q^{t−l} α^{x+2l−t−1},

this coefficient is given by the sum

Σ C_t^l p^l q^{t−l} c_{t−x−2l}

extended over all integers l from 0 up to the greatest integer not exceeding (t − x)/2. Hence, the final expression for the probability y_{a,n} is

(11) y_{a,n} = q^a Σ_{l=0}^{[(n−a)/2]} C_n^l (pq)^l [p^{n−a−2l} + q^{n−a−2l}]

with the agreement, in case of an even n − a, to replace the sum p^0 + q^0 corresponding to l = (n − a)/2 by 1. It is natural that the right-hand member of the preceding expression should be replaced by 0 if n < a, which is in perfect agreement with the fact that A cannot be ruined in less than a games.
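Formula (11) can be verified against a direct iteration of recurrence (8); the functions below are our own illustration, in exact arithmetic:

```python
from fractions import Fraction
from math import comb

def ruin_within(a, n, p):
    # Formula (11): ruin probability of A (fortune a, unit stakes) within
    # n games against an infinitely rich adversary.
    if n < a:
        return Fraction(0)
    q = 1 - p
    s = Fraction(0)
    for l in range((n - a) // 2 + 1):
        m = n - a - 2 * l
        s += comb(n, l) * (p * q)**l * (1 if m == 0 else p**m + q**m)
    return q**a * s

def ruin_within_dp(a, n, p):
    # Independent check: iterate the partial difference equation (8)
    # with boundary conditions (9) and (10).
    q = 1 - p
    size = a + n + 2                 # only fortunes up to a + n can matter
    y = [Fraction(1)] + [Fraction(0)] * (size - 1)        # t = 0 column
    for _ in range(n):
        inner = [p * y[x + 1] + q * y[x - 1] for x in range(1, size - 1)]
        y = [Fraction(1)] + inner + [Fraction(0)]
    return y[a]

p = Fraction(1, 2)
assert ruin_within(10, 20, p) == ruin_within_dp(10, 20, p)
assert ruin_within(1, 3, Fraction(1, 3)) == Fraction(22, 27)   # q + p*q^2
```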
The second form of solution is obtained if we express α as a function of β. The equation

pα² − αβ + q = 0

having two roots, we shall take for α the root

α = (β − √(β² − 4pq))/(2p)

determined by the condition that it vanishes for infinitely large positive β and can be developed in power series of 1/β when |β| > 2√(pq). Using α in this perfectly determined sense, it is easy to verify that

y_{x,t} = (1/2πi) ∫_c ((β − √(β² − 4pq))/(2p))^x (β^t/(β − 1)) dβ,

where c is a circle of radius > 1 described from 0 as its center, satisfies all the requirements. For it is a solution of equation (8). Next, for x = 0 and t ≥ 0,

y_{0,t} = (1/2πi) ∫_c (β^t/(β − 1)) dβ = 1

and, finally, for t = 0 and x > 0,

y_{x,0} = 0,

because the development of the integrand into power series of 1/β starts at least with the second power of 1/β. To find y_{x,t} in explicit form, it remains to find the coefficient of 1/β in the development of

((β − √(β² − 4pq))/(2p))^x (β^t/(β − 1))

in a series of descending powers of 1/β. Let

((β − √(β² − 4pq))/(2p))^x = l_x/β^x + l_{x+1}/β^{x+1} + · · ·;

multiplying this series by

β^t/(β − 1) = β^{t−1} + β^{t−2} + · · ·,

we find that the coefficient of 1/β in the product is

l_x + l_{x+1} + · · · + l_t,

and hence

y_{x,t} = l_x + l_{x+1} + · · · + l_t

provided t ≥ x, for otherwise y_{x,t} = 0. The quadratic equation in α can be written in the form

α = (1/β)(q + pα²),

and the development of any power of its root vanishing for β = ∞ into power series of 1/β can be obtained by application of Lagrange's series.
By that series

l_n = (x/n!)[d^{n−1}/dξ^{n−1}(ξ^{x−1}(q + pξ²)^n)]_{ξ=0},

whence

l_{x+2i} = (x (x + 2i − 1)!/(i! (x + i)!)) q^{x+i} p^i if n = x + 2i, and l_n = 0 if n = x + 2i + 1.

Hence,

y_{a,n} = l_a + l_{a+2} + l_{a+4} + · · ·,

and finally

(12) y_{a,n} = q^a [1 + (a/1) pq + (a(a + 3)/(1·2))(pq)² + (a(a + 4)(a + 5)/(1·2·3))(pq)³ + · · · + (a(a + k + 1) · · · (a + 2k − 1)/(1·2 · · · k))(pq)^k]

where k = (n − a)/2 or k = (n − 1 − a)/2 according as n and a are of the same parity or not.
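Formula (12) can be confirmed by brute-force enumeration of all win/loss sequences for small n (an independent check; the function names are ours):

```python
from fractions import Fraction
from math import factorial
from itertools import product

def ruin_within_series(a, n, p):
    # Formula (12): y_{a,n} = q^a * sum over i of
    #   a(a+i+1)(a+i+2)...(a+2i-1)/i! * (pq)^i,  i = 0, 1, ..., k
    if n < a:
        return Fraction(0)
    q = 1 - p
    total = Fraction(0)
    for i in range((n - a) // 2 + 1):
        num = a if i > 0 else 1
        for j in range(a + i + 1, a + 2 * i):
            num *= j
        total += Fraction(num, factorial(i)) * (p * q)**i
    return q**a * total

def ruin_within_brute(a, n, p):
    # Enumerate all 2^n win/loss sequences (tiny n only) and add the
    # probability of those in which A's fortune ever reaches 0.
    q = 1 - p
    total = Fraction(0)
    for seq in product((1, -1), repeat=n):
        x = a
        for s in seq:
            x += s
            if x == 0:
                w = seq.count(1)
                total += p**w * q**(n - w)
                break
    return total

p = Fraction(2, 5)
assert ruin_within_series(2, 8, p) == ruin_within_brute(2, 8, p)
assert ruin_within_series(3, 7, p) == ruin_within_brute(3, 7, p)
```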
5. The difference y_{a,n} − y_{a,n−1} gives the probability for the player A to be ruined at exactly the nth game and not before. Now, this difference is 0 if n differs from a by an odd number, so that the probability of ruin at the (a + 2i − 1)st game is 0. That is almost evident, because after every game the fortune of A is increased or diminished by 1 and therefore can be reduced to 0 only if the number of games played is of the same parity as a. If n = a + 2i, the difference y_{a,n} − y_{a,n−1} is

(a(a + i + 1) · · · (a + 2i − 1)/(1·2·3 · · · i)) q^{a+i} p^i.

Such, therefore, is the probability for A to be ruined at exactly the (a + 2i)th game. The remarkable simplicity of this expression obtained by means which are not quite elementary leads to a suspicion that it might also be obtained in a simple way. And, indeed, there is a simple way to arrive at this expression and thus to have a third, elementary, solution of Prob. 3.

Considering the possible results of a series of a + 2i games, let A stand for a game won by A, and B for a game lost by A. The result of every series will thus be represented by a succession of letters A and B. We are interested in finding all the sequences which ruin A at exactly the last game. Because the fortune of A sinks from a to 0, there must be i letters A and i + a letters B in every sequence we consider. Besides, there is another important condition. Let us imagine that the sequence is divided into two arbitrary parts, one containing the first letter and another the last letter of the sequence. Let x be the number of letters B, and y that of letters A, in the second or right part of the sequence. There will be a + i − x letters B and i − y letters A in the first or left part. It means that the fortune of A, after the game corresponding to the last letter in the left part, becomes

a + (i − y) − (a + i − x) = x − y,

and since A cannot be ruined before the (a + 2i)th game, x must always be > y. That is, counting letters A and B from the right end of the sequence, the number of letters B must surpass the number of letters A at every stage. Conversely, if this condition is satisfied the succession represents a series of games resulting in the ruin of A at the end of the series and not before.

To find directly the number of sequences satisfying this requirement is not so easy, and it is much easier, following an ingenious method proposed by D. André, to find the number of all the remaining sequences of i letters A and i + a letters B. These can be divided into two classes: those ending with A and those ending with B. Now, it is easy to show that there exists a one-to-one correspondence between successions of these two classes, so that both classes contain the same number of sequences. For, in a sequence of the second class (ending with B), starting from the right end we necessarily find a shortest group of letters containing A and B in equal numbers. This group must end with A. Writing the letters of this group in reverse order without changing the preceding letters, we obtain a sequence of the first class ending with A. Conversely, in a sequence of the first class there is a shortest group at the right end ending with B and containing an equal number of letters A and B. Writing the letters of this group in reverse order, we obtain a sequence of the second class.

An example will illustrate the described manner of establishing the one-to-one correspondence between sequences of the first and of the second class. Consider a sequence of the first kind

B|BBABAA.

The vertical bar separates the shortest group from the right containing letters A and B in equal numbers. Reversing the order of letters in this group, we obtain a sequence of the second class

B|AABABB,

and this sequence, by application of the above rule, is transformed again into the original sequence of the first class. The number of sequences of the first class can now be easily found. It is the same as the number of all possible sequences of i − 1 letters A and a + i letters B, that is,

(a + 2i − 1)!/((i − 1)!(a + i)!) = (a + i + 1)(a + i + 2) · · · (a + 2i − 1)/(1·2 · · · (i − 1)).

The total number of sequences in both classes is

2 (a + i + 1)(a + i + 2) · · · (a + 2i − 1)/(1·2 · · · (i − 1)).

Hence, the number of sequences leading to ruin of A in exactly a + 2i games is

(a + i + 1)(a + i + 2) · · · (a + 2i)/(1·2 · · · i) − 2 (a + i + 1)(a + i + 2) · · · (a + 2i − 1)/(1·2 · · · (i − 1)) = a(a + i + 1) · · · (a + 2i − 1)/(1·2 · · · i).

As the probability of gains and losses indicated by every such sequence is the same, namely, q^{a+i} p^i, the probability of the ruin of A in exactly a + 2i games is

(a(a + i + 1) · · · (a + 2i − 1)/(1·2·3 · · · i)) q^{a+i} p^i,

and hence the second expression found for y_{a,n} follows immediately.

The problem concerning the probability of ruin in the course of a prescribed number of games for a player playing against an infinitely rich adversary was first considered by de Moivre, who gave both the preceding solutions without proof; it was later solved completely by Lagrange and Laplace. The elementary treatment can be found in Bertrand's "Calcul des probabilités."
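André's count is easy to confirm by direct enumeration for small a and i; the sketch below compares the reflection-argument formula with an explicit search (function names are ours):

```python
from itertools import combinations
from math import comb

def ruin_sequences_direct(a, i):
    # Enumerate sequences of i games won and a+i games lost by A and count
    # those in which A's fortune a reaches 0 exactly at the last game.
    n = a + 2 * i
    count = 0
    for wins in combinations(range(n), i):
        x, hit = a, None
        for t in range(n):
            x += 1 if t in wins else -1
            if x == 0:
                hit = t
                break
        if hit == n - 1:
            count += 1
    return count

def ruin_sequences_andre(a, i):
    # Andre's reflection argument: C(a+2i, i) - 2*C(a+2i-1, i-1)
    #   = a(a+i+1)(a+i+2)...(a+2i-1)/i!
    n = a + 2 * i
    return comb(n, i) - 2 * comb(n - 1, i - 1) if i > 0 else 1

for a in (1, 2, 3):
    for i in (0, 1, 2, 3):
        assert ruin_sequences_direct(a, i) == ruin_sequences_andre(a, i)
```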
6. Formulas (11) and (12), though elegant and useful when n is not large, become impracticable when n is somewhat large, and that is precisely the most interesting case. Since the question of the risk of ruin incurred in playing equitable games possesses special interest, it would not be out of place at least to indicate here, though without proof, a convenient approximate expression for the probability y_{a,n} in case of a large n and p = q = 1/2. Let t be defined by

t = a/√(2(n + 1));

then for n ≥ 50 it is possible to establish the approximate formula

y_{a,n} = 1 − (2/√π) ∫_0^t e^{−z²} dz + θ/(6n)

where −1 < θ < 1. Suppose, for instance, that the fortune of a player amounts to $100, each stake being $1, and he decides to play 1,000, 5,000, 10,000, 100,000, 1,000,000 games. Corresponding to these cases, we find

t = 2.2354, 0.9999, 0.7071, 0.2236, 0.0707

and hence

(2/√π) ∫_0^t e^{−z²} dz = 0.9984, 0.8427, 0.6827, 0.2482, 0.0796.

The corresponding approximate values of y_{100,n} are

0.0016, 0.1573, 0.3173, 0.7518, 0.9204.

Thus, for a player possessing $100 there is very little risk of being ruined in the course of 1,000 games even if he stakes $1 at each game. The risk is considerably larger, but still fairly small, when 5,000 games are played. In 10,000 games we can bet 2 to 1 that the player will still be able to continue. But when the limit set for the number of games becomes 100,000, we can bet 3 to 1 that the player will be ruined somewhere in the course of those 100,000 games. Finally, there is little chance to escape ruin in a series of 1,000,000 games. The risk of ruin naturally increases with the number of games, but not so fast as might appear at first sight.
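For sizes small enough to iterate recurrence (8) directly, the approximation can be compared with the exact value. The tolerance below is our own loose choice, not the stated error term:

```python
from math import erf, sqrt

def exact_ruin(a, n):
    # Exact y_{a,n} for p = q = 1/2 by iterating recurrence (8) in floats
    size = a + n + 2
    y = [1.0] + [0.0] * (size - 1)
    for _ in range(n):
        y = [1.0] + [0.5 * (y[x + 1] + y[x - 1]) for x in range(1, size - 1)] + [0.0]
    return y[a]

def approx_ruin(a, n):
    # 1 - (2/sqrt(pi)) * integral_0^t exp(-z^2) dz  with  t = a / sqrt(2(n+1))
    return 1.0 - erf(a / sqrt(2.0 * (n + 1)))

for a, n in ((10, 100), (10, 400), (20, 400)):
    assert abs(exact_ruin(a, n) - approx_ruin(a, n)) < 0.01
```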
7. We conclude this chapter by solving the following problem, where the fortunes of both players are finite.

Problem 4. Players A and B agree to play not more than n games, the probabilities of winning a single game being p and q, respectively. Assuming that the fortunes of A and B amount to a and b single stakes which are equal for both, find the probability for A to be ruined in the course of n games.

Solution. Let z_{x,t} be the probability for the player A to be ruined when his fortune is x (and that of his adversary a + b − x) and he can play only t games. Evidently z_{x,t} satisfies the equation

(13) z_{x,t} = p z_{x+1,t−1} + q z_{x−1,t−1},

perfectly similar to equation (8), but the complementary conditions serving to determine z_{x,t} completely are different. First we have

(14) z_{0,t} = 1 for t ≥ 0.

Next,

(15) z_{a+b,t} = 0 for t ≥ 0,

because if A gets all the money from B, the games stop and A cannot be ruined. Finally,

(16) z_{x,0} = 0 for x = 1, 2, 3, . . . a + b − 1,

because A, having money left at the end of play, naturally cannot be ruined. Since (13) has two series of particular solutions

α^x β^t and α′^x β^t,

where α and α′ are roots of the equation

pα² − βα + q = 0,

both developable into series of descending powers of β for |β| > 1, we shall seek z_{x,t} in the form

z_{x,t} = (1/2πi) ∫_c [f(β) α^x + φ(β) α′^x] β^t dβ.

Here the integration is made along a circle of sufficiently large radius, and f(β) and φ(β) are two unknown functions which can be developed into series of descending powers of β. Obviously z_{x,t} satisfies (13) identically in x and t. For x = 0 and t ≥ 0 we have the condition

(1/2πi) ∫_c [f(β) + φ(β)] β^t dβ = 1;  t = 0, 1, 2, . . .

which is satisfied if

(17) f(β) + φ(β) = 1/(β − 1).

Condition (15) will be satisfied if

(18) α^{a+b} f(β) + α′^{a+b} φ(β) = 0,

and it remains to show that at the same time (16) is satisfied. Solving (17) and (18), we have

f(β) = (α′^{a+b}/(α′^{a+b} − α^{a+b})) (1/(β − 1));  φ(β) = −(α^{a+b}/(α′^{a+b} − α^{a+b})) (1/(β − 1))

and

(19) f(β) α^x + φ(β) α′^x = (α′^{a+b} α^x − α^{a+b} α′^x)/((β − 1)(α′^{a+b} − α^{a+b})).

Now let α be the root vanishing for β = ∞ and α′ the other root, whose development in series of descending powers of β starts with the term β/p. Evidently αα′ = q/p, so that

α′^{a+b} α^x − α^{a+b} α′^x = (q/p)^x (α′^{a+b−x} − α^{a+b−x}),

and the development of (19) for x = 1, 2, 3, . . . a + b − 1
does not contain terms involving the first power of 1/β; hence z_{x,0} = 0 if x = 1, 2, 3, . . . a + b − 1, as it should be. The solution of (13) satisfying (14), (15), (16) being unique, its analytical expression is therefore

z_{x,t} = (1/2πi) ∫_c (q/p)^x ((α′^{a+b−x} − α^{a+b−x})/(α′^{a+b} − α^{a+b})) (β^t/(β − 1)) dβ,

whence for x = a and t = n

z_{a,n} = (1/2πi) ∫_c (q/p)^a ((α′^b − α^b)/(α′^{a+b} − α^{a+b})) (β^n/(β − 1)) dβ.

To find an explicit expression for z_{a,n} it remains to find the coefficient of 1/β in the development of

P = (q/p)^a ((α′^b − α^b)/(α′^{a+b} − α^{a+b})) (β^n/(β − 1))

in series of descending powers of β. This can be done in two different ways. First we can substitute for α′ its expression q/(pα) and present P in the form

P = [α^a − (p/q)^b α^{a+2b}][1 + (p/q)^{a+b} α^{2a+2b} + (p/q)^{2a+2b} α^{4a+4b} + · · ·] (β^n/(β − 1))

or, developing into series,

P = [α^a − (p/q)^b α^{a+2b} + (p/q)^{a+b} α^{3a+2b} − (p/q)^{a+2b} α^{3a+4b} + · · ·] (β^n/(β − 1)).

But the coefficient of 1/β in

α^m (β^n/(β − 1)),

by the second solution of Prob. 3, is the probability y_{m,n} for a player with a fortune m to be ruined by an infinitely rich player in the course of n games. Hence, the final expression for z_{a,n} is

z_{a,n} = y_{a,n} − (p/q)^b y_{a+2b,n} + (p/q)^{a+b} y_{3a+2b,n} − (p/q)^{a+2b} y_{3a+4b,n} + · · ·,

the terms of this series being alternately of the form

(p/q)^{ka+kb} y_{(2k+1)a+2kb,n} and −(p/q)^{ka+(k+1)b} y_{(2k+1)a+(2k+2)b,n}

for k = 0, 1, 2, . . . The series stops by itself as soon as the first subscript of y_{x,n} becomes greater than n.

To obtain a second expression of z_{a,n} we notice that

(q/p)^a ((α′^b − α^b)/(α′^{a+b} − α^{a+b}))

is a rational function of β whose denominator

R = (α′^{a+b} − α^{a+b})/(α′ − α)

is a polynomial in β of the degree a + b − 1. To find the roots of R = 0, we set

β = 2√(pq) cos φ.

Since, then,

α = √(q/p)(cos φ − i sin φ);  α′ = √(q/p)(cos φ + i sin φ),

we have

R = (q/p)^{(a+b−1)/2} (sin (a + b)φ/sin φ).

The equation

sin (a + b)φ/sin φ = 0

having roots

φ_h = hπ/(a + b);  h = 1, 2, . . . a + b − 1,

the a + b − 1 roots of R are β_h = 2√(pq) cos φ_h. Now we can resolve the rational function P into a sum of simple elements as follows:

P = E(β) + A_0/(β − 1) + Σ_{h=1}^{a+b−1} A_h/(β − β_h),

where

A_0 = ((q/p)^a − (q/p)^{a+b})/(1 − (q/p)^{a+b})

and for h > 0

A_h = −(2√(pq))^{n+1} (q/p)^{a/2} (sin φ_h sin aφ_h (cos φ_h)^n)/((a + b)(1 − 2√(pq) cos φ_h)),

while E(β) is the integral part of P. The coefficient of 1/β in the development of P being

A_0 + A_1 + A_2 + · · · + A_{a+b−1},

we have a new explicit expression for z_{a,n}:

(20) z_{a,n} = ((q/p)^a − (q/p)^{a+b})/(1 − (q/p)^{a+b}) − ((2√(pq))^{n+1} (q/p)^{a/2}/(a + b)) Σ_{h=1}^{a+b−1} (sin φ_h sin aφ_h (cos φ_h)^n)/(1 − 2√(pq) cos φ_h).

This expression shows clearly that z_{a,n}, with increasing n, approaches the limit

((q/p)^a − (q/p)^{a+b})/(1 − (q/p)^{a+b}),

representing the probability of ruin when the number of games is unlimited, in complete accord with the solution of Prob. 1. The first term in (20) naturally must be replaced by b/(a + b) in case p = q = 1/2. This form of solution was given first by Lagrange.
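Formula (20) can be checked against a direct iteration of recurrence (13) with conditions (14)-(16); a minimal sketch, with function names of our own:

```python
from math import pi, sin, cos, sqrt

def ruin_20(a, b, n, p):
    # Formula (20): ruin probability of A within n games, fortunes a and b
    q = 1 - p
    if abs(p - q) < 1e-15:
        head = b / (a + b)                       # equitable case
    else:
        r = q / p
        head = (r**a - r**(a + b)) / (1 - r**(a + b))
    s = 0.0
    for h in range(1, a + b):
        phi = h * pi / (a + b)
        s += sin(phi) * sin(a * phi) * cos(phi)**n / (1 - 2 * sqrt(p * q) * cos(phi))
    return head - (2 * sqrt(p * q))**(n + 1) * (q / p)**(a / 2) / (a + b) * s

def ruin_dp(a, b, n, p):
    # Independent check: iterate recurrence (13) with both absorbing barriers
    q = 1 - p
    z = [1.0] + [0.0] * (a + b)
    for _ in range(n):
        z = [1.0] + [p * z[x + 1] + q * z[x - 1] for x in range(1, a + b)] + [0.0]
    return z[a]

for a, b, n, p in ((1, 2, 7, 0.5), (2, 3, 20, 0.5), (3, 4, 15, 0.6)):
    assert abs(ruin_20(a, b, n, p) - ruin_dp(a, b, n, p)) < 1e-9
```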
Problems for Solution

1. Players A and B with fortunes of $50 and $100, respectively, agree to play until one of them is ruined, and they stake $1 at each game. The probabilities of winning a single game are 2/3 and 1/3, respectively, for A and B. What is the probability of ruin for the player A? Ans. Very nearly (1/2)^50 = 8.88·10^−16.

2. If A and B at each single game stake $3 and $2, respectively, and have fortunes of $30 and $20 at the beginning, what is the approximate value of the probability that A will be ruined if the probability of his winning a single game is (a) p = 3/5; (b) p = 1/2? Ans. (a) 0.40 + d; |d| < 1.7 × 10^−2; (b) 0.96 + d; |d| < 4.6 × 10^−3.

3. A player A with the fortune $a plays an unlimited number of games against an infinitely rich adversary with the probability p of winning a single game. He stakes $1 at each game, while his rich adversary risks staking such a sum β as to make the game favorable to A. What is the probability that A will be ruined in the course of the games? Give numerical results if (a) a = 10, p = 1/2, β = 3; (b) a = 100, p = 1/2, β = 3. Ans. Let θ < 1 be a positive root of the equation pθ^{β+1} − θ + q = 0. The required probability P is: P = θ^a. In case (a) P = 0.002257; in case (b) P = 3.43·10^−27.

4. A player A whose fortune is $10 agrees to play not more than 20 games against an infinitely rich adversary, both staking $1 with an equal probability of winning a single game. What is the probability that A will not be ruined in the course of 20 games? Ans. 0.9734.

5. Players A and B with $1 and $2, respectively, agree to play not more than n equitable games, staking $1 at each game. What are the probabilities of their ruin in n games? Ans.

For A: 2/3 − (3 + (−1)^n)/(3·2^{n+1});  for B: 1/3 − (3 − (−1)^n)/(3·2^{n+1}).

6. Players A and B with $2 and $3, respectively, play a series of equitable games, both staking $1 at each game. What are the probabilities of their ruin in n games? Give the numerical result if n = 20. Ans.

For A: 3/5 − (4/5){((√5 + 1)/4)^{n+ε} + ((√5 − 1)/4)^{n+ε}};  ε = 1 if n is odd, ε = 2 if n is even.
For B: 2/5 − (4/5){((√5 + 1)/4)^{n+η} − ((√5 − 1)/4)^{n+η}};  η = 1 if n is even, η = 2 if n is odd.

For n = 20 these give 0.5924 for A and 0.3907 for B.
7. Find the expression of y_{a,n}, the probability of the ruin of A when his adversary B is infinitely rich, corresponding to formula (20). Ans. From the definition of a definite integral it follows that

y_{a,n} = y_{a,∞} − ((2√(pq))^{n+1}/π)(q/p)^{a/2} ∫_0^π (sin φ sin aφ (cos φ)^n)/(1 − 2√(pq) cos φ) dφ

where y_{a,∞} = (q/p)^a if p > q and y_{a,∞} = 1 if p ≤ q. If the games are equitable and n differs from a by an even number, then

y_{a,n} = 1 − (2/π) ∫_0^{π/2} (sin aφ/sin φ)(cos φ)^{n+1} dφ.
2!
+
Find the general expression for
+ ..
.
+
= 1'.
Let
F(x) =
·
·
·
be the generating function of
x
x•
x3
e�=1++++ 31 21 11
Ans.
179
MATHEMATICAL EXPECTATION we find
e•F(z)
=
1
1 1:r;
+ :r; + z1 + . . .
=
or ez =·
F(z)
1:r;
whence
=
1 1 1  + 2! 1!


·
·
·
(1)"
+
n!

.
6. The total number of balls in an urn is known, but the number of white balls depends on chance and only its mathematical expectation is known. Find the prob
Am. Let N be the total number of balls and M the The required probability is M jN. 7. Two urns contain, respectively, a white and b black and a white and {J black balls. A certain number c (naturally not exceeding a + b) of balls is transferred
ability of drawing a white ball.
expectation of the number of white balls.
from the first urn into the second.
What is the probability of drawing a white ball Am. The required probability is
from the second urn after the transfers?
8. An urn contains
a
white and b black balls.
Mter a ball is drawn, it is to be
returned to the urn if it is white; but if it is black, it is to be replaced by a white ball from another urn.
What is the probability of drawing a white ball after the foregoing
operation has been repeated
z
number of white balls after
operations.
:r;
times?
Ans. Denote by ·M., the expectation of the From the equation
the following expression for M,. can be derived:
M
"
=
)
(
1 a + b  b l  "' a+b ·
·
It follows that the required probability is
P
=
l
a
:( b
l
a
� )"' b
·
INTRODUCTION TO MATHEMATICAL PROBABILITY

9. Urns 1 and 2 contain, respectively, a white and b black and c white and d black balls. One ball is taken from the first urn and transferred into the second, while simultaneously one ball taken from the second urn is transferred into the first. What is the probability of drawing a white ball from the first urn after such an exchange has been repeated x times?

Ans. Let M_x and P_x represent the mathematical expectations of the number of white balls in the first and second urn after x exchanges. Then

M_x + P_x = a + c,

whence

M_x = (a + c)(a + b)/(a + b + c + d) + (ad − bc)/(a + b + c + d) · (1 − 1/(a + b) − 1/(c + d))^x,

and the required probability is M_x/(a + b).
10. An urn contains pN white and qN black balls, the total number of balls being N. Balls are drawn one by one (without being returned to the urn) until a certain number n of balls is reached. What is the dispersion of the number m of white balls drawn?

Ans. Let x_i = 1 if the ith ball drawn is white and x_i = 0 if it is black. We have E(m) = np, and the required dispersion is

D = E(m − np)² = npq · (N − n)/(N − 1).

11. In a lottery containing n numbers (1, 2, 3, . . . n), m numbers are drawn at a time. Let x_i represent the frequency of a specified number i in N drawings. Prove that

E(x_i) = Np; E(x_i − Np)² = Npq; E(x_i − Np)(x_j − Np) = Np(p′ − p) (i ≠ j),

where

p = m/n, p′ = (m − 1)/(n − 1), q = 1 − p.
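The three formulas of Prob. 11 can be verified by exact enumeration over every possible sequence of drawings; the lottery size n = 4, m = 2, N = 2 below is an illustrative assumption kept small so that all 36 outcomes can be listed:

```python
# Prob. 11: exact check of E(x_i), E(x_i - Np)**2 and the covariance
# for a small illustrative lottery (n = 4 numbers, m = 2 drawn, N = 2 drawings).
from itertools import combinations, product
from fractions import Fraction

n, m, N = 4, 2, 2
p, pp = Fraction(m, n), Fraction(m - 1, n - 1)      # p and p' of the text
draws = list(combinations(range(n), m))

def expect(f):
    total = sum(Fraction(f(o)) for o in product(draws, repeat=N))
    return total / len(draws)**N

x1 = lambda o: sum(1 in d for d in o)               # frequency of number 1
x2 = lambda o: sum(2 in d for d in o)               # frequency of number 2
mean = expect(x1)
var  = expect(lambda o: (x1(o) - N*p)**2)
cov  = expect(lambda o: (x1(o) - N*p) * (x2(o) - N*p))
```

Exact rational arithmetic makes the comparison with Np, Npq, and Np(p′ − p) an equality test, not an approximation.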
12. Let z_i = (x_i − Np)² − Npq. Show that the dispersion of the sum

Z = z_1 + z_2 + · · · + z_n

is

D = 2N(N − 1)(npq)²/(n − 1).

Indication of the Proof. Let N variables ξ_1, ξ_2, . . . ξ_N be defined as follows:

ξ_k = −p if in the kth drawing the number i fails to appear;
ξ_k = q if in the kth drawing the number i appears.

In a similar way, we can define N variables η_1, η_2, . . . η_N associated with a number j ≠ i. Since

x_i − Np = ξ_1 + ξ_2 + · · · + ξ_N; x_j − Np = η_1 + η_2 + · · · + η_N,

and the pairs (ξ_k, η_k) belonging to different drawings are independent, we have

E(e^(u(x_i − Np) + v(x_j − Np))) = E(e^(uξ_1 + vη_1)) E(e^(uξ_2 + vη_2)) · · · E(e^(uξ_N + vη_N)).

But

E(e^(uξ_k + vη_k)) = pp′ e^(qu+qv) + p(1 − p′) e^(qu−pv) + p(1 − p′) e^(−pu+qv) + (q − p + pp′) e^(−pu−pv) = F(u, v),

so that

E(e^(u(x_i − Np) + v(x_j − Np))) = F(u, v)^N.

It suffices to expand both members into power series in u and v and compare the terms involving u²v² to find E(z_i z_j); i ≠ j. The rest does not present serious difficulties except for somewhat complicated calculations.
13. A box contains 2^n tickets among which C(n, i) tickets bear the number i (i = 0, 1, 2, . . . n). A group of m tickets is drawn; denoting by s the sum of their numbers, it is required to find the expectation E and the dispersion D of s.

Ans.

E = mn/2; D = mn/4 − m(m − 1)n/(4(2^n − 1)).
14. A box contains k varieties of objects, the number of objects of each variety being the same. These objects are drawn one at a time and put back before the next drawing. Denoting by n the smallest number of drawings which produce objects of all varieties, find E(n) and E(n²).

Ans.

E(n) = k(1 + 1/2 + 1/3 + · · · + 1/k);
E(n²) = k²(1 + 1/2 + · · · + 1/k)² + k²(1 + 1/2² + · · · + 1/k²) − k(1 + 1/2 + · · · + 1/k).

Use the result of Prob. 12, p. 41.
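The answers to Prob. 13 can be confirmed by brute-force enumeration of every possible group of tickets; the values n = 4 (so 2⁴ = 16 tickets) and m = 3 below are small illustrative assumptions:

```python
# Prob. 13: exact E and D of the sum s for 2**n tickets bearing the number i
# with multiplicity C(n, i); illustrative n = 4 (16 tickets), m = 3 drawn.
from itertools import combinations
from fractions import Fraction

n, m = 4, 3
tickets = [bin(t).count("1") for t in range(2**n)]  # value i occurs C(n, i) times
groups = list(combinations(tickets, m))
E  = Fraction(sum(sum(g) for g in groups), len(groups))
E2 = Fraction(sum(sum(g)**2 for g in groups), len(groups))
D  = E2 - E**2
```

Both quantities come out as exact fractions, so they can be compared with the printed formulas directly.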
References

A. MARKOFF: "Wahrscheinlichkeitsrechnung," pp. 45 ff., Leipzig, 1912.
G. CASTELNUOVO: "Calcolo delle probabilità," vol. 1, pp. 35–62, Bologna, 1925.
J. BERTRAND: "Calcul des probabilités," Paris, 1889.
CHAPTER X

THE LAW OF LARGE NUMBERS

1. The developments of the preceding chapter, combined with a simple lemma due to Tshebysheff, lead in a natural and easy way to a far-reaching generalization of Bernoulli's theorem, known under the name of the "law of large numbers."
Tshebysheff's Lemma. Let u be a variable which does not assume negative values, and a its mathematical expectation. The probability of the inequality

u ≤ at²

is always greater than

1 − 1/t²,

whatever t may be.

Proof. Let

u_1, u_2, . . . u_n

be all the possible values of the variable u and

p_1, p_2, . . . p_n

their respective probabilities. By the definition of mathematical expectation, we have

(1) a = p_1 u_1 + p_2 u_2 + · · · + p_n u_n.

We may suppose the notations so chosen that

u_1, u_2, . . . u_α

are all the values of u which are ≤ at², the remaining values

u_(α+1), . . . u_n

being > at². If all the terms in (1) with subscripts 1, 2, . . . α are dropped, the left-hand member can only be diminished, since these terms are positive or at least non-negative by hypothesis. We have, therefore,

a ≥ p_(α+1) u_(α+1) + · · · + p_n u_n.

But as u_i > at² for i = α + 1, α + 2, . . . n, a still stronger inequality

a > at² (p_(α+1) + · · · + p_n)

will hold, whence

p_(α+1) + · · · + p_n < 1/t².

Here the left-hand member represents the probability Q of the inequality

u > at²,

because this inequality can materialize only in the following mutually exclusive forms: either u = u_(α+1), or u = u_(α+2), . . . , or u = u_n, whose probabilities are, respectively, p_(α+1), p_(α+2), . . . p_n. Thus

Q < 1/t².

Since P, the probability of the opposite inequality u ≤ at², satisfies P + Q = 1, we must have

P > 1 − 1/t²,

which proves the lemma.
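The lemma can be checked exactly on any discrete nonnegative variable; the distribution below is an arbitrary illustrative choice:

```python
# Tshebysheff's lemma: P(u <= a*t**2) > 1 - 1/t**2 for a nonnegative u
# with expectation a; exact check on an arbitrary discrete distribution.
from fractions import Fraction

values = [0, 1, 3, 10, 50]
probs  = [Fraction(40, 100), Fraction(30, 100), Fraction(20, 100),
          Fraction(9, 100), Fraction(1, 100)]
a = sum(p*u for p, u in zip(probs, values))          # E(u) = 23/10
for t2 in (Fraction(3, 2), Fraction(2), Fraction(4), Fraction(9)):
    P = sum(p for p, u in zip(probs, values) if u <= a*t2)
    assert P > 1 - 1/t2                              # the lemma's bound
```

Because the arithmetic is done with exact fractions, the inequality is tested rigorously rather than up to rounding error.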
2. Let x_1, x_2, . . . x_n be a set of stochastic variables and a_1, a_2, . . . a_n their respective expectations. The dispersion of the sum

x_1 + x_2 + · · · + x_n,

which we shall denote by B_n, is, by definition, the mathematical expectation of the variable

u = (x_1 + x_2 + · · · + x_n − a_1 − a_2 − · · · − a_n)².

Tshebysheff's lemma, applied to this variable u, shows that the probability of the inequality

(x_1 + x_2 + · · · + x_n − a_1 − a_2 − · · · − a_n)² ≤ B_n t²

is greater than

1 − 1/t².

But the preceding inequality is equivalent to the two inequalities

−t√B_n ≤ x_1 + x_2 + · · · + x_n − a_1 − a_2 − · · · − a_n ≤ t√B_n,

or, dividing through by n,

−t√(B_n/n²) ≤ (x_1 + x_2 + · · · + x_n)/n − (a_1 + a_2 + · · · + a_n)/n ≤ t√(B_n/n²).

Hence, the probability of these inequalities for an arbitrary positive t is greater than

1 − 1/t².

Let ε be an arbitrary positive number. Defining t by the equation

t√(B_n/n²) = ε, whence t² = n²ε²/B_n,

we arrive at the following conclusion: The probability P of the two inequalities, equivalent to the single inequality

|(x_1 + x_2 + · · · + x_n)/n − (a_1 + a_2 + · · · + a_n)/n| ≤ ε,

is greater than

1 − B_n/(n²ε²).

Thus far nothing has been supposed about the behavior of B_n for indefinitely increasing n. We shall now suppose that the quotient B_n/n² tends to 0 as n increases indefinitely. Then, having chosen two arbitrarily small positive numbers ε and η, a number n_0 can be found so that the inequality

B_n/(n²ε²) < η

will hold for n > n_0. Consequently, we shall have

P > 1 − η

for all n > n_0. This conclusion leads to the following important theorem due, in the main, to Tshebysheff:
The Law of Large Numbers. With the probability approaching 1 or certainty as near as we please, we may expect that the arithmetic mean of the values actually assumed by n stochastic variables will differ from the arithmetic mean of their expectations by less than any given number, however small, provided the number of variables can be taken sufficiently large and provided the condition

B_n/n² → 0 as n → ∞

is fulfilled.

If, instead of the variables x_i, we consider the new variables z_i = x_i − a_i with means = 0, the same theorem can be stated as follows: For a fixed ε > 0, however small, the probability of the inequality

|(z_1 + z_2 + · · · + z_n)/n| ≤ ε

tends to 1 as a limit when n increases indefinitely, provided

B_n/n² → 0.

This theorem is very general. It holds for independent or dependent variables indifferently if the sufficient condition for its validity, namely, that

B_n/n² → 0 as n → ∞,

is fulfilled.
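A quick empirical illustration (not part of the text's argument): bounded independent variables satisfy B_n/n² → 0 automatically, so the means of repeated samples should concentrate around the common expectation. The die-rolling setup, seed, and tolerance below are arbitrary assumptions:

```python
# Law of large numbers, empirically: the mean of n fair-die throws
# (E(x) = 3.5) stays within eps of 3.5 in almost every trial once n is large.
import random

random.seed(1)
n, trials, eps = 10_000, 200, 0.1
hits = sum(
    abs(sum(random.randint(1, 6) for _ in range(n))/n - 3.5) <= eps
    for _ in range(trials)
)
freq = hits / trials                # empirical probability of the inequality
```

With n = 10,000 the standard deviation of the mean is about 0.017, so a tolerance of 0.1 is rarely exceeded.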
3. This condition, which is recognized as sufficient, is at the same time necessary if the variables z_1, z_2, . . . z_n are uniformly bounded; that is, if a constant number C (one independent of n) can be found so that all particular values of z_i (i = 1, 2, . . . n) are numerically less than C. Let P, as before, denote the probability of the inequality

|z_1 + z_2 + · · · + z_n| ≤ nε.

Then the probability of the opposite inequality

|z_1 + z_2 + · · · + z_n| > nε

will be 1 − P. Now, by definition,

B_n = E(z_1 + z_2 + · · · + z_n)²,

whence one can easily derive the inequality

B_n/n² < C²(1 − P) + ε²P < ε² + C²(1 − P).

If the law of large numbers holds, 1 − P converges to 0 when n increases indefinitely, so that the right-hand member for sufficiently large n becomes less than any given number, and that implies

B_n/n² → 0,

which proves the statement.

4. There is an important case in which the law of large numbers certainly holds; namely, when the variables x_1, x_2, . . . x_n are independent and the expectations of their squares are bounded. Then a constant number C exists such that

b_i = E(x_i²) < C for i = 1, 2, 3, . . .

On the other hand, for independent variables

B_n = Σ from i = 1 to n of (b_i − a_i²) ≤ Σ from i = 1 to n of b_i < nC,

and

B_n/n² → 0 as n → ∞.

The expectations of the squares are bounded, for instance, when all the variables are uniformly bounded, which is true, in particular, for "identical" or "equal" variables. Variables are said to be identical if they possess the same set of values with the same corresponding probabilities.
5. E. Czuber made a complete investigation of the results of 2,854 drawings in a lottery operated in Prague between 1754 and 1886. It consisted of 90 numbers, of which 5 were taken in each drawing. From Czuber's book "Wahrscheinlichkeitsrechnung," vol. 1, p. 141 (2d ed., 1908), we reprint the following table. With the 2,854 drawings, we associate 2,854 variables, x_1, x_2, . . . x_2854, representing the sums of the five numbers appearing in each of the 2,854 drawings. These variables are identical and independent with the common mathematical expectation 227.5. Hence, by the law of large numbers, we can expect that the arithmetic mean of the actually observed values of these variables will not notably differ from 227.5.

Numbers                 | Their frequency m | Difference m − 158
6                       | 138 | −20
39, 65                  | 139 | −19
16, 41, 76, 87          | 142 | −16
18, 44, 47              | 143 | −15
72, 80                  | 144 | −14
12                      | 145 | −13
21, 53                  | 146 | −12
70                      | 147 | −11
24, 32, 55, 69          | 149 |  −9
27, 64, 75              | 150 |  −8
2, 14, 56, 79, 86       | 151 |  −7
81                      | 152 |  −6
23, 29, 85              | 153 |  −5
19, 35, 42, 74          | 154 |  −4
13, 34, 40, 67, 88      | 155 |  −3
7, 20, 59               | 156 |  −2
11, 52, 68              | 157 |  −1
17, 82                  | 158 |   0
15, 90                  | 159 |   1
58                      | 160 |   2
8, 25, 36               | 161 |   3
22                      | 162 |   4
33, 57                  | 163 |   5
51                      | 164 |   6
3, 43, 45, 48           | 165 |   7
10, 26, 66              | 166 |   8
1, 5, 60, 84            | 167 |   9
50, 62                  | 168 |  10
9, 61, 63               | 170 |  12
54, 73                  | 171 |  13
49, 71, 78              | 172 |  14
28                      | 173 |  15
37                      | 176 |  18
30, 46                  | 177 |  19
89                      | 178 |  20
31                      | 179 |  21
38                      | 184 |  26
4                       | 185 |  27
77                      | 186 |  28
83                      | 189 |  31

To form the sum
we must multiply the frequencies given in the preceding table by the sums of the corresponding numbers. To simplify the task we notice that all numbers from 1 to 90 actually appeared. Hence, we multiply the sum of these numbers, 4,095, by 158, which gives

4095 · 158 = 647,010,

and then add to this number the sum of the differences m − 158 multiplied by the sums of the numbers in the same lines. The results are:

Sum of positive products: 22,336
Sum of negative products: 19,587.

Hence

s = 647,010 + 22,336 − 19,587 = 649,759

and

s/2854 = 227.67,

which differs very little from the expected value 227.5. An even larger difference would be in perfect agreement with the law of large numbers, since 2,854, the number of variables, is not very great.
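The arithmetic of this check is easy to reproduce; the figures used below are the ones quoted in the text:

```python
# Prague lottery check: the expected sum of 5 numbers out of 1..90 is 227.5;
# the observed total s reproduces the text's 647,010 + 22,336 - 19,587.
expected = 5 * sum(range(1, 91)) / 90        # 5 * 45.5 = 227.5
s = 4095 * 158 + 22336 - 19587               # = 649,759
mean = s / 2854                              # observed arithmetic mean
```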
6. The two experiments reported in this section were made by the author in spare moments. In the first experiment 64 tickets bearing the numbers 0, 1, 2, 3, 4, 5, 6 and occurring in the following proportions:

Number . . . . .    0   1   2   3   4   5   6
Frequency . . .     1   6  15  20  15   6   1

were vigorously agitated in a tin can and then 10 tickets were drawn at a time and their numbers added. Altogether 2,500 such drawings were made and their results carefully recorded. From these records we derive Tables I and II.

TABLE I

Number | Frequency observed | Expected frequency | Discrepancy
0      |   404   |   390.625  | +13.375
1      | 2,321   | 2,343.75   | −22.75
2      | 5,850   | 5,859.375  |  −9.375
3      | 7,863   | 7,812.5    | +50.5
4      | 5,821   | 5,859.375  | −38.375
5      | 2,344   | 2,343.75   |  +0.25
6      |   397   |   390.625  |  +6.375

The next table gives the absolute values of the differences s − 30, where s is the sum of the numbers on the 10 tickets drawn at one time, and their respective frequencies. From Table I it is easy to find that the arithmetic mean of all 2,500 sums observed is

74,996/2,500 = 29.9984,
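The theoretical column of Table I and the dispersion quoted below both follow from the binomial make-up of the tickets together with Prob. 13 (with n = 6, m = 10, and 2⁶ = 64 tickets); a quick recomputation:

```python
# Table I expected frequencies: 25,000 tickets seen in all, value k with
# probability C(6, k)/64; dispersion of a 10-ticket sum from Prob. 13.
from math import comb

n, m, draws = 6, 10, 2500
expected_freq = [draws * m * comb(n, k) / 2**n for k in range(n + 1)]
E = m * n / 2                                        # = 30
D = m * n / 4 - m * (m - 1) * n / (4 * (2**n - 1))   # = 12.857...
```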
whereas the expectation of each of the 2,500 identical variables under consideration, by Prob. 13, page 181, is 30. By the same problem the dispersion of s, that is, E(s − 30)², is 12.857.

TABLE II

|s − 30| | Frequency observed
0        | 246
1        | 549
2        | 479
3        | 379
4        | 324
5        | 241
6        | 129
7        |  71
8        |  44
9        |  25
10       |   8
11       |   4
12       |   1

On the other hand, from Table II we find that

Σ(s − 30)² = 31,477

and

Σ(s − 30)²/2,500 = 12.5908,
fairly close to 12.857.

In the second experiment we tried to produce cards of every suit in drawings of one card at a time (n being the smallest number of drawings required), each card taken being returned before the next drawing. By Prob. 14, page 181, we find that the expectation and the dispersion of this number are, respectively, 8⅓ and 14.44. Altogether 3,000 values of n were recorded, of which 33 was the largest. The values of the difference n − 8 are given in Table III.

TABLE III

n − 8 | Frequency | n − 8 | Frequency | n − 8 | Frequency
 −4   |   282     |   6   |    77     |  16   |    3
 −3   |   420     |   7   |    50     |  17   |    5
 −2   |   426     |   8   |    40     |  18   |    2
 −1   |   407     |   9   |    31     |  19   |    1
  0   |   348     |  10   |    17     |  20   |    3
  1   |   247     |  11   |    15     |  21   |    1
  2   |   228     |  12   |    13     |  22   |    1
  3   |   146     |  13   |     6     |  23   |    1
  4   |   116     |  14   |     9     |  24   |    0
  5   |    88     |  15   |     5     |  25   |    1

From this table we find

Σ(n − 8) = 965; Σ(n − 8)² = 43,106,

whence

Σ(n − 8⅓)² = Σ(n − 8)² − ⅔ Σ(n − 8) + 3,000/9 = 42,796 and Σn = 24,965.

By the law of large numbers we may expect that the quotients

Σ(n − 8⅓)²/3,000 and Σn/3,000

will not considerably differ from 14.44 and 8⅓, respectively. As a matter of fact,

Σn/3,000 = 8.322; Σ(n − 8⅓)²/3,000 = 14.265.

There is a very satisfactory agreement between the theory and this experiment in another respect. Of the 24,965 cards drawn there were

6,304 hearts
6,236 diamonds
6,131 clubs
6,294 spades,

whereas the expected number for each suit is 6,241.25.
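The theoretical values 8⅓ and 14.44 come from Prob. 14 with k = 4 varieties; recomputing them exactly:

```python
# Coupon-collector values for k = 4 suits (Prob. 14): E(n) = k*H_k = 25/3,
# and the dispersion E(n**2) - E(n)**2 = k**2*(1 + 1/4 + 1/9 + 1/16) - k*H_k.
from fractions import Fraction

k = 4
H1 = sum(Fraction(1, j) for j in range(1, k + 1))        # 1 + 1/2 + 1/3 + 1/4
H2 = sum(Fraction(1, j*j) for j in range(1, k + 1))
En = k * H1                                              # = 25/3 = 8 1/3
Dn = k*k*H2 - k*H1                                       # dispersion of n
```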
7. So far, we have dealt with stochastic variables having only a finite number of values. However, the notion of mathematical expectation, and the propositions essentially based on this notion, can be extended to variables with infinitely many values. Here we shall consider the simplest case of variables with a countable set of values that can be arranged in a sequence

· · · < a_(−2) < a_(−1) < a_0 < a_1 < a_2 < · · ·

in the order of their magnitude. With this sequence is associated the sequence of probabilities

. . . , p_(−2), p_(−1), p_0, p_1, p_2, . . .

so that in general p_i is the probability for x to assume the value a_i. These probabilities are subject to the condition that the series

Σp_i = · · · + p_(−2) + p_(−1) + p_0 + p_1 + p_2 + · · ·

must be convergent with the sum 1. The definition of mathematical expectation is essentially the same as that for variables with a finite number of values, but instead of a finite sum we have an infinite series

E(x) = Σ p_i a_i,

provided this series is convergent (it is absolutely convergent, if convergent at all). If this series is divergent, it is meaningless to speak of the mathematical expectation of x. Likewise, the mathematical expectation of any function φ(x) is defined as the sum of the series

E{φ(x)} = Σ p_i φ(a_i),

provided the latter is convergent. It can easily be seen that the various theorems established in Chap. IX, as well as Tshebysheff's lemma, continue to hold when the various mathematical expectations involved exist. The law of large numbers follows, as a simple corollary, from Tshebysheff's lemma if the following requirements are fulfilled:

a. The mathematical expectations of all the variables x_1, x_2, x_3, . . . exist.
b. The dispersion B_n of the sum x_1 + x_2 + · · · + x_n exists.
c. The quotient B_n/n² tends to 0 as n tends to infinity.

The first requirement is absolutely indispensable. Without it the theorem itself cannot be stated. The second requirement (not to speak of the third) need not be fulfilled; and still the law of large numbers may hold, as Markoff pointed out.
8. Let x_1, x_2, x_3, . . . be independent variables. If for every i the mathematical expectation

b_i = E(x_i²)

exists, the quantity B_n exists also. But if at least one of these expectations does not exist, the quantity B_n has no meaning. However, the following theorem, due to Markoff, holds:

Theorem. The law of large numbers holds, provided that for some δ > 0 all the mathematical expectations

E(|x_i|^(1+δ)); i = 1, 2, 3, . . .

exist and are bounded.

Proof. For the sake of simplicity we may assume that

E(x_i) = 0; i = 1, 2, 3, . . .

For, supposing E(x_i) = a_i; i = 1, 2, 3, . . . , we may consider, instead of x_i, the new variables

z_i = x_i − a_i.

Then E(z_i) = 0, and it remains to prove the existence and boundedness of

E(|z_i|^(1+δ)); i = 1, 2, 3, . . .

The proof follows immediately from the inequalities

|x_i − a_i|^(1+δ) ≤ 2^δ {|x_i|^(1+δ) + |a_i|^(1+δ)};
|a_i|^(1+δ) ≤ E(|x_i|^(1+δ)),

the first of which is well known; the second is a particular case of Liapounoff's inequality, established in Chap. XIII, page 265. Thus, from the outset we are entitled to assume that E(x_i) = 0.
The proof of the theorem is based on a very ingenious and useful device due to Markoff. Let N be a positive number which later we shall increase indefinitely. Together with x_i, we shall consider two new variables, u_i and v_i, defined as follows: α being a particular value of x_i, the corresponding values of u_i and v_i are

u_i = α, v_i = 0 if |α| ≤ N;
u_i = 0, v_i = α if |α| > N.

Thus the stochastic variables u_i and v_i are completely defined. Evidently

x_i = u_i + v_i,

whence

0 = E(u_i) + E(v_i),

and, c denoting a common upper bound of the expectations E(|x_i|^(1+δ)),

E(|v_i|^(1+δ)) ≤ E(|x_i|^(1+δ)) < c

by hypothesis. Since v_i is either 0 or its absolute value is > N, we have

N^δ E(|v_i|) ≤ E(|v_i|^(1+δ)) < c,

whence

(2) E(|v_i|) < c/N^δ and |E(u_i)| = |E(v_i)| < c/N^δ.

Likewise, the probability q_i for v_i ≠ 0 satisfies the inequality

N^(1+δ) q_i ≤ E(|v_i|^(1+δ)) < c,

whence

(3) q_i < c/N^(1+δ).
Now let us consider the two inequalities

(4) |(u_1 + u_2 + · · · + u_n)/n| < ε;
(5) |(x_1 + x_2 + · · · + x_n)/n| < ε,

where ε is an arbitrary positive number, and let P_0 and P be their respective probabilities. The inequalities (4) and (5) coincide when

v_1 = v_2 = · · · = v_n = 0.

With this supplementary condition they have the same probability Q. But they can hold also when at least one of the numbers v_1, v_2, . . . v_n is different from 0. Let the probabilities of (4) and (5) under such circumstances be R_0 and R. Then

P_0 = Q + R_0; P = Q + R.

But evidently neither R_0 nor R can exceed the probability that at least one number in the series v_1, v_2, . . . v_n is different from 0; this probability in turn does not exceed (see Chap. II, page 30)

q_1 + q_2 + · · · + q_n < nc/N^(1+δ).

Hence

(6) |P − P_0| < nc/N^(1+δ).

On the other hand, since none of the values of u_i (i = 1, 2, . . . n) exceeds N numerically, we have

E(u_i²) ≤ N^(1−δ) E(|u_i|^(1+δ)) ≤ N^(1−δ) E(|x_i|^(1+δ)) < cN^(1−δ).

Accordingly, the dispersion of the sum u_1 + u_2 + · · · + u_n will be less than

cnN^(1−δ).

Hence, by what has been proved in Sec. 2, writing β_i = E(u_i), the probability of the inequality

(7) |(u_1 + u_2 + · · · + u_n)/n − (β_1 + β_2 + · · · + β_n)/n| ≤ ε/2
is greater than

1 − 4cN^(1−δ)/(ε²n).

But whenever (7) is satisfied, the inequality

|(u_1 + u_2 + · · · + u_n)/n| ≤ ε/2 + |β_1 + β_2 + · · · + β_n|/n

is also satisfied; hence the probability of this last inequality is a fortiori greater than the same bound. Owing to inequalities (2),

|β_1 + β_2 + · · · + β_n|/n < c/N^δ,

so that with at least the same probability

|(u_1 + u_2 + · · · + u_n)/n| < ε/2 + c/N^δ.

Hence, as soon as c/N^δ < ε/2,

P_0 > 1 − 4cN^(1−δ)/(ε²n),

and on account of (6)

P > 1 − 4cN^(1−δ)/(ε²n) − nc/N^(1+δ).

Now we can dispose of the arbitrary number N by taking

N = nε/2.

Then

P > 1 − 2c(2/ε)^(1+δ)/n^δ.

Now N tends to infinity with n, and as soon as n surpasses a certain limit n_0 the fraction c/N^δ becomes and remains less than ε/2. The probability of the inequality

|(x_1 + x_2 + · · · + x_n)/n| < ε

for n > n_0 will therefore be greater than

1 − 2c(2/ε)^(1+δ)/n^δ.

It tends, therefore, to 1 as n tends to infinity, and that proves Markoff's theorem.

Example. Let the possible values of the variable x_p (p = 1, 2, 3, . . .) be

√(p+1)/p, √((p+1)²)/p, √((p+1)³)/p, . . .

with the corresponding probabilities

p/(p+1), p/(p+1)², p/(p+1)³, . . .

Since the series defining E(x_p²), whose terms are

1/p + 1/p + 1/p + · · ·,

is divergent, the mathematical expectation E(x_p²) does not exist. Yet the law of large numbers holds. For

E(|x_p|^(1+δ)) = p^(−δ) Σ from k = 1 to ∞ of (p+1)^(−k(1−δ)/2)

is a convergent series for any 0 < δ < 1, and it remains bounded for all p; consequently the conditions of Markoff's theorem are satisfied for any 0 < δ < 1. Hence, the law of large numbers holds in this example.
9. If the variables x_1, x_2, x_3, . . . are identical, the law of large numbers holds without any other restriction, except that for these variables the mathematical expectations exist. In fact, Khintchine proved the following theorem:

Theorem. If, as we may naturally suppose, E(x_i) = 0, the probability of the inequality

|(x_1 + x_2 + · · · + x_n)/n| ≤ ε

tends to 1 as n increases indefinitely.

Proof. The proof is quite similar to that of Markoff's theorem and is based on the same ingenious artifice. Let

· · · < a_(−2) < a_(−1) < a_0 < a_1 < a_2 < · · ·

be the different values of any one of the identical variables x_1, x_2, x_3, . . . and

. . . , p_(−2), p_(−1), p_0, p_1, p_2, . . .

their probabilities.
By hypothesis

Σ p_i a_i

is a convergent series with the sum 0. The series

Σ p_i |a_i|

is also convergent; let c > 0 be its sum. Keeping the same notations as before, we have

|β_i| ≤ E(|v_i|) = Σ over |a_j| > N of p_j |a_j| = ψ(N),

where ψ(N) is a decreasing function tending to 0 as N → ∞. Also

E(u_i²) ≤ N E(|x_i|) = cN,

so that the dispersion of the sum u_1 + u_2 + · · · + u_n is less than cNn. Consequently the probability of the inequality

(9) |(u_1 + u_2 + · · · + u_n)/n − (β_1 + β_2 + · · · + β_n)/n| ≤ ε/2

is greater than

1 − 4cN/(ε²n).
On the other hand, the probability q_i of the inequality v_i ≠ 0 is less than ψ(N)/N, because

q_i = Σ over |a_j| > N of p_j ≤ (1/N) Σ over |a_j| > N of p_j |a_j| = ψ(N)/N.

Hence, the difference between the probability of the inequality

|(u_1 + u_2 + · · · + u_n)/n| < ε

and that of the inequality

|(x_1 + x_2 + · · · + x_n)/n| < ε

is numerically less than nψ(N)/N.
As in the preceding section we conclude that the probability of the inequality

|(u_1 + u_2 + · · · + u_n)/n| ≤ ε/2 + ψ(N)

is greater than

1 − 4cN/(ε²n).

Finally, the probability of the inequality

(10) |(x_1 + x_2 + · · · + x_n)/n| ≤ ε/2 + ψ(N)

is greater than

1 − 4cN/(ε²n) − nψ(N)/N.

To dispose of N we observe that the ratio √ψ(N)/N is a decreasing function of N and tends to 0 as N → ∞. Hence, for large n, there exists an integer N such that

√ψ(N)/N < √(4c)/(εn) ≤ √ψ(N − 1)/(N − 1).

Then

nψ(N)/N < (√(4c)/ε)·√ψ(N); 4cN/(ε²n) ≤ (√(4c)/ε)·(N/(N − 1))·√ψ(N − 1),

whence it follows that the probability of inequality (10) is greater than

1 − (√(4c)/ε)[√ψ(N) + (N/(N − 1))·√ψ(N − 1)].

Now N increases indefinitely together with n; therefore, above a certain limit n_0,

ψ(N) < ε/2,

so that for n > n_0 the probability of the inequality

|(x_1 + x_2 + · · · + x_n)/n| < ε

will be greater than

1 − (√(4c)/ε)[√ψ(N) + (N/(N − 1))·√ψ(N − 1)],

and with indefinitely increasing n this bound approaches the limit 1. Thus Khintchine's theorem is completely proved.

Example. Let

2/1², 2²/2², 2³/3², . . . 2^k/k², . . .

be all the possible values of the identical variables x_1, x_2, x_3, . . . and

1/2, 1/2², 1/2³, . . . 1/2^k, . . .

their corresponding probabilities. Since the series

Σ from k = 1 to ∞ of (2^k/k²)(1/2^k) = 1 + 1/2² + 1/3² + · · ·

is convergent, the mathematical expectations of the variables x_1, x_2, x_3, . . . exist. Hence, the law of large numbers holds in this case. Markoff's theorem cannot be applied here, because for any positive δ the series

Σ from k = 1 to ∞ of (2^k/k²)^(1+δ) 2^(−k) = Σ from k = 1 to ∞ of 2^(kδ)/k^(2(1+δ))

is divergent.
Problems for Solution

1. Let x be a stochastic variable with the mean E(x) = 0 and the standard deviation σ. Denoting by P(t) the probability of the inequality x ≤ t, show that

P(t) ≤ σ²/(σ² + t²) for t < 0;
P(t) ≥ t²/(σ² + t²) for t > 0.

Show also that the right-hand members cannot be replaced by better ones.

Indication of the Proof. For t < 0 and any c > 0 we have, since (c − x)² ≥ (c − t)² whenever x ≤ t,

σ² + c² = E(c − x)² ≥ (c − t)² P(t),

whence P(t) ≤ (σ² + c²)/(c − t)²; the choice c = −σ²/t gives P(t) ≤ σ²/(σ² + t²). For positive t the proof is quite similar. Considering a stochastic variable with the two values

x = t with probability σ²/(σ² + t²); x = −σ²/t with probability t²/(σ² + t²),

one can easily prove the last part of our statement.
tl Pt = .,.t + 2 t one can easily prove the last part of our statement. 2. Tshebysheff's Problem.' If xis a positive stochastic variable with given
cr2,
E(x) = then the probability
P of the inequality
has the following precise upper bounds:
v
< .,..
for
Let
(X ) (X) �=
Then � <
V
for
p;;;; 1 .,.2 P�  v
Indication of the Proof.
6;
X
v if v 6; "'/cr2 and
cr2v
.,c
v  .,.,
·
� 2
E P� v � since
� 2

v �
for
x 6;
v.
�1 
On the other hand,
( )
� � .,,  20"'� + �� E x = = (v  �)2 v � whence
p� 
., .,c
. .
.,, ., c
.,.
+ vt  2cr2v
 .,. 
+ v2
,
2cr2v
.
1 Sur les valeurs limites des integrales, JoU1'. Liouville, Ser. 2, T. XIX, 1874.
200
INTRODUCTION TO
MATHEMATICAL PROBABILITY
[CHAP.
X
The equalit y sign is reached for the stochastic variable with two values:
(v  u2)2
Pt =
Xa
P2
v,
= ''
,., + v2 2u2v ,., u'
=
,., +
vB
·
2u2v
If u• ;;;; v < T4/u2 we have an obvious inequality p;;;; E
u2
(X) ;
=
;;·
To show that the righthand member cannot be replaced by a smaller number, con sider the following stochastic variable with three values:
Xt
=
Xa
=
X
a
=
0,
=
v(l v)
Pa
l,
where l >vis an arbitrary number.
P
Zu•  .,.,
P2
v,
=
=
.'  u2v l(l v)
For this variable
P2 + Pa
u2 = 
v

.4

u2v

lv
is arbitrarily near to u2/v for sufficiently large l.
3. If x is an arbitrary stochastic variable with given E(x²) = σ² and E(x⁴) = μ⁴, and P denotes the probability of the inequality |x| ≥ kσ, then

P ≤ 1/k² if 1 ≤ k² ≤ μ⁴/σ⁴;
P ≤ (μ⁴ − σ⁴)/(μ⁴ + σ⁴k⁴ − 2σ⁴k²) if k² ≥ μ⁴/σ⁴.

These inequalities cannot be improved.

HINT: Follows from Tshebysheff's problem applied to the positive variable x².

4. Let x_i assume the two values i and −i with equal probabilities. Show that the law of large numbers cannot be applied to the variables x_1, x_2, x_3, . . .

5. The variables x_1, x_2, x_3, . . . each assume two values:

log a or −log a; log (a + 1) or −log (a + 1); log (a + 2) or −log (a + 2); . . .

with equal probabilities. Show that the law of large numbers holds for these variables.
HINT: E(x_i) = 0; i = 1, 2, 3, . . . and

B_n = E(x_1 + x_2 + · · · + x_n)² = Σ from i = 0 to n−1 of {log (a + i)}² < n{log (a + n − 1)}²,

as can easily be established by using Euler's summation formula (Appendix I, page 347). Hence B_n/n² → 0 as n → ∞.

6. If x_i can have only the two values i^α and −i^α with equal probabilities, show that the law of large numbers can be applied to x_1, x_2, x_3, . . . if α < ½.

HINT:

B_n = 1^(2α) + 2^(2α) + · · · + n^(2α) ∼ n^(2α+1)/(2α + 1),

so that B_n/n² → 0 if α < ½. It can be shown that the law of large numbers does not hold if α ≥ ½.
7. In an indefinite Bernoullian series of trials with the constant probability p, let m_i denote the number of successes in the first i trials. Show that the law of large numbers holds for the variables

x_i = (m_i − ip)/(ipq)^α if α > ½.

HINT: Evidently E(x_i) = 0, E(x_i²) = (ipq)^(1−2α); i = 1, 2, 3, . . . and

B_n = Σ from i = 1 to n of (ipq)^(1−2α) + 2 Σ over j > i of E(x_i x_j).

Now

E(x_i x_j) = (ij)^(−α)(pq)^(−2α) E(m_i − ip)² + (ij)^(−α)(pq)^(−2α) E{(m_i − ip)(m_j − m_i − (j − i)p)} = (pq)^(1−2α) i^(1−α) j^(−α),

since m_i − ip and m_j − m_i − (j − i)p are independent variables. Thus

B_n = (pq)^(1−2α) [Σ from i = 1 to n of i^(1−2α) + 2 Σ over j > i of i^(1−α) j^(−α)],

and it is easy to show that B_n/n² → 0 as n → ∞ provided α > ½. But the law of large numbers no longer holds if α ≤ ½. The proof of this is more difficult.
8. The following extension of Tshebysheff's lemma was indicated by Kolmogoroff. Let x_1, x_2, . . . x_n be independent variables; E(x_i) = 0, E(x_i²) = b_i,

B_n = b_1 + b_2 + · · · + b_n,

and

s_k = x_1 + x_2 + · · · + x_k; k = 1, 2, . . . n.

Denoting by P the probability of the inequality

(A) max. (s_1², s_2², . . . s_n²) > B_n t²,

we shall have P < 1/t².

Indication of the Proof. The inequality (A) can materialize if and only if one of the following mutually exclusive events occurs:

event e_1: s_1² > B_n t²;
event e_2: s_1² ≤ B_n t², s_2² > B_n t²;
event e_3: s_1² ≤ B_n t², s_2² ≤ B_n t², s_3² > B_n t²;
. . . . . . . . . . . . . . . . . . . . . . . .

If (e_i) represents the probability of e_i (i = 1, 2, . . . n), then

P = (e_1) + (e_2) + · · · + (e_n).

Now consider the conditional mathematical expectation E(s_n²|e_k) of s_n², given that e_k has occurred. Since the indication of e_k does not affect the variables x_(k+1), x_(k+2), . . . x_n, these variables and s_k are independent. Hence

E(s_n²|e_k) = E(s_k²|e_k) + b_(k+1) + · · · + b_n > B_n t².

On the other hand,

B_n = E(s_n²) ≥ Σ from k = 1 to n of (e_k) E(s_n²|e_k) > B_n t² [(e_1) + (e_2) + · · · + (e_n)],
whence P < 1/t².

9. The Strong Law of Large Numbers (Kolmogoroff). Using the same notations as in the preceding problem, show that the probability of the simultaneous inequalities

|s_n/n| ≤ ε, |s_(n+1)/(n + 1)| ≤ ε, |s_(n+2)/(n + 2)| ≤ ε, . . .

will be greater than 1 − η, provided n exceeds a certain limit depending on the choice of ε and η, and granted the convergence of the series

Σ b_n/n².

Indication of the Proof. Consider the variables

T_i = max. |(s_m − s_n)/m| for 2^(i−1) n < m ≤ 2^i n; i = 1, 2, 3, . . .

and denote by q_i the probability of the inequality T_i > ½ε. By Kolmogoroff's lemma (Prob. 8)

q_i < 4(B(2^i n) − B_n)/(ε²(2^(i−1) n)²),

whence, collecting the terms of the resulting double series,

q_1 + q_2 + q_3 + · · · < 16ε^(−2) Σ from k = n to ∞ of b_k/k².

Hence, the probability of the fulfillment of all the inequalities

T_i ≤ ½ε; i = 1, 2, 3, . . .

is greater than

1 − 16ε^(−2) Σ from k = n to ∞ of b_k/k².

The inequalities |s_k/k| ≤ ε; k = n, n + 1, n + 2, . . . are satisfied when simultaneously

T_i ≤ ½ε; i = 1, 2, 3, . . . and |s_n/n| ≤ ½ε.

The probability of the last inequality being greater than 1 − 4B_n/(n²ε²), the probability of the simultaneous inequalities

|s_k/k| ≤ ε; k = n, n + 1, n + 2, . . .

a fortiori will be greater than

1 − 4B_n/(n²ε²) − 16ε^(−2) Σ from k = n to ∞ of b_k/k².

This inequality suffices to complete the proof if we notice that B_n/n² tends to 0 when the series Σ b_k/k² is convergent.
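Kolmogoroff's inequality of Prob. 8 can be probed by simulation; the ±1 step walk, seed, and sample size below are arbitrary assumptions (for unit steps B_n = n):

```python
# Monte Carlo check of Prob. 8: for +-1 steps, B_n = n, so the fraction of
# walks with max_k s_k**2 > n*t**2 should stay below 1/t**2.
import random

random.seed(7)
n, t, trials = 100, 2.0, 4000
exceed = 0
for _ in range(trials):
    s, smax = 0, 0
    for _ in range(n):
        s += random.choice((-1, 1))
        smax = max(smax, s*s)
    exceed += smax > n * t * t
ratio = exceed / trials
```

The observed ratio is typically far below the bound 1/t² = 0.25, as the inequality is not tight for this walk.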
10. Let x_1, x_2, . . . x_n be identical stochastic variables whose common expectation E(x_1) is different from 0 (it may even be +∞ or −∞). Denoting by P_n(ε) and P̄_n(ε), respectively, the probabilities of the inequalities

(x_1 + x_2 + · · · + x_n)/n > ε and (x_1 + x_2 + · · · + x_n)/n < −ε,

show that

lim P_n(ε) = 1, lim P̄_n(ε) = 0, or lim P_n(ε) = 0, lim P̄_n(ε) = 1 (as n → ∞),

according as E(x_1) > 0 or E(x_1) < 0. For the proof see Khintchine's paper in Mathematische Annalen (vol. 101, pp. 381–385).

11. The Law of the Repeated Logarithm (Khintchine, Kolmogoroff). Let x_1, x_2, . . . x_n be bounded independent variables, E(x_i) = 0, i = 1, 2, . . . n, and B_n → ∞ as n → ∞. For an arbitrarily small δ > 0 and ε > 0 and for an arbitrarily large N, one can choose n_0 > N so that:

a. The probability of the fulfillment of the inequality

|s_n| > (1 + δ)√(2B_n log log B_n)

for at least one n ≥ n_0 is less than ε.

b. The probability of the fulfillment of the inequality

|s_n| > (1 − δ)√(2B_n log log B_n)

for at least one n ≥ n_0 is greater than 1 − ε.

For the proof see Kolmogoroff's paper in Mathematische Annalen (vol. 101, pp. 126–135).

If x_1, x_2, . . . x_n are variables independent in pairs and B_n is the dispersion of their sum s = x_1 + x_2 + · · · + x_n, then the probability P that s satisfies the inequality

|s| ≤ t√B_n

obeys

P > 1 − 1/t² (Tshebysheff's inequality),

provided E(x_i) = 0, i = 1, 2, . . . n, which can be assumed without loss of generality. In case the variables are totally independent and are subject to certain limitations of comparatively mild character, S. Bernstein has shown that Tshebysheff's inequality can be considerably improved.

12. Let x_1, x_2, . . . x_n be totally independent variables. We suppose E(x_i) = 0, E(x_i²) = b_i and

|E(x_i^h)| ≤ ½ b_i c^(h−2) h!
for i = 1, 2, . . . n and h > 2, c being a certain constant. Show that

A = E{e^(ε(x_1 + x_2 + · · · + x_n))} < e^(B_n ε²/(2(1 − σ))),

where σ is an arbitrary positive number < 1 and ε is a positive number so small that εc ≤ σ.

Indication of the Proof. We have

e^(εx_i) = 1 + εx_i + Σ from h = 2 to ∞ of ε^h x_i^h/h!,

whence

E(e^(εx_i)) ≤ 1 + (b_i ε²/2) Σ from h = 0 to ∞ of (εc)^h < e^(b_i ε²/(2(1 − σ))).
13. If Q denotes the probability of the inequality

x_1 + x_2 + · · · + x_n > B_n ε/(2(1 − σ)) + t²/ε,

show that Q < e^(−t²).

Indication of the Proof. If w denotes the right-hand member, then Qe^(εw) ≤ A, whence, by Prob. 12,

Q < e^(B_n ε²/(2(1 − σ)) − εw) = e^(−t²).

14. S. Bernstein's Inequality. Denoting by P the probability of the inequality

|x_1 + x_2 + · · · + x_n| ≤ w,

w being a given positive number, show that

P > 1 − 2e^(−w²/(2B_n + 2cw)).

Indication of the Proof. To make the bound of Prob. 13 a minimum for a given w, take

ε = t√(2(1 − σ))/√B_n,

t being determined by equating B_n ε/(2(1 − σ)) + t²/ε to w. The resulting value of ε is admissible only if

εc = cw(1 − σ)/B_n ≤ σ;

the best choice for σ is

σ = cw/(B_n + cw),

and correspondingly t² = w²/(2B_n + 2cw). By Prob. 13 the probability of the inequality

x_1 + x_2 + · · · + x_n > w

is less than e^(−w²/(2B_n + 2cw)).
or < w x1  X2 bounded uniformly are Xn . . . variables x1, x2,
X1
+
X2
+
·
•
+ Xn
·
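Bernstein's bound is easy to probe numerically. The following is a minimal Monte Carlo sketch, assuming variables uniform on [−1, 1] (so b_i = 1/3, and c = 1/3 is an admissible constant for variables bounded by 1); the observed tail frequency should fall under the bound:

```python
import math
import random

random.seed(1)

n, reps, w = 100, 20000, 20.0
B_n = n / 3.0        # b_i = E(x_i^2) = 1/3 for x_i uniform on [-1, 1]
c = 1.0 / 3.0        # admissible constant for |x_i| <= 1

# S. Bernstein's bound for the probability that |x_1 + ... + x_n| >= w
bound = 2 * math.exp(-w * w / (2 * B_n + 2 * c * w))

exceed = 0
for _ in range(reps):
    s = sum(random.uniform(-1.0, 1.0) for _ in range(n))
    if abs(s) >= w:
        exceed += 1
freq = exceed / reps  # empirical tail frequency
```

Here the bound equals 2e⁻⁵ ≈ 0.013, while the true tail probability is an order of magnitude smaller; the inequality is conservative but of the right exponential shape.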
15. If the variables x_1, x_2, . . . x_n are bounded uniformly, so that |x_i| ≤ M for i = 1, 2, . . . n, where M is an upper bound of their numerical values, then we may take
c = M/3.
Indication of the Proof. Note that h!/2 ≥ 3^{h−2} for h ≥ 2, whence
|E(x_i^h)| ≤ M^{h−2}b_i ≤ (h!/2)b_i(M/3)^{h−2}.
16. Consider a Poisson's series of n independent trials with probabilities p_1, p_2, . . . p_n for an event E to occur. Let m be the frequency of E in n trials,
p = (p_1 + p_2 + · · · + p_n)/n,  λ = (p_1q_1 + p_2q_2 + · · · + p_nq_n)/n.
Show that the probability P of the inequality
|m/n − p| ≤ ε
has the following lower limit:
P > 1 − 2e^{−nε²/(2λ + (2/3)ε)}.
INTRODUCTION TO MATHEMATICAL PROBABILITY  [CHAP. X
In the Bernoullian case p_1 = p_2 = · · · = p_n = p, λ = pq and consequently
P > 1 − 2e^{−nε²/(2pq + (2/3)ε)}.
17. An indefinite series of totally independent variables x_1, x_2, x_3, . . . has the property that the mathematical expectation of any odd power of these variables is rigorously 0, while
E(x_i^{2k}) ≤ (b_i/2)^k (2k)!/k!;  b_i = E(x_i²);  i = 1, 2, 3, . . .
Prove that the probability of either one of the inequalities
x_1 + x_2 + · · · + x_n > t√(2B_n)  or  x_1 + x_2 + · · · + x_n < −t√(2B_n),
where B_n = b_1 + b_2 + · · · + b_n, is less than e^{−t²} (S. Bernstein).
Prove first that
E(e^{εx_i}) ≤ e^{ε²b_i/2}.
18. Positive and negative proper decimal fractions limited to, say, five decimals, are obtained in the following manner: From an urn containing tickets with numbers 0, 1, 2, . . . 9 in equal proportion, five tickets are drawn in succession (the ticket drawn in a previous trial being returned before the next) and their respective numbers are written in succession as the five decimals of a proper fraction. This fraction, if not equal to 0, is preceded by the sign + or −, according as a coin tossed at the same time shows heads or tails. Thus, repeating this process several times, we may obtain as many positive or negative proper fractions with five decimals as we desire. What can be said about the probability that the sum of n such fractions will be contained between prescribed limits −ω and ω?
Ans. These n fractions may be considered as so many identical stochastic variables for each of which
β = E(x²) = (1 − 10⁻⁵)(2 − 10⁻⁵)/6 < 1/3.
Besides, since in general
1^{2k} + 2^{2k} + · · · + (s − 1)^{2k} < s^{2k+1}/(2k + 1),
the inequality
E(x^{2k}) ≤ (β/2)^k (2k)!/k!
can easily be verified, and we can apply the result of Prob. 17. For the required probability P the following lower limit can be obtained:
P > 1 − 2e^{−3ω²/(2n)}
or, if ω = nε,
P > 1 − 2e^{−(3/2)nε²}.
For example, if ε = 1/10 and n ≥ 814,
P > 0.99999;
that is, almost certainly the sum of 814 fractions formed in the above described manner will be contained between −82 and 82.
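The answer can be checked numerically; this sketch (Python's random module standing in for the urn and the coin) verifies β < 1/3, the stated lower limit, and one sampled sum:

```python
import math
import random

random.seed(7)

n, eps = 814, 0.1
# lower limit P > 1 - 2e^(-(3/2) n eps^2), obtained above from Prob. 17
P_lower = 1 - 2 * math.exp(-1.5 * n * eps ** 2)

beta = (1 - 1e-5) * (2 - 1e-5) / 6  # E(x^2) for one signed five-decimal fraction

total = 0.0
for _ in range(n):
    digits = [random.randrange(10) for _ in range(5)]
    frac = sum(d * 10 ** -(k + 1) for k, d in enumerate(digits))
    if frac != 0:
        frac *= random.choice([1, -1])  # sign decided by a coin toss
    total += frac
```

With these numbers P_lower just exceeds 0.99999, and the sampled sum lands deep inside (−82, 82), as the text predicts.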
References
P. L. TSHEBYSHEFF: Sur les valeurs moyennes, Oeuvres I, pp. 687-694.
A. MARKOFF: "Calculus of Probability," 4th Russian edition, Leningrad, 1924.
S. BERNSTEIN: "Theory of Probability" (Russian), Moscow, 1927.
A. KHINTCHINE: Sur la loi des grands nombres, C.R. 188, 1929.
CHAPTER XI
APPLICATIONS OF THE LAW OF LARGE NUMBERS 1. A theorem of such wide generality as the law of large numbers is a source of a great many important particular theorems.
We shall begin
with a generalization of Bernoulli's theorem due to Poisson.
Let us consider a series of independent trials with the respective probabilities p_1, p_2, p_3, . . . , varying from one trial to another. Considering n trials, we shall denote by m the number of successes. The arithmetic mean of probabilities in n trials
p = (p_1 + p_2 + · · · + p_n)/n
will be called the "mean probability in n trials." With such conditions and notations adopted, we can state Poisson's theorem as follows:
Poisson's Theorem. The probability of the inequality
|m/n − p| < ε
for fixed ε > 0, no matter how small, can be made as near to 1 (certainty) as we please, provided the number of trials n is sufficiently large.
Proof.
To show that this theorem is but a particular case of the law
of large numbers, we use an artifice often applied in similar circumstances, namely, we associate with trials 1, 2, 3, . . . n variables x_1, x_2, x_3, . . . x_n defined as follows:
x_i = 1 in case of success in the ith trial,
x_i = 0 in case of failure in the ith trial.
Since the trials are independent, these variables are also independent. Moreover
E(x_i) = E(x_i²) = p_i
and the dispersion of x_i is p_i − p_i² = p_iq_i. The dispersion B_n of the sum
x_1 + x_2 + · · · + x_n
is the sum of the dispersions of its terms, that is,
B_n = p_1q_1 + p_2q_2 + · · · + p_nq_n ≤ n/4.
At the same time, the former sum represents the number of successes m.
Now, applying the results established in Chap. X, Sec. 2, we arrive at this conclusion: Denoting by P the probability of the inequality
|m/n − p| < ε,
we shall have
P > 1 − 1/(4nε²).
It now suffices to take
n > 1/(4ε²η)
to have
P > 1 − η, where η is an arbitrary positive number no matter how small.
That
completes the proof of Poisson's theorem.
Evidently Bernoulli's theorem is contained in Poisson's theorem as a
particular case when p_1 = p_2 = · · · = p_n = p.
Poisson himself attached great importance to his theorem and adopted for it the name of the "law of large numbers," which is still used by many authors.
However, it appears more proper to reserve this name to the theorem established in Chap. X, Sec. 2, which is due to Tshebysheff.
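Poisson's theorem is easy to watch at work. In the sketch below (the varying probabilities are simply drawn at random once and then held fixed), the relative frequency of successes stays close to the mean probability:

```python
import random

random.seed(3)

n = 20000
p_list = [random.random() for _ in range(n)]  # probabilities varying from trial to trial
p_mean = sum(p_list) / n                      # mean probability in n trials

m = sum(1 for p in p_list if random.random() < p)  # number of successes
deviation = abs(m / n - p_mean)
```

Since B_n ≤ n/4, the deviation is of order 1/√n here, a few thousandths for n = 20,000.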
2. Let us consider n series each consisting of s independent trials with the constant probability p. Also, let
m_1, m_2, . . . m_n
represent the number of successes in each of these series. Stochastic variables
(m_1 − sp)², (m_2 − sp)², . . . (m_n − sp)²
are independent and identical. Their common mathematical expectation is spq. The law of large numbers can be applied to these variables and leads immediately to the conclusion: The probability of the inequality
|Σ_{i=1}^n (m_i − sp)²/n − spq| < ε
can be brought as near as we please to 1 (or certainty) if the number of series n is sufficiently large.
Substituting ε_1spq for ε and dividing through by spq, we may state the same proposition as follows: The probability of the inequalities
1 − ε_1 < Σ_{i=1}^n (m_i − sp)²/(Npq) < 1 + ε_1,
where N = ns is the total number of trials in all n series, can be brought as near to 1 as we please if the number of series is sufficiently large.
The law of large numbers can be legitimately applied also to the variables
x_i = |m_i − sp|;  i = 1, 2, 3, . . . n
with the common mathematical expectation
M_s = 2spq C_{s−1}^{μ−1} p^{μ−1} q^{s−μ},  μ = [sp + 1],
and leads to the following proposition: The probability of the inequalities
1 − ε < Σ_{i=1}^n |m_i − sp|/(nM_s) < 1 + ε
can be brought as near to 1 as we please if the number of series is sufficiently large.
For the sake of simplicity, let us use the notations
A² = Σ_{i=1}^n (m_i − sp)²/n,  B = Σ_{i=1}^n |m_i − sp|/n.
The probabilities P and P′ of the inequalities
(1)  √(spq)(1 − σ) < A < √(spq)(1 + σ)
(2)  M_s(1 − σ) < B < M_s(1 + σ),
which are equivalent to
(1 − σ)² < Σ_{i=1}^n (m_i − sp)²/(nspq) < (1 + σ)²
1 − σ < Σ_{i=1}^n |m_i − sp|/(nM_s) < 1 + σ,
can both be made greater than 1 − η, where η is an arbitrarily small positive number. The probability of simultaneous materialization of (1) and (2) is not less than
P + P′ − 1 > 1 − 2η.
But whenever (1) and (2) hold simultaneously, we have
(3)  (√(spq)/M_s)·(1 − σ)/(1 + σ) < A/B < (√(spq)/M_s)·(1 + σ)/(1 − σ).
Therefore the probability of these inequalities is again > 1 − 2η. Now let us take
σ = τ/(2 + τ),
where τ is another positive number arbitrarily chosen. Then
(1 + σ)/(1 − σ) = 1 + τ,  (1 − σ)/(1 + σ) = 1/(1 + τ) > 1 − τ.
Hence, the inequalities
(√(spq)/M_s)(1 − τ) < A/B < (√(spq)/M_s)(1 + τ)
follow from inequalities (3) and their probability is a fortiori > 1 − 2η. It suffices to take
τ = εM_s/√(spq)
to arrive at the following proposition: The probability of the inequality
|A/B − √(spq)/M_s| < ε
for a fixed ε and sufficiently large number of series can be made as near to 1 as we please.
If spq is somewhat large, the quotient √(spq)/M_s differs but little from √(π/2) (see Chap. IX, Prob. 2, page 177). Hence, when the number of series is large and the series themselves sufficiently long, we may expect with great probability that the quotient A/B will not differ much from √(π/2).
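A short simulation (a sketch with Bernoullian series and p = 1/2 assumed) shows the quotient A/B settling near √(π/2) ≈ 1.2533:

```python
import math
import random

random.seed(5)

n, s, p = 20000, 100, 0.5  # n series of s Bernoullian trials each
sp = s * p

sq_sum = 0.0
abs_sum = 0.0
for _ in range(n):
    m = sum(1 for _ in range(s) if random.random() < p)
    sq_sum += (m - sp) ** 2
    abs_sum += abs(m - sp)

A = math.sqrt(sq_sum / n)  # root mean square deviation
B = abs_sum / n            # mean absolute deviation
ratio = A / B              # should differ little from sqrt(pi/2)
```

For series of moderate length a small discreteness bias remains, so the agreement is to two decimals rather than exact.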
DIVERGENCE COEFFICIENT
3. The considerations of the preceding section can be generalized. Let us consider again n series containing s trials each, and let
m_1, m_2, . . . m_n
represent the numbers of successes in each of these series. Without specifying the nature of the trials (which can be independent or dependent) we shall denote by p the mean probability in all N = ns trials and by q = 1 − p its complement. Again considering the quotient
Q = Σ_{i=1}^n (m_i − sp)²/(Npq),
we seek its mathematical expectation E(Q) = D. When all the N trials are of the Bernoullian type, D = 1. But it is also possible to imagine cases when D > 1 or D < 1. Lexis calls √D the "coefficient of dispersion." We shall call D itself the "theoretical divergence coefficient." If m_1, m_2, . . . m_n are actually observed frequencies in n series, the quotient
D′ = Σ_{i=1}^n (m_i − sp)²/(Npq)
may be called "empirical divergence coefficient."
Then, if the law of large numbers can be applied to variables
x_i = (m_i − sp)²/(Npq);  i = 1, 2, 3, . . . n,
we can expect, with probability approaching certainty as near as we please, that the inequality
|D′ − D| < ε
will be fulfilled for an adequately large number of series.
Thus far we have not specified the nature of the trials. Now we shall suppose that all N = ns trials, distributed in n series, are independent but with probabilities varying in general from trial to trial. Let
p_{i1}, p_{i2}, . . . p_{is}  (i = 1, 2, . . . n)
be the probabilities in successive trials of the ith series. Their mean
p_i = (p_{i1} + p_{i2} + · · · + p_{is})/s
is the mean probability in the ith series. Finally
p = (p_1 + p_2 + · · · + p_n)/n
is the mean probability in all N = ns trials. As to the expectation of (m_i − sp)², we find
E(m_i − sp)² = E[(m_i − sp_i) + s(p_i − p)]² = E(m_i − sp_i)² + s²(p_i − p)²,
since E(m_i − sp_i) = 0.
On the other hand,
E(m_i − sp_i)² = Σ_{j=1}^s p_{ij} − Σ_{j=1}^s p_{ij}²
and
Σ_{j=1}^s (p_i − p_{ij})² = −sp_i² + Σ_{j=1}^s p_{ij}²,
whence
E(m_i − sp_i)² = sp_i − sp_i² − Σ_{j=1}^s (p_i − p_{ij})².
Now, letting i take values 1, 2, . . . n and taking the sum of the results, we get
Σ_{i=1}^n E(m_i − sp_i)² = nsp − s Σ_{i=1}^n p_i² − Σ_{i=1}^n Σ_{j=1}^s (p_i − p_{ij})².
But
s Σ_{i=1}^n (p − p_i)² = −nsp² + s Σ_{i=1}^n p_i²,
whence finally
D = 1 + [(s − 1)/(npq)] Σ_{i=1}^n (p − p_i)² − [1/(Npq)] Σ_{i=1}^n Σ_{j=1}^s (p_i − p_{ij})².
Two particular cases deserve special attention.
Lexis' Case. Probabilities remain the same within each series, but vary from series to series. In this case p_{ij} = p_i and the expression of D becomes
D = 1 + [(s − 1)/(npq)] Σ_{i=1}^n (p − p_i)².
The theoretical divergence coefficient in this case is always greater than 1 and may be arbitrarily large.
Poisson's Case. The probabilities of the corresponding trials in all series are the same, so that
p_{ij} = π_j  and  p = p_i = (π_1 + π_2 + · · · + π_s)/s.
In this case the divergence coefficient
D = 1 − Σ_{j=1}^s (p − π_j)²/(spq)
is always less than 1.
Since the law of large numbers evidently is applicable to variables
x_i = (m_i − sp)²/(Npq),
we may expect that the empirical divergence coefficient D′ will not differ much from D if the number of series is sufficiently large.
For numerical illustration let us consider 100 series each containing 100 trials, such that in 50 series the probability is 2/5 and in the remaining 50 series it is 3/5. Here we evidently have Lexis' case. The mean probability in all trials is
p = 1/2
and
Σ_{i=1}^{100} (1/2 − p_i)² = 50·(1/100) + 50·(1/100) = 1.
Finally,
D = 1 + 99/25 = 4.96.
Now, suppose that we combine in pairs series of 100 trials with probability 2/5 and series of 100 trials with probability 3/5, to form 50
series each of 200 trials. Evidently we have here Poisson's case. The mean probability in each series again is p = 1/2 and
Σ_{j=1}^{200} (1/2 − π_j)² = 100·(1/100) + 100·(1/100) = 2.
Finally,
D = 1 − 1/25 = 0.96.
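Both figures follow directly from the formulas above; a minimal computation (assuming, as the arithmetic indicates, probabilities 2/5 and 3/5 for the two groups of series):

```python
# Lexis' case: n = 100 series of s = 100 trials,
# 50 series with p_i = 2/5 and 50 with p_i = 3/5
n, s, p = 100, 100, 0.5
q = 1 - p
sum_sq = 50 * (p - 2 / 5) ** 2 + 50 * (p - 3 / 5) ** 2     # = 1
D_lexis = 1 + (s - 1) / (n * p * q) * sum_sq               # = 4.96

# Poisson's case: the same trials regrouped into 50 series of 200 trials
s2 = 200
sum_sq2 = 100 * (p - 2 / 5) ** 2 + 100 * (p - 3 / 5) ** 2  # = 2
D_poisson = 1 - sum_sq2 / (s2 * p * q)                     # = 0.96
```

The same 10,000 trials thus give a coefficient well above 1 or slightly below 1 depending only on how they are grouped into series.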
The consideration of the divergence coefficient may be useful in testing the assumed independence of trials and values of probabilities attached to these trials.
In the simplest case of Bernoullian trials with
a constant and known probability, the theoretical divergence coefficient is 1.
Now, if the number of series is sufficiently large and the empirical
divergence coefficient turns out to be considerably different from 1, we must admit with great probability that the trials we deal with are not of the supposed type.
If, however, the empirical divergence coefficient
turns out to be near 1, that does not conclusively prove the hypothesis concerning the independence of trials and the assumed value of the probability.
It only makes this hypothesis plausible.
There are cases of dependent trials (complex chains considered by Markoff) in which the theoretical divergence coefficient is exactly 1 and the probability of an event has the same constant value in each trial,
Cases like that
may easily be mistaken for Bernoullian trials without further detailed study of the entire course of trials.
4. When there is good reason to believe that the trials are independent with a constant but unknown probability, we cannot in all rigor find the value of the empirical divergence coefficient
D′ = Σ_{i=1}^n (m_i − sp)²/(Npq)
to compare it with the theoretical divergence coefficient D = 1, since p remains unknown. But, relying on Bernoulli's theorem, we can take the quotient
M/N,  where  M = m_1 + m_2 + · · · + m_n,
as an approximate value of p. By taking p = M/N in the preceding expression for D′ we get another number D″,
which in general is close to D′. However, considering m_1, m_2, . . . m_n not as observed but as eventual numbers of successes in n series, the mathematical expectation of D″ is different from 1. To avoid this difficulty, it is better to consider a slightly different quotient
Q = [n(N − 1)/((n − 1)M(N − M))] Σ_{i=1}^n (m_i − sM/N)².
For this quotient there exists a theorem discovered and proved for the first time by the eminent Russian statistician Tschuprow.
Theorem. The mathematical expectation of Q is rigorously equal to 1.¹
Proof. Here we shall develop the proof given by Markoff. The above given expression of Q presents itself in the form 0/0 and therefore has no meaning in two cases: M = 0 or M = N. For these exceptional cases we set Q = 1 by definition. If neither M = 0 nor M = N, we can present Q in the form
(4)  Q = [n(N − 1)/(n − 1)]·[Σ_{i=1}^n m_i² − M²/n] / [M(N − M)].
Considering m_1, m_2, . . . m_n as stochastic variables assuming integral
values from 0 to s, the probability of a definite system of values m_1, m_2, . . . m_n is
P = Π_{i=1}^n C_s^{m_i} p^{m_i} q^{s−m_i}.
To get the expectation of Q we must multiply it by P and take the sum
E(Q) = ΣPQ
extended over all nonnegative integers m_1, m_2, . . . m_n, each of them not exceeding s. To perform this multiple summation we first collect all terms with a given sum
m_1 + m_2 + · · · + m_n = M.
¹ The theorem itself and its proof given by Markoff can be extended to the case of series of unequal length.
Let the result of this summation be S_M. Then it remains to take the sum
E(Q) = Σ_{M=0}^N S_M
to have the desired expression of E(Q). To this end we first separate the two terms corresponding to M = 0 and M = N. In the former case
m_1 = m_2 = · · · = m_n = 0,
and the probability of such an event is q^N, while Q = 1. In the latter case
m_1 = m_2 = · · · = m_n = s,
the probability of which is p^N, while again Q = 1. Thus
E(Q) = p^N + q^N + Σ_{M=1}^{N−1} S_M.
To find S_M we observe that the denominator of Q has a constant value when summation is performed over variable integers m_1, m_2, . . . m_n connected by the relation
m_1 + m_2 + · · · + m_n = M.
Hence, it suffices to find the two sums
ΣPm_i²  and  ΣP
extended over integers m_1, m_2, . . . m_n varying within limits 0 and s and having the sum M. To this end consider the function
V = (pte^{ξ_1} + q)^s (pte^{ξ_2} + q)^s · · · (pte^{ξ_n} + q)^s,
involving n + 1 arbitrary variables t, ξ_1, ξ_2, . . . ξ_n. When developed, V consists of terms of the form
P t^M e^{m_1ξ_1 + m_2ξ_2 + · · · + m_nξ_n}.
Evidently we obtain the sum ΣP by setting ξ_1 = ξ_2 = · · · = ξ_n = 0 and taking the coefficient of t^M in the expansion
(V)_{ξ_1=ξ_2=···=ξ_n=0} = (pt + q)^N.
Thus
(5)  ΣP = [N!/(M!(N − M)!)] p^M q^{N−M}.
To find ΣPm_i² take the second derivative
∂²V/∂ξ_i²
and after setting ξ_1 = ξ_2 = · · · = ξ_n = 0, expand
(∂²V/∂ξ_i²)_{ξ_1=ξ_2=···=ξ_n=0}
in a power series of t and take the coefficient of t^M. Thus we find
(6)  ΣPm_i² = [s(N−1)!/((M−1)!(N−M)!) + s(s−1)(N−2)!/((M−2)!(N−M)!)] p^M q^{N−M}.
Referring to (4), (5), and (6), we easily get
S_M = [nN(N−1)(N−2)! / ((n−1)M(N−M)(M−1)!(N−M)!)]·[(N−1)(1 − M/n) + (s−1)(M−1)] p^M q^{N−M};
or, after obvious simplifications,
S_M = [N!/(M!(N−M)!)] p^M q^{N−M}.
Hence
Σ_{M=1}^{N−1} S_M = (p + q)^N − p^N − q^N = 1 − p^N − q^N,
and finally
E(Q) = 1.
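Tschuprow's theorem can be verified exhaustively for a small case; this sketch enumerates every outcome of n = 3 series of s = 2 Bernoullian trials (p = 0.3 is an arbitrary choice) and checks that E(Q) = 1 exactly:

```python
from itertools import product
from math import comb

n, s, p = 3, 2, 0.3
N = n * s
q = 1 - p

EQ = 0.0
for ms in product(range(s + 1), repeat=n):
    # probability of this system of values m_1, ..., m_n
    P = 1.0
    for m in ms:
        P *= comb(s, m) * p ** m * q ** (s - m)
    M = sum(ms)
    if M == 0 or M == N:
        Q = 1.0  # Q = 1 by definition in the two exceptional cases
    else:
        Q = (n * (N - 1)) / ((n - 1) * M * (N - M)) \
            * sum((m - s * M / N) ** 2 for m in ms)
    EQ += P * Q
```

The same loop with any other p or with other small n, s gives the same result, as the theorem asserts.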
Markoff, using the same method, succeeded in finding the explicit expression of the expectation E(Q − 1)². Since there is no difficulty in finding this expression except for somewhat tedious calculations, we give it here without entering into details of the proof:
E(Q − 1)² = [2N(N − n)/((n − 1)(N − 2)(N − 3))] Σ_{M=1}^{N−1} [(M − 1)(N − M − 1)/(M(N − M))] C_N^M p^M q^{N−M},
whence the following inequality immediately follows:
E(Q − 1)² < 2N(N − n)/((n − 1)(N − 2)(N − 3)).
In case n ≥ 5 a still simpler inequality holds:
(7)  E(Q − 1)² < 2/(n − 1).
Let R be the probability of the inequality Q ≥ 1 + ε, where ε is a positive number. Applying the same reasoning to inequality (7) as was used in establishing Tshebysheff's lemma, we find that
R < 2/((n − 1)ε²).
Likewise, denoting by R′ the probability of the inequality Q ≤ 1 − ε, we have
R′ < 2/((n − 1)ε²).
Thus, in a large number of series it becomes very unlikely that the value of Q found in actual experiment would lie outside of the interval 1 − ε, 1 + ε. For instance, the probability for Q ≥ 2 in 100 series is surely less than 2/99, or nearly 0.02. However, this limit is much too high. It would be greatly desirable to have a good approximate expression for the probability of either one of the inequalities
Q ≥ 1 + ε  or  Q ≤ 1 − ε.
But this important and difficult problem has not yet been solved.
5. In order to illustrate the foregoing theoretical considerations we turn to experiments reported by Charlier in his book "Vorlesungen über die Grundzüge der mathematischen Statistik" (Lund, 1920). He made 10,000 drawings of single cards from a complete deck of 52 cards (each card taken being returned before the next drawing), and noted the frequency of black cards. The drawings were divided into 1,000 series of 10 cards, or into 200 series of 50 cards. The results are given in the tables below.
Assuming the independence of trials and the constant probability p = 1/2, the theoretical divergence coefficient must be 1. Let us compare it with the empirical divergence coefficient derived from Tables I and II. To this end we multiply the squares of the numbers in the second column by the numbers given in the third column. The results are:
For 200 series of 50 cards: Σ(m_i − sp)² = 2,487.
For 1,000 series of 10 cards: Σ(m_i − sp)² = 2,419.
TABLE I. NUMBER OF BLACK CARDS IN 200 GROUPS OF 50 CARDS EACH

Frequency m   Difference m − 25   Number of groups
    14              −11                  1
    15              −10                  0
    16               −9                  2
    17               −8                  2
    18               −7                  4
    19               −6                  8
    20               −5                  6
    21               −4                 15
    22               −3                 13
    23               −2                 15
    24               −1                 34
    25                0                 14
    26                1                 21
    27                2                 26
    28                3                 14
    29                4                 10
    30                5                  5
    31                6                  5
    32                7                  3
    33                8                  2

TABLE II. NUMBER OF BLACK CARDS IN 1,000 GROUPS OF 10 CARDS EACH

Frequency m   Difference m − 5   Number of groups
     0              −5                   3
     1              −4                  10
     2              −3                  43
     3              −2                 116
     4              −1                 221
     5               0                 247
     6               1                 202
     7               2                 115
     8               3                  34
     9               4                   9
    10               5                   0
Dividing these numbers by 10,000·(1/4) = 2,500, we get the following empirical divergence coefficients:
D′ = 0.9948;  D″ = 0.9676.
Both are close to 1, so that the hypotheses of independence of trials and constant probability 1/2 for each of them are in good agreement with empirical results. The second divergence coefficient, corresponding to more numerous groups, differs from 1 more than the first, corresponding to only 200 groups. But such a difference can be accounted for by fluctuations due to chance.
Series of 50 trials are long enough to test the theorem established in Sec. 2 of this chapter. The quantities denoted there by A and B are here correspondingly:
A² = 2,487/200 = 12.435;  B = 561/200 = 2.805,
whence
A = 3.5263
and
A/B = 1.2571,
while
√(π/2) = 1.2533.
Again the difference, only about 4·10⁻³, is rather small.
In this example, the probability of drawing a black card was assumed to be 1/2. In case we do not know the probability, but suppose it to be constant throughout 10,000 independent trials, we must consider the coefficient
Q = [n(N − 1)/((n − 1)M(N − M))] Σ_{i=1}^n (m_i − sM/N)².
In our example
n = 1,000;  s = 10;  N = 10,000;  M = 4,933;  sM/N = 4.933.
To evaluate the sum
S = Σ_{i=1}^{1000} (m_i − 4.933)²
we write it in the form
S = Σ(m_i − 5)² + 0.134 Σ(m_i − 5) + 1,000·(0.067)².
Now
Σ(m_i − 5)² = 2,419;  0.134 Σ(m_i − 5) = −8.978;  1,000·(0.067)² = 4.489,
whence
S = 2,414.51.
This is to be multiplied by the number
n(N − 1)/((n − 1)M(N − M)) = 1/2,497.3.
The result is
Q = 0.9668,
near enough to 1 for us to consider the hypothesis of independence of
trials and the constant value of probability as in agreement with experimental data.
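All the figures of this section can be recomputed directly from Tables I and II; a short sketch:

```python
import math

# {frequency of black cards: number of groups}, from Tables I and II
table1 = {14: 1, 15: 0, 16: 2, 17: 2, 18: 4, 19: 8, 20: 6, 21: 15, 22: 13,
          23: 15, 24: 34, 25: 14, 26: 21, 27: 26, 28: 14, 29: 10, 30: 5,
          31: 5, 32: 3, 33: 2}            # 200 groups of 50 cards
table2 = {0: 3, 1: 10, 2: 43, 3: 116, 4: 221, 5: 247, 6: 202, 7: 115,
          8: 34, 9: 9, 10: 0}             # 1,000 groups of 10 cards

sq1 = sum(cnt * (m - 25) ** 2 for m, cnt in table1.items())  # 2,487
sq2 = sum(cnt * (m - 5) ** 2 for m, cnt in table2.items())   # 2,419

D1 = sq1 / 2500  # Npq = 10,000/4 in both groupings
D2 = sq2 / 2500

A = math.sqrt(sq1 / 200)
B = sum(cnt * abs(m - 25) for m, cnt in table1.items()) / 200
ratio = A / B    # compare with sqrt(pi/2) = 1.2533
```

The sums 2,487 and 2,419, the coefficients 0.9948 and 0.9676, and the quotient A/B = 1.2571 all come out as stated in the text.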
EXAMPLES OF DEPENDENT TRIALS
6. So far we have dealt only with independent variables. But the law of large numbers holds, under certain conditions, even in the case of dependent variables. Leaving aside generalities, we shall show the application of the law of large numbers to a few interesting problems involving dependent variables.
Let us consider first a Bernoullian series consisting of n + 1 independent trials with the same probability p for an event E, the opposite event being denoted by F. We associate with trials 1, 2, . . . n variables x_1, x_2, . . . x_n defined as follows:
x_i = 1 if E occurs in trials i and i + 1,
x_i = 0 in all other cases.
The probability of x_i = 1 evidently is p² when nothing is known about the values of the other variables. But if we know that x_{i−1} = 1, which implies the occurrence of E in the ith trial, then the probability of x_i = 1 is p. Thus, consecutive variables are dependent. However, x_i and x_k are independent if |k − i| > 1, as we can easily see. Since
E(x_i) = E(x_i²) = p²·1 + (1 − p²)·0 = p²,
the expectation of the sum x_1 + x_2 + · · · + x_n will be
E(x_1 + x_2 + · · · + x_n) = np².
As to the dispersion of this sum, it can be expressed as follows:
B_n = Σ_{i=1}^n E(x_i − p²)² + 2 Σ_{j>i} E(x_i − p²)(x_j − p²).
Now
(8)  E(x_i − p²)² = E(x_i²) − 2p²E(x_i) + p⁴ = p²(1 − p²)
and
(9)  E(x_i − p²)(x_j − p²) = E(x_i − p²)·E(x_j − p²) = 0
for j > i + 1, because then x_i and x_j are independent. But
(10)  E(x_i − p²)(x_{i+1} − p²) = E(x_i x_{i+1}) − p⁴ = p³ − p⁴,
since the probability of the simultaneous events x_i = 1, x_{i+1} = 1 is p³.
Taking into account (8), (9), and (10), we find
B_n = np²q(3p + 1) − 2p³q
and the condition
B_n/n² → 0 as n → ∞
is satisfied. Hence, the law of large numbers holds for variables x_1, x_2, . . . x_n. To express it in the simplest form, it suffices to notice that the sum
x_1 + x_2 + · · · + x_n
represents the number of pairs EE occurring in consecutive trials of the Bernoullian series of n + 1 trials. Let us denote the frequency of such pairs by m. Then, referring to the law of large numbers, we get the following proposition: If in n consecutive pairs of Bernoullian trials the frequency of double successes EE is m, then the probability of the inequality
|m/n − p²| < ε
will approach 1 as near as we please, when n becomes sufficiently large.
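The proposition about pairs EE lends itself to a direct simulation (a sketch; p = 0.3 is an arbitrary choice):

```python
import random

random.seed(11)

p, n = 0.3, 50000
trials = [random.random() < p for _ in range(n + 1)]  # n + 1 Bernoullian trials

# m = frequency of the pair EE among the n consecutive pairs
m = sum(1 for i in range(n) if trials[i] and trials[i + 1])
deviation = abs(m / n - p * p)
```

Although consecutive pairs overlap and are therefore dependent, m/n still clusters tightly around p² = 0.09, exactly as the law of large numbers predicts.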
7. Simple chains of trials, described in Chap. V, Sec. 1, offer a good example of dependent trials to which the law of large numbers can be applied. Let p_1 be the given probability of an event E in the first trial. According to the definition of a simple chain, the probability of E in any subsequent trial is α or β according as E occurred or failed to occur in the preceding trial. By p_n we denote the probability for E to occur in the nth trial when the results of other trials are unknown. Let
δ = α − β,  p = β/(1 − δ).
Then, according to the developments in Chap. V, Sec. 2,
p_n = p + (p_1 − p)δ^{n−1},
whence
(p_1 + p_2 + · · · + p_n)/n = p + [(p_1 − p)/n]·(1 − δ^n)/(1 − δ),
barring the trivial cases δ = 1 or δ = −1. It follows that p represents the limit of the mean probability in n trials when n increases indefinitely, and for that reason p may be called the mean probability in an infinite chain of trials. When it is known that E has occurred in the ith trial, its probability of occurring in some subsequent jth trial is given by
p_j^{(i)} = p + qδ^{j−i},  q = 1 − p.
In the usual way we associate with trials 1, 2, 3, . . . variables x_1, x_2, x_3, . . . so that in general
x_i = 1 when E occurs in the ith trial,
x_i = 0 when E fails to occur in the ith trial.
Evidently
E(x_i) = E(x_i²) = p_i.
In order to prove that the law of large numbers can be applied to the variables x_1, x_2, x_3, . . . , we must have an idea of the behavior of B_n for large n. By definition
B_n = E(x_1 + x_2 + · · · + x_n − p_1 − p_2 − · · · − p_n)² = Σ_{i=1}^n E(x_i − p_i)² + 2 Σ_{j>i} E[(x_i − p_i)(x_j − p_j)].
The first sum can easily be found. We have
E(x_i − p_i)² = p_i − p_i² = pq + (q − p)(p_1 − p)δ^{i−1} − (p_1 − p)²δ^{2i−2},
whence
A = Σ_{i=1}^n E(x_i − p_i)² ≈ npq,
neglecting terms which remain bounded.
As to the second sum, we observe first that
E(x_i − p_i)(x_j − p_j) = E(x_i x_j) − p_i p_j.
Again, since the probability of x_i x_j = 1 is evidently p_i p_j^{(i)}, we have E(x_i x_j) = p_i p_j^{(i)}, and
E(x_i − p_i)(x_j − p_j) = pqδ^{j−i} + (p_1 − p)(q − p)δ^{j−1} − (p_1 − p)²δ^{i+j−2}.
Now, for a fixed i = 1, 2, . . . n − 1, we must take the sum of these expressions letting j run over i + 1, i + 2, . . . n. The result of this summation is
pq(δ − δ^{n−i+1})/(1 − δ) + (p_1 − p)(q − p)(δ^i − δ^n)/(1 − δ) − (p_1 − p)²δ^{i−1}(δ^i − δ^n)/(1 − δ).
Taking i = 1, 2, 3, . . . n − 1 and neglecting in the sum the terms which remain bounded, we get
B = Σ_{j>i} E(x_i − p_i)(x_j − p_j) ≈ npq·δ/(1 − δ),
whence
B_n = A + 2B ≈ npq·(1 + δ)/(1 − δ).
This asymptotic equality suffices to show that
B_n/n² → 0 as n → ∞.
Therefore the law of large numbers can be applied to the variables x_1, x_2, x_3, . . . . Since the sum
x_1 + x_2 + · · · + x_n = m
represents the frequency of E in n trials, the law of large numbers in this particular case can be stated as follows: For a fixed ε > 0, no matter how small, the probability of the inequality
|m/n − (p_1 + p_2 + · · · + p_n)/n| < ε
tends to 1 as n → ∞. The arithmetic mean
(p_1 + p_2 + · · · + p_n)/n
itself approaches the limit p. It is easy then to express the preceding theorem thus: The probability of the inequality
|m/n − p| < ε
tends to 1 as n → ∞. This proposition is of exactly the same type as Bernoulli's theorem, but applies to series of dependent trials.
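This dependent-trials analogue of Bernoulli's theorem can also be watched numerically. The sketch below simulates a simple chain (α = 1/2, β = 1/4, so p = β/(1 − δ) = 1/3, with the first trial taken at probability p as a simplifying assumption) and checks that m/n comes close to p:

```python
import random

random.seed(13)

alpha, beta = 0.5, 0.25    # probabilities after a success / after a failure
delta = alpha - beta
p = beta / (1 - delta)     # mean probability in an infinite chain = 1/3

n = 50000
occurred = random.random() < p   # first trial, assuming p_1 = p
m = int(occurred)
for _ in range(n - 1):
    occurred = random.random() < (alpha if occurred else beta)
    m += occurred
deviation = abs(m / n - p)
```

Any other starting probability p_1 gives the same limit, since the influence of the first trial decays like δ^n.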
8. Let a simple chain of N = ns trials be divided into n consecutive series each consisting of s trials; also, let m_1, m_2, . . . m_n be the frequencies of E in each of these series. When N is a large number, the mean probability in N trials differs little from the quantity denoted by p. It is natural to modify the definition of the divergence coefficient given in Sec. 3 by taking p instead of the variable mean probability in N trials. Thus we define
D′ = Σ_{i=1}^n (m_i − sp)²/(Npq).
In our case, the variables
X_1 = (m_1 − sp)²,  X_2 = (m_2 − sp)²,  . . .  X_n = (m_n − sp)²
are neither identical nor independent, although the degree of dependence is evidently very slight. These variables can also be presented in the form
(11)  (x_a − p + x_{a+1} − p + · · · + x_{a+s−1} − p)²,
taking successively a = 1, s + 1, 2s + 1, . . . (n − 1)s + 1.
To find the mathematical expectation of (11) it suffices to notice that
E(x_i − p)² = E(x_i − p_i)² + (p_i − p)² = pq + (q − p)(p_1 − p)δ^{i−1},
E(x_i − p)(x_j − p) = E(x_i − p_i)(x_j − p_j) + (p_i − p)(p_j − p) = pqδ^{j−i} + (p_1 − p)(q − p)δ^{j−1},
and then proceed exactly as in the approximate evaluation of B_n in Sec. 7. The final result is
E(x_a − p + x_{a+1} − p + · · · + x_{a+s−1} − p)² =
  spq(1 + δ)/(1 − δ) − 2pqδ/(1 − δ)² + (q − p)(p_1 − p)(1 + δ)δ^{a−1}/(1 − δ)² + 2pqδ^{s+1}/(1 − δ)² − (q − p)(p_1 − p)[2s(1 − δ) + 1 + δ]δ^{a+s−1}/(1 − δ)².
For somewhat large s the two last terms in the right member are completely negligible; so is the third term if a ≥ s + 1. Hence, with a good approximation,
E(X_1) = spq(1 + δ)/(1 − δ) − 2pqδ/(1 − δ)² + (q − p)(p_1 − p)(1 + δ)/(1 − δ)²,
E(X_i) = spq(1 + δ)/(1 − δ) − 2pqδ/(1 − δ)²  if  i > 1,
and
D = (1 + δ)/(1 − δ) − 2δ/(s(1 − δ)²) + (q − p)(p_1 − p)(1 + δ)/(Npq(1 − δ)²).
Again, when N is large, the last term can be dropped and as a good approximation to D we can take
(12)  D = (1 + δ)/(1 − δ) − 2δ/(s(1 − δ)²).
It can be shown that the law of large numbers holds for variables X_1, X_2, . . . X_n and therefore, when n (the number of series) is large, the
empirical divergence coefficient is not likely to differ considerably from D as given by the above approximate formula.
9. In order to see how far the theory of simple chains agrees with
actual experiments, the author of this book himself has done extensive experimental work.
To form a chain of trials, one can take two sets of
cards containing red and black cards in different proportions, and proceed to draw one card at a time (returning it to the pack in which it belongs after each drawing) according to the following rules: At the outset one card is taken from a pack which we shall call the
first
set;
then, whenever a red card is drawn, the next card is taken from the first set; but after a black card, the next one is taken from the
second
set.
Evidently, these rules completely determine a series of trials possessing properties of a simple chain.
In the first experiment the first pack
contained 10 red and 10 black cards, while the second pack contained 5 red and 15 black cards.
Altogether, 10,000 drawings were made, and
following their natural order, they were divided into 400 series of 25 drawings each.
The results are given in Table III.
TABLE III. DISTRIBUTION OF RED CARDS IN 400 SERIES OF 25 CARDS

Frequency m   Difference m − 8   Number of series
     1              −7                  2
     2              −6                  4
     3              −5                  8
     4              −4                 27
     5              −3                 29
     6              −2                 54
     7              −1                 37
     8               0                 52
     9               1                 47
    10               2                 44
    11               3                 41
    12               4                 20
    13               5                 20
    14               6                  7
    15               7                  4
    16               8                  3
    17               9                  1
The sum of the numbers in column 3 is 400, as it should be.
Taking
the sum of the products of numbers in columns 1 and 3, we get 3,323, which is the total number of red cards.
The relative frequency of red cards in
10,000 trials is, therefore, 0.3323.
In our case
α = 1/2,  β = 1/4,  δ = 1/4,
and the mean probability p in an infinite series of trials is
p = β/(1 − δ) = 1/3 = 0.3333.
Thus, the relative frequency observed differs from p by only 10⁻³, and
in this respect the agreement between theory and experiment is very satisfactory.
Now let us consider the theoretical divergence coefficient for which we have the approximate expression
D = (1 + δ)/(1 − δ) − 2δ/(s(1 − δ)²).
Here we must substitute δ = 1/4 and s = 25. The result is
D = 1.631, approximately.
To find the empirical divergence coefficient we must first evaluate the sum
S = Σ(m − 25/3)²
extended over all 400 series. For the sake of easier calculation, we present S thus:
S = Σ(m − 8)² − (2/3)Σ(m − 8) + 400·(1/3)².
Now from Table III we get
Σ(m − 8)² = 3,521;  Σ(m − 8) = 123,
whence
S = 3,483.4.
Dividing this number by 10,000·(2/9) = 2,222.2, we find the empirical divergence coefficient
D′ = 1.568,
which differs from D = 1.631 by only about 0.06, well within reasonable limits.
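Formula (12) and the experiment of this section can both be imitated in a few lines (a sketch; Python's random module replaces the card drawings):

```python
import random

random.seed(17)

alpha, beta = 0.5, 0.25   # the two packs of this section
delta = alpha - beta      # 1/4
p = beta / (1 - delta)    # 1/3
q = 1 - p
s, n = 25, 2000           # n series of s trials each, N = ns
N = n * s

# theoretical divergence coefficient, formula (12)
D = (1 + delta) / (1 - delta) - 2 * delta / (s * (1 - delta) ** 2)

# simulate one long chain and compute the empirical coefficient
occurred = random.random() < p
chain = [occurred]
for _ in range(N - 1):
    occurred = random.random() < (alpha if occurred else beta)
    chain.append(occurred)

S = 0.0
for i in range(n):
    m = sum(chain[i * s:(i + 1) * s])
    S += (m - s * p) ** 2
D_emp = S / (N * p * q)   # should not differ much from D = 1.631
```

With 2,000 series instead of 400 the fluctuation of D_emp about D is even smaller than in the card experiment.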
10. In two other experiments two packs were used: one containing
13 red and 7 black cards, and another 7 red and 13 black cards.
In
one experiment the pack with 13 red cards was considerPd as the
first
deck, and in the other experiment it became the
second
deck.
The
new experiments were conducted in the same way as that described in Sec. 9, but they were both carried to 20,000 trials divided into 1,000 series of 20 trials each. a =
In the first experiment, we have

    α = 13/20,    β = 7/20,    δ = 3/10,    p = 1/2

and D = 1.796, approximately,
SEc. 10)
APPLICATIONS OF THE LAW OF LARGE NUMBERS
229
while the same quantities for the second experiment are

    α = 7/20,    β = 13/20,    δ = −3/10,    p = 1/2

and D = 0.556, approximately. The results of these experiments are recorded in the following two tables:

TABLE IV. CONCERNING THE FIRST EXPERIMENT

  Frequency of    Difference,    Number of series
  red cards, m      m - 10       with these frequencies
       2             -8                  3
       3             -7                  5
       4             -6                 18
       5             -5                 36
       6             -4                 59
       7             -3                 93
       8             -2                103
       9             -1                117
      10              0                128
      11              1                121
      12              2                101
      13              3                 93
      14              4                 48
      15              5                 39
      16              6                 26
      17              7                  7
      18              8                  1
      19              9                  1
      20             10                  1
TABLE V. CONCERNING THE SECOND EXPERIMENT

  Frequency of    Difference,    Number of series
  red cards, m      m - 10       with these frequencies
       5             -5                  2
       6             -4                 10
       7             -3                 48
       8             -2                112
       9             -1                193
      10              0                251
      11              1                201
      12              2                113
      13              3                 56
      14              4                  9
      15              5                  5
Taking the sum of the products of numbers in columns 1 and 3, we find 10,036 and 10,045 as the total number of red cards in the first and second experiments. Dividing these numbers by 20,000, we have the following relative frequencies of red cards: 0.5018 and 0.50225, both extremely near to p = 0.5. From the first table we find that

    Σ(m − 10)² = 8,924

the summation being extended over all 1,000 series. Dividing this number by 20,000 · 1/4 = 5,000, we find the empirical divergence coefficient in the first experiment

    D′ = 1.785

which comes close to D = 1.796. Likewise, from the second table we find

    Σ(m − 10)² = 2,709,

whence, dividing by 5,000,

    D″ = 0.5418

again close to D = 0.5562.
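The three theoretical coefficients quoted in this section can be recovered from one function. A sketch, assuming the approximate expression for the divergence coefficient of a simple chain, D = 1 + 2δ/(1 − δ) − 2δ(1 − δ^s)/(s(1 − δ)²), with the values δ = 1/4, s = 25 and δ = ±3/10, s = 20 consistent with the quoted results:

```python
# Approximate divergence coefficient of a simple chain of trials.
def divergence(delta, s):
    return 1 + 2*delta/(1 - delta) - 2*delta*(1 - delta**s)/(s*(1 - delta)**2)

print(round(divergence(0.25, 25), 3))    # 1.631 (experiment of Sec. 9)
print(round(divergence(0.30, 20), 3))    # 1.796 (first experiment)
print(round(divergence(-0.30, 20), 3))   # 0.556 (second experiment)
```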
Thus, all the essential circumstances foreseen theoretically for simple chains of trials are in excellent agreement with our experiments.

Problems for Solution

1. From an urn originally containing a white and b black balls, n balls are drawn in succession, each ball drawn being replaced by 1 + c (c > 0) balls of the same color before the next drawing. If m is the frequency of white balls, show that the probability of the inequality

    |m/n − a/(a + b)| < ε

does not tend to 1 as n increases indefinitely (Markoff, G. Pólya).

Indication of the Proof. If x_i = 1 or x_i = 0, according as a white or a black ball appears in the ith drawing, we have

    E(x_i) = E(x_i²) = a/(a + b);    E(x_i x_j) = (a/(a + b))·((a + c)/(a + b + c)).
Hence

    B_n = E(x_1 + x_2 + ··· + x_n − na/(a + b))² = n(n − 1)·abc/((a + b)²(a + b + c)) + nab/(a + b)²,

so that B_n/n² does not tend to 0 and the usual proof of the law of large numbers fails.
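The failure can be seen numerically: B_n/n² approaches the nonzero limit abc/((a + b)²(a + b + c)). A sketch using the standard variance ab/(a + b)² and covariance abc/((a + b)²(a + b + c)) of the indicator variables (the urn contents a = b = c = 1 are an arbitrary choice):

```python
# B_n for the Polya urn grows like n^2, so B_n/n^2 does not tend to zero.
def B(n, a, b, c):
    var = a * b / (a + b) ** 2                      # E(x_i - p)^2
    cov = a * b * c / ((a + b) ** 2 * (a + b + c))  # E[(x_i - p)(x_j - p)], i != j
    return n * var + n * (n - 1) * cov

a, b, c = 1, 1, 1
for n in (10, 100, 1000):
    print(n, round(B(n, a, b, c) / n**2, 4))        # approaches 1/12 = 0.0833...
```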
2. Marbe's Problem. A group of exactly m uninterrupted successes E or failures F in a Bernoullian series of trials with the probability p for a success is called an "m-sequence." If N is the frequency of m-sequences in n trials, show that the probability of the inequality

    |N/n − (p^m q² + p²q^m)| < ε

for a fixed ε converges to 1 as n becomes infinite.

Indication of the Proof. Associate with each of the first μ = n − m + 1 trials a variable x_i assuming only two values, 0 and 1. For 1 < i < μ we set x_i = 1 if, beginning with the ith trial, a succession of m letters E or F is preceded and followed by F or E; in all other cases x_i = 0. We set x_1 = 1 if, beginning with the first trial, there is a succession of m letters E or F followed by F or E; otherwise x_1 = 0. Finally, x_μ = 1 if, beginning with the μth trial, there is a succession of m letters E or F preceded by F or E; otherwise x_μ = 0. Show that

    E(x_1 + x_2 + ··· + x_μ) = (n − m − 1)(p^m q² + p²q^m) + 2(p^m q + pq^m)
    E(x_1 + x_2 + ··· + x_μ)² = n²(p^m q² + p²q^m)² + nP

where P remains bounded.

3. The following interesting series of dependent trials has been suggested by S. Bernstein: Two urns contain white and black balls. The probabilities of drawing white balls from the first and second urns are, respectively, p and p′. The probabilities of drawing black balls from the same urns are q = 1 − p and q′ = 1 − p′. Finally, the probability of taking a ball from the first urn at the outset of the trials is α. A series of trials is uniquely defined by the following rule: Whenever a white ball is drawn (and returned), the next ball is drawn from the same urn; but when a black ball is drawn, the next ball is taken from the other urn. Let α_n be the probability that the nth ball will be drawn from the first urn when the results of other drawings remain unknown. Under the same assumption, let p_n be the probability of the nth ball being white. Find general expressions of α_n and p_n.
HINT:

    α_{n+1} = α_n(p + p′ − 1) + 1 − p′

whence

    α_n = (1 − p′)/(2 − p − p′) + (α − (1 − p′)/(2 − p − p′))·(p + p′ − 1)^{n−1}.

Also

    p_n = α_n p + (1 − α_n)p′

whence

    p_n = (p + p′ − 2pp′)/(2 − p − p′) + (α − (1 − p′)/(2 − p − p′))·(p − p′)·(p + p′ − 1)^{n−1}.
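These closed forms can be checked against the recursion itself. A sketch with hypothetical sample values p = 0.7, p′ = 0.4, α = 0.9:

```python
# Compare the recursion a_{n+1} = a_n(p + p' - 1) + 1 - p' with the closed
# forms for a_n and p_n = a_n*p + (1 - a_n)*p'.
p, pp, alpha = 0.7, 0.4, 0.9
lam = p + pp - 1                      # the ratio (p + p' - 1)
a_inf = (1 - pp) / (2 - p - pp)       # limiting probability of urn 1
a, diffs = alpha, []
for n in range(1, 6):
    closed_a = a_inf + (alpha - a_inf) * lam ** (n - 1)
    closed_p = (p + pp - 2*p*pp) / (2 - p - pp) \
        + (alpha - a_inf) * (p - pp) * lam ** (n - 1)
    diffs.append((abs(a - closed_a), abs(a*p + (1 - a)*pp - closed_p)))
    a = a * lam + 1 - pp              # advance the recursion
print(max(max(d) for d in diffs))     # essentially zero
```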
4. When it becomes known that in the ith trial a white ball was drawn, what are the probabilities α_j^(i) and p_j^(i) of taking a ball from the first urn in the jth (j > i) trial and of drawing a white ball in the same trial?
HINT: The probability α_i^(i) that it was the first urn from which a white ball was drawn in the ith trial is determined by Bayes' formula:

    α_i^(i) = α_i p / p_i.

For n ≥ i + 1

    α_{n+1}^(i) = α_n^(i)(p + p′ − 1) + 1 − p′

whence, for j > i,

    α_j^(i) = (1 − p′)/(2 − p − p′) + (α_i p/p_i − (1 − p′)/(2 − p − p′))·(p + p′ − 1)^{j−i}.

Furthermore

    p_j^(i) = α_j^(i) p + (1 − α_j^(i))p′    for j ≥ i + 1.

5. From now on we shall assume p + p′ = 1 or p′ = q, q′ = p. Show that the law of large numbers can be applied to the variables x_1, x_2, x_3, . . . which are defined in the usual way:

    x_i = 1 if a white ball is drawn in the ith trial,
    x_i = 0 if a black ball is drawn in the ith trial.
Indication of the Proof. Evidently

    E(x_i) = E(x_i²) = p_i.

Furthermore

    B_n = Σ_{i=1}^n E(x_i − p_i)² + 2 Σ_{j>i} E(x_i − p_i)(x_j − p_j).

Now

    E(x_i − p_i)² = 2pq(1 − 2pq),    i > 1;
    E(x_1 − p_1)² = pq + α(1 − α)(p − q)².

For j > i > 1

    E(x_i − p_i)(x_j − p_j) = 0    if j > i + 1;
    E(x_i − p_i)(x_{i+1} − p_{i+1}) = pq(1 − 4pq).

For i = 1 and j > 1

    E(x_1 − p_1)(x_j − p_j) = 0    if j > 2;
    E(x_1 − p_1)(x_2 − p_2) = αp² + (1 − α)q² − (1 − 2pq)(q + (p − q)α).

Hence

    B_n ∼ 4pq(1 − 3pq)·n

and the law of large numbers holds. It can be stated as follows: If in n trials the frequency of white balls is m, then the probability of the inequality

    |m/n − (p² + q²)| < ε
tends to 1 as n tends to infinity for any given positive number ε.

6. Let r = p² + q² be the mean probability in infinitely many trials. Find the divergence coefficient

    D = E[ Σ_{i=1}^n (m_i − sr)² ] / (Nr(1 − r))

when N = ns trials are divided into n consecutive groups containing s trials each.
Indication of Solution. From the foregoing formulas it follows that

    E(x_a − r + x_{a+1} − r + ··· + x_{a+s−1} − r)² = 4spq(1 − 3pq) − 2pq(1 − 4pq)

if a > 1. Hence

    E Σ_{i=2}^n (m_i − sr)² = 4Npq(1 − 3pq) − 4spq(1 − 3pq) − 2(n − 1)pq(1 − 4pq).

Again

    E(m_1 − sr)² = 4spq(1 − 3pq) − 2pq(3 − 10pq) + p(1 − 6q + 12q² − 4q³) − α(p − q)(1 − 8pq)

so that finally

    D = (2 − 6pq)/(1 − 2pq) − (1 − 4pq)/(s(1 − 2pq)) + (p − q)(p − α)(1 − 8pq)/(2Npq(1 − 2pq)).

For large N, with a good approximation,

    D = (2 − 6pq)/(1 − 2pq) − (1 − 4pq)/(s(1 − 2pq)).
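The limiting statement of Prob. 5 can be illustrated by direct simulation of Bernstein's scheme with p′ = q. A Monte Carlo sketch (p = 0.7 is an arbitrary choice, so the mean probability is r = p² + q² = 0.58):

```python
import random

# Simulate the two-urn scheme with p' = q and count white balls.
random.seed(1)
p, q = 0.7, 0.3
n, urn1, white = 200_000, True, 0
for _ in range(n):
    drew_white = random.random() < (p if urn1 else q)
    white += drew_white
    if not drew_white:            # black ball: switch to the other urn
        urn1 = not urn1
print(round(white / n, 3))        # close to p^2 + q^2 = 0.58
```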
7. Two sets of cards containing respectively 12 red and 4 black cards (the first deck) and 4 red and 12 black cards (the second deck) were used in the following experiment: The first card was taken from the first deck, and in the following trials, after a red card the next one was taken from the same deck, but after a black one the next card was taken from the other deck. Altogether 25,000 cards were drawn, and in their natural order were divided into 1,000 series of 25 cards each. The results are recorded in Table VI. How close is the agreement between this experiment and the theory?
TABLE VI. DISTRIBUTION OF RED CARDS IN 1,000 SERIES OF 25 CARDS

  Frequency of    Difference,    Number of series
  red cards, m      m - 16       with these frequencies
       6            -10                  1
       7             -9                  1
       8             -8                  1
       9             -7                 12
      10             -6                 13
      11             -5                 43
      12             -4                 65
      13             -3                 92
      14             -2                101
      15             -1                162
      16              0                 94
      17              1                164
      18              2                 68
      19              3                110
      20              4                 26
      21              5                 28
      22              6                 10
      23              7                  7
      24              8                  1
      25              9                  1
Ans. In the present case p = 3/4, q = 1/4, p′ = 1/4, q′ = 3/4. Mean probability in infinitely many trials:

    p² + q² = 5/8 = 0.625.

Theoretical divergence coefficient: D = 1.384. Frequency of red cards: 15,696. Relative frequency: 15,696/25,000 = 0.62784, close to 0.625. Empirical divergence coefficient: D′ = 1.3845, very close to 1.384. The probability of taking a card from the second deck is 0.25. Now, by actual counting, it was found that in 7,500 trials a card was taken from the second deck 1,856 times. Hence, the relative frequency of this event in 7,500 trials is 1,856/7,500 = 0.2475, again very close to 0.25.
References

POISSON: "Recherches sur la probabilité des jugements en matière criminelle et en matière civile," Paris, 1837.
TSHEBYSHEFF: "Sur les valeurs moyennes," Oeuvres I, pp. 687-694.
W. LEXIS: "Zur Theorie der Massenerscheinungen in der menschlichen Gesellschaft," 1877.
A. TSCHUPROW: "Grundbegriffe und Grundprobleme der Korrelationstheorie," Leipzig, 1925.
A. MARKOFF: "Calculus of Probability," 4th Russian edition, Leningrad, 1924.
S. BERNSTEIN: "Theory of Probability" (Russian), Moscow, 1927.
CHAPTER XII

PROBABILITIES IN CONTINUUM

1. In the preceding parts of this book, whenever we dealt with stochastic variables, it was understood that their range of variation was represented by a finite set of numbers. Although, for the sake of better understanding of the subject, it was natural to begin with this simplest case, there are many reasons why it is necessary to introduce into the calculus of probability stochastic variables with infinitely many values. Such variables present themselves naturally in many cases of the type of Buffon's needle problem which we had occasion to mention in Chap. VI. On the other hand, even in dealing with stochastic variables with a finite, but very large number of values, it is often profitable, for the sake of approximate evaluations, to substitute for them fictitious variables with infinitely many values. Among these the most important ones by far are continuous variables.

CASE OF ONE VARIABLE
2. Beginning with the case of a single continuous variable x, we must assume that its range of variation is known and represented by a given interval (a, b), finite or infinite. The knowledge only of the range of variation of x would not enable us to consider x as a stochastic variable; to be able to do so, we must introduce in some form or other the considerations of probability.
For a continuous variable it is as unnatural to speak of the probability of any selected single value, as it is to speak of the dimension of a single selected point on a line. But just as we speak of the length of a segment of a line, we may introduce the notion of the probability that x will be confined to a given interval (c, d), part of (a, b).
In introducing this new notion of probability in any manner whatsoever, we must be careful not to fall into contradiction with the laws of probability which are assumed as fundamental. To this end, if P(c, d) is the probability for x to lie in the interval (c, d), we are led to assume

    1°  P(c, d) ≥ 0;
    2°  P(a, b) = 1.

The first assumption is an expression of the fact that probability can never be negative. The second assumption corresponds to the fact that x certainly assumes one out of the totality of its possible values.
Next, if the interval (c, d) is divided into two adjoining intervals (c, e) and (e, d), we assume

    3°  P(c, d) = P(c, e) + P(e, d)

in conformity with the theorem of total probability. For continuous variables it is furthermore assumed: 4° for an infinitesimal interval (c, d), P(c, d) is also infinitesimal.

Properties 3° and 4° show that P(c, d) is a continuous function of c and d and that

    P(c, c) = 0.

In other words, the probability that x will assume any given value is 0. At the same time P(c, d) represents the probability of any one of the four inequalities

    c < x < d;    c ≤ x < d;    c < x ≤ d;    c ≤ x ≤ d.
3. A simple example will serve to clarify these general considerations. A small ball of negligible dimensions is made to move on the rim of a circular disk.
It is set in motion by a vehement impulse and after many
complete revolutions, retarded by friction and the resistance of the air, comes to rest.
The variety and complexity of causes influencing the
motion of the ball make it impossible to foresee the final position of the ball when it comes to rest, and the whole phenomenon bears characteristic features of a play of chance.
The stochastic variable associated with this
chance phenomenon is the distance from a certain definite point on the rim (origin) to the final position of the ball, counted in a definite direction, for example, clockwise.
This variable, when we consider the ball as a
mere point, may have any value between 0 and the length of the rim. The question now arises, how to define the probability that the ball will stop in a specified portion of the rim, or else that the variable we consider will have a value belonging to a definite interval, part of its total range of variation.
In trying to define this probability, we must observe the
fundamental requirements set forth in Sec. 2.
Besides that, we must of
necessity resort to considerations which are not mathematical in their nature but are based partly on aprioristic and partly on experimental grounds.
Suppose we take two equal arcs on the rim.
There is nothing
perceptible a priori that would make the ball stop in one arc rather than in another.
Besides, actual experiments show that the ball stops in one
arc approximately the same number of times as in another, and this experimental knowledge together with aprioristic considerations suggests the assumption that we must attribute equal probabilities to equal arcs, irrespective of the position of the arcs on the rim.
As soon as we agree on
this assumption or hypothesis, the problem becomes mathematical and can easily be solved.
Before proceeding to the solution, a remark on the meaning of zero probability in connection with continuous variables is not out of place. Zero probability in this case does not mean logical impossibility.
We
attribute zero probability to the event that the ball will stop precisely at the origin.
However, that possibility is not altogether excluded
so far as we consider the origin and the ball as mere points.
The question
lacks sense if we deal with a material ball and a material rim, no matter how small the former and how fine the latter.
4. A stochastic variable is said to have uniform distribution of probability if probabilities attached to two equal intervals are equal. This means that interval
(c, d)
P(c, d)
depends only upon the length
d
c = 8 of the P(8). Com 8 and 81 into a 
and accordingly can be denoted simply by
bining two adjoining intervals of the respective lengths single interval of length 8 +
811
according to requirement 3°, we must
have
P(8 + 81)
(1)
Suppose now that the interval
=
P(8) + P(8').
(a, b)
ing the whole range of variation of of the length
l/n.
of the length x,
b

is divided into
a
n
=
l,
represent
equal intervals
The repeated application of equation (1) gives
P(l) But by requirement 2°
P(l)
=
=
nP(*}
1 and hence
Again, repeated application of (1) gives
for any integer appropriate
m
m
<
n.
Now let us take any interval of length
it will contain the interval
�l n
8.
For an
and be contained in the
1 interval m + z; hence, referring to requirements 1° and 3° 1 we shall have
n
while
238
INTRODUCTION TO MATHEMATICAL PROBABILITY
[CHAP.
XII
or
�
s
Since P(s) and
s/l
m + 1 . n
�
n  l
<
are contained in the same interval of length 1/n,
'P(s) rl � <
and this being true for an arbitrary n, no matter how large, it follows that 8
P(s) Thus for a variable
x
=
l'
with uniform distribution of probability, the
probability of assuming a value belonging to an interval of length given by the ratio of
s to the length l
s
of the whole range of variation of
is x.
5. In the general case, when we cannot assume the uniform distribution of probability throughout the whole range of variation of x, we let ourselves be guided by an analogy with a mass distributed continuously over a line. In fact, the distribution of a mass satisfies all the requirements set forth for probability. In particular, the mass Δm contained in an infinitesimal interval (z, z + Δz) is also infinitesimal, and the mean density

    Δm/Δz

is generally supposed to tend, with Δz converging to 0, to a limit called "density at the point z." If this density p(z) is known, the mass contained in any interval (c, d) is represented by an integral

    ∫_c^d p(z)dz.

Following this analogy we admit that the mean density of probability

    P(z, z + Δz)/Δz

tends to a limit f(z), the density of probability at the point z, when the length of the interval Δz tends to 0. Hence, again the probability corresponding to an interval (c, d) will be represented by the integral

    P(c, d) = ∫_c^d f(z)dz.

This expression satisfies all the requirements of Sec. 2 if the density of the probability f(z) is subject to two conditions:

    (a)  f(z) ≥ 0 for all z in (a, b);
    (b)  ∫_a^b f(z)dz = 1.

The second condition implies, of course, the existence of the integral itself. But in all cases of any importance the density is continuous, save for discontinuities of the simplest kind which do not cause any doubts as to the existence of the above integral. From the general expression of P(c, d) it follows that for an infinitesimal interval (z, z + dz) the probability is given by f(z)dz, neglecting infinitesimals of a higher order. For the uniform distribution of probability over an interval of length l the density is constant and f(z) = 1/l.
In other cases we cannot expect to obtain a definite expression for density unless the variable itself is sufficiently characterized by addi tional conditions, either hypothetical or implied by the problem.
Thus,
for instance, in applications of probability to problems of theoretical physics, the physicists have succeeded in obtaining definite probability distributions by invoking physical laws of admitted universal validity together with some plausible hypotheses.
6. The interval containing all possible values of a stochastic variable may be finite or infinite according to the nature of that variable. However, in all cases we may take the largest possible interval from −∞ to +∞; to this end it suffices to define the density outside of the originally given interval as being = 0. Then the density will be defined for all real values of z and will satisfy the conditions:

    (a)  f(z) ≥ 0 for all z;
    (b)  ∫_{−∞}^{∞} f(z)dz = 1.

Furthermore, the probability for x to be in any interval (c, d) will be given by

    ∫_c^d f(z)dz.

In particular, taking c = −∞ and writing t instead of d,

    F(t) = ∫_{−∞}^t f(z)dz

represents the probability that x will not exceed or will be less than t. Considered as a function of t, F(t) is never decreasing and varies between F(−∞) = 0 and F(+∞) = 1. It is called the "distribution function of probability." In case x has uniform distribution of probability over an interval (a, b), its distribution function is evidently defined as follows:

    F(t) = 0                  for  t < a
    F(t) = (t − a)/(b − a)    for  a ≤ t ≤ b
    F(t) = 1                  for  t > b.

Its graph is shown in Fig. 1 on page 240.
7. The definition of mathematical expectation can easily be extended to continuous variables; namely, the expectation or the mean value of x is defined by

    E(x) = ∫_{−∞}^{∞} zf(z)dz

provided this integral exists. Similarly, the mathematical expectation of any function φ(x) is given by

    E[φ(x)] = ∫_{−∞}^{∞} φ(z)f(z)dz.

FIG. 1.

Of course, the existence of the integral in the right member is presupposed again. When this integral does not exist, it is meaningless to speak of the mathematical expectation of φ(x). The mathematical expectation of the power x^n with positive integer exponent is called the moment of the order n or the nth moment. We shall denote it by m_n, so that

    m_n = ∫_{−∞}^{∞} z^n f(z)dz.

The dispersion D and the standard deviation σ of x are defined in the same way as in Chap. IX; namely,

    D = σ² = E(x − m_1)² = ∫_{−∞}^{∞} (z − m_1)² f(z)dz = m_2 − m_1².
Often it is advisable to consider the mathematical expectation of |x|^a where a may be any real number, ordinarily positive. This expectation is called the "absolute moment of the order a." Its expression is

    μ_a = ∫_{−∞}^{∞} |z|^a f(z)dz,

and it is evident that μ_{2k} = m_{2k} for even orders.

The mathematical expectation of the function e^{itx}, where t is a real variable, is of the utmost importance. It is called the "characteristic function" of distribution and is defined by

    φ(t) = ∫_{−∞}^{∞} e^{itz} f(z)dz.

Since f(z) ≥ 0 and

    ∫_{−∞}^{∞} f(z)dz = 1

the integral defining φ(t) is always convergent and

    |φ(t)| ≤ 1.

The distribution is completely determined by its characteristic function. Because by the Fourier theorem

    (1/2π) ∫_{−∞}^{∞} dt ∫_{−∞}^{∞} e^{it(z−x)} f(z)dz = f(x)

at all points of continuity of f(x). But the left-hand member is

    (1/2π) ∫_{−∞}^{∞} φ(t)e^{−itx}dt

by the definition of φ(t), and so

    f(x) = (1/2π) ∫_{−∞}^{∞} φ(t)e^{−itx}dt.
8. To illustrate the preceding general explanations we shall now consider a few examples.

Example 1. Let x be a variable with uniform distribution of probability over the interval (0, l). The density of this distribution being constant,

    f(z) = 1/l,

the mean value of x is

    m_1 = (1/l) ∫_0^l z dz = l/2

and the second moment

    m_2 = (1/l) ∫_0^l z² dz = l²/3.

Hence, the square of the standard deviation

    σ² = m_2 − m_1² = l²/3 − l²/4 = l²/12.

This simple example may be used to illustrate a remark made at the beginning of this chapter, that sometimes it is profitable to substitute for a variable with a finite but large number of values a fictitious continuous variable. Suppose that in flipping a coin n times, we mark heads by 1 and tails by 0, thus obtaining a sequence comprising n units and zeros altogether, disposed in the order of trials. This sequence may be considered as successive digits in the binary representation of a fraction

    X = α_1/2 + α_2/4 + ··· + α_n/2^n

contained between 0 and 1. X may be considered as a stochastic variable with 2^n values, each having the probability 1/2^n. The probability Π(α, β) that X will be contained in the interval (α, β), or more definitely that X will satisfy the inequalities

    α < X ≤ β
is obviously obtained by multiplying the number of integers N contained in the limits 2^n α < N ≤ 2^n β by 1/2^n. Now there are exactly

    [2^n β] − [2^n α] = 2^n(β − α) + θ;    −1 < θ < 1

such integers; hence

    Π(α, β) = β − α + θ/2^n.

If n is even moderately large, this probability is very near to the probability

    P(α, β) = β − α

that a fictitious variable x with uniform distribution over the interval (0, 1) will assume a value in the interval (α, β). The first two moments of the variable X are, respectively,

    M_1 = (0 + 1 + 2 + ··· + (2^n − 1))/2^{2n} = 1/2 − 1/2^{n+1}

    M_2 = (0² + 1² + 2² + ··· + (2^n − 1)²)/2^{3n} = 1/3 − 1/2^{n+1} + 1/(3·2^{2n+1})

and differ little from the respective moments 1/2 and 1/3 of the fictitious continuous variable. Without losing anything essential, we here gain considerably in simplicity by substituting a fictitious continuous variable for the discontinuous variable X.
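The moment computation for the binary-fraction variable is easy to verify numerically; a sketch with n = 10 digits:

```python
# First two moments of X = 0.a1a2...an (binary), all 2^n values equally likely.
n = 10
M1 = sum(range(2 ** n)) / 2 ** (2 * n)
M2 = sum(N * N for N in range(2 ** n)) / 2 ** (3 * n)
print(round(M1, 6), round(M2, 6))   # near the continuous moments 1/2 and 1/3
```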
Example 2. A thin bar can rotate freely about its middle point P. It is set in motion and after several revolutions comes to a stop, pointing toward a point X on a line l. The position of the bar is determined by an angle θ formed by itself and the perpendicular PO dropped from P on l; θ varies between the limits −π/2 and π/2 and its distribution is supposed to be uniform. The position of X is determined by its distance OX = x from O, this distance being positive or negative according as X is to the right or to the left of the point O.

FIG. 2.

It is required to find the distribution of the probability of x. The relation between θ and x is

    x = a tan θ

if OP = a or, conversely,

    θ = arctan(x/a).

By differentiation we find the relation between dθ and dx:

    dθ = a dx/(a² + x²).

Now, by hypothesis, the probability that θ will be contained between θ and θ + dθ is

    dθ/π = (1/π)·a dx/(a² + x²).

And the probability that the distance of X from O will be contained between x and x + dx is the same. Hence, the density of probability for the variable x is

    f(z) = (1/π)·a/(a² + z²)
and the probability corresponding to a finite interval (c, d) is given by

    P(c, d) = (1/π) ∫_c^d a dz/(a² + z²).

For the whole range of variation of x

    (1/π) ∫_{−∞}^{∞} a dz/(a² + z²) = 1

as it should be. However, we cannot speak of the mean value of x or of moments of higher order, since the integrals

    ∫_{−∞}^{∞} x dx/(a² + x²),    ∫_{−∞}^{∞} x² dx/(a² + x²),  etc.

have no meaning. But the characteristic function φ(t) exists and is given by

    φ(t) = (a/π) ∫_{−∞}^{∞} e^{itx} dx/(a² + x²) = e^{−a|t|}.
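Example 2 lends itself to simulation: spinning θ uniformly and projecting onto the line reproduces the arctangent law. A sketch (a = 1 and the interval (−1, 2) are arbitrary choices):

```python
import math
import random

# Sample x = a*tan(theta), theta uniform on (-pi/2, pi/2), and compare the
# frequency of c < x < d with (arctan(d/a) - arctan(c/a)) / pi.
random.seed(2)
a, c, d, n = 1.0, -1.0, 2.0, 100_000
hits = sum(c < a * math.tan(random.uniform(-math.pi / 2, math.pi / 2)) < d
           for _ in range(n))
exact = (math.atan(d / a) - math.atan(c / a)) / math.pi
print(round(hits / n, 3), round(exact, 3))
```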
Example 3. One of the most important distributions (theoretically and practically) is the so-called "Gaussian" or "normal" distribution. The density of this distribution is given by

    f(z) = K e^{−h²(z−a)²}

with three parameters K, h, a. However, only two of these parameters are independent, since we must have

    ∫_{−∞}^{∞} f(z)dz = 1;

whence

    K ∫_{−∞}^{∞} e^{−h²(z−a)²}dz = K ∫_{−∞}^{∞} e^{−h²u²}du = K√π/h = 1;    K = h/√π

and finally

    f(z) = (h/√π)·e^{−h²(z−a)²}.

To find the meaning of a and h we observe that the mean value of our variable is

    (h/√π) ∫_{−∞}^{∞} z e^{−h²(z−a)²}dz = (h/√π) ∫_{−∞}^{∞} (z − a)e^{−h²(z−a)²}dz + (ah/√π) ∫_{−∞}^{∞} e^{−h²(z−a)²}dz = a

since the first integral vanishes and the second equals √π/h.
Thus a has the meaning of the mean value of the normally distributed variable x. The square of the standard deviation is given by

    σ² = (h/√π) ∫_{−∞}^{∞} (z − a)² e^{−h²(z−a)²}dz = 1/(2h²)

whence

    h = 1/(σ√2).

Thus for the normally distributed variable with the mean a and standard deviation σ the density of probability is

    f(z) = (1/(σ√2π))·e^{−(z−a)²/(2σ²)}.

Finally, for the variable u = x − a with the mean value 0 and the same standard deviation, the expression of density takes the simplest form

    f(z) = (1/(σ√2π))·e^{−z²/(2σ²)},
and the distribution function of probability is represented by the integral

    F(t) = (1/(σ√2π)) ∫_{−∞}^t e^{−z²/(2σ²)}dz.

The curve of density

    y = (1/(σ√2π))·e^{−x²/(2σ²)}

or the probability curve has a bell-shaped form as shown in the figure corresponding to σ = 1. It has a single maximum corresponding to x = 0 and on both sides of this maximum it rapidly approaches the x axis.

FIG. 3.

The characteristic function of normal distribution has a very simple form. By definition

    φ(t) = (1/(σ√2π)) ∫_{−∞}^{∞} e^{−z²/(2σ²)} e^{itz}dz.

From the known formula

    ∫_{−∞}^{∞} e^{−az²+bz}dz = √(π/a)·e^{b²/(4a)}    (a > 0)

we find that

    φ(t) = e^{−σ²t²/2}.

The moments of the normal distribution (with a = 0) can now be easily found. From the definition of the characteristic function it follows that

    i^n m_n = (d^n φ(t)/dt^n)_{t=0}.
In our case

    φ(t) = e^{−σ²t²/2} = 1 − σ²t²/2 + (σ²t²)²/(1·2·2²) − (σ²t²)³/(1·2·3·2³) + ···

whence

    (d^{2k+1}φ(t)/dt^{2k+1})_{t=0} = 0;    (d^{2k}φ(t)/dt^{2k})_{t=0} = (−1)^k·1·3·5 ··· (2k − 1)·σ^{2k}.

Thus

    m_{2k+1} = 0;    m_{2k} = 1·3·5 ··· (2k − 1)·σ^{2k}.
x, y referred to a rectangular The domain S of all the possible values of X and Y will
rically by a point with the coordinates system of axes.
be represented by a portion (finite or infinite) of a plane with a definite boundary unless this domain coincides with the whole plane. probability that the point
dxdy will cp(x, y) is
x, y
The
should belong to an infinitesimal area
be expressed by the product
cp(x, y) dxdy
where the function
again called the density of probability at the point
x, y.
The
density of probability must satisfy two requirements: it is nonnegative in the whole domain S and
JJcp(x, y)dxdy
=
1
8
where the double integral is extended over all the domain S. probability for the point
x, y
to be located in a given domain
0'
The
is then
given by the integral
JJcp(x, y)dxdy "
extended over If
cp(x, y)
uniform.
0'.
is a constant in S, the distribution of probability is called
The domain S in this case must be finite and if its area is
denoted by the same letter, then
cp(x, y) The probability for the point
1 =
x, y to be
E
f
within the domain
by the ratio 0'
8 denoting the area of the domain
0'
by
0'
again.
0'
will be given
10. We can always substitute the whole plane for the domain S. To that end it suffices to set

    φ(x, y) = 0

in all points not belonging to S. We shall then have

    φ(x, y) ≥ 0

everywhere and

    ∫_{−∞}^{∞} ∫_{−∞}^{∞} φ(x, y)dxdy = 1.

By doing so we have the advantage of stating results in a perfectly general form without mentioning the domain S. However, in dealing with
particular problems, it is more convenient to consider only those points which can actually represent simultaneous values of the variables.

The probability of the simultaneous inequalities

    a < x < b;    c < y < d

according to the general definition is represented by the double integral

    ∫_a^b ∫_c^d φ(x, y)dxdy.

This corresponds to the compound probability of two events and we must see that the fundamental theorem of compound probabilities continues to hold. Taking c = −∞, d = +∞, the repeated integral

    ∫_a^b dx ∫_{−∞}^{∞} φ(x, y)dy

represents the probability P(a, b) for the variable X (as if it were considered alone without any reference to Y) to have its value in (a, b). The function

    f(x) = ∫_{−∞}^{∞} φ(x, y)dy

represents the density of probability of X. Thus

    P(a, b) = ∫_a^b f(x)dx.

In a similar way

    F(y) = ∫_{−∞}^{∞} φ(x, y)dx

represents the density of the probability of Y; and the probability Q(c, d) that this variable has its value in (c, d) is given by

    Q(c, d) = ∫_c^d F(y)dy.
247
PROBABILITIES IN CONTINUUM
Now the double integral
.CJ.tlrp(x, y)dxdy can be written in either of the forms
J:bJ.tlrp(x, y)dxdy J:b!(x)dx J.tlF1(y)dy J:bJ:tlrp(x, y)dxdy LtlF(y)dy J:b!I(x)dx =
·
·
=
where
F1(y)
=
J:brp(x, y)dx . , J.bf(x)dx
JI(x)
=
"
J.tlc rp(x, y)dy J.tlc F(y)dy
may be considered as densities of conditional probabilities, respectively, for Y when it is known that X has a value in known that Y has value in
(c, d).
(a, b)
and for X when it is
The preceding expressions for the
probability of the simultaneous inequalities a
<x
c y
have the same form as the theorem of compound probability and may be considered as its extension. its value in
The conditional probability for Y to have
(c, d) when it is known that X has its value in ( b). is given by f.tlF1(y)dy. a,
Now, we define variables X and Y as independent when the proba
(c, d) is not affected by the knowledge that X belongs (a, b), which means that itlF1(y)dy itlF(y)dy
bility for Y to be in to
=
or
JbLdrp(x, y)dxdy LtlF(y)dy J:b!(x)dx and, since intervals ( b) and (c, d) are arbitrary, rp(x, y) f(x) F(y) ·
=
a,
=
at points of continuity.
·
Hence, the density of probability for two
independent variables is a product of a function of of
y alone.
x alone by a function
Conversely, when this condition is satisfied the variables are
independent.
For independent variables the probability of the simul
taneous inequalities
a<x
248
[CHAP. XII
INTRODUCTION TO MATHEMATICAL PROBABILITY
has a simple expression
Lbf(x)dx £tlF(y)dy ·
which is the product of the probability for interval
(c, d),
(a, b)
X
to have its value in the
by the probability for Y to have its value in the interval
in perfect analogy with the compound probability of two inde
pendent events. Finally, the mathematical expectation of any function
Y,(x, y)
, can be
defined by
E(Y,(x, y))
=
J_'",.J_ "'.., 1/t(x, y)rp(x, y)dxdy
provided the integral in the right member exists.
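The factorization of rectangle probabilities for independent variables is easy to see numerically. A sketch with hypothetical densities f(x) = e^(-x) on (0, ∞) and F(y) = 2y on (0, 1), so that φ(x, y) = f(x)F(y):

```python
import math

# Midpoint-rule integration; for phi(x, y) = f(x)F(y) the double integral
# over a rectangle equals the product of the two single integrals.
def integral(g, lo, hi, steps):
    h = (hi - lo) / steps
    return sum(g(lo + (i + 0.5) * h) * h for i in range(steps))

a, b, c, d = 0.2, 1.5, 0.1, 0.9
Px = integral(lambda x: math.exp(-x), a, b, 20_000)
Py = integral(lambda y: 2 * y, c, d, 20_000)
double = integral(lambda x: math.exp(-x) * integral(lambda y: 2 * y, c, d, 200),
                  a, b, 2_000)
print(round(double, 5), round(Px * Py, 5))   # the two values agree
```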
11. It is hardly necessary to dwell at length upon the case of several stochastic variables. A system of particular values x_1, x_2, … x_n of n stochastic variables X_1, X_2, … X_n may be considered as a point in n-dimensional space. The density of probability is a non-negative function phi(x_1, x_2, … x_n) defined in the whole space and satisfying the condition

$$\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty} \varphi(x_1, x_2, \ldots x_n)\,dx_1\,dx_2\cdots dx_n = 1.$$

The probability for a point representing X_1, X_2, … X_n to be located in a given domain sigma is given by the integral

$$\iint\cdots\int_\sigma \varphi(x_1, x_2, \ldots x_n)\,dx_1\,dx_2\cdots dx_n$$

extended over sigma. In the case of uniform distribution of probability, phi(x_1, x_2, … x_n) is by definition a constant in a certain finite region of space and 0 outside of that region. If v is the volume of the domain sigma and V is the volume of that region, the ratio v/V gives the probability that a point belongs to sigma.

The probability of the simultaneous inequalities

$$a_1 < x_1 < b_1; \quad a_2 < x_2 < b_2; \quad \ldots \quad a_n < x_n < b_n$$

is given by the integral

$$\int_{a_1}^{b_1}\!\cdots\!\int_{a_n}^{b_n} \varphi(x_1, x_2, \ldots x_n)\,dx_1\cdots dx_n,$$

which, by introduction of the conditional probabilities as in the case of two variables, can be put into the form of a product of n integrals in a manner perfectly analogous to the expression of the probability of a compound event with n components.
Finally, the variables are independent if the density phi(x_1, x_2, … x_n) is a product of n functions depending only upon x_1, x_2, … x_n, respectively, and conversely.

SEC. 13]          PROBABILITIES IN CONTINUUM          249

The expression

$$E[\psi(x_1, x_2, \ldots x_n)] = \int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty} \psi(x_1, x_2, \ldots x_n)\,\varphi(x_1, x_2, \ldots x_n)\,dx_1\,dx_2\cdots dx_n$$

serves to define the mathematical expectation of any function psi(x_1, x_2, … x_n) of x_1, x_2, … x_n.

12. Since in introducing the extended idea of probability we took care to preserve the fundamental theorems of the calculus of probability, we may be sure that other theorems derived from them will hold for continuous variables. In particular, theorems concerning mathematical expectation and the fundamental lemma in Chap. X, Sec. 1, hold for continuous variables. Upon this basis, as we have seen, was built the proof of the law of large numbers. Hence, this important theorem applies equally to continuous variables.
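As a small illustration (not from the text), the law of large numbers for a continuous variable can be watched numerically; the variable below is uniform on (0, 1), whose expectation is 1/2, and the sample size and seed are arbitrary choices:

```python
import random

def running_mean(trials, seed=0):
    """Average of uniform (0, 1) draws; by the law of large numbers
    it should settle near the expectation 1/2 as trials grows."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(trials)) / trials
```

With a few hundred thousand draws the average lies within a few thousandths of 1/2.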
GEOMETRICAL PROBLEMS

13. A few geometrical problems will afford a good illustration of the foregoing general principles.

Problem 1. A rectilinear segment AB is divided by a point C into two parts AC = a, CB = b. Points X and Y are taken at random on AC and CB, respectively. What is the probability that AX, XY, BY can form a triangle?

FIG. 4.

Solution. We must first agree upon the meaning of the expression "at random." The idea suggested by this expression implies that the way of selecting points X and Y gives no preference to any point of AC and CB, respectively. Consequently, the variables x = AX and y = BY may be assumed to have uniform distribution of probability. The domain of the point x, y is a rectangle OMPN with the sides OM = a, ON = b.

FIG. 5.

In order that AX, XY, BY can form a triangle, the following inequalities must be fulfilled:

$$x < (a + b - x - y) + y; \qquad y < (a + b - x - y) + x; \qquad a + b - x - y < x + y.$$

These inequalities are equivalent to

$$x < \frac{a + b}{2}, \qquad y < \frac{a + b}{2}, \qquad x + y > \frac{a + b}{2}.$$

To interpret them geometrically, through P draw a line QR making angle RQO = 45 degrees. From the midpoint V of QR drop the perpendiculars VS, VW on OX, OY. Then the preceding inequalities limit the position of the point x, y to the shaded area SVW, whose part TSU is contained in the rectangle OMPN. Variables x and y are independent and have uniform distribution. Hence, the density of probability of the pair x, y is constant, and the probability that the point x, y is in the triangle TSU will be

$$\frac{\text{Area } TSU}{\text{Area } OMPN} = \frac{\tfrac{1}{2}b^2}{ab} = \frac{b}{2a}$$

(it is assumed here, as the figure suggests, that b is the smaller of the two parts; otherwise a and b exchange roles). At the same time this is the probability for AX, XY, BY to form a triangle.
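The result of Problem 1 can be checked by simulation. The sketch below assumes the general closed form min(a, b)/(2 max(a, b)), which reduces to b/(2a) when b is the smaller part; trial count and seed are arbitrary:

```python
import random

def triangle_prob(a, b, trials=200_000, seed=1):
    """Estimate the probability that AX, XY, BY can form a triangle,
    with X uniform on AC (length a) and Y uniform on CB (length b)."""
    rng = random.Random(seed)
    s = a + b
    hits = 0
    for _ in range(trials):
        x = rng.uniform(0.0, a)   # x = AX
        y = rng.uniform(0.0, b)   # y = BY
        # the three triangle inequalities of the text:
        if x < s / 2 and y < s / 2 and x + y > s / 2:
            hits += 1
    return hits / trials
```

For a = 2, b = 1 the estimate hovers near b/(2a) = 0.25, and for a = b near 1/2.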
Problem 2. On a line AB two points X_1, X_2 are taken at random. What is the probability that AX_1, X_1X_2, X_2B can form a triangle?

FIG. 6.  FIG. 7.

Solution. Variables x_1 = AX_1, x_2 = AX_2 are independent and have uniform distribution of probability. The domain of all possible positions of the point x_1, x_2 is a square with the side AB = l. Positions of this point when AX_1, X_1X_2, X_2B form a triangle can be characterized as follows. First, if X_1 precedes X_2, we have

$$x_1 < \frac{l}{2}; \qquad x_2 - x_1 < \frac{l}{2}; \qquad x_2 > \frac{l}{2},$$

which means that the point x_1, x_2 belongs to the triangle OPN, the definition of which is evident if L, M, N, P are midpoints of the sides of the square ABCD. Second, if X_1 follows X_2, we have

$$x_2 < \frac{l}{2}; \qquad x_1 - x_2 < \frac{l}{2}; \qquad x_1 > \frac{l}{2},$$

and these inequalities define the area OLM. Since the distribution of x_1, x_2 is uniform, the required probability is

$$\frac{\text{Area } OLM + \text{Area } ONP}{\text{Area } ABCD} = \frac{1}{4}.$$
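The answer 1/4 is easy to reproduce by simulation (a sketch; the unit stick length and the seed are arbitrary choices):

```python
import random

def broken_stick_prob(trials=200_000, seed=2):
    """Estimate the probability that the three pieces AX1, X1X2, X2B of a
    unit stick, broken at two uniform random points, form a triangle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x1, x2 = rng.random(), rng.random()
        lo, hi = min(x1, x2), max(x1, x2)
        p, q, r = lo, hi - lo, 1.0 - hi        # the three pieces
        # triangle condition: every piece shorter than the sum of the others
        if p < 0.5 and q < 0.5 and r < 0.5:
            hits += 1
    return hits / trials
```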
Problem 3. A chord is drawn at random in a given circle. What is the probability that it is greater than the side of the equilateral triangle inscribed in that circle?

Solution 1. The position of the chord drawn at random can be determined by its distance from the center of the circle. This distance may vary between 0 and R, the radius of the circle. The chord is greater than the side of the equilateral triangle inscribed in the circle if its distance from the center is less than R/2. Hence, the required probability

$$p_1 = \frac{\tfrac{1}{2}R}{R} = \frac{1}{2}.$$

FIG. 8.

Solution 2. Through one end of the chord, draw a tangent AT. The angle phi between the tangent and the chord, varying from 0 to 180 degrees, determines the position of the chord. If the chord is greater than the side of the inscribed equilateral triangle, the angle phi must lie between 60 and 120 degrees. Hence the answer

$$p_2 = \frac{120^\circ - 60^\circ}{180^\circ} = \frac{1}{3}.$$

The fact that we obtain two different numbers for the same probability seems paradoxical, and the problem itself is known as "Bertrand's paradox." However, going attentively over both solutions, we discover that we are really dealing with two different problems. In the first solution it was assumed that the distance of the chord from the center has uniform distribution, while in the second solution the distribution of the angle phi was taken as uniform. The second solution may be considered reasonable if a thin bar or a needle can rotate freely about A and if, being set in motion, it determines the chord AB by its ultimate position. On the other hand, the first solution is acceptable if a circular disk is thrown upon a board ruled with parallel lines distant from one another by the diameter of the disk. The intersection of the disk with one of the lines determines a chord, and the probability that it is greater than the side of the inscribed equilateral triangle can reasonably be assumed to be 1/2.

A general remark applies to all problems of this kind. When a certain geometrical element, such as a point or a line, is supposed to be taken at random, it should be clearly indicated by what kind of mechanism this is to be done. Only then can the hypothetically assumed distribution be put to an experimental test and either confirmed (approximately) or rejected.
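Both mechanisms of Bertrand's paradox can be simulated side by side; the following sketch (unit radius, arbitrary seed) makes the 1/2-versus-1/3 split visible:

```python
import math
import random

def bertrand(trials=200_000, seed=3):
    """Simulate both chord mechanisms for a unit circle; a chord is counted
    when it exceeds the inscribed equilateral triangle's side sqrt(3)."""
    rng = random.Random(seed)
    side = math.sqrt(3.0)
    hit1 = hit2 = 0
    for _ in range(trials):
        # Solution 1: distance from the center uniform on (0, 1);
        # chord length is 2*sqrt(1 - dist^2)
        dist = rng.random()
        if 2.0 * math.sqrt(1.0 - dist * dist) > side:
            hit1 += 1
        # Solution 2: angle with the tangent uniform on (0, pi);
        # chord length is 2*sin(phi)
        phi = rng.uniform(0.0, math.pi)
        if 2.0 * math.sin(phi) > side:
            hit2 += 1
    return hit1 / trials, hit2 / trials
```

The two returned frequencies approach 1/2 and 1/3, respectively.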
14. Buffon's Needle Problem. A board is ruled with equidistant parallel lines, the width of the strip between two consecutive lines being d. A needle so fine that it can be likened to a rectilinear segment of the length l < d is thrown on the board. What is the probability that the needle will intersect one of the lines (naturally not more than one)?

Solution. This is the oldest problem dealing with geometrical probabilities. It was mentioned by Buffon, the celebrated French naturalist of the eighteenth century, in the Proceedings of the Paris Academy of Sciences (1733) and later reproduced with its solution in Buffon's book "Essai d'arithmetique morale," published in 1777.

Let us determine the position of the needle by the distance OP = x of its middle point from the nearest line, and the acute angle phi between OP and the needle. Variables x and phi may be considered as independent. Furthermore, x and phi vary respectively between 0 and d/2, and 0 and pi/2. As a hypothesis we assume the distribution of probability for x and phi as uniform.

FIG. 9.  FIG. 10.

The domain of x, phi is a rectangle OABC with OA = pi/2, OC = d/2. Now, the needle intersects one of the lines if

$$x < \frac{l}{2}\cos\varphi,$$

and then the point x, phi lies in the shaded area below the curve

$$x = \frac{l}{2}\cos\varphi.$$

Since the distribution of x, phi is uniform, the required probability will be

$$p = \frac{\text{Area } OAD}{\text{Area } OABC}.$$

But

$$\text{Area } OAD = \frac{l}{2}\int_0^{\pi/2} \cos\varphi\,d\varphi = \frac{l}{2}; \qquad \text{Area } OABC = \frac{\pi}{2}\cdot\frac{d}{2},$$

and consequently

$$p = \frac{2l}{\pi d}.$$

On pages 112-113 an account was given of experiments made by several authors in connection with Buffon's problem. They all show good agreement with the theory and indirectly confirm the hypothesis assumed in deriving the above expression for probability.
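The same hypothesis can be tested in software rather than by throwing physical needles; this sketch samples (x, phi) exactly as in the solution above (needle length and strip width are arbitrary, with l < d):

```python
import math
import random

def buffon(l, d, trials=400_000, seed=4):
    """Estimate the crossing probability in Buffon's needle problem
    (needle length l < d, strip width d); theory gives 2*l/(pi*d)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.uniform(0.0, d / 2.0)          # midpoint distance to nearest line
        phi = rng.uniform(0.0, math.pi / 2.0)  # acute angle of the needle
        if x < (l / 2.0) * math.cos(phi):
            hits += 1
    return hits / trials
```

For l = 1, d = 2 the estimate approaches 1/pi, which is the basis of the classical needle-throwing estimates of pi.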
15. Extension of Buffon's Problem. A thin plate in the shape of a convex polygon, of dimensions so small that it cannot intersect two of the lines simultaneously, is thrown on a board ruled as in Buffon's needle problem. What is the probability that the boundary of the plate will intersect one of the lines?

Solution. Suppose that the polygonal boundary has five sides. Let these sides (and their lengths) be denoted by α, β, γ, δ, ε. Each of them is shorter than the distance d between two consecutive lines. On account of convexity, a line can intersect either none or two (and only two) sides. Accordingly, combining sides in pairs, we can distinguish 10 mutually exclusive cases and denote their probabilities by

$$(\alpha\beta),\ (\alpha\gamma),\ (\alpha\delta),\ (\alpha\epsilon),\ (\beta\gamma),\ (\beta\delta),\ (\beta\epsilon),\ (\gamma\delta),\ (\gamma\epsilon),\ (\delta\epsilon).$$

The required probability will be given by the sum

$$p = (\alpha\beta) + (\alpha\gamma) + (\alpha\delta) + (\alpha\epsilon) + (\beta\gamma) + (\beta\delta) + (\beta\epsilon) + (\gamma\delta) + (\gamma\epsilon) + (\delta\epsilon).$$

On the other hand, the side α can be intersected by a line in four mutually exclusive ways; namely, together with β, or γ, or δ, or ε. Hence, if (α) is the probability of intersection of α,

$$(\alpha) = (\alpha\beta) + (\alpha\gamma) + (\alpha\delta) + (\alpha\epsilon),$$

and similarly

$$(\beta) = (\beta\alpha) + (\beta\gamma) + (\beta\delta) + (\beta\epsilon)$$
$$(\gamma) = (\gamma\alpha) + (\gamma\beta) + (\gamma\delta) + (\gamma\epsilon)$$
$$(\delta) = (\delta\alpha) + (\delta\beta) + (\delta\gamma) + (\delta\epsilon)$$
$$(\epsilon) = (\epsilon\alpha) + (\epsilon\beta) + (\epsilon\gamma) + (\epsilon\delta),$$

whence

$$(\alpha) + (\beta) + (\gamma) + (\delta) + (\epsilon) = 2p.$$

But, by the solution of Buffon's problem,

$$(\alpha) = \frac{2\alpha}{\pi d}, \quad (\beta) = \frac{2\beta}{\pi d}, \quad (\gamma) = \frac{2\gamma}{\pi d}, \quad (\delta) = \frac{2\delta}{\pi d}, \quad (\epsilon) = \frac{2\epsilon}{\pi d},$$

and consequently

$$p = \frac{\alpha + \beta + \gamma + \delta + \epsilon}{\pi d} = \frac{P}{\pi d},$$

where P is the perimeter of the polygonal boundary. Evidently this result is perfectly general. Since it does not depend upon the number of sides, it can be extended, by passage to the limit, to the case of a plate bounded by any convex curve.
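The perimeter law p = P/(pi d) can be checked numerically for a concrete convex polygon; the sketch below throws the polygon with uniform random orientation and offset (the test polygon and grid spacing are arbitrary, chosen small enough relative to d that two lines cannot be crossed at once):

```python
import math
import random

def convex_plate_crossing(verts, d, trials=200_000, seed=5):
    """Estimate the probability that the boundary of a small convex polygon,
    thrown on a board ruled with parallel lines a distance d apart,
    crosses a line; Sec. 15 predicts perimeter/(pi*d)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        t = rng.uniform(0.0, 2.0 * math.pi)   # random orientation
        y0 = rng.uniform(0.0, d)              # random offset across the strips
        ys = [y0 + x * math.sin(t) + y * math.cos(t) for (x, y) in verts]
        # a ruled line y = k*d crosses the boundary iff some multiple of d
        # falls strictly inside the polygon's vertical extent
        if math.floor(min(ys) / d) != math.floor(max(ys) / d):
            hits += 1
    return hits / trials
```

For a unit equilateral triangle (perimeter 3) and d = 4 the estimate approaches 3/(4 pi).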
16. Second Solution of Buffon's Problem. Barbier has given another extremely ingenious solution of Buffon's problem and of its extension. Let f(l) be the unknown probability that a needle of length l will intersect a line. Imagine that the needle is divided into two parts l' and l''. Evidently a line intersects the needle if, and only if, it intersects either the first or the second part. Hence, by the theorem of total probabilities,

$$f(l) = f(l') + f(l''),$$

whence, as in Sec. 4, we conclude

$$f(l) = Cl,$$

where C is a constant independent of l. The whole question is how to determine this constant. Barbier's ingenious idea was to let this problem depend on the solution of another one: A polygonal line (convex or not) is thrown upon the board; what is the mathematical expectation of the number of points of intersection? The perimeter of the polygonal line can be subdivided into n rectilinear parts a_1, a_2, … a_n, all less than d. With these n parts we can associate n variables x_1, x_2, … x_n such that

x_i = 1 if one of the lines intersects a_i,
x_i = 0 otherwise.

The sum

$$s = x_1 + x_2 + \cdots + x_n$$

evidently gives the total number of the points of intersection. Hence

$$E(s) = E(x_1) + E(x_2) + \cdots + E(x_n)$$

and, if p_i is the probability of intersection of a_i with one (and only one) line,

$$E(x_i) = p_i.$$

But, according to the previous result, p_i = Ca_i. Hence, we have a perfectly general formula

$$E(s) = C(a_1 + a_2 + \cdots + a_n) = CP,$$

where P is the perimeter of the polygonal line. The result holds for any curvilinear arc (closed or not), as can be seen by the method of limits. This formula applied to a circle with the diameter d gives

$$C \cdot \pi d = 2,$$

since such a circle has always exactly two points of intersection with the lines of the system. Thus we find that

$$C = \frac{2}{\pi d}$$

and

$$f(l) = \frac{2l}{\pi d},$$

as obtained before. For a closed convex line of sufficiently small dimensions only two cases are possible: two intersections (probability p) and none (probability 1 - p), whence

$$E(s) = 2p,$$

or

$$2p = \frac{2P}{\pi d}; \qquad \text{that is,} \qquad p = \frac{P}{\pi d},$$

in agreement with the result obtained in Sec. 15.
17. Laplace's Problem. A board is covered with a set of congruent rectangles, as shown in the figure, and a thin needle is thrown on the board. Supposing that the needle is shorter than the smaller sides of the rectangles, find the probability that the needle will be entirely contained in one of the rectangles of the set.

FIG. 11.

Solution. Let AB = a, AD = b be the sides of the rectangle which contains the middle point of the needle, the length of which is l (l < b). Taking AB and AD for coordinate axes, the position of the needle is determined by the two coordinates x, y of its middle point and the angle phi formed by the needle with the x-axis. We may consider x, y, phi as three independent variables with uniform distribution of probability.

FIG. 12.

The domain filled up with all possible points x, y, phi is a parallelepipedon

$$0 < x < a; \qquad 0 < y < b; \qquad -\frac{\pi}{2} < \varphi < \frac{\pi}{2},$$

and the distribution of probability throughout this domain is uniform.

To characterize the domain of points representing positions of the middle point of the needle when it is located entirely within ABCD, we consider the sections of that domain by planes phi = constant and their projections on the plane xy. These projections are represented by the shaded areas in Figs. 13 and 14, corresponding to positive and negative phi, respectively. In Fig. 13

$$AP \parallel BF \parallel CR \parallel DG$$

and

$$BE = BF = CR = DG = DH = AP = \tfrac{1}{2}l.$$

Similarly, in the second figure,

$$AJ \parallel BQ \parallel CL \parallel DS$$

and

$$AJ = AK = BQ = CL = CM = DS = \tfrac{1}{2}l.$$

FIG. 13.  FIG. 14.

The area of the rectangle PQRS corresponding to these two cases can be expressed as follows:

$$\text{Area } PQRS = (a - l\cos\varphi)(b - l\sin\varphi) = ab - l(b\cos\varphi + a\sin\varphi) + l^2\sin\varphi\cos\varphi,$$
$$\text{Area } PQRS = (a - l\cos\varphi)(b + l\sin\varphi) = ab - l(b\cos\varphi - a\sin\varphi) - l^2\sin\varphi\cos\varphi.$$

Without distinguishing positive and negative values of phi, we may write

$$F(\varphi) = \text{area } PQRS = ab - bl\cos\varphi - al\,|\sin\varphi| + \tfrac{1}{2}l^2\,|\sin 2\varphi|.$$

The volume of the domain representing positions of the needle entirely within ABCD is

$$u = \int_{-\pi/2}^{\pi/2} F(\varphi)\,d\varphi = \pi ab - 2bl - 2al + l^2,$$

while

$$V = \pi ab$$

is the volume of the domain

$$0 < x < a; \qquad 0 < y < b; \qquad -\frac{\pi}{2} < \varphi < \frac{\pi}{2}.$$

Hence, the required probability is

$$p = 1 - \frac{2l(a + b) - l^2}{\pi ab},$$

and the complementary probability for the needle to intersect the boundary of one of the rectangles is

$$q = \frac{2l(a + b) - l^2}{\pi ab}.$$
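The Laplace formula can be spot-checked by sampling the midpoint and angle of the needle exactly as in the solution above (a sketch; the rectangle sides and needle length below are arbitrary, with l < b <= a):

```python
import math
import random

def laplace_grid_crossing(a, b, l, trials=300_000, seed=9):
    """Estimate the probability q that a needle of length l, thrown on a
    grid of a-by-b rectangles, crosses a side; Sec. 17 gives
    q = (2*l*(a + b) - l*l)/(pi*a*b)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.uniform(0.0, a)                         # needle midpoint
        y = rng.uniform(0.0, b)
        phi = rng.uniform(-math.pi / 2.0, math.pi / 2.0)
        dx = (l / 2.0) * math.cos(phi)                  # dx >= 0 here
        dy = (l / 2.0) * math.sin(phi)
        # the needle crosses a grid line iff an endpoint leaves the rectangle
        inside = (0.0 <= x - dx and x + dx <= a
                  and 0.0 <= min(y - dy, y + dy)
                  and max(y - dy, y + dy) <= b)
        if not inside:
            hits += 1
    return hits / trials
```

With a = 2, b = 1, l = 1/2 the estimate approaches (3 - 1/4)/(2 pi).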
Buffon's problem may be considered as a limiting case when a = infinity; and, indeed, by setting a = infinity we find that

$$q = \frac{2l}{\pi b},$$

in conformity with the result in Sec. 14.

These examples may suffice to give an idea of problems in geometric probabilities. Sylvester, Crofton, and others have enriched this field by extremely ingenious methods of evaluating, or rather of avoiding the evaluation, of very complicated multiple integrals. However, from the standpoint of principles, these investigations, ingenious as they are, do not contribute much to the general theory of probability.

Problems for Solution
1. A point X is taken at random on a rectilinear segment AB = l whose middle point is O. What is the probability that AX, BX, and AO can form a triangle? The distribution of AX = x is assumed to be uniform. Ans. 1/2.

2. Two points X_1, X_2 are taken at random on AB = l. Assuming uniform distribution of probability, what is the mathematical expectation of any power n of the distance between X_1 and X_2?

Ans.

$$\frac{1}{l^2}\int_0^l\!\!\int_0^l |x_1 - x_2|^n\,dx_1\,dx_2 = \frac{2l^n}{(n + 1)(n + 2)}.$$

3. Three points X_1, X_2, X_3 are taken at random on AB. What is the probability that X_3 lies between X_1 and X_2? Ans. 1/3, assuming uniform distribution of probability.

4. A rectilinear segment AB is divided into four equal parts AC = CO = OD = DB. Supposing that the distribution of probability is symmetric with respect to O, let p be the probability that a point selected at random on AB will be between C and D. Also, let Q be the probability that the middle point between two points selected at random will be between C and D. Prove that

$$Q > \frac{1 + p^2}{2}.$$

HINT: The middle point of a segment X_1X_2 is surely between C and D if: (i) X_1 and X_2 are in CO; or (ii) X_1 and X_2 are in OD; or (iii) X_1 and X_2 are on opposite sides of O.

5. Two points X_1, X_2 are chosen at random in a circle of radius r. Assuming uniform distribution of probability, what is the mathematical expectation of their distance?

Ans. Denoting the required mathematical expectation by M,

$$\pi^2 r^4 M = \int_0^{2\pi}\!\!\int_0^{2\pi} F(r, \theta, \theta')\,d\theta\,d\theta',$$

where

$$F(r, \theta, \theta') = \int_0^r\!\!\int_0^r \sqrt{\rho^2 + \rho'^2 - 2\rho\rho'\cos(\theta - \theta')}\;\rho\rho'\,d\rho\,d\rho'.$$

Hence, varying r by dr, we have

$$d(\pi^2 r^4 M) = 4\pi r\,dr \int_0^{2\pi}\!\!\int_0^r \sqrt{r^2 + \rho^2 - 2r\rho\cos\omega}\;\rho\,d\rho\,d\omega.$$

FIG. 17.

By introduction of new polar coordinates the integral in the right member can be exhibited as

$$\int_{-\pi/2}^{\pi/2} d\omega \int_0^{2r\cos\omega} u^2\,du = \frac{16r^3}{3}\int_0^{\pi/2} \cos^3\omega\,d\omega = \frac{32r^3}{9}.$$

Thus

$$d(\pi^2 r^4 M) = \frac{128\pi}{9} r^4\,dr,$$

whence, integrating,

$$\pi^2 r^4 M = \frac{128\pi}{45} r^5 \qquad \text{and} \qquad M = \frac{128r}{45\pi}.$$

6. A board is covered with congruent rectangles as in Laplace's problem. A coin the diameter of which is less than the smaller side of the rectangles is thrown on the board. What is the probability that it will be partly in one rectangle and partly in another?

Ans. a, b, r being respectively the sides of the rectangles and the radius of the coin, the required probability is

$$\frac{2r(a + b - 2r)}{ab}.$$

7. Solve Buffon's problem when the needle is longer than the distance between two consecutive lines.

Ans. The probability for the needle to intersect at least one line is

$$p = \frac{2l}{\pi d}(1 - \sin\varphi_0) + \frac{2\varphi_0}{\pi},$$

where phi_0 is determined by cos phi_0 = d/l.

8. A board is covered with congruent triangles whose sides are a, b, c. A needle whose length l is less than the shortest altitude of any one of these triangles is thrown on the board. What is the probability that the needle will be contained entirely within one of the triangles?

Ans. The required probability is

$$1 + \frac{(Aa^2 + Bb^2 + Cc^2)l^2}{8\pi Q^2} - \frac{(4a + 4b + 4c - 3l)l}{4\pi Q},$$

where A, B, C are the angles opposite to the sides a, b, c and Q is the area of the triangle. For equilateral triangles of side a this becomes

$$1 + \frac{2}{3}\left(\frac{l}{a}\right)^2 - \frac{\sqrt{3}}{\pi}\left(4 - \frac{l}{a}\right)\frac{l}{a}.$$

9. On each of the circles O_1, O_2, O_3, … with respective radii r_1, r_2, r_3, … points M_1, M_2, M_3, … are taken at random. Supposing that the series

$$r_1 + r_2 + r_3 + \cdots$$

is divergent, while the series

$$r_1^2 + r_2^2 + r_3^2 + \cdots$$

is convergent, prove that the probability for the length of the vector

$$OM = O_1M_1 + O_2M_2 + O_3M_3 + \cdots + O_nM_n$$

to be > R tends to 0 as R tends to infinity, no matter how large n is.

FIG. 18.

Indication of Solution. Let x_1, x_2, … x_n; y_1, y_2, … y_n be the components of O_1M_1, O_2M_2, … O_nM_n on two rectangular axes OX, OY. Then

$$E(x_i) = E(y_i) = 0; \qquad E(x_i^2) = E(y_i^2) = \frac{r_i^2}{2}.$$

By Tshebysheff's lemma (Chap. X, Sec. 1) the probabilities Q and Q' of the inequalities

$$|x_1 + x_2 + \cdots + x_n| > t\sqrt{\frac{r_1^2 + r_2^2 + \cdots + r_n^2}{2}}; \qquad |y_1 + y_2 + \cdots + y_n| > t\sqrt{\frac{r_1^2 + r_2^2 + \cdots + r_n^2}{2}}$$

are both less than 1/t^2. Now, if the length OM > R, then either

$$|x_1 + x_2 + \cdots + x_n| > \frac{R}{\sqrt{2}} \qquad \text{or} \qquad |y_1 + y_2 + \cdots + y_n| > \frac{R}{\sqrt{2}}.$$

Hence, taking

$$t\sqrt{\frac{r_1^2 + \cdots + r_n^2}{2}} = \frac{R}{\sqrt{2}},$$

the probability P for the length of OM to be > R is less than Q + Q'; that is,

$$P < \frac{2(r_1^2 + r_2^2 + \cdots + r_n^2)}{R^2} < \frac{2G}{R^2},$$

where G is the sum of the convergent series r_1^2 + r_2^2 + r_3^2 + … ; and this bound tends to 0 as R tends to infinity, no matter how large n is.

10. Prove that

$$\lim_{n \to \infty} \int_0^1\!\!\int_0^1\!\cdots\!\int_0^1 \frac{x_1^2 + x_2^2 + \cdots + x_n^2}{x_1 + x_2 + \cdots + x_n}\,dx_1\,dx_2\cdots dx_n = \frac{2}{3}.$$

HINT: Considering x_1, x_2, … x_n as continuous stochastic variables with uniform distribution over the interval (0, 1), prove with the help of Tshebysheff's inequality that the probability of

$$\frac{2}{3} - \epsilon < \frac{x_1^2 + x_2^2 + \cdots + x_n^2}{x_1 + x_2 + \cdots + x_n} < \frac{2}{3} + \epsilon$$

tends to 1 as n tends to infinity, for any epsilon > 0.
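The closed form of Problem 5 is easy to confirm by simulation; this sketch draws point pairs uniformly in the disk by rejection sampling (unit radius and the seed are arbitrary choices):

```python
import math
import random

def mean_distance_in_disk(r=1.0, trials=300_000, seed=7):
    """Estimate the expected distance between two uniform random points in a
    disk of radius r; Problem 5 predicts 128*r/(45*pi)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        pts = []
        while len(pts) < 2:
            x = rng.uniform(-r, r)
            y = rng.uniform(-r, r)
            if x * x + y * y <= r * r:     # rejection sampling: uniform in disk
                pts.append((x, y))
        (x1, y1), (x2, y2) = pts
        total += math.hypot(x1 - x2, y1 - y2)
    return total / trials
```

The estimate approaches 128/(45 pi), about 0.905, for r = 1.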
References

E. CZUBER: "Geometrische Wahrscheinlichkeiten und Mittelwerte," Leipzig, 1884.
E. CZUBER: "Wahrscheinlichkeitsrechnung," 1, pp. 75-109, Leipzig, 1908.
H. POINCARE: "Calcul des probabilites," pp. 118-152, Paris, 1912.
W. CROFTON: On the Theory of Local Probability Applied to Straight Lines Drawn at Random in a Plane, the Method Used Being Also Extended to the Proof of Certain New Theorems in Integral Calculus, Philos. Trans., vol. 158, 1868.
W. CROFTON: Probability, "Encyclopaedia Britannica," 9th ed.
CHAPTER XIII

THE GENERAL CONCEPT OF DISTRIBUTION

1. In dealing with continuous stochastic variables we have introduced the important concept of the function of distribution. Denoting the density of probability by f(x), this function was defined by

$$F(t) = \int_{-\infty}^t f(x)\,dx,$$

and it represents the probability of the inequality x < t.

For a variable with a finite number of values the function of distribution can be defined as the sum

$$F(t) = \sum_{x_i < t} p_i,$$

where p_1, p_2, … p_n are the respective probabilities of all possible values x_1, x_2, … x_n of the variable x. The notation x_i < t is intended to show that the summation is extended over all values of x less than t. Again, F(t) for any real t represents the probability of the inequality x < t.

In this case F(t) is a discontinuous function, never decreasing and varying between F(-infinity) = 0 and F(+infinity) = 1. Its discontinuities are located at the points x_1, x_2, … x_n and are such that

$$F(x_i + 0) - F(x_i - 0) = p_i,$$

denoting, in the customary way,

$$F(x_i + 0) = \lim F(x_i + \epsilon), \qquad F(x_i - 0) = \lim F(x_i - \epsilon)$$

when epsilon, through positive values, converges to 0. To represent F(t) graphically we note that

F(t) = 0 for t < x_1
F(t) = p_1 for x_1 < t < x_2
F(t) = p_1 + p_2 for x_2 < t < x_3
. . . . . . . . . . . . . . . . . .
F(t) = p_1 + p_2 + ··· + p_n = 1 for x_n < t.

As for the value of F(t) at the point t = x_i, it is F(x_i - 0). Hence, the graph of F(t) consists of rectilinear segments, as shown in Fig. 19 (for n = 4; x_1 = -2, x_2 = 0, x_3 = 1, x_4 = 3; p_1 = p_2 = p_3 = p_4 = 1/4), and belongs to the so-called step lines.

FIG. 19.

Thus, in the case of a continuous variable the distribution function is given by an integral, and in the case of a discontinuous variable, by a sum. In stating theorems equally true for continuous and discontinuous variables, it would be tedious always to distinguish these two cases. The question naturally arises whether it is possible to represent distribution functions, moments, and similar quantities by using new symbols equally applicable to continuous and discontinuous variables. In a similar kind of investigation Stieltjes was confronted with the same difficulties, and he succeeded in overcoming them by introducing a new kind of integrals known as "Stieltjes' integrals."
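As an illustration (not from the text), a step distribution function can be computed directly from the definition F(t) = sum of p_i over x_i < t; the strict inequality makes F left-continuous, so F(x_i) = F(x_i - 0) exactly as stated above. The example values -2, 0, 1, 3 with equal probabilities 1/4 are taken, on my reading, from the figure's example:

```python
import bisect

def make_step_cdf(values, probs):
    """Distribution function F(t) = P(x < t) of a discrete variable;
    strict inequality, so F is a left-continuous step line."""
    pairs = sorted(zip(values, probs))
    xs = [x for x, _ in pairs]
    pref = [0.0]                     # pref[i] = total probability of xs[:i]
    for _, p in pairs:
        pref.append(pref[-1] + p)
    def F(t):
        # bisect_left counts exactly the values strictly below t
        return pref[bisect.bisect_left(xs, t)]
    return F

# the example of Fig. 19: n = 4, values -2, 0, 1, 3, each with probability 1/4
F = make_step_cdf([-2, 0, 1, 3], [0.25, 0.25, 0.25, 0.25])
```

Note that F(1) returns 0.5, the value F(1 - 0), in agreement with the convention adopted in the text.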
STIELTJES' INTEGRALS

2. Let phi(x) be a never decreasing function defined in the interval a <= x <= b. For any particular value x_0 of the argument both the limits (for epsilon converging to 0 through positive values)

$$\lim \varphi(x_0 + \epsilon) = \varphi(x_0 + 0); \qquad \lim \varphi(x_0 - \epsilon) = \varphi(x_0 - 0)$$

exist. Since evidently

$$\varphi(x_0 - 0) \leqq \varphi(x_0) \leqq \varphi(x_0 + 0),$$

x_0 is a point of continuity of phi(x) if

$$\varphi(x_0 - 0) = \varphi(x_0 + 0).$$

If, however,

$$\varphi(x_0 - 0) < \varphi(x_0 + 0),$$

phi(x) is discontinuous at x_0, and the difference

$$\varphi(x_0 + 0) - \varphi(x_0 - 0)$$

gives the measure of discontinuity, or simply the discontinuity, at x_0. Since for any number of points of discontinuity x_0, x_1, … x_n the sum of the corresponding discontinuities cannot exceed phi(b) - phi(a), the points of discontinuity form a countable set. For there are only a finite number of discontinuities above any given number, so that, considering a sequence

$$\delta > \delta_1 > \delta_2 > \cdots$$

tending to 0, there is only a finite number of points with discontinuities > delta; also a finite number of points with discontinuities <= delta and > delta_1, and so on. It follows that the points of discontinuity can be arranged into a single sequence and hence form a countable set. It may happen, however, that points of discontinuity are found in every interval, no matter how small; but at any rate there are points of continuity in any interval.

If

$$\varphi(x_0 + \epsilon) - \varphi(x_0 - \epsilon) > 0$$

for every epsilon > 0, the point x_0 is called a "point of increase" of phi(x). In particular, any point of discontinuity is a point of increase.
3. Let f(x) be a continuous function in the interval a <= x <= b. By inserting points

$$x_1 < x_2 < \cdots < x_n$$

this interval is subdivided into n + 1 partial intervals. In each of these we arbitrarily select points xi_0, xi_1, … xi_n and form the sum

$$S = f(\xi_0)[\varphi(x_1) - \varphi(a)] + f(\xi_1)[\varphi(x_2) - \varphi(x_1)] + \cdots + f(\xi_n)[\varphi(b) - \varphi(x_n)].$$

It can be proved, in the same way as for ordinary integrals, that when all the intervals

$$x_1 - a,\quad x_2 - x_1,\quad \ldots\quad b - x_n$$

tend to zero uniformly, the sum S tends to a definite limit. This limit, called Stieltjes' integral, does not depend upon the manner of subdividing the interval (a, b) or upon the choice of the points xi_0, xi_1, … xi_n. It has a perfectly definite value as soon as f(x) and phi(x) are given, and accordingly is denoted by

$$\int_a^b f(x)\,d\varphi(x).$$

In case phi(x) has a continuous derivative, d phi(x) can be understood as the ordinary differential; Stieltjes' integral then coincides with the ordinary one. In other cases d phi(x) serves merely as a reminder of the origin of Stieltjes' integral. If, for instance, phi(x) is a step function, Stieltjes' integral reduces to the sum

$$\int_a^b f(x)\,d\varphi(x) = \sum_i f(x_i)[\varphi(x_i + 0) - \varphi(x_i - 0)]$$

extended over the points of discontinuity x_i, which is a finite sum or an absolutely convergent infinite series according as the set of points of discontinuity is finite or infinite.

Stieltjes' integrals possess many properties of ordinary integrals. For instance, the mean-value theorem holds for them in the form

$$\int_a^b f(x)\,d\varphi(x) = f(\xi)[\varphi(b) - \varphi(a)], \qquad a \leqq \xi \leqq b.$$

Also, if f(x) has a continuous derivative, we have an analogue of integration by parts:

$$\int_a^b f(x)\,d\varphi(x) = f(b)\varphi(b) - f(a)\varphi(a) - \int_a^b \varphi(x)\,df(x),$$

where df(x) means an ordinary differential and the integral in the right member is an ordinary integral. However, some important properties of ordinary integrals do not hold universally for Stieltjes' integrals. For instance, considered as functions of b or a, they may have discontinuities.

In the definition of Stieltjes' integral it was assumed that a and b were finite numbers. Stieltjes' integral over the interval (-infinity, +infinity) is defined in the ordinary way as the limit of

$$\int_a^b f(x)\,d\varphi(x)$$

when a and b tend independently to -infinity and +infinity, respectively. In other words,

$$\int_{-\infty}^{+\infty} f(x)\,d\varphi(x) = \lim \int_a^b f(x)\,d\varphi(x),$$

provided this limit exists. If it does not exist, the symbol

$$\int_{-\infty}^{+\infty} f(x)\,d\varphi(x)$$

has no meaning.
THE GENERAL CONCEPT OF DISTRIBUTION

4. The most general type of distribution function of probability, covering all imaginable cases, is given by a never decreasing function F(t) defined for all real values of t and varying from F(-infinity) = 0 to F(+infinity) = 1. If at points of discontinuity we set

$$F(t) = F(t - 0),$$

then for any t the probability of the inequality x < t will be given by F(t). Also, the probability of the inequalities

$$t_1 \leqq x < t_2$$

will be F(t_2) - F(t_1).

The case of continuous F(t), having a continuous derivative f(t) (save for a finite set of points of discontinuity), corresponds to a continuous variable distributed with the density f(t), since

$$F(t) = \int_{-\infty}^t f(x)\,dx.$$

If F(t) is a step function with a finite number of discontinuities, it characterizes the distribution of probability of a variable with a finite number of values. Finally, if F(t) is a step function with an infinite set of discontinuities distributed nowhere densely, it corresponds to a variable whose values can be arranged in a sequence according to their magnitude. These are the most important types of variables considered in the calculus of probability, and for all of them the distribution function can be represented by Stieltjes' integral

$$F(t) = \int_{-\infty}^t dF(x).$$

The mathematical expectation of any continuous function f(t) is defined by Stieltjes' integral

$$E(f(t)) = \int_{-\infty}^{+\infty} f(t)\,dF(t),$$

provided it has a meaning. In particular, moments of the order n (n a positive integer) and absolute moments of the order a (a real) are defined, respectively, by

$$m_n = \int_{-\infty}^{+\infty} t^n\,dF(t); \qquad \mu_a = \int_{-\infty}^{+\infty} |t|^a\,dF(t),$$

and we always have |m_n| <= mu_n. Finally,

$$\varphi(t) = \int_{-\infty}^{+\infty} e^{itx}\,dF(x)$$

is the characteristic function of distribution. Since the integral exists for any real t, this function is defined for all real values of t and satisfies the inequality |phi(t)| <= 1.

INEQUALITIES FOR MOMENTS
5. Moments of any distribution satisfy certain inequalities which it is important to know. They are all particular cases of the following very general inequality due to Liapounoff.

Liapounoff's Inequality. Let a, b, c be three real numbers satisfying the inequalities

$$a \geqq b \geqq c \geqq 0$$

and mu_a, mu_b, mu_c absolute moments of orders a, b, c for an arbitrary distribution. Then the following inequality holds:

$$\mu_b^{a-c} \leqq \mu_c^{a-b}\,\mu_a^{b-c}.$$

Proof. a. Let p_1, p_2, … p_n and x_1, x_2, … x_n be positive numbers, and set

$$\varphi(s) = p_1x_1^s + p_2x_2^s + \cdots + p_nx_n^s.$$

Then for arbitrary real numbers s_1, s_2, … s_p the following inequality holds:

(1)

$$\varphi\left(\frac{s_1 + s_2 + \cdots + s_p}{p}\right)^{p} \leqq \varphi(s_1)\varphi(s_2)\cdots\varphi(s_p).$$

For p = 2 this inequality follows immediately from the known inequality due to Cauchy:

$$\left(\sum_i u_iv_i\right)^2 \leqq \sum_i u_i^2 \cdot \sum_i v_i^2,$$

by taking in it u_i = sqrt(p_i) x_i^{s_1/2}, v_i = sqrt(p_i) x_i^{s_2/2}. For p = 4 we have

$$\varphi\left(\frac{s_1 + s_2 + s_3 + s_4}{4}\right)^{4} \leqq \varphi\left(\frac{s_1 + s_2}{2}\right)^{2}\varphi\left(\frac{s_3 + s_4}{2}\right)^{2} \leqq \varphi(s_1)\varphi(s_2)\varphi(s_3)\varphi(s_4),$$

and continuing in the same manner we find in general that

$$\varphi\left(\frac{s_1 + s_2 + \cdots + s_{2^m}}{2^m}\right)^{2^m} \leqq \varphi(s_1)\varphi(s_2)\cdots\varphi(s_{2^m}).$$

Let m be taken so that 2^m > p, and let us take in the last inequality

$$s_{p+1} = s_{p+2} = \cdots = s_{2^m} = s = \frac{s_1 + s_2 + \cdots + s_p}{p}.$$

Since

$$\frac{s_1 + s_2 + \cdots + s_p + (2^m - p)s}{2^m} = s,$$

we shall have

$$\varphi(s)^{2^m} \leqq \varphi(s_1)\varphi(s_2)\cdots\varphi(s_p)\,\varphi(s)^{2^m - p},$$
whence

$$\varphi(s)^p \leqq \varphi(s_1)\varphi(s_2)\cdots\varphi(s_p),$$

which is inequality (1).

b. Let a >= b >= c >= 0 be integers. Taking

$$p = a - c; \qquad s_1 = s_2 = \cdots = s_{a-b} = c; \qquad s_{a-b+1} = \cdots = s_{a-c} = a,$$

we have

$$\frac{s_1 + s_2 + \cdots + s_p}{p} = \frac{(a - b)c + (b - c)a}{a - c} = b,$$

and consequently, by virtue of (1),

(2)

$$\left(\sum_i p_ix_i^b\right)^{a-c} \leqq \left(\sum_i p_ix_i^c\right)^{a-b}\left(\sum_i p_ix_i^a\right)^{b-c}.$$

If a = p/s, b = q/s, c = r/s are rational numbers (a >= b >= c >= 0), it suffices to take, in (2), p, q, r instead of a, b, c, to replace x_i by x_i^{1/s}, and to raise both members to the power 1/s, to ascertain that (2) holds for rational a, b, c. Finally, the passage to the limit makes it clear that (2) holds for real a, b, c, provided a >= b >= c >= 0.

c. Let the interval A <= t <= B be subdivided into partial intervals by inserting numbers t_1 < t_2 < ··· < t_n between A and B, and let

p_0 = F(t_1) - F(A), p_1 = F(t_2) - F(t_1), … p_n = F(B) - F(t_n);
x_0 = |A|, x_1 = |t_1|, … x_n = |t_n|.

Then the three sums

$$\sum_0^n p_ix_i^b, \qquad \sum_0^n p_ix_i^c, \qquad \sum_0^n p_ix_i^a$$

will tend to the respective limits

$$\int_A^B |t|^b\,dF(t), \qquad \int_A^B |t|^c\,dF(t), \qquad \int_A^B |t|^a\,dF(t)$$

when all the differences t_1 - A, t_2 - t_1, … B - t_n tend to 0 uniformly. Hence, passing to the limit in (2), we get

$$\left(\int_A^B |t|^b\,dF(t)\right)^{a-c} \leqq \left(\int_A^B |t|^c\,dF(t)\right)^{a-b}\left(\int_A^B |t|^a\,dF(t)\right)^{b-c},$$

and finally, letting A tend to -infinity and B to +infinity,

$$\mu_b^{a-c} \leqq \mu_c^{a-b}\,\mu_a^{b-c},$$

as stated.

Taking b = (a + c)/2, Liapounoff's inequality becomes

$$\mu_{\frac{a+c}{2}}^{a-c} \leqq \mu_c^{\frac{a-c}{2}}\,\mu_a^{\frac{a-c}{2}},$$

whence

$$\mu_{\frac{a+c}{2}}^2 \leqq \mu_c\,\mu_a$$

for any two real positive numbers a and c. If k and l are two positive integers and we take c = 2k, a = 2l, then

$$\mu_{k+l}^2 \leqq \mu_{2k}\,\mu_{2l},$$

or, since mu_{2k} = m_{2k} and mu_{2l} = m_{2l},

$$\mu_{k+l}^2 \leqq m_{2k}\,m_{2l}.$$

Another important inequality results if we take c = 0. Then, since mu_0 = 1,

$$\mu_b^a \leqq \mu_a^b,$$

or

$$\mu_b^{\frac{1}{b}} \leqq \mu_a^{\frac{1}{a}} \qquad \text{if} \quad a > b > 0.$$

This amounts to

$$\frac{\log \mu_b}{b} \leqq \frac{\log \mu_a}{a} \qquad \text{if} \quad a > b,$$

which is equivalent to the statement that

$$\frac{\log \mu_x}{x}$$

is an increasing function of x for positive x.
6. An important problem in the calculus of probability is to find the
distribution function of the sum of several independent variables when ' distribution functions of these variables are known. It suffices to show how this problem can be solved for the sum of two independent variables. Let x and y be two independent variables with the corresponding
distributioh functions F(t) and G(t).
To find the distribution function
H (t) of their sum z=x+y
268
INTRODUCTION TO MATHEMATICAL PROBABILITY
[CHAP. XIII
is the same as to find the probability of the inequality
t.
for an arbitrary real number
x+y
view of the applications we propose to consider later, we shall assume that one, at least, of the variables generally continuous density.
x, y
has continuous distribution with
x and y have continuous distributions so that F(t) �� j(x)dx; G(t) J� ..,u(x)dx.
At first, let both
=
=
The probability of the inequality
x+y
H(t)
=
JJ!(x)g(y)dxdy
extended over the domain
X+y < t. Now, following ordinary rules, we can reduce this double integral to a
To this end, for any fixed x we integrate g(y) between tx, thus obtaining ����g(y)dy G(tx). Then, after multiplying by f(x), we integrate the resulting expression between limits  and + for x. The final result will be H(t) J_"'..,G(t x)f(x)dx
repeated integral. limits

co
and
=
co
co
=
or, written as Stieltjes' integral,
H(t)
=
J_"'..,G(tx)dF(x).
In the second place, let $x$ be a discontinuous variable with different values $x_1, x_2, x_3, \ldots$ and corresponding probabilities $p_1, p_2, p_3, \ldots$ For $x = x_i$ the inequality

\[
x + y < t
\]

is equivalent to

\[
y < t - x_i,
\]

and the probability of this inequality is $G(t - x_i)$. Since the probability of $x = x_i$ is $p_i$, the compound probability of the two events $x = x_i$ and $y < t - x_i$ will be $p_i\,G(t - x_i)$. The total probability $H(t)$ of the inequality $x + y < t$ will be expressed by the sum

\[
H(t) = \sum_i p_i\,G(t - x_i)
\]

extended over all possible values of $x$. But this sum can again be written as Stieltjes' integral:

\[
H(t) = \int_{-\infty}^{\infty} G(t - x)\,dF(x). \tag{1}
\]

In both cases we obtain the same expression for $H(t)$. Evidently $H(t)$ can also be defined as the mathematical expectation of $G(t - x)$:

\[
H(t) = E\{G(t - x)\}
\]

taken with respect to the variable $x$. The important formula (1) is known as the formula for composition of the distribution functions $F(t)$ and $G(t)$.
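The composition formula is easy to check numerically in the discrete case. The sketch below uses an illustrative two-valued $x$ (values 0 and 1, each with probability 1/2) and a $y$ uniform on $(0, 1)$; these particular distributions are assumptions for the example, not taken from the text.

```python
# Numerical sketch of the composition formula (1):
#   H(t) = sum_i p_i * G(t - x_i)
# for a discrete x and a continuous y.

def G_uniform(t):
    """Distribution function of a variable uniform on (0, 1)."""
    return max(0.0, min(1.0, t))

def compose(values, probs, G):
    """Return H(t) = sum_i p_i * G(t - x_i), the CDF of x + y."""
    def H(t):
        return sum(p * G(t - x) for x, p in zip(values, probs))
    return H

# x takes the values 0 and 1, each with probability 1/2.
H = compose([0, 1], [0.5, 0.5], G_uniform)
print(H(0.5), H(1.0), H(1.5))  # 0.25 0.5 0.75
```

Here $H$ is the distribution function of the sum of a fair coin flip and a uniform variable, i.e. a mixture of two shifted uniform CDFs, exactly as formula (1) prescribes.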
Example. Let $x$ and $y$ be two normally distributed variables with means $= 0$ and respective standard deviations $\sigma_1$ and $\sigma_2$. Instead of using (1), it is better to write $H(t)$ as the double integral

\[
H(t) = \frac{1}{2\pi\sigma_1\sigma_2}\iint e^{-\frac{x^2}{2\sigma_1^2}-\frac{y^2}{2\sigma_2^2}}\,dx\,dy
\]

extended over the domain $x + y < t$. We determine constants $C$, $D$, $\alpha$, $\beta$ so as to have identically

\[
\frac{x^2}{2\sigma_1^2} + \frac{y^2}{2\sigma_2^2} = C(x + y)^2 + D(\alpha x + \beta y)^2,
\]

whence one easily finds

\[
C = \frac{1}{2(\sigma_1^2+\sigma_2^2)}, \qquad D = \frac{1}{2\sigma_1^2\sigma_2^2(\sigma_1^2+\sigma_2^2)}, \qquad \alpha = \sigma_2^2, \qquad \beta = -\sigma_1^2.
\]

Taking

\[
s = x + y, \qquad u = \frac{\sigma_2}{\sigma_1}x - \frac{\sigma_1}{\sigma_2}y
\]

as new variables (the second is the combination $\alpha x + \beta y$ divided by $\sigma_1\sigma_2$), the Jacobian of $s$, $u$ with respect to $x$, $y$ being

\[
\begin{vmatrix} 1 & 1 \\[2pt] \dfrac{\sigma_2}{\sigma_1} & -\dfrac{\sigma_1}{\sigma_2} \end{vmatrix} = -\frac{\sigma_1^2+\sigma_2^2}{\sigma_1\sigma_2},
\]

$H(t)$ can be presented as the double integral

\[
H(t) = \frac{1}{2\pi(\sigma_1^2+\sigma_2^2)}\iint e^{-\frac{s^2+u^2}{2(\sigma_1^2+\sigma_2^2)}}\,ds\,du
\]

with the domain of integration defined by the single inequality $s < t$. Hence,

\[
H(t) = \frac{1}{\sqrt{2\pi}\,\sqrt{\sigma_1^2+\sigma_2^2}}\int_{-\infty}^{t} e^{-\frac{s^2}{2(\sigma_1^2+\sigma_2^2)}}\,ds,
\]

since

\[
\frac{1}{\sqrt{2\pi}\,\sqrt{\sigma_1^2+\sigma_2^2}}\int_{-\infty}^{\infty} e^{-\frac{u^2}{2(\sigma_1^2+\sigma_2^2)}}\,du = 1.
\]
since
The expression obtained for H (t) leads to a remarkable conclusion: The sum of two normally distributed variables with means standard deviations cr1 and the mean of
x
and
=
y
cr2
0 and
=
is also a normally distributed variable with
If the means cr = Vcri + cr� . a1 and a2, then evidently z will be normally distributed a = a1 + a2 and the standard deviation cr = Vcri + 4
0 and the standard deviation
are
with the mean Repeated application of this result leads to the following important theorem: Xn are normally distributed independent variables with a,. and standard deviations cr1, cr2, cr,., then their sum •
Z =
X1 +
X2
+ ' ' ' +
.
•
X,.
is again normally distributed with the mean a = a1 + a2 + and the standard deviation CT = Veri+ cri + ' ' + CT� .
+ a,.
·
·
·
·
·
+ Cna,.
Finally, any linear function U = C1X1
+
C2X2
+ ' ' . +
is normally distributed with the mean
a
=
C..Xn
c1a1 + c2a2 +
·
and the standard deviation $\sigma = \sqrt{c_1^2\sigma_1^2 + c_2^2\sigma_2^2 + \cdots + c_n^2\sigma_n^2}$. In particular, the arithmetic mean

\[
\frac{x_1 + x_2 + \cdots + x_n}{n}
\]

of identical normally distributed variables with the mean $a$ and the standard deviation $\sigma$ is normally distributed about the mean $a$ and with the standard deviation $\sigma/\sqrt{n}$. Hence, the conclusion may be drawn that the probability $P$ of the inequality

\[
\left|\frac{x_1 + x_2 + \cdots + x_n}{n} - a\right| < \epsilon
\]

is given by

\[
P = \frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\int_{-\epsilon}^{\epsilon} e^{-\frac{nx^2}{2\sigma^2}}\,dx = \frac{2}{\sqrt{2\pi}}\int_{0}^{\frac{\epsilon\sqrt{n}}{\sigma}} e^{-\frac{t^2}{2}}\,dt
\]

and rapidly approaches 1 as $n$ increases. This is a more definite form of the law of large numbers applied to identical normally distributed variables.
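A quick simulation sketch of the theorem above. The particular values $\sigma_1 = 3$, $\sigma_2 = 4$, the seed, and the sample size are arbitrary illustrative choices; the sample standard deviation of the sum should come out near $\sqrt{\sigma_1^2 + \sigma_2^2} = 5$.

```python
import math
import random

# Simulation check: the sum of independent normal variables with means 0
# and standard deviations s1, s2 is normal with sigma = sqrt(s1^2 + s2^2).
random.seed(1)
s1, s2 = 3.0, 4.0
n = 200_000
z = [random.gauss(0, s1) + random.gauss(0, s2) for _ in range(n)]

mean = sum(z) / n
var = sum((v - mean) ** 2 for v in z) / n
sigma = math.sqrt(var)
print(mean, sigma, math.sqrt(s1 ** 2 + s2 ** 2))  # mean near 0, sigma near 5
```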
DETERMINATION OF DISTRIBUTION WHEN ITS CHARACTERISTIC FUNCTION IS GIVEN

7. One of the most important conclusions to be drawn from the preceding considerations is that the distribution function of probability is uniquely determined by the characteristic function. The known proofs of this fact are rather subtle, owing to the use of conditionally convergent integrals. However, such integrals can be avoided by resorting to an ingenious device due to Liapounoff. In the general case, the distribution function of a variable $x$ has discontinuities. To avoid the bad effect of these discontinuities, Liapounoff introduces a continuous variable $y$ that, with overwhelming probability, can have values only in the vicinity of 0. It may be surmised, therefore, that the continuous distribution function of the sum $x + y$ will approximately represent that of $x$ and, by disposing of a parameter involved in the distribution function of $y$, will tend to it as a limit.

To make these explanations more definite, let $y$ be a normally distributed variable whose distribution function is

\[
G(t) = \frac{1}{h\sqrt{\pi}}\int_{-\infty}^{t} e^{-\frac{z^2}{h^2}}\,dz.
\]

When $h$ is small, the probabilities of either one of the inequalities

\[
y > \epsilon, \qquad y < -\epsilon
\]

will be extremely small and will even tend to 0 when $h$ tends to 0. Hence, the distribution function $H(t)$ of the sum $x + y$ is likely to tend to $F(t)$ as a limit when $h$ tends to 0.
To prove this in all rigor, we apply the composition formula (Sec. 6) to our case. We obtain the following expression for $H(t)$:

\[
H(t) = \frac{1}{h\sqrt{\pi}}\int_{-\infty}^{t} dz\int_{-\infty}^{\infty} e^{-\frac{(z-x)^2}{h^2}}\,dF(x)
\]

or, in more convenient form,

\[
H(t) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} dF(x)\int_{-\infty}^{\frac{t-x}{h}} e^{-u^2}\,du,
\]

and furthermore, integrating by parts,

\[
H(t) = \frac{1}{h\sqrt{\pi}}\int_{-\infty}^{\infty} e^{-\left(\frac{t-x}{h}\right)^2} F(x)\,dx.
\]

The integral in the right member can be split into three parts, taken between the limits $(-\infty, t-\epsilon)$, $(t-\epsilon, t+\epsilon)$, $(t+\epsilon, +\infty)$. Now, for positive $T$,

\[
\int_{T}^{\infty} e^{-u^2}\,du < \frac{1}{2T}e^{-T^2},
\]

whence, since $0 \le F(x) \le 1$,

\[
\frac{1}{h\sqrt{\pi}}\int_{t+\epsilon}^{\infty} e^{-\left(\frac{t-x}{h}\right)^2} F(x)\,dx \le \frac{1}{\sqrt{\pi}}\int_{\frac{\epsilon}{h}}^{\infty} e^{-u^2}\,du < \frac{h}{2\epsilon\sqrt{\pi}}\,e^{-\frac{\epsilon^2}{h^2}},
\]

and similarly for the integral extended over $(-\infty, t-\epsilon)$. In the middle integral we substitute $x = t \mp u$, so that

\[
H(t) = \frac{1}{h\sqrt{\pi}}\int_{0}^{\epsilon} e^{-\frac{u^2}{h^2}}F(t-u)\,du + \frac{1}{h\sqrt{\pi}}\int_{0}^{\epsilon} e^{-\frac{u^2}{h^2}}F(t+u)\,du + \theta\,\frac{h}{\epsilon\sqrt{\pi}}\,e^{-\frac{\epsilon^2}{h^2}}; \qquad 0 \le \theta \le 1.
\]

Given an arbitrary $\sigma > 0$, the number $\epsilon$ can be taken so small that

\[
F(t+u) - F(t+0) < \sigma, \qquad F(t-0) - F(t-u) < \sigma
\]

for $0 < u < \epsilon$, whence

\[
\frac{1}{h\sqrt{\pi}}\int_{0}^{\epsilon} e^{-\frac{u^2}{h^2}}F(t-u)\,du = \frac{F(t-0)}{\sqrt{\pi}}\int_{0}^{\frac{\epsilon}{h}} e^{-u^2}\,du + \theta'\sigma
\]

and

\[
\frac{1}{h\sqrt{\pi}}\int_{0}^{\epsilon} e^{-\frac{u^2}{h^2}}F(t+u)\,du = \frac{F(t+0)}{\sqrt{\pi}}\int_{0}^{\frac{\epsilon}{h}} e^{-u^2}\,du + \theta''\sigma; \qquad |\theta'| < 1, \quad |\theta''| < 1.
\]

On the other hand,

\[
\frac{1}{\sqrt{\pi}}\int_{0}^{\frac{\epsilon}{h}} e^{-u^2}\,du = \frac{1}{2} - \frac{\theta'''}{2}\,e^{-\frac{\epsilon^2}{h^2}}; \qquad 0 < \theta''' < 1,
\]

so that finally

\[
\left|H(t) - \frac{F(t+0) + F(t-0)}{2}\right| < 2\sigma + \left(1 + \frac{h}{\epsilon\sqrt{\pi}}\right)e^{-\frac{\epsilon^2}{h^2}},
\]

and, for all sufficiently small $h$ ($\epsilon$ being kept fixed),

\[
\left|H(t) - \frac{F(t+0) + F(t-0)}{2}\right| < 4\sigma;
\]

that is,

\[
\lim_{h\to 0} H(t) = \frac{F(t+0) + F(t-0)}{2},
\]

or, if $t$ is a point of continuity,

\[
\lim_{h\to 0} H(t) = F(t).
\]
Now we must find another analytical representation for $H(t)$. To this end we consider the difference

\[
H(t) - H(0) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} dF(x)\int_{\frac{-x}{h}}^{\frac{t-x}{h}} e^{-u^2}\,du,
\]

and, to represent the inner integral in a convenient way, we make use of the known integral

\[
e^{-u^2} = \frac{1}{2\sqrt{\pi}}\int_{-\infty}^{\infty} e^{-\frac{v^2}{4}+iuv}\,dv.
\]

Multiplying both sides by $du$ and integrating between $\frac{-x}{h}$ and $\frac{t-x}{h}$, we find

\[
\frac{1}{\sqrt{\pi}}\int_{\frac{-x}{h}}^{\frac{t-x}{h}} e^{-u^2}\,du = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}}\,e^{-ivx}\,\frac{e^{ivt}-1}{iv}\,dv,
\]

whence

\[
H(t) - H(0) = \frac{1}{2\pi}\int_{-\infty}^{\infty} dF(x)\int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}}\,e^{-ivx}\,\frac{e^{ivt}-1}{iv}\,dv.
\]

The next step is to reverse the order of integrations, an operation which can be easily justified in this case. The result, after replacing $v$ by $-v$ and introducing the characteristic function

\[
\varphi(v) = \int_{-\infty}^{\infty} e^{ivx}\,dF(x),
\]

will be

\[
H(t) - H(0) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}}\,\varphi(v)\,\frac{1-e^{-ivt}}{iv}\,dv.
\]

Now, taking the limit of $H(t)$ for $h$ converging to 0, we have at any point of continuity of $F(t)$

\[
F(t) = C + \lim_{h\to 0}\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}}\,\varphi(v)\,\frac{1-e^{-ivt}}{iv}\,dv, \tag{2}
\]

where the constant

\[
C = \frac{F(+0) + F(-0)}{2}
\]

is determined by the condition $F(-\infty) = 0$. Thus, the distribution function is completely determined by (2) at all points of continuity when the characteristic function $\varphi(v)$ is given.

Example 1.
Let us apply (2) to find the distribution corresponding to the characteristic function

\[
\varphi(v) = e^{-\frac{\sigma^2v^2}{2}}.
\]

Since in this case the integral whose limit we seek is uniformly convergent with respect to $h$, we find simply

\[
F(t) = C + \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-\frac{\sigma^2v^2}{2}}\,\frac{1-e^{-ivt}}{iv}\,dv = C + \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-\frac{\sigma^2v^2}{2}}\,\frac{\sin tv}{v}\,dv.
\]

On the other hand (Chap. VII, page 128),

\[
\int_{-\infty}^{\infty} e^{-\frac{\sigma^2v^2}{2}}\,\frac{\sin tv}{v}\,dv = \frac{\sqrt{2\pi}}{\sigma}\int_{0}^{t} e^{-\frac{u^2}{2\sigma^2}}\,du,
\]

so that

\[
F(t) = C + \frac{1}{\sigma\sqrt{2\pi}}\int_{0}^{t} e^{-\frac{u^2}{2\sigma^2}}\,du.
\]

Taking $t = -\infty$, the condition $F(-\infty) = 0$ gives

\[
C = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{0} e^{-\frac{u^2}{2\sigma^2}}\,du = \frac{1}{2},
\]

and so finally

\[
F(t) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2\sigma^2}}\,du.
\]

Naturally, we find a normal distribution with the standard deviation $\sigma$ (compare the example of Sec. 6).

Example 2.
What is the distribution determined by the characteristic function $\varphi(v) = e^{-a|v|}$, $a > 0$?

As in the preceding example, we find

\[
F(t) = C + \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-a|v|}\,\frac{\sin tv}{v}\,dv = C + \frac{1}{\pi}\int_{0}^{\infty} e^{-av}\,\frac{\sin tv}{v}\,dv.
\]

But

\[
\frac{d}{dt}\int_{0}^{\infty} e^{-av}\,\frac{\sin tv}{v}\,dv = \int_{0}^{\infty} e^{-av}\cos tv\,dv = \frac{a}{a^2+t^2},
\]

whence

\[
\frac{1}{\pi}\int_{0}^{\infty} e^{-av}\,\frac{\sin tv}{v}\,dv = \frac{1}{\pi}\int_{0}^{t}\frac{a\,dz}{a^2+z^2},
\]

and the condition $F(-\infty) = 0$ gives

\[
C = \frac{1}{\pi}\int_{-\infty}^{0}\frac{a\,dz}{a^2+z^2} = \frac{1}{2},
\]

so that finally

\[
F(t) = \frac{1}{\pi}\int_{-\infty}^{t}\frac{a\,dz}{a^2+z^2}.
\]

Naturally we find the same distribution as that considered in Example 2, page 243. Sometimes it is called "Cauchy's distribution" with the parameter $a$.
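The inversion just carried out can also be exercised numerically. In the sketch below the limit $h \to 0$ has already been taken (for $a > 0$ the integral converges absolutely without the Gaussian factor), so $F(t) = \tfrac12 + \tfrac1\pi\int_0^\infty e^{-av}\frac{\sin tv}{v}\,dv$ is evaluated by a crude midpoint rule and compared with $\tfrac12 + \tfrac1\pi\arctan\frac{t}{a}$. The step size and cutoff are ad-hoc numerical choices.

```python
import math

# Numeric sketch of formula (2) for Cauchy's distribution: with
# phi(v) = exp(-a|v|) the formula reduces to
#   F(t) = 1/2 + (1/pi) * integral_0^inf exp(-a v) sin(t v)/v dv.

def F_from_cf(t, a, dv=1e-3, vmax=50.0):
    total, v = 0.0, dv / 2          # midpoint rule
    while v < vmax:
        total += math.exp(-a * v) * math.sin(t * v) / v * dv
        v += dv
    return 0.5 + total / math.pi

a = 1.0
for t in (-2.0, 0.5, 3.0):
    exact = 0.5 + math.atan(t / a) / math.pi
    print(t, round(F_from_cf(t, a), 4), round(exact, 4))
```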
COMPOSITION OF CHARACTERISTIC FUNCTIONS

8. Given $n$ independent variables $x_1, x_2, \ldots x_n$ whose characteristic functions are $\varphi_1(t), \varphi_2(t), \ldots \varphi_n(t)$, the product

\[
\varphi(t) = \varphi_1(t)\varphi_2(t)\cdots\varphi_n(t)
\]

is the characteristic function of their sum

\[
s = x_1 + x_2 + \cdots + x_n.
\]

In fact, the characteristic function of $s$ is by definition

\[
\varphi(t) = E(e^{ist}) = E\!\left(e^{ix_1t}\,e^{ix_2t}\cdots e^{ix_nt}\right).
\]

Since $x_1, x_2, \ldots x_n$ are independent variables, the expectation of the product is equal to the product of the expectations of the factors, whence

\[
\varphi(t) = \varphi_1(t)\varphi_2(t)\cdots\varphi_n(t).
\]
This simple theorem is of great importance since it determines the characteristic function of the sum of independent variables and indirectly its function of distribution.
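The theorem can be seen concretely for two discrete variables. The sketch below (two fair dice, an illustrative choice) computes the characteristic function of the sum directly from the exact distribution of the sum and compares it with the product of the individual characteristic functions.

```python
import cmath

def cf(dist, t):
    """phi(t) = sum_k p_k e^{ikt} for a discrete distribution {value: prob}."""
    return sum(p * cmath.exp(1j * k * t) for k, p in dist.items())

die = {k: 1 / 6 for k in range(1, 7)}

# Exact distribution of the sum of two dice.
two = {}
for i in range(1, 7):
    for j in range(1, 7):
        two[i + j] = two.get(i + j, 0) + 1 / 36

for t in (0.3, 1.0, 2.7):
    lhs = cf(two, t)
    rhs = cf(die, t) ** 2
    print(t, abs(lhs - rhs))  # differences on the order of rounding error
```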
9. A few examples will illustrate the preceding remark.

Example 1. Consider $n$ independent normally distributed variables $x_1, x_2, \ldots x_n$ with means $= 0$ and standard deviations $\sigma_1, \sigma_2, \ldots \sigma_n$. Their characteristic functions are

\[
\varphi_k(t) = e^{-\frac{\sigma_k^2t^2}{2}}; \qquad k = 1, 2, \ldots n,
\]

and the characteristic function of their sum

\[
s = x_1 + x_2 + \cdots + x_n
\]

will be

\[
\varphi(t) = e^{-\frac{\sigma^2t^2}{2}}, \qquad \text{where } \sigma^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2.
\]

Hence, $s$ is a normally distributed variable with the mean 0 and the standard deviation $\sigma$, as we found previously by a method involving a considerable amount of calculation.
Example 2. Independent variables $x_1, x_2, \ldots x_n$ have Cauchy's distributions with parameters $a_1, a_2, \ldots a_n$. Since the characteristic function of $x_k$ is

\[
\varphi_k(t) = e^{-a_k|t|},
\]

the characteristic function of the sum

\[
s = x_1 + x_2 + \cdots + x_n
\]

will be

\[
\varphi(t) = e^{-a|t|}, \qquad \text{where } a = a_1 + a_2 + \cdots + a_n.
\]

Hence, $s$ again has Cauchy's distribution, with the parameter $a_1 + a_2 + \cdots + a_n$.
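A striking consequence of Example 2: the arithmetic mean of $n$ Cauchy variables with parameter $a$ has parameter $na/n = a$, i.e. the same distribution as a single observation, so averaging does not concentrate the distribution at all. The simulation sketch below (sample size and seed arbitrary) checks this by comparing the empirical distribution function of such means with $F(t) = \tfrac12 + \tfrac1\pi\arctan\frac{t}{a}$.

```python
import math
import random

random.seed(2)

def cauchy(a):
    """One draw from Cauchy's distribution with parameter a."""
    return a * math.tan(math.pi * (random.random() - 0.5))

n, N = 10, 100_000
means = [sum(cauchy(1.0) for _ in range(n)) / n for _ in range(N)]

def F_cauchy(t, a=1.0):
    return 0.5 + math.atan(t / a) / math.pi

for t in (-1.0, 0.0, 2.0):
    emp = sum(m <= t for m in means) / N
    print(t, round(emp, 3), round(F_cauchy(t), 3))
```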
Example 3. Let $x_1, x_2, \ldots x_n$ be independent variables with uniform distribution of probability in the interval $(0, l)$. The characteristic function of any one of them is

\[
\frac{1}{l}\int_{0}^{l} e^{itx}\,dx = \frac{e^{itl}-1}{itl}.
\]

Hence, the characteristic function of their sum $s$ will be

\[
\left(\frac{e^{itl}-1}{itl}\right)^n,
\]

and the distribution function of $s$ is given by

\[
F(t) = C + \lim_{h\to 0}\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}}\left(\frac{e^{ivl}-1}{ivl}\right)^n\frac{1-e^{-ivt}}{iv}\,dv,
\]

and, since the integral again is uniformly convergent,

\[
F(t) = C + \frac{1}{2\pi}\int_{-\infty}^{\infty}\left(\frac{e^{ivl}-1}{ivl}\right)^n\frac{1-e^{-ivt}}{iv}\,dv.
\]

The evaluation of this integral presents certain difficulties. To avoid them we notice that the integrand, considered as a function of a complex variable $v$, is holomorphic everywhere. Hence, we can substitute for the rectilinear path of integration the path $\Gamma$ shown in Fig. 20, which follows the real axis but avoids the origin along a small semicircle below it. Now it is easy to show that, integrating over the path $\Gamma$,

\[
f(g) = \int_{\Gamma}\frac{e^{igv}}{v^{n+1}}\,dv = \frac{2\pi i^{n+1}g^n}{n!}\ \text{ if } g > 0; \qquad f(g) = 0\ \text{ if } g \le 0.
\]

Each of the two integrals obtained by splitting $1 - e^{-ivt}$ is a linear combination of integrals of the type $f(g)$; the terms with $g \le 0$ reduce to 0, and the remaining terms give, in explicit form,

\[
F(t) = C + \frac{1}{n!}\left[\left(\frac{t}{l}\right)^n - \binom{n}{1}\left(\frac{t}{l}-1\right)^n + \binom{n}{2}\left(\frac{t}{l}-2\right)^n - \cdots\right],
\]

the series being continued as long as the arguments remain positive. The constant $C = 0$, since $F(t)$ and the sum in the right member both vanish for $t = 0$. The final expression of $F(t)$ is, therefore,

\[
F(t) = \frac{1}{1\cdot2\cdot3\cdots n}\left[\left(\frac{t}{l}\right)^n - \frac{n}{1}\left(\frac{t}{l}-1\right)^n + \frac{n(n-1)}{1\cdot2}\left(\frac{t}{l}-2\right)^n - \cdots\right],
\]

the series in the right member being continued as long as the arguments remain positive. Such is the probability that the sum $x_1 + x_2 + \cdots + x_n$ of $n$ independent variables, uniformly distributed in the interval $(0, l)$, will be less than $t$. The above expression is due to Laplace, who, however, obtained it in quite a different manner.
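Laplace's formula is easy to check against plain simulation. The sketch below (seed and sample size arbitrary) evaluates the finite sum for the probability that the sum of $n$ variables uniform on $(0, l)$ is less than $t$, and compares it with a Monte Carlo estimate.

```python
import math
import random

def laplace_cdf(t, n, l=1.0):
    """P(x1 + ... + xn < t) for n independent uniforms on (0, l)."""
    s, k = 0.0, 0
    while t / l - k > 0 and k <= n:
        s += (-1) ** k * math.comb(n, k) * (t / l - k) ** n
        k += 1
    return s / math.factorial(n)

random.seed(3)
n, N = 3, 200_000
sums = [sum(random.random() for _ in range(n)) for _ in range(N)]
for t in (0.5, 1.5, 2.5):
    emp = sum(s < t for s in sums) / N
    print(t, round(laplace_cdf(t, n), 4), round(emp, 4))
```

By symmetry the formula gives exactly $\tfrac12$ at $t = nl/2$, which is a convenient spot check.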
Problems for Solution

1. Prove directly the inequality

\[
\mu_{\frac{a+b}{2}}^2 \le \mu_a\,\mu_b
\]

for absolute moments.

HINT: The quadratic form in $\lambda$, $\mu$

\[
\int_{-\infty}^{\infty}\left(\lambda|x|^{\frac{a}{2}} + \mu|x|^{\frac{b}{2}}\right)^2 dF(x)
\]

is definite or semidefinite. Show that the equality sign cannot hold if $F(x)$ has at least two points of increase $\alpha$, $\beta$ such that $\alpha : \beta$ is neither 0 nor $\pm 1$.
2. Let $x_1, x_2, \ldots x_n$ be $n$ variables, $B_n$ being the dispersion of their sum. Denoting the absolute moment of the order $s$ for $x_i$ by $\mu_s^{(i)}$, and by $\omega_s$ the quotient

\[
\omega_s = \frac{\mu_s^{(1)} + \mu_s^{(2)} + \cdots + \mu_s^{(n)}}{B_n^{\frac{s}{2}}},
\]

prove that

\[
\omega_{s'}^{\frac{1}{s'-2}} \ge \omega_s^{\frac{1}{s-2}} \qquad \text{if } s' > s > 2.
\]

HINT: Use Liapounoff's inequality.

3. A variable is distributed over the interval $(0, +\infty)$ with a decreasing density of probability. Show that in this case the moments $M_2$ and $M_4$ satisfy the inequality

\[
M_2^2 \le \tfrac{5}{9}M_4 \qquad \text{(Gauss)}
\]

and that in general

\[
\left[(\mu+1)M_\mu\right]^{\frac{1}{\mu}} \le \left[(\nu+1)M_\nu\right]^{\frac{1}{\nu}} \qquad \text{if } \nu > \mu > 0.
\]
Indication of the Proof. Show first that the existence of the integral

\[
\int_{0}^{\infty} x^{\nu}f(x)\,dx,
\]

in case $f(x)$ is a positive and decreasing function, implies the existence of the limit

\[
\lim a^{\nu+1}f(a) = 0; \qquad a \to +\infty.
\]

Hence, deduce that

\[
\int_{0}^{\infty} x\,d\psi(x) = 1, \qquad \int_{0}^{\infty} x^{\nu+1}\,d\psi(x) = (\nu+1)M_\nu,
\]

where $\psi(x) = f(0) - f(x)$, and, finally, apply the inequality

\[
\left[\int_{0}^{\infty} x^{\mu+1}\,d\psi(x)\right]^{\nu} \le \left[\int_{0}^{\infty} x^{\nu+1}\,d\psi(x)\right]^{\mu}\left[\int_{0}^{\infty} x\,d\psi(x)\right]^{\nu-\mu}.
\]
4. Using the composition formula (1) of Sec. 6, prove Laplace's formula of Sec. 9, Example 3, by mathematical induction.

5. Prove that the distribution function of probability for a variable whose characteristic function $\varphi(t)$ is given can be determined by the formula

\[
F(t) = C + \lim_{h\to 0}\frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{\varphi(v)}{1+h^2v^2}\cdot\frac{1-e^{-ivt}}{iv}\,dv.
\]

HINT: In carrying out Liapounoff's idea, take an auxiliary variable with the distribution

\[
G(y) = \frac{1}{2h}\int_{-\infty}^{y} e^{-\frac{|z|}{h}}\,dz.
\]

Also make use of the integral

\[
\frac{1}{2h}\int_{-\infty}^{\infty} e^{-\frac{|z|}{h}}\,e^{ivz}\,dz = \frac{1}{1+h^2v^2}.
\]

Many definite integrals can be evaluated by using the relation between characteristic and distribution functions, as the following example shows.
6. Let $x$ be distributed over $(-\infty, +\infty)$ with the density $\tfrac12 e^{-|x|}$, the characteristic function being in this case

\[
\varphi(t) = \frac{1}{2}\int_{-\infty}^{\infty} e^{-|x|}\,e^{itx}\,dx = \frac{1}{1+t^2}.
\]

Applying the inversion formula, we find

\[
F(t) = C + \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{1-e^{-ivt}}{iv\,(1+v^2)}\,dv = \int_{-\infty}^{t}\tfrac12 e^{-|x|}\,dx,
\]

whence

\[
\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{e^{ivt}}{1+v^2}\,dv = e^{-|t|},
\]

an integral due to Laplace.
7. A variable is said to have Poisson's distribution if it can have only the integral values $0, 1, 2, \ldots$ and the probability of $x = k$ is

\[
\frac{a^k e^{-a}}{k!};
\]

the quantity $a$ is called the "parameter" of the distribution. If $n$ variables have Poisson's distribution with parameters $a_1, a_2, \ldots a_n$, show that their sum also has Poisson's distribution, the parameter of which is $a_1 + a_2 + \cdots + a_n$.
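The statement of Prob. 7 can be verified directly by convolving two Poisson distributions, term by term, and comparing with the Poisson distribution whose parameter is the sum. The particular parameters below are arbitrary.

```python
import math

def poisson_pmf(a, k):
    """P(x = k) for Poisson's distribution with parameter a."""
    return math.exp(-a) * a ** k / math.factorial(k)

a1, a2 = 1.5, 2.5
for k in range(6):
    conv = sum(poisson_pmf(a1, j) * poisson_pmf(a2, k - j) for j in range(k + 1))
    print(k, round(conv, 6), round(poisson_pmf(a1 + a2, k), 6))
```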
8. Prove the following result:

\[
\frac{1}{2\pi}\int_{-\infty}^{\infty}\left(\frac{\sin v}{v}\right)^n\frac{\sin tv}{v}\,dv = -\frac{1}{2} + \frac{1}{2\cdot4\cdot6\cdots2n}\left[(t+n)^n - \frac{n}{1}(t+n-2)^n + \frac{n(n-1)}{1\cdot2}(t+n-4)^n - \cdots\right],
\]

the series being continued as long as the arguments remain positive.
HINT: Consider the sum of $n$ uniformly distributed variables in the interval $(-1, +1)$ and express its distribution function in two different ways.

9. Establish the expression for the mathematical expectation of the absolute value of the sum of $n$ uniformly distributed variables in the interval $\left(-\tfrac12, +\tfrac12\right)$.

Ans.

\[
E|x_1 + x_2 + \cdots + x_n| = \frac{2}{(2n+2)\,2\cdot4\cdot6\cdots2n}\left[n^{n+1} - \frac{n}{1}(n-2)^{n+1} + \frac{n(n-1)}{1\cdot2}(n-4)^{n+1} - \cdots\right],
\]

the series being continued as long as the arguments remain positive.

HINT: Apply Laplace's formula of Sec. 9, Example 3, conveniently modified, to express the expectation of $x_1 + x_2 + \cdots + x_n$ and that of $|x_1 + x_2 + \cdots + x_n|$.

10. Show that under the same conditions as in Prob. 9
\[
E|x_1 + x_2 + \cdots + x_n| = \frac{n}{2\pi}\int_{-\infty}^{\infty}\left(\frac{\sin t}{t}\right)^{n-1}\frac{\sin t - t\cos t}{t^3}\,dt.
\]

HINT: Prove and use the following formula:

\[
\lim_{T\to\infty}\int_{-T}^{T}\frac{1-\cos wx}{x^2}\,dx = \pi|w|.
\]
11. Let $x_1$ and $x_2$ be two identical and normally distributed variables with the mean $= 0$ and the standard deviation $\sigma$. If $x$ is defined as the greater of the values $|x_1|$, $|x_2|$, that is,

\[
x = \max(|x_1|, |x_2|),
\]

find the mean value of $x$, as well as that of $x^2$.

Ans.

\[
E(x) = \frac{2\sigma}{\sqrt{\pi}}; \qquad E(x^2) = \sigma^2\left(1 + \frac{2}{\pi}\right).
\]

12. Let

\[
x = \min(|x_1|, |x_2|, \ldots |x_n|),
\]

where $x_1, x_2, \ldots x_n$ are identical normally distributed variables with the mean 0 and the standard deviation $\sigma$. Find the mean value of $x$.

Ans. Setting for brevity

\[
\sqrt{\frac{2}{\pi}}\,\frac{1}{\sigma}\int_{0}^{t} e^{-\frac{u^2}{2\sigma^2}}\,du = \theta(t),
\]

we have

\[
E(x) = \int_{0}^{\infty}\left(1 - \theta(t)\right)^n dt.
\]

In particular, for $n = 2$,

\[
E(x) = \frac{2\sigma}{\sqrt{\pi}}\left(\sqrt{2} - 1\right).
\]

For large $n$, asymptotically,

\[
E(x) \sim \frac{\sigma}{n+1}\sqrt{\frac{\pi}{2}}.
\]
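The answers of Probs. 11 and 12 (case $n = 2$) can be checked by simulation; seed and sample size below are arbitrary choices.

```python
import math
import random

# For two iid normal variables (mean 0, standard deviation sigma):
#   E max(|x1|, |x2|)   = 2*sigma/sqrt(pi)
#   E max(|x1|, |x2|)^2 = sigma^2 * (1 + 2/pi)
#   E min(|x1|, |x2|)   = (2*sigma/sqrt(pi)) * (sqrt(2) - 1)
random.seed(5)
sigma, N = 1.0, 200_000
mins = maxs = maxsq = 0.0
for _ in range(N):
    a, b = abs(random.gauss(0, sigma)), abs(random.gauss(0, sigma))
    mins += min(a, b)
    maxs += max(a, b)
    maxsq += max(a, b) ** 2
mins, maxs, maxsq = mins / N, maxs / N, maxsq / N

print(round(mins, 3), round(2 * sigma / math.sqrt(math.pi) * (math.sqrt(2) - 1), 3))
print(round(maxs, 3), round(2 * sigma / math.sqrt(math.pi), 3))
print(round(maxsq, 3), round(sigma ** 2 * (1 + 2 / math.pi), 3))
```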
13. A variable with the mean $= 0$ and the standard deviation $= 1$ is called a "reduced variable." By changing the origin and the unit of measurement any variable can be made reduced. For, if $x$ has the mean $a$ and the standard deviation $\sigma$, the variable

\[
u = \frac{x - a}{\sigma}
\]

is reduced. The distribution function of the reduced variable $u$ can be called the "reduced law of distribution." As we have seen, variables $x_1$ and $x_2$ with normal distribution have the same reduced law of distribution, as does their sum. The question may be raised: Is the normal law of distribution the unique law possessing this property? (G. Pólya.)

Solution.
Let $x_1$, $x_2$ be two variables for which the second moment of the distribution exists, so that we can speak of their means and standard deviations. Let $x_1$ have the mean $a_1$ and the standard deviation $\sigma_1$; likewise, let $a_2$ and $\sigma_2$ be the mean and the standard deviation of $x_2$. The three reduced variables

\[
u_1 = \frac{x_1 - a_1}{\sigma_1}, \qquad u_2 = \frac{x_2 - a_2}{\sigma_2}, \qquad u_3 = \frac{x_1 + x_2 - a_1 - a_2}{\sqrt{\sigma_1^2 + \sigma_2^2}}
\]

have by hypothesis the same law of distribution. Hence, they have the same characteristic function $\varphi(t)$, whence we can draw the conclusion that the characteristic functions of $x_1$, $x_2$, $x_1 + x_2$ are, respectively,

\[
e^{ia_1t}\varphi(\sigma_1 t), \qquad e^{ia_2t}\varphi(\sigma_2 t), \qquad e^{i(a_1+a_2)t}\varphi\!\left(\sqrt{\sigma_1^2+\sigma_2^2}\,t\right).
\]

Since the third is the product of the first two, we must have for an arbitrary real $t$

\[
\varphi(\alpha t)\varphi(\beta t) = \varphi(t), \tag{1}
\]

where

\[
\alpha = \frac{\sigma_1}{\sqrt{\sigma_1^2+\sigma_2^2}}, \qquad \beta = \frac{\sigma_2}{\sqrt{\sigma_1^2+\sigma_2^2}}; \qquad \alpha^2 + \beta^2 = 1.
\]

Since (1) holds for every real $t$, we shall have

\[
\varphi(\alpha t) = \varphi(\alpha^2 t)\varphi(\alpha\beta t); \qquad \varphi(\beta t) = \varphi(\alpha\beta t)\varphi(\beta^2 t)
\]

and

\[
\varphi(t) = \varphi(\alpha^2 t)\,\varphi(\alpha\beta t)^2\,\varphi(\beta^2 t). \tag{2}
\]

Applying (1) again to each of the factors in the right member of (2), we find that

\[
\varphi(t) = \varphi(\alpha^3 t)\,\varphi(\alpha^2\beta t)^3\,\varphi(\alpha\beta^2 t)^3\,\varphi(\beta^3 t), \tag{3}
\]

and, proceeding in the same way, we arrive at the general formula

\[
\varphi(t) = \varphi(\alpha^n t)^{p_0}\,\varphi(\alpha^{n-1}\beta t)^{p_1}\cdots\varphi(\beta^n t)^{p_n}, \tag{4}
\]

where $p_0, p_1, \ldots p_n$ are the coefficients in the expansion

\[
(1+z)^n = p_0 + p_1z + \cdots + p_nz^n.
\]

The arguments

\[
v_0 = \alpha^n t, \qquad v_1 = \alpha^{n-1}\beta t, \qquad \ldots \qquad v_n = \beta^n t
\]

tend uniformly to 0, since $\alpha < 1$, $\beta < 1$. The quotient

\[
\frac{\varphi(v) - 1}{v^2} = -\int_{-\infty}^{\infty} t^2\,dF(t)\int_{0}^{1}(1-x)\,e^{ivtx}\,dx
\]

is represented by a uniformly convergent integral; hence

\[
\lim_{v\to 0}\frac{\varphi(v) - 1}{v^2} = -\frac{1}{2}\int_{-\infty}^{\infty} t^2\,dF(t) = -\frac{1}{2},
\]

or

\[
\varphi(v) = 1 + \left[-\tfrac12 + \epsilon(v)\right]v^2,
\]

where $\epsilon(v) \to 0$ as $v \to 0$. At the same time

\[
\log\varphi(v) = \left[-\tfrac12 + \theta(v)\right]v^2 \qquad \text{(principal branch of the logarithm)},
\]

where again $\theta(v) \to 0$ as $v \to 0$. Now, taking logarithms of both members of (4),

\[
\log\varphi(t) = -\frac{t^2}{2}\left(p_0\alpha^{2n} + p_1\alpha^{2n-2}\beta^2 + \cdots + p_n\beta^{2n}\right) + \Omega = -\frac{t^2}{2} + \Omega,
\]

since $p_0\alpha^{2n} + p_1\alpha^{2n-2}\beta^2 + \cdots + p_n\beta^{2n} = (\alpha^2+\beta^2)^n = 1$, where

\[
\Omega = t^2\left[p_0\theta(v_0)\alpha^{2n} + p_1\theta(v_1)\alpha^{2n-2}\beta^2 + \cdots + p_n\theta(v_n)\beta^{2n}\right].
\]

Given $\epsilon > 0$, we can take $n$ so large that

\[
|\theta(v_i)| < \epsilon; \qquad i = 0, 1, \ldots n,
\]

whence $|\Omega| < \epsilon t^2$. Thus

\[
\left|\log\varphi(t) + \tfrac12 t^2\right| < \epsilon t^2,
\]

and since $\epsilon$ can be taken arbitrarily small,

\[
\log\varphi(t) = -\tfrac12 t^2, \qquad \varphi(t) = e^{-\frac{t^2}{2}},
\]

which shows that the normal law is the only one with the required property among all laws with finite second moments.

References
P. S. LAPLACE: "Théorie analytique des probabilités," Oeuvres VII, pp. 260-261.
T. J. STIELTJES: "Recherches sur les fractions continues," Oeuvres II, pp. 402-566.
A. LIAPOUNOFF: "Nouvelle forme du théorème sur la limite de probabilité," Mém. Acad. Sci. St-Pétersbourg, 12, 1901.
P. LÉVY: "Calcul des probabilités," Paris, 1925.
CHAPTER XIV

FUNDAMENTAL LIMIT THEOREMS

1. Bernoulli's theorem, as we have seen in Chap. VII, follows from a more general one known as Laplace's limit theorem. In terms already familiar to us, this theorem can be stated as follows: Let an event $E$ occur $m$ times in a series of $n$ independent trials with constant probability $p$. As $n$ becomes infinite, the distribution function of the quotient

\[
\frac{m - np}{\sqrt{npq}}
\]

approaches

\[
\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du
\]

as a limit; or, to state it in a less precise form, the distribution of the above quotient tends to normal.

Just as Bernoulli's theorem itself is a very particular case of the general law of large numbers, so Laplace's limit theorem is a special case of another extremely general theorem, the discovery of which by Laplace may be considered as the crowning achievement of his persistent efforts, extending over a period of more than twenty years, to find the approximate distribution of probability for sums consisting of a great many independent components with almost arbitrary distributions. The result at which Laplace finally arrived is as astonishing as it is simple: if $x_1, x_2, \ldots x_n$ ($E(x_i) = 0$, $i = 1, 2, \ldots n$) are independent variables (subject to some very mild limitations not stated, however, by Laplace) and $B_n$ is the dispersion of their sum, then for large $n$ the distribution of the quotient

\[
\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{B_n}}
\]

is nearly normal. To put it more precisely, the distribution function of this quotient tends to the limit

\[
\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du
\]

as $n$ becomes infinite.

Laplace's attempt to prove this important proposition does not stand the test of modern rigor and, besides, cannot easily be made rigorous. The same is true of the attempts made by later investigators, notably Poisson, Cauchy, and many others. Only after a lapse of many years were truly rigorous proofs of Laplace's theorem given. This important achievement is the result of the work of three great Russian mathematicians: Tshebysheff (1887), Markoff (1898), and Liapounoff (1900-1901). An account of Tshebysheff's and Markoff's ingenious investigations is given in Appendix II. Here we shall follow Liapounoff, for his method of proof has the advantage of simplicity even compared with more recent proofs, of which that given by J. W. Lindeberg deserves special mention.¹
2. Before going into the details of the analysis, we shall state the limit theorem in a very general form due to Liapounoff.

Laplace-Liapounoff's Theorem. Let $x_1, x_2, \ldots x_n$ be independent variables with means $= 0$, possessing absolute moments of the order $2 + \delta$ (where $\delta$ is some number $> 0$):

\[
\mu_{2+\delta}^{(k)} = E|x_k|^{2+\delta}; \qquad k = 1, 2, \ldots n.
\]

If, denoting by $B_n$ the dispersion of the sum $x_1 + x_2 + \cdots + x_n$, the quotient

\[
\omega_n = \frac{\mu_{2+\delta}^{(1)} + \mu_{2+\delta}^{(2)} + \cdots + \mu_{2+\delta}^{(n)}}{B_n^{1+\frac{\delta}{2}}}
\]

tends to 0 as $n \to \infty$, the probability of the inequality

\[
\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{B_n}} < t
\]

tends uniformly to the limit

\[
\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du.
\]
It is natural that the complete proof of a theorem of such character cannot be too short, and to make the proof clearer it is advisable to divide it into logically separated parts.

3. The Fundamental Lemma. Let $s_n$ be a variable, depending on an integer $n$, with the mean $= 0$ and the standard deviation $= 1$. If its characteristic function

\[
\varphi_n(v) = E(e^{ivs_n})
\]

tends to $e^{-\frac{v^2}{2}}$ uniformly in any given finite interval $(-l, l)$, then the distribution function $F_n(t)$ of $s_n$ tends uniformly (in the domain of all real values of $t$) to the limit

\[
\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du.
\]

¹ Lindeberg's proof, as well as later proofs by P. Lévy and others, makes use of an ingenious artifice due to Liapounoff. Lindeberg explicitly acknowledges his indebtedness to Liapounoff, while Lévy and other French writers fail to give due credit to the great Russian mathematician.
Proof. a. Together with the variable $s_n$, whose distribution function is $F_n(t)$, Liapounoff considers another variable

\[
\tau_n = s_n + y,
\]

where $y$ is a normally distributed variable with the distribution function

\[
G(y) = \frac{1}{h\sqrt{\pi}}\int_{-\infty}^{y} e^{-\frac{x^2}{h^2}}\,dx.
\]

Denoting the distribution function of $\tau_n$ by $H_n(t)$, we have (Chap. XIII, Sec. 7)

\[
H_n(t) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} dF_n(x)\int_{-\infty}^{\frac{t-x}{h}} e^{-u^2}\,du. \tag{1}
\]

On account of the inequality

\[
\int_{T}^{\infty} e^{-u^2}\,du \le \frac{\sqrt{\pi}}{2}\,e^{-T^2}, \qquad T \ge 0,
\]

we have: for $t - x \ge 0$,

\[
\frac{1}{\sqrt{\pi}}\int_{-\infty}^{\frac{t-x}{h}} e^{-u^2}\,du = 1 - \frac{\theta'}{2}\,e^{-\left(\frac{t-x}{h}\right)^2}; \qquad 0 \le \theta' \le 1,
\]

and for $t - x < 0$,

\[
\frac{1}{\sqrt{\pi}}\int_{-\infty}^{\frac{t-x}{h}} e^{-u^2}\,du = \frac{\theta''}{2}\,e^{-\left(\frac{t-x}{h}\right)^2}; \qquad 0 \le \theta'' \le 1.
\]

Hence, introducing these expressions into (1),

\[
|H_n(t) - F_n(t)| \le \frac{1}{2}\int_{-\infty}^{\infty} e^{-\left(\frac{t-x}{h}\right)^2}\,dF_n(x).
\]

But, by the identity

\[
e^{-\left(\frac{t-x}{h}\right)^2} = \frac{h}{2\sqrt{\pi}}\int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}}\,e^{iv(t-x)}\,dv,
\]

the integral in the right member can be expressed through the characteristic function, and consequently

\[
|H_n(t) - F_n(t)| \le \frac{h}{4\sqrt{\pi}}\int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}}\,|\varphi_n(v)|\,dv. \tag{2}
\]

We split this integral into three parts $J_1$, $J_2$, $J_3$, taken respectively between the limits $(-\infty, -l)$, $(-l, l)$, $(l, +\infty)$. Since $|\varphi_n(v)| \le 1$,

\[
J_1 + J_3 \le \frac{h}{2\sqrt{\pi}}\int_{l}^{\infty} e^{-\frac{h^2v^2}{4}}\,dv < \frac{1}{\sqrt{\pi}\,hl}\,e^{-\frac{h^2l^2}{4}}, \tag{3}
\]

because

\[
\int_{X}^{\infty} e^{-x^2}\,dx < \frac{1}{2X}\,e^{-X^2} \tag{4}
\]

for positive $X$. To estimate $J_2$, we denote by $\epsilon_n(l)$ the maximum of $\left|\varphi_n(v) - e^{-\frac{v^2}{2}}\right|$ in the interval $-l \le v \le l$. Then

\[
J_2 \le \frac{h}{4\sqrt{\pi}}\int_{-l}^{l}\left(e^{-\frac{v^2}{2}} + \epsilon_n(l)\right)dv < \frac{h}{4\sqrt{\pi}}\left(\sqrt{2\pi} + 2l\,\epsilon_n(l)\right). \tag{5}
\]

Finally, taking into account (2), (3), (4), and (5), we find

\[
|H_n(t) - F_n(t)| < \frac{h}{4\sqrt{\pi}}\left(\sqrt{2\pi} + 2l\,\epsilon_n(l)\right) + \frac{1}{\sqrt{\pi}\,hl}\,e^{-\frac{h^2l^2}{4}}. \tag{6}
\]
b. Expression (1) of $H_n(t)$ can be transformed in a manner similar to that employed in Chap. XIII, Sec. 7, if we first write

\[
\frac{1}{\sqrt{\pi}}\int_{-\infty}^{\frac{t-x}{h}} e^{-u^2}\,du = \frac{1}{2} + \frac{1}{\sqrt{\pi}}\int_{0}^{\frac{t-x}{h}} e^{-u^2}\,du.
\]

Proceeding as before, we get

\[
H_n(t) = \frac{1}{2} + \frac{1}{\pi}\int_{0}^{\infty} e^{-\frac{h^2v^2}{4}-\frac{v^2}{2}}\,\frac{\sin tv}{v}\,dv + \frac{1}{\pi}\int_{0}^{\infty} e^{-\frac{h^2v^2}{4}}\,\Im\!\left[\left(e^{-\frac{v^2}{2}} - \varphi_n(v)\right)e^{-ivt}\right]\frac{dv}{v}.
\]

Now, since

\[
1 - e^{-\frac{h^2v^2}{4}} \le \frac{h^2v^2}{4} \qquad \text{and} \qquad \int_{0}^{\infty} v\,e^{-\frac{v^2}{2}}\,dv = 1,
\]

we obtain

\[
\left|H_n(t) - \frac{1}{2} - \frac{1}{\pi}\int_{0}^{\infty} e^{-\frac{v^2}{2}}\,\frac{\sin tv}{v}\,dv\right| < \frac{h^2}{4\pi} + \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}}\left|\varphi_n(v) - e^{-\frac{v^2}{2}}\right|\frac{dv}{|v|}. \tag{7}
\]

To find an upper bound of the integral in the right member, we split it into five integrals $I_1$, $I_2$, $I_3$, $I_4$, $I_5$, taken respectively between the limits $-\infty, -l$; $-l, -\lambda$; $-\lambda, \lambda$; $\lambda, l$; $l, +\infty$. To estimate $I_3$, we notice that, $s_n$ having the mean 0 and the standard deviation 1,

\[
\left|\varphi_n(v) - e^{-\frac{v^2}{2}}\right| \le v^2.
\]

Hence

\[
|I_3| \le \frac{1}{2\pi}\int_{-\lambda}^{\lambda}|v|\,dv = \frac{\lambda^2}{2\pi}. \tag{8}
\]

To estimate $I_2 + I_4$, we use the inequality $\left|\varphi_n(v) - e^{-\frac{v^2}{2}}\right| \le \epsilon_n(l)$ and get

\[
|I_2| + |I_4| \le \frac{\epsilon_n(l)}{\pi}\log\frac{l}{\lambda}. \tag{9}
\]

Finally, dealing with $I_1$ and $I_5$, we use the obvious inequality $\left|\varphi_n(v) - e^{-\frac{v^2}{2}}\right| \le 2$ and obtain

\[
|I_1| + |I_5| \le \frac{2}{\pi}\int_{l}^{\infty} e^{-\frac{h^2v^2}{4}}\,\frac{dv}{v} < \frac{4}{\pi h^2l^2}\,e^{-\frac{h^2l^2}{4}}. \tag{10}
\]

In these estimates $\lambda$ is still at our disposal; we can take $\lambda = \epsilon_n(l)^{\frac12}$, so that both (8) and (9) tend to 0 together with $\epsilon_n(l)$. The inequality obtained by combining (7), (8), (9), and (10) with (6), when expressed through $a = hl$ and $l$, has its right-hand member falling into three groups: terms containing the factor $e^{-\frac{a^2}{4}}$, terms containing the factor $\frac{1}{l}$ or $\frac{1}{l^2}$, and terms which tend to 0 together with $\epsilon_n(l)$. We dispose of the arbitrary positive numbers $a$ and $l$ in the following manner: given an arbitrary positive number $\epsilon$, we take $a$ so large as to make the first group less than $\frac{\epsilon}{3}$; after that we select $l$ large enough to make the second group less than $\frac{\epsilon}{3}$; finally, since for a fixed $l$, $\epsilon_n(l) \to 0$ by hypothesis, there exists a number $n_0$ such that the third group is less than $\frac{\epsilon}{3}$ for all $n > n_0$. Hence, for $n > n_0$,

\[
\left|F_n(t) - \frac{1}{2} - \frac{1}{\pi}\int_{0}^{\infty} e^{-\frac{v^2}{2}}\,\frac{\sin tv}{v}\,dv\right| < \epsilon,
\]

and since (Chap. XIII, Sec. 7, Example 1)

\[
\frac{1}{2} + \frac{1}{\pi}\int_{0}^{\infty} e^{-\frac{v^2}{2}}\,\frac{\sin tv}{v}\,dv = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du,
\]

this means that

\[
\lim_{n\to\infty} F_n(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du \tag{11}
\]

uniformly in $t$, because the number $n_0$, as clearly follows from the preceding analysis, depends upon $\epsilon$ only and not upon $t$.

Remark 1.
the pre
Without changing anything in the proof, we can state
the fundamental lemma in a slightly generalized form as follows:
tends to the limit t; the probability of the inequality tends to 1 Remark 2.
'
J
If t,.
eiu'du. ..
The fundamental lemma, although not explicitly stated v'21r by Liapounoff, is implicitly contained in his proof. More general propositions of the same nature have been published by P6lya and Levy. The very elegant result due to the latter can be stated as follows: If
the characteristic function of the variable. s,. tends to the characteristic function q,(t)
J_
=
ei'"'dF(x)
.. ..
of a fixed distribution uniformly in any finite interval, then lim
F,.(t)
=
F(t) .
at any point of continuity of F(t). The above proof, corresponding to the particular case
F(t)
1 =

f'
eiu'du, ..
can be used, almost without any changes, in proving the general proposi
v'21r
tion of Levy.
a.
4. Proof of Liapounoff's Theorem. a. If Liapounoff's condition $\omega_n \to 0$ is satisfied for a certain $\delta > 0$, it will be satisfied for all smaller $\delta$. Let $f_i(t)$ be the distribution function of $x_i$ ($i = 1, 2, \ldots n$). The sum

\[
f(t) = f_1(t) + f_2(t) + \cdots + f_n(t)
\]

being a nondecreasing function of $t$, the following inequality holds (Chap. XIII, Sec. 5):

\[
\left(\int_{-\infty}^{\infty}|t|^b\,df(t)\right)^{a-c} \le \left(\int_{-\infty}^{\infty}|t|^a\,df(t)\right)^{b-c}\left(\int_{-\infty}^{\infty}|t|^c\,df(t)\right)^{a-b},
\]

provided $a > b > c > 0$. We take here

\[
a = 2+\delta, \qquad b = 2+\delta', \qquad c = 2,
\]

supposing $0 < \delta' < \delta$. Then

\[
\int_{-\infty}^{\infty}|t|^a\,df(t) = \sum_{k=1}^{n}\mu_{2+\delta}^{(k)}; \qquad \int_{-\infty}^{\infty}|t|^b\,df(t) = \sum_{k=1}^{n}\mu_{2+\delta'}^{(k)}; \qquad \int_{-\infty}^{\infty}|t|^c\,df(t) = B_n,
\]

and the inequality becomes

\[
\left(\sum_{k=1}^{n}\mu_{2+\delta'}^{(k)}\right)^{\delta} \le \left(\sum_{k=1}^{n}\mu_{2+\delta}^{(k)}\right)^{\delta'}B_n^{\,\delta-\delta'},
\]

which shows that

\[
\frac{\sum_k \mu_{2+\delta'}^{(k)}}{B_n^{1+\frac{\delta'}{2}}} \le \left(\frac{\sum_k \mu_{2+\delta}^{(k)}}{B_n^{1+\frac{\delta}{2}}}\right)^{\frac{\delta'}{\delta}} \to 0 \qquad \text{as } n \to \infty,
\]

provided $0 < \delta' < \delta$. Hence, in the proof we can assume that the fundamental condition is satisfied for some positive $\delta \le 1$.

b. Liapounoff's inequality (Chap. XIII, Sec. 5) with $c = 0$, $b = 2$, $a = 2+\delta$, applied to $x_i$, gives

\[
b_i = E(x_i^2) \le \left(\mu_{2+\delta}^{(i)}\right)^{\frac{2}{2+\delta}},
\]

whence

\[
\left(\frac{b_i}{B_n}\right)^{1+\frac{\delta}{2}} \le \frac{\mu_{2+\delta}^{(i)}}{B_n^{1+\frac{\delta}{2}}} \le \omega_n, \tag{12}
\]

and, since it is assumed that $\omega_n \to 0$ and

\[
B_n = b_1 + b_2 + \cdots + b_n,
\]

all the quotients $b_i/B_n$ ($i = 1, 2, \ldots n$) will converge to 0 uniformly as $n \to \infty$.
c. The following formula can easily be obtained by means of integration by parts:

\[
e^{ix} = 1 + ix - \frac{x^2}{2} - x^2\int_{0}^{1}\left(e^{ix\tau} - 1\right)(1-\tau)\,d\tau.
\]

If $x$ is real and $|x| > 2$, we have

\[
\left|x^2\int_{0}^{1}\left(e^{ix\tau} - 1\right)(1-\tau)\,d\tau\right| \le x^2 \le |x|^{2+\delta},
\]

since $|e^{ix\tau} - 1| \le 2$ and $|x|^{\delta} > 1$. If $|x| \le 2$, we can use the inequality

\[
|e^{ix\tau} - 1| = 2\left|\sin\frac{x\tau}{2}\right| \le |x|\tau
\]

and find

\[
\left|x^2\int_{0}^{1}\left(e^{ix\tau} - 1\right)(1-\tau)\,d\tau\right| \le \frac{|x|^3}{6} \le |x|^{2+\delta}.
\]

Thus, for every real $x$,

\[
e^{ix} = 1 + ix - \frac{x^2}{2} + \theta\,|x|^{2+\delta}; \qquad |\theta| \le 1.
\]

Substituting here

\[
x = \frac{t\,x_k}{\sqrt{B_n}}
\]

and taking the mathematical expectation of both members, we have

\[
\varphi_k\!\left(\frac{t}{\sqrt{B_n}}\right) = 1 - \frac{b_kt^2}{2B_n} + \theta_k\,\frac{\mu_{2+\delta}^{(k)}\,|t|^{2+\delta}}{B_n^{1+\frac{\delta}{2}}}; \qquad |\theta_k| \le 1. \tag{13}
\]

Furthermore, since

\[
1 - x = e^{-x-\theta x^2}; \qquad 0 < \theta < 1, \quad 0 \le x \le \tfrac12,
\]

we can write

\[
1 - \frac{b_kt^2}{2B_n} = e^{-\frac{b_kt^2}{2B_n}-\theta\left(\frac{b_kt^2}{2B_n}\right)^2}; \qquad 0 < \theta < 1. \tag{14}
\]

If $\omega_n|t|^{2+\delta} < 1$, we shall have, by virtue of (12),

\[
\left(\frac{b_kt^2}{2B_n}\right)^2 \le \frac{1}{4}\left(\frac{b_k}{B_n}\right)^{1+\frac{\delta}{2}}|t|^{2+\delta} \le \frac{\mu_{2+\delta}^{(k)}\,|t|^{2+\delta}}{4B_n^{1+\frac{\delta}{2}}},
\]

and this inequality, together with (13) and (14), leads to the following expression of $\varphi_k$:

\[
\varphi_k\!\left(\frac{t}{\sqrt{B_n}}\right) = e^{-\frac{b_kt^2}{2B_n}}\left(1 + u_k\right), \tag{15}
\]

where

\[
|u_k| \le 3\,\frac{\mu_{2+\delta}^{(k)}\,|t|^{2+\delta}}{B_n^{1+\frac{\delta}{2}}}. \tag{16}
\]
d. The characteristic function of the variable

\[
s_n = \frac{x_1 + x_2 + \cdots + x_n}{\sqrt{B_n}}
\]

is

\[
\varphi(t) = \varphi_1\!\left(\frac{t}{\sqrt{B_n}}\right)\varphi_2\!\left(\frac{t}{\sqrt{B_n}}\right)\cdots\varphi_n\!\left(\frac{t}{\sqrt{B_n}}\right),
\]

because $x_1, x_2, \ldots x_n$ are independent variables. Hence, by (15),

\[
\varphi(t) = e^{-\frac{t^2}{2}}(1+u_1)(1+u_2)\cdots(1+u_n)
\]

and

\[
\left|\varphi(t) - e^{-\frac{t^2}{2}}\right| \le (1+|u_1|)(1+|u_2|)\cdots(1+|u_n|) - 1 \le e^{|u_1|+|u_2|+\cdots+|u_n|} - 1, \tag{17}
\]

taking into account inequalities (16). Inequality (17) holds if $\omega_n|t|^{2+\delta} < 1$.

Suppose, now, that $t$ is confined to an arbitrary finite interval $-l \le t \le l$. Because $\omega_n$, by hypothesis, tends to 0, the sum

\[
|u_1| + |u_2| + \cdots + |u_n|,
\]

which by (16) does not exceed a fixed multiple of $\omega_n l^{2+\delta}$, will tend to 0 as $n \to \infty$. In connection with (17) this shows that

\[
\varphi(t) \to e^{-\frac{t^2}{2}}
\]

uniformly in any finite interval. It suffices now to invoke the fundamental lemma to complete the proof of Liapounoff's theorem.
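The convergence $\varphi(t) \to e^{-t^2/2}$ just established can be watched numerically. The sketch below takes independent variables uniform on $(-\tfrac12, \tfrac12)$ (so $B_n = n/12$); the characteristic function of the normed sum is then $\left(\frac{\sin u}{u}\right)^n$ with $u = \frac{t}{2\sqrt{n/12}}$, and its largest deviation from $e^{-t^2/2}$ on $[-3, 3]$ shrinks as $n$ grows.

```python
import math

def phi_n(t, n):
    """Characteristic function of (x1+...+xn)/sqrt(n/12), xi uniform on (-1/2, 1/2)."""
    u = t / (2 * math.sqrt(n / 12))
    if u == 0:
        return 1.0
    return (math.sin(u) / u) ** n

for n in (5, 50, 500):
    worst = max(abs(phi_n(v / 10, n) - math.exp(-(v / 10) ** 2 / 2))
                for v in range(-30, 31))
    print(n, worst)  # the maximal gap on [-3, 3] decreases with n
```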
6. Particular Cases.
This theorem is extremely general and it is
hardly poseible to find cases of any practical importance to which it
SEC.
5)
293
FUNDAMENTAL LIMIT THEOREMS
Two particularly significant cases deserve special
could not be applied. mention.
First Case.
Let us suppose that variables x1, x2,
•
•
•
Xn are bounded,
so that any possible value of any one of them is absolutely less than a constant C.
Evidently
and hence
It suffices to assume that
Bn
=
b1 + b2 + ' ' ' + bn
tends to infinity to be sure that wn
�
0.
Hence, dealing with bounded
independent variables, the condition for the validity of the limit theorem IS
as
n
� oo,
which is equivalent to the statement that the series
b1 + b2 + bs +
·
·
·
is divergent. Poisson's series of trials affords a good illustration of this case.
In
the usual way, we attach to each of the trials a variable which assumes two values, 1 and O, according as an event E occurs or fails in that trial. Let p; and q;
=
1

p; be the respective probabilities of the occurrence
and failure of E in the ith trial.
The variable Z; attached to this trial
is defined by Z; Z;
=
=
1 if E occurs, 0 if E fails.
Noticing that E(z;)
p;,
=
we introduce new variables
X;
=
(i
Z; p;
=
1, 2,
.
.
.
n)
with the mean 0, whose sum is given by m
where
m
np
is the number of occurrences of E in
n
probability
p
=
P 1 + P2 +
n
·
+ Pn,
trials and p the mean
294
INTRODUCTION TO MATHEMATICAL PROBABILITY
[CHAP. XIV
In our case

$$b_i = p_iq_i \qquad \text{and} \qquad B_n = p_1q_1 + p_2q_2 + \cdots + p_nq_n.$$

Hence, we can formulate the following theorem:
Theorem. The probability of the inequality

$$m - n\bar{p} < t\sqrt{B_n}$$

tends uniformly to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}du$$

as $n \to \infty$, provided the series

$$p_1q_1 + p_2q_2 + p_3q_3 + \cdots$$

is divergent. At the same time the probability of the inequalities

$$t_1\sqrt{B_n} < m - n\bar{p} < t_2\sqrt{B_n}$$

tends uniformly (in $t_1$, $t_2$) to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{t_1}^{t_2} e^{-\frac{u^2}{2}}du.$$
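The theorem can be checked empirically. The sketch below (plain Python; the particular sequence of unequal trial probabilities and all numerical values are illustrative assumptions, not taken from the text) simulates Poisson's series of trials and compares the frequency of the event $m - n\bar{p} < t\sqrt{B_n}$ with the normal limit.

```python
import math
import random

def phi(t):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def poisson_series_freq(probs, t, reps, rng):
    """Empirical frequency of the event  m - n*p_bar < t*sqrt(B_n)
    over repeated Poisson series of trials with probabilities probs."""
    n = len(probs)
    p_bar = sum(probs) / n
    b_n = sum(p * (1.0 - p) for p in probs)   # B_n = sum of p_i q_i
    hits = 0
    for _ in range(reps):
        m = sum(1 for p in probs if rng.random() < p)
        if m - n * p_bar < t * math.sqrt(b_n):
            hits += 1
    return hits / reps

rng = random.Random(1)
# an arbitrary illustrative sequence of unequal trial probabilities
probs = [0.2 + 0.15 * (k % 5) for k in range(400)]
freq = poisson_series_freq(probs, 1.0, 10000, rng)
```

With 400 trials and 10,000 repetitions the empirical frequency should fall close to $\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{1}e^{-u^2/2}du \approx 0.8413$, up to sampling noise and the lattice character of $m$.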
Second Case. Let $z_1, z_2, \ldots, z_n$ be identical variables with the common mean $a$ and dispersion $b$. Supposing that for some positive $\delta$

$$E|z_i - a|^{2+\delta} = c$$

exists, we have

$$\omega_n = \frac{nc}{(nb)^{1+\frac{\delta}{2}}} = \frac{c}{b^{1+\frac{\delta}{2}}}\,n^{-\frac{\delta}{2}}$$

and hence $\omega_n \to 0$ as $n \to \infty$. The limit theorem applied to this case can be stated as follows:

The probability of the inequality

$$z_1 + z_2 + \cdots + z_n - na < t\sqrt{nb}$$

tends uniformly to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}du,$$

provided $E|z_i - a|^{2+\delta}$ exists for some positive $\delta$. As a corollary we have: The probability of the inequalities

$$-t\frac{\sqrt{b}}{\sqrt{n}} < \frac{z_1 + z_2 + \cdots + z_n}{n} - a < t\frac{\sqrt{b}}{\sqrt{n}}$$

tends to

$$\frac{2}{\sqrt{2\pi}}\int_{0}^{t} e^{-\frac{u^2}{2}}du.$$
This proposition is regarded as justification of the ordinary procedure of taking a mean of several observed measurements of the same quantity, made under the same conditions, to approximate its "true value." Barring systematic errors, which should be eliminated by a careful study of the tools used for measurements, the true value of the unknown quantity is regarded as coinciding with the expectation of a set of potentially possible values, each having a certain probability of materializing in actual measurement.
Since for comparatively small $t$ the above integral comes very near to 1, while $t\sqrt{b}/\sqrt{n}$ for large $n$ becomes as small as we please, the probability that the mean of a very large number of observations deviates very little from the true value of the quantity to be measured will be close to 1; herein lies the justification of the rule of the mean mentioned above.
ESTIMATION OF THE ERROR TERM

6. The limit theorem is a proposition of an essentially asymptotic character. It states merely that the distribution function $F_n(t)$ of the variable

$$s_n = \frac{x_1 + x_2 + \cdots + x_n}{\sqrt{B_n}}$$

approaches the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}du$$

as $n$ becomes infinite when a certain condition is fulfilled. For practical purposes it is very important to estimate the error committed by replacing $F_n(t)$ by its limit when $n$ is a finite but very large number. In his original paper Liapounoff had this important problem in mind and for that reason entered into more detailed elaboration of various parts
of his proof than was strictly necessary to establish an asymptotic theorem. We do not intend to reproduce here this part of Liapounoff's investigation; it suffices to indicate the final result. Assuming the existence of the absolute moments of the third order $\mu_i^{(3)} = E|x_i|^3$ ($i = 1, 2, \ldots, n$), we set

$$\omega_n = \frac{\mu_1^{(3)} + \mu_2^{(3)} + \cdots + \mu_n^{(3)}}{B_n^{\frac{3}{2}}}$$

and suppose $n$ so large that $\omega_n < \frac{1}{20}$. Then, setting

$$F_n(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}du + R,$$

we shall have

$$|R| < \omega_n\left[\left(\log\frac{1}{\omega_n}\right)^{\frac{3}{2}} + 1.1\right].$$

Although this limit for the error term is probably too high, it seems to be the best available. However, it is greatly desirable to have a more genuine estimation of $R$.
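The size of the error $R$ can be seen concretely in the simplest bounded case, Bernoulli trials, where $F_n(t)$ is computable exactly. The sketch below (an illustration, not part of the text) measures the worst discrepancy between the exact binomial distribution function and its normal limit at the lattice points; the choice $p = \frac{1}{2}$ and the two values of $n$ are assumptions made for the example.

```python
import math

def phi(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def max_clt_error(n, p):
    """Max |F_n(t) - Phi(t)| over lattice points t = (m - np)/sqrt(npq)
    for the number of successes in n Bernoulli(p) trials, using the
    exact binomial distribution function."""
    q = 1.0 - p
    sigma = math.sqrt(n * p * q)
    log_fact = [0.0]
    for k in range(1, n + 1):
        log_fact.append(log_fact[-1] + math.log(k))
    cdf, worst = 0.0, 0.0
    for m in range(n + 1):
        log_pm = (log_fact[n] - log_fact[m] - log_fact[n - m]
                  + m * math.log(p) + (n - m) * math.log(q))
        cdf += math.exp(log_pm)
        t = (m - n * p) / sigma
        worst = max(worst, abs(cdf - phi(t)))
    return worst

e100 = max_clt_error(100, 0.5)
e400 = max_clt_error(400, 0.5)
```

The error shrinks roughly like $1/\sqrt{n}$, in line with the behavior of $\omega_n$ for bounded variables; most of it here is the lattice (continuity) effect.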
7. Hypothesis of Elementary Errors. It is considered as an experimental fact that accidental errors of observations (or measurements) follow closely the law of normal distribution. In the sphere of biology, similar phenomena have been observed as to the size of the bodies and various organs of living organisms. What can be suggested as an explanation of these observed facts? In regard to errors of observations, Laplace proposed a hypothesis which may sound plausible. He considers the total error as a sum of numerous very small elementary errors due to independent causes. It can hardly be doubted that various independent or nearly independent causes contribute to the total error. In astronomical observations, for instance, slight changes in the temperature, irregular currents of air, vibrations of buildings, and even the state of the organs of perception of an observer may be considered as but a small part of such causes. One can easily understand that the growth of the organs of living organisms is also dependent on many factors of accidental character which independently tend to increase or decrease the size of the organs. If, on the ground of such evidence, we accept Laplace's hypothesis, we can try to explain the normal law of distribution on the basis of the general theorems established above. Suppose that elementary errors do not exceed in absolute value a certain number $l$, very small compared with the standard deviation $\sigma$ of their sum. The quantity denoted by $\omega_n$ in the preceding section will be less than the ratio $l/\sigma$ and hence will be a small number; and the same will be true of the error term $R$.
Hence, the distribution of the total error will be nearly normal. Laplace's explanation of the observed prevalence of normal distributions may be accepted as plausible, at least.
But the question may be raised whether elementary errors are small enough and numerous enough to make the difference between the true distribution function of the total error and that of a normal distribution small. Besides, Laplace's hypothesis is based on the principle of superposition of small effects and thus introduces another assumption of an arbitrary character. Finally, the experimental data quoted in support of the normal distribution of errors of observations and biological measurements are not numerous enough for one to place full confidence in them. Hence, the widely accepted statistical theories based on the normal law of distribution cannot be fully relied on, and may be considered merely as substitutes for more accurate knowledge which we do not yet possess in dealing with problems of vital importance in the sphere of human activities.
LIMIT THEOREMS FOR DEPENDENT VARIABLES

8. The fundamental limit theorem can be extended to sums of dependent variables as, under special assumptions, was shown first by Markoff and later by S. Bernstein, whose work may be considered an outstanding recent contribution to the theory of probability. However, the conditions for the validity of the theorems established by Bernstein are rather complicated, and the whole subject seems to lack ultimate simplicity. For that reason we confine ourselves here to a few special cases.
Example 1. Let us consider a simple chain in which the probabilities for an event $E$ to occur in any trial are $p'$ and $p''$, respectively, according as $E$ occurred or failed in the preceding trial. The probability for $E$ to occur at the $n$th trial when the results of the other trials are unknown is

$$p_n = p + (p_1 - p)\delta^{n-1}$$

where $p_1$ is the initial probability,

$$\delta = p' - p'' \qquad \text{and} \qquad p = \frac{p''}{1 - \delta}.$$

The mean probability for $n$ trials is given by

$$\bar{p}_n = p + \frac{p_1 - p}{n}\cdot\frac{1 - \delta^n}{1 - \delta},$$

so that $p$ may be considered as the mean probability in infinitely many trials. In the usual way, to trials 1, 2, 3, \ldots we attach variables $x_1, x_2, x_3, \ldots$ so that in general

$$x_i = 1 - p_i \qquad \text{or} \qquad x_i = -p_i$$

according as $E$ occurs or fails in the $i$th trial. If $m$ is the number of occurrences of $E$ in $n$ trials, the sum

$$x_1 + x_2 + \cdots + x_n$$

of dependent variables represents $m - n\bar{p}_n$. Evidently

$$E(m - n\bar{p}_n) = 0$$

and, as we have seen in Chap. XI, Sec. 7,

$$B_n = E(m - n\bar{p}_n)^2 \sim npq\,\frac{1 + \delta}{1 - \delta};$$

that is, the ratio of $B_n$ to $npq\,\dfrac{1 + \delta}{1 - \delta}$ tends to 1 as $n$ becomes infinite.
In order to find an appropriate expression for the characteristic function of the quotient

$$\frac{m - n\bar{p}_n}{\sqrt{B_n}}$$

we shall endeavor first to find the generating function $\omega_n(t)$ for the probabilities $P_{m,n}$ ($m = 0, 1, 2, \ldots, n$) of having exactly $m$ occurrences of $E$ in $n$ trials. Let $A_{m,n}$ be the probability of $m$ occurrences when the whole series ends with $E$, and similarly $B_{m,n}$ the probability of $m$ occurrences when this series ends with $F$, the event opposite to $E$. The following relations follow immediately from the definition of a chain:

$$(18)\qquad A_{m,n+1} = A_{m-1,n}\,p' + B_{m-1,n}\,p'', \qquad B_{m,n+1} = A_{m,n}\,q' + B_{m,n}\,q''.$$

Let

$$\theta_n(t) = \sum_{m=0}^{n} A_{m,n}t^m, \qquad \psi_n(t) = \sum_{m=0}^{n} B_{m,n}t^m$$

be the generating functions of $A_{m,n}$ and $B_{m,n}$. From relations (18) it follows that

$$(19)\qquad \theta_{n+1}(t) = p't\,\theta_n(t) + p''t\,\psi_n(t), \qquad \psi_{n+1}(t) = q'\theta_n(t) + q''\psi_n(t).$$

These relations, established for $n \ge 1$, will hold even for $n = 0$ if we define $\theta_0(t)$ and $\psi_0(t)$ by

$$p'\theta_0 + p''\psi_0 = p_1, \qquad q'\theta_0 + q''\psi_0 = 1 - p_1, \qquad \theta_0 + \psi_0 = 1,$$

whence $\theta_0$ and $\psi_0$ are determined. From (19) one can easily conclude that both $\theta_n(t)$ and $\psi_n(t)$ satisfy the same equation in finite differences of the second order:

$$\theta_{n+2} - (p't + q'')\theta_{n+1} + \delta t\,\theta_n = 0, \qquad \psi_{n+2} - (p't + q'')\psi_{n+1} + \delta t\,\psi_n = 0.$$
Evidently $P_{m,n} = A_{m,n} + B_{m,n}$; hence

$$\omega_n(t) = \theta_n(t) + \psi_n(t)$$

satisfies the equation

$$(20)\qquad \omega_{n+2} - (p't + q'')\omega_{n+1} + \delta t\,\omega_n = 0$$

and is completely determined by it and the initial conditions

$$\omega_0 = 1, \qquad \omega_1 = 1 - p_1 + p_1t.$$

Since

$$p' = p + q\delta, \qquad q'' = q + p\delta,$$

the roots of the characteristic equation corresponding to (20) can be expanded in the power series

$$\rho_1 = 1 + c_1(t - 1) + c_2(t - 1)^2 + \cdots, \qquad \rho_2 = \delta + d_1(t - 1) + d_2(t - 1)^2 + \cdots.$$

The general expression of $\omega_n(t)$ will be

$$\omega_n(t) = A\rho_1^n + B\rho_2^n = A\rho_1^n + B\delta^nt^n\rho_1^{-n},$$

where, to satisfy the initial conditions, we must take

$$A = \frac{\omega_1 - \rho_2}{\rho_1 - \rho_2}, \qquad B = \frac{\rho_1 - \omega_1}{\rho_1 - \rho_2}.$$

Having found $\omega_n(t)$, the characteristic function of

$$s_n = \frac{m - n\bar{p}_n}{\sqrt{B_n}}$$

will be given by

$$\varphi_n(v) = e^{-\frac{in\bar{p}_nv}{\sqrt{B_n}}}\,\omega_n\!\left(e^{\frac{iv}{\sqrt{B_n}}}\right).$$

To study the asymptotic behavior of $\varphi_n(v)$ when $v$ is confined to a finite fixed interval $-l \le v \le l$, we notice that then

$$u = \frac{v}{\sqrt{B_n}}$$

will be well within the convergence region of the series we are going to consider now. By means of Lagrange's series or otherwise, we find the following expansion of $\log\rho_1$ in power series of $t - 1$:

$$\log\rho_1 = p(t - 1) - \left(\frac{p^2}{2} - \frac{pq\delta}{1 - \delta}\right)(t - 1)^2 + \cdots,$$

convergent for sufficiently small values of $t - 1$. By setting $t = e^{iu}$ we obtain another power series in $u$:

$$\log\rho_1 = ipu - \frac{pq}{2}\cdot\frac{1 + \delta}{1 - \delta}\,u^2 + \cdots.$$
Hence

$$\rho_1^n = e^{ipnu - \frac{npq}{2}\cdot\frac{1+\delta}{1-\delta}u^2 + nu^3g(u)},$$

where $g(u)$ is a bounded function of $u$, $u$ being contained in a certain interval $(-r, r)$. By substituting

$$u = \frac{v}{\sqrt{B_n}}$$

here, we easily conclude that

$$e^{-\frac{in\bar{p}_nv}{\sqrt{B_n}}}\rho_1^n$$

tends uniformly to the limit $e^{-\frac{v^2}{2}}$ in the interval $-l \le v \le l$, while

$$e^{-\frac{in\bar{p}_nv}{\sqrt{B_n}}}\rho_2^n$$

remains there uniformly bounded. Since, as can easily be seen, $A$ and $B$ can be represented by power series

$$A = 1 + a_1u + a_2u^2 + \cdots, \qquad B = -a_1u - a_2u^2 - \cdots,$$

$A$ tends uniformly to 1 and $B$ tends uniformly to 0. Hence, finally, $\varphi_n(v)$ in any fixed interval $-l \le v \le l$ tends uniformly to $e^{-\frac{v^2}{2}}$. It suffices to apply the fundamental lemma to conclude that the probability of the inequality

$$m - n\bar{p}_n < t_n\sqrt{B_n}$$

tends uniformly to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}du$$

if $t_n$ tends to $t$. Since $B_n$ is asymptotic to $npq\,\dfrac{1+\delta}{1-\delta}$ and $\bar{p}_n$ differs from $p$ by a quantity of the order of $1/n$, the inequality

$$m - np < t\sqrt{npq\,\frac{1+\delta}{1-\delta}}$$

can be written in the form

$$m - n\bar{p}_n < t_n\sqrt{B_n}$$

with $t_n$ tending to $t$; whence, using the above established result, the following theorem due to Markoff can be derived:
Theorem. For a simple chain the probability of the inequalities

$$t_1\sqrt{npq\,\frac{1+\delta}{1-\delta}} < m - np < t_2\sqrt{npq\,\frac{1+\delta}{1-\delta}}$$

tends to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{t_1}^{t_2} e^{-\frac{u^2}{2}}du$$

as $n \to \infty$.
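Markoff's theorem for a simple chain can be checked by direct simulation. In the sketch below the transition probabilities $p' = 0.7$, $p'' = 0.4$, the initial probability, and all tolerances are illustrative assumptions.

```python
import math
import random

def phi(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def chain_freq(n, p1, p_after_e, p_after_f, t, reps, rng):
    """Frequency of  m - n*p < t*sqrt(n*p*q*(1+d)/(1-d))  for a simple
    chain with P(E | E before) = p_after_e, P(E | F before) = p_after_f."""
    d = p_after_e - p_after_f        # delta = p' - p''
    p = p_after_f / (1.0 - d)        # limiting (mean) probability
    q = 1.0 - p
    bound = t * math.sqrt(n * p * q * (1.0 + d) / (1.0 - d))
    hits = 0
    for _ in range(reps):
        occurred = rng.random() < p1
        m = 1 if occurred else 0
        for _ in range(n - 1):
            occurred = rng.random() < (p_after_e if occurred else p_after_f)
            m += 1 if occurred else 0
        if m - n * p < bound:
            hits += 1
    return hits / reps

rng = random.Random(3)
freq = chain_freq(500, 0.5, 0.7, 0.4, 1.0, 4000, rng)
```

The simulated frequency should be close to $\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{1}e^{-u^2/2}du \approx 0.8413$, the dependence between trials being absorbed by the factor $(1+\delta)/(1-\delta)$.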
Example 2. Considering an indefinite series of Bernoullian trials with the probability $p$ for an event $A$ to occur, we can regard pairs of consecutive trials 1 and 2, 2 and 3, 3 and 4, and so on, as forming a new series of trials which may produce an event $E$ consisting of two successive occurrences of $A$ ($E = AA$) or an event $F$ opposite to $E$ ($F = AB$, $BA$, $BB$). With respect to $E$ the trials of the new series are no longer independent. Let $m$ be the number of occurrences of $E$ in $n$ trials. Then

$$E(m - np^2) = 0$$

and

$$B_n = E(m - np^2)^2 = np^2q(1 + 3p) - 2p^3q,$$

as was shown in Chap. XI, Sec. 6. Let $P_{m,n}$ be the probability of exactly $m$ occurrences of $E$ in a series of $n$ trials. Evidently

$$P_{m,n} = A_{m,n} + B_{m,n}$$

where $A_{m,n}$ and $B_{m,n}$ are the probabilities of $m$ occurrences of $E$ when the Bernoullian series of $n + 1$ trials ends with $A$ or $B$, respectively. By an easy application of the theorems of total and compound probabilities we get

$$A_{m,n+1} = A_{m-1,n}\,p + B_{m,n}\,p, \qquad B_{m,n+1} = A_{m,n}\,q + B_{m,n}\,q.$$
Corresponding to these relations the generating functions

$$\theta_n(t) = \sum_{m=0}^{n} A_{m,n}t^m, \qquad \psi_n(t) = \sum_{m=0}^{n} B_{m,n}t^m$$

satisfy the following equations in finite differences:

$$\theta_{n+1} = pt\,\theta_n + p\,\psi_n, \qquad \psi_{n+1} = q\,\theta_n + q\,\psi_n,$$

holding even for $n = 0$ if we set $\theta_0 = p$, $\psi_0 = q$. Hence, it follows that $\theta_n(t)$ and $\psi_n(t)$ satisfy the same equation of the second order

$$\theta_{n+2} - (pt + q)\theta_{n+1} + pq(t - 1)\theta_n = 0, \qquad \psi_{n+2} - (pt + q)\psi_{n+1} + pq(t - 1)\psi_n = 0,$$

and so does their sum

$$\omega_n(t) = \theta_n(t) + \psi_n(t) = \sum_{m=0}^{n} P_{m,n}t^m.$$
Thus, to determine $\omega_n(t)$ we have the equation

$$\omega_{n+2} - (pt + q)\omega_{n+1} + pq(t - 1)\omega_n = 0$$

and the initial conditions

$$\omega_0 = 1, \qquad \omega_1 = 1 - p^2 + p^2t.$$

The general expression of $\omega_n(t)$ is

$$\omega_n(t) = A\rho_1^n + B\rho_2^n = A\rho_1^n + Bp^nq^n(t - 1)^n\rho_1^{-n},$$

where $\rho_1$ and $\rho_2$ are the roots of the equation

$$\rho^2 - (pt + q)\rho + pq(t - 1) = 0$$

and

$$A = \frac{1 + p^2(t - 1) - \rho_2}{\rho_1 - \rho_2}, \qquad B = \frac{\rho_1 - 1 - p^2(t - 1)}{\rho_1 - \rho_2}.$$

If $\rho_1$ is the root which for $t = 1$ reduces to 1, we easily find the following series:

$$\log\rho_1 = p^2(t - 1) + \frac{p^2(2pq - p^2)}{2}(t - 1)^2 + \cdots,$$

or, setting $t = e^{iu}$ and supposing $u$ sufficiently small,

$$\log\rho_1 = ip^2u - \frac{p^2q(1 + 3p)}{2}u^2 + \cdots.$$

As to $A$ and $B$, they can be developed into series of the form

$$A = 1 + cu^2 + \cdots, \qquad B = -cu^2 - \cdots.$$
Hence, reasoning in the same manner as in Example 1, we can conclude that the characteristic function

$$\varphi_n(v) = e^{-\frac{inp^2v}{\sqrt{B_n}}}\,\omega_n\!\left(e^{\frac{iv}{\sqrt{B_n}}}\right)$$

of the variable

$$\frac{m - np^2}{\sqrt{B_n}}$$

tends to the limit $e^{-\frac{v^2}{2}}$ uniformly in any finite and fixed interval $-l \le v \le l$. Referring, finally, to the fundamental lemma, we reach the following conclusion: The probability of the inequalities

$$t_1\sqrt{np^2q(1 + 3p)} < m - np^2 < t_2\sqrt{np^2q(1 + 3p)}$$

tends uniformly (with respect to $t_1$ and $t_2$) to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{t_1}^{t_2} e^{-\frac{u^2}{2}}du$$

as $n \to \infty$.
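The moments used above, $E(m) = np^2$ and $B_n = np^2q(1+3p) - 2p^3q$, can be verified by simulating the overlapping pairs of Bernoullian trials. The parameter choices below are illustrative.

```python
import random

def aa_pair_stats(n_pairs, p, reps, rng):
    """Simulate n_pairs+1 Bernoulli(p) trials, count the pairs AA among
    the n_pairs overlapping pairs, and return the sample mean and
    sample variance of that count."""
    counts = []
    for _ in range(reps):
        trials = [rng.random() < p for _ in range(n_pairs + 1)]
        m = sum(1 for i in range(n_pairs) if trials[i] and trials[i + 1])
        counts.append(m)
    mean = sum(counts) / reps
    var = sum((c - mean) ** 2 for c in counts) / (reps - 1)
    return mean, var

rng = random.Random(4)
n, p = 200, 0.3
q = 1.0 - p
mean, var = aa_pair_stats(n, p, 20000, rng)
theory_mean = n * p * p                                  # = 18
theory_var = n * p * p * q * (1 + 3 * p) - 2 * p ** 3 * q
```

The sample variance should agree with the formula (about 23.9 for these parameters), while the naive independent-trials value $np^2(1-p^2)$ would be noticeably smaller.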
Problems for Solution

1. Consider a series of independent variables $x_1, x_2, x_3, \ldots$ where in general $x_k$ ($k = 1, 2, 3, \ldots$) can have only two values, $k^a$ and $-k^a$, each with the probability $\frac{1}{2}$. Show that the limit theorem holds for the variables thus defined if $a > -\frac{1}{2}$, but the law of large numbers holds only if $a < \frac{1}{2}$.

Solution. Evidently

$$b_k = k^{2a}, \qquad \mu_k^{(3)} = k^{3a}.$$

From Euler's formula (Appendix I) we derive the two asymptotic expressions

$$B_n = 1^{2a} + 2^{2a} + \cdots + n^{2a} \sim \frac{n^{2a+1}}{2a+1}, \qquad \mu_1^{(3)} + \mu_2^{(3)} + \cdots + \mu_n^{(3)} \sim \frac{n^{3a+1}}{3a+1}.$$

Hence

$$\omega_n \sim \frac{(2a+1)^{\frac{3}{2}}}{3a+1}\,n^{-\frac{1}{2}} \to 0,$$

so that the limit theorem holds (for $-\frac{1}{2} < a \le -\frac{1}{3}$, where third moments are not suitable, the same conclusion follows from Liapounoff's condition with a smaller exponent $\delta$). For $a \ge \frac{1}{2}$ the probability of the inequalities

$$-\epsilon < \frac{x_1 + x_2 + \cdots + x_n}{n} < \epsilon$$

tends to a limit less than 1 (for $a = \frac{1}{2}$, to $\frac{2}{\sqrt{2\pi}}\int_0^{\epsilon\sqrt{2}}e^{-\frac{u^2}{2}}du$), and the law of large numbers does not hold.

2. Let $m_i$ be the number of successes in $i$ Bernoullian trials with the probability $p$. Show that the limit theorem holds for the variables

$$s_i = \frac{m_i - ip}{\sqrt{ipq}}, \qquad i = 1, 2, \ldots, n,$$

but the law of large numbers does not hold (Bernstein).

Hint:

$$s_1 + s_2 + \cdots + s_n = (pq)^{-\frac{1}{2}}\left[x_1\left(\frac{1}{\sqrt{1}} + \frac{1}{\sqrt{2}} + \cdots + \frac{1}{\sqrt{n}}\right) + x_2\left(\frac{1}{\sqrt{2}} + \cdots + \frac{1}{\sqrt{n}}\right) + \cdots + x_n\frac{1}{\sqrt{n}}\right],$$

where $x_1, x_2, \ldots, x_n$ are independent variables with the two values $q$ and $-p$, associated in the customary way with trials $1, 2, \ldots, n$.
3. Consider an infinite sequence of independent variables $x_1, x_2, x_3, \ldots$ where $x_k$ can have the three values

$$0, \qquad -(\log k)^{\mu}, \qquad (\log k)^{\mu}$$

with the corresponding probabilities

$$1 - \frac{2}{(k + a)\{\log(k + a)\}^{\rho}}, \qquad \frac{1}{(k + a)\{\log(k + a)\}^{\rho}}, \qquad \frac{1}{(k + a)\{\log(k + a)\}^{\rho}},$$

$a$ being a sufficiently large constant. Moreover, $\mu$ and $\rho$ satisfy the inequality $2\mu - \rho + 1 > 0$. Show (a) that Liapounoff's condition is satisfied when $\rho < 1$ and hence the limit theorem holds; (b) that this condition is not satisfied if $\rho \ge 1$, and the limit theorem fails at least for $\rho > 1$.

Solution. a. By using Euler's formula we find

$$\omega_n \sim \frac{(2\mu - \rho + 1)^{1+\frac{\delta}{2}}}{2^{\frac{\delta}{2}}\left[(2 + \delta)\mu - \rho + 1\right]}\{\log(n + a)\}^{\frac{\delta}{2}(\rho - 1)},$$

which tends to 0 when $\rho < 1$. Hence the first part is answered.

b. The probability of the inequality

$$x_1 + x_2 + \cdots + x_n \ne 0$$

is less than

$$\sum_{k=1}^{\infty} \frac{2}{(k + a)\{\log(k + a)\}^{\rho}},$$

and this, in case $\rho > 1$, is less than

$$\frac{2}{\rho - 1}(\log a)^{1-\rho}.$$

Hence, the probability of the equality

$$x_1 + x_2 + \cdots + x_n = 0$$

remains always greater than

$$1 - \frac{2}{\rho - 1}(\log a)^{1-\rho},$$

and the limit theorem cannot hold. Note that $B_n \to \infty$ because $2\mu - \rho + 1 > 0$.
4. Prove the asymptotic formula

$$1 + \frac{n}{1} + \frac{n^2}{1\cdot2} + \cdots + \frac{n^n}{1\cdot2\cdots n} \sim \frac{e^n}{2},$$

$n$ being a large integer.

Hint: Apply Liapounoff's theorem to $n$ variables distributed according to Poisson's law with parameter 1.

5. By resorting to the fundamental lemma, prove the following theorem due to Markoff: If for a variable $s_n$ with the mean 0 and the standard deviation 1

$$\lim_{n\to\infty} E(s_n^k) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} t^ke^{-\frac{t^2}{2}}dt$$

for any given $k = 3, 4, 5, \ldots$, then the probability of the inequality $s_n < t$ tends to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}du.$$
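The asymptotic formula of Problem 4 can be checked numerically. The sum equals $P(X \le n)$ for a Poisson variable $X$ with parameter $n$; the sketch below sums the terms relative to the largest one (at $k = n$) so that $e^{-n}$ never underflows; the cutoff at $-40$ in log scale is an illustrative choice.

```python
import math

def poisson_cdf_at_mean(n):
    """e^{-n} * (1 + n/1! + ... + n^n/n!) = P(X <= n), X ~ Poisson(n),
    summed relative to the largest term to avoid underflow."""
    log_rel = 0.0          # log(term_k / term_n), starting at k = n
    total = 1.0
    k = n
    while k > 0 and log_rel > -40.0:
        log_rel += math.log(k / n)   # passes from term_k to term_{k-1}
        total += math.exp(log_rel)
        k -= 1
    # term_n = n^n e^{-n} / n!
    log_term_n = n * math.log(n) - n - math.lgamma(n + 1.0)
    return total * math.exp(log_term_n)

r100 = poisson_cdf_at_mean(100)
r10000 = poisson_cdf_at_mean(10000)
```

Both values approach $\frac{1}{2}$, the deviation shrinking like $1/\sqrt{n}$, in agreement with the formula.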
6. In many special cases the limit of the error term can be considerably lower than that given in Sec. 6. For instance, if the variables $x_1, x_2, \ldots, x_n$ are identical and uniformly distributed in the interval $\left(-\frac{1}{2}, \frac{1}{2}\right)$, the probability $F_n(t)$ of the inequality

$$x_1 + x_2 + \cdots + x_n < t\sqrt{\frac{n}{12}}$$

differs from

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}du$$

by less (in absolute value) than

$$\frac{1}{7.5n} + \frac{2}{\pi}e^{-\frac{\pi^2n}{12}} + e^{-\frac{\pi^2n}{24}},$$

the last two terms being completely negligible for somewhat large $n$.

Indication of the Proof. First establish the inequalities

$$e^{-\frac{\varphi^2}{6}-\frac{\varphi^4}{60}} < \frac{\sin\varphi}{\varphi} < e^{-\frac{\varphi^2}{6}}$$

for $0 \le \varphi \le \pi/2$. Further, represent $F_n(t)$ by the usual integral involving the characteristic function, and split it into two integrals taken between 0 and $\pi\sqrt{n}/\sqrt{12}$ and between $\pi\sqrt{n}/\sqrt{12}$ and $+\infty$.

7. Supposing again that $x_1, x_2, \ldots, x_n$ are identical and uniformly distributed in the interval $\left(-\frac{1}{2}, \frac{1}{2}\right)$, prove that for $n \ge 2$

$$E|x_1 + x_2 + \cdots + x_n| = \sqrt{\frac{n}{6\pi}} + \frac{\theta}{60\sqrt{n}}, \qquad 0 < \theta < 1.$$

8. Let $s_n$ be a variable with the mean 0 and the standard deviation 1. If its characteristic function $\varphi_n(t)$ tends to $e^{-\frac{t^2}{2}}$ as $n \to \infty$ uniformly in any finite interval $-l \le t \le l$, show that

$$E|s_n| \to \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}|t|e^{-\frac{t^2}{2}}dt = \sqrt{\frac{2}{\pi}}.$$
9. If independent variables $x_1, x_2, \ldots, x_n$ with means $= 0$ satisfy Liapounoff's condition, prove that

$$E|x_1 + x_2 + \cdots + x_n| \sim \sqrt{\frac{2B_n}{\pi}}.$$
10. Show that for a simple chain of trials

$$E|m - np| \sim \sqrt{\frac{2npq}{\pi}\cdot\frac{1 + \delta}{1 - \delta}},$$

$p$ being the mean probability in an infinite series of trials and $\delta = p' - p''$.
11. A series of dependent trials can be illustrated by the following urn scheme: Two urns, 1 and 2, contain white and black balls in such proportions that the probability of drawing a white ball from urn 1 is $p$, whereas the probability of drawing a white ball from urn 2 is $q = 1 - p$. Whenever a ball taken from an urn is white, the next ball is taken from the same urn; but if it is black, the next ball is drawn from the other urn. The urn at the first drawing is selected by lot, the probabilities of selecting the first or the second urn being given. Evidently the course of trials is determined by these rules without any ambiguity. Let $m$ denote the number of white balls obtained in $n$ drawings, and let

$$a = 1 - 2pq, \qquad L = \frac{2(1 - 3pq)}{1 - 2pq}.$$

Show that the probability of the inequality

$$m - na < t\sqrt{La(1 - a)n}$$

approaches the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}du.$$

Indication of the Proof. Let
$P^{(1)}_{m,n}$, $P^{(2)}_{m,n}$, $P^{(3)}_{m,n}$, $P^{(4)}_{m,n}$ be the probabilities of having $m$ white balls in $n$ trials when (a) the last ball is white and from urn 1; (b) the last ball is white and from urn 2; (c) the last ball is black and from urn 1; and (d) the last ball is black and from urn 2. The sum

$$P_{m,n} = P^{(1)}_{m,n} + P^{(2)}_{m,n} + P^{(3)}_{m,n} + P^{(4)}_{m,n}$$

represents the probability of having exactly $m$ white balls in $n$ trials. The generating functions of the probabilities $P^{(i)}_{m,n}$ satisfy a system of linear relations (for instance, the function attached to case (a) satisfies $\theta^{(1)}_{n+1} = pt\,(\theta^{(1)}_n + \theta^{(4)}_n)$, and similarly for the others), whence it can be shown that they all, as well as their sum, the generating function of $P_{m,n}$, satisfy the same equation of the second order

$$z_{n+2} - tz_{n+1} + pq(t^2 - 1)z_n = 0.$$

Setting $t = e^{iu}$, one of the characteristic roots will be given by

$$\rho_1 = e^{i(1-2pq)u - \frac{4pq(1-3pq)}{2}u^2 + \cdots}$$

for small $u$, while the other root tends to 0 as $u \to 0$. The final conclusion can now be reached in the same way as in Examples 1 and 2, pages 297 and 301.
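The urn scheme of Problem 11 is easy to simulate; the sketch below checks the constants $a = 1 - 2pq$ and $L$ against sample moments. The values of $n$, $p$, the number of repetitions, and the tolerances are illustrative assumptions, and the variance formula is only asymptotic.

```python
import random

def urn_stats(n, p, reps, rng):
    """Simulate the two-urn scheme: white from urn 1 w.p. p, white from
    urn 2 w.p. q = 1 - p; after a white ball stay with the same urn,
    after a black one switch.  Returns sample mean and variance of the
    number m of white balls in n drawings."""
    q = 1.0 - p
    counts = []
    for _ in range(reps):
        urn1 = rng.random() < 0.5        # first urn chosen by lot
        m = 0
        for _ in range(n):
            white = rng.random() < (p if urn1 else q)
            if white:
                m += 1
            else:
                urn1 = not urn1
        counts.append(m)
    mean = sum(counts) / reps
    var = sum((c - mean) ** 2 for c in counts) / (reps - 1)
    return mean, var

rng = random.Random(5)
n, p = 300, 0.7
pq = p * (1.0 - p)
a = 1.0 - 2.0 * pq                       # mean probability of white
L = 2.0 * (1.0 - 3.0 * pq) / (1.0 - 2.0 * pq)
mean, var = urn_stats(n, p, 10000, rng)
theory_mean = n * a
theory_var = n * L * a * (1.0 - a)       # asymptotic variance
```

For $p = 0.7$ this gives $a = 0.58$ and an asymptotic variance near 93, noticeably larger than the independent-trials value $na(1-a) \approx 73$ because successive drawings are positively correlated.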
References

P. LAPLACE: "Théorie analytique des probabilités," Oeuvres VII, pp. 309ff.
S. POISSON: "Recherches sur la probabilité des jugements en matière criminelle et en matière civile," pp. 246ff., Paris, 1837.
P. TSHEBYSHEFF: "Sur deux théorèmes relatifs aux probabilités," Oeuvres 2, pp. 481-491.
A. MARKOFF: The Law of Large Numbers and the Method of the Least Squares (Russian), Proc. Phys.-Math. Soc. Kazan, 1898.
——: Sur les racines de l'équation $e^{x^2}\dfrac{d^ne^{-x^2}}{dx^n} = 0$, Bull. Acad. Sci. St. Pétersbourg, 9, 1898.
——: "Démonstration du second théorème limite du calcul des probabilités par la méthode des moments," St. Pétersbourg, 1913.
——: "Wahrscheinlichkeitsrechnung," Leipzig, 1912.
A. LIAPOUNOFF: Sur une proposition de la théorie des probabilités, Bull. Acad. Sci. St. Pétersbourg, 13, 1900.
——: Nouvelle forme du théorème sur la limite de probabilité, Mém. Acad. Sci. St. Pétersbourg, 12, 1901.
G. PÓLYA: Ueber den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung, Math. Z., 8, 1920.
J. W. LINDEBERG: Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung, Math. Z., 15, 1922.
P. LÉVY: "Calcul des probabilités," Paris, 1925.
S. BERNSTEIN: Sur l'extension du théorème limite du calcul des probabilités aux sommes de quantités dépendantes, Math. Ann., 97, 1927.
A. KOLMOGOROFF: Eine Verallgemeinerung des Laplace-Liapounoffschen Satzes, Bull. Acad. Sci. U.S.S.R., p. 959, 1931.
——: Ueber die Grenzwertsätze der Wahrscheinlichkeitsrechnung, Bull. Acad. Sci. U.S.S.R., p. 363, 1933.
A. KHINTCHINE: "Asymptotische Gesetze der Wahrscheinlichkeitsrechnung," Berlin, 1933.
CHAPTER XV

NORMAL DISTRIBUTION IN TWO DIMENSIONS. LIMIT THEOREM FOR SUMS OF INDEPENDENT VECTORS. ORIGIN OF NORMAL CORRELATION

1. The concept of normal distribution can easily be extended to two and more variables. Since the extension to more than two variables does not involve new ideas, we shall confine ourselves to the case of two-dimensional normal distribution. Two variables, $x, y$, are said to be normally distributed if for them the density of probability has the form $e^{-\varphi}$, where

$$\varphi = ax^2 + 2bxy + cy^2 + 2dx + 2ey + f$$

is a quadratic function of $x, y$ becoming positive and infinitely large together with $|x| + |y|$. This requirement is fulfilled if, and only if,

$$ax^2 + 2bxy + cy^2$$

is a positive quadratic form. The necessary and sufficient conditions for this are:

$$a > 0; \qquad ac - b^2 = \Delta > 0.$$
Since $\Delta > 0$ (even the milder requirement $\Delta \ne 0$ suffices), constants $x_0, y_0$ can be found so that

$$\varphi = a(x - x_0)^2 + 2b(x - x_0)(y - y_0) + c(y - y_0)^2 + g$$

identically in $x, y$. It follows that the density of probability $e^{-\varphi}$ may be presented thus:

$$Ke^{-a(x-x_0)^2 - 2b(x-x_0)(y-y_0) - c(y-y_0)^2}.$$

The expression in the right member depends on six parameters $K$; $a, b, c$; $x_0, y_0$. But the requirement that the total probability be 1 reduces the number of independent parameters to five. We can take $a, b, c$; $x_0, y_0$ for independent parameters and determine $K$ by the condition

$$K\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-a(x-x_0)^2 - 2b(x-x_0)(y-y_0) - c(y-y_0)^2}dx\,dy = 1,$$
which, by introducing the new variables

$$\xi = x - x_0, \qquad \eta = y - y_0,$$

can be exhibited thus:

$$K\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-a\xi^2 - 2b\xi\eta - c\eta^2}d\xi\,d\eta = 1.$$

To evaluate this and similar double integrals we observe that the positive quadratic form $a\xi^2 + 2b\xi\eta + c\eta^2$ can be presented in infinitely many ways as a sum of two squares

$$a\xi^2 + 2b\xi\eta + c\eta^2 = (\alpha\xi + \beta\eta)^2 + (\gamma\xi + \delta\eta)^2,$$

whence

$$\alpha^2 + \gamma^2 = a, \qquad \beta^2 + \delta^2 = c, \qquad \alpha\beta + \gamma\delta = b$$

and

$$(\alpha\delta - \beta\gamma)^2 = ac - b^2 = \Delta.$$

By changing the signs of $\alpha$ and $\beta$ if necessary, we can always suppose

$$\alpha\delta - \beta\gamma = +\sqrt{\Delta}.$$

Now we take

$$u = \alpha\xi + \beta\eta, \qquad v = \gamma\xi + \delta\eta$$

for new variables of integration. Since the Jacobian of $u, v$ with respect to $\xi, \eta$ is $\sqrt{\Delta}$, the Jacobian of $\xi, \eta$ with respect to $u, v$ will be, by the known rules, $1/\sqrt{\Delta}$, and

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-a\xi^2 - 2b\xi\eta - c\eta^2}d\xi\,d\eta = \frac{1}{\sqrt{\Delta}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-u^2 - v^2}du\,dv = \frac{\pi}{\sqrt{\Delta}}.$$

Thus

$$\frac{K\pi}{\sqrt{\Delta}} = 1, \qquad K = \frac{\sqrt{\Delta}}{\pi}.$$
That is, the general expression for the density of probability in two-dimensional normal distribution is

$$\frac{\sqrt{ac - b^2}}{\pi}\,e^{-a(x-x_0)^2 - 2b(x-x_0)(y-y_0) - c(y-y_0)^2}.$$

2. The parameters $x_0, y_0$ represent the mean values of the variables $x, y$. To prove this, let us consider

$$E(x - x_0) = \frac{\sqrt{\Delta}}{\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(x - x_0)\,e^{-a(x-x_0)^2 - 2b(x-x_0)(y-y_0) - c(y-y_0)^2}dx\,dy.$$
To evaluate the double integral, we can express $x$ and $y$ through the new variables $u, v$ introduced in the preceding section. We have

$$x - x_0 = \frac{\delta u - \beta v}{\sqrt{\Delta}}, \qquad y - y_0 = \frac{-\gamma u + \alpha v}{\sqrt{\Delta}},$$

whence

$$E(x - x_0) = \frac{1}{\pi\sqrt{\Delta}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(\delta u - \beta v)\,e^{-u^2 - v^2}du\,dv = 0,$$

so that

$$E(x) = x_0,$$

and similarly $E(y) = y_0$.
3. Having found the meaning of $x_0, y_0$, we may consider instead of $x, y$ the variables $x - x_0$, $y - y_0$, whose mean values $= 0$. Denoting these new variables by $x, y$ again, the expression of the density of probability for $x, y$ will be:

$$\frac{\sqrt{\Delta}}{\pi}\,e^{-ax^2 - 2bxy - cy^2}.$$

It contains only three parameters, $a, b, c$. To find the intrinsic meaning of $a, b, c$, let us consider the mathematical expectation of $(x + \lambda y)^2$, where $\lambda$ is an arbitrary constant. We have

$$E(x + \lambda y)^2 = \frac{\sqrt{\Delta}}{\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(x + \lambda y)^2e^{-ax^2 - 2bxy - cy^2}dx\,dy,$$

or, introducing $u, v$ defined as in Sec. 1 as new variables of integration,

$$E(x + \lambda y)^2 = \frac{(\delta - \lambda\gamma)^2 + (\lambda\alpha - \beta)^2}{2\Delta}.$$

But

$$\alpha^2 + \gamma^2 = a, \qquad \beta^2 + \delta^2 = c, \qquad \alpha\beta + \gamma\delta = b,$$

whence

$$E(x^2) + 2\lambda E(xy) + \lambda^2E(y^2) = \frac{c}{2\Delta} - 2\lambda\frac{b}{2\Delta} + \lambda^2\frac{a}{2\Delta},$$

and since $\lambda$ is arbitrary,

$$E(x^2) = \frac{c}{2\Delta}, \qquad E(xy) = -\frac{b}{2\Delta}, \qquad E(y^2) = \frac{a}{2\Delta}.$$
On the other hand, if $\sigma_1, \sigma_2$, and $r$ are respectively the standard deviations of $x, y$ and their correlation coefficient, we have

$$E(x^2) = \sigma_1^2, \qquad E(y^2) = \sigma_2^2, \qquad E(xy) = r\sigma_1\sigma_2.$$

Hence

$$\sigma_1^2 = \frac{c}{2\Delta}, \qquad \sigma_2^2 = \frac{a}{2\Delta}, \qquad r\sigma_1\sigma_2 = -\frac{b}{2\Delta},$$

and, solving these equations,

$$a = \frac{1}{2\sigma_1^2(1 - r^2)}, \qquad b = -\frac{r}{2\sigma_1\sigma_2(1 - r^2)}, \qquad c = \frac{1}{2\sigma_2^2(1 - r^2)}, \qquad \sqrt{\Delta} = \frac{1}{2\sigma_1\sigma_2\sqrt{1 - r^2}}.$$

With these values for $a$, $b$, $c$, and $\sqrt{\Delta}$ the density of probability can be presented as follows:

$$\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\,e^{-\frac{1}{2(1-r^2)}\left[\left(\frac{x}{\sigma_1}\right)^2 - 2r\frac{x}{\sigma_1}\frac{y}{\sigma_2} + \left(\frac{y}{\sigma_2}\right)^2\right]},$$

and the probability for a point $(x, y)$ to belong to a given domain $D$ will be expressed by the double integral

$$\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\iint_{(D)} e^{-\frac{1}{2(1-r^2)}\left[\left(\frac{x}{\sigma_1}\right)^2 - 2r\frac{x}{\sigma_1}\frac{y}{\sigma_2} + \left(\frac{y}{\sigma_2}\right)^2\right]}dx\,dy$$

extended over $D$.

4. The curves

$$\frac{1}{2(1 - r^2)}\left[\left(\frac{x}{\sigma_1}\right)^2 - 2r\frac{x}{\sigma_1}\frac{y}{\sigma_2} + \left(\frac{y}{\sigma_2}\right)^2\right] = l = \text{const.}$$
are evidently similar and similarly placed ellipses with the common center at the origin. For obvious reasons they are called ellipses of equal probability. The area of the ellipse corresponding to a given value of $l$ (ellipse $l$) is

$$2\pi\sigma_1\sigma_2\sqrt{1 - r^2}\,l,$$

whence the area of an infinitesimal ring between the ellipses $l$ and $l + dl$ has the expression

$$2\pi\sigma_1\sigma_2\sqrt{1 - r^2}\,dl.$$

The infinitesimal probability for a point $(x, y)$ to lie in that ring is expressed by

$$e^{-l}dl.$$

Finally, by integrating this expression between the limits $l_1$ and $l_2 > l_1$, we find

$$e^{-l_1} - e^{-l_2}$$

as the expression of the probability for $(x, y)$ to belong to the ring between the two ellipses $l_1$ and $l_2$. If $l_1 = 0$ and $l_2 = l$,

$$1 - e^{-l}$$

gives the probability for $(x, y)$ to belong to the ellipse $l$. If $n$ numbers $l, l_1, l_2, \ldots, l_{n-1}$ are determined by the conditions

$$1 - e^{-l} = e^{-l} - e^{-l_1} = e^{-l_1} - e^{-l_2} = \cdots = e^{-l_{n-2}} - e^{-l_{n-1}} = \frac{1}{n + 1},$$

the whole plane is divided into $n + 1$ regions of equal probability: namely, the interior of the ellipse $l$, the rings between $l, l_1$; $l_1, l_2$; $\ldots$; $l_{n-2}, l_{n-1}$; and, finally, the part of the plane outside of the ellipse $l_{n-1}$.
namely, the interior of the ellipse l, rings between l, l1; lt, l2; ••• ln2, lnl and, finally, part of the plane outside of the ellipse ln_1• 5. To find the distribution function of the variable x (without any regard toy), we must take forD the domain  oo <X< t;
oo
As the integral
1
f' f,. 2?ru1.yr;=T22f'..
2?r1T11T2� 1
=
e

oo

1_ __ 2{1r')
' • [(!!.. !!. ) 'J .. , ) +(1 r ) ( =.. , ,,.
oo
dxdy
=
.
' e 2"•'dx
f.. oo
·
"'
z!
r e 2<1  '>dz
we see that the probability of the inequality X< t is expressed by
1

IT1
yl2;
f'

e 2
.,.
Similarly, the probability of the inequality y< t
=
' "''
1 1Ttyl2; 
J
e 2.. •'dx�
..
is
1 
IT2�
Thus, if two variables means
313
NORMAL DISTRIBUTION IN TWO DIMENSIONS
SEc. 6)
=
x,
f'
•
e
u 2Viody.
oo
y
are normally distributed with their
0, each one of them taken separately has a normal distribution
of probability with the common mean 0 and the respective standard
y
�r1 and �r2• Variables x and are not independent except when For if they were independent the probability of the point
deviations r = x,
0.
y belonging to an infinitesimal rectangle t <X< t + dt;
would be
whereas it is
e
1
27riTliT2�
2(1
� r•) [ (f.)"2rf, ·;;+(;;)"]dtdT'
and these expressions are different unless r
=
Thus, except for r
0.
= 0,
normally distributed variables are necessarily dependent in the sense of the theory of probability. "correlated variables."
Dependent variables are often called
In particular, variables are said to be in "normal
correlation" when they are normally distributed.
6. The probability of the simultaneous inequalities

$$X < x < X', \qquad y < Y$$

is represented by the repeated integral

$$\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\int_X^{X'}\int_{-\infty}^{Y} e^{-\frac{1}{2(1-r^2)}\left[\left(\frac{x}{\sigma_1}\right)^2 - 2r\frac{x}{\sigma_1}\frac{y}{\sigma_2} + \left(\frac{y}{\sigma_2}\right)^2\right]}dy\,dx,$$

while

$$\frac{1}{\sigma_1\sqrt{2\pi}}\int_X^{X'} e^{-\frac{x^2}{2\sigma_1^2}}dx$$

is the probability that $x$ will be contained between $X$ and $X'$. Hence (Chap. XII, Sec. 10) the ratio

$$\frac{\displaystyle\int_X^{X'} e^{-\frac{x^2}{2\sigma_1^2}}\left\{\frac{1}{\sigma_2\sqrt{2\pi(1 - r^2)}}\int_{-\infty}^{Y} e^{-\frac{1}{2\sigma_2^2(1-r^2)}\left(y - r\frac{\sigma_2}{\sigma_1}x\right)^2}dy\right\}dx}{\displaystyle\int_X^{X'} e^{-\frac{x^2}{2\sigma_1^2}}dx}$$

can be considered as the probability of the inequality

$$y < Y,$$

it being known that $x$ is contained between $X$ and $X'$. Considering $X'$ as variable and letting it converge to $X$, the above ratio evidently tends to the limit

$$\frac{1}{\sigma_2\sqrt{1 - r^2}\sqrt{2\pi}}\int_{-\infty}^{Y} e^{-\frac{1}{2\sigma_2^2(1-r^2)}\left(y - r\frac{\sigma_2}{\sigma_1}X\right)^2}dy,$$

which can be considered as the distribution function of $y$ when $x$ has the fixed value $X$. Hence, for $x = X$, $y$ has a normal distribution with the standard deviation

$$\sigma_2\sqrt{1 - r^2}$$

and the mean

$$\bar{y} = r\frac{\sigma_2}{\sigma_1}X.$$

Interpreted geometrically, this equation represents the so-called "line of regression" of $y$ on $x$. In a similar way, we conclude that for $y = Y$ the distribution of $x$ is normal with the standard deviation

$$\sigma_1\sqrt{1 - r^2}$$

and the mean

$$\bar{x} = r\frac{\sigma_1}{\sigma_2}Y.$$

This equation represents the line of regression of $x$ on $y$.
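The regression line and the conditional standard deviation $\sigma_2\sqrt{1-r^2}$ can be checked by conditioning a sample on a thin band of $x$ values. The band width and all parameter values below are illustrative assumptions, so small biases of the order of the band width remain.

```python
import math
import random

def conditional_stats(s1, s2, r, x_target, band, reps, rng):
    """Sample normally correlated pairs (x, y); among points with x
    within band of x_target, return the sample mean and standard
    deviation of y."""
    ys = []
    c = math.sqrt(1.0 - r * r)
    while len(ys) < reps:
        u = rng.gauss(0.0, 1.0)
        w = rng.gauss(0.0, 1.0)
        x = s1 * u
        y = s2 * (r * u + c * w)
        if abs(x - x_target) < band:
            ys.append(y)
    mean = sum(ys) / len(ys)
    sd = math.sqrt(sum((v - mean) ** 2 for v in ys) / (len(ys) - 1))
    return mean, sd

rng = random.Random(7)
s1, s2, r, X = 1.5, 2.0, 0.8, 1.0
mean, sd = conditional_stats(s1, s2, r, X, 0.05, 3000, rng)
reg_mean = r * (s2 / s1) * X          # line of regression of y on x
reg_sd = s2 * math.sqrt(1.0 - r * r)
```

The conditional mean should track the regression line $r\frac{\sigma_2}{\sigma_1}X$, and the conditional scatter should be smaller than $\sigma_2$ by the factor $\sqrt{1-r^2}$.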
LIMIT THEOREM FOR SUMS OF INDEPENDENT VECTORS

7. So far normal distribution in two dimensions has been considered abstractly, without indication of its natural origin. One-dimensional normal distribution may be considered as a limiting case of probability distributions of sums of independent variables. In the same manner two-dimensional normal distribution, or normal correlation, appears as a limit of probability distributions of sums of independent vectors. Two series of stochastic variables

$$x_1, x_2, \ldots, x_n; \qquad y_1, y_2, \ldots, y_n$$

define $n$ stochastic vectors $v_1, v_2, \ldots, v_n$, so that $x_i, y_i$ represent the components of $v_i$ on two fixed coordinate axes. If

$$E(x_i) = a_i, \qquad E(y_i) = b_i,$$

the vector with the components $a_i$, $b_i$ is called the mean value of $v_i$. Evidently the mean value of

$$v = v_1 + v_2 + \cdots + v_n$$

is represented by the vector $a$ with the components $a_1 + a_2 + \cdots + a_n$ and $b_1 + b_2 + \cdots + b_n$, and that of $v - a$ is a vanishing vector. Without loss of generality we may assume at the outset that

$$E(x_i) = E(y_i) = 0; \qquad i = 1, 2, \ldots, n,$$

in which case $E(v) = 0$. Vectors $v_1, v_2, \ldots, v_n$ are said to be independent if the variables $x_i, y_i$ are independent of the rest of the variables $x_j, y_j$ where $j \ne i$. In what follows we shall deal exclusively with independent vectors.
8. As before, let $x_k, y_k$ be the components of the vector $v_k$ ($k = 1, 2, \ldots, n$). Then

$$X = x_1 + x_2 + \cdots + x_n, \qquad Y = y_1 + y_2 + \cdots + y_n$$

will be the components of the sum

$$v = v_1 + v_2 + \cdots + v_n.$$

If

$$E(x_k) = E(y_k) = 0, \qquad E(x_k^2) = b_k, \qquad E(y_k^2) = c_k, \qquad E(x_ky_k) = d_k,$$

then

$$E(X) = 0, \qquad E(Y) = 0,$$
$$E(X^2) = b_1 + b_2 + \cdots + b_n = B_n,$$
$$E(Y^2) = c_1 + c_2 + \cdots + c_n = C_n,$$
$$E(XY) = d_1 + d_2 + \cdots + d_n = r_n\sqrt{B_n}\sqrt{C_n},$$

because

$$E(x_iy_j) = 0 \quad \text{if} \quad j \ne i,$$

the variables $x_i$ and $y_j$ being independent. Let us introduce instead of the variables $x_k, y_k$ ($k = 1, 2, \ldots, n$) the new variables

$$\xi_k = \frac{x_k}{\sqrt{B_n}}, \qquad \eta_k = \frac{y_k}{\sqrt{C_n}},$$

and correspondingly

$$s = \frac{X}{\sqrt{B_n}}, \qquad \sigma = \frac{Y}{\sqrt{C_n}}$$

instead of $X$, $Y$. We shall have:

$$E(\xi_k) = E(\eta_k) = 0, \qquad E(\xi_k^2) = \frac{b_k}{B_n}, \qquad E(\eta_k^2) = \frac{c_k}{C_n},$$

and

$$E(s) = E(\sigma) = 0, \qquad E(s^2) = E(\sigma^2) = 1, \qquad E(s\sigma) = r_n.$$

The quantity $r_n$, the correlation coefficient of $s$ and $\sigma$, is in absolute value $\le 1$. We define

$$\varphi(u, v) = E\left[e^{i(us + v\sigma)}\right]$$

as the characteristic function of the vector with the components $s$, $\sigma$. Evidently

$$\varphi(u, 0), \qquad \varphi(0, v)$$

are respectively the characteristic functions of $s$ and $\sigma$. Since

$$e^{i(us + v\sigma)} = e^{i(u\xi_1 + v\eta_1)}\cdot e^{i(u\xi_2 + v\eta_2)}\cdots e^{i(u\xi_n + v\eta_n)}$$

and the factors in the right-hand member represent independent variables, we shall have

$$\varphi(u, v) = E\left[e^{i(u\xi_1 + v\eta_1)}\right]\cdot E\left[e^{i(u\xi_2 + v\eta_2)}\right]\cdots E\left[e^{i(u\xi_n + v\eta_n)}\right].$$
9. For what follows it is very important to investigate the behavior of $\varphi(u, v)$ when $n$ increases indefinitely while $u, v$ do not exceed an arbitrary but fixed number $l$ in absolute value. Let

$$f_k = E|x_k|^3, \qquad g_k = E|y_k|^3,$$

and

$$\omega_n = \frac{f_1 + f_2 + \cdots + f_n}{B_n^{\frac{3}{2}}}, \qquad \Omega_n = \frac{g_1 + g_2 + \cdots + g_n}{C_n^{\frac{3}{2}}}.$$

If $\omega_n$ and $\Omega_n$ tend to 0 as $n \to \infty$, we shall have

$$(1)\qquad \varphi(u, v) = e^{-\frac{u^2 + 2r_nuv + v^2}{2}} + \alpha_n(l), \qquad \alpha_n(l) \to 0,$$

provided

$$|u| \le l, \qquad |v| \le l,$$

and $n$ is so large as to make

$$l\left(\omega_n^{\frac{1}{3}} + \Omega_n^{\frac{1}{3}}\right) < 1.$$

Since

$$e^{i(u\xi_k + v\eta_k)} = 1 + i(u\xi_k + v\eta_k) - \frac{(u\xi_k + v\eta_k)^2}{2} + \theta\,\frac{|u\xi_k + v\eta_k|^3}{6}, \qquad |\theta| < 1,$$

we shall have

$$E\left[e^{i(u\xi_k + v\eta_k)}\right] = 1 - \frac{u^2b_k}{2B_n} - \frac{uvd_k}{\sqrt{B_nC_n}} - \frac{v^2c_k}{2C_n} + \frac{\theta'}{6}E|u\xi_k + v\eta_k|^3, \qquad |\theta'| < 1.$$

On the other hand,

$$e^{-\frac{u^2b_k}{2B_n} - \frac{uvd_k}{\sqrt{B_nC_n}} - \frac{v^2c_k}{2C_n}} = 1 - \frac{u^2b_k}{2B_n} - \frac{uvd_k}{\sqrt{B_nC_n}} - \frac{v^2c_k}{2C_n} + \theta''(\cdots), \qquad |\theta''| < 1.$$

Furthermore,

$$E(\xi_k\eta_k) = \frac{d_k}{\sqrt{B_nC_n}}, \qquad |d_k| \le \sqrt{b_kc_k},$$

and

$$\left[E(u\xi_k + v\eta_k)^2\right]^{\frac{3}{2}} \le E|u\xi_k + v\eta_k|^3, \qquad E|u\xi_k + v\eta_k|^3 \le 4l^3\left(\frac{f_k}{B_n^{\frac{3}{2}}} + \frac{g_k}{C_n^{\frac{3}{2}}}\right).$$

Taking into account these various inequalities and multiplying the factors together, we obtain

$$\varphi(u, v) = e^{-\frac{u^2 + 2r_nuv + v^2}{2}} + \alpha_n(l),$$

where $\alpha_n(l)$ tends to 0 uniformly for $|u| \le l$, $|v| \le l$, as was stated.
10. Theorem. Let $P$ denote the probability of the simultaneous inequalities

$$t_0 < s < t_1, \qquad \tau_0 < u < \tau_1.$$

Provided $r_n$ remains less than a fixed number $a < 1$ in absolute value and the above introduced quantities $\omega_n$, $\Omega_n$ tend to $0$ as $n \to \infty$, $P$ can be expressed as

$$P = \frac{1}{2\pi\sqrt{1-r_n^2}}\int_{t_0}^{t_1}\!\!\int_{\tau_0}^{\tau_1} e^{-\frac{t^2-2r_nt\tau+\tau^2}{2(1-r_n^2)}}\,dt\,d\tau + \Delta_n,$$

where $\Delta_n$ tends to $0$ uniformly in $t_0, t_1; \tau_0, \tau_1$. If, in addition, $r_n$ itself tends to a limit $r$ ($|r| < 1$), $P$ will tend uniformly to

$$\frac{1}{2\pi\sqrt{1-r^2}}\int_{t_0}^{t_1}\!\!\int_{\tau_0}^{\tau_1} e^{-\frac{t^2-2rt\tau+\tau^2}{2(1-r^2)}}\,dt\,d\tau.$$

Proof. a. In trying to extend Liapounoff's proof to the present case we introduce an auxiliary quantity $\Pi$ defined as

$$\Pi = E\left[\frac{1}{h^2\pi}\int_{t_0}^{t_1} e^{-\left(\frac{u-s}{h}\right)^2}du\int_{\tau_0}^{\tau_1} e^{-\left(\frac{v-u}{h}\right)^2}dv\right].$$

Using the inequality

$$\left|\frac{1}{\sqrt{\pi}}\int_x^{\infty} e^{-t^2}\,dt\right| < \frac{e^{-x^2}}{2} \qquad \text{for } x > 0,$$

one can easily derive the following inequalities:

(2) $$\frac{1}{h^2\pi}\int_{t_0}^{t_1} e^{-\left(\frac{u-s}{h}\right)^2}du\int_{\tau_0}^{\tau_1} e^{-\left(\frac{v-u}{h}\right)^2}dv > 1 - \frac{1}{2}\left(e^{-\left(\frac{t_1-s}{h}\right)^2} + e^{-\left(\frac{s-t_0}{h}\right)^2} + e^{-\left(\frac{\tau_1-u}{h}\right)^2} + e^{-\left(\frac{u-\tau_0}{h}\right)^2}\right)$$

if

(3) $$t_0 < s < t_1, \qquad \tau_0 < u < \tau_1,$$

and

(4) $$\frac{1}{h^2\pi}\int_{t_0}^{t_1} e^{-\left(\frac{u-s}{h}\right)^2}du\int_{\tau_0}^{\tau_1} e^{-\left(\frac{v-u}{h}\right)^2}dv < \frac{1}{2}\left(e^{-\left(\frac{t_1-s}{h}\right)^2} + e^{-\left(\frac{s-t_0}{h}\right)^2} + e^{-\left(\frac{\tau_1-u}{h}\right)^2} + e^{-\left(\frac{u-\tau_0}{h}\right)^2}\right)$$

if at least one of the inequalities (3) is not fulfilled. From the definitions of $\Pi$, $P$ and from (2) and (4) it follows that

$$|P - \Pi| < \frac{1}{2}E\left(e^{-\left(\frac{t_1-s}{h}\right)^2} + e^{-\left(\frac{s-t_0}{h}\right)^2} + e^{-\left(\frac{\tau_1-u}{h}\right)^2} + e^{-\left(\frac{u-\tau_0}{h}\right)^2}\right).$$

But referring to (1), we have by virtue of the developments in Chap. XIV, Sec. 3,

(5) $$|P - \Pi| < 2\alpha_n(l) + h\sqrt{2} + \frac{8}{hl\sqrt{\pi}}\,e^{-\left(\frac{hl}{2}\right)^2}.$$

b. Replacing $t_1, \tau_1$ by variable quantities $t$ and $\tau$ and taking the second derivative of $\Pi$ with respect to $t, \tau$, we get

$$\frac{d^2\Pi}{dt\,d\tau} = \frac{1}{h^2\pi}E\,e^{-\left(\frac{t-s}{h}\right)^2-\left(\frac{\tau-u}{h}\right)^2}.$$

On the other hand,

$$e^{-\left(\frac{t-s}{h}\right)^2-\left(\frac{\tau-u}{h}\right)^2} = \frac{h^2}{4\pi}\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} e^{-\frac{h^2}{4}(u^2+v^2)}\,e^{i(su+uv)}\,e^{-i(tu+\tau v)}\,du\,dv,$$

whence

(6) $$\frac{d^2\Pi}{dt\,d\tau} = \frac{1}{4\pi^2}\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} e^{-\frac{h^2}{4}(u^2+v^2)}\,e^{-i(tu+\tau v)}\,\varphi(u, v)\,du\,dv.$$

Here we substitute

$$\varphi(u, v) = e^{-\frac{1}{2}(u^2+2r_nuv+v^2)} + g(u, v).$$

For all real $u, v$ we have $|g(u, v)| \leq 2$; if $|u| \leq l$, $|v| \leq l$, where $l$ is an arbitrarily fixed number, and $n$ is large enough, we have $|g(u, v)| \leq \alpha_n(l)$. Hence, the double integral

$$\frac{1}{4\pi^2}\int\!\!\int e^{-\frac{h^2}{4}(u^2+v^2)}\,e^{-i(tu+\tau v)}\,g(u, v)\,du\,dv$$

extended over the region outside of the square $|u| \leq l$, $|v| \leq l$ is less than

$$\frac{2}{\pi h^2}\,e^{-\frac{(hl)^2}{4}}$$

in absolute value. The same double integral extended over the square $|u| \leq l$, $|v| \leq l$ is less than

$$\frac{l^2\alpha_n(l)}{\pi^2}$$

in absolute value. Thus, referring to (6), and replacing the factor $e^{-\frac{h^2}{4}(u^2+v^2)}$ by $1$ in the principal term (which introduces an error accounted for in $R'$ below),

$$\frac{d^2\Pi}{dt\,d\tau} = \frac{1}{4\pi^2}\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} e^{-\frac{1}{2}(u^2+2r_nuv+v^2)}\,e^{-i(tu+\tau v)}\,du\,dv + R',$$

where

$$|R'| < \frac{l^2\alpha_n(l)}{\pi^2} + \frac{2}{\pi h^2}\,e^{-\frac{(hl)^2}{4}} + \frac{h^2}{4\pi(1-a^2)^2}.$$

By transformation to the new variables

$$\xi = u + r_nv, \qquad \eta = v\sqrt{1-r_n^2},$$

the foregoing double integral becomes

$$\frac{1}{4\pi^2\sqrt{1-r_n^2}}\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} e^{-\frac{1}{2}(\xi^2+\eta^2)}\,e^{-i\left(t\xi+\frac{\tau-r_nt}{\sqrt{1-r_n^2}}\eta\right)}\,d\xi\,d\eta = \frac{1}{2\pi\sqrt{1-r_n^2}}\,e^{-\frac{t^2-2r_nt\tau+\tau^2}{2(1-r_n^2)}},$$

so that finally

$$\frac{d^2\Pi}{dt\,d\tau} = \frac{1}{2\pi\sqrt{1-r_n^2}}\,e^{-\frac{t^2-2r_nt\tau+\tau^2}{2(1-r_n^2)}} + R'.$$

Integrating this expression with respect to $t$ and $\tau$ between the limits $t_0, t_1$ and $\tau_0, \tau_1$, we get:

(7) $$\Pi = \frac{1}{2\pi\sqrt{1-r_n^2}}\int_{t_0}^{t_1}\!\!\int_{\tau_0}^{\tau_1} e^{-\frac{t^2-2r_nt\tau+\tau^2}{2(1-r_n^2)}}\,dt\,d\tau + \Delta',$$

where

(8) $$|\Delta'| < (t_1-t_0)(\tau_1-\tau_0)\left[\frac{l^2\alpha_n(l)}{\pi^2} + \frac{2}{\pi h^2}\,e^{-\frac{(hl)^2}{4}} + \frac{h^2}{4\pi(1-a^2)^2}\right].$$

Hence, combining inequality (5) with (7) and (8),

$$P = \frac{1}{2\pi\sqrt{1-r_n^2}}\int_{t_0}^{t_1}\!\!\int_{\tau_0}^{\tau_1} e^{-\frac{t^2-2r_nt\tau+\tau^2}{2(1-r_n^2)}}\,dt\,d\tau + \Delta_n,$$

where

$$|\Delta_n| < \left[2 + \frac{l^2}{\pi^2}(t_1-t_0)(\tau_1-\tau_0)\right]\alpha_n(l) + h\sqrt{2} + \frac{8}{hl\sqrt{\pi}}\,e^{-\left(\frac{hl}{2}\right)^2} + (t_1-t_0)(\tau_1-\tau_0)\left[\frac{2}{\pi h^2}\,e^{-\frac{(hl)^2}{4}} + \frac{h^2}{4\pi(1-a^2)^2}\right].$$

Considering $t_0, t_1; \tau_0, \tau_1$ as variable and denoting an arbitrarily large number by $L$, we shall assume at first that the rectangle $D$:

$$t_0 < t < t_1, \qquad \tau_0 < \tau < \tau_1$$

is completely contained in the square $Q$: $|t| \leq L$, $|\tau| \leq L$. Then, taking

$$h = l^{-\frac{1}{2}},$$

we shall have

$$|\Delta_n| < \left(2 + \frac{4L^2l^2}{\pi^2}\right)\alpha_n(l) + \sqrt{\frac{2}{l}} + \frac{8}{\sqrt{\pi l}}\,e^{-\frac{l}{4}} + \frac{8L^2l}{\pi}\,e^{-\frac{l}{4}} + \frac{L^2}{\pi l(1-a^2)^2}.$$

Given an arbitrary positive number $\epsilon$, we take $l$ so large as to have

$$\sqrt{\frac{2}{l}} + \frac{8}{\sqrt{\pi l}}\,e^{-\frac{l}{4}} + \frac{8L^2l}{\pi}\,e^{-\frac{l}{4}} + \frac{L^2}{\pi l(1-a^2)^2} < \frac{\epsilon}{2}.$$

After that, since $\alpha_n(l) \to 0$ as $n \to \infty$ (for a fixed $l$), we can find a number $n_0(\epsilon)$ so that

$$\left(2 + \frac{4L^2l^2}{\pi^2}\right)\alpha_n(l) < \frac{\epsilon}{2} \qquad \text{for } n > n_0(\epsilon).$$

Finally, we shall have

$$|\Delta_n| < \epsilon \qquad \text{for } n > n_0(\epsilon);$$

that is, $\Delta_n$ tends to $0$ uniformly in any rectangle $D$ contained in the square $Q$ with an arbitrarily large side $2L$.

c. To prove that $\Delta_n$ tends to $0$ uniformly no matter what $t_0, t_1; \tau_0, \tau_1$ are, we observe that the integral

$$\frac{1}{2\pi\sqrt{1-r_n^2}}\int\!\!\int e^{-\frac{t^2-2r_nt\tau+\tau^2}{2(1-r_n^2)}}\,dt\,d\tau$$

extended over the area outside of $Q$ becomes infinitesimal as $L \to \infty$. Accordingly, we take $L$ so large as to make this integral $< \epsilon/2$ (no matter what $n$ is) and in addition to have $L^{-1} < \epsilon/4$. The number $L$ selected according to these requirements will be kept fixed. Let $D'$ represent that part of $D$ which is inside $Q$, the remaining part or parts (if there are any) being $D''$. Let $P'$ and $P''$ denote the probabilities that the point $s, u$ shall be contained in $D'$ or $D''$, respectively, and let $J'$ and $J''$ be the integrals

$$\frac{1}{2\pi\sqrt{1-r_n^2}}\int\!\!\int e^{-\frac{t^2-2r_nt\tau+\tau^2}{2(1-r_n^2)}}\,dt\,d\tau$$

extended over $D'$ and $D''$, respectively. By what has been proved, given $\epsilon > 0$, a number $n_0(\epsilon)$ can be found so that

$$|P' - J'| < \epsilon \qquad \text{for } n > n_0(\epsilon).$$

Now

$$P = P' + P'', \qquad J = J' + J'',$$

whence

$$|P - J| < \epsilon + P'' + J'' \qquad \text{for } n > n_0(\epsilon).$$

Since by Tshebysheff's lemma (Chap. X, Sec. 1) the probability of either one of the inequalities

$$|s| > L, \qquad |u| > L$$

is less than $1/L$, we shall have

$$P'' < \frac{2}{L} < \frac{\epsilon}{2}.$$

Also

$$J'' < \frac{\epsilon}{2},$$

whence

$$|P - J| < 2\epsilon \qquad \text{for } n > n_0(\epsilon);$$

that is, the difference $P - J$ tends to $0$ uniformly, no matter what $t_0, t_1; \tau_0, \tau_1$ are. Finally, the last statement of the theorem appears as almost evident and does not require an elaborate proof.
11. The theorem just proved concerns the asymptotic behavior of the probability $P$ of the simultaneous inequalities

$$t_0 < s < t_1, \qquad \tau_0 < u < \tau_1,$$

which, due to the definition of $s$ and $u$, are equivalent to the inequalities

$$t_0\sqrt{B_n} \leq x_1 + x_2 + \cdots + x_n < t_1\sqrt{B_n},$$
$$\tau_0\sqrt{C_n} \leq y_1 + y_2 + \cdots + y_n < \tau_1\sqrt{C_n}.$$

From the geometrical standpoint the above domain of $s, u$ is a rectangle. But the theorem can be extended to the case of any given domain $R$ for the point $s, u$. It is hardly necessary to enter into the details of the proof, based on the definition of a double integral. It suffices to state the theorem itself:

Fundamental Theorem. The probability for the point $(s, u)$ to be located in a given domain $R$ can be represented, for large $n$, by the integral

$$\frac{1}{2\pi\sqrt{1-r_n^2}}\int\!\!\int_R e^{-\frac{t^2-2r_nt\tau+\tau^2}{2(1-r_n^2)}}\,dt\,d\tau$$

extended over $R$, with an error which tends uniformly to $0$ as $n$ becomes infinite, provided $\omega_n \to 0$, $\Omega_n \to 0$, while for all $n$

$$|r_n| \leq a < 1.$$

In less precise terms we may say that under very general conditions the probability distribution of the components of a vector which is the sum of a great many independent vectors will be nearly normal. The first rigorous proof of the limit theorem for sums of independent vectors was published by S. Bernstein in 1926. Like the proof developed here, it proceeds on the same lines as Liapounoff's proof for sums of independent variables. Moreover, Bernstein has shown that the limit theorem may hold even in the case of dependent vectors when certain additional conditions are fulfilled.
12. A good illustration of the fundamental theorem is afforded by series of independent trials with three alternatives, $E, F, G$. For the sake of simplicity we shall assume that the probabilities of $E, F, G$ are $p, q, r$ in all trials. Naturally

$$p + q + r = 1.$$

In the usual way, we associate with these trials triads of variables $x_i, y_i, z_i$ ($i = 1, 2, 3, \ldots$) so that

$x_i = 1$ or $0$ according as $E$ occurs or fails at the $i$th trial;
$y_i = 1$ or $0$ according as $F$ occurs or fails at the $i$th trial;
$z_i = 1$ or $0$ according as $G$ occurs or fails at the $i$th trial.

Evidently

$$E(x_i) = E(x_i^2) = p, \qquad E(y_i) = E(y_i^2) = q,$$

so that the vectors $v_i$ with the components

$$\xi_i = x_i - p, \qquad \eta_i = y_i - q$$

have their means $= 0$. The independence of the trials involves the independence of the vectors $v_1, v_2, \ldots, v_n$. Hence we can apply the preceding considerations to the vector

$$v = v_1 + v_2 + \cdots + v_n$$

with the components

$$X = \xi_1 + \xi_2 + \cdots + \xi_n, \qquad Y = \eta_1 + \eta_2 + \cdots + \eta_n.$$

We have

$$B_n = E(X^2) = np(1-p); \qquad C_n = E(Y^2) = nq(1-q).$$

Moreover, since $E$ and $F$ cannot occur together, $E(x_iy_i) = 0$, so that

$$E(\xi_i\eta_i) = E(x_iy_i) - pq = -pq$$

and

$$E(XY) = r_n\sqrt{B_n}\sqrt{C_n} = -npq,$$

whence

$$r_n = -\sqrt{\frac{pq}{(1-p)(1-q)}}.$$

The quantities denoted by $f_k$, $g_k$ in Sec. 9 are in our case

$$f_k = E|\xi_k|^3 = p(1-p)^3 + (1-p)p^3, \qquad g_k = E|\eta_k|^3 = q(1-q)^3 + (1-q)q^3.$$

Hence

$$\omega_n = \frac{p(1-p)^3 + (1-p)p^3}{n^{\frac{1}{2}}p^{\frac{3}{2}}(1-p)^{\frac{3}{2}}}, \qquad \Omega_n = \frac{q(1-q)^3 + (1-q)q^3}{n^{\frac{1}{2}}q^{\frac{3}{2}}(1-q)^{\frac{3}{2}}},$$

and the conditions $\omega_n \to 0$, $\Omega_n \to 0$ are satisfied. The fundamental theorem, therefore, can be applied.
If $k, l, m$ are the respective frequencies of the events $E, F, G$ in $n$ trials, the quantities $X$ and $Y$ represent the discrepancies

$$\lambda = k - np, \qquad \mu = l - nq.$$

Introducing the third discrepancy

$$\nu = m - nr,$$

we shall have

$$\lambda + \mu + \nu = 0,$$

so that $\nu$ is determined when $\lambda$ and $\mu$ are given. The last two quantities, however, may have various values depending on chance. Concerning them the following statement follows from the fundamental theorem:

Theorem. The probability that the discrepancies $\lambda, \mu$ in $n$ trials shall simultaneously satisfy the inequalities

$$\alpha_0\sqrt{n} < \lambda < \alpha_1\sqrt{n}, \qquad \beta_0\sqrt{n} < \mu < \beta_1\sqrt{n}$$

tends uniformly, with indefinitely increasing $n$, to the limit

$$\frac{1}{2\pi\sqrt{pqr}}\int_{\alpha_0}^{\alpha_1}\!\!\int_{\beta_0}^{\beta_1} e^{-\frac{1}{2}\left(\frac{\alpha^2}{p}+\frac{\beta^2}{q}+\frac{\gamma^2}{r}\right)}\,d\alpha\,d\beta,$$

where, to have symmetrical notation, $\gamma$ is a variable defined by

$$\alpha + \beta + \gamma = 0.$$

On account of symmetry, perfectly similar statements can be made in regard to any pair of the discrepancies $\lambda, \mu, \nu$.
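The negative correlation $r_n = -\sqrt{pq/((1-p)(1-q))}$ of the discrepancies is easy to confirm empirically. A minimal Python sketch (the values of $p$, $q$, the seed, and the repetition counts are arbitrary choices for the illustration):

```python
import math
import random

random.seed(2)
p, q = 0.2, 0.3          # probabilities of E and F; G has probability 0.5
n, reps = 500, 3000
lams, mus = [], []
for _ in range(reps):
    k = l = 0
    for _ in range(n):
        x = random.random()
        if x < p:
            k += 1       # event E occurred
        elif x < p + q:
            l += 1       # event F occurred
    lams.append(k - n * p)   # discrepancy lambda
    mus.append(l - n * q)    # discrepancy mu

mean_l = sum(lams) / reps
mean_m = sum(mus) / reps
cov = sum((a - mean_l) * (b - mean_m) for a, b in zip(lams, mus)) / reps
var_l = sum((a - mean_l) ** 2 for a in lams) / reps
var_m = sum((b - mean_m) ** 2 for b in mus) / reps
r_emp = cov / math.sqrt(var_l * var_m)
r_theory = -math.sqrt(p * q / ((1 - p) * (1 - q)))
print(round(r_emp, 2), round(r_theory, 2))
```

The empirical correlation of $\lambda$ and $\mu$ reproduces the theoretical value within sampling error.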
Since the fundamental theorem and its proof can be extended without any difficulty to vectors of more than two dimensions, we shall have, in the case of trials with more than three alternatives, a result perfectly analogous to the last theorem.

Theorem. Each of $n$ independent trials admits of $k$ alternatives $E_1, E_2, \ldots, E_k$, the probabilities and the frequencies of which respectively are $p_1, p_2, \ldots, p_k$ and $m_1, m_2, \ldots, m_k$. The probability that the discrepancies $m_i - np_i$ ($i = 1, 2, \ldots, k-1$) should satisfy simultaneously the inequalities

$$\alpha_i\sqrt{n} < m_i - np_i < \beta_i\sqrt{n}$$

tends uniformly, with indefinitely increasing $n$, to the limit

$$\frac{1}{(2\pi)^{\frac{k-1}{2}}\sqrt{p_1p_2\cdots p_k}}\int_{\alpha_1}^{\beta_1}\!\!\cdots\int_{\alpha_{k-1}}^{\beta_{k-1}} e^{-\frac{1}{2}\sum_{i=1}^{k}\frac{t_i^2}{p_i}}\,dt_1\,dt_2\cdots dt_{k-1},$$

where

$$t_k = -(t_1 + t_2 + \cdots + t_{k-1}).$$

From this theorem, by resorting to the definition of a multiple integral, we may deduce an important corollary: Let $P_n$ denote the probability of the inequality

$$\frac{(m_1-np_1)^2}{np_1} + \frac{(m_2-np_2)^2}{np_2} + \cdots + \frac{(m_k-np_k)^2}{np_k} \leq \chi^2.$$

Then, as $n$ tends to infinity, $P_n$ tends to the limit

$$\frac{1}{(2\pi)^{\frac{k-1}{2}}\sqrt{p_1p_2\cdots p_k}}\int\cdots\int e^{-\frac{1}{2}\left(\frac{t_1^2}{p_1}+\frac{t_2^2}{p_2}+\cdots+\frac{t_k^2}{p_k}\right)}\,dt_1\,dt_2\cdots dt_{k-1},$$

where the integration is extended over the $(k-1)$-dimensional ellipsoid

$$\frac{t_1^2}{p_1} + \frac{t_2^2}{p_2} + \cdots + \frac{t_k^2}{p_k} \leq \chi^2.$$

It is easy to see that the determinant of the quadratic form in $(k-1)$ variables is $(p_1p_2\cdots p_k)^{-1}$. Hence, by a proper linear transformation the above integral reduces to

$$\frac{1}{(2\pi)^{\frac{k-1}{2}}}\int\cdots\int e^{-\frac{1}{2}(v_1^2+v_2^2+\cdots+v_{k-1}^2)}\,dv_1\,dv_2\cdots dv_{k-1},$$

the domain of integration being

$$v_1^2 + v_2^2 + \cdots + v_{k-1}^2 \leq \chi^2.$$

But this multiple integral, as will be shown in Chap. XVI, Sec. 1, can be reduced to a simple integral. Thus

$$\lim P_n = \frac{1}{2^{\frac{k-3}{2}}\Gamma\left(\frac{k-1}{2}\right)}\int_0^{\chi} e^{-\frac{u^2}{2}}u^{k-2}\,du.$$

The probability $Q_n = 1 - P_n$ of the opposite inequality

(A) $$\frac{(m_1-np_1)^2}{np_1} + \frac{(m_2-np_2)^2}{np_2} + \cdots + \frac{(m_k-np_k)^2}{np_k} > \chi^2$$

tends to the limit

$$\frac{1}{2^{\frac{k-3}{2}}\Gamma\left(\frac{k-1}{2}\right)}\int_{\chi}^{\infty} e^{-\frac{u^2}{2}}u^{k-2}\,du,$$

and for large $n$ we have the approximate formula

$$Q_n \approx \frac{1}{2^{\frac{k-3}{2}}\Gamma\left(\frac{k-1}{2}\right)}\int_{\chi}^{\infty} e^{-\frac{u^2}{2}}u^{k-2}\,du,$$

but the degree of approximation remains unknown. In practice, to test whether the observed deviations of frequencies from their expected values are significant, the value of the sum (A), say $\chi^2$, is found; then by the above approximate formula the probability that the sum (A) will be greater than $\chi^2$ is computed. If this probability is very small, then the obtained system of deviations is significantly different from what could be expected as a result of chance alone. The lack of information as to the error incurred by using an approximate expression of $Q_n$ renders the application of this "$\chi^2$-test" devised by Pearson somewhat dubious.
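The $\chi^2$-test computation can be sketched numerically with the standard library alone. In the following Python illustration the observed counts are a made-up die experiment, and the tail probability $Q_n$ is obtained by direct numerical integration of the limiting density above:

```python
import math

def chi_tail(chi, k, steps=20000, upper=40.0):
    """Limit of Q_n: integral of the chi density with k-1 d.f. from chi upward."""
    norm = 2 ** ((k - 3) / 2) * math.gamma((k - 1) / 2)
    h = (upper - chi) / steps
    total = 0.0
    for i in range(steps):
        u = chi + (i + 0.5) * h
        total += math.exp(-u * u / 2) * u ** (k - 2) * h
    return total / norm

# hypothetical experiment: 120 throws of a die, observed face counts
observed = [25, 17, 15, 23, 24, 16]
n, k = sum(observed), len(observed)
p = 1 / 6
chi2 = sum((m - n * p) ** 2 / (n * p) for m in observed)   # the sum (A)
Q = chi_tail(math.sqrt(chi2), k)
print(round(chi2, 2), round(Q, 3))
```

A large value of $Q$ (as here) indicates deviations compatible with chance alone.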
HYPOTHETICAL EXPLANATION OF EMPIRICALLY VERIFIED CASES OF NORMAL CORRELATION

13. Normal distribution in two dimensions plays an important part in target practice. It is generally assumed, on the basis of varied evidence collected in actual target practice, that the points of a target hit by projectiles are scattered in a manner suggesting normal distribution. By referring the points hit by projectiles to a fixed coordinate system on the target, it is possible from their coordinates to find approximately (provided the number of shots is large) the elements of the ellipses of equal probability. Dividing the surface of the target into regions of equal probability as described in Sec. 4, and counting the actual number of hits in each region, the resulting numbers in many reported instances are nearly equal. That, and the agreement with other criteria, are generally considered as evidence in favor of assuming the probability in target practice to be normally distributed.

Two-dimensional normal distribution, or normal correlation, has been found to exist between measurable attributes, such as the length of the body and the weight of living organisms. Attributes like the statures of parents and their descendants, according to Galton, again show evidence of normal correlation.

Facing such a variety of facts pointing to the existence of normal correlation, one is tempted to account for it by some more or less plausible hypothesis. It is generally assumed that the deviations of two magnitudes from their mean values are caused by the combined action of a great many independent causes, each affecting both magnitudes in a very small degree. Clearly, the resulting deviations under such circumstances may be regarded as components of the sum of a great many independent vectors.
is made to the fundamental theorem in Sec. 11. Problems for Solution 1. Let p denote the probability that two normally distributed variables (with means
=
O) will have values of opposite signs.
lation coefficient
r 2. Variables
Show that between p and the corre
r the following relation holds:
x,
=
cos p7r.
y (with the means O) are normally distributed. x, y to be located in an ellipse =
Show that the
probability for the point
z2
x
y
y2
.,.1
1Tt
.,.2
2 2r+2
=
l
is greater than the probability corresponding to any other domain of the same area.
3. Three dice colored in white, red, and blue are tossed simultaneously n times. Let X and Y represent the total number of points on pairs: white, red and white, blue. Show that the probability of simultaneous inequalities
7n + toy'ii;, <X< 7n + ttV¥n; tends to the limit
asn> oo.
7n + ToVij.;; < Y < 7n + ny'ij;,
NORMAL DISTRIBUTION IN TWO DIMENSIONS 4. Three dice, white, red, and blue, are tossed simultaneously
n times.
329 If k and
l
are frequencies of 10 points on pairs: white, red; red, blue; show that the probability of simultaneous inequalities
n
+
12
{11n < l < n + / 11 n "�144 12 "''\Jl44
tends to the limit
asn+ oo,
5. Two players, A and B, take part in a game arranged as follows: Each ti� e one
ball is taken from an urn containing 8 white, 6 black, and 1 red ball; if this ball is white, black,
A and B both gain $1; A loses $2, B loses $4;
$4, B gains $16. Let a,. and a,. be the sums gained by A and B after n games. A gains
red,
Show that the probability
of simultaneous inequalities
TO
for very large
�
n will be approximately equal to
Note that the probability of the inequality
s,.a,. < 0 is about 0.13not very small
so that it is not very unlikely that the luck will be with one player and against another.
6. Concentric circles C,, C1 , Ca, ...in unlimited numbers are described about Points P,, P1, Pa, . ..are taken at random on these circles. Let R be the end point of the vector representing the sum of vectors OP1, OP1, OPa, H r,, r1, ra, ...are radii of C,, Ct, Ca, and the condition the origin 0.
•
.,a 1
+ .,a+ I
(r� + r: +
•
. , , +.,a " + r!)il
·
•
•
as
+0
·
n+
oo
is fulfilled, show that the probability that R will lie within the circle described with the radius
p
for large
about the origin will be very nearly equal to
n.
1

e rt•+r•'+
pi ·
·
·
+""'
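The relation of Problem 1 admits a quick numerical check. A Python sketch (the chosen correlation, seed, and sample size are arbitrary): simulate two correlated normal variables, estimate the probability $p$ of opposite signs, and verify that $\cos p\pi$ recovers $r$.

```python
import math
import random

random.seed(3)
r = 0.6                       # correlation of the two normal variables
N = 200_000
opposite = 0
for _ in range(N):
    g1, g2 = random.gauss(0, 1), random.gauss(0, 1)
    x = g1
    y = r * g1 + math.sqrt(1 - r * r) * g2   # y is normal, correlated with x
    if x * y < 0:
        opposite += 1
p = opposite / N              # estimated probability of opposite signs
print(round(math.cos(p * math.pi), 2), r)
```

The recovered value of $\cos p\pi$ matches $r$ within sampling error.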
References

A. Bravais: Analyse mathématique sur les probabilités des erreurs de situation d'un point, Mém. présentés par div. sav. à l'Acad. Roy. Sci. Inst. France, 9, 1846.
C. M. Schols: Théorie des erreurs dans le plan et dans l'espace, Ann. de l'École polyt. de Delft, 2, 1886.
K. Pearson: Mathematical Contributions to the Theory of Evolution, Philos. Trans., 187A, 1896.
S. Bernstein: Sur l'extension du théorème limite du calcul des probabilités aux sommes de quantités dépendantes, Math. Ann., 97, 1927.
A. Khintchine: Begründung der Normalkorrelation nach der Lindebergschen Methode, Nachr. Ver. Forschungsinstitute Univ. Moskau, 1, 1928.
G. Castelnuovo: Calcolo delle probabilità, vol. 2, Bologna, 1928.
V. Romanovsky: Sur une extension du théorème de Liapounoff sur la limite de probabilité, Bull. l'Acad. Sci. U.S.S.R., No. 2, 1929.
CHAPTER XVI

DISTRIBUTION OF CERTAIN FUNCTIONS OF NORMALLY DISTRIBUTED VARIABLES

1. In modern statistics much emphasis is laid upon distributions of certain functions involving normally distributed variables. Such distributions are considered as a basis for various "tests of significance" for small samples, that is, when the number of observed data is small. Some of the most important cases of this kind will be considered in this chapter.

Problem 1. Independent variables $x_1, x_2, \ldots, x_n$ are normally distributed about their common mean $0$ with the same standard deviation $\sigma$. Find the distribution function of the sum of their squares

$$s = x_1^2 + x_2^2 + \cdots + x_n^2.$$

Solution. The inequality $x_k^2 < t$ being equivalent to $-\sqrt{t} < x_k < \sqrt{t}$, the distribution function of $x_k^2$ is

$$F_k(t) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\sqrt{t}}^{\sqrt{t}} e^{-\frac{x^2}{2\sigma^2}}\,dx = \frac{1}{\sigma\sqrt{2\pi}}\int_0^{t} e^{-\frac{u}{2\sigma^2}}u^{-\frac{1}{2}}\,du \qquad \text{for } t \geq 0,$$

and $F_k(t) = 0$ for $t < 0$. Hence, the characteristic function of any one of the variables $x_k^2$ is

$$\varphi_k(t) = \frac{1}{\sigma\sqrt{2\pi}}\int_0^{\infty} e^{-\left(\frac{1}{2\sigma^2}-it\right)u}u^{-\frac{1}{2}}\,du = \frac{1}{\sigma\sqrt{2}}\left(\frac{1}{2\sigma^2}-it\right)^{-\frac{1}{2}},$$

and that of their sum is

$$\varphi(t) = \frac{1}{(\sigma\sqrt{2})^n}\left(\frac{1}{2\sigma^2}-it\right)^{-\frac{n}{2}}.$$

Consequently, the distribution function of $s$ is expressed by the corresponding inversion integral,
and it remains to transform this integral. To this end, imagine a variable distributed over the interval $(0, +\infty)$ with the density

$$\frac{1}{(\sigma\sqrt{2})^n\Gamma\left(\frac{n}{2}\right)}\,e^{-\frac{u}{2\sigma^2}}u^{\frac{n}{2}-1}.$$

Its characteristic function is

$$\frac{1}{(\sigma\sqrt{2})^n\Gamma\left(\frac{n}{2}\right)}\int_0^{\infty} e^{-\left(\frac{1}{2\sigma^2}-it\right)u}u^{\frac{n}{2}-1}\,du = \frac{1}{(\sigma\sqrt{2})^n}\left(\frac{1}{2\sigma^2}-it\right)^{-\frac{n}{2}},$$

and since the distribution function is determined by its characteristic function, we must have for $t \geq 0$

$$F(t) = \text{const.} + \frac{1}{(\sigma\sqrt{2})^n\Gamma\left(\frac{n}{2}\right)}\int_0^{t} e^{-\frac{u}{2\sigma^2}}u^{\frac{n}{2}-1}\,du.$$

The constant must be $= 0$ since $F(t)$, as well as the integral in the right member, vanishes for $t = 0$. The final expression is therefore:

$$F(t) = \frac{1}{(\sigma\sqrt{2})^n\Gamma\left(\frac{n}{2}\right)}\int_0^{t} e^{-\frac{u}{2\sigma^2}}u^{\frac{n}{2}-1}\,du \qquad \text{for } t \geq 0,$$
$$F(t) = 0 \qquad \text{for } t \leq 0.$$

The probability of the inequality

$$x_1^2 + x_2^2 + \cdots + x_n^2 < t,$$

on the other hand, can be expressed directly as a multiple integral

$$\frac{1}{(\sigma\sqrt{2\pi})^n}\int\!\!\int\cdots\int e^{-\frac{x_1^2+x_2^2+\cdots+x_n^2}{2\sigma^2}}\,dx_1\,dx_2\cdots dx_n$$

extended over the volume of the $n$-dimensional sphere $S$:

$$x_1^2 + x_2^2 + \cdots + x_n^2 < t.$$

By equating both expressions of $F(t)$, we obtain an important transformation:

(1) $$\int\!\!\int\cdots\int_{(S)} e^{-\frac{x_1^2+x_2^2+\cdots+x_n^2}{2\sigma^2}}\,dx_1\,dx_2\cdots dx_n = \frac{\pi^{\frac{n}{2}}}{\Gamma\left(\frac{n}{2}\right)}\int_0^{t} e^{-\frac{u}{2\sigma^2}}u^{\frac{n}{2}-1}\,du.$$

If $F(x_1^2 + x_2^2 + \cdots + x_n^2)$ is an arbitrary function of $u = x_1^2 + x_2^2 + \cdots + x_n^2$, the integral

$$\frac{1}{(\sigma\sqrt{2\pi})^n}\int\!\!\int\cdots\int e^{-\frac{x_1^2+x_2^2+\cdots+x_n^2}{2\sigma^2}}\,F(x_1^2+\cdots+x_n^2)\,dx_1\,dx_2\cdots dx_n$$

extended over the whole $n$-dimensional space represents the mathematical expectation of $F(u)$. On the other hand, the distribution function of $u$ being known, the same multiple integral will be equal to

$$\frac{1}{(\sigma\sqrt{2})^n\Gamma\left(\frac{n}{2}\right)}\int_0^{\infty} e^{-\frac{u}{2\sigma^2}}F(u)u^{\frac{n}{2}-1}\,du.$$

Taking in particular $\sigma = 1$ and $F(u) = e^{\omega\sqrt{u}}$, we get the formula

(2) $$\int\cdots\int e^{-\frac{1}{2}(x_1^2+\cdots+x_n^2)+\omega\sqrt{x_1^2+\cdots+x_n^2}}\,dx_1\,dx_2\cdots dx_n = \frac{2\pi^{\frac{n}{2}}}{\Gamma\left(\frac{n}{2}\right)}\int_0^{\infty} e^{-\frac{u^2}{2}+\omega u}u^{n-1}\,du,$$

which will be used later.
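The distribution function just obtained can be checked by simulation. A Python sketch (the values of $n$, $t$, the seed, and the trial count are arbitrary choices; the integral $F(t)$ is evaluated by a midpoint rule):

```python
import math
import random

def F(t, n, sigma=1.0, steps=4000):
    """Distribution function of x1^2 + ... + xn^2 from Prob. 1."""
    if t <= 0:
        return 0.0
    norm = (sigma * math.sqrt(2)) ** n * math.gamma(n / 2)
    h = t / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += math.exp(-u / (2 * sigma**2)) * u ** (n / 2 - 1) * h
    return total / norm

random.seed(4)
n, t, N = 5, 4.0, 50_000
count = sum(
    1 for _ in range(N)
    if sum(random.gauss(0, 1) ** 2 for _ in range(n)) < t
)
print(round(count / N, 2), round(F(t, n), 2))
```

The empirical frequency agrees with $F(t)$ within sampling error.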
2. Problem 2. Variables $x_1, x_2, \ldots, x_n$ are defined as in Prob. 1. Denoting their arithmetic mean by

$$s = \frac{x_1 + x_2 + \cdots + x_n}{n},$$

find the distribution function of the sum

$$\Sigma = (x_1 - s)^2 + (x_2 - s)^2 + \cdots + (x_n - s)^2.$$

Solution. The probability of the inequality $\Sigma < t$ is expressed, as in Prob. 1, by the multiple integral

$$F(t) = \frac{1}{(\sigma\sqrt{2\pi})^n}\int\!\!\int\cdots\int e^{-\frac{x_1^2+x_2^2+\cdots+x_n^2}{2\sigma^2}}\,dx_1\,dx_2\cdots dx_n$$

extended over the volume of the $n$-dimensional ellipsoid

$$(x_1 - s)^2 + (x_2 - s)^2 + \cdots + (x_n - s)^2 < t.$$

Let

$$u_i = x_i - s, \qquad i = 1, 2, \ldots, n,$$

whence

$$u_1 + u_2 + \cdots + u_n = 0$$

and

$$x_1^2 + x_2^2 + \cdots + x_n^2 = u_1^2 + u_2^2 + \cdots + u_{n-1}^2 + (u_1 + u_2 + \cdots + u_{n-1})^2 + ns^2.$$

Taking $u_1, u_2, \ldots, u_{n-1}$ and $s$ for new variables, we must first find the Jacobian $J$ of $x_1, x_2, \ldots, x_n$ with respect to $u_1, u_2, \ldots, u_{n-1}, s$. It is

$$J = \begin{vmatrix} 1 & 0 & \cdots & 0 & 1 \\ 0 & 1 & \cdots & 0 & 1 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 1 \\ -1 & -1 & \cdots & -1 & 1 \end{vmatrix} = n.$$

In the new variables the expression for $F(t)$ will be

$$F(t) = \frac{n}{(\sigma\sqrt{2\pi})^n}\int\!\!\int\cdots\int e^{-\frac{u_1^2+\cdots+u_{n-1}^2+(u_1+\cdots+u_{n-1})^2+ns^2}{2\sigma^2}}\,ds\,du_1\,du_2\cdots du_{n-1},$$

and the domain of integration in the space of the new variables is defined by

$$u_1^2 + u_2^2 + \cdots + u_{n-1}^2 + (u_1 + u_2 + \cdots + u_{n-1})^2 < t, \qquad -\infty < s < \infty.$$

After performing the integration with respect to $s$, we get

$$F(t) = \frac{\sqrt{n}}{(\sigma\sqrt{2\pi})^{n-1}}\int\!\!\int\cdots\int e^{-\frac{u_1^2+\cdots+u_{n-1}^2+(u_1+\cdots+u_{n-1})^2}{2\sigma^2}}\,du_1\,du_2\cdots du_{n-1}.$$

The quadratic form

$$\varphi = u_1^2 + u_2^2 + \cdots + u_{n-1}^2 + (u_1 + u_2 + \cdots + u_{n-1})^2$$

can be represented as a sum of the squares of $(n-1)$ linear forms in the variables $u_1, u_2, \ldots, u_{n-1}$:

$$\varphi = v_1^2 + v_2^2 + \cdots + v_{n-1}^2.$$

The Jacobian

$$\frac{\partial(v_1, v_2, \ldots, v_{n-1})}{\partial(u_1, u_2, \ldots, u_{n-1})}$$

is the square root of the determinant of the form $\varphi$, which is the same as the determinant of the linear forms

$$\tfrac{1}{2}\frac{\partial\varphi}{\partial u_1} = 2u_1 + u_2 + \cdots + u_{n-1},$$
$$\tfrac{1}{2}\frac{\partial\varphi}{\partial u_2} = u_1 + 2u_2 + \cdots + u_{n-1},$$
$$\cdots$$
$$\tfrac{1}{2}\frac{\partial\varphi}{\partial u_{n-1}} = u_1 + u_2 + \cdots + 2u_{n-1}.$$

Now, in general, for a determinant of $p$ rows,

$$\begin{vmatrix} x & 1 & \cdots & 1 \\ 1 & x & \cdots & 1 \\ \vdots & & \ddots & \vdots \\ 1 & 1 & \cdots & x \end{vmatrix} = (x-1)^{p-1}(x+p-1),$$

so that the determinant of $\varphi$ (with $x = 2$, $p = n-1$) is $= n$, whence

$$\frac{\partial(v_1, v_2, \ldots, v_{n-1})}{\partial(u_1, u_2, \ldots, u_{n-1})} = \sqrt{n}, \qquad \frac{\partial(u_1, u_2, \ldots, u_{n-1})}{\partial(v_1, v_2, \ldots, v_{n-1})} = \frac{1}{\sqrt{n}}.$$

Therefore, taking $v_1, v_2, \ldots, v_{n-1}$ for new variables, $F(t)$ can be expressed as follows:

$$F(t) = \frac{1}{(\sigma\sqrt{2\pi})^{n-1}}\int\!\!\int\cdots\int e^{-\frac{v_1^2+v_2^2+\cdots+v_{n-1}^2}{2\sigma^2}}\,dv_1\,dv_2\cdots dv_{n-1},$$

where the integral is extended over the volume of the sphere

$$v_1^2 + v_2^2 + \cdots + v_{n-1}^2 < t.$$

This multiple integral is exactly of the type considered in the preceding problem, and it can be reduced to a simple integral by formula (1), Sec. 1, with $n - 1$ in place of $n$. After substitution, the final expression of $F(t)$ is

$$F(t) = \frac{1}{(\sigma\sqrt{2})^{n-1}\Gamma\left(\frac{n-1}{2}\right)}\int_0^{t} e^{-\frac{u}{2\sigma^2}}u^{\frac{n-3}{2}}\,du \qquad \text{for } t > 0,$$
$$F(t) = 0 \qquad \text{for } t \leq 0.$$
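The result, that $\Sigma$ follows the Prob. 1 law with $n - 1$ in place of $n$, can be verified by simulation. A Python sketch (sample size, seed, and the test point $t$ are arbitrary illustrative choices):

```python
import math
import random

def F(t, m, sigma=1.0, steps=4000):
    """F(t) of Prob. 2: the Prob. 1 integral with m - 1 in place of n."""
    if t <= 0:
        return 0.0
    norm = (sigma * math.sqrt(2)) ** (m - 1) * math.gamma((m - 1) / 2)
    h = t / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += math.exp(-u / (2 * sigma**2)) * u ** ((m - 3) / 2) * h
    return total / norm

random.seed(5)
n, t, N = 6, 4.0, 40_000
count = 0
for _ in range(N):
    xs = [random.gauss(0, 1) for _ in range(n)]
    s = sum(xs) / n
    if sum((x - s) ** 2 for x in xs) < t:
        count += 1
print(round(count / N, 2), round(F(t, n), 2))
```

The empirical frequency of $\Sigma < t$ matches $F(t)$ within sampling error.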
3. Problem 3. Variables $x_1, x_2, \ldots, x_n$ are defined as in Prob. 1. As in Prob. 2, we set

$$s = \frac{x_1 + x_2 + \cdots + x_n}{n}, \qquad x_i - s = u_i, \quad i = 1, 2, \ldots, n,$$

and introduce the quantity

$$E = \sqrt{\frac{u_1^2 + u_2^2 + \cdots + u_n^2}{n}}.$$

What is the distribution function of the ratio $s/E$, or, which is the same, the probability $F(t)$ of the inequality $s < tE$?

Solution. First, assuming $t$ to be positive, let us find the probability $\varphi(t)$ of the inequality $s \geq tE$, that is, of the simultaneous conditions $s > 0$ and

$$u_1^2 + u_2^2 + \cdots + u_n^2 \leq \frac{ns^2}{t^2}.$$

This probability can be presented in the form

$$\varphi(t) = \frac{n}{(\sigma\sqrt{2\pi})^n}\int_0^{\infty} e^{-\frac{ns^2}{2\sigma^2}}\,\Psi(s)\,ds,$$

where the multiple integral

$$\Psi(s) = \int\!\!\int\cdots\int e^{-\frac{u_1^2+u_2^2+\cdots+u_n^2}{2\sigma^2}}\,du_1\,du_2\cdots du_{n-1},$$

in which $u_n = -(u_1 + u_2 + \cdots + u_{n-1})$, is extended over the domain

$$u_1^2 + u_2^2 + \cdots + u_n^2 \leq \frac{ns^2}{t^2}.$$

Proceeding in exactly the same manner as in Prob. 2, we can transform $\Psi(s)$ into

$$\Psi(s) = \frac{1}{\sqrt{n}}\int\!\!\int\cdots\int e^{-\frac{v_1^2+v_2^2+\cdots+v_{n-1}^2}{2\sigma^2}}\,dv_1\,dv_2\cdots dv_{n-1}$$

extended over the sphere

$$v_1^2 + v_2^2 + \cdots + v_{n-1}^2 \leq \frac{ns^2}{t^2}$$

in the space of the variables $v_1, v_2, \ldots, v_{n-1}$. For this multiple integral we can substitute a simple integral

$$\frac{\pi^{\frac{n-1}{2}}}{\Gamma\left(\frac{n-1}{2}\right)}\int_0^{\frac{ns^2}{t^2}} e^{-\frac{u}{2\sigma^2}}u^{\frac{n-3}{2}}\,du,$$

and thus reduce $\varphi(t)$ to the form

$$\varphi(t) = \frac{\sqrt{n}\,\pi^{\frac{n-1}{2}}}{(\sigma\sqrt{2\pi})^n\,\Gamma\left(\frac{n-1}{2}\right)}\int_0^{\infty} e^{-\frac{ns^2}{2\sigma^2}}\left(\int_0^{\frac{ns^2}{t^2}} e^{-\frac{u}{2\sigma^2}}u^{\frac{n-3}{2}}\,du\right)ds.$$

Differentiating with respect to $t$ and carrying out the resulting integration with respect to $s$, we find

$$\varphi'(t) = -\frac{\Gamma\left(\frac{n}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{n-1}{2}\right)}\,(1+t^2)^{-\frac{n}{2}},$$

whence, since $\varphi(+\infty) = 0$,

$$\varphi(t) = \frac{\Gamma\left(\frac{n}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{n-1}{2}\right)}\int_t^{\infty}\frac{dz}{(1+z^2)^{\frac{n}{2}}} = 1 - \frac{\Gamma\left(\frac{n}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{n-1}{2}\right)}\int_{-\infty}^{t}\frac{dz}{(1+z^2)^{\frac{n}{2}}},$$

the total integral of $(1+z^2)^{-\frac{n}{2}}$ over the whole axis being $\sqrt{\pi}\,\Gamma\left(\frac{n-1}{2}\right)/\Gamma\left(\frac{n}{2}\right)$. Such is the probability of the inequality $s \geq tE$. The probability $F(t)$ of the inequality $s < tE$ will be $1 - \varphi(t)$, or

$$F(t) = \frac{\Gamma\left(\frac{n}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{n-1}{2}\right)}\int_{-\infty}^{t}\frac{dz}{(1+z^2)^{\frac{n}{2}}};$$

but this is established only for positive $t$. However, this result holds for negative $t$ as well. For $t = -\tau$ negative, the inequality $s < -\tau E$ is entirely equivalent to $-s > \tau E$, and its probability, by the symmetry of the distribution of $s$, is evidently

$$F(-\tau) = 1 - F(\tau) = \frac{\Gamma\left(\frac{n}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{n-1}{2}\right)}\int_{\tau}^{\infty}\frac{dz}{(1+z^2)^{\frac{n}{2}}}.$$

But

$$\int_{\tau}^{\infty}\frac{dz}{(1+z^2)^{\frac{n}{2}}} = \int_{-\infty}^{-\tau}\frac{dz}{(1+z^2)^{\frac{n}{2}}},$$

which permits of writing the preceding expression for $F(-\tau)$ in the same form as before. Thus, no matter whether $t$ is positive or negative, the distribution function of the ratio $s/E$, or the probability of the inequality $s < tE$, is given by

$$F(t) = \frac{\Gamma\left(\frac{n}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{n-1}{2}\right)}\int_{-\infty}^{t}\frac{dz}{(1+z^2)^{\frac{n}{2}}}.$$

The distribution of the quotient $s/E$ was discovered by a British statistician who wrote under the pseudonym "Student," and it is commonly referred to as "Student's distribution." The first rigorous proof was published by R. A. Fisher.
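Student's distribution of $s/E$ is easy to confirm numerically. A Python sketch (sample size, seed, and the point $t$ are arbitrary; the integral is evaluated by a midpoint rule with a finite lower limit, which introduces a negligible truncation error):

```python
import math
import random

def F(t, n, steps=4000, lo=-60.0):
    """Student's F(t): C times the integral of (1+z^2)^(-n/2) up to t."""
    C = math.gamma(n / 2) / (math.sqrt(math.pi) * math.gamma((n - 1) / 2))
    h = (t - lo) / steps
    total = 0.0
    for i in range(steps):
        z = lo + (i + 0.5) * h
        total += (1 + z * z) ** (-n / 2) * h
    return C * total

random.seed(6)
n, t, N = 5, 0.5, 40_000
count = 0
for _ in range(N):
    xs = [random.gauss(0, 1) for _ in range(n)]
    s = sum(xs) / n
    E = math.sqrt(sum((x - s) ** 2 for x in xs) / n)
    if s < t * E:
        count += 1
print(round(count / N, 2), round(F(t, n), 2))
```

The empirical frequency of $s < tE$ agrees with $F(t)$ within sampling error.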
4. Problem 4. Variables $x, y$ are in normal correlation. A sample of $n$ corresponding pairs, $x_1, y_1; x_2, y_2; \ldots; x_n, y_n$, is taken and the "correlation coefficient of the sample" is found by the formula

$$\rho = \frac{\Sigma(x_i - s)(y_i - s')}{\sqrt{\Sigma(x_i - s)^2\,\Sigma(y_i - s')^2}},$$

where, for the sake of abbreviation,

$$s = \frac{x_1 + x_2 + \cdots + x_n}{n}, \qquad s' = \frac{y_1 + y_2 + \cdots + y_n}{n}.$$

Find the distribution function of $\rho$; that is, the probability $P$ of the inequality $\rho < R$ for a given $R$ ($-1 < R < 1$).

Solution. Since the expression of $\rho$ is homogeneous in $x_1, x_2, \ldots, x_n$; $y_1, y_2, \ldots, y_n$, we can assume $\sigma_1 = \sigma_2 = 1$. Also, without loss of generality, the expectations of $x$ and $y$ may be supposed $= 0$. Denoting by $r$ the correlation coefficient of $x$ and $y$, the density of probability in the two-dimensional distribution will be:

$$\frac{1}{2\pi(1-r^2)^{\frac{1}{2}}}\,e^{-\frac{x^2+y^2-2rxy}{2(1-r^2)}}.$$

Hence the required probability will be expressed by the multiple integral

$$P = \frac{1}{(2\pi)^n(1-r^2)^{\frac{n}{2}}}\int\cdots\int e^{-\frac{\Sigma x_i^2+\Sigma y_i^2-2r\Sigma x_iy_i}{2(1-r^2)}}\,dx_1\cdots dx_n\,dy_1\cdots dy_n$$

extended over the $2n$-dimensional domain

(3) $$\Sigma(x_i - s)(y_i - s') < R\sqrt{\Sigma(x_i - s)^2\cdot\Sigma(y_i - s')^2}.$$

Let us set now

$$x_i - s = u_i, \qquad y_i - s' = v_i;$$

then

$$u_1 + u_2 + \cdots + u_n = 0, \qquad v_1 + v_2 + \cdots + v_n = 0,$$

and

$$\Sigma x_i^2 + \Sigma y_i^2 - 2r\Sigma x_iy_i = \Sigma u_i^2 + \Sigma v_i^2 - 2r\Sigma u_iv_i + ns^2 + ns'^2 - 2nrss'.$$

Introducing $s, s'; u_1, u_2, \ldots, u_{n-1}; v_1, v_2, \ldots, v_{n-1}$ as new variables, we find, as in Sec. 2, that the same linear transformation carries each of the quadratic forms $\Sigma u_i^2$, $\Sigma v_i^2$ (each containing $n-1$ independent variables) into

$$\sum_{i=1}^{n-1} w_i^2, \qquad \sum_{i=1}^{n-1} z_i^2,$$

while at the same time

$$\sum_{i=1}^{n} u_iv_i = \sum_{i=1}^{n-1} w_iz_i.$$

Proceeding as in Sec. 2 and noting that

$$\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} e^{-\frac{n(s^2+s'^2-2rss')}{2(1-r^2)}}\,ds\,ds' = \frac{2\pi\sqrt{1-r^2}}{n},$$

we find that

$$P = \frac{(1-r^2)^{-\frac{n-1}{2}}}{(2\pi)^{n-1}}\int\cdots\int e^{-\frac{\chi}{2(1-r^2)}}\,dw_1\cdots dw_{n-1}\,dz_1\cdots dz_{n-1},$$

where

$$\chi = \Sigma w_i^2 + \Sigma z_i^2 - 2r\Sigma w_iz_i$$

and the domain of integration in the space of $2n - 2$ dimensions is defined by

$$\Sigma w_iz_i < R\sqrt{\Sigma w_i^2\cdot\Sigma z_i^2}.$$

We shall integrate now with respect to the variables $z_1, z_2, \ldots, z_{n-1}$ for a fixed system of values $w_1, w_2, \ldots, w_{n-1}$. To this end we use an orthogonal transformation

$$z_i = c_{i,1}\zeta_1 + c_{i,2}\zeta_2 + \cdots + c_{i,n-1}\zeta_{n-1} \qquad (i = 1, 2, \ldots, n-1)$$

in which the elements of the first column are

$$c_{i,1} = \frac{w_i}{\sqrt{w_1^2 + w_2^2 + \cdots + w_{n-1}^2}}.$$

By the properties of orthogonal transformations

$$\Sigma z_i^2 = \Sigma\zeta_i^2, \qquad \Sigma w_iz_i = \zeta_1\sqrt{\Sigma w_i^2},$$

so that for a fixed system of values $w_1, w_2, \ldots, w_{n-1}$ the domain of integration in the space of the variables $\zeta_1, \zeta_2, \ldots, \zeta_{n-1}$ will be

(5) $$\zeta_1 < R\sqrt{\zeta_1^2 + \zeta_2^2 + \cdots + \zeta_{n-1}^2}.$$

Thus we must first evaluate the integral

$$J = \int\cdots\int e^{-\frac{\zeta_1^2+\cdots+\zeta_{n-1}^2-2r\zeta_1\sqrt{\Sigma w_i^2}}{2(1-r^2)}}\,d\zeta_1\,d\zeta_2\cdots d\zeta_{n-1}$$

extended over the domain (5). If $\zeta_1 < 0$, no restriction is imposed upon $\zeta_2, \ldots, \zeta_{n-1}$; if $\zeta_1 > 0$, then

$$\zeta_2^2 + \cdots + \zeta_{n-1}^2 > \zeta_1^2\left(\frac{1}{R^2} - 1\right).$$

Making use of formula (1), Sec. 1, for the integration with respect to $\zeta_2, \ldots, \zeta_{n-1}$, and of formula (2), Sec. 1, for the subsequent integration with respect to $w_1, w_2, \ldots, w_{n-1}$, then differentiating with respect to $R$ and reversing the order of integrations, one obtains $dP/dR$ as a double integral of the type

$$\int_0^{\infty}\!\!\int_0^{\infty} e^{-\frac{1}{2}(t^2+u^2)+Rrtu}\,(tu)^{n-2}\,dt\,du.$$

In this double integral we make the transformation to new variables defined by $t/u = e^{\tau}$, $tu = \eta$; carrying out the integration with respect to $\eta$, we arrive finally at

$$\frac{dP}{dR} = \frac{n-2}{\pi}\,(1-r^2)^{\frac{n-1}{2}}(1-R^2)^{\frac{n-4}{2}}\int_0^{\infty}\frac{dt}{(\cosh t - Rr)^{n-1}}.$$

In case $r = 0$, that is, when the variables $x, y$ are uncorrelated, we have a very simple expression of $P$:

$$P = \frac{\Gamma\left(\frac{n-1}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{n-2}{2}\right)}\int_{-1}^{R}(1-\rho^2)^{\frac{n-4}{2}}\,d\rho.$$

In case $r \neq 0$ the integral

$$\int_0^{\infty}\frac{dt}{(\cosh t - Rr)^{n-1}}$$

can still be found in finite form. We have, in fact,

$$\int_0^{\infty}\frac{dt}{\cosh t - Rr} = \frac{1}{\sqrt{1-R^2r^2}}\left[\frac{\pi}{2} + \arcsin(Rr)\right],$$

whence

$$\int_0^{\infty}\frac{dt}{(\cosh t - Rr)^{n-1}} = \frac{1}{(n-2)!}\,\frac{d^{n-2}}{d(Rr)^{n-2}}\left\{\frac{1}{\sqrt{1-R^2r^2}}\left[\frac{\pi}{2} + \arcsin(Rr)\right]\right\},$$

and so

$$P = A\int_{-1}^{R}(1-\rho^2)^{\frac{n-4}{2}}\,\frac{d^{n-2}}{d(r\rho)^{n-2}}\left\{\frac{1}{\sqrt{1-\rho^2r^2}}\left[\frac{\pi}{2} + \arcsin(r\rho)\right]\right\}d\rho,$$

where

$$A = \frac{(1-r^2)^{\frac{n-1}{2}}}{\pi(n-3)!}.$$

When $n$ is an even number, this integral appears in a very simple finite form, but in the case of an odd $n$ certain integrals of a rather complicated type appear. Besides, the behavior of $P$ for somewhat large $n$ cannot be easily grasped by using this integral expression for $P$.
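The simple $r = 0$ case of the distribution just derived can be checked by simulation. A Python sketch (sample size $n$, the point $R$, the seed, and the trial count are arbitrary illustrative choices):

```python
import math
import random

def P_less(R, n, steps=4000):
    """P(rho < R) for r = 0: C times the integral of (1-p^2)^((n-4)/2) over (-1, R)."""
    C = math.gamma((n - 1) / 2) / (math.sqrt(math.pi) * math.gamma((n - 2) / 2))
    h = (R + 1) / steps
    total = 0.0
    for i in range(steps):
        p = -1 + (i + 0.5) * h
        total += (1 - p * p) ** ((n - 4) / 2) * h
    return C * total

def sample_rho(n):
    """Sample correlation coefficient of n independent normal pairs (r = 0)."""
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [random.gauss(0, 1) for _ in range(n)]
    s, s2 = sum(xs) / n, sum(ys) / n
    num = sum((x - s) * (y - s2) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - s) ** 2 for x in xs) * sum((y - s2) ** 2 for y in ys))
    return num / den

random.seed(7)
n, R, N = 8, 0.3, 30_000
count = sum(1 for _ in range(N) if sample_rho(n) < R)
print(round(count / N, 2), round(P_less(R, n), 2))
```

The empirical frequency of $\rho < R$ matches the integral within sampling error.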
6. Fisher, who was first to discover the rigorous distribution of the
correlation coefficient, called attention to the fact that, setting
thz
=
2:(x,  s) (y,  s') y2:(x,  s)22:(y;  s')2
,
the distribution of z will be nearly normal even for comparatively small values of
n.
Let us set
p Instead of
_
n
thR = w, th!
 2
 7r
"' f"'
J .. Jo _
=
r; then P can be expressed thus:
chzdtdz . (chtchzch!  sh!shz)" 1
t it is convenient to introduce a new variable chtchzch!  sh!shz = r1ch(z  !).
r
so that
SEc. 5]
345
DISTRIBUTION OJ<' CERTAIN FUNCTIONS
chz) l J"' .. (chr [chCz  mn
Then
P
=
dz
n 2 'lrV2
where
P for all values of
z
ch(z + r)  2chzchr
I
flr1(1  r)n2dr Jo v'1  pr
ch(w + r) 2chwchr
< =
under consideration.
Now
and
since
f lr1(1  r)n2dr < f \1(1  r)n2(1 + pr)dr • )o )o v'1  pr for 0 < p < 1 as can be easily verified. Consequently dz p (n  2)r(n  1) "' chz l 
( )  !) J .. chr [ch(z  r)]nt [ 1 ch(w r) •
_
y'21rr(n

+
·
+
]
8
0<8<1.
2chwchr 2n  1 ;
As to the integral in this formula, its approximate expression, omitting terms of the higher order, is obtained by replacing the integrand by

e^{-\frac{(2n-3)(z-\zeta)^2}{4}},

so that the integral reduces to one of the type

\int_{-\infty}^{w} e^{-\frac{(2n-3)(z-\zeta)^2}{4}}\,dz.

Thus for somewhat large n the required value of P can be found with the help of a simple approximate formula.
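Fisher's observation — that z = arg th of the sample correlation coefficient is nearly normally distributed, with a spread suggested by the Gaussian kernel e^{-(2n-3)(z-ζ)²/4}, i.e. standard deviation √(2/(2n-3)) — lends itself to a quick simulation. The sketch below is my own illustration, not part of the text; the sample size, population correlation, and replication count are arbitrary choices.

```python
import math, random

# Simulate samples from a bivariate normal population with correlation r_pop,
# form the sample correlation rho, and map it to Fisher's z = arg th rho.
random.seed(1)
r_pop, n, reps = 0.5, 20, 2000
zeta = 0.5 * math.log((1 + r_pop) / (1 - r_pop))   # arg th of r_pop
zs = []
for _ in range(reps):
    xs, ys = [], []
    for _ in range(n):
        u, v = random.gauss(0, 1), random.gauss(0, 1)
        xs.append(u)
        ys.append(r_pop * u + math.sqrt(1 - r_pop ** 2) * v)  # corr(x, y) = r_pop
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    rho = sxy / math.sqrt(sxx * syy)                   # sample correlation
    zs.append(0.5 * math.log((1 + rho) / (1 - rho)))   # th z = rho
mean_z = sum(zs) / reps
std_z = math.sqrt(sum((z - mean_z) ** 2 for z in zs) / reps)
approx_std = math.sqrt(2 / (2 * n - 3))                # from the Gaussian kernel
```

Even at n = 20 the empirical spread of z agrees with the approximate kernel to within a few per cent, which is the practical content of Fisher's remark.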
The various distributions dealt with in this chapter are undoubtedly of great value when applied to variables which have normal or nearly normal distribution. Whether they are always used legitimately can be doubted. At least the "onus probandi" that the "populations" with which they deal are even approximately normal rests with the statisticians.
Problems for Solution

1. Show that

\lim_{n\to\infty} \frac{1}{2^{\frac{n}{2}}\,\Gamma\!\left(\frac{n}{2}\right)}\int_0^{\,n+s\sqrt{2n}} u^{\frac{n}{2}-1}e^{-\frac{u}{2}}\,du = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{s} e^{-\frac{u^2}{2}}\,du.

HINT: Liapounoff's theorem and Prob. 1, page 332.
2. With the same assumptions and notations as in Prob. 3, page 336, show that the distribution function of the quotient

\frac{x_i - s}{E}; \qquad i = 1, 2, \ldots n,

is

F(t) = \frac{\Gamma\!\left(\frac{n-1}{2}\right)}{\sqrt{\pi(n-1)}\,\Gamma\!\left(\frac{n-2}{2}\right)}\int_{-\sqrt{n-1}}^{t}\left(1 - \frac{\tau^2}{n-1}\right)^{\frac{n-4}{2}}d\tau \qquad \text{if } |t| \leq \sqrt{n-1};

F(t) = 1 if t > \sqrt{n-1}, and F(t) = 0 if t < -\sqrt{n-1}. It is worthy of notice that for n = 4 the distribution is uniform.
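The distribution in Problem 2 has density c(1 - t²/(n-1))^{(n-4)/2} on (-√(n-1), √(n-1)), with c = Γ((n-1)/2)/(√(π(n-1)) Γ((n-2)/2)). A numerical sanity check (my own, not the book's) confirms that this density carries total mass 1 and that it is constant for n = 4:

```python
import math

def density(t, n):
    # density of (x_i - s)/E on (-sqrt(n-1), sqrt(n-1))
    c = math.gamma((n - 1) / 2) / (math.sqrt(math.pi * (n - 1)) * math.gamma((n - 2) / 2))
    return c * (1 - t * t / (n - 1)) ** ((n - 4) / 2)

def total_mass(n, steps=20000):
    # composite Simpson's rule over the whole support
    lo, hi = -math.sqrt(n - 1), math.sqrt(n - 1)
    h = (hi - lo) / steps
    s = density(lo, n) + density(hi, n)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * density(lo + k * h, n)
    return s * h / 3

mass6 = total_mass(6)                       # should be 1
flat_a, flat_b = density(0.0, 4), density(1.0, 4)  # equal for n = 4 (uniform)
```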
3. In two series of observations, samples x_1, x_2, ... x_n and y_1, y_2, ... y_{n'}, from the same normally distributed population (or of the same normally distributed variable) are obtained. Denoting for brevity

s = \frac{x_1 + x_2 + \cdots + x_n}{n}, \qquad s' = \frac{y_1 + y_2 + \cdots + y_{n'}}{n'},

E = \sqrt{\left(\frac{1}{n} + \frac{1}{n'}\right)\left(\sum(x_i - s)^2 + \sum(y_i - s')^2\right)},

find the distribution function of the quotient

\frac{s - s'}{E} \qquad (\text{"Student"}).

Ans.

F(t) = \frac{\Gamma\!\left(\frac{n+n'-1}{2}\right)}{\sqrt{\pi}\,\Gamma\!\left(\frac{n+n'-2}{2}\right)}\int_{-\infty}^{t}\left(1+\tau^2\right)^{-\frac{n+n'-1}{2}}d\tau.
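The answer to Problem 3 is a "Student"-type density proportional to (1+τ²)^{-(n+n'-1)/2}, with normalizing constant Γ((n+n'-1)/2)/(√π Γ((n+n'-2)/2)). The following check (mine, for one arbitrary choice of n and n') verifies numerically that this density integrates to 1:

```python
import math

def student_density(tau, n, nprime):
    m = n + nprime
    c = math.gamma((m - 1) / 2) / (math.sqrt(math.pi) * math.gamma((m - 2) / 2))
    return c * (1 + tau * tau) ** (-(m - 1) / 2)

def total(n, nprime, lim=40.0, steps=40000):
    # Simpson's rule on [-lim, lim]; the tails decay like tau^(-(n+n'-1))
    h = 2 * lim / steps
    s = student_density(-lim, n, nprime) + student_density(lim, n, nprime)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * student_density(-lim + k * h, n, nprime)
    return s * h / 3

mass = total(6, 5)   # n = 6, n' = 5 is an arbitrary sample choice
```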
References

"STUDENT": The Standard Error of a Mean, Biometrika, vol. 6, 1908.
R. A. FISHER: Frequency Distribution of the Values of the Correlation Coefficient in Samples from an Indefinitely Large Population, Biometrika, vol. 10, 1915.
———: On the Probable Error of a Coefficient of Correlation Deduced from a Small Sample, Metron, vol. 1, 1920.
D. JACKSON: On the Mathematical Theory of Small Samples, Amer. Math. Monthly, vol. 42, 1935.
COMMANDER LHOSTE: Mémorial de l'artillerie française, pp. 245 and 1027, 1925.
APPENDIX I

1. Euler's Summation Formula. Let f(x) be a function with a continuous derivative f'(x) in an interval (a, b) where a and b > a are arbitrary real numbers. The notation

\sum_{a < n \leq b} f(n)

will be used to designate the sum extended over all integers n which are > a and ≤ b. It is an important problem to devise means for the approximate evaluation of the above sum when it contains a considerable number of terms. Let [x], as usual, denote the largest integer contained in a real number x, so that x = [x] + θ, where θ, the so-called "fractional part" of x, satisfies the inequalities 0 ≤ θ < 1. Considered as functions of a continuous variable x, both [x] and θ have discontinuities for integral values of x. The function

ρ(x) = [x] - x + \tfrac{1}{2} = \tfrac{1}{2} - θ

is likewise discontinuous for integral values of x. Besides, it is a periodic function of x with the period 1; that is, we have ρ(x+1) = ρ(x) for any real x. With this notation adopted we have the following important formula:

(1) \sum_{a<n\leq b} f(n) = \int_a^b f(x)\,dx + ρ(b)f(b) - ρ(a)f(a) - \int_a^b ρ(x)f'(x)\,dx,

which is known as "Euler's summation formula."
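Formula (1) can be checked exactly with rational arithmetic. The sketch below (my own illustration; f(x) = x² and the endpoints a = 1/2, b = 103/10 are arbitrary choices) computes every term of (1) in closed form, integrating ρ(x)f'(x) piece by piece between consecutive integers where [x] is constant:

```python
from fractions import Fraction

def floor_frac(x):            # largest integer contained in x
    return x.numerator // x.denominator

a, b = Fraction(1, 2), Fraction(103, 10)
f = lambda x: x * x           # f(x) = x^2, f'(x) = 2x

# left member of (1): sum of n^2 over integers a < n <= b
lhs = sum(Fraction(n * n) for n in range(floor_frac(a) + 1, floor_frac(b) + 1))

def rho(x):
    return floor_frac(x) - x + Fraction(1, 2)

def int_rho_fprime(c, d):
    # exact integral of (k - x + 1/2) * 2x on [c, d], where k = [x] is constant
    k = floor_frac(c)
    F = lambda x: (k + Fraction(1, 2)) * x * x - Fraction(2, 3) * x ** 3
    return F(d) - F(c)

pieces, left = [], a          # split (a, b) at the integers
while left < b:
    right = min(Fraction(floor_frac(left) + 1), b)
    pieces.append(int_rho_fprime(left, right))
    left = right

rhs = (b ** 3 - a ** 3) / 3 + rho(b) * f(b) - rho(a) * f(a) - sum(pieces)
```

Because everything is a `Fraction`, the two sides of (1) agree exactly, not merely to rounding error.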
Proof. Let k be the least integer > a and l the greatest integer ≤ b. The sum in the left member of (1) is, by definition,

f(k) + f(k+1) + \cdots + f(l),

and we must show that this is equal to the right member. To this end we write first

\int_a^b ρ(x)f'(x)dx = \int_a^k ρ(x)f'(x)dx + \int_l^b ρ(x)f'(x)dx + \sum_{j=k}^{l-1}\int_j^{j+1} ρ(x)f'(x)dx.

Next, since j is an integer,

\int_j^{j+1} ρ(x)f'(x)dx = \int_j^{j+1}\left(j - x + \tfrac{1}{2}\right)f'(x)dx = -\frac{f(j)+f(j+1)}{2} + \int_j^{j+1} f(x)dx,

and

\sum_{j=k}^{l-1}\int_j^{j+1} ρ(x)f'(x)dx = -\frac{f(k)+f(l)}{2} - \sum_{n=k+1}^{l-1} f(n) + \int_k^l f(x)dx.

On the other hand,

\int_a^k ρ(x)f'(x)dx = \int_a^k\left(k - x - \tfrac{1}{2}\right)f'(x)dx = -\frac{f(k)}{2} - ρ(a)f(a) + \int_a^k f(x)dx,

\int_l^b ρ(x)f'(x)dx = \int_l^b\left(l - x + \tfrac{1}{2}\right)f'(x)dx = -\frac{f(l)}{2} + ρ(b)f(b) + \int_l^b f(x)dx,

so that finally

\int_a^b ρ(x)f'(x)dx = -f(k) - f(k+1) - \cdots - f(l) + ρ(b)f(b) - ρ(a)f(a) + \int_a^b f(x)dx;

whence

\sum_{a<n\leq b} f(n) = \int_a^b f(x)dx + ρ(b)f(b) - ρ(a)f(a) - \int_a^b ρ(x)f'(x)dx,

which completes the proof of Euler's formula.
Corollary 1. The integral

σ(x) = \int_0^x ρ(z)\,dz

represents a continuous and periodic function of x with the period 1:

σ(x+1) - σ(x) = \int_x^{x+1} ρ(z)dz = \int_0^1 ρ(z)dz = \int_0^1\left(\tfrac{1}{2} - z\right)dz = 0.

If 0 ≤ x ≤ 1,

σ(x) = \int_0^x\left(\tfrac{1}{2} - z\right)dz = \frac{x(1-x)}{2},

and in general

σ(x) = \frac{θ(1-θ)}{2},

where θ is the fractional part of x. Hence, for every real x,

0 ≤ σ(x) ≤ \tfrac{1}{8}.

Supposing that f''(x) exists and is continuous in (a, b) and integrating by parts, we get

\int_a^b ρ(x)f'(x)dx = σ(b)f'(b) - σ(a)f'(a) - \int_a^b σ(x)f''(x)dx,

which leads to another form of Euler's formula:

\sum_{a<n\leq b} f(n) = \int_a^b f(x)dx + ρ(b)f(b) - ρ(a)f(a) - σ(b)f'(b) + σ(a)f'(a) + \int_a^b σ(x)f''(x)dx.

Corollary 2. If
f(x) is defined for all x ≥ a and possesses a continuous derivative throughout the interval (a, +∞); if, besides, the integral

\int_a^\infty ρ(x)f'(x)\,dx

exists, then for a variable limit b we have

(2) \sum_{a<n\leq b} f(n) = C + \int^b f(x)dx + ρ(b)f(b) + \int_b^\infty ρ(x)f'(x)dx,

where C is a constant with respect to b. It suffices to substitute for \int_a^b ρ(x)f'(x)dx the difference

\int_a^\infty ρ(x)f'(x)dx - \int_b^\infty ρ(x)f'(x)dx

and separate the terms depending upon b from those involving a.

2. Stirling's Formula. Factorials increase with extreme rapidity and their exact computation soon becomes practically impossible. The question then naturally arises of finding a convenient approximate expression for large factorials, which question is answered by a celebrated formula usually known as "Stirling's formula," although, in the main, it was established by de Moivre in connection with problems on probability. De Moivre did not establish the relation to the number π = 3.14159 ... of the constant involved in his formula; it was done by Stirling.
In formula (2) it suffices to take a = ½, f(x) = log x, and replace b by an arbitrary integer n to arrive at the remarkable expression

\log(1\cdot2\cdot3\cdots n) = C + \left(n+\tfrac{1}{2}\right)\log n - n + \int_n^\infty \frac{ρ(x)dx}{x},

where C is a constant. For the sake of brevity we shall set

ω(n) = \int_n^\infty \frac{ρ(x)dx}{x}.

Now

\int_n^\infty \frac{ρ(x)dx}{x} = \int_n^{n+1}\frac{ρ(x)dx}{x} + \int_{n+1}^{n+2}\frac{ρ(x)dx}{x} + \cdots

and

\int_k^{k+1}\frac{ρ(x)dx}{x} = \int_0^1\frac{ρ(u)du}{u+k} = \int_0^{\frac{1}{2}}\frac{\left(\frac{1}{2}-u\right)du}{u+k} + \int_{\frac{1}{2}}^1\frac{\left(\frac{1}{2}-u\right)du}{u+k} = \frac{1}{2}\int_0^{\frac{1}{2}}\frac{(1-2u)^2\,du}{(k+u)(k+1-u)}.

Hence

ω(n) = \frac{1}{2}\int_0^{\frac{1}{2}}(1-2u)^2F_n(u)\,du,

where

F_n(u) = \sum_{k=n}^\infty \frac{1}{(k+u)(k+1-u)}.

Since

(k+u)(k+1-u) = k(k+1) + u - u^2,

it follows that for 0 < u < ½

(k+u)(k+1-u) > k(k+1), \qquad (k+u)(k+1-u) < \left(k+\tfrac{1}{2}\right)^2 < \left(k+\tfrac{1}{2}\right)\left(k+\tfrac{3}{2}\right).
Thus for 0 < u < ½

F_n(u) < \sum_{k=n}^\infty \frac{1}{k(k+1)} = \frac{1}{n}, \qquad F_n(u) > \sum_{k=n}^\infty \frac{1}{\left(k+\frac{1}{2}\right)\left(k+\frac{3}{2}\right)} = \frac{1}{n+\frac{1}{2}}.

Making use of these limits, we find that

ω(n) < \frac{1}{2n}\int_0^{\frac{1}{2}}(1-2u)^2du = \frac{1}{12n}, \qquad ω(n) > \frac{1}{2n+1}\int_0^{\frac{1}{2}}(1-2u)^2du = \frac{1}{12\left(n+\frac{1}{2}\right)},

and consequently we can set

ω(n) = \frac{1}{12(n+θ)},

where 0 < θ < ½. Accordingly

\log(1\cdot2\cdot3\cdots n) = C + \left(n+\tfrac{1}{2}\right)\log n - n + \frac{1}{12(n+θ)}.
The constant C depends in a remarkable way on the number π. To show this we start from the well-known expression for π due to Wallis:

\frac{\pi}{2} = \lim\left(\frac{2}{1}\cdot\frac{2}{3}\cdot\frac{4}{3}\cdot\frac{4}{5}\cdots\frac{2n}{2n-1}\cdot\frac{2n}{2n+1}\right), \qquad n\to\infty,

which follows from the infinite product

\sin x = x\left(1-\frac{x^2}{\pi^2}\right)\left(1-\frac{x^2}{4\pi^2}\right)\left(1-\frac{x^2}{9\pi^2}\right)\cdots

by taking x = π/2. Since

\frac{2}{1}\cdot\frac{2}{3}\cdot\frac{4}{3}\cdot\frac{4}{5}\cdots\frac{2n}{2n-1}\cdot\frac{2n}{2n+1} = \left[\frac{2\cdot4\cdot6\cdots2n}{1\cdot3\cdot5\cdots(2n-1)}\right]^2\frac{1}{2n+1},

we get from Wallis' formula

\sqrt{\pi} = \lim\left[\frac{2\cdot4\cdot6\cdots2n}{1\cdot3\cdot5\cdots(2n-1)}\cdot\frac{1}{\sqrt{n}}\right], \qquad n\to\infty.

On the other hand,

2\cdot4\cdot6\cdots2n = 2^n\cdot1\cdot2\cdot3\cdots n, \qquad 1\cdot3\cdot5\cdots(2n-1) = \frac{1\cdot2\cdot3\cdots2n}{2^n\cdot1\cdot2\cdot3\cdots n},

so that

\sqrt{\pi} = \lim\left\{\frac{2^{2n}(1\cdot2\cdot3\cdots n)^2}{1\cdot2\cdot3\cdots2n}\cdot\frac{1}{\sqrt{n}}\right\},

or, taking logarithms,

\log\sqrt{\pi} = \lim\left[2n\log2 + 2\log(1\cdot2\cdot3\cdots n) - \log(1\cdot2\cdot3\cdots2n) - \tfrac{1}{2}\log n\right].

But, neglecting infinitesimals,

\log(1\cdot2\cdot3\cdots n) = C + \left(n+\tfrac{1}{2}\right)\log n - n, \qquad \log(1\cdot2\cdot3\cdots2n) = C + \left(2n+\tfrac{1}{2}\right)\log 2n - 2n,

whence

\lim\left[2n\log2 + 2\log(1\cdot2\cdot3\cdots n) - \log(1\cdot2\cdot3\cdots2n) - \tfrac{1}{2}\log n\right] = C - \tfrac{1}{2}\log2.

Thus \log\sqrt{\pi} = C - \tfrac{1}{2}\log2, and finally

C = \log\sqrt{2\pi};

(3) \log(1\cdot2\cdot3\cdots n) = \log\sqrt{2\pi} + \left(n+\tfrac{1}{2}\right)\log n - n + \frac{1}{12(n+θ)}.
This is equivalent to the two inequalities

e^{\frac{1}{12n+6}} < \frac{1\cdot2\cdot3\cdots n}{\sqrt{2\pi n}\;n^ne^{-n}} < e^{\frac{1}{12n}},

which show that for indefinitely increasing n

\lim \frac{1\cdot2\cdot3\cdots n}{\sqrt{2\pi n}\;n^ne^{-n}} = 1.

This result is commonly known as Stirling's formula. For a finite n we have

1\cdot2\cdot3\cdots n = \sqrt{2\pi n}\;n^ne^{-n}e^{ω(n)}, \qquad \frac{1}{12\left(n+\frac{1}{2}\right)} < ω(n) < \frac{1}{12n};

that is, ω(n) = \frac{1}{12(n+θ)} with 0 < θ < ½.
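The two-sided inequality e^{1/(12n+6)} < n!/(√(2πn) nⁿe^{-n}) < e^{1/(12n)} can be verified directly; the sketch below (my own, with an arbitrary range of n) does so:

```python
import math

def stirling_ratio(n):
    # n! divided by its Stirling approximation sqrt(2 pi n) n^n e^{-n}
    return math.factorial(n) / (math.sqrt(2 * math.pi * n) * n ** n * math.exp(-n))

checks = []
for n in range(1, 15):
    lo = math.exp(1 / (12 * n + 6))
    hi = math.exp(1 / (12 * n))
    checks.append(lo < stirling_ratio(n) < hi)
```

Already at n = 1 the ratio e/√(2π) ≈ 1.0844 falls strictly between the bounds, and the interval tightens rapidly as n grows.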
The expression

\sqrt{2\pi n}\;n^ne^{-n}

is thus an approximate value of the factorial 1·2·3···n for large n, in the sense that the ratio of the two is near to 1; that is, the relative error is small. On the contrary, the absolute error will be arbitrarily large for large n, but this is irrelevant when Stirling's approximation is applied to quotients of factorials. In this connection it is useful to derive two further inequalities. Let m < n; we have, then,

F_m(u) - F_n(u) = \sum_{k=m}^{n-1}\frac{1}{(k+u)(k+1-u)},

and further, supposing 0 < u < ½,

F_m(u) - F_n(u) < \sum_{k=m}^{n-1}\frac{1}{k(k+1)} = \frac{1}{m} - \frac{1}{n},

F_m(u) - F_n(u) > \sum_{k=m}^{n-1}\frac{1}{\left(k+\frac{1}{2}\right)\left(k+\frac{3}{2}\right)} = \frac{1}{m+\frac{1}{2}} - \frac{1}{n+\frac{1}{2}}.

Hence,

ω(m) - ω(n) < \frac{1}{12m} - \frac{1}{12n}, \qquad ω(m) - ω(n) > \frac{1}{12\left(m+\frac{1}{2}\right)} - \frac{1}{12\left(n+\frac{1}{2}\right)},

and, if l is a third arbitrary positive integer,

ω(m) + ω(l) - ω(n) < \frac{1}{12m} + \frac{1}{12l} - \frac{1}{12n},

ω(m) + ω(l) - ω(n) > \frac{1}{12\left(m+\frac{1}{2}\right)} + \frac{1}{12\left(l+\frac{1}{2}\right)} - \frac{1}{12\left(n+\frac{1}{2}\right)}.

3. Some Definite Integrals. The value of the important definite integral

\int_0^\infty e^{-t^2}dt
can be found in various ways. One of the simplest is the following: Let, in general,

J_n = \int_0^\infty e^{-t^2}t^n\,dt,

where n is an arbitrary integer ≥ 0. Integrating by parts one can easily establish the recurrence relation

J_n = \frac{n-1}{2}J_{n-2},

whence

J_{2m} = \frac{1\cdot3\cdot5\cdots(2m-1)}{2^m}J_0, \qquad J_{2m+1} = \frac{1\cdot2\cdot3\cdots m}{2}.

On the other hand,

J_{n+1} + 2λJ_n + λ^2J_{n-1} = \int_0^\infty e^{-t^2}t^{n-1}(t+λ)^2\,dt,

which shows that

J_{n+1} + 2λJ_n + λ^2J_{n-1} > 0

for all real λ. Hence, the roots of the polynomial in the left member are imaginary, and this implies

J_n^2 < J_{n+1}J_{n-1}.

Taking n = 2m and n = 2m + 1 and using the preceding expressions for J_{2m} and J_{2m+1}, we find

\frac{2\cdot4\cdot6\cdots2m}{1\cdot3\cdot5\cdots(2m-1)}\cdot\frac{1}{\sqrt{4m+2}} < J_0 < \frac{2\cdot4\cdot6\cdots2m}{1\cdot3\cdot5\cdots(2m-1)}\cdot\frac{1}{2\sqrt{m}}.

But

\lim \frac{2\cdot4\cdot6\cdots2m}{1\cdot3\cdot5\cdots(2m-1)}\cdot\frac{1}{\sqrt{m}} = \sqrt{\pi}, \qquad m\to\infty;

hence

J_0 = \int_0^\infty e^{-t^2}dt = \tfrac{1}{2}\sqrt{\pi}.

Here substituting t = \sqrt{a}\,u, where a is a positive parameter, we get

\int_0^\infty e^{-au^2}du = \frac{1}{2}\sqrt{\frac{\pi}{a}}.
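The recurrence J_n = ((n-1)/2) J_{n-2} and the value J_0 = ½√π are easy to confirm numerically; the following sketch (my own illustration) evaluates J_n by Simpson's rule on [0, 10], where the tail of e^{-t²} is negligible:

```python
import math

def J(n, hi=10.0, steps=20000):
    # numerical value of the integral of e^{-t^2} t^n over [0, infinity)
    h = hi / steps
    g = lambda t: math.exp(-t * t) * t ** n
    s = g(0) + g(hi)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * g(k * h)
    return s * h / 3

j0, j1, j2, j3, j4 = (J(n) for n in range(5))
```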
As a generalization of the last integral we may consider the following one:

V = \int_0^\infty e^{-au^2}\cos bu\,du.

The simplest way to find the value of this integral is to take the derivative

\frac{dV}{db} = -\int_0^\infty e^{-au^2}\sin bu\cdot u\,du

and transform the right member by partial integration. The result is

\frac{dV}{db} = -\frac{b}{2a}V,

or

d\left(Ve^{\frac{b^2}{4a}}\right) = 0,

whence

V = Ce^{-\frac{b^2}{4a}}.

To determine the constant C, take b = 0; then

C = (V)_{b=0} = \int_0^\infty e^{-au^2}du = \frac{1}{2}\sqrt{\frac{\pi}{a}},

so that finally

\int_0^\infty e^{-au^2}\cos bu\,du = \frac{1}{2}\sqrt{\frac{\pi}{a}}\,e^{-\frac{b^2}{4a}}.

The equivalent form of this integral, the integrand being even, is as follows:

\int_{-\infty}^{+\infty} e^{-au^2}\cos bu\,du = \sqrt{\frac{\pi}{a}}\,e^{-\frac{b^2}{4a}}.
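The closed form ∫₀^∞ e^{-au²} cos(bu) du = ½√(π/a) e^{-b²/4a} can be checked numerically; the parameters a = 2, b = 3 below are arbitrary sample values (my own check, not from the text):

```python
import math

a_, b_ = 2.0, 3.0

def integrand(u):
    return math.exp(-a_ * u * u) * math.cos(b_ * u)

# Simpson's rule on [0, 12]; beyond that the integrand is negligible
hi, steps = 12.0, 40000
h = hi / steps
s = integrand(0) + integrand(hi)
for k in range(1, steps):
    s += (4 if k % 2 else 2) * integrand(k * h)
numeric = s * h / 3

closed_form = 0.5 * math.sqrt(math.pi / a_) * math.exp(-b_ * b_ / (4 * a_))
```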
APPENDIX II

METHOD OF MOMENTS AND ITS APPLICATIONS

1. Introductory Remarks. To prove the fundamental limit theorem Tshebysheff devised an ingenious method, known as the "method of moments," which later was completed and simplified by one of the most prominent among Tshebysheff's disciples, the late Markoff. The simplicity and elegance inherent in this method of moments make it advisable to present in this Appendix a brief exposition of it.

The distribution of a mass spread over a given interval (a, b) may be characterized by a never decreasing function φ(x), defined in (a, b) and varying from φ(a) = 0 to φ(b) = m_0, where m_0 is the total mass contained in (a, b). Since φ(x) is never decreasing, for any particular point x_0, both the limits

\lim φ(x_0 - ε) = φ(x_0 - 0), \qquad \lim φ(x_0 + ε) = φ(x_0 + 0)

exist when a positive number ε tends to 0. Evidently

φ(x_0 - 0) ≤ φ(x_0) ≤ φ(x_0 + 0).

If

φ(x_0 - 0) = φ(x_0 + 0) = φ(x_0),

then x_0 is a "point of continuity" of φ(x). In case

φ(x_0 + 0) > φ(x_0 - 0),

x_0 is a point of discontinuity of φ(x), and the positive difference

φ(x_0 + 0) - φ(x_0 - 0)

may be considered as a mass concentrated at the point x_0. In all cases φ(x_0 - 0) is the total mass on the segment (a, x_0) excluding the end point x_0, whereas φ(x_0 + 0) is the mass spread over the same segment including the point x_0. The points of discontinuity, if there are any, form an enumerable set, whence it follows that in any part of the interval (a, b) there are points of continuity. If for any sufficiently small positive ε

φ(x_0 + ε) > φ(x_0 - ε),

x_0 is called a "point of increase" of φ(x). There is at least one point of increase and there might be infinitely many. For instance, if
φ(x) = 0 \quad \text{for } a ≤ x ≤ c; \qquad φ(x) = m_0 \quad \text{for } c < x ≤ b,

then c is the only point of increase. On the other hand, for

φ(x) = m_0\,\frac{x-a}{b-a}

every point of the interval (a, b) is a point of increase. In case of a finite number of points of increase the whole mass is concentrated in these points and the distribution function φ(x) is a step function with a finite number of steps.

Stieltjes' integrals

\int_a^b dφ(x) = m_0, \qquad \int_a^b x^i\,dφ(x) = m_i

represent respectively the whole mass m_0 and its moments about the origin of the order 1, 2, ... i. When the distribution function φ(x) is given, the moments m_0, m_1, m_2, ... m_i (provided they exist) are determined. If, however, these moments are given and are known to originate in a certain distribution of a mass over (a, b), the question may be raised: with what error can the mass spread over an interval (a, x) be determined by these data? In other words, given m_0, m_1, m_2, ... m_i, what are the precise upper and lower bounds of a mass spread over an interval (a, x)? Such is the question raised by Tshebysheff in a short but important article "Sur les valeurs limites des intégrales" (1874).¹ The results contained in this article, including very remarkable inequalities which indeed are of fundamental importance, are given without proof. The first proof of these results and the complete solution of the question raised by Tshebysheff was given by Markoff in his eminent thesis "On some applications of algebraic continued fractions" (St. Petersburg, 1884), written in Russian and therefore comparatively little known.

Suppose that P_i is the limit of the error with which we can evaluate the mass belonging to the interval (a, x) or, which is almost the same, the value of φ(x), when the moments m_0, m_1, m_2, ... m_i are given. If, with i tending to infinity, P_i tends to 0 for any given x, then the distribution function φ(x) will be completely determined by giving all the moments. One case of this kind, that in which

m_0 = 1, \qquad m_{2k-1} = 0, \qquad m_{2k} = \frac{1\cdot3\cdot5\cdots(2k-1)}{2^k},

¹ Jour. Liouville, Ser. 2, T. XIX, 1874.
was considered by Tshebysheff in a later paper, "Sur deux théorèmes relatifs aux probabilités" (1887),¹ devoted to the application of his method to the proof of the limit theorem under certain rather general conditions. The success of this proof is due to the fact that the moments, as given above, uniquely determine the normal distribution

φ(x) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{x} e^{-u^2}\,du

of the mass 1 over the infinite interval (-∞, +∞).
After these preliminary remarks and before proceeding to an orderly exposition of the method of moments, it is advisable to devote a few pages to continued fractions associated with power series, for continued frac tions are the natural tools in questions of the kind we shall consider.
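That the distribution (1/√π) e^{-u²} really has the moments quoted above — m_{2k-1} = 0 and m_{2k} = 1·3·5···(2k-1)/2^k — can be confirmed numerically. The following sketch is my own check, not part of the text:

```python
import math

def moment(i, hi=10.0, steps=40000):
    # i-th moment of the density e^{-u^2}/sqrt(pi), by Simpson's rule
    h = 2 * hi / steps
    g = lambda u: u ** i * math.exp(-u * u) / math.sqrt(math.pi)
    s = g(-hi) + g(hi)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * g(-hi + k * h)
    return s * h / 3

def odd_product_over_2k(k):
    out = 1.0
    for j in range(1, 2 * k, 2):   # 1 * 3 * 5 * ... * (2k-1)
        out *= j
    return out / 2 ** k

m1, m2, m3, m4, m6 = moment(1), moment(2), moment(3), moment(4), moment(6)
```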
2. Continued Fractions Associated with Power Series. Let

φ(z) = \frac{A_1}{z^{a_1}} + \frac{A_2}{z^{a_2}} + \frac{A_3}{z^{a_3}} + \cdots

be a power series arranged according to decreasing powers of z, where the smallest exponent a_1 is positive. We consider this power series from a purely formal point of view, merely as a means to form a sequence of rational fractions, and we need not be concerned about its convergence. Evidently 1/φ(z) can again be expanded into a power series arranged according to decreasing powers of z. Let its integral part, containing non-negative powers of z, be denoted by q_1(z), and let the fractional part, containing negative powers of z,

\frac{B_1}{z^{β_1}} + \frac{B_2}{z^{β_2}} + \frac{B_3}{z^{β_3}} + \cdots,

be denoted by -ψ_1(z), so that

\frac{1}{φ(z)} = q_1(z) - ψ_1(z).

In the same way 1/ψ_1(z) can be represented thus:

\frac{1}{ψ_1(z)} = q_2(z) - ψ_2(z),
¹ Oeuvres complètes de P. L. Tshebysheff, Tome 2, p. 482.

where q_2(z) is a polynomial and

ψ_2(z) = \frac{C_1}{z^{γ_1}} + \frac{C_2}{z^{γ_2}} + \frac{C_3}{z^{γ_3}} + \cdots

is a power series containing only negative powers of z. Further, we shall have

\frac{1}{ψ_2(z)} = q_3(z) - ψ_3(z)

with a certain polynomial q_3(z) and a power series

ψ_3(z) = \frac{D_1}{z^{δ_1}} + \frac{D_2}{z^{δ_2}} + \frac{D_3}{z^{δ_3}} + \cdots

containing negative powers of z, and so on. Thus we are led to consider a continued fraction (finite or infinite)

(1) \cfrac{1}{q_1 - \cfrac{1}{q_2 - \cfrac{1}{q_3 - \cdots}}}

associated with φ(z) in the sense that the formal expansion of

\frac{1}{q_1 - ψ_1(z)}

into a power series will reproduce exactly φ(z). The continued fraction (1) is again considered from a purely formal standpoint, as a mere abbreviation of the sequence of its convergents

\frac{P_1}{Q_1},\ \frac{P_2}{Q_2},\ \frac{P_3}{Q_3},\ \ldots

The polynomials P_1, P_2, P_3, ...; Q_1, Q_2, Q_3, ... can be found step by step by the recurrence relations

(2) P_i = q_iP_{i-1} - P_{i-2}, \quad Q_i = q_iQ_{i-1} - Q_{i-2} \qquad (i = 2, 3, 4, \ldots);
    P_1 = 1, \quad P_0 = 0; \qquad Q_1 = q_1, \quad Q_0 = 1,
from which the following identical relation follows:

(3) P_iQ_{i-1} - P_{i-1}Q_i = 1,

showing that all fractions P_i(z)/Q_i(z) are irreducible.
Evidently degrees
of consecutive denominators of
convergents form an increasing sequence and the degree of least i.
Q;(z) is at
Since
=
1
P;(q,+l  4>>+I(z))  P;1 Q;(qi+1  4>>+1(z))  Q'1
=
=
P;+l  P;lf>;+J(z) Q'+1  Q;lf>;+1(z)
we can write
q,(z)
=
P1+1  P,q,,+l(z) Q;+1  Q;lf>;+J(z)
in the sense that the formal development of the righthand member is identical with
q,(z).
By virtue of relation
p,
4> (z) Q, The degree of
(3)
1 =
Q,(Q>+I  Q;l/>i+1)"
Q; being �� and that of Qi+1 being �i+1, the expansion of Q;(Q;+1  Q;lf>i+l)
in a .series of descending powers of z begins with the power
z>tH·1+�.
Hence,
p,
q,(z)  Q, and, since
M
=
z>.M>1+1 +
�i+J ?; �; + 1, the expansion of q,(z)  p, Q;
begins with a term of the order characterizes the convergents
2�, + 1 in 1/z at least. This property P;/Q; completely. For let P/Q be a
rational fraction whose denominator is of the nth degree and such that in the expansion of
q,(z)  p
Q
361
APPENDIX II
the lowest term is of the order 2n + 1 in 1 Iz at least. Then PIQ coincides with one of the convergents to the continued fraction (1). Let i be determined by the condition
Then 4> (
z
)
P;  Q; P
4J(z) Q
M zX.+X<+�
=
=
+
·
·
·
N + z 2n+ l
whence in the expansion of
the lowest term will be of degree degree of
2n + 1 or>.; +
>•+t in
1/z.
Hence, the
PQ; P;Q in
z is
not greater than both the numbers
>.;n1
and
which are both negative while
is a polynomial.
Hence, identically, PQ; P;Q
=
0
or
which proves the statement. 3. Continued Fraction Associated with
ib d'P(x)X
Let
·
a Z
IP(X)
be a never
decreasing function characterizing the distribution of a mass over an interval (a, b). The moments of this distribution up to the moment of the order 2n are represented by integrals mo
=
.C d'P(x),
m1
=
J:bxd'P(x), m2
=
.Cx2d1P(X),
•
·
•
m2n
=
J:bx2nd'P(x).
INTRODUCTION TO MATHEMATICAL PROBABILITY
362 Let
If rp(x) has not less than
n + 1 points of increase, we must have Lll > o, . . . Ll,. > o,
.do > 0,
and conversely, if these inequalities are satisfied, rp(x) has at least points of increase. To prove this, consider the quadratic form
in
n + 1 variables to, t1, ,P
=
.
.
•
n + 1
t,.. Evidently
};mi+;t,t;
(i, j
=
0, 1, 2, . ..n)
so that � .. is the determinant of ,P and �o, Ll 1, � .. 1 its principal minors. The form ,P cannot vanish unless to 0. t1 t,. For if x 0, we must have also � is a point of increase and q, •
•
•
=
=
·
·
·
=
=
=
=
. J:e+•(to + t1x + f• for an arbitrary positive
(to + t11/ +
'
•
•
E1
·
·
+ t,.x")2drp(x)
·
=
0
whence by the mean value theorem
J:e+•drp(X) ee
+ t,.f/")2
=
0 (� 
E
< 1/ < � + E)
or
to + tlf/ + . . . + t,..,.. = 0 because
J:e+•d�p(x) ee Letting
E
>
0.
converge to O, we conclude to + lt � +
at any point of increase.
·
·
·
+ t,.�"
=
Since there are at least
0
n + 1 points of increase
the equation to + t1x + would have at least
·
·
·
+ t,.x"
=
0
n + 1 roots and that necessitates to
=
t1
=
·
·
·
=
t,.
=
0.
363
APPENDIX II
Hence, the quadratic form !/>, which is never negative, can vanish only if all its variables vanish; that is, !/> is a definite positive form.
determinant � .. and all its principal minors �nt, �..2, positive, which proves the first statement.
•
•
Its
�0 must be
•
Suppose the conditions
At> 0, ... A,.> 0
Ao> 0, satisfied and let
have 8 < n+ 1 points of increase.
cp(x)
Then the
integral representing !/> reduces to a finite sum
+ t..��)2 + + t..��)2 + P2(to + t1�2 + + + p ( to + h�. + + t,.�:)2 ·
·
denoting by p1, p 2 ,
•
•
•
·
·
·
,
·
•
·
p. masses concentrated in the 8 points of
increase �1, �2, �.. Now, since 8� n constants t0, t1, all zero, can be determined by the system of equations •
·
·
•
+ t..�� to + h�t + . . + to + tt�2 . . . + t..��
to + tt�• +
·
·
+ t..�:
·
=
=
=
•
•
•
t,., not
0 0 0.
Thus !/> vanishes when not all variables vanish; hence, its determinant
�..
=
0, contrary to hypothesis.
From now on we shall assume that increase.
cp(x)
has at least n +
1
points of
The integral
rb dcp(x) Ja Z X can be expanded into a formal power series of
rb dcp(x) Ja z X 
=
1/z,
thus
m2n mo mt m2 ... + 2 +t + + + 2 + z3 zn z z
and this power series can be converted into a continued fraction as explained in Sec. 2. Let
Pt P 2 Q/ Q21 be the first n +
1
•
P,. P..+t ' Q,. Qn+J
convergents to that continued fraction.
degrees of their denominators are, respectively,
I say that the
1, 2, 3, ... n+ 1.
Since these degrees form an increasing sequence, it suffices to show that there exists a convergent with the denominator of a given degree
8�n+l. This convergent PjQ is completely determined by the condition that in a formal expansion of the difference
INTRODUCTION TO MATHEMATICAL PROBABILITY
364
ibdlp(X)X
_
1/z, terms involving 1/z, 1/z2,
into a power series of absent.
!:_ Q
a Z
•
•
•
1/z2" are
This is the same as to say that in the expansion of
ibdfP(X)X P(z)
Q(z)
a Z
1/z, 1/z2,
there are no terms involving
•
•
•
1jz•.
The preceding expres
sion can be written thus:
ibQ(x)dfP(x) ibQ(z) Q(x)dlp(x) P(z)  X  X +
a
a
Z
Z
=
__!._ z•+l
+
Since
i bQ(z) XQ(x)dfP(x) P(z) a
is a polynomial in
z,
it must vanish identically.
P(z}
(4) To determine
Z
That gives
ibQ(z) XQ(x)dfP(X).
=
a
Z
Q(z) we must express the conditions that in the expansion of
ibQ�)�fP;X) terms in 8
1/z, 1/z2,
•
•
•
ljz•
vanish.
These conditions are equivalent to
relations
which in turn amount to the single requirement that
(6) 8(x) of degree � 8 1. Q(z) of degree 8 satisfying con ditions (5), and P(z) is determined by equation (4), then P(z)/Q(z) is a
for an arbitrary polynomial
Conversely, if there exists a polynomial
convergent whose denominator is of degree
ibdfP(X)
_
z x
lacks the terms in
1/z, 1/z2,
•
•
•
1/z2•.
8.
P(z) Q(z)
For then the expansion of
365
APPENDIX II
Let
Q(z) Then equations
=
lo + l1z + l,z2 +
·
·
+ l,_1z•1 + z•.
·
(5) become
molo + m1l1 + m2l2 + m1lo + m2l1 + mal2 +
·
·
·
·
·
·
+ m,_1l,_, + m, + m,la1 + m,+l
ma1lo + m.l1 + ma+1l2 + . . . + m2a2la1 + m2a1
=
=
=
0 0 0.
This system of linear equations determines completely the coefficients lo, l1, . . . la1 since its determinant �•1 > 0. The existence of a convergent with the denominator of degree 8�n+l being established, it follows that the denominator of the sth convergent P,jQ, is exactly of degree 8. The denominator Q. is determined, except for a constant factor, and can be presented in the form:
Q,
c =

�a1
1 z z2 mo m1m2 m1 m2ma
•
•
·
•
• •
•
•
•
z• m, ma+1
A remarkable result follows from equation (6) by taking Q 8
=
Q,.; namely, if
(7) while
(8 � n). In the general relation
Q.
=
q.Qa1  Q,_ll
the polynomial q, must be of the first degree
q,
=
a,z + fJ,,
which shows that the continued fraction associated with
ibdrp(x)
zx
=
Q, and
366
INTRODUCTION TO MATHEMATICAL PROBABILITY
has the form 1
a1Z + �1
1
a:aZ + ,.,R 2
1
aaz +�a:=
The next question is, how to determine the constants plying both members of the equation
a,
and �..
Multi
(s � 2) Q,_2drp(z), (7), we get
by
integrating between limits
0
=
a
and b, and taking into account
a.J:zQ,_1Q,_zdrp(z)  .CQL2drp(z).
On the other hand, the highest terms in
Q,_l and Q,_2 are
Hence,
zQ,_z
.
1
=
Q,_1 + 1/1 ao1
where ![I is a polynomial of degree �s

2.
Referring to equation (6),
we have
and consequently
(8)
a, aa1
J:bQ;_2drp(z)
=
J:bQ;_ldrp(z).
mz,.; how m1, a, can be found? Evidently a1 1/mo. Fur thermore, Qo = 1 and Q1 is completely determined given mo and m1. Relation (8) determines a2, and Q2 will be completely determined given mo, m1, m2, m8. The same relation again determines a8, and Q3 will be determined giyen mo, m1, ... m5. Proceeding in the same way, we conclude that, given mo, m1, m2, ... m2,., all the polynomials' Suppose that the following moments are given: mo,
many of the coefficients
as
well as constants
•
=
.
.
367
APPENDIX II
can be determined.
It is important to note that all these constants are
positive. Proceeding in a similar manner, the following expression can be found
{3,
=
J:b zQ!1d111(z).
a,
(b Ja Q:_ldiP(Z)
It follows that constants
f311 f3s1
•
•
•
f3n For if B
are determined by our data, but not f3n+l·
=
n + 1, the integral
J:bzQ�diP(z) can be expressed as a linear function of mo, coefficients.
m1,
•
•
•
m2n+l with known
But msn+l is not included among our data; hence, f3n+1 ·
cannot be determined.
4.. Properties of Polynomials Q,.
Q,(z)
=
0
Theorem.
Roots of the equat{on
(s ;;!! n)
are real, simple, ancl. contained within the interval (a, b). Proof. Let Q,(z) change its sign r < B times when z passes through points z1, z2, Zr contained strictly within (a, b). Setting •
•
•
8(z)
=
(z  Zt)(Z  Z2)
•
•
•
(z  Zr)
the product
B(z)Q,(z) does not change its sign when
z increases from a to b.
J:8(z)Q,(z)d!l'(z)
=
However,
0,
and this necessitates that
B(z)Q,(z) or
Q,(z) vanishes in all points of increase of !II(Z). But this is impossible, n + 1 points of increase,· whereas the degree B of Q, does not exceed n. Consequently, Q,(z) changes its sign in the interval (a, b) exactly 8 times and has all its roots real, simple, and located within (a, b).
since by hypothesis there are at lea,st
It follows from this theorem that the convergent
Pn
Q,.
368
INTRODUCTION TO MATHEMATICAL PROBABILITY
can be resolved into a sum of simple fractions as follows:
P,.(z) =�+�+ . Q,.(z) z  Zt z  Zs
(9)
. z,. are
. +� z  z,.
.
roots of the equation
Q,.(z)
=
0
and in general
,. P�(zr.). A = Q,.(zr.) The right member of coefficient of
1/z"'
can be expanded into power series of
(9)
being
1/z,
the
"
�A..z�1•
a=l
By the property of convergents we must have the following equations: n
�A..
=
mo
=
m1
a�l "
�A ..z..
a=l n
�A..z!n1 = msnl•
a1
These equations can be condensed into one, n
�A..T(z..) = J:T(z)drp(z)
(10)
a=l
which should hold for any polynomial Let us take for
T(z)
T(z)
of degree � 2n

1.
a polynomial of degree 2n  2:
T(z) 
[ (z
Q,.(z) z..)Q�(z..)

]2
•
Then
T(z..)
=
1,
T(zfl)
·=
0
and consequently, by virtue of equation
A.. = Thus constants At, A2,
fb[ (z .
.
.
.
(10),
]2
Q,.(z) drp(z) > 0. z.. )Q�(z..)
_
A,.
if
are. all positive, which shows that
.
P,.(zr.)
369
APPENDIX II
has the same sign as
Q�(zr.).
Now in the sequence
Q�(zt), Q�(z2), ...Q�(z,.) any two consecutive terms are of opposite signs.
The same being true of
the sequence
P,.(zt), P,.(z2), ...P,.(z,.), P,.(z)
it follows that the roots of
are all simple, real, and located in the
intervals
(Zt1 Z2); (z2, Za); . ..(Zn11 z,.). Finally, we shall prove the following theorem:
Theorem.
For any real x
Q�(x)Qnt(X)  Q�_1(x)Q,.(x) is a positive number. Proof. From the relations
Q.(z) Q.(x)
=
=
(a.z + {3.)Qst(Z)  Q._2(z) (a.x + {3.)Q,_t(x)  Qa2(x)
it follows that
Q.(z)Q.t(X)  Q.(x)Qat(Z) =
zx
a.Q._1(z)Q._1(x) + +
whence, takings
Q .t (z) Q .2 (x)

Q. _t (x)Qa 2(z)
zx
1, 2, 3, ... nand adding results,
=
•=1
It suffices now to take z
=
x to arrive at the identity. n
Q�(x)Q,._t(X)  Q:,_1(x)Q,.(x) Since
Q0
=
1 and
a.
>
=
! a.Q._ (x) 1
2•
•1
0, it is evident that
Q�(x)Q,._t (X)  Q:,_1(x)Q,.(x)
>
0
for every real x.
5. Equivalent Point Distributions. If the whole mass can be concentrated in a finite number of points so as to produce the same $l$ first moments as a given distribution, we have an "equivalent point distribution" in respect to the $l$ first moments. In what follows we shall suppose that the whole mass is spread over the infinite interval $(-\infty, \infty)$ and that the given moments, originating in a distribution with at least $n + 1$ points of increase, are

$$m_0,\ m_1,\ m_2,\ \ldots\ m_{2n}.$$
The question is: Is it possible to find an equivalent point distribution where the whole mass is concentrated in $n + 1$ points? Let the unknown points be $\xi_1, \xi_2, \ldots \xi_{n+1}$ and the masses concentrated in them $A_1, A_2, \ldots A_{n+1}$. Evidently the question will be answered in the affirmative if the system of $2n + 1$ equations

$$\sum_{a=1}^{n+1} A_a = m_0,\qquad \sum_{a=1}^{n+1} A_a\xi_a = m_1,\qquad \sum_{a=1}^{n+1} A_a\xi_a^2 = m_2,\qquad \ldots\qquad \sum_{a=1}^{n+1} A_a\xi_a^{2n} = m_{2n}\qquad(A)$$

can be satisfied by real numbers $\xi_1, \xi_2, \ldots \xi_{n+1}$; $A_1, A_2, \ldots A_{n+1}$, the last $n + 1$ numbers being positive. The number of unknowns being greater by one unit than the number of equations, we can introduce the additional requirement that one of the numbers $\xi_1, \xi_2, \ldots \xi_{n+1}$ should be equal to a given real number $v$. The system (A) may be replaced by the single requirement that the equation

$$\sum_{a=1}^{n+1} A_aT(\xi_a) = \int_{-\infty}^{\infty} T(x)\,d\varphi(x)\qquad(11)$$

shall hold for any polynomial $T(x)$ of degree $\leq 2n$. Let $Q(x)$ be the polynomial of degree $n + 1$ having roots $\xi_1, \xi_2, \ldots \xi_{n+1}$, and let $\theta(x)$ be an arbitrary polynomial of degree $\leq n - 1$. Then we can apply equation (11) to

$$T(x) = \theta(x)Q(x).$$
Since $Q(\xi_a) = 0$, we shall have

$$\int_{-\infty}^{\infty} \theta(x)Q(x)\,d\varphi(x) = 0\qquad(12)$$

for an arbitrary polynomial $\theta(x)$ of degree $\leq n - 1$. Presently we shall see that requirement (12) together with $Q(v) = 0$ determines $Q(x)$, save for a constant factor, if $Q_n(v) \neq 0$. Dividing $Q(x)$ by $Q_n(x)$, we have identically

$$Q(x) = (\lambda x + \mu)Q_n(x) + R_{n-1}(x),$$

where $R_{n-1}(x)$ is a polynomial of degree $\leq n - 1$. If $\theta(x)$ is an arbitrary polynomial of degree $\leq n - 2$, then $(\lambda x + \mu)\theta(x)$ will be of degree $\leq n - 1$. Hence

$$\int (\lambda x + \mu)\theta(x)Q_n(x)\,d\varphi(x) = 0$$

by (6), and (12) shows that

$$\int_{-\infty}^{\infty} \theta(x)R_{n-1}(x)\,d\varphi(x) = 0$$

for an arbitrary polynomial $\theta(x)$ of degree $\leq n - 2$. The last requirement shows that $R_{n-1}(x)$ differs from $Q_{n-1}(x)$ by a constant factor. Since the highest coefficient in $Q(x)$ is arbitrary, we can set

$$R_{n-1}(x) = -Q_{n-1}(x).$$

In the equation

$$Q(x) = (\lambda x + \mu)Q_n(x) - Q_{n-1}(x)$$

it remains to determine the constants $\lambda$ and $\mu$. Multiplying both members by $Q_{n-1}(x)\,d\varphi(x)$ and integrating between $-\infty$ and $\infty$, we get

$$\lambda\int_{-\infty}^{\infty} xQ_{n-1}Q_n\,d\varphi(x) = \int_{-\infty}^{\infty} Q_{n-1}^2\,d\varphi(x).$$

But

$$\int_{-\infty}^{\infty} xQ_{n-1}Q_n\,d\varphi(x) = \frac{1}{\alpha_{n+1}}\int_{-\infty}^{\infty} Q_{n-1}^2\,d\varphi(x),$$

whence $\lambda = \alpha_{n+1}$. The equation

$$Q(v) = (\alpha_{n+1}v + \mu)Q_n(v) - Q_{n-1}(v) = 0$$

serves to determine $\mu$ if $Q_n(v) \neq 0$. The final expression of $Q(x)$ will be

$$Q(x) = \left(\alpha_{n+1}(x - v) + \frac{Q_{n-1}(v)}{Q_n(v)}\right)Q_n(x) - Q_{n-1}(x).$$
Owing to the recurrence relations

$$Q_2 = (\alpha_2x + \beta_2)Q_1 - Q_0;\qquad Q_3 = (\alpha_3x + \beta_3)Q_2 - Q_1;\qquad \ldots\qquad Q_n = (\alpha_nx + \beta_n)Q_{n-1} - Q_{n-2},$$

it is evident that

$$Q,\ Q_n,\ Q_{n-1},\ \ldots\ Q_1,\ Q_0 = 1$$

is a Sturm series. For $x = -\infty$ it contains $n + 1$ variations and for $x = +\infty$ only permanences. It follows that the equation

$$Q(x) = 0$$

has exactly $n + 1$ distinct real roots and among them $v$. Thus, if the problem is solvable, the numbers $\xi_1, \xi_2, \ldots \xi_{n+1}$ are determined as roots of $Q(x) = 0$. Furthermore, all unknowns $A_a$ will be positive. In fact, from equation (11) it follows that

$$A_a = \int_{-\infty}^{\infty} \left[\frac{Q(x)}{(x - \xi_a)Q'(\xi_a)}\right]^2 d\varphi(x) > 0.$$

Now we must show that the constants $A_a$ can actually be determined so as to satisfy equations (A). To this end let
$$P(x) = \int_{-\infty}^{\infty} \frac{Q(x) - Q(z)}{x - z}\,d\varphi(z) = \left(\alpha_{n+1}(x - v) + \frac{Q_{n-1}(v)}{Q_n(v)}\right)P_n(x) - P_{n-1}(x).$$

Then

$$Q(x)\int_{-\infty}^{\infty} \frac{d\varphi(z)}{x - z} - P(x) = \int_{-\infty}^{\infty} \frac{Q(z)}{x - z}\,d\varphi(z)$$

and, on account of (12), the expansion of the right member into a power series of $1/x$ lacks the terms in $1/x, 1/x^2, \ldots 1/x^n$. Hence, the expansion of

$$\int_{-\infty}^{\infty} \frac{d\varphi(z)}{x - z} - \frac{P(x)}{Q(x)}$$

lacks the terms in $1/x, 1/x^2, \ldots 1/x^{2n+1}$; that is,

$$\frac{P(x)}{Q(x)} = \frac{m_0}{x} + \frac{m_1}{x^2} + \cdots + \frac{m_{2n}}{x^{2n+1}} + \cdots.$$

On the other hand, resolving into simple fractions,

$$\frac{P(x)}{Q(x)} = \frac{A_1}{x - \xi_1} + \frac{A_2}{x - \xi_2} + \cdots + \frac{A_{n+1}}{x - \xi_{n+1}}.$$

Expanding the right member into a power series of $1/x$ and comparing with the preceding expansion, we obtain the system (A). By the previous remark all constants $A_a$ are positive. Thus, there exists a point distribution in which masses concentrated in $n + 1$ points produce the moments $m_0, m_1, \ldots m_{2n}$. One of these points $v$ may be taken arbitrarily, with the condition $Q_n(v) \neq 0$ being observed, however.
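In modern language, the construction of this section is closely related to Gauss quadrature: when no point is prescribed, the $n$ nodes and weights of the Gauss rule for the weight $d\varphi$ form a point distribution reproducing the first $2n$ moments. The sketch below (our own illustration, for the weight $e^{-x^2}$ via the Gauss–Hermite rule) checks this moment-matching property numerically.

```python
# The n-point Gauss-Hermite rule as an equivalent point distribution for the
# weight e^{-x^2}: its masses at the nodes reproduce the moments m_0..m_{2n-1}.
import numpy as np
from math import gamma, isclose

n = 5
nodes, weights = np.polynomial.hermite.hermgauss(n)

for k in range(2 * n):
    point_moment = float(np.sum(weights * nodes**k))
    # exact moment of e^{-x^2}: zero for odd k, Gamma((k+1)/2) for even k
    exact = 0.0 if k % 2 else gamma((k + 1) / 2)
    assert isclose(point_moment, exact, rel_tol=1e-9, abs_tol=1e-9)
```

The text's construction is slightly more general: it uses $n + 1$ points, matches $2n + 1$ moments, and leaves one point $v$ at our disposal.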
6. Tshebysheff's Inequalities. In a note referred to in the introduction, Tshebysheff made known certain inequalities of the utmost importance for the theory we are concerned with. The first very ingenious proof of them was given by Markoff in 1884 and, by a remarkable coincidence, the same proof was rediscovered almost at the same time by Stieltjes. A few years later, Stieltjes found another totally different proof; and it is this second proof that we shall follow.

Let $\varphi(x)$ be a distribution function of a mass spread over the interval $(-\infty, \infty)$. Supposing that the moment of order $i$,

$$\int_{-\infty}^{\infty} x^i\,d\varphi(x) = m_i,$$

exists, we shall show first that

$$\lim\ l^i[m_0 - \varphi(l)] = 0,\qquad \lim\ l^i\varphi(-l) = 0$$

when $l$ tends to $+\infty$. For

$$\int_l^{\infty} x^i\,d\varphi(x) \geq l^i\int_l^{\infty} d\varphi(x) = l^i[\varphi(+\infty) - \varphi(l)] = l^i[m_0 - \varphi(l)],$$

and similarly

$$\int_{-\infty}^{-l} |x|^i\,d\varphi(x) \geq l^i\varphi(-l).$$

Now both integrals converge to $0$ as $l$ tends to $+\infty$; whence both statements follow immediately.
Integrating by parts, we have

$$\int_0^l x^i\,d\varphi(x) = l^i[\varphi(l) - m_0] - i\int_0^l [\varphi(x) - m_0]x^{i-1}\,dx,$$
$$\int_{-l}^0 x^i\,d\varphi(x) = -(-l)^i\varphi(-l) - i\int_{-l}^0 x^{i-1}\varphi(x)\,dx,$$

whence, letting $l$ converge to $+\infty$,

$$m_i = i\int_0^{\infty} [m_0 - \varphi(x)]x^{i-1}\,dx - i\int_{-\infty}^0 x^{i-1}\varphi(x)\,dx.$$

If the same mass $m_0$, with the same moment $m_i$, is spread according to the law characterized by the function $\psi(x)$, we shall have

$$m_i = i\int_0^{\infty} [m_0 - \psi(x)]x^{i-1}\,dx - i\int_{-\infty}^0 x^{i-1}\psi(x)\,dx,$$

whence

$$\int_{-\infty}^{\infty} x^{i-1}[\varphi(x) - \psi(x)]\,dx = 0.\qquad(13)$$
Suppose the moments $m_0, m_1, \ldots m_{2n}$ of the distribution characterized by $\varphi(x)$ are known. Provided $\varphi(x)$ has at least $n + 1$ points of increase, there exists an equivalent point distribution, defined in Sec. 5 and characterized by the step function $\psi(x)$ which can be defined as follows:

$$\psi(x) = 0 \qquad\text{for}\quad -\infty < x < \xi_1$$
$$\psi(x) = A_1 \qquad\text{for}\quad \xi_1 \leq x < \xi_2$$
$$\psi(x) = A_1 + A_2 \qquad\text{for}\quad \xi_2 \leq x < \xi_3$$
$$\cdots$$
$$\psi(x) = A_1 + A_2 + \cdots + A_n \qquad\text{for}\quad \xi_n \leq x < \xi_{n+1}$$
$$\psi(x) = A_1 + A_2 + \cdots + A_n + A_{n+1} \qquad\text{for}\quad \xi_{n+1} \leq x < +\infty,$$

provided the roots $\xi_1, \xi_2, \ldots \xi_{n+1}$ of the equation $Q(x) = 0$ are arranged in increasing order of magnitude.
Equation (13) will hold for $i = 1, 2, 3, \ldots 2n$ or, which is the same, the equation

$$\int_{-\infty}^{\infty} \theta(x)[\varphi(x) - \psi(x)]\,dx = 0\qquad(14)$$

will hold for an arbitrary polynomial $\theta(x)$ of degree $\leq 2n - 1$. The function

$$h(x) = \varphi(x) - \psi(x)$$

in general has ordinary discontinuities. We can prove now that $h(x)$, if not identically equal to $0$ at all points of continuity, changes its sign at least $2n$ times.¹ Suppose, on the contrary, that it changes sign $r < 2n$ times; namely, at the points
$$a_1 < a_2 < \cdots < a_r.$$

Taking

$$\theta(x) = (x - a_1)(x - a_2)\cdots(x - a_r),$$

equation (14) will be satisfied, while the integrand $\theta(x)h(x)$, if not $0$, will be of the same sign, for example, positive. Let $\xi$ be any point of continuity of $h(x)$. If $\xi$ coincides with one of the numbers $a_i$ $(i = 1, 2, \ldots r)$, then $h(a_i) = 0$, since $h(x)$ changes sign at $a_i$. If $\xi$ does not coincide with any one of the numbers $a_1, a_2, \ldots a_r$, then for an arbitrarily small positive $\epsilon$ we must have

$$\int_{\xi - \epsilon}^{\xi + \epsilon} \theta(x)h(x)\,dx = 0.$$

But by continuity $\theta(x)h(x)$ remains above a certain positive number in the interval $(\xi - \epsilon,\ \xi + \epsilon)$ for sufficiently small $\epsilon$, unless $h(\xi) = 0$. Thus, if $h(x)$ does not vanish at all points of continuity (in which case $\varphi(x)$ and $\psi(x)$ do not differ essentially), it must change sign at least $2n$ times. Let us see now where the change of sign can occur. In the intervals $(-\infty, \xi_1)$ and $(\xi_{n+1}, +\infty)$
¹ A function $f(x)$ is said to change sign once in $(a, b)$ if in this interval there exists a point or points $c$ such that, for instance, $f(x) \leq 0$ in $(a, c)$ and $f(x) \geq 0$ in $(c, b)$, equality signs not holding throughout the respective intervals. The change of sign occurs $n$ times if $(a, b)$ can be divided into $n$ intervals in which $f(x)$ changes sign once.
$\varphi(x) - \psi(x)$ evidently cannot change sign. Within each of the intervals $(\xi_{i-1}, \xi_i)$ there can be at most one change of sign, since $\psi(x)$ remains constant there, and $\varphi(x)$ can only increase. The sign may change also at the points of discontinuity of $\psi(x)$; that is, at the points $\xi_1, \xi_2, \ldots \xi_{n+1}$. Altogether, $\varphi(x) - \psi(x)$ cannot change sign more than $2n + 1$ times and not less than $2n$ times. Since $\psi(x) = 0$ for $x < \xi_1$ and $\varphi(\xi_1 - \epsilon)$ is not negative for positive $\epsilon$, we must have

$$\varphi(\xi_1 - \epsilon) - \psi(\xi_1 - \epsilon) \geq 0.$$

Also $\psi(x) = m_0$ for $x > \xi_{n+1}$ and $\varphi(\xi_{n+1} + \epsilon) \leq m_0$, so that

$$\varphi(\xi_{n+1} + \epsilon) - \psi(\xi_{n+1} + \epsilon) \leq 0.$$

At first let us suppose

$$\varphi(\xi_1 - \epsilon) - \psi(\xi_1 - \epsilon) > 0 \qquad\text{and}\qquad \varphi(\xi_{n+1} + \epsilon) - \psi(\xi_{n+1} + \epsilon) < 0.$$

In this case $\varphi(x) - \psi(x)$ must change sign an odd number of times; that is, not less than $2n + 1$ times. Since this cannot happen more than $2n + 1$ times, the number of times $\varphi(x) - \psi(x)$ changes its sign must be exactly $2n + 1$. These changes occur once within each interval $(\xi_{i-1}, \xi_i)$ and in each of the points $\xi_1, \xi_2, \ldots \xi_{n+1}$. When the change of sign occurs in the interval $(\xi_{i-1}, \xi_i)$ where $\psi(x)$ remains constant, because $\varphi(x)$ never decreases, we must have for sufficiently small $\epsilon$

$$\varphi(\xi_i - \epsilon) - \psi(\xi_i - \epsilon) \geq 0.\qquad(15)$$

But the sign changes in passing the point $\xi_i$; therefore,

$$\varphi(\xi_i + \epsilon) - \psi(\xi_i + \epsilon) \leq 0.\qquad(16)$$

The equalities

$$\varphi(\xi_1 - \epsilon) - \psi(\xi_1 - \epsilon) = 0,\qquad \varphi(\xi_{n+1} + \epsilon) - \psi(\xi_{n+1} + \epsilon) = 0$$

cannot both hold for all sufficiently small $\epsilon$. For then there would not be a change of sign at $\xi_1$ and $\xi_{n+1}$, so that the number of changes would not be greater than $2n - 1$, which is impossible. Therefore, let

$$\varphi(\xi_1 - \epsilon) - \psi(\xi_1 - \epsilon) = 0 \qquad\text{and}\qquad \varphi(\xi_{n+1} + \epsilon) - \psi(\xi_{n+1} + \epsilon) < 0.$$

Then there will be exactly $2n$ changes of sign: one in each of the intervals $(\xi_{i-1}, \xi_i)$
and in each of the points $\xi_2, \xi_3, \ldots \xi_{n+1}$. The inequalities (15) and (16) would hold for $i \geq 2$, but

$$\varphi(\xi_1 - \epsilon) - \psi(\xi_1 - \epsilon) = 0$$

for all sufficiently small $\epsilon$. Now let

$$\varphi(\xi_1 - \epsilon) - \psi(\xi_1 - \epsilon) > 0 \qquad\text{and}\qquad \varphi(\xi_{n+1} + \epsilon) - \psi(\xi_{n+1} + \epsilon) = 0$$

for all sufficiently small positive $\epsilon$. Then there will be exactly $2n$ changes of sign: in each of the points $\xi_1, \xi_2, \ldots \xi_n$ and in each of the $n$ intervals. The inequalities (15) and (16) will again hold for $i \leq n$, but

$$\varphi(\xi_{n+1} - \epsilon) - \psi(\xi_{n+1} - \epsilon) \geq 0,\qquad \varphi(\xi_{n+1} + \epsilon) - \psi(\xi_{n+1} + \epsilon) = 0$$

for all sufficiently small $\epsilon$. Letting $\epsilon$ converge to $0$, we shall have

$$\varphi(\xi_i - 0) \geq \psi(\xi_i - 0),\qquad \varphi(\xi_i + 0) \leq \psi(\xi_i + 0)$$

for $i = 1, 2, 3, \ldots n + 1$ in all cases. Then, since

$$\varphi(\xi_i) \geq \varphi(\xi_i - 0),\qquad \varphi(\xi_i) \leq \varphi(\xi_i + 0),$$

we shall have also

$$\varphi(\xi_i) \geq \psi(\xi_i - 0),\qquad \varphi(\xi_i) \leq \psi(\xi_i + 0),$$

or, taking into consideration the definition of the function $\psi(x)$ (the masses being $A_l = P(\xi_l)/Q'(\xi_l)$),

$$\sum_{l=1}^{i-1} \frac{P(\xi_l)}{Q'(\xi_l)} \leq \varphi(\xi_i) \leq \sum_{l=1}^{i} \frac{P(\xi_l)}{Q'(\xi_l)}.$$

These are the inequalities to which Tshebysheff's name is justly attached. For a particular root $\xi_i = v$ they can be written thus:

$$\sum_{\xi_l < v} \frac{P(\xi_l)}{Q'(\xi_l)} \leq \varphi(v) \leq \sum_{\xi_l \leq v} \frac{P(\xi_l)}{Q'(\xi_l)}\qquad(17)$$
with the evident meaning of the extent of the summations. Another, less explicit, form of the same inequalities is

$$\psi(v - 0) \leq \varphi(v) \leq \psi(v + 0).\qquad(18)$$

As to $P(x)$ and $Q(x)$, they can be taken in the form:

$$P(x) = [\alpha_{n+1}(x - v)Q_n(v) + Q_{n-1}(v)]P_n(x) - Q_n(v)P_{n-1}(x)$$
$$Q(x) = [\alpha_{n+1}(x - v)Q_n(v) + Q_{n-1}(v)]Q_n(x) - Q_n(v)Q_{n-1}(x).$$
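Inequalities (17) admit a direct numerical illustration. For the normal case treated in Sec. 7, where $\varphi(x) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^x e^{-u^2}\,du$, the point distribution can be taken as the Gauss–Hermite nodes and weights (normalized by $\sqrt{\pi}$); the cumulative masses must then bracket $\varphi$ at every node. The sketch below is our own illustration, not part of the original text.

```python
# Tshebysheff's inequalities (17) for phi(x) = (1/sqrt(pi)) int_{-inf}^x e^{-u^2} du:
# the partial sums of the point masses bracket phi at each node xi_i.
import numpy as np
from math import erf, sqrt, pi

n = 6
nodes, weights = np.polynomial.hermite.hermgauss(n)
masses = weights / sqrt(pi)               # A_1, ..., A_n; they sum to m_0 = 1

def phi(x):
    # (1/sqrt(pi)) * int_{-inf}^x e^{-u^2} du = (1 + erf(x)) / 2
    return 0.5 * (1.0 + erf(x))

cumulative = np.concatenate(([0.0], np.cumsum(masses)))
for i, xi in enumerate(nodes):
    assert cumulative[i] <= phi(xi) <= cumulative[i + 1]   # psi(xi-0) <= phi(xi) <= psi(xi+0)
```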
Thus far we have assumed that $v$ is different from any root of the equation $Q_n(x) = 0$, but all the results hold even if $v$ is a root of $Q_n(x)$. To prove this, we note first that when a variable $v$ approaches a root $\xi$ of $Q_n(x)$, one root of $Q(x) = 0$ (either $\xi_1$ or $\xi_{n+1}$) tends to $-\infty$ or $+\infty$, while the remaining $n$ roots approach the $n$ roots $x_1, x_2, \ldots x_n$ of the equation $Q_n(x) = 0$. If $\xi_1$ tends to negative infinity, it is easy to see that

$$\frac{P(\xi_1)}{Q'(\xi_1)}$$

tends to $0$. In this case the other quotients

$$\frac{P(\xi_l)}{Q'(\xi_l)},\qquad l = 2, 3, \ldots n + 1,$$

tend respectively to

$$\frac{P_n(x_1)}{Q_n'(x_1)},\ \frac{P_n(x_2)}{Q_n'(x_2)},\ \ldots\ \frac{P_n(x_n)}{Q_n'(x_n)}.$$

If $\xi_{n+1}$ tends to positive infinity, the quotients

$$\frac{P(\xi_l)}{Q'(\xi_l)},\qquad l = 1, 2, \ldots n,$$

approach respectively

$$\frac{P_n(x_l)}{Q_n'(x_l)},\qquad l = 1, 2, 3, \ldots n,$$

while

$$\frac{P(\xi_{n+1})}{Q'(\xi_{n+1})}$$

tends to $0$. Now take $v = \xi - \epsilon$ and $v = \xi + \epsilon$ in (17) and let the positive number $\epsilon$ converge to $0$. Taking into account the preceding remarks, we find in the limit

$$\varphi(\xi) \geq \sum_{x_l < \xi} \frac{P_n(x_l)}{Q_n'(x_l)},\qquad \varphi(\xi) \leq \sum_{x_l \leq \xi} \frac{P_n(x_l)}{Q_n'(x_l)}.$$

But these inequalities follow directly from (17) by taking $v = \xi$.
Since

$$\psi(v + 0) - \psi(v - 0) = \frac{P(v)}{Q'(v)},$$

it follows from inequalities (18) that

$$0 \leq \varphi(v) - \psi(v - 0) \leq \frac{P(v)}{Q'(v)}.$$

On the other hand, one easily finds that

$$\frac{P(v)}{Q'(v)} = \frac{1}{\alpha_{n+1}Q_n(v)^2 + Q_n'(v)Q_{n-1}(v) - Q_{n-1}'(v)Q_n(v)}.$$

But referring to the end of Sec. 4,

$$Q_n'(v)Q_{n-1}(v) - Q_{n-1}'(v)Q_n(v) = \sum_{s=1}^{n} \alpha_sQ_{s-1}(v)^2,$$

whence

$$\alpha_{n+1}Q_n(v)^2 + Q_n'(v)Q_{n-1}(v) - Q_{n-1}'(v)Q_n(v) = Q_{n+1}'(v)Q_n(v) - Q_n'(v)Q_{n+1}(v).$$

Finally,

$$0 \leq \varphi(v) - \psi(v - 0) \leq \frac{1}{Q_{n+1}'(v)Q_n(v) - Q_n'(v)Q_{n+1}(v)}.$$

If $\varphi_1(v)$ is another distribution function with the same moments $m_0, m_1, \ldots m_{2n}$, we shall have also

$$0 \leq \varphi_1(v) - \psi(v - 0) \leq \chi_n(v),$$

and as a consequence,

$$|\varphi_1(v) - \varphi(v)| \leq \chi_n(v),\qquad(19)$$

a very important inequality. Here for brevity we use the notation

$$\chi_n(v) = \frac{1}{Q_{n+1}'(v)Q_n(v) - Q_n'(v)Q_{n+1}(v)}.$$
7. Application to Normal Distribution. An important particular case is that of a normal distribution characterized by the function

$$\varphi(x) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{x} e^{-u^2}\,du.$$

In this case it is easy to give an explicit expression of the polynomials $Q_n(x)$. Let

$$H_n(x) = e^{x^2}\frac{d^ne^{-x^2}}{dx^n}.$$

Integrating by parts, one can prove that for $l \leq n - 1$

$$\int_{-\infty}^{\infty} e^{-x^2}x^lH_n(x)\,dx = 0.$$

Hence, one may conclude that $Q_n(x)$ differs from $H_n(x)$ by a constant factor. Let

$$Q_n(x) = c_nH_n(x).$$

To determine $c_n$, we may use the relation

$$H_n(x) = -2xH_{n-1}(x) - 2(n - 1)H_{n-2}(x),$$

which can readily be established. Introducing the polynomials $Q_n$, this relation becomes

$$\frac{Q_n(x)}{c_n} = -\frac{2x}{c_{n-1}}Q_{n-1}(x) - \frac{2(n - 1)}{c_{n-2}}Q_{n-2}(x).$$

Hence,

$$\alpha_n = -\frac{2c_n}{c_{n-1}},\qquad \beta_n = 0,\qquad c_n = \frac{c_{n-2}}{2n - 2}.$$

Since $H_0(x) = Q_0(x) = 1$, we have $c_0 = 1$; also $c_1 = -\tfrac{1}{2}$. The knowledge of $c_0$ and $c_1$ together with the relation

$$c_n = \frac{c_{n-2}}{2n - 2}$$

allows determination of all members of the sequence $c_2, c_3, c_4, \ldots$. The final expressions are as follows:

$$c_{2m} = \frac{1}{2^m\cdot 1\cdot 3\cdot 5\cdots(2m - 1)},\qquad c_{2m+1} = -\frac{1}{2^{m+1}\cdot 2\cdot 4\cdot 6\cdots 2m}.$$

From the above relation between $H_n(x)$, $H_{n-1}(x)$, $H_{n-2}(x)$, and owing to the fact that $H_n(x)$ is an even or odd polynomial according as $n$ is even or odd, one finds

$$H_{2m}(0) = (-2)^m\cdot 1\cdot 3\cdot 5\cdots(2m - 1),$$

while another relation

$$H_n'(x) = -2nH_{n-1}(x),$$

following from the definition of $H_n(x)$, gives

$$H_{2m+1}'(0) = (-2)^{m+1}\cdot 1\cdot 3\cdot 5\cdots(2m + 1).$$
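The Hermite-polynomial relations quoted above can be verified symbolically. The sketch below (our own check, using Python's sympy) builds $H_n$ from the one-step rule $H_n = H_{n-1}' - 2xH_{n-1}$, which follows directly from differentiating $H_{n-1}(x)e^{-x^2}$, and then tests the three-term relation, the derivative formula, and the values at $0$.

```python
# Check of the stated relations for H_n(x) = e^{x^2} d^n/dx^n e^{-x^2}.
import sympy as sp

x = sp.symbols('x')
H = [sp.Integer(1)]                          # H_0 = 1
for n in range(1, 8):
    # differentiating H_{n-1}(x) e^{-x^2} once gives H_n = H_{n-1}' - 2x H_{n-1}
    H.append(sp.expand(sp.diff(H[-1], x) - 2*x*H[-1]))

for n in range(2, 8):
    assert sp.expand(H[n] + 2*x*H[n-1] + 2*(n - 1)*H[n-2]) == 0  # H_n = -2x H_{n-1} - 2(n-1) H_{n-2}
    assert sp.expand(sp.diff(H[n], x) + 2*n*H[n-1]) == 0         # H_n' = -2n H_{n-1}

for m in range(4):
    odd_product = 1
    for j in range(1, 2*m, 2):
        odd_product *= j                     # 1 * 3 * 5 * ... * (2m - 1)
    assert H[2*m].subs(x, 0) == (-2)**m * odd_product
```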
These preliminaries being established, we shall prove now that

$$\chi_n(v) = \frac{1}{c_nc_{n+1}\left(H_{n+1}'(v)H_n(v) - H_n'(v)H_{n+1}(v)\right)}$$

attains its maximum for $v = 0$. Let

$$\theta(v) = H_{n+1}'(v)H_n(v) - H_n'(v)H_{n+1}(v).$$

Then, taking into account the differential equation for the polynomials $H_n(v)$,

$$H_n''(v) = 2vH_n'(v) - 2nH_n(v),$$

we find that

$$\frac{d\theta}{dv} = 2v\theta - 2H_n(v)H_{n+1}(v).$$

On the other hand,

$$\theta = -H_{n+1}(v)^2\frac{d}{dv}\frac{H_n(v)}{H_{n+1}(v)},$$

and, denoting the roots of the polynomial $H_{n+1}(v)$ in general by $\xi$,

$$\frac{H_n(v)}{H_{n+1}(v)} = -\frac{1}{2n + 2}\sum_\xi \frac{1}{v - \xi}.$$

Consequently

$$\theta = -\frac{H_{n+1}(v)^2}{2n + 2}\sum_\xi \frac{1}{(v - \xi)^2}.$$

Again

$$H_n(v)H_{n+1}(v) = -\frac{H_{n+1}(v)^2}{2n + 2}\sum_\xi \frac{1}{v - \xi},$$

and so

$$\frac{d\theta}{dv} = -\frac{H_{n+1}(v)^2}{n + 1}\sum_\xi\left[\frac{v}{(v - \xi)^2} - \frac{1}{v - \xi}\right] = -\frac{H_{n+1}(v)^2}{n + 1}\sum_\xi \frac{\xi}{(v - \xi)^2}.$$

The roots of the polynomial $H_{n+1}(x)$ being symmetrically located with respect to $0$, we have

$$\sum_\xi \frac{\xi}{(v - \xi)^2} = \sum_{\xi > 0} \frac{4v\xi^2}{(v^2 - \xi^2)^2},$$

and finally

$$\frac{d\theta}{dv} > 0 \quad\text{if}\quad v < 0;\qquad \frac{d\theta}{dv} < 0 \quad\text{if}\quad v > 0;$$

that is, $\theta(v)$ attains its maximum for $v = 0$, and (the constant $c_nc_{n+1}$ being negative) $\chi_n(v)$ attains its maximum for $v = 0$. Referring to the above expressions of $c_{2m}$, $c_{2m+1}$, $H_{2m}(0)$, $H_{2m+1}'(0)$, we find that

$$\chi_{2m}(0) = \chi_{2m+1}(0) = \frac{2\cdot 4\cdot 6\cdots 2m}{3\cdot 5\cdot 7\cdots(2m + 1)}.$$

In Appendix I, page 354, we find the inequality

$$\frac{2\cdot 4\cdot 6\cdots 2m}{1\cdot 3\cdot 5\cdots(2m - 1)}\cdot\frac{1}{\sqrt{4m + 2}} < \frac{\sqrt{\pi}}{2},$$

whence

$$\frac{2\cdot 4\cdot 6\cdots 2m}{3\cdot 5\cdot 7\cdots(2m + 1)} < \frac{\sqrt{\pi}}{\sqrt{4m + 2}}.$$

Thus, in all cases

$$\chi_n(v) \leq \chi_n(0) < \frac{\sqrt{\pi}}{\sqrt{2n}}.$$
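The bound just stated is easy to check numerically; the loop below is our own illustration.

```python
# chi_{2m}(0) = (2*4*...*2m)/(3*5*...*(2m+1)) stays below sqrt(pi)/sqrt(4m+2),
# which gives chi_n(v) <= chi_n(0) < sqrt(pi/(2n)) in all cases.
from math import sqrt, pi

for m in range(1, 200):
    ratio = 1.0
    for j in range(1, m + 1):
        ratio *= 2*j / (2*j + 1)
    assert ratio < sqrt(pi) / sqrt(4*m + 2)
```

The two sides are asymptotically equal (this is Wallis's product), so the bound cannot be improved in order of magnitude.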
whence, by virtue of inequality (19),

$$|\varphi_1(v) - \varphi(v)| < \sqrt{\frac{\pi}{2n}}.$$

Thus any distribution function $\varphi_1(v)$ with the moments

$$m_0 = 1,\qquad m_{2k-1} = 0,\qquad m_{2k} = \frac{1\cdot 3\cdot 5\cdots(2k - 1)}{2^k}\qquad (k \leq n),$$

corresponding to

$$\varphi(v) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{v} e^{-u^2}\,du,$$

differs from $\varphi(v)$ by less than $\sqrt{\pi/2n}$. Since this quantity tends to $0$ when $n$ increases indefinitely, we have the following theorem proved for the first time by Tshebysheff:

The system of infinitely many equations

$$m_0 = 1,\qquad m_{2k-1} = 0,\qquad m_{2k} = \frac{1\cdot 3\cdot 5\cdots(2k - 1)}{2^k},\qquad k = 1, 2, 3, \ldots$$

uniquely determines a never decreasing function $\varphi(x)$ such that $\varphi(-\infty) = 0$; namely,

$$\varphi(x) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{x} e^{-u^2}\,du.$$
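The moment values appearing in Tshebysheff's theorem can be confirmed directly, since $\int_{-\infty}^{\infty} x^{2k}e^{-x^2}\,dx = \Gamma(k + \tfrac{1}{2})$; the check below is our own.

```python
# m_{2k} = 1*3*5*...*(2k-1) / 2^k equals Gamma(k + 1/2) / sqrt(pi), i.e. the
# 2k-th moment of the distribution with density e^{-x^2}/sqrt(pi).
from math import gamma, sqrt, pi, isclose

for k in range(1, 12):
    odd_product = 1
    for j in range(1, 2*k, 2):
        odd_product *= j              # 1 * 3 * 5 * ... * (2k - 1)
    m2k = odd_product / 2**k
    assert isclose(m2k, gamma(k + 0.5) / sqrt(pi), rel_tol=1e-10)
```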
8. Tshebysheff–Markoff's Fundamental Theorem. When a mass $1$ is distributed according to the law characterized by a function $F(x, \lambda)$ depending upon a parameter $\lambda$, we say that the distribution is variable. Notwithstanding the variability of the distribution, it may happen that its moments remain constant. If they are equal to the moments of a normal distribution with density $\frac{1}{\sqrt{\pi}}e^{-x^2}$, then by the preceding theorem we have rigorously

$$F(x, \lambda) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{x} e^{-u^2}\,du,$$

no matter what $\lambda$ is. Generally the moments of a variable distribution are themselves variable. Suppose that each one of them, when $\lambda$ tends to a certain limit (for instance $\infty$), tends to the corresponding moment of the normal distribution. One can foresee that under such circumstances $F(x, \lambda)$ will tend to

$$\varphi(x) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{x} e^{-u^2}\,du.$$

In fact, the following fundamental theorem holds:

Fundamental Theorem. If, for a variable distribution characterized by the function $F(x, \lambda)$,

$$\lim_{\lambda\to\infty} \int_{-\infty}^{\infty} x^k\,dF(x, \lambda) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} e^{-x^2}x^k\,dx;\qquad k = 0, 1, 2, 3, \ldots,$$

then

$$\lim_{\lambda\to\infty} F(v, \lambda) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{v} e^{-x^2}\,dx$$
uniformly in $v$.

Proof. Let

$$m_0, m_1, \ldots m_{2n}$$

be $2n + 1$ moments corresponding to a normal distribution. They allow formation of the polynomials

$$Q_0(x),\ Q_1(x),\ \ldots\ Q_n(x)$$

and of the functions $Q(x)$ and $\psi(x)$ designated in Secs. 5 and 6. Similar entities corresponding to the variable distribution will be specified by an asterisk. Since $m_k^* \to m_k$ as $\lambda \to \infty$, and since $\Delta_n > 0$, we shall have

$$\Delta_n^* > 0$$

for sufficiently large $\lambda$. Then $F(x, \lambda)$ will have not less than $n + 1$ points of increase and the whole theory can be applied to the variable distribution. In particular, we shall have

$$0 \leq \varphi(v) - \psi(v - 0) \leq \chi_n(v)$$
$$0 \leq F(v, \lambda) - \psi^*(v - 0) \leq \chi_n^*(v).\qquad(20)$$

Now $Q_s^*(x)$ $(s = 0, 1, 2, \ldots n)$ and $Q^*(x)$ depend rationally upon the moments $m_k^*$ $(k = 0, 1, 2, \ldots 2n)$; hence, without any difficulty one can see that

$$Q_s^*(x) \to Q_s(x),\quad s = 0, 1, 2, \ldots n;\qquad Q^*(x) \to Q(x)$$

as $\lambda \to \infty$; whence

$$\chi_n^*(v) \to \chi_n(v).$$

Again

$$\psi^*(v - 0) \to \psi(v - 0)$$

as $\lambda \to \infty$.
A few explanations are necessary to prove this. At first let $Q_n(v) \neq 0$. Then the polynomial $Q(x)$ will have $n + 1$ roots

$$\xi_1 < \xi_2 < \xi_3 < \cdots < \xi_{n+1}.$$

Since the roots of an algebraic equation vary continuously with its coefficients, it is evident that for sufficiently large $\lambda$ the equation $Q^*(x) = 0$ will have $n + 1$ roots:

$$\xi_1^* < \xi_2^* < \cdots < \xi_{n+1}^*,$$

and $\xi_l^*$ will tend to $\xi_l$ as $\lambda \to \infty$. In this case, it is evident that $\psi^*(v - 0)$ will tend to $\psi(v - 0)$. If $Q_n(v) = 0$, it may happen that either $\xi_1^*$ or $\xi_{n+1}^*$ tends to $-\infty$ or $+\infty$ as $\lambda \to \infty$, while the other roots tend to the $n$ roots of the equation

$$Q_n(x) = 0.$$

But the terms in $\psi^*(v - 0)$ corresponding to infinitely increasing roots tend to $0$, and again

$$\psi^*(v - 0) \to \psi(v - 0).$$

Now

$$\chi_n(v) < \frac{\sqrt{\pi}}{\sqrt{2n}}.$$

Consequently, given an arbitrary positive number $\epsilon$, we can select $n$ so large as to have

$$\chi_n(v) < \frac{\sqrt{\pi}}{\sqrt{2n}} < \epsilon.$$
Having selected $n$ in this manner, we shall keep it fixed. Then by the preceding remarks a number $L$ can be found so that

$$\chi_n^*(v) < \epsilon \qquad\text{and}\qquad |\psi(v - 0) - \psi^*(v - 0)| < \epsilon$$

for $\lambda > L$. Combining this with inequalities (20), we get

$$|F(v, \lambda) - \varphi(v)| < 3\epsilon$$

for $\lambda > L$. And this proves the convergence of $F(v, \lambda)$ to $\varphi(v)$ for fixed arbitrary $v$. To show that the equation
$$\lim_{\lambda\to\infty} F(v, \lambda) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{v} e^{-x^2}\,dx$$

holds uniformly for a variable $v$, we can follow a very simple reasoning due to Pólya. Since $\varphi(-\infty) = 0$, $\varphi(+\infty) = 1$, and $\varphi(x)$ is an increasing function, one can determine two numbers $a_0$ and $a_n$ so that

$$\varphi(a_0) < \frac{\epsilon}{2},\qquad 1 - \varphi(a_n) < \frac{\epsilon}{2}.$$

Next, because $\varphi(x)$ is a continuous function, the interval $(a_0, a_n)$ can be subdivided into partial intervals by inserting between $a_0$ and $a_n$ points $a_1 < a_2 < \cdots < a_{n-1}$ so that

$$\varphi(a_{k+1}) - \varphi(a_k) < \frac{\epsilon}{2}\qquad\text{for}\quad k = 0, 1, 2, \ldots n - 1.$$

By the preceding result, for all sufficiently large $\lambda$,

$$|F(a_k, \lambda) - \varphi(a_k)| < \frac{\epsilon}{2},\qquad k = 0, 1, 2, \ldots n.$$

Now consider the interval $(-\infty, a_0)$. Here, for $v \leq a_0$,

$$0 \leq F(v, \lambda) \leq F(a_0, \lambda) < \epsilon,\qquad 0 < \varphi(v) < \frac{\epsilon}{2},$$

and consequently

$$|F(v, \lambda) - \varphi(v)| < \epsilon.$$

For $v$ belonging to the interval $(a_n, +\infty)$,

$$0 \leq 1 - F(v, \lambda) < \epsilon,\qquad 0 < 1 - \varphi(v) < \frac{\epsilon}{2},$$

whence again

$$|F(v, \lambda) - \varphi(v)| < \epsilon.$$

Finally, let

$$a_k \leq v \leq a_{k+1}\qquad (k = 0, 1, 2, \ldots n - 1).$$

Then

$$F(v, \lambda) - \varphi(v) \geq F(a_k, \lambda) - \varphi(a_{k+1}) = [F(a_k, \lambda) - \varphi(a_k)] + [\varphi(a_k) - \varphi(a_{k+1})]$$
$$F(v, \lambda) - \varphi(v) \leq F(a_{k+1}, \lambda) - \varphi(a_k) = [F(a_{k+1}, \lambda) - \varphi(a_{k+1})] + [\varphi(a_{k+1}) - \varphi(a_k)],$$

whence

$$-\epsilon < F(v, \lambda) - \varphi(v) < \epsilon.$$

Thus, given $\epsilon$, there exists a number $L(\epsilon)$ depending upon $\epsilon$ alone and such that

$$|F(v, \lambda) - \varphi(v)| < \epsilon$$

for $\lambda > L(\epsilon)$, no matter what value is attributed to $v$.
The fundamental theorem with reference to probability can be stated as follows:

Let $s_n$ be a stochastic variable depending upon a variable positive integer $n = 1, 2, 3, \ldots$. If the mathematical expectation $E(s_n^k)$ for any fixed $k$ tends, as $n$ increases indefinitely, to the corresponding expectation

$$E(x^k) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} x^ke^{-x^2}\,dx$$

of a normally distributed variable, then the probability of the inequality

$$s_n < v$$

tends to the limit

$$\frac{1}{\sqrt{\pi}}\int_{-\infty}^{v} e^{-x^2}\,dx,$$

and that uniformly in $v$.
In very many cases it is much easier to make sure that the conditions of this theorem are fulfilled and then, in one stroke, to pass to the limit theorem for probability, than to attack the problem directly.
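As an illustration (our own, not in the original), take for $s_n$ the normalized sum of $n$ independent symmetric $\pm 1$ variables; its moments tend to those of the density $e^{-x^2}/\sqrt{\pi}$, and by the theorem its distribution function approaches the normal integral.

```python
# s_n = (z_1 + ... + z_n)/sqrt(2n) with independent z_i = +/-1 (prob 1/2 each):
# E(s_n^4) -> 3/4 = (1/sqrt(pi)) int x^4 e^{-x^2} dx, and P(s_n < v) approaches
# (1/sqrt(pi)) int_{-inf}^v e^{-x^2} dx.
from math import comb, sqrt, erf, gamma, pi

def moment_and_cdf(n, k, v):
    """Exact E(s_n^k) and P(s_n < v) computed from the binomial distribution."""
    mk, cdf = 0.0, 0.0
    for heads in range(n + 1):
        p = comb(n, heads) / 2**n
        s = (2*heads - n) / sqrt(2*n)
        mk += p * s**k
        if s < v:
            cdf += p
    return mk, cdf

m4, cdf = moment_and_cdf(2000, 4, 0.5)
normal_m4 = gamma(5/2) / sqrt(pi)                # = 3/4
assert abs(m4 - normal_m4) < 1e-3                # the moments converge
assert abs(cdf - 0.5*(1 + erf(0.5))) < 2e-2      # the distribution functions converge
```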
APPLICATION TO SUMS OF INDEPENDENT VARIABLES

9. Let $z_1, z_2, z_3, \ldots$ be independent variables whose number can be increased indefinitely. Without losing anything in generality, we may suppose from the beginning

$$E(z_k) = 0;\qquad k = 1, 2, 3, \ldots.$$

We assume the existence of

$$E(z_k^2) = b_k$$

for all $k = 1, 2, 3, \ldots$. Also, we assume for some positive $\delta$ the existence of the absolute moments

$$\mu_k^{(2+\delta)} = E|z_k|^{2+\delta};\qquad k = 1, 2, 3, \ldots.$$

Liapounoff's theorem, with which we dealt at length in Chap. XIV, states that the probability of the inequality

$$\frac{z_1 + z_2 + \cdots + z_n}{\sqrt{2B_n}} < t,\qquad\text{where}\quad B_n = b_1 + b_2 + \cdots + b_n,$$

tends uniformly to the limit

$$\frac{1}{\sqrt{\pi}}\int_{-\infty}^{t} e^{-x^2}\,dx$$

as $n \to \infty$, provided

$$\frac{\mu_1^{(2+\delta)} + \mu_2^{(2+\delta)} + \cdots + \mu_n^{(2+\delta)}}{B_n^{1+\frac{\delta}{2}}} \to 0.$$

Liapounoff's result in regard to generality of conditions surpassed by far what had been established before by Tshebysheff and Markoff, whose proofs were based on the fundamental result derived in the preceding section. Since Liapounoff's conditions do not require the existence of moments in an infinite number, it seemed that the method of moments was not powerful enough to establish the limit theorem in such a general form. Nevertheless, by resorting to an ingenious artifice, of which we made use in Chap. X, Sec. 8, Markoff finally succeeded in proving the limit theorem by the method of moments to the same degree of generality as did Liapounoff.
Markoff's artifice consists in associating with the variable $z_k$ two new variables $x_k$ and $y_k$ defined as follows: Let $N$ be a positive number which in the course of the proof will be selected so as to tend to infinity together with $n$. Then

$$x_k = z_k,\quad y_k = 0\qquad\text{if}\quad |z_k| \leq N;$$
$$x_k = 0,\quad y_k = z_k\qquad\text{if}\quad |z_k| > N.$$

Evidently $z_k$, $x_k$, $y_k$ are connected by the relation

$$z_k = x_k + y_k,$$

whence

$$E(x_k) + E(y_k) = 0.\qquad(21)$$

Moreover

$$E|x_k|^{2+\delta} + E|y_k|^{2+\delta} = E|z_k|^{2+\delta} = \mu_k^{(2+\delta)},\qquad(22)$$

as one can see immediately from the definition of $x_k$ and $y_k$. Since $x_k$ is bounded, the mathematical expectations $E(x_k^l)$ exist for all integer exponents $l = 1, 2, 3, \ldots$ and for $k = 1, 2, 3, \ldots$. In the following we shall use the notations

$$|E(x_k^l)| = c_k^{(l)};\qquad l = 1, 2, 3, \ldots$$
$$C_n = c_1^{(2)} + c_2^{(2)} + \cdots + c_n^{(2)}$$
$$M_n = \mu_1^{(2+\delta)} + \mu_2^{(2+\delta)} + \cdots + \mu_n^{(2+\delta)}.$$

Not to obscure the essential steps of the reasoning we shall first establish a few preliminary results.
Lemma 1. Let $q_k$ represent the probability that $y_k \neq 0$; then

$$q_1 + q_2 + \cdots + q_n \leq \frac{\mu_1^{(2+\delta)} + \mu_2^{(2+\delta)} + \cdots + \mu_n^{(2+\delta)}}{N^{2+\delta}}.$$

Proof. Let $\varphi_k(x)$ be the distribution function of $z_k$. Since $y_k \neq 0$ only if $|z_k| > N$, the probability $q_k$ is not greater than

$$\int_{-\infty}^{-N} d\varphi_k(x) + \int_N^{\infty} d\varphi_k(x).$$

On the other hand,

$$\int_{-\infty}^{-N} |x|^{2+\delta}\,d\varphi_k(x) + \int_N^{\infty} |x|^{2+\delta}\,d\varphi_k(x) \leq \mu_k^{(2+\delta)}.$$

But

$$\int_{-\infty}^{-N} |x|^{2+\delta}\,d\varphi_k(x) \geq N^{2+\delta}\int_{-\infty}^{-N} d\varphi_k(x)\qquad\text{and}\qquad \int_N^{\infty} |x|^{2+\delta}\,d\varphi_k(x) \geq N^{2+\delta}\int_N^{\infty} d\varphi_k(x).$$
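The bound of Lemma 1 is Markov's inequality applied to $|z_k|^{2+\delta}$. A Monte Carlo sketch, with our own choice of distribution and parameters:

```python
# P(y_k != 0) = P(|z_k| > N) <= E|z_k|^{2+delta} / N^{2+delta}, checked by
# simulation for a standard normal z with delta = 1, N = 1.5.
import random

random.seed(0)
delta, N = 1.0, 1.5
samples = [random.gauss(0.0, 1.0) for _ in range(200_000)]

tail_prob = sum(abs(z) > N for z in samples) / len(samples)
abs_moment = sum(abs(z)**(2 + delta) for z in samples) / len(samples)
assert tail_prob <= abs_moment / N**(2 + delta)
```

For these parameters the tail probability is roughly $0.13$ while the bound is roughly $0.47$, so the inequality holds with a wide margin.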
"' N +6 +6 k(x N Ixl2 d