ELEMENTARY CALCULUS OF PROBABILITY
§ 21. The Rule of the Product

So far in this study, the multiplication theorem has been written in the form

$$P(A,B.C) = P(A,B) \cdot P(A.B,C) \qquad (1)$$

Since the left side is symmetrical with respect to B and C, we may write the corresponding equation, dividing the product in a different way:

$$P(A,B.C) = P(A,C) \cdot P(A.C,B) \qquad (2)$$
This is not a new axiom; it follows from (1) by substituting B for C and C for B, in view of the fact that the "and" on the left is commutative. Because of the equality of the expressions written on the left in (1) and (2), we have

$$P(A,B) \cdot P(A.B,C) = P(A,C) \cdot P(A.C,B) \qquad (3)$$

This equation is called the rule of the product. For example, let P(A,B.C) be the probability that a person, A, shows ability for physics, B, as well as for music, C.¹ Then (1) represents one form of splitting the probability of the product into two probabilities: the probability that a person has an ability for physics and the probability that a person endowed with an ability for physics also shows a talent for music. Formula (2) represents the opposite splitting of the probability of the product, namely, into the probability that a person has a talent for music and the probability that a musically gifted person also shows ability for physics. The two probabilities having two terms in their reference class are not equal; rather, according to (3), they have the ratio

$$\frac{P(A.B,C)}{P(A.C,B)} = \frac{P(A,C)}{P(A,B)} \qquad (4)$$
The probability P(A,C) of being musically gifted is, in general, much greater than the probability P(A,B) of having any ability for physics. Therefore, according to (4), the probability that a person who is able in physics has a talent for music must be much greater than the probability that a musician shows an aptitude for physics. But we know from experience that some connection exists between ability in physics and in music, such that P(A.B,C) > P(A,C); musical talent occurs more frequently among persons who are able physicists than corresponds to the general average. Therefore, because of (4) it must equally be the case that P(A.C,B) > P(A,B), that is, among musicians also there must be a higher percentage of people with an ability in physics than corresponds to the average. The ratio must be the same in both cases, since (4) may be written

$$\frac{P(A.C,B)}{P(A,B)} = \frac{P(A.B,C)}{P(A,C)} \qquad (5)$$

¹ It does not matter that, in this example, according to the convention given, we should write for A,B,C the same capital letters but with different subscripts.
Furthermore, we derive from (5) that if P(A.C,B) = P(A,B), then also P(A.B,C) = P(A,C). This means that the independence of B and C with respect to A is a relation symmetrical in B and C (see p. 105). The condition of exclusion is also symmetrical, because if P(A.B,C) = 0, then P(A.C,B) = 0, according to (5), provided P(A,B) and P(A,C) are different from 0. Solving (3) for P(A.C,B), we obtain

$$P(A.C,B) = P(A.B,C) \cdot \frac{P(A,B)}{P(A,C)} \qquad (6)$$
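Relations (3), (5), and (6) can be checked with a few lines of arithmetic. The following Python sketch is only an illustration: the function name `inverse_from_product_rule` and the three numerical values are assumptions chosen for the physics and music example, not figures given in the text.

```python
from fractions import Fraction

def inverse_from_product_rule(p_ab, p_ac, p_abc):
    """Formula (6): P(A.C,B) = P(A.B,C) * P(A,B) / P(A,C).

    p_ab = P(A,B), p_ac = P(A,C), p_abc = P(A.B,C).
    P(A,C) must be different from 0.
    """
    if p_ac == 0:
        raise ValueError("P(A,C) must be different from 0")
    return p_abc * p_ab / p_ac

# Hypothetical values: 5% of persons are able in physics, 20% are musical,
# and 40% of those able in physics are musical.
p_ab, p_ac, p_abc = Fraction(5, 100), Fraction(20, 100), Fraction(40, 100)
p_acb = inverse_from_product_rule(p_ab, p_ac, p_abc)

# Rule of the product (3): both ways of splitting give the same P(A,B.C).
assert p_ab * p_abc == p_ac * p_acb
# Form (5): both probabilities exceed their averages by the same factor.
assert p_acb / p_ab == p_abc / p_ac
print(p_acb)  # 1/10
```

With these assumed numbers the share of physicists among musicians is 1/10, twice the general average of 1/20, just as the share of musicians among physicists (2/5) is twice the general average of 1/5.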
This relation shows that the probability P(A.C,B) is determined by the three probabilities P(A,B), P(A,C), and P(A.B,C). Since the latter three probabilities determine also the probability P(A.B̄,C), as is shown in (10, § 19), it follows that all probabilities of B and C relative to A are determined by these three probabilities. Thus P(A.C̄,B) is derivable by means of (6), when we substitute there C̄ for C and use the rule of the complement (7, § 13):

$$P(A.\bar{C},B) = \frac{[1 - P(A.B,C)] \cdot P(A,B)}{1 - P(A,C)} \qquad (7)$$

The three probabilities

$$P(A,B) \qquad P(A,C) \qquad P(A.B,C) \qquad (7')$$
will be called the fundamental probabilities of the three events A,B,C; they determine completely the probability status of B and C with respect to A. The choice of these values as fundamental must be regarded as a convention; any other three independent values might be chosen, for instance, the values P(A,B), P(A,C), P(A,B.C). But the convention will be seen to be expedient.

The numerical values of the fundamental probabilities can be chosen arbitrarily when a problem is to be given; they are subject only to the restrictions of the inequalities (15, § 19). It has been shown in § 20 that these inequalities suffice to guarantee that the probabilities P(A,B ∨ C) and P(A,B.C) are between 0 and 1, limits included. Formula (6) shows that then P(A.C,B) also is bound to these limits, since we derive from the condition on the right of (15, § 19) that

$$P(A.B,C) \cdot \frac{P(A,B)}{P(A,C)} \leq 1 \qquad (8)$$

The inequalities (15, § 19) formulate, therefore, the necessary and sufficient restrictions to which the fundamental probabilities are subject.
The term "restrictions" is applied to an arbitrary choice of numerical values such as is made when fictitious problems are constructed. For all statistics that are empirically compiled, these restrictions are satisfied automatically. With respect to applications, the result can be stated as follows: when three events A,B,C are concerned, it is sufficient to ascertain statistically the values of the three fundamental probabilities; these values, which will always satisfy the inequalities (15, § 19), are sufficient to derive all other probabilities of B and C that have the term A as reference class or as a factor of the reference class.

If B and C are events that stand to each other in the relation of cause to effect, (6) becomes of particular interest. For instance, let A represent the occurrence of a hot day in summer; B, the occurrence of a thunderstorm; C, the occurrence of a change in the weather. Then P(A.C,B) represents the probability that a change of weather observed on a hot day has been preceded by a thunderstorm. In contradistinction to the example referring to talents in music and physics, in which the probabilities P(A.B,C) and P(A.C,B) express mere correlations, the probabilities refer here to causal relations: the thunderstorm is a possible cause of the change in the weather. The quantity P(A.B,C) is therefore the probability that a certain cause will produce a particular effect, and the quantity P(A.C,B) is the probability that an observed effect was produced by a specified cause.

With respect to applications of this kind, (6) is also called the rule for the probability of a cause. In this interpretation (6) is usually given another form. Considering B̄ as another possible cause of C, and expanding P(A,C) according to the rule of elimination (2, § 19), we transform (6) into

$$P(A.C,B) = \frac{P(A,B) \cdot P(A.B,C)}{P(A,B) \cdot P(A.B,C) + P(A,\bar{B}) \cdot P(A.\bar{B},C)} \qquad (9)$$
The expression obtains a more general form when the version (21, § 19) of the rule of elimination is used:

$$P(A.C,B_k) = \frac{P(A,B_k) \cdot P(A.B_k,C)}{\sum_{i=1}^{r} P(A,B_i) \cdot P(A.B_i,C)} \qquad (10)$$
This formula carries the name of the English clergyman Thomas Bayes² and is called the rule of Bayes. The schema of figure 5 (p. 82) may serve again as illustration.
The quantities P(A,B_k), which occur in Bayes's rule, have been named "a priori probabilities". The term is misleading because of its metaphysical connotations, and I prefer to call them antecedent probabilities. The name indicates that in these probabilities the event B_k is referred to certain general data A the acquisition of which precedes the observation of the specific data included in C. It goes without saying that antecedent probabilities are of the same type as all other probabilities.

The probabilities P(A.C,B_k) are called inverse probabilities. Bayes's rule determines the inverse probabilities as functions of the forward probabilities, the latter term including both kinds of probabilities occurring on the right of (10). It is important to realize that such a determination is possible only if, among the forward probabilities, the antecedent probabilities are given; without a knowledge of the latter the problem would be indeterminate. Only when the antecedent probabilities are all equal, that is, when

$$P(A,B_1) = P(A,B_2) = \dots = P(A,B_r) \qquad (11)$$
do they disappear in the formula, since then (10) assumes the simplified form

$$P(A.C,B_k) = \frac{P(A.B_k,C)}{\sum_{i=1}^{r} P(A.B_i,C)} \qquad (12)$$
But in order to apply (12) we must have the positive knowledge expressed in (11). It is by no means permissible to use (12) when the values of the antecedent probabilities are unknown. Absence of knowledge of numerical values is not equivalent to knowledge of their equality. The disregard of this simple logical fact has become the source of many erroneous interpretations of Bayes's rule.³ When nothing about the antecedent probabilities is known, we must simply admit that the inverse probabilities cannot be determined.

The following example may serve as a numerical illustration of Bayes's rule. A factory A has three machines for the manufacture of a certain product; machine B_1 produces 10,000 pieces daily; machine B_2, 20,000 pieces; machine B_3, 30,000 pieces. All three machines occasionally produce faulty pieces, C; and, specifically, the first machine has on the average a rejection of 4%; the second, of 2%; the third, of 4%. A characteristic sample is found among the rejects, and we ask for the probability stating by which of the three machines it was produced. We have here

$$P(A,B_1) = \frac{10{,}000}{60{,}000} = \frac{1}{6} \qquad P(A,B_2) = \frac{20{,}000}{60{,}000} = \frac{1}{3} \qquad P(A,B_3) = \frac{30{,}000}{60{,}000} = \frac{1}{2}$$

$$P(A.B_1,C) = 4\% = \frac{4}{100} \qquad P(A.B_2,C) = 2\% = \frac{2}{100} \qquad P(A.B_3,C) = 4\% = \frac{4}{100}$$

$$P(A.C,B_1) = \frac{\frac{1}{6} \cdot \frac{4}{100}}{\frac{1}{6} \cdot \frac{4}{100} + \frac{1}{3} \cdot \frac{2}{100} + \frac{1}{2} \cdot \frac{4}{100}} = \frac{1}{5} = 20\%$$

$$P(A.C,B_2) = \frac{\frac{1}{3} \cdot \frac{2}{100}}{\frac{1}{6} \cdot \frac{4}{100} + \frac{1}{3} \cdot \frac{2}{100} + \frac{1}{2} \cdot \frac{4}{100}} = \frac{1}{5} = 20\%$$

$$P(A.C,B_3) = \frac{\frac{1}{2} \cdot \frac{4}{100}}{\frac{1}{6} \cdot \frac{4}{100} + \frac{1}{3} \cdot \frac{2}{100} + \frac{1}{2} \cdot \frac{4}{100}} = \frac{3}{5} = 60\%$$

³ These misinterpretations go back to Bayes and Laplace, who regarded it permissible to apply (12) when the antecedent probabilities are unknown; the name a priori probabilities was used with reference to such an "a priori reasoning". See the criticism of the principle of indifference in § 68.
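The factory computation can be reproduced mechanically. The Python sketch below is a minimal illustration of (10) and (12); the function name `bayes_inverse` and the variable names are assumptions introduced here, while the production figures and rejection rates are those of the example.

```python
from fractions import Fraction

def bayes_inverse(antecedent, forward):
    """Rule of Bayes (10): inverse probabilities P(A.C,B_k).

    antecedent[k] = P(A,B_k), forward[k] = P(A.B_k,C).
    """
    joint = [p * f for p, f in zip(antecedent, forward)]
    total = sum(joint)  # P(A,C) by the rule of elimination
    if total == 0:
        raise ValueError("P(A,C) must be different from 0")
    return [j / total for j in joint]

daily_output = [10_000, 20_000, 30_000]
antecedent = [Fraction(n, sum(daily_output)) for n in daily_output]
forward = [Fraction(4, 100), Fraction(2, 100), Fraction(4, 100)]

inverse = bayes_inverse(antecedent, forward)
print([str(x) for x in inverse])  # ['1/5', '1/5', '3/5']

# Form (12) amounts to inserting equal antecedent probabilities (11).
# It gives a different answer, so it may be used only when (11) is known.
equal = bayes_inverse([Fraction(1, 3)] * 3, forward)
print([str(x) for x in equal])  # ['2/5', '1/5', '2/5']
```

The second call shows how strongly the result depends on the antecedent probabilities: the same rejection rates lead to 2/5, 1/5, 2/5 instead of 1/5, 1/5, 3/5.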
We see clearly the influence of the antecedent probabilities P(A,B_k), which are calculated in a simple way from the distribution of the total production over all the machines. Though the second machine works twice as well as the first, it is equally probable that the rejected piece originates from the second as from the first machine; this is due to the fact that the second machine produces twice as many pieces. The third machine, which supplies half of the total production, is to be assigned the probability 3/5 of having produced the reject; this probability is greater than 1/2 because one of the two other machines works more reliably. Therefore, of all rejects, 20% originate from the first, 20% from the second, and 60% from the third machine; this represents the statistical meaning of the inverse probabilities calculated.

At the same time we recognize that without such antecedent probabilities the problem is not determined. Should we consider only the efficiency ratio of the machines, 4 : 2 : 4, and calculate the inverse probabilities as 2/5, 1/5, 2/5, this would mean putting P(A,B_1) = P(A,B_2) = P(A,B_3); but we must check whether the assumption is justified. The probability of causes cannot be calculated without a knowledge of the antecedent probabilities. (Further examples are given in the appendix to chap. 3, pp. 123-127.)

The range of application for Bayes's rule is extremely wide, because nearly all inquiries into the causes of observed facts are performed in terms of this rule. The method of indirect evidence, as this form of inquiry is called, consists of inferences that on closer analysis can be shown to follow the structure of the rule of Bayes.
The physician's inferences, leading from the observed symptoms to the diagnosis of a specified disease, are of this type; so are the inferences of the historian determining the historical events that must be assumed for the explanation of recorded observations; and, likewise, the inferences of the detective concluding criminal actions from inconspicuous observable data. In many instances the use of probability relations is not manifest because the probabilities occurring have either very high or very low values. Thus, when a corpse is found, it is virtually certain that a murder has been committed; and a fingerprint on the handle of a pistol may be considered as strict evidence for the assumption that a certain person X has fired the pistol. That even in such cases the inference has the structure of Bayes's rule is often seen from the fact that appraisals of the antecedent probabilities are made. Thus an inquiry by the detective into the motives of a crime is an attempt to estimate the antecedent probabilities of the case, namely, the probability of a certain person committing a crime of this kind, irrespective of the observed incriminating data. Similarly, the general inductive inference from observational data to the validity of a given scientific theory must be regarded as an inference in terms of Bayes's rule.⁴

The theory of indirect evidence has been obscured by the assumption that there exists an inference leading from an implication (B ⊃ C) to a probability implication (C ∋ B), which would enable us to infer with probability from an observed effect C the presence of the cause B. This inference has been called an inference by confirmation.⁵ The analysis of the calculus of probability shows that no such inference exists. The probability of a cause B can be inferred from the observation of the effect C only if all the probabilities occurring on the right-hand side of (9) are known. The relation (B ⊃ C) supplies only P(A.B,C) = 1. There remain to be known, therefore, the antecedent probability P(A,B) and the probability P(A.B̄,C). These values are in no way restricted by the fact that (B ⊃ C) holds and must be independently ascertained for this as well as for the general case P(A.B,C) < 1.

For the case P(A.B,C) = 1, a weaker inference can be made when it is known, at least, that the other probabilities on the right-hand side of (9) exist. Putting P(A,B) = p, P(A.B̄,C) = u, this side then assumes the form

$$\frac{p}{p + (1 - p)u} \qquad (13)$$
If u = 1, the denominator will be = 1; if u < 1, it will be < 1. The fraction, therefore, will be ≥ p, and we have the inequality

$$P(A.C,B) \geq P(A,B) \qquad (14)$$
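A short numerical sketch may make the scope of (13) and (14) concrete. The function name `inverse_when_b_implies_c` and the sample values of p and u are assumptions for illustration only; the expression computed is (13), that is, the right-hand side of (9) with P(A.B,C) = 1.

```python
from fractions import Fraction

def inverse_when_b_implies_c(p, u):
    """Expression (13): p / (p + (1 - p) * u).

    p = P(A,B), u = P(A.not-B,C), and P(A.B,C) = 1 is presupposed.
    """
    denominator = p + (1 - p) * u
    if denominator == 0:
        raise ValueError("P(A,C) must be different from 0")
    return p / denominator

# Inequality (14) holds on a grid of admissible values with p > 0.
grid = [Fraction(k, 10) for k in range(11)]
for p in grid[1:]:
    for u in grid:
        assert inverse_when_b_implies_c(p, u) >= p

# The same antecedent probability p = 1/10 yields very different inverse
# probabilities, depending on the hypothetical value chosen for u.
p = Fraction(1, 10)
results = {u: inverse_when_b_implies_c(p, u)
           for u in (Fraction(1), Fraction(1, 10), Fraction(1, 100))}
for u, value in results.items():
    print(u, value)
```

For p = 1/10 the inverse probability is 1/10 when u = 1, 10/19 when u = 1/10, and 100/109 when u = 1/100. Knowing only that B implies C therefore fixes the direction of the change but not the size of the resulting probability.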
If we know that u < 1, we can say that the observation of C will increase the probability of B. But even the latter statement presupposes more than the observation of C; besides the knowledge that P(A.B̄,C) exists and is < 1, it presupposes knowledge about the existence of the probability P(A,B). It is obvious, furthermore, that when inferences of indirect evidence are made, the conclusion is not restricted to the assertion of a mere increase in probability. We wish to assert more, namely, to arrive at an estimate of whether the probability P(A.C,B) is a high value. This aim can be reached only when the values P(A,B) and P(A.B̄,C) are known to a certain degree of approximation.

The so-called inference by confirmation, therefore, represents an incomplete schematization of the inference actually made in such cases. When it seems that we sometimes do infer from an observed consequence that an assumption is probably true, as in the confirmation of a scientific theory by the observational test of its consequences, such a procedure is possible only because more is known than is explicitly stated in the inference, in other words, because we have estimates of the other necessary probabilities. This additional knowledge plays a part in the inferences actually made, as may be illustrated by the problems given in the exercises (see the appendix to chap. 3, pp. 123-124).

An inferential schema like the inference by confirmation, which omits this knowledge in its premises, must be regarded as an instance of the fallacy of incomplete schematization. Like other fallacies, it will sometimes lead to correct results; that will be the case when the additional premises are true. But it does not represent a valid inference, because it does not state all the premises required for the truth of the conclusion. Such mistaken interpretations of the method of indirect evidence make it clear that a satisfactory analysis of the method can be given only when it is construed as an inference that follows the rules of the calculus of probability.

⁴ For a more elaborate discussion of this inference see §§ 84-85.

⁵ R. Carnap, "Testability and Meaning," in Philos. of Science, Vol. III (1936), p. 420; Vol. IV (1937), p. 1. Instead of the relation of implication in B ⊃ C, other relations that make the inference even worse are sometimes used.
§ 22. The Rule of Reduction

In connection with the schema of "bifurcation" as shown in figure 5 (p. 82), we shall now derive a theorem for a probability containing an "or" in its first term, that is, in the reference class. We can obtain such a probability by using theorem (1, § 14), which shows a way of bringing a symbol B from the second into the first term of a probability expression. It is convenient to use, instead, the mathematical notation (3, § 14), which, when we put D for C, may be written

$$P(A.B,D) = \frac{P(A,B.D)}{P(A,B)} \qquad (1)$$
Let B ∨ C be an exclusive disjunction that is incomplete with respect to A. If we substitute B ∨ C for B and apply the distributive law and the special theorem of addition, we obtain

$$P(A.[B \vee C],D) = \frac{P(A,B.D) + P(A,C.D)}{P(A,B) + P(A,C)} \qquad (2)$$
Solving the terms in the numerator by the theorem of multiplication, we arrive at the formula

$$P(A.[B \vee C],D) = \frac{P(A,B) \cdot P(A.B,D) + P(A,C) \cdot P(A.C,D)}{P(A,B) + P(A,C)} \qquad (3)$$
This theorem, which is valid only when B and C are mutually exclusive, will be called the special rule of reduction. It solves a probability with a disjunction in the first place of the probability functor in terms of individual probabilities, that is, probabilities that do not contain a disjunction in the first place, but may contain a conjunction in that place. The theorem can easily be extended for exclusive disjunctions of the form B_1 ∨ ... ∨ B_m that are incomplete relative to A:

$$P(A.[B_1 \vee \dots \vee B_m],D) = \frac{\sum_{k=1}^{m} P(A,B_k) \cdot P(A.B_k,D)}{\sum_{k=1}^{m} P(A,B_k)} \qquad (4)$$
The name rule of reduction is chosen in order to express the fact that the reference class on the left in (4) can be conceived as resulting from the general reference class A by a reduction. This is to be understood as follows. The reference class A is the same as the class A.[B_1 ∨ ... ∨ B_r] when the latter disjunction is complete relative to A. The reference class A.[B_1 ∨ ... ∨ B_m], containing a disjunction that is incomplete with respect to A, results from the former by the canceling of some of the B_k, a process that may be called a reduction. Such a reduction will be used when additional knowledge permits us to drop some of the B_k.

An example chosen from political elections may serve as an illustration. Let B_1 ... B_r represent candidates of several political parties for a high office, say the presidency of a nation; let P(A,B_k) be the probability with which the election of the candidate B_k may be expected in the situation A existing before the votes are cast; and let B_1 ∨ ... ∨ B_r be a complete disjunction relative to A. This disjunction is also exclusive when the political office can be occupied by only one candidate. Let D be a certain action of economic importance; it may be expected with a probability P(A.B_k,D) that the candidate B_k carries out this action successfully. For example, D may be the conclusion of a commercial pact with another country. [In this case, P(A.B_k,D) would be equal to P(B_k,D), which is, however, irrelevant to the example.] Before the beginning of the elections the probability P(A,D) of the signing of the commercial treaty is calculated according to the rule of elimination (21, § 19). Now assume that the elections are under way, and that it is already known that certain candidates are not elected; so only a part B_1 ∨ ... ∨ B_m (m < r) of the candidates remain to be considered. The
probability with which the signing of the commercial pact is to be expected is then obtained by a reduction of the reference class and is determined by (4).

Equation (4) expresses a characteristic asymmetry between the first and the second terms of a probability implication. An "or" in the second term leads to an addition of probabilities, but an "or" in the first term, as is recognized from (4), leads to an addition combined with a division, that is, to the formation of a mean value. This becomes obvious through consideration of the special case in which

$$P(A.B_1,D) = P(A.B_2,D) = \dots = P(A.B_m,D) \qquad (5)$$

Then we obtain from (4)

$$P(A.[B_1 \vee \dots \vee B_m],D) = P(A.B_1,D) = \dots = P(A.B_m,D) \qquad (6)$$
Here the addition of or-terms in the first term does not change the probability. If we assign to each of the candidates who are not yet eliminated an equal probability that he will successfully carry out the signing of the pact, then it is immaterial which candidate is elected. The probability of the signing of the treaty does not depend on the further outcome of the election. Furthermore, it is of no importance for the relation (6) whether all the P(A,B_k) are equal; the values of the P(A.B_n,D) (m < n ≤ r) no longer matter, because the respective candidates are already eliminated from the election.

If the disjunction is complete relative to A, (4) represents a second form of the rule of elimination (21, § 19), since the denominator becomes equal to 1:

$$P(A,D) = P(A.[B_1 \vee \dots \vee B_r],D) = \sum_{k=1}^{r} P(A,B_k) \cdot P(A.B_k,D) \qquad (7)$$
When we write the disjunction in the form B ∨ B̄, we arrive at the equation

$$P(A,D) = P(A.[B \vee \bar{B}],D) = P(A,B) \cdot P(A.B,D) + P(A,\bar{B}) \cdot P(A.\bar{B},D) \qquad (8)$$
From (7) and (8) we see that a B occurring in the first term can be eliminated, like a B in the second term, whereas the right side of the equation assumes the same form as in (2, § 19) and (21, § 19).

The difference between the rule of reduction and the rule of elimination, however, is made clear when we consider disjunctions which, though exclusive, are incomplete with respect to A. The formula

$$P(A,[B_1 \vee \dots \vee B_m].D) = \sum_{k=1}^{m} P(A,B_k) \cdot P(A.B_k,D) \qquad (9)$$
which corresponds to the rule of elimination, is then always true, although this probability is not equal to P(A,D). The corresponding formula with the disjunction in the first place is given by (4); here the sum in the denominator is added.

When we add the assumption (5) to (7) and extend (5) to hold for all terms of a disjunction B_1 ∨ ... ∨ B_r, which is complete relative to A, we obtain from (7), analogous to (6),

$$P(A,D) = P(A.B_1,D) = \dots = P(A.B_r,D) \qquad (10)$$

This represents the trivial assertion that, in this case, the probability from A to D is the same as the probability from A together with any B_k to D.

Formula (4) will now be presented in a different form in order to make its structure clearer. Taking into account the condition of exclusion, namely,

$$P(A.B_k,B_k) = 1 \qquad P(A.B_k,B_i) = 0 \text{ for } k \neq i \qquad (11)$$
and substituting B_k for D, we obtain from (4)

$$P(A.[B_1 \vee \dots \vee B_m],B_k) = \frac{P(A,B_k)}{\sum_{i=1}^{m} P(A,B_i)} \qquad 1 \leq k \leq m \qquad (12)$$
These expressions may be called reduced probabilities; they represent the probability that B_k has with respect to A in combination with the terms of the incomplete disjunction. In the example given, they represent the probability of a candidate B_k being elected when we know that only the candidates B_1 ... B_m remain. The reduced probabilities are bound probabilities because, according to (12),

$$\sum_{k=1}^{m} P(A.[B_1 \vee \dots \vee B_m],B_k) = P(A.[B_1 \vee \dots \vee B_m],B_1 \vee \dots \vee B_m) = 1 \qquad (13)$$

This also follows from axiom II,1; when the term in the square brackets in (13) is denoted by B, the second expression assumes the form P(A.B,B), and this probability is = 1 because of (A.B ⊃ B). Using (12), we can write (4) in the form
$$P(A.[B_1 \vee \dots \vee B_m],D) = \sum_{k=1}^{m} P(A.[B_1 \vee \dots \vee B_m],B_k) \cdot P(A.B_k,D) \qquad (14)$$
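The relation between (4), (12), and (14) can be sketched in a few lines of Python. Everything numerical here is a made-up illustration of the election example (four candidates, of whom the last two are later eliminated), and the function names are assumptions introduced for this sketch.

```python
from fractions import Fraction

def reduced_probabilities(p_b):
    """Formula (12): P(A.[B_1 v ... v B_m],B_k) = P(A,B_k) / sum_i P(A,B_i)."""
    total = sum(p_b)
    return [p / total for p in p_b]

def special_rule_of_reduction(p_b, p_d_given_b):
    """Formula (4) for an exclusive disjunction B_1 v ... v B_m.

    p_b[k] = P(A,B_k), p_d_given_b[k] = P(A.B_k,D).
    """
    numerator = sum(p * d for p, d in zip(p_b, p_d_given_b))
    return numerator / sum(p_b)

# Hypothetical election: P(A,B_k) for four candidates (complete disjunction)
# and P(A.B_k,D), the chance that candidate k concludes the pact.
p_b = [Fraction(4, 10), Fraction(3, 10), Fraction(2, 10), Fraction(1, 10)]
p_d = [Fraction(9, 10), Fraction(1, 2), Fraction(1, 10), Fraction(3, 10)]

before = special_rule_of_reduction(p_b, p_d)         # rule of elimination (7)
after = special_rule_of_reduction(p_b[:2], p_d[:2])  # only B_1, B_2 remain

weights = reduced_probabilities(p_b[:2])
assert sum(weights) == 1                                       # relation (13)
assert after == sum(w * d for w, d in zip(weights, p_d[:2]))   # form (14)
print(before, after)  # 14/25 51/70
```

With these assumed values the probability of the pact rises from 14/25 to 51/70 once the two less favorable candidates are eliminated; the reduced probabilities 4/7 and 3/7 act as the weights of the mean.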
The desired probability containing the "or" in the first term is here determined by a summation of terms, each of which contains a probability P(A.B_k,D) multiplied by the corresponding reduced probability. For the calculation of the probabilities having an "or" in the first term, the probabilities P(A.B_k,D) do not suffice, a peculiarity reminiscent of Bayes's rule. They must first be multiplied by the reduced probabilities P(A.[B_1 ∨ ... ∨ B_m],B_k), which, in turn, are determined by the values P(A,B_k) according to (12). Without these divergent or antecedent probabilities the problem remains indeterminate.

We turn now to the extension of these results to nonexclusive disjunctions. As before, we start from (1). However, when we substitute here for B the nonexclusive disjunction B ∨ C, we must use the general theorem of addition and thus obtain, instead of (2), the formula
$$P(A.[B \vee C],D) = \frac{P(A,B.D) + P(A,C.D) - P(A,B.C.D)}{P(A,B) + P(A,C) - P(A,B.C)} \qquad (15)$$
Applying the general theorem of multiplication, we arrive at the relation

$$P(A.[B \vee C],D) = \frac{P(A,B) \cdot P(A.B,D) + P(A,C) \cdot P(A.C,D) - P(A,B) \cdot P(A.B,C) \cdot P(A.B.C,D)}{P(A,B) + P(A,C) - P(A,B) \cdot P(A.B,C)} \qquad (16)$$

This formula will be called the general rule of reduction. It contains the special rule, expressed in (3), as the special case resulting for P(A.B,C) = 0.

We can use formula (16) to determine generalized reduced probabilities, corresponding to (12); for this purpose we substitute B for D. Taking account of the fact that P(A.B,B) = 1 and using (3, § 21), we obtain
$$P(A.[B \vee C],B) = \frac{P(A,B)}{P(A,B) + P(A,C) - P(A,B) \cdot P(A.B,C)} \qquad (17)$$

$$P(A.[B \vee C],C) = \frac{P(A,C)}{P(A,B) + P(A,C) - P(A,C) \cdot P(A.C,B)} \qquad (18)$$

These formulas determine the value of the reduced probabilities for nonexclusive disjunctions. The denominators of (17) and (18) are equal because of (3, § 21).

Introducing the reduced probabilities into (16), we can give to the general rule of reduction a form corresponding to (14):
$$P(A.[B \vee C],D) = P(A.[B \vee C],B) \cdot P(A.B,D) + P(A.[B \vee C],C) \cdot P(A.C,D) - P(A.[B \vee C],B) \cdot P(A.B,C) \cdot P(A.B.C,D) \qquad (19)$$

For exclusive disjunctions the last term disappears because P(A.B,C) = 0, and the formula is thus transformed into (14) written for m = 2.
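The general rule of reduction (16) can be checked against a direct enumeration. The sketch below assumes a small, arbitrary joint distribution of B, C, D within the reference class A; the weights and the helper name `prob` are hypothetical and serve only to compare the two sides of (16).

```python
from fractions import Fraction
from itertools import product

# Hypothetical relative frequencies, within A, of the eight combinations
# of (B, C, D); the weights 1/36 ... 8/36 add up to 1.
weights = dict(zip(product([True, False], repeat=3),
                   [Fraction(n, 36) for n in range(1, 9)]))

def prob(event, given=lambda b, c, d: True):
    """Probability of `event` within A restricted by the condition `given`."""
    num = sum(w for k, w in weights.items() if given(*k) and event(*k))
    den = sum(w for k, w in weights.items() if given(*k))
    return num / den

is_b = lambda b, c, d: b
is_c = lambda b, c, d: c
is_d = lambda b, c, d: d

p_ab, p_ac = prob(is_b), prob(is_c)                # P(A,B), P(A,C)
p_ab_c = prob(is_c, given=is_b)                    # P(A.B,C)
p_ab_d = prob(is_d, given=is_b)                    # P(A.B,D)
p_ac_d = prob(is_d, given=is_c)                    # P(A.C,D)
p_abc_d = prob(is_d, given=lambda b, c, d: b and c)  # P(A.B.C,D)

# Left side of (16): D within the reduced reference class A.[B v C].
lhs = prob(is_d, given=lambda b, c, d: b or c)
# Right side of (16).
rhs = ((p_ab * p_ab_d + p_ac * p_ac_d - p_ab * p_ab_c * p_abc_d)
       / (p_ab + p_ac - p_ab * p_ab_c))
assert lhs == rhs
print(lhs)  # 3/7
```

Exact fractions are used so that the comparison does not depend on rounding.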
It is possible to extend the general rule of reduction to disjunctions of more than two events. The resulting formula, however, is cumbersome because of the complicated form that the general theorem of addition assumes for more than two events. Therefore it is not presented here.

Consider a schema that represents a combination of the rule of reduction with the rule of Bayes. Assume the observation of an event D that can be explained by several possible causes B_1 ... B_r. We do not know which of the causes exists, but we know their antecedent probabilities P(A,B_k) relative to a common first term A and, furthermore, the probabilities P(A.B_k,D) for the production of D by the individual causes B_k. The disjunction B_1 ∨ ... ∨ B_r may be complete and exclusive. We ask for the probability P(A.D,E) of an event E resulting from D. What are known, however, are only the individual probabilities P(A.B_k.D,E), which confer upon E a probability relative to D and to the cause B_k that produced D. The schema is illustrated by figure 6.

[Fig. 6. Schema for rule of composition, according to (21).]

For example, assume that D means a symptom of disease that may be explained by several possible causes, and let E mean the case of death. We know the probability of a lethal issue for each of the diseases B_k, and we ask for the probability of the death of the patient who shows the symptom D. The probability can be constructed as a mean in terms of the rule of reduction, after the probabilities of the individual causes B_k have been computed through the rule of Bayes. For this purpose, in turn, we must know the antecedent probabilities P(A,B_k), in which A means the class of persons of a certain age and state of health; moreover, we must know the probabilities P(A.B_k,D) for the production of the symptom D by the individual diseases B_k.
We have

$$P(A.D,E) = P(A.[B_1 \vee \dots \vee B_r].D,E) = \sum_{k=1}^{r} P(A.D,B_k) \cdot P(A.D.B_k,E) \qquad (20)$$
according to (4), when we put in (4) A.D for A and E for D, the denominator being = 1 because of the completeness of the disjunction. Putting for P(A.D,B_k) its value resulting from the rule of Bayes (10, § 21), we obtain

$$P(A.D,E) = \frac{\sum_{k=1}^{r} P(A,B_k) \cdot P(A.B_k,D) \cdot P(A.B_k.D,E)}{\sum_{k=1}^{r} P(A,B_k) \cdot P(A.B_k,D)} \qquad (21)$$
This formula may be called the rule of composition. It shows how the integral probability from D to E is composed of the individual probabilities that depend on the cause B_k. In the interpretation given, the rule of composition represents an inference from the present (D) by way of the past (B_k) to the future (E). Such inferences occur in many kinds of scientific prognoses. As explained for the rule of Bayes, however, the temporal interpretation is not the only possible interpretation; formula (21) and figure 6 represent a logical structure capable of many interpretations. Note that the rule of composition (21) becomes identical with the rule of elimination (21, § 19) when A is identical with D.
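The rule of composition (21) can be sketched as a two-step computation: the rule of Bayes first, then the mean (20). In the Python sketch below the function name `rule_of_composition` and all numbers (three hypothetical diseases with assumed antecedent probabilities, symptom rates, and lethality rates) are illustrative assumptions, not data from the text.

```python
from fractions import Fraction

def rule_of_composition(p_b, p_d_given_b, p_e_given_bd):
    """Formula (21): P(A.D,E) from antecedent and forward probabilities.

    p_b[k] = P(A,B_k), p_d_given_b[k] = P(A.B_k,D),
    p_e_given_bd[k] = P(A.B_k.D,E).
    """
    joint = [p * d for p, d in zip(p_b, p_d_given_b)]
    total = sum(joint)  # P(A,D), the denominator of (21)
    if total == 0:
        raise ValueError("P(A,D) must be different from 0")
    return sum(j * e for j, e in zip(joint, p_e_given_bd)) / total

# Hypothetical data for three diseases B_1, B_2, B_3.
p_b = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]     # P(A,B_k)
p_d = [Fraction(1, 10), Fraction(1, 2), Fraction(4, 5)]     # P(A.B_k,D)
p_e = [Fraction(1, 100), Fraction(1, 10), Fraction(1, 2)]   # P(A.B_k.D,E)

composed = rule_of_composition(p_b, p_d, p_e)

# Same result in two steps: inverse probabilities by the rule of Bayes,
# then the weighted mean of formula (20).
joint = [p * d for p, d in zip(p_b, p_d)]
inverse = [j / sum(joint) for j in joint]
two_step = sum(i * e for i, e in zip(inverse, p_e))
assert composed == two_step
print(composed)  # 191/720
```

The intermediate list `inverse` holds the probabilities P(A.D,B_k) of the individual causes, which is where the antecedent probabilities enter.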
§ 23. The Relation of Independence

The independence of two events B and C was defined by the condition

$$P(A.B,C) = P(A,C) \qquad (1)$$

We then derived for the theorem of multiplication the special form

$$P(A,B.C) = P(A,B) \cdot P(A,C) \qquad (2)$$
It is also possible to consider (2) as the definition of independence and then to derive (1). This method has the disadvantage that it breaks down if P(A,B) = 0, since then the fraction P(A,B.C)/P(A,B) assumes the indeterminate form 0/0 and thus does not determine the value of P(A.B,C). In this case, (1) is not derivable from (2), whereas (2) is always derivable from (1), even for the case P(A,B) = 0. This is why it is preferable to define independence by (1).
The general theorem of addition assumes for independent events the particular form

$$P(A,B \vee C) = P(A,B) + P(A,C) - P(A,B) \cdot P(A,C) \qquad (3)$$
If the values P(A,B) and P(A,C) are small numbers, the value of their product is small within a lower order of magnitude; for such events the product term in (3) can be omitted, which means that in practice it is permissible, for low probabilities, to replace the general rule of addition by the special one. Thus, if P(A,B) = P(A,C) = 1/1000, their product is 1/1,000,000; this value can be neglected in (3), and we have, with sufficient approximation, P(A,B ∨ C) = 2/1000.

Although the general inequalities (10 and 13, § 20) show that the probability P(A,B ∨ C) cannot be greater than 1 or smaller than 0, a simple proof may be added to show that this condition is always satisfied for the form (3).¹ Let us put

$$P(A,B) = p \qquad P(A,C) = q \qquad (4)$$

Then the condition under consideration requires that

$$0 \leq p + q - pq \leq 1 \qquad (5)$$
Now this inequality holds for all numbers p and q between 0 and 1, limits included. To show this, we put

$$p = 1 - p' \qquad q = 1 - q' \qquad (6)$$

Inserting these expressions in (5), we arrive at

$$0 \leq 1 - p'q' \leq 1 \qquad (7)$$
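The substitution (6) and the resulting bounds can be confirmed on a grid of values. The Python sketch below is illustrative only; the function name `union_if_independent` and the grid spacing are assumptions of this sketch.

```python
from fractions import Fraction

def union_if_independent(p, q):
    """Form (3): P(A,B v C) = p + q - p*q for independent B and C."""
    return p + q - p * q

grid = [Fraction(k, 10) for k in range(11)]  # 0, 1/10, ..., 1
for p in grid:
    for q in grid:
        value = union_if_independent(p, q)
        # Substitution (6) turns p + q - pq into 1 - p'q', as in (7).
        assert value == 1 - (1 - p) * (1 - q)
        # Inequality (5).
        assert 0 <= value <= 1
        # The limits are reached only in the extreme cases.
        assert (value == 0) == (p == 0 and q == 0)
        assert (value == 1) == (p == 1 or q == 1)

# For small probabilities the product term is of a lower order of magnitude.
small = Fraction(1, 1000)
exact = union_if_independent(small, small)
approx = small + small
print(exact, approx)  # 1999/1000000 1/500
```

The last two lines compare the exact value with the special rule of addition for two assumed probabilities of 1/1000 each; the difference is one part in a million.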
This is indeed true if p' and q' are between 0 and 1, limits included. The case of equality with 0 in (7) can occur only when both p' and q' are = 1, that is, when both p = 0 and q = 0; and equality with 1 will occur only when p' = 0 or q' = 0, that is, when at least one of the two values p or q = 1. In all other cases the expression considered in (5) will differ from its lower and upper limit.

It is important to realize that the independence defined by (1) is a three-place relation, that is, a relation involving the terms B, C, and A. We must say, B is independent of C with respect to A. Without stating the reference term A we cannot speak of independence. This is contrary to linguistic usage, in which the reference term usually is not expressed.

It may be asked whether this usage can, perhaps, be justified by saying that certain events B and C are independent relative to all A as reference classes, so that the three-place relation can be transformed into a two-place relation by generalization in A. However, it turns out that this assumption is erroneous, for there always exist events A in respect to which any two events B and C are mutually dependent.

This can be proved by choosing A as given by the disjunction A.[B ∨ C]. We can then show that, relative to this reference class, B and C are not independent. Using formula (18, § 22) and applying (1), we obtain the following expression for a reduced probability, holding for any events B and C independent of each other with respect to A:

$$P(A.[B \vee C],C) = \frac{P(A,C)}{P(A,B) + P(A,C) - P(A,B) \cdot P(A,C)} \qquad (8)$$

¹ See also the remarks following (15, § 19).
Applying the inequality (5) to the denominator, we see that, apart from the extreme cases P(A,B) = 1 or P(A,C) = 1, the expression (8) is > P(A,C). Using (4e, § 4) and (1), we have

$$P(A.[B \vee C].B,C) = P(A.B,C) = P(A,C) \qquad (9)$$
Therefore we have
$$P(A . [B \lor C] . B, C) < P(A . [B \lor C], C) \tag{10}$$
Thus with respect to the reference class $A . [B \lor C]$ the two events B and C are not independent.
This result may be illustrated by an example concerning bets on two horses in different races. Let B be the case that the first horse wins; C, that the second horse wins. A is given by the general conditions before the races. Assume $P(A,B)$ = 50%, $P(A,C)$ = 80%. Relative to the general conditions A, the two results are independent and thus (1) is satisfied; if the first horse wins, the chances for the other are not changed. Assume that the races are over and we are told that one of the horses has won, but not which horse it is. The probability that the second horse has won, relative to what we know now, is given by (8); this formula furnishes the value 89%. So we now have a greater chance than before that the second horse has won. At this moment we learn that it was the first horse that won; the result as to the second horse is still unknown. Now the probability that the second horse has won is given by (9) and is the same as in the beginning, namely, 80%. This shows that relative to the situation $A . [B \lor C]$ the case that the second horse has won is not independent of whether the first horse has won. In this situation, additional knowledge as to the winning of the first horse will change the probability with which we may expect the second horse to have won.
This example shows that we must regard independence as a three-place relation, which is comparable, for instance, to the geometrical relation "between": the statement, "A lies between B and C", can be formulated only
for three terms. Just as the between-relation is symmetrical with respect to the terms B and C, so is the independence relation symmetrical in B and C. For it has been shown in (5, § 21) that if (1) is valid, it is likewise true that
$$P(A . C, B) = P(A,B) \tag{11}$$
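The figures of the horse-race example, and the symmetry expressed in (11), can be checked by direct enumeration. The following is a minimal sketch in Python and not part of the original text; the function names `joint` and `cond` and the encoding of outcomes as pairs of booleans are illustrative assumptions. It builds the four joint outcomes for B and C under independence relative to A, with $P(A,B)$ = 0.5 and $P(A,C)$ = 0.8, and then narrows the reference class.

```python
from fractions import Fraction
from itertools import product

def joint(p_b, p_c):
    """Joint weights over outcomes (b, c) relative to A, assuming B and C independent."""
    dist = {}
    for b, c in product([True, False], repeat=2):
        pb = p_b if b else 1 - p_b
        pc = p_c if c else 1 - p_c
        dist[(b, c)] = pb * pc
    return dist

def cond(dist, event, given):
    """Probability of `event` within the reference class selected by `given`."""
    ref = sum(p for o, p in dist.items() if given(o))
    hit = sum(p for o, p in dist.items() if given(o) and event(o))
    return hit / ref

dist = joint(Fraction(1, 2), Fraction(4, 5))

b_wins = lambda o: o[0]
c_wins = lambda o: o[1]
either = lambda o: o[0] or o[1]
always = lambda o: True

p_c = cond(dist, c_wins, always)               # P(A, C)
p_c_given_b = cond(dist, c_wins, b_wins)       # P(A.B, C), equals P(A, C) by (1)
p_b_given_c = cond(dist, b_wins, c_wins)       # P(A.C, B), equals P(A, B) by (11)
p_c_given_either = cond(dist, c_wins, either)  # P(A.[B v C], C), formula (8)
p_c_given_either_b = cond(                     # P(A.[B v C].B, C), formula (9)
    dist, c_wins, lambda o: either(o) and b_wins(o)
)

print(p_c, p_c_given_b, p_b_given_c)
print(p_c_given_either, p_c_given_either_b)
```

Exact fractions are used so that the comparison in (10) is not blurred by rounding; the value obtained for (8) is 8/9, which rounds to the 89% quoted in the example.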
Furthermore, the independence relation is similar to the between-relation in that it is not transitive with respect to B and C. If B and C are mutually independent relative to A, and if C and D are mutually independent with respect to A, then B and D need not be mutually independent relative to A. In the case of the between-relation there even exists intransitivity for B and C, that is, if A is between B and C, and A between C and D, then A never lies between B and D. The independence relation is only nontransitive, that is to say, in the case considered B and D may be mutually independent with respect to A, but such is not necessarily the case. An instance of the nontransitive case is obtained when two dice are linked with a piece of string and, besides, a third, free die is used. The first two dice produce mutually dependent sequences, each of which, however, is independent of the third sequence.
A further property of the independence relation must now be presented. Let three events B, C, D be given, any pair of which is mutually independent with respect to A; then it does not follow that one of the events is independent of the other two with respect to A. We must understand this statement in the following way. From the relations
$$\begin{aligned} P(A . B, C) &= P(A . D, C) = P(A, C) \\ P(A . C, D) &= P(A . B, D) = P(A, D) \\ P(A . D, B) &= P(A . C, B) = P(A, B) \end{aligned} \tag{12}$$
it does not follow that the relations
$$\begin{aligned} P(A . B . C, D) &= P(A, D) \\ P(A . C . D, B) &= P(A, B) \\ P(A . D . B, C) &= P(A, C) \end{aligned} \tag{13}$$
also hold. This fact is expressed by saying that the independence relation is not combinable. This is shown by the following considerations. If we add to the probabilities on the left side of (13) those obtained by negating B, C, D in the reference class, that is, if we regard probabilities of the kind $P(A . \bar{B} . C, D)$ or $P(A . B . \bar{C}, D)$ or $P(A . \bar{B} . \bar{C}, D)$, there are twelve probabilities that have a
triple reference class. For these, according to the rule of elimination, there are only six independent equations of the form
$$P(A,D) = P(A . B, D) = P(A . B, C) \cdot P(A . B . C, D) + [1 - P(A . B, C)] \cdot P(A . B . \bar{C}, D) \tag{14}$$
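The probabilities having a triple reference class are, therefore, not determined by the probabilities having a single or a double reference class, and thus (13) does not follow from (12). An example of a case for which (12) is valid, but not (13), is provided by the sequence¹
$$\begin{array}{ccccccccc} A & A & A & A & A & A & A & A & \ldots \\ B & \bar{B} & B & \bar{B} & B & \bar{B} & B & \bar{B} & \ldots \\ C & C & \bar{C} & \bar{C} & C & C & \bar{C} & \bar{C} & \ldots \\ D & \bar{D} & \bar{D} & D & D & \bar{D} & \bar{D} & D & \ldots \end{array} \tag{15}$$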
for which the first part written down is to be repeated periodically in the same order. Here all the probabilities (12) are equal to 1/2. But $P(A . B . C, D)$ is equal to 1; so is $P(A . \bar{B} . \bar{C}, D)$, and so on. Sequences for which, apart from the relations (12), the relations (13) are fulfilled, are called completely independent. This notation applies similarly for a greater number of sequences.
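The difference between (12) and (13) can be checked on one period of such a sequence. The sketch below is an illustration and not part of the original text. It assumes a period of four elements in which B alternates singly, C alternates in pairs, and D occurs exactly when B and C agree; this is one arrangement consistent with the probabilities stated for (15), not necessarily the printed one. The helper `freq` and the tuple encoding are hypothetical names chosen here.

```python
from fractions import Fraction
from itertools import permutations

# One period of the assumed sequence. Each element is a tuple (b, c, d);
# True means the attribute occurs, False means its negation occurs.
B = [True, False, True, False]
C = [True, True, False, False]
D = [b == c for b, c in zip(B, C)]  # assumption: D occurs exactly when B and C agree
period = list(zip(B, C, D))

def freq(event, given=lambda e: True):
    """Relative frequency of `event` within the reference class `given`,
    counted over one period (for a periodic sequence this equals the limit)."""
    ref = [e for e in period if given(e)]
    return Fraction(sum(1 for e in ref if event(e)), len(ref))

# Relations (12): for every ordered pair, the probability with a double
# reference class equals the one with a single reference class, namely 1/2.
pairwise = all(
    freq(lambda e: e[j], lambda e: e[i]) == freq(lambda e: e[j]) == Fraction(1, 2)
    for i, j in permutations(range(3), 2)
)

# Relations (13) fail: with a triple reference class, D is fully determined.
p_d_given_b_c = freq(lambda e: e[2], lambda e: e[0] and e[1])            # P(A.B.C, D)
p_d_given_nb_nc = freq(lambda e: e[2], lambda e: not e[0] and not e[1])  # both B and C negated
p_d_given_b_nc = freq(lambda e: e[2], lambda e: e[0] and not e[1])       # only C negated

print(pairwise, p_d_given_b_c, p_d_given_nb_nc, p_d_given_b_nc)
```

With these assumed values the pairwise check succeeds, while $P(A . B . C, D)$ comes out as 1 instead of $P(A,D)$ = 1/2, so the sequence satisfies (12) without being completely independent.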
§ 24. Complete Probability Systems
In § 16 the assumption of a compact sequence A was introduced and shown to be convenient for the frequency interpretation, because it leads to the simple formula (4, § 16). It is possible to introduce this assumption by a logical device that makes its truth analytic: by replacing the class A by the universal class $A \lor \bar{A}$. The condition $x_i \in A \lor \bar{A}$ is then tautologically satisfied for every element $x_i$. To simplify the notation we introduce the rule that the universal class may be omitted in the first term of a probability expression. This rule is expressed by the definition
$$P(B) =_{Df} P(A \lor \bar{A}, B) \tag{1}$$
The probability $P(B)$ may be called an absolute probability, in contradistinction to the relative probabilities so far considered. An absolute probability can be regarded as a relative probability the reference class of which is the universal class. If the statement $x_i \in A$ is true for all $x_i$, though not analytic, the class A, for this sequence, is equivalent to the universal class $A \lor \bar{A}$. If a sequence is