Symmetry Arguments in
Probability Kinematics
R.I.G. Hughes
Yale University
Bas C. van Fraassen
Princeton University
Probability kinematics is the theory of how subjective probabilities change with time, in response to certain constraints (accepted by the subject in response to his experience or incoming information). Transformations -- rules for updating one's probabilities -- which have been described and to some extent accepted in the literature include (in increasing order of generality): conditionalization, Jeffrey conditionalization, and INFOMIN (relative information minimizing). The present paper investigates the general problem. By an argument based purely on symmetry considerations it is demonstrated that conditionalization and Jeffrey conditionalization are the unique admissible rules for the cases to which they apply. The general problem is thereby reduced to a sequence of two operations: the first a determination of posterior odds on some partition, and the second a Jeffrey conditionalization. In Section 2 we give a new deduction of INFOMIN from a simple postulate; in Section 3 a rival rule (M.T.P.) derived from an analogy to quantum mechanics. The Appendix presents some preliminary comparisons.¹
1. Significant Structure and Symmetry
1.1. The general problem of probability kinematics
We are given a (prior) probability function p, defined on an algebra of sets F, and asked to revise p so as to produce a (posterior) p' which meets certain constraints CC. These constraints are stated as conditions on the probabilities assigned to a certain set X of elements of F. As an example, imagine that I wish to transform my probability function in response solely to the constraint that my probability for rain and for snow tomorrow should be equal. (Perhaps I have just heard only the last part of a weather report on the radio, to this effect.) The coarsest relevant partition is {rain, snow, neither rain nor snow, both rain and snow}. My present probabilities for these are perhaps 0.3, 0.2, 0.4, 0.1. The constraint could be satisfied in many ways, with any posterior of the form x+a, x+a, x+b, x such that 4x + 2a + b = 1.
But some of these, e.g., with x = 0.2, would look very unreasonable and capricious as a response. The solution of the general problem consists in defining a function s which will turn an arbitrary given prior p into such a posterior p' = sp (provided CC is applicable). Unless we have further desiderata for s, the problem does not make sense. But the inspiration for the problem is an imagined situation in which a person seriously tries to 'update' his probabilities. Such a person can state his concrete problem in a number of ways, and the general solution we offer him must lead to 'essentially equivalent' solutions when applied to 'essentially equivalent' problems.

The clue lies in the character of the constraints, which are assumed to concern a specific family X of events (or propositions). Although X is part of F, it can be regarded as part of many other algebras overlapping F. Indeed the person who states his constraints to be satisfied may choose a small algebra, large enough to contain X, or any number of larger ones, to represent the problem he faces. He may or may not have probabilities for many propositions logically independent of X, and our advice to him should not be affected by whether or not he includes them in the description. Moreover, if he chooses a representation of some sort to depict his problem -- for example, a Venn diagram to represent the propositions in question -- he can do so in many alternative but equivalent ways. We should be able to identify when he presents us with 'the same' problem and when not, for if he does we should give him 'the same solution'.
1.2. A symmetry principle
We must first isolate the significant structure which can be present in the problem situation. This is quite easy, for it is dictated by probability theory: prior and posterior probabilities must be defined on a suitable algebra. The total structure of a probability space <F, p> consists in the character of the algebra F and the measure p, and the structure all probability spaces have is that F is a Boolean (or Boolean sigma-) algebra and p a measure thereon. Transformation of one problem situation into another one which may be called 'essentially the same' should therefore preserve exactly this structure.
To make this precise, let us call a measure embedding any one-to-one map g of <F, p> into <F', p'> such that g is an isomorphism as far as the algebraic operations are concerned, and also preserves measure, i.e., p'(gA) = p(A) for all A in F. Clearly g has an inverse, and we may restate this either as p'g = p or p' = pg^{-1} (the first for the domain of g and the second for its range, which will generally be only part of the domain of p'). If CC is a set of constraints on probabilities to be assigned to elements of X, part of F, we have an equivalent set of constraints CC' imposed on the images g(A), A in X, related to probabilities defined on the target algebra F'. The requirement upon our general solution is therefore that the function s', which should be present for imposing CC' on priors defined on F', is related to s in the following way:

SYMMETRY. If g is a measure embedding of <F, p> into <F', p'>, then sp(A) = s'p'(gA) for all A in F.

In other words, when p is just p'g then sp should be s'p'g. The practical effect of following this principle is that, when we try to identify s, we are always allowed to switch our attention to a more tractable 'equivalent' probability space (either the domain or the range of a measure embedding relating it to the one we had).
1.3. Main theorem and conditionalization corollaries
The constraints CC are stated as conditions on the posterior probabilities to be assigned to a certain set X of elements of F. Without loss of generality, we can take X to be a subalgebra; alternatively, as long as X is countable, we can without loss take it to be a partition. We shall henceforth assume the latter. As a working hypothesis, let us assume s to exist and, when applied to p, to give the posterior probabilities sp(B) = r_B for members B of partition X. It is clear that p'(A) = Σ{p'(A ∩ B) : B in X} for any probability function p', so we will have identified sp if we can determine what values it gives to subsets of members B of X. Therefore we begin with:
Lemma 1. Let E, E' be subsets of member B of X and p(E) = p(E'). Then sp(E) = sp(E').
For proof, we look at the subalgebra of F generated by X ∪ {E, E'}; call it F_0. Let p_0 be p restricted to F_0. The identity function is then a measure embedding of <F_0, p_0> into <F, p>. We now construct a measure embedding of the former onto itself as follows:

g(E - E') = E' - E
g(E' - E) = E - E'
g is the identity function on X ∪ {E ∩ E', B - (E ∪ E')}
g(A ∪ A') = g(A) ∪ g(A') when A, A' are disjoint.

A Venn diagram suffices to depict this subalgebra and this automorphism; because p(E) = p(E') it follows that p(E - E') = p(E' - E), so g is a measure embedding.
We can now apply SYMMETRY to this measure embedding g of <F_0, p_0> onto itself to deduce that sp_0(E) = sp_0(gE) = sp_0(E'). But secondly, the identity map is a measure embedding of <F_0, p_0> into <F, p>, and so we also deduce that sp(E') = sp_0(E') = sp_0(E) = sp(E), as required.
Because of this lemma, we now know that for a subset E of a member of the partition, the sole relevant factor is its prior probability. So there exists, for a given member B of X, a function f such that sp(E) = f(p(E)) when E ⊆ B. What is this function like? Here it is more convenient to embed our problem in a context where real analysis can apply.
A probability space <F, p> is full when for each element A of F, p takes every value in [0, p(A)] on the elements of F which are subsets of A.
Lemma 2. There exists a measure embedding of <F, p> in a full space <F*, p*>.

This embedding is easily constructed with <F*, p*> the product of <F, p> with <[0,1], m>, where m is Lebesgue measure. Then F* is the family {A × Q : A in F and Q a Borel set on the unit interval}, p*(A × Q) = p(A)m(Q), and the embedding sends A to A* = A × [0,1]. To continue the main argument, we now apply Lemma 1 to our thinking about this full space, and say accordingly that for element B* there exists a numerical function f such that s*p*(E) = f(p*(E)) when E ⊆ B*. Here the function is perceived to be a map of [0, p(B)] = [0, p*(B*)] onto [0, r_B]. It must be additive, for if E, E' are disjoint parts of B* we have

f(p*(E) + p*(E')) = f(p*(E ∪ E')) = s*p*(E ∪ E') = s*p*(E) + s*p*(E') = f(p*(E)) + f(p*(E')),

and because the space is full we have disjoint parts E, E' with probabilities r, s respectively whenever r + s ≤ p(B); hence f(r + s) = f(r) + f(s) on [0, p(B)].
A theorem of calculus (which Arthur Fine used to simplify Teller's proof) implies that f has a constant derivative, thus f(x) = kx + m. Looking at x = 0 and x = p(B) respectively, we deduce that f(0) = 0, so m = 0, and k = r_B / p(B). Hence by SYMMETRY the function s, like s*, is given by the equation:

sp(E) = (r_B / p(B)) p(E)   for E ⊆ B in X,

as required.
We have now proved our first theorem:
SYMMETRY THEOREM. If there exists a function s corresponding to constraints CC on the posterior probabilities for members of partition X, then for each probability function p to which s is applicable and such that p is positive on all members of X, we have sp(- | B) = p(- | B) for all B in X.

It was assumed, of course, that any such function must satisfy the SYMMETRY principle of Section 1.2. As an immediate corollary we have Teller's theorem:

1st Corollary (Teller). If p(B) ≠ 0 and CC is the constraint that the posterior probability for B equal 1, then sp(-) = p(- | B).
Richard Jeffrey (1983) proposed, for the constraint that the posterior probability for B equal r, the rule p'(-) = r p(- | B) + (1 - r) p(- | B̄), where B̄ is the complement of B. We prove similarly the uniqueness of his rule for this constraint, in the general form:

2nd Corollary (Jeffrey Conditionalization Rule). If X is a countable partition with p(B) ≠ 0 for all B in X, and CC the constraint that the posterior probability for B equal r_B, then sp(-) = Σ{r_B p(- | B) : B in X}.
In these cases, therefore, our first theorem already singles out a unique way of imposing the constraint.

We may sum up our results so far as follows: whatever the constraints CC (stated with reference to partition X) are, the effect of the function s is equivalent to an operation that determines posterior probabilities on the partition X, followed by the operation of Jeffrey conditionalization with those posterior probabilities. The task that remains is to investigate the first operation with greater generality.
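To make the corollaries and the two-stage decomposition concrete, here is a minimal sketch in Python (ours, not the authors' own programs). The partition {rain, no rain} over the atoms of Section 1.1 and the posterior weights r_B are illustrative assumptions, not values dictated by that section's constraint.

```python
# Minimal sketch of conditionalization (Teller's corollary) and Jeffrey
# conditionalization (2nd Corollary) on a finite algebra. The atoms and priors
# come from Section 1.1; the partition and the posterior weights r_B are made up.

def conditionalize(prior, B):
    """Posterior p(- | B): renormalize the prior over the atoms inside B."""
    pB = sum(prior[a] for a in B)
    assert pB > 0, "cannot conditionalize on an event of prior probability zero"
    return {a: (prior[a] / pB if a in B else 0.0) for a in prior}

def jeffrey(prior, weights):
    """Jeffrey conditionalization: sp(-) = sum of r_B * p(- | B) over the partition.
    'weights' maps each cell B (a frozenset of atoms) to its posterior weight r_B."""
    posterior = {a: 0.0 for a in prior}
    for B, r in weights.items():
        cond = conditionalize(prior, B)
        for a in prior:
            posterior[a] += r * cond[a]
    return posterior

prior = {"rain only": 0.3, "snow only": 0.2, "neither": 0.4, "both": 0.1}
weights = {  # hypothetical posterior weights on the partition {rain, no rain}
    frozenset({"rain only", "both"}): 0.5,
    frozenset({"snow only", "neither"}): 0.5,
}
print(jeffrey(prior, weights))
# rain only ~0.375, both ~0.125, snow only ~0.167, neither ~0.333:
# within each cell the prior odds are preserved, as the Symmetry Theorem requires.
print(conditionalize(prior, frozenset({"rain only", "both"})))  # the case r_B = 1
```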
2. Optimal Uniform Motion: a Deduction of the INFOMIN Rule

2.1. The odds picture
In view of t h e above r e d u c t i o n , we can h e n c e f o r t h e x p e c t no h e l p from t h e s t r u c t u r e of p r o b a b i l i t y t h e o r y . For j u s t cons i d e r : we have a c o n s t r a i n t on p o s t e r i o r p r o b a b i l i t i e s f o r members A , A of p a r t i t i o n C . These p o s t e r i o r p r o b a b i l i t i e s a r e n unknown:; we can t h i n k of t h i s a s a v e c t o r :
...
partition prior posterior constraint
l ~... ~ A n ,l 2 =(xl,
7 1Ul,
...
---
9
, xn> 9
Yn)
F(y) = m.
It might seem at first blush that one substantial bit of probability theory remains: Σx_i = Σy_i = 1. But this represents an arbitrary scaling convention. The only real information to be gotten from x, for instance, is what the prior odds of A_i to A_j are, namely x_i : x_j. This can be represented equally well by the odds vector kx, provided k is positive. Let us call vectors x and z equivalent whenever z = kx for some positive real number k, and call an odds vector z a probability vector exactly if Σz_i = 1. Thinking in terms of odds vectors will remove the encumbrance of an illusory condition.

What exactly is objective about the structure of this space of odds vectors? All comparisons of such form as: my odds for A_1 to A_2 are twice yours, are significant.
Let us call an equation of the form x_i/x_j = c (y_i/y_j) an odds comparison between x and y. These equations, like the equivalence relation described above, describe significant aspects of the structure. Let us thirdly define, for any two vectors, the quotient (x/y) = <x'_1/y'_1, ..., x'_n/y'_n>, where x' and y' are the probability vectors equivalent to x and y respectively.
For both odds comparisons and quotients, set (0/0) = 1 and (k/0) = ∞ when k is positive. It will be clear that we have not introduced anything as simple as a metric, but we now have three ways to describe structure.

Theorem. The following are equivalent conditions:
(a) x and y are equivalent odds vectors;
(b) the odds comparisons of x and y with any other odds vector are the same, i.e., (x_i/x_j)/(z_i/z_j) = (y_i/y_j)/(z_i/z_j) for all i, j, and all z;
(c) the quotients of x and y with any other vector are the same, i.e., (x/z) = (y/z).
Clearly (a) implies y_i = kx_i for some k, hence (b): y_i/y_j = kx_i/kx_j = x_i/x_j. Similarly (a) implies (c) at once, since the probability vectors equivalent to equivalent vectors are identical. Given (c), let z be the constant vector <1, ..., 1>, equivalent to <1/n, ..., 1/n>. It follows at once that the probability vectors equivalent to x and y are identical, hence (a). Finally we suppose (b), and let z be <1, ..., 1> again; it is clear that there are unique numbers c_i, i = 2, ..., n, such that y_i = c_i y_1 and x_i = c_i x_1; thus x and y are each equivalent to <1, c_2, ..., c_n>, and hence to each other.
In geometry we call a transformation isometric if it preserves all metric structure, i.e., all distances. Here we have no metric, but let us call uniform any transformation U (a one-to-one onto mapping) which preserves the structure we have, i.e., (Ux/Uy) = (x/y).

Theorem. If U is a uniform transformation, then the i-th component of (Ux/x) is the same for all x with x_i ≠ 0, i = 1, ..., n.

By definition, (Ux/Uy) = (x/y); hence for each index i, if Ux = x' and Uy = y', we have (x'_i/y'_i) = (x_i/y_i) and therefore (y_i/y'_i) = (x_i/x'_i), as was to be shown.

We see therefore that U is a uniform transformation if and only if there exist numbers u_1, ..., u_n such that Ux = <u_1 x_1, ..., u_n x_n> for all x.
2.2. General kinematics
In geometry or physical kinematics, we denote as 'rigid motion' any continuous transformation which remains isometric, i.e., preserves distances. Let us here define the analogous notion of a uniform motion. Take the prior x as the state of opinion at time t = 0, and write x = x(0). We may imagine this state developing in time as the vector x(t), and require that x(t+d) results from x(t) by a uniform transformation U_d. These operators {U_d} include U_0, which is the identity, and form a semi-group with U_d U_e(x(t)) = x(t+d+e) = U_{d+e} x(t). Let us denote as a uniform motion any such one-parameter semigroup of uniform transformations.

Theorem. If {U_z : z ≥ 0} is a uniform motion, then there are numbers k_1, ..., k_n such that

U_z x = <e^{k_1 z} x_1, ..., e^{k_n z} x_n>.
This equation has many familiar models, such as Lambert's law of light absorption, radioactive decay, and continuously compounded interest in economics. Recalling the remarks at the end of the last section, the theorem says that the quotient of posterior to prior, (U_z x / x), is itself a simple function of time, of the exponential form <u_1^z, ..., u_n^z>.

To prove this theorem, recall the preceding one and let U_z(x) = <u_1(z) x_1, ..., u_n(z) x_n>. Because U_{z+w} = U_z U_w, we also have u_i(z+w) = u_i(z) u_i(w). Thus the function ln u_i is additive (on the same domain, the non-negative real numbers). By the theorem of analysis which we utilized before, it follows that ln u_i(z) = k_i z + b, in other words u_i(z) = e^b e^{k_i z}. Since U_0 is the identity function, we must have e^b = 1; this ends the proof.
2.3. Optimality

Of what interest are uniform motions to probability kinematics? If we look for a rule s to give us the posterior y = sx for prior x, in response to the constraint that F(y) = m, we cannot expect s to be in general a uniform transformation. For clearly if F(x) = m already, then y should be, or at least be equivalent to, x; and the only uniform motions which can have this effect on x have that trivial effect also on every other odds vector which has zeroes wherever x has zeroes. But if there cannot be a single uniform motion which is s, it is still the case that for each x there is some uniform motion U and time t such that sx = U_t x. This obvious fact gives us a useful slant on the problem.
Of course there are in general many uniform motions which will have the required effect. Let us consider the simple constraint: even odds for A_1 versus A_2, stated for partition A_1, A_2, A_3. The initial odds vector is, say, <x_1, x_2, x_3> = <1/6, 1/3, 1/2>.
One way to satisfy the constraint is to let x_1 grow, while keeping the others the same, over a unit interval of time, with the result:

posterior odds <e^k · 1/6, e^0 · 1/3, e^0 · 1/2> = <1/3, 1/3, 1/2>, where e^k = 2, i.e., k = ln 2;

equivalent to <2/7, 2/7, 3/7>, approximately <0.29, 0.29, 0.43>.
Another way is to restate the constraint as: the expectation of q equals 0, where q is the quantity χ_A1 - χ_A2, i.e., has value 1 on A_1, value -1 on A_2, and value 0 on A_3. The prior expectation of q equals -1/6; we let the odds grow over unit time by 'interest rates' proportional to q's proper values:

posterior odds <e^m · 1/6, e^{-m} · 1/3, e^0 · 1/2> = <y_1, y_2, y_3> such that 1·y_1 + (-1)·y_2 + 0·y_3 = 0·(y_1 + y_2 + y_3), i.e., y_1 = y_2; hence e^m/6 = 1/(3e^m), i.e., e^m = √2;

equivalent to <0.24, 0.24, 0.51> approximately.
This latter calculation is the one advised by the so-called INFOMIN rule, and as we can see, the two procedures give very different results. In this particular sort of problem we can approximate the INFOMIN rule by saying: take for the posterior probability of A_1 and A_2 a number that lies roughly halfway between their priors (in this case, roughly equal to 1/4). This minimizes the (technically defined) relative information of posterior to prior -- and in our present sort of example yields a pleasing sense of minimum change on the intuitive level as well, we think.
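The two calculations above are easy to reproduce. The following sketch is ours; it assumes only the prior odds <1/6, 1/3, 1/2> and the constraint of even odds for A_1 versus A_2.

```python
# Sketch (ours) of the two uniform-motion updates of Section 2.3 for the prior
# odds <1/6, 1/3, 1/2> under the constraint "even odds for A1 versus A2".
from math import exp, log

x = [1/6, 1/3, 1/2]          # prior odds vector (here it happens to sum to 1)

def normalize(odds):
    """Reduce an odds vector to the equivalent probability vector."""
    s = sum(odds)
    return [o / s for o in odds]

# First way: let only x1 grow until x1 = x2; growth rates k = (ln 2, 0, 0).
k_first = [log(x[1] / x[0]), 0.0, 0.0]
y_first = [xi * exp(ki) for xi, ki in zip(x, k_first)]   # <1/3, 1/3, 1/2>
print(normalize(y_first))                                # ~ [0.286, 0.286, 0.429]

# Second way: rates proportional to q = (1, -1, 0); y1 = y2 forces e^m = sqrt(2).
m = 0.5 * log(x[1] / x[0])
k_second = [m, -m, 0.0]
y_second = [xi * exp(ki) for xi, ki in zip(x, k_second)]  # <sqrt(2)/6, sqrt(2)/6, 1/2>
print(normalize(y_second))                                # ~ [0.243, 0.243, 0.515]
```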
We do not wish to rely on a technical and abstractly motivated concept of information here; and given our uniformity postulate, we do not need to. When the transformation is a uniform motion, each odds factor is multiplied by a number e^{kt}, where the value of k determines the growth rate. To use an economic analogy: a capital V invested at interest rate k, continually compounded, grows to e^{kt}V over time t.

Thus we propose that we minimize the 'interest rates' needed to achieve our objective. Since differential interest rates increase disparity, we minimize that disparity if we choose the rates lower; and if possible, attach the lower rates to the higher amounts. This suggests two possible optimality principles: in the change from <x_1, ..., x_n> to <y_1, ..., y_n> with y_i = e^{k_i t} x_i, we can try to minimize either the initial average Σ x_i k_i or the final average Σ y_i k_i. Investigation of the former leads to a rule which, as it turns out, is almost always inapplicable. The second suggestion, however, leads to exactly the same results as the INFOMIN rule, and is quite widely applicable.
Optimality Postulate. The transformation of prior odds <x_1, ..., x_n> into posterior odds <y_1, ..., y_n>, subject to the constraint F(y_1, ..., y_n) = m, is effected by a uniform motion, with growth rates k_1, ..., k_n, for which Σ y_i k_i is minimal.
Theorem (INFOMIN). For the simultaneous constraints Σ y_i = 1 and Σ y_i q_i = r (expectation value of quantity q), there exist constants w and u such that k_i = w q_i - u for i = 1, ..., n.

Before proceeding to the easy proof, it should be noted that this theorem fixes the posterior probability vector uniquely. For there will exist at most one number w such that, with Σ x_i = 1, we also have Σ y_i = 1 and Σ y_i q_i = r.
For the proof we use the Lagrange multiplier method. That is, we introduce special variables w and u and set equal to zero all the partial derivatives, with respect to w, u, y_1, ..., y_n, of

Σ y_i k_i + w(r - Σ y_i q_i) + u(1 - Σ y_i),   where k_i = ln(y_i / x_i).

Setting the derivative with respect to y_i to zero gives ln(y_i / x_i) + 1 - w q_i - u = 0; absorbing the constant into u, the minimum must lie at k_i = w q_i - u for undetermined constants u and w. We derive from this the following formula for the posterior probabilities, by setting t = 1 (choice of time scale):

y_i = x_i e^{w q_i - u} = x_i e^{w q_i} / e^u.

It is clear from the second constraint (1 - Σ y_i = 0) that the factor e^u normalizes the posterior odds vector to make it a probability vector; hence e^u = Σ x_i e^{w q_i}, and so our posterior vector y is a function of the prior x and the single unknown w. But the value of w is then derivable from the first constraint (r - Σ y_i q_i = 0). Thus the posterior is fixed uniquely.
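The derivation just given translates directly into a small computation. The sketch below is ours: it uses the formula y_i proportional to x_i e^{w q_i} and finds w by bisection, relying on the fact that the posterior expectation of q is non-decreasing in w; the bracket [-50, 50] is an ad hoc assumption adequate for small examples.

```python
# Sketch (ours) of the INFOMIN update in the form derived above:
# y_i proportional to x_i * exp(w * q_i), with w fixed by the constraint E[q] = r.
from math import exp

def infomin(x, q, r, lo=-50.0, hi=50.0, tol=1e-12):
    """Posterior probability vector for prior x under the constraints
    sum(y_i) = 1 and sum(y_i * q_i) = r (assumed attainable within the bracket)."""
    def posterior(w):
        odds = [xi * exp(w * qi) for xi, qi in zip(x, q)]
        z = sum(odds)                      # this is e^u, the normalizing factor
        return [o / z for o in odds]

    def gap(w):                            # posterior expectation of q, minus r
        return sum(yi * qi for yi, qi in zip(posterior(w), q)) - r

    # The expectation of q is non-decreasing in w, so bisection on w works.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return posterior(0.5 * (lo + hi))

# The example of Section 2.3: prior <1/6, 1/3, 1/2>, q = (1, -1, 0), target E[q] = 0.
print(infomin([1/6, 1/3, 1/2], [1, -1, 0], 0.0))   # ~ [0.243, 0.243, 0.515]
```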
Looking back to our example problem of setting odds equal, we see that the second method of determining the posterior proceeded exactly in accordance with the Optimality Postulate, as elaborated in this theorem. This is not a coincidence, for the INFOMIN rule requires that the quantity Σ y_i ln(y_i / x_i) be minimized, and since y_i = e^{k_i} x_i (for t = 1), we have k_i = ln(y_i / x_i). There are independent arguments to justify this quantity as a sort of measure of disparity between y and x, which we hope to have reinforced here by the intuitive picture of uniform motion.
3. Maximizing Transition Probability: a Quantum-Mechanical Analogue

3.1. How quantum mechanics represents probabilities
An elegant way to represent probability functions geometrically was bequeathed to us by Dirac and von Neumann: it is the way used in quantum mechanics. It turns out that within this representation a simple account exists of the way probability functions change when new constraints are placed on them, and that this account yields different results than does the application of the INFOMIN rule.

The principle of the representation is very straightforward. Assume that we have a set of mutually perpendicular axes which yield a coordinate system for three-dimensional space. Now let v̄ be a vector of length 1 from the origin of this coordinate system. We know by Pythagoras' theorem that the sum of the squares of the three components of v̄ adds to one:

v_x^2 + v_y^2 + v_z^2 = 1.
If these axes are used to represent a set of three mutually exclusive and jointly exhaustive events, then we can always choose v̄ in such a way that v_i^2 yields the probability of event i (i = x, y, z).

In this case, of course, we deal with a partition of just three members, but we can use a space of any dimension we choose, and the Pythagorean property (on which the representation relies) still holds. More formally, the representation works as follows.
3.2. Representation of propositions and probabilities

Given a set of propositions, we form a partition class {φ_i}. We now map {φ_i} (one-to-one) onto a set {L_i} of mutually orthogonal subspaces which span a (real) vector space V. (V is equipped with an inner product.) {L_i} is now closed under span (representing disjunction) to form a Boolean lattice of (mutually compatible) subspaces. If ℒ is the set of subspaces generated from {L_i}, then the resulting algebra is A_L = <ℒ, ∨, ∧, ⊥, 0, V>.
Isomorphic to this is the algebra A_P of projection operators onto the subspaces in ℒ, with P_φ ∨ P_ψ = P_φ + P_ψ - P_φ P_ψ and P_φ^⊥ = 1 - P_φ. (We write L_φ for the subspace representing proposition φ, and P_φ for the projection operator onto L_φ.) All these projection operators commute pairwise.

The algebras A_L and A_P are both isomorphic to the algebra formed in the usual way from the set of propositions; we write Δ for the closure of {φ_i} under disjunction.
Let v̄ be any normalized vector in V. Then we define the function v: ℒ → [0,1] by v(L_φ) = |P_φ v̄|^2. It is trivial to show that:

(i) each such function v represents a probability function p_v on Δ such that p_v(φ) = v(L_φ) for all φ;

(ii) to each probability function p on Δ there corresponds a normalized vector v̄_p in V such that v_p(L_φ) = p(φ) for all φ.

We will use the term 'state' to refer to a normalized vector.

3.3. Representation of constraints
Constraints on probability functions may take various forms. We use the following notation: '[φ = 1]' represents the constraint that p(φ) = 1, '[φ = q]' the constraint that p(φ) = q, '[φ = ψ]' the constraint that p(φ) = p(ψ), and '[(φ | ψ) = q]' the constraint that the conditional probability of φ, given ψ, is q. All these can be regarded as special cases of constraints on expectation values: we write, for example, '[E(5φ - 3ψ) = 2]' to represent the constraint that 5p(φ) - 3p(ψ) = 2. Richard Jeffrey uses the term 'probasition' to refer to the judgement such a constraint expresses.

Now consider an operator A on V expressible as a weighted sum of the projection operators: A = Σ a_i P_i. Using the Dirac notation for inner products, we define, for any v̄ in V, the expectation of A by <A> = <v̄ | A v̄>.

Now <P_φ> = <v̄ | P_φ v̄> = |P_φ v̄|^2 = p_v(φ), and so to each of the constraints on probability functions mentioned above there corresponds an equation involving the expectation value of an operator: we have v in [φ = 1] iff <P_φ> = 1; v in [φ = q] iff <P_φ> = q; v in [φ = ψ] iff <P_φ - P_ψ> = 0; and, where Σ a_i φ_i is a (finite) weighted sum of propositions, v in [E(Σ a_i φ_i) = q] iff <A> = q, where A = Σ a_i P_{φ_i}.

To each constraint Γ there corresponds a set V_Γ of states, such that if Γ = [E(Σ a_i φ_i) = q] and A = Σ a_i P_{φ_i}, then v̄ is in V_Γ iff <A> = q.
3.4. Probability kinematics revisited
The problem of probability kinematics can be posed as follows. A constraint Γ picks out a set V_Γ of states which satisfy it. Given an initial state (or prior probability) v̄, which final state (posterior probability) v̄' in V_Γ should we choose when the constraint is applied? The proposal which the vector space representation of probabilities suggests is that the probability vector move to that vector v̄' in V_Γ which is geometrically closest to v̄. To borrow a term from quantum theory, this can be called the Maximum Transition Probability (M.T.P.) Principle. This principle is then:

(M.T.P.) Given a prior probability represented by v̄, the posterior probability given constraint Γ is represented by v̄', where (i) v̄' is in V_Γ, and (ii) for ū in V_Γ, <v̄ | ū>^2 is a maximum when ū = v̄'.

The squared inner product <v̄ | v̄'>^2 is called the transition probability between v̄ and v̄'. We can express it in terms of the probabilities these vectors represent: the transition probability from p to p', measured on partition X, is given by

[ Σ_{B in X} (p(B) p'(B))^{1/2} ]^2.

Conditionalization and Jeffrey conditionalization appear as special cases of the M.T.P. Principle. If we conditionalize on a proposition φ, then Γ = [φ = 1] and V_Γ is the set of states lying in L_φ. We obtain the final state v̄' by projecting v̄ onto L_φ and then renormalizing it: v̄' = P_φ v̄ / |P_φ v̄|. Then p_{v'}(ψ) = p_v(ψ | φ) for all ψ.

To illustrate Jeffrey conditionalization, consider a partition with three members, representable by orthogonal axes in 3-space. The set of all states is then the unit sphere in 3-space. Now take the case when Γ = [φ = q], and φ is represented by one of the axes. If we think of this axis as the polar axis of the unit sphere, then V_Γ is the line of latitude at polar angle θ, where cos^2 θ = q. According to the M.T.P. Principle, if v̄ is the initial state, constraint [φ = q] pushes us to the vector v̄' in V_Γ which has the same longitude as v̄ (see diagram). This is also the final state picked out by the Jeffrey conditionalization rule, and it is not hard to show that the M.T.P. Principle and Jeffrey conditionalization yield identical results in the general case.
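That last claim is easy to spot-check numerically. The sketch below is ours; the prior and the value q are arbitrary illustrative numbers, and a grid search over the 'line of latitude' stands in for the geometric argument.

```python
# Numerical spot check (ours): for a three-cell partition and the constraint
# [phi = q], the state of maximum transition probability yields the same
# posterior as Jeffrey conditionalization. Prior p and q below are arbitrary.
from math import sqrt, sin, cos, pi

p = [0.2, 0.3, 0.5]        # prior; phi is the first cell
q = 0.45                   # constraint: posterior probability of phi

v = [sqrt(pc) for pc in p]             # prior state (a unit vector)

best, best_state = -1.0, None
n = 100000
for k in range(n):                     # sweep the circle with first component sqrt(q)
    theta = 2 * pi * k / n
    cand = [sqrt(q), sqrt(1 - q) * cos(theta), sqrt(1 - q) * sin(theta)]
    tp = sum(vi * ci for vi, ci in zip(v, cand)) ** 2   # transition probability
    if tp > best:
        best, best_state = tp, cand

mtp_posterior = [c * c for c in best_state]
jeffrey_posterior = [q] + [(1 - q) * pc / (p[1] + p[2]) for pc in p[1:]]
print(mtp_posterior)       # ~ [0.45, 0.206, 0.344]
print(jeffrey_posterior)   # [0.45, 0.20625, 0.34375], up to float rounding
```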
3.5. A comparison of M.T.P. and INFOMIN Principles
Given the result proved in the first part of this paper, we may regard agreement with the Jeffrey conditionalization rule as a minimal condition for the acceptability of a principle governing probability kinematics. (We should note that, despite the proof, the rule is still not universally accepted.) Thus in cases when Γ has the form [φ = q] (including those when q = 1), the M.T.P. Principle and the INFOMIN Principle agree. However, there are divergences between them when the constraints are more complex. In the appendix we show results for (I) a case when Γ = [φ = ψ] and (II) a case when Γ = [(φ | ψ) = q]. We are preparing a further paper which will report on practical calculations using these rules and a comparison of the "success" of agents utilizing these rival rules under similar conditions. We have programs which can be run on a personal computer, to apply these rules to problems of the sort considered in the Appendix, and will be happy to send copies to anyone interested.

Appendix

We shall here pose two concrete problems, to which INFOMIN and MTP can be applied, and compare the results. For the calculational procedure for the second, see van Fraassen (1981).

Problem I -- Weather Report Problem. (1) Your prior odds for [rain; snow (and no rain); fair] for tomorrow are 1:2:3. You hear the tail end of a weather report and accept the constraint to make the odds of rain : snow equal to 1:1. What are your posterior odds rain : fair? (2) Generalize the problem to prior probabilities x, 0.5 - x, 0.5 with 0 < x < 0.5.

INFOMIN applied here to (1) yields the posterior (0.2427, 0.2427, 0.5146) approximately, while M.T.P. yields (0.2464, 0.2464, 0.5072). Under (2) we may just note that for the more extreme prior (0.01, 0.49, 0.5), INFOMIN yields 0.11 for Rain, while M.T.P. yields 0.19.

Problem II -- Judy Benjamin Problem. (1) When undergoing military training, Judy Benjamin is dropped in a region of the countryside divided in four areas: red-HQ, red-2, blue-HQ, blue-2. She doesn't know where she is, but her prior odds for being located in red-2; red-HQ; blue are 1:1:2. A partially heard transmission from blue HQ leads her to accept the constraint to make her conditional probability for being in red-HQ, given that she is in red (-2 or -HQ), equal to 3/4 (instead of 1/2). What is her posterior probability for being located in blue? (2) Generalize this problem for posterior conditional probability equal to q.

Comparison for Case (1):
          prior     INFOMIN   M.T.P.
red-2     0.25      0.1168    0.1207
red-HQ    0.25      0.3505    0.3620
blue      0.50      0.5326    0.5173

Note in each case the change in p(blue).

Comparison for Case (2). The general pattern of results for both rules is shown in the following figure.
[Figure: posterior probabilities PROB1 (red-2), PROB2 (red-HQ), and PROB3 (blue) plotted on a probability scale against the posterior conditional probability, which runs from 0.5 to 1 in steps of 0.02, for the prior (0.25, 0.25, 0.5), under both rules.]
As q is varied from 1/2 to 1, both M.T.P. and INFOMIN make y_1 decrease from 0.25 toward zero and make y_3 increase from 0.5 toward 0.666... But both rules make y_2 first increase and then decrease again. The approximate peaks for y_2 are at:

INFOMIN:  q = 0.877,  y = <0.0517, 0.36887, 0.57939>
MTP:      q = 0.916,  y = <0.03674, 0.40064, 0.56262>

It also appears that for y_3 MTP always gives a smaller value than INFOMIN, and for y_2 always a larger value, but the differences in the values for y_i are always small compared to those values.
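The comparisons above can be spot-checked with a short computation. The sketch below is ours (it is not the personal-computer program mentioned in Section 3.5); it handles any constraint that fixes the posterior ratio y_1 : y_2 on a three-cell partition, which covers both problems.

```python
# Sketch (ours) re-deriving the appendix comparisons. Both problems fix the ratio
# y1 : y2 = a : b on a three-cell partition. INFOMIN uses the exponential formula
# of Section 2; M.T.P. maximizes the transition probability along the constraint set.
from math import sqrt, exp, log

def infomin_ratio(x, a, b):
    """INFOMIN posterior with y1 : y2 = a : b, i.e. E[q] = 0 for q = (b, -a, 0)."""
    w = log((a * x[1]) / (b * x[0])) / (a + b)   # solves b*y1 = a*y2 exactly
    odds = [x[0] * exp(w * b), x[1] * exp(-w * a), x[2]]
    z = sum(odds)
    return [o / z for o in odds]

def mtp_ratio(x, a, b):
    """M.T.P. posterior with y1 : y2 = a : b, by ternary search on s = y1 + y2
    (the transition probability is unimodal in s along the constraint set)."""
    def tp(s):
        y = [a * s / (a + b), b * s / (a + b), 1 - s]
        return sum(sqrt(xi * yi) for xi, yi in zip(x, y)) ** 2
    lo, hi = 0.0, 1.0
    for _ in range(100):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if tp(m1) < tp(m2):
            lo = m1
        else:
            hi = m2
    s = 0.5 * (lo + hi)
    return [a * s / (a + b), b * s / (a + b), 1 - s]

# Problem I: prior (1/6, 1/3, 1/2); posterior odds rain : snow = 1 : 1.
print(infomin_ratio([1/6, 1/3, 1/2], 1, 1))    # ~ [0.243, 0.243, 0.515]
print(mtp_ratio([1/6, 1/3, 1/2], 1, 1))        # ~ [0.246, 0.246, 0.507]

# Problem II: prior (0.25, 0.25, 0.5); p(red-HQ | red) = 3/4, i.e. y1 : y2 = 1 : 3.
print(infomin_ratio([0.25, 0.25, 0.5], 1, 3))  # ~ [0.117, 0.351, 0.533]
print(mtp_ratio([0.25, 0.25, 0.5], 1, 3))      # ~ [0.121, 0.362, 0.517]
```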
Notes
¹ Our Part One overlaps van Fraassen (1986), which also contains an intuitive discussion of symmetry arguments. For previous research on Jeffrey conditionalization, introduced in Jeffrey (1965; 2nd revised edition, 1983), and its extensions, see especially Diaconis and Zabell (1982), van Fraassen (1980) and (1981), and Williams (1980). The authors wish to thank, respectively, Yale University and the National Science Foundation for research support.
References

Diaconis, P. and Zabell, S.L. (1982). "Updating Subjective Probability." Journal of the American Statistical Association 77: 822-830.

Jeffrey, R. (1983). The Logic of Decision. 2nd ed. Chicago: University of Chicago Press.

Teller, Paul. (1976). "Conditionalization, Observation, and Change of Preference." In Foundations of Probability Theory, Statistical Inference, and Statistical Theories of Science, Volume I. (The University of Western Ontario Series in Philosophy of Science, Volume 6.) Edited by W.L. Harper and C.A. Hooker. Dordrecht: D. Reidel. Pages 205-253.

van Fraassen, B.C. (1980). "Rational Belief and Probability Kinematics." Philosophy of Science 47: 165-187.

-----------------. (1981). "A Problem for Relative Information Minimizers in Probability Kinematics." British Journal for the Philosophy of Science 32: 375-379.

-----------------. (1986). "A Demonstration of the Jeffrey Conditionalization Rule." Erkenntnis 24: 17-24.

Williams, P.M. (1980). "Bayesian Conditionalisation and the Principle of Minimum Information." British Journal for the Philosophy of Science 31: 131-144.