J. Ital. Statist. Soc. (1992) 3, pp. 325-334
A BAYESIAN JUSTIFICATION FOR THE LINEAR POOLING OF OPINIONS*

Mario Di Bacco
Università degli Studi di Bologna
Vergilius Mocellin
Università degli Studi di Venezia

* Research supported by C.N.R. and the Ministry of University and Technological and Scientific Research.

Summary

It is known that the problem of combining a number of expert probability evaluations is frequently solved with additive or multiplicative rules. In this paper we try to show, with the help of a behavioural model, that the additive rule (or linear pooling) derives from the application of Bayesian reasoning. On another occasion we will discuss the multiplicative rule.

Keywords: Bayesian inference, expert resolution, linear pool.

1. Introduction
Let $X$ be an $h$-dimensional ($h \ge 1$) random vector (r.v.) and suppose the Decision-maker, DM, has assigned the distribution function, D.F., ${}_0F$ to the r.v. $X$. Now DM consults $k$ ($k \ge 1$) Experts to learn their opinions on $X$; Expert ${}_iE$ answers this question by assigning the D.F. ${}_iF$ ($i = 1, \dots, k$). The issue then is: how should DM combine (or aggregate) $F^{(k)} = ({}_1F, \dots, {}_kF)$ and ${}_0F$ to update his opinion on $X$ or, equivalently, what should his «combined» or «pooled» D.F. ${}_EF$ be? This is the «problem of resolving the different probability judgements of a group of experts» (Morris, 1983), which has been carefully studied by a number of statisticians (for a thorough, critical survey see Genest and Zidek, 1986). It seems that the most popular solutions are the additive rule (or linear pool), i.e.

$$ {}_EF = \sum_{i=1}^{k} w_i\,{}_iF + w_0\,{}_0F; \qquad (w_i, w_0) \ge 0; \qquad \sum_{i=1}^{k} w_i + w_0 = 1 \qquad (1.1) $$
or the multiplicative rule (logarithmic pool), i.e., denoting by $f$ the density function (d.f.) associated with $F$,

$$ {}_Ef(x) \propto \prod_{i=1}^{k} \left[{}_if(x)\right]^{w_i} \left[{}_0f(x)\right]^{w_0} \qquad (1.2) $$

or, taking $w_0 = 1 - \sum_{i=1}^{k} w_i$ in (1.2), Bordley's slight variant (Bordley, 1982)

$$ {}_Ef(x) \propto \left[{}_0f(x)\right]^{1 - \sum_{i=1}^{k} w_i} \prod_{i=1}^{k} \left[{}_if(x)\right]^{w_i} \qquad (1.3) $$
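To fix ideas, here is a minimal numerical sketch of rules (1.1)-(1.3) on a discrete space. It is ours, not the paper's: the probability vectors, the weights, and the helper name `log_pool` are all invented for illustration.

```python
import numpy as np

# DM's distribution 0p and two expert distributions on a 4-point
# space (all values invented for illustration).
p0 = np.array([0.25, 0.25, 0.25, 0.25])
p1 = np.array([0.60, 0.20, 0.10, 0.10])
p2 = np.array([0.50, 0.30, 0.10, 0.10])

w1, w2, w0 = 0.5, 0.3, 0.2            # non-negative, summing to one

# Additive rule (1.1): a convex combination of distributions,
# which is automatically a distribution again.
print(w1 * p1 + w2 * p2 + w0 * p0)

def log_pool(ps, ws):
    """Multiplicative rule (1.2): normalized weighted geometric mean."""
    g = np.prod([p ** w for p, w in zip(ps, ws)], axis=0)
    return g / g.sum()

print(log_pool([p1, p2, p0], [w1, w2, w0]))

# Bordley's variant (1.3): DM's exponent is 1 - (v1 + v2), which may
# be negative since the expert weights are no longer constrained.
v1, v2 = 0.7, 0.6
print(log_pool([p1, p2, p0], [v1, v2, 1.0 - (v1 + v2)]))
```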
It is usual enough to validate rules (1.1)-(1.3) in an axiomatic way, that is, to state them as «optimal rules» because they have some properties which are «obviously» or «naturally» typical of «rational» pooling. We will not discuss this approach, but turn our attention to the Bayesian one or, following Genest and Zidek's terminology, the supra-Bayesian approach, i.e. what was proposed in Morris's pioneering work. Denote by $L(x; d)$ the likelihood that data (or information) $d$ gives to $x$; then Morris proposes

$$ {}_Ef(x) \propto {}_0f(x)\, L(x; F^{(k)}) \qquad (1.4) $$
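On a discrete space, (1.4) is a one-line Bayes update once a likelihood is chosen. The sketch below is our construction, with invented probability vectors; for concreteness it uses the product likelihood that the paper discusses next, but any other coherent choice of $L$ could be plugged in.

```python
import numpy as np

# DM's prior 0f and the expert densities, all on a 4-point space
# (values invented for illustration).
f0 = np.array([0.25, 0.25, 0.25, 0.25])
f1 = np.array([0.60, 0.20, 0.10, 0.10])
f2 = np.array([0.50, 0.30, 0.10, 0.10])

# The likelihood L(x; F(k)) is DM's to choose.  For concreteness we
# take Morris's product form L = 1f * 2f, discussed below.
L = f1 * f2

# Rule (1.4): Ef(x) proportional to 0f(x) * L(x; F(k)).
fE = f0 * L
print(fE / fE.sum())
```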
The resolution (1.4) has been receiving a lot of attention and there is a large number of contributions to its theoretical foundation as well as its practical usefulness. For our part we think everyone would agree that (1.4) is an effective tool only if DM is able to make $L(x; F^{(k)})$ explicit. This is precisely the crux of the issue: it was Morris himself who wrote (Morris, 1986) «Problems with the direct Bayesian approach of assessing likelihood functions over expert opinions are not conceptual; they are practical. Most would agree that it is virtually impossible to directly assess a joint likelihood distribution over all possible expert densities». To overcome the impasse of «virtual impossibility», he proposed a system of axioms which he considered self-evident and which would guide DM to a compelled choice of the likelihood version. Then, following this, he puts $L(x; F^{(k)}) \propto \prod_{i=1}^{k} {}_if(x)$ and, obviously, arrives at rule (1.2) with $w_i = w_0 = 1$. This simplification undoubtedly makes the application of (1.4) straightforward, indeed almost automatic. Nevertheless this cannot be a good motive for distorting Bayesian inference, thrusting it, as de Finetti liked to say, into a Procrustean bed. In fact, by giving an immutable version of likelihood,
this is, obviously, doing violence to the essence of inductive reasoning in the Bayesian sense. In our opinion, the desire to find a «rational» (or reasonable, at least) rule able to resolve any problem of aggregation of opinions has been, and is, the main cause of genuine Bayesian disagreement with Morris's axiomatic proposal and of the criticisms made, for instance, by Lindley (Lindley, 1986).

It is clear that, from the Bayesian viewpoint, no single rule can be given to solve the expert problem. For DM, $F^{(k)}$ is an empirical datum that he now uses to update his present opinion on the r.v. $X$. How he makes this explicit is his own business, subject only to the need to be coherent. Thus, to guarantee his own coherence and, at the same time, facilitate interpersonal communication, we can ask him to use the empirical datum $F^{(k)}$ in accordance with the rules of inductive reasoning. But DM will be able to do that only if, before consulting the experts, he holds precise opinions about the r.v. $X$ and the expert evaluations, i.e. about the choice ${}_iF$ of expert ${}_iE$ ($i = 1, \dots, k$).

In spite of all this, there is no doubt that going in search of an «optimal» (or, at least, a «good») rule to aggregate probabilities is appealing when, for instance, the problem of resolving different probability judgements arises in the context of a decision involving more than one person. So appealing, indeed, as to silence the foundational scruples we discussed above. It is then natural to restrict attention to the most popular rules, the additive or the multiplicative. This conflict between pragmatic requirements and strict Bayesian observance can be weakened if there are situations of practical significance in which making the likelihood function explicit leads to the additive or the multiplicative rule. If both classes of situations are of common occurrence in practice, then it is right to say that, from a pragmatic viewpoint, there exist coherent rules for combining probability judgements, and DM may be able to find among them one that mirrors his own present beliefs.

In this paper we try to describe some typical situations in which it appears appropriate to use the additive rule. We will give a Bayesian interpretation of a class of multiplicative rules elsewhere.
2. The additive rule

We now present our paradigm, which will lead us to combine $F^{(k)}$ and ${}_0F$ according to (1.1), i.e. according to the additive rule.

Suppose that in DM's opinion the phenomenon $\Phi$, whose random realization is described by the r.v. $X$, is placed in one, and only one, of the alternative
environments, or states of the world, belonging to the set $\Theta = \{\theta\}$. The terms «environment» and «state of the world» can be characterized in the following way: if DM and all the Experts of group E knew that the environment of $\Phi$ is $\theta$, then he and the whole group E would assign the D.F. $F$ to $X$; but, if it were known to all of them that the environment is $\theta'$, then they would agree in assigning the D.F. $F'$ to $X$.

For DM the true «world's state», i.e. the «environment» in which the phenomenon $\Phi$ is actually placed, is a random entity. On the contrary, he believes that each of the experts of group E is sure about what the true world's state is: in other words, DM believes that if ${}_iE$ assigns the D.F. ${}_iF$ to $X$, this happens because he is sure that $\theta_i$ is the true environment. Then, if the answers of the experts do not coincide, i.e. if their probability assignments are ${}_1F, \dots, {}_kF$ with ${}_iF \ne {}_jF$ for some $i \ne j$, DM can, at most, give whole credit to only one expert. But if he did this, he would be incoherent, since he has just deemed experts all those who belong to group E, and then decided to consult all of them. It is natural, therefore, to think that DM believes none of his experts to be totally credible, thus admitting that each of them can be wrong in his diagnosis of the true world's state.

As an example, let us suppose DM and his friend V. are about to play the following game. V. has got two coins, a fair coin $C$, i.e. $\Pr\{\text{head} \mid C\} = 1/2$, and a biased coin $\bar{C}$, such that $\Pr\{\text{head} \mid \bar{C}\} = 1$. V. will make two throws, each time choosing the coin to be thrown, and DM, who knows the rule but is in the dark about V.'s choices, must forecast the results. He then consults ${}_1E$ and ${}_2E$, obtaining, if «head» is coded with 1 and «tail» with 0, ${}_1p(i,j) = 1/4$ ($i = 0,1$; $j = 0,1$) and, respectively, ${}_2p(0,1) = {}_2p(1,1) = 1/2$. Neither ${}_1E$ nor ${}_2E$ wants, nor is the opportunity offered them, to justify their probability assessments. It is DM himself who interprets them, by saying that in ${}_1E$'s [${}_2E$'s] opinion it is $\theta = (C, C)$ [$\theta = (C, \bar{C})$]; there seems to be a paradox, of course, which immediately disappears if DM believes both Experts can be wrong!
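A small sketch of ours that reverse-engineers the two experts' assignments from the environments DM attributes to them; `Cbar` stands for the biased coin $\bar{C}$, and the function name `predictive` is our invention.

```python
from itertools import product

# Probability of "head" (coded 1) in one throw of each coin.
PR_HEAD = {"C": 0.5, "Cbar": 1.0}

def predictive(env):
    """Distribution of the two throws (i, j) given the environment
    env = (coin chosen for throw 1, coin chosen for throw 2)."""
    dist = {}
    for i, j in product((0, 1), repeat=2):
        p = 1.0
        for outcome, coin in ((i, env[0]), (j, env[1])):
            p *= PR_HEAD[coin] if outcome == 1 else 1.0 - PR_HEAD[coin]
        dist[(i, j)] = p
    return dist

# Expert 1's table p(i,j) = 1/4 is what theta = (C, C) implies ...
print(predictive(("C", "C")))
# ... and expert 2's p(0,1) = p(1,1) = 1/2 is what (C, Cbar) implies.
print(predictive(("C", "Cbar")))
```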
In the above setup, how does DM use the data $F^{(k)}$ to update his own opinion about $X$? Here we present our solution. In giving it we assume that the world's state $\Theta$ is a random number (r.n.) with support

$$ T = \{\theta : \underline{\theta} \le \theta \le \bar{\theta}\} \subset \mathbb{R}^1 \qquad (2.1) $$

and that there exists a one-to-one map

$$ \theta \in T \;\leftrightarrow\; F_\theta \in \{F\} \qquad (2.2) $$

where $\{F\}$ is a class of D.F. Therefore, if Expert ${}_iE$ chooses ${}_iF$, then the value

$$ \theta_i : F_{\theta_i} = {}_iF \qquad (2.3) $$

is univocally defined. Let

$$ (\theta_1, \dots, \theta_i, \dots, \theta_k) = {}^k\theta \qquad (2.4) $$

and suppose map (2.2) is continuous with respect to the usual metric on $\mathbb{R}^1$ and the metric $d(F, F') = \sup_x |F(x) - F'(x)|$ on $\{F\}$.

We now return to DM, waiting for the Experts to tell him their own ${}_iF$ ($i = 1, \dots, k$). Before learning what they are, DM attempts to guess them. According to our model, if ${}^k\Theta$ denotes the $k$-dimensional r.v. whose possible realizations describe the set $T^{(k)} = \{{}^k\theta\} = T^k$, with ${}^k\theta$ defined in (2.4), DM assigns a D.F. $F(x, \theta, {}^k\theta)$ to the $(h + 1 + k)$-dimensional r.v. $(X, \Theta, {}^k\Theta)$ and, denoting by $f$ the d.f. corresponding to $F$ and putting

$$ B(x) = \{v : v_i \le x_i;\ i = 1, \dots, h\} \qquad (2.5) $$
to be coherent, he must write
$$ {}_0F(x) = \int_{B(x)} {}_0f(v)\, dv = \int_{B(x)} \int_T \int_{T^k} f(v, \theta, {}^k\theta)\, d{}^k\theta\, d\theta\, dv = \int_T \int_{T^k} \int_{B(x)} f(v \mid \theta, {}^k\theta)\, f(\theta, {}^k\theta)\, dv\, d{}^k\theta\, d\theta \qquad (2.6) $$
At this point Expert ${}_iE$ communicates his D.F. ${}_iF$ ($i = 1, \dots, k$) to DM. Then for DM, who knows map (2.2), it is as if he had learnt that Expert ${}_iE$ believes the true world's state is $\theta_i$. Therefore DM will update his own belief about $X$ and substitute ${}_EF$ for ${}_0F$, that is
$$ {}_EF(x) = \int_T \int_{B(x)} f(v \mid \theta, {}^k\theta)\, f(\theta \mid {}^k\theta)\, dv\, d\theta \qquad (2.7) $$
or
$$ {}_Ef(x) = \frac{\displaystyle \int_T f(x \mid \theta, {}^k\theta)\, f({}^k\theta \mid \theta)\, f(\theta)\, d\theta}{\displaystyle \int_T f({}^k\theta \mid \theta)\, f(\theta)\, d\theta} \qquad (2.8) $$
Now suppose DM knew that $\theta$ was the true world's state (the true environment). According to our model, he would assign the D.F. (d.f.) $F(x \mid \theta)$ ($f(x \mid \theta)$) to $X$ whatever the Experts told him. Therefore we must put
$$ f(x \mid \theta, {}^k\theta) = f(x \mid \theta) \qquad (2.9) $$
On the other hand, we have assumed that DM believes each expert gives his advice separately: hence DM excludes any communication between them or, at least, believes that their opinions do not affect one another. Consequently:

$$ f({}^k\theta \mid \theta) = \prod_{i=1}^{k} f(\theta_i \mid \theta) \qquad (2.10) $$
Given hypotheses (2.9) and (2.10), (2.8) becomes
$$ {}_Ef(x) = \frac{\displaystyle \int_T f(x \mid \theta) \prod_{i=1}^{k} f(\theta_i \mid \theta)\, f(\theta)\, d\theta}{\displaystyle \int_T \prod_{i=1}^{k} f(\theta_i \mid \theta)\, f(\theta)\, d\theta} \qquad (2.11) $$
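Before specializing the error model, here is a quadrature sketch of (2.11). It is ours: the uniform prior, the normal observation density, the normal stand-in for $f(\theta_i \mid \theta)$, and the announced states are all invented.

```python
import numpy as np
from scipy import stats

theta = np.linspace(0.0, 1.0, 2001)            # grid over T = [0, 1]

prior = np.ones_like(theta)                    # f(theta): uniform, invented
f_x = lambda x, t: stats.norm(t, 0.5).pdf(x)   # f(x | theta), invented

# Stand-in expert error density f(theta_i | theta): normal around the
# true state (the paper's interval model (2.14) comes later).
f_err = lambda ti, t: stats.norm(t, 0.05).pdf(ti)

thetas = [0.30, 0.35, 0.70]                    # states announced via (2.3)

def Ef(x):
    """Posterior predictive (2.11) by quadrature; grid steps cancel."""
    lik = np.prod([f_err(ti, theta) for ti in thetas], axis=0)
    return np.sum(f_x(x, theta) * lik * prior) / np.sum(lik * prior)

print(Ef(0.4))
```

With a smooth error model like this, (2.11) need not reduce to any standard pool; it is the interval model (2.14) introduced below that yields the additive rule in the limit.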
At this point, to obtain the additive rule another feature needs to be made explicit; the most relevant, indeed, of our model. As we have already stated, DM believes each expert can be wrong; however, it is natural to suppose that he believes all the experts almost always guess almost right; after all, each of them is an expert! It is the careful handling of these two «almost»s that explains the disagreement among their probability assessments of $X$. There are many ways to make this precise; the one we will use is, we believe, the simplest. Let us suppose
$$ \forall i: \quad \underline{\theta} + \frac{\varepsilon}{2} < \theta_i < \bar{\theta} - \frac{\varepsilon}{2} \qquad (\varepsilon > 0) \qquad (2.12) $$
It will soon be clear that this self-limitation simplifies our exposition without loss of effectiveness in the arguments (see (2.16)). Then set
$$ {}_iT(\varepsilon) = \left\{\theta : \theta_i - \frac{\varepsilon}{2} \le \theta \le \theta_i + \frac{\varepsilon}{2}\right\}, \quad i = 1, \dots, k $$

$$ T(\varepsilon, u, j_1, \dots, j_u) = \bigcap_{v=1}^{u} {}_{j_v}T(\varepsilon) \;\cap \bigcap_{i \notin \{j_1, \dots, j_u\}} {}_iT^c(\varepsilon), \quad u \ge 1 \qquad (2.13) $$

$$ T(\varepsilon, 0) = \bigcap_{i=1}^{k} {}_iT^c(\varepsilon) $$

where $(\cdot)^c$ denotes complementation and $\{j_1, \dots, j_u\}$ denotes a choice of $u$ ($u \ge 1$) suffixes, all different, among $1, \dots, k$. Let us then assume
$$ \text{a)} \quad f(\theta_i \mid \theta) = \begin{cases} \dfrac{\delta(\varepsilon)}{\varepsilon}, & \theta \in {}_iT(\varepsilon), \quad 0 < \varepsilon < 1 \\[2mm] g_i(\theta_i, \theta), & \text{elsewhere, with } 0 < \alpha(\varepsilon) \le g_i(\theta_i, \theta) \le \beta(\varepsilon) \end{cases} \qquad (2.14) $$

$$ \text{b)} \quad \lim_{\varepsilon \to 0} \beta(\varepsilon) = 0, \qquad \lim_{\varepsilon \to 0} \frac{\alpha(\varepsilon)}{\beta(\varepsilon)} = 1 $$

(whence, by the normalization of $f(\theta_i \mid \theta)$, also $\lim_{\varepsilon \to 0} \delta(\varepsilon) = 1$). That is, according to DM's beliefs the experts can be wrong but, almost always, they make very small mistakes.
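As a concrete instance of (2.14), of our choosing rather than the paper's: put mass $\delta(\varepsilon) = 1 - \varepsilon$ in the $\varepsilon/2$-band around the true state and spread the rest uniformly over $T$, so that $\alpha(\varepsilon) = \beta(\varepsilon) = \varepsilon/(|T| - \varepsilon)$ and the conditions in b) hold. The function name `expert_density` is ours.

```python
import numpy as np

def expert_density(theta_i, theta, eps, T=(0.0, 1.0)):
    """A concrete instance of (2.14): given the true state theta, the
    announced state theta_i falls in the eps/2-band around theta with
    probability delta(eps) = 1 - eps, and is uniform on the rest of T
    otherwise, so alpha(eps) = beta(eps) = eps / (|T| - eps)."""
    lo, hi = T
    if abs(theta_i - theta) <= eps / 2:
        return (1.0 - eps) / eps              # delta(eps) / eps
    return eps / ((hi - lo) - eps)            # tail density

# Check normalization over theta_i for a true state well inside T:
# band mass (1 - eps) plus tail mass eps gives exactly one.
grid = np.linspace(0.0, 1.0, 100001)
vals = np.array([expert_density(t, 0.5, 0.01) for t in grid])
print(vals.sum() * (grid[1] - grid[0]))       # ~= 1.0
```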
Moreover, we assume that all the ${}_iF$ are different; then, according to (2.2), $i \ne j \Rightarrow \theta_i \ne \theta_j$; this self-limitation too is suggested by reasons of expositive convenience, and one can easily remove it.

Now it is possible to show that in the present scheme the additive rule is approached if $\varepsilon$ is quite small. In fact, using (2.13), (2.11) becomes

$$ {}_Ef(x) = \frac{\displaystyle \sum_{u=1}^{k} \sum_{\{j_1,\dots,j_u\}} \int_{T(\varepsilon,u,j_1,\dots,j_u)} f(x \mid \theta) \prod_{i=1}^{k} f(\theta_i \mid \theta)\, f(\theta)\, d\theta + \int_{T(\varepsilon,0)} f(x \mid \theta) \prod_{i=1}^{k} f(\theta_i \mid \theta)\, f(\theta)\, d\theta}{\displaystyle \sum_{u=1}^{k} \sum_{\{j_1,\dots,j_u\}} \int_{T(\varepsilon,u,j_1,\dots,j_u)} \prod_{i=1}^{k} f(\theta_i \mid \theta)\, f(\theta)\, d\theta + \int_{T(\varepsilon,0)} \prod_{i=1}^{k} f(\theta_i \mid \theta)\, f(\theta)\, d\theta} \qquad (2.15) $$
From (2.14) and (2.15), taking $\varepsilon < \min_{i \ne j} |\theta_i - \theta_j|$, so that the intervals ${}_iT(\varepsilon)$ are pairwise disjoint and the only non-empty intersection sets are $T(\varepsilon, 1, i) = {}_iT(\varepsilon)$, we have
$$ \left[\frac{\alpha(\varepsilon)}{\beta(\varepsilon)}\right]^{k-1} \frac{\displaystyle \sum_{i=1}^{k} \int_{{}_iT(\varepsilon)} f(x \mid \theta) f(\theta)\, d\theta + \frac{\varepsilon\,\alpha(\varepsilon)}{\delta(\varepsilon)} \int_{T(\varepsilon,0)} f(x \mid \theta) f(\theta)\, d\theta}{\displaystyle \sum_{i=1}^{k} \int_{{}_iT(\varepsilon)} f(\theta)\, d\theta + \frac{\varepsilon\,\beta(\varepsilon)}{\delta(\varepsilon)} \int_{T(\varepsilon,0)} f(\theta)\, d\theta} \;\le\; {}_Ef(x) $$

$$ {}_Ef(x) \;\le\; \left[\frac{\beta(\varepsilon)}{\alpha(\varepsilon)}\right]^{k-1} \frac{\displaystyle \sum_{i=1}^{k} \int_{{}_iT(\varepsilon)} f(x \mid \theta) f(\theta)\, d\theta + \frac{\varepsilon\,\beta(\varepsilon)}{\delta(\varepsilon)} \int_{T(\varepsilon,0)} f(x \mid \theta) f(\theta)\, d\theta}{\displaystyle \sum_{i=1}^{k} \int_{{}_iT(\varepsilon)} f(\theta)\, d\theta + \frac{\varepsilon\,\alpha(\varepsilon)}{\delta(\varepsilon)} \int_{T(\varepsilon,0)} f(\theta)\, d\theta} $$

since, by (2.14a), on ${}_iT(\varepsilon)$ the factor $f(\theta_i \mid \theta)$ equals $\delta(\varepsilon)/\varepsilon$ while each $f(\theta_j \mid \theta)$, $j \ne i$, lies between $\alpha(\varepsilon)$ and $\beta(\varepsilon)$, and on $T(\varepsilon, 0)$ the whole product $\prod_{j=1}^{k} f(\theta_j \mid \theta)$ lies between $\alpha(\varepsilon)^k$ and $\beta(\varepsilon)^k$.
Choosing suitable values $f(x \mid \theta_i^*)$, $\theta_i^* \in {}_iT(\varepsilon)$, and $f(x \mid \theta_0^*)$, $\theta_0^* \in T(\varepsilon, 0)$ (by the mean value theorem), we can write the lower bound as

$$ \left[\frac{\alpha(\varepsilon)}{\beta(\varepsilon)}\right]^{k-1} \frac{\displaystyle \sum_{i=1}^{k} f(x \mid \theta_i^*) \Pr\{\Theta \in {}_iT(\varepsilon)\} + \frac{\varepsilon\,\alpha(\varepsilon)}{\delta(\varepsilon)}\, f(x \mid \theta_0^*) \left[1 - \Pr\Big\{\Theta \in \bigcup_{j=1}^{k} T(\varepsilon,1,j)\Big\}\right]}{\displaystyle \sum_{i=1}^{k} \Pr\{\Theta \in {}_iT(\varepsilon)\} + \frac{\varepsilon\,\beta(\varepsilon)}{\delta(\varepsilon)} \left[1 - \Pr\Big\{\Theta \in \bigcup_{j=1}^{k} T(\varepsilon,1,j)\Big\}\right]} \;\le\; {}_Ef(x) $$

and symmetrically for the upper bound, with $\alpha(\varepsilon)$ and $\beta(\varepsilon)$ interchanged and the outer factor replaced by $[\beta(\varepsilon)/\alpha(\varepsilon)]^{k-1}$.
Then, letting

$$ \varepsilon \to 0 \qquad (2.16) $$

conditions (2.14b) and the continuity assumption on the 1:1 map (2.2) give, assuming also that the prior d.f. $f(\theta)$ is continuous at each $\theta_i$ (so that $\Pr\{\Theta \in {}_iT(\varepsilon)\} \sim \varepsilon f(\theta_i)$),

$$ {}_Ef(x) = \sum_{i=1}^{k} \frac{f(\theta_i)}{\sum_{j=1}^{k} f(\theta_j)}\, f(x \mid \theta_i) = \sum_{i=1}^{k} w_i\, {}_if(x) \qquad (2.17) $$

which proves our assertion.
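As a numerical check of ours, with invented prior, observation density, and announced states: the exact supra-Bayesian predictive (2.11), computed with the error-density family sketched after (2.14), approaches the linear pool (2.17) as $\varepsilon \to 0$. The names `expert_lik`, `Ef`, and `pool` are our invention.

```python
import numpy as np
from scipy import stats

theta = np.linspace(0.0, 1.0, 20001)           # grid over T = [0, 1]

thetas = [0.30, 0.55, 0.80]                    # announced states (2.3)
prior = stats.beta(2, 3).pdf(theta)            # DM's f(theta), invented
f_x = lambda x, t: stats.norm(t, 0.4).pdf(x)   # f(x | theta), invented

def expert_lik(ti, eps):
    """f(theta_i | theta) on the grid, same family as in (2.14) above."""
    inside = np.abs(ti - theta) <= eps / 2
    return np.where(inside, (1.0 - eps) / eps, eps / (1.0 - eps))

def Ef(x, eps):
    """Exact supra-Bayesian predictive (2.11), by quadrature."""
    lik = np.prod([expert_lik(ti, eps) for ti in thetas], axis=0)
    return np.sum(f_x(x, theta) * lik * prior) / np.sum(lik * prior)

# Limit (2.17): weights proportional to the prior density f(theta_i).
w = stats.beta(2, 3).pdf(np.array(thetas))
w = w / w.sum()
pool = lambda x: sum(wi * f_x(x, ti) for wi, ti in zip(w, thetas))

for eps in (0.2, 0.05, 0.01):
    print(eps, round(float(Ef(0.5, eps)), 4), round(float(pool(0.5)), 4))
```

As $\varepsilon$ shrinks, the printed values of ${}_Ef(0.5)$ move toward the pool value, as (2.16)-(2.17) predict.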
So we reach the following result: if DM believes it is almost certain that each of the experts almost correctly diagnoses the world's state, i.e. if the probability assignments are very «close» to the «true» one, then the strictly observant Bayesian can regard the additive rule as a good approximation of the «new» probability he assigns to $X$ after learning the experts' assignments. And the more skilled the experts are, in DM's opinion, the better such an approximation is. Notice that the weights in the mixture (2.17) reflect DM's beliefs, but restricted to the world's states indirectly indicated by the experts. It is also obvious, at this point, how to change (2.17) if $n_h$ ($h = 1, \dots, s$; $1 \le s \le k$) Experts give the same probability assignment.
REFERENCES

BORDLEY, R. F. (1982). A Multiplicative Formula for Aggregating Probability Assessments. Management Sc., Vol. 28, N. 10, 1137-1148.

GENEST, C. and ZIDEK, J. V. (1986). Combining probability distributions: A critique and an annotated bibliography. Statistical Sc., Vol. 1, 114-135.

LINDLEY, D. V. (1986). Another Look at an Axiomatic Approach to Expert Resolution. Management Sc., Vol. 32, N. 3, 303-306.

MORRIS, P. A. (1983). An Axiomatic Approach to Expert Resolution. Management Sc., Vol. 29, N. 1, 24-32.

MORRIS, P. A. (1986). Observations on Expert Aggregation. Management Sc., Vol. 32, N. 3, 321-328.