we obtain exactly the probability structure for spin-1/2 in quantum theory.$^{a}$ An interpretational proposal of this model could be the following [1, 2, 3]: rather than decomposing states as in so-called hidden variable theories, here we decompose the measurements into deterministic ones. The probability measure $\mu$ should then be envisioned as encoding the lack of knowledge of the interaction of the measured system with its environment, including the measurement device. We now introduce a notion of "relative size" of HMS-representations, justifying the use of "smaller". Given a σ-algebra$^{b}$ $\mathcal{B}$ and a probability measure $\mu : \mathcal{B} \to [0,1]$, denote by $\mathcal{B}/\mu$ the quotient of $\mathcal{B}$ obtained by identifying elements that differ by a $\mu$-null set, and set
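$$(\mathcal{B}, \mu) \le (\mathcal{B}', \mu') \;:\Leftrightarrow\; \exists f : \mathcal{B}/\mu \to \mathcal{B}'/\mu', \text{ an injective } \sigma\text{-morphism}.$$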
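We call $(\mathcal{B}, \mu)$ and $(\mathcal{B}', \mu')$ equivalent, denoted $(\mathcal{B}, \mu) \sim (\mathcal{B}', \mu')$, whenever in the above $f$ is a σ-isomorphism. Given two measurement systems (MS) $(\Sigma, \mathcal{E})$ and $(\Sigma', \mathcal{E}')$ we set: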
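$$(\Sigma, \mathcal{E}) \sim_{MS} (\Sigma', \mathcal{E}') \;:\Leftrightarrow\; \begin{cases} \exists s : \Sigma \to \Sigma',\ \exists t : \mathcal{E} \to \mathcal{E}', \text{ both bijections,} \\ \forall e \in \mathcal{E},\ \exists f_e : \mathcal{B}(O_e) \to \mathcal{B}(O_{t(e)}), \text{ a } \sigma\text{-isomorphism, such that} \\ \forall p \in \Sigma,\ \forall e \in \mathcal{E} : P_{s(p),t(e)} \circ f_e = P_{p,e}. \end{cases}$$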
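To make MS-equivalence concrete, here is a minimal brute-force sketch for finite measurement systems. It assumes (our assumption, not the paper's) that every outcome set $O_e$ is finite and carries its discrete σ-algebra, so that a σ-isomorphism $\mathcal{B}(O_e) \to \mathcal{B}(O_{t(e)})$ is induced by a bijection of outcomes and it suffices to compare the probabilities of single outcomes. The condition then reads: there are bijections $s$ of states and $t$ of measurements and, for each $e$, one outcome bijection (the same for every state $p$) under which $P_{s(p),t(e)}$ reproduces $P_{p,e}$. The dictionary layout (`states`, `meas`, `P`) and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def _outcome_iso_exists(ms1, ms2, s, e, te):
    """Is there a bijection g: O_e -> O_t(e) with P'_{s(p),t(e)}(g(o)) == P_{p,e}(o)
    for every state p and outcome o? The same g must work for all states."""
    outcomes1, outcomes2 = ms1["meas"][e], ms2["meas"][te]
    if len(outcomes1) != len(outcomes2):
        return False
    for image in permutations(outcomes2):
        g = dict(zip(outcomes1, image))
        if all(ms2["P"][(s[p], te)][g[o]] == ms1["P"][(p, e)][o]
               for p in ms1["states"] for o in outcomes1):
            return True
    return False


def ms_equivalent(ms1, ms2):
    """Brute-force test of (Sigma, E) ~_MS (Sigma', E') for finite systems."""
    states1, states2 = ms1["states"], ms2["states"]
    meas1, meas2 = list(ms1["meas"]), list(ms2["meas"])
    if len(states1) != len(states2) or len(meas1) != len(meas2):
        return False
    for s_image in permutations(states2):      # candidate bijection s
        s = dict(zip(states1, s_image))
        for t_image in permutations(meas2):    # candidate bijection t
            t = dict(zip(meas1, t_image))
            if all(_outcome_iso_exists(ms1, ms2, s, e, t[e]) for e in meas1):
                return True
    return False


# Toy systems: two states and one dichotomic measurement each.
q, h = Fraction(1, 4), Fraction(3, 4)
ms1 = {"states": ["p", "r"], "meas": {"e": ["+", "-"]},
       "P": {("p", "e"): {"+": h, "-": q}, ("r", "e"): {"+": q, "-": h}}}
# Same statistics up to renaming states, measurement and outcomes.
ms2 = {"states": ["x", "y"], "meas": {"f": ["up", "down"]},
       "P": {("x", "f"): {"up": q, "down": h}, ("y", "f"): {"up": h, "down": q}}}
# Different statistics for the second state.
ms3 = {"states": ["x", "y"], "meas": {"f": ["up", "down"]},
       "P": {("x", "f"): {"up": q, "down": h},
             ("y", "f"): {"up": Fraction(1, 2), "down": Fraction(1, 2)}}}

print(ms_equivalent(ms1, ms2), ms_equivalent(ms1, ms3))
```

The search is exponential in the sizes of $\Sigma$, $\mathcal{E}$ and the outcome sets, so it only serves to illustrate the definition on toy examples; exact `Fraction` values avoid spurious mismatches from floating-point rounding.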
Via this equivalence relation we can define a relation $\le_{MS}$ between classes of measurement systems $M$ and $M'$ as $M \le_{MS} M'$ if for all $(\Sigma, \mathcal{E}) \in M$ there exists $(\Sigma', \mathcal{E}') \in M'$ such that $(\Sigma, \mathcal{E}) \sim_{MS} (\Sigma', \mathcal{E}')$, i.e., if $M$ is included in $M'$ up to MS-equivalence. We can then prove the following: (i) $(\mathcal{B}, \mu) \sim (\mathcal{B}', \mu')$ if and only if $(\mathcal{B}, \mu) \le (\mathcal{B}', \mu')$ and $(\mathcal{B}', \mu') \le (\mathcal{B}, \mu)$ [8, §3, Lemma 1]; thus, the equivalence classes with respect to $\sim$ constitute a partially ordered set (poset) for the ordering induced by $\le$; we will denote

$^{a}$ As shown in [6, 9], this deterministic model for spin-1/2 in $\mathbb{R}^3$ can be generalized to $\mathbb{R}^3$-models for arbitrary spin-$N/2$. The states are then represented in the so-called Majorana representation [18, 5], i.e., as $N$ copies of $S_o$. Correct probabilistic behavior is then obtained by introducing entanglement between the $N$ different "spin-1/2 systems".
$^{b}$ I.e., a "pointless" σ-field. In particular, it follows from the Loomis-Sikorski theorem [17, 20] that all separable
the set of these equivalence classes by $\mathfrak{M}$; a class in it will be denoted via one of its members as $[\mathcal{B}, \mu]$. (ii) When setting $\mathfrak{M}_{HMS} := \{M[\mathcal{B}(\Lambda), \mu] \mid [\mathcal{B}(\Lambda), \mu] \in \mathfrak{M}\}$, where $M[\mathcal{B}(\Lambda), \mu]$ stands for all HMS with $\mathcal{B}(\Lambda')$ and $\mu'$ such that $(\mathcal{B}(\Lambda'), \mu') \in [\mathcal{B}(\Lambda), \mu]$, we have that $(\mathcal{B}(\Lambda), \mu) \le (\mathcal{B}'(\Lambda'), \mu')$ and $M[\mathcal{B}(\Lambda), \mu] \le_{MS} M[\mathcal{B}'(\Lambda'), \mu']$ are equivalent [8, §3, Theorem 2]. This then results in:

Theorem 1. $(\mathfrak{M}, \le)$ and $(\mathfrak{M}_{HMS}, \le_{MS})$ are isomorphic posets.

One of the crucial ingredients in (ii) above, and also in the proof of general existence with $\Lambda = [0,1]$, is the following: when setting $AM(\Sigma, \mathcal{E}) := \{(\mathcal{B}(O_e), P_{p,e}) \mid p \in \Sigma, e \in \mathcal{E}\}$, we obtain that $(\Sigma, \mathcal{E})$ admits an HMS-representation with $\mathcal{B}(\Lambda)$ and $\mu$ if and only if $AM(\Sigma, \mathcal{E}) \le (\mathcal{B}(\Lambda), \mu)$, where the order applies pointwise to the elements of $AM(\Sigma, \mathcal{E})$ [8, §4.2, Theorem 1]. Using this and Theorem 1 above, we can now translate properties of $\mathfrak{M}$ into propositions on the existence of certain HMS-representations. We obtain the following:

(i) $(\mathfrak{M}, \le)$ is not a join-semilattice; thus, in general there exists no smallest HMS-representation. As such we will have to restrict our study to particular settings where we are able to decide whether a smallest one exists and, if not, whether we can at least say something about the cardinality of $\Lambda$.

(ii) One can prove a number of criteria on $AM(\Sigma, \mathcal{E})$ that force $(\mathcal{B}(\Lambda), \mu) \sim (\mathcal{B}([0,1]), \mu_L)$, as such assuring the existence of a smallest representation. Among these is the following. Let $\mathfrak{M}_{finite} := \{(\mathcal{B}(X), \mu) \in \mathfrak{M} \mid X \text{ is finite}\}$. If $\mathfrak{M}_{finite} \subseteq AM(\Sigma, \mathcal{E})$, then $\Lambda$ cannot be discrete. It then follows, for example, that quantum theory restricted to measurements with a finite number of outcomes still requires $\Lambda = [0,1]$.

(iii) Let $\mathfrak{M}_N := \{(\mathcal{B}(X), \mu) \in \mathfrak{M} \mid X \text{ has at most } N \text{ elements}\}$. If $AM(\Sigma, \mathcal{E}) \subseteq \mathfrak{M}_N$, then there exists an HMS-representation with $\Lambda = \mathbb{N}$. Thus, quantum theory restricted to those measurements with at most a fixed number $N$ of outcomes has a discrete HMS-representation.
(iv) If $AM(\Sigma, \mathcal{E}) = \mathfrak{M}_N$, then there exists no smallest HMS-representation. Neither does one exist when the number of outcomes is fixed. So there is no essentially unique smallest HMS-representation for $N$-outcome quantum theory. Although there exists no smallest, and as such no canonical, discrete HMS-representation, we will give the construction of one solution for dichotomic (or propositional) quantum theory, i.e., $N = 2$, since this will constitute the core of the model presented in this paper. We will follow [8], to which we also refer for a construction for arbitrary $N$. Let us denote the quantum mechanical probability to obtain a positive outcome in a measurement of a proposition or question $a$ on a system in state $p$ as $P_p(a)$; the outcome set here consists of "we obtain a positive answer for the question $a$", slightly abusively denoted
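as $a$ itself, and "we obtain a negative answer for the question $a$", denoted as $\neg a$. Set inductively for $\lambda \in \mathbb{N}$:$^{c}$

$$\varphi_a(p, \lambda) := \begin{cases} a & \text{iff } P_p(a) \ge \dfrac{1}{2^{\lambda}} + \displaystyle\sum_{i=1}^{\lambda-1} \frac{\delta_{a,\varphi_a(p,i)}}{2^{i}}, \\[1ex] \neg a & \text{otherwise,} \end{cases}$$

where $\delta_{a,\varphi_a(p,i)}$ equals 1 if $\varphi_a(p,i) = a$ and 0 otherwise.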
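The inductive rule for $\varphi_a$ is a greedy binary expansion of $P_p(a)$: the $\lambda$-th "digit" is $a$ exactly when adding $2^{-\lambda}$ to the weights already selected does not overshoot $P_p(a)$. The following Python sketch reads the rule as $\varphi_a(p,\lambda) = a$ iff $P_p(a) \ge 2^{-\lambda} + \sum_{i<\lambda} 2^{-i}[\varphi_a(p,i) = a]$ and checks that weighting each $\lambda$ by $2^{-\lambda}$ returns $P_p(a)$, up to truncation at a finite depth. It also draws $\lambda$ with probability $2^{-\lambda}$ by repeated fair coin tosses, in the spirit of the "probabilistically balanced consecutive binary decisive processes" of section 4. The function names and the truncation depth are hypothetical choices for illustration.

```python
import random


def phi(prob_a: float, lam: int) -> bool:
    """Outcome for hidden variable lam = 1, 2, ...: True for 'a', False for 'not a'.

    Digit i is 1 iff prob_a >= 2**-i + (sum of the previously selected 2**-j),
    i.e. a greedy binary expansion of prob_a.
    """
    selected = 0.0   # running sum of 2**-i over earlier values giving 'a'
    digit = False
    for i in range(1, lam + 1):
        digit = prob_a >= selected + 2.0 ** -i
        if digit:
            selected += 2.0 ** -i
    return digit


def recovered_probability(prob_a: float, depth: int = 40) -> float:
    """mu({lam : phi = a}) with mu({lam}) = 2**-lam, truncated at lam = depth."""
    return sum(2.0 ** -lam for lam in range(1, depth + 1) if phi(prob_a, lam))


def sample_lambda(rng: random.Random) -> int:
    """Draw lam with probability 2**-lam: continue while a fair coin says so."""
    lam = 1
    while rng.random() < 0.5:
        lam += 1
    return lam


if __name__ == "__main__":
    rng = random.Random(0)
    for prob in (0.0, 0.3, 0.75, 1.0):
        hits = sum(phi(prob, sample_lambda(rng)) for _ in range(100_000))
        print(prob, recovered_probability(prob), hits / 100_000)
```

Because ties are resolved with $\ge$, a dyadic value such as $1/2$ receives its terminating expansion ($0.1000\ldots$ rather than $0.0111\ldots$), which is one of the non-canonical choices footnote c alludes to.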
One verifies that for $\mu(\lambda) := 1/2^{\lambda}$ we obtain the correct probabilities in the resulting HMS-model. This provides a discrete alternative to the $\mathbb{R}^3$-model for spin-1/2 discussed above. The model, including the projection $\psi$, remains the same, although we no longer consider $[a, \neg a]$ as $\Lambda$. Let $\lambda \in \Lambda' := \mathbb{N}$ and set $x_n^{\lambda} := \left(1 - \frac{n}{2^{\lambda}}\right) a + \frac{n}{2^{\lambda}}\, \neg a$ for $n = 0, \ldots, 2^{\lambda}$. For $\psi \in [a, x_1^{\lambda}[\ \cup\ [x_2^{\lambda}, x_3^{\lambda}[\ \cup\ [x_4^{\lambda}, x_5^{\lambda}[\ \cup \cdots \cup\ [x_{2^{\lambda}-2}^{\lambda}, x_{2^{\lambda}-1}^{\lambda}[$ we set $\varphi'_a(\psi, \lambda) := a$, and $\varphi'_a(\psi, \lambda) := \neg a$ otherwise. Then, for $\mu' : \mathcal{B}(\mathbb{N}) \to [0,1] : \{\lambda\} \mapsto 1/2^{\lambda}$, we again obtain quantum probability. Geometrically, this means the following: whereas in the first model the values of $\lambda \in \Lambda$ represent points on the diagonal, i.e., a continuous interval, or, equivalently, decompositions of an interval into two intervals, we now consider decompositions of an interval into $2^{\lambda}$ equally long parts, of which there are only countably many. We refer to [8] for details and illustrations.
3
Unitary, ortho- and projective structure
In the $\mathbb{R}^3$-models discussed above, rotational symmetries were implicit in their spatial geometry. However, in general the decompositions of measurements over $\mu : \mathcal{B}(\Lambda) \to [0,1]$ go measurement by measurement, so additional structure, if there is any, has to be put in by hand. It is probably fair to say that these contextual models only become non-trivial and useful when physical symmetries are encoded within the maps $\varphi_e$ in an appropriate manner. For the sake of the argument we will distinguish between three types of symmetries that can be encoded, namely unitary, ortho- and projective ones.

i. Unitary symmetries: When considering quantum measurements with discrete non-degenerate spectrum, we can represent the outcomes $\{o_i\}_i$ by the corresponding "eigenstates" $\{p_i\}_i$ via spectral decomposition, i.e., there exists an injective map $\mathcal{B}(O_e) \to \mathcal{P}(\Sigma)$ for each $e \in \mathcal{E}$. Then a measurement $e$ is specified by $\varphi_e = (U \circ \varphi \circ U^{-1}) : \Lambda \times \Sigma \to \{p_{e,i}\}_i$, where $U : \Sigma \to \Sigma$ is the unitary transformation that satisfies $U(p_i) = p_{e,i}$, and $p_e = p$. This is exactly the

$^{c}$ We agree on $\mathbb{N} := \{1, 2, \ldots\}$. Note here that already by the non-uniqueness of binary decomposition, e.g. $\frac{1}{2} = \frac{1}{2^1} = \sum_{i \in \mathbb{N}} \frac{1}{2^{i+1}}$, it follows that the construction below is not canonical. Obviously, there are also less pathological differences between the different non-comparable discrete representations [8].
symmetry encoded in the $\mathbb{R}^3$-models described above. Note in particular that in this perspective the pairs $(a, \neg a)$ and $(\neg a, \neg(\neg a))$ should not be envisioned as merely a change of names of the outcomes, but truly as putting the measurement device (or at least its detecting part) upside down.$^{d}$ In this setting, where we represent outcomes as states, the assignment of an outcome can now be envisioned as a true change of state $\varphi_{e,\lambda} : \Sigma \to \Sigma\ (\supseteq O_e) : p \mapsto \varphi_e(p, \lambda)$, as such allowing us to describe the behavior of the system under concatenated measurements.

ii. Projective symmetries: For degenerate quantum measurements, the outcomes require representation by higher dimensional subspaces, so identification in terms of states now requires an injective map $\mathcal{B}(O_e) \to \mathcal{P}(\mathcal{P}(\Sigma))$. The behavior of states of the system under concatenated measurements then requires specification of a family of "projectors" $\{\pi_T : \Sigma \to T \mid T \in O_e\}$, e.g., the orthogonal projectors $\pi_A : \Sigma \to A : p \mapsto A \wedge (p \vee A^{\perp})$ onto the corresponding subspace $A$ in quantum theory. The non-degenerate case discussed above also fits into this picture by setting $O_e \subseteq \{\{p\} \mid p \in \Sigma\}$, where now each $\pi_{\{p\}} : \Sigma \to \{p\}$ is uniquely determined (having a singleton codomain).

iii. Orthosymmetries: The existence of an orthocomplementation on the lattice of closed subspaces of a Hilbert space provides a dichotomic representation for measurements, which can be envisioned as a pair consisting of a (to be verified) proposition $A$ and its negation $\neg A$, in quantum theory yielding $\pi_{\neg A} : \Sigma \to A^{\perp} : p \mapsto A^{\perp} \wedge (p \vee A)$. In terms of linear operator calculus we have $\pi_{\neg A} = 1 - \pi_A$, both of them being orthogonal projectors.
4
Representing quantum history theory
Although quantum history theory involves sequential measurements, one of its goals is to remain an essentially dichotomic propositional theory. This is formally encoded in a rigid way in the "History Projection Operator" (HPO) approach [14]. The key idea here is that the form of logicality aimed at in [14] is faithfully represented in the Hilbert space tensor product.$^{e}$ Let $A := (A_i)_i$ be a
$^{d}$ The attentive reader will note that it is at this point that we escape the so-called hidden variable no-go theorems. They arise when trying to impose contextual symmetries within the states of the system by requiring that values of observables are independent of the chosen context, e.g., in the proof of the Kochen-Specker theorem. Our newly introduced variable $\lambda \in \Lambda$ follows contextual manipulations in an obvious manner.
$^{e}$ At this point we mention that in the study of sequential phenomena from the axiomatic quantum theory perspective on quantum logic, sequentiality and compoundness both turn out to be specifications of a universal causal duality [10], as such providing a metaphysical perspective on the use of tensor products both for the description of compound physical systems and for sequential processes.
(so-called homogeneous) quantum history proposition with temporal support $(t_1, t_2, \ldots, t_n)$. Then, rather than representing this as a sequence of subspaces $(A_i)_i$ or projectors $(\pi_i)_i$, we will represent $A$ either as a pure tensor $\bigotimes_i A_i$ in the lattice of closed subspaces of the tensor product of the corresponding Hilbert spaces, or as the orthogonal projector $\bigotimes_i \pi_i$ onto this subspace. The crucial property of this representation is then that $\neg A$ is again encoded as a projector, namely $\mathrm{id} - \bigotimes_i \pi_i$ [14], which clarifies the notations $\pi_A$ and $\pi_{\neg A}$. Moreover, if $\{A^i\}_i$ is a set of so-called disjoint history propositions, i.e., $\bigotimes_k A^i_k \perp \bigotimes_k A^j_k$ for $i \neq j$, then the history proposition that expresses the disjunction of $\{A^i\}_i$ in the sense of [14] is exactly encoded as the projector $\sum_i \bigotimes_k \pi^i_k$. As such we obtain a kind of logical setting that is still encoded in terms of projectors. Note that $\pi_{\neg A}$ is not of the form $\bigotimes_i \pi_i$ but of the form $\sum_i \bigotimes_k \pi^i_k$, breaking the structural symmetry between a proposition and its negation in ordinary quantum theory. We will now transcribe the observations of the two previous sections to this setting in order to provide a contextual deterministic model for quantum history theory with discretely originating probabilities. One could say that we apply a split Schrödinger-Heisenberg picture: on the level of unitary evolution we apply the Heisenberg picture, so that we can fix notation without reference to this evolution, but changes of state due to measurement are (obviously) expressed in the state space. When encoding outcomes in terms of states we need to consider $n$ copies of $\Sigma$, encoding the trajectories due to the measurements. In view of the considerations made above it will be no surprise that we consider these trajectories to be of the form $\bigotimes_i p_i$ in the tensor product $\bigotimes_i \Sigma_i$. This requires the introduction of the following "pseudo-projector":

$$\pi^{\otimes} : \Sigma \to \textstyle\bigotimes_i \Sigma_i : p \mapsto p^{\otimes} := p \otimes \pi_1(p) \otimes \cdots \otimes (\pi_{n-1} \circ \cdots \circ \pi_1)(p).$$

Setting $\Sigma^{\otimes} := \pi^{\otimes}[\Sigma] = \{p^{\otimes} \mid p \in \Sigma\}$, the map $\pi^{\otimes} : \Sigma \to \Sigma^{\otimes}$ encodes a bijective representation of $\Sigma$. Noting that $P_p(A) := \langle p^{\otimes} \mid \pi_A\, p^{\otimes} \rangle$ is the probability given by quantum theory to obtain $A$, we then set inductively for fixed $\lambda \in \mathbb{N}$ that
$$\varphi_A(p, \lambda) := A \quad \text{iff} \quad P_p(A) \ge \frac{1}{2^{\lambda}} + \sum_{i=1}^{\lambda-1} \frac{\delta_{A,\varphi_A(p,i)}}{2^{i}}$$

(with $\delta_{A,\varphi_A(p,i)} = 1$ if $\varphi_A(p,i) = A$ and $0$ otherwise), and $\varphi_A(p, \lambda) := \neg A$ otherwise. The outcome trajectories in case we obtain $A$ are then given in terms of initial states by $(\pi_A \circ \pi^{\otimes}) : \Sigma \to \bigotimes_i A_i$. The value $\lambda \in \mathbb{N}$ can be envisioned as follows. We assume it to be a number of contextual events, either real or virtual depending on one's taste, and we assume that, given that some events already happened, the chance of a next one happening is equal to the chance that it does not happen. So we actually consider a finite number of probabilistically balanced consecutive binary decisive processes, where the result of the previous one determines whether we actually
perform the next one. Unitary symmetries are induced in the obvious way as tensored unitary operators $\bigotimes_i U_i$. This model then reproduces the statistical behavior of quantum history theory. The breaking of the structural symmetry between a proposition and its negation manifests itself most explicitly in the fact that when we have a determined outcome $\neg A$, we do not have a determined trajectory in our model. Obviously one could build a fully deterministic model that also determines this by concatenating individual deterministic models (one for each element in the temporal support), but we feel that this would not be in accordance with the propositional flavor a history theory aims at. The negation $\neg A$ is indeed cognitive and not ontological with respect to the actually executed physical procedure or, in other words, the system's context, and one cannot expect an ontological model to encode this in terms of a formal duality. Explicitly, $\neg(A \otimes B)$ can be written both as $(\mathcal{H} \otimes \neg B) \oplus (\neg A \otimes B)$ and as $(\neg A \otimes \mathcal{H}) \oplus (A \otimes \neg B)$, which clearly define different procedures with respect to the change of state imposed by the measurement. Even more explicitly, setting

$$HPO(\{\mathcal{H}_k\}_k) := \Big\{ \sum_i \bigotimes_k A^i_k \;\Big|\; A^i_k \in \mathcal{L}(\mathcal{H}_k),\ \bigotimes_k A^i_k \perp \bigotimes_k A^j_k \text{ for } i \neq j \Big\},$$

with $\mathcal{L}(\mathcal{H}_k)$ the lattice of closed subspaces of $\mathcal{H}_k$, the "ontologically faithful hull" of $HPO(\{\mathcal{H}_k\}_k)$ then consists of all "ortho-ideals"

$$OI(HPO(\{\mathcal{H}_k\}_k)) := \Big\{ {\downarrow}\big[\{\textstyle\bigotimes_k A^i_k\}_i\big] \;\Big|\; A^i_k \in \mathcal{L}(\mathcal{H}_k),\ \bigotimes_k A^i_k \perp \bigotimes_k A^j_k \text{ for } i \neq j \Big\},$$

where ${\downarrow}[-]$ assigns to a set of pure tensors all pure tensors in $\bigotimes_k \mathcal{H}_k$ that are smaller than at least one in the given set, with respect to the ordering in $\mathcal{L}(\bigotimes_k \mathcal{H}_k)$; the downset construction ${\downarrow}[-]$ makes $OI(HPO(\{\mathcal{H}_k\}_k))$ inherit the $\mathcal{L}(\bigotimes_k \mathcal{H}_k)$-order as intersection.
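The projector statements of this section can be checked numerically on a two-time history for a single qubit. The following Python/NumPy sketch is an illustration under stated assumptions, not the authors' construction: it takes trivial unitary evolution between the two times (Heisenberg picture) and normalizes every factor of $p^{\otimes}$ (our reading of states as rays). It checks that $\langle p^{\otimes} \mid \pi_A\, p^{\otimes}\rangle$ reproduces the sequential probability $\|\pi_2 \pi_1 p\|^2$, that $\mathrm{id} - \pi_1 \otimes \pi_2$ coincides with both decompositions $(\mathcal{H} \otimes \neg B) \oplus (\neg A \otimes B)$ and $(\neg A \otimes \mathcal{H}) \oplus (A \otimes \neg B)$, and, via the operator Schmidt rank, that the negation is not a pure tensor while the history projector is. The helper names and the chosen vectors are hypothetical.

```python
import numpy as np


def ray_projector(v):
    """Orthogonal projector onto the one-dimensional subspace spanned by v."""
    v = np.asarray(v, dtype=complex)
    v = v / np.linalg.norm(v)
    return np.outer(v, v.conj())


def tensor(*ops):
    """Kronecker product op_1 ⊗ op_2 ⊗ ... (works for vectors and matrices)."""
    out = ops[0]
    for op in ops[1:]:
        out = np.kron(out, op)
    return out


def pseudo_projector(p, projs):
    """p ↦ p ⊗ π1(p) ⊗ ... ⊗ (π_{n-1}∘...∘π1)(p), every factor normalized.

    Assumes no intermediate projection annihilates the state."""
    cur = np.asarray(p, dtype=complex)
    cur = cur / np.linalg.norm(cur)
    factors = [cur]
    for proj in projs[:-1]:
        cur = proj @ cur
        cur = cur / np.linalg.norm(cur)
        factors.append(cur)
    return tensor(*factors)


def operator_schmidt_rank(op, d1, d2):
    """Minimal number of pure-tensor terms X ⊗ Y needed to write op on C^d1 ⊗ C^d2."""
    realigned = op.reshape(d1, d2, d1, d2).transpose(0, 2, 1, 3)
    return np.linalg.matrix_rank(realigned.reshape(d1 * d1, d2 * d2))


# Two-time homogeneous history A ⊗ B on a qubit (illustrative choice).
I2 = np.eye(2, dtype=complex)
PA = ray_projector([1, 1])       # first-time proposition: the ray of |+>
PB = ray_projector([1, 0])       # second-time proposition: the ray of |0>
p = np.array([1, 0], dtype=complex)

history = tensor(PA, PB)                         # π_A = π1 ⊗ π2
p_tensor = pseudo_projector(p, [PA, PB])
prob_tensor = np.vdot(p_tensor, history @ p_tensor).real
prob_sequential = np.linalg.norm(PB @ PA @ p) ** 2

negation = np.eye(4) - history                   # π_¬A = id - π1 ⊗ π2
route_1 = tensor(I2, I2 - PB) + tensor(I2 - PA, PB)   # (H ⊗ ¬B) ⊕ (¬A ⊗ B)
route_2 = tensor(I2 - PA, I2) + tensor(PA, I2 - PB)   # (¬A ⊗ H) ⊕ (A ⊗ ¬B)

print(prob_tensor, prob_sequential)
print(operator_schmidt_rank(history, 2, 2), operator_schmidt_rank(negation, 2, 2))
```

The identity between the two probabilities relies on the normalization of each factor: the product of the single-time expectation values then telescopes to $\|\pi_2 \pi_1 p\|^2$.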
If a particular decomposition is specified as an element of $OI(HPO(\{\mathcal{H}_k\}_k))$, which amounts to a full specification of the physical procedure (summation over different sequences of pure tensors now being envisioned as a choice of procedure), we can provide a deterministic contextual model, the choice of procedure itself becoming an additional variable. In conclusion, the HPO-setting "loses" part of the physical ontology that goes with an operational perspective on quantum theory,$^{f}$ and as such, if we want to provide a deterministic representation for general inhomogeneous history propositions in the sense of the one we obtained for the homogeneous ones, we formally need to restore this part of the physical ontology, e.g. as $OI(HPO(\{\mathcal{H}_k\}_k))$.
5
Further discussion
In this paper we did not provide an answer, and we did not even pose a question. We just provided a new way to think about things, in slight confrontation with the usual consistency or decoherence perspective on history theories. Even if one does not subscribe to the underlying deterministic nature of the model, it still exhibits what a minimal representation of the indeterministic ingredients can be, as such representing them in a more tangible way. With respect to the nonexistence of a smallest representation, it could be that, in view of other physical considerations (e.g., equilibrium or other thermodynamical considerations, or metastatistical ones emerging from additional modelization), one of the constructible discrete models presents itself as the truly canonical one.

$^{f}$ A choice that is motivated by the traditional consistent history setting and its interpretation, as well as by a particular semantical perspective on quantum logic as a whole.

Acknowledgments
We thank Chris Isham for useful discussions on the content of this paper.

References
1. D. Aerts, J. Math. Phys. 27, 202 (1986).
2. D. Aerts, Int. J. Theor. Phys. 32, 2207 (1993).
3. D. Aerts, Found. Phys. 24, 1227 (1994).
4. G.K. Au, Interview with A. Ashtekar, C.J. Isham and E. Witten, The Quest for Quantum Gravity; arXiv: gr-qc/9506001 (1995).
5. H. Bacry, J. Math. Phys. 15, 1686 (1974).
6. B. Coecke, Helv. Phys. Acta 68, 396 (1995).
7. B. Coecke, Found. Phys. Lett. 8, 437 (1995).
8. B. Coecke, Helv. Phys. Acta 70, 442, 462 (1997); arXiv: quant-ph/0008061 and quant-ph/0008062; Tatra Mt. Math. Publ. 10, 63.
9. B. Coecke, Found. Phys. 28, 1347 (1998).
10. B. Coecke et al., Found. Phys. Lett. 14 (2001); arXiv: quant-ph/0009100.
11. N. Gisin and C. Piron, Lett. Math. Phys. 5, 379 (1981).
12. S. Gudder, J. Math. Phys. 11, 431 (1970).
13. A. Horn and A. Tarski, Trans. AMS 64, 467 (1948).
14. C.J. Isham, J. Math. Phys. 35, 2157 (1994).
15. C.J. Isham, Structural Issues in Quantum Gravity, in: General Relativity and Gravitation: GR14, p. 167 (World Scientific, Singapore, 1997).
16. C.J. Isham and J. Butterfield, Found. Phys. 30, 1707 (2000).
17. L. Loomis, Bull. AMS 53, 757 (1947).
18. E. Majorana, Nuovo Cimento 9, 43 (1932).
19. C. Rovelli, Strings, Loops and Others: A Critical Survey of the Present Approaches to Quantum Gravity, Plenary Lecture at GR15, Poona, India (1998); arXiv: gr-qc/9803024.
20. R. Sikorski, Fund. Math. 35, 247 (1948).
INTERPRETATIONS OF QUANTUM MECHANICS, AND INTERPRETATIONS OF VIOLATION OF BELL'S INEQUALITY

WILLEM M. DE MUYNCK
Theoretical Physics, Eindhoven University of Technology, POB 513, 5600 MB Eindhoven, the Netherlands

The discussion of the foundations of quantum mechanics is complicated by the fact that a number of different issues are closely entangled. Three of these issues are i) the interpretation of probability, ii) the choice between realist and empiricist interpretations of the mathematical formalism of quantum mechanics, and iii) the distinction between measurement and preparation. It will be demonstrated that interpreting the violation of Bell's inequality by quantum mechanics as evidence of non-locality of the quantum world is a consequence of a particular choice between these alternatives. Also, a distinction must be drawn between two forms of realism, viz. a) realist interpretations of quantum mechanics, and b) the possibility of hidden-variables (sub-quantum) theories.
1
Realist and empiricist interpretations of quantum mechanics
In realist interpretations of the mathematical formalism of quantum mechanics, state vector and observable are thought to refer to the microscopic object in the usual way presented in most textbooks. Although, of course, preparing and measuring instruments are often present, these are not taken into account in the mathematical description (unless, as in the theory of measurement, the subject is the interaction between object and measuring instrument). In an empiricist interpretation, quantum mechanics is thought to describe relations between input and output of a measurement process. A state vector is just a label of a preparation procedure; an observable is a label of a measuring instrument. In an empiricist interpretation quantum mechanics is not thought to describe the microscopic object. This, of course, does not imply that this object would not exist; it only means that it is not described by quantum mechanics. An explanation of the relations between input and output of a measurement process should be provided by another theory, e.g. a hidden-variables (sub-quantum) theory. This is analogous to the way the theory of rigid bodies describes the empirical behavior of a billiard ball, or to the description by thermodynamics of the thermodynamic properties of a volume of gas, explanations being relegated to theories describing the microscopic (atomic) properties of the systems. Although a term like 'observable' (rather than 'physical quantity') is evidence of the empiricist origin of quantum mechanics (compare Heisenberg [1]), there has always existed a strong tendency toward a realist interpretation in which observables are considered as properties of the microscopic object, more or less analogous to classical ones. Likewise, many physicists tend to think of electrons as wave packets flying around in space, without bothering too much about the "Unanschaulichkeit" (lack of visualizability) that for Schrödinger [2] was such a problematic feature of quantum theory. Without entering into a detailed discussion of the relative merits of either of these interpretations (e.g. de Muynck [3]), it is noted here that an empiricist interpretation is in agreement with the operational way theory and experiment are compared in the laboratory. Moreover, it is free of the paradoxes that have their origin in a realist interpretation. As will be seen in the next section, the difference between realist and empiricist interpretations is highly relevant when dealing with the EPR problem.
2
EPR experiments and Bell experiments
Figure 1: EPR experiment (measuring instrument for $Q_1$ or $P_1$).

Figure 1 depicts the experiment proposed by Einstein, Podolsky and Rosen [4] to study the (in)completeness of quantum mechanics. A pair of particles (1 and 2) is prepared in an entangled state and allowed to separate. A measurement is performed on particle 1. It is essential to the EPR reasoning that particle 2 does not interact with any measuring instrument, thus allowing one to consider so-called 'elements of physical reality' of this particle, which can be considered as objective properties, attributable to particle 2 independently of what happens to particle 1. EPR presented this arrangement as a way to perform a measurement on particle 2 without in any way disturbing this particle.

Figure 2: Bell experiment.

The EPR experiment should be compared to correlation measurements of the type performed by Aspect et al. [5, 6] to test Bell's inequality (cf. figure 2). In these latter experiments particle 2, too, interacts with a measuring instrument. In the literature these experiments are often referred to as EPR experiments as well, thus neglecting the fundamental difference between the two measurement arrangements of figures 1 and 2. This negligence has been responsible for quite a bit of confusion, and should preferably be avoided by referring to the latter experiments as Bell experiments rather than EPR ones. In EPR experiments particle 2 is not subject to a measurement, but to a (conditional) preparation (conditional on the measurement result obtained for particle 1). This is especially clear in an empiricist interpretation, because there measurement results cannot exist unless a measuring instrument is present, its pointer positions corresponding to the measurement results. Unfortunately, the EPR experiment of figure 1 was presented by EPR as a measurement performed on particle 2, and accepted as such by Bohr. That this could happen is a consequence of the fact that both Einstein and Bohr entertained a realist interpretation of quantum mechanical observables (note that they differed with respect to the interpretation of the state vector), the only difference being that Einstein's realist interpretation was an objectivistic one (in which observables are considered as properties of the object, possessed independently of any measurement: the EPR 'elements of physical reality'), whereas Bohr's was a contextualistic realism (in which observables are only well-defined within the context of the measurement). Note that in Bell experiments the EPR reasoning would break down because, due to the interaction of particle 2 with its measuring instrument, there cannot exist 'elements of physical reality'. Much confusion could have been avoided if Bohr had maintained his interactional view of measurement. However, by accepting the EPR experiment as a measurement of particle 2, he had to weaken his interpretation to a relational one (e.g. Popper [7], Jammer [8]), allowing the observable of particle 2 to be co-determined by the measurement context for particle 1. This introduced non-locality into the interpretation of quantum mechanics for the first time.
But this could easily have been avoided if Bohr had required that for a measurement of particle 2 a measuring instrument should actually be interacting with this very particle, with the result that an observable of particle $n$ ($n = 1, 2$) can be co-determined in a local way by the measurement context of that particle only. This, incidentally, would have made the EPR 'elements of physical reality' completely obsolete, and would have been quite a bit less confusing than the answer Bohr [9] actually gave (to the effect that the definition of the EPR 'element of physical reality' would be ambiguous because it did not take into account the measurement arrangement for the other particle), which promoted the non-locality idea. Summarizing, the idea of EPR non-locality is a consequence of i) a neglect of the difference between EPR and Bell experiments (equating 'elements of physical reality' to measurement results), and ii) a realist interpretation of quantum mechanics (considering measurement results as properties of the microscopic object, i.e. particle 2). In an empiricist interpretation there is no reason to assume any non-locality. It is often asserted that non-locality is proven by the Aspect experiments, because these violate Bell's inequality. The reason for such an assertion is the belief that non-locality is a necessary condition for a derivation of Bell's inequality. However, as will be demonstrated in the following, this cannot be correct, since this inequality can be derived from quite different assumptions. Also, experiments like the Aspect ones, although violating Bell's inequality, do not exhibit any trace of non-locality, because their measurement results are completely consistent with the postulate of local commutativity, implying that relative frequencies of measurement results are independent of which measurements are performed in causally disconnected regions. Admittedly, this does not logically exclude a certain non-locality at the individual level that is unobservable at the statistical level of quantum mechanical probability distributions. However, from a physical point of view a peaceful coexistence between locality at the (physically relevant) statistical level and non-locality at the individual level is extremely implausible.
Unobservability of the latter would require a kind of conspiracy not unlike the one that made the 19th-century world aether unobservable. For this reason the 'non-locality' explanation of the experimental violation of Bell's inequality does not seem to be very plausible, and it seems wise to look for alternative explanations. Since non-locality is never the only assumption in deriving Bell's inequality, such alternative explanations do exist. For instance, Einstein's assumption of the existence of 'elements of physical reality' is such an additional assumption. More generally, in Bell's derivation [10] the existence of hidden variables is one. Is it still possible to derive Bell's inequality if these assumptions are abolished? Moreover, even assuming the possibility of hidden-variables theories, are there no hidden assumptions in Bell's derivation in addition to the locality assumption?

Bell's inequality refers to a set of four quantum mechanical observables, $A_1$, $B_1$, $A_2$ and $B_2$, observables with different/identical indices being compatible/incompatible. In the Aspect experiments measurements of the four possible compatible pairs are performed; in these experiments $A_n$ and $B_n$ refer to polarization observables of photon $n$, $n = 1, 2$. Bell's inequality can typically be derived for the stochastic quantities of a classical Kolmogorovian probability theory. Hence, violation of Bell's inequality is an indication that the observables $A_1$, $B_1$, $A_2$ and $B_2$ are not stochastic quantities in the sense of Kolmogorov's probability theory. In particular, there cannot exist a quadrivariate joint probability distribution of these four observables. Such a non-existence is a consequence of the incompatibility of certain of the observables. Since incompatibility is a local affair, this is another reason to doubt the 'non-locality' explanation of the violation of Bell's inequality. In the following, derivations of Bell's inequality will be scrutinized to see whether the non-locality assumption is as crucial as was assumed by Bell. In doing so it is necessary to distinguish derivations in quantum mechanics from derivations in hidden-variables theories.
3
Bell's inequality in quantum mechanics
For dichotomic observables, having values $\pm 1$, Bell's inequality is given by

$$\left|\langle A_1A_2\rangle - \langle A_1B_2\rangle\right| - \langle B_1B_2\rangle - \langle B_1A_2\rangle \le 2. \qquad (3.1)$$
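As a sketch (our own illustration, not part of the paper), the following Python/NumPy code checks the combination $|\langle A_1A_2\rangle - \langle A_1B_2\rangle| - \langle B_1B_2\rangle - \langle B_1A_2\rangle \le 2$ from two sides. For 'possessed' value quadruples $(a_1, b_1, a_2, b_2)$ of $\pm 1$, every single quadruple gives exactly $\pm 2$, so averages over any finite sequence of pairs respect the bound. Quantum correlations of a spin singlet exceed it. The last lines compute the marginal probability for $A_1$ for both possible settings on particle 2, illustrating the local commutativity mentioned in section 2. The singlet state, the measurement angles and all function names are assumptions chosen for illustration; the Aspect experiments use photon polarization, for which the correlation depends on twice the angle difference.

```python
import itertools
import random

import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
SINGLET = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)


def bell_lhs(c):
    """Left-hand side |<A1A2> - <A1B2>| - <B1B2> - <B1A2>."""
    return abs(c["A1A2"] - c["A1B2"]) - c["B1B2"] - c["B1A2"]


def averages(quads):
    """Correlations from 'possessed' value quadruples (a1, b1, a2, b2) of +/-1."""
    n = len(quads)
    return {
        "A1A2": sum(a1 * a2 for a1, b1, a2, b2 in quads) / n,
        "A1B2": sum(a1 * b2 for a1, b1, a2, b2 in quads) / n,
        "B1A2": sum(b1 * a2 for a1, b1, a2, b2 in quads) / n,
        "B1B2": sum(b1 * b2 for a1, b1, a2, b2 in quads) / n,
    }


def spin(theta):
    """+/-1-valued spin observable along angle theta in the x-z plane."""
    return np.cos(theta) * SZ + np.sin(theta) * SX


def correlation(t1, t2, state=SINGLET):
    """Quantum expectation value of spin(t1) ⊗ spin(t2)."""
    return np.vdot(state, np.kron(spin(t1), spin(t2)) @ state).real


def marginal_plus(t1, t2, state=SINGLET):
    """P(+1 at particle 1, angle t1), summed over both outcomes at angle t2."""
    proj1 = (np.eye(2) + spin(t1)) / 2
    return sum(np.vdot(state, np.kron(proj1, (np.eye(2) + s * spin(t2)) / 2) @ state).real
               for s in (1, -1))


# 1) Possessed values: every single quadruple gives exactly +2 or -2 ...
per_pair = {bell_lhs(averages([q])) for q in itertools.product((1, -1), repeat=4)}
# ... so averages over any finite sequence of quadruples stay below 2.
rng = random.Random(0)
sample = [tuple(rng.choice((1, -1)) for _ in range(4)) for _ in range(1000)]
sample_lhs = bell_lhs(averages(sample))

# 2) Quantum singlet, angles chosen (an assumption) to make the bound fail.
ANGLES = {"A1": 0.0, "B1": np.pi / 2, "A2": 3 * np.pi / 4, "B2": np.pi / 4}
quantum = {k1 + k2: correlation(ANGLES[k1], ANGLES[k2])
           for k1 in ("A1", "B1") for k2 in ("A2", "B2")}
quantum_lhs = bell_lhs(quantum)

# 3) Local commutativity: the A1 marginal does not depend on the distant setting.
no_signal = (marginal_plus(ANGLES["A1"], ANGLES["A2"]),
             marginal_plus(ANGLES["A1"], ANGLES["B2"]))

print(sorted(per_pair), sample_lhs, quantum_lhs, no_signal)
```

With these angles the quantum value is $2\sqrt{2}$, while the 'possessed values' sample can never exceed 2, since the absolute value of an average is at most the average of the absolute values.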
A more general inequality, valid for arbitrary values of the observables, is the BCHS inequality

$$-1 \le p(b_1, a_2) + p(b_1, b_2) + p(a_1, b_2) - p(a_1, a_2) - p(b_1) - p(b_2) \le 0, \qquad (3.2)$$
from which (3.1) can be derived for the dichotomic case. Because it is independent of the values of the observables, inequality (3.2) is by far preferable to inequality (3.1). Bell's inequality may be violated if some of the observables are incompatible: $[A_1, B_1]_- \neq O$, $[A_2, B_2]_- \neq O$. I shall now discuss two derivations of Bell's inequality which can be formulated within the quantum mechanical formalism, and which do not rely on the existence of hidden variables. The first one relies on a 'possessed values' principle:

'possessed values' principle: values of quantum mechanical observables may be attributed to the object as objective properties, possessed by the object independent of observation.

The 'possessed values' principle can be seen as an expression of the objectivistic-realist interpretation of the quantum mechanical formalism preferred by
Einstein (compare the EPR 'elements of physical reality'). The important point is that by this principle well-defined values are simultaneously attributed to incompatible observables. If $a_i^{(n)}, b_j^{(n)} = \pm 1$ are the values of $A_i$ and $B_j$ for the $n$th of a sequence of $N$ particle pairs, then we have
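$$-2 \le \left|a_1^{(n)} a_2^{(n)} - a_1^{(n)} b_2^{(n)}\right| - b_1^{(n)} b_2^{(n)} - b_1^{(n)} a_2^{(n)} \le 2,$$

so that the averages

$$\langle A_1A_2\rangle = \frac{1}{N} \sum_{n=1}^{N} a_1^{(n)} a_2^{(n)}, \quad \text{etc.}$$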
must satisfy Bell's inequality (3.1) (a similar derivation was first given by Stapp [11], although starting from quite a different interpretation). The essential point in the derivation is the assumption of the existence of a quadruple of values $(a_1, b_1, a_2, b_2)$ for each of the particle pairs. From the experimental violation of Bell's inequality it follows that an objectivistic-realist interpretation of the quantum mechanical formalism, encompassing the 'possessed values' principle, is impossible. Violation of Bell's inequality entails failure of the 'possessed values' principle (no quadruples available). In view of the important role measurement plays in the interpretation of quantum mechanics, this is hardly surprising. As is well known, due to the incompatibility of some of the observables, the existence of a quadruple of values can only be attained on the basis of doubtful counterfactual reasoning. If a realist interpretation is feasible at all, it seems it would have to be a contextualistic one, in which the values of observables are co-determined by the measurement arrangement. In the case of Bell experiments non-locality does not seem to be involved. As a second possibility to derive Bell's inequality within quantum mechanics, we should consider derivations of the BCHS inequality (3.2) from the existence of a quadrivariate probability distribution $p(a_1, b_1, a_2, b_2)$ by Fine [12] and Rastall [13] (also de Muynck [14]). Hence, from violation of Bell's inequality the non-existence of a quadrivariate joint probability distribution follows. In view of the fact that incompatible observables are involved, this, once again, is hardly surprising. A priori there are two possible reasons for the non-existence of the quadrivariate joint probability distribution $p(a_1, b_1, a_2, b_2)$. First, it is possible that the limit $\lim_{N \to \infty} N(a_1, b_1, a_2, b_2)/N$ of the relative frequencies of quadruples of measurement results does not exist.
Since, however, Bell's inequality already follows from the existence of the relative frequencies $N(a_1, b_1, a_2, b_2)/N$ for finite $N$, and the limit $N \to \infty$ is never involved in any experimental implementation, this answer does not seem to be sufficient. Therefore the reason for the non-existence of the quadrivariate joint probability distribution $p(a_1, b_1, a_2, b_2)$
(3)
thus preventing the existence of one single value of observable $A_1$ for the two Aspect experiments involving this observable. This, precisely, is the 'nonlocality' explanation referred to above. This explanation is close to Bohr's 'ambiguity' answer to EPR, referred to in section 2, which states that the definition of an 'element of physical reality' of observable $A_1$ must depend on the measurement context of particle 2. As will be demonstrated next, however, there is a more plausible local explanation, based on the inequality
$$a_1(A_1) \neq a_1(B_1),$$
(4)
expressing that the value of $A_1$, say, will depend on whether $A_1$ or $B_1$ is measured. Inequality (3.4) could be seen as an implementation of Heisenberg's disturbance theory of measurement, to the effect that observables incompatible with the actually measured one are disturbed by the measurement. That such an effect really occurs in the Aspect experiments can be seen from the generalized Aspect experiment depicted in figure 3. This experiment should be compared with the Aspect switching experiment6, in which the switches have been replaced by two semi-transparent mirrors (transmissivities $\gamma_1$ and $\gamma_2$, respectively). The four standard Aspect experiments are special cases of the generalized one, having $\gamma_n = 0$ or $1$, $n = 1, 2$. Restricting for a moment to one side of the interferometer, it is possible to calculate the joint detection probabilities of the two detectors according to
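$$\bigl(p_{\gamma_1}(a_{1i}, b_{1j})\bigr) = \begin{pmatrix} 0 & \gamma_1\langle E^{(1)}_+\rangle \\ (1-\gamma_1)\langle F^{(1)}_+\rangle & 1 - \gamma_1\langle E^{(1)}_+\rangle - (1-\gamma_1)\langle F^{(1)}_+\rangle \end{pmatrix}, \qquad (5)$$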
in which $\{E^{(1)}_+, E^{(1)}_-\}$ and $\{F^{(1)}_+, F^{(1)}_-\}$ are the spectral representations of the two polarization observables ($A_1$ and $B_1$) in directions $\theta_1$ and $\theta_1'$, respectively. The values $a_{1i} = +/-$ and $b_{1j} = +/-$ correspond to yes/no registration of a photon by the detector.
Figure 3: Generalized Aspect experiment.
$p_{\gamma_1}(+,+) = 0$ means that, as in the switching experiment, only one of the detectors can register photon 1. There is, however, a fundamental difference with the switching experiment: in the latter the photon wave packet is sent either toward one detector or the other, whereas in the present one it is split so as to interact coherently with both detectors. This makes it possible to interpret the right-hand part of the generalized experiment of figure 3 as a joint non-ideal measurement of the incompatible polarization observables in directions $\theta_1$ and $\theta_1'$ (e.g. de Muynck et al.15), the joint probability distribution of the observables being given by (5). It is not possible to discuss extensively here the relevance of experiments of the generalized type for understanding Heisenberg's disturbance theory of measurement, and its relation to the Heisenberg uncertainty relations (see e.g. de Muynck16). The important point is that such experiments do not fit into the standard (Dirac-von Neumann) formalism, in which a probability is an expectation value of a projection operator. Indeed, from (5) it follows that $p_{\gamma_1}(a_{1i}, b_{1j}) = \mathrm{Tr}\,\rho R^{(1)}_{ij}$, yielding operators $R^{(1)}_{ij}$ according to
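$$\bigl(R^{(1)}_{ij}\bigr) = \begin{pmatrix} O & \gamma_1 E^{(1)}_+ \\ (1-\gamma_1)F^{(1)}_+ & \gamma_1 E^{(1)}_- + (1-\gamma_1)F^{(1)}_- \end{pmatrix}. \qquad (6)$$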
The set of operators $\{R^{(1)}_{ij}\}$ constitutes a so-called positive operator-valued measure (POVM). Only generalized measurements corresponding to POVMs are able to describe joint non-ideal measurements of incompatible observables. By calculating the marginals of the probability distribution $p_{\gamma_1}(a_{1i}, b_{1j})$ it is possible to see that for each value of $\gamma_1$ information is obtained on both polarization observables, albeit that information on polarization in direction $\theta_1$ becomes less ideal as $\gamma_1$ decreases, while information on polarization in direction $\theta_1'$ becomes more ideal. This is in perfect agreement with the idea of mutual disturbance in a joint measurement of incompatible observables. The explanation of the non-existence of a single measurement result for observable $A_1$, say, as implied by inequality (3.4), is corroborated by this analysis.
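As a numerical sketch of the POVM (6) and its marginals, assume polarization is modelled on a two-dimensional real space, with $E^{(1)}_+$ and $F^{(1)}_+$ the projectors onto linear polarization at angles $\theta_1$ and $\theta_1'$. The function names `projector`, `generalized_povm` and `is_povm` are hypothetical. The elements turn out positive and sum to the identity, and summing over one index gives the smeared marginals $\{\gamma_1E_+, I - \gamma_1E_+\}$ for $A_1$ and $\{(1-\gamma_1)F_+, I - (1-\gamma_1)F_+\}$ for $B_1$.

```python
import numpy as np

def projector(theta):
    """Projector onto linear polarization at angle theta (radians)."""
    v = np.array([np.cos(theta), np.sin(theta)])
    return np.outer(v, v)

def generalized_povm(gamma, theta, theta_prime):
    """2x2 table R[i][j] of eq. (6); index 0 stands for '+', 1 for '-'.

    E_+ (F_+) is the 'yes' projector for polarization along theta (theta_prime);
    gamma is the transmissivity of the semi-transparent mirror.
    """
    identity = np.eye(2)
    e_plus, f_plus = projector(theta), projector(theta_prime)
    e_minus, f_minus = identity - e_plus, identity - f_plus
    return [
        [np.zeros((2, 2)), gamma * e_plus],
        [(1 - gamma) * f_plus, gamma * e_minus + (1 - gamma) * f_minus],
    ]

def is_povm(table, tol=1e-12):
    """Every element positive semidefinite and all elements summing to identity."""
    elements = [r for row in table for r in row]
    positive = all(np.linalg.eigvalsh(r).min() >= -tol for r in elements)
    complete = np.allclose(sum(elements), np.eye(2))
    return positive and complete

gamma, theta1, theta1p = 0.3, 0.0, np.pi / 8
R = generalized_povm(gamma, theta1, theta1p)
print(is_povm(R))  # True

# Marginal POVM for A_1 (sum over j): {gamma * E_+, I - gamma * E_+}.
a1_yes = R[0][0] + R[0][1]
# Marginal POVM for B_1 (sum over i): {(1 - gamma) * F_+, I - (1 - gamma) * F_+}.
b1_yes = R[0][0] + R[1][0]
print(np.allclose(a1_yes, gamma * projector(theta1)))          # True
print(np.allclose(b1_yes, (1 - gamma) * projector(theta1p)))   # True
```

Setting $\gamma_1 = 1$ reduces the table to the ideal measurement $\{E_+, E_-\}$ of $A_1$, and $\gamma_1 = 0$ reduces it to $\{F_+, F_-\}$ of $B_1$. These are the two limiting standard Aspect configurations.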
The analysis can easily be extended to the joint detection probabilities of the whole experiment of figure 3. The joint detection probability distribution of all four detectors is given by the expectation value of a quadrivariate POVM $\{R_{ijkl}\}$ according to
$$p(a_{1i}, b_{1j}, a_{2k}, b_{2l}) = \mathrm{Tr}\,\rho R_{ijkl}. \qquad (7)$$
This POVM can be expressed in terms of the POVMs of the left and right interferometer arms according to
$$R_{ijkl} = R^{(1)}_{ij} R^{(2)}_{kl}. \qquad (8)$$
It is important to note that the existence of the quadrivariate joint probability distribution (7), and the consequent satisfaction of Bell's inequality, is a consequence of the existence of quadruples of measurement results, available because it is possible to determine for each individual particle pair the result of each of the four detectors. Although, because of (3.5), locality is also assumed, this does not play an essential role. Provided that a quadruple of measurement results exists for each individual photon pair, Bell's inequality would be satisfied even if, due to non-local interaction, $R_{ijkl}$ were not a product of operators of the two arms of the interferometer. The reason why the standard Aspect experiments do not satisfy Bell's inequality is the non-existence of a quadrivariate joint probability distribution yielding the bivariate probabilities of these experiments as marginals. Such a non-existence is strongly suggested by Heisenberg's idea of mutual disturbance in a joint measurement of incompatible observables. This is corroborated by the easily verifiable fact that the quadrivariate joint probability distributions of the standard Aspect experiments, obtained from (7) and (3.5) by taking $\gamma_n$ to be either 1 or 0, are all distinct. Moreover, in general the quadrivariate joint probability distribution (7) for one standard Aspect experiment does not yield the bivariate ones of the other experiments as marginals. Although it is not strictly excluded that a quadrivariate joint probability distribution might exist having the bivariate probabilities of the standard Aspect experiments as marginals (hence, different from the ones referred to above), the mathematical formalism of quantum mechanics does not give any reason to surmise its existence. As far as quantum mechanics is concerned, the standard Aspect experiments need not satisfy Bell's inequality.
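A product POVM of the form (8) always yields a genuine quadrivariate distribution, and hence Bell's inequality. Here is a sketch under explicit assumptions, none of which come from the text:

* the source emits the polarization-entangled state $(|HH\rangle + |VV\rangle)/\sqrt{2}$;
* both mirrors have $\gamma = 1/2$;
* the analyser angles are arbitrary illustrative choices;
* 'yes'/'no' are encoded as +1/−1;
* the Kronecker product represents the operator product of the two arms.

```python
import itertools
import numpy as np

def projector(theta):
    """Projector onto linear polarization at angle theta (radians)."""
    v = np.array([np.cos(theta), np.sin(theta)])
    return np.outer(v, v)

def arm_povm(gamma, theta, theta_prime):
    """Bivariate POVM of one interferometer arm; index 0 = '+', 1 = '-'."""
    identity = np.eye(2)
    e_plus, f_plus = projector(theta), projector(theta_prime)
    return [
        [np.zeros((2, 2)), gamma * e_plus],
        [(1 - gamma) * f_plus,
         gamma * (identity - e_plus) + (1 - gamma) * (identity - f_plus)],
    ]

# Assumed source state (|HH> + |VV>)/sqrt(2) on the 4-dim two-photon space.
psi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho = np.outer(psi, psi)

R1 = arm_povm(0.5, 0.0, np.pi / 4)            # arm 1: A_1, B_1
R2 = arm_povm(0.5, np.pi / 8, 3 * np.pi / 8)  # arm 2: A_2, B_2

sign = (+1, -1)  # index 0 -> result +1 ('yes'), index 1 -> result -1 ('no')
joint = {}
for i, j, k, l in itertools.product(range(2), repeat=4):
    R_ijkl = np.kron(R1[i][j], R2[k][l])      # product form, eq. (8)
    joint[(sign[i], sign[j], sign[k], sign[l])] = float(np.trace(rho @ R_ijkl))

def correlation(f):
    """Expectation of f(a1, b1, a2, b2) under the quadrivariate distribution."""
    return sum(p * f(*q) for q, p in joint.items())

s = (correlation(lambda a1, b1, a2, b2: a1 * a2)
     - correlation(lambda a1, b1, a2, b2: a1 * b2)
     - correlation(lambda a1, b1, a2, b2: b1 * a2)
     - correlation(lambda a1, b1, a2, b2: b1 * b2))
print(abs(s) <= 2)  # True: a genuine joint distribution cannot violate the bound
```

Whatever angles or transmissivities are chosen, `joint` is non-negative and normalized, so the bound holds. This mirrors the statement that the existence of quadruples, not locality, is what matters here.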
4
Bell's inequality in stochastic and deterministic hidden-variables theories
In stochastic hidden-variables theories quantum mechanical probabilities are usually given as
$$p(a_i) = \int_\Lambda d\lambda\, \rho(\lambda)\, p(a_i|\lambda), \qquad (1)$$
in which $\Lambda$ is the space of the hidden variable $\lambda$ (to be compared with classical phase space), $p(a_i|\lambda)$ is the conditional probability of measurement result $A = a_i$ if the value of the hidden variable was $\lambda$, and $\rho(\lambda)$ is the probability (density) of $\lambda$. It should be noticed that expression (4.1) fits perfectly into an empiricist interpretation of the quantum mechanical formalism, in which measurement result $a_i$ refers to a pointer position of a measuring instrument, the object being described by the hidden variable. Since $p(a_i|\lambda)$ may depend on the specific way the measurement is carried out, the stochastic hidden-variables model corresponds to a contextualistic interpretation of quantum mechanical observables. Deterministic hidden-variables theories are just special cases in which $p(a_i|\lambda)$ is either 1 or 0. In the deterministic case it is possible to associate in a unique way (although possibly dependent on the measurement procedure) the value $a_i$ to the phase space point $\lambda$ the object is prepared in. A disadvantage of a deterministic theory is that the physical interaction of object and measuring instrument is left out of consideration, thus suggesting that measurement result $a_i$ is a (possibly contextually determined) property of the object. In order to have maximal generality it is preferable to deal with the stochastic case. For Bell experiments we have
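$$p(a_1, a_2) = \int_\Lambda d\lambda\, \rho(\lambda)\, p(a_1, a_2|\lambda), \qquad (2)$$
together with a condition of conditional statistical independence,
$$p(a_1, a_2|\lambda) = p(a_1|\lambda)\, p(a_2|\lambda), \qquad (3)$$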
expressing that the measurement procedures of $A_1$ and $A_2$ do not influence each other (the so-called locality condition). As is well known, Bell thought the locality condition to be the crucial condition allowing a derivation of his inequality. This does not seem to be correct, however. As a matter of fact, Bell's inequality can be derived if a quadrivariate joint probability distribution exists12,13. In a stochastic hidden-variables theory such a distribution could be represented by
$$p(a_1, b_1, a_2, b_2) = \int_\Lambda d\lambda\, \rho(\lambda)\, p(a_1, b_1, a_2, b_2|\lambda), \qquad (4)$$
without any necessity that the conditional probability be factorizable in order that Bell's inequality be satisfied (although for the generalized experiment discussed in section 3 it would be reasonable to require that $p(a_1, b_1, a_2, b_2|\lambda) = p(a_1, b_1|\lambda)\, p(a_2, b_2|\lambda)$). Analogous to the quantum mechanical case, it is sufficient that a quadruple of measurement results exists for each individual preparation (here parameterized by $\lambda$). If Heisenberg measurement disturbance is a physically realistic effect in the experiments at issue, it should be described by the hidden-variables theory as well. Therefore the explanation of the non-existence of such quadruples is the same as in quantum mechanics. However, with respect to the possibility of deriving Bell's inequality there is an important difference between quantum mechanics and the stochastic hidden-variables theories of the kind discussed here. Whereas quantum mechanics does not yield any indication as regards the existence of a quadrivariate joint probability distribution returning the bivariate probabilities of the Aspect experiments as marginals, local stochastic hidden-variables theory does. Indeed, using the single-observable conditional probabilities assumed to exist in the local theory (compare (3)), it is possible to construct a quadrivariate joint probability distribution according to
$$p(a_1, a_2, b_1, b_2) = \int_\Lambda d\lambda\, \rho(\lambda)\, p(a_1|\lambda)\, p(a_2|\lambda)\, p(b_1|\lambda)\, p(b_2|\lambda), \qquad (5)$$
satisfying all requirements. It should be noted that (4.2) does not describe the results of any joint measurement of the four observables involved. Quadruples $(a_1, a_2, b_1, b_2)$ are obtained here by combining measurement results found in different experiments, assuming the same value of $\lambda$ in all experiments. For this reason the physical meaning of this probability distribution is not clear. However, this does not seem to be important. The existence of (4.2) as a purely mathematical constraint is sufficient to warrant that any stochastic hidden-variables theory in which (2) and (3) are satisfied must require that the standard Aspect experiments obey Bell's inequality. Admittedly, (4.2) might not be a valid mathematical entity, because it is based on multiplication of the probability distributions $p(a|\lambda)$, which might be distributions in the sense of Schwartz's distribution theory. However, the remark made with respect to the existence of probability distributions as infinite-$N$ limits of relative frequencies is valid here as well: the reasoning does not depend on this limit, but is equally applicable to relative frequencies in finite sequences. The question is whether this reasoning is sufficient to conclude that no local hidden-variables theory can reproduce quantum mechanics. Such a conclusion would only be justified if locality were the only assumption in deriving Bell's inequality. If there were any additional assumption in this derivation, then violation of Bell's inequality could possibly be blamed on the invalidity of this additional assumption rather than on locality. Evidently, one such additional assumption is the existence of hidden variables. A belief in the completeness of the quantum mechanical formalism would, indeed, be a sufficient reason to reject this assumption, thus increasing pressure on the locality assumption. Since, however, an empiricist interpretation is hardly reconcilable with such a completeness belief, we have to take hidden-variables theories seriously, and look for the possibility of additional assumptions within such theories. In expression (4.1) one such assumption is evident, viz. the existence of the conditional probability $p(a_i|\lambda)$. The assumption that this quantity is applicable in a quantum mechanical measurement is far less innocuous than it appears at first sight. If quantum mechanical measurements really can be modeled by equality (4.1), this implies that a quantum mechanical measurement result is determined, either in a stochastic or in a deterministic sense, by an instantaneous value $\lambda$ of the hidden variable, prepared independently of the measurement to be performed later. It is questionable whether this is a realistic assumption, in particular if hidden variables have the character of rapidly fluctuating stochastic variables. As a matter of fact, every individual quantum mechanical measurement takes a certain amount of time, and it will in general be virtually impossible to determine the precise instant to be taken as the initial time of the measurement, as well as the precise value of the stochastic variable at that moment. Hence, hidden-variables theories of the kind considered here may be too specific.
Because of the assumption of a non-contextual preparation of the hidden variable, such theories were called quasi-objectivistic stochastic hidden-variables theories in de Muynck and van Stekelenborg17 (dependence of the conditional probabilities $p(a_i|\lambda)$ on the measurement procedure preventing complete objectivity of the theory). In the past, attention has mainly been restricted to quasi-objectivistic hidden-variables theories. It is questionable, however, whether the assumption of quasi-objectivity is a possible one for hidden-variables theories purporting to reproduce quantum mechanical measurement results. The existence of the quadrivariate probability distribution (4.2) only excludes quasi-objectivistic local hidden-variables theories (either stochastic or deterministic) from reproducing quantum mechanics. As will be seen in the next section, it is far more reasonable to blame quasi-objectivity than locality for this, thus leaving open the possibility of local hidden-variables theories that are not quasi-objectivistic.
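Here is a minimal discrete sketch of the quasi-objectivistic construction (5) of section 4. The hidden-variable space is replaced by a hypothetical finite set of points, with random weights $\rho(\lambda)$ and random single-observable conditional probabilities. All numbers are illustrative and do not come from the text. The code checks that the bivariate probabilities of the local model, eqs. (2)-(3), are marginals of the constructed quadrivariate distribution, and that the Bell combination then stays within ±2.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical discrete hidden-variable space with n_lambda points.
n_lambda = 5
rho = rng.random(n_lambda)
rho /= rho.sum()                      # rho(lambda), normalized

# Hypothetical single-observable conditional probabilities p(x = +1 | lambda).
p_plus = {name: rng.random(n_lambda) for name in ("a1", "b1", "a2", "b2")}

def cond(name, value):
    """p(x = value | lambda) as an array over the hidden-variable points."""
    return p_plus[name] if value == +1 else 1.0 - p_plus[name]

# Quadrivariate distribution built as in eq. (5) of section 4.
joint = {
    (a1, b1, a2, b2): float(np.sum(rho * cond("a1", a1) * cond("b1", b1)
                                   * cond("a2", a2) * cond("b2", b2)))
    for a1, b1, a2, b2 in itertools.product((+1, -1), repeat=4)
}

def bivariate(x, y, vx, vy):
    """Local-model p(x = vx, y = vy), eqs. (2)-(3): factorized given lambda."""
    return float(np.sum(rho * cond(x, vx) * cond(y, vy)))

# The bivariate probability of the (A_1, A_2) experiment is a marginal of joint.
for v1, v2 in itertools.product((+1, -1), repeat=2):
    marginal = sum(p for (a1, _, a2, _), p in joint.items() if a1 == v1 and a2 == v2)
    assert np.isclose(marginal, bivariate("a1", "a2", v1, v2))

def corr(x, y):
    """Correlation <xy> of the local model."""
    return sum(vx * vy * bivariate(x, y, vx, vy)
               for vx, vy in itertools.product((+1, -1), repeat=2))

s = corr("a1", "a2") - corr("a1", "b2") - corr("b1", "a2") - corr("b1", "b2")
print(abs(s) <= 2)  # True
```

Because one and the same $\rho(\lambda)$ serves all four experiments, the quadruples are available by construction. This is exactly the quasi-objectivity that the text singles out.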
5
Analogy between thermodynamics and quantum mechanics
The essential feature of expression (4.1) is the possibility to attribute, either in a stochastic or in a deterministic way, measurement result $a_i$ to an instantaneous value of the hidden variable $\lambda$. The question is whether this is a reasonable assumption within the domain of quantum mechanical measurement. Are the conditional probabilities $p(a_i|\lambda)$ experimentally relevant within this domain? In order to give a tentative answer to this question, we shall exploit the analogy between thermodynamics and quantum mechanics, considered a long time ago by many authors (e.g. de Broglie18, Bohm et al.19,20, Nelson21,22).
$$\begin{array}{ccc} \text{Quantum mechanics } (A_1, A_2, B_1, B_2) & \longrightarrow & \text{Hidden-variables theory } (\lambda) \\ \uparrow & & \uparrow \\ \text{Thermodynamics } (p, T, S) & \longrightarrow & \text{Classical statistical mechanics } (\{q_n, p_n\}) \end{array}$$
In this analogy thermodynamics and quantum mechanics are considered as phenomenological theories, to be reduced to more fundamental "microscopic" theories. The reduction of thermodynamics to classical statistical mechanics is thought to be analogous to a possible reduction of quantum mechanics to stochastic hidden-variables theory. Due to certain restrictions imposed on preparations and measurements within the domains of the phenomenological theories, their domains of application are thought to be contained in, but smaller than, the domains of the "microscopic" theories. In order to assess the nature and the importance of such restrictions let us first look at thermodynamics. As is well known (e.g. Hollinger and Zenzen23), thermodynamics is valid only under a condition of molecular chaos, assuring the existence of local equilibriuma necessary for the ergodic hypothesis to be satisfied. Thermodynamics only describes measurements of quantities (like pressure, temperature, and entropy) defined for such equilibrium states. From an operational point of view this implies that measurements within the domain of thermodynamics do not yield information on the object system valid for one particular instant of time, but time-averaged information, time averaging being replaced, under the ergodic hypothesis, by ensemble averaging. In the Gibbs theory this ensemble is represented by the canonical density function $Z^{-1}e^{-H(\{q_n,p_n\})/kT}$ on phase space. This state is called a macrostate, to be distinguished from the microstate $\{q_n, p_n\}$, representing the point in phase space the classical object is in at a certain instant of time. The restricted validity of thermodynamics is manifest in a two-fold way: i) through the restriction of all possible density functions on phase space to the canonical ones; ii) through the restriction of thermodynamical quantities (observables) to functionals on the space of thermodynamic states. Physically this can be interpreted as a restriction of the domain of application of thermodynamics to those measurement procedures probing only properties of the macrostates. This implies that such measurements only yield information that is averaged over times exceeding the relaxation time needed to reach a state of (local) equilibrium. Thus, it is important to note that thermodynamic quantities are quite different from the physical quantities of classical statistical mechanics, the latter being represented by functions of the microstate {
a In "equilibrium thermodynamics" equilibrium is assumed to be even global.
Note that a "definition" of an instantaneous temperature by means of the equality $\frac{3}{2}NkT = \sum_i p_i^2/2m_i$ does not make sense, as can easily be seen by applying this "definition" to an ideal gas in a container freely falling in a gravitational field.
The thermodynamic pressure is defined for the canonical ensemble by $p = kT\,\partial \ln Z/\partial V$.
Figure 4: Incompatible thermodynamic arrangements.
to define the macrostate. In order to illustrate this, consider two identical cubic containers differing only in their orientations (cf. figure 4). In principle, the same microstate may be prepared in the two containers. Because of the different orientations, however, the macrostates evolving from this microstate during the time the gas is reaching equilibrium with the container are different (for different orientations of the container we have $H_1 \neq H_2$, and, hence, $e^{-H_1/kT}/Z_1 \neq e^{-H_2/kT}/Z_2$, since $H = T + V$, and $V_1 \neq V_2$ because the potential energy is infinite outside a container). This implies that thermodynamic macrostates may be different even though starting from the same microstate. Macrostates in thermodynamics have a contextual meaning. It is important to note that, since the container is part of the preparing apparatus, this contextuality is connected here to preparation rather than to measurement. Consequently, whereas classical quantities $f(\{q_n, p_n\})$ can be interpreted as objective properties, thermodynamic quantities are non-objective, the non-objectivity being of a contextual nature. Let us now suppose that quantum mechanics is related to hidden-variables theory analogously to the way thermodynamics is related to classical mechanics, the analogy possibly being even closer for non-equilibrium thermodynamics (only local equilibrium being assumed) than for the thermodynamics of global equilibrium processes. Support for this idea was found in de Muynck and van Stekelenborg17, where it was demonstrated that in the Husimi representation of quantum mechanics by means of non-negative probability distribution functions on phase space, an analogous restriction to a "canonical" set of distributions obtains as in thermodynamics. In particular, it was demonstrated that the dispersion-free states $\rho(q,p) = \delta(q - q_0)\,\delta(p - p_0)$ are not "canonical" in this sense.
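The contextual character of the canonical macrostate can be made concrete with a deliberately simplified model. It uses a single particle in a two-dimensional square container, standing in for the cubic containers of figure 4, with $V = 0$ inside and $V$ infinite outside. The helper names are hypothetical. The same microstate at the centre is admissible in both orientations. Yet the canonical densities $e^{-H/kT}/Z$ differ as functions on phase space, because a point near a corner of one container lies outside the rotated one.

```python
import math

def inside_square(x, y, phi, side=1.0):
    """True if (x, y) lies in a square of the given side, centred at the
    origin and rotated by angle phi (radians)."""
    # Rotate the point by -phi into the square's own frame.
    xr = x * math.cos(phi) + y * math.sin(phi)
    yr = -x * math.sin(phi) + y * math.cos(phi)
    return abs(xr) <= side / 2 and abs(yr) <= side / 2

def canonical_density(x, y, px, py, phi, m=1.0, kT=1.0, side=1.0):
    """Single-particle canonical density exp(-H/kT)/Z with H = T + V, where
    V = 0 inside the container and V = infinity outside (density 0 there)."""
    if not inside_square(x, y, phi, side):
        return 0.0
    kinetic = (px ** 2 + py ** 2) / (2 * m)
    z = side ** 2 * 2 * math.pi * m * kT   # area times 2D momentum integral
    return math.exp(-kinetic / kT) / z

# The same microstate can be prepared in both containers ...
micro = (0.0, 0.0, 0.3, -0.2)
print(canonical_density(*micro, phi=0.0) > 0, canonical_density(*micro, phi=math.pi / 4) > 0)
# ... but the two canonical macrostates are different functions on phase space:
corner = (0.45, 0.45, 0.0, 0.0)
print(canonical_density(*corner, phi=0.0) > 0)           # True: inside container 1
print(canonical_density(*corner, phi=math.pi / 4) == 0)  # True: outside container 2
```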
This implies that within the domain of quantum mechanics it does not make sense to consider the preparation of the object in a "microstate" with a well-defined value of the hidden variables $(q, p)$. In the analogy, quantum mechanical observables like $A_1, A_2, B_1, B_2$ should be compared to thermodynamic quantities like pressure, temperature, and entropy. The central issue in the analogy is the fact that thermodynamic quantities like pressure and temperature cannot be conditioned on the instantaneous phase space variable $\{q_n, p_n\}$ (microstate). Expressions like $p(\{q_n, p_n\})$ and $T(\{q_n, p_n\})$ are meaningless within thermodynamics. Thermodynamic quantities are conditioned on macrostates, corresponding to ergodic paths in phase space. Analogously, a quantum mechanical observable might not correspond to an instantaneous property of the object, but might have to be associated with an (ergodic) path in hidden-variables space $\Lambda$ (macrostate) rather than with an instantaneous value $\lambda$ (microstate). On the basis of the analogy between thermodynamics and quantum mechanics it is possible to state the following conjectures:
• Quantum mechanical measurements (analogous to thermodynamic measurements) do not probe microstates but macrostates.
• Quantum mechanical quantities (analogous to thermodynamic quantities) should be conditioned on macrostates.
A hidden-variables macrostate will be symbolically indicated by $\lambda^t$. For quantum mechanical measurements the conditional probabilities $p(a_i|\lambda)$ of (4.1) should then be replaced by $p(a_i|\lambda^t)$. Concomitantly, quantum mechanical probabilities should be represented in the hidden-variables theory by a functional integral,
$$p(a_i) = \int d\lambda^t\, \rho(\lambda^t)\, p(a_i|\lambda^t),$$
(1)
in which the integration is over all possible macrostates consistent with the preparation procedure. By itself, conditioning of quantum mechanical observables on macrostates rather than microstates is not sufficient to prevent derivation of Bell's inequality. As a matter of fact, on the basis of expression (4.3) a quadrivariate joint probability distribution can be defined, analogous to (4.2), according to
$$p(a_1, a_2, b_1, b_2) = \int d\lambda^t\, \rho(\lambda^t)\, p(a_1|\lambda^t)\, p(a_2|\lambda^t)\, p(b_1|\lambda^t)\, p(b_2|\lambda^t),$$
(2)
from which Bell's inequality can be derived just as well. There is, however, one important aspect that up till now has not been sufficiently taken into account, viz. contextuality. In the construction of (4.4) it is assumed that the macrostate $\lambda^t$ is applicable in each of the measurement arrangements of observables $A_1, A_2, B_1$, and $B_2$. Because of the incompatibility of some of these observables this is an implausible assumption. On the basis of the thermodynamic analogy it is to be expected that macrostates $\lambda^t$ will depend on the measurement context of a specific observable. Since $[A_1, B_1]_- \neq O$, we will have
$$\lambda^t_{A_1} \neq \lambda^t_{B_1}, \qquad (3)$$
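and analogously for $A_2$ and $B_2$. Then, for the Bell experiments measuring the pairs $(A_1, A_2)$ and $(A_1, B_2)$, respectively, we have
$$p(a_1, a_2) = \int d\lambda^t_{A_1A_2}\, \rho(\lambda^t_{A_1A_2})\, p(a_1|\lambda^t_{A_1A_2})\, p(a_2|\lambda^t_{A_1A_2}), \qquad (4)$$
$$p(a_1, b_2) = \int d\lambda^t_{A_1B_2}\, \rho(\lambda^t_{A_1B_2})\, p(a_1|\lambda^t_{A_1B_2})\, p(b_2|\lambda^t_{A_1B_2}). \qquad (5)$$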
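The role of context-dependent macrostates can be illustrated with a toy model. It is not meant to reproduce quantum mechanics; it even exceeds the quantum maximum $2\sqrt{2}$. Each of the four pair-contexts gets its own macrostate distribution, and within each context the two outcomes factorize given $\lambda^t$, in the spirit of (4)-(5). The context-to-sign table `contexts` is a hypothetical choice. Every single-observable marginal is $(1/2, 1/2)$ in every context, yet the Bell combination reaches 4. Any distribution over ±1 quadruples bounds it by 2, so no quadrivariate distribution with these marginals exists.

```python
import itertools

# Hypothetical toy: in each context the macrostate lambda_t is +1 or -1 with
# probability 1/2, and the local responses are x = lambda_t, y = c * lambda_t,
# with c = +/-1 chosen per context (context-dependent macrostate distribution).
contexts = {("a1", "a2"): +1, ("a1", "b2"): -1, ("b1", "a2"): -1, ("b1", "b2"): -1}

def bivariate(context):
    """p(x, y) for one context: an average over lambda_t of factorized
    (here deterministic) responses."""
    c = contexts[context]
    dist = {(vx, vy): 0.0 for vx, vy in itertools.product((+1, -1), repeat=2)}
    for lam in (+1, -1):
        dist[(lam, c * lam)] += 0.5
    return dist

def corr(context):
    """Correlation <xy> within one context."""
    return sum(vx * vy * p for (vx, vy), p in bivariate(context).items())

s = (corr(("a1", "a2")) - corr(("a1", "b2"))
     - corr(("b1", "a2")) - corr(("b1", "b2")))
print(s)  # 4.0
```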
Now, the contextuality expressed by inequality (4.5) prevents the construction of a quadrivariate joint probability distribution analogous to (4.4). Hence, as in the quantum mechanical approach, in the local non-objectivistic hidden-variables theory too a derivation of Bell's inequality is prevented by the local contextuality involved in the interaction of each particle with the measuring instrument it directly interacts with.
6
Conclusions
Our conclusion is that if quantum mechanical measurements probe macrostates $\lambda^t$ rather than microstates $\lambda$, then Bell's inequality cannot be derived for quantum mechanical measurements. In both quantum mechanics and hidden-variables theories, Bell's inequality is a consequence of the assumption that the theory yields an objective description of reality, in the sense that the preparation of the microscopic object, as far as relevant to the realization of the measurement result, can be thought to be independent of the measurement arrangement. The important point to be noticed is the following. In Bell experiments the preparation of the particle pair at the source (i.e. the microstate) can be considered to be independent of the measurement procedures to be carried out later, so that one and the same microstate can be assumed in different Bell experiments. Nevertheless, the measurement result is only determined by the macrostate, which is co-determined by the interaction with the measuring instruments. It really seems that the Copenhagen maxim of the impossibility of attributing quantum mechanical measurement results to the object as objective properties, possessed independently of the measurement, should be taken very seriously, and implemented also in hidden-variables theories purporting to reproduce the quantum mechanical results. The quantum mechanical die is only cast after the object has interacted with the measuring instrument, even though its result can be deterministically determined by the (sub-quantum mechanical) microstate.
The thermodynamic analogy suggests which experiments could be done in order to transcend the boundaries of the domain of application of quantum mechanics. If it were possible to perform experiments that probe the microstate $\lambda$ rather than the macrostate $\lambda^t$, then we would be in the domain of (quasi-)objectivistic hidden-variables theories. Because of (4.2) it is then to be expected that Bell's inequality should be satisfied for such experiments. In such experiments preparation and measurement must be completed well within the relaxation time of the microstates. Such times have been estimated by Bohm24 "for the sake of illustration" as the time light needs to cover a distance of the order of the size of an atom ($10^{-18}$ s, say). If this is correct, then all present-day experimentation is well within the range of quantum mechanics, thus explaining the seemingly universal applicability of this latter theory. In hindsight, this would explain why Aspect's switching experiment6 corroborates quantum mechanics: the applied switching frequency (50 MHz), although sufficient to warrant locality, was far too low to beat the local relaxation processes in each of the measuring instruments separately.
It has often been felt that the most surprising feature of Bell experiments is the possibility (in certain states) of a strict correlation between the measurement results of the two measured observables, without being able to attribute this to a previous preparation of the object (no 'elements of physical reality'). For many physicists the existence of such strict correlations has been reason enough to doubt Bohr's Copenhagen solution, which renounces causal explanation of measurement results and replaces 'determinism' by 'complementarity'.
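The time scales in the relaxation-time argument of these conclusions can be compared with simple arithmetic. This is an order-of-magnitude sketch only. The atomic size $10^{-10}$ m is an assumed value, and the 50 MHz figure is the switching frequency quoted in the text for Aspect's experiment.

```python
# Order-of-magnitude comparison of the time scales mentioned in the text.
# Assumption: "the size of an atom" is taken as 1e-10 m.
SPEED_OF_LIGHT = 2.998e8      # m/s
ATOM_SIZE = 1e-10             # m (assumed order of magnitude)
SWITCH_FREQUENCY = 50e6       # Hz, switching frequency quoted in the text

relaxation_time = ATOM_SIZE / SPEED_OF_LIGHT   # about 3e-19 s
switch_period = 1.0 / SWITCH_FREQUENCY         # 2e-8 s
ratio = switch_period / relaxation_time        # about 6e10

print(f"relaxation time ~ {relaxation_time:.1e} s")
print(f"switching period ~ {switch_period:.1e} s")
print(f"ratio ~ {ratio:.1e}")
```

The estimate lands within a factor of a few of the "$10^{-18}$ s, say" in the text. Either way, the switching period exceeds it by roughly ten orders of magnitude.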
It seems that the urge for causal reasoning has been so strong that even within the Copenhagen interpretation a certain causality has been accepted, even a non-local one, in an EPR experiment (cf. figure 1), the measurement of particle 1 determining a measurement result for particle 2. This, however, should rather be seen as an internal inconsistency of this interpretation, caused by a tendency to make the Copenhagen interpretation as realist as possible. In a consistent application of the Copenhagen interpretation to Bell experiments, such experiments could be interpreted as measurements of bivariate correlation observables. The certainty of obtaining a certain (bivariate) eigenvalue of such an observable would not be more surprising than the certainty of obtaining a certain eigenvalue of a univariate one if the state vector is the corresponding eigenvector. It is important to note that this latter interpretation of Bell experiments takes seriously the Copenhagen idea that quantum mechanics need not explain the specific measurement result found in an individual measurement. Indeed, in order to compare theory and experiment it would be sufficient that quantum mechanics just describe the relative frequencies found in such measurements. In this view quantum mechanics is just a phenomenological theory, describing (not explaining) observations in the same way as thermodynamics does in its own domain of application. Explanations should be provided by "more fundamental" theories, describing the mechanisms behind the observable phenomena. Hence, the Copenhagen 'completeness' thesis should be rejected (although this need not imply a return to determinism). This approach has important consequences. One consequence is that the non-existence, within quantum mechanics, of 'elements of physical reality' does not imply that 'elements of physical reality' do not exist at all. They could be elements of the "more fundamental" theories. In section 5 it was discussed how an analogy between quantum mechanics and thermodynamics could be exploited to spell this out. 'Elements of physical reality' could correspond to hidden-variables microstates $\lambda$. The determinism necessary to explain the strict correlations referred to above would be explained if, within a given measurement context, a microstate defined a unique macrostate $\lambda^t$. This demonstrates how quantum mechanical measurement results could fail to be attributable to the object as properties possessed prior to measurement, while there is nevertheless sufficient determinism to yield a local explanation of strict correlations of quantum mechanical measurement results in certain Bell experiments. Another important aspect of a dissociation of phenomenological and fundamental aspects of measurement is the possibility of an empiricist interpretation of quantum mechanics.
As demonstrated by the generalized Aspect experiment discussed in section 3, an empiricist approach needs a generalization of the mathematical formalism of quantum mechanics, in which an observable is represented by a POVM rather than by a projection-valued measure corresponding to a self-adjoint operator of the standard formalism. Such a generalization has been very important in assessing the meaning of Bell's inequality. In the major part of the literature of the past this subject has been dealt with on the basis of the (restricted) standard formalism. However, some conclusions drawn from the restricted formalism are not cogent when viewed in the generalized one (for instance, because von Neumann's projection postulate is not applicable in general). For this reason we must be very careful when accepting conclusions drawn from the standard formalism. This, in particular, holds true for the issue of non-locality.
References
1. W. Heisenberg, Zeitschr. f. Phys. 33, 879 (1925).
2. E. Schrödinger, Naturwissenschaften 23, 807, 823, 844 (1935) (English translation in Quantum Theory and Measurement, eds. J.A. Wheeler and W.H. Zurek (Princeton Univ. Press, 1983), p. 152).
3. W.M. de Muynck, Synthese 102, 293 (1995).
4. A. Einstein, B. Podolsky, and N. Rosen, Phys. Rev. 47, 777 (1935).
5. A. Aspect, P. Grangier, and G. Roger, Phys. Rev. Lett. 47, 460 (1981).
6. A. Aspect, J. Dalibard, and G. Roger, Phys. Rev. Lett. 49, 1804 (1982).
7. K.R. Popper, Quantum theory and the schism in physics (Rowman and Littlefield, Totowa, 1982).
8. M. Jammer, The philosophy of quantum mechanics (Wiley, New York, 1974).
9. N. Bohr, Phys. Rev. 48, 696 (1935).
10. J.S. Bell, Physics 1, 195 (1964).
11. H.P. Stapp, Phys. Rev. D 3, 1303 (1971); Il Nuovo Cim. 29B, 270 (1975).
12. A. Fine, Journ. Math. Phys. 23, 1306 (1982); Phys. Rev. Lett. 48, 291 (1982).
13. P. Rastall, Found. of Phys. 13, 555 (1983).
14. W.M. de Muynck, Phys. Lett. A 114, 65 (1986).
15. W.M. de Muynck, W. De Baere, and H. Martens, Found. of Phys. 24, 1589 (1994).
16. W.M. de Muynck, Found. of Phys. 30, 205 (2000).
17. W.M. de Muynck and J.T. van Stekelenborg, Ann. der Phys., 7. Folge, 45, 222 (1988).
18. L. de Broglie, La thermodynamique de la particule isolée (Gauthier-Villars, 1964); L. de Broglie, Diverses questions de mécanique et de thermodynamique classiques et relativistes (Springer-Verlag, 1995).
19. D. Bohm, Phys. Rev. 89, 458 (1953).
20. D. Bohm and J.-P. Vigier, Phys. Rev. 96, 208 (1954).
21. E. Nelson, Dynamical theories of Brownian motion (Princeton University Press, 1967).
22. E. Nelson, Quantum fluctuations (Princeton University Press, 1985).
23. H.B. Hollinger and M.J. Zenzen, The Nature of Irreversibility (D. Reidel Publishing Company, Dordrecht, 1985, sect. 4.4).
24. D. Bohm, Phys. Rev. 85, 166, 180 (1952).
DISCRETE HESSIANS IN STUDY OF QUANTUM STATISTICAL SYSTEMS: COMPLEX GINIBRE ENSEMBLE
M. M. DURAS
Institute of Physics, Cracow University of Technology, ulica Podchorazych 1, PL-30084 Cracow, Poland E-mail: [email protected]
The Ginibre ensemble of nonhermitean random Hamiltonian matrices $K$ is considered. Each quantum system described by $K$ is a dissipative system, and the eigenenergies $Z_i$ of the Hamiltonian are complex-valued random variables. The second difference of complex eigenenergies is viewed as a discrete analog of the Hessian with respect to the labelling index. The results are considered in view of Wigner and Dyson's electrostatic analogy. The space of dynamics of random magnitudes is extended by introducing the discrete space of labelling indices.
1 Introduction
Random Matrix Theory (RMT) studies quantum Hamiltonian operators $H$ which are random matrix variables. Their matrix elements $H_{ij}$ are independent random scalar variables.$^{1-8}$ Among others, the following Gaussian random matrix ensembles (GRME) have been studied: orthogonal GOE, unitary GUE, symplectic GSE, as well as the circular ensembles: orthogonal COE, unitary CUE, and symplectic CSE. The choice of ensemble is based on the quantum symmetries ascribed to the Hamiltonian $H$. The Hamiltonian $H$ acts on the quantum space $V$ of eigenfunctions. It is assumed that $V$ is an $N$-dimensional Hilbert space $V = F^N$, where the real, complex, or quaternion field $F = \mathbb{R}, \mathbb{C}, \mathbb{H}$ corresponds to GOE, GUE, or GSE, respectively. If the Hamiltonian matrix $H$ is hermitean, $H = H^\dagger$, then the probability density function of $H$ reads:

$$f_H(H) = C_{H\beta} \exp\left[-\beta \cdot \tfrac{1}{2} \cdot \mathrm{Tr}(H^2)\right], \qquad C_{H\beta} = \left(\frac{\beta}{2\pi}\right)^{N_{H\beta}/2}, \qquad N_{H\beta} = N + \tfrac{1}{2} N(N-1)\beta,$$
$$\int f_H(H)\, dH = 1, \qquad dH = \prod_{i=1}^{N} \prod_{j \ge i}^{N} \prod_{\gamma=0}^{D-1} dH_{ij}^{(\gamma)}, \qquad H_{ij} = \left(H_{ij}^{(0)}, \ldots, H_{ij}^{(D-1)}\right) \in F, \qquad (1)$$

where the parameter $\beta$ assumes the values $\beta = 1, 2, 4$ for GOE($N$), GUE($N$), GSE($N$), respectively, $D$ is the dimension of $F$ over $\mathbb{R}$ ($D = 1, 2, 4$), and $N_{H\beta}$ is the number of independent matrix elements of the hermitean Hamiltonian $H$. The Hamiltonian $H$ belongs to the space of hermitean $N \times N$ matrices, and the matrix Haar measure $dH$ is invariant under transformations from the unitary group $U(N, F)$. The eigenenergies $E_i$, $i = 1, \ldots, N$, of $H$ are real-valued random variables, $E_i = E_i^*$. Eugene Wigner was the first to study the phenomenon of eigenenergy level repulsion, in nuclear spectra.$^{1,2,3}$ RMT is now applied in many branches of physics: nuclear physics (slow neutron resonances, highly excited complex nuclei), condensed phase physics (fine metallic particles, random Ising model [spin glasses]), quantum chaos (quantum billiards, quantum dots), disordered mesoscopic systems (transport phenomena), quantum chromodynamics, quantum gravity, and field theory.
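2 The Ginibre ensembles

Jean Ginibre considered another example of GRME, dropping the assumption of hermiticity of the Hamiltonians and thus defining a generic $F$-valued Hamiltonian $K$.$^{1,2,9,10}$ Hence, $K$ belongs to the general linear Lie group $GL(N, F)$, and the matrix Haar measure $dK$ is invariant under transformations from that group. The distribution of $K$ is given by:

$$f_K(K) = C_{K\beta} \exp\left[-\beta \cdot \tfrac{1}{2} \cdot \mathrm{Tr}(K^\dagger K)\right], \qquad C_{K\beta} = \left(\frac{\beta}{2\pi}\right)^{N_{K\beta}/2}, \qquad N_{K\beta} = N^2 \beta,$$
$$\int f_K(K)\, dK = 1, \qquad dK = \prod_{i=1}^{N} \prod_{j=1}^{N} \prod_{\gamma=0}^{D-1} dK_{ij}^{(\gamma)}, \qquad (2)$$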
where $\beta = 1, 2, 4$ stands for the real, complex, and quaternion Ginibre ensembles, respectively. The eigenenergies $Z_i$ of a quantum system ascribed to a Ginibre ensemble are therefore complex-valued random variables: the eigenenergies $Z_i$, $i = 1, \ldots, N$, of the nonhermitean Hamiltonian $K$ are not real, $Z_i \neq Z_i^*$. For $N \times N$ Hamiltonian matrices $K$ with $\beta = 2$, Jean Ginibre derived the following joint probability density function of the random vector of complex eigenvalues $Z_1, \ldots, Z_N$:$^{1,2,9,10}$

$$P(z_1, \ldots, z_N) = \prod_{j=1}^{N} \frac{1}{\pi \cdot j!} \cdot \prod_{i < j}^{N} |z_i - z_j|^2 \cdot \exp\left(-\sum_{j=1}^{N} |z_j|^2\right), \qquad (3)$$
where the $z_i \in \mathbb{C}$ are complex-valued sample points. We emphasize here Wigner and Dyson's electrostatic analogy. Consider a Coulomb gas of $N$ unit charges moving on the complex plane (Gauss's plane) $\mathbb{C}$. The position vectors of the charges are $z_i$, and the potential energy of the system is:

$$U(z_1, \ldots, z_N) = -\sum_{i < j} \ln|z_i - z_j| + \frac{1}{2} \sum_i |z_i|^2. \qquad (4)$$

If the gas is in thermodynamical equilibrium at temperature $T = \frac{1}{2k_B}$ ($\beta = \frac{1}{k_B T} = 2$, where $k_B$ is Boltzmann's constant), then the probability density function of the position vectors is $P(z_1, \ldots, z_N)$ of Eq. (3), because the Boltzmann weight $e^{-2U} = \prod_{i < j}|z_i - z_j|^2 \exp(-\sum_j |z_j|^2)$ reproduces its $z$-dependence. Therefore, the complex eigenenergies $Z_i$ of the quantum system are analogous to the position vectors of the charges of the Coulomb gas. Moreover, the complex-valued spacings $\Delta^1 Z_i$ of the complex eigenenergies of the quantum system,

$$\Delta^1 Z_i = Z_{i+1} - Z_i, \quad i = 1, \ldots, (N-1), \qquad (5)$$

are analogous to the vectors of relative positions of the electric charges. Finally, the complex-valued second differences $\Delta^2 Z_i$ of the complex eigenenergies,

$$\Delta^2 Z_i = Z_{i+2} - 2Z_{i+1} + Z_i, \quad i = 1, \ldots, (N-2), \qquad (6)$$

are analogous to the vectors of relative positions of the vectors of relative positions of the electric charges. The eigenenergies $Z_i = Z(i)$ can be treated as values of a function $Z$ of the discrete parameter $i = 1, \ldots, N$. The "Jacobian" of $Z_i$ reads:

$$\mathrm{Jac}\, Z_i = \frac{\partial Z_i}{\partial i} \simeq \frac{\Delta^1 Z_i}{\Delta^1 i} = \Delta^1 Z_i. \qquad (7)$$

We readily see that the spacing is a discrete analog of the Jacobian, since the indexing parameter $i$ belongs to the discrete space of indices $i \in I = \{1, \ldots, N\}$, so that $\Delta^1 i = 1$. Therefore, the first derivative with respect to $i$ reduces to the first difference quotient. The Hessian is a Jacobian applied to a Jacobian. We immediately obtain the formula for the discrete "Hessian" of the eigenenergies $Z_i$:

$$\mathrm{Hess}\, Z_i = \frac{\partial^2 Z_i}{\partial i^2} \simeq \frac{\Delta^2 Z_i}{(\Delta^1 i)^2} = \Delta^2 Z_i. \qquad (8)$$
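A minimal numerical sketch of eqs. (3) to (8), using only the Python standard library, may help make the analogy concrete. The helper names (`coulomb_energy`, `ginibre_log_density`, `first_difference`, `second_difference`) are illustrative and not taken from the paper. The sample configurations are arbitrary points, not eigenvalues of sampled Ginibre matrices. The sketch checks two things: that $\log P + 2U$ is the same constant for every configuration, which is the content of the electrostatic analogy at $\beta = 2$, and that the discrete "Hessian" is the discrete "Jacobian" applied twice.

```python
import math

def coulomb_energy(z):
    """Potential energy (4): U = -sum_{i < j} ln|z_i - z_j| + (1/2) sum_i |z_i|^2."""
    u = 0.5 * sum(abs(w) ** 2 for w in z)
    for i in range(len(z)):
        for j in range(i + 1, len(z)):
            u -= math.log(abs(z[i] - z[j]))
    return u

def ginibre_log_density(z):
    """log P(z_1, ..., z_N) from eq. (3) (complex Ginibre ensemble, beta = 2)."""
    n = len(z)
    # normalisation prod_{j=1}^{N} 1/(pi j!), with lgamma(j + 1) = log(j!)
    log_norm = -sum(math.log(math.pi) + math.lgamma(j + 1) for j in range(1, n + 1))
    log_vandermonde = sum(2.0 * math.log(abs(z[i] - z[j]))
                          for i in range(n) for j in range(i + 1, n))
    return log_norm + log_vandermonde - sum(abs(w) ** 2 for w in z)

def first_difference(z):
    """Delta^1 z_i = z_{i+1} - z_i: the discrete 'Jacobian' of eqs. (5), (7)."""
    return [z[i + 1] - z[i] for i in range(len(z) - 1)]

def second_difference(z):
    """Delta^2 z_i = z_{i+2} - 2 z_{i+1} + z_i: the discrete 'Hessian' of eqs. (6), (8)."""
    return [z[i + 2] - 2 * z[i + 1] + z[i] for i in range(len(z) - 2)]

# At beta = 1/(k_B T) = 2 the Boltzmann weight exp(-2U) reproduces P up to a
# z-independent constant, so log P + 2U is the same for any configuration.
for config in ([0.3 + 0.1j, -1.2j, 0.8 - 0.5j], [1.0, 1j, -1.0]):
    print(ginibre_log_density(config) + 2 * coulomb_energy(config))

# The 'Hessian' is the 'Jacobian' applied twice (equal up to rounding).
zs = [0.3 + 0.1j, -1.2j, 0.8 - 0.5j, 2.0 + 1.0j]
print(all(abs(a - b) < 1e-12 for a, b in
          zip(second_difference(zs), first_difference(first_difference(zs)))))
```

The text does not prescribe an ordering for complex eigenvalues. Any ordering of the $Z_i$ used with these difference functions (for example by modulus or by real part) is therefore an extra assumption.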
Thus, the second difference of $Z$ is a discrete analog of the Hessian of $Z$. We emphasize that both the "Jacobian" and the "Hessian" act on the discrete space $I$ of indices $i$. The finite differences of order higher than two are discrete analogs of compositions of "Jacobians" with "Hessians" of $Z$. The eigenenergies $E_i$, $i \in I$, of the hermitean Hamiltonian $H$ are increasingly ordered real-valued random variables. They are values of the discrete function $E_i = E(i)$. The first differences of adjacent eigenenergies,

$$\Delta^1 E_i = E_{i+1} - E_i, \quad i = 1, \ldots, (N-1), \qquad (9)$$

are analogous to the vectors of relative positions of the electric charges of a one-dimensional Coulomb gas; $\Delta^1 E_i$ is simply the spacing of two adjacent energies. The real-valued second differences $\Delta^2 E_i$ of the eigenenergies,

$$\Delta^2 E_i = E_{i+2} - 2E_{i+1} + E_i, \quad i = 1, \ldots, (N-2), \qquad (10)$$
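are analogous to the vectors of relative positions of the vectors of relative positions of the charges of a one-dimensional Coulomb gas. The $\Delta^2 Z_i$ have real parts $\mathrm{Re}\,\Delta^2 Z_i$ and imaginary parts $\mathrm{Im}\,\Delta^2 Z_i$, as well as radii (moduli) $|\Delta^2 Z_i|$ and main arguments (angles) $\mathrm{Arg}\,\Delta^2 Z_i$. The $\Delta^2 Z_i$ are extensions of the real-valued second differences

$$\Delta^2 E_i = E_{i+2} - 2E_{i+1} + E_i, \quad i = 1, \ldots, (N-2), \qquad (11)$$

of the adjacent, increasingly ordered real-valued eigenenergies $E_i$ of the Hamiltonian $H$ defined for GOE, GUE, GSE, and the Poisson ensemble PE (where the Poisson ensemble is composed of uncorrelated randomly distributed eigenenergies).$^{11-15}$ The Jacobian and Hessian operators of the energy function $E(i) = E_i$ for these ensembles read:

$$\mathrm{Jac}\, E_i = \frac{\partial E_i}{\partial i} \simeq \frac{\Delta^1 E_i}{\Delta^1 i} = \Delta^1 E_i, \qquad (12)$$

and

$$\mathrm{Hess}\, E_i = \frac{\partial^2 E_i}{\partial i^2} \simeq \frac{\Delta^2 E_i}{(\Delta^1 i)^2} = \Delta^2 E_i. \qquad (13)$$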
The treatment of the first and second differences of eigenenergies as discrete analogs of Jacobians and Hessians allows one to consider these eigenenergies as magnitudes whose statistical properties are studied in the discrete space of indices. The labelling index $i$ of the eigenenergies is an additional variable of "motion", hence the space of indices $I$ augments the space of dynamics of random magnitudes.

Acknowledgements
It is my pleasure to most deeply thank Professor Antoni Ostoja-Gajewski for continuous help. I also thank Professor Wlodzimierz Wojcik for giving me access to computer facilities.

References
1. F. Haake, Quantum Signatures of Chaos (Springer-Verlag, Berlin Heidelberg New York, 1990), Chapters 1, 3, 4, 8, pp. 1-11, 33-77, 202-213.
2. T. Guhr, A. Müller-Groeling and H. A. Weidenmüller, Phys. Rept. 299, 189-425 (1998).
3. M. L. Mehta, Random matrices (Academic Press, Boston, 1990), Chapters 1, 2, 9, pp. 1-54, 182-193.
4. L. E. Reichl, The Transition to Chaos In Conservative Classical Systems: Quantum Manifestations (Springer-Verlag, New York, 1992), Chapter 6, p. 248.
5. O. Bohigas, in Proceedings of the Les Houches Summer School on Chaos and Quantum Physics (North-Holland, Amsterdam, 1991), p. 89.
6. C.E. Porter, Statistical Theories of Spectra: Fluctuations (Academic Press, New York, 1965).
7. T. A. Brody, J. Flores, J. B. French, P. A. Mello, A. Pandey and S. S. M. Wong, Rev. Mod. Phys. 53, 385 (1981).
8. C. W. J. Beenakker, Rev. Mod. Phys. 69, 731 (1997).
9. J. Ginibre, J. Math. Phys. 6, 440 (1965).
10. M. L. Mehta, Random matrices (Academic Press, Boston, 1990), Chapter 15, pp. 294-310.
11. M. M. Duras and K. Sokalski, Phys. Rev. E 54, 3142 (1996).
12. M. M. Duras, Finite difference and finite element distributions in statistical theory of energy levels in quantum systems (PhD thesis, Jagellonian University, Cracow, 1996).
13. M. M. Duras and K. Sokalski, Physica D 125, 260 (1999).
14. M. M. Duras, Description of Quantum Systems by Random Matrix Ensembles of Large Dimensions, in Proceedings of the Sixth International Conference on Squeezed States and Uncertainty Relations, 24 May-29 May 1999, Naples, Italy (NASA, Greenbelt, Maryland, in press 2000).
15. M. M. Duras, J. Opt. B: Quantum Semiclass. Opt. 2, 287 (2000).
SOME REMARKS ON HARDY FUNCTIONS ASSOCIATED WITH DIRICHLET SERIES
W. EHM
Institut für Grenzgebiete der Psychologie und Psychohygiene, Wilhelmstrasse 3a, 79098 Freiburg, Germany E-mail: [email protected]
A simple method of associating a Hardy function with a Dirichlet series is described and applied to some examples connected with the Riemann zeta function. The theory of Hardy functions is then used to derive "integral tests" of the Riemann hypothesis, generalizing a recent result of Balazard, Saias and Yor.$^1$
1 Introduction
The most famous example of a Dirichlet series $f(z) = \sum_{n=1}^{\infty} a_n n^{-z}$ converging absolutely in the half plane $\mathrm{Re}\, z > 1$ is the Riemann zeta function $\zeta(z)$, which has all coefficients $a_n = 1$. It has a simple pole at $z = 1$ and can be extended as a meromorphic function with no other singularities to the whole complex plane.$^6$ A simple method of associating a Hardy function with a Dirichlet series of that kind consists in multiplying $f(z)$ by $(z-1)/z^2$: the factor $(z-1)/z$ removes the pole at $z = 1$, and the additional division by $z$ achieves square integrability along vertical lines. Moreover, the zeros of $f(z)$ remain unchanged by this modification. The motivation for passing from $f(z)$ to $f(z)(z-1)/z^2$ is to utilize the theory of Hardy functions, especially the factorization of Hardy functions, for the study of the zeta function. In section 2 of this note we give conditions under which the function $f(z)(z-1)/z^2$ has an analytic continuation, as a Hardy function, beyond the abscissa of convergence of the Dirichlet series $f(z)$. The criterion is tested on three examples, all related to the Riemann zeta function. Factorization of the Hardy function $\zeta(z)(z-1)/z^2$, which is briefly discussed in section 3, is used in section 4 to derive some "integral tests" of the Riemann hypothesis. The content of the Riemann hypothesis, hereafter abbreviated "RH", is Riemann's as yet unproven conjecture that all non-real zeros of the $\zeta$ function lie on the line $\mathrm{Re}\, z = 1/2$ in the complex plane. It has received increasing interest among physicists since the discovery of striking similarities between the distribution of the zeros of the zeta function and the spectrum of large random matrices.$^2$ The idea to utilize Hardy functions in connection with the zeta function, including "integral tests" of the Riemann hypothesis, is not new. See the recent article of Balazard, Saias and Yor,$^1$ who initially work with Hardy functions in the disc, then pass to the half plane $\mathrm{Re}\, z > 1/2$ by conformal mapping. In our approach, based on the function $\zeta(z)(z-1)/z^2$, which also appears in recent work of Burnol,$^4$ we deal with half-plane Hardy functions from the beginning. This leads to somewhat more general results in a natural fashion.

2 "Hardyfication" of Dirichlet series
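The basic result of this section is the following.

Theorem. Given a Dirichlet series $f(z) = \sum_{n=1}^{\infty} a_n n^{-z}$ with a finite abscissa of convergence, let the functions $A$ and $\phi$ be defined by

$$A(x) = \sum_{1 \le n \le x} a_n, \qquad \phi(x) = \sum_{1 \le n \le e^x} a_n (1 - x + \log n) \qquad (x \in \mathbb{R}). \qquad (1)$$

Suppose that $A(x) = O(x)$ as $x \to \infty$, and let

$$\lambda = \limsup_{N \to \infty} \frac{\log^+ |D_N|}{\log N}, \qquad \text{where} \quad D_N = A(N) - \sum_{n=1}^{N-1} \frac{A(n)}{n}. \qquad (2)$$

Then the function $f(z)(z-1)/z^2$ can be represented as the Laplace transform of $\phi$,

$$f(z)\,\frac{z-1}{z^2} = \int_0^\infty e^{-zx} \phi(x)\, dx \qquad (\mathrm{Re}\, z > \lambda). \qquad (3)$$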
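Eq. (3) can be checked numerically for a finite Dirichlet polynomial, where both sides are elementary. Each term $a_n$ contributes $n^{-s}\int_0^\infty e^{-su}(1-u)\,du = a_n n^{-s}(s-1)/s^2$. The Python sketch below assumes real $s > 0$ and integrates with a composite Simpson rule piece by piece, because $\phi$ jumps by $a_n$ at each $x = \log n$. The function names and the quadrature parameters `m` and `tail` are illustrative choices, not part of the paper.

```python
import math

def simpson(f, a, b, m=4000):
    """Composite Simpson rule for the integral of f over [a, b] (m rounded up to even)."""
    m += m % 2
    h = (b - a) / m
    total = f(a) + f(b)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def laplace_of_phi(coeffs, s, m=4000, tail=60.0):
    """Integral over x >= 0 of e^{-s x} phi(x), phi as in eq. (1), for a finite
    Dirichlet polynomial with coeffs[n-1] = a_n and real s > 0."""
    total = a_sum = b_sum = 0.0
    N = len(coeffs)
    for k in range(1, N + 1):
        a_sum += coeffs[k - 1]                    # A(k)
        b_sum += coeffs[k - 1] * math.log(k)      # sum over n <= k of a_n log n
        lo = math.log(k)
        # last piece is truncated; the neglected part is of order e^{-tail}
        hi = math.log(k + 1) if k < N else lo + tail / s
        # On [log k, log(k+1)): phi(x) = A(k) (1 - x) + sum over n <= k of a_n log n
        f = lambda x, A=a_sum, B=b_sum: math.exp(-s * x) * (A * (1.0 - x) + B)
        total += simpson(f, lo, hi, m)
    return total

def hardyfied(coeffs, s):
    """Left-hand side of eq. (3): f(s) (s - 1) / s^2."""
    f = sum(a * n ** (-s) for n, a in enumerate(coeffs, start=1))
    return f * (s - 1) / s ** 2

zeta_5 = [1.0] * 5                    # partial sum of zeta: a_n = 1 for n <= 5
print(laplace_of_phi(zeta_5, 1.5), hardyfied(zeta_5, 1.5))
```

For a finite polynomial the identity holds for every $s > 0$. The substance of the theorem is how far it persists in the limit $N \to \infty$, which this sketch does not address.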
Proof. Fix an integer $N \ge 1$ and let $\log N \le x < \log(N+1)$. Then

$$|\phi(x) - \phi(\log N)| = |(x - \log N) A(N)| \le |A(N)| \log\frac{N+1}{N} = O(1)$$

as $N \to \infty$, by the assumed growth behavior of $A(x)$. Combining this with

$$\phi(\log(n+1)) - \phi(\log n) = a_{n+1} - A(n) \log\frac{n+1}{n} = a_{n+1} - A(n)/n + O(n^{-1}),$$

we get for $N = [e^x] \to \infty$

$$\phi(x) = \phi(0) + \sum_{n=1}^{N-1} \left[\phi(\log(n+1)) - \phi(\log n)\right] + O(1) = a_1 + \sum_{n=1}^{N-1} \left[a_{n+1} - A(n)/n + O(n^{-1})\right] + O(1) = D_N + O(\log N),$$

and thus for every $\varepsilon > 0$,
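$$f_N(z) = \sum_{n=1}^{N} a_n n^{-z}, \qquad \eta_N(z) = f_N(z)\,\frac{z-1}{z^2}, \qquad \phi_N(x) = \sum_{1 \le n \le \min(N, e^x)} a_n (1 - x + \log n),$$

$N \ge 1$, and set

$$h_{\sigma,N}(x) = \begin{cases} e^{-\sigma x} \phi_N(x) & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases} \qquad (\text{for every integer } N \ge 1,\ \sigma > 0).$$

We get for fixed $\sigma > \sigma_a$

$$\frac{1}{2\pi} \int_{-\infty}^{\infty} e^{itx} \eta_N(\sigma + it)\, dt = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{itx} \sum_{n=1}^{N} a_n n^{-\sigma - it}\, \frac{\sigma + it - 1}{(\sigma + it)^2}\, dt = \sum_{1 \le n \le \min(N, e^x)} a_n n^{-\sigma} e^{-\sigma(x - \log n)} \left(1 - (x - \log n)\right) = h_{\sigma,N}(x) \qquad (4)$$

almost everywhere in $x \in \mathbb{R}$, the Fourier integrals being understood in the $L^2$ sense. Note that $\eta(z)$ is square integrable along every line $\mathrm{Re}\, z = \sigma$ with $\sigma > \sigma_a$. Clearly $\eta_N(\sigma + it)$ converges to $\eta(\sigma + it)$ in $L^2(dt)$, so $h_{\sigma,N}$ is a Cauchy sequence in $L^2(dx)$, by Parseval's formula. The pointwise limit $h_\sigma(x)$ of $h_{\sigma,N}(x)$

$$\eta(\sigma + it) = \hat h_\sigma(t) = \int_0^\infty h_\sigma(x)\, e^{-ixt}\, dx = \int_0^\infty e^{-(\sigma + it)x} \phi(x)\, dx \qquad (5)$$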
holds almost everywhere in $t$ ($\sigma > \sigma_a$), hence everywhere in $\mathrm{Re}\, z > \sigma_a$, by continuity. This shows that the Laplace transform of

($x \in \mathbb{R}$, $\sigma > 0$). (6)
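The function $\zeta(z)(z-1)/z^2$ was also considered by Burnol,$^4$ in connection with a closure problem in function space known as the "Nyman-Beurling real variable form of the Riemann hypothesis". It may be interesting to note here that although $h_\sigma$ is square integrable for every $\sigma > 0$, it is not true that $h_{\sigma,N} \to h_\sigma$ in $L^2(dx)$; in fact, for $f = \zeta$,

$$\|h_{\sigma,N} - h_\sigma\|_2 \to \infty \quad (N \to \infty), \qquad 0 < \sigma < 1. \qquad (7)$$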
Proof. Note first that for $x > \log N \to \infty$

$$\phi(x) - \phi_N(x) = \sum_{N < n \le e^x} (1 - x + \log n) = (1 - x)([e^x] - N) + \log([e^x]!) - \log(N!)$$
$$= (1 - x)([e^x] - N) + ([e^x] + \tfrac12)\log[e^x] - [e^x] - (N + \tfrac12)\log N + N + O(1)$$
$$= (N + \tfrac12)(\log[e^x] - \log N) + ([e^x] - N)(\log[e^x] - x) + O(1)$$
$$= (N + \tfrac12)(x - \log N) + O(1), \qquad (8)$$

on using Stirling's formula and the inequalities $0 \le x - \log[e^x] < 2e^{-x}$ ($x > 0$). The estimate (8) shows that there exists a finite constant $B > 0$ such that $\phi(x) - \phi_N(x) \ge N(x - \log N)$ for all large $N$ and $x > B + \log N$. Therefore
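$$\|h_{\sigma,N} - h_\sigma\|_2^2 \ge \int_{B + \log N}^{\infty} \left(\phi(x) - \phi_N(x)\right)^2 e^{-2\sigma x}\, dx \ge N^2 \int_{B + \log N}^{\infty} (x - \log N)^2 e^{-2\sigma x}\, dx = N^{2 - 2\sigma} \int_B^\infty y^2 e^{-2\sigma y}\, dy$$

for all large $N$, and assertion (7) follows.

Example 2. Let $f(z) = \sum_p p^{-z} \log p$, where the sum extends over all prime numbers. This example is related to the logarithmic derivative of the zeta function, as may be seen from the product representation $\zeta(z) = \prod_p (1 - p^{-z})^{-1}$. For $\mathrm{Re}\, z > 1$,

$$-\frac{\zeta'(z)}{\zeta(z)} = \sum_p \frac{\log p}{p^z - 1} = f(z) + \sum_p \frac{\log p}{p^z (p^z - 1)},$$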
and since the last series converges for $\mathrm{Re}\, z > 1/2$, it suffices to consider $f(z)$ as far as the analytic continuation of $\zeta'(z)/\zeta(z)$ is concerned. The series $f(z)$ would have convergence abscissa 1/2, implying the RH, if the associated sequence $D_N$ satisfied condition (2) with $\lambda = 1/2$. For a numerical check we computed $D_N$ for $N$ up to 5 million. A plot of $\log^+|D_N| / \log N$ versus $\log N$ (thinned out to every 200th data point; the general picture is not affected thereby) is shown in Figure 1 (a). Within the considered range, the observed behavior is in good accordance with a possible value of $\lambda = 1/2$. Notice the obvious connection with the classical criterion saying that the RH is equivalent to the error estimate $\sum_{p \le x} \log p - x = O(x^{1/2 + \varepsilon})$ ($\forall\, \varepsilon > 0$) in the prime number theorem (Edwards,$^6$ Sect. 5.5). Incidentally, $\phi(x)$ seems to be nonnegative in this case, too, as a plot of
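A minimal sketch of the kind of computation behind Figure 1 (a) follows. It assumes $a_n = \log n$ for prime $n$ and $a_n = 0$ otherwise, so that $A(N)$ is Chebyshev's function $\vartheta(N)$. It computes $D_N$ from (2) in a single pass, together with the plotted quantity $\log^+|D_N|/\log N$. The function names are hypothetical, and the range is far smaller than the 5 million used in the text.

```python
import math

def prime_flags(n):
    """Sieve of Eratosthenes: flags[k] is True iff k is prime (0 <= k <= n, n >= 1)."""
    flags = [False, False] + [True] * (n - 1)
    for p in range(2, math.isqrt(n) + 1):
        if flags[p]:
            for q in range(p * p, n + 1, p):
                flags[q] = False
    return flags

def d_sequence(n_max):
    """D_N = A(N) - sum_{n=1}^{N-1} A(n)/n, eq. (2), for Example 2
    (a_n = log n if n is prime, else 0). D[N] is defined for 1 <= N <= n_max."""
    flags = prime_flags(n_max)
    D = [0.0] * (n_max + 1)
    A = 0.0   # A(N), i.e. Chebyshev's theta(N)
    S = 0.0   # running sum of A(n)/n over n < N
    for N in range(1, n_max + 1):
        if flags[N]:
            A += math.log(N)
        D[N] = A - S
        S += A / N
    return D

def growth_ratio(D, N):
    """log+|D_N| / log N for N >= 2, the quantity plotted in Figure 1."""
    if D[N] == 0.0:
        return 0.0
    return max(0.0, math.log(abs(D[N]))) / math.log(N)

D = d_sequence(100_000)
for N in (10, 100, 1_000, 10_000, 100_000):
    print(N, round(growth_ratio(D, N), 3))
```

The running sums keep the work linear in $N$. For $N$ of the order of $5 \cdot 10^6$, a more compact sieve (for example a `bytearray`) would reduce memory use.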
Figure 1: Convergence abscissa of Laplace transform equal to 1/2? $\log^+|D_N| / \log N$ versus $\log N$, for (a): Example 2; (b): Example 3.

3 Factorization of $\eta$

From now on we shall restrict attention to the case $f = \zeta$. For brevity we write $\eta(z) = \zeta(z)(z-1)/z^2$ throughout the sequel. Recall from the previous section that $\eta$ belongs to every Hardy space $\mathcal{H}^2_\sigma$, $\sigma > 0$. Being a Hardy function, $\eta$ admits a useful factorization, some applications of which will be discussed in the next section. The zeros of $\eta$ in the right half plane $\mathrm{Re}\, z > 0$, which coincide with the non-trivial zeros of the zeta function, are generically denoted by $\rho$. The $\rho$'s are known to lie symmetrically with respect to both the real axis and the critical line $\mathrm{Re}\, z = 1/2$. That is, whenever $\rho$ is a zero, then so are the mirror images $\bar\rho$, $1 - \bar\rho$, and $1 - \rho$. Let $\sigma > 0$ be fixed. According to the factorization theorem for Hardy functions (see e.g. Dym and McKean$^5$ (ch. 2.7) or Hoffman$^8$ (pp. 132, 133)), $\eta$ can be represented as the product of an outer and an inner function on the half plane $\mathrm{Re}\, z > \sigma$. More precisely,

$$\eta(z) = H_\sigma(z)\, B_\sigma(z) \qquad (\mathrm{Re}\, z > \sigma), \qquad (9)$$
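where the outer function is given by

$$H_\sigma(z) = \exp\left[\frac{1}{\pi} \int_{-\infty}^{\infty} \log|\eta(\sigma + it)|\, \frac{t(z - \sigma) + i}{t + i(z - \sigma)}\, \frac{dt}{1 + t^2}\right], \qquad (10)$$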
and the inner function reduces in the present case to a Blaschke product $B_\sigma$, which is composed of the zeros $\rho$ of $\eta$ with $\mathrm{Re}\,\rho > \sigma$ and their mirror images $2\sigma - \bar\rho$ after reflection at the line $\mathrm{Re}\, z = \sigma$. Explicitly,

$$B_\sigma(z) = \prod_{\mathrm{Re}\,\rho > \sigma} \frac{|1 - (\rho - \sigma)^2|}{1 - (\rho - \sigma)^2}\, \frac{z - \rho}{z - (2\sigma - \bar\rho)}. \qquad (11)$$

These formulae are easily obtained from the familiar ones for the half plane $\mathrm{Re}\, z > 0$,$^8$ by shifting both the complex variable and the zeros by $\sigma$. The inner factor simplifies to a Blaschke product for the following reasons: (i) $\eta$ has an analytic continuation across the line $\mathrm{Re}\, z = \sigma$ to the entire right half plane, so that there is no singular factor; (ii) the constant $c$ appearing in the general factorization formula reduces to unity because $B_\sigma(\sigma) = 1$ and $H_\sigma(\sigma) = \eta(\sigma)$, as is readily verified. For real arguments $z = s$, taking first logarithms, then real parts on both sides of (9), one obtains for $s > \sigma > 0$
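$$\log\eta(s) = \frac{1}{\pi} \int_{-\infty}^{\infty} \log|\eta(\sigma + it)|\, \frac{s - \sigma}{(s - \sigma)^2 + t^2}\, dt + \sum_{\mathrm{Re}\,\rho > \sigma} \log\left|\frac{s - \rho}{s - (2\sigma - \bar\rho)}\right|. \qquad (12)$$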
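The Blaschke product (11) can be checked with a short Python sketch, together with two properties used above: $|B_\sigma| = 1$ on the line $\mathrm{Re}\, z = \sigma$, and $B_\sigma(\sigma) = 1$ when the zeros come in conjugate pairs. The zeros below are hypothetical points placed to the right of the critical line purely for illustration; under the RH no such zeros exist. The function name is likewise an assumption.

```python
def blaschke(z, zeros, sigma):
    """B_sigma(z) of eq. (11); only zeros rho with Re rho > sigma contribute."""
    b = complex(1.0)
    for rho in zeros:
        if rho.real <= sigma:
            continue
        w = 1 - (rho - sigma) ** 2
        # unimodular convergence factor times the shifted half-plane Blaschke factor
        b *= (abs(w) / w) * (z - rho) / (z - (2 * sigma - rho.conjugate()))
    return b

# Hypothetical zeros off the critical line, in conjugate pairs (illustration only).
zs = [0.7 + 14j, 0.7 - 14j, 0.9 + 21j, 0.9 - 21j]
sigma = 0.5
print(abs(blaschke(sigma + 3j, zs, sigma)))   # equals 1 up to rounding on Re z = sigma
print(blaschke(sigma, zs, sigma))             # equals 1 up to rounding (conjugate pairs)
print(abs(blaschke(0.8, zs, sigma)))          # modulus below 1 inside Re z > sigma
```

Because every factor has modulus below 1 for real $s > \sigma$, the sum in (12) is strictly negative as soon as a single zero lies to the right of the line. This is the mechanism behind the tests in the next section.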
Note that $\eta(s)$ is positive for $s > 0$, being the Laplace transform of a nonnegative function.

4 Applications
The factorization of $\eta$ gives rise to various "tests" of the RH. A first example is obtained by setting $\sigma = 1/2$ in (12). The sum on the right-hand side of (12) vanishes if and only if $\zeta(z)$ has no zero within the region $\mathrm{Re}\, z > 1/2$. Therefore, the RH is true if and only if for some (and then for all) $s > 1/2$

$$\frac{1}{\pi} \int_{-\infty}^{\infty} \log|\eta(\tfrac12 + it)|\, \frac{s - \frac12}{(s - \frac12)^2 + t^2}\, dt = \log\eta(s). \qquad (13)$$
This criterion is equivalent to the condition that $\eta(z)$ be an outer function for the half plane $\mathrm{Re}\, z > 1/2$; cf. Dym and McKean,$^5$ Sect. 2.7. For $s = 1$ it assumes a particularly neat form. The right-hand side vanishes, because $\eta(1) = 1$ ($\zeta$ has residue 1 at $z = 1$). The left-hand side can be simplified, and one gets the following criterion for the truth of the RH, due to Balazard, Saias and Yor:$^1$

$$\int_{-\infty}^{\infty} \log|\eta(\tfrac12 + it)|\, \frac{dt}{\frac14 + t^2} = 0. \qquad (14)$$
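Another example results from the formula

$$\int_{-\infty}^{\infty} \log\frac{|\eta(\sigma + it)|}{\eta(\sigma)}\, \frac{dt}{\pi t^2} = (\log\eta)'(\sigma) + 2 \sum_{\mathrm{Re}\,\rho > \sigma} \mathrm{Re}\,(\rho - \sigma)^{-1} \qquad (\sigma > 0), \qquad (15)$$

which can be derived from (12) by subtracting $\log\eta(\sigma)$ on both sides, dividing by $s - \sigma$, and then taking the limit $s \downarrow \sigma$. The interchange of limits and integration (or summation) can be justified by dominated convergence.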
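The limit step behind (15) can be written out as follows. This sketch assumes, as stated above, that the limit may be taken inside the integral and the sum.

```latex
% Subtract \log\eta(\sigma) from both sides of (12); since
% \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{(s-\sigma)\,dt}{(s-\sigma)^2+t^2}=1,
% the constant can be moved inside the integral. Dividing by s-\sigma gives
\frac{\log\eta(s)-\log\eta(\sigma)}{s-\sigma}
  = \int_{-\infty}^{\infty}\log\frac{|\eta(\sigma+it)|}{\eta(\sigma)}\,
      \frac{dt}{\pi\,[(s-\sigma)^2+t^2]}
  + \sum_{\Re\rho>\sigma}\frac{1}{s-\sigma}
      \log\left|\frac{s-\rho}{s-(2\sigma-\bar\rho)}\right| .
% Each summand vanishes at s=\sigma, because |\sigma-\rho| = |\bar\rho-\sigma|,
% so its quotient tends to the derivative at s=\sigma:
\frac{d}{ds}\log\left|\frac{s-\rho}{s-(2\sigma-\bar\rho)}\right|_{s=\sigma}
  = \Re\frac{1}{\sigma-\rho}-\Re\frac{1}{\bar\rho-\sigma}
  = -2\,\Re\,(\rho-\sigma)^{-1} .
% Letting s \downarrow \sigma and moving the sum to the other side yields (15).
```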
Putting $\sigma = 1/2$ in (15), one obtains the following differential version of the "integral tests" (13), (14). The RH is true if and only if

$$\int_{-\infty}^{\infty} \log\frac{|\eta(\frac12 + it)|}{\eta(\frac12)}\, \frac{dt}{\pi t^2} = (\log\eta)'(\tfrac12). \qquad (16)$$
This statement can be amplified in various ways. First, it is possible to evaluate $(\log\eta)'(\frac12)$ explicitly, $(\log\eta)'(\frac12) = \frac{\gamma}{2} + \frac12\log(8\pi) + \frac{\pi}{4} - 6$ ($\gamma$ denoting Euler's constant). Also, for $\sigma = 1/2$ the sum in (15) can be written in a more symmetric form. One thus obtains the relation

$$\int_{-\infty}^{\infty} \log\frac{|\eta(\frac12 + it)|}{\eta(\frac12)}\, \frac{dt}{\pi t^2} - \left(\frac{\gamma}{2} + \frac12\log(8\pi) + \frac{\pi}{4} - 6\right) = \sum_\rho \frac{|\mathrm{Re}\,\rho - \frac12|}{|\rho - \frac12|^2}, \qquad (17)$$

in which the sum extends over all zeros in the critical strip. Note that (17) quantifies the difference between the two sides of (16) as a weighted sum of the absolute deviations of the real parts of the zeros from 1/2. Secondly, there is a connection with logarithmic Hilbert transforms, also called logarithmic dispersion relations.$^3$ Suppose we had $\eta(z) \ne 0$ for $\mathrm{Re}\, z > 1/2$. Then $\eta$ itself would be an outer function,

$$\eta(z) = H_{1/2}(z) \qquad (\mathrm{Re}\, z > 1/2).$$
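Taking imaginary parts in this equation, one can show with a little algebra that for $z - 1/2 = a + ib$, $a > 0$, one then has

$$\mathrm{Im}\log\eta(z) = \frac{1}{\pi} \int_{-\infty}^{\infty} \left(\log|\eta(\tfrac12 + it)| - \log|\eta(\tfrac12 + ib)|\right) \left(\frac{1}{t - b} - \frac{t}{1 + t^2}\right) dt - \frac{1}{\pi} \int_{-\infty}^{\infty} \left(\log|\eta(\tfrac12 + it)| - \log|\eta(\tfrac12 + ib)|\right) \frac{a^2\, dt}{(t - b)\,(a^2 + (t - b)^2)}. \qquad (18)$$

Fix any $b > 0$ such that $\eta(\frac12 + ib) \ne 0$. Then the last term in (18) converges to zero as $a \downarrow 0$. Therefore, using the fact that $|\eta(\frac12 + it)|$ is an even function of $t$, one obtains in the limit the logarithmic dispersion relation

$$\mathrm{Im}\log\eta(\tfrac12 + ib) = \frac{2b}{\pi} \int_0^\infty \frac{\log|\eta(\frac12 + it)| - \log|\eta(\frac12 + ib)|}{t^2 - b^2}\, dt, \qquad (19)$$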
which expresses the phase of $\eta$ on the boundary $\mathrm{Re}\, z = 1/2$ as an integral of its log modulus along that line. Recall that this relation is a consequence of the assumed outer function character of $\eta$, that is, of the RH. In fact, the validity of (19) for every $b > 0$ such that $\eta(\frac12 + ib) \ne 0$ is also sufficient for the RH. To see this, divide both sides of (19) by $b$ and let $b \downarrow 0$. Then the left side tends to $(\log\eta)'(\frac12)$, and the right side to $\frac{2}{\pi}\int_0^\infty \log\left[|\eta(\frac12 + it)|/\eta(\frac12)\right] \frac{dt}{t^2}$, so in the limit we get the condition (16) shown above to be equivalent to the RH. Finally we note that $-(\log\eta)'($
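$$\frac{\zeta(\sigma + it)}{\zeta(\sigma)} = \prod_p \frac{1 - p^{-\sigma}}{1 - p^{-\sigma - it}} = \exp \sum_p \sum_{n=1}^{\infty} \frac{e^{-itn\log p} - 1}{n\, p^{n\sigma}} \qquad (20)$$

and noting that $f_\sigma(t)$ is thus represented as a product of terms of the form $\exp(a(e^{ibt} - 1))$, each of which is the characteristic function of a Poisson random variable with intensity $a$ and values in the lattice $kb$, $k = 0, 1, 2, \ldots$. In order to connect this fact with the above question, it is convenient to introduce the Lévy measure $F_\sigma$, which puts mass $(n p^{n\sigma})^{-1}$ at each of the points $-\log p^n$, $n \ge 1$, $p$ prime. Then (20) becomes $\log\frac{\zeta(\sigma + it)}{\zeta(\sigma)} = \int (e^{itx} - 1)\, F_\sigma(dx)$, so taking real parts in this equation and using $\int_{-\infty}^{\infty} (1 - \cos tx)/t^2\, dt = \pi|x|$ ($x \in \mathbb{R}$), one obtains

$$\int_{-\infty}^{\infty} \log\frac{|\zeta(\sigma + it)|}{\zeta(\sigma)}\, \frac{dt}{\pi t^2} = \int \int_{-\infty}^{\infty} (\cos tx - 1)\, \frac{dt}{\pi t^2}\, F_\sigma(dx) = -\int |x|\, F_\sigma(dx) = \int x\, F_\sigma(dx).$$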
Thus we find that the essential part of the integral in question equals the first moment of the Lévy measure $F_\sigma$. The other part, stemming from the factor $(z-1)/z^2$, can be incorporated by introducing a signed, absolutely continuous measure $G_\sigma$ with density $x^{-1}[2e^{\sigma x} - e^{(\sigma - 1)x}]$ on $(-\infty, 0)$ (zero on $[0, \infty)$). One then has

$$\log\frac{\eta(\sigma + it)}{\eta(\sigma)} = \int (e^{itx} - 1)\,(F_\sigma - G_\sigma)(dx),$$

and hence

$$\int_{-\infty}^{\infty} \log\frac{|\eta(\sigma + it)|}{\eta(\sigma)}\, \frac{dt}{\pi t^2} = \int x\,(F_\sigma - G_\sigma)(dx) \qquad (\sigma > 1).$$
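A short calculation gives $\int x\,F_\sigma(dx) = -\sum_p \log p/(p^\sigma - 1) = \zeta'(\sigma)/\zeta(\sigma)$ and $\int x\,G_\sigma(dx) = 2/\sigma - 1/(\sigma - 1)$. The first moment of $F_\sigma - G_\sigma$ is therefore $(\log\eta)'(\sigma)$, which stays bounded as $\sigma \downarrow 1$ while each moment alone diverges. The Python sketch below sums the atoms of $F_\sigma$ directly, giving its total mass $\log\zeta(\sigma)$ and its first moment. The function names and the truncation parameters `p_max` and `n_max` are assumptions for illustration.

```python
import math

def primes_up_to(n):
    """All primes <= n (sieve of Eratosthenes), n >= 1."""
    flags = bytearray([1]) * (n + 1)
    flags[0] = flags[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if flags[p]:
            flags[p * p :: p] = bytes(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if flags[p]]

def levy_mass_and_moment(sigma, p_max=100_000, n_max=60):
    """Truncated total mass and first moment of the Levy measure F_sigma, which
    puts mass 1/(n p^(n sigma)) at x = -n log p (n >= 1, p prime)."""
    mass = moment = 0.0
    for p in primes_up_to(p_max):
        log_p = math.log(p)
        for n in range(1, n_max + 1):
            w = p ** (-n * sigma) / n
            if w < 1e-30:
                break                    # remaining atoms of this prime are negligible
            mass += w
            moment += -n * log_p * w     # atom located at x = -n log p
    return mass, moment

for sigma in (2.0, 1.5):
    mass, moment = levy_mass_and_moment(sigma)
    # moment approximates zeta'/zeta(sigma); adding -(2/sigma - 1/(sigma - 1))
    # gives the first moment of F_sigma - G_sigma, i.e. (log eta)'(sigma)
    print(sigma, mass, moment, moment - 2 / sigma + 1 / (sigma - 1))
```

The truncation error grows as $\sigma$ decreases towards 1, because the prime sums converge ever more slowly there. The sketch is therefore only meaningful for $\sigma$ comfortably above 1.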
These calculations give a more detailed picture of how the factor $(z-1)/z^2$ regularizes the zeta function as $\sigma \downarrow 1$: it compensates the flow of mass of $F_\sigma$ towards $-\infty$ by the subtraction of measures $G_\sigma$ such that the first moment of $F_\sigma - G_\sigma$ remains bounded. Evidently, other ways of renormalizing the Lévy measure as $\sigma \downarrow 1$ are also conceivable, and may be interesting to explore.

References
1. M. Balazard, E. Saias, and M. Yor, Adv. Math. 143, 284 (1999).
2. M.V. Berry and J.P. Keating, SIAM Review 41, 236 (1999).
3. R.E. Burge, M.A. Fiddy, A.H. Greenaway, and G. Ross, Proc. R. Soc. London A 350, 191 (1976).
4. J.-F. Burnol, http://arXiv.org/abs/math/0001013 (2000).
5. H. Dym and H.P. McKean, Gaussian Processes, Function Theory, and the Inverse Spectral Problem (Academic Press, New York, 1976).
6. H.M. Edwards, The Theory of the Riemann Zeta Function (Academic Press, New York, 1974).
7. B.V. Gnedenko and A.N. Kolmogorov, Limit Distributions for Sums of Independent Random Variables (Addison-Wesley, Cambridge, 1954).
8. K. Hoffman, Banach Spaces of Analytic Functions (Dover, New York, 1988).
ENSEMBLE PROBABILISTIC EQUILIBRIUM AND NON-EQUILIBRIUM THERMODYNAMICS WITHOUT THE THERMODYNAMICAL LIMIT
D. H. E. GROSS
Hahn-Meitner-Institut Berlin, Bereich Theoretische Physik, Glienickerstr. 100, 14109 Berlin, Germany, and Freie Universität Berlin, Fachbereich Physik; E-mail: [email protected]
Boltzmann's principle $S = k \ln W$ allows one to extend equilibrium thermo-statistics to "Small" systems without invoking the thermodynamic limit.$^{1,2,3}$ As the limit hides more than it clarifies the origin of phase transitions, a deeper and more transparent understanding is thus possible. The main clue is to base statistical probability on ensemble averaging and not on time averaging. It is argued that, due to the incomplete information obtained by macroscopic measurements, thermodynamics handles ensembles or finite-sized sub-manifolds in phase space and not single time-dependent trajectories. Therefore, ensemble averages are the natural objects of statistical probabilities. This is the physical origin of coarse-graining, which is no longer a mathematical ad hoc assumption. The probabilities $P(M)$ of macroscopic measurements $M$ are given by the ratio $P(M) = W(M)/W$ of the volume of the sub-manifold $\mathcal{M}$ of the microcanonical ensemble with the constraint $M$ to that of the one without. From this concept all of equilibrium thermodynamics can be deduced quite naturally, including the most sophisticated phenomena of phase transitions for "Small" systems. Boltzmann's principle is generalized to non-equilibrium Hamiltonian systems with possibly fractal distributions $\mathcal{M}$ in $6N$-dim. phase space by replacing the conventional Riemann integral for the volume in phase space by its corresponding box-counting volume. This is equal to the volume of the closure $\overline{\mathcal{M}}$. With this extension the Second Law is derived without invoking the thermodynamic limit. The irreversibility in this approach is due to the replacement of the phase-space volume of the fractal sub-manifold $\mathcal{M}$ by the volume of its closure $\overline{\mathcal{M}}$. The physical reason for this replacement is that macroscopic measurements cannot distinguish $\mathcal{M}$ from $\overline{\mathcal{M}}$. Whereas the volume of the former does not change in time, due to Liouville's theorem, the volume of the closure can be larger.
In contrast to conventional coarse graining, the box-counting volume is defined in the limit of infinite resolution; that is, there is no artificial loss of information.
1 Introduction
Recently, the interest in the thermo-statistical behavior of non-extensive many-body systems, like atomic nuclei, atomic clusters, soft matter, biological systems, and also self-gravitating astrophysical systems, has led us to consider thermo-statistics without using the thermodynamic limit. This is most safely done by going back to Boltzmann. Einstein regards Boltzmann's definition of entropy, as written e.g. on Boltzmann's famous epitaph,

$$S = k \cdot \ln W, \qquad (1)$$
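as Boltzmann's principle,$^4$ from which Boltzmann was able to deduce thermodynamics. Here $W$ is the number of micro-states at given energy $E$ of the $N$-body system in the spatial volume $V$:

$$W(E, N, V) = \mathrm{tr}\left[\epsilon_0\, \delta(E - H_N)\right], \qquad (2)$$
$$\mathrm{tr}[\ldots] = \int_{V^N} \frac{d^{3N}p\; d^{3N}q}{N!\,(2\pi\hbar)^{3N}}\, [\ldots]. \qquad (3)$$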
$\epsilon_0$ is a suitable energy constant to make $W$ dimensionless, $H_N$ is the $N$-particle Hamilton function, and the $N$ positions $q$ are restricted to the volume $V$, whereas the momenta $p$ are unrestricted. In what follows, we remain on the level of classical mechanics. The only reminders of the underlying quantum mechanics are the measure of the phase space in units of $2\pi\hbar$ and the factor $1/N!$, which respects the indistinguishability of the particles (Gibbs paradox). In contrast to Boltzmann,$^{5,6}$ who used the principle only for dilute gases, and to Schrödinger,$^7$ who thought equation (1) is useless otherwise, I take the principle as the fundamental, generic definition of entropy. In the following sections I will demonstrate that this definition of thermo-statistics works well, especially also at higher densities and at phase transitions, without invoking the thermodynamic limit.

2 There is a lot to add to classical equilibrium statistics from our experience with "Small" systems
Following Lieb,$^8$ extensivity$^a$ and the existence of the thermodynamic limit $N \to \infty|_{N/V = \mathrm{const}}$ are essential conditions for conventional (canonical) thermodynamics to apply. Certainly, this also implies the homogeneity of the system. Phase transitions are somehow foreign to this: the essence of first order transitions is that the systems become inhomogeneous and split into different phases separated by interfaces. In the conventional Yang-Lee theory, phase transitions are represented by the positive zeros of the grand-canonical partition sum, where the grand-canonical formalism breaks down (Yang-Lee singularities). In the following we show that the micro-canonical ensemble gives much more detailed and more natural insight, which corresponds to the experimental identification of phase transitions. There is a whole group of physical many-body systems, called "Small" in the following, which cannot be addressed by conventional thermo-statistics:
• nuclei,
• atomic clusters,
• polymers,
• soft matter (biological) systems,
• astrophysical systems,
• first order transitions: these are distinguished from continuous transitions by the appearance of phase separations and interfaces with surface tension.
If the range of the force or the thickness of the surface layers is such that the number of surface particles is not negligible compared to the total number of particles, these systems are non-extensive. For such systems the thermodynamic limit does not exist or makes no sense. Either the range of the forces (Coulomb, gravitation) is of the order of the linear dimensions of these systems, and/or they are strongly inhomogeneous, e.g. at phase separation. Boltzmann's principle invokes neither the thermodynamic limit, nor additivity, nor extensivity, nor concavity (downwards bending) of the entropy $S(E,N)$. This has been largely forgotten for a hundred years; we have to go back to pre-Gibbsian times. It is a purely geometrical definition of the entropy and applies as well to "Small" systems. Moreover, the entropy $S(E,N)$ as defined above is everywhere single-valued and multiply differentiable; there are no singularities in it. This is the simplest access to equilibrium statistics.$^9$ We will explore its consequences in this contribution. Moreover, we will see that in this way we simultaneously get complete information about the three crucial parameters characterizing a phase transition of first order: the transition temperature $T_{tr}$, the latent heat per atom $q_{lat}$, and the surface tension $\sigma_{surf}$. Boltzmann's famous epitaph above (eq. 1) contains everything that can be said about equilibrium thermodynamics in its most condensed form. Here $W$ is the volume of the sub-manifold at sharp energy in the $6N$-dim. phase space.

$^a$ Dividing extensive systems into larger pieces, the total energy and entropy are equal to the sum of those of the pieces.
3 Relation of the topology of $S(E,N)$ to the Yang-Lee zeros of $Z(T,\mu,V)$
In conventional thermo-statistics, phase transitions are indicated by zeros of the grand-canonical partition function $Z(T, \mu, V)$, where $V$ is the volume. See more details in refs.$^{1,2,3,10}$ With $e = E/V$, $n = N/V$, and $s = S/V$,

$$Z(T,\mu,V) = \iint_0^\infty \frac{dE\, dN}{\epsilon_0}\, e^{-[E - \mu N - TS(E,N)]/T} = \frac{V^2}{\epsilon_0} \iint_0^\infty de\, dn\, e^{-V[e - \mu n - Ts(e,n)]/T}. \qquad (4)$$

The double Laplace integral (4) can be evaluated asymptotically for large $V$ (thermodynamic limit $V \to \infty|_{N/V = \mathrm{const}}$) by expanding the exponent to second order in $\Delta e$, $\Delta n$ (const. + lin. + quadr.) around the "stationary point" $e_s$, $n_s$, where the linear term vanishes:
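$$\frac{1}{T} = \frac{\partial S}{\partial E}, \qquad \frac{\mu}{T} = -\frac{\partial S}{\partial N}, \qquad \frac{P}{T} = \frac{\partial S}{\partial V}. \qquad (5)$$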
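Then the only term remaining to be integrated is the quadratic one. If the two eigen-curvatures satisfy $\lambda_1 < 0$, $\lambda_2 < 0$, this is a Gaussian integral and yields:

$$Z(T,\mu,V) = \frac{V^2}{\epsilon_0}\, e^{-V[e_s - \mu n_s - Ts(e_s,n_s)]/T} \iint_{-\infty}^{\infty} dv_1\, dv_2\, e^{V[\lambda_1 v_1^2 + \lambda_2 v_2^2]/2}, \qquad (6)$$
$$Z(T,\mu,V) = e^{-F(T,\mu,V)/T}, \qquad (7)$$
$$\frac{F(T,\mu,V)}{V} = f(T,\mu,V) = e_s - \mu n_s - T s_s + \frac{T \ln\sqrt{\det(e_s,n_s)}}{V} - \frac{T \ln(2\pi V/\epsilon_0)}{V} + o\!\left(\frac{1}{V}\right). \qquad (8)$$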
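Here $\det(e_s,n_s)$ is the determinant of the curvatures of $s(e,n)$, and $v_1$, $v_2$ are the coordinates along the corresponding eigenvectors:

$$\det(e,n) = \begin{vmatrix} \dfrac{\partial^2 s}{\partial e^2} & \dfrac{\partial^2 s}{\partial e\,\partial n} \\[2mm] \dfrac{\partial^2 s}{\partial n\,\partial e} & \dfrac{\partial^2 s}{\partial n^2} \end{vmatrix} = \lambda_1 \lambda_2, \qquad \lambda_1 \ge \lambda_2. \qquad (9)$$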
[Figure 1 plot: Na$_{1000}$, $P = 1$ atm; sheared entropy $s(e) - 25 - e \cdot 11.5$ versus $e$ (eV per atom, 0.3 to 1.3), with $e_2$, $e_3$, $\Delta s_{surf}$ and $q_{lat}$ marked.]
Figure 1: MMMC simulation of the entropy s(e) per atom (e in eV per atom) of a system of JVo = 1000 sodium atoms with realistic interaction at an external pressure of 1 atm. At the energy per atom e\ the system is in the pure liquid phase and at e$ in the pure gas phase, of course with fluctuations. The latent heat per atom is qiat = e.% — e\.
Attention: the curve $s(e)$ is artificially sheared by subtracting the linear function $25 + 11.5\,e$ in order to make the convex intruder visible; $s(e)$ itself is always a steeply monotonically rising function. We clearly see the global concave (downwards bending) nature of $s(e)$ and its convex intruder. Its depth is the entropy loss due to the additional correlations caused by the interfaces. From this one can calculate the surface tension per surface atom, $\sigma_{surf}/T_{tr} = \Delta s_{surf}\,N_0/N_{surf}$. The double tangent is the concave hull of $s(e)$. Its derivative gives the Maxwell line in the caloric curve $T(e)$ at $T_{tr}$. In the thermodynamic limit the intruder would disappear and $s(e)$ would approach the double tangent (Maxwell line) from below.
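The construction in the Figure 1 caption (concave hull, double tangent, intruder depth) can be sketched in a few lines. This is an illustrative sketch, not the MMMC analysis: `s_toy` is a hypothetical entropy per atom in arbitrary units with $k_B = 1$, and the concave hull is built with Andrew's monotone-chain algorithm, a standard choice made here for brevity. The widest hull segment is taken as the double tangent; its slope is $1/T_{tr}$, its width is $q_{lat} = e_3 - e_1$, and the largest gap between it and $s(e)$ is the intruder depth $\Delta s_{surf}$.

```python
import math

def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def concave_hull(points):
    """Upper (concave) hull of (e, s) points via Andrew's monotone chain."""
    hull = []
    for p in sorted(points):
        # Drop the middle point while it lies on or below the chord to p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull

def double_tangent(es, ss):
    """Return (T_tr, q_lat, depth) read off the widest segment of the concave hull."""
    hull = concave_hull(list(zip(es, ss)))
    # The double tangent bridges the convex intruder, so it is the hull
    # segment spanning the largest energy interval.
    (e1, s1), (e3, s3) = max(zip(hull, hull[1:]),
                             key=lambda seg: seg[1][0] - seg[0][0])
    slope = (s3 - s1) / (e3 - e1)   # ds/de = 1/T_tr with k_B = 1
    depth = max(s1 + slope * (e - e1) - s
                for e, s in zip(es, ss) if e1 <= e <= e3)
    return 1.0 / slope, e3 - e1, depth

# Hypothetical toy entropy per atom with a convex intruder near e = 0.
def s_toy(e):
    return 2.0 * e - 0.5 * e**2 - 0.5 * math.exp(-2.0 * e**2)

es = [-1.5 + 3.0 * k / 3000 for k in range(3001)]
T_tr, q_lat, depth = double_tangent(es, [s_toy(e) for e in es])
print(f"T_tr = {T_tr:.4f}, q_lat = e3 - e1 = {q_lat:.4f}, intruder depth = {depth:.4f}")
```

For this toy function the double tangent can also be found by hand: it has slope 2, touches $s$ at $e = \pm\sqrt{\ln 2/2}$, and the intruder depth is $(1-\ln 2)/4$, which the sampled result approaches as the grid is refined.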
In the cases studied here $\lambda_2 < 0$, but $\lambda_1$ can be positive or negative. If $\det(e_s,n_s)$ is positive ($\lambda_1 < 0$), the last two terms in eq.(8) go to 0, and we obtain the familiar result $f(T,\mu,V\to\infty) = e_s - \mu n_s - Ts_s$. I.e. the curvature $\lambda_1$ of the entropy surface $s(e,n,V)$ decides whether the grand-canonical ensemble agrees with the fundamental micro ensemble in the thermodynamic limit. If this is the case, $\ln[Z(T,\mu)]$ or $f(T,\mu)$ is analytic in $e^{\beta\mu}$, and due to Yang and Lee we have a single, stable phase. Otherwise, the Yang-Lee zeros reflect anomalous points or regions with $\lambda_1 > 0$ ($\det(e,n) < 0$). This is crucial. As $\det(e_s,n_s)$ can be studied for finite or even small systems as well, this is the only proper extension of phase transitions to "Small" systems.
4
The regions of positive curvature $\lambda_1$ of $s(e_s,n_s)$ correspond to phase transitions of first order
We will now discuss the physical origin of convex (upwards bending) intruders in the entropy surface in two examples. In table 1 we compare the "liquid-gas" phase transition in sodium clusters of a few hundred atoms with that of the bulk at 1 atm; cf. also fig. 1. Figure 2 shows how, for a small system (Potts $q = 3$ lattice gas with $50\times 50$ points), all phenomena of phase transitions can be studied from the topology of the determinant of curvatures (9) in the micro-canonical ensemble.

Table 1: Parameters of the liquid-gas transition of small sodium clusters (MMMC calculation [1]) in comparison with the bulk for rising number $N_0$ of atoms; $N_{surf}$ is the average number of surface atoms of all clusters together.

N₀      T_tr [K]   q_lat [eV]   s_boil   Δs_surf   N_surf   σ/T_tr
200     940        0.82         10.1     0.55      39.94    2.75
1000    990        0.91         10.7     0.56      98.53    5.68
3000    1095       0.94         9.9      0.44      186.6    7.07
bulk    1156       0.923        9.267    -         ∞        7.41
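The last column of Table 1 can be cross-checked against the relation $\sigma_{surf}/T_{tr} = \Delta s_{surf}\,N_0/N_{surf}$ given in the Figure 1 caption. The snippet below also tests the reading $s_{boil} = q_{lat}/(k_B T_{tr})$ (entropy per atom in units of $k_B$); that interpretation of the column is our assumption, not something stated in the text, and `K_B` is the CODATA value in eV/K. Agreement can only be as good as the rounding of the tabulated values.

```python
K_B = 8.617333262e-5  # Boltzmann constant in eV/K

# Rows of Table 1: N0 -> (T_tr [K], q_lat [eV], s_boil, Delta s_surf, N_surf, sigma/T_tr)
TABLE_1 = {
    200: (940, 0.82, 10.1, 0.55, 39.94, 2.75),
    1000: (990, 0.91, 10.7, 0.56, 98.53, 5.68),
    3000: (1095, 0.94, 9.9, 0.44, 186.6, 7.07),
}

def sigma_over_t(n0, ds_surf, n_surf):
    """sigma_surf / T_tr = Delta s_surf * N0 / N_surf, as in the Figure 1 caption."""
    return ds_surf * n0 / n_surf

def s_boil(q_lat, t_tr):
    """Assumed reading of the s_boil column: q_lat / (k_B T_tr) per atom."""
    return q_lat / (K_B * t_tr)

for n0, (t_tr, q_lat, sb, ds, n_surf, sig) in TABLE_1.items():
    print(f"N0={n0}: s_boil {s_boil(q_lat, t_tr):.2f} (table {sb}), "
          f"sigma/T_tr {sigma_over_t(n0, ds, n_surf):.2f} (table {sig})")
```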
5
Boltzmann's principle and non-equilibrium thermodynamics
Before we proceed we must comment on Einstein's attitude to the principle [11]: Originally, Boltzmann called $W$ the "Wahrscheinlichkeit" (probability), i.e. the relative time a system spends (along a time-dependent path) in a given region of 6N-dim. phase space. Our interpretation of $W$ as the number of "complexions" (Boltzmann's second interpretation) or quantum states (trace) with the same energy was criticized by Einstein [4] as artificial. It is exactly that criticized interpretation of $W$ which I use here and which works so excellently [1]. In section 7 I will come back to this fundamental point. After succeeding in deducing equilibrium statistics, including all phenomena of phase transitions, from Boltzmann's principle even for "Small" systems, i.e. non-extensive many-body systems, it is challenging to explore how far this "most conservative and restrictive way to thermodynamics" [9] is able to describe also the approach of (possibly "Small") systems to equilibrium and the Second Law of Thermodynamics. Thermodynamics describes the development of macroscopic features of many-body systems without specifying them microscopically in all details. Before we address the Second Law, we have to clarify what we mean by the label "macroscopic observable".
6
Macroscopic observables imply the "EPS-probability"
[Figure 2: contour plot of the curvature determinant $d(e,n)$ of the Potts-3 lattice gas; horizontal axis $e$ from −2 to 0, vertical axis from 0 to 1]
Figure 2: Contour plot of the curvature determinant of the Potts-3 lattice gas. Dark grey line: $d = 0$, boundary of the region of phase coexistence, the triangle $AP_mB$. Light grey line: minimum of $d(e,n)$ in the direction of the largest curvature, second order transition. In the triangle $AP_mC$: ordered (solid) phase. Above and to the right of the line $CP_mB$: disordered (gas) phase. The crossing $P_m$ of the boundary lines is a multi-critical point. The light grey region around the multi-critical point $P_m$ corresponds to a flat region of $d(e,n)\approx 0$.

A single point $\{q_i(t),p_i(t)\}_{i=1\dots N}$ in the N-body phase space corresponds to a detailed specification of the system with all degrees of freedom (d.o.f.) completely fixed at time $t$ (microscopic determination). Fixing only the total energy $E$ of an N-body system leaves the other $(6N-1)$ degrees of freedom unspecified. A second system with the same energy is most likely not in the same microscopic state as the first; it will be at another point in phase space, and the other d.o.f. will be different. I.e. the measurement of the total energy $H_N$, or of any other macroscopic observable $M$, determines a $(6N-1)$-dimensional sub-manifold $\mathcal E$ or $\mathcal M$ in phase space. All points in N-body phase space consistent with the given value of $E$ and volume $V$, i.e. all points in the $(6N-1)$-dimensional sub-manifold $\mathcal E(N,V)$ of phase space, are equally consistent with this measurement. $\mathcal E(N,V)$ is the microcanonical ensemble. This example tells us that any macroscopic measurement is incomplete and defines a sub-manifold of points in phase space, not a single point. An additional measurement of another macroscopic quantity $B\{q,p\}$ reduces $\mathcal E$ further to the cross-section $\mathcal E\cap B$, a $(6N-2)$-dimensional subset of points in $\mathcal E$ with the volume:
$$W(B,E,N,V) = \frac{1}{N!}\int\left(\frac{d^3q\,d^3p}{(2\pi\hbar)^3}\right)^N \epsilon_0\,\delta(E-H_N\{q,p\})\,\delta(B-B\{q,p\}) \tag{10}$$
If $H_N\{q,p\}$ as well as $B\{q,p\}$ are continuously differentiable functions of their arguments, which we assume in the following, $\mathcal E\cap B$ is closed. In the following we use $W$ for the Riemann or Liouville volume of a manifold. Microcanonical thermostatics gives the probability $P(B,E,N,V)$ to find the N-body system in the sub-manifold $\mathcal E\cap B(E,N,V)$:

$$P(B,E,N,V) = \frac{W(B,E,N,V)}{W(E,N,V)} = e^{\ln[W(B,E,N,V)]-S(E,N,V)} \tag{11}$$
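Eq. (11) can be illustrated with the simplest possible macroscopic observable. In this hypothetical toy example (not taken from the text) $B$ is the number of particles of an ideal gas found in the left half of the container. Because the momentum part of the energy shell is the same for every $B$, it cancels in the ratio, and $W(B)/W$ reduces to the binomial count $\binom{N}{B}/2^N$. Working with logarithms, as in (11), avoids overflow for large $N$.

```python
import math

def log_w(n_particles, n_left):
    """ln W(B) for B = number of particles in the left half of the box.

    For an ideal gas the momentum part of the energy shell is common to all B
    and cancels in eq. (11); what remains is the binomial count C(N, B).
    """
    return (math.lgamma(n_particles + 1) - math.lgamma(n_left + 1)
            - math.lgamma(n_particles - n_left + 1))

def eps_probability(n_particles, n_left):
    """P(B) = exp(ln W(B) - S), eq. (11), with S = ln 2**N after the same cancellation."""
    return math.exp(log_w(n_particles, n_left) - n_particles * math.log(2.0))

N = 1000
probs = [eps_probability(N, b) for b in range(N + 1)]
mean = sum(b * p for b, p in enumerate(probs))
width = math.sqrt(sum((b - mean) ** 2 * p for b, p in enumerate(probs)))
print(round(sum(probs), 12), round(mean, 6), round(width / N, 4))
```

The relative width $\sqrt N/(2N)$ shrinks only as $1/\sqrt N$, so for small $N$ the whole distribution $P(B)$, not just its peak, matters.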
The probability (11) is what Krylov seems to have had in mind [12], and it is what I will call the "ensemble probabilistic formulation of statistical mechanics" (EPS). Similarly, thermodynamics describes the development in time of some macroscopic observable $B\{q_t,p_t\}$ of a system which was specified at an earlier time $t_0$ by another macroscopic measurement $A\{q_0,p_0\}$. It is related to the volume of the sub-manifold $\mathcal M(t) = A(t_0)\cap B(t)\cap\mathcal E$:
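$$W(A,B,E,t) = \frac{1}{N!}\int\left(\frac{d^3q_t\,d^3p_t}{(2\pi\hbar)^3}\right)^N \delta(B-B\{q_t,p_t\})\,\delta(A-A\{q_0,p_0\})\,\epsilon_0\,\delta(E-H\{q_t,p_t\}) \tag{12}$$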
where $\{q_t(q_0,p_0),p_t(q_0,p_0)\}$ is the set of trajectories solving Hamilton's equations

$$\dot q_i = \frac{\partial H}{\partial p_i}, \qquad \dot p_i = -\frac{\partial H}{\partial q_i}, \qquad i = 1\dots N \tag{13}$$
with the initial conditions $\{q(t=t_0) = q_0,\ p(t=t_0) = p_0\}$. … However, here we are interested in formulating the Second Law for "Small" systems, i.e. we are interested in the whole distribution $P(B(t))$, not only in its mean value. Thermodynamics does not describe the temporal development of a single system (a single point in the 6N-dim. phase space). There is an important property of macroscopic measurements: whereas the macroscopic constraint $A\{q_0,p_0\}$ determines (usually) a compact region $A(t_0)$ in $\{q_0,p_0\}$, this need not be the case at later times $t\gg t_0$: $A(t)$ defined by $A\{q_0(q_t,p_t),p_0(q_t,p_t)\}$ …
such points may have intruded from the phase space complementary to $A(t_0)$. An illustrative example of this evolution of an initially compact sub-manifold into a fractal set is the baker transformation discussed in this context in refs. [13,14]. Then no macroscopic (incomplete) measurement at time $t=\infty$ can resolve $a_\infty$ from its immediate neighbors $a_n$ in phase space with distances $|a_n-a_\infty|$ less than any arbitrarily small $\delta$. In other words, at times $t\gg t_0$ no macroscopic measurement, with its incomplete information about $\{q_t,p_t\}$, can decide whether $\{q_0(q_t,p_t),p_0(q_t,p_t)\}\in A(t_0)$ or not. I.e. any macroscopic theory like thermodynamics can only deal with the closure of $A(t)$. If necessary, the sub-manifold $A(t)$ must be artificially closed to $\overline{A(t)}$, as developed further in section 8. Clearly, in this approach this is the physical origin of irreversibility. We come back to this in section 8.
7
On Einstein's objections against the EPS-probability
According to Abraham Pais ("Subtle is the Lord" [11]), Einstein was critical with regard to the definition of relative probabilities by eq.(11), Boltzmann's counting of "complexions". He considered it artificial and not corresponding to the immediate picture of probability used in the actual problem: "The word probability is used in a sense that does not conform to its definition as given in the theory of probability. In particular, cases of equal probability are often hypothetically defined in instances where the theoretical pictures used are sufficiently definite to give a deduction rather than a hypothetical assertion" [4]. He preferred to define probability by the relative time a system (a trajectory of a single point moving with time in the N-body phase space) spends in a subset of the phase space. However, is this really the immediate picture of probability used in statistical mechanics? This definition demands the ergodicity of the trajectory in phase space. As we discussed above, thermodynamics, like any other macroscopic theory, handles incomplete, macroscopic information about the N-body system. Consequently, it handles the temporal evolution of finite-sized sub-manifolds (ensembles), not of single points in phase space. The typical outcomes of macroscopic measurements are calculated. Nobody waits in a macroscopic measurement, e.g. of the temperature, long enough that an atom can cross the whole system. In this respect, I think the EPS version of statistical mechanics is closer to the experimental situation than the duration-time of a single trajectory. Moreover, in an experiment on a small system like a nucleus, the excited nucleus, which then may fragment statistically later on, is produced by multiple repetitions of scattering events, and statistical averages are taken. No ergodic covering of the whole phase space by a single trajectory in time is demanded.
At the high excitations of the nuclei in the fragmentation region, their lifetime would be too short for that. This is analogous to the statistics of a falling ball on a Galton nail-board, where a single trajectory does not touch all nails either but is random. Only after many repetitions is the smooth binomial distribution established. As I am discussing here the Second Law in finite systems, this is the correct scenario, not the time average over a single ergodic trajectory.
8
Fractal distributions in phase space, Second Law
Let us examine the following Gedanken experiment: Suppose the probability to find our system at points $\{q_t,p_t\}_1^N$ in phase space is uniformly distributed for times $t<t_0$ over the sub-manifold $\mathcal E(N,V_1)$ of the N-body phase space at energy $E$ and spatial volume $V_1$. At time $t>t_0$ we allow the system to spread over the larger volume $V_2>V_1$ without changing its energy. If the system is dynamically mixing, the majority of trajectories $\{q_t,p_t\}_1^N$ in phase space starting from points $\{q_0,p_0\}$ with $q_0\in V_1$ at $t_0$ will now spread over the larger volume $V_2$. Of course, the Liouvillean measure of the distribution $\mathcal M\{q_t,p_t\}$ in phase space at $t>t_0$ will remain the same, $=\mathrm{tr}[\mathcal E(N,V_1)]$ [15]. (The label $\{q_0\in V_1\}$ of the integral means that the positions $\{q_0\}_1^N$ are restricted to the volume $V_1$; the momenta $\{p_0\}_1^N$ are unrestricted.)

$$\mathrm{tr}[\mathcal M\{q_t(q_0,p_0),p_t(q_0,p_0)\}]_{\{q_0\in V_1\}} = \mathrm{tr}[\mathcal E(N,V_1)] \tag{14}$$

because of

$$\frac{\partial\{q_t,p_t\}}{\partial\{q_0,p_0\}} = 1 \tag{15}$$
But, as already argued by Gibbs, the distribution $\mathcal M\{q_t,p_t\}$ will be filamented like ink in water and will approach any point of $\mathcal E(N,V_2)$ arbitrarily closely. $\mathcal M\{q_t,p_t\}$ becomes dense in the new, larger $\mathcal E(N,V_2)$ for times sufficiently larger than $t_0$ (strictly, in the limit $t\to\infty$). The closure $\overline{\mathcal M}$ becomes equal to $\mathcal E(N,V_2)$. This is clearly expressed by Lebowitz [16,17]. In order to express this fact mathematically, we have to redefine Boltzmann's definition of entropy, eq.(1), and introduce the following fractal "measure" for integrals like (3) or (10):
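$$W(E,N,t\gg t_0) = \frac{1}{N!}\int_{\{q_0(q_t,p_t)\in V_1\}}\left(\frac{d^3q_t\,d^3p_t}{(2\pi\hbar)^3}\right)^N \epsilon_0\,\delta(E-H_N\{q_t,p_t\}) \tag{16}$$

With the transformation

$$\int (d^3q_t\,d^3p_t)^N\cdots = \int d\sigma_1\cdots d\sigma_{6N}\cdots \tag{17}$$

$$d\sigma_{6N} := \frac{1}{\|\nabla H\|}\sum_i\left(\frac{\partial H}{\partial q_i}dq_i + \frac{\partial H}{\partial p_i}dp_i\right) = \frac{1}{\|\nabla H\|}\,dE \tag{18}$$

$$\|\nabla H\| = \sqrt{\sum_i\left[\left(\frac{\partial H}{\partial q_i}\right)^2 + \left(\frac{\partial H}{\partial p_i}\right)^2\right]} \tag{19}$$

$$W(E,N,t\gg t_0) = \frac{\epsilon_0}{N!\,(2\pi\hbar)^{3N}}\int_{\{q_0(q_t,p_t)\in V_1\}}\frac{d\sigma_1\cdots d\sigma_{6N-1}}{\|\nabla H\|} \tag{20}$$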
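we replace $\mathcal M$ by its closure $\overline{\mathcal M}$ and now define:

$$W(E,N,t\gg t_0)\to\mathbf M(E,N,t\gg t_0) := \mathrm{vol}_{box}[\overline{\mathcal M}(E,N,t\gg t_0)] := \underline{\lim}_{\delta\to 0}\,\delta^d N_\delta[\mathcal M(E,N,t\gg t_0)] \tag{22}$$

with $\underline{\lim} = \inf[\lim]$, or symbolically:

$$\mathbf M(E,N,t\gg t_0) =: \frac{1}{N!}\;{}^{B}\!\!\int^{d}_{\{q_0(q_t,p_t)\in V_1\}}\left(\frac{d^3q_t\,d^3p_t}{(2\pi\hbar)^3}\right)^N\epsilon_0\,\delta(E-H_N) \tag{23}$$

$$= \frac{\epsilon_0}{N!\,(2\pi\hbar)^{3N}}\;{}^{B}\!\!\int^{d}_{\{q_0(q_t,p_t)\in V_1\}}\frac{d\sigma_1\cdots d\sigma_{6N-1}}{\|\nabla H_N\|} \tag{24}$$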
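[Figure 3: the compact set $\mathcal M(t_0)$ in $V_a$ ($t<t_0$) and its spread over $V_a+V_b$ ($t>t_0$), overlaid with the box-counting grid]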
Figure 3: The compact set $\mathcal M(t_0)$, left side, develops into an increasingly folded "spaghetti"-like distribution in phase space with rising time $t$. This figure shows only the early form of the distribution. At much later times it will become more and more fractal. The grid illustrates the boxes of the box-counting method. All boxes which overlap with $\mathcal M(t)$ are counted in $N_\delta$ in eq.(22).
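The mechanism of Figure 3 can be reproduced with the baker transformation mentioned in section 6 [13,14]. The sketch below is a two-dimensional stand-in, not a Hamiltonian many-body flow: the unit square plays the role of phase space, its left half plays the role of the initial compact set, and `box_fraction` is $N_\delta\,\delta^2$ from (22) at a finite box size $\delta = 2^{-k}$. The map preserves area, so the true measure of the evolved set stays 1/2 (Liouville), yet after a few steps its filaments enter every coarse box. Only boxes finer than the filaments still see the area 1/2, which is why the order of the limits $t\to\infty$ and $\delta\to 0$ matters.

```python
import random

def baker(x, y):
    """One step of the area-preserving baker transformation on [0, 1)^2."""
    b = int(2.0 * x)                  # 0 for x < 1/2, 1 otherwise
    return 2.0 * x - b, (y + b) / 2.0

def evolve(points, steps):
    for _ in range(steps):
        points = [baker(x, y) for x, y in points]
    return points

def box_fraction(points, k):
    """N_delta * delta**2 for delta = 2**-k: fraction of grid boxes hit by a point."""
    m = 2**k
    occupied = {(int(x * m), int(y * m)) for x, y in points}
    return len(occupied) / (m * m)

rng = random.Random(1)
# Initial compact set: uniform sample of the left half x < 1/2 (area 1/2).
start = [(0.5 * rng.random(), rng.random()) for _ in range(20000)]
later = evolve(start, 5)

print(box_fraction(start, 3))   # coarse boxes at t0
print(box_fraction(later, 3))   # coarse boxes after 5 steps: filaments reach every box
print(box_fraction(later, 5))   # boxes finer than the filaments: area 1/2 again
```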
Here ${}^{B}\!\!\int^{d}$ means that this integral should be evaluated via the box-counting volume (22), with $d = 6N-1$. This is illustrated by figure 3. With this extension of eq.(3), Boltzmann's entropy (1) is at time $t\to\infty$ equal to the logarithm of the larger phase space $W(E,N,V_2)$. This is the Second Law of Thermodynamics. Box-counting is also used in the definition of the Kolmogorov entropy, the average rate of entropy gain [18,19]. Of course, still at $t_0$
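$\mathcal M(t_0) = \overline{\mathcal M}(t_0) = \mathcal E(N,V_1)$:

$$\mathbf M(E,N,t_0) =: \frac{1}{N!}\;{}^{B}\!\!\int^{d}_{\{q_0\in V_1\}}\left(\frac{d^3q_0\,d^3p_0}{(2\pi\hbar)^3}\right)^N\epsilon_0\,\delta(E-H_N) \tag{25}$$

$$= \frac{1}{N!}\int_{\{q_0\in V_1\}}\left(\frac{d^3q_0\,d^3p_0}{(2\pi\hbar)^3}\right)^N\epsilon_0\,\delta(E-H_N) = W(E,N,V_1). \tag{26}$$

The box-counting volume is analogous to the standard method of determining the fractal dimension of a set of points [18], the box-counting dimension:

$$\dim_{box}[\mathcal M(E,N,t\gg t_0)] := \underline{\lim}_{\delta\to 0}\frac{\ln N_\delta[\mathcal M(E,N,t\gg t_0)]}{-\ln\delta} \tag{27}$$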
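Eqs. (22) and (27) can both be evaluated at finite $\delta$ with exact rational arithmetic. In this sketch (our own illustration, using Python's `fractions` module) the middle-thirds Cantor set gives the textbook box-counting dimension $\ln 2/\ln 3$, and a finite set of rationals with bounded denominators already has box-counting volume 1 as long as $1/\delta$ does not exceed the largest denominator, in line with the rational-number example used in the text. A single point, by contrast, has box volume $\delta$, which vanishes as $\delta\to 0$.

```python
import math
from fractions import Fraction

def box_count(points, delta):
    """N_delta: number of grid cells [j*delta, (j+1)*delta) containing a point."""
    return len({math.floor(p / delta) for p in points})

def box_volume(points, delta, d=1):
    """Finite-delta box-counting volume N_delta * delta**d, cf. eq. (22)."""
    return box_count(points, delta) * delta**d

def box_dimension(points, delta):
    """Finite-delta box-counting dimension ln N_delta / (-ln delta), cf. eq. (27)."""
    return math.log(box_count(points, delta)) / -math.log(delta)

def cantor_left_endpoints(level):
    """Left endpoints of the 2**level intervals of the middle-thirds Cantor set."""
    ends = [Fraction(0)]
    for j in range(1, level + 1):
        ends += [e + Fraction(2, 3**j) for e in ends]
    return ends

def rationals(max_den):
    """All p/q in [0, 1) with denominator q <= max_den (exact Fractions)."""
    return {Fraction(p, q) for q in range(1, max_den + 1) for p in range(q)}

cantor = cantor_left_endpoints(10)
for k in (2, 4, 6, 8):
    print(k, box_dimension(cantor, Fraction(1, 3**k)))   # ln 2 / ln 3 = 0.6309...

q_set = rationals(60)
print(box_volume(q_set, Fraction(1, 50)))               # 1: the boxes cover [0, 1)
print(box_volume({Fraction(1, 3)}, Fraction(1, 50)))    # 1/50, which -> 0 as delta -> 0
```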
Like the box-counting dimension, $\mathrm{vol}_{box}$ has the peculiarity that it is equal to the volume of the smallest closed covering set. For example, the box-counting volume of the set of rational numbers $\mathbb Q$ between 0 and 1 is $\mathrm{vol}_{box}\{\mathbb Q\} = 1$, and thus equal to the measure of the real numbers; cf. Falconer [18], section 3.1. This is the reason why $\mathrm{vol}_{box}$ is not a measure in the mathematical sense, because then we should have

$$\mathrm{vol}_{box}\Big(\bigcup_{i\in\mathbb Q}\mathcal M_i\Big) = \sum_{i\in\mathbb Q}\mathrm{vol}_{box}[\mathcal M_i] = 0, \tag{28}$$

hence the quotation marks for the box-counting "measure". Coming back to the end of section 6: the volume $W(A,B,\dots,t)$ of the relevant ensemble, the closure $\overline{\mathcal M(t)}$, must be "measured" by something like the box-counting "measure" (22, 23), with the box-counting integral ${}^{B}\!\!\int^{d}$ replacing the integral in eq.(3). Because the box-counting volume equals the volume of the smallest closed covering set, the new, extended definition of the phase-space integral, eq.(23), is identical to the old one, eq.(3), for compact sets like the equilibrium distribution $\mathcal E$. Therefore, one can simply replace the old Boltzmann definition of the number of complexions, and with it of the entropy, by the new one (23).
9
Conclusion
Macroscopic measurements $\mathcal M$ determine only very few of all 6N d.o.f. Any macroscopic theory like thermodynamics deals with the volumes $\mathbf M$ of the corresponding closed sub-manifolds $\overline{\mathcal M}$ in the 6N-dim. phase space, not with single points. The averaging over ensembles or finite sub-manifolds in phase space becomes especially important for the microcanonical ensemble of a finite system. Because of this necessarily coarse information, macroscopic measurements, and with them also macroscopic theories, are unable to distinguish fractal sets $\mathcal M$ from their closures $\overline{\mathcal M}$. Therefore, I make the conjecture: the proper manifolds determined by a macroscopic theory like thermodynamics are the closed $\overline{\mathcal M}$. However, an initially closed subset of points at time $t_0$ does not necessarily evolve again into a closed subset at $t\gg t_0$. I.e. the closure operation and the $t\to\infty$ limit do not commute, and the macroscopic dynamics becomes irreversible. The limits $\lim_{t\to\infty}$ and $\lim_{\delta\to 0}$ may be linked, e.g. as $\delta > \mathrm{const.}/t$, with the $\delta\to 0$ limit taken after the $t\to\infty$ limit. Here is the origin of the misunderstanding behind the famous reversibility paradoxes, which were invented by Loschmidt [20] and Zermelo [21,22] and which
bothered Boltzmann so much [23,24]. These paradoxes address trajectories of single points in the N-body phase space, which must return after Poincaré's recurrence time, or which must run backwards if all momenta are exactly reversed. Therefore, Loschmidt and Zermelo concluded that the entropy should decrease just as it was increasing before. The specification of a single point demands, of course, a microscopically exact specification of all 6N degrees of freedom, not a determination of a few macroscopic degrees of freedom only. No entropy is defined for a single point. By our formulation of thermo-statistics various non-trivial limiting processes can be avoided. One neither invokes the thermodynamic limit of a homogeneous system with infinitely many particles nor relies on the ergodic hypothesis of the equivalence of (very long) time averages and ensemble averages. The use of ensemble averages is justified directly by the very nature of macroscopic (incomplete) measurements. Coarse-graining appears as a natural consequence of this. The box-counting method mirrors the averaging over the overwhelming number of non-determined degrees of freedom. Of course, a fully consistent theory must use this averaging explicitly. Then one would not depend on the order of the limits $\lim_{\delta\to 0}\lim_{t\to\infty}$, as was tacitly assumed here. Presumably, the rise of the entropy can then already be seen at finite times, when the fractality of the distribution in phase space is not yet fully developed. Coarse-graining is then no longer a mathematical ad hoc assumption. Moreover, in the EPS formulation of statistical mechanics the Second Law is not linked to the thermodynamic limit, as was thought up to now [16,17].
Appendix
In the mathematical theory of fractals [18] one usually uses the Hausdorff measure or the Hausdorff dimension of the fractal [19]. This, however, would be wrong in Statistical Mechanics. Here I want to point out the difference between the box-counting "measure" and the proper Hausdorff measure of a manifold of points in phase space. Without going into too many mathematical details, we can make this clear again with the same example as above: the Hausdorff measure of the rational numbers in $[0,1]$ is 0, whereas the Hausdorff measure of the real numbers in $[0,1]$ is 1. Therefore, the Hausdorff measure of a set is a proper measure. The Hausdorff measure of the fractal distribution in phase space $\mathcal M(t\to\infty)$ is the same as that of $\mathcal M(t_0)$, $W(E,N,V_1)$. Measured by the Hausdorff measure, the phase space volume of the fractal distribution $\mathcal M(t\to\infty)$ is conserved and Liouville's theorem applies. This would demand that thermodynamics could distinguish any point inside the fractal from any point outside of it, independently of how close it is. This, however,
is impossible for any macroscopic theory that can only address macroscopic information where all unobserved degrees of freedom are averaged over. That is the deep reason why the box-counting "measure" must be taken and where irreversibility comes from.
Acknowledgement
I thank E.G.D. Cohen and Pierre Gaspard for detailed discussions.
References
1. D. H. E. Gross, Microcanonical thermodynamics: Phase transitions in "Small" systems. Lecture Notes in Physics (World Scientific, Singapore, 2000).
2. D. H. E. Gross and E. Votyakov, Phase transitions in "small" systems. Eur. Phys. J. B, 15, 115-126 (2000); http://arXiv.org/abs/condmat/9911257.
3. D. H. E. Gross, Micro-canonical statistical mechanics of some nonextensive systems. http://arXiv.org/abs/astro-ph/cond-mat/0004268 (2000).
4. A. Einstein, Über einen die Erzeugung und Verwandlung des Lichtes betreffenden heuristischen Gesichtspunkt. Annalen der Physik, 17, 132 (1905).
5. L. Boltzmann, Über die Beziehung eines allgemeinen mechanischen Satzes zum Hauptsatz der Wärmelehre. Sitzungsbericht der Akademie der Wissenschaften, Wien, 2, 67-73 (1877).
6. L. Boltzmann, Über die Begründung einer kinetischen Gastheorie auf anziehende Kräfte allein. Wiener Berichte, 89, 714 (1884).
7. E. Schrödinger, Statistical Thermodynamics, a Course of Seminar Lectures, delivered in January-March 1944 at the School of Theoretical Physics (Cambridge University Press, London, 1946).
8. Elliott H. Lieb and J. Yngvason, The physics and mathematics of the second law of thermodynamics. Physics Reports, 310, 1-96 (1999); cond-mat/9708200.
9. J. Bricmont, Science of chaos or chaos in science? Physicalia Magazine, Proceedings of the New York Academy of Science, to appear, 1-50 (2000).
10. D. H. E. Gross, Phase transitions in "small" systems - a challenge for thermodynamics. http://arXiv.org/abs/cond-mat/0006087, page 8 (2000).
11. A. Pais, Subtle is the Lord, chapter 4, pages 60-78 (Oxford University Press, Oxford, 1982).
12. N. S. Krylov, Works on the Foundation of Statistical Physics (Princeton University Press, Princeton, 1979).
13. R. F. Fox, Entropy evolution for the baker map. Chaos, 8, 462-465 (1998).
14. T. Gilbert, J. R. Dorfman, and P. Gaspard, Entropy production, fractals, and relaxation to equilibrium. Phys. Rev. Lett., 85, 1606 (2000); nlin.CD/000301.
15. H. Goldstein, Classical Mechanics (Addison-Wesley, Reading, Mass., 1959).
16. J. L. Lebowitz, Microscopic origins of irreversible macroscopic behavior. Physica A, 263, 516-527 (1999).
17. J. L. Lebowitz, Statistical mechanics: A selective review of two central issues. Rev. Mod. Phys., 71, S346-S357 (1999).
18. K. Falconer, Fractal Geometry - Mathematical Foundations and Applications (John Wiley & Sons, Chichester, New York, Brisbane, Toronto, Singapore, 1990).
19. E. W. Weisstein, Concise Encyclopedia of Mathematics (CRC Press, London, New York, Washington D.C., 1999; CD-ROM, edition 1, 20.5.99).
20. J. Loschmidt, Wiener Berichte, 73, 128 (1876).
21. E. Zermelo, Wied. Ann., 57, 778-784 (1896).
22. E. Zermelo, Über die mechanische Erklärung irreversiblen Vorgänge. Wied. Ann., 60, 392-398 (1897).
23. E. G. D. Cohen, Boltzmann and statistical mechanics. In Boltzmann's Legacy, 150 Years after his Birth, http://xxx.lanl.gov/abs/condmat/9608054 (Atti dell Accademia dei Lincei, Rome, 1997).
24. E. G. D. Cohen, Boltzmann and Statistical Mechanics, volume 371 of Dynamics: Models and Kinetic Methods for Nonequilibrium Many Body Systems, J. Karkheck, editor, 223-238 (Kluwer, Dordrecht, The Netherlands, 2000).
AN APPROACH TO QUANTUM PROBABILITY
STAN GUDDER
Department of Mathematics, University of Denver, Denver, Colorado 80208
sgudder@cs.du.edu
We present an approach to quantum probability that is motivated by the Feynman formalism. This approach shows that there is a realistic description of quantum mechanics and that nonrelativistic quantum theory can be derived from simple postulates of quantum probability. The basic concepts in this framework are measurements and actions. The measurements are similar to the dynamic variables of classical mechanics and the random variables of classical probability theory. The actions correspond to quantum mechanical states. An influence between configurations of a physical system is defined in terms of an action. The fundamental postulate of this approach is that the probability density at a measurement outcome x is the sum (or integral) of the influences between each pair of configurations that result in x upon executing the measurement.
1
Introduction
We shall discuss a new approach to quantum probability that combines a reformulation of the mathematical foundations of quantum mechanics and the basic tenets of probability theory. This approach is motivated by the Feynman formalism [1], and it answers various puzzling questions about traditional quantum mechanics. Some of these questions are the following.
1. Where does the quantum mechanical Hilbert space $H$ come from?
2. Why are states represented by unit vectors in $H$ and observables by self-adjoint operators on $H$?
3. Why does the probability have its postulated form?
4. Why do the position and momentum operators have their particular forms?
5. Why does a physical theory that must give real-valued results involve complex amplitudes or states?
6. Is there a realistic description of quantum mechanics?
Our philosophy is that quantum probability theory need not be the same as classical probability theory. That is, the probability need not be given by a measure. However, the predictions of quantum probability theory should agree
with experimental long-run relative frequencies. We shall show that there is a realistic description of quantum mechanics. In other words, a quantum system has properties independent of observation. We also show that nonrelativistic quantum mechanics can be derived from simple postulates of this approach. Our presentation is a modified version of the discussion in Gudder [2].
2
Formulation
We denote the set of possible configurations of a physical system $\mathcal S$ by $\Omega$ and call $\Omega$ a sample space. If $X$ is a measurement on $\mathcal S$, then executing $X$ results in a unique outcome depending on the configuration $\omega$ of $\mathcal S$. To be precise, we define a measurement to be a map $X$ from $\Omega$ onto its range $R(X)\subseteq\mathbb R$ satisfying:
(M1) $R(X)$ is the base space of a measure space $(R(X),\Sigma_X,\mu_X)$.
(M2) $X^{-1}(x)$ is the base space of a measurable space $(X^{-1}(x),\Sigma_X^x)$ for every $x\in R(X)$.
We call the elements of $R(X)$ X-outcomes, and the sets in $\Sigma_X$ are X-events. Note that $X^{-1}(x)$ corresponds to the set of configurations resulting in outcome $x$ when $X$ is executed, and we call $X^{-1}(x)$ the X-fiber over $x$. The measure $\mu_X$ represents an a priori weight due to our knowledge of the system (for example, we may know the energy of $\mathcal S$ or we might assume the energy has a certain value). In the case of total ignorance, the weight is taken to be counting measure in the discrete case and uniform measure in the continuous case. This framework gives a realistic theory because a configuration $\omega$ determines the properties of $\mathcal S$ independent of any particular observation. That is, $\omega$ determines the outcomes of all measurements simultaneously. Notice that measurements are similar to the dynamical variables of classical mechanics and the random variables of classical probability theory. The sample space $\Omega$ gives an underlying level of reality upon which traditional quantum mechanics can be constructed. If $X$ is a measurement, an X-action is a pair

$$(S,\{\mu_X^x : x\in R(X)\})$$

where $S\colon\Omega\to\mathbb R$ and $\mu_X^x$ is a measure on $(X^{-1}(x),\Sigma_X^x)$. As we shall see, actions correspond to quantum states. For simplicity, we frequently denote an action by $S$, and we remark that $S$ depends on our model of $\mathcal S$ and also on our knowledge of $\mathcal S$. We define the influence between $\omega,\omega'\in\Omega$ relative to $S$
by

$$F_S(\omega,\omega') = N_S^2\cos[S(\omega)-S(\omega')] \tag{1}$$
where $N_S > 0$ is a normalization constant. The appearance of the cosine in (1) is not arbitrary; it can be derived from the regularity conditions of continuity and causality [2,5]. We now make a fundamental reformulation of the probability concept [2,5]. We postulate that the probability density $P_{X,S}(x)$ of an X-outcome $x$ is the sum (or integral) of the influences between each pair of configurations that result in $x$ upon executing $X$. Precisely, we postulate that $F_S(\omega,\omega')$ is integrable and that
$$P_{X,S}(x) = \int_{X^{-1}(x)}\int_{X^{-1}(x)} F_S(\omega,\omega')\,\mu_X^x(d\omega)\,\mu_X^x(d\omega') \tag{2}$$
Also, to ensure that $P_{X,S}(x)$ is indeed a probability density, we assume that $P_{X,S}(x)$ is measurable with respect to $\Sigma_X$ and that
$$\int_{R(X)} P_{X,S}(x)\,\mu_X(dx) = 1 \tag{3}$$
Equation (3) can be employed to find $N_S$. To show that $P_{X,S}(x)\ge 0$, we have
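$$\begin{aligned} P_{X,S}(x) &= N_S^2\int_{X^{-1}(x)}\int_{X^{-1}(x)}[\cos S(\omega)\cos S(\omega') + \sin S(\omega)\sin S(\omega')]\,\mu_X^x(d\omega)\,\mu_X^x(d\omega') \\ &= N_S^2\left\{\left[\int_{X^{-1}(x)}\cos S(\omega)\,\mu_X^x(d\omega)\right]^2 + \left[\int_{X^{-1}(x)}\sin S(\omega)\,\mu_X^x(d\omega)\right]^2\right\} \ge 0 \end{aligned} \tag{4}$$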
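Equations (1) to (4) can be checked on a finite sample space, where every integral becomes a sum and all measures are counting measures. The configurations, fibers and action values below are hypothetical; the point is only that the double sum of influences (2), normalized with $N_S$ from (3), equals the squared form in (4) and therefore never goes negative.

```python
import math

def distribution(fibers, action):
    """P_{X,S}(x) from eq. (2) on a finite sample space with counting measures.

    fibers: outcome x -> list of configurations in the fiber X^{-1}(x)
    action: configuration w -> S(w)
    The influence is F_S(w, w') = N_S**2 * cos[S(w) - S(w')] (eq. (1));
    N_S**2 is fixed by the normalisation (3).
    """
    raw = {x: sum(math.cos(action[w] - action[v]) for w in ws for v in ws)
           for x, ws in fibers.items()}
    norm_sq = 1.0 / sum(raw.values())
    return {x: norm_sq * r for x, r in raw.items()}, norm_sq

# Hypothetical toy model: six configurations grouped into three outcomes.
fibers = {0: [0, 1], 1: [2, 3], 2: [4, 5]}
action = {0: 0.0, 1: 2.0, 2: 0.4, 3: 1.1, 4: 3.0, 5: -2.5}
probs, norm_sq = distribution(fibers, action)

# The double sum equals the squared form of eq. (4), so it is never negative.
for x, ws in fibers.items():
    c = sum(math.cos(action[w]) for w in ws)
    s = sum(math.sin(action[w]) for w in ws)
    print(x, probs[x], norm_sq * (c * c + s * s))
```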
We conclude that $P_{X,S}(x)$ is a probability density on $(R(X),\Sigma_X,\mu_X)$. If $B\in\Sigma_X$ is an X-event, we define the $(X,S)$-probability of $B$ by
$$P_{X,S}(B) = \int_B P_{X,S}(x)\,\mu_X(dx) \tag{5}$$
Then $P_{X,S}\colon\Sigma_X\to[0,1]$ is a probability measure on $(R(X),\Sigma_X)$ that we call the S-distribution of $X$. If $h\colon R(X)\to\mathbb R$ is $P_{X,S}$-integrable, then the
S-expectation of $h(X)$ is defined by

$$E_S(h(X)) = \int_{R(X)} h(x)\,P_{X,S}(dx) = \int_{R(X)} h(x)\,P_{X,S}(x)\,\mu_X(dx) \tag{6}$$
In particular, if $h$ is the identity function, the S-expectation of $X$ becomes

$$E_S(X) = \int_{R(X)} x\,P_{X,S}(x)\,\mu_X(dx) \tag{7}$$
Influence is a strictly quantum phenomenon that is not present in classical physics. In the classical limit, $F_S(\omega,\omega')$ approaches a delta function $\delta_\omega(\omega')$. In this limit $F_S(\omega,\omega') = 0$ for $\omega\ne\omega'$, and there is no influence between distinct configurations. We then have $P_{X,S}(x) = \mu_X^x(X^{-1}(x))$, which gives a classical probability framework. We can extend this theory to include expectations of other functions on $\Omega$. Let $g\colon\Omega\to\mathbb R$ be a function that is integrable along X-fibers. We define the $(X,S)$-expectation of $g$ at $x$ by
$$E_{X,S}(g)(x) = \int_{X^{-1}(x)}\int_{X^{-1}(x)} g(\omega)\,F_S(\omega,\omega')\,\mu_X^x(d\omega)\,\mu_X^x(d\omega') \tag{8}$$
This is the natural generalization of (2) from a probability density to an expectation density. If $E_{X,S}(g)$ is integrable, then the $(X,S)$-expectation of $g$ is given by
$$E_{X,S}(g) = \int_{R(X)} E_{X,S}(g)(x)\,\mu_X(dx) \tag{9}$$
In particular, if $g(\omega) = h(X(\omega))$, then $E_{X,S}(g)(x) = h(x)P_{X,S}(x)$ and

$$E_{X,S}(g) = \int_{R(X)} h(x)\,P_{X,S}(x)\,\mu_X(dx) = E_S(h(X))$$
This shows that (9) is an extension of (6). We can also use this formalism to compute probabilities of events in $\Omega$. Let $A\subseteq\Omega$, and denote the characteristic function of $A$ by $\chi_A$. If $\chi_A$ is integrable along X-fibers, we define, analogously as in classical probability theory, the $(X,S)$-pseudoprobability of $A$ by

$$\tilde P_{X,S}(A) = E_{X,S}(\chi_A)$$
It follows from (3) and (9) that $\tilde P_{X,S}(\Omega) = 1$ and $\tilde P_{X,S}$ is countably additive. However, $\tilde P_{X,S}$ may have negative values, which is why it is called a pseudoprobability. Nevertheless, there are σ-algebras of subsets of $\Omega$ on which $\tilde P_{X,S}$ is a probability measure. For example, if $A = X^{-1}(B)$ for $B\in\Sigma_X$, then it can be shown that $\tilde P_{X,S}(A) = P_{X,S}(B)$ [2]. Therefore, in this case $\tilde P_{X,S}$ reduces to the distribution $P_{X,S}$. We shall consider some less trivial examples later.
3
Wave Functions and Hilbert Space
This section employs the formalism of Section 2 to derive the wave functions and Hilbert space of traditional quantum mechanics. It is not necessary to do this because the needed probability formulas have been presented in Section 2. However, as we shall see, the Hilbert space formulation gives more convenient and concise notation. Applying (4) we obtain

$$P_{X,S}(x) = \left|\int_{X^{-1}(x)} N_S\,e^{iS(\omega)}\,\mu_X^x(d\omega)\right|^2 \tag{10}$$
We call the function

$$f_S(\omega) = N_S\,e^{iS(\omega)} \tag{11}$$
the S-amplitude function and define the $(X,S)$-wave function by

$$f_{X,S}(x) = \int_{X^{-1}(x)} f_S(\omega)\,\mu_X^x(d\omega) \tag{12}$$
From (10) and (12) we obtain

$$P_{X,S}(x) = |f_{X,S}(x)|^2 \tag{13}$$
We also have

$$F_S(\omega,\omega') = N_S^2\,\mathrm{Re}\,e^{iS(\omega)}e^{-iS(\omega')} = \mathrm{Re}\,f_S(\omega)f_S(\omega')^* \tag{14}$$
Equation (10) shows how the complex numbers arise in quantum mechanics. The complex numbers are not needed for the computation of $P_{X,S}$ because we can always write $F_S(\omega,\omega')$ in the form (1). They are merely a convenience that gives a simple and concise formula. Equation (11) gives the Feynman amplitude function, which we have now derived from deeper principles, and (12) is Feynman's prescription that the amplitude of an outcome $x$ is the sum (or
integral) of the amplitudes of the configurations (or alternatives) that result in $x$ when $X$ is executed. If $B\in\Sigma_X$, applying (5) and (13) gives
$$P_{X,S}(B) = \int_B |f_{X,S}(x)|^2\,\mu_X(dx) \tag{15}$$
and this is the usual probabilistic formula of traditional quantum mechanics. It follows from (3) that $f_{X,S}$ is a unit vector in the Hilbert space $L^2(R(X),\Sigma_X,\mu_X)$, and this derives the quantum Hilbert space and the vector form for a state. If $A_X$ is a set of X-actions, then the Hilbert space $H_X\subseteq L^2(R(X),\Sigma_X,\mu_X)$ generated by the set of wave functions $\{f_{X,S}: S\in A_X\}$ is called an X-Hilbert space. Some X-actions may not be relevant for physical reasons, so we may want $A_X$ to be a proper subset of the set of all X-actions. If $g\colon\Omega\to\mathbb R$ is integrable along X-fibers and $S\in A_X$, we define the $(X,S)$-amplitude average of $g$ at $x$ by
= [
g(u)fs(u>)fx(dLj)
l
= NS [
Jx- (x)
g{u)eiS^nx{d")
JX-i(x)
(16) Applying (8) and (14), we obtain £x,s(ff)(s)=Re =
/
g(Lj)fs(cj)»x(du)
[
/s(^')>i(^')
Befx,s(g)(x)fx,s{x)*
It follows from (9) that Ex,s(g)='Re(fx,s(g),fx,s)
(17)
Define the linear operator $\hat g$ on $H_X$ by $\hat g f_{X,S}(x) = f_{X,S}(g)(x)$ and extend by linearity. If $\hat g$ is self-adjoint on $H_X$, we call g an X-observable, and we have
$$E_{X,S}(g) = \langle \hat g f_{X,S}, f_{X,S} \rangle \qquad (18)$$
for all $S \in A_X$. We then say that g is represented by the self-adjoint operator $\hat g$ on $H_X$. This derives the representation of observables by self-adjoint operators.
For a simple example of a representation, let $g\colon \Omega \to \mathbb{R}$ be a constant function $g(\omega) = c$. Then (16) gives
$$f_{X,S}(g)(x) = c \int_{X^{-1}(x)} f_S(\omega)\, \mu_x^X(d\omega) = c\, f_{X,S}(x)$$
Hence, g is an X-observable and is represented by the self-adjoint operator $cI$. As another example, letting $g = X$ we have by (16) that
$$f_{X,S}(X)(x) = x\, f_{X,S}(x)$$
It follows that X is represented by the self-adjoint operator $\hat X$ on $H_X$ given by $\hat X u(x) = x\, u(x)$. We conclude that $H_X$ is a Hilbert space in which X is "diagonal." More generally, since
$$f_{X,S}(h(X))(x) = h(x)\, f_{X,S}(x) \qquad (19)$$
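we see that $h(X)$ is represented by the self-adjoint operator $\widehat{h(X)}u(x) = h(x)u(x)$. Moreover, the spectral measure $\hat P_X$ of $\hat X$ is given by $\hat P_X(B)u(x) = \chi_B(x)u(x)$, and applying (15) gives
$$P_{X,S}(B) = \langle \hat P_X(B) f_{X,S}, f_{X,S} \rangle = \|\hat P_X(B) f_{X,S}\|^2$$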
which is again a standard probabilistic formula. Finally, for $A \subseteq \Omega$ the (X, S)-pseudoprobability becomes, by (17),
$$\hat P_{X,S}(A) = \operatorname{Re}\, \langle f_{X,S}(\chi_A), f_{X,S} \rangle \qquad (20)$$
where by (16) we have
$$f_{X,S}(\chi_A)(x) = \int_{X^{-1}(x) \cap A} f_S(\omega)\, \mu_x^X(d\omega) = N_S \int_{X^{-1}(x) \cap A} e^{iS(\omega)}\, \mu_x^X(d\omega) \qquad (21)$$

4 Spin
We now illustrate the framework presented in the last two sections with a model for spin-1/2 measurements. Fix a direction corresponding to the z axis and assume that the spin $j_z$ in the z direction is known (either 1/2 or −1/2). Let $\omega \in [0,\pi]$ denote a direction whose angle to the z axis is ω. By symmetry, the spin distribution should depend only on ω. Let $\Omega = [0,\pi]$, $\theta \in \Omega$, and let $X\colon \Omega \to \{-1/2, 1/2\}$ be the function
$$X(\omega) = \begin{cases} -1/2, & \omega \in [0,\theta] \\ 1/2, & \omega \in (\theta,\pi] \end{cases}$$
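We make X into a measurement by defining $\mu_X(\{-1/2\}) = \mu_X(\{1/2\}) = 1$ and endowing $X^{-1}(-1/2) = [0,\theta]$ and $X^{-1}(1/2) = (\theta,\pi]$ with the usual Borel structure. The function X corresponds to a spin-1/2 measurement in the θ direction. Letting θ vary, we obtain an infinite number of spin measurements, each applied in a different direction. Observe that a sample point $\omega \in \Omega$ determines the spin in every direction simultaneously. For $j_z = 1/2$ we define the X-action $(S, \{\mu_{-1/2}^X, \mu_{1/2}^X\})$ given by $S(\omega) = \omega$, where $\mu_{-1/2}^X$ and $\mu_{1/2}^X$ are $\mu/2$, with μ Lebesgue measure restricted to $X^{-1}(-1/2)$ and $X^{-1}(1/2)$, respectively. We then have $F_S(\omega,\omega') = \cos(\omega - \omega')$ (we shall see that $N_S = 1$). The probabilities become
$$P_{X,S}(-1/2) = \tfrac14 \int_0^\theta\!\!\int_0^\theta \cos(\omega - \omega')\, d\omega\, d\omega' = \tfrac14 \Big[\int_0^\theta \cos\omega\, d\omega\Big]^2 + \tfrac14 \Big[\int_0^\theta \sin\omega\, d\omega\Big]^2 = \tfrac14 \sin^2\theta + \tfrac14 (1 - \cos\theta)^2 = \sin^2\tfrac{\theta}{2} \qquad (22)$$
$$P_{X,S}(1/2) = \tfrac14 \int_\theta^\pi\!\!\int_\theta^\pi \cos(\omega - \omega')\, d\omega\, d\omega' = \tfrac14 \Big[\int_\theta^\pi \cos\omega\, d\omega\Big]^2 + \tfrac14 \Big[\int_\theta^\pi \sin\omega\, d\omega\Big]^2 = \tfrac14 \sin^2\theta + \tfrac14 (1 + \cos\theta)^2 = \cos^2\tfrac{\theta}{2} \qquad (23)$$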
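The computation in (22) and (23) can be reproduced numerically. This sketch works under the model stated above (Lebesgue measure halved on each fiber, $S(\omega) = \omega$), uses the same factorization $\cos(\omega - \omega') = \cos\omega\cos\omega' + \sin\omega\sin\omega'$, and evaluates the single integrals with a midpoint rule; the node count and the sample angles are arbitrary choices.

```python
import math


def fiber_probability(a, b, n=400):
    """(1/4) * double integral of cos(w - w') over [a, b]^2, as in (22)-(23).

    Uses cos(w - w') = cos w cos w' + sin w sin w', so the double integral
    becomes a sum of squares of single integrals (midpoint rule, n nodes).
    """
    h = (b - a) / n
    nodes = [a + (k + 0.5) * h for k in range(n)]
    c = sum(math.cos(w) for w in nodes) * h
    s = sum(math.sin(w) for w in nodes) * h
    return 0.25 * (c * c + s * s)  # mu/2 on each fiber contributes (1/2)^2


results = {}
for theta in (0.3, 1.0, 2.5):
    p_minus = fiber_probability(0.0, theta)      # P_{X,S}(-1/2)
    p_plus = fiber_probability(theta, math.pi)   # P_{X,S}(1/2)
    results[theta] = (p_minus, p_plus)
    print(theta, p_minus, math.sin(theta / 2) ** 2, p_plus, math.cos(theta / 2) ** 2)
```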
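Since $P_{X,S}(-1/2) + P_{X,S}(1/2) = 1$, we see that $N_S = 1$. Notice that (22) and (23) are the usual probability distribution for spin in the θ direction when $j_z = 1/2$. For $j_z = -1/2$ we define the X-action $(S', \{\nu_{-1/2}^X, \nu_{1/2}^X\})$ given by $S'(\omega) = \omega$ for $\omega \in (0,\pi)$ and $S'(\omega) = -\pi/2$ for $\omega \in \{0,\pi\}$, and $\nu_{-1/2}^X = \delta_0 + \mu/2$, $\nu_{1/2}^X = \delta_\pi + \mu/2$, where $\delta_0$, $\delta_\pi$ are the Dirac point measures at 0 and π, respectively. A similar, but more tedious, calculation gives
$$P_{X,S'}(-1/2) = \cos^2\tfrac{\theta}{2}, \qquad P_{X,S'}(1/2) = \sin^2\tfrac{\theta}{2}$$
which is the usual distribution for spin in the θ direction when $j_z = -1/2$. We now examine the wave functions and Hilbert space corresponding to this model. The S-amplitude function becomes $f_S(\omega) = e^{i\omega}$, and the (X, S)-wave function $f_{X,S}$ is given by
$$f_{X,S}(-1/2) = \tfrac12 \int_0^\theta e^{i\omega}\, d\omega = \tfrac{i}{2}\,(1 - e^{i\theta}), \qquad f_{X,S}(1/2) = \tfrac12 \int_\theta^\pi e^{i\omega}\, d\omega = \tfrac{i}{2}\,(1 + e^{i\theta})$$
The S'-amplitude function becomes $f_{S'}(\omega) = e^{i\omega}$ for $\omega \in (0,\pi)$ and $f_{S'}(\omega) = -i$ for $\omega \in \{0,\pi\}$, and the (X, S')-wave function $f_{X,S'}$ is given by
$$f_{X,S'}(-1/2) = \int_{[0,\theta]} f_{S'}(\omega)\, \nu_{-1/2}^X(d\omega) = -i + \tfrac12 \int_0^\theta e^{i\omega}\, d\omega = -\tfrac{i}{2}\,(1 + e^{i\theta})$$
$$f_{X,S'}(1/2) = \int_{(\theta,\pi]} f_{S'}(\omega)\, \nu_{1/2}^X(d\omega) = -i + \tfrac12 \int_\theta^\pi e^{i\omega}\, d\omega = -\tfrac{i}{2}\,(1 - e^{i\theta})$$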
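The (X, S')-wave function can also be checked numerically, treating each Dirac atom as a point mass carrying the amplitude $e^{iS'} = -i$ and the Lebesgue part by a midpoint rule. As an added illustration not taken from the text, the sketch also evaluates the pseudoprobability (20), (21) of a small interval A = [1.4, 1.5] inside $X^{-1}(-1/2)$ at θ = π/2, which comes out negative (about −0.0218); this is one concrete instance of the remark that a pseudoprobability may take negative values. The interval, θ and the node count are arbitrary choices.

```python
import cmath
import math

# e^{iS'} at the atoms 0 and pi, where S' = -pi/2; this equals -i.
ATOM_AMPLITUDE = cmath.exp(-1j * math.pi / 2)


def smooth_part(a, b, n=2000):
    """(1/2) * integral of e^{iw} over [a, b]: the mu/2 part of nu (midpoint rule)."""
    h = (b - a) / n
    return sum(cmath.exp(1j * (a + (k + 0.5) * h)) for k in range(n)) * h / 2


def wave_function_s_prime(theta):
    """(f_{X,S'}(-1/2), f_{X,S'}(1/2)): Dirac atom at 0 (resp. pi) plus mu/2."""
    return (ATOM_AMPLITUDE + smooth_part(0.0, theta),
            ATOM_AMPLITUDE + smooth_part(theta, math.pi))


theta = math.pi / 2
f_minus, f_plus = wave_function_s_prime(theta)
closed_minus = -0.5j * (1 + cmath.exp(1j * theta))   # -(i/2)(1 + e^{i theta})
closed_plus = -0.5j * (1 - cmath.exp(1j * theta))    # -(i/2)(1 - e^{i theta})
print(abs(f_minus - closed_minus), abs(f_plus - closed_plus))
print(abs(f_minus) ** 2, abs(f_plus) ** 2)  # cos^2(theta/2) and sin^2(theta/2)

# Pseudoprobability (20)-(21) of A = [1.4, 1.5], a subset of X^{-1}(-1/2) with no atom,
# so only the mu/2 part contributes and only x = -1/2 enters the inner product.
p_hat_A = (smooth_part(1.4, 1.5) * f_minus.conjugate()).real
print(p_hat_A)  # negative, about -0.0218
```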
The X-Hilbert space is clearly $\mathbb{C}^2$, and, up to constant phase factors, we can represent $f_{X,S}$ and $f_{X,S'}$ in $\mathbb{C}^2$ by the unit vectors
$$v_S = \tfrac12\,(1 - e^{i\theta},\, 1 + e^{i\theta}), \qquad v_{S'} = \tfrac12\,(1 + e^{i\theta},\, 1 - e^{i\theta})$$
Notice that $v_S \perp v_{S'}$. Also, when θ = 0, $v_S = (0,1)$ and $v_{S'} = (1,0)$, which are the usual eigenvectors of the spin-1/2 operator in the z direction. We can treat this case as a measurement and the general X as an observable. It can be shown that the matrix for X in the standard basis (1,0), (0,1) becomes
$$\hat X = \frac12 \begin{pmatrix} -\cos\theta & -i\sin\theta \\ i\sin\theta & \cos\theta \end{pmatrix} = -\frac{\cos\theta}{2} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} + \frac{\sin\theta}{2} \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}$$
which is the usual form of a spin-1/2 matrix in the direction θ. We can extend this analysis to higher-order spins.³ Moreover, this framework gives a realistic model for the Bohm version of the EPR problem.⁴ Bell's theorem is not contradicted, because Bell's inequalities are derived using classical probability theory, whereas we have employed quantum probability theory.
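A minimal consistency check of the spin-1/2 matrix, assuming the form $\tfrac12\begin{pmatrix} -\cos\theta & -i\sin\theta \\ i\sin\theta & \cos\theta \end{pmatrix}$ in the basis (1,0), (0,1), where (0,1) is the $j_z = 1/2$ state. The sketch confirms that $v_S$ and $v_{S'}$ are orthonormal, that the normalized vector proportional to $(i\sin\theta, 1 - \cos\theta)$ is an eigenvector with eigenvalue −1/2, and that its overlaps with (0,1) and (1,0) reproduce $\sin^2(\theta/2)$ from (22) and $\cos^2(\theta/2)$ for $j_z = -1/2$. The eigenvector formula is worked out here rather than quoted from the text, and θ = 1.1 is an arbitrary test angle.

```python
import cmath
import math


def spin_matrix(theta):
    """The 2x2 matrix for X used here: (1/2)[[-cos t, -i sin t], [i sin t, cos t]]."""
    c, s = math.cos(theta), math.sin(theta)
    return [[-0.5 * c, -0.5j * s], [0.5j * s, 0.5 * c]]


def mat_vec(m, v):
    """Plain 2x2 matrix-vector product."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def inner(u, v):
    """<u, v>: linear in u, conjugate-linear in v."""
    return u[0] * v[0].conjugate() + u[1] * v[1].conjugate()


theta = 1.1
e = cmath.exp(1j * theta)
v_s = [0.5 * (1 - e), 0.5 * (1 + e)]
v_s_prime = [0.5 * (1 + e), 0.5 * (1 - e)]

# Normalized eigenvector of the matrix for the eigenvalue -1/2.
u = [1j * math.sin(theta), complex(1 - math.cos(theta))]
norm = math.sqrt(inner(u, u).real)
u = [x / norm for x in u]
residual = [a + 0.5 * b for a, b in zip(mat_vec(spin_matrix(theta), u), u)]

up, down = [0j, 1 + 0j], [1 + 0j, 0j]   # z-basis states for j_z = 1/2 and j_z = -1/2
p_minus_given_up = abs(inner(u, up)) ** 2      # compare with (22): sin^2(theta/2)
p_minus_given_down = abs(inner(u, down)) ** 2  # compare with cos^2(theta/2)
print(abs(inner(v_s, v_s)), abs(inner(v_s, v_s_prime)), max(abs(r) for r in residual))
print(p_minus_given_up, p_minus_given_down)
```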
5 Traditional Quantum Mechanics
We now show that this formalism contains traditional nonrelativistic quantum mechanics. For simplicity, we consider a single spinless particle in one dimension, although this work easily generalizes to three dimensions. We take our sample space to be the phase space
$$\Omega = \mathbb{R}^2 = \{(q,p) : q, p \in \mathbb{R}\}$$
The two most important measurements are the position and momentum, given by $Q(q,p) = q$ and $P(q,p) = p$, respectively. However, as is frequently done in quantum mechanics, we shall investigate the Q-representation of the system. In this case, Q is considered a measurement and $P\colon \Omega \to \mathbb{R}$ is viewed as a function on Ω which, as we shall show, is a Q-observable. Each Q-fiber $Q^{-1}(q) = \{(q,p) : p \in \mathbb{R}\}$ can be identified with $\mathbb{R}$. We make Q a measurement by endowing its range $R(Q) = \mathbb{R}$ with Lebesgue measure and its fibers with the usual Borel structure of $\mathbb{R}$. Only certain Q-actions $(S, \{\mu_q^Q : q \in \mathbb{R}\})$ correspond to traditional quantum states, and these can be derived from natural postulates. We assume that $\mu_q^Q$ is absolutely continuous relative to Lebesgue measure on $\mathbb{R}$ and that $\mu_q^Q$ is independent of q. This is because sets of Lebesgue measure zero are too small to have any effect on the outcomes of position measurements, and there is no a priori reason to distinguish between Q-fibers. It follows from the Radon-Nikodym theorem that there exists a nonnegative Lebesgue measurable function $\xi\colon \mathbb{R} \to \mathbb{R}$ such that
$$\mu_q^Q(dp) = (2\pi\hbar)^{-1/2}\, \xi(p)\, dp \qquad (24)$$
We take $S\colon \Omega \to \mathbb{R}$ to have the form
$$S(q,p) = \frac{qp}{\hbar} + \eta(p) \qquad (25)$$
This form is natural because qp is the classical action and adding a function of momentum gives a quantum fluctuation. We could also add a function of q, but it is easy to see that this would just multiply the wave function by a constant phase, which would not alter the probabilistic formulas. Denote by $A_Q$ the set of Q-actions that have the form (24), (25). Applying (12) for $S \in A_Q$, we find that the (Q, S)-wave function becomes
$$f_{Q,S}(q) = (2\pi\hbar)^{-1/2} \int \xi(p)\, e^{i\eta(p)}\, e^{iqp/\hbar}\, dp$$
Defining
$$\phi(p) = \xi(p)\, e^{i\eta(p)} \qquad (26)$$
and denoting the inverse Fourier transform by ${}^{\vee}$, we have
$$f_{Q,S}(q) = (2\pi\hbar)^{-1/2} \int \phi(p)\, e^{iqp/\hbar}\, dp = \phi^{\vee}(q) \qquad (27)$$
In order for (3) to be satisfied, $f_{Q,S}$ must be a unit vector in $L^2(\mathbb{R}, dq)$ or, equivalently, $\phi(p)$ must be a unit vector in $L^2(\mathbb{R}, dp)$. However, every vector in $L^2(\mathbb{R}, dp)$ has the form (26) for some functions $\xi\colon \mathbb{R} \to \mathbb{R}^+$, $\eta\colon \mathbb{R} \to \mathbb{R}$. It follows that the Q-Hilbert space becomes the traditional Hilbert space $H_Q = L^2(\mathbb{R}, dq)$ and $f_{Q,S}$ is the usual wave function (or state). Let $(S, \{\mu_q^Q : q \in \mathbb{R}\})$ be a fixed Q-action in $A_Q$ of the form (24), (25), let $\phi(p) = \xi(p)e^{i\eta(p)}$, and let $\psi(q) = f_{Q,S}(q)$. Applying (16) and (27), we have
$$f_{Q,S}(P)(q) = (2\pi\hbar)^{-1/2} \int p\, \phi(p)\, e^{iqp/\hbar}\, dp = -i\hbar \frac{d}{dq}\, (2\pi\hbar)^{-1/2} \int \phi(p)\, e^{iqp/\hbar}\, dp = -i\hbar \frac{d\psi}{dq}(q)$$
More generally, if n is a positive integer, we obtain
$$f_{Q,S}(P^n)(q) = \left(-i\hbar \frac{d}{dq}\right)^{n} \psi(q) \qquad (28)$$
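As a numerical illustration of (28) with n = 1, the sketch below takes units with ħ = 1 and a hypothetical Gaussian amplitude φ(p) with η(p) = 0 and mean momentum 0.7 (all assumptions). It computes $f_{Q,S}(P)(q)$ directly from (16) as the inverse Fourier transform of pφ(p), and compares it with $-i\hbar\, d\psi/dq$ obtained by a central finite difference of $\psi = \phi^{\vee}$. For this Gaussian one can also check the closed form $(p_0 + iq)\psi(q)$.

```python
import cmath
import math

HBAR = 1.0  # illustrative units (assumption)
P0 = 0.7    # hypothetical mean momentum of the Gaussian amplitude


def phi(p):
    """Normalized Gaussian momentum amplitude: xi(p) Gaussian, eta(p) = 0 (an assumption)."""
    return math.pi ** -0.25 * math.exp(-((p - P0) ** 2) / 2)


def inverse_fourier(g, q, lo=-12.0, hi=12.0, n=4000):
    """(2 pi hbar)^(-1/2) * integral of g(p) e^{iqp/hbar} dp, midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    total = 0j
    for k in range(n):
        p = lo + (k + 0.5) * h
        total += g(p) * cmath.exp(1j * q * p / HBAR)
    return total * h / math.sqrt(2 * math.pi * HBAR)


def psi(q):
    """Wave function psi = phi^vee, cf. (27)."""
    return inverse_fourier(phi, q)


q, dq = 0.4, 1e-4
lhs = inverse_fourier(lambda p: p * phi(p), q)              # f_{Q,S}(P)(q) via (16)
rhs = -1j * HBAR * (psi(q + dq) - psi(q - dq)) / (2 * dq)   # -i hbar dpsi/dq
print(abs(lhs - rhs))
```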
Moreover, applying (18) we have
$$E_{Q,S}(P^n) = \int \left[\left(-i\hbar \frac{d}{dq}\right)^{n} \psi(q)\right] \psi(q)^*\, dq$$
which is the usual quantum expectation formula. We conclude from (28) that $P^n$ is a Q-observable and is represented by the operator $(-i\hbar\, d/dq)^n$. Moreover, if $V\colon \mathbb{R} \to \mathbb{R}$ is measurable, we see from (19) that $V(Q)$ is a Q-observable and is represented by the operator $\widehat{V(Q)}u(q) = V(q)u(q)$. This, together with our observation concerning $P^n$, gives a derivation of the Bohr correspondence principle. We now consider probability distributions. We have already seen in (15) that
$$P_{Q,S}(B) = \int_B |\psi(q)|^2\, dq$$
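which is the usual distribution of Q. It is more interesting to compute the probability of $A = P^{-1}(B)$ for the momentum function P. We have from (21) that
$$f_{Q,S}(\chi_A)(q) = (2\pi\hbar)^{-1/2} \int_B \phi(p)\, e^{iqp/\hbar}\, dp = (\chi_B \phi)^{\vee}(q)$$
Hence, by (20) and the Plancherel formula, we obtain
$$\hat P_{Q,S}[P^{-1}(B)] = \operatorname{Re} \int (\chi_B\phi)^{\vee}(q)\, \psi(q)^*\, dq = \operatorname{Re} \int \chi_B(p)\, \phi(p)\, \phi(p)^*\, dp = \int_B |\phi(p)|^2\, dp$$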
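The momentum distribution can be checked the same way. Under the same illustrative assumptions (ħ = 1, Gaussian φ centered at 0.7), this sketch evaluates the pseudoprobability of $P^{-1}(B)$ for B = [0, 1] through the q-integral in (20), and compares it with $\int_B |\phi(p)|^2 dp$, which for this Gaussian has the closed form $\tfrac12[\operatorname{erf}(0.3) + \operatorname{erf}(0.7)]$; it also checks that ψ has unit norm. Grid sizes and integration ranges are arbitrary choices, so agreement is only to quadrature accuracy.

```python
import cmath
import math

HBAR = 1.0  # illustrative units (assumption)
P0 = 0.7    # hypothetical mean momentum


def phi(p):
    """Normalized Gaussian momentum amplitude with eta(p) = 0 (an assumption)."""
    return math.pi ** -0.25 * math.exp(-((p - P0) ** 2) / 2)


def inverse_fourier(g, q, lo, hi, n):
    """(2 pi hbar)^(-1/2) * integral of g(p) e^{iqp/hbar} over [lo, hi], midpoint rule."""
    h = (hi - lo) / n
    total = 0j
    for k in range(n):
        p = lo + (k + 0.5) * h
        total += g(p) * cmath.exp(1j * q * p / HBAR)
    return total * h / math.sqrt(2 * math.pi * HBAR)


# Outer q-integral by the midpoint rule on [-8, 8].
nq, q_lo, q_hi = 400, -8.0, 8.0
hq = (q_hi - q_lo) / nq
norm_q = 0.0
overlap = 0j
for k in range(nq):
    q = q_lo + (k + 0.5) * hq
    psi_q = inverse_fourier(phi, q, -10.0, 11.0, 1000)   # psi(q) = phi^vee(q)
    chi_b_q = inverse_fourier(phi, q, 0.0, 1.0, 200)      # (chi_B phi)^vee(q), B = [0, 1]
    norm_q += abs(psi_q) ** 2 * hq
    overlap += chi_b_q * psi_q.conjugate() * hq

p_hat = overlap.real                                   # pseudoprobability of P^{-1}(B), cf. (20)
p_direct = 0.5 * (math.erf(1 - P0) + math.erf(P0))    # integral over B of |phi(p)|^2
print(norm_q, p_hat, p_direct)
```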
Again, this is the usual momentum distribution. This gives an example in which $\hat P_{Q,S}$ is an actual probability measure on a σ-algebra of subsets of Ω. Until now we have treated time as fixed. We now briefly consider dynamics. Let $\psi(q,t)$ be a smooth function. Our previous formulas hold with $\psi(q)$ replaced by $\psi(q,t)$ and $\mu_q^Q$ replaced by $\mu_{q,t}^Q$. We now derive Schrödinger's equation from Hamilton's equation of classical mechanics, $dp/dt = -\partial H/\partial q$. Suppose the energy function has the form
$$H(q,p) = \frac{p^2}{2m} + V(q)$$
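We assume that Hamilton's equation holds in the amplitude average. Applying (16), we have
$$\frac{\partial}{\partial t} \int p\, f_S(q,p,t)\, \mu_{q,t}^Q(dp) = -\frac{\partial}{\partial q} \int H(q,p)\, f_S(q,p,t)\, \mu_{q,t}^Q(dp)$$
Hence
$$\frac{\partial}{\partial t} \int p\, \phi(p,t)\, e^{iqp/\hbar}\, dp = -\frac{\partial}{\partial q} \int H(q,p)\, \phi(p,t)\, e^{iqp/\hbar}\, dp$$
Applying (28) and (19) gives
$$\frac{\partial}{\partial t}\left(-i\hbar \frac{\partial \psi}{\partial q}\right) = -\frac{\partial}{\partial q}\left(-\frac{\hbar^2}{2m} \frac{\partial^2 \psi}{\partial q^2} + V(q)\,\psi\right)$$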
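Interchanging the order of differentiation on the left side of this equation and integrating with respect to q gives Schrödinger's equation,
$$i\hbar \frac{\partial \psi}{\partial t} = -\frac{\hbar^2}{2m} \frac{\partial^2 \psi}{\partial q^2} + V(q)\,\psi$$

6 Concluding Remarks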
In this paper we have presented a realistic, contextual, nonlocal approach to quantum probability theory. The formalism is realistic because each sample point $\omega \in \Omega$ uniquely determines a value $X(\omega)$ for any measurement X. In this way, a physical system $\mathcal{S}$ possesses all of its attributes independently of whether they are measured. Although the sample space Ω exists and we can discuss its properties, Ω is not physically accessible in general. This is because the sample points may not correspond to physical states that can be prepared in the laboratory or at least exist in nature. We may think of Ω as a hidden-variable completion of quantum mechanics. This approach is contextual because it is necessary to specify a particular basic measurement X. Once X is specified, a Hilbert space $H_X$ can be constructed, and $H_X$ provides an X-representation for $\mathcal{S}$. Of course, one may choose a different basic measurement Y, and then the Y-representation will give a different picture of $\mathcal{S}$. For example, in traditional quantum mechanics we usually choose the position representation or the momentum representation to describe $\mathcal{S}$. For a given basic measurement X and an action S, we have given a method for constructing the probability distribution $P_{X,S}$ of X. We have shown that $P_{X,S}$ may be found in terms of a state vector $f_{X,S} \in H_X$, and these state vectors correspond to physically accessible states. In $H_X$ the measurement X and functions of X are "diagonal" and hence represented by "random variables." Other measurements, which we call observables to distinguish them from X, are represented by self-adjoint operators on $H_X$, and their usual distributions follow in a natural way. The theory is nonlocal because the distribution $P_{X,S}$ is specified by an influence function $F_S(\omega,\omega')$. This function provides an influence between pairs of sample points which, in a spacetime model, may be spacelike separated. There is considerable controversy concerning various interpretations of and approaches to probability theory. I believe that three types of probabilities are necessary for a description of quantum mechanics.
The probabilities and distributions of measurement results in the laboratory are usually computed using long-run relative frequencies. Even though a measurement X may involve a microscopic system $\mathcal{S}$ (for example, the position of an electron), $\mathcal{S}$ must interact with a macroscopic apparatus in order to obtain an observable outcome. The theoretician's task is to find the distribution $P_X$ of X. This theoretical distribution should agree with the long-run relative frequencies found in the laboratory or give predictions that can eventually be tested experimentally. Since there are serious, well-known difficulties in dealing with abstract theories of relative frequencies, it is convenient and perhaps even necessary to use the standard Kolmogorovian probability theory for describing $P_X$. Now $P_X$ is a probability measure that satisfies the axioms of standard probability theory. However, the method for computing $P_X$ is characteristic of quantum mechanics and is not found in any classical theory. Richard Feynman, whose work has motivated the present paper, once said that nobody really understands quantum mechanics. I think that what he meant is that nobody understands why nature has chosen to compute probabilities in this unusual way. As presented here, the probability density for $P_X$ is found by employing an influence function. The advantage of this method is that it is physically motivated and avoids complex numbers. An equivalent method, which is usually employed in quantum mechanics, is to take the absolute value squared of the wave function. The quantum probability approach that we have presented contains standard probability theory as a special case. Thus, we only need two types of probabilities to describe quantum mechanics. Standard probability theory as developed by Kolmogorov is a distillation of hundreds of years of experience with empirical and theoretical studies of chance phenomena. The founders of the subject were concerned with games of chance, statistics and the behavior of macroscopic objects. They were not aware of microscopic objects and quantum mechanics and had no reason to design a probability theory for describing such situations. It is therefore not surprising that a new theory, called quantum probability theory, had to be developed to serve these purposes.

References
1. R. Feynman and A. Hibbs, Quantum Mechanics and Path Integrals (McGraw-Hill, New York, 1965).
2. S. Gudder, Int. J. Theor. Phys. 32, 1747 (1993).
3. S. Gudder, Int. J. Theor. Phys. 32, 824 (1993).
4. S. Gudder, Quantum probability and the EPR argument, Ann. Found. Louis de Broglie 20, 167 (1994).
5. G. Hemion, Int. J. Theor. Phys. 29, 1335 (1990).
INNOVATION APPROACH TO STOCHASTIC PROCESSES AND QUANTUM DYNAMICS

TAKEYUKI HIDA
Department of Mathematics, Meijo University, Tenpaku, Nagoya 468-8502
and Nagoya University (Professor Emeritus)

The theory of stochastic processes developed extensively in the twentieth century and established a beautiful connection with quantum dynamics. It now seems a good time to revisit the foundations of stochastic processes and quantum mechanics, in the hope that the attempt will suggest further directions for these two closely related disciplines. For this purpose, we review some topics in white noise analysis, observe the motivations from physics, and show how they have actually been realized.
1 Introduction
We shall discuss the analysis of random complex systems and its connection with quantum dynamics. In particular, we analyse some stochastic processes X(t) and random fields X(C) by means of the innovation, and revisit quantum dynamics in connection with stochastic analysis. Our aim is to study such random complex systems, including quantum fields, by using white noise analysis. The basic idea of our analysis is first to take a basic, standard system of random variables, and then to express the given process as a function of that system. The system of variables from which we start is called the idealized elemental random variables (abbr. i.e.r.v.). Taking such a system is in line with reductionism, and one may find the idea similar to reductionism in physics. In this connection, it is interesting to recall the lecture given by P. W. Anderson at the University of Tokyo in 1999, whose title included emergence together with reductionism, and in which he gave a good interpretation of both. Following reductionism, the next step is to form a function of the i.e.r.v.'s so that the function represents the given random complex system. This is nothing but synthesis.
Then the analysis of the functions formed in our setup naturally follows. The goal is therefore the analysis of the function (which may be called a functional) in order to identify the random complex system in question. The first step, taking a suitable system of i.e.r.v.'s, has been influenced by the way the notion of a stochastic process is understood. We therefore give a quick review of the definition of a stochastic process, starting from the ideas of J. Bernoulli (Ars Conjectandi, 1713), S. Bernstein (1933) and P. Lévy (1947), which suggest that we consider the innovation of a stochastic process. The innovation is viewed as a system of i.e.r.v.'s, which will be specified to be a white noise. The analysis of white noise functionals has many significant characteristics that make it well suited to the investigation of quantum mechanical phenomena, so we shall be able to show examples to which white noise theory is efficiently applied. Thanks to great contributions by many authors, the theory developed along these lines has reached its present state.
AMS 2000 Mathematics Subject Classification: 60H40 White Noise Theory

2 Review of defining a stochastic process and white noise analysis
There is a traditional, and in fact original, way of defining a stochastic process. Let us refer to Lévy's definition of a stochastic process given in his book [3], Chap. II: "a random function X(t) of time t in which chance intervenes at every instant" ("une fonction aléatoire X(t) du temps t dans lequel le hasard intervient à chaque instant"). The chance (hasard) is expressed as an infinitesimal random variable Y(t) which is independent of the observed values of X(s), s < t, in the past. The random variable Y(t) is nothing but the innovation of the process X(t). Formally speaking, Y(t), which is usually an infinitesimal random variable, contains the information gained by X(t) during the time interval [t, t + dt). To express this idea, P. Lévy proposed a formula called an infinitesimal equation for the variation δX(t):
$$\delta X(t) = \Phi(X(s), s \le t, Y(t), t, dt),$$
where Φ is a non-random functional. Although this equation has only a formal significance, it is still very suggestive. It would, however, be preferable if the given process could be expressed as a functional of Y(t) in the following manner:
$$X(t) = \Psi(Y(s), s \le t),$$
where Ψ is a sure (non-random) function. Such a trick may be called the Reduction and Synthesis method. The above expression is causal in the sense that X(t) is expressed as a function of Y(s), s ≤ t. […] $s \in C, C, \delta C)$.
Here $\{Y(s), s \in C\}$ is the innovation. Such a generalization is possible because we use the innovation. We note that white noise analysis has many advantages, briefly listed below.
1) It is an infinite dimensional analysis. Our stochastic analysis can be carried out systematically by taking a white noise as a system of i.e.r.v.'s to express the given random complex systems. Indeed, the analysis is essentially infinite dimensional, as will be seen in what follows.
2) Infinite dimensional harmonic analysis. The white noise measure, supported by the space $E^*$ of generalized functions on the parameter space $\mathbb{R}^d$, is invariant under the rotations of $E^*$. Hence a harmonic analysis arising from the rotation group can naturally be discussed. The group contains significant subgroups which describe essentially infinite dimensional characters.
3) Generalizations to random fields X(C) are discussed in a manner similar to X(t) as far as the innovation is concerned. Needless to say, X(C) enjoys more profound characteristic properties.
4) Connection with classical functional analysis. The so-called S-transform applied to white noise functionals provides a bridge connecting white noise functionals and classical functionals of ordinary functions. We can therefore appeal to the classical theory of functionals established in the first half of the twentieth century.
5) Good connection with quantum dynamics, as will be seen in the next section.
For the differential and integral calculus of white noise functionals using the annihilation operator $\partial_t$ and the creation operator $\partial_t^*$, the class of generalized functionals, harmonic analysis including Fourier analysis, the Lévy Laplacian $\Delta_L$, complexification and other topics, we refer to the monograph [12] and other literature.

3 Relations to Quantum Dynamics
We now briefly explain some topics in quantum dynamics to which white noise theory can be applied. The topics presented here may seem separate from each other, but a white noise always lies behind the description.
1) Representation of the canonical commutation relations for the Boson field. This topic is well known. Let $\dot B(t)$ be a white noise and let $\partial_t$ denote the $\dot B(t)$-derivative. Then $\partial_t$ is an annihilation operator, and its dual operator $\partial_t^*$ stands for the creation operator. They satisfy the commutation relations
$$[\partial_t, \partial_s] = [\partial_t^*, \partial_s^*] = 0, \qquad [\partial_t, \partial_s^*] = \delta(t - s).$$
From these, a representation of the canonical commutation relations is given for Bosonic particles. Note that the following assertion holds.
Proposition. There are continuously many irreducible representations of the canonical commutation relations.
White noises with different variances give representations that are inequivalent to each other, which proves the assertion.
2) Reflection positivity (T-positivity).
A stationary multiple Markov (say, N-ple Markov) Gaussian process has a spectral density function f(λ) of a particular type. Namely,
On the other hand, the following is proved.
Proposition. The covariance function γ(h) of a stationary T-positive Gaussian process is expressed in the form
$$\gamma(h) = \int_0^\infty \exp[-|h|x]\, d\nu(x),$$
where ν is a positive finite measure. By applying this assertion to the N-ple Markov Gaussian process, we claim that T-positivity requires $c_k > 0$ for every k. Note that in the strictly N-ple Markov case this condition is not satisfied. We hope that this result can be generalized to general stochastic processes with multiple Markov properties.
3) A path integral formulation. One realization of the Dirac-Feynman idea of the path integral may be given by the following method using generalized white noise functionals. First we establish a class of possible trajectories when a Lagrangian $L(x, \dot x)$ is given. Let x be the classical trajectory determined by the Lagrangian. As soon as we come to quantum dynamics, we have to consider fluctuating paths y. We propose that they are given by
$$y(s) = x(s) + \sqrt{\hbar/m}\; B(s).$$
The average over the paths is replaced with the expectation with respect to the probability measure for which the Brownian motion B(t) is defined. Thus, the propagator $G(y_1, y_2, t)$ is given by
$$G(y_1, y_2, t) = E\left\{ N \exp\left[\frac{i}{\hbar} \int_0^t L(y, \dot y)\, ds + \frac12 \int_0^t \dot B(s)^2\, ds\right] \delta(y(t) - y_2) \right\}.$$
With this setup, exact formulae for the propagators have actually been computed (L. Streit et al.).
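Returning to the T-positivity criterion in item 2), a small sketch can make the role of the signs of the weights concrete. Assume ν is a finite sum of point masses $c_k\delta_{a_k}$, so that $\gamma(h) = \sum_k c_k e^{-a_k|h|}$; then the reflection form $\sum_{i,j} x_i x_j \gamma(t_i + t_j)$ equals $\sum_k c_k (\sum_i x_i e^{-a_k t_i})^2$. With nonnegative weights it cannot be negative, whereas the hypothetical covariance $2e^{-|h|} - e^{-2|h|}$, which is still positive definite (its spectral density is proportional to $1/((1+\lambda^2)(4+\lambda^2))$), yields a negative value for a suitable x. The times and weights are illustrative choices only.

```python
import math


def covariance(h, weights, rates):
    """gamma(h) = sum_k c_k exp(-a_k |h|), i.e. nu taken as a finite sum of point masses."""
    return sum(c * math.exp(-a * abs(h)) for c, a in zip(weights, rates))


def reflection_form(x, times, weights, rates):
    """sum_{i,j} x_i x_j gamma(t_i + t_j) for positive times t_i.

    T-positivity requires this to be nonnegative for every real vector x.
    """
    n = len(x)
    return sum(x[i] * x[j] * covariance(times[i] + times[j], weights, rates)
               for i in range(n) for j in range(n))


times = [0.5, 1.0]
# x is chosen so that sum_i x_i e^{-t_i} = 0, isolating the e^{-2h} component.
x = [math.exp(-1.0), -math.exp(-0.5)]

good = reflection_form(x, times, weights=[1.0, 1.0], rates=[1.0, 2.0])   # all c_k > 0
bad = reflection_form(x, times, weights=[2.0, -1.0], rates=[1.0, 2.0])   # c_2 < 0
print(good, bad)
```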
4) Dirichlet forms in infinite dimensions. With the help of positive generalized white noise functionals, we prove criteria for the closability of energy forms. See [11].
5) Random fields X(C). A random field X(C) depending on a parameter C, which is taken to be a certain smooth closed manifold in a Euclidean space, naturally enjoys a more complex probabilistic structure than a stochastic process X(t) depending on the time t. It therefore has good connections with quantum fields in physics. We are particularly interested in the case where X(C) has a causal representation in terms of white noise. Some typical examples are listed below.
5.1) Markov property and multiple Markov properties. Dirac's paper [1] suggests how to define the Markov property. For the Gaussian case, a reasonable definition has been given (see [15]) by using the canonical representation in terms of white noise, where the canonical property of a representation can be introduced as a generalization of that for a Gaussian process. Some attempts have been made for some non-Gaussian fields (see [17]). For the Gaussian case, multiple Markov properties have been defined. It is now an interesting question to find conditions under which a Gaussian random field satisfies a multiple Markov property.
5.2) Stochastic variational equations of Langevin type. Let C run through a class $\mathbf{C}$ of concentric circles. The problem is to solve the following stochastic variational equation of Langevin type:
$$\delta X(C) = -\lambda X(C) \int_C \delta n(s)\, ds + X_0 \int_C v(s)\, \partial_s^*\, \delta n(s)\, ds.$$
The explicit solution is given by using the S-transform and the classical theory of functionals.
5.3) We have made an attempt to define a random field X(C), $C \in \mathbf{C}$, which satisfies conformal invariance. Reversibility can also be discussed.
Example (linear parameter case): a Brownian bridge. For t ∈ [0, 1] it is defined by
$$X(t) = (1 - t) \int_0^t \frac{1}{1 - u}\, \dot B(u)\, du.$$
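The Brownian bridge formula can be checked by simulation. The sketch below assumes an Euler (left-endpoint) discretization of the stochastic integral, with illustrative choices of 100 steps, 4000 paths and a fixed seed, and compares the sample variance of X(1/2) with the exact value $\operatorname{Var} X(t) = (1-t)^2 \int_0^t (1-u)^{-2}\, du = t(1-t)$. The left-endpoint rule slightly underestimates the variance, so agreement is only within a few percent.

```python
import math
import random


def brownian_bridge_sample(t, n_steps, rng):
    """One sample of X(t) = (1 - t) * integral_0^t dB(u) / (1 - u), Euler discretization."""
    du = t / n_steps
    integral = 0.0
    for k in range(n_steps):
        u = k * du                                   # left endpoint (Ito convention)
        integral += rng.gauss(0.0, math.sqrt(du)) / (1.0 - u)
    return (1.0 - t) * integral


rng = random.Random(12345)
t = 0.5
samples = [brownian_bridge_sample(t, 100, rng) for _ in range(4000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
print(mean, var, t * (1 - t))
```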
Reversibility can be guaranteed not only by the time reflection but also by whiskers (one-parameter subgroups defined by deformations of the parameter) in the conformal group that leave the unit time interval invariant. We now come to the case of a random field. Let $\mathbf{C}$ be the class of concentric circles. Assume $0 < r_0 < r < r_1$, and denote by $C_r$ the circle with radius r. Then we define
'(ft) - yfi^^bw
w^w*^
This is a canonical representation. To show reversibility, we apply the inversion with respect to the circle with radius $\sqrt{r_0 r_1}$:
We claim that it is possible to generalize this to the case where $\mathbf{C}$ is taken to be a class of curves obtained by a conformal mapping of concentric circles.
Remark 1. The white noise x(t) is regarded as a representation of the parameter t, so that the propagation of randomness (fluctuation) is expressed in terms of x(t) instead of the time t itself. Namely, the way random complex phenomena develop, in particular their reversibility, has an explicit description in terms of white noise, as is seen in the above example.
Remark 2. See the papers by Dirac [1] and by Polyakov [13] for suggestions on a generalization of the path integral.

4 Addenda to foundations of the theories. Concluding remarks
Before giving the concluding remarks, we should like to add some facts, as an addendum to §1, regarding the foundations of probability theory. From the brief history mentioned in §1, we understand why a white noise, that is, a system of i.e.r.v.'s, is introduced. It is a generalized stochastic process, so some additional consideration is needed when reasonable functionals, in general nonlinear functionals, of white noise are introduced. In physics we met interesting cases where such nonlinear functionals of white noise are required: canonical commutation relations for quantum fields, where the degrees of freedom are continuously infinite; Feynman's path integrals, as discussed in 3) of the last section; and the variational equation for a
random field. On the other hand, we were fortunate when a class of generalized white noise functionals was introduced in 1975, since the theory of generalized functions had been established and some attempts had been made to apply it to the theory of generalized stochastic processes. To obtain further fruitful results, we have been given a powerful method to study random fields indexed by a manifold. It is the so-called innovation approach, in which our reductionism is not hindered by the higher dimensionality of the parameter space. With these in mind we can come to the concluding remarks. Some proposed future directions are now in order.
1. One concerns good applications of the Lévy Laplacian. Its significance is that it is an essentially infinite dimensional operator.
2. A two-dimensional Brownian path is considered to have some optimality in occupying territory. This property should be reflected in forming models of physical phenomena.
3. A systematic approach to the invariance of random fields under transformation groups will be discussed.
4. Stochastic variational calculus for random fields. With the classical results on variational calculus, we can proceed further with white noise analysis.
Acknowledgements. The author is grateful to Professor A. Khrennikov, who invited him to give a talk at this conference. Thanks are due to the Academic Frontier Project at Meijo University for the support of this work.
References
1. P.A.M. Dirac, The Lagrangian in quantum mechanics. Phys. Z. Soviet Union 3, 64-72 (1933).
2. S. Tomonaga, On a relativistically invariant formulation of the quantum theory of wave fields. Prog. Theor. Phys. 1, 27-42 (1946).
3. P. Lévy, Processus stochastiques et mouvement brownien (Gauthier-Villars, 1948; 2nd ed. 1965).
4. P. Lévy, Nouvelle notice sur les travaux scientifiques de M. Paul Lévy, janvier 1964. Part III: Processus stochastiques (unpublished manuscript).
5. T. Hida, Canonical representations of Gaussian processes and their applications. Mem. College of Science, Univ. of Kyoto, A, 33, 109-155 (1960).
6. T. Hida, Stationary Stochastic Processes (Princeton Univ. Press, 1970).
7. T. Hida, Brownian Motion (Iwanami Pub. Co., 1975; English ed. Springer-Verlag, 1980).
8. T. Hida, Analysis of Brownian functionals. Carleton Math. Lecture Notes 13 (1975).
9. T. Hida, Innovation approach to random complex systems. Pub. Volterra Center 433 (2000).
10. T. Hida and L. Streit, On quantum theory in terms of white noise. Nagoya Math. J. 68, 21-34 (1977).
11. T. Hida, J. Potthoff and L. Streit, Dirichlet forms and white noise analysis. Commun. Math. Phys. 116, 235-245 (1988).
12. T. Hida, H.-H. Kuo, J. Potthoff and L. Streit, White Noise: An Infinite Dimensional Calculus (Kluwer Academic Pub., 1993).
13. A.M. Polyakov, Quantum geometry of Bosonic strings. Phys. Lett. 103B, 207-210 (1981).
14. J. Schwinger, Brownian motion of a quantum oscillator. J. Math. Phys. 2, 407-432 (1961).
15. Si Si, Gaussian processes and Gaussian random fields. Quantum Informational (World Scientific Pub. Co., 2000).
16. L. Streit and T. Hida, Generalized Brownian functionals and the Feynman integral. Stoch. Processes Appl. 16, 55-69 (1983).
17. L. Accardi and Si Si, Innovation approach to multiple Markov properties of some non-Gaussian random fields, to appear.
STATISTICS AND ERGODICITY OF WAVE FUNCTIONS IN CHAOTIC OPEN SYSTEMS

H. ISHIO
Department of Physics and Measurement Technology, Linköping University, S-581 83 Linköping, Sweden
and
Division of Natural Science, Osaka Kyoiku University, Kashiwara, Osaka 582-8582, Japan

In general, quantum chaotic systems are considered to be described in the context of random matrix theory, i.e., by random Gaussian variables (real or complex) in an appropriate universality class. In reality, however, quantum states inside a chaotic open system are not given by a statistically homogeneous random state. We show some numerical evidence of such statistical inhomogeneity for ballistic transport through two-dimensional chaotic open billiards, and discuss its relation to the corresponding classical dynamics.
1 Introduction
The quantum-mechanical signature of classical chaos is called quantum chaos. A rigorous definition of chaotic systems in quantum theory has been given very recently for Kolmogorov (K-) and Anosov (C-) systems, by analogy with the corresponding classical properties.¹ In such systems, quantum ergodicity is naturally expected: eigenfunctions are equidistributed in their representation space, and all expectation values of quantum observables coincide with mean values of the corresponding classical observables. It was first noted that a sufficient condition for quantum ergodicity is the ergodicity of the corresponding classical dynamics.² More recently, the statement was proved in the case of quantum billiards.³,⁴ Nowadays, quantum ergodicity is one of the few results in the field of quantum chaos for which mathematical proofs exist. Quantum ergodicity, however, can be reached only in the semiclassical limit (ħ → 0). In experiments or numerical simulations of chaotic systems, we often see nonuniversal quantum features far from ergodicity, even in a high (but finite) energy region. In the present work, we show some numerical evidence of such statistical inhomogeneity for chaotic open systems. In Sec. 2, we introduce a model of ballistic transport through a chaotic open billiard and show some evidence of nonergodicity in the classical dynamics. In Sec. 3 we briefly discuss the general wave-statistical description of chaotic open systems by random matrix theory (RMT). In Sec. 4, we show numerical results of fully quantum calculations for the open billiard model, and find that the idealized description by RMT does not apply in some cases, even in a high energy region. There, we focus on the relation between the statistical deviations and wave localization corresponding to classical short paths. Section 5 contains the conclusions.

Figure 1: Typical single trajectory in the open stadium billiard.

2 Classical Nonergodicity and Short-Path Dynamics
We consider a two-dimensional (2D) billiard in which the motion of noninteracting particles confined by Dirichlet boundaries is ballistic. The shape of the boundary directly determines the nonlinearity of the particle dynamics inside the billiard. One of the prototypes of conservative chaotic systems is the Bunimovich stadium billiard. For a closed stadium billiard, the system has been proved to have the K-property.⁵ For an open stadium billiard coupled to two narrow leads (see Fig. 1), nonintegrability is still expected; e.g., a fractal structure can be observed in the spectrum of dwell times inside the cavity region.⁶ However, Monte Carlo simulation of the classical path-length ($\propto$ dwell time) distribution shows that the distribution function is not the simple exponential decay expected as a signature of ergodicity, but a highly structured function owing to short-path dynamics.⁷ Another example showing nonergodicity of the classical dynamics in the case
of the open stadium billiard is the transmission-reflection diagram of particles shown in Fig. 2. There, $y$ is the initial transverse position of each particle incoming from lead 1 (see Fig. 1) at the entrance of the stadium cavity, and $d$ denotes the common width of the attached leads. We apply a semiclassical quantization condition to the momentum of the incoming particles in the lead: the angle of incidence is quantized as $\theta_i = \pm\sin^{-1}[n\pi/(kd)]$ ($n = 1, 2, \ldots$), where we choose the positive and negative $\theta_i$ for the upper and lower directions of particle motion in Fig. 1, respectively, and $k$ is the Fermi wave number of the semiclassical particles. Over the whole range of the diagram we fix the quantized mode number at $n = 1$. Because of the semiclassical quantization condition, $|\theta_i|$ decreases monotonically as a function of $k$. The black and white points correspond to transmission and reflection events, respectively. The relative measure of the black (white) portion for each $k$ is equal to the classical transmission (reflection) probability $T_{cl}(k)$ ($R_{cl}(k)$). In Fig. 2 we see a number of black and white "windows" in the chaotic sea. Each of them is associated with a family of short paths connecting lead 1 to lead 2 (black) or back to lead 1 (white). Such paths are stable with respect to transmission and reflection events, and are expected to make an important contribution, as a family, to the corresponding quantum transport.
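A minimal sketch of the semiclassical quantization condition $\theta_i = \pm\sin^{-1}[n\pi/(kd)]$, written in Python (our choice of language; the paper contains no code). The dimensionless input `kd_over_pi` ($= kd/\pi$, the quantity quoted in the captions of Figs. 3 and 4) and both helper names are hypothetical. The helper `open_modes` additionally assumes the standard hard-wall lead, in which mode $n$ propagates only when $n\pi \le kd$.

```python
import math
from typing import Optional


def incidence_angle(kd_over_pi: float, n: int = 1) -> Optional[float]:
    """|theta_i| in radians from sin(theta_i) = n*pi/(k*d); None if mode n is closed."""
    s = n / kd_over_pi  # n*pi/(k*d) written with the dimensionless kd/pi
    if s > 1.0:
        return None  # no real angle: the n-th lead mode does not propagate
    return math.asin(s)


def open_modes(kd_over_pi: float) -> int:
    """Hypothetical helper: number of propagating lead modes, the largest n with n*pi <= k*d."""
    return math.floor(kd_over_pi)


if __name__ == "__main__":
    for kd in (1.8785, 4.6553):  # the two cases of Figs. 3 and 4, both with n = 1
        theta = incidence_angle(kd)
        print(f"kd/pi = {kd}: |theta_1| = {math.degrees(theta):.1f} deg, open modes = {open_modes(kd)}")
```

For the two energies of Figs. 3 and 4 this prints incidence angles of about 32.2° and 12.4°, illustrating the monotonic decrease of $|\theta_i|$ with $k$ at fixed $n = 1$.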
3 Universal Description of Wave Function Statistics
We write the scaled local density as $\rho(\mathbf{r}) = V|\psi(\mathbf{r})|^2$, where $V$ is the volume of the system and the single-particle wave function $\psi(\mathbf{r})$ is normalized over the position $\mathbf{r}$, so that the spatial average of $\rho$ is 1. It is well known that the probability distribution of the local densities of a chaotic eigenfunction of a closed system is the Porter-Thomas (P-T) distribution,⁸
$$P(\rho) = \frac{1}{\sqrt{2\pi\rho}}\exp\left(-\frac{\rho}{2}\right), \qquad (1)$$
described by the Gaussian orthogonal ensemble (GOE) of random matrices, when time-reversal symmetry (TRS) is present, i.e., $\psi \in \mathbb{R}$. On the other hand, the distribution is exponential,⁸,⁹
$$P(\rho) = \exp(-\rho), \qquad (2)$$
described by the Gaussian unitary ensemble (GUE) of random matrices, when TRS is broken in the closed system, i.e., $\psi \in \mathbb{C}$. The space-averaged spatial correlation of the local densities of a 2D chaotic wave function with wave number $k$ is also given by⁹⁻¹¹
$$P_2(kr) = \langle \rho(\mathbf{r}_1)\rho(\mathbf{r}_2) \rangle = 1 + c\,J_0^2(kr), \qquad (3)$$
where $r = |\mathbf{r}_1 - \mathbf{r}_2|$ and $J_0(x)$ is the Bessel function of zeroth order. The parameter $c$ is $c = 2$ for GOE (TRS) and $c = 1$ for GUE (broken TRS) eigenfunctions. The continuous transition of the wave function statistics between the GOE and GUE symmetries has also been investigated. Introducing a transition parameter $b \in (1, 2]$, we have the probability distribution¹²⁻¹⁶
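$$P_b(\rho) = \frac{b}{2\sqrt{b-1}}\exp\left(-\frac{b^2\rho}{4(b-1)}\right) I_0\left(\frac{b(2-b)\rho}{4(b-1)}\right), \qquad (4)$$
where $I_0(x)$ is the modified Bessel function of zeroth order, and the spatial correlation¹⁷
$$P_2^b(kr) = 1 + \left[1 + \left(\frac{2-b}{b}\right)^2\right] J_0^2(kr). \qquad (5)$$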
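Eqs. (4) and (5) can be checked numerically. The Python sketch below is our own illustration (function names, the integration cutoff `rho_max` and the grid size are assumptions): it evaluates $P_b(\rho)$ and $P_2^b(kr)$ using power series for $I_0$ and $J_0$, so only the standard library is needed, and integrates moments with Simpson's rule. The second moment can be compared with the closed form $(3b^2 - 4b + 4)/b^2$ given in the text as Eq. (7). The series are adequate for the moderate arguments used here, but not for $b$ very close to 1, where the argument of $I_0$ becomes large.

```python
import math


def bessel_i0(x: float) -> float:
    """Modified Bessel function I_0(x) from its power series sum_m (x/2)^(2m) / (m!)^2."""
    total = term = 1.0
    q, m = 0.25 * x * x, 0
    while True:
        m += 1
        term *= q / (m * m)
        total += term
        if term < 1e-17 * total:
            return total


def bessel_j0(x: float) -> float:
    """Bessel function J_0(x) from its alternating power series (fine for x up to about 20)."""
    total = term = 1.0
    q, m = 0.25 * x * x, 0
    while abs(term) > 1e-17:
        m += 1
        term *= -q / (m * m)
        total += term
    return total


def p_b(rho: float, b: float) -> float:
    """Eq. (4): distribution of rho = V|psi|^2 in the GOE-GUE crossover, 1 < b <= 2."""
    a = 4.0 * (b - 1.0)
    return (b / (2.0 * math.sqrt(b - 1.0))) * math.exp(-b * b * rho / a) * bessel_i0(b * (2.0 - b) * rho / a)


def p2_b(kr: float, b: float) -> float:
    """Eq. (5): spatial correlation <rho(r1) rho(r2)> at distance r = |r1 - r2|."""
    return 1.0 + (1.0 + ((2.0 - b) / b) ** 2) * bessel_j0(kr) ** 2


def moment(b: float, k: int, rho_max: float = 60.0, n: int = 6000) -> float:
    """Composite Simpson estimate of the k-th moment of P_b on [0, rho_max] (n must be even)."""
    h = rho_max / n
    s = 0.0
    for j in range(n + 1):
        w = 1 if j in (0, n) else (4 if j % 2 else 2)
        rho = j * h
        s += w * rho ** k * p_b(rho, b)
    return s * h / 3.0


if __name__ == "__main__":
    b = 1.5
    print([round(moment(b, k), 6) for k in range(3)])  # normalization, mean, I_2^b
    print(p2_b(0.0, 1.0), p2_b(0.0, 2.0))  # GOE and GUE values at r = 0: 3.0 and 2.0
```

At $b = 2$ the sketch reproduces Eq. (2) exactly, since the prefactor is 1 and the argument of $I_0$ vanishes.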
For $b \to 1$ and $b \to 2$, both equations tend to the GOE and GUE cases, respectively. Systematic statistical investigations of scattering wave functions in open chaotic systems, on the other hand, have been carried out only quite recently.¹⁶ The essential point is that the space reciprocity of conservative closed systems, meaning that each plane wave is paired, in phase, with a counterpart of the same amplitude running in the opposite direction, is lost in open systems. As a result, the wave function statistics of a chaotic open system are expected to follow the GUE if the system is completely open.¹⁶

4 Numerical Analyses and Discussions
In this section we show numerical evidence of wave-statistical inhomogeneity for ballistic transport through the 2D open stadium billiard. Assuming steady current flow through the leads, we solve the time-independent Schrödinger equation for a single particle under Dirichlet boundary conditions using the plane-wave-expansion method,⁶ which gives the reflection and transmission amplitudes as well as the local wave functions at each energy. In calculating the statistics, the sample space $A$ (= $V$) is taken to be the cavity region corresponding to the closed stadium, and more than one million sample points are used to obtain reliable statistics. The numerical results are shown in Fig. 3 for the wave probability density and in Fig. 4 for the probability distribution $P(\rho)$ and the spatial correlation $P_2(kr)$.
In Fig. 3(a) we find the so-called bouncing-ball mode in the central region of the stadium cavity, with a number of vertical nodes associated with marginally stable classical orbits bouncing vertically between the straight edges. Bouncing-ball states are nonstatistical, since the amplitude of $\psi$ is strongly localized in the middle region of the stadium (where space reciprocity holds locally) and is very small in the endcaps (where space reciprocity does not necessarily hold). As a result, neither $P(\rho)$ nor $P_2(kr)$ for such states follows its universal expression (see Fig. 4(a)). Besides the bouncing-ball mode, we also see another wave localization, strongly coupled to both the incoming and the (open) transmission channels, which corresponds to the direct transmission path (see the white line in Fig. 3(a)). Along such a localization, plane waves may propagate with nonzero probability current, contributing in part to the anomaly of the wave statistics.¹⁶ In the higher energy region, where the ratio of the system size $\sqrt{A}$ to the wavelength $\lambda$ is $\sqrt{A}/\lambda \sim 25$ (the case of Fig. 3(b)), we might expect GUE statistics. However, Fig. 4(b) shows that both $P(\rho)$ and $P_2(kr)$ closely follow the GOE. The reason is a localization effect reminiscent of the phenomenon known as "scar",¹⁸ an anomalous localization of quantum probability density along unstable periodic orbits in classically chaotic systems. To characterize localization, one usually introduces the moments $I_q = V^{-1}\int_V |\psi(\mathbf{r})|^{2q}\,d\mathbf{r}$ of the eigenfunction local density $|\psi(\mathbf{r})|^2$, with $V$ the system volume.¹⁹,²⁰ The second moment, $I_2$, is known as the inverse participation ratio (IPR). Assuming the normalization $\langle|\psi|^2\rangle\ (= I_1) = 1$, we have $I_2 = 1$ for completely ergodic (random and uniform) eigenfunctions, while $I_2 = \infty$ for completely localized eigenfunctions such as $|\psi(\mathbf{r})|^2 \sim V\delta(\mathbf{r})$.
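On a grid of $M$ sample points the moment $I_q$ becomes an average over the points. The Python sketch below (the discretization and all names are our assumptions) reproduces the two limits just quoted: $I_2 = 1$ for a uniform density and $I_2 = M$, which diverges as $M \to \infty$, for a density concentrated on a single point. It also shows that a random real Gaussian wave gives $I_2 \approx 3$, the value the text later identifies as the GOE limit.

```python
import random
from typing import Sequence


def ipr(density: Sequence[float], q: int = 2) -> float:
    """Discrete I_q = (1/M) sum_j rho_j**q, with rho rescaled so that its mean (I_1) is 1."""
    m = len(density)
    mean = sum(density) / m
    return sum((x / mean) ** q for x in density) / m


if __name__ == "__main__":
    M = 10_000
    print(ipr([1.0] * M))                # uniform density
    print(ipr([0.0] * (M - 1) + [1.0]))  # one occupied point: I_2 ~ M
    rng = random.Random(1)
    goe_like = [rng.gauss(0.0, 1.0) ** 2 for _ in range(M)]  # |psi|^2 for a real Gaussian psi
    print(ipr(goe_like))                 # close to 3
```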
The effect of localization on wave-function density statistics has been examined analytically in relation to $I_q$ for closed systems,²¹⁻²³ and numerically using a time-dependent approach, i.e., in terms of recurrences of a test Gaussian wave packet, for closed and weakly (imperfectly) open systems.²⁴⁻²⁶ The latter works showed that the tail of the wave-function intensity distribution in phase space is dominated by scarring, departing from the RMT predictions. In contrast, the most prominent effect of the localization of wave probability density in open billiards is the local space reciprocity that holds along classical orbits whose localization is not strongly coupled to any (open) transmission channel (see, e.g., the white lines in Fig. 3(b)): along such orbits there is no net current, owing to the coherent overlap of time-reversed waves, so that both $P(\rho)$ and $P_2(kr)$ are close to the GOE predictions.¹⁶ For a quantitative discussion, the value of the GOE-GUE transition parameter $b$ is calculated numerically from the wave function $\psi(\mathbf{r}) = u(\mathbf{r}) + iv(\mathbf{r})$
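by the formula¹⁶
$$b = \frac{2\langle|\psi|^2\rangle}{\langle|\psi|^2\rangle + \sqrt{\langle|\psi|^2\rangle^2 - 4\left(\langle u^2\rangle\langle v^2\rangle - \langle uv\rangle^2\right)}}, \qquad (6)$$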
where $\langle\cdots\rangle$ denotes a space average over $A$. The value obtained for Fig. 3(b) is $b = 1.03$, very close to the GOE. In open systems, the IPR may again play an important role as a measure of localization.²⁷ In its definition, $I_2 = V^{-1}\int_V |\psi(\mathbf{r})|^4\,d\mathbf{r}$, $|\psi(\mathbf{r})|^2$ (= $\rho(\mathbf{r})$) is the scattering-wave local density and $V$ is the area ($A$) of the stadium cavity in our case. For chaotic wave functions normalized as $\langle|\psi|^2\rangle = 1$, we obtain from Eq. (4) the IPR $I_2^b$ for the transition between the GOE and GUE statistics as
$$I_2^b = \int_0^\infty \rho^2 P_b(\rho)\,d\rho = \frac{3b^2 - 4b + 4}{b^2}. \qquad (7)$$
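Eqs. (6) and (7) are straightforward to evaluate. The Python sketch below is our illustration, not the authors' code; the synthetic-data generator and all names are hypothetical. It estimates $b$ from samples $\psi_j = u_j + iv_j$ via Eq. (6), which is unchanged by a global phase rotation of $\psi$, and evaluates $I_2^b$ from Eq. (7); for the value $b = 1.03$ quoted in the text for Fig. 3(b), Eq. (7) gives $I_2^b \approx 2.887$, i.e. 2.89 to two decimals. The synthetic waves assume Gaussian $u$ and $v$ with $\langle u^2\rangle = 1/b$ and $\langle v^2\rangle = 1 - 1/b$, one way to realize a given $b$ under the normalization $\langle|\psi|^2\rangle = 1$.

```python
import cmath
import math
import random


def b_parameter(psi):
    """Eq. (6): GOE-GUE transition parameter from samples psi_j = u_j + i v_j (sample mean as space average)."""
    n = len(psi)
    uu = sum(z.real ** 2 for z in psi) / n
    vv = sum(z.imag ** 2 for z in psi) / n
    uv = sum(z.real * z.imag for z in psi) / n
    dens = uu + vv  # <|psi|^2>
    # The radicand equals (uu - vv)**2 + 4*uv**2 >= 0; clamp tiny negative rounding errors.
    root = math.sqrt(max(0.0, dens ** 2 - 4.0 * (uu * vv - uv ** 2)))
    return 2.0 * dens / (dens + root)


def ipr_crossover(b):
    """Eq. (7): I_2^b = (3b^2 - 4b + 4) / b^2 for <|psi|^2> = 1."""
    return (3.0 * b * b - 4.0 * b + 4.0) / (b * b)


def sample_ipr(psi):
    """Sample estimate of I_2 = <rho^2> / <rho>^2 with rho = |psi|^2."""
    rho = [abs(z) ** 2 for z in psi]
    mean = sum(rho) / len(rho)
    return sum(r * r for r in rho) / len(rho) / mean ** 2


def synthetic_wave(b, n, seed=0, phase=0.7):
    """Hypothetical test data: Gaussian u, v with <u^2> = 1/b, <v^2> = 1 - 1/b, rotated by exp(i*phase)."""
    rng = random.Random(seed)
    su, sv = math.sqrt(1.0 / b), math.sqrt(1.0 - 1.0 / b)
    rot = cmath.exp(1j * phase)
    return [rot * complex(rng.gauss(0.0, su), rng.gauss(0.0, sv)) for _ in range(n)]


if __name__ == "__main__":
    print(round(ipr_crossover(1.03), 2))  # 2.89
    psi = synthetic_wave(1.5, 200_000)
    print(b_parameter(psi), sample_ipr(psi), ipr_crossover(1.5))
```

The $\langle uv\rangle^2$ term is what makes the estimate independent of the arbitrary global phase of a scattering wave function; without it a rotated real wave would be misread as partly complex.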
In the GOE and GUE limits, $I_2^{b=1} = 3$ and $I_2^{b=2} = 2$, respectively. For Fig. 3(b), the numerically obtained IPR is $I_2 = 2.89$, which agrees with $I_2^{b=1.03} \approx 2.89$. This means that the enhancement of the IPR by the amplitude of the localized wave is not strong in the case of Fig. 3(b), and that the effect of the localization appears mainly in the value of $b$, which also determines the IPR. From our investigations, together with more extensive studies,¹⁶ we conjecture that complete GUE statistics is obtained only in the high-energy (semiclassical) limit. Until the energy reaches this limit, the localization of wave functions within chaotic open systems strongly affects the wave-statistical properties, leading to deviations from the RMT predictions based on the ergodicity or uniform randomness of wave functions. Finally, we note that the classical-path families associated with the localization found in Fig. 3(a) and (b) can be identified as the windows marked $\alpha$ and $\beta$ in Fig. 2, respectively. (For Fig. 3(b), only the path family for the localization touching the entrance can be identified in Fig. 2.) We notice that the angle of incidence $\theta_i$ for a given $k$ is unrelated to the angle of the path corresponding to the observed localizations directly connected to the entrance.

5 Conclusions
In conclusion, our numerical analyses show that chaotic-scattering wave functions in open systems exhibit features remarkably different from the idealized GUE predictions. The statistical deviations from the GUE can be understood in terms of wave localization corresponding to classical short-path dynamics.
Acknowledgments
The author is grateful to K.-F. Berggren, A. I. Saichev and A. F. Sadreev for fruitful collaboration leading to the work in Sec. 4. Support from the Swedish Board for Industrial and Technological Development (NUTEK) under Project No. P12144-1 is also acknowledged. Part of the calculations of the wave function statistics was carried out using resources at the National Supercomputer Center (NSC) in Linköping.

References
1. H. Narnhofer (to be published).
2. A. I. Shnirelman, Usp. Mat. Nauk 29, 181 (1974).
3. P. Gérard and E. Leichtnam, Duke Math. J. 71, 559 (1993).
4. S. Zelditch and M. Zworski, Comm. Math. Phys. 175, 673 (1996).
5. L. A. Bunimovich, Funct. Anal. Appl. 8, 254 (1974).
6. K. Nakamura and H. Ishio, J. Phys. Soc. Jpn. 61, 3939 (1992).
7. H. Ishio and J. Burgdörfer, Phys. Rev. B 51, 2013 (1995).
8. C. Porter and R. Thomas, Phys. Rev. 104, 483 (1956).
9. V. N. Prigodin, Phys. Rev. Lett. 74, 1566 (1995).
10. V. N. Prigodin et al., Phys. Rev. Lett. 72, 546 (1994).
11. M. V. Berry, in Chaos and Quantum Physics, ed. M. J. Giannoni, A. Voros, and J. Zinn-Justin (Elsevier, Amsterdam, 1990), p. 251.
12. K. Zyczkowski and G. Lenz, Z. Phys. B 82, 299 (1991).
13. G. Lenz and K. Zyczkowski, J. Phys. A 25, 5539 (1992).
14. E. Kanzieper and V. Freilikher, Phys. Rev. B 54, 8737 (1996).
15. R. Pnini and B. Shapiro, Phys. Rev. E 54, R1032 (1996).
16. H. Ishio et al. (unpublished).
17. S.-H. Chung et al., Phys. Rev. Lett. 85, 2482 (2000).
18. E. J. Heller, Phys. Rev. Lett. 53, 1515 (1984).
19. F. Wegner, Z. Phys. B 36, 209 (1980).
20. C. Castellani and L. Peliti, J. Phys. A 19, L429 (1986).
21. Y. V. Fyodorov and A. D. Mirlin, Phys. Rev. B 51, 13403 (1995).
22. K. Müller et al., Phys. Rev. Lett. 78, 215 (1997).
23. V. N. Prigodin and B. L. Altshuler, Phys. Rev. Lett. 80, 1944 (1998).
24. L. Kaplan, Nonlinearity 12, R1 (1999).
25. L. Kaplan, Phys. Rev. Lett. 80, 2582 (1998).
26. L. Kaplan and E. J. Heller, Ann. Phys. 264, 171 (1998).
27. H. Ishio and L. Kaplan (private communication).
[Figure 2 panels: $y(-\theta_i)$ and $y(+\theta_i)$, each with $y$ from $-d/2$ to $d/2$]
Figure 2: Transmission-reflection diagram of classical particles as a function of the initial position $y$ at the entrance of the stadium cavity and of the Fermi wave number $k$, corresponding to the angle of incidence $\theta_i$ in the lead given by the semiclassical quantization condition ($n = 1$ over the whole range). Black and white points correspond to transmission and reflection events, respectively. Two families of short paths are marked with arrows beside the diagram (see the text).
Figure 3: Contour plot of the wave probability density in the open stadium billiard for (a) $kd/\pi = 1.8785$ ($n = 1$) and (b) $kd/\pi = 4.6553$ ($n = 1$). The incident wave enters the cavity through the left lead. The transmission probability is (a) $T_{qm} = 0.55$ and (b) $T_{qm} = 0.36$. The contours show about 97.5% of the largest wave probability density. Thin white lines show some of the short classical orbits corresponding to the localization of the wave probability density. Taken from the work by the authors in Ref. [16] (unpublished).
[Figure 4 panels (a) and (b): $P(\rho)$ on a logarithmic scale with curves labelled GOE and GUE; inset: $P_2$ versus $kr$]
Figure 4: Probability distribution (steps) and spatial correlation (thick line in the inset) of local densities in the open stadium billiard for (a) $kd/\pi = 1.8785$ ($n = 1$) and (b) $kd/\pi = 4.6553$ ($n = 1$). The two thin lines show the GOE (Eq. (1)) and GUE (Eq. (2)) cases (Eq. (3) for the inset). Taken from the work by the authors in Ref. [16] (unpublished).
ORIGIN OF QUANTUM PROBABILITIES
ANDREI KHRENNIKOV
International Center for Mathematical Modeling in Physics and Cognitive Sciences, MSI, University of Växjö, S-35195, Sweden
Email: [email protected]
We demonstrate that the origin of the quantum probabilistic rule (which differs from the conventional Bayes formula by the presence of a $\cos\theta$ factor) might be explained by perturbation effects of preparation and measurement procedures. The main consequence of our investigation is that interference could be produced by purely corpuscular objects. In particular, the quantum rule for probabilities (with a nontrivial $\cos\theta$ factor) could be simulated for macroscopic physical systems via preparation procedures producing statistical deviations of a special form. We discuss preparation and measurement procedures which may produce probabilistic rules that are neither classical nor quantum; in particular, a hyperbolic 'quantum theory'.
1 Introduction
It is well known that the conventional probabilistic rule, the formula of total probability (which is based on Bayes' formula for conditional probabilities), cannot be applied to quantum experiments; see, for example, [1]-[12] for extended discussions. It seems that the special features of quantum probabilistic behaviour are just consequences of violations of the conventional probabilistic rule. In this paper we restrict our investigations to the two-dimensional case. Here the formula of total probability has the form ($i = 1, 2$):
$$p(A = a_i) = p(B = b_1)\,p(A = a_i/B = b_1) + p(B = b_2)\,p(A = a_i/B = b_2), \qquad (1)$$
where $A$ and $B$ are physical variables which take the values $a_1, a_2$ and $b_1, b_2$, respectively. The symbols $p(A = a_i/B = b_j)$ denote conditional probabilities. This is one of the most important rules used in applied probability theory. In fact, it is a prediction rule: if we know the probabilities for $B$ and the conditional probabilities, then we can find the probabilities for $A$. However, this rule cannot be used to predict the probabilities observed in experiments with elementary particles. The violation of the conventional probabilistic rule, and the necessity of a new prediction rule, was found in interference experiments with elementary particles. This astonishing fact was one of the main reasons for building the quantum formalism on the basis of wave-particle duality.
Let $\phi$ be a quantum state, and let $\{|b_i\rangle\}_{i=1}^{2}$ be the basis consisting of eigenvectors of the operator $B$ corresponding to the physical observable $B$. The quantum probabilistic rule has the form ($i = 1, 2$):
$$p_i = q_1 p_{1i} + q_2 p_{2i} \pm 2\sqrt{q_1 p_{1i}\, q_2 p_{2i}}\,\cos\theta, \qquad (2)$$
where $p_i = p_\phi(A = a_i)$, $q_j = p_\phi(B = b_j)$, $p_{ij} = p_{|b_i\rangle}(A = a_j)$, $i, j = 1, 2$. Here the probabilities carry indices corresponding to quantum states. Denoting $P = p_i$, $P_1 = q_1 p_{1i}$ and $P_2 = q_2 p_{2i}$, we get the standard quantum probabilistic rule for the interference of alternatives: $P = P_1 + P_2 + 2\sqrt{P_1 P_2}\cos\theta$. There is a large diversity of opinions on the origin of the violations of the conventional probabilistic rule (1) in quantum mechanics; see [1]-[12]. The common opinion (for example, of Dirac, Feynman, Schrödinger) is that violations of (1) are induced by special properties of quantum systems. Thus the quantum probabilistic rule must be considered a peculiarity of nature. An interesting investigation of this problem is contained in the paper of J. Summhammer [12]. In contrast to Dirac, Feynman, Schrödinger, ..., he claimed that quantum probabilistic rule (2) is not a peculiarity of nature, but just a consequence of one special method of the probabilistic description of nature, the so-called method of maximum predictive power. In this paper we provide a probabilistic analysis of quantum rule (2). In our analysis 'probability' has the meaning of frequency probability, namely the limit of frequencies in a long sequence of trials (or for a large statistical ensemble). Hence we in fact follow R. von Mises' approach to probability [13]. It seems that it would be impossible to find the roots of quantum rule (2) in the measure-theoretical framework (A. N. Kolmogorov, 1933 [14]). In the measure-theoretical framework probabilities are defined as measures, i.e., as real-valued set functions with certain special mathematical properties, and the conventional rule (1) is merely a consequence of the definition of conditional probabilities. In the Kolmogorov framework, therefore, analysing the transition from (1) to (2) amounts to analysing a transition from one definition to another. In the frequency framework, by contrast, we can analyse the behaviour of the trials that induce one or another property of probability.
Our analysis shows that quantum probabilistic rule (2) can, in principle, be a consequence of perturbation effects of preparation and measurement procedures. Thus trigonometric fluctuations of quantum probabilities can be explained without wave arguments. In fact, our investigation is strongly based on Dirac's famous analysis of the foundations of quantum mechanics; see [1]. In particular, P. Dirac pointed out that one of the main differences between the classical and quantum theories is that in the quantum case perturbation effects of preparation and measurement procedures play the crucial role. However, P. Dirac could not explain the origin of interference for quantum particles in a purely corpuscular model. He had to resort to wave arguments: 'If the two components are now made to interfere, we should require a photon in one component to be able to interfere with one in the other' [1]. In this paper we discuss perturbation effects of preparation and measurement procedures. We remark that we do not follow W. Heisenberg [15]: we do not study perturbation effects for individual measurements, but statistical (ensemble) deviations induced by perturbations.ᵃ We underline again that our probabilistic analysis was possible only because we rejected Kolmogorov's measure-theoretical model of probability theory. Of course, each particular experiment (measurement) can be described by Kolmogorov's model: there are no 'quantum probabilities'. Moreover, it seems that there is nothing more than the binomial probability distribution (see the paper of J. Summhammer in the present volume). The most important feature of QUANTUM STATISTICS is not related to a single experiment. We have to consider at least three different experiments (preparation procedures) to observe 'quantum probabilistic behaviour', namely the interference of alternatives. Kolmogorov's model is not adequate for such a situation: in this model all random variables are defined on the same probability space, and this cannot be done for several experiments that produce interference of alternatives (at least the author does not see any way to do it). In our analysis probability is 'classical', i.e., relative frequency, but it is not Kolmogorovian (compare with Accardi [3]). An unexpected consequence of our analysis is that quantum probability rule (2) is just one of the possible perturbations (by ensemble fluctuations) of conventional probability rule (1).
In principle, there might exist experiments that produce perturbations of conventional probabilistic rule (1) which differ from quantum probabilistic rule (2). Moreover, if we use the same normalization of the interference term, namely $2\sqrt{P_1 P_2}$, then we can classify all possible probabilistic rules that we have in nature: 1) trigonometric; 2) hyperbolic; 3) hyper-trigonometric. The hyperbolic probabilistic transformation has a linear space representation that is similar to the standard quantum formalism in complex Hilbert space. Instead of complex numbers, we use the so-called hyperbolic numbers; see, for example, [18], p. 21. The development of hyperbolic quantum mechanics can be interesting for a comparative analysis with standard quantum mechanics. In particular, we clarify the role of complex numbers in quantum theory: complex (as well as hyperbolic) numbers are used to linearize a nonlinear probabilistic rule (which, in general, cannot be linearized over the real numbers). Another interesting feature of hyperbolic quantum mechanics is the violation of the superposition principle; here we have only a restricted variant of this principle.

ᵃ Such an approach implies the statistical viewpoint on the Heisenberg uncertainty relation: the statistical dispersion principle; see L. Ballentine [16], [17] for the details.

2 Quantum formalism and perturbation effects
1. Frequency probability theory. The frequency definition of probability is more or less standard in quantum theory, especially in the approach based on preparation and measurement procedures [5], [10], [16], [11]. Let us consider a sequence of physical systems $\pi = (\pi_1, \pi_2, \ldots, \pi_N, \ldots)$. Suppose that the elements of $\pi$ have some property, for example, position or spin, and that this property can be described by natural numbers: $L = \{1, 2, \ldots, m\}$, the set of labels. Thus, for each $\pi_j \in \pi$, we have a number $x_j \in L$. So $\pi$ induces a sequence
$$x = (x_1, x_2, \ldots, x_N, \ldots), \quad x_j \in L. \qquad (3)$$
For each fixed $\alpha \in L$, we have the relative frequency $\nu_N(\alpha) = n_N(\alpha)/N$ of the appearance of $\alpha$ in $(x_1, x_2, \ldots, x_N)$. Here $n_N(\alpha)$ is the number of elements in $(x_1, x_2, \ldots, x_N)$ with $x_j = \alpha$. R. von Mises [13] said that $x$ satisfies the principle of the statistical stabilization of relative frequencies if, for each fixed $\alpha \in L$, there exists the limit
$$p(\alpha) = \lim_{N\to\infty} \nu_N(\alpha). \qquad (4)$$
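The frequency definition (4) can be illustrated with a short Python sketch (our illustration, not part of the original analysis). Here the sequence $x$ is simulated by a seeded pseudo-random generator with assumed label weights, which is only a stand-in for a physical sequence of trials; `relative_frequencies` is a hypothetical helper computing $\nu_N(\alpha) = n_N(\alpha)/N$ exactly as a fraction.

```python
import random
from collections import Counter
from fractions import Fraction


def relative_frequencies(xs):
    """nu_N(alpha) = n_N(alpha) / N for every label alpha occurring in x_1, ..., x_N."""
    n = len(xs)
    return {label: Fraction(count, n) for label, count in Counter(xs).items()}


if __name__ == "__main__":
    labels, weights = [1, 2, 3], [0.2, 0.5, 0.3]  # assumed weights for the simulated trials
    rng = random.Random(42)
    x = rng.choices(labels, weights=weights, k=100_000)
    for n in (10, 1_000, 100_000):
        nu = relative_frequencies(x[:n])
        print(n, {a: round(float(nu.get(a, 0)), 3) for a in labels})
```

As $N$ grows, the printed frequencies settle near the assumed weights; for a simulated sequence this stabilization is guaranteed by the law of large numbers, whereas von Mises takes it as a postulate about the physical sequence.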
This limit is said to be the probability of $\alpha$. Thus probability is defined as the limit of relative frequencies. In fact, this definition of probability is used in all experimental investigations. In Kolmogorov's approach [14] probability is defined as a measure, and the principle of statistical stabilization is obtained as a mathematical theorem, the law of large numbers. 2. Preparation and measurement procedures and quantum formalism. We consider a statistical ensemble $S$ of quantum particles described by a quantum state $\phi$. This ensemble is produced by some preparation procedure $\mathcal{E}$; see, for example, [4], [5], [16], [10], [11] for details, and also P. Dirac [1]: 'In practice the conditions could be imposed by a suitable preparation of the system, consisting perhaps in passing it through various kinds of sorting apparatus, such as slits and polarimeters, the system being left undisturbed after the preparation.' There are two discrete physical observables, $B = b_1, b_2$ and $A = a_1, a_2$.
The total number of particles in $S$ is equal to $N$. Suppose that there are $n_i^b$, $i = 1, 2$, particles in $S$ with $B = b_i$ and $n_i^a$, $i = 1, 2$, particles in $S$ with $A = a_i$. Suppose that, among the particles with $B = b_i$, there are $n_{ij}$, $i, j = 1, 2$, particles with $A = a_j$ (see (R) below for the meaning of 'with'). So $n_i^b = n_{i1} + n_{i2}$ and $n_j^a = n_{1j} + n_{2j}$, $i, j = 1, 2$. (R) We follow Einstein and use the objective realist model, in which both $B$ and $A$ are objective properties of a quantum particle; see [5], [4], [10] for the details. In particular, here each elementary particle has simultaneously defined position and momentum. In such a model we can consider in the ensemble $S$ the sub-ensembles $S_j(B)$ and $S_j(A)$, $j = 1, 2$, of particles having the properties $B = b_j$ and $A = a_j$, respectively. Set $S_{ij}(A, B) = S_i(B) \cap S_j(A)$. Then $n_{ij}$ is the number of elements in the ensemble $S_{ij}(A, B)$. We remark that the 'existence' of the objective property ($B = b_i$ and $A = a_j$) need not imply the possibility of measuring this property. For example, such a measurement is impossible in the case of incompatible observables. In general the property ($B = b_i$ and $A = a_j$) is a kind of hidden objective property.ᵇ Physical experience says that the following frequency probabilities are well defined for all observables $B$, $A$:
$$q_i = p_\phi(B = b_i) = \lim_{N\to\infty} q_i^{(N)}, \qquad p_i = p_\phi(A = a_i) = \lim_{N\to\infty} p_i^{(N)}, \qquad (5)$$
$$q_i^{(N)} = \frac{n_i^b}{N}, \qquad p_i^{(N)} = \frac{n_i^a}{N}. \qquad (6)$$
Let the quantum states $|b_i\rangle$ be eigenstates of the operator $B$. Let us consider statistical ensembles $T_i$, $i = 1, 2$, of quantum particles described by the quantum states $|b_i\rangle$. These ensembles are produced by some preparation procedures $\mathcal{E}_i$. For instance, we can suppose that particles produced by a preparation procedure $\mathcal{E}$ (for the quantum state $\phi$) pass through additional filters $F_i$, $i = 1, 2$. In the quantum formalism we have
$$\phi = \sqrt{q_1}\,|b_1\rangle + e^{i\theta}\sqrt{q_2}\,|b_2\rangle. \qquad (7)$$
ᵇ Attempts to use objective realism in quantum theory were strongly criticized, especially in connection with the EPR-Bell considerations. Moreover, many authors (for example, P. Dirac [1] and R. Feynman [2]) claimed that the contradiction between objective realism and quantum theory can be observed just by comparing the conventional and quantum probabilistic rules (see d'Espagnat [4] for an extended discussion). However, in this paper we demonstrate that there is no direct contradiction between objective realism and the quantum probabilistic rule.
In the objective realist model (R) this representation may create the illusion that the ensembles $T_i$, $i = 1, 2$, for the states $|b_i\rangle$ must be identified with the sub-ensembles $S_i(B)$ of the ensemble $S$ for the state $\phi$. However, there are no physical reasons for such an identification: the additional filter $F_i$ ($i = 1, 2$) changes the $A$-property of quantum particles. In general, the probability distribution of the property $A$ for the ensemble $S_i(B) = \{\pi \in S : B(\pi) = b_i\}$ differs from the corresponding probability distribution for the ensemble $T_i$. Suppose that there are $m_{ij}$ particles in the ensemble $T_i$ with $A = a_j$ ($j = 1, 2$).ᶜ The following frequency probabilities are well defined: $p_{ij} = p_{|b_i\rangle}(A = a_j) = \lim_{N\to\infty} p_{ij}^{(N)}$, where the relative frequency is $p_{ij}^{(N)} = m_{ij}/n_i^b$ (by measuring the values of the variable $A$ for the statistical ensemble $T_i$, we always observe the stabilization of the relative frequencies $p_{ij}^{(N)}$ to some constant probability $p_{ij}$). Here it is assumed that the ensemble $T_i$ consists of $n_i^b$ particles, $i = 1, 2$. This assumption is natural if we consider the preparation procedure $\mathcal{E}_i = F_i$, a filter with respect to the value $B = b_i$: only particles with $B = b_i$ pass this filter. Hence the number of elements in the ensemble $T_i$ (represented by the state $|b_i\rangle$) coincides with the number of elements with $B = b_i$ in the ensemble $S$ (represented by the state $\phi$). It is also assumed that $n_i^b = n_i^b(N) \to \infty$ as $N \to \infty$; this holds if both probabilities $q_i$, $i = 1, 2$, are nonzero. We remark that the probabilities $p_{ij} = p_{|b_i\rangle}(A = a_j)$ cannot (in general) be identified with the conditional probabilities $p_\phi(A = a_j/B = b_i)$. As we have remarked, these probabilities are related to statistical ensembles prepared by different preparation procedures, namely by $\mathcal{E}_i$, $i = 1, 2$, and $\mathcal{E}$. The probabilities $p_{|b_i\rangle}(A = a_j)$ can be found by measuring the $A$-variable for particles belonging to the ensemble $T_i$.
The probabilities $p_\phi(A = a_j/B = b_i)$ in general cannot be found; they are hidden probabilities with respect to the ensemble $S$.

ᶜ We can use the objective realist model (R); then $m_{ij}$ is just the number of particles in the ensemble $T_i$ having the objective property $A = a_j$. We can also use the contextualist model (C); then $m_{ij}$ is the number of particles in the ensemble $T_i$ which, in the process of an interaction with a measurement device for the physical observable $A$, would give the result $A = a_j$.

3. Derivation of the quantum probabilistic rule. Here we present the standard Hilbert space calculations. Expanding the eigenvectors of $B$ in the eigenbasis $\{|a_1\rangle, |a_2\rangle\}$ of $A$, we have
$$|b_1\rangle = \sqrt{p_{11}}\,|a_1\rangle + e^{i\gamma_1}\sqrt{p_{12}}\,|a_2\rangle, \qquad |b_2\rangle = \sqrt{p_{21}}\,|a_1\rangle + e^{i\gamma_2}\sqrt{p_{22}}\,|a_2\rangle. \qquad (8)$$
We note that $p_{11} + p_{12} = 1$ and $p_{21} + p_{22} = 1$. The first sum is the probability of observing one of the values of the variable $A$ for the statistical ensemble $T_1$; the second sum is the corresponding probability for the statistical ensemble $T_2$. As $\langle b_1|b_2\rangle = 0$, we obtain
$$\sqrt{p_{11}p_{21}} + e^{i(\gamma_1 - \gamma_2)}\sqrt{p_{12}p_{22}} = 0.$$
We suppose that all probabilities $p_{ij} > 0$. This is equivalent to saying that $A$ and $B$ are incompatible observables, i.e., that the operators $A$ and $B$ do not commute. Taking the imaginary part then gives $\sin(\gamma_1 - \gamma_2) = 0$, so $\gamma_2 = \gamma_1 + \pi k$. The real part gives $\sqrt{p_{11}p_{21}} + \cos(\gamma_1 - \gamma_2)\sqrt{p_{12}p_{22}} = 0$. This implies that $k = 2l + 1$ and $\sqrt{p_{11}p_{21}} = \sqrt{p_{12}p_{22}}$. As $p_{12} = 1 - p_{11}$ and $p_{21} = 1 - p_{22}$, we obtain
$$p_{11} = p_{22}, \qquad p_{12} = p_{21}. \qquad (9)$$
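These equalities are equivalent to the conditions $p_{11} + p_{21} = 1$, $p_{12} + p_{22} = 1$. Hence the matrix of probabilities $(p_{ij})$ is a doubly stochastic matrix; see, for example, [5] for general considerations. Thus, in fact,
$$|b_1\rangle = \sqrt{p_{11}}\,|a_1\rangle + e^{i\gamma_1}\sqrt{p_{12}}\,|a_2\rangle, \qquad |b_2\rangle = \sqrt{p_{21}}\,|a_1\rangle - e^{i\gamma_1}\sqrt{p_{22}}\,|a_2\rangle. \qquad (10)$$
So $\phi = d_1|a_1\rangle + d_2|a_2\rangle$, where
$$d_1 = \sqrt{q_1 p_{11}} + e^{i\theta}\sqrt{q_2 p_{21}}, \qquad d_2 = e^{i\gamma_1}\sqrt{q_1 p_{12}} - e^{i(\gamma_1+\theta)}\sqrt{q_2 p_{22}}.$$
Thus
$$p_1 = p_\phi(A = a_1) = |d_1|^2 = q_1 p_{11} + q_2 p_{21} + 2\sqrt{q_1 p_{11}\, q_2 p_{21}}\,\cos\theta; \qquad (11)$$
$$p_2 = p_\phi(A = a_2) = |d_2|^2 = q_1 p_{12} + q_2 p_{22} - 2\sqrt{q_1 p_{12}\, q_2 p_{22}}\,\cos\theta. \qquad (12)$$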
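The derivation can be verified numerically. The Python sketch below is our own check with hypothetical names: it builds $|b_1\rangle$, $|b_2\rangle$ as coefficient pairs in the $\{|a_1\rangle, |a_2\rangle\}$ basis from Eq. (10) with a doubly stochastic matrix, assumes the state has the form $\phi = \sqrt{q_1}\,|b_1\rangle + e^{i\theta}\sqrt{q_2}\,|b_2\rangle$ (the form implied by the coefficients $d_1$, $d_2$), and compares the Born probabilities $|d_i|^2$ with the right-hand sides of Eqs. (11) and (12).

```python
import cmath
import math


def b_basis(p11, gamma1):
    """|b1>, |b2> as coefficient pairs in the a-basis, Eq. (10), with p22 = p11 and p12 = p21 = 1 - p11."""
    p12 = p21 = 1.0 - p11
    p22 = p11
    ph = cmath.exp(1j * gamma1)
    b1 = (math.sqrt(p11), ph * math.sqrt(p12))
    b2 = (math.sqrt(p21), -ph * math.sqrt(p22))
    return b1, b2


def born_vs_rule(q1, p11, gamma1, theta):
    """Return (|d1|^2, |d2|^2) and the predictions of Eqs. (11)-(12) for the same parameters."""
    q2 = 1.0 - q1
    p12 = p21 = 1.0 - p11
    p22 = p11
    b1, b2 = b_basis(p11, gamma1)
    eth = cmath.exp(1j * theta)
    d = [math.sqrt(q1) * x + eth * math.sqrt(q2) * y for x, y in zip(b1, b2)]
    born = (abs(d[0]) ** 2, abs(d[1]) ** 2)
    rule = (
        q1 * p11 + q2 * p21 + 2.0 * math.sqrt(q1 * p11 * q2 * p21) * math.cos(theta),
        q1 * p12 + q2 * p22 - 2.0 * math.sqrt(q1 * p12 * q2 * p22) * math.cos(theta),
    )
    return born, rule


if __name__ == "__main__":
    born, rule = born_vs_rule(q1=0.3, p11=0.8, gamma1=1.1, theta=0.4)
    print(born, rule)  # the two pairs agree up to rounding, and each sums to 1
```

Setting `theta = math.pi / 2` makes the interference terms vanish, so the output reduces to the conventional rule (1) with $p_{ij}$ in place of the conditional probabilities.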
3. Probability transformations connecting preparation procedures. Let us forget the quantum theory for the moment. Let $B$ (= $b_1, b_2$) and $A$ (= $a_1, a_2$) be physical variables. We consider an arbitrary preparation procedure $\mathcal{E}$ for microsystems or macrosystems. Suppose that $\mathcal{E}$ produces an ensemble $S$ of physical systems. Let $\mathcal{E}_1$ and $\mathcal{E}_2$ be preparation procedures based on filters $F_1$ and $F_2$ corresponding, respectively, to the values $b_1$ and $b_2$ of $B$. Denote the statistical ensembles produced by these preparation procedures by $T_1$ and $T_2$, respectively. The symbols have the same meaning as in the previous considerations, and the probabilities $q_i$, $p_{ij}$, $p_i$ are defined in the same way. The only difference is that, instead of indices corresponding to quantum states, we use indices corresponding to statistical ensembles: $q_i = p_S(B = b_i)$, $p_i = p_S(A = a_i)$, $p_{ij} = p_{T_i}(A = a_j)$. We restrict our considerations to the case of strictly positive probabilities. The following simple frequency considerations are basic to our investigation. We would like to represent the frequency $p_i^{(N)}$ (for $A = a_i$ in the ensemble $S$) as the sum of the conventional (Bayes) part, $q_1^{(N)} p_{1i}^{(N)} + q_2^{(N)} p_{2i}^{(N)}$, and some perturbation term. Such a perturbation term appears because the frequencies $q_i^{(N)}$ and $p_{ij}^{(N)}$ are calculated with respect to different ensembles. The magnitude of this perturbation term will play the crucial role in our further analysis. We have:
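$$p_i^{(N)} = \frac{n_i}{N} = \frac{m_{1i}}{N} + \frac{m_{2i}}{N} + \frac{n_{1i} - m_{1i}}{N} + \frac{n_{2i} - m_{2i}}{N}.$$
But, for $i = 1, 2$, we have
$$\frac{m_{1i}}{N} = \frac{m_{1i}}{n_1}\cdot\frac{n_1}{N} = p_{1i}^{(N)}q_1^{(N)}, \qquad \frac{m_{2i}}{N} = \frac{m_{2i}}{n_2}\cdot\frac{n_2}{N} = p_{2i}^{(N)}q_2^{(N)}.$$
Hence
$$p_i^{(N)} = q_1^{(N)}p_{1i}^{(N)} + q_2^{(N)}p_{2i}^{(N)} + \delta_i^{(N)},$$
where
$$\delta_i^{(N)} = \frac{1}{N}\left[(n_{1i} - m_{1i}) + (n_{2i} - m_{2i})\right], \quad i = 1, 2. \tag{13}$$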
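The decomposition (13) is an exact identity for any finite counts. The sketch below checks it with exact rational arithmetic on made-up counts; the helper name `decompose` and all numbers are hypothetical, and it assumes only what the first equality above uses, namely $p_i^{(N)} = (n_{1i} + n_{2i})/N$, $q_j^{(N)} = n_j/N$ and $p_{ji}^{(N)} = m_{ji}/n_j$.

```python
from fractions import Fraction


def decompose(N, n, m, nn, i):
    """Return (p_i, Bayes part, delta_i) for the outcome A = a_i, as in (13).

    N  -- size of the ensemble S
    n  -- {j: n_j}, sizes of the ensembles T_1, T_2
    m  -- {(j, i): m_ji}, number of elements of T_j with A = a_i
    nn -- {(j, i): n_ji}, counts with p_i^(N) = (n_1i + n_2i) / N
    """
    p_i = Fraction(nn[(1, i)] + nn[(2, i)], N)
    bayes = sum(Fraction(n[j], N) * Fraction(m[(j, i)], n[j]) for j in (1, 2))
    delta = Fraction((nn[(1, i)] - m[(1, i)]) + (nn[(2, i)] - m[(2, i)]), N)
    return p_i, bayes, delta


# Hypothetical counts for an ensemble of N = 1000 systems.
N = 1000
n = {1: 400, 2: 600}
m = {(1, 1): 100, (1, 2): 300, (2, 1): 420, (2, 2): 180}
nn = {(1, 1): 130, (1, 2): 270, (2, 1): 400, (2, 2): 200}

for i in (1, 2):
    p_i, bayes, delta = decompose(N, n, m, nn, i)
    print(i, p_i, bayes, delta, p_i == bayes + delta)
```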
In fact, this rest term depends on the statistical ensembles $S, T_1, T_2$: $\delta_i^{(N)} = \delta_i^{(N)}(S, T_1, T_2)$.
4. Behaviour of fluctuations. First we remark that $\lim_{N\to\infty}\delta_i^{(N)}$ exists for all physical measurements. We always observe that $p_i^{(N)} \to p_i$, $q_i^{(N)} \to q_i$, $p_{ij}^{(N)} \to p_{ij}$ as $N \to \infty$. Thus there exist the limits $\delta_i = \lim_{N\to\infty}\delta_i^{(N)} = p_i - q_1p_{1i} - q_2p_{2i}$. The coefficient $\delta_i$ is the statistical deviation produced by the perturbation effect of the preparation procedures $\mathcal{E}_i$ (the quantities $\delta_i^{(N)}$ are experimental statistical deviations). Suppose that the preparation procedures $\mathcal{E}_i$, $i = 1, 2$ (typically filters $F_i$), produce changes in properties of particles that are negligibly small with respect to the size $N$ of the statistical ensemble. Then
$$\delta_i^{(N)} \to 0, \quad N \to \infty. \tag{14}$$
This asymptotic implies the conventional probabilistic rule (1). In particular, this rule can be used in all experiments of classical physics. Hence preparation and measurement procedures of classical physics produce experimental statistical deviations with asymptotic (14). We also have such behaviour in the case of compatible observables in quantum physics. Moreover, we can obtain the same conventional probabilistic rule for incompatible observables $B$ and $A$ if the phase $\theta = \pi/2 + \pi k$. Therefore the conventional probabilistic rule (1) is not directly related to commutativity of the corresponding operators in quantum theory; it is a consequence of asymptotic (14). Despite the same asymptotic (14), there is a crucial difference between classical (and compatible) observations and decoherence, $\theta = \pi/2 + \pi k$, for incompatible observations. In the first case both
$$\delta_{i,1}^{(N)} = \frac{1}{N}(n_{1i} - m_{1i}) \approx 0, \qquad \delta_{i,2}^{(N)} = \frac{1}{N}(n_{2i} - m_{2i}) \approx 0, \quad N \to \infty.$$
In an ideal classical experiment we have $n_{1i} = m_{1i}$ and $n_{2i} = m_{2i}$. Here the preparation procedures $\mathcal{E}_i$ (filters with respect to the values $b_i$ of the variable $B$) do not change the values of the $A$-variable at all. In the case of decoherence of incompatible observables the statistical deviations $\delta_{i,1}^{(N)}$ and $\delta_{i,2}^{(N)}$ are not negligibly small, so the perturbations can be sufficiently strong. However, we still observe (14), as a consequence of the compensation effect of perturbations:
$$\delta_{i,1}^{(N)} \approx -\delta_{i,2}^{(N)}.$$
Suppose now that the filters $F_i$, $i = 1, 2$, produce changes in properties of particles that are not negligibly small (from the statistical viewpoint). Then the statistical deviations
$$\lim_{N\to\infty}\delta_i^{(N)} = \delta_i \neq 0. \tag{15}$$
Here we obtain probabilistic rules which differ from the conventional one (1). In particular, this implies that behaviour (15) cannot be produced in experiments of classical physics (or for compatible observables in quantum physics). A rather special class of statistical deviations (15) is produced in experiments of quantum physics. However, behaviour of the form (15) is not a specific feature of quantum measurements (see further considerations). To study the behaviour of the fluctuations $\delta_i^{(N)}$ carefully, we represent them as
$$\delta_i^{(N)} = 2\sqrt{q_1^{(N)}p_{1i}^{(N)}q_2^{(N)}p_{2i}^{(N)}}\,\lambda_i^{(N)},$$
where
$$\lambda_i^{(N)} = \frac{(n_{1i} - m_{1i}) + (n_{2i} - m_{2i})}{2\sqrt{m_{1i}m_{2i}}}.$$
These are normalized (experimental) statistical deviations. We have used the fact that
$$q_1^{(N)}p_{1i}^{(N)}q_2^{(N)}p_{2i}^{(N)} = \frac{n_1}{N}\cdot\frac{m_{1i}}{n_1}\cdot\frac{n_2}{N}\cdot\frac{m_{2i}}{n_2} = \frac{m_{1i}m_{2i}}{N^2}.$$
In the limit $N \to \infty$ we get $\delta_i = 2\sqrt{q_1p_{1i}q_2p_{2i}}\,\lambda_i$, where the coefficients $\lambda_i = \lim_{N\to\infty}\lambda_i^{(N)}$, $i = 1, 2$. Thus we have found the general probabilistic transformation (for three preparation procedures) that can be obtained as a perturbation of the conventional probabilistic rule ($i = 1, 2$):
$$p_i = q_1p_{1i} + q_2p_{2i} + 2\sqrt{q_1q_2p_{1i}p_{2i}}\,\lambda_i. \tag{16}$$
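To see how the normalization works at finite $N$, the sketch below computes $\lambda_i^{(N)}$ from counts and checks that the finite-$N$ version of (16), with $q_j^{(N)}, p_{ji}^{(N)}$ in place of the limits, reproduces $p_i^{(N)}$ exactly. The counts are hypothetical and the function names are ours.

```python
import math


def normalized_deviation(m1i, m2i, n1i, n2i):
    """lambda_i^(N) = [(n_1i - m_1i) + (n_2i - m_2i)] / (2 sqrt(m_1i m_2i))."""
    return ((n1i - m1i) + (n2i - m2i)) / (2.0 * math.sqrt(m1i * m2i))


def rule_16(q1, q2, p1i, p2i, lam):
    """Right-hand side of (16): Bayes part plus the normalized perturbation."""
    return q1 * p1i + q2 * p2i + 2.0 * math.sqrt(q1 * q2 * p1i * p2i) * lam


# Hypothetical counts: N = 1000, n_1 = |T_1| = 400, n_2 = |T_2| = 600, outcome a_1.
N, n1, n2 = 1000, 400, 600
m11, m21 = 100, 420          # elements with A = a_1 in T_1 and in T_2
n11, n21 = 130, 400          # counts with p_1^(N) = (n11 + n21) / N

lam = normalized_deviation(m11, m21, n11, n21)
q1, q2 = n1 / N, n2 / N
p11, p21 = m11 / n1, m21 / n2
print(lam, rule_16(q1, q2, p11, p21, lam), (n11 + n21) / N)
```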
Of course, we are free in the choice of a normalization constant in the perturbation term. We use $2\sqrt{q_1q_2p_{1i}p_{2i}}$ by analogy with the quantum formalism. In fact, such a normalization was found in the quantum formalism to get the representation of probabilities with the aid of complex numbers. Complex numbers were introduced in the quantum formalism to linearize the nonlinear probabilistic transformation $q_1p_{1i} + q_2p_{2i} + 2\sqrt{q_1q_2p_{1i}p_{2i}}\cos\theta$. To do this, we use the formula ($c, d > 0$):
$$c + d + 2\sqrt{cd}\cos\theta = \left|\sqrt{c} + \sqrt{d}\,e^{i\theta}\right|^2. \tag{17}$$
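Formula (17) can be checked numerically. In the sketch below (function names are ours), the nonlinear rule on the left is compared with the squared modulus of the complex 'square root' on the right over a grid of values; note that the amplitude itself is just a sum of two terms, which is what makes linear transformations available.

```python
import cmath
import math


def trig_rule(c, d, theta):
    """Left-hand side of (17)."""
    return c + d + 2.0 * math.sqrt(c * d) * math.cos(theta)


def amplitude(c, d, theta):
    """The complex 'square root' sqrt(c) + sqrt(d) e^{i theta}: a plain sum of two terms."""
    return math.sqrt(c) + math.sqrt(d) * cmath.exp(1j * theta)


max_err = 0.0
for c in (0.1, 0.25, 0.6):
    for d in (0.05, 0.3, 0.9):
        for k in range(13):
            theta = k * math.pi / 6
            err = abs(trig_rule(c, d, theta) - abs(amplitude(c, d, theta)) ** 2)
            max_err = max(max_err, err)
print(max_err)  # rounding-level only: (17) holds on the whole grid
```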
The 'square root' $\sqrt{c} + \sqrt{d}\,e^{i\theta}$ makes it possible to use linear transformations. Thus we do not see anything mystical in the appearance of complex numbers in quantum theory: it is a consequence of the impossibility of a real linearization of the nonlinear probabilistic transformation. In classical physics the coefficients $\lambda_i = 0$. We have the same situation in quantum physics for all compatible observables, as well as for measurements of incompatible observables in some states. In the general case in quantum physics we can only say that the normalized statistical deviations satisfy
$$|\lambda_i| \le 1. \tag{18}$$
Hence, for quantum experiments, we always have
$$\left|\frac{(n_{1i} - m_{1i}) + (n_{2i} - m_{2i})}{2\sqrt{m_{1i}m_{2i}}}\right| \le 1, \quad N \to \infty. \tag{19}$$
Thus quantum perturbations induce relatively small (but not negligibly small!) statistical variations of properties. We underline again that quantum perturbations give just the proper class of perturbations satisfying condition (19). Let us consider arbitrary preparation procedures that induce perturbations satisfying (18). We can set $\lambda_i = \cos\theta_i$, $i = 1, 2$, where $\theta_i$ are some 'phases'. Then the perturbation of the conventional probabilistic rule can be represented in the form
$$\delta_i = 2\sqrt{q_1p_{1i}q_2p_{2i}}\cos\theta_i, \quad i = 1, 2. \tag{20}$$
In this case the probabilistic rule has the form ($i = 1, 2$):
$$p_i = q_1p_{1i} + q_2p_{2i} + 2\sqrt{q_1q_2p_{1i}p_{2i}}\cos\theta_i. \tag{21}$$
This is the general form of a trigonometric probabilistic transformation. The usual probabilistic calculations give us
$$1 = p_1 + p_2 = q_1p_{11} + q_2p_{21} + q_1p_{12} + q_2p_{22} + 2\sqrt{q_1q_2p_{11}p_{21}}\cos\theta_1 + 2\sqrt{q_1q_2p_{12}p_{22}}\cos\theta_2 = 1 + 2\sqrt{q_1q_2}\left[\sqrt{p_{11}p_{21}}\cos\theta_1 + \sqrt{p_{12}p_{22}}\cos\theta_2\right].$$
Thus we obtain the relation
$$\sqrt{p_{11}p_{21}}\cos\theta_1 + \sqrt{p_{12}p_{22}}\cos\theta_2 = 0. \tag{22}$$
Suppose now that the matrix of probabilities is a doubly stochastic matrix. Then $p_{11}p_{21} = p_{12}p_{22}$, and we get
$$\cos\theta_1 = -\cos\theta_2. \tag{23}$$
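The role of (22) and (23) can be illustrated numerically. The sketch below (hypothetical helper names and sample values) evaluates the trigonometric rule (21), picks $\theta_2$ from $\theta_1$ through relation (22), and checks that $p_1 + p_2 = 1$; for a doubly stochastic matrix the chosen $\theta_2$ satisfies $\cos\theta_2 = -\cos\theta_1$, as in (23), while for a general stochastic matrix (22) gives a different relation between the phases.

```python
import math


def trigonometric_rule(q1, P, thetas):
    """p_i from (21); P = [[p11, p12], [p21, p22]], thetas = (theta_1, theta_2)."""
    q2 = 1.0 - q1
    return [q1 * P[0][i] + q2 * P[1][i]
            + 2.0 * math.sqrt(q1 * q2 * P[0][i] * P[1][i]) * math.cos(thetas[i])
            for i in (0, 1)]


def theta2_from_22(P, theta1):
    """Solve (22) for theta_2 in [0, pi]; requires |cos theta_2| <= 1."""
    c2 = -math.sqrt(P[0][0] * P[1][0]) / math.sqrt(P[0][1] * P[1][1]) * math.cos(theta1)
    if abs(c2) > 1.0:
        raise ValueError("no real phase theta_2 satisfies (22) for this theta_1")
    return math.acos(c2)


q1 = 0.35
doubly = [[0.7, 0.3], [0.3, 0.7]]     # doubly stochastic
general = [[0.9, 0.1], [0.4, 0.6]]    # stochastic, but not doubly stochastic

for P, theta1 in ((doubly, 0.9), (general, 1.3)):
    theta2 = theta2_from_22(P, theta1)
    p1, p2 = trigonometric_rule(q1, P, (theta1, theta2))
    print(round(p1 + p2, 12), math.cos(theta1), math.cos(theta2))
```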
We obtain the quantum probabilistic transformation (2). We have thus demonstrated that this rule can be derived even in a realist framework. Condition (19) has an evident interpretation. To explain the mystery of the quantum probabilistic rule, we must give some physical interpretation to the condition of double stochasticity; see section 4 for such an attempt. We can simulate the quantum probabilistic transformation by using random variables $n_{ij}(\omega), m_{ij}(\omega)$ such that the deviations
$$\epsilon_{1i}^{(N)} = n_{1i} - m_{1i} = 2\xi_{1i}^{(N)}\sqrt{m_{1i}m_{2i}}, \tag{24}$$
$$\epsilon_{2i}^{(N)} = n_{2i} - m_{2i} = 2\xi_{2i}^{(N)}\sqrt{m_{1i}m_{2i}}, \tag{25}$$
where the coefficients $\xi_{ji}^{(N)}$ satisfy the inequality
$$\left|\xi_{1i}^{(N)} + \xi_{2i}^{(N)}\right| \le 1, \quad N \to \infty. \tag{26}$$
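Suppose that $\lambda_i^{(N)} = \xi_{1i}^{(N)} + \xi_{2i}^{(N)} \to \lambda_i$, $N \to \infty$, where $|\lambda_i| \le 1$. We can represent $\lambda_i^{(N)} = \cos\theta_i^{(N)}$. Then $\theta_i^{(N)} \to \theta_i \pmod{2\pi}$ as $N \to \infty$. Thus $\lambda_i = \cos\theta_i$. We remark that the conventional probabilistic rule (which is induced by ensemble fluctuations with $\lambda_i^{(N)} \to 0$) can be observed for fluctuations having relatively large absolute magnitudes. For instance, let
$$\epsilon_{1i}^{(N)} = \xi_{1i}^{(N)}\sqrt{m_{1i}}, \qquad \epsilon_{2i}^{(N)} = \xi_{2i}^{(N)}\sqrt{m_{2i}}, \quad i = 1, 2, \tag{27}$$
where the sequences of coefficients $\{\xi_{1i}^{(N)}\}$ and $\{\xi_{2i}^{(N)}\}$ are bounded ($N \to \infty$). Here
$$\lambda_i^{(N)} = \frac{\xi_{1i}^{(N)}}{2\sqrt{m_{2i}}} + \frac{\xi_{2i}^{(N)}}{2\sqrt{m_{1i}}} \to 0, \quad N \to \infty$$
(as usual, we assume that $p_{ij} > 0$).
Example 2.1. Let $N \approx 10^6$, $n_1 \approx n_2 \approx 5\cdot 10^5$, $m_{11} \approx m_{12} \approx m_{21} \approx m_{22} \approx 25\cdot 10^4$. So $q_1 = q_2 = 1/2$ and $p_{11} = p_{12} = p_{21} = p_{22} = 1/2$ (symmetric state). Suppose we have fluctuations (27) with $\xi_{1i}^{(N)} \approx \xi_{2i}^{(N)} \approx 1$. Then $\epsilon_{ji}^{(N)} \approx 500$, so $n_{ji} \approx 25\cdot 10^4 \pm 500$. Hence the relative deviation
$$\frac{\epsilon_{ji}^{(N)}}{m_{ji}} \approx \frac{500}{25\cdot 10^4} = 0.002.$$
Thus fluctuations of relative magnitude $\approx 0.002$ produce the conventional probabilistic rule. It is evident that fluctuations of essentially larger magnitude,
$$\epsilon_{1i}^{(N)} = \xi_{1i}^{(N)}(m_{1i})^{1/2}(m_{2i})^{1/\alpha}, \qquad \epsilon_{2i}^{(N)} = \xi_{2i}^{(N)}(m_{2i})^{1/2}(m_{1i})^{1/\beta}, \quad \alpha, \beta > 2, \tag{28}$$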
where $\{\xi_{1i}^{(N)}\}$ and $\{\xi_{2i}^{(N)}\}$ are bounded sequences ($N \to \infty$), also produce (for $p_{ij} \neq 0$) the conventional probabilistic rule.
Example 2.2. Let all the numbers $N, \ldots, m_{ij}$ be the same as in Example 2.1, and let the deviations have behaviour (28) with $\alpha = \beta = 4$. Here the relative deviation $\epsilon_{ji}^{(N)}/m_{ji} \approx 0.045$.
Remark 2.1. The magnitude of fluctuations can be found experimentally. Let $A$ and $B$ be two physical observables. We prepare three statistical ensembles $S, T_1, T_2$ corresponding to the states $\varphi, |b_1\rangle, |b_2\rangle$. By measurements of $B$ and $A$ for $\pi \in S$ we obtain the frequencies $q_1^{(N)}, q_2^{(N)}, p_1^{(N)}, p_2^{(N)}$. By measurements of $A$ for $\pi \in T_1$ and for $\pi \in T_2$ we obtain the frequencies $p_{ij}^{(N)}$. We have
$$\lambda_i^{(N)} = \frac{p_i^{(N)} - q_1^{(N)}p_{1i}^{(N)} - q_2^{(N)}p_{2i}^{(N)}}{2\sqrt{q_1^{(N)}p_{1i}^{(N)}q_2^{(N)}p_{2i}^{(N)}}}.$$
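The arithmetic of Examples 2.1 and 2.2, together with the estimator of Remark 2.1, can be reproduced as follows. This is a sketch under the examples' assumptions: all $m_{ji} = 25\cdot 10^4$, $\xi_{ji}^{(N)} = 1$, and deviations following model (27) or model (28) with $\alpha = \beta = 4$; the helper names are ours. In the symmetric state the frequency $p_i^{(N)}$ exceeds the Bayes part $1/2$ by $\delta_i^{(N)} = 2\epsilon/N$, so both routes to $\lambda_i^{(N)}$ must agree.

```python
import math


def fluctuation(m_own, m_other, xi, alpha=None):
    """epsilon_ji^(N) under model (27) (alpha=None) or model (28) (extra factor m_other^(1/alpha))."""
    eps = xi * math.sqrt(m_own)
    if alpha is not None:
        eps *= m_other ** (1.0 / alpha)
    return eps


def lambda_from_counts(eps1, eps2, m1, m2):
    """lambda_i^(N) = (epsilon_1i + epsilon_2i) / (2 sqrt(m_1i m_2i))."""
    return (eps1 + eps2) / (2.0 * math.sqrt(m1 * m2))


def lambda_from_frequencies(p_i, q1, q2, p1i, p2i):
    """Remark 2.1: lambda_i^(N) from the observed frequencies."""
    return (p_i - q1 * p1i - q2 * p2i) / (2.0 * math.sqrt(q1 * p1i * q2 * p2i))


N, m = 10**6, 25 * 10**4
results = {}
for label, alpha in (("Example 2.1", None), ("Example 2.2", 4)):
    eps = fluctuation(m, m, xi=1.0, alpha=alpha)
    lam = lambda_from_counts(eps, eps, m, m)
    # Symmetric state: q_j = p_ji = 1/2, and p_i^(N) = 1/2 + 2 * eps / N.
    lam_freq = lambda_from_frequencies(0.5 + 2 * eps / N, 0.5, 0.5, 0.5, 0.5)
    results[label] = (eps, eps / m, lam, lam_freq)
    print(label, round(eps), round(eps / m, 4), round(lam, 4), round(lam_freq, 4))
```

Increasing `m` in this sketch makes $\lambda_i^{(N)}$ shrink like $m^{-1/2}$ under (27) and like $m^{-1/4}$ under (28) with $\alpha = 4$, in line with the claim that both models lead to the conventional rule.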
It would be interesting to obtain graphs of the functions $\lambda_i^{(N)}$ for different pairs of physical observables. Of course, we know that $\lim_{N\to\infty}\lambda_i^{(N)} = \pm\cos\theta$. However, it may be that such graphs can reveal a finer structure of quantum states.
3
Hyperbolic and hyper-trigonometric probabilistic transformations
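Let $\mathcal{E}_1, \mathcal{E}_2$ be preparation procedures that produce perturbations such that the normalized (experimental) statistical deviations satisfy
$$|\lambda_i^{(N)}| > 1, \quad N \to \infty. \tag{29}$$
Thus $|\lambda_i| > 1$, $i = 1, 2$. Here the coefficients $\lambda_i$ can be represented in the form $\lambda_i = \pm\cosh\theta_i$, $i = 1, 2$. The corresponding probability rule has the following form:
$$p_i = q_1p_{1i} + q_2p_{2i} \pm 2\sqrt{q_1q_2p_{1i}p_{2i}}\cosh\theta_i, \quad i = 1, 2.$$
The normalization $p_1 + p_2 = 1$ gives the orthogonality relation
$$\sqrt{p_{11}p_{21}}\cosh\theta_1 \pm \sqrt{p_{12}p_{22}}\cosh\theta_2 = 0.$$
Thus
$$\cosh\theta_2 = \frac{\sqrt{p_{11}p_{21}}}{\sqrt{p_{12}p_{22}}}\cosh\theta_1 \quad \text{and} \quad \operatorname{sign}\lambda_1\lambda_2 = -1. \tag{30}$$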
This probabilistic transformation can be called a hyperbolic rule. It describes a part of the nonconventional probabilistic behaviours which is not described by the 'trigonometric formalism'. Experiments (and preparation procedures $\mathcal{E}, \mathcal{E}_1, \mathcal{E}_2$) which produce hyperbolic probabilistic behaviour could be simulated on a computer. On the other hand, at the moment we have no 'natural' physical phenomena which are described by the hyperbolic probabilistic formalism. Trigonometric probabilistic behaviour corresponds to essentially better control of properties in the process of preparation than hyperbolic probabilistic behaviour. Of course, the aim of any experimenter is to approach trigonometric behaviour. However, in principle there might exist natural phenomena for which trigonometric quantum behaviour cannot be achieved.
Example 3.1. Let $q_1 = \alpha$, $q_2 = 1 - \alpha$, $p_{11} = \ldots = p_{22} = 1/2$. Then
$$p_1 = \frac{1}{2} + \sqrt{\alpha(1-\alpha)}\,\lambda_1, \qquad p_2 = \frac{1}{2} - \sqrt{\alpha(1-\alpha)}\,\lambda_1.$$
If $\alpha$ is sufficiently small, then $\lambda_1$ can, in principle, be larger than 1: the constraint $0 \le p_1, p_2 \le 1$ only requires $|\lambda_1| \le 1/(2\sqrt{\alpha(1-\alpha)})$. We can find a 'phase' $\theta$ such that the normalized statistical deviation $\lambda_1 = \cosh\theta$.
Let us consider experiments that produce the hyperbolic probabilistic rule, and let the corresponding matrix of probabilities be doubly stochastic. In this case the orthogonality relation (30) has the form $\cosh\theta_1 = \cosh\theta_2 = \cosh\theta$. We get the probabilistic transformation
$$p_1 = q_1p_{11} + q_2p_{21} \pm 2\sqrt{q_1q_2p_{11}p_{21}}\cosh\theta, \qquad p_2 = q_1p_{12} + q_2p_{22} \mp 2\sqrt{q_1q_2p_{12}p_{22}}\cosh\theta.$$
This probabilistic transformation looks similar to the quantum probabilistic transformation. The only difference is the presence of hyperbolic factors instead of trigonometric ones. This similarity makes it possible to construct a linear space representation of the hyperbolic probabilistic calculus, see section 7. The reader can easily consider the last possibility: one normalized statistical deviation $|\lambda|$ is larger than 1 and the other is less than 1; this gives a hyper-trigonometric probabilistic transformation.
Remark 3.1. The real experimental situation is more complicated. In fact, the phase parameter $\theta$ is connected with the experimental arrangement. In particular, in standard interference experiments the phase is related to the space-time structure of the experiment. It may be that in some experiments the dependence of the normalized statistical deviation $\lambda$ on $\theta$ is neither trigonometric nor hyperbolic: $P = P_1 + P_2 + 2\sqrt{P_1P_2}\,\lambda(\theta)$. However, if $|\lambda(\theta)| \le 1$, then we can obtain the trigonometric transformation just by the reparametrization $\theta' = \arccos\lambda(\theta)$.
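Example 3.1 can be made concrete. The sketch below (helper names and the value $\alpha = 0.05$ are our choices) computes the largest admissible $|\lambda_1|$, which exceeds 1 for small $\alpha$ and equals exactly 1 for $\alpha = 1/2$, and then recovers a hyperbolic phase $\theta$ with $\lambda_1 = \cosh\theta$.

```python
import math


def example_3_1(a, lam1):
    """(p1, p2) in Example 3.1: q1 = a, q2 = 1 - a, all p_ij = 1/2."""
    s = math.sqrt(a * (1.0 - a))
    return 0.5 + s * lam1, 0.5 - s * lam1


def max_lambda(a):
    """Largest |lambda_1| for which p1 and p2 both stay in [0, 1]."""
    return 1.0 / (2.0 * math.sqrt(a * (1.0 - a)))


a = 0.05
lam1 = 2.0                      # outside the trigonometric range |lambda| <= 1
assert lam1 <= max_lambda(a)    # still admissible, because a is small
theta = math.acosh(lam1)        # hyperbolic 'phase' with lambda_1 = cosh(theta)
p1, p2 = example_3_1(a, lam1)
print(max_lambda(a), max_lambda(0.5), theta, p1, p2)
```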
4
Double stochasticity and correlations between preparation procedures
In this section we study the frequency meaning of the fact that in the quantum formalism the matrix of probabilities is doubly stochastic. We remark that this is a consequence of the orthogonality of the quantum states $|b_1\rangle$ and $|b_2\rangle$ corresponding to distinct values of a physical observable $B$. We have
$$\frac{p_{11}}{p_{22}} = \frac{p_{12}}{p_{21}}. \tag{31}$$
Suppose that all quantum features are induced by the impossibility to create new ensembles $T_1$ and $T_2$ without changing the properties of quantum particles. Suppose, for example, that the preparation procedure $\mathcal{E}_1$ practically destroys the property $A = a_1$ (transforms this property into the property $A = a_2$). So $p_{11} \approx 0$. As a consequence, $\mathcal{E}_1$ makes the property $A = a_2$ dominating, so $p_{12} \approx 1$. Then the preparation procedure $\mathcal{E}_2$ must practically destroy the property $A = a_2$ (transform this property into the property $A = a_1$). So $p_{22} \approx 0$. As a consequence, $\mathcal{E}_2$ makes the property $A = a_1$ dominating, so $p_{21} \approx 1$. We remark that
We recall that the number of elements in the ensemble T is equal to n\. Thus n n -run
_ ,n 2 2 - m 2 2 , ^ nil _ "22
,„„.
This is nothing other than the relation between the fluctuations of the property $A$ under the transition from the ensemble $S$ to the ensembles $T_1, T_2$ and the distribution of this property in the ensemble $S$.
5
Hyperbolic quantum formalism
The mathematical formalism presented in this section can have different 'physical interpretations'. In particular, a quantum state can be interpreted from the orthodox Copenhagen as well as from the statistical viewpoint. A hyperbolic algebra $G$, see [18], p. 21, is a two-dimensional real algebra with basis $e_0 = 1$ and $e_1 = j$, where $j^2 = 1$. Elements of $G$ have the form $z = x + jy$, $x, y \in \mathbb{R}$. We have $z_1 + z_2 = (x_1 + x_2) + j(y_1 + y_2)$ and $z_1z_2 = (x_1x_2 + y_1y_2) + j(x_1y_2 + x_2y_1)$. This algebra is commutative. We introduce
the involution in $G$ by setting $\bar{z} = x - jy$. We set $|z|^2 = z\bar{z} = x^2 - y^2$. We remark that $|z| = \sqrt{x^2 - y^2}$ is not well defined for an arbitrary $z \in G$. We set $G_+ = \{z \in G : |z|^2 \ge 0\}$. We remark that $G_+$ is a multiplicative semigroup: $z_1, z_2 \in G_+ \Rightarrow z = z_1z_2 \in G_+$. This is a consequence of the equality $|z_1z_2|^2 = |z_1|^2|z_2|^2$. Thus, for $z_1, z_2 \in G_+$, we have $|z_1z_2| = |z_1||z_2|$. We introduce $e^{j\theta} = \cosh\theta + j\sinh\theta$, $\theta \in \mathbb{R}$. We remark that
$$e^{j\theta_1}e^{j\theta_2} = e^{j(\theta_1+\theta_2)}, \qquad \overline{e^{j\theta}} = e^{-j\theta}, \qquad |e^{j\theta}|^2 = \cosh^2\theta - \sinh^2\theta = 1.$$
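The algebra $G$ is easy to model directly. The following minimal sketch (the class name `Hyp` and the helper names are ours) implements the product, the involution and $|z|^2$, and checks the identities above. It also exhibits an element with $|z|^2 < 0$ (so $|z|$ is undefined there) and the zero divisors $(1 + j)(1 - j) = 0$ (so $G$ is not a field).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyp:
    """z = x + j y in the hyperbolic algebra G, with j * j = 1."""
    x: float
    y: float

    def __add__(self, other):
        return Hyp(self.x + other.x, self.y + other.y)

    def __mul__(self, other):
        # (x1 + j y1)(x2 + j y2) = (x1 x2 + y1 y2) + j (x1 y2 + x2 y1)
        return Hyp(self.x * other.x + self.y * other.y,
                   self.x * other.y + other.x * self.y)

    def conj(self):
        """Involution z -> x - j y."""
        return Hyp(self.x, -self.y)

    def norm2(self):
        """|z|^2 = z * conj(z) = x^2 - y^2; may be negative."""
        return self.x ** 2 - self.y ** 2


def exp_j(theta):
    """e^{j theta} = cosh(theta) + j sinh(theta)."""
    return Hyp(math.cosh(theta), math.sinh(theta))


def close(z, w, tol=1e-9):
    return abs(z.x - w.x) < tol and abs(z.y - w.y) < tol


t1, t2 = 0.7, -1.9
print(close(exp_j(t1) * exp_j(t2), exp_j(t1 + t2)))  # e^{j t1} e^{j t2} = e^{j(t1+t2)}
print(close(exp_j(t1).conj(), exp_j(-t1)))           # conjugate of e^{j t} is e^{-j t}
print(abs(exp_j(t1).norm2() - 1.0) < 1e-9)            # |e^{j t}|^2 = 1
z1, z2 = Hyp(2.0, 1.0), Hyp(1.0, 2.0)
print(z2.norm2())                                     # negative, so z2 is not in G_+
print(Hyp(1.0, 1.0) * Hyp(1.0, -1.0))                 # zero divisors: the product is 0
```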
Hence $z = \pm e^{j\theta}$ always belongs to $G_+$. We also have $\cosh\theta = \frac{e^{\theta} + e^{-\theta}}{2}$, $\sinh\theta = \frac{e^{\theta} - e^{-\theta}}{2}$. We set $G_+^* = \{z \in G_+ : |z|^2 > 0\}$. Let $z \in G_+^*$. We have
$$z = |z|\left(\frac{x}{|z|} + j\frac{y}{|z|}\right) = \operatorname{sign}x\,|z|\left(\frac{x\operatorname{sign}x}{|z|} + j\frac{y\operatorname{sign}x}{|z|}\right).$$
As $\frac{x^2}{|z|^2} - \frac{y^2}{|z|^2} = 1$, we can represent $\frac{x\operatorname{sign}x}{|z|} = \cosh\theta$ and $\frac{y\operatorname{sign}x}{|z|} = \sinh\theta$, where the phase $\theta$ is uniquely defined. We can represent each $z \in G_+^*$ as $z = \operatorname{sign}x\,|z|\,e^{j\theta}$. By using this representation we can easily prove that $G_+^*$ is a multiplicative group. Here $\frac{1}{z} = \frac{\operatorname{sign}x}{|z|}e^{-j\theta}$. The unit circle in $G$ is defined as $S_1 = \{z \in G : |z|^2 = 1\} = \{z = \pm e^{j\theta},\ \theta \in (-\infty, +\infty)\}$. It is a multiplicative subgroup of $G_+^*$.
A hyperbolic Hilbert space is a $G$-linear space (module) $E$, see [18], with a $G$-linear product: a map $(\cdot,\cdot) : E \times E \to G$ that is
1) linear with respect to the first argument: $(az + bw, u) = a(z, u) + b(w, u)$, $a, b \in G$, $z, w, u \in E$;
2) symmetric: $(z, u) = \overline{(u, z)}$;
3) nondegenerate: $(z, u) = 0$ for all $u \in E$ iff $z = 0$.
If we consider $E$ as just an $\mathbb{R}$-linear space, then $(\cdot,\cdot)$ is a bilinear form which is not positive definite. In particular, in the two-dimensional case we have the signature $(+, -, +, -)$. As in the ordinary quantum formalism, we represent physical states by normalized vectors of the hyperbolic Hilbert space:
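$\varphi \in E$ with $(\varphi, \varphi) = 1$. Let $A$ and $B$ be two observables represented by $\hat{A} = a_1|a_1\rangle\langle a_1| + a_2|a_2\rangle\langle a_2|$ and $\hat{B} = b_1|b_1\rangle\langle b_1| + b_2|b_2\rangle\langle b_2|$, where $\{|a_i\rangle\}_{i=1,2}$ and $\{|b_i\rangle\}_{i=1,2}$ are two orthonormal bases in $E$. Let $\varphi$ be a state (a normalized vector belonging to $E$). We can perform the following operation (which is well defined from the mathematical point of view). We expand the vector $\varphi$ with respect to the basis $\{|b_i\rangle\}_{i=1,2}$:
$$\varphi = \beta_1|b_1\rangle + \beta_2|b_2\rangle, \tag{34}$$
where the coefficients (coordinates) $\beta_i$ belong to $G$. As the basis $\{|b_i\rangle\}_{i=1,2}$ is orthonormal, we get (as in the complex case)
$$|\beta_1|^2 + |\beta_2|^2 = 1. \tag{35}$$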
However, we cannot automatically use Born's probabilistic interpretation for normalized vectors in the hyperbolic Hilbert space: it may happen that $\beta_i \notin G_+$ (in the complex case, by contrast, we have $\mathbb{C} = \mathbb{C}_+$). We say that a state $\varphi$ is decomposable with respect to the system of states $\{|b_i\rangle\}_{i=1,2}$ ($B$-decomposable) if
$$\beta_i \in G_+. \tag{36}$$
In such a case we can use Born's probabilistic interpretation of vectors in a hyperbolic Hilbert space: the numbers $q_i = |\beta_i|^2$, $i = 1, 2$, are interpreted as probabilities for the values $B = b_i$ for the $G$-quantum state $\varphi$. We now repeat these considerations for each state $|b_i\rangle$ by using the basis $\{|a_k\rangle\}_{k=1,2}$. We suppose that each $|b_i\rangle$ is $A$-decomposable. We have
$$|b_1\rangle = \beta_{11}|a_1\rangle + \beta_{12}|a_2\rangle, \qquad |b_2\rangle = \beta_{21}|a_1\rangle + \beta_{22}|a_2\rangle, \tag{37}$$
where the coefficients $\beta_{ik}$ belong to $G_+$. We automatically have
$$|\beta_{11}|^2 + |\beta_{12}|^2 = 1, \qquad |\beta_{21}|^2 + |\beta_{22}|^2 = 1. \tag{38}$$
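We can use the probabilistic interpretation of the numbers $p_{11} = |\beta_{11}|^2$, $p_{12} = |\beta_{12}|^2$ and $p_{21} = |\beta_{21}|^2$, $p_{22} = |\beta_{22}|^2$: $p_{ik}$ is the probability for $A = a_k$ in the state $|b_i\rangle$. Let us consider the matrices $\mathbf{B} = (\beta_{ik})$ and $\mathbf{P} = (p_{ik})$. As in the complex case, the matrix $\mathbf{B}$ is unitary: the vectors $u_1 = (\beta_{11}, \beta_{12})$ and $u_2 = (\beta_{21}, \beta_{22})$ are orthonormal. The matrix $\mathbf{P}$ is doubly stochastic. By using the $G$-linear space calculation (the change of basis) we get $\varphi = \alpha_1|a_1\rangle + \alpha_2|a_2\rangle$, where $\alpha_1 = \beta_1\beta_{11} + \beta_2\beta_{21}$ and $\alpha_2 = \beta_1\beta_{12} + \beta_2\beta_{22}$.
We remark that decomposability is not transitive. In principle, $\varphi$ may fail to be $A$-decomposable despite the $B$-decomposability of $\varphi$ and the $A$-decomposability of the $B$-system. Suppose that $\varphi$ is $A$-decomposable. Then the coefficients $p_k = |\alpha_k|^2$ can be interpreted as probabilities for $A = a_k$ for the $G$-quantum state $\varphi$. We represent $\beta_i = \pm|\beta_i|e^{j\xi_i}$ and $\beta_{ik} = \pm|\beta_{ik}|e^{j\gamma_{ik}}$, $i, k = 1, 2$.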
We find that
$$p_1 = q_1p_{11} + q_2p_{21} + 2\epsilon_1\sqrt{q_1p_{11}q_2p_{21}}\cosh\theta_1, \tag{39}$$
$$p_2 = q_1p_{12} + q_2p_{22} + 2\epsilon_2\sqrt{q_1p_{12}q_2p_{22}}\cosh\theta_2, \tag{40}$$
where $\theta_i = \eta + \gamma_i$, $\eta = \xi_1 - \xi_2$, $\gamma_1 = \gamma_{11} - \gamma_{21}$, $\gamma_2 = \gamma_{12} - \gamma_{22}$, and $\epsilon_i = \pm 1$. To find the right relation between the signs of the last terms in equations (39), (40), we use the normalization condition
$$|\alpha_1|^2 + |\alpha_2|^2 = 1 \tag{41}$$
(which is a consequence of the normalization of $\varphi$ and the orthonormality of the system $\{|a_i\rangle\}_{i=1,2}$). It is equivalent to the equation (the condition of orthogonality in the hyperbolic case, see section 8)
$$\sqrt{p_{12}p_{22}}\cosh\theta_2 \pm \sqrt{p_{11}p_{21}}\cosh\theta_1 = 0.$$
Thus we have to choose opposite signs in equations (39), (40). Unitarity of $\mathbf{B}$ also implies that $\theta_1 - \theta_2 = 0$, so $\gamma_1 = \gamma_2$. We recall that in ordinary quantum mechanics we have similar conditions, but trigonometric functions are used instead of hyperbolic ones, and the phases $\gamma_1$ and $\gamma_2$ are such that $\gamma_1 - \gamma_2 = \pi$. Finally, we get that (unitary) linear transformations in the $G$-Hilbert space (in the domain of decomposable states) represent the hyperbolic transformation of probabilities (see section 8):
$$p_1 = q_1p_{11} + q_2p_{21} \pm 2\sqrt{q_1p_{11}q_2p_{21}}\cosh\theta, \qquad p_2 = q_1p_{12} + q_2p_{22} \mp 2\sqrt{q_1p_{12}q_2p_{22}}\cosh\theta.$$
This is a kind of hyperbolic interference. There may be some connection with quantization in Hilbert spaces with indefinite metric, as well as with the theory of relativity. However, at the moment we cannot say anything definite. It seems that by using Lorentz 'rotations' we can produce hyperbolic interference in a similar way as we produce the standard trigonometric interference by using ordinary rotations.
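A small numerical sketch of this hyperbolic interference follows. The specific unitary matrix $\mathbf{B}$ (with $p_{11} = p_{22} = p$), the coefficients $\beta_1 = \sqrt{q_1}$, $\beta_2 = \sqrt{q_2}\,e^{j\xi}$, and all names are assumptions chosen for illustration. With these choices $\gamma_1 = \gamma_2 = 0$ and $\cosh\theta = \cosh\xi$, so $|\alpha_1|^2$ and $|\alpha_2|^2$ should follow the hyperbolic rule with opposite signs; for a large phase, $|\alpha_2|^2$ becomes negative, i.e., $\varphi$ is no longer $A$-decomposable.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyp:
    """z = x + j y in the hyperbolic algebra G (j * j = 1)."""
    x: float
    y: float

    def __add__(self, o):
        return Hyp(self.x + o.x, self.y + o.y)

    def __mul__(self, o):
        return Hyp(self.x * o.x + self.y * o.y, self.x * o.y + o.x * self.y)

    def scale(self, s):
        return Hyp(s * self.x, s * self.y)

    def conj(self):
        return Hyp(self.x, -self.y)

    def norm2(self):
        return self.x ** 2 - self.y ** 2


def exp_j(t):
    return Hyp(math.cosh(t), math.sinh(t))


def hyperbolic_probabilities(q1, p, xi, gamma):
    """(|alpha_1|^2, |alpha_2|^2, row product) for an illustrative G-unitary B.

    Assumed coefficients: beta_11 = sqrt(p), beta_12 = sqrt(1-p) e^{j gamma},
    beta_21 = sqrt(1-p), beta_22 = -sqrt(p) e^{j gamma},
    beta_1 = sqrt(q1), beta_2 = sqrt(1-q1) e^{j xi}.
    """
    b11, b21 = Hyp(math.sqrt(p), 0.0), Hyp(math.sqrt(1 - p), 0.0)
    b12 = exp_j(gamma).scale(math.sqrt(1 - p))
    b22 = exp_j(gamma).scale(-math.sqrt(p))
    beta1, beta2 = Hyp(math.sqrt(q1), 0.0), exp_j(xi).scale(math.sqrt(1 - q1))
    # Unitarity of B: the rows (b11, b12) and (b21, b22) are G-orthogonal.
    row_product = b11 * b21.conj() + b12 * b22.conj()
    alpha1 = beta1 * b11 + beta2 * b21
    alpha2 = beta1 * b12 + beta2 * b22
    return alpha1.norm2(), alpha2.norm2(), row_product


def hyperbolic_rule(q1, p, xi):
    """Predicted (p1, p2): + cosh term for a_1, - cosh term for a_2."""
    q2 = 1 - q1
    s = 2 * math.sqrt(q1 * q2 * p * (1 - p)) * math.cosh(xi)
    return q1 * p + q2 * (1 - p) + s, q1 * (1 - p) + q2 * p - s


P1, P2, rows = hyperbolic_probabilities(q1=0.9, p=0.5, xi=0.5, gamma=0.3)
print(P1, P2, hyperbolic_rule(0.9, 0.5, 0.5), rows)
Q1, Q2, _ = hyperbolic_probabilities(q1=0.9, p=0.5, xi=3.0, gamma=0.3)
print(Q2)  # negative: alpha_2 is not in G_+, so Born's rule cannot be applied
```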
6
Physical consequences
The wave-particle dualism was created to explain the interference phenomenon for massive elementary particles. In particular, the orthodox Copenhagen interpretation was proposed to find a compromise between corpuscular and wave features of elementary particles. The idea of a superposition of distinct 'properties' is, in fact, based on these interference experiments. It is well known that the orthodox Copenhagen interpretation is not free of difficulties (in particular, the collapse of the wave function) and even paradoxes (see, for example, Schrödinger [19]). Problems in the orthodox Copenhagen interpretation have even induced attempts to exclude corpuscular objects from quantum theory altogether; see, for example, [20] for Schrödinger's critique of the classical concept of a particle. At the moment there is only one alternative to the orthodox Copenhagen interpretation, namely Einstein's statistical interpretation. According to this interpretation, the wave function describes statistical features of an ensemble of elementary particles; see L. Ballentine [17] for the details (see also [16], [5], [10], [11]). However, we must recognize that Einstein's statistical approach could not solve the fundamental problem of quantum theory: it could not explain the appearance of NEW STATISTICS in the purely corpuscular model. We did this in the present paper. On the one hand, this is a strong argument in favour of the statistical interpretation of quantum mechanics. On the other hand, one of the main motivations for using the wave-particle duality disappears. Nevertheless, our investigation cannot be considered a crucial argument against the wave-particle duality. It is clear that by purely mathematical analysis we cannot prove or disprove a physical theory. The only thing that we have proved is that corpuscular objects (that have no wave features) can exhibit NEW STATISTICS. In fact, we obtained essentially more than we planned: these NEW STATISTICS are not reduced to QUANTUM STATISTICS.
In principle, we can propose experiments that induce TRIGONOMETRIC, HYPERBOLIC and HYPERTRIGONOMETRIC STATISTICS. We remark that the quantum probabilistic transformation
$$P = P_1 + P_2 + 2\sqrt{P_1P_2}\cos\theta$$
makes it possible to predict the probability $P$ if we know the probabilities $P_1$ and $P_2$. In principle, theories based on arbitrary transformations $P = F(P_1, P_2)$ might be created. It may be that some such rules have linear space representations over 'exotic number systems', for example, p-adic numbers [21].
Preliminary analysis of probabilistic foundations of quantum mechanics (which induced the present investigation) was performed in the books [11] and [21] (chapter 2); part of the results of this paper was presented in preprints [22]-[24].
Acknowledgements
I would like to thank S. Albeverio, L. Accardi, L. Ballentine, V. Belavkin, E. Beltrametti, W. De Muynck, S. Gudder, T. Hida, A. Holevo, P. Lahti, A. Peres, J. Summhammer and I. Volovich for (sometimes critical) discussions on probabilistic foundations of quantum mechanics.
References
1. P. A. M. Dirac, The Principles of Quantum Mechanics (Clarendon Press, Oxford, 1995).
2. R. Feynman and A. Hibbs, Quantum Mechanics and Path Integrals (McGraw-Hill, New York, 1965).
3. L. Accardi, The probabilistic roots of the quantum mechanical paradoxes, in The Wave-Particle Dualism. A Tribute to Louis de Broglie on his 90th Birthday, ed. S. Diner, D. Fargue, G. Lochak and F. Selleri (D. Reidel Publ. Company, Dordrecht, 1984), pp. 297-330.
4. B. d'Espagnat, Veiled Reality. An Analysis of Present-Day Quantum Mechanical Concepts (Addison-Wesley, 1995).
5. A. Peres, Quantum Theory: Concepts and Methods (Kluwer Academic Publishers, 1994).
6. J. von Neumann, Mathematical Foundations of Quantum Mechanics (Princeton Univ. Press, Princeton, N.J., 1955).
7. E. Schrödinger, Philosophy and the Birth of Quantum Mechanics, ed. M. Bitbol and O. Darrigol (Editions Frontieres, 1992).
8. J. M. Jauch, Foundations of Quantum Mechanics (Addison-Wesley, Reading, Mass., 1968).
9. P. Busch, M. Grabowski and P. Lahti, Operational Quantum Physics (Springer Verlag, 1995).
10. W. De Muynck, W. De Baere and H. Martens, Found. Phys. 24, 1589-1663 (1994).
11. A. Yu. Khrennikov, Interpretations of Probability (VSP Int. Publ., Utrecht, 1999).
12. J. Summhammer, Int. J. Theor. Phys. 33, 171-178 (1994).
13. R. von Mises, The Mathematical Theory of Probability and Statistics (Academic, London, 1964).
14. A. N. Kolmogoroff, Grundbegriffe der Wahrscheinlichkeitsrechnung (Springer Verlag, Berlin, 1933); reprinted: Foundations of the Theory of Probability (Chelsea Publ. Comp., New York, 1956).
15. W. Heisenberg, Z. Physik 43, 172 (1927).
16. L. E. Ballentine, Quantum Mechanics (Englewood Cliffs, New Jersey, 1989).
17. L. E. Ballentine, Rev. Mod. Phys. 42, 358-381 (1970).
18. A. Yu. Khrennikov, Superanalysis (Kluwer Academic Publishers, Dordrecht, 1999).
19. E. Schrödinger, Die Naturwissenschaften 23, 807-812, 824-828, 844-849 (1935).
20. E. Schrödinger, What is an elementary particle?, in Gesammelte Abhandlungen (Vieweg and Sohn, Wien, 1984).
21. A. Yu. Khrennikov, p-adic Valued Distributions in Mathematical Physics (Kluwer Academic Publishers, Dordrecht, 1994).
22. A. Yu. Khrennikov, Ensemble fluctuations and the origin of quantum probabilistic rule, Rep. MSI, Växjö Univ., 90, October (2000).
23. A. Yu. Khrennikov, Classification of transformations of probabilities for preparation procedures: trigonometric and hyperbolic behaviours, Preprint quant-ph/0012141, 24 Dec (2000).
24. A. Yu. Khrennikov, Hyperbolic quantum mechanics, Preprint quant-ph/0101002, 31 Dec (2000).
NONCONVENTIONAL VIEWPOINT TO ELEMENTS OF PHYSICAL REALITY BASED ON NONREAL ASYMPTOTICS OF RELATIVE FREQUENCIES
ANDREI KHRENNIKOV
International Center for Mathematical Modeling in Physics and Cognitive Sciences, MSI, University of Växjö, S-35195, Sweden
We study the connection between stabilization of relative frequencies and elements of physical reality. We observe that, besides the standard stabilization with respect to the real metric, other statistical stabilizations can be considered (in particular, with respect to the so-called p-adic metric on the set of rational numbers). Nonconventional statistical stabilizations might be connected with new (nonconventional) elements of reality. We present a few natural examples of statistical phenomena in which relative frequencies of observed events stabilize in the p-adic metric but fluctuate in the standard real metric.
1
Introduction
The present methodology of physical measurements is based on the principle of statistical stabilization of relative frequencies in a long run of trials. In the mathematical model this principle is represented by the law of large numbers. This approach to measurements is induced by the human representation of physical reality as a reality of stable repetitive phenomena. In the process of evolution we created cognitive structures that correspond to elements of this 'repetitive physical reality'. All modern physical investigations are oriented to the creation of new elements of such a reality.$^{a}$ It must be remarked that the notion of stabilization (of relative frequencies) plays the fundamental role in the creation of this reality. I would like to point out that the conventional meaning of stabilization is based on real numbers. When we say stabilization, we mean stabilization with respect to the standard real metric $\rho_{\mathbb{R}}(x, y) = |x - y|$ (the distance between points $x$ and $y$ on the real line $\mathbb{R}$). Of course, such a choice of the metric that statistically determines elements of physical reality was not just a consequence of the development of one special mathematical theory, real analysis.$^{b}$ It seems that the notion of $\rho_{\mathbb{R}}$-stabilization was induced by human practice in which quantities $n \ll N$ were not important. We created 'real physical reality' because we used smallness based on the standard order on the set of natural numbers. It must be underlined that in modern physics the real physical reality (i.e., reality based on $\rho_{\mathbb{R}}$-stability) is, in fact, identified with the whole physical reality. On the other hand, modern mathematics is no longer just real analysis. In particular, the development of general topology [2], [3] induced a large spectrum of new nearness (in particular, metric) structures. In principle, we no longer need to identify every stabilization with $\rho_{\mathbb{R}}$-stabilization. There appears a huge set of new possibilities to introduce new forms of stability in physical experiments. Moreover, new stable structures can be considered as new elements of physical reality that, in general, need not belong to the standard real reality. This idea was presented for the first time in the author's investigations [4], [5] on so-called p-adic physics [6]-[10]. Later we tried to find the place of p-adic probabilities in quantum physics [11], [12] (in particular, to justify at the mathematical level of rigour the use of negative and complex probabilities, as well as to create models with hidden variables that do not produce Bell's inequality). In this paper we give a brief introduction to these probabilistic models and present a few rather natural examples in which relative frequencies of events stabilize with respect to the so-called p-adic metric, but fluctuate with respect to $\rho_{\mathbb{R}}$. There is no corresponding element of the real reality, but there is an element of the p-adic reality. The objects considered in the examples could be created at the 'hard' level. In particular, to create a plantation in which the colour of a flower (red or white) is an element of p-adic reality, I need just a tractor and a (sufficiently large) piece of land.
$^{a}$ We ask the reader not to connect our vague ('common sense') use of the notion of an element of physical reality with the EPR sufficient condition to be an element of reality, [1].
$^{b}$ Nevertheless, we must not forget that the human factor played a large role in the expansion of the (presently dominating) model of physical reality based on real numbers. At the beginning, Newton's analysis was propagated as a kind of religion. There were (in particular in France) divine services devoted to Newton's analysis.
Nevertheless, I must agree that such p-adic elements of reality were never observed in 'naturally created' physical objects. The reader may be interested in the reasons why we concentrate on statistical stabilization with respect to p-adic numbers, i.e., on p-adic frequency probability theory. The main reason is that p-adic numbers are, in fact, the unique alternative to real numbers: there is no other possibility to complete the field of rational numbers and obtain a new number field (Ostrowski's theorem, see, for example, [13], [14]). Our probabilistic foundations are based on a generalization of R. von Mises's frequency theory of probability [15], [16]. At the beginning of the twentieth century, when the foundations of modern probability theory were being laid, the
frequency definition of probability proposed by von Mises played an important role. In particular, it was this definition of probability that Kolmogorov used to motivate his axioms of probability theory (see [17]). We also begin the construction of the new theory of probability with a frequency definition of probability. Von Mises defined the probability of an event as the limit of the relative frequencies of the occurrence of the event when the volume of the statistical sample tends to infinity. This definition is the foundation of mathematical statistics (see, for example, Cramér [18]), in which von Mises's definition is formulated as the principle of statistical stabilization of relative frequencies. In this paper, we propose a general principle of statistical stabilization of relative frequencies. By virtue of this principle, statistical stabilization of relative frequencies $\{\nu = n/N\}$ can be considered not only in the real topology on $\mathbb{Q}$ (all relative frequencies are rational numbers), but also in any other topology on $\mathbb{Q}$. Then the probabilities of events belong to the corresponding completion of the field of rational numbers. As special cases, we obtain the ordinary real probability theory (von Mises's definition) and the p-adic probability theories, $p = 2, 3, 5, \ldots$. How should one choose the topology of statistical stabilization for a given statistical sample? The topology is determined by the properties of the studied probability model. In essence, we propose the following principle: for each probability model there is a corresponding topology (or topologies) of statistical stabilization. For example, in a random sample there need not be any statistical stabilization of the relative frequencies in the real metric. Thus, from the point of view of real probability theory, this is not a probabilistic object. However, in this random sample one may observe p-adic statistical stabilization of the relative frequencies.
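The general principle can be illustrated with a hypothetical sequence of relative frequencies $\nu_k = n_k/N_k$ (think of independent samples of sizes $N_k$); the sequence is our own construction for illustration, not one of the examples announced in this paper. With $N_k = 1 + p^k$, $n_k = 1$ for even $k$ and $n_k = N_k$ for odd $k$, the frequencies keep jumping between values near 0 and the value 1 in the real metric, but $|\nu_k - 1|_p \to 0$, so they stabilize to 1 in $\mathbb{Q}_p$. The function names are ours; `padic_abs` implements the standard p-adic absolute value $|x|_p = p^{-v_p(x)}$.

```python
from fractions import Fraction


def padic_abs(x: Fraction, p: int) -> Fraction:
    """p-adic absolute value |x|_p = p^(-v_p(x)), with |0|_p = 0."""
    if x == 0:
        return Fraction(0)
    num, den, v = x.numerator, x.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return Fraction(1, p ** v) if v >= 0 else Fraction(p ** (-v))


def frequencies(p: int, K: int) -> list:
    """Hypothetical frequencies nu_k = n_k / N_k with N_k = 1 + p**k, k = 1..K."""
    out = []
    for k in range(1, K + 1):
        N_k = 1 + p ** k
        n_k = 1 if k % 2 == 0 else N_k
        out.append(Fraction(n_k, N_k))
    return out


p = 3
nus = frequencies(p, 8)
real_dist = [abs(float(nu) - 1.0) for nu in nus]   # does not tend to 0
padic_dist = [padic_abs(nu - 1, p) for nu in nus]  # tends to 0 in Q_3
for k, (r, d) in enumerate(zip(real_dist, padic_dist), start=1):
    print(k, round(r, 4), d)
```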
In essence, I am asserting that the foundation of probability theory is provided by rational numbers (relative frequencies) and not by real numbers. Real probabilities of events merely represent one of many possibilities that arise in the statistical analysis of a random sample. Such an approach to probability theory agrees well with Volovich's proposition that rational numbers are the foundation of theoretical physics [19]. According to this proposition, everything physical is rational, and number fields different from the field of rational numbers arise as idealizations needed for the theoretical description of physical results. All necessary information on p-adic (and, more generally, m-adic) numbers can be found in Appendix 1 of this paper. However, in the first two sections they are hardly used at all, and we may restrict ourselves to the remark that, in addition to the completion of the field of rational numbers $\mathbb{Q}$ with respect to the real metric, there also exist completions with respect to other metrics, among them the fields of p-adic numbers $\mathbb{Q}_p$, p = 2, 3, 5, ....
2 Analysis of the foundation of probability theory
2.1. Frequency Definition of Probability. As is well known, the frequency definition of probability proposed by von Mises [15] in 1919 played an important role in the construction of the foundations of modern probability theory. This definition exerted a strong influence on the theory of probability measures, whose foundations were laid by Borel [20], Kolmogorov [17], and Frechet [21]. There is no point in stating Kolmogorov's axioms here (they can be found in any textbook on probability theory), but it is worth recalling the main features of von Mises's theory of probability. The theory is based on infinite sequences $x = (x_1, x_2, \ldots$
more or less constant value at large N" (see Cramer [18]). In defining a collective, von Mises used a further principle, the principle of irregularity of a sequence of tests, i.e., the invariance of the limit of the relative frequencies under the selection of a subsequence from a given sequence of tests $x = (x_1, x_2, \ldots, x_n, \ldots)$ according to a definite law. It is important that this selection law must not be based on how the elements of the sequence differ with respect to the label under consideration. "Second, this limiting value must remain unchanged if from the complete sequence we choose arbitrarily any part and consider in what follows only this part" [16]. This principle, like the principle of statistical stabilization of the relative frequencies, is fully in accord with our intuitive ideas of randomness. However, there are some logical difficulties here, associated with the "arbitrariness" of the choice. A detailed analysis of these logical problems was made by Khinchin [22]; see also [12] for the details. It appears that one must agree with Khinchin's critical comments and consider the frequency theory of probability based only on von Mises's first principle, the principle of statistical stabilization of the relative frequencies. As is noted in [22], the frequency theory of probability based solely on von Mises's first principle admits an axiomatization and is as rigorous a mathematical theory as Kolmogorov's theory of probability. Here, we do not intend to consider von Mises's theory of probability in the framework of an axiomatic approach. Our task is to analyze the principle of stabilization of the frequencies of occurrence of a particular event in a collective.
2.2. Von Mises Frequency Theory of Probabilities as Objective Foundation of Kolmogorov's Axiomatics. As motivation for his axioms, Kolmogorov used the properties of limits of relative frequencies, see [17].
We shall be interested in how Kolmogorov's axiom 2 arose; according to this axiom, the probability P(E) of any event E is a nonnegative real number $\le 1$. In [17], Kolmogorov considers von Mises's definition [16] of probability as the limit of the relative frequencies of occurrence of the event E. Since the relative frequencies $\nu(E) = n/N$ are rational numbers between zero and unity, their limits in the real topology are real numbers between zero and unity. Cramer proceeded similarly in the construction of his theory of probability distributions [18]. Khinchin, discussing the advantages of Kolmogorov's axioms over von Mises's frequency theory of probability, noted that "...from the formal aspect, the mutual relationship between the axiomatic and frequency theories is characterized in the first place by a higher degree of abstraction of the former." This higher degree of abstraction was the foundation of the successful development of the theory of probability measures. However, this degree of abstraction is too high, and some properties of the world of real frequencies are lost in it. Essentially, the rational numbers were lost in Kolmogorov's theory of probability. Whereas in von Mises's theory the rational numbers arise as primary objects, and real probabilities are obtained as a result of a limiting process for rational frequencies, in Kolmogorov's theory rational frequencies are secondary objects associated with real probabilities (which are here primary) by means of the law of large numbers.
3 General principle of statistical stabilization of relative frequencies
First, we emphasize that the probabilities P in von Mises's frequency theory are ideal objects (symbols denoting sequences of relative frequencies that stabilize in the field of real numbers). Therefore, real numbers arise here as ideal objects associated with rational sequences of frequencies (see also Borel [20] and Poincare [23]). A basis for a broader view of probability theory is provided by the following principle of statistical stabilization of frequencies: statistical stabilization (the limiting process) can be considered not only in the real topology on the field of rational numbers $\mathbb{Q}$ but also in any other topology on $\mathbb{Q}$. The probabilities of events are defined as the limits of the sequences of relative frequencies in the corresponding completions of the field of rational numbers. For each probability model under consideration, there is a corresponding topology on the field of rational numbers. The metrizable topologies on $\mathbb{Q}$ given by absolute values are the most interesting. By virtue of Ostrovskii's theorem, there are very few such topologies: besides the usual real topology, for which $\rho(x, y) = |x - y|$, there exist only the p-adic topologies, p = 2, 3, 5, ..., for which $\rho(x, y) = |x - y|_p$. Thus, if we consider only topologies given by absolute values, then, besides the usual probability theory over $\mathbb{R}$, we obtain only the probability theories over $\mathbb{Q}_p$. It is natural to impose a restriction on the topology of statistical stabilization here: the completion $\mathbb{Q}_t$ of the field of rational numbers $\mathbb{Q}$ with respect to the statistical stabilization topology t should be a topological field. We have deliberately not introduced this restriction into the general principle of statistical stabilization. One can also consider statistical stabilization topologies that are not consistent with the algebraic structure on $\mathbb{Q}$. However, probability theory based on such topologies loses many familiar properties. For example, it turns out that the continuity of the addition operation is equivalent to additivity of probabilities, and the continuity of the division operation is equivalent to the existence of conditional probabilities. Let $x = (x_1, x_2, \ldots, x_n, \ldots)$ be some collective. We denote the set of all labels of this collective (the possible outcomes of an experiment producing this collective) by $\Pi$. We denote by $\Omega$ the event that at least one label $\pi \in \Pi$ is realized. Proposition 3.1. The probability of the event $\Omega$ is equal to unity. To prove this, it suffices to note that all the relative frequencies of $\Omega$ are equal to unity. Let $\nu^{(j)}$, j = 1, 2, be the relative frequencies of realization of two distinct labels $\pi_1$ and $\pi_2$, and let $P_j = \lim \nu^{(j)}$ be the corresponding probabilities. Let A be the event that the label $\pi_1$ or $\pi_2$ is realized, $A = \pi_1 \vee \pi_2$, and let $\nu^{(A)}$ be its relative frequency. Using the continuity of the addition operation, we obtain
$$P(A) = \lim \nu^{(A)} = \lim\left(\nu^{(1)} + \nu^{(2)}\right) = \lim \nu^{(1)} + \lim \nu^{(2)} = P_1 + P_2. \tag{1}$$
This rule can be generalized to any number of mutually exclusive events. Proposition 3.2. Let $A_j$, j = 1, ..., k, be mutually exclusive events (i.e., the sets of labels that define these events are disjoint). Then
$$P(A_1 \vee \ldots \vee A_k) = \sum_{j=1}^{k} P(A_j). \tag{2}$$
Using the continuity of the subtraction operation, we obtain the following proposition. Proposition 3.3. For any two events A and B, $P(A \vee B) = P(A) + P(B) - P(A \wedge B)$. In the language of collectives, the rule of addition of probabilities is formulated as follows, see [16]: "Beginning with an original collective possessing more than two labels, an appreciable number of new collectives can be constructed by 'uniting' labels; the elements of the new collective are the same as in the original one, but their labels are unifications of the labels of the original collective...." To the unification of labels there corresponds the addition of frequencies. We consider the set of rational numbers $U = \{x \in \mathbb{Q} : 0 \le x \le 1\}$. We denote by $U_t$ the closure of the set U in the field $\mathbb{Q}_t$ (if t is the ordinary real topology, then $U_t = [0, 1]$). An obvious consequence of the definition of probabilities is the following proposition. Proposition 3.4. The probability P(E) of any event E belongs to the set $U_t$.
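A small illustration of why $U_t$ can be strictly larger than $[0, 1]$: for a natural number $a$, the rationals $u_k = a/(1 + 2^k)$ lie in $U$ as soon as $2^k \ge a - 1$, while $|u_k - a|_2 = |a|_2\, 2^{-k} \to 0$, so $a \in U_2$. The Python sketch below computes the p-adic absolute value exactly from the definition in Appendix 1 using `fractions.Fraction`; the function names are our own hypothetical helpers, not notation from the text.

```python
from fractions import Fraction


def padic_valuation(x: Fraction, p: int) -> int:
    """Exponent r in x = p**r * m / n, where p divides neither m nor n (x != 0)."""
    if x == 0:
        raise ValueError("the valuation of 0 is +infinity")
    r, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num, r = num // p, r + 1
    while den % p == 0:
        den, r = den // p, r - 1
    return r


def padic_abs(x: Fraction, p: int) -> Fraction:
    """|x|_p = p**(-r) for x != 0, and |0|_p = 0."""
    if x == 0:
        return Fraction(0)
    return Fraction(p) ** (-padic_valuation(x, p))


def rationals_towards(a: int, p: int, ks):
    """Hypothetical helper: the rationals a / (1 + p**k), in [0, 1] once p**k >= a - 1."""
    return [Fraction(a, 1 + p**k) for k in ks]


if __name__ == "__main__":
    a, p = 3, 2
    ks = range(1, 11)
    for k, u in zip(ks, rationals_towards(a, p, ks)):
        # The real value drifts to 0; the 2-adic distance to a shrinks like 2**-k.
        print(k, float(u), padic_abs(u - a, p))
```

The same sequence thus tends to 0 in the real metric and to 3 in the 2-adic metric, which is exactly the kind of divergence between topologies that the general principle allows.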
Conditional probabilities are then introduced into the frequency theory in the same way as in [16]. Suppose there is some initial collective $x = (x_1, x_2, \ldots, x_n, \ldots)$ with probabilities $p_\pi$ of the labels $\pi \in \Pi$. Using the unification rule, we define the probabilities of all groups of labels:
$$P(A) = \sum_{\pi \in A} p_\pi. \tag{3}$$
We fix some group of labels $B = \pi_{i_1} \vee \ldots \vee \pi_{i_k}$. We are interested in the conditional probability $P(\pi/B)$, $\pi \in B$, of the label $\pi$ given the condition B. We form a new collective $x' = (x'_1, x'_2, \ldots, x'_n, \ldots)$, obtained from the original one by choosing only the elements with labels $\pi' \in B$. The probability of the label $\pi$ in this new collective is called the conditional probability of the label $\pi$ under the condition B: $P(\pi/B) = \lim \nu^{(\pi/B)}$, where $\nu^{(\pi/B)}$ are the relative frequencies of the label $\pi$ in the new collective. Noting that $\nu^{(\pi/B)} = \nu^{(\pi)}/\nu^{(B)}$, where $\nu^{(\pi)}$ is the relative frequency of the label $\pi$ in the collective x and $\nu^{(B)}$ is the relative frequency of the event B in the collective x, we obtain (using the continuity of the division operation)
$$P(\pi/B) = \lim \frac{\nu^{(\pi)}}{\nu^{(B)}} = \frac{\lim \nu^{(\pi)}}{\lim \nu^{(B)}} = \frac{P(\pi)}{P(B)}, \quad P(B) \neq 0. \tag{4}$$
The general formula can be proved similarly.
Proposition 3.5. $P(A/B) = P(A \wedge B)/P(B)$, $P(B) \neq 0$.
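The step behind (4) is an exact identity for every finite prefix of a collective: if the first N elements contain $n_B$ elements with labels in B, of which $n_\pi$ carry the label $\pi \in B$, then the frequency of $\pi$ in the selected subcollective is $n_\pi / n_B = \nu^{(\pi)}/\nu^{(B)}$. A minimal Python check on a made-up label sequence (the sequence and the function names are illustrative assumptions, not data from the text):

```python
from fractions import Fraction


def relative_frequency(labels, event):
    """Relative frequency of elements whose label lies in `event` (a set of labels)."""
    hits = sum(1 for x in labels if x in event)
    return Fraction(hits, len(labels))  # raises ZeroDivisionError on an empty sequence


def conditional_frequency(labels, label, condition):
    """Frequency of `label` in the subcollective of elements with labels in `condition`."""
    selected = [x for x in labels if x in condition]
    return relative_frequency(selected, {label})


if __name__ == "__main__":
    prefix = [1, 3, 2, 1, 1, 4, 2, 3, 1, 2, 4, 1]  # illustrative labels only
    B = {1, 2}
    lhs = conditional_frequency(prefix, 1, B)
    rhs = relative_frequency(prefix, {1}) / relative_frequency(prefix, B)
    print(lhs, rhs)  # equal for every prefix in which B occurs
```

Because the identity holds exactly for each prefix, passing to the limit only requires that division be continuous in the chosen topology, which is the point made in the text.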
We now introduce the concept of independence of events. Analyzing the arguments in the book [16], one notes that the rule of multiplication of probabilities for independent events is equivalent to the continuity of the multiplication operation. An important property that makes it possible to use p-adic probabilities when considering standard problems of probability theory is the p-adic interpretation of the probabilities zero and one (in the sense of ordinary probability theory). Indeed, the equation P(E) = 0 in ordinary probability theory does not mean that the event E is impossible. It merely means that in a long series of experiments the event E occurs in a very small fraction of cases. However, in a large number of experiments the number of occurrences of E can still be large. Moreover, the equation P(E) = 0 "lumps together" a huge class of events that intuitively appear to have different probabilities. For example, consider two events $E_1$ and $E_2$ such that in the first
$$N = N_k = \Big(\sum_{j=0}^{k} 2^j\Big)^2 \tag{5}$$
trials the event $E_1$ is realized $n^{(1)} = 2^k$ times and the event $E_2$ is realized
$$n^{(2)} = \sum_{j=0}^{k} 2^j \tag{6}$$
times. It is intuitively clear that the probabilities of these events must be different. However, in real probability theory
$$P_1 = \lim n^{(1)}/N = P_2 = \lim n^{(2)}/N = 0. \tag{7}$$
It is different in 2-adic probability theory. Since $\sum_{j=0}^{k} 2^j = 2^{k+1} - 1$ and, in $\mathbb{Q}_2$, $2^k \to 0$ as $k \to \infty$, we have $\sum_{j=0}^{k} 2^j \to -1$, i.e., $-1 = 1 + 2 + 2^2 + \ldots + 2^n + \ldots$, and hence $N_k \to 1$. Stabilization in the 2-adic topology therefore gives $P_1 = 0 \neq P_2 = -1$. We here encounter for the first time negative numbers as probabilities of events (compare Wigner [24], Dirac [25], Feynman [26]; see also [27], [28], [12]). Of course, such probabilities are forbidden by Kolmogorov's second axiom in ordinary probability theory (in von Mises's approach, they are excluded by the choice of the topology of statistical stabilization). However, from the point of view of the frequency theory of probability, P = −1 is only an ideal object, a symbol denoting the limit of a sequence of relative frequencies. This symbol is in no way better and in no way worse than the symbol P = \jix in ordinary probability theory. In this example, negative p-adic probabilities were used to split a zero conventional (real) probability. So negative p-adic probabilities can be interpreted as infinitely small conventional probabilities. It may be that all negative probabilities that appear in quantum physics can be interpreted in this way. If a conventional (real) probability is equal to zero, there is no conventional (real) element of reality. However, there is a nonconventional (p-adic) element of reality that is realized with negative probability. Real and p-adic probabilities correspond to different classes of measurement procedures. An element of reality that is impossible to observe with a 'real measurement procedure' might be observed with a 'p-adic measurement procedure.' The case of a probability (in the sense of the ordinary theory) equal to unity can be treated similarly. For example, suppose
$$N = N_k = \Big(\sum_{j=0}^{k} 2^j\Big)^2, \quad n^{(1)} = \Big(\sum_{j=0}^{k} 2^j\Big)^2 - 2^k, \quad n^{(2)} = \Big(\sum_{j=0}^{k} 2^j\Big)^2 - \sum_{j=0}^{k} 2^j. \tag{8}$$
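In real probability theory both limits are equal to 1, whereas in 2-adic probability theory we find that
$$P_1 = 1 \neq P_2 = 1 - \Big(1 \Big/ \sum_{j=0}^{\infty} 2^j\Big) = 2. \tag{9}$$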
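Both examples can be re-checked with exact rational arithmetic. With $S_k = \sum_{j=0}^{k} 2^j = 2^{k+1} - 1$ (an odd number) and $N_k = S_k^2$, one finds for (5)-(6) that $|\nu^{(1)}|_2 = 2^{-k}$ and $|\nu^{(2)} + 1|_2 = 2^{-(k+1)}$, and for (8) that $|\nu^{(1)} - 1|_2 = 2^{-k}$ and $|\nu^{(2)} - 2|_2 = 2^{-(k+1)}$, while the real frequencies tend to 0 in the first example and to 1 in the second. The Python sketch below (helper names are ours) checks these identities for small k; it illustrates the limits rather than proving them.

```python
from fractions import Fraction


def padic_abs(x: Fraction, p: int) -> Fraction:
    """p-adic absolute value of a rational number (|0|_p = 0)."""
    if x == 0:
        return Fraction(0)
    r, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num, r = num // p, r + 1
    while den % p == 0:
        den, r = den // p, r - 1
    return Fraction(p) ** (-r)


def example_frequencies(k: int):
    """Relative frequencies of the two examples for the k-th sample size."""
    s = sum(2**j for j in range(k + 1))  # S_k = 2**(k+1) - 1
    n = s * s                            # N_k = S_k**2, as in (5) and (8)
    first = (Fraction(2**k, n), Fraction(s, n))           # frequencies from (5)-(6)
    second = (Fraction(n - 2**k, n), Fraction(n - s, n))  # frequencies from (8)
    return first, second


if __name__ == "__main__":
    for k in range(1, 9):
        (a1, a2), (b1, b2) = example_frequencies(k)
        print(k, float(a1), float(a2), float(b1), float(b2),
              padic_abs(a1, 2), padic_abs(a2 + 1, 2),
              padic_abs(b1 - 1, 2), padic_abs(b2 - 2, 2))
```

The real columns collapse onto 0 and 1, while the 2-adic distances to the four limits 0, −1, 1 and 2 halve at every step.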
We see here that natural numbers other than unity also belong to the set $U_p$. In this example, p-adic (integer) probabilities larger than 1 were used to split the conventional (real) probability one. So, under the p-adic consideration, a conventional element of reality can be split into several p-adic elements of reality. In the framework of p-adic statistical stabilization there is also "nothing seditious" about complex probabilities. For example, let $p \equiv 1 \pmod 4$. Then $i = \sqrt{-1} \in \mathbb{Q}_p$. Let
$$i = i_0 + i_1 p + i_2 p^2 + \ldots, \quad i_r = 0, 1, \ldots, p - 1, \tag{10}$$
be the canonical decomposition of the imaginary unit in powers of p. Note also that for any p
$$-1 = (p - 1) + (p - 1)p + (p - 1)p^2 + \ldots \tag{11}$$
Then for the rational relative frequencies
$$\nu_k = \frac{i_0 + i_1 p + \ldots + i_k p^k}{(p - 1) + (p - 1)p + \ldots + (p - 1)p^k} \to -i \tag{12}$$
in the p-adic topology. Geometrically, one may suppose that the new probability theory is a transition from one-dimensional probabilities on the interval [0, 1] to multidimensional probabilities.
4 Probability distribution of a collective
Let $x = (x_1, \ldots, x_k, \ldots)$ be some collective, and let $\Pi$ be the set of labels of this collective. We consider the simplest case, when the set $\Pi$ is finite, $\Pi = \{1, \ldots, s\}$. We denote by $\nu^{(j)}$ the relative frequency of the label j and by $P_j = \lim \nu^{(j)}$ the corresponding probability. In the frequency theory, the set of probabilities $P_x = (P_1, \ldots, P_s)$ is called the probability distribution of the collective x.
The general principle of statistical stabilization makes it possible to consider not only real distributions but also distributions over other number fields. For one and the same collective x, there can exist distributions over different number fields. Thus, in the proposed approach a collective has, in general, an entire spectrum of distributions $P_{x,t} = (P_{1,t}, \ldots, P_{s,t})$, where t runs over the topologies of statistical stabilization for the given collective. One therefore studies a more subtle structure of the collective: the relative frequencies are investigated not only for real stabilization but for a complete spectrum of stabilizations. In connection with the existence of an entire spectrum of probability distributions of a collective, some comments are necessary. First, this agrees well with von Mises's principle that "the collective comes first and the probabilities after." Indeed, a probability distribution is an object derived from a collective, and to one and the same collective there corresponds an entire spectrum of probability distributions reflecting different properties of the collective. Second, each statistical stabilization determines some physical property of the investigated object. For example, if in a statistical experiment involving the tossing of a coin the probability of heads is $P_1$ and that of tails is $P_2$, then these probabilities are physical characteristics of the coin, like its mass or volume. This question is discussed in detail in the books of Poincare [23] and von Mises [16]. If we consider the new principle of statistical stabilization from this point of view, we obtain new physical characteristics of the investigated objects. For example, if statistical stabilization is absent in the real topology, then it is not possible to obtain any physical constants in the language of ordinary probability theory. But such constants could exist and be, for example, p-adic numbers.
If a collective has not only a real probability distribution but an entire spectrum of other distributions, then, besides real constants corresponding to physical properties of the investigated object, we obtain an entire spectrum of new constants corresponding to physical properties that were hidden from real statistics. Note that these new constants can also be ordinary rational numbers.
5 Model examples of p-adic statistics
5.1. Plantation with Red and White Flowers. As one of the first examples of a collective, von Mises considered [16] a plantation sown with flowers of different colors, and he studied the statistical stabilization of the relative frequencies of each of the colors. We shall construct an analogous collective for which p-adic stabilization always occurs but real stabilization is, in general, absent. Suppose there are flowers of two types: red (R) and white (W). The plantation (or, rather, an infinite bed) is sown in a random order with red and white flowers, the flowers being sown in series whose lengths are powers of p, the power also being determined by a random rule. Namely, suppose there are two generators of random numbers: 1) j = 0, 1; 2) i = 1, 2 (each value with probability 0.5). If j = 0, a series of red flowers is sown; if j = 1, a series of white ones. The length of each series is defined as follows: the length of the first series is some power $p^{l_1}$ (it can also be determined by a random rule); if the length of the m-th series was $p^{l_m}$, then the length of the next series is $p^{l_{m+1}}$, where $l_{m+1} = l_m + i_m$. We introduce the relative frequencies of the red and white flowers in the first m series: $\nu_m^{(R)} = n_m^{(R)}/N_m$, $\nu_m^{(W)} = n_m^{(W)}/N_m$. Proposition 5.1. For all generators of the random numbers j and i, there is statistical stabilization of the relative frequencies $\nu^{(R)}$ and $\nu^{(W)}$ in the p-adic topology. (Since $i_m \ge 1$, the exponents $l_m$ increase strictly, so $p^{l_m} \to 0$ in $\mathbb{Q}_p$, the series below converge, and the denominator has absolute value $p^{-l_1} \neq 0$.) Thus, we have defined the p-adic probabilities $P_R = \lim \nu^{(R)}$ and $P_W = \lim \nu^{(W)}$, and
$$P_R = \Big(\sum_{n=1}^{\infty} (1 - j_n) p^{l_n}\Big) \Big/ \Big(\sum_{n=1}^{\infty} p^{l_n}\Big), \qquad P_W = \Big(\sum_{n=1}^{\infty} j_n p^{l_n}\Big) \Big/ \Big(\sum_{n=1}^{\infty} p^{l_n}\Big). \tag{13}$$
Note that, in general, there is no real statistical stabilization for such a random plantation. If the generator j produces runs of zeros or ones, then $\nu^{(R)}$ and $\nu^{(W)}$ can oscillate between zero and unity in the real topology. Thus, a real observer (an investigator who carries out statistical analysis of the sample in the field of real numbers) cannot obtain any statistically regular law and will observe only random variation of the real relative frequencies. In contrast, the p-adic observer (the investigator who makes a statistical analysis of the sample in the field of p-adic numbers) will obtain a well-defined law: the stabilization of the digits in the p-adic expansion of the relative frequencies. In this example of probability theory we observe a new fundamental approach to the investigation of natural phenomena. According to this approach, experimental results must be analyzed not only in the field of real numbers but also in p-adic fields. Naturally, our example is purely illustrative, but it does appear to reflect many important properties of p-adic statistics.
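The plantation is easy to simulate. The sketch below assumes fair generators implemented with Python's `random.Random`, a first series of length $p^{0} = 1$ (i.e., $l_1 = 0$), and red series for j = 0; these choices and the function names are illustrative, not taken from the text. With $l_1 = 0$ every $N_m \equiv 1 \pmod p$, and appending a series of length $p^{l}$ changes the red frequency by $p^{l}(r N_m - n_m^{(R)})/(N_m N_{m+1})$ with $r \in \{0, 1\}$, an element of p-adic absolute value at most $p^{-l}$. In the real metric, by contrast, the newest series is longer than all earlier ones combined, so the frequency keeps jumping toward whichever color was sown last.

```python
import random
from fractions import Fraction


def padic_abs(x: Fraction, p: int) -> Fraction:
    """p-adic absolute value of a rational number (|0|_p = 0)."""
    if x == 0:
        return Fraction(0)
    r, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num, r = num // p, r + 1
    while den % p == 0:
        den, r = den // p, r - 1
    return Fraction(p) ** (-r)


def simulate_plantation(p: int, series: int, seed: int = 0):
    """Red-flower frequencies after each series of the random p-adic plantation.

    j in {0, 1} picks the colour (0 = red); the exponent grows by i in {1, 2},
    so the m-th series has length p**l_m with l_1 = 0 and l_{m+1} = l_m + i_m.
    """
    rng = random.Random(seed)
    exponent, red, total = 0, 0, 0
    freqs = []
    for _ in range(series):
        length = p ** exponent
        if rng.randint(0, 1) == 0:
            red += length
        total += length
        freqs.append(Fraction(red, total))
        exponent += rng.randint(1, 2)
    return freqs


if __name__ == "__main__":
    freqs = simulate_plantation(p=2, series=20, seed=1)
    for m in range(1, len(freqs)):
        # Real value: erratic. 2-adic step between consecutive frequencies: at most 2**-m.
        print(m, round(float(freqs[m]), 4), padic_abs(freqs[m] - freqs[m - 1], 2))
```

Exact `Fraction` arithmetic is used because floating point would destroy precisely the low-order p-adic digits whose stabilization is being observed.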
Remark 5.1. Intuitively, one supposes that in a real plantation it is possible to find a white flower next to almost every red flower; in contrast, large groups (clusters) of red and white flowers are distributed randomly over a p-adic plantation (one can sow not only a bed but also distribute series of red and white flowers over a plane according to a random rule). A real random plane is obtained if one throws red and white points at random onto the plane; in contrast, a p-adic random plane is obtained if one throws patches of $p^l$ red or white points at a time onto the plane. In Appendix 2, we give the results of a statistical analysis of a computer simulation of the proposed probability model. There is very rapid p-adic stabilization of the relative frequencies and no stabilization in the sense of ordinary real probability theory. Remark 5.2. Evidently, the structure of series formed by powers of p need not be directly observable in a statistical sample. This structure is introduced by rounding the number of results to powers of p. In very large statistical samples, one can take into account only the orders of magnitude of the numbers, and one thereby introduces a 10-adic structure into the sample.
5.2. Random Choice of the Digit of a p-Adic Number. Suppose there are two labels, 1 and 2, and j is a generator of random numbers corresponding to the choice of one of the labels. Each random label is produced in series, the length of each series being determined by a random choice of the next p-adic digit; i.e., there is a generator of random numbers $\alpha$ taking the values $\alpha = 0, 1, \ldots, p - 1$, and the length of the n-th series is $\alpha_n p^{n-1}$, n = 1, 2, .... We introduce the relative frequencies $\nu^{(1)}$ and $\nu^{(2)}$. Proposition 5.2. For all generators of the random numbers j and $\alpha$ there is statistical stabilization of the relative frequencies $\nu^{(1)}$ and $\nu^{(2)}$ in the p-adic topology. Thus, the following p-adic probabilities are defined:
$$P_1 = \Big(\sum_{n=1}^{\infty} (1 - j_n)\alpha_n p^{n-1}\Big) \Big/ \Big(\sum_{n=1}^{\infty} \alpha_n p^{n-1}\Big), \qquad P_2 = \Big(\sum_{n=1}^{\infty} j_n \alpha_n p^{n-1}\Big) \Big/ \Big(\sum_{n=1}^{\infty} \alpha_n p^{n-1}\Big). \tag{14}$$
In the real topology, there is, in general, no statistical stabilization.
Appendix 1
Every rational number $x \neq 0$ can be represented in the form $x = p^r m/n$, where $r \in \mathbb{Z}$ and p divides neither m nor n. Here p is a fixed prime. The p-adic absolute value (norm) of the rational number x is defined by $|x|_p = p^{-r}$, $x \neq 0$, $|0|_p = 0$. This absolute value has the usual properties: 1) $|x|_p \ge 0$, and $|x|_p = 0 \Leftrightarrow x = 0$; 2) $|xy|_p = |x|_p |y|_p$; and it satisfies the strong triangle inequality: 3) $|x + y|_p \le \max(|x|_p, |y|_p)$. The completion of the field of rational numbers with respect to the metric $\rho(x, y) = |x - y|_p$ is called the field of p-adic numbers and denoted by $\mathbb{Q}_p$. It is a locally compact field. Numbers in the unit ball $\mathbb{Z}_p = \{x \in \mathbb{Q}_p : |x|_p \le 1\}$ of the field $\mathbb{Q}_p$ are called p-adic integers. From the strong triangle inequality, one obtains a theorem stating that a series in the field $\mathbb{Q}_p$ converges if and only if its general term tends to zero. Any p-adic number $x \neq 0$ can be represented in a unique manner as a (convergent) series in powers of p:
$$x = \sum_{j=k}^{\infty} a_j p^j, \quad a_j = 0, 1, \ldots, p - 1, \ a_k \neq 0, \quad k = 0, \pm 1, \pm 2, \ldots, \tag{15}$$
with $|x|_p = p^{-k}$. One can similarly define m-adic numbers, where m is any natural number, $m \ge 2$. In the general case, property 2) is replaced by the weaker property $|xy|_m \le |x|_m |y|_m$, i.e., $|x|_m$ is a pseudonorm. The completion of the field $\mathbb{Q}$ in the metric $\rho(x, y) = |x - y|_m$ is not a field (for m that are not prime); it is only a ring. Here we already encounter some deviations from the ordinary probability rules (which carry over without any changes to p-adic probabilities). For example, one can have a situation of the following kind: A and B are independent events, $P(A) \neq 0$ and $P(B) \neq 0$, but $P(A \wedge B) = 0$. In particular, the conditional probability P(A/B) is, in general, not defined for an event B having nonvanishing probability.
Appendix 2
We give here the results of a random experiment (modeled on a computer) for a 2-adic plantation. The results of this experiment illustrate well a situation in which there is no statistical stabilization in the real topology, but there is statistical stabilization in the 2-adic topology. In the following tables, m is the number of a random experiment in which two random numbers are modeled, one corresponding to the choice of a flower and the other to the length of the series of this flower; d is the number of elements in the sample. Because of the exponential growth of the number of elements in the series, d increases very rapidly. The table of relative frequencies in the field of real numbers is:
m     d       ν(W)     ν(R)
4     10      0.1304   0.8696
5     10^2    0.6364   0.3636
6     10^3    0.1913   0.8087
7     10^3    0.0504   0.9496
12    10^5    0.0006   0.9994
13    10^5    0.5335   0.4665
14    10^6    0.1703   0.8297
22    10^9    0.0022   0.9978
23    10^10   0.7453   0.2547
Thus, for the relative frequencies in the field of real numbers there is no stabilization of even the first digit after the decimal point. We examined long sequences of experiments on the computer in which the oscillations continued. The calculations in the field $\mathbb{Q}_2$ give the following results (2-adic digits, least significant first):
N = 10:
ν(W) = 101011111011000000110100010111011000110011011110110001011
ν(R) = 001100000100111111001011101000100111001100100001001110100
N = 20:
ν(W) = 10101111101100111011001100101111110000011100111000000001
ν(R) = 00110000010011000100110011010000001111100011000111111110
N = 30:
ν(W) = 101011111011001110110011001111111100000000100110110000011
ν(R) = 001100000100110001001100110000000011111111011001001111100
N = 40:
ν(W) = 101011111011001110110011001111111100000000010111001110100
ν(R) = 001100000100110001001100110000000011111111101000110001011
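Because the expansions are written least significant digit first, the number of 2-adic digits that have stabilized between two runs is simply the length of the common initial run of their digit strings (equivalently, the 2-adic valuation of the difference of the two frequencies). A short Python check on the strings printed above; the function name is ours:

```python
def stabilized_digits(a: str, b: str) -> int:
    """Number of leading (least significant) digits on which two expansions agree."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


# 2-adic expansions copied from the Appendix 2 output (least significant digit first).
WHITE = {
    10: "101011111011000000110100010111011000110011011110110001011",
    20: "10101111101100111011001100101111110000011100111000000001",
    30: "101011111011001110110011001111111100000000100110110000011",
    40: "101011111011001110110011001111111100000000010111001110100",
}
RED = {
    10: "001100000100111111001011101000100111001100100001001110100",
    20: "00110000010011000100110011010000001111100011000111111110",
    30: "001100000100110001001100110000000011111111011001001111100",
    40: "001100000100110001001100110000000011111111101000110001011",
}

if __name__ == "__main__":
    for earlier, later in [(10, 20), (20, 30), (30, 40)]:
        print(earlier, later,
              stabilized_digits(WHITE[earlier], WHITE[later]),
              stabilized_digits(RED[earlier], RED[later]))
```

This reproduces the counts of 14, 27 and 42 stabilized digits for both colors.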
Thus, after ten random experiments, 14 digits are stabilized in the 2-adic expansion of the relative frequency of red flowers and 14 digits for white flowers; after 20 experiments, 27 digits are stabilized for both colors; after 30 experiments, 42 digits are stabilized for each, and so forth.
Appendix 3
We give the results of an analysis of a statistical sample in the field of 5-adic numbers. Here, N is the number of random experiments, M is the number of elements of the sample, $M_1$ is the number of elements of the first label, and $M_2$ is the number of elements of the second label (5-adic digits, least significant first):
N = 2; M1: 002; M2: 00002; M: 00202
M1/M: 1044004400440044004400440044004400440044004400440044
M2/M: 0010440044004400440044004400440044004400440044004400
N = 3; M1: 002; M2: 000023; M: 002023
M1/M: 1040303403420004404141041024440040303403420004404141
M2/M: 0014141041024440040303403420004404141041024440040303
N = 4; M1: 00200002; M2: 000023; M: 00202302
M1/M: 1040303004000130020234341334320032124414032304024031
M2/M: 0014141440444314424210103110124412320030412140420413
N = 5; M1: 00200002; M2: 000023004; M: 002023024
M1/M: 1040301040132010043322212441423102032221232032034142
M2/M: 0014143404312434401122232003021342412223212412410302
N = 6; M1: 00200002; M2: 00002300403; M: 00202302403
M1/M: 1040301003131014113132222240403413222311230303113140
M2/M: 0014143441313430331312222204041031222133214141331304
N = 7; M1: 00200002; M2: 0000230040303; M: 0020230240303
M1/M: 1040301003202004101343032004014023441101104433243020
M2/M: 0014143441242440343101412440430421003343340011201424
Thus, in the analysis of the sample in the field of 5-adic numbers there is rapid stabilization of the digits in the 5-adic expansion of the relative frequencies. For example, after 55 experiments, 78 digits in the 5-adic expansion of the relative frequencies are stabilized. When the sample is analyzed in the field of real numbers, there is again no statistical stabilization.
Acknowledgements
I would like to thank L. Ballentine and J. Summhammer for discussions on p-adic probabilities and elements of physical reality.
References
1. A. Einstein, B. Podolsky, N. Rosen, Phys. Rev., 47, 777-780 (1935).
2. P. S. Alexandrov, Introduction to general theory of sets and functions (Gostehizdat, Moscow, 1948).
3. R. Engelking, General Topology (PWN, Warszawa, 1977).
4. A. Yu. Khrennikov, Dokl. Akad. Nauk, 322, 1075-1079 (1992).
5. A. Yu. Khrennikov, J. of Math. Phys., 32, 932-937 (1991).
6. V. S. Vladimirov, I. V. Volovich, and E. I. Zelenov, p-adic analysis and mathematical physics (World Scientific Publ., Singapore, 1994).
7. Yu. Manin, Springer Lecture Notes in Math., 1111, 59-101 (1985).
8. P. G. O. Freund and E. Witten, Phys. Lett. B, 199, 191-195 (1987).
9. A. Yu. Khrennikov, Non-Archimedean Analysis: Quantum Paradoxes, Dynamical Systems and Biological Models (Kluwer Academic Publ., Dordrecht, 1997).
10. S. Albeverio, A. Yu. Khrennikov and R. Cianci, J. Phys. A, Math. and Gen., 30, 881-889 (1997).
11. A. Yu. Khrennikov, J. of Math. Physics, 39, 1388-1402 (1998).
12. A. Yu. Khrennikov, Interpretations of probability (VSP Int. Publ., Utrecht, 1999).
13. Z. I. Borevich and I. R. Shafarevich, Number Theory (Academic Press, New York, 1966).
14. W. Schikhof, Ultrametric calculus (Cambridge Univ. Press, Cambridge, 1984).
15. R. von Mises, Math. Z., 5, 52-99 (1919).
16. R. von Mises, Probability, Statistics and Truth (Macmillan, London, 1957).
17. A. N. Kolmogorov, Foundations of the Probability Theory (Chelsea Publ. Comp., New York, 1956).
18. H. Cramer, Mathematical theory of statistics (Univ. Press, Princeton, 1949).
19. I. V. Volovich, Number Theory as the Ultimate Physical Theory, Preprint, CERN, Geneva, TH. 4781/87 (1987).
20. E. Borel, Rend. Circ. Mat. Palermo, 27, 247 (1909).
21. M. Frechet, Recherches theoriques modernes sur la theorie des probabilites (Univ. Press, Paris, 1937-1938).
22. A. Ya. Khinchin, Voprosi Filosofii, No 1, 92; No 2, 77 (1961) (in Russian).
23. A. Poincare, About Science. Collection of works (Nauka, Moscow, 1983).
24. E. Wigner, Quantum-mechanical distribution functions revisited, in: Perspectives in quantum theory, W. Yourgrau and A. van der Merwe, editors (MIT Press, Cambridge MA, 1971).
25. P. A. M. Dirac, Proc. Roy. Soc. London, A 180, 1-39 (1942).
26. R. P. Feynman, Negative probability, in: Quantum Implications, Essays in Honour of David Bohm, B. J. Hiley and F. D. Peat, editors, 235-246 (Routledge and Kegan Paul, London, 1987).
27. W. Muckenheim, Phys. Reports, 133, 338-401 (1986).
28. A. Yu. Khrennikov, Int. J. Theor. Phys., 34, 2423-2434 (1995).
"COMPLEMENTARITY" OR SCHIZOPHRENIA: IS PROBABILITY IN QUANTUM MECHANICS INFORMATION OR ONTA?
A. F. KRACKLAUER
E-mail: [email protected]
Of the various "complementarities" or "dualities" evident in Quantum Mechanics (QM), among the most vexing is that afflicting the character of a 'wave function,' which at once is to be something ontological because it diffracts at material boundaries, and something epistemological because it carries only probabilistic information. Herein a description of a paradigm, a conceptual model of physical effects, will be presented that perhaps can provide an understanding of this schizophrenic nature of wave functions. It is based on Stochastic Electrodynamics (SED), a candidate theory to elucidate the mysteries of QM. The fundamental assumption underlying SED is the supposed existence of a certain sort of random electromagnetic background, the nature of which, it is hoped, will ultimately account for the behavior of atomic-scale entities as described usually by QM. In addition, the interplay of this paradigm with Bell's 'no-go' theorem for local, realistic extensions of QM will be analyzed.
1 Introduction
Of the various "complementarities" or "dualities" evident in Quantum Mechanics (QM), one of the most vexing concerns the character of the 'wave function.' It is supposed to be at once something ontological, because it diffracts at material boundaries, and something epistemological, because it carries only probabilistic information. All other diffractable waves, it may be said, carry momentum and energy, not conceptual, abstract information ("ideas"). All other probabilities are calculational aids and, like abstractions generally, are utterly unaffected by material boundaries. The literature is replete with resolutions of QM conundrums that selectively ignore one or the other of these characteristics; in the end, they all fail. This paper presents a paradigm, a conceptual model of physical effects, that may explain this schizophrenic nature of wave functions. It is based on Stochastic Electrodynamics (SED), a candidate theory for elucidating the mysteries of QM [1]. The fundamental concept underlying SED is that a certain sort of random electromagnetic background exists. The hope is that the nature of this background will ultimately account for the behavior of atomic-scale entities as usually described by QM [2]. Among the successes of SED is a local realistic explanation of the diffraction of particle beams [3]. The core of this explanation is the
notion that relative motion through the SED background effectively engenders de Broglie's pilot wave. Given such a pilot wave associated with a particle's motion, the statistical distribution of momentum in a density over phase space can be decomposed in the sense of Fourier analysis. Under some conditions, the resulting form of Liouville's Equation is Schrödinger's Equation. From this viewpoint, the 'schizophrenic' character of wave functions can be discussed and understood without preternatural attributes. These concepts have broad implications, ranging from serious philosophical questions such as the "mind-body" dichotomy, through teleportation, to popular science-fiction effects. The peculiar nature of probability in QM is also clarified. Much remains to be done before all of QM can be interpreted comprehensively in terms of SED. Even so, many of the by now hoary 'paradoxes' can be rationally deconstructed.

A secondary but intimately related issue is the import of Bell's Theorem for using the SED paradigm to reconcile the interpretation of QM fully. Arguments will be presented showing that Bell, in his proof, called on inappropriate hypothetical presumptions, essentially by misconstruing the use of conditional probabilities. Hermann, de Broglie, Bohm and others found that von Neumann had done the same before him [4,5].

2 De Broglie waves as an SED effect
The model, or conceptual paradigm, proposed here for the mechanism of particle diffraction is founded on Stochastic Electrodynamics (SED). There is a substantial literature on SED, but most of it is not crucial for the issue at hand [1]. The crux of SED can be characterized as the logical inversion of QM in the following sense. If QM is taken as a valid theory, then ultimately one concludes that the free electromagnetic field has a finite ground state, with energy per mode given by

$$E = \frac{\hbar\omega}{2}. \tag{1}$$
SED, on the other hand, inverts this logic and axiomatically posits the existence of a random electromagnetic background field with this same spectral energy distribution, and then endeavors to show that ultimately, a consequence of the existence of such a background is that physical systems exhibit the behavior otherwise codified by QM. The motivation for SED proponents is to find an intuitive local realistic interpretation for QM, hopefully to resolve the well known philosophical and lexical problems as well as to inspire new attacks on other problems.
The question of the origin of this electromagnetic background is, of course, fundamental. In the historical development of SED, the background has been posited as an operational hypothesis, justified a posteriori by its results. Nevertheless, the idea has been lurking on the fringes from the beginning that this background results from a self-consistent interaction, i.e., that it arises out of interactions with all other electric charges in the universe [6]. For present purposes, only one hypothesis is needed: particles, as systems with charge structure (not necessarily with a net charge), are in equilibrium with electromagnetic signals in the background. Consider, for example, a prototype system: a dipole with characteristic frequency $\omega_0$. Equilibrium for such a system in its rest frame can be expressed as

$$m_0 c^2 = \hbar\omega_0. \tag{2}$$
This statement is actually tautological, since it merely defines $\omega_0$; the exact numerical value of $\omega_0$ will turn out to be practically immaterial. In the particle's rest frame, this equilibrium in each degree of freedom is achieved by interaction with counter-propagating electromagnetic background signals, in each polarization mode separately. On the average, these signals add to give a standing wave with an antinode at the particle's position:

$$2\cos(k_0 x)\,\sin(\omega_0 t). \tag{3}$$
Again, this is essentially a tautological statement: a particle does not 'see' signals with nodes at its location, which leaves only the others. Of course, everything is to be understood in an on-the-average, statistical sense. Now consider Eq. (3) in a translating frame, in particular the rest frame of a slit through which the particle passes as a member of a beam ensemble. In such a frame, the component signals are Doppler shifted under a Lorentz transform and then add together to give what appear as modulated waves:

$$2\cos\big(k_0\gamma(x - \beta c t)\big)\,\sin\big(\omega_0\gamma(t - c^{-1}\beta x)\big), \tag{4}$$
for which the second factor, the modulation, has wavelength $\lambda = (\gamma\beta k_0)^{-1}$. From the Lorentz transform of Eq. (2), $p = \hbar\gamma\beta k_0$, so the factor $\gamma\beta k_0$ can be identified as the de Broglie wave vector of QM as expressed in the slit frame. In short, a particle's de Broglie wave is seen to be a modulation on what the orthodox theory calls Zitterbewegung. This modulation wave effectively functions as a pilot wave. In de Broglie's original conception, the pilot wave emanates from the kernel. Here, by contrast, the pilot wave is a kinematic effect of the particle interacting with the SED Background. Because
this SED Background is classical electromagnetic radiation, it will diffract according to the usual laws of optics and thereafter modify the trajectory of the particle with which it is in equilibrium [3]. (See Ref. [1], Section 12.3, for a didactic elaboration of these concepts.) The detailed mechanism of pilot-wave steerage rests on the following observation. The energy pattern of the actual signal that the pilot waves modulate, and to which a particle tunes, forms a fence- or rake-like structure. Its prongs have varying average heights, set by the pilot-wave modulation. These prongs, in turn, can be regarded as the boundaries of energy wells in which particles are trapped: a series of micro-Paul-traps, as it were. Intuitively, particles will tend to be captured, and to dwell longest, where such traps are deepest. Particles are moved and restrained by radiation pressure, exerted not by the modulation but by the carrier signal itself. Because these signals are stochastic, the well boundaries bob up and down somewhat. Any given particle, whatever its energy, will therefore tend to migrate back and forth into neighboring cells as boundary fluctuations permit. Where the wells are very shallow, however, particles are laterally unconstrained (in a diffraction setup, say). They tend to vacate such regions and therefore have a low probability of being found there. These constraints on particle motion are a microscopic effect that can become manifest only when many similar systems are observed. For illustration, consider an ensemble of similar particles forming a beam that passes through a slit. Let us assume that these particles are very close to equilibrium with the background, that is, that any effects due to the slit can be treated as slight perturbations on the systematic motion of the beam members.
Given this assumption, each member of the ensemble, with index $n$ say, will with a certain probability have a given amount of kinetic energy, $E_n$, associated with each degree of freedom. Of special interest here is the direction perpendicular to both the beam and the slit. Along it, by virtue of the assumed near-equilibrium with the background, the ensemble members can be taken to be distributed in energy in the usual way, by the Boltzmann factor $e^{-\beta E_n}$. Here $\beta = 1/kT$ is the reciprocal of the product of the Boltzmann constant $k$ and the temperature $T$ in kelvins. The temperature in this case is that of the electromagnetic background, which serves as a thermal bath for the beam particles with which it is in near equilibrium. Now, the relative probability of finding any given particle, i.e., with energy E{n,j} or E{n
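particles with energy less than the well depth $\delta$:

$$\frac{\sum_{E_n<\delta} e^{-\beta E_n}}{\sum_{n} e^{-\beta E_n}} \approx \frac{\int_0^{\delta} e^{-\beta E}\,dE}{\int_0^{\infty} e^{-\beta E}\,dE} = 1 - e^{-\beta\delta}, \tag{5}$$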
where approximating the sum by an integral amounts to recognizing that the number of energy levels, if not a priori continuous, is large relative to the well depth. If $\delta$ in Eq. (5) is now expressed as a function of position, we get the probability density as a function of position. Consider, for example, the diffraction pattern of a single slit of width $a$ at distance $D$. The intensity (essentially the energy density) as a function of lateral position is $E_0\sin^2(\theta)/\theta^2$, where $\theta = k_{\text{pilot wave}}\,(a/D)\,y$. The probability of occurrence, $P(\theta(y))$, as a function of position would then be

$$P(y) \propto 1 - e^{-\beta E_0 \sin^2(\theta)/\theta^2}. \tag{6}$$
Whenever the magnitude of the exponent in Eq. (6) is much less than one, the r.h.s. is very accurately approximated by that magnitude itself, since $1 - e^{-x} \approx x$ for small $x$. One thus obtains the standard, verified result that the probability of occurrence, $P(y) = \psi^*\psi$ in conventional QM, is proportional to the intensity of a particle's de Broglie (pilot) wave.
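The sum-to-integral step in Eq. (5) and the small-exponent limit of Eq. (6) can be illustrated numerically. The sketch assumes, purely for illustration, equally spaced levels $E_n = n\Delta$. With a fine spacing, the Boltzmann-weighted fraction of levels below the well depth approaches $1 - e^{-\beta\delta}$. For small $\beta E_0$, the profile of Eq. (6) becomes proportional to the single-slit intensity $\sin^2(\theta)/\theta^2$. The function names are hypothetical.

```python
import math

def well_fraction_sum(beta, delta, spacing):
    """Left-hand side of Eq. (5) for hypothetical equally spaced levels E_n = n * spacing:
    the Boltzmann-weighted fraction of levels lying below the well depth delta."""
    n_max = int(50.0 / (beta * spacing))  # beyond this, exp(-beta * E_n) < e^-50 is negligible
    weights = [math.exp(-beta * n * spacing) for n in range(n_max + 1)]
    below = sum(w for n, w in enumerate(weights) if n * spacing < delta)
    return below / sum(weights)

def well_fraction_integral(beta, delta):
    """Continuum limit on the right-hand side of Eq. (5): 1 - exp(-beta * delta)."""
    return 1.0 - math.exp(-beta * delta)

def occupation_profile(theta, beta_e0):
    """Eq. (6) up to normalisation: 1 - exp(-beta*E0 * sin^2(theta)/theta^2)."""
    intensity = 1.0 if theta == 0.0 else (math.sin(theta) / theta) ** 2
    return 1.0 - math.exp(-beta_e0 * intensity)

# Fine level spacing: the discrete sum approaches the integral.
print(well_fraction_sum(1.0, 2.0, 0.001), well_fraction_integral(1.0, 2.0))

# Small beta*E0: the profile is proportional to sin^2(theta)/theta^2 (compare the two numbers).
eps = 1e-4
print(occupation_profile(1.0, eps) / occupation_profile(0.0, eps), math.sin(1.0) ** 2)
```

For large $\beta E_0$, by contrast, the profile saturates near 1 and no longer tracks the wave intensity. This is why the small-exponent condition matters.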
3 Schrödinger Equation
Because a de Broglie pilot wave is attached to each particle, there exists a Fourier kernel of the following form:

$$e^{\,2i\,\mathbf{p}\cdot\mathbf{x}'/\hbar}, \tag{7}$$
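which can be used to decompose the density function of an ensemble of similar particles. Consider an ensemble governed by the Liouville Equation:

$$\frac{\partial\rho}{\partial t} = -\sum_{i=x,y,z}\left(\frac{p_i}{m}\,\frac{\partial\rho}{\partial x_i} + F_i\,\frac{\partial\rho}{\partial p_i}\right), \tag{8}$$

where $\mathbf{F}$ is the force acting on the particles.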
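Now decompose $\rho(\mathbf{x}, \mathbf{p}, t)$ with respect to $\mathbf{p}$, using the de Broglie-Fourier kernel:

$$\rho(\mathbf{x}, \mathbf{x}', t) = \int e^{\,2i\,\mathbf{p}\cdot\mathbf{x}'/\hbar}\,\rho(\mathbf{x}, \mathbf{p}, t)\,d\mathbf{p}, \tag{9}$$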
Figure 1 (Neutron Diffraction; relative intensity vs. lateral displacement in radians, θ; curves: particle beam, radiation, and Chi(y)-squared ×50): A simulated single-slit neutron diffraction pattern showing how closely Eq. (6) fits the pure wave diffraction pattern. See Ref. [3] for details.
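to transform the Liouville Equation into

$$\frac{\partial\rho}{\partial t} = \frac{i\hbar}{2m}\,\nabla_{x}\cdot\nabla_{x'}\,\rho + \frac{2i}{\hbar}\,(\mathbf{x}'\cdot\mathbf{F})\,\rho. \tag{10}$$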
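The passage from Eq. (8) to Eq. (10) rests on two identities for the kernel of Eq. (7). Assume, as the text implicitly does, that $\mathbf{F}$ does not depend on $\mathbf{p}$ and that $\rho(\mathbf{x},\mathbf{p},t)$ vanishes as $|\mathbf{p}|\to\infty$. Differentiation under the integral and an integration by parts then give

$$\int e^{2i\mathbf{p}\cdot\mathbf{x}'/\hbar}\,\frac{p_i}{m}\,\frac{\partial\rho}{\partial x_i}\,d\mathbf{p} = \frac{\hbar}{2im}\,\frac{\partial^2\rho(\mathbf{x},\mathbf{x}',t)}{\partial x_i\,\partial x'_i}, \qquad \int e^{2i\mathbf{p}\cdot\mathbf{x}'/\hbar}\,F_i\,\frac{\partial\rho}{\partial p_i}\,d\mathbf{p} = -\frac{2i\,x'_i}{\hbar}\,F_i\,\rho(\mathbf{x},\mathbf{x}',t).$$

Multiply Eq. (8) by the kernel, integrate over $\mathbf{p}$ and sum over $i$. This yields Eq. (10), since $-\hbar/(2im) = i\hbar/(2m)$.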
To solve, change variables to

$$\mathbf{r} = \mathbf{x} + \mathbf{x}', \qquad \mathbf{r}' = \mathbf{x} - \mathbf{x}', \tag{11}$$
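to get

$$\frac{\partial\rho}{\partial t} = \frac{i\hbar}{2m}\left(\nabla_r^2 - \nabla_{r'}^2\right)\rho + \frac{i}{\hbar}\,\big((\mathbf{r}-\mathbf{r}')\cdot\mathbf{F}\big)\,\rho, \tag{12}$$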
which can (sometimes) be separated by writing

$$\rho(\mathbf{r},\mathbf{r}') = \psi^*(\mathbf{r}')\,\psi(\mathbf{r}), \tag{13}$$
to get Schrödinger's Equation:

$$i\hbar\frac{\partial\psi}{\partial t} = -\frac{\hbar^2}{2m}\nabla^2\psi + V\psi. \tag{14}$$
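Whether Eq. (12) separates depends on the force term. As an illustrative assumption, take $\mathbf{F} = -\nabla V$ with $\mathbf{F}$ uniform (or slowly varying), so that $(\mathbf{r}-\mathbf{r}')\cdot\mathbf{F} = -[V(\mathbf{r}) - V(\mathbf{r}')]$. Multiply Eq. (12) by $i\hbar$, insert Eq. (13) and divide by $\psi^*(\mathbf{r}')\psi(\mathbf{r})$. This gives

$$\frac{1}{\psi(\mathbf{r})}\Big[i\hbar\,\partial_t\psi + \frac{\hbar^2}{2m}\nabla_r^2\psi - V(\mathbf{r})\,\psi\Big] = -\frac{1}{\psi^*(\mathbf{r}')}\Big[i\hbar\,\partial_t\psi^* - \frac{\hbar^2}{2m}\nabla_{r'}^2\psi^* + V(\mathbf{r}')\,\psi^*\Big].$$

The left side depends only on $\mathbf{r}$ and $t$, and the right side only on $\mathbf{r}'$ and $t$. Both therefore equal a function of $t$ alone. Absorbing that function into the phase of $\psi$ sets it to zero, which yields Eq. (14) for $\psi(\mathbf{r})$ and its complex conjugate for $\psi^*(\mathbf{r}')$. This is one possible reading of the "(sometimes)" attached to Eq. (13).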
4 Conclusions
Within this paradigm, Quantum Mechanics is incomplete, as surmised by Einstein, Podolsky and Rosen [7]. It is built on the Liouville Equation while taking a particular stochastic background into account. Probability plays the same conceptual role in QM as in Statistical Mechanics. Measurement reduces ignorance; it does not precipitate "reality." Of course, measurement also disturbs the measured system, but this poses no more fundamental problems than it does in classical physics. 'Heisenberg uncertainty,' on the other hand, is seen to be caused simply by the incessant dynamical perturbation from background signals. Insofar as the source of background signals cannot be isolated, this source of uncertainty is intrinsic, but not fundamentally novel. For these reasons, "duality" is superfluous. Particles have the same ontological status as in classical physics. In a Young double-slit experiment, for example, individual particles in a beam pass through one slit or the other, while their de Broglie piloting waves pass through both slits. Beyond the slits, the particles are induced stochastically to track the nodes of their pilot waves, so that a diffraction pattern builds up that mimics the intensity of the pilot wave. Within this paradigm, the now infamously paradoxical situations that illustrate various problems in interpreting QM either never arise or are resolved by elementary reasoning. In particular, wave functions are not vested with an ambiguous nature. The SED Paradigm also clarifies the appearance of interference among "probabilities." Numerous analysts, from various viewpoints, have discovered that Probability Theory admits structure (used by QM) that goes unexploited in traditional applications (e.g., see Gudder, Summhammer, this volume). Each of these approaches provides deep and surprising insights, but none really explains why and how nature exploits this structure.
A certain second-order hyperbolic partial differential equation becomes the "wave equation," a statement of physics, only with the introduction of, e.g., Hooke's Law. In the same way, this extra probability structure can be made into physics only with an analogue of Hooke's Law. SED provides that analogue for particle behavior with its model of pilot-wave guidance. In this model, radiation pressure is responsible for particle guidance [3]. Radiation pressure is proportional to the square of the EM fields, i.e.,
to the intensity (in this case, of the background field as modified by objects in the environment), which is not additive. The field amplitudes are additive instead, and interference arises in the way well understood in classical EM. In other words, QM interference is a manifestation of EM interference. The relevant analogue of Hooke's Law is the phenomenon of radiation pressure. For radiation, all this is of course intimately related to classical coherence theory as applied to "square law" photoelectron detectors. When properly applied, that theory resolves many QM conundrums, including those that Bell's Theorem raises about EPR correlations.

Appendix: Bell's Theorem

The interpretation, or paradigm, described here conflicts with the conclusions of Bell's "no-go" theorem. According to that theorem, a local, realistic extension of QM should conform to certain restraints that experiments have shown to be violated. To be sure, this paradigm does not deliver the hidden variables for use in calculations. It does, however, indicate which features of the universe they pertain to, namely, all other charges. The character of these hidden variables is dictated by the fact that they are distinguished only in pertaining to particles distant from the system of particular interest. Internal consistency therefore requires that they be local and realistic [8].

The basic proof

Bell's Theorem purports to establish certain limitations on coincidence probabilities of spin or polarization measurements as calculated in QM. These limitations apply if the probabilities are to have an underlying basis that is deterministic but still local and realistic, describable by extra, as yet 'hidden,' variables $\lambda$ distributed with a density $\rho(\lambda)$. The limitations take the form of inequalities that measurable coincidences must respect.
One of these inequalities is extracted as follows, with the input assumptions enumerated as Bell made them. Bell's fundamental Ansatz consists of the following equation:

$$P(a,b) = \int d\lambda\,\rho(\lambda)\,A(a,\lambda)\,B(b,\lambda), \tag{15}$$
where, by explicit assumption, $A$ is not a function of $b$, nor $B$ a function of $a$. Bell motivated this on the grounds that a measurement at station A, if it respects 'locality,' cannot depend on remote conditions, such as the setting $b$ of a distant measuring device. In addition, by definition,

$$|A| \le 1, \qquad |B| \le 1. \tag{16}$$
Eq. (15) expresses the requirement that integrating out the hidden variables recovers the usual QM results. The extraction proceeds by taking the difference of two such coincidence probabilities in which the setting of one measuring station differs:

$$P(a,b) - P(a,b') = \int d\lambda\,\rho(\lambda)\big[A(a,\lambda)B(b,\lambda) - A(a,\lambda)B(b',\lambda)\big], \tag{17}$$
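to which zero, in the form

$$A(a,\lambda)B(b,\lambda)A(a',\lambda)B(b',\lambda) - A(a,\lambda)B(b',\lambda)A(a',\lambda)B(b,\lambda), \tag{18}$$

is added with either sign inside the integral, to get

$$P(a,b) - P(a,b') = \int d\lambda\,\rho(\lambda)\,A(a,\lambda)B(b,\lambda)\big[1 \pm A(a',\lambda)B(b',\lambda)\big] - \int d\lambda\,\rho(\lambda)\,A(a,\lambda)B(b',\lambda)\big[1 \pm A(a',\lambda)B(b,\lambda)\big], \tag{19}$$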
which, upon taking absolute values, Bell wrote as

$$|P(a,b) - P(a,b')| \le \int d\lambda\,\rho(\lambda)\big[1 \pm A(a',\lambda)B(b',\lambda)\big] + \int d\lambda\,\rho(\lambda)\big[1 \pm A(a',\lambda)B(b,\lambda)\big]. \tag{20}$$
Then, using the "Ansatz" of Eq. (15) and the normalization $\int d\lambda\,\rho(\lambda) = 1$, one gets

$$|P(a,b) - P(a,b')| + |P(a',b') + P(a',b)| \le 2, \tag{21}$$
a Bell inequality [9]. Now put into Eq. (21) the QM result for these coincidences, $P(a,b) = -\cos(2\theta)$, where $\theta$ is the angle between the settings. For $\theta = \pi/8$, the l.h.s. of Eq. (21) becomes $2\sqrt{2}$. Experiments verify this result [10]. Why the discrepancy? According to Bell, it must have been induced by demanding "locality," since he took everything else to be harmless.
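As a numerical sketch of this last step, the l.h.s. of Eq. (21) can be evaluated for the QM correlation $-\cos(2\theta)$ at settings whose relative angles are $\pi/8$ or $3\pi/8$ (one standard choice, assumed here). For contrast, the same quantity is computed for a hypothetical deterministic local model of the form of Eq. (15), in which $A$ and $B$ are $\pm 1$ step functions of $\lambda$. This model and all function names are illustrative assumptions, not taken from the text.

```python
import math

def qm_corr(a, b):
    """QM coincidence correlation used in the text: P(a, b) = -cos(2*theta), theta = a - b."""
    return -math.cos(2.0 * (a - b))

def sign(v):
    return 1.0 if v >= 0.0 else -1.0

def local_corr(a, b, samples=20000):
    """Hypothetical local model (illustrative assumption, not from the text):
    A(a, lam) = sign(cos 2(a - lam)), B(b, lam) = -sign(cos 2(b - lam)),
    lam uniform on [0, pi); Eq. (15) is evaluated with the midpoint rule."""
    total = 0.0
    for k in range(samples):
        lam = (k + 0.5) * math.pi / samples
        a_val = sign(math.cos(2 * (a - lam)))
        b_val = -sign(math.cos(2 * (b - lam)))
        total += a_val * b_val
    return total / samples

def bell_lhs(corr, a, a2, b, b2):
    """Left-hand side of Eq. (21): |P(a,b) - P(a,b')| + |P(a',b') + P(a',b)|."""
    return abs(corr(a, b) - corr(a, b2)) + abs(corr(a2, b2) + corr(a2, b))

# Settings whose pairwise offsets are pi/8 or 3*pi/8.
a, a2, b, b2 = 0.0, math.pi / 4, 3 * math.pi / 8, math.pi / 8
print(bell_lhs(qm_corr, a, a2, b, b2))     # 2*sqrt(2), about 2.828
print(bell_lhs(local_corr, a, a2, b, b2))  # about 2, i.e. at the bound of Eq. (21)
```

The local model's correlation is piecewise linear in the setting difference rather than harmonic. This is the kind of dichotomic-function behavior discussed in the critique that follows.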
Critiques

Although Bell's analysis is called a 'theorem,' there can in fact be no such thing in Physics. The axiomatic base for such a theorem would consist of the very fundamental theories that the whole enterprise is trying to reveal. Moreover, all mathematics pertaining to the physical world contains numerous unarticulated assumptions, some of which are exposed below.

The analytical character of dichotomic functions

In motivating his extraction of inequalities, Bell considered spin measurements with Stern-Gerlach magnets, or polarization measurements of 'photons.' In both cases, single measurements can be seen as individual terms in a symmetric dichotomic series, i.e., taking the values ±1. It is therefore natural to ask whether the correlation computed in QM and verified empirically, $P(a,b) = -\cos(2\theta)$, can be the correlation of dichotomic functions. It is easy to show that it cannot. Consider

$$-\cos(2\theta) = k\int P(x-\theta)\,P(x)\,dx, \tag{22}$$
where $\rho(\lambda)$ is $k/2\pi$ and the $P$'s are dichotomic functions. Now take the derivative w.r.t. $\theta$ to get

$$2\sin(2\theta) = \int \sum_j \delta(x-\theta_j)\,P(x)\,dx = \sum_j P(\theta_j) = k, \tag{23}$$
and, again,

$$4\cos(2\theta) = 0, \tag{24}$$
which is false. QED.

Some authors (see, e.g., Aerts, this volume) represent measurements by a parameterized dichotomic function. Such a function can be dichotomic in its argument but continuous in its parameter, e.g., of the form $P(\sin(t) - x)$. The correlation is then taken to be of the form

$$\mathrm{Corr}(t) = \int_{-\pi}^{\pi} D\big(x - \sin(2t)\big)\,D(x)\,dx. \tag{25}$$
However, this approach seems misguided. First, it assumes that the argument of Corr, $t$, can be identical to the parameter of the dichotomic function
$P_t(x)$, rather than to the 'offset' in the argument (here $x$), as befits a correlation. Moreover, the same sort of consistency test applied above also leads to contradictions. Such parameterized functions therefore do not constitute counterexamples to the claim that discontinuous functions cannot have a harmonic correlation. At best, this tactic implicitly correlates the measurement functions w.r.t. the continuous parameter $t$, which is interpreted as the "weight," or frequency, of the dichotomic value. That does not conform with Bell's analysis, in which the dichotomic values themselves are to be correlated. It corresponds instead to the type of model proposed below, without, however, recognizing Malus' Law as the source of the 'weights.' Conclusion: there is a fundamental error in Bell's analysis; the QM result is irreconcilably at odds with the conventional understanding of his arguments [11].

The same point can be shown another way, following Sica. Consider four dichotomic sequences $a$, $a'$, $b$ and $b'$, with values ±1 and length $N$, and the two quantities $a_i b_i + a_i b'_i = a_i(b_i + b'_i)$ and $a'_i b_i - a'_i b'_i = a'_i(b_i - b'_i)$. Sum these expressions over $i$, divide by $N$, and take absolute values before adding them together, to get

$$\Big|\frac{1}{N}\sum_{i}^{N} a_i b_i + \frac{1}{N}\sum_{i}^{N} a_i b'_i\Big| + \Big|\frac{1}{N}\sum_{i}^{N} a'_i b_i - \frac{1}{N}\sum_{i}^{N} a'_i b'_i\Big| \le \frac{1}{N}\sum_{i}^{N} |a_i|\,|b_i + b'_i| + \frac{1}{N}\sum_{i}^{N} |a'_i|\,|b_i - b'_i|. \tag{26}$$
The r.h.s. equals 2, so this is a Bell inequality. Conclusion: this Bell inequality is an arithmetic identity for dichotomic sequences; there is no need to postulate "locality" in order to extract it [12].

Discrete vice continuous variables

By implication, Bell considered discrete variables, for which the correlation would be

$$\mathrm{Cor}(a,b) := \frac{1}{N}\sum_{i}^{N} A_i(a)\,B_i(b). \tag{27}$$
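Sica's observation can be checked directly with the discrete correlation of Eq. (27). For arbitrary ±1 sequences, the r.h.s. of Eq. (26) is exactly 2: for each $i$, one of $|b_i + b'_i|$ and $|b_i - b'_i|$ is 2 and the other is 0. The l.h.s. never exceeds it. The sketch below draws random sequences; the function names are hypothetical.

```python
import random

def cor(xs, ys):
    """Discrete correlation of Eq. (27): (1/N) * sum_i x_i * y_i."""
    return sum(x * y for x, y in zip(xs, ys)) / len(xs)

def sica_sides(a, a2, b, b2):
    """Both sides of inequality (26) for four +/-1 sequences of equal length N."""
    n = len(a)
    lhs = abs(cor(a, b) + cor(a, b2)) + abs(cor(a2, b) - cor(a2, b2))
    rhs = (sum(abs(ai) * abs(bi + b2i) for ai, bi, b2i in zip(a, b, b2))
           + sum(abs(a2i) * abs(bi - b2i) for a2i, bi, b2i in zip(a2, b, b2))) / n
    return lhs, rhs

def random_trials(trials=100, n=1000, seed=0):
    """Largest l.h.s. and the set of r.h.s. values over random +/-1 sequences."""
    rng = random.Random(seed)
    worst, rhs_values = 0.0, set()
    for _ in range(trials):
        seqs = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(4)]
        lhs, rhs = sica_sides(*seqs)
        worst = max(worst, lhs)
        rhs_values.add(rhs)
    return worst, rhs_values

print(random_trials())
```

The bound is reached, for example, when all four sequences coincide, since the l.h.s. is then $|1 + 1| + |1 - 1| = 2$.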
But experiments measure the number of hits per unit time for given $a$ and $b$, and then compute the correlation; each event is a density, not a single pair. The
data taken in experiments correspond to the read-out for Malus' Law. They do not correspond to the generation of dichotomic sequences in which each term represents an event: a pair of photons with anticorrelated polarizations, or a particle pair with anticorrelated spins. The standard renditions of Bell's analysis ignore this discrepancy. It is, however, serious, and it suggests a different tack. Consider, following Barut, a model in which the spin axes of particle pairs have random, but totally anticorrelated, instantaneous orientations: $\mathbf{S}_1 = -\mathbf{S}_2$ [13]. Each particle is then directed through a Stern-Gerlach magnetic field, with orientation $\mathbf{a}$ and $\mathbf{b}$ respectively. The observables would then be $A := \mathbf{S}_1\cdot\mathbf{a}$ and $B := \mathbf{S}_2\cdot\mathbf{b}$. By standard theory,

$$\mathrm{Cor}(A,B) = \frac{\langle AB\rangle}{\sqrt{\langle A^2\rangle\langle B^2\rangle}}, \tag{28}$$

where the angle brackets denote averages over the range of the variables. This becomes

$$\mathrm{Cor}(A,B) = -\frac{\int_0^{\pi} d\gamma\,\sin(\gamma)\cos(\gamma-\theta)\cos(\gamma)}{\sqrt{\left(\int_0^{\pi} d\gamma\,\sin(\gamma)\cos^2(\gamma)\right)^2}}, \tag{29}$$
which evaluates to $-\cos(\theta)$, i.e., the QM result for the spin-state correlation. Conclusion: this model, essentially a counterexample to Bell's analysis, shows that continuous functions (vice dichotomic ones) work. It is more than natural to ask where the 'gremlins' reside in Bell's analysis. There are at least two. The first concerns a covert hypothesis. Bell's 'proof' seems to apply to continuous variables, since it demands only that $|A|$ ($|B|$) $\le 1$. The argument, however, also silently assumes that the averages vanish, $\langle A\rangle = \langle B\rangle = 0$. This assumption enters the derivation of a Bell inequality where the second term above is ignored as if it were always zero. When it is not zero, Bell inequalities become, e.g.,

$$|P(a,b) - P(a,b')| + |P(a',b') - P(a',b)| \le 2 + 2\langle\cdots\rangle, \tag{30}$$
which opens up a broader category of non-quantum models. A second covert gremlin, of broader significance, is discussed below.

Are 'nonlocal' correlations essential?

Despite the introduction of hidden variables $\lambda$, a probability $P(a,b)$ averaged over these extra variables is required to reduce to currently used QM expressions. This demand implies that

$$P(a,b) = \int P(a,b,\lambda)\,d\lambda. \tag{31}$$
By basic probability theory, the integrand in this equation is to be decomposed in terms of individual detections in each arm, according to Bayes' formula

$$P(a,b,\lambda) = P(\lambda)\,P(a\,|\,\lambda)\,P(b\,|\,a,\lambda), \tag{32}$$
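where $P(a\,|\,\lambda)$ is a conditional probability. In turn, the integrand above can be converted to the integrand of Bell's Ansatz,

$$P(a,b) = \int A(a,\lambda)\,B(b,\lambda)\,\rho(\lambda)\,d\lambda, \quad \text{iff} \quad P(b\,|\,a,\lambda) = P(b\,|\,\lambda)\;\;\forall a. \tag{33}$$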
This equation seems to admit two interpretations.

(i) When this equation holds, the relative frequencies of outcomes at station B must be statistically independent of the outcomes at A. The hidden variables $\lambda$ are 'extra' and do not duplicate $a$ and $b$. So even if the correlation is considered to be encoded by a $\lambda$, it will not be available to an observer. But by hypothesis the correlation does exist and is to be detectable via the $a$'s and $b$'s; therefore, this equation cannot hold. Within this interpretation, Bell's Ansatz is thus not internally consistent.

(ii) Alternatively, if the $a$ on the l.h.s. is superfluous, so is $b$. Then $P = P(\lambda) = 0$ except at one value of $\lambda$, where it equals 1 or is a Dirac delta function. That is, the correlation is totally encoded by the hidden variables. This follows if enough new variables are introduced to render everything deterministic, as is often assumed. Consequently, the individual products of probabilities at the separate stations, i.e., the $AB$'s in Bell's notation, become Dirac delta functions of $\lambda$. If everything is deterministic, the non-zero values of pairs of probabilities cannot overlap for a given value of $\lambda$. In the extraction of a Bell inequality, therefore, all quadruple products of $P$'s with pair-wise different values of $\lambda$ in Eq. (19) are identically zero. The final form of a Bell inequality is then the trivial identity

$$|P(a,b) - P(a,b')| \le 2. \tag{34}$$
In either case, "locality" should not be employed to exclude correlations generated at the conception of the spin-particle or photon pairs, i.e., "common causes." The non-existence of instantaneous communication cannot impose a restraint here; it has no bearing on the validity of Eq. (33). In addition, Eq. (34) reconciles Barut's continuous-variable model with Bell's analysis.

Bell-Kochen-Specker 'Theorem'
Besides Bell's original theorem, there is another set of no-go theorems that ostensibly prohibit a local realistic extension of QM. Unlike the theorem analyzed above, they make no explicit use of 'locality.' Instead, they use certain properties of angular momentum (spin), falsely, as it turns out. In general, the 'proof' of these theorems proceeds as follows. The system of interest is described as being in a 'state' $|\psi\rangle$ specified by observables $A, B, C, \ldots$ A hidden variable theory is then taken to be a mapping $v$ of observables to numerical values $v(A), v(B), v(C), \ldots$ Use is then made of the fact that if a set of operators all commute, any functional relation $f(A, B, C, \ldots) = 0$ among them is also satisfied by their eigenvalues: $f(v(A), v(B), v(C), \ldots) = 0$. A Kochen-Specker Theorem is proved by displaying a contradiction. Consider, e.g., two 'spin-1/2' particles, for which nine operators can be arranged in the following 3 by 3 matrix, the operators in each row and in each column commuting mutually:
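$$\begin{array}{ccc} \sigma_x^1 & \sigma_x^2 & \sigma_x^1\sigma_x^2 \\ \sigma_y^2 & \sigma_y^1 & \sigma_y^1\sigma_y^2 \\ \sigma_x^1\sigma_y^2 & \sigma_y^1\sigma_x^2 & \sigma_z^1\sigma_z^2 \end{array} \tag{35}$$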
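The bookkeeping for Eq. (35) can be done mechanically. The sketch below assumes that Eq. (35) is Mermin's arrangement (Ref. [14]) and builds the nine two-particle operators as 4×4 matrices from the standard Pauli matrices. It confirms that the row products are the identity and that the column products are +1, +1 and −1 times the identity. It then counts the ±1 value assignments that reproduce all six products; none exist. The helper names are hypothetical.

```python
from itertools import product

I2 = ((1, 0), (0, 1))
SX = ((0, 1), (1, 0))
SY = ((0, -1j), (1j, 0))
SZ = ((1, 0), (0, -1))

def kron(p, q):
    """Kronecker product of two 2x2 matrices (particle 1 (x) particle 2) -> 4x4."""
    return tuple(tuple(p[i // 2][j // 2] * q[i % 2][j % 2] for j in range(4)) for i in range(4))

def matmul(p, q):
    n = len(p)
    return tuple(tuple(sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)) for i in range(n))

def as_scalar(m):
    """Return c if m equals c times the identity, otherwise None (entries here are exact)."""
    c = m[0][0]
    ok = all(m[i][j] == (c if i == j else 0) for i in range(len(m)) for j in range(len(m)))
    return c if ok else None

def product_of(ms):
    out = ms[0]
    for m in ms[1:]:
        out = matmul(out, m)
    return out

# Assumed form of the square in Eq. (35): Mermin's arrangement, Ref. [14].
SQUARE = [
    [kron(SX, I2), kron(I2, SX), kron(SX, SX)],
    [kron(I2, SY), kron(SY, I2), kron(SY, SY)],
    [kron(SX, SY), kron(SY, SX), kron(SZ, SZ)],
]

row_products = [as_scalar(product_of(row)) for row in SQUARE]
col_products = [as_scalar(product_of([SQUARE[r][c] for r in range(3)])) for c in range(3)]

def count_consistent_assignments():
    """Count +/-1 value assignments to the nine entries that reproduce every row and
    column product; a non-contextual value map would need at least one."""
    count = 0
    for vals in product((1, -1), repeat=9):
        v = [vals[0:3], vals[3:6], vals[6:9]]
        rows_ok = all(v[r][0] * v[r][1] * v[r][2] == row_products[r] for r in range(3))
        cols_ok = all(v[0][c] * v[1][c] * v[2][c] == col_products[c] for c in range(3))
        count += rows_ok and cols_ok
    return count

print(row_products, col_products, count_consistent_assignments())
```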
It is then a small exercise in bookkeeping to verify that any assignment of plus and minus ones to each of the factors in each element of this matrix leads to a contradiction. Namely, the product of all these operators formed row-wise is plus one, while the same product formed column-wise is minus one [14]. Now recall that, for a uniform static magnetic field $B$ in the $z$-direction, the Hamiltonian is $H = -(e\hbar B/2mc)\,\sigma_z$. The time-dependent solution of the Schrödinger equation is then

$$\psi(t) = \frac{1}{\sqrt{2}}\begin{pmatrix} e^{+i\omega t/2} \\ e^{-i\omega t/2} \end{pmatrix},$$

and this in turn gives time-dependent expectation values for the spin in the $x$ and $y$ directions [15]:

$$\langle\sigma_x\rangle = \cos(\omega t), \qquad \langle\sigma_y\rangle = -\sin(\omega t), \qquad \text{where } \omega = eB/mc. \tag{36}$$
Proof of a Bell-Kochen-Specker theorem depends on simultaneously assigning the eigenvalues ±1 to …
(37)
where factors of the form $\exp(i(\omega t + \mathbf{k}\cdot\mathbf{x} + \xi(t)))$, with $\xi(t)$ a random variable, are dropped, since averaging suppresses them [16]. The random variables with physical significance, which emerge in the detectors per Malus' Law, are $E_{A,B}$. It is the detectors that digitize the data and create the illusion of 'photons.' Maxwell's Equations, however, are linear in the fields, not in the intensities. A fourth-order field correlation is therefore required to calculate the cross correlation of the intensities:

$$P(a,b) = K\,\langle(A\cdot B)(B\cdot A)\rangle, \tag{38}$$
where the brackets denote averages over space-time. (This appears to be the source of "entanglement" in QM, which is seen to have no basis beyond what is found in classical physics.) Here, Eq. (38) turns out to be

$$P(+,+) \propto K\int_0^{2\pi}\big(\cos(\nu)\sin(\nu+\theta) - \sin(\nu)\cos(\nu+\theta)\big)^2\,d\nu, \tag{39}$$

which gives $P(+,+) = P(-,-) \propto K\sin^2(\theta)$ and $P(+,-) = P(-,+) \propto K\cos^2(\theta)$. The constant $K$ can be eliminated by computing the ratio of particular events to the total sample space. Here the sample space includes coincident detections in all four combinations of detectors, averaged over all possible displacement angles $\theta$. The denominator is thus

$$\propto \frac{K}{\pi}\int_0^{2\pi}\big(\sin^2(\theta) + \cos^2(\theta)\big)\,d\theta = 2K, \tag{40}$$
so that the ratio becomes

$$P(+,+) = \tfrac{1}{2}\sin^2(\theta), \tag{41}$$
the QM result. This in turn yields the correlation

$$\mathrm{Cor}(a,b) := \frac{P(+,+) + P(-,-) - P(+,-) - P(-,+)}{P(+,+) + P(-,-) + P(+,-) + P(-,+)} = -\cos(2\theta). \tag{42}$$
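Eqs. (39)-(42) can be checked numerically. By the sine subtraction formula, the integrand of Eq. (39) equals $\sin^2(\theta)$ for every $\nu$, so averaging over $\nu$ simply returns $\sin^2(\theta)$ (up to $K$). The text states, without derivation, that the opposite-channel coincidences follow $\cos^2(\theta)$. The sketch assumes this and models it by shifting $\theta$ by $\pi/2$. The function names are hypothetical.

```python
import math

def eq39_integrand(nu, theta):
    """Integrand of Eq. (39); by the sine subtraction formula it equals sin^2(theta)."""
    return (math.cos(nu) * math.sin(nu + theta) - math.sin(nu) * math.cos(nu + theta)) ** 2

def mean_over_nu(theta, steps=1000):
    """Average of the Eq. (39) integrand over nu in [0, 2*pi) (midpoint rule)."""
    h = 2 * math.pi / steps
    return sum(eq39_integrand((k + 0.5) * h, theta) for k in range(steps)) / steps

def ratios(theta):
    """P(+,+) and Cor(a, b) as in Eqs. (41)-(42). Assumption: the '-' channel is the
    analyser rotated by pi/2, so P(+,-) = P(-,+) uses theta + pi/2 (i.e. cos^2)."""
    same = mean_over_nu(theta)                    # P(+,+) = P(-,-), up to K
    opposite = mean_over_nu(theta + math.pi / 2)  # P(+,-) = P(-,+), up to K
    total = 2 * same + 2 * opposite               # the denominator, cf. Eq. (40)
    return same / total, (2 * same - 2 * opposite) / total

for theta in (0.0, math.pi / 8, math.pi / 4, 1.0):
    p_pp, cor = ratios(theta)
    print(f"theta={theta:.3f}  P(+,+)={p_pp:.6f}  sin^2/2={0.5 * math.sin(theta) ** 2:.6f}  "
          f"Cor={cor:.6f}  -cos(2theta)={-math.cos(2 * theta):.6f}")
```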
If the fundamental assumptions of this local, realistic model are valid, there should be observable consequences. Suppose, for example, that the radiation on the "other side" of a photodetector is continuous and not composed of "photons." Then photoelectrons are evoked independently in each detector by continuous but (anti)correlated radiation. The density of photoelectron pairs should then be linearly proportional to the coincidence window width (barring effects caused by limited coherence). If, on the other hand, photons are in fact generated in matched pairs at the source, then at very low intensities the detection rate should be relatively insensitive to the coincidence window width, once the window is wide enough to capture both electrons.

1. L. de la Peña and A. M. Cetto, The Quantum Dice (Kluwer, Dordrecht, 1996).
2. A. F. Kracklauer, An Intuitive Paradigm for Quantum Mechanics, Physics Essays 5 (2) 226 (1992).
3. A. F. Kracklauer, Found. Phys. Lett. 12 (5) 441 (1999).
4. G. Hermann, Die Naturphilosophischen Grundlagen der Quantenmechanik, Abhandlungen der Fries'schen Schule 6, 75-152 (1935).
5. D. Bohm, Causality and Chance in Modern Physics (Routledge & Kegan Paul Ltd., London, 1957).
6. H. Puthoff, Phys. Rev. A 40, 4857 (1989); 44, 3385 (1991).
7. A. Einstein, B. Podolsky and N. Rosen, Phys. Rev. 47, 777 (1935).
8. J. S. Bell, Speakable and Unspeakable in Quantum Mechanics (Cambridge University Press, Cambridge, 1987).
9. J. S. Bell, in Foundations of Quantum Mechanics, Proceedings of the International School of Physics 'Enrico Fermi', Course IL (Academic, New York, 1971), pp. 171-181; reprinted in Ref. [8].
10. A. Afriat and F. Selleri, The Einstein, Podolsky and Rosen Paradox (Plenum, New York, 1999), review theory and experiments from a current perspective.
11. A. F. Kracklauer, in New Developments on Fundamental Problems in Quantum Mechanics, M. Ferrero and A. van der Merwe (eds.) (Kluwer, Dordrecht, 1997), p. 185.
12. L. Sica, Opt. Commun. 170, 55-60 & 61-66 (1999).
13. A. O. Barut, Found. Phys. 22 (1) 137 (1992).
14. N. D. Mermin, Rev. Mod. Phys. 65 (3) 803 (1993).
15. R. H. Dicke and J. P. Wittke, Introduction to Quantum Mechanics (Addison-Wesley, Reading, 1960), p. 195.
16. A. F. Kracklauer, in Instantaneous Action-at-a-Distance in Modern Physics, A. E. Chubykalo, V. Pope and R. Smirnov-Rueda (eds.) (Nova Science, Commack NY, 1999), p. 379; http://arXiv:quant-ph/0007101; Ann. Fond. L. de Broglie 20 (2) 193 (2000).
A PROBABILISTIC INEQUALITY FOR THE KOCHEN-SPECKER PARADOX

JAN-ÅKE LARSSON
Matematiska Institutionen, Linköpings Universitet, SE-581 83 Linköping, Sweden
E-mail: [email protected]

A probabilistic version of the Kochen-Specker paradox is presented. By formulating the concept of "probabilistic contextuality," the paradox is restated as an inequality relating probabilities from a non-contextual hidden-variable model. This enables an experimental test for contextuality at low experimental error rates. Under the assumption of independent errors, an explicit error bound of 0.71% is derived, below which a Kochen-Specker contradiction occurs.
1 Introduction
The description of quantum-mechanical (QM) processes by hidden variables is a subject of active research at present. The interest can be traced to topics where recent improvements in technology have made testing and using QM processes possible. Research in this field usually aims to provide insight into whether, how, and why QM processes differ from classical processes. Here, the presentation is restricted to one question: whether or not a certain QM system can be described by a non-contextual hidden-variable model. In a non-contextual hidden-variable model, the result of a specific measurement does not depend on the context, i.e., on what other measurements are performed on the system simultaneously. It is already known that no non-contextual model exists for perfect measurements (perfect alignment, no measurement errors). These results originate in the work of Gleason [1], but a conceptually simpler proof was given by Kochen and Specker [2] (KS). The KS theorem concerns measurements on a QM system consisting of a spin-1 particle. In the QM description of this system, the operators associated with measuring the spin components along orthogonal directions do not commute, i.e.,

$$s_x,\; s_y,\; \text{and}\; s_z\; \text{do not commute;} \tag{1}$$
however, the operators that are associated with measurement of the square of the spin components do commute, i.e., ^1,'s'i,
and s^
commute.
(2)
The latter operators (the squared ones) have the eigenvalues 0 and 1, and
$$\hat S_x^2 + \hat S_y^2 + \hat S_z^2 = 2\,\hat 1. \qquad (3)$$
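Equations (1)-(3) can be checked numerically. The sketch below assumes the Cartesian spin-1 matrices $(S_k)_{ab} = -i\,\epsilon_{kab}$ with $\hbar = 1$, a standard representation that the text does not spell out; in this basis each $S_k^2$ is diagonal, so the eigenvalues 0 and 1 can be read off directly. All names are ours, and only the Python standard library is used.

```python
# Spin-1 check of eqs. (1)-(3), assuming the Cartesian representation
# (S_k)_{ab} = -i * eps_{kab} with hbar = 1.

def levi_civita(i, j, k):
    """Levi-Civita symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

def spin1(k):
    return [[-1j * levi_civita(k, a, b) for b in range(3)] for a in range(3)]

def matmul(A, B):
    return [[sum(A[i][m] * B[m][j] for m in range(3)) for j in range(3)] for i in range(3)]

def commutator(A, B):
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(3)] for i in range(3)]

def is_zero(A, tol=1e-12):
    return all(abs(x) < tol for row in A for x in row)

S = [spin1(k) for k in range(3)]   # S_x, S_y, S_z
S2 = [matmul(s, s) for s in S]     # squared components

# (1) The components do not commute: [S_x, S_y] = i S_z is nonzero.
components_commute = is_zero(commutator(S[0], S[1]))
# (2) The squared components commute pairwise.
squares_commute = all(is_zero(commutator(S2[a], S2[b])) for a in range(3) for b in range(3))
# (3) The squares add up to 2 times the identity.
sum_minus_2I = [[sum(S2[k][i][j] for k in range(3)) - 2 * (i == j) for j in range(3)]
                for i in range(3)]
sum_is_2I = is_zero(sum_minus_2I)
# Each S_k^2 is diagonal with a single 0 at position k: eigenvalues 0, 1, 1.
diagonals = [[S2[k][i][i].real for i in range(3)] for k in range(3)]

print(components_commute, squares_commute, sum_is_2I, diagonals)
```

The diagonal form makes the QM rule used below explicit: measuring the three squared components along an orthogonal triad gives two 1's and one 0.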
Thus, it is possible to simultaneously measure the squares of the spin components along three orthogonal directions, and two of the results will be 1 while the third will be 0. Only this QM property of the system will be used in what follows. The notation used from now on is intended to avoid confusion with QM notation, since the notions used will be those of (Kolmogorovian) probability theory, not QM. A hidden-variable model will be taken to be a probabilistic model, i.e., the hidden variable $\lambda$ is represented as a point in a probability space $\Lambda$, and sets in this space ("events") have a probability given by the probability measure $P$. The measurement results are described by random variables (RVs) $X_i(\lambda)$, which take their values in the value space $\{0,1\}$. These mappings will depend not only on the hidden variable $\lambda$ but also on the specific directions in which we choose to measure the squared spin components, so that we would have
$$X_1(x,y,z,\lambda): \Lambda \to \{0,1\}, \quad X_2(x,y,z,\lambda): \Lambda \to \{0,1\}, \quad X_3(x,y,z,\lambda): \Lambda \to \{0,1\}. \qquad (4)$$
Here, $X_1$ is the result of the measurement along the first direction ($x$), $X_2$ along the second ($y$), and $X_3$ along the third ($z$). To be able to model the spin-1 system described above, these RVs would need to sum to two, i.e.,
$$\sum_{i=1}^{3} X_i(x,y,z,\lambda) = 2. \qquad (5)$$
This is in itself no guarantee that the model will be accurate, but it is the least one would expect from a hidden-variable model yielding the QM behaviour. In simple experimental setups, there is usually only one direction specified (the direction along which the squared spin component is measured). Thus, we would expect that $X_1$ depends only on $x$ (and $\lambda$). This is referred to as non-contextuality, and more formally it can be written as
$$X_1(x,y,z,\lambda) = X_1(x,y',z',\lambda), \quad X_2(x,y,z,\lambda) = X_2(x',y,z',\lambda), \quad X_3(x,y,z,\lambda) = X_3(x',y',z,\lambda). \qquad (6)$$
These two prerequisites are all that is needed to arrive at the Kochen-Specker paradox.
2 The Kochen-Specker theorem
A more appropriate name for this section is perhaps "A Kochen-Specker theorem," since there are several variants; the example presented here is from Peres (1993).3 All variants aim for the same thing: to show a contradiction by assigning values to measurement results coming from a non-contextual hidden-variable model. In this particular one,3 a set of 33 three-dimensional vectors is used, depicted in Fig. 1.
Figure 1: The 33 vectors used in the Kochen-Specker theorem. Each vector points from the center of the cube to one of the spots on the cube's surface (normalized, if desired).
The proof is as follows. Assume that we have a non-contextual hidden-variable model. Then, for any $\lambda$ (except perhaps for a null set), this model satisfies equations (5) and (6), in particular for the directions in Fig. 1. Now, look at Fig. 2(a). The measurement result along one of the coordinate axes must be 0, and along the other two axes it must be 1. Let us assume that the 0 is obtained from the measurement along the z axis (the white spot on the cube) and that the other two measurements yield 1 (black spots^a). Measurements along the other directions in the xy-plane must also yield 1, as indicated in Fig. 2(a), since each of them forms an orthogonal triad with the z axis, which already carries the 0. In Fig. 2(b-d), three more similar choices are made. Having made these assignments, a white spot must be added at the position indicated in Fig. 2(e), because of the two black spots at orthogonal positions, and thereby another black spot must be added, being orthogonal to the white one. This procedure continues in Fig. 2(f-j) until all the spots are painted either white or black, as necessitated by the previously painted spots. Finally, in Fig. 2(k), we have three black orthogonal spots, violating equation (5), the condition of QM results. A similar contradiction will occur whatever choices we make in our assignments in Fig. 2(a-d), and we have a proof of the KS theorem, which we state as Theorem 1.
^a These were green and red in Peres.3
Figure 2: A proof of the Kochen-Specker paradox. Panels (a)-(d): arbitrary choice; (e)-(j): orthogonality; (k): contradiction.
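The case analysis of Fig. 2 can also be left to a computer. The sketch below assumes that the spots of Fig. 1 are Peres's 33 rays, i.e., all permutations and sign changes of (0, 0, 1), (0, 1, 1), (0, 1, √2) and (1, 1, √2), taken up to an overall sign; this explicit list is our assumption and is not given in the text. It enumerates the orthogonal triads (the text states elsewhere that the set forms 16 distinct orthonormal bases) and searches by backtracking for a 0/1 assignment obeying (5) on every triad. Because any two orthogonal directions can be completed to a triad, it also forbids two orthogonal directions that both give 0. Components $a + b\sqrt{2}$ are stored as integer pairs, so orthogonality is tested exactly. If the assumed list matches Peres's set, the search returns `None`, which is the KS contradiction. All function names are ours.

```python
from itertools import combinations, permutations, product

# Each component a + b*sqrt(2) is stored exactly as the integer pair (a, b).
ZERO, ONE, SQRT2 = (0, 0), (1, 0), (0, 1)

def neg(c):
    return (-c[0], -c[1])

def dot(u, v):
    """Exact dot product in Z[sqrt(2)], returned as (integer part, sqrt(2) part)."""
    r = s = 0
    for (a1, b1), (a2, b2) in zip(u, v):
        r += a1 * a2 + 2 * b1 * b2
        s += a1 * b2 + a2 * b1
    return (r, s)

def peres_rays():
    """Assumed 33 rays, one canonical vector each (first nonzero component positive)."""
    rays = set()
    for base in [(ZERO, ZERO, ONE), (ZERO, ONE, ONE), (ZERO, ONE, SQRT2), (ONE, ONE, SQRT2)]:
        for perm in permutations(base):
            for signs in product((1, -1), repeat=3):
                v = tuple(c if s > 0 else neg(c) for c, s in zip(perm, signs))
                first = next(c for c in v if c != ZERO)
                if first[0] < 0 or first[1] < 0:
                    v = tuple(neg(c) for c in v)
                rays.add(v)
    return sorted(rays)

def orthogonal(u, v):
    return dot(u, v) == (0, 0)

def ks_constraints(rays):
    """Orthogonal pairs and orthogonal triads (as index tuples) within the set."""
    n = len(rays)
    pairs = [p for p in combinations(range(n), 2) if orthogonal(rays[p[0]], rays[p[1]])]
    triads = [t for t in combinations(range(n), 3)
              if all(orthogonal(rays[a], rays[b]) for a, b in combinations(t, 2))]
    return pairs, triads

def find_assignment(rays):
    """Backtracking search for a non-contextual 0/1 assignment; None if none exists."""
    pairs, triads = ks_constraints(rays)
    involved = {i: [] for i in range(len(rays))}
    for c in pairs + triads:
        for i in c:
            involved[i].append(c)
    value = {}

    def consistent(i):
        for c in involved[i]:
            assigned = [value[j] for j in c if j in value]
            if assigned.count(0) >= 2:      # two orthogonal directions cannot both give 0
                return False
            if len(c) == 3 and len(assigned) == 3 and assigned.count(0) != 1:
                return False                # a complete triad must sum to 2, eq. (5)
        return True

    def search(i):
        if i == len(rays):
            return dict(value)
        for v in (0, 1):
            value[i] = v
            if consistent(i):
                result = search(i + 1)
                if result is not None:
                    return result
            del value[i]
        return None

    return search(0)

rays = peres_rays()
pairs, triads = ks_constraints(rays)
assignment = find_assignment(rays)
print(len(rays), len(triads), assignment)
```

A single value per direction, shared by every triad that contains it, is exactly the non-contextuality assumption (6); the search never lets the value of a direction depend on the triad it appears in.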
Theorem 1 (Kochen-Specker): The following three prerequisites cannot hold simultaneously for any $\lambda$:
(i) Realism. Measurement results can be described by probability theory, using three (families of) RVs $X_i(x,y,z): \Lambda \to \{0,1\}$, $i = 1,2,3$.
(ii) Non-contextuality. The result along a vector is not changed by rotation around that vector. For example, $X_1(x,y,z,\lambda) = X_1(x,y',z',\lambda)$.
(iii) Quantum-mechanical results. For any triad, the sum of the results is two, i.e., $\sum_i X_i(x,y,z,\lambda) = 2$.
Note that there is a certain structure to the proof: assignment of measurement results on a finite number of orthogonal triads according to the QM rule, and rotations connecting the measurement results on different triads by non-contextuality. This structure can be made explicit in the statement of the theorem by introducing the set $E_{KS}$ (a "KS set of triads"):
$$E_{KS} = \{(x_1,y_1,z_1),\ (x_2,y_2,z_2),\ \ldots,\ (x_N,y_N,z_N)\}. \qquad (7)$$
In this set there are $n$ vectors forming $N$ distinct orthogonal triads, where some vectors are present in more than one triad, establishing in total $M$ connections by rotation around a vector. Using this notation, (a restricted version of) the KS theorem is
Theorem 1' (Kochen-Specker): Given a KS set of vector triads $E_{KS}$, the following three prerequisites cannot hold simultaneously for any $\lambda$:
(i) Realism. For any triad in $E_{KS}$, the measurement results can be described by probability theory, using three (families of) RVs $X_i(x,y,z): \Lambda \to \{0,1\}$, $i = 1,2,3$.
(ii) Non-contextuality. For any pair of triads in $E_{KS}$ related by a rotation around a vector, the result along that vector is not changed by the rotation. For example, $X_1(x,y,z,\lambda) = X_1(x,y',z',\lambda)$.
(iii) Quantum-mechanical results. For any triad in $E_{KS}$, the sum of the results is two, i.e., $\sum_i X_i(x,y,z,\lambda) = 2$.
This version of the KS theorem will be useful when formulating a probabilistic version of the theorem.
3 The Kochen-Specker inequality
The above discussion is valid in an ideal situation where no measurement errors are present. When measurement errors are introduced, they occur as (i) missing detections, (ii) changes in the results along the axis vector when rotating, or (iii) deviations from the sum 2. Since the prerequisites of Theorem 1 are no longer valid, neither is the theorem. However, using probabilistic notions, the theorem can be restated as follows.
Theorem 2 (Kochen-Specker inequality): Given a KS set $E_{KS}$ of $N$ vector triads with $M$ interconnections by rotation, if we have
(i) Realism. For any triad in $E_{KS}$, the measurement results can be described by probability theory, using three (families of) RVs $X_i(x,y,z): \Lambda_{X_i} \to \{0,1\}$, $i = 1,2,3$, where $\Lambda_{X_i}$ is a (possibly proper) subset of $\Lambda$.
(ii) "Rotation" error bound. For any pair of triads in $E_{KS}$ related by a rotation around a vector, the set of $\lambda$s where the result along that vector is not changed by the rotation is probabilistically large (has probability greater than $1 - \delta$). For example, $P(\{\lambda : X_1(x,y,z,\lambda) = X_1(x,y',z',\lambda)\}) > 1 - \delta$.
(iii) "Sum" error bound. For any triad in $E_{KS}$, the set of $\lambda$s where the sum of the results is two is probabilistically large (has probability greater than $1 - \varepsilon$), i.e., $P(\{\lambda : \sum_i X_i(x,y,z,\lambda) = 2\}) > 1 - \varepsilon$.
Then $M\delta + N\varepsilon > 1$.
To shorten the proof, the following symmetry of the measurement results is assumed to hold (the proof goes through without the symmetry, but grows notably in size):
$$X_1(x,y,z,\lambda) = X_2(z,x,y,\lambda) = X_3(y,z,x,\lambda). \qquad (8)$$
Proof: By Theorem 1, we have
$$\Big(\bigcap_{M}\{\lambda : X_1(x,y,z,\lambda) = X_1(x,y',z',\lambda)\}\Big) \cap \Big(\bigcap_{N}\{\lambda : \textstyle\sum_i X_i(x,y,z,\lambda) = 2\}\Big) = \emptyset.$$
Then the complement has probability one, and
$$1 = P\Big(\bigcup_{M}\{\lambda : X_1(x,y,z,\lambda) = X_1(x,y',z',\lambda)\}^c \cup \bigcup_{N}\{\lambda : \textstyle\sum_i X_i(x,y,z,\lambda) = 2\}^c\Big) \le \sum_{M} P\big(\{\lambda : X_1(x,y,z,\lambda) = X_1(x,y',z',\lambda)\}^c\big) + \sum_{N} P\big(\{\lambda : \textstyle\sum_i X_i(x,y,z,\lambda) = 2\}^c\big) < M\delta + N\varepsilon. \qquad (9)$$
Here, the probability in (iii) is to be read as "the probability of obtaining results for all three $X_i$ and that the sum is two." In other words, it is possible to avoid using the no-enhancement assumption in Theorem 2, but unfortunately inefficient detectors would contribute no-detection events to both error rates $\delta$ and $\varepsilon$, which places rather high demands on experimental equipment. While the no-enhancement assumption can be used in inefficient setups, this may weaken the statement (cf. a similar argument for the GHZ paradox4). The error rate $\varepsilon$ is the probability of getting an error in the sum (both non-detections and the wrong sum count as errors here), not the probability of getting an error in an individual result. This makes it easy to extract $\varepsilon$ from experimental data, but unfortunately, the errors that arise in rotation are not available in the experimental data, so it is not possible to estimate the size of $\delta$ (note that it is not even meaningful to discuss $\delta$ in QM). It is, however, possible to use $\varepsilon$ to obtain a bound for $\delta$:
Corollary 3 (Kochen-Specker inequality): Given a KS set $E_{KS}$ of $N$ vector triads with $M$ interconnections by rotation, if Theorem 2 (i)-(iii) hold, then
$$\delta > \frac{1 - N\varepsilon}{M}.$$
Obviously, a small $E_{KS}$ set (small $N$ and $M$) is better, yielding a higher bound for $\delta$ for a given $\varepsilon$ (for a few different KS sets, see Refs. 2, 3, and 5). In an inexact experiment yielding a large $\varepsilon$, one expects the error rate $\delta$ to be large as well, whereas the bound in Corollary 3 will be low because of the large $\varepsilon$. A model for this inexact experiment may then be said to be "probabilistically non-contextual": the measurement error rate is large enough to allow the changes arising in rotation to be explained as natural errors in the inexact measurement device, rather than being fundamentally contextual. For a good experiment yielding a low $\varepsilon$, one expects $\delta$ to be low, but here the bound in Corollary 3 is higher. In a hidden-variable model of this experiment, the changes arising in rotation occur at an unexpectedly high rate which cannot be explained as due to measurement errors, and a model of this type may be said to be "probabilistically contextual". Note that this "probabilistic" non-contextuality is a weaker notion than the one used in Theorem 1 (ii).
4 Independence
To enable a general statement, the proof of Theorem 2 makes no assumptions about independence of the errors, but a more quantitative bound for the error rate can be given by introducing independence (for simplicity, at 100% detector efficiency).
Corollary 4 (KS inequality for independent errors): Assuming that the errors are independent at the rate $r$ and that Theorem 2 (i)-(iii) hold, both $\delta$ and $\varepsilon$ are given by $r$, and
$$M(2r - 2r^2) + N(3r - 5r^2 + 3r^3) > 1.$$
Proof: In the case of independent errors at the rate $r$, the expressions for the probabilities in Theorem 2 (ii) and (iii) are
$$P(\{\lambda : X_1(x,y,z,\lambda) = X_1(x,y',z',\lambda)\}) = P(\text{no errors}) + P(\text{flip on both } X_1\text{'s}) = (1-r)^2 + r^2 = 1 - (2r - 2r^2),$$
$$P(\{\lambda : \textstyle\sum_i X_i(x,y,z,\lambda) = 2\}) = P(\text{no errors}) + P(\text{flip of the 0 and one 1}) = (1-r)^3 + 2(1-r)r^2 = 1 - (3r - 5r^2 + 3r^3).$$
The probabilities of these sets are not independent, so from this point on we cannot use independence. The inequality above then follows easily from Theorem 2. An expression of the form $r > f(N, M)$ can now be derived from Corollary 4, but this complicated expression is not central to the present paper. One important observation is that, again, to obtain a contradiction at high error rates $r$, a small $E_{KS}$ set is needed (small $N$ and $M$). Unfortunately, the error rate needs to be very low: in the present example,^b only an error rate $r$ below 0.71% yields a contradiction in Corollary 4. Note that there is no experimental check of whether the assumption of independent errors holds. While the errors in the sum may be possible to check, it is not possible to extract which errors are present in the rotations or to check the independence of those errors (further discussion of independence is necessary but cannot be fit into this limited space).
^b The set contains 33 vectors forming 16 distinct orthonormal bases,3 but some rotations used are not between two of these 16 bases; in some cases a rotation goes from one of the 16 bases to a pair of vectors in the set (where the third vector needed to form a basis is not in the set), and a subsequent rotation returns us to another of the 16 bases. Thus, in the notation adopted here, a few extra vectors are needed to form $E_{KS}$, yielding $n = 41$, $N = 24$, and $M = 31$. Note that these additional vectors are not needed to yield the KS contradiction; they are only needed in the proof of the inequality in this paper. A more detailed analysis of the initial set of 33 vectors is possible, probably yielding a contradiction at a somewhat higher $r$ than the one obtained from this general analysis, but it is lengthy and will not be done here.
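The bounds of Corollaries 3 and 4 are easy to evaluate numerically. The sketch below (pure Python; all function names are ours) rearranges $M\delta + N\varepsilon > 1$ from Theorem 2 into the lower bound on $\delta$, re-derives the two closed forms in the proof of Corollary 4 by enumerating independent bit flips at rate $r$ (taking the true values of a triad to be (1, 1, 0), which the symmetry of the problem allows), and solves $M(2r - 2r^2) + N(3r - 5r^2 + 3r^3) = 1$ for the threshold rate by bisection. Bisection is valid here because both $2r - 2r^2$ and $3r - 5r^2 + 3r^3$ increase on $[0, 1/2]$.

```python
from itertools import product

def rotation_error_lower_bound(eps_rate, n_triads, m_rotations):
    """Corollary 3: delta > (1 - N*eps) / M, obtained from M*delta + N*eps > 1."""
    return (1.0 - n_triads * eps_rate) / m_rotations

def _weight(flips, r):
    """Probability of one pattern of independent flips, each occurring with rate r."""
    w = 1.0
    for f in flips:
        w *= r if f else 1.0 - r
    return w

def delta_by_enumeration(r):
    """P(the two readings of X_1 in rotated triads disagree)."""
    return sum(_weight(f, r) for f in product((0, 1), repeat=2) if f[0] != f[1])

def eps_by_enumeration(r):
    """P(the three results do not sum to 2) when true values (1, 1, 0) flip independently."""
    return sum(_weight(f, r) for f in product((0, 1), repeat=3)
               if sum(v ^ b for v, b in zip((1, 1, 0), f)) != 2)

def delta_closed(r):
    return 2 * r - 2 * r ** 2

def eps_closed(r):
    return 3 * r - 5 * r ** 2 + 3 * r ** 3

def threshold_rate(n_triads, m_rotations, tol=1e-12):
    """Rate r* with M*delta(r*) + N*eps(r*) = 1; below r* the inequality of Corollary 4 fails."""
    def excess(r):
        return m_rotations * delta_closed(r) + n_triads * eps_closed(r) - 1.0
    lo, hi = 0.0, 0.5          # excess(0) = -1 < 0 and excess(0.5) > 0 for N, M >= 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

For a KS set with $N$ triads and $M$ rotation connections, `threshold_rate(N, M)` returns the rate below which independent errors cannot explain the data non-contextually, and `rotation_error_lower_bound(eps, N, M)` is informative only when it is positive, i.e., for $\varepsilon < 1/N$.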
5 Conclusions
To conclude, for any hidden-variable model we have a bound on the changes arising in rotation:
$$\delta > \frac{1 - N\varepsilon}{M}.$$
Here, $N$ is the number of triads in $E_{KS}$ and $M$ is the number of connections within $E_{KS}$. A proof using few triads with few connections is not only easier to understand but also essential to yield a bound usable in real experiments. At a large error rate $\varepsilon$, probabilistically non-contextual models cannot be ruled out, since the changes of the results arising in rotation can be attributed to measurement errors. However, a small error rate $\varepsilon$ will force any hidden-variable description of the physical system to be probabilistically contextual. If the assumption of independent errors is used, an explicit bound can be determined for the error rate $r$:
$$M(2r - 2r^2) + N(3r - 5r^2 + 3r^3) > 1, \qquad (13)$$
which can be written in the form $r > f(N, M)$. Below the bound, we have a KS contradiction. Again, a small KS set is better than a large one, yielding a higher bound. For example, for the KS set used here,3 an $r$ below 0.71% yields a contradiction.
While writing this paper, the author learned from C. Simon that a similar approach was in preparation by him, C. Brukner, and A. Zeilinger.6 The author would like to thank A. Kent for discussions. This work was partially supported by the Quantum Information Theory Programme at the European Science Foundation.
1. A. M. Gleason, J. Math. Mech. 6, 885 (1957).
2. S. Kochen and E. P. Specker, J. Math. Mech. 17, 59 (1967).
3. A. Peres, Quantum Theory: Concepts and Methods, Ch. 7 (Kluwer, Dordrecht, 1993).
4. D. M. Greenberger, M. Horne, A. Shimony, and A. Zeilinger, Am. J. Phys. 58, 1131 (1990); N. D. Mermin, Phys. Rev. Lett. 65, 1838 (1990); J.-Å. Larsson, Phys. Rev. A 57, R3145 (1998); J.-Å. Larsson, Phys. Rev. A 59, 4801 (1999).
5. A. Peres, J. Phys. A 24, L175 (1991); J. Zimba and R. Penrose, Stud. Hist. Philos. Sci. 24, 697 (1993).
6. C. Simon, C. Brukner, and A. Zeilinger, quant-ph/0006043.
QUANTUM STOCHASTICS. THE NEW APPROACH TO THE DESCRIPTION OF QUANTUM MEASUREMENTS
ELENA LOUBENETS
Moscow State Institute of Electronics and Mathematics
Abstract
We propose a new general approach to the description of an arbitrary generalized direct quantum measurement with outcomes in a measurable space. This approach is based on the introduction of the physically important mathematical notion of a family of quantum stochastic evolution operators, describing in a Hilbert space the conditional evolution of a quantum system under a direct measurement. In the frame of the proposed approach, which we call quantum stochastic, all possible schemes of measurements upon a quantum system can be considered. The quantum stochastic approach (QSA) gives not only the complete statistical description of any quantum measurement (a POV measure and a family of posterior states) but also the complete stochastic description of the random behaviour of a quantum system in a Hilbert space, in the sense of specifying the probabilistic transition law governing the change from the initial state of a quantum system to a final one under a single measurement. When a quantum system is isolated, the family of quantum stochastic evolution operators consists of only one element, which is a unitary operator. In the case of measurements continuous in time, the QSA allows one to define, in the most general case, the notion of the family of posterior pure state trajectories (quantum trajectories) in the Hilbert space of a quantum system and to give their probabilistic treatment.
1 Introduction
The evolution of an isolated quantum system is quantum deterministic, since its behaviour in a complex separable Hilbert space $\mathcal{H}$ is described by a unitary operator $U(t): \mathcal{H} \to \mathcal{H}$ satisfying the Schrödinger equation, whose solutions are reversible in time. Under a measurement, the behaviour of a quantum system becomes irreversible in time and stochastic: not only is the outcome of a measurement random, being defined with some probability distribution, but the state of the quantum system becomes random as well.
Consider the general scheme of description of any quantum measurement with outcomes of the most general nature possible under a quantum measurement. Such a measurement is usually called generalized. Let $\Omega$ be a set of outcomes and $\mathcal{F}$ be a $\sigma$-algebra of subsets of $\Omega$. Let $\rho_0$ be the state of a quantum system at the instant before a measurement. The complete statistical description of any generalized quantum measurement implies that for any initial state $\rho_0$ of a quantum system we can present:
• the probability distribution of different outcomes of a measurement;
• the statistical description of the state change $\rho_0 \to \rho_{out}$ of the quantum system under a measurement.
We shall also speak of the complete stochastic description of the random behaviour of a quantum system under a measurement, in the sense of specifying the probabilistic transition law governing the change from the initial state of a quantum system to a final one under a single measurement.
We introduce some notation. Let $\mu(E,\rho_0) = \mathrm{Prob}\{\omega \in E; \rho_0\}$, $\forall E \in \mathcal{F}$, be the probability that under a measurement (upon a quantum system initially in the state $\rho_0$) the observed outcome $\omega$ belongs to a subset $E$. Let $\mathrm{Ex}\{Z|E\}$ be the conditional expectation of any von Neumann observable $Z \in \mathcal{L}(\mathcal{H})$, $Z = Z^{+}$, at the instant immediately after the measurement, provided the observed outcome $\omega \in E$. Here $\mathcal{L}(\mathcal{H})$ denotes the linear space of all bounded linear operators on $\mathcal{H}$. The statistical (density) operator $\rho_{out}(E,\rho_0)$ is called a posterior state of a quantum system conditioned by the observed outcome $\omega \in E$ if for any $Z$ the following relation is valid:
$$\mathrm{Ex}\{Z|E\} = \mathrm{tr}[\rho_{out}(E,\rho_0)\, Z]. \qquad (1)$$
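The unconditional (a priori) state $\rho_{out}(\Omega,\rho_0)$ of a quantum system defines the quantum mean value
$$\mathrm{tr}[\rho_{out}(\Omega,\rho_0)\, Z] = \mathrm{Ex}\{Z|\Omega\} = \langle Z\rangle_{\rho_{out}(\Omega,\rho_0)} \qquad (2)$$
of any von Neumann observable $Z$ at the instant immediately after the measurement, if the results of the measurement are ignored. Any conditional state change $\rho_{out}(E,\rho_0)$ of a quantum system under a measurement can be completely described by a family of statistical operators $\{\rho_{out}(\omega,\rho_0), \omega \in \Omega\}$, defined $\mu$-almost everywhere on $\Omega$ and called a family of posterior states. Specifically, for all $E \in \mathcal{F}$ with $\mu(E,\rho_0) \neq 0$,
$$\rho_{out}(E,\rho_0) = \frac{\int_{\omega\in E} \rho_{out}(\omega,\rho_0)\, \mu(d\omega,\rho_0)}{\mu(E,\rho_0)}, \qquad (3)$$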
and, consequently, due to (1), for any von Neumann observable $Z$ the conditional expectation can be presented as
$$\mathrm{Ex}\{Z|E\} = \frac{\int_{\omega\in E} \mathrm{tr}[\rho_{out}(\omega,\rho_0)\, Z]\, \mu(d\omega,\rho_0)}{\mu(E,\rho_0)}. \qquad (4)$$
Every posterior state $\rho_{out}(\omega,\rho_0)$ describes the state of a quantum system conditioned by the "sharp" outcome $\omega$. In general, however, when the outcomes of a measurement are not of discrete character or the observation is not "sharp", then, provided the outcome $\omega \in E$, we can only say that after the measurement the quantum system is in a state $\rho_{out}(\omega,\rho_0)$ with probability
$$\frac{\chi_E(\omega)\, \mu(d\omega,\rho_0)}{\mu(E,\rho_0)}, \qquad (5)$$
where $\chi_E(\omega)$ is the indicator function of a subset $E$. The a priori state $\rho_{out}(\Omega,\rho_0)$ and the quantum mean value of any von Neumann observable $Z$ at the instant immediately after the measurement are represented through the family of posterior states as
$$\rho_{out}(\Omega,\rho_0) = \int_\Omega \rho_{out}(\omega,\rho_0)\, \mu(d\omega,\rho_0), \qquad (6)$$
$$\langle Z\rangle_{\rho_{out}(\Omega,\rho_0)} = \int_\Omega \mathrm{tr}[\rho_{out}(\omega,\rho_0)\, Z]\, \mu(d\omega,\rho_0), \qquad (7)$$
respectively. The relation (6) can be considered as the usual statistical average over the posterior states $\rho_{out}(\omega,\rho_0)$, given with the probability distribution $\mu(d\omega,\rho_0)$. From (7) it also follows that in any possible measurement of an observable $Z$ performed immediately after the first measurement, the probability distribution $\mathrm{Prob}\{z \in \Delta; \rho_{out}(\Omega,\rho_0)\}$ of possible outcomes is given by
$$\mathrm{Prob}\{z \in \Delta; \rho_{out}(\Omega,\rho_0)\} = \int_\Omega \mathrm{Prob}\{z \in \Delta; \rho_{out}(\omega,\rho_0)\}\, \mu(d\omega,\rho_0). \qquad (8)$$
This formula can be considered as the quantum analogue of Bayes' formula in classical probability theory. In quantum theory there are two major approaches to the specification of the above-mentioned elements of the description of a quantum measurement.
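Formula (8) follows from (6) because the probability of a later outcome, $\mathrm{tr}[\rho\, P_Z(\Delta)]$, is linear in the state $\rho$. The finite-outcome sketch below uses hypothetical numbers that are not from the text: two outcomes with $\mu(\omega) = 0.3$ and $0.7$, qubit posterior states $|0\rangle\langle 0|$ and $|+\rangle\langle +|$, and a later projective test with $P_Z(\Delta) = |1\rangle\langle 1|$. It computes the a priori state from (3) with $E = \Omega$ and compares both sides of (8).

```python
# Hypothetical finite outcome space: mu(omega) and posterior states rho_out(omega, rho_0).
mu = {0: 0.3, 1: 0.7}
rho_out = {
    0: [[1.0, 0.0], [0.0, 0.0]],   # |0><0|
    1: [[0.5, 0.5], [0.5, 0.5]],   # |+><+|
}
P_delta = [[0.0, 0.0], [0.0, 1.0]]  # spectral projection P_Z(Delta) = |1><1| of the later test

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def trace(A):
    return A[0][0] + A[1][1]

def posterior(E):
    """Eq. (3) for finitely many outcomes: sum over E of rho_out(w) mu(w), divided by mu(E)."""
    mu_E = sum(mu[w] for w in E)
    return [[sum(rho_out[w][i][j] * mu[w] for w in E) / mu_E for j in range(2)]
            for i in range(2)]

def prob(rho):
    """Born rule Prob{z in Delta; rho} = tr[rho P_Z(Delta)]."""
    return trace(matmul(rho, P_delta))

a_priori = posterior([0, 1])                      # eq. (6): E = Omega
lhs = prob(a_priori)                              # left side of eq. (8)
rhs = sum(prob(rho_out[w]) * mu[w] for w in mu)   # right side of eq. (8)
```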
• The von Neumann approach [1] considers only direct measurements with outcomes in $\mathbb{R}$. According to this approach, only self-adjoint operators on $\mathcal{H}$ are allowed to represent the real-valued variables of a quantum system that can be measured (observables). The probability distribution $\mu(E,\rho_0)$ of any measurement is defined as
$$\mu(E,\rho_0) = \mathrm{tr}[\rho_0 P(E)] \qquad (9)$$
through the projection-valued measure $P(\cdot)$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ corresponding, by the spectral theorem, to the self-adjoint operator representing this observable. Under the von Neumann approach, the posterior state of a quantum system is defined only in the case of a discrete spectrum of the measured quantum variable and is given by the well-known "jump" of a quantum system under a measurement prescribed by the von Neumann reduction postulate. In the case of a continuous spectrum of a quantum observable, the description of a state change of a quantum system under a measurement is not formalized. The simultaneous measurement of $n$ quantum observables is allowed if and only if the corresponding self-adjoint operators and, consequently, their spectral projection-valued measures commute.
• The operational approach [2-8] gives the complete statistical description of any generalized quantum measurement. In the frame of the operational approach, the mathematical notion of a quantum instrument plays the central role. In the physical literature a quantum instrument is usually called a "superoperator". Specifically, a mapping $\mathcal{T}(\cdot)[\cdot]: \mathcal{F} \times \mathcal{L}(\mathcal{H}) \to \mathcal{L}(\mathcal{H})$ is called a quantum instrument if $\mathcal{T}(\cdot)$ is a measure on $(\Omega, \mathcal{F})$ with values $\mathcal{T}(E)$, $\forall E \in \mathcal{F}$, being linear bounded normal completely positive maps on $\mathcal{L}(\mathcal{H})$, such that the normalization relation $\mathcal{T}(\Omega)[I] = I$ is valid. Let $\mathcal{T}(\cdot)[\cdot]$ be an instrument of a generalized quantum measurement. Then the conditional expectation of any von Neumann observable $Z$ at the instant after a measurement is defined to be
$$\mathrm{Ex}\{Z|E\} = \frac{\mathrm{tr}[\rho_0\, \mathcal{T}(E)[Z]]}{\mu(E,\rho_0)}, \quad \forall E \in \mathcal{F}. \qquad (10)$$
In the case $Z = I$, it follows from (10) that in the frame of the operational approach the probability distribution $\mu(E,\rho_0)$ of outcomes under a measurement is given by
$$\mu(E,\rho_0) = \mathrm{tr}[\rho_0\, \mathcal{T}(E)[I]], \quad \forall E \in \mathcal{F}. \qquad (11)$$
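The positive operator-valued measure $M(E) = \mathcal{T}(E)[I]$, satisfying the condition $M(\Omega) = I$, is called a probability operator-valued measure, or a POV measure for short. From (1) and (10) it also follows that, for any initial state $\rho_0$, the posterior state $\rho_{out}(E,\rho_0)$ conditioned by the outcome $\omega \in E$ can be represented as
$$\rho_{out}(E,\rho_0) = \frac{\mathcal{T}^*(E)[\rho_0]}{\mu(E,\rho_0)}, \qquad (12)$$
where $\mathcal{T}^*(E)[\cdot]$ denotes the dual map, acting on the linear space $\mathcal{T}(\mathcal{H})$ of trace class operators on $\mathcal{H}$ and defined by
$$\mathrm{tr}[S\, \mathcal{T}(E)[Z]] = \mathrm{tr}[\mathcal{T}^*(E)[S]\, Z], \quad \forall Z \in \mathcal{L}(\mathcal{H}),\ \forall S \in \mathcal{T}(\mathcal{H}). \qquad (13)$$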
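Relations (10)-(13) can be checked on a small example. The sketch below assumes a hypothetical two-outcome unsharp qubit measurement whose instrument has the Kraus form $\mathcal{T}(E)[Z] = \sum_{\omega\in E} K_\omega^+ Z K_\omega$ with $K_0 = \mathrm{diag}(\sqrt{p}, \sqrt{1-p})$ and $K_1 = \mathrm{diag}(\sqrt{1-p}, \sqrt{p})$; this choice, the value $p = 0.8$, and all function names are ours and not from the text. It verifies the normalization $\mathcal{T}(\Omega)[I] = I$, computes $\mu(E,\rho_0)$ from (11) and $\rho_{out}(E,\rho_0)$ from (12), and confirms that (10) and (1) give the same conditional expectation.

```python
from math import sqrt

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def dagger(A):
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A))]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * a for a in row] for row in A]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

I2 = [[1.0, 0.0], [0.0, 1.0]]
ZERO2 = [[0.0, 0.0], [0.0, 0.0]]

p = 0.8  # hypothetical sharpness parameter
K = {0: [[sqrt(p), 0.0], [0.0, sqrt(1 - p)]],
     1: [[sqrt(1 - p), 0.0], [0.0, sqrt(p)]]}

def T(E, Z):
    """Instrument (Heisenberg picture): T(E)[Z] = sum over w in E of K_w^+ Z K_w."""
    acc = ZERO2
    for w in E:
        acc = add(acc, matmul(dagger(K[w]), matmul(Z, K[w])))
    return acc

def T_dual(E, S):
    """Dual map of eq. (13): T*(E)[S] = sum over w in E of K_w S K_w^+."""
    acc = ZERO2
    for w in E:
        acc = add(acc, matmul(K[w], matmul(S, dagger(K[w]))))
    return acc

rho0 = [[0.5, 0.5], [0.5, 0.5]]   # |+><+|
Z = [[0.0, 1.0], [1.0, 0.0]]      # sigma_x

E = [0]
mu_E = trace(matmul(rho0, T(E, I2)))             # eq. (11)
rho_out_E = scale(1.0 / mu_E, T_dual(E, rho0))   # eq. (12)
ex_10 = trace(matmul(rho0, T(E, Z))) / mu_E      # eq. (10)
ex_1 = trace(matmul(rho_out_E, Z))               # eq. (1)
```

The duality (13) holds here by cyclicity of the trace, which is why the two routes to the conditional expectation agree.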
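For any initial state $\rho_0$ of a quantum system, the family of posterior states $\{\rho_{out}(\omega,\rho_0), \omega \in \Omega\}$ always exists and is defined uniquely, $\mu$-almost everywhere, by the relation
$$\int_{\omega\in E} \mathrm{tr}[\rho_{out}(\omega,\rho_0)\, Z]\, \mu(d\omega,\rho_0) = \mathrm{tr}[\rho_0\, \mathcal{T}(E)[Z]], \quad \forall Z \in \mathcal{L}(\mathcal{H}),\ \forall E \in \mathcal{F}. \qquad (14)$$
Due to (13) and (14), we have
$$\mathcal{T}^*(E)[\rho_0] = \int_{\omega\in E} \rho_{out}(\omega,\rho_0)\, \mu(d\omega,\rho_0), \qquad (15)$$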
and, consequently, the posterior state $\rho_{out}(\omega,\rho_0)$ is a density of the measure $\mathcal{T}^*(\cdot)[\rho_0]$ with respect to the probability scalar measure $\mu(\cdot,\rho_0)$. The operational approach is very important for the formalization of the complete statistical description of an arbitrary generalized quantum measurement. However, the operational approach does not specify the description of a generalized direct quantum measurement, that is, the situation where we have to describe a direct interaction between a measuring device and an observed quantum system, resulting in some observed outcome $\omega$ in a classical world and a change of the quantum system's state conditioned by this outcome. We would like to emphasize that, in principle, the description of a direct measurement cannot simply be reduced to the quantum theoretical description of a measuring process. We can specify definitely neither the interaction, nor the quantum state of the measuring device's environment, nor describe a measuring device only in quantum theoretical terms. In fact, under such a scheme the description of a direct quantum measurement is simply postponed to the description of a direct measurement of some observable of the environment of the measuring device.
Nor does the operational approach, in general, make it possible to include the complete stochastic description of the random behaviour of a quantum system under a measurement. We recall that for the case of discrete outcomes, the von Neumann approach gives both the complete statistical description of a direct quantum measurement and the complete stochastic description in a Hilbert space of the random behaviour of a quantum system under a single measurement. In particular, if the initial state $\rho_0$ of a quantum system is pure, that is, $\rho_0 = |\psi_0\rangle\langle\psi_0|$, and if under a single measurement the outcome $\lambda_j$ is observed, then in the frame of the von Neumann approach the quantum system "jumps" with certainty to the posterior pure state
$$\psi_0 \mapsto \frac{P_j \psi_0}{\|P_j \psi_0\|}, \qquad (16)$$
where $P_j$ is the projection corresponding to the observed eigenvalue $\lambda_j$. The probability $\mu_j$ of the outcome $\lambda_j$ is given by
$$\mu_j = \|P_j \psi_0\|^2. \qquad (17)$$
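The von Neumann rules (16) and (17) are the special case of (11) and (12) in which the dual instrument acts as $\mathcal{T}^*(\{\lambda_j\})[\rho_0] = P_j \rho_0 P_j$. The sketch below checks this on a hypothetical qutrit observable $\mathrm{diag}(1, 1, -1)$, whose eigenvalue 1 is degenerate, and a hypothetical initial vector $\psi_0 = (1, i, 1)/\sqrt{3}$; neither is from the text, and the function names are ours.

```python
from math import sqrt

def apply(P, psi):
    return [sum(P[i][k] * psi[k] for k in range(len(psi))) for i in range(len(psi))]

def norm(psi):
    return sqrt(sum(abs(c) ** 2 for c in psi))

def outer(u, v):
    """Matrix |u><v|."""
    return [[a * b.conjugate() for b in v] for a in u]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

# Spectral projections of the hypothetical observable diag(1, 1, -1).
projections = {+1: [[1, 0, 0], [0, 1, 0], [0, 0, 0]],
               -1: [[0, 0, 0], [0, 0, 0], [0, 0, 1]]}
psi0 = [c / sqrt(3) for c in (1, 1j, 1)]
rho0 = outer(psi0, psi0)

def von_neumann(psi, P):
    """Probability (17) and normalized posterior vector (16) for the projection P."""
    v = apply(P, psi)
    mu_j = norm(v) ** 2
    return mu_j, [c / sqrt(mu_j) for c in v]

results = {lam: von_neumann(psi0, P) for lam, P in projections.items()}
```

Comparing $|\psi_{post}\rangle\langle\psi_{post}|$ with $P_j\rho_0P_j/\mu_j$ for each eigenvalue reproduces (12) for this projective instrument.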
We would also like to underline that the description of the stochastic, time-irreversible behaviour of a quantum system under a direct measurement is very important, in particular in the case of direct measurements continuous in time, where the evolution of a continuously observed quantum system cannot be described by time-reversible solutions of the Schrödinger equation.
In quantum theory any physically based problem must be formulated in unitarily equivalent terms, and the results of its consideration must depend neither on the choice of a particular representation picture (Schrödinger, Heisenberg or interaction) nor on the choice of a basis in the Hilbert space. That is why in [9] we introduce the notion of a class of unitarily equivalent measuring processes and analyse the invariants of this class. We show [9] that the description of any generalized direct quantum measurement with outcomes in a standard Borel space $(\Omega, \mathcal{F}_B)$ can be considered in the frame of a new general approach, which we call quantum stochastic, based on the notion of a family of quantum stochastic evolution operators satisfying the orthonormality relation. In the case when a quantum system is isolated, the family of quantum stochastic evolution operators consists of only one element, which is a unitary operator. The quantum stochastic approach (QSA), which we present in the next section, can be considered as the quantum stochastic generalization of the description of von Neumann measurements to the case of any measurable space of outcomes, an input probability scalar measure of any type on the space of outcomes, and any type of quantum state reduction. Due to the orthonormality relation, the QSA allows one to interpret the posterior pure states defined by quantum stochastic evolution operators as posterior pure state outcomes in a Hilbert space corresponding to different random measurement channels. Even for the special case of discrete outcomes, the QSA differs, due to the orthogonality relation for posterior pure state outcomes, from the somewhat similar-looking approaches considered in the physical literature [10,11], where the so-called "measurement" or Kraus operators are used for the description of both the statistics of a measurement (a POV measure) and the conditional state change of a quantum system. The QSA gives not only the complete statistical description of any generalized direct quantum measurement but also the complete stochastic description of the random behaviour of the quantum system under a measurement.
2 Quantum stochastic approach
In this section we introduce the quantum stochastic approach (QSA) to the description of a generalized direct quantum measurement, developed in [9]. Specifically, it was shown in [9] that for any generalized direct quantum measurement with outcomes in a standard Borel space $(\Omega, \mathcal{F}_B)$ upon a quantum system that is in a state $\rho_0$ at the instant before the measurement, there exist:
• the unique family of complex scalar measures, absolutely continuous with respect to a finite positive scalar measure $\nu(\cdot)$ and satisfying the orthonormality relation:
$$\Lambda = \Big\{\pi_{ji}(\omega)\nu(d\omega) : \omega \in \Omega;\ i,j = 1,\ldots,N_0;\ N_0 < \infty;\ \int_\Omega \pi_{ji}(\omega)\nu(d\omega) = \delta_{ji}\Big\}; \qquad (18)$$
• the unique (up to phase equivalence) family of $\nu$-measurable operator-valued functions $V_i(\cdot)$ on $\Omega$ satisfying the orthonormality relation, with values being linear operators on $\mathcal{H}$ defined for any $\psi \in \mathcal{H}$ $\nu$-almost everywhere on $\Omega$:
$$\mathcal{V} = \Big\{V_i(\omega) : \omega \in \Omega;\ i = 1,\ldots,N_0;\ \int_\Omega V_j^+(\omega) V_i(\omega)\, \pi_{ji}(\omega)\nu(d\omega) = \delta_{ji} I\Big\}, \qquad (19)$$
and such that for any index $i = 1,\ldots,N_0$ and for all $E \in \mathcal{F}_B$
$$\int_{\omega\in E} V_i(\omega)\, \pi_{ii}(\omega)\nu(d\omega) \qquad (20)$$
is a bounded operator on $\mathcal{H}$. The relation
$$(W_i\psi)(\omega) = V_i(\omega)\psi, \quad \forall \psi \in \mathcal{H}, \qquad (21)$$
holding $\nu_i$-almost everywhere on $\Omega$, defines the bounded linear operator $W_i: \mathcal{H} \to L^2(\Omega, \nu_i; \mathcal{H})$ with norm $\|W_i\| = 1$. Here $\nu_i(d\omega) = \pi_{ii}(\omega)\nu(d\omega)$;
• the unique sequence of positive numbers $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_{N_0})$ satisfying the relation
$$\sum_{i=1}^{N_0} \alpha_i = 1, \qquad (22)$$
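such that the complete statistical description (a POV measure and a family of posterior states) of a measurement and the complete stochastic description of the random behaviour of a quantum system under a single measurement (a family of posterior pure state outcomes and their probability distribution) are given by:
• The POV measure
$$M(E) = \sum_{i=1}^{N_0} \alpha_i M_i(E), \quad \forall E \in \mathcal{F}_B, \qquad (23)$$
with
$$M_i(E) = \int_{\omega\in E} V_i^+(\omega) V_i(\omega)\, \nu_i(d\omega); \qquad (24)$$
• The family of posterior states
$$\rho_{out}(\omega,\rho_0) = \sum_{i=1}^{N_0} \beta_i(\omega)\, \tau^{(i)}_{out}(\omega,\rho_0), \qquad (25)$$
with
$$\tau^{(i)}_{out}(\omega,\rho_0) = V_i(\omega)\rho_0 V_i^+(\omega) \qquad (26)$$
and
$$\beta_i(\omega) = \frac{\alpha_i \pi_{ii}(\omega)}{\sum_j \alpha_j \pi_{jj}(\omega)\, \mathrm{tr}[\tau^{(j)}_{out}(\omega,\rho_0)]}. \qquad (27)$$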
• The probability scalar measure of the measurement, given by the expression
$$\mu(d\omega,\rho_0) = \sum_i \alpha_i \mu^{(i)}(d\omega,\rho_0) \qquad (28)$$
through the probability scalar measures
$$\mu^{(i)}(d\omega,\rho_0) = \mathrm{tr}[\tau^{(i)}_{out}(\omega,\rho_0)]\, \nu_i(d\omega). \qquad (29)$$
• The family of random operators (19), describing the stochastic behaviour of the quantum system under a single measurement. Every operator $V_i(\omega)$ defines in the Hilbert space $\mathcal{H}$ a posterior pure state outcome, conditioned by the observed result $\omega$ and corresponding to the $i$-th random channel of a measurement. For any $\psi_0 \in \mathcal{H}$ the following orthonormality relation for the family $\{V_i(\omega)\psi_0, \omega \in \Omega; i = 1,\ldots,N_0\}$ of unnormalized posterior pure state outcomes is valid:
$$\int_\Omega (V_j(\omega)\psi_0, V_i(\omega)\psi_0)\, \pi_{ji}(\omega)\nu(d\omega) = \delta_{ji}\|\psi_0\|^2. \qquad (30)$$
For the definite observed outcome $\omega$, the probability of the posterior pure state outcome $V_i(\omega)\psi_0$ in the Hilbert space $\mathcal{H}$ is given by
$$\frac{\alpha_i \pi_{ii}(\omega)\|V_i(\omega)\psi_0\|^2}{\sum_j \alpha_j \pi_{jj}(\omega)\|V_j(\omega)\psi_0\|^2}. \qquad (31)$$
We call $V_i(\omega)$ quantum stochastic evolution operators, and we call the probability scalar measures $\nu_i(\cdot)$ and $\sum_i \alpha_i \nu_i(\cdot)$, and $\mu^{(i)}(\cdot,\rho_0)$ and $\mu(\cdot,\rho_0) = \sum_i \alpha_i \mu^{(i)}(\cdot,\rho_0)$, input and output probability measures, respectively. Due to the decompositions (23), (25), and (28), $M_i(E)$, $\tau^{(i)}_{out}(\omega,\rho_0)$, $\nu_i(\cdot)$ and $\mu^{(i)}(\cdot,\rho_0)$ are interpreted as the POV measure, the unnormalized posterior state, and the input and output probability distributions of outcomes in the $i$-th random channel of the measurement, respectively. The statistical weights of the different random channels are given by the numbers $\alpha_i$, $i = 1,\ldots,N_0$.
The a priori state
$$\rho_{out}(\Omega; \rho_0) = \sum_{i=1}^{N_0} \alpha_i \int_\Omega T^{(i)}_{out}(\omega, \rho_0)\, \nu_i(d\omega) \qquad (32)$$
is the usual statistical average over the unnormalized posterior states $T^{(i)}_{out}(\omega, \rho_0)$ with respect to the input probability distribution of outcomes $\nu_i(\cdot)$ in every channel and with respect to the different random channels of the measurement.
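To make the objects (23)-(32) concrete, here is a small numerical sketch of our own, not taken from the text, with a finite outcome set so that every integral becomes a sum. All choices in it are hypothetical: a qubit, $N_0 = 2$ channels with weights $\alpha = (0.3, 0.7)$, a projective first channel with counting measure $\nu_1$, and an unsharp second channel with $\nu_2(\{\omega\}) = 1/2$. The weights are taken as $\beta_i(\omega) = \alpha_i \nu_i(\{\omega\}) / \mu(\{\omega\}, \rho_0)$, which makes every posterior state normalized. The checks confirm that the POV measure sums to the identity and that averaging the posterior states with $\mu$ reproduces the a priori state (32).

```python
import numpy as np

# Hypothetical example: a qubit, N0 = 2 random channels, outcome set {0, 1}.
# nu[i][w] stands for nu_i({w}) and V[i][w] for V_i(w); integrals become sums.
alpha = [0.3, 0.7]                      # statistical weights of the channels
nu = [[1.0, 1.0], [0.5, 0.5]]           # input measures nu_i
E0 = np.diag([0.8, 0.2])                # an unsharp effect; E1 = I - E0
E1 = np.eye(2) - E0
V = [
    [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])],               # channel 1: projective
    [np.sqrt(2.0) * np.sqrt(E0), np.sqrt(2.0) * np.sqrt(E1)],  # channel 2: diagonal, so elementwise sqrt
]
OUTCOMES = range(2)
CHANNELS = range(len(alpha))


def M(w):
    """POV measure (23)-(24) evaluated on the single outcome {w}."""
    return sum(alpha[i] * nu[i][w] * V[i][w].conj().T @ V[i][w] for i in CHANNELS)


def T(i, w, rho0):
    """Unnormalized posterior state (26) in channel i."""
    return V[i][w] @ rho0 @ V[i][w].conj().T


def mu(w, rho0):
    """Output probability (28)-(29) of the outcome w."""
    return sum(alpha[i] * np.trace(T(i, w, rho0)).real * nu[i][w] for i in CHANNELS)


def rho_out(w, rho0):
    """Posterior state (25) with beta_i(w) = alpha_i nu_i({w}) / mu({w})."""
    return sum(alpha[i] * nu[i][w] / mu(w, rho0) * T(i, w, rho0) for i in CHANNELS)


def rho_apriori(rho0):
    """A priori state (32): average of T_i over outcomes and channels."""
    return sum(alpha[i] * nu[i][w] * T(i, w, rho0) for i in CHANNELS for w in OUTCOMES)


rho0 = np.array([[0.5, 0.5], [0.5, 0.5]])   # the pure state |+><+|
```

The division by `mu(w, rho0)` assumes the outcome has nonzero probability; a fuller implementation would skip null outcomes.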
Physically, the introduced notion of different random channels of a measurement corresponds, under the same observed outcome, to different random quantum transitions of the environment of the measuring device, which we cannot, however, specify with certainty. The triple $\gamma = \{A, V, \alpha\}$ is called a quantum stochastic representation of a generalized direct measurement. We call direct measurements presented by different quantum stochastic representations stochastic representation equivalent if the statistical and stochastic descriptions of these direct measurements are identical. In the framework of the QSA, von Neumann (projective) measurements form the stochastic representation equivalence class of direct measurements on $(E, B(M))$ for which the complete statistical and the complete stochastic description is given by the von Neumann measurement postulates [1], presented by the formulae (16), (17).
3
Concluding remarks
We present a new general approach to the description of a generalized direct quantum measurement. The proposed approach allows one:
• to give the complete statistical description (a POV measure and a family of posterior states) of any quantum measurement;
• to give the complete description in a Hilbert space of the stochastic behaviour of a quantum system under a measurement (in the sense of specifying the probabilistic transition law governing the change from the initial state of a quantum system to a final one under a single measurement);
• to formalize the consideration of all possible cases of quantum measurements, including measurements continuous in time;
• to give the semiclassical interpretation of the description of a generalized direct quantum measurement.
4
Acknowledgments
This investigation was supported by the grant of the Swedish Royal Academy of Sciences for collaboration with states of the former Soviet Union and by the Profile Mathematical Modeling of Vaxjo University. I would like to thank A. Khrennikov for the warm hospitality and fruitful discussions.
References
1. J. von Neumann, Mathematical Foundations of Quantum Mechanics (Princeton University Press, Princeton, NJ, 1955).
2. E. B. Davies, J. T. Lewis, An operational approach to quantum probability. Commun. Math. Phys. 17, 239-260 (1970).
3. E. B. Davies, Quantum Theory of Open Systems (Academic Press, London, 1976).
4. A. S. Holevo, Probabilistic and Statistical Aspects of Quantum Theory (Nauka, Moscow, 1980; North-Holland, Amsterdam, 1982, English translation).
5. K. Kraus, States, Effects and Operations: Fundamental Notions of Quantum Theory (Springer-Verlag, Berlin, 1983).
6. M. Ozawa, Quantum measuring processes of continuous observables. J. Math. Phys. 25, 79-87 (1984).
7. M. Ozawa, Conditional probability and a posteriori states in quantum mechanics. Publ. RIMS, Kyoto Univ. 21, 279-295 (1985).
8. A. Barchielli, V. P. Belavkin, Measurements continuous in time and a posteriori states in quantum mechanics. J. Phys. A: Math. Gen. 24, 1495-1514 (1991).
9. E. R. Loubenets, Quantum stochastic approach to the description of quantum measurements. Research Report N 39, MaPhySto, University of Aarhus, Denmark (2000).
10. A. Peres, Classical intervention in quantum systems. I. The measuring process. Phys. Rev. A 61, 022116 (1-9) (2000).
11. H. Wiseman, Adaptive quantum measurements. Proceedings of the Workshop on Stochastics and Quantum Physics. Miscellanea N 16, 89-93, MaPhySto, University of Aarhus, Denmark (1999).
ABSTRACT MODELS OF PROBABILITY
V. M. MAXIMOV
Institute of Computer Science, Bialystok University, ul. Sosnowa 64, PL15887 Bialystok, POLAND
Probability theory presents a mathematical formalization of the intuitive ideas of independent events and of probability as a measure of randomness. It is based on the axioms 1-5 of A. N. Kolmogorov 1 and their generalizations 2. Various formal refinements have been proposed for such notions as events, independence and random variables 2,3, whereas the measure of randomness, i.e. numbers from [0,1], has remained unchanged. To be precise, we should mention some attempts to generalize probability theory to negative probabilities 4. On the other side, physicists have tried to use negative and even complex values of probability to explain some paradoxes of quantum mechanics 5,6,7. Only recently, the need to formalize quantum mechanics and its foundations 8 led to the construction of p-adic probabilities 9,10,11, which essentially extended our concept of probability and randomness. Therefore a natural question arises: how can one describe algebraic structures whose elements can be used as a measure of randomness? As a consequence, one has to define the type of randomness corresponding to every such algebraic structure. Possibly, this leads to a concept of randomness of a nature different from the combinatorial-metric conception of Kolmogorov. Apparently, paradoxes arise when the real type of randomness of some experimental data does not match the model of randomness used for data processing 12. An algebraic structure whose elements can be used to estimate randomness will be called a probability set $\Phi$. Naturally, the elements of $\Phi$ are the probabilities.
1
What probability sets $\Phi$ are possible?
For practical conclusions of probability theory, two kinds of events, the so-called certain and uncertain events, are of importance. Therefore the probability set $\Phi$ must have two types of elements, corresponding to certainty and uncertainty. Their main role is that they couple all elements of $\Phi$. We interpret them as the possibility of determining any probability $p \in \Phi$ of a random event by an infinite sequence of independent random variables defined by the probability set $\Phi$. In this connection we do not require a formal physical interpretation of certainty. For an abstract probability set $\Phi$ we would like to preserve all fundamental properties of probability on [0,1] that correspond to the intuitive idea of the probability of an event. An analogous situation occurs in logic: a construction which preserves the main properties of Boolean algebra and possesses some new properties led to the appearance of the Lukasiewicz-Tarski logical system 13,14.
Definition 1 A set $\Phi$ is called a probability set if it has the following properties:
(i) In $\Phi$ a binary operation "$\cdot$" is defined, playing the role of multiplication of probabilities; it need not be commutative. With respect to this operation $\Phi$ is a semigroup. In addition, $\Phi$ consists of three non-intersecting semigroups $O$, $\mathcal{E}$ and $P$ such that $\Phi = O \cup P \cup \mathcal{E}$. The elements of $O$ play the role of zeros, i.e. $O$ is the semigroup of zeros; the elements of $\mathcal{E}$ play the role of units, i.e. $\mathcal{E}$ is the semigroup of units; $P$ is the semigroup of probabilities. Moreover, $\theta \cdot p,\ p \cdot \theta \in O$ for all $p \in P$, $\theta \in O$, and $e \cdot p,\ p \cdot e \in P$ for all $p \in P$, $e \in \mathcal{E}$. Zero elements correspond to uncertain events, and unit elements correspond to certain events.
(ii) For some elements of $\Phi$ a commutative and associative operation "$+$" of addition is defined, and multiplication is distributive over addition. This means that if for $p, q, r \in \Phi$ the sums $p + q$ and $(p + q) + r$ are defined, then $q + r$ and $p + (q + r)$ are also defined and $(p + q) + r = p + (q + r)$. In addition, whenever $p + q$ is defined, the sums $u \cdot p + v \cdot q$ and $p \cdot u + q \cdot v$ are defined for all $u, v$, and $r \cdot (p + q) = r \cdot p + r \cdot q$, $(p + q) \cdot r = p \cdot r + q \cdot r$.
(iii) For every $p \in P$ there exist a complementary element $\bar p \in P$ and $e \in \mathcal{E}$ such that $p + \bar p = e$.
(iv) The operation "$+$" is defined for all elements of $O$ and is not defined for elements of $\mathcal{E}$. Moreover, for all $p \notin \mathcal{E}$ and $\theta \in O$ the sum $p + \theta$ is defined and $p + \theta \notin \mathcal{E}$. For $e \in \mathcal{E}$ we have $\theta + e \in \mathcal{E}$, but $p + e$ is not defined for $p \in P$.
(v) $\Phi$ carries a topology with respect to which the operations "$\cdot$" and "$+$" are continuous. For every neighbourhood $V(O)$ of the zeros there exists $p \in \Phi$ such that $p^n \in V(O)$ for $n > n_0(V, p)$.
(vi) If $p, q \in \Phi$ and $p + q \in O$, then $p, q \in O$ (the indecomposability of zero). This property is not necessary; for example, it may fail for complex and p-adic probabilities.
(vii) The equation $p^2 = p$ always has solutions in $O$ and in $\mathcal{E}$. If it has solutions only in $O$ and in $\mathcal{E}$, we say that the Kolmogorov condition holds for the probability set $\Phi$.
Properties (i)-(v) provide the main identity of the calculus of independent probabilities: if $p_1 + \cdots + p_n = e \in \mathcal{E}$, $p_i \in P$, then
$$(p_1 + \cdots + p_n)^k = \sum_{i_1, \dots, i_k} p_{i_1} \cdots p_{i_k} = e^k \in \mathcal{E}.$$
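The expansion behind this identity uses only associativity and distributivity, not commutativity, so the order of the factors in each term $p_{i_1} \cdots p_{i_k}$ must be kept. The sketch below, our own illustration, checks it in two hypothetical settings: ordinary probabilities from [0,1] that add up to the unit 1, and random $2 \times 2$ real matrices used only as a stand-in for a non-commutative multiplication (they are not claimed to form a probability set).

```python
import itertools
import operator
from fractions import Fraction
from functools import reduce

import numpy as np


def expand_power(ps, k, mul):
    """Sum of the products p_{i1} * ... * p_{ik} over all ordered k-tuples
    (i1, ..., ik). The factor order is kept, so `mul` need not commute."""
    total = None
    for idx in itertools.product(range(len(ps)), repeat=k):
        term = reduce(mul, (ps[i] for i in idx))
        total = term if total is None else total + term
    return total


# Ordinary probabilities in [0, 1] adding up to the unit 1.
scalars = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]

# Non-commuting stand-ins: random 2x2 matrices (illustration only).
rng = np.random.default_rng(0)
mats = [rng.random((2, 2)) for _ in range(3)]
k = 3
lhs = np.linalg.matrix_power(sum(mats), k)   # (p_1 + p_2 + p_3)^k
rhs = expand_power(mats, k, np.matmul)       # sum over ordered k-tuples
```

For the scalars the sum of all $3^3$ ordered products equals $1^3 = 1$ exactly, which is the identity with $e = 1$.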
Unfortunately, the operations of direct sum and tensor product applied to [0,1] do not produce a new probability set different from [0,1]. For example, in the case of the direct sum $[0,1] \oplus [0,1]$ with coordinate-wise multiplication we have pairs $(p, q)$, $p, q \in [0,1]$, as probabilities. Consequently, $(p_1,$ … can be examined easily. Note that the first coordinate $x$ runs over the segment [0,1]. Since $\mathbb{R}^2$ with coordinate-wise addition and multiplication is the simplest non-trivial topological semi-field 15, we can consider $\Phi$ as an example of a probability set contained in a topological semi-field. In 16 the foundations of classical probability theory are presented in terms of semi-fields. Thus the construction of probability sets in abstract topological semi-fields can be of interest for applications. In section 3 we consider multidimensional examples of probability sets, which can even be noncommutative. These examples go beyond the framework of topological semi-fields. The zero-indecomposability property may or may not be included among the properties of $\Phi$, depending on the problem. For example, if we consider the whole field of p-adic numbers as a probability set, then the indecomposability property does not hold. Nevertheless, this does not prevent the existence of an analogue of the Bernoulli theorem for p-adic probabilities 10. However, we can find sets satisfying all axioms within the field of p-adic numbers. For this purpose, we take a p-adic number $q$, $|q|_p < 1$, that is not a root of any algebraic equation with integer coefficients. Then the set of p-adic
numbers of the form $n_k q^k + n_{k+1} q^{k+1} + \cdots + n_r q^r$, where $n_k \in \mathbb{N}$ and the remaining $n_j$ belong to $\mathbb{Z}$, $k, r = 1, 2, 3, \dots$, together with the numbers of the form $1 - m_s q^s + m_{s+1} q^{s+1} + \cdots + m_t q^t$, where $m_s \in \mathbb{N}$ and the remaining $m_j$ belong to $\mathbb{Z}$, $s, t = 1, 2, 3, \dots$, and with 0 and 1, forms a probability set with respect to the addition and multiplication of p-adic numbers. The semigroups $O$ and $\mathcal{E}$ consist of 0 and 1, respectively. Essentially different examples of probability sets will be considered in sections 3 and 4.
Fig. 1
2
Uniqueness of semigroups of zeros and units
(i) Proposition 1 In a probability set $\Phi$ with operations "$\cdot$" and "$+$", the semigroups $O$ and $\mathcal{E}$ satisfying properties (i)-(iv) are unique.
Proof. It is important to note that the semigroups $O$ and $\mathcal{E}$ possess the maximality property, i.e. they cannot be extended to semigroups $O' \supsetneq O$ or $\mathcal{E}' \supsetneq \mathcal{E}$ preserving properties (i)-(iv). Indeed, if there is an extension $O'$, then there is an element $p \notin O$ such that $p \in O'$. But this contradicts conditions (iii)-(iv): on the one hand, the operation $p + e$, $e \in \mathcal{E}$, is not defined for $p \notin O$; on the other hand, $p + e$ is defined for all $e \in \mathcal{E}$, since $p \in O'$.
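Now let $O' = O$ and $\mathcal{E} \subsetneq \mathcal{E}'$. Then there exists an element $p \in \mathcal{E}'$ with $p \notin \mathcal{E}$; since $p \notin O' = O$, we have $p \in P$. By (iii) there exists $\bar p \in P$, so $\bar p \notin O$, such that $p + \bar p \in \mathcal{E} \subset \mathcal{E}'$. On the other hand, by (iv) applied to $\mathcal{E}'$, the sum $p + q$ is not defined for $q \notin O' = O$ and $p \in \mathcal{E}'$. Hence the semigroups $O$ and $\mathcal{E}$ satisfying (i)-(iv) are maximal. For the same reason, $\Phi$ contains no other pair of semigroups $O_1$ and $\mathcal{E}_1$ different from $O$ and $\mathcal{E}$. Indeed, assume such semigroups exist. Let $O_1 \neq O$, $O_1$ …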
$O_2$, $P_2$, $\mathcal{E}_2$ are semigroups, the semigroup properties of $\varphi$ imply that the sets $O_1'$, $P_1'$, $\mathcal{E}_1'$ are semigroups in $\Phi_1$. Further, using properties (iia) and (iib) one can easily verify that the sets $O_1'$ and $\mathcal{E}_1'$ satisfy conditions (i)-(iv) of Definition 1 and thus are semigroups of zeros and units. In view of Proposition 1, we have $O_1' = O_1$ and $\mathcal{E}_1' = \mathcal{E}_1$. It follows that $P_1' = P_1$. Then, if $p \in P_1$, there exists an element $\bar p \in P_1$ such that $p + \bar p \in \mathcal{E}_1$. Therefore $\varphi(p + \bar p) = \varphi(p) + \varphi(\bar p) \in \mathcal{E}_2$, and we can set $\varphi(p) =$ …
… $p \in K_p$. The following two lemmas are similar to those for conjugate classes in rings, but the proofs are different.
Lemma 1 If $z \in K_p$, then $K_z = K_p$.
Proof. If $z \in K_p$, then by Definition 3 we have $z + \theta_1 = p + \theta_2$ for some $\theta_1, \theta_2 \in O$. Let $x$ be an arbitrary element of $K_z$. Then by Definition 3, $x + \theta_3 = z + \theta_4$ for some $\theta_3, \theta_4 \in O$. Adding $\theta_1$ to this equality and using the properties of addition in $\Phi$ and the relation $z + \theta_1 = p + \theta_2$, we obtain
$$(x + \theta_3) + \theta_1 = x + (\theta_3 + \theta_1) = (z + \theta_4) + \theta_1 = (z + \theta_1) + \theta_4 = (p + \theta_2) + \theta_4 = p + (\theta_2 + \theta_4).$$
Since $\theta_3 + \theta_1$ and $\theta_2 + \theta_4$ belong to $O$, Definition 3 gives $x \in K_p$, i.e. $K_z \subset K_p$. Also, the relation $p + \theta_2 = z + \theta_1$ shows that $p \in K_z$, so the same argument with $p$ and $z$ interchanged gives $K_p \subset K_z$. Hence $K_z = K_p$.
Lemma 2 The classes $K_p$ and $K_q$ either coincide or do not intersect.
Proof. Let $K_p \cap K_q \neq \emptyset$. If $z \in K_p \cap K_q$, then by Lemma 1 we have $K_z = K_p$ and $K_z = K_q$, i.e. $K_p = K_q$.
Proposition 3 In the set $\Phi/O$ one can introduce operations of multiplication and addition, naturally induced by the operations in $\Phi$, that turn $\Phi/O$ into a probability set (we denote it by $\Phi_O$). Moreover, the semigroup of zeros of the probability set $\Phi_O$ consists of a single element $K_\theta = O$, $\forall \theta \in O$, which possesses the properties of a usual zero.
Proof. Define the set $K_p + K_q$ by term-by-term addition of elements. This definition is correct if $p + q$ is defined. Indeed, consider $x \in K_p$, $y \in K_q$. Then by Definition 3, $x + \theta_1 = p + \theta_2$ and $y + \theta_3 = q + \theta_4$ for some $\theta_i \in O$. Since $p + q$ is defined, properties (ii) and (iv) imply
$$(p + \theta_2) + (q + \theta_4) = (p + q) + (\theta_2 + \theta_4) = (x + y) + (\theta_1 + \theta_3).$$
Consequently $x + y \in K_{p+q}$, and it follows that $K_p + K_q \subset K_{p+q}$. Similarly, we can define the set $K_p \cdot K_q$ by term-by-term multiplication. If $x \in K_p$, $y \in K_q$, we have $x + \theta_1 = p + \theta_2$ and $y + \theta_3 = q + \theta_4$ … $\subset K_{pq}$.
These inclusions, Lemma 2 and properties (iii), (iv) allow us to introduce correctly the operations of multiplication and addition on the classes $\Phi/O$ by
$$K_p \odot K_q = K_{pq}, \qquad K_p \boxplus K_q = K_{p+q}. \qquad (1)$$
These operations turn the set $\Phi/O$ into a probability set $\Phi_O$. The zero semigroup of $\Phi_O$ consists of the single class $O = K_\theta$, $\theta \in O$, and the semigroup of units $\mathcal{E}/O$ consists of the classes $\{K_e\}$, $e \in \mathcal{E}$. Properties (i)-(vi) of Definition 1 can easily be verified. The class $K_\theta = O$, $\forall \theta \in O$, possesses all properties of the usual zero, since $K_q \cdot K_\theta = K_{q\theta} = K_\theta = O$ and $K_q + K_\theta = K_{q+\theta} = K_q$. We define … on $\Phi$ as …
Probabilities with hidden parameters
(i) The idea of hidden variables is very popular in quantum mechanics 17. With the help of hidden variables many investigators try to overcome some difficulties of quantum mechanics. For example, in 18 a p-adic theory of distributions for hidden variables was proposed to resolve the paradox of Bell's inequality. On the other hand, we propose to consider hidden variables as hidden parameters of usual probabilities, so that the latter become abstract probabilities satisfying the conditions of Definition 1. First, we consider one model of hidden parameters for abstract probabilities.
Definition 4 We say that a set of abstract probabilities $\Phi$ allows hidden parameters $A$ (or $\Phi$ has hidden parameters $A$), where $A$ is a topological space, if to each $\alpha \in A$ there corresponds a subset $P_\alpha \subset \Phi$ such that $\bigcup_\alpha P_\alpha = \Phi$, and continuous mappings $\varphi$ and $\psi$ from $A \times A \times \Phi \times \Phi$ into $A$ are defined with the following property: the operations
$$(p, \alpha) + (q, \beta) = (p + q,\ \varphi(\alpha, \beta; p, q)), \qquad (2)$$
$$(p, \alpha) \cdot (q, \beta) = (p \cdot q,\ \psi(\alpha, \beta; p, q)), \qquad (3)$$
where $p \in P_\alpha$, $q \in P_\beta$, $p + q \in P_{\varphi(\alpha,\beta;p,q)}$, $p \cdot q \in P_{\psi(\alpha,\beta;p,q)}$, define a probability set on the set of pairs $(p, \alpha)$, $\alpha \in A$, $p \in P_\alpha$; it is denoted by $\Phi(A)$, $\Phi(A) \subset \Phi \times A$.
Since the first components in (2) and (3) are the operations of the probability set $\Phi$, the hidden parameters can describe additional properties of probabilities, possibly with some physical meaning. The principal problem concerning probabilities with hidden parameters is as follows: can we distinguish statistically the sequences $\xi_1(\omega), \dots, \xi_n(\omega), \dots$ and $\eta_1(\omega), \dots, \eta_n(\omega), \dots$, where the $\xi_k(\omega)$ are independent identically distributed random variables with respect to usual probabilities from [0,1], and the $\eta_k(\omega)$ are independent random variables with the same values as the $\xi_k(\omega)$ but with distributions from the probability set $[0,1] \times A$, satisfying the condition: if $P\{\xi_k(\omega) \in B\} = p$, then $P\{\eta_k(\omega) \in B\} = (p, \alpha)$ for some $\alpha \in A$?
(ii) Now we consider the basic construction for different examples of usual probabilities on [0,1] with hidden parameters.
Proposition 4 Let $\Phi = [0,1]$ and let $A$ be a convex semigroup in an arbitrary Banach algebra over $\mathbb{R}$. Then the set $\Phi \times A = \{(p, \alpha),\ \alpha \in A\}$ forms a probability set with respect to the operations
$$(p, \alpha) + (q, \beta) = \left(p + q,\ \frac{p}{p+q}\,\alpha + \frac{q}{p+q}\,\beta\right), \qquad (4)$$
$$(p, \alpha) \cdot (q, \beta) = (p q,\ \alpha \beta). \qquad (5)$$
Proof. As the zero set $O$ we take $\{(0, \alpha)\}$, $\alpha \in A$, and as $\mathcal{E}$ the set $\{(1, \alpha)\}$, $\alpha \in A$. Then all properties of Definition 1 can easily be verified. By Proposition 3, all elements of the form $(0, \alpha)$, $\alpha \in A$, can be identified with a single zero.
A simple interesting example of this kind is obtained by considering the set of pairs $(p, q)$, $p, q \in [0,1]$, with the operations
$$(p_1, q_1) + (p_2, q_2) = \left(p_1 + p_2,\ \frac{p_1}{p_1+p_2}\,q_1 + \frac{p_2}{p_1+p_2}\,q_2\right), \qquad (6)$$
$$(p_1, q_1) \cdot (p_2, q_2) = (p_1 p_2,\ q_1 q_2). \qquad (7)$$
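The operations (6) and (7) are easy to try out directly. The following sketch is our own, uses exact rational arithmetic, and treats a sum as undefined when $p_1 + p_2 = 0$ or $p_1 + p_2 > 1$; that restriction is an assumption of the sketch, not part of the text. It checks associativity and distributivity on sample pairs, shows that adding a zero $(0, q)$ leaves an element unchanged (which is why, by Proposition 3, all zeros can be identified), and that $(p, q) + (1 - p, q)$ is the unit $(1, q)$.

```python
from fractions import Fraction as F


def add(x, y):
    """Operation (6): add the probabilities and average the hidden
    parameters with weights p1/(p1+p2) and p2/(p1+p2)."""
    (p1, q1), (p2, q2) = x, y
    s = p1 + p2
    if s == 0 or s > 1:
        raise ValueError("sum not defined in this sketch")
    return (s, (p1 * q1 + p2 * q2) / s)


def mul(x, y):
    """Operation (7): coordinate-wise product."""
    (p1, q1), (p2, q2) = x, y
    return (p1 * p2, q1 * q2)


# Hypothetical sample pairs (probability, hidden parameter).
a = (F(1, 4), F(1, 2))
b = (F(1, 3), F(1, 5))
c = (F(1, 6), F(9, 10))
r = (F(1, 2), F(2, 3))
zero = (F(0), F(7, 10))   # a zero element (0, q)
```

For example, `add(a, zero)` returns `a` itself, and `add(a, (1 - a[0], a[1]))` returns the unit `(1, 1/2)`.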
Obviously, instead of $q \in [0,1]$ we can take elements of the Banach algebra of sequences of numbers from [0,1] with coordinate-wise multiplication. Probabilities $(p, q)$ with hidden parameters $q = (q_1, q_2, \dots)$, $0 \le q_i \le 1$, can be interpreted as follows: if an event $S$ occurs with probability $p$, then $q_1, q_2, \dots$ can be considered as probabilities of some independent events $S_1, S_2, \dots$ which can occur when $S$ occurs. Another example of hidden parameters, interesting from a probabilistic point of view, is obtained when $q = \|q_{ij}\|$ runs over stochastic matrices. Now we can consider a random index $i$, $i = 1, 2, \dots$, with distribution $(p_i, \|$ …
(iii) As a prototype of a general construction of a probability set $\Phi$ with hidden parameters, we can consider a set of positive measures $m(G)$ on some semigroup $G$ with the natural operations of addition and composition of measures. Indeed, let $G$ be an arbitrary locally compact semigroup, and consider the set $m(G)$ of all positive measures on $G$ with the weak topology. We can naturally define the operation of convolution (composition) "$*$" on $m(G)$ as follows (see 3): for $\mu, \nu \in m(G)$ we set
$$(\mu * \nu)(B) = (\mu \times \nu)\{(x, y) : x \cdot y \in B,\ x, y \in G\}, \qquad (8)$$
where $\mu \times \nu$ denotes the direct product of the measures $\mu$ and $\nu$ on $G$. Then $m(G)$ is a semigroup with respect to convolution. Besides, the addition $(\mu + \nu)(B) = \mu(B) + \nu(B)$ and the multiplication by a positive number $\lambda$, $(\lambda\mu)(B) = \lambda\mu(B)$, are defined on $m(G)$. Obviously, convolution and addition are distributive. Thus the linear set $m(G)$ is a convex semigroup with respect to convolution. With respect to these operations $m(G)$ possesses almost all properties of probability sets except one: there is no semigroup of units in $m(G)$. But by restricting $m(G)$ we can obtain a convex semigroup possessing all properties of a probability set. To this end, consider the subset $m_1(G)$ of $m(G)$ consisting of all probability measures, i.e. the positive measures $\mu$ with $\mu(G) = 1$. From (8) it follows that $m_1(G)$ is a semigroup. Consider the closed convex semigroup $m_{[0,1]}(G)$ consisting of all non-negative measures $\mu$ with $0 \le \mu(G) \le 1$. It can readily be seen that $m_{[0,1]}(G)$, with addition and composition, satisfies all properties (i)-(vi) of a probability set with the semigroup of units $\mathcal{E} = m_1(G)$. Each element $\mu \in m_{[0,1]}(G)$ with $\mu(G) = p \neq 0$ can obviously be represented in the form $p \cdot (\frac{1}{p}\mu)$, where $\frac{1}{p}\mu \in m_1(G)$. If $\mu, \nu \in m_{[0,1]}(G)$ with $\mu(G) = p$ and $\nu(G) = q$, then
$$\mu + \nu = (p + q)\left(\frac{p}{p+q}\cdot\frac{1}{p}\mu + \frac{q}{p+q}\cdot\frac{1}{q}\nu\right), \qquad (9)$$
$$\mu * \nu = p\left(\tfrac{1}{p}\mu\right) * q\left(\tfrac{1}{q}\nu\right) = pq\left\{\left(\tfrac{1}{p}\mu\right) * \left(\tfrac{1}{q}\nu\right)\right\}. \qquad (10)$$
From (9) and (10) we obtain the following.
Proposition 5 The convex semigroup $m_{[0,1]}(G)$ and the set $\Phi\{m_1(G)\}$ of elements $(p, \alpha)$, $p \in [0,1]$, $\alpha \in m_1(G)$, with the operations (4), (5) are isomorphic.
The probabilities $(p, \mu)$ can be interpreted similarly to item (ii) above, although the structure of multiplication in the semigroup is rather more complicated. Consider an algebra of events $\mathcal{F}$. Suppose that each such event has a state which can be represented by an element of a group $G$. Let the probabilities $(p_i, \mu_i)$, $\sum_i (p_i, \mu_i) = (1, \mu)$, assign the distribution on events $F_i \subset \mathcal{F}$, $F_i \cap F_j = \emptyset$. Then the probability $(p_i, \mu_i)$ means the choice of an event $F_i$ with probability $p_i$ and the choice of a state $g \in G$ with distribution $\mu_i$. Obviously, the addition and multiplication of these probabilities must be determined by a physical model, obtained either from experiment or theoretically.
4
Probability sets with a single unit
If a semigroup $G$ is finite, then $m_{[0,1]}(G)$ is a convex set in Euclidean space. We will show that this convex set contains probability subsets with a single unit. A special two-dimensional case of such a probability set was presented in section 1.
(i) Let $G$ be a finite group (commutative or non-commutative) with elements $e_1, e_2, \dots, e_s$, $s > 2$. Consider the group algebra $G(\mathbb{R})$, i.e. the linear space of linear forms $x_1 e_1 + \cdots + x_s e_s$, $x_i \in \mathbb{R}$, with the group multiplication of the basis elements $\{e_i\}$. Assume that the basis $\{e_i\}$ is orthonormal. Let $m_1(G)$ be the simplex with vertices $e_1, e_2, \dots, e_s$, and $m_{[0,1]}(G)$ the simplex with vertices $0, e_1, e_2, \dots, e_s$ (see Fig. 2). Then a measure $\mu \in m_{[0,1]}(G)$ can be written as $\mu = p_1 e_1 + \cdots + p_s e_s$, where $0 \le p_i \le 1$ and $\sum_i p_i \le 1$. The geometric centre of $m_1(G)$ is the invariant measure $n_G = \frac{1}{s} e_1 + \cdots + \frac{1}{s} e_s$. For any measure $\mu \in m_{[0,1]}(G)$ we have
$$\mu\, n_G = n_G\, \mu = \mu(G)\, n_G. \qquad (11)$$
In the special case $\mu \in m_1(G)$ we have $\mu n_G = n_G \mu = n_G$, and in particular $n_G n_G = n_G$. Denote by $l$ the line passing through the points 0 and $n_G$. Then, as can be seen from Fig. 2, $m_1(G)$ is a part of the hyperplane orthogonal to the line $l$ and passing through the point $n_G$, and $m_{[0,1]}(G)$ is the part of the positive orthant cut off by $m_1(G)$.
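For a finite cyclic group the convolution (8) reduces to a finite sum over pairs of group elements, so the relations (10) and (11) can be checked exactly. The sketch below is our own: it takes $G = \mathbb{Z}_s$ (for $s = 3$ this is the case of Fig. 2), represents a measure by the list of its point masses $(p_1, \dots, p_s)$, and uses two hypothetical sub-probability measures $\mu$ and $\nu$.

```python
from fractions import Fraction as F


def convolve(mu, nu):
    """Convolution (8) on Z_s: (mu * nu)({k}) is the sum of mu({i}) nu({j})
    over all i, j with i + j = k (mod s)."""
    s = len(mu)
    out = [F(0)] * s
    for i in range(s):
        for j in range(s):
            out[(i + j) % s] += mu[i] * nu[j]
    return out


def total(mu):
    """Total mass mu(G)."""
    return sum(mu)


s = 3
n_G = [F(1, s)] * s                   # invariant measure, the centre of m_1(G)
mu = [F(1, 10), F(1, 5), F(1, 4)]     # hypothetical, mu(G) = 11/20
nu = [F(1, 3), F(0), F(1, 6)]         # hypothetical, nu(G) = 1/2

p, q = total(mu), total(nu)
# (10): mu * nu = p q ((1/p) mu) * ((1/q) nu)
lhs = convolve(mu, nu)
rhs = [p * q * x for x in convolve([m / p for m in mu], [v / q for v in nu])]
```

Exact fractions avoid rounding, so the identities hold with `==`; total masses multiply under convolution, and convolving with $n_G$ on either side gives $\mu(G)\, n_G$ as in (11).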
Fig. 2
In fact, Fig. 2 corresponds to the case $s = 3$, when $G$ is the cyclic group of three elements. This case is of special interest, because the algebra $G(\mathbb{R})$ is isomorphic to the direct sum of the field of real numbers and the field of complex numbers 19. Consider the cube $Q$ shown in Fig. 2. The cube $Q$ consists of all measures $\mu = \sum_{i=1}^{s} p_i e_i$ for which $0 \le p_i \le \frac{1}{s}$.
Proposition 6 The set $Q$, considered as a subset of the convex semigroup $m_{[0,1]}(G)$, is a probability set with a single zero 0 and a single unit $n_G$.
Proof. Let us establish that $Q$ is a semigroup with respect to multiplication. Indeed, if $\mu = \sum_i p_i e_i$ and $\nu = \sum_j q_j e_j$ belong to $Q$, then $0 \le p_i \le \frac{1}{s}$, $0 \le q_j \le \frac{1}{s}$, and therefore
$$\mu \nu = \sum_{i,j} p_i q_j\, e_i e_j = \sum_k \Big(\sum_i p_i q_{i_k}\Big) e_k,$$
where the indices $i_k$ are defined uniquely for each $i$ and $k$ by the condition $e_i e_{i_k} = e_k$, $i, k = 1, 2, \dots, s$. Since $G$ is a group, for any fixed $k$ the indices $i_k$ run over $1, 2, \dots, s$ when $i$ runs over $1, 2, \dots, s$. Therefore
$$\sum_i p_i q_{i_k} \le \frac{1}{s} \sum_i q_{i_k} = \frac{1}{s} \sum_j q_j \le \frac{1}{s}.$$
Now let us show that a complementary element $\bar\mu$ exists for each $\mu = p_1 e_1 + \cdots + p_s e_s \in Q$. By Definition 1 we must have $\mu + \bar\mu \in \mathcal{E}$. In our case we set $\mathcal{E} = \{n_G\}$. Then $\mu + \bar\mu = n_G$, and therefore $\bar\mu = n_G - \mu = (\frac{1}{s} - p_1) e_1 + \cdots + (\frac{1}{s} - p_s) e_s \in Q$, since $0 \le p_i \le \frac{1}{s}$, $i = 1, 2, \dots, s$. Finally, let us check property (iv). Indeed, if $\mu \in Q$, $\mu \neq n_G$, then $\mu(G) = \lambda < 1$. Thus, by virtue of (11), $\mu n_G = n_G \mu = \mu(G)\, n_G = \lambda n_G$. The remaining properties of Definition 1 for the set $Q$ follow straightforwardly from the properties of the probability set $m_{[0,1]}(G)$. Note that the Kolmogorov condition (vii) holds in $Q$.
(ii) It is possible to construct an even more general kind of probability sets with a single unit as subsets of $m_{[0,1]}(G)$. For this purpose, consider an arbitrary convex semigroup $S(G)$ in $m_1(G)$ and the convex set $S_0(G)$ formed by zero (0) and the elements of $S(G)$. One can readily see that $S_0(G)$ also satisfies the properties of a probability set in which $S(G)$ is the set of units. Now consider the set $Q(S, G)$, the intersection of $S_0(G)$ with all half-spaces that contain zero and are bounded by hyperplanes parallel to the faces of $S_0(G)$ and passing through the point $n_G$.
Proposition 7 Let $S$ be an arbitrary convex semigroup in $m_1(G)$, centrally symmetric with respect to the point $n_G$. Then $Q(S, G)$ is a probability set with a single zero and a single unit.
Proof. We shall show that $Q(S, G)$ is a semigroup with respect to convolution; hence $Q(S, G)$, as a subset of $m_{[0,1]}(G)$, is a probability set with the single unit $n_G$. First, note that, in view of the central symmetry of $S$ with respect to $n_G$, the intersection of any face of $S_0(G)$ with any hyperplane passing through $n_G$ and parallel to another face lies in the intersection of the faces of $S_0(G)$ with the hyperplane $h$ passing through $\frac{1}{2} n_G$ and perpendicular to the line $l$. Fig. 3 shows a plane $\pi$ passing through a point $\mu_0 \in S_0(G)$ and the line $l$. The rhombus $0 A n_G B$ is the intersection of $Q(S, G)$ with this plane. Each element $\mu$ of this rhombus can be represented as $\mu = n_G - \lambda_1 \mu_1$, where $\mu_1 \in S(G)$, $0 \le \lambda_1 \le 1$. Similarly, each other element $\nu$ of $Q(S, G)$ can be written as $\nu = n_G - \lambda_2 \nu_1$, where $\nu_1 \in S(G)$, $0 \le \lambda_2 \le 1$.
Fig. 3
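Therefore the product $\mu\nu$ equals
$$\mu\nu = (n_G - \lambda_1\mu_1)(n_G - \lambda_2\nu_1) = n_G - \lambda_2\, n_G\nu_1 - \lambda_1\, \mu_1 n_G + \lambda_1\lambda_2\, \mu_1\nu_1 = (1 - \lambda_1 - \lambda_2)\, n_G + \lambda_1\lambda_2\, \mu_1\nu_1. \qquad (12)$$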
Let us show that the element (12) belongs to $Q(S, G)$. Consider first the case when either $\lambda_1$ or $\lambda_2$ is greater than $\frac{1}{2}$; let, for example, $\lambda_1 > \frac{1}{2}$. Then the point $\mu$ lies in the left-hand half of the rhombus and can thus be represented as $t\tilde\mu$, $\tilde\mu \in S(G)$, $t \le \frac{1}{2}$. On the other hand, $\nu = \tau\tilde\nu$ for $\nu \in Q(S, G)$, where $\tilde\nu \in S(G)$, $0 \le \tau \le 1$. Therefore the product $\mu\nu$ equals $t\tau \cdot \tilde\mu\tilde\nu$, where $\tilde\mu\tilde\nu \in S(G)$ and $0 \le t\tau \le \frac{1}{2}$. Consequently, by the construction of $Q(S, G)$, the measure $\mu\nu$ lies to the left of the hyperplane $h$ (Fig. 3), and hence $\mu\nu \in Q(S, G)$.
Now consider the case $\lambda_1 \le \frac{1}{2}$, $\lambda_2 \le \frac{1}{2}$. Then $p = 1 - \lambda_1 - \lambda_2 \ge 0$ and $q = \lambda_1\lambda_2 \ge 0$. Let us show that $p + 2q \le 1$, which is equivalent to $\lambda_1 + \lambda_2 \ge 2\lambda_1\lambda_2$. Indeed, $(\lambda_1 - \lambda_2)^2 = \lambda_1^2 + \lambda_2^2 - 2\lambda_1\lambda_2 \ge 0$, and since $0 \le \lambda_1 \le 1$, $0 \le \lambda_2 \le 1$, we have $\lambda_1 + \lambda_2 - 2\lambda_1\lambda_2 \ge \lambda_1^2 + \lambda_2^2 - 2\lambda_1\lambda_2 \ge 0$. Hence $p, q \ge 0$ and $p + 2q \le 1$.
Let us show that the measure $m = p n_G + q\omega$ belongs to $Q(S, G)$ for any measure $\omega \in S(G)$. Fig. 4 shows the plane passing through the points 0, $\omega$ and $n_G$. The point $m = p n_G + q\omega$ lies on the line $l'$ parallel to $0\omega$ and passing through $p n_G$. To prove that $m$ belongs to $Q(S, G)$, it suffices to show that $|q\omega| \le |A|$. By the similarity of the triangles $0\,\omega\,n_G$ and $(p n_G)\,B\,n_G$ we have
$$\frac{2|A|}{(1-p)\,|n_G|} = \frac{|\omega|}{|n_G|}, \quad \text{that is,} \quad |A| = \tfrac{1}{2}(1-p)\,|\omega|.$$
Then
$$|q\omega| = q\,|\omega| \le \tfrac{1}{2}(1-p)\,|\omega| = |A|$$
follows from the inequality $p + 2q \le 1$.
Hypothesis: For arbitrary $S(G) \subset m_1(G)$, the set $Q(S, G)$, as a subset of the convex semigroup $m_{[0,1]}(G)$, is a probability set with a single zero 0 and a single unit $n_G$.
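The two finite checks in the proofs of Propositions 6 and 7 can be spot-checked by brute force. The sketch below is our own and covers only the cyclic case $G = \mathbb{Z}_3$ of Fig. 2 on a hypothetical grid of points of the cube $Q$ (coordinates in $\{0, 1/9, 2/9, 1/3\}$): it verifies that products stay in $Q$ and that the complement $n_G - \mu$ stays in $Q$, and it checks the inequality $p + 2q \le 1$ used for $\lambda_1, \lambda_2 \le \frac{1}{2}$ on a grid. A grid check is evidence for the statements, not a proof.

```python
import itertools
from fractions import Fraction as F


def convolve(mu, nu):
    """Convolution on the cyclic group Z_s (the case of Fig. 2 when s = 3)."""
    s = len(mu)
    out = [F(0)] * s
    for i, j in itertools.product(range(s), repeat=2):
        out[(i + j) % s] += mu[i] * nu[j]
    return out


def in_cube(mu):
    """Membership in Q: 0 <= p_i <= 1/s for every coordinate."""
    s = len(mu)
    return all(0 <= p <= F(1, s) for p in mu)


s = 3
grid = [F(k, 3 * s) for k in range(4)]        # 0, 1/9, 2/9, 1/3
Q = [list(c) for c in itertools.product(grid, repeat=s)]
n_G = [F(1, s)] * s

closed = all(in_cube(convolve(a, b)) for a in Q for b in Q)
complements_ok = all(in_cube([u - p for u, p in zip(n_G, a)]) for a in Q)

# Proposition 7, case lambda_1, lambda_2 <= 1/2: p = 1 - l1 - l2, q = l1 * l2.
lams = [F(k, 10) for k in range(6)]           # 0, 0.1, ..., 0.5
ineq_ok = all((1 - l1 - l2) >= 0 and (1 - l1 - l2) + 2 * l1 * l2 <= 1
              for l1 in lams for l2 in lams)
```

All three flags come out true; the bound $\sum_i p_i q_{i_k} \le \frac{1}{s}$ is attained at the corner $\mu = \nu = n_G$, which is why the membership test uses `<=`.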
We would like to note, in connection with the examples of section 1, that a general description of probability sets in topological semi-fields and in the field of p-adic numbers is of great interest for applications. We hope that the problem of the experimental determination of abstract probabilities will be considered in a continuation of this work.
5
Acknowledgments
In conclusion, I want to express my gratitude to A. Yu. Khrennikov (Vaxjo Univ., Sweden), Yu. V. Prokhorov, O. V. Viskov, I. V. Volovich (all of Steklov Mathematical Institute, Russia), V. Ja. Kozlov (Academy of Cryptography, Russia), V. I. Serdobolskii (Moscow Univ. of Electronics and Mathematics, Russia) and A. K. Kwasniewski (Bialystok Univ., Institute of Computer Science, Poland) for discussions and their advice on the foundations of probability theory and quantum mechanics. This investigation was supported by the grant of the Swedish Royal Academy of Sciences for collaboration with states of the former Soviet Union and by the Profile Mathematical Modeling of Vaxjo University.
References
1. A. N. Kolmogorov, Foundations of the Theory of Probability (Chelsea Publ. Comp., New York, 1956).
2. T. L. Fine, Theories of Probability: An Examination of Foundations (Academic Press, New York, 1973).
3. H. Heyer, Probability Measures on Locally Compact Groups (Springer-Verlag, Berlin-Heidelberg-New York, 1977).
4. Y. P. Studnev, Theory Probab. Appl. 12, 727 (1967).
5. R. P. Feynman, Negative probability, in Quantum Implications: Essays in Honour of David Bohm, eds. B. J. Hiley and F. D. Peat (Routledge and Kegan Paul, London, 1987).
6. P. Dirac, Rev. Mod. Phys. 17, 195 (1945).
7. O. G. Smolyanov and A. Y. Khrennikov, Dokl. Akademii Nauk USSR 281, 279 (1985).
8. V. S. Vladimirov, I. V. Volovich and E. I. Zelenov, p-adic Analysis and Mathematical Physics (World Scientific Publ., Singapore, 1993).
9. A. Y. Khrennikov, Theor. and Math. Phys. 97, 348 (1993).
10. A. Y. Khrennikov, Doklady Mathematics 55, 402 (1997).
11. A. Y. Khrennikov, Mathematical and physical arguments for the change of Kolmogorov's axiomatics. Trends in Contemporary Infinite Dimensional Analysis and Quantum Probability, N. 1, 215-249 (2000).
12. L. Accardi, The probabilistic roots of the quantum mechanical paradoxes, in The Wave-Particle Dualism (D. Reidel Publ. Company, Dordrecht, 1958).
13. C. C. Chang, Transactions of the Amer. Math. Soc. 86, 467 (1958).
14. R. S. Grigolia, Algebraic analysis of Lukasiewicz-Tarski's n-valued logical systems, in Selected Papers on Lukasiewicz Sentential Calculi (PAN, Ossolineum, Poland, 1977).
15. T. A. Sarymsakov, Topological Semi-fields and Their Applications (FAN, Tashkent, 1989).
16. T. A. Sarymsakov, Topological Semi-fields and Probability Theory (FAN, Tashkent, 1969).
17. J. S. Bell, Rev. Mod. Phys. 38, 447 (1966).
18. A. Y. Khrennikov, Physics Letters A 200, 219 (1995).
19. B. L. van der Waerden, Algebra I, achte Auflage der Modernen Algebra (Springer-Verlag, Berlin-Heidelberg-New York, 1977).
QUANTUM K-SYSTEMS AND THEIR ABELIAN MODELS
H. NARNHOFER
Institut für Theoretische Physik, Universität Wien, Boltzmanngasse 5, A-1090 Wien
In this review the concept of quantum K-systems is studied, on the one hand based on a set of increasing algebras, on the other hand with respect to entropy properties. We consider in examples how far it is possible to find abelian models.
1
Introduction
Classical ergodic theory is a powerful discipline both in mathematics and physics to analyze mixing properties of a given dynamics. Since in physics the mixing properties take place on the microscopic level that is controlled by quantum theory it is natural to try to translate the concepts of classical ergodic theory also into the quantum framework and to study how far these concepts can find their quantum counterpart and whether new features appear. One possibility is the following: we start with a classical dynamical system, e.g. a free particle on a hyperbolic manifold with finite measure and quantize the dynamics, i.e. study the properties of the Laplace-Beltrami operator on this manifold. Since the manifold has finite measure the Laplace-Beltrami operator has necessarily discrete spectrum 1 and the classical mixing properties can only have their footprints in the distribution of the eigenvalues at high energy 2,3 . Many deep results have been found on the basis of this approach. But in this review we will follow another path of considerations. We start with the classical dynamical system with optimal mixing properties, the Kolmogorov system 4,5,6 . It can be characterized either by its algebraic structure or by properties of its dynamical entropy. Both concepts find their counterpart in quantum systems 7 but they are not equivalent any more. First we will give the definition of an algebraic K-system and some definitions of dynamical entropies. One of them relates the quantum system to classical K-systems that can be considered as models of the quantum system. Then we will give examples of algebraic quantum K-systems and will discuss how far they can be represented by classical models. Finally we will give examples of quantum K-systems for which no classical model exist and, on the other hand, a quantum dynamical model that allows the construction of a classical model, but for which the algebraic K-property so far cannot be controlled.
2
Classical K-System
Let us recall the characteristics of a classical dynamical system $(A,\sigma,\mu)$ [4,5,6], where we take $A$ to be the abelian algebra generated by the characteristic functions over a measure space with measure $\mu$, and $\sigma$ an automorphism of $A$ with $\mu\circ\sigma = \mu$.
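Definition 2.1 Let $A_0\subset A$. We call $(A,A_0,\sigma,\mu)$ a K(olmogorov) system if

$$\sigma A_0\supset A_0,\qquad \bigvee_{n}\sigma^{n}A_0 = A,\qquad \bigwedge_{n}\sigma^{-n}A_0 = \mathbb{C}1. \tag{2.1}$$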
For a given classical dynamical system $(A,\sigma,\mu)$ there are several ways to decide whether some $A_0$ (which is not unique) exists such that $(A,A_0,\sigma,\mu)$ forms a K-system [5,6]:
A) Choose some finite subalgebra $B\subset A$ (i.e. some finite partition of the measure space) and construct its past algebra $A_0 = \bigvee_{n\in\mathbb{N}}\sigma^{-n}B$. If $A_0$ is a proper subalgebra of $A$, it will increase in time. Check whether $\bigvee_n\sigma^{n}A_0 = A$; if not, $B$ has to be enlarged. If $B$ is large enough, check whether $\bigwedge_n\sigma^{-n}A_0 = \mathbb{C}1$.

B) Consider the conditional entropy $H(B|A_0)$. If this expression is strictly positive for all $B$, then $(A,\sigma,\mu)$ is a K-system.

C) If

$$\lim_{n\to\infty}H(\sigma^{n}B\,|\,A_0) = H(B)\qquad\forall B, \tag{2.2}$$

then $(A,\sigma,\mu)$ is a K-system.

The classical K-system can also be characterized by its clustering properties: let $(A,A_0,\sigma,\mu)$ be a K-system. Then for every $B\in A$ and $\varepsilon>0$ there exists $n_0$ such that

$$|\mu(B\,\sigma^{-n}A) - \mu(B)\mu(A)| < \varepsilon\,\mu(A)\qquad\forall A\in A_0,\ n>n_0. \tag{2.3}$$
The prototypes of K-systems are the Bernoulli shifts (including the baker's transformation). We regard the Bernoulli shift as an infinite tensor product $A = \bigotimes_{i\in\mathbb{Z}}B_i$, where each $B_i$ is isomorphic to a finite abelian algebra $B_i\cong B_0 = \{P_1,\dots,P_k\}$ generated by projections $P_j$ with expectation values $\mu_j$. The dynamics is given by the shift $\sigma$ over the tensor product. The state $\mu$ has to be translation invariant. It can be the tensor product of the local states, but we also allow spatial correlations. The dynamical entropy is given by

$$h(\sigma) = \sup_{C}\lim_{n\to\infty}\frac1n H\Big(\bigvee_{r=0}^{n-1}\sigma^{r}C\Big) \tag{2.4}$$

$$= \sup_{C}H\Big(C\,\Big|\,\bigvee_{r\geq 1}\sigma^{-r}C\Big), \tag{2.5}$$

where the supremum runs over finite subalgebras $C\subset A$, and it coincides with $H(B_0) = -\sum_j\mu_j\ln\mu_j$ if the state $\mu$ factorizes.

3
Algebraic Quantum K-Systems
It is obvious that Definition 2.1 can be adopted directly to define an algebraic quantum K-system. It is also obvious that the definition is not empty, because we can construct the quantum analogue of a Bernoulli shift by taking for $B$ a nonabelian algebra, e.g. a full matrix algebra $M_{k\times k}$. In the following we first discuss physical applications of this quantum Bernoulli shift and then turn to generalizations.

A. A Model for Quantum Measurement
We start with a finite-dimensional algebra $B$ and a state $\omega$ over $B$. In order to determine $\omega$ we have to make many copies of $\omega$ and repeat a variety of measurements. The classical Bernoulli shift consists of projections, and every measurement gives as outcome 0 or 1 on these projections, with probabilities given by the state $\mu$. By repeated measurements we can determine $\mu$ with exponentially increasing certainty. In the quantum situation a measurement corresponds to picking some abelian subalgebra $B_0$ of $B$ (maximal abelian if the measurement is sharp), and again the outcome of the measurement is 0 or 1 on the projections in $B_0$. To determine the state $\omega$ we have to vary the measurements, i.e. the algebras $B_0$. Since the state space over $B$ is compact, it suffices to vary over finitely many $B_0$. Let $\omega(P_j) = p_j$ for $P_j\in B_0$. To reach accuracy $\varepsilon$ on the probability distribution with respect to $B_0$, the number of experiments has to be of the order $p_j(1-p_j)/\varepsilon^2$. For the algebra $\bar B_0$ that commutes with the density matrix $\rho$ corresponding to $\omega$, the entropy $S(\rho|_{\bar B_0})$ is minimal, and approximate certainty on the distribution is reached with the smallest number of measurements. For other abelian subalgebras $B_0$ we can be satisfied with less certainty: we just have to be sure that $\rho|_{B_0}$ is more mixed than $\rho|_{\bar B_0}$. Let $p_j = \omega(P_j)$ for $P_j\in B_0$ and $\bar p_j = \omega(\bar P_j)$ for $\bar P_j\in\bar B_0$. The probability that the outcome of $N$ measurements on $B_0$ gives a relative frequency $q_j > \bar p_j - \varepsilon$ is of the order

$$\exp\left(-\frac{N(\bar p_j - p_j - \varepsilon)^2}{p_j(1-p_j)}\right). \tag{3.1a}$$

This has to be compared with the certainty given by $\bar N$ measurements on $\bar B_0$,

$$\exp\left(-\frac{\bar N\varepsilon^2}{\bar p_j(1-\bar p_j)}\right). \tag{3.1b}$$

Therefore the number $N$ of experiments necessary to control $\rho|_{B_0}$ is small compared to the number $\bar N$ that fixes $\rho|_{\bar B_0}$ and at the same time $\rho$. If we interpret the entropy as a measure of the reliability of a sequence of measurements, we see that it is not changed compared to the classical expression, i.e. the same order of experiments is necessary, and therefore

$$S(\rho) = S(\rho|_{\bar B_0}) = -\operatorname{Tr}\rho\ln\rho. \tag{3.2}$$
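The statement that the measurement commuting with $\rho$ has the smallest Shannon entropy, equal to $-\operatorname{Tr}\rho\ln\rho$, can be checked numerically. The following sketch assumes an arbitrary random density matrix and a random second basis (neither comes from the text); it also evaluates the order of magnitude $p_j(1-p_j)/\varepsilon^2$ of the number of repetitions. All function names are illustrative.

```python
import numpy as np

def shannon(p):
    """Shannon entropy (natural log) of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 1e-15]
    return float(-np.sum(p * np.log(p)))

def von_neumann(rho):
    """S(rho) = -Tr rho ln rho, from the eigenvalues of rho."""
    return shannon(np.linalg.eigvalsh(rho))

def outcome_distribution(rho, U):
    """p_j = <u_j|rho|u_j> for a sharp measurement in the basis given by the columns of U."""
    return np.real(np.einsum("ji,jk,ki->i", U.conj(), rho, U))

def random_unitary(d, rng):
    """Unitary from the QR decomposition of a random complex matrix."""
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    return Q

rng = np.random.default_rng(0)
d = 3
G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = G @ G.conj().T
rho /= np.trace(rho).real          # an arbitrary density matrix (illustration only)

_, eigvecs = np.linalg.eigh(rho)   # basis of the abelian subalgebra commuting with rho
S = von_neumann(rho)
S_commuting = shannon(outcome_distribution(rho, eigvecs))
S_other = shannon(outcome_distribution(rho, random_unitary(d, rng)))

def experiments_needed(p, eps):
    """Order of magnitude p(1-p)/eps^2 of the number of repetitions quoted in the text."""
    return p * (1 - p) / eps**2

print(S, S_commuting, S_other)
print(experiments_needed(0.3, 0.01))
```

The inequality `S_other >= S` is Schur concavity of the Shannon entropy: the diagonal of $\rho$ in any basis is majorized by its spectrum.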
Remark: In [8] the Shannon information, resp. the von Neumann entropy (3.2), was questioned as the appropriate quantity. But these considerations did not take into account that measurements on different abelian subalgebras are correlated. We have incorporated these correlations by taking into account the varying necessary accuracy, and in this way obtained the desired result.

B. Lattice Systems
Again we choose a matrix algebra $B$ and define $A = \bigotimes_{n\in\mathbb{Z}}B_n$ as before. But now the algebra describes particles on a lattice (one-dimensional for $n\in\mathbb{Z}$), the shift corresponds to space translation, and the translation invariant state describes the system in, e.g., the ground state or an equilibrium state with respect to some Hamiltonian, for instance the Heisenberg ferromagnet. Therefore in general the state will not factorize but will be obtained as [9]

$$\omega(A) = \lim_{\Lambda\to\mathbb{Z}}\frac{\operatorname{Tr}e^{-\beta H_\Lambda}A}{\operatorname{Tr}e^{-\beta H_\Lambda}}. \tag{3.3}$$
We assume that the sequence of local Hamiltonians $H_\Lambda$ determines a time automorphism of the algebra that commutes with space translation. We can assume that $\omega$ is space translation invariant. In order to have an algebraic K-system on the von Neumann level (in the weak topology) it is necessary that the state be extremal space translation invariant. This can be achieved, if necessary, by a unique decomposition as in the classical situation [9].
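A finite-volume version of (3.3) can be evaluated directly for small chains. The sketch below assumes an open Heisenberg chain $H_\Lambda = -J\sum_i\vec S_i\cdot\vec S_{i+1}$ with $\vec S = \vec\sigma/2$ and $J>0$ (ferromagnetic); these conventions are assumptions, and no thermodynamic limit $\Lambda\to\mathbb{Z}$ is taken.

```python
import numpy as np
from functools import reduce

# Pauli matrices
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def site_op(op, i, L):
    """op acting on site i of an L-site chain (identity elsewhere)."""
    return reduce(np.kron, [op if k == i else I2 for k in range(L)])

def heisenberg(L, J=1.0):
    """Open-chain H = -J sum_i S_i . S_{i+1} with S = sigma / 2 (sign convention assumed)."""
    H = np.zeros((2**L, 2**L), dtype=complex)
    for i in range(L - 1):
        for P in (X, Y, Z):
            H -= J / 4 * site_op(P, i, L) @ site_op(P, i + 1, L)
    return H

def gibbs_expectation(H, A, beta):
    """Finite-volume version of (3.3): Tr(e^{-beta H} A) / Tr e^{-beta H}."""
    E, V = np.linalg.eigh(H)
    w = np.exp(-beta * (E - E.min()))  # energy shift for numerical stability
    rho = (V * w) @ V.conj().T         # V diag(w) V^dagger
    return np.trace(rho @ A) / np.trace(rho)

L, beta = 6, 1.0
H = heisenberg(L)
A = site_op(Z, 2, L) @ site_op(Z, 3, L)   # nearest-neighbour correlation in the bulk
print(gibbs_expectation(H, A, beta).real)
```

For two sites the result can be compared with the closed form $(e^{\beta J/4} - e^{-3\beta J/4})/(3e^{\beta J/4} + e^{-3\beta J/4})$, obtained from the triplet and singlet energies.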
C. Fermi Systems
We consider the CAR algebra $A = \{a(f), a^\dagger(g)\}$ either over $\ell^2(\mathbb{Z})$ or over $L^2(\mathbb{R})$. The shift defines an automorphism of $A$, and the K-property is satisfied with $A_0 = \{a(f), a^\dagger(f);\ \operatorname{supp}f\subset\mathbb{Z}^-\text{ or }\mathbb{R}^-\}$. This is not a Bernoulli K-system, because creation and annihilation operators anticommute.

D. Quantum Stationary Markov Processes
Another example [10] of a K-system is provided by stationary Markov chains. Many variants of the definition of such a Markov chain exist. We give an explicit example that again cannot be imbedded into a Bernoulli system. Let $A_0$ be a $2\times 2$ matrix algebra and $C = \bigotimes_{n\in\mathbb{Z}}C_n$ a Bernoulli system, $C_n$ again a $2\times 2$ matrix algebra. Define the map $T_1: A_0\otimes 1\to A_0\otimes C_1$ by

$$T_1(\sigma_x\otimes 1) = \sigma_x\otimes\sigma_x,\qquad T_1(\sigma_y\otimes 1) = \sigma_x\otimes\sigma_y,\qquad T_1(\sigma_z\otimes 1) = 1\otimes\sigma_z. \tag{3.4}$$
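Since $T_1$ is fixed by its values on the generators, one can check numerically that (3.4) respects the Pauli relations, so that $T_1$ extends to a unital *-homomorphism of $A_0\otimes 1$ into $A_0\otimes C_1$. A minimal sketch, with the first tensor factor taken to be $A_0$ and the second $C_1$ (the helper name `check_pauli_relations` is ad hoc):

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

# Images of the generators under T_1 as in (3.4).
T1 = {
    "x": np.kron(sx, sx),
    "y": np.kron(sx, sy),
    "z": np.kron(I2, sz),
}

def check_pauli_relations(X, Y, Z):
    """True if X, Y, Z are selfadjoint, square to 1 and satisfy XY = iZ, YZ = iX, ZX = iY."""
    one = np.eye(X.shape[0])
    ok = all(np.allclose(A, A.conj().T) and np.allclose(A @ A, one) for A in (X, Y, Z))
    return (ok and np.allclose(X @ Y, 1j * Z) and np.allclose(Y @ Z, 1j * X)
            and np.allclose(Z @ X, 1j * Y))

print(check_pauli_relations(sx, sy, sz))                  # generators of A_0
print(check_pauli_relations(T1["x"], T1["y"], T1["z"]))   # their images under T_1
```

Because $1,\sigma_x,\sigma_y,\sigma_z$ span $M_2$ and these relations determine its multiplication table, preserving them is enough for the homomorphism property.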
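On $C$ we consider the shift $\tau$ and a $\tau$-invariant state $\omega$. Therefore we can define

$$T = (T_1\otimes\mathrm{id})\circ(\mathrm{id}_{A_0}\otimes\tau). \tag{3.5}$$

Then

$$A_{[m,n]} = \bigvee_{m\leq k\leq n}T^{k}(A_0). \tag{3.6}$$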
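E. Price-Powers Shift
Another illustrative example of a quantum K-system is the Price-Powers shift [11]. Let $e_i$, $i\in\mathbb{Z}$, be unitaries satisfying $e_i^2 = 1$ and

$$e_ie_k = (-1)^{g(i-k)}e_ke_i,\qquad g(i-k)\in\{0,1\}.$$

Let $\sigma e_k = e_{k+1}$. Then, with $\mathcal{V}_{g,0} = \{e_i;\ i<0\}$ and $\mathcal{V}_g = \{e_i;\ i\in\mathbb{Z}\}$,

$$(\mathcal{V}_g,\mathcal{V}_{g,0},\sigma,\tau) \tag{3.7}$$

form an algebraic K-system, where $\tau$ is the tracial state

$$\tau(e_I) = \delta_{I,\emptyset},\qquad e_I = e_{i_1}\cdots e_{i_k},\quad i_1<\dots<i_k,\quad I = \{i_1,\dots,i_k\}. \tag{3.8}$$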
Special examples are:
a) $g(1) = 1$, $g(k) = 0$ otherwise. Then the algebra coincides with $\bigotimes_{k\in\mathbb{Z}}M_{2\times 2}$, where

$$e_{2k} = \sigma_x\in M_k,\qquad e_{2k+1} = \sigma_z\otimes\sigma_z\in M_{k+1}\otimes M_k.$$
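b) $g(i) = 1$ for all $i\neq 0$. Then the algebra coincides with the CAR algebra on $\mathbb{Z}$: $e_i = a_i + a_i^\dagger$.

Other explicit examples can be found in [12]. In all these examples (A–E) we inherit from the classical theory the following

Theorem: Let $(A,A_0,\sigma,\omega)$ be a K-system and $\omega$ an extremal translation invariant state. (This is equivalent to $\bigwedge_n\sigma^{-n}A_0 = \mathbb{C}1$ in the strong topology.) Then for every $A$ and $\varepsilon>0$ there exists $n_0$ such that

$$|\omega(A\,\sigma^{-n}B) - \omega(A)\,\omega(B)| < \varepsilon\|B\|\qquad\forall n>n_0,\ B\in A_0. \tag{3.9}$$

Therefore we have the same clustering properties as in (2.3).

Proof: If $\omega$ is the tracial state, $\tau(AB) = \tau(BA)$, then in the GNS representation $\omega(B) = \langle\Omega|\pi(B)|\Omega\rangle$. $\pi(A_0)$ defines a projection operator $P_0$ onto $\overline{\pi(A_0)\Omega}$, and the projections $\sigma^{n}(P_0)$ onto $\overline{\pi(\sigma^{n}A_0)\Omega}$ are increasing in $n$. Since $\pi(\sigma^{-n}B)\Omega$ lies in the range of $\sigma^{-n}(P_0)$ for $B\in A_0$,

$$\omega(A\,\sigma^{-n}B) = \langle\Omega|\pi(A)\,\sigma^{-n}(P_0)\,\pi(\sigma^{-n}B)|\Omega\rangle,$$

and

$$\text{st-}\lim_{n\to\infty}\sigma^{n}(P_0) = 1,\qquad\text{st-}\lim_{n\to\infty}\sigma^{-n}(P_0) = |\Omega\rangle\langle\Omega|. \tag{3.10}$$

Inserting the second limit, applied to the vector $\pi(A^*)\Omega$, gives (3.9), uniformly in $\|B\|$.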
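If $\omega$ is not the tracial state but a KMS state, it cannot be excluded that $\Omega$ is cyclic not only for $\pi(A)''$ but also for $\pi(A_0)''$. In this case the modular operator $\Delta_0$ corresponding to $\pi(A_0)''$ can replace $P_0$ in controlling the cluster properties, and in analogy with (3.10) it satisfies [13]

$$\text{st-}\lim_{n\to\pm\infty}\sigma^{n}\big(\Delta_0^{1/2}\big) = \begin{cases}1, & n\to+\infty,\\ |\Omega\rangle\langle\Omega|, & n\to-\infty.\end{cases} \tag{3.11}$$

4
Dynamical Entropy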
The dynamical entropy of classical ergodic theory can be interpreted in two different ways. If we use the definition

$$h(\sigma) = \sup_BH(\sigma,B) = \sup_BH\Big(B\,\Big|\,\bigvee_{n\geq 1}\sigma^{-n}B\Big), \tag{4.1}$$

then it measures how the algebraic K-system increases, i.e. how in the course of time our information on the complete system increases. If we concentrate on the fact that

$$\lim_{k\to\infty}H\Big(\sigma^{k}B\,\Big|\,\bigvee_{n\geq 0}\sigma^{-n}B\Big) = H(B), \tag{4.2}$$

it expresses that the remote past becomes more and more irrelevant for the present. Both properties can inspire us to look for an appropriate definition of a dynamical entropy for a quantum dynamical system.

a) For an algebraic K-system we can just copy the definition of a classical K-system.
Definition: Given two subalgebras $A,B\subset M$ and a state $\omega$ over $M$, we define, with $S(\omega|\varphi)$ the relative entropy, the conditional entropy

$$H_\omega(A|B) = \sup_{\omega=\sum_i\omega_i}\sum_i\big(S(\omega|\omega_i)_A - S(\omega|\omega_i)_B\big). \tag{4.3}$$
Evidently $H(A|B)\geq 0$, and by monotonicity of the relative entropy $H(A|B) = 0$ if $A\subset B$. Let $(A,A_0,\sigma,\omega)$ be an algebraic K-system. Then $H_\omega(\sigma A_0|A_0)$ measures how fast $A_0$ is increasing. This expression has not been investigated much. The main reason lies in the fact that for a given quantum dynamical system, unlike the classical situation, no strategy is known to decide whether an $A_0$ with the desired properties exists. If it exists, there is no reason to assume that it is unique. In the classical situation the dynamical entropy does not depend on the special choice of $A_0$. In a quantum system, due to the lack of a constructive approach to $A_0$, we also have no way to compare $H(\sigma A_0|A_0)$ for different past algebras $A_0$.
There exists also another characterization of the amount of increase: for $A\supset A_0$, both of type II$_1$, let $P_0$ be the projector onto $\overline{A_0\Omega}$ in the GNS representation of the tracial state over $A$; then $P_0\in\pi(A_0)'$ and [14]

$$[A:A_0] = \tau(P_0)^{-1}, \tag{4.4}$$

with $\tau$ the trace over $\pi(A_0)'$. This definition has been generalized to type III algebras in [15]. Note that it is not state dependent. As a typical example it can be evaluated for the Price-Powers shift: both (4.3) and (4.4) are independent of the sequence $\{g\}$ and give $\ln 2$ resp. $2$. But it should be noted that in general there exists only an order relation [16]

$$H(\sigma A_0|A_0)\leq\ln[\sigma A_0:A_0].$$
b) The main obstacle to using (4.3) or (4.4) as a definition of the dynamical entropy comes from the fact that for noncommutative algebras $\bigvee_{n\geq 1}\sigma^{-n}B$ will in general increase in a way that can hardly be controlled. An illustrative example is given by the following observation [17]. Take $A = \{a(f),a^\dagger(g);\ f,g\in L^2(\mathbb{R})\}$ with $\sigma$ the space translation. We already know that it corresponds to a K-system with $A_0 = \{a(f),a^\dagger(g);\ f,g\in L^2(\mathbb{R}^-)\}$. But if we pick $a(e^{-x^2})$ and construct the algebra $A^0 = \{a(e^{-(x-a)^2}),\ a>0\}$, then $A^0$ coincides with $A$: if it did not, we could find some nonzero $f$ with $\langle f|e^{-(x-a)^2}\rangle = 0$ for all $a>0$, and this is impossible due to the analyticity properties of the Gauss function. Because of this, [18] proposed the following definition of a dynamical entropy.

Definition: Let $M$ be a hyperfinite von Neumann algebra with a faithful normal trace $\tau$. Let $Pf(M)$ be the family of finite subsets of $M$, and let $X\subset M$. For $\omega\in Pf(M)$ we write $\omega\subset_\delta X$ if for every $x\in\omega$ there exists a $y\in X$ such that

$$\tau\big((x-y)(x-y)^*\big) < \delta. \tag{4.5}$$
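Let $\mathcal{F}(M)$ be the family of finite-dimensional C*-subalgebras of $M$. Then

$$r_\tau(\omega,\delta) = \inf\{\operatorname{rank}A : A\in\mathcal{F}(M),\ \omega\subset_\delta A\}, \tag{4.6}$$

$$ha_\tau(\sigma,\omega,\delta) = \limsup_{n\to\infty}\frac1n\log r_\tau\Big(\bigcup_{j=0}^{n-1}\sigma^{j}(\omega),\delta\Big),$$
$$ha_\tau(\sigma,\omega) = \sup_{\delta>0}ha_\tau(\sigma,\omega,\delta),$$
$$ha_\tau(\sigma) = \sup\{ha_\tau(\sigma,\omega) : \omega\in Pf(M)\}. \tag{4.7}$$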
The notation $ha_\tau$ stands for the approximation entropy of $\sigma$. The above definition allows many variations. For instance, the lim sup can be replaced by a lim inf, and one may hope, though it is not proven, that this does not change the result. New information can be gained if we change the approximation condition (4.5). The topological entropy uses approximation in norm. But to keep generality we cannot assume that the full matrix algebra belongs to $A$. Concentrating on nuclear C*-algebras, we have to approximate via completely positive maps $\varphi: A\to B$, $\psi: B\to A$ into finite-dimensional C*-algebras $B$ such that

$$\|\psi\circ\varphi(a) - a\| < \delta\qquad\forall a\in\omega. \tag{4.8}$$
The topological entropy $ht(\sigma)$ is defined as $ha_\tau$, only under the new approximation condition. If $M$ is an AF-algebra and therefore possesses a tracial state, then the topological entropy dominates the approximation entropy:

$$ha_\tau(\sigma)\leq ht(\sigma). \tag{4.9}$$
As another possibility we can approximate $\psi\circ\varphi(a) - a$ in the strong topology in a given representation corresponding to a state $\omega$, and replace the rank of the best algebra $B$ by the entropy [19] $S(\omega\circ\psi)$. All these definitions satisfy the requirement that they coincide with the usual definitions (state dependent dynamical entropy or topological entropy) when applied to commutative algebras. Let us finally remark that, applied to the Price-Powers shift (3.7), again independently of $\{g\}$,

$$ha_\tau(\sigma) = ht(\sigma) = \tfrac12\ln 2. \tag{4.10}$$
For further studies we refer to Størmer, Choda and Dykema [20,21,22].
c) An approach that differs very much from the mathematically motivated definition of Voiculescu is offered by Alicki and Fannes [23]. It is motivated by the concrete method by which we are able to determine the state of a system experimentally: we perform a measurement and repeat it in the course of time. Here we use the idea of the history of a system as discussed e.g. in [24,25]. A single measurement corresponds to a partition of unity

$$\sum_{j=0}^{k-1}x_j^*x_j = 1. \tag{4.11}$$
In fact, we may think of the $x_j$ as commuting selfadjoint projection operators. But this commutativity is destroyed anyhow by the time evolution, and also for the necessary estimates it is preferable to consider this generalized partition of unity without further restrictions on the $x_j$. Repetition of the measurement corresponds to a composed partition

$$X = (x_0,\dots,x_{k-1}),\qquad\sigma X = (\sigma x_0,\dots,\sigma x_{k-1}),\qquad\sigma X\circ X = (\dots,\sigma(x_i)\,x_j,\dots),$$

i.e. a partition of size $k^2$. The matrix

$$\rho[X]_{ij} = \omega(x_j^*x_i)$$

defines a density matrix of dimension $k$ with entropy

$$H(X) = S(\rho[X]). \tag{4.12}$$
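The construction (4.11)–(4.12) can be sketched for a single qubit. The partition of unity below is built from two non-commuting projections, and the dynamics is modelled, as an assumption not taken from the text, by conjugation with a fixed rotation `U`; the code only checks that $\rho[X]$ and $\rho[\sigma X\circ X]$ are density matrices and computes their entropies.

```python
import numpy as np

def correlation_matrix(rho, xs):
    """rho[X]_{ij} = omega(x_j^* x_i) = Tr(rho x_j^* x_i) for a partition of unity X."""
    k = len(xs)
    M = np.empty((k, k), dtype=complex)
    for i in range(k):
        for j in range(k):
            M[i, j] = np.trace(rho @ xs[j].conj().T @ xs[i])
    return M

def compose(sigma_xs, xs):
    """sigma X o X = (sigma(x_i) x_j)_{i,j}, a partition of unity of size k^2."""
    return [sx @ x for sx in sigma_xs for x in xs]

def entropy(M):
    """Von Neumann entropy of a (numerically Hermitian) density matrix."""
    ev = np.linalg.eigvalsh((M + M.conj().T) / 2)
    ev = ev[ev > 1e-14]
    return float(-np.sum(ev * np.log(ev)))

# Hypothetical qubit state and partition of unity with sum_j x_j^* x_j = 1.
rho = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)
P = np.array([[1, 0], [0, 0]], dtype=complex)
Q = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)
xs = [np.sqrt(0.5) * P, np.sqrt(0.5) * (np.eye(2) - P),
      np.sqrt(0.5) * Q, np.sqrt(0.5) * (np.eye(2) - Q)]

# Dynamics sigma(x) = U^* x U with a fixed rotation U (an assumption).
theta = 0.3
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]], dtype=complex)
sigma_xs = [U.conj().T @ x @ U for x in xs]

M1 = correlation_matrix(rho, xs)
M2 = correlation_matrix(rho, compose(sigma_xs, xs))
print(entropy(M1), entropy(M2))
```

The ordering $\sigma(x_i)x_j$ matters: with it, $\sum_{i,j}(\sigma(x_i)x_j)^*(\sigma(x_i)x_j) = \sum_jx_j^*x_j = 1$, so the composed family is again a partition of unity in the sense of (4.11).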
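As dynamical entropy we define

$$h(X) = \limsup_{m\to\infty}\frac1m H(\sigma^{m-1}X\circ\dots\circ\sigma X\circ X) = \limsup_{m\to\infty}\frac1m S\big(\rho[\sigma^{m-1}X\circ\dots\circ\sigma X\circ X]\big),$$
$$h(\sigma) = \sup_Xh(X). \tag{4.13}$$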
But here a problem arises: if we do not restrict the partitions $X$ to a suitable subalgebra $B$ of $A$, we lose control of the dynamical entropy. For instance, if we take as C*-algebra the Cuntz algebra [9] with $U_i^*U_j = \delta_{ij}$ and $U_jU_j^* = P_j$, and use the $\{U_i\}$ for $X$, then the identity map has infinite dynamical entropy. If, for instance, we consider the shift on the lattice system B), then we can choose as natural subalgebra $B$, dense in $A$, the algebra of strictly local operators. Some weakening of this restriction is possible, and this is of course necessary if we want to apply the theory to time evolutions with interaction, where local operators immediately delocalize. But this delocalization decreases exponentially fast in space [26]; therefore $B$, consisting of exponentially localized operators, should be sufficient to define a dynamical entropy for the time evolution in the sense of Alicki and Fannes. As an example we consider the shift on the lattice. Then

$$h^{AF}_\omega(\sigma) = s(\omega) + \ln d, \tag{4.14}$$

where $s(\omega)$ is the entropy density corresponding to the state $\omega$ and $d$ is the dimension of the full matrix algebra at each lattice point.
d) As a last proposal for the definition of a dynamical entropy we describe the one that, in fact, has the longest history: it was first proposed by Connes and Størmer for type II$_1$ algebras [27] and then generalized in [28] and [29] to general situations. We present the definition given by Sauvageot and Thouvenot [30], which they showed to be equivalent to the ones in [27] and [29] for hyperfinite algebras. In their definition it is most evident that this dynamical entropy measures how far the quantum system is related to a classical K-system. In addition, concepts developed in this framework also find application in quantum information theory.
Definition (the entropy defect of an abelian model): Let $(A,\omega)$ be a nonabelian algebra with state $\omega$. Let $(B,\mu)$ be an abelian algebra with state $\mu$ that is coupled to $A$ by a state $\lambda$ over $A\otimes B$ satisfying $\lambda|_A = \omega$, $\lambda|_B = \mu$. Its entropy defect is defined as

$$H_\lambda(B|A) = H_\mu(B) - S(\omega\otimes\mu|\lambda)_{A\otimes B}. \tag{4.15}$$

Theorem: The entropy of the state $\omega$ is given by

$$S_A(\omega) = \sup_{(B,\mu,\lambda)}\big[H_B(\mu) - H_\lambda(B|A)\big]. \tag{4.16}$$

In fact, there exist many abelian models that optimize the above expression: every decomposition of $\omega$ into pure states, $\omega = \sum_{i=1}^n\mu_i\omega_i$, can be interpreted as an abelian model with $B = \{P_1,\dots,P_n\}$, $\mu(P_i) = \mu_i$ and $\lambda(P_i\otimes A) = \mu_i\,\omega_i(A)$.
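For a matrix algebra the statement after (4.16) can be checked directly. The sketch reads $S(\omega\otimes\mu|\lambda)$ as the Umegaki relative entropy of $\lambda$ with respect to $\omega\otimes\mu$ (an assumption about the notation); since both states are block diagonal in $i$, it reduces to $\sum_i\mu_iS(\omega_i\|\omega)$. A decomposition of a qubit state into pure states then reproduces $S(\omega)$, while a decomposition into mixed states gives a smaller value. The chosen states are illustrative.

```python
import numpy as np

def S(rho):
    """Von Neumann entropy -Tr rho ln rho."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-14]
    return float(-np.sum(ev * np.log(ev)))

def H(mu):
    """Shannon entropy of the weights, i.e. H_mu(B) for B = {P_1, ..., P_n}."""
    mu = np.asarray(mu, dtype=float)
    mu = mu[mu > 0]
    return float(-np.sum(mu * np.log(mu)))

def log_pd(b):
    """Matrix logarithm of a positive definite Hermitian matrix."""
    ev, V = np.linalg.eigh(b)
    return V @ np.diag(np.log(ev)) @ V.conj().T

def rel_entropy(a, b):
    """Umegaki relative entropy Tr a (ln a - ln b); b must be positive definite."""
    return -S(a) - float(np.real(np.trace(a @ log_pd(b))))

def entropy_defect(mu, states):
    """H_mu(B) - S(lambda || omega (x) mu) for lambda(P_i (x) A) = mu_i omega_i(A)."""
    rho = sum(m * s for m, s in zip(mu, states))
    return H(mu) - sum(m * rel_entropy(s, rho) for m, s in zip(mu, states))

def proj(v):
    """Projection onto the normalized vector v."""
    v = np.asarray(v, dtype=complex)
    v = v / np.linalg.norm(v)
    return np.outer(v, v.conj())

mu = [0.5, 0.3, 0.2]
pure = [proj([1, 0]), proj([1, 1]), proj([1, 1j])]
rho = sum(m * s for m, s in zip(mu, pure))
print(S(rho), H(mu) - entropy_defect(mu, pure))       # pure components: optimal

mixed = [0.8 * s + 0.1 * np.eye(2) for s in pure]     # same weights, mixed components
rho_m = sum(m * s for m, s in zip(mu, mixed))
print(S(rho_m), H(mu) - entropy_defect(mu, mixed))    # strictly below S(rho_m)
```

The quantity $H_B(\mu) - H_\lambda(B|A)$ computed here is the Holevo quantity $S(\omega) - \sum_i\mu_iS(\omega_i)$, which makes it plain why pure decompositions attain the supremum in (4.16).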
Due to quantum effects the entropy is not monotonically increasing if we consider an increasing sequence $A_n\subset A_m$, $n<m$. But monotonicity can be regained if we change the definition:
Definition: Let $A\subset C$ and let $(B,\mu,\lambda)$ be an abelian model for $(C,\omega)$. Then

$$H_{\omega,C}(A) = \sup_{(B,\mu,\lambda)}\big[H_B(\mu) - H_\lambda(B|A)\big]. \tag{4.17}$$
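This suggests the following definition of a dynamical entropy.
Definition: Given a quantum dynamical system $(A,\sigma,\omega)$, its dynamical entropy is

$$h_\omega(\sigma) = \sup\big[H_\mu(P|P_-) - H(P|P_-\otimes A)\big], \tag{4.18}$$

where $P$ is a finite partition in $B$ with past $P_-$, and the supremum is taken over all dynamical abelian models $(B,\mu,\theta)$ with $\mu\circ\theta = \mu$ and coupling $\lambda$ satisfying $\lambda\circ(\sigma\otimes\theta) = \lambda$. Equality holds between $h_\omega(\sigma)$ and

$$\sup\big[H_\mu(P|P_-) - H(P|A)\big]. \tag{4.19}$$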
This is based on

$$H(P|P_-) = \lim_{k\to\infty}\frac1kH\Big(\bigvee_{j=0}^{k-1}\theta^{j}P\Big),\qquad H(P|P_-\otimes A) = \lim_{k\to\infty}\frac1kH\Big(\bigvee_{j=0}^{k-1}\theta^{j}P\,\Big|\,A\Big),$$

and on taking $\bigvee_j\theta^{j}P$ as a new abelian model. It is evident that one can also define the dynamical entropy with respect to a subalgebra $C\subset A$,

$$h_\omega(\sigma,C) = \sup\big[H_\mu(P|P_-) - H(P|P_-\otimes C)\big], \tag{4.20}$$

an expression that we need if we want to discuss 2.C) in the framework of quantum systems. Notice that (4.19) cannot in general be replaced by an expression like (4.18). The main task now is to find abelian models. This can be done very similarly to the calculation of the entropy of a state.

Theorem: Assume a state $\omega$ is decomposed as

$$\omega = \sum_{i_1,\dots,i_n}\mu_{i_1,\dots,i_n}\,\omega_{i_1,\dots,i_n}. \tag{4.21}$$
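Define

$$\mu^{(k)}_{i_k} = \sum_{i_l,\ l\neq k}\mu_{i_1,\dots,i_n},\qquad\mu^{(k)}_{i_k}\,\omega^{(k)}_{i_k} = \sum_{i_l,\ l\neq k}\mu_{i_1,\dots,i_n}\,\omega_{i_1,\dots,i_n},$$

and consider, with $\eta(t) = -t\ln t$, the functional

$$\sum_{i_1,\dots,i_n}\eta(\mu_{i_1,\dots,i_n}) - \sum_{k=1}^n\sum_{i_k}\eta(\mu^{(k)}_{i_k}) + \sum_{k=1}^n\sum_{i_k}\mu^{(k)}_{i_k}\,S\big(\omega\,|\,\omega^{(k)}_{i_k}\big)_{\sigma^{k-1}C}, \tag{4.22}$$

whose supremum over all decompositions (4.21) is the entropy $H(C,\sigma C,\dots,\sigma^{n-1}C)$. Consider now decompositions

$$\omega = \sum_{i_{-r},\dots,i_n}\mu_{i_{-r},\dots,i_n}\,\omega_{i_{-r},\dots,i_n} \tag{4.23}$$

adapted to the algebras $\sigma^{k}C$, $-r\leq k\leq n$. In the limit $\lim_{r\to\infty}\lim_{n\to\infty}$ (i.e. we have to start with a sufficiently large decomposition) the weights $\{\mu_{i_k}\}$ converge to an abelian model, and all abelian models can be obtained in this way. The detailed proof of this statement can be found in [30]. This theorem enables us to find lower bounds for the dynamical entropy. Together with the fact that

$$\frac1kH(C,\sigma C,\dots,\sigma^{k-1}C)\leq\frac1kS_\omega(\tilde C) + O(\delta), \tag{4.24}$$

if $C\subset\tilde C$ holds approximately in the sense of (4.5) or (4.8), we also have the upper bound [29]

$$h(\sigma)\leq\sup_C\lim_{k\to\infty}\frac1kH(C,\sigma C,\dots,\sigma^{k-1}C), \tag{4.25}$$

so that in some cases we can really evaluate the dynamical entropy.

5
Some General Considerations on Abelian Models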
As already mentioned, the entropy of a state over a quantum system can be calculated via an abelian model. For a matrix algebra this viewpoint may look superficial, but it has found an important application in the theory of entangled states, where subalgebras $A\otimes B\subset C$ are considered and entanglement expresses that a pure state over $C$ will not be pure as a state over $A$ resp. $B$. This entanglement can be used for quantum communication, and the amount of this usefulness is expressed by the entanglement of formation [31] (compare (4.17))

$$E(\omega,A) = S(\omega)_A - H_\omega(A) = \min\sum_i\mu_i\,S(\omega_i)_A, \tag{5.1}$$

where the minimum is taken over decompositions $\omega = \sum_i\mu_i\omega_i$.
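For a pure state $\omega$ over $C = A\otimes B$ only the trivial decomposition exists, so (5.1) reduces to the entropy of the restriction, $E(\omega,A) = S(\omega)_A$. A minimal sketch for two qubits, with the partial trace written out by hand (the state is an illustrative choice and the function names are ad hoc):

```python
import numpy as np

def S(rho):
    """Von Neumann entropy -Tr rho ln rho."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-14]
    return float(-np.sum(ev * np.log(ev)))

def reduced_state_A(psi, dA, dB):
    """Restriction of |psi><psi| on A (x) B to A, i.e. the partial trace over B."""
    M = np.asarray(psi, dtype=complex).reshape(dA, dB)
    return M @ M.conj().T

def entanglement_of_pure_state(psi, dA, dB):
    """E(omega, A) = S(omega)_A for a pure state omega, cf. (5.1)."""
    return S(reduced_state_A(psi, dA, dB))

theta = np.pi / 8
psi = np.array([np.cos(theta), 0, 0, np.sin(theta)])   # cos(t)|00> + sin(t)|11>
E = entanglement_of_pure_state(psi, 2, 2)
p = np.cos(theta) ** 2
print(E, -(p * np.log(p) + (1 - p) * np.log(1 - p)))
```

For mixed states over $C$ the minimization in (5.1) is genuinely nontrivial and is not attempted here.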
Expressed in terms of an abelian model we can also write

$$H_\omega(A) = \sup_{\lambda,B_0}S(\omega\otimes\mu|\lambda)_{A\otimes B_0}, \tag{5.2}$$

where $\lambda$ is a state over $B_0\otimes C$. We have the following inequality: let $\omega$ as a state over $C$ be written in the GNS representation, $\omega(C) = \langle\Omega|C|\Omega\rangle$, and let $C'$ be the commutant in this representation. Then

$$S(\omega\otimes\omega|\omega)_{A\otimes C'_0}\leq H_\omega(A)\leq S(\omega\otimes\omega|\omega)_{A\otimes C'}, \tag{5.3}$$

with $C'_0$ any abelian subalgebra of $C'$. A maximal abelian subalgebra of $C'$ gives a lower bound to the entropy, and in some cases it even is the best
abelian model (compare [32] and the explicit results in [33] for estimates of $E$, i.e. without dynamics), but in other examples [32], see also 6.E below, it is evidently too small. If, in addition, the abelian model has to carry a dynamics, the question arises when the abelian model can be imbedded into the commutant (or whether, by the natural isomorphism, the algebra itself contains a sufficiently large time invariant abelian subalgebra). Here we have the following results.

Theorem [34]: Assume that $(A,\sigma,\omega)$ is a dynamical system and $\omega$ a tracial state. Assume that the analogue of 2.C) ("entropic K-system") is satisfied, i.e.

$$\lim_{n\to\infty}H(\sigma^{n}B\,|\,B) = H(B)\qquad\forall\text{ finite-dimensional }B\subset A.$$

Then

$$\text{st-}\lim_{n\to\infty}[A,\sigma^{n}B] = 0\qquad\forall A,B\in A. \tag{5.4}$$
Proof: It suffices to choose $B = \{P\}$ for all projection operators $P$ in $A$. Then $\{P\}$ is its own best abelian model in the calculation of $H(B)$. Refinements of the models $\{P\},\dots,\{\sigma^nP\}$ have to be used to calculate $H(\sigma^nB|B)$ (compare the theorem (4.21)–(4.23)). But such refinements are only possible if $P$ and $\sigma^nP$ nearly commute.

The theorem was generalized to other states [34], but with the restriction that we had to keep control over sufficiently many optimal abelian models. We expect that these restrictions can be removed by a harder analysis. Another result on footprints of commutativity is the following.

Theorem [35]: Assume that in the calculation of the dynamical entropy there exists an optimal abelian model, i.e.

$$h(\sigma) = \sup\,(4.19) = \max_{\lambda,B,\theta}\,(4.19). \tag{5.5}$$

Then the algebra $A$ contains an abelian subalgebra $A_0$ on which $\sigma$ acts as an automorphism. Notice that this does not imply that this abelian subalgebra already is the optimal abelian model.

6
Abelian Models for Algebraic K-Systems
In the following we discuss the examples of algebraic K-systems given in Sect. 3 and how far they allow us to find good abelian models.
A) In this model of a quantized Bernoulli system, which completely factorizes, the obvious choice of the abelian model that gives the correct result is

$$\bigotimes_{n\in\mathbb{Z}}(\bar B_0)_n,$$

where $\bar B_0$ is the abelian algebra that commutes with $\rho$ and describes the measurements with maximal certainty.

B) For the lattice system, for which the state no longer factorizes, it does not suffice to pick a suitable abelian subalgebra at every lattice point. This provides an abelian model, but not an optimal one. According to (4.25) the entropy density is an upper bound for the dynamical entropy [29], and it seems very plausible that the dynamical entropy should not be smaller. To our knowledge no general proof is available, but for the states of physical interest equality has been shown. Already in [29] equality was shown under a compatibility relation between space translation and modular automorphism. In practice, however, it is difficult to check whether this compatibility relation holds. For quasifree states this is possible and was done in [36]. There an abelian subalgebra was selected for increasing size of the tensor product. This subalgebra delocalizes, but only to such an extent that the convergence of these subalgebras to an abelian model giving the desired result can be controlled. In [37] equilibrium states over lattice systems as in [9] were considered, and a decomposition was offered that in the limit gave the desired result. [38] applied the affinity of the dynamical entropy to control these limits and to allow their interchange. His ideas are generalized in [39], giving the following result: if one assumes that the shift $\sigma$ is asymptotically abelian (i.e. we consider not only lattice algebras but some generalization in the framework of AF-algebras) and considers a dynamics given by a sequence of local Hamiltonians, then the thermodynamic limit of the equilibrium states exists, and these states satisfy the KMS property with respect to the dynamics. For these states the entropy density and the dynamical entropy of the shift coincide. The dynamical entropy of the shift can be used in a thermodynamic variational principle. This variational principle is satisfied exactly by states that are KMS with respect to the time evolution.
The maximal dynamical entropy is achieved by the tracial state and coincides in this state with Voiculescu's topological entropy $ht$ (compare (4.9)). In all these examples the abelian model is constructed by considering the sequence $\rho_\Lambda = e^{-\beta H_\Lambda}$ and the corresponding minimal projectors in (4.21)–(4.23).

There exists another possibility to construct space translation invariant states on the lattice, namely the method of correlated states. We start again with our chain $A = \bigotimes_nB_n$. In addition, we choose an algebra $C$ (we restrict to finite-dimensional ones) and consider some completely positive map $F: C\otimes B\to C$, which we can write as $f_b(c)$, and we demand $f_1(1) = 1$. Let $\varphi$ be a state over $C$ satisfying $\varphi\circ f_1 = \varphi$. Then we define

$$\omega(b_1\otimes\dots\otimes b_n) = \varphi\big(f_{b_1}\circ\dots\circ f_{b_n}(1)\big),$$

where $b_i$ is an operator at lattice point $i$ (many of them can be 1). It can be checked that in this way we obtain a translation invariant state. If, e.g., $f_b(1) = \omega(b)\cdot 1$, then we obtain a state that is clustering. If we want to have nontrivial correlations between nearest neighbours, we have to choose another $f$, but this enforces correlations with other neighbours as well. Space clustering is encoded in the convergence properties of $f^{(n)}$ [40]. Now the construction of an abelian model is offered by a decomposition of $F$ into finer completely positive maps. Convergence properties in the construction of abelian models, as needed in (4.23), are now controlled by convergence properties of $F$ (which acts over finite-dimensional algebras) instead of convergence properties of space correlations. Again we have to choose $B_n$ sufficiently large, i.e. combine sufficiently many lattice points. With appropriate estimates it was shown [41] that for all finitely correlated states ($C$ of finite dimension) the dynamical entropy and the entropy density of the states so constructed coincide.

C) The Fermi Algebra
If we concentrate on the even subalgebra $A^e$ of the CAR algebra, i.e. the algebra consisting of even polynomials in creation and annihilation operators, this is just a special AF-algebra that is asymptotically abelian, and therefore the results of [39] guarantee that for equilibrium states the dynamical entropy of the space translation and the entropy density coincide. If, in addition, we apply the theorem [29] $h(\sigma^n) = |n|\,h(\sigma)$, then

$$h_{A^e}(\sigma^n)\leq h_A(\sigma^n) = \sup\big[H_\mu(P|P_-) - H(P|P_-\otimes A)\big]\leq\sup\big[H_\mu(P|P_-) - H(P|P_-\otimes A^e)\big] + \ln 2 = h_{A^e}(\sigma^n) + \ln 2, \tag{6.1}$$

where the past $P_-$ refers to $\sigma^n$, and dividing by $n$ shows that $h_{A^e}(\sigma) = h_A(\sigma)$. Nevertheless the noncommutativity of the algebra has consequences:

Theorem:
If $\omega = \omega\circ\sigma$, then $\omega(A_o) = 0$ for all odd elements $A_o\in A$.

Proof:

$$|\omega(A_o)|^2 = \Big|\omega\Big(\frac1N\sum_{n=0}^{N-1}\sigma^nA_o\Big)\Big|^2\leq\frac{1}{2N^2}\sum_{n,m=0}^{N-1}\omega\big([\sigma^nA_o^*,\sigma^mA_o]_+\big). \tag{6.2}$$

The anticommutator vanishes for strictly local odd operators except for $|n-m| = O(1)$. Therefore $|\omega(A_o)|^2\leq\frac{c}{N^2}N\to 0$.

We notice that noncommutativity reduces the possibilities for invariant states. Concerning the question of entropic K-systems (2.2): for all even subalgebras $\lim_nH(\sigma^nB^e|B^e) = H(B^e)$, but for a typical odd subalgebra $A_0 = \{a_0 + a_0^*\}''$ we have $h(\sigma,A_0) = 0$.

D) For the stationary quantum Markov chain again an abelian model can be constructed that gives the optimal result, i.e. the entropy density [10]. The main idea in the proof is that apart from the algebra $A$ we can concentrate on the algebra $C$ and construct an optimal decomposition inside this algebra. Therefore in the limit of these decompositions we find an abelian model with vanishing entropy defect $H(P|P_-\otimes A)$.
As already mentioned, the automorphism $T$ (as in our special example) will in general not be asymptotically abelian, and therefore the system fails to be an entropic K-system. Similarly as for the Fermi system we can introduce the gauge automorphism

$$\gamma\sigma_x = -\sigma_x,\qquad\gamma\sigma_y = -\sigma_y,\qquad\gamma\sigma_z = \sigma_z.$$

The elements invariant under this gauge automorphism are asymptotically abelian under space translation, because they become localized in $1\otimes C$. Therefore again the result corresponds to the results in [39], though the states are constructed in different ways.

E) The last example we want to discuss in this framework is the Price-Powers shift. We have already considered the special case $g(i) = 1$, the Fermi algebra (3.E b). For $g(1) = 1$, $g(l) = 0$ otherwise, the representation (3.E a) already indicates how to construct an abelian model: for $\sigma^2$ we are dealing with a factorizing quantum Bernoulli shift with the obvious choice of an abelian model $B_{\sigma^2}$. Therefore it is easy to construct the abelian model for $\sigma$: we can consider $B_{\sigma^2}$ as a subalgebra of $A$, therefore $\sigma B_{\sigma^2}$ is again an abelian subalgebra, and for the shift $\sigma$ we consider the abelian model $B_{\sigma^2}\otimes\sigma B_{\sigma^2}$ with the obvious coupling. Notice that here we have presented an example where the entropy defect of the abelian model does not vanish, i.e. the abelian model is not a subalgebra of the system. For arbitrary $g$ we will in general fail to find an abelian model. We only have to vary the proof of (6.2): if $g$ is sufficiently irregular, so that for every monomial $w_I$ in the $e_i$, $i\in I$, the anticommutator $[w_I^*,\sigma^kw_I]_+$ vanishes for all but a vanishing fraction of the $k$, then

$$|\omega(w_I)|^2 = \Big|\frac1N\sum_{k=0}^{N-1}\omega(\sigma^kw_I)\Big|^2\leq\frac{1}{2N^2}\sum_{k,l=0}^{N-1}\omega\big([\sigma^kw_I^*,\sigma^lw_I]_+\big) = o(1)\quad(N\to\infty), \tag{6.3}$$

and $\omega(w_I)$ has to vanish.
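The relations (3.7)–(3.8), and the vanishing of the tracial state on all nontrivial monomials that enters the argument above, can be checked in finite volume. The sketch assumes a Pauli-string representation that is not taken from the text, $e_i = X_i\prod_{k\geq 1,\,g(k)=1}Z_{i+k}$ on $n$ qubits (for $g(k) = 1$, $k\neq 0$, this is a Jordan-Wigner-type representation; for $g(\pm 1) = 1$ only, it differs from the representation given in example a) of Section 3.E but satisfies the same relations). It verifies $e_i^2 = 1$, $e_ie_j = (-1)^{g(j-i)}e_je_i$ and $\tau(e_I) = 0$ for every nonempty $I$.

```python
import numpy as np
from functools import reduce
from itertools import combinations

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def price_powers_generators(g, n):
    """Hypothetical finite-volume representation of e_0, ..., e_{n-1} on n qubits:
    e_i = X_i * prod_{k >= 1, g(k) = 1} Z_{i+k}, sites beyond n - 1 are dropped."""
    gens = []
    for i in range(n):
        ops = [I2] * n
        ops[i] = X
        for site in range(i + 1, n):
            if g(site - i):
                ops[site] = Z
        gens.append(reduce(np.kron, ops))
    return gens

def check_relations(gens, g):
    """e_i^2 = 1 and e_i e_j = (-1)^{g(j - i)} e_j e_i for all i < j."""
    one = np.eye(gens[0].shape[0])
    for i, j in combinations(range(len(gens)), 2):
        sign = (-1) ** g(j - i)
        if not np.allclose(gens[i] @ gens[j], sign * gens[j] @ gens[i]):
            return False
    return all(np.allclose(e @ e, one) for e in gens)

def tracial_state(A):
    """Normalized trace tau(A) = Tr(A) / dim."""
    return np.trace(A) / A.shape[0]

def monomial(gens, I):
    """e_I = e_{i_1} ... e_{i_k} with i_1 < ... < i_k, cf. (3.8)."""
    return reduce(np.matmul, [gens[i] for i in sorted(I)], np.eye(gens[0].shape[0]))

g_a = lambda k: 1 if abs(k) == 1 else 0   # g(1) = 1, g(k) = 0 otherwise
g_b = lambda k: 1 if k != 0 else 0        # g(k) = 1 for all k != 0 (CAR case)
n = 5
for g in (g_a, g_b):
    gens = price_powers_generators(g, n)
    values = [abs(tracial_state(monomial(gens, I)))
              for r in range(1, n + 1) for I in combinations(range(n), r)]
    print(check_relations(gens, g), max(values))
```

The trace vanishes because the lowest site of $I$ always carries an unpaired $X$; this finite-volume check says nothing about the infinite-volume uniqueness results quoted below, which depend on the irregularity of $\{g\}$.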
In fact, it was shown in [42] that it is possible to construct a sequence $\{g\}$ such that (6.3) holds for all $w_I$, and therefore the only invariant state is the tracial state. In [43] we proved that (6.3) holds with probability one on the set of possible sequences $\{g\}$, and again we have a unique invariant state. This argument can be generalized to every coupling to abelian models; therefore every coupling has to be trivial, and the dynamical entropy in the sense of [29] resp. [30] vanishes. The Price-Powers shift was also studied in the context of Voiculescu's dynamical entropy and of the Alicki-Fannes entropy [23,44]. Here the increasing property is the dominant feature. We obtain

$$ht(\sigma) = \tfrac12\ln 2,\qquad h^{AF}(\sigma) = \ln 2, \tag{6.4}$$

independently of the special sequence $\{g\}$. If we return to our remark that for classical dynamical systems the dynamical entropy describes how information increases but at the same time becomes more and more irrelevant, we notice that the Voiculescu and the Alicki-Fannes entropies concentrate on the fact that information increases, whereas the entropy of [29] is sensitive to the extent to which information becomes irrelevant.

7
Continuous K-Systems
So far we have concentrated on discrete dynamics. But obviously the discrete group of translations $\mathbb{Z}$ can be replaced by $\mathbb{R}$ without changing much of the definitions. In particular, due to the linearity of the dynamical entropy (which is proven for [18] and [29]),

$$h(\sigma^n) = |n|\,h(\sigma), \tag{7.1}$$

also for the continuous group $\mathbb{R}$ we can choose the subgroup $a\mathbb{Z}$ and calculate the dynamical entropy (for all possible definitions) for this subgroup. It can be shown that the result, normalized per unit time, is independent of the scaling parameter $a$. The definition of an algebraic quantum K-system also applies to a continuous group. Only in this case the amount of increase cannot be described by the index $[A_t:A_0]$: its logarithm is either zero or infinite, because $\ln[A_t:A_0] = n\ln[A_{t/n}:A_0]$ and $[A_s:A_0]$ is either 1 or $\geq 2$ [14].
This remark shows that a continuous quasifree evolution over a Fermi lattice system ($\sigma_\alpha a(f) = a(e^{i\alpha p}f)$, $\alpha\in\mathbb{R}$) can have positive dynamical entropy but cannot correspond to a continuous algebraic K-system: $\ln[A_t:A_0] = ht(\sigma_t)$, and $ht(\sigma_t) = h_\tau(\sigma_t)$ in the tracial state (compare [39]). This leads to a contradiction if $h_\tau(\sigma_t)$ is finite. A prototype of a continuous K-system is given in relativistic quantum field theory: the wedge algebra [45]. Consider the algebra $A_W = \{\dots,\ \langle\Omega|\cdot|\Omega\rangle\}$.
c) $|\Omega\rangle$ is cyclic and separating for $A_W$. Therefore it defines a modular (KMS) automorphism, and this automorphism coincides with the geometric action of the boost $b_W$. With $\{A_W, T(1)A_W, b_W, \langle\Omega|\cdot|\Omega\rangle\}$ we obtain a new K-system, where the K-automorphism is now the modular automorphism $\mathrm{ad}\,b_W = \mathrm{ad}\,e^{iBt}$, and $T(x)$ acts as an endomorphism on $A_W$. The generators satisfy

$$[H_W,L_W] = iL_W. \tag{7.2}$$

These relations can be generalized to the following theorem.

Theorem: Let $\{A,A_0,\tau_t,\omega\}$ be a modular K-system, i.e. $\tau_t$ the modular automorphism of $A$ and $\tau_tA_0\supset A_0$ for $t>0$.
a) Then the GNS vector $|\Omega\rangle$ implementing $\omega$ is cyclic and separating both for $A$ and for $A_0$.
b) Let $\tau_t$ be implemented by $e^{iHt}$, $e^{iHt}|\Omega) = |\Omega)$. Let $\tau_t^0$ be the modular automorphism of $A_0$ implemented by $e^{iH^0 t}$ with $e^{iH^0 t}|\Omega) = |\Omega)$. Then $G := H^0 - H$ is well defined, $G \ge 0$, and $e^{iGs}$, $s \ge 0$, implements an endomorphism on $A$ with
$$e^{iG} A\, e^{-iG} = A_0, \qquad [H, G] = iG. \tag{7.3}$$
The proof is based on the analyticity properties of the modular operator, taking appropriate care of domain properties [46, 47]. We notice that for quantum modular K-systems endomorphisms arise in a natural way that satisfy the Anosov commutation relations and therefore, via Lyapunov exponents, yield the clustering properties of the automorphism:
Theorem: Let $\{A, \tau(t), \sigma(s), \omega\}$ be an Anosov system with $\tau$ the K-automorphism and $\sigma$ the Anosov endomorphism. Take $\chi_\Lambda$ to be the characteristic function of $(a, \infty)$ for some $a > 0$. Choose $A$ and $B \in A$ such that i) $A\Omega \in \mathcal{D}(G^r)$ for some $r > 0$, ii) $\chi_{\Lambda'}(G)B\Omega = 0$, where $\Lambda'$ denotes the complement of $\Lambda$. As a consequence, $(\Omega|B|\Omega) = 0$. Then
$$|\omega(A\, \tau_t B)| \le e^{-tr} a^{-r}\, \|B\Omega\|\, \|G^r A\Omega\|. \tag{7.4}$$
We refer to [1] and [48]. As for discrete quantum K-systems, we may ask whether the dynamical entropy is positive and whether nontrivial models exist. Again, no general result is available. On the basis of quasifree evolution [49] we can construct models for fermions and bosons that are modular K-systems with positive dynamical entropy. But there also exists a q-deformed quasifree modular system [50]. Here the past algebra has trivial relative commutant, and therefore the algebra does not contain any subalgebra on which the dynamics acts asymptotically abelian, which according to [34] seems to be a requirement for the construction of abelian models.
8
Mixing Properties Without Algebraic K-Property
As already mentioned, up to now no strategy is available to construct, for a given quantum dynamical system, a subalgebra that satisfies the K-property. A model for which it is still undecided whether we are dealing with an algebraic K-system is the rotation algebra [51].
Definition:
The rotation algebra $A_\alpha$ is generated by unitary operators $U$, $V$ with
$$UV = e^{i\alpha}\, VU \tag{8.1}$$
for some $\alpha \in [0, 2\pi)$. This algebra arises naturally in a physically motivated example: consider a free particle in a constant magnetic field, confined to two dimensions. The particle then moves on Larmor orbits. In the thermodynamic limit these Larmor orbits can be occupied up to a precise filling factor [52]. This thermodynamic limit is most easily achieved by confining the particles in an additional harmonic potential whose strength goes to zero [53]. Another method, more tailor-made for studying electric currents, uses periodic boundary conditions. The algebra is then generated by $e^{i\alpha v_x}$, $e^{i\beta v_y}$, $e^{inx}$, $e^{imy}$ with
$$e^{i\alpha v_x} e^{inx} = e^{in(x+\alpha)} e^{i\alpha v_x}, \qquad e^{i\beta v_y} e^{imy} = e^{im(y+\beta)} e^{i\beta v_y}, \qquad e^{i\alpha v_x} e^{i\beta v_y} = e^{i\alpha\beta B} e^{i\beta v_y} e^{i\alpha v_x}, \tag{8.2}$$
with $B$ the magnetic field orthogonal to the plane. All other commutators vanish. If we introduce
$$e^{in\xi} = \exp\left[in\left(x - \tfrac{1}{B} v_y\right)\right], \qquad e^{im\eta} = \exp\left[im\left(y + \tfrac{1}{B} v_x\right)\right],$$
then the algebra splits into $\{e^{i\alpha v_x}, e^{i\beta v_y}\} \otimes \{e^{in\xi}, e^{im\eta}\}$, with
$$e^{im\eta} e^{in\xi} = e^{inm/B}\, e^{in\xi} e^{im\eta}.$$
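Therefore the rotation algebra with $\alpha = 1/B$ describes the algebra of the center of the Larmor precession. For $A_\alpha$ there exists a representation on $L^2(T^2)$,
$$\pi(U_\alpha) = \exp\left[i\left(x + \tfrac{\alpha}{2} p_y\right)\right], \qquad \pi(V_\alpha) = \exp\left[i\left(y - \tfrac{\alpha}{2} p_x\right)\right], \tag{8.3}$$
where $p_x$, $p_y$ are the momentum operators $\frac{1}{i}\frac{\partial}{\partial x}$, $\frac{1}{i}\frac{\partial}{\partial y}$ with periodic boundary conditions on the torus. For $|\Omega) = |1)$, the constant function on the torus,
$$\pi(U_\alpha)|\Omega) = e^{ix}, \qquad \pi(V_\alpha)|\Omega) = e^{iy}, \tag{8.4}$$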
independent of the rotation parameter $\alpha$. On $A_\alpha$ we have the following automorphisms: $\Theta_T(W_\alpha(z)) = W_\alpha(Tz)$, $z = (n, m)$, with
$$T = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad ad - bc = 1.$$
$\sigma^{(1)}$ and $\sigma^{(2)}$ describe currents and are therefore of physical relevance. $\Theta_T$ describes a dilation in $\mathbb{R}^2$ and reduces to a map on the torus $T^2$ only for discrete values and discrete directions of the dilation. A physical description of $\Theta_T$ can be given if it describes a sudden periodic push to the particle. Whereas $\sigma^{(1)}$ and $\sigma^{(2)}$ do not have good mixing behaviour, $\Theta_T$ inherits all mixing properties from the classical torus, due to (8.4):
$$(\Omega|\pi(W_\alpha(z))\,\Theta_T\,\pi(W_\alpha(z))|\Omega) = (\Omega|\pi(W_0(z))\,\Theta_T\,\pi(W_0(z))|\Omega). \tag{8.5}$$
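The classical mixing that $\Theta_T$ inherits can be seen on Fourier modes. The Python sketch below is an illustration under stated assumptions, not taken from the text: it uses the Arnold cat matrix $T = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$ as a hypothetical example of an integer matrix with $ad - bc = 1$ and an eigenvalue $\lambda > 1$, takes characters $e_z(x) = e^{2\pi i z\cdot x}$ on the torus with $z \in \mathbb{Z}^2$, and lets the dynamics act on frequencies as $z \mapsto Tz$ (for this symmetric $T$ the distinction between $T$ and its transpose does not matter). Characters are orthonormal, so the correlation $\langle e_z, e_{T^k z'}\rangle$ is 1 if $T^k z' = z$ and 0 otherwise; since $|T^k z'|$ grows like $\lambda^k$ for $z' \ne 0$, the correlation vanishes for all large $k$, which is the mixing property for trigonometric polynomials.

```python
import numpy as np

# Hypothetical example: Arnold cat matrix, det T = 1, eigenvalue (3 + sqrt 5)/2 > 1
T = np.array([[2, 1], [1, 1]], dtype=np.int64)

def mode_correlation(z, zp, k):
    """<e_z, e_{T^k z'}> for characters e_z(x) = exp(2 pi i z.x) on the 2-torus.

    The characters are orthonormal, so the result is 1 if T^k z' == z, else 0.
    """
    w = np.array(zp, dtype=np.int64)
    for _ in range(k):
        w = T @ w  # the dynamics acts on frequencies as z -> T z
    return int(np.array_equal(w, np.array(z, dtype=np.int64)))

# A nonzero mode never returns: its correlation with itself dies out after k = 0
print([mode_correlation((1, 0), (1, 0), k) for k in range(8)])  # [1, 0, 0, 0, 0, 0, 0, 0]
# One step maps the mode (1, 0) onto (2, 1)
print(mode_correlation((2, 1), (1, 0), 1))  # 1
```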
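But with respect to dynamical entropy the noncommutativity plays an essential role. Let $\lambda$ be the eigenvalue $> 1$ of $T$. Then
$\mathrm{ht}(\Theta_T) \le \ln\lambda$ for $\alpha$ irrational [18],
$\mathrm{ht}(\Theta_T) = \ln\lambda$ for $\alpha$ rational,
$h_{AF}(\Theta_T) = \ln\lambda$ for all $\alpha$ [55],
$h_{CNT}(\Theta_T) = \ln\lambda$ for $\alpha$ rational,
$h_{CNT}(\Theta_T) > 0$ for $\alpha$ depending rationally on $\lambda$ [57],
$h_{CNT}(\Theta_T) = 0$ in general [56].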
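For rational $\alpha = 2\pi p/q$ the relation (8.1) has a standard finite-dimensional realization by $q \times q$ clock and shift matrices, which makes the objects above easy to check numerically. The Python sketch below assumes the common convention $W(n, m) = e^{-i\alpha nm/2} U^n V^m$ for the Weyl operators (the excerpt does not define $W_\alpha$) and a hypothetical cat matrix $T$. It verifies (8.1), the Weyl relation $W(z)W(z') = e^{i\alpha\sigma(z,z')/2} W(z+z')$ with $\sigma(z, z') = nm' - mn'$, and the invariance $\sigma(Tz, Tz') = \sigma(z, z')$ for $ad - bc = 1$, which is why $z \mapsto Tz$ is compatible with the commutation relations; it also evaluates $\lambda$ and $\ln\lambda$.

```python
import numpy as np

def clock_and_shift(p, q):
    """q x q unitaries with U V = exp(i alpha) V U for rational alpha = 2 pi p / q."""
    alpha = 2 * np.pi * p / q
    U = np.diag(np.exp(1j * alpha * np.arange(q)))  # clock
    V = np.roll(np.eye(q), 1, axis=0)               # shift: V e_j = e_{j+1 mod q}
    return alpha, U, V

def weyl(U, V, alpha, z):
    """Weyl operator W(z) = exp(-i alpha n m / 2) U^n V^m for z = (n, m) (assumed convention)."""
    n, m = z
    return np.exp(-0.5j * alpha * n * m) * (
        np.linalg.matrix_power(U, n) @ np.linalg.matrix_power(V, m))

def sigma(z, zp):
    """Symplectic form sigma(z, z') = n m' - m n'."""
    return int(z[0] * zp[1] - z[1] * zp[0])

alpha, U, V = clock_and_shift(p=1, q=5)
assert np.allclose(U @ V, np.exp(1j * alpha) * (V @ U))  # relation (8.1)

# W(z) W(z') = exp(i alpha sigma(z, z') / 2) W(z + z')
z, zp = (1, 2), (3, 1)
lhs = weyl(U, V, alpha, z) @ weyl(U, V, alpha, zp)
rhs = np.exp(0.5j * alpha * sigma(z, zp)) * weyl(U, V, alpha, (z[0] + zp[0], z[1] + zp[1]))
assert np.allclose(lhs, rhs)

# Hypothetical T: the Arnold cat matrix (ad - bc = 1) preserves sigma
T = np.array([[2, 1], [1, 1]])
Tz, Tzp = T @ np.array(z), T @ np.array(zp)
assert sigma(Tz, Tzp) == sigma(z, zp)

lam = float(max(abs(np.linalg.eigvals(T))))  # the eigenvalue > 1 of T
print(round(lam, 4), round(float(np.log(lam)), 4))  # 2.618 0.9624
```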
In addition, for rational $\alpha$ it was possible to construct a subalgebra $A_0$ such that $(A, A_0, \Theta_T, \omega)$ becomes a K-system [54]. This was possible because $A$ can be regarded as a crossed product of the classical algebra on $T^2$ with a discrete translation group, and, by rather general considerations, crossed product algebras inherit under some conditions the K-structure of the underlying algebra [56]. Obviously this construction gives no hint for irrational $\alpha$. The strong dependence of the CNT dynamical entropy on $\alpha$ is based on the strong dependence of the asymptotic commutation behaviour on $\alpha$. Only if $\alpha$ and $\lambda$ are rationally dependent is the system asymptotically abelian, with the commutator converging rapidly to zero. This rapid convergence made it possible to construct an abelian model [57], using the fact that the algebra $A_\alpha$ can be embedded in, but is not, an AF-algebra. Therefore, unlike the approaches for lattice systems, the abelian model cannot be identified, up to convergence problems, with an abelian subalgebra of $A_\alpha$.
9
Time Evolution
As we have seen, in a quantum system there are many possibilities for some kind of mixing behaviour, and, unlike in the classical situation, they are not equivalent. Up to now we have concentrated on dynamics that were constructed in such a way that they should give us information on possible ergodic structures. When the dynamics is given to us by a sequence of local Hamiltonians, we have, up to now, hardly any control over the asymptotic behaviour, apart from quasifree evolution. We mention just one result: the x-y model [58] allows a transformation to a quasifree evolution. Therefore we know that it is weakly, but not strongly, asymptotically abelian. Its dynamical entropy is positive and all definitions give the same result (with the dimensional correction term for $h_{AF}$). We do not know whether it is an algebraic K-system for a discrete subset in time. It is certainly not a continuous algebraic K-system.
References
1. G.G. Emch, H. Narnhofer, G.L. Sewell, W. Thirring, Anosov Actions on Non-Commutative Algebras, J. Math. Phys. 35/11, 5582-5599 (1994).
2. M.C. Gutzwiller, Chaos in Classical and Quantum Mechanics (Springer, New York, 1990).
3. E. Bogomolny, F. Leyvraz, C. Schmit, Statistical Properties of Eigenvalues for the Modular Group, in XIth International Congress of Mathematical Physics, D. Iagolnitzer ed. (International Press, Boston, 1995), 306-323.
4. A.N. Kolmogorov, A new metric invariant of transitive systems and automorphisms of Lebesgue spaces, Dokl. Akad. Nauk 119, 861-864 (1958).
5. P. Walters, An Introduction to Ergodic Theory (Springer, New York, 1982).
6. I.P. Cornfeld, S.V. Fomin, Ya.G. Sinai, Ergodic Theory (Springer, New York, 1982).
7. H. Narnhofer, W. Thirring, Quantum K-Systems, Commun. Math. Phys. 125, 565-577 (1989).
8. C. Brukner, A. Zeilinger, Conceptual Inadequacy of the Shannon Information in Quantum Measurements, quant-ph/0006087.
9. O. Bratteli, D.W. Robinson, Operator Algebras and Quantum Statistical Mechanics I, II (Springer, Berlin, Heidelberg, New York, 1993).
10. B. Kümmerer, Examples of Markov dilations over 2 x 2 matrices, in L. Accardi, A. Frigerio, V. Gorini eds., Quantum Probability and Applications to the Quantum Theory of Irreversible Processes (Springer, Berlin, 1984), 228-244, and private communications.
11. R.T. Powers, An index theory for semigroups of *-endomorphisms of B(H) and type II_1 factors, Canad. J. Math. 40, 86-114 (1988); G.L. Price, Shifts of II_1 factors, Canad. J. Math. 39, 492-511 (1987).
12. H. Narnhofer, W. Thirring, Chaotic Properties of the Noncommutative 2-Shift, in From Phase Transition to Chaos, G. Gyorgyi, I. Kondor, S. Sasvari, T. Tel eds. (World Scientific, 1992), 530-546.
13. H. Narnhofer, W. Thirring, Clustering for Algebraic K-Systems, Lett. Math. Phys. 30, 307-316 (1994).
14. V.F.R. Jones, Index for subfactors, Invent. Math. 72, 1-25 (1983).
15. R. Longo, Simple Injective Subfactors, Adv. Math. 63, 152-171 (1987); Index of Subfactors and Statistics of Quantum Fields, Commun. Math. Phys. 130, 285-309 (1990).
16. M. Choda, Entropy of canonical shifts, Trans. Amer. Math. Soc. 334, 827-849 (1992).
17. H. Narnhofer, A. Pflug, W. Thirring, Mixing and Entropy Increase in Quantum Systems, in Symmetry in Nature, in honour of Luigi A. Radicati di Brozolo (Scuola Normale Superiore, Pisa, 1989), 597-626.
18. D.V. Voiculescu, Dynamical Approximation Entropies and Topological Entropy in Operator Algebras, Commun. Math. Phys. 170, 249-282 (1995).
19. M. Choda, A C* Dynamical Entropy and Applications to Canonical Endomorphisms, J. Funct. Anal. 173, 453-480 (2000).
20. E. Størmer, A Survey of noncommutative dynamical entropy, Oslo preprint No. 18, Dep. of Mathematics, MSC-class 46L40 (2000).
21. M. Choda, Entropy on crossed products and entropy on free products, preprint (1999).
22. K. Dykema, Topological entropy of some automorphisms of reduced amalgamated free product C* algebras, preprint (1999).
23. R. Alicki, M. Fannes, Defining Quantum Dynamical Entropy, Lett. Math. Phys. 32, 75-82 (1994).
24. R.B. Griffiths, Consistent histories and the interpretation of quantum mechanics, J. Stat. Phys. 36, 219-279 (1984).
25. M. Gell-Mann, J. Hartle, Alternative decohering histories in quantum mechanics, in Proc. of the 25th Int. Conf. on High Energy Physics, Vol. 2, K.K. Phua and Y. Yamaguchi eds. (World Scientific, Singapore, 1991), 1303-1310.
26. E.H. Lieb, D.W. Robinson, The finite group velocity of quantum spin systems, Commun. Math. Phys. 28, 251-257 (1972).
27. A. Connes, E. Størmer, Entropy of II_1 von Neumann algebras, Acta Math. 134, 289-306 (1972).
28. A. Connes, C. R. Acad. Sci. Paris 301 I, 1-4 (1985).
29. A. Connes, H. Narnhofer, W. Thirring, Dynamical Entropy of C* Algebras and von Neumann Algebras, Commun. Math. Phys. 112, 691-719 (1987).
30. J.L. Sauvageot, J.P. Thouvenot, Une nouvelle définition de l'entropie dynamique des systèmes non commutatifs, Commun. Math. Phys. 145, 411-423 (1992).
31. C.H. Bennett, D.P. DiVincenzo, J.A. Smolin, W.K. Wootters, Mixed state entanglement and quantum error correction, Phys. Rev. A 54, 3824-3851 (1996).
32. F. Benatti, H. Narnhofer, A. Uhlmann, Decomposition of quantum states with respect to entropy, Rep. Math. Phys. 38, 123-141 (1996).
33. W.K. Wootters, Entanglement of formation of an arbitrary state of two qubits, q-ph/970929.
34. F. Benatti, H. Narnhofer, Strong asymptotic abelianness for entropic K-systems, Commun. Math. Phys. 136, 231-250 (1991); Strong Clustering in Type III Entropic K-Systems, Mh. Math. 124, 287-307 (1996).
35. H. Narnhofer, An Ergodic Abelian Skeleton for Quantum Systems, Lett. Math. Phys. 28, 85-95 (1993).
36. H. Narnhofer, W. Thirring, Dynamical Theory of Quantum Systems and Their Abelian Counterpart, in On Klauder's Path, G.G. Emch, G.C. Hegerfeldt, L. Streit eds. (World Scientific, 1994), 127-145.
37. H. Narnhofer, Free energy and the dynamical entropy of space translation, Rep. Math. Phys. 25, 345-356 (1988).
38. H. Moriya, Variational principle and the dynamical entropy of space translation, Rev. Math. Phys. 11, 1315-1328 (1999).
39. S. Neshveyev, E. Størmer, The variational principle for a class of asymptotically abelian C* algebras, MSC-class 46L55 (2000).
40. M. Fannes, B. Nachtergaele, R.F. Werner, Finitely correlated states of quantum spin systems, Commun. Math. Phys. 144, 443-490 (1992).
41. R.F. Werner, private communication.
42. H. Narnhofer, E. Størmer, W. Thirring, C* dynamical systems for which the tensor product formula for entropy fails, Ergod. Th. & Dynam. Sys. 15, 961-968 (1995).
43. H. Narnhofer, W. Thirring, C* dynamical systems that are highly anticommutative, Lett. Math. Phys. 35, 145-154 (1995).
44. R. Alicki, H. Narnhofer, Comparison of Dynamical Entropies for the Noncommutative Shifts, Lett. Math. Phys. 33, 241-247 (1995).
45. H.J. Borchers, On the Revolutionization of Quantum Field Theory by Tomita's Modular Theory, ESI preprint, 160 pages, 148 references.
46. H.J. Borchers, On Modular Inclusion and Spectrum Condition, Lett. Math. Phys. 27, 311-324 (1993).
47. H.W. Wiesbrock, Half-sided Modular Inclusions of von Neumann Algebras, Commun. Math. Phys. 157, 83-92 (1993); Commun. Math. Phys. 184, 683-685 (1997).
48. H. Narnhofer, Kolmogorov Systems and Anosov Systems in Quantum Theory, review, to be published in IDAQP.
49. H. Narnhofer, W. Thirring, Realization of Two-Sided Quantum K-Systems, Rep. Math. Phys. 45, 239-256 (2000).
50. D. Shlyakhtenko, Free quasifree states, Pac. J. Math. 177, 329-368 (1997).
51. M.A. Rieffel, Pac. J. Math. 93, 415 (1981).
52. R.B. Laughlin, Quantized Hall Conductivity in Two Dimensions, Phys. Rev. B 23/10, 5632-5633 (1981).
53. N. Ilieva, W. Thirring, Second quantization picture of the edge currents in the fractional quantum Hall effect, math-ph/0010038.
54. F. Benatti, H. Narnhofer, G.L. Sewell, A Non Commutative Version of the Arnold Cat Map, Lett. Math. Phys. 21, 157-172 (1991).
55. R. Alicki, J. Andries, M. Fannes, P. Tuyls, Lett. Math. Phys. 35, 375-383 (1995).
56. H. Narnhofer, Ergodic Properties of Automorphisms on the Rotation Algebra, Rep. Math. Phys. 39, 387-406 (1997).
57. S.V. Neshveyev, On the K property of quantized Arnold cat maps, J. Math. Phys. 41, 1961-1965 (2000).
58. H. Araki, T. Matsui, Commun. Math. Phys. 101, 213-246 (1985).
SCATTERING IN QUANTUM TUBES
BÖRJE NILSSON
School of Mathematics and Systems Engineering, Växjö University, SE-351 95 VÄXJÖ, Sweden
E-mail: [email protected]
It is possible to fabricate mesoscopic structures where at least one of the dimensions is of the order of the de Broglie wavelength for cold electrons. By using semiconductors composed of more than one material, combined with a metal split-gate, two-dimensional quantum tubes may be built. We present a method for predicting the transmission of low-temperature electrons in such a tube. This problem is mathematically related to the transmission of acoustic or electromagnetic waves in a two-dimensional duct. The tube is asymptotically straight with a constant cross-section. Propagation properties for complicated tubes can be synthesised from the corresponding results for simpler tubes by the so-called Building Block Method. Conformal mapping techniques are then applied to transform the simple tube with curvature and varying cross-section into a straight tube with constant cross-section and variable refractive index. Stable formulations for the scattering operators in terms of ordinary differential equations are obtained by wave splitting, using an invariant imbedding technique. The mathematical framework is also generalised to handle tubes with edges, which are of large technical interest. The numerical method consists of using a standard MATLAB ordinary differential equation solver for the truncated reflection and transmission matrices in a Fourier sine basis. It is proved that the numerical scheme converges with increasing truncation.
1
Introduction
In the search for faster computers, critical parts are becoming smaller. Today it is possible to build mesoscopic structures where some dimensions are of the order of the de Broglie wavelength for cold electrons. Often the electron motion is confined to two dimensions. Consequently, it may be necessary, at least for some computer parts, to include quantum effects in the design process. A large number of studies devoted to such quantum effects have been carried out in recent years; a review is given by Londergan et al. [1]. Many investigations aim at understanding the physical properties of a particular quantum tube rather than at developing reliable mathematical and numerical methods that can be used in a more general context. The research has given valuable knowledge on the physical behaviour, but it also reports on the limitations of the methods used. For instance, Lin & Jaffe [2] report that a straightforward matching at the boundary of a circular bend does not converge, demonstrating the numerical problems with such a method. An ill-posedness is present in quantum tube scattering, and some type of regularisation is therefore required to avoid large errors. Often the tubes have sharp corners, to facilitate manufacturing but also to enhance quantum effects. The presence of corners with attached singularities requires special treatment. Scattering of electrons in quantum tubes, see figure 1, is theoretically related to the scattering of acoustic and electromagnetic waves in ducts. Nilsson [3] treats a general method for the acoustic transmission in curved ducts with varying cross-sections. Well-posedness, i.e. stability, is achieved in an asymptotic sense. The mathematical framework guarantees consistent results and allows for sharp corners, and a proof of numerical convergence is given. We set out to present a quantum version of the results of Nilsson [3]. In this way the reported convergence problems [2] and inconsistent mathematical results would be resolved. The paper is organised as follows. An introduction to scattering in quantum tubes is given in section 2, and a mathematical model is formulated in section 3, where the Building Block Method, a systematic method to analyse complicated tubes in terms of results for simple tubes, is also briefly described. Then in section 4 the scattering problem for the curved tube with varying cross-section and constant potential is reformulated as a scattering problem for a straight tube with a varying refractive index. The solution to this problem is presented in section 5, together with a discussion of numerical methods.
2
Tubes in quantum heterostructures
A schematic view of a quantum heterostructure is shown in figure 2, following Wu et al. [4]. Electrons are emitted from the n-type doped AlGaAs layer, migrate into the GaAs layer and stay close to the boundary with the AlGaAs layer. In this way a very narrow layer of electrons that are free to move in a plane is formed. Nearly all the electrons in this two-dimensional gas are in the same quantum state. By applying a negative potential to the metal electrodes on top of the heterostructure in figure 2, the electrons are expelled from the region below the electrodes. For relatively low voltages, the effective potential in the tube for one electron is close to the square-well potential [1]. As a consequence, the electrons in the two-dimensional gas are further restricted to a tube whose shape mirrors the gap between the two electrodes. This quantum tube links the electrons of the two two-dimensional gases on both sides of the strip formed by the electrodes.
3
Mathematical model
Consider a two-dimensional tube with interior $\Omega'$ according to figure 1. The boundary $\Gamma'$ consists of two continuous curves, $\Gamma'_+$ and $\Gamma'_-$, which are piecewise $C^2$. The upper boundary $\Gamma'_+$ can be continuously deformed to $\Gamma'_-$ within $\Omega'$. Outside a bounded region the duct is straight, with constant widths $a$ and $b$ on the left and on the right, respectively. These terminating ducts are called the left and the right terminating duct, or $L$ and $R$ for short. We use stationary scattering theory for one electron in an effective potential, with time dependence $\exp(-iEt/\hbar)$, assuming that the wave function $\psi$ satisfies the time-independent Schrödinger equation $\Delta\psi + k^2\psi = 0$ in $\Omega'$, where $k^2 = 2m^*E/\hbar^2$ and $m^*$ is the effective mass [5]. Usually $k^2$ is called energy. The effective potential is assumed to be a square well, meaning that $\psi|_{\Gamma'} = 0$. In a tube with constant cross-section the time-harmonic wave function $\psi$ can be uniquely decomposed into leftgoing and rightgoing parts by $\psi = \psi^+ + \psi^-$. The superscripts "+" and "−" indicate rightgoing (plus) and leftgoing (minus) waves, respectively. Let $\psi^+_{in}$ and $\psi^-_{in}$ be known incoming waves in the terminating ducts; $\psi^+_{in}$ is present in the left and $\psi^-_{in}$ in the right one. Let us write
$$\psi = \psi^+_{in} + R^+\psi^+_{in} + T^-\psi^-_{in} \ \text{ in } L, \tag{3.1a}$$
$$\psi = \psi^-_{in} + R^-\psi^-_{in} + T^+\psi^+_{in} \ \text{ in } R, \tag{3.1b}$$
where, for example, the last two terms in (3.1a) are minus waves, and the equation defines the left reflection mapping $R^+$ that maps the incoming wave to an outgoing one in $L$. The scattering problem consists of finding the mappings $R^+$, $T^-$, $R^-$ and $T^+$ as functions of energy for a given duct. In summary we have
$$\Delta\psi + k^2\psi = 0 \ \text{ in } \Omega', \qquad \psi = 0 \ \text{ on } \Gamma', \qquad \psi^+ = \psi^+_{in} \ \text{ in } L, \qquad \psi^- = \psi^-_{in} \ \text{ in } R. \tag{3.2}$$
There is always a solution to (3.2), and except for a discrete set of eigenenergies $k^2 = k_i^2$, $i = 1, 2, 3, \ldots$, the solution is unique [6]. When $k^2 = k_i^2$ is an eigenenergy, there exists a solution without incoming but with outgoing waves. The Building Block Method [7] or transfer matrix formalism [8] is very efficient for the solution of scattering problems. In this method a tube with a complicated geometry is divided into two parts, usually where the tube is straight. These two parts are converted to the type shown in figure 1 by extending the terminating tubes to infinity. A sub tube for the tube shown in figure 1 originates from the left part and is depicted in figure 3. The Building Block Method gives a procedure for calculating the mappings $R^+$, $T^-$, $R^-$ and $T^+$ for the entire tube in terms of the corresponding scattering properties of the sub tubes. This procedure can be repeated to obtain several sub tubes.
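The composition step of the Building Block Method can be sketched for two sub tubes that meet at a common cross-section. The Python below is a minimal sketch under stated assumptions: the four mappings of each block are given as matrices in a common mode basis, and the blocks share the same reference plane (a straight connecting section of length $L$ would add diagonal propagation factors $e^{i\alpha_n L}$). The multiple-reflection formulas are the standard ones for cascading two scatterers (often called a Redheffer star product), not formulas quoted from [7] or [8]; the function name `cascade` and the dictionary keys are hypothetical.

```python
import numpy as np

def cascade(S1, S2):
    """Scattering operators of sub tube S1 (left) followed by sub tube S2 (right).

    Each S is a dict of square matrices in a common mode basis:
      'Rp': left incoming  -> left outgoing   (R+)
      'Tp': left incoming  -> right outgoing  (T+)
      'Rm': right incoming -> right outgoing  (R-)
      'Tm': right incoming -> left outgoing   (T-)
    """
    I = np.eye(S1['Rp'].shape[0])
    # Right-going amplitude between the blocks for a wave incident from the left:
    # fwd = T1+ + R1- R2+ fwd  =>  fwd = (I - R1- R2+)^{-1} T1+
    fwd = np.linalg.solve(I - S1['Rm'] @ S2['Rp'], S1['Tp'])
    # Left-going amplitude between the blocks for a wave incident from the right:
    # bwd = T2- + R2+ R1- bwd  =>  bwd = (I - R2+ R1-)^{-1} T2-
    bwd = np.linalg.solve(I - S2['Rp'] @ S1['Rm'], S2['Tm'])
    return {
        'Rp': S1['Rp'] + S1['Tm'] @ S2['Rp'] @ fwd,
        'Tp': S2['Tp'] @ fwd,
        'Rm': S2['Rm'] + S2['Tp'] @ S1['Rm'] @ bwd,
        'Tm': S1['Tm'] @ bwd,
    }

# Usage: a lossless single-mode block (|r|^2 + |t|^2 = 1) cascaded with itself
S = {key: np.array([[val]], dtype=complex)
     for key, val in {'Rp': 0.6, 'Tp': 0.8j, 'Rm': 0.6, 'Tm': 0.8j}.items()}
SS = cascade(S, S)
print(abs(SS['Rp'][0, 0]) ** 2 + abs(SS['Tp'][0, 0]) ** 2)  # approximately 1.0: flux is conserved
```

Solving the linear systems with `np.linalg.solve` instead of forming inverses explicitly keeps the sketch closer to what one would do with larger truncations.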
Rather than using a general numerical package for conformal mappings, we have for the calculations in this paper employed the Schwarz-Christoffel mapping for a duct with corners and rounded the corners using the methods of Henrici [9]. The required analytic integrations are performed in MATHEMATICA. We recall the standard duct theory [6] in a form that illustrates the ill-posedness of the problem:
$$\psi = \psi^+ + \psi^- = \sum_{n=1}^{\infty} A_n^+ e^{i\alpha_n x}\psi_n(y) + \sum_{n=1}^{\infty} A_n^- e^{-i\alpha_n x}\psi_n(y),$$
with $\psi_n(y) = \sin(n\pi y/a)$ and $\alpha_n = \sqrt{k^2 - n^2\pi^2/a^2}$, $\operatorname{Im}\alpha_n \ge 0$. It is convenient to define the operator $B_0$ by
$$B_0 f = \sum_{n=1}^{\infty} \alpha_n f_n \psi_n, \qquad f(y) = \sum_{n=1}^{\infty} f_n \psi_n(y).$$
We find that $B_0^2 = \partial_y^2 + k^2$ and $\partial_x\psi^\pm = \pm iB_0\psi^\pm$. The initial value problem
$$\partial_x\psi^+(x) = iB_0\psi^+(x), \qquad \psi^+(0) = \psi^+_0, \tag{3.3}$$
is ill-posed for $x < 0$, but not for $x > 0$: for an evanescent mode, $\operatorname{Im}\alpha_n > 0$, the factor $e^{i\alpha_n x}$ grows exponentially as $x \to -\infty$. Hence, if an attenuated plus wave is marched to the left, exponential growth is found. To avoid the ill-posedness, $\psi$ is decomposed, and the plus waves are calculated by marching to the right and the minus waves by marching in the opposite direction.
4
Reformulated scattering problem
To be able to use powerful spectral methods, it is advantageous to transform the tube into one with flat boundaries. According to the Building Block Method it is enough to consider the scattering in the sub tubes, and we restrict ourselves to the first part as shown in figure 3. One way of transforming the tube is to use a conformal mapping $w(\zeta)$ that transforms the interior $\Omega'$ of the tube with variable cross-section in the $\zeta = x + iy$ plane (figure 3) to the interior $\Omega$ of a straight tube with constant cross-section in the $w = u + iv$ plane. The straight tube is described by $-\infty < u < \infty$, $0 < v < a$. Introducing $\phi(u, v) = \psi(x, y)$ we get
$$\partial_u^2\phi + B^2(u)\phi = 0 \ \text{ in } \Omega, \tag{4.1a}$$
$$\phi(u, 0) = \phi(u, a) = 0, \quad u \in \mathbb{R}, \tag{4.1b}$$
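with $B^2(u) = \partial_v^2 + k^2\mu(u, v)$ and $\mu = |d\zeta/dw|^2$. Thus $\mu(u, v)^{1/2}$ can be regarded as a refractive index for the straight tube. In figure 4, $\mu$ for the simple tube in figure 3 is depicted. The factor $\mu(u, v)$ is asymptotically constant at both ends of the tube; more precisely, $\mu(u, v) = \mu_\pm + O(e^{-c|u|})$, $u \to \pm\infty$, with $\mu_- = 1$ and $\mu_+ = (b/a)^2$. We use a first-order description and rewrite (4.1a) as
$$\partial_u \begin{pmatrix} \phi \\ \partial_u\phi \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -B^2 & 0 \end{pmatrix} \begin{pmatrix} \phi \\ \partial_u\phi \end{pmatrix}. \tag{4.2}$$
To avoid ill-posedness, $\phi$ is decomposed as $\phi = \phi^+ + \phi^-$ via
$$\begin{pmatrix} \phi \\ \partial_u\phi \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ iC & -iC \end{pmatrix} \begin{pmatrix} \phi^+ \\ \phi^- \end{pmatrix}, \tag{4.3}$$
where $C$ is an operator to be chosen below. Solving (4.3) for $\phi^+$ and $\phi^-$, taking the $u$-derivative and using (4.2), we find that
$$\partial_u \begin{pmatrix} \phi^+ \\ \phi^- \end{pmatrix} = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} \begin{pmatrix} \phi^+ \\ \phi^- \end{pmatrix}, \tag{4.4}$$
where
$$\alpha = \tfrac12\left[(\partial_u C^{-1})C + iC^{-1}B^2 + iC\right], \qquad \beta = \tfrac12\left[-(\partial_u C^{-1})C + iC^{-1}B^2 - iC\right],$$
$$\gamma = \tfrac12\left[-(\partial_u C^{-1})C - iC^{-1}B^2 + iC\right], \qquad \delta = \tfrac12\left[(\partial_u C^{-1})C - iC^{-1}B^2 - iC\right]. \tag{4.5}$$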
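To generalize the concept of transmission operators we make them $u$-dependent, using a notation similar to that of Fishman [10]:
$$\begin{pmatrix} \phi^+(u_2) \\ \phi^-(u_1) \end{pmatrix} = \begin{pmatrix} T^+(u_2, u_1) & R^-(u_1, u_2) \\ R^+(u_2, u_1) & T^-(u_1, u_2) \end{pmatrix} \begin{pmatrix} \phi^+(u_1) \\ \phi^-(u_2) \end{pmatrix}, \tag{4.6}$$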
assuming that $u_1 < u_2$ and suppressing the explicit $v$-dependence. It is assumed for (4.6) that the scattering problem has a unique solution or that homogeneous solutions are removed. A homogeneous solution is usually called a bound state. Next we find a differential equation for the scattering operators $T^+(u_2, u_1)$, $R^-(u_1, u_2)$, $R^+(u_2, u_1)$ and $T^-(u_1, u_2)$ in (4.6) using the invariant imbedding technique [10, 11]. It is required that the incoming wave from the right, $\phi^-(u_2)$, vanishes. Then put $u_1 = u$, find $\partial_u\phi^-(u)$ from (4.6), and use (4.6) once more to obtain
$$\partial_u R^+(u_2, u) = \gamma + \delta R^+(u_2, u) - R^+(u_2, u)\alpha - R^+(u_2, u)\beta R^+(u_2, u). \tag{4.7}$$
In a similar manner we get
$$\partial_u T^+(u_2, u) = -T^+(u_2, u)\alpha - T^+(u_2, u)\beta R^+(u_2, u). \tag{4.8}$$
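A minimal sketch of how (4.7)-(4.8) can be marched numerically, truncated to a single mode ($N = 1$). All of the following are assumptions for illustration, not taken from the text: $\mu$ depends on $u$ only, through the hypothetical profile $\mu(u) = \mu_- + f(u)(\mu_+ - \mu_-)$ with $f(u) = (1 + \tanh u)/2$, so that $B^2(u)$ reduces to the scalar $k^2\mu(u) - \pi^2/a^2$; $C(u) = B_- + f(u)(B_+ - B_-)$ as chosen in the text; the interval $[-8, 8]$ stands in for the whole line; and a fixed-step fourth-order Runge-Kutta scheme replaces the MATLAB solver. The march starts at $u_2$ with $R^+ = 0$, $T^+ = 1$ and proceeds to the left, the direction in which the equations are well-posed. For real $B_\pm$ the result can be checked against flux conservation, $B_-(1 - |R^+|^2) = B_+|T^+|^2$, which follows from the constancy of the Wronskian of (4.1a).

```python
import cmath
import math

def branch_sqrt(x):
    """Square root on the branch Im >= 0 (propagating: real, evanescent: +i|.|)."""
    w = cmath.sqrt(complex(x))
    return w if w.imag >= 0 else -w

def f(u):
    """Hypothetical smooth step with f(-inf) = 0 and f(+inf) = 1."""
    return 0.5 * (1.0 + math.tanh(u))

def df(u):
    return 0.5 / math.cosh(u) ** 2

def coefficients(u, k, a, mu_m, mu_p):
    """Scalar (N = 1) versions of alpha, beta, gamma, delta from (4.5)."""
    kn2 = (math.pi / a) ** 2
    B2 = k * k * (mu_m + f(u) * (mu_p - mu_m)) - kn2  # hypothetical mu(u) profile
    Bm, Bp = branch_sqrt(k * k * mu_m - kn2), branch_sqrt(k * k * mu_p - kn2)
    C = Bm + f(u) * (Bp - Bm)
    D = -df(u) * (Bp - Bm) / C                        # (d/du C^{-1}) C for scalars
    alpha = 0.5 * (D + 1j * B2 / C + 1j * C)
    beta = 0.5 * (-D + 1j * B2 / C - 1j * C)
    gamma = 0.5 * (-D - 1j * B2 / C + 1j * C)
    delta = 0.5 * (D - 1j * B2 / C - 1j * C)
    return alpha, beta, gamma, delta

def rhs(u, R, T, params):
    """Right-hand sides of (4.7) and (4.8)."""
    al, be, ga, de = coefficients(u, *params)
    return ga + de * R - R * al - R * be * R, -T * al - T * be * R

def march(k, a, mu_m, mu_p, u1=-8.0, u2=8.0, steps=4000):
    """RK4 from u2 down to u1; returns R+(u2, u1) and T+(u2, u1)."""
    params = (k, a, mu_m, mu_p)
    h = (u1 - u2) / steps          # negative step: march to the left
    u, R, T = u2, 0j, 1 + 0j       # initial values R+(u2, u2) = 0, T+(u2, u2) = 1
    for _ in range(steps):
        s1 = rhs(u, R, T, params)
        s2 = rhs(u + h / 2, R + h / 2 * s1[0], T + h / 2 * s1[1], params)
        s3 = rhs(u + h / 2, R + h / 2 * s2[0], T + h / 2 * s2[1], params)
        s4 = rhs(u + h, R + h * s3[0], T + h * s3[1], params)
        R += h / 6 * (s1[0] + 2 * s2[0] + 2 * s3[0] + s4[0])
        T += h / 6 * (s1[1] + 2 * s2[1] + 2 * s3[1] + s4[1])
        u += h
    return R, T

# Example: k a = 6 with a = 1, mu_- = 1, mu_+ = (b/a)^2 for b/a = 0.6 (as in figure 3)
R, T = march(6.0, 1.0, 1.0, 0.36)
print(abs(R), abs(T))
```

In the actual method $\mu$ depends on $v$ as well, which couples the modes through (5.3); the scalars above then become $N \times N$ matrices and the products must keep the operator order of (4.7)-(4.8).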
The stability properties of (4.7) and (4.8) are of central importance. In the flat regions, where $B = B_+$ or $B_-$, we have $C = B$ and $\partial_u C^{-1} = 0$, implying that $\beta = \gamma = 0$ and $\alpha = -\delta = iB$. Then (4.7) and (4.8) reduce to $\partial_u R^+ = -iBR^+ - iR^+B$ and $\partial_u T^+ = -iT^+B$, equations which are well-posed for marching to the left. The initial values to accompany (4.7) and (4.8) are $R^+(u_2, u_2) = 0$ and $T^+(u_2, u_2) = I$, where $I$ is the identity operator. We choose $C = B_- + f(u)(B_+ - B_-)$, which is independent of $v$. Here $f$ is increasing and smooth with $\lim_{u\to-\infty} f(u) = 0$ and $\lim_{u\to\infty} f(u) = 1$.
5
Solution of the scattering problem
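For the numerical solution of the scattering operators we expand $\phi$ in a Fourier sine series and $\mu$ in a Fourier cosine series:
$$\phi(u, v) = \sum_{n=1}^{\infty} \phi_n(u)\psi_n(v), \qquad \mu(u, v) = \sum_{n=0}^{\infty} \mu_n(u)\zeta_n(v), \tag{5.1}$$
where $\psi_n(v) = \sin(n\pi v/a)$ and $\zeta_n(v) = \cos(n\pi v/a)$. Using the notation $\phi = (\phi_1, \phi_2, \ldots)^T$ we find that
$$\partial_u^2\phi(u) + B^2(u)\phi(u) = 0. \tag{5.2}$$
The matrix elements of $B^2(u)$ are given by
$$B^2(u)_{nm} = \frac{k^2}{2}\left[\mu_{n-m}(u) + \mu_{m-n}(u) - \mu_{m+n}(u)\right] - \frac{n^2\pi^2}{a^2}\,\delta_{nm}, \tag{5.3}$$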
and it is understood in (5.3) that $\mu_l(u) = 0$ for negative $l$. For the tube in the physical $\zeta$-plane we require that locally both the potential and the kinetic parts of the energy are finite, that is, both $\int_X |\psi|^2\,dx\,dy < \infty$ and $\int_X |\nabla\psi|^2\,dx\,dy < \infty$ for all finite regions $X$ inside the tube. We say that $\psi$ belongs to the Sobolev space $H^1_{loc}$, meaning that $\psi$ and its first derivatives are locally square integrable. Transformed to the straight duct, the local finite energy requirement means $\int_U |\phi|^2\mu\,du\,dv < \infty$ and $\int_U |\nabla\phi|^2\,du\,dv < \infty$ for all finite regions $U$ inside the tube. For a smooth boundary $\phi$ is more regular: it follows from the theory of Grisvard [12] that also the second derivatives of $\phi$ are square integrable, which means that $\phi \in H^2_{loc}$. According to a trace theorem [13], $\phi \in H^2_{loc}$ implies that $\phi(u, \cdot) \in H^{3/2}(0, a)$, meaning that up to 3/2 derivatives are square integrable. To interpret this regularity with fractional derivatives we define, following Taylor [13], the function space
$$D_s = \Big\{ f \in L^2(0, a) : \sum_{n=1}^{\infty} |f_n|^2 (1 + n^2)^s < \infty \Big\}, \quad s \ge 0, \qquad f = \sum_{n=1}^{\infty} f_n\psi_n. \tag{5.4}$$
$D_s$ is a Hilbert space with
$$\|f\|_{D_s}^2 = (f, f) = \sum_{n=1}^{\infty} |f_n|^2 (1 + n^2)^s. \tag{5.5}$$
Taylor [13] shows that $D_0 = L^2(0, a)$, $D_1 = H^1_0(0, a)$, $D_2 = H^2(0, a) \cap H^1_0(0, a)$, and that $\partial_v D_s = D_{s-1}$, $s \ge 1$. In this terminology we have, for a smooth boundary, $\phi(u, \cdot) \in D_{3/2}$. The operator $\partial_v^2$ is self-adjoint on $D_{3/2}$. Thus we may define $B_\pm$ by
$$B_\pm f = \sum_{n=1}^{\infty} \sqrt{k^2\mu_\pm - n^2\pi^2/a^2}\, f_n\psi_n, \tag{5.6}$$
assuming that the branch $\operatorname{Im} \ge 0$ of the square root is taken. It is clear that $T^+$, $R^-$, $R^+$ and $T^-$ are mappings $D_{3/2} \to D_{3/2}$ and that $B_\pm: D_s \to D_{s-1}$, $s \ge 1$. For tubes with edges in the $\zeta$-duct things are a little more complicated. With no restriction on the sharpness of the edges we cannot improve on $\phi \in H^1_{loc}$, which implies $\phi(u, \cdot) \in D_{1/2}$. Then, as an intermediate step in our calculations, $B_\pm\phi$ should be in the space $D_{-1/2}$. Such a derivative must of course be interpreted as a distribution. However, the end result, i.e. the scattered wave function, belongs to $D_{1/2}$. To generalise, we define by duality, for positive $s$,
$$D_{-s} = \Big\{ g : \Big|\int f(v)g(v)\,dv\Big| < \infty \ \text{ for all } f \in D_s \Big\}.$$
Multiplication by $\sqrt{\mu}$ is an operator $D_{1/2} \to D_{-1/2}$, and if $s \ge 1/2$ we have the following mapping properties: $B_\pm: D_s \to D_{s-1}$, $\partial_v: D_s \to D_{s-1}$, and $T^+$, $R^-$, $R^+$ and $T^-$ are mappings $D_s \to D_s$.
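The sine-basis quantities of this section are straightforward to assemble numerically. The Python sketch below is illustrative only (the function names and the test profile $\mu(v) = 1 + 0.3\cos(\pi v/a) + 0.1\cos(2\pi v/a)$ are hypothetical): it builds the truncated matrix of $B^2(u)$ at a fixed $u$ from the cosine coefficients $\mu_l(u)$ following (5.3), cross-checks it against a direct projection onto $\sin(n\pi v/a)$ with the midpoint rule, and evaluates the $D_s$ norm (5.5) of the constant function 1 on $(0, a)$. That function does not vanish at $v = 0, a$; its sine coefficients $4/(n\pi)$ ($n$ odd) decay so slowly that the norm is finite only for $s < 1/2$.

```python
import numpy as np

def b_squared_matrix(mu_coeffs, k, a, N):
    """Truncated N x N matrix of B^2(u) at a fixed u, following (5.3).

    mu_coeffs[l] is the cosine coefficient mu_l(u), l = 0, 1, ...; coefficients
    with negative or out-of-range index are taken to be zero.
    """
    def mu(l):
        return mu_coeffs[l] if 0 <= l < len(mu_coeffs) else 0.0

    M = np.zeros((N, N))
    for n in range(1, N + 1):
        for m in range(1, N + 1):
            M[n - 1, m - 1] = 0.5 * k**2 * (mu(n - m) + mu(m - n) - mu(m + n))
        M[n - 1, n - 1] -= (n * np.pi / a) ** 2
    return M

def ds_norm_squared(f_coeffs, s):
    """Squared D_s norm (5.5) from the sine coefficients f_1, f_2, ..."""
    n = np.arange(1, len(f_coeffs) + 1)
    return float(np.sum(np.abs(f_coeffs) ** 2 * (1.0 + n**2) ** s))

# Cross-check (5.3) against (2/a) * integral of sin(n pi v/a) k^2 mu(v) sin(m pi v/a)
a, k, N, coeffs = 1.0, 2.0, 4, [1.0, 0.3, 0.1]
v = (np.arange(4000) + 0.5) / 4000 * a                     # midpoint grid on (0, a)
mu_v = sum(c * np.cos(l * np.pi * v / a) for l, c in enumerate(coeffs))
sines = [np.sin(n * np.pi * v / a) for n in range(1, N + 1)]
M_direct = np.array([[2.0 * np.mean(sn * k**2 * mu_v * sm) for sm in sines] for sn in sines])
M_direct -= np.diag([(n * np.pi / a) ** 2 for n in range(1, N + 1)])
print(np.allclose(M_direct, b_squared_matrix(coeffs, k, a, N)))  # True

def const_coeffs(n_terms):
    """Sine coefficients of the constant function 1 on (0, a): 4/(n pi) for odd n."""
    n = np.arange(1, n_terms + 1)
    return np.where(n % 2 == 1, 4.0 / (n * np.pi), 0.0)

# Partial sums settle for s = 0.25 but keep growing for s = 0.75
for s in (0.25, 0.75):
    print(s, ds_norm_squared(const_coeffs(10**3), s), ds_norm_squared(const_coeffs(10**5), s))
```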
The equations (4.7)-(4.8) can be solved in closed form only in very special cases. Therefore some type of numerical scheme is used. In general, a numerical method cannot give uniform convergence on the entire space $D_s$. In a practical application it is usually sufficient to know the effect of the scattering matrices on the lowest eigenfunctions, say the first $N_0$. A practical method is therefore to truncate the matrix representation of (4.7)-(4.8) to $N \gg N_0$ and solve the finite-dimensional ordinary differential equation with a standard numerical routine. Nilsson [3] proves that such a procedure converges as $N \to \infty$. At present, numerical results are not available for quantum tube scattering. However, Nilsson [3] presents results for the acoustic case, where the Neumann rather than the Dirichlet boundary condition applies. He reports that, for the lowest-order reflection coefficient, $N = 1$ (i.e. a scalar solution) is accurate up to $ka = 1.5$, $N = 2$ gives a good and $N = 5$ a perfect description up to $ka = 6$. Energy conservation holds for all $N$.
References
1. J. T. Londergan, J. P. Carini, D. P. Murdock, Binding and scattering in two-dimensional systems - Applications to quantum wires, waveguides and photonic crystals. Lecture notes in physics (Berlin, Springer, 1999).
2. K. Lin, R. L. Jaffe, Bound states and threshold resonances in quantum wires with circular bends. Phys. Rev. B 54, 5750-5762 (1996).
3. B. Nilsson, Acoustic transmission in curved ducts with varying cross-sections. Article submitted to Proc. Roy. Soc. A.
4. J. C. Wu, M. N. Wybourne, W. Yindeepol, A. Weisshaar, S. M. Goodnick, Interference phenomena due to a double bend in a quantum wire. Appl. Phys. Lett. 59, 102-104 (1991).
5. J. Davies, The Physics of low-dimensional semiconductors (Cambridge, Cambridge University Press, 1998).
6. M. Cessenat, Mathematical methods in electromagnetism (Singapore, World Scientific Publishing Co., 1996).
7. B. Nilsson, O. Brander, The propagation of sound in cylindrical ducts with mean flow and bulk reacting lining - IV. Several interacting discontinuities. IMA J. Appl. Math. 27, 263-289 (1981).
8. H. Wu, D. W. L. Sprung, J. Martorell, Periodic quantum wires and their quasi-one-dimensional nature. J. Phys. D: Appl. Phys. 26, 798-803 (1993).
9. P. Henrici, Applied and computational complex analysis. Volume I (New York, John Wiley & Sons, 1988).
10. L. Fishman, One-way propagation methods in direct and inverse scalar wave propagation modeling. Radio Science 28(5), 865-876 (1993).
11. R. Bellman, G. M. Wing, An introduction to invariant imbedding. Classics in Applied Mathematics, 8. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, 1992.
12. P. Grisvard, Elliptic problems in nonsmooth domains. Monographs and studies in mathematics, 24 (Boston, Pitman, 1985).
13. M. Taylor, Partial differential equations I. Basic theory. Applied Mathematical Sciences, 115 (New York, Springer, 1996).
Figure 1: Two-dimensional quantum tube
Figure 2: Schematic picture of heterostructure and split-gate structure (layers: doped AlGaAs, undoped AlGaAs, undoped GaAs, semi-insulating GaAs).
Figure 3: Sub-tube with interior $\Omega'$, upper boundary $\Gamma'_+$ and lower boundary $\Gamma'_-$; $b/a = 0.6$.
Figure 4: $\mu(u, v)$ in the straight duct. Parameters as in figure 3. $\mu^{1/2}$ is the refractive index.
POSITION EIGENSTATES AND THE STATISTICAL AXIOM OF QUANTUM MECHANICS
L. POLLEY
Physics Dept., Oldenburg University, 26111 Oldenburg, Germany
E-mail: polley@uni-oldenburg.de
Quantum mechanics postulates the existence of states determined by a particle position at a single time. This very concept, in conjunction with superposition, induces much of the quantum-mechanical structure. In particular, it implies that the time evolution obeys the Schrödinger equation, and it can be used to complete a truly basic derivation of the statistical axiom as recently proposed by Deutsch.
1 Quantum probabilities according to Deutsch
A basic argument to see why quantum-mechanical probabilities must be squares of amplitudes (statistical axiom) was given by Deutsch [1, 2]. It is independent of the many-worlds interpretation. Deutsch considers a superposition of the form
$$|\psi\rangle = \sqrt{\frac{m}{m+n}}\,|A\rangle + \sqrt{\frac{n}{m+n}}\,|B\rangle \qquad (1.1)$$
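He introduces an auxiliary degree of freedom, $i = 1, \dots, m+n$, and replaces $|A\rangle$ and $|B\rangle$ by normalized superpositions,
$$|A\rangle \to \frac{1}{\sqrt{m}}\sum_{i=1}^{m} |A\rangle|i\rangle, \qquad |B\rangle \to \frac{1}{\sqrt{n}}\sum_{i=m+1}^{m+n} |B\rangle|i\rangle \qquad (1.2)$$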
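As a numerical sanity check of the replacement (1.2), the sketch below assumes that the state (1.1) is $\sqrt{m/(m+n)}\,|A\rangle + \sqrt{n/(m+n)}\,|B\rangle$ and stores the grand superposition as a plain list of (label, amplitude) pairs indexed by the auxiliary label $i$. The function names are hypothetical. It confirms that every amplitude equals $1/\sqrt{m+n}$, so that counting the $A$-labelled terms reproduces the squared amplitude $m/(m+n)$.

```python
import math
from fractions import Fraction

def grand_superposition(m, n):
    """Apply the replacement (1.2) to sqrt(m/(m+n))|A> + sqrt(n/(m+n))|B>.

    Returns one (label, amplitude) pair per auxiliary state |i>, i = 1..m+n.
    The form of (1.1) used here is an assumption of this sketch.
    """
    amp_a = math.sqrt(m / (m + n))
    amp_b = math.sqrt(n / (m + n))
    terms = []
    for i in range(1, m + n + 1):
        if i <= m:
            # |A> -> (1/sqrt(m)) * sum_{i=1}^{m} |A>|i>
            terms.append(("A", amp_a / math.sqrt(m)))
        else:
            # |B> -> (1/sqrt(n)) * sum_{i=m+1}^{m+n} |B>|i>
            terms.append(("B", amp_b / math.sqrt(n)))
    return terms

def fraction_with_label(terms, label):
    """With all amplitudes equal, the probability of a label is a simple count."""
    count = sum(1 for lab, _ in terms if lab == label)
    return Fraction(count, len(terms))

terms = grand_superposition(3, 5)
print(all(math.isclose(amp, 1 / math.sqrt(8)) for _, amp in terms))  # True
print(fraction_with_label(terms, "A"))  # 3/8
```

The count $3/8$ equals $|\sqrt{3/8}|^2$, which is the content of the statistical axiom in this example.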
All amplitudes in the grand superposition are equal to $1/\sqrt{m+n}$ and should result in equal probabilities for the detection of the states. This immediately implies the ratio $m : n$ for the probabilities of property A or B. The argument has clear advantages over previous derivations of the statistical axiom. Gleason's theorem [3, 4], for example, is mathematically non-trivial and not well received by many physicists, while von Neumann's assumption $\langle O_1 + O_2\rangle = \langle O_1\rangle + \langle O_2\rangle$ about expectations of observables [5, 6] is difficult to interpret physically if $O_1$ and $O_2$ do not commute [4, 5]. However, Deutsch's argument relies in an essential way on the unitarity of the replacement, or on the normalization of any physical state vector. Why should a state vector be "normalized" in the usual sense of summing the squares of amplitudes? It would seem desirable to justify this beyond its being "natural" [2]. In fact, the reasoning would appear circular without an extra argument about unitarity or normalization. I have proposed [7] to realize the "replacement" (1.2) physically by the time evolution of a suitable device. Then, what can be said about quantum-mechanical evolution without anticipating unitarity?

2 Schrödinger's equation for a free particle as a consequence of position eigenstates
For free particles, a well-known and elegant way to obtain the Schrödinger equation is via unitary representations of space-time symmetries. Interactions can be introduced via the principle of local gauge invariance. However, this approach to the equation anticipates unitarity. As I pointed out recently [8], the Schrödinger equation for a free scalar particle is also a consequence of the very concept of a position eigenstate$^a$ in discretized space. To an extent, this just means regarding "hopping amplitudes", familiar from solid state theory, as a priori quantum-dynamical entities. The point, however, is to show that a hopping-parameter scenario without unitarity would lead to consequences absurd enough to imply that unitarity must be a property of the physical system. As will be seen below, the absurdity is that a wave-function that makes perfect sense at $t = 0$ would cease to exist anywhere in space at an earlier or later time. Consider a spinless particle "hopping" on a 1-dimensional chain of positions $x = na$, where $n$ is an integer and $a$ is the lattice spacing.
[Figure: chain of lattice sites labelled $n-1$, $n$, $n+1$, with spacing $a$]
Assume the particle is in an eigenstate $|n, t\rangle$ of position number $n$ at time $t$ (using the Heisenberg picture), and that it has a possibility to change its position. The information given by a "position at one time" does not determine in which direction the particle should go. Thus the eigenstate $|n, t\rangle$ is necessarily a superposition when expressed in terms of eigenstates relating to another time $t'$. Moreover, because of the same lack of information, positions to the left and right must occur symmetrically. If $t' \to t$, only nearest neighbours will be involved. Thus we expect a "hopping equation" of the form
$$|n,t\rangle = \alpha\,|n,t'\rangle + \beta\,|n+1,t'\rangle + \beta\,|n-1,t'\rangle$$
This can be rewritten as a differential equation in $t$,
$$-i\frac{d}{dt}|n,t\rangle = V\,|n,t\rangle + K\,|n+1,t\rangle + K\,|n-1,t\rangle, \qquad K, V \text{ complex (so far)}$$
$^a$ Which relies on linear algebra, hence includes the concept of "superposition".
Parameters $\alpha, \beta$ and $K, V$ are in an algebraic relation [8] which need not concern us here. To obtain an equation for a wave-function we consider a general state $|\psi\rangle$ composed of simultaneous position eigenstates,
$$|\psi\rangle = \sum_n \psi(n,t)\,|n,t\rangle \qquad \text{(Heisenberg picture)}$$
This defines the coefficients $\psi(n,t)$ for all $t$. Now take the time derivative on both sides, identify $\psi(n,t)$ with a function $\psi(x,t)$ where $x = na$, and Taylor-expand the shifted values $\psi(x \pm a, t)$. This results in
Finally, take $a \to 0$ on the relevant physical scale. The spatial spreading of the wave-function is then given by the $a^2$ term, and the solution of the equation is
$$\psi(x,t) = e^{-i(V+2K)t}\int \tilde\psi(p)\,e^{ipx}\,e^{-ia^2Kp^2t}\,dp$$
This time evolution would be unitary if $K$ and $V$ were real. Hence, consider the consequences of a non-real $K$. The integrand would then contain an evolution factor increasing towards positive or negative times like $\exp(\pm a^2\,\mathrm{Im}K\,p^2 t)$. This would lead to physically absurd conclusions about certain "harmless" wave-functions, like the Lorentz-shape function $\psi(x) = 1/(1+x^2)$, whose Fourier transform is $\tilde\psi(p) \propto \exp(-|p|)$:
• For $\mathrm{Im}K > 0$, the "harmless" function would not exist anywhere in space after a short while.
• For $\mathrm{Im}K < 0$, the "harmless" function could not be prepared for an experiment to be carried out on it after a short while.
In a mathematical sense, of course, it still remains a postulate that the value of $K$ be real. But physically, it does seem that unitarity of quantum mechanics is unavoidable once the superposition principle and the concept of a position eigenstate are taken for granted. As for the parameter $V$, the factor $e^{-iVt}$ would be raised to the $n$th power in an $n$-particle state, and would lead to an absurdity similar to the above with certain superpositions of $n$-particle states unless $V$ is real, too.
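The divergence argument can be checked numerically. The sketch below, with hypothetical function names and arbitrary illustrative values ($a = 1$, $K = 1 + 0.1i$), integrates $|\tilde\psi(p)\,e^{-ia^2Kp^2t}|$ for $\tilde\psi(p) = e^{-|p|}$ over $|p| \le$ cutoff with a midpoint rule. For real $K$ the result approaches 2 as the cutoff grows; for $\mathrm{Im}K > 0$ and $t > 0$ it grows without bound, so the momentum integral for $\psi(x,t)$ cannot converge, while for $t < 0$ it stays finite.

```python
import cmath
import math

def evolution_factor(p, K, a, t):
    """Momentum-space factor exp(-i a^2 K p^2 t) from the solution above."""
    return cmath.exp(-1j * a * a * K * p * p * t)

def truncated_weight(K, a, t, cutoff, steps=20000):
    """Midpoint-rule estimate of the integral over |p| <= cutoff of
    |psi~(p) * evolution_factor(p)|, with psi~(p) = exp(-|p|)."""
    h = 2.0 * cutoff / steps
    total = 0.0
    for k in range(steps):
        p = -cutoff + (k + 0.5) * h
        total += math.exp(-abs(p)) * abs(evolution_factor(p, K, a, t))
    return total * h

# Real K versus K with a positive imaginary part, forwards and backwards in time.
for K in (1.0, 1.0 + 0.1j):
    for t in (1.0, -1.0):
        values = [truncated_weight(K, 1.0, t, c) for c in (10, 20, 40)]
        print(K, t, ["%.3g" % v for v in values])
```

Only the modulus of the integrand is tracked here: once it fails to decay at large $|p|$, no choice of phases can make the Fourier integral converge.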
3 Driven particle: Weyl equation in general space-time
As an example of a particle interacting with external fields we may consider a massless spin-1/2 particle with inhomogeneous hopping conditions [8]. Here the starting point is common eigenstates of spin and position, where "position" refers to a site on a cubic spatial lattice. A particle in such a state at time $t$ will be in a superposition of neighbouring positions and flipped spins at a time $t' \approx t$. In 3 dimensions, and immediately in terms of a wave-function, the corresponding differential equation is
$$-i\frac{\partial}{\partial t}\psi_s(x,t) = \sum_{\text{lattice directions } n}\ \sum_{s'} H_{nss'}\,\psi_{s'}(x - a n, t)$$
where $H_{nss'}$ are arbitrary complex amplitudes. On-site hopping (time-like direction) is included as $n = 0$. To begin with, a free particle is defined by translational and rotational symmetry. In this case, the hopping amplitudes reduce to two independent parameters [8], $\epsilon$ and $K$, both of them complex so far. By Taylor-expanding the wave-function and taking $a \to 0$ we find
$$\partial_t\psi_s(x,t) = \epsilon\,\psi_s(x,t) - aK\,\sigma^n_{ss'}\,\partial_n\psi_{s'}(x,t)$$
If $K$ had an imaginary part, it would lead to physical absurdities with the time evolution of certain "harmless" wave-functions, similarly to the previous section. For real $K$, we recover the non-interacting Weyl equation. If we now allow for "slight" (of order $a$) anisotropies and inhomogeneities in the hopping amplitudes, by adding some $a\,\gamma_{nss'}(x,t)$ to the hopping constants above, we recover a general-relativistic version of the equation [9] with the $\gamma_{nss'}(x,t)$ acting as spin connection coefficients. Unitarity in this context means that the probability current density
$$j^\mu(x,t) = \psi^*_s(x,t)\ \dots$$
Imposing no constraints on the spin connection coefficients, we are dealing here with a metric-affine space-time, which can have torsion and whose metric may be covariantly non-constant. The study of space-times of this general structure has been motivated by problems of quantum gravity [9]. It may be interesting to note that nothing but propagation by superposing next-neighbour states needs to be assumed here. In particular, scalar products of state vectors are not needed.

Figure 1: An array of eight cavities of equal shape. The initial state is located in the central cavity. When each channel is opened for an appropriate time, the state evolves to an equal-amplitude superposition of the peripheral cavity-states.

4 Realizing Deutsch's "substitution" as a time evolution
Having demonstrated "automatic" unitarity on two rather general examples, we can now turn with some confidence to the original issue of completing Deutsch's derivation of the statistical axiom. To realize the particular substitution (1.2) for state vector (1.1), let us consider a particle with internal eigenstate $|A\rangle$ or $|B\rangle$, such as the polarisations of a photon. Let this particle be placed in a system of cavities$^b$ connected by channels (Fig. 1) which can be opened selectively for internal state $|A\rangle$ or $|B\rangle$.
$^b$ Or Paul traps, or any other sort of potential well; these are to enable us to store away parts of the wave function so that there is no influence on them by the other parts.
It will be essential in the following that all cavities are of the same shape, because this will enable us to exploit symmetries to a large extent. The location of the particle in a cavity will serve as the auxiliary degree of freedom as in (1.2), except that $|A\rangle$ and $|B\rangle$ before the substitution will be identified with $|A\rangle|0\rangle$ and $|B\rangle|0\rangle$, where $|0\rangle$ corresponds to the central cavity. Now let only one of the channels be open at a time. We are then dealing with the wave-function dynamics of a two-cavity subsystem, while the rest of the wave-function is standing by. What law of evolution could we expect? A particle with a well-defined (observed) position 0 at time $t$ will no longer have a well-defined position at time $t'$ if we allow it to pass through a channel without observing it. Thus a state $|0, t\rangle$ defined by position 0 at time $t$ (using the Heisenberg picture) will be a superposition when expressed in terms of position states relating to a different time $t'$. In particular, if channel $0 \leftrightarrow 1$ is the open one,
$$|0,t\rangle = \alpha\,|0,t'\rangle + \beta\,|1,t'\rangle$$
Likewise, by symmetry of the arrangement,
$$|1,t\rangle = \alpha\,|1,t'\rangle + \beta\,|0,t'\rangle$$
It follows that $|0,t\rangle \pm |1,t\rangle$ are stationary states whose dependence on time consists in prefactors
$$(\alpha \pm \beta)^k \qquad (4.1)$$
after $k$ time steps.
If the particle is initially in the rest of the cavities, whose channels are shut, we would expect this state not to change with time: $|\mathrm{rest},t\rangle = |\mathrm{rest},t'\rangle$. Now, if (4.1) were not mere phase factors, we could easily construct a superposition of $|0\rangle$, $|1\rangle$ and $|\mathrm{rest}\rangle$ such that, relative to the disconnected cavities, the part of the state vector in the connected cavities would grow indefinitely or vanish in the long run. As there is no physical reason for such an imbalance between the connected and the disconnected cavities, we conclude that
$$\alpha + \beta = e^{i\varphi}, \qquad \alpha - \beta = e^{i\varphi'}$$
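The step from (4.1) to the phase conditions can be illustrated with a two-cavity toy model. The sketch below treats the symmetric hopping $(\alpha, \beta)$ as one time step acting on amplitudes, starts from $|0\rangle + |1\rangle + |\mathrm{rest}\rangle$, and tracks the weight in the connected cavities relative to the frozen $|\mathrm{rest}\rangle$ component. The function names and the numerical values of $\alpha$ and $\beta$ are illustrative assumptions. Starting with equal amplitudes probes the factor $\alpha + \beta$; starting from $|0\rangle - |1\rangle$ would probe $\alpha - \beta$ in the same way.

```python
import math

def step(c0, c1, alpha, beta):
    """One time step of the symmetric two-cavity hopping on amplitudes c0, c1."""
    return alpha * c0 + beta * c1, beta * c0 + alpha * c1

def connected_to_rest_ratio(alpha, beta, steps):
    """Weight in cavities 0 and 1 relative to the frozen |rest> component.

    Starts from |0> + |1> + |rest> (all amplitudes 1); |rest> never changes.
    """
    c0, c1, rest = 1 + 0j, 1 + 0j, 1 + 0j
    for _ in range(steps):
        c0, c1 = step(c0, c1, alpha, beta)
    return (abs(c0) ** 2 + abs(c1) ** 2) / abs(rest) ** 2

theta = 0.3
print(connected_to_rest_ratio(math.cos(theta), 1j * math.sin(theta), 50))  # phases: stays at 2 (up to rounding)
print(connected_to_rest_ratio(0.6, 0.5, 50))  # |alpha + beta| = 1.1: grows
print(connected_to_rest_ratio(0.5, 0.4, 50))  # |alpha + beta| = 0.9: vanishes
```

With $\alpha = \cos\theta$ and $\beta = i\sin\theta$ one has $\alpha \pm \beta = e^{\pm i\theta}$, which is one concrete choice satisfying the conditions above.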
Having shown evolution through one open channel to be unitary, we can identify an opening time interval [7], $\tau_m$, to realize the following step of the replacement (1.2):
$$\sqrt{m}\,|A\rangle|0\rangle + |\mathrm{rest}\rangle \;\to\; \sqrt{m-1}\,|A\rangle|0\rangle + |A\rangle|1\rangle + |\mathrm{rest}\rangle$$
Here $|\mathrm{rest}\rangle$ stands for state vectors that are decoupled, such as all $|B\rangle|i\rangle$, and all $|A\rangle|i\rangle$ with $i \neq 0, 1$. Opening other channels analogously, each one for the appropriate $\tau_m$ and internal state, we produce an equal-amplitude superposition
$$\sum_{i=1}^{m} |A\rangle|i\rangle + \sum_{i=m+1}^{m+n} |B\rangle|i\rangle$$
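One way to see that suitable opening times exist is to model each open channel by a unitary two-cavity evolution with $\alpha = \cos\theta$ and $\beta = i\sin\theta$, so that $\alpha \pm \beta = e^{\pm i\theta}$ are phases, with $\theta$ standing in for the opening time. This parametrisation, the function names, and the choice to feed the peripheral cavities from the centre one after another are assumptions of this sketch; the $|B\rangle$ part would be handled the same way with its own channels. Starting with amplitude $\sqrt m$ in the central cavity, each channel $0 \leftrightarrow j$ is opened with $\cos^2\theta = (k-1)/k$, where $k$ units of weight remain in the centre. The result has magnitude 1 in every peripheral cavity and an empty centre; the relative phases produced by this particular parametrisation are not addressed here.

```python
import math

def open_channel(amps, j, theta):
    """Unitary hop between the central cavity 0 and cavity j.

    Assumed form: alpha = cos(theta), beta = 1j*sin(theta), so that
    alpha + beta and alpha - beta are pure phases. Other cavities are untouched.
    """
    a0, aj = amps[0], amps[j]
    c, s = math.cos(theta), math.sin(theta)
    amps[0] = c * a0 + 1j * s * aj
    amps[j] = 1j * s * a0 + c * aj

def spread_from_centre(m):
    """Move amplitude sqrt(m) from cavity 0 into the m peripheral cavities 1..m."""
    amps = [0j] * (m + 1)
    amps[0] = complex(math.sqrt(m))
    for j in range(1, m + 1):
        k = m - j + 1                               # units of |amplitude|^2 left in the centre
        theta = math.atan2(1.0, math.sqrt(k - 1))   # cos^2(theta) = (k - 1) / k
        open_channel(amps, j, theta)
    return amps

amps = spread_from_centre(4)
print([round(abs(x), 12) for x in amps])  # centre ~0, peripheral cavities 1.0
```

Because every step is unitary, the total weight $m$ is conserved, which is why each peripheral cavity can end up with exactly one unit.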
The probability of finding the particle in a particular cavity is now $1/(m+n)$ as a matter of symmetry. As the internal state is correlated with a cavity by the way the process is conducted, the probabilities for A and B immediately follow. These must also be the probabilities for finding A or B in the original state, because properties A and B have remained unchanged during the time evolution.

5 Can normalization be replaced by symmetry?
An interesting side effect of the above realization of Deutsch's argument is that state vectors need no longer be normalized at all. Permutational symmetry of a superposition suffices to show that all possible outcomes of an experiment must occur with equal frequency. Then the numerical values of the probabilities are fully determined. This feature of quantum probabilities may be relevant to problems of normalization in quantum gravity [10], such as the non-locality of summing $|\psi|^2$ over all of space, or the non-normalizability of the solutions of the Wheeler-DeWitt equation.

References
1. D. Deutsch, Proc. Roy. Soc. Lond. A 455, 3129 (1999); Oxford preprint (1989).
2. B. DeWitt, Int. J. Mod. Phys. 13, 1881 (1998).
3. A. M. Gleason, J. Math. Mech. 6, 885 (1957).
4. A. Peres, Quantum Theory (Kluwer Academic Publishers, Dordrecht, 1995).
5. J. von Neumann, Mathematische Grundlagen der Quantenmechanik (Springer, Berlin-New York, 1932).
6. A. Bohr, O. Ulfbeck, Rev. Mod. Phys. 67, 1 (1995).
7. L. Polley, quant-ph/9906124.
8. L. Polley, quant-ph/0005051.
9. F. W. Hehl et al., Rev. Mod. Phys. 48, 393 (1976); Phys. Rep. 258, 1 (1995).
10. A. Ashtekar (ed.), Conceptual problems of quantum gravity (Birkhäuser, 1991).
IS RANDOM EVENT THE CORE QUESTION? SOME REMARKS AND A PROPOSAL
P. ROCCHI
IBM, via Shangai 53, 00144 Roma, Italy
E-mail: paolorocchi@it.ibm.com
This work addresses the foundations of the Probability Calculus. We begin by considering how the event models in use today relate to physical reality. We then propose a structural model of the event and a definition of probability that harmonizes the interpretations upheld by different probabilistic schools.
1 Preface
The origin of the Probability Calculus is credited to Pascal, who applied rigorous methods to a subject that until then had been left to gamblers and unreliable individuals. He intended to lay the foundations of a new Geometry, in which the random event would be a "point" of this hypothetical abstract science. Over the centuries several scientists shared Pascal's conjecture, which has been accepted without discussion. In our opinion, however, an exhaustive and systematic approach to probability requires us to investigate its subject, the random event, before examining probability itself. The probability theories do not diverge in their final results and do not provide different formulas for total probability and conditional probability; rather, they conflict over the foundations, that is, over the initial concepts, and this circumstance seems to us a substantial reason to study the random event. In brief, the probability theories use two main models of the random event: the linguistic model and the set model. We shall examine them in the following sections. However, we do not restrict our work to mere criticism; we shall also outline a theoretical proposal. It provides a new mathematical model of the random event and a definition of probability that seems capable of reconciling the various authors who today appear to be in conflict: Kolmogorov and the frequentists, the subjectivist and objectivist schools, etc. In this article we present a few elements taken from the complete theoretical framework [11].

2 Linguistic Model
In general, different sentences can describe the same random event. Let the propositions $p, q, \dots$ concern one event and satisfy the equivalence relation
$$p \Leftrightarrow q \qquad (1)$$
They form the equivalence class
$$X = \{p, q, \dots\} \qquad (2)$$
that constitutes the model of the random event, so that we have
$$P = P(X) \qquad (3)$$
We share the opinion that random events are extremely complex, and the linguistic model (2) is consistent with this feature. Disciplines which investigate complicated phenomena, such as psychology and sociology, business management and medicine, adopt the linguistic representation and consider other schemes too simple and reductive. The proposition seems an adequate model except for the following perplexity. Each primitive is a simple idea and can be left to intuition only because of this fundamental property. For example, a number, a point and an entity are elementary concepts. Can we declare that the random event is complex and at the same time assume it is a primary concept? Acknowledging the complexity contradicts the primitive assumption. This contrast would at least require an in-depth justification, which instead is lacking, as far as we know. The inconsistency is confirmed in everyday practice, and we examine the linguistic model in relation to the facts.
2.1) - Some subjectivists declare that every particular of the event should be described in order to make its uniqueness evident, whereas in usual calculations we accept a sentence such as
"The coin comes down heads" (4)
Note that only two items are reported: the coin and the result. The precise date, time, place and all the particulars that make the event unique and unrepeatable remain implicit. In fact, the parts of a probabilistic event are not easy to distinguish and to relate in a sentence. In conclusion, a gap exists between the theoretical assertions and the practical applications of (2).
2.2) - In the Logic of Predicates every phrase has a precise meaning and can be computed. Programmers using Prolog and Lisp develop inferences: logical programs can deduce the thesis from the hypothesis using precise clauses. However, this linguistic precision is an exception, and natural language is normally approximate, to the extent that words must be interpreted. Natural language usually represents a random event in generic terms, whereas the linguistic model (2) should be amenable to the probability calculation (3).
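To make the equivalence classes (1)-(2) concrete, the sketch below treats propositions as Boolean functions of a few hypothetical atomic facts about a coin toss and groups them by truth table, so that $p \Leftrightarrow q$ holds exactly when two propositions land in the same class. The atomic facts, the sentences, the restriction to worlds with exactly one face up, and the helper name `equivalence_classes` are illustrative assumptions, not part of the original text.

```python
from itertools import product

# Hypothetical atomic facts about one toss; a proposition is a Boolean function of them.
ATOMS = ("heads_up", "tails_up")

PROPOSITIONS = {
    "The coin comes down heads": lambda w: w["heads_up"],
    "The coin does not come down tails": lambda w: not w["tails_up"],
    "The coin comes down tails": lambda w: w["tails_up"],
}

# Admissible worlds: exactly one face is up (an assumption of this sketch).
WORLDS = [dict(zip(ATOMS, bits))
          for bits in product((False, True), repeat=len(ATOMS))
          if sum(bits) == 1]

def equivalence_classes(props, worlds):
    """Group propositions that take the same truth value in every world (p <=> q)."""
    classes = {}
    for text, prop in props.items():
        signature = tuple(bool(prop(w)) for w in worlds)
        classes.setdefault(signature, []).append(text)
    return list(classes.values())

for cls in equivalence_classes(PROPOSITIONS, WORLDS):
    print(cls)
```

Two differently worded sentences about heads fall into one class $X$, while the sentence about tails forms a class of its own.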
3 Ensemble Model
The axiomatic theory [8] assumes that the sample space $\Omega$ includes all the possible elementary events. Kolmogorov defines the random event $X$ as a set of particular events $E_x$,
$$X = \{E_x\} \qquad (5)$$
where $X$ is a subset of $\Omega$,
$$X \subset \Omega \qquad (6)$$
and the probability is the measure of $X$:
$$P = P(X) \qquad (7)$$
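For a finite sample space the set model (5)-(7) can be written out directly: an event is a subset $X \subseteq \Omega$, and its probability is the measure of $X$, here the sum of the weights of its elementary outcomes. The fair die, the uniform weights and the function name `prob` are illustrative assumptions.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))               # sample space of one die roll
WEIGHT = {w: Fraction(1, 6) for w in OMEGA}  # assumed fair die

def prob(event):
    """P(X) as the measure of X: the sum of elementary weights, X a subset of OMEGA."""
    if not event <= OMEGA:
        raise ValueError("an event must be a subset of the sample space")
    return sum((WEIGHT[w] for w in event), Fraction(0))

even = frozenset({2, 4, 6})
print(prob(even))         # 1/2
print(prob(OMEGA))        # 1
print(prob(frozenset()))  # 0
```

Note that the event `even` is literally a set of results, which is the identification questioned in 3.1).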
The practical application of the theory is immediately clarified by Kolmogorov, who defines $X$ as the "result" of the event.
3.1) - This conception raises some perplexities in the light of modern systemic studies. Applied and theoretical works on systems [7] treat the event as the dynamic that produces the result from the antecedent item:
$$\text{input} \;\to\; \text{EVENT} \;\to\; \text{output} \qquad (8)$$
The result is a part and the event is the whole. The properties of the event are evidently quite different from the properties of the output. We encounter serious difficulties when we call $\{E_x\}$ a "set of events" and at the same time conceive it as a "set of results". We cannot merge the two without a logical justification: but do we have any?
3.2) - Some probabilistic outcomes cannot be properly modeled as sets and subsets. The interference pattern in the "two-slit experiment" is a well-known case emerging in Quantum Physics [6].
4 Structural Model
We searched for a solution to the difficulties described above and designed a theoretical framework based on a structural model of the random event. Ludwig von Bertalanffy, father of General Systems Theory, conceives a system, and consequently an event, as an intricate set of items which affect one another [2]. Interacting and connecting are the essential character and the inner nature of events, and we take this idea as the basis of our theoretical proposal. We make the following assumption.
Axiom 4.1) - The idea of relating, of connecting, of linking is a primitive.
This idea suggests two elements, specialized in relating and in being related, which we call relationship and entity. We define them as follows.
Definition 4.2) - The relationship R connects the entities, and we say R has the property of connecting.
Definition 4.3) - The entity E is connected by R, and we say E has the property of being connected.
Intuitively, we may say R is the "active" element and E the "passive" one. They are symmetric, complementary and complete, since they exhaust the applications of Axiom 4.1). Relationships and entities are already known in Algebra as operations and elements, as arrows and objects, as edges and vertices. The main difference is that all of these are given as primitives, while R and E derive from the axiomatic concept 4.1). In other words, the properties of the relationship and the entity are stated openly in 4.2) and 4.3), while they are implicit in other theories. We stress that Axiom 4.1) is not a theoretical refinement; it will provide the necessary basis for the ensuing inferences. It follows from Definitions 4.2) and 4.3) that the relationship R links the entity E, and together they give the set
$$S = (E; R) \qquad (9)$$
which is an algebraic structure [4]. In this article we discuss theoretical models with respect to physical reality, so we immediately examine how E, R and S provide proper models for events. The parts of an event are entities and relationships. For example, an entity is a die, a spade, heads, tails, a product. The relationship that connects two or more entities is, for instance, a device, a force, a physical interaction [3]. In physical reality an event is a dynamic phenomenon linking $E_{in}$ to $E_{out}$, and from (9) we can deduce this general structure
$$S = (E_{in}, E_{out}; R) \qquad (10)$$
Using a graph we get
$$E_{in} \xrightarrow{\;R\;} E_{out} \qquad (11)$$
R is the pivotal element in (10) and (11), and the structural model represents the facts accurately. In addition we get the following advantages.
1. The result $E_{out}$ is distinct from the event S. The parts and the whole are logically separate, which gives a precise answer to objection 3.1).
2. Relations and entities can form finite as well as infinite sets, so that R and E fit both discrete and continuous mathematical formalisms.
3. When $E_{out}$ is an ensemble,
$$E_{out} = \{E_x\} \qquad (12)$$
$$E_{out} \subset \Omega \qquad (13)$$
the structure reproduces the set model in (5) and (6).
4. The result $E_{out}$ may also be a rational or an irrational number, a real or an imaginary value. It can be calculated by a wave function or by another function, etc., so we can offer a formal solution to point 3.2).
5. The structure S can include the comprehensive context of the probabilistic event. E.g. the atomic experiment depends upon the observer $E_o$, and we have the exhaustive structure
$$S = (E_{in}, E_{out}, E_o; R) \qquad (14)$$
We believe that the structural model can make a contribution to Quantum Probability.
6. A simple sentence includes nouns that are entities and a verb representing a dynamical evolution. E.g. (4) expresses the following entities and relationship:
$$\underbrace{\text{The coin}}_{E_{in}}\ \underbrace{\text{comes down}}_{R}\ \underbrace{\text{heads}}_{E_{out}} \qquad (15)$$
In short, the algebraic structure reproduces the linguistic model. However, a sentence can be equivocal, whereas the structure S is a rigorous formalism, and this answers point 2.2).
Note that the set (9) has the associative/dissociative property, namely the event is a unicum S and can then be defined in terms of the details E and R. If this analysis is insufficient, we reveal the entities $(E_1, E_2, \dots, E_m)$ and the relations $(R_1, R_2, \dots, R_p)$; these are exploded at a greater level, and so forth. The structure of levels is the complete and rigorous model of any event:
$$S = (E; R) = (E_1, E_2, \dots, E_m;\ R_1, R_2, \dots, R_p) = (E_{11}, E_{12}, \dots, E_{m1}, E_{m2}, \dots, E_{mk};\ R_{11}, R_{12}, \dots, R_{p1}, R_{p2}, \dots, R_{ph}) \qquad (16)$$
The structure can also be written as
$$\begin{array}{c|c|c|c} \text{level 0} & \text{level 1} & \text{level 2} & \text{level 3} \\ \hline S & E;\ R & E_1, \dots, E_m;\ R_1, \dots, R_p & E_{11}, E_{12}, \dots, E_{m1}, E_{m2}, \dots, E_{mk};\ R_{11}, R_{12}, \dots, R_{p1}, R_{p2}, \dots, R_{ph} \end{array} \qquad (17)$$
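The level decomposition (16)-(17) is a tree in which each entity or relationship may be exploded into sub-elements at the next level. The minimal sketch below uses a hypothetical `Element` class and lists the entity and relationship names found at each level. As an example it encodes the coin structure (21) of Section 5.2, assuming that the level-2 elements are parts of $R_m$; the empty level 3 is exactly what the text calls an uncertain structure.

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    """An entity ("E") or relationship ("R"); parts are its sub-elements one level down."""
    name: str
    kind: str
    parts: list = field(default_factory=list)

def levels(event):
    """List (entity names, relationship names) for levels 1, 2, ... below the event S."""
    result = []
    current = list(event.parts)
    while current:
        result.append(([e.name for e in current if e.kind == "E"],
                       [r.name for r in current if r.kind == "R"]))
        current = [p for element in current for p in element.parts]
    return result

# Structure (21), with the level-2 elements assumed to be parts of R_m.
coin = Element("S", "S", [
    Element("E_m", "E"),
    Element("R_m", "R", [Element("E_t", "E"), Element("E_c", "E"),
                         Element("R_t", "R"), Element("R_c", "R")]),
])
for k, (entities, relationships) in enumerate(levels(coin), start=1):
    print("level", k, entities, relationships)
```

Exploding an element further only requires adding to its `parts` list, mirroring the "and so forth" of the level construction.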
The multiple-level decomposition is also known in the literature as the "hierarchical property" [13]. It is applied by professionals in software analysis methodologies [14], [10], it is basic in modern ontology [12], and it is used in various other sectors [1]. The progressive explosion of the event is already known in the Probability Calculus, where we use trees connecting the parts and subparts of a random event. For example, an urn contains $x$ red balls, $y$ green balls and $z$ white balls. What is the probability of getting one white and two green balls in three draws? We consider the drawing $R_w$ of a white ball $w$ and $R_g$ of a green ball $g$. The winning combinations wgg, gwg, ggw are generated by $R_1$, $R_2$ and $R_3$. Intuitively we write this tree connecting three levels:
$$S \;\to\; R_1, R_2, R_3 \;\to\; (R_w, R_g, R_g),\ (R_g, R_w, R_g),\ (R_g, R_g, R_w) \qquad (18)$$
The structure of levels (17) is rigorous and complete. It includes the relations of the event as well as the entities:
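$$\begin{array}{c|c|c} \text{level 0} & \text{level 1} & \text{level 2} \\ \hline S & g, w;\ R_1 + R_2 + R_3 & wgg, gwg, ggw;\ (R_w, R_g, R_g) + (R_g, R_w, R_g) + (R_g, R_g, R_w) \end{array} \qquad (19)$$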
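The urn question can be answered by summing over the three level-1 relationships $R_1, R_2, R_3$ (orders wgg, gwg, ggw), each a product of single-draw probabilities. The text does not say whether drawn balls are replaced; the sketch below assumes drawing without replacement and checks the result by brute-force enumeration over labelled balls. The function names and the ball counts are illustrative.

```python
from fractions import Fraction
from itertools import permutations

def p_by_orders(x, y, z, orders=("wgg", "gwg", "ggw")):
    """Sum, over the winning orders R_1, R_2, R_3, of the product of single-draw
    probabilities, drawing without replacement (an assumption of this sketch)."""
    total = Fraction(0)
    for order in orders:
        left = {"r": x, "g": y, "w": z}
        p = Fraction(1)
        for colour in order:
            if left[colour] == 0:  # this order cannot occur
                p = Fraction(0)
                break
            p *= Fraction(left[colour], sum(left.values()))
            left[colour] -= 1
        total += p
    return total

def p_brute_force(x, y, z):
    """Enumerate all ordered draws of 3 labelled balls (needs at least 3 balls)."""
    balls = ["r"] * x + ["g"] * y + ["w"] * z
    draws = list(permutations(range(len(balls)), 3))
    wins = sum(1 for d in draws if sorted(balls[i] for i in d) == ["g", "g", "w"])
    return Fraction(wins, len(draws))

print(p_by_orders(3, 4, 5), p_brute_force(3, 4, 5))  # 3/22 3/22
```

Each term of the sum corresponds to one branch of the tree (18), so the level structure doubles as the bookkeeping for the calculation.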
Thanks to this completeness the structural model provides some insight into what is involved. In particular, if $R_x$ at level $k$ includes the subrelationships of level $(k+1)$, then $R_x$ connects the entities through these subrelationships. E.g. the structure of levels (19) illustrates the dynamic $R_1$ carried out by $(R_w, R_g, R_g)$, which physically determines the result. The structure (16) shows that any event is composed of precise macromechanisms and micromechanisms. Any event appears like an industrial apparatus, a mechanical clock or an electronic device including various working parts. This operational analysis, which is based on Axiom 4.1), will be fundamental in the next section.

5 Certain and Uncertain Structures
Probability is the answer to questions such as: Who will win the next football match? Who will be elected in the regional elections? Shall I pass the examinations? Where is the photon now? These questions show that probability concerns the particulars of an event that is already known as a whole. We see the overall random phenomenon, but we ignore the details that will produce the result. When we ask "who will win the next match?", we are familiar with the match: we already know the teams that will play, where the match will be held, etc. We master the event, but we do not have the details that will determine the result. Why do we not have the details? The cognitive difficulties related to the particulars of a random event have several origins. For example, memory is generic, the reports are not detailed, the particulars are missing because they are disseminated over a vast area, we meet obstacles in the use of instruments, etc. Ignorance of microscopic details is sometimes a voluntary choice: every detail could be observed, and yet we decline to know it. For example, a company has collected analytical data, but the executive managers ignore them and use their average values in taking important decisions. Macroscopic knowledge together with unawareness of microscopic items provides a precise method. Statisticians adopt this method, which is fully scientific. Let us translate these concepts into the formalism just introduced. Let the event S have levels $1, 2, \dots, q$; two cases now arise.
5.1 Certain Structures
The event is entirely described by the relations and the entities of level $q$. The elements at level $(q+1)$ exist neither on paper nor in physical reality. This structure, which is wholly defined and complete, is certain. As an example we take a falling body:
$$\begin{array}{c|c} \text{level 0} & \text{level 1} \\ \hline S & E_b, E_T;\ R_f \end{array} \qquad (20)$$
The structure includes the body $E_b$, the Earth $E_T$ and the force of gravity $R_f$ at level 1. These elements model the event exhaustively, and no other elements exist in the physical world.

5.2 Uncertain Structures
The event is not entirely described by the relations and the entities of level $q$. The microelements pertaining to level $(q+1)$ exist in physical reality and influence the final results decisively, but the structure does not include them. We call such a partial structure uncertain (or random). As an example we take the flipping of a coin. The structure includes the coin $E_m$ and the launching/falling dynamics $R_m$. The entities $E_t$ (heads) and $E_c$ (tails) and the alternative relations that produce them appear at the next level:
$$\begin{array}{c|c|c|c} \text{level 0} & \text{level 1} & \text{level 2} & \text{level 3} \\ \hline S & E_m;\ R_m & E_t, E_c;\ R_t + R_c & \end{array} \qquad (21)$$
The subrelationships of $R_t$ and of $R_c$ produce any specific outcome. They are essential, since they would enable the calculation of any result, and they should be listed at level 3 in (21). However, they do not appear, and the structure (21) is uncertain.

6 Probability
A certain event is entirely explained by the structure of levels. The structure clearly indicates "how" the event runs through the $q$ levels, which are exhaustive by definition. On the contrary, the uncertain structure is incomplete and cannot describe "how" the event runs in physical reality. Since it is impossible to describe "how" the event functions, level $(q+1)$ being unknown, we ask "when" the event occurs, that is, "when" the random event exists in physical reality. This inquiry reveals a typically physical approach. The problem eludes whoever develops an abstract study: for the pure theoretician the event S, once defined on paper, exists by definition. The practitioner, instead, knows the great difference between the definition of a model and its experimental observation. The structure of levels (16) shows that the event S works through R; therefore we measure the ability of the relationship to connect.
Definition 5.1) - When R links the input to the output in physical reality, the event S is certain and the measure $P(R)$ equals one:
$$P(R) = 1 \qquad (22)$$
When R does not run in physical reality, S is impossible in fact and the measure $P(R)$ is zero:
$$P(R) = 0 \qquad (23)$$
If R runs occasionally, $P(R)$ takes a fractional value. The connection is neither sure nor impossible, and $P(R)$ lies between zero and one:
$$0 < P(R) < 1 \qquad (24)$$
We call probability the measure $P(R)$ of the operation R, which extensively indicates the occurrence of S. We can add the following remarks.
1. The relationship R is the precise argument of probability, while S is generic.
2. Definition 5.1) is consistent with the common-sense notion of probability, as $P(R)$ gauges the possibility or impossibility of the random event.
3. In some special events we can define the operation by its outcome. Formally, we state an unambiguous relation between $E_{out}$ and R,
$$E_{out} \Rightarrow R \qquad (25)$$
and we calculate the probability of the outcome
$$P(E_{out}) = P(R) \qquad (26)$$
E.g. the result heads $E_t$ appears whenever $R_t$ works, and we forecast the chances of a gamble from the possible outputs:
$$P(E_t) = P(R_t) = 0.5 \qquad (27)$$
4. In conclusion, if (12), (13) and (26) hold, Definition 5.1) is consistent with Kolmogorov's theory.
Certain structures include only certain elements; impossible elements have no meaning and are omitted. The unit value of probability merely confirms what is already stated in the levels. For example, $P(R_f)$ is one and substantiates the structure of levels (20). Conversely, the uncertain structure lacks the lowest elements, which are essential, and (24) reveals them. The fractional values of probabilities clarify the intervention of the elements at level $(q+1)$. For example, we do not know the parts of $R_t$ that produce the result $E_t$ in (21), but the probability (27) is capable of explaining how they work: exactly half of the occurrences of S are due to the subrelationships of $R_t$, and the other half are activated by the components of $R_c$. The explanatory and predictive value of probability in (24) thus appears highly relevant.

7 Experimental Verification
Our inferences are strictly inspired by experience, and Definition 5.1) must be confirmed by the facts. To simplify the discussion of practical verification, let the event include either the relationship $R_i$ or NOT $R_i$ at level 2, with level 3 ignored:
$$\begin{array}{c|c|c|c} \text{level 0} & \text{level 1} & \text{level 2} & \text{level 3} \\ \hline S & E;\ R & E_i, \text{NOT } E_i;\ (R_i + \text{NOT } R_i) & \end{array} \qquad (28)$$
The probability $P(R_i)$ expresses the runs of $R_i$ by definition, so the occurrences $g_s(R_i)$ in the sample $s$ verify the theoretical value $P(R_i)$: the more $R_i$ connects, the larger $g_s(R_i)$; conversely, the less $R_i$ runs, the smaller $g_s(R_i)$. However, the absolute frequency $g_s(R_i)$ exceeds the range $[0,1]$, so we select the relative frequency $F_s(R_i)$, which satisfies
$$0 \le F_s(R_i) \le 1 \qquad (29)$$
According to this theory the relative frequency must coincide with the probability calculated theoretically; instead, $F_s(R_i)$ does not coincide with $P(R_i)$. Why? Is there perhaps a systematic error in the experiment? The relationship $R_i$ at level $q$ works by means of its subrelationships at level $(q+1)$, but we do not know in detail how these behave. In particular, a subrelationship at level $(q+1)$ occurs randomly, and a finite number of tests does not allow the subrelationships of $R_i$ to maintain their dynamical contribution to $R_i$. Symmetrically, the subrelationships of NOT $R_i$ are not proportional to $P(\text{NOT } R_i)$. Every finite sample of tests unbalances $R_i$ and NOT $R_i$: the occurrences of one group are lower than they ought to be and the occurrences of the other are greater, since the subrelationships are random. The relative frequencies favour one group of subrelationships to the detriment of the other. $F_s(R_i)$ and $F_s(\text{NOT } R_i)$ are therefore unreliable and disagree with $P(R_i)$ and $P(\text{NOT } R_i)$. We conclude that the correct trial of probability must be extended over the universe, where the subrelationships of $R_i$ and of NOT $R_i$ undergo no limitations. The ideal experimental test of $P(R_i)$, which excludes any distorting influence and provides the unaltered value of $F_s(R_i)$, requires the number $G_s$ of tests to be infinite:
$$G_s = \infty \qquad (30)$$
In this situation the theoretical value P(R_i) and the experimental one coincide:
$|F_s(R_i) - P(R_i)| = 0$          (31)
The ideal experiment (30) is unattainable, so we can only approach it. We define this approximation by the limit
$\lim_{G_s \to \infty} |F_s(R_i) - P(R_i)| = 0$          (32)
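The limit (32) can be illustrated with a short simulation. The sketch below is not part of the theory above: it assumes that the G_s tests are independent and that R_i occurs in each of them with the same probability P(R_i) = p, modelled with Python's pseudo-random generator. The function name `relative_frequency`, the seed and the value p = 0.3 are illustrative choices.

```python
import random

def relative_frequency(p, n_tests, seed=0):
    """Relative frequency F_s = g_s / G_s of an event with probability p
    over n_tests simulated independent tests (a stand-in for R_i)."""
    rng = random.Random(seed)                                         # reproducible pseudo-random tests
    occurrences = sum(1 for _ in range(n_tests) if rng.random() < p)  # g_s
    return occurrences / n_tests                                      # F_s

p = 0.3  # illustrative value of P(R_i)
for g_s in (1, 10, 100, 10_000, 1_000_000):
    f_s = relative_frequency(p, g_s)
    print(f"G_s = {g_s:>9}   F_s = {f_s:.4f}   |F_s - P| = {abs(f_s - p):.4f}")
```

With a fixed seed the run is reproducible; as G_s grows, |F_s - P| tends to shrink, while for G_s = 1 the frequency can only be 0 or 1.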
The limit affirms that, given a large number N, there is a value G_s with
$G_s > N$          (33)
such that
$|F_s(R_i) - P(R_i)| < 1/G_s$          (34)
In other words, if we repeat the tests a "sufficiently large" number of times, the difference between the frequency and the probability will be less than the "small" number 1/G_s. The limit (32) ensures a result as fine as desired. It shows that the probability defined by (22), (23), (24) can be verified in practice, and it confirms that the present theory has substance. The limit (32), known as the empirical law of chance or the law of large numbers, does not define probability; it only explains its experimental verification. It claims less than the limit postulated by the frequentists [9] and does not give rise to the same conceptual difficulties. The limit (32) does not use probability to describe the approximation of F_s(R_i) to P(R_i), and thus it avoids a certain conceptual tautology.
8
Objective and Subjective Probability
The limit (32) states that the higher the number of tests, the closer the frequency moves to the probability. Conversely, the smaller the sample, the less reliable the experimental control of probability. The maximum deviation occurs in a single test, and the structural model explains why. A single subrelationship of level (q + 1) fires the single experiment, and this subrelationship pertains either to R_i or to NOT R_i. In both cases the frequency (0 or 1) deviates completely from the probability, which in general lies strictly between 0 and 1.
G_s        F_s
1          wrong
> N        approximate
∞          right          (35)
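Under an assumed binomial model (independent tests, constant probability p), the expected deviation E|F_s - p| behind the spectrum (35) can be computed exactly rather than simulated. The hypothetical helper `expected_abs_deviation` sums over all possible numbers of occurrences k. For G_s = 1 it gives 2p(1 - p), the largest value in the spectrum, and it tends to zero as G_s grows, although it need not drop at every single step.

```python
from math import comb

def expected_abs_deviation(p, n):
    """Exact expectation of |F_s - p| when F_s is the relative frequency of an
    event with constant probability p in n independent tests (binomial model)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) * abs(k / n - p)
               for k in range(n + 1))

for n in (1, 2, 10, 100, 1000):
    print(n, round(expected_abs_deviation(0.5, n), 4))
```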
The spectrum (35) applies both to frequency and to probability. What does this mean? Any scientific measure takes its meaning from the precise conditions in which it is defined. Therefore a parameter does not have a value forever, but only under the practical conditions in which it must be tested, and this rule also concerns probability. A fairly simple case can clarify the matter. We define the force f as the factor that gives the acceleration a to the mass m:
$f = m \cdot a$          (36)
Mechanics defines the force (36) under conditions that pertain exclusively to an inertial system, that is, a reference frame at rest or moving in a straight line at constant speed. In the inertial system the mass m undergoes the force and accelerates in accordance with (36). Conversely, in a non-inertial frame a body can accelerate without any mechanical action on it; there the force cannot be tested, and definition (36) is meaningless. In general, a scientific measure takes on significance only under the experimental conditions that pertain to it, and outside this context it objectively has no meaning. The same criterion applies to probability, with additional difficulties due to the experimental conditions, which are expressed by the limit (32) and are somewhat complex. We do not have two alternative and mutually exclusive reference systems, inertial and non-inertial; instead we have the continuous spectrum (35). Probability is correctly tested, and thus takes on a right and objective significance, when
$G_s = \infty$          (37)
This is unattainable, and we use a large sample
$G_s > N$          (38)
The higher the number of tests, the more objective the verification of probability, and probability loses significance as G_s decreases. The test is completely meaningless when
$G_s = 1$          (39)
Probability is very useful (see point 3 in section 6), and we calculate P(R) even when (39) holds. In the single event, however, "probability does not exist", as de Finetti paradoxically states [5]. Probability can then only guide personal expectation; that is, it takes on a subjective significance.
G_s        F_s            P
1          wrong          subjective
> N        approximate    objective
∞          right          objective          (40)
Note that the subjectivist schools focus their attention on the single event, whereas the general event is a repetition of single events. This remark shows, once again, that the disagreements between various authors are rooted in how the random event is modelled. In substance, F_s(R_i) and P(R_i) have a correct and objective meaning when they refer to the entire inductive base. As the number of experiments decreases, the precision of F_s(R_i) and the objectivity of P(R_i) decrease progressively, down to the point (39), where the numerical value of F_s(R_i) is systematically wrong and the value of P(R_i) is subjective.
9
Conclusions
Our theoretical proposal arose from a critical approach to the probabilistic event; in particular, we started by examining the relation between the theoretical models in use today and physical reality. We believe that the algebraic structure meets the needs better than the linguistic and set models. Besides the theoretical merits listed in the previous pages, we point out that structures of levels are already applied in several fields, including the calculus of probability. The definition of probability that derives from the structural model is consistent with common sense and with the probabilistic schools. The different interpretations of probability, which today conflict, are unified within our framework. We consider this a significant feature that may stimulate the scientific debate. The reader may find some parts of this paper sketchy and insufficiently explained; we regret the conciseness. Other considerations and further calculations have been developed in [11], but exhaustive discussions cannot be included here.
References
1. Ahl V., Allen T.F.H., Hierarchy theory: a vision, vocabulary and epistemology (Columbia Univ. Press, N.Y., 1996).
2. von Bertalanffy L., General system theory (Braziller, N.Y., 1968).
3. Chen P.S., The entity-relationship model: toward a unified view of data, ACM Transactions on Database Systems, vol. 1, n. 1 (1976).
4. Corry L., Modern algebra and the rise of mathematical structures (Verlag, N.Y., 1996).
5. de Finetti B., Theory of probability (Wiley & Sons, N.Y., 1975).
6. Feynman R., The concept of probability in quantum mechanics, Proceedings Symp. on Math. and Prob., California University Press (1951).
7. Kalman R.E., Falb P.L., Arbib M.A., Topics in mathematical system theory (McGraw-Hill, N.Y., 1969).
8. Kolmogorov A.N., Foundations of the theory of probability (Chelsea, N.Y., 1956).
9. von Mises R., The mathematical theory of probability and statistics (Academic Press, London, 1964).
10. Rocchi P., Technology + culture = software (IOS Press, Amsterdam, 2000).
11. Rocchi P., La probabilità è oggettiva o soggettiva? (Pitagora, Bologna, 1998).
12. Uschold M., Building ontologies: toward a unified methodology, Proc. Expert Systems, Cambridge (1996).
13. Takahara Y., Mesarovic M.D., Macko D., Theory of hierarchical, multilevel systems (Academic Press, N.Y., 1970).
14. Yourdon E., Modern structured analysis (Englewood Cliffs, N.J., 1989).
CONSTRUCTIVE FOUNDATIONS OF RANDOMNESS
V. I. SERDOBOLSKII
Moscow 109028, B. Trekhsviatitelskii 3/12, MGIEM
E-mail: [email protected]
The ideas of complexity and randomness are developed in a consistently constructive theory. The Kolmogorov complexity is reconsidered as a minimization process, and basic theorems are proved for such processes. A new notion of complexity based on sequential prefix coding algorithms (S-algorithms) is proposed. It is proved that a constructive infinite binary sequence is algorithmically stationary iff it is an S-encoded random sequence.
1
Introduction
In 1963 A. N. Kolmogorov [1] suggested an algorithmic approach to the foundations of probability. His new definition of probability was based on the notion of complexity, defined as the length of the minimal description: for a binary word x, the complexity function is defined as
$K(x) = \min_{A(p) = x} |p|$,          (1)
where p ranges over (shorter) binary words and the minimum is evaluated over all possible algorithms A. A remarkable property of this approach was that the randomness so defined was proved to display all the traditional laws of probability. However, the function K(x) defined by (1) in the traditional intuitive approach cannot be effectively calculated, since it is not a partially recursive function. In fact, this function is computable only for finitely many words x [2]. In [3] it was shown that K(x) is not partially recursive for any universal algorithm. In [4] definition (1) was called "a heuristic basis for various approximations". In [5] the author writes that the non-constructive form of definition (1) leads to some difficulties, so that "many important relations hold only to within an error term measured by the logarithm of the complexity". To offer a constructive definition of randomness, it would be desirable to call an infinite sequence random if all its initial segments (prefixes) are incompressible. However, it was proved [6] that such sequences do not exist. Kolmogorov proposed a definition of randomness (K-randomness), but he wrote that it needed to be improved. In this paper we reconsider the fundamental relations of Kolmogorov complexity theory and develop a consistently constructive formalism. The main idea is that, as far as we deal with algorithms, we must explicitly take into account the current time of their performance. Thus, the static notion of minimal description must be replaced by the process of minimization. Here we suggest a rigorous formalism in which the somewhat obscure intuitive reasoning of the existing complexity theory can be replaced by a formal investigation of strings of symbols. We present a survey of the basic results of Kolmogorov complexity theory in terms of processes of step-by-step performance of algorithms. We also introduce a new form of complexity based on a restriction to algorithms coding sequentially from left to right (S-algorithms). Constructive infinite binary sequences can be called stationary if the frequencies of all finite blocks of digits in them converge. We prove that a sequence is stationary iff it is the transformation of an incompressible (up to a logarithmic term) sequence by a sequential left-to-right encoding algorithm. Let us define the objects of the investigation and fix notation. We study binary words x, which are finite chains of binary digits and, at the same time, binary numbers. These words are transformed by algorithmic procedures A, which can be represented by Turing algorithms (Turing machines) or, equivalently, by partially computable (partially recursive) functions. We also study infinite sequences x^∞ of binary digits, which can be considered at the same time as infinite sequences of words x of increasing length n, i.e., initial segments of x^∞. In the constructive approach, these sequences must be generated by some finite algorithms (generating functions). We write A(x) = y if A halts at some finite step and yields y. If A(x) does not halt, we write A(x) = ?. We will often need to perform algorithms step by step. Let A_t(x) denote the result of performing A(x) for t steps: A_t(x) = y if A(x) halts at a step t' ≤ t and yields y. We write A_t(x) = ? if A(x) does not halt or halts only at a moment t' > t. Let |x| denote the length of a binary word x.
2
Kolmogorov Complexity
According to Kolmogorov, the complexity of a binary word is the length of a minimal program generating this word. To make this definition completely constructive, we must first describe the minimization procedure explicitly. To minimize a partially computable function f(x), we combine the search over x with counting the number of steps of an algorithm that evaluates f(x). Let us use a uniform increasing numeration N = 1, 2, ... of n-tuples of arguments; for example, let N = 1, 2, 3, 4, 5, ... represent the pairs (1,1), (1,2), (2,1), (2,2), (1,3), .... Define the standard minimization process for A(x) as follows:
$\min_x A(x) = \{A(x,N),\ N = 1, 2, \dots\}$,
where N = (x,t), A(x,0) = ?, and
$A(x,N) = \min\big(A(x,N-1),\ A_t(x)\big)$ for $N \ge 1$.
In the minimization process, the sign "?" can be treated as infinity. If A(x) halts for a computable number of steps t, then the minimization process ends and min_x A(x) is a computable function. If no such t exists, we say that the function A(x) has no "bottom". Consider the universal Turing machine U: by definition, U(A,p) = A(p) in the domain, where (here and in the following) the same letter A also denotes the text of the algorithm. Let |A| denote the length of the text A. Theorem 1. There exist computable functions such that the mass problem of whether their minimization process halts is algorithmically unsolvable. Proof. Consider the indicator function ind(x,t) = 0 if U_t(x) with x = (A,p) halts exactly at step t, so that U_t(x) = A(x); otherwise ind(x,t) = 1. Denote $\phi(x,t) = \prod_{\tau} \mathrm{ind}(x,\tau)$.
The minimization process …
$\min_{(p,t):\, A_t(p) = x} |p|$ …
In this definition, A(p) is called a generating algorithm and p is called a program, or a code, for x. So the complexity is defined as a process, not as a function. If A(x) halts for some x, then the sequence K(x,A) = {K(x,A,N), N = 1, 2, ...} converges to a constant for some computable N = N_0, and we can say that the complexity function K(x) is defined. Otherwise, no such constructive function exists. To compare minimization processes we need a special technique. Definition 2. Given two minimization processes
$\min_x A(x) = \{A(x,N),\ N = 1, 2, \dots\}$,   $\min_x B(x) = \{B(x,M),\ M = 1, 2, \dots\}$,
we write A(x) … N_0 the inequality A(x,N) < B(x,M) holds.
If both processes halt, we can simply write A(x) < B(x). If A(x) … B(x), we say that strong equivalence holds and write A(x) ~ B(x). Define also a weak equivalence: A(x) ≈ B(x) if A(x) …
$\min_{(p,t):\, A_t(p) = x} |p|$.
We use two methods of complexity theory: upper estimates of the complexity are derived by constructing explicit generating procedures; lower estimates are obtained by counting the variety of words and their programs. Theorem 2. For any algorithm A we have
$K(x,U) \lesssim K(x,A) + c_A$,
where c_A depends only on A and not on x. Proof. Count the steps of A(x) by the steps of the universal Turing machine performing A. For each N we can find a number M such that
$K(x,U,N) = \min_{(z,t):\, U_t(z)=x} |z| \lesssim \min_{B} \min_{(p,t):\, U_t(B,p)=x} |(B,p)| \lesssim \min_{(p,t):\, U_t(A,p)=x} (c_A + |p|) \lesssim c_A + \min_{(p,t) \le M:\, A_t(p)=x} |p| = c_A + K(x,A)$,
where c_A is a constant depending only on A. This completes the proof. This statement is called the Invariance Theorem. Its significance is that it introduces a universal measure of complexity, which is calculated by trying different algorithms with different input words. Let us fix a particular universal Turing machine U as a reference machine and set K(x) = K(x,U).
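The minimization process and the values K(x, A, N) can be sketched in Python. This is an illustration under stated assumptions, not the paper's construction: algorithms are modelled as Python generators that yield once per step, `run_for_steps` plays the role of A_t(p) with `None` standing for "?", the pairs (p, t) are numbered shell by shell in an order that extends the example (1,1), (1,2), (2,1), (2,2), (1,3), ..., and `toy_decoder` is a hypothetical fixed algorithm A rather than the universal machine U.

```python
from itertools import count

def pairs():
    """Number the pairs (i, t) shell by shell: (1,1), (1,2), (2,1), (2,2), (1,3), ..."""
    for k in count(1):
        for i in range(1, k):
            yield (i, k)
        for j in range(1, k + 1):
            yield (k, j)

def program_from_index(i):
    """Program number i = 1, 2, 3, 4, ... is the word '', '0', '1', '00', ..."""
    return bin(i)[3:]          # drop '0b' and the leading 1

def run_for_steps(algorithm, p, t):
    """A_t(p): the output if A(p) halts within t steps, else None (the sign '?')."""
    gen = algorithm(p)
    for _ in range(t + 1):
        try:
            next(gen)
        except StopIteration as stop:
            return stop.value
    return None

def toy_decoder(p):
    """Hypothetical algorithm A, one yield per step:
    '1' + w -> w (literal copy); '0' + b -> a run of int(b, 2) zeros."""
    if not p:
        return ""
    if p[0] == "1":
        for _ in p[1:]:
            yield              # one step per copied digit
        return p[1:]
    n = int(p[1:], 2) if len(p) > 1 else 0
    yield                      # one step to read n
    for _ in range(n):
        yield                  # one step per output zero
    return "0" * n

def complexity_process(x, algorithm, n_max):
    """Return [K(x, A, 1), ..., K(x, A, n_max)], with None standing for '?'."""
    best, values = None, []
    for _, (i, t) in zip(range(n_max), pairs()):
        p = program_from_index(i)
        if run_for_steps(algorithm, p, t) == x:   # A_t(p) = x
            best = len(p) if best is None else min(best, len(p))
        values.append(best)
    return values

values = complexity_process("0" * 16, toy_decoder, 6400)
print(values[0], values[-1])   # None 6
```

The running minimum settles at 6, the length of the program '0' + '10000' that encodes the run of 16 zeros; the literal program '1' + x would cost 17 digits. The sketch itself cannot tell that the minimum has been reached, which is why the complexity is treated as a process.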
Let us call the difference |x| − K(x) the number of regularities. Remark 1. Given n = |x|, the fraction of words x with more than m regularities is less than 2^{−m}. This follows from the fact that there are only 2^{n−m} − 1 programs p of length less than n − m, while there are 2^n words of length n. So almost all words are incompressible up to a slowly increasing function of n. Remark 2. K(x) ≤ |x| + c. This is obvious, since we can use the identity algorithm A(x) = x as a generating algorithm. Note that the minimization process in Theorem 2 can be made more efficient if we restrict p to |p| ≤ |x| + c. The complexity of finite words depends strongly on the additive constant c; therefore, the main object of study will be the complexity of words x of arbitrarily great length n. Theorem 3. If f(x) is a partially computable function, then K(f(x)) …
Proof. Suppose the algorithm evaluating f(x) halts. Given an arbitrary algorithm A, we construct the composition B = fA. By Definition 3 and Theorem 2, for each N we can find M and a constant c independent of x such that
$K(f(x),U,N) = \min_{(z,t):\, U_t(z)=f(x)} |z| \lesssim \min_{B} \min_{(p,t) \le M:\, B_t(p)=f(x)} |p| + c \lesssim \min_{(p,t) \le M:\, f(A_t(p))=f(x)} |p| + c \le \min_{(p,t) \le M:\, A_t(p)=x} |p| + c = K(x,A) + c$.
The theorem is proved. Example. Let x = 0^n (n zeros). Then K(x) …
Note that this set of codes of n is prefix-free. More economical self-delimiting codes can be obtained by further iterations; denote their length by log* n = log n + log log n + log log log n + ... (the iterated logarithm). Theorem 4. K(x, y) … + K(y) + 2 log ||z|| + 1.
Proof. It suffices to use programs for (x, y) of the form $p = 0^m 1 p_1 p_2$, where m = log p_1, A(p_1) = x, B(p_2) = y, and 0^m serves to separate p_1 from p_2.
3
Incompressibility
Now we consider algorithmically generated infinite sequences of digits x^∞, treated as sequences of words {x : |x| = n = 1, 2, ...}. We cite (in a simplified form) two theorems by Martin-Löf [6]. Theorem 5. Any constructive x^∞ contains infinitely many words x of length n with K(x) …
4
Reversible Complexity
Let us restrict ourselves to reversible algorithms. Definition 6. An algorithm A(p) is called reversible (an R-algorithm) if one can find another algorithm B = A^{−1} such that A(p) = x implies B(x) = p and vice versa. These algorithms establish a one-to-one correspondence between inputs and outputs. We can say that B(x) is an encoding algorithm and A(p) is a decoding algorithm. Definition 7. The R-complexity of a word x is defined as the process K_R(x) = {K_R(x,N), N = 1, 2, ...}, where
$K_R(x,N) = \min_{A} \min_{(p,t):\, A_t(p)=x} |p|$,
where A ranges over R-algorithms and the minimization process is shortened by discovering the first root of the equation A(p) = x. Since the class of R-algorithms includes the identity algorithm, we have K_R(x) ≤ |x| + c. Definition 8. A function (an algorithm) A(x) is called unidomain if there are no pairs x_1 ≠ x_2 such that A(x_1) = A(x_2). Proposition 1. A function A(x) is unidomain iff it is reversible. Proof. First, let A be unidomain. Using A, let us construct an algorithm B(y) as follows:
  for (p,t) = 1, 2, ... do
    if A_t(p) = y then B(y) := p; halt
  endfor
If A(x) = y for some x, this algorithm finds the first root of this equation and halts; if the equation has no root, then B(y) = ?. Conversely, if A is a reversible algorithm, then there exists an algorithm B(y) such that A(x) = y implies B(y) = x, and the argument of A is recovered uniquely. Theorem 7. There exists no algorithm W such that, for any algorithm A, W(A) = 1 if A is a reversible algorithm and W(A) = 0 if not. Proof. It suffices to prove this assertion for some special class of A. Let N be a nullifying algorithm such that N(x) = 0 for any x, and let B be an arbitrary algorithm. Choose A so that A(0) = 0, A(1) = N(B(1)), and A(n) = n for n > 1. This algorithm is not unidomain iff B(1) halts. However, the mass problem of algorithm halting is algorithmically unsolvable. This proves the theorem.
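The search that defines B(y) in the proof of Proposition 1 can be sketched in Python. The sketch models algorithms as Python generators that yield once per step and numbers the pairs (p, t) shell by shell; `prepend_one` is a hypothetical unidomain algorithm, and the `max_pairs` budget is an addition that lets the sketch stop, whereas the construction in the text runs forever (B(y) = ?) when the equation has no root.

```python
from itertools import count, islice

def pairs():
    """Number the pairs (i, t) shell by shell: (1,1), (1,2), (2,1), (2,2), (1,3), ..."""
    for k in count(1):
        for i in range(1, k):
            yield (i, k)
        for j in range(1, k + 1):
            yield (k, j)

def program_from_index(i):
    """Program number i = 1, 2, 3, ... is the word '', '0', '1', '00', ..."""
    return bin(i)[3:]

def run_for_steps(algorithm, p, t):
    """A_t(p): the output if A(p) halts within t steps, else None ('?')."""
    gen = algorithm(p)
    for _ in range(t + 1):
        try:
            next(gen)
        except StopIteration as stop:
            return stop.value
    return None

def prepend_one(p):
    """Hypothetical unidomain algorithm A(p) = '1' + p, one step per input digit."""
    for _ in p:
        yield
    return "1" + p

def invert(algorithm, y, max_pairs):
    """B(y): the first root p of A_t(p) = y met in the enumeration of (p, t)."""
    for i, t in islice(pairs(), max_pairs):
        p = program_from_index(i)
        if run_for_steps(algorithm, p, t) == y:
            return p
    return None                # budget exhausted: stands for B(y) = ?

print(invert(prepend_one, "1011", 1000))   # 011
print(invert(prepend_one, "0", 1000))      # None
```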
Theorem 8. The complexity K_R(x) ≈ K(x). Proof. The relation K(x) …
$= \min_{A} \min_{(A,p,t):\, A_t(p)=x} |p|$,
where A ranges over arbitrary algorithms. Given A, the minimization here is carried out over all roots of the equation A_t(p) = x. We replace the evaluation of all roots for a single algorithm A_t by the evaluation of roots of a number of equations. Let us number the roots of the equation A(p) = x in the process (p,t) = 1, 2, .... Construct the algorithm B(ν,p) as follows:
  k := 0
  for (q,r) = 1, 2, ... do
    if A_r(q) = x then
      k := k + 1
      if k = ν and p = q then B := x; halt
  endfor
The function B(ν,p) = x iff p is the root number ν; otherwise B(ν,p) = ?. By construction, for fixed ν the function B(ν,p) is unidomain. The theorem statement follows. Knowing the complexity of a word x, we can constructively evaluate its minimal codes. Minimizing descriptions of physical events x can be considered as a process of "cognition" of x by a search for regularities producing the phenomenon x. It is known that all elementary physical processes are time-reversible. Reversible generating algorithms, generally speaking, can be less efficient in producing long words. The equivalence K(x) ≈ K_R(x) stated by Theorem 8 can be interpreted as the absence of phenomena that can be produced but not cognized within the frame of the algorithmic theory.
5
Complexity and Information
Kolmogorov discovered [2], [9] that information theory can be developed from the algorithmic definition of complexity. The conditional complexity of a binary word x with respect to a word y is defined as the minimal length of a program that generates x from y:
$K(x \mid y, A) = \min_{(p,t):\, A_t(p,y)=x} |p|$.
Theorem 9. There exists an optimal algorithm V such that for any algorithm A we have K(x|y) := K(x|y,V) …
Example. We have K(0^n|n) …
Theorem 10. Let a word x be partitioned into k blocks of length l. Then k^{−1}K(x) …
$\frac{k!}{k_1!\,k_2! \cdots k_m!}$,   $m = 2^l$, where $k_1 + \dots + k_m = k$. Applying Stirling's formula, we find that the length of this code is no more than m log k + kH(f) + c log k. The theorem statement follows. Thus, K(x) can be considered as the entropy and K(y|x) as the conditional entropy. The information in x about y is I(x|y) = K(y) − K(y|x). Remark 4. For arbitrary words x and y, K(y|x) … + c and K(x,y) = K(x) + K(y|x) + c log |x|. Indeed, consider a special code for (x,y) of the form p_1p_2, where p_1 is a self-delimiting code for x and p_2 is a code for y. We have
$K(x,y) \le \min_{A,B} \min_{(p_1,p_2,t):\, A_t(p_1)=x,\ B_t(p_2)=y} (|p_1| + |p_2|)$.
This is the required statement. Note that the information measure I(x|y) is non-negative only asymptotically, for long x and y. The logarithmic correction term can be attributed to the individual description of x, in contrast to the traditional description in terms of distributions.
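The counting step in the proof of Theorem 10 can be checked numerically. The sketch assumes, as the surviving fragments suggest, that the code for x specifies the block counts k_1, ..., k_m and then the position of x among all words with those counts; it compares log2 of the multinomial coefficient k!/(k_1!...k_m!) with kH(f), where H(f) is the empirical entropy of the block frequencies. The helper `block_code_bound` and the biased test word are hypothetical.

```python
import math
import random
from collections import Counter

def block_code_bound(x, l):
    """Split x into k blocks of length l and return (k, H(f) in bits,
    log2 of k!/(k_1!...k_m!), bits used to write the counts k_1..k_m)."""
    k = len(x) // l
    counts = Counter(x[i * l:(i + 1) * l] for i in range(k))
    m = 2 ** l                                       # number of possible blocks
    entropy = -sum((c / k) * math.log2(c / k) for c in counts.values())
    log_multinomial = (math.lgamma(k + 1)
                       - sum(math.lgamma(c + 1) for c in counts.values())) / math.log(2)
    count_bits = m * math.ceil(math.log2(k + 1))     # each k_j lies in 0..k
    return k, entropy, log_multinomial, count_bits

rng = random.Random(1)
biased = "".join("1" if rng.random() < 0.1 else "0" for _ in range(4096))
k, h, log_mult, count_bits = block_code_bound(biased, 4)
print(k, round(h, 3), round(log_mult, 1), round(k * h, 1), count_bits)
```

The multinomial coefficient never exceeds 2^{kH(f)}: multiplied by the product of (k_j/k)^{k_j}, it is one term of the expansion of (k_1/k + ... + k_m/k)^k = 1. Stirling's formula shows how small the gap is.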
6
Frequency Rates
The stability of frequency rates, which is assumed a priori in the conventional concept of probability, can be deduced in the algorithmic theory. Denote the empirical rate of occurrences of 1 in x by f(x,1). The stability of frequency rates can be stated as follows. Theorem 11. Given an L-random x^∞ and c > 0, for each word x in it, |f(x,1) − 1/2|^2 …
Theorem 12. Given an L-random sequence x^∞ = {x} and a block of digits b of length l, for all words x of length n we have |f(x,b) − 2^{−l}|^2 …
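The empirical rates f(x, 1) and f(x, b) of Theorems 11 and 12 can be computed as follows. Two assumptions are made here: occurrences of b are counted with overlaps and divided by the number of possible starting positions, since the text does not fix the convention; and a seeded pseudo-random word stands in for an L-random sequence, which it is not. The sketch only shows what the theorems measure.

```python
import random

def block_frequency(x, b):
    """Empirical rate f(x, b): overlapping occurrences of block b in x divided
    by the number of positions where b could start (an assumed convention)."""
    positions = len(x) - len(b) + 1
    hits = sum(1 for i in range(positions) if x[i:i + len(b)] == b)
    return hits / positions

rng = random.Random(7)
x = "".join(rng.choice("01") for _ in range(100_000))   # pseudo-random stand-in
print(abs(block_frequency(x, "1") - 1 / 2))             # deviation from 1/2
print(abs(block_frequency(x, "101") - 2 ** -3))         # deviation from 2^-l, l = 3
```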
A number of other specifically probabilistic laws, previously deduced by intuitive reasoning in …, can be proved similarly.
7
Prefix Complexity
In 1974–1975 another approach to complexity was developed, starting from the concept of prefix complexity (by L. A. Levin, P. Gács, G. J. Chaitin [10–12]).
Definition 9. A set of words is called prefix-free if there are no pairs of different words such that one is the beginning of the other. Lemma 1. (1) If {p_i} is a prefix-free set and n_i = |p_i|, i = 1, 2, ..., then the Kraft inequality holds:
$\sum_{i=1,2,\dots} 2^{-n_i} \le 1$.
(2) If numbers n_1, n_2, …
… A_t(p) = x
The set of prefix algorithms is enumerable. Theorem 13. There exists a universal prefix algorithm V such that for any prefix algorithm A we have
$K_P(x) := K_P(x,V) \lesssim K_P(x,A) + c_A$.
To deal with prefix algorithms, note that we can recover the word x = 0^n (n zeros) from n, but we cannot encode numbers n as plain integers, since the integers do not form a prefix-free set. Using self-delimiting codes we obtain prefix-free codes of length n + log* n. Remark 6. K(x) … + log*(z). Remark 7. K_P(x,y) … + K_P(y) + c. In contrast to K(x), here we do not need an end marker for the word x, since x is recognized as a prefix. Theorem 14 [12]. For any fixed length n of words x we have $\max_x K_P(x) \ge n + \log^* n - c$. Theorem 15 [13]. An infinite sequence x^∞ is Martin-Löf random iff K_P(x) ≥ |x| − c for all its words x.
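Definition 9, the Kraft inequality of Lemma 1 and the role of self-delimiting codes can be checked on small sets of words. The sketch uses a simple textbook self-delimiting code, assumed here for illustration: the binary length of w with every bit doubled, then the terminator '01', then w itself. Its length is |w| + 2 log |w| + O(1); the more economical log* codes mentioned in the text iterate the same idea. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def is_prefix_free(words):
    """Definition 9: no word is a proper beginning of another word."""
    ws = sorted(set(words))
    # In lexicographic order the words extending a given word follow it
    # immediately, so comparing neighbours is enough.
    return all(not b.startswith(a) for a, b in zip(ws, ws[1:]))

def kraft_sum(words):
    """Left-hand side of the Kraft inequality, computed exactly."""
    return sum(Fraction(1, 2 ** len(w)) for w in words)

def self_delimiting(w):
    """Simple prefix code: bits of len(w) doubled, then '01', then w."""
    header = "".join(bit * 2 for bit in format(len(w), "b"))
    return header + "01" + w

def decode_first(stream):
    """Read one self-delimited word from the front of stream; return (w, rest)."""
    i, length_bits = 0, ""
    while stream[i:i + 2] != "01":     # doubled bits are '00' or '11'
        length_bits += stream[i]
        i += 2
    i += 2                             # skip the terminator
    n = int(length_bits, 2)
    return stream[i:i + n], stream[i + n:]

words = ["".join(bits) for n in range(7) for bits in product("01", repeat=n)]
codes = [self_delimiting(w) for w in words]
print(is_prefix_free(words), is_prefix_free(codes))                  # False True
print(kraft_sum(codes))                                              # 43/256
print(decode_first(self_delimiting("1011") + self_delimiting("0")))  # ('1011', '11010')
```

The plain words fail Definition 9 (the empty word begins every other word), while their codes pass it and satisfy the Kraft inequality; the decoder finds where the first code word ends without any end marker.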
For most x^∞ we have K_P(x) ≥ |x| − c for all x. Thus, the prefix complexity of almost all sequences fluctuates between the bounds |x| and |x| + log* |x| (up to a constant c).
8
Universal Probability
The idea of a universal a priori probability was put forward by Solomonoff in [4]. For a binary word x, he introduced the probability P(x) = 2^{−|p(x)|}, where p(x) is a minimal description of x. However,
$\sum_x 2^{-K(x)} = \infty$.
To obtain normalizable algorithmic probabilities, the Kraft inequality for prefix-free sets was invoked, and this led to the development of the theory of prefix complexity [10–12]. Let us reformulate its basic results in a consistently constructive form. Definition 11. The algorithmic probability of x is defined by the process
$P(x) = \{2^{-K_P(x,N)},\ N = 1, 2, \dots\}$.
Example. If x = 0^n, then K_P(x) ≤ log n + 2 log log n + c. Hence P(x) ≥ c/(n log^2 n). Definition 12. The universal a priori probability is defined by Q(x) = {Q(x,U,N), N = (p,t) = 1, 2, ...}, where U is the universal prefix algorithm and
$Q(x,U,N) = Q(x,U,N-1) + \mathrm{ind}(U_t(p) = x)\, 2^{-|p|}$,
where the indicator function equals 1 iff U_t(p) halts exactly at step t, and 0 otherwise. Since the mass problem of the universal machine halting is algorithmically unsolvable, the sequence Q(x) has no "ceiling". The following Coding Theorem shows that these two formulations define processes differing by no more than a constant. Theorem 16. For each x we have K_P(x) ≈ −log Q(x). In [14] a non-constructive infinite binary fraction was considered:
$\Omega = \sum_x Q(x) < 1$.
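The accumulation of Ω(N) from below can be illustrated with a toy machine. This sketch rests on explicit assumptions: instead of the universal prefix algorithm U it uses a hypothetical prefix-free program set 1^{i−1}0, in which program number i halts exactly at step i when i is odd and never otherwise, and it sums over all outputs x at once. Its limit is the computable number 2/3, so unlike the Ω of the text it has a known "ceiling"; only the monotone, step-by-step growth carries over. Exact fractions avoid rounding.

```python
from fractions import Fraction
from itertools import count, islice

def pairs():
    """Number the pairs (i, t) shell by shell: (1,1), (1,2), (2,1), (2,2), (1,3), ..."""
    for k in count(1):
        for i in range(1, k):
            yield (i, k)
        for j in range(1, k + 1):
            yield (k, j)

def halting_step(i):
    """Hypothetical prefix machine: program number i is the word 1^(i-1) 0 (|p| = i).
    It halts exactly at step i when i is odd; None means it never halts."""
    return i if i % 2 == 1 else None

def omega_process(n_max):
    """[Omega(1), ..., Omega(n_max)] with
    Omega(N) = Omega(N-1) + ind(program i halts exactly at step t) * 2^-|p|."""
    omega, values = Fraction(0), []
    for i, t in islice(pairs(), n_max):
        if halting_step(i) == t:
            omega += Fraction(1, 2 ** i)
        values.append(omega)
    return values

values = omega_process(400)      # all pairs of shells 1..20
print(float(values[-1]))         # slightly below 2/3
```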
The real number Ω was called the universal algorithm halting probability. It can be interpreted as a process {Ω(N), N = 1, 2, ...} with
$\Omega(N) = \Omega(N-1) + \mathrm{ind}(U_t(p) = x)\, 2^{-|p|}$,   $N = (x,p,t)$,
where the indicator function equals 1 iff U_t(p) halts exactly at moment t yielding x, and 0 otherwise. The monotonically increasing sequence Ω(N) is bounded from above and has no "ceiling". Knowing the first digits of Ω(N), N = 1, 2, ..., we can accumulate in Ω the solutions of all constructive problems of bounded complexity. C. Bennett and M. Gardner called Ω "the number of Wisdom" [15].
9
Sequentially Coding Algorithms
We suggest the following extension of complexity theory, produced by a restriction to algorithms coding sequentially from left to right. A set P of code words is called complete-code if any half-infinite sequence can be represented as a concatenation of codes from P. Definition 13. A one-to-one constructive function T : X ↔ Y is called a coding table if it is defined on complete-code prefix-free sets X and Y. Definition 14. An algorithm A evaluating a coding table T : X ↔ Y is called a sequential coder, or an S-algorithm, if (1) for any concatenation x = x_1x_2...x_k of words x_i from X, we have A(x) = A(x_1)A(x_2)...A(x_k); (2) for any concatenation y = A(x_1)A(x_2)...A(x_k), we also have A(x_1x_2...x_k) = y. The set of S-algorithms is recursively enumerable. Definition 15. The S-complexity of a word x with respect to an S-algorithm A is the process K_S(x,A) = {K_S(x,A,N), N = 1, 2, ...}, where
$K_S(x,A,N) := \min_{(p,t):\, A_t(p)=x} |p|$.
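Definitions 13 and 14 can be illustrated with a small sequential coder. The coding table T below is hypothetical: X = {0, 10, 11} and Y = {11, 0, 10} are both complete prefix codes, so any word that is a concatenation of X-words is parsed uniquely from left to right. The helper `make_sequential_coder` is an assumption for illustration; it rejects a finite input that does not end on a code word rather than waiting for more digits.

```python
def make_sequential_coder(table):
    """Given a coding table T: X -> Y (a dict between prefix-free sets), return an
    encoder that parses its input left to right into words of X and concatenates
    their images under T."""
    def encode(x):
        out, current = [], ""
        for bit in x:
            current += bit
            if current in table:          # X is prefix-free: the first match is the word
                out.append(table[current])
                current = ""
        if current:
            raise ValueError("input does not end on a word of X")
        return "".join(out)
    return encode

T = {"0": "11", "10": "0", "11": "10"}                # hypothetical coding table
A = make_sequential_coder(T)                          # the S-algorithm
A_inv = make_sequential_coder({v: k for k, v in T.items()})

x1, x2 = "10", "0110"
print(A(x1 + x2) == A(x1) + A(x2))   # True: property (1) of Definition 14
print(A("0110"), A_inv(A("0110")))   # 111011 0110
```

Because the inverse table is again a coding table between complete prefix codes, the decoder is itself a sequential coder, which mirrors the remark in the text that such algorithms compose.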
Theorem 17. There exists a (universal) S-algorithm V such that for any S-algorithm A we have
$K_S(x) := K_S(x,V) \lesssim K_S(x,A) + c_A$,
where c_A does not depend on x.
Since the class of S-algorithms contains the identity algorithm (with A(0) = 0, A(1) = 1), we have K_S(x) ≤ |x| + c. If f(x) is a partially computable function evaluated by some S-algorithm, then K_S(f(x)) …
Any L-random sequence is algorithmically stationary. Lemma 2. If a binary sequence y^∞ = {y} is produced from an algorithmically stationary sequence x^∞ = {x} by an S-algorithm A, so that y = A(x), then the sequence y^∞ is also algorithmically stationary. Proof. Suppose y^∞ is produced from x^∞ by y = A(x), where A is an S-algorithm. The algorithm A defines a prefix-free domain X and a code-complete range of values Y. Choose a block of digits b. Using the completeness of Y, we have b = y_1y_2...y_k, where y_i ∈ Y, i = 1, 2, ..., k. By the sequential property we can find a program a = x_1x_2...x_k with all x_i ∈ X such that A(a) = b. The frequencies satisfy f(a,x) = f(b,y). This proves the lemma. Lemma 3. K_S(K_S(x)) ≤ |K_S(x)| + c. Proof. Note that the composition AB of two S-algorithms A and B is again an S-algorithm. For a fixed N we find
$K_S(x,N) = \min_{A} \min_{(p,t):\, A_t(p)=x} |p|$,
and for the minimizing value p = p_0,
$K_S(p_0,M) = \min_{B} \min_{(y,t) \le M:\, B_t(y)=p_0} |y|$.
Let y = y_0 be the minimizing value of a code for p_0. Since A_tB_t(y) = x for some t (if both algorithms halt), it is clear that K_S(x) ≤ |y| + c. We obtain K(x) …
Proof. First, assume that y = A(x) for all x ∈ x^∞ and K_S(x) ≥ |x| − c log |x|. We have K(x) ≥ K_S(x) − log* |x|, so K(x) ≥ |x| − c' log |x| with c' > c + 1. By Theorem 12 the sequence x^∞ is stationary. To prove the converse, assume that x^∞ = {x} is stationary. We find min K_S(x,N) for (p,t) ≤ N; let p be a minimum code for x, with A_t(p) = x for some t if A_t(p) halts. Here A : P → X has the domain P and the range X, both prefix-free and code-complete. Since X is code-complete, we can express x as x_1x_2...x_k with x_i ∈ X, and A(p_i) = x_i with p_i ∈ P, i = 1, ..., k. By Lemma 3 we have K_S(p) ≥ |p| − c. It follows that p = p_1p_2...p_k is log-incompressible. The proof is complete.
The comparison of different notions of complexity and randomness shows that they differ by no more than a logarithmic term. Taking the stationarity theorems into account, it seems plausible to suggest a common definition of randomness of infinite sequences x^∞ = {x} as incompressibility up to the term c log |x|, where c does not depend on x.
In conclusion, it is my pleasure to express my sincere gratitude to Prof. V. M. Maximov for encouraging discussions.
References
1. A. N. Kolmogorov, Grundbegriffe der Wahrscheinlichkeitsrechnung (Springer Verlag, 1933; in English: Chelsea, New York, 1956).
2. A. N. Kolmogorov, Problems of Information Transmission, 1, 1, 1–7 (1965).
3. L. Longren, Computer and Information Sciences, 2, 165–175 (1967).
4. R. J. Solomonoff, Proc. of Symposia in Applied Math., AMS, 43 (1962); IEEE Trans. on Inform. Theory, 4, 5, 662–664 (1968).
5. Li Ming, P. Vitányi, An Introduction to Kolmogorov Complexity (Springer, Berlin–Heidelberg–New York, 1993).
6. P. Martin-Löf, Information and Control, 9, 602–619 (1966); Z. Wahrsch. Verw. Geb., 19, 225–230 (1971).
7. A. N. Shiryaev, The Annals of Probability, 17, 3, 866–944 (1989).
8. G. J. Chaitin, J. ACM, 16, 145–159 (1969).
9. A. N. Kolmogorov, Russian Math. Surveys, 38, 4, 27–36 (1983).
10. L. A. Levin, Problems of Information Transmission, 10, 3, 206–210 (1974).
11. P. Gács, Soviet Math. Doklady, 15, 1477–1480 (1974).
12. G. J. Chaitin, J. ACM, 22, 329–340 (1975).
13. V. V. Vjugin, Semiotika i Informatika (in Russian), 16, 14–43 (1981); V. A. Uspenskii, SIAM J. Theory Probab. Appl., 32, 387–412 (1987).
14. R. J. Solomonoff, Information and Control, 7, 1–22 (1964).
15. C. H. Bennett, M. Gardner, Sci. American, 241, 11, 20–34 (1979).
STRUCTURE OF PROBABILISTIC INFORMATION AND QUANTUM LAWS
JOHANN SUMMHAMMER
Atominstitut der Österreichischen Universitäten
Stadionallee 2, A-1020 Vienna, Austria
E-mail: [email protected]
The acquisition and representation of basic experimental information under the probabilistic paradigm is analysed. The multinomial probability distribution is identified as governing all scientific data collection, at least in principle. For this distribution there exist unique random variables whose standard deviation becomes asymptotically invariant under changes of the physical conditions. Representing all information by means of such random variables gives the quantum mechanical probability amplitude and a real alternative. For predictions, the linear evolution law (Schrödinger or Dirac equation) turns out to be the only way to extend the invariance property of the standard deviation to the predicted quantities. This indicates that quantum theory originates in the structure of gaining pure, probabilistic information, without any mechanical underpinning.
1
Introduction
The probabilistic paradigm proposed by Born is well accepted for comparing experimental results with quantum theoretical predictions. It states that only the probabilities of the outcomes of an observation are determined by the experimental conditions. In this paper we wish to place this paradigm first. We shall investigate its consequences without assuming quantum theory or any other physical theory. We regard this paradigm as defining the method of investigating nature: the collection of information in probabilistic experiments performed under well-controlled conditions, and the efficient representation of this information. Realising that the empirical information is necessarily finite makes it possible to put limits on what can at best be extracted from this information, and therefore also on what can at best be said about the outcomes of future experiments. At first, this has nothing to do with laws of nature. But it tells us what optimal laws look like under probability. Interestingly, the quantum mechanical probability calculus turns out to be almost the best possibility. It meets with difficulties only when it must make predictions from a small amount of input information. We find that the quantum mechanical way of prediction does nothing but take the initial uncertainty volume of the representation space of the finite input information and move this volume about, without compressing or expanding it. However, we emphasize that any mechanistic imagery of particles, waves, fields, even space, must be seen for what it is: the human brain's way of portraying sensory impressions, mere images in our minds. Taking them as corresponding to anything in nature, while going a long way in the design of experiments, can become very counterproductive to science's task of finding laws. Here, the correct path seems to be the search for invariant structures in the empirical information, without any models. Once embarked on this road, the old question of how nature really is no longer seeks an answer in the muscular domain of mass, force, torque and the like, which classical physics took as such unshakeable primary notions (not surprisingly, considering our ape origin, I cannot help commenting). Rather, one asks: which of the structures principally detectable in probabilistic information are actually realized? In the following sections we shall analyse the process of scientific investigation of nature under the probabilistic paradigm. We shall first look at how we gain information, then at how we should best capture this information in numbers, and finally at what the ideal laws for making predictions should look like. The last step will yield the quantum mechanical time evolution, but will also indicate a problem due to finite information.
2
Gaining experimental information
Under the probabilistic paradigm, basic physical observation is not very different from tossing a coin or blindly picking balls from an urn. One sets up specific conditions and checks what happens. Then one repeats this many times to gather statistically significant amounts of information. The difference from classical probabilistic experiments is that in quantum experiments one must carefully monitor the conditions and ensure that they are the same for each trial. Any noticeable change constitutes a different experimental situation and must be avoided.ᵃ Formally, one has a probabilistic experiment in which a single trial can give $K$ different outcomes, one of which happens. The probabilities of these outcomes, $p_1,\dots,p_K$ ($\sum_j p_j = 1$), are determined by the conditions, but they are unknown. In order to find their values, and thereby the values of physical quantities functionally related to them, one does $N$ trials. Let us assume the outcomes $j = 1,\dots,K$ happen $L_1,\dots,L_K$ times, respectively ($\sum_j L_j = N$). The $L_j$ are random variables, subject to the multinomial probability distribution. Listing $L_1,\dots,L_K$ represents the complete information gained in the $N$ trials. The customary way of representing the information is, however, by other random variables, the so-called relative frequencies $\nu_j = L_j/N$. Clearly, they also obey the multinomial probability distribution. Examples:
* A trial in a spin-1/2 Stern-Gerlach experiment has two possible outcomes. This experiment is therefore governed by the binomial probability distribution.
* A trial in a GHZ experiment has eight possible outcomes, because each of the three particles can end up in one of two detectors². Here, the relative frequencies follow the multinomial distribution of order eight.
* Measuring an intensity in a detector which can only fire or not fire is in fact an experiment where one repeatedly checks whether a firing occurs in a sufficiently small time interval. Thus one has a binomial experiment. If the rate of firing is small, the binomial distribution can be approximated by the Poisson distribution.
We must emphasize that the multinomial probability distribution is of utmost importance to physics under the probabilistic paradigm. This can be seen as follows. The conditions of a probabilistic experiment must be verified by auxiliary measurements. These are usually coarse classical measurements, but should actually also be probabilistic experiments of the most exacting standards. The probabilistic experiment of interest must therefore be done by ensuring that for each of its trials the probabilities of the outcomes of the auxiliary probabilistic experiments are the same. Consequently, empirical science is characterized by a succession of data-takings of multinomial probability distributions of various orders. The laws of physics are contained in the relations between the random variables from these different experiments. Since the statistical verification of these laws is again ruled by the properties of the multinomial probability distribution, we should expect that the inner structure of the multinomial probability distribution will appear in one form or another in the fundamental laws of physics.
ᵃ Strictly speaking, identical trials are impossible. A deeper analysis of why one can neglect remote conditions might lead to an understanding of the notion of spatial distance, about which relativity says nothing, and which is badly missing in today's physics.
In fact, we might be led to the bold conjecture that, under the probabilistic paradigm, basic physical laws are no more than the structures implicit in the multinomial probability distribution. There is no escape from this distribution. Whichever way we turn, we stumble across it as the unavoidable tool for connecting empirical data to physical ideas. The multinomial probability distribution of order $K$ is obtained when calculating the probability that, in $N$ trials, the outcomes $1,\dots,K$ occur $L_1,\dots,L_K$ times, respectively:
$$\mathrm{Prob}(L_1,\dots,L_K \mid N, p_1,\dots,p_K) = \frac{N!}{L_1!\cdots L_K!}\,p_1^{L_1}\cdots p_K^{L_K} \tag{2.1}$$
The expectation values of the relative frequencies are
$$\bar\nu_j = p_j \tag{2.2}$$
and their standard deviations are
$$\Delta\nu_j = \sqrt{\frac{p_j(1-p_j)}{N}} \tag{2.3}$$
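The expectation value $p_j$ and the spread $\sqrt{p_j(1-p_j)/N}$ of the relative frequencies can be checked with a short simulation. The sketch below draws repeated runs of $N$ multinomial trials and compares the empirical mean and standard deviation of each $\nu_j$ with these values. The probabilities, $N$, the number of runs and the seed are arbitrary illustrative choices, not values from the text.

```python
import math
import random


def relative_frequencies(p, N, rng):
    """One run of N multinomial trials: returns nu_j = L_j / N for every outcome j."""
    outcomes = rng.choices(range(len(p)), weights=p, k=N)
    return [outcomes.count(j) / N for j in range(len(p))]


def check_moments(p, N, runs=2000, seed=1):
    """Sample mean and std of each nu_j, next to p_j and sqrt(p_j (1 - p_j) / N)."""
    rng = random.Random(seed)
    samples = [relative_frequencies(p, N, rng) for _ in range(runs)]
    rows = []
    for j, pj in enumerate(p):
        vals = [s[j] for s in samples]
        mean = sum(vals) / runs
        std = math.sqrt(sum((v - mean) ** 2 for v in vals) / (runs - 1))
        predicted_std = math.sqrt(pj * (1 - pj) / N)
        rows.append((pj, mean, std, predicted_std))
    return rows


if __name__ == "__main__":
    for pj, mean, std, pred in check_moments([0.07, 0.25, 0.68], N=100):
        print(f"p={pj:.2f}  mean(nu)={mean:.4f}  std(nu)={std:.4f}  predicted={pred:.4f}")
```

The sample means should sit close to the $p_j$ and the sample spreads close to the predicted standard deviations, with deviations shrinking as the number of runs grows.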
3 Efficient representation of probabilistic information
The reason why probabilistic information is most often represented by the relative frequencies $\nu_j$ seems to be historical: probability theory originated as a method of estimating fractions of countable sets when inspecting all elements was not possible (good versus bad apples in a large plantation, desirable versus undesirable outcomes in games of chance, etc.). The relative frequencies and their limits were the obvious entities to work with. But the information can be represented equally well by other random variables $\chi_j$, as long as these are one-to-one mappings $\chi_j(\nu_j)$, so that no information is lost. The question is whether there exists a most efficient representation. To answer this, let us see what we know about the limits $p_1,\dots,p_K$ before the experiment, but having decided to do $N$ trials. Our analysis is equivalent for all $K$ outcomes, so we can pick out one and drop the subscript. We can use Chebyshev's inequality⁴ to estimate the width of the interval to which the probability $p$ of the chosen outcome is pinned down.ᵇ If $N$ is not too small, we get
$$w_p = 2k\sqrt{\frac{p(1-p)}{N}} \tag{3.1}$$
where $k$ is a free confidence parameter. (Eq. (3.1) is not valid at $p = 0$ or $1$.) Before the experiment we do not know $p$, so we can only give the upper limit, which is attained at $p = 1/2$:
$$w_p \le \frac{k}{\sqrt{N}} \tag{3.2}$$
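The bound (3.2) is the maximum of (3.1) over $p$, reached at $p = 1/2$. A short sketch makes this concrete; the confidence parameter $k = 2$ and $N = 100$ are arbitrary choices for illustration.

```python
import math


def width_p(p, N, k):
    """Chebyshev interval width for p, eq. (3.1): 2k * sqrt(p(1-p)/N)."""
    return 2 * k * math.sqrt(p * (1 - p) / N)


N, k = 100, 2.0
upper_limit = k / math.sqrt(N)                 # eq. (3.2)
grid = [i / 1000 for i in range(1, 1000)]      # p in (0, 1), avoiding p = 0 and p = 1
widths = [width_p(p, N, k) for p in grid]
p_at_max = grid[widths.index(max(widths))]

print(f"max w_p = {max(widths):.4f} at p = {p_at_max}; k/sqrt(N) = {upper_limit:.4f}")
```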
ᵇ Chebyshev's inequality states: for any random variable whose standard deviation exists, the probability that the value of the random variable deviates by more than $k$ standard deviations from its expectation value is less than, or equal to, $k^{-2}$. Here, $k$ is a free confidence parameter greater than 1.
But we can be much more specific about the limit $\bar\chi$ of the random variable $\chi(\nu)$, for which we require that, at least for large $N$, the standard deviation $\Delta\chi$ shall be independent of $p$ (or of $\bar\chi$, for that matter, since there will exist a function $p(\bar\chi)$):
$$\Delta\chi = \frac{C}{\sqrt{N}} \tag{3.3}$$
where $C$ is an arbitrary real constant. For the derivation of the function $\chi(\nu)$ it is easiest to make use of the illustration in Fig. 1. Although it already shows the solution, the argument is general enough that the particular form of the discussed function does not matter. First we note that $\chi(\nu)$ shall be smooth, differentiable and strictly monotonic. For sufficiently large $N$ the probability distribution of $\nu$ can be approximated by a normal distribution centered at $\bar\nu$ and with standard deviation $\Delta\nu$. In other words, it will approach the Gaussian form
$$\mathrm{Prob}(\nu \mid N, p) \approx r\,\exp\left[-\frac{(\nu-\bar\nu)^2}{2(\Delta\nu)^2}\right] \tag{3.4}$$
where $r$ is the normalization factor. But clearly, the corresponding probability distribution of $\chi$ will also tend to the Gaussian form, with standard deviation $\Delta\chi$. (For instance, take the probability distributions of $\nu$ and $\chi$ for $p = 0.5$. These are the ones in the middle of Fig. 1.) And if $N$ is large, both $\Delta\nu$ and $\Delta\chi$ will be small, so that in the range of $\chi$ and $\nu$ where the probability is significantly different from zero, the curve $\chi(\nu)$ can be approximated by its tangent
$$\chi \approx \chi(\bar\nu) + \left(\frac{d\chi}{d\nu}\right)_{\bar\nu}(\nu - \bar\nu) \tag{3.5}$$
Then it follows that the characteristic width of the probability distribution of $\chi$, which is $\Delta\chi$, will be proportional to the characteristic width of the probability distribution of $\nu$, which is $\Delta\nu$. The proportionality constant will be $d\chi/d\nu$, because this is by how much the distribution for $\nu$ gets 'squeezed' or 'stretched' to become the one for $\chi$. So we have, for large $N$,
$$\frac{\Delta\chi}{\Delta\nu} = \frac{d\chi}{d\nu} \tag{3.6}$$
Use of (2.3) and (3.3), and integration, yields
$$\chi = C\arcsin(2\nu - 1) + \theta \tag{3.7}$$
where $\theta$ is an arbitrary real constant⁵. For comparison with $\nu$ we confine $\chi$ to $[0,1]$ and thus set $C = \pi^{-1}$ and $\theta = 0.5$, as was already done in Fig. 1. Then we have $\Delta\chi = 1/(\pi\sqrt{N})$, and upon application of Chebyshev's inequality we get the interval $w_\chi$ to which we can pin down the unknown limit $\bar\chi$ as
$$w_\chi = \frac{2k}{\pi\sqrt{N}} \tag{3.8}$$
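The invariance behind (3.3) and (3.7) can be checked by simulation. The sketch below draws binomial relative frequencies for three values of $p$, maps them through $\chi(\nu) = \pi^{-1}\arcsin(2\nu-1) + 0.5$, and compares the sample standard deviations with $\sqrt{p(1-p)/N}$ and with $1/(\pi\sqrt{N})$. The value $N = 400$, the number of runs, the seed and the chosen $p$ values are illustrative assumptions; according to the closing discussion of this paper, values of $p$ closer to 0 or 1 would show a finite-$N$ departure from exact invariance.

```python
import math
import random


def chi(nu):
    """Efficient random variable of eq. (3.7) with C = 1/pi and theta = 0.5."""
    return math.asin(2 * nu - 1) / math.pi + 0.5


def sample_std(values):
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def simulate(p, N, runs, rng):
    """Return (std of nu, std of chi) over `runs` independent runs of N binomial trials."""
    nus = [sum(rng.random() < p for _ in range(N)) / N for _ in range(runs)]
    return sample_std(nus), sample_std([chi(v) for v in nus])


rng = random.Random(7)
N, runs = 400, 2000
table = {p: simulate(p, N, runs, rng) for p in (0.25, 0.5, 0.75)}

for p, (s_nu, s_chi) in table.items():
    print(f"p={p}: std(nu)={s_nu:.4f} (expected {math.sqrt(p * (1 - p) / N):.4f}), "
          f"std(chi)={s_chi:.4f} (expected {1 / (math.pi * math.sqrt(N)):.4f})")
```

The spread of $\nu$ changes with $p$, while the spread of $\chi$ stays close to the same value for all three $p$, which is the property that singles out $\chi$.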
Clearly, $w_\chi$ is narrower than the upper limit for $w_p$ in Eq. (3.2). Having done no experiment at all, we have better knowledge of the value of $\bar\chi$ than of the value of $p$, although both can only lie in the interval $[0,1]$. Note also that the actual experimental data will add nothing to the accuracy with which we know $\bar\chi$, but they may add to the accuracy with which we know $p$. Nevertheless, even with data, $w_p$ may still be larger than $w_\chi$, especially when $p$ is around 0.5. For the representation of information the random variable $\chi$ is the proper choice, because it disentangles the two aspects of empirical information: the number of trials $N$, which is determined by the experimenter, not by nature, and the actual data, which are determined only by nature. The experimenter controls the accuracy $w_\chi$ by deciding $N$; nature supplies the data $\chi$, and thereby the whereabouts of $\bar\chi$. In the real domain the only other random variables with this property are the linear transformations afforded by $C$ and $\theta$. From the physical point of view $\chi$ is of interest because its standard deviation is an invariant of the physical conditions as contained in $p$ or $\bar\chi$. The random variable $\chi$ expresses empirical information with a certain efficiency, eliminating a numerical distortion that is due to the structure of the multinomial distribution and which is apparent in all other random variables. We shall call $\chi$ an efficient random variable (ER). More generally, we shall call any random variable an ER whose standard deviation is asymptotically invariant of the limit the random variable tends to, Eq. (3.3).
Another graphical depiction of the relation between $\nu$ and $\chi$ can be given by drawing a semicircle of diameter 1, along whose diameter we plot $\nu$ (Fig. 2a). By orthogonal projection onto the semicircle we get the random variable $\zeta = [\pi + 2\arcsin(2\nu - 1)]/4$, and thereby $\chi$ when we choose different constants. The drawing also suggests a simple way to obtain a complex ER.
We scale the semicircle by an arbitrary real factor $a$, tilt it by an arbitrary angle $\varphi$, and place it into the complex plane as shown in Fig. 2b. This gives the random variable
$$\beta = a\left(\sqrt{\nu(1-\nu)} + i\nu\right)e^{i\varphi} + b \tag{3.9}$$
where $b$ is an arbitrary complex constant. We get a very familiar special case by setting $a = 1$ and $b = 0$:
$$\psi = \left(\sqrt{\nu(1-\nu)} + i\nu\right)e^{i\varphi} \tag{3.10}$$
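The geometric statements around Fig. 2 can be verified numerically: with $C = \pi^{-1}$ and $\theta = 0.5$ the arc length $\zeta$ is simply $(\pi/2)\chi$, the untilted $\psi$ lies on the circle of radius 1/2 centred at $i/2$, and $|\psi|^2 = \nu$ for any tilt $\varphi$. The sample values of $\nu$ and of the tilt are arbitrary choices for this sketch.

```python
import cmath
import math


def chi(nu):
    """Eq. (3.7) with C = 1/pi and theta = 0.5."""
    return math.asin(2 * nu - 1) / math.pi + 0.5


def zeta(nu):
    """Arc length on the semicircle of diameter 1 (Fig. 2a)."""
    return (math.pi + 2 * math.asin(2 * nu - 1)) / 4


def beta(nu, a=1.0, phi=0.0, b=0j):
    """Complex ER of eq. (3.9); a = 1 and b = 0 give psi of eq. (3.10)."""
    return a * (math.sqrt(nu * (1 - nu)) + 1j * nu) * cmath.exp(1j * phi) + b


checks = []
for nu in (0.0, 0.07, 0.25, 0.5, 0.93, 1.0):
    psi_tilted = beta(nu, phi=0.7)
    psi_flat = beta(nu)
    checks.append((
        abs(zeta(nu) - math.pi / 2 * chi(nu)),   # zeta is a linear function of chi
        abs(abs(psi_tilted) ** 2 - nu),          # |psi|^2 = nu, independent of the tilt
        abs(abs(psi_flat - 0.5j) - 0.5),         # untilted psi lies on |z - i/2| = 1/2
    ))

print(max(max(row) for row in checks))          # largest deviation, at rounding level
```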
Figure 1: Functional relation between the random variables $\nu$ and $\chi$, and their respective probability distributions as expected for $N = 100$ trials, plotted for five different values of $p$: 0.07, 0.25, 0.50, 0.75 and 0.93. The bar above each probability distribution indicates twice its standard deviation. Notice that the standard deviations of $\nu$ differ considerably for different $p$, while those of $\chi$ are all the same, as required in Eq. (3.3).
Figure 2: (a) Graphical construction of the efficient random variable $\zeta$ (and thereby of $\chi$) from the observed relative frequency $\nu$; $\zeta$ is measured along the arc. (b) Similar construction of the efficient random variable $\beta$, which is given by its coordinates in the complex plane. The quantum mechanical probability amplitude $\psi$ is the normalized case of $\beta$, obtained by setting $a = 1$ and $b = 0$.
For large $N$ the probability distribution of $\nu$ becomes Gaussian, and so does that of any smooth function of $\nu$, as we have already seen in Fig. 1. Therefore the standard deviation of $\psi$ is obtained as
$$\Delta\psi = \left|\frac{d\psi}{d\nu}\right|\Delta\nu = \frac{1}{2\sqrt{N}} \tag{3.11}$$
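The standard deviation of $\psi$ follows in two lines from (3.10) and the standard deviation $\Delta\nu = \sqrt{\nu(1-\nu)/N}$ of the relative frequency, with $\nu$ evaluated at its expectation value as in the tangent approximation (3.5). We spell the steps out as a worked derivation:

```latex
\begin{aligned}
\frac{d\psi}{d\nu} &= \left(\frac{1-2\nu}{2\sqrt{\nu(1-\nu)}} + i\right)e^{i\varphi},
\qquad
\left|\frac{d\psi}{d\nu}\right|^2 = \frac{(1-2\nu)^2 + 4\nu(1-\nu)}{4\nu(1-\nu)} = \frac{1}{4\nu(1-\nu)},\\[4pt]
\Delta\psi &= \frac{1}{2\sqrt{\nu(1-\nu)}}\,\sqrt{\frac{\nu(1-\nu)}{N}} = \frac{1}{2\sqrt{N}}.
\end{aligned}
```

The dependence on $\nu$ cancels exactly and the tilt $\varphi$ drops out of the modulus, which is why $\psi$ has a standard deviation that is invariant of the physical conditions.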
Obviously, the random variable $\psi$ is an ER. It fulfills $|\psi|^2 = \nu$, and we recognize it as the probability amplitude of quantum theory which we would infer from the observed relative frequency $\nu$. Note, however, that the intuitive way of getting the quantum mechanical probability amplitude, namely by simply taking $\sqrt{\nu}\exp(i\alpha)$, where $\alpha$ is an arbitrary phase, does not give us an ER. We now have two ways of representing the obtained information by ERs, either the real-valued $\chi$ or the complex-valued $\beta$. Since the relative frequency of each of the $K$ outcomes of a general probabilistic experiment can be converted to its respective efficient random variable, the information is efficiently represented by the vector $(\chi_1,\dots,\chi_K)$, or by the vector $(\beta_1,\dots,\beta_K)$. The latter is equivalent to the quantum mechanical state vector if we normalize it: $(\psi_1,\dots,\psi_K)$. At this point it is not clear whether fundamental science could be built solely on the real ERs $\chi_j$, or whether it must rely on the complex ERs $\beta_j$, and for practical reasons on the normalized case $\psi_j$, as suggested by current formulations of quantum theory. We cannot address this problem here, but mention that working with the $\beta_j$ or $\psi_j$ can lead to nonsensical predictions, while working with the $\chi_j$ never does, so that the former are more sensitive to inconsistencies in the input data⁶. Therefore we use only the $\psi_j$ in the next section, but will not read them as if we were doing quantum theory.
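The contrast between $\psi$ and the naive amplitude $\sqrt{\nu}\,e^{i\alpha}$ shows up in a small simulation. For a complex random variable this sketch takes the standard deviation to be $\sqrt{E|z - Ez|^2}$, and it sets the phases $\varphi$ and $\alpha$ to zero, which does not affect either spread; $N$, the number of runs, the seed and the $p$ values are illustrative assumptions. The spread of $\psi$ should stay near $1/(2\sqrt{N})$, whereas the spread of $\sqrt{\nu}$ should follow $\sqrt{(1-p)/(4N)}$ and therefore vary with $p$.

```python
import math
import random


def psi(nu):
    """Eq. (3.10) with phi = 0."""
    return complex(math.sqrt(nu * (1 - nu)), nu)


def naive(nu):
    """sqrt(nu) * exp(i * alpha) with alpha = 0: the intuitive, non-efficient choice."""
    return complex(math.sqrt(nu), 0.0)


def complex_std(zs):
    mean = sum(zs) / len(zs)
    return math.sqrt(sum(abs(z - mean) ** 2 for z in zs) / (len(zs) - 1))


rng = random.Random(11)
N, runs = 400, 2000
spreads = {}
for p in (0.1, 0.5, 0.9):
    nus = [sum(rng.random() < p for _ in range(N)) / N for _ in range(runs)]
    spreads[p] = (complex_std([psi(v) for v in nus]), complex_std([naive(v) for v in nus]))

for p, (s_psi, s_naive) in spreads.items():
    print(f"p={p}: std(psi)={s_psi:.4f} vs {1 / (2 * math.sqrt(N)):.4f}; "
          f"std(sqrt nu)={s_naive:.4f} vs {math.sqrt((1 - p) / (4 * N)):.4f}")
```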
4 Predictions
Let us now see whether the representation of probabilistic information by ERs suggests specific laws for predictions. A prediction is a statement on the expected values of the probabilities of the different outcomes of a probabilistic experiment which has not yet been done, or whose data we just do not yet know, on the basis of auxiliary probabilistic experiments which have been done and whose data we do know. We intend to make a prediction for a probabilistic experiment with $Z$ outcomes, and wish to calculate the quantities $\phi_s$ ($s = 1,\dots,Z$), which shall be related to the predicted probabilities $P_s$ as $P_s = |\phi_s|^2$. We do not presuppose that the […] all the input information needed to predict the $\phi_s$, and therefore the $P_s$. With (3.10) the obtained information is represented by the ERs $\psi_j^m$, where $m$ denotes the experiment and $j$ labels a possible outcome in it ($j = 1,\dots,K_m$). Then the predictions are […]
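and their standard deviations are, by the usual convolution of Gaussians as approximations of the multinomial distributions,
$$\Delta\phi_s = \sqrt{\sum_{m=1}^{M}\sum_{j=1}^{K_m}\frac{1}{4N_m}\left|\frac{\partial\phi_s}{\partial\psi_j^m}\right|^2} \tag{4.2}$$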
where $N_m$ is the number of trials of the $m$th auxiliary experiment. If we wish the […] and checking in which he finds the particle. In $N$ trials Bob obtains the relative frequencies $\nu_1,\dots,\nu_K$, giving a good idea of the particle's position probability distribution at time $t$. He represents this information by the ERs $\psi_j$ of (3.10) and wants to use it to predict the position probability distribution at time $T$ ($T > t$). First he predicts for $t + dt$. With (4.1) the predicted $\phi_s$ are
$$\phi_s(t + dt) = \sum_{j=1}^{K} a_{sj}\,\psi_j \tag{4.3}$$
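The error propagation behind (4.2) can be evaluated directly. Reading (4.2) as $\Delta\phi_s^2 = \sum_m\sum_j |\partial\phi_s/\partial\psi_j^m|^2/(4N_m)$, where $1/(4N_m)$ is the variance of each input ER implied by $\Delta\psi = 1/(2\sqrt{N_m})$, the linear prediction (4.3) has $\partial\phi_s/\partial\psi_j = a_{sj}$. With a single input experiment of $N$ trials this gives $\Delta\phi_s = \sqrt{\sum_j |a_{sj}|^2/(4N)}$, which equals the input value $1/(2\sqrt{N})$ whenever the row of coefficients has unit norm, as every row of a unitary matrix does. The coefficient rows below are hypothetical, and the sketch, like (4.2), treats the $\psi_j^m$ as independent.

```python
import math


def prediction_std(partials, trials):
    """Error propagation of eq. (4.2): partials[m][j] = d(phi_s)/d(psi_j^m), trials[m] = N_m."""
    variance = 0.0
    for row, n_m in zip(partials, trials):
        variance += sum(abs(d) ** 2 for d in row) / (4 * n_m)
    return math.sqrt(variance)


N = 100
# Hypothetical coefficient rows a_sj of eq. (4.3), one auxiliary experiment.
unit_row = [1 / math.sqrt(2), 1j / math.sqrt(2)]   # a row of a unitary matrix
other_row = [1.0, 1.0]                             # a row without unit norm

print(prediction_std([unit_row], [N]), 1 / (2 * math.sqrt(N)))  # equal
print(prediction_std([other_row], [N]))                          # larger by sqrt(2)
```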
Clearly, when $dt \to 0$ we must have $a_{sj} = 1$ for $s = j$ and $a_{sj} = 0$ otherwise, so we can write
$$a_{sj}(t) = \delta_{sj} + g_{sj}(t)\,dt \tag{4.4}$$
where the $g_{sj}(t)$ are the complex elements of a matrix $G$, and we have included the possibility that they depend on $t$. Using matrix notation and writing the […]
$$\phi(t + dt) = [1 + G(t)\,dt]\,\phi(t) \tag{4.5}$$
For a prediction for time $t + 2dt$ we must apply another such linear transformation to the prediction we had for $t + dt$:
$$\phi(t + 2dt) = [1 + G(t + dt)\,dt]\,\phi(t + dt) \tag{4.6}$$
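Replacing $t + dt$ by $t$, and using $\phi(t + dt) = \phi(t) + \frac{d\phi}{dt}\,dt$, we have
$$\frac{d\phi(t)}{dt} = G(t)\,\phi(t) \tag{4.7}$$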
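A minimal numerical sketch of the stepwise construction (4.5)-(4.7): repeatedly apply $1 + G\,dt$ to a normalized $\psi$ built from (3.10). The text goes on to demand that the predicted vector stay normalized; the page completing that argument is missing from this copy, so taking $G$ anti-Hermitian ($G^\dagger = -G$), as in standard quantum theory, is an assumption here. Both $2\times 2$ matrices, the time-independence of $G$, the relative frequencies 0.3 and 0.7 and the step count are hypothetical. With anti-Hermitian $G$ each step multiplies the squared norm by $1 + dt^2|G\phi|^2/|\phi|^2$, a drift that vanishes as $dt \to 0$; with a generic (here Hermitian) $G$ the norm changes at order one.

```python
import math


def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def norm2(v):
    return sum(abs(c) ** 2 for c in v)


def evolve(G, phi, T, steps):
    """Apply phi -> (1 + G dt) phi `steps` times (eqs. 4.5-4.6), with dt = T / steps."""
    dt = T / steps
    for _ in range(steps):
        G_phi = matvec(G, phi)
        phi = [p + dt * g for p, g in zip(phi, G_phi)]
    return phi


def psi_component(nu):
    """Eq. (3.10) with phi = 0."""
    return complex(math.sqrt(nu * (1 - nu)), nu)


# Hypothetical K = 2 input with relative frequencies 0.3 and 0.7, so sum |psi_j|^2 = 1.
psi0 = [psi_component(0.3), psi_component(0.7)]

G_anti = [[1j, 0.5], [-0.5, -1j]]    # anti-Hermitian: G^dagger = -G
G_herm = [[0.3, 0.5], [0.5, 0.0]]    # Hermitian, for contrast

n_anti = norm2(evolve(G_anti, psi0, T=1.0, steps=20000))
n_herm = norm2(evolve(G_herm, psi0, T=1.0, steps=20000))
print(f"|phi(T)|^2: anti-Hermitian G -> {n_anti:.6f}, Hermitian G -> {n_herm:.3f}")
```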
With (3.10) the input vector was normalized, $|\psi|^2 = 1$. We also demand this from the vector […] numbers. The initial vector $\psi$ has $K$ complex components. It is normalized and one phase is free, so that it is fixed by $2K - 2$ real numbers. Altogether $K^2 + 2K - 3 = (K + 3)(K - 1)$ numbers are needed to enable prediction. Since one probabilistic experiment yields $K - 1$ numbers, Bob must do $K + 3$ probabilistic experiments with different delay times between Alice's preparation and his measurement to obtain sufficient input information. But neither Planck's constant nor the particle's mass is needed. It should be noted that this analysis remains unaltered if the initial vector $\psi$ is obtained from measurement of joint probability distributions of several particles. Therefore, (4.7) also contains entanglement between particles.
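The count can be made explicit. The text assigns $2K - 2$ numbers to $\psi$ and gives a total of $K^2 + 2K - 3$; the remaining $K^2 - 1$ must therefore belong to the matrix $G$, whose description falls on the page missing from this copy. (For a time-independent anti-Hermitian $G$ with its trace part, a global phase, removed, $K^2 - 1$ is exactly the number of real parameters, but that reading is an assumption.)

```latex
\underbrace{(K^2 - 1)}_{G} + \underbrace{(2K - 2)}_{\psi} = K^2 + 2K - 3 = (K + 3)(K - 1),
\qquad
\frac{(K + 3)(K - 1)}{K - 1} = K + 3 .
```

For example, for $K = 2$ one needs 5 numbers and hence 5 probabilistic experiments, each contributing a single independent relative frequency.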
5 Discussion
This paper was based on the insight that under the probabilistic paradigm data from observations are subject to the multinomial probability distribution. For the representation of the empirical information we searched for random variables which are stripped of numerical artefacts. They should therefore have an invariance property. We found as unique random variables a real and a complex class of efficient random variables (ERs). They capture the obtained information more efficiently than others, because their standard deviation is an asymptotic invariant of the physical conditions. The quantum mechanical probability amplitude is the normalized case of the complex class. It is natural that fundamental probabilistic science should use such random variables rather than any others as the representors of the observed information, and therefore as the carriers of meaning. Using the ERs for prediction has given us an evolution prescription which is equivalent to the quantum theoretical way of applying a sequence of infinitesimal rotations to the state vector in Hilbert space⁷. It seems that simply analysing how we gain empirical information and what we can say from it about expected future information, without succumbing to the lure of the question of what is behind this information, can give us a basis for doing physics. This confirms the operational approach to science. And it supports Wheeler's It-from-Bit hypothesis⁸, Weizsäcker's ur-theory⁹, Eddington's idea that information increase itself defines the rest¹⁰, Anandan's conjecture of the absence of dynamical laws¹¹, Bohr and Ulfbeck's hypothesis of mere symmetry¹², or the recent "1 Bit - 1 Constituent" hypothesis of Brukner and Zeilinger¹³. In view of the analysis presented here, the quantum theoretical probability calculus is an almost trivial consequence of probability theory, applied not to 'objects' or anything 'physical', but to the naked data of probabilistic experiments.
If we continue this idea, we encounter a deeper problem, namely whether the space which we consider physical, this 3- or higher-dimensional manifold in which we normally assume the world to unfurl¹⁴, can also be understood as a peculiar way of representing data. Kant conjectured this, in somewhat different words, over 200 years ago¹⁵. And indeed it is clearly so if we imagine the human observer as a robot that must find a compact memory representation of the gigantic data stream it receives through its senses¹⁶. That is why our earlier example of the particle in a box should only be seen as an illustration by means of familiar terms. It should not imply that we accept the naive conception of space or of things, like particles, 'in' it, although this view works well in everyday life and in the laboratory, as long as we are not doing quantum experiments. We think that a full acceptance of the probabilistic paradigm as the basis of empirical science will eventually require an attack on the notions of spatial distance and spatial dimension from the point of view of optimal representation of probabilistic information.
Finally, we want to remark on a difference between our analysis and quantum theory. We have emphasized that the standard deviations of the ERs $\chi$ and $\psi$ become independent of the limits of these ERs only when we have infinitely many trials. For finitely many trials there is a departure, especially for values of $p$ close to 0 and close to 1. With some imagination this can be noticed in Fig. 1 in the top and bottom probability distributions of $\chi$, which are a little bit wider than those in the middle. But as we always have only finitely many trials, there should exist random variables which fulfill our requirement for an ER even better than $\chi$ and $\psi$. This implies that predictions based on these unknown random variables should also be more precise! Whether we should see this as a fluke of statistics or as a need to amend quantum theory is debatable. But it should be testable. We need to have a number of different probabilistic experiments, all of which are done with only very few trials. From these we want to predict the outcomes of another probabilistic experiment, which is then also done with only few trials. Presumably, the optimal procedure of prediction will not be the one we have presented here (and therefore not quantum theory). The difficulty with such tests is, however, that in the usual interpretation of data, statistical theory and quantum theory are treated as separate, while one message of this paper may also be that under the probabilistic paradigm the bottom level of physical theory should be equivalent to optimal representation of probabilistic information, and this theory should not be in need of additional purely statistical theories to connect it to actual data. We shall discuss this problem in a future paper¹⁷.
Acknowledgments
This paper is a result of pondering what I am doing in the lab, how it can be that in the evening I know more than I knew in the morning, and discussing this with G. Krenn, K. Svozil, C. Brukner, M. Żukowski and a number of other people.
References
1. M. Born, Zeitschrift f. Physik 37, 863 (1926); Brit. J. Philos. Science 4, 95 (1953).
2. D. Bouwmeester et al., Phys. Rev. Lett. 82, 1345 (1999), and references therein.
3. W. Feller, An Introduction to Probability Theory and its Applications (John Wiley and Sons, New York, 3rd edition, 1968), Vol. 1, p. 168.
4. Ibid., p. 233.
5. The connection of this relation to quantum physics was first stressed by W. K. Wootters, Phys. Rev. D 23, 357 (1981).
6. We give the example in quant-ph/0008098.
7. Several authors have noted that probability theory itself suggests quantum theory: A. Landé, Am. J. Phys. 42, 459 (1974); A. Peres, Quantum Theory: Concepts and Methods (Kluwer Academic Publishers, Dordrecht, 1998); D. I. Fivel, Phys. Rev. A 50, 2108 (1994).
8. J. A. Wheeler, in Quantum Theory and Measurement, eds. J. A. Wheeler and W. H. Zurek (Princeton University Press, Princeton, 1983), p. 182.
9. C. F. von Weizsäcker, Aufbau der Physik (Hanser, Munich, 1985); H. Lyre, Int. J. Theor. Phys. 34, 1541 (1995); also quant-ph/9703028.
10. C. W. Kilmister, Eddington's Search for a Fundamental Theory (Cambridge University Press, 1994).
11. J. Anandan, Found. Phys. 29, 1647 (1999).
12. A. Bohr and O. Ulfbeck, Rev. Mod. Phys. 67, 1 (1995).
13. C. Brukner and A. Zeilinger, Phys. Rev. Lett. 83, 3354 (1999).
14. A penetrating analysis of the view of space implied by quantum theory is given by U. Mohrhoff, Am. J. Phys. 68 (8), 728 (2000).
15. I. Kant, Critik der reinen Vernunft (Critique of Pure Reason) (Riga, 1781). There should be many English translations.
16. E. T. Jaynes introduced the 'reasoning robot' in his book Probability Theory: The Logic of Science in order to eliminate the problem of subjectivism that has been plaguing probability theory and quantum theory alike. The book is freely available at http://bayes.wustl.edu/etj/prob.html
17. J. Summhammer (to be published).
QUANTUM CRYPTOGRAPHY IN SPACE AND BELL'S THEOREM
IGOR VOLOVICH
Steklov Mathematical Institute, Gubkin St. 8, GSP-1, 117966, Moscow, Russia
E-mail: [email protected]
Bell's theorem states that some quantum correlations cannot be represented by classical correlations of separated random variables. It has been interpreted as incompatibility of the requirement of locality with quantum mechanics. We point out that in fact the space part of the wave function was neglected in the proof of Bell's theorem. However, this space part is crucial for considerations of the locality of a quantum system. Actually the space part leads to an extra factor in quantum correlations, and as a result the ordinary proof of Bell's theorem fails in this case. Bell's theorem constitutes an important part of quantum cryptography. The promise of secure cryptographic quantum key distribution schemes is based on the use of Bell's theorem in the spin space. In many current quantum cryptography protocols the space part of the wave function is neglected. As a result such schemes can be secure against eavesdropping attacks in the abstract spin space, but they could be insecure in the real three-dimensional space. We discuss an approach to the security of quantum key distribution in space by using a special preparation of the space part of the wave function.
1 Introduction
Bell's theorem¹ states that there are quantum correlation functions that cannot be represented as classical correlation functions of separated random variables. It has been interpreted as incompatibility of the requirement of locality with the statistical predictions of quantum mechanics¹. For a recent discussion of Bell's theorem see, for example, refs. 2-17 and references therein. It is now widely accepted, as a result of Bell's theorem and related experiments, that "local realism" must be rejected. Evidently, the very formulation of the problem of locality in quantum mechanics is based on ascribing a special role to the position in ordinary three-dimensional space. It is rather surprising, therefore, that the space dependence of the wave function is neglected in discussions of the problem of locality in relation to Bell's inequalities. Actually it is the space part of the wave function which is relevant to the consideration of the problem of locality. In this note we point out that the space part of the wave function leads to an extra factor in quantum correlations, and as a result the ordinary proof of Bell's theorem fails in this case. We present a criterion of locality (or nonlocality) of quantum theory in a realist model of hidden variables. We argue that predictions of quantum mechanics can be consistent with Bell's inequalities for Gaussian wave functions, and hence Einstein's local realism is restored in this case.
Bell's theorem constitutes an important part of quantum cryptography¹⁹. It is now generally accepted that techniques of quantum cryptography can allow secure communications between distant parties (refs. 18-25). The promise of secure cryptographic quantum key distribution schemes is based on the use of quantum entanglement in the spin space and on the quantum no-cloning theorem. An important contribution of quantum cryptography is a mechanism for detecting eavesdropping. However, in many current quantum cryptography protocols the space part of the wave function is neglected. But it is exactly the space part of the wave function that describes the behaviour of particles in ordinary, real three-dimensional space. As a result such schemes can be secure against eavesdropping attacks in the abstract spin space but could be insecure in the real three-dimensional space. It follows that proofs of the security of quantum cryptography schemes which neglect the space part of the wave function could fail against attacks in the real three-dimensional space. We will discuss how one can try to improve the security of quantum cryptography schemes in space by using a special preparation of the space part of the wave function.
2 Bell's Inequality
In the presentation of Bell's theorem we will follow ref. 17, where one can also find more references. The mathematical formulation of Bell's theorem reads:
$$\cos(\alpha - \beta) \neq E\,\xi_\alpha\eta_\beta \tag{2.1}$$
where $\xi_\alpha$ and $\eta_\beta$ are two random processes such that $|\xi_\alpha| \le 1$, $|\eta_\beta| \le 1$, and $E$ is the expectation. Let us discuss in more detail the physical interpretation of this result. Consider a pair of spin one-half particles formed in the singlet spin state and moving freely towards two detectors (Alice and Bob). If one neglects the space part of the wave function, then the quantum mechanical correlation of two spins in the singlet state $\psi_{spin}$ is
$$D_{spin}(a, b) = \langle\psi_{spin}|\,\sigma\cdot a \otimes \sigma\cdot b\,|\psi_{spin}\rangle = -a\cdot b \tag{2.2}$$
Here $a$ and $b$ are two unit vectors in three-dimensional space, $\sigma = (\sigma_1, \sigma_2, \sigma_3)$ are the Pauli matrices and […]
Bell's theorem states that the function $D_{spin}(a, b)$ of Eq. (2.2) cannot be represented in the form
$$P(a, b) = \int \xi(a, \lambda)\,\eta(b, \lambda)\,dP(\lambda) \tag{2.3}$$
i.e.
$$D_{spin}(a, b) \neq P(a, b) \tag{2.4}$$
Here $\xi(a, \lambda)$ and $\eta(b, \lambda)$ are random fields on the sphere, $|\xi(a, \lambda)| \le 1$, $|\eta(b, \lambda)| \le 1$, and $dP(\lambda)$ is a positive probability measure, $\int dP(\lambda) = 1$. The parameters $\lambda$ are interpreted as hidden variables in a realist theory. It is clear that Eq. (2.4) can be reduced to Eq. (2.1). One has the following Bell-Clauser-Horne-Shimony-Holt (CHSH) inequality:
$$|P(a, b) - P(a, b') + P(a', b) + P(a', b')| \le 2 \tag{2.5}$$
On the other hand, there are vectors ($a\cdot b = a'\cdot b = a'\cdot b' = -a\cdot b' = \sqrt{2}/2$) for which one has
$$|D_{spin}(a, b) - D_{spin}(a, b') + D_{spin}(a', b) + D_{spin}(a', b')| = 2\sqrt{2} \tag{2.6}$$
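The value $2\sqrt{2}$ in (2.6) can be reproduced with coplanar unit vectors. Placing $a, b, a', b'$ in one plane at the angles $0$, $\pi/4$, $\pi/2$ and $3\pi/4$ (a choice made here for illustration) satisfies the stated conditions $a\cdot b = a'\cdot b = a'\cdot b' = -a\cdot b' = \sqrt{2}/2$, and $D_{spin} = -a\cdot b$ from (2.2) then reduces to minus the cosine of the angle difference.

```python
import math


def d_spin(alpha, beta):
    """Eq. (2.2), -a.b, for unit vectors in a plane at angles alpha and beta."""
    return -math.cos(alpha - beta)


def chsh(E, a, a_p, b, b_p):
    """The CHSH combination of eqs. (2.5)/(2.6) for a correlation function E."""
    return abs(E(a, b) - E(a, b_p) + E(a_p, b) + E(a_p, b_p))


a, b, a_p, b_p = 0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4
dots = [math.cos(a - b), math.cos(a_p - b), math.cos(a_p - b_p), -math.cos(a - b_p)]
value = chsh(d_spin, a, a_p, b, b_p)
print(dots, value, 2 * math.sqrt(2))
```

Any local model of the form (2.3) keeps this combination at most 2, so the value $2\sqrt{2}$ is what rules out $D_{spin} = P$.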
Therefore, if one supposes that $D_{spin}(a, b) = P(a, b)$, one gets a contradiction. It will be shown below that if one takes into account the space part of the wave function, then the quantum correlation in the simplest case takes the form $g\cos(\alpha - \beta)$ instead of just $\cos(\alpha - \beta)$, where the parameter $g$ describes the location of the system in space and time. In this case one can get the representation
$$g\cos(\alpha - \beta) = E\,\xi_\alpha\eta_\beta \tag{2.7}$$
if $g$ is small enough (see below). The factor $g$ gives a contribution to the visibility or efficiency of detectors that are used in the phenomenological description of detectors.
3 Localized Detectors
In the previous section the space part of the wave function of the particles was neglected. However, it is exactly the space part that is relevant to the discussion of locality. The complete wave function is $\psi = (\psi_{\alpha\beta}(r_1, r_2))$, where $\alpha$ and $\beta$ are spinor indices and $r_1$ and $r_2$ are vectors in three-dimensional space.
We suppose that Alice and Bob have detectors which are located within the two localized regions $O_A$ and $O_B$, respectively, well separated from one another. The quantum correlation describing the measurements of spins by Alice and Bob at their localized detectors is
$$G(a, O_A, b, O_B) = \langle\psi|\,\sigma\cdot a\,P_{O_A} \otimes \sigma\cdot b\,P_{O_B}\,|\psi\rangle \tag{3.1}$$
Here $P_O$ is the projection operator onto the region $O$. Let us consider the case when the wave function has the form of the product of the spin function and the space function, $\psi = \psi_{spin}\,\phi(r_1, r_2)$. Then one has
$$G(a, O_A, b, O_B) = g(O_A, O_B)\,D_{spin}(a, b) \tag{3.2}$$
where the function
$$g(O_A, O_B) = \int_{O_A \times O_B} |\phi(r_1, r_2)|^2\,dr_1\,dr_2 \tag{3.3}$$
describes the correlation of particles in space. It is the probability to find one particle in the region $O_A$ and the other particle in the region $O_B$. One has
$$0 \le g(O_A, O_B) \le 1 \tag{3.4}$$
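As an illustration of (3.3) and (3.4), take a hypothetical one-dimensional toy version of the space part: a product $\phi(x_1, x_2) = u_A(x_1)\,u_B(x_2)$ of normalized Gaussian wave packets, with detector regions given by intervals. Then $g$ factorizes into two Gaussian interval probabilities, each expressible through the error function. None of these choices (dimension, packet centres and widths, detector half-widths) come from the text; they only show how $g$ is computed and that in this toy model it stays strictly below 1 for finite detector regions.

```python
import math


def interval_probability(center, sigma, lo, hi):
    """P(lo <= x <= hi) when |u(x)|^2 is a normal density with mean `center`, std `sigma`."""
    def z(x):
        return (x - center) / (sigma * math.sqrt(2))
    return 0.5 * (math.erf(z(hi)) - math.erf(z(lo)))


def g_product(packet_a, packet_b, region_a, region_b):
    """Eq. (3.3) for phi(x1, x2) = u_A(x1) u_B(x2): the double integral factorizes."""
    return interval_probability(*packet_a, *region_a) * interval_probability(*packet_b, *region_b)


# Hypothetical packets of std 1 centred at -10 and +10; detectors of half-width w around them.
for w in (1.0, 2.0, 4.0):
    g = g_product((-10.0, 1.0), (10.0, 1.0), (-10.0 - w, -10.0 + w), (10.0 - w, 10.0 + w))
    print(f"detector half-width {w}: g = {g:.6f}")
```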
Remark. In relativistic quantum field theory there is no nonzero strictly localized projection operator that annihilates the vacuum. This is a consequence of the Reeh-Schlieder theorem. Therefore, apparently, the function $g(O_A, O_B)$ should always be strictly smaller than 1. I am grateful to W. Luecke for this remark.
Now one inquires whether one can write the representation
$$g(O_A, O_B)\,D_{spin}(a, b) = \int \xi(a, O_A, \lambda)\,\eta(b, O_B, \lambda)\,dP(\lambda) \tag{3.5}$$
Note that if we are interested in the conditional probability of finding the projection of spin along the vector a for particle 1 in the region $O_A$ and the projection of spin along the vector b for particle 2 in the region $O_B$, then we have to divide both sides of Eq. (3.5) by $g(O_A, O_B)$. The factor g is important. In particular, for $0 < g < 1/2$ one can write the following representation [15]:
$$g\cos(\alpha - \beta) = \int_0^{2\pi} \sqrt{2g}\cos(\alpha - \lambda)\, \sqrt{2g}\cos(\beta - \lambda)\, \frac{d\lambda}{2\pi} \qquad (3.6)$$
Let us now apply these considerations to quantum cryptography.
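Representation (3.6) can be checked numerically. The Python sketch below averages the product of the two response functions $\sqrt{2g}\cos(\alpha-\lambda)$ and $\sqrt{2g}\cos(\beta-\lambda)$ over a uniformly distributed hidden variable $\lambda \in [0, 2\pi)$ and compares the result with $g\cos(\alpha-\beta)$. The function names are hypothetical. The bound check assumes, as is usual for Bell-type representations, that the responses must satisfy $|\xi|, |\eta| \le 1$. That requirement is what limits (3.6) to small g.

```python
import math

def hidden_variable_correlation(alpha, beta, g, n=64):
    """Equally spaced (midpoint) average over lambda in [0, 2*pi) of
    sqrt(2g) cos(alpha - lambda) * sqrt(2g) cos(beta - lambda).
    The integrand is a trigonometric polynomial of degree 2, so this rule
    is exact (up to rounding) for n >= 3."""
    amp = math.sqrt(2.0 * g)
    total = 0.0
    for k in range(n):
        lam = 2.0 * math.pi * (k + 0.5) / n
        xi = amp * math.cos(alpha - lam)   # hypothetical Alice response
        eta = amp * math.cos(beta - lam)   # hypothetical Bob response
        total += xi * eta
    return total / n

def max_response(g):
    """Largest |xi| over lambda; it does not exceed 1 exactly when g <= 1/2."""
    return math.sqrt(2.0 * g)
```

For example, `hidden_variable_correlation(0.3, 1.2, 0.4)` agrees with `0.4 * math.cos(0.3 - 1.2)` to machine precision.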
4 Quantum Key Distribution
Ekert [19] showed that one can use EPR correlations to establish a secret random key between two parties ("Alice" and "Bob"). Bell's inequalities are used to check for the presence of an intermediate eavesdropper ("Eve"). The Ekert protocol has two stages: the first over a quantum channel and the second over a public channel. The quantum channel consists of a source that emits pairs of spin one-half particles in a singlet state. The particles fly apart towards Alice and Bob. After the particles have separated, Alice and Bob measure spin components along one of three directions, given by unit vectors $a_i$ and $b_j$. In the second stage Alice and Bob communicate over a public channel. They publicly announce the orientations of the detectors they chose for each measurement. They then divide the measurement results into two groups: a first group, for which they used different orientations of the detectors, and a second group, for which they used the same orientation. Alice and Bob now reveal publicly the results they obtained, but only for the first group of measurements. Using Bell's inequality, this allows them to establish the presence of an eavesdropper (Eve). The results of the second group of measurements can be converted into a secret key.
One supposes that Eve has a detector located within the region $O_E$ and that she is described by hidden variables $\lambda$. We interpret Eve as a hidden variable in a realist theory and study whether the quantum correlation (3.2) can be represented in the form (2.3). From (2.5), (2.6) and (3.5) one can see that if the inequality
$$g(O_A, O_B) < 1/\sqrt{2} \qquad (4.1)$$
holds for regions $O_A$ and $O_B$ that are well separated from one another, then there is no violation of the CHSH inequality (2.5), and therefore Alice and Bob cannot detect the presence of an eavesdropper. On the other hand, if for a pair of well separated regions $O_A$ and $O_B$ one has
$$g(O_A, O_B) > 1/\sqrt{2}, \qquad (4.2)$$
then realist locality may be violated in these regions for the given state, and in principle one can hope to detect an eavesdropper under these circumstances. Note that setting $g(O_A, O_B) = 1$ in (3.5), as was done in the original proof of Bell's theorem, means that the states of the particles were specially prepared so as to be completely localized inside the detectors. Such well localized states exist (see, however, the previous Remark). There also exist other states, whose wave functions are not well localized inside the detectors, and particles in such states are still observed in the detectors. Of course, the fact that a particle is observed inside the detector does not mean that its wave function was strictly localized inside the detector before the measurement. Estimating the function $g(O_A, O_B)$ actually requires a thorough investigation of the preparation and the evolution of the entangled states in space and time.
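Conditions (4.1) and (4.2) can be illustrated with a toy Monte Carlo model. In this model, each emitted pair is found with probability g with one particle in $O_A$ and the other in $O_B$. In that case the spin outcomes follow the singlet statistics $D_{spin}(a,b) = -a\cdot b$. Otherwise the localized observable in (3.1) returns 0. The estimated correlations therefore approach $g\,D_{spin}$, and the CHSH value approaches $2\sqrt{2}\,g$ in magnitude. The sampling scheme, the coplanar angles, and all function names are assumptions made for illustration. We also assume that the standard CHSH combination used here matches (2.5).

```python
import math
import random

def singlet_correlation(theta_a, theta_b):
    """D_spin(a, b) = -a.b for coplanar unit vectors at angles theta_a, theta_b."""
    return -math.cos(theta_a - theta_b)

def sample_G(theta_a, theta_b, g, n_pairs, rng):
    """Monte Carlo estimate of G = g * D_spin in the toy model described above."""
    d = singlet_correlation(theta_a, theta_b)
    total = 0
    for _ in range(n_pairs):
        if rng.random() >= g:
            continue  # a particle is outside O_A x O_B: the product is 0
        alice = 1 if rng.random() < 0.5 else -1
        # Bob agrees with Alice with probability (1 + D)/2, so E[alice*bob] = D
        bob = alice if rng.random() < (1.0 + d) / 2.0 else -alice
        total += alice * bob
    return total / n_pairs

def chsh(g, n_pairs=50_000, seed=1):
    """CHSH combination at the angles that maximize the singlet violation."""
    a, a2 = 0.0, math.pi / 2
    b, b2 = math.pi / 4, 3 * math.pi / 4
    rng = random.Random(seed)
    return (sample_G(a, b, g, n_pairs, rng) - sample_G(a, b2, g, n_pairs, rng)
            + sample_G(a2, b, g, n_pairs, rng) + sample_G(a2, b2, g, n_pairs, rng))

def violates_chsh(g):
    """Exact toy-model prediction: |S| = 2*sqrt(2)*g exceeds 2 iff g > 1/sqrt(2)."""
    return 2.0 * math.sqrt(2.0) * g > 2.0
```

With g = 1 the estimate of |S| is close to $2\sqrt{2}$. With g = 0.6, which lies below $1/\sqrt{2}$, it stays below the classical bound 2, and an eavesdropper modeled by $\lambda$ cannot be excluded.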
5 Gaussian Wave Functions
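Now let us consider the locality criterion for Gaussian wave functions. We will show that, with reasonable accuracy, there is no violation of locality in this case. Let us take the wave function $\phi(\mathbf{r}_1, \mathbf{r}_2) = \psi_1(\mathbf{r}_1)\psi_2(\mathbf{r}_2)$, where
$$|\psi_1(\mathbf{r})|^2 = \left(\frac{m^2}{\pi}\right)^{3/2} e^{-m^2 \mathbf{r}^2}, \qquad |\psi_2(\mathbf{r})|^2 = \left(\frac{m^2}{\pi}\right)^{3/2} e^{-m^2 (\mathbf{r} - \mathbf{l})^2} \qquad (5.1)$$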
We suppose that the length of the vector $\mathbf{l}$ is much larger than $1/m$. We can measure $P_{O_A}$ and $P_{O_B}$ for any well separated regions $O_A$ and $O_B$. Let us consider a case rather unfavorable for the locality criterion, in which the wave functions of the particles are almost localized inside the regions $O_A$ and $O_B$ respectively. In such a case the function $g(O_A, O_B)$ can take values near its maximum. We suppose that the region $O_A$ is given by $|r_i| < 1/m$, $i = 1, 2, 3$, where $\mathbf{r} = (r_1, r_2, r_3)$, and that the region $O_B$ is obtained from $O_A$ by translation by $\mathbf{l}$. Hence $\psi_1(\mathbf{r}_1)$ is a Gaussian function whose modulus is appreciably different from zero only in $O_A$, and similarly $\psi_2(\mathbf{r}_2)$ is localized in the region $O_B$. Then we have
$$g(O_A, O_B) = \left(\frac{1}{\sqrt{\pi}}\int_{-1}^{1} e^{-x^2}\, dx\right)^6 \qquad (5.2)$$
One can estimate (5.2) as
$$g(O_A, O_B) = (\operatorname{erf} 1)^6 \approx 0.36, \qquad (5.3)$$
which is smaller than 1/2. Therefore the locality criterion (4.1) is satisfied in this case. Recall the well known spreading of wave packets under free time evolution. If $\epsilon$ is the characteristic length of the Gaussian wave packet describing a particle of mass M at time t = 0, then at time t the characteristic length $\epsilon_t$ will be
$$\epsilon_t = \sqrt{\epsilon^2 + \frac{\hbar^2 t^2}{M^2 \epsilon^2}}.$$
It tends to $(\hbar/M\epsilon)\, t$ as $t \to \infty$. Therefore the locality criterion is always satisfied for nonrelativistic particles if the regions $O_A$ and $O_B$ are far enough from each other. The case of relativistic particles will be considered in a separate publication.
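The estimate (5.3) and the effect of wave-packet spreading can be evaluated directly. The sketch below assumes that each coordinate of $|\psi|^2$ is proportional to $\exp(-x^2/w^2)$ and that each region is the cube $|r_i| < a$ centred on the packet. Each of the six coordinates then contributes $\operatorname{erf}(a/w)$. For (5.1) one has $w = a = 1/m$. The time dependence uses $\epsilon_t$ above in units with $\hbar = 1$, and keeps the regions centred on the packets. Both are simplifying assumptions, and the function names are hypothetical.

```python
import math

def g_gaussian_cube(half_width, width):
    """g(O_A, O_B) for two Gaussian packets, each centred in a cube |r_i| < half_width,
    with per-coordinate density ~ exp(-x^2 / width^2): erf(half_width / width)^6."""
    return math.erf(half_width / width) ** 6

def spread_width(eps, t, mass, hbar=1.0):
    """Free-evolution width eps_t = sqrt(eps^2 + (hbar t / (M eps))^2)."""
    return math.sqrt(eps ** 2 + (hbar * t / (mass * eps)) ** 2)
```

For instance, `g_gaussian_cube(1.0, 1.0)` gives $(\operatorname{erf} 1)^6 \approx 0.358$. Replacing the width by `spread_width(1.0, t, 1.0)` makes g decrease monotonically with t, so criterion (4.1) is satisfied with an even larger margin for well separated detectors.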
6 Conclusions
This note shows that if we do not neglect the space part of the wave function of two particles, then the predictions of quantum mechanics can be consistent with Bell's inequalities. One can say that Einstein's local realism is restored in this case. It would be interesting to investigate whether one can prepare a reasonable wave function for which the nonlocality condition (4.2) is satisfied for a pair of well separated regions. In principle the function $g(O_A, O_B)$ can approach its maximal value 1 if the wave functions of the particles are very well localized within the detector regions $O_A$ and $O_B$ respectively. However, establishing such a localization may require destroying the original entanglement, because the entanglement was created far away from the detectors. It is shown that the presence of the space part in the wave function of two particles in an entangled state creates a problem for the proof of the security of quantum key distribution. To detect the presence of an eavesdropper by using Bell's inequality, we have to estimate the function $g(O_A, O_B)$. Only a particular quantum key distribution protocol has been discussed here, but similar problems seem to arise in other quantum cryptographic schemes as well. We do not claim in this note that it is impossible in principle to increase the detectability of the eavesdropper. However, it is not clear to the present author how to do so without a thorough investigation of the preparation of the entangled state and of its evolution in space and time towards Alice and Bob. In Section 4, Eve was interpreted as an abstract hidden variable. However, one can assume that more information about Eve is available.
In particular, one can assume that she is located somewhere in space, in a region $O_E$. One then seems to need a generalization of the function $g(O_A, O_B)$ that depends not only on Alice's and Bob's locations $O_A$ and $O_B$ but also on Eve's location $O_E$, and a strategy that leads to an optimal value of this function.
7 Acknowledgments
This investigation was supported by the grant of the Swedish Royal Academy of Sciences on collaboration with states of the former Soviet Union and by the Profile Mathematical Modeling of Växjö University. I would like to thank A. Khrennikov for the warm hospitality and fruitful discussions. This work is also supported in part by RFFI 99-01-00105 and INTAS 99-0590.
References
1. J.S. Bell, Physics 1, 195 (1964)
2. A. Peres, Quantum Theory: Concepts and Methods, Kluwer, Dordrecht, 1993
3. L.E. Ballentine, Quantum Mechanics, Prentice-Hall, 1990
4. W.M. de Muynck, W. De Baere and H. Martens, Found. of Physics (1994), 1589
5. D.M. Greenberger, M.A. Horne, A. Shimony and A. Zeilinger, Am. J. Phys. 58, 1131 (1990)
6. S.L. Braunstein, A. Mann and M. Revzen, Phys. Rev. Lett. 68, 3259 (1992)
7. N.D. Mermin, Am. J. Phys. 62, 880 (1994)
8. G.M. D'Ariano, L. Maccone, M.F. Sacchi and A. Garuccio, Tomographic test of Bell's inequality, quant-ph/9907091
9. Luigi Accardi and Massimo Regoli, Locality and Bell's inequality, quant-ph/0007005
10. Andrei Khrennikov, Non-Kolmogorov probability models and modified Bell's inequality, quant-ph/0003017
11. Almut Beige, William J. Munro and Peter L. Knight, A Bell's Inequality Test with Entangled Atoms, quant-ph/0006054
12. F. Benatti and R. Floreanini, On Bell's locality tests with neutral kaons, hep-ph/9812353
13. A. Khrennikov, Statistical measure of ensemble nonreproducibility and correction to Bell's inequality, Nuovo Cimento 115B (2000) 179
14. W.A. Hofer, Information transfer via the phase: A local model of Einstein-Podolsky-Rosen experiments, quant-ph/0006005
15. Igor Volovich and Yaroslav Volovich, Bell's Theorem and Random Variables, quant-ph/0009058
16. N. Gisin, V. Scarani, W. Tittel and H. Zbinden, Optical tests of quantum nonlocality: from EPR-Bell tests towards experiments with moving observers, quant-ph/0009055
17. Igor V. Volovich, Bell's Theorem and Locality in Space, quant-ph/0012010
18. C.H. Bennett and G. Brassard, in Proc. of the IEEE Int. Conf. on Computers, Systems, and Signal Processing, Bangalore, India (IEEE, New York, 1984) p. 175
19. A.K. Ekert, Phys. Rev. Lett. 67 (1991) 661
20. D.S. Naik, C.G. Peterson, A.G. White, A.J. Berglund and P.G. Kwiat, Entangled state quantum cryptography: Eavesdropping on the Ekert protocol, quant-ph/9912105
21. Gilles Brassard, Norbert Lütkenhaus, Tal Mor and Barry C. Sanders, Security Aspects of Practical Quantum Cryptography, quant-ph/9911054
22. Kei Inoue, Takashi Matsuoka and Masanori Ohya, New approach to Epsilon-entropy and its comparison with Kolmogorov's Epsilon-entropy, quant-ph/9806027
23. Hoi-Kwong Lo, Will Quantum Cryptography ever become a successful technology in the marketplace?, quant-ph/9912011
24. Akihisa Tomita and Osamu Hirota, Security of classical noise-based cryptography, quant-ph/0002044
25. Yong-Sheng Zhang, Chuan-Feng Li and Guang-Can Guo, Quantum key distribution via quantum encryption, quant-ph/0011034
INTERACTING STOCHASTIC PROCESS AND RENORMALIZATION THEORY
YAROSLAV VOLOVICH
Physics Department, Moscow State University, Vorobievi Gori, 119899 Moscow, Russia
This note studies a stochastic process with self-interaction as a model of quantum field theory. We consider an Ornstein-Uhlenbeck stochastic process x(t) with an interaction of the form $x^{(\alpha)}(t)^4$, where $(\alpha)$ indicates the fractional derivative. Using Bogoliubov's R-operation, we investigate ultraviolet divergences for various values of the parameter α. In the case α = 3/4 the ultraviolet properties of this one-dimensional model are similar to those of the $\phi^4_4$ theory, but there are extra counterterms. It is shown that the model is two-loop renormalizable. For 5/8 ≤ α < 3/4 the model has a finite number of divergent Feynman diagrams. In the case α = 2/3 the model is similar to the
1 Introduction
There is a very fruitful interrelation between probability theory and quantum field theory [1-6]. In this note we consider a stochastic process that shows the same divergences as quantum electrodynamics or the $\phi^4$ theory in four-dimensional spacetime. This stochastic process corresponds to a one-dimensional Euclidean quantum field theory with a quartic interaction that contains fractional derivatives. This one-dimensional model can be used to study the fundamental problem of the non-perturbative investigation of renormalized quantum field theory [1, 3]. It can also find applications in the theory of phase transitions [5, 6].
The Interacting Stochastic Process. Let $x(t) = x(t, \omega)$ be an Ornstein-Uhlenbeck stochastic process with the correlation function
$$E(x(t)x(\tau)) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{e^{ip(t-\tau)}}{p^2 + m^2}\, dp = \frac{e^{-m|t-\tau|}}{2m}, \qquad (1.1)$$
where m > 0. There exists a spectral representation of the Ornstein-Uhlenbeck stochastic process [8]
$$x(t, \omega) = \int e^{ikt}\, \zeta(dk, \omega),$$
where $\zeta(dk, \omega)$ is a stochastic measure. We define the fractional derivative of order α as
$$x^{(\alpha)}(t, \omega) = \int |k|^{\alpha} e^{ikt}\, \zeta(dk, \omega). \qquad (1.2)$$
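The closed form $e^{-m|t-\tau|}/(2m)$ in (1.1) can be checked by simulation. The Python sketch below uses the standard exact one-step update of a stationary Ornstein-Uhlenbeck process, which is an AR(1) recursion with $\rho = e^{-m\,dt}$. This sampling method is our choice and is not taken from the text. The values of m, the step dt and the path length are illustrative, and the function names are hypothetical.

```python
import math
import random

def simulate_ou(m, dt, n_steps, seed=0):
    """Stationary OU path on a grid of step dt with covariance exp(-m|t-s|)/(2m):
    x_{k+1} = rho * x_k + sqrt(var * (1 - rho^2)) * N(0, 1), rho = exp(-m dt)."""
    rng = random.Random(seed)
    var = 1.0 / (2.0 * m)                # stationary variance, (1.1) at t = tau
    rho = math.exp(-m * dt)
    noise_sd = math.sqrt(var * (1.0 - rho * rho))
    x = rng.gauss(0.0, math.sqrt(var))   # start in the stationary law
    path = [x]
    for _ in range(n_steps - 1):
        x = rho * x + noise_sd * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

def empirical_cov(path, lag):
    """Time-averaged estimate of E[x(t) x(t + lag*dt)] (the mean is zero)."""
    n = len(path) - lag
    return sum(path[i] * path[i + lag] for i in range(n)) / n

def ou_cov(m, tau):
    """Right-hand side of (1.1)."""
    return math.exp(-m * abs(tau)) / (2.0 * m)
```

With m = 1 and dt = 0.1, the empirical covariance at lag 10 should be close to `ou_cov(1.0, 1.0)`, which is $e^{-1}/2 \approx 0.18$.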
If 0 < α < 1/2, then $x^{(\alpha)}(t)$ is a stochastic process. If α ≥ 1/2, then one needs the regularization described below. We will use distribution notation and write $\zeta(dk, \omega) = \tilde{x}(k, \omega)\, dk$, where
$$\tilde{x}(k, \omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} x(t, \omega)\, e^{-ikt}\, dt.$$
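The threshold α = 1/2 can be seen from the variance of the cut-off fractional derivative. Assuming, as (1.1) implies, that $E|\zeta(dk)|^2 = dk/(2\pi(k^2+m^2))$, one gets $\operatorname{Var} x_K^{(\alpha)}(t) = \frac{1}{2\pi}\int_{-K}^{K} \frac{|k|^{2\alpha}}{k^2+m^2}\,dk$. This integral converges as $K \to \infty$ for α < 1/2, grows like $\log K$ at α = 1/2, and grows like $K^{2\alpha-1}$ above. The Python sketch below evaluates it with Simpson's rule in $u = \log k$. The function name and the lower integration limit $10^{-8}$ are our own choices.

```python
import math

def fractional_variance(alpha, K, m=1.0, n=20_000):
    """(1/2pi) * integral_{-K}^{K} |k|^(2 alpha) / (k^2 + m^2) dk, via symmetry and
    Simpson's rule in u = log k for k in [1e-8, K] (n must be even)."""
    u0, u1 = math.log(1e-8), math.log(K)
    h = (u1 - u0) / n

    def f(u):
        k = math.exp(u)
        return k ** (2 * alpha + 1) / (k * k + m * m)  # extra factor k from dk = k du

    s = f(u0) + f(u1)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(u0 + i * h)
    half_integral = s * h / 3.0
    return 2.0 * half_integral / (2.0 * math.pi)
```

At α = 0 the result approaches $1/(2m)$, which is the variance in (1.1). For α = 3/4, increasing K by a factor of 100 multiplies the variance by roughly 10. This is the divergence that the cutoff $x_K$ introduced below is meant to control.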
We want to give a meaning to the following correlation functions:
$$K(t_1, \ldots, t_N) = \frac{E\left(x(t_1)\cdots x(t_N)\, e^{-\lambda U}\right)}{E\left(e^{-\lambda U}\right)} \qquad (1.3)$$
for all N = 1, 2, .... Here
$$U = \int_{-\infty}^{\infty} :x^{(\alpha)}(\tau)^4:\, g(\tau)\, d\tau, \qquad (1.4)$$
where g(τ) is a nonnegative test function with compact support (the volume cut-off), $x^{(\alpha)}(t)$ denotes the fractional derivative (1.2), λ > 0, and $:x^{(\alpha)}(\tau)^4:$ is the Wick normal product. We will denote the expectation value as $E(A) = \langle A \rangle$. In this notation
$$\langle x(t)x(\tau)\rangle = \frac{1}{2\pi}\int \frac{e^{ip(t-\tau)}}{p^2+m^2}\, dp.$$
For the correlation function (1.3) one has the perturbative expansion
$$\langle x(t_1)\cdots x(t_N)\, e^{-\lambda U}\rangle = \sum_{n=0}^{\infty} \frac{(-\lambda)^n}{n!}\, \langle x(t_1)\cdots x(t_N)\, U^n \rangle \qquad (1.5)$$
If α ≥ 5/8, then the expectation value in (1.5) is meaningless because of ultraviolet divergences. We have to introduce a cutoff stochastic process $x_K(t)$:
$$x_K(t, \omega) = \int_{-K}^{K} e^{ikt}\, \zeta(dk, \omega).$$
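Instead of U in (1.3) we put
$$U_K = \int :x_K^{(\alpha)}(\tau)^4:\, g(\tau)\, d\tau,$$
where
$$x_K^{(\alpha)}(t, \omega) = \int_{-K}^{K} |k|^{\alpha} e^{ikt}\, \zeta(dk, \omega).$$
ᵃ Stochastic differential equations with fractional derivatives are also considered on p-adic number fields [7].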
The problem is to prove that, after renormalization, the correlation functions $\langle x(t_1)\cdots x(t_N)\, e^{-\lambda U_K}\rangle_{ren}$ have a limit as $K \to \infty$ in each order of the perturbation expansion. We will study this problem below using the Bogoliubov-Parasiuk R-operation and the standard language of Feynman diagrams. In the momentum representation we obtain an expression of the form
$$\langle \tilde{x}(p_1)\cdots \tilde{x}(p_N)\, e^{-\lambda U}\rangle = \sum_{\Gamma} G_\Gamma(p_1, \ldots, p_N).$$
Here the sum runs over all Feynman diagrams Γ with N external legs that can be built using the 4-vertices corresponding to the $x^{(\alpha)4}$ term. The contribution of a connected diagram with n 4-vertices and L internal lines has the form
$$G_\Gamma = \int \prod_{j=1}^{l} dk_j \prod_{i=1}^{L} \frac{|q_i|^{2\alpha}}{q_i^2 + m^2}, \qquad (1.6)$$
where l = L − (n − 1) and the $q_i$ are linear combinations of the internal momenta $k_1, \ldots, k_l$ and the external momenta $p_1, \ldots, p_N$. The canonical degree D(Γ) of a proper diagram is defined as the dimension of the corresponding Feynman integral with respect to the integration variables. Using (1.6) we have
$$D = D(\Gamma) = (2\alpha - 2)L + l = (2\alpha - 1)L - n + 1. \qquad (1.7)$$
If D < 0 for a given diagram, then this diagram is superficially finite; otherwise it is divergent. Let us consider a proper diagram with n vertices, L internal lines, and E legs. We have the following relation:
$$4n = 2L + E. \qquad (1.8)$$
Note that for any nontrivial connected diagram
$$2n \ge L \ge n \ge 2, \qquad (1.9)$$
$$E \le 2n. \qquad (1.10)$$
Theorem. If α < 5/8, then all Feynman diagrams of the interacting stochastic process are superficially finite. If 5/8 ≤ α < 3/4, then there is a finite number of divergent diagrams, and all of them have only 0 or 2 legs. If α = 3/4, then the model is renormalizable and all divergent diagrams have only 0, 2 or 4 external lines. Finally, if α > 3/4, then the model is nonrenormalizable.
Proof. Let us prove the first statement of the theorem, i.e. that D < 0 for any n ≥ 2 if α < 5/8. Using (1.7) and (1.9), in particular L ≤ 2n, we have
$$D\big|_{\alpha<5/8} < \left(\frac{5}{4} - 1\right)L - n + 1 = \frac{L - 4n + 4}{4} \le \frac{2n - 4n + 4}{4} = \frac{2 - n}{2} \le 0. \qquad (1.11)$$
From (1.11) it follows that D < 0 for any α < 5/8. Let us consider α = 5/8. Similarly to (1.11), from (1.7) we have
$$D\big|_{\alpha=5/8} = \frac{L - 4n + 4}{4} \le \frac{2 - n}{2} \le 0. \qquad (1.12)$$
Therefore only diagrams with n = 2 vertices can be divergent (in this case D = 0). Using (1.8), we rewrite (1.12) in the form
$$D\big|_{\alpha=5/8} = \frac{4 - (E + L)}{4}. \qquad (1.13)$$
From (1.13) it follows that only the diagram with E = 0, L = 4, n = 2 is divergent. In the case 5/8 < α < 3/4 we can write
$$\alpha = \frac{3}{4} - \epsilon, \qquad (1.14)$$
where 0 < ε < 1/8. Substituting (1.14) into (1.7) and using (1.9), we have
$$D\big|_{\alpha=3/4-\epsilon} = \frac{L}{2} - 2L\epsilon - n + 1 \le \frac{2n}{2} - 2n\epsilon - n + 1 = 1 - 2n\epsilon. \qquad (1.15)$$
Thus for any given ε > 0 (and therefore any α < 3/4) there exists a number N such that D < 0 for any n > N; one can take N = 1/(2ε). Hence there is only a finite number of divergent diagrams. Using (1.8), we rewrite (1.15) in the form
$$D\big|_{\alpha=3/4-\epsilon} = -2L\epsilon + \frac{4 - E}{4}.$$
It follows that D ≥ 0 only if E < 4. Since E is even by (1.8), this means E = 0 or E = 2, and the model is super-renormalizable. Let us consider the case α = 3/4. Using (1.8) and (1.7) we have
$$D\big|_{\alpha=3/4} = 1 - \frac{E}{4}. \qquad (1.16)$$
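The equality (1.16) means that all divergent diagrams have only 0, 2, or 4 legs, and the model is renormalizable. Finally, if α > 3/4, then using (1.8) we have
$$D\big|_{\alpha>3/4} = 1 - \frac{E}{4} + \left(2\alpha - \frac{3}{2}\right)L, \qquad (1.17)$$
where the coefficient $2\alpha - 3/2$ is positive. Therefore, if α > 3/4, then for any number of legs E the degree D becomes positive once L is large enough. Divergent diagrams then occur with arbitrarily many external legs, and the model is nonrenormalizable. ∎
Examples of applications of this theorem can be found in [9].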
2 Acknowledgments
This investigation was supported by the grant of the Swedish Royal Academy of Sciences on collaboration with states of the former Soviet Union and by the Profile Mathematical Modeling of Växjö University. I would like to thank A. Khrennikov for the warm hospitality and fruitful discussions.
References
1. N.N. Bogoliubov and D.V. Shirkov, Introduction to the theory of quantum fields, Nauka, Moscow, 1973
2. T. Hida, Brownian Motion, Springer-Verlag, 1980
3. J. Glimm and A. Jaffe, Quantum Physics. A Functional Integral Point of View, Springer-Verlag, 1987
4. T. Hida, H.-H. Kuo, J. Potthoff and L. Streit, White Noise: An Infinite Dimensional Calculus, Kluwer Academic, 1993
5. J. Kogut and K. Wilson, Phys. Reports 12C, p. 75, 1974
6. A.Z. Patashinski and V.L. Pokrovski, The fluctuational theory of phase transitions, Nauka, Moscow, 1975
7. V.S. Vladimirov, Generalized functions over the field of p-adic numbers, Russian Math. Surveys 43:5 (1988)
8. I.I. Gihman and A.V. Skorohod, Introduction to Theory of Random Processes, Nauka, Moscow, 1977
9. Ya.I. Volovich, Interacting stochastic process and renormalization theory, quant-ph/0008063
ISBN 981-02-4846-6
www.worldscientific.com