PROBABILITY AND SCHRODINGER'S MECHANICS
David B. Cook
World Scientific
David B. Cook Department of Chemistry, University of Sheffield
New Jersey • London • Singapore • Hong Kong
Published by World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: Suite 202, 1060 Main Street, River Edge, NJ 07661
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
PROBABILITY AND SCHRODINGER'S MECHANICS Copyright © 2002 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN 981-238-191-0
This book is printed on acid-free paper. Printed in Singapore by Mainland Press
"But the chemists and the whole class of mechanics and empirics, should they have the temerity to attempt contemplation and philosophy, being accustomed to meticulous subtlety in a few things, they twist by extraordinary means all the rest into conformity with them and promote opinions more odious and unnatural than those advanced by the very rationalists." — Francis Bacon, Preface to Natural History, 1609 (my emphasis)
Contents

Preface
Organisation

Part 1 Preliminaries

Chapter 1 Orientation and Outlook
1.1. General Orientation
1.2. Materialism
1.3. Materialism and Realism
1.4. Logic
1.5. Mathematics
1.6. Reversing Abstraction
1.7. Definitions, Laws of Nature and Causality
1.8. Foundations
1.9. Axioms
1.10. An Interpreted Theory

Part 2 Probabilities

Chapter 2 Simple Probabilities
2.1. Colloquial and Mathematical Terminology
2.2. Probabilities for Finite Systems
2.2.1. An Example: The Faces of a Cube
2.2.2. Dice: Statistical Methods of Measure
2.2.3. Loaded Dice: Statistical Methods of Measure
2.2.4. Standard Dice and Conservation Laws
2.3. Probability and Statistics
2.3.1. An Extreme Example
2.4. Probabilities in Deterministic Systems
2.5. The Referent of Probabilities and Measurement
2.5.1. Single System or Ensemble?
2.5.2. The Collapse of the Distribution
2.5.3. Hidden Variables
2.6. Preliminary Summary

Chapter 3 A More Careful Look at Probabilities
3.1. Abstract Objects
3.2. States and Probability Distributions
3.2.1. The Propensity Interpretation
3.3. The Formal Definition of Probability
3.3.1. A Premonition
3.4. Time-Dependent Probabilities
3.5. Random Tests
3.6. Particle-Distribution Probabilities

Part 3 Classical Mechanics

Chapter 4 The Hamilton-Jacobi Equation
4.1. Historical Connections
4.2. The H-J Equation
4.3. Solutions of the H-J Equation
4.3.1. Cartesian Coordinates
4.3.2. Spherical Polar Coordinates
4.3.3. Comparisons
4.3.4. Cylindrical Coordinates
4.4. Distribution of Trajectories
4.5. Summary
Appendix 4.A Transformation Theory

Chapter 5 Angular Momentum
5.1. Coordinates and Momenta
5.2. The Angular Momentum "Vector"
5.3. The Poisson Brackets and Angular Momentum
5.4. Components of the Angular Momentum "Vector"
5.5. Conclusions for Angular Momentum

Part 4 Schrodinger's Mechanics

Chapter 6 Prelude: Particle Diffraction
6.1. History
6.1.1. The Experiment
6.1.2. The Explanations
6.2. The Wave Theory
6.3. The Particle Theory
6.4. A Simple Case
6.5. Experimental Verification
6.6. The Answer to a Rhetorical Question
6.7. Conclusion

Chapter 7 The Genesis of Schrodinger's Mechanics
7.1. Lagrangians, Hamiltonians, Variation Principles
7.1.1. Equations and Identities
7.2. Replacing the Hamilton-Jacobi Equation
7.3. Generalising the Action S
7.3.1. Changing the Notation for Action
7.3.2. Interpreting the Change
7.4. Schrodinger's Dynamical Law
7.4.1. Position Probability and Energy Distributions
7.4.2. The Schrodinger Condition
7.5. Probability Distributions?
7.6. Summary of Basic Principles

Chapter 8 The Schrodinger Equation
8.1. The Variational Derivation
8.2. Some Interpretation
8.3. The Boundary Conditions
8.4. The Time-Independent Schrodinger Equation
Appendix 8.A Schrodinger's First Paper of 1926

Chapter 9 Identities: Momenta and Dynamical Variables
9.1. Momentum Definitions and Distributions
9.2. Abstract Particles of Constant Momentum
9.3. Action and Momenta in Schrodinger's Mechanics
9.4. Momenta and Kinetic Energy
9.5. Boundary Conditions
9.5.1. Constant Momenta and Kinetic Energy
9.5.2. Solution of the Schrodinger Equation
9.6. The "Particle in a Box" and Cyclic Boundary Conditions

Chapter 10 Abstracting the Structure
10.1. The Idea of Mathematical Structure
10.1.1. A Pitfall of Abstraction: The Momentum Operator
10.2. States and Hilbert Space
10.3. The Real Use of Abstract Structures

Part 5 Interpretation from Applications

Chapter 11 The Quantum Kepler Problem
11.1. Two Interacting Particles
11.2. Quantum Kepler Problem in a Plane
11.3. Abstract and Concrete Hydrogen Atoms
11.4. The Kepler Problem in Three Dimensions
11.5. The Separation of the Schrodinger Equation
11.6. Commuting Operators and Conservation
11.7. The Less Familiar Separations
11.7.1. The Everyday Solutions
11.8. Conservation in Concrete and Abstract Systems
11.9. Conclusions from the Kepler Problem
11.9.1. Concrete Objects and Symmetries
Appendix 11.A Hamiltonians by Substitution?

Chapter 12 The Harmonic Oscillator and Fields
12.1. The Schrodinger Equation for SHM
12.2. SHM Details
12.3. Factorisation Method
12.4. Interpreting the SHM Solutions
12.5. Vibrations of Fields and "Particles"
12.5.1. Phonons and Photons
12.6. Second Quantisation

Chapter 13 Perturbation Theory and Epicycles
13.1. Perturbation Theories in General
13.2. Perturbed Schrodinger Equations
13.3. Polarisation of Electron Distribution
13.4. Interpretation of Perturbation Theory
13.5. Quantum Theory and Epicycles
13.6. Approximations to Non-existent Functions
13.7. Summary for Perturbation Theory

Chapter 14 Formalisms and "Hidden" Variables
14.1. The Semi-empirical Method
14.2. The Chemical Bond
14.3. Dirac's Spin "Hamiltonian"
14.4. Interpretation of the Spin Hamiltonian

Part 6 Disputes and Paradoxes

Chapter 15 Measurement at the Microscopic Level
15.1. Recollection: Concrete and Abstract Objects
15.2. Statistical Estimates of Probabilities
15.2.1. von Neumann's Theory of Measurement
15.3. Measurement as "State Preparation"
15.4. Heisenberg's Uncertainty Principle
15.4.1. Measurement and Decoherence
15.5. Measurement Generalities
Appendix 15.A Standard Deviations of Conjugate Variables

Chapter 16 Paradoxes
16.1. The Classical Limit
16.1.1. The Ehrenfest Relations
16.2. The Einstein-Podolsky-Rosen (EPR) Paradox
16.2.1. The EPR Original
16.2.2. Bohm's Modification
16.2.3. Bell's Inequality and Theorem
16.3. Bell's Assumptions
16.3.1. Lessons from EPR
16.3.2. Density of Spin and EPR
16.4. Zero-Point Energy

Chapter 17 Beyond Schrodinger's Mechanics?
17.1. An Interregnum?
17.2. The Avant-Garde
17.3. The Break with the Past
17.4. Classical and Quantum Mechanics

Index

Preface
The presentation and interpretation of (non-relativistic) quantum mechanics is a very well-worked area of study; there have to be very good reasons for adding to the literature on this subject. My reasons are (obviously) that I am far from satisfied with much of the published work and find difficulties with some points, in particular:

• Any abstract formalism is much less rich than the structure from which it has been abstracted; a fact that even Wittgenstein had to come to terms with in the latter part of his adult life. Language is richer than (cannot be reduced to) a representation of logic; Schrodinger's mechanics is richer than (cannot be reduced to) a representation of Hilbert space. Just as language contains more structures than logic so Schrodinger's mechanics contains more structures than those of Hilbert space.
• The use of probability in quantum theory is arbitrary, eccentric, out of step with modern probability theory and is the source of the majority of "paradoxes" in the interpretation of quantum theory. While these paradoxes are the bread and butter of some of the more popular expositions of quantum theory, I cannot say that I am fond of paradoxes in physical theories.
• Although positivism is discredited as a philosophy of science it has left a huge clutter of verbal and conceptual debris strewn across the field of quantum theory. Positivism in its most aggressive form (instrumentalism) makes the mistake of confusing the meaning of a concept with the way in which numerical values of the variables involved in that concept may be determined. This attitude has, of course, added to the confusion about probabilities; "defining" them in terms of the ways in which they might be measured thus reinforcing the view that probabilities are frequency ratios and the everyday opinion that probabilities are applicable to individual events.
• More mundanely, the prescriptions for generating Schrodinger's mechanics from classical mechanics in the vast majority of texts are wrong; they simply do not work. We are saved from total chaos by the fact that the form of the Schrodinger equation is known and used independently of these formal prescriptions by working scientists. I am at a loss to explain why this central point is ignored in text after text, both on the applications of quantum theory and on its interpretation.

Most modern studies of the interpretation of quantum mechanics are "modern" in another sense; that used in literary criticism. They study the theory of quantum mechanics rather than the quantum mechanics of the energetics and distribution of sub-atomic particles. Many of these works concentrate on the alleged consequences of imaginary experiments involving spin angular momentum; uncritically using the colloquial, "everyday" interpretation of probability rather than the modern mathematical theory. Indeed, scarcely any text says what is meant by "probability", leaving the reader to assume that the everyday interpretation is correct. If a Hamiltonian is used at all in these expositions it is an empirical one using "coupling constants" and making no reference to the laws of interaction in physical systems.

In contrast, I wish to present a study of what I have called Schrodinger's mechanics, the richest, most concrete and most thoroughly interpretable of a variety of more-or-less abstract structures falling under the umbrella of "quantum mechanics". Quantum theory is a mathematically articulated theory embedded in a historical stream of scientific thought; it cannot be understood or its interpretation appreciated without addressing these obvious facts.
I shall not shrink from joining scientists in cosmology, geology, biology, chemistry, history, archeology, sociology, paleontology (even physics) in using, where possible, a mechanistic mode of explanation occasionally involving unobserved, even unobservable, objects and processes. What is more, I will occasionally look at actual solutions of the (spatial differential) equations to examine how they are capable of interpretation and how this interpretation bears on the referent of Schrodinger's mechanics.

In a previous work I chose[1] to express myself using the "ensemble" interpretation of probabilistic statements since this interpretation has been one of the major strands in quantum theory (notably by Einstein) and because it is closest to the "abstract object" interpretation which I use in this work. My main aim was twofold: to stay in contact with one of the main streams of thought in quantum theory and, more important, to emphasise the incorrectness of the colloquial interpretation of probability which assumes that probability statements apply to single concrete systems. I now regret this, since even the ensemble interpretation, although vastly superior to the colloquial interpretation, cannot be made precise enough for my purposes here.

[1] With considerable reservations, as I explained in an Appendix to the work.

Two of the main lessons of the past century in the investigation of the structure of mathematics and, in general, of mathematically articulated theories have been the failures of ordinary logic and intuition for:

• Systems with an infinite number of members or for infinite combinations of statements,
• Constructions which are self-referential like the ones which generate Russell's paradox or for systems whose logic may be "internalised" like Godel's theorem.

So, generally speaking, I shall not be concerned with "the wave function of the Universe", "the wave function of the measuring apparatus" or even "the wave function of the observer" since the first is both recursively self-referential and with infinite referent and the others arguably so. I shall, therefore, try to avoid writing down symbols for non-existent mathematical objects; the typographical equivalent of hot air.

The science of the very small and very light is quite complex enough for me; I do not have time (literally or figuratively), or indeed the expertise, to express opinions about the bearing of quantum theory on God, Mind or Consciousness. My aim is entirely bourgeois in the sense used by Wittgenstein and by Marx.
I try to solve the problems of Schrodinger's mechanics "from within"; I do not wish to sweep away quantum theory but to bring this most successful physical theory of the twentieth century back into the mainstream of the great tradition of the mechanics of Newton, Lagrange and Hamilton and to base its interpretation on the work of a pillar of twentieth century mathematics: A. N. Kolmogorov. The degeneration of the interpretation of modern science into solipsism and triviality requires an entirely different kind of investigation for which I have neither the qualifications nor the stomach.
Organisation
As this work is presented in a rather unusual way, this short chapter gives both a description of the organisation of material and an attempted justification for this choice. Analysis and interpretation of Schrodinger's mechanics is a very "mature" subject in the sense that an enormous amount has been written about the matter without any consensus emerging. Or, rather, such consensus as there is about the interpretation of quantum theory is, in my view, based on an erroneous (pre-Kolmogorov) view of probability. So, in what is written here almost every page (indeed, in places, almost every sentence) will contain material that is in dispute. Using Schrodinger's starting point and Kolmogorov's probability theory, I shall say (and I shall prove), among other things, that:

• Probability statements do not refer to individual objects
• Probability distributions do not "collapse"
• The Hamiltonian operator cannot be obtained by an operator substitution in the classical Hamiltonian function
• Dynamical variables may or may not all be represented by linear operators but there is no general way to find the form of these operators
• Conserved quantities are not always represented by operators which mutually commute
• It is nonsensical to give a physical interpretation to the terms in a perturbation expansion

To take the time and space to discuss the opposing views on every such point is tiresome and, more importantly, makes the presentation diffuse and lacking in direction. I shall therefore simply give my presentation of probability and Schrodinger's mechanics without detailed references to other points of view or, indeed, in most cases even giving an opposing view a "fair hearing". This device will enable the presentation of the theory and interpretation of Schrodinger's mechanics "as if" one were developing an interpreted theory ab initio in the cleanest and most logical manner while conveniently ignoring other viewpoints. My defence for this outrageous choice is twofold: defending my position would simply take too much space and the alternative and opposing views are too well-known to need rehearsing here.

Throughout the work I use the term "state function" rather than the familiar "wave function"; this is a deliberate attempt to avoid, even in terminology, the celebrated wave/particle duality. In fact the solutions of the Schrodinger equation are only waves in the extremely atypical case of a particle in the absence of a field of force (a "free" particle). And even these waves are not real physical waves but particular forms of a function which generates a probability distribution.

I have also tended (as I say on page 148 of Chapter 8) to work in a system of units ("atomic units") in which Planck's constant has the value unity unless the historical context explicitly demands the appearance of h.
PART 1
Preliminaries
In this introductory material I give a rather casual survey of the general approach I shall be using and of some of the attitudes I shall take to philosophy, logic and, particularly, mathematics. In mathematics I shall combine extreme fussiness in some topics with a rather casual lack of rigour in general. In general, internal, "technological" matters, mathematics is in good hands and needs no gloss from me. But in key areas of interpretation and abstraction the relationship between abstract structures and physical theories is, typically, ignored by mathematicians or deemed to be "arbitrary". In the nineteenth century Hegel showed Western Christendom that the God it worshipped was not "out there" but was nothing more than a fantastic image of itself. Modern mathematicians are not yet ready to accept the same conclusion about the logic they rely on; mathematics and logic are not "out there", they are abstracted from the real world of objects, processes, language and conceptual thought.
Chapter 1
Orientation and Outlook
Writings about quantum mechanics are hedged about with claims that it changes our whole conception of the applicability of logic and mathematics and revolutionises philosophy. While much of this discussion is patently foolish, it is worth attempting to say a little about some of the assumptions which I have later (mostly silently) made about philosophical, logical and mathematical matters. The ideas simply sketched here will be developed elsewhere. What is said here is trifling as philosophy; it is merely a general orientation to the study of mathematically-articulated science.
Contents
1.1. General Orientation
1.2. Materialism
1.3. Materialism and Realism
1.4. Logic
1.5. Mathematics
1.6. Reversing Abstraction
1.7. Definitions, Laws of Nature and Causality
1.8. Foundations
1.9. Axioms
1.10. An Interpreted Theory

1.1. General Orientation
The theory of Schrodinger's mechanics, more than most other parts of theoretical science, seems to be inextricably involved with philosophical ideas. This is not, in my view, an essential part of quantum theory but rather reflects, as often happens, the fact that a new theory brings into sharp focus a collection of ideas which have been only imperfectly absorbed in the previous theories or not involved in those theories. It is astonishing how confusions, which are basically misunderstandings about the meaning and referent of probability theory, have generated a re-examination of the applicability of logic and mathematics to the world and even raised doubts about the objective existence of that world. Ostensible paradoxes involving quantum mechanics continue to be marvelled at even though these paradoxes, while brought to the attention of physicists by quantum theory, do not, in essence, involve quantum theory at all.

In this introductory Chapter I try to give the briefest of summaries of my attitude to some of these points of ontology, logic and mathematics in that order. They are not meant to be anything other than starting points for a more comprehensive philosophy of science which has been given by others.
1.2. Materialism
I assume that the world exists independently of my (or anyone else's) perceiving, measuring or thinking about it. Without this assumption the study of science is ridiculous. Post-classical materialism is a product of the British Isles, perfected, alas, by others. After John Duns Scotus had wondered in 1320 if "matter could think" and Francis Bacon's penetrating founding of modern materialism we find, 300 years after Duns Scotus, Thomas Hobbes writing:

The Universe is corporeal; all that is real is material and what is not material is not real. (Leviathan, 1651)

After another 300 years, this vigorous statement has been refined to a rather more cautious

Materialism: the doctrine that all items in the world are composed of matter. Because not all physical entities are material, the related doctrine of physicalism, claiming that all items in the world are physical entities, has tended to replace materialism. (The Blackwell Companion to Philosophy, N. Bunnin and E. P. Tsui-James)
These definitions, although suggestive, do not say what "corporeal", "material", "real", "physical" and, above all, "matter" are, nor is "item" defined. Surely, in this important context, one is entitled to ask, at the very least, for a careful statement of the meaning of "material" and "matter"?[1] But I am being unfair on Hobbes and his commentators; he was bound, as we all are, by the cultural, ideological and scientific milieu in which he lived. The sense of Hobbes' statement is clear enough and, in particular, the strength of Hobbes' polemical formulation is understandable, surrounded as he was by established religion. Most modern statements of the meaning of materialism take the general line of the second quote and, again, the general sense of it is clear, particularly if one has no familiarity with physical science. However, the elucidation of the nature and dynamics of matter is, arguably, a definition of the aim of all physical science. With every major revolution in physical science our conception of the nature of matter and the transformations it may undergo is overthrown. We have seen

• Newton's "massy impenetrable particles" replaced by
• The Rutherford "planetary" atom composed mostly of empty space, which in turn has had to give way to
• The idea of matter composed of particles which are nothing more than "quanta" associated with various fields (arguably massless, extended substances)
• which, we can confidently expect, will be replaced in due course

There is every reason to believe that the scientific investigation of matter will never exhaust its properties and that our ideas of what matter is will continue to evolve.[2] Also, in view of past experience, we can be completely confident that, at any particular time, our best ideas about what matter actually is will be wrong.
[1] Of course, definitions have to stop somewhere with primitives which can, at best, be described not defined. But I think that an "ism" cannot be a primitive; it must be defined in terms of the "X" of which it is the "X-ism".
[2] If a series of complete revolutions can be called evolution.

So, in the absence of a definition or description from the philosophers, these changes in the conception of matter have apparently placed materialism, as defined above, in the curious position of being dependent on the pronouncements of a current specialised part of science (quantum field theories, at the moment) because ideas about the nature of "matter" are constantly changing. But materialism (like idealism) is an ontological or metaphysical statement about what exists; it cannot be dependent on the pronouncements of particular, special sciences; decisions about ontology cannot depend on the results of physical science, however suggestive these might appear. If we decide, for example, that matter exists independently of our minds, we can scarcely complain if we are disappointed or repelled by what detailed investigations reveal about the actual nature of that matter.[3] The details of any particular current theory of the structure of matter have no bearing whatsoever on one's ontological position, whether it is a branch of materialism or some form of idealism, scepticism, empiricism or whatever.

Scientific investigations (practical and theoretical) continue to reveal the richness and depth of the structure of matter and its autonomous transformations; to base a general ontological position on any particular ideas about the structure of matter is a hostage to fortune of the worst kind. What materialism says is not what matter is but that matter, whatever it is, exists. That is, the characteristic property of matter for the purposes of philosophical materialism is not what its constitution is thought to be at any particular moment in time but the fact that it exists independently of our perceptions, thoughts, theories and intuitions about its true nature. The "materiality" of matter consists in precisely this independent existence and I cannot improve on the formulation of the most influential[4] materialist of the twentieth century (and, arguably, of any century) Vladimir Ilyich Ulyanov who says:

For the sole "property" of matter with whose recognition philosophical materialism is bound up is the property of being an objective reality, of existing outside our mind. (V. I. Lenin, Materialism and Empirio-Criticism, 1908 (emphasis in original))

[3] My own opinion, for example, that wasps are an evolutionary mistake, being far too heavily armed for their size, does not affect my general acceptance of biological evolutionary theories.
[4] And profoundly unfashionable.

Some modern physicists, disorientated by the huge changes in the conception of the structure of matter, have declared that, since some theories make fields the ultimate constituents of matter, "matter" has disappeared from physics. In some cases the change in ideas about the structure of "Newtonian matter" is blurred together with the idea of a mechanistic universe and even with the "information revolution":

Thus the rigid determinism of Newton's clockwork universe evaporates, to be replaced by a world in which the future is open, in which matter escapes its lumpen [sic] limitations and acquires an element of creativity. (P. Davies and J. Gribbin, Chapter 1 (The Death of Materialism) in The Matter Myth (Viking, 1991))

It is just as true that "Newtonian matter" has disappeared as it is that "the flat earth" has disappeared:

• The idea of "the flat earth" is an eminently sensible one at the everyday level of operations and is perfectly acceptable as a description of the surface of the planet for one's local environment and journeys of a hundred miles or so. Conversations and plans for car and rail journeys may be safely entered into in the certain knowledge that the assumption of "flat earth" will be understood in context and not be seen to conflict with the understanding that, in a wider context, the earth is roughly spherical.
• The idea of "matter" as understood in a Newtonian sense is an eminently sensible one at the everyday level of operations and is perfectly acceptable as a description of everyday objects in our normal environment. One may converse about buying apples or building with bricks in the certain knowledge that everyone will understand that these objects are made of hard substantial "Newtonian matter". No layman will be in any doubt that it is this property of substance which prevents our hands from passing through each other when we clap them together. This understanding will not be seen to contradict any more fundamental understanding of the structure of matter.

What has happened in both these cases is that an everyday concept has had to be replaced by a more precisely-defined idea when used in a scientific context.
The fact that theories of the underlying structure of matter have changed from the naive assumptions current when the word and concept "matter" were formed changes in no way the basic materialist view that matter, whatever its structure, exists objectively whether or not we perceive it.
1.3. Materialism and Realism
Personally, I have no doubt whatsoever that natural gas (methane, CH4) exists independently of my or any other mind. What is more, the formula CH4 is interpretable as saying (among other things) that a molecule of methane contains 5 atomic nuclei. I believe that these 5 nuclei exist independently of my mind. Also, I know from both quantum mechanical calculations and a wealth of experimental data that the methane molecule has other properties which are not material. In particular, I believe that the four hydrogen nuclei are disposed about a central carbon atom in a tetrahedral manner; the lower the temperature of the methane (the smaller the amplitude of its vibrations) the more nearly does it assume a regular tetrahedral shape.

• The number 5 is not a material object, although a penta-atomic molecule is.
• The tetrahedral shape of a methane molecule is not a material object, although a tetrahedral molecule clearly is.

So I must dissent from Hobbes' vigorous opinion at the start of Section 1.2 on page 4; integers and shapes (arrangements) are real but not material in any sensible use of the term "real". Further considerations along these lines will reveal a whole host of further things which are real but not material. All of them are of conceptual structure and exist in the minds of people; all nouns (except proper names of individual concrete objects) denote things which are abstractions. The desk on which my computer sits does not even have a name; it is just "my desk" but "desk"[5] is a concept and, as such, exists only in minds. Naturally, as far as I am concerned, minds are properties of (or processes in) some material objects but they and their "contents" are not themselves material objects.

Thus, although a materialist, I cannot be what one might term an exclusive materialist; there are obviously, in the world, things existing independently of my mind which are not material.
One is naturally drawn to the use of the word "realist" to denote such a metaphysical position since what I am saying is that both material objects and abstract objects are real. Unfortunately, the word realist in the philosophy of science seems to 5
1 choose not to distinguish between the word desk and the concept denoted by that word in order to avoid t h e disintegration of any discussion into a theory of t h e proper use of quotation marks of which there are quite enough in this chapter already.
1.4-
Logic
9
have come to mean "the existence, independent of minds, of the object of study" whether it is material or something else (there are realist mathematicians who believe in the objective mind-independent existence of, for example, Laplace transforms). However, without attempting — or indeed wishing — to work out a philosophical system, I take the position that the things that exist are of (at least) two kinds 6 : 1. Material objects which exist independently of minds. 2. Concepts, abstractions, ideas, etc. which exist in minds and are formed culturally and collectively by the interaction of the culture and the individual with material objects and other minds. Some of these material objects have the property of having and maintaining a mind. The existence (once formed) 7 of these minds is independent of other minds. In particular, there are no minds independently of bodies and minds may interact with the material world through bodies containing minds; either their own or others. These assumptions seem to me to be both self-evident and philosophically innocuous and they are certainly sufficient for the purposes of this work.
1.4. Logic
Most mathematicians and many philosophers regard logic as the sui generis a priori "thing". Indeed, like the fish that is unaware of the water in which it swims, one's first impression is always "how could things be any other way?"; logic is not empirical, it is part of the structure of our minds, it is not capable of derivation or falsification. However difficult it is to conceive and whatever difficulties there are in finding the opportunities for experimental research, there are always some indefatigable workers who will try to find things out. After the social revolution in the Tsarist Russian empire in 1917 the newly-formed Soviet Union was a collection of extremely heterogeneous countries, peoples and cultures; from the urbane metropolitan Muscovites and thoroughly European citizens of St. Petersburg to the nomadic tribes of central Asia. Amongst these peoples were some whose culture was materially quite advanced but who were illiterate. That is, the culture was illiterate; no one could read or write because there was no written expression of their language. This situation was and is extremely rare. It is not unusual for members (even a majority) of a given culture to be illiterate (this was the case in Russia itself) or for very primitive cultures to have no writing, but for a materially advanced culture to be entirely without writing was rare and may no longer be capable of being found. Two pioneering and innovative neuropsychologists, A. R. Luria and L. S. Vygotskii, were working on the relationship between the development of language and of mind, doing work on child development and individuals with brain lesions. They had the unique opportunity to study such cultures in the 1920s. Their work in this and other areas is now classical in their field and has influenced the development of neuropsychology and linguistics ever since. One of the most interesting and novel of their findings in interviewing and talking to these people was that they had not developed the syllogism; they did not have the general idea that, for example, from:
1. All polar bears are white
2. Boris is a polar bear
it necessarily follows (we say) that Boris is white. They would typically demand more information or express exasperation and say that as they had never been to the Arctic how were they to know the colour of Boris? There was no question that it could have been the paucity of language which was at fault; it seems that their (spoken) language was concrete and they had not actually developed some of our abstract ideas at all.
⁶ Notice that this rather elementary taxonomy of things is not a Cartesian dualism any more than the recognition of the existence of two types of living things (plants and animals, say) makes one a dualist in biology.
⁷ It is an empirical point whether or not an isolated individual would develop the property of having a mind. This could only be determined experimentally; reminiscent, perhaps, of the seventeenth-century experiment to discover man's natural language by having a pauper child brought up in isolation by a dumb nurse.
They had not developed the structures which we are in the habit of regarding as innate and a priori. With hindsight it is not too difficult to reconstruct a possible scenario. It is, perhaps, too often forgotten by theorists that language originates as spoken language. As we all know, even the most literate of us, when in conversation, will express meaning by facial gesture, by pointing and by mime when necessary. If the language is not written it may not be necessary to evolve actual sounds (articulated words) for some of the most familiar objects, properties and processes around us. However, when we need to convey information by writing to some person who is not with us and so we cannot use these useful aids, we must create words and forms of grammar which will enable us to get the message across. Once this level of abstraction has been reached, the written language itself becomes an object of study and refinement and certain structures can be seen to have become explicit in the written language which were only implicitly present in the spoken form. In short, it is not at all difficult to see that Logic is a structure which is abstracted from language; it is not inherent in the human mind.⁸
In the case of Vygotskii and Luria, this and a mass of other research (often done in spite of the official ideology) convinced them that language, logic and mind are social developmental products; they evolve in the individual and in the culture through interaction with the material world and with other minds. They have also convinced me.
One of the main manifestations of logic, the concept of proof, is nicely illustrated by Wittgenstein's opinion that it is sufficient to "prove" the commutative property of the multiplication of integers simply by exhibiting a rectangular array of objects of sides N and M and noting that the products N×M and M×N are simply two ways of viewing the same array. Wittgenstein also saw clearly that attempting to validate mathematical proofs by some form of metamathematics is just a regress; just another calculus as he insisted. In the end, although the processes of a calculus may be objectified (even automated), the validation of the process can only come from humans or from comparison with the real world. We might, therefore, characterise a valid logic as a calculus for which there is "consensus amongst experienced practitioners" rather than having objective (mind-independent) existence. As Scruton⁹ says "[mathematics is] a projection into logical space of our own propensities to coherent thought."
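The rectangular-array argument attributed to Wittgenstein above can be mimicked in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and running it for particular N and M exhibits the idea for those values rather than proving anything in general, which is rather the point being made about where validation comes from.

```python
def array_of_objects(n_rows, n_cols):
    """Build one rectangular array of identical objects (hypothetical helper)."""
    return [["*"] * n_cols for _ in range(n_rows)]


def count_by_rows(array):
    """View the array as N rows of M objects each and count: N x M."""
    return sum(len(row) for row in array)


def count_by_columns(array):
    """View the same array as M columns of N objects each and count: M x N."""
    return sum(len(column) for column in zip(*array))


def same_array_two_views(n, m):
    """Return both counts taken from one and the same array."""
    array = array_of_objects(n, m)
    return count_by_rows(array), count_by_columns(array)
```

Calling `same_array_two_views(3, 4)` counts the same twelve objects in two ways; nothing about the array changes between the two counts, only the way it is read.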
I shall, in this work, silently assume the validity of "ordinary" logic, not least because I shall be expressing myself in "ordinary" language and because logic is inherent in all of "ordinary" mathematics (set theory, algebra and analysis) and, without language and mathematics, I would be unable to proceed. This takes us naturally on to a consideration of the use of mathematics in science.
⁸ Perhaps I should be more careful in view of the work of Frege, Russell and Wittgenstein and say "abstracted from a study of language".
⁹ R. Scruton, From Descartes to Wittgenstein (Routledge, 1981).
1.5. Mathematics
The question of the "miracle" of the applicability of mathematics ("free creations of the human mind") to the real world in scientific theories is discussed in many works of the philosophy of science; there is a summary and extension of the ideas involved in a recent book¹⁰ on the subject, so this is still a topical issue. A visit to a really well-stocked tool shop by a novice in hand-work might generate the same feelings; how is it that all these supremely useful objects ("free creations of the tool-maker's mind") can be so convenient and, above all, so applicable?
Mathematics is a system of formal rules for the manipulation of formal objects which has been developed by abstraction from millennia of cultural interaction with the material world.
Language is a system of formal rules for the manipulation of formal objects which has been developed by abstraction from millennia of cultural interaction with the material world.
And yet, one would be hard pressed to find a discussion (in language) of the problem of the miracle of the applicability of language to the real world in scientific philosophy. It is the same as the apparent problem with logic, in that we are so immersed in language that we cannot conceive that it would let us down. With mathematics we see it for what it is; a tool which has been created to help in our investigation and description of the world. Each of us has to be taught mathematics and to learn the rules explicitly; we do not pick it up at our mother's knee in childhood. So we are able to stand back and question its structure and applicability in much the same way as we are inclined to do when we learn a foreign language through explicit instruction rather than by the process of "osmosis" which occurs when we are immersed in the cultural usage of our native language.
Now, while it is entirely possible that some particular parts of mathematics may be wrong, like the view that every continuous function must have a derivative, this is due to logical errors on our part or failures of intuition during the formal processes of generalisation and systematisation. The idea that mathematics might not be applicable to the real world forgets that mathematics has been historically developed from a study of the real world and, as it were by natural selection, the resulting concepts and techniques are necessarily applicable.
Of course, since the end of the nineteenth century, mathematics and mathematicians have sought to develop independently of science; but the very laws of logic and abstraction which enable this apparent autonomy have, themselves, been generated by abstractions from language and interaction with the world. Without further ado, I shall assume the applicability and utility of ordinary mathematics not least because it contains logic which I have already accepted.
¹⁰ M. Steiner, The Applicability of Mathematics as a Philosophical Problem (Harvard University Press, 1998).
1.6. Reversing Abstraction
It is increasingly the case in modern mathematically-articulated theoretical science that material is presented first as an abstract mathematical skeleton and the interpreted scientific structure is then given as a "representation" of that abstract structure. Some mathematicians call this the "lapidary" method; the gems are presented cut and polished, removed from the gross material in which they were found and mounted for display in such an environment that there can be no clue to their earthly origins. In the context of considering the validity of abstraction and interpretation it is perhaps not out of place to be prepared for the rather eccentric use to which mathematicians are inclined to put the word "representation". In ordinary English, if A is a "representation" of B then B is richer and more complex than A; it contains more structure than A. For example, a wiring diagram is a "representation" of the electrical system in a car or building; a flow chart is a "representation" of an industrial process; a portrait is a "representation" of a child. In all these cases the representer is an abstraction of some properties of the represented and is used to show only one aspect of a more complex entity under study. It might therefore be reasonably expected that, in mathematics, the statement "A is a representation of B" might be a paraphrase of something like "A is abstracted from B" as it is in both everyday and scientific usage. In fact the mathematical usage is exactly the opposite; the abstraction is taken to be the more basic or fundamental entity (the represented) and the concrete entity as the representer. Thus, for a mathematician:
• Line elements in real space are a "representation" of an abstract vector space.
• Rotations of solid bodies in real space are a "representation" of group theory.
• Looking ahead, the solutions of the Schrodinger equation are a "representation" of Hilbert space.
One assumes and hopes that this inverted way of expressing an idea is to make allowance for the undoubted fact that the same mathematical structure may be abstracted from many (perhaps unrelated) real structures and processes and not in the (Platonic) belief that the mathematical structures are more fundamental. I have said in Section 1.3 that I assume that mathematical structures are real but with the very strict proviso that these structures, like many others, do not exist outside of people's minds; they are real but not material. Mathematics, like language, is generated by abstraction from the cultural interaction of material people with the material world and by interaction among minds; it has no existence independent of those minds in interaction and would certainly not have been generated without the historical interaction of those minds with the material world.
This upside-down terminology will not cause any difficulty in actual applications of mathematics to scientific theories but it can and does give an unfortunate philosophical slant to the interpretation of those theories. If a mathematical or logical structure which is contained in (can be abstracted from) a scientific theory is regarded as more "real" than that scientific theory then the physical interpretation of the quantities in the theory becomes difficult and may come to be regarded as arbitrary. This view can be reinforced by the generation of intuitive paradoxes in the abstract theory which were not present in the original, physically interpreted, theory. The use of the extremely powerful methods of mathematics has more than just a manipulational value in physical science; these methods are a guide to intuition and an aid to concept formation. But mathematics is neither the source nor the destination of scientific theories.
In spite of eminent opinion to the contrary, therefore, it is not possible for there to be any "Mathematical Foundations of Quantum Mechanics". If the unfortunate architectural analogy is to be used at all for the role of mathematics in the sciences it is more akin to the wiring, plumbing and heating — all those services which make a building convenient and pleasant to live in — than to the foundations. A more realistic metaphor is between mathematics and the tools with which the edifice is constructed; the building's ultimate shape is determined just as much by the techniques available as it is by the use for which it is intended.
From the outlines of the use of language, logic and mathematics which I have given in this chapter, it would appear that their development, in mutual interaction and abstraction, has meant that they are so locked together with each other and with the material world that we can never "step outside" of these structures to think about the world objectively. That we can, in fact, do so is proved, not by ratiocination but by the very fact that we have been able to do so in both theory and in practice; we can use logic, our mathematics does work and is applicable, we can understand and explain the world and, a fortiori, we have used all these things to generate working logical and material technologies. In fact, it is easy to see that it is entirely possible to solve problems which are strongly "coupled" and "non-linear" in the sense of complete mutual dependence. The analogy which springs most readily to my mind (due to some of my day-to-day practical work) is that of the solution of the Hartree-Fock or Kohn-Sham equations for the electronic structure and energetics of many-electron molecules and solids. The distribution and energy of each electron in these systems is dependent on the distribution and energy of all the others, presenting a problem which has all the characteristics of a "deadly embrace"¹¹; we need the distribution of all but one of the electrons before we can compute the distribution of a particular one of them but those distributions can only be computed when we know the distribution of that particular one. But, in computing laboratories all over the world these equations are solved thousands of times a day as a completely routine task; they are solved iteratively by guessing a solution and progressively refining it until it satisfies a self-consistency criterion. In theoretical science, perhaps the last thing we want is mere self-consistency because that would mean stagnation and self-satisfaction, but the analogy is useful.
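The guess-and-refine procedure described above can be sketched with a deliberately simple toy problem. What follows is not a Hartree-Fock or Kohn-Sham calculation; it is an assumed model in which each of several quantities is defined only in terms of all the others, and every name in it (`update`, `self_consistent_solve`, `couplings`) is hypothetical.

```python
def update(values, couplings):
    """Recompute every quantity from the current values of all the others.

    Toy rule (an assumption made for illustration only):
        d_i = 1 / (1 + c_i * sum of d_j for j != i)
    """
    total = sum(values)
    return [1.0 / (1.0 + c * (total - d)) for d, c in zip(values, couplings)]


def self_consistent_solve(couplings, guess=None, tol=1e-10, max_iter=1000):
    """Iterate from a guess until successive solutions agree to within tol."""
    values = list(guess) if guess is not None else [1.0] * len(couplings)
    for iteration in range(1, max_iter + 1):
        new_values = update(values, couplings)
        # Self-consistency criterion: the largest change in any one quantity.
        change = max(abs(new - old) for new, old in zip(new_values, values))
        values = new_values
        if change < tol:
            return values, iteration
    raise RuntimeError("no self-consistent solution found within max_iter")


if __name__ == "__main__":
    solution, n_iter = self_consistent_solve([0.1, 0.2, 0.3])
    print(solution, n_iter)
```

No quantity can be computed before the others are known, yet the loop breaks the circle by starting from a guess. For the small couplings used here the iteration settles down; stronger coupling in a model of this kind could require damping or might fail to converge at all.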
1.7. Definitions, Laws of Nature and Causality
There is a persistent tendency in science to confuse scientific definitions with laws of nature. This tendency ranges from simple confusion between the two to explicit positivistic programs to reduce one to the other; for example:
• Mach tried to reduce Newton's laws to definitions by defining force in terms of mass and acceleration using F = ma.
• The famous experiment to determine the mechanical equivalent of heat cannot now be done since the answer will always be unity as both heat and work are measured in the same units.
• Measurement of the velocity of electromagnetic radiation in vacuo is now impossible as this velocity is a standard definition.
The first of these is simply a mistake which I will elaborate below. The others are more interesting, since they fuse together a definition and a scientific law which can be clarified by other, more extreme, examples:
• Anchor chain and dress fabric are both measured in metres (which is a definition) but they are not interconvertible (which is a law of nature).
• Graphite and diamonds are both measured in grams (which is a definition) and they are interconvertible (which is a law of nature).
• Work and heat are both measured in joules (which is a definition) but they are only partly interconvertible (which is a law of nature).
The fact that two quantities may be measured in the same units has no consequences at all for any relationship that they might have in natural processes. In fact, much of the science of thermodynamics is concerned with the fact that work and heat are not completely interconvertible. Heat and work are physically and conceptually different things; they are not equivalent, merely partially interconvertible. We shall meet this confusion between definitions and laws of nature later in a more quantitative form; the confusion (in science, not mathematics) between definitions and equations. It is equations which are the quantitative carriers of laws of nature; relationships which have the same form as equations may be simply quantitative definitions. In Chapter 7 and, particularly, Section 7.1.1 this point will be taken up in detail in the context of Schrodinger's mechanics.
¹¹ Or a Catch-22, perhaps.
In the simpler examples of F = ma or pV = RT the question of whether these expressions are equations or identities cannot be resolved by a minute examination of the symbols involved; it is a question of interpretation and meaning. If the symbols in an expression are all independently defined physical quantities, then the expression is an equation; it expresses a (possible) necessary relationship between those quantities in the real world. If all but one of them represent such physical quantities, then the expression is merely a definition of the remaining one and does not imply anything about the structure or transformations of the material world. Thus, for example, in pV = RT, pressure (p), volume (V) and temperature (T) are all defined independently of each other and (knowing the value of the constant R) the expression is an equation which contains an (approximate) law of nature.
Among the considerations involved in discussing laws of nature, causality plays a major part and it is worth discussing, however briefly, how causality can be viewed from the point of view of the philosophy and interpretation of science. The classic empiricist/positivist view of causality is contained in Hume's account, quoted with approval by Kant¹²:
Necessary connection, then, cannot be observed, nor can its existence be logically derived from what is observed. (my emphasis)
Truly spoken from the depth of the philosopher's armchair by someone who has never traced a fault in a complex piece of machinery or searched for a bug in an iterative computer program. Necessary connection (causality) is inaccessible to philosophy because it is not a question of precision or clarity of expression but a question of experimental interaction with the real world. No amount of passive observation or logical deduction can establish a causal connection between events. The mere fact that the 12.15 express to Glasgow always precedes the 14.25 train to Bristol says nothing about whether or not a trip to Glasgow causes a journey to Bristol; one must see what happens when one prevents the Glasgow train from leaving. I shall remark from time to time on the overwhelmingly passive view of much philosophy; without an engagement with the real world, philosophy is impotent.¹³
¹² Kant famously said that "reading Hume woke me from my dogmatic slumbers"; one wonders, in view of this quote, about the depth of Kant's dogmatic slumbers. But this is unkind; Kant must have been the first one to see that philosophy must ultimately split into two: a linguistic cul de sac and natural science.
¹³ A point on which Marx and Wittgenstein were in complete agreement, although their expressions of this view and their opinions about it are characteristically different. Marx enthusiastically advocates engagement with the material world in the Theses on Feuerbach while Wittgenstein makes the wistful statement that "philosophy leaves the world unchanged".
1.8. Foundations
From very early in the history of Schrodinger's mechanics there have been works which aim to provide the mathematical or philosophical foundations of quantum theory. In view of what I am to say later in this work it is useful to think, however briefly, about the sort of views expressed in these works. I have already commented on the unfortunate nature of the "foundations" metaphor but it is widely used in the philosophy of science community to mean the "essential underpinning" in the sense of a building's foundations and it is in this sense that I want to comment on "foundations" here. Like many mature parts of physical science, Schrodinger's mechanics is expressed in mathematical terms and uses many of the standard conceptual structures of classical analysis, in addition to all the usual structures implied by classical logic (algebra, inference, etc.). These structures are, of course, used in Schrodinger's mechanics in exactly the same way as they are used in any science which is mathematically formulated, whatever its area of applicability and its level of treatment (however fundamental or approximate it claims to be); they are part of the (current) mode of articulation of the theory.¹⁴ The foundations of a distinct part of physical science, whether mathematically expressed or not, cannot be mathematical, since:
• as just noted, the same mathematical structures may be used in a wide variety of unrelated branches of science,
• the basis of any science is not the rules which are used to manipulate structures representing objects or processes in the real world but the more fundamental assumptions which are made about the nature of the represented objects and of the autonomous processes they undergo.
Not all sciences are mathematically articulated, at least when first formulated, and they may be quite properly and precisely expressed in ordinary language.
Perhaps the most cogent and wide-ranging of such theories is Darwin's original formulation of the Theory of Natural Selection. No-one would claim¹⁵ that theories like this have "verbal foundations" in the same sense that Schrodinger's mechanics is sometimes claimed to have mathematical foundations. If we press the "foundations" metaphor a little, its claims begin to look a little thin. Mathematical and philosophical commentators, in claiming to discover or expose the foundations of a part of science would, I am sure, be viewed askance by the creators of those theories in much the same way that, say, Christopher Wren would be surprised by an announcement by an architectural critic that he had discovered the foundations of St. Paul's Cathedral. Wren was never in any doubt about what and where these foundations were; indeed it was he and his predecessors who developed the methods for establishing foundations of large and imposing edifices. Likewise, Schrodinger¹⁶ would, I think, have been amused by claims that, until the mathematical and philosophical critics had done their work, his theory lacked foundations. Schrodinger knew very well that the foundations for his theory were not the tools that he had used in constructing it or the language in which he chose to articulate its concepts. The foundations were (and are) accumulated experimental observations of the real world and the scientific culture epitomised in the seminal works of Newton, Lagrange, Hamilton and Jacobi. A dozen pages of the theory of operators on linear spaces are hardly comparable as the foundations of the most successful theory of matter yet developed.¹⁷
Historically, branches of science do not have mathematical and philosophical foundations but are themselves the driving force for (or the foundation of) the development and application of techniques of mathematics and the formation and extension of philosophical ideas. This is the crucial point for the distinction between a materialist ("realist") view of the world and an idealist point of view.
¹⁴ There is a beautiful example in V. I. Arnold's excellent text Mathematical Methods of Classical Mechanics (Springer-Verlag, 1978) where the author says (p. 163): "Hamiltonian mechanics cannot be understood without differential forms". One is bound to wonder, therefore, how Hamilton himself managed in the nineteenth century without this twentieth-century mathematical tool.
¹⁵ Except, perhaps, in the gormless prattle of the post-moderns.
One is bound to ask, however naive it may seem, if an edifice of any kind, physical or logical, can be constructed without a knowledge of the location and nature of its foundations? Perhaps in the spirit of William of Ockham one should inquire what would happen to Schrodinger's mechanics, if the alleged foundations were summarily removed.
¹⁶ Who was notoriously skeptical of taking the work of scientific philosophers seriously.
¹⁷ See Foundations of Physics by M. Bunge (Springer Tracts in Natural Philosophy, Vol. 10, 1967) for an attempt to present the foundations of physical theories in a balanced way; Bunge's first chapter (on logic, mathematics and philosophy) is called, significantly, "Toolbox" not "Foundations".
1.9. Axioms
The most effective way to study an existing body of knowledge, particularly if that body is mathematically articulated, is to find a set of axioms from which the whole corpus may be derived by the rules of logic and mathematics. In this way, inconsistencies, redundancies and straightforward errors of reasoning may be isolated and eradicated. This kind of technical enquiry has a very important place in the study of the theories of physics in particular. However, I shall not use this method here for reasons which might seem, at first sight, a little perverse. I prefer to stress the basic physical law which generates Schrodinger's mechanics, rather than to axiomatise a structure which is capable of being abstracted from the ramifications of that basic law.
Extant axiom systems for quantum theory are of varying degrees of formality, from, for example, Ludwig's two-volume system¹⁸ to informal systems (where the axioms are usually called postulates) found in many graduate-level texts. Most of these systems contain wrong prescriptions¹⁹ for the generation of the most important differential operators in Schrodinger's mechanics and many of them, in an attempt at generality, contain what I consider to be²⁰ false equivalences amongst the properties of some operators in Schrodinger's mechanics. However, these imperfections are not the main reason for my reluctance to search for an axiom system for quantum theory; the main reason is the feeling that axiom systems in physical science are a tool of taxonomy rather than of science. Let me try to explain by means of an analogy. In Biology there is always a tension between the desire (by taxonomers, mainly) for a watertight classification of (say) mammals and the need (by evolutionists) to show that such a scheme specifically excludes the overarching property of mammals; that is, species evolve.
I do not share the desire of taxonomers and of many mathematicians to expose the Platonic forms lying behind the imperfect corporeal representations which we experience; I am, however, only too pleased to share in the invention of abstract structures which have some of the properties of real objects and processes. In a word, axiom systems are abstracted from developing theories just as the classification of species presents a provisional episode in the history of mammalian development.
The main difficulty with the axiomatic approach from the point of view of this work is that the interpretation of the symbolism must also be axiomatised; there have to be axioms of interpretation rather than the development of interpretation from the history and applications of the theory. It brings to mind Dieudonne's (an arch axiomatiser and Bourbakist) famous book on analysis in which he says in his 1960 preface:
This has also as a consequence the necessity of a strict adherence to axiomatic methods, ... a necessity which we have emphasised by deliberately abstaining from introducing any diagram in the book. ("Foundations of Modern Analysis", J. Dieudonne (Academic, 1960))
This may be a useful logical exercise, but is it a way to understand analysis? The cat is let out of the bag in the preface to the enlarged and corrected printing of 1969 where we find:
The only things assumed at the outset are the rules of logic and the useful properties of the natural numbers.... Nevertheless, this treatise... is not suitable for students who have not yet covered the first two years of an undergraduate honours in mathematics.
Just so.
¹⁸ G. Ludwig, An Axiomatic Basis for Quantum Mechanics, Vols. I & II (Springer-Verlag, 1985).
¹⁹ See, for example, G. Ludwig, Foundations of Quantum Mechanics I & II (Springer-Verlag, 1985), Vol. II p. 50, where, in addition to giving a wrong prescription for generating the Hamiltonian, it is generously asserted that Schrodinger "guessed" the correct form "with remarkable intuition". I leave the reader to judge by reading Appendix 8.A whether or not Schrodinger guessed the correct form.
²⁰ See Appendix 11.A.
1.10. An Interpreted Theory
In this work I shall attempt to present a completely interpreted theory of Schrodinger's mechanics in the sense that I shall try to give a physical (materialist) interpretation to every major symbol which occurs: functions, operators and the like. The theory and the interpretation will be based on a general dynamical law (due, of course, to Schrodinger), a theory of probability (due to Kolmogorov) and articulated with ordinary language, logic and mathematics. The value and validity of the theory should not be judged simply by the agreement of a few numbers with experimental results; this is far too modest a requirement. I hope that the theory will be judged by its coherence and its interpretation of its area of applicability; the sub-atomic domain. I simply assume that this sub-atomic world exists independently, without my permission. I shall not be making any comments on the relevance of Schrodinger's mechanics to observers, to consciousness, to minds or to God.
PART 2
Probabilities
The interpretation of the modern (measure-theoretical) theory of probability is at odds with the everyday meaning(s) of the word probability. The material presented stresses (a) that Kolmogorov's theory of probability is the only one which can be used in the context of quantitative theories and (b) that probabilities are the relative measures of sets. Statistical methods are related to probabilities insofar as they are ways of experimentally determining these measures.
Chapter 2
Simple Probabilities
After some elementary considerations about the relationship between the "colloquial" and "mathematical" use of some common mathematical terminology, the idea of probability is introduced, not axiomatically at this stage, but descriptively. The relationship between probability and statistics is clarified by use of the simplest and most familiar example; dice. The generation of probability distributions for systems which are entirely deterministic is discussed and their indispensability is emphasised. Some of the difficulties which will occur in the interpretation of Schrodinger's mechanics are briefly visited.
Contents
2.1. Colloquial and Mathematical Terminology
2.2. Probabilities for Finite Systems
2.2.1. An Example: The Faces of a Cube
2.2.2. Dice: Statistical Methods of Measure
2.2.3. Loaded Dice: Statistical Methods of Measure
2.2.4. Standard Dice and Conservation Laws
2.3. Probability and Statistics
2.3.1. An Extreme Example
2.4. Probabilities in Deterministic Systems
2.5. The Referent of Probabilities and Measurement
2.5.1. Single System or Ensemble?
2.5.2. The Collapse of the Distribution
2.5.3. Hidden Variables
2.6. Preliminary Summary
2.1. Colloquial and Mathematical Terminology
We are all familiar with the fact that a considerable number of words which have common, everyday meanings are used in more specialised, in particular more precise, ways in science and mathematics. Anyone familiar with elementary chemistry and mathematics can think of four or five different specialised uses to which the word "normal" can be put.¹ There are, however, some terms for which the mathematical or scientific usage is quite close to the everyday, colloquial usage, and this specialised usage seems perverse in the sense that it is close to the colloquial usage and yet gives a completely different "feel" from the ordinary, conversational use. The common use of the phrase "going off at a tangent" implies that there are several possible tangents to a curve, and the whole sense of the familiar phrase is to imply that one could take several possible (inappropriate?) lines of thought from the one under consideration. Yet a mathematician will insist that there is only one tangent to a curve at any point on the curve. Similarly, if one says that x is "derivative" of y in ordinary usage, this tends to mean that x can be obtained in some way from y; thus a musical composition may be derivative of Debussy, a poem derivative of Larkin, etc. So, one might naively expect that mathematicians would consider, for example, $47x^4$ or $3x^6 - 4x^2$ to be derivatives of $x^2$ since, if $x^2$ is known, then the other expressions can be evaluated. But as we know, in the differential calculus, the derivative of $x^2$ is $2x$. The mathematical definition flies in the face of the colloquial usage; there is only one derivative and its value patently cannot be obtained uniquely from $x^2$ (knowing that $x^2$ is 9 gives two possible values for $2x$: $\pm 6$).

Probability is the worst possible case of this confusion between "colloquial" and "mathematical" usage because it is only in the last 70 years that mathematicians have been able to give an unambiguous meaning to the term and a method of evaluating and manipulating probabilities whose meaning and interpretation is precise enough for scientific use.²

¹ A standard concentration of solutions, standard pressure and temperature of a gas, a perpendicular to a curve or surface, a linear operator with special properties, etc.
² Significantly enough, for the purposes of this book, Kolmogorov's axiomatic development of Probability Theory (1930) came after Born's probability interpretation of Schrödinger's mechanics (1926-7).
We are all familiar with the various ways in which the term probability is used in everyday life:

• A given football team will probably win the World Cup.
• It will probably rain tomorrow or even, in the weather forecast, there is a 20% probability of a heavy shower at Wimbledon on Saturday.
• The probability of "heads" appearing on the toss of a coin is 1/2.
• The probability of throwing two "fair dice" and obtaining two sixes is 1/36.

and so on. All of these statements are completely comprehensible and make sense (convey real information) and their use is just as acceptable as the phrase "going off at a tangent" or "The Rolling Stones' music is derivative of the work of McKinley Morganfield".³ But none of them uses "probability" in the sense that it is used in the mathematical theory of probability. That this is so is more obvious for the first two statements than it is for the last two. In ordinary usage, where the meaning of probability is clear from the usage and context, it may imply "relative frequency", "reasonable expectation", "past experience", "a hunch", etc. but none of these familiar and frequently contradictory usages⁴ can be made precise and quantitative enough for scientific use. As usual in this type of situation, we must extract what is quantitative and essential from common usage and, no matter how far the resulting definition seems to be from that common usage, show how the quantitative use includes all the idiomatic uses; a posteriori, if necessary.
³ Muddy Waters.
⁴ Suppose that, since records began, it has never rained on June 17th in Sheffield and yet this year it has rained in north west Scotland on the 15th and in Manchester on the 16th and the cold front advances inexorably south-eastward; what is the probability that it will rain in Sheffield on June 17th this year?

2.2. Probabilities for Finite Systems

In the modern mathematical theory, probabilities simply involve a comparison of the numbers of members contained in various subsets of a given set. In those cases where the sets contain a finite number of members, the number of members may be obtained by ordinary counting. When the sets contain an infinite number of members, simple counting has to be replaced by a suitable measuring process, but the principle is exactly the same. It is conventional to divide the numbers of members in each subset by the total number of members in the whole set so that the probabilities obtained in this way sum to unity.

Before attempting to make this idea precise by mathematical definition, it is worth noting:

• The idea of "chance" plays no part in the definition of probability; probabilities are ratios of the measures of subsets of a given set. If we know how to measure (count the members of) these sets we can calculate the probabilities uniquely and exactly.
• This choice shuts out many of the familiar ideas which fall into the legitimate colloquial use of the word "probability". It specifically excludes the application of the idea of probability to anything which cannot be counted or measured. Thus we will not be allowed to use an expression like "this statement is probably true" or "it will probably rain tomorrow" and the like. Just as we continue to use the words "normal" and "derivative" with their ordinary, conversational meanings, we can continue to use "probability" in its colloquial sense, but not in the context of the mathematical theory of probability.

There is, in fact, no dispute that this is the mathematical theory of probability and, to mathematicians perhaps, probability is nothing more than a part of or an application of measure (integration) theory. Scientists concerned with the interpretation of probabilities seem also to regard the mathematical theory of probability in this way: as just an algorithm or "black box" which simply satisfies the requirements of mathematical rigour and that is all. This gives them free rein to impose their own interpretation on the formal calculus, whether or not this interpretation involves the measures of sets.
Generally speaking, the measure⁵ theory of probability plays no role in the interpretation of probabilities used in physical theories. Scientists seem dissatisfied with a definition of probability which does not involve the idea of randomness or chance in some way. This unfortunate dichotomy in the use of probability in science is compounded by a legacy from positivism and instrumentalism: a tendency to define physical quantities in terms of the experimental procedures used to measure them. This leads, as we shall see, to a confusion between probability and statistics and to an increasingly subjective interpretation of probability, particularly in its applications in quantum theory.

In this work I take the view that Kolmogorov is right and that probabilities are indeed relative measures of sets and that statistical measurements (or verifications) of probabilities are nothing more or less than approximations to these measures obtained by experimental means. This view is established first by some very simple and familiar examples.

⁵ Throughout my discussion of probabilities and their experimental verification, I shall be in danger of tripping myself up over the use of the word "measure". I shall want to speak of the experimental "measurement" of probabilities (using "measurement" in its everyday, laboratory sense) and of probabilities as the "measure" of sets (meaning mathematical measure, counting or integration). I hope the reader will bear with me on this if I fail to make the proper distinction.

2.2.1. An Example: The Faces of a Cube
Consider a perfect cube whose faces are numbered so that we can distinguish amongst them and consider the whole set of six faces as our basic set and the subsets as the various possible collections of the numbered faces. The subsets of special interest are the subsets containing a single face; six of them. All these sets have a finite number of members so we can measure them simply by counting their members, and in particular:

• The probability that a face of the cube be numbered 5 is

$$\frac{\text{The number of cube faces numbered 5}}{\text{The total number of faces of the cube}} = \frac{1}{6}$$

• The probability that the number on the face of a cube is even is

$$\frac{\text{The number of cube faces numbered even (2, 4, 6)}}{\text{The total number of faces of the cube}} = \frac{3}{6}$$

and so on in the familiar elementary example. However, it does not take much thought to realise that these conclusions would also be true if the object were not a cube but any hexahedron, regular or not. This does not affect our calculation of probabilities but casts doubt on the possibility of the experimental verification of these probabilities if the object is incompletely specified. This is easily rectified either by simply insisting that the object be a cube or, perhaps better, using the measure of "area of a side" rather than simple counting. If now the measure of each side is to be the same (A, say), the object must be a cube and we can replace the above calculations by:

• The probability that a face of the cube be numbered 5 is

$$\frac{\text{The area of cube faces numbered 5}}{\text{The total area of faces of the cube}} = \frac{1A}{6A} = \frac{1}{6}$$

• The probability that the number on the face of a cube is even is

$$\frac{\text{The area of cube faces numbered even (2, 4, 6)}}{\text{The total area of faces of the cube}} = \frac{3A}{6A} = \frac{3}{6}$$
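As a minimal sketch (not part of the original argument), the two calculations above can be reproduced with exact fractions. The function name `relative_measure`, the cube of side 2 and the 1 x 2 x 3 cuboid are assumptions introduced purely for illustration; the cuboid shows that counting gives 1/6 for any hexahedron whereas the area measure does so only when the faces have equal area.

```python
from fractions import Fraction

def relative_measure(measures, selected):
    """Probability of a subset: measure of the subset / measure of the whole set."""
    total = sum(measures.values())
    return Fraction(sum(measures[face] for face in selected), total)

faces = [1, 2, 3, 4, 5, 6]

# Counting measure: every face has measure 1, whatever the shape of the hexahedron.
counting = {face: 1 for face in faces}

# Area measure for a cube of side 2 (hypothetical size): every face has area 4.
cube_area = {face: 4 for face in faces}

# Area measure for a hypothetical 1 x 2 x 3 cuboid whose opposite faces are
# numbered (1, 6), (2, 5) and (3, 4), with areas 2, 3 and 6 respectively.
cuboid_area = {1: 2, 6: 2, 2: 3, 5: 3, 3: 6, 4: 6}

p5_count = relative_measure(counting, [5])
p_even_count = relative_measure(counting, [2, 4, 6])
p5_cube = relative_measure(cube_area, [5])
p5_cuboid = relative_measure(cuboid_area, [5])

print(p5_count, p_even_count, p5_cube, p5_cuboid)
```

No random numbers appear anywhere in this sketch, which is the point of the example: these probabilities are ratios of measures and nothing more.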
These probabilities may, of course, be verified simply by constructing a cube and carrying out the counting or area measurements and comparing the results. No ideas or experiments involving chance are involved in either the theory or this experimental verification of the theoretical numbers. Probabilities are perfectly definite numbers whose values do not involve chance and may, under some circumstances, be measured experimentally directly with no use of chance.

But how do these considerations apply to the tossing of dice, the results of which, as we quickly discover, are not reproducible? How are we to relate our theoretical probabilities to the frequency ratios of "face up" results of throws of material cubes with numbered faces? These throws do involve chance and, in certain special cases, their frequency ratios approach the probabilities with increasing accuracy as the number of experiments increases. If we wish to calculate the probability of (for example) a material cube falling onto a horizontal surface with the face numbered 5 on top, we would, according to the above definition of probability, have to find a way of defining a measure for this throw and a way of measuring all the other possibilities. But this could not be simple counting or an area calculation; it is a problem in Newtonian mechanics of some considerable complexity, depending at least on the force of the throw, the height of the throw and the mass density of the material cube. Equally important, if the results of these throws could be calculated they could be made reproducible and so one would obtain the same side face up every time.

What is needed is an understanding of why the frequency ratios of "face up" results of throws of a die approach the number which we have calculated for the probability that the face of a cube be a 5 (say). In general, why are the ratios of measures of certain sets related to mechanical experiments with concrete realisations (dice) of the abstract quantity (regular hexahedron, cube) used in the mathematical calculation? Or, as Bridgman⁶ says:

How can individual events, admittedly independent from one another, combine into regular aggregates unless there is a factor of control over their combination? But what kind of control can there be over independent events?
2.2.2. Dice: Statistical Methods of Measure

Having looked at what is meant by the term probability and noted that probabilities are purely theoretical quantities referring to abstract objects, it is time to see how probabilities relate to the real world of experiments on concrete systems.⁷ Since probabilities are defined and calculated in terms of measure (counting, integral, quadrature) we must expect that any experimental measurement or verification of probabilities must necessarily involve implicit or explicit approximate integration over a set. It is to be expected, therefore, that verifications of probabilities will involve repeated measurements of properties of members of a set of physical objects.

Sets may be measured by two general classes of method:

• Finite sets may be counted and infinite point sets may be measured by analytical integration (quadrature) methods yielding lengths, areas, volumes and their higher-dimensional analogues.
• Infinite point sets may be measured approximately by numerical quadrature methods, all of which involve evaluating a function of the set to be measured at various points within the set and forming a (possibly weighted) sum of these values.

⁶ "The Logic of Modern Physics".
⁷ As predicted, I am hoist by my own petard here; "measure" is being used in two different ways: the mathematical measure meaning "integral" and the everyday measure meaning "obtain a numerical value of"!
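The two classes of method can be compared in a small sketch. The integrand $x^2$ on the interval [0, 1] and the choice of Simpson's rule are assumptions made only for illustration; any weighted-sum quadrature rule would serve equally well.

```python
def f(x):
    # Hypothetical integrand chosen for illustration.
    return x * x

a, b = 0.0, 1.0

# Analytical quadrature: the integral of x^2 over [a, b] is (b^3 - a^3) / 3.
exact = (b**3 - a**3) / 3.0

# Numerical quadrature (Simpson's rule): a weighted sum of three function values.
points = [a, (a + b) / 2.0, b]
weights = [(b - a) / 6.0, 4.0 * (b - a) / 6.0, (b - a) / 6.0]
simpson = sum(w * f(x) for w, x in zip(weights, points))

print(exact, simpson)
```

With a known, smooth integrand three chosen points already suffice here; the difficulty considered next arises when the points cannot be chosen and the form of the function is not known.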
If the mathematical form of the measure function is known, its value can be calculated at chosen points and numerical methods can be very accurate, using only a small number of values of the function. If, however, the analytical form of the function is unknown (it may be tabulated from experimental results for example), the measure is more difficult to obtain accurately. The worst possible case is when the functional form of the integrand is unknown and the domain and range are also not precisely known. It would seem at first sight that such quadratures would be impossible to obtain. But there are methods of obtaining approximate numerical quadratures of such functions. The very simplest of the numerical methods of measuring a point set is the so-called Monte Carlo method; one simply takes whatever values of the function are available and forms the average of these values multiplied by the length of the interval:

$$\int_a^b f(x)\,dx \approx \frac{b-a}{N}\sum_{i=1}^{N} f(r_i) = \sum_{i=1}^{N} \frac{b-a}{N}\, f(r_i) \qquad (2.2.1)$$

where $r_i$ are the points at which the function is available. This procedure may be visualised by:

• Replacing the area under the curve by a rectangle whose height is the average of the known function values and whose width is the length of the interval
• Replacing the area by a set of N vertical strips of equal width $(b - a)/N$ and height $f(r_i)$

Clearly, this method can be ludicrously inaccurate since the points $r_i$ may be entirely unrepresentative of the whole interval [a, b]. However, suppose that the points $r_i$ occur at random throughout the interval [a, b], that is, they are equally likely to be anywhere in the domain of the function f. In this particular case there may be a chance that, if enough random points are used, the value of the approximate measure may converge to an acceptable value.⁸ The interval over which the quadrature is being estimated is taken to be given by the extremities of the set of random points, again emphasising the assumption that a (large) set of random points will be representative enough of the domain and range of the measured function.

⁸ This is the source of the name Monte Carlo.
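Equation (2.2.1) can be tried out directly. The sketch below is illustrative only: the integrand $x^2$, the interval [0, 1], the seed and the function name `monte_carlo_quadrature` are all assumptions, and the end points a and b are taken as known rather than estimated from the extremities of the random points.

```python
import random

def monte_carlo_quadrature(f, a, b, n, rng):
    """Estimate the integral of f over [a, b] as (b - a)/n times the sum of f(r_i)."""
    points = [rng.uniform(a, b) for _ in range(n)]  # the random points r_i
    return (b - a) / n * sum(f(r) for r in points)

def f(x):
    # Hypothetical integrand; its exact integral over [0, 1] is 1/3.
    return x * x

rng = random.Random(0)  # fixed seed so that the run is repeatable
estimate_small = monte_carlo_quadrature(f, 0.0, 1.0, 10, rng)
estimate_large = monte_carlo_quadrature(f, 0.0, 1.0, 100_000, rng)
exact = 1.0 / 3.0

print(estimate_small, estimate_large, exact)
```

With only ten points the estimate may be poor; with many random points it should settle close to the exact value, which is all that the argument in the text requires.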
Now we can see a way of obtaining experimental verification of our calculated probability that, for example, the side of a cube be numbered 5. We can take actual, existing (concrete) cubes and perform some random experiments on them which will experimentally evaluate the function "what is the number of a side of a cube?" or "is the number of a side of a cube 5?" and, most importantly, generate a random set of values of this function. These numbers can then be inserted into the above Monte Carlo formula to obtain approximate measures whose ratios should approximate to the relevant probabilities. If M[ ] is the measure functional and N is the number of experiments, the measure of the set {The number of a side is 5} is

$$M[5] \approx \sum_{i=1}^{N} \delta(r_i - 5)$$

and the measure of the set {The total number of experiments} is

$$M[\text{Total}] \approx \sum_{i=1}^{N} 1$$

so that a numerical approximation to the probability is

$$P(5) \approx \frac{M[5]}{M[\text{Total}]}$$

which should approach 1/6 for large N if the Monte Carlo method of approximate quadrature is good enough. Here $r_i$ is the number obtained in the i-th experiment, and $\delta(r_i - 5)$ is taken to be 1 when $r_i = 5$ and 0 otherwise. This is nothing more than a theoretical justification of the familiar method of using frequency ratios to get experimental estimates of probabilities. Let's look at two possible practical methods:

1. Dice-throwing: construct homogeneous material cubes, number their faces⁹ and arrange to have them thrown and spun from a height of not less than ten times their dimension in a gravitational field onto a solid horizontal surface and note which numbered face is uppermost. The method of tossing is assumed to guarantee the required randomness.
2. An electronic method: fix a homogeneous material cube with numbered sides and arrange for its sides to be randomly illuminated and an image of the illuminated side to be projected onto a remote screen; the number on the projected image is noted. Here, the randomness is generated by a suitable algorithm, logical or physical.

A long run of either type of measurement generates frequency ratios

$$\frac{\text{The number of cube faces numbered 5}}{\text{The total number of faces of the cube}} \approx \frac{1}{6}$$

Thus these ratios constitute experimental verifications (by the Monte Carlo approximate quadrature method) of the theoretical result that the probability (ratio of measures of sets) that a side of a numbered cube be numbered 5 is 1/6.

⁹ For the moment, I ignore the fact that real dice have their faces numbered, not at random but in a particular arrangement; this point does not affect the argument here and will be taken up later.

2.2.3. Loaded Dice: Statistical Methods of Measure
Now suppose that we remove one of the specifications of the concrete cubes used in the above experiment and repeat the whole test. In place of the homogeneous cube with numbered faces we use a concrete cube which is not of homogeneous mass density: a "loaded die". Our definition of the probability that a face be numbered 5 is, of course, unchanged because the abstract cube used in the probability theory does not have any mass density, homogeneous or otherwise. But the experimental measurements of the probability will only give acceptable results in the second of the two experiments since the concrete loaded die will not (except by coincidence) give a set of frequency ratios which approximate to the probabilities of the abstract cube. The values of the functional M[5] will be inaccurate in this case because the values of the function will not be equally likely to occur over the whole domain of the function. The second experiment using illumination of the cube faces at random is unaffected by any changes in the density of the cube and will generate frequency ratios which, in long runs, approximate to the theoretical probabilities exactly as before.

What makes a "material cube with numbered faces" into a die is:

1. It is thrown and spun in a gravitational field.
2. It falls onto a horizontal surface.
3. It must be allowed to fall a certain minimum distance compared to its own dimensions.
4. It must be of homogeneous mass density.
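The contrast between the two experimental methods can be sketched numerically. Everything in this sketch is an assumption made for illustration: a throw is modelled simply as a weighted random choice of face (no mechanics is attempted), the loaded die is given hypothetical weights favouring the face numbered 6, and the seed and number of experiments are arbitrary.

```python
import random

def frequency_ratio(results, face):
    """M[face] / M[Total]: results equal to `face` over the number of experiments."""
    return sum(1 for r in results if r == face) / len(results)

rng = random.Random(1)  # fixed seed so that the run is repeatable
n = 60_000
faces = [1, 2, 3, 4, 5, 6]

# Throwing a homogeneous cube: modelled (as an assumption) by equal weights.
fair_throws = rng.choices(faces, weights=[1, 1, 1, 1, 1, 1], k=n)

# Throwing a loaded cube: hypothetical weights favouring the face numbered 6.
loaded_throws = rng.choices(faces, weights=[1, 1, 1, 1, 1, 3], k=n)

# Random illumination: the mass density plays no part, so even for the loaded
# cube the illuminated face is sampled uniformly.
illuminated = [rng.choice(faces) for _ in range(n)]

fair_5 = frequency_ratio(fair_throws, 5)
loaded_5 = frequency_ratio(loaded_throws, 5)
illuminated_5 = frequency_ratio(illuminated, 5)

print(fair_5, loaded_5, illuminated_5)
```

Under these assumptions the fair throws and the illumination method both give ratios near 1/6, while the loaded throws give a ratio near 1/8; the probability for the abstract cube is 1/6 throughout.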
Material cubes with numbered faces which fail any of these conditions are unsuitable concrete objects with which to measure (or verify) any calculations of the probability that the face of a cube be 5 (or any other such probability). It must be stressed that the calculated probabilities do not refer to throws of dice; they refer to the numbered sides of a cube. Any calculation of the probability that a throw of a die result in a 5 "face up" is far too complex even to be attempted. On the contrary, the throws are experiments used in a Monte Carlo measure to verify probabilities involving the sides of a cube. The fact that the probability of a side of a cube numbered 5 is 1/6 has no consequences whatsoever for a single throw of a die, loaded or fair. Only the relative frequencies of large numbers of throws of fair dice can have a role as an experimental verification of the probabilities.

The relationship between individual throws of a die and probability is put into sharp focus by considering the "measure" definition of the probabilities and the nature of the individual experiments. The probabilities tell us the relative sizes of integrals of a certain function over intervals of the variable on which that function depends; that is all. It is quite impossible in general that the values of a few integrals of any function can tell us anything about the value of that function at any particular point.

In the case of the cube, the emphasis and style of approach may be changed by using the idea of the "state" of an abstract object. We may choose the abstract object to be "a face of a cube" which we may think of as having 6 "states". In this case we can set up state functions associated with the six possible eigenvalues of the state operator, projection operators associated with each of the eigenstates and the whole machinery used in quantum mechanics. Since the problem involves a finite number of states, the state functions are δ functions and the procedure is of rather formal interest. This approach will be discussed qualitatively when some of the more formal aspects of probabilities are reviewed in Chapter 3, since it enables the concepts thought to be unique to quantum theory to be brought closer to those involved in other areas of probability theory.

2.2.4. Standard Dice and Conservation Laws
In the account of the relationship between the abstract object "a cube with numbered faces" and actual, existing, concrete cubes with numbered faces, the objects which I have called dice are not, in fact, sufficiently specified to be recognisable as the familiar white objects with spots on their faces. This is deliberate since all that is required for the above considerations to be true is that the material cubes have their faces numbered differently. The faces do not, as I noted at the time, have to be numbered in such a way that the sum of the numbers on opposite faces is 7, any more than they have to be made from any particular material, in order to serve as an experimental apparatus to verify probabilities concerning the abstract object "cube with numbered faces". As a matter of fact, it does not matter whether we count "face up" or "face down" as a result or, since the throws are assumed to be random, use a mixture of the two.

There are some interesting consequences which arise if we wish to restrict our experimental setup to recording the results of tosses of concrete cubes with numbered faces such that the sum of the numbers on opposite faces is actually 7; that is, we use "standard dice" in place of cubes with arbitrarily numbered faces. The abstract object corresponding to these concrete dice, i.e. abstracted from size, colour, composition, density, etc. (so long as that density is uniform), is now not simply a "cube with numbered sides" but such a cube with the additional constraint that the numbers on opposite faces are completely correlated by the requirement that they sum to 7. A knowledge of the number on one of the sides determines which of the other five possible numbers is on the opposite side.¹⁰ In this case there is, as physicists would say, a conservation theorem associated with the system.¹¹

¹⁰ In point of fact, using the standard dice this knowledge determines the numbers on all the remaining sides if one admits a "handedness" into the experiment.
¹¹ Some quantum physicists might even use eccentric terminology and say that this abstract cube is in a "singlet spin state" but we will take this up later.

Now let us set up an experiment which will reveal and illustrate the difference between what we may now call the "standard abstract die" (the image of all standard concrete dice) and the abstract "cube with numbered faces" with which we are by now, perhaps, overfamiliar. The experimental setup may seem a trifle eccentric. It is arranged to throw dice onto a horizontal glass table in a laboratory in Sheffield. Suitable video and transmission equipment is used so that an image of the "face up" side can be transmitted to one set of waiting quantum physicists in Mauritius and the "face down" image may be simultaneously transmitted to enthusiasts in Hawaii. The aim of the experiment is for the physicists in Mauritius to predict the results which are transmitted to Hawaii based on their own readings.

The first set of experiments uses our "die" of the first type: a cube with its sides numbered 1 through 6 in any old random arrangement. The experimenters at both sites look at the image on their screens, verify that a large number of runs does indeed generate the numbers 1 through 6 with approximately equal frequencies and all are satisfied with the experimental setup. However, no attempt by the Mauritian visitors to predict the Hawaiian results succeeds with a relative frequency better than 1/5. This is exactly what one would expect, of course; if the "die" faces are numbered in random arrangement then the opposite face will be one of the other five numbers at random.

The next set uses a "standard die" and, after the same initial satisfaction that both the "face up" numbers and the "face down" numbers are found with approximately equal frequencies in a long run of tosses, the observers in Mauritius are able to predict the results transmitted to Hawaii with 100% success; they simply have to use the conservation law that the sum of the two numbers must be 7 to predict from their observation of n that the flower-clad Hawaiian visitors must see (7 − n).

Finally, sets of experiments are performed in which several dice are used and the "face up" image of an arbitrary die is sent to Mauritius and the "face down" image of an arbitrarily chosen die is sent to Hawaii. In this case the result is qualitatively identical to the first case whichever type of die is used; no prediction of the other's result can be made which is better than that expected on purely statistical grounds. In fact it makes no difference whether two dice of each type are used or one of each.

These results are trivially obvious of course and the whole exercise is nothing more than an excuse to combine business with pleasure on the part of the experimental physicists.
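The three sets of experiments can be imitated in a short simulation. This is only a sketch under stated assumptions: the unknown random numbering of the first cube is modelled by letting the opposite face be equally likely to be any of the other five numbers, the Mauritian strategy in that case is taken to be a random guess among those five, and all function names, the seed and the number of throws are hypothetical.

```python
import random

rng = random.Random(2)  # fixed seed so that the run is repeatable
n = 60_000
faces = [1, 2, 3, 4, 5, 6]

def success_rate(pairs, predict):
    """Fraction of throws in which the prediction made from "face up" equals "face down"."""
    return sum(1 for up, down in pairs if predict(up) == down) / len(pairs)

def random_cube_throw():
    # First set: the opposite face is modelled as any of the other five numbers.
    up = rng.choice(faces)
    down = rng.choice([f for f in faces if f != up])
    return up, down

def standard_die_throw():
    # Second set: opposite faces of a standard die sum to 7.
    up = rng.choice(faces)
    return up, 7 - up

def two_dice_throw():
    # Third set: the two images come from different, independently thrown dice.
    return rng.choice(faces), rng.choice(faces)

def guess_other(up):
    # Without a conservation law: guess any number other than the one seen.
    return rng.choice([f for f in faces if f != up])

def conservation_law(up):
    return 7 - up

rate_random_cube = success_rate([random_cube_throw() for _ in range(n)], guess_other)
rate_standard = success_rate([standard_die_throw() for _ in range(n)], conservation_law)
rate_two_dice = success_rate([two_dice_throw() for _ in range(n)], conservation_law)

print(rate_random_cube, rate_standard, rate_two_dice)
```

Under these assumptions the success rates come out close to 1/5, exactly 1 and close to 1/6 respectively: the conservation law helps only when both images are known to come from the same concrete die.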
The only point which emerges is the general rule:

In the very special circumstances obtaining when one can be sure that two (or more) experiments can be performed on a single concrete system for which the quantities being measured are related by a conservation law, the result of one measurement may be predicted from the result of the other.

This result is, of course, true whether one is working with deterministic or probabilistic systems. But this result has no consequences for probabilities since probabilities do not refer to concrete systems. No-one will be in the least surprised by this result since the whole thing hinges on the measurement of two quantities which are known to be quantitatively related. In general, however, if we are dealing with statistical measurements which we wish to compare with computed probabilities, we will be in the position that the sun-bathing physicists were in the third set of experiments; we will not (and should not) know whether or not two measurements of physical quantities which turn up at our apparatus at random are due to the same concrete system. Indeed, one of our very basic general assumptions was that probabilities can be experimentally verified equally well by many identical experiments on one system or many identical experiments on several systems (or a mixture of both). Further, I noted earlier that the luxury of a choice between the two possibilities is not usually available to us in experiments at the atomic and sub-atomic level; normally, we have to be satisfied by results which simply turn up and are recorded.

So, the physicists in Mauritius, on taking a reading of n (say) from the "face up" image of a concrete standard die, know that the face down side of that particular concrete die is (7 − n). But by transmitting this result to their colleagues in Hawaii they cannot predict the Hawaiian image of the "face down" side of a random throw of one of the other concrete dice unless they can guarantee that the image is of the same die. But use of this knowledge invalidates the experiment's qualification as a random test, which is absolutely crucial to the use of tests on concrete objects to verify probabilities. This might be simple at the level of macroscopic ivory cubes but to verify this rather simple result at the sub-atomic level requires an enormous amount of equipment, skill and expertise.

The ability to use a conservation law to predict the result of an experiment on a single concrete system from another measurement on that same system may be useful but it has nothing to do with probabilities; probabilities are relative measures of sets and experimental measurements of probabilities are approximations to these measures obtained by quadratures based on many random measurements on concrete systems. Above all, it is profoundly un-mysterious that one can predict the value of someone else's experiment on a single concrete system from a known conservation law independently of the distance between those experiments or the time interval between them.

2.3. Probability and Statistics
The term "statistics" has been used without saying what is meant by the term and how statistics relates to probability; it is time to clear this point up. We have already seen that probabilities are theoretical quantities; they are the ratios of measures of sets. But we have seen that it is possible to use approximate methods of quadrature to obtain approximations to probabilities. In particular, if the source of the values of the function to be measured in these approximation schemes is experiment — using concrete realisations of the abstract objects used to define and compute probabilities — then a connection is made between theory and experiment and we have a physical theory of the behaviour of (sets of) concrete objects. In designing experimental procedures for the measurement of probabilities (which must necessarily involve explicit or implicit quadratures) there are two general points to consider: • The concrete objects on which the experiments are performed must have the essential properties of the abstract object in the theory in the context of the particular experiment. In the example above, mass density homogeneity is essential if dice are to be thrown, but not if they are to be illuminated. • If the quadratures are to be performed numerically, then there must be adequate precautions taken to ensure that the experiments generate values of the integrand which sample the full interval over which the quadrature is to be performed. In the most common case, if the quadrature is to be Monte Carlo, then the randomness of the values must be guaranteed; a point which will be discussed later. In general, there have to be methods of treating the "raw" data to ensure that the implied quadratures are as meaningful and accurate as possible. The first of these is common to experimental verifications of all kinds of physical theory and is a question of good experimental design. 
The second is the domain of Statistics; statistics provides the mathematical techniques required in the design and analysis of methods for evaluating (among other things) probabilities experimentally using random tests. Although it is important to realise that statistical measurements are approximate (numerical) quadratures, it is obvious (from the explicit considerations of dice above) that such quadratures almost always turn out to be frequency ratios. Since frequency ratios are what is measured, there is no harm in referring to experimental measurements of probabilities as "frequency ratios" rather than the less familiar and much clumsier "ratios of approximate quadratures", provided that the full context is kept in mind and probabilities are not, under any circumstances, defined by or identified with frequency ratios.

Statistical methods are not the only way of getting experimental values of probabilities; as we have seen, in particularly simple cases one can explicitly measure the sets by counting. Usually, however, the statistical method of quadratures using random tests is the only feasible way of testing probability statements experimentally. An extreme example of a case in which the statistical method might be inappropriate is given below.

Although it is rather bad form to give an answer to what Bridgeman on page 31 clearly intended to be a rhetorical question, we can now see that there can be no question of "control" of independent events leading to measurements of random events generating good approximate probabilities. The reason why many independent events lead to frequency ratios which can be good experimental measurements of probabilities is simply that, if they are random and independent, this ensures that (if enough are taken) they are a representative sample with which to perform the numerical quadrature which is an approximation to the measures defining the probabilities.¹²

2.3.1. An Extreme Example
The population of mammals in an English meadow provides an elementary example of probabilities which are perfectly well defined but rather tricky to measure by statistical methods. Suppose that there are 8 cows, a horse, a pair of foxes, 9 rabbits, 48 fieldmice and a family (4, say) of weasels in a meadow; a total of 72 mammals. Let us use simple counting as the measure for the calculation of probabilities. So, the probability that a mammal in this meadow be a cow is 8/72 = 1/9 and the probability that a mammal be a weasel is just half of this: 4/72 = 1/18. Now, as anyone who has done natural history research in the field will tell you, the probability of finding that a (randomly selected) mammal is a cow in these circumstances would be very much larger than double the probability of finding that a mammal is a weasel. Weasels are very resourceful and secretive animals and I doubt whether any way of experimentally selecting mammals "at random" in the meadow would turn up any weasels at all. This unfortunate fact presents rather acute problems for zoologists and statisticians in their design of experimental procedures, but the relevant probabilities, which are ratios of measures of sets of the different types of mammal, are not affected at all by these practical considerations. Probabilities are ratios of measures of sets whether or not experimental techniques can be devised to verify them by statistical methods. Finally, let it be said that these probabilities are not impossible to measure, merely extremely difficult to measure by statistical methods. Since the measures concerned simply involve the counting of finite sets, we must resort to more drastic and inhumane measures; we must burn or flood the meadow and count the bodies and the survivors!

¹² There may be "improvements" to the random choice of points such as the ones used in atomic and molecular simulation calculations. Here the random points are generated by a numerical algorithm and are weighted by a Boltzmann factor in the quadrature and so it is possible to reject some of the random points on the grounds of the size of this weighting. But in measurements the points simply turn up at random and we may have no grounds on which to distinguish amongst them.
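The contrast drawn in the meadow example can be sketched in a few lines of Python. The census figures are those given in the text; the `DETECTABILITY` weights and the function names are purely hypothetical assumptions, introduced only to mimic the fact that some animals are far harder to find than others.

```python
from fractions import Fraction
import random

# Census of the meadow as given in the text (72 mammals in all).
MEADOW = {"cow": 8, "horse": 1, "fox": 2, "rabbit": 9, "fieldmouse": 48, "weasel": 4}

# ASSUMPTION (not from the text): illustrative chance that an animal of each
# kind is actually noticed by an observer sampling the meadow.
DETECTABILITY = {"cow": 1.0, "horse": 1.0, "fox": 0.2, "rabbit": 0.5,
                 "fieldmouse": 0.1, "weasel": 0.01}


def exact_probabilities(census):
    """Probabilities as ratios of counting measures of sets."""
    total = sum(census.values())
    return {kind: Fraction(count, total) for kind, count in census.items()}


def observed_frequency_ratios(census, detectability, n_samples, seed=0):
    """Frequency ratios from a detection-biased 'random' field survey."""
    rng = random.Random(seed)
    kinds = list(census)
    # Each kind turns up in proportion to its number times its detectability.
    weights = [census[kind] * detectability[kind] for kind in kinds]
    draws = rng.choices(kinds, weights=weights, k=n_samples)
    return {kind: draws.count(kind) / n_samples for kind in kinds}


exact = exact_probabilities(MEADOW)
observed = observed_frequency_ratios(MEADOW, DETECTABILITY, n_samples=10_000)
print("exact   :", {k: str(v) for k, v in exact.items()})
print("observed:", observed)
```

Under these assumed weights the survey's frequency ratios misrepresent the counting probabilities badly, while the exact values (1/9 for a cow, 1/18 for a weasel) are untouched by how hard the animals are to observe.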
2.4. Probabilities in Deterministic Systems
It has been repeatedly stressed that the idea of randomness or chance does not come into the definition or calculation of probabilities, and it is useful to illustrate this point by an example which is completely deterministic and which will have a bearing on the interpretation of Schrodinger's mechanics. The motion of the undamped Simple Harmonic Oscillator as exemplified by an ideal pendulum is completely soluble in both classical and quantum mechanics; it is a paradigm of a deterministic mechanical system. For a pendulum of length $\ell$ the angular displacement of the pendulum from the vertical ($\theta$) is given, as a function of time ($t$), by:

$$\theta(t) = \Theta \sin(\omega t + \gamma) \tag{2.4.2}$$

where $\Theta$ (a constant) is the maximum value of the displacement and $\gamma$ is the initial ($t = 0$) phase, which fixes the initial displacement $\theta(0) = \Theta \sin\gamma$. The angular velocity of the pendulum is simply the time derivative of this expression:

$$\dot{\theta}(t) = \Theta\omega \cos(\omega t + \gamma) \tag{2.4.3}$$

($\omega$ is given by $\sqrt{g/\ell}$, where $g$ is the acceleration due to gravity). The motion is, of course, cyclic and so it is simple to evaluate the probability that the angular displacement have any value from $-\Theta$ to $+\Theta$, and we can therefore calculate the probability distribution function for the angular displacement, which should be independent of time precisely because the motion is cyclic.¹³ The definition of this distribution function ($P(\theta)$, say) is that the integral

$$M(\theta_1, \theta_2) = \int_{\theta_1}^{\theta_2} P(\theta)\, d\theta \tag{2.4.4}$$

is the probability that the angular deflection of the pendulum lies in the region from $\theta_1$ to $\theta_2$.

Fig. 2.1. Probability Distribution for a Simple Pendulum. (Horizontal axis: angular deflection, marked at $-\Theta$, $-\Theta/2$, $0$, $\Theta/2$, $\Theta$.)

A simple calculation gives:

$$P(\theta) = \frac{1}{\pi\Theta\sqrt{1 - (\theta/\Theta)^2}} \tag{2.4.5}$$
and a graph of this function is given below; its form simply reflects the angular velocity of the pendulum. The faster the pendulum moves in a region, the less likely it is to be in that region, culminating in the greatest value at the two extreme turning points where the pendulum is momentarily stationary and a minimum as the pendulum passes through the vertical where the velocity is a maximum.¹⁴

¹³ The existence, at least in classical mechanics, of Poincaré's recurrence theorem puts us all on shaky ground here in the sense that, if all motion is cyclic, all probability distributions are time-independent on some time scale.

One might ask if this is not merely a mathematical exercise since the angular deflection and the velocity are precisely known for all times by reference to the above equations (2.4.2) and (2.4.3). While it is certainly true that one can calculate the precise position of the pendulum for any given $t$, what is equally obvious is that this information is of no use at all if one wishes to compare these calculations with experiment and is forced by circumstances (or by choice) to make measurements of the position of the pendulum at random, unpredictable times in order to verify the calculations. If, for whatever reason, one only has access to experimental measurements of the position of the pendulum at random times and wishes to compare these results with the theory of the physical system, then the only way is via the probability distribution function $P(\theta)$ given by equation (2.4.5). If a random measurement is made on any system it does not make any difference whether the system is completely determinate (i.e. we know the laws of its structure or evolution) or completely indeterminate (we have no idea of the relevant laws, if any); the result of this random measurement cannot be predicted. In such cases, at best, the relative results of many such random measurements may be predicted.

It is worth considering what the factors actually are which make for a set of random times in the case in hand. The problem is simplified by the fact that the probability distribution function is independent of time, depending only on the angle $\theta$.
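The time-independence just noted can be made explicit by a short derivation of the density (2.4.5) from the equation of motion (2.4.2). What follows is a sketch under one stated assumption: that the observation times are spread uniformly over a whole number of periods $T = 2\pi/\omega$, so that the phase $\phi = \omega t + \gamma$ (a symbol introduced here only for convenience) is uniformly distributed. The factor of 2 arises because the pendulum passes through each deflection twice per cycle.

```latex
% Sketch: time-average derivation of (2.4.5) from (2.4.2).
% Assumption: observation times uniform over whole periods T = 2\pi/\omega.
\begin{align*}
\theta &= \Theta\sin\phi, \qquad \phi = \omega t + \gamma
         \ \text{uniform on } [0, 2\pi),\\
|\dot\theta| &= \Theta\omega\,|\cos\phi| = \omega\sqrt{\Theta^{2}-\theta^{2}},\\
P(\theta)\,d\theta &= \frac{2\,dt}{T} = \frac{2\,d\theta}{T\,|\dot\theta|}
  = \frac{d\theta}{\pi\sqrt{\Theta^{2}-\theta^{2}}}
  = \frac{d\theta}{\pi\Theta\sqrt{1-(\theta/\Theta)^{2}}},\\
M(\theta_1,\theta_2) &= \int_{\theta_1}^{\theta_2} P(\theta)\,d\theta
  = \frac{1}{\pi}\left[\arcsin\frac{\theta_2}{\Theta}
    -\arcsin\frac{\theta_1}{\Theta}\right],
  \qquad M(-\Theta,\Theta) = 1 .
\end{align*}
```

The factor $1/\Theta$ is what normalises the density with respect to $\theta$ itself; in terms of the scaled deflection $x = \theta/\Theta$ the same density reads $1/(\pi\sqrt{1-x^{2}})$. Neither $t$ nor $\gamma$ survives in the result, which is why the distribution depends only on the angle.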
Although the distribution function is independent of time, the pendulum itself has a characteristic "cycle time" (the time taken for a complete swing of the pendulum) and although, in principle, random times could mean anything from nanoseconds to millennia, the experiment itself will normally suggest limitations on the choice of random times. There are some obvious limitations:

• If we make measurements of $\theta$ at random time intervals which are very much shorter than the cycle time we may well conclude that the pendulum is stationary unless a huge number of measurements were made. Looking at the moon every few milliseconds during one second might lead to a similar conclusion.
• If measurements are made at random time intervals greater than the cycle time the results should be more useful.
• Of course, the worst possible case would be measurements taken at random integral multiples of the cycle time; this "stroboscopic" case would definitely lead to the conclusion that the pendulum was stationary.

¹⁴ There are, of course, no zeroes in this distribution since they would imply infinite velocity.

These considerations have some relevance to atomic measurements of course. The results of these experiments are interpreted in exactly the same way as the experimental investigations into the numbered faces of a cube:

• The computed probabilities
$$M(\theta_1, \theta_2) = \int_{\theta_1}^{\theta_2} P(\theta)\, d\theta$$
refer to the abstract pendulum; they are the probabilities that the deflection of the pendulum be in the region $[\theta_1, \theta_2]$ (for $-\Theta < \theta_i < \Theta$).
• The frequency ratios (approximate quadrature ratios)
$$f(\theta_1, \theta_2) = \frac{N(\theta_1, \theta_2)}{N(-\Theta, \Theta)}$$
should approximate to the relevant $M(\theta_1, \theta_2)$ for large enough $N(\theta_1, \theta_2)$ (where $N(a, b)$ is the number of times that a real pendulum was found experimentally to have an angular deflection in the interval $[a, b]$).

This example and the example of the cube have been discussed in some detail since both typify the relationship between the calculation of probabilities and their verification (or not) by experimental methods. We have seen that, when we define a suitable measure functional, it is extremely simple to calculate the probability that a side of a cube be numbered 5 or the angle of deflection of a simple pendulum be within a given range. We shall also find that a more complicated calculation will yield the probability that the electron in a hydrogen atom be in a particular region of space. But the problems involved in the calculation of:

• The face-up side of a die being found experimentally to be 5
• The angle of deflection of a simple pendulum being found experimentally to be in a given range
• The electron of a hydrogen atom being found experimentally to be in a particular region of space

are of unimaginable complexity, depending, in the last two cases, on the nature of the experimental apparatus, the theory of operation of this apparatus, its accuracy and reliability, the competence of the operators etc., etc. Probabilities are the ratios of measures of sets; in these latter cases one only has an incomplete knowledge of what the sets actually are, let alone whether and how a suitable measure functional may be introduced so that they can be measured.¹⁵

In the case of the pendulum I have chosen to illustrate the role of the probability distribution in the context of measurements at random times. In reality, of course, we often do not have the choice between using the deterministic equation of motion and the probability distribution; the measurements we make are not chosen to be at random but the situation is the opposite: measurements are necessarily random, and we have to be satisfied with the information which simply turns up at random. In cases like this, comparison with experiment must be via a model which generates a probability distribution.
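The comparison between computed probabilities and frequency ratios for the abstract pendulum can be imitated numerically. The following is only a sketch: the function names, the numerical values of the amplitude, length and phase, and the use of a pseudo-random generator in place of genuinely random measurement times are all assumptions made for illustration. The exact probability uses the arcsine form of the integral of the density (2.4.5), normalised over $[-\Theta, \Theta]$.

```python
import math
import random


def deflection(t, big_theta, omega, gamma):
    """Deterministic deflection of the abstract pendulum, equation (2.4.2)."""
    return big_theta * math.sin(omega * t + gamma)


def exact_probability(theta1, theta2, big_theta):
    """M(theta1, theta2): integral of the normalised density over [theta1, theta2]."""
    return (math.asin(theta2 / big_theta) - math.asin(theta1 / big_theta)) / math.pi


def frequency_ratio(times, theta1, theta2, big_theta, omega, gamma):
    """N(theta1, theta2) / N(-Theta, Theta) for observations at the given times."""
    hits = sum(theta1 <= deflection(t, big_theta, omega, gamma) <= theta2
               for t in times)
    return hits / len(times)


# Illustrative values only (assumptions, not taken from the text).
BIG_THETA = 0.2                   # maximum deflection in radians
OMEGA = math.sqrt(9.81 / 1.0)     # sqrt(g / l) for a pendulum of length 1 m
GAMMA = 0.3                       # initial phase
PERIOD = 2 * math.pi / OMEGA      # the "cycle time"

rng = random.Random(1)
# Random times spread over many cycles: the useful case.
random_times = [rng.uniform(0.0, 1000 * PERIOD) for _ in range(200_000)]
# Random integral multiples of the cycle time: the "stroboscopic" case.
strobe_times = [rng.randrange(1, 1000) * PERIOD for _ in range(1000)]

exact = exact_probability(-0.1, 0.1, BIG_THETA)
measured = frequency_ratio(random_times, -0.1, 0.1, BIG_THETA, OMEGA, GAMMA)
strobed = {round(deflection(t, BIG_THETA, OMEGA, GAMMA), 6) for t in strobe_times}

print(f"exact M = {exact:.4f}, frequency ratio = {measured:.4f}")
print(f"distinct deflections seen stroboscopically: {len(strobed)}")
```

With times spread over many cycles the frequency ratio settles close to the computed probability, whereas the stroboscopic times only ever show a single deflection, as the third limitation listed above anticipates.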
2.5. The Referent of Probabilities and Measurement
I have taken considerable pains to stress that probabilities are theoretical quantities which, once the sets and the measure functional on those sets are chosen ("the model"), are capable of being calculated exactly and are perfectly definite (real) numbers which contain no reference to chance. In this respect they are analogous to any other theoretical quantities which are calculated using some model assumptions about the structure or behaviour of part of reality. One computes the orbit of a planet, say, initially as a two-body Kepler problem. The referents of probability calculations and of classical calculations of the Kepler problem are abstract objects:

• In the cases we have discussed above, the referents were the abstract (massless, colourless, immaterial) cube and the abstract (inextensible, undamped) pendulum.
• In the case of the Kepler problem, the referent is the relative motion of a set of two (point-mass, unperturbed, undamped) massive particles.

Any measurements which we might wish to make to confirm (or not) the predictions of these theoretical models will have to be made on actual concrete objects which have the properties of the abstract object plus many other incidental properties and disturbances. It is the task of experimental design to minimise or attempt to neutralise these, as we say, inessential effects in order that any experimental results may be realistically compared to the numbers obtained from the theoretical model. If the referent of the theoretical calculation were the concrete object the life of the experimental scientist would be much simpler.

¹⁵ The solution of the Schrodinger equation gives the energy levels of the abstract hydrogen atom, and the probability distributions for its abstract electron, not the design of a UV spectrometer or an X-ray diffractometer.

In the case of the pendulum there are two possibilities:

1. If some initial conditions are known for the motion of a concrete pendulum and one is able to take measurements of the subsequent position of the pendulum at known elapsed times, then the experimental measurements may be compared directly with the numbers obtained from equation (2.4.2 on page 41). "Directly" here means that the theoretical (real) number should be comparable to the experimental result which, typically, will be an element of the standard topology of the real numbers¹⁶ or, as typically reported, a result and a standard error.
2. If the initial conditions are not known and one is only able to note the position of the pendulum at random times, then equation (2.4.2) is of no use and one must use the whole sequence of random results to construct (rational) frequency ratios which are approximations to numerical measures of the theoretical probabilities obtained by suitable integrations of the probability distribution given by equation (2.4.5 on page 42).

Both of these methods are capable of providing experimental measurements which confirm the theory of the abstract pendulum. Both are subject to errors from the same causes: damping by material resistance, etc. The fact that experimental measurements, however painstaking, will deviate from the theoretical predictions emphasises the process of abstraction in forming a model of reality.
The referent of neither of the two equations, (2.4.2) (the deterministic model) and (2.4.5) (the probabilistic model), is an actual existing (concrete)¹⁷ pendulum. In both cases the referent is the abstract (idealised) pendulum whose only properties are its length and the field of force in which it swings. These remarks are not peculiar to pendulums;¹⁸ in particular, the referent of the probability distribution for any abstract system is that abstract object and most emphatically not the actually existing concrete objects used in attempting to verify the probability distribution experimentally. Thus, the trivial probability distribution for the abstract object "numbered sides of a cube" (1/6 for each side) does not refer to throws of concrete dice; as we have noted elsewhere the probability distribution for the abstract object "a throw of a die" has never been calculated and we have to be satisfied with the results of throws of concrete dice.

¹⁶ An interval in the real number system with rational end points.

¹⁷ Talking about "concrete" dice, pendulums and, later, electrons, etc. brings to the mind's eye unfortunate mental images and associated attacks of the giggles; these must be sternly suppressed.

There is a constant thread running through the quantum theory literature that one of the main properties of any measurement is that it shall be reproducible: if the same measurement is repeated it should yield the same numerical result. I am baffled by this opinion¹⁹ since it is self-evidently false for two of the main classes of physical phenomena: time dependent quantities and probabilistic phenomena. A measurement performed on a concrete object (a throw of a die) to verify a probabilistic theory (the numbered side of a cube) is required to be non-reproducible by the very conditions of statistical verification of the theoretical result. Probabilities are verified (or not) experimentally by measurements on randomly selected concrete objects and these measurements will be different from each other except in the case that the abstract object has the property measured as one of its fixed values. How would we verify the probability distribution for the simple pendulum if any measurements of the angular deflection were required to be reproducible? The essence of statistical verifications of probabilities is that the numerical values resulting from measurements are randomly obtained and are not reproducible.
2.5.1. Single System or Ensemble?
It is a matter of common experience that one may verify the probability distribution for the abstract object "numbered side of a cube" by performing many throws of a single die or many throws of many dice or any combination of throws of concrete dice so long as the dice satisfy our criteria in Section 2.2.3; they may be any size, any material, any colour, etc. All that is necessary is that they reflect two things:

• They must have, amongst their properties, the properties of the abstract object "numbered side of a cube".
• They must satisfy the criteria for being a die just mentioned.

The same remarks apply to the verification of the probabilities referring to any abstract object; in particular we may use any number of pendulums of length $\ell$ to verify the probabilities obtained by integrals of equation (2.4.5 on page 42) for the abstract pendulum. If we set a whole host of such pendulums in motion with arbitrary initial conditions (values of $\gamma$ in equation (2.4.2 on page 41)) and make measurements of the angular deflection of any or all of them at random times and collate the results, they should converge to rational-number approximations to the theoretical probabilities which are, of course, real numbers.

This is the source of the attractive idea that the referent of probability distributions and probability statements in general is the set, the ensemble, of all concrete realisations of the abstract object. This interpretation has the obvious advantage that it does not attempt to make the referent of such statements a concrete object and concentrates attention on the collective nature of experimental aspects of probabilities, but it suffers from the same defect as Russell's definition of, for example, the number 2 as the set of all pairs of objects. Lurking behind each of these ideas is the abstract object used to form the ensemble or set; how is one to decide what a pair is without the use of the number 2?

Much more important in the case of the ensemble interpretation of probability distributions is the fact that every concrete pendulum has properties or environmental factors in addition to those of the abstract pendulum which will, sooner or later, mean that any statistical measurements will deviate from the theoretical probabilities. All concrete pendulums are damped, and so the long-term statistical prediction for the angular deflection of an ensemble of concrete pendulums is simply zero deflection.²⁰

¹⁸ I use "pendulums" rather than "pendula" mainly because the latter sounds rather sinister.

¹⁹ Unless it is due to an infatuation with idempotent projection operators on the part of mathematicians.

²⁰ One might object to this and say "I mean a virtual ensemble of idealised pendulums"; quite so, how does this differ from an abstract object?
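The two points made about ensembles (that many pendulums with arbitrary initial phases reproduce the probabilities of the abstract pendulum, and that damping eventually spoils the agreement) can be illustrated with a small simulation. This is a sketch under loudly stated assumptions: the exponential envelope is a crude stand-in for real damping (it ignores the small shift in frequency), and the function names, amplitude, length, damping rate and observation window are all hypothetical choices.

```python
import math
import random


def damped_deflection(t, big_theta, omega, gamma, damping):
    """Deflection with a crude exponential envelope; damping=0 recovers (2.4.2)."""
    return big_theta * math.exp(-damping * t) * math.sin(omega * t + gamma)


def ensemble_frequency_ratio(n_pendulums, theta1, theta2, t_max, damping,
                             big_theta=0.2, omega=math.sqrt(9.81), seed=0):
    """One observation per pendulum: random initial phase, random observation time."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_pendulums):
        gamma = rng.uniform(0.0, 2.0 * math.pi)   # arbitrary initial condition
        t = rng.uniform(0.0, t_max)               # random time of measurement
        theta = damped_deflection(t, big_theta, omega, gamma, damping)
        hits += theta1 <= theta <= theta2
    return hits / n_pendulums


# Abstract-pendulum probability of finding the deflection in [-Theta/2, Theta/2].
exact = (math.asin(0.5) - math.asin(-0.5)) / math.pi   # equals 1/3

undamped = ensemble_frequency_ratio(50_000, -0.1, 0.1, t_max=10_000.0, damping=0.0)
damped = ensemble_frequency_ratio(50_000, -0.1, 0.1, t_max=10_000.0, damping=0.01)

print(f"exact = {exact:.4f}, undamped ensemble = {undamped:.4f}, "
      f"damped ensemble = {damped:.4f}")
```

In this sketch the undamped ensemble approximates the computed probability, while the damped ensemble piles up near zero deflection, so that almost every observation falls inside any interval containing the vertical.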
Abstraction is the essence of all conceptual thinking and this is a fortiori the case in scientific thinking. In the theory of probability as in other physical theories we deal with abstract models of reality.

2.5.2. The Collapse of the Distribution
We have already seen in Section 2.2.2 that statistical measurements of probabilities are approximations to integrals of the measure function and that a knowledge of either exact or measured probabilities is the knowledge of certain integrals of this function over intervals of its domain. These numbers are not at all sufficient to obtain any information about the value of the function anywhere in its domain; knowing that

$$\int_0^{a} f(\theta)\, d\theta = 2; \qquad \int_0^{b} f(\theta)\, d\theta = 1$$
for example, does not help us to predict $f(0.87)$ nor to predict the value of $f(\theta)$ for a random value of $\theta$. A knowledge of the probabilities (relative measures of sets) gives us no knowledge whatsoever about the outcome of any one random test of a concrete object used to verify these probabilities. The random tests give the value of a measure integrand at points in its domain while the probabilities refer to values of the measure functional.

If, however, we defer to colloquial usage and imagine that the probability distribution function (measure integrand) refers to each concrete system used in the statistical measurement process we are trapped in an acute paradox: the outcome of every random event using a concrete object is a perfectly definite rational number; what is the role of the probability distribution here? It has been assumed by some writers that probability distributions depend, in addition to the value of the distributed variable, on some mysterious parameters which enable the distribution to change ("collapse" is the fashionable terminology) from being a distribution referring to the abstract object to the unique experimental result when the experiment is actually performed²¹ on a random representative concrete object. This is just a mistake; it is easy to ridicule this position and I am not the one to refrain from such ridicule. But the important point here is that this simple mistake about the referent of probability statements has been and is the source of an enormous literature on the "collapse" of probability distributions, "measurement and the role of the observer" in Schrodinger's mechanics where probability distributions are central to the theory.

²¹ Or, when the result is read by an observer, in some interpretations.

2.5.3. Hidden Variables
There is a tendency among scientists to think that probabilities arise due to incomplete knowledge of the real nature of the degrees of freedom of a physical system. This opinion is encouraged by the most spectacular successes of probabilistic theories: the explanation of the laws of Thermodynamics by Statistical Mechanical methods and of the Ideal Gas Law by the Kinetic Theory. The large-scale (macroscopic) properties of materials are explained by the behaviour of certain averages of the motions of the underlying microscopic (molecular) components of those materials. These examples are misleading because there are no probabilities in Thermodynamics or in the Ideal Gas Law to be explained by hidden variables. Certainly there are variables hidden to the macroscopic level of observation and the relationship between averages of these properties and macroscopic variables is illuminating. But the "high level" theory does not contain any probabilities which require explanation by hidden variables at a lower level.

Although a paradigm for the successful reduction of a high-level theory to a lower-level one, the statistical mechanical explanation of thermodynamic laws is not an example of the explanation of probabilities by hidden variables. Certainly, averages of microscopic variables are used in this reduction; for example, the temperature of a body is explained in terms of the mean kinetic energy of its constituents, but temperature is not defined as a probabilistic mean in the macroscopic thermodynamic theory. What we are to be concerned with later in Schrodinger's mechanics is a theory which, at its own level, generates probabilistic results, unlike either Thermodynamics or the Ideal Gas Law. It is these probabilities which, it is claimed, should be explicable in terms of (averages, presumably of) hidden variables.

The example of the pendulum is helpful in this context. Suppose that we had generated the probability distribution for the abstract pendulum (equation (2.4.5 on page 42)) directly without the intervening deterministic equation (2.4.2 on page 41) and had confirmed its predictions by statistical measurements of the angle of deflection of concrete pendulums. That is, we have a probabilistic theory with its experimental confirmation before us and are dissatisfied with the fact that probabilities are involved and seek a more "fundamental" explanation of the phenomenon. The variables used in this probabilistic theory of abstract pendulums are: the length $\ell$ of the pendulum, time $t$, the angular displacement $\theta$ and the acceleration due to gravity $g$. Where do we look for hidden variables to generate a deterministic theory? In fact, of course, in this case we know what the deterministic theory is and, what is more, we also know that there are no hidden variables; the above set is completely sufficient to describe the phenomenon in both the deterministic and probabilistic cases. What is missing in this case is not hidden variables but "hidden" physical laws which give a deterministic connection between the explicitly-known variables common to both descriptions. That is not to say that all probabilistic sciences are of this type, but it is clear that we must distinguish between at least the two possibilities:

• Probabilities in physical theories are necessary because the phenomena which our theories treat cannot be completely described in terms of the dynamical variables we are using.
• Probabilities occur in physical theories because of our ignorance of some of the laws which connect the dynamical variables which we are currently using.
2.6. Preliminary Summary
This has been an informal and elementary preview of ideas of probability with no axioms and no formal derivations. My main point is to establish that Kolmogorov's theory of probability is not just a mathematical scheme but may be equipped with a physical interpretation by relating the theoretical measures of sets to the experimental (statistical) approximate numerical measures which we obtain to verify probabilities. The most important aim in this chapter is to establish that the referent of probability statements, in particular probability distributions, is the abstract object with which the theory of probability deals. Probability statements do not refer to individual objects which have the properties (amongst others) of this abstract object. Statistics deals with experimental measurements of the properties of actually existing (concrete) objects which have (at least) the properties of the abstract object. These measurements are used to verify the theoretical probabilities.

In this chapter, I have concentrated on a descriptive introduction to the mathematical theory of probability and its interpretation; in the next chapter we can look at a more formal theory and give some of the terminology a more careful definition. Notice that from the very outset, here and in Chapter 3, we introduce (and define) probabilities as measures of sets, so that only those things which are both

1. sets (collections of members) and
2. capable of having a measure introduced into them (members of the set may be counted or the concept of area, volume, etc. is given an exact meaning in the set)

can be assigned probabilities. Thus:

• "The truth of a proposition" (for example) is not a set and so cannot be measured
• "A throw of a die" is also not a set and similarly cannot be measured

and therefore it is just as meaningless to speak of the value of the probability of either of these objects as it is to speak of their length, area or volume. This is why the four statements made in Section 2.1 cannot be probabilities in the mathematical theory, however familiar they are in colloquial use.
Chapter 3
A More Careful Look at Probabilities
Some of the ideas introduced in the last chapter are placed on a firmer, more formal, footing and a problem in ontology is skirted. The problems associated with time-dependent probability distributions are rehearsed with particular reference to a familiar example. The abstract and concrete objects which are likely to be met in interpreting Schrodinger's mechanics are examined.
Contents
3.1. Abstract Objects
3.2. States and Probability Distributions
3.2.1. The Propensity Interpretation
3.3. The Formal Definition of Probability
3.3.1. A Premonition
3.4. Time-Dependent Probabilities
3.5. Random Tests
3.6. Particle-Distribution Probabilities

3.1. Abstract Objects
Considerable stress has been placed on the idea that the referents of physical theories, in particular the theory of probability applied to physical processes, are abstract objects. Although indications have been given of what these objects are in Chapter 2, we need a more careful definition if the idea is to be used in less familiar circumstances. Also, I have to attempt to justify the idea that physical theories describe what are, in the everyday sense of the term, non-existent entities.
In fact, all language, logic, mathematics and theoretical science deal with entities which are abstracted from or are idealised versions of actually existing, concrete, objects. We are perfectly familiar with the use of a term like "mammal" and find its use completely unobjectionable. And yet there are no mammals; there are only cats, dogs, gnus, etc., etc. A moment's thought makes us realise that there are no dogs or cats either but only Fido, Rover, Pussums and Tiddles, etc. If we exclude the idea of abstractions from reality we can scarcely use language at all and fall into the same trap as the medieval Nominalists in thinking that only concrete objects exist. Roughly speaking, abstract objects are concepts. In ordinary language one is forced to abstract from "incidental" properties of concrete objects in order to be able to express general ideas; one needs to be able to say what a mammal is and to be able to distinguish a mammal from (say) a bird (another abstract object) without having to explain all the ways in which a gnu is different from a wren which includes (for example) the fact that a wren does not suckle its young. An abstract object is a mental construct which only has the properties which one explicitly assigns to it and no other properties. Thus: • An abstract object is not a "typical" member of a set of concrete objects. All concrete objects will have "incidental" properties; all concrete cubes have mass. • Usually, to each abstract object there is a corresponding concept (word or name) but not always, since, in science we must form abstract objects. • Every actually existing concrete object has a fixed set of values of all its properties 2 But abstract objects may well not have fixed values of some or all of their the explicitly-specified properties. But where are these abstract objects to be found? Do they exist? Are they real? They are real and exist in the minds of people. 
They do not, of course, have any material existence independently of such minds, but is the place where an object exists a criterion for that existence?³ Here, perhaps, one has to insist on the distinction between "realism" and "materialism" in philosophy, which I have done. Bananas⁴ are both material and real, as are electric fields; they exist independently of minds. The number 2 (or π) is real but not material. By "material" I simply mean "existing outside of and independently of our minds". I cannot define "real" in a few words but it includes everything which is material plus those things which, through an active agent (usually human), may have an effect on material objects; maybe material plus "minds and their contents" will do duty for a definition of "real". This usage differs from that used by realist mathematicians who hold, if I have understood their position, that, for example, numbers exist outside minds but, presumably, they are not material.

¹ Or the position satirised by Swift, where the learned professors of the Academy of Lagado carried around actual objects to communicate with one another.
² "Time of specification" may be one of these, of course.
³ One would scarcely withhold the attribute of existence from goldfish because they exist in aquaria rather than in the wild.
⁴ I am in danger of tripping myself up here; by "bananas" I mean in this context the set of all concrete bananas, not the abstract banana!

The upshot of these rather cavalier considerations is that the referents of physical theories are abstract objects.⁵ The predictions (numerical or otherwise) of physical theories may be checked against experiments on actually existing concrete objects which have at least the properties of the abstract object. The task of the experimenter is to minimise the effects of the incidental properties of the concrete objects on his measurements and that of the statistician to ensure that the mathematical techniques for analysing the data from the experiments are sound.

⁵ "No science ever interprets reality in an exhaustive way. It constructs its object by a choice which preserves the essential and eliminates the non-essential." (Lucien Goldmann in "The Human Sciences & Philosophy", Cape, 1969).

Since properties may be predicated to concrete objects and an abstract object is a set of properties, some writers identify "concepts" with "predicates" so that their "predicates" are my "abstract objects". However, one can scarcely say that the referent of a probabilistic theory is a predicate.

3.2. States and Probability Distributions

A set of the possible values which some or all of the properties of an abstract object can take may be considered to be (mutually exclusive) "states" of that abstract object. In the very simple cases considered so far there is only one such property and the idea of "state" may sometimes seem a little artificial, but the formal similarity amongst even these simple cases is worth emphasising:

• The "numbered face of a cube" may take each one of the six possibilities and we may say it is in a state "5" or "2", etc.
• The "mammal in the meadow" may take one of the six values "cow", "horse", "rabbit", "fox", "fieldmouse" and "weasel" and we may say that it is in a state "rabbit" or whatever. Here the abstraction method gives us the strange idea of a "mammal" being in the state "rabbit" rather than the colloquial "rabbit as an example of a mammal".
• In the case of the pendulum we would normally want to distinguish between the deterministic model and the probabilistic model, defining the abstract object according to the type of measurements to which we subject the concrete pendulums.
  - In the deterministic model the abstract object may simply be a "pendulum of length $\ell$ and angular deflection $\theta$" whose state is defined by the numerical values of $\ell$ and $\theta$. The "angular deflection of a pendulum of length $\ell$" may take on a non-denumerable infinity of values between $-\Theta$ and $\Theta$ and we may say, therefore, that the abstract object is in a state "a" or some such value.
  - Using the probabilistic model, on the other hand, the abstract object may be chosen to be a "pendulum of length $\ell$ and amplitude $\Theta$". In this case the numerical values of $\ell$ and $\Theta$ (which fix its energy and frequency) define the state of the abstract object, and the angular deflection $\theta$ is not fixed by the state of the abstract object but is only given by a probability distribution.

Notice for future reference the difference between the last pair of these examples and the other two; the pendulum can actually autonomously change from one state to another but neither the "numbered face of a cube" nor, a fortiori, the "mammal" can actually physically change the value of their state-defining property.

3.2.1. The Propensity Interpretation
This last point is worth some elaboration since it bears on the so-called Propensity interpretation of probability, due to Popper, which is the interpretation adopted by many scientists who seek an objective view of probability. The principal idea in this interpretation is that probabilities are objective properties of (individual concrete) systems which measure the propensity that the object has to have a particular value of a property of interest. Thus:

• In the case of our pendulum, the probability that the angular deflection of the pendulum lies in a given region is simply a measure of its propensity to be in that region.
• In the case of a hydrogen atom, the probability that the electron be in a particular volume of space is again a measure of its propensity to be in that volume.

These propensities are, of course, determined by the potentials which constrain the motion of the relevant mechanical system.

One weakness of this position is obvious; without a satisfactory definition of propensity independent of probability, the explanation is circular. Its strength is its objectivity; nowhere is it implied that probabilities involve the acts or thoughts of the "observer". But, like other erroneous interpretations of probability, it refers to the properties of individual concrete objects rather than to the true referent of probabilities, abstract objects.

This point is made much clearer by looking at the propensity interpretation of probabilities which involve states of systems which may not, autonomously, change into each other. While it may seem to make perfect sense to say that a pendulum of a given length has a larger propensity to be at the extremities of its motion than in a vertical position because of the laws of dynamics, a "mammal in a meadow" cannot be said to have a propensity to be a rabbit, for example, not least because, as we have seen, there are no concrete mammals. Any example of a "mammal in a meadow" (randomly chosen or not) is always a particular concrete animal; it cannot change from being a particular concrete rabbit to being a particular concrete cow by the act of measurement or any other process. But there is a perfectly definite probability that the abstract object "mammal in a meadow" be a rabbit because the probabilities are the relative measures of sets and not dependent on any particular property of that abstract object except that the relevant subsets may be measured. Similarly, any experimental verification of the probabilities using sets of concrete objects depends only on the fact that they can be measured (counted, in this case).
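The claim that these probabilities are nothing more than relative measures of sets, verified by counting, can be sketched in a few lines of Python. This is only an illustration: the meadow census below is invented (the counts are assumptions, not data), and `meadow_probabilities` and `sample_frequencies` are hypothetical helper names.

```python
import random
from collections import Counter

# Hypothetical census of a meadow; the counts are invented for illustration.
MEADOW = {"cow": 12, "horse": 3, "rabbit": 140, "fox": 2,
          "fieldmouse": 310, "weasel": 5}

def meadow_probabilities(census):
    """Probabilities as relative measures (here, counts) of disjoint subsets."""
    total = sum(census.values())
    return {state: count / total for state, count in census.items()}

def sample_frequencies(census, n_tests, rng):
    """Relative frequencies obtained by counting n_tests randomly chosen animals."""
    animals = [state for state, count in census.items() for _ in range(count)]
    seen = Counter(rng.choice(animals) for _ in range(n_tests))
    return {state: seen[state] / n_tests for state in census}

probabilities = meadow_probabilities(MEADOW)
frequencies = sample_frequencies(MEADOW, 50_000, random.Random(0))
for state in MEADOW:
    print(f"{state:10s} P = {probabilities[state]:.4f}  freq = {frequencies[state]:.4f}")
```

Nothing in the sketch attributes a property to any individual animal beyond its being countable; the probabilities belong to the census as a whole, and the sampled frequencies merely approximate them.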
To press the point to the edge of fatigue, the two statements:

• 83% of all the animals on the planet are insects⁶
• The probability that an animal on the planet is an insect is 0.83

are identical and neither of them has anything to say about the properties of any concrete ant or concrete elephant except that each has the (objective) property of being capable of being counted. An individual concrete animal is always a particular wasp, a particular gnu, ..., a particular spider, etc. and cannot be said to have any "phylum or genus propensity" to be anything other than what it is. The probability may be statistically verified, as always, by counting sets of animals which are adequate⁷ to give Monte Carlo quadrature approximations to that probability.

⁶ I have plucked the figure of 0.83 out of the air, of course, simply to make a point.
⁷ Not a trivial task in experimental design!

The situation is completely analogous in the case of the "numbered sides of a cube"; the numbers on the sides of the abstract cube or any concrete cube cannot change and the probability that the "numbered side of a cube" be 5 cannot have a propensity interpretation even if that probability be mistakenly taken to refer to throws of concrete dice. The outcome of any particular throw of any concrete die is that a single numbered side be "face up"; that "face up" side cannot be said to have a propensity to be anything other than what it actually is. To say that each of the sides of a concrete die has an equal propensity to fall "face up" is to say nothing about the concrete dice throws and simply to paraphrase the probabilities in the case of the abstract cube.

In order for the concept of probability to have a uniform interpretation for all kinds of sets which may be sub-divided into disjoint measurable subsets, it is necessary to discard the propensity interpretation even though its proponents are allies in other areas of philosophy; they are supporters of the objective existence of the properties of the material world. The propensity interpretation arises from the entirely laudable effort to give an objective meaning to the idea of probability in a particular area of science: the theory of those systems for which concrete objects may autonomously change from one part of the probability distribution to another. But this interpretation leads to obvious absurdities in more general probability applications. As we have seen, if probability statements are imputed to individual concrete objects, for whatever reason, one must fall into paradox and confusion.

3.3. The Formal Definition of Probability

The Kolmogorov axioms apply to measures of subsets of a given set and simply state in formal terms the conditions we have become familiar with in Chapter 2; probabilities are relative measures of subsets of a given set. That is, for a set $\Omega$ and subsets $W_i$, a measure function $P$,

\[ P: W_i \rightarrow \mathbb{R} \]

is defined from the subsets $W_i \subset \Omega$ to the real numbers such that:

• The probability of the larger of two sets is not less than that of the smaller:
\[ P(W_1) \le P(W_2) \quad \text{if} \quad W_1 \subset W_2 \tag{3.3.1} \]
• The probability of the union of two disjoint sets is the sum of their individual probabilities:
\[ P(W_1) + P(W_2) = P(W_1 \cup W_2) \quad \text{if} \quad W_1 \cap W_2 = \emptyset \tag{3.3.2} \]
  This result may be extended by recursion to any denumerable number of subsets of $\Omega$.
• The probability of the enclosing set is unity:
\[ P(\Omega) = 1 \tag{3.3.3} \]

In the case of probabilities generated from a probability density which is a function $\rho$ from the set $X$ (members $x \in X$, subsets $X_i \subset X$) to the real numbers, the corresponding results are:

1. \[ \int_{X_1} \rho(x)\,dx \le \int_{X_2} \rho(x)\,dx \quad \text{if} \quad X_1 \subset X_2 \]
2. \[ \int_{X_1} \rho(x)\,dx + \int_{X_2} \rho(x)\,dx = \int_{X_1 \cup X_2} \rho(x)\,dx \quad \text{if} \quad X_1 \cap X_2 = \emptyset \]
3. \[ \int_{X} \rho(x)\,dx = 1 \]
   which may always be arranged by multiplication by a numerical factor $1/N$ if
   \[ \int_{X} \rho(x)\,dx = N < \infty \]

Now, to qualify mathematically to be a distribution function a function $\rho$ must be:

1. Single valued
2. Non-negative
3. Integrable to a finite value.
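A minimal sketch of these conditions for the simplest case, the relative counting measure on the six "numbered faces of a cube", is given below. The subsets chosen and the helper names `P` and `normalise` are illustrative assumptions; `normalise` is a discretised stand-in for multiplying a density by the factor 1/N.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # the six numbered faces of the abstract cube

def P(subset):
    """Relative counting measure of a subset of OMEGA."""
    subset = frozenset(subset)
    assert subset <= OMEGA, "P is only defined on subsets of OMEGA"
    return Fraction(len(subset), len(OMEGA))

W1, W2 = {5}, {2, 5, 6}   # W1 is contained in W2
A, B = {1, 2}, {4, 6}     # A and B are disjoint

monotone = P(W1) <= P(W2)            # condition (3.3.1)
additive = P(A) + P(B) == P(A | B)   # condition (3.3.2)
normalised = P(OMEGA) == 1           # condition (3.3.3)
print(monotone, additive, normalised)

def normalise(weights):
    """Scale non-negative weights with finite sum N by the factor 1/N."""
    N = sum(weights)
    return [w / N for w in weights]
```

Exact rational arithmetic is used so that the additivity check is an equality rather than a floating-point approximation.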
Continuity is not necessary and the function may have "corners", "jumps" and "spikes" provided any infinities occur on sets of zero measure (finite numbers of points for infinite sets). All of these conditions are more than met by insisting that $\rho$ be the square of the modulus of a continuous, normalisable, possibly complex function:

\[ \rho(x) = |\psi(x)|^2 \]

Thus the square of the norm of any single-valued function has the correct mathematical requirements to be a probability distribution function.⁸ In fact the requirement can be generalised to include any "vector" of such functions:⁹

\[ \psi(x) = \begin{pmatrix} \psi_1(x) \\ \psi_2(x) \\ \psi_3(x) \\ \vdots \\ \psi_n(x) \end{pmatrix} \quad \text{with} \quad \rho(x) = \psi(x)^{\dagger}\,\psi(x) \]

⁸ It is worth stressing the obvious fact that a probability distribution function for the quantity $x$ (say) is a function of $x$. Thus, a distribution function of energy is a function of values of the energy, not a function of space. Later in this work we shall be concerned with distributions of quantities like $x$ in space; these are not probability distributions since they do not satisfy Kolmogorov's conditions.
⁹ With the obvious implications for "spin".

In addition to the generation of a probability density, each $\psi$ generates a projection operator:¹⁰

\[ \hat{P}_\psi = \int \psi(x)\,\psi^*(x')\,dx' \]

such that

\[ \hat{P}_\psi f(x) = \int dx'\,\psi(x)\,\psi^*(x')\,f(x') = \psi(x)\int dx'\,\psi^*(x')\,f(x') = \psi(x)\,p_i \quad \text{(say)} \]

where $p_i$ is a number. Thus the implied eigenvalue equation

\[ \hat{P}_\psi f_i(x) = p_i f_i(x) \]

has the solutions:

\[ p_i = 1 \quad \text{if } f_i = \psi, \qquad p_i = 0 \quad \text{if } f_i \text{ is orthogonal to } \psi \]

if $\psi$ is normalised to unity.

¹⁰ Unfortunate near-collision of notation; $P(\ )$ for probability measure, $\hat{P}$ for projection operator.

That is, the function $\psi$ which is capable of generating a probability distribution $\rho = |\psi|^2$ is an eigenfunction of the (Hermitian) operator $\hat{P}_\psi$ with eigenvalue unity. The other eigenfunctions comprise all those functions which are orthogonal to $\psi$ and they are all degenerate with eigenvalue zero. It is trivial to show that these alleged projection operators associated with orthonormal functions $\psi_i$ satisfy the requirements of idempotency and completeness.

The connection between the distribution function $\rho$ and the associated function $\psi$ (the "state function", say) may be made more explicit by changing the definition of $\rho$ slightly so that it is dependent on two sets of variables $x$ and $x'$:

\[ \rho(x; x') = \psi(x)\,\psi^*(x') \]

with the original probability distribution function being $\rho(x; x)$, of course. This extended definition makes $\rho(x; x')$ the kernel of the projection operator $\hat{P}_\psi$ in the usual sense of integral operators:

\[ \hat{P}_\psi f = \int dx'\,\rho(x; x')\,f(x') \]
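The eigenvalue and idempotency properties of such a projection operator can be checked numerically. The sketch below makes several assumptions that are not in the text: the interval [0, 1], the particular state function (a normalised sine with an arbitrary constant phase), the orthogonal test function, and a midpoint-rule quadrature in place of the integrals; all function names are hypothetical.

```python
import cmath
import math

N_GRID = 2000
DX = 1.0 / N_GRID
XS = [(k + 0.5) * DX for k in range(N_GRID)]  # midpoints of cells covering [0, 1]

def psi(x):
    """Normalised state function on [0, 1] with an arbitrary constant phase."""
    return cmath.exp(0.7j) * math.sqrt(2.0) * math.sin(math.pi * x)

def f_orth(x):
    """A function orthogonal to psi on [0, 1]."""
    return math.sqrt(2.0) * math.sin(2.0 * math.pi * x)

def overlap(g, h):
    """Midpoint-rule approximation to the integral of g*(x') h(x') dx'."""
    return sum(complex(g(x)).conjugate() * h(x) for x in XS) * DX

def project(f):
    """Return the function (P_psi f)(x) = psi(x) * integral of psi*(x') f(x') dx'."""
    p = overlap(psi, f)
    return lambda x: psi(x) * p

norm = overlap(psi, psi)         # integral of rho = |psi|^2; also the eigenvalue for f = psi
p_orth = overlap(psi, f_orth)    # eigenvalue for a function orthogonal to psi
once = project(lambda x: x)      # projection of an arbitrary function f(x) = x
twice = project(once)            # projecting again should change nothing
print(abs(norm), abs(p_orth), abs(once(0.3) - twice(0.3)))
```

The constant phase in `psi` is there only to show that the density and the projector are unaffected by it, which is why the complex conjugate in `overlap` matters.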
Looking ahead, we shall, in Schrödinger's mechanics, be dealing with probability distributions which are formed in this way as magnitudes squared $|\psi(x)|^2$, where the functions $\psi$ are solutions of (partial) differential equations and are typically continuous functions and therefore generate continuous probability distributions. A moment's thought shows that, if a probability distribution is continuous, its derivative must be zero when the function itself is zero:

\[ \rho(x_0) = 0 \implies \left.\frac{d\rho}{dx}\right|_{x_0} = 0 \]

since if this were not true $\rho$ would be negative in the neighbourhood of $x_0$, which is impossible. If, however, $\rho$ is given by

\[ \rho(x) = \psi^*(x)\,\psi(x) \]

for any square integrable $\psi$ then:

\[ \rho(x_0) = 0 \implies \psi(x_0) = \psi^*(x_0) = 0 \implies \left.\frac{d\rho}{dx}\right|_{x_0} = \left[\psi^*\frac{d\psi}{dx} + \psi\frac{d\psi^*}{dx}\right]_{x_0} = 0 \]

suggesting that the use of probability distributions which are the square of the modulus of a normalisable function is fundamental to probability theory since this method always generates probability distributions which satisfy Kolmogorov's requirements. Since the $\psi$ may take positive and negative values while $\rho$ must always be positive, different $\psi$'s may be orthogonal, which is impossible for two different $\rho$'s, again suggesting a more fundamental role for $\psi$; different $\psi$'s may be solutions of the same (self-adjoint) differential equation.

3.3.1. A Premonition
Suppose we wish to consider the states of a single abstract particle and further suppose that to each possible energy¹¹ $E_i$ of this abstract particle there corresponds a position probability distribution $\rho_i(\mathbf{r})$ such that

\[ \rho_i(\mathbf{r}) = \psi_i^*(\mathbf{r})\,\psi_i(\mathbf{r}) \]

in an obvious special case of the above example for general probability distributions. Then the projection operator $\hat{P}_i$ is given by:

\[ \hat{P}_i = \int dV'\,\psi_i(\mathbf{r})\,\psi_i^*(\mathbf{r}') \]

($dV$ is the relevant volume element) and each of the separate probability distributions is the squared modulus of an eigenfunction of the corresponding (Hermitian) projection operator:

\[ \hat{P}_i \psi_i = 1 \times \psi_i \]

¹¹ Possibly discrete, possibly continuous; it could be the pendulum bob in our earlier example.
Furthermore, the total operator

\[ \hat{P} = \sum_i \hat{P}_i \]

has all the $\psi_i$ as eigenfunctions with eigenvalue 1, if the $\psi_i$ are orthogonal, i.e. if

\[ \int \psi_i^*(\mathbf{r})\,\psi_j(\mathbf{r})\,dV = \delta_{ij} \]

If these conditions are met then we may form the Hermitian operator $\hat{H}$, given by

\[ \hat{H} = \sum_i E_i \hat{P}_i \]

for which

\[ \hat{H}\psi_i = E_i \psi_i \]

for any position probability distributions provided that the orthonormality condition on their component $\psi_i$ is satisfied. This latter condition is not, in fact, met by the probability distributions of abstract particles whose motion is governed by classical mechanics, as one can quickly verify from the angular deflection probability distributions of Section 2.4 on page 41, but the exercise is an interesting one. The problem lies with the continuous nature of the possible $E_i$. It is capable of solution by using (infinite) sets of $\delta$-function distributions and associated projection operators, one for each member of the continuous set of $E_i$.
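A finite-dimensional analogue may make this construction concrete: orthonormal vectors stand in for the functions, sums stand in for the integrals, and the operator built from energy-weighted projectors is checked to have those vectors as eigenvectors. The three vectors, the three energies and the helper names below are illustrative assumptions only.

```python
import math

def projector(v):
    """Matrix of the projector onto a normalised vector v: P[j][k] = v[j] * conj(v[k])."""
    return [[a * b.conjugate() for b in v] for a in v]

def apply(matrix, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]

def weighted_sum(weights, matrices):
    """Entry-wise sum of the matrices, each multiplied by its weight."""
    n = len(matrices[0])
    return [[sum(w * m[j][k] for w, m in zip(weights, matrices))
             for k in range(n)] for j in range(n)]

s = 1.0 / math.sqrt(2.0)
STATES = [[s, s, 0.0], [s, -s, 0.0], [0.0, 0.0, 1.0]]  # an orthonormal set
ENERGIES = [1.0, 2.5, 4.0]                             # illustrative values

projectors = [projector(v) for v in STATES]
total = weighted_sum([1.0] * len(STATES), projectors)  # sum of the projectors
H = weighted_sum(ENERGIES, projectors)                 # energy-weighted sum

for v, E in zip(STATES, ENERGIES):
    residual = max(abs(hv - E * x) for hv, x in zip(apply(H, v), v))
    print(f"E = {E}: largest deviation of H v from E v is {residual:.2e}")
```

If the vectors were not orthogonal the cross terms would not vanish and the check would fail, which is the discrete counterpart of the remark about classical distributions above.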
3.4. Time-Dependent Probabilities

In first looking at the probabilistic model of the simple pendulum, the essence of the choice of this model was the assumption that only random measurements on concrete pendulums were available, making the deterministic equation which fixes the pendulum's motion as a function of time unusable. In thinking about random measurements of this type, one would naturally assume that values of the pendulum's deflection occurred at random (unknown) times (the values simply turn up and are recorded).¹² But what happens in the case of a damped pendulum when the amplitude of the swing decreases with time? If the abstract object is the one chosen in the last section ("a pendulum of length $\ell$ and amplitude $\Theta$") then $\Theta$ changes with time, so values of $\theta$ for random times will refer to different abstract objects of the former time-independent type and any statistical measurements on a single concrete damped pendulum will be worthless in attempting to verify the probability distribution calculated for the abstract damped pendulum.

¹² By "unknown" here I mean that the relationship of the random events to the motion of the pendulum is unknown, not that all clocks are banished from the lab.

First of all, let's say what we mean by a damped pendulum. In most cases a pendulum (or other oscillator) is damped by fluid (air) friction and it is a good approximation to make this damping proportional to the velocity of the pendulum, so the equation of motion for such a system is

\[ \frac{d^2\theta}{dt^2} + \gamma\frac{d\theta}{dt} + \omega^2\theta = 0 \]

where $\gamma$ is a measure of the size of the damping effect and $\omega$ is the angular frequency of the corresponding undamped pendulum

\[ \omega = \sqrt{g/\ell} \]

Without going into the details of the solution of this equation (which obviously involves the relative magnitudes of the driving force of the oscillations, gravity $g$, and the damping effect $\gamma$) the simplest regime is called light damping, in which the frequency of the motion is unchanged from the undamped case and only the amplitude decays with time:

\[
\begin{aligned}
\theta(t) &= \Theta\exp\left(-\tfrac{1}{2}\gamma t\right)\sin(\omega t + \phi) \\
&= \Theta(t)\sin(\omega t + \phi) \quad \text{(say)}.
\end{aligned}
\tag{3.4.5}
\]

The time-dependence of the amplitude has simply been absorbed into the amplitude $\Theta$ without change of notation since the details of this dependence are not important here; the amplitude simply decays in the familiar exponential way with time. In this light-damping regime the probability distribution is of exactly the same form as that for the simple pendulum, equation (2.4.5) on page 42 of Chapter 2:

\[ P(\theta; t) = \frac{1}{\pi\,\Theta(t)\sqrt{1 - \left(\theta/\Theta(t)\right)^2}} \]

where the time dependence is entirely contained in the time dependence of the amplitude $\Theta(t)$.
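A numerical sketch of this time-dependent distribution follows. It assumes the normalised form of the distribution, 1/(πΘ(t)√(1 − (θ/Θ(t))²)), and models the random element by giving each member of a large set of otherwise identical pendulums a uniformly distributed phase; the numerical values of the length, damping constant and initial amplitude, and all function names, are illustrative assumptions.

```python
import math
import random

THETA0 = 0.2    # initial amplitude in radians (illustrative value)
GAMMA = 0.1     # damping constant in 1/s (illustrative value)
LENGTH = 1.0    # pendulum length in metres (illustrative value)
OMEGA = math.sqrt(9.81 / LENGTH)

def amplitude(t):
    """Exponentially decaying amplitude in the light-damping regime."""
    return THETA0 * math.exp(-0.5 * GAMMA * t)

def density(theta, t):
    """Assumed normalised P(theta; t), valid for |theta| < amplitude(t)."""
    big = amplitude(t)
    return 1.0 / (math.pi * big * math.sqrt(1.0 - (theta / big) ** 2))

def probability_within(a, t):
    """Closed-form integral of density over -a <= theta <= a."""
    return (2.0 / math.pi) * math.asin(min(a / amplitude(t), 1.0))

def ensemble_fraction(a, t, n_pendulums, rng):
    """Fraction of pendulums with random phases found with |theta| <= a at time t."""
    hits = 0
    for _ in range(n_pendulums):
        phi = rng.uniform(0.0, 2.0 * math.pi)
        theta = amplitude(t) * math.sin(OMEGA * t + phi)
        if abs(theta) <= a:
            hits += 1
    return hits / n_pendulums

t_obs = 5.0
a = 0.5 * amplitude(t_obs)   # inner half of the current swing
expected = probability_within(a, t_obs)
measured = ensemble_fraction(a, t_obs, 100_000, random.Random(1))
print(f"expected {expected:.4f}, measured {measured:.4f}")
```

The probability of finding the deflection within the inner half of the swing is one third at every time, although the width of that interval shrinks with the amplitude; the relative frequency is obtained from many pendulums at one instant, not from one pendulum at many instants.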
It now seems to be the case that the abstract object must involve time: "a pendulum of length $\ell$ and amplitude $\Theta(t)$ at time $t$". What we now have is a continuous series of abstract objects of the former type as the amplitude decays.¹³ As we noted at the outset, random measurements of the angular deflection of a concrete damped pendulum would not be adequate to verify the probability distribution of the abstract object "damped pendulum". But, as we shall see, Schrödinger's mechanics and quantum theory in general generate time-dependent probability distributions. If the theory is to be capable of experimental verification, there will have to be some way of obtaining random measurements of the time-dependent properties of concrete objects which may be compared to the theoretical probabilities.

The solution is the same as one of the methods of measuring the values of the relative probabilities of the time-independent "numbered face of a cube"; we can throw one die many times or many dice once (or many dice many times). The time-dependent probabilities may be experimentally verified by performing random measurements (experiments at random times) on many "identical"¹⁴ pendulums simultaneously. Naturally, in order that these measurements should not all give the same answer, we must use the only remaining property of concrete pendulums which is not specified in the abstract object: the initial deflection $\phi$.¹⁵ So the (somewhat tedious) experiment would have to be:

    Random simultaneous measurements of the angle of deflection of a large number of pendulums of length $\ell$ and initial amplitude $\Theta(0)$ with random starting deflections.

At the time of each random measurement, one would obtain a set of relative frequencies of the deflections which should approximate to the calculated probabilities at time $t$. These probabilities, like other numerical quadratures, would approximate the probabilities at each time $t$ by a rational number and, of course, would only give a finite approximation to the continuous time development of those probabilities. In the latter respect the finite-step time dependence of the experimental quantities is identical to the way in which a particle's path would be verified in classical mechanics: a series of measurements at discrete times verifying the continuous path of the abstract particle.

¹³ It must be said now, of course, that if the time $t$ is known then the deterministic model could be used, but I am keen to investigate the case of time-dependent probabilities in general and their relationship to measurement.
¹⁴ Concrete pendulums of length $\ell$ with identical initial amplitudes.
¹⁵ This provides a nice echo of Einstein's insistence that probabilities were necessary in atomic physics because of ignorance of initial conditions.

I have taken the trouble and braved the tedium of setting up an explicit model of a familiar time-dependent probability distribution for three main reasons:

• As already noted, Schrödinger's mechanics generates such distributions.
• To stress the necessity of using statistical experiments involving many concrete objects in verifying these probabilities; there can certainly be no question of these probabilities referring to a single concrete object.
• To counter the opinion that the deterministic evolution of a time-dependent probability distribution is impossible or, worse, "a mystery" which involves the large-scale universe or the observer.¹⁶

¹⁶ See, for example, the first few pages of Chapter 6 "Quantum Magic and Quantum Mystery" of "The Emperor's New Mind" by R Penrose (Vintage, 1990) which contains, almost verbatim, similar views to E Squires in "To Acknowledge the Wonder" (Adam Hilger, 1985) p. 183.

3.5. Random Tests

Although the idea of "a random test" or "a random measurement" has been used many times in this and the previous chapter, I have relied on familiarity and common sense to some extent to make the idea acceptable. As we have just seen, in even fairly simple mechanical systems a "random test" might be quite a complicated operation both in its experimental performance and in ensuring that it satisfies the necessary requirements to sample the relevant space in constructing a numerical quadrature.

It is obvious that in even the crudest definition of a random test, the concrete object subject to that test should retain no "memory" of being tested.¹⁷ Removing a ball from an urn, noting some property and returning the ball to the urn is an archetypal random test experiment. If the ball were to be marked for later recognition before being returned, or were not returned at all, this would be universally regarded as a foul. The simplest and most reliable way of ensuring that this happens is to perform the random tests on many different concrete objects, all of which have the properties of that abstract object which is the source of the theoretical probabilities. Hence the constant calls for new decks of cards at professional card games. If a measurement is performed on a concrete object and this object is either discarded or becomes unavailable for other random measurements, this ensures that one important criterion for the genuine randomness of the measurements is satisfied.

¹⁷ I say that this is obvious but it is in direct conflict with the commonly-accepted definition of "measurement" in quantum theory, as we shall see later.

Even when each random test is carried out on a separate concrete object there may still remain the problem of deciding what a suitable "concrete object" actually is. In mechanics one is dealing with moving objects in interaction, having been in interaction or coming into interaction, and physical distance may not be a good criterion for deciding what the components of a concrete object are. There is no difficulty in agreeing that a hydrogen atom is a candidate for a concrete object but what about a proton and an electron? The individual circumstances will decide each case.

When the decision has been made about the nature of the concrete objects and the adequacy of the set of measurements, randomness is best assured by making measurements on separate concrete objects, should these be available. Or, what amounts to the same thing in different circumstances, noting the value of the property in question as concrete objects turn up for measurement at random. Naturally, this is in principle impossible if one believes that probabilities refer to single concrete objects.
3.6. Particle-Distribution Probabilities

The idea of forming an abstract object from sets of concrete objects is at its simplest when the properties of the concrete objects which are to be discarded as incidental are "static" properties of those concrete objects. The idea of "cube" being formed from concrete cubes of different sizes, colours, materials, etc. has already been done for us by centuries of linguistic usage. The existence of the word "cube" trivially shows that this has been done; similarly the referents of all nouns except for proper names are abstract objects (concepts, existing only in the mind).

However, in science, more particularly in mechanics, one is dealing with objects which have properties other than simple static attributes and it may be found difficult to actually visualise the objects which arise when the "dynamical" properties are abstracted away. Take, for example, the abstract object "an electron with fixed linear momentum $\mathbf{p}$"; here one has abstracted (among other things) the attribute of position from the electron. This abstract object has no position; random measurements of the momentum components of different concrete electrons corresponding to this abstract object (with fixed momenta $\mathbf{p}$) will all give the same value, but random position measurements on such concrete electrons will not give reproducible results. If enough position measurements are made, the frequency ratios may well converge to some stable distribution in space and, if we have a theory of the abstract object, we may be able to compute the relevant probability distribution and compare it with these experiments.

However compelling and unimpeachable the logic of this explanation, I personally find it hard to form a mental image of the abstract object "a particle without position". Thus, notwithstanding my taking the high moral ground and insisting that the true referent of probabilistic theories is abstract objects, I often find it very useful to visualise some abstract objects as ensembles when dealing with dynamical properties. This point has been touched on in Section 2.5.1 on page 43. For very obvious historical and cultural reasons, I have no difficulty visualising the abstract object "cube"; I do not have to think of the set of all cubes of different sizes, colours, materials, etc. But, in thinking about particles whose only properties are position and momentum, I can only imagine the abstract object "a particle with fixed momentum" as "the set of all particles with fixed momentum and all possible positions",¹⁸ perhaps because the historical, cultural essence of the "particle" concept includes localised position. Perhaps future generations of scientists will have as little difficulty with the visualisation of the abstract objects of dynamics as I have with the abstract object "cube".

It must, of course, be stressed that this ensemble visualisation is of the same type as the Gibbsian ensemble of statistical thermodynamics; we are only thinking about the theory of a single-particle system, not a many-particle ensemble of the Boltzmann type.

¹⁸ Or, as Einstein might have put it, "all possible initial positions".
PART 3
Classical Mechanics
Notwithstanding the possibility of setting up axiom systems for quantum mechanics which are independent of classical (Newtonian) mechanics, its concepts and referents are patently and historically developed from those of Newton's theory as developed by Lagrange, Hamilton and Jacobi. This part of the work uses elementary (nineteenth century) methods to try to illuminate the relationship between the two mechanical theories.
Chapter 4
The Hamilton-Jacobi Equation
The Hamilton-Jacobi (H-J) equation represents the culmination of the development of classical (particle) mechanics. In this chapter I concentrate on those points in the interpretation of the Hamilton-Jacobi equation which are absolutely crucial to an appreciation of Schrödinger's mechanics, and without which many mysteries appear in the interpretation of quantum theory. In an attempt to be emphatic I may have been repetitive here.
Contents
4.1. Historical Connections
4.2. The H-J Equation
4.3. Solutions of the H-J Equation
    4.3.1. Cartesian Coordinates
    4.3.2. Spherical Polar Coordinates
    4.3.3. Comparisons
    4.3.4. Cylindrical Coordinates
4.4. Distribution of Trajectories
4.5. Summary

4.1. Historical Connections
It is entirely possible to produce an axiom system for Schrödinger's mechanics which makes no mention of Newtonian mechanics and its developments by Lagrange, Hamilton and Jacobi; to produce an interpreted theory from which the historical development of quantum theory has been completely excised. It is possible, but is it sensible?

Taking a broader view of the development of science it is easy to find cases where the history of the current theory is just that: its history, not its historical development. One might think of the ousting of the phlogiston theory of combustion by the discovery of oxygen; no one would require a student of chemistry to be familiar with the concept of phlogiston before studying current combustion processes. The phlogiston theory has been replaced by the discovery of oxygen and the modern theory of chemical reactions and has left only a historical, not a scientific, trace of its existence; it is entirely sensible to omit any reference to phlogiston in any modern Chemistry course but not in any History of Chemistry course.

The relationship between classical and quantum mechanics is quite different; Schrödinger's mechanics developed out of classical mechanics; the two co-exist; the one is not a replacement for the other since, among other things, their referents are different. Classical mechanics is a theory of some aspects of the behaviour of material objects of a certain range of size and mass; Schrödinger's mechanics is a probabilistic theory of some aspects of the behaviour of material objects of a different size and mass scale. These two theories are both historically and ontologically related; they share a significant number of concepts and I hope to show that attempting to sever this historical and developmental link leads to unnecessary confusions and duplications.

As I pointed out in Section 1.9 on page 20, axiom systems are useful taxonomic methods for the detailed study of the structure of a theory but all such axiom systems have to be abstracted from the developed, interpreted theory and the theories with which it overlaps and abuts. If an understanding of the meaning and interpretation of a theory is required, the choice is between:

• Exhibiting the way in which the structures emerge from the overall physical principles.
• Appending a set of interpretational axioms to the set of formal axioms of existence, structure and relationship.
Ultimately, the second must be based on the first so I choose to use the first. Historically, Schrodinger's mechanics was developed (by Schrodinger) from the starting point of the Hamilton-Jacobi equation, replacing the equation by a variation principle. This replacement produces a very different mechanics, with a referent and interpretation totally different from those of the original Hamilton-Jacobi mechanics. However, there are central points in the interpretation of the H-J equation which, while not identical to the interpretation of Schrodinger's mechanics, are so closely related and suggestive that they help to dissolve key paradoxes in the conventional wisdom.

4.2. The H-J Equation
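The high point of the development of classical (particle) mechanics is the Hamilton-Jacobi equation:

$$H\left(q^i, \frac{\partial S}{\partial q^i}, t\right) + \frac{\partial S}{\partial t} = 0$$

Here, the Hamiltonian function has its familiar form as a function of the coordinates (generically $q^i$) and conjugate momenta (generically $p_i$) but the momentum components are defined in terms of the action function $S$, itself a function of the $q^i$ and possibly $t$:

$$p_i \overset{\text{def}}{=} \frac{\partial S}{\partial q^i}$$

so that

$$H(q^i, p_i, t) \overset{\text{def}}{=} H\left(q^i, \frac{\partial S}{\partial q^i}, t\right)$$

and the energy of the system is (minus) the time derivative of the same function $S$:

$$E \overset{\text{def}}{=} -\frac{\partial S(q^i, t)}{\partial t}.$$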
The dynamical law of Hamiltonian mechanics says that allowed trajectories $q^i$ and associated momenta $p_i$, as functions of time $t$, must be such that the Hamiltonian function of just these particular $q^i(t)$ and $p_i(t)$ is equal to the energy of the system. Using the above substitutions, this law becomes the Hamilton-Jacobi equation involving only $S$. The H-J equation is a partial differential equation in $3N + 1$ variables (the $q^i$ and $t$) and so, when solved, will require $3N + 1$ constants or initial conditions. One of those will be additive (corresponding to the time derivative) leaving just $3N$ required, which can always be chosen to be the initial values of the coordinates $q^i$: the $q^i_0$. The relationship between the function $S$ and the initial values $q^i_0$ is just a special case of the general relationship between momenta and $S$:

$$\left(\frac{\partial S}{\partial q^i}\right)_{0} = p_{i0} \tag{4.2.3}$$

and so we may generate the initial momenta ($p_{i0}$) thus obtaining all $6N$ initial conditions (conditions at a chosen value of $t$) of a particular trajectory. In obtaining the H-J equation, use was made of the dynamical law of classical mechanics and so the final equation can only apply for trajectories which solve Lagrange's equations: i.e. for trajectories which obey Newton's Law. Thus the H-J equation is just that; an equation not an identity; it is an equation equivalent to Hamilton's canonical equations or to Lagrange's equations or indeed to $F = ma$. But it is a partial differential equation for $S$ in which the $q^i$ are the independent variables along with $t$; there is no question of the $q^i$ being functions of $t$ as they are in the solutions of the Lagrange or Hamiltonian equations which are the explicit expressions for the trajectories as a function of time. Indeed such a thing is impossible by the very nature of the equation: it is $S$ which is to be determined by the Hamilton-Jacobi equation, not the $q^i$ and $t$. Hamilton's canonical equations¹ have the appearance of partial differential equations but, in fact, they are the generators of ordinary differential equations. That is, unlike the canonical equations
$$\frac{\partial H}{\partial q^i} = -\dot{p}_i, \qquad \frac{\partial H}{\partial p_i} = \dot{q}^i$$

which involve a known function $H$ (the partial derivatives merely picking out ordinary differential equations), the Hamilton-Jacobi equation involves an unknown function $S$ to be determined by this equation. Once found, this $S$ has values for all $q^i$ (and $t$). That is, $S$, although it contains the dynamical law, does not determine trajectories as functions of $t$ directly since, for every choice of values of the $q^i$ and $t$, $S(q^i, t)$ has a value. This raises two questions; one mathematical and one scientific:

1. Precisely how does a knowledge of $S$ determine the allowed particle trajectories?
2. What is the referent of $S$ and hence of the Hamilton-Jacobi equation which determines $S$? To what do the solutions refer; how does one interpret the Hamilton-Jacobi equation?

In the past, attention has been concentrated almost exclusively on the first of these points and, fortunately, the answer to (1) helps with the consideration of (2). In the case of the solutions of Lagrange's equations, for the motion of a single particle, the quantities $q^i$ appearing as solutions of these equations (or the canonical equations) are functions of time ($t \in R^1$) and represent a path through 3D space:

$$q^i : R^1 \to C, \qquad C \subset R^3 \tag{4.2.4}$$

where $R^1$ is the real number system (modelling "time") and $C$ a subset of $R^3$ (modelling ordinary space $E^3$) which would normally be capable of being parametrised by $R^1$: in short, $C$ models a curve in ordinary space which is the trajectory of the particle. Now from what we have said above it is clear that this is not what $q^i$ is in the Hamilton-Jacobi theory. In fact, in this equation $q^i$ is independent of $t$; it is a coordinate variable which maps points in ordinary space ($E^3$) into the real number system ($R^1$): given a point in space, $q^i$ is a function which gives the numerical value of a single co-ordinate variable in some frame of reference:

$$q^i : E^3 \to R^1 \tag{4.2.5}$$

So that, in place of:

Lagrange, Hamilton: $q^i : R^1$ (models "time") $\to C$ (models a curve) $\subset R^3$ (models "space")

we have:

Hamilton-Jacobi: $q^i : E^3$ (space) $\to R^1$ (a coordinate)

showing the effect that the change from the Hamiltonian canonical equations to the Hamilton-Jacobi equation has on the interpretation of the $q^i$ and the referent of the function $S$.

¹ Note, once more, that only the first of the canonical "equations" is, in fact, an equation (it contains the dynamical law) while the second is merely the definition of velocity in Hamilton's dynamics.
The meaning of $q^i$ has changed from "one co-ordinate of a point on a particular allowed trajectory satisfying the dynamical law" to "one coordinate of a point in space" since, as the introduction of $S$ emphasises, all points in space² lie on allowed trajectories; what distinguishes amongst these allowed trajectories is not the solution of the mechanical equations but only the initial conditions. Just as the referent of the $q^i$ has changed so has the referent of the whole mechanical theory with the introduction of the Hamilton-Jacobi approach. The referent of $S$ is the set (ensemble) of all possible trajectories for the given field of force and inter-particle interactions. Or, if we are to use the more precise concepts introduced in Chapter 3, the referent of $S$ is the abstract object "a particle trajectory consistent with the given environment". This provides an answer to the question (2) posed earlier about the referent of the Hamilton-Jacobi equation and its solution $S$. The idea of an abstract object "a particle trajectory consistent with a given force field" which may be visualised as an ensemble of one each of all trajectories consistent with a given field of force and differing in initial conditions is a key one in the development of Schrodinger's quantum theory and is, at least incipiently, present in the high point of classical mechanics. Of course, a series of purely mathematical manipulations with equations cannot induce the equations to change their referent and change the meaning of the symbols involved; but the arguments and conclusions presented here can be made more acceptable using the method of "characteristic strips" in the theory of the equivalence of some partial differential equations to sets of ordinary differential equations.
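The claim that a single function $S$ refers to a whole ensemble of trajectories, distinguished only by initial conditions, can be illustrated with a small numerical sketch. Everything specific in it is an assumption made for illustration: a single free particle, the plane-wave action $S = p \cdot q - Et$ with $E = |p|^2/2m$, and the helper names `action`, `gradient` and `follow_normals`. It is not the method of characteristic strips itself, only a crude Euler integration of $dq/dt = \nabla S / m$.

```python
# Sketch only: one action function S, many trajectories.
# Assumed for illustration: a single free particle with
# S(q, t) = p.q - E t and E = |p|^2 / (2 m).

def action(q, t, p, mass):
    """Free-particle action S = p.q - E t for a fixed momentum vector p."""
    energy = sum(c * c for c in p) / (2.0 * mass)
    return sum(pc * qc for pc, qc in zip(p, q)) - energy * t

def gradient(q, t, p, mass, h=1e-6):
    """Central-difference gradient of S with respect to the coordinates."""
    grad = []
    for axis in range(len(q)):
        plus = list(q)
        minus = list(q)
        plus[axis] += h
        minus[axis] -= h
        grad.append((action(plus, t, p, mass) - action(minus, t, p, mass)) / (2.0 * h))
    return grad

def follow_normals(start, p, mass, t_end, steps=400):
    """Euler integration of dq/dt = grad S / m from a chosen starting point."""
    dt = t_end / steps
    q = list(start)
    for step in range(steps):
        g = gradient(q, step * dt, p, mass)
        q = [qc + dt * gc / mass for qc, gc in zip(q, g)]
    return q

mass = 2.0
p = (1.0, -2.0, 0.5)
# Three different initial conditions, one and the same function S.
starts = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (-3.0, 0.5, 2.0)]
displacements = []
for start in starts:
    end = follow_normals(start, p, mass, t_end=4.0)
    displacements.append([e - s for e, s in zip(end, start)])
```

Each starting point yields its own trajectory, but all three displacements agree: the trajectories are parallel straight lines, members of one family carried by one $S$.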
4.3. Solutions of the H-J Equation
There are two points of view which may be taken about the solutions of the Hamilton-Jacobi (H-J) equation:

• That of Jacobi, which concentrates attention on the use of $S$ as a kind of aid to generate the trajectories of the particle(s) ($q^i(t)$).
• Hamilton's development of the analogy between classical particle mechanics and optics; the possible particle trajectories ("rays") are the normals to the surfaces of constant $S$ which are compared to wave-fronts in the optical model.

² With some obvious exceptions like sources of potential.

From the perspective of the development of quantum mechanics, Hamilton's position is the more interesting. In this section some solutions of the Hamilton-Jacobi equation are presented for a very simple system: the free particle in three-dimensional space. Of course, we know the solutions of this problem from the solutions of Newton's equation; the motion is in a straight line with uniform velocity. The solutions of the H-J equation must reflect these known solutions but what they do is show how certain families of solutions emerge by separating the H-J equation in various coordinate systems. These families of solutions have direct connections with the solutions of the dynamical equations of Schrodinger's mechanics. There is a straightforward "recipe" for setting up and solving the H-J equation:

• From a knowledge of the Lagrangian, write down the Hamiltonian for the system.
• Replace the momentum components by gradients of the action function $S$.
• This generates a partial differential equation (in 3 spatial dimensions plus the time variable for a single particle).
• Choose coordinate system(s) in which the equation will separate into 3 ordinary spatial differential equations plus one time equation and solve these equations.
• Each of the 3 spatial equations will involve one arbitrary constant which is an initial momentum component and one constant from the time equation which is the initial energy (which may be a constant throughout the motion in many cases of interest).
• Combine the separate solutions into a total solution for $S$.
• Form the gradients of $S$ with respect to these arbitrary constants and use the resulting expression to fix the initial coordinates.
For the simplest possible case, a single free particle, the basis of this simple recipe is a knowledge of the expression for the kinetic energy ($T$) in terms of the general coordinate velocities ($\dot{q}^i$):

$$T = \frac{1}{2}\sum_{i,k=1}^{3} m_{ik}\,\dot{q}^i \dot{q}^k \tag{4.3.1}$$

where the $m_{ik}$ are products of the mass of the particle and metric coefficients depending on the particular coordinates³ $q^i$. In the more common orthogonal co-ordinate systems the matrix of $m_{ik}$ is diagonal and we have:

$$T = \frac{1}{2}\sum_{i=1}^{3} m_{ii}\,(\dot{q}^i)^2$$

whence the momentum components required to form the Hamiltonian are:

$$p_i = \frac{\partial L}{\partial \dot{q}^i} = \frac{\partial T}{\partial \dot{q}^i} = m_{ii}\,\dot{q}^i$$

4.3.1. Cartesian Coordinates
In Cartesian coordinates:

$$m_{xx} = m_{yy} = m_{zz} = m \text{ (say), the mass of the particle}$$

and the H-J equation for a free particle is

$$\frac{1}{2m}\left[\left(\frac{\partial S}{\partial x}\right)^2 + \left(\frac{\partial S}{\partial y}\right)^2 + \left(\frac{\partial S}{\partial z}\right)^2\right] + \frac{\partial S}{\partial t} = 0$$

where $-\partial S/\partial t = E$ is the energy of the particle. Clearly, this equation separates in Cartesians into a sum of three ordinary differential equations of identical form for $x$, $y$ and $z$ and a simpler, first-order ordinary equation in $t$. Since the particle experiences no potential energy, the energy $E$ is just its kinetic energy:

$$E = \frac{1}{2}m(\dot{x}^2 + \dot{y}^2 + \dot{z}^2).$$

The resulting solution, the sum of the solutions of the separate equations, is:

$$S(x, y, z, t) = p_x x + p_y y + p_z z - Et = p_x x + p_y y + p_z z - \frac{1}{2m}(p_x^2 + p_y^2 + p_z^2)\,t$$

where the $p_\alpha$ are constants clearly identified as the components of the momentum of the particle since

$$\frac{\partial S}{\partial \alpha} = p_\alpha \qquad (\alpha = x, y, z)$$

as required. The surfaces $S(x, y, z)$ = constant are sets of parallel planes and the trajectories are sets of parallel lines normal to these planes; straight lines as Newton's equation $F = ma$ requires. The equations of motion for the system, if they are desired, may be had from the condition

$$\frac{\partial S}{\partial p_x} = x - \frac{p_x}{m}\,t = \text{constant} = x_0 \text{ (say)}$$

(where $x_0$ is the value of $x$ at $t = 0$) or, using $p_x = m v_x$,

$$x = v_x t + x_0.$$

The H-J equation provides a complete solution to the mechanical problem in providing both the (Jacobi) equations of motion and the (Hamilton) "wave fronts" which generate families of trajectories. Insisting that the energy

$$\frac{1}{2m}(p_x^2 + p_y^2 + p_z^2)$$

be a fixed constant generates a family of trajectories, all of which have the same energy; a situation which we shall meet later on.

³ The coefficients in the Jacobian of the transformation between Cartesian coordinates and the $q^i$, assuming that this transformation does not depend on time.

4.3.2. Spherical Polar Coordinates
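In spherical polar coordinates:

$0 \le r < \infty$: the radial distance from the origin
$0 \le \theta \le \pi$: the angle between $r$ and the z-axis of Cartesians
$0 \le \phi < 2\pi$: angle about the z axis from the x axis of Cartesians

$$m_{rr} = m; \qquad m_{\theta\theta} = m r^2; \qquad m_{\phi\phi} = m (r \sin\theta)^2$$

and the H-J equation becomes:

$$\frac{1}{2m}\left\{\left(\frac{\partial S}{\partial r}\right)^2 + \frac{1}{r^2}\left(\frac{\partial S}{\partial \theta}\right)^2 + \frac{1}{r^2 \sin^2\theta}\left(\frac{\partial S}{\partial \phi}\right)^2\right\} + \frac{\partial S}{\partial t} = 0 \tag{4.3.3}$$

Writing $S(r, \theta, \phi; t) = S_r(r) + S_\theta(\theta) + S_\phi(\phi) - Et$ separates this equation into a sum of three ordinary differential equations which differ in form:

$$\left(\frac{dS_\phi}{d\phi}\right)^2 = a^2 \quad \text{(because } \phi \text{ does not occur in the Hamiltonian)}$$

$$\left(\frac{dS_\theta}{d\theta}\right)^2 = b^2 - \frac{a^2}{\sin^2\theta}$$

$$\left(\frac{dS_r}{dr}\right)^2 = 2mE - \frac{b^2}{r^2}.$$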
We know that the trajectories must be straight lines so, by setting $a = b = 0$ we obtain those trajectories with zero angular momentum about the chosen origin; that is, trajectories all of which pass through this origin. In this case we obtain the very simple solution for $S$:

$$S(r, \theta, \phi; t) = \sqrt{2mE}\,r - Et.$$

The "wave fronts" are spheres in this case and the trajectories corresponding to this simple solution are all those of a given energy $E$ which pass through a given point; the (arbitrarily-chosen) origin of spherical polar coordinates. We may, by dropping the requirement that $a = 0$ or $b = 0$ (or both) obtain other families of trajectories with given energies and given angular momenta. Combining the ideas that the particle trajectories must be straight lines and that the angular momentum of each particle on such a trajectory is conserved it is easy to see, without an explicit calculation, that a family of trajectories with a given constant angular momentum must be all those whose perpendicular distance from the origin is a constant since the linear momentum and therefore energy of these trajectories is constant. Any non-zero values of the constants $a$ and $b$ determine a subset of this total collection. In the case of a free particle there is obviously no preferred point in space for the origin of either Cartesian or Spherical polar coordinates. In the case of Spherical polars, therefore, we have the unusual situation of constant angular momentum about any point. The values of the constant for a particular trajectory will depend on choice of origin, of course.
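As a rough numerical check on the two free-particle solutions discussed so far, the sketch below evaluates the left-hand side of the free-particle H-J equation, $\frac{1}{2m}|\nabla S|^2 + \partial S/\partial t$, by central differences for the Cartesian solution $S = p \cdot x - Et$ and for the radial solution $S = \sqrt{2mE}\,r - Et$. It also applies the Jacobi condition $\partial S/\partial p_x = x_0$ to a point on a known trajectory. The function names, the numerical values and the finite-difference step are all illustrative assumptions.

```python
import math

def s_cartesian(x, y, z, t, p, mass):
    """Cartesian free-particle solution S = p.x - E t with E = |p|^2 / (2 m)."""
    px, py, pz = p
    energy = (px * px + py * py + pz * pz) / (2.0 * mass)
    return px * x + py * y + pz * z - energy * t

def s_radial(x, y, z, t, energy, mass):
    """Zero-angular-momentum solution S = sqrt(2 m E) r - E t."""
    r = math.sqrt(x * x + y * y + z * z)
    return math.sqrt(2.0 * mass * energy) * r - energy * t

def hj_residual(s, point, t, mass, h=1e-5):
    """(1/2m)|grad S|^2 + dS/dt by central differences (free particle)."""
    x, y, z = point
    ds_dx = (s(x + h, y, z, t) - s(x - h, y, z, t)) / (2.0 * h)
    ds_dy = (s(x, y + h, z, t) - s(x, y - h, z, t)) / (2.0 * h)
    ds_dz = (s(x, y, z + h, t) - s(x, y, z - h, t)) / (2.0 * h)
    ds_dt = (s(x, y, z, t + h) - s(x, y, z, t - h)) / (2.0 * h)
    return (ds_dx ** 2 + ds_dy ** 2 + ds_dz ** 2) / (2.0 * mass) + ds_dt

def initial_x(x, y, z, t, p, mass, h=1e-5):
    """Jacobi condition: dS/dp_x evaluated on a trajectory returns x_0."""
    px, py, pz = p
    plus = s_cartesian(x, y, z, t, (px + h, py, pz), mass)
    minus = s_cartesian(x, y, z, t, (px - h, py, pz), mass)
    return (plus - minus) / (2.0 * h)

mass = 2.0
p = (1.0, -2.0, 0.5)
energy = 3.0
point, t = (1.0, 2.0, -0.5), 3.0

res_cartesian = hj_residual(
    lambda x, y, z, tt: s_cartesian(x, y, z, tt, p, mass), point, t, mass)
res_radial = hj_residual(
    lambda x, y, z, tt: s_radial(x, y, z, tt, energy, mass), point, t, mass)

# A point on the trajectory x = x0 + (p_x / m) t should give back x0.
x0 = 1.5
x_at_t = x0 + (p[0] / mass) * t
recovered_x0 = initial_x(x_at_t, 0.0, 0.0, t, p, mass)
```

Both residuals should vanish to within finite-difference error, and `recovered_x0` should reproduce the chosen `x0`; the radial solution is evaluated away from the origin, where $r$ is differentiable.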
It is obvious that this separation technique in Spherical polars applies to a particle in a central field of force $V(r)$ if the source of that field is at the (now unique) origin of coordinates since only the equation for $S_r$ is changed:

$$\left(\frac{dS_r}{dr}\right)^2 = 2m\left(E - V(r)\right) - \frac{b^2}{r^2}.$$

The solution of this equation is more difficult than that for the free particle but some qualitative features are the same:

• Trajectories with $a = b = 0$ (zero angular momentum) are straight lines (in this case of variable velocity) and pass through the origin. In this case the particle will impact on the source of potential.
• Other trajectories form the familiar circular, elliptical, parabolic or hyperbolic orbits which have non-zero angular momentum with magnitude and spatial orientation being determined by the values of $a$ and $b$.

4.3.3. Comparisons
Thus, we can see that in the case of the free particle of constant energy $E$, for example:

• All the trajectories are indeed straight lines with constant linear momentum (and constant angular momentum about any point).
• The various possible separations of the H-J equation in different choices of the familiar 11 orthogonal coordinate systems generate families (abstract objects, ensembles) of trajectories with some particular properties in common.
• These ensembles share a given spatial symmetry type or, what amounts to the same thing, they have common values of certain dynamical variables. In the ensembles resulting from a separation in spherical polars for example, for a given choice of $E$, $b$ and $a$, all particles having one of these trajectories have the same energy and angular momenta.

One point which I shall have reason to mention in Chapter 11 concerns the conservation laws which are characteristic of particular families of trajectories. For a free particle every concrete trajectory is a straight line with constant momentum ($p$) and constant angular momentum ($\ell$).⁴ What is more, any family of trajectories (chosen, for example, by a given separation of coordinates) will consist of a family of straight-line trajectories with particular common properties:

• Any one family obtained by separation in Cartesians all have the same constant linear momentum when the "initial conditions" are fixed.
• Similarly, any one family obtained by separation in spherical polars all have the same constant angular momentum when the "initial conditions" are fixed.

⁴ For a particular choice of origin.

However, and this is the point which is crucial to Chapter 11,

The trajectories from one family with constant angular momentum cannot all have the same value of linear momentum and vice versa even though each individual concrete trajectory of each family has constant values of both dynamical variables.

Specifically, it is obvious that the linear momentum of any individual free-particle trajectory in the family "trajectories of zero angular momentum passing through the origin" is constant but the (vector) value of that constant varies from trajectory to trajectory within that family: their directions are all different. It cannot be overemphasised here that, for our simple example of a free particle of constant energy:

• Every individual concrete trajectory has constant values of both (vector) linear momentum and (vector) angular momentum. Indeed, if they did not have perfectly definite values of each of their properties they would not be concrete trajectories.
• Families of trajectories may be generated by separating and solving the H-J equation in various co-ordinate systems and each member of such a family has some shared constant linear or angular momenta but within a family of constant linear momentum there are members with different angular momentum and vice versa.
• There are, in fact, at least 11 such families corresponding to the 11 orthogonal co-ordinate systems; each of which has its own characteristic set of constant momenta which are not all different.

There is, then, no mystery here:
• It is intuitively completely obvious that one cannot choose a set of all possible different trajectories passing through a given point in 3D space (i.e. of constant angular momentum zero) without all of them having different directions. That is, all of these trajectories have different linear momentum vectors.
• This result has nothing to do with the simultaneous measurability of these sets of dynamical variables; it is merely a result of choosing families of trajectories in certain systematic ways. It is perfectly possible to measure both linear and angular momenta⁵ of a concrete trajectory; indeed the angular momentum is completely determined by the linear momentum and the position of the origin in this simple case.

These results are quite general and independent of the choice of particular example; in Chapter 11 I will explore the quantum analogue of this method of choosing families of trajectories and attempt to clear up the confusion surrounding this method. To risk labouring this important point we may look at an intermediate case: the free particle in cylindrical coordinates.

⁵ I have purposely avoided any discussion of the simultaneous existence of more than one component of the angular momentum "vector" in classical mechanics, even though the answer to this question strengthens the similarity between the classical and quantum cases. In fact, as I shall establish in Chapter 5, only the absolute magnitude and one component of the angular momentum may be measured, even in classical mechanics. In this chapter, I am concerned with concentrating on the interpretation of the H-J equation and not muddying the waters with details of the separate problem of the interpretation of angular momentum.

4.3.4. Cylindrical Coordinates

Circular cylindrical coordinates provide an interesting intermediate case, in that they involve both linear and angular momenta. The three independent coordinates are:

$0 \le \rho < \infty$: the radial distance from the origin in the xy plane of Cartesians
$-\infty < z < \infty$: identical to z of Cartesians
$0 \le \phi < 2\pi$: angle about the z axis from the x axis of Cartesians

$$m_{\rho\rho} = m_{zz} = m; \qquad m_{\phi\phi} = m\rho^2$$

giving a H-J equation of the form

$$\frac{1}{2m}\left\{\left(\frac{\partial S}{\partial \rho}\right)^2 + \left(\frac{\partial S}{\partial z}\right)^2 + \frac{1}{\rho^2}\left(\frac{\partial S}{\partial \phi}\right)^2\right\} + \frac{\partial S}{\partial t} = 0 \tag{4.3.4}$$

which separates in the usual way into:

$$\frac{dS_\phi}{d\phi} = a, \qquad \frac{dS_z}{dz} = b \quad \text{(because neither } z \text{ nor } \phi \text{ occurs in the Hamiltonian)}$$

$$\left(\frac{dS_\rho}{d\rho}\right)^2 = 2mE - \left(b^2 + \frac{a^2}{\rho^2}\right).$$
The solutions of these equations, as usual, generate families of trajectories with common values of radial momentum (in the p direction, perpendicular to the z-axis), linear momentum (in the z direction) and angular momentum (about the z axis) depending on the choice of the separation constants a and b. As in the other two cases, all the concrete trajectories are motion in a straight line with constant velocity and constant angular momentum but, this time, the families generated by the separation of the H-J equation in cylindrical coordinates have a set of common properties which differ from those of the earlier (Cartesian and spherical polar) families; in particular, each family has common values of two linear momenta and one angular momentum.
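The central point of the comparisons in this section, that a family sharing one dynamical variable cannot also share the other although every concrete trajectory has constant values of both, can be checked with a few lines of arithmetic. The sketch below is illustrative only: the two small families, the numerical values and the helper names are assumptions chosen for the purpose, with angular momentum computed as $r \times p$ about the coordinate origin.

```python
def cross(a, b):
    """Vector product a x b of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def position(start, momentum, mass, t):
    """Free-particle position at time t: straight line, uniform velocity."""
    return tuple(s + pc * t / mass for s, pc in zip(start, momentum))

def angular_momentum(start, momentum, mass, t):
    """L = r x p about the coordinate origin, evaluated at time t."""
    return cross(position(start, momentum, mass, t), momentum)

mass = 1.0
p_magnitude = 2.0  # the same |p|, hence the same energy, for every trajectory

# Family A (Cartesian-type): one common momentum vector, different starting points.
common_p = (0.0, 0.0, p_magnitude)
family_a = [((1.0, 0.0, 0.0), common_p),
            ((0.0, 2.0, 0.0), common_p),
            ((0.0, 0.0, 0.0), common_p)]

# Family B (spherical-type, a = b = 0): all trajectories pass through the origin.
directions = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.6, 0.0, 0.8)]
family_b = [((0.0, 0.0, 0.0), tuple(p_magnitude * d for d in direction))
            for direction in directions]

l_family_a_start = [angular_momentum(s, p, mass, 0.0) for s, p in family_a]
l_family_a_later = [angular_momentum(s, p, mass, 5.0) for s, p in family_a]
l_family_b = [angular_momentum(s, p, mass, 5.0) for s, p in family_b]
p_family_b = [p for _, p in family_b]
```

In family A every member has the same linear momentum but the angular momenta differ from member to member, while each stays constant in time; in family B every member has zero angular momentum but the momentum vectors all differ.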
4.4. Distribution of Trajectories
The H-J equation

$$H\left(q^i, \frac{\partial S}{\partial q^i}, t\right) + \frac{\partial S}{\partial t} = 0$$

generates the trajectories (or "rays") of the particles whose motion is described by the Hamiltonian $H$ and, for a given problem, one might reasonably expect that the number of trajectories is constant; trajectories are neither created nor destroyed by the evolution of the motion. If we use the more colloquial interpretation of the solution of the H-J equation as referring to an ensemble⁶ of particle trajectories consistent with $H$, rather than the abstract object "particle trajectory", then this result might be heuristically interpreted to be equivalent to the conservation of ensemble particles. This fact ought to be represented by a conservation equation and indeed it is. If $\rho(q^i; t)$ is the density of trajectories in the $q^i$ space (the density of particles in the ensemble interpretation) then for the function $S$ which solves the H-J equation, there is a conservation equation which connects $S$ and $\rho$:

$$\frac{\partial \rho}{\partial t} + \sum_i \frac{\partial}{\partial q^i}\left(\rho\,\frac{\partial H}{\partial(\partial S/\partial q^i)}\right) = 0 \tag{4.4.10}$$

which, because of the definition of velocity ($\dot{q}^i$) in Hamiltonian dynamics

$$\dot{q}^i = \frac{\partial H}{\partial p_i} = \frac{\partial H}{\partial(\partial S/\partial q^i)}$$

becomes

$$\frac{\partial \rho}{\partial t} + \sum_i \frac{\partial (\rho\,\dot{q}^i)}{\partial q^i} = 0$$

and the quantity $\rho(q^i; t)\,\dot{q} = J$, say, may be interpreted in the way familiar from electrodynamics as a "current" of trajectories (or particles) to give:

$$\frac{\partial \rho}{\partial t} + \nabla \cdot J = 0.$$

The differential equations for the two functions $S$ and $\rho$ may be generated from an all-embracing variational principle

$$\delta \int dV \int dt\, \mathcal{L} = \delta \int dV \int dt\, \rho \left[ H\left(\frac{\partial S}{\partial q^i}, q^i; t\right) + \frac{\partial S}{\partial t} \right] = 0 \tag{4.4.11}$$

when variations are allowed in the forms of the functions $S$ and $\rho$. Carrying out the formal mathematical treatment of $\mathcal{L}$ as a "field Lagrangian density" shows that the functions $S$ and $-\rho$ are a conjugate pair of "coordinate and momentum"!⁷ Here we are so close to Schrodinger's theory⁸ that we can almost touch it. What is required now are two things:

• A different interpretation of the formalism.
• Some connection between the two functions $\rho$ and $S$ to reduce the number of unknown functions from two to one and the physical consequences of this connection.

The first problem will be addressed in due course. That the two functions should be connected is not difficult to see, at least qualitatively. Gradients of the action function $S$ are the momentum components which (apart from masses and metric tensor components) are the velocities of the particles. Now, the faster a particle is moving at a particular point, the less time it spends near that point and so the less likely it is to be in the vicinity of the point; as we have seen earlier the probability distribution function for a particle on a trajectory is inversely proportional to its velocity. Thus, gradients of $S$ should influence the values of $\rho$.

⁶ We are then at one with Einstein; this is the ensemble of all possible trajectories with different initial conditions for a particle in the environment described by $H$.
⁷ Where the partial differential is replaced by a so-called functional differential $\delta\mathcal{L}/\delta S$.
⁸ And to field theories and "second quantisation".
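The way in which the conservation equation ties $\rho$ to $S$ can be seen in a small numerical sketch. It assumes, purely for illustration, the free-particle radial solution $S = \sqrt{2mE}\,r - Et$, for which the velocity field $\nabla S/m$ points radially outwards with constant speed $\sqrt{2E/m}$, and it tests two time-independent trial densities: $\rho \propto 1/r^2$ and a uniform $\rho$. For a stationary density the conservation equation reduces to $\nabla \cdot J = 0$; the divergence is estimated by central differences. All names and values are assumptions.

```python
import math

def velocity(point, energy, mass):
    """v = grad S / m for S = sqrt(2 m E) r - E t: constant speed, radially outward."""
    x, y, z = point
    r = math.sqrt(x * x + y * y + z * z)
    speed = math.sqrt(2.0 * energy / mass)
    return (speed * x / r, speed * y / r, speed * z / r)

def current(point, density, energy, mass):
    """Trajectory current J = rho * v at a point."""
    rho = density(point)
    return tuple(rho * v for v in velocity(point, energy, mass))

def divergence(field, point, h=1e-4):
    """Central-difference estimate of div(field) at a point."""
    total = 0.0
    for axis in range(3):
        plus = list(point)
        minus = list(point)
        plus[axis] += h
        minus[axis] -= h
        total += (field(tuple(plus))[axis] - field(tuple(minus))[axis]) / (2.0 * h)
    return total

def inverse_square(point):
    """Trial density falling off as 1 / r^2."""
    return 1.0 / sum(c * c for c in point)

def uniform(point):
    """Trial density that is the same everywhere."""
    return 1.0

energy, mass = 3.0, 2.0
point = (1.0, 2.0, -0.5)  # any point away from the origin

div_inverse_square = divergence(
    lambda q: current(q, inverse_square, energy, mass), point)
div_uniform = divergence(
    lambda q: current(q, uniform, energy, mass), point)
```

The $1/r^2$ density gives a divergence-free current, so it is a stationary solution of the conservation equation for this $S$; the uniform density does not, leaving a divergence of $2\sqrt{2E/m}/r$. In this free-particle case the speed is constant, so the variation of $\rho$ comes from the geometrical spreading of the rays fixed by $\nabla S$ rather than from a change of speed.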
4.5. Summary
I have already said that there can be no question of "deriving" quantum mechanics from classical mechanics since, as we shall see, their referents are different and their descriptions of the energetics and distributions of particles are very different so it is worth stressing a few points:

• At bottom, the result of solving the Hamilton-Jacobi equation is a set of trajectories for particle(s) in a given environment (potential energy function and mutual interactions). These results could have been obtained by solving Hamilton's, Lagrange's or, indeed, Newton's equations of motion.
• What has been achieved by deriving the H-J equation is a transfer of viewpoint of the problems of classical particle mechanics from "finding a trajectory" to the realisation that, roughly speaking,⁹ every point in the available space is capable of being on some trajectory. These trajectories differ, not in the particles' environment, but in the particular "initial conditions" which the particles have in that environment. Thus, the solutions of the H-J equation enable all the trajectories to be found.
• By solving the H-J equation by the technique of separation of variables, one can obtain "families" of trajectories with particular properties in common as we have seen; all the trajectories of a free particle are straight lines with constant momentum but these trajectories may be collected together in various ways; all trajectories parallel to a given line or all trajectories passing through a given point, etc.
• Although the analogy is tempting, distributions of particle trajectories are not distributions of particles; in a one-particle system described by classical mechanics (Newton's equation or H-J) the equations of motion and the initial conditions fix the trajectory. Certainly the position probability distribution of the particle along that trajectory may be computed from a knowledge of its velocity¹⁰ but this position probability distribution is only non-zero along that trajectory and there is no 3-dimensional probability distribution function for the particle.
• Precisely because the H-J equation was Schrodinger's starting point for the development of his quantum mechanics, there is a considerable literature on the relationship between the H-J equation and the equations of Schrodinger's mechanics. But Schrodinger's mechanics cannot be obtained from classical mechanics by mere mathematical manipulation.

Notice that some care has been taken to distinguish between a concrete particle trajectory, an ensemble of possible trajectories and an abstract particle trajectory; distinctions which will become crucial in what follows.

⁹ Obvious exceptions are point sources of potential.
¹⁰ As we did for the pendulums in Chapter 2.
Appendix 4.A Transformation Theory
Here is the very briefest and most elementary "derivation" of the Hamilton-Jacobi equation and the Poisson bracket formalism. It is for illustration purposes only and will not stand close scrutiny. The solution of Hamilton's canonical equations:

$$\frac{\partial H}{\partial q^i} = -\dot{p}_i \tag{4.A.1}$$

$$\frac{\partial H}{\partial p_i} = \dot{q}^i \tag{4.A.2}$$

is obviously enormously simplified if the Hamiltonian function is independent of some of the coordinates $q^i$; since in this case the left-hand-side of (4.A.1) is zero and so $\dot{p}_i = 0$ ($p_i$ = constant) integration is immediate. Similar comments hold for the identities (4.A.2). Clearly, if a co-ordinate system could be found for which the Hamiltonian was independent of all coordinates, then the solution of the equations of motion would be trivial: all momenta would be constants. Before spending too much time in searching for such a co-ordinate system (which would obviously depend on the potential in which the particles move) consider the physical interpretation of such a system. If all momenta are constants, then because of Newton's law there are no forces acting on the particles (or bodies in general) and so, apparently, this co-ordinate system can only be found for the trivial case of non-interacting particles in free motion (or free rotation of extended bodies). The only way in which one could hope to generate a co-ordinate system with the sort of properties we require for non-trivial mechanical problems is to allow the origins of the coordinates to move and "follow" the particles; so that the momenta in the moving co-ordinate system can be constant in the presence of a potential and inter-particle interaction. Now, in the Hamiltonian formulation of mechanics we have an ideal vehicle for the construction of such moving co-ordinate systems; the coordinates and momenta are independent variables and the canonical equations (4.A.1) and (4.A.2) are formally almost identical. So, if we admit transformation of coordinates and momenta which "mix" the original coordinates and momenta in the definition of the new sets we should be able to generate a set of canonical equations in terms of the new variables which do have the desirable properties. That is, we seek a transformation to new variables $Q^j$, $P_j$ such that

$$Q^j = Q^j(q^i, p_i, t) \tag{4.A.3}$$

$$P_j = P_j(q^i, p_i, t) \tag{4.A.4}$$

for which

$$\frac{\partial h}{\partial Q^j} = -\dot{P}_j = 0 \quad (P_j = \text{constant}) \tag{4.A.5}$$

$$\frac{\partial h}{\partial P_j} = \dot{Q}^j \tag{4.A.6}$$
where $h$ is the Hamiltonian expressed in terms of the variables $Q^i$, $P_j$. Now, there are several points to consider when setting out on such a venture:

1. The transformations (4.A.3) should be a complete and non-redundant set if the original $q^i$ and $p_i$ were; this condition can be expressed in the usual way as the non-vanishing of a Jacobian.
2. The transformation (4.A.3) must be such that the transformed equations of motion stay within the canonical Hamiltonian formalism, i.e. are indeed of the type (4.A.5) and (4.A.6).
3. Since the transformation (4.A.3) "mixes" co-ordinates and momenta of the original "intuitive" type, the new, transformed, equations (4.A.5) and (4.A.6) cannot be as easily distinguished as the original equations (4.A.1) and (4.A.2). That is, while (4.A.1) is an equation of motion and (4.A.2) is an identity in the original formulation, in the transformed equations the "equations of motion" may be inextricably intermingled with the "identities" and so we are forced to treat equations (4.A.3) and (4.A.5) and, by implication, (4.A.1) and (4.A.2) on the same footing. If we do this then we may as well be hung for a sheep as for a lamb and seek transformations which make both the $P_i$ and the $Q^i$ constants.

It must be said at this point that the equations describing the motions of a system of particles cannot be solved by sleight of hand. In attempting to find a particularly trivial form of the equations of motion by use of a transformation we are simply pushing the difficulties "out of sight" into the generation of the transformation. None of these manipulations are being carried through as aids to the practical solution of problems in mechanics; our aim is to obtain the most general form of the mechanical principles in order to throw light on the formalism and physical interpretation of the quantum mechanics of systems of particles.

There are no intuitive guides to be used in seeking the transformations (4.A.3) and so we must fall back on general principles. The most general formulation of Hamiltonian equations is the co-ordinate-free variational formulation:

$$\delta \int_{t_1}^{t_2} \left[ \sum_{i=1}^{3N} p_i \dot{q}^i - H \right] dt = 0 \qquad (4.A.7)$$
from which the canonical equations (4.A.1) and (4.A.2) follow. If we wish, therefore, to use only transformations (4.A.3) which remain in the Hamiltonian formalism we require

$$\delta \int_{t_1}^{t_2} \left[ \sum_{i=1}^{3N} P_i \dot{Q}^i - h \right] dt = 0 \qquad (4.A.8)$$
in addition to (4.A.7). Now the variational problem

$$\delta \int_{t_1}^{t_2} f \, dt = 0 \qquad (4.A.9)$$

can always be solved by a function whose total time derivative is $f$, i.e. if

$$f = \frac{dF}{dt}$$

then

$$\delta \int_{t_1}^{t_2} f \, dt = \delta \left( F(t_2) - F(t_1) \right) = 0$$
identically. Or, what amounts to the same thing, the variation principle only determines the optimising integrand to within an additive total time
derivative. This means, of course, that when $f$ is a known function of some arguments other than just $t$, like $q^i$, $\dot{q}^i$, $p_i$ in

$$f = \sum_{i=1}^{3N} p_i \dot{q}^i - H(q^j, p_j)$$

then, when the variational problem is solved, the integral which solves the problem is a function of $t$ only. That is, of course, that (in our case) $q^i$ and $p_i$ are then fixed as known functions of $t$ which, if we choose to do it, we may substitute in the integrand for $q^i$ and $p_i$ and obtain

$$\frac{dG(t)}{dt} = \sum_{i=1}^{3N} p_i \dot{q}^i - H(q^j, p_j)$$
explicitly. We can do this for both the original variables in (4.A.7) and the transformed variables in (4.A.8); both results are functions of $t$ only, so we may combine (4.A.7) and (4.A.8) to give

$$\delta \int_{t_1}^{t_2} \left[ \left( \sum_{i=1}^{3N} p_i \dot{q}^i - H \right) - \left( \sum_{i=1}^{3N} P_i \dot{Q}^i - h \right) \right] dt = 0$$

It is clear that the integrand can be set equal to the total time derivative of an arbitrary function $F$ (say)

$$\left( \sum_{i=1}^{3N} p_i \dot{q}^i - H \right) - \left( \sum_{i=1}^{3N} P_i \dot{Q}^i - h \right) = \frac{dF}{dt} \qquad (4.A.10)$$
In obtaining this equation we have put no requirements on the transformation (4.A.3); we have not yet sought a transformation which will generate constant $Q^i$ and $P_i$. However, it is clear that $F$ contains a characterisation of the transformation since it is a function of the $q^i$, $p_i$, $Q^i$ and $P_i$; the question is "how do we extract (4.A.3) from (4.A.10)"? First of all, although $F$ is a function of the $q^i$, $p_i$, $Q^i$, $P_i$ (and $t$), only $6N$ of these $12N$ variables can be independent because of (4.A.3). We may choose which $6N$ at our pleasure. To illustrate the procedure we choose $q^i$ and $Q^i$

$$F = F(q^i, Q^i, t)$$

so that

$$\frac{dF}{dt} = \sum_{i=1}^{3N} \frac{\partial F}{\partial q^i} \dot{q}^i + \sum_{i=1}^{3N} \frac{\partial F}{\partial Q^i} \dot{Q}^i + \frac{\partial F}{\partial t} \qquad (4.A.11)$$
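Now both (4.A.10) and (4.A.11) are identities and may be combined to give

$$\sum_{i=1}^{3N} \left( p_i - \frac{\partial F}{\partial q^i} \right) \dot{q}^i - \sum_{i=1}^{3N} \left( P_i + \frac{\partial F}{\partial Q^i} \right) \dot{Q}^i + \left( h - H - \frac{\partial F}{\partial t} \right) = 0 \qquad (4.A.12)$$

which itself is an identity so that the individual coefficients of the $6N + 1$ time derivatives must be separately zero:

$$p_i = \frac{\partial F(q^i, Q^i, t)}{\partial q^i} \qquad (4.A.13)$$

$$P_i = -\frac{\partial F(q^i, Q^i, t)}{\partial Q^i} \qquad (4.A.14)$$

$$(h - H) = \frac{\partial F(q^i, Q^i, t)}{\partial t} \qquad (4.A.15)$$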
Equation (4.A.13) fixes the $Q^i$ in terms of the $q^i$ and $p_i$ and may be solved to generate their explicit form. Once the $Q^i$ are found from (4.A.13) they may be substituted in (4.A.14) to generate the $P_i$ and in (4.A.15) to obtain the new Hamiltonian $h$. These manipulations show that functions like $F$ may be used to generate transformations of the variables in the canonical equations; what is not yet clear is how to choose a particular $F$ which simplifies the canonical equations in order to generate (4.A.5) and (4.A.6). The key lies in the back-substitution of (4.A.13), (4.A.14) and (4.A.15) into the canonical equations, which generates an equation for $F$.

We now suppose that the transformation of coordinates and momenta in which the new coordinates and momenta $Q^i$, $P_i$ are all constants can be found, and express it in terms of an $F$ of the above type. In deference to long-established practice, we call this particular $F$, $S$, and it is a function of the independent "variables" $q^i$, $Q^i$ and $t$ although the $Q^i$ are to be constants ultimately, albeit "independent constants".

$$S = S(q^i, Q^i, t).$$
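As an illustration of how (4.A.13) to (4.A.15) turn a given $F$ into an explicit transformation, the following is a minimal worked sketch for one degree of freedom. The choice $F = qQ$ is an assumption made purely for illustration; it is not a function used in the text.

```latex
% Illustrative generating function (an assumption, not taken from the text):
%   F(q, Q, t) = q Q
\begin{align*}
  p &= \frac{\partial F}{\partial q} = Q
      && \text{from (4.A.13), so } Q = p, \\
  P &= -\frac{\partial F}{\partial Q} = -q
      && \text{from (4.A.14)}, \\
  h - H &= \frac{\partial F}{\partial t} = 0
      && \text{from (4.A.15), so } h = H .
\end{align*}
% The new "coordinate" is the old momentum and the new "momentum" is minus
% the old coordinate: the transformation mixes coordinates and momenta.
```

Even this trivial case shows the "mixing" of coordinates and momenta referred to above: after the transformation, the labels "coordinate" and "momentum" no longer carry their intuitive meaning.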
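We require $S$ such that the transformed Hamiltonian $h$ satisfies

$$\frac{\partial h}{\partial Q^i} = 0, \qquad \frac{\partial h}{\partial P_i} = 0 \qquad (4.A.16)$$

If the transformed Hamiltonian $h$ contains no explicit time dependence we have

$$\frac{\partial h}{\partial t} = 0 \qquad (4.A.17)$$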
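and so the Hamiltonian is a constant, having no dependence on $Q^i$, $P_i$ or $t$. Now the original momenta $p_i$ may be generated from $S$ by the use of (4.A.13) so we may write

$$H(q^i, p_i) \quad \text{as} \quad H\left(q^i, \frac{\partial S}{\partial q^i}\right)$$

and, substituting into (4.A.15), we have

$$H\left(q^i, \frac{\partial S}{\partial q^i}\right) + \frac{\partial S}{\partial t} = h \qquad (4.A.18)$$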
where $h$ (the transformed Hamiltonian) is a constant as we have seen above. This is a partial differential equation for $S$ since we are now going to regard $S$ as a function of the $q^i$ and $t$ only, the $Q^i$ being constants. In fact a trivial re-definition of $S$ enables us to absorb the constant $h$: replacing $S$ by $S - Et$ (where $E$ is the constant value of $h$) enables the equation to be written in a compact form

$$H\left(q^i, \frac{\partial S}{\partial q^i}\right) + \frac{\partial S}{\partial t} = 0 \qquad (4.A.19)$$
where now $S = S(q^i, t)$. This equation, the Hamilton-Jacobi equation, can be set up whenever the Hamiltonian can be formed in terms of the original $q^i$ and $p_i$. If it can be solved, the mechanical problem is solved. Notice that, however appealing the above "derivation" might be, the central assertion (the existence of the function $S$) has not been proved, merely made to seem reasonable.

In the case where the transformed Hamiltonian function is a constant, which as we have seen may be chosen to be zero by a slight re-definition of $S$, we can go back to the original equation (4.A.10) which defined the original general transformation:

$$\left[ \sum_{i=1}^{3N} p_i \dot{q}^i - H \right] - \left[ \sum_{i=1}^{3N} P_i \dot{Q}^i - h \right] = \frac{dF}{dt}$$

If the $Q^i$ are constants then $\dot{Q}^i = 0$ and, if $h$ is chosen to be zero, we have for the special case $F = S$:

$$\frac{dS}{dt} = \sum_{i=1}^{3N} p_i \dot{q}^i - H = L$$

That is, $S = \int L \, dt$, showing the special relationship between this particular transformation function and the original variational formulation of Lagrange's equations. The variational principle is now $\delta S = 0$. It should be noted that this derivation of the transformation equations in general and the Hamilton-Jacobi equation in particular has used the dynamical law by assuming that $F$ (or $S$) is a function of time only: the $q^i$ are fixed functions of $t$. Thus, these are indeed equations and not identities.

We have seen that if the canonical equations were to have the same form in two different co-ordinate systems then there exists a function of the $12N + 1$ variables $q^i$, $p_i$, $Q^j$, $P_j$ and (possibly) $t$ from which the details of the transformation may be obtained. Crucial to this development is the fact that, of the $12N$ variables, only $6N$ may be independent: the other $6N$ are to be generated from these independent ones by the very transformation provided by this function. Now the choice of which $6N$ are chosen as independent variables is (formally if not practically) arbitrary and earlier the most obvious choice was taken: writing the function which generates the transformation from $q^i$ to $Q^i$ as an explicit function of the $q^i$ and $Q^i$. But there are three other obvious possibilities; if our original choice is $F_1(q^i, Q^j, t)$ then, using the initial and final coordinates and momenta as units, we may choose the independent variables in three other ways to define the functions $F_2(q^i, P_j, t)$, $F_3(p_i, Q^j, t)$ and $F_4(p_i, P_j, t)$
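and all are related. Of course there is no reason why one should not choose some $q^i$ and some $p_i$, etc. but such choices have little theoretical or practical interest. We saw earlier that, in addition to the sets $(q^i, p_i)$ and $(Q^j, P_j)$ satisfying the canonical equations, which they were required to do, the $p_i$ and $P_j$ were related to $F_1$ by

$$p_i = \frac{\partial F_1}{\partial q^i}; \qquad P_i = -\frac{\partial F_1}{\partial Q^i}$$

That is

$$\frac{\partial p_i}{\partial Q^j} = \frac{\partial^2 F_1}{\partial Q^j \partial q^i} = -\frac{\partial P_j}{\partial q^i}$$

It is straightforward to show that similar relationships hold for the other choices:

$$\frac{\partial q^i}{\partial Q^j} = \frac{\partial P_j}{\partial p_i}; \qquad \frac{\partial q^i}{\partial P_j} = -\frac{\partial Q^j}{\partial p_i}; \qquad \frac{\partial p_i}{\partial P_j} = \frac{\partial Q^j}{\partial q^i} \qquad (4.A.20)$$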
These four relationships fix the nature of the allowed transformations; that is, since the initial coordinates and momenta are arbitrary, they fix the possible co-ordinate systems in which Hamilton's canonical equations may be expressed, together with the requirement that the Jacobian of the transformation to a known complete and non-redundant set of coordinates be non-zero. It is usual to express these relationships in a form which displays the structure of the results rather than its expression in terms of two co-ordinate systems although, of course, our co-ordinate systems are arbitrary. The Poisson Bracket of $X$ and $Y$ (two quantities depending on coordinates, momenta and possibly time) is defined by:

$$[X, Y]_{q,p} = \sum_{i=1}^{3N} \left( \frac{\partial X}{\partial q^i} \frac{\partial Y}{\partial p_i} - \frac{\partial X}{\partial p_i} \frac{\partial Y}{\partial q^i} \right)$$

and, in particular,

$$[Q^i, P_j]_{q,p} = \sum_{k=1}^{3N} \left( \frac{\partial Q^i}{\partial q^k} \frac{\partial P_j}{\partial p_k} - \frac{\partial Q^i}{\partial p_k} \frac{\partial P_j}{\partial q^k} \right)$$

which, when the relationships (4.A.20) are used, becomes

$$[Q^i, P_j]_{q,p} = \frac{\partial Q^i}{\partial Q^j} = \delta_{ij} = [Q^i, P_j]_{Q,P}$$
and similarly,

$$[Q^i, Q^j]_{q,p} = [Q^i, Q^j]_{Q,P} = 0; \qquad [P_i, P_j]_{q,p} = [P_i, P_j]_{Q,P} = 0.$$

That is, the Poisson Brackets of the coordinates and momentum components are invariant with respect to which co-ordinate system they are evaluated in; so that these relationships, which are derived using the relationships (4.A.20), may serve as diagnostic tests of allowed coordinates and conjugate momenta in the canonical equations.
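The use of the Poisson Bracket as a diagnostic test can be sketched numerically. The following is a minimal illustration for one degree of freedom, assuming central finite differences for the partial derivatives; the two candidate transformations (and the names `Q_good`, `P_good`, `Q_bad`, `P_bad`) are hypothetical examples chosen for the sketch, not transformations discussed in the text.

```python
import math

def poisson_bracket(X, Y, q, p, eps=1e-6):
    """Estimate [X, Y]_{q,p} = sum_i (dX/dq_i dY/dp_i - dX/dp_i dY/dq_i)
    using central finite differences (an assumption of this sketch)."""
    def partial(f, which, i):
        q_up, q_dn = list(q), list(q)
        p_up, p_dn = list(p), list(p)
        if which == "q":
            q_up[i] += eps
            q_dn[i] -= eps
        else:
            p_up[i] += eps
            p_dn[i] -= eps
        return (f(q_up, p_up) - f(q_dn, p_dn)) / (2 * eps)

    return sum(
        partial(X, "q", i) * partial(Y, "p", i)
        - partial(X, "p", i) * partial(Y, "q", i)
        for i in range(len(q))
    )

# Hypothetical candidate 1: Q = arctan(q/p), P = (q^2 + p^2)/2
Q_good = lambda q, p: math.atan2(q[0], p[0])
P_good = lambda q, p: 0.5 * (q[0] ** 2 + p[0] ** 2)

# Hypothetical candidate 2: Q = q^2, P = p
Q_bad = lambda q, p: q[0] ** 2
P_bad = lambda q, p: p[0]

point_q, point_p = [0.7], [1.3]  # arbitrary test point
bracket_good = poisson_bracket(Q_good, P_good, point_q, point_p)
bracket_bad = poisson_bracket(Q_bad, P_bad, point_q, point_p)
print("candidate 1 bracket:", bracket_good)  # close to 1: passes the test
print("candidate 2 bracket:", bracket_bad)   # close to 2q: fails the test
```

A bracket $[Q, P]_{q,p}$ equal to unity at every point is what the test requires; the second candidate gives $2q$ and so is not an allowed pair.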
Chapter 5
Angular Momentum
While many of the more elementary paradoxes in the interpretation of quantum theory are generated by misunderstandings about probability, some more subtle ones are due to confusions about angular momentum. Rotational motion, having some seductive and misleading analogies with rectilinear motion, has proved a stumbling block in both classical and quantum mechanics. The most fertile sources of "mysteries" in quantum theory (the Einstein-Podolsky-Rosen paradox and the related results due to Bell) are exacerbated by this confusion.
Contents
5.1. Coordinates and Momenta
5.2. The Angular Momentum "Vector"
5.3. The Poisson Brackets and Angular Momentum
5.4. Components of the Angular Momentum "Vector"
5.5. Conclusions for Angular Momentum

5.1. Coordinates and Momenta
In everything which has been discussed so far the terms "coordinates" and "momenta" have been used rather informally insofar as we have relied on intuition and ordinary practice to supply a picture of what is meant by a coordinate, and the definition

$$p_i = \frac{\partial L}{\partial \dot{q}^i} \qquad (5.1.1)$$

to provide a conjugate momentum component. The simplest examples show that this usage seems justified; in particular the familiar results in Cartesian
coordinates are all consistent with this general theory. In normal practice "coordinate" at its most complicated usually means a member of one of the familiar 11 orthogonal coordinate systems in three-dimensional space. If one of these coordinate systems is used, then (5.1.1) provides the conjugate momentum components which are, however, sometimes intuitively less accessible. Quite independently of the Lagrangian and Hamiltonian formalisms there are existing historical (Newtonian) definitions of various types of momenta and so it is natural to inquire about the relationship between these, what one might call "naturally occurring", momenta with their associated components and the momentum components conjugate to sets of coordinates generated by Lagrange's definition (5.1.1). That is, under what conditions is a "coordinate" suitable for use in the canonical formalism and under what conditions can a "pre-existing" momentum component associated with such a coordinate be made conjugate to that coordinate and so be brought into the canonical formalism? Naturally this investigation is limited to "coordinates" in the original sense of the term: i.e. specifically excluding transformations which "mix" coordinates and momenta. Elementary considerations are enough to show that there is a whole class of momentum components which cannot be brought into the canonical formalism in spite of their utility and familiarity.

The problem of the interpretation of angular momentum and its components has three aspects which we shall look at separately although all are related:

• The relationship between the so-called angular momentum vector and momenta conjugate to angular coordinates.
• The canonical formalism and Poisson brackets.
• The components of the so-called vector and their interpretation.

To set the scene for this investigation it is worthwhile making the distinction between the pairs (linear momentum, Cartesian coordinates) and (angular momentum, angle coordinates).
The position of a point in ordinary three-dimensional space can be specified by the values of the three Cartesian coordinates $(x, y, z)$ (with respect to some origin) and the linear momentum of a particle in that space can be uniquely specified by the three Cartesian components (linear momentum in each of the three mutually perpendicular directions). Three angles $(\phi, \theta, \chi)$ (rotations about the three Cartesian axes, say) do not, of course, have a member with the dimensions of length so they can, at best, specify a direction in space or, intuitively more accessible, they
can specify the position of a point on a sphere about the origin. But the analogy is incomplete; while the position of a point in space is uniquely specified by the values (x, y, z) and only (x, y, z), the position of a point on a sphere cannot be specified by just the values of three angles. In order to get a unique point on the surface of a sphere, one must specify the order in which the rotations must be performed. It is enough to experiment with rotations of an unsymmetrical rectangular prism to be convinced of this. This asymmetry between lengths and angles destroys the value of the otherwise attractive analogy between the linear and angular momentum vectors.
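The dependence on the order of rotations can be checked in a few lines. The following is a minimal sketch, assuming quarter-turn rotations about the Cartesian $x$ and $y$ axes applied to a point on the unit sphere; the helper names `rotate_x` and `rotate_y` are introduced here purely for illustration.

```python
import math

def rotate_x(v, angle):
    """Rotate the point v = (x, y, z) by `angle` about the x axis."""
    x, y, z = v
    c, s = math.cos(angle), math.sin(angle)
    return (x, c * y - s * z, s * y + c * z)

def rotate_y(v, angle):
    """Rotate the point v = (x, y, z) by `angle` about the y axis."""
    x, y, z = v
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)

start = (0.0, 0.0, 1.0)   # a point on the unit sphere (the "north pole")
quarter = math.pi / 2     # the same two angles are used in both orders

x_then_y = rotate_y(rotate_x(start, quarter), quarter)
y_then_x = rotate_x(rotate_y(start, quarter), quarter)

print("x first, then y:", x_then_y)
print("y first, then x:", y_then_x)
```

The same pair of angle values sends the starting point to two different points on the sphere, which is the sense in which the values of the angles alone do not specify a position.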
5.2. The Angular Momentum "Vector"
In elementary (vectorial) mechanics one defines the angular momentum vector as

$$\vec{\ell} = \vec{r} \times \vec{p} \qquad (5.2.2)$$

and this definition implies that the angular momentum vector can be resolved into components in any coordinate system whether or not any member of the coordinate system is an angle. That is, (5.2.2) is not concerned with the idea of momentum components conjugate to coordinates with the dimensions of angle (i.e. no dimensions); it is simply called the angular momentum for intuitively justifiable reasons. In fact, the most usual form of resolution of $\vec{\ell}$ is

$$\vec{\ell} = \ell_x \hat{\imath} + \ell_y \hat{\jmath} + \ell_z \hat{k} \qquad (5.2.3)$$

in Cartesian components and, of course, $\ell_x$ is not conjugate to $x$, etc. There are a couple of points to be made about this definition:

• The usual description of the angular momentum defined in this way is that $\vec{\ell}$ is "the angular momentum about a point": the origin of $\vec{r}$. But rotation does not occur about a point, it occurs about an axis. This axis is, in fact, provided by the definition; it is the direction of the vector $\vec{\ell}$.
• As it stands, equation (5.2.2) is ambiguous. It depends on the origin of coordinates; different origins give different values for $\vec{\ell}$. These differences are not trivial constants; for a particle rotating about a given axis, the angular momentum defined by (5.2.2) is only constant if the origin is
chosen to be at a point on the axis at the centre of the rotation (the "centre of mass"). The magnitude of the angular momentum is the same for all origins along the axis, but for any origin not at the centre of mass, the direction of the angular momentum vector rotates. I shall silently assume in what follows that the origin is at the centre of mass of any system.

What is very clear is that $\vec{\ell}$ cannot be expressed in a way in which, to each of three linearly independent components (in some coordinate system), there is a conjugate angular coordinate. This is trivially true simply because the position of a point in space cannot be specified by three angles: at least one length is required. One might think of this simple example as suggesting a kind of "inverse problem in canonical coordinates": given a set of momentum components, under what conditions can conjugate coordinates be found which:

1. are a complete and non-redundant set for the problem in hand,
2. regenerate the given momentum components via (5.1.1).

Now among the familiar orthogonal coordinate systems there are some which have two angles and one length as their dimensions. But the problem with the resolution of the angular momentum vector is more acute than we have suggested; as we shall see shortly, it is possible to make only one of the three components of $\vec{\ell}$ into a canonically conjugate momentum. Perhaps the most direct explanation of why this is so is by way of an explicit example: the transformation between Cartesian and spherical polar coordinates. Taking a single particle of mass $m$ with velocity $\vec{v} = \dot{\vec{r}}$ and momentum $\vec{p} = m\vec{v} = m\dot{\vec{r}}$ we have

$$\vec{\ell} = \vec{r} \times \vec{p} = m(\vec{r} \times \dot{\vec{r}})$$

and the Cartesian components of $\vec{\ell}$ are:

$$\ell_x = m(y\dot{z} - z\dot{y}), \qquad \ell_y = m(z\dot{x} - x\dot{z}), \qquad \ell_z = m(x\dot{y} - y\dot{x})$$
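using the relationship between Cartesians and spherical polars:

$$x = r \sin\theta \cos\phi, \qquad y = r \sin\theta \sin\phi, \qquad z = r \cos\theta$$

gives

$$\ell_x = -I(\dot{\theta} \sin\phi + \dot{\phi} \cos\theta \sin\theta \cos\phi)$$
$$\ell_y = I(\dot{\theta} \cos\phi - \dot{\phi} \cos\theta \sin\theta \sin\phi)$$
$$\ell_z = I_\phi \dot{\phi} = (I \sin^2\theta)\dot{\phi}$$

where $I = mr^2$ and $I_\phi$ is the "moment of inertia" of a particle about the $z$ axis. Now for a Lagrangian of the simple form

$$L = \frac{1}{2} m v^2 - V = \frac{1}{2} m \left( \dot{r}^2 + r^2 \dot{\theta}^2 + r^2 \sin^2\theta \, \dot{\phi}^2 \right) - V.$$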
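The spherical-polar expressions for the Cartesian components given above can be checked against the direct evaluation of $m(\vec{r} \times \dot{\vec{r}})$. The following is a minimal sketch; the numerical values of the mass, coordinates and velocities are arbitrary assumptions chosen only to exercise the formulas.

```python
import math

def cartesian_state(r, theta, phi, r_dot, theta_dot, phi_dot):
    """Position and velocity in Cartesians from spherical polar values."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    position = (r * st * cp, r * st * sp, r * ct)
    velocity = (
        r_dot * st * cp + r * ct * cp * theta_dot - r * st * sp * phi_dot,
        r_dot * st * sp + r * ct * sp * theta_dot + r * st * cp * phi_dot,
        r_dot * ct - r * st * theta_dot,
    )
    return position, velocity

def components_from_cross_product(mass, position, velocity):
    """Cartesian components of m (r x r_dot)."""
    x, y, z = position
    vx, vy, vz = velocity
    return (mass * (y * vz - z * vy),
            mass * (z * vx - x * vz),
            mass * (x * vy - y * vx))

def components_from_polar_formulas(mass, r, theta, phi, theta_dot, phi_dot):
    """The same components from the spherical-polar expressions in the text."""
    inertia = mass * r * r  # I = m r^2
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    return (-inertia * (theta_dot * sp + phi_dot * ct * st * cp),
            inertia * (theta_dot * cp - phi_dot * ct * st * sp),
            inertia * st * st * phi_dot)

# Arbitrary illustrative values (assumptions of this sketch)
mass = 1.5
r, theta, phi = 2.0, 0.9, 0.4
r_dot, theta_dot, phi_dot = 0.3, -0.5, 0.8

position, velocity = cartesian_state(r, theta, phi, r_dot, theta_dot, phi_dot)
from_cartesian = components_from_cross_product(mass, position, velocity)
from_polar = components_from_polar_formulas(mass, r, theta, phi, theta_dot, phi_dot)
print(from_cartesian)
print(from_polar)
```

Note that the radial velocity `r_dot` drops out of all three components, as it must for a cross product with $\vec{r}$.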
Lagrange's definition of the momentum conjugate to $\phi$ is

$$p_\phi = \frac{\partial L}{\partial \dot{\phi}}$$

which is indeed $I_\phi \dot{\phi}$ and so the angular variable $\phi$ and the angular momentum component $\ell_z$ are indeed conjugate. But $\ell_z$ is the $z$-component of the original angular momentum, not the $\phi$-component! Of course, $\phi$ is "tied" to the choice of $z$-direction (it is an angle of 0 to $2\pi$ around the $z$-axis) and we could have defined $\phi$ in an analogous way around the $x$-axis and so obtained $\ell_x = I_\phi \dot{\phi}$. But we cannot do both, if only for the simple reason that angles of 0 to $2\pi$ around two mutually perpendicular axes cover the sphere twice, making such a putative coordinate system redundant and, incidentally, showing that there is no possibility of a non-redundant set of coordinates containing two, let alone three, canonical angular momentum components. In fact, once one component of the angular momentum "vector" has been chosen as conjugate to an angular
variable this choice excludes other angular momentum components from being conjugate to any other angular coordinate in any coordinate system.

Some further insight into the problem may be obtained by considering ignorable coordinates, the starting point of the transformation theory. In Cartesians if

$$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y} = \frac{\partial L}{\partial z} = 0$$

this usually implies the potential is a constant (say zero) and the vanishing of the above derivatives implies (via the Lagrange equations) the constancy of the three conjugate momentum components and we interpret this constancy by saying that in the absence of a potential function three-dimensional space is isotropic with respect to linear displacement. Now, in spherical polars a free-particle Lagrangian is

$$L = \frac{1}{2} m \left( \dot{r}^2 + r^2 \dot{\theta}^2 + r^2 \sin^2\theta \, \dot{\phi}^2 \right)$$

so that

$$\frac{\partial L}{\partial \phi} = 0 \quad \text{but} \quad \frac{\partial L}{\partial \theta} \neq 0$$

indicating a major asymmetry between the two angles in the coordinate system: $\phi$ has a privileged position. Once a given axis is chosen, then the homogeneity of space for rotations is destroyed by that choice. The range of the angle $\theta$ is only from 0 to $\pi$, that is it is not cyclic and clearly not suitable for the description of angular momentum; the derivatives $(\partial L/\partial \theta)$ are not defined at the end-points. Thus, caution is required if there is a tendency to make too much of certain apparent equivalences between translations and rotations: between angular and linear coordinates. We have seen above that $p_\theta$ is not constant for a free particle but $p_\theta$ is conjugate to the angular coordinate $\theta$ and the canonical equations can be expressed in spherical polar coordinates.

It should be clear now what the source of the confusions about angular momentum actually is: they are verbal rather than essential. We have been discussing two quite distinct concepts and confusing them together because of similar terminology and certain intuitive
expectations. These confusions are compounded by a contingent connection between the two which occurs in a familiar coordinate system. The clues to the resolution of the confusion lie in the fact that it is the $z$-component of $\vec{\ell}$ which is conjugate to $\phi$ (not $z$) and the fact that $p_\theta$ is a valid momentum component conjugate to $\theta$. The problem is simply the simultaneous existence of two quite different quantities: the angular momentum "vector" and the momentum components conjugate to angular coordinates. These two quantities are defined independently of each other and, in general, there will be no simple connection between them. In fact, as we have seen, there may be a contingent connection between them in the sense that (in spherical polar coordinates, at least) one of the Cartesian components of the angular momentum vector is identical to a momentum component conjugate to an angular variable. This is nothing more or less than a co-incidence which has, unfortunately, served to muddy the distinction between the two separate quantities. If one considers the 11 orthogonal coordinate systems in three-dimensional space, many of which have dimensionless (angular) members, the role of $\theta$ in spherical polars is more typical. The momentum component $\partial L/\partial \dot{\theta}$ is a proper conjugate momentum component: conjugate, that is, to an angular variable, but it is not a component of the angular momentum "vector"; in particular it is not a Cartesian component of the angular momentum "vector".

Even if the position of a point cannot be specified by three angles it might be thought that the three Euler angles (for example) which are used to specify the orientation of a rotating body might pass muster as "rotational canonical" coordinates. As we noted in the last section, these three angles are not sufficient to define the orientation of a rigid body with respect to a fixed "global" coordinate frame because one must specify the order in which the rotations are performed in fixing the body's orientation. Such a set of coordinates cannot be brought into the canonical formalism.

In the last few paragraphs, the word vector has been put in quotes when used in conjunction with angular momentum. This is deliberate and another attempt to draw attention to the difference between the angular momentum "vector" and momentum components conjugate to angular coordinates or, indeed, to any canonical momentum component. Angular momentum, defined as it is in terms of the vector product, is in fact a bivector or anti-symmetric second-rank tensor (number of components $n(n-1)/2$) and it is only the co-incidence in three-dimensional space that $3 = 3(3-1)/2$ which enables the components of this bivector to be put into one-one correspondence with the components of a vector.
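The distinction drawn in this section between the two conjugate momenta $p_\phi$ and $p_\theta$ can be illustrated numerically. The following is a minimal sketch, assuming a free particle of unit mass, central finite differences for Lagrange's definition (5.1.1), and an arbitrary test state; all numerical values and helper names are assumptions of the sketch.

```python
import math

MASS = 1.0  # hypothetical particle mass

def lagrangian(r, theta, phi, r_dot, theta_dot, phi_dot):
    """Free-particle Lagrangian in spherical polar coordinates."""
    return 0.5 * MASS * (r_dot ** 2 + (r * theta_dot) ** 2
                         + (r * math.sin(theta) * phi_dot) ** 2)

def derivative(f, args, index, eps=1e-6):
    """Central-difference derivative of f with respect to args[index]."""
    up, down = list(args), list(args)
    up[index] += eps
    down[index] -= eps
    return (f(*up) - f(*down)) / (2 * eps)

def angular_momentum(r, theta, phi, r_dot, theta_dot, phi_dot):
    """Cartesian components of m (r x r_dot), evaluated directly."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    x, y, z = r * st * cp, r * st * sp, r * ct
    vx = r_dot * st * cp + r * ct * cp * theta_dot - r * st * sp * phi_dot
    vy = r_dot * st * sp + r * ct * sp * theta_dot + r * st * cp * phi_dot
    vz = r_dot * ct - r * st * theta_dot
    return (MASS * (y * vz - z * vy),
            MASS * (z * vx - x * vz),
            MASS * (x * vy - y * vx))

# Arbitrary state: (r, theta, phi, r_dot, theta_dot, phi_dot)
state = (2.0, 0.9, 0.4, 0.3, -0.5, 0.8)

p_theta = derivative(lagrangian, state, 4)    # dL/d(theta_dot)
p_phi = derivative(lagrangian, state, 5)      # dL/d(phi_dot)
dL_dtheta = derivative(lagrangian, state, 1)  # non-zero: theta not ignorable
dL_dphi = derivative(lagrangian, state, 2)    # zero: phi is ignorable
l_x, l_y, l_z = angular_momentum(*state)

print("p_phi   =", p_phi, " l_z =", l_z)
print("p_theta =", p_theta, " (l_x, l_y, l_z) =", (l_x, l_y, l_z))
print("dL/dphi =", dL_dphi, " dL/dtheta =", dL_dtheta)
```

In this sketch $p_\phi$ reproduces $\ell_z$, while $p_\theta$, although a perfectly good conjugate momentum, coincides with none of the Cartesian components.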
5.3. The Poisson Brackets and Angular Momentum
The above preliminary discussion of some of the properties and peculiarities of angular momentum was initiated by the more general question "under what conditions is a coordinate (or momentum component) suitable to be used in the canonical equations". Although angular variables and angular momentum provide special confusions and are the most celebrated case of "non-canonical" momenta, it is worth looking at the general case. However, it is worth remarking that these confusions between angular momenta and canonical momenta conjugate to angular coordinates, when taken over into quantum theory, have some ramifications in discussions of Bohm's version of the Einstein-Podolsky-Rosen (EPR) paradox. The techniques of the Transformation theory are specially suited for this investigation since the only limitations placed on the allowed transformations are:

1. The coordinates should be a complete and non-redundant set: the non-vanishing of the Jacobian of the transformation.
2. Hamilton's canonical equations should have the same form in all allowed coordinate systems.

That is, the condition that a set of coordinates and momenta are canonical is "built into" the Transformation theory. In Appendix 4.A it was shown that the transformation theory led to the definition of sets of invariant quantities (the Poisson Brackets) which could be used as diagnostic tests for candidates aspiring to be canonical coordinates and momenta. In the case of angular momenta, we may use the convenience of the invariance of the Poisson brackets to evaluate them for the Cartesian components of angular momentum in Cartesian coordinates: obviously $[x, x] = [x, y] = \cdots = 0$.
But, for example

$$[\ell_x, \ell_y] = \ell_z$$

and

$$[\ell_x, y] = z$$

confirming the fact that the angular momentum components are not conjugate to a set of canonical coordinates. It is easy to show that

$$[\ell_x, \ell^2] = [\ell_y, \ell^2] = [\ell_z, \ell^2] = 0$$

but this does not admit $\ell^2$ into the canonical scheme because there is no coordinate to which $\ell^2$ is conjugate and so the other two relationships are trivially not satisfied. To emphasise the point once more, the components of the angular momentum "vector" and the scalar "square of the angular momentum" cannot be brought into the canonical formalism.

5.4. Components of the Angular Momentum "Vector"
The elementary vectorial definition of angular momentum in equation (5.2.2)

$$\vec{\ell} = \vec{r} \times \vec{p}$$

satisfies all the usual requirements of vector algebra and only the fastidious would object to the mathematics of this definition. The vector so defined is the dual of a bivector and, as a vector, can be resolved into components along any axes whatsoever. But the abstraction of the vectorial properties from the real motion does not carry the most important property of angular momentum with it. As I noted at the outset, it is instructive to contrast linear and angular momentum. If a linear momentum vector is expressed, for example, as the sum of its three Cartesian components

$$\vec{p} = \vec{p}_x + \vec{p}_y + \vec{p}_z = p_x \hat{\imath} + p_y \hat{\jmath} + p_z \hat{k}$$

(say) then each of these three components has a physical interpretation which is identical to that of the total linear momentum: each one of $\vec{p}$, $\vec{p}_x$, $\vec{p}_y$ and $\vec{p}_z$ represents linear momentum in a particular direction; the difference is only in the magnitude and direction of the linear momentum. The key point being that a particle may, simultaneously, have linear momentum in any number of directions.¹

¹ Only three of them will be linearly independent.
However, in the case of angular momentum the situation is very different. If an angular momentum vector is similarly expressed as the sum of its three Cartesian components

$$\vec{\ell} = \vec{\ell}_x + \vec{\ell}_y + \vec{\ell}_z = \ell_x \hat{\imath} + \ell_y \hat{\jmath} + \ell_z \hat{k}$$

(say) then, notwithstanding the identical resolution of the two vectors, the implied consequences for the interpretation of the physical reality are quite different. Angular momentum about a given axis is not capable of being seen as composed of several independent angular momenta about other axes. Rotational motion is such that the rotation of a body or system of particles may, at any one time, only take place about a single axis. The axis about which the rotation occurs may change with time of course, as happens in precession but, for example, if the angular momentum is constant, the axis about which rotation occurs is always the one given by the direction² of the "vector"

$$\vec{\ell} = \vec{r} \times \vec{p}.$$

² With due regard for the "sign" of this direction, of course.

Only this vector has the physical interpretation of rotation of a body or system of particles about the direction of the vector. When we resolve the vector into, for example, Cartesian components $\ell_x$, $\ell_y$, $\ell_z$ the resulting components do not have a physical interpretation as angular momenta in the same sort of way. That is, the vectors $\vec{\ell}_x$, $\vec{\ell}_y$, $\vec{\ell}_z$ are not (all) interpretable as rotational motion about the respective Cartesian axes.

Such a conclusion is made more obvious and intuitively acceptable if (again!) we contrast linear and angular motion. If we have a particle moving in the $xy$ plane with momentum components $p_x$, $p_y$ and give it an impulse in the $z$-direction so that it acquires a linear momentum of $p_z$ in that direction, its resultant momentum in the $xy$ plane is unchanged and its total momentum is simply

$$\vec{p} = \vec{p}_x + \vec{p}_y + \vec{p}_z.$$

However, as we all know from our childhood experiences with spinning bicycle wheels, if we have a system rotating about a given axis and try to give it motion in a different direction (application of a quick torque impulse) we do not get a smooth transition to simple rotational motion about some intermediate axis; precessional motion occurs.

5.5. Conclusions for Angular Momentum
Everything discussed in this chapter has been in terms of classical particle mechanics but it has some very far-reaching ramifications for the de-mystification of the interpretation of quantum mechanics. The main qualitative conclusions are:

• Angular momentum is motion about an axis, not about a point, and only one direction in space may be chosen for a canonical angular momentum.
• Whatever the mathematical convenience of the description of angular momenta by vectors, this description provides profoundly misleading analogies with genuine vector quantities.

In particular: while it is entirely possible to resolve the angular momentum "vector" $\vec{\ell} = \vec{r} \times \vec{p}$ into components along any axes whatsoever, the resulting vector components are not (in general) capable of being interpreted as angular momenta about these directions. Colloquially one might say that the vector $\vec{\ell}$ has components in any direction but the physical quantity angular momentum does not. We shall wish to revisit these conclusions in later discussions.
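The bracket relations quoted in Section 5.3, and the statement that only one component can be made canonical, can be checked numerically. The following is a minimal sketch using the Poisson Bracket definition of Appendix 4.A with central finite differences; the test point and the function names are arbitrary assumptions of the sketch.

```python
import math

def bracket(X, Y, q, p, eps=1e-5):
    """[X, Y]_{q,p} = sum_i (dX/dq_i dY/dp_i - dX/dp_i dY/dq_i),
    estimated by central finite differences."""
    def partial(f, which, i):
        q_up, q_dn, p_up, p_dn = list(q), list(q), list(p), list(p)
        if which == "q":
            q_up[i] += eps
            q_dn[i] -= eps
        else:
            p_up[i] += eps
            p_dn[i] -= eps
        return (f(q_up, p_up) - f(q_dn, p_dn)) / (2 * eps)

    return sum(partial(X, "q", i) * partial(Y, "p", i)
               - partial(X, "p", i) * partial(Y, "q", i)
               for i in range(3))

# Cartesian components of r x p, with q = (x, y, z) and p = (px, py, pz)
l_x = lambda q, p: q[1] * p[2] - q[2] * p[1]
l_y = lambda q, p: q[2] * p[0] - q[0] * p[2]
l_z = lambda q, p: q[0] * p[1] - q[1] * p[0]
l_squared = lambda q, p: l_x(q, p) ** 2 + l_y(q, p) ** 2 + l_z(q, p) ** 2
y_coord = lambda q, p: q[1]
phi = lambda q, p: math.atan2(q[1], q[0])  # azimuthal angle about the z axis

q0, p0 = [0.6, -1.1, 0.8], [0.3, 0.9, -0.4]  # arbitrary test point

lx_ly = bracket(l_x, l_y, q0, p0)        # expected: l_z
lx_y = bracket(l_x, y_coord, q0, p0)     # expected: z
lx_l2 = bracket(l_x, l_squared, q0, p0)  # expected: 0
phi_lz = bracket(phi, l_z, q0, p0)       # expected: 1 (a canonical pair)

print(lx_ly, l_z(q0, p0))
print(lx_y, q0[2])
print(lx_l2)
print(phi_lz)
```

The first bracket shows that two components of the "vector" fail the test for a pair of canonical momenta, while the last shows the single contingent connection: $\ell_z$ is conjugate to the angle $\phi$ about the chosen axis.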
PART 4
Schrodinger's Mechanics
After a discussion of the most famous "thought experiment" based on crystal particle diffraction which purports to show that electrons are both particle and wave, the transition is made from the Hamilton-Jacobi equation of classical mechanics to the underlying dynamical law of Schrodinger's mechanics. The Schrodinger equation and its associated boundary conditions (which generate quantisation) are derived from this law. An attempt is made to distinguish, in Schrodinger's mechanics, between equations (which are expressions of the dynamical law) and identities (which are not); failure to make this distinction has, historically, been the cause of some confusion in quantum mechanics.
Chapter 6
Prelude: Particle Diffraction
There are a number of examples of experiments which are presented as supporting the essentially dual nature of what are seen in classical terms as particles. It is suggested both that these experiments make it mandatory to consider individual electrons (for example) as simultaneously having wave and particle properties and that there exist in nature mysterious cooperative properties independently of any physical interactions which the particles might have. In this chapter I take a closer look at the most familiar and ostensibly convincing of these experiments: particle diffraction.
Contents
6.1. History
6.1.1. The Experiment
6.1.2. The Explanations
6.2. The Wave Theory
6.3. The Particle Theory
6.4. A Simple Case
6.5. Experimental Verification
6.6. The Answer to a Rhetorical Question
6.7. Conclusion

6.1. History
Historically, the two-slit diffraction experiment has played a significant role in attempts to illustrate and elucidate the difficulties of interpretation associated with quantum theory. In particular, discussions of particle diffraction by periodic structures ("screens with slits") have been used to support the idea of wave/particle duality and to illustrate the concept of the non-localisability of particles in the quantum domain.

6.1.1. The Experiment
There is enough experimental evidence in existence to establish that a beam of particles (electrons, say) of velocity $v$ is diffracted in a way which has an identical mathematical form to the diffraction of a beam of monochromatic (constant wavelength) waves. The formal analogy is complete if the wavelength of the waves ($\lambda$, say) is related to the momentum of the electrons by

$$\lambda = \frac{h}{p} = \frac{\text{Planck's constant}}{\text{momentum}}, \qquad \text{momentum} = \text{electron mass} \times \text{velocity}$$
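To give a feeling for the magnitudes involved, the relation above can be evaluated directly. The following is a minimal sketch, assuming the non-relativistic momentum (mass times velocity) and an illustrative electron velocity of $10^6$ m/s; the velocity is an assumption chosen for the example, not a value taken from any experiment discussed here.

```python
PLANCK_H = 6.62607015e-34      # Planck's constant in J s (SI defined value)
ELECTRON_MASS = 9.1093837e-31  # electron rest mass in kg (rounded)

def de_broglie_wavelength(mass, velocity):
    """Wavelength h / p with the non-relativistic momentum p = m v."""
    return PLANCK_H / (mass * velocity)

velocity = 1.0e6  # m/s, illustrative value only
wavelength = de_broglie_wavelength(ELECTRON_MASS, velocity)
print("wavelength in metres:", wavelength)
```

For this velocity the wavelength comes out at a few ångströms, which is comparable with interatomic spacings in crystals.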
Experiments are done with crystals or other naturally-occurring forms of matter since the wavelengths $\lambda$ are quite short for accessible electron velocities. For the purposes of this discussion we can replace the actual experiments with an idealised case of a screen with slits.

6.1.2. The Explanations
There are two explanations of diffraction, a classical wave theory and a quantum particle theory, both of which lead to the same relationship for the distribution of diffracted particle probability density:

• The classical Bragg wave theory involves only the idea of wave interference (each slit acting as a line source) with no involvement from the quantisation of dynamical variables.
• The Duane particle diffraction theory, which involves the quantisation of the momentum components of any particles confined to a finite region of space; the linear momentum components of particles confined to a region of length $L$ are restricted to be integral multiples of $h/L$. In one dimension, the linear momentum components of particles in the screen between the slits and perpendicular to the direction of those slits must be integral multiples of the inverse of the inter-slit distance ($L$, say).

Since both models involve the description of the diffraction of particles (electrons, say), the first is known as the (wave/particle) dual theory while the second is the unitary particle theory. The identical predictions of both theories were used in the general positivistic atmosphere in the pre-Second-World-War period to suggest that:
• Attempts to distinguish between the two theories were futile and
• Theoretical distinctions which had no observable consequences were meaningless,
and this position has remained basically unchanged. But a homomorphism of the mathematical structure of different theories most decidedly does not imply physical identity of the referents of those theories. One only has to think of, for example,
1. the familiar analogue modelling of many systems by electronic components
2. the many applications of Poisson's equation
to be convinced that the set of physical systems described by mathematical structures is much larger than the set of mathematical structures used in those descriptions.
• The 24 symmetry operations of a regular tetrahedron are isomorphic to the 24 permutations of 4 identical objects, because that is what the operations are, the permutations of the identical corners of a tetrahedron.
• However, the fact that the formula $F = C - P + 2$ gives the number of degrees of freedom ($F$) of a system in terms of the number of components ($C$) and the number of phases ($P$) in one interpretation of the symbols and the relationship between the faces ($P$), edges ($C$) and vertices ($F$) of a convex polyhedron in another, is just a coincidence.
• Similarly, the appearance in a theory of a partial differential equation containing spatial derivatives of second order and a time derivative of first order does not necessarily mean that there are real (physical) waves involved.
In this chapter it is shown that it is possible to distinguish, both conceptually and experimentally, between the dual and unitary description of particle diffraction. A proposal is made for such an experiment.
6.2. The Wave Theory
The wave theory of diffraction from single or multiple slits has been known for many years and is basically a "macroscopic" theory; it applies equally well to the diffraction of sea waves passing through a narrow channel and to light waves passing through a set of closely-spaced engraved lines on
a glass plate. The theory is independent of the structure of the material surrounding the "slit", depending only on one basic assumption:
When a set of monochromatic wave fronts impinges on a "screen" containing a single "slit" which is of a width comparable to the wavelength of the waves, the slit acts as a "line source" and emits wave fronts on the other side of the slit which are of (semi) cylindrical symmetry.
When the screen contains many (equally-spaced, for simplicity) slits, each slit acts in the same way, emitting its own set of cylindrical wave fronts. The characteristic pattern which these sets of waves generate is due to the interference between the amplitudes of the sets being emitted from each slit; the waves may reinforce or partially cancel each other's amplitude in a characteristically regular fashion. Any plane parallel to the diffracting screen will record a pattern of maxima and minima in which the distance between two adjacent maxima will depend on the slit spacing. It is elementary to show that, for monochromatic radiation of wavelength $\lambda$ and inter-slit spacing $L$, if the distance from the slits to a detecting screen is $D$ then the spacing between adjacent maxima in the pattern on the detecting screen is $\Delta$, given by
$$\Delta = \lambda \times \frac{D}{L}. \qquad (6.2.1)$$
6.3. The Particle Theory
If we simply consider an idealised beam of particles impinging on a screen containing slits, then, if the beam is exactly perpendicular to the screen, there is ostensibly no reason for the beam to spread at all; the particles will either hit the screen and be stopped or "hit" the slit and simply pass through. The particles will only suffer any deflection if they hit or are hit by the internal edges of the slit.¹ So, in contrast to the wave theory of slit diffraction, the particle theory actually depends on the detailed structure of the material of the screen and, in particular, on the nature of the motion of screen material in the region between the slits.
¹ A maddening phenomenon familiar to all snooker (or pool) players when a ball oscillates rapidly across the mouth of the pocket before falling in.
The essence of the particle theory is that any particle which is constrained to remain in a one-dimensional region of length $L$ (say) has a momentum component in this dimension which may only be an integral multiple of the basic quantum $h/L$. Unfortunately for the logical development of a particle theory of diffraction we are in a deadly embrace here, since this quantisation of linear momentum is the simplest application of the Schrodinger equation, which we have not yet introduced; it is nothing more than the familiar "particle in a box" model. For the moment, therefore, this quantisation of linear momentum is simply an assertion which will be justified (or not) a posteriori.
However, it was clear to the pioneers of the "old" (pre-Schrodinger) quantum theory that, when a particular co-ordinate had a periodic structure, the value of the conjugate momentum is quantised. This was summarised in the so-called Sommerfeld quantisation condition:
$$\oint p_i \, dq^i = nh$$
where $q^i$ is understood to include the spatial co-ordinates and time so that:
• If an angle is cyclic, the corresponding angular momentum is quantised ($2\pi p_\phi = mh$, for integer $m$).
• If a length is cyclic, the corresponding linear momentum is quantised ($L p_x = \ell h$, for integer $\ell$).
• If time is cyclic, energy is quantised ($(1/\nu)E = nh$, for integer $n$).
Thus, when a beam of particles impinges on a screen with such a slit structure the particles passing through the slits may, if they are close to the edges of those slits, exchange momentum ("collide"²) with particles in the screen material.
If we neglect any momentum exchanges in the initial direction of the beam perpendicular to the plane of the screen (which simply speed up or slow down the particles without deflecting them), then we can consider momentum exchanges which may take place in the two perpendicular directions within the screen which will cause changes in particle direction:
• The quantity $h/L$ is very small for directions along the slits since this length, $L$, is essentially of "laboratory" dimensions, and so momentum exchange here will be basically continuous, leading to a simple spreading of the beam in that dimension ("vertically", let's say).
• However, if the inter-slit gap is small (this $L$ being microscopic), the quantity $h/L$ in the perpendicular direction ("horizontally") is large, and impacts between the beam and screen-material particles having a horizontal momentum component of integral multiples of $h/L$ will lead to discrete deviations in the direction of the beam particles in the horizontal plane.
² Since the particles are charged, such "collisions" will be mutual short-range repulsions, of course.
Duane³ showed how the quantitative treatment of this phenomenon for a periodic lattice generates the familiar Bragg law for the diffraction pattern. If we use an experimental setup identical to the one used above for the electromagnetic waves incident on a screen with slits, where everything is the same except that the waves are replaced by a beam of particles of momentum $p$, then it is easy to show that the distance between adjacent impacts on the detecting screen, $\Delta$, is given by:
$$\Delta = \frac{h}{p} \times \frac{D}{L} \qquad (6.3.2)$$
which is of exactly the same form as (6.2.1) with the wavelength of the electromagnetic radiation replaced by the so-called de Broglie wavelength of the particle, "$\lambda = h/p$". This theory would predict a series of sharp lines on a detecting screen, but any real screen made of real material would have a temperature-dependent spread of impacts centred around the theoretical quantity, leading to a broadening of the line in a manner familiar to spectroscopists.
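The agreement between the two spacing formulas can be checked numerically. The sketch below is an illustration under stated assumptions, not Duane's own treatment: it assumes small deflections, so that the deflection angle is approximately the transverse momentum transfer $n h/L$ divided by the beam momentum $p$, and all function names and numerical values (beam speed, slit spacing, screen distance) are hypothetical.

```python
import math

PLANCK_H = 6.62607015e-34  # Planck's constant, J s


def momentum_quantum(period_m: float) -> float:
    """Basic transverse momentum quantum h/L for a region of length L."""
    return PLANCK_H / period_m


def duane_impact_positions(p_beam, period_m, screen_distance_m, max_order):
    """Impact positions n * (h/L) * D / p on the detecting screen.

    Assumes small deflections, so the deflection angle is approximately
    (transverse momentum transfer) / (beam momentum).
    """
    dp = momentum_quantum(period_m)
    return [n * dp * screen_distance_m / p_beam
            for n in range(-max_order, max_order + 1)]


def wave_fringe_spacing(wavelength_m, period_m, screen_distance_m):
    """Fringe spacing lambda * D / L, the wave-theory result."""
    return wavelength_m * screen_distance_m / period_m


if __name__ == "__main__":
    p = 9.1093837015e-31 * 1.0e6  # hypothetical electron beam, v = 1e6 m/s
    slit_spacing, D = 1.0e-7, 1.0  # hypothetical slit spacing and screen distance, m
    xs = duane_impact_positions(p, slit_spacing, D, max_order=2)
    print("particle spacing:", xs[1] - xs[0])
    print("wave spacing:    ", wave_fringe_spacing(PLANCK_H / p, slit_spacing, D))
    # The quantum h/L for a laboratory-sized length is tiny by comparison.
    print("h/L (microscopic):", momentum_quantum(slit_spacing))
    print("h/L (1 cm):       ", momentum_quantum(1.0e-2))
```

The last two printed values illustrate the contrast drawn in the text between the "horizontal" (microscopic $L$) and "vertical" (laboratory $L$) directions.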
6.4. A Simple Case
A special simple case of interest is the "three-slit" experiment; suppose a planar screen has three equally-spaced slits in it with inter-slit spacing of size appropriate to the momentum ("wavelength") of some incident particles so that a diffraction pattern is generated. Further suppose that the technology is available to collimate a beam of particles sufficiently accurately so that they only impinge on the centre slit of the three. Let us imagine what would be emitted by the slit in the two possible cases, wave diffraction and particle diffraction.
³ W. Duane, "The transfer in quanta of radiation momentum to matter", Proc. Nat. Acad. Sci. 9, 158 (1923).
• In the case of the classical wave theory, the presence of the two slits on which no "particle intensity" falls is irrelevant to the experiment. We simply expect that the single centre slit will act as a line source, resulting in a cylindrically-symmetrical pattern with maximum intensity in line with that of the incident beam and falling away smoothly and symmetrically to zero for large enough deflection.
• In the case of the quantised-inter-slit-momentum theory the situation is quite different. Particles passing through the central slit can interact with the material of the screen (the two narrow inter-slit parts) and exchange momentum only in integral multiples of the basic quantum (of $h/L$) independently of whether or not any particles pass through the other two slits. Thus, in this case, we would see a diffraction pattern centred about the direction of the incident beam with characteristic maxima and minima satisfying the now-familiar law of equations (6.2.1) and (6.3.2).
The characteristic difference here is that, in the case of the Duane theory of quantised particle diffraction, the pattern is generated by momentum exchange with the periodically-quantised material of the screen independently for each set of particles passing through each slit. In the wave theory it is the combined effect of particles passing through the set of slits which generates the diffraction pattern. Thus, according to the wave-theory model, the emergent beam will generate a diffraction pattern only if more than one slit is "open". But, according to the particle model, the pattern will be generated by the component of the beam emerging from each slit. Of course, in both cases the actual amplitude of the pattern will be enhanced by the passage of the beam through many slits.
Making an obvious extension of this argument, if an experiment could be arranged so that the beam only fell on a (regularly-spaced) subset of the slits, then the predictions of the two models would be different:
• The wave model would predict a diffraction pattern obeying the Bragg law for the periodicity of the slits through which particles had actually passed (here the active slit spacing would be an integral multiple of the actual inter-slit gap).
• The particle model would generate a set of deviations generated by the actual underlying basic slit pattern from each of the subset of slits through which particles actually passed. This would lead to sets of maxima and minima combined to form an overall pattern which would show the Bragg law of the basic slit pattern. Naturally this pattern would be attenuated by an overall envelope with the periodicity of the subset of gaps through which the beam had passed.
It is important to note that, to see the full contrast between the predictions of the two models of particle diffraction, at least three slits must be used, not the more familiar two. If only two slits are used and the beam is collimated to pass through just one of them, the prediction for the particle model is very different, since impacts made by the particles passing through the slit would produce different effects according to which edge the impact occurred from. Only on one of the edges would the momentum of the screen material be quantised; the other edge would not be the limit of a small one-dimensional region; its dimensions would be macroscopic, from the slit edge to the end of the screen. So, in this case, a beam of particles passing through one of a pair of slits would experience "horizontal" impacts which would be quantised from one edge but continuous from the other edge, spreading into an unsymmetrical pattern.
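The contrast between the two predictions for a regularly-spaced subset of slits can be sketched numerically. This is an illustration only, assuming ideal line sources in the far field for the wave model and ignoring all envelopes; the function names and the numerical values are hypothetical. With every second slit illuminated, the wave model has principal maxima at multiples of $\lambda/(2L)$ in $\sin\theta$, whereas the particle model, as described above, allows discrete deflections only at multiples of $\lambda/L$.

```python
import cmath
import math


def wave_intensity(slit_positions, wavelength, sin_theta):
    """Far-field intensity |sum_j exp(i k x_j sin(theta))|^2 for ideal line sources."""
    k = 2.0 * math.pi / wavelength
    amplitude = sum(cmath.exp(1j * k * x * sin_theta) for x in slit_positions)
    return abs(amplitude) ** 2


def particle_deflections(wavelength, basic_spacing, max_order):
    """sin(theta) values n * (h/L) / p = n * lambda / L for the basic slit spacing L."""
    return [n * wavelength / basic_spacing
            for n in range(-max_order, max_order + 1)]


if __name__ == "__main__":
    lam, L = 0.1, 1.0                 # arbitrary illustrative units
    active = [0.0, 2.0 * L, 4.0 * L]  # the beam falls on every second slit only
    for s in (lam / (2 * L), lam / L):
        print(f"sin(theta) = {s:.3f}: wave intensity = {wave_intensity(active, lam, s):.3f}")
    print("particle-model deflections:", particle_deflections(lam, L, 2))
```

A single illuminated slit gives a wave intensity that is independent of angle in this idealisation, which corresponds to the structureless single-slit wave pattern discussed for the three-slit case.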
6.5. Experimental Verification
Needless to say, the verification of this conjecture about the distinction between the two models of particle diffraction would be extremely experimentally demanding; the central problem being, of course, the collimation of the beam so that one could be sure that only one (or only a few relatively widely-separated) slits were traversed by the incident beam. More important, to make a genuine distinction the beam should consist of material about which there is universal agreement that it is composed of particles. This excludes the use of light beams since there is no such universal agreement; the dominant school of thought assumes electro-magnetic radiation to be composed of corpuscular photons while a significant minority regard the term "photon" as a convenient shorthand for "set of quantum numbers for the states of the electromagnetic field". Obviously, if a beam is used which actually is composed of a wave train then there is no distinction to be made.
6.6. The Answer to a Rhetorical Question
It is common in semi-popular expositions of the quantum theory of particle slit diffraction to find statements like:
"How is it possible that opening a slit through which the particle does not pass can affect its motion?",
where the question is clearly intended to be rhetorical and to startle the reader into thinking about "paradoxes" in the interpretation of quantum theory. But, using the particle diffraction theory, there actually is a simple answer to this question:
"Because opening another (nearby) slit changes the possible motions of the screen material from having continuous linear momentum in a direction in the screen to quantised motion in that direction and it is transfers of this linear momentum which are causing the pattern which you observe."
or, even simpler and less pedantic:
The pattern is generated by the interaction between two systems; changing the properties of either one of them will change the pattern.
6.7. Conclusion
The purpose of the material in this chapter is not particularly to champion the cause of Duane's particle theory of diffraction, although it must be obvious where my prejudices lie, but to draw attention to two points which I believe have been lost sight of in attempts to understand quantum theory:
• Mathematics is not physics; even though physical theories are often articulated in mathematical form, an identity or similarity of mathematical structure may or may not serve to indicate a similarity of structure in the material world.
• Even in the event of having a satisfactory mathematical scheme which appears to rationalise a physical structure or process we can never shirk from the attempt to investigate any physical mechanisms which purport to underlie that rationalisation. Mathematical description cannot replace physical mechanism in science; the ghost of Ptolemy is always present to warn us.
In the discussions of Schrodinger's mechanics which follow later in this work, nothing is dependent on the final outcome of an experimental determination of which of the two models of microscopic diffraction turns out to be true.
Finally, I must note, parenthetically, that attempts have been made to solve the Schrodinger equation for model systems of this kind and such attempts are not at all trivial for at least two reasons:
• The Hamiltonian for a system of particles and a model screen with slits is time independent; solution of the Schrodinger equation will only generate a static probability distribution for the electrons.
• Any model potential for such a system will not have the convenient smoothness of the more familiar potentials; there are sharp "corners" with discontinuous derivatives in the potential as it goes from zero to infinity.
But, much more important, the particle theory is dependent on the interactions between the particles and the inter-slit material, which are not included in these models.
Chapter 7
The Genesis of Schrodinger's Mechanics
Using material developed in Part 3 a brief outline of the relationship between quantum and classical (particle) mechanics is given, stressing the nature and meaning of the variational principle which is the basis of Schrodinger's mechanics. Much of this material is descriptive and heuristic but the final section ("Summary") contains the principles which certainly will enable all of Schrodinger's mechanics to be generated as well as the formal structures which may be abstracted from it.
Contents
7.1. Lagrangians, Hamiltonians, Variation Principles
  7.1.1. Equations and Identities
7.2. Replacing the Hamilton-Jacobi Equation
7.3. Generalising the Action S
  7.3.1. Changing the Notation for Action
  7.3.2. Interpreting the Change
7.4. Schrodinger's Dynamical Law
  7.4.1. Position Probability and Energy Distributions
  7.4.2. The Schrodinger Condition
7.5. Probability Distributions?
7.6. Summary of Basic Principles
7.1. Lagrangians, Hamiltonians, Variation Principles
The principles of classical particle mechanics are most cogently expressed in terms of variation principles involving the Lagrangian or Hamiltonian functions. In view of the rather strict interpretation I am going to place
on the corresponding variational principle in Schrodinger's mechanics, it is appropriate here to be absolutely clear about the meaning of these functions and what the effect of the variation principle is on their extremum values. Taking the Hamiltonian as an example, it is a function in general of $6N + 1$ variables: the $3N$ coordinates (collectively $q^i$), the $3N$ conjugate momenta (collectively $p_i$) and time ($t$); it has the dimensions of energy ($ML^2T^{-2}$). That is, for every set of values of $q^i$, $p_i$ and $t$ there is a value of $H(q^i, p_i; t)$; here the special role of the variable $t$ has been emphasised by the notation. When the variational principle is satisfied (when the equations of motion are solved) then, of course, the coordinates $q^i$ and momenta $p_i$ become known functions of the time $t$:
$$q^i \to q^i(t); \qquad p_i \to p_i(t)$$
where I have deferred to the usual convention in physics and not changed the notation for $q^i$ and $p_i$ when they become dependent on $t$, using the same symbol for the interpretation of, for example, a coordinate as for its functional dependence on $t$; a choice which, in this context, can be confusing. That is, the Hamiltonian function becomes a function ($F$, say¹) of time only:
$$H(q^i, p_i; t) \to F(t).$$
Also, when the equations of motion are solved, the value of the Hamiltonian function is, at every time $t$, numerically equal to the energy of the system:
$$H(q^i, p_i; t) = F(t) = E(t)$$
where it is important to stress that this latter expression is an equation not an identity; it is only true for those functions $q^i$, $p_i$ which satisfy the variation principle (solve the equations of motion). The detailed functional forms of the Hamiltonian ($H$) and the energy ($E$) are different, but their numerical values are equal for trajectories which solve the equations of motion. The numerical value of the Hamiltonian function is only equal to the energy of the system for those particular functions $q^i(t)$ and $p_i(t)$ which solve the equations of motion for the system; that is, "$H = E$" is only true for actual, real motions of the system.
¹ Purposely using new notation for the function of $t$ only.
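The distinction between the Hamiltonian as a function of independent variables and its value along an actual motion can be illustrated with a one-dimensional harmonic oscillator. This is a minimal sketch under assumed values (unit mass and frequency, arbitrary initial conditions); the function names are hypothetical and the oscillator itself is an example chosen here, not one used in the text.

```python
import math


def hamiltonian(q, p, m=1.0, omega=1.0):
    """H(q, p) = p^2 / (2m) + m * omega^2 * q^2 / 2 for arbitrary, independent q and p."""
    return p * p / (2.0 * m) + 0.5 * m * omega ** 2 * q * q


def trajectory(t, q0, p0, m=1.0, omega=1.0):
    """Exact solution q(t), p(t) of the equations of motion for initial values q0, p0."""
    q = q0 * math.cos(omega * t) + (p0 / (m * omega)) * math.sin(omega * t)
    p = p0 * math.cos(omega * t) - m * omega * q0 * math.sin(omega * t)
    return q, p


if __name__ == "__main__":
    q0, p0 = 1.0, 0.0
    energy = hamiltonian(q0, p0)
    for t in (0.0, 0.7, 2.5):
        q, p = trajectory(t, q0, p0)
        # Along the actual motion H reduces to F(t), numerically equal to E.
        print(f"t = {t}: H = {hamiltonian(q, p):.12f}, E = {energy}")
    # For arbitrary independent q and p, H takes values unrelated to E.
    print("H(3, -2) =", hamiltonian(3.0, -2.0))
```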
In general, since the independent variables $q^i$, $p_i$ and $t$ may take any values whatsoever, the Hamiltonian function may take an infinite range of values different from the energy of the system, albeit all having the dimensions of energy. Analogous considerations apply to the Lagrangian function $L(q^i, \dot{q}^i; t)$; it is a function of $6N+1$ independent variables in general and, on solution of the equations of motion, becomes dependent on $t$ only since the independent variables $q^i$ and $\dot{q}^i$ become functions of $t$. The corresponding variation principle is
$$\int L(q^i, \dot{q}^i; t)\, dt = \text{minimum}$$
where the variations involved are in the forms of the functions $q^i$ and $\dot{q}^i$; the explicit way in which the coordinates depend on time. Thus, the variation principle generates particular functional forms for the way in which the coordinates $q^i$ and the associated velocities $\dot{q}^i$ depend on time $t$ for actual allowed motions of the system. In the case of the Hamilton-Jacobi equation the situation is different. When the action function $S$ is known and the momenta are replaced by derivatives of this action function with respect to the coordinates ($\partial S/\partial q^i$), then the Hamiltonian function becomes an explicit function of $t$ and the $q^i$ but not the $p_i$ (rather than a function of $t$ only, as in the earlier case), which again can be emphasised by using specific notation for this functional dependence. We then have:
$$H(q^i, p_i; t) \to F(t) \ \text{(say)} \qquad \text{Hamilton}$$
$$H(q^i, p_i; t) \to G(q^i; t) \ \text{(say)} \qquad \text{Hamilton-Jacobi.}$$
In both cases the functions formed from the Hamiltonian only refer to actual motions of the systems; the equations of motion are solved in generating $F(t)$ or $G(q^i; t)$.
7.1.1. Equations and Identities
In setting up the Hamiltonian function, the momentum components $p_i$ which are associated with each coordinate $q^i$ are defined by
$$p_i \overset{\text{def}}{=} \frac{\partial L(q^i, \dot{q}^i; t)}{\partial \dot{q}^i}$$
This is not an equation; it is an identity.
The familiar Hamiltonian "equations" of motion are often presented in the following symmetrical form:
$$\dot{p}_i = -\frac{\partial H}{\partial q^i}; \qquad \dot{q}^i \overset{\text{def}}{=} \frac{\partial H}{\partial p_i}.$$
The first of these is an equation (it contains Newton's $F = ma$ in a generalised form) while the second is an identity (it defines "velocity" in Hamilton's theory). In practical applications of these equations, the second is simply used to eliminate one set of variables in order to be able to integrate the first. We shall meet situations very similar to this in investigating Schrodinger's mechanics; there are superficial (formal) similarities between expressions which are either equations or identities which, if we are to interpret the formalism, we must be able to distinguish.²
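The difference in status of the two relations can be seen in the simplest case. The following worked example is an illustration chosen here (a single particle of mass $m$ in one dimension with potential $V(q)$), not a derivation taken from the text:

```latex
L(q, \dot{q}) = \tfrac{1}{2} m \dot{q}^2 - V(q), \qquad
p \overset{\text{def}}{=} \frac{\partial L}{\partial \dot{q}} = m \dot{q}, \qquad
H(q, p) = p\,\dot{q} - L = \frac{p^2}{2m} + V(q)
\\[4pt]
\dot{q} \overset{\text{def}}{=} \frac{\partial H}{\partial p} = \frac{p}{m}
\quad \text{(this only restates the definition of } p \text{)}
\\[4pt]
\dot{p} = -\frac{\partial H}{\partial q} = -\frac{dV}{dq}
\quad \text{(this is Newton's law, } m\ddot{q} = F \text{, once } p = m\dot{q} \text{ is eliminated)}
```

Only the last line carries dynamical content; the middle line holds for any functions $q(t)$ and $p(t)$ related by the definition of the momentum.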
7.2. Replacing the Hamilton-Jacobi Equation
We have already seen, particularly in Section 4.3 on page 76 of Chapter 4, that in order to obtain a full solution to a dynamical problem from the relevant Hamilton-Jacobi (H-J) equation, in the sense of obtaining the actual particle trajectories as an explicit function of time, we must be able to obtain expressions for the "initial conditions" of the motion. These expressions are given by the derivatives of the function $S$ with respect to the "constants" introduced by solving the partial differential H-J equation. By fixing the numerical values of these quantities, we get explicit expressions for the trajectories. Now, in the dynamics of sub-atomic particles this information is simply not available; there is no hope of being able, for example, to obtain the initial positions of the 6 electrons in a carbon atom in order to be able to calculate the trajectories of these electrons and compare the resulting electron distribution and energies with experimental results. We must fall back on some way of calculating the (probability) distribution of the electrons.
² A distinction which is not possible in the algebraic, Hilbert space formalism.
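To make the role of the "constants" in a solution of the H-J equation concrete, consider the standard textbook case of a free particle in one dimension. The symbols $\alpha$ and $\beta$ below are labels introduced here for illustration:

```latex
% Free particle: H = p^2 / (2m). A complete solution of the H-J equation
% \frac{1}{2m}\left(\frac{\partial S}{\partial q}\right)^2 + \frac{\partial S}{\partial t} = 0
% containing one constant \alpha is
S(q, t; \alpha) = \alpha q - \frac{\alpha^2}{2m}\, t ,
\qquad p = \frac{\partial S}{\partial q} = \alpha .
\\[4pt]
% Differentiating S with respect to the constant and setting the result
% equal to a second constant \beta gives the trajectory:
\frac{\partial S}{\partial \alpha} = q - \frac{\alpha}{m}\, t = \beta
\quad \Longrightarrow \quad
q(t) = \beta + \frac{\alpha}{m}\, t .
```

Here $\beta$ is the initial position and $\alpha$ the (constant) momentum; without numerical values for both, $S$ describes the whole family of possible trajectories rather than any one of them.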
The H-J equation contains the best hope of being able to do this since:
• It contains everything about the dynamics of all the possible trajectories of the system.
• It is a function of space and time only, the momentum components being (literally) derived from the function $S(q^i; t)$.
• There is an associated "trajectory density" equation derived from the condition that the H-J equation should conserve trajectories.
Thus, speaking colloquially, we might say that "$S$ fills out configuration space with allowed trajectories" so that this function, unlike $H$ or $L$, has an objectively real referent for each and every value of its arguments $q^i$ and $t$. What we are saying here is that, by a shift of emphasis, we may regard this referent as an abstract object:
A particle whose motion satisfies the H-J equation specifically excluding any mention of (i.e. abstracting from) the initial conditions.
This shift of emphasis is made because the next development in the mechanics of systems of particles came in attempts to describe the motions of particles for which there was (and is) no hope of being able to specify the initial conditions; atomic and sub-atomic particles. For most of these microscopic systems, experimental measurements are carried out on billions upon billions of concrete objects whose initial conditions (if that concept has meaning for sub-atomic particles) may well all be different. For all practical purposes, experiments can neither determine nor infer the "initial" conditions of, for example, the motion of the electrons in one carbon atom so that any mechanics which requires the specification of such data is doomed to impotence in the sub-atomic domain.
Schrodinger took the Hamilton-Jacobi equation as his starting point for the creation of a new system of particle mechanics valid in the region of the very small and the very light. His contribution in his epoch-making first paper may be summarised in two steps; one apparently trivial and one boldly new.
We will examine them in sequence, beginning with the smaller of the two steps. In looking at Schrodinger's work we must, of course, guard against the idea that his mechanics can be "derived" from the Hamilton-Jacobi equation; it cannot. Schrodinger's mechanics is a new creation, it contains new intuition about reality which mathematical manipulation can
never supply; we can only hope to make the translation a little smoother, to bring, as I have tried to do in discussing the H-J equation, classical particle mechanics and quantum mechanics close together before making a jump. The difference between pedagogy and creation is that, when we jump, we know that there is something there to jump to because Schrodinger has been there before us. The considerations of the next few sections are therefore tentative and heuristic and the reader may safely deplore the whole project of attempting to join up classical particle mechanics with Schrodinger's mechanics. Nevertheless, once the Schrodinger Condition has been set up and the interpretation of the quantities it contains made clear, the whole of Schrodinger's mechanics may be derived and is, of course, independent of the heuristics used here. As we noted in Section 1.6 on page 13, it is the habit of mathematicians and mathematically-inclined scientists to present their work using the "lapidary method"; to show a selection of specimens so well-chosen and perfectly polished that there is no clue to their origin and original appearance. I prefer a more historical and human approach, exhibiting the rough-hewn specimens before cosmetic treatment.
7.3. Generalising the Action S
Recall the point which we reached at the end of Chapter 4 in the attempt to find the most general expression of the laws of classical particle mechanics; an equation which generated the possible trajectories of particles whose motions are determined by a particular Hamiltonian and an equation for the density of particle trajectories in ordinary space:
$$H\left(\frac{\partial S}{\partial q^i}, q^i; t\right) + \frac{\partial S}{\partial t} = 0$$
$$\frac{\partial \rho}{\partial t} + \sum_i \frac{\partial}{\partial q^i}\left(\rho\, \frac{\partial H}{\partial(\partial S/\partial q^i)}\right) = 0$$
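These two partial differential equations may be derived from a general variation principle:
$$\delta \int dV \int dt\, \mathcal{L} = \delta \int dV \int dt\, \rho \left( H\left(\frac{\partial S}{\partial q^i}, q^i; t\right) + \frac{\partial S}{\partial t} \right) = 0. \qquad (7.3.1)$$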
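As a sketch of how the variation principle (7.3.1) yields the two equations, assume that $\rho$ and $S$ are varied independently and that the variations vanish on the boundary of the integration region (both assumptions are made here for the purpose of the illustration):

```latex
% Variation with respect to \rho (the integrand is linear in \rho):
\int dV \int dt \; \delta\rho
\left[ H\!\left(\frac{\partial S}{\partial q^i}, q^i; t\right)
      + \frac{\partial S}{\partial t} \right] = 0
\;\Longrightarrow\;
H\!\left(\frac{\partial S}{\partial q^i}, q^i; t\right)
+ \frac{\partial S}{\partial t} = 0 .
\\[6pt]
% Variation with respect to S, followed by integration by parts
% (boundary terms dropped):
\int dV \int dt \; \rho
\left[ \sum_i \frac{\partial H}{\partial(\partial S/\partial q^i)}
       \frac{\partial\,\delta S}{\partial q^i}
     + \frac{\partial\,\delta S}{\partial t} \right]
= -\int dV \int dt \; \delta S
\left[ \sum_i \frac{\partial}{\partial q^i}
       \left( \rho\, \frac{\partial H}{\partial(\partial S/\partial q^i)} \right)
     + \frac{\partial \rho}{\partial t} \right] = 0
\\[6pt]
\Longrightarrow\;
\frac{\partial \rho}{\partial t}
+ \sum_i \frac{\partial}{\partial q^i}
  \left( \rho\, \frac{\partial H}{\partial(\partial S/\partial q^i)} \right) = 0 .
```

The first variation returns the Hamilton-Jacobi equation and the second the trajectory-density (continuity) equation.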
What Schrodinger did³ was to find a variation principle which:
• Replaced the two real functions $S$ and $\rho$ by a single complex function $\psi$ and generated an equation for $\psi$.
• Enabled the calculation of the allowed energies of systems of microscopic particles in the absence of any initial conditions on the particles' motion.
The function $\psi$ proved to be capable of interpretation as generating a probability distribution for the positions of the particles when the abstract object to which the function $\psi$ refers is suitably defined.
7.3.1. Changing the Notation for Action
The solution of the Hamilton-Jacobi equation, $S$, has the dimensions of energy × time ("action") and its time derivatives have, as we have seen, the dimensions of energy and, in particular, $-\partial S/\partial t$ is the total energy of the system as a function of the $q^i$ (and possibly $t$). Schrodinger, in his investigations of the Hamilton-Jacobi equation, made an apparently trivial change of notation, writing:
$$S \overset{\text{def}}{=} K \ln \psi : \quad \psi \overset{\text{def}}{=} \exp(S/K) \qquad (7.3.2)$$
where the numerical factor $K$ was added since $\ln\psi$ has no dimensions and $S$ should have dimensions of action, so that $K$ must have the dimensions of action; obviously the numerical value of $K$ depends on the system of units being employed for actual calculations: a "natural" choice of units would be one which gave $K$ the numerical value of unity. If this substitution is made in the Hamilton-Jacobi equation we need the relationship
$$\frac{\partial S}{\partial q^i} = \frac{K}{\psi}\frac{\partial \psi}{\partial q^i} \quad \text{i.e.} \quad \frac{\partial \psi}{\partial q^i} = \frac{\psi}{K}\frac{\partial S}{\partial q^i} \qquad (7.3.3)$$
for the momenta, and then the equation becomes
$$H\left(\frac{K}{\psi}\frac{\partial \psi}{\partial q^i}, q^i; t\right) + \frac{K}{\psi}\frac{\partial \psi}{\partial t} = 0$$
which is the Hamilton-Jacobi equation in the new notation.
³ Not in exactly these terms, though.
However, this "mere notational change" provides an additional motivation for the change of emphasis in the interpretation of the equation which was mentioned in the previous chapter. For simplicity, consider the one-dimensional case with Hamiltonian
$$H(q, p) = \frac{1}{2m}p^2 + V(q).$$
$H$ is independent of time so the Hamilton-Jacobi equation is
$$\frac{1}{2m}\left(\frac{\partial S}{\partial q}\right)^2 + V(q) - E = 0 \qquad (7.3.5)$$
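(where partial derivative notation has been retained in this one-dimensional case for consistency with the general case). Using Schrodinger's notation this equation becomes
$$\frac{K^2}{2m}\left(\frac{1}{\psi}\frac{\partial \psi}{\partial q}\right)^2 + V(q) - E = 0$$
or
$$\frac{K^2}{2m}\left(\frac{\partial \psi}{\partial q}\right)^2 + V(q)\psi^2 - E\psi^2 = 0. \qquad (7.3.6)$$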
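The equivalence of the kinetic terms in the two notations can be checked numerically. The sketch below assumes a real constant $K$, an arbitrary smooth test function standing in for $S(q)$ (it is not a solution of any particular problem), and a central finite difference for $\partial\psi/\partial q$; all names and values are hypothetical.

```python
import math

K = 2.0   # assumed real constant with the dimensions of action (arbitrary value)
M = 1.0   # assumed particle mass (arbitrary units)


def action(q):
    """Arbitrary smooth test function standing in for S(q)."""
    return math.sin(q) + 0.5 * q


def action_derivative(q):
    """Analytic derivative dS/dq of the test function."""
    return math.cos(q) + 0.5


def psi(q):
    """Change of notation: psi = exp(S / K)."""
    return math.exp(action(q) / K)


def kinetic_terms(q, step=1.0e-6):
    """Return (K^2/2m)(dpsi/dq)^2 and psi^2 (1/2m)(dS/dq)^2 at the point q."""
    dpsi = (psi(q + step) - psi(q - step)) / (2.0 * step)  # central difference
    in_psi_notation = (K ** 2 / (2.0 * M)) * dpsi ** 2
    in_s_notation = psi(q) ** 2 * action_derivative(q) ** 2 / (2.0 * M)
    return in_psi_notation, in_s_notation


if __name__ == "__main__":
    for q in (0.3, 1.0):
        a, b = kinetic_terms(q)
        print(f"q = {q}: {a:.10f} vs {b:.10f}")
```

The two columns agree to within the finite-difference error, as expected from a pure change of notation.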
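Now if we transform just the derivative part of the equation back into the original notation involving $S$ using (7.3.3) we have:
$$\left(\frac{\partial \psi}{\partial q}\right)^2 = \frac{\psi^2}{K^2}\left(\frac{\partial S}{\partial q}\right)^2$$
i.e.
$$(\psi^2)\frac{1}{2m}\left(\frac{\partial S}{\partial q}\right)^2 + (\psi^2)V(q) - (\psi^2)E = 0 \qquad (7.3.7)$$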
the left-hand side of which is identical to the integrand ("Lagrangian⁴ density") of equation (7.3.1) on page 128, from which classical particle mechanics may be derived, if we identify the classical $\rho$ with $\psi^2$. If we allow that $\psi$ may be complex, going through the whole procedure again generates
$$|\psi|^2 \frac{1}{2m}\left(\frac{\partial S}{\partial q}\right)^2 + |\psi|^2 V(q) - |\psi|^2 E = 0 \qquad (7.3.8)$$
and insisting that (7.3.8) be identical to the Hamilton-Jacobi equation yields $K = \pm ik$ (say)⁵
$$S = -ik \ln \psi : \quad \psi = \exp(iS/k)$$
where $k$ and $S$ are real and the minus sign has been chosen for conventional reasons. Note that, although equation (7.3.8) is written in terms of the two functions $S$ and $\psi$, this is only to emphasise its similarity to equation (7.3.1); the essence of the heuristic arguments here is to show that the whole equation may, using (7.3.3), be written in terms of a single complex function $\psi$:
$$\frac{k^2}{2m}\left|\frac{\partial \psi}{\partial q}\right|^2 + V(q)|\psi|^2 - E|\psi|^2 = 0.$$
7.3.2. Interpreting the Change
Now we can attempt to interpret the terms on the left-hand side of (7.3.8). Let us do this by temporarily ignoring the fact that $|\psi|^2$ is constrained to be unity and look at the form of (7.3.8) when translated back into coordinate and momentum variables, i.e.
$$|\psi|^2 \frac{1}{2m} p^2 + |\psi|^2 V(q) - |\psi|^2 E = 0. \qquad (7.3.9)$$
⁴ There is an unfortunate conflict of nomenclature here. Historically, the Lagrangian function was the one which we have used earlier, the difference between the kinetic and potential energies of a classical mechanical system. Its appearance as the integrand in the derivation of Hamilton's equations made it the archetype for the variational method so that, in a mathematical context, "Lagrangian" and "Lagrangian density" have lost their original interpretation and simply become colloquial names for the integrand in a variational principle. This is particularly unfortunate in mechanics since one is typically dealing with "real" Lagrangians and variational principles, which may involve integrands which are simply called Lagrangians.
⁵ This is where we depart from Schrodinger's own derivation slightly; Schrodinger had $K$ real.
Taking the first term, it has the form

(positive function) × (Kinetic Energy)

that is, it has the form of a distribution of kinetic energy. Similarly, the other two terms in (7.3.9) have the form of a distribution of potential energy and a distribution of total energy, respectively. Further, the function $|\psi|^2$ has some of the properties of a distribution function: it is always positive and it is bounded. Recall again that it is possible to interpret $S$ as referring to an ensemble of systems and that $\rho$ (replaced now by $|\psi|^2$) was the density of trajectories or particle density and we have the beginnings of a new approach. The essential point, however, is to make sure that the precise meaning of "a distribution of trajectories or particles" is made clear.

With these hopeful ideas in mind we now go back to equation (7.3.2) and allow $S$ to be complex while retaining the same relationship between $S$ and $\psi$. This is now not a trivial change of notation generating mere tautologies; it is a new assumption in mechanics. Since classical mechanics has no use for a two-component $S$ function, all the dynamics comes out of a real $S$. The whole object of this change is to introduce a "new degree of freedom" into the development which will separate $|\psi|^2$ from $S$ and the momentum in order that $|\psi|^2$ can play the role of a genuine distribution function not constrained to be a constant. Thus, by writing

$$S - iR = -ik\ln\psi: \qquad \psi = \exp[(R + iS)/k] \qquad (7.3.10)$$

we have

$$|\psi|^2 = \exp(2R/k) = \rho(q) \quad \text{(say)}$$

and equation (7.3.9) becomes

$$\rho\,\frac{p^2}{2m} + \rho V(q) - \rho E = 0 \qquad (7.3.11)$$
where $\rho$ is a function of space and a discussion of the meaning of $p$ (the momentum) in the new circumstances of complex "action" has been deliberately deferred. We can now use the same interpretation of the terms in (7.3.11) as before; each term is a distribution function multiplying an energy function. But what is $\rho(q)$ a distribution of and how is it to be determined? We have introduced a new function $R$ and no equation to determine it; we can still formally cancel $\rho$ from (7.3.11) leaving the Hamilton-Jacobi equation for $S$.

In the classical continuity equation, which was one of our starting points, the strict interpretation of the function $\rho$ was a density or distribution of trajectories since, for example, in the case of a single particle, there can be no such thing as a particle density; the particle is either at a given point or it is not. Similar remarks apply to many-particle systems. If we visualise the motion of $N$ particles in three-dimensional space as the motion of a single particle in a $3N$-dimensional configuration space, the same conclusion holds; a distribution of trajectories is a mathematical object but a distribution of particle(s) is a physical quantity. The only coherent interpretation of the function $\rho = |\psi|^2$ is as a probability distribution for the position of the particle(s). The transition between classical mechanics and Schrodinger's mechanics involves making explicit what is only implicit in the Hamilton-Jacobi equation and its associated continuity equation; the "distribution of trajectories" function is replaced by a particle probability distribution function. This decision takes us away from the general ansatz of classical mechanics, dealing as we now shall be with probabilities.

The most important consequence of dealing with probabilities is the change in the referent involved. If we continue for the moment to think of a single-particle system, the referent of any theory which interprets $\rho$ as a particle position probability density must be the abstract particle in the environment generated by the particular constraints of potential energy function, etc. Naturally, we cannot simply call a function a probability distribution function; it must satisfy the mathematical conditions imposed on any such function.
That $\rho$ satisfies such conditions is not obvious; indeed it is impossible to say until we have a (differential) equation which will generate it via the function $\psi$. Investigation of these conditions must be deferred until we have some more information.

Like us, Schrodinger was acutely aware of the need to develop a new mechanics of sub-atomic particles and, no doubt, even more acutely aware that a new mechanics cannot be got by changes in notation and re-interpretation of that notation, however suggestive those changes might be. What was needed to enable the theoretical understanding of the dynamics of sub-atomic particles was a variational condition which would "contain" or "go over into" the Hamilton-Jacobi condition for large masses and allow for our ignorance of "initial conditions" in the sub-atomic world. Schrodinger was able to present a single variational condition which generated both $R$ and $S$ because they had both been absorbed into a single complex function $\psi$, and so could obtain both the particle's probability distribution and the momenta.

Historically, Schrodinger's reasoning (if creative thinking can be called reasoning) was different from the development given here and was partially based on an analogy between optics and mechanics originally due to Hamilton. This very analogy led to some confusion about the interpretation of quantum mechanics which we are trying to side-step here; as we remarked above, pedagogy is not creativity.
7.4. Schrodinger's Dynamical Law
I have tried to present a plausible way in which the Hamilton-Jacobi equation and its continuity equation could be extended and generalised. The elements are:

1. The concentration, in classical mechanics, of information about trajectories into a single scalar function $S$ and a trajectory density function $\rho$ generated by the variation principle equation (7.3.1).
2. The functions $S$ and $\rho$ being regarded as referring to the trajectories of an ensemble of systems differing in their initial conditions.
3. The possibility of relating $S$ to the distribution function $\rho$ by the (formally) slight generalisation of admitting complex $S$ via the complex function $\psi$.
4. Changing the referent of, for example, the mechanics of a single particle in a given environment from "an ensemble of trajectories of a particle in the given environment with their initial conditions" to the abstract object "a particle in the given environment".

Now it is time to look for a variational principle which will replace equation (7.3.1) in this new situation. It turns out that all that is required is to use equation (7.3.1) in the new context described in the last section with $\rho$ and $S$ generated from the function $\psi$. This is the mathematics of the derivation; the interpretation will be quite different.
7.4.1. Position Probability and Energy Distributions
The function $\rho(q;t) = |\psi(q;t)|^2$ is (for the abstract single particle in the given environment) a probability of position distribution, i.e.

$$P(W) = \int_{W\subset R^3} |\psi|^2\,dV$$

is the probability that the abstract particle be in the region of space $W$ (where $R^3$ models ordinary three-dimensional space $E^3$). If this is the case, we must assume that $\psi$ may be normalised to unity by a suitable numerical multiplier:

$$P(R^3) = \int_{R^3} |\psi|^2\,dV = 1.$$

That is, we expect that random measurements of the positions of concrete particles in the given environment will give an approximation to $P(W)$ by means of counting the relative number of such concrete particles found in the region of $E^3$ modelled by the region $W$ of $R^3$.

The "distribution" of the dynamical variables for this abstract object takes the form

(Probability distribution of the particle at a point) × (Value of the dynamical variable at that point)

which we may write as a distribution of the dynamical variable $A$, given by

$$\rho_A(q^i;t) = \rho(q^i;t)\,A(q^i,p_i;t)$$

where the dynamical variable $A(q^i,p_i;t)$ will, in general, depend on position ($q^i$), momenta ($p_i$) and time ($t$). Notice that this terminology carries the possibilities of misinterpretation, in particular:
• Quantities like $\rho_A(q^i;t)$ are not probability distribution functions; they are not always positive and may well not satisfy any of Kolmogorov's axioms.
• The variable $A$ is not spread out in space with density $\rho_A$. $A$ is a property of the particle and so there is only "any $A$" in a given region when there is a particle in that region and the probability that the abstract particle be in a given region is given by the position probability distribution function $\rho(q^i;t)$.

So, it is safer to think of quantities like $\rho_A(q^i;t)$ as

(Probability distribution of the abstract particle at a point) × (Value of the dynamical variable $A$ that a particle would have if it were at that point).

7.4.2. The Schrodinger Condition
In Schrodinger's notation the important Hamiltonian and energy densities become
and
What is now required is an "equation of motion" (a new dynamical law) which replaces the Hamilton-Jacobi equation in these new circumstances. The H-J equation which determines the particle trajectories is determined by the variation principle (7.3.1):

$$\delta\int dV\int dt\;\rho\left\{H(\partial S/\partial q^i, q^i; t) + \frac{\partial S}{\partial t}\right\} = 0$$
and can be given quite a simple verbal formulation:

Of all the possible trajectories $q^i(t)$ and momenta $p_i(t)$ of the particles described by $H(q^i,p_i,t)$, the ones which occur in nature are those for which the value of the function $H$ is numerically equal to the energy of the system.
That is, for real motions of the particles in the system, only those $q^i$ and $p_i$ are allowed in $H$ which make

$$H = -\frac{\partial S}{\partial t} = E. \qquad (7.4.12)$$
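The equation which determines the density of trajectories $\rho$ is also obtained from this variation principle in which the integrand is minimised with respect to two functions $\rho$ and $S$ generating two differential equations; one for $\rho$ and one for $S$. Schrodinger's theory uses the same variation principle with the functions $\rho$ and $S$ replaced by their forms in terms of the new function $\psi$:

$$\delta\int dV\int dt\;|\psi|^2\left\{H\!\left(-\frac{ik}{\psi}\frac{\partial\psi}{\partial q^i}, q^i; t\right) - \frac{ik}{\psi}\frac{\partial\psi}{\partial t}\right\} = 0.$$

That is:

$$\delta\int dV\int dt\,\{\rho_H - \rho_E\} = 0. \qquad (7.4.13)$$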
This replaces the classical requirement by a new quantum law which is just as easy to state verbally:

The Schrodinger Condition
Of all possible trajectories $q^i(t)$ and momenta $p_i(t)$ of the particles described by $H$ the ones which occur in nature are the ones which on the average over space and time make $H$ equal to the energy of the system.

That is, so to speak, Schrodinger's modification of the Hamilton-Jacobi principle is that the Hamilton-Jacobi equation does not have to be obeyed point by point in a configuration space of $3N$ dimensions but only in the mean over all space.

The requirement that both $\rho$ and $S$ be expressed in terms of the single function $\psi$ means that the Schrodinger Condition will generate a single differential equation. Although the above verbal expression refers to the trajectories of the particles it is important to note that, in Schrodinger's mechanics, the trajectories are not determined. What is determined by the Schrodinger Condition are the functions $\psi$ which determine (among other things) only the particle probability distributions. We shall see in the next chapter how to obtain the functions $\psi$ for particular systems; from now on the term "state function" will be used for a function which satisfies the Schrodinger Condition.⁶

⁶ "Wave function" is the more widely used term here but only very rarely do these functions have the form of waves.
In more compact notation this new dynamical law becomes:

$$\int \rho_H\,dV\,dt - \int \rho_E\,dV\,dt = 0 \qquad (7.4.14)$$
where the Hamiltonian and energy densities $\rho_H$ and $\rho_E$ are defined above. As it stands this variational principle is not in a usable form. What we shall do shortly is carry through the variational calculus to obtain a differential equation which will enable us to actually compute the functions $\psi$ for a variety of mechanical systems. Application of standard variational methods to (7.4.13) generates the Schrodinger equation for $\psi$ and some boundary conditions which will need elucidation.

The function $|\psi|^2$, once determined, can be expected to contain reference to all possible trajectories with equal weights. For example, for an isolated single particle moving in a potential $V$ the distribution is over all possible trajectories with constant energy obeying (7.4.13); the trajectories only differ in "initial" conditions. Now in studying a dynamical law which only determines the properties of averages (quadratures) of dynamical variables ($H$ and $E$) over trajectories we should not be surprised if we cannot recover the individual trajectories over which the averages have been taken.

Schrodinger's law generates an equation for the probability distribution of the abstract particle in space. Certainly this may be visualised as a distribution averaged over all possible initial conditions for the given abstract particle's environment, i.e. over all possible concrete trajectories. These individual concrete trajectories are, however, not required by Schrodinger's mechanics to solve a Hamilton-Jacobi equation (or, indeed, any equation); only the averages (properties of the abstract object) are fixed by (7.4.13). That is not necessarily to say that the particles in each concrete object having the environment of the abstract object but (say) differing in initial conditions do not have perfectly definite trajectories along which some (as yet unknown) laws are obeyed; it is simply that condition (7.4.13) does not tell us what these trajectories are.
For example, in the classical mechanics of an isolated system we have at every point in space

$$H - E = 0.$$

But all that (7.4.13) requires is that if

$$\int_{W_1} (\rho_H - \rho_E)\,dV = \delta$$

for some region $W_1$ of space, then there are enough "allowed trajectories" so that, for some other region $W_2$ of space,

$$\int_{W_2} (\rho_H - \rho_E)\,dV = -\delta$$

so that, on average over all space:

$$\int_{R^3} (\rho_H - \rho_E)\,dV = 0.$$
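The cancellation between regions described above can be illustrated numerically. What follows is only a sketch under stated assumptions, not part of the original argument: it uses a one-dimensional harmonic oscillator with $m = \omega = k = 1$, whose ground state $\psi(x) = \pi^{-1/4}\exp(-x^2/2)$ and energy $E = 1/2$ are standard results, and it takes $\rho_H = \frac{1}{2m}|d\psi/dx|^2 + V|\psi|^2$ and $\rho_E = E|\psi|^2$ for this stationary state. The function names and the choice of the regions $W_1 = [-1, 1]$ and $W_2 = $ (the two tails) are illustrative.

```python
import math

def psi(x):
    # Assumed example: harmonic-oscillator ground state with m = omega = k = 1.
    return math.pi ** -0.25 * math.exp(-0.5 * x * x)

def dpsi(x):
    # Analytic derivative of the assumed psi.
    return -x * psi(x)

E = 0.5  # ground-state energy in these units

def density_difference(x):
    # rho_H - rho_E = (1/2)|psi'|^2 + V|psi|^2 - E|psi|^2 with V = x^2 / 2.
    rho = psi(x) ** 2
    return 0.5 * dpsi(x) ** 2 + 0.5 * x * x * rho - E * rho

def integrate(f, a, b, n=20000):
    # Midpoint rule on [a, b].
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Region W1 = [-1, 1]; region W2 = both tails (the integrand is even,
# and the contribution beyond |x| = 10 is negligible).
inner = integrate(density_difference, -1.0, 1.0)
outer = 2.0 * integrate(density_difference, 1.0, 10.0)
total = inner + outer
```

In this sketch `inner` is negative and `outer` is positive by the same amount, so the difference $\rho_H - \rho_E$ is not zero point by point although `total` vanishes to numerical accuracy.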
As we shall see later, the connections amongst the averages are able to be cast into a form reminiscent of Newton's law (the Ehrenfest relationships) but again only the averages⁷ obey these relationships, not the individual trajectories which are completely invisible to Schrodinger's mechanics.

The difference between the mechanics generated by the Schrodinger Condition and classical statistical mechanics which deals with ensembles,⁸ distributions and ensemble averages will help to make the above points clearer. In Schrodinger's mechanics the motion of the abstract object is required to solve the quantum "Hamilton-Jacobi" equation; for all we know individual concrete objects may well not obey any Hamilton-Jacobi-like equation. In classical statistical mechanics the motion of each member of the relevant ensemble is required to solve the Hamilton-Jacobi equation exactly and averaging is done with these exact solutions. That is, the condition for the satisfaction of the relevant dynamical law is:

$H - E = 0$                    Classical Particle Mechanics
$\rho = \delta(H - E)$         Classical Statistical Mechanics
$\langle H - E\rangle = 0$     Schrodinger's Mechanics

showing the clear "mean value" nature of the quantum case.
7.5. Probability Distributions?
In this section a simplified example is used to attempt to clarify the nature of the alleged distribution function $|\psi|^2$ since, so far, it has simply been asserted that this function is a probability distribution without any check on the properties which a probability distribution function must have. Attention is again restricted to a single abstract particle in ordinary three-dimensional space for simplicity and intuitive appeal since it is intended to try to picture the various qualities appearing in Schrodinger's theory.

The function $\psi$ fixed by the variational requirement (7.4.14) is, in general, complex and its primary physical interpretation is via $|\psi|^2$ as indicated in the build-up to (7.4.14). For an isolated (constant energy) system with a time-independent Hamiltonian function, $-\partial S/\partial t = \text{constant} = E$ (say) and so the function $\psi$ is dependent on time only through an exponential factor of modulus unity ($\exp(-iEt/k)$) which does not appear in $|\psi|^2$ and so we neglect it for the moment. The interpretation of $|\psi|^2$ as a particle probability distribution is clearly that

$$\int_{W\subset R^3} |\psi|^2\,dV \qquad (7.5.15)$$

is the (relative) probability that the abstract particle be in the region of space modelled by $W$. Clearly, numbers like (7.5.15) must be judged by reference to the size of

$$\int_{R^3} |\psi|^2\,dV \qquad (7.5.16)$$

if $R^3$ models⁹ all three-dimensional space. If (7.5.16) is finite (and this is not always the case) then $\psi$ can be re-scaled by a constant factor so that (7.5.16) has some convenient value; unity or the number of particles in the abstract object are obvious choices. In fact, it is convenient to normalise $\psi$ to unity,¹⁰ i.e. insist, by use of a numerical factor, that

$$\int |\psi|^2\,dV = 1 \qquad (7.5.17)$$

so that the relative numbers

$$\frac{\int_W |\psi|^2\,dV}{\int |\psi|^2\,dV} \qquad (7.5.18)$$

are the same as the numbers (7.5.15).

⁷ We shall see in Section 16.1.1 that even these averages are misleading.

⁸ Here, for obvious reasons, I mean the older literal, Boltzmann, type of ensemble not the Gibbs ensemble.

⁹ Here, as elsewhere, I distinguish between three-dimensional space and the product of three copies of the real number system $R^3$ plus three linearly independent directions which model that space. This does lead to some unfortunate circumlocutions at times.

¹⁰ This is obviously convenient for the probability interpretation of $\psi$ but it also means that any theory does not have to specify the number of particles in $\psi$.
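The re-scaling step can be made concrete with a small numerical sketch. It is an illustration only, under assumptions that are not in the text: a one-dimensional stand-in for $R^3$, a hypothetical unnormalised function $3\exp(-x^2/2)$, a finite interval $[-10, 10]$ playing the role of all space, and a simple midpoint rule for the integrals.

```python
import math

def integrate(f, a, b, n=20000):
    # Midpoint rule on [a, b].
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def psi_unnormalised(x):
    # Hypothetical example function; the factor 3 is arbitrary.
    return 3.0 * math.exp(-0.5 * x * x)

L = 10.0  # [-L, L] stands in for "all space"; the tails beyond are negligible

# Analogue of (7.5.16): the integral of |psi|^2 over all space.
norm = integrate(lambda x: psi_unnormalised(x) ** 2, -L, L)

def psi(x):
    # Rescaled by a constant factor so that the analogue of (7.5.17) holds.
    return psi_unnormalised(x) / math.sqrt(norm)

# Region W = [0, 1].
relative = integrate(lambda x: psi_unnormalised(x) ** 2, 0.0, 1.0) / norm  # (7.5.18)
absolute = integrate(lambda x: psi(x) ** 2, 0.0, 1.0)                       # (7.5.15)
check_norm = integrate(lambda x: psi(x) ** 2, -L, L)
```

Here `relative` and `absolute` agree, which is the statement that after normalisation the numbers (7.5.18) coincide with the numbers (7.5.15).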
Now if (7.5.17) is imposed then the measures (7.5.15) satisfy Kolmogorov's axioms for an uninterpreted probability system, for if:

$$P(W) = \int_W |\psi|^2\,dV$$

then

• $P(W) \ge P(W')$ if $W \supset W'$
• $P(W_1) + P(W_2) = P(W_1 + W_2)$ if $W_1 \cap W_2 = \emptyset$
• $P(R^3) = 1$

Further, we expect that solution of the variational problem will generate a differential equation for $\psi$ which will generally ensure that $|\psi|^2$ is a distribution function, i.e.

• $|\psi|^2$ be single valued
• $|\psi|^2$ be continuous
• and, of course, $|\psi|^2 \ge 0$.

We may therefore use any or all of the techniques and concepts of probability theory. In particular we may refer to the numbers (7.5.18) (normalised measures) as "the probability that the abstract particle be in region $W$" (recall, for the moment we are concentrating on single-particle systems for simplicity). The experimental verification of these numbers is, naturally, via sets of random measurements on large numbers of concrete one-particle systems.¹¹ The relative numbers of times a concrete particle in such random experiments is found in those regions should give numerical measures of these probabilities if enough concrete systems are used and they satisfy the statistical criteria of randomness.

The extension of these ideas to many-particle systems is straightforward and we may, with caution, use the probability concept and say that

$$\int_{W_1,W_2,\ldots} |\psi(q^1,q^2,\ldots,q^{3N})|^2\,dV_1\,dV_2\cdots dV_N$$

is the probability that particle 1 is in region $W_1$, particle 2 in region $W_2$, etc., always provided that the function $\psi$ has been normalised to unity in $3N$-dimensional space.

¹¹ Or, less realistically, large sets of random measurements on a single concrete particle.

That these mathematical conditions on $\psi$ are satisfied must be verified in each specific case studied.
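As one such specific case, the following sketch checks the additivity and ordering properties and imitates the "random measurements on concrete systems" by sampling. Everything in it is an assumption made for illustration: a one-dimensional density $|\psi|^2 = \exp(-x^2)/\sqrt{\pi}$ (a Gaussian with standard deviation $1/\sqrt{2}$), intervals in place of regions of $R^3$, and a seeded pseudo-random generator in place of real measurements.

```python
import math
import random

def probability_in_interval(a, b):
    # P(W) for W = [a, b] when |psi|^2 = exp(-x^2) / sqrt(pi)
    # (assumed example: a Gaussian density with standard deviation 1/sqrt(2)).
    return 0.5 * (math.erf(b) - math.erf(a))

def relative_frequency(a, b, trials, seed=0):
    # Simulate position measurements on many "concrete" one-particle systems
    # and count the fraction that fall inside [a, b].
    rng = random.Random(seed)
    sigma = 1.0 / math.sqrt(2.0)
    hits = sum(1 for _ in range(trials) if a <= rng.gauss(0.0, sigma) <= b)
    return hits / trials

p_w1 = probability_in_interval(0.0, 1.0)
p_w2 = probability_in_interval(1.0, 2.0)
p_union = probability_in_interval(0.0, 2.0)
p_all = probability_in_interval(-math.inf, math.inf)
freq = relative_frequency(0.0, 1.0, trials=200_000)
```

With enough trials the counted fraction `freq` should approach `p_w1`; how close it comes for a given number of trials is a statistical question, which is the point made in the text about the criteria of randomness.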
7.6. Summary of Basic Principles
Whatever the reader might make of the attempts in this chapter to make the transition from classical particle mechanics to Schrodinger's mechanics comprehensible, this final section of this chapter gives a collection of the principles from which all of Schrodinger's mechanics may be generated based on, but not dependent on, the previous heuristic considerations. It must be stressed yet again that it is not possible to derive Schrodinger's mechanics from classical particle mechanics however suggestive some of the analogies might be. Schrodinger's mechanics is a physical theory with methods, concepts and interpretations quite different from the classical theory. But, of course, since classical mechanics, classical statistical mechanics and Schrodinger's mechanics have contiguous, even overlapping, regions of applicability, it is not surprising that some mathematical structures and some concepts are common to all three.

In this summary I collect together the basic ideas which have been developed¹² and which do enable the generation of Schrodinger's mechanics and all the associated abstract structures and concepts. These principles are just that, principles; they are not axioms and are not presented as such. They presuppose most of standard classical mathematics and the logic which is assumed by mathematics. In physical theories axioms are only ever generated post hoc and are used to clarify and systematize an existing body of theory; like poetry, they may be thought of as emotion recollected in tranquillity.

We start from the form of the classical variation principle (7.3.1) from which both the Hamilton-Jacobi equation and its associated conservation equation may be derived:

$$\delta\int dV\int dt\;\rho\left\{H(\partial S/\partial q^i, q^i; t) + \frac{\partial S}{\partial t}\right\} = \delta\int dV\int dt\,\left\{\rho_H(\partial S/\partial q^i, q^i; t) - \rho_E\right\} = 0$$

¹² With hindsight of the actual material of quantum theory, of course.
and make the following assumptions to generate Schrodinger's mechanics (for concreteness for single-particle systems, although the generalisation is straightforward).

1. The referent of the theory is the abstract particle in the environment described by any potential function in the Hamiltonian $H$.

2. Both of the functions $S(q^i;t)$ and $\rho(q^i;t)$ may be expressed in terms of a single (complex, in general) new function $\psi(q^i;t)$:

$$S = -ik\ln\psi, \qquad \rho \stackrel{\text{def}}{=} |\psi|^2.$$

The function $\rho(q^i;t)$ is a position probability density for the abstract particle in the usual Kolmogorov sense:

$$\int_{W\subset R^3} |\psi|^2\,dV$$

is the probability that the abstract particle be in a region of $E^3$ (ordinary three-dimensional space) modelled by $W$ if $\rho$ is normalised over all space to unity.

3. The momenta are, as usual, given by

$$p_i \stackrel{\text{def}}{=} \frac{\partial S}{\partial q^i} = -ik\,\frac{1}{\psi}\frac{\partial\psi}{\partial q^i}.$$

This quantity is the momentum that the abstract particle has if it is at the point $q^i$ at time $t$.

4. The distribution of momentum in three-dimensional space ($\rho_{p_i}$, say) is this momentum multiplied by the position probability density for the abstract particle:

$$\rho_{p_i}(q^i;t) \stackrel{\text{def}}{=} \rho(q^i;t)\times p_i = \psi^*(q^i;t)\left(-ik\frac{\partial}{\partial q^i}\right)\psi(q^i;t).$$
5. If any dynamical variable ($A$, say) depends on coordinates $q^i$ and momenta $p_i$ in classical particle mechanics, then in Schrodinger's mechanics the variable has exactly the same form with the momenta replaced by the above expression:
and the corresponding distribution of $A$ is given by an expression of identical form to that for momenta:
In particular the Hamiltonian density ($\rho_H$, say) in the above variation principle is given by this expression:

6. The energy ($E$, say) is, again as usual,

$$E \stackrel{\text{def}}{=} -\frac{\partial S}{\partial t} = ik\,\frac{1}{\psi}\frac{\partial\psi}{\partial t}$$

so that the distribution of energy is

$$\rho_E \stackrel{\text{def}}{=} \rho E = \psi^*(q^i;t)\left(ik\frac{\partial}{\partial t}\right)\psi(q^i;t).$$

Inserting the expressions for $\rho_H$ and $\rho_E$ into the classical variation principle above will generate the Schrodinger equation and the relevant boundary conditions for the problem.

There is an important point to be made here. Earlier, I have insisted that Schrodinger's mechanics differs profoundly from classical mechanics and cannot be derived from the latter. But in comparing the two theories in their most general form, the variation principle (7.3.1), we seem to find the opposite to be the case. The only apparent difference seems to be in notation:

• If we take the functions $\rho(q^i;t)$ and $S(q^i;t)$ as independent variables in (7.3.1) we obtain the Hamilton-Jacobi equation and its continuity equation.
• If we take

$$\rho(q^i;t) \stackrel{\text{def}}{=} |\psi(q^i;t)|^2, \qquad S(q^i;t) \stackrel{\text{def}}{=} -ik\ln\psi(q^i;t)$$

we obtain (as we shall shortly see) the Schrodinger equation and the associated theory which I allege is quite different.

This seems, at first sight, to be nothing more than a notational change; using one complex function to do duty for two real functions. The crucial difference lies in the fact that the definition of the momenta (conjugate to $q^i$) in Schrodinger's mechanics is still

$$p_i \stackrel{\text{def}}{=} \frac{\partial S}{\partial q^i}$$

which definition contains the derivative of $\psi$ and therefore contains an implied derivative of the probability density function $\rho$ which is not present in the classical mechanical case. When the Schrodinger equation has been derived and some associated equations for momentum distributions established, this matter will be addressed in earnest, since it is clearly here where the essential difference between classical particle mechanics and Schrodinger's mechanics lies.

Notice that the definition of, for example, the square of a momentum is given by:

$$\rho_{p_i^2}(q^i;t) = \text{(The square of the momentum)} \times \text{(Position probability distribution)}$$

and certainly not (Momentum × Position probability distribution)², as we have implied by using the correct terminology in $A$ and $H$, since, among other things, in the latter case the dimensions of the Hamiltonian density would be different from the dimensions of the energy density.
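The "implied derivative of the probability density" can be seen in a short numerical sketch. It is illustrative only: the functions `R` and `S` below are hypothetical choices, $k$ is set to 1, one coordinate $q$ is used, and the derivative is taken by a central difference. With $\psi = \exp[(R + iS)/k]$ the definition $p = -ik\,\psi^{-1}\,\partial\psi/\partial q$ gives $p = \partial S/\partial q - i\,\partial R/\partial q$, so the momentum acquires a part that depends on the density $\rho = \exp(2R/k)$.

```python
import cmath
import math

k = 1.0  # the constant k of the text, set to 1 here for convenience

def R(q):
    # Hypothetical choice: gives rho = exp(2R/k) = exp(-q^2).
    return -0.5 * q * q

def S(q):
    # Hypothetical real "action" with constant classical momentum dS/dq = 2.
    return 2.0 * q

def psi(q):
    # (7.3.10): psi = exp[(R + iS)/k]
    return cmath.exp((R(q) + 1j * S(q)) / k)

def momentum(q, h=1e-5):
    # p = -ik (1/psi) dpsi/dq, with the derivative taken by central difference.
    dpsi = (psi(q + h) - psi(q - h)) / (2.0 * h)
    return -1j * k * dpsi / psi(q)

q0 = 0.7
p = momentum(q0)                         # expected form: dS/dq - i dR/dq
rho = abs(psi(q0)) ** 2                  # should equal exp(2R/k)
action = -1j * k * cmath.log(psi(q0))    # should equal S - iR
```

The real part of `p` reproduces the classical $\partial S/\partial q$; the imaginary part is $-\partial R/\partial q$, which is the contribution that has no counterpart when $\rho$ and $S$ are treated as independent real functions.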
Chapter 8
The Schrodinger Equation
At last we are able to generate the Schrodinger equation which is at the heart of all applications of Schrodinger's mechanics. The solutions of this equation for particular systems enable the probability distributions for abstract systems to be computed and, as it turns out, the other product of the Schrodinger Condition (the boundary conditions on these distributions) generates the most surprising feature of Schrodinger's mechanics, the existence of discrete, quantised, values of some dynamical variables; in particular the existence of quantised energies for particles in some environments. The interpretation of the solutions of the Schrodinger equation is, of course, fixed in advance by the considerations of the previous chapter; but there are some unfamiliar consequences of this interpretation.
Contents
8.1. The Variational Derivation
8.2. Some Interpretation
8.3. The Boundary Conditions
8.4. The Time-Independent Schrodinger Equation

8.1. The Variational Derivation
The application of the standard techniques of the variational calculus to the Schrodinger Condition is basically the same as the derivation of the Lagrange equations except that the integrand ("Lagrangian") is itself an integral over all space of a "Lagrangian density". The constant $k$ involved in the relationship between the functions $S$ and $\psi$ in $S = -ik\ln\psi$ will be taken to be unity in what follows.¹ Its value must be obtained a posteriori by comparing the results of specific calculations with experiment.

Schrodinger's requirement that the Hamiltonian function be equal to the energy on average has the general form of a variational principle of the type

$$\delta\mathcal{A} = 0 \quad\text{where}\quad \mathcal{A} = \int A(\psi, \psi^*, \partial_i\psi, \partial_i\psi^*, \partial_t\psi, \partial_t\psi^*)\,dV\,dt$$

where $\psi$ is a function of the $q^i$ and $t$ and

$$\partial_i = \frac{\partial}{\partial q^i}; \qquad \partial_t = \frac{\partial}{\partial t}; \qquad dV = \sqrt{g}\,dQ = \sqrt{g}\prod_{i=1}^{3N} dq^i$$
where it is assumed that there are $N$ particles in the system and, as usual, $g$ is the metric determinant of the co-ordinates $q^i$, whose square root $\sqrt{g}$ may be evaluated as the Jacobian of the transformation between Cartesians and the $q^i$ which we assume to be non-zero. For convenience, we may visualise the single particle in ordinary three-dimensional space, i.e. when the $3N$-dimensional configuration space is ordinary, real, space. In orthogonal co-ordinates for example $\sqrt{g} = h_1h_2h_3$ where the $h_i$ are the scale functions associated with the tangent space basis in the usual way.

¹ This decision amounts to taking Planck's constant $h$ divided by $2\pi$ ($\hbar = h/2\pi$) as the unit of action; it is a very small unit. Where it is desirable to stress the fact that Planck's constant is involved I shall revert to standard units and write in $\hbar$ explicitly.

Thus $\mathcal{A}$ is a functional of $\psi$ and $\psi^*$, etc., and it is desired to find a solution of the variational problem by choice of optimum $\psi$; the standard problem in variational calculus. I use elementary methods. Let $\delta\psi$ and $\delta\psi^*$ be linearly independent variations in the linearly independent functions $\psi$ and $\psi^*$ so that we may investigate the variation in the functional $\mathcal{A}$ in the neighbourhood of its value at $\psi$, $\psi^*$ by writing

$$\psi \to \psi + \delta\psi = \psi + \epsilon\eta, \qquad \psi^* \to \psi^* + \delta\psi^* = \psi^* + \epsilon\eta^*$$

where $\epsilon$ is a "small" real parameter and $\eta$, $\eta^*$ are linearly independent functions arbitrary apart possibly from some boundary conditions. It is assumed that the integrand $A$ is a sufficiently smooth function of $\psi$ and $\psi^*$ so that it may be expanded as a Taylor series about $\psi$, $\psi^*$:
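plus additional quadratic and higher terms in $\delta\psi$, $\delta\psi^*$. Now the variation "operator" $\delta$ and partial differentiation commute so that

$$\delta(\partial_i\psi) = \partial_i(\delta\psi) = \epsilon\,\partial_i\eta, \qquad \delta(\partial_t\psi) = \partial_t(\delta\psi) = \epsilon\,\partial_t\eta.$$

Thus, to first order in $\epsilon$,

$$\delta A = \epsilon\left(\frac{\partial A}{\partial\psi}\eta + \sum_i \frac{\partial A}{\partial(\partial_i\psi)}\partial_i\eta + \frac{\partial A}{\partial(\partial_t\psi)}\partial_t\eta + \frac{\partial A}{\partial\psi^*}\eta^* + \sum_i \frac{\partial A}{\partial(\partial_i\psi^*)}\partial_i\eta^* + \frac{\partial A}{\partial(\partial_t\psi^*)}\partial_t\eta^*\right)$$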
and so
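etc. by carrying through part of the integration. Taking a typical one of the terms containing time derivatives of $\eta$ and integrating by parts:

$$\int \frac{\partial A}{\partial(\partial_t\psi)}\,\partial_t\eta\,dV\,dt = \left[\frac{\partial A}{\partial(\partial_t\psi)}\,\eta\right] - \int \partial_t\!\left(\frac{\partial A}{\partial(\partial_t\psi)}\right)\eta\,dV\,dt$$

where the first (integrated) term contributes to the boundary conditions on $\psi$ to be discussed later. This satisfies the requirement that $\eta$ appear as a factor in the integrand. A typical member of the terms involving spatial derivatives $\partial_i\psi$ is:

$$\int \frac{\partial A}{\partial(\partial_i\psi)}\,\partial_i\eta\,dV\,dt = \int \sqrt{g}\,\frac{\partial A}{\partial(\partial_i\psi)}\,\partial_i\eta\,dQ\,dt.$$

Again, integrating by parts and noting the additional complication of the presence of $\sqrt{g}$ we obtain

$$\int \sqrt{g}\,\frac{\partial A}{\partial(\partial_i\psi)}\,\partial_i\eta\,dQ\,dt = \left[\sqrt{g}\,\frac{\partial A}{\partial(\partial_i\psi)}\,\eta\right] - \int \partial_i\!\left(\sqrt{g}\,\frac{\partial A}{\partial(\partial_i\psi)}\right)\eta\,dQ\,dt.$$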
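Again the first term contributes to the boundaries. Using these typical terms the expression for $\delta\mathcal{A}$ to first order in $\epsilon$ is, apart from the boundary contributions,

$$\delta\mathcal{A} = \epsilon\int\left[\frac{\partial A}{\partial\psi} - \frac{1}{\sqrt{g}}\sum_i \partial_i\!\left(\sqrt{g}\,\frac{\partial A}{\partial(\partial_i\psi)}\right) - \partial_t\!\left(\frac{\partial A}{\partial(\partial_t\psi)}\right)\right]\eta\,dV\,dt$$

(plus an expression of identical form in $\eta^*$ and $\psi^*$) for arbitrary variations $\eta$, $\eta^*$ in $\psi$, $\psi^*$. The condition $\delta\mathcal{A} = 0$ and the fact that $\eta$, $\eta^*$ are arbitrary can only be jointly satisfied if the factors multiplying $\eta$ and $\eta^*$ in the integrand are identically zero, i.e. vanish for all values of the $q^i$ and $t$. This gives two equations, one for the multiplier of $\eta$ and one for the multiplier of $\eta^*$. Since $\psi$ and $\psi^*$ appear symmetrically in $A$ both equations are of the same form:

$$\frac{\partial A}{\partial\psi} - \frac{1}{\sqrt{g}}\sum_i \partial_i\!\left(\sqrt{g}\,\frac{\partial A}{\partial(\partial_i\psi)}\right) - \partial_t\!\left(\frac{\partial A}{\partial(\partial_t\psi)}\right) = 0.$$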
These equations are the Euler-Lagrange equations which, together with the boundary terms, fix $\psi$ and $\psi^*$. The equations above fix $\psi$ and $\psi^*$ in the region of space and time over which the integration, defining $\mathcal{A}$ in terms of $A$, is carried out.

In the case with which we are concerned the integrand $A$ is the difference between the Hamiltonian density and the energy density. Classically in general co-ordinates² this is

$$A = H - E = T + V - E = \frac{1}{2m}\sum_{k,l=1}^{3N} g^{kl}p_kp_l + V(q^i) - E$$
where the $g^{kl}$ are the elements of the metric tensor and $p_k$, $p_l$ are the momentum components conjugate to the general co-ordinates $q^k$, $q^l$. The potential energy function $V$ has been written as a function of the spatial co-ordinates only, but the derivation is still valid for those exceptional occasions when $V$ depends on $t$. In quantum theory we use the Schrodinger form of the complex action to obtain the kinetic energy density

$$T = \frac{1}{2m}\sum_{k,l=1}^{3N} g^{kl}\,\psi^*\psi\left(\frac{i}{\psi^*}\frac{\partial\psi^*}{\partial q^k}\right)\left(\frac{-i}{\psi}\frac{\partial\psi}{\partial q^l}\right) = \frac{1}{2m}\sum_{k,l=1}^{3N} g^{kl}\,\frac{\partial\psi^*}{\partial q^k}\frac{\partial\psi}{\partial q^l}$$

and the energy density

$$E = \psi^*\psi\left(\frac{i}{\psi}\frac{\partial\psi}{\partial t}\right) = i\psi^*\frac{\partial\psi}{\partial t}$$

giving

$$A = \frac{1}{2m}\sum_{k,l=1}^{3N} g^{kl}\,\frac{\partial\psi^*}{\partial q^k}\frac{\partial\psi}{\partial q^l} + V\psi^*\psi - i\psi^*\frac{\partial\psi}{\partial t}.$$

² I have taken the liberty of using a single mass ($m$) for all the particles. This is notational convenience since using $m_i$ would give the spurious impression that each particle had 3 independent masses (one for each coordinate). The alternative, using one mass for each triple of coordinates, generates a notational nightmare. None of this matters to the derivation.

The quantities needed to generate an Euler-Lagrange equation for $\psi$ are:

$$\frac{\partial A}{\partial\psi^*} = V\psi - i\frac{\partial\psi}{\partial t}$$
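$$\frac{\partial\mathcal{A}}{\partial(\partial_j\psi^*)} = \frac{1}{2m}\sum_{k=1}^{3N} g^{jk}\,\frac{\partial\psi}{\partial q^k}, \qquad \frac{\partial\mathcal{A}}{\partial(\partial_t\psi^*)} = 0$$

which, when substituted into the general equation, give an equation determining $\psi$:

$$V\psi - i\frac{\partial\psi}{\partial t} - \frac{1}{2m}\sum_{j,k=1}^{3N}\frac{1}{\sqrt{g}}\frac{\partial}{\partial q^j}\left(\sqrt{g}\,g^{jk}\,\frac{\partial\psi}{\partial q^k}\right) = 0$$

which may be re-arranged to

$$-\frac{1}{2m}\sum_{j,k=1}^{3N}\frac{1}{\sqrt{g}}\frac{\partial}{\partial q^j}\left(\sqrt{g}\,g^{jk}\,\frac{\partial\psi}{\partial q^k}\right) + V\psi = i\frac{\partial\psi}{\partial t}$$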
which is Schrodinger's equation for $\psi$ in general co-ordinates $q^i$ and $t$. If we go through a similar procedure for variations ($\delta\psi$) in the function $\psi$ we obtain an equation which fixes the other linearly independent function ($\psi^*$) in the integrand $\mathcal{A}$. This equation turns out to be the complex conjugate of the equation we have just determined so that only one equation is needed to generate both $\psi$ and $\psi^*$.
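As a concrete check of the procedure, the steps can be worked through in the simplest special case. This is only a sketch under assumptions that are not part of the general derivation: one particle moving along a single Cartesian co-ordinate $x$ (so the metric is trivial), a real potential $V$, and the symbol $\mathcal{A}$ used for the integrand of the variational integral.

```latex
\begin{align*}
\mathcal{A} &= \frac{1}{2m}\,\frac{\partial\psi^*}{\partial x}\frac{\partial\psi}{\partial x}
              + V\psi^*\psi - i\psi^*\frac{\partial\psi}{\partial t} \\[4pt]
% variation with respect to psi^* fixes psi
\frac{\partial\mathcal{A}}{\partial\psi^*} &= V\psi - i\frac{\partial\psi}{\partial t}, \qquad
\frac{\partial\mathcal{A}}{\partial(\partial_x\psi^*)} = \frac{1}{2m}\frac{\partial\psi}{\partial x}, \qquad
\frac{\partial\mathcal{A}}{\partial(\partial_t\psi^*)} = 0 \\[4pt]
&\Longrightarrow\quad -\frac{1}{2m}\frac{\partial^2\psi}{\partial x^2} + V\psi = i\frac{\partial\psi}{\partial t} \\[4pt]
% variation with respect to psi fixes psi^*
\frac{\partial\mathcal{A}}{\partial\psi} &= V\psi^*, \qquad
\frac{\partial\mathcal{A}}{\partial(\partial_x\psi)} = \frac{1}{2m}\frac{\partial\psi^*}{\partial x}, \qquad
\frac{\partial\mathcal{A}}{\partial(\partial_t\psi)} = -i\psi^* \\[4pt]
&\Longrightarrow\quad -\frac{1}{2m}\frac{\partial^2\psi^*}{\partial x^2} + V\psi^* = -i\frac{\partial\psi^*}{\partial t}
\end{align*}
```

The second result is the complex conjugate of the first, which illustrates why a single equation suffices to generate both functions.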
8.2. Some Interpretation
Now, for a single particle in 3 dimensions, the first (spatial derivative) term in the Schrodinger equation is recognisable as the full expression for the Laplacian operator $\nabla^2$ in general co-ordinates; and so the equation can be made to look more compact and invariant with respect to co-ordinate systems by writing it as

$$-\frac{1}{2m}\nabla^2\psi + V\psi = i\frac{\partial\psi}{\partial t}$$

which is its usual form, or

$$\left(-\frac{1}{2m}\nabla^2 + V\right)\psi = i\frac{\partial\psi}{\partial t} \qquad (8.2.2)$$
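It is also easy to see that the spatial derivative term in the general ($N$-particle) case is just a sum of such $\nabla^2$ terms (one for each particle):

$$-\sum_{k=1}^{N}\frac{1}{2m_k}\nabla_k^2\psi$$

using $\nabla_k^2$ to mean the Laplacian of the coordinates of the $k$th particle and dropping the (notationally convenient) constraint of particles with all the same mass. If, in this geometrical spirit, we re-write the kinetic energy expression occurring in the original form of the Schrodinger Condition for a single particle in terms of vector operators:

$$\frac{1}{2m}\sum_{j,l=1}^{3} g^{jl}\,\frac{\partial\psi^*}{\partial q^j}\,\frac{\partial\psi}{\partial q^l} = \frac{1}{2m}\nabla\psi^*\cdot\nabla\psi = \frac{1}{2m}|\nabla\psi|^2$$

hence

$$\mathcal{A} = \frac{1}{2m}|\nabla\psi|^2 + V\psi^*\psi - i\psi^*\frac{\partial\psi}{\partial t}$$

we can see that there is a straightforward recipe for generating the Schrodinger equation from the Schrodinger Condition:

Expression in $\mathcal{A}$          becomes     In Schrodinger equation
$\frac{1}{2m}|\nabla\psi|^2$         becomes     $-\frac{1}{2m}\nabla^2\psi$
$\psi^*\psi V$                       becomes     $V\psi$
$i\psi^*\,\partial\psi/\partial t$   becomes     $i\,\partial\psi/\partial t$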
which arises because of the simple form of the integrand $\mathcal{A}$ as a function of $\psi^*$ and $\psi$. Further, if we multiply the Schrodinger equation from the left by $\psi^*$ giving

$$\psi^*\left(-\frac{1}{2m}\nabla^2\psi\right) + V|\psi|^2 = i\psi^*\frac{\partial\psi}{\partial t}$$

the similarity to the original $\mathcal{A}$ is even more striking; on the left-hand side only the kinetic energy expression is replaced by another term. This similarity has proved so striking as to have been extremely misleading, historically speaking. By concentrating attempts to obtain a physical
interpretation of the formalism on an examination of the Schrodinger equation instead of the more fundamental Schrodinger variational condition, avoidable paradoxes have crept into the physical interpretation of Schrodinger's quantum mechanics. Unfortunately, the Schrodinger equation, particularly in the form given above, looks like an "energy balance" equation, which has led to the term

$$\psi^*\left(-\frac{1}{2m}\nabla^2\right)\psi$$

being interpreted as the kinetic energy density, which it is not. The kinetic energy density is, of course,

$$\frac{1}{2m}|\nabla\psi|^2$$

which, for example, is always positive as kinetic energy must be; whereas the expression involving the Laplacian may be either positive or negative. This confusion is compounded by the fact that for isolated bound particles the two expressions above integrate to the same mean value:

$$\int\psi^*\left(-\frac{1}{2m}\nabla^2\right)\psi\,dV = \int\frac{1}{2m}|\nabla\psi|^2\,dV$$

because, for such systems, the difference between the two densities integrates to zero; a result which depends on the boundary conditions generated from the Schrodinger Condition as we shall see in Section 8.3. Taking such a case (a constant energy system, say) the Schrodinger equation becomes

$$-\frac{1}{2m}\nabla^2\psi + V\psi = E\psi$$

from which we may obtain

$$\psi^*\left(-\frac{1}{2m}\nabla^2\right)\psi + V\psi^*\psi = E\psi^*\psi.$$

Now, the argument goes, since the energy is constant, the sum of the kinetic and potential energies must be constant and this only obtains if the kinetic energy is given by the Laplacian expression so that $T + V = E = \text{constant}$
in the above expression. If the expression

$$\frac{1}{2m}|\nabla\psi|^2$$

is used, then the sum of the kinetic energy density and the potential energy density is not constant at each point in space, which contradicts the original assumption.

But the Schrodinger equation arises from the Schrodinger Condition, which requires that the mean value of the Hamiltonian density (the sum of the kinetic and potential energy densities) be equal to the mean value of the energy density. These quantities are equal only on the average, not point-by-point in space (and time). As we have seen above, both "candidates" for the kinetic energy density satisfy this condition in the mean so there is no difficulty. To insist that the sum of the kinetic and potential energies be equal to the total energy at all points in space is to contradict the central Schrodinger Condition that only the means of these quantities are required to be equal in quantum mechanics.

In summary, the Schrodinger equation is an equation which determines the function $\psi$, and the physical interpretation of the dynamical quantities of the theory is fixed in advance by the interpretation of the more fundamental Schrodinger Condition. In setting up the Schrodinger Condition, the particle probability density, the kinetic energy density, the potential energy density and the total energy density are all defined independently in terms of the complex action $S$ or $\psi$, and the Schrodinger Condition fixes relationships among the averages of these densities. The Schrodinger equation is, thus, a kind of "auxiliary" equation which determines $\psi$ and can only carry the physical interpretation given in the fundamental variational integral. If this is forgotten, by fixing attention on the Schrodinger equation only, then paradoxes result, like negative kinetic energy.
It is not a paradox that in an isolated (constant energy) system the sum of the kinetic and potential energies is not equal to the total energy at all points in space because it is precisely this freedom of the total energy density to deviate from the Hamiltonian density which distinguishes Schrodinger's quantum mechanics from classical mechanics. It is not a paradox but it is hard to understand — in short it is a call for further creative work!
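The difference between equality in the mean and equality point-by-point can be checked numerically. The following is a minimal sketch rather than part of the argument itself: it assumes a one-dimensional particle of unit mass in the illustrative potential V(x) = x²/2, whose normalised ground state is ψ(x) = π^(-1/4) exp(-x²/2) with energy E = 1/2 in the units used here. All function names are hypothetical.

```python
import math

E = 0.5  # ground-state energy of the assumed oscillator (units with hbar = m = 1)

def psi(x):
    """Normalised ground state of V(x) = x**2 / 2 (illustrative assumption)."""
    return math.pi ** -0.25 * math.exp(-0.5 * x * x)

def dpsi(x):
    """First derivative of psi, written out analytically."""
    return -x * psi(x)

def d2psi(x):
    """Second derivative of psi, written out analytically."""
    return (x * x - 1.0) * psi(x)

def potential_density(x):
    """Potential energy density V |psi|^2."""
    return 0.5 * x * x * psi(x) ** 2

def kinetic_density(x):
    """Kinetic energy density (1/2)|grad psi|^2."""
    return 0.5 * dpsi(x) ** 2

def laplacian_density(x):
    """The 'Laplacian density' psi* (-1/2 d^2/dx^2) psi."""
    return psi(x) * (-0.5 * d2psi(x))

def energy_density(x):
    """Energy density E |psi|^2 for a constant-energy state."""
    return E * psi(x) ** 2

def integrate(f, a=-10.0, b=10.0, n=20000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

x0 = 2.0
# With the kinetic energy density, T + V differs from E at a single point...
pointwise_gap = kinetic_density(x0) + potential_density(x0) - energy_density(x0)
# ...whereas the Laplacian expression balances at every point by construction.
laplacian_gap = laplacian_density(x0) + potential_density(x0) - energy_density(x0)
# The mismatch of the true densities nevertheless averages to zero.
mean_gap = integrate(
    lambda x: kinetic_density(x) + potential_density(x) - energy_density(x)
)
```

In this example the pointwise gap is clearly non-zero while the mean gap vanishes to numerical precision, which is the behaviour the Schrodinger Condition demands: equality of the densities only on the average.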
8.3. The Boundary Conditions
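The expressions which I have, rather dismissively, called boundary terms for the general variational principle were:

$$\left[\frac{\partial\mathcal{A}}{\partial(\partial_t\psi)}\,\eta\right]_a^b + \sum_{i=1}^{3N}\left[\frac{\partial\mathcal{A}}{\partial(\partial_i\psi)}\,\eta\right]_a^b \qquad (8.3.3)$$

where the region of space over which the original integration (over $dV$) was carried out has been denoted symbolically by $]_a^b$. In the case of the derivation of the Schrodinger equation, the simple form of the integrand $\mathcal{A}$ gives:

$$\frac{\partial\mathcal{A}}{\partial(\partial_i\psi)} = \frac{1}{2m}\sum_{k} g^{ik}\,\frac{\partial\psi^*}{\partial q^k}, \qquad \frac{\partial\mathcal{A}}{\partial(\partial_t\psi)} = -i\psi^*.$$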
Recall that $\eta$ is an arbitrary variation function: $\psi \rightarrow \psi + \epsilon\eta$.

There are, therefore, several possibilities:

1. Since $\eta$ is a factor in both of these expressions, they are both zero if $\eta(a) = \eta(b) = 0$ with no further consequences for the Schrodinger equation. But $\eta$ was an arbitrary function, so requiring that it vanish at the boundaries of the region is tantamount to saying that the Schrodinger equation does not have to hold at these boundaries, so this simple solution will not do.

2. The two types of boundary condition may be zero separately; this may be achieved if

$$\psi(a) = (\nabla\psi)(a) = \psi(b) = (\nabla\psi)(b) = 0.$$

This is easily interpreted as the condition that the probability distribution function should vanish at the boundaries and the momentum of the abstract particle vanish at the boundaries, clearly the kind of conditions one would expect to obtain for bound particles within a region of space; in any concrete system of bound particles there should be no particles at or beyond the boundaries and their momenta should be zero. The boundaries may, of course, be $(a, b) = (-\infty, \infty)$.
3. The two types of boundary condition should cancel; multiplying them both by $\psi$ it is easy to see that, in this case, the vanishing of the sum of the two conditions is a simple "conservation of particles" rule; what comes in at $a$ must be matched by what goes out at $b$ or vice versa.

It is the combination of these boundary conditions and the solutions of the Schrodinger equation which satisfy the Schrodinger Condition and, therefore, enable the computation of various physical properties and probability distribution functions for the relevant abstract object which may be compared to the experimentally-measured equivalents for sets of concrete objects.

The explicit relationship between the "Laplacian density" and the kinetic energy density is illuminating, since it highlights the way in which the global Schrodinger Condition bears on the boundary conditions of the local equation and on the associated physical interpretation. For a single particle of unit mass we have

$$\frac{1}{2}|\nabla\psi|^2 = \frac{1}{2}\left\{\psi^*\left(-\frac{1}{2}\nabla^2\right)\psi + \psi\left(-\frac{1}{2}\nabla^2\right)\psi^*\right\} + \frac{1}{4}\nabla^2|\psi|^2.$$

If the two expressions in braces on the right-hand side were equal then a direct relationship could be obtained. This equation cries out for the application of Green's theorem; integrating both sides we obtain an expression for the average value of the kinetic energy in terms of integrals involving the Laplacian:

$$\int\frac{1}{2}|\nabla\psi|^2\,dV = \frac{1}{2}\left\{\int\psi^*\left(-\frac{1}{2}\nabla^2\right)\psi\,dV + \int\psi\left(-\frac{1}{2}\nabla^2\right)\psi^*\,dV\right\} + \frac{1}{4}\int\nabla^2|\psi|^2\,dV.$$

Now the two expressions in the braces may each be integrated by parts (twice) to show that they are equal if $\psi$ and $\nabla\psi$ vanish on the boundary of the region in which the particle moves or that "incoming" and "outgoing" components of $\psi$ and $\nabla\psi$ cancel; that is, if the boundary terms arising from the Schrodinger Condition are zero. Under these conditions the Divergence Theorem guarantees that the remaining term, involving the Laplacian of $|\psi|^2$, is zero. That is, for bound states or for systems in which there is "no net creation of particle density" the equality

$$\int\frac{1}{2}|\nabla\psi|^2\,dV = \int\psi^*\left(-\frac{1}{2}\nabla^2\right)\psi\,dV$$
holds. The mean value of the kinetic energy may be obtained by integrating the Laplacian density

$$\psi^*\left(-\frac{1}{2}\nabla^2\right)\psi$$

or by integrating the kinetic energy density. But the equality of the integrals does not imply the equality of the integrands, far from it, as $|\nabla\psi|^2$ is always positive while the Laplacian density may be negative. The mean value of the kinetic energy is always

$$\int\frac{1}{2}|\nabla\psi|^2\,dV$$

and usually

$$\int\psi^*\left(-\frac{1}{2}\nabla^2\right)\psi\,dV$$

but the kinetic energy density is always

$$\frac{1}{2}|\nabla\psi|^2$$

and never

$$\psi^*\left(-\frac{1}{2}\nabla^2\right)\psi$$

emphasising the fact that the physical interpretation of the quantities associated with the Schrodinger equation is fixed in advance by the assumptions leading up to the formulation of the Schrodinger Condition.
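The relationship between the two densities can also be examined numerically. The sketch below is illustrative only: it assumes a one-dimensional particle of unit mass and takes as a test function the real, normalised first excited state of a harmonic oscillator, ψ(x) = √2 π^(-1/4) x exp(-x²/2), which vanishes together with its gradient at large |x|. Derivatives are taken by central finite differences so that the identity is tested rather than assumed. All names are hypothetical.

```python
import math

NORM = math.sqrt(2.0) * math.pi ** -0.25

def psi(x):
    """Assumed test function: a real, normalised bound state."""
    return NORM * x * math.exp(-0.5 * x * x)

def derivative(f, x, h=1e-4):
    """Central-difference first derivative."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def second_derivative(f, x, h=1e-4):
    """Central-difference second derivative."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def kinetic_density(x):
    """(1/2)|grad psi|^2, never negative."""
    return 0.5 * derivative(psi, x) ** 2

def laplacian_density(x):
    """psi (-1/2 d^2/dx^2) psi; for real psi both terms in the braces coincide."""
    return psi(x) * (-0.5 * second_derivative(psi, x))

def density_curvature(x):
    """(1/4) d^2/dx^2 |psi|^2, the term that integrates to zero."""
    return 0.25 * second_derivative(lambda y: psi(y) ** 2, x)

def integrate(f, a=-10.0, b=10.0, n=20000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# Pointwise identity: kinetic = Laplacian density + curvature term.
identity_residual = (
    kinetic_density(0.5) - laplacian_density(0.5) - density_curvature(0.5)
)
mean_kinetic = integrate(kinetic_density)
mean_laplacian = integrate(laplacian_density)
mean_curvature = integrate(density_curvature)
```

For this test function the two means agree and the curvature term integrates to zero, while `laplacian_density` is negative at points such as x = 2.5 where `kinetic_density` is still positive.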
8.4. The Time-Independent Schrodinger Equation
The most important applications of the Schrodinger equation, both from the practical and from the theoretical point of view, are those for which the Hamiltonian is independent of time; that is, the potential energy function involves no time dependence. The full Schrodinger equation (8.2.2) is

$$-\frac{1}{2m}\nabla^2\Psi(q^i;t) + V(q^i;t)\,\Psi(q^i;t) = i\frac{\partial\Psi(q^i;t)}{\partial t}$$

using $\Psi$ in place of $\psi$ in (8.2.2) for convenience in what follows. If $V$ is a function of spatial coordinates only this equation may be separated into a
space-only equation and a time-only equation by the substitution:

$$\Psi(q^i;t) = \psi(q^i)\,\theta(t)$$

where $\theta(t)$ is a solution of

$$i\frac{d\theta}{dt} = E\,\theta$$

i.e.

$$\theta(t) = \exp(-iEt)$$

and $\psi(q^i)$ is a solution of the so-called time-independent Schrodinger equation:

$$-\frac{1}{2m}\nabla^2\psi(q^i) + V(q^i)\,\psi(q^i) = E\,\psi(q^i). \qquad (8.4.4)$$

Thus

$$\Psi^*\Psi = \psi^*\psi\,\theta^*\theta = \psi^*(q^i)\,\psi(q^i);$$

the probability distribution function is independent of time. This equation, for systems of particles bound by an attractive time-independent potential function (plus any particle interactions), when solved in conjunction with the appropriate boundary conditions provides the mathematical structure from which all the formal algebraic structures of quantum theory may be abstracted. It is increasingly obvious that questions of the physical interpretation of the quantities arising in the manipulation and solution of the Schrodinger equation cannot be put off any longer. I have not yet discussed the problems associated with momenta and their densities. The next chapter tries to address some of the major difficulties and will provide us with some important mathematical analogies with the time-independent Schrodinger equation whose physical interpretation has to be carefully distinguished from that of the Schrodinger equation.
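The separation used in this section can be spelled out step by step. The lines below are a sketch of the standard argument, assuming a time-independent potential $V(q^i)$ and a real separation constant, which is identified with the energy $E$.

```latex
\begin{align*}
% substitute the product form into the full equation
-\frac{1}{2m}\,\theta\,\nabla^2\psi + V\psi\,\theta &= i\,\psi\,\frac{d\theta}{dt}
  && \text{(substitute } \Psi = \psi(q^i)\,\theta(t)\text{)} \\[4pt]
% divide through by psi * theta
\frac{1}{\psi}\left(-\frac{1}{2m}\nabla^2\psi + V\psi\right) &= \frac{i}{\theta}\frac{d\theta}{dt}
  && \text{(divide by } \psi\,\theta\text{)} \\[4pt]
% left side depends on q only, right side on t only
\frac{1}{\psi}\left(-\frac{1}{2m}\nabla^2\psi + V\psi\right) &= E = \frac{i}{\theta}\frac{d\theta}{dt}
  && \text{(each side must equal a constant)} \\[4pt]
i\frac{d\theta}{dt} = E\,\theta \;&\Longrightarrow\; \theta(t) = e^{-iEt} \\[4pt]
|\Psi|^2 = |\psi|^2\,\bigl|e^{-iEt}\bigr|^2 &= |\psi|^2
  && \text{(for real } E\text{)}
\end{align*}
```

The last line is the statement that the probability distribution function carries no time dependence for such states.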
Appendix 8.A
Schrodinger's First Paper of 1926
Here is an English version of the first and most astounding of the 1926 epoch-making papers by Schrodinger.
Introduction

This paper, along with Schrodinger's other early papers on what was then called wave mechanics, was made available to the English-speaking world in 1928 by a combination of the physical knowledge of J. F. Shearer of the University of Glasgow, the German of Ms. W. M. Deans, a graduate of Newnham College Cambridge, and the foresight of Blackie the publishers. Ms. Deans was astonished and delighted that anyone still read these papers when I corresponded with her in the 1980s. The publisher of the translation, Blackie, suffered the loss of much of their records and materials during the London Blitz of the Second World War and so the book went out of print. The 1928 translation was less than perfect, understandably so in view of the novelty of the material; with hindsight and their example, I have tried to do a slightly better job. There are two main points where this rendering differs from the German original and from the Shearer/Deans literal translation³:

• I have abandoned the equation numbering of the original and used the same system used throughout this book; the equations are numbered in Appendices as (Appendix number.Equation number).⁴ Something is lost here since, for example, Schrodinger used an equation with one or more primes (14, 14', etc.) in his original to denote the relationships between equations. More trivially, the equation number now follows the equation rather than preceding it, which is the current convention.
• The notation "log" has been changed to "ln" in line with modern notation for natural logarithms.
• More controversially perhaps, I have systematically changed Schrodinger's notation for the two quantum numbers used in the solution of the variational problem for Kepler motion. Schrodinger used exactly the reverse of the present-day convention; $l$ for the "principal" (energy-determining) quantum number and $n$ for the angular momentum quantum number. I have swapped these symbols in order to make the text more acceptable to the modern reader.

³ Which has been re-published as a facsimile of the original Blackie 1928 edition: Collected Papers on Wave Mechanics, E. Schrodinger (Chelsea, New York 1982).
⁴ So that Carsten Dominik's superb automatic referencing system (RefTeX) can be used within the GNU editor emacs, without which madness looms.
Quantisation as a Problem of Proper Values (Part 1) (Annalen der Physik (4), Vol. 79, 1926, pp. 361-376)
§1. In this paper I wish to consider, first, the simple case of the hydrogen atom (non-relativistic and unperturbed), and show that the customary quantum conditions can be replaced by another postulate, in which the notion of "whole numbers", merely as such, is not introduced. Rather, when integralness does appear, it arises in the same natural way as it does in the case of the node numbers of a vibrating string. The new conception is capable of generalisation, and strikes, I believe, very deeply at the nature of the quantum rules. The usual form of the latter is connected with the Hamilton-Jacobi differential equation,

$$H\left(q, \frac{\partial S}{\partial q}\right) = E. \qquad (8.A.1)$$

A solution of this equation is sought such as can be represented as the sum of functions, each being a function of one only of the independent variables $q$. Here we put for $S$ a new unknown $\psi$ such that it will appear as a product of related functions of the single coordinates, i.e. we put

$$S = K\ln\psi. \qquad (8.A.2)$$

The constant $K$ must be introduced from considerations of dimensions; it has those of action. Hence we get

$$H\left(q, \frac{K}{\psi}\frac{\partial\psi}{\partial q}\right) = E. \qquad (8.A.3)$$
Now we do not look for a solution of equation (8.A.3), but proceed as follows. If we neglect the relativistic variation of mass, equation (8.A.3) can always be transformed so as to become a quadratic form (of $\psi$ and its first derivatives) equated to zero. (For the one-electron problem this holds even when mass variation is not neglected.) We now seek a function $\psi$, such that for any arbitrary variation of it the integral of the said quadratic form, taken over the whole coordinate space,⁵ is stationary, $\psi$ being everywhere real, single-valued, finite, and continuously differentiable up to second order. The quantum conditions are replaced by this variation problem.

First, we will take for $H$ the Hamiltonian function for Keplerian motion, and show that $\psi$ can be chosen for all positive, but only for a discrete set of negative values of $E$. That is, the above variation problem has a discrete and a continuous spectrum of proper values. The discrete spectrum corresponds to the Balmer terms and the continuous to the energies of the hyperbolic orbits. For numerical agreement $K$ must have the value $h/2\pi$.

The choice of coordinates in the formation of the variational equations being arbitrary, let us take rectangular Cartesians. Then (8.A.3) becomes in our case

$$\left(\frac{\partial\psi}{\partial x}\right)^2 + \left(\frac{\partial\psi}{\partial y}\right)^2 + \left(\frac{\partial\psi}{\partial z}\right)^2 - \frac{2m}{K^2}\left(E + \frac{e^2}{r}\right)\psi^2 = 0 \qquad (8.A.4)$$

$e$ = charge, $m$ = mass of an electron, $r^2 = x^2 + y^2 + z^2$. Our variation problem then reads

$$\delta J = \delta\iiint dx\,dy\,dz\left[\left(\frac{\partial\psi}{\partial x}\right)^2 + \left(\frac{\partial\psi}{\partial y}\right)^2 + \left(\frac{\partial\psi}{\partial z}\right)^2 - \frac{2m}{K^2}\left(E + \frac{e^2}{r}\right)\psi^2\right] = 0, \qquad (8.A.5)$$
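the integral being taken over all space. From this we find in the usual way

$$\frac{1}{2}\delta J = \int df\,\delta\psi\,\frac{\partial\psi}{\partial n} - \iiint dx\,dy\,dz\,\delta\psi\left[\nabla^2\psi + \frac{2m}{K^2}\left(E + \frac{e^2}{r}\right)\psi\right] = 0. \qquad (8.A.6)$$

Therefore we must have, firstly,

$$\nabla^2\psi + \frac{2m}{K^2}\left(E + \frac{e^2}{r}\right)\psi = 0 \qquad (8.A.7)$$

⁵ I am aware this formulation is not entirely unambiguous.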
and secondly,

$$\int df\,\delta\psi\,\frac{\partial\psi}{\partial n} = 0. \qquad (8.A.8)$$
$df$ is an element of the infinite closed surface over which the integral is taken. (It will turn out later that this last condition requires us to supplement our problem by a postulate as to the behaviour of $\delta\psi$ at infinity, in order to ensure the existence of the above-mentioned continuous spectrum of proper values. See later.)

The solution of (8.A.7) can be effected, for example, in polar coordinates $r$, $\theta$, $\phi$ if $\psi$ be written as the product of three functions, each only of $r$, of $\theta$ or of $\phi$. The method is sufficiently well known. The function of the angles turns out to be a surface harmonic, and if that of $r$ be called $\chi$, we easily get the differential equation,

$$\frac{d^2\chi}{dr^2} + \frac{2}{r}\frac{d\chi}{dr} + \left(\frac{2mE}{K^2} + \frac{2me^2}{K^2r} - \frac{\ell(\ell+1)}{r^2}\right)\chi = 0 \qquad (8.A.9)$$
$\ell = 0, 1, 2, 3, \ldots$. The limitation of $\ell$ to integral values is necessary so that the surface harmonic may be single-valued. We require solutions of (8.A.9) that will remain finite for all non-negative real values of $r$. Now⁶ equation (8.A.9) has two singularities in the complex $r$-plane, at $r = 0$ and $r = \infty$, of which the second is an "indefinite point" (essential singularity) of all integrals, but the first on the contrary is not (for any integral). These two singularities form exactly the bounding points of our real interval. In such a case it is now known that the postulation of the finiteness of $\chi$ at the bounding points is equivalent to a boundary condition. The equation has in general no integral which remains finite at both end points; such an integral exists only for certain special values of the constants in the equation. It is now a question of defining these special values. This is the jumping-off point of the whole investigation.

Let us examine first the singularity at $r = 0$. The so-called indicial equation, which defines the behaviour of the integral at this point, is

$$\rho(\rho - 1) + 2\rho - \ell(\ell + 1) = 0 \qquad (8.A.10)$$

with roots

$$\rho_1 = \ell, \qquad \rho_2 = -(\ell + 1). \qquad (8.A.11)$$

⁶ For guidance in the treatment of (8.A.9) I owe thanks to Hermann Weyl.
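The indicial equation arises from inserting a trial behaviour $\chi \sim r^{\rho}$ into the most singular terms of the radial equation, $\chi'' + (2/r)\chi' - \ell(\ell+1)\chi/r^2$, which gives $\rho(\rho-1) + 2\rho - \ell(\ell+1) = 0$. As a small sanity check, not part of the original paper, the quadratic can be solved numerically for the first few $\ell$; the function name below is hypothetical.

```python
import math

def indicial_roots(ell):
    """Roots of rho*(rho - 1) + 2*rho - ell*(ell + 1) = 0, larger root first."""
    # Expanded form of the indicial equation: rho**2 + rho - ell*(ell + 1) = 0.
    a, b, c = 1.0, 1.0, -ell * (ell + 1.0)
    disc = math.sqrt(b * b - 4.0 * a * c)
    return ((-b + disc) / (2.0 * a), (-b - disc) / (2.0 * a))

# Roots for the first few orders of the surface harmonic.
roots = {ell: indicial_roots(ell) for ell in range(4)}
```

For each $\ell$ the larger root equals $\ell$ and the smaller equals $-(\ell+1)$, so the two exponents always differ by the integer $2\ell + 1$.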
The two canonical integrals at this point have therefore the exponents $\ell$ and $-(\ell + 1)$. Since $\ell$ is not negative, only the first of these is of use to us. Since it belongs to the greater exponent, it can be represented by an ordinary power series, which begins with $r^{\ell}$. (The other integral, which does not interest us, can contain a logarithm, since the difference between the indices is an integer.) The next singularity is at infinity, so the above power series is always convergent and represents a transcendental integral function. We therefore have established that:

The required solution is (except for a constant factor) a single-valued definite transcendental integral function, which at $r = 0$ belongs to the exponent $\ell$.

We must now investigate the behaviour of this function at infinity on the positive real axis. To that end we simplify equation (8.A.9) by the substitution

$$\chi = r^{\alpha}U \qquad (8.A.12)$$

where $\alpha$ is so chosen that the term with $1/r^2$ drops out. It is easy to verify that then $\alpha$ must have one of the two values $\ell$, $-(\ell + 1)$. Equation (8.A.9) then takes the form,

$$\frac{d^2U}{dr^2} + \frac{2(\alpha + 1)}{r}\frac{dU}{dr} + \frac{2m}{K^2}\left(E + \frac{e^2}{r}\right)U = 0. \qquad (8.A.13)$$

Its integrals belong at $r = 0$ to the exponents 0 and $-2\alpha - 1$. For the first $\alpha$-value, $\alpha = \ell$, the first of these integrals, and for the second $\alpha$-value, $\alpha = -(\ell + 1)$, the second of these integrals is an integral function and leads, according to (8.A.12), to the desired solution, which is single-valued. We therefore lose nothing if we confine ourselves to one of the two $\alpha$-values. Take, then,

$$\alpha = \ell. \qquad (8.A.14)$$
Our solution then, at $r = 0$, belongs to the exponent 0. Equation (8.A.13) is called Laplace's equation. The general type is

$$U'' + \left(\delta_0 + \frac{\delta_1}{r}\right)U' + \left(\epsilon_0 + \frac{\epsilon_1}{r}\right)U = 0. \qquad (8.A.15)$$

Here the constants have the values

$$\delta_0 = 0, \qquad \delta_1 = 2(\alpha + 1), \qquad \epsilon_0 = \frac{2mE}{K^2}, \qquad \epsilon_1 = \frac{2me^2}{K^2}. \qquad (8.A.16)$$
This type of equation is comparatively simple to handle for this reason: the so-called Laplace's transformation, which in general leads again to an equation of the second order, here gives one of the first. This allows the solutions of (8.A.15) to be represented by complex integrals. The result⁷ is only given here. The integral

$$U = \int_L e^{zr}(z - c_1)^{\alpha_1 - 1}(z - c_2)^{\alpha_2 - 1}\,dz \qquad (8.A.17)$$

is a solution of (8.A.15) for a path of integration $L$, for which

$$\int_L \frac{d}{dz}\left[e^{zr}(z - c_1)^{\alpha_1}(z - c_2)^{\alpha_2}\right]dz = 0. \qquad (8.A.18)$$

The constants $c_1$, $c_2$, $\alpha_1$, $\alpha_2$ have the following values. $c_1$ and $c_2$ are the roots of the quadratic equation

$$z^2 + \delta_0 z + \epsilon_0 = 0 \qquad (8.A.19)$$

and

$$\alpha_1 = \frac{\epsilon_1 + \delta_1 c_1}{c_1 - c_2}, \qquad \alpha_2 = -\frac{\epsilon_1 + \delta_1 c_2}{c_1 - c_2}. \qquad (8.A.20)$$
In the case of (8.A.13) these become, using (8.A.16) and (8.A.14),

$$c_1 = +\sqrt{\frac{-2mE}{K^2}}, \quad \alpha_1 = +\frac{me^2}{K\sqrt{-2mE}} + \ell + 1; \qquad c_2 = -\sqrt{\frac{-2mE}{K^2}}, \quad \alpha_2 = -\frac{me^2}{K\sqrt{-2mE}} + \ell + 1. \qquad (8.A.21)$$

The representation by the integral (8.A.17) allows us not only to survey the asymptotic behaviour of the totality of solutions when $r$ tends to infinity in a definite way, but also to give an account of this behaviour for one definite solution, which is always a much more difficult task.

⁷ Cf. Schlesinger. The theory is due to H. Poincare and J. Horn.
We shall at first exclude the case where $\alpha_1$ and $\alpha_2$ are real integers. When this occurs, it occurs for both quantities simultaneously, and when, and only when,

$$\frac{me^2}{K\sqrt{-2mE}} = \text{a real integer}. \qquad (8.A.22)$$

Therefore we assume that (8.A.22) is not fulfilled. The behaviour of the totality of solutions when $r$ tends to infinity in a definite manner (we think always of $r$ becoming infinite through real positive values) is characterised by⁸ the behaviour of the two linearly independent solutions, which we will call $U_1$ and $U_2$, and which are obtained by the following specialisations of the path of integration $L$. In each case let $z$ come from infinity and return there along the same path, in such a direction that

$$\lim_{z\to\infty} e^{zr} = 0 \qquad (8.A.23)$$

i.e. the real part of $zr$ is to become negative and infinite. In this way condition (8.A.18) is satisfied. In the one case let $z$ make a circuit once round the point $c_1$ (solution $U_1$), and in the other round $c_2$ (solution $U_2$). Now for very large real positive values of $r$, these two solutions are represented asymptotically (in the sense used by Poincare) by

$$\begin{aligned}
U_1 &\sim e^{c_1 r}\,r^{-\alpha_1}(-1)^{\alpha_1}\left(e^{2\pi i\alpha_1} - 1\right)\Gamma(\alpha_1)\,(c_1 - c_2)^{\alpha_2 - 1} \\
U_2 &\sim e^{c_2 r}\,r^{-\alpha_2}(-1)^{\alpha_2}\left(e^{2\pi i\alpha_2} - 1\right)\Gamma(\alpha_2)\,(c_2 - c_1)^{\alpha_1 - 1}
\end{aligned} \qquad (8.A.24)$$
in which we are content to take the first term of the asymptotic series of integral negative powers of $r$. We have now to distinguish between the two cases.

1. $E > 0$. This guarantees the non-fulfilment of (8.A.22), as it makes the left hand a pure imaginary. Further, by (8.A.21), $c_1$ and $c_2$ also become pure imaginaries. The exponential functions in (8.A.24), since $r$ is real, are therefore periodic functions which remain finite. The values of $\alpha_1$ and $\alpha_2$ from (8.A.21) show that both $U_1$ and $U_2$ tend to zero like $r^{-\ell-1}$. This must therefore be valid for our transcendental integral solution $U$, whose behaviour we are investigating, however it may be linearly compounded from $U_1$ and $U_2$. Further, (8.A.12) and (8.A.14) show that the function $\chi$, i.e. the transcendental integral solution of the original equation (8.A.9), always tends to zero like $1/r$, as it arises from $U$ through multiplication by $r^{\ell}$. We can thus state:

The Eulerian differential equation (8.A.7) of our variation problem has, for every positive $E$, solutions, which are single-valued, finite, and continuous; and which tend to zero with $1/r$ at infinity, under continual oscillations. The surface condition (8.A.8) has yet to be discussed.

2. $E < 0$. In this case the possibility (8.A.22) is not eo ipso excluded, yet we will maintain that exclusion provisionally. Then by (8.A.21) and (8.A.24), for $r$ tending to infinity, $U_1$ grows beyond all limits, but $U_2$ vanishes exponentially. Our integral function $U$ (and the same is true for $\chi$) will then remain finite if, and only if, $U$ is identical to $U_2$, save perhaps for a numerical factor. This, however, can never be, as is proved thus: if a closed circuit round both points $c_1$ and $c_2$ be chosen for the path $L$, thereby satisfying condition (8.A.18) since the circuit is really closed on the Riemann surface of the integrand, on account of $\alpha_1 + \alpha_2$ being an integer, then it is easy to show that the integral (8.A.17) represents our integral function $U$. (8.A.17) can be developed in a series of positive powers of $r$, which converges, at all events for $r$ sufficiently small, and since it satisfies equation (8.A.13), it must coincide with the series for $U$. Therefore, $U$ is represented by (8.A.17) if $L$ be a closed circuit round both points $c_1$ and $c_2$. This closed circuit can be so distorted, however, as to make it appear additively combined from the two paths considered above, which belonged to $U_1$ and $U_2$; and the factors are non-vanishing, 1 and $\exp(2\pi i\alpha_1)$. Therefore $U$ cannot coincide with $U_2$, but must also contain $U_1$. Q.E.D.

Our integral function $U$, which alone of the solutions of (8.A.13) is considered for our problem, is therefore not finite for large $r$, on the above hypothesis. Reserving meanwhile the question of completeness, i.e. the proving that our treatment allows us to find all the linearly independent solutions of the problem, we may then state:

For negative values of $E$ which do not satisfy condition (8.A.22) our variation problem has no solution.

⁸ If (8.A.22) is satisfied, at least one of the two paths of integration described in the text cannot be used, as it yields a vanishing result.
We have now only to investigate that discrete set of negative $E$-values which satisfy condition (8.A.22). $\alpha_1$ and $\alpha_2$ are then both integers. The first of the integration paths, which previously gave us the fundamental values $U_1$ and $U_2$, must now undoubtedly be modified so as to give a non-vanishing result. For, since $\alpha_1 - 1$ is certainly positive, the point $c_1$ is neither a branch-point nor a pole of the integrand, but an ordinary zero. The point $c_2$ can also become regular if $\alpha_2 - 1$ is also not negative. In every case, however, two suitable paths are readily found and the integration effected completely in terms of known functions so that the behaviour of the solutions can be fully investigated. Let

$$\frac{me^2}{K\sqrt{-2mE}} = n; \qquad n = 1, 2, 3, \ldots. \qquad (8.A.25)$$

Then from (8.A.21) we have

$$\alpha_1 - 1 = n + \ell, \qquad \alpha_2 - 1 = -n + \ell. \qquad (8.A.26)$$
Two cases have to be distinguished: $n \le \ell$ and $n > \ell$.

(a) $n \le \ell$. Then $c_1$ and $c_2$ lose every singular character, but instead become starting-points or end-points of the path of integration, in order to fulfil condition (8.A.18). A third characteristic point here is at infinity (negative and real). Every path between two of these three points yields a solution, and of these three solutions there are two linearly independent, as is easily confirmed if the integrals are calculated out. In particular, the transcendental integral solution is given by the path from $c_1$ to $c_2$. That this integral remains regular at $r = 0$ can be seen at once without calculating it. I emphasize this point, as the actual calculation is apt to obscure it. However, the calculation does show that the integral becomes indefinitely great for positive, indefinitely great values of $r$. One of the other two integrals remains finite for $r$ large, but it becomes infinite for $r = 0$. Therefore when $n \le \ell$ we get no solution of the problem.

(b) $n > \ell$. Then from (8.A.26) $c_1$ is a zero and $c_2$ is a pole of the first order at least of the integrand. Two independent integrals are then obtained: one from the path which leads from $z = -\infty$ to the zero, intentionally avoiding the pole; and the other from the residue at the pole. The latter is the integral function. We will give its calculated value, but multiplied by $r^{\ell}$, so that we obtain, according to (8.A.12) and (8.A.14), the solution $\chi$ of the original equation (8.A.9). (The multiplying constant is arbitrary.) We find

$$\chi = f\left(\frac{r\sqrt{-2mE}}{K}\right); \qquad f(x) = x^{\ell}e^{-x}\sum_{k=0}^{n-\ell-1}\frac{(-2x)^k}{k!}\binom{n+\ell}{n-\ell-1-k}. \qquad (8.A.27)$$

It is seen that this is a solution that can be utilised, since it remains finite for all real non-negative values of $r$. In addition, it satisfies the surface condition (8.A.8) because of its vanishing exponentially at infinity. Collecting then the results for $E$ negative:

For $E$ negative, our variation problem has solutions if, and only if, $E$ satisfies condition (8.A.22). Only values smaller than $n$ (and there is always at least one such at our disposal) can be given to the integer $\ell$, which denotes the order of the surface harmonic appearing in the equation. The part of the solution depending on $r$ is given by (8.A.27).

Taking into account the constants in the surface harmonic (known to be $2\ell + 1$ in number), it is further found that:

The discovered solution has exactly $2\ell + 1$ arbitrary constants for any permissible $(\ell, n)$ combination; and therefore for a prescribed value of $n$ has $n^2$ arbitrary constants.

We have thus confirmed the main points of the statements originally made about the proper-value spectrum of our variation problem, but there are still deficiencies. Firstly, we require information as to the completeness of the collected system of proper functions indicated above, but I will not concern myself with that in this paper. From experience of similar cases, it may be supposed that no proper value has escaped us. Secondly, it must be remembered that the proper functions, ascertained for $E$ positive, do not solve the variation problem as originally postulated, because they only tend to zero at infinity as $1/r$, and therefore $\partial\psi/\partial r$ only tends to zero on an infinite sphere as $1/r^2$. Hence the surface integral (8.A.8) is still of the same order as $\delta\psi$ at infinity. If it is desired therefore
to obtain the continuous spectrum, another condition must be added to the problem, viz. that $\delta\psi$ is to vanish at infinity or, at least, that it tends to a constant value independent of the direction of proceeding to infinity; in the latter case the surface harmonics cause the surface integral to vanish. Condition (8.A.22 on page 168) yields

$$-E_n = \frac{m e^4}{2K^2 n^2}. \qquad (8.A.28)$$

Therefore the well-known Bohr energy levels, corresponding to the Balmer terms, are obtained, if to the constant K, introduced into (8.A.2 on page 163) for reasons of dimensions, we give the value

$$K = \frac{h}{2\pi} \qquad (8.A.29)$$

from which comes

$$-E_n = \frac{2\pi^2 m e^4}{h^2 n^2}. \qquad (8.A.30)$$
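As a quick numerical illustration of (8.A.30), the following sketch evaluates the first few levels in Gaussian (CGS) units. The numerical constants are rounded modern values supplied here as assumptions (they do not appear in the text), and the function names are purely illustrative.

```python
import math

# Assumed CGS-Gaussian constants (rounded modern values, not taken from the text).
M_E = 9.1093837e-28         # electron mass, g
E_CHARGE = 4.80320471e-10   # elementary charge, esu
H = 6.62607015e-27          # Planck's constant, erg s
ERG_PER_EV = 1.602176634e-12


def bohr_energy_erg(n: int) -> float:
    """Return E_n = -2 pi^2 m e^4 / (h^2 n^2), as in (8.A.30), in erg."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return -2.0 * math.pi ** 2 * M_E * E_CHARGE ** 4 / (H ** 2 * n ** 2)


def bohr_energy_ev(n: int) -> float:
    """Same level converted to electron-volts."""
    return bohr_energy_erg(n) / ERG_PER_EV


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, bohr_energy_ev(n))
```

With these constants the n = 1 level comes out close to the familiar hydrogen ground-state energy of about -13.6 eV, and the levels scale as $1/n^2$.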
Our n is the principal quantum number. $\ell + 1$ is analogous to the azimuthal quantum number. The splitting up of this number through a closer definition of the surface harmonic can be compared with the resolution of the azimuthal quantum into an "equatorial" and a "polar" quantum. These numbers here define the system of node-lines on the sphere. Also the "radial quantum number" $n - \ell - 1$ gives exactly the number of the "node-spheres", for it is easily established that the function f(x) in (8.A.27 on page 171) has exactly $n - \ell - 1$ positive real roots. The positive E-values correspond to the continuum of the hyperbolic orbits, to which one may ascribe, in a certain sense, the radial quantum number $\infty$. The fact corresponding to this is the proceeding to infinity, under continual oscillations, of the functions in question. It is interesting to note that the range, inside which the functions of (8.A.27 on page 171) differ sensibly from zero, and outside which their oscillations die away, is of the general order of magnitude of the major axis of the ellipse in each case. The factor, multiplied by which the radius vector enters as the argument of the constant-free function f, is, naturally, the reciprocal of a length, and this length is

$$\frac{K}{\sqrt{-2mE}} = \frac{K^2 n}{m e^2} = \frac{h^2 n}{4\pi^2 m e^2} = \frac{a_n}{n} \qquad (8.A.31)$$
where $a_n$ = the semi-axis of the nth elliptic orbit. (The equations follow from (8.A.28 on the preceding page) plus the known relation $E_n = -e^2/2a_n$.) The quantity (8.A.31) gives the order of magnitude of the range of the roots when n and $\ell$ are small; for then it may be assumed that the roots of f(x) are of the order of unity. That is naturally no longer the case if the coefficients of the polynomial are large numbers. At present I will not enter into a more exact evaluation of the roots, though I believe it would confirm the above assertion pretty thoroughly.

§3. It is, of course, strongly suggested that we should try to connect the function $\psi$ with some vibration process in the atom, which would more nearly approach reality than the electronic orbits, the real existence of which is being very much questioned today. I originally intended to found the new quantum conditions in this more intuitive manner, but finally gave them the above neutral mathematical form, because it brings more clearly to light what is really essential. The essential thing seems to me to be that the postulation of "whole numbers" no longer enters into the quantum rules mysteriously, but that we have traced the matter a step further back, and found the "integralness" to have its origin in the finiteness and single-valuedness of a certain space function. I do not wish to discuss further the possible representations of the vibration process, before more complicated cases have been calculated successfully from the new standpoint. It is not decided that the results will merely re-echo those of the usual quantum theory. For example, if the relativistic Kepler problem be worked out, it is found to lead in a remarkable manner to half-integral partial quanta (radial and azimuthal). Still, a few remarks on the representation of the vibration may be permitted. Above all, I wish to mention that I was led to these deliberations in the first place by the suggestive papers of M.
Louis de Broglie,⁹ and by reflecting over the space distribution of those "phase waves", of which he has shown that there is always a whole number, measured along the path, present on each period or quasi-period of the electron. The main difference is that de Broglie thinks of progressive waves, while we are led to stationary proper vibrations if we interpret our formulae as representing vibrations. I have lately shown¹⁰ that the Einstein gas theory can be based on the consideration of such stationary proper vibrations, to which the dispersion law of de Broglie's phase waves has been applied.

⁹ L. de Broglie, Ann. de Physique (10) 3, p. 22, 1925 (Thèses, Paris, 1924).
¹⁰ Physik. Ztschr. 27, p. 95, 1926.

The above reflections on
the atom could have been represented as a generalisation from those on the gas model. If we take the separate functions (8.A.27 on page 171), multiplied by a surface harmonic of order $\ell$, as the description of proper vibration processes, then the quantity E must have something to do with the related frequency. Now in vibration problems we are accustomed to the "parameter" (usually called $\lambda$) being proportional to the square of the frequency. However, in the first place, such a statement in our case would lead to imaginary frequencies for the negative E-values, and, secondly, instinct leads us to believe that the energy must be proportional to the frequency itself and not to its square. The contradiction is explained thus. There has been no natural zero level laid down for the "parameter" E of the variation equation (8.A.7 on page 165), especially as the unknown function $\psi$ appears multiplied by a function of r, which can be changed by a constant to meet a corresponding change in the zero level of E. Consequently, we have to correct our anticipations, in that not E itself, but E increased by a certain constant, is expected to be proportional to the square of the frequency. Let this constant now be very great compared with all the admissible negative E-values (which are already limited by (8.A.22 on page 168)). Then firstly, the frequencies will become real, and secondly, since our E-values correspond to only relatively small frequency differences, they will actually be very approximately proportional to those frequency differences. This, again, is all that our "quantum-instinct" can require, as long as the zero level of energy is not fixed. The view that the frequency of the vibration process is given by

$$\nu = C'\sqrt{C + E} = C'\sqrt{C} + \frac{C'}{2\sqrt{C}}\,E + \cdots \qquad (8.A.32)$$

where C is a constant very great compared with all the E's, has still another very appreciable advantage. It permits an understanding of the Bohr frequency condition.
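The approximation behind (8.A.32) can be illustrated numerically. The sketch below uses hypothetical values in arbitrary units (levels $E_n = -1/n^2$, $C = 10^6$, $C' = 1$; none of these numbers comes from the text) and checks that differences of the proper frequencies are very nearly proportional to differences of E, with factor $C'/2\sqrt{C}$.

```python
import math


def proper_frequency(E: float, C: float, C_prime: float = 1.0) -> float:
    """nu = C' sqrt(C + E), the exact form in (8.A.32)."""
    return C_prime * math.sqrt(C + E)


def beat_ratio(E_a: float, E_b: float, C: float, C_prime: float = 1.0) -> float:
    """(nu_a - nu_b) / (E_a - E_b); tends to C' / (2 sqrt(C)) when C >> |E|."""
    d_nu = proper_frequency(E_a, C, C_prime) - proper_frequency(E_b, C, C_prime)
    return d_nu / (E_a - E_b)


# Illustrative (assumed) numbers in arbitrary units.
C = 1.0e6
levels = [-1.0 / n ** 2 for n in (1, 2, 3)]
expected = 1.0 / (2.0 * math.sqrt(C))

if __name__ == "__main__":
    for i in range(1, len(levels)):
        print(i + 1, "->", 1, beat_ratio(levels[i], levels[0], C) / expected)
```

The printed ratios are very close to one, which is the sense in which the emission frequencies appear as small differences of much larger proper frequencies.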
According to the latter, the emission frequencies are proportional to the E-differences, and therefore from (8.A.32 on the facing page) also to the differences of the proper frequencies $\nu$ of those hypothetical vibration processes. But these proper frequencies are all very great compared with the emission frequencies, and they agree very closely among themselves. The emission frequencies appear therefore as deep "difference tones" of the proper vibrations themselves. It is quite conceivable that on the transition of energy from one to another of the
normal vibrations, something (I mean the light wave) with a frequency allied to each frequency difference should make its appearance. One only needs to imagine that the light wave is causally related to the beats, which necessarily arise at each point of space during the transition; and that the frequency of the light is defined by the number of times per second the intensity maximum of the beat process repeats itself. It may be objected that these conclusions are based on the relation (8.A.32 on the preceding page) in its approximate form (after expansion of the square root), from which the Bohr frequency condition itself seems to obtain the nature of an approximation. This, however, is merely apparently so, and it is wholly avoided when the relativistic theory is developed and makes a profounder insight possible. The large constant C is naturally very intimately connected with the rest-energy of the electron ($mc^2$). Also the seemingly new and independent introduction of the constant h (already brought in by (8.A.29 on page 172)), into the frequency condition, is cleared up or rather avoided, by the relativistic theory. But unfortunately the correct establishment of the latter meets right away with certain difficulties, which have already been alluded to. It is hardly necessary to emphasize how much more congenial it would be to imagine that at a quantum transition the energy changes over from one vibration to another, than to think of a jumping electron. The changing of the vibration form can take place continuously in space and time, and it can readily last as long as the emission process lasts empirically (experiments on canal rays by W. Wien); nevertheless, if during this transition the atom is placed for a comparatively short time in an electric field which alters the proper frequencies, then the beat frequencies are immediately changed sympathetically, and for just as long as the field operates.
It is known that this experimentally established fact has hitherto presented the greatest difficulties. See the well-known attempt at a solution by Bohr, Kramers, and Slater. Let us not forget, however, in our gratification over our progress in these matters, that the idea of only one proper vibration being excited whenever the atom does not radiate — if we hold fast to this idea — is very far removed from the natural picture of a vibrating system. We know that a macroscopic system does not behave like that, but yields in general a pot-pourri of its proper vibrations. But we should not make up our minds too quickly on this point. A pot-pourri of proper vibrations would also be permissible for a single atom, since thereby no beat frequencies could arise
other than those which, according to experience, the atom is capable of emitting occasionally. The actual sending of many of these spectral lines simultaneously by the same atom does not contradict experience. It is thus conceivable that only in the normal state (and approximately in certain "meta-stable" states) the atom vibrates with one proper frequency and just for this reason does not radiate, namely, because no beats arise. The stimulation may consist of a simultaneous excitation of one or of several other proper frequencies, whereby beats originate and evoke the emission of light. Under all circumstances, I believe, the proper functions which belong to the same frequency are in general all simultaneously stimulated. Multipleness of the proper values corresponds, namely, in the language of the previous theory to degeneration. To the reduction of the quantisation of degenerate systems probably corresponds the arbitrary partition of the energy among the functions belonging to one proper value.

Addition at the proof correction on 28.2.1926. In the case of conservative systems in classical mechanics, the variation problem can be formulated in a neater way than was previously shown, and without express reference to the Hamilton-Jacobi differential equation. Thus, let T(p, q) be the kinetic energy, expressed as a function of the coordinates and momenta, V the potential energy, and $d\tau$ the volume element of the space, "measured rationally", i.e. it is not simply the product $dq_1\,dq_2\,dq_3 \cdots dq_n$, but this divided by the square root of the discriminant of the quadratic form T(p, q). (Cf. Gibbs' Statistical Mechanics.) Then let $\psi$ be such as to make the "Hamilton integral"

$$\int d\tau \left\{ K^2\, T\!\left(q, \frac{\partial\psi}{\partial q}\right) + \psi^2 V \right\} \qquad (8.A.33)$$

stationary, while fulfilling the normalising, accessory condition

$$\int d\tau\, \psi^2 = 1. \qquad (8.A.34)$$
The proper values of this variation problem are then the stationary values of integral (8.A.33) and yield, according to our thesis, the quantum levels of the energy.
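As an illustration of how this stationary-value condition leads back to a wave equation, the following sketch works the one-particle Cartesian case. It assumes $T = p^2/2m$ (so that $T(q, \partial\psi/\partial q) = |\nabla\psi|^2/2m$), a real $\psi$, and variations $\delta\psi$ that vanish on the boundary; the multiplier E for the normalising condition is introduced here for the purpose of the sketch.

```latex
\begin{align*}
J[\psi] &= \int d\tau \left\{ \frac{K^2}{2m}\,|\nabla\psi|^2 + V\psi^2 \right\}
          - E\left( \int d\tau\,\psi^2 - 1 \right), \\
\delta J &= 2\int d\tau\;\delta\psi
          \left\{ -\frac{K^2}{2m}\nabla^2\psi + V\psi - E\psi \right\} = 0
\quad\Longrightarrow\quad
-\frac{K^2}{2m}\nabla^2\psi + V\psi = E\psi .
\end{align*}
```

Multiplying the last equation by $\psi$, integrating by parts and using (8.A.34) shows that the multiplier E equals the value of the integral (8.A.33) at the stationary point, in line with the statement that the stationary values are the proper values.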
It is to be remarked that in the quantity a
(Cf. Atombau, 4th (German) ed., p. 775.)

Physical Institute of the University of Zurich
(Received January 27, 1926)
Chapter 9
Identities: Momenta and Dynamical Variables
Some problems of interpretation are addressed here. The nature of some of the eigenvalue problems which are not associated with the Schrödinger equation is investigated, and these problems are found to be identities independent of the law of motion in Schrödinger's mechanics. Difficulties in understanding the nature of the momentum probability distribution for real functions $\psi$ also have to be faced.
Contents
9.1. Momentum Definitions and Distributions
9.2. Abstract Particles of Constant Momentum
9.3. Action and Momenta in Schrödinger's Mechanics
9.4. Momenta and Kinetic Energy
9.5. Boundary Conditions
9.5.1. Constant Momenta and Kinetic Energy
9.5.2. Solution of the Schrödinger Equation
9.6. The "Particle in a Box" and Cyclic Boundary Conditions

9.1. Momentum Definitions and Distributions
In Chapter 7 the definition of action was generalised to be complex in order to be able to generate the Schrödinger equation by variational methods from the Schrödinger Condition. As noted at that time, this extension has some consequences for the interpretation of momenta in Schrödinger's mechanics since, if we retain the classical rule for the relationship between momenta and the action:

$$p_j \overset{\mathrm{def}}{=} \frac{\partial S}{\partial q_j} = -\frac{i\hbar}{\psi}\frac{\partial\psi}{\partial q_j} \qquad (9.1.1)$$
then there are gradients of the position probability density involved in the momenta which do not occur in the classical equivalent. I hinted earlier (in Section 4.4 on page 84) why it is understandable that gradients of a particle probability density would be related to particle velocities, and the classical action does not yield a particle probability density. If we write out the complex action explicitly as its real and imaginary parts

$$S(q_j;t) \to S(q_j;t) - iR(q_j;t), \qquad \psi(q_j;t) = \exp[(R + iS)/\hbar] \qquad (9.1.2)$$

then the relationships we used in Chapter 7 were the above one for the momenta and, in units with $\hbar = 1$, the position probability distribution

$$|\psi(q_j;t)|^2 = \psi^*(q_j;t)\,\psi(q_j;t) = \exp(2R(q_j;t)) \qquad (9.1.3)$$
which does not involve the real action S(qi;t). The difference between the classical and quantum cases is, then, the involvement of the probability density in the momenta via the function ip. Before looking at the consequences of this generalisation to complex action for the interpretation of momenta, it is worth looking briefly at abstract particles with constant momenta. 9.2.
Abstract Particles of Constant M o m e n t u m
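If the momentum of an abstract particle is given by the definition (9.1.1) in the previous section, the condition that that momentum be constant ($\lambda$, say) over all space is, of course,

$$p_j \overset{\mathrm{def}}{=} \frac{\partial S}{\partial q_j} = -\frac{i\hbar}{\psi}\frac{\partial\psi}{\partial q_j} = \lambda$$

or

$$-i\hbar\frac{\partial\psi}{\partial q_j} = \lambda\psi$$

or

$$\hat{p}_j\,\psi(q_j;t) = \lambda\,\psi(q_j;t) \quad \text{(say)} \qquad (9.2.5)$$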
which is a differential equation of the same general structure as the Schrödinger equation; an eigenvalue problem.¹

¹ $\psi(q_j;t)$ will not, in fact, depend on t since, by assumption, $\lambda$ is constant.

This mathematical similarity is not matched by any similarity in physical interpretation: Expression (9.2.5 on the preceding page) is an identity, not an equation; the solution of this differential equation is simply a function $\psi$ for which the momentum $p_j$ of the associated abstract object is constant. This $\psi$ does not necessarily solve any dynamical law; there may or may not be abstract objects for which this $\psi$ satisfies the Schrödinger Condition, i.e. $\psi$ may or may not satisfy any Schrödinger equation. Naturally, this process can be extended to any dynamical variable $A(q_j, p_j; t)$, the analogue of equation (9.2.5 on the facing page) which constrains the value of this dynamical variable to be constant over all space being

$$A\!\left(q_j, -\frac{i\hbar}{\psi}\frac{\partial\psi}{\partial q_j}; t\right) = a \quad \text{(say)} \qquad (9.2.6)$$
which may or may not turn out to be a convenient differential equation mathematically similar to the Schrödinger equation. But the simple requirement that a particular dynamical quantity of an abstract object be constant is not enough to ensure that the abstract object underlying this assumption corresponds to any concrete objects in the real world. Only those abstract objects whose functions $\psi(q_j;t)$ satisfy the Schrödinger Condition have associated concrete objects in the real world. It may well be the case that, for a suitable choice of potential function in the Hamiltonian function, an abstract object may be found which does, indeed, have the required constancy for the particular dynamical quantity in question² as well as satisfying the Schrödinger Condition but, equally, it may not. It is quite easy to find examples for which no abstract or concrete objects exist, particularly for momenta conjugate to angular coordinates. This point is crucial to the relationship between the abstract, algebraic, structure of Schrödinger's mechanics and its physical interpretation; we shall have to return to this matter in Chapter 10.

² The most familiar example will be looked at in Chapter 11.

To emphasise the difference between these identities and the equations of Schrödinger's mechanics, we may use the Hamiltonian function (H) as a special case of the dynamical variable A in equation (9.2.6) to obtain the condition that the Hamiltonian function is constant over all space:

$$H\!\left(q_j, -\frac{i\hbar}{\psi}\frac{\partial\psi}{\partial q_j}\right) = \text{constant}. \qquad (9.2.7)$$
This is not, of course, the Schrödinger equation; it is simply the condition that a particular dynamical variable, the Hamiltonian function, be constant over all space. It is not the Schrödinger equation because it has not been derived from the Schrödinger Condition, which is the dynamical law of Schrödinger's mechanics. Thus, from the point of view of Schrödinger's mechanics, this expression is just another identity with a similar status to equation (9.1.1 on page 179). In point of fact, if we revert to the original classical notation for the real action, equation (9.2.7) is simply the Hamilton-Jacobi equation for the case of a time-independent Hamiltonian function.³ Similarly, we may use the energy-time conjugacy relationship to obtain the condition that the energy of a system be constant in time:

$$E \overset{\mathrm{def}}{=} -\frac{\partial S}{\partial t} = \frac{i\hbar}{\psi}\frac{\partial\psi}{\partial t} \qquad (9.2.8)$$

which generates the familiar identity for such an energy-conserving system. Setting the (constant) Hamiltonian function equal to the (constant) energy for a system with time-independent Hamiltonian again does not generate any dynamical law in Schrödinger's mechanics, because both are identities and have not been generated by the application of the variational method to the Schrödinger Condition, which is the relevant dynamical law; in particular the Schrödinger equation cannot be generated in this way.
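The constant-momentum identity (9.2.5) can be checked numerically. The sketch below assumes units with $\hbar = 1$ and a single coordinate q; the value of the constant momentum, the step size and the function names are hypothetical choices made only for illustration. It evaluates the local quantity $-(i\hbar/\psi)\,\partial\psi/\partial q$ by a central difference, both for an exponential solution of (9.2.5) and, for contrast, for a real cosine which does not solve it.

```python
import cmath
import math

HBAR = 1.0   # assumption: units with hbar = 1
LAM = 1.7    # hypothetical constant momentum value


def local_momentum(psi, q: float, h: float = 1e-5) -> complex:
    """Central-difference estimate of -(i hbar / psi) d(psi)/dq at q."""
    dpsi = (psi(q + h) - psi(q - h)) / (2.0 * h)
    return -1j * HBAR * dpsi / psi(q)


def exponential(q: float) -> complex:
    """Solution of the identity (9.2.5): N exp(i lambda q / hbar) with N = 1."""
    return cmath.exp(1j * LAM * q / HBAR)


def cosine(q: float) -> complex:
    """A real function that does not solve (9.2.5)."""
    return complex(math.cos(LAM * q / HBAR))


if __name__ == "__main__":
    for q in (0.3, 1.1):
        print(q, local_momentum(exponential, q), local_momentum(cosine, q))
```

For the exponential the result is the same real number at every q; for the cosine it is pure imaginary and varies with q, so no single constant momentum can be assigned.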
9.3. Action and Momenta in Schrödinger's Mechanics
We have seen in Section 8.3 on page 156 of Chapter 8 that, for the most common types of system, the boundary conditions on the Schrödinger equation ensure that the mean values of

$$\int \tfrac{1}{2}|\nabla\psi|^2\,dV \quad\text{and}\quad \int \psi^*\left(-\tfrac{1}{2}\nabla^2\right)\psi\,dV$$

are identical, but that the distributions in the two integrands are obviously different almost everywhere.

³ Thus, (9.2.7) is an equation in classical mechanics and an identity in Schrödinger's mechanics.

There is a similar situation associated with the mean value and the distribution of momenta. The distribution of any momentum $p_j$ is familiar as

$$\psi^*(q_j;t)\,\hat{p}_j\,\psi(q_j;t) = \psi^*(q_j;t)\left(-i\hbar\frac{\partial}{\partial q_j}\right)\psi(q_j;t). \qquad (9.3.9)$$

Now since, from equation (9.1.2 on page 180), $\psi(q_j;t) = \exp[(R + iS)/\hbar]$, this distribution may be written in terms of R and S:

$$\psi^*\left(-i\hbar\frac{\partial}{\partial q_j}\right)\psi = |\psi(q_j;t)|^2\left(\frac{\partial S}{\partial q_j}\right) - i\,|\psi(q_j;t)|^2\left(\frac{\partial R}{\partial q_j}\right) \qquad (9.3.10)$$
the first term of which has an entirely intuitively acceptable interpretation as the distribution of the classical-like momentum. The second term is pure imaginary since R is real. What can this possibly mean? We can get an insight into its meaning by evaluating the mean value of a typical momentum component (conjugate to q, say) over the range (a, b) of q, taking $\hbar = 1$:

$$\int_a^b \psi^*\left(-i\frac{\partial}{\partial q}\right)\psi\,dq = \int_a^b \left(\frac{\partial S}{\partial q} - i\frac{\partial R}{\partial q}\right)\exp(2R)\,dq = -\frac{i}{2}\Big[\exp(2R)\Big]_a^b + \int_a^b \exp(2R)\,\frac{\partial S}{\partial q}\,dq. \qquad (9.3.11)$$

The first (pure imaginary) contribution arises from contributions to the particle probability distributions at the boundaries of the region which, as we have already seen, will either both vanish or exactly cancel when the boundary conditions are imposed.⁴ The second term has a completely intuitively acceptable interpretation; it is the mean value of the momentum ($\partial S/\partial q$) of a probability distribution $\rho = \exp(2R)$. Thus, the mean value of any momentum component for any physically-acceptable $\psi$ involves no contribution from the pure imaginary component of the momentum distributions which is due to gradients of the particle probability distribution. This is some slight comfort but we are seeking an interpretation of all the quantities which occur in the formalism. Equally important, if we look at the expression for the kinetic energy distribution we cannot escape so easily. The kinetic energy distribution is the distribution of the square of the momentum so that gradients of both S and R occur in both the distribution and the mean value of the kinetic energy:

$$\tfrac{1}{2}|\nabla\psi|^2 = \tfrac{1}{2}\exp(2R)\sum_{\alpha=x,y,z}\left[\left(\frac{\partial S}{\partial\alpha}\right)^2 + \left(\frac{\partial R}{\partial\alpha}\right)^2\right] \qquad (9.3.12)$$
where both gradient terms appear in identical (real) forms.⁵ There is, therefore, a pressing need to interpret the imaginary terms in the momentum distribution. Looking back at the identity (9.2.5 on page 180) which generates the probability distribution function for the position of an abstract particle with constant momentum, it has solutions:

$$\psi(q) = N\exp(ikq) \qquad (9.3.13)$$

for any k and a normalisation constant N. This function does not, in fact, generate a probability distribution function $|\psi|^2$ since it cannot be normalised without further imposition of boundary conditions on its behaviour at the limits of q. If, for simplicity, we assume that q is a cyclic coordinate (like $\phi$ of the spherical polars) then this is sufficient to ensure proper behaviour. When this is done, the position probability distribution function for an abstract object with constant momentum is constant, so that (in units with $\hbar = 1$)

$$S(q) = kq, \qquad R(q) = \ln N$$

and the system's distributions are entirely classical. The momentum of the abstract particle (and therefore of any concrete particles, should these exist) is simply $\partial S/\partial q$ and any concrete particle is equally likely to be found anywhere in the allowed range of q. The mean value of the momentum is, of course, k since it has been assumed that the abstract particle has constant momentum k. Although this example has a reassuringly familiar connection with classical particle mechanics, it provides no help in interpreting the imaginary momentum.

⁴ That is, if the boundaries are boundaries of open sets (like $(-\infty, \infty)$); if the end-points of the coordinates are closed, then changes are needed: the mathematical equivalent of "impacts".
⁵ Interestingly enough, the quantity $\psi^*\nabla^2\psi$ is complex with imaginary terms vanishing by boundary conditions while also containing real terms in the gradients of R.

The only concrete objects which exist in the real (micro-) world are those for which the Schrödinger equation for a corresponding abstract object exists. The time-independent Schrödinger equation is a real equation, so its solutions must be real. Or, if complex solutions exist, they must occur in degenerate complex conjugate pairs from which real solutions may be formed. This means that the mean values of all momenta are zero for all abstract objects for which there is a time-independent Schrödinger equation; the only contributions to the momentum distributions are pure imaginary and so (boundary conditions permitting) integrate to zero. Of course, the distribution integrates to zero by cancellation of equal positive and negative contributions and not because the distribution is identically zero. When we consider the nature of the abstract objects which the Schrödinger equation describes, this result is less surprising than it would first appear. For a single particle, the abstract object might typically be

An abstract particle in a given conservative field of force

with, of course, no specification of position, momenta, etc.
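The behaviour of the mean momentum described above can be checked with a small numerical sketch. It assumes units with $\hbar = 1$ and $\psi = \exp(R + iS)$, and it evaluates the normalised mean of $(\partial S/\partial q - i\,\partial R/\partial q)\exp(2R)$ by a midpoint rule. The two test cases (a cyclic coordinate with $S = 3q$, $R = 0.4\cos q$, and the real function $\psi = \sin q$ on $(0, \pi)$) are hypothetical examples chosen for illustration, not cases taken from the text.

```python
import math


def mean_momentum(R, dR, dS, a: float, b: float, n: int = 20000) -> complex:
    """Midpoint-rule estimate of the normalised mean of
    (dS/dq - i dR/dq) exp(2R) over (a, b), for psi = exp(R + iS)."""
    h = (b - a) / n
    norm = 0.0
    total = 0.0 + 0.0j
    for i in range(n):
        q = a + (i + 0.5) * h
        rho = math.exp(2.0 * R(q))            # position density exp(2R)
        norm += rho * h
        total += (dS(q) - 1j * dR(q)) * rho * h
    return total / norm


# Assumed example 1: cyclic coordinate on (0, 2 pi), S = 3q, R = 0.4 cos q.
cyclic = mean_momentum(
    R=lambda q: 0.4 * math.cos(q),
    dR=lambda q: -0.4 * math.sin(q),
    dS=lambda q: 3.0,
    a=0.0, b=2.0 * math.pi)

# Assumed example 2: real psi = sin q on (0, pi), so S = 0 and R = ln sin q.
real_case = mean_momentum(
    R=lambda q: math.log(math.sin(q)),
    dR=lambda q: math.cos(q) / math.sin(q),
    dS=lambda q: 0.0,
    a=0.0, b=math.pi)

if __name__ == "__main__":
    print(cyclic, real_case)
```

In the first case the imaginary part integrates away and the mean is the constant gradient of S; in the second the mean vanishes although the pure imaginary integrand is not zero pointwise, which is the cancellation of equal positive and negative contributions referred to in the text.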
Thus, there is no specification of, for example, direction of momenta (linear, angular or whatever) and the abstract object's probability distributions will therefore describe the properties of any concrete objects with all possible magnitudes and directions of all momenta consistent with the given field of force and the particular solution ($E_i$, $\psi_i$, say) of the Schrödinger equation. So, the mean values of all the momenta will indeed be zero; for any given allowed magnitude and direction of momentum, its equal and opposite will also be allowed.

Now, one may wish to impose additional constraints on the solutions of a particular Schrödinger equation in order that the resulting constrained abstract object have some particularly desirable properties for a particular application. So, for example, if a solution of a Schrödinger equation is degenerate it may be possible to form linear combinations of these degenerate solutions which are complex in order that the resulting abstract object thus defined have non-zero mean momentum (or momenta) in addition to constant energy. This is a constraint on the solutions of the Schrödinger equation and the abstract object concerned becomes less abstract or more concrete as more of its properties are prescribed. In fact, as we shall see in Chapter 11, it is possible either by an examination of the properties of the variational "Lagrangian" distribution, or by separating the Schrödinger equation, to find solutions to the Schrödinger equation with constant momenta in addition to constant energy and so specify the correspondingly restricted abstract objects. There is then a well-defined problem of finding the least abstract (most closely-specified) object for a given conservative field of force and this problem will be addressed in Chapter 11. The real solutions of the Schrödinger equation correspond, therefore, to the most abstract objects consistent with the given conservative force field and any complex solutions must correspond to less abstract objects. If the means of the momenta are all zero in the most general case, the pure imaginary part of the distribution of momenta (corresponding to the real part of $\psi$) must carry a description of the deviations from these mean momenta; the distributions which cancel in generating the zero mean.

9.4. Momenta and Kinetic Energy
In looking at the solutions (9.3.13 on the facing page) of equation (9.2.5 on page 180), we have not checked that the abstract objects which they describe do, in fact, correspond to solutions of the Schrödinger equation. That is, are there concrete objects in the real world having the properties of these abstract objects? The functions $\psi$ of equation (9.3.13 on page 184) are complex and, what is more, it is not yet obvious that they solve some Schrödinger equation.
The simplest and most useful abstract object to look at here is a single free particle, since it is so familiar that any of its properties are almost immediately intuitively interpretable. Such an object only has kinetic energy and there are three possible ways of approaching its properties, using Cartesian coordinates in the same spirit of intuitive accessibility. In the first instance we can look at this example purely from the point of view of solving the differential equations which arise in the three possible cases, without regard for the effect of any boundary conditions which may be relevant. The three possibilities are:

1. The momentum is a (vector) constant:

$$-\frac{i}{\psi_1}\nabla\psi_1(x,y,z) = \mathbf{k} = k_x\hat{\mathbf{i}} + k_y\hat{\mathbf{j}} + k_z\hat{\mathbf{k}} \quad \text{(say)}$$

$$\nabla\psi_1(x,y,z) = i\mathbf{k}\,\psi_1(x,y,z).$$

Clearly, this equation may be solved by writing

$$\psi_1(x,y,z) = X_1(x)\,Y_1(y)\,Z_1(z)$$

leading to separate equations for the three factors of the form:

$$\frac{dX_1}{dx} = ik_x X_1$$

with solutions

$$X_1(x) = A_x\exp(ik_x x) \qquad (9.4.14)$$

where the probability distribution function is $|\psi_1|^2$ and $k_x$, etc., may take any (real) value.

2. The kinetic energy is a (positive, real) constant:

$$\frac{|\nabla\psi_2(x,y,z)|^2}{2\,|\psi_2(x,y,z)|^2} = T \quad \text{(say)}$$

$$|\nabla\psi_2(x,y,z)|^2 = 2T\,|\psi_2(x,y,z)|^2;$$

again, writing $\psi_2$ in the product form

$$\psi_2(x,y,z) = X_2(x)\,Y_2(y)\,Z_2(z)$$

leads to three equations of the form

$$\left|\frac{dX_2}{dx}\right|^2 = 2T_x\,|X_2|^2$$

with solutions

$$X_2(x) = B_x\exp\!\left(i\sqrt{2T_x}\,x\right) \quad\text{where}\quad T = T_x + T_y + T_z; \qquad (9.4.15)$$

this time the probability distribution function is $|\psi_2|^2$. The number $T_x$, etc., is, of course, non-negative in this case.

3. The system solves the time-independent Schrödinger equation and the separation proceeds in the same way:

$$-\tfrac{1}{2}\nabla^2\psi_3(x,y,z) = E\,\psi_3(x,y,z) \quad \text{(say)}$$

$$-\frac{1}{2}\frac{d^2X_3}{dx^2} = E_x X_3$$

$$X_3(x) = C_x\sin\!\left(\sqrt{2E_x}\,x\right) + D_x\cos\!\left(\sqrt{2E_x}\,x\right) \quad\text{where}\quad E = E_x + E_y + E_z \qquad (9.4.16)$$

with the same proviso on $\psi_3$, and $E_x$ is positive. Notice that, in this case, the Schrödinger equation is a real equation so the real solutions are given.

In each case, the equation separates in Cartesian coordinates and the x-component of each has been given explicitly. There are obvious mathematical relationships amongst these quantities:

• $X_1$ provides a special solution to the problem in 2. on the preceding page, for $k_x = \sqrt{2T_x}$. This is not surprising; an abstract object with a given value of linear momentum in a given direction must have a related fixed value of kinetic energy in that direction. But the converse is equally obviously false: a given abstract free particle with constant kinetic energy does not necessarily have constant linear momentum components. In one dimension any mixture of momentum components with $|k_x| = \sqrt{2T_x}$ will have the same kinetic energy and, in three dimensions, the situation is even more free: all that is required is that $k_x^2 + k_y^2 + k_z^2 = 2T$; any mixture of linear momenta in any direction has constant kinetic energy, provided that this condition is satisfied. The solutions (9.4.14) are non-degenerate; to every triple $(k_x, k_y, k_z)$ there is a unique solution.
• The solutions $X_2(x)$ may be chosen to have the same form as the solutions of equation (9.4.14 on the facing page), as we have seen above. In contrast to the solutions $\psi_1$, the solutions $\psi_2$ of (9.4.15) are infinitely degenerate;⁶ for every value of T there are infinitely many ways of combining the triple $(T_x, T_y, T_z)$ to generate the total sum.
• The (real) solutions of equation (9.5.17) are obviously linear combinations of the complex solutions of equations (9.4.14 on the facing page), and $X_1$ satisfies (9.5.17) again as a special case: the specific choice of $C_x/D_x = i$ and $k_x = \sqrt{2E_x}$. Again the solutions $\psi_3$ of (9.5.17) are infinitely degenerate; for every value of E there are infinitely many ways of combining the triple $(E_x, E_y, E_z)$ to generate the total sum.

Thus, it looks as if the abstract objects "free particle with constant momentum" and "free particle with constant kinetic energy" do have solutions in common with a Schrödinger equation and so there are concrete objects in the real world which have this property. But for every abstract object "free particle with constant momentum" there are infinitely many others with the same energy.
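The relationships among the three one-dimensional factor equations can be checked numerically. The sketch below assumes the units used in this section ($\hbar = m = 1$), a hypothetical value $k_x = 1.3$ with $T_x = E_x = k_x^2/2$, and finite-difference derivatives; the function names are illustrative. It measures how far a function is from satisfying each of the three conditions at a sample point.

```python
import cmath
import math


def d1(f, x: float, h: float = 1e-5) -> complex:
    """Central-difference first derivative."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def d2(f, x: float, h: float = 1e-4) -> complex:
    """Central-difference second derivative."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def momentum_residual(f, x: float, k: float) -> float:
    """| -i dX/dx - k X |, zero for a solution of the constant-momentum equation."""
    return abs(-1j * d1(f, x) - k * f(x))


def kinetic_residual(f, x: float, T: float) -> float:
    """| |dX/dx|^2 - 2 T |X|^2 |, zero for a solution of the constant-kinetic-energy equation."""
    return abs(abs(d1(f, x)) ** 2 - 2.0 * T * abs(f(x)) ** 2)


def schrodinger_residual(f, x: float, E: float) -> float:
    """| -(1/2) X'' - E X |, zero for a solution of the free Schrodinger equation."""
    return abs(-0.5 * d2(f, x) - E * f(x))


K_X = 1.3                    # hypothetical momentum component
T_X = E_X = 0.5 * K_X ** 2   # matching kinetic energy and energy


def plane(x: float) -> complex:
    """X_1(x) = exp(i k_x x), with A_x = 1."""
    return cmath.exp(1j * K_X * x)


def standing(x: float) -> complex:
    """Real combination sin(k_x x) of the two solutions with the same |k_x|."""
    return complex(math.sin(K_X * x))


if __name__ == "__main__":
    x0 = 0.7
    print(momentum_residual(plane, x0, K_X), kinetic_residual(plane, x0, T_X),
          schrodinger_residual(plane, x0, E_X))
    print(momentum_residual(standing, x0, K_X), schrodinger_residual(standing, x0, E_X))
```

The exponential satisfies all three conditions to within numerical error, while the real sine satisfies the Schrödinger equation with the same $E_x$ but leaves a momentum residual of order $k_x$, so it is not a solution of the constant-momentum equation.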
9.5. Boundary Conditions
The boundary conditions which might be imposed on the solutions of the Schrödinger equation or on the solutions of the simpler constant-momentum equations have been shabbily treated so far. In fact they are important for two, related, reasons:

• It is the boundary conditions which ultimately decide whether or not an acceptable solution exists for the Schrödinger equation and these conditions are the source of the typical quantisation of dynamical quantities.
• The boundary conditions fix the nature of the functions on which the operator $\left(-\tfrac{1}{2}\nabla^2 + V\right)$ may work and it is this domain of the operator which fixes its important formal properties as we shall see. In the case of the Schrödinger equation the boundary conditions are generated by the variational solution of the Schrödinger Condition. It is these boundary conditions which fix the range of Hermiticity of the Hamiltonian operator.
• In the case of identities like (9.2.6 on page 181), there is no "natural" source of the boundary conditions, precisely because these identities are not per se part of Schrödinger's mechanics. We are left to impose such boundary conditions as we think suitable because it is we who have generated the original identity, not Schrödinger's mechanics. The particular case of an abstract free particle proves particularly atypical and problematic, basically because the motion is better described by classical mechanics.

⁶ In the absence of boundary conditions.
9.5.1. Constant Momenta and Kinetic Energy
The functions $\psi_1$ for the abstract particle of constant momentum of the previous section do not have finite normalisation integrals; for example,

$$\int_a^b |\chi_1|^2\,dx$$

diverges if the full range of the Cartesian coordinate is used: $(a, b) = (-\infty, \infty)$. If the region over which the abstract particle is allowed to move is finite, the functions $\chi_1$ (and the functions $\psi_1$ of which they are factors) may be normalised and the interpretation of the $\psi_1$ as probability distribution functions is thereby secured. [Footnote 7: One could, of course, argue that the relative measures of the functions $\psi_1$ over finite regions could still be interpreted as relative probabilities in those regions, even in the absence of this last constraint.]

There is a point of principle to be made here: since the complex functions $\chi_1(x) = A_x \exp(i k_x x)$ have constant modulus, one has to choose between allowing the function to simply stop at $a$ and $b$ with the same amplitude as the rest of the range or forcing a linear combination of the complex conjugates to obtain a real trigonometric function which can be made to vanish at the end-points. But, in doing this, we have formed linear combinations of functions belonging to different eigenvalues of the original eigenvalue equation and so destroyed the property of solving this constant-momentum equation. So, in this case, these new real solutions do not, in fact, solve the original constant-momentum equation
but they do solve the relevant Schrodinger equation, (9.5.17). They do not solve the original constant-momentum equation because, in order to generate real solutions, one must take linear combinations of two complex solutions with different values of $k_x$; namely the ones with constant $|k_x|$. There is no guidance to be had from the original definition of an abstract particle of constant momentum, precisely because it is a definition; it is at our discretion to enhance our definition. Whether we make the function simply stop at $a$ and $b$ or force it to be zero there, it is defined over a closed interval $[a, b]$; this implies that its derivatives are not defined at the end points, and it is these very derivatives which generate the said momentum. The definition of the momentum can be extended to cover this case (and others where the coordinate ranges are closed intervals of R):
Similar remarks apply to the case of constant kinetic energy with the important exception that, this time, we may form the real solutions by combining the complex solutions of the same $|k_x|$ (say) because these solutions do have the same (constant) kinetic energy even though they have different momenta.

9.5.2. Solution of the Schrodinger Equation
In the case of $\psi_3$ and its factors $\chi_3$ (solutions of the Schrodinger equation) which are entirely real-valued functions, the situation is much more clear-cut since the Schrodinger Condition generates a fixed set of boundary conditions which must be met. The original boundary conditions which must be satisfied by the solutions of a Schrodinger equation were obtained variationally from the Schrodinger Condition in Chapter 8 and are given in general by (8.3.3) which for the particular case here become:

$$\left[-i\psi + \nabla\psi\right]_a^b = 0 \qquad (9.5.17)$$
where "a" and "b" are used to mean the lower and upper limits of the three-dimensional region where $\psi_3$ is defined; a lower and upper limit for each of x, y and z in Cartesians. Now $\psi_3$ is a real-valued function and so when we write equation (9.5.17) out in full:
$$\left(-i\psi(b) + \nabla\psi(b)\right) - \left(-i\psi(a) + \nabla\psi(a)\right) = 0$$

it is obvious that, since the real and imaginary parts must be separately zero, we have two simple possibilities:
1. The values of $\psi_3$ and $\nabla\psi_3$ are each zero at both limits:

$$\psi_3(b) = \psi_3(a) = \nabla\psi_3(b) = \nabla\psi_3(a) = 0.$$

2. The value of $\psi_3$ is the same at both limits, as is the value of $\nabla\psi_3$:

$$\psi_3(b) = \psi_3(a), \qquad \nabla\psi_3(b) = \nabla\psi_3(a). \qquad (9.5.18)$$
Using the real trigonometric solutions, it is impossible to satisfy the first condition. Indeed, the gradient ("momentum") of $\psi_3$ is at a maximum when the value of $\psi_3$ is zero for both the sine and cosine functions. We are then left with the very simple condition that, suitably displaced, the function $\psi_3$ would join continuously and smoothly at the two limits; its value must be the same and its slopes must be identical. It is a familiar fact of elementary quantum theory that the imposition of these boundary conditions makes the continuum of solutions $\psi_3$ of equation (9.5.17) unacceptable and makes the values of the energy discrete (quantised). This is always the case: the solutions of the Schrodinger equation (a differential equation) typically involve a continuum of possible energies; it is the imposition of the boundary conditions on these solutions (i.e. ensuring that we are dealing with the domain of Hermiticity of the associated Hamiltonian operator) which generates quantisation of energy. And, of course, to make the point yet again, both the Schrodinger equation and the boundary conditions are given by the fundamental law of Schrodinger's mechanics, the Schrodinger Condition.
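The way the cyclic conditions (9.5.18) pick out discrete energies can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes a one-dimensional factor on an interval (a, b), units in which ħ = m = 1 so that E = k²/2, and the hypothetical helper names `cyclic_mismatch`, `allowed_k` and `energy`.

```python
import math

def cyclic_mismatch(k, a, b):
    """Largest violation of psi(b) = psi(a), psi'(b) = psi'(a)
    over the two real solutions sin(kx) and cos(kx)."""
    violations = [
        math.sin(k * b) - math.sin(k * a),            # value of sin(kx)
        k * math.cos(k * b) - k * math.cos(k * a),    # slope of sin(kx)
        math.cos(k * b) - math.cos(k * a),            # value of cos(kx)
        -k * math.sin(k * b) + k * math.sin(k * a),   # slope of cos(kx)
    ]
    return max(abs(v) for v in violations)

def allowed_k(n, a, b):
    # Wave numbers for which the function repeats exactly over (b - a).
    return 2.0 * math.pi * n / (b - a)

def energy(k):
    # E = k^2 / 2, assuming hbar = m = 1 (an assumption of this sketch).
    return 0.5 * k * k

a, b = 0.0, 2.0
for n in range(1, 4):
    k = allowed_k(n, a, b)
    print(n, energy(k), cyclic_mismatch(k, a, b))

# A wave number taken from the continuum fails the conditions.
print("generic k = 1:", cyclic_mismatch(1.0, a, b))
```

Only the wave numbers returned by `allowed_k` leave a mismatch at the level of rounding error; any other value of k violates at least one of the four equalities.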
9.6. The "Particle in a Box" and Cyclic Boundary Conditions
The case of a single particle confined to a finite region of space (a particle in a box) is often the first example of the explicit solution of the Schrodinger equation encountered in elementary texts and uses one of two subterfuges
to obtain the correct energies, rather than use the correct boundary conditions supplied by the Schrodinger Condition. The one-dimensional case is mathematically identical to $\chi_3(x)$ of equation (9.5.17) and the boundaries are identical to the ones discussed in the last section; the one-dimensional space in which the particle can move is the interval (a, b) (or, perhaps, [a, b]). One can think of two physical models of this situation:
1. For x < a and x > b, there is an infinite repulsive potential which prevents the particle from entering those regions.
2. The region is repeated exactly for each contiguous length of (b − a).

In the first case, one says that the probability of a particle being outside the region (a, b) is zero and so, since the function $\chi_3$ must be continuous everywhere, it must vanish at x = a and x = b, generating the correct quantisation conditions from the fact that the trigonometric functions must then be periodic in sub-multiples of (b − a) and only the sine function is retained. Unfortunately, it is equally plausible to say that the momentum of the particle is zero for x < a and x > b, so it must be zero at a and b and again generate the correct discrete energies, but this time the cosine function is the only one retained. These two plausible assumptions cannot be made compatible and one simply has to make a choice. In the second case one obtains the correct boundary conditions (those of equation (9.5.18)) but for an entirely artificial physical model.

There is, as we have seen above, an entirely straightforward way of obtaining the solutions of the global Schrodinger Condition for a particular application by using its two characteristic local consequences: the Schrodinger equation and the boundary conditions. The boundary conditions, however physically obvious they might or might not be, are always generated by the Schrodinger Condition.
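The two incompatible "first case" assumptions can be compared side by side. This is a small illustrative sketch under stated assumptions: ħ = m = 1, the trial functions are taken as sin(k(x − a)) and cos(k(x − a)), and the function names are hypothetical. It shows that both assumptions select the same wave numbers, and hence the same energies, while retaining different functions.

```python
import math

def wall_k(n, a, b):
    # Wave numbers for which sin(k (x - a)) vanishes at both a and b.
    return n * math.pi / (b - a)

def box_energy(n, a, b):
    # E = k^2 / 2, assuming hbar = m = 1 (an assumption of this sketch).
    return 0.5 * wall_k(n, a, b) ** 2

def sine_solution(k, a, x):
    # Value of sin(k (x - a)): the "zero probability at the walls" choice.
    return math.sin(k * (x - a))

def cosine_slope(k, a, x):
    # Slope of cos(k (x - a)): the "zero momentum at the walls" choice.
    return -k * math.sin(k * (x - a))

a, b = 0.0, 1.0
for n in (1, 2, 3):
    k = wall_k(n, a, b)
    print(n, box_energy(n, a, b),
          sine_solution(k, a, b),   # vanishes at the upper wall
          cosine_slope(k, a, b))    # slope vanishes at the upper wall
```

The same set of k values satisfies either requirement, so the energies agree, but the sine has zero value and maximal slope at the walls while the cosine has the reverse; nothing in the calculation itself says which to keep.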
Chapter 10
Abstracting the Structure
The most widely used and most familiar applications of Schrodinger's mechanics are in those cases where the potential energy function is independent of time and is a conservative field so that, both classically and in quantum theory, the energy of the system is conserved. The Schrodinger equation for these important cases takes a characteristic form; that of an eigenvalue problem. In this case it is possible to abstract the algebraic structure of a Hilbert space from the solutions of the Schrodinger equation. This algebraic structure transforms much of the mathematics from problems in analysis (which are hard) to problems in algebra (which are easier) while doing some violence to the interpretation of the theory. Quantum theory is not deepened or developed by the use of powerful abstract methods, merely sophisticated. It is via the interpretation of the symbolism that a theory is deepened, not by an improvement of the tools of articulation.
Contents
10.1. The Idea of Mathematical Structure
10.1.1. A Pitfall of Abstraction: The Momentum Operator
10.2. States and Hilbert Space
10.3. The Real Use of Abstract Structures

10.1. The Idea of Mathematical Structure
In Chapter 8 the transition has been made from classical to quantum mechanics by the requirement that, in Schrodinger's dynamics, the Hamilton-Jacobi equation must be satisfied only on the average in space
and time. We have also embarked on an interpretation of the various quantities appearing in Schrodinger's theory, albeit on a fairly informal and intuitive basis. There are now several possible ways forward:
1. An investigation of the formal properties of the equations and identities of the Schrodinger theory with a view to extracting the "structure" of the theory for a more rigorous and, perhaps, more general formulation.
2. A continued investigation of the interpretation of the theory and a more precise definition of the referents of the theory and a more careful statement of the results of the last chapter.
3. Application of the Schrodinger theory to particular systems.

In fact, we shall only be concerned with (3) in particular cases in Chapters 11 and 12 where it enables some far-reaching conclusions to be drawn for (1) and (2). The possible applications of the Schrodinger equation are very numerous and extremely successful and it was these very successes which guaranteed the acceptance of Schrodinger's theory long before an adequate interpretation of the formalism was available. I shall not go into these applications except where they bear on the interpretation of Schrodinger's mechanics. This chapter will be concerned with (1) and (2), and in particular with the connection between (1) and (2): how the interpretation of the theory bears on the validity and utility of any formal structures which may be abstracted from that theory. What are we to make, for example, of the fact that the Schrodinger equation has, no doubt, infinitely many solutions? Is there a relationship between the families of solutions of the Hamilton-Jacobi equation obtained by separation methods and the "corresponding" solutions of Schrodinger's equation? How do we interpret those amongst the solutions which cannot be normalised to unity? Can we distinguish between equations and identities in the theory as we did in classical mechanics?
What concept replaces the familiar concepts of a particle's (as yet unmentioned) velocity in classical mechanics? These and many other pressing scientific problems are not touched by the ability of the Schrodinger equation to yield numerical results in essentially perfect agreement with experiment. Superposed on these problems which are particular to Schrodinger's theory is the all-pervasive problem which arises whenever an attempt is made to express a scientific theory in terms of a known mathematical structure: does the formal "abstraction" reveal the real structure of the theory without the inclusion of sources of scientific difficulty?
In our reflections on classical mechanics and on the transition to Schrodinger's mechanics it has not proved necessary to pause to consider the nature of the mathematical structures used or their possible impact on the science involved since, along with Newton, Lagrange, Hamilton, Jacobi and Schrodinger, we have simply assumed the validity and utility of the differential and integral calculus. In a word, it has been silently assumed that the methods and results of classical analysis model the continuity and structure of real space and time. Moreover, it has been assumed that this modelling is done in the usual sense of abstraction; even if the use of the real numbers (for example) to model each dimension of real space excludes some of the properties of real space, the opposite is not true; the use of the reals to model a line does not surreptitiously include any properties which are not possessed by physical space. This may be incorrect, of course, but it is the classical viewpoint. The possibility of unintentionally including extraneous material in a scientific theory is much more real when the more powerful techniques of mathematics are used. The equations of classical mechanics specify local conditions among the mechanical quantities: they are differential equations. If it proves possible to generate these equations from a global condition (a variation principle, for example) then there is always the possibility, even likelihood, that the global condition will generate more differential equations and boundary conditions than the one which was its ostensible "source". This latter trap is only a danger in classical mechanics, where we start with F = ma and generalise the theory by the admission of more and more classes of "co-ordinate".
Perhaps the most obvious place where we depend on a global condition is in the development of the transformation theory, where it is required that the generating function be capable of being developed as a function of time only, which is the direct result of the variational formulation of mechanics. In Schrodinger's mechanics, by contrast, the starting point is a global condition; the equality of the mean values of the Hamiltonian density and the energy density. The Schrodinger equation is the (local) Euler-Lagrange equation of this global condition (together with some boundary conditions) and is, therefore, at a lower level than the fundamental Schrodinger Condition. But this condition on the mean values of the Hamiltonian and energy densities is based on the preceding classical mechanics. If Schrodinger's theory is considered to depend on classical mechanics then any artifacts in classical mechanics may be carried forward. If, however,
Schrodinger's mechanics is viewed as independent of classical mechanics, depending only on historically-established classical mechanical concepts (not physical laws) then it is an independent global theory from the outset.

10.1.1. A Pitfall of Abstraction: The Momentum Operator
A very familiar example of the confusion which may arise in the cycle:

Specific mathematical case → Abstraction of structure → Re-generation of mathematical form → Back to specific case   (10.1.1)

is the abstraction of structure from the definition of the momentum of a particle in terms of the action. In Schrodinger's mechanics we take over the relationship between the action ($S$) and the momentum conjugate to $q^i$ directly from the Hamilton-Jacobi equation:

$$p_i = \frac{\partial S}{\partial q^i} \qquad (10.1.2)$$

which, when Schrodinger's notation ($\psi$) for action is used, becomes

$$p_i = \frac{-i\hbar}{\psi}\frac{\partial\psi}{\partial q^i}.$$

Multiplying this quantity by the particle probability density $\psi^*\psi$ generates the momentum density

$$\psi^*\psi\,p_i = \psi^*\left(-i\hbar\frac{\partial}{\partial q^i}\right)\psi$$

leading, in the familiar way, to the definition of the momentum operator:

$$\hat{p}_i = -i\hbar\frac{\partial}{\partial q^i}. \qquad (10.1.3)$$
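This operator, working on a suitable class of functions, $f(q^i)$ (say), of the variable $q^i$ has the property

$$\hat{p}_i\left(q^i f(q^i)\right) - q^i\left(\hat{p}_i f(q^i)\right) = -i\hbar f(q^i)$$

from which the abstract algebraic structure

$$\hat{p}_i\hat{q}^i - \hat{q}^i\hat{p}_i = -i\hbar\hat{1} \qquad (10.1.4)$$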
for the "operators" $\hat{p}_i$, $\hat{q}^i$ and $\hat{1}$ may be generated. The two structures given by equations (10.1.3) and (10.1.4) are abstractions from the interpreted (analytical) quantity given by equation (10.1.2); equation (10.1.4) being more abstract than equation (10.1.3). Equation (10.1.4) is entirely "algebraic", containing no "rules of operation" for the operator $\hat{p}_i$, merely having the structure of its relationship to the operators $\hat{q}^i$ and $\hat{1}$ displayed. Now suppose we take the algebraic structure given by equation (10.1.4) as the definition of the momentum operator $\hat{p}_i$ (the definition of the operators $\hat{q}^i$ and $\hat{1}$ being understood as simple multiplication) and close our eyes to the way in which this structure arose. We might then say to an aspiring researcher who wishes to use Schrodinger's mechanics for a particular application: "here is the definition of the momentum operators $\hat{p}_i$ and a recipe for generating the Schrodinger equation, you can now apply it to your specific case". It quickly becomes obvious that, in addition to the definition given by equation (10.1.3), the source of the abstract definition, there is an infinite number of other quantities which satisfy the structure condition equation (10.1.4). In fact, any operator of the form
$$\hat{p}_i = -i\hbar\left(a(q^i)\frac{\partial}{\partial q^i} + b(q^i)\right)$$

satisfies the equation for arbitrary functions $a(q^i)$, $b(q^i)$. [Footnote 1: Not to mention classes of (infinite-dimensional) matrices.] What is to be done? The structural conditions are satisfied but no calculations can be performed unless the explicit form of the Schrodinger equation is known. It is completely obvious what has happened; the process of abstraction, in general, cannot be reversed; there is a kind of hysteresis in the cycle (10.1.1 on the preceding page), it is not possible to go around the cycle without the inclusion of something other than the starting material. [Footnote 2: Dirac, in his Principles of Quantum Mechanics (Oxford University Press, 1st edition, 1930), has to spend quite some space getting Schrodinger's mechanics from his abstract formalism for this very reason: pp. 99-104.] But this is the very nature of abstraction; if abstraction were reversible, that is if there were a one-to-one relationship between the original and the abstracted structure, the "abstraction" would be nothing more than a more compact notation. There is absolutely nothing wrong in using an algebraic or other mathematical structure which may be abstracted from an interpreted theory of some part of the world provided that one does not assume that this abstracted structure is identical to (or, worse, superior to)
the underlying interpreted theory. When expressed in ordinary language rather than mathematics the fallacy is clear:

    A momentum operator satisfies (10.1.4)

does not imply that

    Anything satisfying (10.1.4) is a momentum operator.

And, a fortiori, a momentum operator is not a representation of the abstract quantity $\hat{p}_i$ in (10.1.4). In fact the very idea that there is a momentum operator at all arises by abstraction from the specific form into which the momentum density may be cast. If we had asked for the form of an operator for some more complicated dynamical variable (non-linear in momenta) by abstraction from the associated density of that variable, it may well not prove possible to generate such a quantity, as we may see from the definition of the "A-density" in equation (7.4.12 on page 135) of Section 7.4.1. In these common cases there is no rule for the generation of the alleged Hermitian operator representing A.[3] It is as if, because we can abstract the basic structures of a "mammal" from the class of living objects "dog" (warm blood, live birth, suckling young, etc.) and say "a dog is a mammal", we can invert the abstraction and say "a mammal is a dog"; the fallacy is much clearer in the familiar, everyday usage. This rather elementary, but typical, example is just one case of the mathematical use of "representation" referred to in Section 1.6 on page 13.

Using the abstract relationship in equation (10.1.4), it is quite easy to show that the "operators" p and q have, as eigenvalues, all possible real numbers from −∞ to +∞, as Dirac does in the first edition of his classic work. But the range and discreteness of possible eigenvalues of momentum operators is not simply determined by equation (10.1.4); the boundary values on the coordinate q are just as important as the commutation relationship between p and q.
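That the structure condition (10.1.4) does not single out the operator (10.1.3) can be checked by direct calculation. The sketch below is illustrative only and makes several assumptions of its own: ħ = 1, test functions restricted to polynomials in q (stored as coefficient lists), and hypothetical helper names. It evaluates p(q f) − q(p f) for operators of the form −i(a(q) d/dq + b(q)). In this sketch an arbitrary b(q) leaves the result −i f unchanged, whereas a non-constant a(q) rescales it to −i a(q) f, so the agreement with (10.1.4) is shown for a = 1.

```python
# A polynomial c0 + c1*q + c2*q**2 + ... is stored as [c0, c1, c2, ...].

def add(p, r):
    n = max(len(p), len(r))
    return [(p[i] if i < len(p) else 0) + (r[i] if i < len(r) else 0)
            for i in range(n)]

def scale(c, p):
    return [c * x for x in p]

def mul(p, r):
    out = [0] * (len(p) + len(r) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(r):
            out[i + j] += x * y
    return out

def deriv(p):
    return [k * p[k] for k in range(1, len(p))] or [0]

def close(p, r, tol=1e-12):
    # Compare two coefficient lists, padding the shorter with zeros.
    n = max(len(p), len(r))
    p = list(p) + [0] * (n - len(p))
    r = list(r) + [0] * (n - len(r))
    return all(abs(x - y) < tol for x, y in zip(p, r))

Q = [0, 1]  # the polynomial q

def momentum(f, a=(1,), b=(0,)):
    """Apply -i (a(q) d/dq + b(q)) to f, with hbar = 1 (assumed)."""
    return scale(-1j, add(mul(list(a), deriv(f)), mul(list(b), f)))

def commutator_on(f, a=(1,), b=(0,)):
    """Return p(q f) - q (p f)."""
    return add(momentum(mul(Q, f), a, b),
               scale(-1, mul(Q, momentum(f, a, b))))

f = [1, 2, 0, 3]    # 1 + 2q + 3q^3, an arbitrary test function
b = [0.5, 0, 4]     # b(q) = 0.5 + 4q^2, an arbitrary choice

print(close(commutator_on(f), scale(-1j, f)))              # a = 1, b = 0
print(close(commutator_on(f, b=b), scale(-1j, f)))         # a = 1, b arbitrary
print(close(commutator_on(f, a=[0, 1]), scale(-1j, mul(Q, f))))  # a(q) = q
```

Two visibly different operators (with and without b) pass the same algebraic test, which is the point of the fallacy stated above: satisfying the structure is not the same as being the momentum operator.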
In the all-important case of angular momenta the cyclic, finite nature of the case $q = \phi$ ensures that the eigenvalues of $p_\phi$ are discrete and integral rather than continuous and infinite. In a similar way, one routinely finds in texts the allegation that the momentum operator
$$\hat{p}_q = -i\hbar\frac{\partial}{\partial q}$$

is Hermitian, that is, for functions $f(q)$, $g(q)$ of $q$:

$$\int dq\, f^*(q)\,\hat{p}_q\, g(q) = \int dq\, g^*(q)\,\hat{p}_q\, f(q). \qquad (10.1.5)$$

[Footnote 3: See Appendix A.]
But if we try the simplest forms $f = q^n$ and $g = q^m$ we find that it is not the case since

$$q^n\frac{\partial q^m}{\partial q} = m\,q^{n+m-1} \neq n\,q^{n+m-1} = q^m\frac{\partial q^n}{\partial q}.$$
Again, it is the boundary conditions on acceptable functions of q which have been forgotten. Operators are not unilaterally and absolutely Hermitian; they have a limited domain of Hermiticity within which relationships like those of equation (10.1.5) hold. Typically, such boundary conditions are that the domain of Hermiticity is limited to functions which either vanish at the extremities of the range of the variable q or are cyclic in q. Boundary conditions are not just a nuisance which have to be specified empirically and externally to the problem while the "real" science is contained in the operators; they are an integral part of the whole problem in Schrodinger's mechanics since the Schrodinger Condition generates both the Schrodinger equation and the boundary conditions which ensure the Hermiticity of the Hamiltonian operator and any other operators which may be generated from that Hamiltonian. The boundary conditions are not available for arbitrary (allegedly Hermitian) operators precisely because, in the absence of the solution of a Schrodinger Condition, there is no guarantee that such operators are involved in a description of reality. Having spent a little time and space labouring the obvious, we can now turn to the task of abstracting the extremely useful mathematical and conceptual structures from Schrodinger's mechanics.
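The dependence of Hermiticity on the class of functions can be seen numerically. The sketch below rests on assumptions that are mine rather than the text's: ħ = 1, the interval [0, 1], Simpson quadrature, and the conventional statement of Hermiticity in which the right-hand side carries an overall complex conjugate, ∫f* p g dq = (∫g* p f dq)*. With that convention the difference of the two sides is the boundary term −i[f* g] evaluated between the limits, which vanishes for functions that are zero at both ends or cyclic, and not otherwise.

```python
import cmath
import math

def simpson(func, a, b, n=2000):
    """Composite Simpson rule for a (possibly complex) integrand; n even."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def hermiticity_defect(f, df, g, dg, a, b):
    """<f|p g> - conj(<g|p f>) for p = -i d/dq, hbar = 1 (assumed)."""
    lhs = simpson(lambda q: complex(f(q)).conjugate() * (-1j) * dg(q), a, b)
    rhs = simpson(lambda q: complex(g(q)).conjugate() * (-1j) * df(q), a, b)
    return lhs - rhs.conjugate()

# Monomials q^2 and q^3: neither vanishes at q = 1, so p is not Hermitian here.
defect_monomials = hermiticity_defect(
    lambda q: q ** 2, lambda q: 2 * q,
    lambda q: q ** 3, lambda q: 3 * q ** 2, 0.0, 1.0)

# Sines vanishing at both ends of the interval.
defect_sines = hermiticity_defect(
    lambda q: math.sin(math.pi * q), lambda q: math.pi * math.cos(math.pi * q),
    lambda q: math.sin(2 * math.pi * q),
    lambda q: 2 * math.pi * math.cos(2 * math.pi * q), 0.0, 1.0)

# Complex exponentials that are cyclic over the interval.
w = 2j * math.pi
defect_cyclic = hermiticity_defect(
    lambda q: cmath.exp(w * q), lambda q: w * cmath.exp(w * q),
    lambda q: cmath.exp(2 * w * q), lambda q: 2 * w * cmath.exp(2 * w * q),
    0.0, 1.0)

print(defect_monomials)  # close to -1j, the boundary term -i [q^5] from 0 to 1
print(abs(defect_sines), abs(defect_cyclic))  # both at rounding level
```

The operator is the same in all three cases; only the functions on which it acts have changed.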
10.2. States and Hilbert Space
The Schrodinger Condition for the motions of N particles in a time-independent potential, subject to their (time-independent) mutual interactions, leads to a partial differential equation in 3N variables (the Schrodinger equation)
which may be written in "operator form":

$$\hat{H}\psi_m = E_m\psi_m \qquad (10.2.1)$$
where, obviously, the operator H is given by:
v
Hfri 'H

The boundary conditions on possible solutions, $\psi_m$, of the equation, which also flow from the Schrodinger Condition, are enough to ensure that the space of functions concerned makes the operator $\hat{H}$ into a Hermitian operator, one for which

$$\int_W \phi^*\hat{H}\psi\,dV = \int_W \psi^*\hat{H}\phi\,dV.$$
This condition is enough to guarantee, for example, that the allowed energies ($E_m$) are always real numbers. It is worth recalling here the point from the last section that there is no such thing as a Hermitian operator in this context; one must always specify the class of functions on which the operator acts to ensure Hermiticity. The theory of such Hermitian eigenvalue problems is very well developed and some salient properties of the solutions are:
• There are typically discrete (negative) values of $E_m$ which have associated solutions $\psi_m$; there may be more than one solution for each $E_m$.
• There may in addition be continuous (positive) values of $E_m$ ($E_\lambda$, say) and associated solutions $\psi_\lambda$.
• The totality of all solutions $\psi_m$ form a complete set for the expansion of any function of the relevant coordinates; for any $f(q^i)$ satisfying the same boundary conditions as the solutions $\psi_m(q^i)$ of the Schrodinger equation:

$$f(q^i) = \sum_m c_m\,\psi_m(q^i) + \int c(\lambda)\,\psi_\lambda(q^i)\,d\lambda$$

for some values of the "coefficients" $c$. This expression may, of course, be written as a single (Lebesgue) integral over the allowed values of $E$.
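The completeness property can be illustrated in the simplest bounded case, where only the discrete sum is present. The sketch below is an assumption-laden example rather than anything taken from the text: it uses the one-dimensional functions √2 sin(nπx) on [0, 1] (eigenfunctions for vanishing boundary values), takes the coefficients as overlap integrals evaluated by Simpson quadrature, and expands f(x) = x(1 − x), which satisfies the same boundary conditions. All function names are hypothetical.

```python
import math

def simpson(func, a, b, n=2000):
    """Composite Simpson rule; n must be even."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def psi(n, x):
    # Normalised solutions vanishing at x = 0 and x = 1.
    return math.sqrt(2.0) * math.sin(n * math.pi * x)

def overlap(u, v):
    # Overlap integral of two real functions on [0, 1].
    return simpson(lambda x: u(x) * v(x), 0.0, 1.0)

def f(x):
    # Function to be expanded; it obeys the same boundary conditions.
    return x * (1.0 - x)

def coefficient(n):
    return overlap(lambda x: psi(n, x), f)

def partial_sum(x, n_max):
    return sum(coefficient(n) * psi(n, x) for n in range(1, n_max + 1))

print(overlap(lambda x: psi(1, x), lambda x: psi(1, x)))  # close to 1
print(overlap(lambda x: psi(1, x), lambda x: psi(2, x)))  # close to 0
print(f(0.5), partial_sum(0.5, 15))
```

A function that did not vanish at the ends of the interval would not be reproduced there by any number of terms, which is why the expansion is stated only for functions satisfying the same boundary conditions as the solutions.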
That is, the solutions $\psi$ form a linear space [Footnote 4: That is, the totality of solutions may have the structure of a linear space abstracted from them.] and this linear space may be equipped with a "scalar product" (and hence a metric) by the definition

$$(\psi_m, \psi_n) = \int \psi_m^*\,\psi_n\,dV$$

where, as usual, $dV$ is the product of the coordinate differentials and the metric of the $3N$-dimensional configuration space:

$$dV = \sqrt{g}\,\prod_i dq^i.$$

This definition ensures that the solutions $\psi$ may have a vector space of infinite dimension abstracted from them and it is possible to show that this space contains all its limit points. This means that the solutions $\psi$ carry the formal structure of a Hilbert Space as well as their more familiar properties of generating probability and property densities. That is, the structure "Hilbert space" may be abstracted from the solutions of the Schrodinger equation. These solutions have, of course, much more content and meaning than that particular algebraic structure; they are functions of 3-space which carry the probabilistic interpretation of the whole mechanics. The referent of a Hilbert space is a mathematical structure less abstract than itself; the referent of the solutions of the Schrodinger equation is the real world. In starting from a formal mathematical skeleton, one quickly finds, for example, that one must introduce quite arbitrary assumptions in order to make the formalism work as a theory of the behaviour of matter. It is simply not possible to generate the operators allegedly representing dynamical variables of particles or fields by the standard formulae of replacing classical momenta by gradients. [Footnote 5: Again, see Appendix A.] No matter what the generality and power of the mathematics, the requirement that these structures must ultimately be genuine abstractions from the properties of real matter demands what is, at bottom, an assumption about the referent of the symbolism.
For example, in a self-confessed "cryptic" account of canonical quantisation in Hilbert space we find Ludwig Faddeev saying [6]:

    There is no unique way to substitute operators for functions because the function pq is equal to the function qp while the operator PQ does not equal the operator QP: one must choose a prescription for the ordering of the operators. (my emphasis)

After presenting purely mathematical prescriptions for his choice, Faddeev says

    This is for me the most concise formulation of the correspondence principle and explains what is meant by quantisation.

But this is to make the most important choice that there is to be made in this context merely on grounds of personal taste or convenience; there is no attempt made to generate the correct order of operators from an interpreted scientific principle. This is the Achilles' heel of all such formal approaches; it stems from the erroneous assumption that every dynamical variable ("observable") may be associated with a (unique?) Hermitian operator generated from its classical equivalent by the familiar (erroneous) substitution rules. Ultimately this error is fallout from the confusion between:
• the Schrodinger equation and its separated components ("symmetries") as equations; representations of a law of nature, and
• identities merely expressing the condition that particular dynamical properties linear in the momenta of an abstract object be constant.

Mathematics is useful for mathematical tasks; it cannot ever be a substitute for scientific analysis.
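The ordering ambiguity in the quoted passage is easy to exhibit. The following sketch is illustrative only, assuming ħ = 1, P = −i d/dq, Q = multiplication by q, and polynomial test functions stored as coefficient lists (all names are hypothetical). The single classical function pq corresponds to at least three candidate operators, which give three different results on the same function.

```python
# A polynomial c0 + c1*q + c2*q**2 + ... is stored as [c0, c1, c2, ...].

def deriv(p):
    return [k * p[k] for k in range(1, len(p))] or [0]

def P(p):
    # P f = -i df/dq, with hbar = 1 (assumed).
    return [-1j * c for c in deriv(p)]

def Q(p):
    # Q f = q f: shift every coefficient up by one power.
    return [0] + list(p)

def add(p, r):
    n = max(len(p), len(r))
    p = list(p) + [0] * (n - len(p))
    r = list(r) + [0] * (n - len(r))
    return [x + y for x, y in zip(p, r)]

f = [0, 0, 1]                         # f(q) = q^2

pq = P(Q(f))                          # ordering "PQ"
qp = Q(P(f))                          # ordering "QP"
sym = [0.5 * c for c in add(pq, qp)]  # symmetrised ordering (PQ + QP)/2

print(pq)   # coefficient of q^2 is -3i
print(qp)   # coefficient of q^2 is -2i
print(sym)  # coefficient of q^2 is -2.5i
```

Nothing in the algebra prefers one of the three; the choice has to come from elsewhere, which is the complaint made in the text.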
10.3. The Real Use of Abstract Structures
[Footnote 6: In Quantum Fields and Strings: A Course for Mathematicians (American Mathematical Society/Institute for Advanced Study, 1999), Eds. P. Deligne et al.]

In point of fact, as will become clear after we have looked at some particular applications of Schrodinger's mechanics, the main use of the algebraic structures which may be abstracted from the interpreted theory is not as
foundations underlying that interpreted theory, but as extremely convenient manipulational technologies in applications of the theory to particular cases. The most widely-used of these technologies (second quantisation and perturbation theories) may even have their rules of manipulation expressed diagrammatically and are very convenient for automated computer usage. The very convenience of these schemes has led, in some cases, to farce, when the intermediate terms in some mathematical expansion method are given physical interpretation or when one finds, after a long calculation, that one has computed an approximation to a function which does not exist.[7] We shall have the opportunity to re-examine this point in Part 5.
PART 5
Interpretation from Applications
Schrodinger's mechanics, like many other physical theories, cannot be satisfactorily interpreted and understood without a careful scrutiny of the way it works when applied to particular physical systems. However instructive it is to think that conclusions may be drawn about the real world from thought experiments on non-existent objects, it is even more instructive to see how the theory actually applies to the real world. The applications discussed in this part are chosen because they are exactly soluble in Schrodinger's mechanics but, more importantly, because each one has important, even decisive, things to say about the interpretation of the theory.
Chapter 11
The Quantum Kepler Problem
The Kepler problem (two particles in mutual attraction) is not only a classic problem in its own right with extensive application throughout physics and chemistry, but its theoretical ramifications bring into sharp focus some ideas which are crucial to the use and understanding of Schrodinger's mechanics. In this chapter we look at the several ways in which the associated Schrodinger's equation may be solved and the interpretation of the solutions. There are some surprises in the general properties of the solutions and some quantities which emerge on the way to those solutions. I have risked boring the reader by stressing fundamental points here.
Contents
11.1. Two Interacting Particles
11.2. Quantum Kepler Problem in a Plane
11.3. Abstract and Concrete Hydrogen Atoms
11.4. The Kepler Problem in Three Dimensions
11.5. The Separation of the Schrodinger Equation
11.6. Commuting Operators and Conservation
11.7. The Less Familiar Separations
11.7.1. The Everyday Solutions
11.8. Conservation in Concrete and Abstract Systems
11.9. Conclusions from the Kepler Problem
11.9.1. Concrete Objects and Symmetries
11.1. Two Interacting Particles
The dynamics of a system of two bodies (masses $m_1$ and $m_2$, say) which interact via an attractive potential acting along a line joining their centres ($V(|\vec{r}_1 - \vec{r}_2|)$, say) is easily reduced to the motion of the system as a whole and their relative motions. The kinetic energy of the system depends on the two velocities $\dot{\vec{r}}_1$ and $\dot{\vec{r}}_2$ and we may, in anticipation of the success of the endeavour, transform to a new system of coordinates, $\vec{R}$, the coordinate of the centre of mass of the pair and $\vec{r} = (\vec{r}_1 - \vec{r}_2)$, their relative positions, so that the kinetic energy depends on $\dot{\vec{R}}$ and $\dot{\vec{r}}$. The potential energy, by assumption, depends only on $\vec{r}$ (plus any parameters of the particles; charge, mass, etc.). In fact, after a small amount of manipulation, the Lagrangian is found to be:

$$L(\vec{R}, \vec{r}, \dot{\vec{R}}, \dot{\vec{r}}) = \frac{1}{2}(m_1 + m_2)\,\dot{\vec{R}}^2 + \frac{1}{2}\,\frac{m_1 m_2}{(m_1 + m_2)}\,\dot{\vec{r}}^2 - V(\vec{r}) \qquad (11.1.1)$$
Of the six Lagrange equations resulting from this Lagrangian function, the three involving derivatives of $\vec{R}$ are all those of free motion in a straight line; the inter-particle potential does not depend on $\vec{R}$. Further, there are no "cross terms" involving $\vec{r}$ and $\vec{R}$. We may therefore, without loss of generality, simply assume a Lagrangian of the simpler form

$$L(\vec{r}, \dot{\vec{r}}) = \frac{\mu}{2}\,\dot{\vec{r}}^2 - V(\vec{r}) \qquad (11.1.2)$$

where the notation $\mu$ has been introduced for the so-called reduced mass:

$$\mu = \frac{m_1 m_2}{(m_1 + m_2)}.$$

So, as is intuitively obvious, the motion reduces to an effective "one body in a field of force" problem and we may take the origin of coordinates as $\vec{r} = 0$ and the problem to be solved is spherically symmetrical; no torques act so that the total angular momentum is conserved. For any motion whatsoever the vector

$$\vec{\ell} = \vec{r} \times \vec{p} = \mu\,\vec{r} \times \dot{\vec{r}} \qquad (11.1.3)$$
is constant, which means that the position vector (f) of the "effective particle" as I shall call it is always perpendicular to a fixed direction in space (I); r is always in a fixed plane. The motion of the effective particle is confined to a plane. From now on the reduced mass /i will be given the value unity for simplicity ("atomic units").
The details of the motion may now be elucidated by using a system of plane polar coordinates to describe the position of the particle, which enables maximum use to be made of the conservation of angular momentum. The details are well known and will not be repeated here; I simply wish to establish the result that, for a given set of initial conditions ($\vec{r}_0$ and $\dot{\vec{r}}_0$, say) which establish a plane (perpendicular to $\vec{r}_0 \times \dot{\vec{r}}_0$), the motion is confined forever to that plane. The complications which arise due to the motion of the second particle do not involve any changes to the idea that the motion is planar; the centre of gravity of the combined system moves in an orbit in the same plane.
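The planarity of the classical motion can be checked numerically. What follows is only a minimal sketch and not part of the original treatment: it assumes the Coulomb-like potential $V(r) = -1/r$ with $\mu = 1$, uses a velocity-Verlet integrator, and the helper names (`integrate`, `check_planarity`) are hypothetical. It reports the largest drift in $\vec{\ell} = \vec{r} \times \dot{\vec{r}}$ and the largest excursion of $\vec{r}$ out of the plane perpendicular to the initial $\vec{\ell}$.

```python
# Sketch: effective one-body problem with mu = 1 and an assumed V(r) = -1/r.

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def acceleration(r):
    # Central force from V(r) = -1/r: a = -r / |r|^3 (mu = 1).
    d = dot(r, r) ** 1.5
    return tuple(-x / d for x in r)

def integrate(r0, v0, dt=2e-3, steps=5000):
    """Velocity-Verlet integration; returns a list of (r, v) samples."""
    r, v = tuple(r0), tuple(v0)
    a = acceleration(r)
    samples = [(r, v)]
    for _ in range(steps):
        v_half = tuple(vi + 0.5 * dt * ai for vi, ai in zip(v, a))
        r = tuple(ri + dt * vh for ri, vh in zip(r, v_half))
        a = acceleration(r)
        v = tuple(vh + 0.5 * dt * ai for vh, ai in zip(v_half, a))
        samples.append((r, v))
    return samples

def check_planarity(r0, v0):
    """Return (max drift of l = r x v, max component of r along the initial l)."""
    l0 = cross(r0, v0)
    worst_l, worst_plane = 0.0, 0.0
    for r, v in integrate(r0, v0):
        l = cross(r, v)
        worst_l = max(worst_l, max(abs(x - y) for x, y in zip(l, l0)))
        worst_plane = max(worst_plane, abs(dot(r, l0)))
    return worst_l, worst_plane
```

For a bound initial condition such as `check_planarity((1.0, 0.0, 0.0), (0.0, 0.8, 0.3))`, both numbers should stay at the level of rounding error, because each velocity-Verlet step with a central force leaves $\vec{r} \times \dot{\vec{r}}$ unchanged.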
11.2. Quantum Kepler Problem in a Plane
In view of the findings of the last section it would seem that the way to proceed to the solution of the quantum Kepler problem would be to set up and solve the Schrodinger equation for a particle in a centrally-symmetric field of force in two dimensions, avoiding the repetition of the elimination of the motion of the centre of mass and the theorem that the motion is confined to be planar. Also, taking advantage of the conservation of angular momentum, we may set up the two-dimensional Schrodinger equation in plane polar coordinates $(r, \phi)$. The Jacobian in this simple case is
$$g = \begin{pmatrix} 1 & 0 \\ 0 & r^2 \end{pmatrix}$$
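so that, if the potential field is given by
$$V(r) = -\frac{1}{r}$$
for a Coulomb's Law interaction (taking the charge on the proton to be unity; atomic units again), the (time-independent) Schrodinger equation becomes:
$$-\frac{1}{2}\left(\frac{\partial^2}{\partial r^2} + \frac{1}{r}\frac{\partial}{\partial r} + \frac{1}{r^2}\frac{\partial^2}{\partial \phi^2}\right)\psi(r, \phi) - \frac{1}{r}\,\psi(r, \phi) = E\,\psi(r, \phi) \tag{11.2.4}$$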
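which, of course, separates into two ordinary differential equations by the assumption of a product form for $\psi(r, \phi)$. With the angular factor $e^{im\phi}$ and a factor $r^{-1/2}$ taken out of the radial function, the radial equation for $R(r)$ is
$$\frac{d^2 R}{dr^2} + \left\{2E + \frac{2}{r} - \left(m^2 - \frac{1}{4}\right)\frac{1}{r^2}\right\} R = 0 \tag{11.2.5}$$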
This equation is straightforward to solve by the standard terminating series expansion method. The result of imposing the standard boundary conditions arising from the Schrodinger Condition is that the allowed (bound) values of the energy $E$ ($E_m$, say, since there is a denumerable infinity of them) are given by:
$$E_m = -\frac{1}{2\left(m + \frac{1}{2}\right)^2} \quad \text{for } m = 0, 1, \ldots \tag{11.2.6}$$
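The lowest of these levels can be confirmed by direct substitution. This is a worked sketch only; the trial function $\psi_0 = e^{-2r}$ is an assumption of the example rather than something taken from the text. For an angle-independent function the $\phi$ derivative in equation (11.2.4) vanishes, and

```latex
\begin{align*}
\psi_0(r) &= e^{-2r}, \qquad
\frac{\partial \psi_0}{\partial r} = -2\,\psi_0, \qquad
\frac{\partial^2 \psi_0}{\partial r^2} = 4\,\psi_0, \\
-\frac{1}{2}\left(4 - \frac{2}{r}\right)\psi_0 - \frac{1}{r}\,\psi_0
  &= -2\,\psi_0 + \frac{1}{r}\,\psi_0 - \frac{1}{r}\,\psi_0 = -2\,\psi_0 .
\end{align*}
```

so that $E = -2$, which is the value of $-1/[2(m + \tfrac{1}{2})^2]$ at $m = 0$ and four times the lowest three-dimensional energy of $-\tfrac{1}{2}$.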
which is, to say the very least, disturbing since it is well known that the experimental energies of, for example, a hydrogen atom in appropriate units are given by:
$$E_n = -\frac{1}{2n^2} \quad \text{for } n = 1, 2, \ldots \tag{11.2.7}$$
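The size of the discrepancy is easy to tabulate. The short sketch below assumes the planar formula $E_m = -1/[2(m + \tfrac{1}{2})^2]$ and the three-dimensional formula $E_n = -1/(2n^2)$ in atomic units; the function names are hypothetical and exact rational arithmetic is used only to keep the comparison free of rounding.

```python
from fractions import Fraction

def energy_2d(m):
    """Planar level, m = 0, 1, 2, ...: E_m = -1 / (2 (m + 1/2)^2)."""
    return -1 / (2 * (Fraction(m) + Fraction(1, 2)) ** 2)

def energy_3d(n):
    """Three-dimensional level, n = 1, 2, 3, ...: E_n = -1 / (2 n^2)."""
    return -Fraction(1, 2 * n * n)

# Print the first few levels of each series side by side.
for k in range(4):
    print("m =", k, "E_m =", energy_2d(k), "| n =", k + 1, "E_n =", energy_3d(k + 1))
```

The two series never coincide: the planar levels involve half-odd-integer denominators, so the planar ground state lies a factor of four below the three-dimensional one.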
What has happened here? Why does the Schrodinger equation not generate results in agreement with experiment; where is the much-vaunted invincibility of the Schrodinger equation in quantitative matters? After all, the same problem treated by the classical methods of the last section gives results in essentially perfect agreement with experiment for macroscopic bodies. There are a number of objections to the two-dimensional treatment of the Kepler problem in Schrodinger's mechanics, like the failure of the divergence theorem for the Coulomb potential in two dimensions, but all of these apply with equal force to the standard Newtonian treatment of the macroscopic case. As usual with theoreticians, we must blame the experiments.
11.3. Abstract and Concrete Hydrogen Atoms
When one performs a series of experimental measurements in order to verify (or not) the predictions of the macroscopic Kepler model, such experiments are repeatedly carried out on a specific individual concrete object to verify (for example) its path through space as a function of time. Since the orbit of any such individual object is in a plane, there is no restriction on the applicability of the results obtained for the (Newtonian) abstract
object where, in this case, abstraction has only been done from the static properties (shape, colour, composition, ...) of the particles. Any experiments performed to verify (or not) calculations on the properties of the abstract hydrogen atom (the most familiar case of the microscopic Kepler problem) must be performed on concrete hydrogen atoms and, what is more, on large numbers of randomly occurring concrete hydrogen atoms, because the solutions of the Schrodinger equation generate probabilities. With the current state of experimental technique, such experiments will be carried out on unconstrained hydrogen atoms in ordinary, three-dimensional space. There is, therefore, the distinct possibility that measurements on sets of randomly-occurring concrete, randomly orientated, hydrogen atoms whose combined internal motions are taking place in three dimensions will not be commensurate with results from the theory of an abstract object which solves the Kepler problem in two dimensions. As we have seen in the case of particle diffraction, there is a profound effect on abstract objects described by Schrodinger's mechanics when constraints are placed on the motion of a particle which are entirely absent from the corresponding Newtonian problem; for example, particles confined to a limited area of space in which no potential acts have quantised energies and characteristic periodic distributions. As we shall see later, the possible quantised values of angular momentum are different for rotating systems (or particles in closed orbits around a source of central potential) in two or three dimensions. In two dimensions the quantised value of the square of the angular momentum is dependent on the square of the integer $m$, but in three dimensions the corresponding result is dependent on $\ell(\ell + 1)$, where the integer values of both $m$ and $\ell$ are $0, 1, 2, \ldots$ The kinetic energy associated with these motions is
$$T_2 \propto \tfrac{1}{2}\,m^2 \quad \text{for planar motion}$$
$$T_3 \propto \tfrac{1}{2}\,\ell(\ell + 1) \quad \text{for three-dimensional motion}$$
for which there is a suggestive analogy with the appearance of 1/2 in the energy. This is simply speculation; however, if it ever proves possible to make measurements of the energies of suitably "squashed" concrete hydrogen atoms, it would not be surprising if their allowed energies proved to be given by the formula of the last section for the abstract hydrogen atom in two dimensions rather than the formula appropriate to the abstract hydrogen atom in three dimensions.
11.4. The Kepler Problem in Three Dimensions
The complete solution to the Kepler problem in three dimensions in classical mechanics is obtained by the solution of the Hamilton-Jacobi equation. In Schrodinger's mechanics it is, of course, obtained by solving the Schrodinger equation. The solutions of the H-J equation are obtained as families of trajectories containing arbitrary parameters which may be taken to be the initial position and momentum (vectors) of the single effective particle, and any particular trajectory may be chosen by insertion of the appropriate values of these constants. Of course, once a particular trajectory is chosen, it will lie in a plane determined by the initial position and momentum. In the case of Schrodinger's mechanics, the solutions of the Schrodinger equation are energies and probability distributions for the abstract effective particle which will necessarily be three-dimensional since, for example, the abstract hydrogen atom with a given fixed energy will have finite probabilities of position in three-dimensional space and no fixed (planar) trajectories are generated.
The solutions of the Hamilton-Jacobi equation for the Kepler problem are well known and are given in standard texts on classical mechanics. Our problem in this chapter is to obtain the solutions of the Schrodinger equation and, in particular, to see how the interpretation of these solutions bears on the general problems associated with the interpretation of Schrodinger's mechanics. The only way to obtain the solutions of partial differential equations in closed analytical form is by the technique of separation of variables; to use a coordinate transformation which will change the form of the unknown function, when it is expressed in the new coordinates, into a sum or product of functions, each of which depends only on one of these coordinates. This breaks down the three-dimensional partial differential equation into three separate ordinary differential equations.
In the case of the Hamilton-Jacobi method such a transformation typically results in the Action function $S$ being broken down into a sum of functions of each variable while, since Schrodinger used the logarithmic form for the action, equation (7.3.2):
$$S = K \ln \psi$$
the Schrodinger equation is separated (if this is possible) by a product form; $\psi(q^1, q^2, q^3)$ may be written as a product:
$$\psi(q^1, q^2, q^3) = f(q^1)\,g(q^2)\,h(q^3) \quad \text{(say).}$$
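The link between the two forms of separation is a one-line consequence of the logarithmic substitution. As a short worked sketch (the component names $S_1$, $S_2$, $S_3$ are introduced here only for illustration), an additively separated action gives a multiplicatively separated $\psi$:

```latex
\begin{align*}
S(q^1, q^2, q^3) &= S_1(q^1) + S_2(q^2) + S_3(q^3), \\
\psi = e^{S/K} &= e^{S_1(q^1)/K}\; e^{S_2(q^2)/K}\; e^{S_3(q^3)/K}
              = f(q^1)\, g(q^2)\, h(q^3).
\end{align*}
```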
There are two points to be made about this mathematical technique for the solution of the Schrodinger equation (and its Hamilton-Jacobi analogue), one mathematical and one physical:
• It is possible to prove that, where a coordinate system can be found in which the solution separates, all possible solutions to the equation may be found. That is, there are no duplicates and no omissions. Here by "all" it is meant that the space of solutions is spanned by each set of separated products; all possible solutions are, at most, linear combinations of the ones explicitly generated by the technique.
• The separation technique is not just a mathematical convenience; it provides the formal background for two extremely important points in the interpretation of Schrodinger's mechanics. As we shall shortly see, the ideas of commuting operators and "constants of the motion" emerge from this study, as does an important clarification of the relationship between abstract and concrete objects. I make no apology for re-stressing the latter point.
We are already familiar with the idea that the simple requirement in classical mechanics of the equality of the Hamiltonian function and the energy certainly generates all possible trajectories. But in some cases where the energy is constant, there may be conservation laws for some component parts of the energy as well as for the total. In the theory of the Kepler problem, for example, the fact that the potential depends only on the radial distance of the effective particle from the origin of coordinates means that, for any particular trajectory, the rotational kinetic energy is also constant in addition to the constancy of the total energy. Even more familiar is the fact that, for a free particle with constant (kinetic) energy, the kinetic energy in any direction is also constant and the sum of the kinetic energies in any three mutually perpendicular directions is equal to the total energy.
In short, any symmetries in the (time-independent) potential energy function will be reflected in the conservation of selected parts of the total energy during any motion in which the total energy is conserved. Of course, this is extremely obvious even at the level of Newton's or Lagrange's equations since, if the potential energy function does not depend on a particular coordinate, the corresponding Newtonian or Lagrange equation implies free motion in that coordinate and therefore a separate (kinetic) energy-conservation law for that coordinate. In Schrodinger's mechanics, the situation is modified by the fact that we must seek solutions of the Schrodinger equation whose referent is the
abstract object and which contain probability distributions for, what in classical mechanics are, whole families of trajectories and not simply single trajectories.
11.5. The Separation of the Schrodinger Equation
This section gives the barest skeleton of the interpretation of the results of the solution, by separation, of the Schrodinger equation for any system in three dimensions and is included to introduce the main ideas and nomenclature. There are many, more modern, methods of approaching the idea of "Symmetry and Separation" but it is attractive to retain the flavour of Schrodinger's approach here. Since the Schrodinger equation for a single particle contains the Laplacian operator $\nabla^2$, the coordinate systems in which the equation separates must be a subset of the familiar eleven systems in which $\nabla^2$ separates.$^1$ The choice of the coordinate system (if any) in which a particular Schrodinger equation separates is determined by the properties of the potential energy function for the particular case in question. Roughly speaking, does the potential energy function break down into a sum of functions, each depending on only one of the coordinates? The technique of separation is to generate three soluble equations which prove to be, in their turn, eigenvalue equations involving the so-called separation operators. Although the details of the solution are often more complex, the essence of the method is that the Hamiltonian operator ($\hat{H}(q^1, q^2, q^3)$), by a suitable choice of coordinates $(q^1, q^2, q^3)$, may be written as a sum of three separation operators:
$$\hat{H}(q^1, q^2, q^3) = \hat{U}(q^1) + \hat{V}(q^2) + \hat{W}(q^3)$$
such that, when the solution of
$$\hat{H}(q^1, q^2, q^3)\,\psi(q^1, q^2, q^3) = E\,\psi(q^1, q^2, q^3)$$
which is sought, is written as
$$\psi(q^1, q^2, q^3) = u(q^1)\,v(q^2)\,w(q^3)$$
$^1$ These systems and much useful information about them can be found in The Field Theory Handbook by P. Moon and D. E. Spencer (Springer, 1961).
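for the coordinate system $(q^1, q^2, q^3)$ in question, then the three functions of the single coordinates satisfy the equations:
$$\begin{aligned}
\hat{H}\,\psi(q^1, q^2, q^3) &= E\,\psi(q^1, q^2, q^3) \\
\hat{U}(q^1)\,u(q^1) &= a\,u(q^1) \\
\hat{V}(q^2)\,v(q^2) &= b\,v(q^2) \\
\hat{W}(q^3)\,w(q^3) &= c\,w(q^3)
\end{aligned} \tag{11.5.8}$$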
where, in the simplest possible case, $E = a + b + c$. In many cases, one or more of the separation operators may contain a constant which is omitted from its final form, or an operator may turn out to be the square of a simpler operator (or both), which will mean that the eigenvalues will not sum to the total energy (the three may have different dimensions and units); but if the operators are kept in their original form after the separation process, all the eigenvalues will have the dimensions of energy, which is important for their interpretation. In practice, since the four operators $\hat{U}$, $\hat{V}$, $\hat{W}$ and $\hat{H}$ are mutually dependent, there are three independent operators from each separation. In fact it is now known that the Schrodinger equation for the hydrogen-like atom ("the Kepler problem") can be separated in four coordinate systems:
1. The familiar spherical polar system: $(r, \theta, \phi)$
2. The parabolic system: $(\mu, \nu, \psi)$
3. Conical or sphero-conal coordinates: $(r, \theta, \lambda)$
4. Prolate spheroidal coordinates: $(\eta, \theta, \psi)$
Appropriately enough, these general techniques were first developed by Stackel$^2$ for the separation and solution of the equations associated with the immediate precursor of Schrodinger's mechanics: the Hamilton-Jacobi equations. Thus, the four separations of the Schrodinger equation for the hydrogen atom generate four sets of separation operators; the Hamiltonian and eight separation operators. In practice not all the eight are different; there are five distinct forms of separation operator, two of which are families of operators depending on parameters contained in the definition of the coordinate system.
$^2$ Details and illustrations of these coordinate systems and the Stackel method can be found in Moon and Spencer, cited above.
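The "simplest possible case" of the separation scheme, in which the three eigenvalues sum to the energy, can be illustrated with the free particle in Cartesian coordinates, where each separation operator is $-\tfrac{1}{2}\,d^2/dq^2$. This is a numerical sketch under stated assumptions: the product function $\sin(k_1 x)\sin(k_2 y)\sin(k_3 z)$, the sample points, the finite-difference step and the function names are all choices made for the example.

```python
import math

def second_derivative(f, x, h=1e-3):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def separation_eigenvalue(k, x=0.3):
    """Eigenvalue of -(1/2) d^2/dq^2 on u(q) = sin(k q), estimated at q = x."""
    u = lambda q: math.sin(k * q)
    return -0.5 * second_derivative(u, x) / u(x)

def total_energy(ks, point=(0.3, 0.5, 0.7), h=1e-3):
    """Apply H = U + V + W to the product u v w and divide by the product."""
    def psi(x, y, z):
        return math.sin(ks[0] * x) * math.sin(ks[1] * y) * math.sin(ks[2] * z)
    x, y, z = point
    centre = psi(x, y, z)
    laplacian = ((psi(x + h, y, z) - 2.0 * centre + psi(x - h, y, z))
                 + (psi(x, y + h, z) - 2.0 * centre + psi(x, y - h, z))
                 + (psi(x, y, z + h) - 2.0 * centre + psi(x, y, z - h))) / (h * h)
    return -0.5 * laplacian / centre
```

With `ks = (1.0, 2.0, 3.0)` the three separation eigenvalues should come out close to $k^2/2$ each, and `total_energy(ks)` should agree with their sum to within the finite-difference error.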
11.6. Commuting Operators and Conservation
We know from the general theory of Schrodinger's mechanics that the quantity $E$ of the previous section is the (conserved) total energy of the system; the question naturally arises: what is the interpretation of the numbers which are the eigenvalues of the various separation operators? The answer must hinge on whether the eigenvalue equations associated with these separation operators are indeed equations or whether they are merely identities. Obviously, since they are components of the Schrodinger equation, they must be equations; they therefore carry some information about the world and are capable of being given a physical interpretation. They are not simply identities like the momentum distributions$^3$ defined in Section 7.4.1 on page 135. Some of the properties of these operators follow.
• First of all, it is quite easy to prove that all five of the separation operators for the hydrogen atom commute with the Hamiltonian operator; that is, for any separation operator $\hat{U}$
$$\hat{H}\hat{U} f(q^1, q^2, q^3) = \hat{U}\hat{H} f(q^1, q^2, q^3).$$
• Within a particular choice of coordinate system, the separation operators trivially commute because they depend on different coordinates; they have no action on "each other's" coordinate:
$$\hat{U}\hat{V} f(q^1, q^2, q^3) = \hat{V}\hat{U} f(q^1, q^2, q^3).$$
• Between coordinate systems, if two separation operators are not the same, in general they do not commute.
$^3$ Notwithstanding the fact that the same mathematical expression may be an identity in one context and an equation in another, which I have tried to distinguish by using, respectively, "$\stackrel{\text{def}}{=}$" or "$=$" in the relevant expression.
If we look at the most familiar case of separation for the hydrogen atom (spherical polar coordinates), the separation operators and their eigenvalues and eigenfunctions may be identified from the fact that they both appear in their own right in Schrodinger equations for other dynamical systems. If we were to solve the problem of a rigid rotor (of constant moment of inertia $I$) in three dimensions, we would start from a Schrodinger Condition in the angular part of spherical polar coordinates containing the angular kinetic energy
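which, when translated into Schrodinger's $\psi$ notation, generates a density of angular kinetic energy, $\rho(\theta, \phi)$:
[equation illegible in the source]
This, when inserted into the Schrodinger Condition, generates the variation problem
$$\delta\left\{\int \sin\theta\, d\theta\, d\phi\, |\psi|^2\, \rho(\theta, \phi) - t \int \sin\theta\, d\theta\, d\phi\, |\psi|^2\right\} = 0$$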
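where $t$ is an allowed numerical value of the angular kinetic energy. The resulting Schrodinger equation is
$$\hat{T}\psi = -\frac{1}{2I}\left\{\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\,\frac{\partial}{\partial\theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\phi^2}\right\}\psi = t\,\psi$$
where $\hat{T}$ is the rotational Hamiltonian operator:
$$\hat{T} = -\frac{1}{2I}\left\{\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\,\frac{\partial}{\partial\theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\phi^2}\right\}$$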
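The action of the rotational operator can be checked numerically on two simple angular functions. The sketch below is illustrative only: it assumes the form $-\frac{1}{2I}\left[\frac{1}{\sin\theta}\partial_\theta(\sin\theta\,\partial_\theta) + \frac{1}{\sin^2\theta}\partial_\phi^2\right]$ in atomic units, applies it by central finite differences, and uses the unnormalised functions $\cos\theta$ and $\sin\theta\,e^{i\phi}$; the function names and sample point are hypothetical choices.

```python
import cmath
import math

def rotor_T(f, theta, phi, inertia=1.0, h=1e-3):
    """Apply the rotational operator to f at (theta, phi) by finite differences."""
    # theta part in flux form: [sin(t+h/2) f'(t+h/2) - sin(t-h/2) f'(t-h/2)] / (h sin t)
    up = math.sin(theta + h / 2) * (f(theta + h, phi) - f(theta, phi)) / h
    down = math.sin(theta - h / 2) * (f(theta, phi) - f(theta - h, phi)) / h
    theta_part = (up - down) / (h * math.sin(theta))
    # phi part: second derivative divided by sin^2(theta)
    phi_part = ((f(theta, phi + h) - 2 * f(theta, phi) + f(theta, phi - h))
                / (h * h * math.sin(theta) ** 2))
    return -(theta_part + phi_part) / (2.0 * inertia)

def p0(theta, phi):
    """Proportional to cos(theta)."""
    return complex(math.cos(theta))

def p1(theta, phi):
    """Proportional to sin(theta) exp(i phi)."""
    return math.sin(theta) * cmath.exp(1j * phi)

def ratio(f, theta=0.9, phi=0.4, inertia=1.0):
    """(T f) / f at one point; constant over the sphere if f is an eigenfunction."""
    return rotor_T(f, theta, phi, inertia) / f(theta, phi)
```

Both functions should give a ratio close to $1(1 + 1)/2I$, that is, 1 for $I = 1$ and 0.5 for $I = 2$, consistent with the eigenvalue $t$ depending on $\ell(\ell + 1)$ with $\ell = 1$.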
This is the most familiar choice for one of the separation operators for the hydrogen atom Schrodinger equation. It is capable of further separation into operators involving the coordinates $\theta$ and $\phi$ individually to generate the other common separation operator
$$-\frac{1}{2I_\theta}\frac{\partial^2}{\partial\phi^2}$$
where $I_\theta = I \sin^2\theta$ is the component of the moment of inertia about the Cartesian z-axis, about which the angle $\phi$ is defined. This operator is just the operator in the Schrodinger equation for rotational motion in two dimensions of an object with moment of inertia $I_\theta$. All that is required to make the identification complete is the definition of the moment of inertia of a single particle of mass $m$ about the origin of coordinates: $I = mr^2$. Thus, in summary, the separation operators for the hydrogen atom Schrodinger equation in spherical polar coordinates are operators whose eigenvalues are:
• The total angular kinetic energy of the particle and
• The angular kinetic energy of the particle in a particular plane.
The more common identification of these operators in terms of angular momenta is trivial to obtain by simply noting that:
• The square of the total angular momentum ($L^2$) is just the total angular kinetic energy multiplied by $2I$.
• The magnitude of the two-dimensional angular momentum in a chosen plane ($L_z$) is just the square root of the product of the planar angular kinetic energy and twice the appropriate moment of inertia component, $I_\theta = m(r\sin\theta)^2$.
The interpretation is now easy: the eigenvalue of each of the four operators of equation (11.5.8) (in a three-dimensional case) is a dynamical constant energy of the motion; each of these energy quantities is conserved throughout the motion of the system. Only three of the four are independent since the eigenvalue of the Hamiltonian operator is the paramount quantity to which the other three must sum. In the case of the hydrogen atom, it is usual to choose the three which we have discussed and, further, it is usual to discard constants and take the simplest possible operator where there is a choice, leading to the three "constants of the motion":
• Energy
• Square of the total angular momentum
• One component of the angular momentum (conventionally the z-component)
This is all very reassuring since it is well known from classical particle mechanics that these very quantities are conserved for the motions of concrete single Kepler systems and it is certainly not at all obvious that the same result would obtain for the abstract object which is the referent of Schrodinger's mechanics. There are two problems which remain:
• Is this result general in the sense that "do the eigenvalues of all sets of commuting operators represent conserved components of the total energy which may be re-interpreted as conserved momenta?"
• What is the interpretation of the fact that the separation operators obtained by performing the separation process in different coordinate systems may not commute with each other even though all of them necessarily commute with the Hamiltonian operator?
There are also some comments to be made on the results which have been obtained:
• It must be noted that the operators obtained by the separation method arise from the Hamiltonian operator which itself is generated from the Schrodinger Condition. That is, for example, the operator which I have interpreted as the square of the total angular momentum is part of the Hamiltonian operator derived by applying the variation principle to the Schrodinger Condition; it is most definitely not the operator which would be obtained by making certain substitutions in the classical expression for the square of the angular momentum. In particular, the angular momentum density is not
[expression missing in the source]
as I have been at pains to emphasise in Section 7.4.1 on page 135. I shall return to this point in detail later. Ultimately, the fact that Schrodinger's mechanics is a system of equations rather than a set of identities rests on this distinction.
• Can the conservation of a particular dynamical variable in Schrodinger's mechanics be interpreted in the same way as such a conservation can in classical mechanics? Can we say, for example as we do in classical particle mechanics, that the total angular momentum is conserved in Kepler systems because there are no torques acting on the system?
Before we can begin to answer some of these points we need to look very briefly at the other systems of separation operators for the hydrogen atom Schrodinger equation. This section is very brief and it is not strictly necessary to know the explicit forms of the operators, merely the fact that they do not all commute.
11.7. The Less Familiar Separations
In the spherical polar system the operator $\hat{L}^2$ is a separation operator and so the (scalar) square of the angular momentum is conserved, as is the square of the z-component. Thus, necessarily, the sum of the squares of the x- and y-components is also conserved.
It is, perhaps, worth remarking that the squares of the three components of angular momentum, unlike the components themselves, do commute; that is, it is not the magnitudes of these vectors which are incompatible, but their relative directions.
Sphero-conal Coordinates
In the sphero-conal system the separation operators can be chosen to be $\hat{L}_z^2$ and a linear combination of the squares of the other two components:
$$a\,\hat{L}_x^2 + b\,\hat{L}_y^2$$
where $a$ and $b$ are (positive) parameters which define the shape of the conical coordinate surfaces. These two operators commute with each other and with the Hamiltonian operator.
The Parabolic System
The system of parabolic coordinates was used by Schrodinger in his study of the Stark effect of the hydrogen atom. One of the separation operators in this case is the familiar $\hat{L}_z$ while the other one is $\hat{R}_z$, the z-component of the Runge-Lenz vector operator:
$$\hat{\vec{R}} = \frac{1}{2}\left(\hat{\vec{L}} \times \hat{\vec{p}} - \hat{\vec{p}} \times \hat{\vec{L}}\right) + \frac{\vec{r}}{r}$$
The Prolate Spheroidal System
This system of coordinates is more familiar as the coordinates used to separate the Schrodinger equation for diatomic molecules; the coordinate surfaces are ellipses and hyperbolae with one focus at each nucleus. However, the system works just as well when one focus is "empty" and at an arbitrary distance from the nucleus of the atom. One separation operator is
$$\hat{A} = -\hat{L}^2 - R\,\hat{R}_z$$
and, again, the other separation operator is $\hat{L}_z$; $\hat{R}_z$ is again the z-component of the Runge-Lenz vector operator. $R$ is the distance between the nucleus and the dummy focus of the coordinate surfaces; an arbitrary positive number. There is a connection between this system and the parabolic system; $R = \infty$ is the parabolic limit. It is straightforward to show that the operator $\hat{L}^2$ does not commute with either $\hat{R}_z$ or $\hat{A}$.
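The commutation pattern can be made concrete inside the $n = 2$ shell, where every operator that commutes with the Hamiltonian is represented by a 4 by 4 matrix. The sketch below rests on assumptions that are not in the text: the basis order (2s, 2p$_{-1}$, 2p$_0$, 2p$_{+1}$), the fact that within this shell $\hat{R}_z$ couples only 2s and 2p$_0$, and a coupling magnitude `KAPPA` of 1/2 whose sign depends on phase conventions. Only the fact that the coupling is non-zero matters for the conclusion.

```python
# Basis order (an assumption of this sketch): 2s, 2p_-1, 2p_0, 2p_+1.
KAPPA = 0.5  # assumed magnitude of the 2s <-> 2p_0 coupling by R_z

L2 = [[0, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0], [0, 0, 0, 2]]    # l(l+1)
LZ = [[0, 0, 0, 0], [0, -1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1]]   # m
RZ = [[0, 0, KAPPA, 0], [0, 0, 0, 0], [KAPPA, 0, 0, 0], [0, 0, 0, 0]]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x - y for x, y in zip(row_ab, row_ba)]
            for row_ab, row_ba in zip(ab, ba)]

def is_zero(m, tol=1e-12):
    return all(abs(x) <= tol for row in m for x in row)

def spheroidal_A(R):
    """A = -L^2 - R * R_z for a focal distance R, restricted to the n = 2 shell."""
    return [[-L2[i][j] - R * RZ[i][j] for j in range(4)] for i in range(4)]
```

In this representation `commutator(LZ, RZ)` and `commutator(LZ, spheroidal_A(R))` vanish, while `commutator(L2, RZ)` and `commutator(L2, spheroidal_A(R))` do not for any non-zero `R`.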
11.7.1. The Everyday Solutions
The solutions familiar to every physics and chemistry student are actually the ones arising from the separation in sphero-conal coordinates; the real atomic orbitals which are, for $n = 2$:
• The spherically-symmetrical "2s" and
• The set of three "$2p_x$, $2p_y$, $2p_z$ real" dumbbell-shaped atomic orbitals.
Spectroscopists more commonly invoke the four orbitals which arise from the spherical polar coordinate system:
• Again, the spherically-symmetrical "2s" and
• The set of three "$2p_{-1}$, $2p_0$, $2p_{+1}$ complex" atomic orbitals,
precisely because spectroscopists are interested in abstract objects with well-defined angular momentum properties. Of course, the two sets of three "2p" orbitals are related by linear combination rules. However, these two sets of three "2p" orbitals both have equal pedigree as solutions of the Schrodinger equation for the hydrogen atom even though, for example, the three real 2p atomic orbitals do not have constant values of total angular momentum squared (angular kinetic energy).
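The linear combination rules relating the two sets can be written out for the angular factors. The following sketch assumes the Condon-Shortley phase convention for the $\ell = 1$ spherical harmonics (other conventions change signs); under that assumption the real functions are $p_x = (Y_{-1} - Y_{+1})/\sqrt{2}$, $p_y = i\,(Y_{-1} + Y_{+1})/\sqrt{2}$ and $p_z = Y_0$. The function names are hypothetical.

```python
import cmath
import math

NORM = math.sqrt(3.0 / (8.0 * math.pi))

def y1(m, theta, phi):
    """l = 1 spherical harmonics, Condon-Shortley phases (assumed convention)."""
    if m == 0:
        return complex(math.sqrt(3.0 / (4.0 * math.pi)) * math.cos(theta))
    if m == 1:
        return -NORM * math.sin(theta) * cmath.exp(1j * phi)
    if m == -1:
        return NORM * math.sin(theta) * cmath.exp(-1j * phi)
    raise ValueError("m must be -1, 0 or 1")

def real_p(theta, phi):
    """Real combinations (p_x, p_y, p_z) built from the complex set."""
    px = (y1(-1, theta, phi) - y1(1, theta, phi)) / math.sqrt(2.0)
    py = 1j * (y1(-1, theta, phi) + y1(1, theta, phi)) / math.sqrt(2.0)
    pz = y1(0, theta, phi)
    return px, py, pz

def cartesian_p(theta, phi):
    """Expected real forms: sqrt(3 / 4 pi) times (x, y, z) / r."""
    c = math.sqrt(3.0 / (4.0 * math.pi))
    return (c * math.sin(theta) * math.cos(phi),
            c * math.sin(theta) * math.sin(phi),
            c * math.cos(theta))
```

At any angle the values returned by `real_p` should have negligible imaginary part and agree with `cartesian_p`, which is the sense in which the dumbbell-shaped functions are combinations of the complex ones.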
11.8. Conservation in Concrete and Abstract Systems
Let's take a particular one of the separations outlined above and try to deduce the implications it has for the interpretation of the solutions of the Schrodinger equation for the hydrogen atom; how do the implications of this solution bear on the probability interpretation of these solutions? The simplest case is the parabolic separation and the simplest interesting case within this scheme is the one for $n = 2$ since this case is the first one where there is any difference between the solutions in any of the four coordinate systems. We have already seen that the two sets of familiar solutions for $n = 2$ have different properties of total angular momentum squared, but the parabolic system of coordinates is much easier to visualize than the sphero-conals (and is a single coordinate system, not a two-parameter family), so we can look at these solutions because of their simplicity and their historical interest. If we take the Cartesian z-axis to be the axis of the parabolae, then the four solutions of the Schrodinger equation are:
• A pair of orbitals familiar from the spherical polar system, $2p_{-1}$ and $2p_{1}$, identical except for orientation and for which
$$\begin{aligned}
\hat{L}^2\, 2p_{-1} &= 1(1 + 1)\, 2p_{-1} & \hat{L}^2\, 2p_{1} &= 1(1 + 1)\, 2p_{1} \\
\hat{L}_z\, 2p_{-1} &= (-1)\, 2p_{-1} & \hat{L}_z\, 2p_{1} &= (+1)\, 2p_{1}
\end{aligned} \tag{11.8.9}$$
that is, when squared, these orbitals generate probability distributions for abstract objects which have constant z-component and square of angular momentum.
• Another pair of orbitals which are identical except for orientation in space which may be expressed as
$$\phi_\pm = \frac{1}{\sqrt{2}}\,(2s \pm 2p_0) = \frac{1}{\sqrt{2}}\,(2s \pm 2p_z) \tag{11.8.10}$$
and for which:
$$\begin{aligned}
\hat{L}^2 \phi_- &\neq \lambda\,\phi_- & \hat{L}^2 \phi_+ &\neq \lambda\,\phi_+ \\
\hat{L}_z \phi_- &= (0)\,\phi_- & \hat{L}_z \phi_+ &= (0)\,\phi_+ .
\end{aligned} \tag{11.8.11}$$
That is, some of the solutions of the Schrodinger equation (atomic orbitals) in parabolic coordinates are for abstract objects which do not have constant total angular momentum (squared). Now, here is a dilemma if these solutions are interpreted as referring to an individual (concrete) hydrogen atom. Let's state the obvious first:
• There are no preferred coordinate systems in physics; solutions of the Schrodinger equation are valid whatever means are used to obtain them.
• No torques act on the electron in the hydrogen atom; the only force acting is Coulomb's law between the electron and the proton.
If the solutions $2p_{-1}$, $2p_{1}$, $\phi_+$ and $\phi_-$ refer to individual hydrogen atoms, why do some of these hydrogen atoms not have constant total angular momentum (squared) while others do? It is a theorem in mechanics that the square of the total angular momentum of any individual system is conserved if no torques act on that system; why does this theorem apply to only two out of the four solutions?
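The statement that the mixed orbitals are not eigenfunctions of the squared angular momentum can be verified in two lines. As a worked sketch, take $\phi_\pm = (2s \pm 2p_0)/\sqrt{2}$ and use only the facts that 2s has $\ell = 0$, that $2p_0$ has $\ell = 1$, and that both have zero z-component:

```latex
\begin{align*}
\hat{L}^2\,\phi_\pm &= \frac{1}{\sqrt{2}}\left(\hat{L}^2\,2s \pm \hat{L}^2\,2p_0\right)
  = \frac{1}{\sqrt{2}}\left(0 \pm 1(1 + 1)\,2p_0\right) = \pm\sqrt{2}\;2p_0 , \\
\hat{L}_z\,\phi_\pm &= \frac{1}{\sqrt{2}}\left(0 \cdot 2s \pm 0 \cdot 2p_0\right) = 0 .
\end{align*}
```

Since $\pm\sqrt{2}\;2p_0$ contains no 2s component, it cannot be a constant multiple of $\phi_\pm$, whereas the z-component equation is satisfied with eigenvalue zero.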
Completely analogous remarks apply to all the separations of the hydrogen atom Schrodinger equation except the familiar spherical polar solutions. We are left with only three possibilities:
1. The "torque" theorem does not always apply in Schrodinger's mechanics.
2. There are preferred coordinate systems in Schrodinger's mechanics.
3. The probability interpretation of the squares of these functions does not apply to individual hydrogen atoms.
The first of these is too bizarre to contemplate. The second has some support in the sense that Killing's equations do provide justification for certain natural coordinate systems for spaces with a given metric. However, since we are always dealing with Euclidean (non-relativistic) space this would suggest a Cartesian system as the natural one; this is fatal for the separation of the hydrogen-atom Schrodinger equation. There is no support for the spherical polar system over any of the other three.
The solution is, of course, the probability interpretation: Any individual concrete object has perfectly definite values of all its properties; that is what identifies it as a concrete object. Any concrete Kepler system has definite values of energy and all of the physical quantities associated with the separation operators in both classical and quantum mechanics. The solutions of the Schrodinger equation refer to the relevant abstract object which only has definite properties of the energy and the two physical quantities associated with the chosen separation operators. The situation is almost exactly analogous to the interpretation of the solutions of the Hamilton-Jacobi equation which we discussed in Section 4.3.3 on page 81; the mere fact that the abstract object only has definite values of the energy and two other dynamical quantities does not preclude any concrete object which has definite values of those properties from having definite values of other dynamical variables.
In the H-J theory the choice of families of trajectories with constant values of angular momenta for the free particle did not, in any way, contradict the fact that all these trajectories had constant linear momentum components; it simply meant that all the constant linear momentum components within a family were not the same; each trajectory had common constant angular momenta but the constant value of the linear momentum components of any individual trajectory of the family was not the same as the other individual trajectories in the same family.
In Schrodinger's mechanics, the abstract object generated by a particular choice of separation coordinate system has definite values of energy and two other dynamical variables. Each concrete object described by this particular Schrodinger equation has definite values of these dynamical variables which it shares with the abstract object.$^4$ But each concrete object has definite values of the dynamical variables of the other separation operators. These definite values are not shared by the abstract object and are not, necessarily, shared by other such concrete objects, however.
• The particular choice of coordinate system generates a pair of separation operators which produce a set of solutions of the relevant Schrodinger equation.
• These solutions, in turn, define the properties of an abstract object which only has definite values of the physical quantities associated with the Hamiltonian and these separation operators.
• All other physical quantities (position, momenta, other choices of separation operator, etc.) of this abstract object do not have definite values but the solutions of the Schrodinger equation generate (probability) distributions of these quantities.
So, when measurements are performed of any particular dynamical quantity on randomly-occurring concrete hydrogen atoms, sets of definite numbers are, of course, obtained (subject to the usual experimental error bars).
• Measurements of some of the physical quantities yield numbers which are always the same, corresponding to the values of that physical quantity for the relevant abstract object.
• But the numerical values of some physical quantities differ from measurement to measurement and settle down (after a sufficiently large number of measurements) to the distributions which are calculated from the solutions of the Schrodinger equation.
Needless to say this is far from being the conventional wisdom about the interpretation of measurements of physical quantities in quantum theory and I shall return to this point in Part 6. For the moment I emphasise the current dilemma revealed by the above considerations; if the idea that the referent of Schrodinger's mechanics is the individual system is retained, then certain universal conservation laws must be abandoned.

[4] In any particular state, of course.
11.9. Conclusions from the Kepler Problem
This sketch of the solution of the Schrodinger equation for an important system has generated some ideas which are important and which, perhaps, could not have been anticipated merely by looking at the abstract skeleton of quantum theory.

Commuting Operators

It is not too tiresome to remind ourselves of some very basic points of time-independent quantum theory (or, at least, Schrodinger's mechanics):

• Only those systems exist in nature for which there are solutions of the Schrodinger equation.
• The Hamiltonian operator which occurs in the Schrodinger equation cannot be obtained by substituting gradient operators for the momentum components in the classical Hamiltonian function. This is proved in Appendix 11.A.
• Solution of the Schrodinger equation for a particular system by separation of variables generates operators which, typically, are components of the Hamiltonian operator (and which themselves cannot be obtained from their classical equivalent by substitution). The physical interpretation of these operators is, in the first instance, that they are separately conserved parts of the total energy.
• These separation operators all commute with the Hamiltonian operator but only commute amongst themselves within a particular coordinate system.
• A particular set of separation operators generates solutions of the Schrodinger equation which have the values of the eigenvalues of these operators in common.

What I wish to emphasise here is that it is the Hamiltonian operator (a by-product of the application of variational calculus to the Schrodinger Condition) which is the source of this whole system of operators. There is no question of, for example, the Hamiltonian ($\hat{H}$) and the separation operators being in any sense equivalent; the mere existence of an expression:

$$\hat{A}\psi_i = a_i\psi_i \tag{11.9.12}$$
for some $\hat{A}$ has no ontological consequences whatsoever if there is no Schrodinger equation of which $\psi_i$ is a solution. If $\hat{A}$ is not either a component of some $\hat{H}$ (in the sense above) or, more trivially, a simple function of some $\hat{H}$, then the expression above is not connected to the central physical law of Schrodinger's mechanics; it is just an identity which defines a certain $\psi_i$. In other words, $\hat{H}$ has a privileged position amongst all possible operators in quantum theory; it is this operator which, in the Schrodinger equation, "decides" whether or not a system obeys the basic physical law because it is derived from the Schrodinger Condition.

Therefore to attempt to abstract the structure of the time-independent Schrodinger equation (the eigenvalue equation) and "generalise" this structure to operators "representing" any physical quantity is extremely misleading and a spurious generalisation which suggests that one may obtain state functions by solving the above equation. But unless there is a route to this equation from the variational Schrodinger Condition, these solutions will be meaningless. Of course, in some cases plausible-looking equations of the above type may be set up using the substitution method and may well be the separation operators of some Hamiltonian in some coordinate system (e.g. the individual momentum operators in suitably symmetrical systems) but this is just a fortunate coincidence. In general, an operator formed by substitution of gradients for momentum components in the classical expression will not have any place in Schrodinger's mechanics.
Only if such an operator is linear in momentum components is there any possibility that this operator might, by coincidence, be formed from (roughly, the square root of) a separation operator and therefore have ontological relevance in Schrodinger's mechanics; but operators linear in the momentum operator are scarcely different from the momentum operators themselves and the solutions of any equation like (11.9.12) where $\hat{A}$ is a momentum operator are some equivalent of Newton's first law.

The considerations of Appendix 11.A mean that, notwithstanding considerable expert opinion to the contrary,

Even if it is the case that a Hermitian operator can be chosen which represents each dynamical variable in Schrodinger's mechanics (and this is an unproven assumption) it is certainly not the case that the form of such an operator can be obtained by the substitution

$$p_k \to -i\frac{\partial}{\partial q_k}$$
and I know of no general method for generating the operator representing an arbitrary (classical) dynamical variable. This conclusion, although of great theoretical importance, has no practical consequences because working physicists all know the "rules of thumb" for generating the relevant operators:

• Make the substitutions in the classical operator using Cartesian coordinates and transform the resulting expression to the required coordinates or, more simply,
• Use one of the known forms which works.

The general conclusions relating to the nature of operators representing dynamical variables from what seems initially to be a specialised application of Schrodinger's mechanics are quite far-reaching for the basic structure of quantum theory:

• There is no general "rule of substitution" for the formation of meaningful quantum-mechanical operators.
• The Hamiltonian operator and its various components obtained by separation represent the relevant dynamical variables only in the sense that they generate the correct allowed values (eigenvalues) and state functions for the system. They do not, for example, generate the correct dynamical variable distributions.
• Only those operators which are generated in some way from the Hamiltonian operator can represent dynamical variables which satisfy the fundamental dynamical law, i.e. can be relied on to represent systems which exist in nature.

Considerations like these are the ones which led me to abandon any "axiomatic" approach to the structure of Schrodinger's mechanics; the existence of any specific forms of any differential operators in quantum theory cannot meaningfully be postulated, they must be generated through some contact with the most fundamental law which ensures that the theory has contact with reality; this ensures that any equation involving these operators has a referent which exists in nature.
It is surprising that any equation which looks like (has the same form as) the Schrodinger equation can be thought to have the same status as the Schrodinger equation; no-one would consider giving an equation which looks like the Hamilton-Jacobi equation similar status.[5]

[5] Or, indeed, an equation which looks like F = ma.
If the Schrodinger Condition is made the central physical law of quantum theory then all of the familiar theory may be generated using logic, mathematics and a little intuition without the inclusion of spurious material.

Probability and the Referent of Schrodinger's Mechanics

The other main area of interest which springs from our brief outline of the Kepler problem is the support given to a central plank of the theory presented earlier; the referent of probability theory and hence of Schrodinger's mechanics:

We are presented with a stark choice involving the referent of Schrodinger's mechanics and the status of some conservation laws in the theory.

I have constantly stressed (even, some would say, harped on) the fact that the referent of probability theory is the abstract object, not individual concrete objects. All those cases where there are several possibilities for the choice of coordinate system in which to separate and solve the Schrodinger equation generate the same paradox as the one we have discussed in the case of the Kepler system in parabolic coordinates:

In 3-space, the Hamiltonian and two separation operators are sufficient to solve the Schrodinger equation and to generate solutions which are eigenfunctions of these three operators. In general other separation operators do not have simultaneous eigenfunctions with (do not commute with) the first set but they do commute with the Hamiltonian. All concrete systems have constant values of the dynamical variables associated with all of these operators.

One can, therefore, generate state functions which do not have constant values of dynamical variables which are required to be constant by more general theorems.[6] This is not a problem in the theory of probability presented here any more than it was a problem in the interpretation of the Hamilton-Jacobi equation where actual trajectories may be recovered, as we saw in Section 4.3.3 on page 81.
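The commutation pattern described above can be illustrated with a minimal sketch restricted to the n = 2 level of hydrogen. The matrices below are an illustration, not taken from the text: the basis ordering, the helper names, and in particular the coupling value `C` used for the z-component of the Runge-Lenz operator (the extra separation operator of the parabolic system) are assumptions; only the fact that this operator mixes 2s with 2p_0 matters for the result.

```python
# Operators restricted to the n = 2 level of hydrogen (atomic units),
# written as 4x4 matrices in the basis (2s, 2p_0, 2p_+1, 2p_-1).

def diag(values):
    """Build a diagonal matrix from a list of diagonal entries."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(a, b):
    """Return ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]

def commute(a, b, tol=1e-12):
    """True if every element of the commutator vanishes."""
    return all(abs(x) < tol for row in commutator(a, b) for x in row)

E2 = -1.0 / (2 * 2 ** 2)                 # E_n = -1/(2 n^2) with n = 2
H = diag([E2] * 4)                       # Hamiltonian within the level
L_SQUARED = diag([0.0, 2.0, 2.0, 2.0])   # l(l + 1): spherical separation
L_Z = diag([0.0, 0.0, 1.0, -1.0])        # shared by both separations

C = 0.5  # ASSUMED coupling strength; any non-zero value gives the same pattern
A_Z = [[0.0, C, 0.0, 0.0],               # mixes 2s and 2p_0: parabolic separation
       [C, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0]]

# Every separation operator commutes with the Hamiltonian ...
print(commute(H, L_SQUARED), commute(H, L_Z), commute(H, A_Z))  # True True True
# ... operators from the same coordinate system commute with each other ...
print(commute(L_Z, A_Z))                                        # True
# ... but operators from different coordinate systems need not.
print(commute(L_SQUARED, A_Z))                                  # False
```

Within a single degenerate level the Hamiltonian is a multiple of the unit matrix, which is why every operator confined to that level commutes with it while the two sets of separation operators remain mutually incompatible.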
However, if the choice is made to make the referent of probability theory (and therefore quantum theory) the individual concrete system then certain sacrifices have to be made, among which is the rejection of well-known conservation laws. It is instructive in this context to examine the solutions of the free-particle Schrodinger equation in, for example, spherical polar coordinates, and think about the conservation of linear momentum in the light of what was said in Section 4.3.3. The abstract object in this case is a particle with constant energy, total angular momentum (squared) and constant angular momentum in a plane but it does not have any definite linear momentum, only (probability) distributions of each component. Every concrete free particle has these angular properties and in addition constant linear momentum in three mutually perpendicular directions. The situation has compelling analogies with the families of trajectories generated by the separation and solution of the H-J equation.

[6] Emmy Noether's well-known work.

11.9.1. Concrete Objects and Symmetries
The possible energies and distributions of properties of the hydrogen atom are given by the solutions of the relevant Schrodinger equation. However, the hydrogen atom does not know that we are in the habit of solving this equation by the method of separation of variables and that this method generates characteristic product forms of the solutions which reflect certain ways of imposing constraints on those solutions. We can generate, either by trial or by some more systematic method,[7] all those dynamical quantities which are constants of the motion of the Kepler problem and it proves to be the case that, in the sense of the variational use of the Schrodinger Condition, these conserved quantities are represented by operators which commute with the Hamiltonian operator. Solving the Schrodinger equation by the separation method generates solutions which refer to an abstract hydrogen atom which has constant values of a particular subset of these conserved quantities; different choices of separation operators generate differently constrained abstract hydrogen atoms.

The most abstract hydrogen atom is represented by the solution of the Schrodinger equation with no further dynamical constraints imposed; it is, for example, a hydrogen atom with energy $E_n = -1/(2n^2)$ and a state function which is a linear combination of all possible solutions of the Schrodinger equation for the Kepler problem for that particular value of $n$. For the simplest case of $n = 2$, such a solution is

$$\psi = C_{2s}\psi_{2s} + C_{2p_x}\psi_{2p_x} + C_{2p_y}\psi_{2p_y} + C_{2p_z}\psi_{2p_z}$$

with arbitrary values of the coefficients $C_{2s}$, $C_{2p_x}$, etc. This solution may well not have constant values of any of the operators which commute with the Hamiltonian. Less abstract (more concrete) hydrogen atoms may be obtained by imposing more and more constraints on the most general solution until a solution is obtained which has the maximum number of such constraints imposed (two, in fact, by choice of one of the above four coordinate systems).

Any concrete hydrogen atom has, of course, constant values of all of the dynamical variables associated with all the separation operators for, if it did not, we would have to explain what was the source of the forces which caused these dynamical variables to change. But, by hypothesis, there are no such forces for the spherically-symmetric Kepler problem. The act of measurement simply reveals the definite value of a particular property of a concrete object, whether or not that value is shared by the particular abstract object which is the referent of the solution of the Schrodinger equation which we have to hand. There can, for example, be no question that the act of measurement changes the properties of the abstract object because, among other things, it cannot be acted on by such an act of measurement; it is not a material object; noting the result of a throw of a die has no effect on the probability distribution of the faces of the abstract cube.

[7] The quantum-mechanical equivalent of the use of Emmy Noether's method in classical mechanics; generating the invariants of the Lagrangian.
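As a minimal sketch of what "not having a constant value" means for the general n = 2 combination, the following computes the mean and spread of the energy and of the z-component of angular momentum for one arbitrary choice of coefficients. The coefficient values, the helper names and the explicit matrix for the angular momentum component in the real-orbital basis are illustrative assumptions, not taken from the text.

```python
# Basis order: (2s, 2p_x, 2p_y, 2p_z); atomic units throughout.
E2 = -1.0 / (2 * 2 ** 2)                       # E_n = -1/(2 n^2) with n = 2

# The Hamiltonian is a multiple of the unit matrix inside the n = 2 level.
H = [[E2 if i == j else 0.0 for j in range(4)] for i in range(4)]

# z-component of angular momentum in the real-orbital basis:
# L_z p_x = i p_y and L_z p_y = -i p_x, while 2s and 2p_z are annihilated.
L_Z = [[0, 0, 0, 0],
       [0, 0, -1j, 0],
       [0, 1j, 0, 0],
       [0, 0, 0, 0]]

def apply(op, c):
    """Matrix-vector product: the coefficients of (operator acting on state)."""
    return [sum(op[i][j] * c[j] for j in range(len(c))) for i in range(len(c))]

def inner(a, b):
    """Scalar product of two coefficient vectors."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def mean_and_variance(op, c):
    """Mean and variance of a Hermitian operator in the state with coefficients c."""
    norm = inner(c, c).real
    oc = apply(op, c)
    mean = (inner(c, oc) / norm).real
    mean_square = (inner(oc, oc) / norm).real   # <O^2> for Hermitian O
    return mean, mean_square - mean ** 2

coeffs = [0.5, 0.5, 0.5, 0.5]                   # hypothetical, already normalised

print(mean_and_variance(H, coeffs))             # zero variance: definite energy
print(mean_and_variance(L_Z, coeffs))           # non-zero variance: a distribution
```

For this choice the energy has zero variance while the angular momentum component has a variance of one half, so only a distribution of values can be attached to it.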
Appendix 11.A

Hamiltonians by Substitution?

I give here a proof that the Hamiltonian operator contained in Schrodinger's mechanics cannot be obtained from the classical Hamiltonian function by the substitution

$$p_k \to -i\frac{\partial}{\partial q_k}$$
but with little hope of being heeded.

The momentum operator in both the Hamilton-Jacobi theory and in Schrodinger's mechanics is $\partial/\partial q_k$; the difference between the two theories is not the form of the momentum operator but the interpretation of the functions on which it operates and the dynamical law in which it is involved.

There is a stubborn prejudice that the Hamiltonian operator of Schrodinger's theory can be generated by the substitution of suitably-defined momentum operators $\hat{p}_k$ for the classical momentum components in the expression for the Hamiltonian function of classical mechanics:

$$H = \frac{1}{2}\sum_{k,l=1}^{3N} p_k g^{kl} p_l + V(q) \tag{11.A.1}$$
where $V(q)$ may depend on any or all of the spatial coordinates. Now it is known from the variational derivation of the Schrodinger Equation from the Schrodinger Condition that the quantum mechanical Hamiltonian operator contains the Laplacian operator $\nabla^2$:

$$\hat{H} = -\frac{1}{2}\sum_{k,l=1}^{3N}\frac{1}{\sqrt{g}}\frac{\partial}{\partial q_k}\left(\sqrt{g}\,g^{kl}\frac{\partial}{\partial q_l}\right) + V(q) \tag{11.A.2}$$
where $g = \det|g_{kl}|$ and, simply because $g^{kl} = \delta_{kl}$ in Cartesians, this derivation may be replaced by the simple substitution of

$$p_\alpha \to -i\frac{\partial}{\partial \alpha} \qquad (\text{for } \alpha = x, y, z)$$
in this special case. However, the unsymmetrical way in which the partial derivatives appear in (11.A.2) does not hold out much hope of the substitution working in the general case; it clearly depends on some fortunate coincidences in the derivatives of the metric tensor and its determinant if it is to work in general. Moreover, it is clear by simple inspection that the simple form $\partial/\partial q_k$ can never generate the correct Hamiltonian; a different form of the momentum operator is needed at the very least if there is to be any hope of success. Changing the form of the operator which is central to the theory on grounds of mathematical convenience, i.e. without good scientific reason, is the worst kind of hostage to fortune, but let us continue. We must assume that the new operator is linear in $\partial/\partial q_k$ and the most general form is:

$$\hat{p}_k = -i a_k\frac{\partial}{\partial q_k} + c_k \tag{11.A.3}$$
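because of the original derivative form. If we drop the potential energy term $V(q)$ from both expressions, the classical Hamiltonian with the above form substituted for the $p_k$ becomes:

$$\hat{A} = \frac{1}{2}\sum_{k,l=1}^{3N}\left(-i a_k\frac{\partial}{\partial q_k} + c_k\right) g^{kl}\left(-i a_l\frac{\partial}{\partial q_l} + c_l\right).$$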
Now, if this expression is to be identically equal to the quantum mechanical Hamiltonian operator we may equate coefficients of $\partial/\partial q_l$, including the zero-order term, and hence obtain expressions for the $a_k$ and $c_k$. This leads to

$$a_k = 1, \qquad c_k = -\frac{i}{4g}\frac{\partial g}{\partial q_k} \qquad (k = 1, \ldots, 3N)$$

which fixes the arbitrary functions introduced in equation (11.A.3). But the required disappearance of the zero-order term (to ensure that there is no spurious "potential energy" term arising from the "kinetic energy" operator) generates another relationship which, in view of the fact that the $a_k$, $c_k$ are already fixed, must be an equation relating some of these fixed functions:

$$\sum_{k,l=1}^{3N} c_k g^{kl} c_l = 0$$

i.e.

$$\sum_{k,l=1}^{3N}\frac{\partial g}{\partial q_k}\,g^{kl}\,\frac{\partial g}{\partial q_l} = 0 \tag{11.A.4}$$
which must be satisfied by any coordinate system if the momentum substitution method is to work in general. It is trivial to verify that this equation is not satisfied by some of the most familiar orthogonal co-ordinate systems, let alone arbitrary co-ordinates; it is not true in spherical polars, for example.[8] That is, it is not possible to generate the Hamiltonian operator of Schrodinger's mechanics by substitution of an operator linear in $\partial/\partial q_k$ into the expression for the Hamiltonian function of classical mechanics. It is elementary to verify that the use of any re-arranged form of the product

$$\sum_{k,l=1}^{3N} p_k g^{kl} p_l$$

e.g.

$$\sum_{k,l=1}^{3N} p_k p_l g^{kl} \qquad \text{or} \qquad \sum_{k,l=1}^{3N} g^{kl} p_k p_l$$
(or linear combinations of these) does not overcome the difficulty, except possibly in specific cases; there is no general solution to the problem.

Parenthetically it is a relief to be spared the embarrassment of casting about for an interpretation of any new form of the momentum operator with no physical theory to help. I have placed the interpretation of Schrodinger's mechanics on the physical significance of the Schrodinger Condition and the densities which it contains and so it is obvious in my context why such a transformation cannot be found.

[8] Equation (11.A.4) is, of course, satisfied in Cartesian coordinates since all the $\partial g/\partial q_k$ are zero.

In classical mechanics, when the dynamical law is imposed (i.e. when the Hamilton-Jacobi equation is satisfied) the Hamiltonian function is constrained to have a numerical value equal to the energy of the system. In Schrodinger's mechanics the Hamiltonian density is generated by the substitution

$$p_k \to -i\frac{\partial}{\partial q_k}$$
in the classical expression, but the dynamical law is the Schrodinger Condition, which generates a Hamiltonian operator which is different from the function which generated the starting Hamiltonian density. This "Hamiltonian" operator of Schrodinger's mechanics does not generate the Hamiltonian density even when the dynamical law (Schrodinger's Condition or the Schrodinger equation) is satisfied. Only the mean value of this Hamiltonian operator has a physical interpretation and so the physical interpretation of the two "Hamiltonians" (one a function and the other an operator) and their status in their relevant theories is quite different. The crux of the matter is, of course, that the kinetic energy density is the square of the momentum density and not the density of the square of the momentum operators.

One final point on momentum operators; the derivative of a function on a closed interval is not defined at the end-points. Thus we must make some decision about the momentum density at the end-points of a closed interval. This mathematical necessity is, presumably, the equivalent of resolving the problem of "impacts" with a fixed object. One way to resolve this problem is to insist that the momentum operator be Hermitian at these points, which generates an additional contribution to the momentum density expressible as the addition of a $\delta$-function to the usual momentum operator.
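The failure of the simple substitution in spherical polars can be checked numerically in a minimal sketch. The example below is an illustration under stated assumptions rather than part of the proof: it takes the hydrogen 1s function exp(-r) in atomic units as a test function, uses only the radial coordinate (the angular derivatives of a purely radial function vanish), and approximates derivatives by central differences; the function names are hypothetical.

```python
import math

def d1(f, r, h=1e-4):
    """Central-difference first derivative."""
    return (f(r + h) - f(r - h)) / (2 * h)

def d2(f, r, h=1e-4):
    """Central-difference second derivative."""
    return (f(r + h) - 2 * f(r) + f(r - h)) / h ** 2

def kinetic_by_substitution(f, r):
    """(1/2) p_r g^{rr} p_r with p_r -> -i d/dr and g^{rr} = 1."""
    return -0.5 * d2(f, r)

def kinetic_laplacian(f, r):
    """-(1/2)(1/sqrt(g)) d/dr (sqrt(g) d/dr), with sqrt(g) proportional to r^2."""
    return -0.5 * (d2(f, r) + (2.0 / r) * d1(f, r))

def local_energy(kinetic, f, r):
    """(T f + V f) / f with the Kepler potential V = -1/r."""
    return kinetic(f, r) / f(r) - 1.0 / r

def psi_1s(r):
    """Unnormalised hydrogen ground-state function."""
    return math.exp(-r)

for r in (1.0, 2.0, 3.0):
    print(r,
          local_energy(kinetic_laplacian, psi_1s, r),        # constant, close to -0.5
          local_energy(kinetic_by_substitution, psi_1s, r))  # varies with r
```

With the Laplacian form the local energy is the same at every radius, as it must be for a solution of the Schrodinger equation; with the substituted form it is not, so exp(-r) would not even be recognised as a solution.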
Chapter 12
The Harmonic Oscillator and Fields
In some ways the quantum theory of simple harmonic motion (SHM) is a more basic application of Schrodinger's mechanics than the Kepler problem. It certainly provides, in addition to its obvious applications in the theory of mechanical vibrators, much of the theory and conceptual structure of various types of "field". This chapter provides the simple solutions of the Schrodinger equation for SHM and the development and interpretation of the more abstract applications of the associated formalism. We shall see that some interpretations of the solutions of the SHM problem are responsible for much obfuscation and discussion about non-existent particles.
Contents
12.1. The Schrodinger Equation for SHM
12.2. SHM Details
12.3. Factorisation Method
12.4. Interpreting the SHM Solutions
12.5. Vibrations of Fields and "Particles"
12.5.1. Phonons and Photons
12.6. Second Quantisation

12.1. The Schrodinger Equation for SHM
It is completely straightforward to write down the Schrodinger equation for a harmonic oscillator in ordinary 3D space which we can, of course, picture as a pair of masses connected by a Hooke's-law spring. The resulting motion can be separated into translation of the system as a whole, rotation of the whole system and, the part which we are interested in here, the internal motion of the system; oscillation of the two particles about their mutual equilibrium positions where the connecting spring is neither stretched nor compressed. Although interesting in their own right, the translational and rotational motions do not concern us here; what is of current interest is the one-dimensional vibrational motion.[1] I therefore concentrate attention on the rather artificial example of an isolated oscillator. Furthermore, one can use the idea of "reduced mass" to change the visual image of the oscillator from two particles (masses $m_1$ and $m_2$, say) connected by a spring to that of a single particle (mass $\mu = (m_1 m_2)/(m_1 + m_2)$) attached by a spring to an immovable point. This device facilitates the visualisation by concentrating attention on the motion of just one particle or on the length of the spring.

It is worth noting, before the solution is sketched, that we anticipate that the Schrodinger equation will generate a probability distribution for the extension of the spring; the position of the single particle with respect to the equilibrium position of the spring. That is, if we had no intuitions based on our experience with the classical mechanics of the harmonic oscillator we would not be able to detect the nature of the motion from the allowed energies and probability distributions obtained by solving the Schrodinger equation; we would not know, simply from the solutions of the Schrodinger equation, that the motion is indeed simple harmonic motion and that the motions associated with larger and larger allowed energies are, in fact, oscillations with a fixed frequency and (looking ahead) of larger and larger amplitude.
We might suspect, from the similarity of the large-scale form of the probability distribution for very large quantum numbers to the classical distribution given by equation (2.4.5 on page 42), that there was a close relationship between the classical harmonic oscillator and the quantum system but this hardly constitutes a definitive proof; the differences between the two distributions are as striking as the similarities, as we shall see. The very fact that the Hamiltonian operator does not depend on time and, therefore, the Schrodinger equation reduces to a time-independent equation which generates a time-independent probability distribution means that the motion must be at least cyclic. But this property is shared by any system for which the Hamiltonian does not depend on time, including, for example, the Kepler problem.

[1] In real vibrators, the rotational and vibrational motions interact; for example, "centrifugal forces" impose a stretch on the spring.
For the moment, I ignore these surprising difficulties and concentrate on the job in hand; the solution of the Schrodinger equation for a one-dimensional harmonic oscillator.

12.2. SHM Details
The one-dimensional Schrodinger equation for a harmonic oscillator of (reduced) mass $\mu$ and force constant $k$ is:

$$-\frac{1}{2\mu}\frac{d^2\psi_n(x)}{dx^2} + \frac{1}{2}kx^2\psi_n(x) = E_n\psi_n(x). \tag{12.2.1}$$

The existence of several solutions of this equation has been anticipated by using the subscript $n$. The manipulations involved in solving this equation can be simplified by absorbing some of the constants into a new variable $q$ defined by:

$$q = (\mu\omega)^{1/2}x$$

where

$$\omega = \left(\frac{k}{\mu}\right)^{1/2}$$

is the classical vibration frequency of the oscillator. Writing $\phi_n(q) = \psi_n(x)$ and noting that

$$\frac{d^2\psi_n(x)}{dx^2} = \mu\omega\,\frac{d^2\phi_n(q)}{dq^2}$$

gives

$$-\frac{d^2\phi_n(q)}{dq^2} + q^2\phi_n(q) = \epsilon_n\phi_n(q)$$

i.e.

$$\frac{d^2\phi_n(q)}{dq^2} + (\epsilon_n - q^2)\phi_n(q) = 0 \tag{12.2.2}$$

where $\epsilon_n$ is a scaled energy quantity:

$$\epsilon_n = \frac{2E_n}{\omega}.$$
This equation is simple enough to be solved by the well-tried elementary method of guessing a solution and correcting that guess by means of a power series. It is easy to see that this equation becomes

$$\frac{d^2\phi_n(q)}{dq^2} - q^2\phi_n(q) = 0$$

for very large $|q|$, i.e. when $q^2 \gg \epsilon$, and that the solution of this asymptotic equation is $\phi_n(q) = \exp(\pm q^2/2)$. We can anticipate which of these is appropriate on the basis of the physical interpretation of $|\phi_n|^2$; the positive sign corresponds to a function which increases without limit as $q$ increases and so we opt for the negative sign in the exponent.[2] It is sensible, therefore, to try a solution of the form

$$\phi_n(q) = H_n(q)\exp(-q^2/2) \tag{12.2.3}$$

and substitute in equation (12.2.2) to obtain the equation which $H_n(q)$ must satisfy,

$$\frac{d^2H_n(q)}{dq^2} - 2q\frac{dH_n(q)}{dq} + (\epsilon_n - 1)H_n(q) = 0$$

and use a power series expansion

$$H_n(q) = \sum_i a_i q^i$$

to solve this equation and hence obtain a recursion relationship for the coefficients $a_i$:

$$\frac{a_{i+2}}{a_i} = \frac{2i - (\epsilon - 1)}{(i+2)(i+1)} \tag{12.2.4}$$
[2] These considerations can be made rigorous by consideration of the boundary conditions generated from the Schrodinger equation.

Now, if the function (12.2.3) is to represent a probability distribution, its square must have a finite integral over all the one-dimensional available space $(-\infty, \infty)$ which means that the power series cannot continue indefinitely; it must terminate after a finite number of terms. That is, for some $n$ (say)

$$\frac{a_{n+2}}{a_n} = \frac{2n - (\epsilon - 1)}{(n+2)(n+1)} = 0 \tag{12.2.5}$$

which implies that, for some $n$, the numerator must vanish: $\epsilon = (2n + 1)$ or, in terms of the original notation in the Schrodinger equation:

$$E_n = \omega\left(n + \frac{1}{2}\right), \qquad n = 0, 1, 2, \ldots \tag{12.2.6}$$
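The termination argument can be sketched numerically. The following is a minimal illustration (the function names and the truncation length `n_terms` are assumptions made for the example): it generates the power-series coefficients from the recursion relationship and shows that the series stops only when the scaled energy equals 2n + 1.

```python
def series_coefficients(eps, a0, a1, n_terms=12):
    """Coefficients a_i of H(q) = sum_i a_i q^i generated by
    a_{i+2} = a_i * (2i - (eps - 1)) / ((i + 2)(i + 1))."""
    a = [0.0] * n_terms
    a[0], a[1] = a0, a1
    for i in range(n_terms - 2):
        a[i + 2] = a[i] * (2 * i - (eps - 1)) / ((i + 2) * (i + 1))
    return a

def highest_power(a):
    """Index of the last non-zero coefficient, or None if all vanish."""
    nonzero = [i for i, x in enumerate(a) if x != 0.0]
    return nonzero[-1] if nonzero else None

def energy(n, omega=1.0):
    """Allowed energy omega (n + 1/2); the scaled energy is 2 E / omega = 2n + 1."""
    return omega * (n + 0.5)

# eps = 5 = 2*2 + 1 with an even start: the series stops at q^2.
print(series_coefficients(5.0, 1.0, 0.0)[:5])
# eps = 7 = 2*3 + 1 with an odd start: the series stops at q^3.
print(highest_power(series_coefficients(7.0, 0.0, 1.0)))
# eps = 4 is not of the form 2n + 1: the series runs to the truncation limit.
print(highest_power(series_coefficients(4.0, 1.0, 0.0)))
```

The even and odd series are generated independently from the starting coefficients, so a terminating solution uses only one of the two.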
We may use the freedom within the solution of the second-order differential equation to fix $a_0$ and $a_1$ and hence generate the explicit forms of the functions $H_n(q)$ associated with each $E_n$ and so the solution is complete. Before giving any detailed discussion of the solutions, I stress one point:

The difference between any two adjacent allowed energy levels is constant, independent of the quantum number $n$:

$$\Delta E = E_n - E_{n-1} = \omega. \tag{12.2.7}$$

12.3. Factorisation Method
The reduced form of the differential equation for the harmonic oscillator, equation (12.2.2), may be re-written in terms of a momentum-like operator conjugate to $q$:

$$\hat{p} = -i\frac{d}{dq}$$

which gives equation (12.2.2) as

$$(\hat{p}^2 + \hat{q}^2)\phi_n(q) = \epsilon_n\phi_n(q)$$

taking the liberty of giving the co-ordinate $q$ a "hat" to put it on the same operator footing as $\hat{p}$. The operator expression on the left of this equation cannot be simply factorised as the quantities $\hat{q}$ and $\hat{p}$ do not commute;

$$(\hat{p}\hat{q} - \hat{q}\hat{p})f(q) = -if(q)$$

for functions $f(q)$ satisfying the relevant boundary conditions.
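The commutation relation, and the reason a simple factorisation fails, can be written out in a few lines. This is a short worked check using only the definition of $\hat{p}$ as $-i\,d/dq$; the last line shows that the product of the two candidate "factors" differs from $\hat{p}^2 + \hat{q}^2$ by a constant.

```latex
\begin{align*}
\hat{p}\hat{q}\,f &= -i\frac{d}{dq}\bigl(q f\bigr) = -i f - i q\frac{df}{dq}, \\
\hat{q}\hat{p}\,f &= -i q\frac{df}{dq}, \\
(\hat{p}\hat{q} - \hat{q}\hat{p})\,f &= -i f, \\
(\hat{q} - i\hat{p})(\hat{q} + i\hat{p}) &= \hat{q}^2 + \hat{p}^2 + i(\hat{q}\hat{p} - \hat{p}\hat{q})
  = \hat{p}^2 + \hat{q}^2 - 1 .
\end{align*}
```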
However, the "factors" of $(\hat{p}^2 + \hat{q}^2)$ do have an interesting effect on the solutions of the harmonic oscillator equation (12.2.2), in fact:

$$(\hat{q} - i\hat{p})\phi_n = \phi_{n+1} \qquad (\hat{q} + i\hat{p})\phi_n = \phi_{n-1}. \tag{12.3.9}$$
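The stepping property can be sketched with polynomial arithmetic. Writing a solution as a polynomial P(q) multiplied by exp(-q²/2), the operator (q - ip) acts as P → 2qP - P' and (q + ip) acts as P → P'; these two rules follow from p = -i d/dq. The code below is a minimal illustration (the list representation and function names are assumptions), and the stepped functions are reproduced only up to a constant normalising factor.

```python
# A polynomial is a list of coefficients: [c0, c1, c2, ...] means c0 + c1 q + c2 q^2 + ...

def p_add(a, b):
    """Sum of two polynomials."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]

def p_scale(c, a):
    """Multiply a polynomial by a constant."""
    return [c * x for x in a]

def p_times_q(a):
    """Multiply a polynomial by q."""
    return [0] + a

def p_deriv(a):
    """Derivative of a polynomial."""
    return [i * a[i] for i in range(1, len(a))] or [0]

def step_up(poly):
    """Effect of (q - ip) on P exp(-q^2/2): P -> 2 q P - P'."""
    return p_add(p_scale(2, p_times_q(poly)), p_scale(-1, p_deriv(poly)))

def step_down(poly):
    """Effect of (q + ip) on P exp(-q^2/2): P -> P'."""
    return p_deriv(poly)

def residual(poly, n):
    """Coefficients of P'' - 2 q P' + 2 n P, which vanish when eps = 2n + 1."""
    second = p_deriv(p_deriv(poly))
    first = p_scale(-2, p_times_q(p_deriv(poly)))
    return p_add(p_add(second, first), p_scale(2 * n, poly))

ground = [1]                          # n = 0: the polynomial factor is a constant
third = step_up(step_up(step_up(ground)))
print(third)                          # [0, -12, 0, 8], i.e. 8 q^3 - 12 q
print(residual(third, 3))             # all zeros: a solution with eps = 7
print(step_down(third))               # [-12, 0, 24], proportional to 4 q^2 - 2
print(step_down(ground))              # [0]: no solution below n = 0
```

Stepping down from the lowest solution gives zero, which is how the ladder of solutions acquires a bottom rung.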
These two operators have the property, when acting on a specific solution of equation (12.2.2), of "stepping" from that solution to a neighbouring one; so that, given any pair $(\epsilon_n, \phi_n)$, one may obtain all the others simply by applying either or both of these operators to that solution the requisite number of times:

$$(\hat{q} - i\hat{p})^m\phi_n = \phi_{n+m} \qquad (\hat{q} + i\hat{p})^\ell\phi_n = \phi_{n-\ell}.$$

This property of obtaining solutions of a Schrodinger equation by using the "factors" of the Hamiltonian operator is common to the Schrodinger equations for a number of physical systems, but nowhere is it as simple and so influential in the interpretation of quantum theory as in the case of the simple harmonic oscillator. In another context, as we shall shortly see, the factor operators are regarded as fundamental in the theory and one writes:

$$\hat{a}^\dagger = (\hat{q} - i\hat{p}) \qquad \hat{a} = (\hat{q} + i\hat{p}) \tag{12.3.10}$$

and

$$\hat{a}^\dagger\phi_n = \phi_{n+1} \qquad \hat{a}\phi_n = \phi_{n-1} \tag{12.3.11}$$

in place of equation (12.3.9). Because the allowed energies are equally spaced, the allowed energy of any computed function $\phi_{n+m}$ (say) is simply $(\epsilon_n + 2m) = (2n + 1 + 2m) = [2(n + m) + 1]$. With these simple facts about the solutions of the Schrodinger equation in mind we can look at the interpretation of the solutions which is the reason for looking at this particular system here.

12.4. Interpreting the SHM Solutions
The solutions of the SHM Schrodinger equation refer to the abstract object "particle undergoing simple harmonic motion"; the allowed energies $\epsilon_n$ are the only energies which such a system may have and the squares of the functions $\phi_n$ give the probability distributions for the position of the particle (extension of the spring) in the usual sense of Kolmogorov's theory of probability. That is, any set of suitably random measurements of the position of the particle (extension of the spring) should, if enough are taken, generate numerical quadrature approximations to ratios of the corresponding measures of the square of $\phi_n$.

Having in mind the system's classical analogue, we would expect that these distributions should be dependent on the system's energy so that the higher the energy the "wider" the distribution. As the system is given more and more energy, the distribution should have significant values at larger and larger spring extensions. We must remember that, classically, the system simply oscillates at the same frequency independent of energy and the amount of energy simply affects the magnitude of the oscillation.

If we plot these distributions this intuition is confirmed; for very large values of the quantum number $n$ the qualitative "shape" of the distribution takes on a form similar to the probability distribution of the classical harmonic oscillator (see Fig. 12.2) given by equation (2.4.5 on page 42). The envelope of the quantum distribution is similar to the classical curve. However:

• For low values of $n$ (see Fig. 12.1) the shape of the distribution is very different from the classical analogue.[3]
• For those very large values of $n$ for which the distributions of the classical and quantum cases are qualitatively similar, the quantum distribution is very oscillatory; only the envelope of the quantum distribution is similar to the classical case.

Each of the functions $H_n(q)$ has $n$ zeroes (with $n = 0, 1, 2, \ldots$ as in equation (12.2.6)) and therefore so does each of the associated distributions $|\phi_n|^2$.

Fig. 12.1. State functions for SHM: n = 1, 2, 3.

Fig. 12.2. Probability distribution for SHM: quantum (n = 100) and classical (dotted and shifted upwards slightly to make it visible).

[3] A point which will be taken up later in Section 16.4.
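The comparison between the quantum and classical distributions can be sketched numerically. The code below is an illustration under stated assumptions: it works in the scaled coordinate q, builds normalised state functions from the standard three-term recursion for Hermite functions, and takes the classical density to be the standard form 1/(π√(2n + 1 − q²)) between the turning points (assumed here to correspond to the classical distribution referred to in the text, which is not reproduced in this section). Function names and grid sizes are arbitrary choices.

```python
import math

def phi(n, q):
    """Normalised SHM state function phi_n(q), built up by a three-term recursion."""
    previous = 0.0
    current = math.pi ** -0.25 * math.exp(-0.5 * q * q)      # phi_0
    for k in range(n):
        previous, current = current, (math.sqrt(2.0 / (k + 1)) * q * current
                                      - math.sqrt(k / (k + 1)) * previous)
    return current

def quantum_probability(n, a, b, steps=4000):
    """Trapezoidal estimate of the integral of phi_n^2 over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (phi(n, a) ** 2 + phi(n, b) ** 2)
    total += sum(phi(n, a + i * h) ** 2 for i in range(1, steps))
    return total * h

def classical_probability(n, a, b):
    """Classical probability of finding the particle in [a, b] at scaled energy
    2n + 1; a and b must lie between the turning points +/- sqrt(2n + 1)."""
    amplitude = math.sqrt(2 * n + 1)
    return (math.asin(b / amplitude) - math.asin(a / amplitude)) / math.pi

def count_zeros(n, lo=-20.0, hi=20.0, steps=8000):
    """Number of sign changes of phi_n on a grid, i.e. the number of its zeroes."""
    values = (phi(n, lo + (hi - lo) * i / steps) for i in range(steps + 1))
    signs = [v > 0 for v in values if v != 0.0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

# Low n: the quantum and classical probabilities for the classically
# allowed region differ markedly.
print(quantum_probability(0, -1.0, 1.0), classical_probability(0, -1.0, 1.0))
# Large n: averaged over many oscillations the two agree closely.
print(quantum_probability(100, -5.0, 5.0), classical_probability(100, -5.0, 5.0))
# The state function of quantum number n changes sign n times.
print(count_zeros(5), count_zeros(100))
```

The agreement at large n holds only for probabilities taken over intervals wide enough to average over the oscillations; point by point the quantum distribution still swings between zero and roughly twice the classical value.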
12.5. Vibrations of Fields and "Particles"

Any material system in equilibrium experiences restoring forces if it is distorted from that position of equilibrium and, if the force fields in operation are sufficiently smooth, the change in energy due to this distortion can be expanded as a Taylor series:

$$E(x_i) = E_0 + \sum_{i=1}^{N}\left(\frac{\partial E}{\partial x_i}\right)_{x_i = x_{i0}}(x_i - x_{i0}) + \frac{1}{2}\sum_{i,j=1}^{N}\left(\frac{\partial^2 E}{\partial x_i\,\partial x_j}\right)_{x_i = x_{i0};\,x_j = x_{j0}}(x_i - x_{i0})(x_j - x_{j0}) + \cdots$$

where $E_0$ is the energy of the system at equilibrium and the $x_i$ are the coordinates describing the degrees of freedom available to the system. If the system is in equilibrium at $x_i = x_{i0}$ for all $i$ then all the first derivatives are zero and so, for small enough departures from equilibrium, the restoring forces arise from the quadratic and bilinear terms

$$\frac{1}{2}\sum_{i,j=1}^{N}\left(\frac{\partial^2 E}{\partial x_i\,\partial x_j}\right)_{x_i = x_{i0};\,x_j = x_{j0}}(x_i - x_{i0})(x_j - x_{j0})$$
which means that, for small enough departures from equilibrium, the motion will be that generated by Hooke's law; some form of harmonic vibration. It is possible to define a new set of co-ordinates (the so-called Normal coordinates $Q_i$) which will bring this bilinear form into an exact quadratic form:[4]

$$\frac{1}{2}\sum_{i=1}^{N}\left(\frac{\partial^2 E}{\partial Q_i^2}\right)_{Q_i = Q_{i0}}(Q_i - Q_{i0})^2$$
Thus, in the $Q_i$ system of coordinates, the overall N-dimensional vibrational problem splits⁵ neatly into N separate one-dimensional SHM problems with the force constants $k_i$ (the analogues of k in equation (12.2.1)) given by:

$$k_i = \left(\frac{\partial^2 E}{\partial Q_i^2}\right)_{Q_i = Q_{i0}}$$

Thus, our results for the one-dimensional oscillator are all that is required for a complete solution⁶ of the problem of the motions generated by small distortions of any mechanical system from a state of equilibrium:

• The allowed energies are simply sums of the allowed energies of the separate SHM oscillators.
• The solutions of the Schrodinger equation are simply the products of the solutions of the SHM problem: the $\phi_n$.

⁴ That is, the square matrix of second derivatives is symmetrical.
⁵ This separation is dependent on the harmonic force field; in general, when higher terms are retained in the Taylor expansion, no such separation is possible.
⁶ Once we have the transformation to Normal Coordinates, from the $x_i$ to the $Q_i$, which is not trivial, since there are often fewer $Q_i$ than N, but it is a purely technical matter.

There are a couple of interesting applications of this surprisingly simple result:

1. vibrations of solids (crystals),
2. vibrations of (electro-magnetic) fields,
since the fact that the allowed energies of a SHM oscillator are equally spaced has led to the invention of a series of imaginary "particles" and the establishment of a whole colloquial terminology which, if taken literally, can lead to a serious misunderstanding of the structure and dynamics of extended bodies both massive (solids) and massless (fields).

Let us look at the most controversial case: photons. An electromagnetic field is an extended massless substance; it is material in the philosophical sense of Section 1.2 on page 4. That is, it exists independently of our perceptions and thoughts. It has, in principle, an infinite number of degrees of freedom and it responds to having its equilibrium disturbed in the same way as other material systems do: by vibrating. The resulting vibrations are, of course, quantised and the system may take up (or release) energy in amounts which are $(n_i + 1/2)$ multiples of the fundamental vibration frequencies $\omega_i$ (say, one $\omega_i$ for each $Q_i$). That is, there is a tempting analogy between particles and these sets of identical units of energy of the electromagnetic field:

• The quantum numbers $n_i$ correspond to excitation from a base energy of $\omega_i/2$ by $n_i$ identical amounts of energy.
• These excitations are able to pass on their energy to other material systems (e.g. atoms and molecules) only in discrete amounts of the same size. Such exchanges have an obvious analogy with "collisions" between particles and the other systems.
• The analogy is, therefore, that a field which has vibration frequency $\omega_i$ excited to an amount $n_i$ (for each i) is "nothing but" sets of $n_i$ particles, each with energy proportional to $\omega_i$, and the residual energy $\omega_i/2$ is inaccessible energy of the field.

One can use this analogy to say, for example, that a given vibrating electromagnetic field is just a region of space containing $n_i$ particles of each energy $\omega_i$.
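The separation into normal coordinates, and the resulting sum of $(n_i + 1/2)$ energies, can be sketched for the smallest non-trivial case. This is an illustrative toy model only: two unit masses, hypothetical spring constants `k` and `kc`, and units in which ħ = 1. None of these values comes from the text.

```python
import numpy as np

# Hypothetical example: two unit masses, each tied to a wall by a spring k
# and to each other by a coupling spring kc (illustrative values only).
k, kc = 1.0, 1.0
hessian = np.array([[k + kc, -kc],
                    [-kc, k + kc]])   # symmetric matrix of second derivatives

# Diagonalising the symmetric matrix gives the force constants k_i and the
# normal coordinates Q_i (the columns of `modes`).
force_constants, modes = np.linalg.eigh(hessian)
omegas = np.sqrt(force_constants)     # unit masses, so omega_i = sqrt(k_i)

def total_energy(quantum_numbers, omegas, hbar=1.0):
    """Sum of independent SHM energies (n_i + 1/2) * hbar * omega_i."""
    n = np.asarray(quantum_numbers, dtype=float)
    return float(np.sum((n + 0.5) * hbar * omegas))

print(force_constants)                  # one force constant per normal mode
print(total_energy((0, 0), omegas))     # residual energy, sum of omega_i / 2
print(total_energy((2, 1), omegas))     # two quanta in mode 0, one in mode 1
```

In this sketch the pair `(2, 1)` is simply a set of quantum numbers labelling how vigorously each mode vibrates; the particle language of the analogy would read the same pair as "two particles of one energy and one of another".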
This simple idea and the mathematical techniques associated with it provide a very convenient technology for the manipulations involved in dealing with practical applications of the theory of the interaction of, for example, radiation and matter. But the field is not actually a swarm of particles; it is an extended massless substance in a complicated mode of quantised vibration, some
of whose motions may be conveniently described as if it were a swarm of particles. The putative particles have no mass,⁷ for example, and the system only behaves like a swarm of particles in certain circumstances, not in all. To make the point absolutely clear: the action of the "step up" operator $a^\dagger$ on one of the vibrational state functions $\phi_n$:

$$a^\dagger \phi_n \propto \phi_{n+1}$$

is to generate the next vibrational state function $\phi_{n+1}$ from $\phi_n$. There is no question of creating a particle here; indeed there is no physical interpretation at all relevant here, it is just a mathematical device. The difference between the state described by $\phi_n$ and $\phi_{n+1}$ is simply that the latter has more vibrational energy (is vibrating more vigorously) and certainly not that there is any difference of particle numbers in the two states since particles are not involved in the interpretation of the $\phi_i$.
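As a purely mathematical illustration, the step-up operator can be written as a matrix acting on vectors of expansion coefficients. The sketch below assumes the conventional normalisation factor $\sqrt{n+1}$ and a 0-based labelling of the states, neither of which is stated in the text; the basis is truncated to six states for display.

```python
import numpy as np

def step_up_matrix(size):
    """Matrix of the step-up operator in a basis truncated to `size` states."""
    a_dag = np.zeros((size, size))
    for n in range(size - 1):
        a_dag[n + 1, n] = np.sqrt(n + 1.0)   # conventional normalisation
    return a_dag

size = 6
a_dag = step_up_matrix(size)
a = a_dag.T                          # the corresponding step-down matrix

phi_2 = np.zeros(size)
phi_2[2] = 1.0                       # coefficient vector representing phi_2
raised = a_dag @ phi_2               # a multiple of phi_3 and nothing else

level_operator = a_dag @ a           # diagonal matrix labelling the levels
print(raised)
print(np.diag(level_operator))
```

The only thing the matrix does is move the single non-zero coefficient from one position to the next and rescale it; the diagonal of `level_operator` lists the level labels 0 to 5.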
There is no particular harm in using convenient fictions like this (the flat-earth model is such a local convenience) provided one remembers that that is exactly what they are: convenient fictions. Applied physics abounds with such constructs; the most familiar, perhaps, being the "effective mass" of electrons in solid-state physics.⁸ The harm arises when these working models are taken literally, accepted uncritically by those not familiar with the scientific background and then arbitrary philosophical constructs are spun out of this convenient technology; particle/field dualities and the like. In a revealing quote from their lucid work on quantum fields⁹ Bogoliubov and Shirkov say:

"In the theory of particles the momentum representation is used very frequently. First, it is more convenient to describe physical problems when particles are characterised by their energies and momentum (but not by space-time co-ordinates). Second..."

⁷ We can, tongue in cheek, explain why photons have no mass; the word "photon" is just a mnemonic for a set of quantum numbers and sets of quantum numbers do not have mass.
⁸ However, I have to say that the description of a particle's motion in a potential field as that of a free particle of variable mass is not too convenient for my imagination.
⁹ N. N. Bogoliubov and D. V. Shirkov, Quantum Fields (Benjamin/Cummings, 1983).
Just so; if one is interested in those properties of fields which seem particle-like, it is convenient to use a particle model for excitations of the fields and to put less emphasis on their space-time co-ordinate structure. It goes without saying, of course, that the fact that photons are not particles but modes of vibration of a material field has considerable consequences for the interpretation of experiments involving the "coupling" of photons.

In the case of solids the excitations of the vibrational modes of the nuclear framework are treated as particles in a completely analogous way ("phonons") although, in this case, there is no real pretence that these particles are anything other than a convenient fiction.

In this chapter and what follows I shall refer constantly to the point mooted in Section 10.3 on page 204 that many of the current philosophical confusions in the interpretation of quantum theory (those which are not due to misunderstandings of the nature of mathematical probability theory) are actually generated by over-zealous physical interpretation of simple mathematical techniques which are convenient and powerful technologies, rather than elements of the interpreted theory.
12.5.1. Phonons and Photons
It may seem strange that no-one who gives the matter any serious thought would defend the idea that phonons (quanta of vibration in a material field) are material particles, while photons, which have an analogous relationship to a material field, are overwhelmingly held to be material particles. The explanation of this strange dichotomy is, as usual, historical rather than scientific. Einstein developed the theory of the photo-electric effect in terms of light particles (photons) many years before the development of Schrodinger's mechanics. Indeed, this effect was one of the cluster of inexplicable phenomena which led to the invention of quantum theory. At the time, since there was no thought of quantised wave motion, the only possible explanation for the exchange of discrete amounts of energy between the electromagnetic field and matter was that the field contained, or was composed of, particles. Very quickly after Schrodinger's mechanics was available, a quantum theory of the electromagnetic field was developed and the relationship between the quantised vibrations of that field and the now familiar idea of these quanta being particles of light was established. But the name "photon", being more convenient than "quantised vibration of the electromagnetic field", was already well established and consequently stuck in the physicists' armoury of convenient shorthand.
In contrast, the theory of quantised vibrations in molecules and solids actually generated the word and concept of "phonon" as a more-or-less localised quantum of vibration and the more visible birth of this term enabled the relationship between the convenient fiction and the underlying theory to be seen.
12.6. Second Quantisation
The most spectacularly successful application of the use of the convenient fiction of creation and destruction of particles is in reversing the above argument and (implicitly) generating a fictitious field associated with known material particles. That is, one uses the argument that the electric field in some way generates the particles "photons", so, perhaps, all particles are generated by (i.e. are quantised vibrations in) some kind of field. In this approach, which is widely used in theories involving the energies and distributions of many-particle systems, the solution, $\psi$, of a Schrodinger equation is regarded as the field and the operators $a^\dagger$ and $a$ are introduced axiomatically so that they operate on this field to increase (or decrease) the number of (identical) particles described by a Schrodinger equation for the changed number of particles. In fact, the technique is at its most useful and powerful when the creation and destruction operators (as $a^\dagger$ and $a$ are called) generate a particle in a particular state described by a member of a set of orthonormal single-particle state functions ("orbitals"). The technique has the advantage that it is easy to program for automated use and much of the tedium of the routine tasks of, for example, many-electron theories of condensed matter is avoided.

But: the solution of a Schrodinger equation, although mathematically a field (a function of 3-space for a single particle), is not a material field. In the sense of Section 1.3 on page 8 it is real but not material, since it exists only in minds.¹⁰ It therefore cannot have real, material, motion and, a fortiori, cannot have its motions quantised. A solution of the Schrodinger equation, like a probability distribution, does not exist in the world "out there"; it is a conceptual structure, existing only in minds.

¹⁰ And their ancillaries like paper and computer storage.

If we could compute the probability distribution for road traffic accidents along the length of a motorway, there would be no evidence
of this function along the motorway; it exists only in our minds or in our computers. The fact that this distribution refers to the real world is borne out by compiling accident statistics and comparing them with the probability distribution. It is the road traffic accidents that are real (existing in the external world); there is no material field along the motorway of which these accidents are the quantised motions. Do we really think that, associated with every type of (fundamental?) particle existing in the universe, there is a real massless extended substance filling the universe?

I shall have occasion to refer to these techniques again in Chapter 13 where the combination of this method and another technique for the approximate solution of the Schrodinger equation leads to difficulties of interpretation which are entirely avoidable. However, it is useful to stress again the dangers of an uncritical acceptance of pronouncements made from a knowledge of and familiarity with what are, philosophically speaking, mere manipulative devices. The danger involved in simply taking over what practising physicists say about their colloquial interpretation of the familiar techniques used amongst themselves is that the philosophers never get to grips with the underlying problems of interpretation.

In ordinary English (at least in England) there is a word for each common domestic mammal used as food and a word for the meat of that mammal. So, the animals are pigs, cattle, sheep and deer, for example, but the meat is pork, beef, mutton and venison respectively. It is not surprising to learn that the words for the animals are of Saxon origin while the words for the meat are part of the Norman heritage. Obviously, the Norman lords saw the meat on the table and did not have to worry too much about where it came from; the raising and butchering of the actual animals was the province of the Saxon serfs.
Now, while not wishing to impute serfdom to working physicists or, indeed, nobility to philosophers, there is an analogy here; if philosophers wish to interpret quantum theory, they have to be sure, as it were, that they are dealing with the real animal and not just an attractive and convenient part of it.
Chapter 13
Perturbation Theory and Epicycles
Perturbation theory, as an approximation method for tackling intractable Schrodinger equations, might seem out of place in a work concentrating on the interpretation of Schrodinger's mechanics. But, in fact, this approximation method is the principal (perhaps the only) tool of modern field theories and, as such, has proved to be the source of many of the concepts used by modern theorists. These concepts have tended to be taken over uncritically by philosophers and detached from their original source. It is worth looking at this method and, perhaps, comparing it and its conceptual structure with some earlier approximation techniques.
Contents
13.1. Perturbation Theories in General
13.2. Perturbed Schrodinger Equations
13.3. Polarisation of Electron Distribution
13.4. Interpretation of Perturbation Theory
13.5. Quantum Theory and Epicycles
13.6. Approximations to Non-existent Functions
13.7. Summary for Perturbation Theory

13.1. Perturbation Theories in General
Quantum theory shares with its ancestor, classical particle mechanics, the difficulties of solving the dynamical equations for any but the simplest mechanical systems. In both cases the list of exactly soluble problems is very short:

• The free particle.
• The harmonic oscillator.
• The rigid rotor.
• The Kepler problem.
• The (fixed nucleus) one-electron diatomic ($\mathrm{H}_2^+$).
These are the basic model systems from which concepts are formed which are used in intuitive descriptions of the complex behaviour of (for example) systems of many interacting particles with internal structure. These models are also the basis of the technical and numerical applications of theoretical systems.

The most straightforward of the numerical techniques used to attack problems for which neither classical nor quantum mechanics can give exact solutions has its most transparent use when the system under investigation is, in some sense, "close to" an exactly soluble system. The most familiar examples are from celestial mechanics; the motion of the earth around the sun or the moon around the earth in the presence of the other planets is "close to" the respective Kepler problem in which the influence of the other bodies in the solar system is neglected. Each of these motions can be meaningfully described as a Kepler problem perturbed by the gravitational fields of the more remote or less massive bodies; the path of each is close to being a conic section (indeed, close to being a circle).

These simple considerations are the basis of a whole class of perturbation methods in applied (classical or quantum) mechanics in which one starts from an assumed known solution of the dynamical equation and generates a "corrected" or "perturbed" energy and distribution in a systematic way: by increasing the length of a so-called perturbation expansion of the corrected motion, the exact solution may be approached with improved (in principle, sometimes arbitrary) accuracy. The problem of the solution of the insoluble Schrodinger equation (for example) then becomes one of technology rather than principle.
13.2. Perturbed Schrodinger Equations
In Schrodinger's mechanics, the perturbation method is particularly easy to implement in its simplest form: • One writes the Schrodinger equation for the unperturbed system in "operator form":
$$H_0 \psi^{(0)} = E^{(0)} \psi^{(0)} \tag{13.2.1}$$

where the superscript (0) indicates that this is the exactly-soluble unperturbed case.

• The perturbed equation, for which the (approximate) solutions are required, is:

$$H\psi = E\psi \tag{13.2.2}$$

where, in a sense to be defined, the pair $(E, \psi)$ is close to the pair $(E^{(0)}, \psi^{(0)})$.

• The assumption is made that the Hamiltonian $H$ is sufficiently similar to the Hamiltonian $H_0$ that it may be written:

$$H = H_0 + \lambda V. \tag{13.2.3}$$

The "perturbation parameter" $\lambda$ may be the size of some physical effect generating the perturbation (a field strength, for example) or, more usually, it is simply a mathematical device to keep track of the terms in the expansion below. Typically, one works systematically to a given power of $\lambda$ in the expansions below; the implication being that, the higher the power of $\lambda$, the greater the accuracy of the result. The essence of the perturbation (the details of the difference between $H_0$ and $H$) is contained in the operator $V$.

• The related assumptions are made that this form of $H$ implies that the solutions of equation (13.2.2) may be expanded as a power series in the perturbation parameter $\lambda$:

$$\begin{aligned} \psi &= \psi^{(0)} + \lambda\psi^{(1)} + \tfrac{1}{2}\lambda^2\psi^{(2)} + \cdots \\ E &= E^{(0)} + \lambda E^{(1)} + \tfrac{1}{2}\lambda^2 E^{(2)} + \cdots \end{aligned} \tag{13.2.4}$$

where the numerical accuracy of the approximation may be increased by taking more and more terms in the expansion (13.2.4).

The details of the derivation of the equations to be solved are obtained simply by substituting (13.2.3) and (13.2.4) into equation (13.2.2) and generating an equation for each power of $\lambda$. So far so good; there are no particular problems of interpretation here; the perturbation method is simply the mathematical expression of an idea which appeals to physical intuition. However, without any further approximations these substitutions have, apparently, made things much worse,
having simply replaced one insoluble equation, (13.2.2), with a whole series of insoluble equations; one for each power $\lambda^n$ to obtain $E^{(n)}$ and $\psi^{(n)}$. Each of these new equations is a differential equation and, usually, just as hard to solve as the original equation (13.2.2). The trick here, as with any attack on any complicated partial differential equation, is to express the unknown function ($\psi^{(n)}$) as a linear expansion of known functions; this replaces the solution of a differential equation by the solution of a set of ordinary algebraic equations for which standard methods of solution are well-known and reliable. There are two sides to this standard procedure:

1. Fortunately (as it were) we have in hand a complete set of functions which are capable of expanding any function of the coordinates involved; the functions $\psi^{(n)}$ of equation (13.2.4) for any n may be expanded as a linear combination of the (assumed known) complete set of solutions of the unperturbed Schrodinger equation (13.2.1) since this equation has an infinity of solutions in addition to the one which is close to the required solution:

$$H_0 \psi_i^{(0)} = E_i^{(0)} \psi_i^{(0)} \tag{13.2.5}$$

where we may take our original $\psi^{(0)}$ to be the first of these solutions: $\psi_0^{(0)}$. This substitution reduces the problem from finding an unknown function to that of finding a set of expansion coefficients ($C_i$, say) for which formulae are well-known, i.e.

$$\psi^{(1)} = \sum_i C_i \psi_i^{(0)} \tag{13.2.6}$$

2. Unfortunately, it is precisely this convenient fact (the availability of an "obvious" complete set of functions) which leads to the widespread misinterpretation of the perturbation expansion as a representation of processes rather than a mathematical convenience. The essence of this particular complete set of functions here is precisely that they are a complete set of functions for the expansion of the unknown functions $\psi^{(n)}$; in this context the fact that they are the state functions for the unperturbed system is just a coincidence; there are other complete sets of functions which could be used which have no relationship to the states of the unperturbed system.

Looking at a very simple case will illustrate the main features of the problem.
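For reference, the "equation for each power" of the perturbation parameter can be written out to first order. This is the standard textbook step rather than anything specific to this work, and it is given under stated assumptions: the expansions are taken to begin $\psi = \psi^{(0)} + \lambda\psi^{(1)} + \cdots$ and $E = E^{(0)} + \lambda E^{(1)} + \cdots$, the unperturbed functions $\psi_i^{(0)}$ are taken to be real, orthonormal and non-degenerate, and the state of interest is labelled $i = 0$.

```latex
\begin{align*}
(H_0 + \lambda V)\bigl(\psi^{(0)} + \lambda\psi^{(1)} + \cdots\bigr)
  &= \bigl(E^{(0)} + \lambda E^{(1)} + \cdots\bigr)\bigl(\psi^{(0)} + \lambda\psi^{(1)} + \cdots\bigr) \\
\lambda^0:\quad H_0\psi^{(0)} &= E^{(0)}\psi^{(0)} \\
\lambda^1:\quad \bigl(H_0 - E^{(0)}\bigr)\psi^{(1)} &= \bigl(E^{(1)} - V\bigr)\psi^{(0)} \\
\text{with } \psi^{(1)} = \sum_i C_i\psi_i^{(0)}:\quad
  \sum_i C_i\bigl(E_i^{(0)} - E^{(0)}\bigr)\psi_i^{(0)} &= \bigl(E^{(1)} - V\bigr)\psi^{(0)} \\
\text{project on } \psi^{(0)}:\quad E^{(1)} &= \int dv\, \psi^{(0)} V \psi^{(0)} \\
\text{project on } \psi_k^{(0)},\ k \neq 0:\quad
  C_k &= \frac{\int dv\, \psi_k^{(0)} V \psi^{(0)}}{E^{(0)} - E_k^{(0)}}
\end{align*}
```

The differential equation at order $\lambda^1$ has become a set of algebraic expressions for the numbers $C_k$, which is all that the expansion in the $\psi_i^{(0)}$ was introduced to achieve.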
13.3. Polarisation of Electron Distribution
The Schrodinger equation for the hydrogen atom is exactly soluble and the probability distribution for the normal ground state is known to be the square of a simple spherically-symmetrical exponential function of distance from the nucleus. We can use this as the unperturbed state and ask what is the change in energy and electron distribution in the presence of an electric field? The field can be a constant one or that due to a nearby point charge. The qualitative answer is obvious: the electron probability distribution will be distorted by the presence of the electric field away from the spherically-symmetric distribution of the isolated hydrogen atom. But how is this distortion to be described quantitatively by perturbation theory? We must look at equation (13.2.6) and use the other solutions of the hydrogen-atom Schrodinger equation in order to obtain, for example, the first term in the series, $\psi^{(1)}$.

But before looking at any quantitative details the kind of solution which occurs is rather obvious. In the presence of an electric field which destroys the spherical symmetry of the isolated hydrogen atom, the simplest thing that must be taken into account is the addition of non-spherically symmetric contributions to the ground-state function $\psi^{(0)}$. If these changes are to be formed from the other solutions of the hydrogen-atom Schrodinger equation then we must add functions like the dumb-bell shaped "p" functions and possibly others which are of more complex symmetries. This intuition proves to be correct and the largest contribution to $\psi^{(1)}$ is from the so-called 2p function which is an exponential function multiplied by a trigonometric function ($\psi_{2p}^{(0)}$, say) so that the first approximation to the perturbed function $\psi$ becomes in this case:

$$\psi \approx \psi^{(0)} + \lambda C_{2p}\psi_{2p}^{(0)} + \cdots$$

where the expansion coefficient is given by:

$$C_{2p} = \frac{\int dv\, \psi_{2p}^{(0)} V \psi^{(0)}}{E^{(0)} - E_{2p}^{(0)}} \tag{13.3.7}$$
What is this result telling us? Nothing more than, if one wants to describe the distortion of the electron distribution of a hydrogen atom in its (spherically-symmetric)
ground state due to an electric field by the addition of other functions, one must use contributions from non spherically-symmetric functions. This qualitative result is independent of the source of the other functions; they could be, for example, the solutions of the Schrodinger equation for any system in three dimensions and with the same boundary conditions.¹ What the result above is certainly not telling us is anything about any processes occurring in the polarised atom:

• The unperturbed atom's Hamiltonian is independent of time.
• The perturbation is independent of time.
• Therefore both systems have electron distributions which are independent of time; there are no processes occurring in either system.

In particular the (approximate) function given by equation (13.3.7) does not imply that, in the polarised atom, transitions between the ground state ($\psi_0^{(0)}$) and the excited state ($\psi_{2p}^{(0)}$) of the unperturbed system are occurring. The use of the excited-state functions of the unperturbed system is simply a computationally convenient way of describing the differences between the (static) electron distributions of the two systems. If one were to carry out a more accurate calculation of $\psi^{(1)}$ (and all the other $\psi^{(n)}$), dozens or even hundreds of the $\psi_i^{(0)}$ would be invoked in the expansion. The same thing would happen if one were to use some other set of functions with which to expand the perturbation corrections $\psi^{(n)}$ and one could scarcely describe these contributions in terms of excitations to (non-existent) states of the perturbed system. In short, the perturbation method is a computational technique which does not have and, indeed in general does not require, any physical interpretation.
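The purely computational character of the method can be seen in a small matrix model. The sketch below is hypothetical throughout: a four-state "unperturbed system" with made-up energies and a made-up symmetric perturbation, and the common convention $E = E^{(0)} + \lambda E^{(1)} + \lambda^2 E^{(2)} + \cdots$ (an assumption; the scaling of the higher terms in the text's own expansion may differ). It compares the low-order perturbation results with a direct diagonalisation that makes no reference to the unperturbed states at all.

```python
import numpy as np

# Hypothetical four-state model: H0 is diagonal (exactly soluble) and V is a
# small symmetric, time-independent perturbation. Illustrative numbers only.
e0 = np.array([0.0, 1.0, 2.0, 3.0])
h0 = np.diag(e0)
v = np.array([[0.2, 1.0, 0.5, 0.0],
              [1.0, 0.1, 0.3, 0.2],
              [0.5, 0.3, -0.1, 0.4],
              [0.0, 0.2, 0.4, 0.3]])
lam = 0.05

# First-order expansion coefficients C_i = V_i0 / (E_0 - E_i) for i != 0.
coeffs = np.zeros(4)
coeffs[1:] = v[1:, 0] / (e0[0] - e0[1:])

energy_1 = e0[0] + lam * v[0, 0]
energy_2 = energy_1 + lam ** 2 * np.sum(v[1:, 0] ** 2 / (e0[0] - e0[1:]))

psi_1 = np.zeros(4)
psi_1[0] = 1.0
psi_1 += lam * coeffs
psi_1 /= np.linalg.norm(psi_1)

# Direct solution of the perturbed problem, with no mention of "excitations".
exact_e, exact_vecs = np.linalg.eigh(h0 + lam * v)
exact_psi = exact_vecs[:, 0] * np.sign(exact_vecs[0, 0])

print(energy_1 - exact_e[0], energy_2 - exact_e[0])
print(np.abs(psi_1 - exact_psi).max())
```

Both routes arrive at essentially the same static ground state; the coefficients in `coeffs` are bookkeeping for the difference between two time-independent vectors, not a record of anything happening in the model.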
¹ For example, a particle attached to a nucleus by a Hooke's-law spring and free to rotate.

13.4. Interpretation of Perturbation Theory

In its original formulations, perturbation theory was designed to be used in situations like the one outlined in the last section; the approximate solution of a Schrodinger equation for which a "nearby" Schrodinger equation is exactly soluble. However, the derivation of equations (13.2.4) is
completely general and nowhere makes explicit use of the "size" of the perturbation. The assumption is simply that, if the perturbation is sufficiently small, the length of the expansions in (13.2.4) will be manageably small for a quantitatively acceptable result. In this context it is clear why it seems sensible to use the excited state functions of the unperturbed system to correct the ground-state function.² Presumably, if the perturbation is small, the correction will be small and the solutions of the unperturbed Schrodinger equation are the only states available to that system, so they ought to "fill out" the space of solutions of the perturbed system well.³

Now, suppose we are faced with a situation where the problem to hand is one in which we do not have any convenient, nearby, soluble Schrodinger equation. The only soluble systems are ones which are not at all close to the real system but are similar in some respects, like having the same number of particles or, in general if we are dealing with fields, have the same number of degrees of freedom. The classic cases are ones which involve a system of particles or fields (or both) in strong interaction for which we only have the solutions of the Schrodinger equation for the case when they do not interact.⁴ The only way forward is to use the perturbation method and simply put up with the fact that the expansions will, necessarily, be much longer because the corrections to the original $\psi_0^{(0)}$ will be greater. When this happens, specialised techniques must be developed for the manipulation and summation of the cumbersome series involved and the most famous of these technologies is, perhaps, the intuitively attractive diagrammatic method due to the doyen of quantum theorists, Richard Feynman. These techniques enable the summation to infinity of certain classes of perturbation expansion.

² In fact one can apply the perturbation method to any state of a system; I am using the ground state for convenience, the arguments are identical.
³ As it turns out, this assumption is not well-founded even in the simplest cases since, for example, the excited state functions of the H atom quickly become too diffuse to correct the ground-state function adequately and one must use the continuum functions which describe an ionised H atom.
⁴ Or they interact through some average interaction (the Hartree-Fock case).

However impressive these technologies are, they are still just that: numerical technologies which have no consequences for the physical interpretation of the underlying behaviour of the material world. The expansions are not unique since they assume an initial (formally, if not practically) arbitrary unperturbed starting point and use the state functions of this starting point with which to expand the approximate
solutions. As a matter of fact, there is no theoretical reason why one should use the expansion method at all. In principle, the equations of which the solutions are (13.2.4) could be solved by any of the methods available for the approximate solution of differential equations, including variational techniques; one could approach the solution of the equation for $E^{(1)}$ and $\psi^{(1)}$ simply as a problem, independently of the solutions of equation (13.2.1). In this case there would be no question of interpreting the solutions in terms of processes involving the unperturbed system. The individual terms in the conventional expansions do not have a physical interpretation since they are, from the ontological point of view, arbitrary. If there are terms in the expansion which, if seen as state functions, would violate the laws of Schrodinger's mechanics, then this does not need explanation.

The enormous lengths of the perturbation series for $\psi^{(n)}$ lead, naturally, to more and more involved expressions for the expansion coefficients; (13.3.7) is just the simplest of these expressions. This implies more and more complicated "processes" in the perturbed system with more and more fantastic physical interpretations. The "swarms of virtual particles" filling the vacuum, beloved of popular science writers, belong to this style of interpretation of the perturbation expansion.

Any function may be expressed in an infinite number of ways as (linear) expansions in terms of other functions. In particular the state functions (eigenfunctions) obtained by solving Schrodinger equations provide useful expansion functions. These expansions do not have any consequences for the physical interpretation of the target function and, in particular, the terms in such expansions have no meaningful physical interpretation. The larger the perturbation, the longer the expansions and, therefore, the more the physical interpretation of the terms in such expansions becomes patently absurd.
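The non-uniqueness of such expansions is easy to exhibit in a finite-dimensional setting. In this minimal sketch a fixed vector stands in for the target function, and two unrelated orthonormal bases stand in for two choices of expansion functions; the vector, the matrix `other_system` and all names are hypothetical.

```python
import numpy as np

# A fixed "target function", represented by its values in a 4-dimensional space.
target = np.array([0.99, -0.05, -0.012, 0.001])

# Basis A: the unit vectors (playing the role of the unperturbed states).
basis_a = np.eye(4)

# Basis B: eigenvectors of an unrelated symmetric matrix (some other system).
other_system = np.array([[2.0, 1.0, 0.0, 0.0],
                         [1.0, 2.0, 1.0, 0.0],
                         [0.0, 1.0, 2.0, 1.0],
                         [0.0, 0.0, 1.0, 2.0]])
_, basis_b = np.linalg.eigh(other_system)

coeffs_a = basis_a.T @ target        # expansion coefficients in basis A
coeffs_b = basis_b.T @ target        # quite different coefficients in basis B

rebuilt_a = basis_a @ coeffs_a
rebuilt_b = basis_b @ coeffs_b

print(coeffs_a)
print(coeffs_b)
print(np.abs(rebuilt_a - rebuilt_b).max())
```

The two coefficient lists have nothing in common, yet they rebuild the same vector; any story told about the individual terms would depend entirely on which basis happened to be chosen.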
13.5. Quantum Theory and Epicycles
Here is a quote from the writings of a prominent churchman concerning the amazing accuracy of the work of the scientists of his day:

"They predict many years ahead eclipses of the sun and moon; they specify the day, the hour and the extent; and their reckoning is correct (the events follow their predictions); they have discovered and recorded rules, by which it can be foretold in what year, in what month, at what hour of the day, in what part of their light the sun and moon are to be eclipsed; and what is foretold occurs."

This is not from the seventeenth or eighteenth centuries about the use of Newtonian mechanics, but from Saint Augustine (Confessions Book V, Chapter 3) about his contemporary astronomers of the fourth century A.D., more than 1200 years before celestial mechanics existed.

There is a very obvious precedent for the interpretation of the terms in a perturbation theory expansion and it has the same very understandable (if not excusable) explanation. The Greek astronomer Ptolemy had to hand the explanation of the unperturbed motion of the planets around the sun due to Aristotle; motion in a circle is perfect heavenly motion and requires no further explanation. But the motion of (particularly) Mars did not fit this excellent theory. However, simply by expanding the trajectory of Mars as a series of perfect (unperturbed) motions, he and his followers were able to obtain essentially perfect agreement with the experimental observations. Copernicus' revolution was to show that this complicated method could be replaced by a single perfect motion centred close to, but not coinciding with, the Sun. Kepler realised that, by abandoning the idea that circular motion had to be the starting point, the motion could be explained by a single ellipse with the Sun at one focus.

It is not at all clear what the early astronomers thought of their system of cycles and epicycles and the reality of these motions but what is certainly the case is that their numerical agreement with experiment was better than that of Copernicus' theory. Perhaps Copernicus' revolution consisted not so much in putting the Sun at the centre of the solar system (which he did not do) but in deciding that the complex Ptolemaic system was preposterous.
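The numerical success of an expansion in circular motions is easily reproduced. The sketch below is a modern stand-in, not a reconstruction of Ptolemaic practice: it samples a Kepler ellipse at equal time intervals (eccentricity 0.3 and unit semi-major axis are arbitrary choices) and approximates it by a truncated Fourier series, each term of which is a uniform circular motion.

```python
import numpy as np

def kepler_orbit(eccentricity, samples):
    """Positions (as complex numbers, Sun at the origin) at equal time steps."""
    mean_anomaly = 2.0 * np.pi * np.arange(samples) / samples
    ecc_anomaly = mean_anomaly.copy()
    for _ in range(50):                        # Newton steps for Kepler's equation
        f = ecc_anomaly - eccentricity * np.sin(ecc_anomaly) - mean_anomaly
        ecc_anomaly -= f / (1.0 - eccentricity * np.cos(ecc_anomaly))
    x = np.cos(ecc_anomaly) - eccentricity     # semi-major axis = 1
    y = np.sqrt(1.0 - eccentricity ** 2) * np.sin(ecc_anomaly)
    return x + 1j * y

def epicycle_fit(orbit, max_harmonic):
    """Keep only uniform circular motions with |frequency| <= max_harmonic."""
    coeffs = np.fft.fft(orbit) / orbit.size
    freqs = np.fft.fftfreq(orbit.size, d=1.0 / orbit.size)   # integer frequencies
    coeffs[np.abs(freqs) > max_harmonic] = 0.0
    return np.fft.ifft(coeffs * orbit.size)

orbit = kepler_orbit(0.3, 1024)
errors = {}
for max_harmonic in (1, 3, 8):
    errors[max_harmonic] = np.abs(orbit - epicycle_fit(orbit, max_harmonic)).max()
    print(max_harmonic, errors[max_harmonic])
```

Adding more circles drives the error down steadily, so agreement with observation can be made as good as one likes, while the single ellipse that generated the data needs no circles at all.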
And it is preposterous if the cycles and epicycles are thought to have physical meaning, just as it is preposterous to give a physical interpretation to the terms in any arbitrary expansion or indeed to a semi-empirical approach whose only function is to agree with experiment.

13.6. Approximations to Non-existent Functions
So far, I have been at pains to stress the computational usefulness of the perturbation method but I may have been too indulgent. The use of the "excited state" functions of the unperturbed system (the $\psi_i$) to correct the unperturbed function of the state of interest is actually fraught with
dangers of an unexpected kind. One often finds that one can calculate the numerical value of some perturbed energy quantity to very high accuracy, using an expansion of moderate length, only to find that it is possible to prove either that the Schrödinger equation one is attempting to solve does not have a solution or that the actual solution has some properties which are contradicted by the perturbation method. Two examples illustrate this phenomenon, one very old and one relatively recent:
• In solving the Kepler problem for the hydrogen atom one neglects the very small effects due to magnetic interactions between the electron and the proton and, if they are to be considered at all, they are approximated by perturbation theory. Typically, these effects involve energies of interaction between the particles which depend on higher powers of 1/r than Coulomb's law ($\propto 1/r$). Generally speaking, using the same method as the one outlined above (in Section 13.3) for the polarisation of the electron distribution, one obtains good agreement with experiment by these methods. But it is possible to prove that the Schrödinger equation for Hamiltonians with this kind of potential does not have any solutions; the electron will "fall" into the nucleus rather than have its motion changed slightly. What is happening here? With hindsight, the explanation is clear. If only the first few⁵ of the expansion functions are used in the computation they may simply not have the "capability" to describe the electron's "fall"; they are, after all, functions describing the bound states of the atom and are therefore completely inappropriate to describe such a catastrophic event. The perturbed function obtained is an "approximation to a non-existent function".
• A more topical case strikes at the very root of the applications of perturbation theory to field theories; the subject matter here is that of relativistic field theories which are used to give an excellent quantitative "description" of the Lamb shift in the sense that the computed numbers are in basically perfect agreement with experiment; this was the first great triumph of quantised radiative corrections in the late 1940s. But, as Lieb has reported,⁶ the perturbation series does not converge and (as in the above case) the physical explanation culled from the use of the perturbation method is erroneous; the energy has a wrong dependence on the principal parameter used in the whole treatment. As Lieb himself says:

"We are looking for small effects, called 'radiative corrections', and these effects are like a flea on an elephant. Perturbation theory treats the elephant as a perturbation of the flea. . . . perturbation theory cannot converge because if it did so it would give the wrong answer."

Again, if the perturbed function is computed it is an approximation to a non-existent function.

⁵ And here, "few" might mean hundreds or thousands out of an infinite number.
⁶ E. H. Lieb, Physica A 263, p. 491 (1999).
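The coexistence of good low-order numbers with a series that does not converge can be seen in a standard toy model. The sketch below is not Lieb's field-theoretical calculation; it assumes the one-dimensional integral $Z(g) = (2\pi)^{-1/2}\int \exp(-x^2/2 - g x^4)\,dx$, whose formal expansion in powers of $g$ has terms $(-g)^n (4n-1)!!/n!$. The function names and the value $g = 0.01$ are illustrative assumptions.

```python
import math


def double_factorial_odd(m):
    """m!! for odd m >= -1, as an exact integer (with (-1)!! = 1)."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def series_term(g, n):
    """n-th term of the formal power series of Z(g): (-g)^n (4n-1)!! / n!."""
    return (-g) ** n * double_factorial_odd(4 * n - 1) / math.factorial(n)


def partial_sum(g, order):
    """Sum of the series terms from n = 0 up to and including n = order."""
    return sum(series_term(g, n) for n in range(order + 1))


def z_exact(g, half_width=12.0, steps=4000):
    """Simpson quadrature of exp(-x^2/2 - g x^4) / sqrt(2 pi) on a wide interval."""
    h = 2.0 * half_width / steps
    total = 0.0
    for j in range(steps + 1):
        x = -half_width + j * h
        weight = 1 if j in (0, steps) else (4 if j % 2 else 2)
        total += weight * math.exp(-0.5 * x * x - g * x ** 4)
    return total * h / 3.0 / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    g = 0.01
    reference = z_exact(g)
    for order in (1, 3, 5, 7, 15, 30):
        print(order, abs(partial_sum(g, order) - reference))
```

With these assumptions a handful of terms agrees well with the directly computed integral, while continuing the expansion makes the agreement worse without limit: numerical success at moderate order says nothing about the existence of the object the series is supposed to represent.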
13.7. Summary for Perturbation Theory
In a rather pessimistic article,⁷ E. T. Jaynes has a section entitled "Is Quantum Theory a System of Epicycles". But Jaynes is principally concerned, not with Schrödinger's mechanics, but with the whole plethora of semi-empirical theories which have been simply generated ad hoc whenever some particular phenomenon needs explanation; in this approach quantum theory is simply a framework into which a Hamiltonian with empirical "coupling" parameters may be fitted to account for the measurements if not the phenomena. This is not what I have in mind here;⁸ I am concerned with Schrödinger's mechanics, which is an interpreted theory non-empirically derived from the Schrödinger Condition.

What I am saying here is: No, quantum theory is not a system of epicycles but the perturbation method, if seen as a theory rather than a mathematical technique, is indeed a system of epicycles; an explanation of phenomena in terms of a series of fantastic physical processes. Further, any abstract system, which can have its abstract structure used to accommodate explanations of arbitrary phenomena within a given context by the use of semi-empirical schemes, is a system of epicycles in this sense. In the absence of a guiding physical principle and an interpretative approach, such "theories", however practically convenient, are empty frameworks.⁹

The archetypal semi-empirical theories are those based on "Spin Hamiltonians"¹⁰ used to rationalise magnetic resonance phenomena. However, in these cases, it has been possible for many years now to derive the forms and numerical values of these parameters from Schrödinger's mechanics using the explicit configuration-space Hamiltonians and solutions of the associated Schrödinger equation to generate a complete theory with no hidden variables.

The main lesson is clear;¹¹ some mathematical structures abstracted from physical theories are of such generality that, by replacing the physical content of the original interpreted theory with empirical data (chosen to "fit the facts"), it is possible to rationalise almost anything within a given framework. In an influential work, Mario Bunge¹² showed how the Hamiltonian formalism could be used to develop a "General Dynamics" within which almost any (classical) time-dependent phenomenon could be accommodated. There is, of course, nothing wrong with this approach, except that it may well give its adherents a false sense of achievement; let us hope that it is less than the 1,200 years which elapsed between Saint Augustine and Newton before genuine interpreted theories emerge to replace the current semi-empirical rationalisations.

⁷ "Scattering of Light by Free Electrons" in The Electron (Editors D. Hestenes and A. Weingartshofer, Kluwer, 1990).
⁸ Although it is very relevant to the view that Schrödinger's mechanics can be replaced by the abstract "Hilbert Space" structure which may be arbitrarily filled with parameters as discussed in Chapter 10.
⁹ As Bertrand Russell famously remarked, "postulating what one requires to be true has certain advantages; they are those of theft over honest toil".
¹⁰ This point is taken up in a different context in Chapter 14.
¹¹ Indeed, it is the same lesson emphasised in Chapter 12 concerning vibrations of fields.
¹² Springer Tracts in Natural Philosophy Vol. 10, Foundations of Physics (Springer, 1967).
Chapter 14
Formalisms and "Hidden" Variables
The most familiar and therefore, perhaps, the most widely ignored examples of explanations of phenomena in terms of hidden variables are in the use of various semi-empirical Hamiltonians. The archetypes are the spin Hamiltonians used in the theory of the chemical bond and various magnetic phenomena and the vector model of the atom, but modern particle physics abounds with examples.
Contents
14.1. The Semi-empirical Method
14.2. The Chemical Bond
14.3. Dirac's Spin "Hamiltonian"
14.4. Interpretation of the Spin Hamiltonian

14.1. The Semi-empirical Method
In the early years of the quantum theory of the electronic structure of many-electron atoms and molecules it was technically impossible to evaluate many of the integrals which are needed to calculate the details of the repulsions between electrons in these systems. In these circumstances it is not unreasonable to use Schrödinger's mechanics as a theoretical framework for the problem, get as far as one can with the theory and then simply "carry" the intractable integrals (treated as parameters) as part of the development. Comparison with experimental results then enables the values of these parameters to be estimated in the sense that, with a series of experimental measurements of some quantities and theoretical expressions for the values of these quantities, one can perform some kind of statistical "fit" to obtain
a set of empirical values for the intractable integrals. Naturally, these theoretical values will contain the effects of some model approximations made in the theory which will be masked by this fitting procedure, but the whole approach has some value in generating a so-called semi-empirical method which at least systematises a whole set of otherwise uninterpretable data. However, problems arise if this method is used over-enthusiastically and the semi-empirical framework becomes mistaken for a genuine theory. The mental habits and rules-of-thumb used become so familiar that the method begins to be credited with explanatory power which it cannot have. Nowhere is this more evident than in the various angular momentum "coupling" schemes. The most venerable of these is the "vector model of the atom" where the familiar concept of repulsion between particles of the same charge is replaced by the mysterious "coupling" of the orbital angular momenta of electrons. The recipes involved are easy to manipulate but the description and explanation of the phenomenon is fantastic. But the top prize for these opaque schemes has to go to various electron spin-coupling schemes where, if the methods are taken literally as they often are, huge amounts of energy are associated with the interactions between the spins of electrons. These methods are of interest because of the way they show how coupling schemes may be used both to "hide" the actual variables relevant to a realistic interpretation of the phenomenon and to give a completely false picture of the mechanisms involved. The theory of the chemical bond was an early casualty of this over-enthusiastic formalism.
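The way a fit can mask a model approximation may be made concrete. The following is a minimal sketch under stated assumptions: two "intractable integrals" p and r enter each theoretical expression linearly with known coefficients (a, b), and the "measurements" are synthetic numbers generated inside the example, not experimental data. The function name and all numerical values are hypothetical.

```python
def fit_two_parameters(coeffs, measured):
    """Least-squares values of (p, r) in the linear model  value = a*p + b*r."""
    saa = sum(a * a for a, _ in coeffs)
    sab = sum(a * b for a, b in coeffs)
    sbb = sum(b * b for _, b in coeffs)
    say = sum(a * y for (a, _), y in zip(coeffs, measured))
    sby = sum(b * y for (_, b), y in zip(coeffs, measured))
    det = saa * sbb - sab * sab
    # Solve the 2x2 normal equations by Cramer's rule.
    return (say * sbb - sby * sab) / det, (saa * sby - sab * say) / det


if __name__ == "__main__":
    coeffs = [(1.0, 0.0), (1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (3.0, 1.0)]
    true_p, true_r = -1.20, 0.35  # hypothetical values of the two integrals
    clean = [a * true_p + b * true_r for a, b in coeffs]
    # A neglected physical effect proportional to a: the model knows nothing of it.
    shifted = [y + 0.05 * a for (a, _), y in zip(coeffs, clean)]
    print("fit to model-consistent data:", fit_two_parameters(coeffs, clean))
    print("fit with a neglected effect: ", fit_two_parameters(coeffs, shifted))
```

In the second case the fit still reproduces every synthetic "measurement", yet the fitted p is no longer the value of the integral it is supposed to represent; the neglected effect has been absorbed silently into the parameter.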
14.2. The Chemical Bond
The theory of the simplest possible two-electron chemical bond uses the so-called electrostatic (non-relativistic) Hamiltonian operator:

$$H = h(\vec{r}_1) + h(\vec{r}_2) + \frac{1}{r_{12}} + V_{\mathrm{Nuclear}} \qquad (14.2.1)$$

where

$$h(\vec{r}_i) = -\frac{1}{2}\nabla^2(\vec{r}_i) + V(\vec{r}_i) \qquad (14.2.2)$$

and $V(\vec{r}_i)$ is the attraction between the $i$th electron and the two nuclei and $V_{\mathrm{Nuclear}}$ is the mutual repulsion between the two nuclei. The associated (time-independent) Schrödinger equation is insoluble and any attempt to obtain a quantitative description of the simplest
chemical bond must, therefore, involve approximations. Fortunately, in the two-electron case, any trial function can be written as a product of a function of space (the spatial coordinates of the two electrons, $\vec{r}_1$ and $\vec{r}_2$, say) and a function of electron spin (the two electrons' spin "coordinates" $s_1$ and $s_2$, say). If $\Phi(\vec{r}_1, \vec{r}_2)$ is a sensible trial function, typically containing parameters to be optimised to obtain the best possible description of the bond, and the two spin "functions" are $\alpha(s)$ (spin up) and $\beta(s)$ (spin down), then the approximate solution of the Schrödinger equation associated with equation (14.2.2) is

$$\Psi(\vec{r}_1, \vec{r}_2; s_1, s_2) = \Phi(\vec{r}_1, \vec{r}_2)\,\Theta(s_1, s_2) \qquad (14.2.3)$$

where $\Theta(s_1, s_2)$ is a suitable spin "function"¹ for the two electrons. In fact, the only sensible choice for $\Theta(s_1, s_2)$ is the singlet function:

$$\Theta(s_1, s_2) = \{\alpha(s_1)\beta(s_2) - \alpha(s_2)\beta(s_1)\} \qquad (14.2.4)$$

(the hydrogen molecule is diamagnetic). Since this function is antisymmetric with respect to exchange of electrons, this means that, in order to satisfy the Pauli principle, the spatial function $\Phi(\vec{r}_1, \vec{r}_2)$ must be symmetric with respect to exchange of the two spatial coordinates $\vec{r}_1$ and $\vec{r}_2$:

$$\hat{P}_{12}\,\Phi(\vec{r}_1, \vec{r}_2) = \Phi(\vec{r}_2, \vec{r}_1) = \Phi(\vec{r}_1, \vec{r}_2). \qquad (14.2.5)$$

The actual variational method is to evaluate the (normalised) mean value of the Hamiltonian operator (14.2.2) for the trial function (14.2.3):

$$E = \frac{\int \Psi^* H \Psi \, dV_1\, dV_2}{\int \Psi^* \Psi \, dV_1\, dV_2}$$

which, of course, contains no terms involving the spin operators (since $H$ does not), and then to minimise $E$ with respect to any parameters (generically $\lambda$, say) contained in $\Phi$ and therefore in $E$; i.e. one ensures that

$$\frac{\partial E}{\partial \lambda} = 0.$$

¹ Throughout this section I have used the convenient fiction that there is a spin variable ($s$) analogous to the spatial variables ($q^i$) and that the spin of an electron may be described by functions ($\alpha$ and $\beta$) of this variable, and I have used the associated ideas of "spin integration", etc. In fact, there is no such variable $s$ and consequently the spin angular momentum "operators" do not fit into the canonical scheme. Using either the Pauli (two-dimensional) or the Dirac (four-dimensional) matrix models is more common but does not resolve the anomaly.
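The minimisation step can be illustrated on a problem small enough to check by hand. The sketch below is a stand-in, not the two-electron calculation of this section: it assumes the hydrogen atom with a single-parameter Gaussian trial function $\exp(-\lambda r^2)$, for which the mean energy in atomic units has the standard closed form $E(\lambda) = \tfrac{3}{2}\lambda - 2\sqrt{2\lambda/\pi}$. That closed form, the function names and the search interval are assumptions brought in for illustration.

```python
import math


def energy(lam):
    """Mean energy (hartree) of the trial function exp(-lam * r**2) for the hydrogen atom."""
    return 1.5 * lam - 2.0 * math.sqrt(2.0 * lam / math.pi)


def minimise(f, lo, hi, tol=1e-10):
    """Golden-section search for the minimum of a unimodal function on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if f(c) < f(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return 0.5 * (a + b)


if __name__ == "__main__":
    best = minimise(energy, 0.01, 2.0)
    print("optimal lambda:", best, "(analytic 8/(9 pi) =", 8.0 / (9.0 * math.pi), ")")
    print("minimum energy:", energy(best), "(exact ground state: -0.5)")
```

The optimised energy lies above the exact ground-state energy, as the variational method requires; the quality of the result is limited by the form of the trial function, not by the minimisation.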
In order to get any further we must choose a particular form for $\Phi$ which satisfies equation (14.2.5); the simplest such solution is the symmetrised product of the solutions of the separate-atom Schrödinger equation ($\phi_A$, $\phi_B$ for the two atoms); functions which are available since, as we have seen, the Kepler problem is soluble:

$$\Phi(\vec{r}_1, \vec{r}_2) = \{\phi_A(\vec{r}_1)\phi_B(\vec{r}_2) + \phi_A(\vec{r}_2)\phi_B(\vec{r}_1)\}. \qquad (14.2.6)$$

Using this approximation, it is perfectly possible to obtain good agreement with the experimental interaction (bond) energy of the two atoms over a wide range of internuclear separation. The explanation of the bonding is in terms of the changes in kinetic and potential energy experienced by the two electrons in their different environment and the way these changes are sufficient to overcome the mutual repulsion of the two nuclei. The qualitative conclusions from this simplest molecular calculation can easily be extended to more complex molecules of conventional structure. The energy expression is just

$$E = \frac{Q + K}{1 + S^2} + V_{\mathrm{Nuclear}} \qquad (14.2.7)$$

where $Q$ and $K$ are quantities with the dimensions of energy and involve the spin-independent functions $\phi_A$ and $\phi_B$ and the terms in $H$, while $S$ is a measure of how much the two $\phi$ functions "overlap" in space ($0 < S < 1$). If $S$ is sufficiently small then $S^2$
symmetric function $\Theta'$ (say) and the spatial function $\Phi$ by an antisymmetric function $\Phi'$ say:

$$\Phi'(\vec{r}_1, \vec{r}_2) = \{\phi_A(\vec{r}_1)\phi_B(\vec{r}_2) - \phi_A(\vec{r}_2)\phi_B(\vec{r}_1)\}$$

so the two expressions may be collected as:

$$E \approx Q + V_{\mathrm{Nuclear}} \pm K \qquad (14.2.8)$$

14.3. Dirac's Spin "Hamiltonian"
The essence of the transformation between the theory of the last section (with its connection with Coulomb's law and all the familiar concepts of electrostatics) and a theory depending only on electron spin is the fact that, in equation (14.2.8), all the numerical parameters are the same in the expressions for the energy of two states of different spin character; only the sign of $K$ differs in the energy of the diamagnetic and paramagnetic states. So, we look for a so-called spin Hamiltonian which acts only on functions of spin (in our case the two functions² $\Theta$ and $\Theta'$) and whose mean values are identical to the expressions (14.2.8). Trial and error with the simplest possible spin operators shows that

$$\{S(s_1) \cdot S(s_2)\}\,\Theta(s_1, s_2) = -\frac{3}{4}\,\Theta(s_1, s_2)$$

$$\{S(s_1) \cdot S(s_2)\}\,\Theta'(s_1, s_2) = \frac{1}{4}\,\Theta'(s_1, s_2)$$

so that

$$\int \Theta(s_1, s_2)\{S(s_1) \cdot S(s_2)\}\Theta(s_1, s_2)\, ds_1\, ds_2 = -\frac{3}{2}$$

$$\int \Theta'(s_1, s_2)\{S(s_1) \cdot S(s_2)\}\Theta'(s_1, s_2)\, ds_1\, ds_2 = \frac{1}{2}$$

which have the convenient difference of sign and the same denominator. Using the spin "Hamiltonian"

$$H_S = Q + V_{\mathrm{Nuclear}} - \frac{1}{2}K\{1 + 4\,S(s_1) \cdot S(s_2)\} \qquad (14.3.9)$$

we can generate the two energy expressions of equation (14.2.8) from the purely spin-dependent quantities $H_S$, $\Theta(s_1, s_2)$ and $\Theta'(s_1, s_2)$ by means of the energy formula

$$E = \frac{\int \Theta(s_1, s_2)\, H_S\, \Theta(s_1, s_2)\, ds_1\, ds_2}{\int \Theta(s_1, s_2)\, \Theta(s_1, s_2)\, ds_1\, ds_2}$$

$$E' = \frac{\int \Theta'(s_1, s_2)\, H_S\, \Theta'(s_1, s_2)\, ds_1\, ds_2}{\int \Theta'(s_1, s_2)\, \Theta'(s_1, s_2)\, ds_1\, ds_2}$$

² There are other components of the triplet function which have the same energy which are not explicitly considered here although the Hamiltonian does, in fact, give correct results for these as well.
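The purely algebraic character of this construction can be checked in a few lines. The following sketch assumes the usual two-dimensional (Pauli) matrix representation of the spin operators, in units where $\hbar = 1$, and the product basis ordered as $\alpha\alpha$, $\alpha\beta$, $\beta\alpha$, $\beta\beta$; the numerical values given to $Q$, $K$ and $V_{\mathrm{Nuclear}}$ are arbitrary illustrative numbers, not computed integrals.

```python
def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def spin_dot_spin():
    """Matrix of S(s1).S(s2) in the basis (aa, ab, ba, bb), with hbar = 1."""
    sx = [[0.0, 0.5], [0.5, 0.0]]
    sy = [[0.0, -0.5j], [0.5j, 0.0]]
    sz = [[0.5, 0.0], [0.0, -0.5]]
    terms = [kron(s, s) for s in (sx, sy, sz)]
    return [[sum(t[i][j] for t in terms) for j in range(4)] for i in range(4)]


def spin_hamiltonian(q, k, v_nuclear):
    """Q + V_Nuclear - (K/2){1 + 4 S(s1).S(s2)} as a 4x4 matrix."""
    ss = spin_dot_spin()
    return [[(q + v_nuclear - 0.5 * k) * (i == j) - 2.0 * k * ss[i][j]
             for j in range(4)] for i in range(4)]


def mean_value(matrix, vec):
    """<vec|matrix|vec> / <vec|vec> for a real (unnormalised) vector."""
    applied = [sum(matrix[i][j] * vec[j] for j in range(4)) for i in range(4)]
    numerator = sum(vec[i] * applied[i] for i in range(4))
    return (numerator / sum(v * v for v in vec)).real


SINGLET = [0.0, 1.0, -1.0, 0.0]  # alpha(1)beta(2) - alpha(2)beta(1)
TRIPLET = [0.0, 1.0, 1.0, 0.0]   # one component of the symmetric spin function

if __name__ == "__main__":
    q, k, v = -1.0, -0.1, 0.7  # arbitrary illustrative numbers
    hs = spin_hamiltonian(q, k, v)
    print("singlet:", mean_value(hs, SINGLET), "expected Q + V + K =", q + v + k)
    print("triplet:", mean_value(hs, TRIPLET), "expected Q + V - K =", q + v - k)
```

The two energies come out as $Q + V_{\mathrm{Nuclear}} \pm K$ whatever numbers are supplied for $Q$ and $K$, which is exactly the point: nothing in the spin algebra knows where those numbers came from.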
The actual energies are the same as in the full spatial calculation, precisely because the spatial integrals which arise are given the appropriate values. The upshot of these considerations is that the ideas may be generalised to any system of conventional (Lewis-type) electron-pair bonds ($N$ of them) using a Hamiltonian which is the obvious generalisation of (14.3.9):

$$H_S = Q + V_{\mathrm{Nuclear}} - \frac{1}{2}\sum_{i<j=1}^{N} K_{ij}\{1 + 4\,S(s_i) \cdot S(s_j)\}$$

which operates on products of individual bond spin-functions $\Theta(s_i, s_j)$ and makes no reference at all to Coulomb's law of interaction between charged particles, notwithstanding the fact that all molecules are actually composed of charged particles in mutual interaction.

14.4. Interpretation of the Spin Hamiltonian
There are a number of points to be made about the apparent simplification of the problem of the bonding in the hydrogen molecule by using a spin-only picture:
• There is no explanation of the phenomenon of bonding here; the conventional interpretation of the spin Hamiltonian and its associated energy
expressions is that the electron spin angular momenta are "coupled". What on earth does that mean? Electrons are not cogwheels; how can they be coupled except via their electric and magnetic properties, which are not even mentioned in the Hamiltonian?
• There is no physics or chemistry here; the explanation of spin coupling makes no connection with the known laws of physics which are involved in bond formation. The strength of the actual (magnetic) interaction between electron spins is way down in the rounding error of the actual bond energies.
• In fact, one may completely break away from the idea of physical interpretation using this type of model and use the spin-coupling formula as an empirical device to obtain agreement with experimental results by simply treating the parameters $Q$ and $K$, not as quantities to be computed from the hidden spatial functions and operators, but simply as disposable parameters chosen to obtain the best fit with a particular measured quantity like bond energy.
• More subtle perhaps, but equally relevant, if the properties of the $\phi_A$ and $\phi_B$ in the diamagnetic and paramagnetic states are properly optimised separately for each state, the numerical parameters in the spin Hamiltonian ($Q$ and $K$) are different in each state. This exposes the entirely empirical nature of this Hamiltonian; it would have to have "spin coupling" constants which were not constants.
All of these comments are applicable with more or less emphasis to, for example, the Heisenberg model of magnetism where a Hamiltonian analogous to (14.3.9) is used except that the spins are not simply the spins of individual particles but may refer to the total spin of a group of particles.
The "coupling coefficients" are, again, energy integrals arising from integrations over the spatial coordinates (hidden variables) of the electrons and are often simply used as fitting parameters to obtain agreement with experiment; there is no physical explanation involved and no connection with the known laws of interaction between charged particles in these types of "Hamiltonian".

This first spin or "effective" Hamiltonian set the tone for:
• The transition from Schrödinger's mechanics to a purely algebraic and formal approach to quantum theory.
• The gradual abandonment of explanations of quantum phenomena based on the actual (known) interactions between particles and fields in favour of semi-empirical methods based on the "coupling" of analogues of spin using coupling constants with a posteriori justification and interpretation,
with all that this implies for the realistic interpretation of the micro-world.
PART 6
Disputes and Paradoxes
Most, if not all, the paradoxes associated with quantum mechanics are associated with the interpretation of the mathematical theory of probability. In Part 2 these matters were addressed. In this part some of the more infamous of these paradoxes are examined individually.
Chapter 15
Measurement at the Microscopic Level

It should be clear that the question of the (statistical) verification of the probabilistic predictions of Schrödinger's mechanics is just a particular application of the statistical method in general. Measurements are made on random concrete objects which are instances of the relevant abstract object and summed to form approximate numerical quadratures to be compared to the theoretical measures of the appropriate model subsets. However, the "problem of measurement" and "theories of measurement" are totems in quantum theory and so I do obeisance to them here.
Contents
15.1. Recollection: Concrete and Abstract Objects
15.2. Statistical Estimates of Probabilities
15.2.1. von Neumann's Theory of Measurement
15.3. Measurement as "State Preparation"
15.4. Heisenberg's Uncertainty Principle
15.4.1. Measurement and Decoherence
15.5. Measurement Generalities

15.1. Recollection: Concrete and Abstract Objects
In Part 2 we looked at the theory of probability and its relationship to statistical measurements and the results we obtained there are nowhere more relevant than in relating measurements on micro-systems to the
probabilities computed by Schrödinger's mechanics. Let us therefore recall:
• Probabilities are theoretical quantities in which there is no mention of chance or randomness; probabilities are relative measures of sets.
• The referents of probabilities are abstract objects; probabilities do not refer to particular individual concrete objects.
• Experimental measurements (or verifications) of probabilities may be made in several ways. Since probabilities are measures,¹ these experiments must yield (explicitly or implicitly) approximate measures; quadratures obtained by experimental measurements on individual, concrete objects which have the properties (amongst others, perhaps) of the abstract object.
• An effective but inefficient way of ensuring that a procedure is adequate to the task of obtaining a numerical quadrature of a function is to sample the values of that function at random and add the results multiplied by the domain of the function (the Monte Carlo method of page 29). Thus the most common and general way of obtaining approximate measures of functions is the statistical method in which relative approximate measures often turn out to be simply frequency ratios of the measured properties of concrete objects.
• It is important to note that every concrete object has one and only one definite value for each of its properties. It is this fact which ensures that every concrete object is a distinct individual.
• By the same token, abstract objects do not have a definite value for all their properties. It is the very fact that not all of their properties are specified which makes these objects abstract.
• Finally, measurements are made on concrete objects; it is not possible, even in principle, to make measurements on the abstract objects which are the referents of probabilities.

In the context of Schrödinger's mechanics and measurements the particular point to be emphasised is that, since probabilities do not refer to individual objects, neither do the solutions of the Schrödinger equation; an individual, material, concrete object does not have a state function.

¹ Again, note the unfortunate collision of mathematical "measure" and experimental "measurements".
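The random-sampling quadrature recalled in the list above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the integrand ($\sin x$ on $[0, \pi]$, whose integral is 2), the sample size and the function name are all chosen here for the example; the estimate is the average of the sampled values multiplied by the length of the domain.

```python
import math
import random


def monte_carlo_quadrature(f, lower, upper, n_samples, rng):
    """Estimate the integral of f on [lower, upper] from values sampled at random."""
    total = sum(f(rng.uniform(lower, upper)) for _ in range(n_samples))
    # Average sampled value multiplied by the measure (length) of the domain.
    return (upper - lower) * total / n_samples


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that the run is repeatable
    for n in (10, 1000, 100000):
        print(n, monte_carlo_quadrature(math.sin, 0.0, math.pi, n, rng))
```

The method is effective but inefficient in just the sense described: the statistical error falls only as the inverse square root of the number of samples, so a handful of samples tells one very little.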
15.2. Statistical Estimates of Probabilities
In line with the general way in which probabilities are estimated experimentally, we expect that the measurement of dynamical variables in Schrödinger's mechanics would be statistical. We saw in Section 7.4.1 on page 135 that the "distribution" of a dynamical variable may be defined, and with the precautions about interpretation outlined there, it may be colloquially interpreted as a distribution. In fact, in general, it is the product of the probability of presence distribution at a particular point multiplied by the value that the dynamical variable would have if it were at that point in configuration space. So that, if we attempt a measurement of the momentum (say) of a single concrete particle at a particular point in space, we expect either:
1. a null result, i.e. nothing (because there was no particle there at the time of the measurement), or
2. some value which would be the measured value of the momentum (because there did happen to be a particle at that point when the measurement was carried out).
The probability distribution for the presence of the particle would determine the ratios of null results to actual measurements at various points in space. In Schrödinger's mechanics we compute the "amount" of a particular dynamical variable ($A(q^i, p_i; t)$, say) that there is in region $W_i$ of space² as

$$\langle A \rangle_{W_i} = \int_{W_i} \rho_A(q^i; t)\, dV \qquad (15.2.1)$$

where, as usual,

$$\rho_A(q^i; t) = \rho(q^i; t)\, A(q^i, p_i; t) \qquad (15.2.2)$$

and $\rho(q^i; t)$ has been taken as the position probability for a single particle for simplicity. In order to verify this quantity experimentally we need to perform some kind of quadrature in the region $W_i$ of space by means of making measurements of the dynamical variable $A$ in that (and other) regions of space for concrete particles and summing these values.

² Here, as usual, I omit any distinction between real space (in which the concrete systems exist) and the mathematical model of it in the theory, since the notation would be doubled to no conceptual advantage.
The simplest example is the case when we are interested only in the actual position probability ($A = 1$) and this probability is independent of time.³ We proceed in exactly the same formal manner as we did in using concrete instances of the abstract cube (throwing dice), except that it is unusual to have the same kind of control over the experiment that we have in dice-throwing. Rather than throw dice at our leisure, we have to simply accept the results of such random appearances of a particle in region $W_i$ as happen spontaneously in as many concrete systems as we have available and arrange to record their positions within that region and count them.⁴ It is, of course, assumed that one cannot make a statistical estimate of a probability absolutely by making random measurements only in the required region. One is only able to obtain relative estimates of probabilities in two or more regions by statistical experiments; comparisons of "numbers of hits" in two regions will give numerical estimates of the relative probabilities of the particle being in those regions. The statistical numerical quadrature approximations to the relevant probabilities are, as usual:

$$\langle 1 \rangle_{W_i} = \int_{W_i} \rho(q^i; t)\, dV \approx \frac{1}{N}\sum_{j=1}^{N} \delta(\vec{r}_j \in W_i) \qquad (15.2.3)$$

where there are assumed to be $N$ "measurements" (i.e. $N$ particles turn up somewhere accessible to our instruments) and

$$\delta(\vec{r}_j \in W_i) = \begin{cases} 1 & \text{if a particle appears at } \vec{r}_j \text{ which is in the region } W_i \\ 0 & \text{otherwise} \end{cases}$$
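The counting procedure of equation (15.2.3) can be sketched numerically. The example below is an illustration under stated assumptions, not a description of any real experiment: the theoretical distribution is taken to be $\rho(x) = 2\sin^2(\pi x)$ on $[0, 1]$ (the lowest state of a particle in a one-dimensional box), the "appearances" of concrete particles are simulated with a pseudo-random generator, and all function names are hypothetical.

```python
import math
import random


def density(x):
    """Assumed theoretical position probability density on [0, 1]."""
    return 2.0 * math.sin(math.pi * x) ** 2


def appearances(n_particles, rng):
    """Simulated positions at which individual particles turn up (rejection sampling)."""
    positions = []
    while len(positions) < n_particles:
        x = rng.random()
        if rng.uniform(0.0, 2.0) < density(x):
            positions.append(x)
    return positions


def hit_fraction(positions, lower, upper):
    """(1/N) * sum of delta(x_j in W) for the region W = [lower, upper)."""
    return sum(1 for x in positions if lower <= x < upper) / len(positions)


def exact_probability(lower, upper):
    """Integral of the assumed density over [lower, upper]."""
    def cumulative(x):
        return x - math.sin(2.0 * math.pi * x) / (2.0 * math.pi)
    return cumulative(upper) - cumulative(lower)


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so that the run is repeatable
    found = appearances(20000, rng)
    for region in ((0.0, 0.25), (0.25, 0.5)):
        print(region, hit_fraction(found, *region), exact_probability(*region))
```

Each simulated particle is counted once and then discarded; it is the ratio of the hit fractions in the two regions which is compared with the ratio of the theoretical measures.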
When enough measurements have been made to constitute a satisfactory quadrature, we compare the ratios of these numbers (and any estimate of errors) with the theoretical probabilities. Whatever the technical complexities and experimental difficulties there are in this process, it is as logically simple as the experimental verification of the probabilities of the numbered sides of the abstract cube. The most satisfactory results will be, exactly as in throws of dice, when measurements are made on individual concrete objects which are then discarded (or, as usually happens, they disappear spontaneously from the apparatus). This simple procedure is not possible if one only has a small number of (a fortiori only one) concrete systems on which to perform experiments, in exactly the same way as it is impossible to verify theoretical probabilities from a small number of (or a single) dice throws.⁵ Until relatively recently, this last precaution was not necessary since any measurements on atomic and sub-atomic systems were necessarily performed on billions upon billions of concrete objects at once; the technology for performing measurements on single concrete objects was simply not available.

All this is simply an elementary (albeit technically demanding) application of the same principles which are used in the experimental verification of numbered sides of a cube (dice-throwing) or, indeed, marked sides of a disk (coin-tossing). Experimental measurements are performed on concrete objects which have the properties of the abstract object. The theoretical probabilities refer not to concrete objects but to the corresponding abstract object. Provided that each concrete object does have the properties of the abstract object, it is irrelevant what happens to it after the measurement.

This is not at all what is meant by measurement in some standard interpretations of quantum theory. Indeed, there is a whole literature on the quantum theory of measurement, all of it generated from the early work of von Neumann, who used a pre-Kolmogorov interpretation of probability, together with two aspects of a single mistaken assumption:
1. Probabilities refer to single concrete objects.
2. It is possible in principle to write down a state function for an individual, concrete macroscopic measuring instrument.
The first of these has been dealt with in detail in Part 2. The second is simply a special case of this assumption. It is not the obvious impracticality of the assumption that a state function for the apparatus exists which is in error but the underlying assumption that the referents of state functions and probability distributions are individual (concrete) objects. Notwithstanding these considerations it is, perhaps, useful to look at this piece of theory in its own terms rather than simply write it off as a mistake, since it is one of the most persistent sources of paradox and subjectivism in the interpretation of Schrödinger's mechanics.

³ Unlike $\rho_A$, $\rho$ is a probability distribution.
⁴ In the general case of a measurement of $A$, we would have to measure their $A$-values as they turn up.
⁵ Remember that the concrete object here is "a throw of a die" not "a die".
15.2.1. von Neumann's Theory of Measurement
The theoretical prediction of the value of some variable by Schrödinger's mechanics for some abstract object is, typically, a real number, depending on the structure of the abstract object, its environment and a system of units, and that is all. The result of a measurement of that variable on some concrete object is, typically, a rational number, together with some sort of (rational) error bound; an element of the standard topology of the real numbers expressed in some system of units. But that is not all: the act of measurement involves particular apparatus, the theory of the construction and operation of that apparatus, its likely accuracy, the competence of the operators of the apparatus and hosts of other things which will depend on the details of the measurement.

The theory developed by von Neumann and used as a basis for most discussions of the theory of measurement is simply a special case of the use of Schrödinger's mechanics to describe a system composed of two interacting sub-systems. Let the (time-independent, for simplicity) Hamiltonian operators of the two separate systems be $H_1$ and $H_2$ respectively and let the interaction between them be $V$. Further, again for simplicity, let each of the two Schrödinger equations associated with the two Hamiltonians have complete, non-degenerate, discrete sets of solutions:

$$H_1 \Phi^1_i = E^1_i \Phi^1_i, \qquad H_2 \Phi^2_j = E^2_j \Phi^2_j \qquad (15.2.4)$$

The solutions of the combined Schrödinger equation:

$$(H_1 + H_2 + V)\Psi = E\Psi$$

may be expanded as a linear combination of the products of the solutions of the separate equations, i.e.

$$\Psi = \sum_i \sum_j C_{ij}\, \Phi^1_i\, \Phi^2_j \qquad (15.2.5)$$
for some linear coefficients [6] $C_{ij}$, simply because the two sets of functions $\{\Phi^1_i\}$ and $\{\Phi^2_j\}$ span the spaces of the separate systems completely. This sum may well have an infinity of terms and the coefficients depend, of course, on the nature of the interaction $V$.

Now suppose that the interaction between the systems is typical of a measurement process, namely the systems exist independently, are brought together (interaction switched on) and then separated (interaction switched off). In a way analogous to the EPR paradox discussed in Section 16.2 on page 294, the state functions of the two systems seem to be inextricably mutually dependent, suggesting that the act of measurement causes mixing in the states of the (macroscopic) measuring apparatus.

How do these considerations bear on the measurement process? First of all note that measurement is a concrete process:

• The measuring apparatus is a concrete, material object.
• The subject of the measurement is a concrete object.

No concrete objects have state functions or probability distributions. A fortiori, no unique macroscopic concrete objects have state functions, so any conclusions based on the interactions of abstract systems and the associated probability distributions have absolutely no bearing on the measurement process. What is more, statistical experiments to verify a prediction from a probabilistic theory do not have to be carried out on the same concrete systems with the same concrete apparatus.

Any description of measurement based on the above theory would clearly have to be general in the sense that, if one of the constituents is macroscopic, why not both? So, how does this theory apply to dice-throwing experiments? After all, we have just assumed that the symbol for a state function of a complex macroscopic piece of apparatus may be written down, compared with which the state function which generates a probability distribution to be tested by dice-throws is very small beer.
[6] There are, of course, many solutions of the composite equation; I have chosen just one to avoid further indexing.

But the probability distribution function relevant to dice-throwing experiments does not refer to throws of concrete dice but to the abstract object "numbered sides of a cube". It is not possible to make measurements on an abstract object, and an abstract object will certainly have no effect on a measuring apparatus since it only exists in our minds. What is more, this probability distribution may be verified by quadratures based on experiments with any combination of "fair dice" thrown under a huge variety of conditions anywhere in the world and with a wide variety of types and locations of apparatus. Let us say that, in the manner of scientists everywhere, we will take advantage of our experimental program to combine work with recreation and do a test of 100,000 dice throws:

• 20,000 in Las Vegas using the craps tables and casino dice.
• 20,000 in Bali using ivory dice and bamboo tables.
• 20,000 in Monte Carlo using the casino dice and tables.
• 20,000 in the Maldives using dice carved from coconut shells and coral rocks as tables.
• 20,000 in Sheffield using stainless steel dice under an umbrella.
Combining all these results might well give a statistical estimate of about 0.1667 for the probability that a numbered side of a cube be "5". What is the applicability of von Neumann's result here? There are five completely different sets of "apparatus" and the experiments are performed at separate locations with no communication between them, and yet they generate a result which is entirely acceptable. This is obviously because all these concrete objects (mass-homogeneous dice thrown in a gravitational field from a height at least 10 times their dimension onto a horizontal surface) are suitable objects for the statistical estimation of the probabilities referring to the abstract object "cube with numbered faces". The alleged state function of the apparatus is completely irrelevant to the statistical verification of probabilities calculated for the abstract object. The apparatus does not have to be the same in every case or even similar, nor does the concrete target. Both apparatus and target object are concrete objects and, as such, do not have state functions, notional or otherwise. There is no probability distribution for a single concrete object and so it (or its underlying alleged state function) can have no interaction with the non-existent state function for the concrete apparatus.

This example is deliberately trivial but exactly the same considerations apply to the experimental determination of (say) the probability distribution of the position of an electron in a hydrogen atom; any number of atoms can be used with any number of different pieces of apparatus using (say) different modes of operation with different theoretical backgrounds. And all of these
results may be combined to give a statistical estimate of the probability involved.

Even if one takes the von Neumann point of view and assumes that a state function refers to a concrete object, the whole theory, if not fatally flawed theoretically, is hopelessly unrealistic. From the point of view of its microscopic structure, any macroscopic object, exchanging as it does matter and energy with its surroundings, could never be considered to be in a single quantum state or even a linear combination of quantum states. [7] To base any kind of theory on the assumption that a macroscopic object can be considered to be in a single quantum state is fantastic. To draw philosophical implications from such an assumption is simply preposterous. Schrodinger's mechanics, like other scientific theories, has a limited domain of applicability and to extrapolate this domain in an unlimited way is to invite ridicule and disaster. We all know what the result of the extrapolation of the flat earth theory is; it leads to a rich field of philosophical controversy about the possibility of an infinite earth or something spectacular happening at the edges.
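The pooling argument of the dice example can be illustrated with a small simulation. This is only a sketch under stated assumptions: the five "sites" are labels with no physical content, every throw is drawn from Python's pseudo-random generator as an idealised fair die, and the names `throw_batch` and `pooled_estimate` are invented for illustration.

```python
import random
from collections import Counter

def throw_batch(n_throws, rng):
    """Simulate n_throws of an idealised fair six-sided die (an assumption)."""
    return [rng.randint(1, 6) for _ in range(n_throws)]

def pooled_estimate(batches, face):
    """Relative frequency of `face` over all batches combined."""
    counts = Counter()
    total = 0
    for batch in batches:
        counts.update(batch)
        total += len(batch)
    return counts[face] / total

# Labels only: nothing about the "apparatus" enters the estimate.
sites = ["Las Vegas", "Bali", "Monte Carlo", "Maldives", "Sheffield"]

# Each site gets its own independent generator and 20,000 throws.
batches = [throw_batch(20_000, random.Random(seed)) for seed, _ in enumerate(sites)]

total_throws = sum(len(batch) for batch in batches)
estimate = pooled_estimate(batches, face=5)
print(f"pooled estimate of P(5) from {total_throws} throws: {estimate:.4f}")
```

The estimate is a frequency ratio over the pooled set of throws; no description of any individual table, die or location appears anywhere in the calculation.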
15.3. Measurement as "State Preparation"
There is one aspect of the process of measurement which it is useful to discuss, since it highlights more confusions between abstract and concrete objects: the idea of measurement as state preparation. Using the usual colloquial lack of distinction between the referents of probability (abstract object) and measurements (concrete object), this argument runs as follows:

• The allowed values of any dynamical variable are those which are the eigenvalues of some Hermitian operator (real numbers).
• Any measurement on any system must, therefore, result in one of these numbers (or a rational approximation to it plus an error bar).
• This act of measurement (and this is the key assumption) is said to force the system to be in an eigenstate of the relevant Hermitian operator; this, then, is the selected state.
• Thereafter, if the system is undisturbed, it remains in that state so that, among other things, a subsequent measurement of the same physical quantity must result in the same number being returned.

[7] Any laboratory apparatus would typically require about $10^{30}$ coordinates to specify its state before it was switched on!
Some of the mistakes here are so obvious that it seems bad form to point them out; first of all in their own terms:

1. At the microscopic level only those systems may exist for which there is the solution of some Schrodinger equation. If the system is in a stationary state, therefore, the only physical variables which have constant values (measured or not) are the ones for which the operators commute with the Hamiltonian. It is a matter of technique, not theory, whether two successive measurements of such a dynamical quantity will give the same result.
2. The determination of the value of any physical variable whose operator does not commute with the Hamiltonian certainly does not ensure that a subsequent measurement of that variable gives the same result; in a stationary state there will be a distribution of values.
3. There are some physical variables to which no Hermitian operator can be assigned; in particular, there is no operator corresponding to "position coordinate of a particle". Any such putative operator would have to have eigenfunctions and eigenvalues representing a stationary particle and there are no Schrodinger equations for stationary particles; therefore there are no stationary particles in the microworld described by Schrodinger's mechanics. [8]
4. We are left wondering about the status of a simple measurement, like noting the position of impact of a particle on a screen; surely the simplest of all atomic measurements.

It is more coherent to express this position in terms of a measurement "filtering" out those members of a set of identical systems (an ensemble?) which have a common value of the variable involved; the process here is then active selection rather than passive measurement. But this interpretation is also open to the above objections.
[8] As we have noted earlier, axiomatic quantum mechanics should contain an axiom: "There are no stationary objects in the microscopic world."

To make sense of these points we have to be clear about the referent of measurements and of probability; first measurements:

• Measurements are, necessarily, made on actually existing, material concrete systems (particles or fields, say) and what makes a system concrete (different from all other concrete systems) is the fact that it has a definite value of each of its properties; each electron has its own mass, charge, spin, position, momentum, etc. Of course, the numerical values of some of these physical variables are determined by scientific laws; some known and some (as yet) unknown. However difficult it is to determine the numerical values of all the properties of each concrete particle (say), these numbers characterise that object as a unique element of the real world which is acted on by our measuring apparatus. When a measurement is made on a particular concrete object, the result is the numerical value of the relevant property which that particular concrete object happens to have. Of course this value has to be one of the possible values that property may have (how could it be anything else?), but there is no question of this measurement on a particular concrete object forcing that object into a particular state (concrete objects do not have states in Schrodinger's mechanics) nor, a fortiori, of forcing the corresponding abstract object into a state.
• When a measurement of a particular property of a concrete object is made, that measurement may or may not destroy the value of that or any other property of the concrete object; this is a matter of experimental technique. On the sub-atomic level, for example, determination of the position or momentum of a particle will generally mean stopping or deflecting it (changing its position, energy and momentum) before it vanishes forever onto the walls of the apparatus or into space somewhere; we will, in general, have no further contact with it ever again, let alone be able to check whether a subsequent measurement gives the same value as the one just measured. We may be able to infer the position and/or momentum of a concrete particle from such a deflection experiment but after that the particle has gone.
• So, for example, sets of measurements on the positions of concrete particles will yield a distribution of values; and they will usually be one from each of a number of different concrete systems. The results of none of these individual measurements are governed by probabilities; probabilities are relative measures of sets and do not refer to any concrete objects. The only connection between the probabilities obtained (for example, from Schrodinger's mechanics) and measurements is via the approximate quadratures obtained by the frequency ratios of lots of random tests on concrete objects having the properties of the abstract object to which the probabilities refer. In particular, it is the abstract object which is
in a "state" described by Schrodinger's mechanics and the probabilities generated.

No measurement on a concrete object can put it into an eigenstate because concrete objects do not exist in the states of Schrodinger's mechanics. It is therefore only possible to force a concrete object (prepare a "state") to have a value corresponding to an eigenvalue of one of the operators which commute with the Hamiltonian, [9] while it is entirely possible, if one has the equipment, time and ingenuity, to measure the value of any dynamical variable of any concrete object.

15.4. Heisenberg's Uncertainty Principle
There is a minor theorem which relates the standard deviations of conjugate (in the sense of Hamilton) dynamical variables, which has been used to justify some theories of measurement at the microscopic level and to support the idea that there is something mysterious associated with the micro-world. For historical reasons, the theorem is known as Heisenberg's "uncertainty principle"; let's take a look at it and see what its implications are.

The standard deviation, $\Delta f$, of a (random) variable ($f(x)$, say, for $a < x < b$) is defined in probability theory by

$$(\Delta f)^2 = \int_a^b (f - \langle f \rangle)^2\, p(x)\, dx$$

where $\langle f \rangle$ is the mean value of the variable $f$:

$$\langle f \rangle = \int_a^b f\, p(x)\, dx$$

and $p(x)$ is the probability distribution of the basic variable $x$. If we take a conjugate pair of dynamical variables $q$ and $p_q = -i\,\partial/\partial q$ and the state function $\psi(q)$ then, as we have seen earlier, the relevant quantities are:

$$p(q) = |\psi(q)|^2 = \psi^*(q)\,\psi(q)$$

$$(\Delta q)^2 = \int_a^b \psi^*(q)\,(q - \langle q \rangle)^2\,\psi(q)\, dq \tag{15.4.6}$$

$$(\Delta p_q)^2 = \int_a^b \psi^*(q) \left( -i\frac{\partial}{\partial q} - \langle p_q \rangle \right)^2 \psi(q)\, dq. \tag{15.4.7}$$

[9] Assumed time-independent.
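The appendix to this chapter establishes a relationship between these two standard deviations:

$$\Delta p_q\, \Delta q \ge \tfrac{1}{2} \tag{15.4.8}$$

(in the units used above, where $\hbar = 1$; in standard units the right-hand side is $\hbar/2$).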
This result is a statement of the relationship between the standard deviations of the probability distributions of any pair of canonically conjugate variables in any state of an abstract object described by a Schrodinger equation. Clearly this result has some bearing on sets of random measurements on concrete objects of the values of those variables; they can be compared, for example, with statistical estimates of the standard deviations. But the result (15.4.8) can have no consequences for individual measurements of the values of those conjugate dynamical variables on corresponding concrete objects. If we think of a single-particle system for which we have the solution of the Schrodinger equation and make measurements on the associated concrete objects, there are just two kinds of results which we might obtain:

1. If the system is in an eigenstate of the relevant momentum, all concrete systems will give the same momentum measurement (ignoring experimental error) and a distribution of results for the conjugate coordinate measurement.
2. If it is not in such an eigenstate then both sets of measurements will generate distributions.

If enough measurements are taken then the mean square deviations can be estimated in the standard statistical manner and the results compared with the theoretical results obtained from the solution of the relevant Schrodinger equation. Obviously in the first case the measured standard deviation of the momentum results should be close to zero since all measurements of the momenta of concrete objects should, ideally, yield the same value as that of the abstract object from the Schrodinger equation. That is, in this case the abstract object has a fixed value of momentum (in a given state). Electrons are never stationary, so measurements will generate a distribution for the position measurement in both cases; that is, the abstract object never has a fixed position in Schrodinger's mechanics.
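The comparison between statistical estimates and the theoretical standard deviations can be sketched numerically. The example assumes a Gaussian state function, for which (in units where $\hbar = 1$) the position distribution is normal with width $\sigma$ and the momentum distribution is normal with width $1/(2\sigma)$, so that the product of the two widths is exactly 1/2. The value of `SIGMA_Q`, the sample size and the function name `estimate_spreads` are illustrative assumptions. Each simulated value stands for one measurement on a different concrete system; positions and momenta are drawn independently.

```python
import random
import statistics

SIGMA_Q = 0.7                    # assumed width of |psi(q)|^2 (illustrative value)
SIGMA_P = 1.0 / (2.0 * SIGMA_Q)  # momentum width for a Gaussian psi, hbar = 1

def estimate_spreads(n, seed=0):
    """Estimate both standard deviations from n independent random samples each."""
    rng = random.Random(seed)
    positions = [rng.gauss(0.0, SIGMA_Q) for _ in range(n)]
    momenta = [rng.gauss(0.0, SIGMA_P) for _ in range(n)]
    return statistics.stdev(positions), statistics.stdev(momenta)

delta_q, delta_p = estimate_spreads(50_000)
product = delta_q * delta_p
print(f"estimated delta_q = {delta_q:.4f}, delta_p = {delta_p:.4f}")
print(f"estimated product = {product:.4f} (theoretical value 0.5)")
```

Because the Gaussian case sits exactly on the bound, a finite-sample estimate of the product scatters slightly above or below 0.5; the inequality concerns the distributions, not any individual sample or measurement.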
This is the only relationship between the inequality (15.4.8) and measurements. In particular, this relationship says nothing at all about the simultaneous measurability of conjugate pairs of dynamical variables. Like all other probabilistic results, the above inequality can only say something about lots of (random) measurements and nothing at all about a single measurement. What it does say (approximately) is that the "sharper" the distribution of one of the conjugate pair, the more "diffuse" the distribution of the other; a result of some interest but hardly earth-shattering.

Finally, we have seen that Heisenberg's uncertainty principle is just a theorem in Schrodinger's mechanics; it certainly does not have the autonomy that the epithet "principle" would suggest. It cannot, for example, have any more explanatory power than the properties of conjugate pairs of operators have in their domain of Hermiticity, from which this principle is derived.

15.4.1. Measurement and Decoherence
In the 1970s a theoretical attack was made on the problem of the quantum states of macroscopic objects. Investigations were carried out into the consequences of possible interactions between the large-scale ("collective" as they are called) variables characterising the state of a macroscopic object and the variables characterising the states of the microscopic constituents ("environments" is the rather bizarre [10] terminology here) of those macroscopic bodies. This field consists, of necessity, of investigations of model systems where, typically, a single macroscopic oscillator might be composed of many microscopic harmonic oscillators, with the interaction between the two sets being generalised friction. The reasons for this choice are obvious: it is a realistic one (real pendulums are damped by friction, becoming stationary and heated) and the classical and quantum oscillator problems are manipulatively simple. The results of these investigations were as unsurprising as they were tricky to derive; it was found that:

Whenever a system in a hypothetical linear combination of pure states in the macroscopic variables is allowed to interact with the microscopic variables of its constituents it collapses with astonishing rapidity into a "mixed state"; a mixture of pure states.

Or, as I have repeatedly emphasised, no concrete object, and certainly not a macroscopic object, is ever in a quantum state and a fortiori never in a linear combination of pure quantum states.

[10] The macroscopic body is in a "thermostat" of its constituent sub-systems; external environments like radiation fields are also considered.
This result has been seen in some quarters as a fig-leaf to hide the quantum measurement problem, and so it is if one believes that the referents of probability distributions are concrete objects. However, from the point of view of this work the quantum measurement problem does not exist; it arises only from confusion about the referents of probability theories. Probabilities (and, where relevant, state functions) refer to abstract objects while measurements are made on concrete objects.
15.5. Measurement Generalities
Some of the discussion in this chapter has been a little light-hearted since the "quantum measurement problem" is simply an artifact of the mistaken idea that the solutions of the Schrodinger equation and the related probabilities refer to individual concrete objects. Once it is realised that state functions and probabilities refer to the corresponding abstract object and that measurements are always performed on concrete objects, the difficulties and paradoxes simply fall away.

There is one other persistent point of view which is worth a brief mention; many writers are of the opinion that the operators of Schrodinger's mechanics refer in some way to measurements [11] of the relevant dynamical variables rather than to the dynamical variables themselves. There is really only one sort of reply to this position; the one familiar to pantomime-goers everywhere: "Oh no they don't". The various quantities in any mathematical articulation of Schrodinger's mechanics, in particular the mathematical symbols in the Schrodinger equation and its associated operators, are things like:

• Position coordinates of particles and fields.
• Charges, masses, etc. of particles and strengths of fields.
• Representations of momenta of particles, fields and associated quantities.

There is nothing in the Schrodinger equation which refers to measuring apparatus, the theory of the operation of such apparatus, the skill and knowledge of the operators (or software authors) of the apparatus or the likely accuracy of the apparatus, any more than there is in any of the formulations of classical mechanics. It is sometimes said that "ideal measurements" are meant when this position is taken. If so, where, in Schrodinger's mechanics, are the variables describing the ideal measuring device?

The results of theory and measurement are quite different things leading to different kinds of mappings. Looking briefly [12] at some real-valued variable: if $S$ is the system and $T$ the theoretical quantity, the theoretical result (in our context, the value of $T$ for the abstract object) is:

$$T : S \times U \to R$$

where $U$ is a system of units and $R$ is the real numbers. However, the corresponding experimental magnitude $T'$ (the value of $T$ for a concrete object) is something like:

$$T' : S \times U \times M \times M' \times E \times O \times \ldots \to P(R)$$

where $M$ is the set of methods of measuring $T$, $M'$ is the set of theories about how these methods work, $E$ is the set of equipment used, $O$ the set of experimental workers, etc., and $P(R)$ is the set of intervals in the real numbers (with rational end points). Clearly $T$ and $T'$ are different mappings with different domains and different ranges and the relationship between them is not at all simple.

In the absence of a shred of evidence that Schrodinger's mechanics refers to measurements or to laboratory equipment or to observers and their minds, I shall naively assume that, like other forms of mechanics, Schrodinger's mechanics refers to the abstract objects I have described, which are (partial) images of the concrete, material objects in the real world.

[11] Or, worse, an observer noting the measurement.
[12] Neglecting any complications due to probabilities.
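The difference between the two mappings can be made concrete with a short sketch. Everything named here is an illustrative assumption: the function `theoretical_value`, the class `ExperimentalMagnitude`, its field names, and in particular the "reading", whose interval end points are invented purely to show the shape of the data and are not experimental results. The only physics used is the hydrogen ground-state energy of -0.5 hartree and an approximate hartree-to-eV conversion factor.

```python
from dataclasses import dataclass
from fractions import Fraction

HARTREE_IN_EV = 27.211386  # approximate conversion factor

def theoretical_value(system: str, units: str) -> float:
    """T : S x U -> R. A toy table with a single entry (illustrative only)."""
    table_hartree = {"hydrogen atom, ground state energy": -0.5}
    scale = {"hartree": 1.0, "eV": HARTREE_IN_EV}
    return table_hartree[system] * scale[units]

@dataclass(frozen=True)
class ExperimentalMagnitude:
    """One value of T' : S x U x M x M' x E x O x ... -> P(R)."""
    system: str
    units: str
    method: str          # element of M
    method_theory: str   # element of M'
    equipment: str       # element of E
    operator: str        # element of O
    low: Fraction        # rational end points of the reported interval
    high: Fraction

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high

theory = theoretical_value("hydrogen atom, ground state energy", "eV")

# Hypothetical reading: every field below is invented for illustration.
reading = ExperimentalMagnitude(
    system="hydrogen atom, ground state energy",
    units="eV",
    method="hypothetical spectroscopic method",
    method_theory="hypothetical theory of that method",
    equipment="hypothetical spectrometer",
    operator="hypothetical experimenter",
    low=Fraction(-13610, 1000),
    high=Fraction(-13600, 1000),
)
print(theory, reading.contains(theory))
```

The theoretical mapping needs two arguments and returns a single real number; the experimental one needs many more arguments and returns an interval with rational end points, so the two can only be compared, never identified.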
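Appendix 15.A

Standard Deviations of Conjugate Variables

If $\psi(q)$ belongs to the domain of Hermiticity of $p_q$, then the expressions for $\Delta q$ and $\Delta p_q$ take on a more symmetrical form:

$$(\Delta p_q)^2 = \int_a^b \left( \left\{ \left(-i\frac{\partial}{\partial q}\right) - \langle p_q \rangle \right\} \psi(q) \right)^{*} \left( \left\{ \left(-i\frac{\partial}{\partial q}\right) - \langle p_q \rangle \right\} \psi(q) \right) dq$$

i.e.

$$\Delta p_q = \left( \int_a^b \left| \left\{ \left(-i\frac{\partial}{\partial q}\right) - \langle p_q \rangle \right\} \psi(q) \right|^2 dq \right)^{1/2}$$

and

$$\Delta q = \left( \int_a^b \left| \{ q - \langle q \rangle \}\, \psi(q) \right|^2 dq \right)^{1/2}$$

The two quantities are the moduli of two functions so we can make use of a version of the Schwarz inequality:

$$|x|\,|y| \ge |x \cdot y|$$

and, since $\mathrm{Im}(x \cdot y) \le |x \cdot y|$,

$$|x|\,|y| \ge \mathrm{Im}(x \cdot y)$$

to give:

$$\Delta p_q\, \Delta q \ge \left| \int_a^b \{ (q - \langle q \rangle)\, \psi(q) \}^{*} \left\{ \left(-i\frac{\partial}{\partial q}\right) - \langle p_q \rangle \right\} \psi(q)\, dq \right|$$

then

$$\Delta p_q\, \Delta q \ge \mathrm{Im} \int_a^b \{ q - \langle q \rangle \}\, \psi^*(q) \left\{ \left(-i\frac{\partial}{\partial q}\right) - \langle p_q \rangle \right\} \psi(q)\, dq$$

i.e.

$$\Delta p_q\, \Delta q \ge \mathrm{Im} \int_a^b \psi^*(q)\, q \left(-i\frac{\partial}{\partial q}\right) \psi(q)\, dq$$

so that

$$\Delta p_q\, \Delta q \ge \frac{1}{2i} \int_a^b \psi^*(q)\, [q, p_q]\, \psi(q)\, dq$$

since some of the integrals in the starting expression are real numbers and do not contribute to the imaginary part, and the right-hand side of the final expression above has been simplified using the commutation relationship of the conjugate pair $q$, $p_q$. [13] Finally, since the explicit form of the commutator of the conjugate pair is:

$$\left[ q, -i\hbar\frac{\partial}{\partial q} \right] = i\hbar$$

we have the result that

$$\Delta p_q\, \Delta q \ge \frac{\hbar}{2} \tag{15.A.1}$$

returning to standard units so that Planck's constant appears in the conventional expression. It is worth noting that the above derivation assumes that the function $q\psi(q)$ belongs to the set of functions for which $p_q = -i\,\partial/\partial q$ is Hermitian. If this condition does not hold there is a boundary-value term to add to the right-hand side of (15.A.1).

[13] See Section 10.1.1 on page 198.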
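The steps of this derivation can be checked numerically for a particular state function. The sketch below is not part of the original argument: it assumes the trial function $\psi(q) \propto q\,e^{-q^2/2}$ (chosen because it is not a minimum-uncertainty state), works in units where $\hbar = 1$, and uses a finite grid with central differences, so all numbers are approximate. The names `psi`, `integrate`, `GRID` and the grid parameters are illustrative choices.

```python
import math

def psi(q):
    """Unnormalised trial state function (an assumed example, real and odd in q)."""
    return q * math.exp(-0.5 * q * q)

A, B, N = -10.0, 10.0, 4001          # integration range and number of grid points
H = (B - A) / (N - 1)
GRID = [A + i * H for i in range(N)]

def integrate(values):
    """Trapezoidal rule on the uniform grid; works for real or complex values."""
    return H * (sum(values) - 0.5 * (values[0] + values[-1]))

norm = math.sqrt(integrate([psi(q) ** 2 for q in GRID]))
f = [psi(q) / norm for q in GRID]                                  # normalised psi
df = [(psi(q + H) - psi(q - H)) / (2.0 * H * norm) for q in GRID]  # d psi / dq

mean_q = integrate([q * v * v for q, v in zip(GRID, f)])
mean_p = integrate([v * (-1j * d) for v, d in zip(f, df)]).real

# x = (q - <q>) psi and y = (p_q - <p_q>) psi, with p_q = -i d/dq
x = [(q - mean_q) * v for q, v in zip(GRID, f)]
y = [-1j * d - mean_p * v for v, d in zip(f, df)]

delta_q = math.sqrt(integrate([abs(xi) ** 2 for xi in x]))
delta_p = math.sqrt(integrate([abs(yi) ** 2 for yi in y]))
overlap = integrate([xi * yi for xi, yi in zip(x, y)])  # x is real, so x* = x

print(f"delta_q * delta_p = {delta_q * delta_p:.4f}")
print(f"Im(x . y)         = {overlap.imag:.4f}")
```

For this trial function the product of the standard deviations comes out close to 3/2, while the imaginary part of the overlap comes out close to 1/2, the value fixed by the commutator, so the chain of inequalities holds with room to spare.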
Chapter 16
Paradoxes
Here is a miscellany of matters which have caused difficulties of interpretation; some are soluble, some are not.
Contents
16.1. The Classical Limit 291
16.1.1. The Ehrenfest Relations 293
16.2. The Einstein-Podolsky-Rosen (EPR) Paradox 294
16.2.1. The EPR Original 295
16.2.2. Bohm's Modification 297
16.2.3. Bell's Inequality and Theorem 298
16.3. Bell's Assumptions 300
16.3.1. Lessons from EPR 303
16.3.2. Density of Spin and EPR 304
16.4. Zero-Point Energy 307

16.1. The Classical Limit
The satisfying thing about disproof is that it is often so easy; one has simply to produce a counter-example. Schrodinger's mechanics is a probabilistic theory; in particular it is a theory whose referents are abstract objects, objects for which not all the properties have definite values. The values of some of the properties are given by probability distributions which may only be verified by statistical methods involving measurements at random. A question which naturally arises is "how do classical (particle) mechanics and quantum mechanics fit together?". Specifically, we know that Newtonian mechanics is capable of solving the Kepler problem at the astronomical level perfectly and Schrodinger's mechanics generates agreement with experiment for the hydrogen atom of comparable quality. How and where do these mechanics join up? When do the probabilistic predictions of Schrodinger's mechanics go over into the definite trajectories of Newton's theory?

Questions of the relationship between macroscopic and microscopic mechanics can often be resolved by looking at the most direct link between them: the Hamilton-Jacobi theory. As we saw in Chapter 4, the referent of the H-J equation is the set of all possible trajectories consistent with a given Hamiltonian. The referent of the Schrodinger equation is the abstract object which has no position; the position of the particle is given by a probability distribution. Both the solution $S$ of the H-J equation and the solution $\psi$ of the Schrodinger equation are (for a single particle in 3-space) functions of three-dimensional space; they have values almost everywhere in 3-space. In the case of the H-J equation, however, we have seen that it is possible to generate (all) the particle trajectories from the solution $S$. This is because the H-J equation was developed from Hamilton's mechanics, which did generate the trajectories. For Schrodinger's mechanics there is no such recipe. The particle trajectories are not even defined in Schrodinger's mechanics so there can be no way of recovering them from the function $\psi$ of 3-space. There can, therefore, be no question of the classical limit of the solutions of the Schrodinger equation being individual particle trajectories, since these trajectories are curves in 3-space (in the case of the Kepler problem, planar curves). No amount of taking limits in the case of very small $\hbar$ or taking the limits of large quantum numbers can alter this fact; there is no way of extracting the trajectories from Schrodinger's mechanics. [1]

It should be said, however, that just because Schrodinger's mechanics does not tell us what the trajectories are does not mean that there are no trajectories; the referent of quantum theory is the abstract object; statistical measurements to verify this theory are carried out on randomly-occurring concrete objects which (presumably) do have trajectories. The analogy here is with the probability distribution for the deflection angle of the (classical) pendulum of Section 2.4; the trajectories cannot be obtained from the distribution function but there is absolutely no doubt that these trajectories exist.

It is even doubtful if Schrodinger's mechanics goes over smoothly into statistical mechanics, which is the obvious classical limit, rather than (individual) particle mechanics. Two examples spring to mind:

• The large-quantum-number limit of the harmonic oscillator described in Section 12.4 looks plausible but what is the role of all those oscillations in the probability distribution? This situation always obtains when looking at the large-quantum-number limit of Schrodinger's mechanics.
• The ground state of the quantum Kepler problem (the hydrogen atom) is a state with zero angular momentum; there is a finite probability that the electron be present at the nucleus. The analogous family of solutions of the Hamilton-Jacobi equation have the same property; they are all trajectories in which the electron collides with the nucleus. [2]

[1] It is not obvious to me that a particular probability distribution cannot be generated from more than one set of trajectories.
16.1.1. The Ehrenfest Relations
It is possible to show that, for the typical time-independent Schrodinger equation,

$$\left( -\frac{1}{2m}\nabla^2 + V(x, y, z) \right) \psi(x, y, z) = E\,\psi(x, y, z) \tag{16.1.1}$$

there exist certain relationships amongst the mean values which look very similar to the equations of classical mechanics. If we define a force (vector) in the familiar way by

$$\vec{F} = -\nabla V(x, y, z) = -\nabla V(\vec{r}) \quad \text{(say)}$$

then Ehrenfest showed, by using the general theorem for the time evolution of the mean value of a dynamical variable, that

$$\frac{d}{dt}\langle \vec{p} \rangle = -\langle \nabla V \rangle = \langle \vec{F} \rangle, \qquad m\frac{d}{dt}\langle \vec{r} \rangle = \langle \vec{p} \rangle \tag{16.1.2}$$

and so

$$\langle \vec{F} \rangle = m\frac{d^2}{dt^2}\langle \vec{r} \rangle$$

a "mean-value" analogue of Newton's law. Unfortunately, although true for certain limited cases (for example, when $\vec{F}(\vec{r}) = \vec{F}(\langle \vec{r} \rangle)$), the equations are without content since:

• The mean values of the coordinates used to construct $\langle \vec{r} \rangle$ are independent of time for solutions of (16.1.1):

$$\frac{d}{dt}\langle \vec{r} \rangle = \frac{d}{dt} \int \psi^*(x, y, z)\, \vec{r}\, \psi(x, y, z)\, dV = 0.$$

• The solutions of (16.1.1) are real ($\psi^* = \psi$) [3] so, as we have seen in Section 9.3, the mean values of all momentum components are zero:

$$\langle \vec{p} \rangle = \int \psi(x, y, z)\, \vec{p}\, \psi(x, y, z)\, dV = 0$$

so that $\frac{d}{dt}\langle \vec{p} \rangle = 0$.

• The only cases when $\vec{F}(\vec{r}) = \vec{F}(\langle \vec{r} \rangle)$ is satisfied are when $V$ is zero or linear in $\vec{r}$ so that $\vec{F}$ is, at best, a constant.

Thus, all of the equations (16.1.2) degenerate into 0 = 0 and, while true enough, certainly do not provide any support for the idea that classical mechanics is a limiting case of Schrodinger's mechanics.

[2] All the "s"-states have the same property.
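The vanishing of the mean momentum for a real stationary state can be checked on a simple case. The sketch below assumes a one-dimensional particle in a box of length `L` with state function $\sqrt{2/L}\,\sin(n\pi x/L)$, in units where $\hbar = 1$; this system, the values of `L`, `n` and `N`, and the function names are illustrative choices, not taken from the text. The integrals are evaluated with the midpoint rule.

```python
import math

L = 1.0     # box length (illustrative)
n = 3       # quantum number (illustrative)
N = 2000    # number of midpoint-rule intervals

def psi(x):
    """Real stationary state of the particle in a box."""
    return math.sqrt(2.0 / L) * math.sin(n * math.pi * x / L)

def dpsi(x):
    """Analytic derivative d psi / dx."""
    return math.sqrt(2.0 / L) * (n * math.pi / L) * math.cos(n * math.pi * x / L)

h = L / N
xs = [(i + 0.5) * h for i in range(N)]

norm = h * sum(psi(x) ** 2 for x in xs)                # should be 1
mean_x = h * sum(x * psi(x) ** 2 for x in xs)          # constant in time: L / 2
mean_p = -1j * h * sum(psi(x) * dpsi(x) for x in xs)   # <p> with p = -i d/dx

print(f"norm = {norm:.6f}, <x> = {mean_x:.6f}, |<p>| = {abs(mean_p):.2e}")
```

The integrand of the momentum integral is half the derivative of $\psi^2$, which vanishes at both walls, so the mean momentum is zero for every `n`; the mean position contains no time dependence at all, and both sides of the first Ehrenfest relation are therefore zero.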
16.2. The Einstein-Podolsky-Rosen (EPR) Paradox
This is, perhaps, the most celebrated of the quantum-mechanical paradoxes since, unlike Schrodinger's cat, it is about the properties of micro-systems to which Schrodinger's mechanics actually refers. It is also the source of the great mystery of transmission of information at speeds greater than the velocity of light. In its original form as published by EPR, [4] the paradox involved the rectilinear motion of two free particles with constant total momentum. Later, Bohm and Aharonov [5] refined the model by using the spin of two particles with constant spin angular momentum. In making this apparent simplification of the model (and making it, at least in principle, accessible to experimental verification) Bohm unwittingly combined the two things which have caused most confusion in the interpretation of quantum theory: probability and angular momentum. Let's look at these things one at a time by examining EPR's original argument and then addressing Bohm's variant and subsequent developments due to J. S. Bell. [6]

[3] Even if $E$ is degenerate in (16.1.1), $\langle \vec{p} \rangle$ is independent of time.
[4] A. Einstein, B. Podolsky & N. Rosen, Phys. Rev. 47 (1935) pp. 777-780.
[5] D. Bohm and Y. Aharonov, Phys. Rev. 108, p. 1070 (1957).

16.2.1. The EPR Original
The title of this paper is "Can Quantum-Mechanical Description of Physical Reality Be Considered Complete?" and it is worth quoting the opening sentence of this celebrated paper and later material:

• "Any serious consideration of a physical theory must take into account the distinction between objective reality, which is independent of any theory, and the physical concepts with which the theory operates."
• "Whatever the meaning assigned to the term complete, the following requirement seems to be a necessary one: every element of the physical reality must have a counterpart in the physical theory." (emphasis in original)

and later, as a working definition of "reality" they say:

"If, without in any way disturbing a system, we can predict with certainty (i.e., with probability equal to unity) the value of a physical quantity, then there exists an element of physical reality corresponding to this physical quantity."

The first of these is a statement of Materialism about as concise as Lenin's on page 6. The second pair is more problematical, dealing as we are in Schrodinger's mechanics with a probabilistic theory. Although the sense of the second pair of quotes is clear, they do not stand up to close scrutiny; in particular the initial statement is back to front. I would rather say "every element of the physical theory must have a counterpart in physical reality", which is a re-statement of the position taken in Part 1 unless, of course, one requires the impossible: a theory of everything. But it is the statement about prediction which cannot go unchallenged. If the statement that only those physical quantities whose values may be predicted with certainty can be judged to be "elements of physical reality" is true, then much of science is concerned with imaginary quantities.
6 Physics 1, p. 195 (1964).

EPR use the fact that the momentum operator and the conjugate coordinate "operator" (multiplication by the coordinate) do not commute to deduce that "when the momentum of a particle is known, its coordinate has no reality". It is also clear, from even the most casual reading of EPR's argument, that they have in mind what I have called the "colloquial" interpretation of probability; they are clearly of
the opinion that probabilities apply to concrete objects since they say, for example, "we can only say that the relative probability that a measurement of a coordinate will give a result ..." (my emphasis).

There are all kinds of confusions here and it would be fatuous of me to imply any criticism of the work of these pioneers; instead let me re-state what is meant by probability and its relationship to measurements:

• Probabilities are the relative measures of sets which are subsets of a function with domain an abstract object and range the real numbers.
• Probabilities give no information whatsoever on the properties or behaviour of individual concrete objects.
• Verifications of probabilities must be numerical quadrature approximations to these measures and are obtained by making measurements 7 on concrete objects which have the properties of the abstract object.
• Statistical verifications of probabilities are meaningful only if the measurements are made at random on concrete objects so that the quadrature samples the set adequately.

The system which EPR consider is a pair of particles which may have interacted and have subsequently been separated by their motion but which still have the same constant combined momentum ($p_1 + p_2 = p$, say). The referent of Schrodinger's mechanics here is the abstract object "a pair of particles with constant total momentum $p$". If one prepares a suitable set of concrete objects of this type and subjects these objects to random tests (or, more realistically, makes measurements of the concrete objects as they turn up), then the predictions which Schrodinger's mechanics makes for the abstract object may be tested. The most interesting properties are those which Schrodinger's mechanics predicts should be independent of which concrete object is tested; the quantity is a numerical value of some property of the abstract object and so all concrete objects will have this value.
7 Again, the collision of nomenclature of "measure" and "measurement".

In particular, one might measure the value of the momentum of one of the particles ($p_1$, say) in a random test. Having measured this value and knowing that the total momentum of the pair is $p$, it is not too much of a strain to deduce that the momentum of the other concrete particle is $p_2 = p - p_1$. What one can deduce from this random measurement is that, somewhere out there, there is a concrete particle with momentum $p_2$. This knowledge has no consequences for further random measurements of
the momenta of the particles on other concrete objects of the same kind, because it is the essence of random tests that one does not know the time of measurement or the identity of the concrete system on which the measurement is made. Of course, if one had the technology and patience to know the position of the other particle, one could check that its momentum was indeed $p_2$. But this would not qualify as a random measurement and so does not qualify as a means of verifying the predictions of Schrodinger's mechanics. It is as if one were to check the probabilities of the presence of balls in a bag containing one white and one black ball by withdrawing balls at random from a large number of such bags and cheat by not returning a white ball to a particular bag and marking that bag.

The experiment of predicting the momentum of a distant particle by measuring the momentum of a nearby one and using the known constant sum is, of course, completely possible and entirely trivial, but, like the experiments with standard dice discussed in Section 2.2.4, it has nothing to do with probabilities or Schrodinger's mechanics.

The point of the EPR paper, however, was not particularly to get the probability interpretation right but to try to show that, because one could predict the momentum of a distant object, the description of the system could be made more precise and that Schrodinger's mechanics is not a "complete" description of reality. Only incidentally was the question of mysterious "action at a distance" involved. This aspect is much more carefully put in Bell's considerations of Bohm's modification of the EPR thought experiment.
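The bag analogy can be made concrete with a small simulation. The following Python sketch is an illustration only and is not part of the original argument; the function name `draw_from_bags`, the fixed seed and the number of bags are all assumptions chosen for the example. It estimates the probability of withdrawing a white ball by random tests on many bags and, separately, records that within any one bag the colour of the remaining ball follows with certainty from the ball withdrawn.

```python
import random


def draw_from_bags(n_bags, seed=0):
    """Draw one ball at random from each of n_bags bags (one white, one black ball each).

    Returns the relative frequency of white draws and a flag saying whether,
    in every bag, the remaining ball was the opposite colour to the one drawn.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) so that the run is repeatable
    white_draws = 0
    complement_always_holds = True
    for _ in range(n_bags):
        bag = ["white", "black"]
        drawn = bag.pop(rng.randrange(2))  # the random test on this concrete bag
        remaining = bag[0]                 # deduced from the draw, not sampled
        if drawn == "white":
            white_draws += 1
        if remaining == drawn:
            complement_always_holds = False
    return white_draws / n_bags, complement_always_holds


frequency, complement_ok = draw_from_bags(100_000)
print(f"relative frequency of white: {frequency:.3f}")
print(f"remaining ball always the other colour: {complement_ok}")
```

The frequency is a statistical estimate of a probability for the abstract object "bag", obtained from random tests; the second output is a deduction about each concrete bag and would be the same whatever value the frequency took.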
16.2.2. Bohm's Modification
The problem with the EPR experiment is that there is no possibility of checking it experimentally; Bohm solved this problem very neatly by replacing the linear momenta by spin angular momenta. 8 Two particles of spin 1/2 (electrons, say) will combine together in four ways and, in one of them, the total (spin) angular momentum is zero and the two electrons have opposite spin (the singlet state) and, therefore, the one component of angular momentum which may be chosen is also zero. Clearly the same argument as used in EPR will apply; if two electrons interact 9 to form a singlet spin state and then become separated with no changes to the spins of the particles, then, on finding a "spin up" particle one can say that somewhere there is a "spin down" particle. Unlike EPR, this prediction is, at least in principle, experimentally testable. The same arguments relating to probability also apply; the prediction of the spin direction of a concrete particle is only possible if the criteria of randomness are violated.

8 By replacing EPR's explicit problem in Schrodinger's mechanics by a mathematically convenient algebraic problem Bohm, in the context of discussions of "hidden variables", ironically introduces a set of hidden variables since the forces of interaction between electron spins are overwhelmingly due to electron repulsion and the Pauli principle associated with the charge and spatial distribution of the electrons; both of which are hidden from the spin-only algebra.

16.2.3. Bell's Inequality and Theorem
The entire discussion of Bell's inequality and theorem is carried out in terms of measurements of properties of the separate components of a two-component system, so I will follow this tradition in presenting the development and then see if the interpretation can be made compatible with Schrodinger's mechanics as I have presented it.

One considers a two-component system and an associated pair of instruments, each of which can measure the value of a two-valued variable of one of those components which, conventionally, are well-separated in space. Although the argument is general it is clearly based on the measurement of spin components in a particular direction; I give this concrete picture in brackets where appropriate. Each of the instruments has the possibility of making a measurement, depending on the value of its instrumental settings (spin component in a particular direction in the laboratory). The setting of the first machine will be denoted by $\mathbf{a}$ and of the second by $\mathbf{b}$ (vector notation because, in the case of spin, the setting is a direction). The result of the measurement ($\pm 1$, say, for each of the two components) clearly may depend on:

• The properties of the relevant component of the two-component system; here assumed to be fixed.
• The value of the relevant one of $\mathbf{a}$ or $\mathbf{b}$.
• Plus, possibly, any other parameters of the systems or instruments that we have ignored or do not know about; let them be formally and collectively called $\lambda$.

So, one assumes the existence of a pair of functions $A(\mathbf{a}, \lambda)$, $B(\mathbf{b}, \lambda)$ which determine the value of the measurement on each component. It is assumed that the measurement of the first component determined by $A(\mathbf{a}, \lambda)$ does not depend on $\mathbf{b}$ (the instrumental setting of the machine used to make the measurement on the other system) and vice versa; the measurements are independent of the setup of other instruments (the magnets are sufficiently remote to exclude interference in the spin measurements).

Since one can, by assumption, have no knowledge of the nature or possible values of the parameters $\lambda$ the best one can do is assume that their values are governed by a probability distribution function $\rho(\lambda)$ which may be as diffuse or spiky as one likes provided it satisfies the usual requirements for a probability distribution: principally $\rho(\lambda) \ge 0$ and $\int \rho(\lambda)\,d\lambda = 1$.

The purpose of the exercise is to see what can be said about the correlation coefficient $C_{\mathbf{a},\mathbf{b}}$ connecting the two measurements: to what extent are the two measurements related? In this context, Bell defines the correlation coefficient as

$$C_{\mathbf{a},\mathbf{b}} = \int A(\mathbf{a},\lambda)\,B(\mathbf{b},\lambda)\,\rho(\lambda)\,d\lambda \qquad (16.2.3)$$

and uses two simple facts:

1. The original (spin-motivated) assumption $A(\mathbf{a},\lambda) = \pm 1$, $B(\mathbf{b},\lambda) = \pm 1$ is replaced by a more "continuous", related assumption: $|A(\mathbf{a},\lambda)| \le 1$, $|B(\mathbf{b},\lambda)| \le 1$.
2. Two possible settings for both of the instruments, $(\mathbf{a}, \mathbf{b})$ and $(\mathbf{a}', \mathbf{b}')$, giving four possible correlation coefficients: $C_{\mathbf{a},\mathbf{b}}$, $C_{\mathbf{a},\mathbf{b}'}$, $C_{\mathbf{a}',\mathbf{b}}$ and $C_{\mathbf{a}',\mathbf{b}'}$

to derive a simple result:

$$|C_{\mathbf{a},\mathbf{b}} - C_{\mathbf{a},\mathbf{b}'}| + |C_{\mathbf{a}',\mathbf{b}'} + C_{\mathbf{a}',\mathbf{b}}| \le 2 \qquad (16.2.4)$$

which is known as "Bell's inequality". It is straightforward to show that, if the example of two electrons in a singlet spin state is used to compute the correlation coefficient for the directional spin components, the result is not in agreement with Bell's inequality. This particular result, that the singlet state violates equation (16.2.4), is known as "Bell's theorem". If it were true it would be rather tiresome for Schrodinger's mechanics. The assumptions underlying the derivation 10 are therefore worth a closer look.

9 Strictly speaking, any two electrons must always be in a definite spin state, whether they interact or not.
10 The derivation itself is extremely simple and is beyond reproach.
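The disagreement between the singlet-state correlations and inequality (16.2.4) can be checked numerically. The Python sketch below is an illustration under stated assumptions rather than the author's own calculation: it takes the correlation coefficient to be the singlet-state mean of the product of the two (+1/-1) spin results, restricts the directions to the x-z plane, and uses one commonly quoted set of four angles. The names `spin_along` and `correlation` are hypothetical helpers introduced here.

```python
import numpy as np

# Pauli matrices; their eigenvalues are the two-valued results +1 and -1.
SIGMA_X = np.array([[0, 1], [1, 0]], dtype=complex)
SIGMA_Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Two-electron singlet spin function (alpha beta - beta alpha) / sqrt(2).
ALPHA = np.array([1, 0], dtype=complex)
BETA = np.array([0, 1], dtype=complex)
SINGLET = (np.kron(ALPHA, BETA) - np.kron(BETA, ALPHA)) / np.sqrt(2)


def spin_along(theta):
    """Two-valued spin observable along a direction at angle theta in the x-z plane."""
    return np.sin(theta) * SIGMA_X + np.cos(theta) * SIGMA_Z


def correlation(theta_a, theta_b):
    """Singlet-state mean of the product of the two +1/-1 spin results."""
    product = np.kron(spin_along(theta_a), spin_along(theta_b))
    return float(np.real(np.vdot(SINGLET, product @ SINGLET)))


# Instrument settings (an assumption: a standard choice giving a large value).
a, a_prime = 0.0, np.pi / 2
b, b_prime = np.pi / 4, 3 * np.pi / 4

lhs = (abs(correlation(a, b) - correlation(a, b_prime))
       + abs(correlation(a_prime, b_prime) + correlation(a_prime, b)))
print(f"left-hand side of (16.2.4) for the singlet: {lhs:.4f}")
```

For these settings each correlation has magnitude $1/\sqrt{2}$ and the left-hand side comes to $2\sqrt{2}$, which exceeds the bound of 2.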
16.3. Bell's Assumptions
First of all, if the proof is to concern the real world we should discard all the assumptions about measuring instruments; there is no theory of measuring instruments (for classical or quantum systems) nor will there ever be since the type, construction and operation of such machines is irrelevant to the theory. 11 If one pair of instruments were to give different correlation coefficients to another pair (of completely different construction but designed to do the same job) then the experimental physicist would be inclined to a more robust view than assume that some unknown parameters were at work; he would demand his money back from the manufacturers. The key thing here is the values of the variables that the components have, not the means of measuring them.

11 Provided they are competently constructed and properly operated.

I therefore make the realistic assumption that the quantities $\mathbf{a}$ and $\mathbf{b}$, etc., are not parameters of the instruments but, for example, coordinate systems used in the theoretical description of the components and so forth. So, in the realistic interpretation, the functions $A(\mathbf{a}, \lambda)$, $B(\mathbf{b}, \lambda)$, etc. depend only on the components of the system; that is, the parameters $\lambda$ are, in the traditional terminology, "hidden variables"; properties of the system under investigation which we do not know about. These considerations are, of course, dependent on one's interpretation of science and do not affect the substance of the derivation.

The only possible flaw in the argument of Bell is in the assumptions made in setting up the original definition of dependency through the correlation coefficient; mathematics is a machine, it can only transform truths, not generate them. Bell's definition of the correlation coefficient (equation (16.2.3)):

$$C_{\mathbf{a},\mathbf{b}} = \int A(\mathbf{a},\lambda)\,B(\mathbf{b},\lambda)\,\rho(\lambda)\,d\lambda$$

is not the standard one used in probability theory because it does not have one of the most important properties of a quantity to measure dependence: it does not become zero when the quantities are independent. This is not crucial in itself, but when we begin to think about its implications the nature of the flawed logic becomes clear. The standard definition of the correlation coefficient is:

$$E_{\mathbf{a},\mathbf{b}} = \frac{C_{\mathbf{a},\mathbf{b}} - \langle A\rangle_{\mathbf{a}}\langle B\rangle_{\mathbf{b}}}{\sigma_{A_{\mathbf{a}}}\,\sigma_{B_{\mathbf{b}}}} \qquad (16.3.5)$$
where the $\sigma$s are standard deviations which are used to generate a dimensionless result for the coefficient. This coefficient is zero when $C_{\mathbf{a},\mathbf{b}}$ is merely the product of the means of the separate distributions for $A$ and $B$. But how are we to evaluate these means? They are, presumably, in the spirit of Bell's assumptions and derivation

$$\langle A\rangle_{\mathbf{a}} = \int A(\mathbf{a},\lambda)\,\rho(\lambda)\,d\lambda \qquad \langle B\rangle_{\mathbf{b}} = \int B(\mathbf{b},\lambda)\,\rho(\lambda)\,d\lambda. \qquad (16.3.6)$$

But, without further exegesis, this means that, unlike the quantities $\mathbf{a}$ and $\mathbf{b}$, the hidden variables collectively called $\lambda$ are assumed not to separate into a group associated with $A$ and a group associated with $B$; there are just two possibilities:

1. Notwithstanding the generic notation $\lambda$ for the whole group of them and the generic distribution function $\rho$, in fact the hidden variables do separate into a group associated with $A$ ($\lambda_a$, say) and a group associated with $B$ ($\lambda_b$, say) with no "cross terms". That is, any hidden variables there may be, although unknown to us, are, like the parameters $\mathbf{a}$ and $\mathbf{b}$, associated with each component separately. The hidden variables associated with one of the components do not affect the properties of the other remote component. In this case the above equations become simply

$$\langle A\rangle_{\mathbf{a}} = \int A(\mathbf{a},\lambda_a)\,\rho_a(\lambda_a)\,d\lambda_a \qquad \langle B\rangle_{\mathbf{b}} = \int B(\mathbf{b},\lambda_b)\,\rho_b(\lambda_b)\,d\lambda_b \qquad (16.3.7)$$

where

$$\rho_a(\lambda_a) = \int \rho(\lambda)\,d\lambda_b \qquad \rho_b(\lambda_b) = \int \rho(\lambda)\,d\lambda_a .$$

2. The opposite: the properties of each component separately depend on all the hidden variables. In this case their average values are given by the original definition, equation (16.3.6). Notice the generality of this assumption, particularly that time and distance are not mentioned here, so the variables $\lambda$ are not merely global, they are, in principle, given the power to influence the properties of both components in any fashion
whatsoever. In particular they may act instantly and with any range so that in order to calculate the mean of either $A$ or $B$ it is necessary to use all the unknown parameters $\lambda$ associated with the whole of the two-component system.

Now, consider the consequences of these two assumptions for the interpretation of Bell's inequality and theorem. If we look at the two cases separately, then:

1. This is the most physically attractive case and the easiest to deal with. Going back to Bell's original definition of the correlation coefficient and using the partition of the hidden variables into the two "local" sets, with $\rho(\lambda) = \rho_a(\lambda_a)\rho_b(\lambda_b)$, we have:

$$\begin{aligned}
C_{\mathbf{a},\mathbf{b}} &= \int A(\mathbf{a},\lambda)\,B(\mathbf{b},\lambda)\,\rho(\lambda)\,d\lambda \\
&= \iint A(\mathbf{a},\lambda_a)\,B(\mathbf{b},\lambda_b)\,\rho_a(\lambda_a)\,\rho_b(\lambda_b)\,d\lambda_a\,d\lambda_b \\
&= \left(\int A(\mathbf{a},\lambda_a)\,\rho_a(\lambda_a)\,d\lambda_a\right) \times \left(\int B(\mathbf{b},\lambda_b)\,\rho_b(\lambda_b)\,d\lambda_b\right)
\end{aligned}$$

which is nothing but the product of the two means and the corresponding standard correlation coefficient would be zero. Bell's proof is unaffected and the inequality still holds, but all the correlation coefficients in it are zero and we are left with the reassuring result that 0 < 2.

2. In the second case, the proof still holds as in the original case since nothing is changed except that there is now no mystery about the result that the two quantities are correlated because it is specifically assumed that the hidden variables of one component can influence the properties of the other component and vice versa. Whether this assumption is realistic or not is a matter of experiment, not theory.

As usual, with hindsight, one should have been able to see this from the very beginning, since in the integrand of $C_{\mathbf{a},\mathbf{b}}$, namely $A(\mathbf{a},\lambda)\,B(\mathbf{b},\lambda)\,\rho(\lambda)$, the only possible source of the correlation is via the parameters $\lambda$; everything else is "local".
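The two cases can be compared in a small numerical experiment. The sketch below is only an illustration: the two-valued result functions, the non-uniform distribution and the grid size are hypothetical choices made for the example and appear nowhere in Bell's argument or in the text. Case 1 uses separate hidden variables for the two components with a product distribution; case 2 lets both results depend on one shared hidden variable.

```python
import numpy as np

N = 720  # quadrature points per hidden variable (an arbitrary choice)
LAM = (np.arange(N) + 0.5) * 2.0 * np.pi / N  # midpoint grid on [0, 2*pi)
RHO = 1.0 + 0.5 * np.cos(LAM)                 # hypothetical non-uniform distribution
RHO /= RHO.sum()                              # quadrature weights summing to one


def result_a(setting, lam):
    """Hypothetical +1/-1 result for the first component."""
    return np.where(np.cos(setting - lam) >= 0.0, 1.0, -1.0)


def result_b(setting, lam):
    """Hypothetical +1/-1 result for the second component."""
    return -result_a(setting, lam)


def c_local(a, b):
    """Case 1: separate variables lam_a, lam_b with rho = rho_a * rho_b."""
    weighted_a = RHO * result_a(a, LAM)
    weighted_b = RHO * result_b(b, LAM)
    c_ab = np.sum(np.outer(weighted_a, weighted_b))  # double quadrature
    return c_ab, weighted_a.sum(), weighted_b.sum()


def c_global(a, b):
    """Case 2: both results depend on the same hidden variable lam."""
    return np.sum(RHO * result_a(a, LAM) * result_b(b, LAM))


a, a_prime, b, b_prime = 0.0, np.pi / 2, np.pi / 4, 3 * np.pi / 4

c_ab, mean_a, mean_b = c_local(a, b)
covariance_local = c_ab - mean_a * mean_b
covariance_global = c_global(a, b) - mean_a * mean_b
lhs_global = (abs(c_global(a, b) - c_global(a, b_prime))
              + abs(c_global(a_prime, b_prime) + c_global(a_prime, b)))

print(f"case 1: C - <A><B> = {covariance_local:.2e}")
print(f"case 2: C - <A><B> = {covariance_global:.3f}")
print(f"case 2: left-hand side of (16.2.4) = {lhs_global:.3f}")
```

In case 1 the quantity $C_{\mathbf{a},\mathbf{b}} - \langle A\rangle_{\mathbf{a}}\langle B\rangle_{\mathbf{b}}$ vanishes to rounding error, as the factorisation requires. In case 2 it does not, while the left-hand side of (16.2.4) stays within the bound of 2.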
16.3.1. Lessons from EPR
It is not surprising that very little can be said about the physical world from the extremely general assumptions which Bell made. If the physical interpretation of the function $\rho(\lambda)$ is not made, the integrals defining the correlation coefficients are nothing more than certain measures of products of functions $A$ and $B$ with $\rho$ as the scale factor of the variable $\lambda$ and all that can be obtained in these circumstances is a relationship reminiscent of the Schwarz ("triangle") inequality. Using the interpretation of $\lambda$ as hidden variables again cannot tell us anything about interactions in the real world until we say what the scope of these unknown variables is. If they are unknown and local to each component, they cannot induce any correlations between the two components; if they are unknown and global then they can induce such correlations simply because they are assumed to do so by the decision that they be global.

Huge excitement has been generated by putting a particular interpretation on the inequality (16.2.4 on page 299). In setting up the expression for the correlation coefficient (16.2.3 on page 299) the quantities $\mathbf{a}$ and $\mathbf{b}$ were specifically assumed to be local to the two subsystems, and yet the result, as we have seen, produces a non-local correlation. This result is then said to violate Einstein's principle of locality where a measurement $A(\mathbf{a}, \lambda)$ does not depend on $\mathbf{b}$. But this is to completely ignore the qualitative properties of the quantities $\lambda$ on which $A(\mathbf{a}, \lambda)$ does depend. If it is not said whether these quantities are local or not, then any result involving them has no bearing whatsoever on Einstein's common-sense principle. One can hardly blame Einstein if only the known parameters ($\mathbf{a}$, $\mathbf{b}$) are local and there are some unknown (hidden) ones ($\lambda$) which are not. As we have seen, if the hidden parameters are local, then Einstein's principle is trivially true and, if they are global, the principle is irrelevant because the question of locality does not arise.
In the case of the spin eigenfunction of two electrons (say) we know exactly what the major forces of interaction between the spins are and we also know that they are "hidden" in the sense that their origin lies outside the system of "spin variables" used to describe the system. They are due to a combination of Coulomb's law between charges carried by the particles and their environment and the Pauli principle for those particles as we saw in Section 14.3 on page 267. In this enlarged context it is not at all surprising that the experimental determination of correlations between the spins in a
singlet (or any other) spin state is difficult when the components are widely separated; the "spin-spin" interaction falls off exponentially with distance between the particles and is, for all practical purposes, zero long before the particles are a millimeter apart. Any residual interaction between the two spins will be completely overwhelmed as soon as one or other of the target electrons gets a whiff of any other Coulombic interaction with any single electron.

However, I am not concerned here with the practicalities of spin-component determination, but with combatting the idea that the assumption of a set of hidden variables ($\lambda$) with supernatural powers (the ability to influence the properties of a remote system instantly) has any consequences for Schrodinger's mechanics. As we have seen, any results which conflict with quantum theory or the special theory of relativity or any other known scientific theory come from the assumptions made about the scope of the hidden variables $\lambda$. If such variables associated with one component are required in order to calculate the properties of the other component without regard for distance and time lapse, then we should not be too surprised if accepted scientific laws, which have more modest requirements of their variables, are violated.
16.3.2. Density of Spin and EPR
In this section so far I have simply stayed within the confines of what has become the traditional universe of discourse for EPR; the discussion, by Bell, in terms of the correlation of distant measurements. However, it is possible to take a more robust view by insisting that any discussion must be based on all the variables used to describe the system, including the variables that are hidden from view when talking about spin-only correlations. When we looked at the diamagnetic hydrogen molecule I stressed the non-physical nature of the spin-Hamiltonian model of chemical bonding and it is worthwhile to use this simple case to evaluate the distribution of electron spin in (real, three-dimensional) space associated with a singlet spin function since it is of interest in the current context.

Recall that the distribution of a momentum variable $p_q$, say, in space is given, for a state function $\Psi$, by integrating over all the variables in the system other than $q$:

$$\rho_{p_q}(q) = \int \Psi^*(q, q')\,\hat{p}_q\,\Psi(q, q')\,dq' \qquad (16.3.8)$$

where $q'$ is used generically to represent any variables other than $q$. Since the electrons are indistinguishable, all these components are equal and so we may pick a generic one of them and multiply it by the number of electrons, $n$, say. In the case of our singlet spin function:

$$\Psi(\vec{r}_1, \vec{r}_2; s_1, s_2) = \Phi(\vec{r}_1, \vec{r}_2)\,\Theta(s_1, s_2) = \Phi(\vec{r}_1, \vec{r}_2)\,\{\alpha(s_1)\beta(s_2) - \alpha(s_2)\beta(s_1)\}.$$

The spin density, the distribution in space of the density of $S_w$ (for $w = x, y, z$), interpretable as the value of $\alpha$-spin excess over $\beta$-spin as a function of space $\vec{r}$, is:

$$\rho_S(\vec{r}) = \int d\vec{r}\,' \int ds \int ds'\; |\Phi(\vec{r}, \vec{r}\,')|^2\, \{\alpha(s)\beta(s') - \alpha(s')\beta(s)\}\, \{\hat{S}_w(s) + \hat{S}_w(s')\}\, \{\alpha(s)\beta(s') - \alpha(s')\beta(s)\} = 0.$$
That is, for a singlet spin function the difference between $\alpha$-spin and $\beta$-spin is zero in all directions in space at every point in space; a point which has some ramifications in thinking about the EPR paradox. For example, if this theory of the singlet spin function is correct, then any measurement on concrete systems associated with an abstract system in a singlet spin state must yield values of zero spin everywhere in space independently of the spatial locations of the spinning particles.

Even more important is the density of the square of angular momentum (angular kinetic energy) and its implications for the EPR thought-experiment. Repeating the above derivation for the diamagnetic singlet state function generates exactly the same result; the distribution of the square of angular momentum is zero everywhere in space. Of course it is: it is necessarily non-negative everywhere and integrates to zero to give the zero mean value (and eigenvalue) of $S^2$ for a singlet function.

With these two elementary results in mind we can look again at the EPR experiment from the point of view of Bohm's modification to use electron spin. If we attempt to investigate the spin of the system by measuring the spin of one of the electrons, 12 then there are just two possibilities:

• We get a result of zero for the z-component (say) and a result of zero for the total angular momentum squared, which verifies that the system is indeed in a singlet state and that is all we may deduce.
• We get non-zero results for one or both of these measurements and may therefore deduce that the system is not in a singlet state.

12 Ignoring the all-important point of the necessity of using statistical measurements on many systems to verify probability statements for the moment, since this point is usually ignored in arguments of this type.

Suppose that the second case obtains. The only possibility for the measured values for a single electron is $\pm 1/2$ for the z-component and $(1/2)(1/2 + 1) = 3/4$ for the total. Suppose we now ignore the fact that the very existence of these non-zero measurements means that the system is not in a singlet state and attempt to use them to predict the spin-values of the other (possibly remote) electron in this "singlet state". The argument goes:

Since the total z-component of spin is zero in the singlet state and we have measured the z-component of one electron (1/2 say), the other must have the value $-1/2$; if one electron is "spin-up" the other must be "spin-down".

But the other measurement gives an ostensibly similar argument:

Since the total square of spin is zero in the singlet state and we have measured the total square of one electron (3/4), the other must have the value $-3/4$.

But this is clearly in error since the only possibility for the square of the spin angular momentum for an electron is $+3/4$. From the point of view of those who maintain that the actual operators represent the act of measurement it might be worth remarking that the singlet spin function is not an eigenfunction of $S_z(i)$ for $i = 1, 2$; the result of such a "measurement" of the spin of either one of the electrons in the singlet state is to generate the triplet state's function and not to leave the singlet function undisturbed. Let us just note in finishing that, if any measurement of electron spin on any concrete system at any point in space yields a non-zero value, then that system is not a concrete case of a singlet state and that is the end of the matter.
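The algebraic statements used in this argument can be checked with the two-electron spin matrices. The Python sketch below is an illustration only, with spin measured in units of $\hbar$; the helper names `on_first`, `on_second` and `mean` are hypothetical and introduced purely for the example. It evaluates the singlet means of the total z-component and of the total $S^2$, the one-electron value of $s^2$, and the result of applying $S_z(1)$ to the singlet function.

```python
import numpy as np

# One-electron spin operators in units of hbar (Pauli matrices divided by two).
SX = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
SY = 0.5 * np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)
IDENTITY = np.eye(2, dtype=complex)


def on_first(op):
    """Operator acting on electron 1 only."""
    return np.kron(op, IDENTITY)


def on_second(op):
    """Operator acting on electron 2 only."""
    return np.kron(IDENTITY, op)


def mean(op, state):
    """Mean value of a Hermitian operator for a normalised spin function."""
    return float(np.real(np.vdot(state, op @ state)))


ALPHA = np.array([1, 0], dtype=complex)
BETA = np.array([0, 1], dtype=complex)
SINGLET = (np.kron(ALPHA, BETA) - np.kron(BETA, ALPHA)) / np.sqrt(2)
TRIPLET_0 = (np.kron(ALPHA, BETA) + np.kron(BETA, ALPHA)) / np.sqrt(2)

total = [on_first(s) + on_second(s) for s in (SX, SY, SZ)]
total_squared = sum(t @ t for t in total)

mean_total_z = mean(total[2], SINGLET)             # total z-component
mean_total_squared = mean(total_squared, SINGLET)  # total S^2
one_electron_means = [mean(on_first(s), SINGLET) for s in (SX, SY, SZ)]
one_electron_squared = SX @ SX + SY @ SY + SZ @ SZ  # s^2 for a single electron
image = on_first(SZ) @ SINGLET                      # S_z(1) applied to the singlet

print("singlet mean of total S_z:", round(mean_total_z, 12))
print("singlet mean of total S^2:", round(mean_total_squared, 12))
print("one-electron s^2 eigenvalue:", one_electron_squared[0, 0].real)
print("S_z(1)|singlet> is half the M = 0 triplet function:",
      np.allclose(image, 0.5 * TRIPLET_0))
```

The singlet means are zero, the one-electron $s^2$ is $3/4$ times the unit matrix, and $S_z(1)$ applied to the singlet function gives one half of the $M = 0$ triplet function, so the singlet is not an eigenfunction of $S_z(1)$.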
As usual, in making the act of abstraction from a richer, more explicit interpreted theory, the effects of the richer theory have been ignored; one of the key things which characterises the singlet state of a two-electron system is not that it consists of a pair of electrons with opposite spin but that the spatial distribution of electron spin is everywhere zero. In a similar manner, if the original EPR model were to be taken seriously, one would have to solve the Schrodinger equation for the two particles involved and not just concentrate on abstract thought-experiments.
16.4. Zero-Point Energy
In Section 12.2 on page 239 we found the solutions of the Schrodinger equation for the harmonic oscillator and the allowed energies for this system turned out to be

$$E_n = h\nu\left(n + \tfrac{1}{2}\right) \qquad n = 0, 1, 2, \ldots$$

where $\nu$ is the frequency of the motion. The lowest (ground) state of such a system is, of course, the state with $n = 0$ which generates the question "what is the meaning of the fact that the lowest state still has energy $(1/2)h\nu$?" This question is particularly pressing in the application of the theory to the electromagnetic field since, with the infinite number of degrees of vibrational freedom involved, it suggests that there is an infinite amount of energy locked up in the overall ground state of that field.

There is a related, although less celebrated, problem associated with the solution of the Schrodinger equation for a "particle in a box"; one usually omits the solution with zero energy on the grounds that the associated state function is a constant, although this is perfectly consistent with zero gradient and therefore zero (kinetic) energy.

What is happening here is that both the Hamilton-Jacobi equation and the Schrodinger equation are equations for the dynamics of particles (and fields); there is no theory of quantum statics or electrostatics. In both the HJ equation and the Schrodinger equation there is the explicit assumption that the particles are in motion; both of them have solutions which depend explicitly on derivatives of $S$ or $\psi$. In the case of a more abstract picture of quantum theory, as I noted in Section 15.3, there is a missing axiom which must fulfill the same function as the variational derivation of the Schrodinger equation from the Schrodinger Condition; it must be assumed that the particles are always in motion. 13 So, any states of systems which involve stationary particles are inaccessible to Schrodinger's mechanics.
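The quoted energy formula can be checked by solving the harmonic-oscillator Schrodinger equation numerically. The Python sketch below is an illustration under stated assumptions: it uses dimensionless units ($\hbar$ = mass = angular frequency = 1, so that $h\nu = 1$), a simple three-point finite-difference approximation to the second derivative, and a hypothetical helper `lowest_levels` with an arbitrarily chosen grid.

```python
import numpy as np


def lowest_levels(potential, x_min, x_max, n_points=801, n_levels=3):
    """Lowest eigenvalues of -(1/2) d^2/dx^2 + V(x) by finite differences.

    Dimensionless units (hbar = mass = 1) are an assumption of this sketch.
    """
    x = np.linspace(x_min, x_max, n_points)
    dx = x[1] - x[0]
    # Three-point stencil for the second derivative; the state function is
    # taken to vanish outside the grid.
    main = 1.0 / dx**2 + potential(x)
    off = -0.5 / dx**2 * np.ones(n_points - 1)
    hamiltonian = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.eigvalsh(hamiltonian)[:n_levels]


# Harmonic potential with unit angular frequency, so that h*nu = 1 in these units.
levels = lowest_levels(lambda x: 0.5 * x**2, -8.0, 8.0)
print("lowest levels in units of h*nu:", np.round(levels, 4))
```

To within the discretisation error the lowest three eigenvalues are 1/2, 3/2 and 5/2, and the lowest is not zero.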
Now, most of the applications of Schrodinger's mechanics involve charged particles and there is a venerable theorem in classical electrostatics (Earnshaw's 14 theorem) which says that there is no state of equilibrium for a system of charged particles in which the particles are stationary. Thus, for the vast majority of applications of quantum theory the requirement of motion for the particles is automatically met, as it were by coincidence. This provides an important clue to the solution of this ancient paradox.

13 The conventional "explanation" of the zero-point energy is that Heisenberg's uncertainty principle would be violated by a stationary system; this principle is, as we have seen, a theorem in Schrodinger's mechanics and so, if the uncertainty relation is to be mentioned at all, it should be to say that it is not applicable because its derivation assumes non-zero momentum distributions, i.e., moving particles.
14 Reverend S. Earnshaw, "On the Nature of the Molecular Forces Which Regulate the Constitution of the Luminiferous Ether", Trans. Camb. Phil. Soc. 7, p. 97 (1842).

The classical description of "particle in a box" and the harmonic oscillator have two properties in common:

• They do not involve potentials associated with charged particles and so Earnshaw's theorem is not applicable.
• Their lowest-energy states occur when there is no motion; no translational motion or no vibration respectively.

Thus, strictly, Schrodinger's mechanics is not applicable to their ground states since it is explicit in Schrodinger's mechanics that the systems must be in motion (the $\nabla^2$ in the Schrodinger equation) or implicit in more abstract formulations of quantum theory.

In fact it is Earnshaw's theorem which points the way to a resolution of the zero-point energy paradox. The harmonic oscillator is an abstraction from the properties of (e.g. molecular) matter in two senses:

1. All cohesive forces can actually be overcome so that no chemical or solid-state bond can actually be described by a harmonic force.
2. More important, the source of the cohesive (restoring) forces in molecular matter is a combination of repulsive and attractive electrostatic forces between the charged particles (electrons and nuclei) of which that matter is composed.

The first of these abstractions is not important in the present context, since all cohesive forces between atoms are harmonic for small enough distortions and more accurate force fields allowing bond breaking also have some zero-point energy; only the numerical value may be changed. It is the real source of the restoring forces which allows us to see the problem clearly.
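The remark that a force field allowing bond breaking still has a zero-point energy, with only the numerical value changed, can be illustrated numerically. The Python sketch below is an assumption-laden example and not taken from the text: it uses the Morse potential as one familiar bond-breaking model, dimensionless units ($\hbar$ = mass = 1), a hypothetical well depth of ten vibrational quanta, and the same finite-difference approach with a hypothetical helper `ground_state_energy`.

```python
import numpy as np


def ground_state_energy(potential, x_min, x_max, n_points=1001):
    """Lowest eigenvalue of -(1/2) d^2/dx^2 + V(x) by finite differences."""
    x = np.linspace(x_min, x_max, n_points)
    dx = x[1] - x[0]
    main = 1.0 / dx**2 + potential(x)
    off = -0.5 / dx**2 * np.ones(n_points - 1)
    hamiltonian = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.eigvalsh(hamiltonian)[0]


DEPTH = 10.0                        # hypothetical dissociation energy in units of h*nu
WIDTH = 1.0 / np.sqrt(2.0 * DEPTH)  # chosen so the small-displacement frequency is 1


def morse(x):
    """Morse potential: harmonic near x = 0, flat (bond broken) at large x."""
    return DEPTH * (1.0 - np.exp(-WIDTH * x)) ** 2


def harmonic(x):
    """Harmonic potential with the same curvature at the minimum."""
    return 0.5 * x**2


e_harmonic = ground_state_energy(harmonic, -6.0, 14.0)
e_morse = ground_state_energy(morse, -6.0, 14.0)
print(f"harmonic zero-point energy: {e_harmonic:.4f}")
print(f"Morse zero-point energy:    {e_morse:.4f}")
```

For these parameters the standard Morse level formula gives $1/2 - 1/160 = 0.49375$ for the lowest level, slightly below the harmonic value of 1/2 but still positive.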
The simplest and most familiar example, the diatomic hydrogen molecule H2, is said to have a single vibrational degree of freedom in which the two protons execute approximate harmonic motion. But this molecule is composed of four charged particles: two protons and two electrons. All the details of the energetics and particle distributions of this system are given by Coulomb's law; Hooke's law makes no appearance. It is simple to write down the Schrodinger equation for this system but, unfortunately, impossible to solve it exactly. However, what we can say is
that, in the lowest state of this system, all the four particles are in motion and that, by completely smashing this molecule into its four component particles and making them remote from each other, there is no residuum of trapped energy which cannot be extracted. 15 How does this correlate with the zero-point energy of the "vibrational mode" used in the standard description of the hydrogen molecule?

In order to attempt even an approximate solution of the Schrodinger equation for the hydrogen molecule (or any other molecule or crystal), certain model approximations have to be made. In particular, the motion of the system as a whole may be factored out of the solution leaving the problem of rotation of the system as a whole and the remaining internal, relative motions to be described. It is usually a good approximation to treat the rotational motion of the molecule separately from the vibrational motion of the protons and the motion of the electrons about the two protons. On making an approximate solution to the Schrodinger equation for the electronic motion, one finds that there is an effective field of force acting on the two protons which (for small displacements from the equilibrium bond length) is very close to Hooke's law. That is, the harmonic force constant results from the combined effect of Coulomb's law for the mutual interactions of the four charged particles.

We can now see clearly what has happened; in the case of this simple molecule the very existence of the harmonic oscillator problem and the solution of the relevant Schrodinger equation is an artifact of abstraction and certain model assumptions which, however useful, are in no sense fundamental. The real motion of the four particles in the ground state of the hydrogen molecule is a complex mixture of what we have merely conceptualised as translation, rotation, nuclear vibration and electronic motion; it does not involve the (quantum) paradox of an equilibrium with stationary (charged) particles.
15 Where would this energy be?

To emphasise, in the ground state of the hydrogen molecule all the (charged) particles are in relative motion and there is no mystery about this fact. If we use a mechanics which contains the explicit assumption that the components described are always in motion, we should not feel too cheated if paradoxes arise when we apply this mechanics to systems generated by our abstractions from real systems which prove to have states in which the model components are stationary. In making this familiar model of the vibrations of the hydrogen molecule we have converted the 12 spatial coordinates of the four particles into model assumptions and a single "coupling constant" in an effective Hamiltonian for the remaining single explicit coordinate. The actual forces of interaction between the particles (Coulomb's law) and the associated coordinates have been hidden in the form and parameter(s) of this model Hamiltonian. This technique is formally analogous to the explanation of the chemical bond as spin coupling but it has the distinct advantage of using a physical rather than occult explanation of the phenomena.

It is worth distinguishing between the probability distributions for a harmonic oscillator and a particle in a box as obtained by classical mechanics. The abstract object "harmonic oscillator" has a delta function distribution; in its lowest state, the oscillator has zero displacement. The abstract object "particle in a box" has a probability distribution which is constant over the length of the box; its lowest state is simply a stationary particle with equal probability anywhere in the box. Of course, in the former case all concrete oscillators in their lowest state have a zero displacement, while in the latter case any concrete stationary particle must be at some particular location in the box.
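The two classical lowest-state distributions just described can be made concrete with a small sampling sketch. It is offered only as an illustration: the function names, sample sizes and the unit box length are assumptions introduced here. The oscillator is sampled at random times for shrinking amplitudes, so that the spread of positions collapses towards a single point (the delta function case), while the stationary particles in a box form an ensemble whose members each occupy one definite place, with the places spread uniformly.

```python
import math
import random

def oscillator_positions(amplitude, n, rng):
    """Positions of a classical oscillator x = A cos(phase), read off at
    n uniformly random phases (that is, at randomly chosen times)."""
    return [amplitude * math.cos(rng.uniform(0.0, 2.0 * math.pi)) for _ in range(n)]

def box_positions(length, n, rng):
    """An ensemble of n stationary particles in a box: each one sits at a
    single fixed place, and the places are spread uniformly over the box."""
    return [rng.uniform(0.0, length) for _ in range(n)]

def spread(xs):
    """Standard deviation of a list of positions."""
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))

rng = random.Random(0)  # fixed seed so that the run is repeatable
for amplitude in (1.0, 0.1, 0.0):
    xs = oscillator_positions(amplitude, 20_000, rng)
    print("oscillator, A =", amplitude, "spread =", spread(xs))
print("box, L = 1, spread =", spread(box_positions(1.0, 20_000, rng)))
```

At zero amplitude every sampled oscillator position is the same, whereas the box ensemble keeps a finite spread even though no individual particle moves; the distribution belongs to the ensemble, not to any one member.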
Chapter 17
Beyond Schrodinger's Mechanics?
We have seen how it is possible to extract some of the formal structures abstracted from Schrodinger's mechanics and use them in a semi-empirical way to provide impressive technologies for the rationalisation of a wide range of phenomena for which no interpreted theory is yet available. It is interesting but perhaps not very profitable to compare this situation with previous experiences of this kind in the history of science and to comment on some current developments.
Contents
17.1. An Interregnum? 311
17.2. The Avant-Garde 313
17.3. The Break with the Past 314
17.4. Classical and Quantum Mechanics 315

17.1. An Interregnum?
I have made some considerable fuss about the (philosophical) dangers of treating mathematical technologies as if they were interpreted physical theories in Chapters 12, 13 and 14 while, I hope, having due regard for their values as technologies. In this final Chapter I would like to try to put the position of Schrodinger's mechanics in some kind of perspective. 1 I must say, first of all, that I am in awe of the achievements of the field and particle theorists and of their mathematical prowess and determination. But, as I have noted, I am much less impressed by attempts to assume that these achievements are the generation of new physical theories. The analogy which forces itself on me is a comparison with the position of sub-atomic physics during the interregnum between classical and quantum mechanics in the early years of the twentieth century before Schrodinger's first paper of 1926. At that time there was:
• A beautiful and very well-developed theory of classical particle mechanics and electromagnetic fields.
• A variety of experimental data showing that classical mechanics just would not do at the sub-atomic level.
• The discovery that the ad hoc use of the quantum rule
$$\oint p_i \, dq_i = nh$$
applied after the use of purely classical mechanical methods to a problem often gave results in agreement with experiment.

Using this empirical approach Arnold Sommerfeld in his authoritative book "Atombau und Spektrallinien" 2 was able to derive, for example, the allowed energy levels for a rotating, vibrating molecule; an astonishing feat from such ostensibly shaky foundations and every bit as impressive as the later prediction of new particles by inventing suitable fields. In what must be one of the bravest publications of all time, Max Born finished his book "Atommechanik" 3 in 1924 and it was published in 1925; these dates immediately preceding Schrodinger's stunning papers of 1926. He was so sure that a new theory of matter was needed that he called this book Volume 1 in anticipation of writing Volume 2 when the new mechanics had appeared. Born used all the power of classical mechanics in this tour de force and was (in my view, correctly) sufficiently confident that the dependence of the new mechanics of Schrodinger on classical mechanics was so obvious that he was happy to see the English version of the book published in 1927; he says:

. . . it seems to me that the time is not yet arrived when the new mechanics can be built up on its own foundations, without any connection with the classical theory.

1 At least, a perspective from where I stand.
2 English translation "Atomic Structure and Spectral Lines", Methuen, 1923.
3 English translation "The Mechanics of the Atom", Bell, 1927.
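As a hedged illustration of how the old quantum rule was used "after the use of purely classical mechanical methods", the sketch below takes the simplest case, a one-dimensional classical harmonic oscillator (not Sommerfeld's rotating, vibrating molecule). It evaluates the closed-orbit integral of p dq numerically and checks that imposing the value nh on it picks out the energies E = nhω/2π. The units, parameter values and function names are assumptions chosen for illustration.

```python
import math

def action_integral(energy, mass, omega, steps=100_000):
    """Midpoint-rule estimate of the closed-orbit integral of p dq
    for a classical harmonic oscillator of the given energy."""
    amplitude = math.sqrt(2.0 * energy / (mass * omega**2))
    dq = 2.0 * amplitude / steps
    total = 0.0
    for i in range(steps):
        q = -amplitude + (i + 0.5) * dq
        kinetic = energy - 0.5 * mass * omega**2 * q**2
        total += math.sqrt(2.0 * mass * max(kinetic, 0.0)) * dq
    return 2.0 * total  # out and back: the orbit covers the interval twice

def quantised_energy(n, omega, h):
    """Energy selected by the rule (action = n h): E = n h omega / (2 pi)."""
    return n * h * omega / (2.0 * math.pi)

h, mass, omega = 1.0, 1.0, 2.0  # arbitrary illustrative units
for n in (1, 2, 3):
    e_n = quantised_energy(n, omega, h)
    print(n, e_n, action_integral(e_n, mass, omega) / h)
```

The classical mechanics does all the work of computing the action; the rule then simply discards every orbit whose action is not a whole multiple of h. Applied in this way it yields energies nhν with no zero-point term.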
The current situation vis-a-vis the relationship between Schrodinger's mechanics and (for example) quantum "field" theories of elementary particles is similar; Schrodinger's mechanics has provided the motivation and the general abstract structures — Hilbert space, non-commuting operators, etc. — but additional a posteriori rules are needed to enable the theory to be extended to the new field. There can be no doubt that the various theories and frameworks which have been extrapolated from the basic structure of Schrodinger's mechanics do generate astonishingly accurate descriptions of the properties of sub-nuclear particles for which no interpreted theory is yet available. However, it is clear that there must be a new physical theory at the level of fundamental particles which has not yet been obtained by the current technique of combining quantum theoretical structures, the requirements of special relativity and additional ad hoc rules. As Morris Kline has pointed out so emphatically over the years, physical explanation is rapidly being replaced by mathematical description in physical science; the invention of quantum theory has exaggerated this trend since the working mathematical framework was generated long before an interpretation of the formalism was available. Since then, mathematical description has become the rule and the need for physical explanation may even be deprecated on the grounds that we have no intuitions to guide our explanations of sub-atomic matter or matter on a gigantic scale. But what intuitions did our (equally intelligent) ancestors have about white blood cells, about atoms and molecules, about the origin of species or mathematical continuity; concepts completely remote from their experience? Intuitions are nothing but yesterday's experience crystallised. 
As I took care to note in Chapter 1, what we may imagine or understand is not something which is fixed like the colour of our eyes but is part of our cultural heritage which seems to be infinitely extendable. Contrary to much official opinion, I do not believe we are coming to the end of science; we have merely become daunted by the difficulties we face.
17.2. The Avant-Garde
There are disturbing parallels between the avant-garde movements in art and the combination of developments in modern quantum science and cosmology which aims to obtain a "Theory of Everything". In seeking to reject all previous movements and value systems, both are progressing further and further into their own cul-de-sac.
In the thirtieth Walter Neurath Memorial Lecture "Behind the Times" 4 the historian Eric Hobsbawm says:

Why, between say 1905 and 1914, the avant-garde deliberately broke this continuity with the past is a question I can't answer adequately. But once it had done so, it was necessarily on the way to nowhere. What could painting do once it had abandoned the traditional language of representation, or moved sufficiently far from its conventional idiom to make it incomprehensible? What could it communicate?

If we change the dates, replace "painting" by "physics", and "language of representation" by "requirement of verifiability" we have a completely valid question about current trends in quantum science and cosmology. Mathematical structures obtained by abstraction and analogy from Schrodinger's mechanics, but which use concepts and constructions which are in principle unverifiable, are now common in conjunction with cosmological models which are similarly unverifiable; there is an alliance between unverifiability "in the small" and unverifiability "in the large" here which represents what I think of as avant-garde science. Other authors, no doubt influenced by the terminology of the post-moderns, use the term "ironic science" for this phenomenon and have charitably assumed that it has a role to play in maintaining interest in the large (unanswerable) questions about the universe.

Some proponents of such cosmological theories use Schrodinger's mechanics in that they are involved in discussions of the wavefunction (state function Ψ) of the universe or of an individual conscious human. But the universe is a concrete object, as is every individual human, and concrete objects do not have state functions nor do probability distributions refer to concrete objects as we have stressed throughout this work.
17.3. The Break with the Past
Nowhere is the wilful break with the past better illustrated than in John Horgan's hilarious description of a meeting held at the Santa Fe Institute in May of 1994 to discuss the "Limits to Scientific Knowledge". 5 At this meeting a participant from the Sloan Foundation had not heard of Ludwig Wittgenstein, no-one seemed to know that Bertrand Russell had something to say about the subject matter of their deliberations, and there was active, strongly expressed, hostility to the very idea that it might be useful to know about Immanuel Kant's thoughts on some key issues. With the exception of the last item, these points are trivial, but they do show a surprising ignorance of and active hostility to the very idea that the problems with which they are grappling have been encountered centuries (even millennia) before by the most acute minds of their time who have left us pellucid accounts of their investigations. Horgan opines that "... they did not want to be reminded that for the most part they were merely restating, with new-fangled jargon and metaphors, arguments set forth long ago not only by Kant but even by the ancient Greeks". In this context, one might be tempted to say of these participants what the great historian Edward Gibbon said of himself on going up to Oxford in 1752; he had

. . . a stock of erudition that might have puzzled a doctor and a degree of ignorance of which a schoolboy would have been ashamed. 6

Returning to Hobsbawm's account of avant-garde art, after the quote above he makes the penetrating, if a little unkind, observation about the practitioners that:

They were constantly torn between the conviction that there could be no future to the art of the past — even yesterday's past, or even to any kind of art in the old definition — and the conviction that what they were doing in the old social role of "artists" and "geniuses" was important, and rooted in the great tradition of the past.

Again, this has a familiar ring about it; today's avant-garde theorists are not exactly tormented by self-doubt.

4 Thames and Hudson, 1998.
5 Described in The End of Science, Abacus reprint 1998 of an original published in 1996; Chapter 9.
6 Letter to Lord Sheffield, 10th May 1786.

17.4. Classical and Quantum Mechanics

The historical experiment with abstraction has gone too far in physical science. However well the equations of quantum field theory and its developments may be parametrised to agree with experiment, there remains the problem of interpretation and, above all, meaning.
In the years following the invention of Schrodinger's mechanics the prevailing atmosphere of positivism in philosophical circles provided just the smokescreen required to enable the innovative theorists to forge ahead with their work without worrying too much about questions of meaning and interpretation. One is inevitably reminded of the negligible effect of the very cogent and penetrating criticisms of Newton's integral and differential calculus expressed in Bishop George Berkeley's famous polemic The Analyst.7 There is no doubt that Berkeley was absolutely right in his devastating criticism of the poor foundations of the new calculus and that many of its formulations were self-contradictory or even meaningless.8 But these criticisms were expressed at the wrong time; they were too early. The solutions to the sort of problems which Berkeley raised were not found until the nineteenth century by Cauchy, Riemann and others. If all mathematics and mathematical physics had had to stop until the foundations of analysis were cleared up, practically the whole of classical mechanics, electromagnetic theory, celestial mechanics and fluid dynamics would never have appeared. There are times in the development of science when scientists simply have to "get on with it" and press on regardless. The beginnings of both classical and quantum (particle) mechanics were just such times. But those times are now past — the foundations of analysis are secure, positivism and operationalism are debunked — and we have the conceptual tools and technologies to dissolve the paradoxes and confusions surrounding quantum theory and to start to replace empirical theories with interpreted ones. No one would pretend that Schrodinger's mechanics is a completely satisfactory theory, even in its own area of applicability and notwithstanding its impressive results; but it does not contain any paradoxes or inconsistencies and its domain of applicability is well-defined. 
Just as Newton's theory left the "mechanism" of gravitation as a complete unknown, 9 so the characteristic feature of Schrodinger's Condition, the existence of discrete states, although generated by the Schrodinger Condition, is not explained. Equally important, Schrodinger's mechanics in no way supersedes or replaces Newton's mechanics since it is completely silent on the question of particle trajectories; if we had to rely solely on the Schrodinger equation for mechanics at all scales we would not be able to calculate the orbit of the moon, for example, something which could be done over a millennium ago by Ptolemaic methods; in short, we are not close to a theory of everything.

7 The Analyst: A Discourse Addressed to an Infidel Mathematician by the author of The Minute Philosopher, 1734.
8 My own favourite is the paragraph "fluxions [differentials] of all orders inconceivable".
9 "Hypotheses non fingo".
Index
I have, to some extent, tried to pre-empt this index by using a fairly explicit table of contents and a contents list for each chapter. In the index, names appear in the order of surname but preceded by initials; references to footnotes have "n" appended to the relevant page number.
abstract object, 53-55 and predicate, 55 definition, 54 existence, 54 state of, 55 visualisation, 68 abstract particle, 133 momentum of, 180 abstraction, see Chapter 10 action complex, 131, 132, 179 real, 129, 180 action at a distance, 297 Y. Aharonov, 294n angular kinetic energy, 219 angular momentum, see Chapter 5 "vector", 107, 109 and angular kinetic energy, 220 and Poisson Bracket, 106 canonical and vectorial, 100, 104
Aristotle, 259 V. I. Arnold, 18 St. Augustine, 259 axioms, 20, 21, 72 and taxonomy, 20 Kolmogorov's, 58
B F. Bacon, 4 basic principles summary, 142-145 J. S. Bell, 295 and EPR, 298-302 inequality, 299 theorem, 299 G. Berkeley, 316 N. N. Bogoliubov, 247n D. Bohm, 294n and EPR, 297, 298 Boltzmann ensemble, 68 M. Born, 26n, 312
boundary conditions for momentum, 200 W. Bragg, 118 P. W. Bridgeman, 31, 40 M. A. Bunge, 19n, 262 N. Bunnin, 4 C causality, 17 chemical bond Dirac's spin theory, 267, 268 "collapse" of distribution, 49 commuting operators and conservation, 218 concrete object and conservation, 231 and random tests, 67 in real world, 181, 185 no state function, 274 conservation and commuting operators, 218 in concrete & abstract systems, 223-226 Copernicus, 259 coupling model, 269 culture illiterate, 10 D P. Davies, 7 definitions and equations, 16 and laws, 15 ∇², 152 dice definition, 35 probability, 29-31 standard, 35 statistics, 31-34 J. Dieudonne, 21 diffraction, particle, see Chapter 6 dual and unitary theories, 114 three-slit, 118 P. A. M. Dirac, 199n, 265 distribution of dynamical variable, 135, 275
of particle position, 135 W. Duane, 118 dynamical variable distribution, 135 E Rev. S. Earnshaw theorem, 307 Ehrenfest relations, 139, 293 A. Einstein, 68n, 294n locality principle, 303 Einstein Podolsky Rosen (EPR) paradox, 294-299 energy and Hamiltonian, 124 ensemble and probability, 48 as visualisation of abstract object, 68 Boltzmann, 68 Gibbsian, 68 equations and definitions, 16 and identities, 181 Euler-Lagrange equations, 150 F L. Faddeev, 204n foundations, 18, 19 verbal or mathematical, 19 G. Frege, 11n G E. Gibbon, 315 Gibbsian ensemble, 68 God, 22 L. Goldmann, 55 J. Gribbin, 7 H H-J equation, see Chapter 4 and Poisson bracket, 89 and Schrodinger equation, 127 density of trajectories, 85 for the hydrogen atom, 293 referent, 127
referent of solutions, 81-83 solution in Cartesians, 78 in Cylindrical coordinates, 83 in Spherical Polars, 79 W. R. Hamilton, 18n, 19, 71 Hamilton's equations, 74, 89, 125 Hamilton-Jacobi equation, see Chapter 4 Hamiltonian and energy, 124 semi-empirical, 261, 263, 264 spin, 304 harmonic oscillator and zero-point energy, 307-310 classical, 41 probability distribution, 42 W. Heisenberg, 284 uncertainty principle, 284-286 and ZPE, 307n Hermitian operator and state function, 63 hidden variables, see Chapter 14, 297n and hidden laws of nature, 51 and probabilities, 50 and thermodynamics, 50 in EPR, 298 Hilbert space, 14, 201, 261 T. Hobbes, 4, 8 E. Hobsbawm, 314 Hooke's law, 237, 245 and molecules, 309 J. Horgan, 314 J. Hume, 17 I identities and equations, 181 initial conditions for sub-atomic particles, 126 intuition in quantum physics, 313 ironic science, 314
J K. G. J. Jacobi, 19, 71 E. T. Jaynes, 261 K I. Kant, 17, 315 J. Kepler, 209, 259 Kepler problem, see Chapter 11 and symmetries, 215 in a plane, 211 in three dimensions, 214 quantum and classical, 291 separable coordinates, 217 separation operators, 216 kinetic energy density, 154 M. Kline, 313 A. N. Kolmogorov, 21, 26n, 29, 51, 58, 277 L J. L. Lagrange, 19, 71 Lagrange's equations, 75, 125 language applicability, 12 spoken & written, 11 "lapidary method", 13 Laplacian, see ∇² laws and definitions, 15 V. I. Lenin, 6, 295 E. H. Lieb, 260 loaded dice statistics, 34, 35 logic, 9, 11 social product, 11 G. Ludwig, 20 A. R. Luria, 10 M E. Mach, 15 K. Marx, 17n materialism, 4, 9 "death of", 7 mathematical foundations, 14 mathematics, 12, 13
applicability, 13 generation of, 14 meaning, 16 measure confusion with measurement, 28n, 31, 274n, 296n measurement made on concrete objects, 274 reproducibility, 47 von Neumann's theory, 278-281 mind and material objects, 8 momentum abstract operator, 199 and boundary conditions, 200 and constraints, 186 and kinetic energy, 190 imaginary distribution, 183 operator, 198 zero mean value, 185 momentum operator by substitution, see Appendix 11.A Monte Carlo quadrature, 32, 274 P. Moon, 216 N I. Newton, 5, 15, 19 Nominalists, 54 normal coordinates, 245 E. Nother, 231 O observer, 50 William of Ockham, 19 P particle in a "box", 192 and slit diffraction, 117 ground state, 307 W. Pauli, 265 R. Penrose, 66n perturbation theory, see Chapter 13 and epicycles, 258 physical interpretation, 254, 256-258 phonon, 248
photon, 246, 247n M. Planck, 148n Planck's constant, 148 B. Podolsky, 294n H. Poincare, 42 S. D. Poisson, 96 Poisson bracket and angular momentum, 106 and H-J equation, 89 definition, 96 positivism probabilities, 29 post-moderns, 18n predicate and abstract object, 55 probability, see Part 2 and conservation laws, 35 and ensemble, 48 and referent of quantum mechanics, 230 and state function, 61 colloquial and mathematical, 26, 27 density, 59 dice, 29-31 formal definition, 58 informal definition, 29 mathematical theory, 27 measure of sets, 27 non-statistical measurement, 40 of proposition, 52 propensity interpretation, 56 role of chance, 28, 45 statistics, 41 time-dependent, 63 projection operator and state function, 61 propensity and probability, 56 objectivity of, 58 Ptolemy, 259 R random test, 66 reduction and thermodynamics, 50 representation
in mathematics, 13 N. Rosen, 294n B. A. W. Russell, 11n, 48, 262, 315 Rutherford, 5
S Schrodinger Condition, 137 Schrodinger equation boundary conditions, 156-158 role of, 155 recipe for, 153 time-independent, 158, 159 Schwarz inequality, 303 John Duns Scotius, 4 R. Scruton, 11 second quantisation, 249 D. V. Shirkov, 247n simple harmonic motion, see Chapter 12 one-dimensional, 239 A. Sommerfeld, 312 quantisation condition, 117 D. E. Spencer, 216 spin "coupling", 268 "functions", 265 spatial distribution, 305 E. Squires, 66n standard deviation and uncertainty principle, 284 and uncertainty principle, see Appendix 15.A state of abstract objects, 55 state function, 61 and Hermitian operator, 63 and projection operator, 61 and wave function, xviii definition, 137 fundamental nature of, 62 statistics, 39 dice, 31-34 frequency ratios, 33 loaded dice, 34, 35 Monte Carlo, 32 quadrature, 31-33
M. Steiner, 12 step-up & step-down operators, 242 J. Swift, 54n syllogism, 10
T thermodynamics, 16 trajectories in classical and quantum mechanics, 292, 293 transformation theory, see Appendix 4.A E. P. Tsui-James, 4 V variation principles, 123-125 J. von Neumann and measurement, 277 L. S. Vygotskii, 10 W wave function, xviii L. Wittgenstein, 11n, 17n, 315 Z zero-point energy, 307 ZPE, see zero-point energy
This book addresses some of the problems of interpreting Schrodinger's mechanics — the most complete and explicit theory falling under the umbrella of "quantum theory". The outlook is materialist ("realist") and stresses the development of Schrodinger's mechanics from classical theories and its close connections with (particularly) the Hamilton-Jacobi theory. Emphasis is placed on the concepts and use of the modern objective (measure-theoretic) probability theory. The work is free from any mention of the bearing of Schrodinger's mechanics on God, his alleged mind or, indeed, minds at all. The author has taken the naive view that this mechanics is about the structure and dynamics of atomic and sub-atomic systems since he has been unable to trace any references to minds, consciousness or measurements in the foundations of the theory.