Available online at www.sciencedirect.com
Physics Reports 378 (2003) 1–98
www.elsevier.com/locate/physrep

Critical market crashes

D. Sornette (a, b)

(a) Institute of Geophysics and Planetary Physics and Department of Earth and Space Science, University of California, Los Angeles, CA 90095, USA
(b) Laboratoire de Physique de la Matière Condensée, CNRS UMR6622 and Université des Sciences, Parc Valrose, 06108 Nice Cedex 2, France

Received 1 November 2002; editor: I. Procaccia
Abstract
This review presents a general theory of financial crashes and of stock market instabilities that the author and his co-workers have developed over the past seven years. We start by discussing the limitations of standard analyses for characterizing how crashes are special. The study of the frequency distribution of drawdowns, or runs of successive losses, shows that large financial crashes are “outliers”: they form a class of their own, as can be seen from their statistical signatures. If large financial crashes are “outliers”, they are special and thus require a special explanation, a specific model, a theory of their own. In addition, their special properties may perhaps be used for their prediction. The main mechanisms leading to positive feedbacks, i.e., self-reinforcement, such as imitative behavior and herding between investors, are reviewed, with many references to the relevant literature outside the narrow confines of physics. Positive feedbacks provide the fuel for the development of speculative bubbles, preparing the instability for a major crash. We present several detailed mathematical models of speculative bubbles and crashes. A first model posits that the crash hazard drives the market price. The crash hazard may skyrocket at times due to the collective behavior of “noise traders”, those who act on little information, even if they think they “know”. A second version reverses the logic and posits that prices drive the crash hazard. Prices may again skyrocket at times due to the speculative or imitative behavior of investors. According to the rational expectation model, this automatically entails a corresponding increase of the probability of a crash. We also review two other models, involving the competition between imitation and contrarian behavior and between value investors and technical analysts. The most important message is the discovery of robust and universal signatures of the approach to crashes. These precursory patterns have been documented for essentially all crashes on developed as well as emerging stock markets, on currency markets, on company stocks, and so on. We review this discovery at length and demonstrate how to use this insight and the detailed predictions obtained from these models to forecast crashes. For this, we review the major past crashes on the world's major stock markets and describe the empirical evidence for the universal nature of the critical log-periodic precursory signature of crashes. The concept of an “anti-bubble” is also summarized, with the Japanese collapse from the beginning of 1991 to the present taken as a prominent example. A prediction issued and advertised in January 1999 has been borne out until recently with remarkable precision, correctly predicting several changes of trend, a feat notoriously difficult using standard techniques of economic forecasting. We also summarize a very recent analysis of the behavior of the U.S. S&P500 index from 1996 to August 2002 and the forecast for the following two years. We conclude by presenting our view of the organization of financial markets.
© 2003 Elsevier Science B.V. All rights reserved.
PACS: 02.50.−r

E-mail address: [email protected] (D. Sornette).
0370-1573/03/$ - see front matter. doi:10.1016/S0370-1573(02)00634-8
Contents
1. Introduction
2. Financial crashes: what, how, why and when?
2.1. What are crashes and why do we care?
2.2. The crash of October, 1987
2.3. How? Historical crashes
2.3.1. The Tulip mania
2.3.2. The South Sea bubble
2.3.3. The Great crash of October 1929
2.4. Why? Extreme events in complex systems
2.5. When? Is prediction possible? A working hypothesis
3. Financial crashes are “outliers”
3.1. What are “abnormal” returns?
3.2. Drawdowns (runs)
3.3. Testing outliers
3.4. The Dow Jones industrial average
3.5. The Nasdaq composite index
3.6. The presence of “outliers” is a general phenomenon
3.7. Implications for safety regulations of stock markets
4. Positive feedbacks
4.1. Herding
4.2. It is optimal to imitate when lacking information
4.3. Cooperative behaviors resulting from imitation
5. Modelling financial bubbles and market crashes
5.1. The risk-driven model
5.1.1. Finite-time singularity in the crash hazard rate
5.1.2. Derivation from the microscopic Ising model
5.1.3. Dynamics of prices from the rational expectation condition
5.2. The price-driven model
5.3. Risk-driven versus price-driven models
5.4. Imitation and contrarian behavior: hyperbolic bubbles, crashes and chaos
6. Log-periodic oscillations decorating power laws
6.1. Status of log-periodicity
6.2. Stock market price dynamics from the interplay between fundamental value investors and technical analysts
6.2.1. Nonlinear value and trend-following strategies
6.2.2. Nonlinear dynamical equation for stock market prices
6.2.3. Dynamical properties
7. Autopsy of major crashes: universal exponents and log-periodicity
7.1. The crash of October 1987
7.1.1. Precursory pattern
7.1.2. Aftershock patterns
7.2. The crash of October 1929
7.3. The three Hong Kong crashes of 1987, 1994 and 1997
7.4. The crash of October 1997 and its resonance on the U.S. market
7.5. Currency crashes
7.6. The crash of August 1998
7.7. The Nasdaq crash of April 2000
7.8. “Anti-bubbles”
7.8.1. The “bearish” regime on the Nikkei starting from 1st January 1990
7.8.2. The gold deflation price starting mid-1980
7.8.3. The U.S. 2000–2002 Market Descent: How Much Longer and Deeper?
8. Synthesis
8.1. “Emergent” behavior of the stock market
8.2. Implications for mitigations of crises
8.3. Predictions
Acknowledgements
References
1. Introduction
The total world market capitalization rose from $3.38 trillion (thousand billions) in 1983 to $26.5 trillion in 1998 and to $38.7 trillion in 1999. To put these numbers in perspective, the 1999 U.S. budget was $1.7 trillion, while its 1983 budget was $800 billion. Market capitalization and trading volumes tripled during the 1990s, and the volume of securities issuance was multiplied by six. Privatization has played a key role in this stock market growth (Megginson, 2000). Stock market investment is clearly the big game in town.

A market crash occurring simultaneously on most of the world's stock markets, as witnessed in October 1987, would amount to the quasi-instantaneous evaporation of trillions of dollars. In January 2001 values, a stock market crash of 30% would indeed correspond to an absolute loss of about 13 trillion dollars! Market crashes can thus swallow years of pensions and savings in an instant. Could they make us suffer even more by being the precursors or triggering factors of major recessions, as in 1929–1933 after the great crash of October 1929? Or could they lead to a general collapse of the financial and banking system, as seems to have been barely avoided several times in the not-so-distant past?

Stock market crashes are also fascinating because they personify the class of phenomena known as “extreme events”. Extreme events are characteristic of many natural and social systems, often referred to by scientists as “complex systems”. Here, we discuss how financial crashes can be understood by invoking the latest and most sophisticated concepts in modern science, i.e., the theory of complex systems and of critical phenomena. Our aim is to cover a territory that takes us all the way from the description of how the wonderful organization around us arises to the holy grail of crash prediction.

This article is organized into eight sections. Section 2 introduces the fundamental questions: what are crashes? How do they happen? Why do they occur? When do they occur? It outlines the answers we propose, taking as examples some famous, or rather infamous, historical crashes. Section 3 first discusses the limitations of standard analyses for characterizing how crashes are special. It then presents the study of the frequency distribution of drawdowns, or runs of successive losses, and shows that large financial crashes are “outliers”: they form a class of their own, as can be seen from their statistical signatures. If large financial crashes are “outliers”, they are special and thus require a special explanation, a specific model, a theory of their own. In addition, their special properties may perhaps be used for their prediction. Section 4 reviews the main mechanisms leading to positive feedbacks, i.e., self-reinforcement, such as imitative behavior and herding between investors. Positive feedbacks provide the fuel for the development of speculative bubbles, preparing the instability for a major crash. Section 5 presents two versions of a rational model of speculative bubbles and crashes. The first version posits that the crash hazard drives the market price. The crash hazard may skyrocket at times due to the collective behavior of “noise traders”, those who act on little information, even if they think they “know”. The second version reverses the logic and posits that prices drive the crash hazard. Prices may again skyrocket at times due to the speculative or imitative behavior of investors. According to the rational expectation model, this automatically entails a corresponding increase of the probability of a crash. The most important message is the discovery of robust and universal signatures of the approach to crashes. These precursory patterns have been documented for essentially all crashes on developed as well as emerging stock markets, on currency markets, on company stocks, and so on.
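The drawdowns that Section 3 studies are defined in words above as runs of successive losses. The following Python sketch makes that definition concrete. It is an illustration under stated assumptions, not the exact procedure used in this review. It assumes a plain list of daily closing prices, treats an unchanged close as ending a run, and measures each drawdown as the relative loss from the local maximum that opens the run to the local minimum that closes it. The function name `drawdowns` is hypothetical.

```python
from typing import List, Optional


def drawdowns(prices: List[float]) -> List[float]:
    """Return the relative size of every drawdown in a price series.

    Assumption (illustrative only): a drawdown is a maximal run of strictly
    decreasing consecutive closes, measured from the local maximum that
    starts the run to the local minimum that ends it, as a fraction of that
    maximum. An unchanged close ends the run.
    """
    result: List[float] = []
    peak: Optional[float] = None  # price at the start of the current run of losses
    for prev, curr in zip(prices, prices[1:]):
        if curr < prev:
            if peak is None:
                peak = prev  # a new run of losses starts at this local maximum
        elif peak is not None:
            # the run ended at `prev`, which is the local minimum
            result.append((peak - prev) / peak)
            peak = None
    if peak is not None:
        # the series ends while still inside a run of losses
        result.append((peak - prices[-1]) / peak)
    return result
```

For example, `drawdowns([100, 90, 81, 85])` returns `[0.19]`: a single 19% run of losses from 100 down to 81. Working with cumulative losses over whole runs, rather than with single-day returns, is what lets several moderate daily drops add up to one large event.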
Section 5 also discusses two simple models: one of imitation and contrarian behavior of agents, leading to chaotic dynamics of speculative bubbles and crashes, and one of the competition between value investors and technical analysts. Section 6 takes a step back and presents the general concept of self-similarity, with complex dimensions and their associated discrete self-similarity. It shows how these remarkable geometric and mathematical objects allow one to codify the information contained in the precursory patterns before large crashes. Section 7 analyzes the major past crashes on the world's major stock markets. It describes the empirical evidence for the universal nature of the critical log-periodic precursory signature of crashes. It also presents the concept of an “anti-bubble”, with the Japanese collapse from the beginning of 1991 to the present taken as a prominent example. A prediction issued and advertised in January 1999 has so far been borne out with remarkable precision, correctly predicting several changes of trend, a feat notoriously difficult using standard techniques of economic forecasting. We also summarize a very recent analysis of the behavior of the U.S. S&P500 index from 1996 to August 2002 and the forecast for the following two years. Section 8 concludes.

2. Financial crashes: what, how, why and when?

2.1. What are crashes and why do we care?
Stock market crashes are momentous financial events that fascinate academics and practitioners alike. According to the academic view that markets are efficient, only the revelation of a dramatic piece of information can cause a crash; yet in reality, even the most thorough post-mortem analyses are typically inconclusive as to what this piece of information might have been. For traders and investors, the fear of a crash is a perpetual source of stress, and the onset of the event itself always ruins the lives of some of them.
Most approaches to explaining crashes search for possible mechanisms or effects that operate at very short time scales (hours, days or weeks at most). We propose here a radically different view: the underlying cause of a crash must be sought months and years before it, in the progressively increasing build-up of market cooperativity, or effective interactions between investors, often reflected in an accelerating rise of the market price (the bubble). According to this “critical” point of view, the specific manner in which prices collapse is not the most important problem: a crash occurs because the market has entered an unstable phase, and any small disturbance or process may have triggered the instability. Think of a ruler held up vertically on your finger: this very unstable position will eventually lead to its collapse, as a result of a small motion of your hand (or the absence of an adequate one) or of any tiny whiff of air. The collapse is fundamentally due to the unstable position; the instantaneous cause of the collapse is secondary. In the same vein, the growing sensitivity and instability of the market close to such a critical point might explain why attempts to unravel the local origin of the crash have been so diverse. Essentially, anything would work once the system is ripe. We explore here the concept that a crash fundamentally has an endogenous origin and that exogenous shocks only serve as triggering factors. As a consequence, the origin of crashes is much more subtle than often thought, as it is constructed progressively by the market as a whole, as a self-organizing process. In this sense, it could be termed a systemic instability. Systemic instabilities are of great concern to governments, central banks and regulatory agencies (De Bandt and Hartmann, 2000).
The question that often arose in the 1990s is whether the new, globalized, information-technology-driven economy has advanced to the point of outgrowing the set of rules dating from the 1950s, in effect creating the need for a new rule set for the New Economy. Those who make this call basically point to the systemic instabilities since 1997 (or even back to Mexico’s peso crisis of 1994) as evidence that the old post-World War II rule set is now antiquated, thus threatening this second great period of globalization with the same fate as the first. With the global economy sometimes appearing so fragile, how large a disruption would be needed to throw a wrench into the world’s financial machinery? One of the leading moral authorities, the Basle Committee on Banking Supervision, advises (1997) that, “in handling systemic issues, it will be necessary to address, on the one hand, risks to confidence in the financial system and contagion to otherwise sound institutions, and, on the other hand, the need to minimize the distortion to market signals and discipline”. The dynamics of confidence and of contagion, and decision making based on imperfect information, are indeed at the core of the present work and will lead us to examine the following questions. What are the mechanisms underlying crashes? Can we forecast crashes? Could we control them? Or at least, could we have some influence on them? Do crashes point to the existence of a fundamental instability in the world financial structure? What could be changed to mollify or suppress these instabilities?
2.2. The crash of October, 1987
From the opening on October 14, 1987 through the market close on October 19, major indexes of market valuation in the United States declined by 30 percent or more.
Furthermore, all major world markets declined substantially in the month, which is itself an exceptional fact that contrasts with the usual modest correlations of returns across countries and the fact that stock markets around the world are amazingly diverse in their organization (Barro et al., 1989).
In local currency units, the minimum decline was in Austria (−11.4%) and the maximum was in Hong Kong (−45.8%). Out of 23 major industrial countries (Australia, Austria, Belgium, Canada, Denmark, France, Germany, Hong Kong, Ireland, Italy, Japan, Malaysia, Mexico, Netherlands, New Zealand, Norway, Singapore, South Africa, Spain, Sweden, Switzerland, United Kingdom, United States), 19 had a decline greater than 20%. Contrary to a common belief, the U.S. was not the first to decline sharply. Non-Japanese Asian markets began a severe decline on October 19, 1987, their time, and this decline was echoed first on a number of European markets, then in North America, and finally in Japan. However, most of the same markets had experienced significant but less severe declines in the latter part of the previous week. With the exception of the U.S. and Canada, other markets continued downward through the end of October, and some of these declines were as large as the great crash on October 19. A lot of work has been carried out to unravel the origin(s) of the crash, notably in the properties of trading and the structure of markets; however, no clear cause has been singled out. It is noteworthy that the strong market decline during October 1987 followed what for many countries had been an unprecedented market increase during the first nine months of the year and even before. In the U.S. market, for instance, stock prices advanced 31.4% over those nine months. Some commentators have suggested that the real cause of October’s decline was that over-inflated prices generated a speculative bubble during the earlier period. The main explanations people have come up with are the following.
1. Computer trading. In computer trading, also known as program trading, computers were programmed to automatically order large stock trades when certain market trends prevailed, in particular sell orders after losses. However, during the 1987 U.S. crash, other stock markets which did not use program trading also crashed, some with losses even more severe than the U.S. market.
2. Derivative securities. Index futures and derivative securities have been claimed to increase the variability, risk and uncertainty of the U.S. stock markets. Nevertheless, none of these techniques or practices existed in previous large and sudden market declines in 1914, 1929, and 1962.
3. Illiquidity. During the crash, the large flow of sell orders could not be digested by the trading mechanisms of existing financial markets. Many common stocks on the New York Stock Exchange were not traded until late in the morning of October 19 because the specialists could not find enough buyers to purchase the amount of stock that sellers wanted to get rid of at certain prices. This insufficient liquidity may have had a significant effect on the size of the price drop, since investors had overestimated the amount of liquidity. However, negative news about the liquidity of stock markets cannot explain why so many people decided to sell stock at the same time.
4. Trade and budget deficits. The third quarter of 1987 had the largest U.S. trade deficit since 1960, which, together with the budget deficit, led investors to think that these deficits would cause a fall of U.S. stocks relative to foreign securities. However, if the large U.S. budget deficit was the cause, why did stock markets in other countries crash as well? Presumably, if unexpected changes in the trade deficit are bad news for one country, they should be good news for its trading partners.
5. Overvaluation. Many analysts agree that stock prices were overvalued in September 1987. While Price/Earnings and Price/Dividend ratios were at historically high levels, similar Price/Earnings and Price/Dividend values had been seen for most of the 1960–1972 period, over which no sudden crash occurred. Overvaluation does not seem to trigger crashes every time.
Other cited potential causes involve the auction system itself, the presence or absence of limits on price movements, regulated margin requirements, off-market and off-hours trading (continuous auction and automated quotations), the presence or absence of floor brokers who conduct trades but are not permitted to invest on their own account, the extent of trading in the cash market versus the forward market, the identity of traders (i.e., institutions such as banks or specialized trading firms), the significance of transaction taxes, and so on. More rigorous and systematic analyses of univariate associations and multiple regressions of these various factors conclude that it is not at all clear what the origin of the crash was (Barro et al., 1989; Roll, 1988). The most precise statement, albeit somewhat self-referential, is that the most statistically significant explanatory variable in the October crash is the normal response of each country’s stock market to a worldwide market motion. A world market index was thus constructed (Barro et al., 1989; Roll, 1988) by equally weighting the local-currency indexes of the 23 major industrial countries mentioned above, normalized to 100 on September 30. It fell to 73.6 by October 30. The important result is that this index was found to be statistically related to monthly returns in every country during the period from the beginning of 1981 until the month before the crash, albeit with wildly varying magnitudes of response across countries (Barro et al., 1989; Roll, 1988). This correlation was found to swamp the influence of institutional market characteristics. This signals the possible existence of a subtle but nonetheless present worldwide cooperativity at times preceding crashes.
2.3. How? Historical crashes
In the financial world, risk, reward and catastrophe come in irregular cycles witnessed by every generation. Greed, hubris and systemic fluctuations have given us the Tulip Mania, the South Sea bubble, the land booms in the 1920s and 1980s, the U.S. stock market boom and great crash of 1929, and the October 1987 crash, to name just a few of the hundreds of ready examples (White, 1996).
2.3.1. The Tulip mania
The years of tulip speculation fell within a period of great prosperity in the republic of the Netherlands. Between 1585 and 1650, Amsterdam became the chief commercial emporium, the center of trade for the northwestern part of Europe, owing to the growing commercial activity in newly discovered America. The tulip as a cultivated flower was imported into Western Europe from Turkey and is first mentioned around 1554. The scarcity of tulips and their beautiful colors made them valuable and a must for members of high society. During the build-up of the tulip market, the participants were not making money through the actual process of production. Tulips acted as the medium of speculation, and their price determined the wealth of participants in the tulip business. It is not clear whether the build-up attracted new investment or new investment fueled the build-up, or both. What is known is that, as the build-up continued, more and more people were roped in to invest their hard-won earnings. The price of the tulip lost all correlation with its comparative value relative to other goods or services. What we now call the “tulip mania” of the seventeenth century was the “sure thing” investment during the period from the mid-1500s to 1636. Before its devastating end in 1637, those who bought tulips rarely lost money. People became too confident that this “sure thing” would always make them money and, at its peak, participants mortgaged their houses and businesses to trade tulips. The craze was so overwhelming that some tulip bulbs of a rare variety sold for the equivalent of a few tens of thousands of dollars. Before the crash, any suggestion that the price of tulips was irrational was dismissed by all the participants. The conditions now generally associated with the first period of a boom were all present: an increasing currency, a new economy with novel colonial possibilities, and an increasingly prosperous country together created the optimistic atmosphere in which booms are said to grow. The crisis came unexpectedly. On February 4, 1637, the possibility of tulips becoming definitely unsalable was mentioned for the first time. From then to the end of May 1637, all attempts at coordination among florists and bulb growers, as well as by the States of Holland, met with failure. Bulbs worth tens of thousands of U.S. dollars (in present value) in early 1637 became valueless a few months later. This remarkable event is often discussed today, parallels are drawn with modern speculative manias, and the question is asked: do the tulip market’s build-up and its subsequent crash have any relevance for today’s times?
2.3.2. The South Sea bubble
The South Sea Bubble is the name given to the enthusiastic speculative fervor ending in the first great stock market crash in England in 1720 (White, 1996). The South Sea Bubble is a fascinating story of mass hysteria, political corruption, and public upheaval. It is really a collection of thousands of stories, tracing the personal fortunes of countless individuals who rode the wave of stock speculation for a furious six months in 1720.
The “Bubble year”, as it is referred to, actually involves several individual “bubbles”, as all kinds of fraudulent joint-stock companies sought to take advantage of the mania for speculation. The following account borrows from (The) Bubble Project at http://is.dal.ca/~dmcneil/bubble.html. In 1711, the South Sea Company was given a monopoly of all trade to the South Seas. The real prize was the anticipated trade that would open up with the rich Spanish colonies in South America. In return for this monopoly, the South Sea Company would assume a portion of the national debt that England had incurred during the War of the Spanish Succession. When Britain and Spain officially went to war again in 1718, the immediate prospects for any benefits from trade to South America were nil. What mattered to speculators, however, were future prospects, and here it could always be argued that incredible prosperity lay ahead and would be realized when open hostilities came to an end. The early 1700s were also a time of international finance. By 1719 the South Sea directors wished, in a sense, to imitate the manipulation of public credit that John Law had achieved in France with the Mississippi Company, which was given a monopoly of French trade to North America; Law had connived to drive the price of its stock up, and the South Sea directors hoped to do the same. In 1719 the South Sea directors made a proposal to assume the entire public debt of the British government. On April 12, 1720 this offer was accepted. The Company immediately started to drive the price of the stock up through artificial means; these largely took the form of new subscriptions combined with the circulation of pro-trade-with-Spain stories designed to give the impression that the stock could only go higher. Not only did capital stay in England, but many Dutch investors bought South Sea stock, thus increasing the inflationary pressure.
South Sea stock rose steadily from January through to the spring. And as every apparent success soon attracts its imitators, all kinds of joint-stock companies suddenly appeared, hoping to cash in on the speculation mania. Some of these companies were legitimate, but the bulk were bogus schemes designed to take advantage of the credulity of the people. Several of the bubbles, both large and small, had some overseas trade or “New World” aspect. In addition to the South Sea and Mississippi ventures, there was a project for improving the Greenland fishery, and another for importing walnut trees from Virginia. Raising capital by selling stock in these enterprises was apparently easy work. The projects mentioned so far all had a tangible specificity, at least on paper if not in practice; others were rather vague on details but big on promise. The most remarkable was “A company for carrying on an undertaking of great advantage, but nobody to know what it is”. The prospectus stated that “the required capital was half a million, in five thousand shares of 100 pounds each, deposit 2 pounds per share. Each subscriber, paying his [or her] deposit, was entitled to 100 pounds per annum per share. How this immense profit was to be obtained, [the proposer] did not condescend to inform [the buyers] at that time”. As T.J. Dunning (1860) wrote: “Capital eschews no profit, or very small profit.... With adequate profit, capital is very bold. A certain 10 percent will ensure its employment anywhere; 20 percent certain will produce eagerness; 50 percent, positive audacity; 100 percent will make it ready to trample on all human laws; 300 percent and there is not a crime at which it will scruple, nor a risk it will not run, even to the chance of its owner being hanged”. Next morning, at nine o’clock, this great man opened an office in Cornhill.
Crowds of people beset his door, and when he shut up at three o’clock, he found that no less than one thousand shares had been subscribed for, and the deposits paid. He was thus, in five hours, the winner of 2000 pounds. He was philosophical enough to be contented with his venture, and set off the same evening for the Continent. He was never heard of again. Such scams were bad for the speculation business, and so, largely through the pressure of the South Sea directors, the so-called “Bubble Act” was passed on June 11, 1720, requiring all joint-stock companies to have a royal charter. For a moment, the confidence of the people was given an extra boost, and they responded accordingly. South Sea stock had been at 175 pounds at the end of February, 380 at the end of March, and around 520 by May 29. It peaked at the end of June at over 1000 pounds (a psychological barrier in that four-digit number). With credulity now stretched to the limit and rumors of more and more people (including the directors themselves) selling off, the bubble then burst in a slow, at first very slow, but steady deflation (not unlike the 60% drop of the Japanese Nikkei index after its all-time peak at the end of December 1989). By mid-August, the bankruptcy listings in the London Gazette reached an all-time high, an indication of how many people had bought on credit or margin. Thousands of fortunes were lost, both large and small. The directors attempted to pump up more speculation. They failed. The full collapse came by the end of September, when the stock stood at 135 pounds. The crash remained in the consciousness of the Western world for the rest of the eighteenth century, not unlike our cultural memory of the 1929 Wall Street Crash.
2.3.3. The Great crash of October 1929
The Roaring 1920s, a time of growth and prosperity on Wall Street and Main Street, ended with the Great Crash of October 1929 (for the most thorough and authoritative account and analysis, see (Galbraith, 1997)).
Two thousand investment firms went under, and the American banking industry underwent the biggest structural changes in its history, as a new era of government regulation began. Roosevelt’s New Deal politics would follow. The Great Depression that followed put 13 million Americans out of work. (That the crash of October 1929 caused the Great Depression is part of financial folklore, but probably not fully accurate. For instance, using a regime-switching framework, Coe (2002) finds that a prolonged period of crisis began not with the 1929 stock market crash but with the first banking panic of October 1930.) The October 1929 crash illustrates several features often associated with crashes. First, stock market crashes are often unforeseen by most people, especially economists. “In a few months, I expect to see the stock market much higher than today”. Those words were pronounced by Irving Fisher, America’s distinguished and famous economist, Professor of Economics at Yale University, 14 days before Wall Street crashed on Black Tuesday, October 29, 1929. “A severe depression such as 1920–1921 is outside the range of probability. We are not facing a protracted liquidation”. This was the analysis offered days after the crash by the Harvard Economic Society to its subscribers. After continuous and erroneous optimistic forecasts, the Society closed its doors in 1932. Thus, the two most renowned economic forecasters in America at the time failed to predict that a crash and a depression were forthcoming, and continued with their optimistic views even as the Great Depression took hold of America. The reason is simple: predicting trend reversals is by far the most difficult challenge posed to forecasters, and such predictions are very unreliable, especially within the linear framework of standard (auto-regressive) economic models. A second general feature exemplified by the October 1929 event is that a financial collapse has never happened when things look bad.
On the contrary, macroeconomic flows look good before crashes. Before every collapse, economists say the economy is in the best of all worlds. Everything looks rosy, stock markets go up and up, and macroeconomic flows (output, employment, and so on) appear to be improving further and further. This explains why a crash catches most people, especially economists, totally by surprise. The good times are invariably extrapolated linearly into the future. Is it not perceived as senseless by most people in today’s euphoria to talk about crash and depression? During the build-up phase of a bubble such as the one preceding the October 1929 crash, there is a growing interest among the public in the commodity in question, whether it consists of stocks, diamonds or coins. That interest can be estimated through different indicators: an increase in the number of books published on the topic (see Fig. 1), and an increase in subscriptions to specialized journals. Moreover, the well-known empirical rule according to which the volume of sales grows during a bull market finds a natural interpretation: sales increases in fact reveal and pinpoint the progress of the bubble’s diffusion throughout society. These features have recently been re-examined for evidence of a bubble, a ‘fad’ or ‘herding’ behavior, by studying individual stock returns (White and Rappoport, 1995). One story often advanced for the boom of 1928 and 1929 is that it was driven by the entry into the market of largely uninformed investors, who followed the fortunes of and invested in ‘favorite’ stocks. The result of this behavior would be a tendency for the favorite stocks’ prices to move together more than would be predicted by their shared fundamental economic values. The comovement indeed increased significantly during the boom and was a signal characteristic of the tumultuous market of the early 1930s.
These results are thus consistent with the possibility that a fad or crowd psychology played a role in the rise of the market, its crash and subsequent volatility (White and Rappoport, 1995).
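A simple way to make “comovement” concrete is the average pairwise correlation of returns across a group of stocks, compared between a calm window and a boom window. The sketch below is a hypothetical proxy written for illustration; it is not necessarily the statistic used by White and Rappoport (1995), and the three return series are made-up toy numbers in which the “boom” window simply adds a shared component to every stock.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_pairwise_correlation(returns):
    """Average correlation over all pairs of return series (one list per stock)."""
    pairs = list(combinations(returns, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Toy returns (hypothetical): in the calm window the three stocks move independently.
calm = [
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]
# In the boom window, a shared "favorite stock" component is added to every stock.
shared = [4, 2, -2, -4]
boom = [[r + s for r, s in zip(stock, shared)] for stock in calm]

print(f"calm: {mean_pairwise_correlation(calm):.3f}")
print(f"boom: {mean_pairwise_correlation(boom):.3f}")
```

With these toy numbers the calm-window average is exactly zero (the three series are mutually orthogonal), while the boom-window average exceeds 0.9: the shared component alone makes the stocks move together.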
Fig. 1. Comparison between the number of yearly published books about stock market speculation and the level of stock prices (1911–1940). Black line: books at Harvard library whose titles contain one of the words “stocks”, “stock market” or “speculation”; grey line: Standard and Poor index of common stocks. The curve of published books lags behind the price curve with a time-lag of about 1.5 years, which can be explained by the time needed for a book to get published. Source: The stock price index is taken from the Historical Abstract of the United States. Reproduced from (Roehner and Sornette, 2000).
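The lag quoted in the caption can, in principle, be estimated by shifting one yearly series against the other and keeping the shift that maximizes their correlation. The sketch below illustrates this lagged cross-correlation on made-up series in which the “books” curve is, by construction, the price curve delayed by two years; it is not the procedure used to produce Fig. 1. With yearly sampling, lags are resolved only to whole years, so a lag of about 1.5 years would show up as 1 or 2.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def best_lag(leader, follower, max_lag):
    """Return (k, r): the lag k in samples maximizing corr(leader[t], follower[t + k])."""
    best_k, best_r = 0, float("-inf")
    for k in range(max_lag + 1):
        # Align leader[t] with follower[t + k]; both slices have len(leader) - k points.
        r = pearson(leader[: len(leader) - k], follower[k:])
        if r > best_r:
            best_k, best_r = k, r
    return best_k, best_r


# Hypothetical yearly data: a price boom and bust, and a book count echoing it two years later.
prices = [10, 12, 15, 20, 28, 35, 20, 12, 10, 11, 12]
books = [5, 5] + [2 * p for p in prices[:-2]]

lag, r = best_lag(prices, books, max_lag=3)
print(f"estimated lag: {lag} years (correlation {r:.3f})")
```

The maximum-correlation criterion is only one choice; with real, noisy data the peak is broader and the estimate correspondingly less sharp.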
The political mood before the October 1929 crash was also optimistic. In November 1928, Herbert Hoover was elected President of the United States in a landslide, and his election set off the greatest increase in stock buying to that date. Less than a year after the election, Wall Street crashed.
2.4. Why? Extreme events in complex systems
Financial markets are not the only systems with extreme events. They are one among many systems exhibiting a complex organization and dynamics with similar behavior. Systems with a large number of mutually interacting parts, often open to their environment, self-organize their internal structure and their dynamics with novel and sometimes surprising macroscopic (“emergent”) properties. The complex system approach, which involves “seeing” interconnections and relationships, i.e., the whole picture as well as the component parts, is nowadays pervasive in the modern control of engineering devices and in business management. It also plays an increasing role in most scientific disciplines, including biology (biological networks, ecology, evolution, origin of life, immunology, neurobiology, molecular biology, and so on), geology (plate tectonics, earthquakes and volcanoes, erosion and landscapes, climate and weather, environment, and so on), economics and the social sciences (including cognition, distributed learning, interacting agents, and so on). There is a growing recognition that progress in most of these disciplines, in many of the pressing issues for our future welfare, as well as in the management of our everyday life, will need such a systemic, complex-system and multidisciplinary approach. This view tends to replace the previous reductionist approach, which consists of decomposing a system into components in the belief that a detailed understanding of each component brings understanding of the functioning of the whole. A central property of a complex system is the possible occurrence of coherent large-scale collective behaviors with a very rich structure, resulting from the repeated nonlinear interactions among its constituents: the whole turns out to be much more than the sum of its parts. Part of the scientific community holds that most complex systems are not amenable to mathematical, analytic descriptions and can only be explored by means of “numerical experiments” (see for instance (Wolfram, 2002) for an extreme implementation of this view and (Kadanoff, 2002) for an enlightening criticism). In the context of the mathematics of algorithmic complexity (Chaitin, 1987), many complex systems are said to be computationally irreducible, i.e., the only way to decide about their evolution is to actually let them evolve in time. Accordingly, the “dynamical” future time evolution of complex systems would be inherently unpredictable. This unpredictability frustrates the quest to know what tomorrow will be made of, a void often filled by the visions of “prophets” who have historically inspired or terrified the masses. The view that complex systems are unpredictable has recently been defended persuasively in concrete prediction applications, such as the socially important issue of earthquake prediction (Geller et al., 1997a, b) (see the contributions in (Nature debates, 1999) for arguments put forward by leading seismologists and geophysicists either defending or fighting this view). In addition to the persistent failures to reach a reliable earthquake prediction scheme, this view is rooted theoretically in the analogy between earthquakes and self-organized criticality (Bak, 1996).
In this “fractal” framework, there is no characteristic scale, and the power-law distribution of earthquake sizes reflects the fact that large earthquakes are nothing but small earthquakes that did not stop. They are thus unpredictable because their nucleation is no different from that of the multitude of small earthquakes, which obviously cannot all be predicted. Does this really hold for all features of complex systems? Take our personal life. We are not really interested in knowing in advance at what time we will go to a given store or drive on a highway. We are much more interested in forecasting the major bifurcations ahead of us, involving the few important things, like health, love and work, that count for our happiness. Similarly, predicting the detailed evolution of complex systems has no real value, and the fact that we are taught that it is out of reach from a fundamental point of view does not exclude the more interesting possibility of predicting the phases of evolution of complex systems that really count, like extreme events. It turns out that most complex systems in the natural and social sciences do exhibit rare and sudden transitions that occur over time intervals that are short compared to the characteristic time scales of their subsequent evolution. Such extreme events express, more than anything else, the underlying “forces” usually hidden by an almost perfect balance, and thus provide the potential for a better scientific understanding of complex systems.
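The phrase “no characteristic scale” has a precise meaning. If the probability that an event exceeds size x follows a power law, P(X > x) = (x/x_min)^(−b), then the ratio P(X > λx)/P(X > x) = λ^(−b) is the same at every size x: doubling the threshold divides the probability by the same factor for small and for large events. An exponential law, by contrast, singles out a scale, and the same ratio depends on x. The sketch below checks this numerically; the exponent b = 1 and the unit exponential scale are illustrative assumptions, not values quoted in this review.

```python
import math


def power_law_survival(x, b=1.0, x_min=1.0):
    """P(X > x) for a Pareto (power-law) distribution with exponent b, valid for x >= x_min."""
    return (x / x_min) ** (-b)


def exponential_survival(x, scale=1.0):
    """P(X > x) for an exponential distribution with the given characteristic scale."""
    return math.exp(-x / scale)


def doubling_ratio(survival, x):
    """Fraction of events larger than x that are also larger than 2x."""
    return survival(2.0 * x) / survival(x)


for x in (1.0, 10.0, 100.0):
    print(
        f"x={x:6.1f}  power law: {doubling_ratio(power_law_survival, x):.3f}"
        f"  exponential: {doubling_ratio(exponential_survival, x):.3g}"
    )
```

For the power law the printed ratio is 0.500 at every size, whereas for the exponential it collapses as x grows: under a power law, a large event is simply a small one that kept going.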
These crises have fundamental societal impacts and range from large natural catastrophes such as earthquakes, volcanic eruptions, hurricanes and tornadoes, landslides, avalanches, lightning strikes, meteorite/asteroid impacts and catastrophic events of environmental degradation, to the failure of engineering structures, crashes in the stock market, social unrest leading to large-scale strikes and upheaval, economic drawdowns on national and global scales, regional power blackouts, traffic gridlock, diseases and epidemics, and so on. It is essential to realize that the long-term behavior of these complex systems is often controlled in large part by these rare catastrophic events: the universe was probably born during an extreme explosion (the “big bang”); the nucleosynthesis of all the important heavy atomic elements constituting our matter results from the colossal explosions of supernovae (stars heavier than our sun whose internal nuclear combustion diverges at the end of their life); the largest earthquake in California, repeating about once every two centuries, accounts for a significant fraction of the total tectonic deformation; landscapes are shaped more by the “millennium” flood that moves large boulders than by the action of all other eroding agents; the largest volcanic eruptions lead to major topographic changes as well as severe climatic disruptions; according to some contemporary views, evolution is probably characterized by phases of quasi-stasis interrupted by episodic bursts of activity and destruction (Gould and Eldredge, 1993); financial crashes, which can destroy trillions of dollars in an instant, loom over and shape the psychological state of investors; political crises and revolutions shape the long-term geopolitical landscape; even our personal life is shaped in the long run by a few key decisions or happenings. The outstanding scientific question is thus how such large-scale patterns of catastrophic nature might evolve from a series of interactions on the smallest and increasingly larger scales. In complex systems, it has been found that the organization of spatial and temporal correlations does not stem, in general, from a nucleation phase diffusing across the system. It results rather from a progressive and more global cooperative process occurring over the whole system through repetitive interactions. For instance, scientific and technical discoveries are often quasi-simultaneous in several laboratories in different parts of the world, signaling the global nature of the maturing process. Standard models and simulations of scenarios of extreme events are subject to numerous sources of error, each of which may have a negative impact on the validity of the predictions (Karplus, 1992).
Some of the uncertainties are under control in the modeling process; they usually involve trade-offs between a more faithful description and manageable calculations. Other sources of error are beyond control, as they are inherent in the modeling methodology of the specific disciplines. The two known strategies for modeling are both limited in this respect. Analytical theoretical predictions are still out of reach for many complex problems, even if notable counter-examples exist (see for instance (Barra et al., 2002; Arad et al., 2001; Falkovich et al., 2001)). Brute-force numerical resolution of the equations (when they are known) or of scenarios is reliable in the “center of the distribution”, i.e., in the regime far from the extremes where good statistics can be accumulated. Crises are extreme events that occur rarely, albeit with extraordinary impact, and are therefore completely under-sampled and poorly constrained. Even the introduction of teraflop (or, in the future, petaflop) supercomputers does not qualitatively change this fundamental limitation. Notwithstanding these limitations, we believe that the progress of science and of its multidisciplinary enterprises makes the time ripe for a full-fledged effort towards the prediction of complex systems. In particular, novel approaches are possible for modeling and predicting certain catastrophic events, or “ruptures”, that is, sudden transitions from a quiescent state to a crisis or catastrophic event (Sornette, 1999). Such ruptures involve interactions between structures at many different scales. In the present review, we apply these ideas to one of the most dramatic events in the social sciences, financial crashes. The approach described here combines ideas and tools from mathematics, physics, engineering and the social sciences to identify and classify possible universal structures that occur at different scales, and to develop application-specific methodologies that use these structures to predict financial “crises”.
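The under-sampling argument can be quantified with the standard binomial estimate. If an extreme event has probability p per independent simulated scenario and N scenarios are run, the expected number of occurrences is Np and the relative standard error of the estimated frequency is sqrt((1 − p)/(Np)). The sketch below uses illustrative values of p and N that are not taken from the text, and shows how a budget that pins down common events leaves rare ones essentially unconstrained.

```python
import math


def relative_standard_error(p, n):
    """Relative standard error of the frequency estimate of an event of probability p from n trials."""
    return math.sqrt((1.0 - p) / (n * p))


def trials_for_relative_error(p, target):
    """Approximate number of independent trials needed to reach a given relative standard error."""
    return (1.0 - p) / (p * target**2)


n = 10**6  # illustrative simulation budget (assumption)
for p in (1e-1, 1e-3, 1e-6):
    print(
        f"p={p:g}: expected hits={n * p:g}, "
        f"relative error={relative_standard_error(p, n):.3f}, "
        f"trials for 10% error={trials_for_relative_error(p, 0.1):.2g}"
    )
```

At p = 10^-6 the same budget yields about one expected occurrence and a relative error close to 100%, which is the sense in which extremes are “completely under-sampled”.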
Of special interest will be the study of the premonitory processes before financial crashes or “bubble” corrections in the stock market. For this, we will describe a new set of computational methods which are capable of searching and comparing patterns, simultaneously and iteratively, at multiple scales in hierarchical systems.
D. Sornette / Physics Reports 378 (2003) 1 – 98
We will use these patterns to improve the understanding of the dynamical state before and after a financial crash and to enhance the statistical modeling of social hierarchical systems, with the goal of developing reliable forecasting skills for these large-scale financial crashes.
2.5. When? Is prediction possible? A working hypothesis
Our hypothesis is that stock market crashes are caused by the slow buildup of long-range correlations leading to a global cooperative behavior of the market, eventually ending in a collapse in a short critical time interval. The use of the word “critical” is not purely literary here: in mathematical terms, complex dynamical systems can go through “critical” points, defined as the explosion to infinity of a normally well-behaved quantity. As a matter of fact, as far as nonlinear dynamical systems go, the existence of critical points is more the rule than the exception. Given the puzzling and violent nature of stock market crashes, it is worth investigating whether there could be a link between stock market crashes and critical points.
• Our key assumption is that a crash may be caused by local self-reinforcing imitation between traders. This self-reinforcing imitation process leads to the blossoming of a bubble. If the tendency for traders to “imitate” their “friends” increases up to a certain point called the “critical” point, many traders may place the same order (sell) at the same time, thus causing a crash. The interplay between the progressive strengthening of imitation and the ubiquity of noise requires a probabilistic description: a crash is not a certain outcome of the bubble but can be characterised by its hazard rate, i.e., the probability per unit time that the crash will happen in the next instant provided it has not happened yet.
• Since the crash is not a certain deterministic outcome of the bubble, it remains rational for investors to stay in the market provided they are compensated for taking the risk of a crash by a higher rate of growth of the bubble, because there is a finite probability of “landing smoothly”, i.e., of attaining the end of the bubble without a crash.
In a series of research articles, we have shown extensive evidence that the build-up of bubbles manifests itself as an overall power-law acceleration in the price decorated by “log-periodic” precursors, a concept related to fractals, as will become clear later. This article tells this story and explains why and how these precursors occur, what they mean, and what they imply for prediction. We claim that there is a degree of predictive skill associated with these patterns. This skill has already been used in practice and is being investigated by our co-workers and us, as well as by several others, academics and, most of all, practitioners (see Sornette and Johansen, 2001, and Johansen and Sornette, 2002, for a recent review and assessment, and Zhou and Sornette, 2002a, b, c for nonparametric tests using a generalization of the so-called q-derivative). The evidence we shall discuss includes:
• the Wall Street October 1929, the World October 1987, the Hong-Kong October 1987, the World August 1998 and the Nasdaq April 2000 crashes,
• the 1985 foreign exchange event on the U.S. dollar, and the correction of the U.S. dollar against the Canadian dollar and the Japanese Yen starting in August 1998,
• the bubble on the Russian market and its ensuing collapse in 1997–1998,
• twenty-two significant bubbles followed by large crashes or by severe corrections in the Argentinian, Brazilian, Chilean, Mexican, Peruvian, Venezuelan, Hong-Kong, Indonesian, Korean, Malaysian, Philippine and Thai stock markets.
In all these cases, it has been found that log-periodic power laws adequately describe speculative bubbles on the western as well as on the emerging markets, with very few exceptions. Notwithstanding the drastic differences in epochs and contexts, we shall show that these financial crashes share a common underlying background as well as structure. The rationale for this rather surprising result is probably rooted in the fact that humans are endowed with basically the same emotional and rational qualities in the 21st century as they were in the 17th century (or at any other epoch). Humans are still essentially driven by at least a grain of greed and fear in their quest for a better well-being. The “universal” structures we are going to uncover may be understood as the robust emergent properties of the market resulting from some characteristic “rules” of interaction between investors. These interactions can change in their details, due for instance to computers and electronic communications, but they have not changed at a qualitative level. As we shall see, complex system theory allows us to account for this robustness.
3. Financial crashes are “outliers”
In the spirit of Bacon in Novum Organum about 400 years ago, “Errors of Nature, Sports and Monsters correct the understanding in regard to ordinary things, and reveal general forms. For whoever knows the ways of Nature will more easily notice her deviations; and, on the other hand, whoever knows her deviations will more accurately describe her ways”, we document in this section the evidence showing that large market drops are “outliers” and that they reveal fundamental properties of the stock market.
3.1. What are “abnormal” returns?
Stock markets can exhibit very large motions, such as rallies and crashes. Should we expect these extreme variations, or should we consider them as anomalous? Fig. 2 shows the distribution of daily returns of the DJIA and of the Nasdaq index for the period January 2, 1990 till September 29, 2000. For instance, we read in Fig. 2 that five negative and five positive daily DJIA returns larger than or equal to 4% have occurred. In comparison, 15 negative and 20 positive returns larger than or equal to 4% have occurred for the Nasdaq index. The larger fluctuations of returns of the Nasdaq compared to the DJIA are also quantified by the so-called volatility (standard deviation of returns), equal to 1.6% (respectively, 1.4%) for positive (respectively, negative) returns of the DJIA, and equal to 2.5% (respectively, 2.0%) for positive (respectively, negative) returns of the Nasdaq index. The lines shown in Fig. 2 correspond to fits of the data by an exponential function. The upward convexity of the trajectories defined by the symbols for the Nasdaq qualifies a stretched exponential model (Laherrère and Sornette, 1998), which embodies the fact that the tail of the distribution is “fatter”, i.e., there are larger risks of large drops (as well as rises) in the Nasdaq compared to the DJIA.
[Figure 2: distribution function (log scale, 1 to 1000) versus |Returns| (0 to 0.1), for the series returnDJ>0, returnDJ<0, returnNAS>0 and returnNAS<0.]
Fig. 2. Distribution of daily returns for the DJIA and the Nasdaq index for the period January 2, 1990 till September 29, 2000. The lines correspond to fits of the data by an exponential law. The branches of negative returns have been folded back onto the positive returns for comparison.
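Under the exponential model, the probability that a daily return amplitude exceeds x is exp(−x/σ), where σ is the scale of the exponential (equal to its mean and to its standard deviation). The short sketch below turns such a tail probability into a mean recurrence time. It assumes 250 trading days per year and a 1% scale, so that a 10% move is a 10-standard-deviation event; the function names are illustrative and do not come from any library.

```python
import math

TRADING_DAYS_PER_YEAR = 250  # assumed convention, as in the text


def exponential_tail_probability(amplitude, scale):
    """P(|return| >= amplitude) for an exponential law with the given scale.

    Both arguments must be expressed in the same units (here: percent).
    """
    return math.exp(-amplitude / scale)


def recurrence_time(probability_per_day, days_per_year=TRADING_DAYS_PER_YEAR):
    """Mean waiting time, in trading days and in years, for a daily event."""
    days = 1.0 / probability_per_day
    return days, days / days_per_year


# A 10% move with a 1% scale is a 10-standard-deviation event.
p = exponential_tail_probability(10.0, 1.0)
days, years = recurrence_time(p)
print(f"p = {p:.6f}; one event in {days:,.0f} trading days (~{years:.0f} years)")
# p = 0.000045; one event in 22,026 trading days (~88 years)
```

The recurrence time is simply the inverse of the daily probability, so any change in the assumed scale σ enters exponentially through exp(−x/σ).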
Let us use the exponential model and calculate the probability of observing a return amplitude larger than, say, 10 standard deviations (10% in our example). The result is 0.000045, which corresponds to 1 event in 22,026 days, or in 88 years. The drop of 22.6% of October 19, 1987 would correspond to one event in 520 million years, which qualifies it as an “outlier”. Thus, according to the exponential model, a 10% return amplitude does not qualify as an “outlier” in a clear-cut and indisputable manner. In addition, the discrimination between normal and abnormal returns depends on our choice of the frequency distribution. Determining the correct description of the frequency distribution, especially for large positive and negative returns, is a delicate problem that is still an active domain of research. Due to the lack of certainty on the best choice for the frequency distribution, this approach does not seem the most adequate for characterizing anomalous events. We now introduce another diagnostic that allows us to characterize abnormal market phases in a much more precise and nonparametric way, i.e., without referring to a specific mathematical representation of the frequency distribution.
3.2. Drawdowns (runs)
Extreme value theory (EVT) provides an alternative approach, still based on the distribution of returns estimated at a fixed time scale. Its most practical implementation is based on the so-called “peak-over-threshold” distributions (Embrechts et al., 1997; Bassi et al., 1998), which are founded on a limit theorem known as the Gnedenko–Pickands–Balkema–de Haan theorem. This theorem gives a natural limit law for peak-over-threshold values in the form of the Generalized Pareto Distribution (GPD), a family of distributions with two parameters based on the Gumbel, Weibull and Fréchet extreme value distributions. The GPD either is an exponential or has a power-law tail. Peak-over-threshold distributions put the emphasis on the characterization of the tails of the distribution of returns and have thus been scrutinized for their potential for risk assessment and management of large and extreme events (see for instance Phoa, 1999; McNeil, 1999). In particular, extreme value theory provides a general foundation for the estimation of the value-at-risk for very low-probability “extreme” events. There are, however, severe pitfalls (Diebold et al., 2001) in the use of extreme value distributions for risk management because of their reliance on the (unstable) estimation of tail probabilities. In addition, the EVT literature assumes independent returns, which implies that the degree of fatness in the tails decreases as the holding horizon lengthens (for the values of the exponents found empirically). Here, we show that this is not the case: returns exhibit strong correlations at special times, precisely characterized by the occurrence of extreme events, the regime that EVT aims to describe. This suggests re-examining EVT and extending it to variable time scales, for instance by analyzing the EVT of the distribution of drawdowns and drawups. A drawdown is defined as a persistent decrease in the price over consecutive days. A drawdown is thus the cumulative loss from the last maximum to the next minimum of the price. Drawdowns embody a rather subtle dependence since they are constructed from runs of variations of the same sign. Their distribution thus captures the way successive drops can influence each other and construct in this way a persistent process. This persistence is not measured by the distribution of returns because, by its very definition, that distribution forgets the relative positions of the returns as they unfold in time, by only counting their frequency.
This persistence is not detected either by the two-point correlation function, which measures an average linear dependence over the whole time series, while the dependence may only appear at special times, for instance for very large runs, a feature that will be washed out by the global averaging procedure. To demonstrate the information contained in drawdowns and contrast it with fixed time-scale returns, let us consider the hypothetical situation of a crash of 30% occurring over three days with three successive losses of exactly 10%. The crash is thus defined as the total loss or drawdown of 30%. Rather than looking at drawdowns, let us now follow the common approach and examine the daily data, in particular the daily distribution of returns. The 30% drawdown is now seen as three daily losses of 10%. The essential point to realize is that constructing the distribution of returns amounts to counting the number of days over which a given return has been observed. The crash will thus contribute three days of 10% loss, without the information that the three losses occurred sequentially! To see what this loss of information entails, we consider a market in which a 10% daily loss occurs typically once every 4 years (this is not an unreasonable number for the Nasdaq composite index at present times of high volatility). Counting approximately 250 trading days per year, 4 years correspond to 1000 trading days, and 1 event in 1000 days thus corresponds to a probability $1/1000 = 0.001$ for a daily loss of 10%. The crash of 30% has been dissected into three events which are not very remarkable (each with a relatively short average recurrence time of four years). The plot thickens when we ask what is, according to this description, the probability of three successive daily losses of 10%. Elementary probability tells us that it is the product of three probabilities of one daily loss of 10%, giving $10^{-9}$.
This corresponds to 1 event in 1 billion trading days! We should thus wait typically 4 million years to witness such an event! What has gone wrong? Simply, looking at daily returns and at their distributions has destroyed the information that the daily returns may be correlated at special times! This crash is like a mammoth which has been dissected into pieces without memory of the connection between the parts, and we are left with what look like mice (bear with the slight exaggeration)! Our estimation that three successive losses of 10% are utterly impossible relied on the incorrect hypothesis that these three events are independent. Independence between successive returns is remarkably well verified most of the time. However, large drops may not be independent. In other words, there may be “bursts of dependence”, i.e., “pockets of predictability”. It is clear that drawdowns keep precisely the information relevant to identify possible bursts of local dependence leading to extraordinarily large cumulative losses. Our emphasis on drawdowns is thus motivated by two considerations: (1) drawdowns are important measures of risk used by practitioners because they represent their cumulative loss since the last estimation of their wealth; it is indeed a common psychological trait of people to estimate a loss by comparison with the latest maximum wealth; (2) drawdowns automatically capture an important part of the time dependence of price returns, similarly to the run statistics often used in statistical testing (Knuth, 1969) and econometrics (Campbell et al., 1997; Barber and Lyon, 1997). As previously shown (Johansen and Sornette, 1998, 2002), the distribution of drawdowns contains information which is quite different from the distribution of returns over a fixed time scale. In particular, a drawdown embodies the interplay between a series of losses and hence measures a “memory” of the market. Drawdowns exemplify the effect of correlations in price variations when they appear, which must be taken into account for a correct characterisation of market price variations. They are direct measures of a possible amplification or “flight of fear” where previous losses lead to further selling, strengthening the downward trend, occasionally ending in a crash.
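The mammoth argument can be made concrete in a few lines. The sketch below uses hypothetical helper names and plain Python. It extracts drawdowns as runs of consecutive price decreases, measured from the local maximum to the next local minimum, and it assumes that a flat day ends a run. Applied to three successive 10% losses, it returns a single drawdown (27.1% when the losses are compounded, which the text rounds to 30%). The daily-return view, by contrast, sees three separate 10% events whose joint probability under independence is $(10^{-3})^3 = 10^{-9}$.

```python
TRADING_DAYS_PER_YEAR = 250  # assumed convention, as in the text


def drawdowns(prices):
    """Relative losses from each local maximum to the next local minimum.

    A drawdown is a run of consecutive strict price decreases; a flat or
    rising day ends the run. Values are negative (e.g. -0.271 for -27.1%).
    """
    result = []
    i = 0
    while i < len(prices) - 1:
        if prices[i + 1] < prices[i]:
            start = i  # local maximum
            while i < len(prices) - 1 and prices[i + 1] < prices[i]:
                i += 1  # walk down to the local minimum
            result.append(prices[i] / prices[start] - 1.0)
        else:
            i += 1
    return result


# Three successive daily losses of exactly 10%, followed by a rebound.
prices = [100.0, 90.0, 81.0, 72.9, 75.0]
daily_returns = [b / a - 1.0 for a, b in zip(prices, prices[1:])]
print([round(d, 3) for d in drawdowns(prices)])     # [-0.271]
print([round(r, 3) for r in daily_returns[:3]])     # [-0.1, -0.1, -0.1]

# Daily-return view: three independent 10% losses, each with probability 1e-3.
p_daily = 1e-3
p_run = p_daily ** 3
wait_years = 1.0 / p_run / TRADING_DAYS_PER_YEAR
print(f"P = {p_run:.0e}, i.e. once in about {wait_years:,.0f} years")
# P = 1e-09, i.e. once in about 4,000,000 years
```

The drawdown keeps the three losses together as one event, whereas the list of daily returns only records three unremarkable days.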
We stress that drawdowns, because of the “elastic” time scale used to define them, are effectively functions of several higher-order correlations at the same time. Johansen and Sornette (2002) have shown that the distribution of drawdowns for independent price increments x is asymptotically an exponential (while the body of the distribution is Gaussian (Mood, 1940)) when the distribution of x does not decay more slowly than an exponential, i.e., belongs to the class of exponential or super-exponential distributions. In contrast, for sub-exponentials (such as stable Lévy laws, power laws and stretched exponentials), the tail of the distribution of drawdowns is asymptotically the same as the distribution of the individual price variations. Since stretched exponentials have been found to offer an accurate quantification of price variations (Laherrère and Sornette, 1998; Sornette et al., 2000a, b; Andersen and Sornette, 2001), thus capturing a possible sub-exponential behavior, and since they contain the exponential law as a special case, the stretched exponential law is a good null hypothesis. The cumulative stretched exponential distribution is defined by
$$N_c(x) = A \exp\left[-\left(\frac{|x|}{\chi}\right)^{z}\right], \qquad (1)$$
where x is either a drawdown or a drawup. When $z < 1$ (resp. $z > 1$), $N_c(x)$ is a stretched exponential or sub-exponential (resp. super-exponential). The special case $z = 1$ corresponds to a pure exponential. In this case, $\chi$ is nothing but the standard deviation of |x|. Johansen and Sornette (2002) have analyzed the major financial indices, the major currencies, gold, and the twenty largest U.S. companies in terms of capitalisation, as well as nine others chosen randomly. They find that approximately 98% of the distribution of drawdowns is well represented by an exponential or a stretched exponential, while the few largest drawdowns (up to about the ten largest) occur at a significantly larger rate than predicted by the exponential. This is confirmed by extensive testing on surrogate data. Very large drawdowns thus belong to a different class of their own and call for a specific amplification mechanism. Drawups (gains from the last local minimum to the next local maximum) exhibit a similar behavior in only about half of the markets examined. We now present some of the most significant results.
3.3. Testing outliers
Testing for “outliers”, or more generally for a change of population in a distribution, is a subtle problem: the evidence for outliers and extreme events does not require, and is not even synonymous in general with, the existence of a break in the distribution of the drawdowns. Let us illustrate this pictorially by borrowing from another domain of active scientific investigation, namely the search for an understanding of the complexity of eddies and vortices in turbulent hydrodynamic flows, such as in mountain rivers or in the weather. Since solving the exact equations of these flows does not provide much insight, as the results are forbidding, a useful line of attack has been to simplify the problem by studying simple toy models, such as “shell” models of turbulence, that are believed to capture the essential ingredients of these flows while being amenable to analysis. Such “shell” models replace the three-dimensional spatial domain by a series of uniform onion-like spherical layers with radii increasing as a geometrical series $1, 2, 4, 8, \ldots, 2^n$, which communicate with each other typically through nearest and next-nearest neighbors. As for financial returns, a quantity of great interest is the distribution of velocity variations between two instants at the same position or between two points simultaneously. Such a distribution for the square of the velocity variations has been calculated (L’vov et al., 2001) and exhibits an approximately exponential drop-off as well as a co-existence with larger fluctuations, quite reminiscent of our findings in finance (Johansen and Sornette, 1998, 2002). Usually, such large fluctuations are not considered to be statistically significant and are not thought to provide any specific insight.
Here, it can be shown that these large fluctuations of the fluid velocity correspond to intense peaks propagating coherently over several shell layers with a characteristic bell-like shape, approximately independent of their amplitude and duration (up to a rescaling of their size and duration). When extending these observations to very long times, so that the anomalous fluctuations can be sampled much better, one gets a continuous distribution (L’vov et al., 2001). Naively, one would expect the same physics to apply in each shell layer (each scale) and, as a consequence, the distributions in each shell to be the same, up to a change of unit reflecting the different scale embodied by each layer. It turns out that the three curves for three different shells can indeed be nicely collapsed, but only for the small velocity fluctuations, while the large fluctuations are described by very different heavy tails. Alternatively, when one tries to collapse the curves in the region of the large velocity fluctuations, the portions of the curves close to the origin are not collapsed at all and are very different. The remarkable conclusion is that the distributions of velocity increments seem to be composed of two regions, a region of “normal scaling” and a domain of extreme events. The theoretical analysis of L’vov et al. (2001) further substantiates the fact that the largest fluctuations result from a different mechanism. Here is the message that comes out of this discussion: the concept of outliers and of extreme events does not rest on the requirement that the distribution should not be smooth. Noise and the very process of constructing the distribution will almost always smooth out the curves. What L’vov et al. (2001) find is that the distribution is made of two different populations, the body and the tail, which have different physics, different scaling and different properties. This is a clear demonstration that this model of turbulence exhibits outliers, in the sense that there is a well-defined population of very large and quite rare events that punctuate the dynamics and which cannot be seen as scaled-up versions of the small fluctuations. As a consequence, the fact that the distribution of small events might show some curvature or continuous behavior does not tell anything against the outlier hypothesis. It is essential to keep this point in mind when looking at the evidence presented below for the drawdowns. Other groups have recently presented supporting evidence that crash and rally days differ significantly in their statistical properties from typical market days. For instance, Lillo and Mantegna investigated the return distributions of an ensemble of stocks simultaneously traded on the New York Stock Exchange (NYSE) during market days of extreme crash or rally in the period from January 1987 to December 1998. They constructed two hundred distributions of returns, one for each of two hundred trading days, each over the whole set of stocks traded on the NYSE. Anomalously large widths and fat tails are observed specifically on the day of the crash of October 19, 1987, as well as during a few other turbulent days. Lillo and Mantegna document another remarkable behavior associated with crashes and rallies, namely that the distortion of the distributions of returns is strong not only in the tails describing large moves but also in their center. Specifically, they show that the overall shape of the distributions is modified on crash and rally days.
Closer to our claim that markets develop precursory signatures of bubbles on long time scales, Mansilla has also shown, using a measure of relative complexity, that time sequences corresponding to “critical” periods before large market corrections or crashes contain more novel information with respect to the whole price time series than those sequences corresponding to periods where nothing happened. The conclusion is that, in the intervals where no financial turbulence is observed, that is, where the market works fine, the informational content of the (binary-coded) price time series is small. In contrast, there seems to be significant information in the price time series associated with bubbles. This finding is consistent with the appearance of a collective herding behavior modifying the texture of the price time series compared to normal times.
3.4. The Dow Jones industrial average
Fig. 3 shows the distribution of drawdowns and of drawups for the returns of the DJIA over this century. The (stretched) exponential distribution has been derived on the assumption that successive price variations are independent. There is a large body of evidence for the correctness of this assumption for most trading days (Campbell et al., 1997). However, consider, for instance, the 14 largest drawdowns that have occurred in the Dow Jones Industrial Average in this century. Their characteristics are presented in Table 1. Only 3 lasted one or two days, whereas 9 lasted four days or more. Let us examine in particular the largest drawdown. It started on October 14, 1987 (1987.786 in decimal years), lasted four days and led to a total loss of −30.7%. This crash is thus a run of four consecutive losses: cumulatively, the index is down by 3.8% after the first day, by 6.1% after the second, by 10.4% after the third and by 30.7% after the fourth. In terms of consecutive daily losses, this corresponds to 3.8%, 2.4%, 4.6% and 22.6%, the last one on what is known as the Black Monday of October 1987.
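Two pieces of arithmetic behind the October 1987 example can be checked directly. First, cumulative losses compound multiplicatively, so the daily losses 3.8%, 2.4%, 4.6% and 22.6% give the cumulative drawdowns of 3.8%, 6.1%, 10.4% and 30.7% quoted above. Second, if daily losses were independent and followed a pure exponential (the z = 1 case of Eq. (1), normalized here to A = 1, which is an assumption), the probability of the whole run would be the product of four single-day tail probabilities. The sketch uses the 0.63% scale fitted to the DJIA in the next paragraph and is meant as an illustration only.

```python
import math

daily_losses = [3.8, 2.4, 4.6, 22.6]  # % daily losses, Oct. 14-19, 1987 (from the text)
SCALE = 0.63  # % ; exponential scale fitted to DJIA daily losses (from the text)

# Cumulative drawdown after each day: 1 - prod(1 - loss_i / 100).
cumulative = []
level = 1.0
for loss in daily_losses:
    level *= 1.0 - loss / 100.0
    cumulative.append(round(100.0 * (1.0 - level), 1))
print(cumulative)  # [3.8, 6.1, 10.4, 30.7]

# Independence hypothesis: multiply single-day tail probabilities exp(-x / SCALE).
tail_probs = [math.exp(-loss / SCALE) for loss in daily_losses]
joint = math.prod(tail_probs)  # math.prod requires Python 3.8+
print([f"{p:.1e}" for p in tail_probs])  # ['2.4e-03', '2.2e-02', '6.7e-04', '2.6e-16']
print(f"joint probability ~ {joint:.1e}")
```

The joint probability comes out of order $10^{-23}$, which is the figure used below to reject the independence hypothesis.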
The observation of large successive drops suggests the existence of a transient correlation, as we already pointed out. For the Dow Jones, this reasoning can be adapted as follows. We use a simple functional form for the distribution of daily losses, namely an exponential distribution with decay rate 1/0.63% obtained by a fit to the distribution of drawdowns shown in Fig. 3. The quality of the exponential model is confirmed by the direct calculation of the average loss amplitude, equal to 0.67%, and of its standard deviation, equal to 0.61% (recall that an exact exponential would make the three values exactly equal: 1/decay rate = average = standard deviation). Using these numerical values, the probability of a drop equal to or larger than 3.8% is $\exp(-3.8/0.63) = 2.4\times10^{-3}$ (an event occurring about once every two years); the probability of a drop equal to or larger than 2.4% is $\exp(-2.4/0.63) = 2.2\times10^{-2}$ (an event occurring about once every two months); the probability of a drop equal to or larger than 4.6% is $\exp(-4.6/0.63) = 6.7\times10^{-4}$ (an event occurring about once every six years); the probability of a drop equal to or larger than 22.6% is $\exp(-22.6/0.63) = 2.6\times10^{-16}$ (an event occurring about once every $10^{13}$ years, counting 250 trading days per year). Altogether, under the hypothesis that daily losses are uncorrelated from one day to the next, the sequence of four drops making the largest drawdown occurs with a probability of $10^{-23}$, i.e., once in about $4\times10^{20}$ years. This exceedingly small value of $10^{-23}$ suggests that the hypothesis of uncorrelated daily returns is to be rejected: drawdowns, and especially the large ones, may exhibit intermittent correlations in the asset price time series.
[Figure 3: Log(P(x)), from 0 down to −9, versus drawdown (−0.3 to 0) and drawup (0 to 0.3).]
Fig. 3. Normalized natural logarithm of the cumulative distribution of drawdowns and of the complementary cumulative distribution of drawups for the Dow Jones Industrial Average index (U.S. stock market). The two continuous lines show the fits of these two distributions with the stretched exponential distribution. Negative values such as −0.20 and −0.10 correspond to drawdowns of amplitude equal to 20% and 10%, respectively. Similarly, positive values correspond to drawups, with, for instance, a value of 0.2 meaning a drawup of +20%. Reproduced from Johansen and Sornette (2001c).
Table 1
Characteristics of the 14 largest drawdowns of the Dow Jones Industrial Average in this century
Rank  Starting time  Index value  Duration (days)  Loss (%)
1     1987.786       2508.16      4                −30.7
2     1914.579       76.7         2                −28.8
3     1929.818       301.22       3                −23.6
4     1933.549       108.67       4                −18.6
5     1932.249       77.15        8                −18.5
6     1929.852       238.19       4                −16.6
7     1929.835       273.51       2                −16.6
8     1932.630       67.5         1                −14.8
9     1931.93        90.14        7                −14.3
10    1932.694       76.54        3                −13.9
11    1974.719       674.05       11               −13.3
12    1930.444       239.69       4                −12.4
13    1931.735       109.86       5                −12.9
14    1998.649       8602.65      4                −12.4
The starting dates are given in decimal years. Reproduced from (Johansen and Sornette, 2001c).
3.5. The Nasdaq composite index
In Fig. 4, we see the rank-ordering plot of drawdowns for the Nasdaq composite index, from its establishment in 1971 until 18 April 2000. The rank-ordering plot, which is the same as the (complementary) cumulative distribution with the axes interchanged, puts the emphasis on the largest events. The four largest events are not situated on a continuation of the distribution of smaller events: the relative jump between ranks 4 and 5 is larger than 33%, whereas the corresponding jump between ranks 5 and 6 is less than 1%, and this remains true for higher ranks. This means that, for drawdowns less than 12.5%, we have a more or less “smooth” curve, followed by a gap larger than 33% separating it from the four largest events. According to rank, these four events are the crash of April 2000, the crash of October 1987, a larger than 17% “after-shock” related to the crash of October 1987, and a larger than 16% drop related to the “slow crash” of August 1998, which we shall discuss later on.
[Figure 4: Loss (%), from 0 down to −30, versus Rank (log scale, 1 to 1000); the largest events are labeled Apr. 2000, Oct. 1987, Oct. 1987 and Aug. 1998.]
Fig. 4. Rank ordering of drawdowns in the Nasdaq Composite since its establishment in 1971 until 18 April 2000. Rank 1 is the largest drawdown. Rank 2 is the second largest, and so on. Reproduced from Johansen and Sornette (2000a).
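The rank-ordering diagnostic used for the Nasdaq amounts to sorting drawdown amplitudes in decreasing order and comparing consecutive ranks. The helpers below are a hypothetical sketch. The amplitudes in the usage lines are made-up illustrative values, not the Nasdaq data, chosen only to mimic a jump of more than 33% between ranks 4 and 5 followed by sub-1% steps.

```python
def rank_order(drawdowns):
    """Drawdown amplitudes sorted from largest (rank 1) to smallest."""
    return sorted((abs(d) for d in drawdowns), reverse=True)


def relative_jumps(ranked):
    """Relative jump between rank k and rank k+1: x_k / x_(k+1) - 1."""
    return [a / b - 1.0 for a, b in zip(ranked, ranked[1:])]


def ccdf_count(ranked, x):
    """Number of drawdowns with amplitude >= x.

    For x equal to the rank-k amplitude (and no ties) this returns k, which is
    why the rank-ordering plot is the complementary cumulative distribution
    with the axes interchanged.
    """
    return sum(1 for a in ranked if a >= x)


# Illustrative (hypothetical) drawdown amplitudes, as negative fractions.
sample = [-0.125, -0.25, -0.123, -0.168, -0.20, -0.124, -0.17]
ranked = rank_order(sample)
for k, (amp, jump) in enumerate(zip(ranked, relative_jumps(ranked)), start=1):
    print(f"rank {k}: {amp:.3f}, jump to rank {k + 1}: {jump:.1%}")
```

With these toy values the jump from rank 4 to rank 5 is about 34%, while the following jumps stay below 1%, which is the kind of break the text describes.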
[Figure 5: normalised cumulative distribution (log scale, 0.001 to 1) versus drawdown (−0.25 to 0) for the Nasdaq Composite, with the two 99% confidence lines.]
Fig. 5. Normalized cumulative distribution of drawdowns in the Nasdaq Composite since its establishment in 1971 until 18 April 2000. The 99% confidence lines are estimated from synthetic tests obtained by generating surrogate financial time series constructed by reshuffling the daily returns at random. Reproduced from Johansen and Sornette (2000a).
To further establish the statistical confidence with which we can conclude that the four largest events are outliers, the daily returns have been reshuffled 1000 times, generating 1000 synthetic data sets. This procedure means that the synthetic data sets have exactly the same distribution of daily returns. However, higher-order correlations and dependence that may be present in the largest drawdowns are destroyed by the reshuffling. This “surrogate” data analysis of the distribution of drawdowns has the advantage of being nonparametric, i.e., independent of the quality of fits with a model such as the exponential or any other model. We now compare the distribution of drawdowns for the real data and for the synthetic data. With respect to the synthetic data, this can be done in two complementary ways. In Fig. 5, we see the distribution of drawdowns in the Nasdaq Composite compared with the two lines constructed at the 99% confidence level for the entire ensemble of synthetic drawdowns, i.e., by considering the individual drawdowns as independent. For any given drawdown, the upper (resp. lower) confidence line is such that 5 of the synthetic distributions are above (below) it; as a consequence, 990 synthetic time series out of the 1000 lie within the two confidence lines for any drawdown value, which defines the typical interval within which we expect to find the empirical distribution. The most striking feature apparent in Fig. 5 is that the distribution of the true data breaks away from the 99% confidence interval at approximately 15%, showing that the four largest events are indeed “outliers”. In other words, chance alone cannot reproduce these largest drawdowns. We are thus forced to explore the possibility that an amplification mechanism and dependence across daily returns might appear at special and rare times to create these outliers.
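The surrogate construction can be sketched as follows, assuming daily returns as input. Each surrogate is a random permutation of the observed returns, so the one-day distribution is preserved exactly while any dependence between successive days is destroyed. Drawdowns are rebuilt from runs of negative returns, compounding within each run. For each threshold, the pointwise 0.5% and 99.5% quantiles across surrogates then give the two 99% confidence lines, that is, 5 surrogates above and 5 below when 1000 are generated. The function names and the fixed seed are illustrative assumptions, not the authors' code.

```python
import random


def drawdowns_from_returns(returns):
    """Drawdowns (negative fractions) from runs of consecutive negative returns."""
    result, level = [], None
    for r in returns:
        if r < 0:
            level = (1.0 + r) if level is None else level * (1.0 + r)
        elif level is not None:  # a zero or positive return ends the run
            result.append(level - 1.0)
            level = None
    if level is not None:  # series ends inside a drawdown
        result.append(level - 1.0)
    return result


def cumulative_fraction(dds, thresholds):
    """Fraction of drawdowns whose amplitude is >= each threshold."""
    n = len(dds)
    return [sum(1 for d in dds if -d >= t) / n if n else 0.0 for t in thresholds]


def surrogate_confidence_lines(returns, thresholds, n_surrogates=1000,
                               level=0.99, seed=0):
    """Pointwise lower/upper confidence lines from reshuffled return series."""
    rng = random.Random(seed)
    shuffled = list(returns)
    curves = []
    for _ in range(n_surrogates):
        rng.shuffle(shuffled)  # same returns, random order
        curves.append(cumulative_fraction(drawdowns_from_returns(shuffled),
                                          thresholds))
    k = int(round(n_surrogates * (1.0 - level) / 2))  # 5 for 1000 at 99%
    lower, upper = [], []
    for j in range(len(thresholds)):
        column = sorted(curve[j] for curve in curves)
        lower.append(column[k])       # k surrogates lie below this value
        upper.append(column[-k - 1])  # k surrogates lie above this value
    return lower, upper
```

The empirical curve, computed as `cumulative_fraction(drawdowns_from_returns(observed), thresholds)`, can then be compared with the two lines; points above the upper line at large thresholds are the candidates for outliers.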
A more sophisticated analysis is to consider each synthetic data set separately and calculate the conditional probability of observing a given drawdown, conditioned on some prior observation of drawdowns.
D. Sornette / Physics Reports 378 (2003) 1 – 98
This gives a more precise estimate of the statistical significance of the outliers, because the previously defined confidence lines neglect the correlations created by the ordering process that is explicit in the construction of a cumulative distribution. Out of 10,000 synthetic data sets that were generated, we find that 776 had a single drawdown larger than 16.5%, 13 had two drawdowns larger than 16.5%, 1 had three drawdowns larger than 16.5%, and none had four (or more) drawdowns larger than 16.5% as in the real data. This means that, given the distribution of returns, chance alone gives an 8% probability of observing one drawdown larger than 16.5%, a 0.1% probability of observing two drawdowns larger than 16.5% and, for all practical purposes, zero probability of observing three or more drawdowns larger than 16.5%. Hence, we can reject the hypothesis that the four largest drawdowns observed on the Nasdaq Composite index could result from chance alone with a confidence better than 99.99%, i.e., essentially with certainty. As a consequence, we are led again to conclude that the largest market events are characterised by a stronger dependence than is observed during “normal” times. This analysis confirms the conclusion from the analysis of the DJIA shown in Fig. 3, that drawdowns larger than about 15% are to be considered as outliers with high probability. It is interesting that the same amplitude of approximately 15% is found for both markets, given the much larger daily volatility of the Nasdaq Composite. This may result from the fact that, as we have shown, very large drawdowns are controlled more by transient correlations leading to runs of losses lasting a few days than by the amplitude of a single daily return. The statistical analysis of the Dow Jones average and the Nasdaq Composite thus suggests that large crashes are special.
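The conditional counting can be sketched in the same way. This is again an illustrative sketch with hypothetical function names, under the same assumed definition of a drawdown as the cumulative loss over a run of negative daily returns. The last lines only re-use the counts quoted in the text; the 9210 surrogates with no drawdown above 16.5% follow by subtracting 776, 13 and 1 from 10,000.

```python
import random
from collections import Counter


def drawdowns(returns):
    """Cumulative losses over uninterrupted runs of negative simple returns."""
    result, level = [], 1.0
    for r in returns:
        if r < 0:
            level *= 1.0 + r
        elif level < 1.0:
            result.append(1.0 - level)
            level = 1.0
    if level < 1.0:
        result.append(1.0 - level)
    return result


def large_drawdown_histogram(returns, threshold=0.165, n_surrogates=10_000, seed=0):
    """Counter mapping k -> number of surrogates with exactly k drawdowns > threshold."""
    rng = random.Random(seed)
    shuffled = list(returns)
    hist = Counter()
    for _ in range(n_surrogates):
        rng.shuffle(shuffled)  # one reshuffled surrogate series
        k = sum(d > threshold for d in drawdowns(shuffled))
        hist[k] += 1
    return hist


def p_at_least(hist, m):
    """Fraction of surrogates showing m or more drawdowns above the threshold."""
    total = sum(hist.values())
    return sum(count for k, count in hist.items() if k >= m) / total


# Counts reported in the text for the Nasdaq Composite surrogates.
reported = Counter({0: 9210, 1: 776, 2: 13, 3: 1})
print(p_at_least(reported, 1))  # 0.079
print(p_at_least(reported, 2))  # 0.0014
print(p_at_least(reported, 4))  # 0.0: no surrogate shows four such drawdowns
```

Zero successes out of 10,000 trials bounds the chance probability below about $10^{-4}$, which is where the 99.99% confidence level quoted above comes from.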
In the following sections, we shall show that there are other specific indications associated with these “outliers”, such as precursory patterns decorating the speculative bubbles ending in crashes.

3.6. The presence of “Outliers” is a general phenomenon

To avoid a tedious repetition of many figures, we group the cumulative distributions of drawdowns and the complementary cumulative distributions of drawups of several stocks in the same Fig. 6. To construct this figure, we have fitted the stretched exponential model (1) to each distribution and obtained the corresponding parameters $A$, $\chi$ and $z$ given in Johansen and Sornette (2001c). We then construct the normalized distributions

$$ N_c^{(n)}(x) = N_c\big((|x|/\chi)^z\big)/A , \qquad (2) $$

using the triplet $A$, $\chi$ and $z$ that is specific to each distribution. Fig. 6 plots expression (2) for each distribution, i.e., $N_c/A$ as a function of $y \equiv \mathrm{sign}(x)\,(|x|/\chi)^z$. If the stretched exponential model (1) held true for all the drawdowns and all the drawups, all the normalized distributions would collapse exactly onto the “universal” functions $e^{y}$ for the drawdowns ($y < 0$) and $e^{-y}$ for the drawups ($y > 0$). We observe that this is the case for values of $|y|$ up to about 5, i.e., up to typically 5 standard deviations (since most exponents $z$ are close to 1), beyond which there is a clear upward departure for both drawdowns and drawups. Compared with the extrapolation of the normalized stretched exponential model $e^{-|y|}$, the empirical normalized distributions give about 10 times too many drawdowns and drawups larger than $|y| = 10$ standard deviations, and more than $10^4$ times too many drawdowns and drawups larger than $|y| = 20$ standard deviations. Note that for AT&T, a crash of ≈ 73% occurred, which lies beyond the range shown in Fig. 6.
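The normalization (2) is easy to reproduce. The sketch below assumes that the stretched exponential model (1) has the form $N_c(x) = A \exp[-(|x|/\chi)^z]$, which is what makes the collapse onto $e^{-|y|}$ exact, and that drawdowns are assigned negative $x$; the function names and the parameter values in the usage lines are hypothetical.

```python
import math


def normalize(xs, counts, A, chi, z):
    """Map points (x, N_c(x)) to (y, N_c/A) with y = sign(x) * (|x|/chi)**z.

    Convention assumed here: drawdowns carry x < 0 and drawups x > 0.
    """
    return [(math.copysign((abs(x) / chi) ** z, x), n / A) for x, n in zip(xs, counts)]


def excess_over_model(points):
    """Ratio of the normalized distribution to the universal curve exp(-|y|).

    A ratio near 1 means the point lies on the collapsed curve; a ratio of
    about 10 means ten times more events than the model extrapolation.
    """
    return [nc / math.exp(-abs(y)) for y, nc in points]


# Hypothetical fit parameters; counts generated exactly from the assumed model.
A, chi, z = 100.0, 0.02, 0.9
xs = [-0.08, -0.04, 0.03, 0.06]
model_counts = [A * math.exp(-(abs(x) / chi) ** z) for x in xs]
ratios = excess_over_model(normalize(xs, model_counts, A, chi, z))
```

For points generated from the model every ratio equals 1 up to rounding; the upward departure in Fig. 6 corresponds to ratios of order 10 near $|y| = 10$.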
Fig. 6. Cumulative distribution of drawdowns and complementary cumulative distribution of drawups for 29 companies, which include the 20 largest USA companies in terms of capitalisation according to Forbes at the beginning of the year 2000, and in addition Coca Cola (Forbes number 25), Qualcomm (number 30), Appl. Materials (number 35), Procter & Gamble (number 38), JDS Uniphase (number 39), General Motors (number 43), Am. Home Prod. (number 46), Medtronic (number 50) and Ford (number 64). This figure plots each distribution $N_c$ normalized by its corresponding factor $A$ as a function of the variable $y \equiv \mathrm{sign}(x)\,(|x|/\chi)^z$, where $\chi$ and $z$ are specific to each distribution and obtained from the fit to the stretched exponential model. Reproduced from Johansen and Sornette (2001c).
The results obtained in Johansen and Sornette (1998, 2000a, 2001c, 2002) can be summarized as follows:
1. Approximately 1–2% of the largest drawdowns are not at all explained by the exponential null-hypothesis or its extension in terms of the stretched exponential (1). Drawdowns up to three times larger than expected from the null-hypothesis occur in essentially all the time series that we have investigated, the only noticeable exception being the French index CAC40. We term these anomalous drawdowns “outliers”.
2. About half of the time series show outliers for the drawups. The drawups are thus statistically different from the drawdowns and constitute a less conspicuous structure of financial markets.
3. For companies, large drawups of more than 15% occur approximately twice as often as large drawdowns of similar amplitudes.
4. The bulk (98%) of the drawdowns and drawups are very well fitted by the exponential null-hypothesis (based on the assumption of independent price variations) or by the stretched exponential model.

The most important result is the demonstration that the very largest drawdowns are outliers. This is true notwithstanding the fact that the very largest daily drops are not outliers, except for the exceptional and unique daily drop on October 19, 1987. Therefore, the anomalously large amplitude of the drawdowns can only be explained by invoking the emergence of rare but sudden persistences of successive daily drops, together with a correlated amplification of the drops. Why such successions of correlated daily moves occur is a very important question, with consequences for portfolio management and systemic risk, to cite only two applications, which we investigate in the following sections.
3.7. Implications for safety regulations of stock markets

The realization that large drawdowns, and crashes in particular, may result from a run of losses over several successive days is not without consequences for the regulation of stock markets. Following the market crash of October 1987, in an attempt to head off future one-day stock market tumbles of historic proportions, the Securities and Exchange Commission and the three major U.S. stock exchanges agreed to install so-called circuit breakers. Circuit breakers are designed to gradually inhibit trading during market declines, first curbing New York Stock Exchange program trades and eventually halting all U.S. equity, options and futures activity. Similar circuit breakers operate in other world stock markets, with different specific definitions. The argument is that the halt triggered by a circuit breaker provides time for brokers and dealers to contact their clients when there are large price movements and to get new instructions or additional margin. Circuit breakers also limit credit risk and loss of financial confidence by providing a “time-out” to settle up and to ensure that everyone is solvent. This inactive period is of further use for investors to pause, evaluate and inhibit panic. Finally, circuit breakers dispel the illusion of market liquidity by spelling out the economic fact of life that markets have limited capacity to absorb massive unbalanced volumes. They thus force large investors, such as pension portfolio managers and mutual funds, to take even more account of the impact of their “size order”, thus possibly cushioning large market movements. Others argue that a trading halt can increase risk by inducing trading in anticipation of the halt. Another disadvantage is that circuit breakers prevent some traders from liquidating their positions, thus creating market distortion by preventing price discovery (Harris, 1997).
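The point that a run of moderate daily losses can escape a one-day trigger can be made concrete with a toy calculation. The 7% daily threshold and the 5% daily drops below are hypothetical numbers chosen for illustration, not the rules of any actual exchange, and `breaker_trips` is a hypothetical single-day rule.

```python
def breaker_trips(daily_returns, daily_limit):
    """Hypothetical single-day rule: halt if any one-day loss reaches daily_limit."""
    return any(r <= -daily_limit for r in daily_returns)


def cumulative_loss(daily_returns):
    """Total loss over consecutive days, compounding simple returns."""
    level = 1.0
    for r in daily_returns:
        level *= 1.0 + r
    return 1.0 - level


run = [-0.05] * 5                      # five successive 5% daily drops
print(breaker_trips(run, 0.07))        # False: no single day reaches 7%
print(round(cumulative_loss(run), 3))  # 0.226: a 22.6% drawdown overall
```

A single-day rule sees five unremarkable days, while the compounded loss is comparable in size to the largest drawdowns discussed above.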
For the October 1987 crash, countries that had stringent circuit breakers, such as France, Switzerland and Israel, also had some of the largest cumulative losses. Given the evidence presented here that large drops are created by rare and transient dependent losses occurring over several days, we should be cautious in considering circuit breakers as reliable crash killers.

4. Positive feedbacks

Since it is the actions of investors whose buy and sell decisions move prices up and down, any deviation from a random walk in the stock market price trajectory has ultimately to be traced back to the behavior of investors. We are in particular interested in mechanisms that may lead to positive feedbacks on prices, i.e., mechanisms by which a recent move of the market up (respectively down) makes it more probable that the market keeps moving up (respectively down), so that a large cumulative move ensues. The concept of “positive feedbacks” has a long history in economics and is related to the idea of “increasing returns”, which says that goods become cheaper the more of them are produced (and the closely related idea that some products, like fax machines, become more useful the more people use them). “Positive feedback” is the opposite of “negative feedback”, a concept well known for instance in population dynamics: the larger the population of rabbits in a valley, the less grass there is per rabbit. If the population grows too much, the rabbits will eventually starve, slowing down their reproduction rate, which thus reduces their population at a later time. Thus negative feedback means that the higher the population, the slower the growth rate, leading to a spontaneous regulation of the population size; negative feedbacks thus
tend to regulate growth towards an equilibrium. In contrast, positive feedback asserts that the higher the price or the price return in the recent past, the higher the price growth will be in the future. Positive feedbacks, when unchecked, can produce runaways until the deviation from equilibrium is so large that other effects can be abruptly triggered and lead to ruptures or crashes. Youssefmir et al. (1998) have stressed the importance of positive feedback in a dynamical theory of asset price bubbles that exhibits the appearance of bubbles and their subsequent crashes. The positive feedback leads to speculative trends that may dominate over fundamental beliefs and that make the system increasingly susceptible to any exogenous shock, thus eventually precipitating a crash. There are many mechanisms in the stock market and in the behavior of investors that may lead to positive feedbacks. We describe a general mechanism for positive feedback, now known as the “herd” or “crowd” effect, based on imitation processes. We present a simple model of the best investment strategy that an investor can develop based on interactions with and information taken from other investors. We show how the repetition of these interactions may lead to a remarkable cooperative phenomenon in which the market can suddenly “solidify” a global opinion, leading to large price variations.

4.1. Herding

There is growing empirical evidence of the existence of herd or “crowd” behavior in speculative markets (see Shiller, 2000 and references therein). Herd behavior is often said to occur when many people take the same action because some mimic the actions of others. The term “herd” obviously refers to similar behavior observed in animal groups. Other terms such as “flocks” or “schools” describe the collective coherent motion of large numbers of self-propelled organisms, such as migrating birds and gnus, lemmings and ants.
In recent years, much of the observed herd behavior in animals has been shown to result from the action of simple laws of interaction between animals. With respect to humans, there is a long history of analogies between human groups and organized matter (Callen and Shapero, 1974; Montroll and Badger, 1974). More recently, extreme crowd motions, such as under panic, have been remarkably well quantified by models that treat the crowd as a collection of individuals interacting as a granular medium with friction, such as the familiar sand of beaches (Helbing et al., 2000). Herding has been linked to many economic activities, such as investment recommendations (Scharfstein and Stein, 1990; Graham, 1999; Welch, 2000), price behavior of IPOs (Initial Public Offerings) (Welch, 1992), fads and customs (Bikhchandani et al., 1992), earnings forecasts (Trueman, 1994), corporate conservatism (Zwiebel, 1995) and delegated portfolio management (Maug and Naik, 1995). Researchers are investigating the incentives investment advisors face when deciding whether to herd and, in particular, whether economic conditions and agents’ individual characteristics affect their likelihood of herding. Although herding behavior appears inefficient from a social standpoint, it can be rational from the perspective of managers who are concerned about their reputations in the labor market. Such behavior may also occur as an information cascade (Welch, 1992; Bikhchandani et al., 1992; Devenow and Welch, 1996), a situation in which every subsequent actor, based on the observations of others, makes the same choice independently of his/her private signal. Herding among investment newsletters, for instance, is found to decrease with the precision of private information (Graham, 1999): the less information you have, the stronger your incentive to follow the consensus.
Research on herding in finance can be subdivided into the following non-mutually exclusive categories (Devenow and Welch, 1996; Graham, 1999).
1. Informational cascades occur when individuals choose to ignore or downplay their private information and instead jump on the bandwagon by mimicking the actions of individuals who acted previously. Informational cascades occur when the existing aggregate information becomes so overwhelming that an individual’s single piece of private information is not strong enough to reverse the decision of the crowd. Therefore, the individual chooses to mimic the action of the crowd, rather than act on his private information. If this scenario holds for one individual, then it likely also holds for anyone acting after this person. This domino-like effect is often referred to as a cascade. The two crucial ingredients for an informational cascade to develop are: [1] sequential decisions, with subsequent actors observing the decisions (not the information) of previous actors; and [2] a limited action space.
2. Reputational herding, like cascades, takes place when an agent chooses to ignore his or her private information and mimics the action of another agent who has acted previously. However, reputational herding models have an additional layer of mimicking, resulting from positive reputational properties that can be obtained by acting as part of a group or choosing a certain project. Evidence has been found that a forecaster’s age is positively related to the absolute first difference between his forecast and the group mean. This has been interpreted as evidence that, as a forecaster ages, evaluators develop tighter prior beliefs about the forecaster’s ability, and hence the forecaster has less incentive to herd with the group. On the other hand, the incentive for a second-mover to discard his private information and instead mimic the market leader increases with his initial reputation, as he strives to protect his current status and level of pay (Graham, 1999).
3. Investigative herding occurs when an analyst chooses to investigate a piece of information he or she believes others will also examine. The analyst would like to be the first to discover the information but can only profit from an investment if other investors follow suit and push the price of the asset in the direction anticipated by the first analyst. Otherwise, the first analyst may be stuck holding an asset that he or she cannot profitably sell.
4. Empirical herding refers to observations by many researchers of “herding” without reference to a specific model or explanation. There is indeed evidence of herding and clustering among pension funds, mutual funds and institutional investors, when a disproportionate share of investors engage in buying, or at other times selling, the same stock. These works suggest that clustering can result from momentum-following, also called “positive feedback investment”, e.g., buying past winners, or perhaps from repeating the predominant buy or sell pattern of the previous period.

There are many reported cases of herding. One of the most dramatic and clearest in recent times is the observation (Huberman and Regev, 2001) of a contagious speculation associated with a nonevent in the following sense. A Sunday New York Times article on the potential development of new cancer-curing drugs caused the stock of the biotech company EntreMed to rise from 12.063 at the close on Friday May 1, 1998 to open at 85 on Monday May 4, close near 52 on the same day and remain above 39 in the three following weeks. The enthusiasm spilled over to other biotechnology stocks.
It turns out that the potential breakthrough in cancer research had already been reported in Nature, one of the leading scientific journals, and in various popular newspapers (including the Times) more than
five months earlier. At that time, market reactions were essentially nil. Thus the enthusiastic public attention induced a long-term rise in share prices, even though no genuinely new information had been presented. The very prominent and exceptionally optimistic Sunday New York Times article of May 3, 1998 led to a rush on the stocks of EntreMed and other biotechnology companies, which is reminiscent of similar rushes leading to bubbles in the historical times previously discussed. It is to be expected that information technology, the internet and biotechnology are among the leading new frontiers on which sensational stories will lead to enthusiasm, contagion, herding and speculative bubbles.

4.2. It is optimal to imitate when lacking information

All the traders in the world are organized into a network of family, friends, colleagues, contacts, and so on, which are sources of opinion, and they influence each other locally through this network (Boissevain and Mitchell, 1973). We call “neighbors” of agent Anne on this world-wide graph the set of people in direct contact with Anne. Other sources of influence also involve newspapers, web sites, TV stations, and so on. Specifically, if Anne is directly connected with k “neighbors” in the world-wide graph of connections, then there are only two forces that influence Anne’s opinion: (a) the opinions of these k people together with the influence of the media; and (b) an idiosyncratic signal that she alone receives (or generates). According to the concept of herding and imitation, the assumption is that agents tend to imitate the opinions of their “neighbors”, not contradict them. It is easy to see that force (a) will tend to create order, while force (b) will tend to create disorder or, in other words, heterogeneity. The main story here is the fight between order and disorder, and the question we now investigate is: what behavior can result from this fight? Can the system go through unstable regimes, such as crashes?
Are crashes predictable? We show that the science of self-organizing systems (sometimes also referred to as “complex systems”) bears very significantly on these questions: the stock market and the web of traders’ connections can be understood in large part from the science of critical phenomena, in a sense that we examine in some depth in the following sections, from which important consequences can be derived. To make progress, we formalize the problem a bit and consider a network of investors: each one can be labeled by an integer $i = 1, \dots, I$, and $N(i)$ denotes the set of the agents who are directly connected to agent $i$ according to the world-wide graph of acquaintances. If we isolate one trader, Anne, $N(\mathrm{Anne})$ is the number of traders in direct contact with her, who can exchange direct information with her and exert a direct influence on her. For simplicity, we assume that any investor such as Anne can be in only one of several possible states. In the simplest version, we consider only two possible states: $s_{\mathrm{Anne}} = +1$ or $s_{\mathrm{Anne}} = -1$. We could interpret these states as “buy” and “sell”, “bullish” and “bearish”, “optimistic” and “pessimistic”, and so on. In the next paragraph, we show that, based only on the information of the actions $s_j(t-1)$ performed yesterday (at time $t-1$) by her $N(\mathrm{Anne})$ “neighbors”, Anne maximizes her return by taking yesterday the decision $s_{\mathrm{Anne}}(t-1)$ given by the sign of the sum of the actions of all her “neighbors”. In other words, the optimal decision of Anne, based on the local polling of her “neighbors”, which she hopes gives a sufficiently faithful representation of the market mood, is to imitate the majority of her neighbors. This is of course up to some possible deviations when she decides to follow her own idiosyncratic “intuition” rather than being influenced by her “neighbors”.
Such an idiosyncratic move can be captured in this model by a stochastic component independent of the decisions of the neighbors or of any other agent. Intuitively, the reason why it is in general optimal for Anne to follow the
opinion of the majority is simply because prices move in that direction, forced by the law of supply and demand. This apparently innocuous evolution law produces remarkable self-organizing patterns. Consider $N$ traders in a network, whose links represent the communication channels through which the traders exchange information. The graph describes the chain of intermediate acquaintances between any two people in the world. We denote by $N(i)$ the number of traders directly connected to a given trader $i$ on the graph. The traders buy or sell one asset at price $p(t)$, which evolves as a function of time, assumed to be discrete and measured in units of the time step $\Delta t$. In the simplest version of the model, each agent can either buy or sell only one unit of the asset. This is quantified by the buy state $s_i = +1$ or the sell state $s_i = -1$. Each agent can trade at time $t-1$ at the price $p(t-1)$ based on all previous information, including that at $t-1$. The asset price variation is taken simply proportional to the aggregate sum $\sum_{i=1}^{N} s_i(t-1)$ of all traders’ actions: indeed, if this sum is zero, there are as many buyers as sellers and the price does not change, since there is a perfect balance between supply and demand. If, on the other hand, the sum is positive, there are more buy orders than sell orders, and the price has to increase to balance supply and demand, as the asset is too scarce to satisfy all the demand. There are many other influences impacting the price change from one day to the next, and these can usually be accounted for in a simple way by adding a stochastic component to the price variation. This term alone would give the usual log-normal random walk process (Cootner, 1967), while the balance between supply and demand together with imitation leads to some organization, as we show below.
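The price rule just described can be written as a one-step update. In this sketch the stochastic component is added to the logarithm of the price, so that with no order imbalance it reduces to a log-normal random walk; `impact` (playing the role of the proportionality constant between price change and aggregate orders) and `noise_scale` are hypothetical parameter names, and the Gaussian noise is an assumption.

```python
import math
import random


def next_price(price, actions, impact, noise_scale, rng):
    """One step of a stylized price rule (a sketch, not the paper's exact model).

    log p(t) - log p(t-1) = impact * sum_i s_i(t-1) + noise_scale * eta,
    with eta ~ N(0, 1). With impact = 0 this is a log-normal random walk.
    """
    imbalance = sum(actions)   # number of buyers (+1) minus number of sellers (-1)
    eta = rng.gauss(0.0, 1.0)  # all other influences on the price
    return price * math.exp(impact * imbalance + noise_scale * eta)


rng = random.Random(1)
p = 100.0
p = next_price(p, [+1, +1, +1, -1], impact=0.001, noise_scale=0.0, rng=rng)
# With no noise, two net buy orders push p slightly above 100.
```

Setting `noise_scale=0` isolates the supply-and-demand term: balanced orders leave the price unchanged and any net buying raises it.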
At time $t-1$, just when the price $p(t-1)$ has been announced, trader $i$ defines her strategy $s_i(t-1)$ that she will hold from $t-1$ to $t$, thus realizing a profit (or loss) equal to the price difference $p(t) - p(t-1)$ times her position $s_i(t-1)$. To define her optimal strategy $s_i(t-1)$, the trader should calculate her expected profit $P_E$, given the past information and her position, and then choose $s_i(t-1)$ such that $P_E$ is maximum. Since the price moves with the general opinion $\sum_{i=1}^{N} s_i(t-1)$, the best strategy is to buy if this sum is positive and to sell if it is negative. The difficulty is that a given trader cannot poll the positions $s_j$ that all the other traders will take, which determine the price drift according to the balance between supply and demand. The next best thing that trader $i$ can do is to poll her $N(i)$ “neighbors” and construct her prediction of the price drift from this information. The trader needs an additional piece of information, namely the a priori probabilities $P_+$ and $P_-$ for each trader to buy or sell. The probabilities $P_+$ and $P_-$ are the only information that she can use for all the traders that she does not poll directly. From this, she can form her expectation of the price change. The simplest case corresponds to a market without drift, where $P_+ = P_- = 1/2$. Based on the previously stated rule that the price variation is proportional to the sum of the actions of traders, the best guess of trader $i$ is that the future price change will be proportional to the sum of the actions of the neighbors that she has been able to poll, hoping that this provides a sufficiently reliable sample of the total population. Traders are indeed constantly sharing information, calling each other to “take the temperature”, effectively polling each other before taking actions. It is then clear that the strategy that maximizes her expected profit is to take a position whose sign is given by the sum of the actions of all her “neighbors”.
This is exactly the meaning of expression (3):

$$ s_i(t-1) = \mathrm{sign}\Big( K \sum_{j \in N(i)} s_j + \sigma\,\varepsilon_i \Big), \qquad (3) $$
such that this position $s_i(t-1)$ gives her the maximum payoff based on her best prediction of the price variation $p(t) - p(t-1)$ from yesterday to today. The function $\mathrm{sign}(x)$ is defined as equal to $+1$ ($-1$) for positive (negative) argument $x$. $K$ is a positive constant of proportionality between the price change and the aggregate buy-sell orders. It is inversely proportional to the “market depth”: the larger the market, the smaller the relative impact of a given imbalance between buy and sell orders, hence the smaller the price change. $\varepsilon_i$ is a noise term whose amplitude is set by $\sigma$, and $N(i)$ is the number of neighbors with whom trader $i$ interacts significantly. In simple terms, law (3) states that the best investment decision for a given trader is to take that of the majority of her neighbors, up to some uncertainty (noise) capturing the possibility that the majority of her neighbors might give an incorrect prediction of the behavior of the total market. Expression (3) can be thought of as a mathematical formulation of Keynes’ beauty contest. Keynes (1936) argued that stock prices are determined not only by the firm’s fundamental value; mass psychology and investors’ expectations also influence financial markets significantly. In his opinion, professional investors prefer to devote their energy not to estimating fundamental values but rather to analyzing how the crowd of investors is likely to behave in the future. As a result, he said, most persons are largely concerned, not with making superior long-term forecasts of the probable yield of an investment over its whole life, but with foreseeing changes in the conventional basis of valuation a short time ahead of the general public. Keynes uses his famous beauty contest as a parable for stock markets. To predict the winner of a beauty contest, objective beauty is not very important; knowledge or prediction of others’ predictions of beauty is much more relevant.
In Keynes’ view, the optimal strategy is not to pick those faces the player thinks the prettiest, but those the other players are likely to think the average opinion will be, or those the other players will think the others will think the average opinion will be, or even further along this iterative loop. Expression (3) precisely captures this concept: the opinion $s_i$ at time $t$ of an agent $i$ is a function of all the opinions of the other “neighboring” agents at the previous time $t-1$, which themselves depend on the opinion of agent $i$ at time $t-2$, and so on. In the stationary equilibrium situation, in which all agents finally form an opinion after many such iterative feedbacks have had time to develop, the solution of (3) is precisely the one taking into account all the opinions in a completely self-consistent way, compatible with the infinitely iterative loop. Similarly, Orléan (1984, 1986, 1989a, b, 1991, 1995) has captured the paradox of combining rational and imitative behavior under the name “mimetic rationality” (rationalité mimétique). He has developed models of mimetic contagion of investors in the stock markets that are based on irreversible processes of opinion forming. See also Krawiecki et al. (2002) for a recent generalization with a time-varying coupling strength $K$ leading to on-off intermittency and attractor bubbling.

4.3. Cooperative behaviors resulting from imitation

The imitative behavior discussed in Section 4.2 and captured by expression (3) belongs to a very general class of stochastic dynamical models developed to describe interacting elements, particles and agents in a large variety of contexts, in particular in physics and biology (Liggett, 1985, 1997). The tendency or force towards imitation is governed by the coupling strength $K$; the tendency towards idiosyncratic (or noisy) behavior is governed by the amplitude $\sigma$ of the noise term.
Thus the value of $K$ relative to $\sigma$ determines the outcome of the battle between order and disorder and, eventually, the structure of the market prices. More generally, the coupling strength $K$ could be heterogeneous across pairs of neighbors, and this would not substantially affect the properties of the model. Some of the $K_{ij}$’s could even be negative, as long as the average of all $K_{ij}$’s is strictly positive.

Expression (3) only describes the state of an agent at a given time. In the next instant, new $\varepsilon_i$’s are realized, new influences propagate to neighbors, and agents can change their decisions. The system is thus constantly changing and reorganizing, as shown in Fig. 7. The model does not assume instantaneous opinion interactions between neighbours. Indeed, in real markets, opinions tend not to form instantaneously but over a period of time, through a process involving family, friends, colleagues, newspapers, web sites, TV stations, and so on. Decisions about the trading activity of a given agent may occur when the consensus from all these sources reaches a trigger level. It is precisely this feature of a threshold reached by a consensus that expression (3) captures: the consensus is quantified by the sum over the $N(i)$ agents connected to agent $i$, and the threshold is provided by the sign function. The delay in the formation of the opinion of a given trader as a function of other traders’ opinions is captured by the progressive spreading of information during successive updating steps (see for instance Liggett, 1985, 1997).

The simplest possible network is a two-dimensional grid in the Euclidean plane. Each agent has four nearest neighbors: one to the North, one to the South, one to the East and one to the West. The tendency $K$ towards imitation is balanced by the tendency $\sigma$ towards idiosyncratic behavior. In the context of the alignment of atomic spins to create magnetisation (magnets), this model is identical to the two-dimensional Ising model, which has been solved explicitly by Onsager (1944). Only its formulation differs from what is usually found in textbooks (Goldenfeld, 1992), as we emphasize a dynamical viewpoint.

Fig. 7. Four snapshots at four successive times of the state of a planar system of 64 × 64 agents placed on a regular square lattice. Each agent, placed within a small square, interacts with her four nearest neighbors according to the imitative rule (3). White (resp. black) squares correspond to “bull” (resp. “bear”). In all four snapshots, buy orders are in the majority, as white is the predominant color.
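The dynamics of rule (3) on the square lattice can be simulated in a few lines. This is a sketch under assumptions not fixed by the text: periodic boundaries, Gaussian noise $\varepsilon_i \sim N(0, 1)$ redrawn at each update, and random sequential (asynchronous) updating of one agent at a time; the function name `simulate_imitation` is hypothetical. Asynchronous updating is chosen because updating all agents simultaneously on a square lattice can lock strongly coupled agents into artificial checkerboard oscillations.

```python
import random


def simulate_imitation(L, K, sigma, sweeps, ordered_start=False, seed=0):
    """Sketch of the imitation rule (3) on an L x L square lattice.

    Assumptions (not from the text): periodic boundaries, Gaussian noise
    eps_i ~ N(0, 1), and random sequential updating, one agent at a time.
    Returns the grid of opinions (+1 buy, -1 sell) and their average.
    """
    rng = random.Random(seed)
    if ordered_start:
        grid = [[1] * L for _ in range(L)]
    else:
        grid = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    for _ in range(sweeps * L * L):
        i, j = rng.randrange(L), rng.randrange(L)
        # Sum of the opinions of the four nearest neighbors (N, S, W, E).
        field = (grid[(i - 1) % L][j] + grid[(i + 1) % L][j]
                 + grid[i][(j - 1) % L] + grid[i][(j + 1) % L])
        x = K * field + sigma * rng.gauss(0.0, 1.0)
        grid[i][j] = 1 if x > 0 else -1  # sign(x); x == 0 essentially never occurs for sigma > 0
    mean = sum(map(sum, grid)) / (L * L)
    return grid, mean
```

Running, for instance, `simulate_imitation(64, K, 1.0, 50)` for increasing values of `K` and inspecting the size of like-minded clusters in the returned grid mimics the progression from disordered configurations such as Fig. 8 towards more ordered ones; the sketch does not compute the critical value of `K` for this choice of noise.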
D. Sornette / Physics Reports 378 (2003) 1 – 98
Fig. 8. K < Kc: buy (white squares) and sell (black squares) configuration in a two-dimensional Manhattan-like planar network of 256 × 256 agents interacting with their four nearest neighbors. There are approximately the same number of white and black cells, i.e., the market has no consensus. The size of the largest local clusters quantifies the correlation length, i.e., the distance over which local imitation between neighbors propagates before being significantly distorted by the “noise” in the transmission process resulting from the idiosyncratic signals of each agent.
In the Ising model, there exists a critical point Kc that determines the properties of the system. When K < Kc (see Fig. 8), disorder reigns: the sensitivity to a small global influence is small, the clusters of agents who are in agreement remain small, and imitation only propagates between close neighbors. In this case, the susceptibility of the system to external news is small, as many clusters of different opinion react incoherently and thus more or less cancel out their responses. When the imitation strength K increases and gets close to Kc (see Fig. 9), order starts to appear: the system becomes extremely sensitive to a small global perturbation, agents who agree with each other form large clusters, and imitation propagates over long distances. In the natural sciences, these are the characteristics of critical phenomena. Formally, in this case the susceptibility χ of the system goes to infinity. The hallmark of criticality is the power law, and indeed the susceptibility diverges according to a power law $\chi \approx A\,(K_c - K)^{-\gamma}$, where A is a positive constant and γ > 0 is called the critical exponent of the susceptibility (equal to 7/4 for the 2D Ising model). This kind of critical behavior is found in many other models of interacting elements (Liggett, 1985, 1997) (see also Moss de Oliveira et al. (1999) for applications to finance, among others). The large susceptibility means that the system is unstable. A small external perturbation may lead to a large collective reaction of the traders, who may drastically revise their decisions. This may abruptly produce an imbalance between supply and demand, thus triggering a crash or a rally. This specific mechanism will be shown to lead to crashes in the model described in the next section.
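As a worked example of the power law above, the short sketch below evaluates χ ≈ A(Kc − K)^(−γ) with γ = 7/4. The values A = 1 and Kc = 1 are illustrative assumptions, not values from the text. Each halving of the distance Kc − K multiplies the susceptibility by 2^γ ≈ 3.36, which is why the system becomes increasingly fragile as K approaches Kc. In a finite system the divergence is cut off by the system size.

```python
def susceptibility(K, Kc=1.0, A=1.0, gamma=7 / 4):
    """Critical power law chi = A (Kc - K)**(-gamma), meaningful only for K < Kc close to Kc."""
    if K >= Kc:
        raise ValueError("the power law describes the approach K -> Kc from below")
    return A * (Kc - K) ** (-gamma)

# Each halving of the distance to Kc multiplies chi by 2**gamma (about 3.36 for gamma = 7/4).
for distance in (0.1, 0.05, 0.025, 0.0125):
    print(f"Kc - K = {distance:<7}  chi = {susceptibility(1.0 - distance):9.1f}")
```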
Fig. 9. Same as Fig. 8 for K close to Kc. There are still approximately the same number of white and black cells, i.e., the market has no consensus. However, the size of the largest local clusters has grown to become comparable to the total system size. In addition, holes and clusters of all sizes can be observed. This scale-invariant or “fractal” looking structure is the hallmark of a “critical state”, for which the correlation length and the susceptibility become infinite (or are simply bounded by the size of the system).
For even stronger imitation strength K > Kc, the imitation is so strong that the idiosyncratic signals become negligible and the traders self-organize into a strong imitative behavior, as shown in Fig. 10. The selection of one of the two possible states is determined by small and subtle initial biases as well as by the fluctuations during the evolutionary dynamics. These behaviors apply more generically to other network topologies. Indeed, the stock market constitutes an ensemble of interacting investors who differ in size by many orders of magnitude, ranging from individuals to gigantic professional investors, such as pension funds. Furthermore, structures at even higher levels, such as currency influence spheres (U.S.$, DM, YEN ...), exist. With the current globalization and deregulation of the market, one may argue that structures on the largest possible scale, i.e., the world economy, are beginning to form. This observation and the network of connections between traders show that the two-dimensional lattice representation used in Figs. 7, 8, 9 and 10 is too naive. A better representation of the structure of financial markets is that of hierarchical systems with “traders” on all levels of the market. Of course, this does not imply that any strict hierarchical structure of the stock market exists, but there are numerous examples of qualitatively hierarchical structures in society. In fact, one may say that horizontal organizations of individuals are rather rare. This means that the planar network used in our previous discussion may very well represent a gross oversimplification. Even though the predictions of these models are quite detailed, they are very robust to model misspecification. We indeed claim that models that combine the following features would display the
Fig. 10. Same as Fig. 8 for K > Kc. The imitation is so strong that the network of agents spontaneously breaks the symmetry between the two decisions and one of them predominates. Here, we show the case where the “buy” state has been selected. Interestingly, the collapse onto one of the two states is essentially random and results from the combined effect of a slight initial bias and of fluctuations during the imitation process. Only small and isolated islands of “bears” remain in an ocean of buyers. This state would correspond to a bubble, a strong bullish market.
same characteristics, in particular apparently coordinated buying and selling periods, eventually leading to several financial crashes. These features are:
1. A system of traders who are influenced by their “neighbors”.
2. Local imitation propagating spontaneously into global cooperation.
3. Global cooperation among noise traders causing collective behavior.
4. Prices related to the properties of this system.
5. System parameters evolving slowly through time.
As we shall show in the following sections, a crash is most likely when the locally imitative system goes through a critical point. In physics, critical points are widely considered to be among the most interesting properties of complex systems. A system goes critical when local influences propagate over long distances and the average state of the system becomes exquisitely sensitive to a small perturbation, i.e., different parts of the system become highly correlated. Another characteristic is that critical systems are self-similar across scales. In Fig. 9, at the critical point, an ocean of traders who are mostly bearish may contain several continents of traders who are mostly bullish. Each of these in turn surrounds seas of bearish traders with islands of bullish traders, and the progression continues all the way down to the smallest possible scale: a single trader (Wilson, 1979). Intuitively speaking, critical self-similarity is why local imitation cascades through the scales into global coordination. In mathematical parlance, critical points are described as singularities associated with bifurcation and catastrophe theory.
The previous Ising model is one of the simplest possible descriptions of cooperative behaviors resulting from repetitive interactions between agents. Many other models have recently been developed in order to capture more realistic properties of people and of their economic interactions. These multi-agent models, often explored by computer simulations, support the hypothesis that the observed characteristics of financial prices may result endogenously from the interaction between agents. These characteristics include non-Gaussian “fat” tails of distributions of returns, mostly unpredictable returns, and clustered and excess volatility. Several works have modelled the epidemics of opinion and speculative bubbles in financial markets from an adaptive agent point of view (Kirman, 1991; Lux, 1995, 1998; Lux and Marchesi, 1999, 2000). The main mechanism for bubbles is that above-average returns are reflected in a generally more optimistic attitude, which fosters the disposition to adopt others’ bullish beliefs, and vice versa. The adaptive nature of agents is reflected in the alternatives available to them: they can choose between several classes of strategies, for instance investing according to fundamental economic valuation or using technical analysis of past price trajectories. Other relevant works put more emphasis on the heterogeneity and threshold nature of decision making, which in general lead to irregular cycles (Takayasu et al., 1992; Youssefmir et al., 1998; Levy et al., 1995; Sato and Takayasu, 1998; Levy et al., 2000; Gaunersdorfer, 2000).

5. Modelling financial bubbles and market crashes

In this section, we describe three complementary models that we have developed to describe bubbles and crashes. The first two models are extensions of the rational expectation model of bubbles and crashes of Blanchard (1979) and Blanchard and Watson (1982).
They originally introduced the model of rational expectations (RE) bubbles to account for a possibility often discussed in the empirical literature and by practitioners: that observed prices may deviate significantly, and over extended time intervals, from fundamental prices. While allowing for deviations from fundamental prices, rational bubbles keep a fundamental anchor point of economic modelling, namely that bubbles must obey the condition of rational expectations. In contrast, recent works stress that investors are not fully rational, or have at most bounded rationality, and that behavioral and psychological mechanisms, such as herding, may be important in shaping market prices (Thaler, 1993; Shefrin, 2000; Shleifer, 2000). However, for liquid assets, dynamic investment strategies rarely outperform simple buy-and-hold strategies (Malkiel, 1999). In other words, the market is not far from being efficient, and few arbitrage opportunities exist as a result of the constant search for gains by sophisticated investors. For the first two models, we shall work within the conditions of rational expectations and of no arbitrage, taken as useful approximations. Indeed, the rationality of both expectations and behavior often does not imply that the price of an asset be equal to its fundamental value. In other words, there can be rational deviations of the price from this value, called rational bubbles. A rational bubble can arise when the actual market price depends positively on its own expected rate of change, as sometimes occurs in asset markets; this is the mechanism underlying the models of Blanchard (1979) and Blanchard and Watson (1982). The third model complements the modelling of bubbles and crashes by studying the effects of interactions between the two typical opposite attitudes of investors in stock markets, namely imitative and contrarian behaviors.
5.1. The risk-driven model

This first model contains the following ingredients (Johansen et al., 1999a, b, 2000a):
1. A system of traders who are influenced by their “neighbors”.
2. Local imitation propagating spontaneously into global cooperation.
3. Global cooperation among traders causing a crash.
4. Prices related to the properties of this system.
The interplay between the progressive strengthening of imitation, controlled by the first three ingredients, and the ubiquity of noise requires a stochastic description. A crash is not certain but can be characterized by its hazard rate h(t), i.e., the probability per unit time that the crash will happen in the next instant if it has not happened yet. The crash hazard rate h(t) embodies subtle uncertainties of the market. When will the traders realize with sufficient clarity that the market is overvalued? When will a significant fraction of them believe that the bullish trend is not sustainable? When will they feel that other traders think that a crash is coming? Nowhere is Keynes’s beauty contest analogy more relevant than in the characterization of the crash hazard rate, because the survival of the bubble rests on the overall confidence of investors in the bullish market trend. A crash happens when a large group of agents place sell orders simultaneously. This group of agents must create enough of an imbalance in the order book that market makers are unable to absorb the other side without lowering prices substantially. A notable fact is that the agents in this group typically do not know each other. They did not convene a meeting and decide to provoke a crash. Nor do they take orders from a leader. In fact, most of the time, these agents disagree with one another and submit roughly as many buy orders as sell orders (these are all the times when a crash does not happen). The key question is to determine by what mechanism they suddenly manage to organize a coordinated sell-off. We propose the following answer (Johansen et al., 1999a, b), already outlined above. All the traders in the world are organized into a network (of family, friends, colleagues, and so on), and they influence each other locally through this network. For instance, an active trader is constantly on the phone exchanging information and opinions with a set of selected colleagues.
In addition, there are indirect interactions mediated, for instance, by the media. Specifically, if I am directly connected with k other traders, then only two forces influence my opinion: (a) the opinions of these k people and of the global information network; and (b) an idiosyncratic signal that I alone generate. Our working assumption here is that agents tend to imitate the opinions of their connections. Force (a) will tend to create order, while force (b) will tend to create disorder. The main story here is a fight between order and disorder. As far as asset prices are concerned, a crash happens when order wins (everybody has the same opinion: selling). Normal times are when disorder wins (buyers and sellers disagree with each other and roughly balance each other out). We must stress that this is exactly the opposite of the popular characterization of crashes as times of chaos. Disorder, or a balanced and varied opinion spectrum, is what keeps the market liquid in normal times. This mechanism does not require an overarching coordination mechanism, since macro-level coordination can arise from micro-level imitation. It also relies on a realistic model of how agents form opinions by constantly interacting.
5.1.1. Finite-time singularity in the crash hazard rate

In the spirit of the “mean field” theory of collective systems (Goldenfeld, 1992), the simplest way to describe an imitation process is to assume that the hazard rate h(t) evolves according to the following equation:

$$\frac{dh}{dt} = C\,h^{\delta}, \qquad \delta > 1, \tag{4}$$

where C is a positive constant. Mean field theory amounts to embodying the diversity of trader actions in a single effective representative behavior determined from an average interaction between the traders. In this sense, h(t) is the collective result of the interactions between traders. The term $h^{\delta}$ on the r.h.s. of (4) accounts for the fact that the hazard rate will increase or decrease due to the presence of interactions between the traders. The exponent δ > 1 quantifies the effective number, equal to δ − 1, of interactions felt by a typical trader. The condition δ > 1 is crucial to model interactions and, as we now show, is essential to obtain a singularity (critical point) in finite time. Indeed, integrating (4), we get

$$h(t) = \frac{B}{(t_c - t)^{\alpha}} \qquad \text{with} \qquad \alpha \equiv \frac{1}{\delta - 1}. \tag{5}$$
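The integration step leading from (4) to (5) can be checked numerically. The sketch below is an illustration rather than part of the original model. It integrates dh/dt = C h^δ with a fourth-order Runge-Kutta scheme and compares the result with the closed form. In the closed form, the constants follow from the initial condition h(t0) = h0: B = [(δ − 1)C]^(−α) and tc = t0 + h0^(1−δ)/[(δ − 1)C]. The parameter values (h0 = 1, C = 1, δ = 3, hence α = 1/2 and tc = 0.5) are arbitrary choices.

```python
def hazard_closed_form(t, h0, C, delta, t0=0.0):
    """Solution (5) of dh/dt = C h**delta with h(t0) = h0; returns (h(t), tc)."""
    alpha = 1.0 / (delta - 1.0)
    tc = t0 + h0 ** (1.0 - delta) / ((delta - 1.0) * C)   # finite-time singularity
    B = ((delta - 1.0) * C) ** (-alpha)
    return B / (tc - t) ** alpha, tc

def hazard_rk4(t_end, h0, C, delta, t0=0.0, n_steps=20_000):
    """Integrate (4) from t0 to t_end (< tc) with the classical fourth-order Runge-Kutta scheme."""
    def f(h):
        return C * h ** delta
    dt = (t_end - t0) / n_steps
    h = h0
    for _ in range(n_steps):
        k1 = f(h)
        k2 = f(h + 0.5 * dt * k1)
        k3 = f(h + 0.5 * dt * k2)
        k4 = f(h + dt * k3)
        h += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return h

h_exact, tc = hazard_closed_form(0.4, h0=1.0, C=1.0, delta=3.0)
h_num = hazard_rk4(0.4, h0=1.0, C=1.0, delta=3.0)
print(f"tc = {tc}")                                                       # tc = 0.5
print(f"closed form h(0.4) = {h_exact:.6f}, RK4 h(0.4) = {h_num:.6f}")   # both 2.236068
```

With δ = 3 the exponent α = 1/2 lies in (0, 1), consistent with the condition δ > 2 discussed next.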
The critical time tc is determined by the initial conditions at some origin of time. The exponent α must lie between zero and one for an economic reason: otherwise, as we shall see, the price would go to infinity when approaching tc (if the bubble has not crashed in the meantime). This condition translates into 2 < δ < +∞: a typical trader must be connected to more than one other trader. There is a large body of literature in physics, biology and mathematics on the microscopic modelling of systems of stochastic dynamical interacting agents that lead to critical behaviors of the type (5) (Liggett, 1985, 1997). The macroscopic model (4) can thus be substantiated by specific microscopic models (Johansen et al., 2000). Before continuing, let us provide an intuitive explanation for the creation of a finite-time singularity at tc. The faster-than-exponential growth of the return and of the crash hazard rate corresponds to nonconstant growth rates, which increase with the return and with the hazard rate. The following reasoning explains intuitively how an infinite slope or infinite value can appear in a finite time at tc, which is called a finite-time singularity. Suppose for instance that the growth rate of the hazard rate doubles when the hazard rate doubles. For simplicity, we consider discrete time intervals as follows. Starting with a hazard rate of 1, we assume it grows at a constant rate of 1% per day until it doubles. We estimate the doubling time as proportional to the inverse of the growth rate, i.e., approximately 1/1% = 1/0.01 = one hundred days. There is a multiplicative correction factor equal to ln 2 ≈ 0.69, such that the doubling time is ln 2/1% ≈ 69 days. But we factor out this proportionality factor ln 2 ≈ 0.69 for the sake of pedagogy and simplicity. Including it multiplies all time intervals below by 0.69 without changing the conclusions.
When the hazard rate reaches 2, we assume that the growth rate doubles to 2% and stays fixed until the hazard rate doubles again to reach 4. This new doubling time is only approximately 1/0.02 = 50 days at this 2% growth rate. When the hazard rate reaches 4, its growth rate is doubled to 4%. The doubling time of the hazard rate is therefore approximately halved to 25 days, and the scenario continues with a doubling of the growth rate every time the hazard rate doubles. Since the doubling
time is approximately halved at each step, we have the following sequence: (time = 0, hazard rate = 1, growth rate = 1%), (time = 100, hazard rate = 2, growth rate = 2%), (time = 150, hazard rate = 4, growth rate = 4%), (time = 175, hazard rate = 8, growth rate = 8%), and so on. We observe that the time interval needed for the hazard rate to double shrinks very rapidly, by a factor of two at each step. In the same way that

$$\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \cdots = 1, \tag{6}$$
which was immortalized by the Ancient Greeks as Zeno’s paradox, the infinite sequence of doublings takes a finite time. The hazard rate thus reaches infinity at a finite “critical time” approximately equal to 100 + 50 + 25 + · · · = 200. (A rigorous mathematical treatment requires a continuous-time formulation, which does not change the qualitative content of the example.) A spontaneous singularity has been created by the increasing growth rate! This process is quite general. It applies as soon as the growth rate is multiplied by some factor larger than 1 when the hazard rate, or any other observable, is multiplied by some constant larger than 1.

5.1.2. Derivation from the microscopic Ising model

The phenomenological equations (4) and (5) can be derived from the microscopic model of agent interactions described by Eq. (3). For this, let us assume that the imitation strength K changes smoothly with time, as a result, for instance, of the varying confidence level of investors, the economic outlook, and so on. The simplest assumption, which does not change the nature of the argument, is that K is proportional to time. Initially, K is small and only small clusters of investors self-organize, as shown in Fig. 8. As K increases, the typical size of the clusters increases, as shown in Fig. 9. These kinds of systems exhibiting cooperative behavior are characterized by a broad distribution of cluster sizes s (the size of the black islands, for instance) up to a maximum s∗. This maximum itself increases in an accelerating fashion up to the critical value Kc. Right at K = Kc, the geography of clusters of a given kind becomes self-similar, with a continuous hierarchy of sizes from the smallest (the individual investor) to the largest (the total system). Within this phenomenology, the probability for a crash to occur is constructed as follows. First, a crash corresponds to a coordinated sell-off by a large number of investors.
In our simple model, this will happen as soon as a single cluster of connected investors, large enough to throw the market off balance, decides to sell off. Recall indeed that “clusters” are defined by the condition that all investors in the same cluster move in concert. When a very large cluster of investors sells, this creates a sudden imbalance, which triggers an abrupt drop of the price, hence a crash. To be concrete, we assume that a crash occurs when the size (number of investors) s of the active cluster is larger than some minimum value sm. The specific value of sm is not important. What matters is that sm is much larger than 1, so that a crash can only occur as a result of a cooperative action of many traders who destabilize the market. At this stage, we do not specify the amplitude of the crash, only its triggering as an instability. For this explanation to make sense, investors must change opinion and send market orders only rarely. Therefore, we should expect only one or a few large clusters to be simultaneously active and able to trigger a crash. For a crash to occur, we thus need (1) to find at least one cluster of size larger than sm and (2) to verify that this cluster is indeed actively selling off. Since these two events are independent, the probability for a crash to occur is the product of the probability to find such a cluster of
size larger than the threshold sm and the probability that such a cluster begins to sell off collectively. The probability to find a cluster of size s is a well-known characteristic of critical phenomena (Goldenfeld, 1992; Stauffer and Aharony, 1994). It is a power law distribution truncated smoothly at a maximum s∗. This maximum increases without bound (except for the total system size) on the approach to the critical value Kc of the imitation strength. Suppose the decision to sell off by an investor belonging to a given cluster of size s were independent of the decisions of all the other investors in the same cluster. Then the probability per unit time that such a cluster of size s becomes active would simply be proportional to the number s of investors in that cluster. However, by the very definition of a cluster, investors belonging to a given cluster do interact with each other. Therefore, the decision of an investor to sell off is probably quite strongly coupled with those of the other investors in the same cluster. Hence, the probability per unit time that a specific cluster of s investors becomes active is a function of the number s of investors belonging to that cluster and of all the interactions between these investors. Clearly, the maximum number of interactions within a cluster is s × (s − 1)/2. For large s, it thus becomes proportional to the square of the number of investors in that cluster. This occurs when each of the s investors speaks to each of his or her s − 1 colleagues. The factor 1/2 accounts for the fact that if investor Anne speaks to investor Paul, then in general Paul also speaks to Anne, and their two-way interactions must be counted only once. Of course, one can imagine more complex situations in which Paul listens to Anne but Anne does not reciprocate, but this does not change the results.
Notwithstanding these complications, one sees what the probability h(t)δt, over a small time interval δt, that a specific cluster of s investors becomes active must look like. It must be a function that grows with the cluster size s faster than s, but probably slower than the maximum number of interactions (proportional to s²). A simple parameterization is to take h(t)δt proportional to the cluster size s raised to some power larger than 1 but smaller than 2. This exponent captures the collective organization within a cluster of size s due to the multiple interactions between its investors. It is related to the concept of fractal dimensions. The probability for a crash to occur is the same as the probability of finding at least one active cluster of size larger than the minimum destabilizing size sm. It is therefore the sum, over all sizes s larger than sm, of the probability n_s to find a cluster of size s multiplied by its probability per unit time to become active (itself proportional to s raised to that power, as we have argued). Under mild technical conditions, it can then be shown that the crash hazard rate exhibits a power law acceleration with a singular behavior. Intuitively, this result stems from the interplay of two effects. First, larger and larger clusters appear as the interaction parameter K approaches its critical value Kc. Second, the probability per unit time for a cluster to become active accelerates nonlinearly as its typical size s∗ grows with the approach of K to Kc. The diverging acceleration of the crash probability implies a remarkable prediction for the crash hazard rate. Indeed, the crash hazard rate is nothing but the rate of change of the probability of a crash as a function of time (conditioned on it not having happened yet). The crash hazard rate thus increases without bound as K goes to Kc.
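The sum over cluster sizes described above can be illustrated with a toy computation. For illustration only, the sketch assumes a truncated power law n_s ∝ s^(−τ) exp(−s/s∗) for the cluster-size distribution and an activation rate ∝ s^θ with 1 < θ < 2. The values τ = 2, θ = 1.5 and sm = 100 are hypothetical and not taken from the text. Because θ − τ + 1 > 0 here, the sum grows without bound (asymptotically like s∗^(θ−τ+1)) as the cutoff s∗ increases. This is the mechanism by which the crash hazard rate accelerates as K approaches Kc.

```python
import numpy as np

def crash_rate_proxy(s_star, s_m=100, tau=2.0, theta=1.5, cutoff=50.0):
    """Sum over cluster sizes s >= s_m of n_s * s**theta (hypothetical parameters).

    n_s ~ s**(-tau) * exp(-s / s_star): cluster-size distribution truncated at s_star.
    s**theta, 1 < theta < 2: activation rate of a cluster of s interacting investors.
    Sizes beyond s_m + cutoff * s_star are dropped; their weight carries a factor of
    roughly exp(-cutoff) and is negligible.
    """
    s = np.arange(s_m, s_m + int(cutoff * s_star) + 1, dtype=float)
    n_s = s ** (-tau) * np.exp(-s / s_star)
    return float(np.sum(n_s * s ** theta))

# s_star grows as K approaches Kc; the crash-rate proxy grows with it.
for s_star in (1e2, 1e3, 1e4):
    print(f"s* = {s_star:>7.0f}   crash-rate proxy = {crash_rate_proxy(s_star):8.3f}")
```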
The risk of a crash per unit time, knowing that the crash has not yet occurred, increases dramatically once the interaction between investors becomes strong enough. At that point, the network of interactions between traders self-organizes into a hierarchy containing a few large, spontaneously formed groups acting collectively. We stress that Kc is not the value of the imitation strength at which the crash occurs, because the crash could happen for any value before Kc, even though this is not very likely. Kc is the most probable value of the imitation strength for which the crash occurs. To translate these results as
a function of time, it is natural to expect that the imitation strength K changes slowly with time as a result of several factors influencing the tendency of investors to herd. A typical trajectory K(t) of the imitation strength as a function of time t may be erratic but slowly varying. The critical time tc is defined as the time at which the critical imitation strength Kc is reached for the first time, starting from some initial value. tc is not the time of the crash; it is the end of the bubble. It is the most probable time of the crash because the hazard rate is largest at that time. Due to its probabilistic nature, the crash can occur at any other time, with a likelihood that changes with time according to the crash hazard rate. The critical time tc (or Kc) signals the death of the speculative bubble. We stress again that the crash could happen at any time before tc, even though this is not very likely. There exists a finite probability

$$1 - \int_{t_0}^{t_c} h(t)\,dt > 0 \tag{7}$$
of “landing” smoothly, i.e., of attaining the end of the bubble without a crash. This residual probability is crucial for the coherence of the model, because otherwise agents would anticipate the crash and exit the market.

5.1.3. Dynamics of prices from the rational expectation condition

Assume for simplicity that, during a crash, the price drops by a fixed percentage κ ∈ (0, 1), say between 20 and 30%, of the price increase above a reference value p1. Then, the dynamics of the asset price before the crash are given by

$$dp = \mu(t)\,p(t)\,dt - \kappa\,[p(t) - p_1]\,dj, \tag{8}$$
where j denotes a jump process whose value is zero before the crash and one afterwards. In this simplified model, we neglect interest rate, risk aversion, information asymmetry, and the market-clearing condition. As a first-order approximation of the market organization, we assume that traders do their best and price the asset so that a fair game condition holds. Mathematically, this stylized rational expectation model is equivalent to the familiar martingale hypothesis:

$$\forall\, t' > t: \qquad E_t[p(t')] = p(t), \tag{9}$$
where p(t) denotes the price of the asset at time t and E_t[·] denotes the expectation conditional on information revealed up to time t. If we do not allow the asset price to fluctuate under the impact of noise, the solution to equation (9) is a constant: p(t) = p(t0), where t0 denotes some initial time. p(t) can be interpreted as the price in excess of the fundamental value of the asset. This rational expectation bubble model can be extended to general and arbitrary risk aversion within the general stochastic discount factor theory (Sornette and Johansen, 2001). Putting (8) in (9) leads to

$$\mu(t)\,p(t) = \kappa\,[p(t) - p_1]\,h(t), \tag{10}$$
using E[dj] = h(t) dt. In words, if the crash hazard rate h(t) increases, the return μ(t) increases to compensate the traders for the increasing risk. Plugging (10) into (8), we obtain an ordinary
differential equation. For p(t) − p(t0) < p(t0) − p1, its solution before the crash is

$$p(t) \approx p(t_0) + \kappa\,[p(t_0) - p_1] \int_{t_0}^{t} h(t')\,dt'. \tag{11}$$

If instead the price drops by a fixed percentage κ ∈ (0, 1) of the price, the dynamics of the asset price before the crash is given by

$$dp = \mu(t)\,p(t)\,dt - \kappa\,p(t)\,dj. \tag{12}$$
We then get

$$E_t[dp] = \mu(t)\,p(t)\,dt - \kappa\,p(t)\,h(t)\,dt = 0, \tag{13}$$

which yields

$$\mu(t) = \kappa\,h(t), \tag{14}$$

and the corresponding equation for the price before the crash is

$$\log\frac{p(t)}{p(t_0)} = \kappa \int_{t_0}^{t} h(t')\,dt'. \tag{15}$$
This gives the logarithm of the price as the relevant observable. These two different scenarios for the price drops raise a rather interesting question. If the first scenario is the correct one, then crashes are nothing but a (partial) depletion of preceding bubbles and hence signal the market’s return towards equilibrium. A crash may then be taken as a sign of economic health, as also suggested by Barro et al. (1989) in relation to the crash of October 1987. On the other hand, if the second scenario is true, bubbles and crashes are instabilities built into, or inherent in, the market structure. They are then signatures of a market constantly out of balance, signaling fundamental systemic instabilities. We will return to this question in the conclusion. Johansen and Sornette (2001b) have shown that the first scenario is slightly more warranted according to the data. The higher the probability of a crash, the faster the price must increase (conditional on having no crash) in order to satisfy the martingale (no free lunch) condition. Intuitively, investors must be compensated by the chance of a higher return in order to be induced to hold an asset that might crash. This effect may go against the naive preconception that price is adversely affected by the probability of the crash, but our result is the only one consistent with rational expectations. Complementarily, from a behavioral and dynamical point of view of the financial market, a faster-rising price decreases the probability that it can be sustained much longer and may announce an unstable phase in the mind of investors. We thus face a kind of “chicken and egg” problem. Plugging (5) into (11), and absorbing the constant factor p(t0) − p1 into B, gives the following price law before the crash:

$$p(t) \approx p_c - \frac{\kappa B}{z}\,(t_c - t)^{z}, \tag{16}$$
where $z = 1 - \alpha \in (0, 1)$ and pc is the price at the critical time (conditioned on no crash having been triggered). The price before the crash thus follows a power law with a finite upper bound pc. The slope of the price, dp/dt, which is proportional to $(t_c - t)^{z-1}$, becomes unbounded as we approach the critical date. This compensates for an unbounded crash rate in the next instant.
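The compensation argument of (8)-(11) can be checked with a small Monte Carlo experiment. The sketch below is an illustration under simplifying assumptions that are not in the text: t0 = 0, at most one crash, and a price frozen after the crash. It uses the power-law hazard (5) with the hypothetical values B = 0.5, z = 1 − α = 0.5, tc = 1, κ = 0.25, p(t0) = 1 and p1 = 0.5. Before the crash, the price follows the exact solution of dp/dt = κ[p − p1]h(t), of which (11) and (16) are the linearized forms. The crash time is drawn by inverting the cumulative hazard H(t), the integral of h, using the standard survival relation P(no crash before t) = exp[−H(t)]. Averaged over paths, the price at a horizon T < tc should stay at p(t0), as the martingale condition (9) requires, even though every path that survives keeps rising.

```python
import numpy as np

def cumulative_hazard(t, B, z, tc):
    """H(t) = integral from 0 to t of B (tc - u)**(-alpha) du, with alpha = 1 - z."""
    return (B / z) * (tc ** z - (tc - t) ** z)

def no_crash_price(t, p0, p1, kappa, B, z, tc):
    """Exact pre-crash solution of dp/dt = kappa (p - p1) h(t), i.e. Eq. (10) inserted in (8)."""
    return p1 + (p0 - p1) * np.exp(kappa * cumulative_hazard(t, B, z, tc))

def terminal_prices(T, n_paths, p0=1.0, p1=0.5, kappa=0.25, B=0.5, z=0.5, tc=1.0, seed=0):
    """Price at horizon T < tc on n_paths independent paths of the one-crash model."""
    rng = np.random.default_rng(seed)
    # The crash happens when H(t) first exceeds an Exp(1) draw, so P(no crash by t) = exp(-H(t)).
    e = rng.exponential(1.0, n_paths)
    crashed = e < cumulative_hazard(T, B, z, tc)
    # Invert H for the crashed paths; non-crashed paths get a dummy time tc (unused below).
    t_crash = tc - np.where(crashed, tc ** z - z * e / B, 0.0) ** (1.0 / z)
    p_at_crash = no_crash_price(t_crash, p0, p1, kappa, B, z, tc)
    p_after = p_at_crash - kappa * (p_at_crash - p1)    # jump -kappa [p - p1], as in (8)
    return np.where(crashed, p_after, no_crash_price(T, p0, p1, kappa, B, z, tc))

T = 0.99
prices = terminal_prices(T, n_paths=200_000)
print(f"mean price at T   : {prices.mean():.4f}   (fair game: close to p(t0) = 1)")
print(f"price if no crash : {no_crash_price(T, 1.0, 0.5, 0.25, 0.5, 0.5, 1.0):.4f}")
print(f"crash probability : {1 - np.exp(-cumulative_hazard(T, 0.5, 0.5, 1.0)):.3f}")
```

The surviving price ends well above p(t0), while the crashed paths end below it. The two effects balance on average, which is the content of the statement that the rising price compensates for the crash risk.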
The last ingredient of the model is to recognize that the stock market is made of actors that differ in size by many orders of magnitude, ranging from individuals to gigantic professional investors such as pension funds. Furthermore, structures at even higher levels exist, such as currency influence spheres (U.S.$, Euro, Yen, ...). With the current globalization and deregulation of the market, one may argue that structures on the largest possible scale, i.e., the world economy, are beginning to form. This means that the structure of the financial markets has features resembling those of hierarchical systems, with “traders” on all levels of the market. Of course, this does not imply that any strict hierarchical structure of the stock market exists, but there are numerous examples of qualitatively hierarchical structures in society. Models of imitative interactions on hierarchical structures recover the power-law behavior (16) (Sornette and Johansen, 1998; Johansen et al., 2000). In addition, they predict that the critical exponent can be a complex number! The first-order expansion of the general solution for the hazard rate is then

$$h(t) \approx B_0 (t_c - t)^{-\alpha} + B_1 (t_c - t)^{-\alpha} \cos[\omega \log(t_c - t) - \psi]. \tag{17}$$
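The cosine in (17) is what a complex critical exponent looks like on the real axis. Writing the exponent as $-(\alpha + i\omega)$ gives $x^{-(\alpha + i\omega)} = x^{-\alpha} e^{-i\omega \ln x}$ with $x = t_c - t$. Its real part is $x^{-\alpha}\cos(\omega \ln x)$. The quick numerical check below uses hypothetical values of $B_1$, $\alpha$ and $\omega$, with the phase $\psi$ set to zero.

```python
import math


def oscillating_hazard_complex(x, B1, alpha, omega):
    """Real part of B1 * x**(-(alpha + i*omega)), with x = t_c - t > 0."""
    return (B1 * x ** complex(-alpha, -omega)).real


def oscillating_hazard_cosine(x, B1, alpha, omega):
    """The same term written as in (17), with the phase psi set to zero."""
    return B1 * x ** (-alpha) * math.cos(omega * math.log(x))


B1, alpha, omega = 0.3, 0.5, 6.4   # hypothetical values
for x in (10.0, 1.0, 0.1, 0.001):
    print(x, oscillating_hazard_complex(x, B1, alpha, omega),
          oscillating_hazard_cosine(x, B1, alpha, omega))
```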
Once again, the crash hazard rate explodes near the critical date. In addition, it now displays log-periodic oscillations. The evolution of the price before the crash and before the critical date is given by

$$p(t) \approx p_c - \frac{\kappa}{z}\left\{B_0 (t_c - t)^{z} + B_1 (t_c - t)^{z} \cos[\omega \log(t_c - t) - \phi]\right\}, \tag{18}$$

where $\phi$ is another phase constant. The key feature is that oscillations appear in the price of the asset before the critical date. This means that the local maxima of the function are separated by time intervals that tend to zero at the critical date, and do so in geometric progression. In other words, the ratio of consecutive time intervals between maxima is a constant:

$$\lambda \equiv e^{2\pi/\omega}. \tag{19}$$
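The geometric progression behind (19) can be checked on the crests of the cosine term. These are the times $t_k$ at which $\omega \log(t_c - t_k) - \phi = -2\pi k$. This sketch uses hypothetical values of $t_c$, $\omega$ and $\phi$. It tracks the cosine crests only: the actual maxima of $p(t)$ are shifted slightly by the power-law prefactor.

```python
import math


def crest_times(t_c, omega, phi, n):
    """Times t_k < t_c at which cos(omega*log(t_c - t) - phi) = 1.

    Solving omega*log(t_c - t_k) - phi = -2*pi*k gives
    t_k = t_c - exp((phi - 2*pi*k) / omega), for k = 0, 1, 2, ...
    """
    return [t_c - math.exp((phi - 2 * math.pi * k) / omega) for k in range(n)]


t_c, omega, phi = 100.0, 6.5, 1.0          # hypothetical values
times = crest_times(t_c, omega, phi, 8)
gaps = [b - a for a, b in zip(times, times[1:])]        # shrinking intervals
ratios = [g0 / g1 for g0, g1 in zip(gaps, gaps[1:])]    # consecutive ratios
lam = math.exp(2 * math.pi / omega)                     # Eq. (19)
print([round(r, 6) for r in ratios], round(lam, 6))
```

Every ratio of consecutive gaps equals $\lambda$. This is why a fit can lock in on the oscillations to estimate $t_c$.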
This is very useful from an empirical point of view, because such oscillations are much more strikingly visible in actual data than a simple power law: a fit can “lock in” on the oscillations, which contain information about the critical date $t_c$. Note that complex exponents and log-periodic oscillations do not require a pre-existing hierarchical structure as mentioned above. They may instead emerge spontaneously from the nonlinear complex dynamics of markets (Sornette, 1998).

To sum up, we have constructed a model in which the stock market price is driven by the risk of a crash, quantified by its hazard rate. In turn, imitation and herding forces drive the crash hazard rate. When the imitation strength becomes close to a critical value, the crash hazard rate diverges with a characteristic power-law behavior. This leads to a specific power-law acceleration of the market price, providing our first predictive precursory pattern anticipating a crash.

5.2. The price-driven model

The price-driven model inverts the logic of the previous risk-driven model. Here, again as a result of the action of rational investors, the price drives the crash hazard rate rather than the reverse. The price itself is driven up by the imitation and herding behavior of the “noisy” investors. As before, a stochastic description is required. It must capture the interplay between the progressive strengthening of imitation, controlled by the connections and interactions between traders, and the ubiquity of idiosyncratic behavior. It must also capture the influence of many other factors that are impossible to model in detail. As a consequence, the price dynamics are stochastic and the occurrence of a crash is not certain. It can instead be characterized by its hazard rate $h(t)$, defined as the probability per unit time that the crash will happen in the next instant if it has not happened yet.

The model developed in Sornette and Andersen (2002) keeps a basic tenet of economic theory, rational expectations. It captures the nonlinear positive feedback between agents in the stock market as an interplay between nonlinearity and multiplicative noise. The derived hyperbolic stochastic finite-time singularity formula transforms a Gaussian white noise into a rich time series possessing all the stylized facts of empirical prices, as well as accelerated speculative bubbles preceding crashes. Let us give the premise of the model and some preliminary results. We start from the geometric Brownian model of the bubble price $B(t)$, $dB = \mu B\,dt + \sigma B\,dW_t$. Here $\mu$ is the instantaneous return rate, $\sigma$ is the volatility and $dW_t$ is the infinitesimal increment of the random walk with unit variance (Wiener process). We generalize this expression into

$$dB(t) = \mu(B(t))\,B(t)\,dt + \sigma(B(t))\,B(t)\,dW_t - \kappa(t)\,B(t)\,dj, \tag{20}$$
allowing $\mu(B(t))$ and $\sigma(B(t))$ to depend arbitrarily and nonlinearly on the instantaneous realization of the price. A jump term has been added to describe a correction or a crash of return amplitude $\kappa$. This amplitude can be a stochastic variable drawn from an a priori arbitrary distribution. Immediately after the last crash, which becomes the new origin of time 0, $dj$ is reset to 0. It will eventually jump to 1 with a hazard rate $h(t)$, defined such that the probability that a crash occurs between $t$ and $t + dt$, conditioned on not having occurred since time 0, is $h(t)\,dt$.

Following Blanchard (1979) and Blanchard and Watson (1982), $B(t)$ is a rational expectations bubble. It accounts for a possibility often discussed in the empirical literature and by practitioners: observed prices may deviate significantly, and over extended time intervals, from fundamental prices. While allowing for such deviations, rational bubbles keep a fundamental anchor point of economic modelling, namely that bubbles must obey the condition of rational expectations. This translates essentially into the no-arbitrage condition with risk-neutrality, which states that the expectation of $dB(t)$ conditioned on the past up to time $t$ is zero. This allows us to determine the crash hazard rate $h(t)$ as a function of $B(t)$. The hazard rate is defined by $h(t)\,dt = \langle dj \rangle$, where the bracket denotes the expectation over all possible outcomes since the last crash. Using this definition, the condition becomes $\mu(B(t))\,B(t) - \kappa B(t)\,h(t) = 0$, which provides the hazard rate as a function of price:

$$h(t) = \frac{\mu(B(t))}{\kappa}. \tag{21}$$
Expression (21) quantifies the fact that the theory of rational expectations with risk-neutrality associates a risk with any price. For example, if the bubble price explodes, so will the crash hazard rate, so that the risk-return trade-off is always obeyed. We note that it is easy to incorporate risk-aversion by introducing a risk-premium rate or by amplifying the risk of a crash perceived by traders.

The dependence of $\mu(B(t))$ and $\sigma(B(t))$ on the price is chosen so as to capture the possible appearance of positive feedbacks on prices. There are many mechanisms in the stock market and in the behavior of investors that may lead to positive feedbacks. First, investment strategies with “portfolio insurance” issue sell orders whenever a loss threshold (or stop loss) is passed. By increasing the volume of sell orders, this may lead to further price decreases. Some commentators have indeed attributed the crash of October 1987 to a cascade of sell orders. Second, there is growing empirical evidence of herd or “crowd” behavior in speculative markets (Shiller, 2000), in fund behaviors (Scharfstein and Stein, 1990; Grinblatt et al., 1995) and in the forecasts made by financial analysts (Trueman, 1994). Although this behavior is inefficient from a social standpoint, it can be rational from the perspective of managers who are concerned about their reputations in the labor market. As we have already mentioned, such behavior can be rational and may occur as an information cascade. In such a situation, every subsequent actor, based on the observations of others, makes the same choice independently of his/her private signal (Bikhchandani et al., 1992). Herding leads to positive nonlinear feedback. Another mechanism for positive feedbacks is the so-called “wealth” effect: a rise of the stock market increases the wealth of investors, who spend more. This adds to the earnings of companies and thus increases the value of their stock.

The evidence for nonlinearity has strong empirical support. For instance, the absence of correlation of price changes coexists with the strong autocorrelation of their absolute values, and no linear model can explain this (Hsieh, 1995). Comparing additively nonlinear processes with multiplicatively nonlinear models, the latter class is found consistent with empirical price changes and with options' implied volatilities. Hedging strategies of general Black–Scholes option models are also known to lead to a positive feedback on the volatility (Sircar and Papanicolaou, 1998). With this additional insight, we are led to propose the following simplistic nonlinear model with multiplicative noise, in which the return rate and the volatility are nonlinear increasing power laws of $B(t)$ (Sornette and Andersen, 2002):

$$\mu(B)\,B = \frac{m}{2B}\,[\sigma(B)\,B]^2 + \mu_0\,[B(t)/B_0]^m, \tag{22}$$

$$\sigma(B)\,B = \sigma_0\,[B(t)/B_0]^m, \tag{23}$$
where $B_0$, $\mu_0$, $m > 0$ and $\sigma_0$ are four parameters of the model. They set respectively a reference scale, an effective drift, the strength of the nonlinear positive feedback and the amplitude of the volatility. The first term on the right-hand side of (22) is added as a convenient device to simplify the Ito calculation of these stochastic differential equations. The model can be reformulated in the Stratonovich interpretation as

$$\frac{dB}{dt} = (a\mu_0 + b\,\xi)\,B^m, \tag{24}$$

where $a$ and $b$ are two constants and $\xi$ is a delta-correlated Gaussian white noise, in physicists' notation such that $\xi\,dt \equiv dW$. The form (24) exemplifies the fundamental ingredient of the theory developed in Sornette and Andersen (2002): the interplay between nonlinearity and multiplicative noise. The nonlinearity creates a singularity in finite time and the multiplicative noise makes it stochastic. The choices (22)–(23), or equivalently (24), are the simplest generalization of the standard geometric Brownian model, which is recovered from (20) for the special case $m = 1$. The exponent $m$ is a straightforward mathematical device that accounts for nonlinearity in the simplest and most parsimonious way. Note in particular that, in the limit where $m$ becomes very large, the nonlinear function $B^m$ tends to a threshold response. The power $B^m$ can be decomposed as $B^m = B^{m-1} \times B$, stressing that $B^{m-1}$ plays the role of a growth rate that is itself a function of the price. The positive feedback effect is captured by the fact that a larger price $B$ feeds a larger growth rate, which leads to a larger price, and so on.
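The role of the first term in (22) can be checked directly. Apply Ito's lemma to $y = B^{1-m}$ with the coefficients (22) and (23). The curvature term of Ito's lemma then cancels that first term exactly. This leaves a constant drift $-(m-1)\mu_0/B_0^m$ and a constant diffusion $-(m-1)\sigma_0/B_0^m$, so $y$ is a Brownian motion with drift. This is why the solution is an explicit power of a drifted random walk. The sketch below verifies this, and also evaluates the hazard rate (21), for hypothetical parameter values.

```python
def drift(B, m, mu0, sigma0, B0):
    """mu(B) * B from Eq. (22)."""
    return m / (2.0 * B) * diffusion(B, m, sigma0, B0) ** 2 + mu0 * (B / B0) ** m


def diffusion(B, m, sigma0, B0):
    """sigma(B) * B from Eq. (23)."""
    return sigma0 * (B / B0) ** m


def ito_coefficients_of_y(B, m, mu0, sigma0, B0):
    """Ito drift and diffusion of y = B**(1 - m), from Ito's lemma:

    dy = [f'(B) * drift + 0.5 * f''(B) * diffusion**2] dt + f'(B) * diffusion dW
    """
    f1 = (1 - m) * B ** (-m)              # f'(B)
    f2 = -m * (1 - m) * B ** (-m - 1)     # f''(B)
    a = f1 * drift(B, m, mu0, sigma0, B0) + 0.5 * f2 * diffusion(B, m, sigma0, B0) ** 2
    b = f1 * diffusion(B, m, sigma0, B0)
    return a, b


def crash_hazard(B, m, mu0, sigma0, B0, kappa):
    """Eq. (21): h = mu(B) / kappa, from the no-arbitrage condition."""
    return drift(B, m, mu0, sigma0, B0) / B / kappa


m, mu0, sigma0, B0, kappa = 3.0, 0.01, 0.05, 1.0, 0.2   # hypothetical values
for B in (0.5, 1.0, 2.0, 5.0):
    a, b = ito_coefficients_of_y(B, m, mu0, sigma0, B0)
    # a and b do not depend on B; the hazard rate grows with B when m > 1.
    print(B, a, b, crash_hazard(B, m, mu0, sigma0, B0, kappa))
```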
The solution of (20) with (22) and (23) is given by

$$B(t) = \left(\frac{1}{\mu_0\,[t_c - t] - (\sigma_0/B_0^m)\,W(t)}\right)^{\beta}, \qquad \beta \equiv \frac{1}{m-1}, \tag{25}$$

where $t_c = y_0/[(m-1)\mu_0]$ is a constant determined by the initial condition, with $y_0 = 1/B(t=0)^{m-1}$. To grasp the meaning of (25), let us first consider the deterministic case $\sigma_0 = 0$, in which the return rate $\mu(B) \propto [B(t)]^{m-1}$ is the sole driving term. Then (25) reduces to $B(t) \propto 1/[t_c - t]^{1/(m-1)}$. In other words, a positive feedback ($m > 1$) of the price $B(t)$ on the return rate $\mu$ creates a finite-time singularity at a critical time $t_c$ determined by the initial starting point. This power-law acceleration of the price accounts for the effect of herding resulting from the positive feedback. It agrees with the empirical finding that price peaks have sharp, concave-upwards maxima (Roehner and Sornette, 1998).

Reintroducing the stochastic component $\sigma_0 \neq 0$, we see from (25) that the finite-time singularity still exists. Its visit is now controlled by the first passage of the biased random walk $\mu_0 t + (\sigma_0/B_0^m)\,W(t)$ at the level $\mu_0 t_c$, at which the denominator $\mu_0[t_c - t] - (\sigma_0/B_0^m)W(t)$ vanishes. In practice, a price trajectory will never sample the finite-time singularity. The jump process $dj$ defined in (20) does not allow it to approach too close. Indeed, by the no-arbitrage condition, expression (21) ensures that when the price explodes, so does $h(t)$. A crash therefore occurs with larger and larger probability, ultimately screening the divergence, which can never be reached. The endogenous determination (21) of the crash probability also ensures that the denominator $\mu_0[t_c - t] - (\sigma_0/B_0^m)W(t)$ never becomes negative: when it approaches zero, $B(t)$ blows up and the crash hazard rate increases accordingly. A crash will occur with probability 1 before the denominator reaches zero. Hence, the price $B(t)$ always remains positive and real.
We stress the remarkably simple and elegant constraint on the dynamics provided by the rational expectation condition. It ensures the existence and stationarity of the dynamics at all times, notwithstanding the locally nonlinear, stochastic, explosive dynamics. When $\mu_0 > 0$, the random walk has a positive drift that attracts the denominator in (25) to zero (i.e., attracts the bubble to infinity). However, by the mechanism explained above, as $B(t)$ increases, so does the crash hazard rate by relation (21). Eventually, a crash occurs that resets the bubble to a lower price. The random walk with drift goes on, $B(t)$ eventually increases again and reaches “dangerous waters”, a crash occurs again, and so on. Note that a crash is not a certain event: an inflated bubble price can also deflate spontaneously. This happens when the random realization of the random walk $W(t)$ brings the denominator back far from zero.

Fig. 11 shows a typical trajectory of the bubble component of the price generated by the nonlinear positive feedback model of Sornette and Andersen (2002). It starts from some initial value and runs up to the time just before the price starts to blow up. In its simplest version, this model makes the bubble price $B(t)$ essentially a power of the inverse of a random walk $W(t)$, in the following sense. Starting from $B(0) = W(0) = 0$ at the origin of time, $B(t)$ increases when the random walk approaches some value $W_c$, here taken equal to 1, and vice versa. In particular, when $W(t)$ approaches 1, $B(t)$ blows up and reaches a singularity at the time $t_c$ when the random walk crosses 1. This process generalizes the finite-time singularities described in Section 5.1.1 to the random domain. The monotonically increasing process culminating at a critical time $t_c$ is replaced by a random walk that wanders up and down before eventually reaching the critical level. This nonlinear positive feedback bubble process $B(t)$ can thus be called a “singular inverse random walk”.
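The bubble–crash cycle described above can be reproduced in a minimal discrete-time simulation. The sketch rests on assumptions of ours: a constant crash amplitude $\kappa$ (the text allows it to be random), an Euler time step, and hypothetical parameter values. It propagates $y = B^{1-m}$, which the Ito calculation turns into a drifted random walk. At each step it draws a crash with probability $h\,dt$ from (21) and maps a crash $B \to (1-\kappa)B$ onto $y$. A step that would cross the singularity is turned into a crash, in line with the statement that a crash occurs before the denominator vanishes.

```python
import math
import random


def simulate_bubble(n_steps=20000, dt=0.1, m=2.0, mu0=0.01, sigma0=0.05,
                    B0=1.0, kappa=0.2, B_init=1.0, seed=1):
    """Euler scheme for y = B**(1 - m), with crashes at hazard h = mu(B) / kappa.

    Returns the bubble price path and the indices of the crash steps.
    """
    rng = random.Random(seed)
    drift_y = -(m - 1) * mu0 / B0 ** m     # constant Ito drift of y
    vol_y = -(m - 1) * sigma0 / B0 ** m    # constant Ito diffusion of y
    y = B_init ** (1 - m)
    prices, crashes = [], []
    for step in range(n_steps):
        B = y ** (-1.0 / (m - 1))
        # mu(B) from Eq. (22), then the no-arbitrage hazard rate of Eq. (21).
        mu = (m / 2) * (sigma0 * (B / B0) ** m / B) ** 2 + mu0 * (B / B0) ** m / B
        h = mu / kappa
        y_next = y + drift_y * dt + vol_y * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        if rng.random() < min(1.0, h * dt) or y_next <= 0.0:
            # Crash of relative size kappa: B -> (1 - kappa) * B,
            # i.e. y -> (1 - kappa)**(1 - m) * y, which moves y away from 0.
            y = y * (1 - kappa) ** (1 - m)
            crashes.append(step)
        else:
            y = y_next
        prices.append(y ** (-1.0 / (m - 1)))
    return prices, crashes


prices, crashes = simulate_bubble()
print(len(crashes), max(prices))
```

The drift pushes $y$ towards zero, so $B$ rises. This raises $h$ through (21) until a crash resets the price, after which the cycle starts again.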
In the absence of a crash, the process $B(t)$ can exist only up to a finite time. We know from the study of random walks that, with probability one (i.e., with certainty), $W(t)$ will eventually reach any level. In particular, it will reach the value $W_c = 1$ of our example, at which $B(t)$ diverges. The second effect that tempers the possible divergence of the bubble price is by far the most important one in the regime of highly over-priced markets. It is the impact of the price on the crash hazard rate discussed above. As the price blows up due to imitation, herding and speculation, as well as randomness, the crash hazard rate increases even faster according to Eq. (21). A crash then occurs and drives the price back closer to its fundamental value. The crashes are triggered randomly, governed by the crash hazard rate, which is an increasing function of the bubble price. In the present formulation, the higher the bubble price, the higher the probability of a crash. In this model, a crash is similar to a purge administered to a patient.

[Fig. 11 panels, top to bottom: $B(t)$, $W(t)$, $dB(t)$ and $dW(t)$, each versus $t$ from 0 to 2500.]

Fig. 11. Top panel: realization of a bubble price $B(t)$ as a function of time, constructed from the “singular inverse random walk”. It corresponds to a specific realization of the random numbers used to generate the random walk $W(t)$ shown in the second panel. The top panel is obtained by raising the inverse of $W_c - W(t)$ to a power, where the constant $W_c$ is here taken equal to 1 and $W(t)$ is the random walk shown in the second panel. In this case, the bubble diverges when the random walk approaches 1. Notice the similarity between the trajectories shown in the top ($B(t)$) and second ($W(t)$) panels as long as the random walk $W(t)$ does not approach the value $W_c = 1$ too closely. The walk is free to wander, but when it approaches 1, the bubble price $B(t)$ becomes much more sensitive and eventually diverges as $W(t)$ reaches 1. Before this happens, $B(t)$ can exhibit local peaks, i.e., local bubbles, which come back smoothly. These correspond to realizations in which the random walk approaches $W_c$ without touching it and then spontaneously recedes from it. The third (respectively fourth) panel shows the time series of the increments $dB(t) = B(t) - B(t-1)$ of the bubble (respectively $dW(t) = W(t) - W(t-1)$ of the random walk). Notice the intermittent bursts of strong volatility in the bubble, compared with the featureless constant level of fluctuations of the random walk (reproduced from Sornette and Andersen (2002)).
This model (Sornette and Andersen, 2002) proposes two scenarios for the end of a bubble: either a spontaneous deflation or a crash. These two mechanisms are natural features of the model and have not been added artificially. Both scenarios are indeed observed in real markets, as will be described later. The model has an interesting and far-reaching consequence for the repetition and organization of crashes in time. Each time the random walk approaches the chosen constant $W_c$, the bubble price blows up. According to the no-arbitrage condition together with rational expectations, this implies that the market enters “dangerous waters”, with a crash looming ahead. The random walk model makes a very specific prediction about the waiting times between successive approaches to the critical value $W_c$, i.e., between successive bubbles. The distribution of these waiting times is found to be a very broad power law, so broad that the average waiting time is mathematically infinite (Sornette, 2000a). In practice, this leads to two inter-related phenomena. The first is clustering: bubbles tend to follow bubbles at short times. The second is long-term memory: once a bubble has deflated for a sufficiently long time, very long waiting times between bubbles follow. The “singular inverse random walk” bubble model thus predicts very large intermittent fluctuations in the recurrence time of speculative bubbles.

Solution (25) can be used to invert real data during periods preceding financial crashes and so obtain the relevant parameters. We present here some tests using an inversion method that minimizes the Kolmogorov–Smirnov (KS) distance between the empirical distribution of returns and the synthetic one generated by the model. The tests were performed on the Hong Kong market prior to the crash of early 1994, and on the Nasdaq composite index prior to the crash of April 2000.
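The infinite mean waiting time can be illustrated with a classical random-walk identity. The sketch rests on a simplifying assumption of ours: it treats the waiting time between bubbles as the first-return time $T$ of a simple symmetric $\pm 1$ random walk to its starting level. For that walk, $P(T > 2n) = \binom{2n}{n}/4^n \sim (\pi n)^{-1/2}$, a tail too heavy for the mean to exist.

```python
import math


def survival_first_return(n_max):
    """s[n] = P(T > 2n) for the first return time T of a simple +/-1 random walk.

    Uses the classical identity P(T > 2n) = C(2n, n) / 4**n, computed through
    the product s[n] = s[n-1] * (2n - 1) / (2n) to avoid huge integers.
    """
    s = [1.0]
    for n in range(1, n_max + 1):
        s.append(s[-1] * (2 * n - 1) / (2 * n))
    return s


s = survival_first_return(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    # s[n] * sqrt(pi * n) tends to 1: the tail decays like n**(-1/2).
    print(n, s[n], s[n] * math.sqrt(math.pi * n))

# T is even, so E[T] = sum_k P(T > k) = 2 * sum_n P(T > 2n). The partial sums
# grow like sqrt(N) without bound: the mean waiting time is infinite.
partial_100 = 2 * sum(s[:100])
partial_10000 = 2 * sum(s[:10_000])
print(partial_100, partial_10000, partial_10000 / partial_100)
```

Multiplying the horizon by 100 multiplies the partial mean by about 10. This $\sqrt{N}$ growth is the signature of a waiting-time distribution without a finite mean.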
To construct a meaningful distribution, we propose to add a constant fundamental price $F$ to the bubble price $B(t)$, as only their sum is observable in real life:

$$P(t) = e^{rt}\,[F + B(t)]. \tag{26}$$
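The inversion idea can be sketched as follows. Undo (26) and then (25) to recover $W(t)$ from the price, and measure the KS distance between its increments and a standard Gaussian. Several assumptions are ours here: the parametrization of (25) directly in terms of $\mu_0$, $t_c$ and $V = \sigma_0/B_0^m$, unit time steps, and hypothetical parameter values. The "market" is synthetic data generated by the model itself, not real prices. The text's procedure minimizes such a distance over the parameters. This block only evaluates it, and compares the correct exponent $\beta$ with a misspecified one.

```python
import math
import random
from statistics import NormalDist


def bubble_from_price(P, t, F, r):
    """Undo Eq. (26): B(t) = P(t) * exp(-r t) - F."""
    return P * math.exp(-r * t) - F


def walk_from_bubble(B, t, beta, mu0, t_c, V):
    """Undo Eq. (25), written with V = sigma_0 / B_0**m:
    B**(-1/beta) = mu0 * (t_c - t) - V * W(t)."""
    return (mu0 * (t_c - t) - B ** (-1.0 / beta)) / V


def ks_distance(sample, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF of sample and cdf."""
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))


def ks_of_inversion(prices, times, F, r, beta, mu0, t_c, V):
    """KS distance of the recovered increments dW against N(0, 1)."""
    W = [walk_from_bubble(bubble_from_price(P, t, F, r), t, beta, mu0, t_c, V)
         for P, t in zip(prices, times)]
    dW = [b - a for a, b in zip(W, W[1:])]
    return ks_distance(dW, NormalDist().cdf), W


# Synthetic prices generated by the model (hypothetical parameters), so the
# true random walk is known and can be compared with the recovered one.
rng = random.Random(7)
beta, mu0, t_c, V, F, r = 2.0, 5e-4, 2500.0, 5e-4, 1.0, 3e-4
times, prices, true_W = [], [], []
W = 0.0
for t in range(2000):
    if t > 0:
        W += rng.gauss(0.0, 1.0)
    B = (mu0 * (t_c - t) - V * W) ** (-beta)
    times.append(float(t))
    prices.append(math.exp(r * t) * (F + B))
    true_W.append(W)

ks_true, W_hat = ks_of_inversion(prices, times, F, r, beta, mu0, t_c, V)
ks_wrong, _ = ks_of_inversion(prices, times, F, r, 1.0, mu0, t_c, V)
print(ks_true, ks_wrong)
```

With the correct parameters, the recovered increments are Gaussian and the KS distance stays small. With a misspecified exponent, the distance is markedly larger. This is the "gaussianization" criterion the fit relies on.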
We can also include the possibility of an interest rate $r$, or of growth of the economy at rate $r$. We denote by $M$ the rescaled drift parameter built from $\mu_0$, and by $V = \sigma_0/B_0^m$ the amplitude of the multiplicative noise. For the Hang Seng index, the best fit is obtained with $\beta = 2.5$, $V = 1.1 \times 10^{-7}$, $M = 4.23 \times 10^{-5}$, $r = 0.00032$ and $F = 2267.3$, corresponding to a KS confidence level of 96.3%. This should be compared with the best Gaussian fit to the empirical price returns, which gives a KS confidence level of 11%. Thus the model “gaussianizes” the data at a very high significance level: the nonlinear multiplicative process transforms a white Gaussian noise input into a realistic-looking financial time series. For the Nasdaq composite index, we obtain $\beta = 2.0$, $V = 2.1 \times 10^{-7}$, $M = -9.29 \times 10^{-6}$, $r = 0.00496$ and $F = 641.5$, corresponding to a KS confidence level of 85.9%. The corresponding best Gaussian fit to the empirical prices gives a KS confidence level of 73%. Here, the improvement is less impressive but nevertheless present.

With the parameters obtained by the inversion, we can generate many scenarios that are statistically equivalent to the real history of the Hang Seng and Nasdaq composite indices. Fig. 12 shows 10 synthetic evolutions of the process (26), generated with the best parameter values for both bubbles. For comparison, the empirical prices are shown as thick lines (one time step corresponds approximately to one trading day). The smooth continuous line close to the horizontal axis is the fundamental price $F e^{rt}$. This model, together with the inversion procedure, provides a new direct tool for detecting bubbles and for identifying their starting times and plausible ends. By changing the initial time of the time series, the KS probability of the resulting Gaussian fit of the transformed series $W(t)$ should allow us to determine the starting date beyond which the model becomes inadequate at a given statistical level.

[Fig. 12 panels: top, “Hang Seng Index from 1/7 1991 to 4/2 1994”; bottom, “Nasdaq Index from 5/10 1998 to 27/3 2000”; vertical axes in units of $10^4$, horizontal axes time $t$.]

Fig. 12. Top panel: the Hang Seng index from July 1, 1991 to February 4, 1994, together with 10 realizations of the “singular inverse random walk” bubble model generated by the nonlinear positive feedback model. Each realization corresponds to an arbitrary random walk whose drift and variance have been adjusted to best fit the distribution of the Hang Seng index returns. Bottom panel: the Nasdaq composite index bubble from October 5, 1998 to March 27, 2000, together with 10 realizations of the “singular inverse random walk” bubble model generated by the nonlinear positive feedback model. Each realization corresponds to an arbitrary random walk whose drift and variance have been adjusted to best fit the distribution of the Nasdaq index returns (reproduced from Sornette and Andersen (2002)).

Furthermore, the exponent $m$ (or equivalently $\beta$) provides a direct measure of the speculative mood: $m = 1$ is the normal regime, while $m > 1$ quantifies a positive self-reinforcing feedback. This opens the possibility of monitoring it continuously via the inversion procedure and using it as a “thermometer” of speculation. In addition, the variance $V$ of the multiplicative noise is a measure of volatility that is significantly more robust than standard estimators. This is because the inversion of the nonlinear formula removes a large part of the volatility clustering and of the heavy-tailed nature of the distribution of returns. Its continuous monitoring via the inversion procedure suggests new ways of looking at the dependence between assets. Preliminary analyses show that this approach reproduces most of the stylized facts of financial time series (Sornette and Andersen, 2002). These stylized facts include the absence of two-point correlation between returns, the fat-tailed distributions of returns, the long-range dependence and persistence of the two-point correlation of volatility, and the multifractal structure of generalized moments of the absolute value of the returns. Applications to shorter time scales, from quarters down to months, should be explored to test whether this model and some of its variants can detect regimes of abnormal behavior ($m \neq 1$) in financial time series.

We stress that the proposed class of nonlinear rational bubble models is fundamentally different from the bubble models tested previously. All previous models assumed exponentially growing bubbles, and the results of statistical tests have not been convincing (Camerer, 1989; Adam and Szafarz, 1992). In contrast, bubbles here may be super-exponential, which makes them different in principle from a fundamental price growing at a constant rate. With this work, we thus hope to rejuvenate the “old” theory of rational bubbles by extending its universe into the nonlinear stochastic regime.

An additional layer of refinement can easily be added. Hamilton (1989) introduced the so-called Markov switching techniques for the analysis of price returns. Following him, many scholarly works have documented empirical evidence of regime shifts in financial data sets (Van Norden and Schaller, 1993; Cai, 1994; Gray, 1996; Van Norden, 1996; Schaller and van Norden, 1997; Assoe, 1998; Chauvet, 1998; Driffill and Sola, 1998). For instance, Van Norden and Schaller (1997) have proposed a Markov regime switching model of speculative behavior whose key feature is similar to ours: over-valuation relative to the fundamental price increases the probability and expected size of a stock market crash. Bubbles are not expected to permeate the dynamics of the price all the time. Taken together with this evidence, that fact suggests the following natural extension of the model.
In the simplest and most parsimonious extension, we can assume that only two regimes can occur: bubble and normal. The bubble regime follows the previous model definition and is punctuated by crashes occurring with the hazard rate governed by the price level. The normal regime can be, for instance, a standard random walk market model with constant small drift and volatility. The regime switches are assumed to be completely random. This very simple dynamical model recovers essentially all the stylized facts of empirical prices: no correlation of returns, long-range correlation of volatilities, fat-tailed return distributions, apparent fractality and multifractality, and the sharp peak–flat trough pattern of price peaks. In addition, the model predicts, and we confirm by empirical data analysis, that times of bubbles are associated with nonstationary, increasing volatility correlations. According to this model, the apparent long-range correlation of volatility results from random switching between normal and bubble regimes. Perhaps most importantly, the visual appearance of the simulated price trajectories is very reminiscent of real ones, as shown in Fig. 12. The price-driven “singular inverse random walk” bubble model has a remarkably simple formulation. It nevertheless reproduces convincingly the salient properties and appearance of real price trajectories, with their randomness, bubbles and crashes.

5.3. Risk-driven versus price-driven models

The risk-driven model of Section 5.1 and the price-driven model of Section 5.2 both describe a system of two populations of traders, the “rational” and the “noisy” traders. Occasional imitative and herding behaviors of the “noisy” traders may cause global cooperation among traders, causing a crash. The “rational” traders provide a direct link between the crash risks and the bubble price dynamics. In the risk-driven model, the crash hazard rate determined by herding drives the bubble price. In the price-driven model, imitation and herding induce positive feedbacks on the price, which itself creates an increasing risk of a looming yet unrealized financial crash. We believe that both models capture a part of reality. Studying them independently is the standard divide-and-conquer strategy for handling the complexity of the world. The price-driven model appears perhaps the most natural and straightforward. It captures the intuition that sky-rocketing prices are unsustainable and endogenously announce a significant correction or a crash. The risk-driven model captures a more subtle self-organization of stock markets, related to the ubiquitous balance between risk and return. Both models embody the notion that the market anticipates the crash in a subtle, self-organized and cooperative fashion, hence releasing precursory “fingerprints” observable in stock market prices. In other words, market prices contain information on impending crashes. The next section explores the origin and nature of these precursory patterns. It prepares the road for a full-fledged analysis of real stock market crashes and their precursors.

5.4. Imitation and contrarian behavior: hyperbolic bubbles, crashes and chaos

The model of bubbles and crashes that we now discuss (Corcos et al., 2002) complements the two previous models of rational expectation (RE) bubbles. It describes a deterministic dynamics of prices embodying both the bubble phases and the crashes. It is perhaps the simplest analytically tractable model of the interplay between imitative and contrarian behavior in a stock market where agents can take at least two states, bullish or bearish.
Each bullish (bearish) agent polls $m$ “friends” and changes her opinion to bearish (bullish) in two cases:

(1) if at least $m\mu_{hb}$ ($m\mu_{bh}$) among the $m$ agents inspected are bearish (bullish); or

(2) if at least $m\mu_{hh} > m\mu_{hb}$ ($m\mu_{bb} > m\mu_{bh}$) among the $m$ agents inspected are bullish (bearish).

Condition (1) (resp. (2)) corresponds to imitative (antagonistic) behavior. In the limit where the number $N$ of agents is infinite, combinatorial techniques show that the dynamics of the fraction of bullish agents is deterministic. It exhibits chaotic behavior in a significant domain of the parameter space $\{\mu_{hb}, \mu_{bh}, \mu_{hh}, \mu_{bb}, m\}$. The deterministic equation of the price trajectory is found to be of the form

$$p_{t+1} = F_m(p_t), \tag{27}$$
where the function $F_m(x)$ is a sum of combinatorial factors. A typical chaotic trajectory can be shown to be characterized by intermittent phases of chaos, quasi-periodic behavior, and super-exponentially growing bubbles followed by crashes. A typical bubble initially grows at an exponential rate and then crosses over to a nonlinear power-law growth leading to a finite-time singularity. The contrarian behavior introduces a nonlinear reinjection mechanism that rounds off these singularities and leads to chaos. This model is one of the rare agent-based models that give rise to interesting nonperiodic complex dynamics in the limit of an infinite number $N$ of agents. A finite number of agents introduces an endogenous source of noise superimposed on the chaotic dynamics, as shown in Fig. 13. One can observe bursts of volatility, exploding bubbles and quiescent regimes. The traditional concept of stock market dynamics envisions a stream of stochastic “news” that may move prices in random directions. This model, in contrast, demonstrates that certain types of deterministic behavior (mimicry and contradictory behavior alone) can already lead to chaotic prices.

[Fig. 13 panels: (a) and (b), $p_t$ versus $t$ from 0 to 10 000; (c), difference between (b) and (a) versus $t$.]

Fig. 13. Time evolution of the price $p_t$ over 10 000 time steps for $m = 60$ polled agents, with (a) $N = \infty$ and (b) $N = m + 1 = 61$ agents, and parameters $\mu_{hb} = \mu_{bh} = 0.72$ and $\mu_{hh} = \mu_{bb} = 0.85$. Panel (c) represents the noise due to the finite size of the system; it is obtained by subtracting the time series in panel (a) from the time series in panel (b). Reproduced from Corcos et al. (2002).

The traditional theory of rational anticipations exhibits and emphasizes self-reinforcing mechanisms, without predicting either their inception or their collapse. The strength of this model, by contrast, is to justify the occurrence of speculative bubbles. It allows for their collapse by taking into account the combination of mimetic and antagonistic behavior in the formation of expectations about prices. The specific feature of the model is to combine these two Keynesian aspects of speculation and enterprise, and to derive from them behavioral rules based on collective opinion. The agents can adopt an imitative and gregarious behavior or, on the contrary, anticipate a reversal of tendency, thereby detaching themselves from the current trend. It is this duality, the continuous coexistence of these two elements, that is at the origin of the properties of our model: chaotic behavior and the generation of bubbles.

It is common wisdom that deterministic chaos imposes a fundamental limit on predictability, because the tiny, inevitable fluctuations in chaotic systems quickly snowball in unpredictable ways. This has been investigated, for instance, in relation to long-term weather patterns. In our model, however, the chaotic dynamics of the returns is not what limits predictability. Taken alone, it contains too many residual correlations to match real returns. Endogenous fluctuations due to finite-size effects and external news (noise) seem to be needed to retrieve the observed randomness of stock market prices. The model of imitative and contrarian behavior leads to accelerating bubble prices that follow finite-time singularity trajectories and abort into a crash. The accelerating phase is due to imitation. The crash is due to the contrarian behavior, later reinforced by the imitation behavior.
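The infinite-$N$ map (27) can be sketched under one plausible reading of the polling rule. This reading is our assumption: each of the $m$ polled friends is independently bullish with probability $p$, and "at least $m\mu$" is a non-strict inequality. The exact combinatorial expression of Corcos et al. (2002) may differ in such details. The default values are the parameters of Fig. 13.

```python
from math import comb


def switch_probabilities(p, m, mu_hb, mu_bh, mu_hh, mu_bb):
    """Probabilities that a bullish (resp. bearish) agent switches in one round.

    Assumes the m polled friends are independently bullish with probability p;
    k is the number of bullish friends among them.
    """
    pmf = [comb(m, k) * p ** k * (1 - p) ** (m - k) for k in range(m + 1)]
    bull_to_bear = sum(q for k, q in enumerate(pmf)
                       if m - k >= m * mu_hb      # imitation: many bears
                       or k >= m * mu_hh)         # contrarian: too many bulls
    bear_to_bull = sum(q for k, q in enumerate(pmf)
                       if k >= m * mu_bh          # imitation: many bulls
                       or m - k >= m * mu_bb)     # contrarian: too many bears
    return bull_to_bear, bear_to_bull


def F(p, m=60, mu_hb=0.72, mu_bh=0.72, mu_hh=0.85, mu_bb=0.85):
    """One step p_{t+1} = F_m(p_t) of Eq. (27) for the fraction of bullish agents."""
    s_bull, s_bear = switch_probabilities(p, m, mu_hb, mu_bh, mu_hh, mu_bb)
    return p * (1 - s_bull) + (1 - p) * s_bear


p, path = 0.51, []
for _ in range(2000):
    p = F(p)
    path.append(p)
print(min(path), max(path))
```

Under these assumptions, $p = 1/2$ is a weakly repelling fixed point. Starting just above it, the iterates drift slowly away from equilibrium, accelerate, and are eventually knocked back by the contrarian rule.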
Quantitatively, the bubble-crash sequence can be described by studying the logarithm of $p - 1/2$ (the deviation from equilibrium, where equilibrium is characterized by equal fractions of bullish and bearish agents) as a function of linear time. One first observes a linear trend, which signals an exponential growth $p - 1/2 \propto e^{\alpha(m) t}$ (with $\alpha(m) > 0$), followed by a super-exponential growth accelerating so much as to give the impression of reaching a singularity in finite time.
D. Sornette / Physics Reports 378 (2003) 1 – 98
The understanding of this phenomenon comes from the behavior of the "elasticity" of $F_m(p) - p$ with respect to $p - 1/2$, i.e., the derivative of $\ln[F_m(p) - p]$, where $F_m(p)$ is defined by (27), with respect to $\ln(p - 1/2)$. Two regimes can be observed.

1. For small $p - 1/2$, the elasticity is 1, i.e.,

$$F_m(p) - p \simeq \alpha(m)\left(p - \tfrac{1}{2}\right). \tag{28}$$

This expression (28) explains the exponential growth observed at early times.

2. For larger $p - 1/2$, the elasticity increases above 1 and stabilizes at a value $\omega(m)$ before decreasing again due to the reinjection produced by the contrarian mechanism. The interval in $p - 1/2$ in which the slope is approximately stabilized at the value $\omega(m)$ enables us to write

$$F_m(p) - p \simeq \beta(m)\left(p - \tfrac{1}{2}\right)^{\omega(m)} \quad \text{with } \omega > 1. \tag{29}$$

These two regimes can be collected in the following phenomenological expression for $F_m(p)$:

$$F_m(p) = \tfrac{1}{2} + \left(1 - 2g_m(\tfrac{1}{2}) - g_m'(\tfrac{1}{2})\right)\left(p - \tfrac{1}{2}\right) + \beta(m)\left(p - \tfrac{1}{2}\right)^{\omega(m)}, \tag{30}$$

$$F_m(p) = \tfrac{1}{2} + \left(p - \tfrac{1}{2}\right) + \alpha(m)\left(p - \tfrac{1}{2}\right) + \beta(m)\left(p - \tfrac{1}{2}\right)^{\omega(m)} \quad \text{with } \omega > 1, \tag{31}$$

and

$$\alpha(m) = -2g_m(\tfrac{1}{2}) - g_m'(\tfrac{1}{2}). \tag{32}$$

Introducing the notation $\Delta = p - 1/2$, the dynamics can be rewritten

$$\Delta_{t+1} - \Delta_t = \alpha(m)\,\Delta_t + \beta(m)\,\Delta_t^{\omega(m)}, \tag{33}$$

which, in the continuous time limit, yields

$$\frac{d\Delta}{dt} = \alpha(m)\,\Delta + \beta(m)\,\Delta^{\omega(m)}. \tag{34}$$

Thus, for small $\Delta$, we obtain an exponential growth

$$\Delta_t \sim e^{\alpha(m) t}, \tag{35}$$

while for large enough $\Delta$

$$\Delta_t \sim (t_c - t)^{-1/(\omega(m) - 1)}. \tag{36}$$

For example, for $m = 60$ with $\mu_{hb} = \mu_{bh} = 0.72$ and $\mu_{hh} = \mu_{bb} = 0.85$, $\omega(m) = 3$, which yields for large $\Delta$

$$p_t - \tfrac{1}{2} \sim \frac{1}{\sqrt{t_c - t}}. \tag{37}$$
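The two regimes (35) and (36) can be checked directly on the continuous equation (34), written here as $d\Delta/dt = \alpha\Delta + \beta\Delta^{\omega}$ with $\Delta = p - 1/2$. It is a Bernoulli equation: the substitution $u = \Delta^{1-\omega}$ turns it into the linear equation $du/dt = -(\omega - 1)(\alpha u + \beta)$. For large $u$ (small $\Delta$) this gives $u \propto e^{-(\omega-1)\alpha t}$, i.e. the exponential growth (35); $u$ reaches zero, i.e. $\Delta$ diverges, at $t_c = \ln(1 + \alpha u_0/\beta)/[(\omega - 1)\alpha]$, and near $t_c$ one has $u \simeq (\omega - 1)\beta(t_c - t)$, which is (36). The Python sketch below compares this closed form with a direct numerical integration. The function names, the step-size rule and the values of $\alpha$ and $\beta$ are illustrative assumptions; only $\omega = 3$ is quoted in the text.

```python
import math


def analytic_tc(alpha, beta, omega, delta0):
    """Blow-up time of d(Delta)/dt = alpha*Delta + beta*Delta**omega (Eq. 34).

    Uses the Bernoulli substitution u = Delta**(1 - omega).
    Assumes alpha >= 0, beta > 0, omega > 1 and delta0 > 0.
    """
    u0 = delta0 ** (1.0 - omega)
    if alpha == 0.0:
        return u0 / ((omega - 1.0) * beta)
    return math.log(1.0 + alpha * u0 / beta) / ((omega - 1.0) * alpha)


def numerical_tc(alpha, beta, omega, delta0, delta_max=1e6, h0=1e-2):
    """Integrate Eq. (34) with RK4 until Delta exceeds delta_max; return that time."""
    def rhs(d):
        return alpha * d + beta * d ** omega

    t, d = 0.0, delta0
    while d < delta_max:
        # Shrink the step as the local growth rate rhs(d)/d increases,
        # so that the super-exponential phase is resolved.
        rate = alpha + beta * d ** (omega - 1.0) + 1e-12
        h = min(h0, 0.01 / rate)
        k1 = rhs(d)
        k2 = rhs(d + 0.5 * h * k1)
        k3 = rhs(d + 0.5 * h * k2)
        k4 = rhs(d + h * k3)
        d += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += h
    return t


if __name__ == "__main__":
    # omega = 3 as quoted for m = 60; alpha and beta are illustrative values.
    for alpha in (0.0, 0.05):
        print(alpha, analytic_tc(alpha, 1.0, 3.0, 0.1), numerical_tc(alpha, 1.0, 3.0, 0.1))
```

Stopping at a large but finite $\Delta$ is a practical stand-in for the singularity: the time left between $\Delta = 10^6$ and $t_c$ is of order $10^{-12}$ here, far below the integration error.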
The prediction (36) implies that the returns $r_t$ should increase in an accelerating, super-exponential fashion at the end of a bubble, leading to a price trajectory

$$\pi_t = \pi_c - C\,(t_c - t)^{(\omega(m) - 2)/(\omega(m) - 1)}, \tag{38}$$

where $\pi_c$ is the culminating price of the bubble, reached at $t = t_c$ when $\omega(m) > 2$, such that the finite-time singularity in $r_t$ gives rise only to an infinite slope of the price trajectory. This behavior (38), with an exponent $0 < (\omega(m) - 2)/(\omega(m) - 1) < 1$, has been documented in many bubbles (Sornette et al., 1996; Johansen et al., 1999, 2000; Johansen and Sornette, 1999a, b, 2000a; Sornette and Johansen, 2001; Sornette and Andersen, 2002; Sornette, 2002, 2003). The case $m = 60$ with $\mu_{hb} = \mu_{bh} = 0.72$ and $\mu_{hh} = \mu_{bb} = 0.85$ leads to $(\omega(m) - 2)/(\omega(m) - 1) = 1/2$, which is in reasonable agreement with the values reported previously. Interpreted within the present model, the exponent $(\omega(m) - 2)/(\omega(m) - 1)$ of the price singularity gives an estimate of the "connectivity" number $m$ through the dependence of $\omega$ on $m$. Such a relationship has already been argued by Johansen et al. (2000) at a phenomenological level, using a mean-field equation in which the exponent is directly related to the number of connections to a given agent.

This recently developed model has strong potential to provide a simple but powerful approach to modelling financial time series. It can be extended in many ways, including (1) introducing at least a third state, called "neutral", in addition to the "bullish" and "bearish" states; (2) introducing a fundamental price and a population of value investors, and assuming that "noise traders" follow the imitative-contrarian strategy described above; (3) considering the possibility of several stocks being traded simultaneously, in particular with the introduction of a riskless asset.

6. Log-periodic oscillations decorating power laws

6.1. Status of log-periodicity

Log-periodicity is an observable signature of the symmetry of discrete scale invariance (DSI). DSI is a weaker symmetry than (continuous) scale invariance (Dubrulle et al., 1997).
The latter is the symmetry of a system which manifests itself such that an observable $O(x)$, as a function of the "control" parameter $x$, is scale invariant under the change $x \to \lambda x$ for arbitrary $\lambda$, i.e., a number $\mu(\lambda)$ exists such that

$$O(x) = \mu(\lambda)\, O(\lambda x). \tag{39}$$
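Relation (39) can be probed numerically. In the Python sketch below, all names and the values used (exponent $\alpha = 0.5$, fundamental ratio 2, modulation amplitude 0.3, test points) are illustrative choices. A pure power law $x^{\alpha}$ satisfies (39) with $\mu = \lambda^{-\alpha}$ for any magnification $\lambda$, whereas the same power law multiplied by a period-1 function of $\ln x/\ln 2$ satisfies it only when $\lambda$ is an integer power of 2, which is discrete scale invariance.

```python
import math

ALPHA = 0.5  # illustrative exponent


def power_law(x):
    return x ** ALPHA


def log_periodic(x, lam0=2.0, c=0.3):
    """x**alpha * P(ln x / ln lam0) with the period-1 function P(y) = 1 + c*cos(2*pi*y)."""
    return x ** ALPHA * (1.0 + c * math.cos(2.0 * math.pi * math.log(x) / math.log(lam0)))


def satisfies_39(f, lam, xs=(0.3, 1.7, 5.0, 42.0), tol=1e-9):
    """Check O(x) = mu * O(lam * x) at a few points, with mu = lam**(-alpha)."""
    mu = lam ** (-ALPHA)
    return all(abs(f(x) - mu * f(lam * x)) <= tol * abs(f(x)) for x in xs)


if __name__ == "__main__":
    for lam in (1.37, 2.0, 3.0, 4.0):
        print(lam, satisfies_39(power_law, lam), satisfies_39(log_periodic, lam))
```

Checking a handful of points is of course not a proof; it only illustrates that the admissible magnifications form the discrete set $2, 4, 8, \ldots$ for the modulated function.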
The solution of (39) is simply a power law $O(x) = x^{\alpha}$, with $\alpha = -\log\mu/\log\lambda$, which can be verified directly by insertion. In DSI, the system or the observable obeys scale invariance (39) only for specific choices of the magnification factor $\lambda$, which form in general an infinite but countable set of values $\lambda_1, \lambda_2, \ldots$ that can be written as $\lambda_n = \lambda^n$. Here $\lambda$ is the fundamental scaling ratio determining the period of the resulting log-periodicity. This property can be qualitatively seen to encode a lacunarity of the fractal structure. The most general solution of (39) with $\lambda$ (and therefore $\mu$) fixed is

$$O(x) = x^{\alpha}\, P\!\left(\frac{\ln x}{\ln \lambda}\right), \tag{40}$$
where $P(y)$ is an arbitrary periodic function of period 1 in its argument, hence the name log-periodicity. Expanding it in a Fourier series $\sum_{n=-\infty}^{\infty} c_n \exp(2n\pi i \ln x/\ln\lambda)$, we see that $O(x)$ becomes a sum of power laws with the infinite discrete spectrum of complex exponents $\alpha_n = \alpha + i\,2\pi n/\ln\lambda$, where $n$ is an arbitrary integer. Thus, DSI leads to power laws with complex exponents, whose observable signature is log-periodicity. Specifically, for financial bubbles prior to large crashes, we shall see that a first-order representation of Eq. (40),

$$I(t) = A + B(t_c - t)^{\beta} + C(t_c - t)^{\beta}\cos\left(\omega \ln(t_c - t) - \phi\right), \tag{41}$$

captures well the behavior of the market price $I(t)$ prior to a crash or large correction at a time $\approx t_c$.

There are many mechanisms known to generate log-periodicity (Sornette, 1998). The most obvious one is when the system possesses a pre-existing discrete hierarchical structure. There are, however, various dynamical mechanisms generating log-periodicity without relying on a pre-existing discrete hierarchical structure: DSI may be produced dynamically and need not be pre-determined by, e.g., a geometrical network. This is because there are many ways to break a symmetry, the subtlety here being to break it only partially.

6.2. Stock market price dynamics from the interplay between fundamental value investors and technical analysts

The interplay of two classes of investors, fundamental value investors and technical analysts (or trend followers), has been stressed by several recent works (see for instance Lux and Marchesi, 1999 and references therein) as essential to retrieve the important stylized facts of stock market price statistics. We build on this insight and construct a simple model of price dynamics, whose innovation is to emphasize the fundamentally nonlinear behavior of both classes of agents.

6.2.1. Nonlinear value and trend-following strategies

The price variation of an asset on the stock market is controlled by supply and demand, in other words by the net order size $\Omega$ through a market impact function (Farmer, 1998).
Assuming that the ratio $\tilde{p}/p$ of the price $\tilde{p}$ at which the orders are executed to the previously quoted price $p$ is solely a function of $\Omega$, and using the condition that it is impossible to make profits by repeatedly trading through a closed circuit (i.e., buying and selling has to end up with a final net position equal to zero), Farmer (1998) has shown that the logarithm of the price is given by the following equation, written in discrete form:

$$\ln p(t+1) - \ln p(t) = \frac{\Omega(t)}{L}. \tag{42}$$

The "market depth" $L$ is the typical number of outstanding stocks traded per unit time and thus normalizes the impact of a given order size $\Omega(t)$ on the log-price variations. The net order size $\Omega$, summed over all traders, changes as a function of time so as to reflect the information flow in the market and the evolution of the traders' opinions and moods. A zero net order size $\Omega = 0$ corresponds to an exact balance between supply and demand. Various derivations have connected the price variation, or the variation of the logarithm of the price, to factors that control the net order size itself (Farmer, 1998; Bouchaud and Cont, 1998; Pandey and Stauffer, 2000). Two basic ingredients of $\Omega(t)$ are thought to be important in determining the price dynamics: reversal to the fundamental value ($\Omega_{\rm fund}(t)$) and trend following ($\Omega_{\rm trend}(t)$). Other factors, such as risk aversion, may also play an important role. Ide and Sornette (2002) propose to describe the reversal to the estimated fundamental value by the contribution

$$\Omega_{\rm fund}(t) = -c\,[\ln p(t) - \ln p_f]\,|\ln p(t) - \ln p_f|^{n-1} \tag{43}$$
to the order size, where $p_f$ is the estimated fundamental value and $n > 0$ is an exponent quantifying the nonlinear nature of the reversion to $p_f$. The strength of the reversion is measured by the coefficient $c > 0$; the minus sign reflects that the net order is negative (resp. positive) if the price is above (resp. below) $p_f$. The nonlinear power law $[\ln p(t) - \ln p_f]\,|\ln p(t) - \ln p_f|^{n-1}$ of order $n$ is chosen as the simplest function capturing the following effect. In principle, the fundamental value $p_f$ is determined by the discounted expected future dividends and thus depends on forecasts of their growth rate and of the riskless interest rate, both of which are very difficult to predict. The fundamental value is therefore extremely difficult to quantify with high precision and is often estimated within relatively large bounds: all methods of determining intrinsic value rely on assumptions that can turn out to be far off the mark. For instance, several academic studies have disputed the premise that a portfolio of sound, cheaply bought stocks will, over time, outperform a portfolio selected by any other method (see for instance Lamont, 1988). As a consequence, a trader trying to track the fundamental value has no incentive to react when she feels that the deviation is small, since this deviation is more or less within the noise. Only when the departure of the price from the fundamental value becomes relatively large will the trader act. The relationship (43) with an exponent $n > 1$ precisely accounts for this effect: when $n$ is significantly larger than 1, $|x|^n$ remains small for $|x| < 1$ and shoots up rapidly only when $|x|$ becomes larger than 1, mimicking a smoothed threshold behavior. The nonlinear dependence of $\Omega_{\rm fund}(t)$ on $\ln[p(t)/p_f] = \ln p(t) - \ln p_f$ shown in (43) is the first novel element of our model. Usually, modellers reduce this term to the linear case $n = 1$ while, as we shall show, generalizing to larger values $n > 1$ will be a crucial feature of the price dynamics.
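The smoothed threshold produced by $n > 1$ in (43) is easy to see numerically. The Python sketch below evaluates the reversal order size as a function of the log-deviation $x = \ln[p(t)/p_f]$; the function name and the values $c = 1$ and $n = 1$ or $3$ are illustrative assumptions, not estimates from data.

```python
def reversal_order(x, c=1.0, n=3.0):
    """Reversal-to-fundamental order size of Eq. (43): -c * x * |x|**(n - 1).

    x is the log-deviation ln(p/pf); assumes c > 0 and n > 0 (illustrative defaults).
    """
    if x == 0.0:
        return 0.0  # avoids evaluating 0.0 ** (n - 1) when n < 1
    return -c * x * abs(x) ** (n - 1.0)


if __name__ == "__main__":
    # Compare the linear rule (n = 1) with a threshold-like rule (n = 3).
    for x in (0.05, 0.2, 0.5, 1.0, 1.5):
        print(f"x={x:4.2f}  n=1: {reversal_order(x, n=1.0):+.4f}  n=3: {reversal_order(x, n=3.0):+.4f}")
```

For a log-deviation of 0.05 the cubic rule asks for an order 400 times smaller than the linear rule ($0.05^3$ versus $0.05$), while beyond $|x| = 1$ it asks for more: traders "react only to large deviations".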
In economic language, the exponent $n = d\ln\Omega_{\rm fund}/d\ln(\ln[p(t)/p_f])$ is called the "elasticity" or "sensitivity" of the order size $\Omega_{\rm fund}$ with respect to the (normalized) log-price $\ln[p(t)/p_f]$. A related sensitivity, that of money demand to the interest rate, has recently been documented to be larger than 1, similarly to the Ide-Sornette (2002) proposal of taking $n > 1$ in (43). Using a survey of roughly 2700 households, Mulligan and Sala-i-Martin (2000) estimated the interest elasticity of money demand (the sensitivity, or log-derivative, of money demand with respect to the interest rate) to be very small at low interest rates. This is because few people decide to invest in interest-producing assets when rates are low, due to "shopping" costs. In contrast, for high interest rates or for those who own a significant bank account, the interest elasticity of money demand is significant. This is a clear-cut example of threshold-like behavior characterized by a strong nonlinear response. It can be captured by $e \equiv d\ln M/d\ln r = (r/r_{\rm infl})^n$ with $n > 1$, such that the elasticity $e$ of money demand $M$ is negligible when the interest rate $r$ is not significantly larger than the inflation rate $r_{\rm infl}$ and becomes large otherwise. Trend following (in various elaborate forms) was (and probably still is) one of the major strategies used by technical analysts (see Andersen et al. (2000) for a review and references therein).
More generally, it results naturally when investment strategies are positively related to past price moves. Trend following can be captured by the following expression of the order size:

$$\Omega_{\rm trend}(t) = a_1[\ln p(t) - \ln p(t-1)] + a_2[\ln p(t) - \ln p(t-1)]\,|\ln p(t) - \ln p(t-1)|^{m-1}. \tag{44}$$

This expression corresponds to driving the price up if the preceding move was up ($a_1 > 0$ and $a_2 > 0$). The linear case ($a_1 > 0$, $a_2 = 0$) is usually chosen by modellers. Here, we generalize this model by adding the contribution proportional to $a_2 > 0$, from considerations similar to those leading to the nonlinear expression (43) for the reversal term with an exponent $n > 1$. We argue that the order size at time $t$ resulting from trend-following strategies is a nonlinear function, with exponent $m > 1$, of the price change at previous time steps. Indeed, a small price change from time $t - 1$ to time $t$ may not be perceived as a significant and strong market signal. Since many investment strategies are nonlinear, it is natural to consider an average trend-following order size which increases in an accelerated manner as the price change increases in amplitude. Usually, trend followers increase the size of their orders faster than just proportionally to the last trend. This is reminiscent of the argument (Andersen et al., 2000) that traders' psychology is sensitive to a change of trend (acceleration or deceleration) and not simply to the trend (velocity). The fact that trend-following strategies have an impact on price proportional to the price change over the previous period raised to the power $m > 1$ means that trend-following strategies are not linear when averaged over all of them: they tend to under-react for small price changes and over-react for large ones. The second term of the right-hand side of (44), with coefficient $a_2$, captures this phenomenology.

6.2.2. Nonlinear dynamical equation for stock market prices

Introducing the notation

$$x(t) = \ln[p(t)/p_f] \tag{45}$$

and the time scale $\delta t$ corresponding to one time step, and putting the contributions (43) and (44) into (42) with $\Omega(t) = \Omega_{\rm fund}(t) + \Omega_{\rm trend}(t)$, we get

$$x(t + \delta t) - x(t) = \frac{1}{L}\Big(a_1[x(t) - x(t - \delta t)] + a_2[x(t) - x(t - \delta t)]\,|x(t) - x(t - \delta t)|^{m-1} - c\,x(t)\,|x(t)|^{n-1}\Big). \tag{46}$$
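Eq. (46) is a second-order recursion: $x(t + \delta t)$ depends on both $x(t)$ and $x(t - \delta t)$, which is the discrete origin of the inertia discussed below. A minimal Python sketch, with time counted in units of $\delta t$; the function names and parameter values are illustrative assumptions. With $a_1 = L$ and $a_2 = c = 0$, the log-price simply repeats its previous increment (pure momentum), while adding a linear reversal term ($n = 1$, $c > 0$) makes it oscillate around the fundamental value $x = 0$.

```python
def next_log_price(x_now, x_prev, a1, a2, c, L, m, n):
    """One step of Eq. (46): return x(t + dt) from x(t) and x(t - dt).

    Assumes m >= 1 and n >= 1 so that the powers are defined at zero.
    """
    dx = x_now - x_prev                                   # last log-price change
    trend = a1 * dx + a2 * dx * abs(dx) ** (m - 1.0)      # trend following, Eq. (44)
    fund = -c * x_now * abs(x_now) ** (n - 1.0)           # reversal, Eq. (43)
    return x_now + (trend + fund) / L                     # price impact, Eq. (42)


def simulate(x0, x1, steps, **params):
    """Iterate Eq. (46) from the two initial values x0 = x(0) and x1 = x(dt)."""
    xs = [x0, x1]
    for _ in range(steps):
        xs.append(next_log_price(xs[-1], xs[-2], **params))
    return xs


if __name__ == "__main__":
    # Pure momentum: a1 = L and no other term, so increments stay equal.
    print(simulate(0.0, 0.01, 5, a1=1.0, a2=0.0, c=0.0, L=1.0, m=2.0, n=1.0))
    # Linear reversal only: the log-price oscillates around x = 0.
    print(simulate(1.0, 1.0, 8, a1=1.0, a2=0.0, c=0.5, L=1.0, m=2.0, n=1.0))
```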
Expanding (46) as a Taylor series in powers of $\delta t$, we get

$$(\delta t)^2\,\frac{d^2x}{dt^2} = -\left(1 - \frac{a_1}{L}\right)\delta t\,\frac{dx}{dt} + \frac{a_2(\delta t)^m}{L}\,\frac{dx}{dt}\left|\frac{dx}{dt}\right|^{m-1} - \frac{c}{L}\,x(t)\,|x(t)|^{n-1} + O[(\delta t)^3], \tag{47}$$

where $O[(\delta t)^3]$ represents a term of order $(\delta t)^3$. Note the existence of the second-order derivative, which results from the fact that the price variation from the present to tomorrow is based on an analysis of the price change between yesterday and the present. Hence the existence of the three time lags
leading to inertia. A special case of expression (46), with a linear trend-following term ($a_2 = 0$) and a linear reversal term ($n = 1$), has been studied in Bouchaud and Cont (1998) and Farmer (1998), with the addition of a risk-aversion term and a noise term to account for all the other effects not captured by the two terms (43) and (44). We shall neglect risk aversion as well as any other term and focus only on the reversal and trend-following terms discussed above to explore the resulting price behaviors. Grassia (2000) has also studied a similar linear second-order differential equation derived from market delay and positive feedback, including a mechanism for quenching runaway markets. Expression (46) is inspired by the continuous mean-field limit of the model of Pandey and Stauffer (2000), defined by starting from the percolation model of market price dynamics (Cont and Bouchaud, 2000; Chowdhury and Stauffer, 1999; Stauffer and Sornette, 1999) and developed to account for the dynamics of the Nikkei and Russian market recessions (Johansen and Sornette, 1999c, 2001b). The generalization assumes that trend following and reversal to fundamental values are two forces that influence the probability that a trader buys or sells the market. In addition, Pandey and Stauffer (2000) consider, as we do here, that the probability to enter the market is a nonlinear function, with exponent $n > 1$, of the deviation between market price and fundamental price. However, they do not consider the possibility that $m > 1$ and stick to the linear trend-following case. We shall see that the analytical control offered by our continuous formulation allows us to get a clear understanding of the different dynamical phases.

Among the four terms of Eq. (47), the first term of its right-hand side is the least interesting. For $a_1 < L$, it corresponds to a damping term which becomes negligible compared with the second term in the terminal phase of the growth close to the singularity, when $|dx/dt|$ becomes very large. For $a_1 > L$, it corresponds to a negative viscosity, but the instability it provides is again subdominant for $m > 1$. The main ingredients here are the interplay between the inertia provided by the second derivative on the left-hand side, the destabilizing nonlinear trend-following term with coefficient $a_2 > 0$, and the nonlinear reversal term. To simplify the notation and the analysis of the different regimes, we shall neglect the first term of the right-hand side of (47), which amounts to taking the special value $a_1 = L$. In a field-theoretical sense, our theory is tuned right at the "critical point", with a vanishing "mass" term. Eq. (47) can be viewed in two ways. It can be seen as a convenient shorthand notation for the intrinsically discrete equation (46), keeping the time step $\delta t$ small but finite. In this interpretation, we pose

$$\alpha = \frac{a_2(\delta t)^{m-2}}{L}, \tag{48}$$

$$\gamma = \frac{c}{L(\delta t)^2}, \tag{49}$$

which depend explicitly on $\delta t$, to get

$$\frac{d^2x}{dt^2} = \alpha\,\frac{dx}{dt}\left|\frac{dx}{dt}\right|^{m-1} - \gamma\,x(t)\,|x(t)|^{n-1}. \tag{50}$$
A second interpretation is to genuinely take the continuous limit $\delta t \to 0$ with the constraints $a_2/L \sim (\delta t)^{2-m}$ and $c/L \sim (\delta t)^2$. This allows us to define the now $\delta t$-independent coefficients $\alpha$ and $\gamma$ according to (48) and (49) and to obtain the truly continuous equation (50). In terms of $y_1 = x$ and $y_2 = dx/dt$, this equation can also be written as

$$\frac{dy_1}{dt} = y_2, \tag{51}$$

$$\frac{dy_2}{dt} = \alpha\,y_2|y_2|^{m-1} - \gamma\,y_1|y_1|^{n-1}. \tag{52}$$

[Figure 14: reduced price $y_1$ versus time, $m = 2.5$, $n = 3$, $y(0) = 0.02$; curves for $\gamma = 10$ and $\gamma = 1000$.]

Fig. 14. "Reduced price" as a function of time for a trend-following exponent $m = 2.5$ with $n = 3$, $\alpha = 1$, and with two amplitudes $\gamma = 10$ and $\gamma = 1000$ of the fundamental reversal term. Reproduced from Ide and Sornette (2002).
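The system $dy_1/dt = y_2$, $dy_2/dt = \alpha y_2|y_2|^{m-1} - \gamma y_1|y_1|^{n-1}$ of (51) and (52) is easy to integrate numerically. The Python sketch below uses a fourth-order Runge-Kutta step whose size shrinks with the local time scales of the trend and reversal terms, and stops when $|y_2|$ exceeds a large cut-off, taken as a proxy for the singularity at $t_c$. The cut-off, the step rule and all parameter values are illustrative assumptions, not the procedure of Ide and Sornette (2002). With $\gamma = 0$ the velocity obeys $dy_2/dt = \alpha y_2^m$, whose exact blow-up time $t_c = y_2(0)^{1-m}/[(m-1)\alpha]$ provides a check.

```python
import math


def rhs(y1, y2, alpha, gamma, m, n):
    """Right-hand sides of Eqs. (51) and (52)."""
    return y2, alpha * y2 * abs(y2) ** (m - 1.0) - gamma * y1 * abs(y1) ** (n - 1.0)


def integrate(y1, y2, alpha, gamma, m, n, y_max=1e4, h0=1e-2, t_max=1e3):
    """Adaptive RK4 for (51)-(52); returns a list of (t, y1, y2) samples.

    Stops when |y2| >= y_max (proxy for the finite-time singularity) or t >= t_max.
    Assumes m > 1 and n > 1.
    """
    t = 0.0
    path = [(t, y1, y2)]
    while abs(y2) < y_max and t < t_max:
        # Inverse time scales of the trend term and of the reversal term.
        rate = alpha * abs(y2) ** (m - 1.0) + math.sqrt(gamma * abs(y1) ** (n - 1.0)) + 1e-12
        h = min(h0, 0.02 / rate)
        k1 = rhs(y1, y2, alpha, gamma, m, n)
        k2 = rhs(y1 + 0.5 * h * k1[0], y2 + 0.5 * h * k1[1], alpha, gamma, m, n)
        k3 = rhs(y1 + 0.5 * h * k2[0], y2 + 0.5 * h * k2[1], alpha, gamma, m, n)
        k4 = rhs(y1 + h * k3[0], y2 + h * k3[1], alpha, gamma, m, n)
        y1 += h * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        y2 += h * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
        t += h
        path.append((t, y1, y2))
    return path


if __name__ == "__main__":
    # Check against the exact blow-up time when the reversal term is switched off.
    path = integrate(0.0, 0.5, alpha=1.0, gamma=0.0, m=2.5, n=3.0)
    print(path[-1][0], 0.5 ** (1 - 2.5) / ((2.5 - 1) * 1.0))
    # With reversal (m = 2.5, n = 3 as in Fig. 14): count sign changes of y1.
    path = integrate(1.0, 0.0, alpha=1.0, gamma=10.0, m=2.5, n=3.0)
    crossings = sum(1 for (_, u, _), (_, v, _) in zip(path, path[1:]) if u * v < 0)
    print(path[-1][0], crossings)
```

Counting sign changes of $y_1$ along the returned path is one simple way to look for the transient oscillations that precede the final accelerating approach to $t_c$.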
This system leads to a finite-time singularity with accelerating oscillations for $m > 1$ and $n > 1$. The richness of behaviors results from the competition between these two terms.

6.2.3. Dynamical properties

The origin ($y_1 = 0$, $y_2 = 0$) plays a special role as the unstable (for $m > 1$) fixed point around which spiral structures of trajectories are organized in the phase space $(y_1, y_2)$. This role is particularly interesting because $y_1 = 0$ means that the observed price is equal to the fundamental price. If, in addition, $y_2 = 0$, there is no trend, i.e., the market "does not know" which direction to take. The fact that this is the point of instability around which the price trajectories organize themselves provides a fundamental understanding of the cause of the complexity of market price time series, based on the instability of the fundamental price "equilibrium". Fig. 14 shows the reduced price for the trend-following exponent $m = 2.5$. In this case, the reduced price goes to a constant at $t_c$ with an infinite slope (the singularity is thus on its derivative, or "velocity"). We also observe accelerating oscillations, reminiscent of log-periodicity. The novel feature is that the oscillations are only transient, giving way to a pure accelerating trend in the final approach to the critical time $t_c$. Fig. 15 shows that the oscillations with varying frequency and amplitude seen in Fig. 14 are nothing but the projection on one axis of a spiraling structure in the plane. Actually, Fig. 15 shows more than that: in the plane of the reduced price $y_1$ and its "velocity" $y_2$, it shows two special trajectories that connect exactly the origin $y_1 = 0$, $y_2 = 0$ to infinity. From general theorems of dynamical systems, one can then show that any trajectory starting close to the origin will never be able to cross either of these two orbits. As a consequence, any real trajectory will be guided within the spiraling channel, winding around the central point 0 many times before exiting towards the finite-time singularities. The approximately log-periodic oscillations result from the oscillatory structure of the fundamental reversal term associated with the acceleration driven by the trend-following term. The conjunction of the two leads to the beautiful spiral, governing a hierarchical organization of the spiraling trajectories around the origin in price-velocity space. See Ide and Sornette (2002) for a detailed mathematical study of this system.

[Figure 15: spiral of trajectories in the $(y_1, y_2)$ plane around the origin.]

Fig. 15. Geometrical spiral showing two special trajectories (the continuous and dashed lines) in the "reduced price"-"velocity" plane $(y_1, y_2)$ that connect exactly the origin $y_1 = 0$, $y_2 = 0$ to infinity. This spiraling structure, which exhibits scaling or fractal properties, is at the origin of the accelerating oscillations decorating the power law behavior close to the finite-time singularity. Reproduced from Ide and Sornette (2002).

In sum, the simple two-dimensional dynamical system (51) and (52) embodies two nonlinear terms, exerting respectively positive feedback and reversal, which compete to create a singularity in finite time decorated by accelerating oscillations. The power law singularity results from the increasing growth rate. The oscillations result from the restoring mechanism. Depending on the order of the nonlinearity of the growth rate and of the restoring term, a rich variety of behaviors is observed. The dynamical behavior is traced back fundamentally to the self-similar spiral structure of trajectories in phase space, unfolding around an unstable spiral point at the origin. The interplay between the restoring mechanism and the nonlinear growth rate leads to approximately log-periodic oscillations with remarkable scaling properties.

7. Autopsy of major crashes: universal exponents and log-periodicity

7.1. The crash of October 1987

As discussed in Section 2, the crash of October 1987 and its Black Monday on October 19 remains one of the most striking drops ever seen on stock markets, both by its overwhelming amplitude and by its encompassing sweep over most markets worldwide.
It was preceded by a remarkably strong "bull" regime, epitomized by the following quote from the Wall Street Journal of August 26, 1987, the day after the 1987 market peak: "In a market like this, every story is a positive one. Any news is good news. It's pretty much taken for granted now that the market is going to go up." Investors were thus largely unaware of the forthcoming risks (Grant, 1990).

[Figure 16: S&P500 index versus time (year), from 85.5 to 87.5.]

Fig. 16. Evolution as a function of time of the New York stock exchange index S&P500 from July 1985 to the end of October 1987 (557 trading days). The + represent a constant return increase of ≈ 30%/year and give $\mathrm{var}(F_{\rm exp}) \approx 113$ (see text for definition). The best fit to the power law (53) gives $A_1 \approx 327$, $B_1 \approx -79$, $t_c \approx 87.65$, $m_1 \approx 0.7$ and $\mathrm{var}_{\rm pow} \approx 107$. The best fit to expression (54) gives $A_2 \approx 412$, $B_2 \approx -165$, $t_c \approx 87.74$, $C \approx 12$, $\omega \approx 7.4$, $T = 2.0$, $m_2 \approx 0.33$ and $\mathrm{var}_{\rm lp} \approx 36$. One can observe four well-defined oscillations fitted by expression (54), before finite-size effects limit the theoretical divergence of the acceleration, at which point the bubble ends in the crash. All the fits are carried out over the whole time interval shown, up to 87.6. The fit with Eq. (54) turns out to be very robust with respect to this upper bound, which can be varied significantly. Reproduced from Sornette et al. (1996).

7.1.1. Precursory pattern

Time is often converted into decimal year units: for non-leap years, 365 days = 1.00 year, which leads to 1 day = 0.00274 years. Thus 0.01 year = 3.65 days and 0.1 year = 36.5 days, or about 5 weeks. For example, October 19, 1987 corresponds to 87.800. Fig. 16 shows the evolution of the New York stock exchange index S&P500 from July 1985 to the end of October 1987, after the crash. The plusses (+) represent the best fit to an exponential growth, obtained by assuming that the market has an average return of about 30% per year. This first representation does not describe the apparent overall acceleration before the crash, which sets in more than a year in advance. This acceleration (cusp-like shape) is better represented by power law functions, which Sections 5 and 6 showed to be signatures of critical behavior of the market. The monotonic line corresponds to the following power law parameterization:

$$F_{\rm pow}(t) = A_1 + B_1(t_c - t)^{m_1}, \tag{53}$$

where $t_c$ denotes the time at which the power-law fit of the S&P500 presents a (theoretically) diverging slope, announcing an imminent crash. To qualify and compare the fits, the variances, denoted var and equal to the mean of the squares of the errors between theory and data, or their square roots, called the root-mean-square (r.m.s.) errors, are calculated. The ratio of two variances corresponding to two different hypotheses is taken as a qualifying statistic. The ratio of the variance of the constant rate hypothesis to that of the power law is $\mathrm{var}_{\rm exp}/\mathrm{var}_{\rm pow} \approx 1.1$, indicating only a slightly better
performance of the power law in capturing the acceleration, the number of free variables being the same and equal to 2. However, already to the naked eye, the most striking feature in this acceleration is the presence of systematic oscillatory-like deviations. Inspired by the insight given in Section 5 and especially Section 6, the oscillatory continuous line is obtained by fitting the data with the following mathematical expression:

$$F_{\rm lp}(t) = A_2 + B_2(t_c - t)^{m_2}\left[1 + C\cos\left(\omega \log\frac{t_c - t}{T}\right)\right]. \tag{54}$$

This equation is the simplest example of a log-periodic correction to a pure power law for an observable exhibiting a singularity at the time $t_c$ at which the crash has the highest probability to occur. The log-periodicity here stems from the cosine function of the logarithm of the distance $t_c - t$ to the critical time $t_c$. Due to log-periodicity, the evolution of the financial index becomes (discretely) scale invariant close to the critical point. The log-periodic correction to scaling implies the existence of a hierarchy of characteristic time intervals $t_c - t_n$, given by the expression

$$t_n = t_c - (t_c - t_0)\,\lambda^{-n}, \tag{55}$$

with a preferred scaling ratio denoted $\lambda$. For the October 1987 crash, we find $\lambda \approx 1.5$ to $1.7$ (this value is remarkably universal and is found to be approximately the same for other crashes, as we shall see). We expect a cut-off at short time scales (i.e., above $n \sim$ a few units) and also at large time scales, due to the existence of finite-size effects. These time scales $t_c - t_n$ are not universal but depend upon the specific market. What is expected to be universal are the ratios $(t_c - t_n)/(t_c - t_{n+1}) = \lambda$, which follow directly from (55). For details on the fitting procedure, we refer to Sornette et al. (1996).

It is possible to generalize the simple log-periodic power law formula used in Fig. 16 by using a mathematical tool called bifurcation theory to obtain its generic nonlinear correction, which allows one to account quantitatively for the behavior of the Dow Jones and S&P500 indices up to 8 years prior to the October 1987 crash. The result of this theory, presented in Sornette and Johansen (1997), is used to generate the fit shown in Fig. 17. One sees clearly that the new formula accounts remarkably well for almost eight years of market price behavior, compared with only a little more than two years for the simple log-periodic formula shown in Fig. 16. The nonlinear theory developed in Sornette and Johansen (1997) leads to "log-frequency modulation", an effect first noticed empirically in Feigenbaum and Freund (1996). The remarkable quality of the fits shown in Figs. 16 and 17 has been assessed in Johansen and Sornette (1999b). In a recent reanalysis, Feigenbaum (2001) examined the data in a new way by taking the first differences of the logarithm of the S&P500 from 1980 to 1987. The rationale for taking the price variation rather than the price itself is that the fluctuations, noises or deviations are expected to be more random, and thus more innocuous, than for the price, which is a cumulative quantity.
By rigorous hypothesis testing, Feigenbaum found that the log-periodic component cannot be rejected at the 95% confidence level: in plain words, this means that the probability that the log-periodic component results from chance is about or less than 0.05.

[Figure 17: logarithm of the S&P500 versus year, 1980 to 1988.]

Fig. 17. Time dependence of the logarithm of the New York stock exchange index S&P500 from January 1980 to September 1987 and best fit by the improved nonlinear log-periodic formula developed in Sornette and Johansen (1997) (thin line). The exponent and log-periodic angular frequency are $m_2 = 0.33$ and $\omega_{1987} = 7.4$. The crash of October 19, 1987 corresponds to 1987.78 decimal years. The thick line is the fit by (54) on the subinterval from July 1985 to the end of 1987 and is represented on the full time interval starting in 1980. The comparison with the thin line allows one to visualize the frequency shift described by the nonlinear theory. Reproduced from Sornette and Johansen (1997).

7.1.2. Aftershock patterns

If the concept of a crash as a kind of critical point has any value, we should be able to identify post-crash signatures of the underlying cooperativity. In fact, we should expect an at least qualitative symmetry between patterns before and after the crash. In other words, we should be able to document the existence of a critical exponent as well as log-periodic oscillations on relevant quantities after the crash. Such a signature can indeed be seen in the volatility of the S&P500 index (a measure of the market risk perceived by investors), implied from the price of S&P500 options. Fig. 18 presents the time evolution of the implied volatility of the S&P500, taken from Chen et al. (1995). The perceived market risk is small prior to the crash, jumps up abruptly at the time of the crash and then decays slowly over several months. This decay of perceived risk to "normal times" is compatible with a slow power law decay decorated by log-periodic oscillations, which can be fitted by expression (54) with $t_c - t$ (before the crash) replaced by $t - t_c$ (after the crash). Our analysis with (54), with $t_c - t$ replaced by $t - t_c$, again gives an estimate of the critical time $t_c$, which is found correctly within a few days. Note the long time scale, of the order of a year, involved in the relaxation of the volatility after the crash to a level comparable to the one before the crash. This implies the existence of a "memory effect": market participants remain nervous for quite a long time after the crash, having been burned by the dramatic event. It is also noteworthy that the S&P500 index, as well as other markets worldwide, remained close to the after-crash level for a long time. For instance, by February 29, 1988, the world index stood at 72.7 (reference 100 on September 30, 1987). Thus, the price level established in the October crash seems to have been a virtually unbiased estimate of the average price level over the subsequent months (see also Fig. 19). This supports the idea of a critical point, according to which the event is an intrinsic signature of a self-organization of the markets worldwide.
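Expression (54), $F_{\rm lp} = A_2 + B_2\tau^{m_2}[1 + C\cos(\omega\log(\tau/T))]$ with $\tau = t_c - t$, is nonlinear only in $t_c$, $m_2$, $\omega$ and $T$. For fixed values of these four parameters, $A_2$, $B_2$ and the product $B_2 C$ enter linearly, so they can be obtained by ordinary least squares; an outer search over the nonlinear parameters (not shown) would then keep the combination with the smallest variance. This is a common way to organize such fits, sketched here in Python under our own assumptions rather than as the procedure of Sornette et al. (1996); the function names and the synthetic parameter values are hypothetical. The `after` flag replaces $t_c - t$ by $t - t_c$, as for the post-crash volatility, and the returned variance is the mean squared error used above to compare hypotheses.

```python
import math


def basis(t, tc, m, omega, T, after=False):
    """Regressors of Eq. (54): 1, tau**m and tau**m * cos(omega * log(tau / T))."""
    tau = (t - tc) if after else (tc - t)   # must be > 0
    f = tau ** m
    return (1.0, f, f * math.cos(omega * math.log(tau / T)))


def model(t, a2, b2, c, tc, m, omega, T, after=False):
    """Eq. (54): A2 + B2 * tau**m * [1 + C * cos(omega * log(tau / T))]."""
    _, f, g = basis(t, tc, m, omega, T, after)
    return a2 + b2 * f + b2 * c * g


def det3(a):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def fit_linear(ts, ys, tc, m, omega, T, after=False):
    """Least-squares A2, B2, C of Eq. (54) for fixed (tc, m, omega, T).

    Solves the 3x3 normal equations for (A2, B2, B2*C) by Cramer's rule and
    returns (A2, B2, C, var), var being the mean squared residual.
    """
    rows = [basis(t, tc, m, omega, T, after) for t in ts]
    M = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    v = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    d = det3(M)
    sol = []
    for k in range(3):
        Mk = [row[:] for row in M]
        for i in range(3):
            Mk[i][k] = v[i]          # replace column k by the right-hand side
        sol.append(det3(Mk) / d)
    a2, b2, b2c = sol
    resid = [y - (a2 + b2 * r[1] + b2c * r[2]) for r, y in zip(rows, ys)]
    return a2, b2, b2c / b2, sum(e * e for e in resid) / len(resid)


if __name__ == "__main__":
    # Synthetic pre-crash series (hypothetical values, time in decimal years).
    tc, m, omega, T = 87.74, 0.33, 7.4, 2.0
    ts = [85.5 + 0.01 * k for k in range(211)]
    ys = [model(t, 412.0, -165.0, 0.1, tc, m, omega, T) for t in ts]
    print(fit_linear(ts, ys, tc, m, omega, T))
```

On noiseless synthetic data the linear step recovers the generating amplitudes exactly (up to rounding); with real prices, the quality of the fit is then judged by comparing variances across hypotheses, as done above.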
D. Sornette / Physics Reports 378 (2003) 1 – 98
[Fig. 18 plot: implied volatility of the S&P 500 versus time (year), 1987.6–1988.8]
Fig. 18. Time evolution of the implied volatility of the S&P500 index (in logarithmic scale) after the October 1987 crash, taken from Chen et al. (1995). The + symbols represent an exponential decrease with var(F_exp) ≈ 15. The best fit to a power law, represented by the monotonic line, gives A1 ≈ 3.9, B1 ≈ 0.6, tc = 87.75, m1 ≈ −1.5 and var_pow ≈ 12. The best fit to expression (54) with tc − t replaced by t − tc gives A2 ≈ 3.4, B2 ≈ 0.9, tc ≈ 87.77, C ≈ 0.3, ω ≈ 11, m2 ≈ −1.2 and var_lp ≈ 7. One can observe six well-defined oscillations fitted by (54). Reproduced from Sornette et al. (1996).
[Fig. 19 plot: S&P 500 index versus time (year), 1987.80–1987.90]
Fig. 19. Time evolution of the S&P500 index over a time window of a few weeks after the October 19, 1987 crash. The fit with an exponentially decaying sinusoidal function, shown as a dashed line, suggests that a good model for the short-time response of the U.S. market is a single dissipative harmonic oscillator, or damped pendulum. Reproduced from Sornette et al. (1996).
There is another striking signature of the cooperative behavior of the U.S. market, found by analyzing the time evolution of the S&P500 index over a time window of a few weeks after the October 19, 1987 crash. The fit with an exponentially decaying sinusoidal function shown in Fig. 19 suggests that, for a few weeks after the crash, the U.S. market behaved as a single dissipative harmonic oscillator. Its characteristic decay time was about one week, equal to the period of the oscillations. In other words, the price followed the trajectory of a pendulum swinging back and forth with damped oscillations around an equilibrium position.
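Here is a minimal sketch of the damped-pendulum description. It assumes the fitted curve of Fig. 19 has the form $p(t) = p_\infty + a\,e^{-(t-t_0)/\tau}\cos(2\pi(t-t_0)/T + \phi)$. The text only fixes the relation τ ≈ T ≈ one week. The equilibrium level, amplitude and time unit below (trading days, five per week) are hypothetical placeholders, not fitted values. With τ = T, the deviation from equilibrium shrinks by a factor e ≈ 2.72 over each full oscillation, which is why the response looks like a single, strongly damped mode.

```python
import math


def damped_oscillation(t, p_eq, amplitude, decay_time, period, phase=0.0, t0=0.0):
    """Exponentially decaying sinusoid around the equilibrium level p_eq."""
    dt = t - t0
    envelope = amplitude * math.exp(-dt / decay_time)
    return p_eq + envelope * math.cos(2.0 * math.pi * dt / period + phase)


# Hypothetical values: time in trading days, equilibrium at 240 index points,
# initial deviation of 30 points, decay time equal to the period (5 days ~ 1 week).
WEEK = 5.0
P_EQ = 240.0
values = [damped_oscillation(day, P_EQ, 30.0, WEEK, WEEK) for day in range(16)]

# With phase = 0 the cosine equals 1 at t = 0, T, 2T, ..., so the ratio of
# successive peak deviations is exp(T / tau) = e when tau = T.
ratio = (values[0] - P_EQ) / (values[5] - P_EQ)
print(round(ratio, 3))  # 2.718
```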
[Fig. 20 plot: Dow Jones index versus date, 1927–1930]
Fig. 20. The Dow Jones index prior to the October 1929 crash on Wall Street. The fit, shown as a continuous line, is Eq. (54) with A2 ≈ 571, B2 ≈ −267, B2C ≈ 14.3, m2 ≈ 0.45, tc ≈ 1930.22, ω ≈ 7.9 and φ ≈ 1.0. Reproduced from Johansen and Sornette (1999a).
This signature strengthens the view of the market as a cooperative self-organizing system. The basic story suggested by these figures is the following. Before the crash, imitation and speculation were rampant and led to a progressive “aggregation” of the multitude of agents into a large effective “super-agent”, as illustrated in Figs. 16 and 17. Right after the crash, the market behaved as a single “super-agent”, rapidly finding the equilibrium price through a return to “equilibrium”, as shown in Fig. 19. On longer time scales, the “super-agent” progressively fragmented and the diversity of behaviors was rejuvenated, as seen in Fig. 18.

7.2. The crash of October 1929

The crash of October 1929 is the other major historical market event of the twentieth century. Notwithstanding the differences in technology and the absence of computers and other modern means of information transfer, the October 1929 crash exhibits many similarities with the October 1987 crash. As Figs. 20 and 21 show, these similarities are so strong that one may wonder about their origin. What has not changed over the history of mankind is the interplay between humans’ craving for exchanges and profits and their fear of uncertainty and losses. The similarity between the 1929 and 1987 situations was in fact noticed at a qualitative level in an article in the Wall Street Journal on October 19, 1987, the very morning of the day of the stock market crash (with a plot of stock prices in the 1920s and the 1980s); see the discussion in Shiller (1989). The similarity between the two crashes can be made quantitative. The fit of the Dow Jones index with formula (54) from June 1927 until the maximum before the crash in October 1929, shown in Fig. 20, can be compared to the corresponding fit for the October 1987 crash shown in Fig. 16.
Notice the similar widths of the two time windows and the similar acceleration and oscillatory structures. These are quantified by similar exponents m2 and log-periodic angular frequencies ω: $m_2^{1987} = 0.33$ compared to $m_2^{1929} = 0.45$, and $\omega^{1987} = 7.4$ compared to $\omega^{1929} = 7.9$. These numerical values are remarkably close and can be considered equal within their uncertainties.
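The fits quoted in the captions involve two groups of parameters. Three enter Eq. (54) linearly: A2, B2 and the product B2C. Four enter nonlinearly: tc, m2, ω and φ. One common way to organize such a fit, sketched below, is to solve for the linear parameters by ordinary least squares at each trial value of the nonlinear ones, and then to search over the latter.

The sketch assumes that the fitted quantity has the form $p(t) = A_2 + B_2\tau^{m_2} + B_2C\,\tau^{m_2}\cos(\omega\ln\tau + \phi)$ with τ = tc − t. The fitted quantity is the index itself in Fig. 20 and its logarithm in some other figures. The sketch shows only the inner linear step and uses synthetic data generated from the Fig. 20 parameter values. It is not necessarily the exact procedure of the original papers, and the function name `fit_linear_params` is hypothetical.

```python
import numpy as np


def fit_linear_params(t, y, tc, m, omega, phi):
    """Least-squares estimates of (A2, B2, B2*C) for fixed (tc, m, omega, phi).

    Returns the coefficients and the r.m.s. residual of the fit.
    """
    tau = tc - t
    f = tau ** m                                  # pure power-law term
    g = f * np.cos(omega * np.log(tau) + phi)     # log-periodic correction
    design = np.column_stack([np.ones_like(t), f, g])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    rms = np.sqrt(np.mean((design @ coeffs - y) ** 2))
    return coeffs, rms


# Synthetic "bubble" built from the Fig. 20 values (A2 ~ 571, B2 ~ -267,
# B2C ~ 14.3, m2 ~ 0.45, tc ~ 1930.22, omega ~ 7.9, phi ~ 1.0) plus Gaussian
# noise of one index point; it is not the historical Dow Jones series.
rng = np.random.default_rng(0)
t = np.linspace(1927.5, 1929.7, 500)
tc, m, omega, phi = 1930.22, 0.45, 7.9, 1.0
true_coeffs = np.array([571.0, -267.0, 14.3])
tau = tc - t
clean = (true_coeffs[0] + true_coeffs[1] * tau**m
         + true_coeffs[2] * tau**m * np.cos(omega * np.log(tau) + phi))
data = clean + rng.normal(scale=1.0, size=t.size)

coeffs, rms = fit_linear_params(t, data, tc, m, omega, phi)
print(np.round(coeffs, 1), round(rms, 2))
```

In a full fit, an outer loop (a grid or a local optimizer) would minimize `rms` over (tc, m2, ω, φ). Eliminating the linear parameters this way reduces the nonlinear search from seven dimensions to four.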
Fig. 21. Time dependence of the logarithm of the Dow Jones stock exchange index from June 1921 to September 1929, and best fit by the improved nonlinear log-periodic formula developed in Sornette and Johansen (1997). The crash of October 23, 1929 corresponds to 1929.81 in decimal years. The parameters of the fit are: r.m.s. = 0.041, tc = 1929.84 year, m2 = 0.63, ω = 5.0, Δω = −70, Δt = 14 years, A2 = 61, B2 = −0.56, C = 0.08. Δω and Δt are two new parameters introduced in Sornette and Johansen (1997). Reproduced from Sornette and Johansen (1997).
Fig. 21 for the October 1929 crash is the analog of Fig. 17 for the October 1987 crash. It uses the improved nonlinear log-periodic formula developed in Sornette and Johansen (1997) over a much larger time window starting in June 1921. According to this improved theoretical formulation as well, the values of the exponent m2 and of the log-periodic angular frequency ω for the two great crashes are quite close to each other: $m_2^{1929} = 0.63$ and $m_2^{1987} = 0.68$. This is in agreement with the universality of the exponent m2 predicted by the renormalization group theory for log-periodicity (Saleur and Sornette, 1996; Sornette, 1998). A similar universality is also expected for the log-frequency, albeit in a weaker form, since fluctuations and noise have been shown to modify ω differently depending on their nature (Saleur and Sornette, 1996). The fits indicate that $\omega^{1929} = 5.0$ and $\omega^{1987} = 8.9$. These values are not unexpected and fall within the range found for other crashes (see below). They correspond to preferred scaling ratios of $\lambda^{1929} = 3.5$ and $\lambda^{1987} = 2.0$, respectively. The October 1929 and October 1987 crashes thus exhibit two similar precursory patterns on the Dow Jones index, starting respectively 2.5 and 8 years before them. It is thus a striking observation that essentially similar crashes have punctuated this century, notwithstanding tremendous changes in all imaginable ways of life and work. The only thing that has probably changed little is the way humans think and behave. The concept that emerges here is that the organization of traders in financial markets intrinsically leads to “systemic instabilities”. These probably result in a very robust way from the fundamental nature of human beings, including our gregarious behavior, our greediness, our instinctive psychology during panics and crowd behavior, and our risk aversion.
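The scaling ratios quoted above follow from the log-periodic angular frequency. The term cos(ω ln τ) is unchanged under τ → λτ when ω ln λ = 2π, that is, $\lambda = e^{2\pi/\omega}$. Equivalently, $\lambda = e^{1/f}$ in terms of the log-frequency f = ω/2π used in Fig. 26. The short check below reproduces the values in the text; `scaling_ratio` is only an illustrative helper.

```python
import math


def scaling_ratio(omega):
    """Preferred scaling ratio of discrete scale invariance: cos(omega*ln(tau))
    is unchanged under tau -> lambda*tau when omega*ln(lambda) = 2*pi."""
    return math.exp(2.0 * math.pi / omega)


for label, omega in [("1929", 5.0), ("1987", 8.9), ("Fig. 26, f = 1", 2.0 * math.pi * 1.0)]:
    print(label, round(scaling_ratio(omega), 2))
# 1929 3.51
# 1987 2.03
# Fig. 26, f = 1 2.72
```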
The global behavior of the market, with its log-periodic structures that emerge as a result of the cooperative behavior of traders, is reminiscent of the emergence of intelligent behavior at a macroscopic scale that individuals at the microscopic scale cannot perceive. This process has been discussed in biology, for instance for animal populations such as ant colonies, and in connection with the emergence of consciousness (Anderson et al., 1988).
There are, however, some differences between the two crashes. An important quantitative difference between the great crash of 1929 and the collapse of stock prices in October 1987 was that stock price variability in the year following the crash was much higher in 1929 than in 1987 (Romer, 1990). This has led economists to argue that the collapse of stock prices in October 1929 generated a significant temporary increase in uncertainty about future income, which led consumers to forgo purchases of durable goods. Forecasters were then much more uncertain about the course of future income following the stock market crash than was typical even for unsettled times. Contemporary observers believed that consumer uncertainty was an important force depressing consumption, which may have been an important factor in the deepening of the Great Depression. The increase of uncertainty after the October 1987 crash had a smaller effect, as no depression ensued. However, Fig. 18 clearly quantifies an increased uncertainty and risk lasting months after the crash.

7.3. The three Hong Kong crashes of 1987, 1994 and 1997

Hong Kong has a strong free-market attitude, characterized by very few restrictions on residents and nonresidents, private persons or companies, to operate, borrow, and repatriate profit and capital. This continued even after Hong Kong reverted to Chinese sovereignty on July 1st, 1997 as a Special Administrative Region (SAR) of the People’s Republic of China. Under the terms of the Sino-British Joint Declaration, it was promised a “high degree of autonomy” for at least 50 years from that date. The SAR is ruled according to a mini-constitution, the Basic Law of the Hong Kong SAR. Hong Kong has no exchange controls and cross-border remittances are readily permitted. These rules have not changed since July 1st, 1997, when China took over sovereignty from the UK. Capital can thus flow in and out of the Hong Kong stock market in a very fluid manner.
There are no restrictions on the conversion and remittance of dividends and interest. Investors bring their capital into Hong Kong through the open exchange market and remit it the same way. Accordingly, we may expect speculative behavior and crowd effects to be free to express themselves in full force. Indeed, the Hong Kong stock market provides perhaps the best textbook-like examples of speculative bubbles decorated by log-periodic power law accelerations followed by crashes. Over the last 15 years alone, one can identify three major bubbles and crashes, indicated as I, II and III in Fig. 22.

1. The first bubble and crash are shown in Fig. 23 and are synchronous with the worldwide October 1987 crash already discussed. On October 19, 1987, the Hang Seng index closed at 3362.4. On October 26, it closed at 2241.7, corresponding to a cumulative loss of 33.3%.
2. The second bubble ends in early 1994 and is shown in Fig. 24. It ends with what we could call a “slow crash”: on February 4, 1994, the Hang Seng index topped at 12157.6 and, a month later, on March 3, 1994, it closed at 9802, corresponding to a cumulative loss of 19.4%. It went even further down over the next two months, with a close at 8421.7 on May 9, 1994, corresponding to a cumulative loss of 30.7% since the high of February 4.
3. The third bubble, shown in Fig. 25, ended in mid-August 1997 with a slow and regular decay until October 17, 1997, followed by an abrupt crash: the drop from 13601 on October 17 to 9059.9 on October 28 corresponds to a 33.4% loss. The worst daily plunge, of 10%, was the third-largest percentage fall, after the 33.3% crash of October 1987 and the 21.75% fall following the Tiananmen Square crackdown in June 1989.
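The cumulative losses quoted in this list are simple percentage drops between two closes. The check below reproduces them from the quoted index levels. The August 11, 1997 high of 16460.5 is taken from the caption of Fig. 25, and the helper name `cumulative_loss` is illustrative.

```python
def cumulative_loss(high, low):
    """Percentage drop from a high close to a later, lower close."""
    return 100.0 * (high - low) / high


# Hang Seng closes quoted in the text and in the caption of Fig. 25.
episodes = [
    ("I:   Oct 19 -> Oct 26, 1987", 3362.4, 2241.7),
    ("II:  Feb 4 -> Mar 3, 1994", 12157.6, 9802.0),
    ("II:  Feb 4 -> May 9, 1994", 12157.6, 8421.7),
    ("III: Oct 17 -> Oct 28, 1997", 13601.0, 9059.9),
    ("III: Aug 11 -> Oct 28, 1997", 16460.5, 9059.9),
]
for name, high, low in episodes:
    print(f"{name}: {cumulative_loss(high, low):.1f}%")
# Prints 33.3%, 19.4%, 30.7%, 33.4% and 45.0%, matching the quoted figures.
```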
[Fig. 22 plot: Hong Kong stock market index versus date, 1980–1998, with the three bubbles marked I, II and III]
Fig. 22. The Hong Kong stock market index as a function of time. Three extended bubbles followed by large crashes can be identified. The approximate dates of the crashes are October 1987 (I), January 1994 (II) and October 1997 (III). Reproduced from Johansen and Sornette (2001b).
[Fig. 23 plot: Hong Kong I index versus date, 1984.5–1987.5, with the best and second-best fits of Eq. (54)]
Fig. 23. Hong Kong stock market bubble ending with the crash of October 1987. On October 19, 1987, the Hang Seng index closed at 3362.4. On October 26, it closed at 2241.7, corresponding to a loss of 33.3%. See Table 2 for the parameter values of the fit with Eq. (54). Reproduced from Johansen and Sornette (2001b).
Table 2 gives the parameters of the fits with Eq. (54) of the bubble phases of the three events I, II and III shown in Figs. 23–25. It is quite remarkable that the three bubbles on the Hong Kong stock market have essentially the same log-periodic angular frequency ω, within ±15%. These values are also quite similar to those found for bubbles on the U.S. market and on the foreign exchange market (FOREX; see below). In particular, for the October 1997 crash on the Hong Kong market, we have $m_2^{1987} = 0.33 < m_2^{\mathrm{HK}1997} = 0.34 < m_2^{1929} = 0.45$ and $\omega^{1987} = 7.4 < \omega^{\mathrm{HK}1997} = 7.5 < \omega^{1929} = 7.9$; the exponent m2 and the log-periodic angular frequency ω for the October 1997 crash on the
[Fig. 24 plot: Hong Kong II index versus date, 1992–1994, with the best fit of Eq. (54)]
Fig. 24. Hong Kong stock market bubble ending with the crash of early 1994. On February 4, 1994, the Hang Seng index topped at 12157.6. A month later, on March 3, 1994, it closed at 9802, corresponding to a cumulative loss of 19.4%. It went even further down over the following two months, with a close at 8421.7 on May 9, 1994, corresponding to a cumulative loss of 30.7% since the high of February 4. See Table 2 for the parameter values of the fit with Eq. (54). Reproduced from Johansen and Sornette (2001b).
[Fig. 25 plot: Hang Seng index versus date, 1995–1998]
Fig. 25. The Hang Seng index prior to the October 1997 crash on the Hong Kong Stock Exchange. The index topped at 16460.5 on August 11, 1997. It then decayed regularly to 13601, reached on October 17, 1997, before crashing abruptly to a close of 9059.9 on October 28, 1997, with an intra-day low of 8775.9. The total cumulative loss since the high of August 11 is 45%. The amplitude of the crash from October 17 to October 28 is 33.4%. The fit is Eq. (54) with A2 ≈ 20077, B2 ≈ −8241, B2C ≈ −397, m2 ≈ 0.34, tc ≈ 1997.74, ω ≈ 7.5 and φ ≈ 0.78. Reproduced from Johansen and Sornette (1999a, 2001b).
Hong Kong Stock Exchange are perfectly bracketed by those of the two main crashes on Wall Street! Fig. 26 demonstrates the “universality” of the log-periodic component of the signals in the three bubbles preceding the three crashes on the Hong Kong market.
Table 2
Fit parameters of the three speculative bubbles on the Hong Kong stock market shown in Figs. 23–25 leading to a large crash. Multiple entries correspond to the two best fits. Reproduced from Johansen and Sornette (2001b).

Stock market   A2           B2             B2C         m2          tc            ω          φ
Hong Kong I    5523; 4533   −3247; −2304   171; −174   0.29; 0.39  87.84; 87.78  5.6; 5.2   −1.6; 1.1
Hong Kong II   21121        −15113         −429        0.12        94.02         6.3        −0.6
Hong Kong III  20077        −8241          −397        0.34        97.74         7.5        0.8
[Fig. 26 plot: spectral weight versus log-frequency for the 1987, 1994 and 1997 Hong Kong bubbles]
Fig. 26. Lomb spectral analysis of the three bubbles shown in Figs. 23–25, which preceded the three crashes on the Hong Kong market. See Press et al. (1992) for an explanation of Lomb spectral analysis. All three bubbles are characterized by almost the same “universal” log-frequency f ≈ 1, corresponding to a preferred scaling ratio of the discrete scale invariance equal to λ = exp(1/f) ≈ 2.7. Courtesy A. Johansen.
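Log-periodic oscillations are periodic in ln(tc − t) rather than in t. That natural variable is unevenly spaced even for daily data, which is why a Lomb periodogram rather than a standard FFT is appropriate here. The NumPy sketch below implements the normalized Lomb periodogram described in Press et al. (1992). It assumes that the input is an oscillatory residual already extracted from a fit; how this was done for Fig. 26 is not detailed here. The test signal is hypothetical: a pure log-periodic oscillation with f = 1, sampled at evenly spaced times up to tc. Its periodogram should peak near f ≈ 1, i.e. λ ≈ e ≈ 2.7.

```python
import numpy as np


def lomb_periodogram(x, y, freqs):
    """Normalized Lomb periodogram (Press et al., 1992) of unevenly sampled
    data y(x), evaluated at the ordinary frequencies in `freqs`."""
    y = y - y.mean()
    var = y.var(ddof=1)
    power = np.empty(len(freqs))
    for k, f in enumerate(freqs):
        w = 2.0 * np.pi * f
        # Offset tau that decouples the sine and cosine terms at this frequency.
        tau = np.arctan2(np.sum(np.sin(2.0 * w * x)), np.sum(np.cos(2.0 * w * x))) / (2.0 * w)
        c = np.cos(w * (x - tau))
        s = np.sin(w * (x - tau))
        power[k] = ((y @ c) ** 2 / (c @ c) + (y @ s) ** 2 / (s @ s)) / (2.0 * var)
    return power


# Hypothetical test: evenly spaced times approaching tc = 1, analyzed in the
# unevenly spaced variable x = ln(tc - t); the oscillation has log-frequency f = 1.
tc = 1.0
t = np.linspace(0.0, 0.99, 400)
x = np.log(tc - t)
y = np.cos(2.0 * np.pi * 1.0 * x)
freqs = np.linspace(0.2, 3.0, 281)
power = lomb_periodogram(x, y, freqs)
f_peak = freqs[np.argmax(power)]
print(round(f_peak, 2), round(np.exp(1.0 / f_peak), 2))
```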
7.4. The crash of October 1997 and its resonance on the U.S. market

The Hong Kong market crash of October 1997 has been presented as a textbook example in which contagion and speculation took a course of their own. In September 1997, Malaysian Prime Minister Dr Mahathir Mohamad made his now famous address to the World Bank–International Monetary Fund seminar in Hong Kong. Many critics pooh-poohed his proposal to ban currency speculation as an attempt to hide the fact that Malaysia’s economic fundamentals were weak. They pointed to the fact that the currency turmoil had not affected Hong Kong, whose economy was basically sound. Thus, if Malaysia and other countries were affected, it was because their economies were weak. At that time, it was easy to point out the current-account deficits of Thailand, Malaysia and Indonesia. In contrast, Hong Kong had a good current-account position and, moreover, solid foreign reserves worth U.S.$88 billion. This “the strong won’t be affected” theory had already suffered a setback when the Taiwan currency’s peg to the U.S. dollar had to be removed: the Taiwanese authorities spent U.S.$5 billion defending their currency from speculative attacks and then gave up. The “coup de grâce” came with the meltdown in Hong Kong in October 1997, which shocked analysts and the media, as this high-flying market was considered the safest haven in Asia.
Asia’s lesser markets melted down as country after country, led by Thailand in July 1997, succumbed to economic and currency problems. In contrast, Hong Kong was supposed to be different. With its Western-style markets, the second largest in Asia after Japan, it was thought to be immune to the financial flu that had swept through the rest of the continent. Our analysis of Section 5 and the lessons of the two previous bubbles, ending in October 1987 and in early 1994, make clear that those assumptions were naive. They overlooked the contagion leading to over-investment in the build-up period preceding the crash, and the resulting instability, which left the Hong Kong market vulnerable to speculative attacks. Indeed, hedge funds in particular are known to have taken positions consistent with a possible crisis on the currency and on the stock market. By “shorting” (selling) the currency to drive it down, they forced the Hong Kong government to raise interest rates to defend it by increasing the currency liquidity, which as a consequence made equities suffer and the stock market more unstable. As we have already stressed, one should not confuse the “local” cause with the fundamental cause of the instability. As the late George Stigler once put it, to blame ‘the markets’ for an outcome we don’t like is like blaming the waiters in restaurants for obesity. Within the framework defended here (see also Sornette, 2003), crashes occur as possible (but not necessary) outcomes of a long preparation, which we refer to for short as “herding”, that makes the market enter a more and more unstable regime. When in this state, there are many possible “local” causes that may make it stumble. To push the argument to the extreme and make it crystal clear, blaming the crash on such a local cause would be like attributing to strong wind the collapse of the infamous Tacoma Narrows Bridge, which once connected mainland Washington with the Olympic peninsula.
It is true that, on November 7, 1940, at approximately 11:00 AM, the bridge suddenly collapsed after developing a remarkably “ordered” sway in response to a strong wind, only a few months after it had been opened to traffic. Historical film footage of the Tacoma Narrows Bridge shows, in 250 frames (10 s), the maximum torsional motion shortly before the failure of this immense structure (http://cee.carleton.ca/Exhibits/Tacoma Narrows/). However, the strong wind of that day is only the “local” cause, while there is a more fundamental cause. The bridge, like most objects, has a small number of characteristic vibration frequencies, and one day the wind was exactly of the strength needed to excite one of them. The bridge responded by vibrating at this characteristic frequency so strongly, i.e., by “resonating”, that it fractured the supports holding it together. The fundamental cause of the collapse of the Tacoma Narrows Bridge thus lies in a design error that enhanced the role of one specific mode of resonance. We propose that, analogously to the collapse of the Tacoma Narrows Bridge, many stock markets crash as the result of built-in or acquired instabilities. These instabilities may in turn be revealed by “small” perturbations that lead to the collapse. Speculative attacks in periods of market instability are sometimes pointed to as serious potential hazards for developing countries that allow global financial markets free play. This is especially so when these countries come under pressure, in the World Trade Organization’s financial services negotiations, to open up their financial sectors to large foreign banks, insurance companies, stock broking firms and other institutions.
We argue that the problem in fact stems fundamentally from the over-enthusiastic initial influx of capital as a result of herding. This influx initially profits the country, but at the risk of future instabilities: developing countries as well as investors “cannot have their cake and eat it too!” From an efficient-market viewpoint, speculative attacks are nothing but the revelation of the instability and the means by which markets are forced back to a more stable dynamical state.
[Fig. 27 plot: logarithm of the S&P 500 index versus time (years), 1991–1998]
Fig. 27. The best fit, shown as the smooth continuous line, of the logarithm of the S&P500 index from January 1991 until September 4, 1997 (1997.678). The fit uses the improved nonlinear log-periodic formula developed in Sornette and Johansen (1997), already used in Figs. 17 and 21. The exponent m2 and the log-periodic angular frequency ω are m2 = 0.73 (compared to 0.63 for October 1929 and 0.33 for October 1987) and ω = 8.93 (compared to 5.0 for October 1929 and 7.4 for October 1987). The critical time predicted by this fit is tc = 1997.948, i.e., mid-December 1997. Courtesy A. Johansen.
Interestingly, the October 1997 crash on the Hong Kong market had important echoes in other markets worldwide, in particular in the U.S. markets. The story is often told as if a “wave of selling” starting in Hong Kong had spread first to other southeast Asian markets, driven by negative sentiment (which served to reaffirm the deep financial problems of the Asian tiger nations), then to the European markets, and finally to the U.S. market. The shares hardest hit in Western markets were those of multinational companies that receive part of their earnings from the southeast Asian region. The reason for their devaluation is that the region’s economic slowdown would lower corporate profits. It is estimated that the 25 companies which make up one third of the market capitalization of Wall Street’s S&P500 index earn roughly half of their income from non-U.S. sources. Lower growth in southeast Asia heightened one of the biggest concerns of Wall Street investors. To carry on the then-current “bull” run, the market needed sustained corporate earnings. If these were not forthcoming, the cycle of rising share prices would wither into one of falling share prices. Concern over earnings might have proved to be the straw that broke Wall Street’s six-year bull run. Fingerprints of herding and of an incoming instability were detected by several groups independently and announced publicly. According to our theory, the turmoil on the U.S. financial market in October 1997 should not be seen only as a passive reaction to the Hong Kong crash. The log-periodic power law signature observed on the U.S. market over several years before October 1997 (see Fig. 27) indicates that a similar “herding” instability was also developing simultaneously. In fact, A. Johansen and the author formally issued, jointly and ex-ante on September 17, 1997, the detection of log-periodic structures and a prediction of a stock market correction or a crash at the end of October 1997. This prediction was filed with the French office for the protection of proprietary software and inventions, under registration number 94781. In addition, a trading strategy using put options was devised in order to provide an experimental test of the theory. A 400% profit was obtained in a two-week period covering the mini-crash of October 28, 1997. The proof of this profit is available from a Merrill Lynch client cash management account released in November 1997. Using a variation of
our theory, which turns out to be slightly less reliable (see the comparative tests in Johansen and Sornette, 1999b), a group of physicists and economists (Vandewalle et al., 1998a) also made a public announcement. It was published on September 18, 1997 in a Belgian journal (Dupuis, 1997), and the group afterwards communicated its methodology in a scientific publication (Vandewalle et al., 1998b). Two other groups have also analyzed, after the fact, the possibility of predicting this event. Feigenbaum and Freund (1998) analyzed the log-periodic oscillations in the S&P500 and the NYSE in relation to the October 27 “correction” seen on Wall Street. Gluzman and Yukalov (1998) proposed a new approach based on the algebraic self-similar renormalization group. They used it to analyze the time series corresponding to the October 1929 and 1987 crashes and the October 1997 correction of the New York Stock Exchange (NYSE). Fig. 27 shows the best fit of the logarithm of the S&P500 index from January 1991 until September 4, 1997 by the improved nonlinear log-periodic formula developed in Sornette and Johansen (1997), already used in Figs. 17 and 21. This result and many other analyses led to the prediction alluded to above. It turned out that the crash did not really occur. What happened was that the Dow plunged 554.26 points, finishing the day down 7.2%, and NASDAQ posted its biggest-ever (up to that time) one-day point loss. In accordance with a new rule passed after the October 1987 “Black Monday”, trading was halted on all major U.S. exchanges. Private communications from professional traders to the author indicate that many believed a crash was coming, but this turned out to be incorrect. This sentiment must also be put in the perspective of the earlier sell-off at the beginning of the month. That sell-off was triggered by Greenspan’s statement that the boom in the U.S. economy was unsustainable and that the then-current rate of gains in the stock market was unrealistic.
It is actually interesting that the critical time tc identified from these data indicated a change of regime rather than a real crash. After this turbulence, the U.S. market remained more or less flat, with large volatility, until the end of January 1998, thus breaking the previous “bullish” regime. It then started a new “bull” phase, which was stopped in its course in August 1998 and which we shall analyze below. The observation of a change of regime after tc is in full agreement with the rational expectation model of a bubble and crash described in Section 5. In that model, the bubble expands, the market believes that a crash is more and more probable, and prices develop the characteristic structures of speculation and herding, but the critical time may pass without a crash happening. This corresponds to the scenario in which no crash occurs over the whole lifetime of the bubble, including tc, which has a nonzero probability in the rational expectation model of Section 5. The end of October 1997 saw both the critical time tc of the Hong Kong crash and the end of the U.S. and European speculative bubble phases. This simultaneity may be neither a lucky coincidence nor the signature of a causal impact of one market (Hong Kong) on the others, as has often been discussed too naively. It can actually be predicted in a model of rational expectation bubbles that allows for coupling and interactions between stock markets. For general interactions, if a critical time appears in one market, it should also be present in other markets as a result of the nonlinear interactions existing between them (Johansen and Sornette, 2001a).

7.5. Currency crashes

Currencies can also develop bubbles and crashes. The bubble on the U.S. dollar starting in the early 1980s and ending in 1985, shown in Fig. 28, is a remarkable example.
[Fig. 28 plot: exchange rate of the U.S. dollar versus date, 1983–1985.5]
Fig. 28. The U.S.$ expressed in German marks (DEM, top curve) and in Swiss francs (CHF, bottom curve) prior to its collapse in mid-1985. The fit of the DEM against the U.S. dollar with Eq. (54) is shown as the continuous smooth line and gives A2 ≈ 3.88, B2 ≈ −1.2, B2C ≈ 0.08, m2 ≈ 0.28, tc ≈ 1985.20, ω ≈ 6.0 and φ ≈ −1.2. The fit of the Swiss franc against the U.S. dollar with Eq. (54) gives A2 ≈ 3.1, B2 ≈ −0.86, B2C ≈ 0.05, m2 ≈ 0.36, tc ≈ 1985.19, ω ≈ 5.2 and φ ≈ −0.59. Note the small fluctuations in the value of the scaling ratio, $2.2 \le \lambda \le 2.7$, which constitutes one of the key tests of our “critical herding” theory. Reproduced from Johansen and Sornette (1999a).
The U.S. dollar experienced an unprecedented cumulative appreciation against the currencies of the major industrial countries starting around 1980. This had several consequences. One was a loss of competitiveness with important implications for domestic industries. Another was an increase of the U.S. merchandise trade deficit by as much as $45 billion by the end of 1983, with export sales about $35 billion lower and the import bill $10 billion higher. For instance, in 1982, it was already expected that, through its effects on export and import volumes, the appreciation would reduce real gross national product by the end of 1983 to a level 1–1.5% lower than the pre-appreciation level of the 3rd quarter of 1980 (Feldman, 1982). The appreciation of the U.S. dollar from 1980 to 1984 was accompanied by a substantial decline in prices for the majority of manufactured imports from Canada, Germany, and Japan. However, for a substantial minority of items, the imported items’ dollar prices rose, both absolutely and relative to the general U.S. price level. The median change was a price decline of 8% for imports from Canada and Japan, and a decrease of 28% for goods from Germany (Fieleke, 1985). As a positive effect, the U.S. inflation outlook improved very significantly. There is also evidence that the strong dollar in the first half of the 1980s forced increased competition in U.S. product markets, especially vis-à-vis continental Europe (Knetter, 1994). As we explained in Section 5, according to the rational expectation theory of speculative bubbles, prices can be driven up by an underlying looming risk of a strong correction or crash. Such a possibility has been advocated as an explanation for the strong appreciation of the U.S. dollar from 1980 to early 1985 (Kaminsky and Peruga, 1991).
If the market believes that a discrete event may occur and the event does not materialize for some time, this may have two consequences. It may drive prices up, and it may lead to an apparently inefficient predictive performance of forward exchange rates. (Forward and futures contracts are financial instruments that closely track “spot” prices, as they embody the best information on the expectations of market participants about the near-term spot price.) Indeed, from October 1979 to February 1985, forward rates systematically underpredicted the strength of the
[Fig. 29 plot: price ratios of CAN$ to US$ (left axis) and of Yen to US$ (right axis) versus date, 1996.8–1998.8]
Fig. 29. The U.S. dollar expressed in CAN$ and Yen prior to its drop starting in August 1998. The fits with Eq. (54) to the two exchange rates give A2 ≈ 1.62, B2 ≈ −0.22, B2C ≈ −0.011, m2 ≈ 0.26, tc ≈ 98.66, φ ≈ −0.79, ω ≈ 8.2 (CAN$) and A2 ≈ 207, B2 ≈ −85, B2C ≈ 2.8, m2 ≈ 0.19, tc ≈ 98.78, φ ≈ −1.4, ω ≈ 7.2 (Yen). Reproduced from Johansen et al. (1999).
U.S. dollar. Two discrete events could be identified as governing market expectations (Kaminsky and Peruga, 1991): (1) the change in monetary regime in October 1979 and the resulting private-sector doubts about the Federal Reserve’s commitment to lower money growth and inflation; (2) private-sector anticipation of the dollar’s depreciation beginning in March 1985, i.e., anticipation of a strong correction, exactly as in the bubble-crash model of Section 5. The corresponding characteristic power law acceleration decorated by log-periodic oscillations is shown in Fig. 28. Expectations of the future exchange rate have been shown to be excessive in the subsequent period from 1985.2 to 1986.4, indicating bandwagon effects at work and the possibility of a rational speculative bubble (MacDonald and Torrance, 1988). As usual before a strong correction or a crash, analysts showed over-confidence. There was much reassuring talk about the absence of a significant danger of collapse of the dollar, which had risen to unprecedented heights against foreign currencies (Holmes, 1985). In the long term, however, it was clear that such a strong dollar was unsustainable, and there were indications that the dollar was overvalued. In particular, foreign exchange markets generally hold that a nation’s currency can remain strong over the longer term only if the nation’s current account is healthy. In contrast, for the first half of 1984, the U.S. current account suffered a seasonally adjusted deficit of around $44.1 billion. A similar but somewhat attenuated bubble of the U.S. dollar expressed in Canadian dollars and Japanese Yen, extending over slightly less than a year and bursting in the summer of 1998, is shown in Fig. 29. Paul Krugman has suggested that this run-up against the Yen and Canadian dollar, as well as the near collapse of U.S.
financial markets at the end of the summer of 1998, which is discussed in the next section, are the unwanted “byproduct of a vast get-richer-quick scheme by a handful of shadowy financial operators” which backfired (Krugman, 1998). The remarkable quality of the fits of the data with our theory does indeed give credence to the role of speculation, imitation and herding, be they spontaneous, self-organized or partly manipulated. Moreover, Frankel and Froot (1988, 1990) found that, over the period 1981–1985, the market shifted away from the fundamentalists and toward the chartists or trend-followers.
D. Sornette / Physics Reports 378 (2003) 1 – 98
[Fig. 30 plot: S&P500 (WS) and Hang Seng (HK) index values versus date, 1995 to 1998.5.]
Fig. 30. The Hang Seng index prior to the October 1997 crash on the Hong-Kong Stock Exchange, already shown in Fig. 25, and the S&P500 stock market index prior to the crash on Wall Street in August 1998. The fit to the S&P500 index is Eq. (54) with A2 ≈ 1321, B2 ≈ −402, B2C ≈ 19.7, m2 ≈ 0.60, tc ≈ 98.72, φ ≈ 0.75 and ω ≈ 6.4. Reproduced from Johansen et al. (1999).
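To make the fitted curves concrete, the Python sketch below evaluates the log-periodic power law with the S&P500 parameters quoted in the Fig. 30 caption. It assumes that Eq. (54), which is not reproduced in this section, has the common form p(t) = A2 + B2 (tc − t)^m2 + B2C (tc − t)^m2 cos(ω ln(tc − t) − φ); this form, and especially the sign convention of the phase φ, is our assumption. Time is in decimal years, so 98.72 stands for 1998.72.

```python
import math


def lppl(t, A, B, BC, m, tc, phi, omega):
    """Assumed form of Eq. (54):
    p(t) = A + B*tau**m + BC*tau**m*cos(omega*ln(tau) - phi), tau = tc - t.
    The sign convention of the phase phi is an assumption."""
    tau = tc - t
    if tau <= 0:
        raise ValueError("the bubble formula is only defined for t < tc")
    power = tau ** m
    return A + B * power + BC * power * math.cos(omega * math.log(tau) - phi)


# S&P500 parameters from the Fig. 30 caption (time in decimal years, 98.72 = 1998.72)
SP500 = dict(A=1321.0, B=-402.0, BC=19.7, m=0.60, tc=98.72, phi=0.75, omega=6.4)

for t in (96.0, 97.0, 98.0, 98.55, 98.70):
    print(f"t = {t:5.2f}  fitted S&P500 = {lppl(t, **SP500):7.1f}")
```

Because B2 < 0 and 0 < m2 < 1, the power-law term vanishes at tc, so the fitted price rises toward A2 ≈ 1321 while its slope diverges; the formula is undefined beyond tc, which is why the function refuses t ≥ tc.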
7.6. The crash of August 1998
From its top in mid-July 1998 (1998.55) to its bottom in the first days of September 1998 (1998.67), the U.S. S&P500 stock market lost 19%. This “slow” crash, and in particular the turbulent behavior of the stock markets worldwide starting in mid-August, are widely associated with, and even attributed to, the plunge of the Russian financial markets, the devaluation of the ruble and the Russian government’s default on its debt obligations. The analysis presented in Fig. 30 suggests a different story: the Russian event may have been the triggering factor but not the fundamental cause! One can observe clear fingerprints of a kind of speculative herding, starting more than three years before, with its characteristic power law acceleration decorated by log-periodic oscillations. Table 3 gives a summary of the parameters of the log-periodic power law fit to the main bubbles and crashes discussed until now. The crash of August 1998 is seen to fit nicely in the family of crashes with “herding” signatures. This indicates that the stock market was again developing an unstable bubble which would have culminated at some critical time tc ≈ 1998.72, close to the end of September 1998. According to the rational expectation bubble models of Section 5, the probability of a strong correction or a crash was increasing as tc was approached, with a rising susceptibility to “external” perturbations, such as news or financial difficulties occurring somewhere in the “global village”. The Russian meltdown was just such a perturbation. What is remarkable is that the U.S. market somehow contained the information of an upcoming instability through its unsustainable accelerated growth and structures!
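The dates in this section and in Table 3 are written as decimal years (98.72 for 1998.72). A small Python helper converts them to calendar dates, assuming that the fractional part is the elapsed fraction of that calendar year; other conventions (e.g. 365.25-day years) can shift the result by about a day.

```python
from datetime import date, timedelta


def decimal_year_to_date(y):
    """Convert a decimal year such as 1998.72 to a calendar date, treating the
    fraction as the elapsed fraction of that calendar year (an assumed convention)."""
    year = int(y)
    start = date(year, 1, 1)
    days_in_year = (date(year + 1, 1, 1) - start).days  # 365 or 366
    return start + timedelta(days=round((y - year) * days_in_year))


# S&P500 times for the August 1998 crash, written out in full
for label, y in [("tmax", 1998.55), ("tmin", 1998.67), ("tc", 1998.72)]:
    print(label, decimal_year_to_date(y))
```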
The financial world being an extremely complex system of interacting components, it is not far-fetched to imagine that Russia was led to take actions against its unsustainable debt policy at a time of strongly increasing concern among many about the risks of investments made in developing countries. The strong correction starting in mid-August was not specific to the U.S. markets. In fact, it was much stronger in some other markets, such as the German market. Indeed, within the period of only 9 months preceding July 1998, the German DAX index went up from about 3700 to almost 6200 and then quickly declined in less than one month to below 4000. Precursory log-periodic structures
Table 3
Summary of the parameters of the log-periodic power law fit to the main bubbles and crashes discussed in this section (see Figs. 31, 32 and 33 below for the April 2000 crash on the Nasdaq and the two crashes on IBM and on Procter & Gamble)

Crash 1929     1985     1985     1987     1997     1998     1998     1998     1999     2000     2000
      (WS)     (DEM)    (CHF)    (WS)     (H-K)    (WS)     (YEN)    (CAN$)   (IBM)    (P&G)    (Nasdaq)
tc    30.22    85.20    85.19    87.74    97.74    98.72    98.78    98.66    99.56    00.04    00.34
tmax  29.65    85.15    85.18    87.65    97.60    98.55    98.61    98.66    99.53    00.04    00.22
tmin  29.87    85.30    85.30    87.80    97.82    98.67    98.77    98.71    99.81    00.19    00.29
drop  47%      14%      15%      30%      46%      19%      21%      5.1%     34%      54%      37%
m2    0.45     0.28     0.36     0.33     0.34     0.60     0.19     0.26     0.24     0.35     0.27
ω     7.9      6.0      5.2      7.4      7.5      6.4      7.2      8.2      5.2      6.6      7.0
λ     2.2      2.8      3.4      2.3      2.3      2.7      2.4      2.2      3.4      2.6      2.4
A2    571      3.88     3.10     411      20077    1321     207      1.62
B2    −267     1.16     −0.86    −165     −8241    −402     −84.5    −0.23
B2C   14.3     −0.77    −0.055   12.2     −397     19.7     2.78     −0.011
Var   56       0.0028   0.0012   36       190360   375      17       0.00024

tc is the critical time predicted from the fit of each financial time series to Eq. (54). The other parameters of the fit are also shown. λ = exp[2π/ω] is the preferred scaling ratio of the log-periodic oscillations. The error Var is the variance between the data and the fit and has units of price × price. Each fit is performed up to the time tmax at which the market index achieved its highest maximum before the crash. tmin is the time of the lowest point of the market disregarding smaller “plateaus”. The percentage drop is calculated as the total loss from tmax to tmin. Reproduced from Johansen et al. (1999).
have been documented for this event over the nine months preceding July 1998 (Drozdz et al., 1999), with the addition that analogous log-periodic oscillations also occurred on smaller time scales as precursors of smaller intermediate decreases, with a similar preferred scaling ratio λ at the various levels of resolution. However, the reliability of these observations at smaller time scales, which were established by visual inspection in Drozdz et al. (1999), remains to be confirmed with rigorous statistical tests.
7.7. The Nasdaq crash of April 2000
In the last few years of the second millennium, there was a growing divergence in the stock market between “New Economy” and “Old Economy” stocks, between technology and almost everything else. Over 1998 and 1999, stocks in the Standard & Poor’s technology sector rose nearly fourfold, while the S&P500 index gained just 50%. Without technology, the benchmark would have been flat. In January 2000 alone, 30% of net inflows into mutual funds went to science and technology funds, versus just 8.7% into S&P500 index funds. As a consequence, the average price-to-earnings ratio P/E for Nasdaq companies was above 200 (corresponding to a ridiculously low earnings yield of 0.5%), a stellar value above anything that serious economic valuation theory would consider reasonable. It is worth recalling that the very same concept and wording of a “New Economy” was hot in the minds and mouths of investors in the 1920s and in the early 1960s, as already mentioned. In the 1920s, the new technologies of the time were General Electric, AT&T and other electric and communication companies, and they also exhibited impressive price appreciations of the order of hundreds of percent over an 18-month time interval before the 1929 crash. In the early 1960s, the
growth stocks were in the new electronics industry, like Texas Instruments and Varian Associates; they were expected to exhibit a very fast rate of earnings growth, were highly prized and far outdistanced the standard blue-chip stocks. Many companies associated with the esoteric high-tech of space travel and electronics sold in 1961 for over 200 times their previous year’s earnings. The “tronics boom”, as it was called, had remarkably similar features to the “new economy” boom preceding the October 1929 crash and to the “new economy” boom of the late 1990s, ending in the April 2000 crash on the Nasdaq index. The Nasdaq Composite index dropped precipitously, with a low of 3227 on April 17, 2000, corresponding to a cumulative loss of 37% counted from its all-time high of 5133 reached on March 10, 2000. The Nasdaq Composite consists mainly of stocks related to the so-called “New Economy”, i.e., the Internet, software, computer hardware, telecommunications and so on. A main characteristic of these companies is that their price-to-earnings ratios (P/E’s), and even more so their price-to-dividend ratios, often ran to three digits prior to the crash. Some companies, such as VA LINUX, actually had negative earnings per share of −1.68. Yet their shares traded at around $40, which is close to the price of Ford in early March 2000. In contrast, “Old Economy” companies, such as Ford, General Motors and DaimlerChrysler, had P/E ≈ 10. The difference between “Old Economy” and “New Economy” stocks is thus the expectation of future earnings (Sornette, 2000b): investors who expect an enormous increase in, for example, the sales of Internet and computer-related products rather than in car sales are more willing to invest in Cisco than in Ford, notwithstanding the fact that the earnings per share of the former are much smaller than those of the latter.
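The 37% loss follows the definition used for the “drop” row of Table 3: the total loss from the all-time high at tmax to the lowest point at tmin. A short Python check using only the two Nasdaq values quoted above:

```python
def percentage_drop(p_max, p_min):
    """Total loss, in percent, from the all-time high p_max (at tmax)
    to the lowest point p_min (at tmin)."""
    if p_max <= 0 or p_min < 0:
        raise ValueError("prices must be positive")
    return 100.0 * (1.0 - p_min / p_max)


# Nasdaq Composite: high of 5133 on March 10, 2000, low of 3227 on April 17, 2000
print(f"Nasdaq drop: {percentage_drop(5133, 3227):.1f}%")  # Nasdaq drop: 37.1%
```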
For a similar price per share (approximately $60 for Cisco and $55 for Ford), the earnings per share were $0.37 for Cisco compared to $6.0 for Ford (Cisco had a total market capitalization of $395 billion at the close of April 14, 2000, compared to $63 billion for Ford). In the standard fundamental valuation formula, in which the expected return of a company is the sum of the dividend yield and the growth rate, “New Economy” companies are supposed to compensate for their lack of present earnings by a fantastic potential growth. In essence, this means that the bull market observed in the Nasdaq in 1997–2000 was fueled by expectations of increasing future earnings rather than by economic fundamentals (and by the expectation that others would expect the same thing and would help increase the capital gains): the price-to-dividend ratio of a company such as Lucent Technologies (LU), with a capitalization of over $300 billion prior to its crash on 5 January 2000, was over 900, which means that you would get a higher return on your checking account(!) unless the price of the stock increases. In contrast, an “Old Economy” company such as DaimlerChrysler gives a return which is more than 30 times higher. Nevertheless, the shares of Lucent Technologies rose by more than 40% during 1999, whereas the shares of DaimlerChrysler declined by more than 40% in the same period. The recent crashes of IBM, LU and Procter & Gamble (P&G) correspond to losses equivalent to the state budgets of many countries. And these losses are usually attributed to a “business-as-usual” corporate statement of slightly revised, smaller-than-expected earnings! These considerations make it credible that it is the expectation of future earnings and future capital gains, rather than present economic reality, that motivates the average investor, thus creating a speculative bubble.
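The arithmetic behind these comparisons can be made explicit. The Python sketch below computes the P/E ratios implied by the quoted prices and earnings per share, the earnings yield as the inverse of P/E, and the growth rate implied by the valuation formula “expected return = dividend yield + growth rate”. The 10% required return used for Lucent is a hypothetical number chosen for illustration only; it does not come from the text.

```python
def pe_ratio(price, eps):
    """Price-to-earnings ratio; None when earnings per share are not positive."""
    return price / eps if eps > 0 else None


def earnings_yield(pe):
    """Earnings yield E/P, the inverse of the P/E ratio."""
    return 1.0 / pe


def implied_growth(expected_return, dividend_yield):
    """Growth rate needed so that expected return = dividend yield + growth rate."""
    return expected_return - dividend_yield


# Prices and earnings per share quoted in the text (spring 2000)
print(f"Cisco P/E ~ {pe_ratio(60.0, 0.37):.0f}")  # about 162
print(f"Ford P/E  ~ {pe_ratio(55.0, 6.0):.1f}")   # about 9.2
print(f"VA Linux P/E: {pe_ratio(40.0, -1.68)}")   # None: negative earnings
print(f"Nasdaq average P/E of 200 -> earnings yield {earnings_yield(200):.1%}")

# Lucent: price-to-dividend ratio of 900, i.e. a dividend yield of about 0.11%.
# The 10% required return is a hypothetical assumption for illustration.
lucent_dividend_yield = 1.0 / 900
print(f"growth Lucent must deliver: {implied_growth(0.10, lucent_dividend_yield):.2%}")
```

Almost the whole of the assumed required return must then come from growth, which is the sense in which such companies “compensate for their lack of present earnings by a fantastic potential growth”.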
It has also been proposed (Mauboussin and Hiler, 1999) that better business models, the network effect, first-to-scale advantages and real-options effects could account for the apparent overvaluation, providing a sound justification for the high prices of dot-com and other new-economy companies. These interesting views, expounded in early 1999, were in synchrony with the bull market in 1999 and preceding years. They participated in the general optimistic view and
[Fig. 31 plot: Log (Nasdaq Composite) versus date, 1997.5 to 2000; curves: best fit and third best fit.]
Fig. 31. Best (r.m.s. ≈ 0.061) and third best (r.m.s. ≈ 0.063) fits with Eq. (54) to the natural logarithm of the Nasdaq Composite. The parameter values of the fits are A2 ≈ 9.5, B2 ≈ −1.7, B2C ≈ 0.06, m2 ≈ 0.27, tc ≈ 2000.33, ω ≈ 7.0, φ ≈ −0.1 and A2 ≈ 8.8, B2 ≈ −1.1, B2C ≈ 0.06, m2 ≈ 0.39, tc ≈ 2000.25, ω ≈ 6.5, φ ≈ −0.8, respectively. Reproduced from Johansen and Sornette (2000a).
added to the herding strength. They seem less attractive in the context of the bearish phase of the Nasdaq market that has followed its crash in April 2000 and which is still running more than two years later: Koller and Zane (2001) argue that the traditional triumvirate of earnings growth, inflation and interest rates explains most of the growth and decay of U.S. indices (while not excluding the existence of a bubble of hugely capitalized new-technology companies). Indeed, as already stressed, history provides many examples of bubbles driven by unrealistic expectations of future earnings and followed by crashes (White, 1996; Kindleberger, 2000). The same basic ingredients are found repeatedly: fueled by initially well-founded economic fundamentals, investors develop a self-fulfilling enthusiasm through an imitative process or crowd behavior that leads to an unsustainable accelerating overvaluation. We propose that the fundamental origin of the crashes on the U.S. markets in 1929, 1962, 1987, 1998 and 2000 belongs to the same category, the difference being mainly in which sector the bubble was created: in 1929, it was utilities; in 1962, it was the electronics sector; in 1987, the bubble was supported by a general deregulation with new private investors with high expectations; in 1998, it was strong expectations of investment opportunities in Russia, which collapsed; in 2000, it was the expectations about the Internet, telecommunications and so on that fueled the bubble. However, sooner or later, investment values always revert to a fundamental level based on real cash flows. Fig. 31 shows the logarithm of the Nasdaq Composite fitted with the log-periodic power law Eq. (54). The data interval to fit was identified using the same procedure as for the other crashes: the first point is the lowest value of the index prior to the onset of the bubble and the last point is the all-time high of the index.
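The window rule just described can be written down directly. The Python function below is a simplified sketch: it takes the all-time high as the last point and the lowest value at or before it as the first point. It does not attempt the judgement about the onset of the bubble (the “inflection point”) discussed next, so on real data the starting point may need to be adjusted by hand.

```python
def bubble_fit_window(prices):
    """Return (first, last) indices of the interval to fit.

    last  = index of the all-time high (first occurrence if repeated);
    first = index of the lowest value at or before that high.
    """
    if not prices:
        raise ValueError("empty price series")
    last = max(range(len(prices)), key=prices.__getitem__)
    first = min(range(last + 1), key=prices.__getitem__)
    return first, last


# Toy series: a dip, an accelerating rise to a peak, then a crash
series = [5.0, 4.0, 3.0, 4.0, 6.0, 8.0, 12.0, 10.0, 7.0]
first, last = bubble_fit_window(series)
print(first, last, series[first:last + 1])  # 2 6 [3.0, 4.0, 6.0, 8.0, 12.0]
```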
Identifying the onset of the bubble involves some subtlety, whereas the end of the bubble is objectively defined as the date when the market reached its maximum. A bubble signifies an acceleration of the price. In the case of the Nasdaq, the index tripled from 1990 to 1997, but it increased by a factor of 4 in the 3 years preceding the April 2000 crash, thus defining an “inflection point” in the index. In general, the identification of such an “inflection point” is quite straightforward on the most liquid markets, whereas this is not always
[Fig. 32 plot: price of IBM shares versus date, 1997.5 to 1999.5.]
Fig. 32. Best (r.m.s. ≈ 3.7) fit with Eq. (54) to the price of IBM shares. The parameter values of the fit are A2 ≈ 196, B2 ≈ −132, B2C ≈ −6.1, m2 ≈ 0.24, tc ≈ 99.56, ω ≈ 5.2 and φ ≈ 0.1. Reproduced from Johansen and Sornette (2000a).
the case for emerging markets (Johansen and Sornette, 2001b). For details of the methodology of the fitting procedure, we refer the reader to Johansen et al. (1999). Undoubtedly, observers and analysts have forged post-mortem stories linking the April 2000 crash in part to the effect of the crash of Microsoft Inc. resulting from the breakdown of negotiations with the U.S. federal government on the antitrust issue during the weekend of April 1st, as well as to many other factors. Here, we interpret the Nasdaq crash as the natural death of a speculative bubble, antitrust or not, the results presented here strongly suggesting that the bubble would have collapsed anyway. However, according to our analysis based on the probabilistic model of bubbles described in Sections 5 and 6, the exact timing of the death of the bubble is not fully deterministic and allows for stochastic influences, but within the remarkably tight bound of about one month (except for the slow 1962 crash). Log-periodic critical signatures can also be detected on individual stocks, as shown in Fig. 32 for IBM and Fig. 33 for Procter & Gamble. These two figures offer a quantification of the precursory signals. The signals are noisier than for large indices but nevertheless clearly present. There is a weaker degree of generality for individual stocks, as the valuation of a company is also a function of many other idiosyncratic factors associated with the specific course of the company. Dealing with broad market indices averages out all these specificities, mainly keeping track of the overall market “sentiment” and direction. This is the main reason why the log-periodic power law precursors are stronger and more significant for aggregated financial series than for individual assets.
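The caption of Fig. 37 states that A, B and C are “slaved” to the other variables in each iteration of the optimization. Our reading of this, sketched below in Python with NumPy, is that for fixed tc, m2, ω and φ the assumed form of Eq. (54), A2 + B2 f(t) + B2C g(t) with f = (tc − t)^m2 and g = f cos(ω ln(tc − t) − φ), is linear in the three amplitudes, which can therefore be obtained exactly by ordinary least squares. This is an illustration of the idea, not the authors’ code; the returned mean squared residual plays the role of the “Var” row of Table 3, and a nonlinear search over (tc, m2, ω, φ) would call the function at each trial point.

```python
import numpy as np


def slave_linear_params(t, p, tc, m, omega, phi):
    """Solve for (A, B, BC) by least squares with tc, m, omega, phi held fixed.

    Assumed model: p = A + B*tau**m + BC*tau**m*cos(omega*ln(tau) - phi),
    with tau = tc - t. Returns the amplitudes and the mean squared residual.
    """
    t = np.asarray(t, dtype=float)
    p = np.asarray(p, dtype=float)
    tau = tc - t
    if np.any(tau <= 0):
        raise ValueError("all observation times must precede tc")
    f = tau ** m
    g = f * np.cos(omega * np.log(tau) - phi)
    design = np.column_stack([np.ones_like(t), f, g])
    amplitudes, *_ = np.linalg.lstsq(design, p, rcond=None)
    residuals = p - design @ amplitudes
    return amplitudes, float(np.mean(residuals ** 2))


# Synthetic check with the Fig. 30 S&P500 values (noise-free, so recovery is exact)
t = np.linspace(95.0, 98.5, 200)
tau = 98.72 - t
p = 1321.0 - 402.0 * tau**0.6 + 19.7 * tau**0.6 * np.cos(6.4 * np.log(tau) - 0.75)
amplitudes, var = slave_linear_params(t, p, tc=98.72, m=0.6, omega=6.4, phi=0.75)
print(np.round(amplitudes, 3), var)  # approximately [1321, -402, 19.7] and ~0
```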
If speculation, imitation and herding at some time become the strongest force driving the price of an asset, we should then expect the log-periodic power law signatures to emerge again strongly above all the other idiosyncratic effects. A generalization of this analysis to emerging markets, including six Latin-American stock market indices (Argentina, Brazil, Chile, Mexico, Peru and Venezuela) and six Asian stock market indices (Hong-Kong, Indonesia, Korea, Malaysia, Philippines and Thailand), has been performed in Johansen and Sornette (2001b). This work also discusses the existence of intermittent and strong correlations between markets following major international events.
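The argument that aggregation into an index averages out idiosyncratic factors can be illustrated with a toy model, which is an assumption of this sketch and not the model of the paper: each of N hypothetical stocks follows a common “sentiment” path plus independent noise, and the equally weighted average is compared with a single stock. For independent noise of equal size, the idiosyncratic part of the average shrinks roughly as 1/√N, so a common signal stands out more clearly in the index.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_stocks, n_days = 100, 500

# Hypothetical toy model: log-price = common "sentiment" path + idiosyncratic noise
common = np.cumsum(rng.normal(0.0, 0.01, n_days))
idiosyncratic = rng.normal(0.0, 0.05, (n_stocks, n_days))
stocks = common + idiosyncratic   # broadcasting adds the common path to every row
index = stocks.mean(axis=0)       # equally weighted "index"

single_error = np.std(stocks[0] - common)
index_error = np.std(index - common)
print(f"deviation from the common path, one stock: {single_error:.4f}; index: {index_error:.4f}")
```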
[Fig. 33 plot: price of Procter & Gamble shares versus date, 1998.8 to 2000.2.]
Fig. 33. Best (r.m.s. ≈ 4.3) fit with Eq. (54) to the price of Procter & Gamble shares. The parameter values of the fit are A2 ≈ 124, B2 ≈ −38, B2C ≈ 4.8, m2 ≈ 0.35, tc ≈ 2000.04, ω ≈ 6.6 and φ ≈ −0.9. Reproduced from Johansen and Sornette (2000a).
7.8. “Anti-bubbles”
We now summarize the evidence that imitation between traders and their herding behavior lead not only to speculative bubbles with accelerating overvaluations of financial markets, possibly followed by crashes, but also to “anti-bubbles” with decelerating market devaluations following all-time highs (Johansen and Sornette, 1999c). There is thus a certain degree of symmetry of the speculative behavior between the “bull” and “bear” market regimes. This behavior is documented on the Japanese Nikkei stock index from 1 January 1990 until 31 December 1998, on gold futures prices after 1980, and on the recent behavior of the U.S. S&P500 index from mid-2000 to August 2002, all of them after their all-time highs. The question we ask is whether the cooperative herding behavior of traders might also produce market evolutions that are symmetric to the accelerating speculative bubbles often ending in crashes. This symmetry is defined with respect to a time inversion around a critical time tc, such that tc − t for t < tc is changed into t − tc for t > tc. It suggests looking at decelerating devaluations instead of accelerating valuations. A related observation has been reported in Fig. 18 in relation to the October 1987 crash, showing that the implied volatility of traded options relaxed after the October 1987 crash from its maximum at the time of the crash to its long-term value, following a decaying power law with decelerating log-periodic oscillations. We now document this type of behavior for actual prices. The critical time tc then corresponds to the culmination of the market, with either a power law increase with accelerating log-periodic oscillations preceding it or a power law decrease with decelerating log-periodic oscillations after it. In the Russian market, both structures appear simultaneously for the same tc (Johansen and Sornette, 1999c).
This is, however, a rather rare occurrence, probably because accelerating markets with log-periodicity almost inevitably end up in a crash, a market rupture that thus breaks the symmetry (tc − t for t < tc into t − tc for t > tc). Herding behavior can occur and progressively weaken from a maximum in “bearish” (decreasing) market phases, even if the preceding “bullish” phase ending at tc was not characterised by a strengthening imitation.
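The time inversion can be stated compactly in code. In the Python sketch below, the same assumed log-periodic power law form (A + B τ^m + BC τ^m cos(ω ln τ − φ), our assumption for Eq. (54)) is written in terms of the distance τ > 0 to the critical time: a bubble uses τ = tc − t for t < tc and an anti-bubble uses τ = t − tc for t > tc. The illustrative parameters are those of the first Nikkei fit quoted in Section 7.8.1, applied to the natural logarithm of the index.

```python
import math


def lppl_of_tau(tau, A, B, BC, m, phi, omega):
    """Assumed Eq. (54) form expressed through the distance tau > 0 to tc."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    power = tau ** m
    return A + B * power + BC * power * math.cos(omega * math.log(tau) - phi)


def bubble(t, tc, **params):
    """Accelerating regime before the critical time: tau = tc - t."""
    return lppl_of_tau(tc - t, **params)


def anti_bubble(t, tc, **params):
    """Decelerating regime after the critical time: tau = t - tc."""
    return lppl_of_tau(t - tc, **params)


# First Nikkei fit of Section 7.8.1 (natural log of the index, time in decimal years)
NIKKEI = dict(A=10.7, B=-0.54, BC=-0.11, m=0.47, phi=-0.86, omega=4.9)
TC = 89.99

for t in (90.5, 91.0, 91.5, 92.0, 92.5):
    print(f"t = {t}: ln(Nikkei) fit = {anti_bubble(t, TC, **NIKKEI):.3f}")
```

By construction, the anti-bubble at tc + x mirrors the bubble at tc − x, which is the symmetry described above.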
[Fig. 34 plot: Log (Nikkei) versus date, 1990 to 2000.]
Fig. 34. Natural logarithm of the Nikkei stock market index after the start of the decline, from January 1, 1990 until December 31, 1998. The dotted line is the simple log-periodic formula (54), used to fit adequately the interval of ≈ 2.6 years starting from January 1, 1990. The continuous line is the improved nonlinear log-periodic formula developed in Sornette and Johansen (1997) and already used for the 1929 and 1987 crashes over 8 years of data. It is used to fit adequately the interval of ≈ 5.5 years starting from January 1, 1990. The dashed line is an extension of the previous nonlinear log-periodic formula to the next order of description, developed in Johansen and Sornette (1999c) and used to fit adequately the interval of ≈ 9 years from January 1, 1990 to December 1998. Reproduced from Johansen and Sornette (1999c).
The symmetry is thus statistical or global in general and holds in the ensemble rather than for each single case individually.
7.8.1. The “bearish” regime on the Nikkei starting from 1st January 1990
The most recent example of a genuine long-term depression comes from Japan, where the Nikkei has decreased by more than 60% in the 12 years following its all-time high of 31 December 1989. In Fig. 34, we see (the logarithm of) the Nikkei from 1 January 1990 until 31 December 1998. The three fits, shown as the undulating lines, use three mathematical expressions of increasing sophistication: the dotted line is the simple log-periodic formula (54); the continuous line is the improved nonlinear log-periodic formula developed in Sornette and Johansen (1997) and already used for the 1929 and 1987 crashes over 8 years of data; the dashed line is an extension of the previous nonlinear log-periodic formula to the next order of description, developed in Johansen and Sornette (1999c). This last, most sophisticated formula predicts the transition from the log-frequency ω1 close to tc to ω1 + ω2 for T1 < τ < T2 and to the log-frequency ω1 + ω2 + ω3 for T2 < τ, where τ is the time elapsed since tc. Using indices 1, 2 and 3, respectively, for the simplest to the most sophisticated formulas, the parameter values of the first fit of the Nikkei are A1 ≈ 10.7, B1 ≈ −0.54, B1C1 ≈ −0.11, m1 ≈ 0.47, tc ≈ 89.99, φ1 ≈ −0.86, ω1 ≈ 4.9 for Eq. (54). The parameter values of the second fit of the Nikkei are A2 ≈ 10.8, B2 ≈ −0.70, B2C2 ≈ −0.11, m2 ≈ 0.41, tc ≈ 89.97, φ2 ≈ 0.14, ω1 ≈ 4.8, T1 ≈ 9.5 years, ω2 ≈ 4.9. The third fit uses the entire time interval and is performed by adjusting only T1, T2, ω2 and ω3, while m3 = m2, tc and ω1 are fixed at the values obtained from the previous fit. The values obtained for these four parameters are T1 ≈ 4.3 years, T2 ≈ 7.8 years, ω2 ≈ −3.1 and ω3 ≈ 23. Note that the values obtained for the two time scales T1 and T2
[Fig. 35 plot: Log (Nikkei) versus date, 1990 to 2000.]
Fig. 35. Natural logarithm of the Nikkei stock market index after the start of the decline, from January 1, 1990 until February 2001. The continuous smooth line is the extended nonlinear log-periodic formula developed in Johansen and Sornette (1999c), used to fit adequately the interval of ≈ 9 years starting from January 1, 1990. The Nikkei data are separated into two parts. The dotted line shows the data used to perform the fit with the formula developed in Johansen and Sornette (1999c) and to issue the prediction in January 1999 (see Fig. 34). Its continuation as a continuous line gives the behavior of the Nikkei index after the prediction was made. Reproduced from Johansen and Sornette (2000b).
confirm their ranking. This last fit predicts a change of regime and that the Nikkei should increase in 1999. Not only do the first two equations agree remarkably well with respect to the parameter values produced by the fits, but they are also in good agreement with previous results obtained from stock market and Forex bubbles with respect to the values of the exponent m2. What lends credibility to the fit with the most sophisticated formula is that, despite its complex form, we get values for the two cross-over time scales T1 and T2 which correspond to what is expected from their ranking and from the 9-year interval of the data. We refer to Johansen and Sornette (1999c) for a detailed and rather technical discussion. The prediction summarized by Fig. 34 was made public on January 25, 1999 by posting a preprint on the Los Alamos preprint server, see http://xxx.lanl.gov/abs/cond-mat/9901268. The preprint was later published as Johansen and Sornette (1999c). The prediction stated that the Nikkei index should recover from its 14-year low (13232.74 on January 5, 1999) and reach ≈ 20500 a year later, corresponding to an increase in the index of ≈ 50%. This prediction was mentioned in a wide-circulation journal in the physical sciences which appeared in May 1999 (Stauffer, 1999). In Fig. 35, the actual and predicted evolutions of the Nikkei over 1999 and later are compared (Johansen and Sornette, 2000b). Not only did the Nikkei experience a trend reversal as predicted, but it also followed the quantitative prediction with rather impressive precision. In particular, the predicted 50% increase by the end of 1999 is accurately validated. Another trend reversal was also accurately predicted, with the reversal occurring at the beginning of 2000: the predicted and observed maxima match closely. It is important to note that, over 1999, the error between the curve and the data did not grow after the last point used in the fit.
This tells us that the prediction has performed well for more than one year. Furthermore, since the
relative error between the fit and the data is within ±2% over a time period of 10 years, not only has the prediction performed well, but so has the underlying model. The fulfilment of this prediction is even more remarkable than the comparison between the curve and the data indicates, because it included a change of trend: at the time when the prediction was issued, the market was declining and showed no tendency to increase. Many economists were at that time very pessimistic and could not envision when Japan and its market would rebound. For instance, P. Krugman wrote on July 14, 1998 in the Shizuoka Shimbun, at the time of the banking scandal: “the central problem with Japan right now is that there just is not enough demand to go around—that consumers and corporations are saving too much and borrowing too little... . So seizing these banks and putting them under more responsible management is, if anything, going to further reduce spending; it certainly will not in and of itself stimulate the economy... . But at best this will get the economy back to where it was a year or two ago—that is, depressed, but not actually plunging”. Then, in the Financial Times of January 20, 1999, P. Krugman wrote the following in an article entitled “Japan heads for the edge”: “...the story is starting to look like a tragedy. A great economy, which does not deserve or need to be in a slump at all, is heading for the edge of the cliff—and its drivers refuse to turn the wheel”. In a poll of 30 economists performed by Reuters (one of the major news and finance data providers in the world) in October 1998 and reported in the Indian Express on 15 October (see http://www.indian-express.com/fe/daily/19981016/28955054.html), only two economists predicted growth for the fiscal year 1998–1999. For the year 1999–2000, the prediction was a meager 0.1% growth.
The majority of economists said that “a vicious cycle in the economy was unlikely to disappear any time soon as they expected little help from the government’s economic stimulus measures... . Economists blamed moribund domestic demand, falling prices, weak capital spending and problems in the bad-loan laden banking sector for dragging down the economy”. It is in this context that we predicted an approximately 50% increase of the market in the 12 months following January 1999, assuming that the Nikkei would stay within the error bars of the fit. Predictions of trend reversals are notoriously difficult and unreliable, especially in the linear framework of the auto-regressive models used in standard economic analyses. The present nonlinear framework is well adapted to the forecasting of changes of trend, which constitutes by far the most difficult challenge posed to forecasters. Here, we refer to our prediction of a trend reversal within the strict confines of our extended formula: trends are limited periods of time during which the oscillatory behavior shown in Fig. 35 is monotonic. A change of trend thus corresponds to crossing a local maximum or minimum of the oscillations. Our formula seems to have predicted two changes of trend: bearish to bullish at the beginning of 1999 and bullish to bearish at the beginning of 2000.
7.8.2. The gold deflation price starting mid-1980
Another example of log-periodic decay is that of the gold price after the burst of the bubble in 1980, as shown in Fig. 36. The bubble has an average power law acceleration, as shown in the figure, but without any visible log-periodic structure. A pure power law fit will, however, not “lock in” on the true date of the crash, but insists on a date earlier than the last data point. This suggests that the behavior of the price might be different in some sense in the last few weeks prior to the burst of the bubble. The continuous line before the peak is expression (54) fitted over an interval of ≈ 3 years.
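In the sense used here, a change of trend is simply a local maximum or minimum of the fitted oscillatory curve. The Python sketch below locates such points on a sampled curve by looking for sign changes of successive differences. The demonstration applies it to a pure log-periodic factor cos(ω ln(t − tc)), with hypothetical values of ω and tc close to the first Nikkei fit; on that curve, successive turning points lie further and further apart as t moves away from tc.

```python
import math


def trend_changes(values):
    """Return (index, 'max' or 'min') for each turning point of a sampled curve.

    A turning point is where successive differences change sign; flat steps
    are skipped, so a plateau is reported at its last sample.
    """
    changes = []
    prev_sign = 0
    for i in range(1, len(values)):
        step = values[i] - values[i - 1]
        sign = (step > 0) - (step < 0)
        if sign == 0:
            continue
        if prev_sign and sign != prev_sign:
            changes.append((i - 1, "max" if prev_sign > 0 else "min"))
        prev_sign = sign
    return changes


# Hypothetical decelerating log-periodic factor after tc (anti-bubble side)
omega, tc = 4.9, 89.99
times = [tc + 0.01 * k for k in range(10, 1001)]
curve = [math.cos(omega * math.log(t - tc)) for t in times]
for i, kind in trend_changes(curve):
    print(f"{kind} near t = {times[i]:.2f}")
```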
The parameter values of this fit are A2 ≈ 8.5, B2 ≈ −111, B2C ≈ −110, m2 ≈ 0.41,
[Fig. 36 plot: Log (Gold Price) versus date, 1977 to 1982.]
Fig. 36. Natural logarithm of the gold 100 oz futures price in U.S.$, showing a power law acceleration followed by a decline of the price in the early eighties. The line after the peak is expression (54) fitted over an interval of ≈ 2 years. Reproduced from Johansen and Sornette (1999c).
tc ≈ 80.08, φ ≈ −3.0, ω ≈ 0.05. The price of gold after its peak is fitted by expression (54), and the result is shown as the undulating continuous line. Again, we obtain reasonable agreement with previous results for the exponent m2, with a good preferred scaling ratio λ ≈ 1.9. The strength of the log-periodic oscillations compared to the leading behavior is ≈ 10%. The parameter values of the fit in this anti-bubble regime are A2 ≈ 6.7, B2 ≈ −0.69, B2C ≈ 0.06, m2 ≈ 0.45, tc ≈ 80.69, φ ≈ 1.4, ω ≈ 9.8.
7.8.3. The U.S. 2000–2002 Market Descent: How Much Longer and Deeper?
Sornette and Zhou (2002) have recently analyzed the remarkable similarity between the behavior of the U.S. S&P500 index from 1996 to August 2002 and that of the Japanese Nikkei index from 1985 to 1992, corresponding to an 11-year shift. In particular, the structures of the price trajectories in the bearish or anti-bubble phases are strikingly similar, as seen in Fig. 37. Sornette and Zhou (2002) performed a battery of tests, starting with parametric fits of the index with two log-periodic power law formulas, followed by the so-called Shanks transformation applied to characteristic times. They also carried out two spectral analyses: the Lomb periodogram applied to the parametrically detrended index and the nonparametric (H, q)-analysis of fractal signals (Zhou and Sornette, 2002b, c). These different approaches complement each other and confirm the presence of very strong log-periodic structures. A rather novel feature is the detection of a significant second-order harmonic, which provides a statistically significant improvement of the description of the data by the theory, as tested using the statistical theory of nested hypotheses. The description of the S&P500 index from mid-2000 to the end of August 2002 based on the combination of the first and second log-periodic harmonics is shown in Fig. 38.
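The role of the preferred scaling ratio λ = exp(2π/ω) and of a Shanks-type transformation applied to characteristic times can be illustrated together. For the oscillating factor cos(ω ln(tc − t) − φ) alone (an assumption: in real data the turning points are also shifted by the power-law trend and by noise), successive maxima approach tc geometrically, their distances to tc shrinking by the factor λ. For such a geometric sequence, the first-order Shanks transformation of three consecutive times, which is equivalent to Aitken’s Δ² process, returns tc exactly up to rounding. The Python sketch below uses the S&P500 values of the Fig. 30 caption; it illustrates the principle and is not a reproduction of the procedure of Sornette and Zhou (2002).

```python
import math


def preferred_scaling_ratio(omega):
    """lambda = exp(2*pi/omega), as defined in the notes to Table 3."""
    return math.exp(2.0 * math.pi / omega)


def oscillation_peak_times(tc, omega, phi, n_peaks):
    """Times t < tc where cos(omega*ln(tc - t) - phi) = 1, moving toward tc.

    The maxima satisfy omega*ln(tau) - phi = -2*pi*n, i.e.
    tau_n = exp((phi - 2*pi*n)/omega), so tau_n / tau_{n+1} = lambda.
    """
    return [tc - math.exp((phi - 2.0 * math.pi * n) / omega) for n in range(n_peaks)]


def shanks(x0, x1, x2):
    """First-order Shanks transformation (Aitken's delta-squared), written in a
    form that avoids subtracting two large products."""
    d1, d2 = x1 - x0, x2 - x1
    return x2 - d2 * d2 / (d2 - d1)


print(f"lambda for the gold anti-bubble (omega = 9.8): {preferred_scaling_ratio(9.8):.2f}")
peaks = oscillation_peak_times(tc=98.72, omega=6.4, phi=0.75, n_peaks=4)
print("peak times:", [round(t, 3) for t in peaks])
print(f"Shanks estimate of tc from the first three peaks: {shanks(*peaks[:3]):.4f}")
```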
Over the next two years, Sornette and Zhou (2002) predict an overall continuation of the bearish phase, punctuated by local rallies; specifically, they predict an overall rising market until the end of 2002 or the first quarter of 2003, followed by a severe descent
D. Sornette / Physics Reports 378 (2003) 1 – 98
[Fig. 37 plot: ln(Nikkei) versus date, 1985–1993 (top and left axes); ln(S&P500) versus date, 1996–2004 (bottom and right axes).]
Fig. 37. Comparison between the evolutions of the U.S. S&P500 index from 1996 to August 24, 2002 (bottom and right axes) and the Japanese Nikkei index from 1985 to 1993 (top and left axes). The years are written on the horizontal axis, marked by a tick where January 1 of that year occurs. The dashed line is the simple log-periodic formula (54) fitted to the Nikkei index (with tc − t replaced by t − tc). The data used in this fit run from 01-Jan-1990 to 01-Jul-1992 (Johansen and Sornette, 1999c). The parameter values are tc = 28-Dec-1989, α = 0.38, ω = 5.0, φ = 2.59, A = 10.76, B = −0.067 and C = −0.011. The root-mean-square residue is 0.0535. The dash-dotted line is the improved nonlinear log-periodic formula developed in Sornette and Johansen (1997), fitted to the Nikkei index. The Nikkei index data used in this fit run from 01-Jan-1990 to 01-Jul-1995 (Johansen and Sornette, 1999c). The parameter values are tc = 27-Dec-1989, α = 0.38, ω = 4.8, φ = 6.27, Δt = 6954, Δω = 6.5, A = 10.77, B = −0.070, C = 0.012. The root-mean-square residue is 0.0603. The continuous line is the fit of the Nikkei index with the third-order formula developed in Johansen and Sornette (1999c). The Nikkei index data used in this fit run from 01-Jan-1990 to 31-Dec-2000. The fit is performed by fixing tc, α and ω at the values obtained from the second-order fit and adjusting only Δt, Δ′t, Δω, Δ′ω and φ. The parameter values are Δt = 1696, Δ′t = 5146, Δω = −1.7, Δ′ω = 40, φ = 6.27, A = 10.86, B = −0.090, C = −0.0095. The root-mean-square residue of the fit is 0.0867. In the three fits, A, B and C are slaved to the other variables by the multiplier approach in each iteration of the optimization search. The inset shows the 13-year Nikkei anti-bubble, with the third-order fit over these 13 years shown as the continuous line. The parameter values are slightly different: Δt = 52414, Δ′t = 17425, Δω = 23.7, Δ′ω = 127.5, φ = 5.57, A = 10.57, B = −0.045, C = 0.0087.
The root-mean-square residue of the fit is 0.1101. In all the fits, times are expressed in units of days, in contrast with the yearly unit used in Johansen and Sornette (1999c). Thus, the parameters B and C are different since they are unit-dependent, while all the other parameters are independent of the units. Reproduced from Sornette and Zhou (2002).
(with perhaps one or two severe ups and downs in the middle), which would end during the first half of 2004. Beyond this, they cannot be very certain, owing to the possible effect of additional nonlinear collective effects and of a genuine departure from the anti-bubble regime. The similarities between the two stock market indices may reflect deeper similarities between the fundamentals of the two economies. Both went through over-valuation, with strong speculative phases preceding the transition to bearish phases characterized by an unusual number of bad surprises (bad loans for Japan and accounting frauds for the U.S.) that sapped investors' confidence.
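The Shanks transformation applied to characteristic times, mentioned in the discussion of Sornette and Zhou (2002), estimates the accumulation point of a sequence. If successive log-periodic extrema t_n form a geometric progression around tc (t_n − tc proportional to λ^n, as discrete scale invariance implies), the first-order transform (t_{n+1} t_{n−1} − t_n²)/(t_{n+1} + t_{n−1} − 2 t_n) returns tc exactly. For real data, the spread of the estimates indicates how far the times are from a geometric progression. The minimal Python sketch below uses synthetic times, not the S&P500 data, and the function names are hypothetical.

```python
def shanks(a_prev, a_curr, a_next):
    """First-order Shanks transform of three consecutive terms.

    Algebraically equal to (a_next*a_prev - a_curr**2) / (a_next + a_prev - 2*a_curr),
    written in the equivalent Aitken form to limit cancellation; exact (returns L)
    for sequences a_n = L + c * q**n.
    """
    d1 = a_curr - a_prev
    d2 = a_next - a_curr
    denom = d2 - d1
    if denom == 0.0:
        raise ZeroDivisionError("equally spaced terms: no geometric convergence point")
    return a_next - d2 * d2 / denom


def shanks_sequence(times):
    """Apply the transform to every consecutive triple of characteristic times."""
    return [shanks(times[i - 1], times[i], times[i + 1]) for i in range(1, len(times) - 1)]


# Synthetic anti-bubble: characteristic times t_n = tc + a * lam**n, so the interval
# between successive extrema grows by the factor lam (illustrative values only).
tc, a, lam = 2000.6, 0.1, 1.9
times = [tc + a * lam**n for n in range(5)]
estimates = shanks_sequence(times)
print([round(x, 4) for x in estimates])  # each estimate recovers tc
```

With exactly geometric times every triple gives the same tc. With measured extrema, the successive estimates would scatter, and their dispersion gives a rough sense of the uncertainty on tc.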
[Fig. 38 plot: S&P500 index (800 to 1600) versus date, 2000–2004.]
Fig. 38. Fitted trajectories using Eq. (54) (with tc − t replaced by |t − tc|), each curve corresponding to a different starting time from Mar-01-2000 to Dec-01-2000 in one-month steps. The different fits are obtained as a sensitivity test with respect to the starting time of the anti-bubble, which is consistently found at tc ≈ July 15–August 15, 2000. The dotted lines show the predicted future trajectories. The fits are quite robust with respect to the starting date tstart between Mar-01-2000 and Dec-01-2000. Reproduced from Sornette and Zhou (2002).
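The Fig. 37 caption states that A, B and C are "slaved" to the other parameters in each iteration of the optimization, and the fits of Fig. 38 are repeated over a range of starting times. Once tc, the exponent, ω and φ are fixed, the log-periodic formula is linear in A, B and C. These three amplitudes can therefore be obtained by ordinary linear least squares inside each iteration, leaving a nonlinear search over only four parameters.

The Python sketch below shows that inner step. It assumes the form ln p = A + B τ^m + C τ^m cos(ω ln τ − φ) with τ = |t − tc|, where C multiplies the oscillating term directly as in the Fig. 37 caption. It is an illustration, not the authors' code, and all names are hypothetical.

```python
import math


def design_row(t, tc, m, omega, phi):
    """Basis functions multiplying A, B and C at time t (t must differ from tc)."""
    tau = abs(t - tc)
    power = tau ** m
    oscillation = power * math.cos(omega * math.log(tau) - phi)
    return (1.0, power, oscillation)


def solve3(M, v):
    """Solve a 3x3 linear system M x = v by Gaussian elimination with partial pivoting."""
    a = [list(M[i]) + [v[i]] for i in range(3)]  # augmented matrix
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for k in range(col, 4):
                a[r][k] -= factor * a[col][k]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):  # back substitution
        x[r] = (a[r][3] - sum(a[r][k] * x[k] for k in range(r + 1, 3))) / a[r][r]
    return x


def slaved_linear_params(ts, ys, tc, m, omega, phi):
    """Least-squares A, B, C and RMS residue for fixed nonlinear parameters."""
    rows = [design_row(t, tc, m, omega, phi) for t in ts]
    # Normal equations (X^T X) p = X^T y for p = (A, B, C).
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    A, B, C = solve3(xtx, xty)
    residuals = [y - (A + B * r[1] + C * r[2]) for r, y in zip(rows, ys)]
    rms = math.sqrt(sum(e * e for e in residuals) / len(residuals))
    return A, B, C, rms


# Synthetic, noise-free anti-bubble in days after tc (illustrative values only).
tc, m, omega, phi = 0.0, 0.38, 5.0, 2.59
true_A, true_B, true_C = 10.76, -0.067, -0.011
ts = [tc + 30.0 * k for k in range(1, 40)]
ys = [true_A + true_B * f + true_C * g
      for _, f, g in (design_row(t, tc, m, omega, phi) for t in ts)]

A, B, C, rms = slaved_linear_params(ts, ys, tc, m, omega, phi)
print(f"A={A:.4f} B={B:.4f} C={C:.4f} rms={rms:.1e}")
```

In a complete fit, an outer nonlinear search over (tc, m, ω, φ) would call slaved_linear_params and minimize the returned RMS residue. That search could be, for instance, a grid scan followed by a local optimizer. Repeating it for a series of window start dates is the type of sensitivity test summarized in Fig. 38.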
8. Synthesis

8.1. “Emergent” behavior of the stock market

In this paper, we have synthesized a large body of evidence in favor of the hypothesis that large stock market crashes are analogous to critical points studied in the statistical physics community in relation to magnetism, melting, and so on. Our main assumption is the existence of a cooperative behavior of traders imitating each other, described in Sections 5 and 6. A general result of the theory is the existence of log-periodic structures decorating the time evolution of the system. The main point is that the market anticipates the crash in a subtle, self-organized and cooperative fashion, hence releasing precursory “fingerprints” observable in stock market prices. In other words, market prices contain information on impending crashes. If traders were to learn how to decipher and use this information, they would act on it and on the knowledge that others act on it; nevertheless, the crashes would probably still happen. Our results suggest a weaker form of the “weak efficient market hypothesis” (Fama, 1991). In this weaker form, market prices contain, in addition to the information generally available to all, subtle information formed by the global market that most or all individual traders have not yet learned to decipher and use. The usual interpretation of the efficient market hypothesis holds that traders consciously extract and incorporate (by their actions) all information contained in market prices. Instead, we propose that the market as a whole can exhibit an “emergent” behavior not shared by any of its constituents. In other words, we have in mind the emergence of intelligent behaviors at a macroscopic scale that individuals at the microscopic scale cannot perceive. This process has been discussed in biology, for instance in animal populations such as ant colonies (Wilson, 1971; Hölldobler
and Wilson, 1994), or in connection with the emergence of consciousness (Anderson et al., 1988; Holland, 1992). Another realization of this concept is found in the information that option prices contain on the fluctuations of their underlying asset. Prices do not follow geometric Brownian motion, which most option pricing models take as a prerequisite. Despite this, traders have apparently adapted to empirically incorporate subtle information on the correlations of fat-tailed price distributions (Potters et al., 1998). In this case, and in contrast to crashes, traders have had time to adapt. The reason is probably that traders have been exposed for decades to option trading, in which the characteristic option lifetime ranges from months to a few years at most. This is sufficient for an extensive learning process to occur. In contrast, only a few great crashes typically occur during a lifetime, which is certainly not enough to teach traders how to adapt to them. The situation may be compared to the ecology of biological species, which constantly strive to adapt. Through the forces of evolution, they generally succeed in surviving by adaptation under slowly varying constraints. In contrast, life may exhibit successions of massive extinctions and booms, probably associated with dramatically fast events such as meteorite impacts and massive volcanic eruptions. The response of a complex system to such extreme events is a problem of outstanding importance that is just beginning to be studied (Commission on Physical Sciences, Mathematics, and Applications, 1990). Most previous models proposed for crashes have focused on the possible mechanisms explaining the collapse of the price at very short time scales.
Here, in contrast, we propose that the underlying cause of the crash must be sought years before it, in the progressively accelerating ascent of the market price, reflecting an increasing build-up of market cooperativity. From that point of view, the specific manner in which prices collapsed is not of real importance. According to the concept of the critical point, any small disturbance or process may have triggered the instability once the system was ripe. The intrinsic divergence of the sensitivity and the growing instability of the market close to a critical point might explain why attempts to unravel the local origin of the crash have been so diverse: essentially all of them would work once the system is ripe. Our view is that the crash has an endogenous origin and that exogenous shocks only serve as triggering factors. We propose that the origin of the crash is much more subtle and is constructed progressively by the market as a whole. In this sense, it could be termed a systemic instability.

8.2. Implications for mitigations of crises

Economists, in particular J.E. Stiglitz and more recently P. Krugman, as well as the financier George Soros, have argued that markets should not be left completely alone. The mantra of free-market purists, that markets should be totally free, may not always be the best solution, because it overlooks two key problems: (1) the tendency of investors to develop strategies that may destabilize markets in a fundamental way, and (2) the noninstantaneous adjustment of possible imbalances between countries. Soros has argued that real-world international financial markets are inherently volatile and unstable since “market participants are trying to discount a future that is itself shaped by market expectations”. This question is of course at the center of the debate on whether local and global markets are able to stabilize on their own after a crisis such as the Asian crisis that started in 1997.
In this example, to justify the intervention of the IMF (International Monetary Fund), Treasury Secretary Rubin warned in January 1998 that global markets would not be able to stabilize
in Asia on their own, and that a strong role on the part of the IMF, other international institutions and governments was necessary, lest the crisis spread to other emerging markets in Latin America and Eastern Europe. The following analogy with forest fires is useful to illustrate the nature of the problem. In many areas around the world, the dry season sees numerous large wildfires, sometimes with the deaths of firefighters and other people and the destruction of many structures and large forests. It is widely accepted that livestock grazing, timber harvesting and fire suppression over the past century have led to unnatural conditions in the pine forests of the western U.S.A., in the Mediterranean countries and elsewhere. These conditions are excessive biomass (too many trees without sufficient biodiversity, and too much dead woody material) and an altered species mix. They make the forests more susceptible to drought, insect and disease epidemics, and other forest-wide catastrophes, in particular large wildfires (Gorte, 1995). Interest in fuel management, to reduce fire control costs and damages, has thus been renewed by the numerous destructive wildfires across the western U.S.A. The most often used technique of fuel management is fire suppression. Recent reviews compare Southern California, where management has been active since 1900, with Baja California (northern Mexico), where management is essentially absent (a “let-burn” strategy). They highlight a remarkable fact (Minnich and Chou, 1997; Moreno, 1998): only small and relatively moderate patches of fire occur in Baja California, compared to a wide distribution of fire sizes in Southern California, including huge destructive fires. The selective elimination of small fires (those that can be controlled) in normal weather in Southern California restricts large fires to extreme weather episodes, a process that encourages broad-scale high spread rates and intensities.
It is found that the danger of fire suppression is the inevitable development of coarse-scale bush fuel patchiness and large, intense fires, in contrast to the natural self-organization of small patchiness in let-burn areas. Taken at face value, the “let-burn” theory paradoxically seems to be the correct strategy, maximizing the protection of property and resources at minimal cost. This conclusion seems correct when the fuel is left on its own to self-organize in a way consistent with the dynamics of fires. In other words, the fuel–fire complex constitutes a nonlinear system with negative and positive feedbacks that may be close to optimal. More fuel favors fire. Fires decrease the instantaneous level of fuel but may accelerate its future production. Many small fires create natural barriers against the development and extension of large fires. Fires produce rich nutrients in the soil. Fires also have other benefits, for instance for a few serotinous species, notably lodgepole pine and jack pine, whose cones only open and spread their seeds after exposure to the heat of a wildfire. The possibility for complex nonlinear systems to find the “optimal” solution, or to come close to it, has been stressed before in several contexts (Crutchfield and Mitchell, 1995; Miltenberger et al., 1993; Sornette et al., 1994). Let us mention for instance a model of fault networks interacting through the elastic deformation of the crust and rupturing during earthquakes. This model finds that faults are the optimal geometrical structures accommodating the tectonic deformation: they result from a global mathematical optimization problem that the dynamics of the system solves in an analog computation, i.e., by following its self-organizing dynamics (as opposed to the digital computation performed by digital computers).
One notable form of such organization is called self-organized criticality (Bak, 1996; Sornette, 2000a); it has been applied in particular to explain forest fire distributions (Malamud et al., 1998). Baja California could be representative of this self-organized regime of the fuel–fire complex left to itself, leading to many small fires and few big ones. Southern California could illustrate the
situation where interference both in the production of fuel and in its combustion by fires (by trying to stop fires) leads to a very broad distribution, with many small and moderate controlled fires and too many uncontrollable very large ones. Where do stock markets stand in this picture? Proponents of the “left-alone” approach could get ammunition from the Baja–Southern California comparison, but they would forget an essential element: stock markets and economies are more like Southern California than Baja California. They are not isolated. Even if no government or regulation interferes, they are “forced” by many external economic, political and climatic influences that impact them and on which they may also have some impact. If the example of wildland fires has something to teach us, it is this: we must incorporate in our understanding both the self-organizing dynamics of the fuel–fire complex and the different exogenous sources of randomness (weather and wind regimes, the distribution of natural lightning strikes, and so on). The question of whether some regulation could be useful translates into whether Southern California fires would be better left alone. Since the management approach does not function fully satisfactorily, one may wonder whether the let-burn scenario would not be better. This was in fact implemented in Yellowstone National Park as the “let-burn” policy, but was abandoned following the huge Yellowstone fires of 1988. Even the “let-burn” strategy may turn out to be unrealistic from a societal point of view, because allowing a specific fire to burn may lead to socially unbearable risks or emotional sensitivity. These are often discounted over a very short time horizon, as opposed to the long-term view of land management implicit in the let-burn strategy.
We suggest that the most momentous events in stock markets, large financial crashes, can indeed be seen as the response of a self-organized system forced by a multitude of external factors in the presence of regulations. The external forcing is an essential element to consider, and it modifies the perspective on the “left-alone” scenario. For instance, during the recent Asian crises, the International Monetary Fund and the U.S. government considered controls on the international flow of capital to be counterproductive or impractical. J.E. Stiglitz, chief economist of the World Bank until 2000, has argued that in some cases it was justified to restrict short-term flows of money in and out of a developing economy. He has also argued that industrialized countries sometimes pushed developing nations too fast to deregulate their financial systems. The challenge remains, as always, to encourage and work with countries that are ready and able to implement strong corrective actions. It also remains to cooperate toward finding the financial solutions best suited to the needs of the individual case and to the broader functioning of the global financial system when difficulties arise (Checki and Stern, 2000). Another important issue concerns the endogenous versus exogenous nature of shocks. Sornette et al. (2002) have shown that it is possible in some cases to distinguish the effects on financial volatility of exogenous shocks, such as the September 11, 2001 attack or the coup against Gorbachev on August 19, 1991, from those of endogenous shocks, such as the financial crash of October 1987 or smaller volatility bursts. They use a parsimonious autoregressive process with long-range memory defined on the logarithm of the volatility (the “multifractal random walk”). With it, they predict strikingly different response functions of the price volatility to great external shocks compared to endogenous shocks, i.e., those resulting from the cooperative accumulation of many small shocks.
This approach views the origin of endogenous shocks as the coherent accumulation of many tiny pieces of bad news, and thus provides a natural unification of previous explanations of large crashes, including that of October 1987. Sornette and Helmstetter (2003) have suggested that these results are generally valid for systems with long-range
persistence and memory, which can exhibit different precursory as well as recovery patterns in response to shocks of exogenous versus endogenous origin. Endogenous fluctuations may result either from an underlying chaotic dynamics or from a stochastic forcing, which may be external or may be an effective coarse-grained description of the microscopic fluctuations. In this scenario, endogenous shocks result from a kind of constructive interference of accumulated fluctuations whose impacts survive longer than the large shocks themselves. As a consequence, compared with recovery after an exogenous perturbation, recovery after an endogenous shock is in general slower at early times, and can be either slower or faster at long times. This offers the tantalizing possibility of distinguishing between an endogenous and an exogenous cause of a given shock, even when there is no “smoking gun”. This could help in investigating exogenous versus self-organized origins in problems such as the causes of major biological extinctions, changes of weather regimes and of the climate, the sources of social upheaval and wars, and so on.

8.3. Predictions

Ultimately, only forward predictions can demonstrate the usefulness of a theory, so only time will tell. Nevertheless, as suggested by the many examples reported in Section 7, the analysis points to an interesting predictive potential. A fundamental question, however, concerns the use of a reliable crash prediction scheme, if any. Assume that a prediction is issued stating that a crash with an amplitude between 20% and 30% will occur between one and two months from now. At least three different scenarios are possible (Johansen and Sornette, 2000a):
• Nobody believes the prediction, which was then futile, and, assuming that the prediction was correct, the market crashes. One may consider this a victory for the “predictors”. However, as we experienced with our quantitative prediction of the change in regime of the Nikkei index (Johansen and Sornette, 1999c, 2000b), some critics would consider it just another “lucky one” without any statistical significance.
• Everybody believes the warning, which causes panic, and the market crashes as a consequence. The prediction hence seems self-fulfilling, and the success is attributed more to the panic effect than to real predictive power.
• Sufficiently many investors believe that the prediction may be correct and make reasonable adjustments, and the steam goes off the bubble. The prediction hence disproves itself.
None of these scenarios is attractive. In the first two, the crash is not avoided; in the last, the prediction disproves itself and, as a consequence, the theory looks unreliable. This seems to be the inescapable lot of scientific investigations of systems with learning and reflective abilities, in contrast with the usual inanimate and unchanging physical laws of nature. Furthermore, this touches on the key problem of scientific responsibility. Naturally, scientists have a responsibility to publish their findings. However, when it comes to the practical implementation of those findings in society, the question becomes considerably more complex, as history has taught us. We believe, however, that increased awareness of the potential for market instabilities, offered in particular by our approach, will help in constructing a more stable and efficient stock market. Specific guidelines for prediction and careful tests are presented in Sornette and Johansen (2001) and especially in Sornette (2003). In particular, Sornette (2003) explains how and to what degree
crashes, as well as other large market events, may be predicted. This work examines in detail the forecasting skills of the proposed methodology and their limitations, in particular in terms of the horizon of visibility and expected precision. Several case studies are presented in detail, with a careful count of successes and failures. See also Johansen and Sornette (2001b) for applications to emerging markets, Johansen and Sornette for a systematic test of the correspondence between outliers and preceding log-periodic power-law signatures, and Sornette and Zhou (2002) for a live prediction of the evolution of the U.S. stock market over the next two years, from August 2002 to the first half of 2004.

Acknowledgements

This paper is extracted in part from the book which develops and documents this theme in depth (Sornette, 2003). I acknowledge the fruitful and inspiring discussions and collaborations with Y. Ageon, J.V. Andersen, S. Gluzman, Y. Huang, K. Ide, P. Jögi, O. Ledoit, M.W. Lee, Y. Malevergne, V.F. Pisarenko, H. Saleur, D. Stauffer, W.-X. Zhou and especially A. Johansen.

References

Adam, M.C., Szafarz, A., 1992. Oxford Economic Papers 44, 626–640.
Andersen, J.V., Sornette, D., 2001. Have your cake and eat it too: increasing returns while lowering large risks! J. Risk Finance 2 (3), 70–82.
Andersen, J.V., Gluzman, S., Sornette, D., 2000. Fundamental framework for technical analysis. European Phys. J. B 14, 579–601.
Anderson, P.W., Arrow, K.J., Pines, D. (Eds.), 1988. The Economy as an Evolving Complex System. Addison-Wesley, New York.
Arad, I., Biferale, L., Celani, A., Procaccia, I., Vergassola, M., 2001. Statistical conservation laws in turbulent transport. Phys. Rev. Lett. 87 (16), 164502.
Assoe, K.G., 1998. Regime-switching in emerging stock market returns. Multinational Finance Journal 2, 101–132.
Bak, P., 1996. How Nature Works: the Science of Self-organized Criticality. Copernicus, New York, NY, USA.
Barber, B.M., Lyon, J.D., 1997. Detecting long-run abnormal stock returns: the empirical power and specification of test statistics. J. Fin. Econom. 43 (3), 341–372.
Barra, F., Davidovitch, B., Procaccia, I., 2002. Iterated conformal dynamics and Laplacian growth. Phys. Rev. E 65 (4), 046144.
Barro, R.J., Fama, E.F., Fischel, D.R., Meltzer, A.H., Roll, R., Telser, L.G., 1989. In: Kamphuis, R.W., Kormendi, Jr., R.C., Watson, J.W.H. (Eds.), Black Monday and the Future of Financial Markets. Mid American Institute for Public Policy Research, Inc. and Dow Jones-Irwin, Inc.
Basle Committee on Banking Supervision, 1997. Core Principles for Effective Banking Supervision. Basle, September.
Bassi, F., Embrechts, P., Kafetzaki, M., 1998. Risk management and quantile estimation. In: Adler, R.J., Feldman, R.E., Taqqu, M. (Eds.), A Practical Guide to Heavy Tails. Birkhäuser, Boston, pp. 111–130.
Bikhchandani, S., Hirshleifer, D., Welch, I., 1992. A theory of fads, fashion, custom, and cultural change as informational cascades. J. Pol. Econom. 100, 992–1026.
Blanchard, O.J., 1979. Econom. Lett. 3, 387–389.
Blanchard, O.J., Watson, M.W., 1982. Bubbles, rational expectations and speculative markets. In: Wachtel, P. (Ed.), Crisis in Economic and Financial Structure: Bubbles, Bursts, and Shocks. Lexington Books, Lexington.
Boissevain, J., Mitchell, J., 1973. Network Analysis: Studies in Human Interaction. Mouton.
Bouchaud, J.-P., Cont, R., 1998. A Langevin approach to stock market fluctuations and crashes. Eur. Phys. J. B 6, 543–550.
Cai, J., 1994. A Markov model of switching-regime ARCH. J. Business Econom. Stat. 12, 309–316.
Callen, E., Shapero, D., 1974. A theory of social imitation. Phys. Today (July) 23–28.
Camerer, C., 1989. Bubbles and fads in asset prices. J. Econom. Surveys 3, 3–41.
Campbell, J.Y., Lo, A.W., MacKinlay, A.C., 1997. The Econometrics of Financial Markets. Princeton University Press, Princeton, NJ.
Chaitin, G.J., 1987. Algorithmic Information Theory. Cambridge University Press, Cambridge and New York.
Chauvet, M., 1998. An econometric characterization of business cycle dynamics with factor structure and regime switching. International Econom. Rev. 39, 969–996.
Checki, T.J., Stern, E., 2000. Financial crises in the emerging markets: the roles of the public and private sectors. Current Issues in Economics and Finance (Federal Reserve Bank of New York) 6 (13), 1–6.
Chen, N.-F., Cuny, C.J., Haugen, R.A., 1995. Stock volatility and the levels of the basis and open interest in futures contracts. J. Finance 50, 281–300.
Chowdhury, D., Stauffer, D., 1999. A generalized spin model of financial markets. Eur. Phys. J. B 8, 477–482.
Coe, P.J., 2002. Financial crisis and the great depression: a regime switching approach. J. Money, Credit, Banking 34 (1), 76–93.
Commission on Physical Sciences, Mathematics, and Applications, 1990. Computing and communications in the extreme: research for crisis management and other applications. Steering Committee, Workshop Series on High Performance Computing and Communications, Computer Science and Telecommunications Board. National Academy Press, Washington, DC.
Cont, R., Bouchaud, J.-P., 2000. Herd behavior and aggregate fluctuations in financial markets. Macroeconom. Dyn. 4, 170–196.
Cootner, P.H. (Ed.), 1967. The Random Character of Stock Market Prices. M.I.T. Press, Cambridge, MA.
Corcos, A., Eckmann, J.-P., Malaspinas, A., Malevergne, Y., Sornette, D., 2002. Imitation and contrarian behavior: hyperbolic bubbles, crashes and chaos. Quantitative Finance 2, 264–281.
Crutchfield, J.P., Mitchell, M., 1995. The evolution of emergent computation. Proc. Nat. Acad. Sci. U.S.A. 92, 10742–10746.
De Bandt, O., Hartmann, P., 2000. Systemic risk: a survey. Financial Economics and International Macroeconomics, Discussion Paper Series No. 2634.
Devenow, A., Welch, I., 1996. Rational herding in financial markets. European Econom. Rev. 40, 603–616.
Diebold, F.X., Schuermann, T., Stroughair, J.D., 2001. Pitfalls and opportunities in the use of extreme value theory in risk management. Preprint.
Driffill, J., Sola, M., 1998. Intrinsic bubbles and regime-switching. J. Monetary Econom. 42, 357–373.
Drozdz, S., Ruf, F., Speth, J., Wojcik, M., 1999. Imprints of log-periodic self-similarity in the stock market. European Phys. J. B 10, 589–593.
Dubrulle, B., Graner, F., Sornette, D. (Eds.), 1997. Scale Invariance and Beyond. EDP Sciences and Springer, Berlin.
Dunning, T.J., 1860. Trades' Unions and Strikes. London.
Dupuis, H., 1997. Un krach avant Novembre. Tendances, 18 September, p. 26.
Embrechts, P., Klüppelberg, C., Mikosch, T., 1997. Modelling Extremal Events. Springer-Verlag, Berlin, 645 pp.
Falkovich, G., Gawedzki, K., Vergassola, M., 2001. Particles and fields in fluid turbulence. Rev. Mod. Phys. 73 (4), 913–975.
Fama, E.F., 1991. Efficient capital markets: II. J. Finance 46, 1575–1617.
Farmer, J.D., 1998. Market force, ecology and evolution. Preprint at adap-org/9812005.
Feigenbaum, J.A., 2001. A statistical analysis of log-periodic precursors to financial crashes. Quant. Finance 1, 346–360.
Feigenbaum, J.A., Freund, P.G.O., 1996. Discrete scale invariance in stock markets before crashes. Int. J. Mod. Phys. B 10, 3737–3745.
Feigenbaum, J.A., Freund, P.G.O., 1998. Discrete scale invariance and the “second black Monday”. Modern Phys. Lett. B 12, 57–60.
Feldman, R.A., 1982. Dollar appreciation, foreign trade, and the U.S. economy. Federal Reserve Bank of New York Quart. Rev. 7, 1–9.
Fieleke, N.S., 1985. Dollar appreciation and U.S. import prices. New England Econom. Rev. (November–December) 49–54.
Frankel, J.A., Froot, K.A., 1988. Chartists, fundamentalists and the demand for dollars. Greek Econom. Rev. 10, 49–102.
Frankel, J.A., Froot, K.A., 1990. Chartists, fundamentalists, and trading in the foreign exchange market. Amer. Econom. Rev. 80, 181–185.
Galbraith, J.K., 1997. The Great Crash, 1929. Houghton Mifflin, Boston.
Gaunersdorfer, A., 2000. Endogenous fluctuations in a simple asset pricing model with heterogeneous agents. J. Econom. Dyn. Control 24, 799–831.
Geller, R.J., Jackson, D.D., Kagan, Y.Y., Mulargia, F., 1997a. Geoscience: earthquakes cannot be predicted. Science 275 (N5306), 1616–1617.
Geller, R.J., Jackson, D.D., Kagan, Y.Y., Mulargia, F., 1997b. Cannot earthquakes be predicted? Responses. Science 278 (N5337), 488–490.
Gluzman, S., Yukalov, V.I., 1998. Booms and crashes in self-similar markets. Modern Phys. Lett. B 12, 575–587.
Goldenfeld, N., 1992. Lectures on Phase Transitions and the Renormalization Group. Addison-Wesley Publishing Company, Reading, MA.
Gorte, R.W., 1995. Forest fires and forest health. Congressional Research Service Report. The Committee for the National Institute for the Environment, 1725 K Street, NW, Suite 212, Washington, DC 20006.
Gould, S.J., Eldredge, N., 1993. Punctuated equilibrium comes of age. Nature 366, 223–227.
Graham, J.R., 1999. Herding among investment newsletters: theory and evidence. J. Finance 54, 237–268.
Grant, J.L., 1990. Stock return volatility during the crash of 1987. J. Portfolio Manage. 16, 69–71.
Grassia, P.S., 2000. Delay, feedback and quenching in financial markets. Eur. Phys. J. B 17, 347–362.
Gray, S.F., 1996. Regime-switching in Australian short-term interest rates. Account. Finance 36, 65–88.
Grinblatt, M., Titman, S., Wermers, R., 1995. Momentum investment strategies, portfolio performance, and herding: a study of mutual fund behavior. Amer. Econom. Rev. 85, 1088–1105.
Hamilton, J.B., 1989. A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica 57, 357–384.
Harris, L., 1997. Circuit breakers and program trading limits: what have we learned? In: The 1987 Crash, Ten Years Later: Evaluating the Health of the Financial Markets. October 1997 Conference, published in Vol. II of the annual Brookings-Wharton Papers on Financial Services. The Brookings Institution Press, Washington, DC.
Helbing, D., Farkas, I., Vicsek, T., 2000. Simulating dynamical features of escape panic. Nature 407, 487–490.
Holland, J.H., 1992. Complex adaptive systems. Daedalus 121, 17–30.
Holldobler, B., Wilson, E.O., 1994. Journey to the Ants: a Story of Scientific Exploration. Belknap Press of Harvard University Press, Cambridge, MA.
Holmes, P.A., 1985. How fast will the dollar drop? Nation’s Business 73, 16.
Hsieh, D.A., 1995. Nonlinear dynamics in financial markets: evidence and implications. Financial Analysts J. (July–August), 55–62.
Huberman, G., Regev, T., 2001. Contagious speculation and a cure for cancer: a nonevent that made stock prices soar. J. Finance 56, 387–396.
Ide, K., Sornette, D., 2002. Oscillatory finite-time singularities in finance, population and rupture. Physica A 307 (1–2), 63–106.
Johansen, A., Ledoit, O., Sornette, D., 2000. Crashes as critical points. Int. J. Theoret. Appl. Finance 3, 219–255.
Johansen, A., Sornette, D., 1998. Stock market crashes are outliers. European Phys. J. B 1, 141–143.
Johansen, A., Sornette, D., 1999a. Critical crashes. Risk 12 (1), 91–94.
Johansen, A., Sornette, D., 1999b. Modeling the stock market prior to large crashes. European Phys. J. B 9 (1), 167–174.
Johansen, A., Sornette, D., 1999c. Financial “anti-bubbles”: log-periodicity in Gold and Nikkei collapses. Int. J. Mod. Phys. C 10, 563–575.
Johansen, A., Sornette, D., 2000a. The Nasdaq crash of April 2000: yet another example of log-periodicity in a speculative bubble ending in a crash. European Phys. J. B 17, 319–328.
Johansen, A., Sornette, D., 2000b. Evaluation of the quantitative prediction of a trend reversal on the Japanese stock market in 1999. Int. J. Mod. Phys. C 11, 359–364.
Johansen, A., Sornette, D., 2001a. Finite-time singularity in the dynamics of the world population and economic indices. Physica A 294, 465–502.
Johansen, A., Sornette, D., 2001b. Bubbles and anti-bubbles in Latin-American, Asian and Western stock markets: an empirical study. Int. J. Theoret. Appl. Finance 4 (6), 853–920.
Johansen, A., Sornette, D., 2002. Large stock market price drawdowns are outliers. J. Risk 4 (2), 69–110; e-print http://arXiv.org/abs/cond-mat/0210509.
Johansen, A., Sornette, D., Ledoit, O., 1999. Predicting financial crashes using discrete scale invariance. J. Risk 1, 5–32.
Kadanoff, L.P., 2002. Wolfram on cellular automata; A clear and very personal exposition (Book review). Physics Today (July), 55–56.
Kaminsky, G., Peruga, R., 1991. Credibility crises: the dollar in the early 1980s. J. Int. Money Finance 10, 170–192.
Karplus, W.J., 1992. The Heavens are Falling: The Scientific Prediction of Catastrophes in Our Time. Plenum Press, New York.
Keynes, J.M., 1936. The General Theory of Employment, Interest and Money. Harcourt, Brace, New York (Chapter 12).
Kindleberger, C.P., 2000. Manias, Panics, and Crashes: a History of Financial Crises, 4th edition. Wiley, New York.
Kirman, A., 1991. Epidemics of opinion and speculative bubbles in financial markets. In: Taylor, M. (Ed.), Money and Financial Markets. Macmillan, UK.
Knetter, M.M., 1994. Did the strong dollar increase competition in U.S. product markets? Rev. Econom. Stat. 76, 192–195.
Knuth, D.E., 1969. The Art of Computer Programming, Vol. 2. Addison-Wesley Publ., Reading, MA, pp. 1–160.
Koller, T., Zane, D.W., 2001. What happened to the bull market? The McKinsey Quarterly Newsletter 4 (August 2001), http://www.mckinseyquarterly.com.
Krawiecki, A., Holyst, J.A., Helbing, D., 2002. Volatility clustering and scaling for financial time series due to attractor bubbling. Phys. Rev. Lett. 89 (15), 158701.
Krugman, P., 1998. I know what the Hedgies did last summer. Fortune, December issue.
Laherrère, J., Sornette, D., 1998. Stretched exponential distributions in Nature and Economy: “Fat tails” with characteristic scales. European Phys. J. B 2, 525–539.
Lamont, O., 1988. Earnings and expected returns. The J. Finance LIII, 1563–1587.
Levy, M., Levy, H., Solomon, S., 1995. Microscopic simulation of the stock market: the effect of microscopic diversity. J. Physique I 5, 1087–1107.
Levy, M., Levy, H., Solomon, S., 2000. The Microscopic Simulation of Financial Markets: from Investor Behavior to Market Phenomena. Academic Press, San Diego.
Liggett, T.M., 1985. Interacting Particle Systems. Springer, New York.
Liggett, T.M., 1997. Stochastic models of interacting systems. The Ann. Probab. 25, 1–29.
Lux, T., 1995. Herd behaviour, bubbles and crashes. Economic Journal: The J. Royal Economic Society 105, 881–896.
Lux, T., 1998. The socio-economic dynamics of speculative markets: interacting agents, chaos, and the fat tails of return distributions. J. Econom. Behavior Organ. 33, 143–165.
Lux, T., Marchesi, M., 1999. Scaling and criticality in a stochastic multi-agent model of a financial market. Nature 397, 498–500.
Lux, T., Marchesi, M., 2000. Volatility clustering in financial markets: a micro-simulation of interacting agents. Int. J. Theoret. Appl. Finance 3, 675–702.
L’vov, V.S., Pomyalov, A., Procaccia, I., 2001. Outliers, extreme events and multiscaling. Phys. Rev. E, 6305, PT2:6118, U158–U166.
MacDonald, R., Torrance, T.S., 1988. On risk, rationality and excessive speculation in the Deutschmark–US dollar exchange market: some evidence using survey data. Oxford Bull. Econom. Stat. 50, 107–123.
Malamud, B.D., Morein, G., Turcotte, D.L., 1998. Forest fires: an example of self-organized critical behavior. Science 281, 1840–1842.
Malkiel, B.G., 1999. A Random Walk Down Wall Street. WW Norton & Company, New York.
Mauboussin, M.J., Hiler, R., 1999. Rational exuberance? Equity Research Report of Credit Suisse First Boston, January 26.
Maug, E., Naik, N., 1995. Herding and delegated portfolio management: The impact of relative performance evaluation on asset allocation. Working paper, Duke University.
McNeil, A.J., 1999. Extreme value theory for risk managers. Preprint ETH Zentrum Zurich.
Megginson, W.L., 2000. The impact of privatization on capital market development and individual share ownership. Presentation at the 3rd FIBV Global Emerging Markets Conference and Exhibition, Istanbul, April 5–7. http://www.oecd.org/daf/corporate-affairs/privatisation/capital-markets/megginson/sld001.htm
Miltenberger, P., Sornette, D., Vanneste, C., 1993. Fault self-organization as optimal random paths selected by critical spatio-temporal dynamics of earthquakes. Phys. Rev. Lett. 71, 3604–3607.
Minnich, R.A., Chou, Y.H., 1997. Wildland fire patch dynamics in the chaparral of southern California and northern Baja California. International J. Wildland Fire 7, 221–248.
Montroll, E.W., Badger, W.W., 1974. Introduction to Quantitative Aspects of Social Phenomena. Gordon and Breach, New York.
Mood, A., 1940. The distribution theory of runs. Ann. Math. Stat. 11, 367–392.
Moreno, J.M. (Ed.), 1998. Large Forest Fires. Backhuys Publishers, Leiden.
Moss de Oliveira, S., de Oliveira, P.M.C., Stauffer, D., 1999. Evolution, Money, War and Computers. Teubner, Stuttgart-Leipzig.
Mulligan, C.B., Sala-i-Martin, X., 2000. Extensive margins and the demand for money at low interest rates. J. Pol. Economy.
Nature debates, 1999. Is the reliable prediction of individual earthquakes a realistic scientific goal? http://helix.nature.com/debates/earthquake/.
Onsager, L., 1944. Crystal statistics. I. A two-dimensional model with an order-disorder transition. Phys. Rev. 65, 117–149.
Orléan, A., 1984. Mimétisme et anticipations rationnelles: une perspective keynesienne. Recherches Economiques de Louvain 52, 45–66.
Orléan, A., 1986. L’auto-référence dans la théorie keynesienne de la spéculation. Cahiers d’Economie Politique, 14–15.
Orléan, A., 1989a. Comportements mimétiques et diversité des opinions sur les marchés financiers. In: Bourguinat, H., Artus, P. (Eds.), Théorie économique et crises des marchés financiers. Economica, Paris, pp. 45–65 (Chapter III).
Orléan, A., 1989b. Mimetic contagion and speculative bubbles. Theory Decision 27, 63–92.
Orléan, A., 1991. Disorder in the stock market (in French). La Recherche 22, 668–672.
Orléan, A., 1995. Bayesian interactions and collective dynamics of opinion: herd behavior and mimetic contagion. J. Econom. Behav. Organization 28, 257–274.
Pandey, R.B., Stauffer, D., 2000. Search for log-periodicity oscillations in stock market simulations. Int. J. Theoret. Appl. Finance 3, 479–482.
Phoa, W., 1999. Estimating credit spread risk using extreme value theory: application of actuarial disciplines to finance. J. Portfolio Management 25, 69–73.
Potters, M., Cont, R., Bouchaud, J.-P., 1998. Financial markets as adaptative ecosystems. Europhys. Lett. 41, 239–244.
Press, W.H., et al., 1992. Numerical Recipes. Cambridge University Press, Cambridge.
Roehner, B.M., Sornette, D., 1998. The sharp peak-flat trough pattern and critical speculation. European Phys. J. B 4, 387–399.
Roehner, B.M., Sornette, D., 2000. Thermometers of speculative frenzy. European Phys. J. B 16, 729–739.
Roll, R., 1988. The international crash of October 1987. Financial Anal. J. 4 (5), 19–35.
Romer, C.D., 1990. The great crash and the onset of the great depression. Quart. J. Econom. 105, 597–624.
Saleur, H., Sornette, D., 1996. Complex exponents and log-periodic corrections in frustrated systems. J. Phys. I France 6, 327–355.
Sato, A.H., Takayasu, H., 1998. Dynamic numerical models of stock market price: from microscopic determinism to macroscopic randomness. Physica A 250, 231–252.
Schaller, H., van Norden, S., 1997. Regime switching in stock market returns. Appl. Financial Econom. 7, 177–191.
Scharfstein, D., Stein, J., 1990. Herd behavior and investment. Amer. Econom. Rev. 80, 465–479.
Shefrin, H., 2000. Beyond Greed and Fear: Understanding Behavioral Finance and the Psychology of Investing. Harvard Business School Press, Boston, MA.
Shiller, R.J., 1989. Market Volatility. MIT Press, Cambridge, MA.
Shiller, R.J., 2000. Irrational Exuberance. Princeton University Press, Princeton, NJ.
Shleifer, A., 2000. Inefficient Markets: an Introduction to Behavioral Finance. Oxford University Press, Oxford, New York.
Sircar, R., Papanicolaou, G., 1998. General Black-Scholes models accounting for increased market volatility from hedging strategies. Appl. Math. Finance 5, 45–82.
Sornette, D., 1998. Discrete scale invariance and complex dimensions. Phys. Rep. 297, 239–270.
Sornette, D., 1999. Complexity, catastrophe and physics. Phys. World 12 (N12), 57.
Sornette, D., 2000a. Critical Phenomena in Natural Sciences, Chaos, Fractals, Self-organization and Disorder: Concepts and Tools. Springer Series in Synergetics, Heidelberg.
Sornette, D., 2000b. Stock market speculation: spontaneous symmetry breaking of economic valuation. Physica A 284, 355–375.
Sornette, D., 2002. Predictability of catastrophic events: material rupture, earthquakes, turbulence, financial crashes and human birth. Proceedings of the National Academy of Sciences USA, V99 (Supp.1), pp. 2522–2529.
Sornette, D., 2003. Why Stock Markets Crash: Critical Events in Complex Financial Systems. Princeton University Press, Princeton, NJ (456 pages, 165 figures, 21 tables).
Sornette, D., Andersen, J.V., 2002. A nonlinear super-exponential rational model of speculative financial bubbles. Int. J. Mod. Phys. C 13 (2), 171–188.
Sornette, D., Andersen, J.V., Simonetti, P., 2000a. Portfolio theory for “Fat Tails”. Int. J. Theoret. Appl. Finance 3 (3), 523–535.
Sornette, D., Helmstetter, A., 2003. Endogeneous versus exogeneous shocks in systems with memory. Physica A 318, 577.
Sornette, D., Johansen, A., 1997. Large financial crashes. Physica A 245, 411–422.
Sornette, D., Johansen, A., 1998. A hierarchical model of financial crashes. Physica A 261, 581–598.
Sornette, D., Johansen, A., 2001. Significance of log-periodic precursors to financial crashes. Quant. Finance 1 (4), 452–471.
Sornette, D., Johansen, A., Bouchaud, J.-P., 1996. Stock market crashes, precursors and replicas. J. Phys. I France 6, 167–175.
Sornette, D., Malevergne, Y., Muzy, J.F., 2002. Volatility fingerprints of large shocks: endogeneous versus exogeneous. Preprint at http://arXiv.org/abs/cond-mat/0204626 (Risk published February 2003).
Sornette, D., Miltenberger, P., Vanneste, C., 1994. Statistical physics of fault patterns self-organized by repeated earthquakes. Pure Appl. Geophys. 142, 491–527.
Sornette, D., Simonetti, P., Andersen, J.V., 2000b. φ^q-field theory for portfolio optimization: fat tails and non-linear correlations. Phys. Rep. 335, 19–92.
Sornette, D., Zhou, W.-X., 2002. The US 2000–2002 market descent: how much longer and deeper? Quantitative Finance 2 (6), 468–481.
Stauffer, D., 1999. Monte-Carlo-Simulation mikroskopischer Börsenmodelle. Physikalische Blätter 55, 49.
Stauffer, D., Aharony, A., 1994. Introduction to Percolation Theory, 2nd Edition. Taylor & Francis, London, Bristol, PA.
Stauffer, D., Sornette, D., 1999. Self-organized percolation model for stock market fluctuations. Physica A 271, 496–506.
Takayasu, H., Miura, H., Hirabayashi, T., Hamada, K., 1992. Statistical properties of deterministic threshold elements: the case of the market price. Physica A 184, 127–134.
Thaler, R.H. (Ed.), 1993. Advances in Behavioral Finance. Russell Sage Foundation, New York.
Trueman, B., 1994. Analyst forecasts and herding behavior. The Rev. Financial Studies 7, 97–124.
Van Norden, S., Schaller, H., 1993. The predictability of stock market regime: evidence from the Toronto stock exchange. Rev. Econom. Statist. 75, 505–510.
Van Norden, S., 1996. Regime switching as a test for exchange rate bubbles. J. Appl. Econom. 11, 219–251.
Vandewalle, N., Boveroux, P., Minguet, A., Ausloos, M., 1998a. The crash of October 1987 seen as a phase transition: amplitude and universality. Physica A 255, 201–210.
Vandewalle, N., Ausloos, M., Boveroux, P., Minguet, A., 1998b. How the financial crash of October 1997 could have been predicted. European Phys. J. B 4, 139–141.
Welch, I., 1992. Sequential sales, learning, and cascades. J. Finance 47, 695–732. See also http://welch.som.yale.edu/cascades for an annotated bibliography and resource reference on “information cascades”.
Welch, I., 2000. Herding among security analysts. J. Financial Econom. 58 (3), 369–396.
White, E.N., 1996. Stock market crashes and speculative manias. In: The International Library of Macroeconomic and Financial History, Vol. 13. An Elgar Reference Collection, Cheltenham, UK; Brookfield, US.
White, E.N., Rappoport, P., 1995. The New York stock market in the 1920s and 1930s: did stock prices move together too much? In: Bordo, M., Sylla, R. (Eds.), Anglo-American Financial Systems: Institutions and Markets in the Twentieth Century. Burr Ridge Irwin, pp. 299–316.
Wilson, E.O., 1971. The Insect Societies. Belknap Press of Harvard University Press, Cambridge, MA.
Wilson, K.G., 1979. Problems in Physics with many scales of length. Sci. Amer. 241 (2), 158–179.
Wolfram, S., 2002. A New Kind of Science. Wolfram Media, Inc.; ISBN: 1579550088.
Youssefmir, M., Huberman, B.A., Hogg, T., 1998. Bubbles and market crashes. Comput. Econom. 12, 97–114.
Zhou, W.-X., Sornette, D., 2002a. Statistical significance of periodicity and log-periodicity with heavy-tailed correlated noise. Int. J. Mod. Phys. C 13 (2), 137–170.
Zhou, W.-X., Sornette, D., 2002b. Generalized q-analysis of log-periodicity: applications to critical ruptures. Phys. Rev. E, in press, http://arXiv.org/abs/cond-mat/0201458.
Zhou, W.-X., Sornette, D., 2002c. Non-parametric analyses of log-periodic precursors to financial crashes (preprint at http://arXiv.org/abs/cond-mat/0205531).
Zwiebel, J., 1995. Corporate conservatism and relative compensation. J. Pol. Economy 103, 1–25.
Available online at www.sciencedirect.com
Physics Reports 378 (2003) 99 – 205 www.elsevier.com/locate/physrep
Dispersion relations in real and virtual Compton scattering
D. Drechsel (a), B. Pasquini (b, c, d), M. Vanderhaeghen (a, ∗)
(a) Institut für Kernphysik, Johannes Gutenberg-Universität, D-55099 Mainz, Germany
(b) ECT*-European Centre for Theoretical Studies in Nuclear Physics and Related Areas, I-38050 Villazzano (Trento), Italy
(c) INFN, Trento, Italy
(d) Dipartimento di Fisica, Università degli Studi di Trento, I-38050 Povo, Trento, Italy
Accepted 10 December 2002
editor: W. Weise
Abstract
A unified presentation is given of the use of dispersion relations in the real and virtual Compton scattering processes off the nucleon. The way in which dispersion relations for Compton scattering amplitudes establish connections between low-energy nucleon structure quantities, such as polarizabilities or anomalous magnetic moments, and the nucleon excitation spectrum is reviewed. We discuss various sum rules for forward real and virtual Compton scattering, such as the Gerasimov–Drell–Hearn sum rule and its generalizations, the Burkhardt–Cottingham sum rule, as well as sum rules for forward nucleon polarizabilities, and review their experimental status. Subsequently, we address the general case of real Compton scattering (RCS). Various types of dispersion relations for RCS are presented as tools for extracting nucleon polarizabilities from the RCS data. The information on nucleon polarizabilities gained in this way is reviewed, and the nucleon structure information encoded in these quantities is discussed. The dispersion relation formalism is then extended to virtual Compton scattering (VCS). The information on generalized nucleon polarizabilities extracted from recent VCS experiments is described, along with its interpretation in nucleon structure models. As a summary, the physics content of the existing data is discussed and some perspectives for future theoretical and experimental activities in this field are presented.
© 2003 Elsevier Science B.V. All rights reserved.
PACS: 11.55.Fv; 13.40.−f; 13.60.Fz; 14.20.Dh
Keywords: Dispersion relations; Electromagnetic processes and properties; Elastic and Compton scattering; Protons and neutrons
∗ Corresponding author. E-mail address: [email protected] (M. Vanderhaeghen).
© 2003 Elsevier Science B.V. All rights reserved. 0370-1573/03/$ - see front matter doi:10.1016/S0370-1573(02)00636-1
D. Drechsel et al. / Physics Reports 378 (2003) 99 – 205
Contents
1. Introduction
2. Forward dispersion relations and sum rules for real and virtual Compton scattering
2.1. Classical theory of dispersion and absorption in a medium
2.2. Real Compton scattering (RCS): nucleon polarizabilities and the GDH sum rule
2.3. Forward dispersion relations in doubly virtual Compton scattering (VVCS)
3. Dispersion relations in real Compton scattering (RCS)
3.1. Introduction
3.2. Kinematics
3.3. Invariant amplitudes and nucleon polarizabilities
3.4. RCS data for the proton and extraction of proton polarizabilities
3.5. Extraction of neutron polarizabilities
3.6. Unsubtracted fixed-t dispersion relations
3.7. Subtracted fixed-t dispersion relations
3.8. Hyperbolic (fixed-angle) dispersion relations
3.9. Comparison of different dispersion relation approaches to RCS data
3.10. Physics content of the nucleon polarizabilities
3.11. DR predictions for nucleon polarizabilities and comparison with theory
4. Dispersion relations in virtual Compton scattering (VCS)
4.1. Introduction
4.2. Kinematics and invariant amplitudes
4.3. Definitions of nucleon generalized polarizabilities
4.4. Fixed-t dispersion relations
4.4.1. s-channel dispersion integrals
4.4.2. Asymptotic parts and dispersive contributions beyond πN
4.5. VCS data for the proton and extraction of generalized polarizabilities
4.6. Physics content of the nucleon generalized polarizabilities
5. Conclusions and perspectives
Acknowledgements
Appendix A. t-channel exchange
Appendix B. Tensor basis
References
1. Introduction
The internal structure of the strongly interacting particles has been an increasingly active area of experimental and theoretical research over the past five decades. Precision experiments at high energy have clearly established Quantum Chromodynamics (QCD) as the underlying gauge theory describing the interaction between quarks and gluons, the elementary constituents of hadronic matter. However, the running coupling constant of QCD grows at low energies, and these constituents are confined to colorless hadrons, the mesons and baryons, which are the particles eventually observed by the detection devices. Therefore, we have to live with a dichotomy: the small value of the coupling constant at high energies allows for an interpretation of the experiments in terms of perturbative QCD, while the large value at low energies calls for a description in terms of the hadronic degrees of freedom, in particular in the approach developed as Chiral Perturbation Theory.
Between these two regions, at excitation energies between a few hundred MeV and 1–2 GeV, lies the interesting region of nucleon resonance structures, which is beyond the scope of either perturbation scheme. There is some hope that this regime will eventually be described by numerical solutions of QCD through lattice gauge calculations. At present, however, our understanding of resonance physics is still mostly based on phenomenology. In the absence of a descriptive theory it is essential to extract new and precise hadronic structure information, and in this quest electromagnetic probes have played a decisive role. In particular, high-precision Compton scattering experiments have become possible with the advent of modern electron accelerators with high current and duty factor and of laser backscattering facilities, in combination with high-precision, large-acceptance detectors. This intriguing new window offers, among other options, the possibility of precise and detailed investigations of the nucleon polarizability as induced by the applied electromagnetic multipole fields. The polarizability of a composite system is an elementary structure constant, just as are its size and shape. In a macroscopic medium, the electric and magnetic dipole polarizabilities are related to the dielectric constant and the magnetic permeability, and these in turn determine the index of refraction. These quantities can be studied by considering an incident electromagnetic wave inducing dipole oscillations in the constituent atoms or molecules of a target medium. These oscillations then emit dipole radiation leading, by way of interference with the incoming wave, to the complex amplitude of the transmitted wave. A general feature of these processes is the dispersion relation of Kronig and Kramers [1], which connects the real refraction index as a function of the frequency with a weighted integral of the extinction coefficient over all frequencies.
Dispersion theory in general relies on a few basic principles of physics: relativistic covariance, causality and unitarity. As a first step, a complete set of amplitudes has to be constructed, in accordance with relativity and free of kinematical singularities. Next, causality requires certain analytic properties of the amplitudes, which allow for a continuation of the scattering amplitudes into the complex plane and lead to dispersion relations connecting the real and imaginary parts of these amplitudes. Finally, the imaginary parts can be replaced by absorption cross sections by the use of unitarity, and as a result we can, for example, complete the Compton amplitudes from experimental information on photoabsorption and photo-induced reactions.
In Section 2 we first discuss the classical theory of dispersion and absorption in a medium, and briefly compare the polarizability of macroscopic matter and microscopic systems, atoms and nucleons. This is followed by a review of forward Compton scattering and its connection to total absorption cross sections. Combining dispersion relations and low energy theorems, we obtain sum rules for certain combinations of the polarizabilities and other ground state properties, e.g., the Gerasimov–Drell–Hearn sum rule for real photons [2,3], and the much debated Burkhardt–Cottingham sum rule for virtual Compton scattering [4] as obtained from radiative electron scattering. We then address the general case of real Compton scattering in Section 3. Besides the electric and magnetic (dipole) polarizabilities of a scalar system, the spin of the nucleon leads to four additional spin or vector polarizabilities, and higher multipole polarizabilities will appear with increasing photon energy. We show how these polarizabilities can be obtained from photon scattering and photoexcitation processes through a combined analysis based on dispersion theory.
The results of such an analysis are then compared in detail with the experimental data and predictions from theory. In Section 4 we discuss the more general case of virtual Compton scattering, which can be achieved by radiative electron–proton scattering. Such experiments have become possible only very recently. The nonzero four-momentum transfer squared of the virtual photon allows us to study generalized polarizabilities as a function of the four-momentum transfer squared and therefore, in some sense, to explore the spatial distribution of the polarization effects. In the last section, we summarize the pertinent features of our present knowledge on the nucleon polarizability and conclude by outlining some remaining challenges for future work.
This review is largely based on dispersion theory, whose development is related to Heisenberg’s idea that the interaction of particles can be described by their behavior at large distances, i.e., in terms of the S matrix [5]. The practical consequences of this program were worked out by Mandelstam and others [6]. An excellent primer for the beginner is the textbook of Nussenzveig [7]. In order to feel comfortable on Mandelstam planes and higher Riemann sheets, the review of Hoehler [8] is an absolute must for the practitioner. Concerning the structure aspect of our review, we refer the reader to a general treatise on the electromagnetic response of hadronic systems by Boffi et al. [9], and to the recent book of Thomas and Weise [10], which is focused on the structure aspects of the nucleon.
2. Forward dispersion relations and sum rules for real and virtual Compton scattering
2.1. Classical theory of dispersion and absorption in a medium
The classical theory of Lorentz describes the dispersion in a medium in terms of electrons bound by a harmonic force. In the presence of a monochromatic external field, $E_\omega$, the equations of motion take the form
$$\left(\frac{\partial^2}{\partial t^2} + 2\gamma_j\,\frac{\partial}{\partial t} + \omega_j^2\right) r(t) = -\frac{e}{m}\,E_\omega\,e^{-i\omega t}, \tag{1}$$
with $-e$ the charge¹ and $m$ the mass of the electron, and $\gamma_j > 0$ and $\omega_j > 0$ the damping constant and oscillator frequency, respectively, of a specific bound state $j$. The stationary solution for the displacement is then given by
$$r_j(t) = -\frac{e\,E_\omega\,e^{-i\omega t}}{m\,(\omega_j^2 - 2i\gamma_j\omega - \omega^2)}, \tag{2}$$
and the polarization $P$ is obtained by summing the individual dipole moments $d_j = -e\,r_j$ over all electrons and oscillator frequencies in the medium,
$$P(t) = \sum_j N_j\,\frac{e^2\,E_\omega\,e^{-i\omega t}}{m\,(\omega_j^2 - 2i\gamma_j\omega - \omega^2)} = P_\omega\,e^{-i\omega t}, \tag{3}$$
where $N_j$ is the number of electrons per unit volume in the state $j$. The dielectric susceptibility is defined by
$$P_\omega = \chi(\omega)\,E_\omega, \tag{4}$$
with
$$\chi(\omega) = \sum_j \frac{N_j e^2}{m}\,\frac{1}{\omega_j^2 - 2i\gamma_j\omega - \omega^2}. \tag{5}$$
¹ In Section 2.1 we use Gaussian units, as in most of the literature on theoretical electrodynamics, i.e., the fine structure constant takes the form $\alpha_{\rm em} = e^2/\hbar c \approx 1/137$ and the classical electron radius is $r_{\rm cl} = e^2/mc^2$. In all later sections Heaviside–Lorentz units will be used, in order to conform to the standard notation of particle physics.
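As a concrete check of Eqs. (1)–(5), the following minimal pure-Python sketch (our own illustration, not taken from the review) inserts the stationary solution of Eq. (2) into Eq. (1) and confirms that the polarization of Eq. (3) equals $\chi(\omega)E_\omega$ with $\chi$ from Eq. (5). The function names, the units $e = m = 1$ and the oscillator parameters are hypothetical choices made for illustration only.

```python
def displacement_amplitude(omega, omega_j, gamma_j, E=1.0, e=1.0, m=1.0):
    """Complex amplitude of Eq. (2), i.e. r_j(t) = amplitude * exp(-i*omega*t)."""
    return -e * E / (m * (omega_j**2 - 2j * gamma_j * omega - omega**2))


def eom_residual(omega, omega_j, gamma_j, E=1.0, e=1.0, m=1.0):
    """Left-hand side minus right-hand side of Eq. (1) for the stationary solution.

    Acting on amplitude * exp(-i*omega*t), d/dt is multiplication by -i*omega.
    """
    amp = displacement_amplitude(omega, omega_j, gamma_j, E, e, m)
    d = -1j * omega
    lhs = (d**2 + 2 * gamma_j * d + omega_j**2) * amp
    rhs = -(e / m) * E
    return lhs - rhs


def polarization_amplitude(omega, states, E=1.0, e=1.0, m=1.0):
    """P_omega of Eq. (3): sum over j of N_j * d_j, with d_j = -e * r_j."""
    return sum(N * (-e) * displacement_amplitude(omega, w_j, g_j, E, e, m)
               for N, w_j, g_j in states)


def susceptibility(omega, states, e=1.0, m=1.0):
    """chi(omega) of Eq. (5); states is a list of (N_j, omega_j, gamma_j)."""
    return sum(N * e**2 / (m * (w_j**2 - 2j * g_j * omega - omega**2))
               for N, w_j, g_j in states)


# Hypothetical oscillator parameters (N_j, omega_j, gamma_j) in units e = m = 1.
states = [(1.0, 1.0, 0.1), (0.5, 3.0, 0.2)]
omega, E = 1.5, 2.0

print(abs(eom_residual(omega, 1.0, 0.1)))             # zero up to rounding
print(abs(polarization_amplitude(omega, states, E)
          - susceptibility(omega, states) * E))       # Eq. (4), zero up to rounding
print(susceptibility(omega, states).imag > 0)         # True: absorption for omega > 0
```

Each term of Eq. (5) has a positive imaginary part for $\omega > 0$ and $\gamma_j > 0$, which is why the last line prints True. This is the usual sign of absorption in this convention.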
We observe at this point that $\chi(\omega)$ (I) is square integrable in the upper half-plane ($I_+$) along any line parallel to the real $\omega$ axis, and (II) has singularities only in the lower half-plane ($I_-$), in the form of pairs of poles at
$$\omega_\pm = \pm\sqrt{\omega_j^2 - \gamma_j^2} - i\gamma_j. \tag{6}$$
According to Titchmarsh’s theorem, these observations have the following consequences. The Fourier transform
$$\chi(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \chi(\omega)\,e^{-i\omega t}\,d\omega \tag{7}$$
is causal, i.e., the dielectric susceptibility and the polarization of the medium build up only after the electric field is applied, and the real and imaginary parts of $\chi$ are Hilbert transforms,
$$\mathrm{Re}\,\chi(\omega) = \frac{1}{\pi}\,\mathcal{P}\!\int_{-\infty}^{\infty} \frac{\mathrm{Im}\,\chi(\omega')}{\omega' - \omega}\,d\omega', \qquad \mathrm{Im}\,\chi(\omega) = -\frac{1}{\pi}\,\mathcal{P}\!\int_{-\infty}^{\infty} \frac{\mathrm{Re}\,\chi(\omega')}{\omega' - \omega}\,d\omega', \tag{8}$$
where $\mathcal{P}$ denotes the principal value integral. Applying the convolution theorem for Fourier transforms to Eq. (4), we obtain
$$P(t) = \int_{-\infty}^{\infty} \chi(t - t')\,E(t')\,dt', \tag{9}$$
with general time profiles $P(t)$ and $E(t)$ of medium polarization and external field, respectively, constructed according to Eq. (7). The proof of causality follows from integrating the dielectric susceptibility over a contour $C_+$ that runs along the real $\omega$ axis for $-R \le \omega \le R$ and is closed by a large half circle of radius $R$ in the upper half of the complex $\omega$-plane. Since no singularities appear within this contour, we have for any real $\tau$
$$\oint_{C_+} \chi(\omega)\,e^{-i\omega\tau}\,d\omega = 0. \tag{10}$$
We make contact with the Fourier transform of Eq. (7) by blowing up the contour ($R \to \infty$) and studying the convergence along the half circle. According to our observation (I), the function itself is square integrable in $I^+$, and therefore the convergence depends on the behavior of the exponential function $\exp(-i\omega\tau)$, which depends on the sign of $\tau$. In the case of $\tau < 0$ the convergence is improved by the exponential, and the contribution of the half-circle vanishes in the limit $R \to \infty$. Combining Eqs. (7) and (10), we then obtain

$$\chi(\tau) = 0 \quad \text{for } \tau < 0, \tag{11}$$
which enforces causality, as becomes obvious by inspecting Eq. (9): the electric field $\mathbf{E}(t')$ will affect the polarization $\mathbf{P}(t)$ only at some later time, $\tau = t - t' > 0$. For such times, $\tau > 0$, the contour $C_+$ is of course useless for our purpose, because the exponential overrides the convergence of $\chi$ in $I^+$. Therefore, the contour has to be closed in the lower half-plane, which picks up the contributions from the singularities in $I^-$. We note in passing that Eq. (11) describes the nonrelativistic causality condition, which has to be sharpened by the postulate of relativity that no signal can move faster than the velocity of light. Furthermore, causality is found to be a direct consequence of the analyticity of the Green function $\chi(\omega)$, which in the Lorentz model results from the choice $\gamma_j > 0$. For $\gamma_j < 0$, the poles of $\chi$ would move to the upper half-plane of $\omega$, and the result would be an acausal response, $\chi(\tau) \neq 0$ for $\tau < 0$ and $\chi(\tau) = 0$ for $\tau > 0$.

Next let us study the symmetry properties of $\chi$ under the ("crossing") transformation $\omega \to -\omega$. The real ($\chi_R$) and imaginary ($\chi_I$) parts of this function can be read off Eq. (5),

$$\chi_R(\omega) = -\frac{e^2}{m}\sum_j N_j\,\frac{\omega^2 - \omega_j^2}{(\omega^2-\omega_j^2)^2 + 4\gamma_j^2\omega^2}, \tag{12}$$

$$\chi_I(\omega) = \frac{e^2}{m}\sum_j N_j\,\frac{2\gamma_j\omega}{(\omega^2-\omega_j^2)^2 + 4\gamma_j^2\omega^2}, \tag{13}$$
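and the crossing relations for real $\omega$ values are

$$\chi_R(-\omega) = \chi_R(\omega), \qquad \chi_I(-\omega) = -\chi_I(\omega). \tag{14}$$

This makes it possible to cast Eq. (8) into the form

$$\chi_R(\omega) = \frac{2}{\pi}\,\mathcal{P}\!\int_0^\infty\frac{\omega'\chi_I(\omega')}{\omega'^2-\omega^2}\,d\omega', \qquad \chi_I(\omega) = -\frac{2\omega}{\pi}\,\mathcal{P}\!\int_0^\infty\frac{\chi_R(\omega')}{\omega'^2-\omega^2}\,d\omega'. \tag{15}$$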
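The first relation of Eq. (15) can be checked numerically for a single Lorentz oscillator. The sketch below is a minimal illustration under our own assumptions (one oscillator with $\omega_j = 1$, $\gamma_j = 0.1$, $N_j e^2/m = 1$ in arbitrary units; the helper names are hypothetical). It removes the principal-value singularity with the identity $\mathcal{P}\int_0^\infty d\omega'/(\omega'^2-\omega^2) = 0$ for $\omega > 0$, maps $\omega' = t/(1-t)$ onto $[0,1)$, and applies a plain midpoint rule, so no numerical library is needed. It also checks the crossing relations of Eq. (14).

```python
import math


def chi_lorentz(omega, omega_j=1.0, gamma_j=0.1, strength=1.0):
    """Single-oscillator susceptibility of Eq. (5); `strength` stands for N_j e^2 / m."""
    return strength / (omega_j ** 2 - 2j * gamma_j * omega - omega * omega)


def chi_real_from_kk(omega, n=20000, **params):
    """Re chi(omega) for omega > 0 from the first relation of Eq. (15).

    The pole is subtracted using P int_0^inf dw / (w^2 - omega^2) = 0, the
    substitution w = t / (1 - t) maps [0, inf) onto [0, 1), and the remaining
    smooth integrand is summed with the midpoint rule in t.
    """
    def h(w):
        # w * Im chi(w), the numerator of the dispersion integral
        return w * chi_lorentz(w, **params).imag

    h0 = h(omega)
    total = 0.0
    for k in range(n):
        t = (k + 0.5) / n
        w = t / (1.0 - t)
        jacobian = 1.0 / (1.0 - t) ** 2
        total += (h(w) - h0) / (w * w - omega * omega) * jacobian
    return 2.0 / math.pi * total / n


for w in (0.5, 2.0):
    print(w, chi_real_from_kk(w), chi_lorentz(w).real)
```

For these parameters the reconstructed real part should agree with the closed form of Eq. (12) well below the $10^{-3}$ level; the midpoint nodes never land exactly on $\omega' = \omega$ for the sample frequencies used here, so the removable singularity is never evaluated directly.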
The crossing relations of Eq. (14) can be combined and extended to complex values of $\omega$ by

$$\chi(-\omega^*) = \chi^*(\omega). \tag{16}$$
In particular, $\chi$ is real on the imaginary axis and takes on complex conjugate values at points situated mirror-symmetrically to this axis. The dielectric susceptibility can be expressed by the dielectric constant $\varepsilon$,

$$\chi(\omega) = \frac{\varepsilon(\omega) - 1}{4\pi}, \tag{17}$$
which in turn is related to the refraction index $n$ and the phase velocity $v_P$ in the medium,

$$\frac{\omega}{k(\omega)} = v_P(\omega) = \frac{c}{n(\omega)} = \frac{c}{\sqrt{\varepsilon(\omega)\,\mu(\omega)}}, \tag{18}$$

where $k$ is the wave number and $\mu$ the magnetic permeability of the medium. In the case of $\mu = 1$, it is obvious that $(\varepsilon - 1)$, and hence $(n^2 - 1)$, also obey the dispersion relations of Eq. (15). In a gas of low density, the refraction index is close to 1, and we can approximate $(n^2 - 1)$ by $2(n - 1)$. The result is the Lorentz dispersion formula for the oscillator model, obtained from Eqs. (5), (17) and (18),

$$n(\omega) = 1 + 2\pi\,\frac{e^2}{m}\sum_j\frac{N_j}{\omega_j^2 - 2i\gamma_j\omega - \omega^2}. \tag{19}$$
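To see when the dilute-gas step from Eq. (18) to Eq. (19) is justified, one can compare $n = \sqrt{1 + 4\pi\chi}$ (taking $\mu = 1$) with the linearized form $1 + 2\pi\chi$. The sketch below is a minimal illustration; the oscillator strengths are illustrative numbers of our choosing, not data for any real medium, and the function names are ours.

```python
import cmath
import math


def chi_lorentz(omega, oscillators):
    """Eq. (5); `oscillators` is a list of (N_j e^2 / m, omega_j, gamma_j) tuples."""
    return sum(s / (w * w - 2j * g * omega - omega * omega) for s, w, g in oscillators)


def n_exact(omega, oscillators):
    """n = sqrt(eps) for mu = 1, with eps = 1 + 4 pi chi from Eqs. (17) and (18)."""
    return cmath.sqrt(1.0 + 4.0 * math.pi * chi_lorentz(omega, oscillators))


def n_dilute(omega, oscillators):
    """Lorentz dispersion formula, Eq. (19): n ~ 1 + 2 pi chi, valid for |n - 1| << 1."""
    return 1.0 + 2.0 * math.pi * chi_lorentz(omega, oscillators)


gas = [(1e-4, 1.0, 0.1)]      # dilute medium (illustrative parameters)
liquid = [(1e-1, 1.0, 0.1)]   # dense medium where the linearization fails
```

For the dilute case the two expressions differ only at second order in $4\pi\chi$, while for the dense case the discrepancy is of order 0.2; in both cases $\mathrm{Im}\,n > 0$ above $\omega = 0$, i.e., absorption.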
Let us now discuss the connection between absorption and dispersion on the microscopic level. Suppose that a monochromatic plane wave hits a homogeneous and isotropic medium at $x = 0$ and leaves the slab of matter at $x = \Delta x$. The incoming wave is denoted by

$$\mathbf{E}_{\rm in}(x,t) = e^{i(kx-\omega t)}E_0\,\hat{\mathbf{e}}_0, \tag{20}$$
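with the linear dispersion $\omega = ck$ and the polarization vector $\hat{\mathbf{e}}_0$. Having passed the slab of matter with the dispersion of Eq. (19), the wave function is

$$\mathbf{E}_{\rm out}(\Delta x,t) = e^{i(\omega/c)\,n(\omega)\,\Delta x}\,e^{-i\omega t}E_0\,\hat{\mathbf{e}}_0 = e^{i(\omega/c)(n_R-1)\Delta x}\,e^{-(\omega/c)\,n_I\,\Delta x}\,\mathbf{E}_{\rm in}(\Delta x,t). \tag{21}$$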
The imaginary part of $n$ is associated with absorption, which defines an extinction coefficient $\kappa$ such that the intensity drops like $|\mathbf{E}_{\rm out}|^2 = e^{-\kappa\Delta x}|\mathbf{E}_{\rm in}|^2$. On the other hand, the extinction coefficient is related to the product of the total absorption cross section $\sigma_T$ for an individual constituent (e.g., a $^1$H atom) and the number of constituents per volume $N$, and therefore

$$\kappa(\omega) = 2\omega n_I/c = N\sigma_T(\omega). \tag{22}$$
Further on the elementary level, the incident light wave excites dipole oscillations of the constituents with electric dipole moments

$$\mathbf{d}(t) = \alpha\,\mathbf{E}_{\rm in}(0,t), \tag{23}$$
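with $\alpha = \alpha(\omega)$ the electric dipole polarizability of a constituent. We note that here and in the following the dipole approximation has been used, such that we can neglect retardation effects and evaluate the incoming wave at $x = 0$. Within the slab of matter, the dipole moments radiate, thus giving rise to an induced electric field $\mathbf{E}_s$. The field due to the individual dipole at $\mathbf{r}'$, measured at a point $\mathbf{r} = x\,\hat{\mathbf{e}}_x$ in beam direction, is

$$\mathbf{e}_s = k^2\alpha E_0\,\frac{e^{i(k\rho-\omega t)}}{\rho}\,(\hat{\boldsymbol{\rho}}\times\hat{\mathbf{e}}_0)\times\hat{\boldsymbol{\rho}}, \tag{24}$$

with $\hat{\boldsymbol{\rho}} = (\mathbf{r} - \mathbf{r}')/|\mathbf{r} - \mathbf{r}'|$ and $\rho = |\mathbf{r} - \mathbf{r}'|$.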
In particular, forward scattering is obtained in the limit $kx \gg 1$. Since the incoming field is polarized perpendicularly to this axis, we find

$$\mathbf{e}_s(\theta = 0) = k^2\alpha\,\frac{e^{i(k\rho-\omega t)}}{\rho}\,E_0\,\hat{\mathbf{e}}_0, \tag{25}$$
and by definition the forward scattering amplitude is

$$f(k, \theta = 0) = k^2\alpha. \tag{26}$$
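Combining Eq. (26) with the oscillator polarizability implied by Eqs. (2) and (23), $\alpha(\omega) = e^2/[m(\omega_j^2 - 2i\gamma_j\omega - \omega^2)]$ for one bound electron, gives a simple model amplitude $f = \omega^2\alpha(\omega)$ (with $c = 1$). The sketch below works in arbitrary units with $e^2/m = 1$ and uses hypothetical function names; it anticipates comment (I) further on. For a bound electron well below resonance, doubling $\omega$ multiplies $|f|^2$ by about 16 (the $\omega^4$ law of Rayleigh scattering), whereas for a free electron ($\omega_j = \gamma_j = 0$) the amplitude is the constant Thomson value $-e^2/m$.

```python
def forward_amplitude(omega, omega_j, gamma_j=0.0, e2_over_m=1.0):
    """Model amplitude f = k^2 alpha(omega) of Eq. (26) with c = 1.

    alpha(omega) = (e^2/m) / (omega_j^2 - 2i gamma_j omega - omega^2) is the
    single-electron polarizability implied by Eqs. (2) and (23).
    """
    alpha = e2_over_m / (omega_j ** 2 - 2j * gamma_j * omega - omega * omega)
    return omega * omega * alpha


def doubling_ratio(omega, omega_j, gamma_j=0.0):
    """|f(2 omega)|^2 / |f(omega)|^2: how d(sigma)/d(Omega) changes when omega doubles."""
    return (abs(forward_amplitude(2.0 * omega, omega_j, gamma_j)) ** 2
            / abs(forward_amplitude(omega, omega_j, gamma_j)) ** 2)


bound = doubling_ratio(0.01, omega_j=1.0)        # omega << omega_j: close to 2**4
free = doubling_ratio(0.01, omega_j=0.0)         # free electron: ratio 1
thomson = forward_amplitude(0.3, omega_j=0.0)    # -e^2/m = -1 in these units
```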
The total field due to the dipole oscillations, $\mathbf{E}_s$, is obtained by integrating Eq. (24) over the volume of the slab and multiplying by $N$, the number of particles per volume. The result for small $\Delta x$ is

$$\mathbf{E}_{\rm out} = \mathbf{E}_{\rm in} + \mathbf{E}_s \approx (1 + 2\pi i k\,\Delta x\,N\alpha)\,\mathbf{E}_{\rm in}. \tag{27}$$
A comparison of Eqs. (26) and (27) with the macroscopic form, Eq. (21), expanded for small $\Delta x$, yields the connection between the refractive index and the forward scattering amplitude,

$$n(\omega) - 1 = 2\pi N\alpha(\omega) = \frac{2\pi N}{k^2}\,f(k, \theta = 0). \tag{28}$$

From Eqs. (22) and (28) we obtain the optical theorem,

$$\mathrm{Im}\,f(\omega) = \frac{\omega}{4\pi}\,\sigma_T(\omega), \tag{29}$$

and since $f/k^2$ is proportional to $(n - 1)$ and $\alpha$, there follows a dispersion relation for $\mathrm{Re}\,f$ analogous to Eq. (15),

$$\mathrm{Re}\,f(\omega) = \frac{2\omega^2}{\pi}\,\mathcal{P}\!\int_0^\infty\frac{\mathrm{Im}\,f(\omega')}{\omega'(\omega'^2-\omega^2)}\,d\omega' = \frac{\omega^2}{2\pi^2}\,\mathcal{P}\!\int_0^\infty\frac{\sigma_T(\omega')}{\omega'^2-\omega^2}\,d\omega', \tag{30}$$

where we have set $c = \hbar = 1$ here and in the following. Historically, Eq. (30), expressed in terms of $n(\omega) - 1$, was first derived by Kronig and Kramers [1]. We also note that without the crossing symmetry, Eq. (14), the dispersion integral would also need information about the cross section at negative energies, which of course is not available. In order to prepare for the specific content of this review, several comments are in order:

(I) The derivation of the Kramers–Kronig dispersion relation started from a neutral system, an atom like the hydrogen atom. Since the total charge is zero, the electromagnetic field can only excite the internal degrees of freedom, while the center of mass remains fixed. As a consequence the scattering amplitude is $f(\omega) = O(\omega^2)$, which leads to a differential cross section

$$\frac{d\sigma}{d\Omega} = |f(\omega)|^2 = O(\omega^4).$$

The result is Rayleigh scattering, which among other things explains the blue sky. However, for charged systems like ions, electrons or protons, the center of mass will also be accelerated by the electromagnetic field, and the scattering amplitude takes the general form

$$\mathrm{Re}\,f(\omega, 0) = -\frac{Q_{\rm tot}^2}{M_{\rm tot}} + O(\omega^2). \tag{31}$$
The additional “Thomson” term due to c.m. motion results in a finite scattering amplitude for $\omega = 0$ and depends only on the total charge $Q_{\rm tot}$ and the total mass $M_{\rm tot}$.

(II) We have defined the electric dipole polarizability as a complex function $\alpha(\omega)$ whose real and imaginary parts can be calculated directly from the total absorption cross section $\sigma_T(\omega)$. In the Lorentz model this cross section starts as $\omega^2$ for small $\omega$. In reality, however, the total absorption cross section has a threshold energy $\omega_0$. The absorption spectrum of, say, hydrogen is given by a series of discrete levels ($\omega_{1s\to 2p} = 10.2$ eV, etc.) followed by a continuum for $\omega > 13.6$ eV. As a result $\sigma_T(\omega)$ vanishes in the range $0 \le \omega < \omega_0$, and therefore $\alpha(\omega) = \mathrm{Re}\,\alpha(\omega)$ can be expanded in a Taylor series in the vicinity of the origin,

$$\alpha(\omega) = \frac{1}{2\pi^2}\int_{\omega_0}^\infty\frac{\sigma_T(\omega')}{\omega'^2}\,d\omega' + \frac{\omega^2}{2\pi^2}\int_{\omega_0}^\infty\frac{\sigma_T(\omega')}{\omega'^4}\,d\omega' + \cdots \tag{32}$$

In the following chapters we shall use the term “polarizability”, or more exactly “static polarizability”, only for the first term of the expansion. Moreover, in the dipole expansion used in Eq. (23), this first term is solely determined by electric dipole (E1) radiation,

$$\alpha \equiv \alpha(\omega = 0) = \frac{1}{2\pi^2}\int_{\omega_0}^\infty\frac{\sigma_T(\omega')}{\omega'^2}\,d\omega' > 0. \tag{33}$$

The terms $O(\omega^2)$ in Eq. (32) contain the first-order retardation effects for E1 radiation, and the full function $\alpha(\omega)$ will be called the “dynamical polarizability” of the system.

(III) Finally, the Lorentz model discards magnetic effects because of the small velocities involved in atomic systems. In a general derivation, the first term on the rhs of Eq. (32) equals the sum of the electric ($\alpha$) and magnetic ($\beta$) dipole polarizabilities, while the second term describes the retardation of these dipole polarizabilities and the static quadrupole polarizabilities.

Let us finally discuss the polarizability for some specific cases. The Hamiltonian for an electron bound by a harmonic restoring force, as in the Lorentz model of Eq. (1), takes the form

$$H = \frac{\mathbf{p}^2}{2m} + \frac{m\omega_0^2}{2}\,\mathbf{r}^2 + e\,\mathbf{r}\cdot\mathbf{E}, \tag{34}$$
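where the electric field $\mathbf{E}$ is assumed to be static and uniform. Substituting $\mathbf{r} = \mathbf{r}' + \Delta\mathbf{r}$ and $\mathbf{p} = \mathbf{p}'$, where $\Delta\mathbf{r}$ is the displacement due to the electric field, we may rewrite this equation as

$$H = \frac{\mathbf{p}'^2}{2m} + \frac{m\omega_0^2}{2}\,\mathbf{r}'^2 + \Delta E. \tag{35}$$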
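The displacement $\Delta\mathbf{r}$ leads to an induced dipole moment $\mathbf{d}$ and an energy shift $\Delta E$,

$$\mathbf{d} = -e\,\Delta\mathbf{r} = \frac{e^2}{m\omega_0^2}\,\mathbf{E}, \qquad \Delta E = -\frac{e^2\mathbf{E}^2}{2m\omega_0^2}. \tag{36}$$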
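The closed forms of Eq. (36) can be verified by minimizing the one-dimensional version of the potential in Eq. (34) and differentiating numerically. In this sketch (illustrative values $e = 0.5$, $m = 2$, $\omega_0 = 1.5$ in arbitrary units; the helper names are ours), the induced dipole and the energy shift yield the same coefficient $e^2/m\omega_0^2$. Because $d$ is exactly linear and $\Delta E$ exactly quadratic in $E$, the central differences reproduce it up to rounding.

```python
def displaced_oscillator(E, e=1.0, m=1.0, omega0=1.0):
    """1D version of Eq. (34): minimize m*omega0**2*r**2/2 + e*r*E over r.

    Returns the displacement dr and the energy shift dE at the minimum.
    """
    dr = -e * E / (m * omega0 ** 2)
    dE = 0.5 * m * omega0 ** 2 * dr ** 2 + e * dr * E
    return dr, dE


def polarizability_estimates(h=0.1, e=1.0, m=1.0, omega0=1.0):
    """Central finite differences: alpha from d = -e*dr and from -d^2(dE)/dE^2."""
    def dipole(E):
        return -e * displaced_oscillator(E, e, m, omega0)[0]

    def shift(E):
        return displaced_oscillator(E, e, m, omega0)[1]

    from_dipole = (dipole(h) - dipole(-h)) / (2.0 * h)
    from_energy = -(shift(h) - 2.0 * shift(0.0) + shift(-h)) / h ** 2
    return from_dipole, from_energy


a_dipole, a_energy = polarizability_estimates(e=0.5, m=2.0, omega0=1.5)
alpha_closed = 0.5 ** 2 / (2.0 * 1.5 ** 2)   # e^2 / (m omega0^2)
```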
The induced dipole moment $\mathbf{d}$ and the energy shift $\Delta E$ are both proportional to the polarizability, $\alpha = e^2/m\omega_0^2$, which can also be read off Eqs. (2) and (23) in the limit $\omega \to 0$. In fact,
the relation

$$\alpha = \frac{\partial d}{\partial E} = -\frac{\partial^2\Delta E}{\partial E^2} \tag{37}$$

is quite general and even survives in quantum mechanics. As a result we can calculate the energy of such a system by second-order perturbation theory. The perturbation to first order (linear Stark effect) vanishes for a system with good parity, and if the system is also spherically symmetric, the second order (quadratic Stark effect) yields

$$\Delta E = -\sum_{n>0}\frac{|\langle n|ez|0\rangle|^2}{\varepsilon_n - \varepsilon_0}\,E^2, \tag{38}$$

where $\varepsilon_n$ are the energies of the eigenstates $|n\rangle$. Eqs. (37) and (38) immediately yield the static electric dipole polarizability,

$$\alpha = 2\sum_{n>0}\frac{|\langle n|ez|0\rangle|^2}{\varepsilon_n - \varepsilon_0}. \tag{39}$$

As an example of a classical extended object we quote the electric ($\alpha$) and magnetic ($\beta$) dipole polarizabilities of small dielectric or permeable spheres of radius $a$ [11],

$$\alpha = \frac{\varepsilon - 1}{\varepsilon + 2}\,a^3, \qquad \beta = \frac{\mu - 1}{\mu + 2}\,a^3. \tag{40}$$

The same quantities for a perfectly conducting sphere are obtained in the limits $\varepsilon \to \infty$ and $\mu \to 0$, respectively,

$$\alpha = a^3, \qquad \beta = -\tfrac{1}{2}\,a^3. \tag{41}$$
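As a consistency check of Eq. (39), one can apply it to the isotropic harmonic oscillator of Eq. (34), for which the operator $z$ connects the ground state only to the state with one quantum of excitation along $z$. Using the standard oscillator matrix element (with $\hbar$ restored), the sum collapses to a single term:

```latex
\langle n|z|0\rangle = \sqrt{\frac{\hbar}{2m\omega_0}}\;\delta_{n1},
\qquad \varepsilon_1 - \varepsilon_0 = \hbar\omega_0
\quad\Longrightarrow\quad
\alpha = 2\,\frac{e^2\,\hbar/(2m\omega_0)}{\hbar\omega_0} = \frac{e^2}{m\omega_0^2}.
```

This reproduces the classical value read off Eq. (36), as expected for a system whose response is exactly linear in the field.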
The electric polarizability of the conducting sphere is essentially the volume of the sphere, up to a factor $4\pi/3$. Due to the different boundary conditions, the magnetic polarizability is negative, which corresponds to diamagnetism ($\mu < 1$). In this case the currents, and with them the magnetizations, are induced against the direction of the applied field according to Lenz's law. A permeable sphere can be diamagnetic or paramagnetic ($\mu > 1$); in the latter case the magnetic moments are already preformed and become aligned in the presence of the external field. While the magnetic polarizabilities of atoms and molecules are usually very small because $|\mu - 1| \lesssim 10^{-2}$, electric polarizabilities may be quite large compared to the volume. For example, the static dielectric constant of water, $\varepsilon = 81$, leads to a nearly perfect conductor; in the visible range this constant is down to $\varepsilon = 1.8$, with the consequence that the index of refraction is $n = 1.34$. A quantum mechanical example is the hydrogen atom in nonrelativistic description. Its ground state has good parity and spherical symmetry, and therefore Eq. (38) applies. In this case it is even possible to perform the sum over the excited states and to obtain the closed expression [12]

$$\alpha(^1\mathrm{H}) = \frac{9}{2}\,a_B^3, \tag{42}$$

where $a_B$ is the Bohr radius. The rms radius of $^1$H is given by $\langle r^2\rangle = 3a_B^2$, the radius of an equivalent hard sphere is given by $R^2 = 5a_B^2$, and as a result the hydrogen atom is a pretty good conductor, $\alpha/\text{volume} \approx 1/10$.
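The numerical statements in the preceding paragraph can be reproduced in a few lines. The sketch below (our own helper names; lengths in units of $a_B$ or of the sphere radius) evaluates Eq. (40) for water, the conductor limits of Eq. (41), and the ratio $\alpha/\text{volume}$ for hydrogen from Eq. (42) with the equivalent hard-sphere radius $R = \sqrt{5}\,a_B$.

```python
import math


def sphere_polarizabilities(eps, mu, a):
    """Eq. (40): (alpha, beta) of a small dielectric/permeable sphere of radius a."""
    return (eps - 1.0) / (eps + 2.0) * a ** 3, (mu - 1.0) / (mu + 2.0) * a ** 3


def sphere_volume(radius):
    return 4.0 / 3.0 * math.pi * radius ** 3


# Hydrogen, lengths in units of the Bohr radius a_B.
alpha_h = 4.5                                   # Eq. (42): (9/2) a_B^3
r_equiv = math.sqrt(5.0)                        # equivalent hard sphere, R^2 = 5 a_B^2
ratio_h = alpha_h / sphere_volume(r_equiv)      # about 0.096
ratio_conductor = 1.0 / sphere_volume(1.0)      # Eq. (41): alpha = a^3, about 0.24

# Water, using the dielectric constants quoted in the text.
alpha_water_static, _ = sphere_polarizabilities(81.0, 1.0, 1.0)   # 80/83 of a^3
n_visible = math.sqrt(1.8)                                         # about 1.34
```

For comparison, a perfectly conducting sphere has $\alpha/\text{volume} = 3/4\pi \approx 0.24$, so hydrogen reaches roughly 40% of the conductor value.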
In the following sections we report on the polarizabilities of the nucleon. As compared to hydrogen and other atoms, we shall find that the nucleon is a dielectric medium with $\varepsilon \approx 1.001$, i.e., a very good insulator. Furthermore, magnetic effects are a priori of the same order as the electric ones, because the charged constituents, the quarks, move with velocities close to the velocity of light. However, the diamagnetic effects of the pion cloud and the paramagnetic effects of the quark core of the nucleon tend to cancel, with the result of a relatively small net value of $\beta$. We shall see that “virtual” light allows one to gain information about the spatial distribution of the polarization densities, which will be particularly interesting to resolve the interfering effects of para- and diamagnetism. Furthermore, the nucleon has a spin and therefore appears as an anisotropic object in the case of polarized nucleons. This leads to additional spin polarizabilities whose closest parallel in classical physics is the Faraday effect.

2.2. Real Compton scattering (RCS): nucleon polarizabilities and the GDH sum rule

In this section we discuss the forward scattering of a real photon by a nucleon. The incident photon is characterized by the Lorentz vectors of momentum, $q = (q_0, \mathbf{q})$, and polarization, $\varepsilon = (0, \boldsymbol{\varepsilon})$, with $q\cdot q = 0$ (real photon) and $\varepsilon\cdot q = 0$ (transverse polarization). If the photon moves in the direction of the z-axis, $\mathbf{q} = q_0\hat{\mathbf{e}}_z$, the two polarization vectors may be taken as

$$\boldsymbol{\varepsilon}_\pm = \mp\frac{1}{\sqrt{2}}\,(\hat{\mathbf{e}}_x \pm i\hat{\mathbf{e}}_y), \tag{43}$$

corresponding to circularly polarized light with helicities $\lambda = +1$ (right-handed) and $\lambda = -1$ (left-handed). The kinematics of the outgoing photon is then described by the corresponding primed quantities. For the purpose of dispersion relations we choose the lab frame and introduce the notation $q_0^{\rm lab} = \nu$ for the photon energy in that system. The total c.m. energy $W$ is expressed in terms of $\nu$ as $W^2 = M^2 + 2M\nu$, where $M$ is the nucleon mass.
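The relation $W^2 = M^2 + 2M\nu$ is used repeatedly below to translate between lab photon energies and c.m. energies. The sketch below assumes $M = 938.272$ MeV and a charged-pion mass $m_\pi = 139.570$ MeV (our inputs, not taken from the text; function names are ours). Setting $W = M + m_\pi$ gives the pion-production threshold $\nu_0 = m_\pi(1 + m_\pi/2M)$ quoted later in this section, and the same functions convert $\nu = 1.66$ GeV to $W \approx 2$ GeV and $W = 200$ GeV to $\nu \approx 2\times 10^4$ GeV.

```python
import math

M_P = 938.272     # proton mass in MeV (our input)
M_PI = 139.570    # charged-pion mass in MeV (our input)


def w_from_nu(nu, mass=M_P):
    """c.m. energy W from the lab energy nu of a real photon: W^2 = M^2 + 2 M nu."""
    return math.sqrt(mass * mass + 2.0 * mass * nu)


def nu_from_w(w, mass=M_P):
    """Inverse relation, nu = (W^2 - M^2) / (2 M)."""
    return (w * w - mass * mass) / (2.0 * mass)


nu_threshold = nu_from_w(M_P + M_PI)   # equals m_pi * (1 + m_pi / (2 M))
w_end_of_resonances = w_from_nu(1660.0)
nu_desy = nu_from_w(200000.0)          # MeV
```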
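The forward Compton amplitude then takes the form

$$T(\nu, \theta = 0) = \boldsymbol{\varepsilon}'^*\cdot\boldsymbol{\varepsilon}\,f(\nu) + i\boldsymbol{\sigma}\cdot(\boldsymbol{\varepsilon}'^*\times\boldsymbol{\varepsilon})\,g(\nu). \tag{44}$$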
This is the most general expression that is (I) constructed from the independent vectors $\boldsymbol{\varepsilon}'$, $\boldsymbol{\varepsilon}$, $\mathbf{q}' = \mathbf{q}$ (forward scattering!), and $\boldsymbol{\sigma}$ (the proton spin operator), (II) linear in $\boldsymbol{\varepsilon}'^*$ and $\boldsymbol{\varepsilon}$, (III) obeying the transverse gauge, $\boldsymbol{\varepsilon}\cdot\mathbf{q} = \boldsymbol{\varepsilon}'^*\cdot\mathbf{q}' = 0$, and (IV) invariant under rotational and parity transformations. Furthermore, the Compton amplitude has to be invariant under photon crossing, corresponding to the fact that each graph with emission of the final-state photon after the absorption of the incident photon has to be accompanied by a graph with the opposite time order, i.e., absorption following emission (“crossed diagram”). This symmetry requires that the amplitude $T$ of Eq. (44) be invariant under the transformation $\boldsymbol{\varepsilon}'^* \leftrightarrow \boldsymbol{\varepsilon}$ and $\nu \leftrightarrow -\nu$, with the result that $f$ is an even and $g$ an odd function,

$$f(\nu) = f(-\nu), \qquad g(\nu) = -g(-\nu). \tag{45}$$
[Figure 1: panels (a) and (b), each showing a photon γ and a nucleon N]
Fig. 1. Spin and helicity of a double polarization experiment. The arrows ⇒ denote the spin projections on the photon momentum, the arrows → the momenta of the particles. The spin projection and helicity of the photon are assumed to be $\lambda = 1$. The spin projection and helicity of the target nucleon N are denoted by $S_z$ and $h$, respectively, and the eigenvalues of the excited system $N^*$ by the corresponding primed quantities. (a) Helicity 3/2: transition $N(S_z = 1/2,\ h = -1/2) \to N^*(S_z' = h' = 3/2)$, which changes the helicity by 2 units. (b) Helicity 1/2: transition $N(S_z = -1/2,\ h = +1/2) \to N^*(S_z' = h' = +1/2)$, which conserves the helicity.
These two amplitudes can be determined by scattering circularly polarized photons (e.g., helicity $\lambda = 1$) off nucleons polarized along or opposite to the photon momentum $\mathbf{q}$. The former situation (Fig. 1a) leads to an intermediate state with helicity 3/2. Since this requires a total spin $S \ge 3/2$, the transition can only take place on a correlated 3-quark system. The transition of Fig. 1b, on the other hand, is helicity conserving and possible for an individual quark, and therefore should dominate in the realm of deep inelastic scattering. Denoting the Compton scattering amplitudes for the two experiments indicated in Fig. 1 by $T_{3/2}$ and $T_{1/2}$, we find $f(\nu) = (T_{1/2} + T_{3/2})/2$ and $g(\nu) = (T_{1/2} - T_{3/2})/2$. In a similar way we define the total absorption cross section as the spin average over the two helicity cross sections,

$$\sigma_T = \tfrac{1}{2}(\sigma_{3/2} + \sigma_{1/2}), \tag{46}$$

and the transverse–transverse interference term by the helicity difference,

$$\sigma_{TT} = \tfrac{1}{2}(\sigma_{3/2} - \sigma_{1/2}). \tag{47}$$
The optical theorem expresses the unitarity of the scattering matrix by relating the absorption cross sections to the imaginary part of the respective forward scattering amplitude,

$$\mathrm{Im}\,f(\nu) = \frac{\nu}{8\pi}\big(\sigma_{1/2}(\nu) + \sigma_{3/2}(\nu)\big) = \frac{\nu}{4\pi}\,\sigma_T(\nu), \qquad \mathrm{Im}\,g(\nu) = \frac{\nu}{8\pi}\big(\sigma_{1/2}(\nu) - \sigma_{3/2}(\nu)\big) = -\frac{\nu}{4\pi}\,\sigma_{TT}(\nu). \tag{48}$$

Due to the smallness of the fine structure constant $\alpha_{em}$, we may neglect all purely electromagnetic processes in this context, such as photon scattering to finite angles or electron–positron pair production in the Coulomb field of the proton. Instead, we shall consider only the coupling of the photon to the hadronic channels, which start at the threshold for pion production, i.e., at a photon lab energy $\nu_0 = m_\pi(1 + m_\pi/2M) \approx 150$ MeV. We shall return to this point later in the context of the GDH integral. The total photoabsorption cross section $\sigma_T$ is shown in Fig. 2. It clearly exhibits three resonance structures on top of a strong background. These structures correspond, in order, to concentrations of magnetic dipole strength (M1) in the region of the $\Delta(1232)$ resonance, electric dipole strength (E1) near the resonances $N^*(1520)$ and $N^*(1535)$, and electric quadrupole (E2) strength near the $N^*(1675)$.

Fig. 2. The total absorption cross section $\sigma_T(\nu)$ for the proton. The fit to the data is described in Ref. [13], where also the references to the data can be found.

Since the absorption cross sections are the input for the dispersion integrals, we have to discuss the convergence for large $\nu$. For energies above the resonance region ($\nu \gtrsim 1.66$ GeV, which is equivalent to a total c.m. energy $W \gtrsim 2$ GeV), $\sigma_T$ is very slowly decreasing and reaches a minimum of about 115 $\mu$b around $W = 10$ GeV. At the highest energies, $W \approx 200$ GeV (corresponding to $\nu \approx 2\times 10^4$ GeV), experiments at DESY [14] have measured an increase with energy of the form $\sigma_T \sim W^{0.2}$, in accordance with Regge parametrizations through a soft pomeron exchange mechanism [15]. Therefore, it cannot be expected that the unweighted integral over $\sigma_T$ converges. Recently, the helicity difference has also been measured. The first measurement was carried out at MAMI (Mainz) for photon energies in the range $200\ \text{MeV} < \nu < 800$ MeV [16,17]. As shown in Fig. 3, this difference fluctuates much more strongly than the total cross section $\sigma_T$. The threshold region is dominated by S-wave pion production, i.e., intermediate states with spin 1/2, and therefore mostly contributes to the cross section $\sigma_{1/2}$. In the region of the $\Delta(1232)$ with spin $J = 3/2$, both helicity cross sections contribute, but since the transition is essentially M1, we find $\sigma_{3/2}/\sigma_{1/2} \approx 3$, and $\sigma_{TT}$ becomes large and positive. Fig. 3 also shows that $\sigma_{3/2}$ dominates the proton photoabsorption cross section in the second and third resonance regions. It was in fact one of the early successes
of the quark model to predict this fact by a cancellation of the convection and spin currents in the case of $\sigma_{1/2}$ [23,24].

[Figure 3: $\sigma_{3/2}-\sigma_{1/2}$ ($\mu$b) versus $\nu$ (MeV), shown for $0 \le \nu \le 1800$ MeV]
Fig. 3. The helicity difference $\sigma_{3/2}(\nu) - \sigma_{1/2}(\nu)$ for the proton. The calculations include the contribution of $\pi N$ intermediate states (dashed curve) [18], $\eta N$ intermediate states (dotted curve) [19], and $\pi\pi N$ intermediate states (dashed–dotted curve) [20,21]. The total sum of these contributions is shown by the full curves. The MAMI data are from Refs. [16,17] and the (preliminary) ELSA data from Ref. [22].

The GDH collaboration has now extended the measurement into the energy range up to 3 GeV at ELSA (Bonn) [22]. These preliminary data show a small positive value of $\sigma_{TT}$ with some indication of a cross-over to negative values, as has been predicted from an extrapolation of DIS data [25]. This is consistent with the fact that the helicity-conserving cross section $\sigma_{1/2}$ should dominate in DIS, because an individual quark cannot contribute to $\sigma_{3/2}$ due to its spin. However, the extrapolation from DIS to real photons should be taken with a grain of salt.

Having studied the behavior of the absorption cross sections, we are now in a position to set up dispersion relations. A generic form starts from a Cauchy integral with contour C shown in Fig. 4,

$$f(\nu + i\varepsilon) = \frac{1}{2\pi i}\oint_C\frac{f(\nu')}{\nu' - \nu - i\varepsilon}\,d\nu', \tag{49}$$

where $\nu > 0$ and $\varepsilon > 0$, i.e., in the limit $\varepsilon \to 0$ the singularity approaches a physical point at $\nu' = \nu > 0$. The contour is closed in the upper half-plane by a large circle of radius $R$ that eventually goes to infinity. Since we want to neglect this contribution eventually, the cross sections have to converge sufficiently well for $\nu \to \infty$. As we have seen before, this requirement is certainly not fulfilled by $\sigma_T(\nu)$, and for this reason we have to subtract the dispersion relation for $f$. If we subtract at $\nu = 0$, i.e., consider $f(\nu) - f(0)$, we also remove the nucleon pole terms at $\nu = 0$. The remaining contribution comes from the cuts along the real axis, which may be expressed in terms of the discontinuity of $\mathrm{Im}\,f$ across the cut for a contour as shown in Fig. 4, or simply by an integral
over $\mathrm{Im}\,f$ as we approach the axis from above.

[Figure 4: contour in the complex $\nu$-plane, axes Re($\nu$) and Im($\nu$), with $-\nu_0$ and $\nu_0$ marked on the real axis]
Fig. 4. The contour C for the dispersion integral Eq. (49). The physical point lies at $\nu + i\varepsilon$ and approaches the $\mathrm{Re}\,\nu$ axis in the limit $\varepsilon \to +0$. Singularities lie on the real axis: poles in the s- and u-channel contributions with intermediate nucleon states at $\nu = 0$, and cuts for $|\nu| > \nu_0$ due to production of a pion or heavier systems ($2\pi$, K, etc.). In addition, resonances occur in the lower half-plane on the second Riemann sheet.

By use of the crossing relation and the optical theorem, the subtracted dispersion integral can then be expressed in terms of the cross section,

$$\mathrm{Re}\,f(\nu) = f(0) + \frac{\nu^2}{2\pi^2}\,\mathcal{P}\!\int_{\nu_0}^\infty\frac{\sigma_T(\nu')}{\nu'^2-\nu^2}\,d\nu'. \tag{50}$$

Though the dispersion integral is clearly dominated by hadronic reactions, the subtraction is also necessary for a charged lepton, because the integral over $\sigma_T$ also diverges (logarithmically) for a purely electromagnetic process. We note that in a hypothetical world where this integral would converge, the charge could be predicted from the absorption cross section. For the odd function $g(\nu)$ we may expect the existence of an unsubtracted dispersion relation,

$$\mathrm{Re}\,g(\nu) = \frac{\nu}{4\pi^2}\,\mathcal{P}\!\int_{\nu_0}^\infty\frac{\sigma_{1/2}(\nu') - \sigma_{3/2}(\nu')}{\nu'^2-\nu^2}\,\nu'\,d\nu'. \tag{51}$$

If the integrals exist, the relations of Eqs. (50) and (51) can be expanded into a Taylor series at the origin, which should converge up to the lowest threshold, $\nu = \nu_0$:

$$\mathrm{Re}\,f(\nu) = f(0) + \sum_{n=1}^\infty\left[\frac{1}{2\pi^2}\int_{\nu_0}^\infty\frac{\sigma_T(\nu')}{\nu'^{2n}}\,d\nu'\right]\nu^{2n}, \tag{52}$$

$$\mathrm{Re}\,g(\nu) = \sum_{n=1}^\infty\left[\frac{1}{4\pi^2}\int_{\nu_0}^\infty\frac{\sigma_{1/2}(\nu') - \sigma_{3/2}(\nu')}{\nu'^{2n-1}}\,d\nu'\right]\nu^{2n-1}. \tag{53}$$

The expansion coefficients in brackets parametrize the electromagnetic response of the medium, e.g., the nucleon. These Taylor series may be compared to the predictions of the low energy
theorem (LET) of Low [26] and of Gell-Mann and Goldberger [27], who showed that the leading and next-to-leading terms of the expansions are fixed by the global properties of the system. These properties are the mass $M$, the charge $e\,e_N$, and the anomalous magnetic moment $(e/2M)\,\kappa_N$ for a particle with spin 1/2 like the nucleon (i.e., $e_p = 1$, $e_n = 0$, $\kappa_p = 1.79$, $\kappa_n = -1.91$). The predictions of the LET start from the observation that the leading term for $\nu \to 0$ is described by the Born terms, because these have a pole structure in that limit. If constructed from a Lorentz- and gauge-invariant and crossing-symmetric theory, the leading and next-to-leading order terms are completely determined by the Born terms,

$$f(\nu) = -\frac{e^2 e_N^2}{4\pi M} + (\alpha + \beta)\,\nu^2 + O(\nu^4), \tag{54}$$

$$g(\nu) = -\frac{e^2\kappa_N^2}{8\pi M^2}\,\nu + \gamma_0\,\nu^3 + O(\nu^5). \tag{55}$$
The leading term of the no-spin-flip amplitude, $f(0)$, is the Thomson term already familiar from nonrelativistic theory.² The term $O(\nu)$ vanishes because of crossing symmetry, and only the term $O(\nu^2)$ contains information on the internal structure (spectrum and excitation strengths) of the complex system. In the forward direction this information appears as the sum of the electric and magnetic dipole polarizabilities. The higher-order terms $O(\nu^4)$ contain contributions of dipole retardation and higher multipoles, as will be discussed in Section 3.10. By comparing with Eq. (52), we can construct all higher coefficients of the low energy expansion (LEX), Eq. (54), from moments of the total cross section. In particular we obtain Baldin's sum rule [28,29],

$$\alpha + \beta = \frac{1}{2\pi^2}\int_{\nu_0}^\infty\frac{\sigma_T(\nu')}{\nu'^2}\,d\nu', \tag{56}$$

and from the next term of the expansion a relation for dipole retardation and quadrupole polarizability. In the case of the spin-flip amplitude $g$, the comparison of Eqs. (53) and (55) yields the sum rule of Gerasimov [2], Drell and Hearn [3],

$$\frac{\pi e^2\kappa_N^2}{2M^2} = \int_{\nu_0}^\infty\frac{\sigma_{3/2}(\nu') - \sigma_{1/2}(\nu')}{\nu'}\,d\nu' \equiv I, \tag{57}$$

and a value for the forward spin polarizability [27,30],

$$\gamma_0 = -\frac{1}{4\pi^2}\int_{\nu_0}^\infty\frac{\sigma_{3/2}(\nu') - \sigma_{1/2}(\nu')}{\nu'^3}\,d\nu'. \tag{58}$$
² By comparing with Eq. (31) we see that we have now converted to Heaviside–Lorentz units, i.e., $\alpha_{em} = e^2/4\pi = 1/137$ and $r_{cl} = e^2/4\pi M$, here and in all following sections.

Baldin's sum rule was recently reevaluated in Ref. [13]. These authors determined the integral by use of multipole expansions of pion photoproduction in the threshold region, old and new total photoabsorption cross sections in the resonance region ($200\ \text{MeV} < \nu < 2$ GeV), and a parametrization of the high energy tail containing a logarithmic divergence of $\sigma_T$. The result is

$$\alpha_p + \beta_p = (13.69 \pm 0.14)\times 10^{-4}\ \text{fm}^3, \qquad \alpha_n + \beta_n = (14.40 \pm 0.66)\times 10^{-4}\ \text{fm}^3, \tag{59}$$
for proton and neutron, respectively. Due to the $\nu^{-3}$ weighting of the integral, the forward spin polarizability of the proton can be reasonably well determined by the GDH experiment at MAMI. The contribution of the range $200\ \text{MeV} < \nu < 800$ MeV is $\gamma_0 = -[1.87 \pm 0.08\,(\text{stat}) \pm 0.10\,(\text{syst})]\times 10^{-4}\ \text{fm}^4$, the threshold region is estimated to yield $0.90\times 10^{-4}\ \text{fm}^4$, and only $-0.04\times 10^{-4}\ \text{fm}^4$ are expected from energies above 800 MeV [31]. The total result is

$$\gamma_0^p = [-1.01 \pm 0.08\,(\text{stat}) \pm 0.10\,(\text{syst})]\times 10^{-4}\ \text{fm}^4. \tag{60}$$
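The sum-rule values quoted below for $I_p$ and $I_n$ follow from the left-hand side of Eq. (57), which in Heaviside–Lorentz units reads $\pi e^2\kappa_N^2/2M^2 = 2\pi^2\alpha_{em}\kappa_N^2/M^2$. The sketch converts it to microbarn with $(\hbar c)^2 = 0.3893794\ \text{GeV}^2\,\text{mb}$. The constants are our inputs (CODATA-style values) and the names are ours. Both nucleons are evaluated with the proton mass, which is the consistent choice if $\kappa_n = -1.91$ is read in nuclear magnetons $e/2M_p$.

```python
import math

ALPHA_EM = 1.0 / 137.035999     # fine-structure constant
HBARC_SQ_MUB = 389.3794         # (hbar c)^2 in GeV^2 * microbarn
M_PROTON = 0.938272             # GeV
KAPPA_P = 1.792847              # anomalous magnetic moments in nuclear magnetons
KAPPA_N = -1.913043


def gdh_lhs(kappa, mass=M_PROTON):
    """Left-hand side of Eq. (57), 2 pi^2 alpha_em kappa^2 / M^2, in microbarn."""
    return 2.0 * math.pi ** 2 * ALPHA_EM * kappa ** 2 / mass ** 2 * HBARC_SQ_MUB


i_p_sum_rule = gdh_lhs(KAPPA_P)   # compare Eq. (62)
i_n_sum_rule = gdh_lhs(KAPPA_N)   # compare Eq. (63)
```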
We postpone a more detailed discussion of the nucleon's polarizability to Section 3.9, where experimental findings and theoretical predictions are compared to the results of dispersion relations. As we have seen above, the GDH sum rule is based on very general principles (Lorentz and gauge invariance, unitarity) and on one weak assumption: the convergence of an unsubtracted dispersion relation (DR). It is of course impossible ever to prove the existence of such a sum rule by experiment. However, it is legitimate to ask whether or not the anomalous magnetic moment on the lhs of Eq. (57) is approximately obtained by integrating the rhs of that equation up to some reasonable energy, say 3 or 50 GeV. The comparison will tell us whether the anomalous magnetic moment, measured as a ground state expectation value, is related to the degrees of freedom visible up to that energy, or whether it is produced by very short-distance and possibly still unknown phenomena. Concerning the convergence problem, it is interesting to note that the GDH sum rule was recently evaluated in QED [32] for the electron at order $\alpha_{em}^3$ and shown to agree with the Schwinger correction to the anomalous magnetic moment, i.e., $\kappa_e = \alpha_{em}/2\pi$. This also gives the electromagnetic correction to the sum rule for the proton, which is of relative order $(\kappa_e/\kappa_N)^2 \sim 10^{-6}$. The GDH sum rule predicts that the integral on the rhs of Eq. (57) should be $I_p = 205\ \mu$b for the proton. The energy range of the MAMI experiment [17] contributes $I_p(200\text{–}800\ \text{MeV}) = [226 \pm 5\,(\text{stat}) \pm 12\,(\text{syst})]\ \mu$b. The preliminary results of the GDH experiment at ELSA [22] show a positive contribution in the third resonance region, with a maximum value of $\sigma_{3/2} - \sigma_{1/2} \approx 100\ \mu$b, but only very small contributions at higher energies, with a possible cross-over to negative values at $\nu \gtrsim 1.8$ GeV. At high $\nu$, above the resonance region, one usually invokes Regge phenomenology to argue that the integral converges [33,34].
In particular, for the isovector channel $\sigma_{1/2} - \sigma_{3/2} \to \nu^{\alpha_1 - 1}$ at large $\nu$, with $-0.5 \lesssim \alpha_1 \lesssim 0$ being the intercept of the $a_1(1260)$ meson Regge trajectory. For the isoscalar channel, Regge theory predicts a behavior corresponding to $\alpha_1 \approx -0.5$, which is the intercept of the isoscalar $f_1(1285)$ and $f_1(1420)$ Regge trajectories. However, these assumptions should be tested experimentally. The approved experiment SLAC E-159 [35] will measure the helicity-difference absorption cross section $\sigma_{3/2} - \sigma_{1/2}$ for protons and neutrons in the photon energy range $5\ \text{GeV} < \nu < 40$ GeV. This will be the first measurement of $\sigma_{3/2} - \sigma_{1/2}$ above the resonance region, to test the convergence of the GDH sum rule and to provide a baseline for our understanding of soft Regge physics in the spin-dependent forward Compton amplitude.
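The convergence argument can be made explicit with a one-line estimate. Assuming (as a model, not a measured fact) that the isovector Regge form $\sigma_{1/2} - \sigma_{3/2} \simeq c\,\nu^{\alpha_1 - 1}$ holds above some energy $\nu_c$, with an unknown normalization $c$, the tail of the GDH integral of Eq. (57) is

```latex
\int_{\nu_c}^{\infty}\frac{\sigma_{3/2}(\nu')-\sigma_{1/2}(\nu')}{\nu'}\,d\nu'
\simeq -c\int_{\nu_c}^{\infty}\nu'^{\,\alpha_1-2}\,d\nu'
= -\frac{c\,\nu_c^{\,\alpha_1-1}}{1-\alpha_1},
\qquad \alpha_1 < 1 .
```

For $-0.5 \lesssim \alpha_1 \lesssim 0$ the tail therefore falls off between $\nu_c^{-1}$ and $\nu_c^{-3/2}$, and the additional $\nu'^{-2}$ weighting in Eq. (58) makes the corresponding $\gamma_0$ tail converge faster still.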
According to the latest MAID analysis [31], the threshold region yields $I_p(\text{thr}-200\ \text{MeV}) = -27.5\ \mu$b, with a sign opposite to the resonance region, because pion S-wave production contributes to $\sigma_{1/2}$ only. Combining this threshold contribution with the MAMI value (between 200 and 800 MeV), the MAID analysis from 800 MeV to 1.66 GeV, and including model estimates for the $\pi\pi$, $\eta$ and K production channels, one obtains an integral value from threshold to 1.66 GeV of [31]

$$I_p(W < 2\ \text{GeV}) = [241 \pm 5\,(\text{stat}) \pm 12\,(\text{syst}) \pm 7\,(\text{model})]\ \mu\text{b}. \tag{61}$$
The quoted model error is essentially due to uncertainties in the helicity structure of the $\pi\pi$ and K channels. Based on Regge extrapolations and fits to DIS, the asymptotic contribution ($\nu > 1.66$ GeV) has been estimated to be $(-26 \pm 7)\ \mu$b in Ref. [25], whereas Ref. [36] estimated it to be $(-13 \pm 2)\ \mu$b. We take the average of both estimates, $(-20 \pm 9)\ \mu$b, as a range which covers the theoretical uncertainty in the evaluations of this asymptotic contribution. Putting all contributions together, the result for the integral $I$ of Eq. (57) is

$$I_p = [221 \pm 5\,(\text{stat}) \pm 12\,(\text{syst}) \pm 11\,(\text{model})]\ \mu\text{b} \approx I_p(\text{sum rule}) = 204.8\ \mu\text{b}, \tag{62}$$
where the systematic and model errors of the different contributions have been added in quadrature. Assuming that the size of the high-energy contribution used in Eq. (62) is confirmed by the SLAC E-159 experiment in the near future, one can conclude that the GDH sum rule seems to work for the proton. Unfortunately, the experimental situation is much less clear in the case of the neutron, for which the sum rule predicts
$$ I_n(\text{sum rule}) = 233.2\ \mu\text{b}. \tag{63} $$
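The step from Eq. (61) to Eq. (62) is simple arithmetic and can be reproduced in a few lines. This sketch assumes, as the text implies, that the ±9 μb spread of the averaged asymptotic estimate enters as a model error and that the statistical and systematic errors stem only from the region below W = 2 GeV; the `combine` helper is illustrative and not taken from Ref. [31].

```python
import math

# Pieces of the proton GDH integral quoted in Eqs. (61) and (62), in microbarn.
below_2gev = {"value": 241.0, "stat": 5.0, "syst": 12.0, "model": 7.0}
asymptotic = {"value": -20.0, "model": 9.0}   # averaged Regge/DIS estimate

def combine(low, high):
    """Add the central values; add the two model errors in quadrature."""
    return {
        "value": low["value"] + high["value"],
        "stat": low["stat"],                       # only the W < 2 GeV part has stat/syst errors
        "syst": low["syst"],
        "model": math.hypot(low["model"], high["model"]),
    }

total = combine(below_2gev, asymptotic)
print(f"I_p = [{total['value']:.0f} ± {total['stat']:.0f} (stat) "
      f"± {total['syst']:.0f} (syst) ± {total['model']:.0f} (model)] µb")
```

The model errors 7 μb and 9 μb combine to √130 ≈ 11.4 μb, which rounds to the ±11 μb quoted in Eq. (62).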
From present knowledge of the pion photoproduction multipoles and models of heavier-mass intermediate states, one obtains the estimate $I_n = [147\,(\pi) + 55\,(\pi\pi) - 6\,(\eta)]\ \mu\text{b} \approx 196\ \mu\text{b}$ [31] from the contributions of the $\pi$, $\pi\pi$ and $\eta$ production channels, thereby assuming the same two-pion contribution as in the case of the proton. This estimate for $I_n$ falls short of the sum rule value by about 15%. Given the model assumptions and the uncertainties in the present data, one can certainly not conclude that the neutron sum rule is violated. Possible sources of the discrepancy, which still remain to be investigated, are the neglect of final-state interactions for pion production off the "neutron target" deuteron, the helicity structure of two-pion production, or the asymptotic contribution. We shall return to this point in the following section when discussing so-called generalized GDH integrals for virtual photon scattering. In any case, the outcome of the planned experiments of the GDH collaboration [37] for the neutron will be of extreme interest.

2.3. Forward dispersion relations in doubly virtual Compton scattering (VVCS)

In this section we consider the forward scattering of a virtual photon with space-like four-momentum $q$, i.e., $q^2 = q_0^2 - \mathbf{q}^2 = -Q^2 < 0$. The first stage of this process, the absorption of the virtual photon, is related to inclusive electroproduction, $e + N \to e' + N' + \text{anything}$, where $e$ ($e'$) and $N$ ($N'$) are electrons and nucleons, respectively, in the initial (final) state. The kinematics of the electron is traditionally described in the lab frame (rest frame of $N$), with $E$ and $E'$ the initial and final energy of the electron, respectively, and $\theta$ the scattering angle. This defines the kinematical values of the
emitted photon in terms of the four-momentum transfer $Q$ and the energy transfer $\nu$,
$$ Q^2 = 4EE'\sin^2\frac{\theta}{2}, \qquad \nu = E - E', \tag{64} $$
and the lab photon momentum $|\mathbf{q}_{\text{lab}}| = \sqrt{Q^2 + \nu^2}$. In the c.m. frame of the hadronic intermediate state, the four-momentum of the virtual photon is $q = (\omega, \mathbf{q}_{\text{cm}})$ with
$$ \omega = \frac{M\nu - Q^2}{W}, \qquad |\mathbf{q}_{\text{cm}}| = \frac{M}{W}\,|\mathbf{q}_{\text{lab}}|, \tag{65} $$
where $W$ is the total energy in the hadronic c.m. frame. We further introduce the Mandelstam variable $s$ and the Bjorken variable $x$,
$$ s = 2M\nu + M^2 - Q^2 = W^2, \qquad x = \frac{Q^2}{2M\nu}. \tag{66} $$
The virtual photon spectrum is normalized according to Hand's definition [38] by the "equivalent photon energy",
$$ K = K_H = \nu(1 - x) = \frac{W^2 - M^2}{2M}. \tag{67} $$
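An alternative choice would be to use Gilman's definition [39], $K_G = |\mathbf{q}_{\text{lab}}|$. The inclusive inelastic cross section may be written in terms of a virtual photon flux factor $\Gamma_V$ and four partial cross sections [19],
$$ \frac{d\sigma}{d\Omega\, dE'} = \Gamma_V\,\sigma(\nu, Q^2), \tag{68} $$
$$ \sigma = \sigma_T + \varepsilon\,\sigma_L + h P_x\sqrt{2\varepsilon(1-\varepsilon)}\,\sigma'_{LT} + h P_z\sqrt{1-\varepsilon^2}\,\sigma'_{TT}, \tag{69} $$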
with the photon polarization
$$ \varepsilon = \frac{1}{1 + 2\left(1 + \nu^2/Q^2\right)\tan^2(\theta/2)}, \tag{70} $$
and the flux factor
$$ \Gamma_V = \frac{\alpha_{em}}{2\pi^2}\,\frac{E'}{E}\,\frac{K}{Q^2}\,\frac{1}{1-\varepsilon}. \tag{71} $$
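The kinematic relations (64)–(67), (70) and (71) can be collected into one helper function. A minimal sketch, assuming lab-frame inputs in GeV and radians, a proton mass of 0.938272 GeV and α_em = 1/137.036; the function name, the example kinematics and the returned keys are hypothetical. The two assertions check identities that follow from the definitions: K_H = (W² − M²)/2M, and the c.m. photon four-momentum still satisfies ω² − |q_cm|² = −Q².

```python
import math

M = 0.938272                  # proton mass in GeV (assumed value)
ALPHA_EM = 1.0 / 137.035999   # fine-structure constant

def inclusive_kinematics(E, E_prime, theta):
    """Variables of Eqs. (64)-(67), (70) and (71).

    E, E_prime: initial and final electron energies in GeV (lab frame);
    theta: electron scattering angle in radians.
    """
    nu = E - E_prime                                          # Eq. (64)
    Q2 = 4.0 * E * E_prime * math.sin(theta / 2.0) ** 2       # Eq. (64)
    q_lab = math.sqrt(Q2 + nu**2)
    W = math.sqrt(2.0 * M * nu + M**2 - Q2)                   # Eq. (66), s = W^2
    x = Q2 / (2.0 * M * nu)                                   # Bjorken x, Eq. (66)
    omega = (M * nu - Q2) / W                                 # c.m. photon energy, Eq. (65)
    q_cm = M * q_lab / W                                      # c.m. photon momentum, Eq. (65)
    K_H = nu * (1.0 - x)                                      # Hand, Eq. (67)
    K_G = q_lab                                               # Gilman
    eps = 1.0 / (1.0 + 2.0 * (1.0 + nu**2 / Q2) * math.tan(theta / 2.0) ** 2)  # Eq. (70)
    gamma_V = ALPHA_EM / (2.0 * math.pi**2) * (E_prime / E) * K_H / Q2 / (1.0 - eps)  # Eq. (71)
    return {"nu": nu, "Q2": Q2, "W": W, "x": x, "omega": omega, "q_cm": q_cm,
            "K_H": K_H, "K_G": K_G, "eps": eps, "Gamma_V": gamma_V}

kin = inclusive_kinematics(E=4.0, E_prime=2.5, theta=math.radians(15.0))
assert math.isclose(kin["K_H"], (kin["W"] ** 2 - M**2) / (2.0 * M))
assert math.isclose(kin["omega"] ** 2 - kin["q_cm"] ** 2, -kin["Q2"])
```

Here Γ_V is computed with Hand's K_H; using Gilman's K_G instead would change Γ_V, but not the measured product Γ_V σ.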
In addition to the transverse cross sections $\sigma_T$ and $\sigma'_{TT}$ of Eqs. (46) and (47), the longitudinal polarization of the virtual photon gives rise to a longitudinal cross section $\sigma_L$ and a longitudinal–transverse interference $\sigma'_{LT}$. The two spin-flip cross sections can only be measured by a double-polarization experiment, with $h = \pm 1$ referring to the two helicity states of the (relativistic) electron, and $P_z$ and $P_x$ the components of the target polarization in the direction of the virtual photon momentum $\mathbf{q}_{\text{lab}}$ and perpendicular to that direction in the scattering plane of the electron. In the following we shall change the sign of the two spin-flip cross sections in comparison with Ref. [19], i.e., introduce the sign convention used in DIS,
$$ \sigma_{TT} = -\sigma'_{TT} \quad \text{and} \quad \sigma_{LT} = -\sigma'_{LT}. \tag{72} $$
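The partial cross sections are related to the quark structure functions as follows [19]:³
$$
\begin{aligned}
\sigma_T &= \frac{4\pi^2\alpha_{em}}{MK}\,F_1, \qquad \sigma_L = \frac{4\pi^2\alpha_{em}}{K}\left[\frac{1+\gamma^2}{\gamma^2\nu}\,F_2 - \frac{F_1}{M}\right],\\
\sigma_{TT} &= \frac{4\pi^2\alpha_{em}}{MK}\,(g_1 - \gamma^2 g_2), \qquad \sigma_{LT} = \frac{4\pi^2\alpha_{em}}{MK}\,\gamma\,(g_1 + g_2),
\end{aligned}
\tag{73}
$$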
with the ratio $\gamma = Q/\nu$. The helicity cross sections are then given by
$$ \sigma_{1/2} = \frac{4\pi^2\alpha_{em}}{MK}\,(F_1 + g_1 - \gamma^2 g_2), \qquad \sigma_{3/2} = \frac{4\pi^2\alpha_{em}}{MK}\,(F_1 - g_1 + \gamma^2 g_2). \tag{74} $$
Due to the longitudinal degree of freedom, the virtual photon has a third polarization vector $\varepsilon_0$ in addition to the transverse polarization vectors $\varepsilon_\pm$ defined in Eq. (43). A convenient definition of this four-vector is
$$ \varepsilon_0^\mu = \frac{1}{Q}\,(|\mathbf{q}|, 0, 0, q_0), \tag{75} $$
where we have chosen the $z$-axis in the direction of the photon propagation,
$$ q^\mu = (q_0, 0, 0, |\mathbf{q}|). \tag{76} $$
All three polarization vectors and the photon momentum are orthogonal (in the Lorentz metric!),
$$ \varepsilon_m \cdot q = 0, \qquad \varepsilon_{m'}^{*}\cdot\varepsilon_m = (-1)^m\,\delta_{mm'} \quad \text{for } m, m' = 0, \pm 1. \tag{77} $$
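The orthogonality relations (77) can be checked numerically for a space-like photon moving along z. This sketch assumes the metric (+, −, −, −) and the common phase convention ε_± = ∓(0, 1, ±i, 0)/√2 for the transverse vectors; Eq. (43) is not reproduced in this section, so its phases may differ, which would not affect the relations tested here. The helper names are illustrative.

```python
import math

def mdot(a, b):
    """Minkowski product a.b with metric (+, -, -, -); no complex conjugation."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]

def conj(v):
    """Complex conjugate of a four-vector with real or complex components."""
    return tuple(complex(c).conjugate() for c in v)

def photon_polarizations(q0, q_abs):
    """eps_m for m = 0, +1, -1 and q = (q0, 0, 0, |q|), Eqs. (75)-(76)."""
    Q = math.sqrt(q_abs**2 - q0**2)          # requires a space-like photon
    s = 1.0 / math.sqrt(2.0)
    return {
        0: (q_abs / Q, 0.0, 0.0, q0 / Q),    # longitudinal vector, Eq. (75)
        +1: (0.0, -s, -1j * s, 0.0),         # assumed phase convention
        -1: (0.0, s, -1j * s, 0.0),
    }

q0, q_abs = 0.5, 1.3
q = (q0, 0.0, 0.0, q_abs)
eps = photon_polarizations(q0, q_abs)
for m, e_m in eps.items():
    assert abs(mdot(e_m, q)) < 1e-12                           # eps_m . q = 0
    for mp, e_mp in eps.items():
        expected = (-1) ** m if m == mp else 0
        assert abs(mdot(conj(e_mp), e_m) - expected) < 1e-12   # Eq. (77)
```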
The invariant matrix element for the absorption of a photon with helicity $m$ is
$$ \mathcal{M}_m \sim \varepsilon_m \cdot J, \tag{78} $$
where $J$ is the hadronic transition current, which is gauge invariant,
$$ q \cdot J = q_0\,\rho - \mathbf{q}\cdot\mathbf{j} = 0. \tag{79} $$
Being Lorentz invariant, the matrix element $\mathcal{M}_m$ can be evaluated in any system of reference, e.g., in the lab frame, and by use of Eq. (79),
$$ \mathcal{M}_0 \sim \frac{1}{Q}\left(|\mathbf{q}_{\text{lab}}|\,\rho - \nu\,j_z\right) = \frac{Q}{|\mathbf{q}_{\text{lab}}|}\,\rho = \frac{Q}{\nu}\,j_z. \tag{80} $$

³ We note at this point that the factor $\nu$ in the denominator of $\sigma_L$ is missing in Ref. [19].
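The equality of the three expressions in Eq. (80) uses nothing but current conservation, Eq. (79), with the photon along the z-axis. The sketch below, with arbitrary illustrative numbers, fixes ρ from j_z through ν ρ = |q_lab| j_z and evaluates all three forms; the function name is hypothetical.

```python
import math

def longitudinal_matrix_element_forms(nu, Q2, j_z):
    """Evaluate the three expressions of Eq. (80) for a conserved current.

    The photon moves along z, so Eq. (79) reads nu * rho = |q_lab| * j_z,
    which fixes the charge density rho once j_z is given.
    """
    q_lab = math.sqrt(Q2 + nu**2)
    Q = math.sqrt(Q2)
    rho = q_lab * j_z / nu                     # current conservation, Eq. (79)
    via_eps0 = (q_lab * rho - nu * j_z) / Q    # eps_0 . J with eps_0 from Eq. (75)
    via_rho = Q * rho / q_lab                  # charge-density form
    via_jz = Q * j_z / nu                      # current form
    return via_eps0, via_rho, via_jz

a, b, c = longitudinal_matrix_element_forms(nu=0.7, Q2=0.4, j_z=0.25)
```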
Fig. 5. Born diagrams for the doubly virtual Compton scattering (VVCS) process.
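The VVCS amplitude for forward scattering takes the form (as a $2\times 2$ matrix in nucleon spinor space):
$$ T(\nu, Q^2, \theta = 0) = \boldsymbol{\varepsilon}'^{*}\cdot\boldsymbol{\varepsilon}\,f_T(\nu, Q^2) + \varepsilon'^{*}_0\,\varepsilon_0\,f_L(\nu, Q^2) + i\boldsymbol{\sigma}\cdot(\boldsymbol{\varepsilon}'^{*}\times\boldsymbol{\varepsilon})\,g_{TT}(\nu, Q^2) - i\boldsymbol{\sigma}\cdot\left[(\varepsilon'^{*}_0\,\boldsymbol{\varepsilon} - \boldsymbol{\varepsilon}'^{*}\varepsilon_0)\times\hat{\mathbf{q}}\right]g_{LT}(\nu, Q^2), \tag{81} $$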
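where we have generalized the notation of Eq. (44) to the VVCS case. The optical theorem relates the imaginary parts of the four amplitudes in Eq. (81) to the four partial cross sections of inclusive scattering,
$$
\begin{aligned}
\text{Im}\,f_T(\nu, Q^2) &= \frac{K}{4\pi}\,\sigma_T(\nu, Q^2), \qquad \text{Im}\,f_L(\nu, Q^2) = \frac{K}{4\pi}\,\sigma_L(\nu, Q^2),\\
\text{Im}\,g_{TT}(\nu, Q^2) &= \frac{K}{4\pi}\,\sigma_{TT}(\nu, Q^2), \qquad \text{Im}\,g_{LT}(\nu, Q^2) = \frac{K}{4\pi}\,\sigma_{LT}(\nu, Q^2).
\end{aligned}
\tag{82}
$$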
We note that the products $K\sigma_T$, etc., are independent of the choice of $K$, because they are directly proportional to the measured cross section (see Eqs. (68) and (71)). Of course, the natural choice at this point would be $K = K_G = |\mathbf{q}_{\text{lab}}|$, because we expect the photon three-momentum on the rhs of Eq. (82). However, we shall later evaluate the cross sections by a multipole decomposition in the c.m. frame, for which $K = K_H$ is the standard choice. The imaginary parts of the scattering amplitudes, Eqs. (82), get contributions from both elastic scattering at $\nu_B = Q^2/2M$ and inelastic processes above pion threshold, for $\nu > \nu_0 = m_\pi + (m_\pi^2 + Q^2)/2M$. The elastic contributions can be calculated from the direct and crossed Born diagrams of Fig. 5, where the electromagnetic vertex for the transition $\gamma^*(q) + N(p) \to N(p+q)$ is given by
$$ \Gamma^\mu = F_D(Q^2)\,\gamma^\mu + F_P(Q^2)\,\frac{i\sigma^{\mu\nu}q_\nu}{2M}, \tag{83} $$
with $F_D$ and $F_P$ the nucleon Dirac and Pauli form factors, respectively. The choice of the electromagnetic vertex according to Eq. (83) ensures gauge invariance when calculating the Born contribution
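to the VVCS amplitude, and yields:
$$
\begin{aligned}
f_T^{\text{Born}}(\nu, Q^2) &= -\frac{\alpha_{em}}{M}\left[F_D^2 + \frac{\nu_B^2}{\nu^2 - \nu_B^2 + i\epsilon}\,G_M^2\right],\\
f_L^{\text{Born}}(\nu, Q^2) &= -\frac{\alpha_{em}\,Q^2}{4M^3}\left[F_P^2 + \frac{4M^2}{\nu^2 - \nu_B^2 + i\epsilon}\,G_E^2\right],\\
g_{TT}^{\text{Born}}(\nu, Q^2) &= -\frac{\alpha_{em}\,\nu}{2M^2}\left[F_P^2 + \frac{Q^2}{\nu^2 - \nu_B^2 + i\epsilon}\,G_M^2\right],\\
g_{LT}^{\text{Born}}(\nu, Q^2) &= \frac{\alpha_{em}\,Q}{2M^2}\left[F_D F_P - \frac{Q^2}{\nu^2 - \nu_B^2 + i\epsilon}\,G_E G_M\right].
\end{aligned}
\tag{84}
$$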
The electric ($G_E$) and magnetic ($G_M$) Sachs form factors are related to the Dirac ($F_D$) and Pauli ($F_P$) form factors by
$$ G_E(Q^2) = F_D(Q^2) - \tau F_P(Q^2), \qquad G_M(Q^2) = F_D(Q^2) + F_P(Q^2), \tag{85} $$
with $\tau = Q^2/4M^2$, and are normalized to
$$ G_E(0) = e_N, \qquad G_M(0) = e_N + \kappa_N = \mu_N, \tag{86} $$
where $e_N$, $\kappa_N$, and $\mu_N$ are the charge (in units of $e$), the anomalous and the total magnetic moments (in units of $e/2M$) of the respective nucleon. We have split the elastic contributions of Eq. (84) into a real contribution (terms in $F_D$ and $F_P$) and a complex contribution (terms in $G_E$ and $G_M$). The latter terms have a structure like the susceptibility of Eq. (5) and fulfill a dispersion relation by themselves. By use of Eqs. (73), (82), and (84), the imaginary parts of the Born amplitudes can be related to the elastic contributions of the quark structure functions and to the form factors,
$$
\begin{aligned}
\frac{4M}{e^2}\,\text{Im}\,f_T^{\text{Born}} &= F_1^{el} = \frac{1}{2}\,G_M^2\,\delta(1-x),\\
\frac{4M}{e^2}\,\text{Im}\,f_L^{\text{Born}} &= \frac{Q^2 + 4M^2}{2Q^2}\,F_2^{el} - F_1^{el} = \frac{2M^2}{Q^2}\,G_E^2\,\delta(1-x),\\
\frac{4M}{e^2}\,\text{Im}\,g_{TT}^{\text{Born}} &= g_1^{el} - \frac{4M^2}{Q^2}\,g_2^{el} = \frac{1}{2}\,G_M^2\,\delta(1-x),\\
\frac{4M}{e^2}\,\text{Im}\,g_{LT}^{\text{Born}} &= \frac{2M}{Q}\,(g_1^{el} + g_2^{el}) = \frac{M}{Q}\,G_E G_M\,\delta(1-x).
\end{aligned}
\tag{87}
$$
These equations describe the imaginary parts of the scattering amplitudes in the physical region at $x = 1$ or $\nu = \nu_B$. The continuation of the amplitudes to negative or complex arguments follows from crossing symmetry (see, e.g., Eq. (45)) and analyticity (see Eq. (16)). According to Eqs. (54) and (55), the low-energy theorem for real photons asserts that the leading and next-to-leading order terms in an expansion in $\nu$ are completely determined by the pole singularities of the Born terms. However, in the case of virtual photons the limit $\nu \to 0$ has to be
performed with care [40], because
$$ \lim_{\nu\to 0}\,\lim_{Q^2\to 0} f(\nu, Q^2) \neq \lim_{Q^2\to 0}\,\lim_{\nu\to 0} f(\nu, Q^2). \tag{88} $$
If we choose $Q^2 = 0$ right away, we reproduce the results of real Compton scattering, Eqs. (54) and (55), for $f(\nu) = f_T(\nu, Q^2 = 0)$ and $g(\nu) = g_{TT}(\nu, Q^2 = 0)$, while $f_L$ and $g_{LT}$ vanish because of the longitudinal currents involved. On the other hand, if we choose $Q^2$ finite and let $\nu$ go to zero, the result is quite different. In particular,
$$ f_T^{\text{Born}}(\nu, Q^2 = 0) = -\frac{\alpha_{em}}{M}\,e_N^2 + \mathcal{O}(\nu^2), \tag{89} $$
while
$$ f_T^{\text{Born}}(\nu = 0, Q^2) = \frac{\alpha_{em}}{M}\,\kappa_N(2e_N + \kappa_N) + \mathcal{O}(Q^2). \tag{90} $$
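The non-commuting limits of Eqs. (88)–(90) can be made concrete with the real part of $f_T^{\text{Born}}$ from Eq. (84). This sketch assumes static proton values for the form factors ($F_D = 1$, $F_P = \kappa_p = 1.793$), which is adequate only for the tiny $Q^2$ used here, and evaluates $\nu$ away from the pole; the function name is illustrative.

```python
import math

ALPHA_EM = 1.0 / 137.035999
M = 0.938272          # proton mass in GeV (assumed value)

def re_f_T_born(nu, Q2, F_D, F_P):
    """Real part of f_T^Born, Eq. (84), for nu != nu_B.

    For Q2 == 0 the pole term vanishes identically (nu_B = 0), which
    implements "set Q^2 = 0 first".
    """
    nu_B = Q2 / (2.0 * M)
    G_M = F_D + F_P                                   # Eq. (85)
    pole = 0.0 if Q2 == 0.0 else nu_B**2 / (nu**2 - nu_B**2) * G_M**2
    return -ALPHA_EM / M * (F_D**2 + pole)

e_p, kappa_p = 1.0, 1.793     # proton charge and anomalous magnetic moment
# Q^2 -> 0 first, then small nu: reproduces Eq. (89)
real_photon = re_f_T_born(nu=1e-4, Q2=0.0, F_D=e_p, F_P=kappa_p)
# nu = 0 at small finite Q^2 (form factors frozen at Q^2 = 0): reproduces Eq. (90)
virtual_photon = re_f_T_born(nu=0.0, Q2=1e-8, F_D=e_p, F_P=kappa_p)
assert math.isclose(real_photon, -ALPHA_EM / M * e_p**2)
assert math.isclose(virtual_photon, ALPHA_EM / M * kappa_p * (2 * e_p + kappa_p))
```

The first call sees only the Dirac charge, while the second picks up only $\kappa_p(2e_p + \kappa_p)$, which is the point made in the following paragraph.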
The surprising result is that a long-wave real photon couples to a Dirac (point) particle, while a long-wave virtual photon couples only to a particle with an anomalous magnetic moment, i.e., a particle with internal structure. The inelastic contributions, on the other hand, are independent of the order of the limits. It is now straightforward to construct the full VVCS amplitudes by dispersion relations in $\nu$ at constant $Q^2$. For the amplitude $f_T$ (which is even in $\nu$), we shall need a subtracted DR as in the case of Eq. (50),
$$ \text{Re}\,f_T(\nu, Q^2) = \text{Re}\,f_T(0, Q^2) + \frac{2\nu^2}{\pi}\,\mathcal{P}\!\int_0^\infty \frac{\text{Im}\,f_T(\nu', Q^2)}{\nu'(\nu'^2 - \nu^2)}\,d\nu'. \tag{91} $$
The integral in Eq. (91) gets contributions from both the elastic cross section (nucleon pole) at $\nu = \nu_B$ and the inelastic continuum for $\nu > \nu_0$:
$$ \text{Re}\,f_T(\nu, Q^2) = \text{Re}\,f_T^{\text{pole}}(\nu, Q^2) + \left[\text{Re}\,f_T(0, Q^2) - \text{Re}\,f_T^{\text{pole}}(0, Q^2)\right] + \frac{\nu^2}{2\pi^2}\,\mathcal{P}\!\int_{\nu_0}^\infty \frac{K(\nu', Q^2)\,\sigma_T(\nu', Q^2)}{\nu'(\nu'^2 - \nu^2)}\,d\nu'. \tag{92} $$
In the case of $K = K_H(\nu, Q^2) = \nu(1-x)$, the dispersion integral is of the same form as in Eq. (50), except for a factor $(1-x)$ typical for that choice of $K$. The pole contribution that enters Eq. (92) can be read off Eq. (84),
$$ \text{Re}\,f_T^{\text{pole}}(\nu, Q^2) = -\frac{\alpha_{em}}{M}\,\frac{\nu_B^2}{\nu^2 - \nu_B^2}\,G_M^2(Q^2). \tag{93} $$
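The function $f_T(\nu, Q^2) - f_T^{\text{pole}}(\nu, Q^2)$, i.e., excluding the nucleon pole term, is continuous in $\nu$. Therefore, one may perform a low-energy expansion in $\nu$,
$$ \text{Re}\,f_T(\nu, Q^2) - \text{Re}\,f_T^{\text{pole}}(\nu, Q^2) = \left[\text{Re}\,f_T(0, Q^2) - \text{Re}\,f_T^{\text{pole}}(0, Q^2)\right] + \left(\alpha(Q^2) + \beta(Q^2)\right)\nu^2 + \mathcal{O}(\nu^4), \tag{94} $$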
[Fig. 6 panels: left, $\alpha + \beta$ ($10^{-4}$ fm³) vs. $Q^2$ (GeV²) for $0 \le Q^2 \le 1$ GeV²; right, $(\alpha + \beta)\cdot Q^4/(2M)$ ($10^{-4}$) vs. $Q^2$ (GeV²) for $0 \le Q^2 \le 4$ GeV².]
Fig. 6. $Q^2$ dependence of the polarizability $\alpha + \beta$ (left) and $(\alpha + \beta)\cdot Q^4/(2M)$ (right) for the proton, as given by Eq. (95). The dashed (dashed–dotted) curves represent the MAID estimate [18,19] for the $\pi$ ($\pi + \eta + \pi\pi$) channels. The upper solid curve is the evaluation using the DIS structure function $F_1$ [42]. The lower solid curve is the evaluation for the resonance region ($W < 2$ GeV) using the same DIS structure function. The solid circle at $Q^2 = 0$ corresponds to the Baldin sum rule [13].
where the term in $\mathcal{O}(\nu^2)$ generalizes the definition of the sum of electric and magnetic polarizabilities at finite $Q^2$. Comparing Eqs. (92) and (94), one obtains the generalization of Baldin's sum rule to virtual photons,
$$ \alpha(Q^2) + \beta(Q^2) = \frac{1}{2\pi^2}\int_{\nu_0}^\infty \frac{K(\nu, Q^2)}{\nu}\,\frac{\sigma_T(\nu, Q^2)}{\nu^2}\,d\nu = \frac{e^2 M}{\pi Q^4}\int_0^{x_0} 2xF_1(x, Q^2)\,dx, \tag{95} $$
where in the last expression we have written the integral in terms of the nucleon structure function $F_1$ using Eq. (73). The Callan–Gross relation [41] implies that in the limit of large $Q^2$ the integrand $2xF_1(x, Q^2) \to F_2(x, Q^2)$, i.e., the generalized Baldin sum rule measures the second moment of $F_1$ and, asymptotically, the first moment of $F_2$. We can also define the resonance contribution to $\alpha + \beta$ through the integral
$$ \alpha_{\text{res}}(Q^2) + \beta_{\text{res}}(Q^2) = \frac{e^2 M}{\pi Q^4}\int_{x_{\text{res}}}^{x_0} 2xF_1(x, Q^2)\,dx, \tag{96} $$
where $x_{\text{res}}$ corresponds to $W = 2$ GeV. In Fig. 6, we show the $Q^2$ dependence of $\alpha + \beta$ and compare a resonance estimate with the evaluation for $Q^2 > 1$ GeV² obtained from the DIS structure function $F_1$, using the MRST01 parametrization [42]. For the resonance estimate we use the MAID model [18] for the one-pion channel and include an estimate for the $\eta$ and $\pi\pi$ channels according to Ref. [19]. One sees that at $Q^2 = 0$, the one-pion channel alone gives about 85% of Baldin's sum rule. Including the estimate for the $\eta$ and $\pi\pi$ channels, one nearly saturates Baldin's sum rule. Going to $Q^2$ larger than 1 GeV², we also show the sum rule estimate of Eq. (96) obtained from DIS by including only the range $W < 2$ GeV. The comparison of this result with the resonance estimate of MAID shows that the MAID model nicely reproduces the $Q^2$ dependence of $\sigma_T$ for $W < 2$ GeV. By comparing the full DIS estimate with the contribution from $W < 2$ GeV, one notices that the sum rule value for $\alpha + \beta$ at $Q^2 \lesssim 1$ GeV² is
mainly saturated by the resonance contribution, whereas for $Q^2 \gtrsim 2$ GeV², the nonresonant contribution ($W > 2$ GeV) dominates the sum rule. Therefore, around $Q^2 \approx 1$–2 GeV², a transition occurs from a resonance-dominated description to a partonic description. Such a transition was already noticed in Refs. [43,44], where a resonance estimate for $\alpha + \beta$ was compared with the DIS estimate, giving qualitatively similar results as shown here.⁴

As in the case of $f_T$, the longitudinal amplitude $f_L$ (which is also even in $\nu$) should obey a subtracted DR:
$$ \text{Re}\,f_L(\nu, Q^2) = \text{Re}\,f_L^{\text{pole}}(\nu, Q^2) + \left[\text{Re}\,f_L(0, Q^2) - \text{Re}\,f_L^{\text{pole}}(0, Q^2)\right] + \frac{\nu^2}{2\pi^2}\,\mathcal{P}\!\int_{\nu_0}^\infty \frac{K(\nu', Q^2)\,\sigma_L(\nu', Q^2)}{\nu'(\nu'^2 - \nu^2)}\,d\nu', \tag{97} $$
where the pole part of $f_L$ can be read off Eq. (84),
$$ \text{Re}\,f_L^{\text{pole}}(\nu, Q^2) = -\frac{\alpha_{em}}{M}\,\frac{Q^2}{\nu^2 - \nu_B^2}\,G_E^2(Q^2). \tag{98} $$
Analogously to Eq. (94), one may again perform a low-energy expansion for the nonpole (or inelastic) contribution to the function $f_L(\nu, Q^2)$, defining a longitudinal polarizability $\alpha_L(Q^2)$ as the coefficient of the $\nu^2$-dependent term. Eq. (97) then yields a sum rule for this polarizability:
$$ \alpha_L(Q^2) = \frac{1}{2\pi^2}\int_{\nu_0}^\infty \frac{K(\nu, Q^2)}{\nu}\,\frac{\sigma_L(\nu, Q^2)}{\nu^2}\,d\nu = \frac{e^2\,4M^3}{\pi Q^6}\int_0^{x_0} dx\left\{\frac{Q^2}{4M^2}\left[F_2(x, Q^2) - 2xF_1(x, Q^2)\right] + x^2 F_2(x, Q^2)\right\}. \tag{99} $$
In the last expression we used Eq. (73) to express $\alpha_L$ in terms of the first moment of $F_L \equiv F_2 - 2xF_1$ and the third moment of $F_2$. Comparing Eqs. (95) and (99), one sees that at large $Q^2$, where $F_1$, $F_2$ and $Q^2 F_L$ are $Q^2$ independent (modulo logarithmic scaling violations), the ratio $\alpha_L/(\alpha + \beta) \sim 1/Q^2$. The quantity $\alpha_L$ is therefore a measure of higher-twist (i.e., twist-4) matrix elements. In Fig. 7, we show the $Q^2$ dependence of $\alpha_L$ and compare the MAID model (for the one-pion channel) with the DIS evaluation of Eq. (99), using the MRST01 parametrization [42] for $F_2$ and $F_L$. By confronting the full DIS estimate with the contribution from the range $W < 2$ GeV, one first notices that around $Q^2 \approx 1$–2 GeV², a transition occurs from a resonance-dominated towards a partonic description, as is also seen for $\alpha + \beta$ in Fig. 6. Furthermore, by comparing the MAID model with the DIS evaluation of Eq. (99) in the range $W < 2$ GeV, one notices that, in contrast to the case of $\alpha + \beta$, the MAID model clearly underestimates $\alpha_L$. This points to a lack of longitudinal strength in the phenomenological model, which is to be addressed in future analyses. As in the case of real photons (Baldin sum rule!), the generalized polarizabilities can, in principle, be constructed directly from the experimental data. However, this requires a longitudinal–transverse separation of the cross sections at constant $Q^2$ over a large energy range.
⁴ Note, however, that in Refs. [43,44] the virtual photon flux differs from Eq. (96).
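The ν-space and x-space forms of the sum rules (95) and (99) can be cross-checked numerically. The sketch evaluates both sides with the same Simpson rule, using Eq. (73) with Hand's K and hypothetical toy structure functions (not fits to data); the π⁰ mass is assumed to set the threshold ν₀, and results are in GeV⁻³ (multiply by 0.19733³ to obtain fm³). Agreement only confirms the change of variables and the prefactors, not any physics content of the toy inputs.

```python
import math

ALPHA_EM = 1.0 / 137.035999
E2 = 4.0 * math.pi * ALPHA_EM      # e^2, since alpha_em = e^2 / (4 pi)
M = 0.938272                       # proton mass in GeV (assumed value)
M_PI = 0.1349768                   # pi0 mass in GeV, fixes nu_0 (assumed)

def F1(x):                         # hypothetical toy F_1, NOT a fit to data
    return (1.0 - x) ** 3

def F2(x):                         # toy F_2 = 2x F_1 + F_L with F_L = 0.1 (1 - x)^4
    return 2.0 * x * F1(x) + 0.1 * (1.0 - x) ** 4

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return s * h / 3.0

def nu_threshold(Q2):
    return M_PI + (M_PI**2 + Q2) / (2.0 * M)

def sigma_T(nu, Q2):                                  # Eq. (73)
    x = Q2 / (2.0 * M * nu)
    K = nu * (1.0 - x)                                # Hand's K, Eq. (67)
    return 4.0 * math.pi**2 * ALPHA_EM / (M * K) * F1(x)

def sigma_L(nu, Q2):                                  # Eq. (73)
    x = Q2 / (2.0 * M * nu)
    K = nu * (1.0 - x)
    g2 = Q2 / nu**2                                   # gamma^2
    return 4.0 * math.pi**2 * ALPHA_EM / K * ((1.0 + g2) / (g2 * nu) * F2(x) - F1(x) / M)

def polarizability_nu(sigma, Q2, value_at_infinity=0.0):
    """(1/2pi^2) int_{nu0}^inf (K/nu) sigma / nu^2 dnu, integrated in u = 1/nu.

    The Jacobian dnu = nu^2 du cancels the 1/nu^2 of the integrand;
    value_at_infinity is the integrand's limit at u = 0 (nu -> infinity).
    """
    def integrand(u):
        if u == 0.0:
            return value_at_infinity
        nu = 1.0 / u
        K = nu * (1.0 - Q2 / (2.0 * M * nu))
        return K / nu * sigma(nu, Q2) / (2.0 * math.pi**2)
    return simpson(integrand, 0.0, 1.0 / nu_threshold(Q2))

def alpha_plus_beta_x(Q2):                            # last form of Eq. (95)
    x0 = Q2 / (2.0 * M * nu_threshold(Q2))
    return E2 * M / (math.pi * Q2**2) * simpson(lambda x: 2.0 * x * F1(x), 0.0, x0)

def alpha_L_x(Q2):                                    # last form of Eq. (99)
    x0 = Q2 / (2.0 * M * nu_threshold(Q2))
    body = lambda x: Q2 / (4.0 * M**2) * (F2(x) - 2.0 * x * F1(x)) + x**2 * F2(x)
    return E2 * 4.0 * M**3 / (math.pi * Q2**3) * simpson(body, 0.0, x0)

Q2 = 1.0
ab_nu, ab_x = polarizability_nu(sigma_T, Q2), alpha_plus_beta_x(Q2)
aL_nu = polarizability_nu(sigma_L, Q2, value_at_infinity=2.0 * ALPHA_EM * F2(0.0) / Q2)
aL_x = alpha_L_x(Q2)
```

The limit passed for σ_L follows from Eq. (73): at large ν the integrand tends to 2α_em F₂(0)/Q², which is nonzero for the toy F₂ chosen here.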
[Fig. 7 panels: left, $\alpha_L$ ($10^{-4}$ fm³) vs. $Q^2$ (GeV²) for $0 \le Q^2 \le 1$ GeV²; right, $\alpha_L\cdot Q^6/(2M)^3$ ($10^{-4}$) vs. $Q^2$ (GeV²) for $0 \le Q^2 \le 4$ GeV².]
Fig. 7. $Q^2$ dependence of the polarizability $\alpha_L$ (left) and $\alpha_L\cdot Q^6/(2M)^3$ (right) for the proton, as given by Eq. (99). The dashed curve represents the MAID estimate [18,19] for the one-pion channel. The upper solid curve is the evaluation using the DIS structure functions $F_2$ and $F_L$ [42]. The lower solid curve is the evaluation for the resonance region ($W < 2$ GeV) using the same DIS structure functions.
We next turn to the sum rules for the spin-dependent VVCS amplitudes (see also Ref. [45], where a nice review of generalized sum rules for spin-dependent nucleon structure functions has been given). Assuming an appropriate high-energy behavior, the spin-flip amplitude $g_{TT}$ (which is odd in $\nu$) satisfies an unsubtracted DR as in Eq. (51),
$$ \text{Re}\,g_{TT}(\nu, Q^2) = \frac{2\nu}{\pi}\,\mathcal{P}\!\int_0^\infty \frac{\text{Im}\,g_{TT}(\nu', Q^2)}{\nu'^2 - \nu^2}\,d\nu'. \tag{100} $$
Assuming that the integral of Eq. (100) converges, one can separate the contributions from the elastic cross section at $\nu = \nu_B$ and the inelastic continuum for $\nu > \nu_0$, and by use of Eq. (82), one obtains
$$ \text{Re}\,g_{TT}(\nu, Q^2) = \text{Re}\,g_{TT}^{\text{pole}}(\nu, Q^2) + \frac{\nu}{2\pi^2}\,\mathcal{P}\!\int_{\nu_0}^\infty \frac{K(\nu', Q^2)\,\sigma_{TT}(\nu', Q^2)}{\nu'^2 - \nu^2}\,d\nu', \tag{101} $$
where the pole part is given by Eq. (84) as
$$ \text{Re}\,g_{TT}^{\text{pole}}(\nu, Q^2) = -\frac{\alpha_{em}\,\nu}{2M^2}\,\frac{Q^2}{\nu^2 - \nu_B^2}\,G_M^2(Q^2). \tag{102} $$
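Performing next a low-energy expansion (LEX) for the nonpole contribution to $g_{TT}(\nu, Q^2)$, we obtain
$$ \text{Re}\,g_{TT}(\nu, Q^2) - \text{Re}\,g_{TT}^{\text{pole}}(\nu, Q^2) = \frac{2\alpha_{em}}{M^2}\,I_A(Q^2)\,\nu + \gamma_0(Q^2)\,\nu^3 + \mathcal{O}(\nu^5). \tag{103} $$
For the $\mathcal{O}(\nu)$ term, Eq. (101) yields a generalization of the GDH sum rule:
$$ I_A(Q^2) = \frac{M^2}{\pi e^2}\int_{\nu_0}^\infty \frac{K(\nu, Q^2)}{\nu}\,\frac{\sigma_{TT}(\nu, Q^2)}{\nu}\,d\nu = \frac{2M^2}{Q^2}\int_0^{x_0} dx\left[g_1(x, Q^2) - \frac{4M^2}{Q^2}\,x^2 g_2(x, Q^2)\right], \tag{104} $$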
[Fig. 8 panels: left, $\gamma_0$ ($10^{-4}$ fm⁴) vs. $Q^2$ (GeV²) for $0 \le Q^2 \le 1$ GeV²; right, $\gamma_0\cdot Q^6/(2M)^2$ ($10^{-4}$) vs. $Q^2$ (GeV²) for $0 \le Q^2 \le 4$ GeV².]
Fig. 8. $Q^2$ dependence of the polarizability $\gamma_0$ (left) and $\gamma_0\cdot Q^6/(2M)^2$ (right) for the proton, as given by Eq. (105). The dashed (dashed–dotted) curves represent the MAID estimate [18,19] for the $\pi$ ($\pi + \eta + \pi\pi$) channels. The upper solid curve is the evaluation using the DIS structure function $g_1^p$ [46]. The lower solid curve is the evaluation for the resonance region ($W < 2$ GeV) using the DIS structure function. The shaded bands represent the corresponding error estimates as given by Ref. [46]. The solid circle at $Q^2 = 0$ corresponds to the evaluation of Eq. (60).
where the integral $I_A(Q^2)$ has been introduced in Ref. [19]. At $Q^2 = 0$, one recovers the GDH sum rule of Eq. (57) as $I_A(0) = -\kappa_N^2/4$. However, it has to be realized that several definitions have been given of how to generalize the integral to finite $Q^2$ [19]. The definition $I_A$ of Eq. (104) has the advantage that the (arbitrary) factor $K$ in the photon flux disappears (see the discussion after Eq. (82)). In other definitions the factor $K/\nu$ in Eq. (104) is simply replaced by 1, which formally makes the integral look like the GDH integral for real photons, Eq. (57). Unfortunately, these integrals then depend on the definition of $K$ (see Eq. (67)). In the following we call these integrals $I_B$ (Gilman's definition) and $I_C$ (Hand's definition), and refer the reader to Ref. [19] for the expressions analogous to Eq. (104) and further details. We will discuss the $\mathcal{O}(\nu)$ term in Eq. (103) and the first moment of $g_1$ in detail further on, and turn first to the $\mathcal{O}(\nu^3)$ term. From the $\mathcal{O}(\nu^3)$ term of Eq. (103), one obtains a generalization of the forward spin polarizability,
$$ \gamma_0(Q^2) = \frac{1}{2\pi^2}\int_{\nu_0}^\infty \frac{K(\nu, Q^2)}{\nu}\,\frac{\sigma_{TT}(\nu, Q^2)}{\nu^3}\,d\nu = \frac{e^2\,4M^2}{\pi Q^6}\int_0^{x_0} dx\,x^2\left[g_1(x, Q^2) - \frac{4M^2}{Q^2}\,x^2 g_2(x, Q^2)\right]. \tag{105} $$
At large $Q^2$, the term proportional to $g_2$ in Eq. (105) can be dropped, and $\gamma_0$ is then proportional to the third moment of $g_1$. In Fig. 8, we show the $Q^2$ dependence of $\gamma_0$ and compare the resonance estimate from MAID to the evaluation with the DIS structure function $g_1$ for $Q^2 > 1$ GeV². For the structure function $g_1$, we use the recent fit performed in [46], which also provides $1\sigma$ error bands for this distribution, allowing us to determine the experimental error on $\gamma_0$, as shown by the shaded bands in Fig. 8. At low $Q^2$, one sees that the estimate for the one-pion channel completely dominates $\gamma_0$ and reproduces well its measured value at $Q^2 = 0$. At $Q^2 > 2$ GeV², the MAID model ($\pi + \eta + \pi\pi$ channels) is also in good agreement with the DIS evaluation of the $W < 2$ GeV range in the integral of Eq. (105) for $\gamma_0$. Furthermore, comparing the full DIS estimate with the contribution from the range $W < 2$ GeV, we once more observe the gradual transition from the resonance-dominated to the partonic region.
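How the powers of x in Eqs. (95) and (105) weight the low-W region can be illustrated with a toy calculation: convert W = 2 GeV into x_res at a given Q², then compare the share of a moment that comes from x > x_res for the weights x and x². The shape function is a hypothetical positive placeholder used for both moments, and the upper limit is set to 1 rather than the threshold value x₀; only the qualitative trend is meaningful, not the numbers.

```python
import math

M = 0.938272   # proton mass in GeV (assumed value)

def x_at_W(W, Q2):
    """Bjorken x at invariant mass W, from W^2 = M^2 + Q^2 (1/x - 1)."""
    return Q2 / (W**2 - M**2 + Q2)

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return s * h / 3.0

def low_W_fraction(power, shape, Q2, W_res=2.0):
    """Share of int_0^1 x^power * shape(x) dx coming from W < W_res (x > x_res)."""
    x_res = x_at_W(W_res, Q2)
    weighted = lambda x: x**power * shape(x)
    return simpson(weighted, x_res, 1.0) / simpson(weighted, 0.0, 1.0)

toy_shape = lambda x: (1.0 - x) ** 3                 # hypothetical shape, not data
frac_x1 = low_W_fraction(1, toy_shape, Q2=4.0)      # weight x, as in 2x F_1 of Eq. (95)
frac_x2 = low_W_fraction(2, toy_shape, Q2=4.0)      # weight x^2, as in x^2 g_1 of Eq. (105)
print(f"x_res = {x_at_W(2.0, 4.0):.3f}; low-W share: {frac_x1:.3f} (x) vs {frac_x2:.3f} (x^2)")
```

Each extra power of x corresponds to one extra power of 1/ν in the ν-space integrand, so the x² moment always leans further toward large x, i.e. toward the resonance region.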
Around $Q^2 = 4$ GeV², the $W < 2$ GeV region contributes about 30% to $\gamma_0$, whereas for $\alpha + \beta$ at the same $Q^2$ this contribution is below 10%. This difference can be understood by comparing the sum rule of Eq. (105) for $\gamma_0$ with Eq. (95) for $\alpha + \beta$: the sum rule for $\gamma_0$ involves one additional power of $\nu$ in the denominator, giving higher weight to the resonance region as compared with $\alpha + \beta$.

We next turn to the amplitude $g_{LT}(\nu, Q^2)$, which is even in $\nu$. Assuming that an unsubtracted DR exists for the amplitude $g_{LT}$, it takes the form
$$ \text{Re}\,g_{LT}(\nu, Q^2) = \text{Re}\,g_{LT}^{\text{pole}}(\nu, Q^2) + \frac{1}{2\pi^2}\,\mathcal{P}\!\int_{\nu_0}^\infty \frac{\nu'\,K(\nu', Q^2)\,\sigma_{LT}(\nu', Q^2)}{\nu'^2 - \nu^2}\,d\nu', \tag{106} $$
where the pole part is given by Eq. (84) as
$$ \text{Re}\,g_{LT}^{\text{pole}}(\nu, Q^2) = -\frac{\alpha_{em}\,Q}{2M^2}\,\frac{Q^2}{\nu^2 - \nu_B^2}\,G_E(Q^2)\,G_M(Q^2). \tag{107} $$
One sees that for the unsubtracted dispersion integral of Eq. (106) to converge, the cross section $\sigma_{LT}(\nu, Q^2)$ should drop faster than $1/\nu$ at large $\nu$. One can then perform a low-energy expansion for the nonpole contribution to $g_{LT}(\nu, Q^2)$,
$$ \text{Re}\,g_{LT}(\nu, Q^2) - \text{Re}\,g_{LT}^{\text{pole}}(\nu, Q^2) = \frac{2\alpha_{em}}{M^2}\,Q\,I_3(Q^2) + Q\,\delta_{LT}(Q^2)\,\nu^2 + \mathcal{O}(\nu^4), \tag{108} $$
where $I_3(Q^2)$ has been introduced in Ref. [19] as
$$ I_3(Q^2) = \frac{M^2}{\pi e^2}\int_{\nu_0}^\infty \frac{K(\nu, Q^2)}{\nu}\,\frac{\sigma_{LT}(\nu, Q^2)}{Q}\,d\nu = \frac{2M^2}{Q^2}\int_0^{x_0} dx\,\left\{g_1(x, Q^2) + g_2(x, Q^2)\right\}. \tag{109} $$
For the $\mathcal{O}(\nu^2)$ term of Eq. (108), one obtains a generalized longitudinal–transverse polarizability,
$$ \delta_{LT}(Q^2) = \frac{1}{2\pi^2}\int_{\nu_0}^\infty \frac{K(\nu, Q^2)}{\nu}\,\frac{\sigma_{LT}(\nu, Q^2)}{Q\,\nu^2}\,d\nu = \frac{e^2\,4M^2}{\pi Q^6}\int_0^{x_0} dx\,x^2\left\{g_1(x, Q^2) + g_2(x, Q^2)\right\}. \tag{110} $$
This function is finite in the limit $Q^2 \to 0$ and can be evaluated safely on the basis of dispersion relations. We note that in Ref. [19], the quantity $\delta_0$ differs by the factor $(1-x)$ in the integrand. At large $Q^2$, $\delta_{LT}$ is proportional to the third moment of the transverse spin structure function $g_T \equiv g_1 + g_2$. In this limit, Wandzura and Wilczek [47] have shown that, when neglecting dynamical (twist-3) quark–gluon correlations, the transverse spin structure function $g_T$ can be expressed in terms of the twist-2 spin structure function $g_1$ as
$$ g_1(x, Q^2) + g_2(x, Q^2) = \int_x^1 dy\,\frac{g_1(y, Q^2)}{y}. \tag{111} $$
Recent experimental data from SLAC [48,49] for the spin structure function $g_2$ show that the measured value of $g_2(x, Q^2)$ (in the range $0.02 \le x \le 0.8$ and $1\ \text{GeV}^2 \le Q^2 \le 30\ \text{GeV}^2$) is consistent
[Fig. 9 panels: left, $\delta_{LT}$ ($10^{-4}$ fm⁴) vs. $Q^2$ (GeV²) for $0 \le Q^2 \le 1$ GeV²; right, $\delta_{LT}\cdot Q^6/(2M)^2$ ($10^{-4}$) vs. $Q^2$ (GeV²) for $0 \le Q^2 \le 4$ GeV².]
Fig. 9. $Q^2$ dependence of the polarizability $\delta_{LT}(Q^2)$ (left) and $\delta_{LT}(Q^2)\cdot Q^6/(2M)^2$ (right) for the proton, as given by Eq. (110). The dashed curve represents the MAID estimate for the one-pion channel [18]. The upper solid curve is the evaluation using the DIS structure function $g_T = g_1 + g_2$ [46]. The lower solid curve is the evaluation for the resonance region ($W < 2$ GeV) using the DIS structure function. The shaded bands represent the corresponding error estimates as given by Ref. [46].
with the Wandzura–Wilczek (WW) relation of Eq. (111). One can therefore evaluate the rhs of Eq. (110), to good approximation, by calculating the third moment of both sides of Eq. (111). By changing the integration variables $(x, y) \to (z, y)$ with $x = z\cdot y$, one obtains
$$ \int_0^1 dx\,x^2\int_x^1 dy\,\frac{g_1(y, Q^2)}{y} = \frac{1}{3}\int_0^1 dy\,y^2 g_1(y, Q^2). \tag{112} $$
Combining Eqs. (105) and (110) with Eq. (112) and using the WW relation, we may relate the generalized spin polarizabilities $\delta_{LT}(Q^2)$ and $\gamma_0(Q^2)$ at large $Q^2$:⁵
$$ \delta_{LT}(Q^2) \to \frac{1}{3}\,\gamma_0(Q^2), \qquad Q^2 \to \infty. \tag{113} $$
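The change of variables behind Eq. (112), and hence the factor 1/3 in Eq. (113), can be checked numerically for any reasonable g₁. This sketch uses a hypothetical toy g₁(y) = y(1 − y)³, for which both sides of Eq. (112) equal 1/420 exactly, and a nested Simpson rule for the Wandzura–Wilczek integral of Eq. (111); the function names are illustrative.

```python
import math

def simpson(f, a, b, n=400):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return s * h / 3.0

def g1_toy(y):
    """Hypothetical twist-2 shape y (1 - y)^3, for illustration only."""
    return y * (1.0 - y) ** 3

def gT_ww(x, g1=g1_toy):
    """Wandzura-Wilczek g_T(x) = int_x^1 dy g1(y)/y, Eq. (111); requires x > 0."""
    return simpson(lambda y: g1(y) / y, x, 1.0)

# Left side of Eq. (112); at x = 0 the weight x^2 removes the finite integrand.
lhs = simpson(lambda x: x**2 * gT_ww(x) if x > 0.0 else 0.0, 0.0, 1.0)
# Right side of Eq. (112)
rhs = simpson(lambda y: y**2 * g1_toy(y), 0.0, 1.0) / 3.0
# Ratio of third moments of g_T and g_1, which Eq. (113) says tends to 1/3
ratio = lhs / simpson(lambda x: x**2 * g1_toy(x), 0.0, 1.0)
```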
In Fig. 9, the $Q^2$ dependence of the polarizability $\delta_{LT}$ is shown both for the MAID model (for the one-pion channel) and for the DIS evaluation of Eq. (113). Comparing the MAID model with the DIS evaluation for the range $W < 2$ GeV, one notices that the MAID model underestimates $\delta_{LT}$, similarly to what was seen in Fig. 7 for $\alpha_L$. As the polarizability $\delta_{LT}$ involves a longitudinal amplitude, this may again point to a lack of longitudinal strength in the MAID model. In order to construct the VVCS amplitudes, which are in one-to-one correspondence with the quark structure functions, it is useful to cast Eq. (81) into a covariant form,
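$$
\begin{aligned}
T(\nu, Q^2, \theta = 0) = \varepsilon'^{*}_{\mu}\,\varepsilon_{\nu}\Big\{ &\Big(-g^{\mu\nu} + \frac{q^\mu q^\nu}{q^2}\Big)T_1(\nu, Q^2) + \frac{1}{p\cdot q}\Big(p^\mu - \frac{p\cdot q}{q^2}\,q^\mu\Big)\Big(p^\nu - \frac{p\cdot q}{q^2}\,q^\nu\Big)T_2(\nu, Q^2)\\
&+ \frac{i}{M}\,\epsilon^{\mu\nu\alpha\beta}q_\alpha s_\beta\,S_1(\nu, Q^2) + \frac{i}{M^3}\,\epsilon^{\mu\nu\alpha\beta}q_\alpha\left(p\cdot q\,s_\beta - s\cdot q\,p_\beta\right)S_2(\nu, Q^2)\Big\},
\end{aligned}
\tag{114}
$$

⁵ Note that for $Q^2 \to \infty$, one can again neglect the elastic contribution and make the replacement $\int_0^{x_0} \to \int_0^1$ in Eq. (110).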
where $\epsilon_{0123} = +1$, and $s$ is the nucleon covariant spin vector satisfying $s\cdot p = 0$, $s^2 = -1$. With the definition of Eq. (114), all four VVCS amplitudes $T_1$, $T_2$, $S_1$ and $S_2$ have the same dimension, namely that of an inverse mass. Furthermore, the four new amplitudes are related to the previously introduced VVCS amplitudes of Eq. (81) as follows:
$$ T_1(\nu, Q^2) = f_T(\nu, Q^2), \tag{115} $$
$$ T_2(\nu, Q^2) = \frac{\nu Q^2}{M(\nu^2 + Q^2)}\left(f_T(\nu, Q^2) + f_L(\nu, Q^2)\right), \tag{116} $$
$$ S_1(\nu, Q^2) = \frac{\nu M}{\nu^2 + Q^2}\left(g_{TT}(\nu, Q^2) + \frac{Q}{\nu}\,g_{LT}(\nu, Q^2)\right), \tag{117} $$
$$ S_2(\nu, Q^2) = -\frac{M^2}{\nu^2 + Q^2}\left(g_{TT}(\nu, Q^2) - \frac{\nu}{Q}\,g_{LT}(\nu, Q^2)\right). \tag{118} $$
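The Born contributions to these functions can be expressed in terms of the form factors by use of Eq. (84) as follows:
$$
\begin{aligned}
T_1^{\text{Born}}(\nu, Q^2) &= -\frac{\alpha_{em}}{M}\left[F_D^2 + \frac{\nu_B^2}{\nu^2 - \nu_B^2 + i\epsilon}\,G_M^2\right],\\
T_2^{\text{Born}}(\nu, Q^2) &= -\frac{\alpha_{em}}{M^2}\,\frac{\nu Q^2}{\nu^2 - \nu_B^2 + i\epsilon}\left(F_D^2 + \tau F_P^2\right),\\
S_1^{\text{Born}}(\nu, Q^2) &= -\frac{\alpha_{em}}{2M}\left[F_P^2 + \frac{Q^2}{\nu^2 - \nu_B^2 + i\epsilon}\,F_D(F_D + F_P)\right],\\
S_2^{\text{Born}}(\nu, Q^2) &= \frac{\alpha_{em}}{2}\,\frac{\nu}{\nu^2 - \nu_B^2 + i\epsilon}\,F_P(F_D + F_P).
\end{aligned}
\tag{119}
$$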
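Eqs. (84) and (116)–(119) can be checked against each other numerically: build $T_2$, $S_1$, $S_2$ from the Born helicity amplitudes and compare with the closed forms, which also confirms that the factors $1/(\nu^2 + Q^2)$ drop out. This sketch assumes a dipole parametrization ($G_E = G_D$, $G_M = \mu_p G_D$, $G_D = (1 + Q^2/0.71\ \text{GeV}^2)^{-2}$, $\mu_p = 2.7928$) purely for illustration; the identities hold for any form factors, and only real parts away from $\nu = \nu_B$ are evaluated.

```python
import math

ALPHA_EM = 1.0 / 137.035999
M = 0.938272        # proton mass in GeV (assumed value)
MU_P = 2.7928       # proton magnetic moment in nuclear magnetons (assumed value)

def form_factors(Q2):
    """Hypothetical dipole model; returns (F_D, F_P, G_E, G_M)."""
    G_D = 1.0 / (1.0 + Q2 / 0.71) ** 2
    G_E, G_M = G_D, MU_P * G_D
    tau = Q2 / (4.0 * M**2)
    F_D = (G_E + tau * G_M) / (1.0 + tau)      # inverts Eq. (85)
    F_P = (G_M - G_E) / (1.0 + tau)
    return F_D, F_P, G_E, G_M

def born_helicity(nu, Q2):
    """Real parts of f_T, f_L, g_TT, g_LT from Eq. (84), nu != nu_B."""
    F_D, F_P, G_E, G_M = form_factors(Q2)
    D = nu**2 - (Q2 / (2.0 * M)) ** 2
    Q = math.sqrt(Q2)
    f_T = -ALPHA_EM / M * (F_D**2 + (Q2 / (2.0 * M)) ** 2 / D * G_M**2)
    f_L = -ALPHA_EM * Q2 / (4.0 * M**3) * (F_P**2 + 4.0 * M**2 / D * G_E**2)
    g_TT = -ALPHA_EM * nu / (2.0 * M**2) * (F_P**2 + Q2 / D * G_M**2)
    g_LT = ALPHA_EM * Q / (2.0 * M**2) * (F_D * F_P - Q2 / D * G_E * G_M)
    return f_T, f_L, g_TT, g_LT

def covariant_from_helicity(nu, Q2):
    """T2, S1, S2 built through Eqs. (116)-(118)."""
    f_T, f_L, g_TT, g_LT = born_helicity(nu, Q2)
    Q = math.sqrt(Q2)
    T2 = nu * Q2 / (M * (nu**2 + Q2)) * (f_T + f_L)
    S1 = nu * M / (nu**2 + Q2) * (g_TT + Q / nu * g_LT)
    S2 = -M**2 / (nu**2 + Q2) * (g_TT - nu / Q * g_LT)
    return T2, S1, S2

def covariant_closed_form(nu, Q2):
    """T2, S1, S2 Born terms as written in Eq. (119)."""
    F_D, F_P, _, _ = form_factors(Q2)
    tau = Q2 / (4.0 * M**2)
    D = nu**2 - (Q2 / (2.0 * M)) ** 2
    T2 = -ALPHA_EM / M**2 * nu * Q2 / D * (F_D**2 + tau * F_P**2)
    S1 = -ALPHA_EM / (2.0 * M) * (F_P**2 + Q2 / D * F_D * (F_D + F_P))
    S2 = ALPHA_EM / 2.0 * nu / D * F_P * (F_D + F_P)
    return T2, S1, S2

for nu, Q2 in [(0.3, 0.5), (1.7, 1.2), (0.05, 0.2)]:
    for lhs, rhs in zip(covariant_from_helicity(nu, Q2), covariant_closed_form(nu, Q2)):
        assert math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-15)
```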
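One sees that the pole singularities at $\nu = \pm iQ$, which appear because of the denominators in Eqs. (116)–(118), are actually canceled by a corresponding zero in the numerator of the Born terms. The imaginary parts of the inelastic contributions follow from Eq. (82),
$$ \text{Im}\,T_1 = \frac{K}{4\pi}\,\sigma_T = \frac{e^2}{4M}\,F_1, \tag{120} $$
$$ \text{Im}\,T_2 = \frac{\nu Q^2}{M(\nu^2 + Q^2)}\,\frac{K}{4\pi}\,(\sigma_T + \sigma_L) = \frac{e^2}{4M}\,F_2, \tag{121} $$
$$ \text{Im}\,S_1 = \frac{\nu M}{\nu^2 + Q^2}\,\frac{K}{4\pi}\left(\sigma_{TT} + \frac{Q}{\nu}\,\sigma_{LT}\right) = \frac{e^2}{4M}\,\frac{M}{\nu}\,g_1 =: \frac{e^2}{4M}\,G_1, \tag{122} $$
$$ \text{Im}\,S_2 = -\frac{M^2}{\nu^2 + Q^2}\,\frac{K}{4\pi}\left(\sigma_{TT} - \frac{\nu}{Q}\,\sigma_{LT}\right) = \frac{e^2}{4M}\,\frac{M^2}{\nu^2}\,g_2 =: \frac{e^2}{4M}\,G_2. \tag{123} $$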
In order to cancel the singularities at $\nu = \pm iQ$, the following relations should be fulfilled if the partial cross sections are continued into the complex $\nu$-plane:
$$ \sigma_T(iQ, Q^2) = -\sigma_L(iQ, Q^2), \qquad \sigma_{TT}(iQ, Q^2) = i\,\sigma_{LT}(iQ, Q^2). \tag{124} $$
These relations can be verified by realizing that the singularities at $Q^2 + \nu^2 = 0$ correspond to the Siegert limit, $|\mathbf{q}_{\text{lab}}| \to 0$, which also implies $|\mathbf{q}_{\text{cm}}| \to 0$. Furthermore, all multipoles vanish in that limit, except for the (unretarded) dipole amplitudes. In the case of one-pion production as presented in Ref. [19], these are the amplitudes $E_{0+}$ and $L_{0+}$ of the transverse and longitudinal currents, respectively, which become equal in the Siegert limit. The relations of Eq. (124) then follow straightforwardly. We next discuss dispersion relations for the spin-dependent amplitudes $S_1$ and $S_2$. The spin-dependent VVCS amplitude $S_1$ is even in $\nu$, and an unsubtracted DR reads
$$ \text{Re}\,S_1(\nu, Q^2) = \text{Re}\,S_1^{\text{pole}}(\nu, Q^2) + \frac{2}{\pi}\,\mathcal{P}\!\int_{\nu_0}^\infty \frac{\nu'\,\text{Im}\,S_1(\nu', Q^2)}{\nu'^2 - \nu^2}\,d\nu', \tag{125} $$
where the pole part $\text{Re}\,S_1^{\text{pole}}$ is obtained from Eq. (119) as
$$ \text{Re}\,S_1^{\text{pole}}(\nu, Q^2) = -\frac{\alpha_{em}}{2M}\,\frac{Q^2}{\nu^2 - \nu_B^2}\,F_D(Q^2)\left(F_D(Q^2) + F_P(Q^2)\right). \tag{126} $$
We can next perform a low-energy expansion for $S_1(\nu, Q^2) - S_1^{\text{pole}}(\nu, Q^2)$,
$$ \text{Re}\,S_1(\nu, Q^2) - \text{Re}\,S_1^{\text{pole}}(\nu, Q^2) = \frac{2\alpha_{em}}{M}\,I_1(Q^2) + \left[\frac{2\alpha_{em}}{M}\,\frac{1}{Q^2}\left(I_A(Q^2) - I_1(Q^2)\right) + M\,\delta_{LT}(Q^2)\right]\nu^2 + \mathcal{O}(\nu^4), \tag{127} $$
where the leading term in $\nu^0$ follows from Eq. (125) as
$$ I_1(Q^2) \equiv \frac{2M^2}{Q^2}\int_0^{x_0} g_1(x, Q^2)\,dx = \frac{M^2}{\pi e^2}\int_{\nu_0}^\infty \frac{K(\nu, Q^2)}{\nu^2 + Q^2}\left(\sigma_{TT}(\nu, Q^2) + \frac{Q}{\nu}\,\sigma_{LT}(\nu, Q^2)\right)d\nu, \tag{128} $$
which reduces to the GDH sum rule at $Q^2 = 0$, as $I_1(0) = -\kappa_N^2/4$. By using Eqs. (104), (110) and (128), one can verify that the term in $\nu^2$ in Eq. (127) can be expressed in terms of $I_A$, $I_1$ and $\delta_{LT}$.
At large $Q^2$, $I_1(Q^2)$ has the limit
$$ I_1(Q^2) \to \frac{2M^2}{Q^2}\,\Gamma_1(Q^2), \qquad Q^2 \to \infty, \tag{129} $$
with⁶
$$ \Gamma_1(Q^2) \equiv \int_0^1 dx\,g_1(x, Q^2). \tag{130} $$
For the first moment $\Gamma_1$, a next-to-leading order (NLO) QCD fit to all available DIS data for $g_1^p$ ($g_1^n$) has been performed in Ref. [50], yielding the values at $Q^2 = 5$ GeV²:
$$ \Gamma_1^p = 0.118 \pm 0.004 \pm 0.007, \qquad \Gamma_1^n = -0.058 \pm 0.005 \pm 0.008, \qquad \Gamma_1^p - \Gamma_1^n = 0.176 \pm 0.003 \pm 0.007. \tag{131} $$
For the isovector combination $\Gamma_1^p - \Gamma_1^n$, the Bjorken sum rule [51] predicts
$$ \Gamma_1^p - \Gamma_1^n \to \frac{1}{6}\,g_A = 0.211 \pm 0.001, \qquad Q^2 \to \infty, \tag{132} $$
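where $g_A$ is the axial-vector weak coupling constant. The inclusion of QCD corrections up to order $\alpha_s^3$ yields [52]:
$$ \Gamma_1^p - \Gamma_1^n = \frac{1}{6}\,g_A\left[1 - \frac{\alpha_s(Q^2)}{\pi} - 3.5833\left(\frac{\alpha_s(Q^2)}{\pi}\right)^2 - 20.2153\left(\frac{\alpha_s(Q^2)}{\pi}\right)^3\right]. \tag{133} $$
When evaluating Eq. (133) using three light quark flavors in $\alpha_s$ and fixing $\alpha_s(M_Z^2)$ at 0.114, one obtains [50]:
$$ \Gamma_1^p - \Gamma_1^n = 0.182 \pm 0.005 \quad \text{at } Q^2 = 5\ \text{GeV}^2. \tag{134} $$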
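Eq. (133) is a truncated power series in α_s/π and is easy to evaluate. The sketch below takes α_s(Q²) as an input rather than running it down from α_s(M_Z²) = 0.114 (which would require a QCD evolution with flavor thresholds, not attempted here), and by default sets g_A/6 to the value 0.211 of Eq. (132); the function name is illustrative.

```python
import math

def bjorken_integral(alpha_s, g_A=6.0 * 0.211):
    """Gamma_1^p - Gamma_1^n from the O(alpha_s^3) series of Eq. (133), n_f = 3.

    alpha_s: strong coupling at the Q^2 of interest (supplied by the caller);
    g_A: axial coupling, by default chosen so that g_A / 6 = 0.211, Eq. (132).
    """
    a = alpha_s / math.pi
    return g_A / 6.0 * (1.0 - a - 3.5833 * a**2 - 20.2153 * a**3)

# Purely illustrative coupling values, not the alpha_s(5 GeV^2) used in Ref. [50]
for alpha_s in (0.0, 0.2, 0.3):
    print(alpha_s, round(bjorken_integral(alpha_s), 4))
```

At α_s = 0 the series reduces to the asymptotic value of Eq. (132), and each QCD correction lowers the prediction, which is the direction of the shift from 0.211 to 0.182 quoted above.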
One sees that the experimental value of Eq. (131) is in good agreement with the Bjorken sum rule value of Eq. (134). In Fig. 10, we show the $Q^2$ dependence of $I_1$ for the proton and compare the MAID estimate with the DIS evaluation for $Q^2 > 1$ GeV², using the parametrization of Ref. [46] for $g_1$. One immediately sees that the integral $I_1^p$ has to undergo a sign change from the large negative GDH sum rule value at $Q^2 = 0$ to the positive value at large $Q^2$ extracted from DIS. Recent data from SLAC [55] and JLab/CLAS [57] cover the intermediate $Q^2$ range. In particular, the JLab/CLAS data, which extend down to $Q^2 \approx 0.15$ GeV², clearly confirm this sign change in the sum rule, which occurs around $Q^2 \approx 0.25$ GeV². The resonance estimate of MAID, including the $\pi + \eta + \pi\pi$ channels, also displays such a sign change. Given some uncertainties in the evaluation of the $\pi\pi$ channels and higher continua, the calculation qualitatively reproduces the trend of the data for the $W < 2$ GeV contribution to $I_1^p$.
⁶ At $Q^2 \to \infty$, one can replace $\int_0^{x_0} \to \int_0^1$, because the elastic contribution to $\Gamma_1$ vanishes like $Q^{-8}$ and is therefore highly suppressed.
[Fig. 10 panels: upper, $I_1^p$ vs. $Q^2$ (GeV²) for $0 \le Q^2 \le 1.4$ GeV²; lower, $I_1^p\cdot Q^2/(2M^2)$ vs. $Q^2$ (GeV²) for $0 \le Q^2 \le 4$ GeV².]
Fig. 10. $Q^2$ dependence of the integral $I_1^p$ (upper panel) and $I_1^p\cdot Q^2/(2M^2)$ (lower panel) for the proton, as given by Eq. (128). The dashed (dashed–dotted) curves represent the MAID estimate [18,19] for the $\pi$ ($\pi + \eta + \pi\pi$) channels. The thin solid curve, covering the whole $Q^2$ range, is the parametrization of Eq. (135) evaluated as described in the text. The upper thin dotted curves are the $\mathcal{O}(p^4)$ HBChPT results of Ref. [53], whereas the lower thick dotted curves are the corresponding $\mathcal{O}(p^4)$ relativistic BChPT results of Ref. [54]. The upper thick solid curve at $Q^2 > 1.25$ GeV² is the evaluation using the DIS structure function $g_1^p$ of Ref. [46]. The lower thick solid curve in the same range is the evaluation for the resonance region ($W < 2$ GeV) using the DIS structure function. The shaded bands around the thick solid curves represent the corresponding error estimates as given by Ref. [46]. The open star at $Q^2 = 0$ corresponds to the MAMI data [16] combined with the estimate for the nonmeasured contribution in the range $W < 2$ GeV, as given by Eq. (61). The solid star is the total value of Eq. (62), which includes the estimate for $W > 2$ GeV. The SLAC data are from Ref. [55], the HERMES data are from Ref. [56], and the preliminary JLab/CLAS data are from Ref. [57] (inner error bars are statistical errors only, outer error bars include systematic errors).
At larger $Q^2$, one again notices the gradual transition from a resonance-dominated to a partonic description. For example, at $Q^2 = 2$ GeV$^2$, the $W < 2$ GeV region amounts to only 20% of the total sum rule value for $I_1^p$. To gain an understanding of this gradual transition in the integral $I_1^p$, it was proposed in [58] to parametrize the $Q^2$ dependence through a vector meson dominance type model. This model was refined in Refs. [59,60] by adding explicit resonance contributions, which are important at low $Q^2$ as discussed above, and leads to the following phenomenological parametrization:
$$I_1^{p,n}(Q^2) = I_{1,\mathrm{res}}^{p,n}(Q^2) + 2M^2\,\Gamma_{1,\mathrm{as}}^{p,n}\left[\frac{1}{Q^2+\mu^2} - \frac{c^{p,n}\,\mu^2}{(Q^2+\mu^2)^2}\right], \qquad (135)$$
[Figure 11: $I_A^n$ vs. $Q^2$ (GeV$^2$).]
Fig. 11. The generalized GDH integral $I_A(Q^2)$ vs. $Q^2$ for the neutron. The dashed curve represents the MAID estimate [18,19] for the one-pion channel. The thin solid curve is the parametrization of Eqs. (135) and (136), using MAID to calculate the resonance contribution. The lower thin dotted curve is the $O(p^4)$ HBChPT result of Ref. [53], whereas the upper thick dotted curve is the corresponding $O(p^4)$ relativistic BChPT result of Ref. [54]. The thick solid curve for $Q^2 > 1.25$ GeV$^2$ is the evaluation using the DIS structure function $g_1^n$, and the shaded band represents the corresponding error estimate as given by Ref. [46]. The $^3$He data of the Hall A Collaboration at JLab [61] are corrected for nuclear effects according to Ref. [64]. The HERMES data are from Ref. [56]. For both data sets, inner error bars are statistical errors only, outer error bars include systematic errors. The GDH sum rule value is indicated by the star.
where $I_{1,\mathrm{res}}^{p,n}(Q^2)$ is the resonance contribution to $I_1^{p,n}$, $\Gamma_{1,\mathrm{as}}^{p,n}$ are the asymptotic values for the first moments of $g_1$, and the scale $\mu$ was assumed to be the vector meson mass [58], i.e., $\mu = m_\rho$. Furthermore, the parameter $c^{p,n}$ in Eq. (135) was chosen as
$$c^{p,n} = 1 + \frac{\mu^2}{2M^2\,\Gamma_{1,\mathrm{as}}^{p,n}}\left(\frac{\kappa_{p,n}^2}{4} + I_{1,\mathrm{res}}^{p,n}(0)\right), \qquad (136)$$
so as to reproduce the sum rule at $Q^2 = 0$. In Fig. 10, we use the parametrization of Eq. (135), but take the recent experimental value of Eq. (131) for $\Gamma_{1,\mathrm{as}}^p$. Furthermore, for the resonance contribution at the real photon point, $I_{1,\mathrm{res}}^p(0)$ (corresponding to $W < 2$ GeV), we use as input the experimental value from Ref. [17], $I_{1,\mathrm{res}}^p(0) = -0.95$ (open diamond in Fig. 10). For the $Q^2$ dependence of the resonance contribution, we take the MAID estimate [18,19] rescaled to the experimental value $I_{1,\mathrm{res}}^p(0)$ at the real photon point. It is seen from Fig. 10 that the resulting calculation (thin solid curve) gives a rather good description of the sign change occurring in $I_1^p$ at $Q^2 \approx 0.25$ GeV$^2$. Fig. 11 displays the results for the generalized GDH integral for the neutron, as derived from the $^3$He data of the Hall A Collaboration at JLab [61] and corrected for nuclear effects according to the procedure of Ref. [64]. Recently, the generalized GDH integral for the deuteron has also been measured by the CLAS Collaboration at JLab [62,63]; these data will provide a cross-check for the extraction of the generalized GDH integral for the neutron. The comparison of the existing neutron data with the MAID results in Fig. 11 shows the same problem as already discussed for real photons: the helicity difference in the low-energy region is not properly described by the existing phase shift analyses. However, the strong curvature at $Q^2 \approx 0.1$ GeV$^2$ agrees nicely with the predictions.
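The interpolation of Eqs. (135) and (136) can be checked numerically. The Python sketch below is illustrative only: the value $\Gamma_{1,\mathrm{as}}^p = 0.12$, the choice $\mu = m_\rho = 0.775$ GeV, and the toy function `i1_res_toy` standing in for the rescaled MAID resonance contribution are all assumptions, not the inputs used for Fig. 10. Whatever these inputs are, the construction of $c$ guarantees $I_1(0) = -\kappa_p^2/4$, while $I_1 \cdot Q^2/(2M^2) \to \Gamma_{1,\mathrm{as}}$ at large $Q^2$.

```python
M = 0.93827        # proton mass (GeV)
KAPPA_P = 1.7928   # proton anomalous magnetic moment
MU = 0.775         # assumed VMD scale mu = m_rho (GeV)


def i1_res_toy(q2, i_res0=-0.95, lam2=0.3):
    """Hypothetical stand-in for I_1,res(Q^2): a dipole-like falloff
    normalized to i_res0 at Q^2 = 0 (NOT the MAID calculation)."""
    return i_res0 / (1.0 + q2 / lam2) ** 2


def c_coefficient(gamma1_as, i_res0, kappa=KAPPA_P, mu=MU):
    """Eq. (136): c is fixed so that I_1(0) reproduces -kappa^2/4."""
    return 1.0 + mu**2 / (2.0 * M**2 * gamma1_as) * (kappa**2 / 4.0 + i_res0)


def i1_param(q2, gamma1_as=0.12, i_res0=-0.95, kappa=KAPPA_P, mu=MU):
    """Eq. (135): resonance part plus the VMD-type interpolating term."""
    c = c_coefficient(gamma1_as, i_res0, kappa, mu)
    d = q2 + mu**2
    vmd = 2.0 * M**2 * gamma1_as * (1.0 / d - c * mu**2 / d**2)
    return i1_res_toy(q2, i_res0) + vmd


if __name__ == "__main__":
    print("GDH value -kappa^2/4:", -KAPPA_P**2 / 4.0)
    for q2 in (0.0, 0.1, 0.25, 0.5, 1.0, 3.0):
        val = i1_param(q2)
        # I_1 * Q^2/(2M^2) approaches Gamma_1,as at large Q^2
        print(f"Q2={q2:5.2f}  I1={val:+.4f}  I1*Q2/(2M^2)={val * q2 / (2 * M**2):+.4f}")
```

Because $c$ absorbs the mismatch between the resonance input and the GDH value, changing the toy inputs moves the position of the zero crossing but never the two endpoints.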
[Figure 12: left panel, $I_1^p - I_1^n$ vs. $Q^2$ (GeV$^2$); right panel, $(I_1^p - I_1^n) \cdot Q^2/(2M^2)$ vs. $Q^2$ (GeV$^2$).]
Fig. 12. $Q^2$ dependence of the integral $I_1^p - I_1^n$ (left) and $(I_1^p - I_1^n) \cdot Q^2/(2M^2)$ (right) for the proton–neutron difference, as given by Eq. (128). The dashed curve is the MAID estimate [18] for the one-pion channel. The thin solid curve is the parametrization of Eqs. (135) and (136), using MAID to calculate the resonance contribution. The lower thin dotted curves are the $O(p^4)$ HBChPT results of Ref. [53], whereas the upper thick dotted curves are the corresponding $O(p^4)$ relativistic BChPT results of Ref. [54]. The upper thick solid curve for $Q^2 > 1.25$ GeV$^2$ is the evaluation using the DIS structure function $g_1^p - g_1^n$ [46], whereas the lower thick solid curve is the evaluation for the resonance region ($W < 2$ GeV) using the DIS structure function. The shaded bands represent the corresponding error estimates as given by Ref. [46]. The SLAC data are from Ref. [55], and the HERMES data are from Ref. [56] (for both data sets, inner error bars are statistical errors only, outer error bars include systematic errors). The GDH sum rule value is indicated by the star.
In Fig. 11, we also show the corresponding parametrization of Eq. (135) for $I_A$, using the value of Eq. (131) for $\Gamma_{1,\mathrm{as}}^n$ and the MAID estimate for the resonance contribution. It is seen that the resulting calculation gives a rather good description of the generalized GDH integral $I_A$ for the neutron. In Figs. 10 and 11, we also present the heavy baryon chiral perturbation theory (HBChPT) calculation to $O(p^4)$ of Ref. [53], as well as the relativistic baryon ChPT (relativistic BChPT) calculation to $O(p^4)$ of Ref. [54]. The comparison of both the HBChPT and relativistic BChPT calculations with the individual proton and neutron generalized GDH integrals shows that the chiral expansion may only be applied in a very limited range, $Q^2 \lesssim 0.05$ GeV$^2$. This can be understood from the phenomenological calculations discussed above, which made it obvious that the GDH integrals for proton and neutron at small $Q^2$ are dominated by the $\Delta(1232)$ resonance contribution. In the $p - n$ difference, however, the contributions of the $\Delta(1232)$ and of the other isospin-3/2 resonances drop out. Therefore, it was noted in Ref. [65] that the HBChPT expansion may be applied in a larger $Q^2$ range for the difference $I_1^p - I_1^n$. In Fig. 12, we display the $Q^2$ dependence of the proton–neutron difference $I_1^p - I_1^n$. It is indeed seen that the $Q^2$ dependence of the ChPT calculations, in particular the HBChPT calculation, is much less steep for the $p - n$ difference and follows the phenomenological estimate over a larger $Q^2$ range. This opens up the possibility, as discussed in Ref. [65], to extend the ChPT calculation upwards in $Q^2$. On the other hand, the extension of the operator product expansion for $\Gamma_1^p - \Gamma_1^n$ down to values around $Q^2 \approx 0.5$ GeV$^2$ requires the control of higher twist
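terms, which lattice QCD estimates show to be rather small [66]. This may open the possibility to bridge the gap between the low and high $Q^2$ regimes, at least for this particular observable.

The second spin-dependent VVCS amplitude $S_2$ is odd in $\nu$, which leads to the unsubtracted DR
$$\mathrm{Re}\,S_2(\nu,Q^2) = \frac{2\nu}{\pi}\,\mathcal{P}\!\int_0^\infty \frac{\mathrm{Im}\,S_2(\nu',Q^2)}{\nu'^2-\nu^2}\,d\nu' = \mathrm{Re}\,S_2^{\mathrm{pole}} + \frac{2\nu}{\pi}\,\mathcal{P}\!\int_{\nu_0}^\infty \frac{\mathrm{Im}\,S_2(\nu',Q^2)}{\nu'^2-\nu^2}\,d\nu', \qquad (137)$$
where the pole part $\mathrm{Re}\,S_2^{\mathrm{pole}}$ is obtained from Eq. (119) as
$$\mathrm{Re}\,S_2^{\mathrm{pole}}(\nu,Q^2) = \frac{\alpha_{em}\,\nu}{2}\,\frac{F_P(Q^2)\left[F_D(Q^2)+F_P(Q^2)\right]}{\nu^2-\nu_B^2}. \qquad (138)$$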
Assuming further that the high-energy behaviour of $S_2$ is given by
$$S_2(\nu,Q^2) \to \nu^{\alpha_2} \quad \text{for } \nu \to \infty, \quad \text{with } \alpha_2 < -1, \qquad (139)$$
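one can also write down an unsubtracted dispersion relation for the amplitude $\nu S_2$ (which is even in $\nu$),
$$\mathrm{Re}\,[\nu S_2(\nu,Q^2)] = \frac{2}{\pi}\,\mathcal{P}\!\int_0^\infty \frac{\nu'^2\,\mathrm{Im}\,S_2(\nu',Q^2)}{\nu'^2-\nu^2}\,d\nu' = \mathrm{Re}\,(\nu S_2)^{\mathrm{pole}} + \frac{2}{\pi}\,\mathcal{P}\!\int_{\nu_0}^\infty \frac{\nu'^2\,\mathrm{Im}\,S_2(\nu',Q^2)}{\nu'^2-\nu^2}\,d\nu', \qquad (140)$$
where the pole part is obtained from Eq. (119) as
$$\mathrm{Re}\,[\nu S_2(\nu,Q^2)]^{\mathrm{pole}} = \frac{\alpha_{em}\,\nu_B^2}{2}\,\frac{F_P(Q^2)\left[F_D(Q^2)+F_P(Q^2)\right]}{\nu^2-\nu_B^2}. \qquad (141)$$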
If we subtract Eq. (140) from Eq. (137) multiplied by $\nu$, we obtain the "superconvergence relation" (for any value of $Q^2$)
$$0 = \int_0^\infty \mathrm{Im}\,S_2(\nu,Q^2)\,d\nu, \qquad (142)$$
i.e., the pole contribution and the inelastic contribution to that integral should cancel. Eq. (142) is known as the Burkhardt–Cottingham (BC) sum rule [4]. When Eq. (142) is expressed in terms of the nucleon structure function $g_2(x,Q^2)$, the BC sum rule implies the vanishing of the first moment of $g_2$, i.e.,
$$0 = \int_0^1 dx\, g_2(x,Q^2), \qquad (143)$$
and the convergence condition of Eq. (139) leads to
$$g_2(x,Q^2) \to x^{\tilde\alpha_2} \quad \text{for } x \to 0, \quad \text{with } \tilde\alpha_2 > -1. \qquad (144)$$
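A quick numerical illustration of why a Wandzura–Wilczek-type $g_2$ automatically satisfies Eq. (143): with $g_2^{WW}(x) = -g_1(x) + \int_x^1 dy\, g_1(y)/y$, exchanging the order of integration gives $\int_0^1 dx \int_x^1 dy\, g_1(y)/y = \int_0^1 dy\, g_1(y)$, so the two pieces cancel. The toy input $g_1(x) = x^{1/2}(1-x)^3$ below is a hypothetical shape, chosen only because its inner integral has a closed form; it is not a fit to data. It also respects the small-$x$ condition of Eq. (144), since $g_2^{WW}$ tends to a constant as $x \to 0$.

```python
import math

A = 0.5  # small-x power of the hypothetical toy g1 (assumption)


def g1_toy(x, a=A):
    """Hypothetical toy structure function g1(x) = x^a (1 - x)^3."""
    return x**a * (1.0 - x) ** 3


def g1_over_y_tail(x, a=A):
    """Closed form of int_x^1 dy g1(y)/y, expanding (1-y)^3 = 1 - 3y + 3y^2 - y^3."""
    coeffs = (1.0, -3.0, 3.0, -1.0)
    return sum(c * (1.0 - x ** (a + k)) / (a + k) for k, c in enumerate(coeffs))


def g2_ww(x, a=A):
    """Wandzura-Wilczek form g2 = -g1(x) + int_x^1 dy g1(y)/y."""
    return -g1_toy(x, a) + g1_over_y_tail(x, a)


def g1_first_moment_exact(a=A):
    """int_0^1 x^a (1-x)^3 dx = B(a+1, 4) = Gamma(a+1) Gamma(4) / Gamma(a+5)."""
    return math.gamma(a + 1.0) * math.gamma(4.0) / math.gamma(a + 5.0)


def midpoint(f, n=200_000):
    """Composite midpoint rule on [0, 1]; never evaluates f at x = 0."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) for i in range(n))


if __name__ == "__main__":
    print("int g1    =", midpoint(g1_toy), "(exact:", g1_first_moment_exact(), ")")
    print("int g2_WW =", midpoint(g2_ww))  # BC sum rule: should be ~0
```

The first moment of $g_1$ stays finite while that of $g_2^{WW}$ vanishes to quadrature accuracy, for any admissible choice of the toy exponent.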
Separating the elastic and inelastic contributions in Eq. (143) and using Eq. (87), we may express the BC sum rule for any value of $Q^2$ as
$$I_2(Q^2) \equiv \frac{2M^2}{Q^2}\int_0^{x_0} g_2(x,Q^2)\,dx = \frac{1}{4}\,F_P(Q^2)\left[F_D(Q^2)+F_P(Q^2)\right]. \qquad (145)$$
Alternatively, the BC sum rule can be written in terms of the absorption cross sections and the Sachs form factors, i.e.,
$$I_2(Q^2) = \frac{M^2}{\pi e^2}\int_{\nu_0}^\infty \frac{K(\nu,Q^2)}{\nu^2}\,\frac{\nu^2}{\nu^2+Q^2}\left\{-\sigma_{TT}(\nu,Q^2) + \frac{\nu}{Q}\,\sigma_{LT}(\nu,Q^2)\right\}d\nu = \frac{1}{4}\,\frac{G_M(Q^2)\left[G_M(Q^2)-G_E(Q^2)\right]}{1+\tau}. \qquad (146)$$
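The equality of the right-hand sides of Eqs. (145) and (146) follows from $F_D + F_P = G_M$ and $F_P = (G_M - G_E)/(1+\tau)$, with $\tau = Q^2/(4M^2)$. The Python sketch below evaluates both forms; it assumes the standard dipole form factor with $\Lambda^2 = 0.71$ GeV$^2$ for $G_M^p = \mu_p G_D$ and, for simplicity, $G_E^p = G_D$ (form-factor scaling), which differs from the JLab-based ratio used for the solid curve in Fig. 13. The function names are illustrative.

```python
M = 0.938272      # proton mass (GeV)
KAPPA_P = 1.7928  # proton anomalous magnetic moment
MU_P = 1.0 + KAPPA_P
LAMBDA2 = 0.71    # dipole mass parameter (GeV^2), assumed


def sachs_dipole(q2):
    """Return (G_E, G_M), assuming the dipole fit and G_E = G_D (scaling)."""
    g_d = 1.0 / (1.0 + q2 / LAMBDA2) ** 2
    return g_d, MU_P * g_d


def dirac_pauli(q2):
    """F_D = (G_E + tau G_M)/(1 + tau), F_P = (G_M - G_E)/(1 + tau)."""
    tau = q2 / (4.0 * M**2)
    g_e, g_m = sachs_dipole(q2)
    return (g_e + tau * g_m) / (1.0 + tau), (g_m - g_e) / (1.0 + tau)


def bc_rhs_pauli(q2):
    """Right-hand side of Eq. (145): F_P (F_D + F_P) / 4."""
    f_d, f_p = dirac_pauli(q2)
    return 0.25 * f_p * (f_d + f_p)


def bc_rhs_sachs(q2):
    """Right-hand side of Eq. (146): G_M (G_M - G_E) / (4 (1 + tau))."""
    tau = q2 / (4.0 * M**2)
    g_e, g_m = sachs_dipole(q2)
    return 0.25 * g_m * (g_m - g_e) / (1.0 + tau)


if __name__ == "__main__":
    for q2 in (0.0, 0.1, 0.5, 1.0):
        print(q2, bc_rhs_pauli(q2), bc_rhs_sachs(q2))
```

At $Q^2 = 0$ both forms reduce to $\kappa_p(1+\kappa_p)/4 \approx 1.25$, the starting point of the BC curve for the proton.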
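Performing a low-energy expansion for $\nu S_2 - (\nu S_2)^{\mathrm{pole}}$, we obtain from Eq. (140)⁷
$$\mathrm{Re}\,[\nu S_2(\nu,Q^2)] - \mathrm{Re}\,[\nu S_2(\nu,Q^2)]^{\mathrm{pole}} = 2\alpha_{em}\,I_2(Q^2) - \frac{2\alpha_{em}}{Q^2}\left[I_A(Q^2)-I_1(Q^2)\right]\nu^2 + \frac{1}{Q^2}\left\{\frac{2\alpha_{em}}{Q^2}\left[I_A(Q^2)-I_1(Q^2)\right] + M^2\left[\delta_{LT}(Q^2)-\gamma_0(Q^2)\right]\right\}\nu^4 + \mathcal{O}(\nu^6), \qquad (147)$$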
in terms of the integrals $I_2$, $I_1$, $I_A$ and the spin polarizabilities $\gamma_0$ and $\delta_{LT}$ introduced before. For the Born contribution, we obtain from Eqs. (119) and (141)
$$\mathrm{Re}\,[\nu S_2(\nu,Q^2)]^{\mathrm{Born}} - \mathrm{Re}\,[\nu S_2(\nu,Q^2)]^{\mathrm{pole}} = \frac{\alpha_{em}}{2}\,F_P(Q^2)\left[F_D(Q^2)+F_P(Q^2)\right], \qquad (148)$$
yielding
$$I_2^{\mathrm{Born}}(Q^2) = \frac{1}{4}\,F_P(Q^2)\left[F_D(Q^2)+F_P(Q^2)\right]. \qquad (149)$$
It is interesting to note that the Born contribution to $\nu S_2$ leads exactly to the BC sum rule value of Eq. (145). Furthermore, the Born contribution also leads to $I_1^{\mathrm{Born}}(Q^2) = I_A^{\mathrm{Born}}(Q^2) = -F_P^2(Q^2)/4$. By comparing Eqs. (109), (128) and (146), one finds that $I_3$, defined by Eq. (108), can be expressed as
$$I_3(Q^2) = I_1(Q^2) + I_2(Q^2). \qquad (150)$$
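At the real photon point the form factors reduce to static moments, $F_D(0) = e_N$ and $F_P(0) = \kappa_N$, so the endpoint values follow from simple arithmetic: the GDH value $I_1(0) = -\kappa_N^2/4$, the BC value $I_2(0) = \kappa_N(e_N + \kappa_N)/4$ from Eq. (145), and their sum according to Eq. (150). The sketch below tabulates these for proton and neutron using the standard values $\kappa_p \approx 1.793$ and $\kappa_n \approx -1.913$; it is only a consistency check on the algebra.

```python
NUCLEONS = {
    # name: (charge e_N, anomalous magnetic moment kappa_N)
    "proton": (1.0, 1.7928),
    "neutron": (0.0, -1.9130),
}


def endpoint_integrals(e_n, kappa_n):
    """Return (I1(0), I2(0), I3(0)) from F_D(0) = e_N and F_P(0) = kappa_N."""
    i1 = -kappa_n**2 / 4.0                 # GDH sum rule value
    i2 = kappa_n * (e_n + kappa_n) / 4.0   # BC value, Eq. (145) at Q^2 = 0
    i3 = i1 + i2                           # Eq. (150)
    return i1, i2, i3


if __name__ == "__main__":
    for name, (e_n, kappa_n) in NUCLEONS.items():
        i1, i2, i3 = endpoint_integrals(e_n, kappa_n)
        print(f"{name:8s} I1(0)={i1:+.4f} I2(0)={i2:+.4f} I3(0)={i3:+.4f}")
    p = endpoint_integrals(*NUCLEONS["proton"])[0]
    n = endpoint_integrals(*NUCLEONS["neutron"])[0]
    print(f"I1^p(0) - I1^n(0) = {p - n:+.4f}")
```

The $\kappa_N^2$ terms cancel in $I_3(0)$, leaving $e_N\kappa_N/4$, which vanishes for the neutron; the small positive $p - n$ value at $Q^2 = 0$ is the GDH point shown in Fig. 12.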
If the BC sum rule holds at $Q^2 = 0$, one obtains $I_3(0) = e_N\kappa_N/4$. The BC sum rule has been shown to be satisfied in quantum electrodynamics by a calculation in lowest order of $\alpha_{em}$ [67]. In perturbative QCD, the BC sum rule was calculated for a quark target to first order in $\alpha_s$ and also shown to hold [68]. Furthermore, it is interesting to note
⁷ Note that the relation of Ref. [19], i.e., $\frac{d}{dQ^2}\left[I_A(Q^2) - I_1(Q^2)\right]_{Q^2=0} = \frac{M^2}{2\alpha_{em}}\left[\gamma_0(0) - \delta_{LT}(0)\right]$, ensures that the $\nu^4$ term in $\nu S_2$ has no singularity at $Q^2 = 0$.
[Figure 13: left panel, $I_2^p$ vs. $Q^2$ (GeV$^2$); right panel, $I_2^p \cdot Q^2/(2M^2)$ vs. $Q^2$ (GeV$^2$).]
Fig. 13. $Q^2$ dependence of the integral $I_2$ (left) and $I_2 \cdot Q^2/(2M^2)$ (right) for the proton, as given by Eq. (145). The dashed curve represents the MAID estimate [18] for the one-pion channel. The dotted curve is the HBChPT result at order $O(p^4)$ [69]. The solid curve is the Burkhardt–Cottingham sum rule (right-hand side of Eq. (145)), using the dipole parametrization for $G_M^p$ and the parametrization for $G_E^p/G_M^p$ following from the recent JLab data [70,71]. The shaded band represents the evaluation using the recent SLAC E155 data for $g_2$ integrated over the range $0.02 \le x \le 0.8$ [49].
that the validity of the Wandzura–Wilczek relation of Eq. (111) for the transverse spin structure function $g_1 + g_2$ implies that the BC sum rule is satisfied. Indeed, one directly obtains Eq. (143) by integrating Eq. (111). For a nucleon target, the BC sum rule has recently been evaluated at small $Q^2$ in HBChPT at order $O(p^4)$ [69] and has also been shown to hold to this order. In Fig. 13, we show the MAID model prediction for $I_2(Q^2)$ of the proton and compare it with the BC sum rule value. It is obvious from Fig. 13 that at small $Q^2$, the one-pion channel nearly saturates the BC sum rule prediction. At intermediate values of $Q^2$, the MAID calculation starts to fall short of the sum rule, because the $\pi\pi$ channels and higher continua become increasingly important. One also notices that the HBChPT result at order $O(p^4)$ [69] for the first moment of $g_2$ remains close to the phenomenological sum rule evaluation in the range up to $Q^2 \approx 0.3$ GeV$^2$. For the higher moments of $g_2$, it was shown in Ref. [69] that the $\Delta$ contribution is very small, so that the moments of $g_2$ seem to be a promising observable to bridge the gap between the HBChPT description at lower $Q^2$ and the perturbative QCD result at larger $Q^2$. In the large-$Q^2$ region, the first moment of $g_2$ was recently evaluated by the E155 Collaboration [49] at $Q^2 = 5$ GeV$^2$, and the integral of $g_2$ over the range $0.02 \le x \le 0.8$ was found to be $-0.044 \pm 0.008 \pm 0.003$. Although this value differs significantly from zero, it does not represent a conclusive test of the BC sum rule, because the behaviour of $g_2$ is still unknown in the small-$x$ region, which remains to be explored by future experiments.

3. Dispersion relations in real Compton scattering (RCS)

3.1. Introduction

As we have seen in the previous section, forward photon scattering is closely related to properties of the excitation spectrum of the probed system.
By use of dispersion relations (DRs) it becomes possible to set up sum rules on the basis of general principles and to determine certain combinations of polarizabilities from the knowledge of the absorption cross sections alone.

[Figure 14: Compton scattering diagrams (a)–(g).]
Fig. 14. Some typical intermediate states contributing to Compton scattering off the nucleon. Upper row: the direct (a) and crossed (b) Born diagrams with intermediate nucleons, a typical resonance excitation in the s-channel (c) and its crossed version (d). Lower row: typical mesonic contributions with photon scattering off an intermediate pion (e), the pion pole diagram (f), and a correlated two-pion exchange such as the "σ meson" (g).

In the following we shall discuss the general case of RCS and set up dispersion relations valid for all angles. Some typical processes contributing to RCS are shown in Fig. 14. When the nucleon is taken as a structureless Dirac particle, only the nucleon pole terms contribute. These are diagrams (a) and (b) for the s and u channels, respectively. The differential cross section for this situation, first obtained by Klein and Nishina [72] in 1929, is shown in Fig. 15. The inclusion of the anomalous magnetic moment leads to a far more complicated result, corresponding to the Powell cross section [73] in Fig. 15. If we add the pion pole term, Fig. 14(f), the cross section drops by about one third, to roughly the original Klein–Nishina result. This term is, of course, due to the decay $\pi^0 \to \gamma + \gamma$ and is therefore directly related to the axial anomaly, derived on general grounds as the Wess–Zumino–Witten term [74]. The pion pole term is often referred to as the triangle anomaly, because the $\pi^0\gamma\gamma$ vertex can be resolved into a triangular quark loop, a diagram not allowed in a classical theory and appearing only as a consequence of the renormalization procedure of quantum field theory. As we see from the figure, the pion pole term gives a considerable contribution for backward scattering. Its effect is sometimes included in the backward spin polarizability $\gamma_\pi$ (the index $\pi$ stands for $\theta = 180^\circ$!), though from the standpoint of dispersion relations it should be considered as a pole term like the nucleon pole terms. Except for the diagrams (a), (b), and (f), all other and higher diagrams in Fig. 14 have no pole structure, but correspond to excited states in s-, u- or t-channel processes. As such they lead to dispersive contributions whose lowest terms are given by the six leading polarizabilities of RCS on the nucleon. The result of a calculation taking account only of the electric and magnetic dipole polarizabilities is labeled LEX in Fig. 15.
It is obvious that the low-energy expansion is correct only up to about 80 MeV, in a region where the "world data" scatter and give only limited information on the polarizabilities. Therefore, the analysis of the modern data has been based on dispersion theory, whose results are labeled DR in the figure. Clearly the higher order terms become more and more important with increasing photon energy, particularly after crossing the pion threshold (seen as a kink at about 150 MeV), with a sharp rise as the energy increases further towards the $\Delta(1232)$ resonance.
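For orientation, the Klein–Nishina curve of Fig. 15 can be estimated from the textbook lab-frame formula for a structureless Dirac particle of mass $M$, $d\sigma/d\Omega = \frac{1}{2}(\alpha/M)^2 (E'_\gamma/E_\gamma)^2\,(E'_\gamma/E_\gamma + E_\gamma/E'_\gamma - \sin^2\theta)$, combined with the Compton relation for $E'_\gamma$. The Python sketch below is our illustration, not the authors' code; it uses standard values of $\alpha$, $\hbar c$ and the proton mass, and it reduces to the Thomson cross section as $E_\gamma \to 0$.

```python
import math

ALPHA = 1.0 / 137.035999    # fine-structure constant
HBARC = 197.3269804         # hbar*c in MeV fm
M_P = 938.272               # proton mass in MeV
FM2_TO_NB = 1.0e7           # 1 fm^2 = 10 mb = 1e7 nb


def scattered_energy(e_gamma, theta_deg, mass=M_P):
    """Compton formula for the outgoing photon energy E' in the lab frame (MeV)."""
    one_minus_cos = 1.0 - math.cos(math.radians(theta_deg))
    return e_gamma / (1.0 + e_gamma / mass * one_minus_cos)


def klein_nishina(e_gamma, theta_deg, mass=M_P):
    """Lab-frame d(sigma)/d(Omega) in nb/sr for a point Dirac particle with unit
    charge and no anomalous magnetic moment."""
    r = ALPHA * HBARC / mass                     # alpha/M expressed in fm
    ratio = scattered_energy(e_gamma, theta_deg, mass) / e_gamma
    sin2 = math.sin(math.radians(theta_deg)) ** 2
    return 0.5 * r**2 * ratio**2 * (ratio + 1.0 / ratio - sin2) * FM2_TO_NB


def thomson(theta_deg, mass=M_P):
    """Low-energy limit (alpha/M)^2 (1 + cos^2 theta) / 2 in nb/sr."""
    r = ALPHA * HBARC / mass
    return 0.5 * r**2 * (1.0 + math.cos(math.radians(theta_deg)) ** 2) * FM2_TO_NB


if __name__ == "__main__":
    for e in (10.0, 50.0, 100.0, 150.0):
        print(f"E={e:6.1f} MeV  KN(135 deg)={klein_nishina(e, 135.0):6.2f} nb/sr")
```

At $\theta_{\mathrm{lab}} = 135^\circ$ the Thomson limit is about 17.7 nb/sr, and the Klein–Nishina value falls monotonically with energy because the recoil factor $E'_\gamma/E_\gamma$ decreases.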
[Figure 15: $d\sigma/d\Omega_{\mathrm{lab}}$ (nb/sr) vs. $E_\gamma$ (MeV); curves labeled Klein–Nishina, Powell, Powell + pion pole, LEX, DR.]
Fig. 15. Differential cross section for Compton scattering off the proton as a function of the lab photon energy $E_\gamma$ at fixed scattering angle $\theta_{\mathrm{lab}} = 135^\circ$. The curves show the full cross section from fixed-$t$ subtracted dispersion relations (solid), the Klein–Nishina cross section (small dots), the Powell cross section (dashed), the Powell plus $\pi^0$ pole cross section (large dots), and the low-energy expansion (LEX) including also the leading-order contributions from the scalar polarizabilities (dashed–dotted).
3.2. Kinematics

Assuming invariance under parity, charge conjugation and time reversal, the general amplitude for Compton scattering can be expressed by six independent structure functions $A_i(\nu,t)$, $i = 1,\ldots,6$ [75]. These structure functions depend on two Lorentz invariant variables, e.g., $\nu$ and $t$ as defined in the following. Denoting the momenta of the initial-state photon and proton by $q$ and $p$, respectively, and the corresponding final-state momenta by $q'$ and $p'$, the familiar Mandelstam variables are
$$s = (q+p)^2, \qquad t = (q-q')^2, \qquad u = (q-p')^2, \qquad (151)$$
with the constraint $s + t + u = 2M^2$. The variable $\nu$ is defined by
$$\nu = \frac{s-u}{4M}. \qquad (152)$$
The orthogonal coordinates of the Mandelstam plane, $\nu$ and $t$, are related to the initial ($E_\gamma$) and final ($E'_\gamma$) photon lab energies, and to the lab scattering angle $\theta_{\mathrm{lab}}$, by
$$t = -4E_\gamma E'_\gamma \sin^2\frac{\theta_{\mathrm{lab}}}{2} = -2M\,(E_\gamma - E'_\gamma), \qquad \nu = E_\gamma + \frac{t}{4M} = \frac{1}{2}\,(E_\gamma + E'_\gamma). \qquad (153)$$

[Figure 16: the Mandelstam plane, $t$ (in units of $m_\pi^2$) vs. $\nu$ (in units of $m_\pi$), showing the s-, u- and t-channel regions, the thresholds $s = (M+m_\pi)^2$, $u = (M+m_\pi)^2$ and $t = 4m_\pi^2$, and the spectral boundaries $b_I(u,s)$, $b_{II}(s,t)$ and $b_{II}(u,t)$.]
Fig. 16. The Mandelstam plane for real Compton scattering. The physical regions are horizontally hatched. The spectral regions (with boundaries $b_I$ and $b_{II}$) are vertically hatched.

The physical regions of the Mandelstam plane are shown in Fig. 16 by the horizontally hatched areas. The vertically hatched areas are the spectral regions discussed in detail in Appendix A of Ref. [76]. The boundaries of the physical regions in the s, u and t channels are determined by the zeros of the Kibble function
$$\Phi(s,t,u) = t\,(us - M^4) = 0. \qquad (154)$$
In particular, the RCS experiment takes place in the s-channel region, limited by the line $t = 0$ (forward scattering, $\theta = 0^\circ$) and the lower right part of the hyperbola $us = M^4$ (backward scattering, $\theta = 180^\circ$). The u-channel region is obtained by crossing ($\nu \to -\nu$), and the t-channel region in the upper part of Fig. 16 corresponds to the process $\gamma + \gamma \to N + \bar N$ and requires $t \ge 4M^2$.
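The relations of Eqs. (151)–(154) are easy to verify numerically for lab kinematics with the proton initially at rest. The Python sketch below is our illustration (the function names `rcs_invariants` and `kibble` are hypothetical). It computes $s$, $t$, $u$ and $\nu$ from $E_\gamma$ and $\theta_{\mathrm{lab}}$, and it can be used to check $s + t + u = 2M^2$, both forms of $t$ and $\nu$ in Eq. (153), and that physical s-channel points satisfy $\Phi = t(us - M^4) \ge 0$, with $us = M^4$ exactly at $\theta = 180^\circ$.

```python
import math

M = 0.938272  # proton mass (GeV)


def rcs_invariants(e_gamma, theta_deg, mass=M):
    """Return (s, t, u, nu, e_prime) for gamma + p -> gamma + p with the proton at rest.
    Energies and nu in GeV, invariants in GeV^2."""
    cos_t = math.cos(math.radians(theta_deg))
    e_prime = e_gamma / (1.0 + e_gamma / mass * (1.0 - cos_t))  # Compton formula
    s = mass**2 + 2.0 * mass * e_gamma        # (q + p)^2
    u = mass**2 - 2.0 * mass * e_prime        # (q - p')^2 = (q' - p)^2
    t = -2.0 * mass * (e_gamma - e_prime)     # (q - q')^2, Eq. (153)
    nu = (s - u) / (4.0 * mass)               # Eq. (152)
    return s, t, u, nu, e_prime


def kibble(s, t, u, mass=M):
    """Kibble function of Eq. (154); nonnegative inside the physical region."""
    return t * (u * s - mass**4)


if __name__ == "__main__":
    s, t, u, nu, ep = rcs_invariants(0.15, 135.0)
    print(f"s={s:.4f} t={t:.4f} u={u:.4f} nu={nu:.4f} E'={ep:.4f}")
    print("Phi =", kibble(s, t, u))
```

Algebraically, $us - M^4 = -2M^2 E_\gamma E'_\gamma (1 + \cos\theta_{\mathrm{lab}})$, which makes the backward boundary $us = M^4$ explicit.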
3.3. Invariant amplitudes and nucleon polarizabilities

The invariant Compton tensor can be constructed as
$$T_{fi} = \varepsilon'^{*}_{\mu}\,\varepsilon_{\nu}\;\bar u(p',\lambda'_N)\,H^{\mu\nu}\,u(p,\lambda_N), \qquad (155)$$
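where $\varepsilon$ and $\varepsilon'$ are the polarization vectors of the incoming and outgoing photon, respectively, as defined in Eq. (43), $u$ and $\bar u$ are the nucleon spinors, and $\lambda_N$ ($\lambda'_N$) are the nucleon helicities in the initial (final) state, respectively. The Compton tensor $H^{\mu\nu}$ can be built from the four-momentum vectors and Dirac matrices as follows [77]:
$$H^{\mu\nu} = -\frac{P'^{\mu}P'^{\nu}}{P'^2}\,(T_1 + T_2\,\not K) - \frac{N^{\mu}N^{\nu}}{N^2}\,(T_3 + T_4\,\not K) + i\,\frac{P'^{\mu}N^{\nu} - P'^{\nu}N^{\mu}}{P'^2K^2}\,\gamma_5\,T_5 + i\,\frac{P'^{\mu}N^{\nu} + P'^{\nu}N^{\mu}}{P'^2K^2}\,\gamma_5\,\not K\,T_6, \qquad (156)$$
where
$$P = \tfrac{1}{2}(p+p'), \qquad K = \tfrac{1}{2}(q+q'), \qquad Q = \tfrac{1}{2}(q'-q), \qquad P' = P - \frac{P\cdot K}{K^2}\,K, \qquad N^{\mu} \equiv \varepsilon^{\mu\nu\alpha\beta}\,P'_{\nu}\,Q_{\alpha}\,K_{\beta}, \qquad (157)$$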
with $\varepsilon^{0123} = -1$. The six tensorial objects in Eq. (156) form a complete basis, and the amplitudes $T_1,\ldots,T_6$ of Prange are scalar functions of $\nu$ and $t$ containing the nucleon dynamics. Unfortunately, the Prange amplitudes have singularities in the forward and backward directions, leading to linear dependencies at these points (kinematical constraints). L'vov [78] has therefore proposed a different tensor basis, resulting in the set of amplitudes
$$A_1 = \frac{1}{t}\left[T_1 + T_3 + \nu\,(T_2+T_4)\right], \qquad A_2 = \frac{1}{t}\left[2T_5 + \nu\,(T_2+T_4)\right],$$
$$A_3 = \frac{M^2}{M^4 - su}\left[T_1 - T_3 - \frac{t}{4\nu}\,(T_2-T_4)\right], \qquad A_4 = \frac{M^2}{M^4 - su}\left[2M\,T_6 - \frac{t}{4\nu}\,(T_2-T_4)\right],$$
$$A_5 = \frac{1}{4\nu}\left[T_2+T_4\right], \qquad A_6 = \frac{1}{4\nu}\left[T_2-T_4\right]. \qquad (158)$$
These L'vov amplitudes have no kinematical constraints and are symmetric under crossing,
$$A_i(-\nu,t) = A_i(\nu,t), \qquad i = 1,\ldots,6. \qquad (159)$$
In the spirit of dispersion relations, we build the invariant amplitudes by adding the pole contributions of Fig. 14(a), (b) and (f) and an integral over the spectrum of excited intermediate states. Furthermore, we define the polarizabilities by subtracting the nucleon pole contributions $A_i^B$ from the
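amplitudes and introduce the quantities⁸
$$A_i^{NB}(\nu,t) = A_i(\nu,t) - A_i^{B}(\nu,t). \qquad (160)$$
The polarizabilities are related to these functions and their derivatives at the origin of the Mandelstam plane, $\nu = t = 0$,
$$a_i \equiv A_i^{NB}(0,0), \qquad a_{i,\nu} = \left.\frac{\partial A_i^{NB}}{\partial\nu}\right|_{\nu=t=0}, \qquad a_{i,t} = \left.\frac{\partial A_i^{NB}}{\partial t}\right|_{\nu=t=0}. \qquad (161)$$
For the spin-independent (scalar) polarizabilities $\alpha_{E1}$ and $\beta_{M1}$, one finds the two combinations
$$\alpha_{E1} + \beta_{M1} = -\frac{1}{2\pi}\,(a_3 + a_6), \qquad (162)$$
$$\alpha_{E1} - \beta_{M1} = -\frac{1}{2\pi}\,a_1, \qquad (163)$$
related to forward and backward Compton scattering, respectively. The four spin-dependent (vector) polarizabilities $\gamma_1$ to $\gamma_4$ of Ragusa [79], and the multipole spin polarizabilities $\gamma_{E1E1}$, $\gamma_{M1M1}$, $\gamma_{M1E2}$, $\gamma_{E1M2}$ of Ref. [80] (see Section 3.10), are defined by
$$\gamma_0 \equiv \gamma_1 - \gamma_2 - 2\gamma_4 = -\gamma_{E1E1} - \gamma_{M1M1} - \gamma_{M1E2} - \gamma_{E1M2} = \frac{1}{2\pi M}\,a_4, \qquad (164)$$
$$\gamma_{13} \equiv \gamma_1 + 2\gamma_3 = -\gamma_{E1E1} + \gamma_{E1M2} = -\frac{1}{4\pi M}\,(a_5 + a_6), \qquad (165)$$
$$\gamma_{14} \equiv \gamma_1 - 2\gamma_4 = -\gamma_{E1E1} - 2\gamma_{M1M1} - \gamma_{E1M2} = \frac{1}{4\pi M}\,(2a_4 + a_5 - a_6), \qquad (166)$$
$$\gamma_\pi \equiv \gamma_1 + \gamma_2 + 2\gamma_4 = -\gamma_{E1E1} + \gamma_{M1M1} + \gamma_{M1E2} - \gamma_{E1M2} = -\frac{1}{2\pi M}\,(a_2 + a_5), \qquad (167)$$
where $\gamma_0$ and $\gamma_\pi$ are the spin polarizabilities in the forward and backward directions, respectively. Since the $\pi^0$ pole (see Fig. 14(f)) contributes to $A_2$ only, the combinations $\gamma_0$, $\gamma_{13}$ and $\gamma_{14}$ of Eqs. (164)–(166) are independent of the pole term, and only the backward spin polarizability $\gamma_\pi$ is affected by this term.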
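Two small consistency checks on Eqs. (162)–(167), sketched in Python under stated assumptions. First, $\alpha_{E1}$ and $\beta_{M1}$ follow from the forward and backward combinations by simple inversion; as sample input we take the MAMI/TAPS 2001 central values listed in Table 1 (uncertainties are not propagated, since the two combinations extracted from one data set may be correlated). Second, solving Eqs. (164)–(167) for the multipole spin polarizabilities gives $\gamma_{E1E1} = -\gamma_1 - \gamma_3$, $\gamma_{M1M1} = \gamma_4$, $\gamma_{E1M2} = \gamma_3$ and $\gamma_{M1E2} = \gamma_2 + \gamma_4$; the code confirms that this mapping reproduces all four defining identities for arbitrary $\gamma_1,\ldots,\gamma_4$.

```python
import random


def scalar_from_combinations(sum_ab, diff_ab):
    """Invert alpha+beta (forward) and alpha-beta (backward); units of 1e-4 fm^3."""
    return 0.5 * (sum_ab + diff_ab), 0.5 * (sum_ab - diff_ab)


def multipoles_from_ragusa(g1, g2, g3, g4):
    """Multipole spin polarizabilities obtained by solving Eqs. (164)-(167)."""
    return {"E1E1": -g1 - g3, "M1M1": g4, "E1M2": g3, "M1E2": g2 + g4}


def check_identities(g1, g2, g3, g4, tol=1e-12):
    """Compare Ragusa and multipole forms of gamma_0, gamma_13, gamma_14, gamma_pi."""
    m = multipoles_from_ragusa(g1, g2, g3, g4)
    pairs = [
        (g1 - g2 - 2 * g4, -m["E1E1"] - m["M1M1"] - m["M1E2"] - m["E1M2"]),  # gamma_0
        (g1 + 2 * g3, -m["E1E1"] + m["E1M2"]),                              # gamma_13
        (g1 - 2 * g4, -m["E1E1"] - 2 * m["M1M1"] - m["E1M2"]),              # gamma_14
        (g1 + g2 + 2 * g4, -m["E1E1"] + m["M1M1"] + m["M1E2"] - m["E1M2"]), # gamma_pi
    ]
    return all(abs(a - b) < tol for a, b in pairs)


if __name__ == "__main__":
    alpha, beta = scalar_from_combinations(13.1, 10.7)  # MAMI/TAPS 2001, Table 1
    print(f"alpha_E1 = {alpha:.2f}, beta_M1 = {beta:.2f}")
    rng = random.Random(0)
    print(all(check_identities(*[rng.uniform(-5, 5) for _ in range(4)]) for _ in range(1000)))
```

Because the mapping is linear and invertible, any set of Ragusa polarizabilities can be converted to multipoles and back without loss.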
⁸ Alternatively, the polarizabilities can be defined by also subtracting the $\pi^0$ pole contribution in the case of the amplitude $A_2$.

3.4. RCS data for the proton and extraction of proton polarizabilities

A pioneering experiment in Compton scattering off the proton was performed by Gol'danski et al. [81] in 1960. Their result for the electric polarizability was $\alpha_{E1} = 9 \pm 2$, with a large uncertainty in the normalization of the cross section giving rise to an additional systematic error of $\pm 5$. We note that here and in the following all scalar polarizabilities are given in units of $10^{-4}$ fm$^3$. The next effort to determine the polarizabilities is due to the group of Baranov [82]. The data were taken with a bremsstrahlung beam with photon energies up to 100 MeV, and the polarizabilities were obtained by a fit to the low-energy expansion (LEX). However, such energies are outside the range of validity of the LEX. A later reevaluation by use of dispersion relations [83] led to central values of $\alpha_{E1} \approx 12$ and $\beta_{M1} \approx -6$, far outside the range of Baldin's sum rule and of more recent results for the magnetic polarizability $\beta_{M1}$. In any case, these findings came as a great surprise, because the spin-flip transition from the nucleon to the dominant $\Delta(1232)$ resonance was expected to provide a large paramagnetic contribution of order $\beta^{\mathrm{para}} \approx 10$. The first modern experiment was performed at Illinois in 1991 [84]. It was done with a tagged photon beam, thereby improving the capability to measure absolute cross sections, in the energy region between 32 and 72 MeV where the LEX is applicable. Unfortunately, for the same reason the cross sections were small, with the consequence of large error bars. The experiment was repeated by the Saskatoon–Illinois group at higher energies above [85] and below [83] the pion threshold, and evaluated in the framework of dispersion relations, with much improved results on the polarizabilities. These results were confirmed, within the error bars, by the Brookhaven group working with photons produced by laser backscattering from a high-energy electron beam [86]. Even more precise data were recently obtained by the A2 collaboration at MAMI, using the TAPS setup at energies below pion threshold [87]. The results of these modern experiments are compiled in Table 1.

Table 1. Values for the scalar polarizabilities of the proton as obtained from the modern experiments.

| Data set | Energies (MeV) | Angles (deg) | $\alpha_{E1} + \beta_{M1}$ ($10^{-4}$ fm$^3$) | $\alpha_{E1} - \beta_{M1}$ ($10^{-4}$ fm$^3$) |
|---|---|---|---|---|
| Illinois 1991 [84] | 32–72 | 60, 135 | $15.8 \pm 4.5 \pm 0.1$ | $11.9 \pm 5.3 \pm 0.2$ |
| Saskatoon 1993 [85] | 149–286 | 24–135 | $12.1 \pm 1.7 \pm 0.9$ | $7.9 \pm 1.4 \pm 2.0$ |
| Saskatoon 1995 [83] | 70–148 | 90, 135 | $15.0 \pm 3.1 \pm 0.4$ | $10.8 \pm 1.8 \pm 1.0$ |
| LEGS 1998 [86] | 33–309 | 70–130 | $13.23 \pm 0.86\,^{+0.20}_{-0.49}$ | $10.11 \pm 1.74\,^{+1.22}_{-0.86}$ |
| MAMI/TAPS 2001 [87] | 55–165 | 59–155 | $13.1 \pm 0.6 \pm 0.8$ | $10.7 \pm 0.6 \pm 0.8$ |

A fit to all modern low-energy data constrained by the sum rule relation $\alpha_{E1} + \beta_{M1} = 13.8 \pm 0.4$ leads to the results [87]
$$\alpha_{E1} = 12.1 \pm 0.3\,(\mathrm{stat}) \mp 0.4\,(\mathrm{syst}) \pm 0.3\,(\mathrm{mod}), \qquad \beta_{M1} = 1.6 \pm 0.4\,(\mathrm{stat}) \pm 0.4\,(\mathrm{syst}) \pm 0.4\,(\mathrm{mod}), \qquad (168)$$
the errors denoting the statistical, systematic and model-dependent errors, in that order. This new global average confirms, beyond any doubt, the dominance of the electric polarizability $\alpha_{E1}$ and the tiny value of the magnetic polarizability $\beta_{M1}$, which has to come about by a cancellation of the large paramagnetic contribution of the $N\Delta$ spin-flip transition with a nearly equally strong diamagnetic term. Much less is known about the spin polarizabilities of the proton, except for the forward spin polarizability $\gamma_0 = \gamma_1 - \gamma_2 - 2\gamma_4 = [-1.01 \pm 0.08\,(\mathrm{stat}) \pm 0.1\,(\mathrm{syst})] \times 10^{-4}$ fm$^4$, which is determined by the GDH experiment at MAMI and dispersion relations according to Eq. (60). The only other combination for which experimental information exists is the backward spin polarizability $\gamma_\pi = \gamma_1 + \gamma_2 + 2\gamma_4$. Dispersive contributions from the s-channel integral have been found to be positive and in the range $5 \lesssim \gamma_\pi^{\mathrm{disp}} \lesssim 10$ (here and in the following in units of $10^{-4}$ fm$^4$). In addition
to this dispersive part, a large contribution comes from the t-channel $\pi^0$ exchange, $\gamma_\pi^{\pi^0\text{-pole}} \approx -46.7$ (see Eqs. (177)–(179)), giving a total result of $-42 < \gamma_\pi < -37$. These theoretical predictions have been challenged by a first experimental value presented by the LEGS group [86], who found from a combined analysis of pion photoproduction and Compton scattering
$$\gamma_\pi = -27.1 \pm 2.2\,(\mathrm{stat+syst})\,^{+2.8}_{-2.4}\,(\mathrm{mod}), \qquad (169)$$
where the first error combines statistical and systematic uncertainties, and the second one represents the model error. However, there is now contradicting evidence from recent MAMI data obtained both at low energies [87] and in the region of the $\Delta$ resonance [88–90]. Though these new results vary somewhat depending on the subsets of data in different energy regions and on the input of the underlying dispersion analysis, they are well within the range of the expectations from both dispersion theory and chiral perturbation theory. Typical values are [87,88]
$$\gamma_\pi = \begin{cases} -36.1 \pm 2.1\,(\mathrm{stat+syst}) \pm 0.8\,(\mathrm{mod}) & [87], \\ -37.9 \pm 0.6\,(\mathrm{stat+syst}) \pm 3.5\,(\mathrm{mod}) & [88], \end{cases} \qquad (170)$$
where the statistical plus systematic error dominates in the low-energy region, while the model dependence gives rise to a large uncertainty for the experiments in the $\Delta$ region. The predictions for the other spin and higher-order polarizabilities from dispersion analysis and ChPT will be compared in the following sections. Unfortunately, these polarizabilities are all small and can hardly be deduced without dedicated polarization studies. This will require a new generation of experiments with polarized beams, polarized targets, and recoil polarimetry.

3.5. Extraction of neutron polarizabilities

The experimental situation concerning the polarizabilities of the neutron is still quite unsatisfactory. The electric polarizability $\alpha_{E1}^n$ can in principle be measured by scattering low-energy neutrons off the Coulomb field of a heavy nucleus, while the magnetic polarizability $\beta_{M1}^n$ remains essentially unconstrained by this method. This technique seemed very promising until the beginning of the 1990s, when Schmiedmayer et al. [91] published a value of $\alpha_{E1}^n = 12.6 \pm 1.5\,(\mathrm{stat}) \pm 2.0\,(\mathrm{syst})$, obtained by scattering neutrons with energies $50~\mathrm{eV} \le E_n \le 50$ keV off a $^{208}$Pb target. Shortly afterwards, Nikolenko and Popov [92] argued that the errors were underestimated by a factor of 5. These findings were confirmed by a similar experiment [93] resulting in $\alpha_{E1}^n = 0 \pm 5$, and by a further analysis of the systematic errors [94] leading to the estimate $7 \lesssim \alpha_{E1}^n \lesssim 19$. The two remaining methods to measure $\alpha_{E1}^n$ are quasi-free Compton scattering off a bound neutron, or elastic scattering from the deuteron. The first experiment on quasi-free Compton scattering by a neutron bound in the deuteron was performed by Rose et al. [95]. Interpreted in conjunction with Baldin's sum rule, the result is $0 < \alpha_{E1}^n < 14$ with a mean value $\alpha_{E1}^n \approx 10.7$.
The small sensitivity of the experiment follows from the fact that Thomson scattering vanishes for the neutron, and therefore the important interference between the Thomson term and the leading non-Born amplitude (present in the LEX of the proton!) is also absent for the neutron. It was therefore proposed to measure $\alpha_{E1}^n - \beta_{M1}^n$ at photon energies in the $\Delta$ region and at backward angles. Of course, the analysis will strongly depend on final-state interactions and two-body currents. The quality of
the analysis can be tested, to some extent, by also measuring the polarizabilities of the bound proton. Such results obtained by the TAPS Collaboration at MAMI were quite promising [96]: $\alpha_{E1}^p - \beta_{M1}^p = 10.3 \pm 1.7\,(\mathrm{stat+syst}) \pm 1.1\,(\mathrm{mod})$. The experiment was then extended to the neutron by the CATS/SENECA Collaboration [97]. Data were collected with both a deuterium and a hydrogen target and analyzed within the framework of Levchuk et al. [98] by use of different parametrizations of pion photoproduction multipoles and nucleon–nucleon interactions. The agreement between the polarizabilities of free and bound protons was again quite satisfactory, and the final result for the (bound) neutron was
$$\alpha_{E1}^n - \beta_{M1}^n = 9.8 \pm 3.6\,(\mathrm{stat})\,^{+2.1}_{-1.1}\,(\mathrm{syst}) \pm 2.2\,(\mathrm{mod}). \qquad (171)$$
The quasi-free scattering cross section obtained at MAMI is in good agreement with an earlier datum of a Saskatoon group [99] measured at 247 MeV and $\theta_{\mathrm{lab}} = 135^\circ$. From the ratio between the neutron and the proton results, this group derived a most probable value of $\alpha_{E1}^n - \beta_{M1}^n = 12$, however with a very large error bar. The comparison between proton and neutron shows that there is no significant isovector contribution to the scalar polarizabilities of the nucleon. The second type of experiment, $\gamma d \to \gamma d$, has been performed at SAL [100] and at MAX-lab [101]. An analysis with the formalism of Ref. [98] gave the results [100,101]
$$\alpha_{E1}^n - \beta_{M1}^n = \begin{cases} -4.8 \pm 3.9 & [100], \\ +3.2 \pm 3.1 & [101], \end{cases} \qquad (172)$$
leading to values compatible with zero. Comparing the two methods to extract neutron polarizabilities from deuteron experiments, we observe a clear tendency that elastic Compton scattering leads to smaller values than those extracted from quasi-free scattering, which remains to be studied by future investigations.

3.6. Unsubtracted fixed-t dispersion relations

The invariant amplitudes $A_i$ are free of kinematical singularities and constraints, and obey the crossing symmetry of Eq. (159). Assuming further analyticity and an appropriate high-energy behaviour, these amplitudes fulfill unsubtracted DRs at fixed $t$,
$$\mathrm{Re}\,A_i(\nu,t) = A_i^B(\nu,t) + \frac{2}{\pi}\,\mathcal{P}\!\int_{\nu_0}^{+\infty} d\nu'\,\frac{\nu'\,\mathrm{Im}_s A_i(\nu',t)}{\nu'^2-\nu^2}, \qquad (173)$$
where $A_i^B$ are the Born (nucleon pole) contributions as in Appendix A of Ref. [75], $\mathrm{Im}_s A_i$ are the discontinuities across the s-channel cuts of the Compton process, and $\nu_0 = m_\pi + (m_\pi^2 + t/2)/(2M)$. However, such unsubtracted DRs require that at high energies ($\nu \to \infty$) the amplitudes $\mathrm{Im}_s A_i(\nu,t)$ drop fast enough so that the integral of Eq. (173) is convergent and the contribution from the semi-circle at infinity can be neglected. For real Compton scattering, Regge theory predicts the following high-energy behaviour for $\nu \to \infty$ and fixed $t$ [75]:
$$A_1, A_2 \sim \nu^{\alpha_M(t)}, \qquad A_3, A_5 \sim \nu^{\alpha_M(t)-2}, \qquad (A_3 + A_6) \sim \nu^{\alpha_P(t)-2}, \qquad A_4 \sim \nu^{\alpha_M(t)-3}, \qquad (174)$$
D. Drechsel et al. / Physics Reports 378 (2003) 99 – 205
where $\alpha_M(t) \lesssim 0.5$ (for $t \le 0$) is a meson Regge trajectory and $\alpha_P(t)$ is the Pomeron trajectory, with intercept $\alpha_P(0) \approx 1.08$. Note that the Pomeron dominates the high-energy behaviour of the combination $A_3 + A_6$. From the asymptotic behaviour of Eq. (174), it follows that for RCS unsubtracted dispersion relations do not exist for the amplitudes $A_1$ and $A_2$. The divergence of the unsubtracted integrals is essentially due to fixed poles in the t channel. These are, notably, the exchange of the neutral pion (for $A_2$) and of a somewhat fictitious σ meson (for $A_1$). The σ has a mass of about 600 MeV and a large width, and it models the two-pion continuum with the quantum numbers $I = J = 0$. In order to obtain useful results for these two amplitudes, L'vov et al. [75] proposed to close the contour of the integral in Eq. (173) by a semicircle of finite radius $\nu_{\max}$ in the complex plane, instead of the usually assumed infinite radius. The real parts of $A_1$ and $A_2$ are then calculated from the decomposition

$$\mathrm{Re}\,A_i(\nu,t) = A_i^B(\nu,t) + A_i^{\rm int}(\nu,t) + A_i^{\rm as}(\nu,t), \tag{175}$$
with $A_i^{\rm int}$ the s-channel integral from pion threshold $\nu_0$ to a finite upper limit $\nu_{\max}$, and an "asymptotic contribution" $A_i^{\rm as}$ representing the contribution along the finite semicircle of radius $\nu_{\max}$ in the complex plane. In actual calculations, the s-channel integral is typically evaluated up to a maximum photon energy $E_\gamma = \nu_{\max} - t/(4M) \approx 1.5$ GeV. Up to this energy, the imaginary part of the amplitudes can be expressed through unitarity by the meson photoproduction amplitudes taken from experiment, mainly $1\pi$ and $2\pi$ photoproduction. All contributions from higher energies are then absorbed in the asymptotic term, which is replaced by a finite number of energy-independent poles in the t channel. In particular, the asymptotic part of $A_1$ is parametrized by the exchange of a scalar particle in the t channel, i.e. an effective "σ meson" [75],

$$A_1^{\rm as}(\nu,t) \approx A_1^{\sigma}(t) = \frac{F_{\sigma\gamma\gamma}\, g_{\sigma NN}}{t - m_\sigma^2}, \tag{176}$$
where $m_\sigma$ is the σ mass, and $g_{\sigma NN}$ and $F_{\sigma\gamma\gamma}$ are the couplings of the σ to nucleons and photons, respectively. In Ref. [75], the product of the couplings in the numerator of Eq. (176) is used as a fit parameter, which determines the value of $\alpha_{E1} - \beta_{M1}$ through Eq. (163). In a similar way, the asymptotic part of $A_2$ is described by the $\pi^0$ t-channel pole:

$$A_2^{\pi^0}(0,t) = \frac{F_{\pi^0\gamma\gamma}\, g_{\pi^0 NN}}{t - m_{\pi^0}^2}. \tag{177}$$
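The coupling $F_{\pi^0\gamma\gamma}$ is determined through the $\pi^0 \to \gamma\gamma$ decay as

$$\Gamma(\pi^0 \to \gamma\gamma) = \frac{1}{64\pi}\, m_{\pi^0}^3\, F_{\pi^0\gamma\gamma}^2. \tag{178}$$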
Using $\Gamma(\pi^0 \to \gamma\gamma) = 7.74$ eV [102], one obtains

$$F_{\pi^0\gamma\gamma} = -\frac{e^2}{4\pi^2 f_\pi} = -0.0252\ \mathrm{GeV}^{-1}, \tag{179}$$
where $f_\pi = 92.4$ MeV is the pion-decay constant. The sign is in accordance with the $\pi^0\gamma\gamma$ coupling in the chiral limit, given by the Wess–Zumino–Witten effective chiral Lagrangian [74]. The $\pi NN$ coupling constant is taken from Ref. [103], $g_{\pi NN}^2/4\pi = 13.73$. With this value, the product of the couplings in Eq. (177) becomes $F_{\pi^0\gamma\gamma}\, g_{\pi^0 NN} \approx -0.331$ GeV$^{-1}$, leading to a value of $-46.7$ for the pion-pole contribution to $\gamma_\pi$. On the other hand, the effective chiral Lagrangian yields the value $-43.5$ [104]. This procedure is relatively safe for $A_2$ because of the dominance of the $\pi^0$ pole, or triangle anomaly, which is well established both experimentally and on general grounds as the Wess–Zumino–Witten term. However, it introduces a considerable model dependence in the case of $A_1$. Though σ mesons have been repeatedly reported in the past, their properties were never clearly established. Therefore, this particle should be interpreted as a parametrization of the $I = J = 0$ part of the two-pion spectrum. That part shows up differently in different experiments, and hence the σ has been reported with varying masses and widths.
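The numbers quoted around Eqs. (177)–(179) can be reproduced with a few lines of code. The minimal sketch below does four things:

- inverts Eq. (178) for the size of the coupling;
- evaluates the Wess–Zumino–Witten form of Eq. (179);
- forms the product $F_{\pi^0\gamma\gamma}\,g_{\pi^0NN}$ from $g_{\pi NN}^2/4\pi = 13.73$;
- converts the pole into a contribution to $\gamma_\pi$.

The particle masses, $\hbar c$ and $\alpha_{\rm em}$ are standard values supplied for the sketch. The relation $\gamma_\pi^{\pi^0\text{-pole}} = F_{\pi^0\gamma\gamma}\,g_{\pi^0NN}/(2\pi M m_{\pi^0}^2)$ is our assumption. It is not written out in the text, but it reproduces the quoted $-46.7$. With $f_\pi = 92.4$ MeV the WZW expression gives about $-0.0251$ GeV$^{-1}$. This agrees to better than one percent with the magnitude $0.0252$ GeV$^{-1}$ extracted from the width.

```python
import math

# Standard constants supplied for this sketch (assumed values, not from the text).
ALPHA_EM = 1 / 137.036     # fine-structure constant
HBARC = 0.1973269804       # GeV fm
M_N = 0.938272             # proton mass, GeV
M_PI0 = 0.1349766          # neutral-pion mass, GeV
# Inputs quoted in the text.
F_PI = 0.0924              # pion-decay constant, GeV
G2_OVER_4PI = 13.73        # g_piNN^2 / (4 pi)


def coupling_wzw(f_pi=F_PI):
    """Eq. (179): F = -e^2 / (4 pi^2 f_pi) in GeV^-1, using e^2 = 4 pi alpha."""
    return -4 * math.pi * ALPHA_EM / (4 * math.pi**2 * f_pi)


def width_ev(coupling, m=M_PI0):
    """Eq. (178): Gamma(pi0 -> gamma gamma) = m^3 F^2 / (64 pi), returned in eV."""
    return m**3 * coupling**2 / (64 * math.pi) * 1e9


def coupling_from_width(gamma_ev, m=M_PI0):
    """Eq. (178) inverted: |F| in GeV^-1 (the sign is fixed by the WZW term)."""
    return math.sqrt(64 * math.pi * gamma_ev * 1e-9 / m**3)


def gamma_pi_pole(f_times_g, m=M_PI0, mass_n=M_N):
    """pi0-pole contribution to gamma_pi, in units of 1e-4 fm^4.

    ASSUMED relation: gamma_pi = F g / (2 pi M m_pi0^2), obtained in GeV^-4
    and converted to fm^4 with (hbar c)^4.
    """
    in_gev = f_times_g / (2 * math.pi * mass_n * m**2)
    return in_gev * HBARC**4 * 1e4


if __name__ == "__main__":
    g_pinn = math.sqrt(4 * math.pi * G2_OVER_4PI)
    print(f"|F| from 7.74 eV : {coupling_from_width(7.74):.4f} GeV^-1")
    print(f"F from WZW       : {coupling_wzw():.4f} GeV^-1")
    print(f"F * g            : {-0.0252 * g_pinn:.3f} GeV^-1")
    print(f"gamma_pi (pole)  : {gamma_pi_pole(-0.331):.1f} x 1e-4 fm^4")
```

The same helper can be used to see how the pole term shifts when the product of the couplings is varied within its uncertainty.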
3.7. Subtracted fixed-t dispersion relations

As stated in the previous section, unsubtracted DRs do not converge for the amplitudes $A_1$ and $A_2$. Moreover, the amplitude $A_3$ converges only slowly and in practice has to be fixed by Baldin's sum rule. In order to avoid the convergence problems and the phenomenology necessary to determine the asymptotic contributions, it was suggested to consider DRs at fixed t that are once subtracted at $\nu = 0$ [76],

$$\mathrm{Re}\,A_i(\nu,t) = A_i^B(\nu,t) + \left[A_i(0,t) - A_i^B(0,t)\right] + \frac{2}{\pi}\,\nu^2\,\mathcal{P}\!\int_{\nu_0}^{+\infty} d\nu'\, \frac{\mathrm{Im}_s A_i(\nu',t)}{\nu'\,(\nu'^2-\nu^2)}. \tag{180}$$

These subtracted DRs should converge for all six invariant amplitudes due to the two additional powers of $\nu'$ in the denominator. As will be shown later, they are essentially saturated by the $\pi N$ intermediate states. In other words, the lesser-known contributions of two and more pions, as well as of higher continua, are small and may be treated reliably by simple models. The price to pay for this alternative is the appearance of the subtraction functions $A_i(\nu = 0, t)$, which have to be determined at some small (negative) value of t. We do this by setting up once-subtracted DRs, this time in the variable t,

$$A_i(0,t) - A_i^B(0,t) = \left[A_i(0,0) - A_i^B(0,0)\right] + \left[A_i^{t\text{-pole}}(0,t) - A_i^{t\text{-pole}}(0,0)\right] + \frac{t}{\pi}\int_{(2m_\pi)^2}^{+\infty} dt'\, \frac{\mathrm{Im}_t A_i(0,t')}{t'\,(t'-t)} + \frac{t}{\pi}\int_{-\infty}^{-2m_\pi^2-4Mm_\pi} dt'\, \frac{\mathrm{Im}_t A_i(0,t')}{t'\,(t'-t)}, \tag{181}$$
where $A_i^{t\text{-pole}}(0,t)$ represents the contribution of poles in the t channel, in particular of the $\pi^0$ pole in the case of $A_2$, which is given by Eq. (177). To evaluate the dispersion integrals, the imaginary part due to s-channel cuts in Eq. (180) is determined through the unitarity relation from the scattering amplitudes of photoproduction on the nucleon. Due to the energy denominator $1/[\nu'(\nu'^2 - \nu^2)]$ in the subtracted dispersion integrals, the most important contribution comes from the $\pi N$ intermediate states. Mechanisms involving more pions or heavier mesons in the intermediate states are largely suppressed. In our calculation, we evaluate the $\pi N$ contribution using the multipole amplitudes from the analysis of Hanstein et al. (HDT) [105] at energies $\nu \le 500$ MeV. At the higher energies, up to $\nu \simeq 1.5$ GeV, we take as input the SAID multipoles (SP02K solution) [106]. The expansion of $\mathrm{Im}_s A_i$ into this set of multipoles is truncated at a maximum angular momentum $j_{\max} = l \pm 1/2 = 7/2$. The exception is the energy range in the unphysical region, where we use $j_{\max} = 3/2$. The higher partial waves with $j \ge j_{\max} + 1$ are evaluated analytically in the one-pion-exchange (OPE) approximation. The relevant formulas to implement the calculation are reported in Appendices B and C of Ref. [75]. The multipion intermediate states are approximated by the inelastic decay channels of the πN resonances. Since a multipole analysis is not yet available for the two-pion channel, we assume that this inelastic contribution follows the helicity structure of the one-pion photoproduction amplitudes. In this approximation, we first calculate the resonant part of the pion photoproduction multipoles using the Breit–Wigner parametrization of Ref. [106]. This resonant part is then scaled by a suitable factor to include the inelastic decays of the resonances. It was found, however, that in the subtracted dispersion relation formalism the sensitivity to the multipion channels is very small, and that subtracted dispersion relations are essentially saturated at $\nu \simeq 0.4$ GeV.

The imaginary part in the t-channel integral from $4m_\pi^2$ to $+\infty$ in Eq. (181) is saturated by the possible intermediate states for the t-channel process $\gamma\gamma \to N\bar N$ (see, for example, Fig. 14 (e) and (g)). These states lead to cuts along the positive t axis. For values of t below the $K\bar K$ threshold, the t-channel discontinuity is essentially saturated by $\pi\pi$ intermediate states. As a consequence, the dependence of the subtraction functions on the momentum transfer t can be calculated from the experimental information on the t-channel process through $\pi\pi$ intermediate states, $\gamma\gamma \to \pi\pi \to N\bar N$. In Ref. [76], a unitarized amplitude for the $\gamma\gamma \to \pi\pi$ subprocess was constructed, and a good description of the available data was found. This information is then combined with the $\pi\pi \to N\bar N$ amplitudes, which were determined from dispersion theory by analytical continuation of πN scattering amplitudes [8]. In practice, the upper limit of integration along the positive-t cut is $t = 0.78$ GeV$^2$, corresponding to the highest t value for which the $\pi\pi \to N\bar N$ amplitudes are given in Ref. [8]. In Appendix A, we show in detail how the discontinuities $\mathrm{Im}_t A_i$ of the invariant amplitudes $A_i$ ($i = 1, \ldots, 6$) in the t channel ($\gamma\gamma \to N\bar N$) can be expressed in terms of the corresponding $\gamma\gamma \to \pi\pi$ and $\pi\pi \to N\bar N$ amplitudes. The second integral in Eq. (181) extends from $-\infty$ to $a = -2(m_\pi^2 + 2Mm_\pi) \approx -0.56$ GeV$^2$. We are interested in evaluating Eq. (181) for small (negative) values of t ($|t| \ll |a|$). In that regime, the integral from $-\infty$ to a is highly suppressed by the denominator of the subtracted DRs, resulting in a small contribution. This contribution is estimated by saturation with the Δ-resonance and nonresonant πN intermediate states. In particular, we calculate the nonresonant πN contribution to the Compton amplitudes through the unitarity relation from the OPE and nucleon-pole pion-photoproduction amplitudes. For the Δ-resonance excitation we consider the amplitudes corresponding to diagrams (c) and (d) of Fig. 14. Finally, the corresponding contributions to the discontinuities of the invariant amplitudes $A_i$ at $\nu = 0$ and negative t are obtained by analytical continuation into the unphysical region. We estimated the total uncertainty resulting from the negative-t integral and the two-pion contributions by calculating the cross sections with and without these contributions. The total difference is of the order of 3–5%, as long as we restrict ourselves to the calculation of observables up to the Δ-resonance region.
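To illustrate why the subtraction in Eq. (180) suppresses the high-energy region, the sketch below replaces the Compton amplitude by a toy function whose real part is known exactly. It is a crossing-even damped oscillator, $f(\nu) = 1/(\omega_0^2 - \nu^2 - i\Gamma\nu)$, in arbitrary units. This toy is purely an illustrative assumption and not a model of the nucleon. The code evaluates the unsubtracted form of Eq. (173) and the once-subtracted form of Eq. (180), each with the dispersion integral cut off at a finite upper limit. This mimics the finite $\nu_{\max}$ used in practice. The principal value is handled by subtracting the pole and adding the logarithm analytically. The subtraction constant $f(0) = 1/\omega_0^2$ is taken as known, playing the role of the fitted subtraction constants.

```python
import math

# Toy crossing-even amplitude (illustrative assumption, arbitrary units):
# f(nu) = 1 / (W0^2 - nu^2 - i * WIDTH * nu), analytic in the upper half-plane.
W0, WIDTH = 1.0, 0.2


def im_f(nu):
    return WIDTH * nu / ((W0**2 - nu**2) ** 2 + (WIDTH * nu) ** 2)


def re_f_exact(nu):
    return (W0**2 - nu**2) / ((W0**2 - nu**2) ** 2 + (WIDTH * nu) ** 2)


def pv_integral(h, nu, upper, n=50_000):
    """Principal value of int_0^upper h(x) / (x - nu) dx for 0 < nu < upper.

    The pole is subtracted: int [h(x) - h(nu)] / (x - nu) dx is done with the
    midpoint rule, and h(nu) * ln((upper - nu) / nu) is added analytically.
    """
    h_nu = h(nu)
    dx = upper / n
    acc = 0.0
    for k in range(n):
        x = (k + 0.5) * dx
        if x != nu:  # skip the removable singular point if a midpoint hits it exactly
            acc += (h(x) - h_nu) / (x - nu)
    return acc * dx + h_nu * math.log((upper - nu) / nu)


def re_f_unsubtracted(nu, upper):
    """Eq. (173)-type DR: (2/pi) P int nu' Im f / (nu'^2 - nu^2), cut off at `upper`."""
    def h(x):
        return x * im_f(x) / (x + nu)
    return 2 / math.pi * pv_integral(h, nu, upper)


def re_f_subtracted(nu, upper, f0=1 / W0**2):
    """Eq. (180)-type DR subtracted at nu = 0; f0 = Re f(0) is supplied as input."""
    def h(x):
        return im_f(x) / (x * (x + nu))
    return f0 + 2 / math.pi * nu**2 * pv_integral(h, nu, upper)


if __name__ == "__main__":
    nu = 0.3  # a "low-energy" point below the toy resonance at W0
    exact = re_f_exact(nu)
    for upper in (1.5, 50.0):
        err_u = abs(re_f_unsubtracted(nu, upper) - exact)
        err_s = abs(re_f_subtracted(nu, upper) - exact)
        print(f"cutoff {upper:5.1f}: |error| unsubtracted {err_u:.1e}, subtracted {err_s:.1e}")
```

With the cutoff placed just above the toy resonance, the unsubtracted relation misses a few percent of the real part. The subtracted one is off by well under a tenth of a percent, because its kernel falls off two powers of $\nu'$ faster.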
Once the t dependence of the subtraction functions $A_i(0,t)$ is determined, the subtraction constants $A_i(0,0)$ remain to be fixed. All six subtraction constants $a_1$ to $a_6$ could be used as fit parameters. However, we shall restrict the fit to the parameters $a_1$ and $a_2$, or equivalently to $\alpha_{E1} - \beta_{M1}$ and $\gamma_\pi$. The subtraction constants $a_4$, $a_5$ and $a_6$ will be calculated through an unsubtracted sum rule, as derived from Eq. (173),

$$a_{4,5,6} = \frac{2}{\pi}\int_{\nu_0}^{+\infty} d\nu'\, \frac{\mathrm{Im}_s A_{4,5,6}(\nu', t=0)}{\nu'}. \tag{182}$$

Fig. 17. Integration paths in the s-channel region of the Mandelstam plane for RCS at fixed lab angle. [Figure: ν (GeV) versus t (GeV²), showing the lines t = 4m_π², s = (M + m_π)² and u = (M + m_π)².]
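Which of the relations in Eqs. (173), (180) and (182) can converge follows from simple power counting with the Regge behaviour of Eq. (174). The sketch below is a leading-power estimate only. It ignores logarithms and the t dependence, and it sets $\alpha_M$ to its upper value 0.5. Because Eq. (174) lists $A_6$ only in the combination $A_3 + A_6$, the code uses that combination. The function names are our own. An integral $\int^\infty x^q\,dx$ is finite only for $q < -1$, and the part above a cutoff $L$ then scales like $L^{q+1}$.

```python
# Large-nu exponents p of Im A ~ nu^p at fixed t, from Eq. (174).
ALPHA_M = 0.5    # meson trajectory; Eq. (174) quotes alpha_M(t) <~ 0.5 for t <= 0
ALPHA_P = 1.08   # Pomeron intercept alpha_P(0)

REGGE_POWER = {
    "A1": ALPHA_M,
    "A2": ALPHA_M,
    "A3": ALPHA_M - 2,
    "A5": ALPHA_M - 2,
    "A3+A6": ALPHA_P - 2,
    "A4": ALPHA_M - 3,
}


def integrand_power(p, subtracted):
    """Large-nu' power q of the DR integrand for Im A ~ nu'^p.

    Unsubtracted, Eq. (173), and the sum rule Eq. (182):
        nu' Im A / (nu'^2 - nu^2)   ~ nu'^(p - 1)
    Subtracted at nu = 0, Eq. (180):
        Im A / (nu' (nu'^2 - nu^2)) ~ nu'^(p - 3)
    """
    return p - 3 if subtracted else p - 1


def converges(p, subtracted):
    """int^inf x^q dx is finite only if q < -1."""
    return integrand_power(p, subtracted) < -1


def tail_exponent(p, subtracted):
    """For a convergent integral, the part above a cutoff L scales like L^(q + 1)."""
    return integrand_power(p, subtracted) + 1


if __name__ == "__main__":
    for name, p in REGGE_POWER.items():
        parts = []
        for sub in (False, True):
            label = "subtracted" if sub else "unsubtracted"
            if converges(p, sub):
                parts.append(f"{label}: converges, tail ~ L^{tail_exponent(p, sub):+.2f}")
            else:
                parts.append(f"{label}: diverges")
        print(f"{name:6s} " + "; ".join(parts))
```

In this counting, $A_1$ and $A_2$ fail without subtraction. The Pomeron-dominated combination converges only with a tail falling like $L^{-0.92}$, and every amplitude converges once it is subtracted at $\nu = 0$.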
The remaining subtraction constant $a_3$, which is related to $\alpha_{E1} + \beta_{M1}$ through Eq. (162), will be fixed through Baldin's sum rule.

3.8. Hyperbolic (fixed-angle) dispersion relations

As we have seen in the previous sections, DRs at constant t have a shortcoming: the dispersion integrals get contributions from the unphysical region between the boundaries of the physical s- and u-channel regions. In principle, the integrand in this region can be constructed by extrapolating a partial-wave expansion of the Compton amplitudes. In practice, however, the calculation is limited to low partial waves. In order to improve the convergence for larger values of |t|, fixed-angle DRs have been proposed [107] and applied to Compton scattering [108,109]. In particular, for $\theta_{\rm lab} = 180^\circ$ the path of integration runs along the lower boundary of the s-channel region (see Fig. 17), from the origin of the Mandelstam plane to infinity ("s-channel contribution"). It is complemented by a path of integration in the upper half-plane ("t-channel contribution"). The s–u crossing-symmetric hyperbolic integration paths are given by

$$(s-a)(u-a) = b, \qquad b = (a - M^2)^2, \tag{183}$$
where a is in one-to-one correspondence with the lab and c.m. scattering angles:

$$a = -M^2\,\frac{1+\cos\theta_{\rm lab}}{1-\cos\theta_{\rm lab}}, \qquad a = -s\,\frac{1+\cos\theta_{\rm cm}}{1-\cos\theta_{\rm cm}}. \tag{184}$$
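A few contours corresponding to fixed values of a are shown in Fig. 17. Along such a path at fixed a, one can write down a dispersion integral as

$$\mathrm{Re}\,A_i(s,t,a) = A_i^B(s,t,a) + A_i^{t\text{-pole}}(s,t,a) + \frac{1}{\pi}\int_{(M+m_\pi)^2}^{\infty} ds'\, \mathrm{Im}_s A_i(s',\tilde t,a)\left[\frac{1}{s'-s} + \frac{1}{s'-u} - \frac{1}{s'-a}\right] + \frac{1}{\pi}\int_{4m_\pi^2}^{\infty} dt'\, \frac{\mathrm{Im}_t A_i(\tilde s,t',a)}{t'-t}, \tag{185}$$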
where the discontinuity in the s channel, $\mathrm{Im}_s A_i(s',\tilde t,a)$, is evaluated along the hyperbola given by

$$(s'-a)(u'-a) = b, \qquad s' + \tilde t + u' = 2M^2, \tag{186}$$
and $\mathrm{Im}_t A_i(\tilde s,t',a)$ runs along the path defined by the hyperbola

$$(\tilde s - a)(\tilde u - a) = b, \qquad \tilde s + t' + \tilde u = 2M^2. \tag{187}$$
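The kinematics behind Eqs. (183)–(187) can be checked numerically. The sketch below computes the Mandelstam variables for a given lab photon energy and angle, and evaluates a from both forms of Eq. (184). It then confirms that the point lies on the hyperbola of Eq. (183) and recovers t from s along the path, as in Eq. (186). It is written in plain Python. The proton mass value and the helper names are our own choices, not taken from the text. For $\theta_{\rm lab} = 101^\circ$ it gives $a \approx -0.598$ GeV$^2$, close to the convergence bound $-0.594$ GeV$^2$ quoted in the text.

```python
import math

M = 0.938272  # proton mass in GeV (assumed value)


def a_from_lab(theta_lab_deg, m=M):
    """Eq. (184), lab form."""
    c = math.cos(math.radians(theta_lab_deg))
    return -m**2 * (1 + c) / (1 - c)


def a_from_cm(s, t, m=M):
    """Eq. (184), c.m. form, using cos(theta_cm) = 1 + t / (2 omega^2)."""
    omega_sq = (s - m**2) ** 2 / (4 * s)  # squared c.m. photon energy
    cos_cm = 1 + t / (2 * omega_sq)
    return -s * (1 + cos_cm) / (1 - cos_cm)


def invariants(e_lab, theta_lab_deg, m=M):
    """Mandelstam s, t, u for real Compton scattering on a proton at rest."""
    c = math.cos(math.radians(theta_lab_deg))
    e_out = e_lab / (1 + e_lab / m * (1 - c))  # scattered-photon energy (Compton formula)
    s = m**2 + 2 * m * e_lab
    t = -2 * e_lab * e_out * (1 - c)
    return s, t, 2 * m**2 - s - t


def t_on_hyperbola(s, a, m=M):
    """Eqs. (183)/(186): the t value reached at a given s on the path labelled by a."""
    b = (a - m**2) ** 2
    u = a + b / (s - a)
    return 2 * m**2 - s - u


if __name__ == "__main__":
    for theta in (101.0, 140.0, 180.0):
        print(f"theta_lab = {theta:5.1f} deg -> a = {a_from_lab(theta):+.3f} GeV^2")
    s, t, u = invariants(0.2, 120.0)  # E_gamma = 200 MeV, theta_lab = 120 deg
    a = a_from_lab(120.0)
    print("a (lab) =", a, " a (c.m.) =", a_from_cm(s, t))
    print("hyperbola residual:", (s - a) * (u - a) - (a - M**2) ** 2)
    print("t recovered from the path:", t_on_hyperbola(s, a), " direct:", t)
```

Stepping `s` upward from $(M + m_\pi)^2$ in `t_on_hyperbola` traces out the $\tilde t$ values sampled by the s-channel integral in Eq. (185) at a fixed angle.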
The integrals in Eq. (185) have a form similar to that of the subtracted DRs (Section 3.7). The difference is that the individual partial waves are multiplied by different kinematical factors depending on the angle. The problems of the partial-wave expansion are now cured in the lower half-plane (for $t < 0$). The integration in the upper half-plane (for $t > 0$), however, still runs through the unphysical region $4m_\pi^2 \le t < 4M^2$. In the latter region the t-channel partial-wave expansion of $\mathrm{Im}_t A_i$ converges if $-0.594\ \mathrm{GeV}^2 \le a \le 0$, corresponding to $101^\circ \le \theta_{\rm lab} \le 180^\circ$. In fact, a simultaneous investigation of the s- and t-channel Lehmann ellipses shows how the convergence of the partial-wave expansion is limited [8]. At positive $\nu^2$ it is limited by the spectral function $b_{II}(s,t)$. At negative $\nu^2$ it is limited by the left thick line shown in Fig. 18, which follows from the semi-major axis of the ellipse of convergence. Summing up, DRs at t = const are perfect for $t = 0$, i.e., $\theta_{\rm lab} = 0^\circ$. They run into problems with increasingly negative t, particularly at backward angles. DRs at $\theta_{\rm lab}$ = const, in contrast, are best at $\theta_{\rm lab} = 180^\circ$ and lose accuracy with decreasing angle. Therefore, the two techniques nicely complement each other. Holstein and Nathan [108] investigated backward DRs ($\theta_{\rm lab} = 180^\circ$) with two aims: to get rigorous bounds for the backward scalar polarizability, $\alpha_{E1} - \beta_{M1}$, and to connect the polarizabilities of nucleons and pions. From the s-channel integral they found for the proton $(\alpha_{E1} - \beta_{M1})_s = 4.8 - 10.8 \pm 3.0$. The first number on the right-hand side refers to one-pion intermediate states with parity change. The second refers to one-pion states without parity change. The error is due to the unknown multipole structure of heavier intermediate states. For the t channel the result was $(\alpha_{E1} - \beta_{M1})_t = 10.3 - 1.7$, the first contribution due to S-wave states and the much smaller second one due to D waves. Correlations are found to play an important role for the S waves.

Fig. 18. Fixed lab angle integration paths for RCS in the $\nu^2$–t plane. The right thick line corresponds to the spectral function $b_{II}(s,t)$, which determines the semi-minor axis of the ellipse of convergence. The left thick line follows from the semi-major axis of the ellipse of convergence and gives the boundary of convergence at $\nu^2 < 0$. [Figure: ν² (GeV²) versus t (GeV²), with paths for θ_lab = 101° and 180° and the line t = 4M² indicated.]

The pion polarizability increases the Born contribution from 16.1 to 19.1, while the hadronic interaction decreases this value to 10.3, as shown above. Their total sum of the s- and t-channel contributions is still considerably smaller than the currently accepted experimental value. The interesting finding of this investigation, however, was the partial resolution of the phenomenological σ meson by the ππ continuum. The same technique was later applied by L'vov and Nathan [109] to study a puzzle: the large dispersive contributions to the backward spin polarizability $\gamma_\pi$ deduced from the LEGS experiment [86]. Their final result was $\gamma_\pi = -39.5 \pm 2.4$. It was obtained by adding the contribution of the $\pi^0$ pole ($-45.0 \pm 1.6$), that of the s-channel πN states ($7.31 \pm 1.8$), and small contributions of ππN states as well as of η and η′ in the t channel. Our results for the proton polarizabilities from unsubtracted fixed-angle DRs are presented in the following three tables. We choose six convenient combinations of the polarizabilities, as seen in Table 2, where the total results are given. For the reason outlined before, the fixed-angle results deteriorate with decreasing angle, and therefore only the range $100^\circ \le \theta_{\rm lab} \le 180^\circ$ is shown. The contributions of the s- and t-channel paths are listed separately in Tables 3 and 4, respectively. In general we expect that the forward polarizabilities, $\alpha_{E1} + \beta_{M1}$ and $\gamma_0$, are described best by forward DRs, i.e., by the values at $\theta_{\rm lab} = 0^\circ$ (last line of Table 3). The comparison with the Baldin sum rule, Eq. (59), makes it obvious that $\alpha_{E1} + \beta_{M1}$ is not yet saturated at the upper limit of our integration, $\nu = 1.5$ GeV. However, $\gamma_0$ is in good agreement with the experimental analysis, Eq. (60), because of the better convergence of the integral Eq. (58). The backward polarizabilities, on the
Table 2
The proton polarizabilities evaluated by unsubtracted fixed-angle dispersion relations at different lab scattering angles. See Eqs. (162)–(167) for definitions. The values are given in units of 10⁻⁴ fm³ for the scalar polarizabilities and 10⁻⁴ fm⁴ for the spin polarizabilities.

θ_lab | α_E1 − β_M1 | α_E1 + β_M1 | γ_π^disp | γ_13 | γ_14 | γ_0
180° | 10.89 | 10.80 | 7.79 | 4.32 | −2.36 | −1.07
140° | 10.58 | 10.93 | 7.62 | 4.33 | −2.35 | −1.09
100° | 9.36 | 11.41 | 6.93 | 4.40 | −2.28 | −1.14

Table 3
The s-channel contribution to the proton polarizabilities of Table 2, given by the first integral on the right-hand side of Eq. (185). Note that the values for θ_lab = 0° are identical to the results of fixed-t dispersion relations at t = 0. Units as in Table 2.

θ_lab | α_E1 − β_M1 | α_E1 + β_M1 | γ_π^disp | γ_13 | γ_14 | γ_0
180° | −5.56 | 7.52 | 7.71 | 2.75 | −2.70 | −1.07
140° | −5.63 | 7.65 | 7.70 | 2.75 | −2.70 | −1.09
100° | −5.76 | 8.13 | 7.71 | 2.82 | −2.62 | −1.14
60° | −5.76 | 9.25 | 7.94 | 3.19 | −2.31 | −1.07
20° | −5.49 | 11.29 | 8.89 | 4.01 | −1.68 | −0.82
0° | −5.30 | 11.94 | 9.29 | 4.28 | −1.54 | −0.75

Table 4
The t-channel contribution to the proton polarizabilities of Table 2, given by the last integral on the right-hand side of Eq. (185). Units as in Table 2.

θ_lab | α_E1 − β_M1 | α_E1 + β_M1 | γ_π^disp | γ_13 | γ_14 | γ_0
180° | 16.46 | 3.28 | 0.08 | 1.57 | 0.34 | 0
140° | 16.20 | 3.28 | −0.08 | 1.57 | 0.34 | 0
100° | 15.11 | 3.28 | −0.78 | 1.57 | 0.34 | 0
other hand, should be evaluated along paths at backward angles. From Table 3 we find indeed that the s-channel contribution to $\alpha_{E1} - \beta_{M1}$ and $\gamma_\pi^{\rm disp}$ is fairly stable for $\theta_{\rm lab} \gtrsim 100^\circ$. However, the astounding fact is the large t-channel contribution to $\alpha_{E1} - \beta_{M1}$ (see Table 4). The last line of Table 3 shows that the s-channel integral up to $\nu_{\max} = 1.5$ GeV yields only a small contribution of about +3.3 to the electric polarizability. It is remarkable that the bulk of the contribution resides at energies beyond 1.5 GeV. The poor convergence of the s-channel integral is related to a strong concentration of the spectral strength in the t channel, close to the two-pion threshold. This effect is clearly reflected by the large t-channel contribution of about +9.9 to $\alpha_{E1}$ (see Table 4, first line). The integrand of the t-channel integral is shown in Fig. 19 for the $\pi^+\pi^-$ and $\pi^0\pi^0$ channels and for the sum of both channels. The integrand has its maximum at $t \approx 0.09$ GeV$^2$ and displays a long tail reaching out to higher values of t. This contribution obviously contains the phenomenological σ meson, which has to be introduced to describe the data in the framework of unsubtracted DRs at constant t. Our best value for $\alpha_{E1} - \beta_{M1}$ comes from backward angles, with an error estimated from the stability in the region $140^\circ \le \theta_{\rm lab} \le 180^\circ$:

$$\alpha_{E1} - \beta_{M1} = 10.7 \pm 0.2. \tag{188}$$

Fig. 19. The integrand for the t-channel contribution from S waves (left panel) and D waves (right panel) to the polarizability $\alpha_{E1} - \beta_{M1}$. Dashed curve: contribution from the $\pi^+\pi^-$ channel; dotted curve: contribution from the $\pi^0\pi^0$ channel; solid curve: full result, the sum of the charged and neutral channels. [Figure: d(α−β)/dt in units of 10⁻⁴ fm³/m_π² versus t in units of m_π², for 0 ≤ t ≤ 40 m_π².]
In Tables 2–4 only the dispersive contribution to the backward spin polarizability has been listed. If we add the large $\pi^0$-pole contribution (see Eqs. (177)–(179)), we obtain

$$\gamma_\pi = -38.8 \pm 1.8, \tag{189}$$

the largest error being due to the value of the $\pi^0$-pole contribution. According to Table 4, the t-channel contributions for the remaining combinations $\gamma_{13}$ and $\gamma_{14}$ are very stable, while the s-channel results depend on the path of integration. In the case of the polarizability $\gamma_{13}$, the backward value and the forward value agree within 1% (see Table 2 and the last line of Table 3). However, the forward value $\gamma_{14} = -1.54$ differs substantially from its backward value. This polarizability contains the amplitude $a_4$, which in turn is related to the forward polarizability $\gamma_0$. We therefore expect the forward value to be more realistic. In summary, we obtain the following results:

$$\gamma_{13} = 4.30 \pm 0.02, \qquad \gamma_{14} = -1.95 \pm 0.41, \qquad \gamma_0 = -0.91 \pm 0.16, \tag{190}$$

where the central values and the errors are derived by combining forward (last line of Table 3) and backward (first line of Table 2) DRs.
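The arithmetic connecting Tables 2–4 with Eqs. (188)–(190) can be made explicit in a short sketch. It rests on three assumptions, all consistent with the numbers:

- each entry of Table 2 is the sum of the s- and t-channel entries of Tables 3 and 4, up to rounding;
- $\alpha_{E1}$ follows from half the sum of the two scalar combinations;
- "combining" two determinations means taking their mean, with half their spread as the error.

The last prescription reproduces Eqs. (188) and (190). The function names are illustrative.

```python
# Entries transcribed from Tables 2-4; column order:
# (alpha-beta, alpha+beta, gamma_pi^disp, gamma_13, gamma_14, gamma_0)
TOTAL = {  # Table 2
    180: (10.89, 10.80, 7.79, 4.32, -2.36, -1.07),
    140: (10.58, 10.93, 7.62, 4.33, -2.35, -1.09),
    100: (9.36, 11.41, 6.93, 4.40, -2.28, -1.14),
}
S_CHANNEL = {  # Table 3 (rows used below)
    180: (-5.56, 7.52, 7.71, 2.75, -2.70, -1.07),
    140: (-5.63, 7.65, 7.70, 2.75, -2.70, -1.09),
    100: (-5.76, 8.13, 7.71, 2.82, -2.62, -1.14),
    0: (-5.30, 11.94, 9.29, 4.28, -1.54, -0.75),
}
T_CHANNEL = {  # Table 4
    180: (16.46, 3.28, 0.08, 1.57, 0.34, 0.0),
    140: (16.20, 3.28, -0.08, 1.57, 0.34, 0.0),
    100: (15.11, 3.28, -0.78, 1.57, 0.34, 0.0),
}
G13, G14, G0 = 3, 4, 5  # column indices of the spin combinations


def max_sum_mismatch(angle):
    """Largest |Table 2 - (Table 3 + Table 4)| over the six columns at one angle."""
    return max(abs(tot - s - t) for tot, s, t in
               zip(TOTAL[angle], S_CHANNEL[angle], T_CHANNEL[angle]))


def alpha_beta(diff, total):
    """alpha_E1 and beta_M1 from alpha - beta and alpha + beta."""
    return (total + diff) / 2, (total - diff) / 2


def combine(x1, x2):
    """Mean of two determinations, with half their spread as the error."""
    return (x1 + x2) / 2, abs(x1 - x2) / 2


if __name__ == "__main__":
    for ang in (180, 140, 100):
        print(f"theta_lab = {ang:3d} deg: max mismatch = {max_sum_mismatch(ang):.2f}")
    print("alpha_E1, s channel, forward :", alpha_beta(*S_CHANNEL[0][:2])[0])
    print("alpha_E1, t channel, backward:", alpha_beta(*T_CHANNEL[180][:2])[0])
    print("Eq. (188):", combine(TOTAL[180][0], TOTAL[140][0]))
    for name, col in (("gamma_13", G13), ("gamma_14", G14), ("gamma_0", G0)):
        print(f"Eq. (190) {name}:", combine(S_CHANNEL[0][col], TOTAL[180][col]))
```

The residual mismatch of 0.01 between Table 2 and the sum of Tables 3 and 4 is consistent with rounding to two decimals.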
In order to improve the convergence, we shall also consider hyperbolic DRs that are once subtracted at $s = u = M^2$,

$$\mathrm{Re}\,A_i(s,t,a) = A_i^B(s,t,a) + \left[A_i(M^2,0,a) - A_i^B(M^2,0,a)\right] + \frac{1}{\pi}\int_{(M+m_\pi)^2}^{\infty} ds'\, \mathrm{Im}_s A_i(s',\tilde t,a)\left[\frac{s-M^2}{(s'-s)(s'-M^2)} + \frac{u-M^2}{(s'-u)(s'-M^2)}\right] + \left[A_i^{t\text{-pole}}(t) - A_i^{t\text{-pole}}(0)\right] + \frac{t}{\pi}\int_{4m_\pi^2}^{\infty} dt'\, \frac{\mathrm{Im}_t A_i(\tilde s,t',a)}{t'\,(t'-t)}. \tag{191}$$
Besides improving the convergence of the s- and t-channel integrals, the subtraction at $s = u = M^2$ allows us to pursue a strategy similar to the case of fixed-t subtracted DRs. We note in fact that the subtraction constants in Eq. (191) are again related to the polarizabilities, i.e. $a_i = [A_i(M^2,0,a) - A_i^B(M^2,0,a)]$, independent of the value of a.

3.9. Comparison of different dispersion relation approaches to RCS data

In this subsection we compare the results from fixed-t and hyperbolic DRs, in both their subtracted and unsubtracted versions, with some selected experimental data. Fig. 20 shows the differential cross section in the low-energy region for various lab angles, obtained at fixed values of $\alpha_{E1}$, $\beta_{M1}$ and $\gamma_\pi$. The results from subtracted and unsubtracted fixed-t DRs (full and dashed curves) are nearly identical except for extreme backward scattering. Hyperbolic DRs, on the other hand, can only be trusted in the backward hemisphere. The unsubtracted version (dash-dotted curve) clearly fails at $\theta_{\rm lab} = 107^\circ$ above pion threshold, and of course even more so at smaller scattering angles. However, it is extremely satisfying that in all other cases the four different approaches agree within the experimental error bars. We can therefore conclude that the analysis of the low-energy data is well under control. Quite reliable values can be extracted for the polarizabilities, in particular $\alpha_{E1}$, $\beta_{M1}$ and $\gamma_\pi$, which have a large influence on the low-energy cross sections. In Table 5, we show the results from subtracted fixed-t DRs for the fit of the polarizabilities $\alpha_{E1}$, $\beta_{M1}$ and $\gamma_\pi$ to the modern low-energy data. In particular, we analyzed both the set of recent data from the TAPS experiment [87] and the full set of low-energy data of Refs. [83,84,87,110].
For the fitting procedure we used the standard χ² minimization⁹, following two different strategies:

(1) the spin polarizability $\gamma_\pi$, the polarizability difference $\alpha_{E1} - \beta_{M1}$, and the polarizability sum $\alpha_{E1} + \beta_{M1}$ were all used as independent free parameters;

(2) $\gamma_\pi$ and $\alpha_{E1} - \beta_{M1}$ were varied in the fit, while $\alpha_{E1} + \beta_{M1}$ was constrained by Baldin's sum rule. As an additional data point we used the result from the recent re-evaluation of Ref. [87], $\alpha_{E1} + \beta_{M1} = 13.8 \pm 0.4$.

The two fit procedures give consistent results within the error bars. Only the value of the polarizability sum $\alpha_{E1} + \beta_{M1}$ obtained from the fit is slightly underestimated with respect to the value expected from the Baldin sum rule. However, the set of fitted data mainly covers the backward-angle region. There the cross sections are quite insensitive to the scalar polarizability sum and depend mainly on the correlated effects of $\gamma_\pi$ and $\alpha_{E1} - \beta_{M1}$.

⁹ The program package MINUIT from the CERNlib was used.
Fig. 20. Differential cross section for Compton scattering off the proton as a function of the lab photon energy $E_\gamma$ and at different scattering angles. Full curves: results from fixed-t subtracted DRs; dashed curves: fixed-t unsubtracted DRs; dotted curves: hyperbolic subtracted DRs; dash-dotted curves: hyperbolic unsubtracted DRs. All results are shown for fixed values of $\alpha_{E1} + \beta_{M1} = 13.8$, $\alpha_{E1} - \beta_{M1} = 10$ and $\gamma_\pi = -37$. The experimental data are from Ref. [87] (full circles), Ref. [84] (diamonds), Ref. [83] (triangles), and Ref. [110] (open circles). [Figure: six panels of dσ/dΩ_lab (nb/sr) versus E_γ (MeV) at θ_lab = 59°, 85°, 107°, 133°, 155° and 180°.]
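The fits described here minimize χ² functions. Before the two definitions are given in Eqs. (192) and (193), the following sketch shows how they, and the error bookkeeping, might look in code. It is written under stated assumptions:

- Eq. (193) is read with the data and their errors both rescaled by the normalization N, and a ±3% penalty term;
- the full fit also varies the polarizabilities (with MINUIT in the original analysis), but here only the normalization of one data set is minimized, by golden-section search, at fixed theory;
- all function names and the sample numbers are hypothetical, not data from the text.

The last two helpers add the ±5% "random systematic" error in quadrature and extract a net systematic error from a total.

```python
import math


def chi2_eq192(sig_exp, sig_theo, err):
    """Eq. (192): sum over data points of ((sigma_exp - sigma_theo) / err)^2."""
    return sum(((e - t) / d) ** 2 for e, t, d in zip(sig_exp, sig_theo, err))


def chi2_eq193(sig_exp, sig_theo, err, norm, sys_frac=0.03):
    """Eq. (193) for one data set with normalization `norm`.

    Data and errors are both rescaled by `norm`; the last term penalizes
    deviations of `norm` from 1 relative to the +/-3% normalization uncertainty.
    """
    res = sum(((norm * e - t) / (norm * d)) ** 2
              for e, t, d in zip(sig_exp, sig_theo, err))
    return res + ((norm - 1) / sys_frac) ** 2


def best_norm(sig_exp, sig_theo, err, sys_frac=0.03, lo=0.8, hi=1.2, tol=1e-10):
    """Minimize Eq. (193) over `norm` at fixed theory by golden-section search."""
    def f(n):
        return chi2_eq193(sig_exp, sig_theo, err, n, sys_frac)

    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2


def total_error(stat, value, rand_sys_frac=0.05):
    """Statistical error combined in quadrature with a +/-5% random systematic error."""
    return math.hypot(stat, rand_sys_frac * value)


def systematic_from_total(total, stat):
    """Net systematic error, assuming total^2 = stat^2 + sys^2."""
    return math.sqrt(max(total**2 - stat**2, 0.0))


if __name__ == "__main__":
    # Hypothetical numbers: a data set lying 10% above a fixed theory curve.
    theo = [10.0, 20.0, 30.0]
    data = [11.0, 22.0, 33.0]
    errs = [0.5, 1.0, 1.5]
    n_best = best_norm(data, theo, errs)
    print(f"chi2 (192) = {chi2_eq192(data, theo, errs):.1f}")
    print(f"best N = {n_best:.3f}, chi2 (193) = {chi2_eq193(data, theo, errs, n_best):.2f}")
```

The penalty term keeps N from absorbing the whole 10% offset. The best normalization lands between 1/1.1 and 1, trading a smaller residual against the ±3% prior.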
In the fit procedure for the TAPS data, we used the standard definition of χ², i.e.,

$$\chi^2 = \sum \left(\frac{\sigma_{\rm exp} - \sigma_{\rm theo}}{\Delta\sigma}\right)^2, \tag{192}$$
where $\sigma_{\rm exp}$ are the experimental and $\sigma_{\rm theo}$ the calculated cross sections, and $\Delta\sigma$ are the experimental error bars. In Eq. (192), the experimental errors were estimated according to Ref. [87] by adding two errors in quadrature: the statistical errors and the "random systematic errors". The latter were estimated from uncertainties in the experimental geometry and from the statistics of the simulation to be ±5% of the measured cross sections. These statistical errors, including the random systematic uncertainties, determine the first error bar in the values of the polarizabilities reported as TAPS-fit values in Table 5. The second error bar in these fitted values corresponds to systematic uncertainties. These systematic error bars were obtained by rescaling the differential cross section by ±3%, assuming that the systematic uncertainties in the data are mainly due to errors in the normalization of the measured cross sections. The global fit to the different data sets of Refs. [83,84,87,110] was performed using a different χ² function, namely

$$\chi^2 = \sum \left(\frac{N\sigma_{\rm exp} - \sigma_{\rm theo}}{N\Delta\sigma}\right)^2 + \left(\frac{N-1}{\Delta\sigma_{\rm sys}}\right)^2, \tag{193}$$

where N is a normalization parameter used to change the normalization of each data set within its systematic errors $\Delta\sigma_{\rm sys}$, taken equal to ±3% of the measured cross sections. Following Refs. [87] and [83], the minimization of this extended χ² function was performed with the polarizabilities and the normalization constants for each data set as free parameters. The resulting uncertainties in the fitted values of the polarizabilities include contributions from both the statistical and the systematic errors. The purely statistical contribution to these error bars was obtained by fitting the data with fixed values of the normalization constants. The net systematic error bars were then derived by assuming that the total uncertainty is the sum in quadrature of the statistical and systematic contributions. These statistical and systematic errors are given by the first and second error bar, respectively, in the values of the polarizabilities quoted as global-fit values in Table 5.

Table 5
The polarizabilities $\alpha_{E1} - \beta_{M1}$ and $\gamma_\pi$ as obtained by fitting the differential cross sections from different experiments with fixed-t subtracted DRs. "TAPS" refers to the data from Ref. [87] (65 data points), fitted using the definition of χ² in Eq. (192). "Global" denotes the values of the fit to the set of data from Refs. [83,84,87,110] (a total of 101 data points) with the χ² function of Eq. (193). The term "fixed" denotes that $\alpha_{E1} + \beta_{M1} = 13.8 \pm 0.4$ is included as a constraint, while "free" indicates that this combination is also a free parameter. The first error is statistical and the second one is systematic.

Data set | α_E1 + β_M1 | α_E1 − β_M1 | γ_π | χ²_red
TAPS | 13.8 ± 0.4 (fixed) | 11.2 ± 1.2 ± 1.9 | −35.7 ± 3.9 ± 0.6 | 82.1/(66 − 2) = 1.3
TAPS | 12.6 ± 1.0 ± 1.0 (free) | 11.4 ± 1.3 ± 1.7 | −35.6 ± 2.1 ± 0.4 | 80.6/(65 − 3) = 1.3
Global | 13.8 ± 0.4 (fixed) | 11.3 ± 1.1 ± 2.7 | −35.9 ± 1.8 ± 3.2 | 116.0/(102 − 7) = 1.2
Global | 13.2 ± 0.9 ± 0.7 (free) | 11.1 ± 1.1 ± 0.8 | −36.0 ± 1.8 ± 3.2 | 115.7/(101 − 8) = 1.2

In Table 6 we also show the results of Ref. [87] from a fit within fixed-t unsubtracted DRs. All the analyses (fixed-t subtracted, fixed-t unsubtracted and subtracted hyperbolic DRs) are in quite good agreement. This gives us confidence that the model dependence of the polarizability extraction is well under control. The analysis is more model dependent when we move towards higher energies. As shown in Fig. 21, the results of subtracted fixed-t DRs have serious numerical problems at energies above the Δ(1232).
The reason for this failure is, however, not the high-energy region itself, which is strongly suppressed by the denominator in Eq. (180). On the contrary, the denominator creates the problem near the lower limit $\nu_0$ of the integral. That part of the integral extends into the region where the amplitude has to be constructed by a continuation of the partial-wave series into the unphysical region, i.e. the area between the line $s = (M + m_\pi)^2$ and the s-channel region in Fig. 16. Moreover, there are some systematic differences between fixed-t and hyperbolic DRs at these higher energies. The fact that the data seem to favor fixed-t DRs is not surprising, because the calculations are performed with polarizabilities essentially derived by this method. Since these data are taken at backward angles, hyperbolic DRs should in fact be quite appropriate. As can be seen from Fig. 22, subtracted and unsubtracted hyperbolic DRs agree quite nicely, with the exception of the lowest scattering angle, where the subtraction is necessary. Fig. 23 shows the angular distribution at an energy somewhat below the Δ(1232), which turns out to be quite sensitive to the backward spin polarizability. Our calculations confirm the finding of Ref. [89]: the value of $\gamma_\pi$ derived by the LEGS collaboration [86] is related to the fact that the LEGS and SAL data lie systematically above the recent MAMI results. This difference can be partly compensated by a small change of the $M_{1+}$ multipole; e.g., a 2% increase of $M_{1+}$ raises the cross section by nearly 10%. However, the backward–forward asymmetry cannot be changed that way. It requires a strong variation of $\gamma_\pi$, in addition to the effect of the (known) $E_{1+}/M_{1+}$ ratio.

Table 6
The polarizabilities $\alpha_{E1}$, $\beta_{M1}$ and $\gamma_\pi$ as obtained by the fit of Ref. [87] with fixed-t unsubtracted DRs to the differential cross sections from different experiments. "TAPS" refers to the data from Ref. [87] (65 data points), fitted using the definition of χ² in Eq. (192). "Global" denotes the values of the fit to the set of data from Refs. [83,84,87,110] (a total of 101 data points) with the χ² function of Eq. (193). The first error is statistical and the second one is systematic. The terms "fixed" and "free" are defined in Table 5.

Data set | α_E1 | β_M1 | γ_π | χ²_red
TAPS (free) | 12.2 ± 0.8 ∓ 1.4 | 0.8 ± 0.9 ± 0.5 | −35.9 ± 2.3 ∓ 0.4 | 80.6/(65 − 3) = 1.3
Global (fixed) | 12.4 ± 0.6 ∓ 0.5 | 1.4 ± 0.7 ± 0.4 | −36.1 ± 2.1 ∓ 0.4 | 108.4/(102 − 7) = 1.1

Fig. 21. Differential cross section for Compton scattering off the proton as a function of the lab photon energy $E_\gamma$ and at different scattering angles. Full curves: results from hyperbolic unsubtracted DRs; dashed curves: fixed-t subtracted DRs; dotted curves: fixed-t unsubtracted DRs. All results are shown for fixed values of $\alpha_{E1} + \beta_{M1} = 14.05$, $\alpha_{E1} - \beta_{M1} = 10$ and $\gamma_\pi = -38$. The experimental data are from Ref. [89]. [Figure: four panels of dσ/dΩ_lab (nb/sr) versus E_γ (MeV) between 250 and 450 MeV, at θ_lab = 113.9°, 122.6°, 135.6° and 148.6°.]

Fig. 22. Differential cross section for Compton scattering off the proton as a function of the lab photon energy $E_\gamma$ and at different scattering angles. Full curves: results from hyperbolic subtracted DRs; dashed curves: hyperbolic unsubtracted DRs. All results are shown for fixed values of $\alpha_{E1} + \beta_{M1} = 14.05$, $\alpha_{E1} - \beta_{M1} = 10$ and $\gamma_\pi = -38$. The experimental data are from Ref. [89]. [Figure: four panels of dσ/dΩ_lab (nb/sr) versus E_γ (MeV) between 250 and 450 MeV, at θ_lab = 113.9°, 122.6°, 135.6° and 148.6°.]
D. Drechsel et al. / Physics Reports 378 (2003) 99 – 205
[Fig. 23: dσ/dΩ_cm (nb/sr) versus θ_cm (deg) at E_γ = 285 MeV]
Fig. 23. The angular distribution at fixed photon lab energy $E_\gamma=285$ MeV. The results are displayed for $\alpha_{E1}+\beta_{M1}=14.05$, $\alpha_{E1}-\beta_{M1}=10$ and different values of the backward spin polarizability $\gamma_\pi$. The dashed and solid curves are the results from fixed-t subtracted DR and hyperbolic subtracted DR, respectively, for $\gamma_\pi=-27$ (pair of upper curves) and $\gamma_\pi=-38$ (pair of lower curves). The results from hyperbolic DRs are shown at backward angles, $\theta_{cm}>100^\circ$. The experimental data are from Refs. [89] (triangles), [86] (full diamonds), [111] (open diamonds), and [85] (circles).
In order to get new and independent information on the spin polarizabilities, it will be necessary to perform double polarization experiments. Fig. 24 shows the differential cross sections for circularly polarized photons and target polarized perpendicular or parallel to the photon beam. Both for parallel and perpendicular polarization, a spin flip of the target proton changes the cross section by large factors. The sensitivity to the backward spin polarizability $\gamma_\pi$ turns out to be largest at the higher energies and for circularly polarized photons hitting protons with polarizations perpendicular to the photon beam. Fig. 24 also demonstrates that even an unreasonably large 20% decrease of $\alpha_{E1}-\beta_{M1}$ can only simulate a change in $\gamma_\pi$ of about 2–3 units, making this an ideally suited observable to access $\gamma_\pi$. In the case of linearly polarized photons, one can access three additional independent observables. In particular, we can classify these polarization observables by taking the xz plane as the photon scattering plane, with the quantization axis along the direction of the incoming photon momentum, and denoting by $\phi$ the angle between the polarization vector of the photon and the x axis. With respect to this frame, one can measure the cross sections with the target polarized along the x or z direction and the photon polarization at $\phi=\pm45^\circ$, and the cross sections with the target polarization perpendicular to the scattering plane and the photon polarization parallel ($\phi=0^\circ$) or perpendicular ($\phi=90^\circ$) to the scattering plane. The results from fixed-t subtracted DRs for these observables are displayed in Fig. 25; they show a sensitivity to $\gamma_\pi$ and to $\alpha_{E1}-\beta_{M1}$ similar to that of the double polarization observables with circularly polarized photons.
[Fig. 24: γ + p → γ + p with circularly polarized photons and polarized target; dσ/dΩ_cm (nb/sr) versus θ_cm (deg), upper and lower panels]
Fig. 24. Double polarization differential cross sections for Compton scattering off the proton, with circularly polarized photon and target proton polarized along the photon direction (upper panels) or perpendicular to the photon direction and in the plane (lower panels). The thick and thin curves correspond to a proton polarization along the positive and negative directions, respectively. The results of the dispersion calculation at fixed t are for fixed $\alpha_{E1}+\beta_{M1}=13.8$, fixed $\alpha_{E1}-\beta_{M1}=10$, and $\gamma_\pi=-32$ (full curves), $\gamma_\pi=-27$ (dashed curves), and $\gamma_\pi=-37$ (dashed–dotted curves). We also show the result for $\alpha_{E1}+\beta_{M1}=13.8$, $\alpha_{E1}-\beta_{M1}=8$ and $\gamma_\pi=-37$ (dotted curves).
3.10. Physics content of the nucleon polarizabilities
The physical content of the polarizabilities can be visualized best by effective multipole interactions for the coupling of the electric ($\vec E$) and magnetic ($\vec H$) fields of the photon with the internal structure of the nucleon [80,112],
$$H_{\rm eff} = -4\pi\sum_{n=1}^{\infty}\left(H_{\rm eff}^{(2n)}+H_{\rm eff}^{(2n+1)}\right), \tag{194}$$
where the even and odd upper indices refer to scalar and vector polarizabilities, respectively. In particular, the lowest scalar polarizabilities are contained in
$$H_{\rm eff}^{(2)} = \tfrac12\alpha_{E1}\vec E^{\,2}+\tfrac12\beta_{M1}\vec H^{\,2},\qquad H_{\rm eff}^{(4)} = \tfrac12\alpha_{E1,\nu}\dot{\vec E}^{\,2}+\tfrac12\beta_{M1,\nu}\dot{\vec H}^{\,2}+\tfrac1{12}\alpha_{E2}E_{ij}^2+\tfrac1{12}\beta_{M2}H_{ij}^2. \tag{195}$$
The leading term contains the (static!) electric and magnetic dipole polarizabilities, $\alpha=\alpha_{E1}$ and $\beta=\beta_{M1}$. In the subleading term there appear two derivatives of the fields with regard to either time or space, $\dot{\vec E}=\partial_t\vec E$ and $E_{ij}=\tfrac12(\nabla_iE_j+\nabla_jE_i)$, respectively. Applied to a plane wave photon, the subleading term is therefore $O(\omega^2)$ relative to the leading one. The terms in $\alpha_{E1,\nu}$ and $\beta_{M1,\nu}$ are, of course, retardation or dispersive corrections to the respective leading order dipole polarizabilities, while $\alpha_{E2}$ and $\beta_{M2}$ are the electric and magnetic quadrupole polarizabilities. Combining the static dipole polarizabilities with all terms in the sum with time derivatives only, we obtain the "dynamical dipole polarizabilities" $\alpha_{E1}(\omega)$ and $\beta_{M1}(\omega)$. The terms involving the gradients build up higher polarizabilities, at fourth order the (static) electric ($\alpha_{E2}$) and magnetic ($\beta_{M2}$) quadrupole polarizabilities. In a similar notation the lowest vector or spin polarizabilities are defined by
$$H_{\rm eff}^{(3)} = \tfrac12\gamma_{E1E1}\,\vec\sigma\cdot(\vec E\times\dot{\vec E})+\tfrac12\gamma_{M1M1}\,\vec\sigma\cdot(\vec H\times\dot{\vec H})-\gamma_{M1E2}\,E_{ij}\sigma_iH_j-\gamma_{E1M2}\,H_{ij}\sigma_iE_j, \tag{196}$$
$$H_{\rm eff}^{(5)} = \tfrac12\gamma_{E1E1,\nu}\,\vec\sigma\cdot(\dot{\vec E}\times\ddot{\vec E})+\tfrac12\gamma_{M1M1,\nu}\,\vec\sigma\cdot(\dot{\vec H}\times\ddot{\vec H})-\gamma_{M1E2,\nu}\,\dot E_{ij}\sigma_i\dot H_j-\gamma_{E1M2,\nu}\,\dot H_{ij}\sigma_i\dot E_j-2\gamma_{E2E2}\,\epsilon_{ijk}\sigma_iE_{jl}\dot E_{kl}-2\gamma_{M2M2}\,\epsilon_{ijk}\sigma_iH_{jl}\dot H_{kl}+3\gamma_{M2E3}\,\sigma_iE_{ijk}H_{jk}-3\gamma_{E2M3}\,\sigma_iH_{ijk}E_{jk}, \tag{197}$$
where
$$E_{ijk} = \tfrac13\left(\nabla_i\nabla_jE_k+\nabla_i\nabla_kE_j+\nabla_j\nabla_kE_i\right)-\tfrac1{15}\left(\delta_{ij}\nabla^2E_k+\delta_{jk}\nabla^2E_i+\delta_{ik}\nabla^2E_j\right). \tag{198}$$
[Fig. 25: γ + p → γ + p with linearly polarized photons and polarized target; dσ/dΩ_cm (nb/sr) versus θ_cm (deg); upper, middle and lower panels]
Fig. 25. Double polarization differential cross sections for Compton scattering off the proton, with linearly polarized photon and target proton polarized parallel or perpendicular to the scattering plane. Upper panels: target polarized along the x direction in the scattering plane and photon with linear polarization at an angle $\phi=+45^\circ$ (thick curves) and $\phi=-45^\circ$ (thin curves) with respect to the scattering plane. Middle panels: target polarized along the y direction perpendicular to the scattering plane and linearly polarized photon parallel (thick curves) and perpendicular (thin curves) to the scattering plane. Lower panels: target polarized along the z direction in the scattering plane and photon with linear polarization at an angle $\phi=+45^\circ$ (thick curves) and $\phi=-45^\circ$ (thin curves) with respect to the scattering plane. The results of the dispersion calculation at fixed t are for fixed $\alpha_{E1}+\beta_{M1}=13.8$, fixed $\alpha_{E1}-\beta_{M1}=10$, and $\gamma_\pi=-32$ (full curves), $\gamma_\pi=-27$ (dashed curves), and $\gamma_\pi=-37$ (dashed–dotted curves). We also show the result for $\alpha_{E1}+\beta_{M1}=13.8$, $\alpha_{E1}-\beta_{M1}=8$ and $\gamma_\pi=-37$ (dotted curves).
As in the spin-averaged case, four of the terms in the $O(\omega^5)$ polarizabilities are simply dispersive corrections to the $O(\omega^3)$ expressions. All polarizabilities defined above can be related to the multipole expansions given in Ref. [80]. In terms of the standard notation of spherical tensors, the polarizabilities correspond to the following coupling of electromagnetic transition operators:
$$\alpha_{EL}\sim[EL\times EL]^0,\quad \beta_{ML}\sim[ML\times ML]^0,\quad \gamma_{ELEL}\sim[EL\times EL]^1,\quad \gamma_{MLML}\sim[ML\times ML]^1,\quad \gamma_{M(L-1)EL}\sim[M(L-1)\times EL]^1,\quad \gamma_{E(L-1)ML}\sim[E(L-1)\times ML]^1. \tag{199}$$
The higher order polarizabilities given above are uniquely defined by the quantities $a_i$, $a_{i,\nu}$ and $a_{i,t}$ of Eq. (161), as discussed in detail in Refs. [80,112]. In particular, we find for the leading terms the relations
$$\alpha_{E1} = -\frac{1}{4\pi}(a_1+a_3+a_6),\qquad \beta_{M1} = +\frac{1}{4\pi}(a_1-a_3-a_6),$$
$$\alpha_{E2} = -\frac{3}{\pi}(a_{1,t}+a_{3,t}+a_{6,t}),\qquad \beta_{M2} = +\frac{3}{\pi}(a_{1,t}-a_{3,t}-a_{6,t}),$$
$$\gamma_{E1E1} = \frac{1}{8\pi M}(a_2-a_4+2a_5+a_6),\qquad \gamma_{M1M1} = -\frac{1}{8\pi M}(a_2+a_4+2a_5-a_6),$$
$$\gamma_{M1E2} = -\frac{1}{8\pi M}(a_2+a_4+a_6),\qquad \gamma_{E1M2} = \frac{1}{8\pi M}(a_2-a_4-a_6),$$
$$\gamma_{E2E2} = \frac{1}{24\pi M}(a_{2,t}-a_{4,t}+3a_{5,t}+2a_{6,t}),\qquad \gamma_{M2M2} = \frac{1}{24\pi M}(-a_{2,t}-a_{4,t}-3a_{5,t}+2a_{6,t}),$$
$$\gamma_{M2E3} = -\frac{1}{12\pi M}(a_{2,t}+a_{4,t}+a_{6,t}),\qquad \gamma_{E2M3} = -\frac{1}{12\pi M}(-a_{2,t}+a_{4,t}+a_{6,t}), \tag{200}$$
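where we neglected recoil contributions of $O(M^{-2})$. For details see Ref. [112]. In terms of Ragusa's polarizabilities $\gamma_i$ one has
$$\gamma_{E1E1} = -\gamma_1-\gamma_3,\qquad \gamma_{M1E2} = \gamma_2+\gamma_4,\qquad \gamma_{M1M1} = \gamma_4,\qquad \gamma_{E1M2} = \gamma_3. \tag{201}$$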
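Eq. (200) can be checked for internal consistency against Eq. (204) and against Ragusa's backward spin polarizability $\gamma_\pi=\gamma_1+\gamma_2+2\gamma_4$ (a standard definition, not restated in this section), rewritten with Eq. (201). The minimal Python sketch below uses arbitrary placeholder values for the $a_i$, which are not physical input. With these relations, all contributions except $a_4$ cancel in the forward combination, and the backward combination depends only on $a_2+a_5$.

```python
import math
import random

def leading_spin_polarizabilities(a, M):
    """Eq. (200): gamma_E1E1, gamma_M1M1, gamma_M1E2, gamma_E1M2 from a_2..a_6.

    `a` maps i -> a_i; recoil corrections O(1/M^2) are dropped, as in the text.
    """
    c = 1.0 / (8.0 * math.pi * M)
    return {
        "E1E1": c * (a[2] - a[4] + 2 * a[5] + a[6]),
        "M1M1": -c * (a[2] + a[4] + 2 * a[5] - a[6]),
        "M1E2": -c * (a[2] + a[4] + a[6]),
        "E1M2": c * (a[2] - a[4] - a[6]),
    }

# Placeholder inputs: random a_i, NOT physical values (any nonzero M works).
random.seed(0)
M = 938.272
a = {i: random.uniform(-1.0, 1.0) for i in range(1, 7)}
g = leading_spin_polarizabilities(a, M)

# Forward combination, Eq. (204) at nu = 0: only a_4 survives.
gamma_0 = -(g["E1E1"] + g["M1M1"] + g["M1E2"] + g["E1M2"])
# gamma_pi = gamma_1 + gamma_2 + 2 gamma_4, rewritten with Eq. (201):
gamma_pi = -g["E1E1"] - g["E1M2"] + g["M1E2"] + g["M1M1"]

print(gamma_0, a[4] / (2 * math.pi * M))
print(gamma_pi, -(a[2] + a[5]) / (2 * math.pi * M))
```

Each printed pair agrees, which is simply a restatement of the algebra: the $a_2$, $a_5$ and $a_6$ terms cancel in $\gamma_0$, and the $a_4$ and $a_6$ terms cancel in $\gamma_\pi$.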
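With these definitions we can now complete the expansion of the forward scattering amplitudes, Eqs. (54) and (55), to the next order:
$$f(\nu) = -\frac{e^2e_N^2}{4\pi M}+\left(\alpha_{E1}(\nu)+\beta_{M1}(\nu)\right)\nu^2+\frac{1}{12}\left(\alpha_{E2}(\nu)+\beta_{M2}(\nu)\right)\nu^4+O(\nu^6), \tag{202}$$
$$g(\nu) = -\frac{e^2\kappa_N^2}{8\pi M^2}\,\nu+\gamma_0(\nu)\,\nu^3+\tilde\gamma_0(\nu)\,\nu^5+O(\nu^7), \tag{203}$$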
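with $\alpha_{EL}(\nu)=\alpha_{EL}+\alpha_{EL,\nu}\,\nu^2+O(\nu^4)$ and similarly for the magnetic terms. In the spin-flip amplitude we have defined
$$\gamma_0(\nu) = -\left(\gamma_{E1E1}(\nu)+\gamma_{M1M1}(\nu)+\gamma_{M1E2}(\nu)+\gamma_{E1M2}(\nu)\right), \tag{204}$$
$$\tilde\gamma_0(\nu) = -\left(\gamma_{E2E2}(\nu)+\gamma_{M2M2}(\nu)+\tfrac85\gamma_{M2E3}(\nu)+\tfrac85\gamma_{E2M3}(\nu)\right). \tag{205}$$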
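A short numerical sketch shows the relative size of the terms in Eqs. (202) and (203) for the proton ($e_N=1$) at $\nu=50$ MeV. It rests on the following assumed inputs:
- $\alpha_{E1}+\beta_{M1}=14.05$, the value used for Figs. 22 and 23.
- $\gamma_0=-1.01$, the experimental value quoted later in this section.
- $\kappa_p=1.793$ and $\hbar c=197.327$ MeV fm.
- $e^2/4\pi=\alpha_{em}\approx1/137$.

The $O(\nu^4)$ term in $f$ and the $O(\nu^5)$ term in $g$ are simply dropped.

```python
HBARC = 197.327      # MeV fm
ALPHA_EM = 1 / 137.036
M_P = 938.272        # MeV, proton mass
KAPPA_P = 1.793      # proton anomalous magnetic moment

def forward_lex(nu_mev, alpha_plus_beta=14.05, gamma_0=-1.01):
    """Truncated Eqs. (202)-(203) for the proton (e_N = 1); amplitudes in fm.

    alpha_plus_beta in 1e-4 fm^3, gamma_0 in 1e-4 fm^4. Assumes e^2/(4 pi) = alpha_em.
    """
    nu = nu_mev / HBARC                                # photon energy in fm^-1
    m = M_P / HBARC                                    # nucleon mass in fm^-1
    f_thomson = -ALPHA_EM / m                          # -e^2 e_N^2 / (4 pi M)
    f_pol = alpha_plus_beta * 1e-4 * nu**2             # (alpha + beta) nu^2
    g_lo = -ALPHA_EM * KAPPA_P**2 / (2 * m**2) * nu    # -e^2 kappa^2 nu / (8 pi M^2)
    g_pol = gamma_0 * 1e-4 * nu**3                     # gamma_0 nu^3
    return f_thomson, f_pol, g_lo, g_pol

f_th, f_pol, g_lo, g_pol = forward_lex(50.0)
print(f"f: Thomson term {f_th:.3e} fm, polarizability term {f_pol / abs(f_th):.1%} of it")
print(f"g: leading term {g_lo:.3e} fm, gamma_0 term {g_pol / abs(g_lo):.1%} of it")
```

At this energy the $(\alpha_{E1}+\beta_{M1})\nu^2$ term is a correction of about 6% to the Thomson amplitude. The $\gamma_0\nu^3$ term changes the leading spin-flip amplitude by roughly 1%.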
We repeat that all combinations of polarizabilities appearing in the forward direction can be evaluated safely on the basis of DRs. However, the individual polarizabilities suffer from the nonconvergence of the unsubtracted DRs for $A_1$ and $A_2$, and the bad convergence of $A_3$. In the following section we shall compare the predictions of DRs, ChPT and some QCD-motivated models amongst each other and with the available experimental data.
The imaginary parts of the dynamical polarizabilities are determined from the scattering amplitudes of photoproduction on the nucleon by the unitarity relation. If we take into account only the contribution from one-pion intermediate states, the unitarity relations take the following simple form [80]:
$$\mathrm{Im}\,\alpha_{E1}(\omega) = \frac{k}{\omega^2}\sum_c\left(2|E_{2-}^{(c)}|^2+|E_{0+}^{(c)}|^2\right),\qquad \mathrm{Im}\,\beta_{M1}(\omega) = \frac{k}{\omega^2}\sum_c\left(2|M_{1+}^{(c)}|^2+|M_{1-}^{(c)}|^2\right),$$
$$\mathrm{Im}\,\alpha_{E2}(\omega) = \frac{36k}{\omega^4}\sum_c\left(3|E_{3-}^{(c)}|^2+|E_{1+}^{(c)}|^2\right),\qquad \mathrm{Im}\,\beta_{M2}(\omega) = \frac{36k}{\omega^4}\sum_c\left(3|M_{2+}^{(c)}|^2+|M_{2-}^{(c)}|^2\right), \tag{206}$$
where $k$ is the pion momentum, $\omega$ the photon c.m. energy, and $E_{l\pm}^{(c)}$ and $M_{l\pm}^{(c)}$ are pion photoproduction multipoles, which are summed over the different isotopic or charge channels. The real parts of these polarizabilities, calculated both in dispersion theory and in HBChPT [113,114], are displayed in Fig. 26. The dynamical polarizabilities allow for a very detailed study of the internal degrees of freedom. For example, $\alpha_{E1}$ and $\alpha_{E2}$ clearly show cusp effects due to the opening of the pion threshold, and $\beta_{M1}$ exhibits the Δ-resonance structure, with the real part passing through zero at the resonance position. In the range where it is shown ($\omega\le170$ MeV), the HBChPT calculation nicely reproduces the results of DRs.
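The unitarity relations (206) translate directly into code. The sketch below covers the two dipole cases. It makes these assumptions:
- Each charge channel is passed as a dictionary of complex multipoles. This input format is hypothetical.
- The numbers in the usage example are invented placeholders, not results of a photoproduction analysis.
- Units are left to the caller.

```python
def im_dipole_polarizabilities(k, omega, channels):
    """One-pion unitarity relations of Eq. (206) for Im alpha_E1 and Im beta_M1.

    k: pion c.m. momentum; omega: photon c.m. energy (consistent units).
    channels: one dict per charge channel c with complex multipoles
    "E0+", "E2-", "M1+", "M1-" (hypothetical input format for this sketch).
    """
    if k <= 0.0:  # below pion threshold the one-pion contribution vanishes
        return 0.0, 0.0
    s_alpha = sum(2 * abs(ch["E2-"]) ** 2 + abs(ch["E0+"]) ** 2 for ch in channels)
    s_beta = sum(2 * abs(ch["M1+"]) ** 2 + abs(ch["M1-"]) ** 2 for ch in channels)
    return k / omega**2 * s_alpha, k / omega**2 * s_beta

# Illustrative numbers only (not real multipoles): two charge channels.
channels = [
    {"E0+": 1.0 + 1.0j, "E2-": 0.5, "M1+": 2.0j, "M1-": 0.0},
    {"E0+": 0.5, "E2-": 0.0, "M1+": 1.0, "M1-": 0.5j},
]
print(im_dipole_polarizabilities(1.0, 2.0, channels))
```

The weights 2 and 1 multiply the squared multipoles exactly as in Eq. (206). The quadrupole relations follow the same pattern, with the prefactor $36k/\omega^4$.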
[Fig. 26: four panels, real parts of α_E1, β_M1 (upper) and α_E2, β_M2 (lower) versus ω (MeV), from 0 to 300 MeV]
Fig. 26. The real part of the proton polarizabilities $\alpha_{E1}$, $\beta_{M1}$ (upper panels) and $\alpha_{E2}$, $\beta_{M2}$ (lower panels) as a function of the photon c.m. energy $\omega$. Full curves: results from fixed-t subtracted dispersion relations. Dashed curves: predictions in leading order HBChPT from Ref. [114] for the isoscalar contribution to the dynamical polarizabilities up to $\omega=170$ MeV. The diamonds are the experimental values for the static dipole polarizabilities [87], which are used to fit low-energy constants.
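3.11. DR predictions for nucleon polarizabilities and comparison with theory
In a nonrelativistic model like the constituent quark model (CQM), the scalar dipole polarizabilities can be expressed by
$$\alpha_{E1} = 2\alpha_{em}\sum_{n\neq0}\frac{|\langle n|d_z|0\rangle|^2}{E_n-E_0}+\frac{\alpha_{em}}{3M}\langle0|\sum_ie_ir_i^2|0\rangle, \tag{207}$$
$$\beta_{M1} = 2\alpha_{em}\sum_{n\neq0}\frac{|\langle n|\mu_z|0\rangle|^2}{E_n-E_0}-\frac{\alpha_{em}}{2M}\langle0|\vec d^{\,2}+\sum_i\vec d_i^{\,2}|0\rangle, \tag{208}$$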
where $\vec d=\sum_i\vec d_i=\sum_ie_i\vec r_i$ and $\vec\mu=\sum_i\vec\mu_i=\sum_i(e_i/2m_i)\vec\sigma_i$ are the electric and magnetic dipole operators in the c.m. frame of the nucleon. For simplicity the quark masses are taken as $m_i=M/3$, and the quark charges $e_i$ are in units of $e$. The $O(M^{-1})$ terms in Eqs. (207) and (208) are retardation or recoil terms, which are small corrections in atomic physics but quite sizeable for the quark dynamics of the nucleon. Clearly the first term on the rhs of both equations is positive, because the dipole matrix elements appear squared and the excitation energy $E_n-E_0$ is positive. The higher order terms $O(M^{-1})$, however, are positive for $\alpha_{E1}$ but negative for $\beta_{M1}$. In the case of the magnetic polarizability, the leading term describes the paramagnetism, which is essentially due to the spin-flip transition from the nucleon to the Δ(1232), while the subleading term represents Langevin's diamagnetism. The simple CQM with an oscillator potential connects the rms radius $\langle r^2\rangle^{1/2}$ with the oscillator frequency, $\omega_0=3/(M\langle r^2\rangle)$, and yields [115]
$$\alpha_{E1} = \frac{2\alpha_{em}}{M\omega_0^2}+O(M^{-2}). \tag{209}$$
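Eq. (209) makes the size-versus-energy dilemma of the oscillator model easy to quantify. The minimal sketch below evaluates only its leading term. It makes these assumptions:
- A proton charge radius of 0.86 fm.
- $\hbar c=197.327$ MeV fm and $\alpha_{em}=1/137.036$.
- An excitation energy $\omega_0=1520\ \mathrm{MeV}-M$ for the $N^*(1520)$ mode.

```python
HBARC = 197.327      # MeV fm
ALPHA_EM = 1 / 137.036
M_N = 938.272        # MeV, nucleon mass

def omega0_from_radius(r2_fm2):
    """Oscillator frequency omega_0 = 3 / (M <r^2>) in MeV, for <r^2> in fm^2."""
    return 3 * HBARC**2 / (M_N * r2_fm2)

def alpha_E1_oscillator(omega0_mev):
    """Leading term of Eq. (209), 2 alpha_em / (M omega_0^2), in units of 1e-4 fm^3."""
    value = 2 * ALPHA_EM / (M_N * omega0_mev**2)   # MeV^-3
    return value * HBARC**3 * 1e4                  # MeV^-3 -> fm^3 -> 1e-4 fm^3

# Choice 1: fix the size with an assumed proton charge radius of 0.86 fm.
w_size = omega0_from_radius(0.86**2)
# Choice 2: fix the excitation energy with the N*(1520) mode.
w_exc = 1520.0 - M_N
print(f"size fixed:   omega_0 = {w_size:.0f} MeV, alpha_E1 = {alpha_E1_oscillator(w_size):.1f}")
print(f"energy fixed: omega_0 = {w_exc:.0f} MeV, alpha_E1 = {alpha_E1_oscillator(w_exc):.1f}")
```

Fixing the size gives $\omega_0\approx170$ MeV and $\alpha_{E1}$ of about 40. Fixing the excitation energy gives $\omega_0\approx580$ MeV and $\alpha_{E1}\approx3.5$. These match the two estimates discussed in the text.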
Unfortunately, it is not possible to describe both size and excitation energy in this model. If we use the proper size, say the electric Sachs radius of the proton, $\langle r^2\rangle=\langle r^2\rangle_E^p$, $\alpha_{E1}$ is grossly overestimated, with a value of about 40. On the other hand, the correct excitation energy for the dominant dipole mode, the $N^*(1520)$, leads to a value much too small, $\alpha_{E1}\approx3.5$. Concerning the magnetic polarizability, the magnetic dipole transition to the Δ(1232) yields a large paramagnetic value, $\beta_{M1}\approx12$, which is somewhat reduced by the diamagnetic terms. The fact that we underestimate $\alpha_{E1}$ when using the excitation energy of the $N^*(1520)$ is easily understood: the energy denominator in Eq. (207) has been taken to be nearly 600 MeV, while electric dipole absorption due to pion S-wave production already takes place at much smaller energies. The strong dependence of $\alpha_{E1}$ on the size of the oscillator parameter can of course be used to get close to the experimental numbers, and indeed reasonable results were obtained using the MIT bag model [116,117]. However, it was also recognized early on that no complete picture of the nucleon can emerge without including the pion cloud. In fact, a detailed study of the polarizabilities in a chiral quark model showed that for a reasonable quark core radius of 0.6 fm, the pion cloud contributions are clearly dominant [118]. Systematic calculations of pion cloud effects became possible with the development of chiral perturbation theory (ChPT), an expansion in the external momenta and the pion mass ("p expansion"). The first calculation of Compton scattering in that scheme was performed by Bernard, Kaiser and Meißner in 1991 [119]. Keeping only the leading term in $1/m_\pi$, they found the following simple relation at order $p^3$ (one-loop calculation):
$$\alpha_{E1} = 10\,\beta_{M1} = \frac{5\alpha_{em}g_A^2}{96\pi f_\pi^2m_\pi} = 12.2, \tag{210}$$
in remarkable agreement with experiment. The calculation was later repeated in heavy baryon ChPT, which allows for a consistent chiral power counting, and extended to $O(p^4)$, yielding [120]
$$\alpha_{E1}^p = 10.5\pm2.0,\qquad \beta_{M1}^p = 3.5\pm3.6. \tag{211}$$
The error bars of these values reflect the fact that several low-energy constants were determined by resonance saturation, e.g., by putting in phenomenological information about the Δ(1232) resonance. Since this resonance lies so close in energy to the nucleon, it may not be justified to "freeze" its degrees of freedom. It is for this reason that the "small scale expansion" (SSE) was proposed, which includes the excitation energy of the Δ(1232) as an additional expansion parameter ("ε expansion"). Unfortunately, at $O(\epsilon^3)$ the "dynamical" Δ [121,104] increases the polarizabilities to values far above the data [104],
$$\alpha_{E1}^p = 16.4\qquad\text{and}\qquad\beta_{M1}^p = 9.1. \tag{212}$$
Since large loop corrections are expected at $O(\epsilon^4)$, a calculation to this order might remedy the situation. Otherwise, one would have to shift the problem to large contributions of counterterms, thus losing predictive power.
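Eq. (210) is simple enough to evaluate directly, which also shows how its last digit depends on the inputs. The text does not state which values of $g_A$, $f_\pi$ and $m_\pi$ were used, so the sketch below assumes the following two sets:
- $g_A=1.26$, $f_\pi=92.4$ MeV and $m_\pi=138$ MeV, which gives about 12.5.
- The same $g_A$ with $f_\pi=93$ MeV and $m_\pi=139.57$ MeV, which gives about 12.2.

```python
import math

HBARC = 197.327      # MeV fm
ALPHA_EM = 1 / 137.036

def alpha_E1_p3(g_A=1.26, f_pi=92.4, m_pi=138.0):
    """Leading 1/m_pi term of Eq. (210): 5 alpha_em g_A^2 / (96 pi f_pi^2 m_pi), in 1e-4 fm^3."""
    value = 5 * ALPHA_EM * g_A**2 / (96 * math.pi * f_pi**2 * m_pi)   # MeV^-3
    return value * HBARC**3 * 1e4

alpha = alpha_E1_p3()
beta = alpha / 10            # Eq. (210): beta_M1 = alpha_E1 / 10 at this order
alpha_alt = alpha_E1_p3(f_pi=93.0, m_pi=139.57)
print(f"alpha_E1 = {alpha:.1f}, beta_M1 = {beta:.2f}; with f_pi = 93 MeV, m_pi = 139.57 MeV: {alpha_alt:.1f}")
```

The inverse dependence on $m_\pi$ is the hallmark of the chiral pion cloud: the polarizabilities diverge in the chiral limit $m_\pi\to0$.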
Table 7
Theoretical predictions for scalar polarizabilities of the proton: to $O(p^3)$ in HBChPT [119], to $O(\epsilon^3)$ in the small scale expansion [104], in the fixed-t dispersion relation analyses of Ref. [112] (HDPV) and Ref. [80] (BGLMN), and in the dressed K-matrix model of Ref. [122] (KS). In the DR calculations $\alpha_{E1}-\beta_{M1}=10.0$ is used as input. The values are given in units of $10^{-4}$ fm³ for the dipole polarizabilities and in units of $10^{-4}$ fm⁵ for the quadrupole polarizabilities.
          O(p³)    O(ε³)    HDPV     BGLMN    KS
α_E1      13.6     16.4     11.0     11.9     12.1
β_M1       1.4      9.1      1.0      1.9      2.4
α_E2      22.1     26.2     28.8     27.5      –
β_M2      −9.5    −12.3    −23.7    −22.4      –
Table 7 compares the predictions for the scalar polarizabilities in heavy baryon ChPT and in fixed-t DRs. The differences between the DR analyses of Ref. [112] (HDPV) and Ref. [80] (BGLMN) can be explained by two factors. First, they use different inputs for the one-pion multipoles (in Ref. [80] the solution SAID-SP97K was used). Second, they use different approximations for the multipion channels. In Ref. [80], the resonant contribution of the inelastic channels was parametrized as mentioned in Section 3.7. In addition, the nonresonant contribution to the two-pion photoproduction channel was modeled by calculating the OPE diagram of the $\gamma N\to\pi\Delta$ reaction. The model for two-pion photoproduction thus consisted of the resonant mechanism plus this OPE diagram. The difference between the data and this model was then fitted and attributed to a phenomenological, nonresonant $\gamma N\to\pi\Delta$ S-wave correction term. The effect of the multipion channels can be seen mainly in the sum $\alpha_{E1}+\beta_{M1}$, which, within the BGLMN analysis, approximately reproduces the value of Baldin's sum rule as given in Eq. (59). Table 7 also shows the results for $\alpha_{E1}$ and $\beta_{M1}$ obtained in the dressed K-matrix model of Ref. [122], which turn out to be quite close to the DR results. The spin polarizabilities were calculated within the HBChPT approach in Ref. [123]. Taking out a common factor $C=\alpha_{em}g_A/(4\pi^2f_\pi^2m_\pi^2)$, Ragusa's polarizabilities at $O(p^3)$ read
$$\{\gamma_1,\gamma_2,\gamma_3,\gamma_4\} = C\left\{-1+\frac{g_A}{6},\ 0+\frac{g_A}{12},\ \frac12+\frac{g_A}{24},\ -\frac12-\frac{g_A}{24}\right\}, \tag{213}$$
where the first term for each $\gamma_i$ is the contribution of the t-channel $\pi^0$ pole term, and the second one is the dispersive contribution. Clearly, the pole term is the dominant feature except for the case of $\gamma_2$. Whether the pole term should be included in or dropped from the definition of the spin polarizabilities is an open discussion, though from the standpoint of DRs the pole terms and dispersion integrals are clearly separated. In Table 8, we compare the proton results from heavy baryon ChPT, fixed-t DR and hyperbolic DR analyses for the dispersive contribution to the spin polarizabilities of Eq. (200). The agreement between the different DR results is quite satisfactory in all cases, and the spread among the different DR values can be seen as the best possible error estimate of such calculations to date. Let us also notice that the dressed K-matrix model of Ref. [122] also yields values which are quite close to the DR analysis, except for $\gamma_{E1M2}$. In the dressed K-matrix model, $\gamma_{E1M2}$ comes out much larger in absolute value, and it is responsible for the too large and positive value of $\gamma_0$ obtained in that model compared to experiment.
Table 8
Theoretical predictions for the dispersive contribution to spin polarizabilities of the proton: to $O(p^3)$ in HBChPT [104], to $O(p^4)$ in HBChPT from the two derivations of Refs. [128] and [124], to $O(\epsilon^3)$ in the small scale expansion [128], in fixed-t dispersion relation analysis of Ref. [112] (HDPV) and Ref. [80] (BGLMN), and in our calculation with hyperbolic dispersion relations (HYP. DR) at $\theta_{lab}=180^\circ$. Furthermore, the column (KS) gives the results in the dressed K-matrix model of Ref. [122]. The values are given in units of $10^{-4}$ fm⁴ for the lower order polarizabilities and in units of $10^{-4}$ fm⁶ for the higher order polarizabilities.
            O(p³)   O(p⁴)[128]  O(ε³)   O(p⁴)[124]  HDPV    BGLMN   HYP. DR  KS
γ_E1E1      −5.7    −1.8        −5.4    −1.4        −4.3    −3.4    −3.8     −5.0
γ_M1M1      −1.1     0.4ᵃ        1.4     3.3         2.9     2.7     2.9      3.4
γ_E1M2       1.1     0.7         1.0     0.2        −0.01    0.3     0.5     −1.8
γ_M1E2       1.1     1.8         1.0     1.8         2.1     1.9     1.6      1.1
γ_0          4.6    −1.1         2.0    −3.9        −0.7    −1.5    −1.1      2.4
γ_π^disp     4.6     3.3         6.8     6.3         9.3     7.8     7.8     11.4
γ_E2E2      −0.4     0.08       −0.28    –          −0.16    –       –        –
γ_M2M2      −0.03    0.06       −0.03    –          −0.09    –       –        –
γ_E2M3       0.11    0.03        0.11    –           0.08    –       –        –
γ_M2E3       0.11    0.03        0.11    –           0.06    –       –        –
ᵃ Ref. [129] has suggested that a contribution of +2.5 from the Δ pole is still missing.
One also sees from Table 8 that the ChPT predictions disagree in some cases, both among each other and with the DR results. The reason for these discrepancies clearly deserves further study.
In the following we shall discuss the forward spin polarizability $\gamma_0$. As is obvious from Eqs. (201) and (213), the $\pi^0$ pole term cancels in the forward direction, and, in agreement with forward dispersion relations, Eq. (58), only excited intermediate states contribute. Two recent calculations of $\gamma_0$ at $O(p^4)$ yield the following result [124,125]:
$$\gamma_0 = \frac{g_AC}{6}\left[1-\frac{\pi m_\pi}{8M}\left(21+3\kappa_p-2\kappa_n\right)\right] = 4.5-8.4 = -3.9. \tag{214}$$
In another independent investigation, Gellas et al. [126] arrived at a value $\gamma_0=-1$, close to the experimental value $\gamma_0=(-1.01\pm0.08\pm0.10)$ of Eq. (60). However, the apparent discrepancy within the HBChPT calculations is not related to any differences concerning the observables, but is merely a matter of definition of a polarizability [127,128]. For comparison, the SSE at $O(\epsilon^3)$ predicts [104]
$$\gamma_0 = 4.6\,(N)-2.4\,(\Delta)-0.2\,(\Delta) = 2.0, \tag{215}$$
the individual contributions being due to N loops, Δ poles, and Δ loops. The corresponding results for the neutron scalar and spin polarizabilities are shown in Tables 9 and 10. In the case of the scalar polarizabilities, the difference between proton and neutron dispersive results is quite small. This is in qualitative agreement with the ChPT calculations, in which the isovector effects appear only at the fourth order. At this order, however, unknown low-energy constants enter the scalar polarizabilities. For the lower-order spin polarizabilities of the neutron, shown in Table 10, it is amusing to note that the $O(p^4)$ HBChPT predictions of Ref. [124] are quite close to the hyperbolic DR results. The higher-order spin polarizabilities of the proton and neutron are predicted to be equal, because only the isoscalar contribution to the t-channel structure constants $a_{i,t}$ of Eq. (200) was taken into account. Finally, we also like to mention that very recently the first lattice QCD calculations of hadron electric and magnetic polarizabilities were reported [130]. While these initial results look quite promising, it is premature to make a quantitative comparison with experiment at this time, because the current lattice calculations were performed for rather large values of the quark mass (corresponding to pion masses ≳ 500 MeV). However, in the near future such calculations can be envisaged for values of $m_\pi$ down to about 300 MeV. Furthermore, in this range one may make use of ChPT results which calculate the dependence of the polarizabilities on $m_\pi$. This opens up the prospect to extrapolate the lattice results downwards in $m_\pi$, and bridge the gap in $m_\pi$ between the existing lattice calculations and the chiral limit. Such a study is well worth pursuing in future work.
Table 9
Same as Table 7, but for the scalar polarizabilities of the neutron. In the DR calculations $\alpha_{E1}-\beta_{M1}=11.5$ is used as input.
          O(p³)    O(ε³)    HDPV     BGLMN    KS
α_E1      13.6     16.4     12.3     13.3     12.7
β_M1       1.4      9.1      0.8      1.8      1.8
α_E2      22.1     26.2     28.8     27.2      –
β_M2      −9.5    −12.3    −23.7    −23.5      –
Table 10
Same as Table 8, but for the spin polarizabilities of the neutron.
            O(p³)   O(p⁴)[128]  O(ε³)   O(p⁴)[124]  HDPV    BGLMN   HYP. DR  KS
γ_E1E1      −5.7    −4.2        −5.4    −4.2        −5.9    −5.6    −4.7     −4.8
γ_M1M1      −1.1     0.4ᵃ        1.4     2.3         3.8     3.8     2.8      3.5
γ_E1M2       1.1     0.5         1.0     0.4        −0.9    −0.7     0.4     −1.8
γ_M1E2       1.1     2.2         1.0     2.2         3.1     2.9     2.0      1.1
γ_0          4.6     1.1         2.0    −0.7        −0.07   −0.4    −0.5      2.0
γ_π^disp     4.6     6.3         6.8     8.3        13.7    13.0     9.2     11.2
γ_E2E2      −0.4     0.08       −0.28    –          −0.16    –       –        –
γ_M2M2      −0.03    0.06       −0.03    –          −0.09    –       –        –
γ_E2M3       0.11    0.03        0.11    –           0.08    –       –        –
γ_M2E3       0.11    0.03        0.11    –           0.06    –       –        –
ᵃ Ref. [129] has suggested that a contribution of +2.5 from the Δ pole is still missing.
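The $O(p^3)$ column of Tables 8 and 10 follows from Eq. (213) and Ragusa's relations (201) once the $\pi^0$ pole terms are removed. The sketch below assumes $g_A=1.26$, $f_\pi=92.4$ MeV and $m_\pi=138$ MeV, since the input values are not given in the text. It uses the standard Ragusa combinations $\gamma_0=\gamma_1-\gamma_2-2\gamma_4$ and $\gamma_\pi=\gamma_1+\gamma_2+2\gamma_4$. After rounding, it reproduces the tabulated entries −5.7, −1.1, 1.1, 1.1, 4.6 and 4.6.

```python
import math

HBARC = 197.327      # MeV fm
ALPHA_EM = 1 / 137.036

def dispersive_spin_polarizabilities_p3(g_A=1.26, f_pi=92.4, m_pi=138.0):
    """Dispersive parts of Eq. (213), mapped with Eq. (201); results in 1e-4 fm^4."""
    C = ALPHA_EM * g_A / (4 * math.pi**2 * f_pi**2 * m_pi**2)   # MeV^-4
    C *= HBARC**4 * 1e4                                           # -> units of 1e-4 fm^4
    # Second term of each entry in Eq. (213); the pole terms (-1, 0, 1/2, -1/2) are omitted.
    g1, g2, g3, g4 = C * g_A / 6, C * g_A / 12, C * g_A / 24, -C * g_A / 24
    return {
        "gamma_E1E1": -g1 - g3,
        "gamma_M1M1": g4,
        "gamma_E1M2": g3,
        "gamma_M1E2": g2 + g4,
        "gamma_0": g1 - g2 - 2 * g4,
        "gamma_pi_disp": g1 + g2 + 2 * g4,
    }

for name, value in dispersive_spin_polarizabilities_p3().items():
    print(f"{name:14s} {value:6.2f}")
```

All six entries are multiples of the single combination $Cg_A$. This is why, at this order, the dispersive parts of $\gamma_0$ and $\gamma_\pi$ coincide.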
4. Dispersion relations in virtual Compton scattering (VCS)
4.1. Introduction
In this section, we discuss dispersion relations for the virtual Compton scattering (VCS) process. In this process, denoted as $\gamma^*+p\to\gamma+p$, a spacelike virtual photon ($\gamma^*$) interacts with a nucleon and a real photon ($\gamma$) is produced. We consider a proton in all of the following, as experiments have so far only been performed on a proton target. At low energies, this real photon plays the role of an applied quasi-static electromagnetic field, and the VCS process measures the response of the nucleon to this applied field. In the real Compton scattering process discussed in Section 3, this response is characterized by global nucleon structure constants such as the nucleon dipole and higher order polarizabilities. In contrast, for the VCS process, the virtuality of the initial photon can be dialed so as to map out the spatial distribution of these nucleon polarizabilities, giving access to so-called generalized polarizabilities (GPs). The first unpolarized VCS observables have been obtained from experiments at the MAMI accelerator [131] at a virtuality $Q^2=0.33$ GeV², and recently at JLab [132] at higher virtualities, $1<Q^2<2$ GeV². Both experiments measured two combinations of GPs. Further experimental programs are underway at the intermediate energy electron accelerators (MIT-Bates [133], MAMI [134], and JLab [135]) to measure both unpolarized and polarized VCS observables. VCS experiments at low outgoing photon energies can be analyzed in terms of low-energy expansions (LEXs), proposed in Ref. [136]. In the LEX, only the leading term (in the energy of the produced real photon) of the response to the quasi-constant electromagnetic field, due to the internal structure of the system, is taken into account. This leading term depends linearly on the GPs.
As the sensitivity of the VCS cross sections to the GPs grows with the photon energy, it is advantageous to go to higher photon energies, provided one can keep the theoretical uncertainties under control when approaching and crossing the pion threshold. The situation can be compared to RCS as described in Section 3, where it was shown that one uses a dispersion relation formalism to extract the polarizabilities at energies above pion threshold, with generally larger effects on the observables. In this section, we describe the application of a dispersion relation formalism to the VCS reaction with the aim to extract GPs from VCS experiments over a larger energy range. We will also review the present status and future prospects of VCS experiments and describe the physics contained in the GPs. For more details on VCS, see also the reviews of Refs. [137,138].
4.2. Kinematics and invariant amplitudes
The VCS process on the proton is accessed through the $ep\to ep\gamma$ reaction. In this process, the final photon can be emitted either by the proton, which is referred to as the fully virtual Compton scattering (FVCS) process, or by the lepton, which is referred to as the Bethe–Heitler (BH) process. This is shown graphically in Fig. 27, leading to the amplitude $T^{ee\gamma}$ of the $ep\to ep\gamma$ reaction as the coherent sum of the BH and the FVCS process:
$$T^{ee\gamma} = T^{BH}+T^{FVCS}. \tag{216}$$
The BH amplitude $T^{BH}$ is exactly calculable from QED if one knows the nucleon electromagnetic form factors. The FVCS amplitude $T^{FVCS}$ contains, in the one-photon exchange approximation, the VCS subprocess $\gamma^*p\to\gamma p$. We refer to Ref. [137], where the explicit expression of the BH amplitude is given and the construction of the FVCS amplitude from the $\gamma^*p\to\gamma p$ process is discussed.
[Fig. 27: diagrams for ep → epγ, with the final photon emitted (a) from the proton and (b) from the electron]
Fig. 27. (a) FVCS process, (b) BH process.
We characterize the four-vectors of the virtual (real) photon in the VCS process $\gamma^*p\to\gamma p$ by $q$ ($q'$), and the four-momenta of the initial (final) nucleons by $p$ ($p'$). In the VCS process, the initial photon is spacelike and we denote its virtuality in the usual way by $q^2=-Q^2$. Besides $Q^2$, the VCS process can be described by the Mandelstam invariants
$$s=(q+p)^2,\qquad t=(q-q')^2,\qquad u=(q-p')^2, \tag{217}$$
with the constraint
$$s+t+u = 2M^2-Q^2, \tag{218}$$
where $M$ denotes the nucleon mass. We furthermore introduce the variable $\nu$, which changes sign under $s\leftrightarrow u$ crossing:
$$\nu = \frac{s-u}{4M} = E_{lab}+\frac{1}{4M}(t-Q^2), \tag{219}$$
where $E_{lab}$ is the virtual photon energy in the lab frame. In the following, we choose $Q^2$, $\nu$ and $t$ as the independent variables to describe the VCS process. In Fig. 28, we show the Mandelstam plane for the VCS process at a fixed value of $Q^2=0.33$ GeV², at which the experiment of Ref. [131] was performed. The VCS helicity amplitudes can be written as
$$T_{\lambda'\lambda'_N,\lambda\lambda_N} = -e^2\,\varepsilon_\mu(q,\lambda)\,\varepsilon'^{*}_\nu(q',\lambda')\,\bar u(p',\lambda'_N)\,\mathcal M^{\mu\nu}\,u(p,\lambda_N), \tag{220}$$
where the polarization four-vectors of the virtual (real) photons are denoted by $\varepsilon$ ($\varepsilon'$), and their helicities by $\lambda$ ($\lambda'$), with $\lambda=0,\pm1$ and $\lambda'=\pm1$. The nucleon helicities are $\lambda_N,\lambda'_N=\pm1/2$, and $u$, $\bar u$ are the nucleon spinors. The VCS tensor $\mathcal M^{\mu\nu}$ in Eq. (220) can be decomposed into a Born (B) and a nonBorn part (NB):
$$\mathcal M^{\mu\nu} = \mathcal M_B^{\mu\nu}+\mathcal M_{NB}^{\mu\nu}. \tag{221}$$
[Fig. 28: t (GeV²) versus ν (GeV), showing the lines s = (M + m_π)², u = (M + m_π)², t = 4m_π², t = m_π², and the θ = 0° and θ = 180° boundaries]
Fig. 28. The Mandelstam plane for virtual Compton scattering at $Q^2=0.33$ GeV². The boundaries of the physical s-channel region are $\theta=0^\circ$ and $180^\circ$ for $\nu>0$; the u-channel region is obtained by crossing, $\nu\to-\nu$. The curves for $\theta=0^\circ$ and $180^\circ$ intersect at $\nu=0$, $t=-Q^2$, which is the point where the generalized polarizabilities are defined.
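The relations (217)–(219) fix $s$, $u$ and $\nu$ once $Q^2$, the lab energy of the virtual photon and $t$ are given. The minimal sketch below assumes a proton mass of 0.938272 GeV and uses $s=M^2+2ME_{lab}-Q^2$ for a target at rest. It checks the second form of Eq. (219) at an arbitrary point. It also shows that at $\nu=0$, $t=-Q^2$, where the GPs are defined (Fig. 28), one has $s=u=M^2$, i.e., the outgoing real photon carries zero energy.

```python
M = 0.938272   # GeV, proton mass (assumed value)

def vcs_invariants(Q2, E_lab, t):
    """s, u and nu for gamma* p -> gamma p from Q^2, the lab virtual-photon energy and t (GeV units)."""
    s = M**2 + 2 * M * E_lab - Q2      # s = (q + p)^2 with the target proton at rest
    u = 2 * M**2 - Q2 - s - t          # constraint (218)
    nu = (s - u) / (4 * M)             # definition (219)
    return s, u, nu

Q2 = 0.33
# Arbitrary point: nu agrees with the second form of Eq. (219).
s, u, nu = vcs_invariants(Q2, E_lab=0.6, t=-0.2)
print(nu, 0.6 + (-0.2 - Q2) / (4 * M))
# Point where the GPs are defined (nu = 0, t = -Q^2) corresponds to E_lab = Q^2 / (2M).
s0, u0, nu0 = vcs_invariants(Q2, E_lab=Q2 / (2 * M), t=-Q2)
print(s0, u0, nu0)   # s0 = u0 = M^2 and nu0 = 0, up to rounding
```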
In the Born process, the virtual photon is absorbed on a nucleon and the intermediate state remains a nucleon, whereas the nonBorn process contains all nucleon excitations and meson-loop contributions. The separation between Born and nonBorn parts is performed in the same way as described in Ref. [136], to which we refer for details (see also Ref. [139]). One can proceed by parametrizing the VCS tensor of Eq. (221) in terms of 12 independent amplitudes. In Ref. [140], a gauge-invariant tensor basis was found so that the resulting nonBorn invariant amplitudes are free of kinematical singularities and constraints, which is an important property when setting up a dispersion relation formalism. This tensor takes the form
$$\mathcal M^{\mu\nu} = \sum_{i=1}^{12}f_i(Q^2,\nu,t)\,\rho_i^{\mu\nu}, \tag{222}$$
where the 12 independent tensors $\rho_i^{\mu\nu}$ are given in Appendix B. The corresponding 12 amplitudes $f_i$ are expressed in terms of the invariants $Q^2$, $\nu$ and $t$. The tensor basis $\rho_i^{\mu\nu}$ is chosen such that the resulting invariant amplitudes $f_i$ are either even or odd under crossing, which leads to the following symmetry relations for the $f_i$ at the real photon point:
$$f_i(0,\nu,t) = +f_i(0,-\nu,t)\quad(i=1,2,6,11),\qquad f_i(0,\nu,t) = -f_i(0,-\nu,t)\quad(i=4,7,9,10), \tag{223}$$
while the amplitudes $f_3$, $f_5$, $f_8$, $f_{12}$ do not contribute at this point, because the corresponding tensors vanish in the limit $Q^2\to0$. Nucleon crossing combined with charge conjugation provides the more general constraints on the $f_i$ at arbitrary virtuality $Q^2$:
$$f_i(Q^2,\nu,t) = +f_i(Q^2,-\nu,t)\quad(i=1,2,5,6,7,9,11,12),\qquad f_i(Q^2,\nu,t) = -f_i(Q^2,-\nu,t)\quad(i=3,4,8,10). \tag{224}$$
When using dispersion relations, it will be convenient to work with 12 amplitudes that are all even in $\nu$. This is achieved by introducing the amplitudes $F_i$ ($i=1,\dots,12$) as follows:
$$F_i(Q^2,\nu,t) = f_i(Q^2,\nu,t)\quad(i=1,2,5,6,7,9,11,12),\qquad F_i(Q^2,\nu,t) = \frac{1}{\nu}f_i(Q^2,\nu,t)\quad(i=3,4,8,10), \tag{225}$$
satisfying $F_i(Q^2,-\nu,t)=F_i(Q^2,\nu,t)$ for $i=1,\dots,12$. As the nonBorn invariant amplitudes $f^{NB}_{3,4,8,10}\sim\nu$ for $\nu\to0$, the definition of Eq. (225) ensures that all the nonBorn amplitudes $F_i^{NB}$ ($i=1,\dots,12$) are also free from kinematical singularities. The results for the Born amplitudes $F_i^B$ are listed in Appendix B of Ref. [141]. From Eqs. (223) and (224), one furthermore finds that $F_7$ and $F_9$ vanish at the real photon point. Since 4 of the tensors also vanish in the limit $Q^2\to0$, only the six amplitudes $F_1$, $F_2$, $F_4$, $F_6$, $F_{10}$ and $F_{11}$ enter in real Compton scattering (RCS). These six amplitudes are related to the RCS amplitudes of Eq. (158) by
−e F1 = −A1 −
t − 4M 2 4M 2
22 A4 + A 6 ; M2 A4 ;
A3 +
1 t A3 + A6 − 2 2M 4M 2 1 A4 ; −e2 F4 = 2M 2 t − 4M 2 1 2 − A4 + A6 ; −e F6 = 4M 2 4M 2 −e2 F2 = −
1 [A5 − A6 ] ; 2M 1 t − 4M 2 + 422 A2 − A + A −e2 F11 = − 4 6 ; 4M 4M 2
−e2 F10 = −
(226)
where the charge factor $-e^2$ appears explicitly on the lhs of Eq. (226), because this factor is included in the usual definition of the $A_i$.
4.3. Definitions of nucleon generalized polarizabilities
The behavior of the nonBorn VCS tensor $M^{\mu\nu}_{NB}$ of Eq. (222) at low energy ($q' \equiv |\vec q\,'_{cm}| \to 0$) but at arbitrary three-momentum $\bar q \equiv |\vec q_{cm}|$ of the virtual photon can be parametrized by six generalized polarizabilities (GPs), which will be denoted by $P^{(\rho' L', \rho L)S}(\bar q)$ [136,140,142]. In this notation, $\rho$ ($\rho'$) refers to the electric (E), magnetic (M) or longitudinal (L) nature of the initial (final) photon, $L$ ($L'$) represents the angular momentum of the initial (final) photon, and $S$ differentiates between the spin-flip ($S = 1$) and non-spin-flip ($S = 0$) character of the transition at the nucleon side. Assuming that the emitted real photons have low energies, we may use the dipole approximation ($L' = 1$). For a dipole transition in the final state, angular momentum and parity conservation lead to 10 GPs [136]. Furthermore, it has been shown [140] that nucleon crossing combined with charge conjugation symmetry of the VCS amplitudes provides four additional constraints among the 10 GPs. A convenient choice for the six independent GPs appearing in that approximation has been proposed in Ref. [137]:
$$P^{(L1,L1)0}(\bar q), \quad P^{(M1,M1)0}(\bar q), \qquad (227)$$
$$P^{(L1,L1)1}(\bar q), \quad P^{(M1,M1)1}(\bar q), \quad P^{(M1,L2)1}(\bar q), \quad P^{(L1,M2)1}(\bar q). \qquad (228)$$
We note at this point that the difference between the transverse electric and longitudinal transitions is of higher order in $\bar q$, which explains why the electric multipoles can be replaced by the longitudinal ones in the above equations. In the limit $\bar q \to 0$ one finds the following relations between the VCS and RCS polarizabilities [140]:
$$P^{(L1,L1)0}(0) = -\frac{4\pi}{e^2}\sqrt{\frac{2}{3}}\,\alpha, \qquad P^{(M1,M1)0}(0) = -\frac{4\pi}{e^2}\sqrt{\frac{8}{3}}\,\beta,$$
$$P^{(L1,M2)1}(0) = -\frac{4\pi}{e^2}\,\frac{\sqrt{2}}{3}\,\gamma_3, \qquad P^{(M1,L2)1}(0) = -\frac{4\pi}{e^2}\,\frac{2\sqrt{2}}{3\sqrt{3}}\,(\gamma_2 + 4\gamma_4),$$
$$P^{(L1,L1)1}(0) = 0, \qquad P^{(M1,M1)1}(0) = 0. \qquad (229)$$
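As a quick consistency check of Eq. (229), the Python sketch below evaluates the real-photon limits of the six GPs from the RCS polarizabilities and then applies the $Q^2 = 0$ limit of Eq. (250) (where $E = M$), which should reproduce Eq. (244). It assumes Heaviside–Lorentz units with $e^2/(4\pi) = \alpha_{em} \approx 1/137.036$, so that $4\pi/e^2 = 1/\alpha_{em}$; the function names are illustrative, and the input values $\alpha = 12.1$, $\beta = 1.6$ (in $10^{-4}$ fm$^3$) are simply the totals implied by the $\pi N$ values and RCS differences quoted in Eqs. (245), (246) and (259).

```python
import math

# Assumed unit convention (not stated in this section): e^2/(4*pi) = alpha_em,
# so that the prefactor 4*pi/e^2 in Eq. (229) equals 1/alpha_em.
ALPHA_EM = 1.0 / 137.036


def gp_real_photon_limits(alpha, beta, gamma2, gamma3, gamma4):
    """Real-photon limits of the six GPs, transcribed from Eq. (229).

    The polarizabilities are passed in the caller's units; each GP is
    returned in those units multiplied by 4*pi/e^2.
    """
    pref = 1.0 / ALPHA_EM  # 4*pi/e^2
    return {
        "P(L1,L1)0": -pref * math.sqrt(2.0 / 3.0) * alpha,
        "P(M1,M1)0": -pref * math.sqrt(8.0 / 3.0) * beta,
        "P(L1,M2)1": -pref * math.sqrt(2.0) / 3.0 * gamma3,
        "P(M1,L2)1": -pref * 2.0 * math.sqrt(2.0) / (3.0 * math.sqrt(3.0))
        * (gamma2 + 4.0 * gamma4),
        "P(L1,L1)1": 0.0,
        "P(M1,M1)1": 0.0,
    }


def f1bar_real_photon(p_m1m1_0):
    """Eq. (250) at Q^2 = 0, where E = M and (2E/(E+M))^(1/2) = 1."""
    return -math.sqrt(3.0 / 8.0) * p_m1m1_0


# Spin polarizabilities are set to zero here; only alpha and beta matter below.
gps = gp_real_photon_limits(alpha=12.1, beta=1.6, gamma2=0.0, gamma3=0.0, gamma4=0.0)
print(f1bar_real_photon(gps["P(M1,M1)0"]), 1.6 / ALPHA_EM)  # agree up to rounding, cf. Eq. (244)
```

The check works because $\sqrt{3/8}\,\sqrt{8/3} = 1$, so the magnetic GP at the real photon point maps back onto $(4\pi/e^2)\beta$.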
In terms of invariants, the limit $q' \to 0$ at finite three-momentum $\bar q$ of the virtual photon corresponds to $\nu \to 0$ and $t \to -Q^2$ at finite $Q^2$. One can therefore express the GPs in terms of the VCS invariant amplitudes $F_i$ at the point $\nu = 0$, $t = -Q^2$ for finite $Q^2$, for which we introduce the shorthand
$$\bar F_i(Q^2) \equiv F_i^{NB}(Q^2, \nu = 0, t = -Q^2). \qquad (230)$$
The relations between the GPs and the $\bar F_i(Q^2)$ can be found in Ref. [140]. Analogously to the sum rules discussed in Section 3 for the nucleon polarizabilities at $Q^2 = 0$, we now turn to dispersion relations for the GPs. From the high-energy behavior of the amplitudes $F_i$, it was found in Ref. [141] that unsubtracted DRs do not exist for the amplitudes $F_1$ and $F_5$, but can be written down for the other amplitudes. Therefore, unsubtracted DRs will hold for those GPs which do not depend on the two amplitudes $F_1$ and $F_5$. However, the amplitude $F_5$ can appear in the combination $F_5 + 4F_{11}$, because this combination has a high-energy behavior leading to a convergent integral [141,143]. Among the six GPs we find four combinations that do not depend on $F_1$ and $F_5$ (other than through $F_5 + 4F_{11}$):
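$$P^{(L1,L1)0} + \frac{1}{2}P^{(M1,M1)0} = -\frac{2}{\sqrt{3}}\left(\frac{E+M}{E}\right)^{1/2} M\tilde q_0\left[\frac{\bar q^2}{\tilde q_0^2}\,\bar F_2 + (2\bar F_6 + \bar F_9) - \bar F_{12}\right], \qquad (231)$$
$$P^{(L1,L1)1} = \frac{1}{3\sqrt{2}}\left(\frac{E+M}{E}\right)^{1/2}\tilde q_0\left\{(\bar F_5 + \bar F_7 + 4\bar F_{11}) + 4M\bar F_{12}\right\}, \qquad (232)$$
1 1 E + M 1=2 M q˜20 P (M 1; M 1)1 = P (L1; M 2)1 − √ 3 E q 2q˜0 ×{(FZ 5 + FZ 7 + 4FZ 11 ) + 4M (2FZ 6 + FZ 9 )} ; √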
P
(L1; M 2)1
3 (M 1; L2)1 1 P + = 2 6
E+M E
1=2
(233)
q˜0 q2
×{q˜0 (FZ 5 + FZ 7 + 4FZ 11 ) + 8M 2 (2FZ 6 + FZ 9 )} ;
(234)
where $E = \sqrt{\bar q^2 + M^2}$ denotes the initial proton c.m. energy and $\tilde q_0 = M - E$ the virtual photon c.m. energy in the limit $q' = 0$. For small values of $\bar q$, we observe the relation $\tilde q_0 \approx -\bar q^2/(2M)$. Furthermore, in the limit $q' = 0$, the value of $Q^2$ is always understood as being $\tilde Q^2 \equiv \bar q^2 - \tilde q_0^2$, which we denote by $Q^2$ for simplicity of notation.
4.4. Fixed-t dispersion relations
With the choice of the tensor basis of Appendix B, and taking account of the crossing relation Eq. (224), the resulting nonBorn VCS invariant amplitudes $F_i$ ($i = 1,\ldots,12$) are free of kinematical singularities and constraints, and even in $\nu$, i.e., $F_i(Q^2,\nu,t) = F_i(Q^2,-\nu,t)$. Assuming further analyticity and an appropriate high-energy behavior, these amplitudes fulfill unsubtracted dispersion relations¹⁰ with respect to the variable $\nu$ at fixed $t$ and fixed virtuality $Q^2$:
$$\mathrm{Re}\,F_i^{NB}(Q^2,\nu,t) = F_i^{pole}(Q^2,\nu,t) - F_i^{B}(Q^2,\nu,t) + \frac{2}{\pi}\,\mathcal{P}\!\int_{\nu_0}^{+\infty} d\nu'\,\frac{\nu'\,\mathrm{Im}_s F_i(Q^2,\nu',t)}{\nu'^2 - \nu^2}, \qquad (235)$$
where we explicitly indicate that the lhs of Eq. (235) represents the nonBorn (NB) parts of the amplitudes. In Eq. (235), $F_i^{B}$ is defined as in the discussion following Eq. (221), whereas $F_i^{pole}$ represents the nucleon pole contribution (i.e., energy factors in the numerators are evaluated at the pole position).¹¹ Furthermore, in Eq. (235), $\mathrm{Im}_s F_i$ are the discontinuities across the $s$-channel cuts of the VCS process, starting at the pion production threshold, which is the first inelastic channel, i.e., $\nu_0 = m_\pi + (m_\pi^2 + t/2 + Q^2/2)/(2M)$. Besides the absorptive singularities due to physical intermediate states, which contribute to the rhs of dispersion integrals such as Eq. (235), one might wonder whether other singularities exist that give rise to imaginary parts. Such additional singularities could come from the so-called anomalous thresholds [145,146], which arise when a hadron is a loosely bound system of other hadronic constituents which can go on-shell (as is the case of a nucleus in terms of its nucleon constituents), leading to the so-called triangular singularities. It was shown that in the case of strong confinement within QCD, the quark–gluon structure of hadrons does not give rise to additional anomalous thresholds [147,148], and the quark singularities are turned into hadron singularities described through an effective field theory. Therefore, the only anomalous thresholds arise for those hadrons which are loosely bound composite systems of other hadrons (e.g., the $\Sigma$ particle in terms of $\Lambda$ and $\pi$). For the nucleon, such anomalous thresholds are absent, and the imaginary parts entering the dispersion integrals of Eq. (235) are calculated from absorptive singularities due to $\pi N, \pi\pi N, \ldots$ physical intermediate states. The assumption that unsubtracted dispersion relations as in Eq. (235) hold requires that at high energies ($\nu \to \infty$ at fixed $t$ and fixed $Q^2$) the amplitudes $\mathrm{Im}_s F_i(Q^2,\nu,t)$ ($i = 1,\ldots,12$) drop fast enough so that the integrals of Eq. (235) are convergent and the contribution from the semi-circle at infinity can be neglected. The high-energy behavior of the amplitudes $F_i$ was investigated in [141] by considering the Regge limit ($\nu \to \infty$ at fixed $t$ and fixed $Q^2$) of the VCS helicity amplitudes.
¹⁰ As a historical remark we note that dispersion relations were considered for the first time for the virtual Compton scattering process in [144]. That work, however, considered a different set of amplitudes than those discussed here. To avoid numerical artefacts due to kinematical singularities, we only consider here DRs for the amplitudes $F_i$, which are free from such singularities and constraints.
¹¹ Note that, of the twelve VCS amplitudes $F_i$, only $F_1$ and a combination of $F_5$ and $F_{11}$ show a difference between the Born and pole parts.
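The threshold $\nu_0$ quoted above can be evaluated directly. The Python sketch below assumes GeV units, a proton mass $M = 0.93827$ GeV and the neutral-pion mass $m_\pi = 0.13498$ GeV (the lowest channel, $\pi^0 p$); both numbers are illustrative inputs, not values taken from the text. It also makes visible that at $t = -Q^2$, the point used for the $\bar F_i(Q^2)$ of Eq. (230), the threshold does not depend on $Q^2$.

```python
M_PROTON = 0.93827  # GeV (assumed input)
M_PI0 = 0.13498     # GeV (assumed input: lowest threshold, pi0 p)


def pion_threshold_nu0(t, q2, m=M_PROTON, m_pi=M_PI0):
    """nu_0 = m_pi + (m_pi^2 + t/2 + Q^2/2) / (2M), all quantities in GeV units."""
    return m_pi + (m_pi**2 + 0.5 * (t + q2)) / (2.0 * m)


# t = 0, Q^2 = 0: the familiar photoproduction threshold m_pi + m_pi^2 / (2M).
print(pion_threshold_nu0(0.0, 0.0))
# At t = -Q^2 the t/2 and Q^2/2 terms cancel, so nu_0 is independent of Q^2.
print(pion_threshold_nu0(-0.33, 0.33), pion_threshold_nu0(-0.1, 0.1))
```

Writing the $Q^2$ and $t$ terms as `0.5 * (t + q2)` keeps the cancellation at $t = -Q^2$ exact in floating point.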
As mentioned above, it follows from this analysis that for the amplitudes $F_1$ and $F_5$ an unsubtracted dispersion integral does not exist, whereas the other ten VCS amplitudes can be evaluated through unsubtracted dispersion integrals as in Eq. (235). Having specified the VCS invariant amplitudes and their high-energy behavior, we are now ready to set up the DR formalism. The difference between Born and pole terms in Eq. (235) vanishes for the four combinations of GPs on the lhs of Eqs. (231)–(234). They can be directly evaluated by unsubtracted DRs through the following integrals for the corresponding $\bar F_i(Q^2)$:
$$\bar F_i(Q^2) = \frac{2}{\pi}\int_{\nu_0}^{+\infty} d\nu'\,\frac{\mathrm{Im}_s F_i(Q^2,\nu',t=-Q^2)}{\nu'}. \qquad (236)$$
We will next discuss in Section 4.4.1 how the $s$-channel dispersion integrals of Eqs. (235) and (236) are evaluated. In particular, unitarity will allow us to express the imaginary parts of the VCS amplitudes in terms of $\pi N, \pi\pi N, \ldots$ intermediate states. Subsequently, we will show in Section 4.4.2 how to deal with the remaining two VCS invariant amplitudes for which one cannot write down unsubtracted DRs.
4.4.1. s-channel dispersion integrals
The imaginary parts of the amplitudes $F_i$ in Eq. (235) are obtained through the imaginary parts of the VCS helicity amplitudes $T_{\lambda'_\gamma\lambda'_N,\lambda_\gamma\lambda_N}$ defined in Eqs. (220) and (222). The VCS helicity amplitudes can be expressed by the $F_i$ in a straightforward manner, even though the calculation is cumbersome. The main difficulty, however, is the inversion of the relation between the two sets of amplitudes, i.e., to express the 12 amplitudes $F_i$ in terms of the 12 independent helicity amplitudes. This problem has been solved in Refs. [141,149] in two different ways. Firstly, the inversion was performed numerically by applying different algorithms. Secondly, an explicit analytical inversion was found, as detailed in Ref. [149]. The two methods allow us to cross-check the results. Having expressed the amplitudes $F_i$ in terms of the helicity amplitudes, the latter are determined by using unitarity. Denoting the VCS helicity amplitudes by $T_{fi}$, the unitarity relation takes the generic form
$$2\,\mathrm{Im}_s T_{fi} = \sum_X (2\pi)^4\,\delta^4(P_X - P_i)\,T^{\dagger}_{Xf}\,T_{Xi}, \qquad (237)$$
where the sum runs over all possible intermediate states $X$. Here we are mainly interested in VCS through the $\Delta(1232)$-resonance region. Therefore, we restrict ourselves to the dominant contribution by only taking account of the $\pi N$ $s$-channel intermediate states. If one wants to extend the dispersion formalism to higher energies, the influence of additional channels, such as the $\pi\pi N$ intermediate states, has to be addressed. The helicity amplitudes for $\pi N$ intermediate states are expressed in terms of pion photo- and electroproduction multipoles as specified in Appendix C.4 of Ref. [141]. The calculations are performed using the phenomenological MAID analysis [18], which contains both resonant and nonresonant pion production mechanisms. This state-of-the-art analysis is based on the existing pion photo- and electroproduction data. A direct evaluation of the rhs of Eq. (237) is not possible due to the incomplete coverage of the phase space by the present data sets. Therefore, a phenomenological analysis is needed to fully calculate the dispersive input.
4.4.2. Asymptotic parts and dispersive contributions beyond πN
To evaluate the VCS amplitudes $F_1$ and $F_5$ in an unsubtracted DR framework, we proceed as in the case of RCS [75]. This amounts to performing the unsubtracted dispersion integrals of Eq. (235) for $F_1$ and $F_5$ along the real $\nu$-axis in the range $-\nu_{max} \le \nu \le +\nu_{max}$, and to closing the contour by a semi-circle with radius $\nu_{max}$ in the upper half of the complex $\nu$-plane, with the result
$$\mathrm{Re}\,F_i^{NB}(Q^2,\nu,t) = F_i^{pole}(Q^2,\nu,t) - F_i^{B}(Q^2,\nu,t) + F_i^{int}(Q^2,\nu,t) + F_i^{as}(Q^2,\nu,t), \qquad (i = 1, 5), \qquad (238)$$
where the integral contributions $F_i^{int}$ ($i = 1, 5$) are given by
$$F_i^{int}(Q^2,\nu,t) = \frac{2}{\pi}\,\mathcal{P}\!\int_{\nu_0}^{\nu_{max}} d\nu'\,\frac{\nu'\,\mathrm{Im}_s F_i(Q^2,\nu',t)}{\nu'^2 - \nu^2}, \qquad (239)$$
and with the contributions of the semi-circle of radius $\nu_{max}$ identified with the asymptotic contributions ($F_1^{as}$, $F_5^{as}$). Evidently, the separation between asymptotic and integral contributions in Eq. (238) is specified by the value of $\nu_{max}$. The total result for $F_i^{NB}$ is formally independent of the specific value of $\nu_{max}$. In practice, however, $\nu_{max}$ is chosen not too large, so that the dispersive integrals of Eq. (239) can be evaluated sufficiently accurately from threshold up to $\nu_{max}$. As we are mainly interested here in a description of VCS up to $\Delta(1232)$-resonance energies, the dispersion integrals are saturated by their $\pi N$ contribution and we choose $\nu_{max} = 1.5$ GeV. In the following, we denote this contribution by $F_i^{\pi N}$. Furthermore, the remainder is estimated by an energy-independent function, which parametrizes the asymptotic contribution $F_i^{as}$ due to $t$-channel poles, and which contains all dispersive contributions beyond the value $\nu_{max} = 1.5$ GeV. We will next discuss the asymptotic contributions $F_5^{as}$ and $F_1^{as}$.
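The truncated principal-value integral of Eq. (239) can be illustrated numerically. The Python code below is a minimal sketch, not the evaluation used in Ref. [141]: it splits $\nu'/(\nu'^2 - \nu^2) = \tfrac12[1/(\nu' - \nu) + 1/(\nu' + \nu)]$, removes the pole by the standard subtraction $\mathcal{P}\!\int_a^b g(x)/(x-c)\,dx = \int_a^b [g(x) - g(c)]/(x-c)\,dx + g(c)\ln[(b-c)/(c-a)]$, and integrates with a composite midpoint rule. The imaginary part `im_f` is a user-supplied placeholder. A constant test function is used because the result is then known in closed form, $(1/\pi)\ln|(\nu_{max}^2 - \nu^2)/(\nu_0^2 - \nu^2)|$; at $\nu = 0$ the same routine gives the structure of Eq. (236) with a finite upper limit.

```python
import math


def _midpoint(f, lo, hi, n):
    """Composite midpoint rule; f is never evaluated at the endpoints."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))


def _pv_cauchy(g, a, b, c, n):
    """Principal value of int_a^b g(x) / (x - c) dx.

    For a < c < b the pole is removed by subtracting g(c); otherwise the
    integrand is regular on [a, b] and a plain quadrature is used.
    """
    if not (a < c < b):
        return _midpoint(lambda x: g(x) / (x - c), a, b, n)
    gc = g(c)

    def smooth(x):
        return (g(x) - gc) / (x - c)

    # Integrate [a, c] and [c, b] separately so no node sits on the pole.
    return (_midpoint(smooth, a, c, n) + _midpoint(smooth, c, b, n)
            + gc * math.log((b - c) / (c - a)))


def f_int(im_f, nu, nu0, nu_max, n=4000):
    """Eq. (239): (2/pi) P int_{nu0}^{nu_max} dnu' nu' Im F(nu') / (nu'^2 - nu^2).

    The result is even in nu, so |nu| is used.
    """
    nu = abs(nu)
    return (_pv_cauchy(im_f, nu0, nu_max, nu, n)
            + _pv_cauchy(im_f, nu0, nu_max, -nu, n)) / math.pi


# Check against the closed form for a constant imaginary part.
nu0, nu_max = 0.15, 1.5  # GeV; nu_max as in the text, nu0 illustrative
for nu in (0.0, 0.1, 0.3, 1.0):
    exact = math.log(abs((nu_max**2 - nu**2) / (nu0**2 - nu**2))) / math.pi
    print(nu, f_int(lambda x: 1.0, nu, nu0, nu_max), exact)
```

Splitting the integration interval at the pole keeps the subtracted integrand free of a $0/0$ evaluation.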
• The asymptotic contribution $F_5^{as}$
The asymptotic contribution to the amplitude $F_5$ predominantly results from $t$-channel $\pi^0$ exchange¹²:
$$F_5^{as}(Q^2,\nu,t) \approx F_5^{\pi^0}(Q^2,t) = -4F_{11}^{\pi^0}(Q^2,t) = -\frac{1}{Me^2}\,\frac{g_{\pi NN}\,F_{\pi^0\gamma^*\gamma}(Q^2)}{t - m_\pi^2}. \qquad (240)$$
For the $Q^2$-dependence of $F_{\pi^0\gamma^*\gamma}(Q^2)$, one can use the interpolation formula proposed in [150]:
$$F_{\pi^0\gamma^*\gamma}(Q^2) = \frac{F_{\pi^0\gamma^*\gamma}(0)}{1 + Q^2/(8\pi^2 f_\pi^2)}, \qquad (241)$$
where $F_{\pi^0\gamma^*\gamma}(0)$ has been given in Eq. (179). Eq. (241) provides a rather good parametrization of the $\pi^0\gamma^*\gamma$ form factor data over the whole $Q^2$ range. When fixing the asymptotic contribution $F_5^{as}$ through its $\pi^0$-pole contribution as in Eq. (240), one can determine one more GP of the nucleon, in addition to the four combinations of Eqs. (231)–(234). In particular, the GP $P^{(M1,M1)1}$, which contains $F_5$, can be calculated through
$$P^{(M1,M1)1}(Q^2) = -\frac{\sqrt{2}}{3}\left(\frac{E+M}{E}\right)^{1/2}\frac{M\tilde q_0^2}{\bar q^2}\left\{\bar F_5(Q^2) + \tilde q_0\,\bar F_{12}(Q^2)\right\}. \qquad (242)$$
• The asymptotic part and dispersive contributions beyond $\pi N$ to $F_1$
We next turn to the high-energy contribution to $F_1$. As in the case of RCS, the asymptotic contribution to the amplitude $F_1$ originates predominantly from the $t$-channel $\pi\pi$ intermediate states. In a phenomenological analysis, this continuum is parametrized through the exchange of a scalar–isoscalar particle in the $t$-channel, i.e., an effective "$\sigma$"-meson, as suggested in [75] and discussed in Section 3.6 for the RCS case. In this spirit, the difference between $F_1^{NB}$ and its $\pi N$ contribution can be parametrized at $Q^2 = 0$ by the energy-independent function
$$F_1^{NB}(0,\nu,t) - F_1^{\pi N}(0,\nu,t) \approx \left[F_1^{NB}(0,0,0) - F_1^{\pi N}(0,0,0)\right]\frac{1}{1 - t/m_\sigma^2}, \qquad (243)$$
where $F_1^{\pi N}$ is evaluated through a dispersive integral as in Eq. (239), and the $\sigma$-meson mass $m_\sigma$ is a free parameter, as in the fixed-$t$ unsubtracted RCS dispersion analysis. A fit to the $t$-dependence of RCS data results in $m_\sigma \approx 0.6$ GeV [75]. The value $F_1^{NB}(0,0,0)$ is then considered as the remaining global fit parameter to be extracted from experiment. It can be expressed physically in terms of the magnetic dipole polarizability $\beta$:
$$F_1^{NB}(0,0,0) = \frac{4\pi}{e^2}\,\beta. \qquad (244)$$
The term $F_1^{\pi N}(0,0,0)$ in Eq. (243) can be calculated through a dispersion integral and results in the value
$$\alpha_{em}\,F_1^{\pi N}(0,0,0) = \beta^{\pi N} = 9.1, \qquad (245)$$
in units of $10^{-4}$ fm$^3$, where $\alpha_{em} = e^2/(4\pi)$. From the $\pi N$ contribution $\beta^{\pi N}$ of Eq. (245) and the phenomenological value $\beta$ of Eq. (168), one obtains the difference
$$\beta - \beta^{\pi N} = -7.5, \qquad (246)$$
which enters in the rhs of Eq. (243). As discussed before, the small total value of the magnetic polarizability $\beta$ comes about by a near cancellation between a large (positive) paramagnetic contribution ($\beta^{\pi N}$) and a large (negative) diamagnetic contribution ($\beta - \beta^{\pi N}$), i.e., the asymptotic part of $F_1$ parametrizes the diamagnetism. Turning next to the $Q^2$ dependence of the asymptotic contribution to $F_1$, it has been proposed in [141] to parametrize this part of the nonBorn term $F_1^{NB}(Q^2,\nu,t)$ beyond its $\pi N$ dispersive contribution by an energy-independent $t$-channel pole of the form
$$F_1^{NB}(Q^2,\nu,t) - F_1^{\pi N}(Q^2,\nu,t) \approx \frac{f(Q^2)}{1 - t/m_\sigma^2}. \qquad (247)$$
¹² As mentioned before, the $\pi^0$-pole only contributes to the amplitudes $F_5$ and $F_{11}$, but drops out in the combination $F_5 + 4F_{11}$, which therefore has a different high-energy behavior.
The function $f(Q^2)$ in Eq. (247) can be obtained by evaluating the lhs of Eq. (247) at the point where the GPs are defined, i.e., $\nu = 0$ and $t = -Q^2$, at finite $Q^2$. This leads to
$$f(Q^2) = \left[\bar F_1(Q^2) - \bar F_1^{\pi N}(Q^2)\right]\left(1 + Q^2/m_\sigma^2\right), \qquad (248)$$
where the shorthand $\bar F_1(Q^2)$ is defined in Eq. (230). Eqs. (247) and (248) then lead to the following expression for the VCS amplitude $F_1^{NB}$:
$$F_1^{NB}(Q^2,\nu,t) \approx F_1^{\pi N}(Q^2,\nu,t) + \left[\bar F_1(Q^2) - \bar F_1^{\pi N}(Q^2)\right]\frac{1 + Q^2/m_\sigma^2}{1 - t/m_\sigma^2}, \qquad (249)$$
where the $\pi N$ contributions $F_1^{\pi N}(Q^2,\nu,t)$ and $\bar F_1^{\pi N}(Q^2)$ are calculated through dispersion integrals as given by Eqs. (239) and (236), respectively. Consequently, the only unknown quantity on the rhs of Eq. (249) is $\bar F_1(Q^2)$, which can be directly used as a fit parameter at finite $Q^2$. The quantity $\bar F_1(Q^2)$ can be expressed in terms of the generalized magnetic polarizability $P^{(M1,M1)0}$ of Eq. (227) as [140]
$$\bar F_1(Q^2) = -\sqrt{\frac{3}{8}}\left(\frac{2E}{E+M}\right)^{1/2} P^{(M1,M1)0}(Q^2) \equiv \frac{4\pi}{e^2}\left(\frac{2E}{E+M}\right)^{1/2}\beta(Q^2), \qquad (250)$$
where $\beta(Q^2)$ is the generalized magnetic polarizability, which reduces to the RCS polarizability $\beta$ at $Q^2 = 0$. The parametrization of Eq. (249) for $F_1$ then permits a direct extraction of $\beta(Q^2)$ from VCS observables at fixed $Q^2$. In the following, we consider a convenient parametrization of the $Q^2$ dependence of $\beta(Q^2)$ in order to provide predictions for VCS observables. For this purpose it was proposed in [141] to use a dipole parametrization for the difference $\beta(Q^2) - \beta^{\pi N}(Q^2)$, which enters in the rhs of Eq. (249) via Eq. (250)¹³:
$$\beta(Q^2) - \beta^{\pi N}(Q^2) = \frac{\beta - \beta^{\pi N}}{(1 + Q^2/\Lambda_\beta^2)^2}, \qquad (251)$$
where the RCS value $(\beta - \beta^{\pi N})$ on the rhs is given by Eq. (246). The mass scale $\Lambda_\beta$ in Eq. (251) determines the $Q^2$ dependence, and hence tells us how the diamagnetism is spatially distributed in the nucleon. Using the dipole parametrization of Eq. (251), one can extract $\Lambda_\beta$ from a fit to VCS data at different $Q^2$ values, and check the parametrization of Eq. (251) for the asymptotic contribution to $\beta(Q^2)$. To obtain an educated guess of the physical value of $\Lambda_\beta$, we next discuss two microscopic calculations of the diamagnetic contribution to the GP $\beta(Q^2)$. The diamagnetism of the nucleon is dominated by the pion cloud surrounding the nucleon. This diamagnetic contribution has been estimated in Ref. [141] through a DR calculation of the $t$-channel $\pi\pi$ intermediate-state contribution to $F_1$. Such a dispersive estimate was discussed before for RCS in Section 3.7, where it was shown that the asymptotic part of $F_1$ (or equivalently $A_1$) can be related to the $\gamma\gamma \to \pi\pi \to N\bar N$ process. The dominant contribution is due to the $\pi\pi$ intermediate state with spin and isospin zero ($I = J = 0$). The generalization to VCS then leads to the identification of $F_1^{as}$ with the following unsubtracted DR in $t$ at fixed energy $\nu = 0$:
$$\bar F_1^{as}(Q^2) = \frac{1}{\pi}\int_{4m_\pi^2}^{\infty} dt'\,\frac{\mathrm{Im}_t F_1(Q^2,0,t')}{t' + Q^2}. \qquad (252)$$
The imaginary part on the rhs of Eq. (252) has been evaluated in Ref. [141] through the subprocesses $\gamma^*\gamma \to \pi\pi$ and $\pi\pi \to N\bar N$. To describe the $Q^2$ dependence of the $\gamma^*\gamma \to \pi\pi$ amplitude, which is dominated by the unitarized Born amplitude (on the pion), the pion electromagnetic form factor was included. The result of this dispersive estimate of $\bar F_1^{as}$ through $t$-channel $\pi\pi$ intermediate states is shown in Fig. 29, and compared with the corresponding evaluation of Ref. [151] in the linear $\sigma$-model (LSM). The LSM calculation overestimates the value of $\bar F_1^{as}(0)$ (or equivalently $\beta^{as}$) by about 30% at any realistic value of $m_\sigma$, which is a free parameter in this calculation.
¹³ The dipole form displays the $1/Q^4$ behavior at large $Q^2$ expected from perturbative QCD.
However, as for the dispersive calculation, it also shows a steep $Q^2$ dependence. Furthermore, in Fig. 29 the two model calculations discussed above are compared with the dipole parametrization of Eq. (251) for the two values $\Lambda_\beta = 0.4$ GeV and $\Lambda_\beta = 0.6$ GeV. These values are seen to be compatible with the microscopic estimates discussed before. In particular, the result for $\Lambda_\beta = 0.4$ GeV is nearly equivalent to the dispersive estimate of $\pi\pi$ exchange in the $t$-channel. The value of the mass scale $\Lambda_\beta$ is small compared to the typical scale $\Lambda_D \approx 0.84$ GeV appearing in the nucleon magnetic (dipole) form factor. This reflects the fact that diamagnetism has its physical origin in the pionic degrees of freedom, i.e., the diamagnetism is situated in the surface and intermediate region of the nucleon.
• Dispersive contributions beyond $\pi N$ to $F_2$
Although unsubtracted DRs can be written down for all invariant amplitudes (or combinations of invariant amplitudes) except $F_1$ and $F_5$, one might wonder about the quality of our approximation of saturating the unsubtracted dispersion integrals by $\pi N$ intermediate states only. This question is particularly relevant for the amplitude $F_2$, for which we next investigate the size of dispersive contributions beyond the $\pi N$ channel. We start with the case of RCS, where one can quantify the higher dispersive corrections to $F_2$, because the value of $F_2^{NB}$ at the real photon point can be expressed exactly (see Eqs. (229) and (231)) in terms of the sum of the scalar polarizabilities, $(\alpha + \beta)$, as
$$F_2^{NB}(0,0,0) = -\frac{4\pi}{e^2}\,\frac{1}{(2M)^2}\,(\alpha + \beta). \qquad (253)$$
[Figure 29: $\bar F_1^{as}(Q^2, 0, -Q^2)$ (GeV$^{-3}$) versus $Q^2$ (GeV$^2$), comparing the DR estimate ($\gamma^*\gamma \to \pi\pi \to N\bar N$) with LSM calculations for $m_\sigma = 0.5$ GeV and $m_\sigma = 0.7$ GeV]
Fig. 29. Theoretical estimates of the asymptotic contribution $\bar F_1^{as}$: DR calculation [141] of the $\gamma^*\gamma \to \pi\pi \to N\bar N$ process (solid curve); linear $\sigma$-model (LSM) calculation [151] with $m_\sigma = 0.5$ GeV (dotted curve) and $m_\sigma = 0.7$ GeV (dash-dotted curve). The dashed curves are dipole parametrizations according to Eq. (251), which are fixed to the phenomenological value at $Q^2 = 0$ and are shown for two values of the mass scale, $\Lambda_\beta = 0.4$ GeV (upper dashed curve, nearly coinciding with the solid curve) and $\Lambda_\beta = 0.6$ GeV (lower dashed curve).
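To make the role of the mass scale in Fig. 29 concrete, the Python sketch below implements the dipole form of Eq. (251) with the RCS value $\beta - \beta^{\pi N} = -7.5$ (in $10^{-4}$ fm$^3$) of Eq. (246), together with the $t$-dependent pole factor of Eq. (249) for $m_\sigma = 0.6$ GeV. It also computes the $Q^2$ at which a dipole with scale $\Lambda$ has fallen to half its value, $Q^2_{1/2} = (\sqrt{2}-1)\Lambda^2$, to compare $\Lambda_\beta = 0.4$ and $0.6$ GeV with $\Lambda_D \approx 0.84$ GeV. The function names are illustrative only.

```python
import math

DELTA_BETA_RCS = -7.5  # beta - beta^{piN} at Q^2 = 0, Eq. (246), units 1e-4 fm^3
M_SIGMA = 0.6          # GeV, effective sigma mass from the RCS fit quoted in the text


def beta_asymptotic(q2, lam_beta, delta_beta=DELTA_BETA_RCS):
    """Dipole ansatz of Eq. (251) for beta(Q^2) - beta^{piN}(Q^2).

    q2 in GeV^2, lam_beta in GeV; result in the units of delta_beta.
    """
    return delta_beta / (1.0 + q2 / lam_beta**2) ** 2


def sigma_pole_factor(q2, t, m_sigma=M_SIGMA):
    """(1 + Q^2/m_sigma^2) / (1 - t/m_sigma^2), the t-dependence used in Eq. (249).

    It equals 1 at t = -Q^2, the point where the GPs are defined.
    """
    return (1.0 + q2 / m_sigma**2) / (1.0 - t / m_sigma**2)


def dipole_half_point(lam):
    """Q^2 (GeV^2) at which 1/(1 + Q^2/lam^2)^2 has dropped to 1/2."""
    return (math.sqrt(2.0) - 1.0) * lam**2


# Lambda_beta = 0.4 and 0.6 GeV (Fig. 29) versus Lambda_D ~ 0.84 GeV.
for lam in (0.4, 0.6, 0.84):
    print(lam, round(dipole_half_point(lam), 3))
print(beta_asymptotic(0.33, 0.4), beta_asymptotic(0.33, 0.6))
print(sigma_pole_factor(0.33, -0.33))
```

A smaller $\Lambda_\beta$ moves the half-value point to lower $Q^2$, which is the quantitative sense in which the diamagnetic term falls off faster than the nucleon dipole form factor.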
The $\pi N$ dispersive contribution provides the value
$$(\alpha + \beta)^{\pi N} = 11.6, \qquad (254)$$
which falls short by about 15% compared to the sum rule value of Eq. (59). The remaining part originates from higher dispersive contributions ($\pi\pi N, \ldots$) to $F_2$. These higher dispersive contributions could be calculated through unitarity, by use of Eq. (237), similarly to the $\pi N$ contribution. However, the present data for the production of those intermediate states (e.g., $\gamma^* N \to \pi\pi N$) are still too scarce to evaluate the imaginary parts of the VCS amplitude $F_2$ directly. Therefore, we estimate the dispersive contributions beyond $\pi N$ by an energy-independent constant, which is fixed to its phenomenological value at $\nu = t = 0$. This yields at $Q^2 = 0$:
$$F_2^{NB}(0,\nu,t) \approx F_2^{\pi N}(0,\nu,t) - \frac{4\pi}{e^2}\,\frac{1}{(2M)^2}\left[(\alpha + \beta) - (\alpha + \beta)^{\pi N}\right], \qquad (255)$$
which is an exact relation at $\nu = t = 0$, the point where the polarizabilities are defined. The approximation of Eq. (255), which replaces the dispersive contributions beyond $\pi N$ by a constant, can only be valid below the thresholds for those higher contributions. Since the next threshold beyond $\pi N$ is $\pi\pi N$, the approximation of Eq. (255) restricts us in practice to energies below the $\Delta(1232)$ resonance. We next consider the extension to VCS, and focus our efforts on describing VCS into the $\Delta(1232)$-resonance region. Analogously to Eq. (255) for RCS, the dispersive contributions beyond $\pi N$ are approximated by an energy-independent constant. This constant is fixed at arbitrary $Q^2$, $\nu = 0$, and $t = -Q^2$, which is the point where the GPs are defined. One thus obtains for $F_2^{NB}$ [141]:
$$F_2^{NB}(Q^2,\nu,t) \approx F_2^{\pi N}(Q^2,\nu,t) + \left[\bar F_2(Q^2) - \bar F_2^{\pi N}(Q^2)\right], \qquad (256)$$
where $\bar F_2(Q^2)$ is defined as in Eq. (230) and can be expressed in terms of GPs through relations as given by Eqs. (231)–(234). We saturate the 3 combinations of spin GPs in Eqs. (232)–(234) by their $\pi N$ contribution, and for the fourth spin GP, Eq. (242), we also include the $\pi^0$-pole contribution. Therefore, we only consider dispersive contributions beyond the $\pi N$ intermediate states for the two scalar GPs, which are thus the two fit quantities in the present DR formalism for VCS. In this way, one can use Eq. (231) to write the difference $\bar F_2(Q^2) - \bar F_2^{\pi N}(Q^2)$ entering in the rhs of Eq. (256) as
$$\bar F_2(Q^2) - \bar F_2^{\pi N}(Q^2) \approx \frac{4\pi}{e^2}\left(\frac{2E}{E+M}\right)^{1/2}\frac{\tilde q_0}{\bar q^2}\,\frac{1}{2M}\left\{\left[\alpha(Q^2) - \alpha^{\pi N}(Q^2)\right] + \left[\beta(Q^2) - \beta^{\pi N}(Q^2)\right]\right\}, \qquad (257)$$
in terms of the generalized magnetic polarizability $\beta(Q^2)$ of Eq. (250) and the generalized electric polarizability $\alpha(Q^2)$, which is related to the GP $P^{(L1,L1)0}(Q^2)$ by Eq. (229). We stress that Eqs. (249) and (257) are intended to extract the two GPs $\alpha(Q^2)$ and $\beta(Q^2)$ from VCS observables while minimizing the model dependence as much as possible. As discussed before for $\beta(Q^2)$, we next consider a convenient parametrization of the $Q^2$ dependence of $\alpha(Q^2)$ in order to provide predictions for VCS observables. For this purpose, a dipole form has been proposed in Ref. [141] for the difference $\alpha(Q^2) - \alpha^{\pi N}(Q^2)$ which enters in the rhs of Eq. (257):
$$\alpha(Q^2) - \alpha^{\pi N}(Q^2) = \frac{\alpha - \alpha^{\pi N}}{(1 + Q^2/\Lambda_\alpha^2)^2}, \qquad (258)$$
where the $Q^2$ dependence is governed by the mass scale $\Lambda_\alpha$, the second free parameter of the DR formalism. In Eq. (258), the RCS value
$$\alpha - \alpha^{\pi N} = 9.6 \qquad (259)$$
is obtained from the phenomenological value of $\alpha$ in Eq. (168) and from the calculated $\pi N$ contribution, $\alpha^{\pi N} = 2.5$. Using the dipole parametrization of Eq. (258), one can extract the free parameter $\Lambda_\alpha$ from a fit to VCS data at different $Q^2$ values.
4.5. VCS data for the proton and extraction of generalized polarizabilities
Having set up the dispersion formalism for VCS, we now show the predictions for the different $ep \to ep\gamma$ observables for energies up to the $\Delta(1232)$-resonance region. The aim of the experiments is to extract the 6 GPs of Eqs. (227) and (228) from both unpolarized and polarized observables. We will compare the DR results, which take account of the full dependence of the $ep \to ep\gamma$ observables on the energy ($q'$) of the emitted photon, with a low-energy expansion (LEX) in $q'$. In the LEX of observables, only the first three terms of a Taylor expansion in $q'$ are taken into account.
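Before turning to the data, the polarizability numbers fixed in Section 4.4.2 can be cross-checked with simple arithmetic. The Python sketch below (units of $10^{-4}$ fm$^3$) reconstructs the totals $\alpha$ and $\beta$ implied by the $\pi N$ values and the RCS differences of Eqs. (245), (246) and (259), checks Eq. (254), and evaluates the dipole of Eq. (258) for the two values $\Lambda_\alpha = 1$ and $1.4$ GeV used in Fig. 31. It only re-derives quoted values and is not an independent determination.

```python
# Values quoted in the text, in units of 1e-4 fm^3.
BETA_PIN = 9.1      # Eq. (245): paramagnetic piN part of beta
DELTA_BETA = -7.5   # Eq. (246): beta - beta^{piN} (diamagnetic)
ALPHA_PIN = 2.5     # piN part of alpha
DELTA_ALPHA = 9.6   # Eq. (259): alpha - alpha^{piN}

beta_total = BETA_PIN + DELTA_BETA      # near cancellation
alpha_total = ALPHA_PIN + DELTA_ALPHA   # both contributions positive
sum_pin = ALPHA_PIN + BETA_PIN          # compare Eq. (254)


def alpha_asymptotic(q2, lam_alpha, delta_alpha=DELTA_ALPHA):
    """Dipole ansatz of Eq. (258): alpha(Q^2) - alpha^{piN}(Q^2); q2 in GeV^2."""
    return delta_alpha / (1.0 + q2 / lam_alpha**2) ** 2


print(round(alpha_total, 2), round(beta_total, 2), round(sum_pin, 2))
# Fraction of beta^{piN} cancelled by the diamagnetic term:
print(round(-DELTA_BETA / BETA_PIN, 3))
for lam in (1.0, 1.4):  # Lambda_alpha values of Fig. 31
    print(lam, round(alpha_asymptotic(0.33, lam), 2))
```

The contrast between the two polarizabilities is visible directly: for $\alpha$ the $\pi N$ and asymptotic parts add, whereas for $\beta$ roughly four fifths of the paramagnetic part is cancelled.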
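In such an expansion in $q'$, the experimentally extracted VCS unpolarized squared amplitude $\mathcal{M}^{exp}$ takes the form [136]
$$\mathcal{M}^{exp} = \frac{\mathcal{M}^{exp}_{-2}}{q'^2} + \frac{\mathcal{M}^{exp}_{-1}}{q'} + \mathcal{M}^{exp}_{0} + O(q'). \qquad (260)$$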
Due to the low-energy theorem (LET), the threshold coefficients $\mathcal{M}^{exp}_{-2}$ and $\mathcal{M}^{exp}_{-1}$ are known [136] and are fully determined by the Bethe–Heitler + Born (BH+B) amplitudes. The information on the GPs is contained in $\mathcal{M}^{exp}_0$, which contains a part originating from the BH+B amplitudes and another part which is a linear combination of the GPs, with coefficients determined by the kinematics. The unpolarized observable $\mathcal{M}^{exp}_0$ can be expressed in terms of three structure functions $P_{LL}(\bar q)$, $P_{TT}(\bar q)$, and $P_{LT}(\bar q)$ by [136]
$$\mathcal{M}^{exp}_0 - \mathcal{M}^{BH+B}_0 = 2K_2\left\{v_1\left[\varepsilon P_{LL}(\bar q) - P_{TT}(\bar q)\right] + \left(v_2 - \frac{\tilde q_0}{\bar q}\,v_3\right)\sqrt{2\varepsilon(1+\varepsilon)}\,P_{LT}(\bar q)\right\}, \qquad (261)$$
where $K_2$ is a kinematical factor, $\varepsilon$ is the virtual photon polarization (in the standard notation used in electron scattering), and $v_1$, $v_2$, $v_3$ are kinematical quantities depending on $\varepsilon$ and $\bar q$ as well as on the $\gamma^* p$ c.m. polar and azimuthal angles ($\theta_{cm}$ and $\varphi$, respectively) of the produced real photon (for details see Ref. [137]). The three unpolarized observables of Eq. (261) can be expressed in terms of the 6 GPs as [136,137]
$$P_{LL} = -2\sqrt{6}\,M G_E\,P^{(L1,L1)0}, \qquad (262)$$
$$P_{TT} = -3G_M\,\frac{\bar q^2}{\tilde q_0}\left(P^{(M1,M1)1} - \sqrt{2}\,\tilde q_0\,P^{(L1,M2)1}\right), \qquad (263)$$
$$P_{LT} = \sqrt{\frac{3}{2}}\,\frac{M\bar q}{Q}\,G_E\,P^{(M1,M1)0} + \frac{3}{2}\,\frac{Q\bar q}{\tilde q_0}\,G_M\,P^{(L1,L1)1}, \qquad (264)$$
where $G_E$ and $G_M$ stand for the electric and magnetic nucleon form factors $G_E(Q^2)$ and $G_M(Q^2)$, respectively. The first VCS experiment was performed at MAMI [131], and the response functions $P_{LT}$ and $P_{LL} - P_{TT}/\varepsilon$ were extracted at $Q^2 = 0.33$ GeV$^2$ by performing a LEX of these VCS data according to Eq. (261). To test the validity of such a LEX, we show in Fig. 30 the DR predictions for the full energy dependence of the nonBorn part of the $ep \to ep\gamma$ cross section in the kinematics of the MAMI experiment [131]. This energy dependence is compared with the LEX, which predicts a linear dependence on $q'$ for the difference between the experimentally measured cross section and its BH+B contribution. The result of a best fit to the data in the framework of the LEX is indicated by the horizontal bands in Fig. 30 for the quantity $(d^5\sigma - d^5\sigma^{BH+Born})/(\Phi q')$, where $\Phi$ is a phase space factor defined in Ref. [136]. The fivefold differential cross section $d^5\sigma$ is differential with respect to the electron lab energy and lab angles and the proton c.m. angles, and stands in all of the following for $d\sigma/(dk'^{lab}_e\,d\Omega^{lab}_e\,d\Omega^{cm}_p)$. It is seen from Fig. 30 that the DR results predict only a modest additional energy dependence up to $q' \approx 0.1$ GeV for most of the photon angles involved, and therefore support the LEX analysis of [131]. Only for forward angles, $\theta_{cm} \approx 0$, which is the angular range from which the value of $P_{LT}$ is extracted, does the DR calculation predict a stronger energy dependence in the range up to $q' \approx 0.1$ GeV, as compared to the LEX.
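The correspondence between the MAMI setting $\bar q = 0.6$ GeV and $Q^2 = 0.33$ GeV$^2$ follows from the $q' \to 0$ kinematics defined after Eq. (234): $E = \sqrt{\bar q^2 + M^2}$, $\tilde q_0 = M - E$ and $\tilde Q^2 = \bar q^2 - \tilde q_0^2$. A minimal Python check, assuming $M = 0.93827$ GeV for the proton mass, is:

```python
import math

M_PROTON = 0.93827  # GeV (assumed input)


def vcs_threshold_kinematics(qbar, m=M_PROTON):
    """Return (E, q0_tilde, Q2_tilde) in the limit q' -> 0, GeV units."""
    e = math.sqrt(qbar**2 + m**2)     # initial proton c.m. energy
    q0_tilde = m - e                  # virtual photon c.m. energy (negative)
    q2_tilde = qbar**2 - q0_tilde**2  # photon virtuality Q~^2
    return e, q0_tilde, q2_tilde


e, q0t, q2 = vcs_threshold_kinematics(0.6)
print(round(q2, 3))  # close to the 0.33 GeV^2 quoted for MAMI

# Small-qbar check of q0_tilde ~ -qbar^2 / (2M):
_, q0t_small, _ = vcs_threshold_kinematics(0.1)
print(q0t_small, -0.1**2 / (2 * M_PROTON))
```

The same helper gives the $\bar q$, $\tilde q_0$ and $Q$ values needed to evaluate Eqs. (262)–(264) at a given kinematic point.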
[Figure 30: panels of $(d^5\sigma - d^5\sigma^{BH+Born})/(\Phi q')$ (GeV$^{-2}$) versus $q'$ (MeV) for several photon c.m. angles]
Fig. 30. $(d^5\sigma - d^5\sigma^{BH+Born})/(\Phi q')$ for the $ep \to ep\gamma$ reaction as a function of the outgoing-photon energy $q'$ in MAMI kinematics: $\varepsilon = 0.62$, $\bar q = 0.6$ GeV, $\varphi = 0^\circ$, and for different photon c.m. angles $\theta_{cm}$. The data and the shaded bands, representing the best fit to the data within the LEX formalism, are from [131]. The solid curves are the DR results taking into account the full $q'$ dependence of the nonBorn contribution to the cross section. The asymptotic contributions are calculated according to Eqs. (251) and (258), with $\Lambda_\beta = 0.6$ GeV and $\Lambda_\alpha = 1$ GeV, respectively.
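Eqs. (261)–(264) are linear in the GPs once the kinematics and form factors are fixed, so the LEX analysis reduces to a linear fit. The Python sketch below simply transcribes these equations together with the combination $P_{LL} - P_{TT}/\varepsilon$ extracted at MAMI. All inputs (the form-factor values $G_E$, $G_M$, the kinematical coefficients $K_2$, $v_1$, $v_2$, $v_3$, and the GP values) are supplied by the caller. No particular form-factor parametrization is implied, and the GP numbers in the usage lines are hypothetical.

```python
import math


def structure_functions(m, qbar, q0_tilde, q, g_e, g_m, gps):
    """P_LL, P_TT, P_LT from the GPs, Eqs. (262)-(264).

    gps: dict with keys 'L1L1_0', 'M1M1_0', 'L1L1_1', 'M1M1_1', 'L1M2_1'.
    q is sqrt(Q^2) in the q' -> 0 limit.
    """
    p_ll = -2.0 * math.sqrt(6.0) * m * g_e * gps["L1L1_0"]
    p_tt = -3.0 * g_m * qbar**2 / q0_tilde * (
        gps["M1M1_1"] - math.sqrt(2.0) * q0_tilde * gps["L1M2_1"])
    p_lt = (math.sqrt(1.5) * m * qbar / q * g_e * gps["M1M1_0"]
            + 1.5 * q * qbar / q0_tilde * g_m * gps["L1L1_1"])
    return p_ll, p_tt, p_lt


def lex_gp_term(k2, v1, v2, v3, eps, qbar, q0_tilde, p_ll, p_tt, p_lt):
    """Rhs of Eq. (261): M_0^exp - M_0^{BH+B} in the low-energy expansion."""
    return 2.0 * k2 * (v1 * (eps * p_ll - p_tt)
                       + (v2 - q0_tilde / qbar * v3)
                       * math.sqrt(2.0 * eps * (1.0 + eps)) * p_lt)


def mami_combination(p_ll, p_tt, eps):
    """The response function P_LL - P_TT/eps extracted in the MAMI analysis."""
    return p_ll - p_tt / eps


# Hypothetical inputs, purely to exercise the functions (GeV units):
hypothetical_gps = {"L1L1_0": -1.0, "M1M1_0": 0.1, "L1L1_1": 0.0,
                    "M1M1_1": 0.0, "L1M2_1": 0.0}
p_ll, p_tt, p_lt = structure_functions(0.938, 0.6, -0.175, 0.574, 0.5, 1.4,
                                       hypothetical_gps)
print(mami_combination(p_ll, p_tt, eps=0.62), p_lt)
```

Because $P_{LL}$ involves only the scalar GP $P^{(L1,L1)0}$ while $P_{TT}$ involves only spin GPs, the combination $P_{LL} - P_{TT}/\varepsilon$ mixes the two sectors in a fixed, $\varepsilon$-dependent way.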
In Fig. 31, we display the response functions $P_{LL} - P_{TT}/\varepsilon$ and $P_{LT}$ at $Q^2 = 0.33$ GeV$^2$, which have been extracted from the cross section data of Fig. 30 [131], and compare them with the corresponding DR calculations. For the electromagnetic form factors in Eqs. (262)–(264), we use the Höhler parametrization [152], as in the analysis of the MAMI experiment [131]. In the lower panels of Fig. 31, the $Q^2$ dependence of the VCS response function $P_{LT}$ is displayed, which reduces to the magnetic polarizability $\beta$ at the real photon point. At finite $Q^2$, it contains both the scalar GP $\beta(Q^2)$ and the spin GP $P^{(L1,L1)1}$, as seen from Eq. (264). It is obvious from Fig. 31 that the structure function $P_{LT}$ results from a large dispersive $\pi N$ (paramagnetic) contribution, which is dominated by $\Delta(1232)$ resonance excitation, and a large asymptotic (diamagnetic) contribution to $\beta$ with opposite sign, leading to a relatively small net result. The asymptotic contribution is shown in Fig. 31 with the parametrization of Eq. (251) for the values $\Lambda_\beta = 0.4$ and $\Lambda_\beta = 0.6$ GeV, which were also displayed in Fig. 29. Due to the large cancellation in $P_{LT}$, its $Q^2$ dependence is a very
[Figure 31: $P_{LL} - P_{TT}/\varepsilon$ (upper panels) and $P_{LT}$ (lower panels) versus $Q^2$ (GeV$^2$)]
Fig. 31. Results for the unpolarized structure functions $P_{LL} - P_{TT}/\varepsilon$ (upper panels) and $P_{LT}$ (lower panels), for $\varepsilon = 0.62$. Upper left panel: dispersive $\pi N$ contribution of the GP $\alpha$ (solid curve, S0), dispersive $\pi N$ contribution of the spin-flip GPs (dashed curve, S1), and the asymptotic contribution (AS) of $\alpha$ according to Eq. (258) with $\Lambda_\alpha = 1$ GeV (dotted curve). Upper right panel: total result for $P_{LL} - P_{TT}/\varepsilon$ (sum of the three contributions in the upper left panel) for $\Lambda_\alpha = 1$ GeV (solid curve) and $\Lambda_\alpha = 1.4$ GeV (dashed curve). Lower left panel: dispersive $\pi N$ contribution of the GP $\beta$ (solid curve, S0), contribution of the spin-flip GPs (dashed curve, S1), and the asymptotic contribution (AS) of $\beta$ according to Eq. (251) with $\Lambda_\beta = 0.6$ GeV (dotted curve). Lower right panel: total result for $P_{LT}$, for $\Lambda_\beta = 0.7$ GeV (dotted curve), $\Lambda_\beta = 0.6$ GeV (solid curve), and $\Lambda_\beta = 0.4$ GeV (dashed curve). The RCS data are from Ref. [87], and the VCS data at $Q^2 = 0.33$ GeV$^2$ from Ref. [131].
sensitive observable for studying the interplay of the two mechanisms. In particular, one expects the asymptotic contribution to fall off faster with Q² than the πN dispersive contribution, as discussed before. This is highlighted by the measured value of P_LT at Q² = 0.33 GeV² [131], which is comparable to the value of P_LT at Q² = 0. As seen from Fig. 31, this points to an interesting structure in the Q² region around 0.05–0.1 GeV², where forthcoming data are expected from an experiment at MIT-Bates [133]. The upper panels of Fig. 31 show the Q² dependence of the VCS response function P_LL − P_TT/ε, which reduces at the real photon point to the electric polarizability α. At nonzero Q², P_LL is directly proportional to the scalar GP α(Q²), as seen from Eq. (262), while the response function P_TT of Eq. (263) contains only spin GPs. As Fig. 31 shows, the πN dispersive contributions to α and to the spin GPs in P_TT are smaller than the asymptotic contribution to α. At Q² = 0, the πN dispersive and asymptotic contributions to α have the same sign and lead to a large value of α, in contrast to β, where both contributions have opposite signs and largely cancel in their sum. Increasing the energy, we show in Fig. 32 the DR predictions for photon energies in the Δ(1232) resonance region. It is seen that the ep → epγ cross section rises strongly when crossing the pion
[Fig. 32: upper panel, d⁵σ (nb/GeV sr²) versus q' (GeV); lower panel, (d⁵σ − d⁵σ_BH+Born)/(Φq') (GeV⁻²) versus q' (GeV)]
Fig. 32. Upper panel: the differential cross section for the reaction ep → epγ as a function of the outgoing-photon energy q' in MAMI kinematics: ε = 0.62, q̄ = 0.6 GeV, and θ_cm = 0°, in plane (φ = 0°). The BH+B contribution is given by the dashed–dotted curve. Lower panel: results for (d⁵σ − d⁵σ_BH+Born)/(Φq') as a function of q'. The total DR results are obtained with the asymptotic parts of Eqs. (251) and (258), using a fixed value of Λ_α = 1 GeV and the same three values of Λ_β as displayed in the lower right plot of Fig. 31, i.e. Λ_β = 0.7 GeV (dotted curve), Λ_β = 0.6 GeV (solid curve), and Λ_β = 0.4 GeV (dashed curve). In the lower panel, the DR calculations taking into account the full energy dependence of the non-Born contribution (thick curves) are compared with the corresponding results within the LEX formalism (thin horizontal curves). The data are from Ref. [131].
threshold. In the dispersion relation formalism, which is based on unitarity and analyticity, the rise of the cross section with q' below pion threshold, due to virtual πN intermediate states, is connected to the strong rise of the cross section with q' when a real πN intermediate state can be produced. It is furthermore seen from Fig. 32 (lower panel) that the region between pion threshold and the Δ-resonance peak displays an enhanced sensitivity to the GPs through the interference with the rising Compton amplitude due to Δ-resonance excitation. For example, at q' ≈ 0.2 GeV, the predictions corresponding to the values of P_LT in the lower right panel of Fig. 31 for Λ_β = 0.4 GeV and Λ_β = 0.6 GeV differ by about 20% in the non-Born squared amplitude. In contrast, the LEX prescription yields a relative effect of about 10% or less for the same two values of P_LT. This is similar to the situation discussed in Section 3.4 for RCS, where the region between pion threshold and the Δ-resonance position also provides an enhanced sensitivity to the polarizabilities and is used to extract them from data using a DR formalism. Therefore, the energy region between pion threshold and the
Fig. 33. Electron single spin asymmetry (SSA) for VCS in MAMI kinematics as a function of the photon scattering angle. The full dispersion results are shown for the values Λ_α = 1 GeV, Λ_β = 0.6 GeV (solid curve); Λ_α = 1 GeV, Λ_β = 0.4 GeV (dashed curve); Λ_α = 1 GeV, Λ_β = 0.7 GeV (dotted curve); and Λ_α = 1.4 GeV, Λ_β = 0.4 GeV (dashed–dotted curve).
Δ resonance seems promising for measuring VCS observables with an increased sensitivity to the GPs. Such an experiment has been proposed at MAMI and is underway [134]. When crossing the pion threshold, the VCS amplitude acquires an imaginary part due to the coupling to the πN channel. Therefore, single polarization observables become nonzero above pion threshold. A particularly relevant observable is the electron single spin asymmetry (SSA), which is obtained by flipping the electron beam helicity [137]. For VCS, this observable is mainly due to the interference of the real BH+B amplitude with the imaginary part of the VCS amplitude. As the SSA vanishes in-plane, its measurement requires an out-of-plane experiment. Such experiments have been proposed both at MAMI [134] and at MIT-Bates [153]. In Fig. 33, the SSA is shown for a kinematics in the Δ(1232) region, corresponding to W ≈ 1.2 GeV. The DR calculation shows, first, that the SSA is quite sizable in the Δ(1232) region. The SSA, which is mainly sensitive to the imaginary part of the VCS amplitude, displays only a rather weak dependence on the magnetic GP β(Q²) and a modest dependence on α(Q²). Therefore, it provides an excellent cross-check of the dispersive input in the DR formalism for VCS, in particular by comparing at the same time the pion and photon electroproduction channels through the Δ region. Going to higher Q², the VCS process has also been measured at JLab, and data have been obtained below pion threshold at Q² = 1 GeV² [154] and at Q² = 1.9 GeV² [155], as well as in the resonance region around Q² = 1 GeV² [156] (see Ref. [132] for a short review of these JLab data). In Fig. 34, we show the results for the ep → epγ reaction in the resonance region at Q² = 1 GeV² and at a backward angle. These are the first VCS measurements ever performed in the resonance region. We also display the DR calculations of [141] for the cross section.
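Returning to the SSA of Fig. 33: a minimal sketch of how such a beam-helicity asymmetry could be formed from data, assuming the usual definition A = (σ⁺ − σ⁻)/(σ⁺ + σ⁻), with σ^± the cross sections for electron helicity ±. It further assumes event counts already normalized to equal luminosity in the two helicity states and a known beam polarization P_b. The function name, the purely statistical (binomial) error, and the sample counts are illustrative assumptions, not the analysis procedure of Refs. [134,153].

```python
import math


def beam_helicity_asymmetry(n_plus, n_minus, beam_pol=1.0):
    """Return (asymmetry, statistical error) from helicity-sorted event counts.

    Hypothetical helper: n_plus / n_minus are counts for electron helicity
    +1 / -1, assumed normalized to equal luminosity; beam_pol is the
    (assumed known) longitudinal beam polarization, 0 < beam_pol <= 1.
    """
    total = n_plus + n_minus
    if total <= 0:
        raise ValueError("need at least one event")
    if not 0.0 < beam_pol <= 1.0:
        raise ValueError("beam polarization must lie in (0, 1]")
    raw = (n_plus - n_minus) / total
    # Counting (binomial) error of an asymmetry built from N = N+ + N- events.
    raw_err = math.sqrt((1.0 - raw**2) / total)
    # Correct for incomplete beam polarization: A_phys = A_raw / P_b.
    return raw / beam_pol, raw_err / beam_pol


# Made-up counts for illustration, not data from the experiments cited above.
asym, err = beam_helicity_asymmetry(4600, 5400, beam_pol=0.8)
print(f"SSA = {asym:.3f} +/- {err:.3f}")
```

Keeping P_b explicit matters because dividing by it rescales both the asymmetry and its error. A small beam polarization therefore directly inflates the uncertainty on an SSA of the size shown in Fig. 33.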
The data clearly show the excitation of the Δ(1232) resonance and display a second and a third resonance region, mainly due to the excitation of the D13(1520) and F15(1680) resonances. The DR calculations reproduce well the
Fig. 34. The differential cross sections for the ep → epγ reaction as a function of the c.m. energy W in JLab kinematics: E_e = 4.032 GeV, Q² = 1.0 GeV², and fixed scattering angle θ_cm = −167.2°, for different out-of-plane angles φ. The BH+B contribution is given by the dashed curve. The total DR result is shown by the solid curve (limited to W < 1.25 GeV) for the values Λ_α = 1.0 GeV and Λ_β = 0.45 GeV. The data are from Ref. [132].
Δ(1232) region. Due to scarce information on the dispersive input above the Δ(1232) resonance, the DR calculations cannot at present be extended into the second and third resonance regions. Between pion threshold and the Δ(1232) resonance, the calculations show a sizable sensitivity to the GPs, in particular to P_LL in this backward-angle kinematics, and seem very promising for extracting information on the electric polarizability. The precise extraction of GPs from VCS data at these higher values of Q² requires an accurate knowledge of the nucleon electromagnetic form factors (FFs) in this region. For the proton electromagnetic form factors, we use the new empirical fit of [157], which includes the recent high-accuracy measurements performed at JLab of the ratio of the proton electric
Fig. 35. The differential cross section for the reaction ep → epγ as a function of the photon scattering angle, at different values of the outgoing-photon energy in JLab kinematics: Q² = 1 GeV² and ε = 0.95 (left panels), and Q² = 1.9 GeV² and ε = 0.88 (right panels). The BH+B cross sections are shown by the dashed–dotted curves. The DR results are displayed with the asymptotic terms parametrized as in Eqs. (258) and (251), using the values Λ_α = 1 GeV and Λ_β = 0.6 GeV (solid curves), Λ_α = 1 GeV and Λ_β = 0.4 GeV (dashed curves), and Λ_α = 1.4 GeV and Λ_β = 0.6 GeV (dotted curves).
FF G_E to the magnetic FF G_M in the Q² range 0.4–5.6 GeV² [70,71]. From Fig. 34, one sees that a good description of the JLab data is obtained with the values Λ_α = 1.0 GeV and Λ_β = 0.45 GeV. Besides the measurement in the resonance region, the ep → epγ reaction has also been measured at JLab below pion threshold, for three values of the outgoing photon energy, at Q² = 1 GeV² [154] and at Q² = 1.9 GeV² [155]. For those kinematics, we show in Fig. 35 the differential cross sections as well as the non-Born effect relative to the BH+B cross section, for different values of the GPs. It is seen from Fig. 35 that the sensitivity to the GPs is largest where the BH+B cross section becomes small, in particular in the angular region between 0° and 50°. From the JLab data below pion threshold, the two unpolarized structure functions P_LL − P_TT/ε and P_LT have been extracted at Q² = 1 GeV² and at Q² = 1.9 GeV² [132]. For this extraction below pion threshold, both the LEX and the DR formalisms can be used, and Ref. [132] found nice agreement between the structure functions obtained with the two methods. The preliminary results
[Fig. 36: two panels; horizontal axes Q² (GeV²)]
Fig. 36. Results for the unpolarized VCS structure functions P_LL (left panel) and P_LT (right panel) divided by the proton electric form factor. Dashed lines: dispersive πN contributions. Dotted lines: asymptotic contributions calculated according to Eqs. (251) and (258) with Λ_α = 0.92 GeV (left panel) and Λ_β = 0.66 GeV (right panel). Solid curves: total results, sum of the dispersive and asymptotic contributions. The RCS data are from Ref. [87], the VCS MAMI data at Q² = 0.33 GeV² are from Ref. [132], and the preliminary VCS JLab data at Q² = 1 GeV² and Q² = 1.9 GeV² from Ref. [132] (inner error bars are statistical errors only; outer error bars include systematic errors). The values for P_LL at Q² > 0 were extracted by using a dispersive estimate for the not yet separated P_TT contribution.
at Q² = 1 GeV² and at Q² = 1.9 GeV² for P_LL (see footnote 14) and P_LT are displayed in Fig. 36, alongside the RCS point and the results at Q² = 0.33 GeV². By dividing out the form factor G_E, one sees from Eq. (262) that P_LL is proportional to the electric GP α(Q²), whereas P_LT is proportional to the magnetic GP β(Q²) plus a correction due to the spin-flip GP P^(L1,L1)1, which turns out to be small in the DR formalism, as discussed further on. One sees from Fig. 36 that the best fit value Λ_α ≈ 0.92 GeV yields an electric polarizability that is dominated by the asymptotic contribution and has a Q² behavior similar to that of the dipole form factor. However, the best fit value Λ_β ≈ 0.66 GeV is substantially lower, indicating that the diamagnetism, which is related to pionic degrees of freedom, drops faster with Q². The data nicely confirm the interplay between para- and diamagnetism in β as a function of Q². Until now, we have discussed only unpolarized VCS observables. An unpolarized VCS experiment gives access to only three combinations of the six GPs, as given by Eqs. (262)–(264). It was shown in Ref. [158] that VCS double polarization observables, with a polarized lepton beam and a polarized target (or recoil) nucleon, allow one to measure three more combinations of GPs. Therefore, a measurement of unpolarized VCS observables (at different values of ε) and of three double polarization observables makes it possible to disentangle all six GPs. The VCS double polarization observables, which are denoted by ΔM(h, i) for an electron of helicity h, are defined as the difference of the squared amplitudes for recoil (or target) proton spin orientation along and opposite to the axis i (i = x, y, z), where the z-direction is chosen along the virtual photon momentum (see Ref. [158]
Footnote 14: The present experiments, which are performed at a fixed value of ε, only measure the combination P_LL − P_TT/ε. To extract P_LL from these data, we calculate the relatively small (spin-flip) contribution P_TT (shown by the curves labeled S1 in the upper left panel of Fig. 31) in the DR formalism and subtract it from the measured value.
[Fig. 37: two panels for q̄ = 600 MeV, q' = 111.5 MeV, ε = 0.62, φ = 0°; horizontal axes: photon scattering angle (deg)]
Fig. 37. VCS double polarization asymmetry (polarized electron, recoil proton polarization along either the z- or the x-direction in the c.m. frame) in MAMI kinematics as a function of the photon scattering angle. The dotted curves correspond to the BH+B contribution. The solid curves show the total DR results for Λ_α = 1 GeV and Λ_β = 0.6 GeV. The dashed curves are the HBChPT predictions from [159].
for details). In a LEX, this polarized squared amplitude yields
$$\Delta M^{\rm exp} = \frac{\Delta M^{\rm exp}_{-2}}{q'^2} + \frac{\Delta M^{\rm exp}_{-1}}{q'} + \Delta M^{\rm exp}_{0} + O(q') . \qquad (265)$$
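Assuming the two threshold coefficients in Eq. (265) are known (they are fixed by the LET, as in the unpolarized case), the first structure-dependent term ΔM^exp_0 can in principle be isolated by subtracting the two threshold terms from measurements at several small q' and extrapolating to q' → 0. The sketch below does this with an unweighted straight-line fit in q' that absorbs the O(q') remainder. The function name, the linear-extrapolation choice, and the synthetic numbers are illustrative assumptions, not the fit procedure of Ref. [158] or of the experimental analyses.

```python
def extract_delta_m0(q_primes, dm_exp, dm_m2, dm_m1):
    """Estimate the structure-dependent coefficient Delta M_0^exp of Eq. (265).

    Hypothetical helper. q_primes: outgoing-photon energies q' (GeV);
    dm_exp: polarized squared amplitudes at those q';
    dm_m2, dm_m1: threshold coefficients, assumed known from the LET.
    The residual dm_exp - dm_m2/q'^2 - dm_m1/q' is fitted by a + b*q'
    (ordinary least squares) and the intercept a is returned.
    """
    if len(q_primes) != len(dm_exp) or len(q_primes) < 2:
        raise ValueError("need at least two (q', Delta M) pairs of equal length")
    resid = [d - dm_m2 / q**2 - dm_m1 / q for q, d in zip(q_primes, dm_exp)]
    n = len(q_primes)
    mean_q = sum(q_primes) / n
    mean_r = sum(resid) / n
    sxx = sum((q - mean_q) ** 2 for q in q_primes)
    if sxx == 0.0:
        raise ValueError("need at least two distinct q' values")
    sxy = sum((q - mean_q) * (r - mean_r) for q, r in zip(q_primes, resid))
    slope = sxy / sxx
    return mean_r - slope * mean_q  # intercept at q' = 0


# Synthetic check: build "data" from chosen coefficients and recover Delta M_0.
c_m2, c_m1, c_0, c_1 = 0.02, -0.1, 0.5, 1.3  # made-up values
qs = [0.045, 0.067, 0.090, 0.112]            # q' in GeV (illustrative)
data = [c_m2 / q**2 + c_m1 / q + c_0 + c_1 * q for q in qs]
print(extract_delta_m0(qs, data, c_m2, c_m1))
```

With real data, higher-order terms in q' and experimental errors would enter, and a weighted fit would be the natural extension of this sketch.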
Analogous to the unpolarized squared amplitude (260), the threshold coefficients ΔM^exp_−2 and ΔM^exp_−1 are known due to the LET. It was found in Ref. [158] that the polarized squared amplitude ΔM^exp_0 can be expressed in terms of three new structure functions P^z_LT(q̄), P'^z_LT(q̄), and P'^⊥_LT(q̄). These new structure functions are related to the spin GPs according to Refs. [158,137]:
$$P^{z}_{LT} = \frac{3 Q \bar q}{2 \tilde q_0}\, G_M\, P^{(L1,L1)1} - \frac{3 M \bar q}{Q}\, G_E\, P^{(M1,M1)1} , \qquad (266)$$
$$P'^{z}_{LT} = -\frac{3}{2}\, Q\, G_M\, P^{(L1,L1)1} + \frac{3 M \bar q^2}{Q\, \tilde q_0}\, G_E\, P^{(M1,M1)1} , \qquad (267)$$
$$P'^{\perp}_{LT} = \frac{3 \bar q Q}{2 \tilde q_0}\, G_M \left( P^{(L1,L1)1} - \frac{3}{2}\, \tilde q_0\, P^{(M1,L2)1} \right) . \qquad (268)$$
While P^z_LT and P'^z_LT can be accessed in in-plane kinematics (φ = 0°), the measurement of P'^⊥_LT requires an out-of-plane experiment. In Fig. 37, we show the dispersion results for the double polarization observables with a polarized electron beam, measuring the recoil proton polarization either along the virtual photon direction (z-direction) or parallel to the reaction plane and perpendicular to the virtual photon (x-direction).
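Eqs. (266)–(268) depend on the kinematic variable q̃0. This section does not restate its definition, so the sketch below assumes the standard VCS convention q̃0 = M − √(M² + q̄²) for the virtual-photon c.m. energy in the limit q' → 0. The corresponding virtuality is then Q² = q̄² − q̃0². The sketch evaluates these for the MAMI value q̄ = 0.6 GeV, which should reproduce the Q² ≈ 0.33 GeV² quoted above, and inverts the relation numerically by bisection. The function names and the proton mass value are our own choices.

```python
import math

M_PROTON = 0.938272  # proton mass in GeV


def threshold_kinematics(qbar, mass=M_PROTON):
    """Return (q0_tilde, Q2) for c.m. virtual-photon momentum qbar (GeV) at q' -> 0.

    Assumes q0_tilde = M - sqrt(M^2 + qbar^2) and Q2 = qbar^2 - q0_tilde^2.
    """
    q0_tilde = mass - math.sqrt(mass**2 + qbar**2)
    return q0_tilde, qbar**2 - q0_tilde**2


def qbar_for_q2(q2_target, mass=M_PROTON, tol=1e-12):
    """Invert threshold_kinematics by bisection (Q2 grows monotonically with qbar)."""
    lo, hi = 0.0, 1.0
    while threshold_kinematics(hi, mass)[1] < q2_target:
        hi *= 2.0  # expand the bracket until it contains the target
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if threshold_kinematics(mid, mass)[1] < q2_target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


q0, q2 = threshold_kinematics(0.6)  # MAMI: qbar = 0.6 GeV
print(f"q0_tilde = {q0:.4f} GeV, Q2 = {q2:.3f} GeV^2")
print(f"qbar at Q2 = 1 GeV^2: {qbar_for_q2(1.0):.3f} GeV")
```

Note that q̃0 is negative in this convention. Its sign therefore enters the relative sign of the G_M and G_E terms in Eqs. (266) and (267).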
[Fig. 38: four panels; horizontal axes Q² (GeV²)]
Fig. 38. Comparison between the VCS unpolarized structure functions calculated within the DR formalism [141] and in O(p³) HBChPT [159,160]. Upper panels: results for the P_LL − P_TT/ε structure function for ε = 0.62 predicted by DR (left) and O(p³) HBChPT (right). For the DR predictions, the contribution from the electric GP α(Q²) for Λ_α = 1.4 GeV (dashed curve) is compared with the sum of the contributions from the scalar and spin-flip GPs (solid curve). Lower panels: results for P_LT within DR (left) and O(p³) HBChPT (right). In the left panel, the contribution from β(Q²) is shown for the values Λ_β = 0.6 GeV (dashed curve) and Λ_β = 0.4 GeV (dotted curve). The total results, sum of the contributions from scalar and spin-flip GPs, are shown for Λ_β = 0.6 GeV (solid curve) and for Λ_β = 0.4 GeV (dashed–dotted curve). In the right panel, the predictions from HBChPT are shown for the contribution from β(Q²) alone (dashed curve) and for the total result (solid curve), which includes the GP P^(L1,L1)1. The RCS data are from Ref. [87], and the VCS data at Q² = 0.33 GeV² from Ref. [131].
The double polarization asymmetries are quite large (due to a nonvanishing asymmetry for the BH+Born mechanism), but the DR calculations show only small relative effects due to the spin GPs below pion threshold. However, a heavy-baryon chiral perturbation theory (HBChPT) calculation to O(p³) [159] shows a significantly larger effect, owing to the larger values of the spin GPs in this calculation, as will be discussed in the next section. Although these double polarization observables are tough to measure, a first test experiment is already planned at MAMI [134].

4.6. Physics content of the nucleon generalized polarizabilities

Having discussed the present status of the VCS experiments and the combinations of GPs that have been extracted from them so far, we now turn in more detail to the physics content of these GPs and compare different model predictions. We start our discussion with the VCS unpolarized structure functions as shown in Fig. 38, and compare the DR results of Ref. [141] with the O(p³) HBChPT calculations [159,160]. The DR results
[Fig. 39: two panels; horizontal axes Q² (GeV²)]
Fig. 39. Left panel: comparison between the results for the electric GP α(Q²) predicted by the DR formalism for Λ_α = 1.4 GeV (solid curve) and by O(p³) HBChPT (dashed curve). Right panel: comparison between the results for the magnetic GP β(Q²) predicted by the DR formalism for Λ_β = 0.6 GeV (full line) and Λ_β = 0.4 GeV (dashed–dotted line), and by O(p³) HBChPT (dashed line).
have been shown before in Fig. 31, where we discussed the different mass scales parametrizing the asymptotic parts of the GPs α and β. In Fig. 38, we show in addition the effect of the spin GPs on these response functions and compare them with the corresponding calculation in HBChPT. One notices that the effect of the spin GPs is much smaller in the DR calculation than in O(p³) HBChPT, in particular for the spin GPs entering P_TT. The good agreement with the data found in the O(p³) HBChPT calculation is to an important part due to the larger size of the spin GPs in this calculation. The comparison between the spin-independent GPs in both calculations is shown in Fig. 39. This figure shows a qualitative agreement between the DR and the O(p³) HBChPT results. In particular, in both calculations the electric and magnetic GPs have quite different Q² dependences. The electric GP shows a rather smooth Q² behavior, much like the nucleon electric form factor, whereas the magnetic GP has a characteristic structure at small Q². In the DR calculation, this results from a cancellation between a large paramagnetic Δ contribution and a diamagnetic contribution (due to t-channel exchange), which have a different Q² behavior, as was already noticed in the early effective Lagrangian calculation of Ref. [161]. In O(p³) HBChPT, this structure in β(Q²) at low Q² results from πN loop effects. By Fourier transforming the GPs α(Q²) (see footnote 15) and β(Q²) in the Breit frame, it was argued in [162] that one obtains a spatial distribution of the induced electric polarization α(r) and magnetization β(r) of the nucleon. The picture that then emerges from the πN loop contribution in the HBChPT calculation is as expected from a classical interpretation of diamagnetism. Due to a change in the external magnetic field, pionic currents start circulating around the nucleon and give rise to an induced magnetization opposite to the applied field. This diamagnetic effect leads at distances r > 1/m_π to a negative value of β(r), whereas for distances r ≤ 1/m_π the paramagnetism dominates and β(r) is positive. Therefore, as the momentum transfer Q² increases, the negative long-distance contribution of the pion cloud to the magnetic GP no longer contributes, and hence β(Q²) increases. This nicely explains the positive slope of β(Q²) at Q² = 0 and the characteristic turn-over at low Q² in the HBChPT calculation, as is seen in
Footnote 15: In the notation of Ref. [162], α(Q²) is denoted as the so-called longitudinal electric GP α_L, to distinguish it from a GP of higher order in the outgoing photon energy, the so-called transverse electric GP α_T.
[Fig. 40: four panels; horizontal axes Q² (GeV²)]
Fig. 40. Q² dependence of the spin-flip GPs as calculated in Refs. [141,143]. The dashed curves correspond to the dispersive πN contribution, the dotted curves show the π⁰-pole contribution, and the solid curves are the sum of the dispersive and π⁰-pole contributions. For comparison, we also show the π⁰-pole contribution when setting the π⁰γ*γ form factor equal to 1 (dashed–dotted curves). Note that P^(L1,L1)1 has no π⁰-pole contribution.
Fig. 39. Hence it will be interesting to see the results of a measurement of P_LT around Q² ≈ 0.05–0.1 GeV², performed at MIT-Bates [133], to reveal the nature of the diamagnetism in the nucleon. We next discuss the spin-flip GPs. In Fig. 40, we show the dispersive and π⁰-pole contributions to the four spin GPs as well as their sum, according to the calculations of [141,143]. For the presentation in Fig. 40, we multiply the GPs P^(L1,M2)1 and P^(M1,L2)1 by Q, in order to better compare the Q² dependence when including the π⁰-pole contribution, which itself drops very fast with Q². The π⁰ pole does not contribute to the GP P^(L1,L1)1, but is seen to dominate the other three spin GPs. It is, however, possible to find, besides the GP P^(L1,L1)1, the two combinations of the remaining three spin GPs given by Eqs. (233) and (234), for which the π⁰-pole contribution drops out [143]. In Fig. 41 we show the results of the dispersive contribution to the four spin GPs and compare them with the results of the nonrelativistic constituent quark model [163], HBChPT to O(p³) [159,160], the recent HBChPT calculation to O(p⁴) [164], and the linear σ-model [151]. The constituent quark model (CQM) calculation gives negligibly small contributions to the GPs P^(L1,L1)1 and P^(M1,L2)1, whereas the GPs P^(M1,M1)1 and P^(L1,M2)1 receive their dominant contribution from the excitation of the Δ(1232) (M1 → M1 transition) and the N*(1520) (E1 → M2 or L1 → M2 transitions), respectively. The smallness of P^(L1,L1)1 and P^(M1,L2)1 in the CQM can be understood by noting that these two GPs can be expressed, through a crossing symmetry relation [140], in terms of a GP which involves a transition from L0 (Coulomb monopole) to M1:
$$\frac{\bar q^2}{\tilde q_0}\, P^{(L1,L1)1} = P^{(M1,L0)1} + \sqrt{\frac{3}{2}}\; \bar q^2\, P^{(M1,L2)1} . \qquad (269)$$
[Fig. 41: four panels; horizontal axes Q² (GeV²)]
Fig. 41. Results for the spin-flip GPs, excluding the π⁰-pole contribution, in different model calculations. The solid curves correspond to the dispersive πN contribution [141,143]. The thin dashed curves show the results of O(p³) HBChPT [159], whereas the thick dashed curves for P^(L1,L1)1, P^(M1,M1)1, and P^(L1,M2)1 are the O(p⁴) HBChPT results [164]. The dashed–dotted curves correspond to the predictions of the linear σ-model [151], and the dotted curves are the results of the nonrelativistic constituent quark model [163]. Note that the constituent quark model (CQM) results for P^(L1,L1)1 and P^(M1,L2)1 are multiplied by a factor of 100 for visibility.
The GPs on the rhs of Eq. (269) encode the response of the nucleon charge density (L0) or electric quadrupole density (L2) to a static magnetic dipole field (M1). In a nonrelativistic CQM calculation [136,163], the only response to an applied static magnetic field is the alignment of the quark spins, whereas the charge density and the electric quadrupole density remain unchanged. Therefore, both GPs P^(M1,L0)1 and P^(M1,L2)1 are vanishingly small in the quark model, as is P^(L1,L1)1 through Eq. (269). Consequently, P^(M1,L0)1 and P^(M1,L2)1 are promising observables to study the effects of the pion cloud surrounding the nucleon. A large contribution of pionic effects to these GPs is indeed observed in the HBChPT and linear σ-model calculations. One furthermore notices from Fig. 41 that O(p³) HBChPT predicts a rather strong increase with Q² for the GPs P^(L1,L1)1 and P^(M1,M1)1. For P^(L1,L1)1, this result is confirmed by the O(p⁴) calculation [164]. For the GP P^(M1,M1)1, it was found in Ref. [164] that the O(p⁴) calculation gives a large reduction compared with the O(p³) result, which calls into question the convergence of the HBChPT expansion for this observable. The linear σ-model, which takes into account part of the higher-order terms of a consistent chiral expansion, in general yields smaller values for the GPs P^(L1,L1)1 and P^(M1,M1)1 than the corresponding calculations to leading order in HBChPT. For the GP P^(L1,M2)1, its value at Q² = 0, which is related to the spin polarizability γ₃ through Eq. (229), was reported in Section 3.1.1. In particular, the O(p⁴) HBChPT result yields a relatively large correction, bringing
[Fig. 42: P_TT versus Q² (GeV²)]
Fig. 42. Results for the VCS structure function P_TT. Dotted curve: O(p³) HBChPT [160]; dashed curve: O(p⁴) HBChPT [164]; solid curve: dispersive evaluation [143,141].
it in better agreement with the DR result. From Fig. 41 one notices that the Q² dependence of the O(p⁴) HBChPT calculation for the GP P^(L1,M2)1 is rather weak [164], resulting in a nearly constant reduction of this observable compared with the O(p³) calculation. The comparison in Fig. 41 clearly indicates that a satisfactory theoretical description of the spin-dependent GPs over a larger range in Q² is still a challenging task. This calls for VCS experiments that are sensitive to the spin-dependent GPs. Two types of experiments can be envisaged in this regard. First, one notices from Eq. (261) that an unpolarized VCS experiment at different values of ε (obtained by varying the beam energy) allows one to disentangle the response functions P_LL and P_TT. The latter contains the combination of the spin GPs P^(M1,M1)1 and P^(L1,M2)1 given by Eq. (263). In Fig. 42, we show the response function P_TT and compare the DR predictions [143,141] with the O(p³) HBChPT result [160] and the O(p⁴) HBChPT result [164]. One notices large O(p⁴) corrections to the HBChPT result. Therefore, the main difference between the DR result and the O(p³) HBChPT result for the measured response function P_LL − P_TT/ε, as shown in the upper panels of Fig. 38, is largely reduced by the O(p⁴) HBChPT calculation. It will be very worthwhile to measure the response function P_TT directly, which will provide an interesting check of our understanding of the spin densities of the nucleon and allow one to extract the electric polarizability α(Q²) unambiguously from the measurement of P_LL − P_TT/ε. Second, to access the other spin GPs, which do not appear in P_TT, one has to resort to double polarization observables, as discussed before. It was shown in Fig. 37 that such observables are particularly sensitive to the different predictions for the spin GPs, and they are very promising to measure in the near future [134], so as to sharpen our understanding of the spin-dependent response of the nucleon to an applied electromagnetic field.

5. Conclusions and perspectives

In this review, we have applied dispersion relations to real and virtual Compton scattering processes off a nucleon as a powerful tool to connect different observables and to extract nucleon structure quantities.
For forward real Compton scattering, sum rules directly connect low-energy quantities to the polarized or unpolarized total absorption cross sections. We discussed in some detail the recent evaluations of the Baldin sum rule and the status of the GDH sum rule. The latter involves an integral over the helicity difference cross section σ_1/2 − σ_3/2, for photon and proton helicities having the same or opposite signs. This helicity difference cross section for the proton has now been measured at MAMI and ELSA through the resonance region, up to W ≈ 2.5 GeV. It displays a region around pion threshold which is dominated by S-wave pion production, for which σ_1/2 dominates. Furthermore, these data clearly exhibit three resonance regions with dominance of σ_3/2. By performing the GDH integral up to W ≈ 2 GeV, one overestimates the sum rule value for the proton by about 15%, indicating that the anomalous magnetic moment is mostly related to the low-lying degrees of freedom. A measurement of σ_1/2 − σ_3/2 at energies up to W ≈ 9 GeV will be performed in the near future at SLAC, in order to find out whether the present "oversaturation" of the sum rule will be removed by high-energy contributions. Such an experiment will be quite important, because it will test both our understanding of soft Regge physics in the spin-dependent forward Compton amplitude and the validity of high-energy extrapolations of DIS data at large Q² to the real photon point. For the neutron, the convergence of the GDH sum rule is less clear at the moment because of a lack of data. Theoretical estimates based on our present knowledge of pion photoproduction multipoles yield only about 85% of the sum rule value. This may point to systematic deficiencies in these multipoles, which have mostly been obtained from experiments on a deuteron target, or to large contributions from higher intermediate states, such as two-pion states.
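For orientation, the right-hand side of the GDH sum rule has magnitude 2π²α_em κ_N²/M_N², where α_em is the fine-structure constant, κ_N the anomalous magnetic moment and M_N the nucleon mass. Its overall sign depends on the convention used for the helicity difference. The sketch below evaluates it in a few lines, using standard constants entered by hand and the conversion 1 GeV⁻² = 0.389379 mb. It merely turns the percentages quoted in the text (about 15% oversaturation for the proton, about 85% saturation for the neutron) into approximate absolute numbers. The function name is our own.

```python
import math

ALPHA_EM = 1 / 137.035999      # fine-structure constant
GEV_M2_TO_MICROBARN = 389.379  # 1 GeV^-2 = 0.389379 mb = 389.379 microbarn


def gdh_rhs_microbarn(kappa, mass_gev):
    """Magnitude of the GDH sum rule, 2*pi^2*alpha_em*kappa^2/M^2, in microbarn."""
    return 2 * math.pi**2 * ALPHA_EM * kappa**2 / mass_gev**2 * GEV_M2_TO_MICROBARN


proton = gdh_rhs_microbarn(kappa=1.792847, mass_gev=0.938272)
neutron = gdh_rhs_microbarn(kappa=-1.913043, mass_gev=0.939565)
print(f"proton : {proton:.1f} ub; +15% oversaturation -> ~{1.15 * proton:.0f} ub")
print(f"neutron: {neutron:.1f} ub; 85% of it -> ~{0.85 * neutron:.0f} ub")
```

Because κ_N enters squared, the sign of the neutron's anomalous magnetic moment does not affect the sum rule value.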
It is therefore of great interest to see the outcome of dedicated experiments on the neutron, which are planned in the near future at MAMI and GRAAL. Extending the sum rules to forward scattering of spacelike virtual photons, we have shown how to relate nucleon structure quantities to the inclusive electroproduction cross sections. The unpolarized cross section (weighted with 1/ν²) leads to a generalization of Baldin's sum rule, whereas the polarized cross sections (weighted with 1/ν³) lead to two nucleon spin polarizabilities. We estimated these quantities at low and intermediate Q² by a phenomenological model (MAID), and at large Q² by the corresponding moments of DIS structure functions. As a result, we find that a transition occurs around Q² ≈ 1–2 GeV² from a resonance-dominated description at lower Q² to a partonic description at larger Q². Furthermore, we also studied the generalized GDH integrals, using very recent experimental results at intermediate Q² values measured at SLAC, HERMES, JLab/CLAS, and JLab/HallA. In particular, the JLab/CLAS data for the helicity difference cross section σ_1/2 − σ_3/2 of the proton in the range Q² ≈ 0.15–1.2 GeV² clearly demonstrate a sign change from a large negative value at low Q², where σ_3/2 dominates due to resonance excitation, to the positive DIS value at larger Q², where σ_1/2 survives. We have shown that this transition can also be nicely understood in a quantitative way. For the proton-neutron difference, where isospin 3/2 resonances such as the Δ drop out, the validity of chiral perturbation theory (ChPT) extends towards somewhat larger Q², and there is hope to bridge the gap between ChPT and perturbative QCD, which eventually leads to the well established Bjorken sum rule at large Q². In Section 3, we extended the dispersion formalism for forward scattering to real Compton scattering (RCS) on the nucleon for all angles.
At low photon energies, this process has a well-known low-energy limit, the Thomson term, which is determined by the total mass and electric charge of the system. Moving to larger photon energies, one can identify the higher order terms in a low energy
expansion (LEX) with the response of the nucleon to an external electromagnetic field, parametrized by dipole and higher-order nucleon polarizabilities. However, such a LEX is only valid up to about 80 MeV photon energy, and a direct experiment in this energy range would have to be extremely precise to disentangle the small effects of the nucleon polarizabilities. In practice, one also has to include experiments at higher energies, up to and above pion threshold, and to rely on dispersion relations to analyze the experiments. We have reviewed and compared several such dispersion relation formalisms for RCS. In the literature, most of the recent experiments have been analyzed using unsubtracted fixed-t dispersion relations for the six RCS amplitudes. In such an approach, one has to estimate the asymptotic contributions for two of the six RCS amplitudes, for which the unsubtracted dispersion integrals do not converge. These asymptotic contributions can be parametrized as energy-independent t-channel poles. In such parametrizations, the most important contributions are the π⁰ pole for one of the spin-dependent amplitudes and a "σ" pole for a spin-independent amplitude. This procedure is relatively safe for the π⁰ pole, which is well established both experimentally and theoretically. However, since the σ-meson mass and coupling constants enter as free parameters in such a formalism, the "σ" pole introduces a considerable model dependence. Instead, we replace the σ meson by existing physical information on the I = J = 0 part of the two-pion spectrum, within the formalism of fixed-t dispersion relations. This has been achieved by subtracting the fixed-t dispersion relations at ν = 0 and by evaluating the subtraction functions through a dispersion relation in the variable t.
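Schematically, the subtraction at ν = 0 works as follows. The block assumes an amplitude A(ν,t) that is even under ν → −ν, with absorptive part Im_s A(ν',t) above a threshold ν_0; the notation ν_0 and Im_s A is ours. Formally subtracting the unsubtracted fixed-t relation at ν = 0 from the one at ν adds a factor ν'⁻² to the integrand. The subtracted integral therefore converges under the weaker condition that Im_s A grows more slowly than ν'². The price is the unknown subtraction function A(0,t), which is the quantity then evaluated by the t-channel dispersion relation described in the text.

```latex
\begin{align}
\mathrm{Re}\,A(\nu,t) &= \frac{2}{\pi}\,\mathcal{P}\!\int_{\nu_0}^{\infty} d\nu'\,
  \frac{\nu'\,\mathrm{Im}_s A(\nu',t)}{\nu'^2-\nu^2}
  \qquad \text{(unsubtracted)} \\
\mathrm{Re}\,A(\nu,t) - A(0,t) &= \frac{2}{\pi}\,\mathcal{P}\!\int_{\nu_0}^{\infty} d\nu'\,
  \nu'\,\mathrm{Im}_s A(\nu',t)
  \left[\frac{1}{\nu'^2-\nu^2}-\frac{1}{\nu'^2}\right] \\
 &= \frac{2\nu^2}{\pi}\,\mathcal{P}\!\int_{\nu_0}^{\infty} d\nu'\,
  \frac{\mathrm{Im}_s A(\nu',t)}{\nu'\,(\nu'^2-\nu^2)} .
\end{align}
```

The last step uses 1/(ν'² − ν²) − 1/ν'² = ν²/[ν'²(ν'² − ν²)].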
The absorptive parts entering the t-channel dispersion integrals can be saturated by the ππ intermediate states in the reaction γγ → ππ → N N̄, constructed from the phenomenological information on the γγ → ππ and ππ → N N̄ subprocesses. In this way we found that both formalisms achieve a consistent description of Compton scattering data at low energies. Going to higher energies and backward scattering angles, a large part of the integration range lies outside the physical region, and the full amplitude has to be constructed by an analytical continuation of the partial wave expansion. Since this expansion converges only in a limited range, the application of fixed-t dispersion relations is restricted in practice to energies up to the Δ resonance and to forward angles. To overcome this shortcoming, we also studied fixed-angle dispersion relations, in which case the integration range of the s-channel contribution falls into the physical region. The t-channel dispersion integrals can then be reconstructed from a partial wave expansion that converges for angles θ ≳ 100°. Furthermore, such fixed-angle dispersion relations can quantitatively explain the large value of the difference of the electric and magnetic dipole polarizabilities, α − β, without invoking a σ-meson contribution. Evaluated at θ = 180°, the predictions are α − β = (10.7 ± 0.2) × 10⁻⁴ fm³ and γ_π = (−38.8 ± 1.8) × 10⁻⁴ fm⁴ for the backward spin polarizability. In conclusion, fixed-t and fixed-angle subtracted dispersion relations nicely complement each other, the former converging at small scattering angles and the latter at large scattering angles. We applied this combined formalism to all existing data. Below pion threshold, we found that all methods agree. This comparison shows that the polarizabilities can indeed be extracted with minimal model dependence for the energy range below the Δ resonance.
Beyond this range, subtracted dispersion relations also provide a quantitative description of the data through the Δ resonance. We have furthermore shown that the sensitivity to the backward spin polarizability can be substantially increased by an experiment with polarized photons hitting a polarized proton target. Such an experiment, although challenging to perform, could become feasible in the near future and can teach us more about the spin response of the nucleon to a static electromagnetic field.
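To make the role of the polarizabilities in the LEX concrete, the sketch below evaluates the leading polarizability correction to the unpolarized lab cross section in the form commonly quoted in the literature (recalled here, not derived in this section): dσ/dΩ − dσ/dΩ|_Born ≈ −(α_em/M)(ω′/ω)² ωω′ [ (α+β)/2 (1 + cos θ)² + (α−β)/2 (1 − cos θ)² ]. The function names, the numerical constants, and this exact form are assumptions of the sketch; the Born cross section itself is not computed.

```python
import math

HBARC = 197.3269804        # hbar*c in MeV fm
ALPHA_EM = 1.0 / 137.035999
M_P = 938.272              # proton mass in MeV (assumed value)


def lab_outgoing_energy(omega, theta):
    """Compton kinematics on a nucleon at rest: omega' = omega / (1 + (omega/M)(1 - cos theta))."""
    return omega / (1.0 + omega / M_P * (1.0 - math.cos(theta)))


def polarizability_term(omega, theta_deg, alpha_plus_beta, alpha_minus_beta):
    """O(omega^2) polarizability correction to the lab dsigma/dOmega, in nb/sr.

    omega in MeV, theta in degrees, polarizabilities in units of 1e-4 fm^3.
    """
    theta = math.radians(theta_deg)
    omega_p = lab_outgoing_energy(omega, theta)
    c = math.cos(theta)
    # convert energies to fm^-1 and polarizabilities to fm^3
    w, wp, m = omega / HBARC, omega_p / HBARC, M_P / HBARC
    apb, amb = alpha_plus_beta * 1e-4, alpha_minus_beta * 1e-4
    bracket = 0.5 * apb * (1 + c) ** 2 + 0.5 * amb * (1 - c) ** 2
    term_fm2 = -ALPHA_EM / m * (wp / w) ** 2 * w * wp * bracket
    return term_fm2 * 1e7  # 1 fm^2 = 10 mb = 1e7 nb
```

At θ = 180° the (α + β) term drops out, so at this order only α − β enters, which is one way to see why backward angles constrain α − β. For example, `polarizability_term(100.0, 180.0, 0.0, 10.7)` gives the backward correction for the value of α − β quoted above; the input value 0.0 for α + β is a placeholder that does not affect the result at this angle.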
In Section 4, we have extended the dispersion relation formalism to virtual Compton scattering (VCS) off a proton target, as a tool to extract generalized polarizabilities (GPs) from VCS observables over a large energy range. Because we evaluate the dispersive integrals using πN intermediate states, the present formalism can be applied to VCS observables through the Δ(1232)-resonance region. When applied at a fixed value of Q², the dispersion relation framework involves two free parameters, which can be expressed in terms of the electric and magnetic GPs and which are to be extracted from a fit to VCS data. We confronted our dispersive calculations with existing VCS data taken at MAMI and JLab below pion threshold. Our dispersion relation formalism yields results consistent with the low-energy expansion analysis for photon energies up to about 100 MeV. At higher photon energies, the dispersive calculations show that the region between pion threshold and the Δ-resonance peak displays an enhanced sensitivity to the GPs. We also compared our dispersion relation calculations to JLab data taken at higher photon energies, through the Δ(1232)-resonance region, and found good agreement. The extraction of GPs from the preliminary JLab data below and above pion threshold yields consistent results. These data indicate a Q² dependence of the electric GP similar to a dipole form factor, whereas the magnetic GP follows a more complicated Q² behavior. As was already shown for RCS, the magnetic dipole transition involves a strong cancellation between a diamagnetic mechanism due to pion cloud effects and a paramagnetic contribution due to nucleon resonance excitation. Since the pion cloud effects have a considerably longer range in space than the resonance structures, the Q² behavior of the magnetic GP can disentangle both physical mechanisms, as is already visible in the existing data.
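One hypothetical way to quantify the statement that the electric GP follows a dipole-like Q² dependence is to fit a dipole mass Λ² to (Q², GP) pairs with GP(0) held fixed, as sketched below. The ansatz GP(Q²) = GP(0)/(1 + Q²/Λ²)², the function names, the choice to fix GP(0), and the linearized least-squares fit are all assumptions for illustration; the usage data are synthetic, not the MAMI or JLab values.

```python
import math


def dipole_gp(q2, gp0, lam2):
    """Dipole-shaped ansatz GP(Q^2) = GP(0) / (1 + Q^2 / Lambda^2)^2 (hypothetical parametrization)."""
    return gp0 / (1.0 + q2 / lam2) ** 2


def fit_dipole_mass(points, gp0):
    """Least-squares estimate of Lambda^2 from (Q^2, GP) pairs with GP(0) held fixed.

    Linearization: sqrt(GP(0) / GP) - 1 = Q^2 / Lambda^2, fitted as a line through the origin.
    """
    num = 0.0
    den = 0.0
    for q2, gp in points:
        y = math.sqrt(gp0 / gp) - 1.0
        num += q2 * y
        den += q2 * q2
    slope = num / den
    return 1.0 / slope
```

For instance, points generated with `dipole_gp` at a chosen Λ² are mapped back to that Λ² by `fit_dipole_mass`; a magnetic GP with the non-monotonic behavior described above would show up as a poor fit to this single-parameter shape.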
Given this initial success, future experiments measuring VCS observables in the Δ-energy region hold the promise of extracting GPs with enhanced precision within the dispersion relation formalism presented here. Besides VCS experiments without polarization degrees of freedom, which give access to combinations of only three of the six GPs, we investigated the potential of double polarization VCS observables. In fact, a first double polarization experiment is now underway at MAMI. Although such investigations will indeed be challenging, they are a prerequisite to access and quantify the full set of scalar and spin GPs of the nucleon. In conclusion, we find that dispersion relations are indeed a powerful tool to analyze real and virtual Compton scattering processes, linking low-energy structure quantities to the excitation spectrum of the nucleon. Although experiments with virtual photons have only become feasible very recently, they have opened up a new and systematic way to map out, in quantitative detail, the transition from hadronic degrees of freedom at low virtuality to partonic degrees of freedom at large virtuality. We look forward to increasing theoretical and experimental activity in the fields of both real and virtual Compton scattering, and hope that the present review will be useful to stimulate and analyze such further work.

Acknowledgements

This work was supported by the Deutsche Forschungsgemeinschaft (SFB 443) and the European Centre for Theoretical Studies in Nuclear Physics and Related Areas (ECT*). We also thank the ECT* (Trento) and its director W. Weise for their hospitality in hosting two collaboration meetings related to the subjects of this paper, “Real and Virtual Compton Scattering off the Nucleon”
in 2001 and “Baryon structure probed with quasistatic electromagnetic fields” in 2002. These meetings provided an excellent and stimulating atmosphere with lively discussions, which shaped much of the material presented here. We would like to express our gratitude to M. Gorchtein, B. Holstein, S. Kamalov, C.W. Kao, A. Metz, T. Spitzenberg, and L. Tiator, in collaboration with whom some of the results reviewed in this work were obtained. Furthermore, we would like to thank J. Ahrens, H.J. Arends, P.Y. Bertin, V. Burkert, J.P. Chen, N. d’Hose, G. Dodge, H. Fonvieille, H. Griesshammer, P.A.M. Guichon, D. Harrington, T. Hemmert, R. Hildebrandt, C. Hyde-Wright, G. Laveissière, A. L’vov, H. Merkel, Z.-E. Meziani, S. Scherer, R. Van de Vyver, L. Van Hoorebeke, T. Walcher, and W. Weise for many useful and stimulating discussions.
Appendix A. t-channel exchange

We express the invariant amplitudes $A_i(\nu,t)$ ($i = 1,\ldots,6$) in terms of the t-channel helicity amplitudes $T^t_{\lambda_N\lambda_{\bar N},\lambda_{\gamma_1}\lambda_{\gamma_2}}(\nu,t)$, for which we have found the expressions
√
22 t 1 t t t A1 = √ T1 1 + T1 1 −√ ; T1 1 ; 11 ; − 1− 1 t t − 4M 2 su − M 4 2 − 2 ; 11 22 22
√ 2 t − 4M 22 1 A2 = √ − T 1t 1 − T 1t 1 ; − √ T 1t 1 ; 11 ; − 1− 1 t t su − M 4 2 − 2 ; 11 22 22
√ M2 1 su − M 4 t t t √ √ A3 = ; 2T 1 1 + + T1 1 T1 1 ; 1− 1 − ; 1− 1 − ; −11 su − M 4 t − 4M 2 2 t 22 2 2 2 2
1 M2 t t √ A4 = M −T 1 1 + T1 1 − ; 1− 1 − ; −11 su − M 4 su − M 4 2 2 2 2 √ √ t t − 4M 2 t t T1 1 + ; + T1 1 − ; 1− 1 − ; −11 42 2 2 2 2 √
t − 4M 2 t √ −2T 1 1 ; A5 = √ − ; 11 42 t su − M 4 2 2 √
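t − 4M 2 t t T1 1 : (A.1) + T1 1 A6 = √ √ − ; 1− 1 − ; −11 42 t su − M 4 2 2 2 2

We decompose the t-channel helicity amplitudes for $\gamma\gamma\to N\bar N$ into a partial wave series,
$$T^t_{\lambda_N\lambda_{\bar N},\lambda_{\gamma_1}\lambda_{\gamma_2}}(\nu,t) = \sum_J \frac{2J+1}{2}\,T^J_{\lambda_N\lambda_{\bar N},\lambda_{\gamma_1}\lambda_{\gamma_2}}(t)\,d^J_{\Lambda_N\Lambda}(\theta_t)\,, \tag{A.2}$$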
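where $d^J_{\Lambda_N\Lambda}$ are Wigner $d$-functions and $\theta_t$ is the scattering angle in the t-channel, which is related to the invariants $\nu$ and $t$ by $\cos\theta_t = 4M\nu/(\sqrt{t}\,\sqrt{t-4M^2})$. We calculate the imaginary parts of the t-channel helicity amplitudes $T^t_{\lambda_N\lambda_{\bar N},\lambda_{\gamma_1}\lambda_{\gamma_2}}(\nu,t)$ through the unitarity equation by inserting $\pi\pi$ intermediate states, which should give the dominant contribution below the $K\bar K$ threshold,
$$2\,\mathrm{Im}\,T_{\gamma\gamma\to N\bar N} = \frac{1}{(4\pi)^2}\,\frac{|\vec p_\pi|}{\sqrt{t}}\int d\Omega_\pi\,\left[T_{\gamma\gamma\to\pi\pi}\right]\cdot\left[T_{\pi\pi\to N\bar N}\right]^*\,. \tag{A.3}$$
Combining the partial wave expansion for $\gamma\gamma\to\pi\pi$,
$$T^{(\gamma\gamma\to\pi\pi)}_{\Lambda}(t,\theta_\pi) = \sum_{J\ \mathrm{even}} \frac{2J+1}{2}\,T^{J(\gamma\gamma\to\pi\pi)}_{\Lambda}(t)\,\sqrt{\frac{(J-\Lambda)!}{(J+\Lambda)!}}\,P^{\Lambda}_J(\cos\theta_\pi)\,, \tag{A.4}$$
and the partial wave expansion for $\pi\pi\to N\bar N$,
$$T^{(\pi\pi\to N\bar N)}_{\Lambda_N}(t,\theta_N) = \sum_J \frac{2J+1}{2}\,T^{J(\pi\pi\to N\bar N)}_{\Lambda_N}(t)\,\sqrt{\frac{(J-\Lambda_N)!}{(J+\Lambda_N)!}}\,P^{\Lambda_N}_J(\cos\theta_N)\,, \tag{A.5}$$
we can now construct the imaginary parts of the Compton t-channel partial waves,
$$2\,\mathrm{Im}\,T^{J(\gamma\gamma\to N\bar N)}_{\lambda_N\lambda_{\bar N},\lambda_{\gamma_1}\lambda_{\gamma_2}}(t) = \frac{1}{8\pi}\,\frac{p_\pi}{\sqrt{t}}\,\left[T^{J(\gamma\gamma\to\pi\pi)}_{\Lambda}(t)\right]\left[T^{J(\pi\pi\to N\bar N)}_{\Lambda_N}(t)\right]^*\,. \tag{A.6}$$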
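The partial wave sums of Eqs. (A.4) and (A.5) and the unitarity relation (A.6) can be evaluated with a few lines of code. The sketch below assumes the normalization √((J−Λ)!/(J+Λ)!) P^Λ_J, which coincides with the Wigner function d^J_{Λ0} up to a phase convention, and uses associated Legendre functions without the Condon–Shortley phase; both conventions, the pion mass value, and the helper names are assumptions. The partial wave amplitudes themselves must be supplied by the user.

```python
import cmath
import math


def assoc_legendre(l, m, x):
    """Associated Legendre function P_l^m(x), 0 <= m <= l, |x| <= 1.

    Convention assumed here: no Condon-Shortley phase, so P_1^1(x) = +sqrt(1 - x^2).
    """
    pmm = 1.0
    s = math.sqrt((1.0 - x) * (1.0 + x))
    odd = 1.0
    for _ in range(m):  # P_m^m = (2m-1)!! (1 - x^2)^(m/2)
        pmm *= odd * s
        odd += 2.0
    if l == m:
        return pmm
    pm1 = x * (2 * m + 1) * pmm  # P_{m+1}^m
    if l == m + 1:
        return pm1
    for ll in range(m + 2, l + 1):  # upward recurrence in the degree
        pll = (x * (2 * ll - 1) * pm1 - (ll + m - 1) * pmm) / (ll - m)
        pmm, pm1 = pm1, pll
    return pm1


def partial_wave_sum(pw, lam, cos_theta):
    """Sum_J (2J+1)/2 * T^J * sqrt((J-lam)!/(J+lam)!) * P_J^lam(cos_theta), with pw = {J: T^J}."""
    total = 0j
    for j, t_j in pw.items():
        if j < lam:
            continue  # a wave with J < Lambda cannot carry helicity difference Lambda
        norm = math.sqrt(math.factorial(j - lam) / math.factorial(j + lam))
        total += (2 * j + 1) / 2 * t_j * norm * assoc_legendre(j, lam, cos_theta)
    return total


def im_compton_pw(t, t_gg_pipi, t_pipi_nn, m_pi=139.57):
    """Im T^J from 2 Im T^J = p_pi / (8 pi sqrt(t)) * T^J(gg->pipi) * conj(T^J(pipi->NNbar)); MeV units."""
    p_pi = cmath.sqrt(t / 4.0 - m_pi**2)
    return p_pi / (16.0 * math.pi * cmath.sqrt(t)) * t_gg_pipi * t_pipi_nn.conjugate()
```

The factor p_π makes the imaginary part vanish at the two-pion threshold t = 4m_π². In the elastic region, Watson's theorem makes the product of the two partial waves real, so any imaginary part of the returned value is a check on the input phases.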
The partial wave amplitudes $T^{J(\pi\pi\to N\bar N)}_{\Lambda_N}$ of Eq. (A.5) are related to the amplitudes $f^J_\pm(t)$ of Frazer and Fulco [165] by the relations
16 (→N NZ ) J TIJ N =0 (t) = (pN p )J · f+ (t) ; pN √ t J (→N NZ ) J (t) = 8 (pN p )J · f− (t) ; (A.7) TIN =1 pN 2
with $p_N$ and $p_\pi$ the c.m. momenta of nucleon and pion, respectively ($p_N = \sqrt{t/4 - M^2}$ and $p_\pi = \sqrt{t/4 - m_\pi^2}$). For the reaction $\gamma\gamma\to\pi\pi$, we will use the partial wave amplitudes $F_{J\Lambda}(t)$, which are related to those of Eq. (A.4) by
$$T^{J(\gamma\gamma\to\pi\pi)}_{\Lambda}(t) = \frac{2}{\sqrt{2J+1}}\,F_{J\Lambda}(t)\,. \tag{A.8}$$
Inserting the partial-wave expansion of Eq. (A.2) into Eq. (A.1), we can finally express the $\pi\pi$ t-channel contributions $\mathrm{Im}_t A_i(\nu,t)_{\pi\pi}$ by the partial wave amplitudes for the reactions $\gamma\gamma\to\pi\pi$ and $\pi\pi\to N\bar N$,
1 p J =0∗ Imt A1 (2; t)2 = √ 2 FJ =0I =0 (t)f+ (t) t t pN √ 5 p2 J =2∗ (8M 2 22 − su + M 4 )FJ =2I =0 (t)f+ (t) + 2 t 15 2 2 J =2∗ M2 p FJ =2I =0 (t)f− (t) ; − 2
15 p3 J =2∗ √ 4M22 FJ =2I =0 (t)f− (t) ; 2 t2 t √ 2 3 M p 5 3 J =2∗ J =2∗ √ FJ =2I =2 (t)f+ Imt A3 (2; t)2 = (t) − MFJ =2I =2 (t)f− (t) ; 2 t t pN2 2 2
Imt A2 (2; t) = −
Imt A4 (2; t)2 = 0 ; Imt A5 (2; t)2 = − √ 2
Imt A6 (2; t) = −
15 M 3 J =2∗ √ p FJ =2I =0 (t) f− (t) ; 2 t t
5 M 3 J =2∗ √ p FJ =2I =2 (t)f− (t) : 2 t t
(A.9)
We note that the s-wave ($J = 0$) component of the $\pi\pi$ intermediate states contributes only to $A_1$, and that only waves with $J \geq 4$ contribute to the amplitude $A_4$.
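The t-channel kinematic quantities used in this appendix, the c.m. momenta p_N and p_π and the angle θ_t, can be evaluated as sketched below. The numerical masses and the function name are assumptions; complex square roots are used so that the same expressions can be continued outside the physical t-channel region.

```python
import cmath

M_N = 938.272   # nucleon (proton) mass in MeV, assumed value
M_PI = 139.570  # charged-pion mass in MeV, assumed value


def tchannel_kinematics(nu, t, m_n=M_N, m_pi=M_PI):
    """Return (p_N, p_pi, cos_theta_t) for gamma gamma -> N Nbar at given nu and t (MeV units).

    p_N = sqrt(t/4 - M^2), p_pi = sqrt(t/4 - m_pi^2),
    cos(theta_t) = 4 M nu / (sqrt(t) sqrt(t - 4 M^2)).
    Complex square roots allow continuation below the N Nbar threshold; t = 0 and
    t = 4 M^2 must be avoided because cos(theta_t) is singular there.
    """
    p_n = cmath.sqrt(t / 4.0 - m_n**2)
    p_pi = cmath.sqrt(t / 4.0 - m_pi**2)
    cos_t = 4.0 * m_n * nu / (cmath.sqrt(t) * cmath.sqrt(t - 4.0 * m_n**2))
    return p_n, p_pi, cos_t
```

Since √(t − 4M²) = 2p_N, the angle can equivalently be written as cos θ_t = 2Mν/(√t p_N). For t < 0, the s-channel region of Compton scattering, p_N and p_π become imaginary, which is why the t-channel partial wave expansion has to be continued analytically there.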
Appendix B. Tensor basis

In writing down a gauge-invariant tensor basis for VCS, we use the combinations of the four-momenta given in Eq. (157),
$$P = \frac{1}{2}(p + p')\,, \qquad K = \frac{1}{2}(q + q')\,, \tag{B.1}$$
The 12 independent tensors $\rho^{\mu\nu}_i$ entering the VCS amplitude of Eq. (222) and introduced in Ref. [140] are given by
2 2 !2 1 = −q · q g + q q ;
2 2 2 2 2 !2 2 = −(2M2) g − 4q · q P P + 4M2(P q + P q ) ;
2 2 2 2 2 2 !2 3 = −2M2Q g − 2M2q q + 2Q P q + 2q · q P q ; 2 = − 4M2(P 2 + P 2 ) + i4M25 2( K ( ; !2 4 = 8P P K 2 =− !2 5 =P q K
Q2 2 i (P − P 2 ) − M2q 2 − Q2 5 2( K ( ; 2 2
2 2 2 2 2 !2 6 = −8q · q P P + 4M2(P q + P q ) + 4Mq · q (P + P )
− 4M 2 2(q 2 + q2 ) + i4M2(q 2 K − q2 K + q · q 2 ) + i4Mq · q 5 2( K ( ;
2 2 !2 = − q · q (P 2 − P 2 ) + M2(q 2 − q2 ) ; 7 = (P q − P q )K
2 !2 8 = M2q q +
+
Q2 2 = + Mq · q q 2 (P q − P 2 q ) − q · q P 2 q − Mq q2 K 2
M 2 2 i Q (q − q2 ) − Q2 (q 2 K − q2 K + q · q 2 ) ; 2 2
2 2 2 2 2 2 2 !2 9 = 2M2(P q − P q ) − 2Mq · q (P − P ) + 2M 2(q − q )
+i2q · q (P 2 K + P 2 K ) − i2M2(q 2 K + q2 K ) ;
2 2 2 2 = − 2M (q 2 + q2 ) !2 10 = −4M2g + 2(P q + P q ) + 4Mg K
−2i(q 2 K − q2 K + q · q 2 ) ;
2 2 = − 4M2(q 2 + q2 ) + i4q · q 5 2( K ( ; !2 11 = 4(P q + P q )K 2 2 2 2 2 2 2 2 !2 12 = 2Q P P + 2M2P q − 2MQ P − 2M 2q + i2M2q K
+ iQ2 (P 2 K + P 2 K − M22 ) − iMQ2 5 2( K ( ; (B.2)

where $\sigma^{\mu\nu} = \frac{i}{2}[\gamma^\mu,\gamma^\nu]$, and we follow the conventions of Bjorken and Drell [145], in particular $\epsilon_{0123} = +1$.
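The combinations of Eq. (B.1) and the gauge invariance required of the basis can be checked numerically. The sketch below builds on-shell lab-frame VCS kinematics, verifies P·q = P·q′ (which follows from p² = p′² = M²), and checks that a tensor of the form −(q·q′)g^{μν} + q^μ q′^ν is transverse to both photon momenta. The helper names, the numerical mass, and the index assignment (μ contracted with q′, ν with q) are assumptions of this sketch; the code is not a transcription of Eq. (B.2).

```python
import math

M = 938.272                   # nucleon mass in MeV (assumed value)
G = (1.0, -1.0, -1.0, -1.0)   # diagonal metric g = diag(+1, -1, -1, -1)


def mdot(a, b):
    """Minkowski product a.b with metric diag(+, -, -, -)."""
    return sum(G[i] * a[i] * b[i] for i in range(4))


def vcs_lab_kinematics(omega, q2_virt, theta):
    """Hypothetical helper: lab-frame momenta p, q, p', q' with the nucleon at rest.

    omega: virtual-photon energy, q2_virt: Q^2 > 0, theta: angle of q' relative to q (rad).
    The outgoing real-photon energy follows from the on-shell condition p'^2 = M^2.
    """
    p = (M, 0.0, 0.0, 0.0)
    qz = math.sqrt(omega**2 + q2_virt)  # q^2 = omega^2 - |q|^2 = -Q^2
    q = (omega, 0.0, 0.0, qz)
    n = (1.0, math.sin(theta), 0.0, math.cos(theta))
    tot = tuple(p[i] + q[i] for i in range(4))
    omega_p = (mdot(tot, tot) - M**2) / (2.0 * mdot(n, tot))
    qp = tuple(omega_p * c for c in n)
    pp = tuple(tot[i] - qp[i] for i in range(4))
    return p, q, pp, qp


def rho1(q, qp):
    """-(q.q') g^{mu nu} + q^mu q'^nu, transverse when mu is contracted with q' and nu with q."""
    qqp = mdot(q, qp)
    return [[(-qqp * G[mu] if mu == nu else 0.0) + q[mu] * qp[nu] for nu in range(4)]
            for mu in range(4)]
```

The identity P·q = P·q′ holds for any on-shell kinematics because P·(q − q′) = ½(p′² − p²) = 0, which is why P·K can serve as an energy variable of the process.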
References

[1] R. de L. Kronig, J. Opt. Soc. Amer. Rev. Sci. Instrum. 12 (1926) 547; H.A. Kramers, Atti Congr. Int. Fis. Como 2 (1927) 545.
[2] S. Gerasimov, Yad. Fiz. 2 (1966) 598 [Sov. J. Nucl. Phys. 2 (1966) 460].
[3] S.D. Drell, A.C. Hearn, Phys. Rev. Lett. 16 (1966) 908.
[4] H. Burkhardt, W.N. Cottingham, Ann. Phys. 56 (1970) 453.
[5] W. Heisenberg, Z. Phys. 120 (1943) 513.
[6] S. Mandelstam, Rep. Prog. Phys. 25 (1962) 99, and references given therein.
[7] H.M. Nussenzveig, Causality and Dispersion Relations, Academic Press, New York, 1972.
[8] G. Höhler, Pion-Nucleon Scattering, in: H. Schopper (Ed.), Landolt-Börnstein, Vol. I/9b2, Springer, Berlin, 1983.
[9] S. Boffi, C. Giusti, F.D. Pacati, M. Radici, Electromagnetic Response of Atomic Nuclei, Clarendon Press, Oxford, 1996.
[10] A.W. Thomas, W. Weise, The Structure of the Nucleon, Wiley-VCH, Berlin, 2001.
[11] J.D. Jackson, Classical Electrodynamics, Wiley, New York, 1975.
[12] E. Merzbacher, Quantum Mechanics, Wiley, New York, 1970.
[13] D. Babusci, G. Giordano, G. Matone, Phys. Rev. C 57 (1998) 291.
[14] M. Derrick, et al. (ZEUS Collaboration), Z. Phys. C 63 (1994) 391; S. Aid, et al. (H1 Collaboration), Z. Phys. C 69 (1995) 27; S. Chekanov, et al. (ZEUS Collaboration), Nucl. Phys. B 627 (2002) 3.
[15] J.R. Cudell, V. Ezhela, K. Kang, S. Lugovsky, N. Tkachenko, Phys. Rev. D 61 (2000) 034019.
[16] J. Ahrens, et al. (GDH and A2 Collaborations), Phys. Rev. Lett. 84 (2000) 5950.
[17] J. Ahrens, et al. (GDH and A2 Collaborations), Phys. Rev. Lett. 87 (2001) 022003.
[18] D. Drechsel, O. Hanstein, S. Kamalov, L. Tiator, Nucl. Phys. A 645 (1999) 145.
[19] D. Drechsel, S.S. Kamalov, L. Tiator, Phys. Rev. D 63 (2001) 114010.
[20] H. Holvoet, Ph.D. Thesis, University Gent, 2001.
[21] H. Holvoet, M. Vanderhaeghen, in preparation.
[22] K. Helbing (GDH Collaboration), Nucl. Phys. B (Proc. Suppl.) 105 (2002) 113.
[23] L.A. Copley, G. Karl, E. Obryk, Nucl. Phys. B 13 (1969) 303.
[24] R. Koniuk, N. Isgur, Phys. Rev. D 21 (1980) 1868.
[25] N. Bianchi, E. Thomas, Phys. Lett. B 450 (1999) 439.
[26] F.E. Low, Phys. Rev. 96 (1954) 1428.
[27] M. Gell-Mann, M.L. Goldberger, Phys. Rev. 96 (1954) 1433.
[28] A.M. Baldin, Nucl. Phys. 18 (1960) 310.
[29] L.I. Lapidus, Sov. Phys. JETP 16 (1963) 964.
[30] M. Gell-Mann, M.L. Goldberger, W.E. Thirring, Phys. Rev. 95 (1954) 1612.
[31] L. Tiator, in: Proceedings of the 2nd International Symposium on the Gerasimov–Drell–Hearn Sum Rule and the Spin Structure of the Nucleon (GDH 2002), World Scientific, Singapore, 2003, to be published.
[32] D.A. Dicus, R. Vega, Phys. Lett. B 501 (2001) 44.
[33] S.D. Bass, Mod. Phys. Lett. A 12 (1997) 1051.
[34] S.D. Bass, M.M. Brisudova, Eur. Phys. J. A 4 (1999) 251.
[35] P. Bosted, D. Crabb, spokespersons, SLAC Proposal E-159 (2000).
[36] S. Simula, et al., Phys. Rev. D 65 (2002) 034017.
[37] MAMI proposal A2/1-97, spokespersons H.-J. Arends and P. Pedroni.
[38] L.N. Hand, Phys. Rev. 129 (1963) 1834.
[39] F.J. Gilman, Phys. Rev. 167 (1968) 1365.
[40] X. Ji, Phys. Lett. B 309 (1993) 187.
[41] C.G. Callan, D.J. Gross, Phys. Rev. Lett. 21 (1968) 311.
[42] A.D. Martin, R.G. Roberts, W.J. Stirling, R.S. Thorne, Phys. Lett. B 531 (2002) 216.
[43] J. Edelmann, N. Kaiser, G. Piller, W. Weise, Nucl. Phys. A 641 (1998) 119.
[44] D. Drechsel, S.S. Kamalov, G. Krein, B. Pasquini, L. Tiator, Nucl. Phys. A 660 (1999) 57.
[45] X. Ji, J. Osborne, J. Phys. G 27 (2001) 127.
[46] J. Blümlein, H. Böttcher, Nucl. Phys. B 636 (2002) 225.
[47] S. Wandzura, F. Wilczek, Phys. Lett. B 72 (1977) 195.
[48] P.L. Anthony, et al. (E155 Collaboration), Phys. Lett. B 458 (1999) 529.
[49] P.L. Anthony, et al. (E155 Collaboration), Phys. Lett. B 553 (2003) 18.
[50] P.L. Anthony, et al. (E155 Collaboration), Phys. Lett. B 493 (2000) 19.
[51] J.D. Bjorken, Phys. Rev. 148 (1966) 1467; J.D. Bjorken, Phys. Rev. D 1 (1970) 1376.
[52] S.A. Larin, J.A.M. Vermaseren, Phys. Lett. B 259 (1991) 345.
[53] X. Ji, C.-W. Kao, J. Osborne, Phys. Lett. B 472 (2000) 1.
[54] V. Bernard, T.R. Hemmert, U.-G. Meißner, Phys. Lett. B 545 (2002) 105.
[55] K. Abe, et al. (E143 Collaboration), Phys. Rev. D 58 (1998) 112003.
[56] A. Airapetian, et al. (HERMES Collaboration), Eur. Phys. J. C 26 (2003) 527.
[57] R. De Vita (for the CLAS Collaboration), in: C. Carlson, B. Mecking (Eds.), Proceedings of the 9th International Conference on the Structure of Baryons (Baryons 2002), World Scientific, Singapore, 2003.
[58] M. Anselmino, B.L. Ioffe, E. Leader, Sov. J. Nucl. Phys. 49 (1989) 136.
[59] V. Burkert, B.L. Ioffe, Phys. Lett. B 296 (1992) 223; V. Burkert, B.L. Ioffe, J. Exp. Theor. Phys. 78 (1994) 619.
[60] V. Burkert, Zh. Li, Phys. Rev. D 47 (1993) 46.
[61] M. Amarian, et al. (JLab E94010 Collaboration), Phys. Rev. Lett. 89 (2002) 242301.
[62] S.E. Kuhn, spokesperson, Jefferson Lab Experiment E93-009 (1993).
[63] J. Yun, et al. (CLAS Collaboration), hep-ex/0212044.
[64] C. Ciofi degli Atti, S. Scopetta, Phys. Lett. B 404 (1997) 223.
[65] V.D. Burkert, Phys. Rev. D 63 (2001) 097904.
[66] A. Schäfer, in: D. Drechsel, L. Tiator (Eds.), Proceedings of the Symposium on the Gerasimov–Drell–Hearn Sum Rule and the Nucleon Spin Structure in the Resonance Region (GDH 2000), World Scientific, Singapore, 2001.
[67] W.-Y. Tsai, L.L. DeRaad Jr., K.A. Milton, Phys. Rev. D 11 (1975) 3537.
[68] G. Altarelli, B. Lampe, P. Nason, G. Ridolfi, Phys. Lett. B 334 (1994) 187.
[69] C.-W. Kao, T. Spitzenberg, M. Vanderhaeghen, Phys. Rev. D 67 (2003) 016001.
[70] M.K. Jones, et al. (JLab/Hall A Collaboration), Phys. Rev. Lett. 84 (2000) 1398.
[71] O. Gayou, et al. (JLab/Hall A Collaboration), Phys. Rev. Lett. 88 (2002) 092301.
[72] O. Klein, Y. Nishina, Z. Phys. 52 (1929) 853.
[73] J.L. Powell, Phys. Rev. 75 (1949) 32.
[74] J. Wess, B. Zumino, Phys. Lett. B 37 (1971) 95; E. Witten, Nucl. Phys. B 223 (1983) 422.
[75] A.I. L’vov, V.A. Petrun’kin, M. Schumacher, Phys. Rev. C 55 (1997) 359.
[76] D. Drechsel, M. Gorchtein, B. Pasquini, M. Vanderhaeghen, Phys. Rev. C 61 (1999) 015204.
[77] R.E. Prange, Phys. Rev. 110 (1958) 240.
[78] A.I. L’vov, Sov. J. Nucl. Phys. 34 (1981) 597.
[79] S. Ragusa, Phys. Rev. D 47 (1993) 3757; ibid. 49 (1994) 3157.
[80] D. Babusci, C. Giordano, A.I. L’vov, G. Matone, A.M. Nathan, Phys. Rev. C 58 (1998) 1013.
[81] V.I. Gol’danski, et al., Zh. Eksp. Teor. Fiz. 38 (1960) 1695 [Sov. Phys. JETP 11 (1960) 1223].
[82] P.S. Baranov, et al., Phys. Lett. B 52 (1974) 122; P.S. Baranov, et al., Yad. Fiz. 21 (1975) 689 [Sov. J. Nucl. Phys. 21 (1975) 355].
[83] B.E. MacGibbon, G. Garino, M.A. Lucas, A.M. Nathan, G. Feldman, B. Dolbilkin, Phys. Rev. C 52 (1995) 2097.
[84] F.J. Federspiel, et al., Phys. Rev. Lett. 67 (1991) 1511; see also F.J. Federspiel, Ph.D. Dissertation, University of Illinois, 1991.
[85] E.L. Hallin, et al., Phys. Rev. C 48 (1993) 1497.
[86] J. Tonnison, A.M. Sandorfi, S. Hoblit, A.M. Nathan, Phys. Rev. Lett. 80 (1998) 4382.
[87] V. Olmos de León, et al., Eur. Phys. J. A 10 (2001) 207.
[88] G. Galler, et al., Phys. Lett. B 503 (2001) 245.
[89] S. Wolf, et al., Eur. Phys. J. A 12 (2001) 231.
[90] M. Camen, et al., Phys. Rev. C 65 (2002) 032202(R).
[91] J. Schmiedmayer, et al., Phys. Rev. Lett. 66 (1991) 1015.
[92] G.V. Nikolenko, A.B. Popov, Z. Phys. A 341 (1992) 365.
[93] L. Koester, et al., Phys. Rev. C 51 (1995) 3363.
[94] T.L. Enik, et al., Phys. Atom. Nucl. 60 (1997) 567.
[95] K.W. Rose, et al., Phys. Lett. B 234 (1990) 460; K.W. Rose, et al., Nucl. Phys. A 514 (1990) 621.
[96] F. Wissmann, et al., Nucl. Phys. A 660 (1999) 232.
[97] K. Kossert, et al., Phys. Rev. Lett. 88 (2002) 162301.
[98] M.I. Levchuk, A.I. L’vov, Nucl. Phys. A 674 (2000) 449.
[99] N.R. Kolb, et al., Phys. Rev. Lett. 85 (2000) 1388.
[100] D.L. Hornidge, et al., Phys. Rev. Lett. 84 (2000) 2334.
[101] M. Lundin, et al., nucl-ex/0204014.
[102] K. Hagiwara, et al. (Particle Data Group), Phys. Rev. D 66 (2002) 010001.
[103] M.M. Pavan, R.A. Arndt, I.I. Strakovsky, R.L. Workman, PiN Newslett. 15 (1999) 171.
[104] T.R. Hemmert, B.R. Holstein, J. Kambor, G. Knöchlein, Phys. Rev. D 57 (1998) 5746.
[105] O. Hanstein, D. Drechsel, L. Tiator, Nucl. Phys. A 632 (1998) 561.
[106] R.A. Arndt, W.J. Briscoe, I.I. Strakovsky, R.L. Workman, Phys. Rev. C 66 (2002) 055213, and references therein.
[107] J. Bernabeu, T.E.O. Ericson, C. Ferro Fontan, Phys. Lett. B 49 (1974) 381; J. Bernabeu, B. Tarrach, ibid. 69 (1977) 484.
[108] B.R. Holstein, A.M. Nathan, Phys. Rev. D 49 (1994) 6101.
[109] A.I. L’vov, A.M. Nathan, Phys. Rev. C 59 (1999) 1064.
[110] A. Zieger, R. Van de Vyver, D. Christmann, A. De Graeve, C. Van den Abeele, B. Ziegler, Phys. Lett. B 278 (1992) 34.
[111] A. Hünger, et al., Nucl. Phys. A 620 (1997) 385.
[112] B.R. Holstein, D. Drechsel, B. Pasquini, M. Vanderhaeghen, Phys. Rev. C 61 (2000) 034316.
[113] H.W. Grießhammer, T.R. Hemmert, Phys. Rev. C 65 (2002) 045207.
[114] R. Hildebrandt, H.W. Grießhammer, T.R. Hemmert, B. Pasquini, in preparation.
[115] V.A. Petrunkin, Sov. J. Part. Nucl. 12 (1981) 278.
[116] P. Hecking, G.F. Bertsch, Phys. Lett. B 99 (1981) 237.
[117] A. Schäfer, B. Müller, D. Vasak, W. Greiner, Phys. Lett. B 143 (1984) 323.
[118] R. Weiner, W. Weise, Phys. Lett. B 159 (1985) 85.
[119] V. Bernard, N. Kaiser, U.-G. Meißner, Phys. Rev. Lett. 67 (1991) 1515.
[120] V. Bernard, N. Kaiser, A. Schmidt, U.-G. Meißner, Phys. Lett. B 319 (1993) 269; V. Bernard, N. Kaiser, A. Schmidt, U.-G. Meißner, Z. Phys. A 348 (1994) 317.
[121] T.R. Hemmert, B.R. Holstein, J. Kambor, Phys. Rev. D 55 (1997) 5598.
[122] S. Kondratyuk, O. Scholten, Phys. Rev. C 64 (2001) 024005.
[123] V. Bernard, N. Kaiser, U.-G. Meißner, Int. J. Mod. Phys. E 4 (1995) 193.
[124] K.B. Vijaya Kumar, J.A. McGovern, M.C. Birse, Phys. Lett. B 479 (2000) 167.
[125] X. Ji, C.-W. Kao, J. Osborne, Phys. Rev. D 61 (2000) 074003.
[126] G.C. Gellas, T.R. Hemmert, U.-G. Meißner, Phys. Rev. Lett. 85 (2000) 14.
[127] M.C. Birse, X. Ji, J.A. McGovern, Phys. Rev. Lett. 86 (2001) 3204.
[128] G.C. Gellas, T.R. Hemmert, U.-G. Meißner, Phys. Rev. Lett. 86 (2001) 3205.
[129] T.R. Hemmert, in: D. Drechsel, L. Tiator (Eds.), Proceedings of the Symposium on the Gerasimov–Drell–Hearn Sum Rule and the Nucleon Spin Structure in the Resonance Region (GDH 2000), World Scientific, Singapore, 2001.
[130] J. Christensen, F.X. Lee, W. Wilcox, L. Zhou, hep-lat/0209043; hep-lat/0209128.
[131] J. Roche, et al., Phys. Rev. Lett. 85 (2000) 708.
[132] H. Fonvieille (for the Jefferson Lab Hall A and VCS Collaborations), in: C. Carlson, B. Mecking (Eds.), Proceedings of the 9th International Conference on the Structure of Baryons (Baryons 2002), World Scientific, Singapore, 2003; hep-ex/0206035.
[133] R. Miskimen, spokesperson, MIT-Bates experiment, 1997–2003.
[134] N. d’Hose, H. Merkel, spokespersons, MAMI experiment, 2001.
[135] C. Hyde-Wright, G. Laveissière, private communication.
[136] P.A.M. Guichon, G.Q. Liu, A.W. Thomas, Nucl. Phys. A 591 (1995) 606.
[137] P.A.M. Guichon, M. Vanderhaeghen, Prog. Part. Nucl. Phys. 41 (1998) 125.
[138] M. Vanderhaeghen, Eur. Phys. J. A 8 (2000) 455.
[139] S. Scherer, A.Yu. Korchin, J.H. Koch, Phys. Rev. C 54 (1996) 904.
[140] D. Drechsel, G. Knöchlein, A.Yu. Korchin, A. Metz, S. Scherer, Phys. Rev. C 57 (1998) 941 and Phys. Rev. C 58 (1998) 1751.
[141] B. Pasquini, M. Gorchtein, D. Drechsel, A. Metz, M. Vanderhaeghen, Eur. Phys. J. A 11 (2001) 185.
[142] D. Drechsel, G. Knöchlein, A. Metz, S. Scherer, Phys. Rev. C 55 (1997) 424.
[143] B. Pasquini, D. Drechsel, M. Gorchtein, A. Metz, M. Vanderhaeghen, Phys. Rev. C 62 (2000) 052201(R).
[144] R.A. Berg, C.N. Lindner, Nucl. Phys. 26 (1961) 259.
[145] J.D. Bjorken, S.D. Drell, Relativistic Quantum Fields, McGraw-Hill, New York, 1965.
[146] H. Pilkuhn, Relativistic Particle Physics, Springer Verlag, Heidelberg, 1979.
[147] R.L. Jaffe, P.F. Mende, Nucl. Phys. B 369 (1992) 189.
[148] R. Oehme, Int. J. Mod. Phys. A 10 (1995) 1995.
[149] M. Gorchtein, Ph.D. Thesis, University Mainz, 2002.
[150] S.J. Brodsky, G.P. Lepage, Phys. Rev. D 24 (1981) 1808.
[151] A. Metz, D. Drechsel, Z. Phys. A 356 (1996) 351; A. Metz, D. Drechsel, Z. Phys. A 359 (1997) 165.
[152] G. Höhler, E. Pietarinen, I. Sabba-Stefanescu, F. Borkowski, G.G. Simon, V.H. Walther, R.D. Wendling, Nucl. Phys. B 114 (1976) 505.
[153] N.I. Kaloskamis, C.N. Papanicolas, spokespersons, MIT-Bates experiment, 1997.
[154] N. Degrande, Ph.D. Thesis, University Gent, 2001.
[155] S. Jaminion, Ph.D. Thesis, Université Blaise Pascal, Clermont-Ferrand, 2000.
[156] G. Laveissière, Ph.D. Thesis, Université Blaise Pascal, Clermont-Ferrand, 2001.
[157] E.J. Brash, A. Kozlov, Sh. Li, G.M. Huber, Phys. Rev. C 65 (2002) 051001.
[158] M. Vanderhaeghen, Phys. Lett. B 402 (1997) 243.
[159] T.R. Hemmert, B.R. Holstein, G. Knöchlein, D. Drechsel, Phys. Rev. D 62 (2000) 014013.
[160] T.R. Hemmert, B.R. Holstein, G. Knöchlein, S. Scherer, Phys. Rev. D 55 (1997) 2630; T.R. Hemmert, B.R. Holstein, G. Knöchlein, S. Scherer, Phys. Rev. Lett. 79 (1997) 22.
[161] M. Vanderhaeghen, Phys. Lett. B 368 (1996) 13.
[162] A.I. L’vov, S. Scherer, B. Pasquini, C. Unkmeir, D. Drechsel, Phys. Rev. C 64 (2001) 015203.
[163] B. Pasquini, S. Scherer, D. Drechsel, Phys. Rev. C 63 (2001) 025205.
[164] C.-W. Kao, M. Vanderhaeghen, Phys. Rev. Lett. 89 (2002) 272002.
[165] W.R. Frazer, J.R. Fulco, Phys. Rev. 117 (1960) 1603.
Physics Reports 378 (2003) 207 – 299 www.elsevier.com/locate/physrep
Quantum field theory on noncommutative spaces

Richard J. Szabo∗
Department of Mathematics, Heriot-Watt University, Riccarton, Edinburgh EH14 4AS, UK

Accepted 23 January 2003
Editor: A. Schwimmer
Abstract

A pedagogical and self-contained introduction to noncommutative quantum field theory is presented, with emphasis on those properties that are intimately tied to string theory and gravity. Topics covered include the Weyl–Wigner correspondence, noncommutative Feynman diagrams, UV/IR mixing, noncommutative Yang–Mills theory on infinite space and on the torus, Morita equivalences of noncommutative gauge theories, twisted reduced models, and an in-depth study of the gauge group of noncommutative Yang–Mills theory. Some of the more mathematical ideas and techniques of noncommutative geometry are also briefly explained.

© 2003 Elsevier Science B.V. All rights reserved.

PACS: 11.10.−z
Contents
1. Historical introduction
1.1. Evidence for spacetime noncommutativity
1.2. Matrix models
1.3. Strong magnetic fields
1.4. Outline and omissions
2. Weyl quantization and the Groenewold–Moyal product
2.1. Weyl operators
2.2. The star-product
3. Noncommutative perturbation theory
3.1. Planar Feynman diagrams
3.1.1. String theoretical interpretation
Based on invited lectures given at the APCTP-KIAS Winter School on “Strings and D-Branes 2000”, Seoul, Korea, February 21–25, 2000, at the Science Institute, University of Iceland, Reykjavik, Iceland, June 1–8, 2000, and at the PIMS/APCTP/PITP Frontiers of Mathematical Physics Workshop on “Particles, Fields and Strings”, Simon Fraser University, Vancouver, Canada, July 16–27, 2001.
∗ Tel.: +44-131-451-3230; fax: +44-131-451-3249. E-mail addresses:
[email protected],
[email protected] (R.J. Szabo).
0370-1573/03/$ - see front matter © 2003 Elsevier Science B.V. All rights reserved.
doi:10.1016/S0370-1573(03)00059-0
3.2. Non-planar Feynman diagrams
3.3. UV/IR mixing
3.3.1. String theoretical interpretation
4. Noncommutative Yang–Mills theory
4.1. Star-gauge symmetry
4.2. Noncommutative Wilson lines
4.3. One-loop renormalization
5. Gauge theory on the noncommutative torus
5.1. The noncommutative torus
5.2. Topological quantum numbers
5.3. Large star-gauge transformations
6. Duality in noncommutative Yang–Mills theory
6.1. Morita equivalence
6.1.1. Irreducible representations of twist eaters
6.1.2. Solving twisted boundary conditions
6.2. Applications
6.2.1. Other transformation rules
6.3. Projective modules
6.3.1. String theoretical interpretation
7. Matrix models of noncommutative Yang–Mills theory
7.1. Twisted reduced models
7.2. Finite-dimensional representations
7.2.1. The twisted Eguchi–Kawai model
7.2.2. The matrix-field correspondence
7.2.3. Discrete noncommutative Yang–Mills theory
8. Geometry and topology of star-gauge transformations
8.1. Star-gauge symmetries revisited
8.2. Inner automorphisms
8.2.1. The Tomita involution
8.2.2. Geometrical aspects
8.2.3. Violations of Lorentz invariance
8.3. Universal gauge symmetry
8.4. Large N limits
8.4.1. Algebraic description
8.4.2. Geometric description
Acknowledgements
. . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1. Historical introduction

1.1. Evidence for spacetime noncommutativity

It was suggested very early on by the founding fathers of quantum mechanics, most notably Heisenberg, in the pioneering days of quantum field theory, that one could use a noncommutative structure for spacetime coordinates at very small length scales to introduce an effective ultraviolet cutoff. It was Snyder [1] who first formalized this idea, in an article entirely devoted to the subject. This was motivated by the need to control the divergences which had plagued theories such as quantum electrodynamics from the very beginning. It was purported to be superior to earlier
R.J. Szabo / Physics Reports 378 (2003) 207 – 299
suggestions of lattice regularization in that it maintained Lorentz invariance. However, this suggestion was largely ignored, mostly because of its timing. At around the same time, the renormalization program of quantum field theory finally proved to be successful at accurately predicting numerical values for physical observables in quantum electrodynamics.

The idea behind spacetime noncommutativity is very much inspired by quantum mechanics. A quantum phase space is defined by replacing canonical position and momentum variables $x^i, p_j$ with Hermitian operators $\hat x^i, \hat p_j$ which obey the Heisenberg commutation relations $[\hat x^j, \hat p_i] = i\hbar\,\delta^j_i$. The phase space becomes smeared out and the notion of a point is replaced with that of a Planck cell. In the classical limit $\hbar\to0$, one recovers an ordinary space. It was von Neumann who first attempted to rigorously describe such a quantum "space", and he dubbed this study "pointless geometry", referring to the fact that the notion of a point in a quantum phase space is meaningless because of the Heisenberg uncertainty principle of quantum mechanics. This led to the theory of von Neumann algebras and was essentially the birth of "noncommutative geometry", referring to the study of topological spaces whose commutative $C^*$-algebras of functions are replaced by noncommutative algebras [2]. In this setting, the properties of "spaces" are studied in purely algebraic terms (abandoning the notion of a "point"), which allows for rich generalizations. Just as in the quantization of a classical phase space, a noncommutative spacetime is defined by replacing spacetime coordinates $x^i$ by the Hermitian generators $\hat x^i$ of a noncommutative $C^*$-algebra of "functions on spacetime" [2] which obey the commutation relations

$$[\hat x^i, \hat x^j] = i\,\theta^{ij} . \tag{1.1}$$

The simplest special case of (1.1) is where $\theta^{ij}$ is a constant, real-valued antisymmetric $D\times D$ matrix ($D$ is the dimension of spacetime) with dimensions of length squared. Since the coordinates no longer commute, they cannot be simultaneously diagonalized and the underlying space disappears, i.e. the spacetime manifold gets replaced by a Hilbert space of states. Because of the induced spacetime uncertainty relation

$$\Delta x^i\,\Delta x^j \geq \tfrac12\,|\theta^{ij}| , \tag{1.2}$$
a spacetime point is replaced by a Planck cell of dimension given by the Planck area. In this way one may think of ordinary spacetime coordinates $x^i$ as macroscopic order parameters obtained by coarse-graining over scales smaller than the fundamental scale $\sim\sqrt\theta$. To describe physical phenomena on scales of the order of $\sqrt\theta$, the $x^i$'s break down and must be replaced by elements of some noncommutative algebra. Snyder's idea was that if one could find a coherent description for the structure of spacetime which is pointless on small length scales, then the ultraviolet divergences of quantum field theory could be eliminated. It would be equivalent to using an ultraviolet cutoff $\Lambda$ on momentum space integrations to compute Feynman diagrams, which implicitly leads to a fundamental length scale $\Lambda^{-1}$ below which all phenomena are ignored. The old belief was therefore that the simplest, and most elegant, Lorentz-invariant way of introducing $\Lambda$ is through noncommuting spacetime "coordinates" $\hat x^i$.¹

The ideas of noncommutative geometry were revived in the 1980s by the mathematicians Connes, and Woronowicz and Drinfel'd, who generalized the notion of a differential structure to the noncommutative setting [3], i.e. to arbitrary $C^*$-algebras, and also to quantum groups and matrix pseudo-groups. Along with the definition of a generalized integration [4], this led to an operator algebraic description of (noncommutative) spacetimes (based entirely on algebras of "functions"), and it enables one to define Yang–Mills gauge theories on a large class of noncommutative spaces. A concrete example of physics in noncommutative spacetime is Yang–Mills theory on a noncommutative torus [4]. For quite some time, the physical applications were based on geometric interpretations of the standard model and its various fields and coupling constants (the so-called Connes–Lott model) [5]. Other quantum field theories were also studied along these lines (see for example [6]). Gravity was also eventually introduced in a unifying way [7]. The central idea behind these approaches was to use a modified form of the Kaluza–Klein mechanism in which the hidden dimensions are replaced by noncommutative structures [8]. For instance, in this interpretation of the standard model [5] the Higgs field is a discrete $\mathbb Z_2$ gauge field on a noncommutative space, regarded as an internal Kaluza–Klein type excitation. This led to an automatic proof of the Higgs mechanism, independently of the details of the Higgs potential. The input parameters are the masses of all quarks and leptons, while the Higgs mass is a prediction of the model. However, this approach suffered many weaknesses and eventually died out. Most glaring was the problem that quantum radiative corrections could not be incorporated in order to give satisfactory predictions. Nevertheless, the model led to a revival of Snyder's idea that classical general relativity would break down at the Planck scale because spacetime would no longer be described by a differentiable manifold [9]. At these length scales quantum gravitational fluctuations become large and cannot be ignored [10].

¹ However, as we will discuss later on, this old idea is too naive and spacetime noncommutativity, at least in the form (1.1), does not serve as an ultraviolet regulator.
More concrete evidence for spacetime noncommutativity came from string theory, at present the best candidate for a quantum theory of gravity, which in the 1980s raised precisely the same sort of expectations about the structure of spacetime at short distances. Because strings have a finite intrinsic length scale $\ell_s$, if one uses only string states as probes of short distance structure, then it is not possible to observe distances smaller than $\ell_s$. In fact, based on the analysis of very high-energy string scattering amplitudes [11], string-modified Heisenberg uncertainty relations have been postulated in the form

$$\Delta x = \frac12\left(\frac{\hbar}{\Delta p} + \ell_s^2\,\Delta p\right) . \tag{1.3}$$

When $\ell_s = 0$, relation (1.3) gives the usual quantum mechanical prediction that the spatial extent of an object decreases as its momentum grows. However, from (1.3) it follows that the size of a string grows with its energy. Furthermore, minimizing (1.3) with respect to $\Delta p$ yields an absolute lower bound on the measurability of lengths in the spacetime, $(\Delta x)_{\rm min} = \ell_s$ (in units where $\hbar = 1$).² Thus string theory gives an explicit realization of the notion of the smearing out of spacetime coordinates as described above. More generally, spacetime uncertainty relations have been postulated in the form [12]

$$\Delta x^i\,\Delta x^j = \ell_p^2 , \tag{1.4}$$

where $\ell_p$ is the Planck length of the spacetime. Thus the spacetime configurations are smeared out and the notion of a "point" becomes meaningless. In the low-energy limit $\ell_p\to0$, one recovers the usual classical spacetime with commuting coordinates at large distance scales.

² This bound can in fact be lowered to the 11-dimensional Planck length when one uses D0-branes as probes of short distance spacetime structure. This will be explained further in the next subsection.
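A quick numerical check of the minimization of (1.3) may be useful. The NumPy sketch below is only an illustration: it assumes units with $\hbar = 1$ by default, an arbitrary value of $\ell_s$, and hypothetical helper names `delta_x` and `minimal_length`. Setting $d(\Delta x)/d(\Delta p) = 0$ gives $\Delta p = \sqrt\hbar/\ell_s$ and hence $(\Delta x)_{\rm min} = \sqrt\hbar\,\ell_s$, which reduces to $\ell_s$ when $\hbar = 1$; a brute-force scan over a logarithmic grid of $\Delta p$ should land on the same value.

```python
import numpy as np

def delta_x(delta_p, ell_s, hbar=1.0):
    """Right-hand side of the string-modified uncertainty relation (1.3)."""
    return 0.5 * (hbar / delta_p + ell_s ** 2 * delta_p)

def minimal_length(ell_s, hbar=1.0):
    """Analytic minimum of (1.3): d(dx)/d(dp) = 0 gives dp* = sqrt(hbar)/ell_s."""
    dp_star = np.sqrt(hbar) / ell_s
    return dp_star, delta_x(dp_star, ell_s, hbar)

ell_s = 0.3                               # illustrative string length (hbar = 1 units)
dp = np.logspace(-3, 3, 200_001)          # dense logarithmic grid of momentum spreads
dx = delta_x(dp, ell_s)

dp_star, dx_min = minimal_length(ell_s)
print(dp_star, dx_min)                    # dx_min equals ell_s up to rounding
print(dp[np.argmin(dx)], dx.min())        # brute-force grid minimum agrees closely
```

Writing $\Delta p = e^u\sqrt\hbar/\ell_s$ turns (1.3) into $\Delta x = \sqrt\hbar\,\ell_s\cosh u$, which makes the bound manifest: momenta far above or below $\sqrt\hbar/\ell_s$ only increase $\Delta x$.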
The apparent need in string theory for a description of spacetime in terms of noncommutative geometry is actually even stronger than at first sight. This is because of the notion of quantum geometry, which may be defined as the appropriate modification of classical general relativity implied by string theory. One instance of this is the quantum $T$-duality symmetry of strings on a toroidal compactification [13]. Consider, for example, closed strings compactified on a circle $S^1$ of radius $R$. Then $T$-duality maps this string theory onto one with target space the circle of dual radius $\tilde R = \ell_s^2/R$, and at the same time interchanges the Kaluza–Klein momenta of the strings with their winding numbers around the $S^1$ in the spectrum of the quantum string theory. Because of this stringy symmetry, the moduli space of string theories with target space $S^1$ is parametrized by radii $R \geq \ell_s$ (rather than the classical $R \geq 0$), and very small circles are unobservable because the corresponding string theory can be mapped onto a completely equivalent one living in an $S^1$ of very large radius. This has led to a mathematically rigorous study of duality symmetries [14–16] using the techniques of noncommutative geometry. The phenomenon of mirror symmetry can also be captured in this formalism, which is based primarily on the geometry of the underlying worldsheet superconformal field theories [17]. The main goal of these analyses is the construction of an infinite-dimensional noncommutative "effective target space" on which duality is realized as a true symmetry, i.e. as an isometry of an appropriate Riemannian geometry. In this framework, a duality transformation has a simple and natural interpretation as a change of "coordinates" inducing the appropriate change of metric. It is inspired in large part by Witten's old observation [18] that the de Rham complex of a manifold can be reconstructed from the geometry of two-dimensional supersymmetric $\sigma$-models with target space the given manifold.
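The statement that $T$-duality maps $R\mapsto\tilde R = \ell_s^2/R$ while exchanging momenta and windings can be checked on the zero-mode part of the closed string spectrum. The sketch below assumes the standard normalization $\alpha' = \ell_s^2$ and keeps only the momentum/winding contribution $(n/R)^2 + (wR/\ell_s^2)^2$ to the mass squared; this textbook formula is not written out in the text above, the oscillator terms (which the duality leaves untouched) are omitted, and the function names are hypothetical. Exact rational arithmetic via `fractions.Fraction` avoids spurious floating-point mismatches.

```python
from fractions import Fraction

def zero_mode_mass2(n, w, R, ell_s):
    """Momentum/winding part of M^2 for a closed string on S^1 of radius R:
    (n/R)^2 + (w R / ell_s^2)^2, with n the Kaluza-Klein momentum and w the winding."""
    return (Fraction(n) / R) ** 2 + (Fraction(w) * R / ell_s ** 2) ** 2

def zero_mode_spectrum(R, ell_s, cutoff=5):
    """Sorted zero-mode contributions for all |n|, |w| <= cutoff."""
    return sorted(zero_mode_mass2(n, w, R, ell_s)
                  for n in range(-cutoff, cutoff + 1)
                  for w in range(-cutoff, cutoff + 1))

ell_s = Fraction(1)
R = Fraction(1, 3)            # a "small" circle, R < ell_s
R_dual = ell_s ** 2 / R       # T-dual radius, here 3

# The two spectra coincide exactly
print(zero_mode_spectrum(R, ell_s) == zero_mode_spectrum(R_dual, ell_s))        # True
# State by state, the duality swaps the momentum and winding labels
print(zero_mode_mass2(2, 1, R, ell_s) == zero_mode_mass2(1, 2, R_dual, ell_s))  # True
```

The self-dual radius $R = \ell_s$ is the fixed point of the map, which is why radii below $\ell_s$ add no new theories to the moduli space.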
A crucial ingredient of this construction is the set of properties possessed by the closed string vertex operator algebra, which in a particular low-energy limit has the structure of a deformation algebra of functions on the target space [16]. This sort of deformation is very similar to what appears in Witten's open string field theory [19], which constitutes the original appearance of noncommutative geometry in string theory. The relationships between closed string theory and noncommutative geometry are reviewed in [20]. Other early aspects of the noncommutative geometry of strings may be found in [21]. Despite these successes, up until recently there have remained two main gaps in the understanding of the role of noncommutative geometry in string theory:

• While most of the formalism deals with closed strings, the role of open strings was previously not clear.
• There is no natural dynamical origin for the occurrence of noncommutative generalizations of field theories, and in particular of Yang–Mills theory on a noncommutative space.

1.2. Matrix models

The answers to these two points are provided by open string degrees of freedom known as D-branes [22], which are fixed hypersurfaces in spacetime onto which the endpoints of strings can attach. It was realized very early on in studies of the physics of D-branes that their low-energy effective field theory has a configuration space which is described in terms of noncommuting, matrix-valued spacetime coordinate fields [23]. This has led to the Matrix theory conjecture [24] and also the so-called IIB matrix model [25], both of which propose nonperturbative approaches to superstring theories. The latter matrix model is obtained by dimensionally reducing ordinary Yang–Mills theory
to a point, and its bosonic part is given by the D-instanton action

$$S_{\rm IIB} = -\frac{1}{4g^2}\sum_{i\neq j}{\rm tr}\,[X^i, X^j]^2 , \tag{1.5}$$

where $X^i$, $i = 1,\dots,D$, are $N\times N$ Hermitian matrices whose entries are c-numbers. The global minimum of the action (1.5) is given by the equation $[X^i, X^j] = 0$,³ so that the matrices $X^i$ are simultaneously diagonalizable in the ground state. Their eigenvalues represent the collective coordinates of the individual D-branes, and so at tree level we obtain an ordinary spacetime. However, the quantum fluctuations about the classical minima give a spacetime whose coordinates are described by noncommuting matrices. The noncommutative geometry that arises in this way is due to the short open strings which connect the individual D-branes to one another [23]. Because of these excitations, D-branes can probe Planckian distances in spacetime, at which their worldvolume field theories are drastically altered by quantum gravitational effects [26]. Furthermore, the matrix noncommutativity of the target space of multiple D-brane systems agrees with the forms of the string-modified uncertainty relations [27]. A more concrete connection to noncommutative geometry came from studying the toroidal compactifications of the matrix model (1.5) [28]. It was shown that the most general solutions $X^i$ to the so-called quotient conditions for toroidal compactification are given by gauge connections on a noncommutative torus. Substituting these $X^i$'s back into the D-instanton action gives rise to Yang–Mills theory on a dual noncommutative torus. Thus, these matrix models naturally lead to noncommutative Yang–Mills theory as their effective field theories, and noncommutative geometry is now believed to be an important aspect of the nonperturbative dynamics of superstring theory (and M-theory).
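The vacuum structure of (1.5) is easy to probe numerically at finite $N$. In the NumPy sketch below the coupling $g = 1$, the sizes $D$ and $N$, the random seed and the helper names are arbitrary choices made for illustration. Since $[X^i, X^j]$ is anti-Hermitian for Hermitian $X^i$, each term $-{\rm tr}\,[X^i,X^j]^2$ is non-negative, so $S_{\rm IIB}\geq0$ with equality exactly when all the $X^i$ commute.

```python
import numpy as np

def iib_bosonic_action(X, g=1.0):
    """Bosonic part of (1.5): S = -(1/(4 g^2)) * sum_{i != j} tr([X^i, X^j]^2)."""
    S = 0.0
    for i, Xi in enumerate(X):
        for j, Xj in enumerate(X):
            if i != j:
                C = Xi @ Xj - Xj @ Xi              # anti-Hermitian for Hermitian Xi, Xj
                S -= np.trace(C @ C).real / (4 * g ** 2)
    return S

def random_hermitian(N, rng):
    A = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
    return (A + A.conj().T) / 2

rng = np.random.default_rng(0)
D, N = 4, 5

# Commuting (simultaneously diagonal) matrices: eigenvalues = D-brane positions
X_vac = [np.diag(rng.normal(size=N)) for _ in range(D)]
# A U(N) rotation of a vacuum is still a vacuum, up to rounding
U, _ = np.linalg.qr(random_hermitian(N, rng) + 1j * np.eye(N))
X_rot = [U @ Xi @ U.conj().T for Xi in X_vac]
# Generic Hermitian matrices do not commute
X_gen = [random_hermitian(N, rng) for _ in range(D)]

print(iib_bosonic_action(X_vac))   # zero
print(iib_bosonic_action(X_rot))   # zero up to floating-point rounding
print(iib_bosonic_action(X_gen))   # strictly positive
```

Any set of commuting matrices, diagonal or rotated into a common non-diagonal basis, is a classical vacuum whose eigenvalues play the role of brane positions, whereas generic noncommuting configurations cost action.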
The noncommutativity was interpreted as the effect of turning on the light-like component $C_{-ij}$ of the background three-form field of 11-dimensional supergravity wrapped on cycles of a torus, through the identification [28]

$$(\theta^{-1})_{ij} = R\oint dx^i\wedge dx^j\;C_{-ij} , \tag{1.6}$$

where $R = \oint dx^-$ (here $\theta^{ij}$ denote the dimensionless noncommutativity parameters). This identification holds in the scaling limit that defines Matrix theory via discrete light-cone quantization [29]. In the usual reduction of M-theory to Type II superstring theory [30], the three-form field $C$ becomes the Neveu–Schwarz two-form field $B$, with $\theta\sim B^{-1}$. This noncommutativity has subsequently been understood directly in the context of open string quantization [31–34], so that noncommutative geometry plays a role in the quantum dynamics of open strings in background fields and in the presence of D-branes. The relationship between the matrix noncommutativity of D-brane field theory and the noncommutativity due to background supergravity fields is clarified in [35]. At present, noncommutative Yang–Mills theory is believed to be a useful tool in the classification of string backgrounds, the best examples being the discoveries of noncommutative instantons for $D = 4$ [36], and of solitons in $(2+1)$-dimensional noncommutative gauge theory [37,38]. Other stringy topological defects in this latter context may also be constructed [39].

³ Other classical minima include solutions with nonvanishing but constant commutator. This observation will be used in Section 7 to establish a correspondence between the matrix model (1.5) and noncommutative Yang–Mills theory.
1.3. Strong magnetic fields

To quantify some of the previous remarks, we will now illustrate how noncommutativity emerges in a simple quantum mechanical example, the Landau problem [40]. Consider a charged particle of mass $m$ moving in the plane $\vec x = (x^1, x^2)$ in the presence of a constant, perpendicular magnetic field of magnitude $B$. The Lagrangian is

$$L_m = \frac m2\,\dot{\vec x}^{\,2} - \dot{\vec x}\cdot\vec A , \tag{1.7}$$

where $A_i = -\frac B2\,\epsilon_{ij}x^j$ is the corresponding vector potential. The Hamiltonian is $H_m = \frac1{2m}\vec\pi^{\,2}$, where $\vec\pi = m\dot{\vec x} = \vec p + \vec A$ is the gauge invariant mechanical momentum (which is a physical observable), while $\vec p$ is the (gauge variant) canonical momentum. From the canonical commutation relations it follows that the physical momentum operators have the nonvanishing quantum commutators

$$[\hat\pi_i, \hat\pi_j] = iB\,\epsilon_{ij} , \tag{1.8}$$

and so the momentum space in the presence of a background magnetic field $\vec B$ becomes noncommutative. The points in momentum space are replaced by Landau cells of area $B$, which serves as an infrared cutoff, i.e. $\vec\pi^{\,2}\geq B$. In this way the noncommutativity regularizes potentially divergent integrals such as $\int d^2\pi/\vec\pi^{\,2}\sim\ln B$.

Spatial noncommutativity arises in the limit $m\to0$, whereby the Landau Lagrangian becomes

$$L_0 = -\frac B2\,\dot x^i\epsilon_{ij}x^j . \tag{1.9}$$

This is a first order Lagrangian which is already expressed in phase space, with the spatial coordinates $x^1, x^2$ being the canonically conjugate variables, so that

$$[\hat x^i, \hat x^j] = \frac iB\,\epsilon^{ij} . \tag{1.10}$$

This limiting theory is topological, in that the corresponding Hamiltonian vanishes and there are no propagating degrees of freedom. Note that the space noncommutativity (1.10) alternatively follows from the momentum noncommutativity (1.8) by imposing the first class constraints $\vec\pi\approx\vec0$. The limit $m\to0$ thereby reduces the four-dimensional phase space to a two-dimensional one which coincides with the configuration space of the model. Such a degeneracy is typical in topological quantum field theories [41]. The limit $m\to0$ with fixed $B$ is actually the projection of the quantum mechanical spectrum of this system onto the lowest Landau level (the mass gap between Landau levels is $B/m$). The same projection can be done in the limit $B\to\infty$ of strong magnetic field with fixed mass $m$.

This simple example has a more or less direct analog in string theory [42]. Consider bosonic strings moving in flat Euclidean space with metric $g_{ij}$, in the presence of a constant Neveu–Schwarz two-form $B$-field and with D$p$-branes. The $B$-field is equivalent to a constant magnetic field on the branes, and it can be gauged away in the directions transverse to the D$p$-brane worldvolume. The (Euclidean) worldsheet action is

$$S = \frac1{4\pi\alpha'}\int_\Sigma\Big(g_{ij}\,\partial_ax^i\,\partial^ax^j - 2\pi i\alpha'\,B_{ij}\,\epsilon^{ab}\,\partial_ax^i\,\partial_bx^j\Big) , \tag{1.11}$$

where $\alpha' = \ell_s^2$, $\Sigma$ is the string worldsheet, and $x^i$ is the embedding function of the strings into flat space. The term involving the $B$-field in (1.11) is a total derivative, and for open strings it can be
written as an integral over the boundary of the string worldsheet,

$$S_{\partial\Sigma} = -\frac i2\oint_{\partial\Sigma}B_{ij}\,x^i\,\partial_tx^j , \tag{1.12}$$

where $t$ is the coordinate of $\partial\Sigma$. Consider now the correlated low-energy limit $g_{ij}\sim(\alpha')^2\sim\epsilon\to0$, with $B_{ij}$ fixed [34]. Then the bulk kinetic terms for the $x^i$ in (1.11) vanish, and the worldsheet theory is topological. All that remains are the boundary degrees of freedom of the open strings, which are governed by the action (1.12). Then, ignoring the fact that $x^i(t)$ is the boundary value of a string, the one-dimensional action (1.12) coincides with the Landau action describing the motion of electrons in a strong magnetic field. From this we may infer the noncommutativity $[\hat x^i, \hat x^j] = i\,(B^{-1})^{ij}$ of the coordinates of the endpoints of the open strings which live in the D$p$-brane worldvolume. The correlated low-energy limit $\epsilon\to0$ taken above effectively decouples the closed string dynamics from the open string dynamics. It also decouples the massive open string states, so that the string theory reduces to a field theory. Only the endpoint degrees of freedom remain, and they describe a noncommutative geometry.⁴

1.4. Outline and omissions

When the open string $\sigma$-model (1.11) is coupled to gauge field degrees of freedom which live on the worldsheet boundary $\partial\Sigma$, the low-energy effective field theory may be described by noncommutative Yang–Mills theory (modulo a certain factorization equivalence that we shall describe later on) [34]. Furthermore, it has been shown independently that the IIB matrix model with D-brane backgrounds gives a natural regularization of noncommutative Yang–Mills theory to all orders of perturbation theory, with momentum space noncommutativity as in (1.8) [43]. The fact that quantum field theory on a noncommutative space arises naturally in string theory and Matrix theory strongly suggests that spacetime noncommutativity is a general feature of a unified theory of quantum gravity. The goal of these lecture notes is to provide a self-contained, pedagogical introduction to the basic aspects of noncommutative field theories, and in particular noncommutative Yang–Mills theory.
We shall pay particular attention to those aspects of these quantum field theories which may be regarded as "stringy". Noncommutative field theories have many novel properties which are not exhibited by conventional quantum field theories. They should be properly understood as lying somewhere between ordinary field theory and string theory, and the hope is that from these models we may learn something about string theory and the classification of its backgrounds, using the somewhat simpler techniques of quantum field theory. Our presentation will be for the most part at the field theoretical level, but we shall frequently indicate how the exotic properties of noncommutative field theories are intimately tied to string theory.

The organization of the remainder of this paper is as follows. In Section 2 we shall introduce the procedure of Weyl quantization, which is a useful technique for translating an ordinary field theory into a noncommutative one. In Section 3 we shall take a very basic look at the perturbative expansion of noncommutative field theories, using a simple scalar model to illustrate the exotic properties that one uncovers. In Section 4 we introduce noncommutative Yang–Mills theory, and discuss its observables and some of its perturbative properties. In Section 5 we will describe the classic and very important example of the noncommutative torus and gauge theories defined thereon. In Section 6 we shall derive a very important geometrical equivalence between noncommutative Yang–Mills theories known as Morita equivalence,⁵ which we will see is the analog of the $T$-duality symmetry of toroidally compactified open strings. In Section 7 we shall take a look at the matrix model formulations of noncommutative gauge theories and a nonperturbative lattice regularization of these models. Finally, in Section 8 we will describe in some detail the local and global properties of the gauge group of noncommutative Yang–Mills theory.

⁴ The situation is actually a little more subtle than that described above, since in the present case the coordinates $x^i(t)$ do not simply describe the motion of particles but are rather constrained to lie at the ends of strings. However, the general picture that the $x^i(t)$ become noncommuting operators remains valid [34].

We conclude this introductory section with a brief list of the major omissions in the present review article, and places where the interested reader may find these topics. Other general reviews on the subject, with very different emphasis than the present article, may be found in [44]. Solitons and instantons in noncommutative field theory are reviewed in [45]. More general star-products than the ones described here can be found in [46] and references therein. The Seiberg–Witten map was introduced in [34] and has been the focal point of many works; see [47] for the recent exact solution, and references therein for previous analyses. The stringy extension of noncommutative gauge theory, defined by the noncommutative Born–Infeld action, is analysed in [34,48,49], for example. The relationship between noncommutative field theory and string field theory is reviewed in [50]. A recent review of the more phenomenological aspects of noncommutative field theory may be found in [51].
Finally, aspects of the $\theta$-expanded approach to noncommutative gauge field theory, which among other things enables a construction of noncommutative Yang–Mills theory for arbitrary gauge groups, may be found in [52].

2. Weyl quantization and the Groenewold–Moyal product

As we mentioned in Section 1.1, many of the general ideas behind noncommutative geometry are inspired in large part by the foundations of quantum mechanics. Within the framework of canonical quantization, Weyl introduced an elegant prescription for associating a quantum operator to a classical function of the phase space variables [53]. This technique provides a systematic way to describe noncommutative spaces in general and to study field theories defined thereon. In this section we shall introduce this formalism, which will play a central role in most of our subsequent analysis. Although we will focus solely on the commutators (1.1) with constant $\theta^{ij}$, Weyl quantization also works for more general commutation relations.

2.1. Weyl operators

Let us consider the commutative algebra of (possibly complex-valued) functions on $D$-dimensional Euclidean space $\mathbb R^D$, with product defined by the usual pointwise multiplication of functions. We will assume that all fields defined on $\mathbb R^D$ live in an appropriate Schwartz space of functions of sufficiently rapid decrease at infinity [54], i.e. those functions whose derivatives to arbitrary order vanish at infinity in both position and momentum space. This condition can be characterized, for example, by the requirements

$$\sup_x\,(1+|x|^2)^{k+n_1+\cdots+n_D}\,\big|\partial_1^{n_1}\cdots\partial_D^{n_D}f(x)\big|^2 < \infty \tag{2.1}$$

for every set of integers $k, n_i\in\mathbb Z^+$, where $\partial_i = \partial/\partial x^i$. In that case, the algebra of functions may be given the structure of a Banach space by defining the $L^\infty$-norm

$$\|f\|_\infty = \sup_x\,|f(x)| . \tag{2.2}$$

⁵ Morita equivalence is actually an algebraic rather than geometric equivalence. Here we mean gauge Morita equivalence, which also maps geometrical structures defined in the gauge theory.
The Schwartz condition also implies that any function $f(x)$ may be described by its Fourier transform

$$\tilde f(k) = \int d^Dx\;e^{-ik_ix^i}\,f(x) , \tag{2.3}$$

with $\tilde f(-k) = \tilde f(k)^*$ whenever $f(x)$ is real-valued. We define a noncommutative space as described in Section 1.1 by replacing the local coordinates $x^i$ of $\mathbb R^D$ by Hermitian operators $\hat x^i$ obeying the commutation relations (1.1). The $\hat x^i$ then generate a noncommutative algebra of operators. Weyl quantization provides a one-to-one correspondence between the algebra of fields on $\mathbb R^D$ and this ring of operators, and it may be thought of as an analog of the operator-state correspondence of local quantum field theory. Given the function $f(x)$ and its corresponding Fourier coefficients (2.3), we introduce its Weyl symbol by

$$\hat{\mathcal W}[f] = \int\frac{d^Dk}{(2\pi)^D}\,\tilde f(k)\,e^{ik_i\hat x^i} , \tag{2.4}$$

where we have chosen the symmetric Weyl operator ordering prescription. For example, $\hat{\mathcal W}[e^{ik_ix^i}] = e^{ik_i\hat x^i}$. The Weyl operator $\hat{\mathcal W}[f]$ is Hermitian if $f(x)$ is real-valued. We can write (2.4) in terms of an explicit map $\hat\Delta(x)$ between operators and fields by using (2.3) to get

$$\hat{\mathcal W}[f] = \int d^Dx\;f(x)\,\hat\Delta(x) , \tag{2.5}$$

where
$$\hat\Delta(x) = \int\frac{d^Dk}{(2\pi)^D}\,e^{ik_i\hat x^i}\,e^{-ik_ix^i} . \tag{2.6}$$

Operator (2.6) is Hermitian, $\hat\Delta(x)^\dagger = \hat\Delta(x)$, and it describes a mixed basis for operators and fields on spacetime. In this way we may interpret the field $f(x)$ as the coordinate space representation of the Weyl operator $\hat{\mathcal W}[f]$. Note that in the commutative case $\theta^{ij} = 0$, the map (2.6) reduces trivially to a delta-function $\delta^D(\hat x - x)$ and $\hat{\mathcal W}[f]\big|_{\theta=0} = f(\hat x)$. But generally, by the Baker–Campbell–Hausdorff formula, for $\theta^{ij}\neq0$ it is a highly nontrivial field operator.

We may introduce "derivatives" of operators through an anti-Hermitian linear derivation $\hat\partial_i$ which is defined by the commutation relations

$$[\hat\partial_i, \hat x^j] = \delta_i^{\ j} , \qquad [\hat\partial_i, \hat\partial_j] = 0 . \tag{2.7}$$
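A worked example may clarify what the symmetric ordering in (2.4) means in practice. This is our own illustration rather than a formula from the text: expanding both sides of $\hat{\mathcal W}[e^{ik_ix^i}] = e^{ik_i\hat x^i}$ to second order in $k$ and comparing the coefficients of $k_1k_2$ gives, after using (1.1),

$$\hat{\mathcal W}\big[x^1x^2\big] = \tfrac12\big(\hat x^1\hat x^2 + \hat x^2\hat x^1\big) = \hat x^1\hat x^2 - \tfrac i2\,\theta^{12} .$$

Thus the Weyl operator of the ordinary product $x^1x^2$ differs from the operator product $\hat{\mathcal W}[x^1]\,\hat{\mathcal W}[x^2] = \hat x^1\hat x^2$ by a c-number proportional to $\theta^{12}$; keeping track of such corrections is precisely the job of the star-product introduced in Section 2.2.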
Using (2.7), it is straightforward to show that

$$[\hat\partial_i, \hat\Delta(x)] = -\partial_i\hat\Delta(x) , \tag{2.8}$$
which upon integration by parts in (2.5) leads to

$$[\hat\partial_i, \hat{\mathcal W}[f]] = \int d^Dx\;\partial_if(x)\,\hat\Delta(x) = \hat{\mathcal W}[\partial_if] . \tag{2.9}$$
From (2.8) it also follows that translation generators can be represented by unitary operators $e^{v^i\hat\partial_i}$, $v\in\mathbb R^D$, with

$$e^{v^i\hat\partial_i}\,\hat\Delta(x)\,e^{-v^i\hat\partial_i} = \hat\Delta(x - v) . \tag{2.10}$$
Property (2.10) implies that any cyclic trace Tr defined on the algebra of Weyl operators has the feature that ${\rm Tr}\,\hat\Delta(x)$ is independent of $x\in\mathbb R^D$. From (2.5) it follows that the trace Tr is uniquely given by an integration over spacetime,

$${\rm Tr}\,\hat{\mathcal W}[f] = \int d^Dx\;f(x) , \tag{2.11}$$

where we have chosen the normalization ${\rm Tr}\,\hat\Delta(x) = 1$. In this sense, the operator trace Tr is equivalent to integration over the noncommuting coordinates $\hat x^i$. Note that $\hat\Delta(x)$ is not an element of the algebra of fields and so its trace is not defined by (2.11). It should simply be thought of as an object which interpolates between fields on spacetime and Weyl operators, whose trace is fixed by the given normalization.

The products of operators $\hat\Delta(x)$ at distinct points may be computed as follows. Using the Baker–Campbell–Hausdorff formula,⁶

$$e^{ik_i\hat x^i}\,e^{ik'_i\hat x^i} = e^{-\frac i2\theta^{ij}k_ik'_j}\,e^{i(k+k')_i\hat x^i} , \tag{2.12}$$
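along with (2.5), one may easily derive

$$\hat\Delta(x)\,\hat\Delta(y) = \int\frac{d^Dk}{(2\pi)^D}\frac{d^Dk'}{(2\pi)^D}\,e^{i(k+k')_i\hat x^i}\,e^{-\frac i2\theta^{ij}k_ik'_j}\,e^{-ik_ix^i-ik'_iy^i} = \int\frac{d^Dk}{(2\pi)^D}\frac{d^Dk'}{(2\pi)^D}\int d^Dz\;e^{i(k+k')_iz^i}\,\hat\Delta(z)\,e^{-\frac i2\theta^{ij}k_ik'_j}\,e^{-ik_ix^i-ik'_iy^i} . \tag{2.13}$$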
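The phase in (2.12) comes from a single commutator. Setting $P = ik_i\hat x^i$ and $Q = ik'_j\hat x^j$, the relations (1.1) make $[P,Q]$ a c-number, and for a central commutator the Baker–Campbell–Hausdorff series stops after its first correction:

$$[P,Q] = -k_ik'_j\,[\hat x^i,\hat x^j] = -i\,\theta^{ij}k_ik'_j , \qquad e^Pe^Q = e^{P+Q+\frac12[P,Q]} = e^{-\frac i2\theta^{ij}k_ik'_j}\,e^{i(k+k')_i\hat x^i} .$$

By antisymmetry of $\theta^{ij}$ the phase disappears when $k'$ is parallel to $k$, so plane waves along a single direction multiply as in the commutative case.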
If $\theta$ is an invertible matrix (this necessarily requires that the spacetime dimension $D$ be even), then one may explicitly carry out the Gaussian integrations over the momenta $k$ and $k'$ in (2.13) to get

$$\hat\Delta(x)\,\hat\Delta(y) = \frac{1}{\pi^D\,|\det\theta|}\int d^Dz\;\hat\Delta(z)\,e^{-2i(\theta^{-1})_{ij}(x-z)^i(y-z)^j} . \tag{2.14}$$

In particular, using the trace normalization and the antisymmetry of $\theta^{-1}$, from (2.14) it follows that the operators $\hat\Delta(x)$ for $x\in\mathbb R^D$ form an orthonormal set,

$${\rm Tr}\big(\hat\Delta(x)\,\hat\Delta(y)\big) = \delta^D(x-y) . \tag{2.15}$$
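The step from (2.14) to (2.15) can be spelled out. In the trace of (2.14), with ${\rm Tr}\,\hat\Delta(z) = 1$, the part of the exponent quadratic in $z$ drops out by antisymmetry of $\theta^{-1}$:

$$(\theta^{-1})_{ij}(x-z)^i(y-z)^j = (\theta^{-1})_{ij}\,x^iy^j + (\theta^{-1})_{ij}\,(y-x)^iz^j ,$$

so the $z$-integral is a plane wave and produces a delta-function:

$${\rm Tr}\big(\hat\Delta(x)\,\hat\Delta(y)\big) = \frac{e^{-2i(\theta^{-1})_{ij}x^iy^j}}{\pi^D|\det\theta|}\int d^Dz\;e^{-2i(\theta^{-1})_{ij}(y-x)^iz^j} = \frac{(2\pi)^D\,e^{-2i(\theta^{-1})_{ij}x^iy^j}}{\pi^D|\det\theta|}\,\frac{|\det\theta|}{2^D}\,\delta^D(x-y) = \delta^D(x-y) ,$$

where the middle step rescales the delta-function, $\delta^D(2\theta^{-1}u) = 2^{-D}|\det\theta|\,\delta^D(u)$, and the last step uses that the remaining phase equals 1 at $x = y$.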
This orthonormality, along with (2.5), implies that the transformation $f(x)\mapsto\hat{\mathcal W}[f]$ is invertible, with inverse given by

$$f(x) = {\rm Tr}\big(\hat{\mathcal W}[f]\,\hat\Delta(x)\big) . \tag{2.16}$$

⁶ Going back to the quantum mechanical example in Section 1.3 of a particle in a constant magnetic field, relation (2.12) defines the algebra of magnetic translation operators for the Landau levels [55].
R.J. Szabo / Physics Reports 378 (2003) 207 – 299
The function f(x) obtained in this way from a quantum operator is usually called a Wigner distribution function [56]. Therefore, the map $\hat\Delta(x)$ provides a one-to-one correspondence between Wigner fields and Weyl operators. We shall refer to this as the Weyl–Wigner correspondence. For an explicit formula for (2.6) in terms of parity operators, see [57].

2.2. The star-product

Let us now consider the product of two Weyl operators $\hat W[f]$ and $\hat W[g]$ corresponding to functions f(x) and g(x). From (2.5), (2.14) and (2.15) it follows that the coordinate space representation of their product can be written as (for invertible θ)

$$\mathrm{Tr}\big(\hat W[f]\,\hat W[g]\,\hat\Delta(x)\big) = \frac{1}{\pi^D\,|\det\theta|}\int d^Dy\,d^Dz\;f(y)\,g(z)\,e^{-2i(\theta^{-1})_{ij}(x-y)^i(x-z)^j}\,. \tag{2.17}$$

Using (2.4), (2.3), and (2.12) we deduce that

$$\hat W[f]\,\hat W[g] = \hat W[f\star g]\,, \tag{2.18}$$

where we have introduced the Groenewold–Moyal star-product [58]

$$\begin{aligned} f(x)\star g(x) &= \int\frac{d^Dk}{(2\pi)^D}\,\frac{d^Dk'}{(2\pi)^D}\;\tilde f(k)\,\tilde g(k'-k)\,e^{-\frac{i}{2}\theta^{ij}k_ik'_j}\,e^{ik'_ix^i} \\ &= f(x)\,\exp\Big(\frac{i}{2}\,\overleftarrow{\partial_i}\,\theta^{ij}\,\overrightarrow{\partial_j}\Big)\,g(x) \\ &= f(x)\,g(x) + \sum_{n=1}^{\infty}\Big(\frac{i}{2}\Big)^n\frac{1}{n!}\,\theta^{i_1j_1}\cdots\theta^{i_nj_n}\;\partial_{i_1}\cdots\partial_{i_n}f(x)\;\partial_{j_1}\cdots\partial_{j_n}g(x)\,. \end{aligned} \tag{2.19}$$
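A minimal numerical sketch of how (2.19) acts on single plane waves. Setting $f = e^{ik\cdot x}$ and $g = e^{iq\cdot x}$, each left derivative gives $ik_i$ and each right derivative $iq_j$, so $e^{ik\cdot x}\star e^{iq\cdot x} = e^{-\frac{i}{2}k_i\theta^{ij}q_j}\,e^{i(k+q)\cdot x}$, the same phase as in (2.12): plane waves multiply exactly like the Weyl operators $e^{ik_i\hat x^i}$. The snippet stores a plane wave as a (coefficient, momentum) pair and checks associativity and the twisted commutation rule $e^{ik\cdot x}\star e^{iq\cdot x} = e^{-ik_i\theta^{ij}q_j}\,e^{iq\cdot x}\star e^{ik\cdot x}$. The value of θ (D = 2) and the helper names `wedge` and `star_plane_waves` are illustrative assumptions, not taken from the text.

```python
import cmath

# Illustrative constant noncommutativity matrix theta^{ij} in D = 2 (assumed value).
THETA = ((0.0, 0.7), (-0.7, 0.0))


def wedge(k, q, theta=THETA):
    """Return k_i theta^{ij} q_j."""
    return sum(k[i] * theta[i][j] * q[j] for i in range(len(k)) for j in range(len(q)))


def star_plane_waves(a, b, theta=THETA):
    """Star product of two plane waves, each stored as (coefficient, momentum).

    From (2.19): e^{ik.x} * e^{iq.x} = exp(-(i/2) k_i theta^{ij} q_j) e^{i(k+q).x}.
    """
    c1, k = a
    c2, q = b
    phase = cmath.exp(-0.5j * wedge(k, q, theta))
    return c1 * c2 * phase, tuple(ki + qi for ki, qi in zip(k, q))


if __name__ == "__main__":
    a = (1.0, (0.3, -1.2))
    b = (2.0, (0.5, 0.4))
    ab = star_plane_waves(a, b)
    ba = star_plane_waves(b, a)
    # The ratio of the two orderings should be the Weyl phase exp(-i k theta q).
    print(ab[0] / ba[0], cmath.exp(-1j * wedge(a[1], b[1])))
```

Because the phase is bilinear in the momenta, associativity of this "twisted" product is automatic; this is the plane-wave shadow of the associativity of (2.19).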
The star-product (2.19) is associative but noncommutative, and is defined for constant, possibly degenerate θ. For θ = 0 it reduces to the ordinary product of functions. It is a particular example of a star-product which is normally defined in deformation quantization as follows [59]. If A is an associative algebra over a field K,⁷ then a deformation of A is a set of formal power series $\sum_n\hbar^nf_n$, which form an algebra $A[[\hbar]]$ over the ring of formal power series $K[[\hbar]]$ in a variable ħ. The deformed algebra has the property that $A[[\hbar]]/\hbar A[[\hbar]]\cong A$, i.e. the order $\hbar^0$ parts form the original undeformed algebra. One can then define a new multiplication law for the deformed algebra $A[[\hbar]]$. For $f,g\in A$, this is given by the associative $K[[\hbar]]$-bilinear product

$$f\star_\hbar g = fg + \sum_{n=1}^{\infty}\hbar^n\,C_n(f,g)\,, \tag{2.20}$$

which may be extended to the whole of $A[[\hbar]]$ by linearity. The $C_n$'s are known as Hochschild two-cochains of the algebra A. The particular star-product (2.19) defines the essentially unique (modulo redefinitions of f and g that are local order by order in θ) deformation of the algebra of functions on $\mathbb{R}^D$ to a noncommutative associative algebra whose product coincides with the Poisson bracket of functions (with respect to the symplectic form θ) to leading order, i.e. $f\star g = fg + \frac{i}{2}\theta^{ij}\partial_if\,\partial_jg + O(\theta^2)$, and whose coefficients in a power series expansion in θ are local differential expressions which are bilinear in f and g [59]. Note that the Moyal commutator bracket with the local coordinates $x^i$ can be used to generate derivatives as

$$x^i\star f(x) - f(x)\star x^i = i\,\theta^{ij}\,\partial_jf(x)\,. \tag{2.21}$$

⁷ Associativity is not required here. In fact, the following construction applies to Lie algebras as well, with all products understood as Lie brackets.
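In general, the star-commutator of two functions can be represented in a compact form by using a bi-differential operator as in (2.19),

$$f(x)\star g(x) - g(x)\star f(x) = 2i\,f(x)\,\sin\Big(\frac{1}{2}\,\overleftarrow{\partial_i}\,\theta^{ij}\,\overrightarrow{\partial_j}\Big)\,g(x)\,, \tag{2.22}$$

while the star-anticommutator may be written as

$$f(x)\star g(x) + g(x)\star f(x) = 2\,f(x)\,\cos\Big(\frac{1}{2}\,\overleftarrow{\partial_i}\,\theta^{ij}\,\overrightarrow{\partial_j}\Big)\,g(x)\,. \tag{2.23}$$

A useful extension of formula (2.19) is

$$f_1(x_1)\star\cdots\star f_n(x_n) = \prod_{a<b}\exp\Big(\frac{i}{2}\,\theta^{ij}\,\frac{\partial}{\partial x_a^i}\,\frac{\partial}{\partial x_b^j}\Big)\,f_1(x_1)\cdots f_n(x_n)\,. \tag{2.24}$$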
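Because every term of (2.19) differentiates both factors, the series terminates when f and g are polynomials, so the star-product of polynomials can be computed exactly. The following sketch does this in D = 2 with $\theta^{12} = -\theta^{21} = \theta$, representing a polynomial as a dictionary `{(a, b): c}` for $c\,x^ay^b$ (a representation chosen here only for illustration; the helper names are hypothetical). It checks $x\star y - y\star x = i\theta$, the derivative rule (2.21) and associativity.

```python
from math import comb, factorial


def deriv(poly, rx, ry):
    """Apply d^rx/dx^rx d^ry/dy^ry to a polynomial {(a, b): c} meaning c x^a y^b."""
    out = {}
    for (a, b), c in poly.items():
        if a >= rx and b >= ry:
            coef = c
            for t in range(rx):
                coef *= a - t
            for t in range(ry):
                coef *= b - t
            key = (a - rx, b - ry)
            out[key] = out.get(key, 0) + coef
    return out


def mul(p, q):
    """Ordinary (commutative) product of two polynomials."""
    out = {}
    for (a1, b1), c1 in p.items():
        for (a2, b2), c2 in q.items():
            key = (a1 + a2, b1 + b2)
            out[key] = out.get(key, 0) + c1 * c2
    return out


def add(p, q, s=1):
    """Return p + s*q, dropping coefficients that cancel to (numerical) zero."""
    out = dict(p)
    for key, c in q.items():
        out[key] = out.get(key, 0) + s * c
    return {key: c for key, c in out.items() if abs(c) > 1e-12}


def degree(p):
    return max((a + b for (a, b) in p), default=0)


def moyal(f, g, theta):
    """Groenewold-Moyal product (2.19) in D = 2 with theta^{12} = -theta^{21} = theta.

    Since theta^{ij} d_i (x) d_j = theta (d_x (x) d_y - d_y (x) d_x), the n-th term is
    (i theta/2)^n / n! * sum_r C(n, r) (-1)^(n-r) (d_x^r d_y^(n-r) f)(d_x^(n-r) d_y^r g).
    """
    result = {}
    for n in range(min(degree(f), degree(g)) + 1):
        pref = (0.5j * theta) ** n / factorial(n)
        for r in range(n + 1):
            weight = comb(n, r) * (-1) ** (n - r) * pref
            term = mul(deriv(f, r, n - r), deriv(g, n - r, r))
            result = add(result, term, weight)
    return result


def star_commutator(f, g, theta):
    """f * g - g * f, cf. (2.22)."""
    return add(moyal(f, g, theta), moyal(g, f, theta), -1)


if __name__ == "__main__":
    x, y = {(1, 0): 1}, {(0, 1): 1}
    # [x, y]_* should be the constant i*theta = 0.3j.
    print(star_commutator(x, y, theta=0.3))
```

Floating-point coefficients are used for brevity; exact rational arithmetic would avoid the small tolerance needed when comparing results.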
Therefore, the spacetime noncommutativity may be encoded through ordinary products in the noncommutative C*-algebra of Weyl operators, or equivalently through the deformation of the product of the commutative C*-algebra of functions on spacetime to the noncommutative star-product. Note that by cyclicity of the operator trace, the integral

$$\mathrm{Tr}\big(\hat W[f_1]\cdots\hat W[f_n]\big) = \int d^Dx\;f_1(x)\star\cdots\star f_n(x) \tag{2.25}$$

is invariant under cyclic (but not arbitrary) permutations of the functions $f_a$. In particular,

$$\int d^Dx\;f(x)\star g(x) = \int d^Dx\;f(x)\,g(x)\,, \tag{2.26}$$
which follows for Schwartz functions upon integrating by parts over $\mathbb{R}^D$. The above quantization method can be generalized to more complicated situations whereby the commutators $[\hat x^i,\hat x^j]$ are not simply c-numbers [60]. The generic situation is whereby both the coordinate and conjugate momentum spaces are noncommutative in a correlated way. Then the commutators $[\hat x^i,\hat x^j]$, $[\hat x^i,\hat p_j]$ and $[\hat p_i,\hat p_j]$ are functions of $\hat x^i$ and $\hat p_i$, rather than just of $\hat x^i$, and thereby define an algebra of pseudo-differential operators on the noncommutative space. Such a situation arises in string theory when quantizing open strings in the presence of a nonconstant B-field [61], and it was the kind of noncommutative space that was considered originally in the Snyder construction [1]. If B is a closed two-form, dB = 0, then the associative star-product in these instances is given by the Kontsevich formula [62] for the deformation quantization associated with general Poisson structures, i.e. Poisson tensors which are in general nonconstant, obey the Jacobi identity, and may be degenerate. This formula admits an elegant representation in terms of the perturbative expansion of the Feynman path integral for a simple topological open string theory [63]. If B is not closed, then a straightforward use of the Kontsevich formula leads to a nonassociative bi-differential operator, the nonassociativity being controlled by dB. However, one can still use associative star-products within the framework of (noncommutative) gerbes. We shall not deal with these generalizations in this paper, but only
the simplest deformation described above which utilizes a noncommutative coordinate space and an independent, commutative momentum space. In the case of a constant and nondegenerate θ, the functional integral representation of the Kontsevich formula takes the simple form of that of a one-dimensional topological quantum field theory and the star-product (2.19) may be written as

$$f(x)\star g(x) = \Big\langle f\big(\xi(1)\big)\,g\big(\xi(0)\big)\,\delta^D\big(\xi(\pm\infty)-x\big)\Big\rangle = \int D\xi\;\delta^D\big(\xi(\pm\infty)-x\big)\,f\big(\xi(1)\big)\,g\big(\xi(0)\big)\,\exp\Big(\frac{i}{2}\int_{-\infty}^{\infty}dt\;\xi^i(t)\,(\theta^{-1})_{ij}\,\frac{d\xi^j(t)}{dt}\Big)\,. \tag{2.27}$$

Here the integral runs over paths $\xi:\mathbb{R}\to\mathbb{R}^D$ and it is understood as an expansion about the classical trajectories $\xi(t) = x$, which are time-independent because the Hamiltonian of theory (2.27) vanishes. Notice that the underlying Lagrangian of (2.27) coincides with that of the model of Section 1.3 projected onto the lowest Landau level. The beauty of this formula is that it involves ordinary products of the fields and is thereby more amenable to practical computations. It also lends a physical interpretation to the star-product. It does, however, require an appropriate regularization in order to make sense of its perturbation expansion [48].

In the present case the technique described in this section has proven to be an invaluable method for the study of noncommutative field theory. For instance, stable noncommutative solitons, which have no counterparts in ordinary field theory, have been constructed by representing the Weyl operator algebra on a multi-particle quantum mechanical Hilbert space [64,65]. The noncommutative soliton field equations may then be solved by any projection operator on this Hilbert space. We note, however, that the general construction presented above makes no reference to any particular representation of the Weyl operator algebra. Later on we shall work with explicit representations of this ring.

3. Noncommutative perturbation theory

In this section we will take a very basic look at the perturbative expansion of noncommutative quantum field theory.
To illustrate the general ideas, we shall consider a simple, massive Euclidean φ⁴ scalar field theory in D dimensions. To transform an ordinary scalar field theory into a noncommutative one, we may use the Weyl quantization procedure of the previous section. Written in terms of the Hermitian Weyl operator $\hat W[\phi]$ corresponding to a real scalar field φ(x) on $\mathbb{R}^D$, the action is

$$S^{(4)}[\phi] = \mathrm{Tr}\Big(\frac{1}{2}\,\big[\hat\partial_i,\hat W[\phi]\big]^2 + \frac{m^2}{2}\,\hat W[\phi]^2 + \frac{g^2}{4!}\,\hat W[\phi]^4\Big)\,, \tag{3.1}$$

and the path integral measure is taken to be the ordinary Feynman measure for the field φ(x) (this choice is dictated by the string theory applications). We may rewrite this action in coordinate space by using map (2.5) and property (2.18) to get

$$S^{(4)}[\phi] = \int d^Dx\;\Big[\frac{1}{2}\big(\partial_i\phi(x)\big)^2 + \frac{m^2}{2}\,\phi(x)^2 + \frac{g^2}{4!}\,\phi(x)\star\phi(x)\star\phi(x)\star\phi(x)\Big]\,. \tag{3.2}$$

We have used property (2.26) which implies that noncommutative field theory and ordinary field theory are identical at the level of free fields. In particular, the bare propagators are unchanged in
the noncommutative case. The changes come in the interaction terms, which in the present case can be written as

$$\mathrm{Tr}\big(\hat W[\phi]^4\big) = \prod_{a=1}^{4}\int\frac{d^Dk_a}{(2\pi)^D}\,\tilde\phi(k_a)\;(2\pi)^D\,\delta^D\Big(\sum_{a=1}^{4}k_a\Big)\,V(k_1,k_2,k_3,k_4)\,, \tag{3.3}$$

where the interaction vertex in momentum space is

$$V(k_1,k_2,k_3,k_4) = \prod_{a<b}e^{-\frac{i}{2}\,k_a\wedge k_b} \tag{3.4}$$

and we have introduced the antisymmetric bilinear form

$$k_a\wedge k_b = k_{ai}\,\theta^{ij}\,k_{bj} = -\,k_b\wedge k_a \tag{3.5}$$

corresponding to the tensor θ. We will assume, for simplicity, throughout this section that θ is an invertible matrix (so that D is even). By using global Euclidean invariance of the underlying quantum field theory, the antisymmetric matrix θ may then be rotated into a canonical skew-diagonal form with skew-eigenvalues $\theta_\mu$, $\mu = 1,\dots,D/2$,

$$\theta = \begin{pmatrix} 0 & \theta_1 & & & \\ -\theta_1 & 0 & & & \\ & & \ddots & & \\ & & & 0 & \theta_{D/2} \\ & & & -\theta_{D/2} & 0 \end{pmatrix}\,, \tag{3.6}$$

corresponding to the choice of Darboux coordinates on $\mathbb{R}^D$. We denote by θ (without indices) the corresponding operator norm of the matrix $(\theta^{ij})$,

$$\theta = \max_{1\le\mu\le D/2}|\theta_\mu|\,. \tag{3.7}$$
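For D = 4 the canonical form (3.6) and the scale (3.7) can be read off from two rotation invariants: $\theta_1^2+\theta_2^2 = \frac{1}{2}\sum_{i,j}(\theta^{ij})^2$ and $\theta_1\theta_2 = \pm\,\mathrm{Pf}\,\theta$, with $\det\theta = (\mathrm{Pf}\,\theta)^2$. The sketch below is a small illustration under these assumptions (D = 4 only; the helper names are hypothetical): it rotates a canonical matrix to generic coordinates and recovers the skew-eigenvalues, and hence θ, from the invariants.

```python
import math


def pfaffian4(t):
    """Pfaffian of a real 4x4 antisymmetric matrix, so that det(t) = pfaffian4(t)**2."""
    return t[0][1] * t[2][3] - t[0][2] * t[1][3] + t[0][3] * t[1][2]


def skew_eigenvalues4(t):
    """Return (|theta_1|, |theta_2|), largest first, for a 4x4 antisymmetric matrix.

    theta_1^2 and theta_2^2 are the roots of u^2 - s u + Pf^2 = 0,
    where s = (1/2) sum_ij t_ij^2 = theta_1^2 + theta_2^2.
    """
    s = 0.5 * sum(t[i][j] ** 2 for i in range(4) for j in range(4))
    p = pfaffian4(t)
    disc = math.sqrt(max(s * s - 4.0 * p * p, 0.0))
    return math.sqrt((s + disc) / 2.0), math.sqrt(max((s - disc) / 2.0, 0.0))


def operator_norm(t):
    """The scale theta of (3.7): the largest skew-eigenvalue in absolute value."""
    return skew_eigenvalues4(t)[0]


def rotate(t, r):
    """Return r t r^T, i.e. the matrix theta expressed in rotated coordinates."""
    n = len(t)
    return [[sum(r[i][a] * t[a][b] * r[j][b] for a in range(n) for b in range(n))
             for j in range(n)] for i in range(n)]


def canonical(theta1, theta2):
    """Skew-diagonal form (3.6) for D = 4."""
    return [[0.0, theta1, 0.0, 0.0], [-theta1, 0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, theta2], [0.0, 0.0, -theta2, 0.0]]


if __name__ == "__main__":
    phi = 0.4
    c, s = math.cos(phi), math.sin(phi)
    # A rotation mixing the x^1 and x^3 directions (an arbitrary illustrative choice).
    r = [[c, 0.0, -s, 0.0], [0.0, 1.0, 0.0, 0.0], [s, 0.0, c, 0.0], [0.0, 0.0, 0.0, 1.0]]
    t = rotate(canonical(0.5, 2.0), r)
    print(skew_eigenvalues4(t), operator_norm(t))  # approximately (2.0, 0.5) and 2.0
```

Both invariants are unchanged by $\theta\to R\,\theta R^T$ with $R\in SO(4)$, which is why the rotated matrix gives back the canonical skew-eigenvalues.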
From (3.4) we see that the interaction vertex in noncommutative field theory contains a momentum dependent phase factor, and the interaction is therefore nonlocal. It is, however, local to each fixed order in θ. Indeed, because of the star-product, noncommutative quantum field theories are defined by a nonpolynomial derivative interaction which will be responsible for the novel effects that we shall uncover. Given the uniqueness property of the Groenewold–Moyal deformation, noncommutative field theory involves the nonpolynomial derivative interaction which is multi-linear in the interacting fields and which classically reduces smoothly to an ordinary interacting field theory (but which is at most unique up to equivalence). Notice that since the noncommutative interaction vertex is a phase, it does not alter the convergence properties of the perturbation series. When θ = 0, we recover the standard φ⁴ field theory in D dimensions. Naively, we would expect that this nonlocality becomes negligible for energies much smaller than the noncommutativity scale $\theta^{-1/2}$ (recall the discussion of Section 1.1). However, as we shall see in this section, this is not true at the quantum level. This stems from the fact that a quantum field theory on a noncommutative spacetime is neither Lorentz covariant nor causal with respect to a fixed θ-tensor. However, as we have discussed, noncommutative field theories can be embedded into string theory where the noncovariance arises
from the expectation value of the background B-field. We will see in this section that the novel effects induced in these quantum field theories can be dealt with in a systematic way, suggesting that these models do exist as consistent quantum theories which may improve our understanding of quantum gravity at very high energies where the notion of spacetime is drastically altered.

In fact, even before plunging into detailed perturbative calculations, one can see the effects of nonlocality directly from the Fourier integral kernel representation (2.17) of the star-product of two fields. The oscillations in the phase of the integration kernel there suppress parts of the integration region. Precisely, if the fields f and g are supported over a small region of size δ, then f⋆g is nonvanishing over a much larger region of size θ/δ [66]. This is exemplified in the star-product of two Dirac delta-functions,

$$\delta^D(x)\star\delta^D(x) = \frac{1}{\pi^D\,|\det\theta|}\,, \tag{3.8}$$

so that the star-product of two point sources becomes infinitely nonlocal. At the field theoretical level, this means that very small pulses instantaneously spread out very far upon interacting through the Groenewold–Moyal product, so that very high energy processes can have important long-distance consequences. As we will see, in the quantum field theory even very low-energy processes can receive contributions from high-energy virtual particles. In particular, due to this nonlocality, the imposition of an ultraviolet cutoff Λ will effectively impose an infrared cutoff $1/(\theta\Lambda)$.

3.1. Planar Feynman diagrams

By momentum conservation, the interaction vertex (3.4) is only invariant up to cyclic permutations of the momenta $k_a$. Because of this property, one needs to carefully keep track of the cyclic order in which lines emanate from vertices in a given Feynman diagram. This is completely analogous to the situation in the large N expansion of a U(N) gauge field theory or an N × N matrix model [67]. Noncommutative Feynman diagrams are therefore ribbon graphs that can be drawn on a Riemann surface of particular genus [68]. This immediately hints at a connection with string theory. In this subsection we will consider the structure of the planar graphs, i.e. those which can be drawn on the surface of the plane or the sphere, in a generic scalar field theory, using the φ⁴ model above as illustration. Consider an L-loop planar graph, and let $k_1,\dots,k_n$ be the cyclically ordered momenta which enter a given vertex V of the graph through n propagators. By introducing an oriented ribbon structure to the propagators of the diagram, we label the index lines of the ribbons by the "momenta" $l_1,\dots,l_{L+1}$ such that $k_a = l_{m_a} - l_{m_{a+1}}$, where $m_a\in\{1,\dots,L+1\}$ with $l_{m_{n+1}} = l_{m_1}$ (see Fig. 1). Because adjacent edges in a ribbon propagator are given oppositely flowing momenta, this construction automatically enforces momentum conservation at each of the vertices.
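The statement that (3.4) is invariant only under cyclic permutations can be checked numerically. A short sketch, assuming D = 2, an arbitrary illustrative θ, and momenta chosen by hand so that $\sum_a k_a = 0$: it evaluates the product of phases (3.4) for all cyclic rotations of $(k_1,\dots,k_4)$ and for one non-cyclic reordering. The function names are hypothetical.

```python
import cmath

THETA = ((0.0, 0.9), (-0.9, 0.0))  # illustrative theta^{ij}, D = 2


def wedge(k, q, theta=THETA):
    """k ^ q = k_i theta^{ij} q_j, eq. (3.5)."""
    return sum(k[i] * theta[i][j] * q[j] for i in range(2) for j in range(2))


def vertex(momenta, theta=THETA):
    """Noncommutative vertex (3.4): prod_{a<b} exp(-(i/2) k_a ^ k_b)."""
    n = len(momenta)
    total = sum(wedge(momenta[a], momenta[b], theta) for a in range(n) for b in range(a + 1, n))
    return cmath.exp(-0.5j * total)


def conserved_momenta(k1, k2, k3):
    """Complete three momenta with k4 = -(k1 + k2 + k3) so that sum_a k_a = 0."""
    k4 = tuple(-(a + b + c) for a, b, c in zip(k1, k2, k3))
    return [k1, k2, k3, k4]


if __name__ == "__main__":
    ks = conserved_momenta((1.0, 0.0), (0.0, 1.0), (0.5, -0.3))
    for shift in range(4):
        print("cyclic shift", shift, vertex(ks[shift:] + ks[:shift]))
    swapped = [ks[1], ks[0], ks[2], ks[3]]
    print("swap k1 <-> k2", vertex(swapped))
```

Moving $k_1$ to the end changes the exponent by $2\,(\sum_{b\neq1}k_b)\wedge k_1 = -2\,k_1\wedge k_1 = 0$ once momentum is conserved, whereas exchanging $k_1$ and $k_2$ multiplies V by $e^{ik_1\wedge k_2}$; the printed values should reflect both facts.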
Given these decompositions, a noncommutative vertex V such as (3.4) will decompose as

$$V = \prod_{a=1}^{n}e^{-\frac{i}{2}\,l_{m_a}\wedge l_{m_{a+1}}} \tag{3.9}$$

into a product of phases, one for each incoming propagator. However, the momenta associated to a given line will flow in the opposite direction at the other end of the propagator (Fig. 1), so that the
[Fig. 1 artwork: (a) two-loop planar diagram with vertices V₁ and V₂, external momentum p and index loops l₁, l₂, l₃; (b) ribbon propagator with index lines $l_a$, $l_b$, equal to $1/\big((l_a-l_b)^2+m^2\big)$.]

Fig. 1. (a) Example of a two-loop planar Feynman diagram of external momentum p in noncommutative φ⁴ theory. The noncommutative phase factor at the first vertex is $V_1 = e^{-\frac{i}{2}(l_2\wedge l_3+l_1\wedge l_2+l_3\wedge l_1)}$ while that at the second vertex is $V_2 = e^{-\frac{i}{2}(l_2\wedge l_1+l_1\wedge l_3-l_2\wedge l_3)} = (V_1)^{-1}$. (b) The massive scalar propagator in ribbon notation.
phase associated with any internal propagator is equal in magnitude and opposite in sign at its two ends. Therefore, the overall phase factor associated with any planar Feynman diagram is [69]

$$V_{\rm p}(p_1,\dots,p_n) = \prod_{a<b}e^{-\frac{i}{2}\,p_a\wedge p_b}\,, \tag{3.10}$$

where $p_1,\dots,p_n$ are the cyclically ordered external momenta of the graph. The phase factor (3.10) is completely independent of the details of the internal structure of the planar graph. We see therefore that the contribution of a planar graph to the noncommutative perturbation series is just the corresponding θ = 0 contribution multiplied by the phase factor (3.10). This phase factor is present in all interaction terms in the bare Lagrangian, and in all tree-level graphs computed with it. At θ = 0, divergent terms in the perturbation expansion are determined by products of local fields, and the phase (3.10) modifies these terms to the star-product of local fields. We conclude that planar divergences at θ ≠ 0 may be absorbed into redefinitions of the bare parameters if and only if the corresponding commutative quantum field theory is renormalizable [66]. This dispels the naive expectation that the Feynman graphs of noncommutative quantum field theory would have better ultraviolet behaviour than the commutative ones (at least for the present class of noncommutative spaces) [70]. Note that here the renormalization procedure is not obtained by adding local counterterms, but rather the counterterms are of an identical nonlocal form as those of the bare Lagrangian. In any case, at the level of planar graphs for scalar fields, noncommutative quantum field theory has precisely the same renormalization properties as its commutative counterparts.
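The decomposition (3.9) and the cancellation of internal phases can be checked the same way. The sketch below (D = 2, illustrative θ, hypothetical helper names, and the simplest labelling $m_a = a$) builds vertex momenta $k_a = l_a - l_{a+1}$ from index-loop momenta, compares (3.4) with (3.9), and reproduces the relation $V_2 = (V_1)^{-1}$ of Fig. 1, where the second vertex meets the same three index loops in the opposite cyclic order.

```python
import cmath

THETA = ((0.0, 1.3), (-1.3, 0.0))  # illustrative theta^{ij}, D = 2


def wedge(k, q, theta=THETA):
    """k ^ q = k_i theta^{ij} q_j, eq. (3.5)."""
    return sum(k[i] * theta[i][j] * q[j] for i in range(2) for j in range(2))


def vertex(momenta, theta=THETA):
    """(3.4): prod_{a<b} exp(-(i/2) k_a ^ k_b) for cyclically ordered momenta."""
    n = len(momenta)
    total = sum(wedge(momenta[a], momenta[b], theta) for a in range(n) for b in range(a + 1, n))
    return cmath.exp(-0.5j * total)


def momenta_from_index_loops(loops):
    """k_a = l_a - l_{a+1} with l_{n+1} = l_1 (labelling m_a = a); sum_a k_a = 0 automatically."""
    n = len(loops)
    return [tuple(x - y for x, y in zip(loops[a], loops[(a + 1) % n])) for a in range(n)]


def vertex_from_index_loops(loops, theta=THETA):
    """(3.9): prod_a exp(-(i/2) l_a ^ l_{a+1})."""
    n = len(loops)
    total = sum(wedge(loops[a], loops[(a + 1) % n], theta) for a in range(n))
    return cmath.exp(-0.5j * total)


if __name__ == "__main__":
    l1, l2, l3 = (0.4, -1.1), (1.7, 0.2), (-0.6, 0.9)
    v1 = vertex_from_index_loops([l1, l2, l3])  # first vertex of Fig. 1(a)
    v2 = vertex_from_index_loops([l1, l3, l2])  # same loops, opposite orientation
    print(v1 * v2)  # internal phases cancel: 1 up to rounding
```

The identity behind (3.9) is a telescoping sum: with $k_a = l_a - l_{a+1}$, one finds $\sum_{a<b}k_a\wedge k_b = \sum_a l_a\wedge l_{a+1}$ for any n.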
3.1.1. String theoretical interpretation

The factorization of the noncommutativity parameters in planar amplitudes brings us to our first analogy to string theory. Consider the string σ-model that was described in Section 1.3. The open string propagator on the boundary of a disk in a constant background B-field is given by [32,34,42]

$$\big\langle x^i(t)\,x^j(t')\big\rangle = -\alpha'\,G^{ij}\ln(t-t')^2 + \frac{i}{2}\,\theta^{ij}\,\mathrm{sgn}(t-t')\,, \tag{3.11}$$

where

$$\theta^{ij} = -(2\pi\alpha')^2\Big(\frac{1}{g+2\pi\alpha'B}\,B\,\frac{1}{g-2\pi\alpha'B}\Big)^{ij}\,, \tag{3.12}$$

and

$$G_{ij} = g_{ij} - (2\pi\alpha')^2\,\big(B\,g^{-1}B\big)_{ij} \tag{3.13}$$
is the metric seen by the open strings ($g_{ij}$ is the metric seen by the closed strings). Consider an operator on the boundary of the disk of the general form $P[\partial x,\partial^2x,\dots]\,e^{ip_ix^i}$, where P is a polynomial in derivatives of the coordinates $x^i$ along the Dp-brane worldvolume. The sign term in (3.11), which is responsible for the worldvolume noncommutativity, does not contribute to contractions of the operators $\partial^nx^i$ when we evaluate quantum correlation functions using the Wick expansion. It follows then that the correlation functions in the background fields G, θ may be computed as [34]

$$\Big\langle\prod_nP_n\big[\partial x(t_n),\partial^2x(t_n),\dots\big]\,e^{ip_{ni}x^i(t_n)}\Big\rangle_{G,\theta} = \prod_{n>m}e^{-\frac{i}{2}\,p_n\wedge p_m\,\mathrm{sgn}(t_n-t_m)}\;\Big\langle\prod_nP_n\big[\partial x(t_n),\partial^2x(t_n),\dots\big]\,e^{ip_{ni}x^i(t_n)}\Big\rangle_{G,\theta=0}\,. \tag{3.14}$$

This result holds for generic values of the string slope α'. It implies that σ-model correlation functions in a background B-field may be computed by simply replacing ordinary products of fields by star-products and the closed string metric g by the open string metric G. Therefore, the θ-dependence of disk amplitudes when written in terms of the open string variables G and θ (rather than the closed string ones g and B) is very simple. These two tensors represent the metric and noncommutativity parameters of the underlying noncommutative space. This implies that the tree-level, low-energy effective action for open strings in a B-field is obtained from that at B = 0 by simply replacing ordinary products of fields by star-products. By adding gauge fields to the Dp-brane worldvolume, this is essentially how noncommutative Yang–Mills theory arises as the low-energy effective field theory for open strings in background Neveu–Schwarz two-form fields [34]. This phenomenon corresponds exactly to the factorization of planar diagrams that we derived above. The one-loop, annulus diagram corrections to these results are derived in [71].

3.2. Non-planar Feynman diagrams

The construction of the previous subsection breaks down in the case of nonplanar Feynman diagrams, which have propagators that cross over each other or over external lines (Fig. 2).

[Fig. 2 artwork: two crossings of lines carrying momenta $k_a$ and $k_b$, labelled +1 and −1.]

Fig. 2. Positive and negative crossings in a nonplanar Feynman graph.

It is straightforward to show that the total noncommutative phase factor for a general graph which generalizes the planar result (3.10) is given by [69]

$$V_{\rm np}(p_1,\dots,p_n) = V_{\rm p}(p_1,\dots,p_n)\prod_{a,b}e^{-\frac{i}{2}\,\cap_{ab}\,k_a\wedge k_b}\,, \tag{3.15}$$
where $\cap_{ab}$ is the signed intersection matrix of the graph which counts the number of times that the a-th (internal or external) line crosses over the b-th line (Fig. 2). By momentum conservation it follows that the matrix $\cap_{ab}$ is essentially unique. Therefore, the θ-dependence of nonplanar graphs is much more complicated and we expect them to have a much different behaviour than their commutative counterparts. In particular, because of the extra oscillatory phase factors which occur, we expect these diagrams to have an improved ultraviolet behaviour. When internal lines cross in an otherwise divergent graph, the phase oscillations provide an effective cutoff $\Lambda_{\rm eff} = \theta^{-1/2}$ and render the diagram finite. For instance, it turns out that all one-loop nonplanar diagrams are finite, as we shall see in the next subsection. However, it is not the case that all nonplanar graphs (without divergent planar subgraphs) are finite [66]. At θ ≠ 0, it is possible to demonstrate the convergence of the Feynman integral associated with a diagram G, provided that G has no divergent planar subgraphs and all subgraphs of G have nonpositive degree of divergence. The general consensus at present seems to be that these noncommutative scalar field theories are renormalizable to all orders of perturbation theory [72], although there are dangerous counterexamples at two-loop order and at present such renormalizability statements are merely conjectures. An explicit example of a field theory which is renormalizable is provided by the noncommutative Wess–Zumino model [73,74]. In general some nonplanar graphs are divergent, but, as we will see in the next subsection, these divergences should be viewed as infrared divergences. Nonplanar diagrams can also be seen to exhibit an interesting stringy phenomenon. Consider the limit of maximal noncommutativity, θ → ∞, or equivalently the "short-distance" limit of large momenta and fixed θ.
The planar graphs have no internal noncommutative phase factors, while nonplanar graphs contain at least one. In the limit θ → ∞, the latter diagrams therefore vanish because of the rapid oscillations of their Feynman integrands. It can be shown [66] that a noncommutative Feynman diagram of genus h is suppressed relative to a planar graph by the factor $1/(E^2\theta)^{2h}$, where E is the total energy of the amplitude. Therefore, if $G_{\rm conn}(p_1,\dots,p_n;\theta)$ is any connected n-point Green's function in momentum space, then

$$\lim_{\theta\to\infty}\;\prod_{a<b}e^{\frac{i}{2}\,p_a\wedge p_b}\;G_{\rm conn}(p_1,\dots,p_n;\theta) = G^{\rm planar}_{\rm conn}(p_1,\dots,p_n) \tag{3.16}$$
[Fig. 3 artwork: one-loop diagrams with external momentum p and loop momentum k.]

Fig. 3. The one-loop planar and nonplanar irreducible Feynman diagrams contributing to the two-point function in noncommutative φ⁴ theory.
for each n, and the maximally noncommutative quantum field theory is given entirely by planar diagrams. But this is exactly the characteristic feature of high-energy string scattering amplitudes, and thus in the high momentum or maximal noncommutativity limit the field theory resembles a string theory. Note that in this regard it is the largest skew-eigenvalue θ of $(\theta^{ij})$ which plays the role of the topological expansion parameter, i.e. θ is the analog of the rank N in the large N 't Hooft genus expansion of multi-colour field theories [67].

3.3. UV/IR mixing

In this subsection we will illustrate some of the above points with an explicit computation, which will also reveal another exotic property of noncommutative field theories. The example we will consider is mass renormalization in the noncommutative φ⁴ theory (3.2) in four dimensions. For this, we will evaluate the one-particle irreducible two-point function

$$\Gamma(p) = \big\langle\tilde\phi(p)\,\tilde\phi(-p)\big\rangle_{\rm 1PI} = \sum_{n=0}^{\infty}g^{2n}\,\Gamma^{(n)}(p) \tag{3.17}$$
to one-loop order. The bare two-point function is $\Gamma^{(0)}(p) = p^2+m^2$, and at one-loop order there is (topologically) one planar and one nonplanar Feynman graph which are depicted in Fig. 3. The symmetry factor for the planar graph is twice that of the nonplanar graph, and they lead to the respective Feynman integrals

$$\Gamma^{(1)}_{\rm p}(p) = \frac{1}{3}\int\frac{d^Dk}{(2\pi)^D}\,\frac{1}{k^2+m^2}\,, \tag{3.18}$$

$$\Gamma^{(1)}_{\rm np}(p) = \frac{1}{6}\int\frac{d^Dk}{(2\pi)^D}\,\frac{e^{ik\wedge p}}{k^2+m^2}\,. \tag{3.19}$$

The planar contribution (3.18) is proportional to the standard one-loop mass correction of commutative φ⁴ theory, which for D = 4 is quadratically ultraviolet divergent. The nonplanar contribution is expected to be generically convergent, because of the rapid oscillations of the phase factor $e^{ik\wedge p}$ at high energies. However, $k\wedge p = 0$ when $p_i\theta^{ij} = 0$, i.e. whenever θ = 0 or, if θ is invertible, whenever the external momentum p vanishes. In that case the phase factor in (3.19) becomes ineffective at damping the large momentum singularities of the integral, and the usual ultraviolet divergences of the planar counterpart (3.18) creep back in through the relation

$$\Gamma^{(1)}_{\rm p} = 2\,\Gamma^{(1)}_{\rm np}(p=0)\,. \tag{3.20}$$
The nonplanar graph is therefore singular at small $p_i\theta^{ij}$, and the effective cutoff for a one-loop graph in momentum space is of order $1/\sqrt{p\circ p}$, where we have introduced the positive-definite inner product

$$p\circ q = -p_i\,(\theta^2)^{ij}\,q_j = q\circ p \tag{3.21}$$

with $(\theta^2)^{ij} = \delta_{kl}\,\theta^{ik}\,\theta^{lj}$. Thus, at small momenta the noncommutative phase factor is irrelevant and the nonplanar graph inherits the usual ultraviolet singularities, but now in the form of a long-distance divergence. Turning on the noncommutativity parameters $\theta^{ij}$ thereby replaces the standard ultraviolet divergence with a singular infrared behaviour. This exotic mixing of the ultraviolet and infrared scales in noncommutative field theory is called UV/IR mixing [66].

Let us quantify this phenomenon somewhat. To evaluate the Feynman integrals (3.18) and (3.19), we introduce the standard Schwinger parametrization

$$\frac{1}{k^2+m^2} = \int_0^{\infty}d\alpha\;e^{-\alpha(k^2+m^2)}\,. \tag{3.22}$$

By substituting (3.22) into (3.18) and (3.19) and doing the Gaussian momentum integration, we arrive at

$$\Gamma^{(1)}_{\rm np}(p) = \frac{1}{6(4\pi)^{D/2}}\int_0^{\infty}\frac{d\alpha}{\alpha^{D/2}}\;e^{-\alpha m^2-(p\circ p)/(4\alpha)-1/(\Lambda^2\alpha)}\,, \tag{3.23}$$

where the momentum space ultraviolet divergence has now become a small-α divergence in the Schwinger parameter, which we have regulated by inserting the factor $e^{-1/(\Lambda^2\alpha)}$ with an ultraviolet cutoff Λ → ∞. The integral (3.23) is elementary to do and the result is
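$$\Gamma^{(1)}_{\rm np}(p) = \frac{m^{(D-2)/2}}{6(2\pi)^{D/2}}\Big(p\circ p+\frac{4}{\Lambda^2}\Big)^{(2-D)/4}\,K_{(D-2)/2}\Big(m\sqrt{p\circ p+\frac{4}{\Lambda^2}}\Big)\,, \tag{3.24}$$

where $K_\nu(x)$ is the irregular modified Bessel function of order ν. The complete renormalized propagator up to one-loop order is then given by

$$\Gamma(p) = p^2 + m^2 + 2g^2\,\Gamma^{(1)}_{\rm np}(0) + g^2\,\Gamma^{(1)}_{\rm np}(p) + O(g^4)\,, \tag{3.25}$$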
where we have used (3.20). Let us now consider the leading divergences of the function (3.25) in the case D = 4. From the asymptotic behaviour $K_\nu(x)\simeq2^{\nu-1}\,\Gamma(\nu)\,x^{-\nu}+\cdots$ for x → 0 and ν ≠ 0, the small-argument expansion of (3.24) produces the leading singular behaviour

$$\Gamma^{(1)}_{\rm np}(p) = \frac{1}{96\pi^2}\Big(\Lambda_{\rm eff}^2 - m^2\ln\frac{\Lambda_{\rm eff}^2}{m^2} + O(1)\Big)\,, \tag{3.26}$$

where the effective ultraviolet cutoff, read off from the argument of the Bessel function in (3.24), is given by

$$\Lambda_{\rm eff}^2 = \frac{1}{1/\Lambda^2 + p\circ p/4}\,. \tag{3.27}$$

Note that in the limit Λ → ∞, the nonplanar one-loop graph (3.26) remains finite, being effectively regulated by the noncommutativity of spacetime, i.e. $\Lambda_{\rm eff}^2\to4/p\circ p$ for Λ → ∞. However, the ultraviolet divergence is restored in either the commutative limit θ → 0 or the infrared limit
p → 0. In the zero momentum limit p → 0, we have $\Lambda_{\rm eff}\simeq\Lambda$, and we recover the standard mass renormalization of φ⁴ theory in four dimensions,

$$m^2_{\rm ren} = m^2 + \frac{1}{32\pi^2}\,g^2\Lambda^2 - \frac{1}{32\pi^2}\,g^2m^2\ln\frac{\Lambda^2}{m^2} + O(g^4)\,, \tag{3.28}$$
which diverges as Λ → ∞. On the other hand, in the ultraviolet limit Λ → ∞, we have $\Lambda_{\rm eff}^2\propto1/p\circ p$, and the corrected propagator assumes a complicated, nonlocal form that cannot be attributed to any (mass) renormalization. Notice, in particular, that the renormalized propagator contains both a zero momentum pole and a logarithmic singularity $\ln p\circ p$. From this analysis we conclude that the limit Λ → ∞ and the low momentum limit p → 0 do not commute, and noncommutative quantum field theory exhibits an intriguing mixing of the ultraviolet (Λ → ∞) and infrared (p → 0) regimes. The noncommutativity leads to unfamiliar effects of the ultraviolet modes on the infrared behaviour which have no analogs in conventional quantum field theory. This UV/IR mixing is one of the most fascinating aspects of noncommutative quantum field theory. To recapitulate, we have seen that a divergent diagram in the θ = 0 theory is typically regulated by the noncommutativity at θ ≠ 0 which renders it finite, but as p → 0 the phases become ineffective and the diagram diverges at vanishing momentum. The pole at p = 0 that arises in the propagator for the φ field comes from the high momentum region of integration (i.e. Λ → ∞), and it is thereby a consequence of very high energy dynamics. This contribution to the self-energy has a huge effect on the propagation of long-wavelength particles. In position space, it leads to long-ranged correlations, since the correlation functions of the noncommutative field theory will decay algebraically for small g [66], in contrast to normal correlation functions which decay exponentially for m ≠ 0. Indeed, it is rather surprising to have found infrared divergences in a massive field theory. Roughly speaking, when a particle of momentum $p_j$ circulates in a loop of a Feynman graph, it can induce an effect at distance $|\theta^{ij}p_j|$, and so the high momentum end of Feynman integrals gives rise to power law long-range forces which are entirely absent in the classical field theory.
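The passage from (3.23) to (3.24), and the UV/IR behaviour summarized in (3.26), can be checked numerically with nothing more than quadrature. The sketch below fixes D = 4 and m = 1 (illustrative choices), evaluates the Schwinger integral (3.23) on a logarithmic grid $\alpha = e^u$, evaluates (3.24) through the integral representation $K_\nu(x) = \int_0^\infty dt\,e^{-x\cosh t}\cosh(\nu t)$, and compares both with the leading form of (3.26) using $\Lambda^2_{\rm eff} = 1/(1/\Lambda^2 + p\circ p/4)$, the combination that appears inside the Bessel function of (3.24). The argument `pp` stands for $p\circ p$; the step counts and truncation points are ad hoc assumptions, and the helper names are hypothetical.

```python
import math


def trapezoid(f, lo, hi, n=40000):
    """Composite trapezoid rule; adequate here because the integrands decay very rapidly."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h


def bessel_k(nu, x):
    """K_nu(x) = int_0^inf exp(-x cosh t) cosh(nu t) dt for x > 0, truncated where negligible."""
    t_max = math.acosh(max(2.0, 60.0 / x)) + 2.0
    return trapezoid(lambda t: math.exp(-x * math.cosh(t)) * math.cosh(nu * t), 0.0, t_max)


def gamma_np_schwinger(pp, m, cutoff, D=4):
    """Eq. (3.23) with alpha = e^u, so that d(alpha) / alpha^(D/2) = e^{-(D/2 - 1) u} du."""
    b = pp / 4.0 + 1.0 / cutoff ** 2
    nu = D / 2.0 - 1.0
    integrand = lambda u: math.exp(-nu * u - m * m * math.exp(u) - b * math.exp(-u))
    return trapezoid(integrand, -60.0, 10.0) / (6.0 * (4.0 * math.pi) ** (D / 2.0))


def gamma_np_bessel(pp, m, cutoff, D=4):
    """Eq. (3.24)."""
    X = math.sqrt(pp + 4.0 / cutoff ** 2)
    nu = (D - 2) / 2.0
    return m ** nu * X ** (-nu) * bessel_k(nu, m * X) / (6.0 * (2.0 * math.pi) ** (D / 2.0))


def gamma_np_leading(pp, m, cutoff):
    """Leading D = 4 behaviour (3.26), dropping the O(1) term."""
    lam2 = 1.0 / (1.0 / cutoff ** 2 + pp / 4.0)
    return (lam2 - m * m * math.log(lam2 / (m * m))) / (96.0 * math.pi ** 2)


if __name__ == "__main__":
    m = 1.0
    for pp in (1e-1, 1e-2, 1e-3):
        print(pp, gamma_np_schwinger(pp, m, 1e6), gamma_np_bessel(pp, m, 1e6),
              gamma_np_leading(pp, m, 1e6))
```

At fixed $p\circ p$ the result barely moves as the cutoff grows, while shrinking $p\circ p$ by a factor of 100 raises it by roughly the same factor: the finite but infrared-singular behaviour described above.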
We may conclude from the analysis of this subsection that noncommutative quantum field theory below the noncommutativity scale is nothing like conventional, commutative quantum field theory. The strange mixing of ultraviolet and infrared effects in noncommutative field theory can be understood heuristically by going back to the quantum mechanical example of Section 1.3. Indeed, the field quanta in the present field theory can be thought of as pairs of opposite charges, i.e. electron–hole bound states, moving in a strong magnetic field [33,75]. Recall from Section 1.3 that in this limit the position and momentum coordinates of such a charge are related by $x^i = \theta^{ij}p_j$, with $\theta^{ij} = B^{-1}\epsilon^{ij}$. Thus a particle with momentum p along, say, the x¹-axis will have a spatial extension of size θ|p| in the x²-direction, and the size of the particle grows with its momentum. In other words, the low-energy spectrum of a noncommutative field theory includes, in addition to the usual point-like, particle degrees of freedom, electric dipole-like excitations. More generally, this can be understood by combining the induced spacetime uncertainty relation (1.3) that arises in the noncommutative theory with the standard Heisenberg uncertainty relation. The resulting uncertainties then coincide with the string-modified uncertainty relations (1.3). Therefore, this UV/IR mixing phenomenon may be regarded as another stringy aspect of noncommutative quantum field theory. It can also be understood in terms of noncommutative Gaussian wavepackets [64,66].
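The dipole picture has a direct counterpart in the star-product itself. Acting with (2.19) on a plane wave replaces the derivatives falling on it by $ik_i$, so that $e^{ik\cdot x}\star f(x) = e^{ik\cdot x}\,f(x^i+\frac{1}{2}\theta^{ij}k_j)$ and $f(x)\star e^{ik\cdot x} = e^{ik\cdot x}\,f(x^i-\frac{1}{2}\theta^{ij}k_j)$. A quantum of momentum k therefore probes f at two points separated by $\theta^{ij}k_j$, a length that grows with |k| and points transverse to k in D = 2. The sketch below checks this for a finite Fourier sum f, using only the plane-wave rule $e^{ik\cdot x}\star e^{iq\cdot x} = e^{-\frac{i}{2}k\wedge q}e^{i(k+q)\cdot x}$; θ, the test function and the helper names are illustrative assumptions.

```python
import cmath

THETA = ((0.0, 1.5), (-1.5, 0.0))  # illustrative theta^{ij}, D = 2


def wedge(k, q, theta=THETA):
    """k ^ q = k_i theta^{ij} q_j."""
    return sum(k[i] * theta[i][j] * q[j] for i in range(2) for j in range(2))


def shift(x, k, sign, theta=THETA):
    """Return the point x^i + sign * (1/2) theta^{ij} k_j."""
    return tuple(x[i] + sign * 0.5 * sum(theta[i][j] * k[j] for j in range(2)) for i in range(2))


def evaluate(f, x):
    """A dict f = {q: c} stands for the finite Fourier sum sum_q c exp(i q.x)."""
    return sum(c * cmath.exp(1j * (q[0] * x[0] + q[1] * x[1])) for q, c in f.items())


def plane_wave_star_left(k, f, theta=THETA):
    """e^{ik.x} * f, term by term."""
    return {(k[0] + q[0], k[1] + q[1]): c * cmath.exp(-0.5j * wedge(k, q, theta))
            for q, c in f.items()}


def plane_wave_star_right(f, k, theta=THETA):
    """f * e^{ik.x}, term by term."""
    return {(q[0] + k[0], q[1] + k[1]): c * cmath.exp(-0.5j * wedge(q, k, theta))
            for q, c in f.items()}


if __name__ == "__main__":
    f = {(0.3, 0.0): 1.0, (-0.2, 0.5): 0.5j}
    k, x = (2.0, 0.0), (0.4, -0.7)  # momentum along the x^1-axis
    plane = cmath.exp(1j * (k[0] * x[0] + k[1] * x[1]))
    print(evaluate(plane_wave_star_left(k, f), x), plane * evaluate(f, shift(x, k, +1)))
    print(evaluate(plane_wave_star_right(f, k), x), plane * evaluate(f, shift(x, k, -1)))
```

With k along the x¹-axis, the two evaluation points differ only in their x² coordinate, by θ|k| = 3 in this example.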
[Fig. 4 artwork: the double twist diagram with modular parameter τ, panels (a) and (b).]

Fig. 4. The double twist diagram in (a) the open string channel and (b) the closed string channel.
3.3.1. String theoretical interpretation

As we have alluded to above, the unusual properties of noncommutative quantum field theories are not due to inconsistencies in their definitions, but rather unexpected consequences of the nonlocality of the star-product interaction which gives the field theory a stringy nature and is therefore well-suited to be an effective theory of strings. The UV/IR mixing has a more precise analog in string theory in the context of a particular open string amplitude known as the double twist diagram [66]. This nonplanar, nonorientable diagram is depicted in the open string channel in Fig. 4(a). Note that symbolically it coincides with the ribbon graph for the one-loop nonplanar mass renormalization in noncommutative φ³ theory. By applying the modular transformation τ → −1/τ to the Teichmüller parameter of the annular one-loop open string diagram, it gets transformed into the cylindrical closed string diagram of Fig. 4(b). The latter amplitude behaves like $1/p_ig^{ij}p_j$ for small momenta [66]. In string perturbation theory, one integrates over the moduli of string diagrams, and the region of moduli space corresponding to high energies in the open string loop describes the tree-level exchange of a light closed string state. Therefore, an ultraviolet phenomenon in the open string channel corresponds to an infrared singularity in the closed string channel. This is precisely the same behaviour that was observed at the field theoretical level above, if we identify the closed string metric with the noncommutativity parameter through $g^{ij}\sim-(\theta^2)^{ij}$. In the correlated decoupling limit α' → 0 described in Section 1.3, this is exactly what is found from (3.13) when the open string metric is taken to be $G^{ij} = \delta^{ij}$, as it is in the present case. Thus the exotic properties unveiled above may indeed be attributed to stringy behaviours of noncommutative quantum field theories.
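The identification $g^{ij}\sim-(\theta^2)^{ij}$ can be seen explicitly in a two-dimensional example with $g_{ij} = s\,\delta_{ij}$ and $B_{ij} = b\,\epsilon_{ij}$, a simplifying assumption made only for this sketch. The code evaluates (3.12) and (3.13), checks numerically the relation $(g+2\pi\alpha'B)^{-1} = G^{-1} + \theta/(2\pi\alpha')$ for this example, and then tunes s so that $G_{ij} = \delta_{ij}$ while α' → 0. In that limit θ approaches $B^{-1}$ and $(2\pi\alpha')^2g^{ij}$ approaches $-(\theta^2)^{ij}$. The parameter values and the small 2 × 2 matrix helpers are hypothetical.

```python
import math


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def mat_lin(a, b, s):
    """Return a + s * b for 2x2 matrices."""
    return [[a[i][j] + s * b[i][j] for j in range(2)] for i in range(2)]


def mat_inv(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def open_string_variables(g, B, alpha_p):
    """Return (G_ij, theta^{ij}) from the closed string data (g, B) via (3.13) and (3.12)."""
    c = 2.0 * math.pi * alpha_p
    plus_inv = mat_inv(mat_lin(g, B, c))    # (g + 2 pi alpha' B)^{-1}
    minus_inv = mat_inv(mat_lin(g, B, -c))  # (g - 2 pi alpha' B)^{-1}
    theta = [[-c * c * x for x in row] for row in mat_mul(mat_mul(plus_inv, B), minus_inv)]
    G = mat_lin(g, mat_mul(mat_mul(B, mat_inv(g)), B), -c * c)
    return G, theta


def decoupling_metric(b, alpha_p):
    """Smaller root s of s + (2 pi alpha' b)^2 / s = 1, so that G_ij = delta_ij for g = s delta.

    Written in a cancellation-free form; requires 2 pi alpha' |b| < 1/2.
    """
    eps = (2.0 * math.pi * alpha_p * b) ** 2
    return 2.0 * eps / (1.0 + math.sqrt(1.0 - 4.0 * eps))


if __name__ == "__main__":
    b = 1.0
    B = [[0.0, b], [-b, 0.0]]
    for alpha_p in (5e-2, 1e-2, 1e-3):
        s = decoupling_metric(b, alpha_p)
        G, theta = open_string_variables([[s, 0.0], [0.0, s]], B, alpha_p)
        # G stays the identity while theta approaches B^{-1} = [[0, -1], [1, 0]].
        print(alpha_p, s, G, theta)
```

The closed string metric s shrinks like $(2\pi\alpha' b)^2$ in this limit, which is the sense in which $g^{ij}$ blows up in proportion to $-(\theta^2)^{ij}$.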
The occurrence of infrared singularities in massive field theories suggests the presence of new light degrees of freedom [66,76]. From our analysis of the one-loop renormalization of the scalar propagator, we have seen that, in addition to the original pole at $p^2 = -m^2$, there is a pole at $p^2 = O(g^2)$ which arises from the high loop momentum modes of the scalar field $\phi$. In order to write down a Wilsonian effective action which correctly describes the low momentum behaviour of the theory, it is necessary to add new light fields to the action. For instance, the quadratic infrared singularity obtained above can be reproduced by a Feynman diagram in which $\phi$ turns into a new field $\chi$ and then back into $\phi$, where the field $\chi$ couples to $\phi$ through an action of the form
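$$S_\chi = \int d^Dx~\left[g\,\chi(x)\,\phi(x) + \frac12\,\partial\chi(x)\circ\partial\chi(x) + \frac{\Lambda^2}{2}\,\bigl(\partial\circ\partial\chi(x)\bigr)^2\right] . \tag{3.29}$$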
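As a quick algebraic check on how exchange of $\chi$ produces an infrared pole, the sketch below assumes that the quadratic part of (3.29) gives, in momentum space, the kernel $P + \Lambda^2P^2$, where $P$ stands for $p\circ p$ (treated as a positive number) and $\Lambda$ is the cutoff; overall signs and factors of $i$ are ignored. The $\chi$ propagator then has a pole at $P = 0$ and splits as $1/P - 1/(P+\Lambda^{-2})$, a massless piece minus a piece regulated at $P \sim \Lambda^{-2}$. The function names are hypothetical.

```python
"""Partial-fraction check for the chi propagator (illustrative sketch).

Assumption: the quadratic part of S_chi gives the momentum-space kernel
P + lam**2 * P**2, with P standing for p∘p (taken positive) and lam for the
cutoff Lambda. Overall signs and factors of i are ignored.
"""


def chi_propagator(P: float, lam: float) -> float:
    """Inverse of the assumed quadratic kernel P + lam^2 P^2."""
    return 1.0 / (P + lam**2 * P**2)


def split_form(P: float, lam: float) -> float:
    """Massless 1/P pole minus a piece regulated at P ~ 1/lam^2."""
    return 1.0 / P - 1.0 / (P + 1.0 / lam**2)


if __name__ == "__main__":
    for P in (1e-4, 1e-2, 1.0, 10.0):
        for lam in (0.5, 3.0, 20.0):
            a, b = chi_propagator(P, lam), split_form(P, lam)
            assert abs(a - b) <= 1e-9 * abs(a)
    # Near P = 0 the propagator behaves like 1/P: the infrared pole.
    print(chi_propagator(1e-8, 1.0) * 1e-8)  # close to 1
```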
R.J. Szabo / Physics Reports 378 (2003) 207 – 299
This process is completely analogous to the string channel duality discussed above, with the field $\phi$ identified with the open string modes and $\chi$ with the closed string mode. Other stringy aspects of UV/IR mixing can be observed by studying the noncommutative quantum field theory at finite temperature [77]. Then, at the level of nonplanar graphs, one finds stringy winding modes corresponding to states which wrap around the compact thermal direction. This gives an alternative picture of the field theoretical analog of the open-closed string channel duality discussed in this section. Perturbative string calculations also confirm the UV/IR mixing explicitly [78]. A similar analysis can be done for the linear and logarithmic infrared singularities [66], and also for the corrections to vertex functions [66,79]. At higher loop orders, however, the momentum dependences become increasingly complicated and are far more difficult to interpret [76]. Other aspects of this phenomenon may be found in [80]. Even field theories which do not exhibit the UV/IR mixing phenomenon, such as the noncommutative Wess–Zumino model [73], show exotic effects like the dipole picture [81]. The perturbative properties of the corresponding supersymmetric model are studied in [82]. In Minkowski spacetime with a noncommuting time direction, i.e. $\theta^{0i} \neq 0$, one encounters severe acausal effects, such as events which precede their causes and objects which grow instead of Lorentz contracting as they are boosted [83]. Such a quantum field theory is neither causal nor unitary in certain instances [84]. In a theory with space-like noncommutativity, one can perform a boost and induce a time-like component for $\theta$. The resulting theory is still unitary [85]. The Lorentz invariant condition for unitarity is $\theta_{\mu\nu}\theta^{\mu\nu} \geq 0$, which has two solutions corresponding to space-like and light-like noncommutativity. For space-like $\theta$ one can always boost to a frame in which $\theta^{0i} = 0$.
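The classification behind this statement can be made concrete. The sketch below assumes 3+1 dimensions, a mostly-plus metric, and the electromagnetic analogy $\theta^{0i}\sim E$, $\theta^{ij}\sim B$. The two Lorentz invariants are then $\theta_{\mu\nu}\theta^{\mu\nu}\propto B^2-E^2$ and the Pfaffian $\propto E\cdot B$. Space-like, time-like and light-like $\theta$ all have vanishing Pfaffian and are distinguished by the sign of the first invariant, which is the quantity entering the unitarity condition as read above. All function names are hypothetical.

```python
"""Lorentz classification of a constant noncommutativity tensor theta^{mu nu}.

Assumptions (illustrative): 3+1 dimensions, mostly-plus metric
diag(-1, 1, 1, 1), and the analogy theta^{0i} ~ E, theta^{ij} ~ B.
"""

ETA = (-1.0, 1.0, 1.0, 1.0)


def theta_matrix(entries):
    """Antisymmetric 4x4 theta^{mu nu} from {(mu, nu): value} with mu < nu."""
    th = [[0.0] * 4 for _ in range(4)]
    for (mu, nu), val in entries.items():
        th[mu][nu], th[nu][mu] = val, -val
    return th


def contraction(th):
    """theta_{mu nu} theta^{mu nu}, both indices lowered with ETA (~ 2(B^2 - E^2))."""
    return sum(ETA[m] * ETA[n] * th[m][n] ** 2 for m in range(4) for n in range(4))


def pfaffian(th):
    """Pf(theta), the E.B-type invariant."""
    return th[0][1] * th[2][3] - th[0][2] * th[1][3] + th[0][3] * th[1][2]


def classify(th, tol=1e-12):
    if all(th[m][n] == 0 for m in range(4) for n in range(4)):
        return "commutative"
    s, pf = contraction(th), pfaffian(th)
    if abs(pf) > tol:
        return "generic (E.B-type invariant nonzero)"
    if s > tol:
        return "space-like"   # can be boosted to theta^{0i} = 0
    if s < -tol:
        return "time-like"    # violates theta_{mu nu} theta^{mu nu} >= 0
    return "light-like"       # theta^{0i} survives every finite boost


if __name__ == "__main__":
    print(classify(theta_matrix({(1, 2): 1.0})))               # space-like
    print(classify(theta_matrix({(0, 1): 1.0})))               # time-like
    print(classify(theta_matrix({(0, 2): 1.0, (1, 2): 1.0})))  # light-like
```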
For light-like noncommutativity, on the other hand, $\theta^{0i}$ cannot be eliminated by any finite boost. In string theory with a background electric field, however, stringy effects conspire to cancel such acausal effects [86]. There is no low-energy limit in this case in which both $\theta^{ij}$ and $G^{ij}$ can be kept fixed when $\alpha' \to 0$, because, unlike magnetic fields, electric fields in string theory have a limiting critical value above which the vacuum becomes unstable [87], and one cannot take the external field to be arbitrarily large. Hence there is no low-energy limit in which one is left with only a noncommutative field theory. Instead, such a theory of open strings should be considered in a somewhat different decoupling limit whose effective theory is not a noncommutative field theory but rather a theory of open strings in noncommutative spacetime [86]. The closed string dynamics are still decoupled from the open string sector, so that the theory represents a new sort of noncritical string theory which does not require closed strings for its consistency. The effective string scale of this theory is of the order of the noncommutativity scale, so that stringy effects do not decouple from noncommutative effects and an open string theory emerges, rather than a field theory. This new model is known as noncommutative open string theory [86]. Other such open string theories have been found in [88]. One can also get a light-like noncommutative quantum field theory from a consistent field theory limit of string theory in the presence of electromagnetic fields satisfying $E^2 - B^2 = 0$ and $E\cdot B = 0$ [89].

4. Noncommutative Yang–Mills theory

Having now become acquainted with some of the generic properties of noncommutative quantum field theory, we shall focus most of our attention in the remainder of this paper on gauge theories on a noncommutative space, which are the relevant field theories for the low-energy dynamics of open
strings in background supergravity fields and on D-branes [28,34]. The Weyl quantization procedure of Section 2 generalizes straightforwardly to the algebra of $N\times N$ matrix-valued functions on $\mathbb{R}^D$. The star-product then becomes the tensor product of matrix multiplication with the Groenewold–Moyal product (2.19) of functions. This extended star-product is still associative. We can therefore use this method to systematically construct noncommutative gauge theories on $\mathbb{R}^D$ [60].

4.1. Star-gauge symmetry

Let $A_i(x)$ be a Hermitian $U(N)$ gauge field on $\mathbb{R}^D$ which may be expanded in terms of the Lie algebra generators $t^a$ of $U(N)$ as $A_i = A_i^a\,t^a$, with ${\rm tr}_N(t^at^b) = \delta^{ab}$, $a,b = 1,\dots,N^2$, and $[t^a,t^b] = i f^{ab}{}_c\,t^c$. Here the $t^a$ live in the fundamental representation of the $U(N)$ gauge group and ${\rm tr}_N$ denotes the ordinary matrix trace. In fact, many of the expressions in the following do not close in the $U(N)$ Lie algebra, as they will involve products rather than commutators of the generators. We introduce a Hermitian Weyl operator corresponding to $A_i(x)$ by

$$\hat W[A]_i = \int d^Dx~\hat\Delta(x)\otimes A_i(x) , \tag{4.1}$$

where $\hat\Delta(x)$ is the map (2.6) and the tensor product between the coordinate and matrix representations is written explicitly for emphasis. We may then write down the appropriate noncommutative version of the Yang–Mills action as

$$S_{\rm YM} = -\frac{1}{4g^2}\,{\rm Tr}\otimes{\rm tr}_N\Bigl([\hat\partial_i,\hat W[A]_j] - [\hat\partial_j,\hat W[A]_i] - i\,[\hat W[A]_i,\hat W[A]_j]\Bigr)^2 , \tag{4.2}$$

where Tr is the operator trace (2.11) over the spacetime coordinate indices. Using (4.1), (2.9), (2.15) and (2.18), the action (4.2) can be written as

$$S_{\rm YM} = -\frac{1}{4g^2}\int d^Dx~{\rm tr}_N\bigl(F_{ij}(x)\star F^{ij}(x)\bigr) , \tag{4.3}$$

where

$$F_{ij} = \partial_iA_j - \partial_jA_i - i\,(A_i\star A_j - A_j\star A_i) = \partial_iA_j - \partial_jA_i - i\,[A_i,A_j] + \frac12\,\theta^{kl}\bigl(\partial_kA_i\,\partial_lA_j - \partial_kA_j\,\partial_lA_i\bigr) + O(\theta^2) \tag{4.4}$$
is the noncommutative field strength of the gauge field $A_i(x)$. Thus the gauge field belongs to the tensor product of the Groenewold–Moyal deformed algebra of functions on $\mathbb{R}^D$ with the algebra of $N\times N$ matrices. Note that the action (4.3) defines a nontrivial interacting theory even for the simplest case of rank $N = 1$, which for $\theta = 0$ is just pure electrodynamics. Let us consider the symmetries of the action (4.2). It is straightforward to see that it is invariant under any inhomogeneous transformation of the form

$$\hat W[A]_i \to \hat W[g]\,\hat W[A]_i\,\hat W[g]^\dagger - i\,\hat W[g]\,[\hat\partial_i,\hat W[g]^\dagger] , \tag{4.5}$$

with $\hat W[g]$ an arbitrary unitary element of the unital $C^*$-algebra of matrix-valued Weyl operators,⁸ i.e.

$$\hat W[g]\,\hat W[g]^\dagger = \hat W[g]^\dagger\,\hat W[g] = \hat{\mathbf 1}\otimes\mathbf 1_N , \tag{4.6}$$
⁸ Actually, this algebra does not contain an identity element because we are restricting to the space of Schwartz fields. It can, however, be easily extended to a unital algebra. We will elaborate on this point in Section 8.
where $\hat{\mathbf 1}$ is the identity on the ordinary Weyl operator algebra and $\mathbf 1_N$ is the $N\times N$ unit matrix. Given the one-to-one correspondence between Weyl operators and fields, we may expand the unitary operator $\hat W[g]$ in terms of an $N\times N$ matrix field $g(x)$ on $\mathbb{R}^D$ as

$$\hat W[g] = \int d^Dx~\hat\Delta(x)\otimes g(x) . \tag{4.7}$$

The unitarity condition (4.6) is then equivalent to

$$g(x)\star g(x)^\dagger = g(x)^\dagger\star g(x) = \mathbf 1_N . \tag{4.8}$$
In this case we say that the matrix field $g(x)$ is star-unitary. Note that (4.8) implies that the adjoint $g^\dagger$ of $g$ is equal to the inverse of $g$ with respect to the star-product on the deformed algebra of functions on spacetime, but for $\theta \neq 0$ we generally have $g^\dagger \neq g^{-1}$. In other words, generally $\hat W[g^{-1}] \neq \hat W[g]^{-1}$. The explicit relationship between $g^\dagger$ and $g^{-1}$ can be worked out order by order in $\theta$ by using the infinite series representation of the star-product in (2.19). To leading orders we have (for $\theta$ invertible)

$$g^\dagger = g^{-1} + \frac i2\,\theta^{ij}\,g^{-1}(\partial_ig)\,g^{-1}(\partial_jg)\,g^{-1} + O(\theta^2) . \tag{4.9}$$

From the Weyl–Wigner correspondence it follows that the function $g(x)$ parametrizes the local star-gauge transformation

$$A_i(x) \to g(x)\star A_i(x)\star g(x)^\dagger - i\,g(x)\star\partial_ig(x)^\dagger . \tag{4.10}$$
The invariance of the noncommutative Yang–Mills gauge theory action (4.3) under (4.10) follows from the cyclicity of both the operator and matrix traces, and the corresponding covariant transformation rule for the noncommutative field strength,

$$F_{ij}(x) \to g(x)\star F_{ij}(x)\star g(x)^\dagger . \tag{4.11}$$
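The only new ingredient in (4.3)–(4.4) relative to ordinary Yang–Mills theory is the Groenewold–Moyal product. As an illustration, here is a minimal sketch on the plane ($D = 2$, $\theta^{12} = -\theta^{21} = \theta$, an assumption made for simplicity), restricted to polynomial functions, for which the star-product series terminates. It checks $x^1\star x^2 - x^2\star x^1 = i\theta$ and associativity. For rank $N = 1$ it also checks that the star commutator of two quadratic polynomials equals $i\theta^{kl}\partial_kf\,\partial_lg$ exactly, which is the $O(\theta)$ term appearing in (4.4). The dictionary representation and helper names are our own.

```python
from math import comb, factorial

# A polynomial in (x, y) is a dict {(a, b): coeff} meaning sum of coeff * x**a * y**b.


def _clean(p, tol=1e-12):
    """Drop numerically vanishing coefficients."""
    return {k: v for k, v in p.items() if abs(v) > tol}


def add(p, q, s=1):
    """Return p + s*q."""
    out = dict(p)
    for k, v in q.items():
        out[k] = out.get(k, 0) + s * v
    return _clean(out)


def mul(p, q):
    """Ordinary pointwise product."""
    out = {}
    for (a, b), c in p.items():
        for (d, e), f in q.items():
            out[(a + d, b + e)] = out.get((a + d, b + e), 0) + c * f
    return _clean(out)


def deriv(p, n1, n2):
    """Apply (d/dx)**n1 (d/dy)**n2."""
    out = {}
    for (a, b), c in p.items():
        if a >= n1 and b >= n2:
            fac = factorial(a) // factorial(a - n1) * (factorial(b) // factorial(b - n2))
            out[(a - n1, b - n2)] = out.get((a - n1, b - n2), 0) + c * fac
    return _clean(out)


def degree(p):
    return max((a + b for a, b in p), default=0)


def star(f, g, theta):
    """Groenewold-Moyal product with theta^{12} = -theta^{21} = theta:

    f*g = sum_n (i theta/2)^n / n! sum_k C(n,k) (-1)^k (dx^{n-k} dy^k f)(dx^k dy^{n-k} g),
    a series that terminates for polynomials.
    """
    out = {}
    for n in range(min(degree(f), degree(g)) + 1):
        pref = (1j * theta / 2) ** n / factorial(n)
        for k in range(n + 1):
            term = mul(deriv(f, n - k, k), deriv(g, k, n - k))
            out = add(out, term, pref * comb(n, k) * (-1) ** k)
    return out


def poisson(f, g):
    """Poisson bracket {f, g} = dx f dy g - dy f dx g."""
    return add(mul(deriv(f, 1, 0), deriv(g, 0, 1)), mul(deriv(f, 0, 1), deriv(g, 1, 0)), -1)


def close(p, q, tol=1e-9):
    return all(abs(v) < tol for v in add(p, q, -1).values())


if __name__ == "__main__":
    theta = 0.7
    x, y = {(1, 0): 1}, {(0, 1): 1}
    print(add(star(x, y, theta), star(y, x, theta), -1))  # only a constant i*theta term
    A1, A2 = {(2, 0): 1, (1, 1): 3}, {(0, 2): 1, (1, 0): -1}
    lhs = add(star(A1, A2, theta), star(A2, A1, theta), -1)
    rhs = {k: 1j * theta * v for k, v in poisson(A1, A2).items()}
    print(close(lhs, rhs))  # True
```

Only the odd powers of $\theta$ survive in a star commutator, which is why the $O(\theta^2)$ term drops out for quadratic polynomials.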
The noncommutative gauge theory obtained in this way reduces to conventional $U(N)$ Yang–Mills theory in the commutative limit $\theta = 0$. However, because of the way that the theory is constructed above from associative algebras, there is no direct way to get other gauge groups [60,90]. The important point here is that expressions in noncommutative gauge theory in general involve the enveloping algebra of the underlying Lie group. Because of the property

$$(g\star h)^\dagger = h^\dagger\star g^\dagger , \tag{4.12}$$
the Groenewold–Moyal product $g\star h$ of two star-unitary matrix fields is always star-unitary, and the group $U(N)$ (in the fundamental representation) is closed under the star-product. However, the special unitary group $SU(N)$ does not give rise to any gauge group on noncommutative $\mathbb{R}^D$, because in general $\det(g\star h) \neq \det(g)\star\det(h)$. In contrast to the commutative case, the $U(1)$ and $SU(N)$ sectors of the decomposition

$$U(N) = U(1)\times SU(N)/\mathbb{Z}_N \tag{4.13}$$
do not decouple because the $U(1)$ "photon" interacts with the $SU(N)$ gluons [91]. Physically, this $U(1)$ corresponds to the centre of mass coordinate of a system of $N$ D-branes and it represents the interactions of the short open string excitations on the D-branes with the bulk supergravity fields.
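The algebraic root of this coupling is that products, not just commutators, of generators appear once star-products are involved, and products of $SU(N)$ generators have components along the $U(1)$ generator. The sketch below checks this explicitly for $U(2)$. It assumes the basis $t^0 = \mathbf 1_2/\sqrt2$, $t^{1,2,3} = \sigma^{1,2,3}/\sqrt2$ (normalized as ${\rm tr}_N(t^at^b) = \delta^{ab}$), computes $f^{abc} = -i\,{\rm tr}([t^a,t^b]t^c)$ and $d^{abc} = {\rm tr}(\{t^a,t^b\}t^c)$, and verifies the decomposition $t^at^b = \frac i2 f^{abc}t^c + \frac12 d^{abc}t^c$ that is used for the Feynman rules (4.30) below. In particular $d^{110} = \sqrt2 \neq 0$. The helper names are our own.

```python
import math

# U(2) generators with tr(t^a t^b) = delta^{ab}: t^0 spans the U(1) centre,
# t^{1,2,3} = sigma^{1,2,3}/sqrt(2) span su(2).
S = 1 / math.sqrt(2)
BASIS = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]
T = [[[S * v for v in row] for row in m] for m in BASIS]
R = range(4)


def mm(a, b):
    """2x2 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def lin(c1, a, c2, b):
    """Linear combination c1*a + c2*b of 2x2 matrices."""
    return [[c1 * a[i][j] + c2 * b[i][j] for j in range(2)] for i in range(2)]


def tr(a):
    return a[0][0] + a[1][1]


# f^{abc} = -i tr([t^a, t^b] t^c),  d^{abc} = tr({t^a, t^b} t^c)
f = [[[-1j * tr(mm(lin(1, mm(T[a], T[b]), -1, mm(T[b], T[a])), T[c])) for c in R]
      for b in R] for a in R]
d = [[[tr(mm(lin(1, mm(T[a], T[b]), 1, mm(T[b], T[a])), T[c])) for c in R]
      for b in R] for a in R]


def product_error(a, b):
    """Largest entry of t^a t^b - sum_c (i/2 f^{abc} + 1/2 d^{abc}) t^c."""
    lhs = mm(T[a], T[b])
    rhs = [[sum((0.5j * f[a][b][c] + 0.5 * d[a][b][c]) * T[c][i][j] for c in R)
            for j in range(2)] for i in range(2)]
    return max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))


if __name__ == "__main__":
    print(max(product_error(a, b) for a in R for b in R))  # rounding error only
    print(d[1][1][0])  # component of t^1 t^1 along the U(1) generator t^0
```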
In the case of a vanishing background $B$-field, the closed and open string dynamics decouple and one is effectively left with an $SU(N)$ gauge theory, but this is no longer true when $B \neq 0$. It has been argued, however, that one can still define orthogonal and symplectic star-gauge groups by using anti-linear anti-unitary automorphisms of the Weyl operator algebra [92]. We shall see in Section 8 that these automorphisms are related to some standard operators in noncommutative geometry which can be thought of as generating charge conjugation symmetries of the field theory. Physically, these cases correspond to the stability of orientifold constructions with background $B$-fields and D$p$-branes [92]. Notice also that, in contrast to the case of noncommutative scalar field theory, the corresponding quantum measure for path integration is not simply the ordinary gauge-fixed Feynman measure for the $U(N)$ gauge field $A_i(x)$, because it must be defined by gauge-fixing the star-unitary gauge group, i.e. the group of unitary elements of the matrix-valued Weyl operator algebra. We shall return to this point in Section 4.3. The noncommutative gauge symmetry group will be described in some detail in Section 8.

4.2. Noncommutative Wilson lines

We now turn to a description of star-gauge invariant observables in noncommutative Yang–Mills theory [68,95,96]. Let $C_v$ be an arbitrary oriented smooth contour in spacetime $\mathbb{R}^D$. The line $C_v$ is parametrized by the smooth embedding functions $\xi(t):[0,1]\to\mathbb{R}^D$ with endpoints $\xi(0) = 0$ and $\xi(1) = v$ in $\mathbb{R}^D$. The holonomy of a noncommutative gauge field over such a contour is described by the noncommutative parallel transport operator

$$\begin{aligned} U(x;C_v) &= P\exp_\star\left(i\int_{C_v}d\xi^i~A_i(x+\xi)\right)\\ &= 1 + \sum_{n=1}^\infty i^n\int_0^1dt_1\int_{t_1}^1dt_2\cdots\int_{t_{n-1}}^1dt_n~\frac{d\xi^{i_1}(t_1)}{dt_1}\cdots\frac{d\xi^{i_n}(t_n)}{dt_n}\\ &\qquad\times A_{i_1}\bigl(x+\xi(t_1)\bigr)\star\cdots\star A_{i_n}\bigl(x+\xi(t_n)\bigr) , \end{aligned} \tag{4.14}$$
where $P$ denotes path ordering and we have used the extended star-product (2.24). The operator (4.14) is an $N\times N$ star-unitary matrix field depending on the line $C_v$. Under the star-gauge transformation (4.10), it transforms as

$$U(x;C_v) \to g(x)\star U(x;C_v)\star g(x+v)^\dagger . \tag{4.15}$$
The noncommutative holonomy can alternatively be represented as [97]

$$U(x;C_v) = G(x)\star G(x+v)^\dagger , \tag{4.16}$$
where $G(x)$ is a solution of the noncommutative parallel transport equation

$$\partial_iG(x) = i\,A_i(x)\star G(x) , \tag{4.17}$$
which in general depends on the choice of integration path. Observables of noncommutative gauge theory must be star-gauge invariant. Using the holonomy operators (4.14) and assuming that $\theta$ is invertible, it is straightforward to associate a star-gauge
invariant observable to every contour $C_v$ by [68,95,96]

$$\mathcal O(C_v) = \int d^Dx~{\rm tr}_N\bigl(U(x;C_v)\bigr)\star e^{ik_i(v)x^i} , \tag{4.18}$$

where the line parameter

$$k_i(v) = (\theta^{-1})_{ij}\,v^j \tag{4.19}$$
can be thought of as the total momentum of $C_v$. The star-gauge invariance of (4.18) follows from the fact that, for any $v\in\mathbb{R}^D$, the plane wave $e^{ik_i(v)x^i}$ is the unique function with the property that

$$e^{ik_i(v)x^i}\star g(x)\star e^{-ik_i(v)x^i} = g(x+v) \tag{4.20}$$

for arbitrary functions $g(x)$ on $\mathbb{R}^D$. Using (4.15), (4.20), and the cyclicity of the traces Tr and ${\rm tr}_N$, the star-gauge invariance of the operator (4.18) follows. To establish the property (4.20), it suffices by Fourier transformation to prove it for arbitrary plane waves $e^{ip_ix^i}$. Then, using the coordinate space representation of the Baker–Campbell–Hausdorff formula (2.12) and the star-unitarity of any plane wave, we have the identity

$$e^{ik_ix^i}\star e^{ip_ix^i}\star e^{-ik_ix^i} = e^{ik_ix^i}\star e^{-ik_ix^i}\star e^{ip_ix^i}~e^{ip_i\theta^{ij}k_j} = e^{ip_i(x^i+\theta^{ij}k_j)} , \tag{4.21}$$
from which (4.20) and (4.19) follow. This means that, in noncommutative gauge theory, the spacetime translation group is a subgroup of the star-gauge group. In fact, the same is true of the rotation group of $\mathbb{R}^D$ (cf. (2.21)) [98]. The fact that the Euclidean group is contained in the star-gauge symmetry implies that the local dynamics of gauge invariant observables is far more restricted in noncommutative Yang–Mills theory than in the commutative case. We shall describe such spacetime symmetries in more detail in Section 8. The most striking fact about the construction (4.18) is that in the noncommutative case there are gauge invariant observables associated with open contours $C_v$, in contrast to the commutative case where only closed loops $C_0$ would be allowed. The translational symmetry generated by the star-product leads to a larger class of observables in noncommutative gauge theory. Let us make a few further remarks concerning the above construction:

• For an open line $C_v$ with relative separation vector $v$ between its two endpoints, the parameter (4.19) has a natural interpretation as its total momentum (by the Fourier form of the integral (4.18)). It follows that the longer the curve is, the larger its momentum is. This is simply the characteristic UV/IR mixing phenomenon that we encountered in the previous section. If one increases the momentum $k_j$ in a given direction, then the contour will extend in the other spacetime directions proportionally to $\theta^{ij}k_j$. In the electric dipole interpretation of Section 3.3, the relationships (4.19) and (4.20) follow if we demand that the dipole quanta of the field theory interact by joining at their ends. We will see some more manifestations of this exotic property later on.

• In the commutative limit $\theta = 0$ we have $v = 0$, which is the well-known property that there are no gauge-invariant quantities associated with open lines in ordinary Yang–Mills theory.
• When $\theta = 0$, the quantity (4.18) can be defined for closed contours by replacing the plane wave $e^{ik_i(v)x^i}$ by an arbitrary function $f(x)$, since in that case the total momentum of a closed loop is unrestricted. In particular, we can take $f(x)$ to be delta-function supported about some fixed spacetime point and recover the standard gauge-invariant Wilson loops of Yang–Mills theory. However, for $\theta \neq 0$, closed loops have vanishing momentum, and only the unit function $e^{ik_i(v)x^i} = 1$ is
permitted in (4.18). Thus, although there is a larger class of observables in noncommutative Yang–Mills theory, the dynamics of closed Wilson loops is severely restricted as compared to the commutative case. Indeed, the requirement of star-gauge invariance is an extremely stringent restriction on the quantum field theory. It means that there is no local star-gauge invariant dynamics, because everything must be smeared out by the Weyl operator trace Tr. The fact that there are no local operators such as the gluon operator ${\rm tr}_N\,F_{ij}(x)^2$ suggests that the gauge dynamics below the noncommutativity scale can be quite different from the commutative case. This is evident in the dual supergravity computations of noncommutative Wilson loops [99], which show that while the standard area law behaviour may be observed at very large distance scales, below the noncommutativity scale it breaks down and is replaced by some unconventional behaviour. This makes it unclear how to interpret quantities such as a static quark potential in noncommutative gauge theory.

• The gauge-invariant Wilson line operators have been shown to constitute an overcomplete set of observables for noncommutative gauge theory [96], just like in the commutative case. This is due to the fact that fluctuations in the shape of $C_v$ leave the corresponding holonomy invariant. They may be used to construct gauge invariant operators which carry definite momentum and which reduce to the usual local gauge invariant operators of ordinary gauge field theory in the commutative limit as follows [100,101]. For this, we let

$$C_k^{(0)}:~~\xi^j(t) = k_i\,\theta^{ij}\,t , \qquad 0\le t\le1 \tag{4.22}$$
be the straight line path from the origin to the point $v^j = k_i\theta^{ij}$, and let $\mathcal O(x)$ be any local operator of ordinary Yang–Mills theory which transforms in the adjoint representation of the gauge group. Then a natural star-gauge invariant operator is obtained by attaching the operator $\mathcal O(x)$ at one end of a Wilson line of nonvanishing momentum,

$$\tilde{\mathcal O}(k) = {\rm tr}_N\int d^Dx~\mathcal O(x)\star U\bigl(x;C_k^{(0)}\bigr)\star e^{ik_ix^i} . \tag{4.23}$$

The operators of the form (4.23) generate a convenient set of gauge-invariant operators which are the natural generalizations of the standard local gauge theory operators in the commutative limit. For small $k$ or $\theta$, the separation $v$ of the open Wilson lines becomes small, and (4.23) reduces to the usual Yang–Mills operator in momentum space. In this sense, it is possible to generate operators which are local in momentum in noncommutative gauge theory.

• Correlation functions of the operators (4.23) exhibit many of the stringy features of noncommutative gauge theory [100,102]. They can also be used to construct the appropriate gauge invariant operators that couple noncommutative gauge fields on a D-brane to massless closed string modes in flat space [103], and thereby yield explicit expressions for the gauge theory operators dual to bulk supergravity fields in this case. We will return to this point in Section 8.

Observables (4.18) may also be expressed straightforwardly in terms of Weyl operators [95,96], though we shall not do so here. Here we will simply point out an elegant path integral representation of the noncommutative holonomy operator (4.14) in the case of a $U(1)$ gauge group [93]. Let us introduce, as in the Kontsevich formula (2.27), auxiliary bosonic fields $\zeta^i(t)$ which live on the contour $C_v$ and which have the free propagator

$$\bigl\langle\zeta^i(t)\,\zeta^j(t')\bigr\rangle = \Bigl[\bigl(-i\,\theta^{-1}\otimes\partial_t\bigr)^{-1}\Bigr]^{ij}(t,t') = \frac i2\,\theta^{ij}\,{\rm sgn}(t-t') . \tag{4.24}$$
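Returning briefly to the translation identity (4.21), which underlies the gauge invariance of (4.18), it is easy to test numerically on the plane-wave superpositions used in the Fourier argument. The sketch below assumes $D = 2$ and a particular invertible $\theta^{ij}$, represents a function as a finite sum of plane waves, and implements the star product through the phase $e^{ik_ix^i}\star e^{ip_ix^i} = e^{-\frac i2k_i\theta^{ij}p_j}\,e^{i(k+p)_ix^i}$ (the plane-wave form of (2.19), assuming the sign convention $f\star g = f\,e^{\frac i2\theta^{ij}\overleftarrow{\partial_i}\overrightarrow{\partial_j}}\,g$). It then checks that conjugation by $e^{ik_i(v)x^i}$, with $k(v)$ from (4.19), sends $g(x)$ to $g(x+v)$. All names are illustrative.

```python
import cmath
import random

# Illustrative choice (assumption): D = 2 with an invertible theta^{ij}.
A = 0.8
THETA = [[0.0, A], [-A, 0.0]]
THETA_INV = [[0.0, -1.0 / A], [1.0 / A, 0.0]]  # (theta^{-1})_{ij}


def contract(k, p):
    """k_i theta^{ij} p_j."""
    return sum(k[i] * THETA[i][j] * p[j] for i in range(2) for j in range(2))


def star(f, g):
    """Star product of finite plane-wave sums {momentum tuple: coefficient}."""
    out = {}
    for k, a in f.items():
        for p, b in g.items():
            q = tuple(ki + pi for ki, pi in zip(k, p))
            out[q] = out.get(q, 0) + a * b * cmath.exp(-0.5j * contract(k, p))
    return out


def evaluate(f, x):
    """Value of the plane-wave sum f at the point x."""
    return sum(c * cmath.exp(1j * sum(ki * xi for ki, xi in zip(k, x))) for k, c in f.items())


def momentum_of(v):
    """k_i(v) = (theta^{-1})_{ij} v^j, cf. (4.19)."""
    return tuple(sum(THETA_INV[i][j] * v[j] for j in range(2)) for i in range(2))


if __name__ == "__main__":
    rng = random.Random(0)
    g = {(rng.uniform(-2, 2), rng.uniform(-2, 2)): complex(rng.gauss(0, 1), rng.gauss(0, 1))
         for _ in range(5)}
    v = (0.3, -1.1)
    k = momentum_of(v)
    conj = star(star({k: 1}, g), {tuple(-ki for ki in k): 1})
    x = (0.4, 0.9)
    # Difference between (e^{ikx} * g * e^{-ikx})(x) and g(x + v): rounding error only.
    print(abs(evaluate(conj, x) - evaluate(g, (x[0] + v[0], x[1] + v[1]))))
```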
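It is then straightforward to see that the parallel transport operator (4.14) can be expressed in terms of the path integral expectation value

$$U(x;C_v) = \left\langle\exp\left(i\int_{C_v}d\xi^i~A_i(x+\xi+\zeta)\right)\right\rangle = \int D\zeta~\exp\left\{i\int_{C_v}dt\left[\frac12\,\zeta^i(t)\,(\theta^{-1})_{ij}\,\frac{d\zeta^j(t)}{dt} + \frac{d\xi^i(t)}{dt}\,A_i\bigl(x+\xi(t)+\zeta(t)\bigr)\right]\right\} . \tag{4.25}$$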
The equivalence between the two representations follows from expanding the gauge field $A_i(x+\xi+\zeta)$ as a formal power series in $\zeta^i(t)$ and applying Wick's theorem. Because of the $\theta$ dependence of the propagator (4.24), the Wick contractions produce the appropriate series representation of the extended star-product in (2.24), while the ${\rm sgn}(t-t')$ term produces the required path ordering operation $P$ in the Wick expansion. Again, the beauty of formula (4.25) is that it uses ordinary products of fields and is therefore much more amenable to practical, perturbative computations involving noncommutative Wilson lines. Other descriptions of the noncommutative holonomy may be found in [48,94].

4.3. One-loop renormalization

In order to analyse the perturbative properties of noncommutative Yang–Mills theory, one first needs to gauge-fix the star-gauge invariance of the model. This can be done in a straightforward way by adapting the standard Faddeev–Popov technique to the noncommutative case [91,104,107]. The gauge-fixed noncommutative Yang–Mills action assumes the form

$$S_{\rm YM} = \int d^Dx~{\rm tr}_N\left[-\frac{1}{4g^2}\,F_{ij}(x)\star F^{ij}(x) + \frac{1}{2\xi}\,\bigl(\partial_iA^i(x)\bigr)^2 - \frac12\,\bar c(x)\star\partial_i\nabla^ic(x) + \frac12\,\partial_i\nabla^ic(x)\star\bar c(x)\right] , \tag{4.26}$$

where $c = c^at^a$ and $\bar c = \bar c^at^a$ are noncommutative fermionic Faddeev–Popov ghost fields which transform in the adjoint representation of the local star-gauge group,

$$c(x) \to g(x)\star c(x)\star g(x)^\dagger , \qquad \bar c(x) \to g(x)\star\bar c(x)\star g(x)^\dagger . \tag{4.27}$$
The constant $\xi$ is the covariant gauge-fixing parameter, and $\nabla_i$ denotes the star-gauge covariant derivative, which is defined by

$$\nabla_ic = \partial_ic - i\,(A_i\star c - c\star A_i) . \tag{4.28}$$
Feynman rules for noncommutative Yang–Mills theory may now be written down [108]. Because of the noncommutative interaction vertices analogous to (3.4), the effective "structure constants" of the star-gauge group will involve oscillatory functions of the momenta of the lines.⁹ For instance, in the case of a $U(1)$ gauge group, the Feynman rules are easily read off from those of ordinary
⁹ Explicit presentations of the genuine structure constants of the noncommutative gauge symmetry group may be found in [98].
nonabelian gauge theory by simply replacing the Lie algebra structure constants $f^{abc}$ with the momentum dependent functions

$$f_{kpq} = 2\sin(k\wedge p)\,(2\pi)^D\,\delta^D(k+p+q) , \tag{4.29}$$
and sums over Lie algebraic indices by integrations over momenta. In the generic case of rank $N > 1$, the only subtlety which arises is that the Feynman rules will involve not only commutators but also anticommutators of the $U(N)$ generators $t^a$. They will therefore depend on both the antisymmetric structure constants $f^{abc}$ and the symmetric tensors $d^{abc}$, where

$$t^a\,t^b = \frac i2\,f^{abc}\,t^c + \frac12\,d^{abc}\,t^c . \tag{4.30}$$

In the Feynman–'t Hooft gauge $\xi = 1$, the gluon propagator is $-(i/p^2)\,G_{ij}\,\delta^{ab}$ and the ghost propagator is $-(i/p^2)\,\delta^{ab}$. The three-gluon vertex is given by

$$V^{(3)}(k,p,q)^{abc}_{ijl} = -2g\Bigl[\bigl(d^{abc}\sin k\wedge p - i f^{abc}\cos k\wedge p\bigr)(k-q)_l\,G_{ij} + \text{permutations}\Bigr](2\pi)^D\,\delta^D(k+p+q) , \tag{4.31}$$
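the four-gluon vertex by

$$\begin{aligned} V^{(4)}(k,p,q,r)^{abeg}_{ijlm} = -4ig^2\Bigl[&\bigl(d^{abc}\sin k\wedge p - i f^{abc}\cos k\wedge p\bigr)\bigl(d^{ceg}\sin q\wedge r - i f^{ceg}\cos q\wedge r\bigr)\\ &\times\bigl(G_{il}G_{jm} - G_{im}G_{jl}\bigr) + \text{permutations}\Bigr](2\pi)^D\,\delta^D(k+p+q+r) , \end{aligned} \tag{4.32}$$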
and the ghost–ghost–gluon vertex by

$$V_{\rm gh}(k,p,q)^{abc}_i = -2g\,k_i\bigl(d^{abc}\sin k\wedge p - i f^{abc}\cos k\wedge p\bigr)(2\pi)^D\,\delta^D(k+p+q) . \tag{4.33}$$
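The origin of the $\sin(k\wedge p)$ and $\cos(k\wedge p)$ factors can be seen from the star product of plane waves. Taking, as a convention for this sketch, $k\wedge p = \frac12k_i\theta^{ij}p_j$ and $f\star g = f\,e^{\frac i2\theta^{ij}\overleftarrow{\partial_i}\overrightarrow{\partial_j}}\,g$, one has $e^{ik\cdot x}\star e^{ip\cdot x} = e^{-ik\wedge p}\,e^{i(k+p)\cdot x}$. The star commutator and anticommutator of two plane waves therefore carry $-2i\sin(k\wedge p)$ and $2\cos(k\wedge p)$ respectively. For $U(1)$ the sine survives even with $f^{abc} = 0$, so the photon self-interacts unless $\theta = 0$. The $\theta^{ij}$ below is an arbitrary illustrative choice.

```python
import cmath
import math

# Illustrative antisymmetric theta^{ij} in D = 4 (an assumption).
THETA = [
    [0.0, 1.0, 0.0, 0.0],
    [-1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.5],
    [0.0, 0.0, -0.5, 0.0],
]


def wedge(k, p, theta=THETA):
    """k ^ p, taken here to mean (1/2) k_i theta^{ij} p_j."""
    n = len(k)
    return 0.5 * sum(k[i] * theta[i][j] * p[j] for i in range(n) for j in range(n))


def star_phase(k, p, theta=THETA):
    """e^{ik.x} * e^{ip.x} = star_phase(k, p) * e^{i(k+p).x}."""
    return cmath.exp(-1j * wedge(k, p, theta))


def star_commutator(k, p, theta=THETA):
    """Coefficient of e^{i(k+p).x} in [e^{ik.x}, e^{ip.x}]_*."""
    return star_phase(k, p, theta) - star_phase(p, k, theta)


def star_anticommutator(k, p, theta=THETA):
    """Coefficient of e^{i(k+p).x} in {e^{ik.x}, e^{ip.x}}_*."""
    return star_phase(k, p, theta) + star_phase(p, k, theta)


if __name__ == "__main__":
    k, p = (0.3, -1.2, 0.7, 2.0), (1.1, 0.4, -0.9, 0.25)
    w = wedge(k, p)
    print(star_commutator(k, p), -2j * math.sin(w))      # the two entries agree
    print(star_anticommutator(k, p), 2 * math.cos(w))    # the two entries agree
```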
The Feynman rules for the $U(1)$ case follow from substituting $f^{abc} = 0$ and $d^{abc} = 1$ in the above. With these rules, it is relatively straightforward to do perturbative calculations in noncommutative Yang–Mills theory in parallel to the commutative case. Essentially the only tricks involved are the uses of various trigonometric identities to simplify the momentum integrations over the oscillatory functions involved. We shall not go into any details here, but simply quote a few of the many interesting results that have been obtained. First, let us consider the one-loop renormalization of the gluon propagator [66]. Since star-products and matrix products always appear together, the notion of planarity in the sense of the large $N$ expansion is the same as that for the noncommutative interactions which was discussed in Section 3. Therefore, the large genus expansion of the theory will produce a similar sort of string expansion as in ordinary large $N$ gauge theory. Moreover, nonplanar one-loop $U(N)$ diagrams will contribute only to the $U(1)$ part of the theory. Using the noncommutative version of the standard background field gauge, the divergent part of the one-loop effective action to quadratic order in momentum space for $D = 4$ is found to be [66,105,106]

$$\begin{aligned}\Gamma^{(1)}_{\rm eff}[A] = \int\frac{d^4k}{(2\pi)^4}\Biggl\{&-\frac14\left(\frac1{g^2} - \frac{11N}{24\pi^2}\ln\frac{\Lambda^2}{k^2}\right){\rm tr}_N(\partial A)^2(k) + \frac{11N}{24\pi^2}\ln\frac{1}{k^2\,(k\circ k)}\,({\rm tr}_N\,\partial A)^2(k)\\ &+ \frac{4N}{\pi^2}\,\frac{\theta^{ij}\theta^{kl}}{(k\circ k)^2}\,\bigl({\rm tr}_N(\partial_iA_j)\,{\rm tr}_N(\partial_kA_l)\bigr)(k)\Biggr\} . \end{aligned} \tag{4.34}$$
After renormalization, the one-loop Gell-Mann–Low beta-function may be computed from (4.34) as [107,109]

$$\beta(g^2) = \frac{\partial g^2}{\partial\ln\Lambda} = -\frac{22}{3}\,\frac{g^4N}{8\pi^2} . \tag{4.35}$$
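Taking (4.35) at face value as $\partial g^2/\partial\ln\Lambda = -b\,g^4$ with $b = \frac{22}{3}\frac{N}{8\pi^2}$, the one-loop flow integrates to $1/g^2(\Lambda) = 1/g^2(\Lambda_0) + b\ln(\Lambda/\Lambda_0)$. Hence $g^2$ decreases in the ultraviolet and grows towards the infrared, for $N = 1$ as well. The sketch below compares this closed form with a direct Runge–Kutta integration of the same equation; function names and reference values are illustrative, and only the one-loop term is kept.

```python
import math


def b_coefficient(N):
    """b in d g^2 / d ln(Lambda) = -b g^4, reading (4.35) as b = (22/3) N / (8 pi^2)."""
    return (22.0 / 3.0) * N / (8.0 * math.pi ** 2)


def g2_closed_form(t, g2_ref, N):
    """g^2 at t = ln(Lambda/Lambda_ref); 1/g^2 is linear in t at one loop."""
    return g2_ref / (1.0 + b_coefficient(N) * g2_ref * t)


def g2_rk4(t, g2_ref, N, steps=2000):
    """Integrate the same one-loop flow with classical fourth-order Runge-Kutta."""
    b = b_coefficient(N)

    def flow(u):
        return -b * u * u

    h, u = t / steps, g2_ref
    for _ in range(steps):
        k1 = flow(u)
        k2 = flow(u + 0.5 * h * k1)
        k3 = flow(u + 0.5 * h * k2)
        k4 = flow(u + h * k3)
        u += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return u


def strong_coupling_log(g2_ref, N):
    """ln(Lambda/Lambda_ref) at which the one-loop g^2 diverges (in the infrared)."""
    return -1.0 / (b_coefficient(N) * g2_ref)


if __name__ == "__main__":
    for t in (-2.0, 0.0, 2.0, 10.0):
        print(t, g2_closed_form(t, 1.0, N=1), g2_rk4(t, 1.0, N=1))
    print(strong_coupling_log(1.0, N=1))
```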
Note that this formula holds even for $N = 1$, and it follows that noncommutative $U(1)$ gauge theory is asymptotically free. The effective coupling constant grows at large distance scales and leads to interesting strong coupling effects. This is of course also true for all $N$. In fact, apart from the nonplanar terms which are generically finite at $\theta \neq 0$, the effective action (4.34) is the same as that of ordinary commutative $SU(N)$ Yang–Mills theory in the large $N$ limit (in which only planar 't Hooft diagrams survive). Therefore, the perturbative beta-function for noncommutative $U(N)$ Yang–Mills theory for any $N$ can be simply found from that of the ordinary $SU(N)$ theory. This remarkable coincidence will be explained in Section 6 when we discuss Morita equivalence of noncommutative gauge theories. Note that, in contrast to the ordinary commutative case, where the dynamics in the centre $U(1)$ of the $U(N)$ gauge group is always decoupled and free, in the noncommutative case it runs with the same beta-function as the rest of the $SU(N)$ gauge theory [91]. The full, gauge-invariant noncommutative effective action for pure Yang–Mills theory involves open Wilson lines [110]. Notice also that the nonplanar $U(1)$ part of the effective action (4.34) has a logarithmic infrared singularity, similarly to the case of noncommutative scalar field theory. Here, unlike the power-like UV/IR mixing, which seems to imply the alarming feature that perturbation theory is no longer reliable, logarithmic UV/IR mixing may be put to good use. This has been pointed out in [111], where it was suggested that the $U(1)$ UV/IR mixed degrees of freedom of a $U(N)$ gauge field theory have a direct physical interpretation. There are examples of supersymmetric theories in which they decouple from the $SU(N)$ degrees of freedom, and eventually become weakly coupled in the infrared, playing the role of the hidden sector which breaks supersymmetry.
In this way the unfamiliar behaviour of the $U(1)$ running coupling constant in the extreme infrared is not interpreted as an artifact of perturbation theory, but is instead turned into a useful mechanism to break supersymmetry. Physical interpretations of UV/IR mixing from the D-brane perspective may also be found in [106,110]. Unlike the standard high momentum divergences of ordinary quantum field theories, which can typically be removed by a choice of regularization scheme, here the noncommutative momenta play the role of the regulators and lead to new infrared singularities which cannot be straightforwardly removed. These effects can also be characterized as nonanalytic behaviour in the noncommutativity parameter $\theta$, so that the noncommutative field theory does not recover ordinary field theory at the quantum level in the limit $\theta \to 0$. However, it is difficult to analyse the renormalizability properties of noncommutative Yang–Mills theory along the lines that we discussed in Section 3. Part of the problem stems from the fact, discussed in the previous subsection, that noncommutative gauge theories appear to have no local gauge invariant operators, and so it is difficult to deduce what (infrared) effects will be induced by the noncommutativity. Naively, one would expect that the theory would have at worst logarithmic divergences (unlike the scalar field theory studied in Section 3, which also contained quadratic divergences), but from (4.34) we see that both linear and logarithmic infrared singularities arise [105]. Because noncommutative Yang–Mills theory already contains massless fields, it is difficult to disentangle the usual infrared effects from the new ones induced by the noncommutativity. It is not clear in this case what the new light degrees of freedom look like.
It has been shown, however, that noncommutative quantum electrodynamics, i.e. noncommutative $U(1)$ gauge theory minimally coupled (with respect to the star-gauge invariance) to noncommutative fermion fields, is free from the infrared poles in $\theta$ but still contains the anticipated logarithmic nonanalyticity [105]. An exception is noncommutative supersymmetric Yang–Mills theory with 16 supercharges, in which UV/IR mixing appears to be absent altogether [112]. A lot of effort has also been expended on analysing the ultraviolet structure of the quantum field theory, and it is believed that in lower spacetime dimensions noncommutative Yang–Mills theory is renormalizable in precisely the same way that its commutative counterpart is [108,113]. It also appears to be gauge-invariant [112] and unitary [114] in perturbation theory, consistent with the fact that these models may be naturally embedded into string theory. Other aspects of perturbative noncommutative gauge theories are discussed in [115].

5. Gauge theory on the noncommutative torus

The study of massless field theories on a torus is of great interest in the noncommutative case because the compactness of the spacetime gives a natural infrared regularization of the theory. One may therefore analyse more carefully the ultraviolet behaviour and also the new light degrees of freedom which are responsible for the UV/IR mixing. From a more mathematical point of view, the noncommutative torus constitutes one of the original examples in noncommutative geometry [2] which captures the essential topological changes which occur when one deforms a compact space. It is perhaps the most basic example which still contains a rich geometrical structure. In this section we shall describe some basic aspects of the noncommutative torus with particular emphasis on the properties of vector bundles defined over it.
From the study of the global properties of gauge theories defined on this space, we will pave the way for our discussion of Morita equivalence in the next section.

5.1. The noncommutative torus

Most of what we have said about noncommutative quantum field theory is true when $\mathbb{R}^D$ is replaced by a $D$-dimensional torus $\mathbb{T}^D$, with only subtle changes that we shall now explain. Let $\Sigma^i{}_a$ be the $D\times D$ period matrix of $\mathbb{T}^D$, which is a vielbein for its metric, i.e. $\Sigma^i{}_a\,\delta^{ab}\,\Sigma^j{}_b = G^{ij}$. Here and in the following the indices $i,j,\dots$ will label spacetime directions while $a,b,\dots$ will denote indices in the frame bundle of $\mathbb{T}^D$. The matrices $\Sigma^i{}_a$ parametrize the moduli of $D$-dimensional tori and they may be regarded as maps from the frame bundle to the tangent bundle of $\mathbb{T}^D$. They define the periods of the directions of $\mathbb{T}^D$,

$$x^i \sim x^i + \Sigma^i{}_a , \qquad a = 1,\dots,D , \tag{5.1}$$
for each $i = 1,\dots,D$. When $\Sigma^i{}_a$ is not proportional to $\delta^i{}_a$, the identifications (5.1) for $a \neq i$ describe how the torus is tilted in its parallelogram representation. Smooth functions on the torus must be single-valued, which implies that the corresponding Fourier momenta $\vec k$ are quantized as

$$k_i = 2\pi\,(\Sigma^{-1})^a{}_i\,m_a , \qquad m_a\in\mathbb{Z} . \tag{5.2}$$
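The quantization condition (5.2) is just the requirement $k_i\Sigma^i{}_a\in2\pi\mathbb{Z}$ for every cycle $a$. A small sketch for a tilted two-torus (the period matrix below is an arbitrary illustrative choice, with column $a$ holding the period vector $\Sigma^i{}_a$) computes the allowed momenta and the metric $G^{ij}$. It then checks that $e^{ik_ix^i}$ returns to itself around every cycle exactly when the $m_a$ are integers.

```python
import cmath
import math

# Illustrative tilted 2-torus (assumption): column a of SIGMA is the period
# vector Sigma^i_a, so x^i ~ x^i + SIGMA[i][a].
SIGMA = [[1.0, 0.3],
         [0.0, 2.0]]


def inverse2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


SIGMA_INV = inverse2(SIGMA)  # SIGMA_INV[a][i] = (Sigma^{-1})^a_i


def momentum(m):
    """k_i = 2 pi (Sigma^{-1})^a_i m_a, eq. (5.2)."""
    return [2 * math.pi * sum(SIGMA_INV[a][i] * m[a] for a in range(2)) for i in range(2)]


def metric():
    """G^{ij} = Sigma^i_a delta^{ab} Sigma^j_b."""
    return [[sum(SIGMA[i][a] * SIGMA[j][a] for a in range(2)) for j in range(2)]
            for i in range(2)]


def monodromy(m, a):
    """Phase acquired by e^{i k.x} under x -> x + Sigma_a."""
    k = momentum(m)
    return cmath.exp(1j * sum(k[i] * SIGMA[i][a] for i in range(2)))


if __name__ == "__main__":
    print(monodromy((1, -2), 0), monodromy((1, -2), 1))  # both 1 up to rounding
    print(monodromy((0.5, 0.0), 0))                      # -1: not single-valued
    print(metric())
```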
Therefore, to describe the deformation of the function algebra, one cannot use the unbounded operators $\hat x^i$ obeying (1.1). Instead, one must restrict to the proper subalgebra of the algebra of noncommutative $\mathbb{R}^D$ that is generated by the Weyl basis of unitary operators

$$\hat Z_a = e^{2\pi i(\Sigma^{-1})^a{}_i\,\hat x^i} , \tag{5.3}$$
which generate the algebra ab Zˆ a Zˆ b = e−2i Zˆ b Zˆ a ;
(5.4)
ab = 2(−1 )i a ij (−1 )j b
(5.5)
where
are the corresponding dimensionless noncommutativity parameters. The commutation relations (5.4) de ne the “algebra of functions” on the noncommutative torus. Formally, if L ∼ = ZD is the lattice of rank D (with bilinear form Gij ) which generates the torus as the quotient space TD = RD =L, then the projective regular representations L in (5.4) of the lattice group L are labelled by an element ab of the second Hochschild cohomology group H 2 (L; U (1)). This latter characterization can be generalized to describe other sorts of noncommutative compacti cations of RD [116]. Any function on TD can be expanded as a Fourier series −1 a i f(x) = fm˜ e2i( )i ma x : (5.6) m ˜ ∈ZD
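The Python sketch below evaluates the dimensionless parameters (5.5) directly. It implements $\Theta^{ab}=2\pi(\Sigma^{-1})^a_i\theta^{ij}(\Sigma^{-1})^b_j$ for $D=2$. The value of $\theta$, the period matrices and the helper names are illustrative assumptions. In two dimensions any $2\times2$ matrix $M$ satisfies $M\epsilon M^\top=\det(M)\,\epsilon$ for the antisymmetric $\epsilon$, so the formula collapses to $\Theta^{12}=2\pi\theta^{12}/\det\Sigma$, i.e. $2\pi\theta$ divided by the oriented area of the torus. The script checks exactly that.

```python
import math


def inverse_2x2(s):
    """Inverse of a 2x2 matrix given as nested lists."""
    (p, q), (r, t) = s
    det = p * t - q * r
    return [[t / det, -q / det], [-r / det, p / det]]


def dimensionless_theta(theta, sigma):
    """Theta^{ab} = 2 pi (Sigma^{-1})^a_i theta^{ij} (Sigma^{-1})^b_j, eq. (5.5), D = 2."""
    s_inv = inverse_2x2(sigma)  # s_inv[a][i] = (Sigma^{-1})^a_i
    return [[2 * math.pi * sum(s_inv[a][i] * theta[i][j] * s_inv[b][j]
                               for i in range(2) for j in range(2))
             for b in range(2)] for a in range(2)]


theta = [[0.0, 0.25], [-0.25, 0.0]]   # illustrative theta^{12} = 0.25
tilted = [[1.0, 0.3], [0.0, 0.8]]      # columns are the period vectors
big_theta = dimensionless_theta(theta, tilted)
area = tilted[0][0] * tilted[1][1] - tilted[0][1] * tilted[1][0]
print(big_theta[0][1], 2 * math.pi * 0.25 / area)  # the two numbers agree
print(abs(big_theta[1][0] + big_theta[0][1]) < 1e-12)  # antisymmetric: True
```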
The corresponding Weyl algebra is generated by the operators (5.3), and Weyl quantization takes the form of the map

$$ \hat W[f] = \int d^Dx\;f(x)\,\hat\Delta(x)\,, \tag{5.7} $$

where the integration is taken over $\mathbb{T}^D$ and

$$ \hat\Delta(x) = \frac{1}{|\det\Sigma|}\sum_{\vec m\in\mathbb{Z}^D}\;\prod_{a=1}^D(\hat Z_a)^{m_a}\;\prod_{a<b}e^{-\pi i\,m_a\Theta^{ab}m_b}\;e^{-2\pi i\,(\Sigma^{-1})^a_i\,m_a\,x^i} \tag{5.8} $$

is a periodic field operator,

$$ \hat\Delta(x+\Sigma^i_a\,\hat\imath) = \hat\Delta(x)\,, \qquad a=1,\dots,D\,, \tag{5.9} $$

with $\hat\imath$ a unit vector in the $i$th direction of spacetime. As on $\mathbb{R}^D$, we may introduce anti-Hermitian, commuting linear derivations $\hat\partial_i$. On the noncommutative torus they are defined by their action on the Weyl basis,

$$ [\hat\partial_i,\hat Z_a] = 2\pi i\,(\Sigma^{-1})^a_i\,\hat Z_a\,. \tag{5.10} $$

The basis (5.8) then has the requisite property

$$ [\hat\partial_i,\hat\Delta(x)] = -\partial_i\hat\Delta(x)\,. \tag{5.11} $$
5.2. Topological quantum numbers

A $U(N)$ noncommutative Yang–Mills theory on the torus $\mathbb{T}^D$ can be constructed in much the same way as in the previous section. If we restrict to gauge field configurations which are single-valued functions on $\mathbb{T}^D$, then everything we have said goes through without a hitch. Single-valued star-unitary functions $g(x)$ then parametrize the star-gauge transformations (4.10). The only difference is that, as in the commutative case, there are extra observables associated with the nontrivial homotopy of the torus. The most general star-gauge invariant observable is still given by (4.18), but now there is a larger set of line momenta. Because the momenta are now quantized as in (5.2), the identification of the translation vector $v$ in (4.20) is ambiguous up to an integer translation of the periods of $\mathbb{T}^D$. The relationship (4.19) is therefore modified to

$$ v^i = \theta^{ij}\,k_j(v;n) + \Sigma^i_a\,n^a \tag{5.12} $$
for arbitrary integer-valued vectors $n^a$. When $\theta=0$, the relationship (5.12) reproduces a well-known result: the only open line observables in ordinary Yang–Mills theory are those associated with loops that wind $n^a$ times around the $a$th noncontractible cycle of the torus. We therefore obtain the analog of Polyakov lines in noncommutative Yang–Mills theory, associated with the different homotopy classes of the torus [95,96].

More interesting things happen, however, when we consider gauge field configurations of nonvanishing topological charge on the noncommutative torus. An elegant way to keep track of the quantum numbers associated with topologically nontrivial gauge fields is through their Chern numbers. In the commutative case, these would be represented by the integers $\mathcal{C}^{(n)}(E)=\mathrm{tr}_N\,F^n/(2\pi)^n$. These are defined in terms of the curvature two-form $F$ of some gauge connection of a $U(N)$ gauge bundle $E$ over $\mathbb{T}^D$, suitably integrated over cycles of the torus. For $n=0$ they produce the rank $N$ of the vector bundle $E$. For $n=1$ they yield the fluxes $Q_{ab}$ of the gauge fields through the surface formed by the $a$th and $b$th cycles of $\mathbb{T}^D$. For $n=2$ they give the instanton number $k$ of the bundle $E$ when $D=4$. We can collect these integers into the inhomogeneous Grassmann form

$$ \mathrm{ch}_0(E) = N + \sum_{n=1}^{d}\frac{1}{n!}\,\mathcal{C}^{(n)}(E)_{a_1\cdots a_{2n}}\,\psi^{a_1}\cdots\psi^{a_{2n}}\,, \tag{5.13} $$
where here and in the following we assume that the spacetime torus has even dimension $D=2d$. We have introduced a set $\psi^a$, $a=1,\dots,D$, of anticommuting Grassmann variables,

$$ \psi^a\psi^b = -\psi^b\psi^a\,, \tag{5.14} $$
which can be thought of as local generators of the cotangent bundle of $\mathbb{T}^D$. The quantity $\mathrm{ch}_0(E)$ then defines an integer cohomology class of the ordinary torus $\mathbb{T}^D$. Given these integers which characterize the bundle $E$, there is an elegant formula for the noncommutative Chern character

$$ \mathrm{ch}_\theta(E) = \mathrm{Tr}\otimes\mathrm{tr}_N\,\exp\Big(\frac{\hat W[F]}{2\pi}\Big) = \sum_{n\geq0}\frac{1}{n!\,(2\pi)^n}\,\mathrm{Tr}\otimes\mathrm{tr}_N\,\big(\hat W[F]\big)^n\,, \tag{5.15} $$

which characterizes the corresponding gauge bundle over the noncommutative torus. Here $F$ is the noncommutative curvature two-form of the bundle with local components $F_{ab}=\Sigma^i_a\,F_{ij}\,\Sigma^j_b$, where $F_{ij}$ is defined by (4.4) for an arbitrary gauge connection $A$. It can be regarded as an element of
the ordinary cohomology ring $H^{\rm even}(\mathbb{T}^D;\mathbb{R})$ of even-degree differential forms on the torus. The quantity (5.15) can be written in terms of (5.13) through the Elliott formula [117]

$$ \mathrm{ch}_\theta(E) = \exp\Big(-\frac12\,\Theta^{ab}\,\frac{\partial}{\partial\psi^a}\frac{\partial}{\partial\psi^b}\Big)\,\mathrm{ch}_0(E)\,, \tag{5.16} $$

with $\Theta$ regarded as a two-cycle of the homology group $H_2(\mathbb{T}^D;\mathbb{R})$ [118]. The coefficients of $\psi^{a_1}\cdots\psi^{a_{2n}}$ in the expansion of (5.16) define the $n$th noncommutative Chern numbers of the given noncommutative gauge theory. They represent the topological invariants of the corresponding deformation $E\to E_\theta$ from a commutative to a noncommutative gauge bundle. In the commutative limit $\theta=0$, $\mathrm{ch}_0(E)$ generates the ordinary integer-valued Chern numbers. For $\theta\neq0$, however, the noncommutative Chern numbers are in general nonintegral. For example, in two dimensions we find

$$ \mathrm{ch}_\theta(E) = (N-\Theta Q) + Q\,\psi^1\psi^2\,, \tag{5.17} $$
where $Q$ is the magnetic flux through $\mathbb{T}^2$. We see here that the rank of a bundle over the noncommutative torus is in general no longer an integer, or even a rational number. This is a common feature of vector bundles over noncommutative spaces [2]. The integral curvature $\mathrm{Tr}\otimes\mathrm{tr}_N\,\hat W[F]/2\pi$, on the other hand, is always an integer. This is because in noncommutative geometry the top Chern number $\int d^Dx\;\mathrm{ch}_\theta(E)$ always computes the index of a Fredholm operator [2], analogously to the commutative case. In fact, in any dimension the topological numbers of $E$ are all integers which can be obtained from the K-theory class of $E$. Similarly, in four dimensions the noncommutative Chern character is

$$ \mathrm{ch}_\theta(E) = \Big(N+\tfrac12\,\Theta^{ab}Q_{ab}+\tfrac14\,k\,\tilde\Theta_{ab}\Theta^{ab}\Big) + \tfrac12\big(Q_{ab}+k\,\tilde\Theta_{ab}\big)\,\psi^a\psi^b + k\,\psi^1\psi^2\psi^3\psi^4\,, \tag{5.18} $$

where $\tilde\Theta_{ab}=\frac12\,\epsilon_{abcd}\,\Theta^{cd}$.

Note that (5.16) in general agrees with the formula for D-brane charges in background supergravity fields, as computed from a Wess–Zumino type action [119]. In that action the sum over all Ramond–Ramond form potentials couples to the generalized Mukai vector $\nu(E)=\mathrm{ch}_0(E)\wedge e^{B/2\pi}\in H^{\rm even}(\mathbb{T}^D;\mathbb{R})$ of the given vector bundle $E\to\mathbb{T}^D$. The $2n$th component of $\mathrm{ch}_0(E)$ in (5.13) gives the number of D$(2n)$-branes which wrap the various $2n$-cycles of $\mathbb{T}^D$. The Chern character (5.16) measures the fact that D2-branes in background $B$-fields have an effective D0-brane charge, and similarly for other branes. This is seen explicitly in (5.17). In two dimensions the number of D2-branes is unaffected by the presence of the $B$-field, but the number of D0-branes is shifted by the product of the number of D2-branes and the Neveu–Schwarz two-form field along the D2-branes.
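In two dimensions the passage from (5.16) to (5.17) takes only a few lines, written out below. Two inputs are assumed. First, $\mathrm{ch}_0(E)=N+Q\,\psi^1\psi^2$ is (5.13) for $D=2$, with $Q$ the flux. Second, the Grassmann derivatives follow the convention $\partial_{\psi^1}\partial_{\psi^2}(\psi^1\psi^2)=1$; this is chosen here because it reproduces the sign written in (5.17).

```latex
% D = 2: Theta^{21} = -Theta^{12} and the derivatives anticommute, so
-\tfrac12\,\Theta^{ab}\,\partial_{\psi^a}\partial_{\psi^b}
   = -\Theta\,\partial_{\psi^1}\partial_{\psi^2}\,, \qquad \Theta \equiv \Theta^{12}\,.
% Since (\partial_{\psi^1}\partial_{\psi^2})^2 = 0, the exponential truncates:
\mathrm{ch}_\theta(E)
   = \big(1-\Theta\,\partial_{\psi^1}\partial_{\psi^2}\big)\big(N+Q\,\psi^1\psi^2\big)
   = (N-\Theta Q) + Q\,\psi^1\psi^2\,.
```

For irrational $\Theta$ the degree-zero part $N-\Theta Q$ is irrational, while the coefficient $Q$ of $\psi^1\psi^2$ stays an integer. This is the behaviour described after (5.17).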
5.3. Large star-gauge transformations

Having described how to construct topological invariants of gauge theories on the noncommutative torus, let us now turn to their local aspects. We will consider the noncommutative gauge theory defined by the action

$$ S_{\rm YM} = -\frac{1}{4g^2}\int d^Dx\;\mathrm{tr}_N\big(F_{ij}(x)-f_{ij}\,\mathbb{1}_N\big)^2_\star\,, \tag{5.19} $$

where $F_{ij}(x)$ is the noncommutative field strength tensor (4.4). The constant, antisymmetric background flux $f_{ij}$ will be fixed later on. At the classical level, the action (5.19) is minimized by
gauge field configurations of nonvanishing topological charge. On a compact space, gauge fields of nonvanishing flux are not single-valued functions and must be defined on the corresponding covering space. We therefore regard the noncommutative gauge fields $A_i(x)$ as functions on $\mathbb{R}^D$ which obey the twisted boundary conditions

$$ A_i(x+\Sigma^j_a\,\hat\jmath) = \Omega_a(x)\star A_i(x)\star\Omega_a(x)^\dagger - i\,\Omega_a(x)\star\partial_i\Omega_a(x)^\dagger\,, \tag{5.20} $$
where $\Sigma^j_a$ are the periods of $\mathbb{T}^D$ and $\Omega_a(x)$ are the transition functions of the bundle, which are $N\times N$ star-unitary matrices. Once we have taken the global gauge transformations (5.20) of the theory into account, we may use star-gauge invariance to write the action (5.19) in terms of gauge fields on the torus. By iterating (5.20) we find a set of consistency conditions

$$ \Omega_a(x+\Sigma^i_b\,\hat\imath)\star\Omega_b(x) = \Omega_b(x+\Sigma^i_a\,\hat\imath)\star\Omega_a(x)\,, \tag{5.21} $$
which require that the transition functions define cocycles of the local star-gauge group. We will make the gauge choice

$$ \Omega_a(x) = e^{i\alpha_{ai}x^i}\otimes\Gamma_a\,, \tag{5.22} $$
where $\alpha$ is a real-valued constant $D\times D$ matrix with the antisymmetry property $(\alpha\Sigma)^\top=-\alpha\Sigma$. This property ensures that the transition function $\Omega_a(x)$ has the periodicity $\Omega_a(x+\Sigma^i_a\,\hat\imath)=\Omega_a(x)$. The matrix $\alpha$ appears as the $U(1)$ factor in the given gauge choice, and it will essentially account for the abelian fluxes of the gauge fields. The $\Gamma_a$ are constant $SU(N)$ matrices. From (5.21) it follows that they must commute up to some phases,

$$ \Gamma_a\Gamma_b = e^{2\pi i\,Q_{ab}/N}\,\Gamma_b\Gamma_a\,, \tag{5.23} $$
where $Q$ is an antisymmetric $D\times D$ matrix. Taking the determinant of both sides of (5.23) shows that $Q_{ab}\in\mathbb{Z}$: the scalar phase multiplies an $N\times N$ matrix, so the determinant acquires the factor $e^{2\pi iQ_{ab}}$, which must equal 1. The commutation relations (5.23) define the Weyl–'t Hooft algebra in $D$ dimensions [53,120]. Here $Q$ is the matrix of nonabelian $SU(N)$ 't Hooft fluxes through the various nontrivial two-cycles of the torus (recall that magnetic flux on compact spaces with noncontractible two-cycles is always quantized). From (5.21) we find the matrix-valued consistency condition

$$ Q = \frac{N}{2\pi}\,\big(2\alpha\Sigma-\alpha\,\theta\,\alpha^\top\big)\,. \tag{5.24} $$
We will now rewrite the noncommutative gauge theory (5.19) in terms of gauge fields whose vacuum configuration has vanishing magnetic flux (i.e. $\mathcal{A}_i(x)=0$ up to a star-gauge transformation). These new field configurations will therefore be single-valued functions on the torus. For this, we introduce a fixed, multi-valued background abelian gauge field $a_i(x)$ to absorb the flux $f_{ij}$. A gauge choice which is compatible with (5.22) is given by

$$ a_i(x) = \tfrac12\,\mathcal{F}_{ij}\,x^j\otimes\mathbb{1}_N\,, \tag{5.25} $$
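where $\mathcal{F}$ is a real-valued constant antisymmetric $D\times D$ matrix. This uses (5.23) together with the identity (cf. (2.21))

$$ x^i\star e^{i\alpha_{aj}x^j} - e^{i\alpha_{aj}x^j}\star x^i = -\theta^{ik}\,\alpha_{ak}\,e^{i\alpha_{aj}x^j}\,, \tag{5.26} $$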
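from which it follows that the twisted boundary conditions (5.20) for the gauge field (5.25) are equivalent to the matrix identities

$$ \alpha\Sigma^{-1} = \mathcal{F}\,\frac{1}{2\,\mathbb{1}_D+\theta\mathcal{F}}\,, \qquad \mathcal{F} = 2\,\frac{1}{\Sigma\alpha^{-1}-\theta}\,. \tag{5.27} $$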
We decompose the gauge field configurations $A_i(x)$ of the theory (5.19) into the particular solution (5.25), (5.27) of the twisted boundary conditions and a fluctuating part around the fixed background:

$$ A_i(x) = a_i(x) + \mathcal{A}_i(x)\,, \tag{5.28} $$
where the field $\mathcal{A}_i(x)$ satisfies the covariant twisted boundary conditions

$$ \mathcal{A}_i(x+\Sigma^j_a\,\hat\jmath) = \Omega_a(x)\star\mathcal{A}_i(x)\star\Omega_a(x)^\dagger\,. \tag{5.29} $$
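Condition (5.29) requires that the fluctuating field $\mathcal{A}_i(x)$ be an adjoint section of the given gauge bundle over the noncommutative torus. Substituting (5.28) and (5.25) into the action (5.19), we arrive at

$$ S_{\rm YM} = -\frac{1}{4g^2}\int d^Dx\;\mathrm{tr}_N\big(\mathscr{F}_{ij}(x)+f^\star_{ij}-f_{ij}\,\mathbb{1}_N\big)^2_\star\,, \tag{5.30} $$

where

$$ \mathscr{F}_{ij} = D_i\mathcal{A}_j - D_j\mathcal{A}_i - i\,(\mathcal{A}_i\star\mathcal{A}_j-\mathcal{A}_j\star\mathcal{A}_i)\,, \tag{5.31} $$

with

$$ D_i = \partial_i + i\,a_i \tag{5.32} $$

a fiducial connection of constant curvature $\mathcal{F}_{ij}$, and

$$ f^\star_{ij} = \partial_ia_j-\partial_ja_i-i\,(a_i\star a_j-a_j\star a_i) = \Big(\mathcal{F}+\tfrac14\,\mathcal{F}\theta\mathcal{F}\Big)_{ij}\otimes\mathbb{1}_N \tag{5.33} $$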
is the noncommutative field strength of the background gauge field (5.25). We require that $\mathcal{A}_i(x)=0$ be the vacuum field configuration of the theory up to a star-gauge transformation. This fixes $f_{ij}\,\mathbb{1}_N=f^\star_{ij}$ in (5.30), and the action becomes

$$ S_{\rm YM} = -\frac{1}{4g^2}\int d^Dx\;\mathrm{tr}_N\big(\mathscr{F}_{ij}(x)\star\mathscr{F}^{ij}(x)\big)\,. \tag{5.34} $$

Since the classical gauge field configurations of the theory (5.34) have vanishing curvature $\mathscr{F}_{ij}(x)=0$, we would like to interpret them as single-valued functions. This will be done in the next section. There we shall find a suitable basis of the noncommutative $C^*$-algebra of functions in which the covariant derivatives (5.32) act as ordinary derivative operators. We will do so by finding the most general adjoint section obeying (5.29) and interpreting the resulting model as a new gauge theory on a new noncommutative torus. For this, it will be convenient to solve the covariant constraint (5.29) using Weyl operators. Using the map (5.8), we may associate to the adjoint section $\mathcal{A}_i(x)$
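the Hermitian Weyl operator

$$ \hat W[\mathcal{A}]_i = \int d^Dx\;\hat\Delta(x)\otimes\mathcal{A}_i(x)\,, \tag{5.35} $$

in terms of which the action (5.34) becomes

$$ S_{\rm YM} = -\frac{1}{4g^2}\,\mathrm{Tr}\otimes\mathrm{tr}_N\Big([\hat D_i,\hat W[\mathcal{A}]_j]-[\hat D_j,\hat W[\mathcal{A}]_i]-i\,[\hat W[\mathcal{A}]_i,\hat W[\mathcal{A}]_j]\Big)^2\,, \tag{5.36} $$

where

$$ \hat D_i = \hat\partial_i + \frac{i}{2}\,\mathcal{F}_{ij}\,\hat x^j \tag{5.37} $$

is a linear derivation on the Weyl operator algebra of constant curvature

$$ [\hat D_i,\hat D_j] = i\,\Big(\mathcal{F}+\tfrac14\,\mathcal{F}\theta\mathcal{F}\Big)_{ij}\,. \tag{5.38} $$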
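The twisted boundary conditions (5.29) may then be written in terms of Weyl operators as

$$ e^{\Sigma^j_a\hat\partial_j}\;\hat W[\mathcal{A}]_i\;e^{-\Sigma^j_a\hat\partial_j} = \hat W[\Omega]_a\,\hat W[\mathcal{A}]_i\,\big(\hat W[\Omega]_a\big)^\dagger\,, \tag{5.39} $$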
where

$$ \hat W[\Omega]_a = e^{i\alpha_{ai}\hat x^i}\otimes\Gamma_a \tag{5.40} $$
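are the unitary Weyl operators corresponding to the transition functions in the gauge (5.22). The background abelian flux $f_{ij}$ can be written in terms of the geometrical parameters of the given constant curvature bundle. To do so we use (5.27) and the identity

$$ \Big(\frac{1}{\mathbb{1}_D-\theta\alpha\Sigma^{-1}}\Big)^2\otimes\mathbb{1}_N = \Big(\mathbb{1}_D+\frac{\theta}{2}\,\mathcal{F}\Big)^2\otimes\mathbb{1}_N = \mathbb{1}_D\otimes\mathbb{1}_N+\theta f^\star \tag{5.41} $$

to write (5.24) in the form

$$ f = 2\pi\,Q\,\big(N\,\mathbb{1}_D-\Theta Q\big)^{-1}\,, \tag{5.42} $$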
where $\Theta$ is the dimensionless noncommutativity parameter (5.5). The identity (5.42) gives the relationship between the central curvatures and the magnetic fluxes of the gauge field configurations. Note that in the commutative case $\theta=0$, the $SU(N)$ 't Hooft flux $Q$ is equivalent to the $U(1)$ flux $f=\mathcal{F}$ of the bundle in (5.38). The 't Hooft flux was originally introduced for ordinary $SU(N)$ gauge theory, and in that case it is the only way to twist the boundary conditions on the gauge fields [120,121]. For this reason the matrices $\Gamma_a$ which generate the Weyl–'t Hooft algebra (5.23) are sometimes referred to as twist eaters. In the commutative case, keeping the phase $\alpha$ in (5.22) is redundant (see (5.24)), because it can be cancelled by using the global decomposition (4.13) of the $U(N)$ gauge group. The quotient there means that an element $(e^{i\phi},g)\in U(1)\times SU(N)$ is identified with $(e^{i\phi}\omega^{-1},\omega g)$ for any $N$th root of unity $\omega\in\mathbb{Z}_N$. The $U(1)$ twists can in this way be consistently cancelled by the $SU(N)/\mathbb{Z}_N$ sector of the gauge theory, and one can simply set $\alpha=0$ without loss of generality. However, as is evident from the formulas above, this is no longer true in the noncommutative case. The physical reason behind this was explained at the end of Section 4.1. We remark also that the constructions we have presented in this section do not account for all possible
gauge theories on the noncommutative torus. In two and three dimensions, generic $U(N)$ bundles on tori admit connections of vanishing $SU(N)$ curvature, i.e. with constant curvature $[D_i,D_j]=f_{ij}\,\mathbb{1}_N$ like the ones we have considered above [2]. However, for $D\geq4$, even in the commutative case not all bundles admit constant curvature connections [120]. The connections that we have considered in this section correspond to BPS states in the gauge theory [28,122]. Noncommutative gauge theories on $\mathbb{T}^4$ with nonconstant $SU(N)$ flux have been studied in [123].

6. Duality in noncommutative Yang–Mills theory

In this section we will derive a remarkable equivalence relation on the space of noncommutative Yang–Mills theories [16,28,96,124,125]. This is a special type of geometrical symmetry which relates two apparently distinct “spaces” in noncommutative geometry. It was originally introduced in the mathematics literature to resolve certain paradoxes that arise in the reconstruction of topological spaces from $C^*$-algebras. Let us give a very simple example of this equivalence. Given any manifold $M$, consider the two nonisomorphic associative algebras

$$ \mathcal{A} = C(M)\,, \qquad \mathcal{A}' = C(M)\otimes\mathbb{M}(N,\mathbb{C})\,, \tag{6.1} $$
where $C(M)$ is the space of smooth complex-valued functions on $M$ and $\mathbb{M}(N,\mathbb{C})$ is the finite-dimensional algebra of $N\times N$ complex-valued matrices. At the level of topology, a topological space may be completely characterized by the algebra of continuous complex-valued functions defined on it, because one may reconstruct the topology from the continuity requirement on all such functions. The algebra $\mathcal{A}$ is commutative. Suppose we did not know that it was a space of functions on $M$, but only knew its algebraic properties; we could still associate the manifold $M$ to it. That this is possible is the content of the Gel'fand–Naimark theorem, which provides a one-to-one correspondence between the category of commutative $C^*$-algebras and the category of Hausdorff topological spaces [2]. Given any commutative algebra $\mathcal{A}$, we may formally construct a topological space $M$ for which $\mathcal{A}$ is naturally isomorphic to the space of functions $C(M)$. The Gel'fand transform which accomplishes this identifies points of the space with the characters (i.e. the multiplicative linear functionals) of the algebra $\mathcal{A}$ [2]. In the case of a commutative algebra, all irreducible representations are one-dimensional, and the space of characters coincides with the space of irreducible representations. We will return to these points in Section 8.

On the other hand, the algebra $\mathcal{A}'$ is the space of $N\times N$ matrix-valued functions on the manifold $M$, which is noncommutative. The definition of the Gel'fand transform, which is used to reconstruct a space from an algebra, becomes ambiguous for noncommutative algebras, and it is not possible to formally reconstruct the space $M$ in this case. In particular, the spaces of characters and irreducible representations of the algebra no longer coincide. So if we were only given the algebra $\mathcal{A}'$, its noncommutativity would leave us no way of knowing that it is canonically associated with a manifold.
But clearly we would like to do so, because $\mathcal{A}'$ is still a space of fields which are defined over some configuration manifold, in the classical sense of the word. The only difference now is that the fields have isospin degrees of freedom associated to them. The ambiguity that arises in defining a point is removed by the following realization: the different points which are associated via the Gel'fand transform are smeared over an $N$-dimensional sphere [20], and are related to each other
by global rotations in the isospin space. The algebra $\mathcal{A}'$ thereby certainly captures the topological characteristics of the manifold $M$.¹⁰ This paradox is resolved by the realization that $\mathbb{M}(N,\mathbb{C})$ has only one irreducible representation as a $C^*$-algebra, namely its defining representation. To capture the feature that both algebras (6.1) describe the same space $M$, one says that they are Morita equivalent. In general, two algebras are Morita equivalent if they become isomorphic upon tensoring them with the algebra of compact operators on some Hilbert space [2]. Heuristically, this is the algebra $\mathbb{M}(N,\mathbb{C})$ for “$N$ sufficiently large” (see Section 8). Morita equivalent spaces share many common geometrical characteristics; for example, they have the same K-theory and cyclic homology. But gauge theories, or more precisely vector bundles, defined over them can be very different. For instance, the algebra $\mathcal{A}$ in (6.1), being commutative, possesses only a $U(1)$ unitary subgroup of functions, while $\mathcal{A}'$ has a $U(N)$ unitary subgroup. Therefore, under the Morita relation, a $U(1)$ gauge theory becomes equivalent to a $U(N)$ gauge theory. This equivalence at the level of vector bundles follows from the stability of the corresponding K-theory groups under the Morita transformation [2]; these groups characterize the cohomology of vector bundles over a space. Indeed, gauge theories over Morita equivalent spaces are canonically related. In this section we will see some specific instances of this natural relation.

In the following we will present a field theoretical derivation of the Morita equivalence between Yang–Mills theories on noncommutative tori. We will see that this equivalence can be interpreted as a stringy T-duality symmetry of noncommutative Yang–Mills theory [16,125]. It implies certain remarkable symmetries of Matrix theory compactifications [28,118,124,125]. Indeed, we shall find an explicit relationship with the T-duality symmetry of toroidally compactified open strings [13].
Another application will be to give a quantitative explanation of the form of the perturbative gauge theory results discussed in Section 4.3. There we saw, for example, that the one-loop renormalization of $U(1)$ noncommutative gauge theory on $\mathbb{R}^4$ is identical to that of ordinary large-$N$ Yang–Mills theory, after a suitable rescaling of the Yang–Mills coupling constant $g$. This is a strong indication that the geometrical Morita equivalence property of noncommutative geometry, which holds at the classical level, does indeed persist in regularized perturbation theory. Morita equivalence, along with the Eguchi–Kawai reduction of large-$N$ gauge theories discussed in Section 7, therefore provides a natural explanation of these coincidences. It also yields a quantitative explanation for the deep relationship between large-$N$ reduced models (such as the IKKT matrix model discussed in Section 1.2) and noncommutative Yang–Mills theory [96]. We will return to this latter point in the next section.

6.1. Morita equivalence

In this subsection we will demonstrate in some detail how to solve the twisted boundary conditions (5.39), and show that in this way we naturally arrive at a physically equivalent, dual

¹⁰ These statements can be made more precise by using the formalism of spectral triples in noncommutative geometry [2,14–16,20]. The Riemannian geometry of a manifold $M$ can be reconstructed from the operator algebraic spectral data associated with the quantum mechanics of the free geodesic motion of a test particle on $M$. In this context, $\mathcal{A}$ coincides with the algebra of observables of the quantum theory. The algebra $\mathcal{A}'$, on the other hand, coincides with the algebra of observables of a test particle moving on $M$ which has some internal degrees of freedom.
noncommutative Yang–Mills gauge theory. We will see in the next subsection that this notion of duality is identical to that of T-duality for toroidally compactified open strings in background supergravity fields.

6.1.1. Irreducible representations of twist eaters

We first need to digress briefly and describe the representation theory of the Weyl–'t Hooft algebra (5.23) [126]. The irreducible representations of this algebra are called twist-eating solutions, and for any even dimensionality $D=2d$ they may be constructed as follows. The lattice $L$ which generates the torus as the quotient space $\mathbb{T}^D=\mathbb{R}^D/L$ has automorphism group $SL(D,\mathbb{Z})$, which becomes the modular group of $\mathbb{T}^D$. Using this discrete geometrical symmetry of the spacetime, we can rotate the 't Hooft matrix $Q\to S^\top QS$, $S\in SL(D,\mathbb{Z})$, into a canonical skew-diagonal form [127]
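$$ Q = \begin{pmatrix} 0 & q_1 & & & \\ -q_1 & 0 & & & \\ & & \ddots & & \\ & & & 0 & q_d \\ & & & -q_d & 0 \end{pmatrix}\,. \tag{6.2} $$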
Given the $d$ independent fluxes $q_\mu\in\mathbb{Z}$, we introduce the two relatively prime sets of $d$ integers

$$ \tilde N_\mu = \frac{N}{\gcd(N,q_\mu)}\,, \qquad \tilde q_\mu = \frac{q_\mu}{\gcd(N,q_\mu)}\,, \tag{6.3} $$
where gcd denotes the greatest common divisor. We then assume that there exists an integer $N_0\in\mathbb{Z}^+$ such that the rank of the gauge group is $N_0$ times the product of the $d$ integers $\tilde N_\mu$ in (6.3),

$$ N = N_0\,(\tilde N_1\cdots\tilde N_d)\,. \tag{6.4} $$
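The bookkeeping in (6.3) and (6.4) is pure integer arithmetic. The Python sketch below computes the reduced integers $\tilde N_\mu$ and $\tilde q_\mu$ from the rank $N$ and the skew-diagonal fluxes $q_\mu$ of (6.2). It returns $N_0$ only when the divisibility requirement (6.4) holds. The function name and the sample ranks and fluxes are illustrative assumptions.

```python
from math import gcd, prod


def twist_eater_data(n, fluxes):
    """Return (N_tilde, q_tilde, N0) from (6.3)-(6.4) for rank n and
    skew-diagonal 't Hooft fluxes q_1..q_d, or None when (6.4) fails."""
    n_tilde = [n // gcd(n, q) for q in fluxes]
    q_tilde = [q // gcd(n, q) for q in fluxes]
    irrep_dim = prod(n_tilde)        # N_tilde_1 ... N_tilde_d
    if n % irrep_dim != 0:           # (6.4): N = N0 * (N_tilde_1 ... N_tilde_d)
        return None
    return n_tilde, q_tilde, n // irrep_dim


print(twist_eater_data(6, [4]))      # ([3], [2], 2)
print(twist_eater_data(6, [2, 3]))   # ([3, 2], [1, 1], 1)
print(twist_eater_data(4, [1, 1]))   # None: 4 * 4 does not divide 4
```

The last call shows a case in which no twist-eating solution of the form (6.5) exists for the given rank.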
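The integer $\tilde N_1\cdots\tilde N_d$ is the dimension of the irreducible representation of the Weyl–'t Hooft algebra. Requirement (6.4) is a necessary and sufficient condition for the existence of $D$ independent twist-eating solutions $\Gamma_a$ [126]. It is a condition which must be met by the geometrical parameters of the given constant curvature bundle. The matrices $\Gamma_a$ may then be defined on the subgroup $SU(\tilde N_1)\otimes\cdots\otimes SU(\tilde N_d)\otimes SU(N_0)$ of $SU(N)$ as

$$ \Gamma_{2\mu-1} = \mathbb{1}_{\tilde N_1}\otimes\cdots\otimes V_{\tilde N_\mu}\otimes\cdots\otimes\mathbb{1}_{\tilde N_d}\otimes\mathbb{1}_{N_0}\,, \qquad \Gamma_{2\mu} = \mathbb{1}_{\tilde N_1}\otimes\cdots\otimes\big(W_{\tilde N_\mu}\big)^{\tilde q_\mu}\otimes\cdots\otimes\mathbb{1}_{\tilde N_d}\otimes\mathbb{1}_{N_0} \tag{6.5} $$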
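for $\mu=1,\dots,d$, where $V_N$ and $W_N$ are the $SU(N)$ shift and clock matrices

$$ V_N = \begin{pmatrix} 0 & 1 & & & \\ & 0 & 1 & & \\ & & \ddots & \ddots & \\ & & & 0 & 1 \\ 1 & & & & 0 \end{pmatrix}\,, \qquad W_N = \begin{pmatrix} 1 & & & & \\ & e^{2\pi i/N} & & & \\ & & e^{4\pi i/N} & & \\ & & & \ddots & \\ & & & & e^{2\pi i(N-1)/N} \end{pmatrix}\,, \tag{6.6} $$

which obey the commutation relations

$$ V_N\,W_N = e^{2\pi i/N}\,W_N\,V_N\,. \tag{6.7} $$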
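Small explicit matrices are enough to check the relations (6.6) and (6.7), and to confirm that the twist eaters (6.5) reproduce the Weyl–'t Hooft algebra (5.23). The following Python sketch uses plain nested lists and is restricted to $D=2$, i.e. a single flux $q=Q_{12}$. All helper names are hypothetical. It builds $V_N$ and $W_N$ and verifies (6.7). It then checks that $\Gamma_1=V_{\tilde N}\otimes\mathbb{1}_{N_0}$ and $\Gamma_2=(W_{\tilde N})^{\tilde q}\otimes\mathbb{1}_{N_0}$ satisfy $\Gamma_1\Gamma_2=e^{2\pi iq/N}\Gamma_2\Gamma_1$, which follows from $\tilde q/\tilde N=q/N$.

```python
import cmath
from math import gcd


def identity(n):
    return [[1 if j == k else 0 for k in range(n)] for j in range(n)]


def shift(n):
    """V_N of (6.6): ones on the superdiagonal and in the bottom-left corner."""
    return [[1 if (j + 1) % n == k else 0 for k in range(n)] for j in range(n)]


def clock(n):
    """W_N of (6.6): diag(1, e^{2 pi i/N}, ..., e^{2 pi i (N-1)/N})."""
    return [[cmath.exp(2j * cmath.pi * j / n) if j == k else 0 for k in range(n)]
            for j in range(n)]


def matmul(a, b):
    return [[sum(a[i][l] * b[l][j] for l in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matpow(a, p):
    result = identity(len(a))
    for _ in range(p):
        result = matmul(result, a)
    return result


def kron(a, b):
    """Tensor product of the square matrices a and b."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]


def scale(c, a):
    return [[c * x for x in row] for row in a]


def close(a, b, tol=1e-9):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def twist_eaters_2d(n, q):
    """D = 2 twist eaters of (6.5): Gamma_1 = V x 1_{N0}, Gamma_2 = W^{q~} x 1_{N0}."""
    g = gcd(n, q)
    n_t, q_t, n0 = n // g, q // g, g   # (6.3); in D = 2, N0 = gcd(N, q)
    gamma1 = kron(shift(n_t), identity(n0))
    gamma2 = kron(matpow(clock(n_t), q_t % n_t), identity(n0))  # W^N = 1
    return gamma1, gamma2


n = 5
omega = cmath.exp(2j * cmath.pi / n)
print(close(matmul(shift(n), clock(n)), scale(omega, matmul(clock(n), shift(n)))))  # (6.7)
g1, g2 = twist_eaters_2d(6, 4)
phase = cmath.exp(2j * cmath.pi * 4 / 6)
print(close(matmul(g1, g2), scale(phase, matmul(g2, g1))))  # (5.23) with Q_12 = 4
```

Reducing the exponent modulo $\tilde N$ in `twist_eaters_2d` is harmless because $(W_{\tilde N})^{\tilde N}=\mathbb{1}$, and it keeps the power non-negative for negative fluxes.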
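The twist eaters (6.5) commute with the $SU(N_0)$ subgroup of $SU(N)$ which is generated by the matrices of the form $\mathbb{1}_{\tilde N_1}\otimes\cdots\otimes\mathbb{1}_{\tilde N_d}\otimes Z_0$ with $Z_0\in SU(N_0)$. Note that $(\Gamma_{2\mu-1})^{\tilde N_\mu}=(\Gamma_{2\mu})^{\tilde N_\mu}=\mathbb{1}_N$ for each $\mu=1,\dots,d$. Since the integers $\tilde N_\mu$ and $\tilde q_\mu$ in (6.3) are relatively prime, there exist integers $a_\mu,b_\mu$ such that

$$ a_\mu\,\tilde N_\mu + b_\mu\,\tilde q_\mu = 1 \tag{6.8} $$

for each $\mu=1,\dots,d$. In the basis (6.2) where $Q$ is skew-diagonal, we then introduce the four integral $D\times D$ matrices

$$ A = \begin{pmatrix} a_1 & 0 & & & \\ 0 & a_1 & & & \\ & & \ddots & & \\ & & & a_d & 0 \\ & & & 0 & a_d \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & -b_1 & & & \\ b_1 & 0 & & & \\ & & \ddots & & \\ & & & 0 & -b_d \\ & & & b_d & 0 \end{pmatrix}, $$

$$ \tilde Q = \begin{pmatrix} 0 & \tilde q_1 & & & \\ -\tilde q_1 & 0 & & & \\ & & \ddots & & \\ & & & 0 & \tilde q_d \\ & & & -\tilde q_d & 0 \end{pmatrix}, \qquad \tilde N = \begin{pmatrix} \tilde N_1 & 0 & & & \\ 0 & \tilde N_1 & & & \\ & & \ddots & & \\ & & & \tilde N_d & 0 \\ & & & 0 & \tilde N_d \end{pmatrix}. \tag{6.9} $$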
We may then use the $SL(D,\mathbb{Z})$ modular symmetry of the torus to rotate the matrix $Q$ back to general form, $Q\to S^\top QS$. Similarly, we rotate the four matrices in (6.9) as $A\to(S')^{-1}AS$, $B\to(S')^{-1}B(S^{-1})^\top$, $\tilde Q\to S^\top\tilde QS'$ and $\tilde N\to S^{-1}\tilde NS'$, with $S,S'\in SL(D,\mathbb{Z})$. The geometrical significance of the extra $SL(D,\mathbb{Z})$ matrix $S'$ is that it parametrizes the automorphism group of the dual noncommutative torus that we shall obtain. As we will discuss in the next subsection, such an interpolation between two dual spaces is a general characteristic of the Morita transformation. In these general forms we can write

$$ Q = N\,\tilde Q\,\tilde N^{-1}\,, \tag{6.10} $$

and from (6.8) it follows in general that

$$ A\tilde N + B\tilde Q = \mathbb{1}_D\,. \tag{6.11} $$
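The integral matrices entering (6.9)–(6.11) can be generated mechanically, since the integers $a_\mu,b_\mu$ of (6.8) come from the extended Euclidean algorithm. The Python sketch below assembles $A$, $B$, $\tilde Q$ and $\tilde N$ in the skew-diagonal basis (6.2) and confirms (6.11) and (6.10). Here `bezout` and `block_matrices` are illustrative helpers, and the sample rank $N=6$ with fluxes $q_1=4$, $q_2=3$ is arbitrary. The sketch does not attempt the $SL(D,\mathbb{Z})$ rotation back to a general basis.

```python
from fractions import Fraction
from math import gcd


def bezout(n, q):
    """Integers (a, b) with a*n + b*q == 1 for coprime n, q (extended Euclid)."""
    old_r, r, old_s, s, old_t, t = n, q, 1, 0, 0, 1
    while r != 0:
        quo = old_r // r
        old_r, r = r, old_r - quo * r
        old_s, s = s, old_s - quo * s
        old_t, t = t, old_t - quo * t
    if old_r < 0:                        # normalise the sign of the gcd
        old_r, old_s, old_t = -old_r, -old_s, -old_t
    if old_r != 1:
        raise ValueError("n and q are not coprime")
    return old_s, old_t


def block_matrices(n, fluxes):
    """A, B, Q~, N~ of (6.9) in the skew-diagonal basis (6.2), D = 2d."""
    dim = 2 * len(fluxes)
    A, B, Qt, Nt = ([[0] * dim for _ in range(dim)] for _ in range(4))
    for mu, q in enumerate(fluxes):
        g = gcd(n, q)
        n_t, q_t = n // g, q // g        # (6.3)
        a, b = bezout(n_t, q_t)          # (6.8)
        i, j = 2 * mu, 2 * mu + 1
        A[i][i] = A[j][j] = a
        B[i][j], B[j][i] = -b, b
        Qt[i][j], Qt[j][i] = q_t, -q_t
        Nt[i][i] = Nt[j][j] = n_t
    return A, B, Qt, Nt


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


n, fluxes = 6, [4, 3]
A, B, Qt, Nt = block_matrices(n, fluxes)
dim = len(A)
lhs = [[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(matmul(A, Nt), matmul(B, Qt))]
print(lhs == [[int(i == j) for j in range(dim)] for i in range(dim)])  # (6.11): True
# (6.10): Q = N Q~ N~^{-1}; N~ is diagonal here, so divide column j by N~_jj.
Q = [[Fraction(n * Qt[i][j], Nt[j][j]) for j in range(dim)] for i in range(dim)]
print(Q[0][1], Q[2][3])  # recovers the fluxes: 4 3
```

In each $2\times2$ block one has $B\tilde Q=b_\mu\tilde q_\mu\,\mathbb{1}_2$, so (6.11) reduces block by block to the Bézout identity (6.8).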
Because of (6.11) and the antisymmetry of the matrices $AB^\top$ and $\tilde Q\tilde N^\top$, we have the block matrix identity

$$ \begin{pmatrix} A & B \\ -\tilde N & \tilde Q \end{pmatrix}\begin{pmatrix} 0 & \mathbb{1}_D \\ \mathbb{1}_D & 0 \end{pmatrix}\begin{pmatrix} A & B \\ -\tilde N & \tilde Q \end{pmatrix}^{\!\top} = \begin{pmatrix} 0 & \mathbb{1}_D \\ \mathbb{1}_D & 0 \end{pmatrix}\,, \tag{6.12} $$

which is equivalent to the statement that

$$ \begin{pmatrix} A & B \\ -\tilde N & \tilde Q \end{pmatrix}\in SO(D,D;\mathbb{Z}) \tag{6.13} $$

with respect to the canonical basis of $\mathbb{R}^{D,D}$.

6.1.2. Solving twisted boundary conditions

We are now ready to describe how to solve (5.39) [96,124]. For this, we make two key observations. First, the twisted boundary conditions are solved for gauge fields on $\mathbb{R}^D$ which are only afterwards regarded as functions on a torus, so the corresponding Weyl operators should likewise be thought of as originating in this way. This is important because the solutions to (5.39) do not actually live on the original noncommutative torus. Second, for any pair of relatively prime integers $N,q$, the set $\{(V_N)^j(W_N)^{qj'}\;|\;j,j'\in\mathbb{Z}_N\}$ spans the $N^2$-dimensional complex linear vector space $gl(N,\mathbb{C})$. From the construction above, it follows that we may expand the Weyl operator $\hat W[\mathcal{A}]_i$ in an $SU(\tilde N_1\cdots\tilde N_d)\otimes SU(N_0)$ subgroup of $SU(N)$. This leaves a $U(N_0)$ sector of the original gauge group, corresponding to the subgroup of matrices which commute with the twist eaters. We may therefore write down the expansion

$$ \hat W[\mathcal{A}]_i = \int d^Dk\;\sum_{\vec j\bmod\tilde N}e^{ik_i\hat x^i}\otimes\prod_{a=1}^D(\Gamma_a)^{j^a}\otimes\tilde a_i(k,\vec j)\,, \tag{6.14} $$

where $\tilde a_i(k,\vec j)$ is an $N_0\times N_0$ matrix-valued function which is periodic in $\vec j$: $\tilde a_i(k,j^a)=\tilde a_i(k,j^a+\tilde N^{ab})$ for each $a,b=1,\dots,D$. We apply the constraint (5.39) to (6.14), using (5.23) and (5.40), and equate the expansion coefficients on both sides of (5.39). We find that the functions $\tilde a_i(k,\vec j)$ vanish
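unless

$$ \frac{1}{2\pi}\,k_i\,(\mathfrak{F}^{-1})^i{}_b\,(\tilde N^{-1})^b{}_a - j^b\,\tilde Q_{bc}\,(\tilde N^{-1})^c{}_a = n_a \tag{6.15} $$

for some $n_a\in\mathbb{Z}$. Here we have used (6.10) and introduced the $D\times D$ matrix

$$ \mathfrak{F} = \frac{1}{2}\,\tilde N^{-1}\,\Sigma^{-1}\,\big(2\,\mathbb{1}_D+\theta\mathcal{F}\big)\,. \tag{6.16} $$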
Given the matrices $A$ and $B$ constructed above which satisfy (6.11), we may then solve (6.15) by setting $n_a=A^b{}_a\,m_b$ and $j^a=B^{ba}\,m_b$ for some $m_a\in\mathbb{Z}$. Since $\prod_a(\Gamma_a)^{\tilde N^{ab}}=\mathbb{1}_N$ for each $b=1,\dots,D$, this solution for $(\vec n,\vec j)$ is unique mod $\tilde N$ for any given set of $D$ integers $m_a$ and fixed Fourier momentum $\vec k$. Substituting this solution into (6.15), we may then solve for the Fourier momenta as $k_i=2\pi\,\mathfrak{F}^a{}_i\,m_a$, and replace the integration in (6.14) by a summation over all $\vec m\in\mathbb{Z}^D$. What we have shown is that the most general solution to the constraint (5.29) takes the form

$$ \hat W[\mathcal{A}]_i = \sum_{\vec m\in\mathbb{Z}^D}\;\prod_{a=1}^D(\hat Z'_a)^{m_a}\;\prod_{a<b}e^{\pi i\,m_a\Theta'^{ab}m_b}\otimes\tilde a_i(\vec m)\,, \tag{6.17} $$
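where $\tilde a_i(\vec m)=\tilde a_i(2\pi\mathfrak{F}\vec m,B\vec m)$ are $N_0\times N_0$ matrix-valued Fourier coefficients which by Hermiticity obey $\tilde a_i(-\vec m)=\tilde a_i(\vec m)^\dagger$. The operators

$$ \hat Z'_a = e^{2\pi i\,\mathfrak{F}^a{}_i\,\hat x^i}\otimes\prod_{b=1}^D(\Gamma_b)^{B^{ab}} \tag{6.18} $$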
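obey the commutation relations

$$ \hat Z'_a\,\hat Z'_b = e^{-2\pi i\,\Theta'^{ab}}\,\hat Z'_b\,\hat Z'_a\,, \tag{6.19} $$

$$ [\hat D_i,\hat Z'_a] = 2\pi i\,(\Sigma'^{-1})^a{}_i\,\hat Z'_a\,, \tag{6.20} $$

where $\hat D_i$ is the covariant derivation (5.37), and¹¹

$$ \Theta' = \frac{1}{\Theta\tilde Q-\tilde N}\,(A\Theta+B)\,, \tag{6.21} $$

$$ \Sigma' = \Sigma\,(\Theta\tilde Q-\tilde N)\,. \tag{6.22} $$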
The commutation relations (6.19) and (6.20) have the same form as the defining relations (5.4) and (5.10) of the original noncommutative torus. The operators (6.18) thereby define a Weyl basis which generates a new, dual noncommutative torus with deformation matrix (6.21) and period matrix (6.22). The canonical coordinates $x'^i$ on this new torus may be used to define a new basis $\hat\Delta'(x')$ for the mapping between spacetime fields and Weyl operators. They are obtained by formally choosing a rotation $x\to x'$ in which

$$ [\hat D_i,\hat\Delta'(x')] = -\partial'_i\,\hat\Delta'(x')\,, \tag{6.23} $$
¹¹ Note that the transformation $\Theta\to\Theta'$ is only well-defined on those $\Theta$ for which $\Theta\tilde Q-\tilde N$ is an invertible matrix. Such $\Theta$'s span a dense subspace of the whole space of antisymmetric $D\times D$ real-valued matrices.
where $\partial'_i=\partial/\partial x'^i$. All the information about the topological charges of the original gauge theory is now transferred into the new noncommutativity parameters (6.21), and the new basis is given as

$$ \hat\Delta'(x') = \frac{1}{|\det\Sigma'|}\sum_{\vec m\in\mathbb{Z}^D}\;\prod_{a=1}^D(\hat Z'_a)^{m_a}\;\prod_{a<b}e^{-\pi i\,m_a\Theta'^{ab}m_b}\;e^{-2\pi i\,(\Sigma'^{-1})^a_i\,m_a\,x'^i}\,, \tag{6.24} $$
analogously to (5.8). Note that the commutation relations (6.20) are tantamount to representing the covariant derivations through

$$ e^{\Sigma'^i_a\hat D_i} = e^{\Sigma^i_a\hat\partial_i}\;e^{i\alpha_{ai}\hat x^i}\otimes\Gamma_a\otimes\prod_{b=1}^D(\hat Z'_b)^{\tilde Q_{ba}}\,. \tag{6.25} $$
We may now rewrite the expansion (6.17) using the new basis (6.24), which leads to the Weyl quantization

$$ \hat W[\mathcal{A}]_i = \int d^Dx'\;\hat\Delta'(x')\otimes\mathcal{A}'_i(x')\,, \tag{6.26} $$

where $\mathcal{A}'_i(x')$ is by construction a single-valued $U(N_0)$ gauge field on the dual noncommutative torus. The remaining rank $N/N_0$ of the original $U(N)$ gauge theory has now been absorbed into the new Weyl basis (6.18). The operator trace $\mathrm{Tr}'$ satisfying $\mathrm{Tr}'\,\hat\Delta'(x')=1$ may be computed in terms of the original trace $\mathrm{Tr}$ as

$$ \mathrm{Tr}'\otimes\mathrm{tr}_{N_0} = \frac{N_0\,\det\Sigma'}{N\,\det\Sigma}\;\mathrm{Tr}\otimes\mathrm{tr}_N\,. \tag{6.27} $$

Using (6.24)–(6.27), we find that the noncommutative Yang–Mills action (5.36), when expanded in this new basis of Weyl operators, becomes

$$ S_{\rm YM} = -\frac{1}{4g'^2}\int d^Dx'\;\mathrm{tr}_{N_0}\big(F'_{ij}(x')\star'F'^{ij}(x')\big)\,, \tag{6.28} $$

where

$$ F'_{ij} = \partial'_i\mathcal{A}'_j-\partial'_j\mathcal{A}'_i-i\,(\mathcal{A}'_i\star'\mathcal{A}'_j-\mathcal{A}'_j\star'\mathcal{A}'_i) \tag{6.29} $$

and $\star'$ denotes the new Groenewold–Moyal product defined using the deformation parameter $\theta'=\Theta'/2\pi$ instead of $\theta$. The new Yang–Mills coupling constant in (6.28) is given by

$$ g' = g\,\big|\det(\Theta\tilde Q-\tilde N)\big|^{1/2}\,. \tag{6.30} $$

The exact equivalence between the two forms (5.19) and (6.28) of the noncommutative Yang–Mills action is the duality that we are looking for. It shows that a noncommutative $U(N)$ Yang–Mills theory with magnetic flux (5.42), and hence multi-valued gauge fields, is equivalent to a $U(N_0)$ noncommutative Yang–Mills theory with three changes: the deformation parameter transformed according to (6.21), no magnetic flux (and hence single-valued gauge fields), and the reduced gauge group rank $N_0$ defined by (6.4). This duality is known as Morita equivalence of noncommutative gauge theories, and its basic transformation rules are summarized for convenience in Table 1.
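Table 1. Basic Morita equivalence of noncommutative gauge theories on $D$-dimensional tori. The integer $N/N_0$ is the dimension of the irreducible representation of the Weyl–'t Hooft algebra $\Gamma_a\Gamma_b=e^{2\pi iQ_{ab}/N}\,\Gamma_b\Gamma_a$ in $D$ dimensions.

| | Original gauge theory | Dual gauge theory |
|---|---|---|
| Magnetic flux | $Q$ | $0$ |
| Gauge group | $U(N)$ | $U(N_0)$ |
| Noncommutativity | $\Theta$ | $(\Theta\tilde Q-\tilde N)^{-1}(A\Theta+B)$ |
| Periods | $\Sigma$ | $\Sigma\,(\Theta\tilde Q-\tilde N)$ |
| Coupling constant | $g$ | $g\,\lvert\det(\Theta\tilde Q-\tilde N)\rvert^{1/2}$ |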
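The sketch below evaluates the entries of Table 1 in the simplest case, $D=2$. There $\Theta$ has the single component $\Theta^{12}=\theta$ and the 't Hooft flux is $Q_{12}=q$. It uses (5.42), (6.21), (6.30) and the flux rule (6.31) of Section 6.2 as they are written in this text. In two dimensions every matrix involved is a multiple either of the identity or of $\epsilon=\begin{pmatrix}0&1\\-1&0\end{pmatrix}$, so the sketch tracks just one number per matrix. The function and dictionary-key names are hypothetical. The vanishing of the dual flux $f'$ reproduces the "Magnetic flux 0" entry of the table. The returned integer matrix shows that $\theta\to\theta'$ is a fractional linear map $\theta'=(-a\theta+b)/(\tilde q\theta+\tilde N)$ with determinant $-1$.

```python
import math
from math import gcd


def bezout(n, q):
    """Integers (a, b) with a*n + b*q == 1 for coprime n, q (extended Euclid)."""
    old_r, r, old_s, s, old_t, t = n, q, 1, 0, 0, 1
    while r != 0:
        quo = old_r // r
        old_r, r = r, old_r - quo * r
        old_s, s = s, old_s - quo * s
        old_t, t = t, old_t - quo * t
    if old_r < 0:
        old_r, old_s, old_t = -old_r, -old_s, -old_t
    return old_s, old_t


def morita_2d(n, q, theta, g_coupling=1.0):
    """Table 1 for D = 2, with Theta^{12} = theta and 't Hooft flux Q_12 = q.

    Each 2x2 matrix is c*1 or c*eps, eps = [[0, 1], [-1, 0]], so only c is kept."""
    g = gcd(n, q)
    n_t, q_t = n // g, q // g                    # (6.3); N0 = g in D = 2
    a, b = bezout(n_t, q_t)                      # (6.8)
    x = -(theta * q_t + n_t)                     # Theta Q~ - N~ = x * 1
    theta_dual = (a * theta - b) / x             # (6.21): X^{-1} (A Theta + B)
    g_dual = g_coupling * abs(x * x) ** 0.5      # (6.30): det X = x^2
    f = 2 * math.pi * q / (n + theta * q)        # (5.42): N 1 - Theta Q = (N + theta q) 1
    f_dual = x * f * x + 2 * math.pi * q_t * x   # (6.31), eps-components
    return {"N0": g, "theta_dual": theta_dual, "g_dual": g_dual,
            "f": f, "f_dual": f_dual, "mobius": (-a, b, q_t, n_t)}


result = morita_2d(6, 4, 0.1)
print(result["N0"], result["theta_dual"], abs(result["f_dual"]) < 1e-12)
```

For $N=6$, $q=4$ one finds $N_0=2$ and $\theta'=(-\theta-1)/(2\theta+3)$; with $\theta=0.1$ this is about $-0.344$, and the printed flux check is `True`. At $\theta=0$ the flux reduces to $f=2\pi q/N$, in line with the remark after (5.42).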
6.2. Applications

Let us now make a series of remarks concerning the duality that we have found in the previous subsection:

• Modulo a straightforward conjugation of the transformation matrix in (6.13), the map $\Theta\to\Theta'$ is the standard $SO(D,D;\mathbb{Z})$ transformation that relates Morita equivalent noncommutative tori. In fact, it is a theorem of noncommutative geometry that two noncommutative tori are Morita equivalent if and only if their noncommutativity parameters are related in this way [128]. This statement holds also when the target gauge bundle has nonvanishing topological charge, since the equivalence may then be realized by composing two transformations of the sort described in the previous subsection. In general, the transformation rule for the background fluxes $f_{ij}$ is given by (see (5.42) and (6.10)) [125]

$$ f' = (\Theta\tilde Q-\tilde N)^\top\,f\,(\Theta\tilde Q-\tilde N) + 2\pi\,\tilde Q\,(\Theta\tilde Q-\tilde N)\,. \tag{6.31} $$
We will see below how the transformation (6.31) may be obtained explicitly. From the point of view of the derivation given in the previous subsection, Morita equivalence may be regarded simply as a change of basis $\hat\varphi(x) \to \hat\varphi'(x')$ for the mapping between operators and fields in Weyl quantization.

• From the transformation rule (6.22) for the period matrix of the torus, we find that the dual metric $G'$ is given by
$$ G' = (\Theta Q - N)^\top\,G\,(\Theta Q - N) . \tag{6.32} $$
The transformations (6.21) and (6.32) are recognized as those of the $B$-field and the open string metric under the $SO(D,D;\mathbb{Z})$ target space duality group of the torus, acting on the open string parameters [13,34].¹² This similarity holds in the usual decoupling limit $\alpha' \to 0$ and modulo the conjugation mentioned above. It is also only true modulo the normalization of the operator trace $\mathrm{Tr}$, which determines the transformation rule for the Yang–Mills coupling constant $g$; we shall return to this point in the next subsection.

It is possible to work out the transformation rules for higher exterior powers of the noncommutative field strength (see below) and show that they transform in a spinor representation of $SO(D,D;\mathbb{Z})$ [118,125]. The reason is as follows:
- a differential form of even degree can be identified with a bi-spinor of the rotation group $SO(D)$ in $D$ dimensions;
- a spinor of $SO(D,D)$ can be identified with a bi-spinor of its $SO(D)$ subgroup.

This simply reflects the fact that the target space duality group acts on D-brane charges (or, more precisely, on Ramond–Ramond potentials) in a spinor representation of the group $SO(D,D;\mathbb{Z})$ [129].

Notice, however, that T-duality transformations along a single direction of $T^D$ are absent in the present formalism; they are in fact elements of $O(D,D;\mathbb{Z})$. Such a map takes Type IIA strings to Type IIB strings (and vice versa), and is therefore not a symmetry of the corresponding gauge theory. Nevertheless, it is a remarkable feature of noncommutative Yang–Mills theory that a stringy symmetry such as T-duality acts at a field-theoretical level. It does so without mixing the noncommutative gauge field modes with string winding states and other stringy excitations. This makes noncommutative Yang–Mills theory a very powerful description of the low-energy effective dynamics of strings. Ordinary Yang–Mills theory, by contrast, is not invariant under T-duality [34].

¹² Note that for open strings, which do not wind around the cycles of $T^D$, the mapping is linear in the metric. Unlike for the closed string metric, there is no transformation which maps $G \to G^{-1}$.

• The Morita transformation has several very interesting special cases. For $N_0 = 1$, $N$ corresponds to the dimension of the irreducible twist eating solution. In this case the nonabelian nature of the gauge theory is completely absorbed into the noncommutativity of spacetime: all the internal matrix structure of the gauge fields is absorbed by the Weyl operators $\hat Z_a$. This is true even for $\Theta = 0$, so that an ordinary $U(N)$ gauge theory is equivalent to a noncommutative gauge theory with $U(1)$ gauge group.
We can therefore transform an ordinary nonabelian gauge theory into a gauge theory with an abelian gauge group, at the cost of making the spacetime noncommutative. On the other hand, within the present framework the original and dual ranks can be made equal only when there is no background, $Q = 0$, in which case $N = N_0$ (see (6.3)). When $\Theta = 0$, the dual $\Theta'$ is in fact rational-valued. We thus find that noncommutative Yang–Mills theory with rational-valued deformation parameters is dual to ordinary Yang–Mills theory with 't Hooft flux.

But these dualities between commutative and noncommutative gauge theories are not the whole story. The various theories should properly be understood as members of a hierarchy of models [130]. In this hierarchy the noncommutative description is the physically significant one in the infrared regime, as a local field theory of the light degrees of freedom, even though the theory is equivalent by duality to ordinary Yang–Mills theory. This is due to the extra infrared degrees of freedom that noncommutative field theories contain, as discussed in Section 3. It is also evident in the dual supergravity descriptions of noncommutative Yang–Mills theory [99]. When $\Theta$ is irrational-valued, there is no commutative dual. But this remarkable duality does allow one to interpolate continuously, through noncommutative gauge theories, between two ordinary Yang–Mills theories with gauge groups of different rank and appropriate background magnetic fluxes [130].

• It has been shown that the one-loop ultraviolet structure of noncommutative Yang–Mills theory on $T^D$ is the same as that on $\mathbb{R}^D$ [104,108]. Given the duality between commutative and noncommutative Yang–Mills theories discussed above, we now have a precise explanation for the equivalence of the one-loop renormalizations in the two types of theories discussed in Section 4.3.
The reason why the large N limit is relevant for this equivalence will become clear in Section 7 when we examine the Eguchi–Kawai reduction.
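The rules of Table 1 and (6.32) are simple enough to evaluate numerically. The sketch below applies them to user-supplied $D \times D$ matrices, under these assumptions:
- It assumes the reading of Table 1 given above, $\Theta' = (\Theta Q - N)^{-1}(A\Theta + B)$.
- It does not check the integrality condition (6.11); that is left to the caller.
- The periods and metric in the usage example are placeholders.

The check uses the vanishing-charge case mentioned later in this section ($Q = 0$, $B = 0$, $A = N = \mathbb{1}_D$). There one should recover $\Theta' = -\Theta$ and an unchanged metric and coupling.

```python
import numpy as np


def morita_dual(Theta, Sigma, G, g, A, B, Q, N):
    """Apply the Table 1 rules and (6.32) to D x D matrices.

    A, B, Q, N are assumed to satisfy the integrality condition (6.11);
    this sketch does not check it.
    """
    M = Theta @ Q - N                                 # (Theta Q - N)
    Theta_dual = np.linalg.solve(M, A @ Theta + B)    # (Theta Q - N)^{-1} (A Theta + B)
    Sigma_dual = Sigma @ M                            # Sigma (Theta Q - N)
    G_dual = M.T @ G @ M                              # (Theta Q - N)^T G (Theta Q - N)
    g_dual = g * abs(np.linalg.det(M)) ** 0.5         # g |det(Theta Q - N)|^{1/2}
    return Theta_dual, Sigma_dual, G_dual, g_dual


if __name__ == "__main__":
    # Q = 0 case: B = 0, A = N = identity
    theta = 0.3
    Theta = np.array([[0.0, theta], [-theta, 0.0]])
    Sigma = np.array([[1.0, 0.2], [0.0, 1.5]])       # placeholder periods
    G = Sigma.T @ Sigma                               # placeholder metric
    I2, Z2 = np.eye(2), np.zeros((2, 2))
    Th, Sg, Gd, gd = morita_dual(Theta, Sigma, G, 0.7, I2, Z2, Z2, I2)
    print(np.allclose(Th, -Theta), np.allclose(Gd, G), np.isclose(gd, 0.7))  # True True True
```

Using `np.linalg.solve` rather than forming the inverse explicitly is the usual numerically safer choice for $(\Theta Q - N)^{-1}(\cdot)$.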
6.2.1. Other transformation rules

We will now briefly describe how the other quantum numbers of noncommutative Yang–Mills theory transform under the Morita map. We can formulate these in a collective form by using the noncommutative Chern character, which was introduced in Section 5.2. The new noncommutative Yang–Mills theory over the dual torus determines a vector bundle $E'$. This bundle may likewise be classified topologically by the cohomology class
$$ \mathrm{ch}(E') = \mathrm{Tr}' \otimes \mathrm{tr}_{N_0}\,\exp\frac{\hat W'[F']}{2\pi} . \tag{6.33} $$
There is a simple and elegant relation between (6.33) and the Chern character (5.16) of the original noncommutative gauge theory [125]. For this, recall that the new curvature two-form $F'$ in (6.33) is obtained from the original one by a shift by the constant background flux, i.e. $F' = F - f$ (compare (5.19) and (5.34)). Taking into account the change (6.27) in the normalization of the trace, we may then write
$$ \mathrm{ch}(E') = \frac{N_0}{N}\,\lvert\det(\Theta Q - N)\rvert\,\exp\Big(-\frac{1}{2\pi}\,f_{ab}\,L^a L^b\Big)\,\mathrm{ch}(E) , \tag{6.34} $$
where $f_{ab} = \Sigma^i_a f_{ij}\Sigma^j_b$ is the noncommutative curvature (5.42) of the corresponding frame bundle. For example, in two dimensions formula (6.34) gives
$$ \mathrm{ch}(E') = \frac{\gcd(N,Q)}{N^2}\,(N - \Theta Q)^3 , \tag{6.35} $$
consistent with the fact that the magnetic flux vanishes in the target theory. This is also consistent with the way that D-brane charges transform under T-duality (even in the case $\Theta = 0$) [129]. Similar formulas can be worked out for gauge theories in higher dimensions. Consider next the cases $f' \neq 0$, where the target gauge bundle has nonvanishing magnetic flux. There, the transformation rule (6.31) follows from (5.16) and (6.34).

Finally, let us comment on how the observables of noncommutative Yang–Mills theory transform [96], i.e. the noncommutative Wilson lines of Section 4.2.
In the target theory there are no large star-gauge transformations. The observables $\mathcal{O}'(C'_v)$ associated with an arbitrary oriented contour $C'_v$ can therefore be constructed using the relations (4.14), (4.18) and (5.12), with all unprimed quantities replaced by primed ones. In the original theory, however, we have to be more careful, because the gauge fields are multi-valued functions on the torus and transform according to the twisted boundary conditions (5.20). The corresponding parallel transport operator (4.14) is likewise multi-valued and obeys the boundary conditions
$$ U(x + \Sigma^i_a\,\hat\imath;\,C_v) = \Omega_a(x)\star U(x;C_v)\star\Omega_a(x+v)^\dagger . \tag{6.36} $$
To construct a single-valued observable $\mathcal{O}(C_v)$, we use a path-ordered star-exponential of the background abelian gauge field (5.25) to absorb the global gauge transformation in (6.36). We then arrive at the observable
$$ \mathcal{O}(C_v) = \int d^Dx\;\mathrm{tr}_N\,U(x;C_v)\star\Big[P\exp_\star\Big(i\int_{C_v}d\xi^i\,a_i(x+\xi)\Big)\Big]^\dagger\star e^{ik_i(v)x^i} . \tag{6.37} $$
It can be shown [96] that these observables are equivalent to those of the target noncommutative gauge theory under the Morita map. The proof uses Weyl quantization and the change of basis $\hat\varphi(x) \to \hat\varphi'(x')$ of the previous subsection.
As an explicit example of this equivalence, let us start with a commutative Yang–Mills theory, $\Theta = 0$, with topologically nontrivial gauge fields. Fix a loop $\tilde C_n$ which winds $n^a$ times around the $a$th cycle of $T^D$. The integrand of the observable (6.37) is then the usual gauge-invariant Polyakov line
$$ P(x;\tilde C_n) = \mathrm{tr}_N\Big[P\exp\Big(i\int_{\tilde C_n}d\xi^i\,A_i(x+\xi)\Big)\prod_{a=1}^D(\Gamma_a)^{n^a}\Big]\,e^{2\pi i\,(\Sigma^{-1})^a{}_i\,(\cdots)_{ab}\,n^b x^i} . \tag{6.38} $$
We have used a fact discussed at the end of Section 5.3: in the commutative case the transition functions (5.22) may be chosen to be constant, and global gauge invariance maintained by using only the twist eaters $\Gamma_a$. In the Morita equivalent theory, Table 1 with $\Theta = 0$ gives noncommutativity $\Theta' = -N^{-1}B$ and periods $\Sigma' = -\Sigma N$. As discussed above, in this case the complete matrix structure of the gauge theory may be absorbed into the noncommutativity of spacetime, and the target theory has gauge group $U(1)$. The twist eaters in (6.38) are therefore "eaten up" by the Morita transformation. What remains is an open line observable (4.18) with momentum (4.19), whose endpoint separation vector $v^i = \Sigma^i_a\,(N^{-1})^a{}_b\,n^b$ in general does not wind around the cycles of the dual torus. Therefore, we have the equivalence
$$ \mathcal{O}'(\tilde C_n) = \int d^Dx\;P(x;\tilde C_n) , \tag{6.39} $$
and the Polyakov lines of ordinary Yang–Mills theory map to open noncommutative Wilson lines under the Morita transformation [96].

6.3. Projective modules

Within the operator algebraic setting of noncommutative geometry [2], there is a more precise notion of a vector bundle over a noncommutative space, and along with it a more formal definition of Morita equivalence. We shall not enter much into this technical definition. Instead we content ourselves here with the stronger notion of Morita equivalence of noncommutative gauge fields developed above. But let us give a brief indication of the more formal definition, which will be exploited to some extent in Section 8.

Consider an algebra $\mathcal{A}$ of Weyl operators. By a module for $\mathcal{A}$ we mean a separable Hilbert space $\mathcal{H}$ on which $\mathcal{A}$ acts. We will use only right actions of the algebra and denote them by $\psi\cdot\hat W[f]$ for $\psi\in\mathcal{H}$ and $\hat W[f]\in\mathcal{A}$. The action is required to satisfy the condition
$$ (\psi\cdot\hat W[g])\cdot\hat W[f] = \psi\cdot(\hat W[g]\,\hat W[f]) \tag{6.40} $$
for $\psi\in\mathcal{H}$ and $\hat W[f],\hat W[g]\in\mathcal{A}$, so that a module generates an explicit representation of the Weyl operators. For noncommutative algebras there are also left modules of $\mathcal{A}$, while in the commutative case there is no distinction between the two types of actions.

The module $\mathcal{H}$ is said to be projective if it can be embedded as a direct summand of a freely generated module. That is, there must exist another $\mathcal{A}$-module $\tilde{\mathcal{H}}$ such that $\mathcal{H}\oplus\tilde{\mathcal{H}}$ is a direct sum of copies of the algebra $\mathcal{A}$ itself, completed to a Hilbert space in an appropriate inner product. The latter space is trivially an $\mathcal{A}$-module, with action defined by $\hat W[g]\cdot\hat W[f] = \hat W[g]\,\hat W[f]$ for $\hat W[f],\hat W[g]\in\mathcal{A}$. In this simplest case, the defining condition (6.40) of a module is equivalent to the associative law of the algebra $\mathcal{A}$. The space $\Gamma(E)$ of smooth sections of a vector bundle $E$ over a manifold $M$ is naturally a projective $C(M)$-module, again with action defined by $s\cdot f = fs$ for $f\in C(M)$ and $s\in\Gamma(E)$. Condition
(6.40) is a trivial consequence of the commutativity of the function algebras. Swan's theorem [2], i.e. that $E$ is a direct summand of a trivial bundle, guarantees that this module is projective. In fact, there is an analog of the Gel'fand–Naimark theorem for vector bundles, known as the Serre–Swan theorem [2]. It asserts a one-to-one correspondence between the category of smooth vector bundles over a manifold $M$ and the category of finitely generated projective $C(M)$-modules. We may therefore formally define a vector bundle over a noncommutative space to be a representation space $\mathcal{H}$ for its Weyl operator algebra.

The purpose of this subsection is to describe some of these modules explicitly in the case of the noncommutative torus. The derivation given so far in this section is completely independent of any of these representations. However, as indicated earlier, there are many instances in which one would like to have explicit representations of the Weyl operators. For example, these modules arise naturally in Matrix theory, and also as the Hilbert spaces of physical states in open string quantization. This will enable us to identify more precisely the Morita equivalence of noncommutative Yang–Mills theory with the T-duality symmetry of string theory [16,34]. Furthermore, in Section 8 we will use this formalism to describe noncommutative gauge transformations more precisely, in terms of matter fields in the fundamental representation of the star-gauge group.

A more mathematical reason for studying these modules is that they give a more concise definition of Morita equivalence [2]. A Morita equivalence of two algebras $\mathcal{A}$ and $\mathcal{A}'$ provides a natural one-to-one correspondence between their projective modules. Precisely, it provides equivalence bi-modules $\mathcal{M}$ and $\mathcal{M}'$ for $\mathcal{A}\times\mathcal{A}'$ and $\mathcal{A}'\times\mathcal{A}$, respectively.
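A finite-dimensional toy model can make the module axioms concrete. It is an illustration only, not the Weyl-operator algebra itself:
- Take $\mathcal{A} = M_n(\mathbb{C})$ acting on row vectors $\psi\in\mathbb{C}^n$ from the right, $\psi\cdot a = \psi a$, and check the module condition (6.40).
- Take $M_n(\mathbb{C})$ acting on itself: this is the free module described above. Left and right multiplications on it commute, which here is nothing but associativity. This is the simplest instance of the commuting left and right actions of a bi-module.

The matrix size and random entries below are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4


def rand_matrix(n):
    """A random complex n x n matrix (placeholder algebra element)."""
    return rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))


def right_act(psi, a):
    """Right action psi . a of a in M_n(C) on a row vector psi."""
    return psi @ a


psi = rng.normal(size=n) + 1j * rng.normal(size=n)
f, g = rand_matrix(n), rand_matrix(n)

# Module condition (6.40): (psi . g) . f == psi . (g f)
module_ok = np.allclose(right_act(right_act(psi, g), f), right_act(psi, g @ f))

# Free module M_n(C): left multiplication by b commutes with right multiplication by f
m, b = rand_matrix(n), rand_matrix(n)
bimodule_ok = np.allclose((b @ m) @ f, b @ (m @ f))
print(module_ok, bimodule_ok)   # True True
```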
The Hilbert space $\mathcal{M}$ is simultaneously a right $\mathcal{A}$-module and a left $\mathcal{A}'$-module, and vice versa for $\mathcal{M}'$, with the right and left actions of $\mathcal{A}$ and $\mathcal{A}'$ commuting. Using $\mathcal{M}$ one can define a map from right $\mathcal{A}'$-modules to right $\mathcal{A}$-modules by $\mathcal{H}'\mapsto\mathcal{M}\otimes_{\mathcal{A}'}\mathcal{H}'$, with inverse map $\mathcal{H}\mapsto\mathcal{M}'\otimes_{\mathcal{A}}\mathcal{H}$. The algebra $\mathcal{A}'$ is the commutant of $\mathcal{A}$ in the module $\mathcal{M}$: the set of operators on $\mathcal{M}$ which commute with $\mathcal{A}$ is precisely $\mathcal{A}'$, and vice versa in $\mathcal{M}'$. Together these algebras of Weyl operators act irreducibly on the Hilbert spaces $\mathcal{M}$ and $\mathcal{M}'$. The Morita equivalence derived at the field-theoretical level above then asserts the following: gauge theory over $\mathcal{A}$ in a certain bi-module $\mathcal{M}$ is equivalent to gauge theory over $\mathcal{A}'$ in another bi-module $\mathcal{M}'$. It may be checked explicitly that the Weyl operators (5.3) commute with the dual Weyl basis (6.18).

We will now give some illustrative examples to see how this more abstract notion of Morita equivalence works in practice. A particularly simple equivalence bi-module is obtained by taking $\mathcal{M}$ to be the Hilbert space of square-integrable functions on the torus $T^D$. We may then represent the Weyl algebra (5.4) of the noncommutative torus by the operators
$$ \hat Z^a = \exp\Big(2\pi i\,(\Sigma^{-1})^a{}_i\,x^i + \Lambda^{ab}\,\Sigma^i_b\,\frac{\partial}{\partial x^i}\Big) \tag{6.41} $$
acting on $\mathcal{M}$, where $\Lambda$ is any constant, real-valued $D\times D$ matrix whose antisymmetric part is given by
$$ \Lambda - \Lambda^\top = \Theta . \tag{6.42} $$
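The operators in (6.41) can be checked with simple bookkeeping. Write $\Lambda$ for the constant real matrix in (6.41) whose antisymmetric part is fixed by (6.42). The exponents are linear in $x^i$ and $\partial/\partial x^i$, so their commutators are c-numbers:
$$ [\alpha\cdot x + \beta\cdot\partial,\;\alpha'\cdot x + \beta'\cdot\partial] = \beta\cdot\alpha' - \alpha\cdot\beta' , $$
and then $e^{X}e^{Y} = e^{[X,Y]}e^{Y}e^{X}$.

The sketch below uses this to check three things numerically:
1. The exponents of (6.41) reproduce the phase $2\pi i\Theta^{ab}$. This assumes the convention $\hat Z^a\hat Z^b = e^{2\pi i\Theta^{ab}}\hat Z^b\hat Z^a$ for (5.4).
2. Replacing $\Lambda^{ab}$ by $\Lambda^{ba}$ (the dual operators (6.43) below) gives the phase $-2\pi i\Theta^{ab}$.
3. The two sets of generators commute.

The random $\Sigma$ and $\Lambda$ are placeholders.

```python
import numpy as np


def generators(Sigma, Lam, dual=False):
    """Coefficients (alpha_a, beta_a) of X_a = alpha_a . x + beta_a . d/dx.

    alpha_a = 2 pi i (Sigma^{-1})^a_i and beta_a^i = Lam^{ab} Sigma^i_b,
    or Lam^{ba} Sigma^i_b for the dual operators (dual=True).
    """
    Sinv = np.linalg.inv(Sigma)
    L = Lam.T if dual else Lam
    D = Sigma.shape[0]
    alphas = [2j * np.pi * Sinv[a, :] for a in range(D)]
    betas = [Sigma @ L[a, :] for a in range(D)]
    return list(zip(alphas, betas))


def bracket(X, Y):
    """c-number commutator [X, Y] = beta_X . alpha_Y - alpha_X . beta_Y."""
    (a1, b1), (a2, b2) = X, Y
    return b1 @ a2 - a1 @ b2


def bracket_matrix(Xs, Ys):
    """Matrix of commutators [X_a, Y_b]."""
    return np.array([[bracket(X, Y) for Y in Ys] for X in Xs])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    D = 2
    Sigma = rng.normal(size=(D, D)) + 2 * np.eye(D)   # placeholder periods
    Lam = rng.normal(size=(D, D))                     # placeholder Lambda
    Theta = Lam - Lam.T                               # condition (6.42)
    Z, Zd = generators(Sigma, Lam), generators(Sigma, Lam, dual=True)
    print(np.allclose(bracket_matrix(Z, Z), 2j * np.pi * Theta))     # True
    print(np.allclose(bracket_matrix(Zd, Zd), -2j * np.pi * Theta))  # True
    print(np.allclose(bracket_matrix(Z, Zd), 0))                     # True
```

The vanishing of the mixed brackets is the generator-level statement that the dual operators lie in the commutant of (6.41).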
In the case of a gauge bundle of vanishing topological charge, $Q = 0$, we may take $B = 0$ and $A = N = \mathbb{1}_D$ in order to satisfy the relation (6.11). From (6.21) it then follows that the dual
noncommutativity parameter is $\Theta' = -\Theta$. The dual Weyl algebra (6.19) may therefore be represented on $\mathcal{M}$ by taking
$$ \hat Z'^a = \exp\Big(2\pi i\,(\Sigma^{-1})^a{}_i\,x^i + \Lambda^{ba}\,\Sigma^i_b\,\frac{\partial}{\partial x^i}\Big) . \tag{6.43} $$
It is easy to see that the set of operators (6.43) generates the commutant of the set (6.41) in $\mathcal{M}$. The appropriate linear derivations may also be represented on $\mathcal{M}$ as
$$ \hat\partial_i = \hat D_i = \frac{\partial}{\partial x^i} . \tag{6.44} $$
The situation for gauge bundles of nonvanishing magnetic flux is somewhat more complicated. In this case, a convenient representation of the algebra of the noncommutative torus is provided by the fundamental sections of the given gauge bundle. These are the $\mathbb{C}^N$-vector-valued functions $\psi(x)$ on $\mathbb{R}^D$ which transform under large gauge transformations in the fundamental representation of the star-gauge group,
$$ \psi(x + \Sigma^j_a\,\hat\jmath) = \Omega_a(x)\star\psi(x) . \tag{6.45} $$
We shall now solve the twisted boundary conditions (6.45), and thereby explicitly construct the module corresponding to the Hilbert space of sections of the corresponding fundamental bundle. For illustration we will work only in $D = 2$ spacetime dimensions. The general case can always be reduced to this form by picking Darboux coordinates in which the noncommutativity parameter assumes its canonical form (3.6). In addition, we will work on a square torus $T^2$ of unit size, i.e. we take $\Sigma^j_a = \delta^j_a$, and make the gauge choice
$$ \Omega_1(x) = e^{2\pi i q x^2/N}\otimes(W_N)^q , \qquad \Omega_2(x) = \mathbb{1}\otimes V_N \tag{6.46} $$
with relatively prime positive integers $q$ and $N$, where $W_N$ and $V_N$ are the $SU(N)$ clock and shift operators (6.6). Given the fundamental section $\psi(x)$ obeying (6.45), we introduce the section
$$ S(x) \equiv \psi_J\big(x - (J-1)\hat 2\big) = \sum_{\rho=1}^N\big[(V_N)^{-(J-1)}\big]_{J\rho}\,\psi_\rho(x) . \tag{6.47} $$
By using the explicit representation (6.6) we find
$$ S(x) = \psi_1(x) , \tag{6.48} $$
so that $S$ is independent of the vector index $J = 1,\dots,N$ of the fundamental sections. Furthermore, since $(V_N)^N = \mathbb{1}_N$, the field (6.47), (6.48) is a periodic function of period $N$,
$$ S(x + N\hat 2) = \psi_J\big(x + (N - J + 1)\hat 2\big) = \psi_1(x) = S(x) . \tag{6.49} $$
As for the other boundary condition in (6.45), by using (6.46) and (6.6) we find
$$ S(x + \hat 1) = e^{2\pi i q(x^2+1)/N}\star S(x) . \tag{6.50} $$
The general solution to (6.50) may be written as
$$ S(x) = {}^\circ_\circ\exp_\star\Big(\frac{q}{N}(x^2+1),\;2\pi i x^1\Big){}^\circ_\circ\star\varphi(x) , \tag{6.51} $$
where $\varphi(x)$ is a periodic function, $\varphi(x + \hat 1) = \varphi(x)$. The normal-ordered exponential is defined for any two functions $f(x)$ and $g(x)$ whose Moyal bracket $f\star g - g\star f$ is a constant function on $T^2$:
$$ {}^\circ_\circ\exp_\star(f,g){}^\circ_\circ = \frac{1}{1 - f\star g + g\star f}\,\sum_{n=0}^\infty\frac{1}{n!}\,f^{\star n}\star g^{\star n} , \tag{6.52} $$
with $f^{\star n} = f\star\cdots\star f$ ($n$ times). The function (6.52) reduces to the ordinary exponential function $e^{fg}$ in the commutative limit. Generically it shares similar properties,
$$ {}^\circ_\circ\exp_\star(f,g){}^\circ_\circ\star{}^\circ_\circ\exp_\star(-g,f){}^\circ_\circ = 1 , $$
$$ {}^\circ_\circ\exp_\star(f+c,g){}^\circ_\circ = {}^\circ_\circ\exp_\star(f,g){}^\circ_\circ\star\exp_\star(cg) , $$
$$ {}^\circ_\circ\exp_\star(f,g+c){}^\circ_\circ = \exp_\star(cf)\star{}^\circ_\circ\exp_\star(f,g){}^\circ_\circ , \tag{6.53} $$
where $c$ is any constant function on the torus. The periodic field $\varphi(x)$ may be expanded in a Fourier series
$$ \varphi(x) = \sum_{n=-\infty}^\infty e^{2\pi i n x^1}\star\varphi_n(x^2) , \tag{6.54} $$
and, using the properties (6.53) of the normal-ordered exponential function, we arrive at the solution
$$ S(x) = \sum_{n=-\infty}^\infty{}^\circ_\circ\exp_\star\Big(\frac{q}{N}(x^2+1) + n,\;2\pi i x^1\Big){}^\circ_\circ\star\varphi_n(x^2) . \tag{6.55} $$
Let us now rewrite the series (6.55) in terms of the decomposition $n = qm + j$, with $m\in\mathbb{Z}$ and $j = 1,\dots,q$, as
$$ S(x) = \sum_{m=-\infty}^\infty\sum_{j=1}^q{}^\circ_\circ\exp_\star\Big(\frac{q}{N}(x^2+1) + qm + j,\;2\pi i x^1\Big){}^\circ_\circ\star\varphi_{m,j}(x^2) , \tag{6.56} $$
where $\varphi_{m,j}(x^2) = \varphi_{qm+j}(x^2)$. The periodicity property (6.49) implies $\varphi_{m-1,j}(x^2 + N) = \varphi_{m,j}(x^2)$, and so by induction
$$ \varphi_{m,j}(x^2) = \varphi_{0,j}(x^2 + Nm) . \tag{6.57} $$
Therefore, by inverting the definition (6.47), we finally arrive at the general expression for the fundamental sections in the form
$$ \psi_J(x) = \sum_{m=-\infty}^\infty\sum_{j=1}^q{}^\circ_\circ\exp_\star\Big(\frac{q}{N}(x^2 + J + Nm) + j,\;2\pi i x^1\Big){}^\circ_\circ\star\chi\Big(x^2 + J + Nm + \frac{N}{q}\,j,\;j\Big) , \tag{6.58} $$
where the functions $\chi(s,j) = \varphi_{0,j}\big(s - 1 - (N/q)j\big)$ are defined on the whole of the domain $\mathbb{R}\times\mathbb{Z}_q$. Their only restriction is that they be Schwartz functions of $s\in\mathbb{R}$. They form a basis
of vectors in the Hilbert space of fundamental sections of the given gauge bundle, parametrized by the rank $N$ and magnetic flux $q$. All operators of the algebra of functions on the noncommutative torus may now be represented on the basis $\chi(s,j)$. In particular, the actions of the covariant derivatives (5.37) and of the dual Weyl basis (6.18) on the operators $\hat W[\psi]_J$ induce their representations on these basis functions. After some tedious algebra, we arrive at the explicit representations
$$ \hat D_1 = -i f s , \qquad \hat D_2 = \frac{\partial}{\partial s} , \qquad \hat Z'^1 = U^1\otimes(V_q)^a , \qquad \hat Z'^2 = U^2\otimes W_q , \tag{6.59} $$
where the integer $a$ is defined by (6.8). The $q\times q$ shift and clock matrices in (6.59) act on the vector indices $j\in\mathbb{Z}_q$ of the functions $\chi(s,j)$. The operators $U^a$, $a = 1,2$, act as shift and clock type operators on the continuous indices $s\in\mathbb{R}$ by
$$ U^1\chi(s,j) = \chi\Big(s - \frac1q,\;j\Big) , \qquad U^2\chi(s,j) = e^{2\pi i s/(N - q\theta)}\,\chi(s,j) , \tag{6.60} $$
and thereby generate the algebra
$$ U^1U^2 = e^{-2\pi i/[q(N - q\theta)]}\,U^2U^1 . \tag{6.61} $$
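The phase in (6.61) follows directly from (6.60). Shifting $s$ by $1/q$ after multiplying by $e^{2\pi is/c}$, with $c = N - q\theta$ (our reading of (6.60)), produces an extra factor $e^{-2\pi i/(qc)}$. The sketch below checks this numerically on a sampled test function. It assumes these placeholder values of $N$, $q$ and $\theta$, and it ignores the $\mathbb{Z}_q$ tensor factors of (6.59), which act on a separate index.

```python
import numpy as np


def U1(f, q):
    """(U^1 f)(s) = f(s - 1/q), as in (6.60)."""
    return lambda s: f(s - 1.0 / q)


def U2(f, c):
    """(U^2 f)(s) = exp(2 pi i s / c) f(s), with c = N - q*theta."""
    return lambda s: np.exp(2j * np.pi * s / c) * f(s)


if __name__ == "__main__":
    N, q, theta = 3, 2, 0.37            # placeholder values
    c = N - q * theta
    chi = lambda s: np.exp(-s**2)       # a Schwartz test function
    s = np.linspace(-3.0, 3.0, 101)
    lhs = U1(U2(chi, c), q)(s)          # (U^1 U^2 chi)(s): apply U^2 first
    rhs = U2(U1(chi, q), c)(s)          # (U^2 U^1 chi)(s)
    print(np.allclose(lhs, np.exp(-2j * np.pi / (q * c)) * rhs))   # True
```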
It is straightforward to verify that the operators (6.59) yield a representation of the commutation relations (6.19) and (6.20). These are the irreducible modules $\mathcal{H}_{N,q}$ over the noncommutative torus that were used in [28] in the context of Matrix theory compactifications. Other representations may also be constructed [122,131]. These correspond to the standard form of T-duality, which maps a gauge field into the position of a D-string on the dual noncommutative torus. The constant curvature modules over a four-dimensional noncommutative torus are constructed explicitly in [132]. In the general case the irreducible modules correspond to linear spaces of Schwartz functions on $\mathbb{R}^p\times\mathbb{Z}^q\times\Gamma$, where $2p + q = D$ and $\Gamma$ is a finite abelian group [28,133]. Such representations of the noncommutative torus are known as Heisenberg modules. Expression (6.58) shows that a Heisenberg module may be regarded as a deformation of the space of sections of a vector bundle over the ordinary, commutative torus [122].

6.3.1. String theoretical interpretation

The Heisenberg modules described above admit an elegant interpretation in the context of the quantization of open strings in external $B$-fields [28,34]. Consider an open string with one endpoint terminating on a D2-brane. Its other endpoint terminates on a configuration of $N$ coincident D2-branes with $q$ units of D0-brane charge, which is equivalent to $q$ units of magnetic vortex flux [119] (see Section 5.2). The situation is depicted schematically in Fig. 5.

Let us first consider the simplest case, in which the open string stretches between a pair of D2-branes. In the Seiberg–Witten scaling limit considered earlier (see Section 1.3), take the worldsheet to be an infinite strip. The topological open string $\sigma$-model action (1.12) is then given by
$$ S_B = -\frac{i}{2}\int dt\;B_{ij}\,x^i\,\frac{dx^j}{dt} + \frac{i}{2}\int dt\;B_{ij}\,\tilde x^i\,\frac{d\tilde x^j}{dt} , \tag{6.62} $$
where $x^i$ and $\tilde x^i$ denote the values of the string fields at opposite boundaries of the strip.
Fig. 5. An open string (wavy line) stretching from a single D2-brane (shaded region) to a cluster of $N$ D2-branes carrying $q$ units of D0-brane charge. Quantization of the point particle at the left end of the string produces a Hilbert space $\mathcal{H} = \mathcal{H}_{1,0}$, while quantization at the right end yields $\mathcal{H}' = \mathcal{H}_{N,q}$.

Canonical quantization of the action (6.62) yields the quantum commutators (1.1) and $[\hat{\tilde x}^i,\hat{\tilde x}^j] = -i\theta^{ij}$. However, one has to remember that the particles described by the configurations $x^i$ and $\tilde x^i$ are connected together by a string. The contribution to the energy from the bulk kinetic term $g_{ij}\,\partial_a x^i\partial^a x^j$ in (1.11) is minimized by a string which is a geodesic from $x^i$ to $\tilde x^i$. In the decoupling limit $\alpha'\to 0$, the fluctuations about this minimum have infinite energy. We may thereby identify the classical phase space of the theory (6.62) as consisting of a pair of points $x^i$ and $\tilde x^i$, together with a geodesic line connecting them. This leads to the parametrization
$$ x^i = y^i + \tfrac12 s^i , \qquad \tilde x^i = y^i - \tfrac12 s^i , \tag{6.63} $$
where $y\in T^2$ is the midpoint of the geodesic joining $x$ and $\tilde x$, and the coordinate $s\in\mathbb{R}^2$ keeps track of how many times the geodesic wraps around the cycles of the torus. Canonical quantization is then tantamount to taking $\hat y^i$ to be multiplication operators by $y^i$, and $\hat s^i$ to be the canonical momentum operators
$$ \hat s^i = i\theta^{ij}\frac{\partial}{\partial y^j} . \tag{6.64} $$
The physical Hilbert space $\mathcal{M}$ of open string ground states thus consists of functions on an ordinary torus $T^2$ with coordinates $y$. The algebras of functions at the left and right endpoints of the open string are thereby generated by operators of the type (6.41) and (6.43), respectively (cf. (5.6)).

In the general case, the map $\Theta\to\Theta'$ in (6.21) represents the Morita equivalence between the Heisenberg modules $\mathcal{H}_{1,0}$ and $\mathcal{H}_{N,q}$. It coincides with the T-duality transformation that maps a configuration of (D2, D0) brane charges $(1,0)$ to a brane cluster of charges $(N,q)$. The latter collection of D2-branes supports a $U(N)$ Chan–Paton gauge bundle with connection of constant curvature $q/N$. The Hilbert space $\mathcal{H}_{N,q}$ may then be constructed explicitly in two steps [34]:
1. Quantize open strings on an $N^2$-fold cover of $T^2$ that end on a cluster with brane charges $(1,Nq)$. This is easily described by the modular transformation $B\to B + NQ$ of the action in (6.62) governing the dynamics of $\tilde x\in T^2$.
2. Orbifold by the action of the discrete group $\mathbb{Z}_N\times\mathbb{Z}_N$.

More generally, the different algebras obtained by quantizing open strings with different boundary conditions, in the low-energy limit with fixed open string parameters, are all Morita equivalent [34]. The equivalence bi-modules are generated by the open string tachyon states with a given boundary condition on the
left and another one on the right. The actions of a pair of algebras commute with each other in these modules because they act at opposite ends of the open strings. They act irreducibly on the bi-modules because they generate the full algebra of observables in the quantum mechanics of the open string ground states, a point we shall return to in Section 8. It is in fact for this latter reason that the modules above are referred to as Heisenberg modules.

7. Matrix models of noncommutative Yang–Mills theory

As we discussed in Section 1, noncommutative gauge theories in string theory originally appeared through the large $N$ limits of matrix models. There is in fact a very deep relationship between matrix models and noncommutative Yang–Mills theory, which we shall now spend some time analysing. This will be particularly useful for our analysis in the next section. Moreover, it demonstrates the existence of a very natural nonperturbative regularization of noncommutative gauge theories which has no counterpart in ordinary quantum field theory. This in itself proves that these models exist as well-defined field theories, even beyond perturbation theory.

Henceforth we shall focus our attention on the simplest instance of structure group rank $N = 1$. That the ensuing conclusions also hold, without loss of generality, for the generic cases will become clearer in Section 8.3. For the remainder of this paper, we shall also assume for simplicity that the spacetime dimensionality $D = 2d$ is even and that $\theta^{ij}$ is nondegenerate.

7.1. Twisted reduced models

The remarkable aspect of noncommutative gauge theory that we shall build on in this section is that the derivatives $\partial_i$ (or $\hat\partial_i$) can be completely absorbed into the noncommutative gauge fields. There is no analog of the following manipulation in ordinary, commutative Yang–Mills theory. For this, we introduce the covariant coordinates [134]
$$ C_i = (\theta^{-1})_{ij}\,x^j + A_i \tag{7.1} $$
which, on using the representation (2.21), are seen to transform covariantly under the noncommutative gauge transformations (4.10),
$$ C_i \to g\star C_i\star g^\dagger . \tag{7.2} $$
In this sense, the operators (7.1) may be thought of as the gauge covariant momentum operators that one introduces in the quantum mechanics of a charged particle in a background magnetic field (cf. Section 1.3). Indeed, they are completely analogous to the covariant derivative operators (5.37) introduced in the case of a constant background flux. Their remarkable property in the present context, though, is that the entire noncommutative gauge theory may be expressed in terms of them. The star-gauge covariant derivatives $\nabla_i$, defined by (4.28), are given as the star-commutators
$$ \nabla_i f = i f\star C_i - iC_i\star f , \tag{7.3} $$
while the field strength tensor (4.4) is the sum
$$ F_{ij} = -i(C_i\star C_j - C_j\star C_i) + (\theta^{-1})_{ij} . \tag{7.4} $$
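A quick consistency check of (7.3) and (7.4) is to set $A_i = 0$, so that $C_i = (\theta^{-1})_{ij}x^j$. The check uses only two inputs: the Moyal identity $[x^j,f]_\star = i\theta^{jk}\partial_k f$, which follows from the coordinate commutator (1.1), and the antisymmetry of $\theta^{-1}$. One recovers the ordinary derivative and a vanishing field strength:
$$ \nabla_i f = i[f,C_i]_\star = -i(\theta^{-1})_{ij}[x^j,f]_\star = (\theta^{-1})_{ij}\,\theta^{jk}\,\partial_k f = \partial_i f , $$
$$ [C_i,C_j]_\star = (\theta^{-1})_{ik}(\theta^{-1})_{jl}\,i\theta^{kl} = -i(\theta^{-1})_{ij} \quad\Longrightarrow\quad F_{ij} = -i\big(-i(\theta^{-1})_{ij}\big) + (\theta^{-1})_{ij} = 0 . $$
The constant $(\theta^{-1})_{ij}$ in (7.4) is exactly what cancels the background commutator of the covariant coordinates.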
Notice that the operators (7.1) are essentially elements of the abstract, deformed algebra of functions. From (7.4) we therefore see that spacetime derivatives have completely disappeared in this rewriting of the action (4.3). Passing to the Weyl representation $\hat C_i = \hat W[C_i]$, the noncommutative Yang–Mills action (4.2) may then be written as
$$ S_{YM} = -\frac{1}{4g^2}\sum_{i\neq j}\mathrm{Tr}\Big(-i[\hat C_i,\hat C_j] + (\theta^{-1})_{ij}\Big)^2 . \tag{7.5} $$
The classical vacua of the gauge theory are the flat noncommutative gauge fields, with $F_{ij} = 0$. In the representation (7.5) they are interpreted as momenta which obey the commutation relations $[\hat C_i,\hat C_j] = -i(\theta^{-1})_{ij}$.

The remarkable feature of the action (7.5) is that it is just an infinite-dimensional matrix model action, since the fields $\hat C_i$ are formally space independent. In other words, it is a large $N$ version of the matrix model action (1.5). Such a theory is known as a reduced model, because it formally derives from the dimensional reduction of gauge theory obtained by taking all fields to be independent of the spacetime coordinates. In fact, one could start from the action (7.5) and expand the infinite matrices $\hat C_i$ as in (7.1) with a noncommuting background (1.1). This derives noncommutative gauge theory from a large $N$ matrix model. Such a matrix expansion about a nontrivial background is the way that noncommutative Yang–Mills theory is obtained from the large $N$ limit of the IIB matrix model (1.5) [43]. The spacetime dependence appears from expanding around a classical vacuum, but initially it is hidden in the infinitely many degrees of freedom of the large $N$ matrices $\hat C_i$. This is in fact the basis of the original appearance of noncommutative gauge theory from string theory [28].

This intimate connection with reduced models is just a special instance of the Eguchi–Kawai reduction of multi-colour field theories [135], which was argued long ago to reproduce the physics of ordinary, large $N$ Yang–Mills theory. The addition of the constants $(\theta^{-1})_{ij}$ in (7.5) removes what would otherwise be an infinite constant, and corresponds to a "twist" in the reduced model [136]. It is required in order that the reduced model be equivalent to the 't Hooft limit [67] of large $N$ quantum field theory on continuum spacetime. Precisely, it restores a certain symmetry of the theory that is otherwise broken in the loop expansion of the model.
The deep connection between noncommutative gauge theories and matrix models implies some rather surprising aspects of these theories and their gauge groups, which we shall now explore.

7.2. Finite-dimensional representations

The twisted reduced model (7.5) is intrinsically infinite-dimensional. Its classical equations of motion admit solutions only over an infinite-dimensional Hilbert space, as is usual for Heisenberg-type commutation relations (see the next section). This simply indicates that we must specify derivatives for it somehow. We can now ask whether there exists a finite-dimensional, $N\times N$ matrix model version of (7.5) which reproduces noncommutative Yang–Mills theory in the large $N$ limit, since these are the types of models in string theory that one starts from. Of course, any operator representation of noncommutative gauge theory is formally a matrix model. What we are really seeking is a finite-dimensional version which can regulate the continuum quantum field theory at a nonperturbative level. Such a matrix model does indeed exist [95,96]. Detailed reviews of these constructions are given in [137], and related work can be found in [138].
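The statement that the vacuum equations of (7.5) have no finite-dimensional solutions rests on a standard one-line trace argument. For $N\times N$ matrices $\hat C_i$, cyclicity of the trace gives
$$ \mathrm{tr}_N\,[\hat C_i,\hat C_j] = 0 \neq -iN(\theta^{-1})_{ij} = \mathrm{tr}_N\big(-i(\theta^{-1})_{ij}\,\mathbb{1}_N\big) \qquad (i\neq j,\;\theta^{-1}\ \text{nondegenerate}), $$
so $[\hat C_i,\hat C_j] = -i(\theta^{-1})_{ij}$ can only be realized on an infinite-dimensional space. The exponentiated, unitary form of the relations avoids this obstruction, which is what the twisted Eguchi–Kawai model exploits.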
7.2.1. The twisted Eguchi–Kawai model

A regulated, $N\times N$ matrix model formulation of noncommutative gauge theory is provided by the twisted Eguchi–Kawai model [136]. The action is
$$ S_{TEK} = -\frac{1}{4g^2}\sum_{i\neq j}\mathcal{Z}_{ij}^*\,\mathrm{tr}_N\big(V_iV_jV_i^\dagger V_j^\dagger\big) , \tag{7.6} $$
where $V_i$, $i = 1,\dots,D$, are $N\times N$ unitary matrices and the $\mathbb{Z}_N$-valued twist factors are given by
$$ \mathcal{Z}_{ij} = e^{2\pi i Q_{ij}/N} , \tag{7.7} $$
with $Q$ an antisymmetric $D\times D$ integral matrix. This action is the natural nonperturbative analog of the infinite-dimensional matrix model (7.5). Identify $V_i = e^{i\epsilon\hat C_i}$, with $\epsilon$ a dimensionful lattice spacing. The action (7.5) is then obtained in the continuum limit of the twisted Eguchi–Kawai model, corresponding to $\epsilon\to 0$, $N\to\infty$, and
$$ (\theta^{-1})_{ij} = \frac{2\pi Q_{ij}}{N\epsilon^2} . \tag{7.8} $$
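The identification (7.8) can be traced to the Baker–Campbell–Hausdorff formula: when $[X,Y]$ is a c-number, $e^Xe^Y = e^{[X,Y]}e^Ye^X$. Evaluate it formally on a classical vacuum of (7.5), $[\hat C_i,\hat C_j] = -i(\theta^{-1})_{ij}$, with $V_i = e^{i\epsilon\hat C_i}$:
$$ V_iV_jV_i^\dagger V_j^\dagger = e^{[i\epsilon\hat C_i,\,i\epsilon\hat C_j]} = e^{-\epsilon^2[\hat C_i,\hat C_j]} = e^{i\epsilon^2(\theta^{-1})_{ij}} . $$
This coincides with the twist factor $e^{2\pi iQ_{ij}/N}$ of (7.7) precisely when $\epsilon^2(\theta^{-1})_{ij} = 2\pi Q_{ij}/N$, i.e. (7.8). For such configurations each term in (7.6) takes its maximal value $N$, since $\lvert\mathrm{tr}_N U\rvert\le N$ for a unitary $U$.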
Identi cations of this sort are of course well-known. It is the basis of Weyl’s nite version of quantum mechanics [53] which follows from the simple observation that while the Heisenberg commutation relations do not admit any nite dimensional representations, their exponentiated form in terms of unitary operators do in some special instances. The unitary matrix model (7.6) originates from the ordinary, Wilson lattice gauge theory version [139] of the commutative counterpart of the torus model with background ’t Hooft 6ux that we studied in Section 5.3. The commutative action is given by 1 SW = − 2 tr N [Ui (x) Uj (x + jˆ–)Ui (x + j—) ˆ † Uj (x)† ] ; (7.9) 4g x i=j
where the sum over $x$ runs through sites on a periodic hypercubic lattice, and the gauge fields $U_i(x)$ are $N\times N$ unitary matrices on the links of the lattice. As in Section 5.3, we assume that the gauge fields are multi-valued functions around the periods of the lattice. They thereby satisfy exponentiated, discrete versions of the twisted boundary conditions (5.20). However, we recall from the discussion at the end of Section 5.3 that in the commutative case we may choose $\theta = 0$ in the transition functions (5.22), so that the $\Omega_i$ are constant and given by the twist-eating solutions $\Gamma_i$. Now let us dimensionally reduce the action (7.9) to the point $x = 0$. Then, up to an irrelevant dimensionless volume factor, the reduced model describes a one-plaquette version of lattice gauge theory with multi-valued gauge fields. We can use this multi-valuedness to generate the gauge fields at the other corners of the plaquette from the unitary matrix $U_i\equiv U_i(0)$ via the twisted boundary conditions
$$U_i(\epsilon\hat\jmath) = \Gamma_j\,U_i\,\Gamma_j^\dagger . \qquad (7.10)$$
Substituting (7.10) into the reduced action induced from (7.9) yields
$$S_{\rm W}^{\rm red} = -\frac{1}{4g^2}\sum_{i\neq j}{\rm tr}_N\left(U_i\,\Gamma_iU_j\Gamma_i^\dagger\,\Gamma_jU_i^\dagger\Gamma_j^\dagger\,U_j^\dagger\right) . \qquad (7.11)$$
By using the Weyl–’t Hooft commutation relations (5.23) and defining the unitary matrices
$$V_i = U_i\,\Gamma_i , \qquad (7.12)$$
the action (7.11) reduces to (7.6). As we mentioned in the previous subsection, the twisted Eguchi–Kawai model was originally used as a matrix model which is equivalent to Yang–Mills gauge theory in the large $N$ limit. However, for finite $N$ the model admits another interpretation, which leads to a complete justification of the result that noncommutative gauge theory is equivalent, to all orders of perturbation theory, to a twisted large $N$ reduced model, namely the IIB matrix model (1.5) with D-brane backgrounds [43]. For this, we will assume, for simplicity, that the rank $N$ of the unitary matrices is an odd integer. Note that the action (7.6) possesses the global $U(N)$ gauge symmetry
$$V_i\to\Omega\,V_i\,\Omega^\dagger , \qquad \Omega\in U(N) , \qquad (7.13)$$
and a $U(1)^D$ centre symmetry
$$V_i\to e^{i\alpha_i}\,V_i , \qquad \alpha_i\in\mathbb{R} . \qquad (7.14)$$
The vacuum configuration $V_i^{(0)}$ of the theory is given, up to a $U(N)$ gauge transformation (7.13), by the twist-eating solutions for $SU(N)$,
$$V_i^{(0)} = \Gamma_i . \qquad (7.15)$$
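As a numerical sanity check of (7.6) and (7.13)–(7.15), the following sketch evaluates the twisted Eguchi–Kawai action for $D = 2$, $N = 5$ and $g = 1$. The twist eaters used here (the shift matrix $S$ and the squared clock matrix $C^2$, which solve the twist $Q_{12} = -2$) and all helper names are illustrative assumptions, not part of the original construction. The sketch checks that the twist eaters give the minimal value $-D(D-1)N/4g^2$ of the action, that a random unitary configuration does not go below it, and that the action is invariant under (7.13) and (7.14).

```python
import numpy as np

def clock_and_shift(n):
    """n x n clock C = diag(w^j) and shift S (S e_j = e_{j+1 mod n}), w = exp(2 pi i/n), so C S = w S C."""
    w = np.exp(2j * np.pi / n)
    return np.diag(w ** np.arange(n)), np.roll(np.eye(n), 1, axis=0)

def tek_action(V, Z, g=1.0):
    """Twisted Eguchi-Kawai action (7.6); Z[i, j] holds the twist factors U_ij of (7.7)."""
    total = 0j
    for i in range(len(V)):
        for j in range(len(V)):
            if i != j:
                plaquette = V[i] @ V[j] @ V[i].conj().T @ V[j].conj().T
                total += np.conj(Z[i, j]) * np.trace(plaquette)
    # The (i, j) and (j, i) terms are complex conjugates of each other, so the sum is real.
    return float((-total / (4 * g ** 2)).real)

def random_unitary(n, rng):
    """Haar-random unitary matrix from the QR decomposition of a complex Gaussian matrix."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

N, D = 5, 2                                   # odd rank, two directions
C, S = clock_and_shift(N)
Gamma = [S, C @ C]                            # illustrative twist eaters
Q = np.array([[0, -2], [2, 0]])               # twist they solve
Z = np.exp(2j * np.pi * Q / N)                # twist factors (7.7)
assert np.allclose(Gamma[0] @ Gamma[1], Z[0, 1] * Gamma[1] @ Gamma[0])

S_vac = tek_action(Gamma, Z)                  # every plaquette term contributes N
print(S_vac)                                  # -D(D-1)N/4 = -2.5 up to rounding

rng = np.random.default_rng(0)
V = [random_unitary(N, rng) for _ in range(D)]
Omega = random_unitary(N, rng)
alpha = rng.uniform(0, 2 * np.pi, size=D)
gauge = [Omega @ v @ Omega.conj().T for v in V]           # transformation (7.13)
centre = [np.exp(1j * a) * v for a, v in zip(alpha, V)]   # transformation (7.14)
assert np.isclose(tek_action(gauge, Z), tek_action(V, Z))
assert np.isclose(tek_action(centre, Z), tek_action(V, Z))
assert tek_action(V, Z) >= S_vac - 1e-9
```

The bound holds because the trace of a unitary matrix has real part at most $N$, so no configuration can do better than the twist eaters, which saturate it plaquette by plaquette.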
We will also restrict the twist matrices and rank to be of the form
$$Q_{ij} = 2L^{d-1}\,\xi_{ij} , \qquad N = L^d , \qquad (7.16)$$
where $L$ is an odd integer and $\xi$ is the $D\times D$ skew-diagonal matrix defined by
$$\xi = \begin{pmatrix}0&-1\\1&0\end{pmatrix}\otimes\mathbb{1}_d . \qquad (7.17)$$
Then, following the general construction of Section 6.1, we have $N_\mu = L$, $q_\mu = 1$ for each $\mu = 1,\dots,d$, and $N_0 = 1$. It follows that the $SU(N)$ twist eaters are all constructed from $L\times L$ clock and shift matrices. They therefore satisfy $(\Gamma_i)^L = \mathbb{1}_N$. Note that the constraint (6.8) may then be satisfied by taking $a_\mu = 0$, $b_\mu = 1$ for each $\mu = 1,\dots,d$ without loss of generality.

7.2.2. The matrix-field correspondence

The relationship between the unitary matrix model (7.6) and noncommutative gauge theory comes about because there is a very natural finite-dimensional version of the Weyl–Wigner correspondence. Let us introduce the $N\times N$ unitary, unimodular matrices
$$J_k = \prod_{i=1}^D(\Gamma_i)^{k_i}\;\prod_{j<i}e^{\pi iQ_{ij}k_ik_j/N} \qquad (7.18)$$
defined for integer-valued vectors $k$. The phase factor is included in (7.18) to symmetrically order the product of twist eaters. Since $(\Gamma_i)^L = \mathbb{1}_N$, these matrices have the periodicity properties
$$J_{L-k} = J_{-k} = J_k^\dagger , \qquad (7.19)$$
and they obey the algebraic relations
$$J_k\,J_q = \prod_{i=1}^D\prod_{j=1}^De^{\pi ik_iQ_{ij}q_j/N}\;J_{k+q} . \qquad (7.20)$$
The $J_k$’s have the same formal algebraic properties as the plane wave Weyl basis $e^{ik_i\hat x^i}$ for the continuum noncommutative field theory on the torus. The basic operators were defined in (5.3) and are the analogs of the twist eaters $\Gamma_i$ in the present case. Owing to the property (7.19), there are only $N^2$ independent matrices. As we will see, the integers $k_i$ label momenta on a periodic lattice which are restricted to a Brillouin zone $k\in\mathbb{Z}_L^D$. The matrices (7.18) obey the orthonormality and completeness relations
$$\frac1N\,{\rm tr}_N\left(J_k\,J_q^\dagger\right) = \delta_{k,q\,({\rm mod}\,L)} , \qquad \frac1N\sum_{k\in\mathbb{Z}_L^D}(J_k)_{\mu\rho}\,(J_k^\dagger)_{\lambda\nu} = \delta_{\mu\nu}\,\delta_{\rho\lambda} . \qquad (7.21)$$
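The algebra (7.19)–(7.21) of the matrices $J_k$ can be checked numerically in the simplest case $D = 2$ ($d = 1$), where $N = L$. This is a sketch under illustrative assumptions: $L = 5$, $\Gamma_1 = S$ (shift) and $\Gamma_2 = C^2$ (squared clock), so that $\Gamma_1\Gamma_2 = e^{2\pi iQ_{12}/N}\,\Gamma_2\Gamma_1$ with $Q_{12} = -2$, which coincides with (7.16) for $d = 1$. The even value of $Q_{12}$ is what keeps the symmetrizing phase in (7.18) periodic modulo $L$, and odd $L$ is what makes the $J_k$ orthogonal.

```python
import itertools
import numpy as np

L = 5                                         # odd
N, D = L, 2                                   # d = 1: N = L^d = L, D = 2d = 2
w = np.exp(2j * np.pi / N)
clock = np.diag(w ** np.arange(N))
shift = np.roll(np.eye(N), 1, axis=0)
Gamma = [shift, clock @ clock]                # Gamma_1 Gamma_2 = e^{2 pi i Q_12/N} Gamma_2 Gamma_1
Q = np.array([[0, -2], [2, 0]])

def J(k):
    """Symmetrically ordered product (7.18): prod_i Gamma_i^{k_i} times prod_{j<i} exp(i pi Q_ij k_i k_j / N)."""
    mat = np.eye(N, dtype=complex)
    phase = 0.0
    for i in range(D):
        mat = mat @ np.linalg.matrix_power(Gamma[i], int(k[i]))
        for j in range(i):
            phase += Q[i, j] * k[i] * k[j]
    return np.exp(1j * np.pi * phase / N) * mat

zone = [np.array(k) for k in itertools.product(range(L), repeat=D)]   # Brillouin zone Z_L^D
Js = [J(k) for k in zone]

for k, Jk in zip(zone, Js):
    assert np.allclose(J(L - k), J(-k))                  # periodicity (7.19)
    assert np.allclose(J(-k), Jk.conj().T)               # J_{-k} = J_k^dagger (7.19)

for k, q in itertools.product(zone, repeat=2):           # product rule (7.20)
    assert np.allclose(J(k) @ J(q), np.exp(1j * np.pi * (k @ Q @ q) / N) * J(k + q))

gram = np.array([[np.trace(A @ B.conj().T) / N for B in Js] for A in Js])
assert np.allclose(gram, np.eye(len(Js)))                # orthonormality in (7.21)

eye = np.eye(N)
completeness = sum(np.einsum("ab,cd->abcd", A, A.conj().T) for A in Js) / N
assert np.allclose(completeness, np.einsum("ad,bc->abcd", eye, eye))   # completeness in (7.21)
print(len(Js), "matrices J_k, i.e. N^2 of them")
```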
They thereby form the Weyl basis for the linear space $gl(N,\mathbb{C})$ of $N\times N$ complex matrices [53,140]. In particular, the fluctuation modes $U_i$ in (7.12) about the classical vacuum configuration (7.15) can be expanded as
$$U_i = \frac1{N^2}\sum_{k\in\mathbb{Z}_L^D}\mathcal{U}_i(k)\,J_k , \qquad \mathcal{U}_i(k) = N\,{\rm tr}_N\left(U_i\,J_k^\dagger\right) , \qquad (7.22)$$
where the c-numbers $\mathcal{U}_i(k)$ may be interpreted as Fourier coefficients for the expansion of a lattice field on the discrete torus. These $N^2$ momentum space coefficients now describe the dynamical degrees of freedom in the twisted Eguchi–Kawai model. The underlying discrete, noncommutative space described by these matrices is sometimes called a fuzzy torus. We can now make a discrete Fourier transformation to define lattice fields on a discrete torus. In complete analogy with the continuum formalism, we define these via the $N\times N$ matrix fields
$$\Delta(x) = \frac1{N^2}\sum_{k\in\mathbb{Z}_L^D}J_k\,e^{-2\pi ik_ix^i/\ell} , \qquad (7.23)$$
where
$$\ell = \epsilon L \qquad (7.24)$$
is the dimensionful extent of the hypercubic lattice with $N^2 = L^D$ sites $x^i$. Because of the relations (7.19), the matrices $\Delta(x)$ are Hermitian and periodic in $x^i$ with period $\ell$. This means that the underlying lattice is a discrete torus. Since the algebraic relations satisfied by the matrices $J_k$ are completely analogous to their continuum counterparts, the matrices (7.23) have the same formal properties as the continuum ones (5.8) and thereby yield an invertible map between $N\times N$ matrices and lattice fields. In particular, they obey the relations
$$\frac1N\,{\rm tr}_N\big(J_k\,\Delta(x)\big) = e^{2\pi ik_ix^i/\ell} , \qquad \frac1N\sum_x\Delta(x)_{\mu\rho}\,\Delta(x)_{\lambda\nu} = \delta_{\mu\nu}\,\delta_{\rho\lambda} , \qquad \frac1N\,{\rm tr}_N\big(\Delta(x)\,\Delta(y)\big) = N^2\,\delta_{x,y\,({\rm mod}\,\ell)} . \qquad (7.25)$$
A lattice field may then be associated to the unitary matrix $U_i$ by the Fourier series
$$\mathcal{U}_i(x)\equiv\frac1{N^2}\sum_{k\in\mathbb{Z}_L^D}\mathcal{U}_i(k)\,e^{2\pi ik_ix^i/\ell} = \frac1N\,{\rm tr}_N\big(U_i\,\Delta(x)\big) . \qquad (7.26)$$
Because of relation (7.16), the field $\mathcal{U}_i(x)$, which depends on $L^D$ spacetime points, describes the same $N^2$ degrees of freedom as the $N\times N$ unitary matrix $U_i$. In fact, as we will see below, the matrix trace ${\rm tr}_N$ can be substituted by a summation $\sum_x$ over lattice points, in complete analogy again with the continuum property (2.11). However, while the matrices $J_k$ are unitary as in (7.19), their linear combination in (7.22) need not be unitary in general (a linear combination of unitary matrices does not necessarily stay in the group $U(N)$). Instead, the unitarity condition $U_iU_i^\dagger = U_i^\dagger U_i = \mathbb{1}_N$ on the matrices
$$U_i = \frac1{N^2}\sum_x\mathcal{U}_i(x)\,\Delta(x) \qquad (7.27)$$
reads
$$\mathcal{U}_i(x)\star\mathcal{U}_i(x)^* = \mathcal{U}_i(x)^*\star\mathcal{U}_i(x) = 1 , \qquad (7.28)$$
where the lattice star-product is defined by
$$F(x)\star G(x)\equiv\frac1N\,{\rm tr}_N\big(FG\,\Delta(x)\big) = \frac1{N^2}\sum_{y,z}F(x+y)\,G(x+z)\,e^{2i(\theta^{-1})_{ij}y^iz^j} \qquad (7.29)$$
with (dimensionful) noncommutativity parameter
$$\theta^{ij} = \frac{\epsilon^2L}{\pi}\,\xi_{ij} . \qquad (7.30)$$
The star-product (7.29) reduces to the Fourier integral kernel representation (2.17) in the continuum limit $\epsilon\to0$. It is a proper discretized, finite-dimensional form of the continuum Groenewold–Moyal star-product, with which it shares the same algebraic properties (with spacetime integrals replaced by lattice sums).
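The two ways of taking a continuum limit, discussed in the remarks closing this section, follow from simple scaling of (7.24) and (7.30). The sketch below drops the skew matrix $\xi_{ij}$, so that `noncommutativity` returns the magnitude $\epsilon^2L/\pi$; the function names and the sample values of $\epsilon$, $L$ and $c$ are illustrative assumptions.

```python
import math

def lattice_extent(eps, L):
    """Dimensionful extent ell = eps * L of the periodic lattice, eq. (7.24)."""
    return eps * L

def noncommutativity(eps, L):
    """Magnitude theta = eps^2 L / pi of the noncommutativity parameter (7.30), xi_ij dropped."""
    return eps ** 2 * L / math.pi

sizes = (11, 101, 1001, 10001)

# Limit 1: lattice spacing held fixed while L grows: theta grows without bound (planar limit).
eps_fixed = 0.1
limit1 = [(L, noncommutativity(eps_fixed, L)) for L in sizes]

# Limit 2: correlated limit with L * eps^2 = c^2 held fixed: theta stays constant
# while the lattice extent grows like sqrt(L).
c = 1.0
limit2 = [(L, lattice_extent(c / math.sqrt(L), L), noncommutativity(c / math.sqrt(L), L))
          for L in sizes]

for L, ell, theta in limit2:
    print(f"L = {L:6d}   ell = {ell:9.3f}   theta = {theta:.6f}")
```

In both cases the relation $\theta = \ell\epsilon/\pi$ holds identically, which is why keeping $\theta$ finite as $\epsilon\to0$ forces $\ell\to\infty$.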
7.2.3. Discrete noncommutative Yang–Mills theory

We are finally ready to interpret the twisted Eguchi–Kawai model in terms of noncommutative gauge theory. For this, we substitute (7.12) and the completeness relation
$$\frac1{N^2}\sum_x\Delta(x) = \mathbb{1}_N \qquad (7.31)$$
into the action (7.6) to write
$$S_{\rm TEK} = -\frac{1}{4g^2N^2}\sum_x\sum_{i\neq j}{\rm tr}_N\left[U_i\left(\Gamma_iU_j\Gamma_i^\dagger\right)\left(\Gamma_jU_i^\dagger\Gamma_j^\dagger\right)U_j^\dagger\,\Delta(x)\right] , \qquad (7.32)$$
where we have used the Weyl–’t Hooft algebra (5.23) to rearrange the twist eaters in (7.32). The key observation now is that the matrices $\Gamma_i$ act as lattice shift operators in this picture, i.e. they are discrete derivatives $e^{\epsilon\hat\partial_i}$. Using (7.18), (7.23) and (5.23) we may easily compute
$$\Gamma_i\,\Delta(x)\,\Gamma_i^\dagger = \Delta(x-\epsilon\hat\imath) , \qquad (7.33)$$
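from which it follows that shifts of the lattice gauge fields may be represented as
$$\mathcal{U}_i(x+\epsilon\hat\jmath) = \frac1N\,{\rm tr}_N\big(\Gamma_jU_i\Gamma_j^\dagger\,\Delta(x)\big) . \qquad (7.34)$$
Using (7.26), (7.29) and (7.34), the action (7.32) finally becomes
$$S_{\rm TEK} = -\frac{1}{4\lambda}\sum_x\sum_{i\neq j}\mathcal{U}_i(x)\star\mathcal{U}_j(x+\epsilon\hat\imath)\star\mathcal{U}_i(x+\epsilon\hat\jmath)^*\star\mathcal{U}_j(x)^* , \qquad (7.35)$$
where
$$\lambda = g^2N \qquad (7.36)$$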
is the ’t Hooft coupling constant. Thus the twisted Eguchi–Kawai model (7.6) can be rewritten exactly as the noncommutative $U(1)$ lattice gauge theory (7.35). Using the matrix-field correspondence established above, we see that the $U(N)$ invariance (7.13) of the unitary matrix model translates into the local star-gauge symmetry of the lattice model,
$$\mathcal{U}_i(x)\to g(x)\star\mathcal{U}_i(x)\star g(x+\epsilon\hat\imath)^* , \qquad (7.37)$$
where $g(x)$ is a star-unitary lattice field, $g(x)\star g(x)^* = g(x)^*\star g(x) = 1$. The lattice gauge theory (7.35) reduces to the Wilson plaquette model (7.9) in the commutative limit $\theta\to0$. It is in this sense that the twisted Eguchi–Kawai model can be interpreted as noncommutative $U(1)$ Yang–Mills theory (on a periodic lattice). In particular, the matrix model provides a nonperturbative regularization of the field theory, and all results derived in this setting will be completely rigorous. For example, the integration measure for the path integral of the twisted Eguchi–Kawai model is determined in terms of the invariant Haar measures $[dU_i]$ for the unitary Lie group $U(N)$, which are invariant under the gauge transformations (7.13). Using the above correspondence, it determines the Feynman measure for path integration in the noncommutative gauge theory (7.35), which is invariant under the lattice star-gauge transformations (7.37), as
$$\prod_x\prod_{i=1}^D\left[d\,\mathcal{U}_i(x)\right] = \prod_{i=1}^D\prod_{k\in\mathbb{Z}_L^D}\left[d\,\mathcal{U}_i(k)\right] = \prod_{i=1}^D\left[dU_i\right] . \qquad (7.38)$$
The simplicity in writing down the quantum theory here, as compared to the continuum case, is a consequence of the mapping of the $N\times N$ matrix degrees of freedom into a lattice of size $N^2 = L^D$ and with $U(1)$ fields. In particular, we can identify the star-gauge symmetry group of the $U(1)$ noncommutative gauge theory (7.35) with the symmetry group $U(L^d)$ of the unitary matrix model. These facts will all be instrumental in the analysis of the next section. We will conclude our discussion of the matrix model formulations of noncommutative Yang–Mills theories with a number of remarks concerning the above construction:

• There are two sorts of continuum limits that the lattice gauge theory (7.35) admits. If we take the limit $N = L^d\to\infty$ first for finite lattice spacing $\epsilon$, followed by the continuum limit $\epsilon\to0$, then from (7.30) it follows that $\theta\to\infty$ and hence only planar Feynman diagrams survive (cf. (3.16)). This is the usual way to reproduce the ’t Hooft limit of ordinary large $N$ Yang–Mills theory on continuum spacetime from twisted reduced models [136]. Alternatively, we can take the continuum limit of the reduced model by keeping the noncommutativity fixed in the correlated limit $N\to\infty$, $\epsilon\to0$, with $L\epsilon^2$ finite. The extent of the lattice (7.24) in this limit is $\ell\sim\sqrt L = N^{1/D}\to\infty$, and we recover noncommutative gauge theory on flat, infinite space $\mathbb{R}^D$, as in the large $N$ limit of the IIB matrix model [43]. In both types of large $N$ limits the Yang–Mills coupling constant $g$ must be tuned to be a function of $\epsilon$, in order that the ’t Hooft coupling constant (7.36) be finite in the limit. The case of finite $N$ corresponds to the noncommutative version (7.35) of Wilson lattice gauge theory, which is a nonperturbative lattice regularization of the continuum noncommutative Yang–Mills theory. We have thereby formulated a well-defined finite-dimensional matrix model representation of noncommutative Yang–Mills theory on $\mathbb{R}^D$. Among other things, this relationship completes the explanation of the remarkable coincidence of the perturbative beta-functions in the planar commutative and noncommutative gauge theories that we discussed in Section 4.3.
• Given that the twisted Eguchi–Kawai model is derived as the dimensional reduction of the Wilson lattice gauge theory (7.9) with background ’t Hooft flux, it would appear that we have also derived a relationship between this latter, commutative lattice gauge theory and the noncommutative lattice gauge theory (7.35), which has single-valued fields. The two theories are in fact Morita equivalent [96]. Noncommutative lattice gauge theory is always Morita equivalent to a commutative lattice gauge theory, because the finite dimensionality of the representation of the noncommutative algebra of functions on the lattice necessitates a rational-valued dimensionless deformation parameter $\theta$. This establishes that the phenomenon of Morita equivalence of noncommutative gauge theories holds in and beyond regulated perturbation theory.

• It is possible to generalize the construction of this subsection to induce a noncommutative $U(r)$ lattice gauge theory with rank $r>1$. For this, we take the unitary matrices of the twisted Eguchi–Kawai model to live in a direct product group $U(r)\otimes U(N)$. The trace over the $U(N)$ indices is treated as before and transformed into a sum over lattice points. The remaining indices are left unaltered and become the nonabelian colour indices of the resulting field theory. This is tantamount to choosing a more general background flux $Q$ for which $N_0>1$, as we did in the derivation of Morita equivalence in Section 6.

• The noncommutativity parameter (7.30) can be written as $\theta = \ell\epsilon/\pi$, so that finite noncommutativity requires keeping the quantity $\ell\epsilon$ fixed in the continuum limit. In fact, finite noncommutativity in the lattice formulation necessarily implies a finite size $\ell = \pi\theta/\epsilon$ of the spacetime [96]. As the lattice spacing $\epsilon$ is an ultraviolet cutoff for the dynamics of the field theory and $\ell$ serves as an infrared cutoff, this is just a nonperturbative manifestation of the UV/IR mixing phenomenon in noncommutative quantum field theory that we unravelled in Section 3.3. It is very explicitly evident in the discrete formalism that the two limits $\ell\to\infty$ (giving the noncommutative planar limit) and $\epsilon\to0$ (giving the commutative limit) do not commute. This gives a very direct interpretation of this novel property of noncommutative field theories, which in the present case occurs at a kinematical level. The reason for this is the very drastic regularization provided by the matrix model.

• It is possible to modify the above construction and arrive at a continuum gauge theory on a noncommutative torus. Within the present framework this is not possible, because from (7.24) and (7.30) it follows that it is not possible to take a large $N$ continuum limit which keeps both the size and the noncommutativity of the spacetime finite. One can, however, repeat the above construction by introducing two more integers $n$ and $m$ with $L = nm$, and modifying the map (7.23) to [95]
$$\Delta_n(x) = \frac1{N^2}\sum_{k\in\mathbb{Z}_m^D}(J_k)^n\,e^{-2\pi ik_ix^i/\epsilon m} . \qquad (7.39)$$
The $N\times N$ matrix-valued lattice fields (7.39) provide a one-to-one Weyl–Wigner correspondence between lattice fields and elements of $gl(N,\mathbb{C})$ which commute with the matrices
$$\Omega_i = \prod_{j=1}^D(\Gamma_j)^{m\,\xi_{ij}} . \qquad (7.40)$$
The resulting noncommutative lattice gauge theories thereby follow from the constrained twisted Eguchi–Kawai model (7.6) obtained by restricting its unitary matrices (7.12) to those which obey the constraints $U_i\Omega_j = \Omega_jU_i$ for each $i,j = 1,\dots,D$. Since $(\Gamma_i)^L = \mathbb{1}_N$, for $n = 1$ the matrices (7.40) are trivial and we recover the previous construction, with $\Delta(x) = \Delta_1(x)$. For $n>1$ the dimensionless noncommutativity parameter is $\theta = n/m$ and it is possible to keep the noncommutativity of spacetime finite as $N\to\infty$ even for finite extent $\epsilon m$. Thus the resulting continuum noncommutative field theory lives on a torus. The geometrical meaning of the constraints is that they enforce the compactification of the matrices of the unitary matrix model on a $d$-dimensional torus [95]. Indeed, they are just equivalent to unitary versions of the quotient conditions [28] for toroidal compactifications of the IIB matrix model (1.5). Unlike for this Hermitian matrix model, the quotient conditions for the unitary matrix model admit finite-dimensional solutions. The resulting solutions also have a natural interpretation in terms of Morita equivalences of noncommutative tori [95]. It is also possible to view such correspondences between matrix models and lattice field theories by using only Morita equivalence [96], without the quotient conditions, but with more complicated twist matrices $Q$. Such generalizations also allow the construction of noncommutative field theories with the most general deformation parameters $\theta^{ij}$. We should point out, however, that the noncommutative lattice gauge theories which originate from twisted reduced models are not the only ones that can be constructed [96].

8. Geometry and topology of star-gauge transformations

In this final section we will take a look at the structure of the group of star-gauge transformations in noncommutative Yang–Mills theory.
Gauge symmetries in the noncommutative case are very different from their commutative counterparts, because they involve an intriguing mixing between spacetime and internal, $U(N)$ symmetries. This mixing was responsible for the duality that we described in the previous section, in that the spacetime degrees of freedom were able to absorb some of the colour degrees of freedom of the gauge fields. It is in fact evident immediately from the operatorial form of the noncommutative Yang–Mills action (4.2), which shows how the spacetime and $U(N)$ traces are interlocked and cannot be separated from one another. We recall that for this reason it was difficult to construct local, star-gauge invariant observables. Another aspect of noncommutative gauge theory which is intimately tied to the mixing between spacetime and gauge degrees of freedom was its connection with matrix models, which followed from the observation that all derivatives, and hence all spacetime dependence, can be completely absorbed into the noncommutative gauge fields. This is particularly transparent in the discrete representations of the previous section, whereby the $N\times N$ matrix degrees of freedom are in a one-to-one correspondence with $N^2$ spacetime lattice points. In the remainder of this paper we will pose a very elementary question: What is the gauge symmetry group of noncommutative Yang–Mills theory on flat infinite space? In our attempt to formulate an answer to this question, we will be guided by two main themes, with the aim of clarifying the structure of the local and global star-gauge symmetry group:

• Geometry: We have seen in Section 4.2 that spacetime translations can be regarded to a certain extent (to be discussed below) as star-gauge transformations. The only other theory with such a property is general relativity. This feature has been used to suggest [141] that general coordinate transformations may be realized as genuine gauge symmetries of noncommutative Yang–Mills theory. In fact, via certain dimensional reduction techniques [142], the translational symmetry can be gauged to induce a field theory which contains as special limits some gauge models of gravitation.
The most compelling evidence, however, has been via the strong-coupling dual supergravity description of maximally supersymmetric noncommutative Yang–Mills theory in four dimensions, in which it is possible to identify the Newtonian gravitational force law [143]. Other indications include the observation that the unitary group of a closed string vertex operator algebra contains generic reparametrizations of the spacetime coordinates [15,16] (see [144] for an explicit description of this group), the identification of the one-loop long-ranged potential particular to noncommutative Yang–Mills theory with the gravitational interaction in Type IIB superstring theory [43,145], and the couplings of noncommutative gauge fields to massless closed string modes in flat space [103]. However, we will see in this section that this assertion is incorrect, in that noncommutative gauge transformations can only realize a certain subgroup of the diffeomorphism group of spacetime [98].

• Topology: We are also interested in global properties of noncommutative gauge theories, and the global gauge group is an important object for the topological classification of solitons, among other things. The star-gauge group is infinite-dimensional and has been identified previously both as the infinite unitary group $U(\infty)$ and as the group $U(\mathcal{H})$ of unitary operators on a separable Hilbert space $\mathcal{H}$. These two groups are very different, and in fact both of these proposals for the star-gauge group are incorrect. The former group can never be identified with a space of functions (only completions of it can), while the latter group is, as we will discuss, contractible, and so it does not possess the interesting topological characteristics of its commutative counterpart which lead to effects like anomalies and topological solitons.
In this section we will clarify some of these misconceptions, and also illustrate some very precise mathematical aspects of the star-gauge symmetry group of noncommutative Yang–Mills theory. This will bring out some more of the deeper operator-algebraic formalism of noncommutative geometry, already unleashed at the end of Section 6. Throughout this section we will denote the algebra of Schwartz functions $\mathbb{R}^D\to\mathbb{C}$ equipped with the star-product (2.19) by $\mathcal{A}_\theta$. Its commutative limit of ordinary functions will be denoted $\mathcal{A}_0 = C(\mathbb{R}^D)$. The representation of the algebra $\mathcal{A}_\theta$ by operators on a Hilbert space $\mathcal{H}$ will be denoted $\mathcal{A}_\theta(\mathcal{H})\subset{\rm End}(\mathcal{H})$. The treatment of this section will mostly employ an operator formalism with Heisenberg commutation relations, largely ignoring the star-product structure. However, within the framework of deformation quantization, the structure of star-gauge symmetries is rather well understood through the formalism of deformed vector bundles [146], along the lines of what we described in Section 6.3. An important structure which arises along these lines is a cocycle condition on the gauge transformations, which is treated implicitly in the operator framework. While the following material is somewhat more mathematical than that of previous sections, it introduces some more fundamental techniques and ideas of noncommutative geometry. Most of the material of this section follows [98] closely, where more technical details may be found.

8.1. Star-gauge symmetries revisited

We will begin by defining more precisely the notion of gauge symmetry in the noncommutative setting, and will proceed throughout this subsection in a somewhat abstract fashion. In the commutative case, gauge fields arise through covariant derivatives which specify parallel transport along the fibres of a given vector bundle over $\mathbb{R}^D$. In turn, as we saw in Section 6.3, a vector bundle is completely characterized by its (Hilbert) space of sections. The commutative algebra $\mathcal{A}_0$ acts naturally on this space, so that the sections form an $\mathcal{A}_0$-module. Thus the noncommutative analog of a vector bundle over noncommutative $\mathbb{R}^D$ is an $\mathcal{A}_\theta$-module $\mathcal{H}$, whose vectors $\psi$ will be interpreted as fundamental matter fields, i.e. fields which transform under the fundamental representation of the gauge symmetry group.
These representations will be particularly important for the explicit identification of the star-gauge symmetry group. We then seek a covariant derivative $\nabla_i$ such that $\nabla_i\psi$ is a matter field in the same representation as $\psi$. This is the algebraic version of the parallel transport condition. A canonical choice of projective module is the Hilbert space of square-integrable fundamental matter fields,
$$\mathcal{H}_m = L^2(\mathbb{R}^D)\otimes\mathbb{C}^N . \qquad (8.1)$$
From a geometric standpoint, and also to analyse the gauge symmetries properly, we need to investigate the reducibility of this representation of the algebra. To this end, let us begin by analysing the commutative case. We may define an action of $\mathcal{A}_0$ on $\mathcal{H}_m$ by
$$\psi\cdot f = f\,\psi \qquad (8.2)$$
with $f\in\mathcal{A}_0$ and $\psi\in\mathcal{H}_m$. The defining condition of a module, $(\psi\cdot g)\cdot f = \psi\cdot(fg)$, is a trivial consequence of the commutativity of pointwise multiplication of functions. We can then decompose the space (8.1) into irreducible components with respect to this action,
$$\mathcal{H}_m = \bigoplus_{x\in\mathbb{R}^D}\delta_x\otimes\mathbb{C}^N , \qquad (8.3)$$
where $\delta_x:\mathcal{A}_0\to\mathbb{C}$ is the evaluation functional at $x\in\mathbb{R}^D$ defined by
$$\delta_x(f) = \int d^Dy\;\delta^D(x-y)\,f(y) . \qquad (8.4)$$
By approximating the delta-function by functions of $\mathcal{A}_0$, we may view the functional $\delta_x(f) = f(x)$ as a character of the algebra $\mathcal{A}_0$, and also as a one-dimensional unitary irreducible representation of $\mathcal{A}_0$ on $\mathcal{H}_m$ via pointwise multiplication of functions,
$$\delta_x(f)\cdot\psi = f(x)\,\psi . \qquad (8.5)$$
We see that the points $x\in\mathbb{R}^D$ are formally “reconstructed” from the unitary irreducible representations (or equivalently the characters) of the commutative algebra $\mathcal{A}_0$. As discussed at the beginning of Section 6, this is the geometric basis of the Gel’fand–Naimark theorem and the association of topological spaces to commutative $C^*$-algebras. Let us now consider the noncommutative case. We may define a right action of $\mathcal{A}_\theta$ on $\mathcal{H}_m$ by
$$\psi\cdot f = \psi\star f . \qquad (8.6)$$
The requisite condition $(\psi\cdot g)\cdot f = \psi\cdot(g\star f)$ follows from associativity of the star-product. Such fields $\psi$ are, as we will soon see, naturally interpreted as fundamental matter fields. For a left action $f\cdot\psi = f\star\psi$, the $\psi$’s would instead be thought of as anti-fundamental matter fields. Again, this action defines a reducible representation. To see this, let us rotate coordinates to the Darboux basis (3.6), in which the coordinate operators $\hat x^i$ split into $d$ mutually commuting blocks, in each of which the Heisenberg commutation relations
$$\left[\hat x^{2\mu-1},\hat x^{2\mu}\right] = i\theta_\mu , \qquad \mu = 1,\dots,d \qquad (8.7)$$
hold. By the Stone–von Neumann theorem [147], the Lie algebra (8.7) has a unique irreducible representation, the Hilbert space of quantum mechanics, i.e. the Schrödinger representation $\mathcal{H}_q = L^2(\mathbb{R}^d)$ seen as functions of the coordinates $x^{2\mu}$, $\mu = 1,\dots,d$. From this fact it is evident that the Hilbert space (8.1) is reducible. The Schrödinger $\mathcal{A}_\theta$-module $\mathcal{H}_q$ is a separable Hilbert space, i.e. it is countably infinite-dimensional, because it can be expressed in terms of the usual Fock space of creation and annihilation operators. Mathematically, this is the completion to the space of square-summable sequences
$$\mathcal{H}_q\cong\ell^2(\mathbb{Z}_+^d) = \bigoplus_{\vec n\in\mathbb{Z}_+^d}\mathbb{C}\,|\vec n\rangle , \qquad (8.8)$$
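where $|\vec n\rangle = |n_1,\dots,n_d\rangle$ is a multi-particle state, and the Fock space creation and annihilation operators are defined by
$$\hat c_\mu = \frac{1}{\sqrt{2|\theta_\mu|}}\left(\hat x^{2\mu-1}+i\,{\rm sgn}(\theta_\mu)\,\hat x^{2\mu}\right) , \qquad \hat c_\mu^\dagger = \frac{1}{\sqrt{2|\theta_\mu|}}\left(\hat x^{2\mu-1}-i\,{\rm sgn}(\theta_\mu)\,\hat x^{2\mu}\right) \qquad (8.9)$$
with the nonvanishing commutation relations
$$\left[\hat c_\mu,\hat c_\nu^\dagger\right] = \delta_{\mu\nu} . \qquad (8.10)$$
The vectors $|\vec n\rangle$ are then the simultaneous orthonormal eigenstates of the $d$ number operators $\hat n_\mu = \hat c_\mu^\dagger\hat c_\mu$ with eigenvalues $n_\mu\in\mathbb{Z}_+$, $\hat n_\mu|\vec n\rangle = n_\mu|\vec n\rangle$, and the actions of the operators (8.9) on this basis are defined by
$$\hat c_\mu|\vec n\rangle = \sqrt{n_\mu}\,|\vec n-\vec 1_\mu\rangle , \qquad \hat c_\mu^\dagger|\vec n\rangle = \sqrt{n_\mu+1}\,|\vec n+\vec 1_\mu\rangle , \qquad (8.11)$$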
with $\vec 1_\mu$ the integer vector whose components are $(\vec 1_\mu)_\nu = \delta_{\mu\nu}$. The Hilbert space (8.8) is projective as a right $\mathcal{A}_\theta$-module. To see this, we use (8.9) to expand the Weyl operators (2.4) over the Fock space (8.8), and for each fixed integer vector $\vec n_0\in\mathbb{Z}_+^d$ consider the operator $\hat p_{\vec n_0} = |\vec n_0\rangle\langle\vec n_0|$. It is the orthogonal projection onto the one-dimensional subspace of $\mathcal{H}_q$ spanned by the vector $|\vec n_0\rangle$. In the Weyl representation of the trivial rank-$N$ $\mathcal{A}_\theta$-module $\mathcal{H}_m$, we may write the orthogonal decomposition $\mathcal{H}_m = \hat p_{\vec n_0}\mathcal{H}_m\oplus(\mathbb{1}_{\mathcal{H}_q}-\hat p_{\vec n_0})\mathcal{H}_m$. Under the correspondence $\langle\vec n|\leftrightarrow|\vec n_0\rangle\langle\vec n|$, we have the natural isomorphism $\hat p_{\vec n_0}\mathcal{H}_m\cong\mathcal{H}_q$ as right $\mathcal{A}_\theta$-modules, and hence $\mathcal{H}_q$ is projective. By the stronger Mackey form of the Stone–von Neumann theorem [147] it follows that any $\mathcal{A}_\theta$-module $\mathcal{H}$ is a direct sum of Fock modules. In particular, by iterating the preceding arguments it follows that the Hilbert space (8.1) contains infinitely many copies of the Schrödinger representation, so that as $\mathcal{A}_\theta$-modules there is a natural isomorphism
$$\mathcal{H}_m = \bigoplus_{n=0}^\infty\mathcal{H}_q\otimes\mathbb{C}^N . \qquad (8.12)$$
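The Fock-space operators (8.9)–(8.11) can be illustrated with finite matrices for a single Darboux block ($d = 1$), at the price of truncating the Fock space (8.8) to the states $|0\rangle,\dots,|n_{\max}-1\rangle$. In this sketch the truncation size and the (negative, to exercise ${\rm sgn}(\theta)$) value of $\theta$ are arbitrary choices. Inverting (8.9) gives $\hat x^1$ and $\hat x^2$, and the commutators $[\hat c,\hat c^\dagger] = 1$ and $[\hat x^1,\hat x^2] = i\theta$ then hold exactly except in the last diagonal entry. That defect is a truncation artifact, and it reflects the fact, noted earlier in this review, that the Heisenberg relations admit no finite-dimensional representation.

```python
import numpy as np

def annihilation(n_max):
    """Truncated annihilation operator: c|n> = sqrt(n)|n-1> on span{|0>, ..., |n_max - 1>}."""
    return np.diag(np.sqrt(np.arange(1, n_max, dtype=float)), k=1)

theta = -0.7          # illustrative value; its sign enters through sgn(theta) in (8.9)
n_max = 12            # size of the truncated Fock space

c = annihilation(n_max)
c_dag = c.conj().T
s, a = np.sign(theta), np.sqrt(abs(theta) / 2)

# Inverting (8.9): x1 = a (c + c^dag),  x2 = -i sgn(theta) a (c - c^dag); both are Hermitian.
x1 = a * (c + c_dag)
x2 = -1j * s * a * (c - c_dag)

comm_c = c @ c_dag - c_dag @ c          # should be the identity, up to truncation
comm_x = x1 @ x2 - x2 @ x1              # equals i theta [c, c^dag] exactly
number = c_dag @ c                      # number operator, eigenvalues 0, 1, ..., n_max - 1

print(np.round(np.diag(comm_c).real, 6))   # eleven entries equal to 1, then -(n_max - 1) = -11
```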
In this sense, Fock space $\mathcal{H}_q$ is the analog of a single point in the noncommutative space. It also follows that any fundamental matter field may be expanded with respect to this decomposition as
$$\hat W[\psi] = \sum_{\vec n\in\mathbb{Z}_+^d}\,\sum_{\vec m\in\mathbb{Z}_+^d}\psi_{\vec n,\vec m}\,|\vec m\rangle\langle\vec n| , \qquad (8.13)$$
where $\psi_{\vec n,\vec m}\in\mathbb{C}^N$ and the states $\langle\vec n|$ label the points (Fock representations) on the noncommutative space. Note that the superposition $\sum_{\vec m}\psi_{\vec n,\vec m}|\vec m\rangle$ is an element of $\mathcal{H}_q$ and so is the analog of a field with support at only one point in spacetime. The field (8.13) is in fact an element of the Weyl representation of the algebra $\mathcal{A}_\theta$ on the Hilbert space (8.8), defined by $\psi\cdot f = \hat W[\psi]\,\hat W[f]$. This simply means that we are working in the defining (or fundamental) representation of $\mathcal{A}_\theta$, and it is completely analogous to the commutative case (8.3) and (8.4). This superposition carries information about the infinite-dimensional unitary gauge symmetry represented on the module $\mathcal{H}_q$. Notice also that the irreducibility of the Fock module automatically implies that all of the algebras $\mathcal{A}_\theta$ for different deformation parameters $\theta$ are Morita equivalent. This is in marked contrast to the noncommutative torus, which possesses a nontrivial topological structure. In analogy to the commutative case, we may then consider the gauge transformations
$$\psi(x)\to g(x)\star\psi(x) . \qquad (8.14)$$
It commutes with the right action of $\mathcal{A}_\theta$ on $\mathcal{H}_m$ and thereby preserves the representation of the algebra. It also preserves the $L^2$-norm of the matter field provided that the gauge function $g(x)$ satisfies the star-unitarity condition (4.8). This implies that $g$ is an element of the group $U(N,\mathcal{A}_\theta) = U(M(N,\mathcal{A}_\theta))$ of unitary elements of the algebra $M(N,\mathcal{A}_\theta) = \mathcal{A}_\theta\otimes M(N,\mathbb{C})$ of $N\times N$ matrices with entries in the algebra $\mathcal{A}_\theta$. Strictly speaking, however, the algebra $\mathcal{A}_\theta$ of Schwartz functions has no unit, and so it is necessary to define unitary elements of the algebra $\mathcal{A}_\theta\oplus\mathbb{C}$ obtained from $\mathcal{A}_\theta$ by adjoining an identity element. Geometrically, this extension corresponds to studying functions on the topological (but not metric) one-point compactification of $\mathbb{R}^D$. While geometrically such a compactification can have dramatic effects on the topological properties of the field theory, it is perfectly harmless at the algebraic level. We shall always implicitly assume such a unital extension, and discuss some of its properties further in Section 8.4. The group $U(N,\mathcal{A}_\theta)$ is the gauge symmetry group that we shall study in this section.
We now introduce a covariant derivative as the anti-Hermitian operator $\nabla_i:\mathcal{H}_m\to\mathcal{H}_m$ defined by
$$\nabla_i(\psi) = \partial_i\psi-i\,A_i\star\psi , \qquad (8.15)$$
where $A_i$ is a Hermitian element of the algebra $M(N,\mathcal{A}_\theta)$, i.e. a gauge field. This operator has the properties we need. Since the derivative $\partial_i$ satisfies the Leibniz rule with respect to the star-product,
$$\partial_i(f\star g) = (\partial_if)\star g+f\star(\partial_ig) , \qquad (8.16)$$
it follows that $\nabla_i$ satisfies a right Leibniz rule,
$$\nabla_i(\psi\star f) = \psi\star(\partial_if)+\nabla_i(\psi)\star f . \qquad (8.17)$$
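The Leibniz rule (8.16), which underlies (8.17), can be checked symbolically. This sketch assumes the standard two-dimensional Groenewold–Moyal product $f\star g = f\exp\big(\tfrac{i}{2}\theta(\overleftarrow{\partial}_1\overrightarrow{\partial}_2-\overleftarrow{\partial}_2\overrightarrow{\partial}_1)\big)g$ with $\theta^{12} = \theta$ (conventions may differ from (2.19) by normalization) and restricts to polynomials, for which the exponential series terminates. It uses SymPy; the helper names `partial` and `moyal` are hypothetical. Besides (8.16), it checks the Heisenberg relation of the form (8.7) and associativity, the property behind the module condition below (8.6).

```python
import sympy as sp

x1, x2, theta = sp.symbols("x1 x2 theta", real=True)

def partial(expr, a, b):
    """Apply d/dx1 a times and d/dx2 b times."""
    for _ in range(a):
        expr = sp.diff(expr, x1)
    for _ in range(b):
        expr = sp.diff(expr, x2)
    return expr

def moyal(f, g):
    """Moyal product of polynomials in (x1, x2), assuming theta^{12} = -theta^{21} = theta."""
    f, g = sp.expand(f), sp.expand(g)
    if f == 0 or g == 0:
        return sp.Integer(0)
    # For polynomials the series terminates once n exceeds the total degree.
    n_max = max(sp.Poly(f, x1, x2).total_degree(), sp.Poly(g, x1, x2).total_degree())
    result = 0
    for n in range(n_max + 1):
        term = sum(sp.binomial(n, k) * (-1) ** k * partial(f, n - k, k) * partial(g, k, n - k)
                   for k in range(n + 1))
        result += (sp.I * theta / 2) ** n / sp.factorial(n) * term
    return sp.expand(result)

# Heisenberg relation for a single block: [x1, x2]_* = i theta.
assert sp.expand(moyal(x1, x2) - moyal(x2, x1)) == sp.I * theta

# Leibniz rule (8.16) for d/dx1, and associativity, on sample polynomials.
f = x1**2 * x2 + 3 * x2**3
g = x1 * x2**2 - x1**3 + 2
h = x1 + x2**2
leibniz_gap = sp.diff(moyal(f, g), x1) - moyal(sp.diff(f, x1), g) - moyal(f, sp.diff(g, x1))
assert sp.expand(leibniz_gap) == 0
assert sp.expand(moyal(moyal(f, g), h) - moyal(f, moyal(g, h))) == 0
print(moyal(x1, x2))   # x1*x2 + I*theta/2
```

The Leibniz rule holds because the bidifferential operator in the exponent has constant coefficients, so $\partial_i$ commutes with it.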
This ensures that $\nabla_i(\psi)$ lies in the same representation of the algebra $\mathcal{A}_\theta$ as the matter field $\psi$, as desired. In particular, $\nabla_i(\psi)$ should transform in the same way as $\psi$ under the gauge transformations (8.14), which fixes the gauge transformation rule $\nabla_i\to\nabla_i^g$, where
$$\nabla_i^g(\psi) = g\star\nabla_i\left(g^\dagger\star\psi\right) . \qquad (8.18)$$
By using (8.15) we then find that the covariant transformation law (8.18) is equivalent to the usual star-gauge transformation law (4.10), and also that the noncommutative field strength tensor (4.4) is given by the star-commutator
$$F_{ij} = i\,\nabla_i\star\nabla_j-i\,\nabla_j\star\nabla_i . \qquad (8.19)$$
It follows that the curvature tensor $F_{ij}\in M(N,\mathcal{A}_\theta)$ commutes with the action of $\mathcal{A}_\theta$ on the Hilbert space $\mathcal{H}_m$, and so it lies in the corresponding commutant of the algebra representation, i.e. $F_{ij}\in{\rm End}_{\mathcal{A}_\theta}(\mathcal{H}_m)$. The derivation presented in this subsection thereby brings us back to the models of noncommutative Yang–Mills theory that were described in Section 4. In particular, the Weyl representation (4.2) gives a rewriting of noncommutative gauge theory as ordinary Yang–Mills theory (on a noncommutative space) with local fields and with the extended, infinite-dimensional gauge symmetry group $U(N,\mathcal{A}_\theta)$. This point of view has proven fruitful for analysing the renormalization properties of noncommutative Yang–Mills theory [108], and it shows rather explicitly the transmutation of $U(N)$ colour degrees of freedom into spacetime degrees of freedom along the noncommutative directions. We remark also that the fundamental matter fields $\psi(x)$ induce local star-gauge invariant observables of noncommutative gauge theory through the density operators $L(x) = \psi(x)^\dagger\star\psi(x)$.

8.2. Inner automorphisms

The discussion of the previous subsection emphasizes, among other things, the point that gauge transformations correspond to the inner automorphisms $f\mapsto g\star f\star g^\dagger$ of the algebra $M(N,\mathcal{A}_\theta)$. These transformations form the group
$${\rm Inn}(N,\mathcal{A}_\theta) = \left\{\alpha_g\;\middle|\;\alpha_g(f) = g\star f\star g^\dagger ,\; f\in M(N,\mathcal{A}_\theta) ,\; g\in U(N,\mathcal{A}_\theta)\right\} . \qquad (8.20)$$
They rotate the algebra elements and correspond to internal 6uctuations of the spacetime geometry in a sense that we will now describe. In general, group (8.20) is a proper, normal subgroup of the automorphism group Aut(N; A ), the group of transformations which preserve the algebra M(N; A ).
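The two structural claims just made, that each inner map is an automorphism and that the inner automorphisms form a normal subgroup, follow from a two-line computation. Here $\iota_g(f) = g\star f\star g^\dagger$ for a star-unitary $g$, and $\alpha$ denotes any automorphism of $M(N,\mathcal A_\theta)$, assumed to commute with the involution $\dagger$ so that $\alpha(g)$ is again star-unitary.

```latex
\begin{aligned}
\iota_g(f\star h) &= g\star f\star(g^\dagger\star g)\star h\star g^\dagger
  = \iota_g(f)\star\iota_g(h) ,
  \qquad g\star g^\dagger = g^\dagger\star g = 1 , \\
(\alpha\circ\iota_g\circ\alpha^{-1})(f)
  &= \alpha\big(g\star\alpha^{-1}(f)\star g^\dagger\big)
   = \alpha(g)\star f\star\alpha(g)^\dagger
   = \iota_{\alpha(g)}(f) .
\end{aligned}
```

The second line says that conjugating an inner automorphism by an arbitrary automorphism produces another inner automorphism, which is precisely the normality of the subgroup.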
R.J. Szabo / Physics Reports 378 (2003) 207 – 299
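The automorphisms modulo the inner ones are called outer automorphisms; they form the quotient group $\mathrm{Out}(N,\mathcal A_\theta) = \mathrm{Aut}(N,\mathcal A_\theta)/\mathrm{Inn}(N,\mathcal A_\theta)$, and together these groups fit into the exact sequence
$$1\to\mathrm{Inn}(N,\mathcal A_\theta)\to\mathrm{Aut}(N,\mathcal A_\theta)\to\mathrm{Out}(N,\mathcal A_\theta)\to 1 . \qquad (8.21)$$
Equivalently, the group $\mathrm{Aut}(N,\mathcal A_\theta)$ is the semi-direct product of $\mathrm{Inn}(N,\mathcal A_\theta)$ by the natural action of $\mathrm{Out}(N,\mathcal A_\theta)$ on the elements $\iota_g\in\mathrm{Inn}(N,\mathcal A_\theta)$. To get some feel for this somewhat abstract characterization, let us again turn to the commutative limit $\mathcal A_0$. Then $\mathrm{Inn}(N,\mathcal A_0)$ is the group of $U(N)$ gauge transformations $\mathbb R^D\to U(N)$, while $\mathrm{Out}(N,\mathcal A_0)$ is naturally isomorphic to the group $\mathrm{Diff}(\mathbb R^D)$ of diffeomorphisms of $\mathbb R^D$ [148]. Given a diffeomorphism $\phi:\mathbb R^D\to\mathbb R^D$, there is a natural automorphism $\phi_*:\mathcal A_0\to\mathcal A_0$ defined by
$$\phi_*(f) = f\circ\phi^{-1} , \qquad f\in\mathcal A_0 . \qquad (8.22)$$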
If we now represent, as in the previous subsection, the algebra $\mathcal A_0$ on the Hilbert space $\mathcal H_m$ of fundamental matter fields, then evidently all inner automorphisms are given via conjugation by unitary operators on $\mathcal H_m$. The same property is in fact true of the outer automorphisms. Given $\phi\in\mathrm{Diff}(\mathbb R^D)$, we may define a unitary operator $\hat g_\phi$ on $\mathcal H_m$ by
$$\hat g_\phi\,\psi(x) = \left|\det\frac{\partial\phi^{-1}(x)}{\partial x}\right|^{1/2}\psi\big(\phi^{-1}(x)\big) , \qquad (8.23)$$
where the Jacobian factor ensures that $\hat g_\phi$ preserves the $L^2$ norm. Thus, in the commutative case the group $\mathrm{Aut}(N,\mathcal A_0)$ may be modelled on the group $U(\mathcal H_m)$ of unitary endomorphisms of the Hilbert space (8.1), and the unitary group of $\mathcal A_0$ coincides with the ordinary $N\times N$ unitary group $U(N)$ of Yang–Mills theory. In the noncommutative case $\theta\neq0$, we will soon see that the automorphism group $\mathrm{Aut}(N,\mathcal A_\theta)$ is likewise related to the group $U(\mathcal H)$ of unitary operators on some Hilbert space $\mathcal H$. However, $U(\mathcal H)$ is not the right candidate for the gauge symmetry group of noncommutative Yang–Mills theory. The problem is that for any infinite-dimensional separable Hilbert space $\mathcal H$ (one with a countably infinite basis), it is a fundamental fact, known as Kuiper's theorem [149], that $U(\mathcal H)$ is contractible: as a topological space it can be continuously deformed to a single point, so that in particular every closed loop in $U(\mathcal H)$ can be contracted to a point. Consequently, all of its homotopy groups are trivial,
$$\pi_n\big(U(\mathcal H)\big) = 0 , \qquad (8.24)$$
and we would thereby lose all of the nice topology residing in noncommutative gauge theory and the ensuing topological configurations such as solitons, instantons and D-branes, to name but a few. Furthermore, many such topological quantities should be stable under algebra deformations, i.e. they should be preserved in the commutative limit. The lesson to be learned here is that not all automorphisms of the algebra (or unitary endomorphisms of a Hilbert space) generate gauge transformations; only the inner automorphisms do. We shall soon see what the appropriate gauge group is. For the remainder of this subsection we will describe some more basic aspects of the group of (inner) automorphisms of the noncommutative algebra $\mathcal A_\theta$.

8.2.1. The Tomita involution

Much of what we have described in this section thus far has been based on the representation of $\mathcal A_\theta$ on some Hilbert space $\mathcal H$, and indeed this will be important for the remainder of our discussion.
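The automorphism group $\mathrm{Aut}(N,\mathcal A_\theta)$ may then be computed via its lift to this Hilbert space as [99,148]
$$\mathrm{Aut}_{\mathcal H}(N,\mathcal A_\theta) = \big\{\hat g\in U(\mathcal H)\;\big|\;\hat gJ = J\hat g ,\ \iota_{\hat g}\in\mathrm{Aut}\big(N,\mathcal A_\theta(\mathcal H)\big)\big\} . \qquad (8.25)$$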
The operator $J$ is called the Tomita involution and it induces a bi-module structure for the given representation of the algebra $\mathcal A_\theta$. It is inserted in definition (8.25) because the structure of the automorphism group of $\mathcal A_\theta$ should not depend on whether the algebra acts on $\mathcal H$ from the right or from the left. Formally, $J$ is the anti-linear, self-adjoint unitary isometry of $\mathcal H$ such that $J\mathcal A_\theta(\mathcal H)J^{-1} = \mathcal A_\theta(\mathcal H)'$ is the commutant of the algebra $\mathcal A_\theta$ in the module $\mathcal H$. If $\mathcal A_\theta$ acts on $\mathcal H$ from the right (resp. left), then $J\mathcal H$ is a left (resp. right) $\mathcal A_\theta$-module. From the Hilbert space lift (8.25) we recover the automorphisms of $\mathcal A_\theta$ from the Wigner projection $\varpi:\mathrm{Aut}_{\mathcal H}(N,\mathcal A_\theta)\to\mathrm{Aut}(N,\mathcal A_\theta)$ defined by
$$\varpi(\hat g)\big[f(x)\big] = \mathrm{Tr}_{\mathcal H}\big(\hat g\,\hat W[f]\,\hat g^{-1}\,\hat\Delta(x)\big) , \qquad (8.26)$$
where $\hat\Delta(x)$ is the map (2.6) in the representation of $\mathcal A_\theta$ on $\mathcal H$, and $\mathrm{Tr}_{\mathcal H}$ denotes the trace over states of the Hilbert space $\mathcal H$. Clearly these constructions also hold true when the automorphism group of the algebra is restricted to gauge transformations. A physical interpretation of the Tomita involution $J$ may be given as follows. When $\mathcal H = \mathcal H_m$ is the Hilbert space (8.1), we define the action of $J$ on fundamental matter fields by
$$J(\psi) = \psi^\dagger . \qquad (8.27)$$
Thus in this case $J$ is a charge conjugation operator, and the commutant $\mathcal A_\theta(\mathcal H_m)' = \mathcal A_{-\theta}(\mathcal H_m)$ is naturally isomorphic to the algebra $\mathcal A_\theta(\mathcal H_m)$. In fact, it is simply the algebra obtained from $\mathcal A_\theta(\mathcal H_m)$ by multiplying its elements in the opposite order. In this case the symmetry operator $J$ has the effect of enlarging the irreducible Fock module $\mathcal H_q$ to $\mathcal H_m$ [98]. This means that, as anticipated, the gauge symmetries of the noncommutative space are only visible when the full set of "points" (Fock representations) of the space is incorporated. This is necessary because connections on the Fock module are trivial and, owing to irreducibility, induce only the gauge group $U(1)$ [38]. This induced bi-module structure also arises naturally within the context of open string quantization in background $B$-fields and D-branes, as described in Section 6.3.1. We quantize the point particle at an endpoint of an open string to produce a Hilbert space $\mathcal H$, upon which the algebra $\mathcal A_\theta$ acts. This is depicted in Fig. 5, with the same configurations of D-branes at the opposite ends of the string. In the usual Seiberg–Witten scaling limit, in which the string oscillations can be neglected, we thus impose identical boundary conditions at both endpoints of an open string. This yields the bi-module $\mathcal M = \mathcal H\otimes\mathcal H^\vee$, where $\mathcal H^\vee$ is the conjugate $\mathcal A_\theta$-module to $\mathcal H$ corresponding to the opposite orientations of a pair of Type II string endpoints. The Hilbert space $\mathcal M$ is naturally an algebra which coincides with the algebra of Hilbert–Schmidt operators on $\mathcal H$ that represent the joining of open string endpoints (see Section 8.4 below) [19,150]. As explained in Section 6.3.1, we may naturally identify the Hilbert spaces $\mathcal H = \mathcal H_q$ and $\mathcal H^\vee = J\mathcal H$. Then the condition involving the real structure $J$ in (8.25) simply reflects the fact that a lifted gauge transformation from the worldvolume field theory (or more generally a lifted algebra automorphism) should preserve the actions of $\mathcal A_\theta$ at opposite ends of the open string.
In fact, $J$ may be thought of as a worldsheet parity operator, mapping Type IIB D-branes onto Type I D-branes and the associated orientifold planes. This real structure can thereby be used to construct nonunitary noncommutative gauge groups [92], as indicated at the end of Section 4.1.
8.2.2. Geometrical aspects

From a very heuristic point of view, in the simplest instance of $U(1)$ gauge symmetry, the automorphism group $\mathrm{Aut}(\mathcal A_\theta)\equiv\mathrm{Aut}(1,\mathcal A_\theta)$ lies "somewhere" in between that of the commutative algebra of functions $\mathcal A_0$ and that of a finite-dimensional matrix algebra $M(N,\mathbb C)$. In the former case there are no inner automorphisms¹³, so that all automorphisms are outer automorphisms and generate spacetime coordinate transformations,
$$\mathrm{Inn}(\mathcal A_0) = \{1\} , \qquad \mathrm{Out}(\mathcal A_0) = \mathrm{Diff}(\mathbb R^D) . \qquad (8.28)$$
On the other hand, all automorphisms of the algebra $M(N,\mathbb C)$ can be represented via rotations by $N\times N$ unitary matrices, with phase multiples of the identity acting trivially, so that
$$\mathrm{Inn}\big(M(N,\mathbb C)\big) = U(N)/U(1) , \qquad \mathrm{Out}\big(M(N,\mathbb C)\big) = \{1\} . \qquad (8.29)$$
For the algebra $\mathcal A_\theta$ there is a nontrivial mixing between the two structures (8.28) and (8.29). Unlike the commutative case, it is no longer true that the group $U(N,\mathcal A_\theta)$ is the product of a function algebra and a finite-dimensional Lie group, and so much richer geometric and algebraic structures will emerge. In what follows we will attempt to make this mixing between spacetime and internal degrees of freedom more precise. We have already seen an example of this mixing in Section 4.2. Namely, the star-unitary plane waves $g_v(x) = e^{ik_i(v)x^i}$, with momentum given by (4.19), determine inner automorphisms of the algebra $\mathcal A_\theta$ which generate the translations (4.20) of functions by the constant vectors $v\in\mathbb R^D$. The corresponding gauge transformation (4.10) is given by
$$A_i(x)\to A_i(x+v) - k_i(v) . \qquad (8.30)$$
The overall constant shift of the gauge field in (8.30) drops out of the field strength (4.4) and has no physical effect in flat, infinite space, i.e. it corresponds to a global symmetry transformation of the field theory. In this simple instance we thereby find that the noncommutative gauge group contains spacetime translations. This seemingly remarkable conclusion must, however, be taken in appropriate context. The plane waves $g_v(x)$ are not Schwartz functions, because they do not decay at infinity in $\mathbb R^D$ but merely oscillate there. They can of course be approximated by Schwartz functions in a distributional sense, and for many applications this would suffice to deduce that they generate gauge symmetries of noncommutative Yang–Mills theory. For other applications, such as those involving noncommutative solitons, where the details of the asymptotic, topological configurations of the fields are crucial, this conclusion is not entirely valid. It is natural to ask whether this construction can be repeated for more general, nonconstant functions $v^i = v^i(x)$ on $\mathbb R^D$. These will produce more general spacetime transformations, which we may wish to compare with diffeomorphisms of $\mathbb R^D$. Let us examine, at an infinitesimal level, the expansion of the Moyal commutator bracket in powers of $\theta$. Using (2.22) we find
$$\delta_\phi f \equiv i f\star\phi - i\phi\star f = \{\phi,f\} + O\big(\theta^3\,\partial^3\phi\,\partial^3 f\big) , \qquad (8.31)$$
where
$$\{\phi,f\} = \theta^{ij}\,\partial_i\phi\,\partial_j f \qquad (8.32)$$
is the Poisson bracket based on the symplectic form $\theta$ of $\mathbb R^D$; the terms of even order in $\theta$ cancel in the commutator. From (8.31) it follows that, to leading order in the deformation parameter $\theta$ (equivalently, for slowly varying fields), the noncommutative gauge group coincides with the group of canonical transformations which preserve the symplectic structure $\theta$. These diffeomorphisms form the symplectomorphism group $\mathrm{Diff}_\theta(\mathbb R^D)$ of $\mathbb R^D$, and we see that $U(\mathcal A_\theta)\sim\mathrm{Diff}_\theta(\mathbb R^D)$ in the limit $\theta\to0$. This limit is analogous to the classical limit in quantum mechanics, and in this truncation the noncommutative fields can be treated as ordinary functions rather than operators. The group $\mathrm{Diff}_\theta(\mathbb R^D)$ is the natural symmetry group of membranes, dynamical systems, and hydrodynamic systems [151], and in this limit the noncommutative Yang–Mills action reduces to the corresponding bosonic membrane actions. Moreover, as we indicated in Section 2.2, this is the starting point for the deformation quantization description of the theory [59,62], which can be carried out over any Poisson manifold. Although the higher-derivative terms in (8.31) modify this interpretation of noncommutative gauge transformations, we will see that the symplectomorphism nature of the spacetime symmetries induced by the star-gauge symmetry always persists [98]. We shall now try to understand this unification of spacetime and gauge symmetries better. At the same time we will attempt to clarify in what precise sense the matrix degrees of freedom are deformed into spacetime ones, such that some outer automorphisms of the commutative algebra $\mathcal A_0$ become inner automorphisms of the noncommutative algebra $\mathcal A_\theta$ and thereby generate genuine gauge symmetries of noncommutative Yang–Mills theory.

¹³ More precisely, the inner automorphisms correspond to abelian, $U(1)$ gauge transformations.

8.2.3. Violations of Lorentz invariance

We will first digress momentarily to make some quick remarks on the Lorentz transformation properties of noncommutative gauge theories.
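For this, we first notice that the global translational symmetry above naturally generalizes to other symplectic diffeomorphisms. For instance, for $D=2$ we can define the star-unitary function
$$g_\lambda(x) = \sqrt{1+\theta^2\lambda^2}\;e^{i\lambda|x|^2} , \qquad (8.33)$$
where $\theta = \theta^{12}$ and $\lambda$ is a real parameter. It generates the inner automorphism
$$g_\lambda(x)\star f(x)\star g_\lambda(x)^\dagger = f(x') , \qquad (8.34)$$
where
$$\begin{pmatrix}x'^1\\ x'^2\end{pmatrix} = \begin{pmatrix}\cos\vartheta & \sin\vartheta\\ -\sin\vartheta & \cos\vartheta\end{pmatrix}\begin{pmatrix}x^1\\ x^2\end{pmatrix} \qquad (8.35)$$
is a rotation in the plane through the angle
$$\vartheta = 2\arctan(\theta\lambda) . \qquad (8.36)$$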
Thus, as mentioned already in Section 4.2, it is possible to realize spacetime rotations via noncommutative gauge transformations. A similar property holds for global, discrete symmetries. For instance, the star-unitary function
$$g_p(x) = \pi^{D/2}\,\mathrm{Pfaff}(\theta)\,\delta^D(x) \qquad (8.37)$$
generates the parity reflection
$$g_p(x)\star f(x)\star g_p(x)^\dagger = f(-x) . \qquad (8.38)$$
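The geometric statements above, and the expansion (8.31), all rest on the Moyal product. For polynomials its series terminates, so the small-$\theta$ expansion of the star-commutator can be checked exactly. This is a minimal sketch on $\mathbb R^2$, assuming $f\star h = f\,e^{\frac i2\theta^{ij}\overleftarrow{\partial_i}\overrightarrow{\partial_j}}h$ with $\theta^{12}=\theta$; polynomials are stored as dictionaries from exponent pairs to coefficients, and all function names are hypothetical. For $\phi=(x^1)^3$ and $f=(x^2)^3$ it reproduces the Poisson bracket $9\theta\,(x^1x^2)^2$ plus a constant correction $-\frac32\theta^3$, with no term of order $\theta^2$.

```python
from math import comb, factorial


def deriv(p, n1, n2):
    """Apply d_1^n1 d_2^n2 to a polynomial {(a, b): coeff} in x1, x2."""
    out = {}
    for (a, b), c in p.items():
        if a >= n1 and b >= n2:
            falling = (factorial(a) // factorial(a - n1)) * (factorial(b) // factorial(b - n2))
            key = (a - n1, b - n2)
            out[key] = out.get(key, 0) + c * falling
    return out


def mul(p, q):
    """Ordinary pointwise product of two polynomials."""
    out = {}
    for (a, b), c in p.items():
        for (e, f), d in q.items():
            key = (a + e, b + f)
            out[key] = out.get(key, 0) + c * d
    return out


def add(p, q, scale=1):
    """Return p + scale * q."""
    out = dict(p)
    for key, value in q.items():
        out[key] = out.get(key, 0) + scale * value
    return out


def clean(p, tol=1e-12):
    """Drop numerically vanishing coefficients."""
    return {k: v for k, v in p.items() if abs(v) > tol}


def degree(p):
    return max((a + b for a, b in p), default=0)


def moyal(p, q, theta):
    """Moyal product on R^2 with theta^{12} = theta; the series terminates."""
    out = {}
    for n in range(min(degree(p), degree(q)) + 1):
        pref = (0.5j * theta) ** n / factorial(n)
        for k in range(n + 1):
            term = mul(deriv(p, n - k, k), deriv(q, k, n - k))
            out = add(out, term, pref * comb(n, k) * (-1) ** k)
    return clean(out)


def star_bracket(phi, f, theta):
    """delta_phi f = i f*phi - i phi*f, the left-hand side of (8.31)."""
    diff = add(moyal(phi, f, theta), moyal(f, phi, theta), -1)
    return clean({k: -1j * v for k, v in diff.items()})


def poisson(phi, f, theta):
    """{phi, f} = theta (d_1 phi d_2 f - d_2 phi d_1 f), as in (8.32)."""
    first = mul(deriv(phi, 1, 0), deriv(f, 0, 1))
    second = mul(deriv(phi, 0, 1), deriv(f, 1, 0))
    return clean({k: theta * v for k, v in add(first, second, -1).items()})


theta = 0.5
x1, x2 = {(1, 0): 1}, {(0, 1): 1}
print(clean(add(moyal(x1, x2, theta), moyal(x2, x1, theta), -1)))  # [x1, x2]_* = i theta
phi, f = {(3, 0): 1}, {(0, 3): 1}
print(star_bracket(phi, f, theta))  # Poisson bracket plus an O(theta^3) constant
print(poisson(phi, f, theta))
```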
Using these geometrical properties we can now make some general remarks concerning the Lorentz invariance of the theory, which is superficially broken by the presence of the tensor $\theta^{ij}$ in the spacetime commutation relations (1.1). For the so-called "observer" Lorentz transformations of the theory, rotations or boosts of an observer's inertial reference frame leave the physics unchanged, because a unitary transformation of the matrix $\theta^{ij}$ can be gauged away by a star-gauge transformation, as we have just shown. However, the theory is not invariant under "particle" Lorentz transformations, which correspond to rotations or boosts of localized field configurations within a fixed observer frame. Such transformations leave the noncommutativity parameters unchanged, and lead to spontaneous Lorentz violation because $\theta^{ij}$ provides a directionality to spacetime in any fixed inertial frame. One can thereby compare noncommutative gauge theory in four dimensions with Lorentz-violating extensions of the standard model. Many terms in such extensions can be eliminated because noncommutative field theories are CPT invariant [152]. Comparisons with the QED sector yield bounds of the order [153]
$$\theta < (10\ \mathrm{TeV})^{-2} , \qquad (8.39)$$
which follow from an analysis of atomic clock-comparison experiments or by comparison with standard QED processes.

8.3. Universal gauge symmetry

A striking consequence of the connection between noncommutative gauge theories and twisted reduced models that was described in Section 7.1 is the universality of the noncommutative gauge group [38,154], in a sense that we shall now explain. Recall that the classical vacua of the action (7.5) determined the noncommuting momenta $[\hat C_i,\hat C_j] = -i(\theta^{-1})_{ij}$, with the opposite deformation parameter $-\theta$. This rewriting is therefore naturally associated with the bi-module structure based on the (equivalent) noncommutative function algebras $\mathcal A_\theta$ and $\mathcal A_{-\theta}$, and it is the basis for the relationship between the commutative and noncommutative descriptions of the same field theory [134]. In particular, the covariant coordinate operators (7.1) may be interpreted as gauge fields appropriate to the splitting of covariant derivatives on a generic $\mathcal A_\theta$-module into a free part $\mathcal H_m$ (represented by the gauge field $A_i$) and a twisted part of constant magnetic flux corresponding to copies of the Fock module $\mathcal H_q$ [38,155]. In the D-brane picture, the corresponding global minima are identified with the closed string vacuum possessing no open string excitations. By using global Euclidean invariance, we may rotate to Darboux coordinates and consider the noncommutative gauge theory (7.5) independently in each $2\times2$ skew-block of the deformation matrix (3.6). In this subsection we will therefore consider only the simplest case of $D=2$ spacetime dimensions; the general case is easily obtained by stitching the independent blocks together again by means of an $SO(D)$ transformation. We will thus consider the noncommutative Yang–Mills action
$$S_{YM} = -\frac{1}{g^2}\,\mathrm{Tr}\Big([\hat C_z,\hat C_{\bar z}] + \frac{1}{2\theta}\Big)^2 , \qquad (8.40)$$
where
$$\hat C_z = \frac12\big(\hat C_1 + i\hat C_2\big) , \qquad \hat C_{\bar z} = \frac12\big(\hat C_1 - i\hat C_2\big) . \qquad (8.41)$$
We will restrict the quantum field theory to those field configurations which have finite action. This requires the field strength $F_{ij}$ to vanish almost everywhere and corresponds to the classical vacua of the theory. It will enable us to evaluate the corresponding Feynman path integral in a semi-classical approximation. From (8.40) it follows that this is equivalent to the conditions
$$[\hat C_{\bar z},\hat C_z] = \frac{1}{2\theta} . \qquad (8.42)$$
In a particular $\mathcal A_\theta$-module $\mathcal H$, we require that Eq. (8.42) holds for all but a finite number of matrix elements of the operators (corresponding to a set of measure zero in field space). The algebraic conditions (8.42) are simply the Heisenberg commutation relations that we encountered in Section 8.1. By the Stone–von Neumann–Mackey theorem, we know that (up to unitary equivalence) their unique unitary irreducible representation is the Schrödinger representation on Fock space (8.8). For $\theta>0$ it is given by
$$\hat C_z^{(1)} = -\frac{\hat c^\dagger}{\sqrt{2\theta}} , \qquad \hat C_{\bar z}^{(1)} = -\frac{\hat c}{\sqrt{2\theta}} , \qquad (8.43)$$
where $\hat c^\dagger$ and $\hat c$ are the Fock space creation and annihilation operators (8.9) for $d=1$. For $\theta<0$ one should interchange $\hat c^\dagger$ and $\hat c$ in (8.43). The most general solution to the classical equations of motion (8.42) is given by a countable direct sum of Fock modules. We may label the solution space by an integer $N\geq1$, which corresponds to a representation by operators $\hat C_z^{(N)},\hat C_{\bar z}^{(N)}$ acting on the separable Hilbert space
$$\bigoplus_{\alpha=0}^{N-1}\mathcal H_q \cong \mathcal H_q\otimes\mathbb C^N \equiv \mathcal H_q^{(N)} . \qquad (8.44)$$
There is a (noncanonical) isomorphism $\mathcal H_q^{(N)}\cong\mathcal H_q$ which can be used to rewrite the noncommutative gauge theory (8.40), evaluated (without loss of generality) in the Schrödinger $\mathcal A_\theta$-module $\mathcal H_q$, in terms of field configurations in the sector labelled by $N$. This follows from the Hilbert hotel argument, which regroups the Fock space states $|n\rangle$, $n=0,1,2,\dots$, into the basis vectors $|p,\alpha\rangle$, $p=0,1,2,\dots$, $\alpha=0,1,\dots,N-1$, of $\mathcal H_q^{(N)}$ as
$$|n\rangle = |pN+\alpha\rangle \equiv |p,\alpha\rangle , \qquad (8.45)$$
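where the index $p$ labels the infinite-dimensional, Fock space component of $\mathcal H_q^{(N)}$ and $\alpha$ indexes the finite-dimensional part $\mathbb C^N$. In this basis we may write the vacuum state, for $\theta>0$, up to a star-gauge transformation as
$$\hat C_z^{(N)} = -\frac{\hat c^\dagger}{\sqrt{2\theta}}\otimes1_N = -\frac{1}{\sqrt{2\theta}}\sum_{p=0}^\infty\sum_{\alpha=0}^{N-1}\sqrt{p}\;|p,\alpha\rangle\langle p-1,\alpha| ,$$
$$\hat C_{\bar z}^{(N)} = -\frac{\hat c}{\sqrt{2\theta}}\otimes1_N = -\frac{1}{\sqrt{2\theta}}\sum_{p=0}^\infty\sum_{\alpha=0}^{N-1}\sqrt{p+1}\;|p,\alpha\rangle\langle p+1,\alpha| , \qquad (8.46)$$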
and similarly for $\theta<0$. The vacuum configuration (8.46) has two types of unitary gauge symmetries. There is the infinite-dimensional $U(\mathcal H_q)$ symmetry acting on the Fock space labels, under which even the original action (8.40) is invariant. There is also a $U(N)$ symmetry acting by finite-dimensional rotations of the $\alpha$ labels. The integer $N$ labels the gauge inequivalent vacua of the noncommutative gauge theory and can be given as the analytical index
$$N = \mathrm{Tr}_{\mathcal H_q^{(N)}}\big[\hat C_z^{(N)},\hat C_{\bar z}^{(N)}\big] = \dim\ker\hat C_z^{(N)}\hat C_{\bar z}^{(N)} , \qquad (8.47)$$
which counts the difference between the numbers of zero eigenvalues of the operators $\hat C_z^{(N)}\hat C_{\bar z}^{(N)}$ and $\hat C_{\bar z}^{(N)}\hat C_z^{(N)}$, whose nonzero eigenvalues all coincide. Here one must remember that we are dealing with infinite-dimensional operators, so that generally $\mathrm{Tr}[\hat C_{\bar z},\hat C_z]\neq0$. The quantity (8.47) is thereby a topological invariant which detects the differential operators that are hidden in the fields $C_i$. In particular, $N$ cannot be changed by any local gauge transformation. In fact, it is this quantity that identifies sectors with a higher-dimensional interpretation in what is naively a zero-dimensional theory (i.e. a matrix model). Any operator on Fock space which transforms covariantly under star-gauge transformations may now be re-expressed, via the isomorphism (8.45), in terms of matrices transforming in the adjoint representation of the finite-dimensional unitary group $U(N)$. For example, for the gauge field strengths $\hat F\equiv\hat W[F_{z\bar z}]$ we may write
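$$\hat F = \sum_{m=0}^\infty\sum_{n=0}^\infty(\hat F)_{mn}\,|m\rangle\langle n| = \sum_{p,q=0}^\infty\sum_{\alpha,\beta=0}^{N-1}\big(\hat F^{(N)}\big)^{\alpha\beta}_{pq}\,|p,\alpha\rangle\langle q,\beta| = \sum_{p,q=0}^\infty\sum_{a=1}^{N^2}\big(\hat F^{(N)a}\big)_{pq}\sum_{\alpha,\beta=0}^{N-1}(t^a)_{\alpha\beta}\,|p,\alpha\rangle\langle q,\beta| , \qquad (8.48)$$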
where $t^a$ are the generators of $U(N)$ in the fundamental representation. As in Section 8.1, the Fock indices $p,q$ label both the dependence of the fields on the "coordinates" of the noncommutative space and the internal star-gauge symmetry. But now there are new indices $\alpha$ on the fields, representing a hidden internal $U(N)$ gauge symmetry in the given topological vacuum sector labelled by $N$. This is quite remarkable, given that we started with only $U(1)$ gauge fields.

Let us now examine how the quantum field theory decomposes according to the vacuum configurations that we have found. Evidently, any path in field space which connects different vacua has infinite action, and so the quantum theory constructed about any one of these vacua will not mix with any of the others. Evaluating the corresponding path integral as a sum over each of the classical vacuum field configurations thereby splits it into a sum of partition functions for each $U(N)$ theory,
$$Z = \int\frac{DC_z\,DC_{\bar z}}{\mathrm{vol}\big(U(\mathcal H_q)\big)}\,e^{-S_{YM}} = \sum_{N=1}^\infty Z_N . \qquad (8.49)$$
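The relabelling (8.45) and the zero-mode count behind the index (8.47) are easy to make concrete. The sketch below uses hypothetical helper names and drops the overall factor $-1/\sqrt{2\theta}$ of (8.46); it implements $n = pN+\alpha$ with `divmod` and lets $\hat c\otimes1_N$ act on the original Fock labels $n$. The states it annihilates are exactly the $N$ states $|0,\alpha\rangle$, i.e. $n<N$, which span the kernel counted in (8.47). Truncating to finitely many levels is only for counting and does not affect these low-lying zero modes.

```python
import math


def to_pair(n, N):
    """Hilbert hotel relabelling (8.45): |n> = |pN + alpha> == |p, alpha>."""
    return divmod(n, N)  # (p, alpha)


def to_index(p, alpha, N):
    """Inverse of to_pair."""
    return p * N + alpha


def lower_N(n, N):
    """Action of c (x) 1_N on the Fock state |n>, in the original labels n.

    Returns (coefficient, target n), or None if |n> is annihilated.  The
    overall factor -1/sqrt(2 theta) of (8.46) is dropped for simplicity.
    """
    p, alpha = to_pair(n, N)
    if p == 0:
        return None
    return math.sqrt(p), to_index(p - 1, alpha, N)


def zero_modes(N, levels):
    """Number of states |n>, n < levels, annihilated by c (x) 1_N."""
    return sum(1 for n in range(levels) if lower_N(n, N) is None)


for N in (1, 2, 5):
    print(N, zero_modes(N, levels=200))
```

For $N=1$ the function `lower_N` is just the ordinary annihilation operator, while for $N>1$ the same Fock space carries $N$ independent ladders, which is the origin of the hidden $U(N)$ symmetry.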
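In the sector labelled by $N$, we can expand the quantum theory described by the partition function $Z_N$ around the classical vacuum configuration (8.46) as
$$\hat C_i = \hat C_i^{(N)} + \hat A_i^{(N)} . \qquad (8.50)$$
By using (8.48) we may then write the noncommutative Yang–Mills action in the topological sector $N$, which defines the partition function $Z_N$ for $\theta>0$, as
$$S_{YM} = -\frac{1}{2g^2}\,\mathrm{Tr}_{\mathcal H_q}\hat F^2 = -\frac{1}{2g^2}\,\mathrm{Tr}_{\mathcal H_q^{(N)}}\big(\hat F^{(N)}\big)^2 = -\frac{1}{2g^2}\,\mathrm{Tr}_{\mathcal H_q}\otimes\mathrm{tr}_N\Big(\frac{1}{\sqrt{2\theta}}\Big\{[\hat c\otimes1_N,\hat A_z] + [\hat c^\dagger\otimes1_N,\hat A_{\bar z}]\Big\} + [\hat A_{\bar z},\hat A_z]\Big)^2 , \qquad (8.51)$$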
where $(\hat A_i)_{\alpha\beta}$ is an $N\times N$ matrix-valued operator on ordinary Fock space $\mathcal H_q$, defined by its matrix elements in taking the traces in (8.51),
$$\langle p|(\hat A_i)_{\alpha\beta}|q\rangle = \langle p,\alpha|\hat A_i^{(N)}|q,\beta\rangle . \qquad (8.52)$$
Like the coordinate operators $\hat x^i$, the oscillator operators $\hat c$ and $\hat c^\dagger$ generate derivatives of fields in the Weyl representation, owing to the commutation relations (8.10),
$$\big[\hat c,\hat W[f]\big] = \sqrt{\frac{|\theta|}{2}}\;\hat W[\partial_{\bar z}f] , \qquad \big[\hat c^\dagger,\hat W[f]\big] = -\sqrt{\frac{|\theta|}{2}}\;\hat W[\partial_z f] , \qquad (8.53)$$
where $z,\bar z = x^1\pm i\,\mathrm{sgn}(\theta)\,x^2$. It follows that the field strength squared which appears in the argument of the trace in (8.51) is just the standard Weyl representation of that for noncommutative $U(N)$ gauge theory, and the action (8.51) is of precisely the same form as (4.2). We thus conclude that $U(1)$ noncommutative Yang–Mills theory contains noncommutative $U(N)$ gauge theory for all values of $N$. The rank $N$ of the gauge group emerges as a superselection parameter, labelling separate, star-gauge inequivalent vacuum sectors of the original quantum Hilbert space. Therefore, noncommutative Yang–Mills theory is a universal gauge theory, containing all Yang–Mills theories (including the noncommutative ones). Universal gauge theories had been sought some time ago [156] in terms of models based on a gauge symmetry group $U(\infty)$ defined through a sequence of embeddings of $U(N)$ structure groups,
$$U(1)\subset U(2)\subset\cdots\subset U(N)\subset U(N+1)\subset\cdots\subset U(\infty) . \qquad (8.54)$$
In this sequence the unitary group $U(N)$ is viewed as consisting of operators on an infinite-dimensional Hilbert space which are equal to the identity operator except in an $N\times N$ submatrix. The $N\to\infty$ limit then defines the inductive limit $U(\infty)$, which is the group of all unitary operators that differ from the identity by a finite-rank operator. However, the gauge group $U(\infty)$ is rather difficult to deal with, in particular within the setting of noncommutative geometry. An appropriate enlargement to unitary groups within the Schatten ideals proves to be manageable, and in some instances even leads to an exactly solvable gauge theory [156]. We shall see in the next subsection that these same gauge groups are the appropriate ones for noncommutative Yang–Mills theory, although they arise much more naturally and for different reasons.
8.4. Large N limits

The basic symmetry underlying the remarkable construction of the previous subsection is some version of the infinite unitary group $U(\infty)$. We are thus faced with the problem of determining which version, i.e. how to embed finite-rank structure groups starting from $U(N)$ to end up with $U(\infty)$. Different ways of embedding lead to very different inductive limits. In particular, to make contact with the function algebras we are dealing with, an appropriate completion is required. The appearance of this gauge group and the ensuing universal gauge symmetry was a consequence of the rewriting of noncommutative Yang–Mills theory as the infinite-dimensional reduced model (7.5). However, to better understand the sequence of embeddings (8.54) from finite-dimensional symmetry groups to infinite-dimensional ones, we should first appeal to the finite-dimensional matrix models underlying noncommutative gauge theory that were described in Section 7.2. This will clarify how the appropriate large $N$ limit should be taken above. In these matrix models, the group of inner automorphisms may be written down in a very precise closed form, and this will help us understand the structure of the noncommutative gauge symmetry group $U(\mathcal A_\theta)$.

8.4.1. Algebraic description

We will begin with an algebraic formulation of the star-gauge symmetry group, as it is somewhat more straightforward to describe. From the matrix model formulation of noncommutative gauge theory that we presented in Section 7.2, it is clear that the gauge group is intimately related to the infinite unitary group $U(\infty)$. But this is also apparent from the representation of the algebra $\mathcal A_\theta$ on the irreducible Fock space (8.8). The natural algebra acting on $\mathcal H_q$ consists of $d$ copies $M(\infty,\mathbb C)\oplus\cdots\oplus M(\infty,\mathbb C)\cong M(\infty,\mathbb C)$ of the algebra of finite-rank operators
$$M(\infty,\mathbb C) = \bigcup_{N=1}^\infty M(N,\mathbb C) , \qquad (8.55)$$
which is defined with respect to the natural system of embeddings of finite-dimensional matrix algebras,
$$M(N,\mathbb C)\hookrightarrow M(N+1,\mathbb C) , \qquad M\mapsto\begin{pmatrix}M&0\\0&0\end{pmatrix} . \qquad (8.56)$$
It is important to note that (8.55) consists of arbitrarily large but finite matrices. At the finite level, the unitary group of $M(N,\mathbb C)$ is of course just the usual $N\times N$ unitary group $U(N)$, which coincides with the gauge symmetry group of the twisted Eguchi–Kawai model (7.6), or equivalently of the noncommutative lattice gauge theory (7.35). The group homomorphism $\iota:U(N)\to\mathrm{Inn}(N,\mathbb C)$, $U\mapsto\iota_U$, is surjective and has kernel $\ker\iota = U(1)$, consisting of the phase multiples of the unit matrix $1_N$. The group of finite-dimensional inner automorphisms is thereby given explicitly as
$$\mathrm{Inn}(N,\mathbb C) = U(N)/U(1) = SU(N)/\mathbb Z_N , \qquad (8.57)$$
and the unitary group of the inductive limit (8.55) is the expected
$$U\big(M(\infty,\mathbb C)\big) = U(\infty) = \bigcup_{N=1}^\infty U(N) . \qquad (8.58)$$
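The kernel statement behind (8.57) can be checked numerically for small $N$. This sketch uses plain complex matrices with no external libraries, a Gram–Schmidt construction of a random unitary, and hypothetical helper names. It verifies that $\iota_U(M) = UMU^\dagger$ respects products, that multiplying $U$ by a phase leaves $\iota_U$ unchanged, and that the centre element $e^{2\pi i/N}1_N$, which lies in $SU(N)$, acts trivially; the last fact is the origin of the quotient by $\mathbb Z_N$.

```python
import cmath
import random


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dagger(a):
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]


def iota(u, m):
    """Inner automorphism iota_U(M) = U M U^dagger of M(N, C)."""
    return matmul(matmul(u, m), dagger(u))


def random_matrix(n, rng):
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
            for _ in range(n)]


def random_unitary(n, rng):
    """Unitary matrix from Gram-Schmidt orthonormalisation of random columns."""
    cols = []
    for _ in range(n):
        v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
        for w in cols:
            overlap = sum(wc.conjugate() * vc for wc, vc in zip(w, v))
            v = [vc - overlap * wc for vc, wc in zip(v, w)]
        norm = sum(abs(x) ** 2 for x in v) ** 0.5
        cols.append([x / norm for x in v])
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def close(a, b, tol=1e-9):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


rng = random.Random(7)
n = 3
u, m, k = random_unitary(n, rng), random_matrix(n, rng), random_matrix(n, rng)
identity = [[1 + 0j if i == j else 0j for j in range(n)] for i in range(n)]

print(close(matmul(u, dagger(u)), identity))                          # U is unitary
print(close(iota(u, matmul(m, k)), matmul(iota(u, m), iota(u, k))))   # automorphism
phase_u = [[cmath.exp(0.37j) * x for x in row] for row in u]
print(close(iota(phase_u, m), iota(u, m)))                            # U(1) kernel
omega = cmath.exp(2j * cmath.pi / n)
centre = [[omega * x for x in row] for row in identity]
print(close(iota(centre, m), m))                                      # Z_N acts trivially
```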
Fig. 6. The semi-infinite Dynkin diagram for the infinite-dimensional Lie algebra $su(\infty)$ represented on Fock space. The corresponding step operators for the Cartan basis are $E_{\vec n,\vec m} = |\vec n\rangle\langle\vec m|$.
The corresponding semi-simple Lie algebra $su(\infty)$ is described by the Dynkin diagram depicted in Fig. 6 [157]. We should remember, however, that the Fock module (8.8) is defined with an appropriate completion, emphasizing the fact that it derives from a space of functions on $\mathbb R^D$. To relate this construction to the function algebras of interest, we must consider appropriate norm completions of the algebra $M(\infty,\mathbb C)$, and hence of the infinite unitary group $U(\infty)$. At the level of Schwartz functions, the completions may be defined in terms of $L^p$-norms for $1\leq p\leq\infty$. For $p=\infty$ this is defined in (2.2), while for $p<\infty$ we define
$$\|f\|_p = \left(\int d^Dx\;|f(x)|^p\right)^{1/p} \qquad (8.59)$$
on the space of $p$-integrable functions $f\in L^p(\mathbb R^D)$. The $L^p$-spaces form a sequence of completions
$$L^1(\mathbb R^D)\subset L^2(\mathbb R^D)\subset\cdots\subset L^\infty(\mathbb R^D) , \qquad (8.60)$$
where the Banach space $L^\infty(\mathbb R^D)$ contains the algebra of Schwartz functions that we have been working with. (Strictly speaking, the spaces $L^p(\mathbb R^D)$ for different $p$ are not nested, so (8.60) should be read heuristically; the ordering is exact for the operator completions (8.63) below.) Under the large $N$ matrix model Weyl–Wigner correspondence, we should now translate this structure into a statement about operators acting on the irreducible Fock module $\mathcal H_q$. Recalling from (2.11) that spacetime integrals map onto traces of Weyl operators in $\mathrm{End}(\mathcal H_q)$, it follows that the operator algebras should be completed in the Schatten $p$-norms. For $1\leq p<\infty$ these are defined by
$$\big\|\hat W[f]\big\|_p = \Big(\mathrm{Tr}_{\mathcal H_q}\big(\hat W[f]^\dagger\,\hat W[f]\big)^{p/2}\Big)^{1/p} \qquad (8.61)$$
on the space of $p$-summable operators $\ell^p(\mathcal H_q)$ on Fock space. For $p=\infty$ it is given by the operator norm
$$\big\|\hat W[f]\big\|_\infty = \sup_{\|\psi\|\leq1}\big(\hat W[f]\psi\,\big|\,\hat W[f]\psi\big)^{1/2} \qquad (8.62)$$
defined on the algebra of compact operators $\ell^\infty(\mathcal H_q)\equiv\mathcal K(\mathcal H_q)$. Starting from the algebra of finite-rank operators (8.55), there is also a sequence of completions
$$M(\infty,\mathbb C)\subset\ell^1(\mathcal H_q)\subset\ell^2(\mathcal H_q)\subset\cdots\subset\mathcal K(\mathcal H_q) \qquad (8.63)$$
in correspondence with the functional sequence (8.60). In other words, integrable functions correspond to trace-class operators, square-integrable functions to Hilbert–Schmidt operators, and so on. Notice that there is no functional analog of the finite-rank operators.

Of special interest to us is the algebra $\mathcal K(\mathcal H_q)$ of compact operators. A compact operator $\hat W[f]$ is one for which the sequence of eigenvalues of the Hermitian operator $\hat W[f]^\dagger\hat W[f]$ tends to zero. Compact operators are therefore the natural analogs of functions which fall off at infinity in $\mathbb R^D$. They are as close to finite-rank or bounded operators as one can get under the Weyl–Wigner correspondence. For instance, like the finite-dimensional matrix algebras $M(N,\mathbb C)$, the defining representation of $\mathcal K(\mathcal H_q)$ on $\mathcal H_q$ is, up to unitary equivalence, the only irreducible representation of the $C^*$-algebra of compact quantum mechanical operators. In particular, they are the natural analogs of Schwartz functions [147]. Now the map $\hat W[g]\mapsto\iota_{\hat W[g]}$ generates a continuous homomorphism (in the operator norm topology) of the unitary group $U(\mathcal H_q)$ of Fock space onto the automorphism group $\mathrm{Aut}(\mathcal K(\mathcal H_q))$. It has kernel $U(1)$ consisting of phase multiples of the Fock space identity operator $1_{\mathcal H_q}$. This identifies the automorphism group as the group of projective unitary automorphisms $PU(\mathcal H_q) = U(\mathcal H_q)/U(1)$ of the Hilbert space $\mathcal H_q$,
$$\mathrm{Aut}\big(\mathcal K(\mathcal H_q)\big) = PU(\mathcal H_q) . \qquad (8.64)$$
This is the natural completion of the matrix model automorphism group (8.57), whereby the global centre subgroup $\mathbb Z_N$ of $SU(N)$ is replaced with the phase group $U(1)$ in the large $N$ limit of (8.57), leading to (8.64). This illustrates explicitly how the matrix model degrees of freedom are transmuted into spacetime degrees of freedom. But again, not all of these automorphisms are gauge transformations. We need to consider the unitary subgroups of the spaces of operators which comprise the sequence of completions (8.63). They themselves form a natural sequence of completions, starting from the group $U(\infty)$ of unitary operators on $\mathcal H_q$ that differ from the identity by finite-rank operators,
$$U(\infty)\subset U_1(\mathcal H_q)\subset U_2(\mathcal H_q)\subset\cdots\subset U_\infty(\mathcal H_q) . \qquad (8.65)$$
Putting these facts together now identifies the star-gauge transformations as the automorphisms (8.64) generated by the compact unitaries $U_\infty(\mathcal H_q)$, i.e. the unitary operators which differ from the identity by a compact operator. We therefore arrive at a very nice physical interpretation of the gauge symmetry group of noncommutative Yang–Mills theory: it is the group of compact unitary operators on Fock space,
$$U(\mathcal A_\theta) = U_\infty(\mathcal H_q) . \qquad (8.66)$$
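To make the nesting (8.63) behind (8.66) concrete, the sketch below evaluates the Schatten norms (8.61) and (8.62) directly from the singular values of an operator, i.e. the square roots of the eigenvalues of $\hat W[f]^\dagger\hat W[f]$. The two spectra are hypothetical examples on a Fock space truncated to $10^4$ levels, not the Weyl images of particular functions: a geometrically decaying one, which is trace class, and $1/(n+1)$, which is Hilbert–Schmidt but not trace class, since its $\ell^1$ sum grows logarithmically with the truncation while its $\ell^2$ sum converges to $\pi^2/6$.

```python
import math


def schatten_norm(singular_values, p):
    """Schatten p-norm (8.61), or the operator norm (8.62) for p = inf."""
    if p == math.inf:
        return max(singular_values)
    return sum(s ** p for s in singular_values) ** (1.0 / p)


# Hypothetical spectra on a truncated Fock space (first 10^4 levels).
geometric = [0.5 * 0.5 ** n for n in range(10_000)]  # trace class
harmonic = [1.0 / (n + 1) for n in range(10_000)]    # Hilbert-Schmidt only

for name, spectrum in (("geometric", geometric), ("harmonic", harmonic)):
    norms = [schatten_norm(spectrum, p) for p in (1, 2, 4, math.inf)]
    # The norms decrease with p, mirroring l^1 in l^2 in ... in K(H_q).
    print(name, [round(x, 4) for x in norms])
```

The decreasing norms are the numerical shadow of the inclusions $\ell^1\subset\ell^2\subset\cdots\subset\mathcal K$: a bound in a smaller $p$-norm implies a bound in every larger one.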
Result (8.66) is the right answer for the gauge symmetry group. The driving reasons for this are the properties of the infinite unitary group (8.58). First of all, it contains all finite-rank structure groups $U(N)$, so that the group (8.66) has the appropriate universality properties that we encountered in the previous subsection, and it in fact coincides with the gauge groups used in [156] to build models of universal gauge theory. Secondly, the group $U(\infty)$ has homotopy groups determined by Bott periodicity which are nontrivial in every odd dimension,
$$\pi_n\big(U(\infty)\big) = \begin{cases}\mathbb Z , & n = 2k+1 ,\\ 0 , & n = 2k .\end{cases} \qquad (8.67)$$
By Palais' theorem [158], all the unitary groups that appear in the sequence (8.65) have the same homotopy type as $U(\infty)$, i.e. this is a topological property that is preserved under the completions in $\mathrm{End}(\mathcal H_q)$. Therefore, unlike the full unitary group of Hilbert space, which has trivial homotopy (8.24), the subgroup (8.66) recovers the correct topological properties. Furthermore, the identification (8.66) agrees with the natural gauge orbit space that one should integrate over in the Euclidean path integral formulation of the quantum gauge theory [159]. In the commutative case this would be the quotient of the space of gauge field configurations on $\mathbb R^D$ by the group of gauge transformations which are connected to the identity, i.e. which approach the identity at infinity in $\mathbb R^D$. But this connectedness property is precisely what is possessed by the compact unitaries $U_\infty(\mathcal H_q)$ under the
Weyl–Wigner correspondence. This provides a direct relationship between the topology of the gauge group $U_\infty(\mathcal{H}_q)$ and that of the configuration space of noncommutative gauge fields [159]. The analytic description (8.66) of the star-gauge symmetry group is therefore the correct one.

8.4.2. Geometric description

In (8.66) we have unveiled a very precise, analytic description of the star-gauge symmetry group which illustrates clearly its topological properties. What is less transparent in this formalism are its geometrical characteristics. To determine these, we appeal once more to the finite-dimensional matrix model representation of noncommutative gauge theory. From (7.20) it follows that the fundamental discrete generators in the finite $N$ matrix model obey the commutation relations [140]
$$[J_k, J_q] = 2i\,\sin\Big(\frac{\pi}{N}\sum_{i<j}k_i\,Q_{ij}\,q_j\Big)\,J_{k+q}\ . \tag{8.68}$$
We now take the large $N$ limit of the relations (8.68) in the dynamical regime of momentum space whereby the discrete noncommutative fields contain only small Fourier modes $k_i, q_j \ll \sqrt{N}$. This restricts the Fourier momenta of the fields to lie in the interior of the Brillouin zone. After an appropriate rescaling of the operators $J_k$ by $N$, the sine function in (8.68) can be expanded, resulting in the large $N$ commutation relations of the $W_\infty$ algebra
$$\big[J_k^{(\infty)}, J_q^{(\infty)}\big] = 2i\,k\wedge q\;J_{k+q}^{(\infty)}\ , \tag{8.69}$$
with $k\wedge q = \sum_{i<j}k_i\,\omega_{ij}\,q_j$. The commutation relations (8.69) coincide with the Lie algebra of the vector fields
$$V_\phi = \omega_{ij}\,\partial_i\phi\,\frac{\partial}{\partial x^j} \tag{8.70}$$
for the functions $\phi(x) = \phi_k(x) = e^{2\pi i k_i x^i/\ell}$, which constitute the complete set of harmonics on a $D$-dimensional hypercubic torus. The infinitesimal diffeomorphisms generated by the vector fields (8.70), represented on functions as
$$f \mapsto \delta_\phi f = V_\phi(f)\ , \tag{8.71}$$
generate canonical transformations of the spacetime coordinates, as in (8.31), (8.32). In particular, they realize the Poisson–Lie algebra
$$\big[V_\phi, V_{\phi'}\big] = V_{\{\phi,\phi'\}}\ . \tag{8.72}$$
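The closure (8.72) can be checked exactly on Fourier modes. The minimal Python sketch below specializes, as an assumption for illustration, to the two-torus with period $2\pi$ and $\omega_{12}=-\omega_{21}=1$, so that $V_\phi(f)=\omega_{ij}\,\partial_i\phi\,\partial_j f=\{\phi,f\}$ and $\{e^{ik\cdot x},e^{iq\cdot x}\}=-(k_1q_2-k_2q_1)\,e^{i(k+q)\cdot x}$. Functions are stored as dictionaries from integer wavevectors to integer coefficients (a hypothetical encoding chosen so that all arithmetic is exact), and the script compares $V_\phi V_\psi f - V_\psi V_\phi f$ with $V_{\{\phi,\psi\}}f$; their equality is the Jacobi identity of the Poisson bracket in disguise.

```python
from collections import defaultdict

# A function on the 2-torus (period 2*pi) is stored as {(k1, k2): c_k},
# meaning f(x) = sum_k c_k * exp(i*(k1*x1 + k2*x2)).


def poisson(f: dict, g: dict) -> dict:
    """{f, g} = d1 f d2 g - d2 f d1 g (omega_12 = 1), using
    {e_k, e_q} = -(k1*q2 - k2*q1) e_{k+q} for plane waves e_k."""
    out = defaultdict(int)
    for k, a in f.items():
        for q, b in g.items():
            out[(k[0] + q[0], k[1] + q[1])] -= (k[0] * q[1] - k[1] * q[0]) * a * b
    return {m: c for m, c in out.items() if c != 0}


def subtract(f: dict, g: dict) -> dict:
    """Coefficient-wise difference f - g, dropping vanishing modes."""
    out = dict(f)
    for m, c in g.items():
        out[m] = out.get(m, 0) - c
    return {m: c for m, c in out.items() if c != 0}


def V(phi: dict):
    """Vector field V_phi = omega_ij (d_i phi) d_j, acting on f as f -> {phi, f}."""
    return lambda f: poisson(phi, f)


phi = {(1, 0): 1, (0, 2): 3}
psi = {(1, 1): 2, (-1, 0): 1}
f = {(2, -1): 1, (0, 1): -4}

commutator = subtract(V(phi)(V(psi)(f)), V(psi)(V(phi)(f)))   # [V_phi, V_psi] f
bracket_field = V(poisson(phi, psi))(f)                        # V_{phi,psi} f
print(commutator == bracket_field)
```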
It follows that for smooth matrices $U_i$ whose low Fourier modes dominate the expansion (7.22), the commutator bracket of large $N$ noncommutative fields can be substituted by the Poisson bracket. As we remarked in Section 8.2.2, this reduction is reminiscent of the semi-classical approximation of quantum mechanics. In this limit the Moyal and Poisson brackets coincide, and the group of symplectomorphisms may be identified with an appropriate completion of the infinite unitary group $U(\infty)$ [140]. Of course, this is no longer true for fields which have high momentum modes, as we saw already in Section 8.2.2. Given that the Moyal bracket represents the commutator bracket of quantum mechanical operators, we thereby arrive at a geometrical characterization of the star-gauge symmetry group. Namely, $U(\mathcal{A}_\theta)$ is a quantum deformation of the symplectomorphism group $\mathrm{Diff}(\mathbb{R}^D)$. There are, however, many ways to see that this deformation must still consist of operators which preserve the symplectic structure [98]. For instance, these are precisely the transformations which preserve the Poisson bi-vector $\overleftarrow{\partial_i}\,\theta^{ij}\,\overrightarrow{\partial_j}$ which appears in formula (2.19) for the star-product. Further aspects of this deformation are described in [98,161].

There are several subtleties with the geometrical description that we have just presented. First of all, it is only a local description, because the correspondence has been established only at the level of the Fourier basis for a torus. It has also neglected the boundaries of the Brillouin zone in momentum space, and subtleties associated with the periodicity $N$ of the lattice in the large $N$ limit. This latter property is the reason why, for instance, the $U(\infty)$ symmetry group associated with $\mathbb{R}^D$ (which has the semi-infinite Dynkin diagram of Fig. 6) is not the same as that on $\mathbb{T}^D$ (which has an infinite Dynkin diagram) [162]. There are many different algebras that can be obtained starting from $M(N,\mathbb{C})$ by taking inductive $N\to\infty$ limits with more complicated embeddings than the simplest, canonical one in (8.56). Indeed, there are infinitely many pairwise non-isomorphic versions of the infinite-dimensional Lie group $SU(\infty)$, which depend on the way that the large $N$ limit is taken [163]. In particular, the Lie groups $U(\infty)$ and $\mathrm{Diff}(\mathbb{R}^D)$ are not isomorphic [164], their differences lying precisely in the high frequency components of the fields. The proper way to relate the infinite matrix algebra to the algebra of functions on the noncommutative torus is described in [165]. One can embed the algebra $\mathcal{A}_\theta$ in this case into the completion of an infinite-dimensional algebra of finite-rank matrices. The nice geometric feature of this construction is that the embedding algebra contains all Morita equivalent tori, and hence the continuum noncommutative Yang–Mills theory is approximated by gauge theories on discrete spaces that at the same time approximate all dual field theories. This is the basis of the finite-dimensional Morita equivalences constructed in [96].
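The expansion of the sine function behind (8.69), and its breakdown for high momentum modes, can be quantified with a few lines of arithmetic. As an assumption (the text only speaks of "an appropriate rescaling by $N$"), take $J_k^{(\infty)}=(N/\pi)J_k$ and write $c$ for the integer $\sum_{i<j}k_iQ_{ij}q_j$ in (8.68). Then (8.68) reads $[J_k^{(\infty)},J_q^{(\infty)}]=2i\,(N/\pi)\sin(\pi c/N)\,J_{k+q}^{(\infty)}$, whose structure constant tends to $2i\,c$ when $c\ll N$, the form of (8.69) with $c$ in place of $k\wedge q$. The Python sketch below compares the two for hypothetical modes $k=(k,0)$, $q=(0,k)$ with $Q_{12}=1$, so that $c=k^2$; once $k$ approaches $\sqrt N$ the linearized (Poisson) constant no longer describes the sine algebra.

```python
import math


def rescaled_sine_constant(c: int, N: int) -> float:
    """Structure constant (stripped of the factor i) of the sine algebra (8.68) after the
    assumed rescaling J_k -> (N/pi) J_k, i.e. (2N/pi) * sin(pi*c/N)."""
    return (2.0 * N / math.pi) * math.sin(math.pi * c / N)


def linearized_constant(c: int) -> float:
    """Large N structure constant 2c, the form of (8.69) with c in place of k^q."""
    return 2.0 * c


N = 10_000
for k in (1, 10, 30, 70, 100):          # at k = sqrt(N) = 100 the sine constant vanishes
    c = k * k                           # hypothetical modes k = (k, 0), q = (0, k), Q_12 = 1
    exact = rescaled_sine_constant(c, N)
    approx = linearized_constant(c)
    print(f"k={k:3d} c={c:5d} sine={exact:9.2f} linear={approx:9.2f} ratio={exact / approx:.4f}")
```

The ratio column stays close to one for $k\ll\sqrt N$ and falls towards zero at $k=\sqrt N$, which is the regime excluded by the restriction to the interior of the Brillouin zone.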
While the description of this subsection still leaves some imprecision as to the precise nature of star-gauge symmetries in noncommutative quantum field theory, we have at least captured some of the general geometrical features of the mixing between spacetime and colour degrees of freedom. At this stage it does not seem likely that complete diffeomorphism invariance can be captured using star-gauge transformations alone. However, investigation of star-gauge invariant operators, such as the open Wilson lines, could suggest how the noncommutative gauge fields couple to gravity [103]. Indeed, the results obtained in this subsection are very natural in the D-brane picture [134,160]. As we explained at the end of Section 5.2, noncommutative field theories appear as a description of D-branes carrying a uniform distribution of lower-dimensional brane charges. In this way a noncommutative D-brane (i.e. one in a constant B-field) may be described as a configuration of infinitely many lower-dimensional D-branes. Then the usual $U(1)$ gauge theory on the brane is represented as a $U_\infty(\mathcal{H}_q)$ gauge symmetry in the lower-dimensional field theory, corresponding to diffeomorphisms which leave the volume of the brane invariant. A larger class of diffeomorphisms may be obtained by considering the gauge theory operators which couple to closed string states in the bulk of the D-brane,
$$\tilde{C}(k) = \int d^Dx\;P\exp_\star\Big[i\int_0^1 dt\,\Big(v^i A_i(x+vt) + y^a\phi_a(x+vt)\Big)\Big]\star e^{ik_ix^i}\ , \tag{8.73}$$
where $\phi_a$ are the embedding coordinates of the D-brane in target space, $v^i = k_j\theta^{ji}$ is the separation of the straight open Wilson line, and $y^a = 2\pi\alpha' k_a$, with $k_a$ the momentum in the directions transverse to the D-brane.
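For the straight open Wilson line in (8.73), the separation $v^i=k_j\theta^{ji}$ grows linearly with the momentum and, by antisymmetry of $\theta^{ij}$, is orthogonal to it, so that high-momentum operators of this type are spatially extended. The Python sketch below evaluates this for an assumed $D=4$ noncommutativity matrix with a single nonzero block $\theta^{12}=-\theta^{21}=\theta_0$; the value $\theta_0=0.5$ and the chosen momenta are hypothetical.

```python
def wilson_line_separation(k: list[float], theta: list[list[float]]) -> list[float]:
    """Return v^i = sum_j k_j theta^{ji} for constant noncommutativity parameters theta."""
    D = len(k)
    return [sum(k[j] * theta[j][i] for j in range(D)) for i in range(D)]


theta0 = 0.5                                   # hypothetical value of theta^{12}
theta = [[0.0, theta0, 0.0, 0.0],              # D = 4, single nonzero block
         [-theta0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0]]

for p in (1.0, 10.0, 100.0):
    k = [0.0, p, 0.0, 0.0]                     # momentum along x^2
    v = wilson_line_separation(k, theta)
    length = sum(x * x for x in v) ** 0.5      # grows like theta0 * p
    transverse = sum(ki * vi for ki, vi in zip(k, v))  # k_i v^i = 0 by antisymmetry
    print(p, v, length, transverse)
```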
Acknowledgements

The author thanks A. Armoni, C.-S. Chu, J. Erickson, J. Gomis, J. Gracia-Bondía, D.R.T. Jones, F. Lizzi and V. Rivelles for very helpful comments on the manuscript. He would especially like to thank the referee for the many suggestions and remarks which have helped to greatly improve the manuscript. This work was supported in part by an Advanced Fellowship from the Particle Physics and Astronomy Research Council (U.K.).

References

[1] H.S. Snyder, Quantized spacetime, Phys. Rev. 71 (1947) 38; The electromagnetic field in quantized spacetime, Phys. Rev. 72 (1947) 68; C.N. Yang, On quantized spacetime, Phys. Rev. 72 (1947) 874.
[2] A. Connes, Noncommutative Geometry, Academic Press, New York, 1994; G. Landi, An Introduction to Noncommutative Spaces and their Geometries, Springer, Berlin, 1997; J. Madore, An Introduction to Noncommutative Geometry and its Physical Applications, Cambridge University Press, Cambridge, 1999; J.M. Gracia-Bondía, J.C. Várilly, H. Figueroa, Elements of Noncommutative Geometry, Birkhäuser, Basel, 2000.
[3] A. Connes, Noncommutative differential geometry, Inst. Hautes Études Sci. Publ. Math. 62 (1985) 257; S.L. Woronowicz, Twisted SU(2) group: an example of a noncommutative differential calculus, Publ. Res. Inst. Math. Sci. 23 (1987) 117; S.L. Woronowicz, Compact matrix pseudogroups, Commun. Math. Phys. 111 (1987) 613.
[4] A. Connes, M.A. Rieffel, Yang–Mills for noncommutative two-tori, Contemp. Math. 62 (1987) 237.
[5] A. Connes, J. Lott, Particle models and noncommutative geometry, Nucl. Phys. (Proc. Suppl.) B18 (1990) 29; J.C. Várilly, J.M. Gracia-Bondía, Connes’ noncommutative differential geometry and the standard model, J. Geom. Phys. 12 (1993) 223; C.P. Martín, J.M. Gracia-Bondía, J.C. Várilly, The standard model as a noncommutative geometry: the low-energy regime, Phys. Rep. 294 (1998) 363 [hep-th/9605001].
[6] H. Grosse, J. Madore, A noncommutative version of the Schwinger model, Phys. Lett. B 283 (1992) 218; F. Lizzi, G. Mangano, G. Miele, G. Sparano, Constraints on unified gauge theories from noncommutative geometry, Mod. Phys. Lett. A 11 (1996) 2561 [hep-th/9603095].
[7] A.H. Chamseddine, G. Felder, J. Fröhlich, Gravity in noncommutative geometry, Commun. Math. Phys. 155 (1993) 205 [hep-th/9209044]; W. Kalau, M. Walze, Gravity, noncommutative geometry and the Wodzicki residue, J. Geom. Phys. 16 (1995) 327 [gr-qc/9312031]; D. Kastler, The Dirac operator and gravitation, Commun. Math. Phys. 166 (1995) 633; A.H. Chamseddine, J. Fröhlich, O. Grandjean, The gravitational sector in the Connes–Lott formulation of the standard model, J. Math. Phys. 36 (1995) 6255 [hep-th/9503093]; A.H. Chamseddine, A. Connes, The spectral action principle, Commun. Math. Phys. 186 (1997) 731 [hep-th/9606001].
[8] M. Dubois-Violette, R. Kerner, J. Madore, Gauge bosons in a noncommutative geometry, Phys. Lett. B 217 (1989) 485.
[9] S. Majid, Hopf algebras for physics at the Planck scale, Class. Quant. Grav. 5 (1988) 1587; R. Coquereaux, Noncommutative geometry and theoretical physics, J. Geom. Phys. 6 (1989) 425; L.J. Garay, Quantum gravity and minimum length, Int. J. Mod. Phys. A 10 (1995) 145 [gr-qc/9403008].
[10] D.V. Ahluwalia, Quantum measurements, gravitation and locality, Phys. Lett. B 339 (1994) 301 [gr-qc/9308007]; D.V. Ahluwalia, Wave-particle duality at the Planck scale: freezing of neutrino oscillations, Phys. Lett. A 275 (2000) 31 [gr-qc/0002005].
[11] G. Veneziano, A stringy nature needs just two constants, Europhys. Lett. 2 (1986) 199; D.J. Gross, P.F. Mende, String theory beyond the Planck scale, Nucl. Phys. B 303 (1988) 407; D. Amati, M. Ciafaloni, G. Veneziano, Can spacetime be probed below the string size? Phys. Lett. B 216 (1989) 41.
[12] S. Doplicher, K. Fredenhagen, J.E. Roberts, Spacetime quantization induced by classical gravity, Phys. Lett. B 331 (1994) 39; S. Doplicher, K. Fredenhagen, J.E. Roberts, The quantum structure of spacetime at the Planck scale and quantum fields, Commun. Math. Phys. 172 (1995) 187; T. Yoneya, String theory and spacetime uncertainty principle, Progr. Theor. Phys. 103 (2000) 1081 [hep-th/0004074].
[13] A. Giveon, M. Porrati, E. Rabinovici, Target space duality in string theory, Phys. Rep. 244 (1994) 77 [hep-th/9401139].
[14] J. Fröhlich, K. Gawędzki, Conformal field theory and geometry of strings, CRM Proc. Lecture Notes 7 (1994) 57 [hep-th/9310187].
[15] F. Lizzi, R.J. Szabo, Duality symmetries and noncommutative geometry of string spacetimes, Commun. Math. Phys. 197 (1998) 667 [hep-th/9707202].
[16] G. Landi, F. Lizzi, R.J. Szabo, String geometry and the noncommutative torus, Commun. Math. Phys. 206 (1999) 603 [hep-th/9806099].
[17] J. Fröhlich, O. Grandjean, A. Recknagel, Supersymmetric quantum theory, noncommutative geometry and gravitation, in: A. Connes, K. Gawędzki, J. Zinn-Justin (Eds.), Quantum Symmetries, Elsevier, Amsterdam, 1998, p. 221 [hep-th/9706132].
[18] E. Witten, Supersymmetry and Morse theory, J. Diff. Geom. 17 (1982) 661; E. Witten, Constraints on supersymmetry breaking, Nucl. Phys. B 202 (1982) 253.
[19] E. Witten, Noncommutative geometry and string field theory, Nucl. Phys. B 268 (1986) 253.
[20] F. Lizzi, R.J. Szabo, Noncommutative geometry and string duality, J. High Energy Phys. Conf. Proc. corfu98/073 [hep-th/9904064].
[21] P.-M. Ho, Y.-S. Wu, Noncommutative geometry and D-Branes, Phys. Lett. B 398 (1997) 52 [hep-th/9611233]; J. Kalkkinen, Dimensionally reduced Yang–Mills theories in noncommutative geometry, Phys. Lett. B 399 (1997) 243 [hep-th/9612027]; A.H. Chamseddine, The spectral action principle in noncommutative geometry and the superstring, Phys. Lett. B 400 (1997) 87 [hep-th/9701096]; A.H. Chamseddine, An effective superstring spectral action, Phys. Rev. D 56 (1997) 3555 [hep-th/9705153].
[22] P. Hořava, Strings on worldsheet orbifolds, Nucl. Phys. B 460 (1989) 461; P. Hořava, Background duality of open string models, Phys. Lett. B 231 (1989) 251; J. Dai, R.G. Leigh, J. Polchinski, New connections between string theories, Mod. Phys. Lett. A 4 (1989) 2073; J. Polchinski, Dirichlet Branes and Ramond–Ramond charges, Phys. Rev. Lett. 75 (1995) 4724 [hep-th/9510169].
[23] E. Witten, Bound states of strings and p-branes, Nucl. Phys. B 460 (1996) 33 [hep-th/9510135].
[24] T. Banks, W. Fischler, S.H. Shenker, L. Susskind, M-theory as a matrix model: a conjecture, Phys. Rev. D 55 (1997) 5112 [hep-th/9610043].
[25] N. Ishibashi, H. Kawai, Y. Kitazawa, A. Tsuchiya, A large N reduced model as superstring, Nucl. Phys. B 498 (1997) 467 [hep-th/9612115].
[26] M.R. Douglas, D. Kabat, P. Pouliot, S.H. Shenker, D-Branes and short distances in string theory, Nucl. Phys. B 485 (1997) 85 [hep-th/9608024].
[27] N.E. Mavromatos, R.J. Szabo, Matrix D-Brane dynamics, logarithmic operators and quantization of noncommutative spacetime, Phys. Rev. D 59 (1999) 104018 [hep-th/9808124]; C.-S. Chu, P.-M. Ho, Y.-C. Kao, Worldvolume uncertainty relations for D-Branes, Phys. Rev. D 60 (1999) 126003 [hep-th/9904133].
[28] A. Connes, M.R. Douglas, A. Schwarz, Noncommutative geometry and matrix theory: compactification on tori, J. High Energy Phys. 9802 (1998) 003 [hep-th/9711162].
[29] A. Sen, D0-branes on T^n and matrix theory, Adv. Theor. Math. Phys. 2 (1998) 51 [hep-th/9709220]; N. Seiberg, Why is the matrix model correct? Phys. Rev. Lett. 79 (1997) 3577 [hep-th/9710009].
[30] M.J. Duff, M-theory (The Theory Formerly Known as Strings), Int. J. Mod. Phys. A 11 (1996) 5623 [hep-th/9608117]; J.H. Schwarz, Lectures on superstring and M-theory dualities, Nucl. Phys. (Proc. Suppl.) B 55 (1997) 1 [hep-th/9607201].
[31] M.R. Douglas, C.M. Hull, D-Branes and the noncommutative torus, J. High Energy Phys. 9802 (1998) 008 [hep-th/9711165]; Y.-K.E. Cheung, M. Krogh, Noncommutative geometry from 0-Branes in a background B-field, Nucl. Phys. B 528 (1998) 185 [hep-th/9803031]; T. Kawano, K. Okuyama, Matrix theory on noncommutative torus, Phys. Lett. B 433 (1998) 29 [hep-th/9803044]; F. Ardalan, H. Arfaei, M.M. Sheikh–Jabbari, Noncommutative geometry from strings and branes, J. High Energy Phys. 9902 (1999) 016 [hep-th/9810072]; C.-S. Chu, P.-M. Ho, Noncommutative open string and D-Brane, Nucl. Phys. B 550 (1999) 151 [hep-th/9812219].
[32] V. Schomerus, D-Branes and deformation quantization, J. High Energy Phys. 9906 (1999) 030 [hep-th/9903205].
[33] D. Bigatti, L. Susskind, Magnetic fields, branes and noncommutative geometry, Phys. Rev. D 62 (2000) 066004 [hep-th/9908056].
[34] N. Seiberg, E. Witten, String theory and noncommutative geometry, J. High Energy Phys. 9909 (1999) 032 [hep-th/9908142].
[35] M. Kato, T. Kuroki, Worldvolume noncommutativity versus target space noncommutativity, J. High Energy Phys. 9903 (1999) 012 [hep-th/9902004].
[36] N.A. Nekrasov, A. Schwarz, Instantons on noncommutative R^4 and (2,0) superconformal six-dimensional theory, Commun. Math. Phys. 198 (1998) 689 [hep-th/9802068].
[37] A.P. Polychronakos, Flux tube solutions in noncommutative gauge theories, Phys. Lett. B 495 (2000) 407 [hep-th/0007043]; D. Bak, Exact multi-vortex solutions in noncommutative Abelian Higgs theory, Phys. Lett. B 495 (2000) 251 [hep-th/0008204]; D. Bak, K. Lee, J.-H. Park, Noncommutative vortex solitons, Phys. Rev. D 63 (2001) 125010 [hep-th/0011099]; G.S. Lozano, E.F. Moreno, F.A. Schaposnik, Nielsen–Olesen vortices in noncommutative space, Phys. Lett. B 504 (2001) 117 [hep-th/0011205]; G.S. Lozano, E.F. Moreno, F.A. Schaposnik, Self-dual Chern–Simons solitons in noncommutative space, J. High Energy Phys. 0102 (2001) 036 [hep-th/0012266].
[38] D.J. Gross, N.A. Nekrasov, Solitons in noncommutative gauge theory, J. High Energy Phys. 0103 (2001) 044 [hep-th/0010090].
[39] D.J. Gross, N.A. Nekrasov, Monopoles and strings in noncommutative gauge theory, J. High Energy Phys. 0007 (2000) 034 [hep-th/0005204]; D.J. Gross, N.A. Nekrasov, Dynamics of strings in noncommutative gauge theory, J. High Energy Phys. 0010 (2000) 021 [hep-th/0007204].
[40] L.D. Landau, E.M. Lifshitz, Quantum Mechanics: Non-Relativistic Theory, Pergamon Press, Oxford, 1977.
[41] D. Birmingham, M. Blau, M. Rakowski, G. Thompson, Topological field theory, Phys. Rep. 209 (1991) 129.
[42] A. Abouelsaood, C.G. Callan, C.R. Nappi, S.A. Yost, Open strings in background gauge fields, Nucl. Phys. B 280 (1987) 599.
[43] H. Aoki, N. Ishibashi, S. Iso, H. Kawai, Y. Kitazawa, T. Tada, Noncommutative Yang–Mills in IIB matrix model, Nucl. Phys. B 565 (2000) 176 [hep-th/9908141].
[44] M.R. Douglas, Two lectures on D-geometry and noncommutative geometry, in: M.J. Duff, E. Sezgin, C.N. Pope, B.R. Greene, J. Louis, K.S. Narain, S. Randjbar-Daemi, G. Thompson (Eds.), Nonperturbative Aspects of String Theory and Supersymmetric Gauge Theories, World Scientific, Singapore, 1999, p. 131 [hep-th/9901146]; A. Konechny, A. Schwarz, An introduction to matrix theory and noncommutative geometry: parts 1, 2, Phys. Rep. 360 (2002) 353 [hep-th/0012145] [hep-th/0107251]; M.R. Douglas, N.A. Nekrasov, Noncommutative field theory, Rev. Mod. Phys. 73 (2001) 977 [hep-th/0106048]; R. Jackiw, Physical instances of noncommuting coordinates, Nucl. Phys. (Proc. Suppl.) B 108 (2002) 30 [hep-th/0110057].
[45] N.A. Nekrasov, Trieste lectures on solitons in noncommutative gauge theories [hep-th/0011095]; J.A. Harvey, Komaba lectures on noncommutative solitons and D-Branes [hep-th/0102076].
[46] M. Bordemann, M. Brischle, C. Emmrich, S. Waldmann, Subalgebras with converging star products in deformation quantization: an algebraic construction for CP^n, J. Math. Phys. 37 (1996) 6311 [q-alg/9512019]; C. Castro, On the Large N Limit, W(∞) Strings, Star Products, AdS/CFT Duality, Nonlinear Sigma Models on AdS Spaces and Chern–Simons p-Branes [hep-th/0106260].
[47] [48] [49]
[50] [51] [52]
[53] [54] [55] [56] [57] [58] [59] [60] [61]
[62] [63] [64] [65]
[66]
A.B. Hammou, M. Lagraa, M.M. Sheikh–Jabbari, Coherent state induced star-product on R^3, and the fuzzy sphere, Phys. Rev. D 66 (2002) 025025 [hep-th/0110291]; J.M. Gracia-Bondía, F. Lizzi, G. Marmo, P. Vitale, Infinitely many star products to play with, J. High Energy Phys. 0204 (2002) 026 [hep-th/0112092].
Y. Okawa, H. Ooguri, An exact solution to Seiberg–Witten equation of noncommutative gauge theory, Phys. Rev. D 64 (2001) 046009 [hep-th/0104036].
T. Lee, Noncommutative Dirac–Born–Infeld action for D-Brane, Phys. Lett. B 478 (2000) 313 [hep-th/9912038].
L. Cornalba, Corrections to the Abelian Born–Infeld action arising from noncommutative geometry, J. High Energy Phys. 0009 (2000) 017 [hep-th/9912293]; S. Terashima, The Nonabelian Born–Infeld action and noncommutative gauge theory, J. High Energy Phys. 0007 (2000) 033 [hep-th/0006058].
I.Ya. Aref’eva, D.M. Belov, A.A. Giryavets, A.S. Koshelev, P.B. Medvedev, Noncommutative field theories and superstring field theories [hep-th/0111208].
I. Hinchliffe, N. Kersting, Review of the phenomenology of noncommutative geometry [hep-ph/0205040].
B. Jurčo, S. Schraml, P. Schupp, J. Wess, Enveloping algebra valued gauge transformations for Nonabelian gauge groups on noncommutative spaces, Eur. Phys. J. C 17 (2000) 521 [hep-th/0006246]; B. Jurčo, L. Moller, S. Schraml, P. Schupp, J. Wess, Construction of Nonabelian gauge theories on noncommutative spaces, Eur. Phys. J. C 21 (2001) 383 [hep-th/0104153]; C.P. Martín, The gauge anomaly and the Seiberg–Witten map [hep-th/0211164].
H. Weyl, The Theory of Groups and Quantum Mechanics, Dover, New York, 1931.
M.A. Rieffel, Deformation quantization for actions of R^d, Mem. Am. Math. Soc. 106 (1993) 1.
E. Brown, Bloch electrons in a uniform magnetic field, Phys. Rev. 133 (1963) 1038.
J. Zak, Magnetic translation group I, II, Phys. Rev. 134 (1964) 1602, 1607.
E.P. Wigner, Quantum corrections for thermodynamic equilibrium, Phys. Rev. 40 (1932) 749.
A. Royer, Wigner function as the expectation value of a parity operator, Phys. Rev. A 15 (1977) 449; J.M. Gracia-Bondía, Generalized Moyal quantization on homogeneous symplectic spaces, Contemp. Math. 134 (1992) 93.
H.J. Groenewold, On the principles of elementary quantum mechanics, Physica 12 (1946) 405; J.E. Moyal, Quantum mechanics as a statistical theory, Proc. Cambridge Phil. Soc. 45 (1949) 99.
F. Bayen, M. Flato, C. Fronsdal, A. Lichnerowicz, D. Sternheimer, Deformation theory and quantization I: deformations of symplectic structures, Ann. Phys. 111 (1978) 61.
J. Madore, S. Schraml, P. Schupp, J. Wess, Gauge theory on noncommutative spaces, Eur. Phys. J. C 16 (2000) 161 [hep-th/0001203].
H. García-Compeán, J.F. Plebański, D-Branes on group manifolds and deformation quantization, Nucl. Phys. B 618 (2001) 81 [hep-th/9907183]; P.-M. Ho, Y.-T. Yeh, Noncommutative D-Brane in nonconstant NS-NS B-field background, Phys. Rev. Lett. 85 (2000) 5523 [hep-th/0005159]; L. Cornalba, R. Schiappa, Nonassociative star-product deformations for D-Brane worldvolumes in curved backgrounds, Commun. Math. Phys. 225 (2002) 33 [hep-th/0101219]; A.Yu. Alekseev, A. Recknagel, V. Schomerus, Open strings and noncommutative geometry of branes on group manifolds, Mod. Phys. Lett. A 16 (2001) 325 [hep-th/0104054].
M. Kontsevich, Deformation Quantization of Poisson Manifolds [q-alg/9709040].
A.S. Cattaneo, G. Felder, A path integral approach to the Kontsevich quantization formula, Commun. Math. Phys. 212 (2000) 591 [math.QA/9902090].
R. Gopakumar, S. Minwalla, A. Strominger, Noncommutative solitons, J. High Energy Phys. 0005 (2000) 020 [hep-th/0003160].
K. Dasgupta, S. Mukhi, G. Rajesh, Noncommutative tachyons, J. High Energy Phys. 0006 (2000) 022 [hep-th/0005006]; J.A. Harvey, P. Kraus, F. Larsen, E.J. Martinec, D-Branes and strings as noncommutative solitons, J. High Energy Phys. 0007 (2000) 042 [hep-th/0005031].
S. Minwalla, M. Van Raamsdonk, N. Seiberg, Noncommutative perturbative dynamics, J. High Energy Phys. 0002 (2000) 020 [hep-th/9912072].
[67] G. ’t Hooft, A planar diagram theory for strong interactions, Nucl. Phys. B 72 (1974) 461; D. Bessis, C. Itzykson, J.B. Zuber, Quantum field theory techniques in graphical enumeration, Adv. Appl. Math. 1 (1980) 109.
[68] N. Ishibashi, S. Iso, H. Kawai, Y. Kitazawa, Wilson loops in noncommutative Yang–Mills, Nucl. Phys. B 573 (2000) 573 [hep-th/9910004].
[69] T. Filk, Divergences in a field theory on quantum space, Phys. Lett. B 376 (1996) 53.
[70] J.C. Várilly, J.M. Gracia-Bondía, On the ultraviolet structure of quantum fields over noncommutative manifolds, Int. J. Mod. Phys. A 14 (1999) 1305 [hep-th/9804001]; M. Chaichian, A. Demichev, P. Prešnajder, Quantum field theory on noncommutative spacetimes and the persistence of ultraviolet divergences, Nucl. Phys. B 567 (2000) 360 [hep-th/9812180].
[71] M. Laidlaw, Noncommutative geometry from string theory: annulus corrections, J. High Energy Phys. 0103 (2001) 004 [hep-th/0009068].
[72] I. Chepelev, R. Roiban, Renormalization of quantum field theories on noncommutative R^d I: scalars, J. High Energy Phys. 0005 (2000) 037 [hep-th/9911098]; I. Chepelev, R. Roiban, Convergence theorem for noncommutative Feynman graphs and renormalization, J. High Energy Phys. 0103 (2001) 001 [hep-th/0008090]; I.Ya. Aref’eva, D.M. Belov, A.S. Koshelev, Two-loop diagrams in noncommutative φ^4_4 theory, Phys. Lett. B 476 (2000) 431 [hep-th/9912075]; S.S. Gubser, S.L. Sondhi, Phase structure of noncommutative scalar field theories, Nucl. Phys. B 605 (2001) 395 [hep-th/0006119]; A. Micu, M.M. Sheikh–Jabbari, Noncommutative φ^4 theory at two loops, J. High Energy Phys. 0001 (2001) 025 [hep-th/0008057].
[73] H.O. Girotti, M. Gomes, V.O. Rivelles, A.J. da Silva, A consistent noncommutative field theory: the Wess–Zumino model, Nucl. Phys. B 587 (2000) 299 [hep-th/0005272].
[74] E.F. Moreno, F.A. Schaposnik, The Wess–Zumino–Witten term in noncommutative two-dimensional fermion models, J. High Energy Phys. 0003 (2000) 032 [hep-th/0002236]; E.F. Moreno, F.A. Schaposnik, Wess–Zumino–Witten and fermion models in noncommutative space, Nucl. Phys. B 596 (2001) 439 [hep-th/0008118].
[75] M.M. Sheikh–Jabbari, Open strings in a B-field background as electric dipoles, Phys. Lett. B 455 (1999) 129 [hep-th/9901080].
[76] M. Van Raamsdonk, N. Seiberg, Comments on noncommutative perturbative dynamics, J. High Energy Phys. 0003 (2000) 035 [hep-th/0002186].
[77] G. Arcioni, M.A. Vázquez-Mozo, Thermal effects in perturbative noncommutative gauge theories, J. High Energy Phys. 0001 (2000) 028 [hep-th/9912140]; W. Fischler, E. Gorbatov, A. Kashani-Poor, S. Paban, P. Pouliot, J. Gomis, Evidence for winding states in noncommutative quantum field theory, J. High Energy Phys. 0005 (2000) 024 [hep-th/0002067]; W. Fischler, E. Gorbatov, A. Kashani-Poor, R. McNees, S. Paban, P. Pouliot, The interplay between θ and T, J. High Energy Phys. 0006 (2000) 032 [hep-th/0003216]; G. Arcioni, J.L.F. Barbón, J. Gomis, M.A. Vázquez-Mozo, On the stringy nature of winding modes in noncommutative thermal field theories, J. High Energy Phys. 0006 (2000) 038 [hep-th/0004080].
[78] O.D. Andreev, H. Dorn, Diagrams of noncommutative φ^3 theory from string theory, Nucl. Phys. B 583 (2000) 145 [hep-th/0003113]; Y. Kiem, S. Lee, UV/IR mixing in noncommutative field theory via open string loops, Nucl. Phys. B 586 (2000) 303 [hep-th/0003145]; A. Bilal, C.-S. Chu, R. Russo, String theory and noncommutative field theories at one loop, Nucl. Phys. B 582 (2000) 65 [hep-th/0003180]; J. Gomis, M. Kleban, T. Mehen, M. Rangamani, S.H. Shenker, Noncommutative gauge dynamics from the string worldsheet, J. High Energy Phys. 0008 (2000) 011 [hep-th/0003215].
[79] E.T. Akhmedov, P. DeBoer, G.W. Semenoff, Running couplings and triviality of field theories on noncommutative spaces, Phys. Rev. D 64 (2001) 065005 [hep-th/0010003]; E.T. Akhmedov, P. DeBoer, G.W. Semenoff, Noncommutative Gross–Neveu model at large N, J. High Energy Phys. 0106 (2001) 009 [hep-th/0103199];
[80]
[81]
[82] [83] [84]
[85] [86]
[87] [88] [89]
H.O. Girotti, M. Gomes, V.O. Rivelles, A.J. da Silva, The noncommutative supersymmetric nonlinear sigma model, Int. J. Mod. Phys. A 17 (2002) 1503 [hep-th/0102101].
J. McGreevy, L. Susskind, N. Toumbas, Invasion of the giant gravitons from anti-de Sitter space, J. High Energy Phys. 0006 (2000) 008 [hep-th/0003075]; I.Ya. Aref’eva, D.M. Belov, A.S. Koshelev, O.A. Rytchkov, UV/IR mixing for noncommutative complex scalar field theory 2: interaction with gauge fields, Nucl. Phys. (Proc. Suppl.) B 102 (2001) 11 [hep-th/0003176]; A. Rajaraman, M. Rozali, Noncommutative gauge theory, divergences and closed strings, J. High Energy Phys. 0004 (2000) 033 [hep-th/0003227]; H. Liu, J. Michelson, Stretched strings in noncommutative field theory, Phys. Rev. D 62 (2000) 066003 [hep-th/0004013]; P.-M. Ho, M. Li, Fuzzy spheres in AdS/CFT correspondence and holography from noncommutativity, Nucl. Phys. B 596 (2001) 259 [hep-th/0004072]; F. Zamora, On the operator product expansion in noncommutative quantum field theory, J. High Energy Phys. 0005 (2000) 002 [hep-th/0004085]; J. Gomis, K. Landsteiner, E. Lopez, Nonrelativistic noncommutative field theory and UV/IR mixing, Phys. Rev. D 62 (2000) 105006 [hep-th/0004115]; Y. Kiem, S. Lee, J. Park, Noncommutative field theory from string theory: two loop analysis, Nucl. Phys. B 594 (2001) 169 [hep-th/0008002]; M. Li, Quantum corrections to noncommutative solitons [hep-th/0011170]; B.A. Campbell, K. Kaminsky, Noncommutative field theory and spontaneous symmetry breaking, Nucl. Phys. B 581 (2000) 240 [hep-th/0003137]; B.A. Campbell, K. Kaminsky, Noncommutative linear sigma models, Nucl. Phys. B 606 (2001) 613 [hep-th/0102022]; G.-H. Chen, Y.-S. Wu, On critical phenomena in a noncommutative space [hep-th/0103020].
L. Griguolo, M. Pietroni, Wilsonian renormalization group and the noncommutative IR/UV connection, J. High Energy Phys. 0105 (2001) 032 [hep-th/0104217]; Y. Kinar, G. Lifschytz, J. Sonnenschein, UV/IR connection: a matrix perspective, J. High Energy Phys. 0108 (2001) 001 [hep-th/0105089].
H.O. Girotti, M. Gomes, V.O. Rivelles, A.J. da Silva, The low-energy limit of the noncommutative Wess–Zumino model, J. High Energy Phys. 0205 (2002) 040 [hep-th/0101159]; I. Jack, D.R.T. Jones, N. Mohammedi, Ultraviolet properties of noncommutative nonlinear sigma models in two dimensions, Phys. Lett. B 520 (2001) 405 [hep-th/0109015].
I.L. Buchbinder, M. Gomes, A.Yu. Petrov, V.O. Rivelles, Superfield effective action in the noncommutative Wess–Zumino model, Phys. Lett. B 517 (2001) 191 [hep-th/0107022].
N. Seiberg, L. Susskind, N. Toumbas, Space/time noncommutativity and causality, J. High Energy Phys. 0006 (2000) 044 [hep-th/0005015].
J. Gomis, T. Mehen, Space–time noncommutative field theories and unitarity, Nucl. Phys. B 591 (2000) 265 [hep-th/0005129]; L. Alvarez-Gaumé, J.L.F. Barbón, R. Zwicky, Remarks on time–space noncommutative field theories, J. High Energy Phys. 0105 (2001) 057 [hep-th/0103069].
R.-G. Cai, N. Ohta, Lorentz transformation and light-like noncommutative SYM, J. High Energy Phys. 0010 (2000) 036 [hep-th/0008119].
N. Seiberg, L. Susskind, N. Toumbas, Strings in background electric field, space/time noncommutativity and a new noncritical string theory, J. High Energy Phys. 0006 (2000) 021 [hep-th/0005040]; R. Gopakumar, J. Maldacena, S. Minwalla, A. Strominger, S-duality and noncommutative gauge theory, J. High Energy Phys. 0006 (2000) [hep-th/0005048].
C.P. Burgess, Open string instability in background electric fields, Nucl. Phys. B 294 (1987) 427.
N. Ohta, D. Tomino, Noncommutative gauge dynamics from brane configurations with background B field, Progr. Theor. Phys. 105 (2001) 287 [hep-th/0009021].
O. Aharony, J. Gomis, T. Mehen, On theories with light-like noncommutativity, J. High Energy Phys. 0009 (2000) 023 [hep-th/0006236].
[90] K. Matsubara, Restrictions on gauge groups in noncommutative gauge theory, Phys. Lett. B 482 (2000) 417 [hep-th/0003294].
[91] A. Armoni, Comments on perturbative dynamics of noncommutative Yang–Mills theory, Nucl. Phys. B 593 (2001) 229 [hep-th/0005208].
[92] L. Bonora, M. Schnabl, M.M. Sheikh–Jabbari, A. Tomasiello, Noncommutative SO(N) and Sp(N) gauge theories, Nucl. Phys. B 589 (2000) 461 [hep-th/0006091]; I. Bars, M.M. Sheikh–Jabbari, M.A. Vasiliev, Noncommutative o⋆(N) and usp⋆(N) algebras and the corresponding gauge field theories, Phys. Rev. D 64 (2001) 086004 [hep-th/0103209].
[93] K. Okuyama, A path integral representation of the map between commutative and noncommutative gauge fields, J. High Energy Phys. 0003 (2000) 016 [hep-th/9910138].
[94] P. Watts, Noncommutative string theory, the R-matrix and Hopf algebras, Phys. Lett. B 474 (2000) 295 [hep-th/9911026]; O.D. Andreev, H. Dorn, On open string σ-model and noncommutative gauge fields, Phys. Lett. B 476 (2000) 402 [hep-th/9912070].
[95] J. Ambjørn, Y.M. Makeenko, J. Nishimura, R.J. Szabo, Finite N matrix models of noncommutative gauge theory, J. High Energy Phys. 9911 (1999) 029 [hep-th/9911041].
[96] J. Ambjørn, Y.M. Makeenko, J. Nishimura, R.J. Szabo, Lattice gauge fields and discrete noncommutative Yang–Mills theory, J. High Energy Phys. 0005 (2000) 023 [hep-th/0004147].
[97] A.Yu. Alekseev, A.G. Bytsko, Wilson lines on noncommutative tori, Phys. Lett. B 482 (2000) 271 [hep-th/0002101].
[98] F. Lizzi, R.J. Szabo, A. Zampini, Geometry of the gauge algebra in noncommutative Yang–Mills theory, J. High Energy Phys. 0108 (2001) 032 [hep-th/0107115].
[99] A. Hashimoto, N. Itzhaki, Noncommutative Yang–Mills and the AdS/CFT correspondence, Phys. Lett. B 465 (1999) 142 [hep-th/9907166]; J.M. Maldacena, J.G. Russo, The large N limit of noncommutative gauge theories, J. High Energy Phys. 9909 (1999) 025 [hep-th/9908134]; M. Alishahiha, Y. Oz, M.M. Sheikh–Jabbari, Supergravity and large N noncommutative field theories, J. High Energy Phys. 9911 (1999) 007 [hep-th/9909215]; R.-G. Cai, N. Ohta, Thermodynamics of large N noncommutative super Yang–Mills theory, Phys. Rev. D 61 (2000) 124012 [hep-th/9910092]; R.-G. Cai, N. Ohta, Noncommutative and ordinary super Yang–Mills on (D(p−2),Dp) bound states, J. High Energy Phys. 0003 (2000) 009 [hep-th/0001213]; R.-G. Cai, N. Ohta, (F1,D1,D3) bound states, its scaling limits and SL(2,Z) duality, Progr. Theor. Phys. 104 (2000) 1073 [hep-th/0007106]; S.R. Das, S. Kalyana Rama, S.P. Trivedi, Supergravity with self-dual B-fields and instantons in noncommutative gauge theory, J. High Energy Phys. 0003 (2000) 004 [hep-th/9911137]; T. Harmark, N.A. Obers, Phase structure of noncommutative field theories and spinning brane bound states, J. High Energy Phys. 0003 (2000) 024 [hep-th/9911169]; A. Dhar, Y. Kitazawa, Wilson loops in strongly coupled noncommutative gauge theories, Phys. Rev. D 63 (2001) 125005 [hep-th/0010256].
[100] D.J. Gross, A. Hashimoto, N. Itzhaki, Observables of noncommutative gauge theory, Adv. Theor. Math. Phys. 4 (2000) 893 [hep-th/0008075].
[101] A. Dhar, S.R. Wadia, A note on gauge invariant operators in noncommutative gauge theories and the matrix model, Phys. Lett. B 495 (2000) 413 [hep-th/0008144].
[102] M. Rozali, M. Van Raamsdonk, Gauge invariant correlators in noncommutative gauge theory, Nucl. Phys. B 608 (2001) 103 [hep-th/0012065]; A. Dhar, Y. Kitazawa, High-energy behaviour of Wilson lines, J. High Energy Phys. 0102 (2001) 004 [hep-th/0012170].
[103] S.-J. Rey, R. von Unge, S-duality, noncritical open string and noncommutative gauge theory, Phys. Lett. B 499 (2001) 215 [hep-th/0007089]; S.R. Das, S.-J. Rey, Open Wilson lines in noncommutative gauge theory and the tomography of holographic dual supergravity, Nucl. Phys. B 590 (2000) 453 [hep-th/0008042];
[104] [105] [106] [107] [108] [109] [110] [111] [112]
[113]
[114]
H. Liu, ⋆-Trek II: ⋆n operations, open Wilson lines and the Seiberg–Witten map, Nucl. Phys. B 614 (2001) 305 [hep-th/0011125]; S.R. Das, S.P. Trivedi, Supergravity couplings to noncommutative branes, open Wilson lines and generalized star products, J. High Energy Phys. 0102 (2001) 046 [hep-th/0011131]; Y. Okawa, H. Ooguri, How noncommutative gauge theories couple to gravity, Nucl. Phys. B 599 (2001) 55 [hep-th/0012218]; H. Liu, J. Michelson, Supergravity couplings of noncommutative D-Branes, Nucl. Phys. B 615 (2001) 169 [hep-th/0101016]; A. Dhar, Y. Kitazawa, Noncommutative gauge theory, open Wilson lines and closed strings, J. High Energy Phys. 0108 (2001) 044 [hep-th/0106217]. T. Krajewski, R. Wulkenhaar, Perturbative quantum gauge fields on the noncommutative torus, Int. J. Mod. Phys. A 15 (2000) 1011 [hep-th/9903187]. A. Matusis, L. Susskind, N. Toumbas, The IR/UV connection in the noncommutative gauge theories, J. High Energy Phys. 0012 (2000) 002 [hep-th/0002075]. M. Van Raamsdonk, The meaning of infrared singularities in noncommutative gauge theories, J. High Energy Phys. 0111 (2001) 006 [hep-th/0110093]. C.P. Martín, D. Sanchez-Ruiz, The one-loop UV divergent structure of U(1) Yang–Mills theory on noncommutative R^4, Phys. Rev. Lett. 83 (1999) 476 [hep-th/9903077]. M.M. Sheikh–Jabbari, One-loop renormalizability of supersymmetric Yang–Mills theories on noncommutative two-torus, J. High Energy Phys. 9906 (1999) 015 [hep-th/9903107]. M. Hayakawa, Perturbative analysis on infrared aspects of noncommutative QED on R^4, Phys. Lett. B 478 (2000) 394 [hep-th/9912094]. A. Armoni, E. Lopez, UV/IR mixing via closed strings and tachyonic instabilities, Nucl. Phys. B 632 (2002) 240 [hep-th/0110113]. C.-S. Chu, V.V. Khoze, G. Travaglini, Dynamical breaking of supersymmetry in noncommutative gauge theories, Phys. Lett. B 513 (2001) 200 [hep-th/0105187].
N. Grandi, R.L. Pakman, F.A. Schaposnik, Supersymmetric Dirac–Born–Infeld theory in noncommutative space, Nucl. Phys. B 588 (2000) 508 [hep-th/0004104]; H. Liu, J. Michelson, ⋆-Trek: the one loop N = 4 noncommutative SYM action, Nucl. Phys. B 614 (2001) 279 [hep-th/0008205]; D. Bellisai, J.M. Isidro, M. Matone, On the structure of noncommutative N = 2 super Yang–Mills theory, J. High Energy Phys. 0010 (2000) 026 [hep-th/0009174]; M.T. Grisaru, S. Penati, Noncommutative supersymmetric gauge anomaly, Phys. Lett. B 504 (2001) 89 [hep-th/0010177]; V.V. Khoze, G. Travaglini, Wilsonian effective actions and the IR/UV mixing in noncommutative gauge theories, J. High Energy Phys. 0101 (2001) 026 [hep-th/0011218]; M. Pernici, A. Santambrogio, D. Zanon, The one-loop effective action of noncommutative N = 4 super Yang–Mills is gauge invariant, Phys. Lett. B 504 (2001) 131 [hep-th/0011140]; D. Zanon, Noncommutative N = 1, N = 2 super U(N) Yang–Mills: UV/IR mixing and effective action results at one loop, Phys. Lett. B 502 (2001) 265 [hep-th/0012009]; F. Ruiz Ruiz, Gauge fixing independence of IR divergences in noncommutative U(1), perturbative tachyonic instabilities and supersymmetry, Phys. Lett. B 502 (2001) 274 [hep-th/0012171]; A. Armoni, R. Minasian, S. Theisen, On noncommutative N = 2 super Yang–Mills, Phys. Lett. B 513 (2001) 406 [hep-th/0102007]; T.J. Hollowood, V.V. Khoze, G. Travaglini, Exact results in noncommutative N = 2 supersymmetric gauge theories, J. High Energy Phys. 0105 (2001) 051 [hep-th/0102045]. H. Grosse, T. Krajewski, R. Wulkenhaar, Renormalization of noncommutative Yang–Mills theories: a simple example [hep-th/0001182]. L. Bonora, M. Salizzoni, Renormalization of noncommutative U(N) gauge theories, Phys. Lett. B 504 (2001) 80 [hep-th/0011088]. A. Bassetto, L. Griguolo, G. Nardelli, F. Vian, On the unitarity of quantum gauge theories on noncommutative spaces, J. High Energy Phys. 0107 (2001) 008 [hep-th/0105257].
[115] S. Cho, R. Hinterding, J. Madore, H. Steinacker, Finite field theory on noncommutative geometries, Int. J. Mod. Phys. D 9 (2000) 161 [hep-th/9903239]; E. Hawkins, Noncommutative Regularization for the Practical Man [hep-th/9908052]. H.B. Benaoum, Perturbative BF-Yang–Mills theory on noncommutative R^4, Nucl. Phys. B 585 (2000) 554 [hep-th/9912036]; A.A. Bichl, J.M. Grimstrup, V. Putz, M. Schweda, Perturbative Chern–Simons theory on noncommutative R^3, J. High Energy Phys. 0007 (2000) 046 [hep-th/0004071]; A.A. Bichl, J.M. Grimstrup, H. Grosse, L. Popp, M. Schweda, R. Wulkenhaar, The superfield formalism applied to the noncommutative Wess–Zumino model, J. High Energy Phys. 0010 (2000) 046 [hep-th/0007050]; A.A. Bichl, J.M. Grimstrup, H. Grosse, L. Popp, M. Schweda, R. Wulkenhaar, Renormalization of the noncommutative photon self-energy to all orders via Seiberg–Witten map, J. High Energy Phys. 0106 (2001) 013 [hep-th/0104097]; A.A. Bichl, J.M. Grimstrup, L. Popp, M. Schweda, R. Wulkenhaar, Perturbative analysis of the Seiberg–Witten map, Int. J. Mod. Phys. A 17 (2002) 2219 [hep-th/0102044]; I. Jack, D.R.T. Jones, Ultraviolet finite noncommutative theories, Phys. Lett. B 514 (2001) 401 [hep-th/0105221]; I. Jack, D.R.T. Jones, Ultraviolet finiteness in noncommutative supersymmetric theories, New J. Phys. 3 (2001) 19 [hep-th/0109195]; C.E. Carlson, C.D. Carone, R.F. Lebed, Bounding noncommutative QCD, Phys. Lett. B 518 (2001) 201 [hep-ph/0107291]. [116] P.-M. Ho, Y.-S. Wu, Noncommutative gauge theories in matrix theory, Phys. Rev. D 58 (1998) 066003 [hep-th/9801147]. [117] G.A. Elliott, On the K-theory of the C*-algebra generated by a projective representation of a torsion-free discrete Abelian group, in: G. Arsene, et al. (Eds.), Operator Algebras and Group Representations 1, Pitman, London, 1984, p. 159. [118] A. Astashkevich, N.A. Nekrasov, A. Schwarz, On noncommutative Nahm transform, Commun. Math. Phys. 211 (2000) 167 [hep-th/9810147];
A. Konechny, A. Schwarz, BPS States on noncommutative tori and duality, Nucl. Phys. B 550 (1999) 561 [hep-th/9811159]. [119] M.R. Douglas, Branes within branes, in: L. Baulieu, P. Di Francesco, M.R. Douglas, V.A. Kazakov, M. Picco, P. Windey (Eds.), Strings, Branes and Dualities, Kluwer Academic, Dordrecht, 1999, p. 267 [hep-th/9512077]. [120] G. ’t Hooft, A property of electric and magnetic flux in nonabelian gauge theories, Nucl. Phys. B 153 (1979) 141; G. ’t Hooft, Some twisted self-dual solutions for the Yang–Mills equations on a hypertorus, Commun. Math. Phys. 81 (1981) 267. [121] D.R. Lebedev, M.I. Polikarpov, A.A. Roslyi, Gauge fields on the continuum and lattice tori, Nucl. Phys. B 325 (1989) 138. [122] W. Taylor, D-Brane field theory on compact spaces, Phys. Lett. B 394 (1997) 283 [hep-th/9611042]; O.J. Ganor, S. Ramgoolam, W. Taylor, Branes, fluxes and duality in matrix theory, Nucl. Phys. B 492 (1997) 191 [hep-th/9611202]. [123] E. Kim, H. Kim, N. Kim, B.-H. Lee, C.-Y. Lee, H.S. Yang, Twisted bundles on noncommutative T^4 and D-Brane bound states, Phys. Rev. D 62 (2000) 046001 [hep-th/9912272]. [124] P.-M. Ho, Twisted bundle on quantum torus and BPS states in matrix theory, Phys. Lett. B 434 (1998) 41 [hep-th/9803166]; D. Brace, B. Morariu, B. Zumino, Dualities of the matrix model from T-duality of the type II string, Nucl. Phys. B 545 (1999) 157 [hep-th/9810099]; C. Hofman, E. Verlinde, Gauge bundles and Born–Infeld on the noncommutative torus, Nucl. Phys. B 547 (1999) 157 [hep-th/9810219]; B. Pioline, A. Schwarz, Morita equivalence and T-duality (or B versus Θ), J. High Energy Phys. 9908 (1999) 021 [hep-th/9908019]. [125] A. Schwarz, Morita equivalence and duality, Nucl. Phys. B 534 (1998) 720 [hep-th/9805034]. [126] P. van Baal, B. van Geemen, A simple construction of twist eating solutions, J. Math. Phys. 27 (1986) 455; D.R. Lebedev, M.I. Polikarpov, Extrema of the twisted Eguchi–Kawai action and the finite Heisenberg group, Nucl. Phys. B 269 (1986) 285.
[127] J. Igusa, Theta Functions, Springer, Berlin, 1972. [128] M.A. Rieffel, A. Schwarz, Morita equivalence of multi-dimensional noncommutative tori, Int. J. Math. 10 (1999) 189 [math.QA/9803057]. [129] N.A. Obers, B. Pioline, U-duality and M-theory, Phys. Rep. 318 (1999) 113 [hep-th/9809039]. [130] A. Hashimoto, N. Itzhaki, On the hierarchy between noncommutative and ordinary supersymmetric Yang–Mills, J. High Energy Phys. 9912 (1999) 007 [hep-th/9911057]; S. Elitzur, B. Pioline, E. Rabinovici, On the short distance structure of irrational noncommutative gauge theories, J. High Energy Phys. 0010 (2000) 011 [hep-th/0009009]; Z. Guralnik, J. Troost, Aspects of gauge theory on commutative and noncommutative tori, J. High Energy Phys. 0105 (2001) 022 [hep-th/0103168]. [131] B. Morariu, B. Zumino, Super Yang–Mills on the noncommutative torus, in: R.E. Allen (Ed.), Relativity, Particle Physics and Cosmology, World Scientific, Singapore, 1999 [hep-th/9807198]. [132] E. Kim, H. Kim, C.-Y. Lee, Matrix theory compactification on noncommutative T^4/Z_2, J. Math. Phys. 42 (2001) 2677 [hep-th/0005205]. [133] M.A. Rieffel, Projective modules over higher-dimensional noncommutative tori, Can. J. Math. 40 (1988) 257. [134] L. Cornalba, D-Brane physics and noncommutative Yang–Mills theory, Adv. Theor. Math. Phys. 4 (2000) 271 [hep-th/9909081]; N. Ishibashi, A relation between commutative and noncommutative descriptions of D-Branes [hep-th/9909176]. B. Jurčo, P. Schupp, Noncommutative Yang–Mills from equivalence of star products, Eur. Phys. J. C 14 (2000) 367 [hep-th/0001032]. [135] T. Eguchi, H. Kawai, Reduction of dynamical degrees of freedom in the large N gauge theory, Phys. Rev. Lett. 48 (1982) 1063. [136] A. González-Arroyo, M. Okawa, A twisted model for large N lattice gauge theory, Phys. Lett. B 120 (1983) 174; A. González-Arroyo, M. Okawa, The twisted Eguchi–Kawai model: a reduced model for large N lattice gauge theory, Phys. Rev. D 27 (1983) 2397;
T. Eguchi, R. Nakayama, Simplification of the quenching procedure for large N spin models, Phys. Lett. B 122 (1983) 59; A. González-Arroyo, C.P. Korthals Altes, Reduced model for large N continuum field theories, Phys. Lett. B 131 (1983) 396. [137] Y.M. Makeenko, Reduced models and noncommutative gauge theories, J. Exp. Theor. Phys. Lett. 72 (2000) 393 [hep-th/0009028]; R.J. Szabo, Discrete noncommutative gauge theory, Mod. Phys. Lett. A 16 (2001) 367 [hep-th/0101216]. [138] M. Li, Strings from IIB matrices, Nucl. Phys. B 499 (1997) 149 [hep-th/9612222]; T. Curtright, D.B. Fairlie, C.K. Zachos, Integrable symplectic trilinear interaction terms for matrix membranes, Phys. Lett. B 405 (1997) 37 [hep-th/9704037]; D.B. Fairlie, Moyal brackets in M-theory, Mod. Phys. Lett. A 13 (1998) 263 [hep-th/9707190]; I. Bars, D. Minic, Noncommutative geometry on a discrete periodic lattice and gauge theory, Phys. Rev. D 62 (2000) 105018 [hep-th/9910091]; K.N. Anagnostopoulos, J. Nishimura, P. Olesen, Noncommutative string worldsheets from matrix models, J. High Energy Phys. 0104 (2001) 024 [hep-th/0012061]; J. Nishimura, M.A. Vázquez-Mozo, Noncommutative chiral gauge theories on the lattice with manifest star-gauge invariance, J. High Energy Phys. 0108 (2001) 033 [hep-th/0107110]. [139] K.G. Wilson, Confinement of quarks, Phys. Rev. D 10 (1974) 2445. [140] D.B. Fairlie, P. Fletcher, C.K. Zachos, Trigonometric structure constants for new infinite algebras, Phys. Lett. B 218 (1989) 203; D.B. Fairlie, P. Fletcher, C.K. Zachos, Infinite dimensional algebras and a trigonometric basis for the classical Lie algebras, J. Math. Phys. 31 (1990) 1088; D.B. Fairlie, C.K. Zachos, Infinite dimensional algebras, sine brackets and SU(∞), Phys. Lett. B 224 (1989) 101. [141] C. Sochichiu, Many vacua of IIB, J. High Energy Phys. 0005 (2000) 026 [hep-th/0004062]; T. Azuma, S. Iso, H. Kawai, Y. Ohwashi, Supermatrix models, Nucl. Phys. B 610 (2001) 251 [hep-th/0102168].
[142] E. Langmann, R.J. Szabo, Teleparallel gravity and dimensional reductions of noncommutative gauge theory, Phys. Rev. D 64 (2001) 104019 [hep-th/0105094].
[143] N. Ishibashi, S. Iso, H. Kawai, Y. Kitazawa, String scale in noncommutative Yang–Mills, Nucl. Phys. B 583 (2000) 159 [hep-th/0004038]. [144] F. Lizzi, R.J. Szabo, Noncommutative geometry and spacetime gauge symmetries of string theory, Chaos Solitons Fractals 10 (1999) 445 [hep-th/9712206]. [145] S. Iso, H. Kawai, Y. Kitazawa, Bi-local fields in noncommutative field theory, Nucl. Phys. B 576 (2000) 375 [hep-th/0001027]; Y. Kimura, Y. Kitazawa, Supercurrent interactions in noncommutative Yang–Mills and IIB matrix model, Nucl. Phys. B 598 (2001) 73 [hep-th/0011038]. [146] H. Bursztyn, S. Waldmann, Deformation quantization of hermitian vector bundles, Lett. Math. Phys. 53 (2000) 349 [math.QA/0009170]. [147] M.A. Rieffel, On the uniqueness of the Heisenberg commutation relations, Duke Math. J. 39 (1972) 745. [148] A. Connes, Noncommutative geometry and reality, J. Math. Phys. 36 (1995) 6194; A. Connes, Gravity coupled with matter and the foundation of noncommutative geometry, Commun. Math. Phys. 182 (1996) 155 [hep-th/9603053]. [149] N.H. Kuiper, The homotopy type of the unitary group of Hilbert space, Topology 3 (1965) 19. [150] E. Witten, Noncommutative tachyons and string field theory [hep-th/0006071]. [151] B. de Wit, J. Hoppe, H. Nicolai, On the quantum mechanics of supermembranes, Nucl. Phys. B 305 (1988) 545; M. Bordemann, J. Hoppe, The dynamics of relativistic membranes 1: reduction to two-dimensional fluid dynamics, Phys. Lett. B 317 (1993) 315 [hep-th/9307036]. [152] M.M. Sheikh–Jabbari, Discrete symmetries (C, P, T) in noncommutative field theories, Phys. Rev. Lett. 84 (2000) 5265 [hep-th/0001167]. [153] I. Mocioiu, M. Pospelov, R. Roiban, Low-energy limits on the antisymmetric tensor field background on the brane and on the noncommutative scale, Phys. Lett. B 489 (2000) 390 [hep-ph/0005191]; M. Chaichian, M.M. Sheikh–Jabbari, A. Tureanu, Hydrogen atom spectrum and the Lamb shift in noncommutative QED, Phys. Rev. Lett. 86 (2001) 2716 [hep-th/0010175];
S.M. Carroll, J.A. Harvey, V.A. Kostelecký, C.D. Lane, T. Okamoto, Noncommutative field theory and Lorentz violation, Phys. Rev. Lett. 87 (2001) 141601 [hep-th/0105082]. [154] A.P. Polychronakos, Noncommutative Chern–Simons terms and the noncommutative vacuum, J. High Energy Phys. 0011 (2000) 008 [hep-th/0010264]; D. Bak, K. Lee, J.-H. Park, Comments on noncommutative gauge theories, Phys. Lett. B 501 (2001) 305 [hep-th/0011244]. [155] A. Schwarz, Noncommutative instantons: a new approach, Commun. Math. Phys. 221 (2001) 433 [hep-th/0102182]. [156] S.G. Rajeev, Universal gauge theory, Phys. Rev. D 42 (1990) 2779; S.G. Rajeev, Embedding Yang–Mills theory into universal Yang–Mills theory, Phys. Rev. D 44 (1991) 1836. [157] V.G. Kac, Infinite-Dimensional Lie Algebras, Cambridge University Press, Cambridge, 1985. [158] R.S. Palais, On the homotopy type of certain groups of operators, Topology 3 (1965) 271. [159] V.P. Nair, A.P. Polychronakos, On level quantization for the noncommutative Chern–Simons theory, Phys. Rev. Lett. 87 (2001) 030403 [hep-th/0102181]; J.A. Harvey, Topology of the gauge group in noncommutative gauge theory [hep-th/0105242]. [160] J.A. Harvey, P. Kraus, F. Larsen, Exact noncommutative solitons, J. High Energy Phys. 0012 (2000) 024 [hep-th/0010060]. [161] C.-T. Chan, J.-C. Lee, Noncommutative solitons and the W_{1+∞} algebras in quantum Hall theory [hep-th/0107105]. [162] C.N. Pope, K.S. Stelle, SU(∞), SU_+(∞) and area-preserving algebras, Phys. Lett. B 226 (1989) 257. [163] J. Hoppe, P. Schaller, Infinitely many versions of SU(∞), Phys. Lett. B 237 (1990) 407. [164] M. Bordemann, J. Hoppe, P. Schaller, M. Schlichenmaier, GL(∞) and geometric quantization, Commun. Math. Phys. 138 (1991) 207. [165] G. Landi, F. Lizzi, R.J. Szabo, From large N matrices to the noncommutative torus, Commun. Math. Phys. 217 (2001) 181 [hep-th/9912130].
Physics Reports 378 (2003) 301 – 434 www.elsevier.com/locate/physrep
Classical and macroquantum dynamics of charged particles in a magnetic field
Ram K. Varma∗
Physical Research Laboratory, Ahmedabad 380 009, India
Received 1 December 2002; editor: J. Eichler
Abstract
The investigations of the dynamics of charged particles in a magnetic field carried out over more than 40 years are reviewed, with special reference to the problem of nonadiabaticity due to field inhomogeneity and time dependence. A detailed overview is presented of the standard approaches to one of the main problems, namely the determination of the residence times of charged particles in an adiabatic magnetic trap, which involves nonadiabaticity in a crucial way. In a major departure from the standard approach, the author advanced a new paradigm, described here as “macroquantum dynamics”, to address the problem of residence times. The evolution and development of this new paradigm is then presented as the main focus of the review. It consists of a probability amplitude, Schrödinger-like formalism for the classical macrodomain, which has been shown to describe the system in the correspondence limit of large Landau quantum numbers. It is demonstrated that this represents a remarkable persistence of matter wave behaviour well into the classical macrodomain, leading to unexpected experimental consequences. Experimental results confirming some of the spectacular predictions of this formalism are presented. These concern the existence of macroscopic matter wave interference phenomena and the observation of the curl-free vector potential, à la Aharonov–Bohm, in the macrodomain. The problem of the nonadiabatic leakage of particles from an adiabatic trap here takes the form of quantum-like tunneling through the adiabatic potential. The multiplicity of residence times predicted by the set of Schrödinger-like equations has been well confirmed by experiments. Finally, a critical comparison is presented of the classical and macroquantum descriptions of the system in the macrodomain. The new paradigm thus represents an entirely new and unexpected manifestation of quantum dynamics in the classical macrodomain.
© 2003 Elsevier Science B.V. All rights reserved.
PACS: 41.90.+e; 32.80.−t
∗ Fax: +91-79-630-1502. E-mail address: [email protected] (R.K. Varma).
0370-1573/03/$ - see front matter doi:10.1016/S0370-1573(03)00005-X
R.K. Varma / Physics Reports 378 (2003) 301 – 434
Contents
1. Introduction
1.1. Nonadiabaticity and quantum effects: a formal analogy and a new paradigm
1.2. An outline of the conventional approach: the stochastic diffusion model
2. Adiabatic invariants: some general considerations
2.1. The adiabatic invariant for a time dependent harmonic oscillator
2.2. Asymptotic phenomena in quantum mechanics
2.3. Adiabatic invariance to all orders
2.4. A “tighter” adiabatic invariant
3. Charged particle dynamics in an electromagnetic field: adiabatic invariants and adiabatic motion
3.1. Preliminaries: motion in a constant and homogeneous magnetic field
3.2. Motion in a time varying magnetic field
3.3. Adiabatic invariant for motion in a homogeneous but time dependent magnetic field
3.4. General case: time dependent and inhomogeneous magnetic field
3.4.1. Adiabatic invariants and adiabatic equation of motion: adiabatic traps
4. Nonadiabatic effects
4.1. Time dependent, spatially homogeneous magnetic field
4.1.1. Discontinuous changes in the frequency
4.1.2. Smooth changes in the frequency
4.2. Time independent, spatially inhomogeneous magnetic field
4.2.1. Analytical calculations for the jumps in the gyroaction—single transit nonadiabaticity
4.3. Jump in the gyroaction: a quantum mechanical approach
5. Long term non-adiabaticity of adiabatically confined systems
5.1. Nonadiabatic loss of particles from magnetic mirror traps: some early experimental results
5.2. Life time as an ensemble property
5.3. Stability of the adiabatically confined motion in the magnetic traps
5.3.1. The axisymmetric case
5.3.2. The nonaxisymmetric case
6. A wave mechanical Schrödinger-like description of long term nonadiabaticity: the new paradigm
6.1. Schrödinger-like equations: a heuristic derivation
6.1.1. Multiple residence times: experimental determination
6.2. Schrödinger-like equations as a Hilbert space representation of the classical Liouville equation
6.2.1. The Liouville equation for the evolution of the ensemble
6.2.2. Equations for the probability amplitudes for the ensemble
6.2.3. An analysis of the Schrödinger-like formalism and its observational ramifications
6.3. Residence times: experimental results and comparison with theoretical models
7. Observations of one-dimensional interference phenomena
7.1. Transmission characteristics of charged particles along a magnetic field with a retarding potential—existence of discrete energy states
7.2. Transmission characteristics of charged particles along a magnetic field with electron energy sweep: observations of discrete energy states and beats in the plate current
7.2.1. Analysis of the experimental data
7.2.2. The wave algorithm for the present experiment
7.2.3. Discussion
7.3. A quantum mechanical justification of the non-Planckian macroscopic matter wave behaviour of electrons along a magnetic field
8. The Schrödinger-like equations—a quantum mechanical derivation
8.1. A path integral representation for a charged particle in an inhomogeneous magnetic field and the derivation of the set of Schrödinger-like equations
8.1.1. The nature of the Schrödinger-like formalism
8.2. Observability of the curl free vector potential à la Aharonov–Bohm in the macro-domain
8.3. Guiding centre equations of motion—the adiabatic limit
8.4. Nonaxisymmetric magnetic field and the “longitudinal invariant”
8.5. Nonaxisymmetric magnetic field and the transition across magnetic surfaces
8.5.1. Transport across magnetic surfaces
8.6. Observation of the curl free vector potential in the classical macrodomain
9. Summarizing comments, discussion and future issues
9.1. Future issues
Acknowledgements
References
1. Introduction
The problem of the motion of charged particles in inhomogeneous magnetic fields is a fascinating one and has been studied for more than half a century. Indeed, in his pioneering studies almost 90 years ago, Störmer addressed this problem in connection with the motion of primary cosmic radiation (protons and ions) in the Earth’s magnetic dipole field, and with its then supposed relevance to understanding the polar aurora. These painstaking studies are described extensively in his book, The Polar Aurora [1]. Because the field inhomogeneity makes the problem essentially nonlinear, it is generally not amenable to analytical solution, and one must resort to numerical methods. Störmer [1] relied essentially on numerical techniques to delineate the orbits now known by his name.
It is appropriate to give the reader a motivation for the present review and for the main line of investigation that forms its core; this is provided in the remainder of this Introduction. The study of the motion of charged particles in a magnetic field has been extremely useful for its results in various applications, as pointed out below, but it is also fascinating in its own right, from both a physical and a mathematical point of view. Many interesting results have been obtained over the years that have not been tied together in a systematic review. We present here the most important of these results, which should serve as an introduction for the nonspecialist reader as well as an overview for the specialist. At the same time, a few problems represent a considerable mathematical challenge for the standard methods of tackling nonlinear problems such as these. Some new approaches have provided new perspectives and dimensions on these problems, while some questions still remain unanswered.
These will be discussed in this article along with a systematic review of relevant results obtained over the years.
Since the 1950s, charged particle motion has been studied more extensively because of its importance in investigations of magnetic confinement devices for fusion plasmas (mirror machines, stellarators, tokamaks and bumpy tori), as well as of space and astrophysical plasmas (the magnetospheric plasmas of the Earth and other planets, and of pulsars and other compact objects). Again, because of the inherent magnetic field inhomogeneity of fusion confinement devices and of space and astrophysical objects, the problem is inherently nonlinear and would suggest numerical techniques for its solution. However, numerical solutions are not convenient as inputs to further analyses of the equilibrium and stability of the plasmas in these systems. Therefore,
R.K. Varma / Physics Reports 378 (2003) 301 – 434
the emphasis in contemporary studies has been on an analytical understanding of the problem, even if it is only approximate. In most fusion confinement devices and astrophysical objects the magnetic inhomogeneities are weak; that is, the magnetic fields vary slowly in space (we shall define this more precisely later). This is the basis of a widely used and very useful approximation under which the motion has been studied for over 40 years. Because the magnetic field varies slowly, this is called the “adiabatic approximation”, and the reduced motion resulting from it the “adiabatic motion”. As will be discussed below, adiabaticity permits the introduction of approximate constants of motion known as “adiabatic invariants”.

These are “local” invariants. They differ from the global invariants that follow from a global symmetry of the system, such as axial or spherical symmetry, which leads to the strict conservation of the appropriate angular momentum. A “local” invariant, on the other hand, follows from an approximate symmetry in a small neighbourhood. The smaller the neighbourhood to which the motion is confined, the better the symmetry of the region of space that the motion spans, and the greater the accuracy with which the invariance holds. However, one of the degrees of freedom of the system may carry it from one region to another of a different local symmetry. This leads to a nonanalytic breakdown of the adiabatic invariance. Such “breakdowns” have been referred to as nonadiabatic effects, which will be discussed in Section 4. Even though they are approximate, the adiabatic invariants have proved extremely useful in understanding the dynamics of a plasma in essentially inhomogeneous magnetic field geometries, in both fusion and space systems.
The use of these invariants, for example, helps one reduce the complex nonlinear Lorentz equation to simpler equations that are easier to manipulate. They have also led to the introduction of concepts and paradigms that have played important roles. One such concept is that of a “magnetic mirror”, which has led to the paradigm of “mirror confinement”.

The adiabatic invariant associated with the gyro-motion of the charged particle is, in its most popular form, called the “magnetic moment invariant”. It is responsible for confining the charged particle to a magnetic field domain bounded on either side by regions of higher magnetic field. Such a system is referred to as an “adiabatic magnetic trap”. As the particle moves along the field line into the region of higher field, it is reflected, as if from a “mirror”, at a certain plane normal to the field line. Hence the names “magnetic mirror” and “mirror confinement”.

The adiabatic invariance is only approximate (to be defined more precisely later), since it stems from a local symmetry. Departures from it, however small, are therefore bound to occur as the particle moves from a region of one local symmetry to a region of another. Such breakdowns of invariance are referred to, as mentioned already, as nonadiabatic effects. Mirror confinement, which depends on the adiabatic invariance of the “magnetic moment”, is then leaky. Early experiments [2] indeed exhibited the “goodness” of the adiabatic invariance through the trapping of particles for about 10⁸ bounces. Nonadiabatic effects, in turn, have shown up as an exponential decay of the particle population in the magnetic trap, caused by such nonadiabatic leakage. The decay times have been measured as a function of the magnetic field, for a given energy and magnetic moment, for the ensemble of particles injected into the mirror trap [48,49].
These decay times can also be referred to as residence times against nonadiabatic leakage. A problem of considerable physical significance is to determine these residence times theoretically as a function of the various relevant parameters. This is one of the main problems addressed in this review, which will cover the approaches pursued to solve it. It may be pointed out that this problem presents a considerable mathematical challenge. As will be discussed, the nonadiabatic effects responsible for the leakage and the finite residence times are nonanalytic in the smallness parameter that characterizes the adiabaticity. This nonanalyticity has to be suitably extracted from the relevant equations, and the residence times calculated in a self-consistent formulation.

An important distinction is drawn in this review, however, between the properties of the motion of an individual particle and those of an ensemble. Such an ensemble has been called a “coherent system of trajectories” by Synge [3] and a “family” by Dirac [4]. There are, first, nonadiabatic effects relating to single particle dynamics; important investigations of these are reviewed in Sections 4.1 and 4.2. The problem of the residence times against nonadiabatic leakage, on the other hand, falls in the category of ensemble dynamics. A residence time (defined through an exponential decay of the particle population in the adiabatic trap) can only be legitimately defined for an ensemble, namely the ensemble of particles injected into the trap with a given energy and magnetic moment invariant. This aspect is discussed in Section 5.2.

1.1. Nonadiabaticity and quantum effects: a formal analogy and a new paradigm

Consider two relationships: that between the adiabatic motion and the exact particle motion in a magnetic field, and that between classical dynamics and quantum mechanics. An examination of the two reveals an interesting formal analogy between them.
The author has exploited this formal analogy to introduce a new paradigm for the problem of nonadiabaticity and its associated effects, in particular the nonadiabatic loss of particles from magnetic mirror traps. This subsection gives an outline and overview of the paradigm, which is developed in detail in Section 6. The evolution and development of this paradigm, and its unexpected and novel experimental manifestations, constitute the core subject of this review.

Mathematically and formally speaking, nonadiabaticity in charged particle dynamics is compared with quantum effects. The quantum departures from classical mechanics are known to be nonanalytic in the small (quantum) parameter ℏ², which multiplies the highest derivative ∇² in the Schrödinger equation. In the limit ℏ → 0 the Schrödinger equation is related to the Hamilton–Jacobi equation of classical mechanics. This presents a case of “singular” perturbation theory, and the connection is made through the WKB expansion, which is an asymptotic expansion. The adiabatic motion of charged particles in a magnetic field is similarly related to the exact motion. The former is obtained through a similar asymptotic expansion (see, for example, Northrop [5], Kruskal [6]), with the adiabaticity parameter ε, defined later, as the smallness parameter. The nonadiabatic effects are nonanalytic in ε in the same way as quantum effects are nonanalytic in ℏ. This analogy has been used to develop a new paradigm for dealing with nonadiabaticity (Varma [55,58]). In it, the nonadiabatic leakage of charged particles from the “adiabatic magnetic trap” is likened to the quantum tunneling of particles out of classical potential traps. Both are due to departures of a nonanalytic kind that cannot be expanded in the respective smallness parameters.
The question was thus posed whether the nonadiabatic leakage could also be described through a Schrödinger-like formalism, analogously to quantum tunneling. Both quantum tunneling and the nonadiabatic leakage of charged particles presuppose ensembles for the definition of residence times. A Schrödinger-like description, if one were possible, would therefore be an appropriate ensemble description. Such a description was in fact constructed by the author (Varma [55]) on intuitive-heuristic grounds, using the analogy above. It consisted of a set of Schrödinger-like equations, one-dimensional in the coordinate along the magnetic field line. In them, the gyroaction (denoted by μ) plays the role of the quantum of action ℏ, and the adiabatic potential takes the place of the potential in the quantum mechanical Schrödinger equation. This formulation will be presented in Section 6, along with its new and surprising predictions and their experimental verification.

The Schrödinger equation describes quantum tunneling through classical potential barriers as a departure from classical mechanics. In the same way, the Schrödinger-like equations of the new paradigm describe the nonadiabatic loss of particles from adiabatic traps as nonanalytic departures from the adiabatic equation of motion. Note that the classical equation of motion follows from the Schrödinger equation in the limit ℏ → 0. Likewise, the adiabatic equation of motion follows from the Schrödinger-like equations in the limit μ → 0, which is formally equivalent to ε → 0. This quantum-like formulation of the nonadiabatic behaviour of charged particles turned out to be extremely successful.
The set contains one Schrödinger-like equation for each integer n, in which μ/n plays the role of ℏ. Together these equations predicted a previously unsuspected multiplicity of residence times in adiabatic traps. This prediction applies to an ensemble specified by a δ-function in the energy E, in the initial value of the gyroaction μ, and in the other canonical momenta. The experimental results on particle residence times available at the time (1970) [48,49] were found to be well described by the equation for the mode n = 1. Experiments carried out subsequently [57] tested the predictions for n = 1, 2, 3, .... They confirmed the existence of further residence times for n = 2 and 3, with characteristics in complete agreement with the theory.

This came as a surprise on three counts. First, it was a pleasant surprise that a rather unconventional approach to nonadiabatic particle loss uncovered some unsuspected, novel features of charged particle dynamics in magnetic fields. Second, it was a surprise from the point of view of the conventional approach, in which such multiple residence times were entirely unexpected; that approach is reviewed in Section 5 for comparison. The third surprise is of a fundamental nature: a probability amplitude description, such as the Schrödinger-like one, would generally be considered an enigma for a classical mechanical system. Since it has nonetheless been found to work so well, it must point to something more fundamental, which will be discussed in detail in Sections 8 and 9. However, the derivation of Ref. [55] was heuristic. A firmer theoretical basis for these equations was desirable if one were to assign to them attributes of a fundamental nature.
The same set of equations was subsequently obtained [58] as a Hilbert space representation of the classical Liouville equation. The ensemble in question consists of the particles injected into the trap with δ-function distributions in the energy E, the initial gyroaction μ and the other momenta, defining a “coherent system of trajectories” [3] or a “family” [4]. Returning to the last point above, it ought to be emphasized that the system so successfully described by this probability amplitude, Schrödinger-like formalism belongs to the classical mechanical domain. This underlines a very significant recognition: a probability amplitude description can exist for a classical mechanical system as well.

These equations turn out to have far greater implications than the description of nonadiabatic particle loss from adiabatic traps by analogy with quantum tunneling. Because the formalism is built on probability amplitudes, Ref. [58] predicted that interference effects should arise in the transmission of charged particles along a magnetic field, in the macrodomain of classical mechanics. Such a prediction appears rather enigmatic, since no matter wave manifestations are expected in the macrodomain where classical mechanics is supposed to operate. Nevertheless, the predicted interference effects have indeed been observed. Varma and Punithavelu [60] found them as discrete allowed and forbidden energy bands in the transmission of electrons along a magnetic field, and matter wave beats have recently been observed as well [65]. This constitutes a further surprise, and indeed a paradox from the point of view of our present understanding of matter wave phenomena. Further considerations become necessary to resolve the paradox and to enlarge that understanding.

One can approach the problem in two ways. In the first, the resolution of the paradox is sought entirely within classical mechanics, the domain to which the system under discussion belongs. Recall that the Schrödinger-like equations were obtained in Ref. [58] as the Hilbert space representation of the classical Liouville equation, with the “coherent system of trajectories” defining the ensemble. They must therefore be considered to represent classical mechanics. But classical mechanics in the equation-of-motion/initial-value representation cannot in any way support the matter wave phenomena that the Schrödinger-like equations do. What, then, is the fundamental difference between these two representations of classical mechanics?
This issue is discussed in Section 9. There it is conjectured that classical mechanics may contain topological effects [61], not so far recognized, which may be responsible for the difference and which are somehow incorporated in the Schrödinger-like equations. In the second approach, one asks whether the Schrödinger-like equations of Ref. [58], which govern certain probability amplitudes, can be related to the Schrödinger equation of quantum mechanics. Very recently the author has demonstrated [64] how the probability amplitudes of the former are related to those of the latter; this connection is discussed in Section 8.

The demonstration gives a quantum mechanical interpretation of the mode number n that labels the various Schrödinger-like equations of Ref. [58], and hence the different residence times in an adiabatic trap. According to the derivation, the wave functions ψ^(n) governed by these equations are transition amplitudes. They connect a Landau level of large quantum number N to the Landau level N ± n, where N ≫ n ≥ 1 and the gyroaction is μ = Nℏ. No such physical interpretation was possible within the derivation of Ref. [58].

The demonstration also establishes something conceptually important. The Schrödinger-like equations of Refs. [58,64] form the set of wave equations in the quasi-classical limit (N ≫ 1). They thus describe matter wave behaviour in this limit, and therefore in the classical macrodomain. This provides a quantum mechanical rationale for the observed interference effects in the macrodomain. Furthermore, the quantum mechanical derivation of Ref. [64] allows the Schrödinger-like equations of Ref. [58] to be generalized to include a curl-free vector potential.
This opens the possibility of observing a curl-free vector potential in the classical macrodomain, in the manner of the Aharonov–Bohm effect. This is a rather spectacular consequence of the formalism. In the classical mechanical domain in which the formalism operates, a curl-free vector potential is not an observable: it enters the Lorentz equation of motion only through B = ∇ × A, which vanishes for curl-free A. We have nevertheless observed the effects of a curl-free vector potential in the sense of Aharonov–Bohm, as described in Section 8.6. As discussed there, this observation is even more dramatic because the effects were detected in the macrodomain, where classical dynamics normally operates. They thus appear to contravene the Lorentz equation. However, as we shall discuss in Sections 8 and 9, the issue requires a somewhat subtle understanding.

Thus, starting from an intuitive conjecture that nonadiabatic particle loss might be described by a Schrödinger-like equation analogous to quantum tunneling, we have developed a whole new paradigm. Its ramifications have gone far beyond its original objective, through theory and experiment advancing hand in hand. In fact, the set of Schrödinger-like equations of Ref. [64] has emerged as the matter wave aspect of the quantum mechanical Schrödinger equation in the classical limit for this particular system, which is a rather novel realization.

There has, of course, been a parallel conventional approach to the problem of nonadiabatic particle loss and the determination of residence times. It is only appropriate to present it in this review for comparison, and this is done in Section 5. In Section 1.2 we give a brief outline of this conventional approach (Chirikov [7,9,10]; Bernstein and Rowlands [8]). This will familiarize the reader with its basic elements, vis-à-vis the new paradigm overviewed in the present subsection.

1.2. An outline of the conventional approach: the stochastic diffusion model

By the conventional approach we mean the one most commonly subscribed to, which is largely due to Chirikov [7,9,10]. It is directed basically at the problem of nonadiabatic escape from magnetic mirror traps. One essential difference from our approach (Section 1.1) concerns the starting point. We start from the Liouville equation, which is linear in the Liouville density function even though the dynamical equations of motion are nonlinear. The linearity of the Liouville equation, or of its transform (such as the Schrödinger-like equations), makes calculations easier: the nonlinearity affects only the manner in which the Liouville density, or its transform, evolves linearly. Of course, the resulting paradigm has turned out to be much richer in content than its original objective would suggest. The conventional approach, by contrast, deals explicitly with the nonlinear dynamical equations, analysing them through suitably discretized mappings.

One of its basic elements is a result from the numerical experiments of Garren et al. [11]. They found jumps Δμ in the gyroaction whenever a particle moving in a mirror trap crosses the midplane, where the field goes through a minimum. The confining potential in the adiabatic trap is μΩ, where Ω is the gyrofrequency. Any change Δμ therefore raises or lowers μΩ, depending on the sign of Δμ. Suppose that after a series of transits across the midplane the cumulative change Δμ_t is such that (μ_i Ω_m − E) + Δμ_t Ω_m ≤ 0. Here μ_i is the initial value of μ and Ω_m is the maximum of the gyrofrequency, reached at the mirror throat. The potential is then lowered enough for the particle to escape the potential well.
This is the basic idea of the approach. The crucial and most difficult part, however, is to calculate the cumulative change Δμ_t. The difficulty arises because the magnitude and sign of each individual jump Δμ depend on the Larmor phase φ at the midplane crossing, through a factor sin φ [see Section 4.2.1 for an expression for Δμ]. To calculate Δμ_t at any given time t, one would have to keep track of the Larmor phase from one midplane crossing to the next. This is not easy, particularly when the phase changes rapidly during the motion. To circumvent this difficulty, the motion is split into two parts:

(i) the nonadiabatic change Δμ at each midplane crossing;
(ii) the rest of the motion between crossings, which is assumed to be adiabatic.

The change of phase Δφ from one crossing to the next is evaluated from the adiabatic motion, and this in turn gives the change Δμ at the next midplane crossing. The continuous evolution governed by the differential equations is thus replaced by a discrete mapping in the variables μ and φ at the midplane crossings. This mapping involves a parameter K. For K < 1 the mapping is shown to behave regularly: the changes in μ are bounded, and the particle may remain confined in the trap. For K > 1 the mapping is chaotic, and μ undergoes unbounded changes. In this regime a sufficiently large negative excursion in μ may lower the maximum potential μΩ_m enough for the particle to escape the initial potential well. The chaotic changes in μ are likened to a random walk in μ, and a diffusion coefficient D is defined [Section 5.3.1]. Solving the corresponding diffusion equation in μ with appropriate boundary conditions then gives the residence time in the trap. This model may be referred to as the “stochastic diffusion model”.
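The regular-versus-chaotic behaviour of a K-parametrized map can be illustrated with a short numerical sketch. The sketch does not reproduce the actual midplane map of Section 5. As an assumption made purely for illustration, it uses Chirikov's standard map, p → p + K sin θ and θ → θ + p, with p and θ as generic stand-ins for μ and φ. For this map the transition to global chaos occurs near K ≈ 0.97, close to the K = 1 threshold quoted above. The function name `standard_map_excursion` and the chosen values of K are hypothetical.

```python
import math


def standard_map_excursion(K, theta0, p0, n_steps):
    """Iterate Chirikov's standard map and return the largest |p - p0| reached.

    The map (used here only as a generic stand-in for a midplane map in (mu, phi)) is
        p_{k+1}     = p_k + K * sin(theta_k)
        theta_{k+1} = (theta_k + p_{k+1}) mod 2*pi
    p is deliberately NOT reduced mod 2*pi, so that unbounded drift stays visible.
    """
    theta, p = theta0, p0
    max_dev = 0.0
    for _ in range(n_steps):
        p += K * math.sin(theta)
        theta = (theta + p) % (2.0 * math.pi)
        max_dev = max(max_dev, abs(p - p0))
    return max_dev


if __name__ == "__main__":
    starts = (0.3, 1.1, 2.0, 2.9, 4.0)  # initial phases, all with p0 = 0
    for K in (0.5, 5.0):
        worst = max(standard_map_excursion(K, th, 0.0, 20000) for th in starts)
        print(f"K = {K}: largest |p - p0| over {len(starts)} orbits = {worst:.2f}")
```

For K = 0.5 every orbit stays in a narrow band of p, bounded by invariant (KAM) curves. For K = 5 the excursions grow roughly like the square root of the number of iterations. This random-walk behaviour is what motivates replacing the map by a diffusion equation.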
The reader will have noticed that the Schrödinger-like model of our new paradigm (Section 1.1) and the “stochastic diffusion model” outlined here are two widely different approaches. Their contents are discussed and compared in Sections 6.3 and 9. “Adiabatic invariance” and “nonadiabaticity” are basic concepts central to this review, and they are discussed in the following sections:

- adiabatic invariants, with some general considerations, in Section 2;
- adiabatic invariants and adiabatic motion for charged particles in electromagnetic fields in Section 3;
- nonadiabatic effects in Section 4.

Because nonadiabaticity is so important a concept, the discussion of nonadiabatic effects is rather extensive. Wherever possible, nonadiabatic effects are compared with quantum effects. In fact, as will be seen, some of the mathematical procedures used to calculate nonadiabatic effects are often similar to those used in quantum mechanics. This bias of presentation should be new to readers.

Before closing this section, we must remark on the symbols and notation used in this review. The available roman and greek symbols have proved insufficient for the large number of distinct quantities appearing here, so it has not been possible to give each quantity its own symbol, desirable as that would be. Instead, a “global” notation is used for certain basic quantities and remains the same throughout the review. There are also “local” notations whose meaning may vary from one section to another. The reader should be able to distinguish easily between these two kinds of notation.
We hope the reader will appreciate the author's compulsion in this matter.
2. Adiabatic invariants: some general considerations

Adiabatic invariants for charged particle dynamics in electromagnetic fields are discussed in Section 3. Before that, we undertake some general considerations pertaining to adiabatic invariants (AI). Historically, the concept of “adiabatic invariance” was already implicit in the works of Helmholtz [12] and Hertz [13]. Helmholtz discussed the adiabatic motion of a cyclical system under the slow variation of an external parameter. He did so in an attempt to provide a mechanistic basis for equilibrium thermodynamics, with entropy as the adiabatic invariant. Likewise, Hertz introduced the same concept in his exposition of mechanics.

It has often been stated that the concept of the adiabatic invariant originated with Einstein's answer to a question posed by Lorentz at the first Solvay Congress in 1911. The answer was that if a parameter of an oscillator is changed slowly, the energy E of the oscillator changes in direct proportion to its frequency ν. The works of Helmholtz and Hertz predate this question, so they must be regarded as the first originators of the concept. Lord Rayleigh also seems to have been aware, as early as 1902, that E/ν is invariant for a pendulum whose length is changed gradually. Adiabatic invariants later played an important role in the formulation of early quantum mechanics, particularly through the work of Ehrenfest [14]. Today they still provide a bridge between classical and quantum mechanics.

For charged particle motion, the adiabatic invariants were first introduced by Alfvén [15] in the 1940s, who derived them from physical considerations. More formal procedures were developed later by Hellwig [16], Northrop and Teller [17] and Kruskal [6]. Sen-Gupta [18] obtained some elegant results using the Hamilton–Jacobi formalism.
Consider a system that is generically a nonlinear oscillator, in one or many dimensions, with a Hamiltonian depending on a slowly varying parameter. Such a system would in general admit as many adiabatic invariants as it has degrees of freedom with quasi-periodic motion. In what follows we discuss the nature of the invariance and some of its mathematical aspects. In particular, we consider the asymptotic phenomena of mathematical physics to which it is related, and which also arise in the relationship between classical and quantum mechanics.

2.1. The adiabatic invariant for a time dependent harmonic oscillator

We first discuss the simplest example: the adiabatic invariant of a harmonic oscillator with a time-varying frequency. This will serve as a paradigm for the subsequent discussion. Let the Hamiltonian of a one-dimensional oscillator be

$$H = \frac{p^2}{2m} + \frac{1}{2}\, m\,\omega^2(\varepsilon t)\, q^2 , \tag{2.1}$$

where ω is the frequency and ε is a smallness parameter that measures how slowly ω varies through its time dependence ω(εt). According to the theory of adiabatic invariants (see, for instance, Landau and Lifshitz [27]), the action associated with the cyclical motion of the oscillator is then an “adiabatic invariant”:

$$J = \oint p\, dq = \text{(adiabatic) invariant} . \tag{2.2}$$
For the harmonic oscillator it can be shown to be

$$J = E/\omega , \tag{2.3}$$

where E is the total energy of the particle at a given instant of time. (Strictly, $\oint p\,dq = 2\pi E/\omega$; the constant factor 2π does not affect the invariance and is dropped here.) There is another, more formal way of obtaining this result. The equation of motion for the oscillator described by the Hamiltonian (2.1) is

$$\ddot q + \omega^2(\varepsilon t)\, q = 0 . \tag{2.4}$$
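Before the formal treatment, the invariance of J = E/ω can be checked numerically by integrating Eq. (2.4) directly. The sketch below is only an illustration and rests on several assumptions: unit mass, a fixed-step fourth-order Runge–Kutta integrator, and a hypothetical smooth profile ω(τ) = 1.5 + 0.5 tanh τ. This profile rises from about 1 to about 2 around τ = εt = 0. The function names `omega` and `evolve` are likewise hypothetical.

```python
import math


def omega(tau):
    # Hypothetical smooth frequency profile: rises from ~1 to ~2 around tau = 0.
    return 1.5 + 0.5 * math.tanh(tau)


def evolve(eps, tau_span=5.0, dt=0.01):
    """Integrate q'' + omega(eps*t)**2 * q = 0 (unit mass) with classical RK4
    from t = -tau_span/eps to t = +tau_span/eps.

    Returns (E0, J0, E1, J1): the energy E = v**2/2 + omega**2 * q**2/2 and
    J = E/omega at the start and at the end of the run.
    """
    t = -tau_span / eps
    q, v = 1.0, 0.0  # arbitrary initial condition

    def accel(q_, t_):
        w = omega(eps * t_)
        return -w * w * q_

    def energy_and_action(q_, v_, t_):
        w = omega(eps * t_)
        e = 0.5 * v_ * v_ + 0.5 * w * w * q_ * q_
        return e, e / w

    E0, J0 = energy_and_action(q, v, t)
    n_steps = int(round(2.0 * tau_span / eps / dt))
    for _ in range(n_steps):
        k1q, k1v = v, accel(q, t)
        k2q, k2v = v + 0.5 * dt * k1v, accel(q + 0.5 * dt * k1q, t + 0.5 * dt)
        k3q, k3v = v + 0.5 * dt * k2v, accel(q + 0.5 * dt * k2q, t + 0.5 * dt)
        k4q, k4v = v + dt * k3v, accel(q + dt * k3q, t + dt)
        q += dt * (k1q + 2.0 * k2q + 2.0 * k3q + k4q) / 6.0
        v += dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        t += dt
    E1, J1 = energy_and_action(q, v, t)
    return E0, J0, E1, J1


if __name__ == "__main__":
    for eps in (0.02, 2.0):  # slow (adiabatic) versus fast variation of omega
        E0, J0, E1, J1 = evolve(eps)
        print(f"eps = {eps}: E1/E0 = {E1 / E0:.4f}, J1/J0 = {J1 / J0:.6f}")
```

For ε = 0.02 the energy roughly doubles while J is unchanged to within the integration error. For ε = 2 the frequency changes almost suddenly, and J can change by an amount of order unity, depending on the oscillator phase at the transition.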
If we introduce the variable τ = εt, we get

$$\varepsilon^2 q'' + \omega^2(\tau)\, q = 0 , \tag{2.5}$$

where the small parameter ε² multiplies the highest derivative, q'' ≡ d²q/dτ². Note that for ε = 0 the order of the differential equation drops to zero. Mathematically this is a case of “singular” perturbations and asymptotic phenomena. The same structure is common to such diverse physical phenomena as boundary layers, geometrical optics, shadow boundaries and shock waves, as well as the skin effect in the flow of currents.

One may ask what the properties of the motion governed by Eq. (2.5) are in the limit ε → 0. Such questions have been handled by employing formal series solutions. Poincaré proved in 1886 that these series represent asymptotic expansions of actual solutions (see, for example, Wasow [19] for a comparatively recent and rigorous development of the subject). Such series do not in general converge; they are asymptotic rather than convergent, and their validity is in general nonuniform. The formal series employed are of the form

$$u = e^{S(z)/\varepsilon} \sum_n \varepsilon^n v_n(z) , \tag{2.6}$$
where u stands for q or any other function obeying a similar differential equation, and z for the independent variable. A formal series of the type (2.6) is said to represent the asymptotic expansion of the function u(z, ε) if, after the terms up to Nth order, the remainder is $O(\varepsilon^{N+1})$; that is,

$$e^{-S(z)/\varepsilon}\, u(z, \varepsilon) = \sum_{n=0}^{N} \varepsilon^n v_n(z) + O(\varepsilon^{N+1}) . \tag{2.7}$$
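The difference between an asymptotic and a convergent series can be seen in a standard textbook example, which is not taken from this review. The Stieltjes (Euler) integral I(ε) = ∫₀^∞ e^(−t)/(1 + εt) dt has the asymptotic expansion Σₙ (−1)ⁿ n! εⁿ. It is obtained by expanding 1/(1 + εt) in powers of εt and integrating term by term. For fixed N the remainder is O(ε^(N+1)), as in (2.7), yet for fixed ε the series diverges. The sketch below compares partial sums with a numerical quadrature of I(ε); the function names are hypothetical. The error is smallest near N ≈ 1/ε and grows without bound beyond it.

```python
import math


def stieltjes_integral(eps, t_max=60.0, n=6000):
    """I(eps) = integral_0^inf exp(-t) / (1 + eps*t) dt by composite Simpson's rule.

    The interval is cut at t_max; the neglected tail is smaller than exp(-t_max).
    n (the number of sub-intervals) must be even.
    """
    def f(t):
        return math.exp(-t) / (1.0 + eps * t)

    h = t_max / n
    s = f(0.0) + f(t_max)
    for k in range(1, n):
        s += (4.0 if k % 2 else 2.0) * f(k * h)
    return s * h / 3.0


def partial_sum(eps, N):
    """Truncated asymptotic series sum_{n=0}^{N} (-1)^n n! eps^n."""
    return sum((-1) ** n * math.factorial(n) * eps ** n for n in range(N + 1))


if __name__ == "__main__":
    eps = 0.1
    exact = stieltjes_integral(eps)
    for N in (2, 5, 9, 15, 25):
        print(f"N = {N:2d}: |S_N - I| = {abs(partial_sum(eps, N) - exact):.2e}")
```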
Writing down a similar expansion for q in Eq. (2.5),

$$q = e^{S(\tau)/\varepsilon} \sum_n \varepsilon^n y_n(\tau) , \tag{2.8}$$
we substitute in (2.5) and equate terms of zeroth and first order in ε. This gives

$$S'^2 + \omega^2 + \varepsilon^2 \frac{y_0''}{y_0} = 0 , \tag{2.9a}$$

$$2S' y_0' + y_0 S'' + y_1 \left(S'^2 + \omega^2\right) = 0 . \tag{2.9b}$$

From (2.9a), neglecting the term $\varepsilon^2 y_0''/y_0$, we obtain

$$S = \pm i \int \omega\, d\tau , \tag{2.10a}$$
and from (2.9b):

$$\frac{d}{d\tau}\left(y_0^2 S'\right) = \pm i\, \frac{d}{d\tau}\left(y_0^2\, \omega\right) = 0 , \tag{2.10b}$$

whence

$$y_0^2\, \omega = 2J \quad (\text{const}) . \tag{2.10c}$$
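The step from (2.9b) to (2.10b) and (2.10c) can be spelled out as follows. At lowest order, (2.9a) gives S'² + ω² = 0, so the y₁ term in (2.9b) drops out. What remains, multiplied by y₀, is an exact derivative:

$$
\begin{aligned}
0 &= y_0\left(2S' y_0' + y_0 S''\right) = \frac{d}{d\tau}\left(y_0^{2} S'\right), \\
S' &= \pm i\,\omega \;\Longrightarrow\; \frac{d}{d\tau}\left(y_0^{2}\,\omega\right) = 0 \;\Longrightarrow\; y_0^{2}\,\omega = 2J .
\end{aligned}
$$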
With (2.10a) to (2.10c), the solution q [Eq. (2.8)] to lowest order in ε is

$$q = \sqrt{\frac{2J}{\omega}}\, \sin\!\left(\int \omega\, dt + \varphi_0\right) , \tag{2.11}$$

where J, defined by (2.10c), is a constant of motion to lowest order, and φ₀ is a constant phase. Is this the same adiabatic invariant J as in (2.3)? To check, we compute from (2.11) the energy E of the oscillator at time t, to lowest order and for unit mass:

$$E = \tfrac{1}{2}\dot q^2 + \tfrac{1}{2}\omega^2 q^2 = J\omega , \tag{2.12}$$
which is the same relationship as (2.3) defining J. It is interesting and instructive, however, to see how this invariance emerges from the formal procedure as an asymptotic approximation.

2.2. Asymptotic phenomena in quantum mechanics

Quantum mechanics is related to classical mechanics through an asymptotic relationship of the same kind. We briefly recall this relationship because we wish to exploit an analogy. On one side is the relationship of exact charged particle motion to adiabatic motion; on the other is that of quantum to classical mechanics. Indeed, Eq. (2.5) has an analogue in the Schrödinger equation, whose highest derivative (here the spatial derivative ∇²) is likewise multiplied by the small parameter ℏ²:

$$\frac{\hbar^2}{2m} \nabla^2 \psi + (E - V)\, \psi = 0 . \tag{2.13}$$
We have chosen to consider a stationary state (ψ ∼ e^(−iEt/ℏ)). The asymptotic representation in this case corresponds to the well-known WKB form of the solution for ψ:

$$\psi = A\, e^{iS/\hbar} , \tag{2.14}$$
where both S and A may be developed into power series in ℏ. Substituting (2.14) into (2.13) and separating the real and imaginary parts yields the well-known equations (see, for example, Landau and Lifshitz [20])

$$\frac{1}{2m} (\nabla S)^2 + (V - E) - \frac{\hbar^2}{2m}\, \frac{\nabla^2 A}{A} = 0 \tag{2.15a}$$

and

$$\nabla \cdot \left( \frac{A^2\, \nabla S}{m} \right) = 0 . \tag{2.15b}$$
In the limit ℏ → 0, (2.15a) reduces to the Hamilton–Jacobi equation of classical mechanics, with S as the “principal function”. Eq. (2.15b), on the other hand, is the stationary-state continuity equation for the probability density ρ = A²:

$$\nabla \cdot (\rho\, \mathbf{v}) = 0 , \tag{2.15c}$$

with v = ∇S/m the velocity field. Eqs. (2.15a) and (2.15b) correspond to Eqs. (2.9a) and (2.10b) for the time dependent oscillator. Classical mechanics can thus be regarded as an adiabatic approximation to quantum mechanics, inasmuch as the approximation presupposes a slow spatial variation of the potential V. However, (2.15b) is exact under the ansatz (2.14).

2.3. Adiabatic invariance to all orders

Neglecting the term ∼ ε² in Eq. (2.9a) to obtain (2.10a), and hence (2.12), amounts to assuming that ω varies slowly in time. The adiabatic invariance of J defined through (2.12) then holds to order ε². The statement of the theorem of “adiabatic invariance” needs to be made more precise, however. Following Chandrasekhar [21], we state it as follows. Let the Hamiltonian of the system be a function of a time dependent parameter a(t), which varies such that

$$\max \left| \dot a / a \right| \le T^{-1} , \tag{2.17}$$
with T being an appropriate time. Further let the derivatives of a satisfy the condition that d n a=dt n → 0; as t → ±∞
for all n ¿ 1 :
(2.18)
Consider then the quantity

$$\Delta J = J_+ - J_- = J(t\to\infty) - J(t\to-\infty) , \qquad (2.19)$$

being the difference of the values of J as t tends to plus and minus infinity, respectively. The limit of an infinitely slow variation of a is approached formally by letting T → ∞. If in this limit ΔJ → 0, we say that J is an adiabatic invariant. The slowness of the variation of a can be expressed by letting the time dependence be of the form

$$a = a(\epsilon t) = a(\tau) , \qquad \tau = \epsilon t , \qquad (2.20)$$
where ε is the smallness parameter. Then T → ∞ is equivalent to ε → 0. If a positive constant M can be found such that the change ΔJ satisfies

$$|\Delta J| < \epsilon^n M \qquad (2.21)$$

(n a positive integer) for all sufficiently small ε, we say that J is an adiabatic invariant to the nth order. If this is true for every n, J is said to be an adiabatic invariant to all orders. Kulsrud [22] has given an expression for the relative change of the adiabatic invariant of a harmonic oscillator in terms of the discontinuity in a derivative of the time-dependent frequency:

$$\frac{\Delta J}{J} = -\frac{2\Delta_{n+1}}{(2\omega)^{n+2}}\,\sin\!\left(2\varphi - \tfrac12 n\pi\right) , \qquad (2.22)$$
where $\Delta_{n+1}$ is the jump in the (n+1)th derivative of the frequency as a function of t, and φ is the phase of the oscillator at the point of discontinuity. This result will be obtained in Section 4.1.1 by an alternative procedure. It follows from this expression that ΔJ = 0 if all the time derivatives of ω are continuous; that is, J is constant to all orders. This result was extended to the nonlinear oscillator by Lenard [23] using the Liouville equation. The latter has the advantage of being a linear equation even for a nonlinear system, and can therefore be handled more easily. In fact, the phase-space dynamics of a Hamiltonian system, which is governed by a linear partial differential equation (namely the Liouville equation), is formally equivalent to the Hamiltonian dynamics of an individual system, governed in general by a nonlinear ordinary differential equation. This fact will form the basis of the discussion in Section 6.

2.4. A "tighter" adiabatic invariant

Before we close this discussion of the adiabatic invariance of a time-dependent harmonic oscillator, we obtain what amounts to a tighter adiabatic invariant for it. The result is due to Rosenbluth [24], though we present here a more formal procedure than the one he employed. Again consider the equation

$$\ddot q + \omega^2(t)\,q = 0 \qquad (2.23)$$
for the oscillator with a slowly time-varying frequency ω = ω(t). As before, let T be the characteristic time of the variation of the frequency, so that $\epsilon = (\omega T)^{-1}$ is the smallness parameter. The Lagrangian for the motion is

$$L = \tfrac12\dot q^2 - \tfrac12\omega^2 q^2 , \qquad (2.24)$$

and the action is

$$S = \int dt\left(\tfrac12\dot q^2 - \tfrac12\omega^2 q^2\right) . \qquad (2.25)$$

As a first step, change t to τ, defined by

$$\tau = \int\omega\,dt , \qquad (2.26)$$

so that

$$\dot q = \omega\frac{dq}{d\tau} = \omega q' , \qquad (2.27)$$

where q′ denotes the derivative with respect to τ. The action in terms of τ becomes

$$S = \int d\tau\,\omega\left(\tfrac12 q'^2 - \tfrac12 q^2\right) , \qquad (2.28)$$

with the Lagrangian

$$L = \tfrac12\,\omega\,(q'^2 - q^2) . \qquad (2.29)$$
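Next, change the dependent variable so that the term ωq² in the Lagrangian is written as $q^{(1)2}$:

$$q^{(1)2} = q^2\,\omega . \qquad (2.30)$$

Then the action S is given by

$$S = \int d\tau\,\frac12\left[q^{(1)\prime 2} - \frac{\omega'}{\omega}\,q^{(1)}q^{(1)\prime} + \frac14\frac{\omega'^2}{\omega^2}\,q^{(1)2} - q^{(1)2}\right] . \qquad (2.31)$$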
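The term $(\omega'/\omega)\,q^{(1)}q^{(1)\prime} = \frac{d}{d\tau}(\ln\omega)\,\frac{d}{d\tau}\left(\tfrac12 q^{(1)2}\right)$ in the integrand of (2.31) can be integrated by parts (discarding the boundary term), so that (2.31) transforms to

$$S = \int d\tau\,\frac12\left\{q^{(1)\prime 2} - q^{(1)2}\left[1 - \frac12\frac{d^2}{d\tau^2}(\ln\omega) - \frac14\left(\frac{d\ln\omega}{d\tau}\right)^2\right]\right\} , \qquad (2.32)$$

and the Lagrangian in terms of the new variables $q^{(1)}$ and τ is

$$L^{(1)} = \tfrac12 q^{(1)\prime 2} - \tfrac12\,\omega^{(1)2} q^{(1)2} , \qquad (2.33)$$

with

$$\omega^{(1)2} = 1 - \frac12\frac{d^2}{d\tau^2}(\ln\omega) - \frac14\left(\frac{d\ln\omega}{d\tau}\right)^2 . \qquad (2.34)$$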
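As an illustration of (2.34), the short Python sketch below (not part of the original analysis) evaluates $\omega^{(1)2}$ for an assumed frequency profile ω(t) = 1.5 + 0.5 tanh(εt), chosen only for illustration. Using $d/d\tau = \omega^{-1}d/dt$, Eq. (2.34) can be rewritten as $\omega^{(1)2} = 1 - \ddot\omega/(2\omega^3) + \tfrac34\dot\omega^2/\omega^4$; for this profile both correction terms are proportional to ε², so halving ε should reduce the maximum deviation of $\omega^{(1)2}$ from unity by a factor of four.

```python
import math

def omega1_squared(w, w_dot, w_ddot):
    """omega^(1)^2 of Eq. (2.34), rewritten with d/dtau = (1/omega) d/dt."""
    dlog = w_dot / w**2                              # d(ln omega)/dtau
    d2log = w_ddot / w**3 - 2.0 * w_dot**2 / w**4    # d^2(ln omega)/dtau^2
    return 1.0 - 0.5 * d2log - 0.25 * dlog**2

def max_deviation(eps, n=2001):
    """max |omega^(1)^2 - 1| for the assumed profile omega(t) = 1.5 + 0.5 tanh(eps t)."""
    worst = 0.0
    for k in range(n):
        s = -10.0 + 20.0 * k / (n - 1)   # slow time s = eps t; the same grid for every eps
        th = math.tanh(s)
        sech2 = 1.0 - th * th
        w = 1.5 + 0.5 * th
        w_dot = 0.5 * eps * sech2        # d omega / dt
        w_ddot = -eps**2 * sech2 * th    # d^2 omega / dt^2
        worst = max(worst, abs(omega1_squared(w, w_dot, w_ddot) - 1.0))
    return worst

for eps in (0.2, 0.1, 0.05):
    print(f"eps = {eps:4.2f}: max |omega1^2 - 1| = {max_deviation(eps):.3e}")
```

The profile is sampled on a fixed grid in the slow time εt, so the comparison between different values of ε isolates the ε² scaling rather than grid effects.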
The Lagrangian $L^{(1)}$ thus describes the dynamics of an oscillator with coordinate $q^{(1)}$ in the "time variable" τ, with the time-dependent frequency $\omega^{(1)}$. The corresponding Hamiltonian is

$$H^{(1)} = \tfrac12 p^{(1)2} + \tfrac12\,\omega^{(1)2} q^{(1)2} . \qquad (2.35)$$
Note that $\omega^{(1)2}$ differs from unity only by the terms $d^2\ln\omega/d\tau^2$ and $(d\ln\omega/d\tau)^2$, which are of order $(\omega T)^{-2}$, i.e. $O(\epsilon^2)$. Therefore the "energy" represented by the Hamiltonian $H^{(1)}$ is invariant to $O(\epsilon^2)$. If we transform $H^{(1)}$ back to the variables q, t, we obtain $H^{(1)}$ (divided by $\omega^{(1)}$), numerically, as

$$\frac{H^{(1)}}{\omega^{(1)}} = \frac{\omega^{(1)}}{\omega}\left[\frac12(\omega^{(1)})^{-2}\dot q^2 + \frac12\omega^2 q^2 + \frac12(\omega^{(1)})^{-2}q\dot q\,\frac{d}{dt}\ln\omega + \frac18(\omega^{(1)})^{-2}q^2\left(\frac{d}{dt}\ln\omega\right)^2\right] . \qquad (2.36)$$

Note that the left-hand side (LHS), $H^{(1)}/\omega^{(1)}$, represents the adiabatic invariant $J^{(1)}$ for the oscillator described by the Hamiltonian $H^{(1)}$, and is invariant to $O(\epsilon^2)$. The procedure carried out above in Eqs. (2.24)–(2.33) can be repeated any number n of times. To repeat it once more, define the independent variable

$$\tau_1 = \int\omega^{(1)}\,d\tau \qquad (2.37)$$

and the dependent variable $q^{(2)}$ through

$$q^{(2)2} = q^{(1)2}\,\omega^{(1)} . \qquad (2.38)$$
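This yields, as before, the action

$$S = \int d\tau_1\,\frac12\left[q^{(2)\prime 2} - \omega^{(2)2} q^{(2)2}\right] , \qquad (2.39)$$

where the prime now denotes $d/d\tau_1$, with

$$\omega^{(2)2} = 1 - \frac12\frac{d^2}{d\tau_1^2}\left(\ln\omega^{(1)}\right) - \frac14\left(\frac{d}{d\tau_1}\ln\omega^{(1)}\right)^2 . \qquad (2.40)$$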
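The corresponding Hamiltonian $H^{(2)}$ is then

$$H^{(2)} = \tfrac12 p^{(2)2} + \tfrac12\,\omega^{(2)2} q^{(2)2} . \qquad (2.41)$$

Note that, changing to the original time variable t (so that $d/d\tau_1 = (\omega\omega^{(1)})^{-1}d/dt$),

$$\omega^{(2)2} = 1 - \frac12\frac{1}{\omega\omega^{(1)}}\frac{d}{dt}\left(\frac{1}{\omega\omega^{(1)}}\frac{d}{dt}\ln\omega^{(1)}\right) - \frac14\left(\frac{1}{\omega\omega^{(1)}}\frac{d}{dt}\ln\omega^{(1)}\right)^2 . \qquad (2.42)$$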
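The gain in accuracy from one such transformation can be checked numerically. The sketch below integrates $\ddot q + \omega^2 q = 0$ with a fixed-step fourth-order Runge–Kutta scheme for the same assumed profile ω(t) = 1.5 + 0.5 tanh(εt), and records, along the trajectory, the spread (maximum minus minimum, relative to the mean) of the lowest-order invariant E/ω and of $J^{(1)} = H^{(1)}/\omega^{(1)}$ evaluated from (2.36), with $\omega^{(1)}$ taken from (2.34). The profile, the step size and ε = 0.05 are illustrative assumptions; one expects the spread of $J^{(1)}$ to be much smaller than that of E/ω.

```python
import math

EPS = 0.05  # assumed slowness parameter

def omega_and_derivatives(t):
    """Assumed profile omega(t) = 1.5 + 0.5 tanh(EPS t) and its first two time derivatives."""
    th = math.tanh(EPS * t)
    sech2 = 1.0 - th * th
    return 1.5 + 0.5 * th, 0.5 * EPS * sech2, -EPS**2 * sech2 * th

def invariants(t, q, q_dot):
    """Return (E/omega, J1), with J1 = H^(1)/omega^(1) written in q, t as in Eq. (2.36)."""
    w, w_dot, w_ddot = omega_and_derivatives(t)
    w1 = math.sqrt(1.0 - w_ddot / (2 * w**3) + 0.75 * w_dot**2 / w**4)  # Eq. (2.34)
    g = w_dot / w                                                      # d(ln omega)/dt
    J0 = (0.5 * q_dot**2 + 0.5 * w**2 * q**2) / w
    J1 = (w1 / w) * (0.5 * q_dot**2 / w1**2 + 0.5 * w**2 * q**2
                     + 0.5 * q * q_dot * g / w1**2 + q**2 * g**2 / (8 * w1**2))
    return J0, J1

def rhs(t, q, q_dot):
    w = omega_and_derivatives(t)[0]
    return q_dot, -w * w * q

def relative_spread(values):
    return (max(values) - min(values)) / (sum(values) / len(values))

def run(t_start=-200.0, t_stop=200.0, dt=0.01):
    """RK4 integration of q'' + omega^2 q = 0; returns relative spreads of E/omega and J1."""
    t, q, q_dot = t_start, 1.0, 0.0
    J0_values, J1_values = [], []
    for _ in range(int(round((t_stop - t_start) / dt))):
        k1 = rhs(t, q, q_dot)
        k2 = rhs(t + dt / 2, q + dt / 2 * k1[0], q_dot + dt / 2 * k1[1])
        k3 = rhs(t + dt / 2, q + dt / 2 * k2[0], q_dot + dt / 2 * k2[1])
        k4 = rhs(t + dt, q + dt * k3[0], q_dot + dt * k3[1])
        q += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        q_dot += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += dt
        J0, J1 = invariants(t, q, q_dot)
        J0_values.append(J0)
        J1_values.append(J1)
    return relative_spread(J0_values), relative_spread(J1_values)

spread_J0, spread_J1 = run()
print(f"spread of E/omega: {spread_J0:.2e}   spread of J^(1): {spread_J1:.2e}")
```

Both quantities coincide once ω becomes constant at the two ends; the comparison is only informative during the ramp, where E/ω oscillates at first order in ε.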
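Since

$$\omega^{(1)2} \approx 1 + O(\epsilon^2) , \qquad \omega^{(2)2} \approx 1 + O(\epsilon^4) , \qquad (2.43)$$

the Hamiltonian $H^{(2)}$ is invariant to order $O(\epsilon^4)$. Each such transformation leads to invariance to an order higher by a factor ε². Transforming the variables in $H^{(2)}$ to q and t, we obtain (numerically)

$$H^{(2)} = \frac12\left[\frac{1}{\omega\omega^{(1)}}\frac{d}{dt}\left(q\sqrt{\omega\omega^{(1)}}\right)\right]^2 + \frac12\,\omega^{(2)2}\,\omega\omega^{(1)}q^2 = \frac{\omega^{(1)}\omega^{(2)2}}{\omega}\left\{\frac12\left[(\omega^{(1)}\omega^{(2)})^{-2}\dot q^2 + \omega^2 q^2\right] + \frac12 q\dot q\,(\omega^{(1)}\omega^{(2)})^{-2}\frac{d}{dt}\ln(\omega\omega^{(1)}) + \frac18 q^2(\omega^{(1)}\omega^{(2)})^{-2}\left[\frac{d}{dt}\ln(\omega\omega^{(1)})\right]^2\right\} . \qquad (2.44)$$

Now from the transformations (2.26) and (2.37) it follows that

$$q = \frac{q_0^{(2)}(\tau_1)}{\sqrt{\omega\omega^{(1)}}}\,\sin\!\left(\int\omega\omega^{(1)}\omega^{(2)}\,dt + \varphi\right) = q_0(t)\,\sin\!\left(\int\omega\omega^{(1)}\omega^{(2)}\,dt + \varphi\right) , \qquad (2.45)$$

so that

$$\tfrac12(\omega^{(1)}\omega^{(2)})^{-2}\dot q^2 + \tfrac12\omega^2 q^2 = \tfrac12\omega^2 q_0^2(t) = E(t) . \qquad (2.46)$$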
Furthermore, $H^{(2)}/\omega^{(2)}$ represents the action for the oscillator described by the Hamiltonian $H^{(2)}$, and is therefore its adiabatic invariant, which is invariant to order ε⁴ at any time. If we therefore divide (2.44) by $\omega^{(2)}$, average it over a period of the motion (the $q\dot q$ term averages to zero, and $\langle q^2\rangle = E/\omega^2$), and make use of (2.45) and (2.46), we obtain

$$J^{(2)} = \frac{H^{(2)}}{\omega^{(2)}} = \omega^{(1)}\omega^{(2)}\,\frac{E}{\omega}\left\{1 + \frac18(\omega\omega^{(1)}\omega^{(2)})^{-2}\left[\frac{d}{dt}\ln(\omega\omega^{(1)})\right]^2\right\} . \qquad (2.47)$$

$J^{(2)}$ is thus an adiabatic invariant to order ε⁴. If transformations similar to (2.37) and (2.38) are repeated n times, we get the adiabatic invariant
$$J^{(n)} = \omega^{(1)}\omega^{(2)}\cdots\omega^{(n)}\,\frac{E}{\omega}\left\{1 + \frac18(\omega\omega^{(1)}\cdots\omega^{(n)})^{-2}\left[\frac{d}{dt}\ln(\omega\omega^{(1)}\cdots\omega^{(n-1)})\right]^2\right\} , \qquad (2.48)$$

which is an invariant to order $\epsilon^{2n}$, and to all orders as ε → 0. On the other hand, the function $J^{(\infty)}$ is an invariant to all orders for any ε < 1. If, however, ε → 0 at t = ±∞, then the quantity E/ω is invariant to all orders. Eq. (2.36) represents the result (2.48) for n = 1, so that $J^{(1)}$ is an adiabatic invariant to $O(\epsilon^2)$ at any given time.

3. Charged particle dynamics in an electromagnetic field: adiabatic invariants and adiabatic motion

3.1. Preliminaries: motion in a constant and homogeneous magnetic field

The motion of charged particles in an electromagnetic field is governed by the Lorentz equation of motion

$$\frac{d^2\mathbf r}{dt^2} = \frac{d\mathbf r}{dt}\times\boldsymbol\Omega + \frac{e}{m}\mathbf E , \qquad (3.1)$$

where

$$\boldsymbol\Omega = \frac{e\mathbf B}{mc} \qquad (3.2)$$

is the vector gyrofrequency,
and the other symbols have their usual meaning, E and B being the electric and magnetic fields, e the elementary charge and m the mass of the particle. Both E and B are, in general, functions of space and time, which makes Eq. (3.1) a nonlinear equation in the coordinate r of the particle. The simplest motion corresponds to a static homogeneous magnetic field $\mathbf B = B_0\hat{\mathbf e}_z$. It consists of a uniform motion along the magnetic field with the initial velocity $v_{\parallel 0}$,

$$z = v_{\parallel 0}\,t + z_0 , \qquad (3.3a)$$

and a circular motion perpendicular to the magnetic field described by

$$x = \frac{v_{\perp 0}}{\Omega}\cos(\Omega t + \theta_0) , \qquad (3.3b)$$

$$y = -\frac{v_{\perp 0}}{\Omega}\sin(\Omega t + \theta_0) , \qquad (3.3c)$$

where $v_{\perp 0}$ is the magnitude of the particle velocity perpendicular to the magnetic field and $\theta_0$ is the initial phase of the circular motion, generally referred to as the Larmor motion or gyro-motion. (The minus sign in (3.3c) follows from (3.1) for Ω > 0: the gyration is clockwise about $\hat{\mathbf e}_z$.)
Also, the quantity $v_{\perp 0}/\Omega$ is the radius of the Larmor motion, and is generally referred to as the Larmor radius:

$$\rho_\perp = v_\perp/\Omega . \qquad (3.4)$$
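For orientation, Eqs. (3.2) and (3.4) can be evaluated numerically. The sketch below works in Gaussian units, as the text does; the rounded constants, the function names and the example particles are assumptions made only for illustration.

```python
E_CHARGE = 4.803e-10    # elementary charge [statC] (rounded)
M_ELECTRON = 9.109e-28  # electron mass [g] (rounded)
M_PROTON = 1.673e-24    # proton mass [g] (rounded)
C_LIGHT = 2.998e10      # speed of light [cm/s] (rounded)

def gyrofrequency(B, mass, charge=E_CHARGE):
    """Omega = e B / (m c), Eq. (3.2); B in gauss, mass in grams, result in rad/s."""
    return charge * B / (mass * C_LIGHT)

def larmor_radius(v_perp, B, mass, charge=E_CHARGE):
    """rho_perp = v_perp / Omega, Eq. (3.4); v_perp in cm/s, result in cm."""
    return v_perp / gyrofrequency(B, mass, charge)

print(f"electron in 1 G:      Omega = {gyrofrequency(1.0, M_ELECTRON):.3e} rad/s")  # about 1.76e7
print(f"proton in 1e4 G:      Omega = {gyrofrequency(1.0e4, M_PROTON):.3e} rad/s")
print(f"proton, v_perp = 1e7 cm/s, 1e4 G: rho = {larmor_radius(1.0e7, 1.0e4, M_PROTON):.3f} cm")
```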
Since Ω remains constant in the above example, the Larmor radius also remains constant during the motion. The case of a static homogeneous magnetic field considered above is the simplest one, and quite trivially has the exact solution given above. But it serves as the base, or unperturbed, solution for most other cases of inhomogeneous and/or time-dependent magnetic fields, whenever the inhomogeneity or the time dependence can be treated as a perturbation.

3.2. Motion in a time varying magnetic field

If the time-varying or inhomogeneous part $\tilde{\mathbf B}$ of the magnetic field is small compared with the basic constant field $\mathbf B_0$ ($\mathbf B = \mathbf B_0 + \tilde{\mathbf B}(\mathbf x, t)$, $|\tilde{\mathbf B}| \ll |\mathbf B_0|$), then the perturbed motion can be obtained by standard perturbation theory, either in the action-angle formalism (see, for example, Arnold [25]) or equivalently in the Hamilton–Jacobi formalism (ter Haar [26]). However, this has limited applicability, for in many cases of interest the condition $|\tilde{\mathbf B}| \ll |\mathbf B_0|$ may not be satisfied. In a large number of cases of practical interest (though not all), the magnetic field is a slowly varying function of space and/or time, in the sense defined below. This circumstance enables one to invoke an approximation procedure of a different nature from standard perturbation theory: it depends not on the smallness of the perturbation $\tilde{\mathbf B}$ but on the smallness of its first (and higher) spatial and/or temporal derivatives, measured with respect to an appropriate spatial or temporal scale, so that the spatial and temporal variations are slow enough. For spatial variation, the condition for slowness of variation is

$$\delta_x \equiv \frac{\rho_\perp}{B}|\nabla B| \ll 1 , \qquad (3.5a)$$

while for time variation the corresponding condition is

$$\delta_t \equiv \frac{2\pi}{\Omega B}\left|\frac{\partial B}{\partial t}\right| \ll 1 , \qquad (3.5b)$$
where $\rho_\perp$ and Ω are, respectively, the Larmor radius and Larmor frequency. $\delta_x$ and $\delta_t$ will be referred to as the spatial and temporal adiabaticity parameters, respectively. The condition $\delta_x \ll 1$ implies that the Larmor radius is small compared with the characteristic length of variation of the magnetic field, $L \equiv (|\nabla B|/B)^{-1}$. Likewise, $\delta_t \ll 1$ implies that the Larmor period 2π/Ω is small compared with the characteristic time of variation of the magnetic field, $T \equiv \left(B^{-1}|\partial B/\partial t|\right)^{-1}$. There is yet another adiabaticity parameter, $\delta_\parallel$, with the corresponding adiabaticity condition

$$\delta_\parallel = \frac{v_\parallel}{\Omega B}|\nabla B| \ll 1 , \qquad (3.5c)$$

which is associated with the parallel motion of the particle and reflects the time variation of the magnetic field as seen by the particle moving with velocity $v_\parallel$ along the inhomogeneous field line. Both conditions (3.5a) and (3.5c) must be satisfied to ensure adiabaticity with respect to the spatial variation. The approximate treatment of the motion under conditions (3.5a)–(3.5c) is referred to as the "adiabatic approximation". It has its origin in the theory of "adiabatic invariants" in classical mechanics (see, for example, Landau and Lifshitz [27]).
3.3. Adiabatic invariant for motion in a homogeneous but time dependent magnetic field

The motion is governed by Eq. (3.1), with Ω homogeneous in space but dependent on time. Because of the time dependence, this must be supplemented by the relevant Maxwell equations, namely

$$\nabla\cdot\mathbf E = 0 , \qquad (3.6a)$$

$$\nabla\times\mathbf E = -\frac1c\frac{\partial\mathbf B}{\partial t} , \qquad (3.6b)$$

since the electric field E in (3.1) must be related to the time-varying magnetic field. Take the homogeneous magnetic field to be in the z-direction, $\mathbf B = B\hat{\mathbf e}_z$. Writing Eq. (3.1) in Cartesian components and defining ξ = x + iy, a complex perpendicular coordinate, one obtains (3.1) as

$$\ddot\xi + i\Omega(t)\dot\xi + \tfrac12 i\dot\Omega\,\xi = 0 , \qquad (3.7)$$

where

$$\mathbf E = -\frac{1}{2c}(\hat{\mathbf e}_z\times\mathbf r)\frac{dB}{dt} \qquad (3.8)$$

has been used in (3.1) as the solution of (3.6a) and (3.6b). Introducing a new variable η through

$$\xi = \eta\,\exp\!\left(-\frac{i}{2}\int\Omega\,dt\right) , \qquad (3.9)$$

Eq. (3.7) gives

$$\ddot\eta + \tfrac14\Omega^2\eta = 0 . \qquad (3.10)$$

Eq. (3.10) is formally similar to the oscillator equation (2.4), except that η is now a complex variable.
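The oscillator form (3.10) implies that $J = E_\perp/\Omega$ is an adiabatic invariant, as derived below in (3.18). A minimal numerical check, under stated assumptions, integrates the complex equation (3.7) directly (m = 1, fourth-order Runge–Kutta) for an assumed ramp Ω(t) = 2 + tanh(εt), starting from a pure gyration centred on the axis, and returns the relative change of $E_\perp/\Omega$ between the two constant-field ends. The profile, ε and step size are illustrative choices, and the function names are hypothetical.

```python
import math

def gyro_freq(t, eps):
    """Assumed gyrofrequency ramp Omega(t) = 2 + tanh(eps t), rising from 1 to 3."""
    return 2.0 + math.tanh(eps * t)

def gyro_freq_dot(t, eps):
    return eps / math.cosh(eps * t) ** 2

def rhs(t, xi, v, eps):
    """Eq. (3.7) as a first-order system: xi' = v, v' = -i Omega v - (i/2) dOmega/dt xi."""
    return v, -1j * gyro_freq(t, eps) * v - 0.5j * gyro_freq_dot(t, eps) * xi

def relative_change_of_J(eps, dt=0.005):
    """Relative change of J = E_perp / Omega (m = 1) across the ramp."""
    t_end = 12.0 / eps                                  # tanh(12) differs from 1 by ~1e-10
    t = -t_end
    xi, v = 1.0 + 0.0j, -1j * gyro_freq(t, eps)         # unit circle about the axis
    J_start = 0.5 * abs(v) ** 2 / gyro_freq(t, eps)
    for _ in range(int(round(2 * t_end / dt))):
        k1 = rhs(t, xi, v, eps)
        k2 = rhs(t + dt / 2, xi + dt / 2 * k1[0], v + dt / 2 * k1[1], eps)
        k3 = rhs(t + dt / 2, xi + dt / 2 * k2[0], v + dt / 2 * k2[1], eps)
        k4 = rhs(t + dt, xi + dt * k3[0], v + dt * k3[1], eps)
        xi += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        v += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += dt
    J_end = 0.5 * abs(v) ** 2 / gyro_freq(t, eps)
    return (J_end - J_start) / J_start

slow = relative_change_of_J(0.05)
fast = relative_change_of_J(2.0)
print(f"slow ramp: {slow:.2e}   fast ramp: {fast:.2e}")
```

For the slow ramp the perpendicular energy grows roughly in proportion to Ω while J stays fixed to within the integration error; for a fast ramp a finite, phase-dependent change of J is generally expected.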
Ω is again assumed to be a slowly varying function of time. Writing $\Omega^2 = B^2/\epsilon^2c^2$, $\epsilon^2 \equiv (m/e)^2$, where ε is now formally a small parameter, turns Eq. (3.10) into

$$\epsilon^2\ddot\eta + \frac14\frac{B^2}{c^2}\,\eta = 0 , \qquad (3.11)$$

which is formally similar to (2.5). Employing an asymptotic expansion for η similar to (2.8),

$$\eta(t,\epsilon) = Z(t,\epsilon)\,e^{iS(t)/\epsilon} , \qquad (3.12)$$

with the proviso that Z be developed into a power series in ε, we obtain to lowest order

$$\left(\frac{dS}{dt}\right)^2 = \frac14\frac{B^2}{c^2} \qquad (3.13)$$

and

$$\frac{d}{dt}\left(Z^2\frac{dS}{dt}\right) = 0 . \qquad (3.14)$$

Again, under the ansatz (3.12), Eq. (3.14) is exact provided that the unapproximated expression for dS/dt, given by

$$\left(\frac{dS}{dt}\right)^2 = \frac14\frac{B^2}{c^2} + \epsilon^2\,\frac1Z\frac{d^2Z}{dt^2} , \qquad (3.15)$$
is used. However, as before, (3.14) gives the adiabatic invariant

$$Z^2\frac{dS}{dt} \equiv \frac12\frac{Z^2 B}{c} = \text{const.} \qquad (3.16)$$

Again, since the perpendicular energy of the motion is given by

$$E_\perp = \tfrac12 m\,\Omega^2 Z^2 , \qquad (3.17)$$

Eq. (3.16) gives

$$J = \frac{E_\perp}{\Omega} = \text{const (adiabatic)} . \qquad (3.18)$$
This expression is again the same as (2.3), or as (2.12), but with $E_\perp$ the total energy of the perpendicular motion of the charged particle.

3.4. General case: time dependent and inhomogeneous magnetic field

A generalization of the procedure of Section 3.3 to include, in general, an inhomogeneous magnetic field has been given by Kruskal [6], whose exposition has also been presented by Northrop [5]. In an inhomogeneous magnetic field the equation of motion (3.1) is a full three-dimensional vector equation. To deal with it, Kruskal writes down an asymptotic expansion of the form

$$\mathbf r = \sum_{n=-\infty}^{+\infty}\epsilon^{|n|}\,\mathbf R_n(t)\,\exp\!\left[\frac{in}{\epsilon c}\int B(\mathbf R_0)\,dt\right] , \qquad (3.19)$$

in which higher-harmonic terms are present for the various values of n ≥ 1, and where the $\mathbf R_n(t)$ are themselves power series in ε. Reality of r requires that $\mathbf R_{-n} = \mathbf R_n^*$, the complex conjugate of $\mathbf R_n$. $\mathbf R_0$ may be designated as the "guiding centre" of the particle. The asymptotic approximation to the complete equation (3.1) for r should then lead to the "reduced equation" for the guiding centre $\mathbf R_0$, along with the establishment of the "adiabatic invariants". Clearly, while the series (3.19) contains exponential terms of the form $e^{in\theta}$, with $\theta = (\epsilon c)^{-1}\int B(\mathbf R_0)\,dt$, it cannot be regarded as a Fourier series. Rather, it is a generalization of the asymptotic series (3.12) to include "higher-harmonic" terms, and should be interpreted as such. It is not the purpose of this review to present derivations of the adiabatic invariants of motion obtained elsewhere in the literature; one may refer to the excellent book by Northrop [5] for a detailed discussion and further references. But certain general remarks are still in order. First, an adiabatic invariant can be defined only for a bounded, quasi-periodic motion of a degree of freedom. Thus the number of adiabatic invariants admitted by a system is at most equal to the number of degrees of freedom.
Second, an adiabatic invariant is an asymptotic series of the form

$$J = J_0 + \epsilon J_1 + \epsilon^2 J_2 + \cdots , \qquad (3.20)$$

where J on the left-hand side is formally an exact constant of motion, while $J_0$ is the lowest-order expression for the adiabatic invariant and $J_1, J_2, \ldots$ are the first, second and higher corrections. It is $J_0$ that is generally referred to as the adiabatic invariant. The $J_n$ become increasingly complex with increasing n, and the series, being asymptotic, is not convergent: for a fixed number of terms, the smaller ε, the better the approximation. This series is similar to that for the Adelphi integral discussed by Whittaker [28], whose successive higher-order terms also become increasingly complex. There exists, however, a term (not shown in (3.20)) which is nonanalytic in ε, of the form ∼exp(−c/ε), in addition to the terms of the expansion in powers of ε; it vanishes faster than any power of ε. This nonanalytic term arises whenever the particle moves from one region to another through a point (in space or time) where either the field vanishes, or the first derivative of the field vanishes and the second derivative has an appropriate sign. Such a nonanalytic change of the adiabatic invariant is referred to as a "nonadiabatic change". These nonadiabatic changes have to be computed specifically for given situations. Such effects are important for the theme of this review and will be discussed in Section 4.

3.4.1. Adiabatic invariants and adiabatic equation of motion: adiabatic traps

Following Northrop [5], if one now uses expression (3.19) in the equation of motion (3.1), one obtains the equation of motion for the guiding centre $\mathbf R_0$,

$$\ddot{\mathbf R} = \frac{e}{m}\mathbf E(\mathbf R) + \dot{\mathbf R}\times\boldsymbol\Omega(\mathbf R) - \frac{\mu}{m}\nabla\Omega(\mathbf R) + O(\epsilon) , \qquad (3.21)$$
where the subscript '0' on R has been dropped for convenience, and where μ is the gyro-action of the gyrating particle (μ usually denotes the magnetic moment, but we introduce it here as the action). It is given in terms of the expansion coefficients in (3.19) by

$$\mu = \frac{cmB}{e}\,\mathbf R_{10}\cdot\mathbf R_{10}^* . \qquad (3.22)$$
$\mathbf R_{10}$ is the lowest-order term in the expansion of $\mathbf R_1$ ($\mathbf R_1 = \mathbf R_{10} + \epsilon\mathbf R_{11} + \cdots$). That (3.22) is indeed the gyro-action follows since it is easily seen that

$$(\dot{\mathbf r} - \dot{\mathbf R}_0)_\perp^2 = 2B^2\,\mathbf R_{10}\cdot\mathbf R_{10}^* = (\rho\Omega)^2 , \qquad (3.23)$$

$(\dot{\mathbf r} - \dot{\mathbf R})_\perp$ being the perpendicular gyration velocity in the frame of the guiding centre, so that $(\dot{\mathbf r} - \dot{\mathbf R})_\perp^2 = (\rho\Omega)^2$, where ρ is the Larmor radius. From (3.22) it then quickly follows that μ really gives the gyro-action, $\mu = \tfrac12 m(\dot{\mathbf r} - \dot{\mathbf R}_0)_\perp^2/\Omega = \tfrac12 m\rho^2\Omega$. Eq. (3.21) is now a "reduced" equation of motion for the guiding centre $\mathbf R_0$, in which the details of the gyration have been averaged over. It enables one to describe the motion of the guiding centre as if in an effective potential field V = μΩ, provided μ is an (adiabatic) invariant. This has been shown in this formalism by proving, more generally (for both time-dependent and inhomogeneous fields), that

$$\frac{d}{dt}\left(B\,\mathbf R_{10}\cdot\mathbf R_{10}^*\right) = 0 + O(\epsilon) . \qquad (3.24)$$
One may compare the form of Eq. (3.24) with that of (3.14) [read with (3.15)] and notice the similarity. The reduced equation of motion (3.21) is, of course, approximate, not only because it is correct only to lowest order in ε, but also because μ is invariant only to O(ε). Its consequences must likewise be approximate. However, it is an extremely useful equation, as it describes effects (even if only approximately valid) that would be far from obvious from the exact equation of motion. It is referred to as the "adiabatic equation of motion" for the guiding centre R. It can be projected into two components: one parallel to the magnetic field B(R), and the other perpendicular to it.
One can also determine the next term, $\mu_1$, of O(ε) in the gyro-action adiabatic-invariant series, as expressed by the expansion (3.20). This has been given by Kruskal, so that to O(ε²) the gyro-action invariant has the form

2 mv⊥ m ˆ · [(v × e) ˆ · ∇]B −j (v2 eˆ + vv · e) = 2B (v) 2 B2 1 2 +(eˆ · v)(∇ × B) · ; (3.25) v eˆ + 2v⊥ e˜ · v 2 ⊥

where $v_\perp$ is the instantaneous particle velocity perpendicular to $\hat{\mathbf e}(\mathbf r)$, $\hat{\mathbf e}(\mathbf r)$ being the unit magnetic field vector at the instantaneous position r of the particle. The second term in the above expression for μ is the first-order term, $\epsilon\mu_1$.

3.4.1.1. The "parallel" adiabatic equation of motion

The "parallel" equation of motion is given by

$$m\dot v_\parallel = eE_\parallel(\mathbf R) - \nabla_\parallel(\mu\Omega) . \qquad (3.26)$$
This is obtained by taking the dot product of Eq. (3.21) with the unit vector $\hat{\mathbf e}$ representing the direction of the field line at R. Eq. (3.26) represents a potential motion of the parallel guiding-centre coordinate $x_\parallel = \hat{\mathbf e}\cdot\mathbf R$ along the field line, in the potential μΩ, which is proportional to the field strength B. This therefore offers the possibility of trapping a charged particle in a "potential well" along the field line: a region of the field bounded on either side by regions of sufficiently high magnetic field. Taking $E_\parallel = 0$, the first integral of Eq. (3.26) is

$$E = \tfrac12 m v_\parallel^2 + \mu\Omega , \qquad (3.27)$$
where E, the total energy of the particle, is a constant of motion in the static field. The particle is trapped in the potential well if $E \le \mu\Omega_{\max}$ ($\Omega_{\max} = eB_{\max}/mc$), where $B_{\max}$ is the maximum of the magnetic field on either side along the field line on which the guiding centre moves. If the two maxima on either side are different, trapping is determined by the smaller of the two, $E \le \mu\,eB^{<}_{\max}/mc$. Such trapping, occasioned as it is by the adiabatic invariance of μ, is known as "adiabatic trapping", and the trapping potential well as the "adiabatic trap" or "mirror trap". For a given energy E and gyro-action μ, the point along the field line at which the particle is reflected is given by

$$E = \mu\,\frac{e\hat B}{mc} , \qquad (3.28)$$

at which point $v_\parallel = 0$, $\hat B$ being the value of B at the point of reflection. Such a point is referred to as the "mirror point" of the particle, at which it is said to be mirror-reflected; hence the name "mirror trap".
If α is the pitch angle of the particle velocity at a given point along the magnetic field line, defined by $(\mathbf v\cdot\hat{\mathbf e}) = v\cos\alpha$, then

$$\mu\Omega = \frac{m v_\perp^2}{2} = E\sin^2\alpha , \qquad (3.29)$$

with $E = \tfrac12 m v^2$. The condition for trapping of the particle in the adiabatic potential can then be expressed in the form

$$\sin\alpha_0 \ge \left(B_0/B^{<}_{\max}\right)^{1/2} , \qquad (3.30)$$

where $\alpha_0$ is the pitch angle at the position of the particle where the magnetic field is $B_0$, and $B^{<}_{\max}$ is the lesser of the two maxima of the magnetic field on either side of the field minimum. In particular, $B_0$ is usually taken to be the minimum of the field, at the midplane for a symmetric field. The angle $\alpha_c$, defined by $\sin\alpha_c = (B_0/B^{<}_{\max})^{1/2}$, is called the "loss cone angle", since all particles with $\alpha_0 \le \sin^{-1}(B_0/B^{<}_{\max})^{1/2}$ lie in the untrapped region and are lost. For a given energy E, on the other hand, reflection occurs at the point where the magnetic field is

$$\hat B = B_0/\sin^2\alpha_0 . \qquad (3.31)$$
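Eqs. (3.29)–(3.31) translate directly into a few lines of code. The sketch below (angles in degrees, fields in any common unit) returns the loss-cone angle, tests the trapping condition (3.30) using the lower of the two field maxima, and gives the field (3.31) at the mirror point; the function names and the example mirror are illustrative assumptions.

```python
import math

def loss_cone_angle(B0, B_max_lower):
    """Loss-cone angle alpha_c in degrees, sin(alpha_c) = (B0 / B_max^<)^(1/2), Eq. (3.30)."""
    return math.degrees(math.asin(math.sqrt(B0 / B_max_lower)))

def is_trapped(alpha0_deg, B0, B_max_left, B_max_right):
    """Trapping condition (3.30), using the lower of the two field maxima."""
    B_max_lower = min(B_max_left, B_max_right)
    return math.sin(math.radians(alpha0_deg)) >= math.sqrt(B0 / B_max_lower)

def mirror_field(alpha0_deg, B0):
    """Field at the mirror point, B_hat = B0 / sin^2(alpha0), Eq. (3.31)."""
    return B0 / math.sin(math.radians(alpha0_deg)) ** 2

# Hypothetical mirror: B0 = 1 at the midplane, maxima 4 and 5 on either side.
print(loss_cone_angle(1.0, min(4.0, 5.0)))   # about 30 degrees
print(is_trapped(45.0, 1.0, 4.0, 5.0))       # True
print(mirror_field(45.0, 1.0))               # reflection where B is about 2
print(is_trapped(20.0, 1.0, 4.0, 5.0))       # False: inside the loss cone
```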
The trapping of charged particles based on this principle has found wide application, both in plasma fusion devices, particularly those referred to as "mirror machines", and in well-known space phenomena such as the Van Allen belts. Budker [29] in Russia (the then USSR) and R.F. Post in the U.S. were the first to suggest such mirror trapping of charged particles.

3.4.1.2. The "perpendicular" equation of motion

Though we shall not be much concerned in this review with the drift motions perpendicular to the magnetic field, it is nevertheless appropriate, for the sake of completeness, to discuss briefly the content of the perpendicular components of Eq. (3.21), which are given by

$$\ddot{\mathbf R}_\perp = \frac{e}{m}\mathbf E_\perp + \dot{\mathbf R}_\perp\times\boldsymbol\Omega - \frac{\mu}{m}\nabla_\perp\Omega . \qquad (3.32)$$

Note that while $v_\parallel = \dot{\mathbf R}\cdot\hat{\mathbf e} = \dot R_\parallel$ is of O(1), being the motion along the field line and therefore unconstrained by the magnitude of B, $\dot{\mathbf R}_\perp$ is of O(ε), and $\ddot{\mathbf R}_\perp \sim O(\epsilon^2)$. Therefore the perpendicular equation (3.32) is to be solved not as a differential equation but as an algebraic equation for $\dot{\mathbf R}_\perp$. That gives

$$\dot{\mathbf R}_\perp = \frac{c\,\mathbf E\times\mathbf B}{B^2} + \frac{\boldsymbol\Omega}{\Omega^2}\times\left(\frac{\mu}{m}\nabla_\perp\Omega + \ddot{\mathbf R}_\perp\right) , \qquad (3.33)$$

where $\dot{\mathbf R}_\perp$ in (3.33) now represents the "drifts" of the guiding centre of the particle, which are all perpendicular to the magnetic field. Thus the parallel equation of motion (3.26), along with Eq. (3.33), which describes the "perpendicular" drifts of the guiding centre, represent the lowest-order results of the adiabatic theory. Needless to say, the adiabatic paradigm embodied in these results continues to play a crucial role in the study of plasma dynamics whenever the adiabaticity conditions are well satisfied, which is indeed the case in many practical situations.

3.4.1.3. The longitudinal adiabatic invariant

The confinement of the particles in the adiabatic mirror trap renders the motion along the line of force periodic, or at least quasi-periodic, as the
particle bounces off the mirrors at the two turning points. This offers the possibility of defining another adiabatic action invariant for this quasi-periodic motion,

$$J_\parallel = \oint p_\parallel\,dx_\parallel = \oint dx_\parallel\left[2m(E - \mu\Omega)\right]^{1/2} . \qquad (3.34)$$

This action will be an adiabatic invariant for sufficiently slow variation of the appropriate parameters of the problem. For example, if the distance L between the mirror points varies (slowly enough) because of some time dependence of the magnetic field, or because the particle drifts given by (3.33) transport the particle from one field line to another such that L varies slowly, then the "longitudinal action" $J_\parallel$ of (3.34) is an adiabatic invariant [5]. This adiabatic invariant, it may be mentioned, can be invoked to describe the Fermi acceleration of cosmic-ray particles between two approaching magnetized clouds.

4. Nonadiabatic effects

As mentioned in Section 3.4, there exists a nonanalytic term in the asymptotic series of an adiabatic invariant, of the form $\sim e^{-c/\epsilon}$, ε being the small adiabaticity parameter. It represents a change in the adiabatic invariant that cannot be expanded in powers of ε, and it arises as the system passes from one space-time point to another when the frequency of the quasi-periodic motion is an analytic function but its first derivative in space or time goes through a zero and the second derivative has an appropriate sign. Such a change in an adiabatic invariant is referred to as a nonadiabatic change, and the associated effects as nonadiabatic effects. When the frequency itself goes through a zero or has a jump, nonadiabatic effects occur a fortiori. When higher-order time derivatives of the frequency have discontinuities, they lead to higher-order nonadiabatic effects.
As has already been pointed out in the Introduction (Section 1) and in Section 2, there exists a formal analogy between the relationship of quantum to classical mechanics on the one hand, and that of exact charged-particle dynamics to adiabatic motion on the other. In fact, classical mechanics can be regarded as an "adiabatic approximation" to quantum mechanics. As discussed in Section 2, in both cases the relationship is asymptotic. We wish to highlight a similar analogy with respect to nonadiabatic effects. The nonadiabatic effects (the departures from adiabatic theory), which have the form $\sim e^{-c/\epsilon}$ for an analytic variation of the magnetic field, are likewise analogous to quantum effects, the departures from classical mechanics, which typically have the form $e^{-K/\hbar}$. Both ε and ħ (or a nondimensionalized form thereof) are the respective small parameters in the two cases. We shall often allude to this analogy in the following pages, and eventually use it to formulate our new paradigm in Section 6. We now undertake a review of the work on the calculation of individual-particle nonadiabaticity carried out over the last forty years. We shall consider two broad cases: (I) a purely time-dependent, but spatially homogeneous, magnetic field; (II) a spatially inhomogeneous, but static, magnetic field. A large part of the earliest work was concerned with Case I, starting with the works of Kulsrud [22], Chandrasekhar [30], Hertweck and Schlüter [31], Lenard [23] and Vandervoort [32], of Chirikov [7] and Dykhne [35] in the (then) Soviet Union, and later of Howard [34]. It is not the purpose of this review to offer a detailed account of these pioneering investigations. Instead, while presenting here an overview of the nature of these approaches, we endeavour to present calculational
procedures, partly based on the above works and partly obtained independently by us, in a manner which we hope is more systematic and pedagogic. It is our hope that this will constitute a more coherent account of the topic than a mere review of the original works could provide. The calculations relating to Case II were first reported in the work of A. Garren et al. [11] at the 2nd Geneva Conference on the Peaceful Uses of Atomic Energy in 1958. This and further work on this case by Dykhne and Chaplik [44], Hastie et al. [40], Cohen et al. [43] and others will be reviewed in Section 4.2.1. It may be mentioned that the calculations relating to this case (static, inhomogeneous magnetic field) aroused much greater interest from the point of view of applications than those relating to Case I. The work of the 1970s and 1980s was therefore much more concerned with Case II than with Case I.

4.1. Time dependent, spatially homogeneous magnetic field

It may be recalled that the dynamics of a charged particle in a purely time-dependent homogeneous magnetic field was shown to be governed by Eq. (3.10), which represents a harmonic oscillator for the complex coordinate η defined through (3.9), with the time-dependent frequency ω = Ω/2 (Ω being the gyrofrequency). Kulsrud [22] was the first to calculate the relative changes in the adiabatic invariant J = E/ω of a harmonic oscillator as a consequence of discontinuities in the derivatives of the frequency ω as a function of time. We shall present later a different (and somewhat simpler) derivation of Kulsrud's result. One may, in general, consider two classes of time dependence of the frequency of the oscillator: I. Discontinuous changes in the frequency or its first derivative. II. Smooth and slow changes in the frequency. We shall consider two specific forms of discontinuous change: (i) a step function and (ii) a δ-function change in the frequency.
However, to find the change in the action invariant for the charged particle, corresponding to Eq. (3.7), which is related to the oscillator equation (3.10) for the complex coordinate η, we need to carry out the analysis for a complex coordinate without imposing the reality condition η* = η. We shall do that in what follows. We shall, however, first specialize to the case of real η, corresponding to a time-dependent harmonic oscillator.

4.1.1. Discontinuous changes in the frequency

4.1.1.1. The harmonic oscillator. (i) A step-function discontinuity: Consider the one-dimensional oscillator described by (3.10) for the complex coordinate η,

$$\ddot\eta + \omega^2\eta = 0 , \qquad (4.1)$$

with ω = Ω/2, where

$$\omega = \omega_1 , \quad t \le t_1 ; \qquad \omega = \omega_2 , \quad t \ge t_1 . \qquad (4.2)$$
The solution to Eq. (4.1) is then

$$\eta_1 = \eta_+^{(1)}e^{i\omega_1 t} + \eta_-^{(1)}e^{-i\omega_1 t} , \quad t \le t_1 ; \qquad \eta_2 = \eta_+^{(2)}e^{i\omega_2 t} + \eta_-^{(2)}e^{-i\omega_2 t} , \quad t \ge t_1 . \qquad (4.3)$$
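These solutions $\eta_1$ and $\eta_2$ must be matched at $t = t_1$ by equating their values and their time derivatives there. This yields

$$\hat\eta_+^{(1)} + \hat\eta_-^{(1)} = \hat\eta_+^{(2)} + \hat\eta_-^{(2)} , \qquad (4.4a)$$

$$\omega_1\left(\hat\eta_+^{(1)} - \hat\eta_-^{(1)}\right) = \omega_2\left(\hat\eta_+^{(2)} - \hat\eta_-^{(2)}\right) , \qquad (4.4b)$$

where

$$\hat\eta_\pm^{(1)} = \eta_\pm^{(1)}e^{\pm i\omega_1 t_1} , \qquad \hat\eta_\pm^{(2)} = \eta_\pm^{(2)}e^{\pm i\omega_2 t_1} . \qquad (4.5)$$

Solving for $\hat\eta_\pm^{(2)}$ in terms of $\hat\eta_\pm^{(1)}$ gives

$$\hat\eta_+^{(2)} = \frac12\left(1 + \frac{\omega_1}{\omega_2}\right)\hat\eta_+^{(1)} + \frac12\left(1 - \frac{\omega_1}{\omega_2}\right)\hat\eta_-^{(1)} , \qquad \hat\eta_-^{(2)} = \frac12\left(1 - \frac{\omega_1}{\omega_2}\right)\hat\eta_+^{(1)} + \frac12\left(1 + \frac{\omega_1}{\omega_2}\right)\hat\eta_-^{(1)} . \qquad (4.6)$$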
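The energies $E^{(2)}$ and $E^{(1)}$ of the oscillator η at times $t \ge t_1$ and $t \le t_1$ are given by

$$E^{(2)} = \tfrac12|\dot\eta|^2 + \tfrac12\omega_2^2|\eta|^2 = \omega_2^2\left[|\eta_+^{(2)}|^2 + |\eta_-^{(2)}|^2\right] , \qquad (4.7a)$$

$$E^{(1)} = \omega_1^2\left[|\eta_+^{(1)}|^2 + |\eta_-^{(1)}|^2\right] . \qquad (4.7b)$$

Using (4.5)–(4.7) we get

$$E^{(2)} = \omega_2^2\left[\frac12\left(1 + \frac{\omega_1^2}{\omega_2^2}\right)\left(|\eta_+^{(1)}|^2 + |\eta_-^{(1)}|^2\right) + \frac12\left(1 - \frac{\omega_1^2}{\omega_2^2}\right)\left(\eta_+^{(1)*}\eta_-^{(1)}e^{-2i\omega_1 t_1} + \text{c.c.}\right)\right] . \qquad (4.8)$$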
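If we specialize to real η, corresponding to a harmonic oscillator ($\eta_+^{(1)*} = \eta_-^{(1)}$), so that $|\hat\eta_+^{(1)}|^2 = |\hat\eta_-^{(1)}|^2$, then $E^{(2)}$ becomes

$$E^{(2)} = \frac14(\omega_1^2 + \omega_2^2)\,\eta_0^2\left[1 + \frac{\omega_2^2 - \omega_1^2}{\omega_2^2 + \omega_1^2}\cos 2(\omega_1 t_1 + \varphi)\right] , \qquad (4.9)$$

where we have defined

$$\eta_-^{(1)} = \tfrac12\eta_0 e^{-i\varphi} , \qquad \eta_+^{(1)} = \tfrac12\eta_0 e^{i\varphi} . \qquad (4.10)$$

Using expression (4.7b) for $E^{(1)}$ and definition (4.10), we get

$$E^{(2)} = E^{(1)}\left[\cos^2(\omega_1 t_1) + \frac{\omega_2^2}{\omega_1^2}\sin^2(\omega_1 t_1)\right] , \qquad (4.11)$$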
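where we have chosen $\varphi = \pi/2$, which renders the solution $\zeta_1 = \zeta_0\sin\omega_1 t$ for $t \le t_1$ (up to an overall sign, which does not affect the energy). If there is another jump in frequency, from $\omega_2$ to $\omega_3$ at the time $t = t_2$, one similarly has

$$E^{(3)} = E^{(2)}\left[\cos^2(\omega_2 t_2 + \varphi) + \frac{\omega_3^2}{\omega_2^2}\sin^2(\omega_2 t_2 + \varphi)\right] = E^{(2)}g(\omega_3,\omega_2;t_2) = E^{(1)}g(\omega_3,\omega_2;t_2)\,g(\omega_2,\omega_1;t_1), \tag{4.12}$$

where $\varphi$ now denotes the phase constant of the oscillation $\zeta \propto \sin(\omega_2 t + \varphi)$ in the interval $t_1 < t \le t_2$.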
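The energy bookkeeping of (4.11)–(4.13) can be checked by propagating the piecewise solution exactly and comparing the result with the product of transfer functions. The Python sketch below does this for two jumps; the frequencies, switching times and amplitude are hypothetical test values, and the phase at each jump is read off from the state $(q, \dot q)$ rather than assumed.

```python
import math

def flow(q, p, w, dt):
    """Exact evolution of q'' + w**2 q = 0 over a time dt at constant w (p = dq/dt)."""
    c, s = math.cos(w * dt), math.sin(w * dt)
    return q * c + p * s / w, -q * w * s + p * c

def energy(q, p, w):
    return 0.5 * p * p + 0.5 * w * w * q * q

def transfer(w_new, w_old, phase):
    """g(w_new, w_old) of Eqs. (4.11)-(4.12) for q = A sin(phase) at the jump."""
    return math.cos(phase) ** 2 + (w_new / w_old) ** 2 * math.sin(phase) ** 2

# Hypothetical test values: three frequencies, two switching instants.
omegas = [1.0, 1.7, 0.6]
t_switch = [2.3, 5.1]
zeta0 = 0.8

# zeta_1 = zeta0 sin(omega_1 t): q(0) = 0, dq/dt(0) = zeta0 omega_1.
q, p, t = 0.0, zeta0 * omegas[0], 0.0
E_pred = energy(q, p, omegas[0])
for k, ts in enumerate(t_switch):
    q, p = flow(q, p, omegas[k], ts - t)    # evolve up to the jump
    t = ts
    phase = math.atan2(omegas[k] * q, p)    # q = A sin(phase), p = A w cos(phase)
    E_pred *= transfer(omegas[k + 1], omegas[k], phase)

# q and dq/dt are continuous across a step, so the direct energy is simply:
E_direct = energy(q, p, omegas[-1])

# Closed form (4.11) for the first jump alone, with phase omega_1 t_1.
E1 = 0.5 * (zeta0 * omegas[0]) ** 2
E2_closed = E1 * transfer(omegas[1], omegas[0], omegas[0] * t_switch[0])
q1, p1 = flow(0.0, zeta0 * omegas[0], omegas[0], t_switch[0])
E2_direct = energy(q1, p1, omegas[1])

print(E_pred, E_direct, E2_closed, E2_direct)
```

Any number of jumps can be chained in the same loop, which is the content of (4.13).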
In general,

$$E^{(n)} = E^{(0)}g_n(\omega_n,\omega_{n-1};t_{n-1})\,g_{n-1}(\omega_{n-1},\omega_{n-2};t_{n-2})\cdots g_1(\omega_1,\omega_0;t_0), \tag{4.13}$$

where $g_j$ is the transfer function

$$g_j = g_j(\omega_j,\omega_{j-1};t_{j-1}), \tag{4.14}$$

which has the obvious property that

$$g_j(\omega_j,\omega_{j-1};t_{j-1})\big|_{\omega_j = \omega_{j-1}} = 1. \tag{4.15}$$
Eq. (4.13) expresses the energy of the oscillator after a series of $n$ jumps in its frequency, to the values $\omega_n, \omega_{n-1}, \ldots, \omega_1$ at the times $t_{n-1}, t_{n-2}, \ldots, t_0$. Any time dependence of $\omega$ can in principle be constructed out of such a sequence. We shall, however, obtain its differential-equation equivalent. To do so, expand the right-hand side of (4.11) (for example) around $\omega_2 = \omega_1$, assuming $\Delta\omega_1 = \omega_2 - \omega_1$ to be small. This yields

$$\Delta E^{(1)} = 2E^{(1)}\frac{\Delta\omega_1}{\omega_1}\sin^2\omega_1 t_1 = E^{(1)}\frac{\Delta\omega_1}{\omega_1}\left(1 - \cos 2\omega_1 t_1\right), \tag{4.16}$$

where $\Delta E^{(1)} = E^{(2)} - E^{(1)}$. Now, defining the adiabatic action invariant

$$J = E/\omega \tag{4.17}$$
for the oscillator, (4.16) gives

$$\frac{d}{dt}(\ln J) = -\frac{d}{dt}(\ln\omega)\cos 2\theta, \tag{4.18}$$

where we have written $\theta = \int^t\omega\,dt$. If we recognize that (4.18) is one of Hamilton's equations of motion for the action-angle pair $(J,\theta)$, then it is easy to see that the other equation, for $\theta$, must be

$$\frac{d\theta}{dt} = \omega(t) + \frac{\dot\omega}{2\omega}\sin 2\theta. \tag{4.19}$$

Equations (4.18) and (4.19) are precisely those obtained by Vandervoort [32]. Eq. (4.18), or its earlier difference form (4.16), says that the relative change $\Delta J/J = \Delta E/E - \Delta\omega/\omega$ in the adiabatic invariant resulting from a jump $\Delta\omega$ in the frequency is

$$\frac{\Delta J}{J} = -\frac{\Delta\omega}{\omega}\cos 2\theta_0, \tag{4.20}$$

where $\theta_0$ is the phase of the oscillator at the time of the jump. To obtain the jumps in $J$ resulting from discontinuities in the time derivatives of $\omega$, we integrate (4.18) by parts, treating $\omega$ in the denominators as slowly varying. This yields

$$\ln J\Big|_0^t = -\left[\frac{d}{dt}(\ln\omega)\frac{\sin 2\theta}{2\omega} + \frac{d^2}{dt^2}(\ln\omega)\frac{\cos 2\theta}{(2\omega)^2} + \cdots + \frac{d^{n+1}}{dt^{n+1}}(\ln\omega)\frac{\sin(2\theta + n\pi/2)}{(2\omega)^{n+1}} + \cdots\right]_0^t. \tag{4.21}$$
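Because (4.18)–(4.19) are exact Hamilton equations for the pair $(J,\theta)$ (they follow from the generating function $F = \frac12\omega q^2\cot\theta$, with $q = \sqrt{2J/\omega}\sin\theta$ and $\dot q = \sqrt{2J\omega}\cos\theta$), integrating them should reproduce $J = E/\omega$ obtained from a direct integration of $\ddot q + \omega^2 q = 0$, for any $\dot\omega$. The Python sketch below checks this with a plain fourth-order Runge–Kutta integrator; the tanh frequency profile, the initial action and angle, and the step size are all hypothetical choices.

```python
import math

def omega(t):
    # Hypothetical smooth profile: omega goes from 0.5 (t -> -inf) to 1.5 (t -> +inf).
    return 1.0 + 0.5 * math.tanh(t)

def omega_dot(t):
    return 0.5 / math.cosh(t) ** 2

def rk4_step(f, t, y, dt):
    """One classical Runge-Kutta step for y' = f(t, y), with y a list of floats."""
    k1 = f(t, y)
    k2 = f(t + dt / 2, [a + dt / 2 * b for a, b in zip(y, k1)])
    k3 = f(t + dt / 2, [a + dt / 2 * b for a, b in zip(y, k2)])
    k4 = f(t + dt, [a + dt * b for a, b in zip(y, k3)])
    return [a + dt / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
            for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4)]

def oscillator(t, y):            # y = [q, dq/dt]
    return [y[1], -omega(t) ** 2 * y[0]]

def vandervoort(t, y):           # y = [ln J, theta], Eqs. (4.18)-(4.19)
    w, wd = omega(t), omega_dot(t)
    return [-(wd / w) * math.cos(2 * y[1]), w + wd / (2 * w) * math.sin(2 * y[1])]

t_start, t_end, n = -12.0, 12.0, 24000
dt = (t_end - t_start) / n
J0, theta0 = 1.0, 0.3            # hypothetical initial action and angle
w0 = omega(t_start)
y_osc = [math.sqrt(2 * J0 / w0) * math.sin(theta0), math.sqrt(2 * J0 * w0) * math.cos(theta0)]
y_aa = [math.log(J0), theta0]

t = t_start
for _ in range(n):
    y_osc = rk4_step(oscillator, t, y_osc, dt)
    y_aa = rk4_step(vandervoort, t, y_aa, dt)
    t += dt

q, qdot = y_osc
w_end = omega(t_end)
J_direct = (0.5 * qdot ** 2 + 0.5 * w_end ** 2 * q ** 2) / w_end
J_vandervoort = math.exp(y_aa[0])
print(J_direct, J_vandervoort)
```

Re-running with a different `theta0` changes the final action, which is the phase dependence expressed by (4.20).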
If now the first $N$ derivatives of $\omega$ are continuous and the $(N+1)$th derivative has a discontinuity of magnitude $Z_{N+1}$ at a time $t_N$, then the corresponding jump $\Delta J$ in $J$ is given by

$$\frac{\Delta J}{J} = -\frac{2Z_{N+1}}{(2\omega)^{N+2}}\sin\left(2\theta + \frac{N\pi}{2}\right). \tag{4.22}$$

This is essentially the result of Kulsrud [22]. It assumes that none of the first $N$ time derivatives of $\omega$ is discontinuous, the $(N+1)$th being the first discontinuous derivative, with $\theta$ being the phase at the time when the discontinuity occurs.

(ii) A δ-function frequency discontinuity: Consider next the harmonic oscillator (4.1) with a δ-function frequency change given by

$$\omega(t)^2 = \omega_0^2\left[1 + T\delta(t - t_0)\right], \tag{4.23}$$

so that $\omega = \omega_0$ both for $t < t_0$ and for $t > t_0$.
The first time derivatives of $\zeta$ will now not be continuous across the discontinuity. Writing the solutions as

$$\zeta_1 = \zeta_+^{(1)}e^{i\omega_0 t} + \zeta_-^{(1)}e^{-i\omega_0 t},\quad t < t_0;\qquad \zeta_2 = \zeta_+^{(2)}e^{i\omega_0 t} + \zeta_-^{(2)}e^{-i\omega_0 t},\quad t > t_0, \tag{4.24}$$
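we have, on matching $\zeta_1$ and $\zeta_2$ at $t = t_0$,

$$\tilde\zeta_+^{(1)} + \tilde\zeta_-^{(1)} = \tilde\zeta_+^{(2)} + \tilde\zeta_-^{(2)}, \tag{4.25}$$

where

$$\tilde\zeta_\pm^{(k)} = \zeta_\pm^{(k)}e^{\pm i\omega_0 t_0},\quad k = 1, 2. \tag{4.26}$$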
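To obtain the matching condition for the first time derivative, integrate Eq. (4.1), with $\omega^2(t)$ given by (4.23), across the discontinuity. This yields

$$\left[\dot\zeta_2 - \dot\zeta_1\right]_{t=t_0} = -\omega_0^2T\zeta(t_0). \tag{4.27}$$

Using (4.24), this gives

$$i\omega_0\left(\tilde\zeta_+^{(2)} - \tilde\zeta_-^{(2)}\right) - i\omega_0\left(\tilde\zeta_+^{(1)} - \tilde\zeta_-^{(1)}\right) = -\omega_0^2T\left(\tilde\zeta_+^{(1)} + \tilde\zeta_-^{(1)}\right). \tag{4.28}$$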
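Eqs. (4.25) and (4.28) then give the expressions for $\tilde\zeta_\pm^{(2)}$ in terms of $\tilde\zeta_\pm^{(1)}$:

$$\tilde\zeta_+^{(2)} = \left(1 + \tfrac{i}{2}\omega_0T\right)\tilde\zeta_+^{(1)} + \tfrac{i}{2}\omega_0T\,\tilde\zeta_-^{(1)}, \tag{4.29a}$$

$$\tilde\zeta_-^{(2)} = -\tfrac{i}{2}\omega_0T\,\tilde\zeta_+^{(1)} + \left(1 - \tfrac{i}{2}\omega_0T\right)\tilde\zeta_-^{(1)}. \tag{4.29b}$$

The energies $E_1$ and $E_2$ of the oscillator at times $t < t_0$ and $t > t_0$ (pre- and post-discontinuity) are given by

$$E_1 = \omega_0^2\left[|\zeta_+^{(1)}|^2 + |\zeta_-^{(1)}|^2\right], \tag{4.30a}$$

$$E_2 = \omega_0^2\left[|\zeta_+^{(2)}|^2 + |\zeta_-^{(2)}|^2\right]. \tag{4.30b}$$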
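Using expressions (4.29a) and (4.29b) for $\tilde\zeta_\pm^{(2)}$ in terms of $\tilde\zeta_\pm^{(1)}$, and taking $\zeta$ real with $\zeta_\pm^{(1)} = \frac12\zeta_0e^{\pm i\varphi}$ as in (4.10), we obtain the energy $E_2$ in terms of $E_1$:

$$E_2 = E_1\left\{1 + \tfrac12(\omega_0T)^2 + (\omega_0T)\sin 2(\omega_0t_0 + \varphi) + \tfrac12(\omega_0T)^2\cos 2(\omega_0t_0 + \varphi)\right\}, \tag{4.33}$$

whence, since $\omega$ is the same before and after the discontinuity,

$$\frac{J_2 - J_1}{J_1} = \frac{E_2 - E_1}{E_1} = \tfrac12(\omega_0T)^2 + (\omega_0T)\sin 2(\omega_0t_0 + \varphi) + \tfrac12(\omega_0T)^2\cos 2(\omega_0t_0 + \varphi). \tag{4.34}$$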
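Result (4.33)–(4.34), and the ensemble statement about it, can be checked in a few lines. Integrating (4.1) across $t_0$ leaves $\zeta$ continuous and kicks $\dot\zeta$ by $-\omega_0^2T\zeta(t_0)$, Eq. (4.27). The Python sketch below applies that kick to the real solution $\zeta = \cos(\omega_0t + \varphi)$ and averages the fractional change over equally spaced phases; the values of $\omega_0$, $T$ and $t_0$ are hypothetical.

```python
import math

def kick_ratio(w0, T, t0, phi):
    """E2/E1 for zeta = cos(w0 t + phi) kicked by omega^2 = w0^2 [1 + T delta(t - t0)]."""
    psi = w0 * t0 + phi
    q, p = math.cos(psi), -w0 * math.sin(psi)
    E1 = 0.5 * p * p + 0.5 * w0 * w0 * q * q
    p_after = p - w0 * w0 * T * q          # jump condition (4.27); q is continuous
    E2 = 0.5 * p_after ** 2 + 0.5 * w0 * w0 * q * q
    return E2 / E1

def ratio_433(w0, T, t0, phi):
    a, x = w0 * T, 2 * (w0 * t0 + phi)
    return 1 + 0.5 * a * a + a * math.sin(x) + 0.5 * a * a * math.cos(x)

w0, T, t0 = 1.3, 0.2, 0.7                  # hypothetical values
phases = [2 * math.pi * k / 64 for k in range(64)]
max_err = max(abs(kick_ratio(w0, T, t0, ph) - ratio_433(w0, T, t0, ph)) for ph in phases)
mean_dJ = sum(kick_ratio(w0, T, t0, ph) - 1 for ph in phases) / len(phases)
print(max_err, mean_dJ, 0.5 * (w0 * T) ** 2)
```

The phase-dependent terms average out over the equally spaced phases, leaving the positive mean $\frac12(\omega_0T)^2$, whereas the step-function result (4.20) averages to zero.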
If we now compare the fractional change (4.34) in the adiabatic invariant for the δ-function change in frequency with that obtained for the step-function change, we find the interesting distinction that the former has a phase-independent term $\frac12(\omega_0T)^2$, which is also independent of the sign of $T$, while for the step function the change $\Delta J/J$ [Eq. (4.20)] is determined entirely by the phase $\theta_0$ at the instant of the change. It therefore follows that, for an ensemble of oscillators with different initial phases, a δ-function change of frequency yields a non-zero ensemble average of $(J_2 - J_1)/J_1$, while a step-function change yields a vanishing average.

4.1.1.2. Charged particle in a magnetic field. Recall from Eqs. (3.7)–(3.10) that, since the (perpendicular) energy of gyration of the charged particle in a magnetic field is given by $\frac12v_\perp^2 = \frac12|\dot\xi|^2$, the gyroaction is given by

$$\mu = \frac{1}{2\Omega}|\dot\xi|^2. \tag{4.35}$$

Using expression (3.9) for $\xi$ in terms of $\zeta$, this gives

$$\mu = \frac{1}{\Omega}\left[\frac12|\dot\zeta|^2 + \frac18\Omega^2|\zeta|^2 + \frac{i}{4}\Omega\left(\dot\zeta\zeta^* - \zeta\dot\zeta^*\right)\right]. \tag{4.36}$$
(i) A step function discontinuity: If we now use the form of solution (4.3) for $t > t_0$ ($t_0$ being the instant of discontinuity), we obtain the gyroaction $\mu_2$ for $t > t_0$:

$$\mu_2 = \frac12\Omega_2|\zeta_-^{(2)}|^2, \tag{4.37}$$

where $\zeta$ has not been assumed to be real, and $\Omega_2$ is the gyrofrequency for $t > t_0$. Similarly, the gyroaction at times $t < t_0$ ($\mu = \mu_1$) is

$$\mu_1 = \frac12\Omega_1|\zeta_-^{(1)}|^2. \tag{4.38}$$
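The reduction of (4.36) to (4.37)–(4.38) can be verified numerically. The Python sketch below assumes the convention $\xi = \zeta\exp(-\frac{i}{2}\int\Omega\,dt)$, under which the gyration sits in the $\zeta_-$ component, takes hypothetical values of $\Omega$ and $\zeta_\pm$, and evaluates $\mu$ both from (4.36) and directly from (4.35).

```python
import cmath

# Hypothetical values: gyrofrequency and Larmor-frame amplitudes.
Omega = 2.0
w = Omega / 2.0                          # omega = Omega/2 in Eq. (4.1)
zp, zm = 0.3 + 0.4j, -0.7 + 0.2j         # zeta_+ and zeta_-

def zeta(t):
    return zp * cmath.exp(1j * w * t) + zm * cmath.exp(-1j * w * t)

def zeta_dot(t):
    return 1j * w * (zp * cmath.exp(1j * w * t) - zm * cmath.exp(-1j * w * t))

def mu_from_436(t):
    z, zd = zeta(t), zeta_dot(t)
    bracket = (0.5 * abs(zd) ** 2 + Omega ** 2 * abs(z) ** 2 / 8.0
               + 0.25j * Omega * (zd * z.conjugate() - z * zd.conjugate()))
    return bracket.real / Omega          # the imaginary part vanishes identically

def mu_from_435(t):
    # Assumed convention xi = zeta exp(-i Omega t / 2); mu = |xi_dot|^2 / (2 Omega).
    xi_dot = (zeta_dot(t) - 0.5j * Omega * zeta(t)) * cmath.exp(-0.5j * Omega * t)
    return abs(xi_dot) ** 2 / (2.0 * Omega)

mu_expected = 0.5 * Omega * abs(zm) ** 2  # Eqs. (4.37)-(4.38)
for t in (0.0, 0.7, 3.1):
    print(t, mu_from_436(t), mu_from_435(t), mu_expected)
```

The result is independent of time and of $\zeta_+$, which only sets the guiding-centre position.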
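Now, to obtain $\mu_2$ in terms of $\mu_1$, substitute for $\zeta_-^{(2)}$ in terms of $\zeta_\pm^{(1)}$ from (4.6), with $\omega_{1,2} = \Omega_{1,2}/2$ and $t_1 \to t_0$:

$$\mu_2 = \frac12\Omega_2|\zeta_-^{(2)}|^2 = \frac{1}{8\Omega_2}\left\{(\Omega_2 - \Omega_1)^2|\zeta_+^{(1)}|^2 + (\Omega_2 + \Omega_1)^2|\zeta_-^{(1)}|^2 + (\Omega_2^2 - \Omega_1^2)\left(\zeta_+^{(1)}\zeta_-^{(1)*}e^{i\Omega_1t_0} + \zeta_+^{(1)*}\zeta_-^{(1)}e^{-i\Omega_1t_0}\right)\right\}. \tag{4.39}$$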
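It may be noted that this corresponds, with appropriate identification, to the result given by Chandrasekhar [30], but is obtained here by a systematic procedure. Using expression (4.38) for $\mu_1$ we obtain

$$\frac{\mu_2}{\mu_1} = \frac{1}{4\Omega_1\Omega_2}\left[(\Omega_2 + \Omega_1)^2 + (\Omega_2 - \Omega_1)^2\left|\frac{\zeta_+^{(1)}}{\zeta_-^{(1)}}\right|^2 + (\Omega_2^2 - \Omega_1^2)\left\{\left(\frac{\zeta_+^{(1)}}{\zeta_-^{(1)}}\right)^*e^{-i\Omega_1t_0} + \frac{\zeta_+^{(1)}}{\zeta_-^{(1)}}e^{i\Omega_1t_0}\right\}\right], \tag{4.41}$$

$$\frac{\mu_2 - \mu_1}{\mu_1} = \frac{(\Omega_2 - \Omega_1)^2}{4\Omega_1\Omega_2}\left[1 + \left|\frac{\zeta_+^{(1)}}{\zeta_-^{(1)}}\right|^2\right] + \frac{\Omega_2^2 - \Omega_1^2}{4\Omega_1\Omega_2}\left\{\left(\frac{\zeta_+^{(1)}}{\zeta_-^{(1)}}\right)^*e^{-i\Omega_1t_0} + \frac{\zeta_+^{(1)}}{\zeta_-^{(1)}}e^{i\Omega_1t_0}\right\}. \tag{4.42}$$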
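Equations (4.39)–(4.42) follow from substituting (4.6) into (4.37). As a quick numerical check, the Python sketch below matches $\zeta$ and $\dot\zeta$ at $t_0$ with $\omega = \Omega/2$ and compares $\mu_2/\mu_1$ with (4.41); the gyrofrequencies, jump instant and pre-jump amplitudes are hypothetical values.

```python
import cmath

def match_step(zp1, zm1, w1, w2, t0):
    """Amplitudes after a step w1 -> w2 at t0 from continuity of zeta and d(zeta)/dt."""
    e1 = cmath.exp(1j * w1 * t0)
    z = zp1 * e1 + zm1 / e1
    zd = 1j * w1 * (zp1 * e1 - zm1 / e1)
    zp2 = 0.5 * (z + zd / (1j * w2)) * cmath.exp(-1j * w2 * t0)
    zm2 = 0.5 * (z - zd / (1j * w2)) * cmath.exp(1j * w2 * t0)
    return zp2, zm2

Om1, Om2, t0 = 1.0, 1.6, 0.9             # hypothetical gyrofrequencies and jump instant
zp1, zm1 = 0.25 - 0.1j, 0.8 + 0.3j       # hypothetical pre-jump amplitudes
zp2, zm2 = match_step(zp1, zm1, Om1 / 2, Om2 / 2, t0)   # omega = Omega/2

mu1 = 0.5 * Om1 * abs(zm1) ** 2          # Eq. (4.38)
mu2 = 0.5 * Om2 * abs(zm2) ** 2          # Eq. (4.37)

r = zp1 / zm1
ratio_441 = ((Om2 + Om1) ** 2 + (Om2 - Om1) ** 2 * abs(r) ** 2
             + (Om2 ** 2 - Om1 ** 2) * 2 * (r * cmath.exp(1j * Om1 * t0)).real) / (4 * Om1 * Om2)
print(mu2 / mu1, ratio_441)
```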
(ii) A δ-function discontinuity: Next, we determine the change in $\mu$ across the δ-function discontinuity given by (4.23). With the solutions ($\zeta_1$ for $t < t_0$ and $\zeta_2$ for $t > t_0$) given by (4.24), Eqs. (4.29a) and (4.29b) describe the matching conditions across the discontinuity. Substituting for $\zeta_-^{(2)}$ from (4.29b) in (4.37) we get

$$\mu_2 = \frac12\Omega|\zeta_-^{(2)}|^2 = \frac12\Omega|\zeta_-^{(1)}|^2\left\{1 + \frac18(\Omega T)^2\left[1 + \frac{|\zeta_+^{(1)}|^2}{|\zeta_-^{(1)}|^2}\right] + \frac12(\Omega T)\sin\Omega t_0 + \frac18(\Omega T)^2\cos\Omega t_0\right\}, \tag{4.43}$$

which with (4.38) gives

$$\frac{\mu_2 - \mu_1}{\mu_1} = \frac18(\Omega T)^2\left[1 + \frac{|\zeta_+^{(1)}|^2}{|\zeta_-^{(1)}|^2}\right] + \frac12(\Omega T)\sin\Omega t_0 + \frac18(\Omega T)^2\cos\Omega t_0. \tag{4.44}$$
From expressions (4.42) and (4.44) for the change in the gyroaction of a charged particle in a magnetic field, for a step-function change and a δ-function change in the magnetic field respectively, we note that both expressions have a phase-independent part. This is in contrast to the case of the harmonic oscillator discussed earlier in this section, where the change in the action for a step-function change of the frequency is entirely phase dependent, while that for a δ-function change has a phase-independent part. The charged particle in a magnetic field is therefore not quite equivalent to a harmonic oscillator in the time-dependent case, because of the coupling to the induction equation.

4.1.2. Smooth changes in the frequency

4.1.2.1. The harmonic oscillator. The treatment of discontinuous changes in the frequency and in its first derivative, carried out in Section 4.1.1 for the determination of nonadiabatic changes in the action invariant, is somewhat simpler, because the post-discontinuity solution and the value of the action could be determined in terms of the pre-discontinuity solution and the corresponding action value through appropriate matching conditions. For smooth changes in frequency, on the other hand, a solution for the action as a function of time would require matching at the whole continuum of points on the time line in the relevant interval. In such a case, the nonadiabatic change in action is defined in the spirit of Eq. (2.19), where $J_\mp = J(t \to \mp\infty)$ are the values of the action $J$ in the asymptotic past and future, where the frequency takes the values $\omega_{1,2}$, while between $t = -\infty$ and $+\infty$ it goes through any given smooth variation. The difference $\Delta J = J_+ - J_-$ is then the nonadiabatic change which we have to evaluate for any given time variation of the frequency going from $\omega_1$ at $t \to -\infty$ to $\omega_2$ at $t \to +\infty$. Such an evaluation can be carried out by two distinct kinds of procedure, exemplified by the treatments of Chandrasekhar [30], Backus, Lenard and Kulsrud [33], Vandervoort [32] and Howard [34] on the one hand, and that of Dykhne [35] on the other. We shall discuss here the elegant treatment of Dykhne, as it is closer to the spirit of the present review, which seeks to explore the formal relationship and analogy between nonadiabatic effects in classical mechanics and quantum effects. Consider, therefore, the time-dependent one-dimensional harmonic oscillator given by (2.4),

$$\ddot q + \omega^2(t)q = 0,$$

with $q$ a real coordinate and $\omega^2(t)$ a function of time such that $\omega \to \omega_{1,2}$ as $t \to \mp\infty$. It is further assumed that $\omega^2$ is an analytic function of time. The solution of the above oscillator equation therefore has the asymptotic form

$$q_1 = \frac12\left(\zeta_+^{(1)}e^{i\omega_1t} + \zeta_-^{(1)}e^{-i\omega_1t}\right),\quad t \to -\infty, \tag{4.45a}$$

$$q_2 = \frac12\left(\zeta_+^{(2)}e^{i\omega_2t} + \zeta_-^{(2)}e^{-i\omega_2t}\right),\quad t \to +\infty, \tag{4.45b}$$
where the complex solution $\zeta$ is governed, as before, by Eq. (4.1), namely $\ddot\zeta + \omega^2(t)\zeta = 0$, and the reality of $q_1$ and $q_2$ requires $\zeta_-^{(1)} = \zeta_+^{(1)*}$ and $\zeta_-^{(2)} = \zeta_+^{(2)*}$. Note that this equation for the complex coordinate $\zeta$ is analogous to the Schrödinger equation of quantum mechanics (2.13), which contains the second spatial derivative of the complex function $\psi$, while our oscillator equation (4.1) contains the second time derivative of the complex coordinate $\zeta$. The quantity $\omega^2(t)$ in the oscillator equation is, in turn, analogous to the quantity $2m(E - V)/\hbar^2 = p^2/\hbar^2$ ($p$ being the momentum) in the Schrödinger equation. Since $\omega^2 > 0$, it is analogous to the free or unbound case, $E - V > 0$, in quantum mechanics. If we now consider an “incident wave” $e^{i\omega_1t}$ from $t = -\infty$ encountering a time dependence $\omega^2(t)$, this will lead to a “reflected wave” $e^{-i\omega_1t}$ at $t = -\infty$ and a “transmitted wave” $e^{i\omega_2t}$ at $t = +\infty$. We shall thus have a complex solution

$$\zeta_1 = e^{i\omega_1t} + Re^{-i\omega_1t},\quad t \to -\infty, \tag{4.46a}$$

$$\zeta_2 = Te^{i\omega_2t},\quad t \to +\infty, \tag{4.46b}$$
where $R$ is the amplitude of the “reflected wave” and $T$ that of the transmitted wave. Comparing solution (4.46) with (4.45), we are led to identify

$$\zeta_+^{(1)} = 1 + R^*,\qquad \zeta_+^{(2)} = T,\qquad \zeta_-^{(2)} = 0. \tag{4.47}$$
Furthermore, it is straightforward to show that if $\zeta$ is a solution of the oscillator equation, then the quantity

$$N = i\left(\zeta\dot\zeta^* - \dot\zeta\zeta^*\right) \tag{4.48}$$

is a constant of motion: its time derivative, $i(\zeta\ddot\zeta^* - \ddot\zeta\zeta^*)$, vanishes because $\omega^2$ is real. In quantum mechanics it corresponds to the conservation of the probability current $\mathbf J = (\hbar/i)(\psi^*\nabla\psi - \psi\nabla\psi^*)$. Using the asymptotic forms of the solutions for $t \to \mp\infty$, (4.48) yields

$$\omega_1\left(|\zeta_+^{(1)}|^2 - |\zeta_-^{(1)}|^2\right) = \omega_2\left(|\zeta_+^{(2)}|^2 - |\zeta_-^{(2)}|^2\right). \tag{4.49}$$

Identifying, for the set (4.46), $\zeta_+^{(1)} = 1$, $\zeta_-^{(1)} = R$, $\zeta_+^{(2)} = T$ and $\zeta_-^{(2)} = 0$, we obtain from (4.49) the relation

$$\omega_1\left(1 - |R|^2\right) = \omega_2|T|^2. \tag{4.50}$$
On the other hand, for the action of the oscillator $J_k$ ($k = 1, 2$) we find from (4.45), taking $q_k$ to be real,

$$J_k = \tfrac12\omega_k\zeta_-^{(k)}\zeta_+^{(k)} = \tfrac12\omega_k\zeta_+^{(k)*}\zeta_+^{(k)}. \tag{4.51}$$

Using (4.47) and (4.50) we then obtain

$$J_1 = \tfrac12\omega_1|1 + R|^2,\qquad J_2 = \tfrac12\omega_2|T|^2 = \tfrac12\omega_1\left(1 - |R|^2\right), \tag{4.52}$$

so that

$$\Delta J = J_1 - J_2 = \omega_1\left(|R|^2 + \Re R\right), \tag{4.53}$$
where $\Re$ denotes the real part. It may be emphasized that this result is exact. To evaluate $\Delta J$, the change in the action $J$, we must find an expression for the reflection amplitude $R$. For that we shall employ a method used in quantum mechanics to calculate the reflection amplitude of a matter wave from a potential barrier of height less than the energy of the particle, the case of “reflection above the barrier”. As pointed out earlier, the oscillator equation corresponds, in quantum mechanics, to the case where the energy exceeds the height of the potential barrier, and hence to reflection above the barrier. We shall first consider the solution of the problem in the WKB or quasi-classical approximation, using a procedure developed by Pokrovskii, Savvinykh and Ulinich [36] and explained subsequently in Landau and Lifshitz [37]. This is applicable to any analytic time dependence of $\omega^2(t)$. Later we shall consider an exact determination of the reflection amplitude $R$ for a particular time dependence of $\omega^2$ as an illustrative example. In either case, the determination of $R$ corresponds to a scattering problem in quantum mechanics.
Fig. 1. The contour in the complex $t$-plane, with $t_0$, the solution of $\omega^2 = 0$, as the complex turning point.
(i) Determination of the reflection amplitude $R$ in the WKB approximation: When the energy $E$ of the particle is less than the barrier height $V_{\max}$, there exists a turning point $X_0$ on the real axis, given by $E - V(X_0) = 0$. As is well known, the WKB solutions, while valid away from the turning point, are invalid at it. Connecting the solutions on the two sides of the turning point therefore requires the artifice of an excursion into the complex plane, along a contour far enough from the turning point that the WKB form of the solutions remains formally valid. A similar artifice is employed when the kinetic energy is everywhere positive in the quantum case, and analogously in the oscillator case, where the corresponding quantity $\omega^2 > 0$. A “turning point” can be defined in this case as well, given by $\omega^2(t_0) = 0$, where $t_0$ is now a complex number: a complex-valued turning point. Let $t_1 = \Re t_0$. Then the solution of the oscillator equation to the right of the turning point is given by

$$\zeta_+^{(\mathrm{II})} = \frac{1}{\sqrt\omega}\exp\left(i\int_{t_1}^t\omega\,dt\right),\quad t > t_1, \tag{4.54}$$

where the superscript (II) refers to the region $t > t_1$, to the right of the turning point. The solution (4.54) contains only the “outgoing wave”, represented by the positive exponent, as there can be no incoming wave from $t = +\infty$ in region II. We now determine the solution $\zeta^{(\mathrm I)}$ for $t < t_1$, to the left of the turning point. To do so, we follow the variation of the solution (4.54) along a contour $C$ in the upper half-plane which encloses the complex turning point $t_0$ at a sufficient distance from it, as shown in Fig. 1. The passage around the turning point $t_0$ changes the sign of the root $(\omega^2)^{1/2}$ from $+\omega$ to $-\omega$, and after the return to the real axis the function $\zeta_+^{(\mathrm{II})}$ becomes the function $\zeta_-^{(\mathrm I)}$, that is, a “reflected wave”. On the other hand, going along a path below the point $t_0$, that is, along the real axis, changes the function $\zeta_+^{(\mathrm{II})}$ into the incident wave $\zeta_+^{(\mathrm I)}$.
We thus have

$$\zeta^{(\mathrm I)} = \frac{1}{\sqrt\omega}\exp\left(i\int_{t_1}^t\omega\,dt\right) + \frac{1}{\sqrt{|\omega|e^{i\pi}}}\exp\left(-i\int_{t_1}^t\omega\,dt\right)\exp\left(i\int_C\omega\,dt\right), \tag{4.55a}$$

$$\zeta^{(\mathrm{II})} = \frac{1}{\sqrt\omega}\exp\left(i\int_{t_1}^t\omega\,dt\right),\quad t > t_1, \tag{4.55b}$$

where $t < t_1$ is an arbitrary point in region I and $C$ is the semicircular contour in the upper half-plane. This can be deformed into the contour $C'$, which consists of two lines traversed in opposite directions on the two sides of the branch cut connecting $t_0$ to $t_1$ (Fig. 1). It can furthermore be shown, by a method similar to that used in Landau and Lifshitz [20], that $\int_C\omega\,dt = \int_{C'}\omega\,dt$. Comparing (4.55a) with (4.46a), and noting that $1/\sqrt{e^{i\pi}} = -i$, we find that

$$R = -i\exp\left(i\int_C\omega\,dt\right) = -i\exp\left(2i\int_{t_1}^{t_0}\omega\,dt\right). \tag{4.56}$$
4.1.2.2. Exact treatment: a particular case. We next consider a particular time dependence of the frequency of the harmonic oscillator, namely

$$\omega^2(t) = \omega_0^2\left(1 + \alpha^2\operatorname{sech}^2(t/T)\right), \tag{4.57}$$

which is such that

$$\omega^2(\pm\infty) = \omega_0^2. \tag{4.58}$$

The oscillator equation is correspondingly given by

$$\frac{d^2q}{dt^2} + \left[\omega_0^2 + (\omega_0\alpha)^2\operatorname{sech}^2(t/T)\right]q = 0. \tag{4.59}$$

If we change the independent variable from $t$ to $\eta$, given by

$$\eta = \tanh(t/T), \tag{4.60}$$

Eq. (4.59) becomes

$$\frac{d}{d\eta}\left[(1 - \eta^2)\frac{dq}{d\eta}\right] + \left[\nu(\nu + 1) - \frac{\mu^2}{1 - \eta^2}\right]q = 0, \tag{4.61}$$

with the identifications

$$\mu^2 = -(\omega_0T)^2,\qquad \nu(\nu + 1) = (\omega_0T\alpha)^2. \tag{4.62}$$
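For slow variation of the frequency, $\omega_0T \gg 1$, this implies $|\mu|, \nu \gg 1$. Eq. (4.61) is an associated Legendre equation, which has the general solution

$$q = AP_\nu^\mu(\eta) + BQ_\nu^\mu(\eta), \tag{4.63}$$

where $P_\nu^\mu(\eta)$ and $Q_\nu^\mu(\eta)$ are associated Legendre functions of the first and second kind. These can be expressed in terms of hypergeometric functions as follows [38]:

$$P_\nu^\mu(\eta) = \frac{1}{\Gamma(1 - \mu)}\left(\frac{1 + \eta}{1 - \eta}\right)^{\mu/2}F\left(-\nu, \nu + 1; 1 - \mu; \frac{1 - \eta}{2}\right), \tag{4.64a}$$

$$Q_\nu^\mu(\eta) = \frac{\Gamma(1 + \nu + \mu)\Gamma(-\mu)}{2\Gamma(1 + \nu - \mu)}\left(\frac{1 - \eta}{1 + \eta}\right)^{\mu/2}F\left(-\nu, 1 + \nu; 1 + \mu; \frac{1 - \eta}{2}\right) + \frac12\Gamma(\mu)\cos(\mu\pi)\left(\frac{1 + \eta}{1 - \eta}\right)^{\mu/2}F\left(-\nu, 1 + \nu; 1 - \mu; \frac{1 - \eta}{2}\right). \tag{4.64b}$$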
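From (4.60) it is seen that

$$\eta \to \pm1 \iff t \to \pm\infty. \tag{4.65}$$

Taking the root $\mu = i\omega_0T$ of (4.62), and using $(1 - \eta)/(1 + \eta) = e^{-2t/T}$, this leads to

$$\left(\frac{1 - \eta}{1 + \eta}\right)^{\mu/2} = e^{-\mu t/T} = e^{-i\omega_0t}, \tag{4.66a}$$

$$\left(\frac{1 + \eta}{1 - \eta}\right)^{\mu/2} = e^{\mu t/T} = e^{i\omega_0t}. \tag{4.66b}$$

Note that, in consequence of (4.64a) and (4.66),

$$\lim_{\eta\to1}P_\nu^\mu(\eta) = \frac{e^{i\omega_0t}}{\Gamma(1 - \mu)}. \tag{4.67}$$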
Now, in view of the scattering formalism discussed in the last section, we choose as a solution of Eq. (4.61) in the region $t \to \infty$ a function which represents only the outgoing wave $e^{i\omega_0t}$. In view of (4.67), $P_\nu^\mu(\eta)$ is such a function. In the region $t \to -\infty$, on the other hand, the function which connects to $P_\nu^\mu(\eta)$ should be a sum of two terms going as $e^{-i\omega_0t}$ and $e^{i\omega_0t}$. It can be shown [38] that

$$-\frac{\sin\nu\pi}{\sin\mu\pi}P_\nu^\mu(-\eta) + \frac{\sin(\nu + \mu)\pi}{\sin\mu\pi}\frac{\Gamma(\nu + \mu + 1)}{\Gamma(\nu - \mu + 1)}P_\nu^{-\mu}(-\eta) = P_\nu^\mu(\eta). \tag{4.68}$$
From expression (4.64a) for $P_\nu^{\pm\mu}$ and the asymptotic expressions (4.66), we see that for $\eta \to -1$ ($t \to -\infty$)

$$\lim_{\eta\to-1}P_\nu^\mu(-\eta) = \frac{e^{-i\omega_0t}}{\Gamma(1 - \mu)}, \tag{4.69a}$$

$$\lim_{\eta\to-1}P_\nu^{-\mu}(-\eta) = \frac{e^{i\omega_0t}}{\Gamma(1 + \mu)}. \tag{4.69b}$$

Thus the first term on the left-hand side of (4.68) represents the “reflected wave”, and the second one the incoming (incident) wave in the region $t \to -\infty$. In this limit the left-hand side is

$$\underbrace{-\frac{\sin\nu\pi}{\sin\mu\pi}\frac{e^{-i\omega_0t}}{\Gamma(1 - \mu)}}_{\text{reflected wave}} + \underbrace{\frac{\sin(\nu + \mu)\pi}{\sin\mu\pi}\frac{\Gamma(\nu + \mu + 1)}{\Gamma(\nu - \mu + 1)}\frac{e^{i\omega_0t}}{\Gamma(1 + \mu)}}_{\text{incident wave}}. \tag{4.70a}$$

On the other hand, the right-hand side in the limit $t \to \infty$, which represents the outgoing wave, is

$$\frac{e^{i\omega_0t}}{\Gamma(1 - \mu)}. \tag{4.70b}$$
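If we normalize the amplitude of the incident wave to unity, the amplitudes of the reflected wave $R$ and of the transmitted wave $T$ are

$$R = -\frac{\sin\nu\pi}{\sin(\nu + \mu)\pi}\frac{\Gamma(\nu - \mu + 1)}{\Gamma(\nu + \mu + 1)}\frac{\Gamma(1 + \mu)}{\Gamma(1 - \mu)}, \tag{4.71a}$$

$$T = \frac{\sin\mu\pi}{\sin(\nu + \mu)\pi}\frac{\Gamma(\nu - \mu + 1)}{\Gamma(\nu + \mu + 1)}\frac{\Gamma(1 + \mu)}{\Gamma(1 - \mu)}. \tag{4.71b}$$

Since $\mu = i\kappa$, with $\kappa \equiv \omega_0T$, is purely imaginary while $\nu$ is real, the Γ-function ratios in (4.71) have unit modulus and $|\sin(\nu + \mu)\pi|^2 = \sin^2\pi\nu\cosh^2\pi\kappa + \cos^2\pi\nu\sinh^2\pi\kappa$, so that

$$R = \frac{e^{i\theta}}{\left(\cosh^2\pi\kappa + \cot^2\pi\nu\,\sinh^2\pi\kappa\right)^{1/2}}, \tag{4.72}$$

where

$$\theta = \arg R,\qquad \kappa = \omega_0T,\qquad \nu(\nu + 1) = (\alpha\kappa)^2; \tag{4.73}$$

for large arguments $\kappa$ and $\nu$ the phase $\theta$ follows explicitly from Stirling's formula for the Γ-functions. Finally, using the result $J_1 - J_2 = \omega\left[|R|^2 + \Re R\right]$ [Eq. (4.53)], we get

$$J_1 - J_2 = \omega\left[\frac{\sin^2\pi\nu}{\sin^2\pi\nu + \sinh^2\pi\kappa} + \frac{\sin\pi\nu\cos\theta}{\left(\sin^2\pi\nu + \sinh^2\pi\kappa\right)^{1/2}}\right]. \tag{4.74}$$

In view of $\kappa \gg 1$, this expression reduces to

$$J_1 - J_2 \simeq 2\omega\left\{2\sin^2\pi\nu\,e^{-2\pi\kappa} + \sin\pi\nu\,e^{-\pi\kappa}\cos\theta\right\}. \tag{4.75}$$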
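The exact modulus (4.72) and the flux relation (4.50) can be cross-checked by solving the oscillator equation numerically as a scattering problem. The Python sketch below starts from a pure outgoing wave $e^{i\omega_0t}$ at large positive $t$, integrates $\ddot\zeta + \omega^2(t)\zeta = 0$ backwards with fourth-order Runge–Kutta, and splits the result at large negative $t$ into incident and reflected parts, as in (4.46). The cut-off time, step count and parameters are hypothetical; with $\omega_0 = T = 1$, $\alpha^2 = 3/4$ gives $\nu = 1/2$, and $\alpha^2 = 2$ gives $\nu = 1$, for which $\sin\pi\nu = 0$ and the reflection vanishes.

```python
import cmath
import math

def scatter_sech2(w0, width, alpha2, t_max=20.0, n=40000):
    """Reflection and transmission amplitudes for w(t)^2 = w0^2 (1 + alpha2 sech^2(t/width)).

    Starts from the pure outgoing wave exp(i w0 t) at t = +t_max, integrates
    zeta'' = -w(t)^2 zeta backwards with RK4, and writes the result at t = -t_max
    as A exp(i w0 t) + B exp(-i w0 t). Normalizing the incident wave to one,
    R = B / A and T = 1 / A, as in Eq. (4.46).
    """
    def accel(t, z):
        return -w0 * w0 * (1.0 + alpha2 / math.cosh(t / width) ** 2) * z

    h = -2.0 * t_max / n                 # negative step: integrate backwards in time
    t = t_max
    z = cmath.exp(1j * w0 * t)
    zd = 1j * w0 * z
    for _ in range(n):
        k1z, k1d = zd, accel(t, z)
        k2z, k2d = zd + 0.5 * h * k1d, accel(t + 0.5 * h, z + 0.5 * h * k1z)
        k3z, k3d = zd + 0.5 * h * k2d, accel(t + 0.5 * h, z + 0.5 * h * k2z)
        k4z, k4d = zd + h * k3d, accel(t + h, z + h * k3z)
        z += h / 6.0 * (k1z + 2 * k2z + 2 * k3z + k4z)
        zd += h / 6.0 * (k1d + 2 * k2d + 2 * k3d + k4d)
        t += h
    A = 0.5 * (z + zd / (1j * w0)) * cmath.exp(-1j * w0 * t)   # incident amplitude
    B = 0.5 * (z - zd / (1j * w0)) * cmath.exp(1j * w0 * t)    # reflected amplitude
    return B / A, 1.0 / A

def R_abs_exact(w0, width, alpha2):
    """|R| from Eq. (4.72), written as |sin(pi nu)| / sqrt(sin^2(pi nu) + sinh^2(pi kappa))."""
    kappa = w0 * width
    nu = -0.5 + math.sqrt(0.25 + alpha2 * kappa ** 2)   # root of nu(nu+1) = (alpha kappa)^2
    s, sh = math.sin(math.pi * nu), math.sinh(math.pi * kappa)
    return abs(s) / math.sqrt(s * s + sh * sh)

R, T_amp = scatter_sech2(1.0, 1.0, 0.75)        # nu = 1/2
R_zero, _ = scatter_sech2(1.0, 1.0, 2.0)        # nu = 1: sin(pi nu) = 0
print(abs(R), R_abs_exact(1.0, 1.0, 0.75), abs(R) ** 2 + abs(T_amp) ** 2, abs(R_zero))
```

Because $\omega_1 = \omega_2 = \omega_0$ here, (4.50) reduces to $|R|^2 + |T|^2 = 1$, which the integration reproduces to within its truncation error.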
This expression shows that (a) the nonadiabatic change in $J$ is exponentially small ($\sim e^{-\pi\kappa}, e^{-2\pi\kappa}$), and (b) it oscillates with $\nu$, going through zero for $\nu = n$ (an integer), where $\nu(\nu + 1) = (\omega_0T\alpha)^2$; that is, it oscillates both with $T$, the width, and with $\alpha$, the height of the bell-shaped time dependence of $\omega^2$. When $\alpha^2 < 0$, on the other hand, there is no oscillation in the nonadiabatic change, since the corresponding functions become hyperbolic. If one compares the time-dependent oscillator equation (4.59) with the Schrödinger equation (second order in the spatial derivative), it will be seen that $\omega_0^2 + (\omega_0\alpha)^2\operatorname{sech}^2(t/T)$ corresponds to $(E - V)$ in the latter, so that $\alpha^2 > 0$ corresponds to $V < 0$. In the case of the Schrödinger equation we also find a similar behaviour, namely the oscillation of the reflection and transmission coefficients with the depth $V_0$ and the width $1/A$ of the potential well $V_0\operatorname{sech}^2Ax$, for $V_0 < 0$. For $V_0 > 0$, these coefficients no longer oscillate with the height $V_0$ or the width $1/A$ (see, for example, Gol’dman et al. [39]), analogously to the time-dependent harmonic oscillator, where, as mentioned above, the nonadiabatic change in the action $J$ does not oscillate with the width $T$ and the height $\alpha$ of the time dependence. We thus notice again the formal similarity between nonadiabatic effects, as manifested in the change in the action $J$ of the harmonic oscillator, and quantum effects, as manifested in the reflection and transmission coefficients of particles from potential wells and humps.

4.2. Time independent, spatially inhomogeneous magnetic field

We shall now consider the more important case of a static but inhomogeneous magnetic field. It is more important from the point of view of applications, as spelt out in the Introduction, as well as from the point of view of the main theme of the present review, as we shall see. The nonadiabatic change of the gyroaction of a charged particle moving in the static, inhomogeneous magnetic field of an axisymmetric mirror trap was first addressed numerically by Garren et al. [11] as early as 1958. Their most significant finding was that, during the course of the motion in the magnetic trap, the gyroaction of the particle suffers an almost step-function change (a “jump”) as the particle crosses the mid-plane of the trap, while the magnitude of the change depends periodically on the gyrophase $\phi_0$ of the particle at the mid-plane. We outline below some of the salient features of the results of these calculations as presented in Northrop [5]. The numerical calculations of Ref. [11] were carried out for an axisymmetric magnetic mirror geometry described by the vector potential

$$A_\phi = \frac{B_0L}{2\pi}\left[\frac{\rho}{2} + \beta\cos\xi\,I_1(\rho)\right], \tag{4.76}$$

where $L$ is the distance between the maxima of the magnetic field, $\rho = 2\pi r/L$, $\xi = 2\pi z/L$, $B_0$ is the magnetic field at a point midway between a mirror and the median plane, and $I_1$ is the modified Bessel function of the first kind of order 1. The constant $\beta$ determines the mirror ratio and was chosen to be 0.2 for this calculation. The expression (4.76) for $A_\phi$ in fact describes a field that is periodic in the $z$-direction: an infinite array of magnetic mirror traps joined end to end.
The calculations were carried out with the particle starting off at the mid-plane (the minimum of the magnetic field), with the velocity vector specified by $(v, \theta, \phi)$: $v$ being the magnitude of the velocity vector, $\theta$ the pitch angle (the angle that the velocity vector makes with the local direction of the magnetic field, which is the $z$-direction at the field minimum), and $\phi$ the phase angle, the angle that $v_\perp$ makes with any plane containing the axis of symmetry. As the trajectory of the particle was followed, the values of $\theta$ and $\phi$ were noted each time the particle returned to the mid-plane, and were plotted in the $(\theta, \phi)$ plane as a Poincaré section, as shown in Fig. 2. Several interesting features of the behaviour of the particle in such magnetic fields were revealed. Firstly, the angle $\theta$ [and hence the gyroaction $\mu = E\sin^2\theta/\Omega(\mathbf R_0)$] was found to suffer a jump every time the particle crossed the mid-plane, with the magnitude of the jump depending on the phase angle $\phi$ at the crossing. Basically, two classes of behaviour were found. In one class, even though the jumps were individually large, they were highly self-cancelling from one median-plane crossing to the next, with the result that the set of values lay on a smooth curve, such as A in the $(\theta, \phi)$ plane shown in Fig. 2 (adapted from Northrop [5]). The interesting, and indeed surprising, feature of this curve is that it is described by the constancy of $\mu$ to $O(\epsilon)$, as given by Eq. (3.25). This condition can be expressed in the form

$$f(\theta, \phi) = \sin\theta\left(\sin\theta + a\cos^2\theta\sin\phi\right) = \text{constant}, \tag{4.77}$$
Fig. 2. Phase plot $(\theta, \phi)$ for a particle in the mirror geometry described by the vector potential (4.76) (adapted from Northrop [5]).
where, for the curl-free magnetic field described by the vector potential (4.76), the constant $a$ is given by

$$a = \frac{4\pi\beta I_1(\rho_0)}{\left[1 - \beta I_0(\rho_0)\right]^2}\frac{mcv}{eB_0L}, \tag{4.78}$$

$\rho_0$ being the solution of

$$\frac{4\pi^2cP_\phi}{eB_0L^2} = \rho_0\left[\frac12\rho_0 - \beta I_1(\rho_0)\right], \tag{4.79}$$

and $P_\phi$ is the canonical angular momentum, which is a constant of motion for the axisymmetric magnetic field. Moreover, the numerically obtained curve in the Poincaré section agrees with that given by (4.77) more closely the smaller the gyroradius. It is a rather interesting, indeed fascinating, fact that a numerically obtained Poincaré invariant curve, whose constituent points result from nonadiabatic jumps in the $\theta$-values (or, alternatively, in the gyroaction) at mid-plane crossings, should be described by the terms of the adiabatic action invariant to first order. This seems to indicate an intimate relationship between the nonanalytic, non-expandable term in the asymptotic series for the gyroaction, which accounts for the nonadiabatic jumps, and the terms of the expansion. This needs to be explored further.
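The jump phenomenon can be reproduced qualitatively with a short orbit integration. The Python sketch below is not the calculation of Ref. [11]: it assumes the simpler paraxial field $B_z = B_0f(z)$, $B_r = -\frac12B_0rf'(z)$ with the quadratic profile $f(z) = 1 + (z/L)^2$, units with $B_0 = q/m = 1$, and hypothetical values of $L$, the pitch angle, the guiding-centre radius and the step size. It advances the Lorentz force with the Boris rotation, which conserves the speed to round-off in a static magnetic field, and records $\mu = v_\perp^2/2B$ at each mid-plane crossing.

```python
import math

L = 10.0                 # hypothetical trap scale length; B0 = q/m = 1

def field(x, y, z):
    """Paraxial field with f(z) = 1 + (z/L)^2, in Cartesian components."""
    fp = 2.0 * z / L ** 2                       # f'(z)
    return (-0.5 * fp * x, -0.5 * fp * y, 1.0 + (z / L) ** 2)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def boris_step(pos, vel, dt):
    """Boris rotation for dv/dt = v x B (no electric field), then a position drift."""
    t = tuple(0.5 * dt * b for b in field(*pos))
    s = tuple(2.0 * c / (1.0 + sum(d * d for d in t)) for c in t)
    v_prime = tuple(v + c for v, c in zip(vel, cross(vel, t)))
    vel = tuple(v + c for v, c in zip(vel, cross(v_prime, s)))
    pos = tuple(p + dt * v for p, v in zip(pos, vel))
    return pos, vel

def mu_of(pos, vel):
    """Instantaneous mu = v_perp^2 / (2 B) per unit mass."""
    B = field(*pos)
    b = math.sqrt(sum(c * c for c in B))
    v_par = sum(v * c for v, c in zip(vel, B)) / b
    return (sum(v * v for v in vel) - v_par ** 2) / (2.0 * b)

pitch = math.radians(60.0)                      # hypothetical pitch angle at z = 0
vel = (0.0, math.sin(pitch), math.cos(pitch))
pos = (1.5 - math.sin(pitch), 0.0, 0.0)         # guiding centre near x = 1.5

dt, n_steps = 0.02, 20000
v2_start = sum(v * v for v in vel)
mu_cross = []
for _ in range(n_steps):
    z_old = pos[2]
    pos, vel = boris_step(pos, vel, dt)
    if (z_old < 0.0) != (pos[2] < 0.0):         # mid-plane crossing
        mu_cross.append(mu_of(pos, vel))
v2_end = sum(v * v for v in vel)

print(len(mu_cross))
print([round(m / mu_cross[0], 5) for m in mu_cross])
```

Decreasing the gyroradius relative to $L$ (for example by increasing `L`) should make the crossing-to-crossing changes in $\mu$ smaller, in line with the exponential smallness of the nonadiabatic jumps discussed below.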
There also exists another set of initial values $(\theta, \phi)$ of the particle which lead to invariant curves of type B, quite distinct from those of type A. Besides, there exist “fixed points” associated with such curves, as indicated in the figure. The second class of behaviour was found to arise for smaller initial values of the angle $\theta$, still satisfying $\theta > \theta_c$ ($\theta_c$ being the adiabatic loss-cone angle). For such initial values of $\theta$, and $0 \le \phi \le 2\pi$, the $(\theta, \phi)$ values at subsequent mid-plane crossings do not lie on a smooth curve but are scattered over a band. Indeed, there is found to exist a critical curve of demarcation between the two classes of behaviour, called “stable” for the former class, where the points on the Poincaré section lie on a well-defined smooth curve, and “unstable” for the latter class, where they are scattered. A particle which belongs to the unstable region is found to escape from the mirror trap within about 10 or so mirror reflections. This amounts to an effective increase of the loss-cone angle, since these particles belong to the trapped zone according to the adiabatic theory. Such an escape of particles from the region of adiabatic trapping, due to the nonadiabatic jumps of $\theta$ in the “unstable” region, is referred to as “nonadiabatic escape”. One of the important and interesting questions is how sharp the boundary between the “stable” and “unstable” regions is. As will be described in Section 5, the nonadiabatic escape of charged particles from adiabatic magnetic traps has also been observed experimentally. A theoretical determination of the lifetimes of particles against nonadiabatic escape is a rather interesting and challenging mathematical problem, which constitutes one of the core problems around which this review is centered.

4.2.1. Analytical calculations for the jumps in the gyroaction: single transit nonadiabaticity

Subsequent to the numerical calculations of Garren et al. [11], another numerical calculation was carried out by Hastie et al. [40] for the linear quadrupole field, in which case, too, the gyroaction was found to jump as the particle crossed the minimum of the field. These authors, however, also presented an analytical approach for the calculation of these jumps. Subsequent improvements and generalizations of their calculations have been reported by Howard [41], Krushkal [42] and Cohen et al. [43]. Since it is an important concept, we shall briefly review the salient features of these calculations with reference to axisymmetric mirror traps. The calculation of a nonadiabatic change in the gyroaction in a static, inhomogeneous magnetic field is a much more involved problem than that in the homogeneous time-dependent field considered in Section 4.1, because the former is a fully three-dimensional problem, or two-dimensional in the case of axisymmetry. It may, however, be pointed out that the earliest analytical approach to the nonadiabatic jumps in the gyroaction is due to Dykhne and Chaplik [44]. This is basically a quantum mechanical approach, whereby the gyroaction is identified as the action associated with a Landau level, $\mu \equiv n\hbar$, with the quantum number $n \gg 1$ corresponding to the classical limit. The nonadiabatic change in the gyroaction is then identified as resulting from the transitions $n \to n \pm 1$ to the neighbouring Landau levels, induced by the magnetic field inhomogeneity. The near-diagonal scattering matrix elements are evaluated, and the change in the adiabatic invariant is then calculated by going to the classical limit. We shall return to a discussion of this method in Section 4.3. The approach of Hastie et al. [40], however, is a more direct, classical one. Following Hastie et al. [40], we start from the exact equation of evolution for the gyroaction $\mu = v_\perp^2/2\Omega$ (defined per unit mass for simplicity), in order to calculate the nonadiabatic change $\Delta\mu$
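in $\mu$. This is given by

$$\frac{d\mu}{dt} = -\frac{1}{\Omega}\left[\frac{v_\parallel^2v_\perp}{R_c}\cos\psi + \frac12v_\parallel v_\perp^2\frac{\partial}{\partial s}\ln(r\Omega)\cos 2\psi\right], \tag{4.80}$$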
where s is the coordinate parallel to the magnetic %eld, r is the radial coordinate and is the phase angle de%ned by cos = eˆn · v⊥ =v⊥ ; eˆn being the unit vector along the magnetic %eld gradient, eˆn = ∇B=|∇B| and v⊥ , the velocity perpendicular to the magnetic %eld. Rc is the radius of curvature of the %eld line. The equation for the phase angle is also shown to be given by v2 d = − + sin (4.81) dt v⊥ R c for an axisymmetric vacuum %elds without torsion. These Eqs. (4.80) and (4.81) for and for the axisymmetric vacuum mirror %elds can be gleaned from Ref. [45] as the coeWcients of 9f=9 and 9f=9 terms in the Boltzmann–Vlasov equation transformed to the (E; ; ) variables, or calculated independently by diFerentiating the de%ning expressions. Unless one is dealing with a case of a very large mirror ratio M = (Bm =B0 ), as considered by Krushkal [42], it will be suWcient to consider only the %rst term of Eq. (4.81), amounting to a phase averaging. Furthermore, it will be seen that the second term in (4.80) involving cos 2 will yield a contribution exponentially smaller than that accruing from the %rst. We shall therefore neglect the second term. Integrating Eq. (4.80) then yields 1=2 2 1 V = − dt (2E − ) cos ; (4.82)
Rc where the integration is carried out over the guiding centre trajectories which entails the constancy of in the above time integral. Rather than give here a detailed discussion of the evaluation of V, we present here only a broad general outline of the manner of its evaluation, pointing out the various cases, and the main features of the results obtained while referring to the original sources [40,43] for full details. Note that the integrand in Eq. (4.82) is of the form R{g(t) exp[i(t)]}, R standing for the real part with d=dt = − , where g and are slowly varying functions of time arising essentially from the guiding centre (adiabatic) trajectories. Hastie et al. [40] noted that if the integration over t is carried out in the complex plane, then the dominant contribution to V would come from the stationary phase points of namely from d=dt = − = 0; that is from the zero of B in the complex plane. There can also be singularities of g which will give contribution to V, and which also happen to be mostly at the zero of B in Eq. (4.82). The integral (4.82) is then evaluated using the steepest descent method, deformig the contour of integration into the upper half of the complex t-plane (or any other equivalent variable) and the contour is chosen such that the function exp(i) decays exponentially from the zeros of B. The principal contribution then comes from the vicinity of singularities. We shall be concerned mostly with axisymmetric vacuum %elds for the consideration of charged particle dynamics. These are described by the vector potential A = A4 (r; z)eˆ4
$$\mathbf A = A_\phi(r,z)\,\hat{\mathbf e}_\phi , \qquad (4.83)$$

which may be written as an expansion in powers of $r$,

$$A_\phi = B_0\left[\tfrac12\,r f(z) + \tfrac14\,r^3 h(z)\right] . \qquad (4.84)$$
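The complex-plane argument outlined above can be illustrated on a toy integral of our own choosing; it is not taken from Refs. [40,43]. The integral $\int_{-\infty}^{\infty}\cos(t/\varepsilon)/(1+t^2)\,dt$ equals $\pi e^{-1/\varepsilon}$. This exponentially small value comes entirely from the pole of the slowly varying factor at $t = i$. The sketch below checks this with plain trapezoidal quadrature on a truncated interval, assuming the truncation error (of order $\varepsilon/T^2$) is negligible. It is only an analogue of (4.82), where the relevant singularities are the complex zeros of $B$.

```python
import math


def oscillatory_integral(eps, half_width=400.0, h=0.005):
    """Trapezoidal estimate of the toy integral
        I(eps) = integral_{-inf}^{inf} cos(t / eps) / (1 + t**2) dt,
    truncated to |t| <= half_width.  The slowly varying factor 1/(1 + t^2)
    has poles at t = +/- i; the exact result is pi * exp(-1/eps)."""
    n = int(round(2.0 * half_width / h))
    total = 0.0
    for k in range(n + 1):
        t = -half_width + k * h
        weight = 0.5 if k in (0, n) else 1.0  # trapezoid end-point weights
        total += weight * math.cos(t / eps) / (1.0 + t * t)
    return total * h


for eps in (0.5, 0.25):
    numeric = oscillatory_integral(eps)
    exact = math.pi * math.exp(-1.0 / eps)  # residue contribution of the pole at t = i
    print(f"eps = {eps}: quadrature = {numeric:.6f}, pi*exp(-1/eps) = {exact:.6f}")
```

Halving $\varepsilon$ reduces the result by the factor $e^{-2}$, not by a power of $\varepsilon$. This is the same kind of exponential smallness that appears in (4.93) below.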
R.K. Varma / Physics Reports 378 (2003) 301 – 434
An $r^2$ term must be absent, since it would lead to a nonzero current $J_\phi$ in the limit $r\to0$, which is unacceptable. Furthermore, in the vacuum-field region $J_\phi = 0$ requires $h(z) = -\tfrac14 f''(z)$ to lowest order in $r$. Then we have

$$B_z = \frac1r\frac{\partial}{\partial r}(rA_\phi) = B_0\left[f(z) - \frac14 r^2 f''(z)\right] , \qquad (4.85a)$$

$$B_r = -\frac{\partial A_\phi}{\partial z} = -B_0\left[\frac12 r f'(z) - \frac1{16} r^3 f'''(z)\right] , \qquad (4.85b)$$

where primes denote differentiation with respect to $z$. If $s$ denotes the coordinate along the magnetic field line, then

$$\frac{ds}{B} = \frac{dz}{B_z} = \frac{dr}{B_r} , \qquad (4.86)$$

so that

$$s = \int\frac{(B_z^2 + B_r^2)^{1/2}}{B_z}\,dz \cong \int\left(1 + \frac{r^2 f'^2}{4f^2}\right)^{1/2} dz . \qquad (4.87)$$

To lowest order in $r$ (the paraxial approximation) this gives $z\approx s$. Thus, to lowest order in $r$,

$$B_z \approx B_0 f(s) , \qquad B_r \approx -\tfrac12 B_0\,r f'(s) . \qquad (4.88)$$
Through Eqs. (4.88), the function $f(s)$ then specifies a vacuum magnetic field in the paraxial approximation. The quadratic form

$$f(s) = 1 + (s/L)^2 \qquad (4.89)$$

represents a short trap, $L$ being its scale length. This form of the field makes the adiabatic potential harmonic and so leads to harmonic longitudinal motion. It is not, however, a very interesting form for studying the nonadiabatic loss of particles. Since the particle is bound on both sides without limit, no loss can occur from such a trap. If, however, the form (4.89) is taken to hold only in the neighbourhood of the midplane $s=0$, it can still be used to calculate the jump $\Delta\mu$, which occurs at the midplane crossing. Far from $s=0$, $f(s)$ may have any desired form that joins (4.89) smoothly.

We now evaluate the integral (4.82) for a given $f(s)$, following the prescription outlined after it. We first change from the angle $\theta$, defined by $\cos\theta = \mathbf v_\perp\cdot\nabla B/(v_\perp|\nabla B|)$, to the Larmor phase angle $\varphi$, measured with the guiding centre as origin. For a sufficiently small gyroradius $\rho_L$ the two are related by $r\cos\theta = -r_c\sin\varphi \approx -r\sin\varphi$, where $r_c$ is the distance of the Larmor centre from the symmetry axis. Next, the radius of curvature $R_c$ of the field line is obtained from the equation of the flux surface,

$$rA_\phi = \tfrac12 B_0 r^2 f(s) = \tfrac12 B_0 r_0^2 \quad (\text{const}) . \qquad (4.90)$$
Using (4.90), we get

$$\frac{1}{R_c} \approx \frac{d^2 r}{ds^2} = \frac34\,r_0\,\frac{f'^2}{f^{5/2}} - \frac12\,r_0\,\frac{f''}{f^{3/2}} . \qquad (4.91)$$
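As a numerical check of (4.90)–(4.91), the sketch below traces a paraxial field line for the quadratic profile (4.89). It integrates $dr/ds = B_r/B_z = -\tfrac12 r f'/f$ from (4.88) and compares the result with the closed-form flux surface $r = r_0 f^{-1/2}$. It also compares a finite-difference $d^2r/ds^2$ with the right-hand side of (4.91). The values of $L$ and $r_0$ are hypothetical, and $s$ is identified with $z$ as in the paraxial approximation.

```python
import math

L = 1.0    # hypothetical trap scale length
R0 = 0.05  # hypothetical midplane radius r0 of the flux surface


def f(s):
    """Quadratic field profile of Eq. (4.89)."""
    return 1.0 + (s / L) ** 2


def df(s):
    return 2.0 * s / L ** 2


def d2f(s):
    return 2.0 / L ** 2


def r_flux_surface(s):
    """r = r0 f^(-1/2), from r^2 f = r0^2 (Eq. 4.90)."""
    return R0 / math.sqrt(f(s))


def trace_field_line(s_end, n_steps=2000):
    """RK4 integration of the paraxial field-line equation
    dr/ds = B_r/B_z = -(1/2) r f'/f (from Eq. 4.88), starting at r(0) = r0."""
    def rhs(s, r):
        return -0.5 * r * df(s) / f(s)

    h = s_end / n_steps
    s, r = 0.0, R0
    for _ in range(n_steps):
        k1 = rhs(s, r)
        k2 = rhs(s + h / 2, r + h * k1 / 2)
        k3 = rhs(s + h / 2, r + h * k2 / 2)
        k4 = rhs(s + h, r + h * k3)
        r += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        s += h
    return r


def inv_rc_eq_491(s):
    """1/R_c ~ d^2 r/ds^2 as given by Eq. (4.91)."""
    return 0.75 * R0 * df(s) ** 2 / f(s) ** 2.5 - 0.5 * R0 * d2f(s) / f(s) ** 1.5


def inv_rc_finite_difference(s, h=1e-3):
    """Central second difference of the closed-form flux surface."""
    return (r_flux_surface(s + h) - 2 * r_flux_surface(s) + r_flux_surface(s - h)) / h ** 2


s = 1.5
print(trace_field_line(s), r_flux_surface(s))
print(inv_rc_finite_difference(s), inv_rc_eq_491(s))
```

The $f'^2/f^{5/2}$ term is the one kept in (4.92). It is the most singular at the complex zeros of $f$, which for (4.89) lie at $s = \pm iL$.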
Note that the singularity ($f=0$) of $1/R_c$ in the integrand of (4.82) coincides with the singularity of the $\Omega^{-1/2}$ factor. We change the variable of integration, $dt = ds/v_\parallel$, and retain only the most singular term of (4.91). Eq. (4.82) then becomes

$$\Delta\mu = -\frac{3\sqrt{2\mu}}{4}\,\Omega_0^{-1/2}\,r_0\,\mathrm{Im}\int\frac{ds}{v_\parallel}\,\frac{f'^2}{f^3}\,e^{i\varphi}\,(2E - 2\mu\Omega) , \qquad (4.92)$$

where Im denotes the imaginary part of the integral following it. Both Krushkal [42] and Cohen et al. [43] have evaluated this expression for the quadratic field (4.89), the latter in fact for a wider class of fields. We quote the result without calculational details:

$$\Delta\mu = -\frac{3\pi\,v\,r_0\sqrt{\mu}}{8\sqrt2\,\varepsilon\,\Omega_0^{1/2}L}\,\exp(-G/\varepsilon)\,\sin\varphi_0 , \qquad (4.93)$$

where

$$G = \frac{1}{2\hat A^2}\left[\frac{1+\hat A^2}{2\hat A}\ln\frac{1+\hat A}{1-\hat A} - 1\right] , \qquad (4.94)$$

with

$$\hat A = \frac{v_{\perp0}}{v} = \sin\theta_0 . \qquad (4.94')$$

The integral in (4.92) is evaluated at the zeros of $f(s)$, after expanding $\varphi$ around the zero of $f(s)$. Alternatively, one can first change the variable of integration from $s$ to $\varphi$, using $ds/v_\parallel = d\varphi/\Omega$. If one uses instead

$$f(s) = (1-a)^{-1}\left[1 - a\cos(2\pi s/L)\right] , \qquad (4.95)$$

which describes an infinite, periodic multi-mirror system with a harmonic dependence on $s$, one obtains the expression for $\Delta\mu$ given by Irie [46]:

$$\Delta\mu = \Lambda(\mu)\sin\varphi_0 , \qquad \Lambda(\mu) = -\frac{3\pi}{8}\,r_{g0}\,v_{\perp0}\exp(-H/\varepsilon) , \qquad (4.96)$$

with

$$H = \frac{1}{\pi\sin\theta_0}\left(\frac{1-a}{2a}\right)^{1/2}\left[(1-a)F(\theta_0,k') + 2aE(\theta_0,k') - \sin\theta_0\,\big(2a(1+a)\big)^{1/2}\right] . \qquad (4.97)$$

Here $r_{g0}$ is the radial coordinate of the guiding centre. $F(\theta_0,k')$ and $E(\theta_0,k')$ are the incomplete elliptic integrals of the first and second kind, respectively, with $k' = (1-k^2)^{1/2}$ and $k = \cot\theta_0\,[(1-a)/2a]^{1/2}$. The form (4.95) of $f(s)$ is better suited to the study of nonadiabatic leakage from a single mirror trap. It is also better suited to studying nonadiabatic trapping and untrapping as the particle passes from one mirror trap to the next.
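The exponent $G$ of (4.94) controls the exponential smallness $\exp(-G/\varepsilon)$ of the jump (4.93). The short sketch below evaluates it, assuming only the definitions (4.94)–(4.94'). It shows that $G$ rises monotonically from $2/3$ as $\theta_0$ increases. The small-$\hat A$ expansion is $G \approx 2/3 + (4/15)\hat A^2$, and its leading term is the limit $G\approx 2/3$ used in Section 5.3.1.

```python
import math


def G(a_hat):
    """Exponent of Eq. (4.94):
        G = [ (1 + A^2)/(2A) * ln((1 + A)/(1 - A)) - 1 ] / (2 A^2),
    with A = v_perp0 / v = sin(theta_0) (Eq. 4.94'), 0 < A < 1.
    ln((1 + A)/(1 - A)) is evaluated as 2 * atanh(A)."""
    if not 0.0 < a_hat < 1.0:
        raise ValueError("A-hat = sin(theta_0) must lie in (0, 1)")
    return ((1.0 + a_hat ** 2) / a_hat * math.atanh(a_hat) - 1.0) / (2.0 * a_hat ** 2)


for theta0_deg in (1, 10, 30, 60):
    a_hat = math.sin(math.radians(theta0_deg))
    print(f"theta_0 = {theta0_deg:2d} deg: G = {G(a_hat):.6f}")
```

Because $G$ enters divided by $\varepsilon$, even a modest increase of $G$ with pitch angle changes $\Delta\mu$ by a large factor when $\varepsilon$ is small.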
4.3. Jump in the gyroaction: a quantum mechanical approach

In the preceding Section 4.2.1 we presented an analytical calculation of the jump $\Delta\mu$ in the gyroaction as the particle moves across the plane of minimum magnetic field. It used a prescription first given by Hastie et al. [40]. Well before this "direct" approach, however, Dykhne and Chaplik [44] had presented a quantum mechanical approach, which provides a quantum mechanical interpretation of the jump $\Delta\mu$. This was, in fact, the first correct calculation of $\Delta\mu$. Though indirect, it gives important insight into the nature of the nonadiabatic effects, in relation to the analogy with quantum effects to which we have been alluding. Since this concept is important from our point of view, we describe here the Dykhne–Chaplik approach to calculating $\Delta\mu$ for an axisymmetric magnetic field. The calculation is carried out for a particle moving initially in a homogeneous magnetic field and later passing through an inhomogeneous field along a curved field line. The Schrödinger equation for the particle in an axisymmetric magnetic field, in the cylindrical coordinate system $(r,\phi,z)$, is

$$-\frac{\hbar^2}{2m}\left[\frac1r\frac{\partial}{\partial r}\left(r\frac{\partial\psi}{\partial r}\right) + \frac{\partial^2\psi}{\partial z^2}\right] + \frac{1}{2m}\left(\frac{\hbar}{ir}\frac{\partial}{\partial\phi} - \frac ec A_\phi\right)^2\psi = E\psi . \qquad (4.98)$$

Because of the axisymmetry, one can seek solutions of the form

$$\psi = \chi(r,z;m)\,e^{im\phi} , \qquad (4.99)$$

whence we get

$$-\frac{\hbar^2}{2m}\left[\frac1r\frac{\partial}{\partial r}\left(r\frac{\partial\chi}{\partial r}\right) + \frac{\partial^2\chi}{\partial z^2}\right] + V\chi = E\chi , \qquad (4.100)$$

with $V$ the effective potential

$$V = \frac{1}{2m}\left(\frac Mr - \frac ec A_\phi\right)^2 . \qquad (4.101)$$

Here $A_\phi$ is the vector potential and

$$M = m\hbar \qquad (4.102)$$

is the conserved canonical angular momentum. In (4.99) and (4.102), $m$ is the azimuthal quantum number, not the mass. The nonadiabatic change $\Delta\mu$ is identified with the transition from the Landau level $n_0$ (corresponding to $n_0 = \mu_0/\hbar$) to the neighbouring Landau levels $n_0\pm1$, induced by the magnetic field inhomogeneity. To calculate the corresponding matrix elements of the scattering matrix we start from the conserved canonical angular momentum. In the small Larmor radius limit, the guiding centre of the particle lies on the field line given by

$$\frac ec\,(rA_\phi)_0 = M . \qquad (4.103)$$

By virtue of the adiabatic limit, such a particle always stays close to the field line (4.103). It is therefore more appropriate to employ a local coordinate system $(y,\phi,s)$ in place of $(r,\phi,z)$. Here $s$ is the length along the line of force, measured from some appropriate point, and
$y$ is a coordinate orthogonal to the field line, namely the shortest distance of the particle from the line of force. The angle $\phi$ is the azimuthal angle as before. The line element in this coordinate system is

$$dl^2 = dy^2 + h_\phi^2\,d\phi^2 + h_s^2\,ds^2 , \qquad (4.104)$$

where $h_s = 1 - y/R_c$ and $h_\phi = r$ are the scale factors, $R_c$ being the radius of curvature of the field line at the point $s$. The parametric equation of the line of force is

$$r = \rho(s) , \qquad z = z(s) . \qquad (4.105)$$

In the small Larmor radius limit the coordinate $y$ of the particle remains small throughout the motion. We can therefore expand $(rA_\phi)$ in the potential energy term (4.101):

$$rA_\phi = (rA_\phi)_{y=0} + y\left[\frac{\partial}{\partial y}(rA_\phi)\right]_{y=0} + \frac{y^2}{2}\left[\frac{\partial^2}{\partial y^2}(rA_\phi)\right]_{y=0} + \cdots . \qquad (4.106)$$
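Moreover, the total magnetic field on a field line is

$$B = \frac1r\frac{\partial}{\partial y}(rA_\phi) . \qquad (4.107)$$

Hence, using (4.106), (4.107) and (4.103), we get

$$V = \frac{1}{2mr^2}\left(M - \frac ec\,rA_\phi\right)^2 = \frac12\,m\,\Omega^2(s)\,y^2\left[1 + y\left(\frac1{R_c} - \frac{\dot z}{\rho}\right)\right] , \qquad (4.108)$$

where $\dot z \equiv dz/ds$ and $r = \rho(s) + y\dot z$ have been used. We have also used the relation $\partial B/\partial y = B/R_c$, which follows in these coordinates from the vacuum condition $\nabla\times\mathbf B = 0$, to substitute for $\partial^2(rA_\phi)/\partial y^2$ in (4.106) and (4.108). We now make a further change of variables from $y$ to $\xi$,

$$\xi = y\left(\frac{m\Omega(s)}{\hbar}\right)^{1/2} , \qquad s' = s , \qquad (4.109)$$

with scale factors

$$h_\xi = \left(\frac{\hbar}{m\Omega}\right)^{1/2} , \qquad h_\phi = \rho(s) + \left(\frac{\hbar}{m\Omega}\right)^{1/2}\xi\,\dot z , \qquad h_s = 1 - \left(\frac{\hbar}{m\Omega}\right)^{1/2}\frac{\xi}{R_c} , \qquad (4.110)$$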
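we obtain the Laplace operator $L$ of the Schrödinger equation (4.98) in the variables $\xi$ and $s$ as $L = L_0 + L_1$, with

$$L_0 = \frac{m\Omega}{\hbar}\frac{\partial^2}{\partial\xi^2} + \frac{\sqrt\Omega}{\rho}\frac{\partial}{\partial s}\left(\frac{\rho}{\sqrt\Omega}\frac{\partial}{\partial s}\right) , \qquad (4.111)$$

$$L_1 = \left(\frac{m\Omega}{\hbar}\right)^{1/2}\left(\frac{\dot z}{\rho} - \frac1{R_c}\right)\frac{\partial}{\partial\xi} + 2\xi\left(\frac{\hbar}{m\Omega}\right)^{1/2}\frac{\sqrt\Omega}{R_c\,\rho}\frac{\partial}{\partial s}\left(\frac{\rho}{\sqrt\Omega}\frac{\partial}{\partial s}\right) + \cdots . \qquad (4.112)$$

The remaining terms of $L_1$ are also of first order in $\xi(\hbar/m\Omega)^{1/2}$ and multiply $\partial/\partial s$. Their coefficients are built from $\ddot z/\rho$, $\dot z\dot\rho/\rho^2$, $\dot R_c/R_c^2$ and $\dot\Omega/\Omega$, the dots denoting differentiation with respect to $s$.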
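Likewise, the potential $V$ in terms of the variables (4.109) is given by

$$V = V_0 + V_1 , \qquad V_0 = \frac12\hbar\Omega\,\xi^2 , \qquad V_1 = \xi\left(\frac{\hbar}{m\Omega}\right)^{1/2}\left(\frac1{R_c} - \frac{\dot z}{\rho}\right)V_0 . \qquad (4.113)$$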
The transformation to the new variable $\xi$ is prompted by the observation that the potential energy part $V_0 = \tfrac12 m\Omega^2(s)y^2$ corresponds to a "local" harmonic oscillator, that is, one whose frequency is a weak function of $s$. Both the Laplace operator $L$ and the potential energy $V$ are split into two parts, $L_0+L_1$ and $V_0+V_1$ respectively. $L_1$ and $V_1$ contain small quantities of the type $\xi/R_c$, with the first power of $\xi$, and are treated as perturbations. In terms of the new variables, the zeroth-order Schrödinger equation is

$$\frac{\hbar\Omega}{2}\left(-\frac{\partial^2}{\partial\xi^2} + \xi^2\right)\chi - \frac{\hbar^2}{2m}\frac{\sqrt\Omega}{\rho}\frac{\partial}{\partial s}\left(\frac{\rho}{\sqrt\Omega}\frac{\partial\chi}{\partial s}\right) = E\chi . \qquad (4.114)$$

Since $\Omega$ is a weak function of $s$, Eq. (4.114) almost separates in the variables $\xi$ and $s$. In the quasi-classical approximation this gives, for a given energy $E$, the solution

$$\chi(n,E) = \left(\frac{m}{2\pi k_{nE}\hbar^2}\right)^{1/2}\left(\frac{m\Omega}{\hbar\rho^2}\right)^{1/2}\exp\left(i\int^s k_{nE}\,ds\right)H_n(\xi) , \qquad (4.115)$$

where $H_n(\xi)$ is a harmonic oscillator eigenfunction for the quantum number (Landau level) $n$, and $k_{nE}$ is the "longitudinal" wave number

$$k_{nE}^2 = \frac{2m}{\hbar^2}\left[E - \left(n+\tfrac12\right)\hbar\Omega(s)\right] . \qquad (4.116)$$
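The step from the first to the second form of (4.118) below uses $k_{n-1,E} - k_{nE} \approx \Omega/v_\parallel$. This is what turns the phase of the transition amplitude into the gyrophase. The quick check below works in hypothetical units $\hbar = m = 1$, with illustrative values of $E$, $\Omega$ and $n$. It compares the exact difference of the wave numbers (4.116) with $\Omega/v_\parallel$ taken from (4.119).

```python
import math


def k_longitudinal(n, E, omega):
    """k_nE from Eq. (4.116), in units hbar = m = 1:
    k_nE^2 = 2 [E - (n + 1/2) omega]."""
    return math.sqrt(2.0 * (E - (n + 0.5) * omega))


def v_parallel(n, E, omega):
    """v_par from Eq. (4.119), in units hbar = m = 1."""
    return math.sqrt(2.0 * (E - n * omega))


E, omega, n = 100.0, 1.0, 10  # illustrative values with n * omega << E
exact = k_longitudinal(n - 1, E, omega) - k_longitudinal(n, E, omega)
approx = omega / v_parallel(n, E, omega)
print(exact, approx, abs(exact - approx) / approx)
```

The relative error is second order in $\hbar\Omega/(E - n\hbar\Omega)$. The approximation therefore improves as the longitudinal energy grows compared with the Landau-level spacing.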
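The perturbation Hamiltonian arising from $L_1$ and $V_1$ contains terms with $\xi$ and $\partial/\partial\xi$ (odd powers). It therefore induces transitions from $n_0$ to the neighbouring levels $n_0\pm1$. The matrix element $S_{nl}$ of the scattering matrix is

$$S_{nl} = \int\chi_n^*\left(-\frac{\hbar^2}{2m}L_1 + V_1\right)\chi_l\,d\xi\,ds . \qquad (4.117)$$

Let $\hat X$ and $\hat Y$ be the terms with the operators $\xi$ and $\partial/\partial\xi$, respectively, in the perturbing Hamiltonian. Integration over $\xi$ then yields

$$S_{n,n-1} = \left(\frac n2\right)^{1/2}\int ds\,\exp\left[-i\int ds\,(k_{nE} - k_{n-1,E})\right](X-Y) = \left(\frac n2\right)^{1/2}\int ds\,\big(X(s)-Y(s)\big)\exp\left(i\int\frac{\Omega(s)}{v_\parallel}\,ds\right) , \qquad (4.118)$$

where $k_{nE}$ is expanded around $k_{n-1,E}$, $v_\parallel$ is given by

$$v_\parallel = \left[\frac{2\big(E - n\hbar\Omega(s)\big)}{m}\right]^{1/2} , \qquad (4.119)$$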
and $X(s)$ and $Y(s)$ are the matrix elements of the operators $\hat X$ and $\hat Y$. Note that $X(s)$ and $Y(s)$ consist of terms $\sim 1/R_c$. Note also that the exponent $\int\Omega\,ds/v_\parallel$ of the exponential in (4.118) is essentially the gyrophase, $\varphi = \int\Omega\,ds/v_\parallel = \int\Omega\,dt$. The integral (4.118) is thus very similar in structure to the integral (4.82) for $\Delta\mu$, whose integrand has the form $\mathrm{Im}(e^{i\varphi}/R_c)$. It is therefore evaluated in the same way as (4.82), by going into the complex plane of $s$. The main contribution to the integral then comes from the saddle point, which coincides with the zero $s_0$ of the function $\Omega(s)$. The poles of $1/R_c$ also coincide with the zeros of $\Omega(s)$. To lowest order in the Larmor radius, the near-diagonal elements of the scattering matrix are then, as given in Ref. [44],

$$S_{n,n-1} = C\,\frac m\hbar\left(\frac n2\right)^{1/2}\frac{v_\parallel^{1/4}}{R(s_0)\,[\dot\Omega(s_0)]^{1/4}}\exp\left(i\int^{s_0}\frac{\Omega\,ds}{v_\parallel}\right) , \qquad (4.120a)$$

$$S_{n,n+1} = C^*\,\frac m\hbar\left(\frac{n+1}2\right)^{1/2}\frac{v_\parallel^{1/4}}{R(s_0^*)\,[\dot\Omega(s_0^*)]^{1/4}}\exp\left(-i\int^{s_0^*}\frac{\Omega\,ds}{v_\parallel}\right) , \qquad (4.120b)$$

where $C = 2^{-1/4}\,\Gamma(1/4)\,e^{7\pi i/8}$. The change in the adiabatic invariant is then given by
1=4 E 1=2 v V =R ˙ 0 R0
˙ R20 ×exp i
s0
−∞
where
−;
ds
− − − v v
− + i − s0 + i v
−
;
(4.121)
$\varphi_-$, $v_-$ and $\Omega_-$ are the phase, longitudinal velocity and gyrofrequency as $s\to-\infty$.
5. Long-term nonadiabaticity of adiabatically confined systems

In Section 4.2.1 we considered essentially single-transit nonadiabaticity: the nonadiabatic jump $\Delta\mu$ in the gyroaction as the particle transits once across, for example, the region of minimum field. If, however, the particle is (adiabatically) confined in a mirror trap, it bounces off the mirrors at both ends and thereby makes repeated transits across the midplane. A question of considerable physical interest and mathematical complexity is: what is the net change in $\mu$ after an arbitrarily large number of bounces between the mirrors? It can be posed more mathematically as follows. Suppose the initial conditions are such that the motion corresponds to adiabatic confinement in a magnetic mirror trap. What is then the long-term stability of the motion with respect to changes in the gyroaction? Equivalently, in a static magnetic field, how stable is the partition of the total energy between the "parallel" and "perpendicular" components? The numerical work of Garren et al. [11] had already revealed a rather interesting feature. For a certain set of initial values $(\mu,\varphi)$, the jumps of $\mu$ across the midplane were highly self-cancelling, resulting in what appeared at first glance to be bounded oscillations in $\mu$. A theorem was, in fact,
later proved by Arnold [25]. According to it, the gyroaction suffers only bounded oscillations in an adiabatic trap if the trap is axisymmetric and the adiabaticity parameter is "small enough" (that is, the magnetic field is strong enough). This implies that a particle initially confined in a finite adiabatic potential well remains confined forever. A similar result was later proved by Braun [47]. A difficulty with the practical application of this result is that no estimate is available of how small the adiabaticity parameter must be for the consequences of the theorem to hold. In the absence of more practical analytical results, two other routes are available for examining the problem and gaining some insight into the behaviour of this system, namely (a) numerical experiments and (b) laboratory experiments. We have already described in Section 4.2 the numerical experiments of Garren et al., which revealed some very interesting features of the motion. In Section 5.1 we present some early laboratory experiments that give some indication of the long-term nonadiabatic behaviour of charged particles in finite adiabatic potential wells. This behaviour manifests itself in the observation of finite (non-infinite) residence times in these magnetic traps. Later, in Section 5.3, we discuss the long-term stability of this motion à la Chirikov [9,52]. His analysis aims to provide a more practical criterion for the stability of this motion, and therefore one for perpetual confinement. Chirikov demarcates the stability boundary using the criterion of the overlap of neighbouring (nonlinear) resonances, between the mean gyrofrequency and the harmonics of the bounce frequency in the trap. The overlap is expressed through the value of a parameter of a discretized mapping.
Beyond a critical value of this parameter the neighbouring resonances overlap, and the particle executes unbounded excursions in $\mu$ in a stochastically diffusive manner. Below this critical value there are supposed to be only bounded oscillations. However, Arnold [25] has proved the existence of bounded oscillations in $\mu$ only for a "sufficiently small" value of the adiabaticity parameter. One would therefore like to understand the relationship between the Chirikov stability criterion and the requirement for the Arnold theorem to hold. Both the Arnold–Braun theorem and the stability of motion à la Chirikov have been considered for an axisymmetric magnetic trap. Such a trap constrains the system through the conservation of the canonical angular momentum. Relaxing this constraint in axially asymmetric traps allows the motion to enter regions of phase space previously forbidden. In particular, the axial asymmetry allows the particle to move across magnetic surfaces. In such a situation another periodic motion, associated with the $\nabla B$ drift, comes into play nontrivially and leads to new sets of resonances involving three frequencies. An overlap of such neighbouring resonances can lead to excursions of particles across magnetic surfaces, again in a diffusive manner. The instability associated with this process has been discussed by Arnold, and Chirikov has termed the resulting diffusion Arnold diffusion. This is briefly discussed in Section 5.3.2.

5.1. Nonadiabatic loss of particles from magnetic mirror traps: some early experimental results

As discussed in Section 4.2, the numerical results of Garren et al. already indicated a loss of particles from mirror traps even when they were in the trapped zone. The trapped zone lies away from the "loss cone" in velocity space, $\theta > \theta_c \equiv \sin^{-1}(B_0/B_m)^{1/2}$, where $\theta_c$ is the loss cone angle. Here $B_0$ is the magnetic field at the midplane of the mirror trap and $B_m$ is the maximum of the field at the mirror point. As we saw, the closer the pitch angle $\theta$ of the particle is to the loss cone
angle $\theta_c$, the sooner the particle was lost after a few bounces off the mirrors. This amounts to an increase in the effective loss cone angle. Early experiments on the behaviour of charged particles in such traps were carried out by Rodionov [2(a)] and Gibson et al. [2(b)]. They confirmed the adiabatic invariance of the gyroaction to quite a high degree: electrons were found to be confined in such traps for up to $\sim10^9$ reflections (Gibson et al.) and $\sim10^7$ reflections (Rodionov). However, these experiments did not study the nonadiabatic escape of charged particles that the numerical experiments suggested. Such studies were undertaken later by (then) Soviet experimenters [48]. The latest of these, by Dubinina et al. [49], were by far the most exhaustive and controlled up to that time. They injected electrons of a certain energy into the mirror trap at a pitch angle greater than the adiabatic loss cone angle $\theta_c$ for the given field configuration (mirror ratio $M = B_m/B_0$, scale length $L = |\nabla B/B|^{-1}$, etc.). They then measured the residence times $\tau$ of the trapped particles, corresponding to an exponential decay, as a function of the magnetic field strength. Varying the current in the solenoid coils preserves the field configuration. For precise experimental details the reader is referred to the original source [49]. The exponential-decay residence times may be determined by plotting the leakage current from the trap against time on a semi-log plot. The lifetimes so determined were in turn plotted against the magnetic field strength on a semi-log plot, which gave a straight line with finite slope. This shows that, for a given energy, pitch angle and field configuration, the residence time increased exponentially with the magnetic field strength. For a given background gas pressure, the residence times beyond a certain magnetic field were limited by collisional loss.
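The semi-log procedure described above amounts to a straight-line least-squares fit of $\ln I$ against $t$, the lifetime being minus the inverse slope. The sketch below applies such a fit to a synthetic, noisy exponential record. The numbers are hypothetical and are not the data of Ref. [49]. The sketch also evaluates the adiabatic loss cone angle $\theta_c = \sin^{-1}(B_0/B_m)^{1/2}$ for a given mirror ratio.

```python
import math
import random


def loss_cone_angle(mirror_ratio):
    """theta_c = arcsin((B0/Bm)^(1/2)) = arcsin(M^(-1/2)), in radians."""
    return math.asin(1.0 / math.sqrt(mirror_ratio))


def fit_decay_time(times, currents):
    """Least-squares straight line through (t, ln I); returns tau = -1/slope."""
    logs = [math.log(i) for i in currents]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    return -sxx / sxy  # slope = sxy / sxx, so tau = -1/slope


# Synthetic leakage-current record (hypothetical numbers, not measured data):
rng = random.Random(1)
tau_true = 2.5e-3  # seconds
times = [k * 2e-4 for k in range(50)]
currents = [math.exp(-t / tau_true) * (1.0 + 0.02 * rng.gauss(0.0, 1.0)) for t in times]

print(math.degrees(loss_cone_angle(4.0)))  # loss cone angle for M = 4, about 30 degrees
print(fit_decay_time(times, currents))     # close to tau_true
```

The same fit, applied to $\ln\tau$ against $B$, gives the slope of the lifetime plot. A break in that plot would show up as a change in the fitted slope between field ranges.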
Since the particles were injected with a pitch angle greater than the loss cone angle, the loss of particles below such a magnetic field could be attributed unambiguously to nonadiabatic effects. These authors were particularly interested in checking experimentally the manifestation of the Arnold theorem. According to that theorem, the gyroaction, as an adiabatic invariant, has only bounded oscillations if the adiabatic trap is axisymmetric and the adiabaticity parameter is small enough. This would entail perpetual trapping of the particles, that is, an infinite residence time for a large enough magnetic field. The straight-line plot of $\ln\tau$ vs. $B$ would then acquire an infinite slope beyond a certain large enough value of the magnetic field. The authors did observe a break in the slope of the $\ln\tau$ vs. $B$ plot, but the slope did not become infinite; it only increased by a factor of 3. The authors therefore could not conclude that they had observed perpetual trapping at the highest magnetic field used.

5.2. Life time as an ensemble property

A lifetime against decay, such as radioactive decay or the leakage of particles from adiabatic traps discussed above, is generally defined through an exponential decay. This, of course, presupposes a population or ensemble large enough for the decay function to be defined sensibly. Purely empirically, an exponential decay implies that the rate of decrease of a population is proportional to the number of members present at a given time. This holds regardless of any label, such as the initial conditions, that the members may carry. It signifies Markovian behaviour. A Hamiltonian dynamical system such as the one considered in this review, namely charged particles in a magnetic field, would in general be far from Markovian. However, under certain
conditions it can simulate Markovian behaviour, but this requires careful examination. In general, consider an ensemble of non-interacting particles of a Hamiltonian dynamical system escaping from a quasi-confinement system such as the "adiabatic trap". Their escape is determined solely by their initial conditions, propagated in time by the governing dynamical equations. Let $\tau_i = \tau_i(X_i^{(0)})$ denote the instants of escape of the members of the ensemble with initial conditions $X_i^{(0)}$. Suppose the escape of a particle is defined appropriately, such that its position $X_i(\tau_i)$ at the instant of escape belongs to the region $S^{(\mathrm{out})}$ outside the trap, $X_i(\tau_i)\in S^{(\mathrm{out})}$. Then $\tau_i = \tau_i(X_i^{(0)})$ can be regarded as a mapping $\tau_i = O(X_i^{(0)})$ from $X_i^{(0)}$ to $\tau_i$, defined by the dynamical equations. The decay law of the trap is then determined by how the escape instants $\tau_i$ of the various particles fall into the various time bins. Thus, if $N(t)\,dt$ is the number of escapes between the times $t-\tfrac12dt$ and $t+\tfrac12dt$, we have

$$N(t)\,dt = \sum_i\int_{t-\frac12dt}^{t+\frac12dt}\delta(t'-\tau_i)\,dt' = \int_{t-\frac12dt}^{t+\frac12dt}dt'\int\delta\big[t'-O(X^{(0)})\big]\,f(X^{(0)})\,dX^{(0)} , \qquad (5.1)$$

where $f(X^{(0)})$ is the distribution over the initial conditions with which the ensemble is injected into the trap. The form of the decay function $N(t)$ is then determined by two elements of this expression: (i) the form of the function $f(X^{(0)})$ and (ii) the nature of the mapping $O(X^{(0)})$. The mapping $O$ may be "regular" for some of the initial data and "chaotic" for others. If $f$ covers all such initial conditions, $N(t)$ will carry signatures of all of them. It would therefore be inappropriate and meaningless to speak of any a priori form of $N(t)$ without regard to the form of $f$ and the nature of $O$. $N(t)$ could have any form representing a decay. In the extreme limit, the mapping $O$ may be so chaotic that the distinction among the various initial conditions is lost. The system may then simulate Markovian behaviour, and $N(t)$ may take the form of an exponential decay. That conclusion, however, would require careful analysis. Even so, the converse need not hold: an exponential decay need not always imply Markovian behaviour or a chaotic mapping. As we discuss in detail in Section 6.2.1, special forms of the function $f(X^{(0)})$ define special ensembles with some very interesting properties, some of which have been explored by Synge [3]. Such ensembles form the basis of our studies. These studies have unravelled some very fascinating theoretical and experimental manifestations of the ensembles' properties. The manifestations concern residence times, their theoretical prediction and experimental verification, and also some other, unexpected effects.

5.3. Stability of the adiabatically confined motion in the magnetic traps

As we saw in the last section, experiments carried out [49] to check the consequences of the Arnold theorem failed to record perpetual confinement at the highest magnetic field used. It would therefore appear that, at least for the field strengths employed in the experiment, there may yet be a cumulative change in the gyroaction. This change could be responsible for the observed finite (non-infinite) residence times in the trap. However, the existence of a break in the $\ln\tau$ vs.
$B$ plot observed in the experiment at a certain lower magnetic field suggests two different kinds of mechanism of nonadiabatic loss. These may be operative in different ranges of field strength.

5.3.1. The axisymmetric case

Chirikov [9,52] has studied the stability of the bounce motion in the adiabatic trap, which is identified as the adiabatically trapped motion in consequence of the adiabatic invariance of the gyroaction. More precisely, the question is how well the gyroaction of a given particle is conserved over a long time. Chirikov's approach does not label the particles by all their initial conditions. Instead, it labels them by a certain parameter which essentially measures the magnitude of the nonadiabatic jump $\Delta\mu$ that the particle suffers as it crosses the midplane, the single-transit nonadiabatic change. As we know from Section 4.2 (Eq. (4.93), for instance), this jump depends on the particle speed $v$ and rather critically on the pitch angle $\theta_0$ at the midplane crossing. But, as we shall see later, he uses the value of $\Delta\mu$ in the limit $\theta_0\to0$. The exact particle motion in an adiabatic trap is, of course, a continuous Hamiltonian flow in time. Chirikov, however, approximates it by a discrete canonical map, obtained by splitting the exact motion into two parts. The first part is the single-transit change $\Delta\mu$ at the midplane, whose expression has been obtained in Section 4.2.1. The second is an (approximate) adiabatic motion between consecutive midplane crossings. He thus constructs a Poincaré surface of section specified by the two variables $(\mu,\varphi)$ at the midplane crossing. Here $\mu$ is the gyroaction after the crossing and $\varphi$ is the gyrophase at the crossing. The change ("jump") in $\mu$ is taken to occur only at the midplane, or at one or two other discrete points depending on the form of the magnetic field. In between crossings $\mu$ remains constant, in consequence of the assumed adiabatic invariance. The change in $\varphi$, by contrast, accumulates throughout the adiabatic motion between the crossings. One therefore has the discrete mapping

$$\bar\mu = \mu + \xi(\mu)\sin\varphi , \qquad \bar\varphi = \varphi + \psi(\bar\mu) , \qquad (5.2)$$
where $\xi(\mu)$ is, in general, a function of $\mu$. Its form depends on the form of the magnetic field, which is given by the function $f(s)$ in the paraxial approximation [Eq. (4.89)]. $\psi(\bar\mu)$ represents the phase change between two successive midplane crossings and is governed essentially by the adiabatic motion:

$$\psi(\bar\mu) = \left(\frac m2\right)^{1/2}\int_{s_0}\frac{\Omega\,ds}{\sqrt{E - \bar\mu\Omega}} = \bar\Omega\,\tau = \frac{\pi\bar\Omega}{\omega_b} . \qquad (5.3)$$

The integral is taken over half a period of the bounce motion, from the midplane $s_0 = 0$ back to the same position after one reflection off the mirror. $\tau = \pi/\omega_b$ is the time taken for this transit, $\omega_b$ being the bounce frequency, and $\bar\Omega$ is the gyrofrequency averaged over the transit. $\psi$ is taken as a function of $\bar\mu$, the value of $\mu$ after the change $\Delta\mu$. The mapping (5.2) is not canonical if $\xi$ is a function of $\mu$. It can, however, be rendered canonical by using, instead of $\mu$, an appropriate function $P$ of $\mu$ as the variable, chosen such that the change $\Delta P$ is not a function of $P$. Whether this is possible depends on the form of $\xi(\mu)$. For example, Eq. (4.93) illustrates a case in which the change of $\sqrt\mu$ is independent of $\mu$. The corresponding mapping would
then be canonical. For any function $\xi(\mu)$, however, the mapping can be rendered canonical by linearizing it as follows. The mapping (5.2) is linearized about a value $\bar\mu = \mu_n$ chosen to satisfy a resonance $\bar\Omega = 2n\omega_b(\mu_n)$. This is a resonance between the averaged gyrofrequency $\bar\Omega$ and the bounce frequency $\omega_b(\mu_n)$ corresponding to $\mu_n$. [The use of $2n\omega_b$ rather than $n\omega_b$ in this resonance condition is connected with the symmetry of the motion (see Ref. [52, p. 40]).] $\psi(\bar\mu)$ is thus expanded around $\mu_n$:

$$\psi(\bar\mu) = \psi(\mu_n) + (\bar\mu - \mu_n)\,\psi'(\mu_n) , \qquad (5.4)$$

where $\psi(\mu_n) = 2\pi n$ by the hypothesis that $\mu_n$ is a resonant value. Defining

$$I = (\mu - \mu_n)\,\psi'(\mu_n) , \qquad (5.5)$$

$$K = \xi(\mu_n)\,\psi'(\mu_n) , \qquad (5.6)$$

we obtain the linearized form of the mapping (5.2),

$$\bar I = I + K\sin\varphi , \qquad \bar\varphi = \varphi + \bar I . \qquad (5.7)$$

The linearization in (5.4) assumes that $|\mu-\mu_n|/\mu < |\mu_{n+1}-\mu_n|/\mu_n \sim 1/n \ll 1$, that is, that the resonance $\bar\Omega = 2n\omega_b$ is of high order, $n\gg1$. The constant $2\pi n$ has been dropped from the second of Eqs. (5.7). Being an integral multiple of $2\pi$, it does not affect the first of Eqs. (5.7). The mapping (5.7) is known as the "standard mapping" (Chirikov [10]) and is characterized by the single parameter $K$. For the "quadratic" field variation (4.89), using the expression for $\xi$ from (4.93) and (5.3) for $\psi(\mu)$, one obtains [9]

$$K = \frac{3\pi^2}{16}\,\frac{r_0}{L\varepsilon^2}\,\frac{3+\sin^2\theta_0}{\sin^4\theta_0}\,\exp(-G/\varepsilon) , \qquad (5.8)$$

where $G$ is given by (4.94) and depends on $\theta_0$ in a rather complicated manner. Note that $\theta_0$ here corresponds to the resonant value $\mu_n$. $K$ is therefore independent of $I$, which ensures that the mapping (5.7) is canonical. The behaviour of the standard mapping (5.7) as a function of $K$ has been studied by Chirikov [9,10,52] and other workers (Greene [50], Rechester et al. [51]). Two types of behaviour have been identified, depending on the value of $K$, when the mapping is iterated a large number of times ($N\sim10^6$–$10^7$) from given initial values of $\varphi$ and $I$. For $K < K_{cr}\sim1$, the map-iterated points $(\varphi, I)$ (modulo $2\pi$) were found to lie in narrow bands near the separatrices of the mapping. Within a band the points move stochastically from one iteration to the next (see Fig. 4 of Ref. [9], where the horizontal axis represents one period of the gyrophase and the vertical axis two periods of $\mu$, around a resonance value $\mu_n$). This motion within a separatrix band, though stochastic, corresponds only to bounded oscillations in $\mu$. For $K > K_{cr}$, on the other hand, the points of a single (map-iterated) trajectory fill a larger area. They move stochastically from one separatrix to the next through a series of jumps, as shown by the hatched portion of Fig. 3 of Ref. [9]. The unhatched portion corresponds to bounded oscillations in $\mu$. The behaviour in the hatched zone is termed "unstable". It has been shown to be associated with the overlap of neighbouring resonances $n$ and $n\pm1$ [10], which signals the onset of unstable behaviour leading to unbounded change in $\mu$.
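The two regimes of the standard mapping (5.7) are easy to reproduce. The sketch below iterates (5.7) from the same initial point for $K = 0.5$ and $K = 5$ and records the largest excursion of the unwrapped $I$. Below the critical value, for which Greene's estimate is $K_{cr}\approx0.97$, invariant curves keep $I$ bounded; for this starting point it stays well within $2\pi$ of its initial value. Above it, the orbit wanders stochastically across many resonances. The initial point and the number of iterations are arbitrary choices made for illustration.

```python
import math


def max_action_excursion(K, I0, phi0, n_steps):
    """Iterate the standard mapping of Eq. (5.7),
        I_new   = I + K sin(phi),
        phi_new = phi + I_new   (mod 2 pi),
    keeping I unwrapped, and return the largest |I - I0| reached."""
    action, phase = I0, phi0
    largest = 0.0
    for _ in range(n_steps):
        action += K * math.sin(phase)
        phase = (phase + action) % (2.0 * math.pi)
        largest = max(largest, abs(action - I0))
    return largest


for K in (0.5, 5.0):
    excursion = max_action_excursion(K, I0=0.1, phi0=0.3, n_steps=10_000)
    print(f"K = {K}: max |I - I0| = {excursion:.2f}  (2*pi = {2 * math.pi:.2f})")
```

The action $I$ is deliberately not reduced modulo $2\pi$. An unbounded drift of the unwrapped $I$ is precisely the "unbounded change in $\mu$" referred to in the text.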
The unbounded stochastic changes in $\mu$ across resonances for $K > K_{cr}$ can be regarded as diffusive. A diffusion coefficient $D = \langle(\Delta\mu)^2\rangle/t$ in $\mu$ can then be evaluated using the mapping. As discussed by Chirikov [52], a diffusion coefficient adequately describes the excursions in $\mu$ only in the limit $K\gg1$ of the standard-mapping parameter, and only over short times. For $K > 1$ and over long times, it is more appropriate to use the more accurate nonlinearized mapping corresponding to Eq. (4.93) for the "quadratic" field, namely

$$\bar p = p + \xi\sin\varphi , \qquad \bar\varphi = \varphi + \psi(\bar p) , \qquad (5.9)$$

where now

$$p = \sqrt\mu , \qquad \xi = -\frac{3\pi\,r_0\,v}{8\sqrt2\,\varepsilon L\,\Omega_0^{1/2}}\exp(-G/\varepsilon) , \qquad (5.10)$$

as obtained from (4.93). Here $G$ is a function of $\sin\theta_0$ given by (4.94), and $\psi(\bar p)$ is given, as before, by (5.3) with $\bar\mu$ replaced by $\bar p^2$. An approximation is made by taking the limit $\theta_0\ll1$. This makes $G$ independent of $\theta_0$, reducing it to $G\approx2/3$, and renders the mapping (5.9) canonical. Over an arbitrary length of time, a stochastic evolution is in general more correctly described by the Fokker–Planck–Kolmogorov (FPK) equation

$$\frac{\partial f(\mu)}{\partial t} = -\frac{\partial Q}{\partial\mu} , \qquad Q = Uf - \frac{\partial}{\partial\mu}(Df) , \qquad (5.11)$$

where $f$ is the probability density of $\mu$, $Q$ is the probability flux and $U = \langle\Delta\mu\rangle/t$ is the drift "velocity" in $\mu$. It has been shown [52] that with an appropriate ergodic variable $\Gamma$ used instead of $\mu$, detailed balancing gives $f(\Gamma) = \mathrm{const}$ in equilibrium. This leads to $U_\Gamma = \partial D_\Gamma/\partial\Gamma$ and $U_\mu^{(d)} = \partial D_\mu^{(d)}/\partial\mu$, where $U^{(d)}$ and $D^{(d)}$ refer to the discrete time $\tau$. Substituting this into (5.11) yields, in the discrete time $\tau$, the diffusion equation

$$\frac{\partial f}{\partial\tau} = \frac{\partial}{\partial\mu}\left(D_\mu\frac{\partial f}{\partial\mu}\right) . \qquad (5.12)$$

The continuous time $t$ is related to $\tau$ by $dt/d\tau = \pi/\omega_b(\mu)$. In continuous time the appropriate ergodic variable is $p = \sqrt\mu$, precisely the variable of the mapping (5.9), since $\omega_b\sim\sqrt\mu$. The appropriate diffusion equation in the continuous time variable $t$ is then

$$\frac{\partial f}{\partial t} = \frac{\partial}{\partial p}\left(D_p\frac{\partial f(p)}{\partial p}\right) , \qquad (5.13)$$

where

$$D_p = \left(\frac{dp}{d\mu}\right)^2 D_\mu = \frac{D_\mu}{4\mu} . \qquad (5.14)$$

However, using the mapping (5.9), we find

$$D_p = \lim_{t\to\infty}\frac{\langle(\Delta p)^2\rangle}{2t} \approx \frac{\xi^2}{4} . \qquad (5.15)$$
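Equation (5.15) treats the phases at successive midplane crossings as uncorrelated. Under exactly that random-phase assumption, a Monte Carlo sketch reproduces $D_p\approx\xi^2/4$. The assumption is our simplification: it replaces the deterministic phase advance $\psi(\bar p)$ of (5.9) by a uniformly distributed random phase. Time is counted in midplane crossings, and the value of $\xi$ is hypothetical.

```python
import math
import random


def random_phase_diffusion(xi, n_particles=4000, n_steps=100, seed=0):
    """Monte Carlo estimate of D_p = <(p_t - p_0)^2> / (2 t) for the jump
    p -> p + xi * sin(phi) of Eq. (5.9), under a random-phase assumption:
    phi is drawn uniformly on [0, 2*pi) at every midplane crossing instead of
    being advanced by psi(p).  Time t is counted in midplane crossings."""
    rng = random.Random(seed)
    sum_sq = 0.0
    for _ in range(n_particles):
        dp = 0.0
        for _ in range(n_steps):
            dp += xi * math.sin(rng.uniform(0.0, 2.0 * math.pi))
        sum_sq += dp * dp
    return sum_sq / n_particles / (2.0 * n_steps)


xi = 0.3  # hypothetical jump amplitude
print(random_phase_diffusion(xi), xi ** 2 / 4)  # the two should be close
```

Each crossing contributes $\langle\xi^2\sin^2\varphi\rangle = \xi^2/2$ to the variance, hence $D_p = \xi^2/4$. With the deterministic map, phase correlations can modify this value, which is why (5.15) is an approximation.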
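From (5.15) and (5.14), $D_\mu = \delta^2\mu$. Using the discrete-time diffusion equation (5.12) in $\mu$, we get
$$\frac{\partial f}{\partial\tau} = \frac{\delta^2}{4}\,\frac{1}{p}\frac{\partial}{\partial p}\left(p\frac{\partial f}{\partial p}\right), \tag{5.16}$$
where $p = \sqrt{\mu}$. Changing the variable $p \to X$, with $X = \sin\alpha/\sin\alpha_a = (p/p_0)M^{1/2}$, where $M = B_m/B_0$ is the mirror ratio, $p_0 = \sqrt{\mu_0} = v/(2\Omega_0)^{1/2}$, and $\alpha_a$ is the loss-cone angle, we get
$$\frac{\partial f}{\partial S} = \frac{1}{X}\frac{\partial}{\partial X}\left(X\frac{\partial f}{\partial X}\right), \tag{5.17}$$
where $S$ is a new dimensionless time,
$$S = \frac{\delta^2 M}{4\mu_0}\,\tau. \tag{5.18}$$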
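The passage from (5.12) to (5.17), and then to the eigenvalue equation that follows, is a change of variables that is easy to lose in compressed notation. Spelled out with $D_\mu = \delta^2\mu$, $\mu = p^2$ and $p = p_0X/M^{1/2}$ with $p_0^2 = \mu_0$, it reads:

```latex
\begin{aligned}
\frac{\partial f}{\partial\tau}
  &= \frac{\partial}{\partial\mu}\Big(\delta^2\mu\,\frac{\partial f}{\partial\mu}\Big),
  \qquad \mu = p^2,\quad \frac{\partial}{\partial\mu} = \frac{1}{2p}\frac{\partial}{\partial p}\\
  &= \frac{\delta^2}{2p}\frac{\partial}{\partial p}\Big(p^2\cdot\frac{1}{2p}\frac{\partial f}{\partial p}\Big)
   = \frac{\delta^2}{4}\,\frac{1}{p}\frac{\partial}{\partial p}\Big(p\,\frac{\partial f}{\partial p}\Big)
   \qquad\text{(5.16)}\\
p &= \frac{p_0}{M^{1/2}}\,X \;\Rightarrow\;
   \frac{1}{p}\frac{\partial}{\partial p}\Big(p\,\frac{\partial f}{\partial p}\Big)
   = \frac{M}{\mu_0}\,\frac{1}{X}\frac{\partial}{\partial X}\Big(X\,\frac{\partial f}{\partial X}\Big)\\
\frac{\partial f}{\partial\tau}
  &= \frac{\delta^2 M}{4\mu_0}\,\frac{1}{X}\frac{\partial}{\partial X}\Big(X\,\frac{\partial f}{\partial X}\Big),
  \quad S = \frac{\delta^2 M}{4\mu_0}\,\tau
  \;\Rightarrow\;
  \frac{\partial f}{\partial S} = \frac{1}{X}\frac{\partial}{\partial X}\Big(X\,\frac{\partial f}{\partial X}\Big)
  \qquad\text{(5.17)}\\
f &= e^{-\kappa^2 S}F(X) \;\Rightarrow\;
  \frac{d}{dX}\Big(X\,\frac{dF}{dX}\Big) + \kappa^2 X F = 0 .
\end{aligned}
```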
Assuming $f \sim \exp(-\kappa^2 S)$, we obtain for the eigenvalue problem the Bessel equation
$$\frac{d}{dX}\left(X\frac{df}{dX}\right) + \kappa^2 X f = 0, \tag{5.19}$$
with the boundary conditions
$$f(1) = 0 \quad\text{and}\quad \left.\frac{df}{dX}\right|_{X = X_{cr}} = 0, \tag{5.20}$$
where $X = 1$ corresponds to the adiabatic loss cone, and $f(1) = 0$ expresses the loss of particles at the adiabatic loss cone. $X = X_{cr}$ corresponds to the stability boundary of the mapping, and $df/dX|_{X = X_{cr}} = 0$ implies that the particle flux vanishes at the stability boundary, which is such that $X_{cr} \gg 1$. A general solution of Eq. (5.19) is
$$f(X) = CJ_0(\kappa X) + N_0(\kappa X), \tag{5.21}$$
and the boundary conditions (5.20) give
$$CJ_0(\kappa) + N_0(\kappa) = 0, \qquad CJ_1(\kappa X_{cr}) + N_1(\kappa X_{cr}) = 0. \tag{5.22}$$
Eliminating $C$, one obtains
$$J_1(\kappa X_{cr})N_0(\kappa) - J_0(\kappa)N_1(\kappa X_{cr}) = 0. \tag{5.23}$$
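The transcendental condition (5.23) can also be solved numerically and compared with the small-$\kappa$ estimate $1/\kappa^2 = \frac{1}{2}X_{cr}^2\ln X_{cr}$ obtained below as (5.25). The sketch below is a minimal pure-Python version: $J_0$, $J_1$ and the Neumann functions $N_0 = Y_0$, $N_1 = Y_1$ are evaluated from their standard ascending series (adequate for the small arguments $\kappa \ll 1$, $\kappa X_{cr} \lesssim 2$ needed here), and the lowest root is bracketed and bisected. The helper names and the bracket are choices made for this sketch only; the last helper converts $\kappa$ into the number of mapping iterations over which $f \sim \exp(-\kappa^2 S)$ falls by a factor $e$, using (5.18).

```python
import math

EULER_GAMMA = 0.5772156649015329


def harmonic(k):
    """H_k = 1 + 1/2 + ... + 1/k, with H_0 = 0."""
    return sum(1.0 / j for j in range(1, k + 1))


def j0(z, terms=40):
    h = (z / 2.0) ** 2
    return sum((-h) ** k / math.factorial(k) ** 2 for k in range(terms))


def j1(z, terms=40):
    h = (z / 2.0) ** 2
    return (z / 2.0) * sum((-h) ** k / (math.factorial(k) * math.factorial(k + 1))
                           for k in range(terms))


def n0(z, terms=40):
    """Neumann function N_0 (= Y_0) from its ascending series, z > 0."""
    h = (z / 2.0) ** 2
    tail = sum((-1) ** (k + 1) * harmonic(k) * h ** k / math.factorial(k) ** 2
               for k in range(1, terms))
    return (2.0 / math.pi) * ((math.log(z / 2.0) + EULER_GAMMA) * j0(z, terms) + tail)


def n1(z, terms=40):
    """Neumann function N_1 (= Y_1) from its ascending series, z > 0."""
    def digamma(n):  # digamma function at a positive integer n
        return harmonic(n - 1) - EULER_GAMMA

    h = (z / 2.0) ** 2
    tail = sum((digamma(k + 1) + digamma(k + 2)) * (-h) ** k
               / (math.factorial(k) * math.factorial(k + 1)) for k in range(terms))
    return (-2.0 / (math.pi * z) + (2.0 / math.pi) * math.log(z / 2.0) * j1(z, terms)
            - (z / (2.0 * math.pi)) * tail)


def condition(kappa, x_cr):
    """Left-hand side of the eigenvalue condition (5.23)."""
    return j1(kappa * x_cr) * n0(kappa) - j0(kappa) * n1(kappa * x_cr)


def lowest_kappa(x_cr, rel_tol=1e-12):
    """Bisect (5.23) for kappa in [1e-3/x_cr, 2/x_cr], where it changes sign."""
    lo, hi = 1e-3 / x_cr, 2.0 / x_cr
    g_lo = condition(lo, x_cr)
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        g_mid = condition(mid, x_cr)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def e_folding_iterations(kappa, mu0, mirror_ratio, delta):
    """Iterations tau at which kappa^2 S = 1, with S = delta^2 M tau / (4 mu0)."""
    return 4.0 * mu0 / (mirror_ratio * delta ** 2 * kappa ** 2)


if __name__ == "__main__":
    for x_cr in (100.0, 1000.0, 10000.0):
        exact = lowest_kappa(x_cr)
        approx = math.sqrt(2.0 / (x_cr ** 2 * math.log(x_cr)))  # small-kappa estimate
        print(f"X_cr = {x_cr:7.0f}: kappa = {exact:.4e}, estimate = {approx:.4e}")
```

The small-$\kappa$ estimate drops terms of relative order $(\kappa X_{cr})^2$, so its accuracy improves only slowly (logarithmically) as $X_{cr}$ grows.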
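Since the first eigenfunction must be positive everywhere, we require $\kappa X_{cr} \le 1$; with $X_{cr} \gg 1$, this implies $\kappa \ll 1$. Using the small-argument expansions of $N_0(z)$, $N_1(z)$ and $J_1(z)$ [38],
$$N_0(z) \approx \frac{2}{\pi}J_0(z)\ln\frac{\gamma z}{2}, \qquad N_1(z) \approx -\frac{2}{\pi z} + \frac{z}{\pi}\ln\frac{\gamma z}{2}, \qquad J_1(z) \approx \frac{z}{2}, \tag{5.24}$$
where $\gamma = e^{C} \approx 1.781$ and $C \approx 0.577$ is Euler's constant,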
we get from (5.22)
$$\frac{1}{\kappa^2} = \frac{1}{2}X_{cr}^2\ln X_{cr}. \tag{5.25}$$
Now $X_{cr} = (\sin\alpha_{cr})M^{1/2}$ is obtained from the stability boundary, given by $K \approx 1$. Since $K$ is given by
$$K = \delta\,\theta'(p) = \frac{81\pi^2 r_0 q^2 e^{-q}}{64L\sin^4\alpha}, \tag{5.26}$$
where $q = 2/(3\varepsilon)$, this yields
$$X_{cr}^2 = \frac{9\pi}{8}\left(\frac{r_0}{L}\right)^{1/2} M q\,e^{-q/2}. \tag{5.27}$$
We thus get the residence time
$$\tau_r = \frac{4\mu_0}{M\delta^2}\cdot\frac{1}{2}X_{cr}^2\ln X_{cr} = \frac{32}{9\pi}\left(\frac{L}{r_0}\right)^{3/2}\frac{e^{3q/2}}{q}\ln\left(M\sin^2\alpha_{cr}\right), \tag{5.28}$$
on making use of the form $f \sim \exp(-\kappa^2 S)$, the expression (5.18) for $S$ and the expression (5.10) for $\delta^2$. Noting that $q = 2/(3\varepsilon) = 2L/(3\rho_m)$, we see how the lifetime depends on the length $L$ (the magnetic field variation length near the midplane) and on the mirror ratio $M$. It should be emphasized, however, that expression (5.28) does not depend on the initial pitch angle of injection $\alpha_0$, because the limit $\alpha_0 \to 0$ has been taken in the $\alpha_0$-dependent exponent of $\delta^2$.
The adequacy of expression (5.28) for the lifetime of charged particles confined adiabatically in a magnetic mirror trap can, of course, be scrutinized against the results of laboratory as well as numerical experiments; this will be done later in Section 6. For such a comparison, and for future discussion, a few summarizing comments on the simplifications and approximations used in arriving at (5.28) are in order:
1. First, the exact (continuous) differential-equation mapping for the charged-particle dynamics in the magnetic trap is split into two components: (i) the nonadiabatic jump $\Delta\mu$ in $\mu$, which occurs as the particle transits the midplane of the trap; this change, which depends sinusoidally on the Larmor phase, is assumed to be the only change in $\mu$ between one transit and the next; (ii) the motion between two successive midplane transits, which is assumed to be adiabatic, with $\mu$ therefore remaining constant.
This adiabatic part of the motion determines the total change in the gyrophase $\varphi$ between one midplane transit and the next, and this phase in turn fixes the value of the jump at the midplane transit through its $\sin\varphi$ dependence. The continuous differential-equation mapping is thus reduced to a discrete Poincaré mapping specified by the values $(\mu_i, \varphi_i)$ at the midplane.
2. The exact analytic expression (4.93) for $\Delta\mu$ depends rather sensitively on the pitch angle $\alpha$, and because of this dependence the resulting discrete mapping turns out to be noncanonical. This only highlights the approximate nature of the discrete mapping, since the original differential-equation mapping is canonical. To render the discrete mapping canonical, a further approximation is introduced by taking the $\alpha \to 0$ limit of the expression for $\Delta\mu$. This, however, is achieved at the cost of discarding a crucial dependence on $\alpha_0$, the initial pitch angle. It should also be mentioned that if the mapping is linearized around a "resonant" value $\mu_n$ of $\mu$, no such
limit ($\alpha_0 \to 0$) need be taken: the mapping then reduces to a "standard mapping" characterized by a single mapping parameter $K$.
3. Although the Larmor phase (gyrophase) $\varphi$ is one of the dynamical variables of the discrete mapping, and an ensemble can be constructed by specifying a distribution over the initial phases, these phases have in effect no distinguishing consequence for the dynamics of individual members of the ensemble. All initial conditions pertaining to an ensemble of particles are thereby reduced to just one parameter, $K$, of the corresponding standard map. The members of the ensemble are thus not distinguished by individual labels, and all behave like a "typical" particle. However, our numerical experiments with the exact differential equations of motion reveal a rather interesting structure with respect to the initial gyrophase values [59].
4. The stochastic diffusive behaviour of the mapping, identified here as the central mechanism for the nonadiabatic loss from mirror traps, certainly ensues when $K \gg 1$, because of the "overlap of neighbouring resonances". No flux is assumed to flow across the $K = 1$ boundary from the "stable" $K < 1$ region. [This is used as one of the boundary conditions for Eq. (5.18).] While, within the standard-map approximation of the continuous map, it is certainly consistent to regard $K = 1$ as the stability boundary, this approximation may itself suppress interesting dynamical behaviour in the region $0 \le K \le 1$. In our numerical experiments with the exact differential equations we have found indications to that effect [59].
5. Finally, there may be instances where the oscillations of $\mu$ are bounded in the mean, so that the process does not amount to a diffusion (an unbounded chaotic excursion in $\mu$ space), yet they are large enough at some time to reach the adiabatic loss cone. The above treatment excludes such a possibility.
5.3.2. The nonaxisymmetric case
5.3.2.1. Strong azimuthal inhomogeneity. As we have seen, the axisymmetry of the magnetic field configuration, which implies conservation of the canonical angular momentum $P_\phi = mr^2\dot\phi + (e/c)rA_\phi$, constrains the motion of the particle to a $P_\phi = \text{const}$ surface in phase space. In the small-Larmor-radius limit, this constrains the guiding centre of the particle to a flux surface $rA_\phi = \text{const}$. When the motion of the particle is bounded along the magnetic field lines, as in a magnetic trap, there exists, as we know [5], the longitudinal invariant
$$J = \oint p_\parallel\,ds = \oint ds\,\sqrt{2m(E - \mu\Omega)},$$
where the integral is taken over a period of the longitudinal oscillation. In an axisymmetric field this invariant is trivially conserved, since the azimuthal drift of the guiding centre transports the particle to equivalent magnetic field lines on the flux surface. With the relaxation of axisymmetry, and with it of the conservation of canonical angular momentum, the system becomes three-dimensional, and excursions of particles into regions of phase space that were forbidden earlier become permissible. The azimuthal drift can now take the particle to a field line on a different flux surface, where the mirror points on the field line may change. It has been shown [5] that if the azimuthal drift of the particle is slow enough (as it generally is in guiding-centre theory) that there are a large number of longitudinal oscillations over a time
corresponding to a transit through an azimuthal inhomogeneity scale $L = ((d/d\vartheta)\ln B)^{-1}$, then the longitudinal action $J$ is an adiabatic invariant. In terms of the drift frequency $\omega_D = v_D/r$ ($v_D$ being the magnitude of the drift velocity and $r$ the distance from the axis of the field), this condition reads $\omega_b \gg \omega_D$, where $\omega_b$ is the longitudinal bounce frequency. As the particle moves from one field line to another through its azimuthal drift, conserving $J$, it traces out a closed drift surface, which is, of course, different from the one in the axisymmetric case; a value of $J$ thus defines a particular drift surface. When the condition $\omega_b \gg \omega_D$ is not satisfied well enough, there are departures from the invariance of $J$. As discussed by Chirikov [9], these departures can be quite large when the adjacent resonances defined by $2\bar\omega_b = n\omega_D$ ($n$ being a necessarily large integer) overlap. Here $\bar\omega_b$ is the longitudinal bounce frequency averaged over the drift period $2\pi/\omega_D$. These large departures in $J$, caused by the overlap of these resonances much as the changes in $\mu$ arise from the overlap of the resonances $2\bar\Omega = n\omega_b$, lead to excursions of particles away from the drift surfaces defined by the values of the invariant $J$. When the azimuthal inhomogeneity is pronounced, as measured by the relative amplitude of the various modulations,
$$\eta \equiv \left[\frac{\Delta\Omega}{\bar\Omega} + \frac{\Delta\omega_b}{\bar\omega_b}\right] \sim \frac{2\omega_b}{\bar\Omega}, \tag{5.29}$$
the neighbouring resonances overlap strongly and stochastic changes in $J$ occur, leading to a diffusive excursion of the particle guiding centre across drift surfaces. A diffusion coefficient $D_J$ can be constructed, as discussed by Chirikov. We shall, however, limit ourselves to these qualitative remarks, as these effects are not central to our review; the reader may refer to Chirikov [9] for further discussion.
5.3.2.2. Slight azimuthal inhomogeneity: Arnold diffusion. There is a vast literature on what has been referred to as "Arnold diffusion" [9], or perhaps more precisely the Arnold instability of multidimensional nonlinear dynamical systems [53]. We give here only a very qualitative description of this general phenomenon, with particular reference to charged particles in magnetic traps. In this context, one can pose a question about the Arnold theorem, alluded to in the introductory part of the present section, on the bounded variation of the gyroaction $\mu$ (and the consequent perpetual confinement of charged particles in magnetic mirror traps), which was proved under the constraint of strict axisymmetry: how are the consequences of the Arnold theorem affected if the system departs slightly from axisymmetry? This means, first of all, that the system has become three-dimensional instead of strictly two-dimensional, since the canonical angular momentum is no longer conserved. In this subsection we discuss, very qualitatively, the consequences for the particle dynamics of a slight departure from axisymmetry, specified by the inequality
$$\eta \ll \frac{2\bar\omega_b}{\bar\Omega}, \tag{5.30}$$
with $\eta$ defined by (5.29). We recall from Section 5.3.1 that, in the axisymmetric case, the standard mapping (5.7) exhibits stochastic behaviour in a thin layer around the separatrix of a nonlinear resonance even for
the mapping parameter $K \ll 1$. These layers are referred to as "stochastic layers". It is important to emphasize that in the strictly axisymmetric case the motion in these layers, though stochastic, is strictly confined to them, since the stochastic layers of different resonances $2\bar\Omega = n\omega_b$ are separated from each other by stable invariant tori. When an asymmetry is switched on, however slight, the various resonances and their stochastic layers can cross, enabling the system to move from one layer to another and bypass the stable nonresonant regions. Stochastic excursions along the resonances $2\bar\Omega = n\omega_b$ and their stochastic layers can thus lead to large cumulative changes in $\mu$ in a diffusive manner. This phenomenon is therefore referred to as "Arnold diffusion". Again, we confine ourselves to this qualitative description of Arnold diffusion; for a more quantitative account of this rather complex and general phenomenon, the reader may refer to Refs. [9,53,54].
6. A wave mechanical Schrödinger-like description of long-term nonadiabaticity: the new paradigm
As summarized at the end of the last section, the conventional approach to the calculation of the residence time in an adiabatic trap splits the exact motion into two parts: (i) a nonadiabatic part, taken to occur entirely at the midplane, and (ii) the motion in the rest of the trap, which is assumed to be adiabatic. This is not entirely satisfactory, since there may be higher orders of nonadiabaticity in other regions of the trap which could affect the final outcome and the calculated residence time. In an entirely different approach advanced by the author (Varma [55]), which predates the calculations based on the conventional wisdom [8,9], the point of view was taken that nonadiabaticity should be understood as the departure of the exact trajectory as a whole from the idealized adiabatic trajectory (in the limit $\varepsilon \to 0$).
The nonadiabatic effects, which are responsible for the leakage of particles and the consequent residence times, must therefore be suitably extracted from the exact trajectory as a whole. For a slight departure from adiabatic motion ($\varepsilon \ll 1$), the exact trajectory lies in the near neighbourhood of the adiabatic trajectory in the function space of trajectories. To correspond to the actual experimental situation, we consider an ensemble of particles with a given initial value $\mu$ of the gyroaction and energy $E$, injected into the trap at a given coordinate $x_0$ along a particular field line and distributed uniformly over the Larmor phase $\varphi$. (The last condition is very nearly satisfied, since the injection time is usually much longer than the Larmor period.) If, for simplicity, the magnetic trap is assumed to be axisymmetric, then the canonical angular momentum $P_\phi = mr^2\dot\phi + (e/c)rA_\phi$, where $A_\phi$ is the $\phi$-component of the vector potential, is a constant of motion. The particles of the ensemble are assumed to be injected with the same common value of $P_\phi$; for a small Larmor radius (slight nonadiabaticity), this amounts to a common perpendicular flux coordinate. The exact trajectories of the $N$ particles of the ensemble may then be formally given by
$$x_i = x_i(t; P_\phi, E, \mu), \qquad \varphi_i = \varphi_i(t; P_\phi, E, \mu), \qquad i = 1, 2, \ldots, N, \tag{6.1}$$
where $x$ denotes the coordinate of the particle along the field line. As mentioned earlier, because of the (assumed) slight departure of these trajectories from the adiabatic trajectory, they all lie in a small neighbourhood of the latter in the function space of trajectories.
We appeal to the principle of stationary action for the adiabatic motion to introduce a functional of the trajectories which, in some sense, specifies their position in function space relative to the adiabatic trajectory. We note that the adiabatic equation of motion
$$m\frac{d\mathbf{v}}{dt} = -\nabla(\mu\Omega) \tag{6.2}$$
follows from the stationarity of the action
$$S = \int dt\,L, \tag{6.3}$$
with $L = \frac{1}{2}mv^2 - \mu\Omega$ as the adiabatic Lagrangian. Because of this property, if the action $S$ is evaluated for the different trajectories at a given time $t$, all these values $S_t$ are centred around the value $S_A$ for the adiabatic trajectory at the same time $t$. Since a departure from the adiabatic trajectory is, by definition, indicative of nonadiabaticity, the extent of the departure of $S_t$ from $S_A$ measures the extent of nonadiabaticity of an exact trajectory. At this point we make three observations:
1. In the limit $\varepsilon \to 0$, all the exact trajectories in the neighbourhood of the adiabatic one approach the latter, so that $S_t \to S_A$ for all of them; this limit is formally equivalent to the limit $\mu \to 0$ (taken via $\Omega \to \infty$, rather than via the pitch angle $\alpha \to 0$).
2. Following our discussion in Section 1.1 and the analogies drawn in Section 2, we draw a parallel between nonadiabaticity and quantum effects. Both were shown in Section 2 to be nonanalytic departures from the corresponding limiting motions, namely the adiabatic motion of charged particles in the limit $\mu \to 0$ and the classical motion in the limit $\hbar \to 0$. Thus the nonadiabatic loss from adiabatic magnetic mirror traps when $\varepsilon \ne 0$ ($\mu \ne 0$) is likened to quantum tunneling from classical potential wells when $\hbar \ne 0$. In fact, it was shown in Ref. [55] that the nonadiabatic leakage from adiabatic magnetic traps can be described by a Schrödinger-like equation. This derivation is reviewed in Section 6.1 below.
3. The exact trajectories (6.1) in the trap, which depart from the adiabatic trajectory, are likened to the Feynman paths in his path-integral representation of quantum mechanics. Just as the Feynman paths that depart from the classical trajectory contribute to the quantum effects, the present problem may be formulated so that the nonadiabatic effects appear as due to the departures of the exact trajectories from the adiabatic motion.
6.1. Schrödinger-like equations: a heuristic derivation
This, then, constitutes the basic philosophy of our approach as expounded in Ref. [55], where, it may be emphasized, the nonadiabatic effects are associated with the departure of the trajectories as a whole from the adiabatic trajectory and are extracted through a functional-theoretic approach. We review in this section the heuristic derivation of Ref. [55] based on this approach, which was the first derivation of the Schrödinger-like equations describing nonadiabaticity in magnetic mirror traps. Although two deductive derivations of these equations followed, this derivation displays the novelty of the line of argument advanced to obtain them and demonstrates the power of heuristics, since no known dynamical equations were used as the starting
point; for this reason it may interest the reader. The equations were subsequently derived starting both from the classical Liouville equation and from the quantum mechanical Schrödinger equation, as reviewed in Sections 6.2 and 8, respectively.
If we now consider the distribution of the action $S_t$ over all the particles of the ensemble at a given time $t$, at an adiabatically accessible point $x$ in the trap (the suffix on $x$ is dropped for convenience), these values $S_t$ should be found to peak around the adiabatic value $S_A(x,t)$. The action $S_t$ may then serve as a very significant label for a trajectory, because for a trajectory to end up outside the trap, its action must differ from the extremal value at that time. It is therefore useful to introduce the action $S_t$ as a variable in defining a distribution function for the particles. Instead of $S_t$ itself, however, we introduce the action phase defined by
$$\theta_t = \frac{S_t}{\mu} = \frac{1}{\mu}\int_0^t \frac{1}{2}mv^2\,dt + \varphi, \tag{6.4}$$
where $\varphi$ is the Larmor phase, $\varphi = -\int_0^t \Omega\,dt$. As a phase, $\theta_t$ is dimensionless and has the Larmor phase $\varphi$ as an additive part. This latter fact, made possible by non-dimensionalizing $S_t$ with the initial value $\mu$ of the gyroaction, makes any function $f$ of $\theta$ periodic in $\theta$, by virtue of its being periodic in the Larmor phase $\varphi$, as a physical requirement. The problem of determining the probability of nonadiabatic escape then reduces to determining, at each instant of time, what fraction of the actual trajectories, labelled by their action phases $\theta$, have their end points outside the adiabatic trap. We now introduce a function $f(x, \theta_t, t; E, \mu, P_\phi)$, defined at every point $(x,t)$, giving the smoothed-out density of trajectory end points at time $t$, per unit interval $\Delta x$ at $x$, with action phases in $\Delta\theta_t$ at $\theta_t$.
The arguments $E$, $\mu$ and $P_\phi$, which represent the initial values of the energy, the gyroaction and the canonical angular momentum with which the particles are injected into the trap, appear as parameters; $E$ and $P_\phi$ are global invariants, while $\mu$ is a constant by virtue of being an initial value. Clearly, the particle density at $(x,t)$ is given by
$$G(x,t) = \int d\theta_t\, f(x, \theta_t, t). \tag{6.5}$$
Since the action phase $\theta_t$ is not a measurable quantity, it has been integrated over. So far we have presented the elements of the philosophy of our approach and some definitions. We now present the heuristics that lead to the Schrödinger-like equations which purport to describe the nonadiabatic loss of particles from adiabatic magnetic traps, in analogy with quantum tunneling. We begin by writing a Chapman–Kolmogorov-type equation for $f(x, \theta_t, t)$, the parameter arguments of $f$ being suppressed:
$$f(x, \theta_{t+\tau}, t+\tau) = \int d(\Delta x)\, P(x, \theta_{t+\tau}, t+\tau \mid x - \Delta x, \theta_t, t)\, f(x - \Delta x, \theta_t, t), \tag{6.6}$$
where $P$ represents the transition probability that a particle at $(x - \Delta x, t)$ with action phase $\theta_t$ goes to the point $(x, t+\tau)$ with action phase $\theta_{t+\tau}$. Our task is to construct a suitable expression for $P$ which has an appropriate limiting form in the adiabatic limit $\mu \to 0$. We first note that for infinitesimal changes we simply have $\theta_{t+\tau} = \theta_t + L\tau/\mu$. If $S_A(x,t)$ is the "principal function" for the adiabatic motion, then in the
adiabatic limit the transition probability $P$ must reduce to the $\delta$-function $\delta(\Delta x - (\tau/m)\,\partial S_A/\partial x)$, so that on integration with respect to $\Delta x$ and $\theta_{t+\tau}$, Eq. (6.6) gives
$$G(x, t+\tau) = \int d(\Delta x)\,\delta\!\left(\Delta x - \frac{\tau}{m}\frac{\partial S_A}{\partial x}\right) G(x - \Delta x, t), \tag{6.7}$$
which yields the "equation of continuity" for the adiabatic motion,
$$\frac{\partial G}{\partial t} + \frac{\partial}{\partial x}\cdot\left(\frac{1}{m}\frac{\partial S_A}{\partial x}\,G\right) = 0, \tag{6.8}$$
where $(1/m)\,\partial S_A/\partial x = V_A$ is the adiabatic velocity field. We next note that, as a probability density, $f$ must be positive definite at all space-time points. This is ensured by writing for $f$ the positive definite expression
$$f = \psi^*(x, \theta_t, t)\,\psi(x, \theta_t, t), \tag{6.9}$$
where $\psi$ is, in general, a complex scalar quantity. Such an expression for $f$ also suggests an expression for $P$ in terms of the function $\psi$, having the form of an "overlap". Consider then the quantity
$$\tilde P = \psi^*(x, \theta_{t+\tau}, t+\tau)\,\psi(x - \Delta x, \theta_t, t) \tag{6.10}$$
as a prospective candidate for the expression for $P$. Since $f$, as a function of $\theta_t$, is peaked around the adiabatic value $\theta_A(x,t)$, we can write an expansion for $\psi$ in the adiabatically accessible region of the form
$$\psi = \sum_n \tilde\psi(x, n, t)\exp\{in[S_t - S_A(x,t)]/\mu\}. \tag{6.11}$$
Note that we can write (6.11) as a Fourier series because of the form (6.4) of $\theta_t$, which has the Larmor phase $\varphi$ as an additive part. Since physical quantities must be single valued and periodic in the Larmor phase $\varphi$ (modulo $2\pi$), periodicity with respect to $\theta_t$ (modulo $2\pi$) follows from the additive presence of $\varphi$ in it. Conversely, to ensure this periodicity, $\mu$ is the appropriate quantity with which to non-dimensionalize $S_t$. This is an important point, as it is this that leads to the appearance of $\mu$ in the role of $\hbar$ in the Schrödinger-like equations obtained in Ref. [55]. If we now use (6.11) in (6.10) and write $S_t = S_{t+\tau} - L\tau$, we get
$$\tilde P = \sum_{n,n'}\tilde\psi^*(x, n, t+\tau)\,\tilde\psi(x - \Delta x, n', t)\,\exp\{-i(n - n')[S_{t+\tau} - S_A(x, t+\tau)]/\mu\}\,\exp\{-in'[L\tau - \Delta S_A]/\mu\}, \tag{6.12}$$
where
$$\Delta S_A = S_A(x, t+\tau) - S_A(x - \Delta x, t). \tag{6.13}$$
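Note that the second exponential factor in (6.12) gives, in the limit $\mu \to 0$,
$$\lim_{\mu\to0}\left(\frac{-imn'}{2\pi\mu\tau}\right)^{1/2}\exp\left\{-\frac{in'}{\mu}\left[\frac{m(\Delta x)^2}{2\tau} - \tau\mu\Omega - \Delta S_A\right]\right\} = \delta\!\left(\Delta x - \frac{\tau}{m}\frac{\partial S_A}{\partial x}\right). \tag{6.14}$$
This follows if we expand $\Delta S_A = \Delta x\,(\partial S_A/\partial x) + \tau\,(\partial S_A/\partial t) = p\Delta x - E\tau$ (as $p = \partial S_A/\partial x$ and $E = -\partial S_A/\partial t$), complete the square, and write
$$\frac{m(\Delta x)^2}{2\tau} - \tau\mu\Omega - \Delta S_A = \frac{m}{2\tau}\left(\Delta x - \frac{\tau}{m}\frac{\partial S_A}{\partial x}\right)^2 - \tau\left[\frac{p^2}{2m} + \mu\Omega - E\right] = \frac{m}{2\tau}\left(\Delta x - \frac{\tau}{m}\frac{\partial S_A}{\partial x}\right)^2,$$
since $E = p^2/2m + \mu\Omega$ for the adiabatic motion. Then, integrating the left-hand side of (6.14) over $\Delta x$ using these relations, we get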
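$$\lim_{\mu\to0}\left(\frac{-imn'}{2\pi\mu\tau}\right)^{1/2}\int d(\Delta x)\,\exp\left\{-\frac{imn'}{2\mu\tau}\left(\Delta x - \frac{\tau}{m}\frac{\partial S_A}{\partial x}\right)^2\right\} = \int d(\Delta x)\,\delta\!\left(\Delta x - \frac{\tau}{m}\frac{\partial S_A}{\partial x}\right) = 1. \tag{6.15}$$
The second exponential factor in (6.12) thus gives the required $\delta$-function in the adiabatic limit, and $\tilde P$ would serve as an appropriate expression for the transition probability, as per the discussion leading to Eq. (6.7). However, $\tilde P$ must be suitably normalized, such that
$$\int \frac{\tilde P}{g}\,d(\Delta x) = 1, \tag{6.16}$$
for $P = \tilde P/g$ to be a proper transition probability. If we use the result (6.14), then in the limit $\mu \to 0$, (6.12) gives
$$\tilde P = \psi^*(x, \theta_{t+\tau}, t+\tau)\,\psi(x - \Delta x, \theta_{t+\tau}, t)\,\delta\!\left(\Delta x - \frac{\tau}{m}\frac{\partial S_A}{\partial x}\right), \tag{6.17}$$
provided that $\tilde P$ of (6.12) is redefined with the factor $A(n') = (-imn'/2\pi\mu\tau)^{1/2}$ under the summation, that is,
$$\tilde P = \sum_{n,n'}\tilde\psi^*(x, n, t+\tau)\,\tilde\psi(x - \Delta x, n', t)\,A(n')\,\exp\{-i(n - n')[S_{t+\tau} - S_A(x, t+\tau)]/\mu\}\,\exp\{-in'[L\tau - \Delta S_A]/\mu\}. \tag{6.18}$$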
Now, for $\tilde P$ to be normalized in accordance with (6.16), it follows from (6.17) that $g$ must be
$$g = \psi^*(x, \theta_{t+\tau}, t+\tau)\,\psi(x - \Delta x, \theta_{t+\tau}, t) \simeq \psi^*(x - \Delta x, \theta_t, t)\,\psi(x - \Delta x, \theta_t, t) \tag{6.19}$$
to the lowest order in the infinitesimals $\Delta x$ and $\tau$. Using (6.16), (6.18) and (6.19) in (6.6), we get
$$|\psi(x, \theta_{t+\tau}, t+\tau)|^2 = \mathrm{Re}\int d(\Delta x)\,\psi^*(x, \theta_{t+\tau}, t+\tau)\sum_n A(n)\,\tilde\psi(x - \Delta x, n, t)\exp\{in[S_t - S_A(x - \Delta x, t)]/\mu\}. \tag{6.20}$$
If we write $S_t = S_{t+\tau} - L\tau$ on the right-hand side of (6.20) and carry out the integration over $\theta_{t+\tau}$, it can be shown that we recover the "adiabatic equation of continuity" (6.8) in the limit $\mu \to 0$; here Re denotes the real part of the right-hand side. In this form, Eq. (6.20) is an appropriate equation for the adiabatically accessible region. For an arbitrary space-time point, including the adiabatically accessible region, we carry out a Fourier analysis of $\psi$ with respect to $\theta_t$ according to
$$\psi(x, \theta_t, t) = \sum_n \psi(x, n, t)\exp(inS_t/\mu), \tag{6.21}$$
rather than according to (6.11). Since the left-hand side is real, we can drop the prefix Re. Equation (6.20) then immediately gives
$$\psi(x, \theta_{t+\tau}, t+\tau) = \int d(\Delta x)\sum_n A(n)\,\psi(x - \Delta x, n, t)\exp[in(S_{t+\tau} - L\tau)/\mu]. \tag{6.22}$$
Fourier analysis of (6.22) with respect to $\theta_{t+\tau}$ yields
$$\psi(x, n, t+\tau) = A(n)\int d(\Delta x)\,\exp\left[-\frac{in}{\mu}\left(\frac{m(\Delta x)^2}{2\tau} - \tau\mu\Omega\right)\right]\psi(x - \Delta x, n, t). \tag{6.23}$$
This equation is closely analogous to the Feynman path-integral representation of quantum mechanics [66], with $\mu$ appearing in the role of $\hbar$ and the adiabatic potential $\mu\Omega$ taking the place of the potential. Following a procedure similar to Feynman's, we expand both sides of the equation about the point $(x,t)$ and carry out the integration over $\Delta x$; using the expression for $A(n)$, we obtain the following set of Schrödinger-like equations for $\psi(x,n,t)$:
$$-i\frac{\mu}{n}\frac{\partial\psi^{(n)}}{\partial t} = -\frac{\mu^2}{2mn^2}\frac{\partial^2\psi^{(n)}}{\partial x^2} + \mu\Omega\,\psi^{(n)}, \qquad n = 1, 2, 3, \ldots \tag{6.24}$$
From the definition (6.5) of the density $G(x,t)$ (or probability density, depending on the normalization) and the expression (6.9) for $f$, we get
$$G(x,t) = \int d\theta_t\, f(x, \theta_t, t) = \sum_n \psi^*(x, n, t)\,\psi(x, n, t), \tag{6.25}$$
on using the Fourier expression (6.21) for $\psi$. Expression (6.25) for the probability density $G(x,t)$ is also analogous to the quantum mechanical probability density, except that it is generalized to include all the modes $n = 1, 2, 3, \ldots$, corresponding to the functions $\psi^{(n)}$ governed by Eq. (6.24). Thus, as envisaged in the Introduction (Section 1.1), the nonadiabatic departures from the adiabatic motion are indeed found to be governed by the Schrödinger-like equations (6.24). To be sure, the derivation presented above is largely heuristic and is not based on any dynamical equation governing charged-particle motion in a magnetic field. In fact, the only property of the actual trajectories that we have used is that they lie in the neighbourhood of the adiabatic trajectory in the function space of trajectories (as specified by the value of the action $S_t$), without regard to any specific structure characteristic of charged-particle dynamics. Nevertheless, these equations have been vindicated by two subsequent derivations based on known dynamical equations, namely the classical Liouville equation (Section 6.2) and the quantum mechanical Schrödinger equation (Section 8). Comparing Eqs. (6.11) and (6.21), we note that $\psi(x,n,t) \sim \tilde\psi\exp(-inS_A/\mu)$, where $S_A$ is essentially Hamilton's principal function for the adiabatic motion, $S_A = -Et + px$. It follows that we seek solutions for $\psi$ of the form $\psi \sim \exp(inEt/\mu)$, where $E$ is the energy of the particle. We thus see that Eq. (6.24), together with the connection (6.25) and the form of the solution, forms a close analogue of the Schrödinger theory in quantum mechanics. Just as the latter leads to classical mechanics in the limit $\hbar \to 0$, Eq. (6.24) yields the Hamilton–Jacobi equation for the adiabatic motion in the limit $\mu \to 0$. Equation (6.24) should therefore describe the leakage of particles trapped adiabatically in the potential $\mu\Omega$, analogously to quantum tunneling.
This leakage of particles is thus identified as the nonadiabatic escape, which can be calculated from these equations as in quantum mechanics, given the form of the magnetic field that determines the adiabatic potential $\mu\Omega$.
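Because $\mu/n$ plays the role of $\hbar$ in (6.24), a WKB estimate of the tunneling exponent through a barrier shaped like (6.26) should reproduce the first form of (6.28), and it grows linearly with $n$. The sketch below checks this numerically. It works with the barrier height above the field minimum, $V(x) = V_0\,\mathrm{sech}^2(\kappa x)$ with $V_0 = \mu(\Omega_{max} - \Omega_0)$, and with $\tilde E = E - \mu\Omega_0$, assuming $\Omega \propto B$ so that the profile (6.26) carries over to $\mu\Omega$. It uses the textbook WKB exponent $(2/\hbar_{\rm eff})\int\sqrt{2m(V - \tilde E)}\,dx$ with $\hbar_{\rm eff} = \mu/n$. The function names and parameter values are arbitrary dimensionless choices for this sketch, not experimental ones.

```python
import math


def wkb_exponent(n, mu, m, kappa, v0, e_tilde, steps=100_000):
    """(2n/mu) * integral of sqrt(2m (V - E~)) over the classically forbidden region,
    for V(x) = v0 / cosh^2(kappa x), i.e. hbar replaced by mu / n as in (6.24)."""
    a = math.acosh(math.sqrt(v0 / e_tilde)) / kappa   # turning point: V(a) = E~
    h = a / steps
    half = 0.0
    for i in range(steps):                              # midpoint rule on [0, a]
        x = (i + 0.5) * h
        half += math.sqrt(max(v0 / math.cosh(kappa * x) ** 2 - e_tilde, 0.0))
    integral = 2.0 * half * h * math.sqrt(2.0 * m)      # barrier is symmetric in x
    return 2.0 * n * integral / mu


def closed_form_exponent(n, mu, m, kappa, v0, e_tilde):
    """First form of Eq. (6.28) multiplied by B, i.e. the exponent A_n B."""
    return (2.0 * math.pi * n * math.sqrt(2.0 * m)
            * (math.sqrt(v0) - math.sqrt(e_tilde)) / (mu * kappa))


if __name__ == "__main__":
    for n in (1, 2, 3):
        num = wkb_exponent(n, mu=1.0, m=1.0, kappa=1.0, v0=1.0, e_tilde=0.25)
        ref = closed_form_exponent(n, 1.0, 1.0, 1.0, 1.0, 0.25)
        print(f"n = {n}: numerical {num:.6f}, closed form {ref:.6f}")
```

The closed form rests on the standard integral $\int_{-a}^{a}\sqrt{V_0\,\mathrm{sech}^2\kappa x - \tilde E}\,dx = (\pi/\kappa)(\sqrt{V_0} - \sqrt{\tilde E})$. Since the exponent is linear in $n$, each successive mode is suppressed by a further factor $e^{-A_1B}$, which is the origin of the multiple residence times in (6.29).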
What is more interesting, however, is that the different equations of the set, corresponding to $n = 1, 2, 3, \ldots$, give different tunneling probabilities for the same injected particle energy $E$ and initial gyroaction $\mu$. These equations thus predict the existence of a multiplicity of residence times in the trap, which had earlier been completely unsuspected. If we assume the potential $\mu\Omega$, and hence the magnetic field, to be of the form
$$B = B_0 + [B_{max} - B_0]\,[\cosh\kappa x]^{-2} \tag{6.26}$$
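in the region of the mirrors along a given field line, then the probability of transmission per unit time across the potential hill (6.26) is given by
$$P = \frac{1}{T}\sum_n C(n)\,e^{-A_nB}, \tag{6.27}$$
with $A_n$ given by
$$A_n = \frac{2\pi n\,(2m)^{1/2}}{\mu\kappa B}\left\{(\mu\Omega_{max} - \mu\Omega_0)^{1/2} - (E - \mu\Omega_0)^{1/2}\right\} = n\,\frac{2\sqrt{2}\,\pi L e}{c\sqrt{mE}}\left[\frac{1}{\sin\alpha}\left(\frac{B_{max}}{B} - \frac{B_0}{B}\right)^{1/2} - \frac{1}{\sin^2\alpha}\left(1 - \frac{B_0}{B}\sin^2\alpha\right)^{1/2}\right], \tag{6.28}$$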
where $B$ is the magnetic field at the point of injection and $\alpha$ is the pitch angle of injection, so that $\mu = E\sin^2\alpha/B$; $B_0$ is the magnetic field in the straight middle section of the magnetic mirror trap. The transmission probabilities for the various modes $n$ are seen to be exponentially smaller for successively larger $n$, since $A_n = nA_1$. The corresponding residence times $\tau_n$, given by
$$\tau_n = T e^{A_nB} = T e^{nA_1B}, \tag{6.29}$$
are exponentially longer. Note that in the case of slight nonadiabaticity (the quasi-adiabatic approximation), $T$ may be taken as the bounce period between the adiabatic turning points. The coefficients $C(n)$ in (6.27) represent the relative magnitudes of the transmission probabilities for the various $n$, which the model does not determine; it may be conjectured that they fall off as $\sim\varepsilon^n$. At the time (1970) of writing of the paper [55], some experimental results [48,49] on residence times as a function of the magnetic field strength were available in the literature. It is assumed, for simplicity, that all the field configurations in the different experiments [48,49] are characterized simply by the scale length $L = \kappa^{-1}$ and are described in the region of the mirror by a function of the form (6.26), for which expression (6.28) holds. The experimental results were given in the form of $\ln\tau$ vs. $B$ plots; they are compared here with the values calculated from expression (6.28) for $n = 1$. For the corresponding values of the experimental parameters, namely the energy $E$ and the scale length $\kappa^{-1}$, the theoretical and experimental values are given in Table 1. The theoretical expression (6.29), with $A_1$ given by (6.28), implies a straight line for the $\ln\tau_1$ vs. $B$ plot, and the corresponding experimental values are obtained from the steepest section of the $\ln\tau$ vs. $B$ curve. Table 1 shows that the experimental values compare very well with the predictions of the model. We thus conclude that our model describes very well indeed the nonadiabatic loss of
R.K. Varma / Physics Reports 378 (2003) 301 – 434
particles and the consequent residence times, even with the approximations made in the calculations. The most important of these is the representation of the experimental magnetic field configuration by the form (6.26). This approximation is worst for configuration III, where the field variation had a more complicated form, and the departure is also largest for this case.

Table 1
A comparison of the theoretically predicted value $(A_1B)_{\rm theor}$ of the exponent for $n = 1$ [see Eq. (6.28)] with its experimental value $(A_1B)_{\rm exp}$ for the different experiments.

| Authors | L (cm) | E (keV) | $(A_1B)_{\rm theor}$ | $(A_1B)_{\rm exp}$ |
|---|---|---|---|---|
| Dubinina et al. [49] | 25 | 23.5 | $0.17B$ | $0.15B$ |
| Ponomarenko et al. [48], Configuration I | 18 | 9 | $0.071B_{\max}$ | $0.096B_{\max}$ |
| Ponomarenko et al. [48], Configuration II | 14 | 9 | $0.055B_{\max}$ | $0.051B_{\max}$ |
| Ponomarenko et al. [48], Configuration III | 12 | 9 | $0.047B_{\max}$ | $0.037B_{\max}$ |

6.1.1. Multiple residence times: experimental determination

The comparison with the experimentally observed residence times was made above using the theoretically calculated expression for the mode $n = 1$ of the set of equations (6.24). As discussed in the previous section, the other equations of the set predict additional residence times corresponding to the modes $n = 2, 3, \ldots$. These have the same exponential dependence on the magnetic field, but with an $n$ times larger exponent, as indicated by relation (6.27). Such additional residence times could not have been suspected in the conventional approach, so an experimental check on their existence was crucial. Such a check was carried out in a series of experiments at the Physical Research Laboratory by Bora et al. [57(a)–(c)]. These experiments confirmed the existence of up to three distinct residence times, for $n = 1, 2$ and $3$, with precisely the characteristics determined by relation (6.28). It is appropriate to present here a brief account of these experiments.

The experimental system consisted of a stainless steel vacuum chamber 1.5 m long and 15 cm in diameter. An electron gun, used to inject electron pulses into the system, was placed at one end of the chamber. The static confining magnetic mirror field was produced by 12 pancake coils placed at different axial positions along the vacuum chamber. Electrons were injected by pulsing the electron gun. They were trapped in the mirror field by momentarily reducing the magnetic field at the mirror throat [57(b)].
The electron leakage current was recorded as a function of time by a Faraday cup at the other end of the chamber. The parallel electron energy was measured with the same Faraday cup acting as a retarding potential analyser (RPA). The dispersion in the parallel energy was found to be less than 10%. The density of electrons in the system at the time of injection was typically $10^4$ particles cm$^{-3}$, which ensured the absence of any collective behaviour. The system was evacuated to a pressure of $\sim 5\times10^{-8}$ torr in order to reduce the leakage of electrons by scattering off the background neutral atoms. The results of the experiments could thus be ascribed unambiguously to the nonadiabatic behaviour of single particles trapped in an adiabatic magnetic trap. During the course of the experiment the electrons were injected at pitch angles $\theta \approx 33^\circ$ and $35^\circ$, that is, in excess of the loss-cone angle by $0.69^\circ$ and $2.69^\circ$, respectively. Larger pitch angles, though highly desirable, could not be achieved because of the limitations of the experimental set-up.

Fig. 3. (a) Schematic representation of the experimental system for the residence-time studies in the mirror trap. (b) Magnetic field variation for three different field configurations with spatial scale lengths L = 8 cm (I), 11 cm (II) and 13 cm (III).

Fig. 4. Variation of $\ln\tau_n$ ($\tau_n$ being the two experimentally determined residence times, $n = 1$ and $2$) with the magnetic field $B_m$ at the mirror throat, for the pitch angle of injection $35^\circ$. (a) For the magnetic field scale length L = 8 cm and electron energies E = 2.2, 2.9, 3.7 and 4.5 keV. The corresponding pairs of straight-line plots are designated (AA, BB, CC, DD). The slopes $A_1$ and $A_2$ of these lines for the sets $(A_{1A}, A_{2A})$, $(A_{1B}, A_{2B})$, $(A_{1C}, A_{2C})$ and $(A_{1D}, A_{2D})$ are presented in Table 2. (b) For the electron energy E = 2.9 keV and magnetic field scale lengths L = 8, 11 and 13 cm. The pairs of straight-line plots are designated (EE, FF, GG), and their slopes $(A_{1E}, A_{2E})$, $(A_{1F}, A_{2F})$ and $(A_{1G}, A_{2G})$ are given in Table 2.

Three different scale lengths were used for the magnetic field configuration. Fig. 3 depicts the magnetic field variation corresponding to the three scale lengths L = 8, 11 and 13 cm. The signal due to the leakage current collected by the Faraday cup was logarithmically amplified before being digitized with a sampling interval of 33 μs. In general, the semi-log plot of the leakage current as a function of time was not a straight line for a given value of the magnetic field at the maximum. This suggested the existence of more than one residence time. To verify this, and to determine these residence times, the decay signal was numerically fitted to a sum of exponentials with different decay times $\tau_n$ and amplitudes $A_n$:

$$I = \sum_n A_n \exp(-t/\tau_n). \tag{6.30}$$

The details of the fitting procedure are described in Ref. [57(b),(c)]. The experiment was repeated with different magnetic field strengths. The residence times in the trap were thus determined as a function of the magnetic field for the same set of electron energy $E$, pitch angle $\theta$ and scale length $L$. The residence times $\tau_n$ so determined are plotted on a semi-log plot as a function of $B_{\max}$ in Fig. 4. They are found to lie on straight lines whose slopes $A_n$ differ for different $n$.

This is precisely what the model predicts through expression (6.28). More precisely, expression (6.28) states that $A_n = nA_1$ for a given set of parameters $(E, \theta, L)$. Fig. 4 depicts the $\ln\tau_n$ vs. $B_{\max}$ plots for the two residence times $\tau_1$ and $\tau_2$ determined in the experiments. Fig. 4(a) gives the plots for the same scale length L = 8 cm but different energies E = 2.2, 2.9, 3.7 and 4.5 keV. Fig. 4(b) gives the plots for the same energy E = 2.9 keV but different scale lengths L = 8, 11 and 13 cm. The pitch angle was $35^\circ$ in all these cases. The results of these observations are summarized in Table 2, which gives the values of the slopes $A_1$ and $A_2$ for the various sets of energies $E$ and scale lengths $L$. It also gives the ratio $A_2/A_1$ in each case. The table clearly shows that the ratio $A_2/A_1$ is close to the theoretically predicted value of 2. The corresponding experimental results for $\theta = 33^\circ$ are given in Fig. 5 and summarized in Table 3. Again, the ratio $A_2/A_1$ is close to 2. These results clearly constitute an experimental verification of the most important component of the prediction of the model, namely the existence of additional residence times: a second residence time $\tau_2$, with $A_2 = 2A_1$.

Table 2
Values of the slopes $A_1$ and $A_2$ of the $\ln\tau$ vs. $B$ plots corresponding to the two residence times $\tau_1$ and $\tau_2$ of Fig. 4. The pitch angle of injection is $35^\circ$, the energies are E = 2.2, 2.9, 3.7 and 4.5 keV, and the magnetic scale lengths are L = 8, 11 and 13 cm. The ratio $A_2/A_1$ is given for each case.

| Sl. No. | E (keV) | L (cm) | $A_1$ | $A_2$ | $A_2/A_1$ |
|---|---|---|---|---|---|
| 1 | 2.2 | 8 | $(5.56 \pm 0.18)\times10^{-3}$ | $(11.9 \pm 0.5)\times10^{-3}$ | $2.14 \pm 0.03$ |
| 2 | 2.9 | 8 | $(4.9 \pm 0.24)\times10^{-3}$ | $(10.46 \pm 0.7)\times10^{-3}$ | $2.12 \pm 0.04$ |
| 3 | 3.7 | 8 | $(4.54 \pm 0.3)\times10^{-3}$ | $(9.95 \pm 0.68)\times10^{-3}$ | $2.15 \pm 0.06$ |
| 4 | 4.5 | 8 | $(3.96 \pm 0.42)\times10^{-3}$ | $(9.15 \pm 0.7)\times10^{-3}$ | $2.31 \pm 0.01$ |
| 5 | 2.9 | 11 | $(6.16 \pm 0.18)\times10^{-3}$ | $(12.62 \pm 0.62)\times10^{-3}$ | $2.05 \pm 0.03$ |
| 6 | 2.9 | 13 | $(8.24 \pm 0.3)\times10^{-3}$ | $(17.2 \pm 0.53)\times10^{-3}$ | $2.09 \pm 0.038$ |

$\theta = 35^\circ$, $(A_2/A_1)_{\rm av} = 2.14$.

Fig. 5. Variation of $\ln\tau_n$ ($\tau_n$ being the two experimentally determined residence times, $n = 1$ and $2$) with the magnetic field $B_m$ at the mirror throat, for the pitch angle of injection $33^\circ$. (a) and (b): for the electron energy E = 2.2 keV and magnetic field scale lengths L = 8 and 11 cm. (c), (d) and (e): for the magnetic field scale length L = 13 cm and electron energies E = 2.2, 2.9 and 3.7 keV. The slopes $A_1$, $A_2$ are presented in Table 3.

For certain values of the energy $E$ and the magnetic scale length $L$, we have found three distinct residence times $\tau_1$, $\tau_2$ and $\tau_3$, such that $A_3 = 3A_1$. The corresponding $\ln\tau$ vs. $B_{\max}$ plots are displayed in Figs. 6(a) and (b). The corresponding values $A_1$, $A_2$, $A_3$ are tabulated in Table 4(a). The relation $A_3/3 = A_2/2 = A_1$ is clearly satisfied approximately. However, the relative fraction of the particles corresponding to the residence times $\tau_1, \tau_2, \tau_3, \ldots$ decreases with $n$. These fractions are presented in Table 4(b) for some typical cases. They are typically about 70%, 25% and 5% for $\tau_1$, $\tau_2$ and $\tau_3$, respectively. Any signal corresponding to $\tau_4$ is probably too small to be discernible above the background noise.

Having checked the dependence of $A_n$ on $n$ for $n = 2$ and $3$, it is also necessary to check the dependence of $A_1$ on $E$, $\theta$ and $L$. A difficulty arises, however, with the dependence on the pitch angle of injection $\theta$. It is not possible to specify, to an accuracy of minutes and seconds of arc, the pitch angle at which the particles were injected into the trap. A check with respect to $L$ and $E$ can nevertheless be made by using the fact that the pitch angle of injection remained the same while $E$ and $L$ were varied. Expression (6.28) then implies two relations. First, $A_1\sqrt{E}$ should have the same value for the various $E$ at a given $L$. Second, $A_1/L$ should have the same value for different $L$ at a given energy $E$. These quantities are presented in Table 5, which shows that they are indeed "constants" to a good degree.

To check the dependence on $\theta$, the following self-consistency check was carried out. Given the experimental value of $A_1$ for a particular set of $E$ and $L$ values, the value of $\theta$ can be calculated from expression (6.28) for $A_1$. The values of $\theta$ so calculated for all the sets $(E, L)$ can then be compared with each other. Since they should all correspond to the same pitch angle of injection, they ought to be close to each other. The last column of Table 5 gives the values of $\theta$ so calculated for the different sets of $E$ and $L$. They are found to be remarkably close to each other, with a mean of $34^\circ20'20''$ and a dispersion of $+0'52''$ and $-1'58''$. This is quite astonishing, because it shows that expression (6.26), used to represent the magnetic field variation, and the resulting expression (6.28) for $A_1$ work extremely well.

The predictions of the quantum-like model are thus well borne out by the experimental results. These include the existence of a multiplicity of residence times, as well as their dependence on the energy $E$ of the particles, the magnetic scale length $L$ and the pitch angle of injection $\theta$, besides the mode number $n$. The existence of the multiplicity of residence times is indeed a very surprising result. It seems to have no obvious physical origin in terms of the classical dynamics of charged particles in a magnetic field. However, since the quantum-like model of Ref. [55] is essentially intuitive and heuristic, it too offers no clue as to the physical origin of the multiplicity of residence times.

A numerical approach was therefore adopted. The trajectories of an ensemble of particles ($\sim 500$) in a suitable magnetic trap were followed on a computer, with initial conditions corresponding to the actual experimental situation: $\delta$-functions in the energy $E$ of the particles, the gyroaction $\mu$ and the canonical angular momentum $P_\varphi$, and a uniform distribution in the gyrophase. The aim was to see whether the ensemble of particles indeed leaks out of the trap with a multiplicity of residence times and, if so, what these can be traced to. These numerical results are described in Ref. [59] and briefly referred to in Section 6.3.
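The decomposition (6.30) of the leakage signal into several exponentials can be illustrated with a simple "exponential peeling" procedure. This is a minimal sketch, not the fitting procedure of Ref. [57(b),(c)], which is not reproduced in the text. It assumes noise-free synthetic data with two well-separated decay times. The function name `peel_exponentials` and all numbers are hypothetical. The slowest component is fitted as a straight line on the semi-log plot at late times and then subtracted, and the procedure is repeated at earlier times:

```python
import numpy as np

def peel_exponentials(t, signal, windows):
    """Estimate (amplitude, decay time) pairs of I(t) = sum_n A_n exp(-t / tau_n).

    windows: (t_start, t_end) pairs ordered from the latest time window (where the
    slowest component dominates) to the earliest one (fastest component). In each
    window the residual is fitted by a straight line on a semi-log plot,
    ln I = ln A - t / tau, and the fitted component is subtracted from the residual.
    """
    residual = np.asarray(signal, dtype=float).copy()
    components = []
    for t_start, t_end in windows:
        mask = (t >= t_start) & (t <= t_end)
        slope, intercept = np.polyfit(t[mask], np.log(residual[mask]), 1)
        amplitude, tau = np.exp(intercept), -1.0 / slope
        components.append((amplitude, tau))
        residual = residual - amplitude * np.exp(-t / tau)
    return components

# Synthetic, noise-free two-component decay (hypothetical values; times in ms).
t = np.linspace(0.0, 7.0, 701)
signal = 0.7 * np.exp(-t / 0.25) + 0.3 * np.exp(-t / 0.55)

for amplitude, tau in peel_exponentials(t, signal, [(5.0, 7.0), (0.0, 0.4)]):
    print(f"A = {amplitude:.3f}, tau = {tau:.3f} ms")
```

With real, noisy data, each fitting window has to be chosen where its component dominates. The peeled estimates would then normally serve only as starting values for a simultaneous nonlinear least-squares fit of all components.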
Table 3
Values of the slopes $A_1$ and $A_2$ of the $\ln\tau_n$ vs. $B$ plots corresponding to the two residence times $\tau_1$ and $\tau_2$ of Fig. 5. The pitch angle of injection is $33^\circ$, the energies are E = 2.2, 2.9 and 3.7 keV, and the magnetic field scale lengths are L = 8, 11 and 13 cm. The ratio $A_2/A_1$ is given for each case.

| Sl. No. | E (keV) | L (cm) | $A_1$ | $A_2$ | $A_2/A_1$ |
|---|---|---|---|---|---|
| 1 | 2.2 | 8 | $(2.88 \pm 0.18)\times10^{-4}$ | $(6.38 \pm 0.43)\times10^{-4}$ | $2.2 \pm 0.02$ |
| 2 | 2.2 | 11 | $(6.14 \pm 0.52)\times10^{-4}$ | $(1.16 \pm 0.14)\times10^{-4}$ | $1.88 \pm 0.06$ |
| 3 | 2.2 | 13 | $(0.97 \pm 0.11)\times10^{-3}$ | $(1.98 \pm 0.16)\times10^{-3}$ | $2.02 \pm 0.25$ |
| 4 | 2.9 | 13 | $(0.95 \pm 0.06)\times10^{-3}$ | $(1.70 \pm 0.06)\times10^{-3}$ | $1.82 \pm 0.14$ |
| 5 | 3.7 | 13 | $(0.65 \pm 0.027)\times10^{-3}$ | $(1.31 \pm 0.207)\times10^{-3}$ | $2.01 \pm 0.16$ |

$\theta = 33^\circ$, $(A_2/A_1)_{\rm av} = 1.99$.
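The slopes $A_n$ in Tables 2 and 3 are the gradients of straight-line fits of $\ln\tau_n$ against the field. The following is a minimal sketch of that step. It assumes the ideal law (6.29) with hypothetical values of $T$ and $A_1$ and noise-free residence times; the field values merely echo those of Table 4(b). The fit recovers the ratios $A_n/A_1 = n$:

```python
import numpy as np

T = 1.0e-6    # hypothetical prefactor T of Eq. (6.29), in seconds
A1 = 5.0e-3   # hypothetical slope A_1, per gauss
B = np.array([270.0, 335.0, 400.0, 470.0, 540.0])   # field values, gauss

def residence_time(n, b):
    """Ideal residence time of mode n: tau_n = T exp(n A_1 B), Eq. (6.29)."""
    return T * np.exp(n * A1 * b)

# Straight-line fit of ln(tau_n) against B; the gradient is the slope A_n.
slopes = {}
for n in (1, 2, 3):
    gradient, intercept = np.polyfit(B, np.log(residence_time(n, B)), 1)
    slopes[n] = gradient

ratios = {n: slopes[n] / slopes[1] for n in slopes}
print(ratios)
```

With measured data, the scatter of the points about each line determines the uncertainties quoted for $A_1$ and $A_2$ in the tables.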
Table 4
(a) Values of the slopes $A_1$, $A_2$ and $A_3$ of the $\ln\tau$ vs. $B$ plots corresponding to the three residence times $\tau_1$, $\tau_2$ and $\tau_3$, for the electron energies and magnetic field scale lengths (E = 2.9 keV, L = 11 cm) and (E = 3.7 keV, L = 8 cm). The ratios $A_1 : A_2 : A_3$ are also given, as obtained from Fig. 6. Pitch angle $33^\circ$. (b) Amplitudes $A_1$, $A_2$, $A_3$ corresponding to the three residence times $\tau_1$, $\tau_2$ and $\tau_3$ for different magnetic fields (at the mirror throat), for the case E = 3.7 keV, L = 8 cm at $33^\circ$.

(a)

| E (keV) | L (cm) | $A_1\times10^4$ | $A_2\times10^4$ | $A_3\times10^4$ | $A_1 : A_2 : A_3$ |
|---|---|---|---|---|---|
| 2.9 | 11 | 5.45 | 10.2 | 16.4 | 1 : 1.88 : 3.01 |
| 3.7 | 8 | 3.0 | 6.0 | 9.0 | 1 : 2.01 : 3.02 |

(b)

| B (gauss) | $\tau_1$ (ms) | $A_1$ | $\tau_2$ (ms) | $A_2$ | $\tau_3$ (ms) | $A_3$ |
|---|---|---|---|---|---|---|
| 270 | 0.227 ± 0.059 | 0.75 ± 0.116 | 0.475 ± 0.112 | 0.3 ± 0.044 | 0.823 ± 0.126 | 0.054 ± 0.016 |
| 335 | 0.236 ± 0.036 | 0.67 ± 0.09 | 0.514 ± 0.109 | 0.26 ± 0.05 | 0.954 ± 0.02 | 0.023 ± 0.02 |
| 400 | 0.242 ± 0.035 | 0.70 ± 0.09 | 0.585 ± 0.155 | 0.26 ± 0.056 | 1.06 ± 0.22 | 0.025 ± 0.019 |
| 470 | 0.255 ± 0.036 | 0.67 ± 0.08 | 0.628 ± 0.135 | 0.27 ± 0.048 | 1.15 ± 0.29 | 0.037 ± 0.023 |
| 540 | 0.262 ± 0.059 | 0.74 ± 0.11 | 0.647 ± 0.2 | 0.26 ± 0.047 | 1.30 ± 0.25 | 0.023 ± 0.015 |
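Two kinds of numbers in the text can be recomputed from the tabulated central values. The first are the averages quoted beneath Tables 2 and 3. The second is the constancy of $A_1\sqrt{E}$ (at fixed $L$) and of $A_1/L$ (at fixed $E$), discussed in the text and listed in Table 5. The short script below does only that arithmetic and ignores the uncertainties. The helper names are hypothetical:

```python
from math import sqrt
from statistics import mean

# Ratios A2/A1 as reported in Table 2 (theta ~ 35 deg) and Table 3 (theta ~ 33 deg).
RATIOS_35 = [2.14, 2.12, 2.15, 2.31, 2.05, 2.09]
RATIOS_33 = [2.2, 1.88, 2.02, 1.82, 2.01]

# Central values from Table 2: (E in keV, L in cm, A1).
TABLE2_A1 = [
    (2.2, 8, 5.56e-3),
    (2.9, 8, 4.90e-3),
    (3.7, 8, 4.54e-3),
    (4.5, 8, 3.96e-3),
    (2.9, 11, 6.16e-3),
    (2.9, 13, 8.24e-3),
]

def a1_sqrt_e(rows, scale_length):
    """A1 * sqrt(E) for the rows with a given scale length L (expected ~constant)."""
    return [a1 * sqrt(energy) for energy, length, a1 in rows if length == scale_length]

def a1_over_l(rows, energy_kev):
    """A1 / L for the rows with a given energy E (expected ~constant)."""
    return [a1 / length for energy, length, a1 in rows if energy == energy_kev]

print("mean A2/A1:", round(mean(RATIOS_35), 2), round(mean(RATIOS_33), 2))
print("A1*sqrt(E), L = 8 cm:", [round(x, 5) for x in a1_sqrt_e(TABLE2_A1, 8)])
print("A1/L, E = 2.9 keV:", [round(x, 6) for x in a1_over_l(TABLE2_A1, 2.9)])
```

The first line reproduces the quoted averages 2.14 and 1.99. The recomputed $A_1\sqrt{E}$ and $A_1/L$ values agree with those of Table 5(a) to within the quoted uncertainties.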
The heuristic model of Section 6.1 did provide a somewhat surprising prediction of the multiplicity of residence times, which was subsequently verified. Nevertheless, the model lacks a proper dynamical basis. Eqs. (6.24) and (6.25) of the model should be derivable from an appropriate equation of classical dynamics, as the system under consideration manifestly belongs to the classical mechanical domain. Such a derivation is presented in Section 6.2. It may be mentioned that the same set of equations is obtained in Section 8 starting from the quantum mechanical Schrödinger equation. This latter derivation provides a rather interesting quantum mechanical interpretation of the mode number $n$ in Eq. (6.24), and consequently of the multiplicity of residence times corresponding to $n = 1, 2, 3, \ldots$. We shall discuss this interpretation, which will be seen to be quite appealing physically, in Section 8.
Fig. 6. Variation of $\ln\tau_n$ ($\tau_n$ being the three experimentally determined residence times, $n = 1$, 2 and 3) with the magnetic field $B_m$ at the mirror throat, for the following electron energies and magnetic field scale lengths: (a) E = 2.9 keV, L = 11 cm; (b) E = 3.7 keV, L = 8 cm. The slopes $A_1$, $A_2$ and $A_3$ of the corresponding straight-line plots are presented in Table 4(a).
Table 5
(a) The pitch angle of injection $\theta$ calculated from relation (6.28) using the experimentally determined values of $A_1$ for the various values of $E$ and $L$. These are found to have a very small spread around the mean value $\bar\theta = 34^\circ20'30''$, while the (approximate) estimated angle of injection was $35^\circ$. Also shown is the approximate constancy of $A_1\sqrt{E}$ and $A_1/L$ required by (6.28). (b) The pitch angle of injection $\theta$ calculated from relation (6.28) using the experimentally determined values of $A_1$ for the various values of $E$ and $L$. These are found to have a very small spread around the mean value $\bar\theta = 32^\circ29'31''$, while the (approximate) estimated value of $\theta$ was $33^\circ$.

(a) For $\theta \approx 35^\circ$

| Sl. No. | E (keV) | L (cm) | $A_1$ | $A_1E^{1/2}$ | $A_1/L$ | $\theta$ (calculated) |
|---|---|---|---|---|---|---|
| 1 | 2.2 | 8 | $(5.56 \pm 0.18)\times10^{-3}$ | $(8.25 \pm 0.25)\times10^{-3}$ | | $34^\circ19'57''$ |
| 2 | 2.9 | 8 | $(4.9 \pm 0.24)\times10^{-3}$ | $(8.34 \pm 0.41)\times10^{-3}$ | $(0.612 \pm 0.03)\times10^{-3}$ | $34^\circ21'12''$ |
| 3 | 3.7 | 8 | $(4.54 \pm 0.3)\times10^{-3}$ | $(8.71 \pm 0.57)\times10^{-3}$ | | $34^\circ21'12''$ |
| 4 | 4.5 | 8 | $(3.96 \pm 0.42)\times10^{-3}$ | $(8.37 \pm 0.8)\times10^{-3}$ | | $34^\circ20'20''$ |
| 5 | 2.9 | 11 | $(6.16 \pm 0.18)\times10^{-3}$ | | $(0.56 \pm 0.016)\times10^{-3}$ | $34^\circ18'22''$ |
| 6 | 2.9 | 13 | $(8.24 \pm 0.3)\times10^{-3}$ | | $(0.631 \pm 0.023)\times10^{-3}$ | $34^\circ20'56''$ |

$\bar\theta = 34^\circ20'30''$ $(-2'8'', +0'26'')$.

(b) For $\theta \approx 33^\circ$

| Sl. No. | E (keV) | L (cm) | $A_1$ (experimental) | $\theta$ (calculated) |
|---|---|---|---|---|
| 1 | 2.2 | 8 | $0.91\times10^{-3}$ | $33^\circ21'36''$ |
| 2 | 2.2 | 11 | $0.213\times10^{-2}$ | $32^\circ23'24''$ |
| 3 | 2.2 | 13 | $0.339\times10^{-2}$ | $32^\circ25'12''$ |
| 4 | 2.4 | 13 | $0.332\times10^{-2}$ | $32^\circ25'48''$ |
| 5 | 3.7 | 13 | $0.227\times10^{-2}$ | $32^\circ24'36''$ |

$\bar\theta = 32^\circ24'31''$ $(-2'55'', +1'17'')$.

6.2. Schrödinger-like equations as a Hilbert space representation of the classical Liouville equation

This derivation starts from the classical Liouville equation for charged particles in a magnetic field. This equation is, of course, equivalent to the classical dynamical equations, since its characteristic equations are Hamilton's equations of motion. It is the appropriate equation to employ, as it describes the dynamics of an ensemble of particles, which is what we are dealing with. Moreover, and more importantly, it is a linear equation in the Liouville density function, whether or not Hamilton's equations themselves are linear. This linearity affords a great formal as well as computational advantage in obtaining a solution. In fact, the linearity of the Schrödinger-like equations that we shall obtain follows from that of the Liouville equation. For simplicity we consider the dynamics in an axisymmetric magnetic field, for which the Lagrangian is

$$L = \frac12 m\left(\dot x_\parallel^2 + \dot x_\perp^2 + r^2\dot\varphi^2\right) + \frac{e}{c}\, rA_\varphi\dot\varphi. \tag{6.31}$$

Here $x_\parallel$ is the coordinate along the field line, with unit vector $\hat e_\parallel$. The coordinate $x_\perp$ is perpendicular both to $\hat e_\parallel$ and to $\hat e_\varphi$, the unit vector in the $\varphi$-direction at that point. Strictly speaking, the kinetic energy term in the "parallel" direction, $\frac12 m\dot x_\parallel^2$, should carry an appropriate scale factor, giving it the form $\frac12 m(1 - y/R_c)^2\dot x_\parallel^2$. Here $R_c$ is the radius of curvature of the field line at the point [see Eq. (4.104), where $y = x_\perp - x_{\perp0}$, $x_{\perp0}$ being defined by (6.36)]. However, if we assume, as we do, that $(x_\perp - x_{\perp0}) \ll R_c$, then the form (6.31) follows. The assumed axisymmetry of the magnetic field entails the conservation of the canonical angular momentum $P_\varphi$:

$$P_\varphi = \frac{\partial L}{\partial\dot\varphi} = mr^2\dot\varphi + \frac{e}{c}\, rA_\varphi = M. \tag{6.32}$$
Using this constancy one can obtain a reduced Lagrangian $\bar L$, which describes the effective motion in the $(x_\parallel, x_\perp)$ plane after the $\varphi$-motion has been eliminated. The reduced Lagrangian $\bar L$ is essentially given by the Routhian $R$:

$$\bar L = R = L - P_\varphi\dot\varphi, \tag{6.33}$$

where $\dot\varphi$ in $R$ is to be substituted from (6.32). This yields

$$\bar L = \frac12 m\left(\dot x_\parallel^2 + \dot x_\perp^2\right) - \frac{1}{2mr^2}\left(M - \frac{e}{c}\, rA_\varphi\right)^2. \tag{6.34}$$
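The substitution leading from (6.33) to (6.34) can be written out explicitly, using only (6.31)–(6.33). By (6.32), $mr^2\dot\varphi = M - (e/c)rA_\varphi$, so

$$L - M\dot\varphi = \frac12 m\left(\dot x_\parallel^2 + \dot x_\perp^2\right) + \frac12 mr^2\dot\varphi^2 - \left(M - \frac{e}{c}\, rA_\varphi\right)\dot\varphi = \frac12 m\left(\dot x_\parallel^2 + \dot x_\perp^2\right) - \frac12 mr^2\dot\varphi^2 .$$

Replacing $\dot\varphi$ by $\left(M - (e/c)rA_\varphi\right)/(mr^2)$ in the last term gives (6.34). The reduced motion therefore experiences the effective potential $\left(M - (e/c)rA_\varphi\right)^2/(2mr^2)$. This potential is non-negative and vanishes where $M = (e/c)rA_\varphi$.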
Note that $r$ in the above expression is to be expressed in terms of $x_\parallel$ and $x_\perp$. The last term in (6.34) then appears as an effective potential for the motion in the $(x_\parallel, x_\perp)$ plane. Two cases arise.

The first case corresponds to $M > 0$ (with $e < 0$). Here the value of $M$ in (6.32) is dominated by the term $mr^2\dot\varphi$, which must always be greater than $(e/c)rA_\varphi$ and must always carry the same (positive) sign. Such particles, whose $\dot\varphi$ never changes sign, must therefore encircle the axis of symmetry. They would find themselves exactly trapped if $M$ is sufficiently large and positive. Since their trapping does not depend on the adiabatic invariance of the gyroaction $\mu$, this case is not relevant for nonadiabatic escape.

The second case corresponds to $M < 0$ with $e < 0$ (or $M > 0$ with $e > 0$). In this case the expression (6.32) for the canonical angular momentum is dominated by the term $(e/c)rA_\varphi$, and we can expand $(e/c)rA_\varphi$ around the value $M$, as in Eq. (4.106):

$$\frac{e}{c}\, rA_\varphi = M + (x_\perp - x_{\perp0})\left[\frac{\partial}{\partial x_\perp}\left(\frac{e}{c}\, rA_\varphi\right)\right]_{x_\perp = x_{\perp0}} + \cdots, \tag{6.35}$$

where $x_{\perp0}$ is defined by the relation

$$M = \frac{e}{c}\, rA_\varphi\Big|_{x_\perp = x_{\perp0}}. \tag{6.36}$$
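Then we have

$$\bar L = \frac12 m\left(\dot x_\parallel^2 + \dot x_\perp^2\right) - \frac12 m\Omega^2 (x_\perp - x_{\perp0})^2 + \cdots, \tag{6.37}$$

where

$$\Omega(x_\parallel, x_{\perp0}) = \frac{e}{mcr}\,\frac{\partial}{\partial x_\perp}(rA_\varphi)\Big|_{x_\perp = x_{\perp0}}. \tag{6.38}$$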
The Hamiltonian corresponding to $\bar L$ is given by

$$\hat H = \frac{p_\parallel^2}{2m} + \frac{p_\perp^2}{2m} + \frac12 m\Omega^2 (x_\perp - x_{\perp0})^2, \tag{6.39}$$

which represents a harmonic oscillator in the perpendicular coordinate $x_\perp$. The Hamiltonian $\hat H$ is, of course, approximate because of the expansion (6.35). The exact Hamiltonian $\bar H$ corresponding to the Lagrangian $\bar L$ is, on the other hand,

$$\bar H = \frac{p_\parallel^2}{2m} + \frac{p_\perp^2}{2m} + \frac{1}{2mr^2}\left(M - \frac{e}{c}\, rA_\varphi\right)^2. \tag{6.40}$$
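To see how well the harmonic Hamiltonian (6.39) approximates the exact one (6.40) near $x_{\perp0}$, consider the simplest hypothetical case of a uniform axial field $B_0$. Then $A_\varphi = B_0 r/2$, and $x_\perp$ can be identified with $r$. This is a sketch under those assumptions, in units with $e = c = m = B_0 = 1$. It evaluates $\Omega$ from (6.38) by a central finite difference and compares the two potentials. Analytically, the ratio of the exact to the harmonic potential is $(r + r_0)^2/4r^2$, which tends to 1 as $r \to r_0$:

```python
# Hypothetical units: e = c = m = B0 = 1 (uniform axial field B0).
CHARGE, LIGHT_SPEED, MASS, B0 = 1.0, 1.0, 1.0, 1.0

def r_a_phi(r):
    """r * A_phi for a uniform axial field, with A_phi = B0 * r / 2."""
    return B0 * r**2 / 2.0

def omega_from_eq638(r0, h=1e-4):
    """Eq. (6.38): Omega = (e / (m c r)) d(r A_phi)/dr at r0, by central difference."""
    derivative = (r_a_phi(r0 + h) - r_a_phi(r0 - h)) / (2.0 * h)
    return CHARGE / (MASS * LIGHT_SPEED * r0) * derivative

def exact_potential(r, M):
    """Last term of Eq. (6.40): (M - (e/c) r A_phi)^2 / (2 m r^2)."""
    return (M - CHARGE / LIGHT_SPEED * r_a_phi(r))**2 / (2.0 * MASS * r**2)

def harmonic_potential(r, r0):
    """Last term of Eq. (6.39): (1/2) m Omega^2 (r - r0)^2."""
    return 0.5 * MASS * omega_from_eq638(r0)**2 * (r - r0)**2

r0 = 2.0
M = CHARGE / LIGHT_SPEED * r_a_phi(r0)   # Eq. (6.36) fixes the guiding radius r0
for eps in (1e-1, 1e-2, 1e-3):
    r = r0 * (1.0 + eps)
    ratio = exact_potential(r, M) / harmonic_potential(r, r0)
    print(f"eps = {eps:g}: exact/harmonic = {ratio:.6f}")
```

For $r > r_0$ the exact potential is slightly softer than the harmonic one, so the ratio is below 1. The difference vanishes linearly as $r \to r_0$, which is the regime in which (6.39), and hence (6.42), is used.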
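Transforming $(p_\perp, x_\perp)$ in (6.39) to action-angle variables $(\mu, \chi)$ at any $x_\parallel$, defined by

$$p_\perp = (2m\mu\Omega)^{1/2}\cos\chi, \qquad x_\perp - x_{\perp0} = \left(\frac{2\mu}{m\Omega}\right)^{1/2}\sin\chi, \tag{6.41}$$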
one gets

$$\hat H = \frac{p_\parallel^2}{2m} + \mu\Omega(x_\parallel), \tag{6.42}$$

where $\mu$ is the action corresponding to the lowest-order oscillatory motion of the $x_\perp$ coordinate. $\mu$ is an adiabatic invariant if $\Omega$ is a slowly varying function of $x_\parallel$.

6.2.1. The Liouville equation for the evolution of the ensemble

Consider the ensemble of particles corresponding to the experimental conditions. A large number of particles are injected into a magnetic mirror trap at a given space coordinate $x_0$, with a specified energy $E_0$ and gyroaction $\mu$, but with a distribution in the gyrophase $\chi_0$. The average density of injected particles in the trap is assumed to be so low that interparticle collisions can be neglected. The system then constitutes an ensemble of independent particles. Let $f$ be the Liouville density for the ensemble in the phase space of the canonical variables $(x_\parallel, p_\parallel, x_\perp, p_\perp)$ corresponding to the reduced Hamiltonian $\bar H$ of (6.40); this reduced phase space suffices in view of the assumed axisymmetry. The evolution of $f$ is then governed by the Liouville equation

$$\frac{\partial f}{\partial t} + \frac{p_\parallel}{m}\frac{\partial f}{\partial x_\parallel} + \frac{p_\perp}{m}\frac{\partial f}{\partial x_\perp} + \dot p_\parallel\frac{\partial f}{\partial p_\parallel} + \dot p_\perp\frac{\partial f}{\partial p_\perp} = 0. \tag{6.43}$$

This expresses the conservation of probability along the trajectories described by the Hamiltonian (6.40). The initial form $f_0$ of the distribution function $f$ is determined by "state preparation". In view of the conditions of injection, it is taken to be a $\delta$-function in the energy $E$, in the canonical angular momentum $P_\varphi = M$ and in the initial value of the gyroaction $\mu$. The initial value $\chi_0$ of the phase angle has an uncontrollable distribution $g(\chi_0)$. Such a "state" corresponds precisely to what Synge [3] designated a "coherent system of trajectories", and what Dirac [4] called a "family". Dirac pointed out that it corresponds to a solution $S(x, t; \alpha_i)$ of the Hamilton–Jacobi equation for the system. Here the $\alpha_i$ represent the initial values of the momenta, some of which may be "global" constants of motion arising from the global symmetries of the system, such as the energy $E$ and the canonical angular momentum $P_\varphi$.
According to him, a "family" "corresponds to a state in quantum mechanics", and "presumably the family has some deep significance in nature, not yet properly understood". It is worth mentioning that the author was unaware of both Synge's work and Dirac's remarks at the time of writing Ref. [58]. He was guided more by analogies, intuition and the actual experimental conditions in choosing the "initial state" as defined above. With this choice of initial "state", the initial momenta $\alpha_i$ are by definition constants of motion. The variables $(x_\parallel, x_\perp, p_\parallel, p_\perp)$ of the Liouville density function $f$ are then transformed to a "mixed" representation in the variables $(x_\parallel, \chi, \alpha_i)$. Here the $\alpha_i$ stand for $(E, P_\varphi, \mu_0)$, and $\chi$, the gyrophase, is defined by

$$\chi = \chi_0 - \int_0^t \Omega\, dt'. \tag{6.44}$$

As a function of these variables, the distribution function $\hat f(x_\parallel, \chi, t; \alpha_i)$ has a meaning very close to that of a probability in quantum mechanics. It gives the probability of finding a particle at $(x_\parallel, \chi)$ at time $t$, given that it initially had the momenta $\alpha_i$ ($E = E_0$, $P_\varphi = M$, $P_\chi \equiv \mu = \mu_0$). It is in this context that Dirac's remark about the ensemble that $\hat f$ represents, namely, that it "corresponds to a state in
quantum mechanics", assumes significance. This transformation is, of course, not canonical, and an appropriate Jacobian of transformation must be used. In terms of the function $\hat f(x_\parallel, \chi, t; \alpha_i)$, the Liouville equation becomes

$$\frac{\partial\hat f}{\partial t} + v_\parallel\frac{\partial\hat f}{\partial x_\parallel} + \dot\chi\frac{\partial\hat f}{\partial\chi} = 0, \tag{6.45}$$

where $v_\parallel$ and $\dot\chi$ are to be regarded as functions of $(x_\parallel; \alpha_i)$. We shall now seek what may be regarded as a Hilbert space representation of the Liouville equation in the form (6.45). As we have seen [58], and as we shall show presently, the required Hilbert space representation is precisely the set of Schrödinger-like equations (6.24) of Section 6.1, which were obtained there through a heuristic construction. These are, of course, equations for the amplitude functions $\psi^{(n)}$. They describe wave behaviour in the classical mechanical parameter domain to which they belong. We shall return to a detailed discussion of this aspect later.

6.2.2. Equations for the probability amplitudes for the ensemble

We now carry out a series of suitably designed transformations on the Liouville equation. These lead towards a Hilbert space representation in the form of a set of equations for probability amplitudes. We first change variable from the gyrophase $\chi$ to an action-phase $\Theta$, defined by

$$\Theta = \chi + \frac{1}{\mu}\int_0^t \frac12 m v_\parallel^2\, dt', \tag{6.46}$$

where the time integration is carried out along an exact trajectory. This change of variable is motivated by the form of the time derivative of $\Theta$:

$$\dot\Theta = \dot\chi + \frac{1}{2\mu}\, m v_\parallel^2 = \frac{1}{\mu}\left(\frac12 m v_\parallel^2 - \mu\Omega\right) = \frac{L}{\mu}, \tag{6.47}$$

where $L$ is the adiabatic Lagrangian which generates the adiabatic equation of motion (6.2). This $\Theta$ is the same variable as that defined by Eq. (6.4). It has an important property: since $\mu\Theta = \int L\, dt$ is the action, $\Theta$ defines the neighbourhood of the adiabatic motion through its stationarity for the latter. This neighbourhood is, by definition, the region of nonadiabaticity. Recall the analogy with the quantum–classical relationship alluded to in Section 2. The quantum departures from the classical motion likewise belong to the neighbourhood of the classical motion in the function space of paths, defined through the stationarity of the action that defines the classical motion.
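Carrying out the transformation of variables from $\chi$ to $\Theta$, the Liouville equation (6.45) takes the form

$$\frac{\partial f}{\partial t} + v\frac{\partial f}{\partial x} + \frac{L}{\mu}\frac{\partial f}{\partial\Theta} = 0, \tag{6.48}$$

where all subscripts have been dropped. A finite-time integral form of (6.48) is

$$f(x, \Theta, t) = f\left(x - \int_{t'}^{t} dt''\, v,\; \Theta - \int_{t'}^{t} dt''\, \frac{L}{\mu},\; t'\right). \tag{6.49}$$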
Note that since $f$ defines a probability, it ought to be positive definite at all space-time points. This is ensured by writing $f$ as

$$f = \psi^2, \tag{6.50}$$

with $\psi$ a real quantity. As we shall see, this also constitutes a step towards constructing a Hilbert space representation. Taking the square root of (6.49) using (6.50), and choosing the positive sign, one gets

$$\psi(x, \Theta, t) = \psi\left(x - \int_{t'}^{t} dt''\, v,\; \Theta - \int_{t'}^{t} dt''\, \frac{L}{\mu},\; t'\right). \tag{6.51}$$

One may also take the negative sign for the square root. This only leads to a change of the sign of $i$ ($=\sqrt{-1}$) in the final equation, which is equally valid, as it can be regarded as the complex-conjugate equation. To be single valued, the distribution function $f$, and hence $\psi$, must be periodic in the gyrophase $\chi$. It must therefore also be periodic in $\Theta$, which is related to $\chi$ additively. We thus introduce a Fourier series expansion

$$\psi(x, \Theta, t) = \sum_n \hat\psi(x, n, t)\, e^{-in\Theta}. \tag{6.52}$$
The Fourier transformation of (6.51) according to (6.52) then gives

$$\hat\psi(x, n, t) = \exp\left[in\int_{t'}^{t} dt''\, \frac{L}{\mu}\right]\hat\psi\left(x - \int_{t'}^{t} dt''\, v,\; n,\; t'\right). \tag{6.53}$$

In this and all the preceding equations we have suppressed the momenta $\alpha_i$ as parameter arguments of the functions $\hat\psi$, $f$ and $\psi$. We shall continue to do so except when explicit reference to them is required. Note that the integral $\int_{t'}^{t} L\, dt''$ is evaluated along the projections of the exact three-dimensional trajectories onto the one-dimensional coordinate parallel to the magnetic field line. In the slightly nonadiabatic case, the values of this integral for the various trajectories of the ensemble are close to its value for the adiabatic trajectory. The adiabatic trajectory corresponds to the extremum of the action:

$$\delta\int_{t'}^{t} dt''\, L = 0. \tag{6.54}$$
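The phase factor in (6.53) is the Fourier-shift property at work. Translating $\psi$ in $\Theta$ by $\int L\, dt''/\mu$ multiplies the $n$-th coefficient by a phase that is linear in $n$. The following NumPy sketch checks this on a grid for a hypothetical smooth test function. It assumes the expansion $\psi = \sum_n \hat\psi_n e^{-in\Theta}$; with the opposite sign convention the phase factor is simply conjugated. The sketch also checks the Parseval relation that underlies (6.65): the $\Theta$-average of $f = \psi^2$ equals $\sum_n |\hat\psi_n|^2$.

```python
import numpy as np

n_grid = 64
theta = 2.0 * np.pi * np.arange(n_grid) / n_grid   # grid in the action phase Theta

# Hypothetical real, smooth, 2*pi-periodic amplitude psi(Theta).
psi = 1.0 + 0.5 * np.cos(theta) + 0.2 * np.sin(3.0 * theta)

def fourier_coefficients(samples):
    """Coefficients c_n of samples_j = sum_n c_n exp(-i n Theta_j).
    np.fft.ifft returns (1/N) sum_j x_j exp(+2 pi i j n / N), which inverts that sum."""
    return np.fft.ifft(samples)

s = 5                                    # shift by s grid steps
delta = 2.0 * np.pi * s / n_grid         # stands in for the time integral of L/mu
psi_shifted = np.roll(psi, s)            # psi_shifted(Theta_j) = psi(Theta_j - delta)

n = np.fft.fftfreq(n_grid, d=1.0 / n_grid)   # integer mode numbers in FFT order
c_old = fourier_coefficients(psi)
c_new = fourier_coefficients(psi_shifted)

# Shift property: each coefficient picks up the phase exp(+i n delta), cf. Eq. (6.53).
shift_ok = np.allclose(c_new, np.exp(1j * n * delta) * c_old)
# Parseval: the Theta-average of f = psi**2 equals sum_n |c_n|^2, cf. Eq. (6.65).
parseval_ok = np.isclose(np.mean(psi**2), np.sum(np.abs(c_old)**2))
print(shift_ok, parseval_ok)
```

The shift is chosen as a whole number of grid steps so that `np.roll` realizes it exactly. A general shift would need interpolation, but the phase relation between the coefficients is unchanged.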
Eq. (6.53) appears very similar to the Feynman equation representing his path integral formulation of quantum mechanics (see, for instance, Ref. [66]). In particular, the exponential factor $\exp[in\int_{t'}^{t} dt''\, L/\mu]$ in Eq. (6.53) is similar to the exponential factor $\exp[i\int_{t'}^{t} dt''\, L_c/\hbar]$ in the Feynman formulation, where $L_c$ is the classical Lagrangian. These exponential factors have analogous consequences in the two cases. In the adiabatic limit (formally expressed by $\mu \to 0$), the exponential factor $\exp[in\int_{t'}^{t} L\, dt''/\mu]$ oscillates rapidly. It therefore contributes to the amplitude $\hat\psi(x, n, t)$ at $(x, t)$ predominantly via the trajectory which extremizes the action $\int L\, dt$, as expressed by (6.54). This is the adiabatic trajectory. The situation is analogous to that with the Feynman kernel $\exp[i\int_{t'}^{t} dt''\, L_c/\hbar]$. There, in the limit $\hbar \to 0$, the dominant contribution comes through the classical trajectory, which extremizes the classical action integral $\int_{t'}^{t} L_c\, dt''$. There is, however, a difference between the two cases: unlike the Feynman path integral formulation, (6.53) contains no integral over "paths". Eq. (6.53) will therefore be dealt with differently, as is done in what follows.
Note that the integral $\int_{t'}^{t} dt''\, L$ in the exponent is a trajectory integral. In it, the $x$-dependence of $L$ is replaced by the trajectory value $x(t'') = x(t') + \int_{t'}^{t''} v(x(t_1))\, dt_1$:

$$\int_{t'}^{t} dt''\, L(x(t'')) = \int_{t'}^{t} dt''\, L\left\{x(t') + \int_{t'}^{t''} v(x(t_1))\, dt_1\right\}. \tag{6.55}$$

Expressed in this fashion, trajectory integrals like (6.55) are explicit functions of the position $x(t')$ at time $t'$, and not of $x(t)$ at time $t$. Note that the $x$ appearing in the argument of $\hat\psi$ in (6.53) refers to $x(t)$ at time $t$. We now introduce a Fourier transform of $\hat\psi(x, n, t)$ with respect to $x$ [that is, the position $x(t)$ at time $t$], defined by

$$\tilde\psi(k, n, t) = \int dx\, e^{-ikx}\,\hat\psi(x, n, t). \tag{6.56}$$

The Fourier transformation of (6.53) with respect to $x$, in accordance with (6.56), gives

$$\tilde\psi(k, n, t) = \exp\left[i\int_{t'}^{t} dt''\left(\frac{nL}{\mu} - kv\right)\right]\tilde\psi(k, n, t'). \tag{6.57}$$
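As pointed out above, the functions under the trajectory integral are not functions of $x(t)$, and are therefore not involved in the Fourier transformation. Now consider the integrand in the exponent of (6.57), $[(nL/\mu) - kv]$. Using $L = \frac12 mv^2 - \mu\Omega$ from (6.47), it may be written as

$$\frac{nL}{\mu} - kv = \frac{n}{\mu}\left(\frac12 mv^2 - \mu\Omega\right) - kv = \frac12\,\frac{nm}{\mu}\left(v - \frac{k\mu}{mn}\right)^2 - \frac{n}{\mu}\left[\frac{1}{2m}\left(\frac{k\mu}{n}\right)^2 + \mu\Omega\right]. \tag{6.58}$$

If we next define a function

$$\bar\psi(k, n, t) = \tilde\psi(k, n, t)\exp\left[-\frac12\, i\,\frac{nm}{\mu}\int_{t_0}^{t} dt''\left(v - \frac{k\mu}{mn}\right)^2\right], \tag{6.59}$$

where $t_0$ is some arbitrary initial time, then, making use of Eq. (6.58), Eq. (6.57) takes the following form in terms of this function:

$$\bar\psi(k, n, t; \alpha_i) = \exp\left[-\frac{in}{\mu}\int_{t'}^{t} dt''\left(\frac{1}{2m}\left(\frac{k\mu}{n}\right)^2 + \mu\Omega\right)\right]\bar\psi(k, n, t'; \alpha_i). \tag{6.60}$$

The inverse Fourier transform of (6.60) may be taken with respect to $k$. This restores the dependence of the functions on $x(t)$, the position at time $t$. We thus have

$$\psi(x, n, t; \alpha_i) = \exp\left[-\frac{in}{\mu}\int_{t'}^{t} dt''\left(-\frac{1}{2m}\left(\frac{\mu}{n}\right)^2\frac{\partial^2}{\partial x^2} + \mu\Omega(x(t''))\right)\right]\psi(x, n, t'; \alpha_i), \tag{6.61}$$

where $\psi(x, n, t; \alpha_i)$ is the inverse Fourier transform of $\bar\psi(k, n, t; \alpha_i)$, given by

$$\psi(x, n, t; \alpha_i) = \int\frac{dk}{2\pi}\,\bar\psi(k, n, t; \alpha_i)\, e^{ikx} = \int\frac{dk}{2\pi}\, e^{ikx}\exp\left[-\frac12\, i\,\frac{nm}{\mu}\int_{t_0}^{t} dt''\left(v - \frac{k\mu}{mn}\right)^2\right]\tilde\psi(k, n, t; \alpha_i). \tag{6.62}$$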
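The completion of the square in (6.58) is easy to get wrong by a sign or a factor of $n$. The following numerical spot check assumes only $L = \frac12 mv^2 - \mu\Omega$, as in (6.47). It compares the two sides of (6.58) at random, hypothetical parameter values. The function names are illustrative:

```python
import math
import random

def lhs(v, k, n, m, mu, omega):
    """Left side of Eq. (6.58): n L / mu - k v, with L = m v^2 / 2 - mu * Omega."""
    lagrangian = 0.5 * m * v**2 - mu * omega
    return n * lagrangian / mu - k * v

def rhs(v, k, n, m, mu, omega):
    """Right side of Eq. (6.58): completed square minus (n/mu) times the energy term."""
    square = 0.5 * (n * m / mu) * (v - k * mu / (m * n))**2
    energy = (k * mu / n)**2 / (2.0 * m) + mu * omega
    return square - (n / mu) * energy

rng = random.Random(42)
all_match = True
for _ in range(1000):
    args = (rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0), rng.randint(1, 5),
            rng.uniform(0.5, 2.0), rng.uniform(0.5, 2.0), rng.uniform(0.5, 2.0))
    all_match &= math.isclose(lhs(*args), rhs(*args), rel_tol=1e-9, abs_tol=1e-9)
print("Eq. (6.58) holds at all sampled points:", all_match)
```

The bracketed term on the right is the "kinetic plus potential" energy with $k\mu/n$ playing the role of momentum. It becomes the Hamiltonian operator of (6.61) once $k$ is replaced by $-i\,\partial/\partial x$.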
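In Eq. (6.61), let $t = t' + \epsilon$, with $\epsilon$ an infinitesimal time, and expand both sides around $t'$. It is then easy to see that, to lowest order in $\epsilon$, we obtain the differential equation

$$i\,\frac{\mu}{n}\frac{\partial\psi^{(n)}}{\partial t} = -\frac{1}{2m}\left(\frac{\mu}{n}\right)^2\frac{\partial^2\psi^{(n)}}{\partial x^2} + \mu\Omega(x)\psi^{(n)}, \qquad n = 1, 2, 3, \ldots, \tag{6.63}$$

where we have used the limiting value

$$\lim_{\epsilon\to0}\frac{1}{\epsilon}\int_{t'}^{t'+\epsilon} dt''\,\Omega(x(t'')) = \Omega(x). \tag{6.64}$$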
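A standard WKB estimate for (6.63) shows one way in which the mode number $n$ can enter the exponent linearly, as in $A_n = nA_1$. In this estimate $\hbar$ is replaced by $\mu/n$. The tunnelling exponent through the adiabatic barrier $\mu\Omega(x)$ is then $2(n/\mu)\int\sqrt{2m(\mu\Omega - E)}\,dx$, taken over the classically forbidden region. This is an illustration only, not the derivation of (6.28) in the text. The barrier shape $\Omega(x) = 1 + e^{-x^2}$, the units ($m = 1$) and all numbers are hypothetical:

```python
import numpy as np

def wkb_exponent(n, mu, mass, energy, omega, x):
    """WKB tunnelling exponent 2 (n/mu) * integral sqrt(2 m (mu*Omega - E)) dx,
    i.e. the usual (2/hbar) * integral |p| dx with hbar replaced by mu/n (Eq. (6.63))."""
    barrier = mu * omega(x) - energy
    # Zero outside the classically forbidden region, where mu*Omega < E.
    momentum = np.sqrt(2.0 * mass * np.clip(barrier, 0.0, None))
    integral = np.sum(0.5 * (momentum[1:] + momentum[:-1]) * np.diff(x))  # trapezoid rule
    return 2.0 * (n / mu) * integral

def omega_profile(x):
    """Hypothetical mirror-like gyrofrequency profile with its maximum at x = 0."""
    return 1.0 + np.exp(-x**2)

x = np.linspace(-5.0, 5.0, 20001)
mu, mass, energy = 0.05, 1.0, 0.075   # hypothetical; barrier top mu*Omega_max = 0.1 > energy
gammas = {n: wkb_exponent(n, mu, mass, energy, omega_profile, x) for n in (1, 2, 3)}
for n, gamma in gammas.items():
    print(f"n = {n}: exponent = {gamma:.3f}, ratio to n = 1: {gamma / gammas[1]:.3f}")
```

Because the exponent scales as $n/\mu$, the transmission $e^{-\gamma_n}$ of mode $n$ is the $n$-th power of $e^{-\gamma_1}$. This parallels the form $\tau_n \propto e^{nA_1B}$ of (6.29).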
We have to determine next the meaning and interpretation of these functions (n) and their relationship with the probability density function f of (6.49) and (6.50). Since the action phase @ is not a measurable quantity, the probability density f integrated over @ is given by ˆ n; t) ; d@f = (6.65) ˆ ∗ (x; n; t)(x; G(x; t; :i ) = n
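where use has been made of (6.50) and (6.52) in obtaining (6.65). Making use of (6.56), (6.59) and (6.62), we have the following expression for G(x,t;α_i):

$$\begin{aligned} G(x,t;\alpha_i) &= \sum_n\int\frac{dk\,dk'}{(2\pi)^2}\,\bar\psi(k,n,t)\,\bar\psi^*(k',n,t)\,e^{i(k-k')x}\exp\left\{\frac{1}{2}\frac{inm}{\mu}\int_{t_0}^{t}dt'\left[\left(v-\frac{k\mu}{mn}\right)^2-\left(v-\frac{k'\mu}{mn}\right)^2\right]\right\}\\ &= \sum_n\int\frac{d\kappa\,dK}{(2\pi)^2}\,e^{i\kappa x}\,\bar\psi\!\left(K+\tfrac{1}{2}\kappa,n,t\right)\bar\psi^*\!\left(K-\tfrac{1}{2}\kappa,n,t\right)\exp\left[-i\kappa\int_{t_0}^{t}dt'\left(v-\frac{K\mu}{nm}\right)\right] , \end{aligned} \tag{6.66}$$

where we have introduced a change of variables κ = k − k' and K = ½(k + k'). If one now defines an average velocity v̄ by

$$\bar v = (t - t_0)^{-1}\int_{t_0}^{t} v\,dt' , \tag{6.67}$$

where t0 may be taken as an initial time, then

$$\int_{t_0}^{t}dt'\left(v - \frac{K\mu}{mn}\right) = (t - t_0)\left(\bar v - \frac{K\mu}{mn}\right) . \tag{6.68}$$

In the limit of large times (t − t0), the exponential factor under the integral (6.66) will oscillate rapidly and will give a vanishing contribution to the integral unless

$$m\bar v = \frac{K\mu}{n} . \tag{6.69}$$

This relation identifies, analogously to quantum mechanics, Kμ/n with the average momentum mv̄ of the particle. In view of the above argument, we have

$$G(x,t;\alpha_i) = \sum_n\int\frac{d\kappa\,dK}{(2\pi)^2}\,e^{i\kappa x}\,\bar\psi\!\left(K+\tfrac{1}{2}\kappa,n,t;\alpha_i\right)\bar\psi^*\!\left(K-\tfrac{1}{2}\kappa,n,t;\alpha_i\right) = \sum_n\psi^*(x,n,t;\alpha_i)\,\psi(x,n,t;\alpha_i) , \tag{6.70}$$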
where use has been made of the fact that ψ̄(k,n,t) is the Fourier transform of ψ(x,n,t) as per Eq. (6.62). The set of Eqs. (6.63) for the amplitude functions ψ(x,n,t), along with the connection (6.70) with the probability density G(x,t), is the same set of equations as obtained earlier in Section 6.1 by a heuristic derivation. Here, however, the equations have been derived by a systematic deductive procedure starting from the classical Liouville equation for the ensemble under consideration. They amount to a Hilbert space representation of the latter.

In this set of equations μ/n, n = 1, 2, 3, ..., plays the role of ℏ. The adiabatic potential μΩ(x) takes the place of the potential in the quantum mechanical Schrödinger equation. The set thus bears the same relationship to the adiabatic motion as the Schrödinger equation does to classical mechanical motion. Note that μ can be identified as μ = Nℏ, with N ≫ 1 (typically N ∼ 10^9). This makes μ a macroscopic quantity, in contrast to ℏ.

The nonadiabatic effects responsible for the leakage of particles from adiabatic traps thus appear as the counterpart of the quantum effects responsible for quantum tunneling. This formalism therefore constitutes a close analogue of the Schrödinger formalism of quantum mechanics. The important difference is that here we have an infinite set of equations for the functions ψ(x,n,t), n = 1, 2, 3, ....

6.2.3. An analysis of the Schrödinger-like formalism and its observational ramifications

The Schrödinger-like formalism represented by Eqs. (6.63) and (6.70) has now been derived by a more formal procedure, starting from a known dynamical equation, namely the Liouville equation. Its amplitude character therefore has a firmer basis than in the earlier heuristic procedure (Section 6.1). (In Section 8 we give yet another derivation of this set of equations, starting from the quantum mechanical Schrödinger equation.)
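The step from (6.66) to (6.70) rests on a Riemann–Lebesgue type argument. Once (t − t0) is large, the phase κ(t − t0)(v̄ − Kμ/mn) varies rapidly with κ. Integrating it against any smooth amplitude then gives almost nothing unless mv̄ = Kμ/n.

The sketch below illustrates only that mechanism, not (6.66) itself. It replaces the product of amplitudes by a Gaussian stand-in g(κ) = exp(−κ²/2). That choice is an assumption, made so that the exact answer √(2π) exp[−(TΔ)²/2] is known in closed form, with T = t − t0 and Δ = v̄ − Kμ/mn. The function names are illustrative.

```python
import cmath
import math


def dephased_integral(delta, T, kappa_max=8.0, steps=4000):
    """Trapezoid estimate of I = integral of g(kappa) exp(-i kappa T delta) d kappa,
    with the stand-in amplitude g(kappa) = exp(-kappa^2 / 2).

    delta : mismatch v_bar - K mu/(m n) appearing in (6.68)
    T     : elapsed time t - t0
    """
    h = 2.0 * kappa_max / steps
    total = 0j
    for j in range(steps + 1):
        kappa = -kappa_max + j * h
        weight = 0.5 if j in (0, steps) else 1.0   # trapezoid end-point weights
        total += weight * math.exp(-kappa**2 / 2.0) * cmath.exp(-1j * kappa * T * delta)
    return total * h


def exact_integral(delta, T):
    """Closed form for the Gaussian stand-in: sqrt(2 pi) exp(-(T delta)^2 / 2)."""
    return math.sqrt(2.0 * math.pi) * math.exp(-((T * delta) ** 2) / 2.0)


for T in (1.0, 5.0, 20.0):
    print(T, abs(dephased_integral(0.0, T)), abs(dephased_integral(0.3, T)))
```

When Δ = 0 the integral stays at √(2π) for every T. For any Δ ≠ 0 it dies off as T grows, which is the selection rule (6.69) in miniature.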
As a consequence, the amplitude character of the Schrödinger-like formalism can now be accepted with a greater degree of confidence. We have already considered this formalism's prediction of a multiplicity of residence times in an adiabatic trap. These arise from the different equations of the set (6.63), corresponding to the modes n = 1, 2, 3, .... They have been verified experimentally, with all their characteristics in accordance with the theory. It may be emphasized that there is no way one could have suspected these multiple residence times from the standard equation-of-motion paradigm alone, for instance from the analysis of Chirikov given in Section 5. They ought therefore to be regarded as a characteristic manifestation of the amplitude description, similar to quantum tunneling. For a description of a classical mechanical system, this could be considered quite astonishing.

There is, however, an even more astonishing consequence of this probability amplitude formalism for this classical mechanical system: the existence of matter wave phenomena, which were predicted by the author in Ref. [58]. This would be considered quite extraordinary, because no standard representation of classical charged particle dynamics in a magnetic field would support any matter wave phenomena. However, as we shall see in Section 7, we have in fact observed such matter wave manifestations. They appear in the form of a discrete energy band structure in the transmission of electrons along a magnetic field [60], and in the form of matter wave beats [65]. These effects pertain to a system belonging to the classical mechanical macrodomain. This leads inevitably to the question of what precisely is the nature of this formalism, and of the underlying mathematical structure vis-à-vis the standard classical mechanical paradigm, which
imparts to it the potential for such an unusual prediction. Furthermore, recall that this formalism has been obtained from the classical Liouville equation. One is then led to ask whether the formalism constitutes a new representation of classical dynamics, one able to unearth manifestations of the latter not hitherto associated with it. If so, one may further ask whether classical dynamics is, in fact, endowed with hitherto unknown features, which have somehow been incorporated in its Hilbert space representation (which is what this formalism amounts to). These questions will be taken up for further discussion in Sections 8 and 9.

One thing may, however, be noted right away. The Liouville equation, the starting point of the above derivation, is a first order partial differential equation in three dimensions, with the Hamilton equations as its characteristic equations. The set of Schrödinger-like equations obtained from it through a series of transformations is, by contrast, a set of second order partial differential equations in one spatial dimension (the magnetic field line coordinate). They thereby constitute a boundary value problem for the infinite set of functions ψ(x,n,t) in the field line coordinate. Being equations of the Schrödinger type, they of course describe wave phenomena.

By the time we have arrived at the set of (Schrödinger-like) equations (6.63) from the Liouville equation, the identity of the characteristics of the latter is completely lost. That is, there are no more "trajectories" in this representation. They have been "decomposed" (through the process of derivation) in terms of the inverse variables specifying the Hilbert space. We thus have the emergence of a wave picture in the sense of matter waves.
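The scale of these matter waves is set by μ rather than ℏ. The sketch below takes μΩ = E⊥, i.e. μ = E⊥/Ω, as implied by the adiabatic potential μΩ(x) and by the combination E − μΩ used in Section 7. With that, the ratio N = μ/ℏ can be estimated directly. The energies and fields are illustrative choices, not values from a particular experiment. SI units are used, so Ω = eB/m (the text writes the Gaussian form eB/mc). The helper names are hypothetical.

```python
E_CHARGE = 1.602176634e-19   # elementary charge, C
M_E = 9.1093837015e-31       # electron mass, kg
HBAR = 1.054571817e-34       # reduced Planck constant, J s


def gyrofrequency(B_gauss):
    """Electron gyrofrequency Omega = eB/m in rad/s (SI; 1 G = 1e-4 T)."""
    return E_CHARGE * B_gauss * 1e-4 / M_E


def mu_in_units_of_hbar(E_perp_eV, B_gauss):
    """N = mu/hbar, taking mu = E_perp/Omega (the action playing the role of hbar in (6.63))."""
    mu = E_perp_eV * E_CHARGE / gyrofrequency(B_gauss)   # J s
    return mu / HBAR


for E_perp, B in ((10.0, 100.0), (100.0, 10.0)):
    print(f"E_perp = {E_perp:5.1f} eV, B = {B:5.1f} G -> N = {mu_in_units_of_hbar(E_perp, B):.1e}")
```

N scales as E⊥/B. For these two choices it comes out at roughly 10^7 and 10^9 respectively, so μ is indeed macroscopic compared with ℏ.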
On the other hand, because of the probability amplitude nature of this formalism, a natural question presents itself. Is there a relationship between the Schrödinger-like equations of this formalism and the Schrödinger equation of quantum mechanics, such that the amplitude character of the former can be attributed directly to that of the latter? This has, in fact, been found to be the case, as is demonstrated in Section 8. Such a relationship reinforces the true amplitude character of these equations and of the matter wave phenomena that they describe.

6.3. Residence times: experimental results and comparison with theoretical models

We now have before us the experimental results and two theoretical models, discussed in detail above, for describing the residence times. These are the "stochastic diffusion" model of Section 5 and the Schrödinger-like formalism of the present section. It was seen in Section 6.1.1 that the experimental results do exhibit a multiplicity of residence times, in accordance with the expectations of the Schrödinger-like formalism. They show all the predicted dependences: on the energy E, on the pitch angle of injection, and on the scale length L of the magnetic field variation in the region of the mirrors. Comparing the same results with the "stochastic diffusion" model reviewed in Section 5, we have seen that the latter offers little possibility of describing the experimentally established multiplicity of residence times. It has been suggested [52] that the behaviour of the standard mapping in the "stochastic layer" could lead to a power law decay in time.
An attempt has been made [52] to fit the latter part of one of our published decay curves [57 (b)], which corresponds to two distinct residence times, with a power law on a log–log plot. It is claimed that this part corresponds to the theoretically derived power law ∼ t^−2.3. It seems to us, however, that a single data curve is too few to establish such a fit, particularly one based on a log–log plot. It is desirable to use a large number of data curves.
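The difficulty with a single log–log fit can be seen from the local slope d ln N/d ln t. A genuine power law N ∝ t^−2.3 has a constant slope. A decay built from two residence times has a slope that drifts steadily with t, so any finite window of it can be mistaken for some power law. The two-exponential curve below uses hypothetical amplitudes and residence times (1 and 8, arbitrary units), not values from Refs. [52] or [57]. The function names are illustrative.

```python
import math


def local_loglog_slope(f, t, h=1e-4):
    """Central-difference estimate of d ln f / d ln t at time t."""
    up = math.log(f(t * math.exp(h)))
    down = math.log(f(t * math.exp(-h)))
    return (up - down) / (2.0 * h)


def power_law(t):
    return t ** -2.3


def two_residence_times(t, a1=0.7, tau1=1.0, a2=0.3, tau2=8.0):
    # hypothetical amplitudes and residence times, arbitrary time units
    return a1 * math.exp(-t / tau1) + a2 * math.exp(-t / tau2)


times = (1.0, 3.0, 10.0, 30.0)
pl_slopes = [local_loglog_slope(power_law, t) for t in times]
te_slopes = [local_loglog_slope(two_residence_times, t) for t in times]
for t, a, b in zip(times, pl_slopes, te_slopes):
    print(f"t = {t:5.1f}: power law {a:+.3f}, two exponentials {b:+.3f}")
```

Over a limited range of t, the two-exponential slope passes through −2.3 on its way from about −0.6 to about −3.7. This is why one curve on a log–log plot is weak evidence for a particular power law.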
On the other hand, using quite a large number of current decay data curves over a wide range of parameters, we have clearly established the existence of up to three residence times. Their characteristics agree with the expression (6.28) obtained from the theory. In particular, the dependence on the mode number n, the energy E, the magnetic field scale length L, and the initial pitch angle of injection θ0 has been clearly demonstrated to follow expression (6.28).

In this context it is also instructive to examine how the residence time (5.28), calculated in Section 5 for the "stochastic diffusion" model, depends on the parameters (E, θ0, L) mentioned above. As discussed at the end of Section 5.3.1, a number of assumptions and simplifications were made in formulating the stochastic diffusion model and in arriving at the corresponding residence time. In particular, to render the mapping (5.7) canonical, the limit θ0 → 0 was taken in the expression for the nonadiabatic change Δμ across the midplane transit. The residence time τr of (5.28) thus bears no dependence on the initial pitch angle θ0. The experimental results, by contrast, have clearly established such a dependence, in accordance with expression (6.28), as shown in Table 5.

Moreover, expression (5.28) for τr involves the scale length of the magnetic field variation at the midplane. The experimental results have instead established a scale length dependence characteristic of the mirror region, in the particular manner given by expression (6.28). Expression (5.28) would thus appear inadequate to describe even the shortest of the experimentally determined residence times (corresponding to n = 1). It is still possible, however, that a parameter regime exists where the experimental results can be described by the stochastic diffusion model. We do not yet know what this regime is, if any.
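Several residence times are commonly extracted from a single decay curve by "peeling" on a semi-log plot. One fits the straight late-time tail to get the longest residence time, subtracts that component, and then fits what remains at early times. The sketch below applies this to noiseless synthetic data with two hypothetical residence times (1 and 10, arbitrary units).

It is a generic curve-stripping illustration, not necessarily the procedure used for the measured decay curves. Real, noisy current-decay records would call for weighted fits and error estimates. The function names and fit windows are assumptions.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope*x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def peel_two_components(ts, counts, tail_start, head_end):
    """Return ((tau_fast, A_fast), (tau_slow, A_slow)) for N(t) = A_f e^{-t/tau_f} + A_s e^{-t/tau_s}."""
    # 1. Late times: only the slow component survives, so ln N is a straight line in t.
    tail_t = [t for t in ts if t >= tail_start]
    tail_y = [math.log(c) for t, c in zip(ts, counts) if t >= tail_start]
    s_slow, b_slow = linear_fit(tail_t, tail_y)
    tau_slow, amp_slow = -1.0 / s_slow, math.exp(b_slow)
    # 2. Early times: subtract the fitted slow component, then fit the remainder.
    head_t = [t for t in ts if t <= head_end]
    head_y = [math.log(c - amp_slow * math.exp(-t / tau_slow))
              for t, c in zip(ts, counts) if t <= head_end]
    s_fast, b_fast = linear_fit(head_t, head_y)
    return (-1.0 / s_fast, math.exp(b_fast)), (tau_slow, amp_slow)


# Synthetic, noiseless decay with hypothetical residence times 1 and 10.
ts = [0.5 * i for i in range(161)]                      # t = 0 ... 80
counts = [600 * math.exp(-t / 1.0) + 400 * math.exp(-t / 10.0) for t in ts]
fast, slow = peel_two_components(ts, counts, tail_start=40.0, head_end=5.0)
print(f"fast: tau = {fast[0]:.4f}, A = {fast[1]:.1f}; slow: tau = {slow[0]:.4f}, A = {slow[1]:.1f}")
```

With three residence times the same procedure is simply repeated once more on the remainder.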
A question that presents itself rather acutely is the following. The problem of determining the residence times in a magnetic trap ostensibly belongs to the domain of classical mechanics, by virtue of the spatial dimensions of the system involved (10–100 cm). The experimental results on the multiplicity of residence times should thus be explainable in terms of the classical Lorentz equation of motion. Yet the only existing approach based on classical dynamics, the stochastic diffusion model, appears so far not to reproduce the experimental results. On the other hand, the Schrödinger-like probability amplitude description, which predicted these multiple residence times in the first place, describes them remarkably well.

We give in Section 8 a quantum mechanical derivation of the set of equations (6.63), in which a relationship is established between the wave amplitude of (6.63) and the quantum mechanical wave amplitude. This provides, interestingly enough, a quantum mechanical interpretation of the multiplicity of residence times. It is indeed quite intriguing at first sight that there should be a quantum mechanical origin for residence times observed in a system with macroscopic dimensions [this and related questions will be taken up for further discussion later]. The question nevertheless remains: since the system is of macroscopic dimensions, is it at all possible to describe the multiplicity of residence times in terms of classical dynamics? In other words, are these residence times contained in classical dynamics at all?
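Settling whether the Lorentz equation contains these residence times means integrating trajectories over many bounce periods. That requires an integrator that does not itself create or destroy energy. A widely used choice is the Boris scheme; the source does not say which integrator was used in Ref. [59].

The sketch below follows single charged particles in a hypothetical parabolic mirror field B_z = B0(1 + z²/Lm²), in normalized units q/m = B0 = 1. A particle counts as lost once it passes the mirror throat at |z| = Lm. With a mirror ratio of 2 the adiabatic loss cone is 45°, so a 30° particle should escape and a 60° particle should stay trapped. The sign of q/m only fixes the sense of gyration. Resolving the slow nonadiabatic leakage discussed in the text would need far longer runs, many particles and a realistic field. All names and parameters here are illustrative.

```python
import math

B0 = 1.0          # field at the midplane (normalized units)
L_MIRROR = 50.0   # midplane-to-throat distance; B = 2*B0 at the throat


def mirror_field(x, y, z):
    """Hypothetical paraxial mirror: Bz = B0 (1 + z^2/Lm^2); Bx, By chosen so div B = 0."""
    bz = B0 * (1.0 + (z / L_MIRROR) ** 2)
    dbz_dz = 2.0 * B0 * z / L_MIRROR**2
    return (-0.5 * x * dbz_dz, -0.5 * y * dbz_dz, bz)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def boris_step(pos, vel, dt, q_over_m=1.0):
    """One Boris push with E = 0: rotate v about B (|v| preserved), then advance x."""
    b = mirror_field(*pos)
    t = tuple(q_over_m * bi * 0.5 * dt for bi in b)
    t_sq = sum(ti * ti for ti in t)
    s = tuple(2.0 * ti / (1.0 + t_sq) for ti in t)
    v_prime = tuple(vi + ci for vi, ci in zip(vel, cross(vel, t)))
    v_new = tuple(vi + ci for vi, ci in zip(vel, cross(v_prime, s)))
    pos_new = tuple(pi + vi * dt for pi, vi in zip(pos, v_new))
    return pos_new, v_new


def follow(pitch_deg, speed=1.0, dt=0.05, t_max=1500.0):
    """Launch at the midplane; return (escape time or None, final velocity)."""
    theta = math.radians(pitch_deg)
    pos = (0.0, 0.0, 0.0)
    vel = (speed * math.sin(theta), 0.0, speed * math.cos(theta))
    for step in range(1, int(t_max / dt) + 1):
        pos, vel = boris_step(pos, vel, dt)
        if abs(pos[2]) > L_MIRROR:
            return step * dt, vel
    return None, vel


t_30, _ = follow(30.0)
t_60, v_60 = follow(60.0)
print("30 deg escapes at t =", t_30)
print("60 deg escape time:", t_60, " |v|^2 drift:", abs(sum(c * c for c in v_60) - 1.0))
```

An ensemble study in the spirit of Ref. [59] would launch many particles with a spread of pitch angles and gyrophases. It would then record the escape times and plot the surviving number on a semi-log scale.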
Suppose that the approximations used to formulate the stochastic diffusion model have suppressed the multiplicity of residence times. The above question may then be answered by following exactly the trajectories of an ensemble of particles, corresponding to the experimental conditions, using the Lorentz equation of motion with an appropriate magnetic field configuration. Such numerical experiments were carried out [59] by starting off 500 particles in a magnetic trap with initial conditions closely corresponding to those of an actual experiment. The number of particles
leaking out of the trap as a function of time was then recorded and plotted on a semi-log plot, to see whether the numerical experiments show one residence time or more. While the numerical experiments did indicate more than one residence time, they could not be considered conclusive. Further numerical experiments, with different magnetic field configurations and a larger number of particles, would be needed before definitive conclusions can be reached.

Another interesting question arises from the derivation, given in Section 6.2, of the set of Schrödinger-like amplitude equations (6.63) from the classical Liouville equation. It may be pointed out that the ensemble used for the Liouville equation was a "coherent system of trajectories" [3]. Since the classical Liouville equation is just another representation of classical dynamics, the set of Schrödinger-like equations may also be considered yet another representation of it, though in Hilbert space (an inverse space in some variables). Looked at purely from the classical mechanical viewpoint, a somewhat deeper question then arises: is nature manifesting itself through the Hilbert space even in classical dynamics, in the observed multiplicity of residence times? This question warrants careful examination.

7. Observations of one-dimensional interference phenomena

The amplitude character of the Schrödinger-like equations (6.63), together with the connection (6.70) to the probability density G(x,t), leads one to predict one-dimensional interference phenomena along the magnetic field for this purely classical mechanical system. This prediction rests entirely on the mathematical structure of Eqs. (6.63) and (6.70), which is analogous to the Schrödinger equation of quantum mechanics. It would generally be regarded as rather unusual for a classical mechanical system.
But astonishing as it may appear, such interference effects have indeed been observed by Varma and Punithavelu [60], as described in the next subsection. A deeper understanding of these interference phenomena will be achieved in Section 8, where we exhibit the relationship between the formalism of Eqs. (6.63) and (6.70) and the Schrödinger equation of quantum mechanics for the problem.

To carry out experiments that check the predictions of the above formalism, one must first specify exactly what to look for and what kind of experiments to do. Consider Eq. (6.63) for the mode n = 1, which is a one-dimensional wave equation along the magnetic field. Electrons propagate along a magnetic field, in general inhomogeneous, from an electron gun to a detector plate placed a distance L away. They are represented by a travelling wave solution

$$\psi^{(1)} = A\,e^{i\int k\,dx} , \qquad k = p/\mu , \tag{7.1}$$
where p = mv is the momentum of the particle. Since the injected value of μ may have a spread Δμ, (7.1) may be integrated over μ around a mean μ̄, with p = [2m(E − μΩ)]^{1/2}. This yields

$$\psi^{(1)} = A\exp\left(i\frac{\Omega x}{\bar v}\right) \tag{7.2}$$

for a homogeneous magnetic field, where v̄ is the beam velocity corresponding to the mean μ̄:

$$\bar v = \left[2(E - \bar\mu\Omega)/m\right]^{1/2} . \tag{7.3}$$
The form (7.2) of the wave function ψ^(1) implies a wavelength of the matter wave

$$\lambda_{\rm eff} = \frac{2\pi\bar v}{\Omega} . \tag{7.4}$$
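For orientation, (7.3) and (7.4) can be evaluated for parameters of the kind quoted below for Fig. 8 (E = 600 eV, B = 170 G). The sketch assumes the whole beam energy is parallel, i.e. μ̄Ω ≈ 0, which is reasonable for pitch angles below 5°. It uses nonrelativistic SI formulas, so Ω = eB/m rather than the Gaussian eB/mc of the text. The function names are illustrative.

```python
import math

E_CHARGE = 1.602176634e-19   # C
M_E = 9.1093837015e-31       # kg


def mean_parallel_speed(E_eV, E_perp_eV=0.0):
    """(7.3): v_bar = [2(E - mu_bar*Omega)/m]^(1/2), with mu_bar*Omega = E_perp."""
    return math.sqrt(2.0 * (E_eV - E_perp_eV) * E_CHARGE / M_E)


def gyrofrequency(B_gauss):
    """Electron gyrofrequency in rad/s; 1 G = 1e-4 T."""
    return E_CHARGE * B_gauss * 1e-4 / M_E


def matter_wavelength(E_eV, B_gauss, E_perp_eV=0.0):
    """(7.4): lambda_eff = 2 pi v_bar / Omega, in metres."""
    return 2.0 * math.pi * mean_parallel_speed(E_eV, E_perp_eV) / gyrofrequency(B_gauss)


lam = matter_wavelength(600.0, 170.0)
print(f"lambda_eff = {100 * lam:.2f} cm")
```

The result, about 3 cm, is the distance an electron advances along the field during one gyration period, that is, the pitch of its helical orbit. This is why λ_eff is a macroscopic length here.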
A direct quantum mechanical derivation of this expression (7.4) for the wavelength of the macroscopic matter wave is given in Section 7.3. A proper wave algorithm for treating the interference phenomena in the experiments, with the wavelength (7.4), is presented in Section 7.2.2. However, a simple prescription for the allowed values of k = Ω/v̄ can be given if we regard the grounded detector as an absorber. In that case the gun–detector system with the grounded anode can be considered a one-dimensional "box" with periodic boundary conditions for the wave function (7.2). This yields the condition [69]

$$L\Omega = 2\pi l v , \qquad l = 1,2,3,\ldots \tag{7.5}$$
for the allowed values of k = Ω/v. If the magnetic field is inhomogeneous, then one obtains the relation

$$L\bar\Omega = 2\pi l\bar v , \qquad l = 1,2,3,\ldots \tag{7.6}$$

where Ω̄ is the average of Ω over the length L of the "box", and v̄ the average of v.

The experiment studies the transmission of a beam of electrons of extremely low intensity (∼ nA–μA) along a magnetic field, to check whether plate current maxima appear when the condition (7.6) is satisfied. The electrons are injected from an electron gun, capable of energies in the range 0–3 keV, at a small angle to an ambient magnetic field in a vacuum chamber. Their transmission along the magnetic field can be studied in three modes. (i) The electron beam energy E and the length L are fixed and the ambient magnetic field is swept over a range of values; the procedure is repeated for different values of L and different energies E. (ii) The magnetic field and the length L are kept fixed and the electron energy is swept over a certain range; the procedure is repeated for different values of the length L and the magnetic field B. (iii) The electron beam energy E, the magnetic field B and the length L are kept fixed, and the potential on the biased grid of the detector is swept from a value |φ| > |φ0|, φ0 = −E/e, to zero; the plate current is recorded as a function of φ.

The different modes of experimentation are essentially meant to study the consequences of relation (7.6) with respect to its dependence on the various parameters: the magnetic field, the electron energy and the distance L. In checking whether interference maxima arise at parameter values connected by relation (7.6), it is well to remember the following. From the point of view of the equation-of-motion (initial value) paradigm of classical dynamics, the electron motion in these experiments is governed by the Lorentz equation of motion. It is therefore desirable to see what one ought to expect from the Lorentz equation of motion in each of the three modes of experimentation.
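In mode (i), with E and L fixed, (7.5) picks out a ladder of magnetic fields at which transmission maxima are expected. In SI units these are B_l = 2πl·m·v/(eL), equally spaced in l; equivalently, the gun–detector distance L then holds exactly l wavelengths λ_eff.

The sketch below assumes a homogeneous field, a purely parallel beam energy and nonrelativistic electrons. The values E = 600 eV and L = 30 cm, and the helper names, are illustrative only.

```python
import math

E_CHARGE = 1.602176634e-19   # C
M_E = 9.1093837015e-31       # kg


def parallel_speed(E_eV):
    """Nonrelativistic speed for a purely parallel kinetic energy E (eV)."""
    return math.sqrt(2.0 * E_eV * E_CHARGE / M_E)


def resonant_fields_gauss(E_eV, L_m, l_max=5):
    """Fields B_l (gauss) satisfying L*Omega = 2*pi*l*v, Eq. (7.5), for l = 1..l_max."""
    v = parallel_speed(E_eV)
    fields = []
    for l in range(1, l_max + 1):
        omega_l = 2.0 * math.pi * l * v / L_m            # required gyrofrequency, rad/s
        fields.append(omega_l * M_E / E_CHARGE * 1e4)    # Omega = eB/m, converted to gauss
    return fields


def wavelengths_in_box(E_eV, B_gauss, L_m):
    """How many matter wavelengths 2*pi*v/Omega fit into the gun-detector distance L."""
    omega = E_CHARGE * B_gauss * 1e-4 / M_E
    return L_m / (2.0 * math.pi * parallel_speed(E_eV) / omega)


B_l = resonant_fields_gauss(600.0, 0.30)
print(["%.1f G" % b for b in B_l])
```

For these parameters the predicted maxima are spaced by about 17.3 G. According to the Lorentz-equation argument that follows in the text, nothing at all should happen as B is swept.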
The experimental results obtained can then be compared and contrasted with this expectation. In the simplest case of a homogeneous magnetic field (in some of the experiments we have in fact employed a homogeneous field), the electrons from an electron gun would propagate along the
magnetic field with a constant parallel velocity, v∥ = const, while they gyrate around the magnetic field line with the Larmor frequency Ω = eB/mc. Now, what would happen if, in accordance with mode (i) above, the magnetic field were swept from a given value to a larger value while the other quantities remain constant? Since the magnetic field configuration is unchanged while the field strength is varied, the pitch angle of injection remains unchanged. The parallel and perpendicular velocities of the particles are therefore unaffected by the variation of the field strength. Consequently one should expect no change in the electron detector current as the magnetic field is swept.

This expectation, based on the Lorentz equation, is grossly at variance with the wave formalism [Eqs. (6.63) and (6.70)], which predicts maxima and minima in the detector current, with the maxima described by (7.6). There is a similar contrast between the expectations from the two points of view in modes (ii) and (iii), as defined above. We shall discuss them in the course of presenting the corresponding experiments. Suffice it to say at present that the experiments provide a crucial test of the validity or otherwise of the wave formalism, and their results would be of great interest because of their exciting implications.

7.1. Transmission characteristics of charged particles along a magnetic field with a retarding potential: existence of discrete energy states

We first describe the experiment corresponding to mode (iii). The electrons from an electron gun with a very low beam current (< 0.1 μA) travel along a magnetic field in a vacuum chamber evacuated to ∼ 5 × 10^−7 torr, with a small initial pitch angle (< 5°). At the other end of the chamber, which is kept at ground potential throughout, is a Faraday cup detector that can be moved along the axis of the chamber to vary its distance from the gun.
The Faraday cup consists of a grounded collector plate with a grid 10 mm in front of it, which can be raised to any desired potential. The magnetic field is produced by a set of 35 equally spaced (8.5 cm) current-carrying coils. Its inhomogeneity is < 0.1% and is thus not relevant to the experiment. (See Ref. [60] for details of the experimental set-up.) The mode (iii) experiment described here is not the simplest of the experiments. It is interesting, however, because of the contrast between the two expectations, and it was the first one to be reported.

The experiment is carried out by first fixing a suitable distance L between the gun and the detector, and choosing an energy E and a magnetic field B. The detector grid is raised to a negative potential φ = −φm, with φm > E/|e|. At this retarding potential all the beam electrons of energy E are stopped, so the detector current is zero. The grid potential is then allowed to drop from −φm to zero, and the detector plate current is recorded as a function of the retarding potential. This arrangement corresponds to a retarding potential analyser (RPA).

We first discuss what one would ordinarily expect for the detector response to such a retarding potential sweep, based on standard charged particle dynamics à la the Lorentz equation of motion. As discussed in Section 7, in the absence of the retarding potential the electrons from the gun move with uniform velocity along the magnetic field, uncoupled from the gyratory motion perpendicular to it. All the electrons emanating from the gun would reach the detector when φ = 0, while none would reach it when φ = −φm. Note that it is the "parallel" velocity of the electrons that is retarded by the retarding potential. The velocity perpendicular to the magnetic field will have an E × B drift
Fig. 7. Retarding potential analyser (RPA) plot for the electron energy E = 800 eV, a weak magnetic field B = 25.9 G and gun–plate distance L = 89 cm. Note that it has the form of a standard RPA plot for a peaked electron energy distribution.
Fig. 8. Plate current (a) and grid current (b) as a function of the retarding potential for the electron energy E = 600 eV, magnetic field B = 170 G, and gun–plate distance L = 30 cm. Note that the character of the RPA plot for B = 170 G differs drastically from that of Fig. 7, taken with a weak magnetic field.
velocity as its part. Thus as the grid potential is swept from φ = −φm to φ = 0, the detector electron current is expected to rise monotonically from zero to a saturation value, as shown in Fig. 7. The derivative of this curve represents the energy distribution of the beam. The RPA curve of Fig. 7 is the plate current profile expected from the standard equation-of-motion paradigm with the Lorentz equation of motion; it is in fact an experimental curve taken with a rather weak magnetic field (∼ 25 gauss).

However, when the same experiment is carried out with a sufficiently strong magnetic field (B ∼ 200 gauss), we typically obtain a curve of the form shown in Fig. 8(a). It exhibits a series of sharply defined peaks and dips, in stark contrast to the monotonic RPA curve of Fig. 7. From the point of view of the standard paradigm this is totally unexpected and astonishing. One must, of course, ensure that the observed profile is not an artifact of the detector system. A
measurement of the biased grid current, shown in Fig. 8(b), exhibits peaks and dips like those of Fig. 8(a), in complete correlation with them. This rules out the possibility that the peaks and dips of the plate current are due to interception of the electron beam by the grid wires. It leads to the conclusion that the peaks and dips of Fig. 8 represent a genuine physical phenomenon. Further discussion ruling out other conceivable origins of the observed behaviour is given in the original reference [60 (a)]. Such behaviour signals the existence of a discrete set of "allowed" (and "forbidden") parallel energy states in the classical mechanical parameter domain, very much like the quantum mechanical conduction bands in solids.

A rather interesting observation was also made when the anode current of the electron gun (flowing to ground) was recorded simultaneously with the plate and grid currents of the detector. The anode current was found to be in complete anti-correlation with the plate and grid currents. This observation, reported in Ref. [60 (b)], helps answer the question of what becomes of the electrons corresponding to the dips in the transmission current. Since the electrons cannot flow across the magnetic field, the total current along the magnetic field must be conserved. The anti-correlated anode current then complements the plate and grid currents with their peaks and dips. The interesting point is that this complementary current is collected at the anode, which is close to the gun and a distance L away from the plate.

We have carried out the above experiment for a set of gun–plate distances L and magnetic field values B. We reproduce here the plate current profiles for the set of L and B values [L = 22 cm, B = 304 G and 248 G; L = 27 cm, B = 231 G] from Ref.
[60 (a)], which have been analysed to check whether the observed peaks (and dips) agree with relation (7.6), which follows from the wave formalism of the new paradigm. Fig. 9(a)–(c) depicts these profiles. The plots of Fig. 9(a)–(c) are in terms of the retarding potential φ, which translates to the energy E = eφ. We therefore transform relation (7.6) in terms of the energy E, which yields

$$E_l = \frac{1}{2}m\left(\frac{L\Omega}{2\pi}\right)^2\frac{1}{l^2} \quad \text{(for maxima)} , \tag{7.7a}$$

$$E_l = \frac{1}{2}m\left(\frac{L\Omega}{2\pi}\right)^2\frac{1}{(l + 1/2)^2} \quad \text{(for minima)} . \tag{7.7b}$$

It will be noted from the plots of Fig. 9(a)–(c) that the plate current "dips" are more sharply defined than the "peaks". We shall therefore check the positions of the dips in the various figures against relation (7.7b). An attempt to fit the various dips to relation (7.7b) shows that they do fit a relation of the form

$$E_l = \frac{1}{2}m\left(\frac{3L\Omega}{2\pi}\right)^2\frac{1}{(l + 1/2)^2} , \tag{7.8}$$

with 3L rather than L as in (7.7b). The origin of this discrepancy in this experiment is not yet entirely clear. We mark a dip arbitrarily by a number N (indicated in the figures by an arrow) and measure the energy locations of the other dips, numbered N + 3, N + 6, .... We tabulate these in Table 6 for the plots of Fig. 9(a)–(c). Using the values of L and B corresponding to these figures, we calculate the values of (l + 1/2) from relation (7.8); these are also presented in the table. We do find whole numbers l for the various plots which differ by 3, corresponding
Fig. 9. Plate current as a function of the retarding potential for the electron energy E = 650 eV and different values of the magnetic field B and gun–plate distance L, as shown in the various plots (a)–(c).

Table 6
The energy values (in eV) of the dips in Fig. 9(a)–(c) and the corresponding quantum numbers l as identified using the relation (7.8) for the three cases

Dips      Energy El (eV)                 l + 1/2
          Plot 9a   Plot 9b   Plot 9c    Plot 9a     Plot 9b     Plot 9c
N (↓)     417       453       438        41 + 0.15   31 + 0.32   36 + 0.56
N + 3     357       377       373        44 + 0.48   34 + 0.36   39 + 0.62
N + 6     313       317       323        47 + 0.45   37 + 0.47   42 + 0.57
N + 9     277       272       283        50 + 0.50   40 + 0.46   45 + 0.47
N + 12    247       237       250        53 + 0.48   43 + 0.35   48 + 0.41
to the choices of the dips N + 3, N + 6, ..., which also differ by 3. The fractions added to the whole numbers are also found to be close to 1/2, except in a couple of cases. The dips are thus found to fit relation (7.8) quite well, leading to the conclusion that the results are generally in
accordance with the expectation of the theory from which a relation of the form (7.8) follows. (There is, of course, the question of the factor 3 relative to relation (7.7b), which remains to be sorted out.) Checks were performed to rule out any artifact or spurious effect vitiating the results and conclusions; see the original reference [60] for a discussion.

The following conclusions may be highlighted. (a) The discrete allowed and forbidden states do exist in the domain of parameters where one uses the classical mechanical equation of motion to determine the motion. (b) The energies of these states are well represented by a relation of the form (7.8), which is obviously nonquantal in nature, since no Planck quantum ℏ appears in it. (c) The allowed energy states El [Eq. (7.7a)] form a hydrogen-like sequence, for which the quantum numbers can be identified as shown in Table 6. (d) The allowed energy values El and the associated quantum numbers l depend in a continuous manner on the length L of the "box". This is a manifestation of wave-like behaviour that is not known to be a characteristic of the standard equation-of-motion paradigm of classical mechanics.

As pointed out earlier, the methodology of the mode (iii) experiment described above turned out not to be the simplest. The experiments were repeated by two groups of experimenters, Unnikrishnan et al. [62] and Ito and Yoshida [63]. Both recorded observations similar to ours, namely maxima and minima in the plate current as a function of the grid potential. However, only the latter authors' current profiles exhibit a depth of modulation (large maxima and minima) similar to ours; the former's do not. Both groups have proposed the generation of a secondary electron beam: the primary beam strikes the negatively biased detector grid, producing secondary electrons, which are then accelerated.
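Relation (7.8) implies that (l + 1/2)·√El = (3LΩ/2π)(m/2)^{1/2} is the same number for every dip of a given plot, whatever the precise values of L and Ω. That internal consistency can be checked directly from the entries of Table 6, without needing the calibration of L and B. The sketch below does only this and makes no claim about the absolute values. The helper names are illustrative.

```python
import math

# Table 6: dip energies El (eV) and the assigned values of l + 1/2, per plot of Fig. 9.
TABLE_6 = {
    "9a": ([417, 357, 313, 277, 247], [41.15, 44.48, 47.45, 50.50, 53.48]),
    "9b": ([453, 377, 317, 272, 237], [31.32, 34.36, 37.47, 40.46, 43.35]),
    "9c": ([438, 373, 323, 283, 250], [36.56, 39.62, 42.57, 45.47, 48.41]),
}


def invariant_spread(energies, l_half):
    """Mean of (l + 1/2)*sqrt(El) and its relative spread (max - min)/mean; (7.8) predicts zero spread."""
    values = [lh * math.sqrt(e) for e, lh in zip(energies, l_half)]
    mean = sum(values) / len(values)
    return mean, (max(values) - min(values)) / mean


def successive_steps(l_half):
    """Differences in l + 1/2 between the dips N, N+3, N+6, ...; ideally exactly 3."""
    return [b - a for a, b in zip(l_half, l_half[1:])]


for plot, (energies, l_half) in TABLE_6.items():
    mean, spread = invariant_spread(energies, l_half)
    steps = ", ".join(f"{d:.2f}" for d in successive_steps(l_half))
    print(f"Plot {plot}: (l+1/2)*sqrt(E) = {mean:.1f} sqrt(eV), spread {100 * spread:.2f}%, steps {steps}")
```

The spread comes out at about 0.1% or less for all three plots. The steps in l + 1/2 stay within about 0.35 of 3. This is the sense in which the dips fit the (l + 1/2)^−2 form of (7.8).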
The secondary electron beam initially travels from the grid towards the gun, accelerated by the potential drop from the gun (anode) to the negatively biased grid. By a fortuitous coincidence, the condition (7.5) for the maxima of the one-dimensional interference is the same as the condition for focusing of an electron beam (with a small angular spread) travelling along a homogeneous, or even slightly inhomogeneous, magnetic field (see for example [67]). Both groups have used this property of focusing and defocusing of the beam to explain the maxima and minima in the plate current.

The geometrical size of the anode hole plays an important role in their explanation. Suppose the diameter of the anode hole is comparable to the Larmor radius. If the secondary beam reaches the anode hole in the defocused state, it is partly intercepted by the anode, while the rest reaches the cathode and is reflected by it. This remainder travels back to the detector, where it registers as a "minimum". If, instead, the beam reaches the anode hole in a focused state, the whole of it reaches the cathode, is reflected, and registers as a maximum on returning to the detector. As the secondary electron beam energy varies with the sweep of the potential on the biased grid, the beam passes through one focus after another at the anode, as determined by the relation LΩ = 2πl v∥, for l = 1, 2, 3, .... According to the mechanism outlined above, these foci would correspond to the maxima of the detector current.
R.K. Varma / Physics Reports 378 (2003) 301 – 434
Ito and Yoshida [63] have proposed, in addition, a mechanism for the “resonant production” of secondary electrons, and thereby an enhanced strength of the secondary beam, whenever $L = 2\pi l v/\Omega$ holds: as the secondary beam strikes the detector grid on its return path, it produces more secondary electrons on each transit, thus leading to an enhancement of the maxima. These authors carried out a numerical simulation of the problem based on this mechanism. They found that while the simulation does produce peaks and dips in the collector current (at the required positions, as determined by the condition $L = 2\pi l v/\Omega$, which they term the “resonance condition”), with a “reasonable” secondary electron yield of 3 it produces only “small” and “narrow” peaks, as against the “drastic” changes observed in the experiment. They suggest that the discrepancy “may be resolved by studying the secondary emission process more carefully”. Thus, while the mechanism advanced by these authors [63] appears reasonable, it seems to fall short of explaining the more drastic changes observed experimentally. It may be pointed out, moreover, that their mechanism depends crucially on the size of the anode hole, which should be comparable to (or somewhat less than) the Larmor radius, so that the anode can partly intercept the secondary electron beam in its defocused state. This mechanism could not work in our experiment, since the diameter of the anode hole was 9 mm, very much greater than the Larmor radius, so that no interception of the secondary electron beam can occur at the anode. [Assuming the initial energy of the secondary electrons to be $\sim 10$ eV, the energy in the motion perpendicular to the magnetic field will be at most 10 eV, since the acceleration takes place only along the magnetic field, which is normal to the biased grid.
The Larmor radius for 10 eV of ‘perpendicular’ energy in a magnetic field of 100 G turns out to be $r_L \approx 1$ mm. This is less than the diameter of the anode hole, 2 mm, even in the experiment of Ref. [63], and in our case [60] it is much less than our anode hole diameter of 9 mm. Moreover, we have used magnetic fields of 200–300 G, for which the Larmor radius is even smaller: 0.5–0.3 mm.] The above discussion shows that we can rule out the mechanism of Ref. [63] as an explanation of the maxima and minima in the plate current, at least in our experiment [60], and we are led to conclude that the maxima (and minima) represent interference effects in one dimension, as described in Section 7. It must be mentioned, however, that while the mechanism advanced by these authors may be inadequate to explain the depth of modulation observed in the experiment, the role played by the secondary electron beam ought to be acknowledged even in our explanation. There are thus two beams operating simultaneously in the experiment: a primary electron beam with a fixed energy from the primary source, the electron gun; and a secondary electron beam emanating from the biased grid, whose energy varies as the potential on the biased grid is swept from a negative value to zero. The experiment is thus equivalent to an experiment in mode (ii) mentioned in Section 7, in which the electron energy is swept while the magnetic field $B$ and the length $L$ are kept fixed. The only difference is that here it is the secondary electron source that is swept in energy, while the primary beam only serves to produce the secondary electrons on the grid. If, on the other hand, we use a mode of experimentation that does not involve the generation of a secondary electron beam, we can eliminate altogether the possibility of the mechanism of Refs. [62,63]. This would be the case for modes (i) and (ii), where no biased grid is used.
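The bracketed Larmor-radius estimate can be checked with $r_L = v_\perp/\Omega$, where $v_\perp = (2E_\perp/m)^{1/2}$ and $\Omega = eB/mc$. This is a minimal Python sketch assuming standard CGS constants and, as in the text, a perpendicular energy of 10 eV; the function name is hypothetical.

```python
import math

E_CHARGE = 4.803e-10    # electron charge, esu (assumed standard value)
M_E = 9.109e-28         # electron mass, g
C = 2.998e10            # speed of light, cm/s
ERG_PER_EV = 1.602e-12  # erg per eV

def larmor_radius_mm(e_perp_ev, b_gauss):
    """r_L = v_perp/Omega, with E_perp = m v_perp^2/2 and Omega = eB/(mc); result in mm."""
    v_perp = math.sqrt(2.0 * e_perp_ev * ERG_PER_EV / M_E)  # cm/s
    omega = E_CHARGE * b_gauss / (M_E * C)                  # rad/s
    return 10.0 * v_perp / omega                            # cm -> mm

for b in (100.0, 200.0, 300.0):
    print(f"B = {b:5.0f} G: r_L = {larmor_radius_mm(10.0, b):.2f} mm")
```

With these values it gives about 1.1 mm at 100 G, and a half and a third of that at 200 and 300 G, consistent with the figures quoted above and well below the 9 mm anode-hole diameter.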
Though secondary electrons will still be produced as the primary beam strikes the grounded grid, they will not form a beam in the absence of a potential drop. The experimental results obtained in mode (i) have been reported earlier [68] and will not be reviewed here. Rather, we shall describe in the next section experimental results obtained in mode (ii), which, as mentioned already, are free from the possibility of an alternative interpretation in terms of the behaviour of the secondary electron beam arising in mode (iii). Moreover, we shall exhibit another new feature of these experimental results which can be understood only in terms of wave phenomena, namely the existence of beats. It is well known that in a wave phenomenon the beat frequency $\omega_B$ is given by the difference between the two beating frequencies, $\omega_B = \omega_1 - \omega_2$, when $\omega_1 \approx \omega_2$, so that $\omega_B \ll \omega_1, \omega_2$. This follows because the intensity of the superposed waves is obtained as the squared magnitude of the sum of their amplitudes. This is an essential and characteristic wave property, and its demonstration for a given phenomenon would establish unambiguously that phenomenon's credentials as a wave phenomenon. In the next section we shall present not only the experimental results exhibiting the discrete energy band structure in the plate current, but also the existence of beats in this band structure, which are shown to have a “frequency” equal to the difference between the two closely spaced “frequencies”. We shall see later what is meant by a “frequency” in the context of this experiment.

7.2. Transmission characteristics of charged particles along a magnetic field with electron energy sweep: observations of discrete energy states and beats in the plate current

We now report experimental results on the transmission characteristics of a stream of charged particles (electrons) from an electron gun to a detector plate as the electron energy is swept by sweeping the cathode voltage. [This is in accordance with mode (ii) of experimentation.] As before, the stream is taken to be of such low intensity ($\sim$ nA) that it can be regarded as consisting of individual particles, without any interparticle collisions or collective effects.
(For electrons of energy $E \sim 1$ keV, a nanoampere current corresponds to a linear electron number density of the order of 10 cm$^{-1}$ and a volume number density $n_e \sim 10^2$ cm$^{-3}$, taking the diameter of the electron stream to be $\sim 2$ mm.) This is a very low number density, which makes interparticle collisions inconsequential and collective effects absent. The experimental chamber consists of a glass cylinder (length 85 cm, diameter 11 cm) evacuated to $\sim 4\times10^{-6}$ torr. The magnetic field is produced by a set of solenoid coils fed by a low-voltage, high-current power supply, and can be varied, if desired, by varying the current in the coils. The electrons are injected almost parallel to the magnetic field (very small pitch angle, $\leq 5^\circ$) from an electron gun placed at one end of the chamber. At the other end is placed a detector, a flat grounded SS (stainless steel) plate, behind a grounded SS grid. For details of the experimental set-up see Ref. [65]. The plate is kept at a fixed distance from the gun anode but, in a significant departure from the earlier experiment, the grid is now movable with the help of a Wilson feedthrough. The plate–gun distance can also be varied if desired.

The experiment is carried out by recording both the plate and the grid currents as a function of the cathode voltage (electron energy). It is repeated after varying the distance between the plate and the grid, keeping the distance between the plate and the gun fixed at 51 cm; the plate–grid distance is changed in steps ranging between 2 and 10 cm. The plate and grid currents flowing to ground are measured by recording the potential drop across a 470 kΩ resistor and deducing the current from it. Figs. 10(a)–(c) show the plate and grid currents as a function of the electron energy for plate–grid distances of 2, 4 and 6 cm, respectively, all for the same magnetic field, 69 G. Taking the plot of Fig. 10(a), for the minimum plate–grid distance of 2 cm, as a reference, we notice a rather striking beat-like modification of the curve that develops progressively as the grid is moved 4 and 6 cm away from the plate. The number of beats within the same sweep of the electron energy, 0–800 eV, increases with increasing separation. This points to an increase in the frequency of the beats, with respect to the electron energy sweep, as the plate–grid separation increases. We shall see later that this dependence is in accordance with the wave algorithm.

Fig. 10. Plate and grid current plots as a function of the cathode voltage (electron energy in eV) for the plate–grid separation D: (a) 2 cm, (b) 4 cm, (c) 6 cm; magnetic field B = 69 G, and gun–plate distance Lp = 51 cm in all cases.

We next consider Fig. 11, which compares the plots obtained for two different magnetic field values, $B = 69$ and 135 G, with the same plate–grid separation of 6 cm and the same plate–gun distance $L_p = 51$ cm (fixed at this value for all the plots). We again notice an increase in the number of beats as the magnetic field is increased from 69 to 135 G. We shall see in Section 7.2.2 that the frequency of the main oscillations, over which the beats ride as a modulation, is determined by the gun–plate distance $L_p$. The presence of the grid provides another distance, $L_g$, between the gun and the grid. The beat frequency is then found to correspond to the difference of the two frequencies characterized by the two distances $L_p$ and $L_g$. As one continues to increase the plate–grid distance and the grid crosses the midway mark between the gun and the plate (at $L_g \approx L_p/2 \approx 26$ cm), the condition appropriate for beats (which requires $L_g$ to be close to $L_p$) no longer holds, and the character of the plots changes entirely. Fig. 12 gives the plots for $L_g = 10$ cm. As expected, there are no longer any beats, but rather a superposition of two frequencies corresponding to the two distances $L_p$ and $L_g$: a higher-frequency variation riding over a low-frequency variation, the former characterized by $L_p$ and the latter by $L_g$.

Fig. 11. Plate and grid current plots as a function of cathode voltage (electron energy in eV) for gun–grid distance Lg = 45 cm: (a) magnetic field B = 69 G, (b) B = 135 G.

Fig. 12. Plate and grid current plot for the gun–grid distance Lg = 10 cm and magnetic field B = 69.2 G.

Finally, one must point out the rather striking, complete anti-correlation between the variations of the plate and grid currents. This is due to the constraint that there can be no transport of electrons across the magnetic field, so that the current along the magnetic field must be conserved. Consequently, any variation of the plate current must be compensated by an equal and opposite variation of the grid current; hence the anti-correlated grid current.

7.2.1. Analysis of the experimental data

Given the macroscopic matter wave function of the form $\psi = \exp(ikx)$, with $k = \Omega/v$, as obtained in Section 7 (and as will be obtained independently in Section 7.3), the positions of the interference maxima in the energy domain are expected to follow a relation of the form (7.5), that is, $L = 2\pi l v/\Omega$ ($l = 1, 2, 3, \ldots$), with the appropriate distance $L$. The maxima of Fig. 10(a), where $D \equiv (L_p - L_g) = 2$ cm $\ll L_p$ (51 cm), should correspond to the distance $L_p = 51$ cm and be described by the relation
$$L_p = \frac{2\pi l v}{\Omega}, \qquad l = 1, 2, 3, \ldots \tag{7.9}$$
There are no “beats” in Fig. 10(a) over the range of the energy sweep, 0–800 eV. On the other hand, Fig. 11(b), which corresponds to $B = 135$ G, $D = 6$ cm and $L_p = 51$ cm, shows “beats” in addition to the current oscillations with respect to the energy $E$ that are characteristic of the distance $L_p = 51$ cm and magnetic field $B = 135$ G, with maxima described by (7.9). If these “beats” are a consequence of the wave property, then their maxima should be describable by a relation of the same form as (7.9), that is,
$$D = \frac{2\pi l v}{\Omega}, \qquad l = 1, 2, 3, \ldots, \tag{7.10}$$
where $D = L_p - L_g$; this relation would then correspond to the beat frequency. We shall first analyse the experimental data in Figs. 10(a) and 11(b) to check whether the current peaks in them do correspond to relations (7.9) and (7.10), respectively. Later, in Section 7.2.2, we shall present an algorithm for this particular experiment, based on the macroscopic wave function $\psi = \exp(i\Omega x/v)$, which will show that the beats are indeed described by relation (7.10) and the basic, faster oscillations by relation (7.9).

Now, to check whether the maxima of Fig. 10(a) are described by relation (7.9), we present in the first column of Table 7 the energy values corresponding to the maxima of the plate current, and in the next column the values of the quantity $\Omega/2\pi v$ at these energies for the magnetic field $B = 69$ G used here. From relation (7.9), $\Omega/2\pi v = l/L_p$; this requires that $\Omega/2\pi v$ be an integral multiple of a common factor, which must be $1/L_p$. In the next column the closest such integers are identified. Using these, we calculate in each case the value of $L_p$ required by the above relation. These values are presented in the last column of Table 7; their average is 50.8 cm. Since this value of $L$ is deduced from the experimental data using relation (7.9), it is designated $\bar L_{\rm ded} = 50.8$ cm. It is to be compared with the value $L_p = 51$ cm fixed in the experiment. The excellent agreement between $L_p = 51$ cm and $\bar L_{\rm ded} = 50.8$ cm shows that the peaks of the plot of Fig. 10(a) are indeed well described by (7.9).

We similarly check whether relation (7.10) describes the positions of the beat maxima in Fig. 11(b). As in the case of Fig. 10(a), we give in Table 8 the energies corresponding to the beat maxima in the first column, the quantity $\Omega/2\pi v$ in the second column, the closest $l$ value in the next column, and the values of $D$ so deduced in the final column; their average is $\bar D = 5.9$ cm. This is in excellent agreement with the distance $D \equiv (L_p - L_g) = 6$ cm actually used in the experiment. It shows that the “beat” frequency deduced from the plot of Fig. 11(b) does correspond to the frequency characterized by the difference $(L_p - L_g)$, that is, the difference of the two “frequencies” $L_p$ and $L_g$ present in the system.

Table 7
Energy peak positions $E_l$, identified “quantum number” $l$, and plate–gun distance $L_{\rm ded}$ deduced from the relation $L_{\rm ded} = 2\pi l v/\Omega$, for the curve of Fig. 10(a). $B$ is the ambient magnetic field, $\Omega = eB/mc$ the gyrofrequency and $v$ the electron beam velocity.

Peak No. | E_l (eV) | Ω/2πv (cm⁻¹) | l | L_ded = l(2πv/Ω) (cm)
1 | 246.7 | 0.1975 | 10 | 50.6
2 | 206.7 | 0.2158 | 11 | 51.0
3 | 173.3 | 0.2356 | 12 | 50.9
4 | 146.7 | 0.2561 | 13 | 50.8
5 | 126.7 | 0.2756 | 14 | 50.8
6 | 110 | 0.2954 | 15 | 50.8
7 | 96.7 | 0.3153 | 16 | 50.7
8 | 85.6 | 0.3348 | 17 | 50.8
9 | 76.6 | 0.3534 | 18 | 50.9
10 | 69.0 | 0.372 | 19 | 51.1

Magnetic field B = 69 G, Lp = 51 cm; average L̄_ded = 50.8 cm.

Table 8
Energy peak positions $E_l$ of the beat maxima; $l$, the “quantum number” identified for each beat in Fig. 11(b); $D$, the grid–plate distance required by the relation $D = 2\pi l v/\Omega$. $B$ is the ambient magnetic field, $\Omega = eB/mc$ the gyrofrequency, $v$ the electron beam velocity, and $\bar D$ the average of the $D$ values.

Beat No. | E_l (eV) | Ω/2πv (cm⁻¹) | l | D = l(2πv/Ω) (cm)
1 | 55.0 | 0.820 | 5 | 6.1
2 | 83.3 | 0.6682 | 4 | 6.0
3 | 141.7 | 0.5103 | 3 | 5.83
4 | 283.0 | 0.358 | 2 | 5.58

Magnetic field B = 135 G, D = 6 cm; average D̄ = 5.9 cm.

Finally, we present an analysis of Fig. 12, which corresponds to widely different values of the distances $L_p$ and $L_g$, and therefore of the corresponding frequencies: $L_p = 51$ cm and $L_g = 10$ cm. As expected, there are now no beats, but merely a simple superposition of two frequencies, the higher-frequency oscillation corresponding to $L_p = 51$ cm riding over the low-frequency variation corresponding to $L_g = 10$ cm. We analyse here only the low frequency, to see what value of $L$ the maxima of the oscillation yield if they are to fit a relation of the form $L = 2\pi l v/\Omega$. Following the same procedure as before, we tabulate in Table 9 the various quantities indicated there, calculating the value of $L$ in each case (last column). The average of these calculated values, $\bar L = 10.1$ cm, shows that it corresponds to the gun–grid distance $L_g = 10$ cm chosen for this particular run (Fig. 12). We therefore conclude that the low frequency in this limit ($L_g \ll L_p$) corresponds to the gun–grid distance $L_g$ (Table 9).
Table 9
Energy peak positions $E_l$ of the slowly varying part of Fig. 12; $l$, the quantum number identified for the peaks; $L_g$, the anode–grid distance required by the relation $L_g = 2\pi l v/\Omega$; $\Omega = eB/mc$, the gyrofrequency; $B$, the ambient magnetic field; $\bar L_g$, the average value of $L_g$.

E_l (eV) | Ω/2πv (cm⁻¹) | l | L = l(2πv/Ω) (cm)
38.33 | 0.50 | 5 | 10
60.0 | 0.40 | 4 | 10
105.0 | 0.302 | 3 | 9.9
256.7 | 0.19 | 2 | 10.5

Magnetic field B = 69.2 G, Lg = 10 cm; average L̄g = 10.1 cm.
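The order-identification step behind Tables 7–9 can be sketched in Python as follows. As an assumption of this sketch, the order $l$ of each peak is taken as the nearest integer to $(\Omega/2\pi v)/s$, where $s$ is the mean spacing between consecutive tabulated values; the text only says that the closest integers are identified, so this is one possible way of doing it. The inputs are the $\Omega/2\pi v$ columns transcribed from the tables, and the function names are hypothetical.

```python
def identify_orders(k_values):
    """Assign integer orders l to tabulated Omega/2*pi*v values (cm^-1).

    Assumes consecutive peaks differ by one order, so the mean spacing of the
    values estimates 1/L; each l is the nearest integer to k/spacing.
    """
    spacing = abs(k_values[-1] - k_values[0]) / (len(k_values) - 1)
    return [round(k / spacing) for k in k_values]

def deduced_lengths(k_values):
    """Return (orders, per-peak L = l/k in cm, mean L), from L = 2*pi*l*v/Omega."""
    orders = identify_orders(k_values)
    lengths = [l / k for l, k in zip(orders, k_values)]
    return orders, lengths, sum(lengths) / len(lengths)

# Omega/2*pi*v columns transcribed from Tables 7, 8 and 9
TABLE_7 = [0.1975, 0.2158, 0.2356, 0.2561, 0.2756, 0.2954, 0.3153, 0.3348, 0.3534, 0.372]
TABLE_8 = [0.820, 0.6682, 0.5103, 0.358]
TABLE_9 = [0.50, 0.40, 0.302, 0.19]

for name, ks in (("Table 7", TABLE_7), ("Table 8", TABLE_8), ("Table 9", TABLE_9)):
    orders, lengths, mean = deduced_lengths(ks)
    print(name, orders, [f"{x:.2f}" for x in lengths], f"mean = {mean:.1f} cm")
```

Applied to the three columns, this recovers the orders listed in the tables and gives averages close to the tabulated 50.8, 5.9 and 10.1 cm.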
Fig. 13. Schematic of the experimental arrangement indicating the various relevant distances, Lp, Lg, etc.
7.2.2. The wave algorithm for the present experiment

We now present the wave algorithm that follows from the Schrödinger-like formalism presented in Section 6.2. We shall apply this formalism to the above experiment and show how the experimental results can be understood in terms of it. We recall from Section 7 that the wave function for the progressive macroscopic matter wave associated with electron motion along a magnetic field is given, for the mode number $n = 1$, by $\psi^{(1)} = \exp(ikx)$, where $k = \Omega/v$ and $v$ is the electron velocity parallel to the magnetic field (the subscript “parallel” is dropped). Other waves, corresponding to mode numbers $n = 2, 3, \ldots$, may also be present, for which the wave function is $\psi^{(n)} = \exp(inkx)$, but the mode $n = 1$ is the dominant one. We shall see later that the experimental curves do imply the existence of the other modes, through the presence of higher harmonics in their periodic variation; for the present, however, we shall consider only the $n = 1$ mode. It may be mentioned that there exists a direct quantum mechanical justification of the form of the wave function $\psi^{(n)} = \exp(inkx)$, with $k = \Omega/v$, independent of the formalism of Ref. [58]; this is presented in Section 7.3. Consider now the experimental arrangement shown schematically in Fig. 13. Electrons from an electron source S are injected with a velocity almost parallel to the magnetic field. P and G denote the plate and grid, respectively (both grounded), at distances $L_p$ and $L_g$ from the source. Let $x$ be the field point within the plate, just behind the plate surface, where the “detection” is assumed to occur.
The total wave amplitude at $x$ is the sum of three contributions: (i) one corresponding to the particles arriving directly from the source S, $\gamma\,e^{ikx}$; (ii) one corresponding to the particles arriving after being scattered off the grid, which thus acts as a secondary source of the electron wave for them, with amplitude $\alpha\,e^{ik(x-L_g)}$; and (iii) one from particles arriving after being scattered off the plate surface, with amplitude $\beta\,e^{ik(x-L_p)}$. Thus the wave amplitude at $x$ (a point just behind the plate surface) is

$$\psi_p(x) = \alpha\,e^{ik(x-L_g)} + \beta\,e^{ik(x-L_p)} + \gamma\,e^{ikx}, \tag{7.11}$$
where $\alpha$ is the coefficient of the forward scattering amplitude at the grid, $\beta$ that of the forward scattering amplitude at the plate surface, and $\gamma$ the amplitude of the direct, unscattered wave arriving at the point $x$. If $\psi_g$ is the amplitude for the absorption of the wave at the grid, then conservation of the total probability current in this one-dimensional case gives

$$|\psi_g|^2 + |\psi_p|^2 = 1, \tag{7.12}$$

where $|\psi_p|^2$ is proportional to the probability current recorded by the plate, which has clearly been transmitted past the grid. Since the transmitted current must have a significant forward-scattered component, we write approximately $\alpha = \alpha_0|\psi_p|^2$. We then have

$$\psi_p = \alpha_0|\psi_p|^2\,e^{ik(x-L_g)} + \beta\,e^{ik(x-L_p)} + \gamma\,e^{ikx}, \tag{7.13}$$
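whence, taking the magnitude squared (with real coefficients $\alpha_0$, $\beta$, $\gamma$), we get

$$|\psi_p|^2 = \left[1 - 2\alpha_0\gamma\cos kL_g - 2\alpha_0\beta\cos k(L_p-L_g)\right]^{-1}\left[\gamma^2 + \beta^2 + 2\beta\gamma\cos kL_p\right], \tag{7.14}$$

where we have neglected $\alpha_0^2|\psi_p|^4$ as being small compared with the rest of the terms. Expanding the denominator in expression (7.14) to first order in $\alpha_0$, we obtain

$$
\begin{aligned}
|\psi_p|^2 \approx{}& (\beta^2+\gamma^2) + 2\beta\gamma\cos kL_p + 2\alpha_0(\gamma^2+\beta^2)\left[\gamma\cos kL_g + \beta\cos k(L_p-L_g)\right] \\
&+ 4\alpha_0\beta\gamma\cos kL_p\left[\gamma\cos kL_g + \beta\cos k(L_p-L_g)\right] \\
={}& \beta^2+\gamma^2 + 2\beta\gamma\cos kL_p + 2\alpha_0\gamma(\beta^2+\gamma^2)\cos kL_g + 2\alpha_0\beta(\beta^2+2\gamma^2)\cos k(L_p-L_g) \\
&+ 2\alpha_0\beta\gamma^2\cos k(L_p+L_g) + 4\alpha_0\beta^2\gamma\cos k(L_p-L_g)\cos kL_p .
\end{aligned}
\tag{7.15}
$$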
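Because (7.15) is just (7.14) expanded to first order in $\alpha_0$ and rearranged with product-to-sum identities, the difference between the two should shrink as $\alpha_0^2$. The Python sketch below checks this numerically. The coefficient values $\beta = 0.3$, $\gamma = 0.8$, the distances $L_p = 51$ cm, $L_g = 45$ cm and the $k$ range are illustrative assumptions, not fitted values, and the function names are hypothetical.

```python
import math

def intensity_714(k, lp, lg, a0, beta, gamma):
    """|psi_p|^2 from Eq. (7.14): the numerator divided by the bracketed denominator."""
    num = gamma**2 + beta**2 + 2 * beta * gamma * math.cos(k * lp)
    den = 1 - 2 * a0 * gamma * math.cos(k * lg) - 2 * a0 * beta * math.cos(k * (lp - lg))
    return num / den

def intensity_715(k, lp, lg, a0, beta, gamma):
    """Second line of Eq. (7.15): the first-order expansion in alpha_0, term by term."""
    c = math.cos
    return (beta**2 + gamma**2
            + 2 * beta * gamma * c(k * lp)
            + 2 * a0 * gamma * (beta**2 + gamma**2) * c(k * lg)
            + 2 * a0 * beta * (beta**2 + 2 * gamma**2) * c(k * (lp - lg))
            + 2 * a0 * beta * gamma**2 * c(k * (lp + lg))
            + 4 * a0 * beta**2 * gamma * c(k * (lp - lg)) * c(k * lp))

def max_deviation(a0, lp=51.0, lg=45.0, beta=0.3, gamma=0.8, n=3000):
    """Largest |(7.14) - (7.15)| over k in [1, 2.5] rad/cm (illustrative range)."""
    ks = [1.0 + 1.5 * i / n for i in range(n + 1)]
    return max(abs(intensity_714(k, lp, lg, a0, beta, gamma)
                   - intensity_715(k, lp, lg, a0, beta, gamma)) for k in ks)

for a0 in (1e-2, 1e-3):
    print(f"alpha_0 = {a0:g}: max |(7.14) - (7.15)| = {max_deviation(a0):.2e}")
```

Reducing $\alpha_0$ tenfold reduces the discrepancy roughly a hundredfold, as expected for a correct first-order expansion.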
There are various kinds of terms. The term $2\beta\gamma\cos kL_p$ arises only through the coefficients $\beta$ and $\gamma$, which represent scattering off the plate surface ($\sim\beta$) and the unscattered wave amplitude ($\sim\gamma$). All the other terms involve $\alpha_0$, the coefficient of the wave amplitude scattered off the grid. We consider three cases: (a) $L_p \approx L_g$, with $(L_p - L_g) = \epsilon L_p$, $\epsilon \ll 1$; (b) $L_p > L_g$, with $\epsilon \leq 0.2$ (say); and (c) $L_g \ll L_p$.

Case (a): $L_g \approx L_p$, $L_p - L_g = \epsilon L_p$. If we consider the limiting case in which the grid is very close to the plate, so that $\epsilon \to 0$, then $\cos k(L_p - L_g) \to 1$, $\cos kL_g \to \cos kL_p$ and $\cos k(L_p + L_g) \to \cos 2kL_p$, and (7.15) gives

$$|\psi_p|^2 = \beta^2 + \gamma^2 + 2\alpha_0\beta(\beta^2 + 2\gamma^2) + \left[2\beta\gamma + 2\alpha_0\gamma(3\beta^2 + \gamma^2)\right]\cos kL_p + 2\alpha_0\beta\gamma^2\cos 2kL_p . \tag{7.16}$$
This gives a variation of $|\psi_p|^2$ with $k = \Omega/v$ whose “frequency” is determined essentially by the gun–plate distance $L_p$, with the peaks given by $L_p\Omega/2\pi v = l$, which is essentially relation (7.9). It was shown in the previous subsection that the plot of Fig. 10(a) does correspond to this relation, with $L_p = 51$ cm, $B = 69$ G and the values of $l$ identified in Table 7. This shows that the peaks in the plot of Fig. 10(a) are consistent with the wave algorithm with the wavelength $\lambda = 2\pi v/\Omega$. For $B = 69$ G and an energy $E = 200$ eV (say), $\lambda \approx 4.6$ cm. Thus electrons of energy 200 eV in a magnetic field of 69 G behave like an effective de Broglie-like wave of wavelength $\lambda \approx 4.6$ cm, which is of rather macroscopic dimensions and is independent of the Planck quantum. It may further be noticed that the grid and plate currents are found to be anti-correlated in all cases. This, as remarked already, is a reflection of total current conservation along the magnetic field, as expressed by relation (7.12). Any maxima and minima that the plate current may exhibit as a consequence of the interference effects must be compensated for by a complementary grid current, which we find to be the case in all the plots of Figs. 10(a)–(c) and 11(a) and (b). In our earlier experiment [69], such a complementary current appeared on the anode, since the grid and plate currents there, because of the particular nature of that experiment, were positively correlated.

Case (b): $L_p > L_g$, $\epsilon = (1 - L_g/L_p) \leq 0.2$ (say). This is the case appropriate for beats. We may write $\cos kL_g = 2\cos kL_p\cos k(L_p - L_g) - \cos k(2L_p - L_g)$. Then, using $\epsilon \ll 1$, so that $\cos k(2L_p - L_g) \approx \cos kL_p$ and $\cos k(L_p + L_g) \approx \cos 2kL_p$, expression (7.15) gives in this case

$$
\begin{aligned}
|\psi_p|^2 ={}& (\beta^2+\gamma^2) + \left[2\beta\gamma - 2\alpha_0\gamma(\beta^2+\gamma^2)\right]\cos kL_p + 2\alpha_0\beta\gamma^2\cos 2kL_p \\
&+ 2\alpha_0\beta(\beta^2+2\gamma^2)\cos k(L_p-L_g) + 4\alpha_0\gamma(2\beta^2+\gamma^2)\cos k(L_p-L_g)\cos kL_p ,
\end{aligned}
\tag{7.17}
$$
where the last term represents the modulation of the oscillating term $\cos kL_p$ with the beat frequency $(L_p - L_g)$. The analysis of the plot in Fig. 11(b) presented in Table 8 shows that the beat frequency is indeed characterized by the difference $(L_p - L_g)$, precisely as required by expression (7.17). Since this expression for the probability current at the plate is derived assuming the wave algorithm with wave number $k = \Omega/v$, the intensity being obtained as $|\psi_p|^2$ (the squared modulus of the wave amplitude at the plate), it follows that the observed beat structure does conform to a wave behaviour with wavelength $\lambda = 2\pi v/\Omega$. We also note that the fast variation $\cos kL_p$, characterized by the length $L_p$ and modulated by $\cos k(L_p - L_g)$, likewise agrees with the observations. It may be specifically emphasized that a beat frequency equal to the difference between the two prevailing frequencies in the system is a specific consequence of a wave formalism, in which the intensity or probability current is obtained as the squared magnitude of the sum of the amplitudes of the two interfering waves with the two frequencies. In fact, it will be shown in the next section how this comes about in the present situation. On the other hand, the sum of two oscillating particle sources with closely spaced frequencies $\omega_1$ and $\omega_2$ will also produce beats, but with only half the difference frequency, $\omega_b = \tfrac{1}{2}(\omega_1 - \omega_2)$, rather than $(\omega_1 - \omega_2)$ as with waves. The observation of beats with the right frequency therefore constitutes a crucial test of the wave picture. There is now little room for understanding these results in terms of the classical particle picture, as was suggested [62,63] for the earlier results. The confirmation of the existence of beats with the right frequency thus leads to the important conclusion that the experimental results are indeed a manifestation of matter wave phenomena in the macrodomain of a few centimetres.

We note from this that, in terms of the variation with the wave number $k$, the lengths $L_p$ and $(L_p - L_g)$ act as “frequencies”. So if Figs. 10(a) and 11(b) are replotted as a function of $E^{-1/2}$, which is proportional to $k$ [$k = \Omega/v = \Omega(2E/m)^{-1/2}$], rather than as a function of $E$, then the various maxima, including the beat maxima, should be equally spaced, with the interpeak interval inversely proportional to $L_p$ for the main interference maxima and minima, and to $(L_p - L_g)$ for the beats. We have done just that for the plot of Fig. 11(b): the replotted curve, obtained by digitizing the plot of Fig. 11(b) manually and converting the data points to $E^{-1/2}$, is shown in Fig. 14. As expected, we do find the maxima, including those of the beats, to be equidistant.

Fig. 14. Plate current of Fig. 11(b) replotted as a function of E^(−1/2), E being the electron energy in electron volts.

Case (c): $L_g \ll L_p$. We next consider the case in which the gun–grid distance $L_g$ is much less than $L_p$. In this case we obtain

$$
\begin{aligned}
|\psi_p|^2 ={}& \beta^2+\gamma^2 + 2\beta\gamma\cos kL_p + 2\alpha_0(\gamma^2+\beta^2)\left[\gamma\cos kL_g + \beta\cos k(L_p-L_g)\right] \\
&+ 4\alpha_0\beta\gamma\cos kL_p\left[\gamma\cos kL_g + \beta\cos k(L_p-L_g)\right] \\
\approx{}& \beta^2+\gamma^2 + 2\alpha_0\beta^2\gamma(1+\cos 2kL_p) + 2\alpha_0(\gamma^2+\beta^2)\gamma\cos kL_g \\
&+ \left[2\beta\gamma + 2\alpha_0\beta(3\gamma^2+\beta^2)\right]\cos kL_p .
\end{aligned}
\tag{7.18}
$$
In this limit ($L_g \ll L_p$), $|\psi_p|^2$ is, apart from constant terms, a sum of three terms varying as $\cos 2kL_p$, $\cos kL_g$ and $\cos kL_p$. The last two together yield a fast variation $\cos kL_p$ riding over a slow variation $\cos kL_g$, precisely the kind of variation exhibited by the plot of Fig. 12. The term varying as $\cos 2kL_p$ represents merely a second harmonic of $\cos kL_p$, which may well be present in a variation whose periodicity is characterized by the distance $L_p$.
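The equal spacing in $E^{-1/2}$ can be illustrated with numbers already given in this section. Fig. 14 itself was produced from digitized data for Fig. 11(b), which are not tabulated here, so the Python sketch below instead uses the peak energies of Fig. 10(a) listed in Table 7, for which the same argument applies with $L_p$ setting the interval. It compares the spread of successive spacings in $E$ and in $E^{-1/2}$; the spread measure and the function names are illustrative choices, not part of the original analysis.

```python
# Peak energies (eV) of Fig. 10(a), transcribed from Table 7
PEAKS_EV = [246.7, 206.7, 173.3, 146.7, 126.7, 110.0, 96.7, 85.6, 76.6, 69.0]

def inverse_sqrt_spacings(energies_ev):
    """Successive differences of E^(-1/2); k = Omega*(2E/m)^(-1/2) is proportional to E^(-1/2)."""
    x = [e ** -0.5 for e in energies_ev]
    return [b - a for a, b in zip(x, x[1:])]

def energy_spacings(energies_ev):
    """Successive differences of the peak energies themselves, for comparison."""
    return [a - b for a, b in zip(energies_ev, energies_ev[1:])]

def relative_spread(values):
    """(max - min) / mean: a crude measure of how uniform a set of spacings is."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean

print(f"spread of E spacings      : {relative_spread(energy_spacings(PEAKS_EV)):.2f}")
print(f"spread of E^-1/2 spacings : {relative_spread(inverse_sqrt_spacings(PEAKS_EV)):.2f}")
```

For the Table 7 energies, the spacings in $E$ vary by more than a factor of five, whereas those in $E^{-1/2}$ agree to within roughly ten per cent.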
Again, the plot of Fig. 12 is well represented by an expression of the form (7.18), which is obtained from the wave algorithm based on the formalism of Refs. [1,2] in the limit $L_g \ll L_p$. Thus, taking all the cases (a)–(c) together, the wave algorithm given here describes the plots of Figs. 10(a), 11(b) and 12. It must also be mentioned that the authors [65] have not been able to find any other explanation for these plots in terms of the equation-of-motion/initial-value paradigm (referred to as the “standard paradigm”). We refer to the discussion in Ref. [60] for ruling out any possible plasma-physical explanation, essentially because of the very low beam current ($\sim$ nA) and the high vacuum ($\sim 5\times10^{-7}$ torr) employed.

7.2.3. Discussion

We have presented here experimental observations of the discrete energy band structure and, especially, of “beats” as conclusive evidence for the existence of matter wave phenomena in the macrodomain for electrons moving along a magnetic field. The “beat frequency” agrees entirely with the expectation of the wave formalism, being equal to the difference of the closely spaced frequencies of the two interfering waves. The “frequencies” here correspond to the gun–plate and gun–grid distances, $L_p$ and $L_g$ respectively, and the “beats” have been found experimentally to have the frequency corresponding to the difference $(L_p - L_g)$. The earlier experiment [60] had also exhibited discrete energy band structures, which were shown there to be a manifestation of matter wave phenomena; in that experiment the frequency of variation of the plate current was characterized by just one distance, the gun–plate distance. However, as mentioned in Section 7.1, the observations of Ref. [60] do leave some room for an explanation [62,63] in terms of classical charged-particle trajectories, even though the authors of [63] have themselves noted that their proposed mechanism is not entirely adequate to explain the depth of the observed modulation. The importance of the observation of beats with the right frequency (equal to the difference of the frequencies of the two interfering waves, as required by a wave formalism) lies in the fact that such beats are a definite indicator that the wave formalism governs the dynamics of the electrons. Such beats cannot be explained in terms of the particle picture. We thus conclude that, taken together, the results obtained earlier [60] on the existence of discrete energy band structures in the transmission of electrons along a magnetic field, and those reported now [65] on the existence of beats modulating this band structure, constitute convincing evidence for the existence of a probability wave in the macrodomain associated with the motion of electrons along a magnetic field. This probability matter wave has a wave function of the form $\psi^{(1)} = \exp(2\pi i x/\lambda)$, with $\lambda = 2\pi v/\Omega$, $v$ being the electron velocity along the magnetic field. As shown in Section 7.2.2, the discrete energy band structure as well as the “beats” are a consequence of one-dimensional interference effects with a wave function of this form, with a wavelength typically $\lambda \sim 5$ cm, which is clearly in the macrodomain. These are extraordinary results by any account, because matter waves with such macroscopic wavelengths ($\sim 5$ cm) have not been conceived of or observed before, even in the limited context of charged-particle dynamics along a magnetic field. In the next section we give a direct quantum mechanical derivation of the macroscopic wave function $\psi^{(n)} = \exp(2\pi i n x/\lambda)$.
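The origin of the difference “frequency” can be seen by expanding the squared modulus of the three-wave amplitude (7.11) with a constant grid coefficient $\alpha$ (that is, before the replacement $\alpha = \alpha_0|\psi_p|^2$): the common factor $e^{ikx}$ drops out, and the cross term between the grid-scattered and plate-scattered waves oscillates as $\cos k(L_p - L_g)$. A minimal numerical check in Python, with real coefficients chosen only for illustration and hypothetical function names:

```python
import cmath
import math

def intensity_from_amplitudes(k, lp, lg, alpha, beta, gamma):
    """|alpha e^{ik(x-Lg)} + beta e^{ik(x-Lp)} + gamma e^{ikx}|^2; the common e^{ikx} is dropped."""
    amp = alpha * cmath.exp(-1j * k * lg) + beta * cmath.exp(-1j * k * lp) + gamma
    return abs(amp) ** 2

def intensity_cross_terms(k, lp, lg, alpha, beta, gamma):
    """The same intensity written as constant + cross terms (real coefficients assumed)."""
    return (alpha**2 + beta**2 + gamma**2
            + 2 * beta * gamma * math.cos(k * lp)          # plate-scattered x direct
            + 2 * alpha * gamma * math.cos(k * lg)         # grid-scattered x direct
            + 2 * alpha * beta * math.cos(k * (lp - lg)))  # grid x plate: the difference "frequency"

# Illustrative values only: Lp = 51 cm, Lg = 45 cm, k in rad/cm
ks = [1.0 + 0.001 * i for i in range(1501)]
worst = max(abs(intensity_from_amplitudes(k, 51.0, 45.0, 0.1, 0.3, 0.8)
                - intensity_cross_terms(k, 51.0, 45.0, 0.1, 0.3, 0.8)) for k in ks)
print(f"largest discrepancy: {worst:.1e}")
```

The two forms agree to rounding error, which makes explicit that the $(L_p - L_g)$ term exists only because the intensity is formed from the squared magnitude of summed amplitudes.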
Needless to say, these results are clearly contrary to the expectations of the classical Lorentz equation of motion, which governs the dynamics of charged particles in the classical macrodomain. The question naturally arises as to what the relationship is between the dynamics determined by the Lorentz equation and that governed by Eqs. (6.63) and (6.65), which predicted these effects.

7.3. A quantum mechanical justification of the non-Planckian macroscopic matter wave behaviour of electrons along a magnetic field

We demonstrate in this section how the form of the wave function $\psi^{(1)}$ of Eq. (7.2), or more generally the form $\psi^{(n)} = A\exp(inkx)$ with $k = \omega/v$, follows directly from the quantum mechanics of charged particles in a magnetic field in the correspondence limit. A charged particle in a magnetic field in the classical mechanical domain corresponds in quantum mechanics to a particle in a Landau level with a large quantum number. If $E_\nu$ represents the energy of a Landau level, so that

$$E_\nu = \left(\nu + \tfrac{1}{2}\right)\hbar\omega\,, \qquad (7.19)$$
where $\omega = eB/mc$ is the gyrofrequency in the magnetic field $B$, then $\nu \gg 1$ corresponds to the classical limit, and $\mu = \nu\hbar$ defines the gyroaction. Let $P_\nu$ represent the Landau eigenfunctions, which are essentially the harmonic oscillator wave functions. Consider now the propagation of an electron beam along a magnetic field in such a set of Landau levels with $\nu \gg 1$. Let there be a scatterer in the path of the electron beam, such as a small obstacle like the wires of a grid. The anode of the electron gun, through which the electron beam passes in the process of acceleration, may also act as a scatterer. The scattering, assumed to be elastic, may kick the electron from the Landau level $\nu$ to $\nu \pm n$, where $n \geq 1$. If $\tilde H$ is the perturbation Hamiltonian which describes the scattering, then the transition amplitude for this process is

$$\alpha_n \equiv \langle \nu - n|\tilde H|\nu\rangle = \int d\xi\, P^*_{\nu-n}\,\tilde H\,P_\nu\,, \qquad (7.20)$$

$\xi$ being the coordinate normal to the magnetic field. If $\varphi_\nu$ represents the complete wave function of the particle in a magnetic field, including the plane wave along the (assumed homogeneous) magnetic field, we have

$$\varphi_\nu = P_\nu(\xi)\,e^{i\kappa_\nu x}\,, \qquad (7.21)$$

where

$$\kappa_\nu = \frac{1}{\hbar}\left[2m(E - \nu\hbar\omega)\right]^{1/2}, \qquad (7.22)$$

$x$ is the coordinate along the magnetic field, and $E$ is the total energy of the particle. The transition amplitude including the eigenfunction along the field is given by

$$A^{(n)} = \int d\xi\, \varphi^*_{\nu-n}(\xi)\,\tilde H\,\varphi_\nu = \alpha_n\,e^{i(\kappa_\nu - \kappa_{\nu-n})x}\,. \qquad (7.23)$$
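The first-order expansion of $\kappa_{\nu-n}$ about $\kappa_\nu$ used in Eq. (7.24) can be checked numerically. The sketch below compares the exact difference $|\kappa_\nu - \kappa_{\nu-n}|$ obtained from Eq. (7.22) with the first-order value $n\omega/v$. It assumes units $\hbar = m = \omega = 1$ (so that $v = \hbar\kappa_\nu/m$ equals $\kappa_\nu$ numerically) and hypothetical values $E = 10^6$, $\nu = 5\times10^5$; it is an illustration, not part of the original derivation.

```python
import math

# Units assumed for illustration: hbar = m = omega = 1, so the Landau spacing
# is 1 and v = hbar*kappa/m equals kappa numerically.


def kappa(E, nu):
    """Parallel wave number of Eq. (7.22): kappa_nu = sqrt(2 (E - nu))."""
    return math.sqrt(2.0 * (E - nu))


def delta_kappa_exact(E, nu, n):
    """|kappa_nu - kappa_{nu-n}|, written as 2n / (kappa_nu + kappa_{nu-n})
    because kappa_{nu-n}^2 - kappa_nu^2 = 2n here; this avoids cancellation."""
    return 2.0 * n / (kappa(E, nu) + kappa(E, nu - n))


def delta_kappa_first_order(E, nu, n):
    """First-order result n * omega / v of Eq. (7.24), with omega = 1."""
    v = kappa(E, nu)
    return n / v


E, nu = 1.0e6, 5.0e5   # hypothetical: nu >> 1 and E - nu >> n
for n in (1, 2, 3):
    exact = delta_kappa_exact(E, nu, n)
    approx = delta_kappa_first_order(E, nu, n)
    print(n, exact, approx, abs(exact - approx) / approx)
```

The relative discrepancy behaves like $n\hbar\omega/[4(E - \nu\hbar\omega)]$, so the linear law $k_n \propto n$ holds as long as $n$ is small compared with the number of Landau quanta in the parallel energy.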
Now, making use of the assumption $n \ll \nu$, we expand $\kappa_{\nu-n}$ around $\kappa_\nu$ using Eq. (7.22) and obtain

$$\Delta\kappa = \kappa_\nu - \kappa_{\nu-n} \simeq n\,\frac{\partial\kappa_\nu}{\partial\nu}\,, \qquad \left|\frac{\partial\kappa_\nu}{\partial\nu}\right| = \omega\left[\frac{2}{m}(E-\nu\hbar\omega)\right]^{-1/2} = \frac{\omega}{v}\,, \qquad (7.24)$$

where

$$v = \left[\frac{2}{m}(E - \nu\hbar\omega)\right]^{1/2}$$

is the velocity along the magnetic field. The transition amplitude $A^{(n)}$ is then given, up to the sign of the phase (which only fixes the direction of propagation), by

$$A^{(n)} = \alpha_n\,e^{i(n\omega/v)x}\,, \qquad (7.25)$$

so that it corresponds to a wave with the wave number

$$k_n = \frac{n\omega}{v}\,, \qquad (7.26)$$

which for $n = 1$ gives essentially the wavelength $\lambda_{\rm eff}$ of Eq. (7.4) and is clearly independent of $\hbar$. It is therefore this transition amplitude (7.25) which is responsible for the non-Planckian wave behaviour of the motion along the magnetic field reported here in Sections 7.1 and 7.2. Note that $n \geq 2$ in (7.25) corresponds to higher harmonics of the fundamental wave with $n = 1$. An examination of the plots of Figs. 10(a)–(c) reveals that higher harmonics must be present. These higher harmonics, it may be pointed out, correspond to Eq. (6.63) for higher values of the mode number $n$ ($n \geq 2$). Thus the formalism of Ref. [58] contains the higher harmonics as well.

8. The Schrödinger-like equations: a quantum mechanical derivation

The Schrödinger-like equations (6.63) for the set of probability amplitudes $\psi^{(n)}$ were derived in Section 6.2 from the classical Liouville equation for charged particles in a magnetic field, as its Hilbert space representation. The derivation, itself somewhat specially designed and given originally in Ref. [58], yielded a probability amplitude description similar to that of quantum mechanics, but now for a system in the classical mechanical domain. The gyroaction $\mu$, which appears here in the role of $\hbar$ in the Schrödinger-like equations, has a macroscopic magnitude, typically $\mu \approx 10^8\hbar$. It is this fact which defines the domain of description of this theory to be classical. Following from its (probability) amplitude character, the theory predicts the existence of one-dimensional interference phenomena in the transmission of charged particles along a magnetic field. The observations exhibiting such interference phenomena have been reviewed in Section 7.
Earlier, its predictions relating to the existence of a multiplicity of residence times in an adiabatic trap, arising from the different equations of the set, were successfully verified in a series of experiments, as described in Section 6.1.1. Two comments can be made in relation to this theory. First, while the Schrödinger-like equations of the theory have been obtained from the classical Liouville equation, whose characteristic equations are the Hamilton equations of motion (which are equivalent to the Lorentz equation), the predictions of the two (the Schrödinger-like equations and the Lorentz equation), as we saw in the preceding sections, are fundamentally different. We tried to throw some light on this question in Section 6.2.3, but a deeper examination is required. The second comment is that, being an amplitude theory, it would be expected to have some connection with quantum mechanics, the fundamental amplitude theory. Although a search for such a connection was attempted earlier [56], it did not go far enough. A recent work by the author [64] has now established such a relationship: the Schrödinger-like equations of the theory have been obtained starting from the quantum mechanical Schrödinger equation for the system. The advantage of a derivation starting from quantum mechanics is that the amplitude character of the derived set of equations, since it flows directly from that of the quantum mechanical Schrödinger equation, now holds unreservedly. Furthermore, it affords a closer understanding of the relationship between the QM Schrödinger equation and this set of equations, as well as between the present quantum mechanical derivation and the derivation of Ref. [58] reviewed in Section 6.2. This derivation also affords an opportunity to generalize the set of equations to include all three components of the vector potential $\mathbf A$, while taking only $A_\phi\hat{\mathbf e}_\phi$ to have nonzero curl, so that the magnetic field still has only $B_r$ and $B_z$ components, and $A_r$ and $A_z$ are curl free in almost the entire region except for a small source region. While axisymmetry was assumed for the derivation in Ref. [64], we assume here that the magnetic field is weakly azimuthally asymmetric.
With this generalization, we shall obtain a set of three-dimensional equations with a structure similar to that of the QM Schrödinger equation with a vector potential, which is assumed here to be curl free in the entire region of space except in a thin torus inside which the $B_\phi$ field is confined. As will be shown in Section 8.2, the set of equations with the vector potential so obtained predicts the possibility of observing, in the manner of the Aharonov–Bohm effect, the curl-free vector potential in the macrodomain (≈10 cm). The observation of such an effect in the macrodomain would constitute a spectacular demonstration of the amplitude character of the governing equations in the macrodomain, for it is the amplitude which carries the information of the vector potential in its phase. Moreover, and more importantly, the observation of the curl-free vector potential in the classical macrodomain, as these observations would signify, would appear to contradict the Lorentz equation of motion. This would thus entail an enlargement of our understanding of classical charged-particle dynamics in a magnetic field. As will be reported in Section 8.4, we have indeed found experimental evidence for the effect of a curl-free vector potential on electrons, à la Aharonov–Bohm, in the classical macrodomain. In the next section we derive the required set of equations starting from the Feynman path integral representation of the quantum mechanical problem under consideration.

8.1. A path integral representation for a charged particle in an inhomogeneous magnetic field and the derivation of the set of Schrödinger-like equations

As we wish to start from quantum mechanical considerations of the charged particle dynamics, it is expedient to employ, as done earlier [56,64], the path integral representation.
If $\psi(r,\phi,z,t+\epsilon)$ is the probability amplitude for the particle at $(r,\phi,z)$ (cylindrical coordinates) at the time $t+\epsilon$, then it is connected to that at $(r-\Delta r,\phi-\Delta\phi,z-\Delta z,t)$ through the Feynman relation ($\epsilon$ being a small time interval)

$$\psi(r,\phi,z,t+\epsilon) = \left(\frac{m}{2\pi i\hbar\epsilon}\right)^{3/2}\int d(\Delta\phi)\,r\,d(\Delta r)\,d(\Delta z)\,\exp\left[\frac{i}{\hbar}\int_t^{t+\epsilon}L\,dt\right]\psi(r-\Delta r,\phi-\Delta\phi,z-\Delta z,t)\,, \qquad (8.1)$$

where $L$ is the Lagrangian for the charged particle in a magnetic field,

$$L = \frac{1}{2}m\left(\dot r^2 + r^2\dot\phi^2 + \dot z^2\right) + \frac{e}{c}\left(\dot r A_r + r\dot\phi A_\phi + \dot z A_z\right), \qquad (8.2)$$

and where $\int L\,dt$ in the exponent in Eq. (8.1) is written in the form

$$\int_t^{t+\epsilon}L\,dt = \bar L = \frac{1}{2}m\left[(\Delta r)^2 + r^2(\Delta\phi)^2 + (\Delta z)^2\right]/\epsilon + \frac{e}{c}\left(\Delta r A_r + r\Delta\phi A_\phi + \Delta z A_z\right). \qquad (8.3)$$
We now appeal to the adiabatic theory of charged particle dynamics [5] briefly outlined in Section 3. Accordingly we introduce “fast” and “slow” variables with respect to both the $r$ and $\phi$ coordinates. Write first $\phi = \phi' + \theta$, $\phi'$ being the “fast” and $\theta$ the “slow” variable, where the prime is later dropped. Likewise write $\Delta\phi \Rightarrow \Delta\phi + \Delta\theta$. We thus have

$$\bar L = \frac{1}{2}m\left[(\Delta r)^2 + r^2(\Delta\phi)^2 + r^2(\Delta\theta)^2 + (\Delta z)^2\right]/\epsilon + \frac{e}{c}\left(\Delta r A_r + r(\Delta\phi + \Delta\theta)A_\phi + \Delta z A_z\right). \qquad (8.4)$$
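Using this expression in (8.1), and taking a Fourier transform with respect to $\phi$ (the functions being periodic with period $2\pi$), we get

$$\psi(r,\ell,\theta,z,t+\epsilon) = \left(\frac{m}{2\pi i\hbar\epsilon}\right)^{3/2}\int d(\Delta\phi)\,r\,d(\Delta r)\,d(\Delta z)\,\exp\left[\frac{i}{\hbar}\bar L - i\ell(\Delta\phi)\right]\psi(r-\Delta r,\ell,\theta-\Delta\theta,z-\Delta z,t)\,, \qquad (8.5)$$

where $\ell$ is an integer (positive or negative), the angular Fourier transform variable. We shall take $\ell \gg 1$ to correspond to the classical or correspondence limit. This enables one to ignore the weaker $\theta$ dependence. On completing the square in $(\Delta\phi)$, the exponent $[\bar L/\hbar - \ell(\Delta\phi)]$ in Eq. (8.5) becomes

$$\frac{\bar L}{\hbar} - \ell(\Delta\phi) = \frac{1}{2}m\left[(\Delta r)^2 + (\Delta z)^2 + r^2(\Delta\theta)^2\right]/\hbar\epsilon + \frac{e}{\hbar c}\left(\Delta r A_r + r\Delta\theta A_\theta + \Delta z A_z\right) + \frac{mr^2}{2\hbar\epsilon}\left[\Delta\phi - \frac{\epsilon}{mr^2}\left(\ell\hbar - \frac{e}{c}rA_\phi\right)\right]^2 - \frac{\epsilon}{2mr^2\hbar}\left(\ell\hbar - \frac{e}{c}rA_\phi\right)^2. \qquad (8.6)$$

Using this in Eq. (8.5) and carrying out the integration with respect to $(\Delta\phi)$ yields

$$\psi(r,z,\theta,\ell,t+\epsilon) = \frac{m}{2\pi i\hbar\epsilon}\int d(\Delta z)\,r\,d(\Delta r)\exp\left\{\frac{i}{\hbar}\left[\frac{1}{2}m\frac{(\Delta r)^2}{\epsilon} + \frac{1}{2}m\frac{(\Delta z)^2}{\epsilon} + \frac{1}{2}m\frac{r^2(\Delta\theta)^2}{\epsilon} + \frac{e}{c}r(\Delta\theta)A_\theta + \frac{e}{c}\Delta r A_r + \frac{e}{c}\Delta z A_z - \frac{\epsilon}{2mr^2}\left(\ell\hbar - \frac{e}{c}rA_\phi\right)^2\right]\right\}\psi(r-\Delta r,z-\Delta z,\theta-\Delta\theta,\ell,t)\,. \qquad (8.7)$$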
Note that the exponent now contains the term $(1/2mr^2)(\ell\hbar - (e/c)rA_\phi)^2$, which represents an effective potential for the $(r,z)$ motion. In the large quantum number limit assumed, $\ell \gg 1$, $M = \ell\hbar$ defines the classical canonical angular momentum, which will not be conserved if $A_\phi$ is not strictly independent of $\phi$. Next, we specialize to the near-adiabatic limit defined by the inequality $\varepsilon \equiv (v_\perp/\omega^2)\,d\omega/ds \ll 1$, which implies that the particle stays close to the magnetic field line around which it gyrates; that is, the gyroradius $r_L$ of the particle is much smaller than the characteristic length $L$ of the magnetic field variation. In a curl-free inhomogeneous magnetic field, the field lines would in general be curved. It is then more appropriate to use a local orthogonal system of coordinates in place of the cylindrical coordinate system. Following Dykhne and Chaplik [44], we employ for an axisymmetric magnetic field configuration the coordinate system $(y,\phi,s)$ defined by (4.104) and (4.105), where $s$ is the length along the line of force, $y$ a coordinate orthogonal to the particular field line, and $\phi$ the angular coordinate orthogonal to both $y$ and $s$. The line element $dl$ in this coordinate system is given by $dl^2 = dy^2 + h_s^2\,ds^2 + h_\phi^2\,d\phi^2$, where $h_s$ and $h_\phi$ are the scale factors, $h_s = (1 - y/R_c)$, $h_\phi = r$, and $R_c(s)$ is the radius of curvature of the particular field line [see Section 4.3]. If, for simplicity, we consider the small Larmor radius limit and assume $R_c$ to be large, that is, we select for the particles a line of force near the axis of the magnetic field configuration, we then have $h_s \simeq 1$. The parametric equation of a line of force is given by (4.105):

$$r = \varrho(s)\,, \qquad z = z(s)\,.$$

In the small Larmor radius limit the coordinate $y$ of the particle always remains small during the motion; we can thus expand $(e/c)rA_\phi$ in the potential energy term in powers of $y/R_c$, as in (4.106):

$$rA_\phi = (rA_\phi)_{y=0} + y\,\frac{\partial}{\partial y}(rA_\phi)\Big|_{y=0} + \cdots.$$

Moreover, the total magnetic field on a field line is (4.107)

$$B = \frac{1}{r}\frac{\partial}{\partial y}(rA_\phi)\,.$$

Hence,

$$\left(\ell\hbar - \frac{e}{c}rA_\phi\right)^2 = \left[\ell\hbar - \frac{e}{c}(rA_\phi)_{y=0} - \frac{eB}{c}ry\right]^2 = \left(\ell\hbar - \frac{e}{c}(rA_\phi)_{y=0}\right)^2 + \left(\frac{eB}{c}\right)^2r^2y^2 - 2yr\frac{eB}{c}\left(\ell\hbar - \frac{e}{c}(rA_\phi)_{y=0}\right) + \cdots. \qquad (8.8)$$

Note that $(rA_\phi)_{y=0}$ refers to the value on the particular field line from which $y$ is measured; it basically represents the flux coordinate of the field line. For the axisymmetric case $\ell$ is a constant
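of motion, and $\ell\hbar$ ($\ell \gg 1$) is identified as the canonical angular momentum $M \equiv \ell\hbar$, which is further identified with $(e/c)(rA_\phi)_{y=0} \equiv M$. Then (8.8) gives the potential energy term in (8.7) as

$$\frac{1}{2mr^2}\left(\ell\hbar - \frac{e}{c}rA_\phi\right)^2 = \frac{1}{2}m\omega^2y^2\,. \qquad (8.9)$$

Thus the guiding centre of the particle always stays on the flux surface $rA_\phi = (c/e)M$ in the axisymmetric case. A departure from axisymmetry makes it move to neighbouring flux surfaces. We thus introduce a coordinate $Y$ of the guiding centre, the “slow” variable counterpart of the “fast” variable $y$ of the coordinate system (4.104). Equation (8.7) then becomes

$$\tilde\psi(y,Y,s,\theta,\ell,t+\epsilon) = \frac{m}{2\pi i\hbar\epsilon}\int d(\Delta s)\,d(\Delta y)\exp\left\{\frac{i}{\hbar}\left[\frac{1}{2}m\frac{(\Delta y)^2}{\epsilon} + \frac{1}{2}m\frac{(\Delta Y)^2}{\epsilon} + \frac{1}{2}m\frac{r^2(\Delta\theta)^2}{\epsilon} + \frac{1}{2}m\frac{(\Delta s)^2}{\epsilon} + \frac{e}{c}r(\Delta\theta)A_\theta + \frac{e}{c}\Delta s A_s + \frac{e}{c}\Delta Y A_Y - \frac{1}{2}m\omega^2(s)y^2\epsilon\right]\right\}\tilde\psi(y-\Delta y,Y-\Delta Y,s-\Delta s,\theta-\Delta\theta,\ell,t)\,, \qquad (8.10)$$

where

$$\tilde\psi(y,Y,s,\theta,\ell,t) = \psi\,\exp\left(-\frac{ie}{\hbar c}\int^y A_y\,dy\right). \qquad (8.11)$$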
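Next we consider an eigenfunction expansion [66] of a part of the kernel in (8.10), that is

$$\left(\frac{m}{2\pi i\hbar\epsilon}\right)^{1/2}\exp\left\{\frac{i}{\hbar}\left[\frac{1}{2}m\frac{(\Delta y)^2}{\epsilon} - \frac{1}{2}m\omega^2y^2\epsilon\right]\right\} = \sum_n P_n(y)\,e^{-iE_n\epsilon/\hbar}\,P_n^*(y-\Delta y)\,. \qquad (8.12)$$

This part of the kernel represents a harmonic oscillator with frequency $\omega$, where the $P_n$ are the harmonic oscillator wave functions and

$$E_n = \left(n + \tfrac{1}{2}\right)\hbar\omega(s) \qquad (8.13)$$

are the Landau energy levels. If we now use the expansion (8.12) in (8.10), multiply both sides by $P_n^*$ and integrate over $y$, we obtain, using the orthonormality of the eigenfunctions $P_n$,

$$\hat\psi(Y,n,s,\theta,\ell,t+\epsilon) = \left(\frac{m}{2\pi i\hbar\epsilon}\right)^{1/2}\int d(\Delta s)\exp\left\{\frac{i}{\hbar}\left[\frac{1}{2}m\frac{(\Delta Y)^2}{\epsilon} + \frac{1}{2}m\frac{R^2(\Delta\theta)^2}{\epsilon} + \frac{1}{2}m\frac{(\Delta s)^2}{\epsilon} + \frac{e}{c}R(\Delta\theta)A_\theta + \frac{e}{c}\Delta s A_s + \frac{e}{c}\Delta Y A_Y - n\hbar\omega(s)\epsilon\right]\right\}\hat\psi(Y-\Delta Y,n,s-\Delta s,\theta-\Delta\theta,\ell,t)\,. \qquad (8.14)$$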
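Eq. (8.12) is the eigenfunction (spectral) expansion of the short-time harmonic-oscillator kernel. Its content can be checked numerically in the closely related imaginary-time form, where $\sum_n P_n(x)P_n(y)e^{-E_n\tau/\hbar}$ must reproduce Mehler's closed-form propagator. The sketch below assumes units $\hbar = m = \omega = 1$ and swaps real time for an imaginary time $\tau$ purely so that the series converges absolutely; this substitution is an illustrative assumption, not the procedure used in the text.

```python
import math


def hermite_functions(n_max, x):
    """Normalized oscillator eigenfunctions P_0..P_{n_max}(x) in units hbar = m = omega = 1,
    built with the standard stable three-term recurrence."""
    p = [math.pi ** -0.25 * math.exp(-x * x / 2)]
    if n_max >= 1:
        p.append(math.sqrt(2.0) * x * p[0])
    for n in range(1, n_max):
        p.append(math.sqrt(2.0 / (n + 1)) * x * p[n] - math.sqrt(n / (n + 1)) * p[n - 1])
    return p


def kernel_by_expansion(x, y, tau, n_max=80):
    """sum_n P_n(x) P_n(y) exp(-E_n tau) with E_n = n + 1/2 (imaginary-time analogue of Eq. (8.12))."""
    px, py = hermite_functions(n_max, x), hermite_functions(n_max, y)
    return sum(a * b * math.exp(-(n + 0.5) * tau) for n, (a, b) in enumerate(zip(px, py)))


def kernel_closed_form(x, y, tau):
    """Euclidean harmonic-oscillator propagator (Mehler's formula), same units."""
    s, c = math.sinh(tau), math.cosh(tau)
    return math.exp(-((x * x + y * y) * c - 2 * x * y) / (2 * s)) / math.sqrt(2 * math.pi * s)


for x, y in [(0.3, -0.8), (1.2, 0.5), (0.0, 0.0)]:
    print(x, y, kernel_by_expansion(x, y, 0.5), kernel_closed_form(x, y, 0.5))
```

The orthonormality of the $P_n$, which is what allows the $y$ integration leading to (8.14), can be checked with the same `hermite_functions` helper by summing $P_mP_n$ over a fine grid.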
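Eq. (8.14) follows on carrying out the integration over $(\Delta y)$ on the right-hand side of (8.10). Consider now the exponent in (8.14), and note that it can be written as

$$\frac{1}{2}m\frac{(\Delta\mathbf X)^2}{\hbar\epsilon} + \frac{e}{\hbar c}\Delta\mathbf X\cdot\mathbf A - n\omega\epsilon = n\left[\frac{1}{2}m\frac{(\Delta\mathbf X)^2}{n\hbar\epsilon} + \frac{e}{n\hbar c}\Delta\mathbf X\cdot\mathbf A - \omega\epsilon\right], \qquad (8.15)$$

where $\Delta\mathbf X \equiv (\Delta Y, R\Delta\theta, \Delta s)$; below, $\Delta\mathbf X_\perp \equiv (\Delta Y, R\Delta\theta)$ denotes its part perpendicular to the field. Now note also that when $\omega$ and $A_s$ are independent of $s$ (homogeneous field), $n$ is a strict constant of motion; call it $n_0$. When $\omega$ and $A_s$ are slowly varying functions of $s$, $n_0$ is an adiabatic invariant, which is identified with the gyroaction $\mu = n_0\hbar$ ($n_0 \gg 1$). However, transitions will in general occur from $n_0$ to $n = n_0 + N$, where $n_0 \gg N \geq 1$, when $\omega$ and $A_s$ vary more rapidly with $s$. Therefore $N$ represents a change in the quantum number from $n_0$ induced by the motion in a varying magnetic field and vector potential $A_s$. We may call it “nonadiabaticity”, but it should properly be considered as “quantum nonadiabaticity”. We therefore note that

$$n_0\left[\frac{1}{2}m\frac{(\Delta\mathbf X)^2}{n_0\hbar\epsilon} + \frac{e}{n_0\hbar c}\Delta\mathbf X\cdot\mathbf A - \omega\epsilon\right] = \frac{n_0\bar L_A}{\mu} \;\Rightarrow\; \frac{(n_0+N)\bar L_A}{\mu}\,, \qquad (8.16)$$

where

$$\bar L_A = \frac{1}{2}m\frac{(\Delta s)^2}{\epsilon} + \frac{e}{c}(\Delta s)A_s - \mu\omega\epsilon + \frac{1}{2}m\frac{(\Delta\mathbf X_\perp)^2}{\epsilon} + \frac{e}{c}\Delta\mathbf X_\perp\cdot\mathbf A \qquad (8.17)$$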
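is the effective Lagrangian for the guiding centre motion in the presence of the vector potential $\mathbf A$. In view of (8.16) and (8.17) we write

$$\frac{(n_0+N)\bar L_A}{\mu} = \frac{n_0\bar L_A}{\mu} + \frac{N\bar L_A}{\mu} = \frac{1}{\hbar}\left[\frac{1}{2}m\frac{(\Delta s)^2}{\epsilon} + \frac{e}{c}\Delta s A_s - n_0\hbar\omega\epsilon\right] + \frac{N\bar L_A}{\mu}\,, \qquad (8.18)$$

where we have ignored the $(\Delta\mathbf X_\perp)$ terms in the first term of (8.18), as they pertain to the slower guiding centre “perpendicular” motion and are therefore small compared with the $(\Delta s)$ terms pertaining to the “parallel” motion. This brings (8.14) to the form

$$\hat\psi(Y,\theta,s,t+\epsilon;\ell,n_0+N) = \left(\frac{m}{2\pi i\hbar\epsilon}\right)^{1/2}\int d(\Delta s)\exp\left\{\frac{i}{\hbar}\left[\frac{1}{2}m\frac{(\Delta s)^2}{\epsilon} + \frac{e}{c}\Delta s A_s - n_0\hbar\omega(s)\epsilon\right]_{(1)} + \frac{iN}{\mu}\left[\frac{1}{2}m\frac{(\Delta S)^2}{\epsilon} + \frac{e}{c}\Delta S A_s - \mu\omega\epsilon + \frac{1}{2}m\frac{(\Delta\mathbf X_\perp)^2}{\epsilon} + \frac{e}{c}(\Delta\mathbf X_\perp)\cdot\mathbf A_\perp\right]_{(2)}\right\}\hat\psi(Y-\Delta Y,\theta-\Delta\theta,s-\Delta s,t;\ell,n_0+N)\,. \qquad (8.19)$$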
The function $\hat\psi(n_0+N)$ governed by Eq. (8.19) is the wave function for the system in the Landau state $(n_0+N)$, to which a transition occurs from the state $n_0$ as a consequence of a perturbation. The two terms in the exponent in Eq. (8.19) lead to widely different scales of variation of the wave function. The first one (in the square bracket marked with the subscript (1)) leads to variation on the microscopic scale characterized by the denominator $\hbar$, while the second one (marked with the subscript (2)) leads to variation on the macroscale characterized by the denominator $\mu = n_0\hbar$. We accordingly denote the interval $\Delta s$ in the second bracket by $\Delta S$ to emphasize the slower variation on the macroscale. We would like to obtain an equation of evolution for the amplitude of the transition from the initial state $(n_0)$ to the final state $(n_0+N)$. For this purpose we take an “overlap” of the equation for the wave function $(n_0+N)$ with that for the initial state $(n_0)$. In order to do so we first consider an eigenfunction expansion of the kernel with the subscript (1) term in the exponent
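of (8.19). This kernel, namely

$$K_s = \left(\frac{m}{2\pi i\hbar\epsilon}\right)^{1/2}\exp\left\{\frac{i}{\hbar}\left[\frac{1}{2}m\frac{(h_s\Delta s)^2}{\epsilon} + \frac{e}{c}h_s\Delta s A_s - n_0\hbar\omega(s)\epsilon\right]\right\}, \qquad (8.20)$$

corresponds to the initial state labelled by the Landau quantum number $n_0$, and represents motion along the coordinate $s$ in the potential $n_0\hbar\omega(s)$ and the curl-free vector potential $A_s$. This kernel (8.20) is expanded in terms of the eigenfunctions of the Hamiltonian corresponding to the Lagrangian in its exponent [66]:

$$K_s = \sum_\kappa \varphi_\kappa(s)\,e^{-iE_\kappa\epsilon/\hbar}\,\varphi^*_\kappa(s-\Delta s)\,, \qquad (8.21)$$

where

$$E_\kappa \equiv E_K = \frac{1}{2m}\left(\hbar\kappa - \frac{e}{c}A_s\right)^2 + n_0\hbar\omega = \frac{1}{2m}(\hbar K)^2 + n_0\hbar\omega\,, \qquad (8.22a)$$

with

$$K = \kappa - \frac{e}{\hbar c}A_s = \left(\frac{2m}{\hbar^2}\right)^{1/2}\left[E_K - n_0\hbar\omega\right]^{1/2}, \qquad (8.22b)$$

and

$$\varphi_\kappa(s) = \left(\frac{m}{2\pi K\hbar^2}\right)^{1/2}\left[\exp\left(i\int^s\kappa_+\,ds\right) - \exp\left(i\int^s\kappa_-\,ds\right)\right] = \left(\frac{m}{2\pi K\hbar^2}\right)^{1/2}\exp\left(\frac{ie}{\hbar c}\int^s A_s\,ds\right)2i\sin\left(\int^s K(s)\,ds\right), \qquad (8.23a)$$

where

$$\kappa_\pm = \frac{e}{\hbar c}A_s \pm K\,. \qquad (8.23b)$$

In the eigenfunction expansion (8.21) of the kernel $K_s$, the functions $\varphi_\kappa$ are the eigenfunctions of the initial state with the Landau quantum number $n_0$. The “overlap” alluded to above is then taken by using (8.21) in (8.19), multiplying both sides by $\varphi^*_\kappa$ and integrating over $s$. Integrating then over $(\Delta s)$ on the right-hand side, one gets

$$\hat\psi(\mathbf X_\perp,K,N,t+\epsilon;n_0,\ell) = e^{-iE_K\epsilon/\hbar}\exp\left\{\frac{iN}{\mu}\left[\frac{1}{2}m\frac{(\Delta S)^2}{\epsilon} + \frac{e}{c}\Delta S A_S - \mu\omega\epsilon + \frac{1}{2}m\frac{(\Delta\mathbf X_\perp)^2}{\epsilon} + \frac{e}{c}(\Delta\mathbf X_\perp)\cdot\mathbf A_\perp\right]\right\}\hat\psi(\mathbf X_\perp-\Delta\mathbf X_\perp,K,N,t;n_0,\ell)\,. \qquad (8.24)$$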
We thus transform away from Eq. (8.19) the rapidly varying part $K_s$ of the kernel by transforming it into Fourier space with respect to that part of $s$ which accounts for the rapid variation; the weak dependence of $\omega$ and $A_s$ on $S$ has been disregarded in taking the Fourier transform. Write $K = K_0 + k$, where $K_0$ is a large constant wave number, $K_0 \gg 1/L$ ($L$ being the characteristic length of variation of $\omega$ and $A_s$), and $k \ll K_0$ ($k \sim 1/L$). Then

$$E_K \simeq E_{K_0} + \frac{\hbar k}{m}(\hbar K_0) = E_{K_0} + (\hbar k)v_0\,, \qquad (8.25)$$

noting that $\hbar K_0 = mv_0$, $v_0$ being the velocity corresponding to the wave number $K_0$. Using (8.25) in (8.24) we get (dropping the subscript 0 on $v_0$)

$$\hat\psi(\mathbf X_\perp,k,N,t+\epsilon;K_0,n_0,\ell) = e^{-iE_{K_0}\epsilon/\hbar}\exp\left\{\frac{iN}{\mu}\left[\frac{1}{2}m\frac{(\Delta S)^2}{\epsilon} + \frac{e}{c}\Delta S A_S - \mu\omega\epsilon + \frac{1}{2}m\frac{(\Delta\mathbf X_\perp)^2}{\epsilon} + \frac{e}{c}\Delta\mathbf X_\perp\cdot\mathbf A_\perp\right] - ikv\epsilon\right\}\hat\psi(\mathbf X_\perp-\Delta\mathbf X_\perp,k,N,t;K_0,n_0,\ell)\,. \qquad (8.26)$$
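The linearization (8.25) keeps only the first-order term of $E_K = (\hbar K)^2/2m + n_0\hbar\omega$ in $k$. The short sketch below, in assumed units $\hbar = m = 1$ with hypothetical values $K_0 = 1000$, $k = 1$ and $n_0\hbar\omega = 50$, shows that the neglected piece is exactly $(\hbar k)^2/2m$, whose size relative to the retained shift $\hbar k v_0$ is $k/2K_0$, small precisely because $k \ll K_0$.

```python
def energy(K, n0_hbar_omega, hbar=1.0, m=1.0):
    """E_K = (hbar K)^2 / 2m + n0 hbar omega, as in Eq. (8.22a)."""
    return (hbar * K) ** 2 / (2 * m) + n0_hbar_omega


def linearized_energy(K0, k, n0_hbar_omega, hbar=1.0, m=1.0):
    """First-order form of Eq. (8.25): E_K0 + (hbar k) v0, with v0 = hbar K0 / m."""
    v0 = hbar * K0 / m
    return energy(K0, n0_hbar_omega, hbar, m) + hbar * k * v0


K0, k, level = 1000.0, 1.0, 50.0   # hypothetical values with k << K0 (units hbar = m = 1)
exact_shift = energy(K0 + k, level) - energy(K0, level)
linear_shift = linearized_energy(K0, k, level) - energy(K0, level)
print(exact_shift, linear_shift)                                 # differ by (hbar k)^2 / 2m
print((exact_shift - linear_shift) / linear_shift, k / (2 * K0))  # relative size k / 2K0
```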
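Eq. (8.26) is still left with a rapid time dependence characteristic of the microdomain of $\hbar$, showing through the factor $\exp(-iE_{K_0}\epsilon/\hbar)$ on the right-hand side. To transform away this rapid time dependence, multiply both sides by $\exp(iE_{K_0}t/\hbar)$ and integrate over a time interval $\Delta t$ with $T \gg \Delta t \gg \hbar/E_{K_0}$ ($T \sim L/v$ being the characteristic macroscopic time), which finally yields

$$\hat\psi(\mathbf X_\perp,k,N,t+\epsilon;E_{K_0},K_0,n_0,\ell) = \exp\left\{\frac{iN}{\mu}\left[\frac{1}{2}m\frac{(\Delta S)^2}{\epsilon} + \frac{e}{c}\Delta S A_S + \frac{1}{2}m\frac{(\Delta\mathbf X_\perp)^2}{\epsilon} + \frac{e}{c}\Delta\mathbf X_\perp\cdot\mathbf A_\perp - \mu\omega\epsilon\right] - ikv\epsilon\right\}\hat\psi(\mathbf X_\perp-\Delta\mathbf X_\perp,k,N,t;E_{K_0},K_0,n_0,\ell)\,. \qquad (8.27)$$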
Taking the inverse transform with respect to $k$ yields (the “momentum” parameters $(\ell,n_0,K_0,E_{K_0})$ will be suppressed hereafter)

$$\hat\psi(S,\mathbf X_\perp,N,t+\epsilon) = \exp\left(\frac{iN}{\mu}\bar L_A\right)\hat\psi(S-\Delta S,\mathbf X_\perp-\Delta\mathbf X_\perp,N,t)\,, \qquad (8.28)$$

where $\bar L_A$ is the reduced Lagrangian given by (8.17), and where we have written $\Delta S = v\epsilon$. As can be seen, Eq. (8.28) has been obtained from Eq. (8.1) by systematically transforming away all the rapid dependences on $\phi$, and later on the coordinates $y$ and $s$ and the time $t$, characteristic of the microscale of $\hbar$. The transformation of the $y$ coordinate in (8.10) through the eigenfunction expansion (8.12) yields the Landau quantum number $n$ as the defining label for a particular state of the particle in a magnetic field. Subsequently, writing an equation for a Landau state $(n_0+N)$ to which a transition is assumed to have taken place from the initial state $n_0$, an “overlap” is defined between the final state labelled by the Landau quantum number $(n_0+N)$ (and other associated quantum numbers) and the initial state labelled by $n_0$ (and other associated quantum numbers). This process of taking the overlap transforms away from (8.19) all rapid dependences characteristic of the microscale of $\hbar$, leaving the slow dependences which characterize the behaviour of the transition amplitude in (8.28). The scale of variation of these transition amplitudes is characterized by the magnitude of the large action $\mu = n_0\hbar$, while the quantum numbers characterizing the initial state, namely $(\ell,n_0,K_0,E_{K_0})$, appear as parameters in the wave functions. These transition amplitude functions are nevertheless probability amplitudes, and would exhibit all the properties characteristic of a quantum mechanical probability amplitude, but now in the macroscopic domain characterized by the wave numbers $k \ll K_0$, $N \ll n_0$, and the action $\mu = n_0\hbar \gg \hbar$.

Eq. (8.28) with $\bar L_A$ given by (8.17) is of the Feynman path integral form. One way to proceed, therefore, is to integrate the right-hand side over $(\Delta S)$ and $(\Delta\mathbf X_\perp)$ with the appropriate normalizing factor. Such an integration over the “slow” variables $(\Delta S,\Delta\mathbf X_\perp)$ is a logical consequence of splitting the variables into “fast” and “slow” ones. The path integration in (8.10) includes integration over both the “slow” and the “fast” components of the paths. The integration over the “fast” components $(\Delta s,\Delta\phi,\Delta y)$ having already been carried out in steps (8.10) to (8.24), the integration over the “slow” components $(\Delta S,\Delta\mathbf X_\perp)$ remains, and is effected as expressed in (8.29). The other way to proceed is to follow the procedure of Ref. [58] (reviewed in Section 6.2), which does not require integration à la Feynman. Following the Feynman procedure here, we obtain

$$\psi(S,Y,\theta,N,t+\epsilon) = \left(\frac{mN}{2\pi i\mu\epsilon}\right)^{3/2}\int d(\Delta S)\,d(\Delta Y)\,R\,d(\Delta\theta)\exp\left\{\frac{iN}{\mu}\left[\frac{1}{2}m\frac{(\Delta S)^2}{\epsilon} + \frac{1}{2}m\frac{(\Delta Y)^2}{\epsilon} + \frac{1}{2}m\frac{R^2(\Delta\theta)^2}{\epsilon} + \frac{e}{c}\left(\Delta S A_S + \Delta Y A_Y + R\Delta\theta A_\theta\right) - \mu\omega\epsilon\right]\right\}\psi(S-\Delta S,Y-\Delta Y,\theta-\Delta\theta,N,t)\,. \qquad (8.29)$$
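Carrying out the integration over $\Delta S$, $\Delta Y$, $\Delta\theta$ and using the standard procedure [66], we obtain

$$i\frac{\mu}{N}\frac{\partial\psi^{(N)}}{\partial t} = \frac{1}{2m}\left[\left(\frac{\mu}{Ni}\frac{\partial}{\partial S} - \frac{e}{c}A_S\right)^2 + \left(\frac{\mu}{Ni}\frac{\partial}{\partial Y} - \frac{e}{c}A_Y\right)^2 + \left(\frac{\mu}{NiR}\frac{\partial}{\partial\theta} - \frac{e}{c}A_\theta\right)^2\right]\psi^{(N)} + \mu\omega\,\psi^{(N)}\,, \qquad (8.30)$$

where $R$ is to be understood as a function, in general, of $(Y,\theta,s)$, and the probability density is given as the total transition probability over all the $N$ values:

$$G(S,Y,\theta,t) = \sum_N \psi^{(N)*}\psi^{(N)}\,. \qquad (8.31)$$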
This set of equations (8.30) and (8.31) then constitutes a generalization that describes the three-dimensional guiding centre motion and includes the curl-free vector potential components $A_S$ and $A_Y$. Note that to the lowest order in the adiabaticity parameter $\varepsilon$, the guiding centre stays on the initial flux surface defined by $RA_\theta = (c/e)M$, and in fact on the given initial magnetic field line. In the axisymmetric case all field lines on a given flux surface are equivalent. In the next order in $\varepsilon$, the guiding centre executes a $\nabla B$ drift given by (3.33), which would be purely azimuthal in the axisymmetric case, while it continues to stay on the same flux surface. We shall see in Section 8.3 how this drift can be extracted from Eq. (8.30). So long as the magnetic field (or equivalently $A_\phi$) is axisymmetric, the guiding centre will remain on the same magnetic surface because of the conservation of canonical angular momentum.
A non-axisymmetric part of $A_\phi$ will induce excursions from the initial flux surface. In terms of Eq. (8.30), this corresponds to transitions from the initial azimuthal quantum number $\ell$ to $\ell \pm H$, where $H = 1,2,3,\ldots$ are the mode numbers of the Fourier components of the non-axisymmetric part of $A_\phi$; these transition probabilities will be evaluated in Section 8.3 using standard perturbation theory à la wave mechanics.

8.1.1. The nature of the Schrödinger-like formalism

The first thing we may note about the Schrödinger-like formalism represented by Eqs. (8.30) and (8.31) is that the wave functions governed by these equations must necessarily be amplitudes in the sense of wave mechanics, as they flow directly from the wave amplitudes of the QM Schrödinger formalism. The second is that these equations are of the same form as those obtained earlier in Section 6.2 (as a Hilbert space representation of the classical Liouville equation), but generalized now to three dimensions and to include a curl-free vector potential. In fact, it is interesting to note that Eq. (8.27) is the same as Eq. (6.57) of Section 6.2 (Ref. [58]), except for the generalization mentioned above. Also, the parameter arguments $\alpha$ of the amplitude functions of Ref. [58], which were taken to be the initial values of the momenta, $\{\alpha\} \equiv (M, p_0, \mu_0, E)$ (respectively the canonical angular momentum $M$, the linear momentum $p_0$, the gyroaction $\mu_0$, and the energy $E$), are identical with the parameter arguments of the functions of the present derivation, namely $(\ell, K_0, n_0, E)$ (refer to Eq. (8.27)), which correspond to the set $(M = \ell\hbar,\ p_0 = \hbar K_0,\ \mu = \hbar n_0,\ E)$ of Ref. [58]. Hence there is a one-to-one correspondence between the amplitude functions of Ref. [58] and those of the present paper. The present derivation from quantum mechanics therefore vindicates the earlier derivation and assigns to the functions $\psi^{(n)}$ of Ref. [58] the meaning of wave amplitudes à la wave mechanics (in this section we use the symbol $N$ to denote the argument of $\psi^{(N)}$, as against $n$ in Section 6.2 and in Ref. [58], because the symbol $n$ stands here for another quantity). It thereby justifies the prediction that they describe interference-like phenomena, the evidence for which has been reported [60,65] and reviewed in Section 7. The significant point to be appreciated is that Eqs. (8.30) and (8.31) now refer to macroscopic dimensions of 1–10 cm, characterized by the magnitude of $\mu \approx 10^8\hbar$ (typically), rather than to the microdomain of $\sim 1$ Å characteristic of $\hbar$. These equations thus describe matter wave phenomena in the macrodomain with typical wavelengths of 1–10 cm. The experimental results reported in Section 7 on the existence of discrete energy band structure (Section 7.1), as well as the observations of beats (Section 7.2), are truly a manifestation of one-dimensional (along the magnetic field) interference phenomena with macroscopic matter waves.

These results present a certain dilemma. We know that in the macrodomain of a few centimetres, the equation-of-motion/initial-value paradigm of classical mechanics (in the present case, the Lorentz equation of motion) is what governs the dynamics of charged particles. On the other hand, we also know that classical equations of motion for particles do not support the matter wave interference phenomena that have been observed. Faced with this dilemma, the author has suggested [61] that topological considerations in classical mechanics may be at play. Topological properties are global properties of a system, and cannot be captured by the standard equation-of-motion/initial-value paradigm, which represents only a local evolution.
He has in fact shown that Einstein–Bohr–Sommerfeld-type quantization conditions can be obtained for a classical mechanical system as a consequence of its topological properties, where the role of $\hbar$ is played by an appropriate action (Poincaré invariant) belonging to the classical mechanical system. It would thus seem that the Hilbert space representation of the classical Liouville equation does capture the global topological properties of the system configuration space, and that the wave amplitude character of the equations so obtained is a reflection of that fact.

From the point of view of the present derivation (from quantum mechanics), it is interesting to examine the meaning of $\psi^{(N)}$ and of the index $N$. We recall that $N$ was taken to be the change in the Landau level quantum number from $n_0$ to $n_0 + N$ induced by the inhomogeneity in the magnetic field as the motion takes place along the field line. Thus $\psi^{(N)}$ has the interpretation of the probability amplitude for finding the particle in the state $n_0 + N$ ($n_0$ being the level number in the absence of inhomogeneity). This transition to the states $(n_0 \pm N)$ induced by the inhomogeneity may be termed “quantum nonadiabaticity”. It is interesting to note that in the derivation of Ref. [58] from the classical Liouville equation, the $N$ (denoted there as $n$) were the Fourier indices corresponding to the variable $\Phi$ (the action phase of Ref. [55]). Here $N$ is the change in the quantum number $n_0$. In fact, one can now understand the origin of the multiplicity of residence times in adiabatic traps discussed in Sections 6.1.1 and 6.3, which correspond to the different values $N = 1,2,3,\ldots$: they correspond to the nonadiabatic transitions induced by the magnetic inhomogeneity from the Landau level $n_0$ ($= \mu/\hbar$) to $n_0 \pm N$. One can also now understand why the relative fractions of particles corresponding to the different values of $N$ decrease rapidly with $N$, as has been found experimentally: the probability of transition for large $N$, and therefore across larger energy intervals $\Delta E = N\hbar\omega$, decreases rapidly with $N$.
It may also be noted that classical nonadiabaticity pertaining to a single particle, which corresponds to $N = 1$ (see, for instance, Refs. [35,43]), has no counterparts corresponding to values $N \geq 2$. That is, there are no particles which suffer nonadiabatic changes corresponding only to $N = 2$, 3, etc., whereas quantum mechanically the probabilities for changes corresponding to $N = 1, 2, 3$, etc. are independent of each other and individually nonzero. Thus the observed multiplicity of residence times ought to be regarded as a manifestation of “quantum nonadiabaticity”.

8.2. Observability of the curl-free vector potential à la Aharonov–Bohm in the macrodomain

The set of equations (8.30) and (8.31), generalized as it is to include a curl-free vector potential, affords the possibility of making yet another prediction, namely the observability of the curl-free vector potential in the macrodomain, in the manner of the Aharonov–Bohm (A–B) effect in the microdomain of $\hbar$. With axisymmetry and the consequent confinement of particles to the magnetic surface, Eqs. (8.30) are essentially one dimensional along the field-line coordinate and are given by

$$i\frac{\mu}{N}\frac{\partial\psi^{(N)}}{\partial t} = \frac{1}{2m}\left(\frac{\mu}{Ni}\frac{\partial}{\partial S} - \frac{e}{c}A_S\right)^2\psi^{(N)} + \mu\omega\,\psi^{(N)}\,. \qquad (8.32)$$

These were obtained in Ref. [64]. The situation here is somewhat different from the standard A–B effect, where the fringe shift in the simplest case of a double-slit interference experiment is proportional to the flux topologically enclosed by the two paths. Because of the one-dimensional situation we have only open paths here, but we do have one-dimensional interference phenomena through which we can observe the effect of the
curl-free vector potential on the interference maxima, similar to that in the standard Aharonov–Bohm effect. To see how such an effect can be observed in the present case, we consider the passage of electrons from an electron source S to a Faraday cup detector D along an ambient axial magnetic field, in an arrangement similar to that employed earlier (as described in Section 7). But now a curl-free magnetic vector potential with components $(A_r, A_z)$ is also produced by a Rowland ring, which is a torus of high-permeability magnetic material wound with current-carrying wires, so that the magnetic induction field $B_\phi$ is completely confined within it. This induction field $B_\phi$ produces a curl-free vector potential $(A_r, A_z)$ in the space outside. The Rowland ring is positioned midway between the electron gun and the detector, with its face perpendicular to the axial magnetic field, as indicated schematically in Fig. 15.

Fig. 15. (a) Schematic scale drawing of the experimental arrangement showing the position of the electron gun, the Faraday cup detector, the solenoidal ring, and the coils for the external field. (b) Variation of the external magnetic field along the axis.

If we use the form of the solution $\psi^{(1)} \sim \exp[-i(Et - \int p\,dS)/\mu]$ for Eq. (8.32), we obtain

$$p = mv + \frac{e}{c}A_S\,, \qquad (8.33)$$
with

$$v = [2(E-\phi)/m]^{1/2} , \qquad (8.34)$$

so that

$$\psi = \psi_0\exp\left[\frac{i}{\mu}\int_0^S dS\left(mv + \frac{e}{c}A_S\right)\right] . \qquad (8.35)$$
If we now consider the position $S_p$ of the plate detector, and S = 0, the position of the electron gun, then the wave amplitude at $S_p$ is

$$\psi_1(S_p) = \psi_0\exp\left[\frac{i}{\mu}\int_0^{S_p} dS\left(mv + \frac{e}{c}A_S\right)\right] . \qquad (8.36)$$

Next consider a grid placed at the position $S_g$ in the path of the electrons. This acts as a source of secondary waves as the electrons are scattered off it and travel to the plate. The wave amplitude at $S_p$ for the wave originating at the grid is given by

$$\psi_2(S_p) = \psi_0'\exp\left[\frac{i}{\mu}\int_{S_g}^{S_p} dS\left(mv + \frac{e}{c}A_S\right)\right] . \qquad (8.37)$$

The total probability current density at the plate position $S_p$ is then proportional to the total probability density

$$|\psi(S_p)|^2 = |\psi_1(S_p) + \psi_2(S_p)|^2 = |\psi_1|^2 + |\psi_2|^2 + 2\psi_0\psi_0'\cos\left[\frac{1}{\mu}\int_0^{S_g} dS\left(mv + \frac{e}{c}A_S\right)\right] . \qquad (8.38)$$

This leads to the condition for interference maxima

$$\int_0^{S_g} dS\left(mv + \frac{e}{c}A_S\right) = 2\pi n\mu . \qquad (8.39)$$
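The two-wave superposition behind (8.38) and (8.39) can be sketched numerically. This is an illustration only: the real amplitudes of the direct and grid-scattered waves are hypothetical, and the phase is treated as a single number standing for $(1/\mu)\int_0^{S_g}(mv + eA_S/c)\,dS$. Scanning the flux shifts that phase by $(eg/c\mu)\Phi$, where the coupling $eg/(c\mu)$ is likewise given a hypothetical value.

```python
import cmath
import math


def plate_intensity(phase, amp1=1.0, amp2=0.5):
    """Return |psi_1 + psi_2|^2 for two waves with relative phase `phase`.

    `phase` plays the role of (1/mu) * integral_0^{S_g} (m v + e A_S / c) dS,
    the phase difference appearing in Eq. (8.38). The real amplitudes amp1
    (direct wave from the gun) and amp2 (wave scattered off the grid) are
    hypothetical illustration values.
    """
    psi1 = amp1 * cmath.exp(1j * phase)
    psi2 = amp2
    return abs(psi1 + psi2) ** 2


def phase_with_flux(base_phase, flux, coupling):
    """Relative phase when the Rowland-ring flux is `flux`.

    `coupling` stands for e*g/(c*mu); its value is hypothetical here.
    """
    return base_phase + coupling * flux


def is_maximum(phase, tol=1e-9):
    """Interference-maximum condition of Eq. (8.39): phase = 2*pi*n."""
    n = round(phase / (2.0 * math.pi))
    return abs(phase - 2.0 * math.pi * n) < tol
```

Changing the flux by $2\pi/\text{coupling}$, i.e. $\Delta\Phi = 2\pi\mu c/(eg)$, returns the detector to a maximum; with $g = \sin\theta_0$ this is the flux-scan condition used in the experiment.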
It is clear from here that the curl free vector potential $A_S$ appears in the condition for the interference maxima and would thus affect their "positions". In particular, since the integral $\int_0^{S_g}A_S\,dS = g\Phi$, where Φ is the flux enclosed in the Rowland ring and g is a geometrical factor, scanning the flux in the Rowland ring would affect the positions of the maxima. Carrying out the integration in (8.39) yields

$$m\bar vL_g + \frac{e}{c}g\Phi = 2\pi n\mu , \qquad (8.40)$$

$\bar v$ being the mean particle velocity and $L_g$ the gun–grid distance.

It may be mentioned that Eq. (8.39) involves the integral $\int_0^{S_g}A_S\,dS$ along the "open" path connecting the source at S = 0 to the grid at $S_g$. The question of the gauge invariance of this quantity, and of relation (8.39), may be raised. This is different from the discussion of the standard A–B effect, where the interfering paths enclose the flux topologically and we have the circuit integral $\oint\mathbf{A}\cdot d\mathbf{x} = \Phi$, which is trivially gauge invariant. In the present case, however, the integral $\int_0^{S_g}A_S\,dS$ along the open path would have to be evaluated using an expression for A in terms of its source, namely the flux Φ in the Rowland ring. We may
use the Coulomb gauge, ∇·A = 0, in this static case. Then, with the boundary condition |A| → 0 as |x| → ∞, A becomes unique; that is, there is no gauge freedom left (see for instance Ref. [71]). In terms of its source, namely the flux Φ in the Rowland ring, the expression for A in the space external to the Rowland ring is $\mathbf{A} = (\Phi/4\pi)\nabla\Omega$, where Ω is the solid angle subtended by the Rowland ring at the observation point x (see for instance Ref. [72]); with this normalization the circulation of A around the flux tube, across which Ω changes by 4π, equals Φ. With such an expression for A one finds $\int_0^{S_g}A_S\,dS = \Phi\sin\theta_0$, with the Rowland ring placed midway between the source and the grid, so that $\theta_0 = \tan^{-1}(L_g/2r_0)$, where $r_0$ is the radius of the Rowland ring. Clearly the integral $\int_0^{S_g}A_S\,dS$ along the open path is also gauge invariant, because its value $\Phi\sin\theta_0$ involves only the flux Φ, which is gauge invariant. The geometrical factor in (8.40) is thus $g = \sin\theta_0$. We discuss in Section 8.6 the precise manner in which the effects of the vector potential $A_S$ can be detected experimentally in accordance with relation (8.40); there we describe the experiment and present experimental results on its observation.

8.3. Guiding centre equations of motion—the adiabatic limit

We shall now consider the adiabatic limit of the three-dimensional Schrödinger-like equation (8.30), taken formally as μ → 0, and show how the well-known guiding centre equations of motion follow from this equation in this limit. This, as was already pointed out in Section 1, is similar to the classical limit of the quantum mechanical Schrödinger equation taken through ℏ → 0. In an analogous manner we seek a WKB solution to Eq. (8.30) by writing

$$\psi^{(N)} = A\exp(iNR/\mu) , \qquad (8.41)$$

where R is the action (the Hamilton principal function) for the guiding centre motion. Substituting (8.41) in (8.30) yields the following equations:

$$\frac{\partial R}{\partial t} + \frac{1}{2m}\left(\nabla R - \frac{e}{c}\mathbf{A}\right)^2 + \phi - \frac{\mu^2}{2mN^2}\frac{\nabla^2 A}{A} = 0 \qquad (8.42)$$

and

$$\frac{\partial A^2}{\partial t} + \nabla\cdot(A^2\mathbf{V}) = 0 . \qquad (8.43)$$
In the limit μ → 0, Eq. (8.42) reduces to the Hamilton–Jacobi equation for the guiding centre, while (8.43) represents the equation of continuity for the guiding centre probability density A². Note that the ∇ operator in (8.42) and (8.43) is with respect to the guiding centre position X, and P = ∇R represents the guiding centre canonical momentum. The guiding centre equation of motion is obtained by applying the ∇ operator to (8.42) and using P = ∇R:

$$\frac{\partial\mathbf{P}}{\partial t} + (\mathbf{V}\cdot\nabla)\left(\nabla R - \frac{e}{c}\mathbf{A}\right) + \mathbf{V}\times\left[\nabla\times\left(\nabla R - \frac{e}{c}\mathbf{A}\right)\right] + \nabla\phi = 0 , \qquad (8.44)$$

where the last term in (8.42) has been neglected as being small in the limit μ → 0. Here V is the guiding centre velocity, defined by

$$m\mathbf{V} = \nabla R - \frac{e}{c}\mathbf{A} . \qquad (8.45)$$
With (8.45), Eq. (8.44) then gives

$$m\frac{d\mathbf{V}}{dt} = \frac{e}{c}\mathbf{V}\times\mathbf{B} - \nabla\phi - \frac{e}{c}\frac{\partial\mathbf{A}}{\partial t} , \qquad (8.46)$$

where $-(1/c)\,\partial\mathbf{A}/\partial t$ represents the induction electric field $\mathbf{E}^{(i)}$. Eq. (8.46) is clearly the usual guiding centre equation of motion, as already presented earlier (Eq. (3.21)), except for the electric field term eE, which is here purely an induction field; V is essentially $\dot{\mathbf{X}}$ in Eq. (3.21). Eq. (8.46) can be split into "parallel" and perpendicular components:

$$m\frac{dv_\parallel}{dt} = -\nabla_\parallel\phi , \qquad (8.47)$$

$$m\frac{d\mathbf{V}_\perp}{dt} = \frac{e}{c}\mathbf{V}_\perp\times\mathbf{B} - \nabla_\perp\phi , \qquad (8.48)$$

where we have ignored the induction electric field, since we consider here only a static magnetic field. Eqs. (8.47) and (8.48) are essentially the guiding centre equations (3.26) and (3.32) given earlier, barring the electric field terms $E_\parallel$ and $\mathbf{E}_\perp$, which are absent here; here $v_\parallel = \hat{\mathbf{e}}\cdot\mathbf{V}$ and $\mathbf{V}_\perp = \mathbf{V} - v_\parallel\hat{\mathbf{e}}$. Eq. (8.48), solved for $\mathbf{V}_\perp$ as in (3.33), yields the ∇B and polarization drifts, the ∇B drift being

$$\mathbf{V}_\perp = \frac{c\,\mathbf{B}\times\nabla\phi}{eB^2} . \qquad (8.49)$$

We are thus able to obtain the essential guiding centre dynamics from the Schrödinger-like equation (8.30), with (8.47) describing the motion along the field line and (8.48) describing the ∇B drift.

8.4. Nonaxisymmetric magnetic field and the "longitudinal invariant"

If we consider the energy E and the action invariant μ of the particle to be such that it is trapped between two mirror points in a given magnetic field configuration, as explained in Section 3.4.1.1, then, as was discussed in Section 3.4.1.3, the trapped particle has a periodic motion, and there exists a "longitudinal invariant" J given by [5]

$$J = \oint [2m(E-\phi)]^{1/2}\,dS ,$$

where the circuit integral is carried out over a bounce period between the two turning points. When the magnetic field configuration is axially symmetric, the ∇B drift given by Eq. (8.49) is purely azimuthal; it transports the guiding centre of the particle to equivalent field lines around the axis of symmetry, and the longitudinal invariant J is trivially conserved. If, however, the field is weakly axially asymmetric, then the azimuthal drift would transport the guiding centre to inequivalent field lines, which would, in general, lead to a change in the turning points. J is then an "adiabatic invariant" provided that $(\omega_D\tau_b)\,\partial(\ln B)/\partial\varphi \ll 1$, where $\omega_D = v_D/R$ is the angular drift velocity, $\tau_b$ is the bounce period between the turning points along the magnetic field, and $\partial(\ln B)/\partial\varphi$ measures the azimuthal inhomogeneity of the magnetic field. We shall now first try to understand the longitudinal invariant in the framework of the amplitude equation (8.30) for the guiding centre. Consider Eq. (8.30) for N = 1; making use of the weak φ-dependence of the magnetic field, we write

$$\psi = \hat\psi(R,\varphi)\exp\left\{\frac{i}{\mu}\int_0^S dS\,[2m(E-\phi)]^{1/2}\right\} , \qquad (8.50)$$
where the time dependence of ψ is taken to be of the form exp(−iEt/μ) because of the time stationarity of the problem, and where $\hat\psi(R,\varphi)$ is the weakly φ-dependent part of the wave function. If the motion along the S-coordinate is bounded and periodic, then the value of the action

$$\mathcal{S} = \int dS\,[2m(E-\phi)]^{1/2} \qquad (8.51)$$

would increase by the quantity

$$J = \oint dS\,[2m(E-\phi)]^{1/2} \qquad (8.52)$$

every time the coordinate S of the particle completes a bounce with the period

$$\tau_b = \oint\frac{dS}{[2(E-\phi)/m]^{1/2}} . \qquad (8.53)$$
If ϑ is the corresponding angular coordinate representing the periodic bounce motion, then (J, ϑ) are the "action-angle" pair for the longitudinal motion. Expression (8.50) can then be written as

$$\psi = \hat\psi(R,\varphi)\exp(iJ\vartheta/\mu) . \qquad (8.54)$$
For ψ to be single valued we must have

$$J = 2\pi j\mu , \qquad (8.55)$$
with j being an integer. This is then an eigenvalue equation, in which j is a "quantum number". When the guiding centre Hamiltonian is a weak function of φ, as assumed, the "quantum number" j remains unchanged as the system evolves, just as in quantum mechanics in the adiabatic approximation, and is thus an "adiabatic invariant". Since J is just the longitudinal action, its adiabatic invariance follows, in this framework, from that of j through relation (8.55). Note that from (8.52)

$$\frac{\partial J}{\partial E} = \oint\frac{dS}{[2(E-\phi)/m]^{1/2}} = \tau_b . \qquad (8.56)$$

Thus the bounce motion can be described by a Hamiltonian K in the action-angle variables (J, ϑ),

$$K = J\omega_b , \qquad (8.57)$$

which yields the Hamilton equations

$$\dot J = -\frac{\partial K}{\partial\vartheta} = 0 , \qquad \dot\vartheta = \frac{\partial K}{\partial J} = \frac{2\pi}{\tau_b} = \omega_b . \qquad (8.58)$$
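Relations (8.52), (8.53) and (8.56) can be checked numerically. The sketch below is an illustration only: it assumes a symmetric well φ(S) that increases monotonically with |S|, and the test uses the hypothetical potential φ(S) = φ0 + ½kS² with m = 1, for which J = 2π(E − φ0)/ω and τ_b = 2π/ω with ω = (k/m)^{1/2}. The substitution S = S_t sin u removes the inverse-square-root behaviour at the turning points ±S_t.

```python
import math


def turning_point(energy, phi, s_max=1e3):
    """Bisection for S_t > 0 with phi(S_t) = energy.

    Assumes phi is even, increasing in |S|, and phi(s_max) > energy.
    """
    lo, hi = 0.0, s_max
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if phi(mid) < energy:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def _midpoint(f, a, b, n):
    """Composite midpoint rule (never evaluates f at the end points)."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def longitudinal_action(energy, phi, m, n=4000):
    """J = closed integral of [2m(E - phi)]^(1/2) dS, Eq. (8.52)."""
    st = turning_point(energy, phi)

    def integrand(u):
        s = st * math.sin(u)
        return math.sqrt(max(2.0 * m * (energy - phi(s)), 0.0)) * st * math.cos(u)

    # Factor 2: there and back between -S_t and +S_t.
    return 2.0 * _midpoint(integrand, -math.pi / 2, math.pi / 2, n)


def bounce_period(energy, phi, m, n=4000):
    """tau_b = closed integral of dS / v with v = [2(E - phi)/m]^(1/2), Eq. (8.53)."""
    st = turning_point(energy, phi)

    def integrand(u):
        s = st * math.sin(u)
        v = math.sqrt(max(2.0 * (energy - phi(s)) / m, 1e-300))
        return st * math.cos(u) / v

    return 2.0 * _midpoint(integrand, -math.pi / 2, math.pi / 2, n)
```

A central finite difference of `longitudinal_action` with respect to the energy then reproduces `bounce_period`, which is the content of (8.56).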
8.5. Nonaxisymmetric magnetic field and the transition across magnetic surfaces

As was mentioned earlier, when the magnetic vector potential $A_\varphi$, and therefore also the magnetic field, is axisymmetric, the canonical angular momentum is conserved. In terms of the
Schrödinger-like probability amplitude equation (8.30), this corresponds to a solution for $\psi^{(N)}$ of the form

$$\psi^{(1)} \sim e^{i\ell\varphi} \qquad (8.59)$$
for the φ-dependence of the wave function $\psi^{(1)}$ for the mode N = 1. The conserved canonical momentum $P_\varphi$ is then equal to ℓμ and is given by

$$P_\varphi = \ell\mu \simeq mR^2\dot\varphi + \frac{e}{c}RA_\varphi , \qquad (8.60)$$

where $R\dot\varphi$ is the guiding centre azimuthal velocity given by (8.49), so that

$$mR^2\dot\varphi = R\mu\frac{|\nabla_\perp B|}{B} = mR^2\omega_D . \qquad (8.61)$$

Since $|\nabla_\perp B|/B \sim 1/R_c$, where $R_c$ is the radius of curvature of the field line, $mR^2\dot\varphi \sim \mu R/R_c \sim \epsilon\mu$; $(e/c)RA_\varphi$ is thus the dominant term in expression (8.60) for $P_\varphi$. The conservation of the canonical angular momentum thus implies that, for an axisymmetric magnetic field, the guiding centre stays on the flux surface given by $F = RA_\varphi$. When the magnetic field has an azimuthal inhomogeneity, the guiding centre can make transitions to different values of the azimuthal quantum number ℓ, and therefore to different flux surfaces $F = RA_\varphi$. The probability of such transitions can be calculated, as in quantum mechanics, from the Schrödinger-like equation (8.30) for the guiding centre. If we take $A_Y = 0$, $A_S = 0$, Eq. (8.30) can be written in the form

$$\frac{i\mu}{N}\frac{\partial\psi^{(N)}}{\partial t} = \frac{1}{2m}\left[P_Y^2 + \frac{1}{R^2}\left(P_\varphi - \frac{e}{c}RA_\varphi\right)^2 + P_S^2\right]\psi^{(N)} + \phi\,\psi^{(N)} , \qquad (8.62)$$

where

$$P_Y = \frac{\mu}{i}\frac{\partial}{\partial Y} , \qquad P_\varphi = \frac{\mu}{i}\frac{\partial}{\partial\varphi} , \qquad P_S = \frac{\mu}{i}\frac{\partial}{\partial S} , \qquad (8.63)$$

and where R is understood to be a function of (Y, φ, S), in general. Taking the time dependence of $\psi^{(1)}$ to be of the form

$$\psi^{(1)} \sim \exp(-iEt/\mu) \qquad (8.64)$$

for a stationary state, we obtain from (8.62)

$$\frac{1}{2m}\left[P_Y^2 + \frac{1}{R^2}\left(P_\varphi - \frac{e}{c}RA_\varphi\right)^2 + P_S^2\right]\psi + \phi\,\psi = E\psi . \qquad (8.65)$$
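We next introduce an operator U, given by

$$U = \frac{1}{R}\left(P_\varphi - \frac{e}{c}RA_\varphi\right) - mR\omega_D , \qquad (8.66)$$

where $\omega_D$ is the azimuthal angular drift speed defined by (8.61). We then find the "commutation relation"

$$P_YU - UP_Y = -\frac{\mu}{iR}\frac{\partial R}{\partial Y}\,\frac{1}{R}\left(P_\varphi - \frac{e}{c}RA_\varphi\right) - \frac{\mu}{i}m\left[\Omega + \frac{\partial}{\partial Y}(R\omega_D)\right] , \qquad (8.67)$$

where $\Omega = eB/mc$ is the gyrofrequency, and we have used $(e/cR)\,\partial(RA_\varphi)/\partial Y \simeq eB/c = m\Omega$ for the coordinate Y normal to the flux surfaces.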
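Making use of (8.60), $(P_\varphi - (e/c)RA_\varphi)/R \simeq mR\omega_D$, the first term on the right-hand side of (8.67) is $-(\mu/i)m\omega_D\,\partial R/\partial Y$, which is small compared with $(\mu/i)m\Omega$. Also $\partial(R\omega_D)/\partial Y \ll \Omega$, and therefore, neglecting these smaller terms, we get

$$P_YU - UP_Y = -\frac{\mu}{i}m\Omega . \qquad (8.68)$$

On the other hand, a part of the Hamiltonian operator of (8.65), namely

$$\hat H = \frac{1}{2m}\left[P_Y^2 + \frac{1}{R^2}\left(P_\varphi - \frac{e}{c}RA_\varphi\right)^2\right] , \qquad (8.69)$$

can be expressed in terms of the operator U of (8.66) instead of $P_\varphi$ (see, for example, Gol’dman et al. [39]):

$$\hat H = \frac{1}{2m}\left(P_Y^2 + U^2\right) + \omega_D\left(P_\varphi - \frac{e}{c}RA_\varphi\right) - \frac{1}{2}m(R\omega_D)^2 . \qquad (8.70)$$

If we define the "creation" and "annihilation" operators as

$$a^\dagger = (2m\mu\Omega)^{-1/2}(P_Y - iU) , \qquad a = (2m\mu\Omega)^{-1/2}(P_Y + iU) , \qquad (8.71)$$

then, making use of the commutation relation (8.68), we find

$$\frac{1}{2m}\left(P_Y^2 + U^2\right) = \epsilon^2\mu\Omega\left(a^\dagger a + \frac{1}{2}\right) , \qquad (8.72)$$

where we have expressed the order of the operator in (8.72) explicitly as ε², since $\partial/R\partial\varphi \sim \partial/\partial Y \sim \epsilon\,(\partial/\partial y)$. Using (8.69)–(8.72) in Eq. (8.65) we obtain

$$\left[\epsilon^2\mu\Omega\left(a^\dagger a + \frac{1}{2}\right) + \omega_D\left(\frac{\mu}{i}\frac{\partial}{\partial\varphi} - \frac{e}{c}RA_\varphi\right) - \frac{1}{2}m(R\omega_D)^2 - \frac{\mu^2}{2m}\frac{\partial^2}{\partial S^2} + \phi\right]\psi = E\psi . \qquad (8.73)$$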
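The step from (8.68) and (8.71) to (8.72) is the standard harmonic-oscillator algebra. Writing s = mμΩ, relation (8.68) reads $P_YU - UP_Y = is$, and (8.71) inverts to $P_Y = (s/2)^{1/2}(a + a^\dagger)$, $U = -i(s/2)^{1/2}(a - a^\dagger)$. The sketch below is an illustration, with a hypothetical s = 2.5 and 8 levels; it checks on truncated matrices that the commutator reproduces is and that $(P_Y^2 + U^2)/s = 2a^\dagger a + 1$, i.e. $(P_Y^2 + U^2)/2m = \mu\Omega(a^\dagger a + 1/2)$, which is (8.72) without the ordering factor ε². Only the highest retained level deviates, an artefact of truncating the ladder.

```python
import math


def ladder(n_max):
    """Truncated annihilation operator: a|n> = sqrt(n)|n-1>, as a list of rows."""
    a = [[0j] * n_max for _ in range(n_max)]
    for n in range(1, n_max):
        a[n - 1][n] = complex(math.sqrt(n))
    return a


def dagger(m):
    """Hermitian conjugate of a square matrix."""
    size = len(m)
    return [[m[j][i].conjugate() for j in range(size)] for i in range(size)]


def matmul(x, y):
    size = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def add(x, y, cx=1.0, cy=1.0):
    """Return cx*x + cy*y for square matrices."""
    size = len(x)
    return [[cx * x[i][j] + cy * y[i][j] for j in range(size)] for i in range(size)]


def oscillator_check(n_max, s):
    """Diagonals of [P_Y, U], (P_Y^2 + U^2)/s and 2 a^dag a + 1, with s = m*mu*Omega."""
    a = ladder(n_max)
    ad = dagger(a)
    r = math.sqrt(s / 2.0)
    p = add(a, ad, r, r)              # P_Y = sqrt(s/2) (a + a^dag)
    u = add(a, ad, -1j * r, 1j * r)   # U   = -i sqrt(s/2) (a - a^dag)
    comm = add(matmul(p, u), matmul(u, p), 1.0, -1.0)
    lhs = add(matmul(p, p), matmul(u, u))
    num = matmul(ad, a)
    comm_diag = [comm[i][i] for i in range(n_max)]
    lhs_diag = [lhs[i][i] / s for i in range(n_max)]
    rhs_diag = [2.0 * num[i][i] + 1.0 for i in range(n_max)]
    return comm_diag, lhs_diag, rhs_diag
```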
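Note that the number operator $a^\dagger a$ corresponds to the action variable pertaining to the two variables $P_Y$ and U, as in a harmonic oscillator. The eigenfunction corresponding to the operator $a^\dagger a$ is given by

$$P_q = \left(\frac{m\Omega}{\pi\mu}\right)^{1/4}\frac{1}{(2^q q!)^{1/2}}\,H_q\!\left[\left(\frac{m\Omega}{\mu}\right)^{1/2}\left(Y - Y_0 - \frac{\mu c\,|\nabla_\perp B|}{eB^2}\right)\right]\exp\left[-\frac{m\Omega}{2\mu}\left(Y - Y_0 - \frac{\mu c\,|\nabla_\perp B|}{eB^2}\right)^2\right] , \qquad (8.74)$$

where $Y_0$ is the root of the equation

$$\ell\mu = \frac{e}{c}\left[RA_\varphi(Y_0)\right] \qquad (8.75)$$

and represents the Y-coordinate of the guiding centre, and the $H_q$ are Hermite polynomials. The complete wave function, as the solution of Eq. (8.73), is given by

$$\psi = e^{i\ell\varphi}P_q^{(\ell)}(Y)\,\chi_{q\ell}(S) , \qquad (8.76)$$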
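with $\chi_{q\ell}(S)$ being governed by the eigenvalue equation

$$-\frac{\mu^2}{2m}\frac{d^2\chi}{dS^2} + \epsilon^2\mu\Omega\left(q + \frac{1}{2}\right)\chi + \left[\omega_D\left(\ell\mu - \frac{e}{c}RA_\varphi\right) - \frac{1}{2}m(R\omega_D)^2 + \phi\right]\chi = E\chi , \qquad (8.77)$$

where the subscripts on χ have been dropped, and the term $\epsilon^2\mu\Omega(q + 1/2)$, arising from the first term of (8.73), will be dropped as being small (∼ε²). This equation then yields WKB solutions of the form

$$\chi = \frac{1}{\sqrt{|P_b|}}\exp\left(\pm\frac{i}{\mu}\int_a^S P_b\,dS\right) , \qquad (8.78)$$

where $P_b$ is the momentum of the bounce motion, given by

$$P_b = \left\{2m\left[E - \phi - \omega_D\left(\ell\mu - \frac{e}{c}RA_\varphi\right) + \frac{1}{2}m(R\omega_D)^2\right]\right\}^{1/2} . \qquad (8.79)$$

If we introduce an action-angle pair $(J_b, \vartheta)$ through

$$P_b = \left(\frac{m\omega_bJ_b}{\pi}\right)^{1/2}\cos\vartheta , \qquad S = \left(\frac{J_b}{\pi m\omega_b}\right)^{1/2}\sin\vartheta , \qquad (8.80)$$

the solution (8.78) takes the form

$$\chi = \frac{1}{\sqrt{|P_b|}}\exp\left(\pm\frac{i}{\mu}J_b\vartheta\right) , \qquad (8.81)$$

where the action $J_b$ is given by

$$J_b = \oint dS\left\{2m\left[E - \phi - \omega_D\left(\ell\mu - \frac{e}{c}RA_\varphi\right) + \frac{1}{2}m(R\omega_D)^2\right]\right\}^{1/2} , \qquad (8.82a)$$

and ϑ is the angle canonically conjugate to $J_b$. Note that if we use (8.60) to substitute for $(\ell\mu - (e/c)RA_\varphi)$, we obtain

$$J_b = \oint dS\left\{2m\left[E - \phi - \frac{1}{2}m(R\omega_D)^2\right]\right\}^{1/2} , \qquad (8.82b)$$

where $\frac{1}{2}m(R\omega_D)^2$ represents the energy of the azimuthal drift motion. If it is neglected as being small, (8.82b) reduces to expression (8.52) for J. The condition for the single-valuedness of χ in Eq. (8.81) then leads to the eigenvalue equation

$$J_b(\mathcal{E}, \ell) = 2\pi n\mu , \qquad (8.83)$$

where $J_b$ may be considered as a function of the energy $\mathcal{E} = E - \ell\mu\omega_D + (e/c)\omega_DRA_\varphi + \frac{1}{2}m(R\omega_D)^2$, E being the total energy. The bounce frequency $\omega_b$ may then be obtained as

$$\omega_b = \frac{2\pi}{\tau_b} = 2\pi\frac{\partial\mathcal{E}}{\partial J_b} = \frac{1}{\mu}\frac{\partial\mathcal{E}}{\partial n} , \qquad (8.84)$$

using $J_b = 2\pi n\mu$ from (8.83). Integrating (8.84) with respect to n yields

$$E = \mu(\ell\omega_D + n\omega_b) - \frac{e}{c}\omega_DRA_\varphi - \frac{1}{2}m(R\omega_D)^2 . \qquad (8.85)$$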
This is the form of the energy (Hamiltonian) in action-angle variables for the two degrees of freedom; the quantum number ℓ corresponds to the azimuthal motion with the drift frequency $\omega_D$, and n to the longitudinal bounce motion with frequency $\omega_b$. Note that the total energy of the bounce motion (libration), $n\mu\omega_b$, which includes φ as the potential energy, must obviously be greater than the latter: $n\mu\omega_b > \phi$. Since, however, φ is generally much greater than $\mu\omega_b$, we must have n ≫ 1. When the magnetic field is time dependent, the energy E of the particle would change in time, in general through both the terms $\ell\mu\omega_D$ and $n\mu\omega_b$. However, if the time dependence is slow enough that $\omega_bT \gg 1$ (T being the characteristic time scale of change), then the quantum number n, and hence the action J, is an invariant, and the part $n\mu\omega_b$ of the energy E changes only through the change in $\omega_b$. On the other hand, if $\omega_DT \sim 1$, ℓ will not be invariant, and $\ell\mu\omega_D$ may change through changes in both ℓ and $\omega_D$. When, however, the magnetic field is static, the total energy E remains constant, and the dynamics can lead only to a transfer of energy between the energy of libration (longitudinal bounce motion) and that of the drift motion. An azimuthal magnetic inhomogeneity which is not weak enough to guarantee the invariance of ℓ would induce transitions across the magnetic surfaces from ℓ to ℓ′ ≠ ℓ. If, in the process, the total energy E is to be conserved by virtue of the time independence of the Hamiltonian, its expression (8.85) would require $\omega_b$ and $\omega_D$ to change if the quantum number n is to remain invariant. We shall next evaluate the probabilities of transition across magnetic surfaces induced by the azimuthal inhomogeneity, using Eq. (8.73) in perturbation theory, as in quantum mechanics.

8.5.1. Transport across magnetic surfaces

Eq. (8.73), which we shall use to calculate the transitions, may be written in the form

$$H\psi = E\psi , \qquad (8.86)$$
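where the unperturbed Hamiltonian operator $H_o$ [neglecting the small term $\epsilon^2\mu\Omega(a^\dagger a + 1/2)$] is given by

$$H_o = -\frac{\mu^2}{2m}\frac{\partial^2}{\partial S^2} + \omega_D\left(\frac{\mu}{i}\frac{\partial}{\partial\varphi} - \frac{e}{c}RA_\varphi\right) - \frac{1}{2}m(R\omega_D)^2 + \phi , \qquad (8.87)$$

and the perturbation Hamiltonian, representing the azimuthal inhomogeneity, is given by

$$\tilde H = -\frac{e}{c}\omega_DR\tilde A_\varphi + \tilde\phi . \qquad (8.88)$$

The unperturbed eigenvalue equation is

$$H_oU_{\ell n} = E_{\ell n}U_{\ell n} , \qquad (8.89)$$

with the eigenfunction $U_{\ell n}$ given in the WKB approximation, in terms of the "angle" variables, by

$$U_{\ell n} = U_0\,e^{i\ell\varphi + in\vartheta} . \qquad (8.90)$$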
We shall now calculate the transition amplitude from an initial azimuthal quantum number ℓ to ℓ′ due to the azimuthal inhomogeneity. The equation for the perturbation is given by

$$\tilde HU_{\ell n} + H_o\tilde\psi = E_{\ell n}\tilde\psi + \tilde EU_{\ell n} , \qquad (8.91)$$
where $\tilde\psi$ is the perturbation wave amplitude. Following standard perturbation theory à la quantum mechanics, we expand

$$\tilde\psi = \sum_{jk}a_{jk}U_{jk} . \qquad (8.92)$$

Multiplying both sides of (8.91) by $U_{qr}^*$ and integrating over φ and ϑ, we obtain, on using the expansion (8.92),

$$\langle U_{qr}|\tilde H|U_{\ell n}\rangle + \sum_{jk}a_{jk}E_{jk}\,\delta_{jq}\delta_{kr} = E_{\ell n}\sum_{jk}a_{jk}\,\delta_{jq}\delta_{kr} + \tilde E\,\delta_{\ell q}\delta_{nr} , \qquad (8.93)$$

whence we obtain

$$\tilde E_{\ell n} = \langle U_{\ell n}|\tilde H|U_{\ell n}\rangle \qquad (8.94)$$

and

$$a^{(\ell n)}_{qr} = -\frac{\langle U_{qr}|\tilde H|U_{\ell n}\rangle}{E_{qr} - E_{\ell n}} \qquad (8.95a)$$

$$\phantom{a^{(\ell n)}_{qr}} = -|U_0|^2\frac{\tilde H(q-\ell, r-n)}{\mu[(q-\ell)\omega_D + (r-n)\omega_b]} . \qquad (8.95b)$$

This may be written as

$$a_{jk} = -|U_0|^2\frac{\tilde H(j,k)}{\mu[j\omega_D + k\omega_b]} , \qquad (8.96)$$

where $\tilde H(j,k)$ is the (j, k) Fourier component of the perturbation with respect to the angles φ and ϑ, with j = q − ℓ and k = r − n, and where (8.85) has been used to substitute for $E_{qr}$ and $E_{\ell n}$. We have redesignated $a^{(\ell n)}_{qr}$ as $a_{jk}$ in (8.96). It may be noted that expression (8.96) for $a_{jk}$ is, apart from a constant, identical with the expression for the Fourier component $\tilde S_{jk}$ of the perturbation correction to the action S as the generating function (see Eq. (17.14) of Ref. [70]) corresponding to the perturbation Hamiltonian $\tilde H$. The denominator $(j\omega_D + k\omega_b)$ in (8.96) may become small in some cases; this signals the breakdown of perturbation theory. For commensurate frequencies $\omega_D$ and $\omega_b$ (a rational frequency ratio), the denominator vanishes for some (j, k), and perturbation theory breaks down completely. The probability of transition from the state $U_{\ell n}$ to $U_{qr}$ as a consequence of the perturbation $\tilde H$ is given by

$$|a^{(\ell n)}_{qr}|^2 = |U_0|^4\frac{|\tilde H(q-\ell, r-n)|^2}{\mu^2[(q-\ell)\omega_D + (r-n)\omega_b]^2} . \qquad (8.97)$$

If the perturbation with respect to ϑ is slow enough that n is an adiabatic invariant, we may write r = n. Then (8.97) yields

$$|a^{(\ell n)}_{qn}|^2 = |U_0|^4\frac{|\tilde H(q-\ell)|^2}{\mu^2[(q-\ell)\omega_D]^2} , \qquad (8.98)$$

which gives the probability of transition across the flux surfaces, from that specified by M = ℓ to M = q, induced by the Fourier component j = q − ℓ of the perturbation $\tilde H$.
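Expressions (8.96)–(8.98) translate directly into a small routine. The sketch below is illustrative: it assumes the Fourier components $\tilde H(j,k)$ are supplied as a dictionary (the values in the test are hypothetical), and it reports, rather than evaluates, the components whose denominator $j\omega_D + k\omega_b$ falls below a chosen cutoff, where the text notes that perturbation theory breaks down. The cutoff `min_denominator` is an arbitrary illustration parameter, not part of the original analysis.

```python
def transition_probabilities(h_fourier, omega_d, omega_b, mu,
                             u0_sq=1.0, min_denominator=1e-6):
    """First-order transition probabilities |a_jk|^2 of Eqs. (8.96)-(8.97).

    h_fourier: dict mapping (j, k) -> complex Fourier component H~(j, k),
               with j = q - l (azimuthal) and k = r - n (bounce).
    u0_sq:     |U_0|^2 normalization of the unperturbed states.
    Returns (probs, resonant): probs maps (j, k) -> |a_jk|^2; resonant lists
    the (j, k) whose frequency denominator is below min_denominator.
    """
    probs, resonant = {}, []
    for (j, k), h in h_fourier.items():
        if (j, k) == (0, 0):
            continue  # diagonal element gives the energy shift, Eq. (8.94)
        freq = j * omega_d + k * omega_b
        if abs(freq) < min_denominator:
            resonant.append((j, k))  # near-commensurate: expansion invalid
            continue
        a_jk = -u0_sq * h / (mu * freq)
        probs[(j, k)] = abs(a_jk) ** 2
    return probs, resonant
```

Restricting the dictionary to components with k = 0 reproduces the adiabatic-n case (8.98).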
8.6. Observation of the curl free vector potential in the classical macrodomain

We now present the experimental observation of the curl free vector potential in the classical macrodomain, in the manner of the Aharonov–Bohm effect in the quantum domain. The basic concept of the experiment was discussed in Section 8.2. We describe here the actual experiment carried out [73] and the results obtained. It may be remarked that the detection of a curl free vector potential in the classical macrodomain, as predicted by the formalism represented by Eqs. (8.30) and (8.31), would constitute a rather spectacular observation, which is entirely unexpected and would entail an enlargement of our conceptual understanding of charged particle dynamics.

Carrying on from Eq. (8.39) of Section 8.2, we consider a small spread in μ, which is usually present in an experiment, and expand both sides of (8.39) around a mean value, $\mu = \bar\mu + \delta\mu$. This yields [recalling that v in Eq. (8.39) is given by (8.34)] the two equations

$$\int_{-L/2}^{L/2} ds\left(mv + \frac{e}{c}\tilde A\right) = 2\pi k\bar\mu , \qquad k = 1, 2, 3, \ldots \qquad (8.99a)$$

and

$$L = 2\pi l\,\bar v/\bar\Omega , \qquad l = 1, 2, 3, \ldots , \qquad (8.99b)$$
where (8.99a) holds for the mean $\bar\mu$ and (8.99b) to first order in δμ; both must hold simultaneously. The subscripts on v and $\tilde A$ have been dropped. It may be recalled that relation (8.99b) is essentially the one which describes the interference maxima already reported earlier [60,65] and reviewed in Section 7. Here L denotes the gun–plate distance (with the grid close by in front), $\bar v$ the average electron velocity along the magnetic field, and $\bar\Omega$ the mean gyrofrequency. It is to be noted that this relation is unaffected by the presence of a curl free vector potential. Assuming the curl free vector potential $\tilde A$ to be produced by a confined magnetic flux Φ in a torus of radius $r_0$ located at s = 0, (8.99a) gives, as explained in Section 8.2,

$$m\bar vL + \frac{e}{c}\Phi\sin\theta_0 = 2\pi k\bar\mu , \qquad (8.100)$$

where $\theta_0 = \tan^{-1}(L/2r_0)$. Note that relation (8.99b) does not involve Φ, while a change of the magnetic field B does not affect the toroidal flux confined in the torus. The experiment is typically carried out by varying the external magnetic field, for a given electron energy E and gun–grid distance L, so as to satisfy (8.99b) for a given l; this is indicated by the appearance of a detector current maximum. Satisfying this condition is important for the observability of the effect of the curl free vector potential. If, in this state, the flux Φ in the toroid is varied (by varying the current producing it), Eq. (8.100) gives the following condition for the differences ΔΦ and Δk:

$$\frac{e}{c}\Delta\Phi\sin\theta_0 = 2\pi(\Delta k)\bar\mu , \qquad \Delta k = 1, 2, 3, \ldots \qquad (8.101)$$

A change ΔΦ induces a change $e\Delta\Phi\sin\theta_0/(c\bar\mu)$ in the phase of the macroscopic wave function. Thus, whenever this phase change equals 2π(Δk), Δk = 1, 2, 3, ..., which is essentially Eq. (8.101), there is a recurrence of the interference maximum at which the system is tuned in view of (8.99b). To incorporate (8.99b) into (8.99a), divide (8.101) by (8.99b) for
l = 1, which yields

$$\Delta\Phi = \frac{c}{2e}(mvL)\,\frac{\bar B}{B_0}\,\frac{\sin\alpha\tan\alpha}{\sin\theta_0}\,(\Delta k) , \qquad (8.102)$$
where α is the initial pitch angle of the particle, $B_0$ the magnetic field at the point of injection, and $\bar B$ the average magnetic field over the field line between the gun and the detector. One would thus expect to get a series of maxima corresponding to Δk = 1, 2, 3, ... as the flux Φ changes by the values given by (8.102), with ΔΦ for Δk = 1 varying with E as $E^{1/2}$.

An electron beam of very low current (∼20–60 nA), from an electron gun capable of energies in the range 0–2 keV, is injected at a small angle to the magnetic field into a vacuum glass chamber ∼50 cm long and 9.3 cm in diameter, evacuated to about 10⁻⁶ torr. The low beam current ensures that inter-electron collisions are negligible and collective effects are absent. Fig. 15(a) gives a schematic scale drawing of the apparatus showing the position of the electron gun, the Faraday cup detector and the Rowland ring, along with the coils for the external field. Fig. 15(b) gives the variation of the external axial magnetic field along the axis, as measured by the magnetometer probe. The curl free vector potential whose effect on the electron motion is to be tested experimentally is produced by a solenoid ring (the Rowland ring), which is suspended in the path of the electron beam between the gun and the detector, with its plane normal to the axis, such that the electron beam passes through the ring centre. The details of the experimental set-up are given in Ref. [72]. The experiment is now carried out by choosing a particular energy E for the electron beam and a particular distance L between the electron gun and the detector. For most of the results reported here, these are chosen to be E = 600, 800 and 1200 eV and L = 30 cm. The values of the magnetic field corresponding to condition (8.99b) were found to be 18.9, 22.5 and 27.0 G for the energies E = 600, 800 and 1200 eV, respectively.
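Relation (8.102) fixes how the flux period should scale with beam energy. The sketch below evaluates it in Gaussian units under stated assumptions: non-relativistic electrons (adequate at ∼1 keV), v taken as the total speed, and the estimates $\bar B/B_0 \sim 2$ and α ≈ 15° quoted later in this section. θ0 is left as an input because the ring radius $r_0$ is not given here; the 45° used in the check is hypothetical. Since v ∝ $E^{1/2}$, the ratio ΔΦ(1200 eV)/ΔΦ(600 eV) should equal √2 whatever the other parameters.

```python
import math

# Gaussian (CGS) constants
E_CHARGE = 4.80320e-10      # electron charge, statC
C_LIGHT = 2.99792458e10     # speed of light, cm/s
M_E = 9.10938e-28           # electron mass, g
ERG_PER_EV = 1.602177e-12   # erg per eV


def electron_speed(energy_ev):
    """Non-relativistic electron speed (cm/s) for a kinetic energy in eV."""
    return math.sqrt(2.0 * energy_ev * ERG_PER_EV / M_E)


def flux_period(energy_ev, length_cm, b_ratio, alpha_rad, theta0_rad, dk=1):
    """Flux change Delta Phi (G cm^2) between maxima, following Eq. (8.102).

    b_ratio    = B_bar / B_0
    alpha_rad  = pitch angle at injection
    theta0_rad = tan^-1(L / 2 r0) for the Rowland ring (input, r0 not given)
    """
    v = electron_speed(energy_ev)
    geometry = math.sin(alpha_rad) * math.tan(alpha_rad) / math.sin(theta0_rad)
    return (C_LIGHT / (2.0 * E_CHARGE)) * M_E * v * length_cm * b_ratio * geometry * dk
```

Because the only energy dependence enters through `electron_speed`, the ΔI/E^{1/2} ratios of Table 10 are the natural experimental test of this formula.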
One should point out the sensitivity of the observed pattern (maxima and minima) to the departure of the magnetic field from the value which satisfies relation (8.99b). A departure by as little as 0.5–1 G is found to obliterate the observed pattern of maxima and minima, leading to an almost flat response. It is clearly seen from Fig. 16 that we do get a series of (about seven) maxima and minima, thus confirming, at least qualitatively to begin with, the expectations from (8.101) or, equivalently, from (8.102). It is, in fact, worth noting that the observed modulation is quite large, in the range of 25–30% of the mean plate current. The dilation of the peak-to-peak separation with increasing solenoidal current, noted in the various plots, is due to saturation of the ferrite core used. We present in Table 10 the changes ΔI in the current in the Rowland ring corresponding to a change of order Δk = 1, for the different electron energies E = 600, 800 and 1200 eV, for the plots in Fig. 16. Since the peak-to-peak separation is seen to dilate with the solenoid current, the change Δk = 1 is taken in the same region of the Rowland-ring current in the plots for the different electron energies. The ratio of ΔI to $E^{1/2}$ tabulated there for the various energies is found to be quite close to the average value 5.09 × 10⁻³ A eV^{−1/2}. Hence ΔI (and therefore ΔΦ) is indeed found to vary with E as $E^{1/2}$ in the experiment, as required by relation (8.102). We estimate α ≈ 15° in the present set-up and write ΔΦ = μ_m(Δb)σ, where b is the vacuum magnetic field in the ring due to the solenoid current, μ_m is the magnetic permeability of the ferrite, and σ is the cross-sectional area of the ring, σ = (π/4) cm². For the ring in the experiment, with 154 windings, b = 15.4 G/A (A being the current in amperes). Then using the values
Table 10. Current interval ΔI in the solenoid per order of the observed maxima, tabulated against the beam energy E (eV), for the plots of Figs. 16 and 17.

| E (eV) | $E^{1/2}$ (eV$^{1/2}$) | Fig. 16: ΔI (A) | Fig. 16: (ΔI/$E^{1/2}$) × 10³ (A eV$^{-1/2}$) | Fig. 17: ΔI (A) | Fig. 17: (ΔI/$E^{1/2}$) × 10³ (A eV$^{-1/2}$) |
|---|---|---|---|---|---|
| 600 | 24.5 | 0.125 | 5.10 | 0.238 | 9.7 |
| 800 | 28.3 | 0.145 | 5.12 | – | – |
| 900 | 30.0 | – | – | 0.288 | 9.6 |
| 1200 | 34.64 | 0.175 | 5.05 | 0.325 | 9.4 |
| Average (ΔI/$E^{1/2}$) × 10³ | | | 5.09 | | 9.58 |
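The $E^{1/2}$ scaling claimed for Table 10 can be rechecked from the tabulated ΔI alone. The sketch below simply recomputes the ratios and their averages from the transcribed values; its test asserts agreement with the tabulated averages to within 0.02 × 10⁻³ A eV^{−1/2}.

```python
import math

# Delta I (A) per order, transcribed from Table 10: (Fig. 16, Fig. 17);
# None marks an energy that was not measured for that figure.
TABLE_10 = {
    600: (0.125, 0.238),
    800: (0.145, None),
    900: (None, 0.288),
    1200: (0.175, 0.325),
}


def ratios(column):
    """(Delta I / E^(1/2)) * 1e3 in A eV^(-1/2); column 0 = Fig. 16, 1 = Fig. 17."""
    out = {}
    for energy, pair in TABLE_10.items():
        d_i = pair[column]
        if d_i is not None:
            out[energy] = 1e3 * d_i / math.sqrt(energy)
    return out


def mean(values):
    values = list(values)
    return sum(values) / len(values)
```

The recomputed ratios spread by under 2% (Fig. 16) and under 4% (Fig. 17) about their means, which is what supports the statement that ΔI, and hence ΔΦ, varies as $E^{1/2}$.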
Fig. 16. Variation of the (electron) plate current as a function of the current in the small solenoidal ring (placed inside the vacuum chamber) for the different energies and corresponding magnetic field values B: (A) E = 600 eV, B = 18.5 G; (B) E = 800 eV, B = 22.5 G; (C) E = 1200 eV, B = 27.0 G.
$\bar B/B_0 \sim 2$ (as estimated from the magnetic field variation of Fig. 15(b)), L = 30 cm and E = 800 eV, we get from (8.102)

$$\mu_m(\Delta b) = 100\sqrt{V}L\left(\frac{\sin\alpha\tan\alpha}{\sin\theta_0}\right)\Delta k = 277(\Delta k) , \qquad (8.103)$$

using the above-mentioned values of V, L, σ and α, V being the energy in electronvolts. For Δk = 1, Δb represents the consecutive inter-peak separation for E = 800 eV. From Fig. 16 it is found to be 2.62 G, using the calibration b = 15.4 G/A for the magnetic field in the ring. This yields a figure of μ_m ≈ 100 for the magnetic permeability of the ring material (ferrite), which is very close to the value of μ_m for the ferrite ring used. We have thus demonstrated the existence of an entirely novel and unexpected effect (in the classical macrodomain). Moreover, we have been able to show that it is indeed governed by relation (8.102) [which follows from (8.99a)–(8.101)], with some of the most crucial dependences, such as those on the electron energy and the magnetic field, having been verified.

The above experiment was repeated with another, cleaner source of the curl free vector potential in the path of the electron beam, namely a larger solenoidal ring of diameter ∼10 cm placed around the outside of the glass chamber. It is a cleaner source because the effect of the leakage field of the solenoid ring on the electron beam, if any, would be much smaller here, owing to the larger distance of the beam from the seat of the leakage field. This arrangement furthermore insulates the beam physically from the solenoidal ring, thereby eliminating any effect arising from a possible interaction with it. It has the additional advantage that the detector can now be moved right through the centre of the solenoid coil, so that the gun–detector distance can be changed over a larger range. The dependence of the behaviour of the detector on the distance also turns out to be in accordance with relation (8.102) and will be reported shortly. The experiment was carried out as before, now with L = 25 cm and E = 600, 900 and 1200 eV. The electron plate current variation for these energies as a function of the solenoidal current I is presented in Fig. 17.
The inter-peak separation ΔI was again found to be proportional to $E^{1/2}$, as indicated in Table 10, which gives the ratio ΔI/$E^{1/2}$ for the various energies centred closely around the mean 9.6 × 10⁻³ A eV^{−1/2}. After applying a number of other experimental checks, as described in Ref. [72], it is concluded that these effects are indeed genuine and can only be attributed to the effect of the magnetic vector potential on the electrons, in the manner of the Aharonov–Bohm effect of quantum mechanics and as described by (8.99a)–(8.100). As mentioned at the beginning, the importance of our observations lies in the fact that they signify the observability of the vector potential in the classical domain as well, though it is not known to be an observable classically. One can, of course, raise the question of the gauge invariance of the results. It may be noted that Eq. (8.100), from which Eqs. (8.101) and (8.102) arise, is gauge invariant because the flux Φ that it involves is a gauge-invariant quantity. Hence the results presented satisfy gauge invariance. This point is elaborated upon in Section 8.2, where it is shown how the line integral $\int_{-L/2}^{L/2}A\,dS$ is equal to $\Phi\sin\theta_0$. These observations would obviously raise a host of questions of a fundamental nature. The electrons clearly do sense the presence of a curl free vector potential, albeit under certain conditions
Fig. 17. Variation of the (electron) plate current as a function of the current in the large solenoidal ring (placed around the outside of the glass vacuum chamber) for the different electron energies E and corresponding magnetic field values B: (A) E = 600 eV, B = 24.5 G; (B) E = 900 eV, B = 29.3 G; (C) E = 1200 eV, B = 34.9 G.
(Eq. (8.99b)), which the Lorentz equation of motion does not permit. Clearly these results call for a reconciliation with the Lorentz equation of motion. It does seem that our understanding of charged particle dynamics may have to be enlarged. It may be noted that the Lorentz equation of motion describes a local evolution, while any phenomenon like the Aharonov–Bohm type of effect, which arises from the phase of the wave function, represents a global property of the system, related to the topology of the system configuration space, which cannot be captured by a local evolution equation like the Lorentz equation. These results thus establish the existence of a wave amplitude and an associated phase in the classical macrodomain (for this particular system at least), where classical mechanics is known to be operative. It would thus seem that the probability amplitude description, regarded as so uniquely characteristic of the quantum microdomain, extends well into the classical macrodomain. In fact, we have thus uncovered through these experiments a hitherto unknown, deeper layer of physics for charged particle dynamics in a magnetic field in the classical macrodomain, related to its global structure. As hinted above, it is possible that it is related to the topological structure of the system configuration space, in the spirit discussed by the author [61].
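The open-path value $\Phi\sin\theta_0$ on which the gauge-invariance argument rests can be checked numerically along the axis. This is a sketch under stated assumptions: the exterior potential is taken as $\mathbf{A} = (\Phi/4\pi)\nabla\Omega$, the normalization for which the circulation around the flux tube equals Φ; the ring radius $r_0$ = 2 cm and flux Φ = 1 are hypothetical; and the overall sign, which depends on the orientation of the flux relative to the direction of travel, is not tracked (magnitudes are compared).

```python
import math


def axis_solid_angle(z, r0):
    """Solid angle subtended at the axial point z by the aperture of a ring of radius r0.

    Continuous branch through the hole: 4*pi as z -> -inf, 2*pi at z = 0,
    0 as z -> +inf.
    """
    return 2.0 * math.pi * (1.0 - z / math.sqrt(z * z + r0 * r0))


def line_integral_A(flux, length, r0, n=20000):
    """Integral of A_z dz from -L/2 to L/2 on the axis, with A = (flux/4pi) grad Omega.

    Evaluated with the midpoint rule on dOmega/dz so it can be compared with
    the closed form flux * sin(theta0), theta0 = atan(L / (2 r0)).
    """
    h = length / n
    total = 0.0
    for i in range(n):
        z = -length / 2.0 + (i + 0.5) * h
        d_omega_dz = -2.0 * math.pi * r0 * r0 / (z * z + r0 * r0) ** 1.5
        total += d_omega_dz * h
    return flux / (4.0 * math.pi) * total
```

The 4π change of Ω between the far ends of the axis is what makes the full circulation equal Φ; an open path from −L/2 to L/2 picks up only the fraction $\sin\theta_0$ of it.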
9. Summarizing comments, discussion and future issues

In this concluding section we first present a retrospective view of the evolution and development of the main ideas described in this review. We highlight, in particular, the entirely new, hitherto unknown and unsuspected physical phenomena that have been unraveled through a series of theoretical and experimental investigations relating to the dynamics of charged particles in a magnetic field in the classical macrodomain (we call "classical macrodomain" the parameter domain of macroscopic dimensions where classical dynamics is supposed to operate). Later, we discuss some conceptual issues arising out of the rather extraordinary results obtained, for example, the observation of the curl free vector potential in the classical macrodomain à la the Aharonov–Bohm effect, as described in Section 8.6. These new and extraordinary physical phenomena have been unraveled through a new paradigm for charged particle dynamics in a magnetic field, developed by the author over the last three decades. The main objective of this review is to present the evolution of this new paradigm (for charged particle dynamics in a magnetic field), as compared and contrasted with the standard approach of nonlinear dynamics as applied to this system. The focal problem which originally served to define the point of departure between the two approaches is the rather fascinating and mathematically challenging problem of the "determination of residence times against nonadiabatic escape of charged particles from adiabatic magnetic traps" (this is defined and discussed in Sections 5 and 6). We shall, for brevity, refer to this problem as the "Exit Problem", as such a problem is usually referred to in mathematical parlance. The standard approach to the problem has been reviewed in Section 5, whose perusal would show why the Exit Problem is so mathematically challenging.
In Section 6.3 we compared the standard approach with the new paradigm in the light of the experimental results on the "Exit Problem". However, the new paradigm, by virtue of its (probability amplitude, Schrödinger-like) structure, led to further new predictions (matter wave phenomena and the observability of the curl-free vector potential in the macrodomain). These went far beyond its original objective, namely, to address the "Exit Problem", and they were subsequently confirmed experimentally (Sections 7 and 8.6). To make the review accessible to a general readership, it is desirable to review the basic concepts and paradigms that lead up to an understanding and appreciation of this problem. Some of these concepts, such as "adiabatic invariants", "adiabatic motion" and "nonadiabatic effects", have therefore been reviewed in Sections 2–4, which outline the early pioneering work on adiabatic invariants [6,17,21–23] and adiabatic motion [5]. The calculation of nonadiabatic changes in the gyroaction is an essential component of the standard approach to the "Exit Problem". Such calculations have been reviewed rather extensively in Section 4, where, for completeness, both purely time-dependent and purely space-dependent magnetic fields have been considered. Though the former are not relevant to the "Exit Problem", the calculations for this case carried out by some of the early workers [21,34] have been presented here. They are accompanied by some new methods based on well-defined procedures for both discontinuous and analytic forms of the magnetic field variation in time. Throughout these presentations, an analogy has been drawn between nonadiabatic effects on the one hand and quantum effects on the other, in some cases using methods of quantum mechanics to calculate nonadiabatic changes in the gyroaction.
For the purely spatially varying magnetic field, the quantum mechanical procedure
of Ref. [44] is pointed out to be particularly interesting and instructive, and in consonance with the new paradigm. The new paradigm for determining residence times in the Exit Problem originates in the author's work [55] three decades earlier. That work was largely intuitive and heuristic, based essentially on an analogy with the tunneling phenomenon in quantum mechanics. This has been elaborated in the Introduction [Section 1.2]. The derivation of the governing equations of the new paradigm in Ref. [55] is reviewed here in Section 6.1, essentially to highlight the nature of the intuitive step it represents. As a fulfillment of the analogy, the governing equations did turn out to be a set of Schrödinger-like equations (with the gyroaction appearing in the role of $\hbar$) for a set of functions $\psi^{(n)}$, which have the interpretation of probability amplitudes, as in quantum mechanics. A Schrödinger-like probability amplitude description of a classical mechanical system would be considered quite heretical. However, it described the experimentally determined residence times amazingly well [55]. Moreover, it made new predictions about the existence of additional residence times, which were entirely unexpected but were later confirmed experimentally with all the predicted characteristics [57]. This strengthened the conviction that these equations represent some basic truth, even though this derivation did not base them on any specific dynamical equation, classical or quantum. Given the classical mechanical parameter domain of the problem (the adiabatic magnetic trap), one would expect these equations to be derivable from classical dynamical equations. On the other hand, given their probability amplitude nature, they should also be related in some manner to the Schrödinger wave equation. The connection has indeed been demonstrated to exist both ways [58,64].
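The multiplicity of residence times mentioned above can be illustrated with a toy calculation. This section later summarizes the result: independent escape channels labelled by $N = 1, 2, 3, \ldots$ each have an escape probability of the form $\sim e^{-Nw/\varepsilon}$, so each channel carries its own residence time $\tau_N \sim e^{Nw/\varepsilon}$. The sketch below assumes exactly that form. The prefactor `tau0`, the ratio `w_over_eps` and the channel `weights` are hypothetical values, not taken from the experiments, and the code is an illustration rather than the author's calculation.

```python
import numpy as np

# Hypothetical parameters (illustrative only, not from the review):
tau0 = 1.0e-6          # prefactor setting the time unit, in seconds
w_over_eps = 2.0       # the exponent ratio w/epsilon
modes = np.array([1, 2, 3])

# Each independent channel N escapes with probability ~ exp(-N w/eps),
# so its residence time scales as exp(+N w/eps).
tau_N = tau0 * np.exp(modes * w_over_eps)

# Successive residence times differ by the constant factor exp(w/eps),
# i.e. ln(tau_N) is linear in N with slope w/eps.
ratios = tau_N[1:] / tau_N[:-1]
slope = np.polyfit(modes, np.log(tau_N), 1)[0]

# Hypothetical channel weights: the trapped population then decays as a sum
# of exponentials, one per residence time, instead of a single exponential.
weights = np.array([0.6, 0.3, 0.1])


def surviving_fraction(t):
    """Fraction of particles still trapped at time(s) t."""
    t = np.asarray(t, dtype=float)
    return np.sum(weights * np.exp(-t[..., None] / tau_N), axis=-1)


print("residence times:", tau_N)
print("ratios:", ratios, "expected:", np.exp(w_over_eps))
print("slope of ln(tau_N) vs N:", slope)
```

A single classical form $\tau_r \sim e^{1/\varepsilon}$ would give one straight-line decay on a log plot. The sum of exponentials above instead shows a sequence of distinct slopes, one for each channel.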
The author was first able to obtain these Schrödinger-like equations [58] starting from the classical Liouville equation for the system. This derivation, reviewed here in Section 6.2, produced these equations as what may be regarded as a Hilbert space representation of the latter. The Born-like connection with the probability density, already given in the heuristic derivation of Section 6.1, is obtained again here through the derivation itself. As the derivation makes evident, the linearity of the Schrödinger-like equations follows directly from that of the Liouville equation. The Schrödinger-like structure of this formalism prompts one to predict matter wave interference phenomena (in one dimension) for this classical mechanical system, even though the derivation has no quantum mechanical input. Such a prediction was, in fact, made in Ref. [58]. As we have seen in Section 7, such macroscopic matter wave interference phenomena, with wavelengths typically in the range 1–5 cm, have indeed been observed. Next, we have shown in Section 8 how these equations can also be derived from the Schrödinger wave equation [64]. The amplitude nature of these equations therefore follows directly from that of the Schrödinger equation and has unreserved validity, and so does the prediction of matter wave interference phenomena for this system in the macrodomain. As noted, an important fallout of this derivation is an identification between two indices. The mode number $n$ of the wave function $\psi^{(n)}$ of the Schrödinger-like equations (6.63), obtained from the Liouville equation, is identified with the index $N$ of $\psi^{(N)}$ in (8.30), obtained from quantum mechanics. In the quantum mechanical derivation, $N$ is the change in the Landau quantum number caused by the magnetic field inhomogeneity or any other perturbation. In fact, comparing Eqs. (6.53) and (8.28), we note that $n$ and $N$ play similar roles. Thus,
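if we were to sum Eq. (8.28) over $N$ after multiplying both sides by $e^{iN\theta}$, where $\theta = \left(\int_0^t L_A\,dt\right)/\mu$, we would get

$$\psi(S, \mathbf{X}_\perp, \theta, t+\tau) = \psi\left(S - v\tau,\; \mathbf{X}_\perp - \Delta\mathbf{X}_\perp,\; \theta - \frac{L_A\,\tau}{\mu},\; t\right), \qquad (9.1)$$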
which is a generalized form (including the X⊥ coordinates) of Eq. (6.51), but for the infinitesimal time $\tau = t - t'$. It is thus clear that $N$ may also be regarded as a Fourier index with respect to the action phase $\theta$ when (9.1) is so constituted, just as the index $n$ was in the derivation of Ref. [58] from the Liouville equation. The somewhat heterodox procedure used in that derivation (reviewed here in Section 6.2), which starts from the classical Liouville equation, therefore stands vindicated. In fact, by backtracking the steps used to arrive at Eq. (6.53), one would arrive at the Liouville equation. [One may note, parenthetically, how the classical functional dependence on $\theta$ appears here: in the spirit of the correspondence principle, it emerges as a Fourier sum over the quantum number intervals $N$.] This derivation also allows a very significant generalization. The equations can now include a curl-free vector potential component, besides the nonzero-curl component A4, which corresponds to the external magnetic field in which the particles move. As discussed in Section 8.2, this leads to the rather interesting prediction that the curl-free component of the vector potential is observable à la Aharonov–Bohm, but now in the macroscopic domain of a few centimeters. Such an effect has indeed been observed, as described in Section 8.6. These Schrödinger-like equations predicted three physical phenomena that were subsequently confirmed experimentally: (i) the existence of a multiplicity of residence times in the adiabatic trap, (ii) macroscopic matter wave interference phenomena, and (iii) the effect of a curl-free vector potential à la Aharonov–Bohm, but in the macrodomain of a few centimeters. All three present an enigma: they belong to the macrodomain of classical mechanics, yet cannot be understood in terms of the Lorentz equation of classical dynamics.
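The claim that $N$ acts as a Fourier index with respect to the action phase $\theta$ rests on an elementary shift property. If each mode amplitude picks up a phase factor $e^{-iN\delta}$, the resummed function $\sum_N \psi^{(N)} e^{iN\theta}$ is simply translated from $\theta$ to $\theta - \delta$. A minimal numerical check follows. The amplitudes are hypothetical, and `delta` merely stands in for the phase increment appearing in Eq. (9.1). It illustrates the Fourier bookkeeping only, not the derivation itself.

```python
import numpy as np

# Hypothetical mode amplitudes psi_N (illustrative values, not from the review).
modes = np.arange(-3, 4)
psi_N = np.exp(-0.5 * modes**2) * (1.0 + 0.3j * modes)


def synthesize(theta, amplitudes, mode_numbers):
    """Return sum_N amplitudes[N] * exp(i N theta) at each phase in theta."""
    theta = np.asarray(theta, dtype=float)
    phases = np.exp(1j * np.outer(mode_numbers, theta))   # (n_modes, n_theta)
    return np.sum(amplitudes[:, None] * phases, axis=0)


theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
delta = 0.7   # stands in for the phase increment accumulated in time tau

# Route 1: give each mode the phase factor exp(-i N delta), then resum.
via_modes = synthesize(theta, psi_N * np.exp(-1j * modes * delta), modes)
# Route 2: evaluate the resummed function directly at theta - delta.
via_shift = synthesize(theta - delta, psi_N, modes)

print(np.allclose(via_modes, via_shift))   # True: the two routes agree
```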
The enigma is sharpest for the observation of the curl-free vector potential, whose presence does not even affect the Lorentz equation of motion. It has been shown in Ref. [65] that the matter wave beats in the macrodomain reported there also cannot be understood in terms of the Lorentz equation. Finally, as discussed in detail in Section 6.3, the approach based on the classical dynamical equation, termed the "stochastic diffusion model" (reviewed in Section 5), has been found unable to reproduce the multiple residence times. These residence times have been established experimentally, in accordance with the predictions of the new paradigm. A remark on this last issue is pertinent. From the point of view of the standard approach (of which the "stochastic diffusion model" is one expression), particles leak from the trap (as discussed in Sections 1.2 and 5) because of nonadiabatic changes $\Delta\mu$ in the gyroaction. The expressions for these changes have been reviewed in Section 4. It is these expressions that have served as inputs for the "stochastic diffusion model". Recall that they have the characteristic form $e^{-G/\varepsilon}$ [see Eq. (4.93), for example]. The probability of escape in this model is thus of the form $\sim e^{-1/\varepsilon}$, leading to a residence time $\tau_r \sim e^{1/\varepsilon}$ [Eq. (5.28), for example]. The nonadiabaticity referred to here may be called "classical nonadiabaticity", since the nonadiabatic changes $\Delta\mu$ have been calculated using the classical equation of motion. In the Schrödinger-like formalism, on the other hand, we have transition amplitudes $\psi^{(N)}$ for different values of the change $N$ of the Landau quantum number $n_0 = \mu/\hbar \gg 1$. Transitions involving
a change in quantum numbers are, by definition, nonadiabatic transitions. The transition amplitudes $\psi^{(N)}$ for $N = 1, 2, 3, \ldots$ are independent of each other. As we have seen, they lead to escape probabilities of the form $\sim e^{-Nw/\varepsilon}$, with $N = 1, 2, 3, \ldots$ existing independently of each other [$w$ being an appropriate function of the mirror ratio, pitch angle, etc.; see Eq. (6.28)]. The associated nonadiabatic changes may thus be referred to as "quantum nonadiabaticity". This fact was already implicit in the treatment of Dykhne and Chaplik [44], who identified the classical nonadiabaticity with $N = 1$. They perhaps overlooked that the nonadiabatic changes corresponding to $N = 2, 3, 4, \ldots$, although exponentially smaller, are not additive to that for $N = 1$ but exist independently of it in the probability sense. This independent existence of the form $e^{-Nw/\varepsilon}$ for different values of $N$ accounts for the multiplicity of residence times of the form $\sim e^{Nw/\varepsilon}$. It should thus be clear that a treatment based on classical nonadiabaticity cannot capture this multiplicity of residence times, which arises from the independent transition amplitudes for $N = 1, 2, 3, \ldots$. This is, of course, again enigmatic, because the physical dimensions of the system studied are well into the classical mechanical regime, and phenomena in this regime should be described by classical dynamical equations. We are thus faced with a severe paradox. The observations of the three physical phenomena mentioned above are a manifestation of the matter wave description, yet they pertain to the parameter domain of classical mechanics and cannot be understood in terms of the latter! To the best of the author's knowledge, this is an entirely new situation, not faced before. We do not yet have a resolution of this paradox, but some thoughts are given below.

9.1. Future issues

One of the foremost future issues relating to this problem is the resolution of the above-mentioned paradox. Though we have no definite ideas yet, we can explore some possibilities. The question may be posed clearly as follows: does classical mechanics (as applied to this system at least) possess a hitherto unknown structure that endows it with the wave property it has been found to exhibit? To attempt an answer, it is important to examine the derivation of Ref. [58]. That derivation is based purely on classical mechanics, that is, on the classical Liouville equation, and has no quantum mechanical input. (It was shown above to make contact with the quantum mechanical derivation of Ref. [64].) An important input of this derivation was, however, the special choice of the ensemble used for the Liouville equation. This choice, it turned out, corresponds to the "coherent system of trajectories" as designated by Synge [3], or a "family" as termed by Dirac [4]. It is defined by a solution of the Hamilton–Jacobi equation and, according to Dirac, "corresponds to a state in quantum mechanics". Synge [3] has studied some interesting topological properties of this coherent system of trajectories. He showed that when the configuration space of the system is simply connected, such a system of trajectories admits surfaces of constant action ("waves of action", in his terminology). The author has shown [61] that one can still define surfaces of constant action in a multiply connected space, provided certain conditions are satisfied. This case matters because the configuration space of a classical mechanical system is, in general, multiply connected. The conditions turn out to be EBK-like quantization conditions for the classical mechanical system, in which a certain smallest (in magnitude) action appears in place of $\hbar$.
Applied to the system of charged particles in a magnetic field, where the gyroaction may be taken as the smallest action, the EBK-like quantization condition takes the
form

$$\oint p\,dx = \ell\mu, \qquad (9.2)$$
where $p$ is the canonical momentum along the magnetic field. These have been interpreted as the "allowed states" [61]. These conditions, it may be noted, are essentially the EBK quantization conditions for the set of Schrödinger-like equations (6.63). Topological properties represent global properties of a system. The topological considerations of Ref. [61] and the Schrödinger-like formalism of Ref. [58] have been shown to be related through Eq. (9.2). It is therefore reasonable to surmise that the global topological properties of the system are contained in the Schrödinger-like formalism of Ref. [58]. It has been argued in Section 8.6 that an equation of motion such as the Lorentz equation describes a local evolution. In contrast, phenomena that arise from the phase of the wave function, such as the Aharonov–Bohm effect or the matter wave phenomena, represent a global property of the system related to its configuration space, and a local evolution equation like the Lorentz equation cannot capture them. Moreover, both formalisms of Refs. [61,58] rest on the "coherent system of trajectories" [3] or "family" [4] as the crucial choice for the ensemble of systems. It is thus tempting to conclude that, from a purely classical mechanical standpoint, the coherent system of trajectories of a classical mechanical system may exhibit wave phenomena in the parameter domain of classical mechanics as a global property. Admittedly, this should for now be regarded only as a conjecture. The issue remains open, most strikingly for the observed Aharonov–Bohm-like effect in the macrodomain. A related issue for future consideration concerns the relationship between the present Schrödinger-like formalism (which has been found to operate in the classical macrodomain) and classical mechanics in the equation-of-motion/initial-value paradigm.
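The EBK-like condition (9.2) can be made concrete in a simple hypothetical setting. The following is our assumption, not the author's model. With the gyroaction $\mu$ conserved, the motion along the field sees the effective potential $\mu\,\Omega(x)$, and we take a parabolic mirror profile $\Omega(x) = \Omega_0(1 + x^2/L^2)$. The parallel motion is then harmonic, and the loop action $\oint p\,dx$ can be computed by quadrature and checked against its analytic value $2\pi K/\omega_b$. We read Eq. (9.2) as $\oint p\,dx = \ell\mu$, without a Maslov-type correction. The sketch finds the parallel energies `K` that satisfy this condition. All parameter values are dimensionless placeholders.

```python
import numpy as np

# Hypothetical dimensionless parameters for a parabolic mirror,
# Omega(x) = Omega0 * (1 + x**2 / L**2); none are taken from the review.
m, Omega0, L = 1.0, 1.0, 10.0
mu = 0.05   # gyroaction, taking the place of hbar

# With mu conserved, the parallel motion feels mu*Omega(x): a harmonic well
# with bounce frequency omega_b.
omega_b = np.sqrt(2.0 * mu * Omega0 / (m * L**2))


def loop_action(K, n=2001):
    """Closed-orbit integral of p dx for parallel energy K above mu*Omega0.

    The substitution x = a*sin(phi) removes the square-root singularity at the
    turning points x = +/-a, where mu*Omega0*a**2/L**2 = K.
    """
    a = L * np.sqrt(K / (mu * Omega0))
    phi = np.linspace(-np.pi / 2, np.pi / 2, n)
    p = np.sqrt(2.0 * m * K) * np.cos(phi)     # momentum at x = a*sin(phi)
    integrand = p * a * np.cos(phi)            # p * dx/dphi
    half = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(phi))
    return 2.0 * half                          # out and back


def loop_action_exact(K):
    """Analytic result for the harmonic well: 2*pi*K/omega_b."""
    return 2.0 * np.pi * K / omega_b


# EBK-like condition (no Maslov correction): loop action = ell * mu.
ells = np.arange(1, 6)
K_allowed = ells * mu * omega_b / (2.0 * np.pi)
numeric = np.array([loop_action(K) for K in K_allowed])
print(numeric / mu)   # approximately 1, 2, 3, 4, 5
```

In this harmonic case, the "allowed states" are equally spaced in parallel energy, with spacing $\mu\,\omega_b/(2\pi)$. A non-parabolic field profile would change only the `loop_action` integrand.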
How, then, are the Schrödinger-like formalism and the Lorentz-equation description related to each other? How is it that in one description (the Schrödinger-like description) the curl-free vector potential has an observable effect on the dynamics, while in the other (the Lorentz equation of motion) it has no effect at all? What limiting process would yield the latter from the former? In fact, it is not even clear whether one should seek a limiting procedure to connect the two descriptions. The answer may eventually involve local versus global dynamics, as argued earlier. This issue also requires further consideration. A further interesting point concerns the relationship between two sets of equations. The first is the quantum mechanical Schrödinger equation, whose regime of operation is the microdomain, characterized by the small magnitude of $\hbar$. The second is the set of Schrödinger-like equations (8.30), whose regime of operation is the macrodomain, characterized by the magnitude of the gyroaction $\mu = \mathcal{N}\hbar$, with $\mathcal{N} \gg 1$. The standard way of taking the classical macroscopic limit of the Schrödinger wave equation is through the WKB ansatz. This yields the Hamilton–Jacobi equation for the action $S$, while the wave function $\psi = \exp(iS/\hbar)$ oscillates rapidly in the limit $\hbar \to 0$. These wave functions are still basic quantum mechanical wave functions, but their content in this macroscopic limit is rather uninteresting. By contrast, the Schrödinger-like equations (8.30) provide a more interesting limiting procedure for going to the macroscopic domain. This route is in the spirit of the Bohr correspondence principle, where one goes to the limit of large quantum numbers (here, the large Landau quantum number $\mathcal{N}$ of the charged particle), and the wave functions $\psi^{(N)}$ of these equations are
the transition amplitudes from the large quantum number $\mathcal{N}$ to $\mathcal{N} \pm N$, where $1 < N \ll \mathcal{N}$. They are not the basic quantum mechanical wave functions governed by the Schrödinger wave equation. (Recall that the inequality $N \ll \mathcal{N}$ is precisely the condition for going over to classical mechanics in the spirit of the correspondence principle.) These equations thus provide a more interesting route to the macroscopic limit (approached here through $\mathcal{N} \to \infty$), because the wave functions $\psi^{(N)}$ exhibit nontrivial matter wave behaviour with a macroscopic wavelength. This happens because the large action $\mu = \mathcal{N}\hbar$ plays the role of $\hbar$ in these equations. As already pointed out, these equations have revealed entirely new matter wave properties of the system studied. One may ask whether other systems could exhibit similar macroscopic matter wave properties in the large quantum number limit. In a recent paper [74], the author has shown that this is indeed possible. Composite systems such as diatomic molecules and Rydberg atoms in highly excited internal states could exhibit such macro/mesoscopic matter wave interference phenomena in one-dimensional scattering. Bohr introduced the correspondence principle in relation to the radiation emitted by atoms and the relationship between its frequency and the frequency of the classical orbit. It was shown that the two are equal in the limit of large quantum numbers. The question of what the corresponding wave equation should be in the limit of large quantum numbers could not have been posed in the era of the old quantum theory, because no wave equation existed at that time. Such a question can now be posed. Perhaps the results reported here may provide some stimulus for further exploration in this direction.
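The following estimate gives a sense of scale for the statement that the large action $\mu = \mathcal{N}\hbar$ plays the role of $\hbar$. It compares the ordinary de Broglie wavelength $2\pi\hbar/p_\parallel$ with the same expression with $\mu$ in place of $\hbar$, namely $2\pi\mu/p_\parallel$. Identifying the macroscopic wavelength with $2\pi\mu/p_\parallel$ is an assumption made by analogy; the observations themselves are described in Section 7. The gyroaction is taken as $\mu = E_\perp/\Omega$, consistent with $n_0 = \mu/\hbar$ being the Landau quantum number. The electron energies and the field strength are hypothetical.

```python
import math

# CODATA values for the electron and hbar (SI units).
m_e = 9.1093837e-31      # kg
q_e = 1.602176634e-19    # C
hbar = 1.054571817e-34   # J s

# Hypothetical operating point (not taken from the review).
B = 0.01                 # magnetic field, tesla
E_perp_eV = 100.0        # perpendicular (gyration) energy, eV
E_par_eV = 100.0         # parallel energy, eV

Omega = q_e * B / m_e                      # gyrofrequency, rad/s
mu = E_perp_eV * q_e / Omega               # gyroaction E_perp / Omega, J s
N_landau = mu / hbar                       # large Landau quantum number
p_par = math.sqrt(2.0 * m_e * E_par_eV * q_e)

lambda_quantum = 2.0 * math.pi * hbar / p_par   # ordinary de Broglie wavelength
lambda_macro = 2.0 * math.pi * mu / p_par       # same relation with mu for hbar

print(f"N = mu/hbar     ~ {N_landau:.3e}")
print(f"de Broglie      ~ {lambda_quantum:.3e} m")
print(f"with mu for hbar ~ {lambda_macro:.3e} m")
```

With these assumed numbers the ordinary wavelength is about 0.12 nm and the $\mu$-based one is about 1 cm. Their ratio is exactly $\mathcal{N} = \mu/\hbar$, roughly $8.6 \times 10^{7}$ here.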
Acknowledgements

The development of the new paradigm reviewed here, the "macroquantum dynamics" of charged particles in a magnetic field, has been inspired by the Dirac–Feynman path integral formalism of quantum mechanics, in particular by Feynman's Nobel Prize address, which alluded to this formalism. The first (essentially heuristic) derivation of Ref. [55] made use of this formalism. It invoked an analogy between the nonadiabatic leakage from the adiabatic trap and quantum tunneling to obtain the set of governing equations. The author would next like to pay tribute to his Guru Marshall Rosenbluth, from whom he imbibed in large measure the art and science of heuristics and learnt all about adiabatic invariants and nonadiabaticity. The author would like to acknowledge the contributions of his experimental colleagues Dhiraj Bora, Y.C. Saxena and P.I. John. Their efforts led to the verification of the first prediction of the Schrödinger-like equations of the new paradigm, namely, the existence of a multiplicity of residence times in an adiabatic trap. With his gurubhai Wendell Horton [56], the author attempted to derive the Schrödinger-like equations of Ref. [55] starting from the Schrödinger equation of quantum mechanics. With Ashok Ambastha [59], the author explored the possibility of reproducing the multiplicity of residence times by numerically following the trajectories of an ensemble of particles. The author would next like to thank Kyoji Nishikawa, then Director of the Institute for Fusion Theory, Hiroshima, Japan. He gave the author the opportunity to visit the Institute as a Ministry of Education Visiting Professor during April–September 1980. This enabled the author to complete his work [58] in the peaceful academic ambience that the Institute provided. The existence of macroscopic matter wave interference phenomena predicted in this paper [58] was subsequently
established experimentally through the painstaking efforts of his other experimental colleagues, A.M. Punithavelu and S.P.S. Rawat, later joined by S.B. Banerjee. Finally, the observability of the curl-free vector potential in the macrodomain, predicted in Ref. [64], was demonstrated experimentally through the efforts and contributions of the above-mentioned colleagues. Most of the author's work would not have been possible without the encouragement, criticism and support, material and moral, of a number of colleagues over the years. They are M.G.K. Menon, D. Lal, G.S. Agarwal, B.V. Chirikov, P.K. Shukla, J.C. Parikh and G. Rowlands; T.G. Northrop (through his book [5] and some private communication); and an anonymous referee of the 1985 paper [58] (for his highly appreciative comments). Private exchange of scientific correspondence with B.V. Chirikov and G. Rowlands was stimulating. P.K. Shukla and the late N. Nagesha Rao (the author's former student) kept prodding the author to write a review article on his research on this topic. Nagesha, in fact, made some very valuable suggestions on the first two sections of this review, which he had seen before his sad and untimely demise. The author thanks all these colleagues for their help, support and encouragement. He also thanks all his experimental colleagues, whose contributions helped establish the physical reality of the rather heterodox theoretical formalism of the new paradigm. Special thanks are due to Ashok Ambastha for his generous help with the figures for this article. M. Sourabhan's help in typing the manuscript is greatly appreciated. Finally, the author would like to make special mention of his wife Sushama, whose constant encouragement and unstinted support enabled him to conclude this review in good time.

References

[1] Carl Störmer, The Polar Aurora, Clarendon Press, Oxford, 1955.
[2] (a) S.N. Rodionov, Atom. Energiya 6 (1959) 623; (b) G. Gibson, W. Jordan, E. Lauer, Phys. Rev. Lett. 4 (1960) 217.
[3] J.L. Synge, in: Principles of Mechanics, Handbuch der Physik, Vol. III/1, Springer, Berlin, 1960, p. 121.
[4] P.A.M. Dirac, Can. J. Maths 3 (1951) 1.
[5] T.G. Northrop, Adiabatic Motion of Charged Particles, Interscience, New York, 1963.
[6] M. Kruskal, J. Math. Phys. 3 (1962) 806.
[7] B.V. Chirikov, Atom. Energiya 6 (1959) 630 [J. Nucl. Energy C1 (1960) 253].
[8] I.B. Bernstein, G. Rowlands, Phys. Fluids 19 (1976) 1546.
[9] B.V. Chirikov, Fiz. Plazmy 4 (1978) 521 [Sov. J. Plasma Phys. 4 (1978) 289].
[10] B.V. Chirikov, Phys. Rep. 52 (1979) 263.
[11] A. Garren, R.J. Riddell, L. Smith, G. Bing, R.L. Hanrich, T.G. Northrop, J.E. Roberts, Proceedings of the Second International Conference on Peaceful Uses of Atomic Energy, Vol. 31, United Nations, Geneva, 1958, p. 65.
[12] H. von Helmholtz, J. für Mathematik 96 (1884) 111.
[13] Heinrich Hertz, Principles of Mechanics (English translation, London, 1899; reprinted Dover, New York, 1956; original German edition 1894).
[14] P. Ehrenfest, Philos. Mag. 33 (1917) 500; also Proc. Acad. Amsterdam 25 (1916) 412.
[15] H. Alfvén, Cosmical Electrodynamics, Clarendon Press, Oxford, 1950.
[16] G. Hellwig, Z. Naturforsch. 10a (1955) 508.
[17] T. Northrop, E. Teller, Phys. Rev. 117 (1960) 215.
[18] N.D. Sen Gupta, Nuovo Cimento 42 (1966) 121.
[19] W. Wasov, Comm. Pure Appl. Math. 9 (1956) 1.
[20] L.D. Landau, E.M. Lifshitz, Quantum Mechanics, Pergamon Press, New York, 1965, p. 158.
[21] S. Chandrasekhar, in: R.K. Landshoff (Ed.), The Plasma in a Magnetic Field, Stanford University Press, Stanford, 1958, p. 3.
[22] R.M. Kulsrud, Phys. Rev. 106 (1957) 205.
[23] A. Lenard, Ann. Phys. 6 (1959) 281.
[24] M.N. Rosenbluth, Course of Lectures at the Graduate Class of 1963, University of California, San Diego.
[25] V.I. Arnold, Uspekhi Akad. Nauk SSSR 18 (1963) 91 [Russian Math. Surveys 18 (1963) 86].
[26] D. ter Haar, Elements of Hamiltonian Mechanics, Pergamon Press, New York, 1971, p. 146.
[27] L.D. Landau, E.M. Lifshitz, Mechanics, Pergamon Press, New York, 1965.
[28] T. Whittaker, Analytical Dynamics, Cambridge University Press, Cambridge, 1960.
[29] G.I. Budker, Plasma Phys. Contr. Thermonucl. Reactions, Acad. Nauk SSSR 3 (1959) 3 (in Russian).
[30] S. Chandrasekhar, Plasma Physics, notes compiled by S.K. Trehan, Chicago University Press, Chicago, 1960.
[31] F. Hertweck, A. Schlüter, Z. Naturforsch. 12a (1957) 844.
[32] P.O. Vandervoort, Ann. Phys. (N.Y.) 12 (1961) 436.
[33] G. Backus, A. Lenard, R. Kulsrud, Z. Naturforsch. 15a (1960) 1007.
[34] J.E. Howard, Phys. Fluids 13 (1970) 2407.
[35] A.M. Dykhne, Zh. Eksp. Teor. Fiz. 38 (1960) 570 [Sov. Phys. JETP 11 (1960) 411].
[36] V.L. Pokorovskii, S.K. Savvinykh, F.R. Ulinich, Zh. Eksp. Teor. Fiz. 34 (1958) 1272; 34 (1958) 1629 [Sov. Phys. JETP 34 (1958) 879; 34 (1958) 1119].
[37] L.D. Landau, E.M. Lifshitz, Quantum Mechanics, Pergamon Press, New York, 1965, p. 181.
[38] W. Magnus, F. Oberhettinger, Formulas and Theorems for the Functions of Mathematical Physics, Chelsea Publishing Company, New York, 1949, pp. 59–63.
[39] I.I. Gol'dman, V.D. Krivchenkov, V.I. Kogan, V.M. Galitskii, in: D. ter Haar (Ed.), Problems in Quantum Mechanics, Academic Press, New York, 1960, Problem No. 5, p. 99.
[40] R.J. Hastie, G.E. Hobbs, J.B. Taylor, in: Plasma Physics and Controlled Nuclear Fusion Research, Vol. 1, International Atomic Energy Agency, Vienna, 1969, p. 389.
[41] J.E. Howard, Phys. Fluids 14 (1971) 2378.
[42] E.M. Krushkal, Zh. Tekh. Fiz. 42 (1972) 2288 [Sov. Phys. Tech. Phys. 17 (1973) 1792].
[43] R.H. Cohen, G. Rowlands, J.H. Foote, Phys. Fluids 21 (1978) 627.
[44] A.M. Dykhne, A.V. Chaplik, Zh. Eksp. Teor. Fiz. 40 (1961) 666 [Sov. Phys. JETP 13 (1961) 465].
[45] M.N. Rosenbluth, R.K. Varma, Nucl. Fusion 7 (1967) 33.
[46] H. Irie, J. Phys. Soc. Japan 54 (1985) 2883.
[47] M. Braun, J. Diff. Equations 8 (1970) 294.
[48] V.G. Ponomarenko, Ya.L. Tranin, V.I. Yurchenko, A.N. Yasnetskii, Zh. Eksp. Teor. Fiz. 55 (1968) 3 [Sov. Phys. JETP 28 (1969) 1].
[49] A.N. Dubinina, L.S. Krasitskaya, Yu.N. Yudin, Plasma Phys. 11 (1969) 551.
[50] J.M. Greene, J. Math. Phys. 20 (1979) 1183.
[51] A.B. Rechester, R.B. White, M.N. Rosenbluth, Phys. Fluids 23 (1981) 2664.
[52] B.V. Chirikov, Particle dynamics in magnetic traps, in: B.B. Kadomtsev (Ed.), Problems in Plasma Theory, Vol. 13, Consultants Bureau, New York, 1987.
[53] Luigi Chierchia, Variational and local methods in the study of Hamiltonian systems, in: A. Ambrosetti, G.F. Dell'Antonio (Eds.), Proceedings of a Workshop held at the International Centre for Theoretical Physics, Trieste, Italy, 24–28 October 1994, World Scientific, Singapore.
[54] N.N. Nekhoroshev, Usp. Math. Nauk 32 (1977) 5 [Russian Math. Surveys 32 (1977) 1].
[55] R.K. Varma, Phys. Rev. Lett. 26 (1971) 417.
[56] R.K. Varma, C.W. Horton Jr., Phys. Fluids 15 (1972) 620.
[57] (a) D. Bora, P.I. John, Y.C. Saxena, R.K. Varma, Phys. Lett. A 75 (1979) 60; (b) Plasma Phys. 22 (1980) 563; (c) Phys. Fluids 25 (1982) 2284.
[58] R.K. Varma, Phys. Rev. A 31 (1985) 3951.
[59] A. Ambastha, R.K. Varma, Plasma Phys. Contr. Fusion 30 (1988) 1279.
[60] (a) R.K. Varma, A.M. Punithavelu, Mod. Phys. Lett. A 8 (1993) 167; (b) Mod. Phys. Lett. 8 (1993) 3823.
[61] R.K. Varma, Mod. Phys. Lett. A 9 (1994) 3653.
[62] C.S. Unnikrishnan, C.P. Safvan, Mod. Phys. Lett. A 14 (1999) 479.
[63] A. Ito, Z. Yoshida, Phys. Rev. E 63 (2001) 026502.
[64] R.K. Varma, Phys. Rev. E 64 (2001) 036608; R.K. Varma, Phys. Rev. E 65 (2002) 019904, Erratum.
[65] R.K. Varma, A.M. Punithavelu, S.B. Banerjee, Phys. Rev. E 65 (2002) 026503.
[66] R.P. Feynman, A.R. Hibbs, Quantum Mechanics and Path Integrals, McGraw-Hill, New York, 1965.
[67] A. Kitaigorodsky, Introduction to Physics, Mir Publishers, Moscow, 1981, p. 340.
[68] R.K. Varma, A variety of plasmas, in: Proceedings of the International Conference on Plasma Physics, Indian Academy of Sciences, Bangalore, India, Delhi, 1989, p. 235.
[69] R.K. Varma, Quantum-like models and coherent effects, in: R. Fedele, P.K. Shukla (Eds.), Proceedings of the 27th Workshop of the INFN Eloisatron Project, Erice, Italy, 13–20 June 1994, World Scientific, Singapore, 1995, p. 187.
[70] J.L. McCauley, Classical Mechanics, Cambridge University Press, Cambridge, 1997, p. 435.
[71] R. Shankar, Principles of Quantum Mechanics, Second Edition, Plenum Press, New York, 1994, p. 494.
[72] J.D. Jackson, Classical Electrodynamics, Wiley, New York, 1975, p. 205.
[73] R.K. Varma, A.M. Punithavelu, S.B. Banerjee, Phys. Lett. A 303 (2002) 114.
[74] R.K. Varma, Europ. Phys. J. D 20 (2002) 211.
Contents of Volume 378

D. Sornette, Critical market crashes, 1
D. Drechsel, B. Pasquini, M. Vanderhaeghen, Dispersion relations in real and virtual Compton scattering, 99
R.J. Szabo, Quantum field theory on noncommutative spaces, 207
R.K. Varma, Classical and macroquantum dynamics of charged particles in a magnetic field, 301