THE RESTLESS UNIVERSE APPLICATIONS OF GRAVITATIONAL N-BODY DYNAMICS TO PLANETARY, STELLAR AND GALACTIC SYSTEMS
Proceedings of the Fifty-Fourth Scottish Universities Summer School in Physics, Blair Atholl, 23 July - 5 August 2000.
A NATO Advanced Study Institute
Edited by
B A Steves - Glasgow Caledonian University
A J Maciejewski - Nicolaus Copernicus University
Series Editor
P Osborne - University of Edinburgh
Copublished by Scottish Universities Summer School in Physics & Institute of Physics Publishing, Bristol and Philadelphia
Copyright © 2001 The Scottish Universities Summer School in Physics
No Part of this book may be reproduced in any form by photostat, microfilm or any other means without written permission from the publishers.
British Library Cataloguing-in-Publication Data:
A catalogue record for this book is available from the British Library.
ISBN 0-7503-0822-2
Library of Congress Cataloging-in-Publication Data are available.
Copublished by
SUSSP Publications, The Department of Physics, Edinburgh University, The King's Buildings, Mayfield Road, Edinburgh EH9 3JZ, Scotland, and
Institute of Physics Publishing, wholly owned by The Institute of Physics, London. Institute of Physics Publishing, Dirac House, Temple Back, Bristol BS1 6BE, UK. US Editorial Office: Institute of Physics Publishing, The Public Ledger Building, Suite 1035, 150 Independence Mall West, Philadelphia, PA 19106, USA.
Printed in Great Britain by J W Arrowsmith Ltd, Bristol.
SUSSP Proceedings

1   1960  Dispersion Relations
2   1961  Fluctuation, Relaxation and Resonance in Magnetic Systems
3   1962  Polarons and Excitons
4   1963  Strong Interactions and High Energy Physics
5   1964  Nuclear Structure and Electromagnetic Interactions
6   1965  Phonons in Perfect and Imperfect Lattices
7   1966  Particle Interactions at High Energy
8   1967  Methods in Solid State and Superfluid Theory
9   1968  Physics of Hot Plasmas
10  1969  Quantum Optics
11  1970  Hadronic Interactions of Photons and Electrons
12  1971  Atoms and Molecules in Astrophysics
13  1972  Properties of Amorphous Semiconductors
14  1973  Phenomenology of Particles at High Energy
15  1974  The Helium Liquids
16  1975  Non-linear Optics
17  1976  Fundamentals of Quark Models
18  1977  Nuclear Structure Physics
19  1978  Metal Non-metal Transitions in Disordered Solids
20  1979  Laser-Plasma Interactions: 1
21  1980  Gauge Theories and Experiments at High Energy
22  1981  Magnetism in Solids
23  1982  Lasers: Physics, Systems and Techniques
24  1982  Laser-Plasma Interactions: 2
25  1983  Quantitative Electron Microscopy
26  1983  Statistical and Particle Physics
27  1984  Fundamental Forces
28  1985  Superstrings and Supergravity
29  1985  Laser-Plasma Interactions: 3
30  1985  Synchrotron Radiation Sources and their Applications
31  1986  Localisation and Interaction
32  1987  Computational Physics
33  1987  Astrophysical and Laboratory Spectroscopy
34  1988  Optical Computing
35  1988  Laser-Plasma Interactions: 4
(continued)
SUSSP Proceedings (continued)

36  1989  Physics of the Early Universe
37  1990  Pattern Recognition and Image Processing in Physics
38  1991  Physics of Nanostructures
39  1991  High Temperature Superconductivity
40  1992  Quantitative Microbeam Analysis
41  1992  Nonlinear Dynamics and Spatial Complexity in Optical Systems
42  1993  High Energy Phenomenology
43  1994  Determination of Geophysical Parameters from Space
44  1994  Quantum Dynamics of Simple Systems
45  1994  Laser-Plasma Interactions 5: Inertial Confinement Fusion
46  1995  General Relativity
47  1995  Laser Sources and Applications
48  1996  Generation and Application of High Power Microwaves
49  1997  Physical Processes in the Coastal Zone
50  1998  Semiconductor Quantum Optoelectronics
51  1998  Muon Science
52  1998  Advances in Lasers and Applications
53  1999  Soft and Fragile Matter
54  2000  The Restless Universe
55  2001  Heavy Flavour Physics
Lecturers Sverre Aarseth
Institute of Astronomy, Cambridge, UK
Alessandra Celletti
Università di Roma Tor Vergata
Hugh Couchman
McMaster University, Ontario
Rudolf Dvorak
University of Vienna, Austria
Claude Froeschlé
Observatoire de la Côte d'Azur
Douglas Heggie
University of Edinburgh
Martin Hendry
University of Glasgow
Andrzej J Maciejewski
Nicolaus Copernicus University, Torun
Christian Marchal
D.E.S. ONERA, Chatillon, France
Michael Merrifield
University of Nottingham
David Merritt
Rutgers University, New Jersey
Philip James Message
University of Liverpool
Andrea Milani
Università di Pisa
Tom Quinn
University of Washington, Seattle
Carles Simó
Universitat de Barcelona
Bonnie A Steves
Glasgow Caledonian University
David Vokrouhlický
Charles University, Prague
Joerg Waldvogel
ETH-Zentrum, Zurich
Martin Weinberg
University of Massachusetts
Gustavo Yepes
Universidad Autónoma de Madrid
Postal and e-mail addresses for lecturers, students and committee members can be found at http://www.astro.gla.ac.uk/users/martin/nato/natoconf.html .
Executive Committee
Dr B A Steves
Glasgow Caledonian Univ.
Co-Director and Co-Editor
Prof. A E Roy
University of Glasgow
Treasurer
Dr M Hendry
University of Glasgow
Secretary
International Advisory Committee
Dr B A Steves
Glasgow Caledonian Univ.
Co-Director and Co-Editor
Prof. A J Maciejewski
Nicolaus Copernicus Univ. Co-Director and Co-Editor
Prof. Cl Froeschlé
Observatoire de Nice, France
Prof. D Heggie
University of Edinburgh, Scotland
Dr M Hendry
University of Glasgow, Scotland
Prof. A Milani
Università di Pisa, Italy
Preface The gravitational N-body problem dominates much of theoretical astrophysics. It arises in problems ranging from the motion of artificial and natural satellites to the behaviour of stars in star clusters and galaxies and the formation of large-scale structure in the universe. Since the early years of the twentieth century, the techniques and scientific issues involved in gravitational dynamics have diversified widely. Recently, however, there have been signs for the need to exchange ideas and techniques between the disciplines of celestial mechanics, stellar dynamics and galactic dynamics as many of the established techniques in one field are being rediscovered or reinvented for use in another field. This especially concerns theoretical achievements allowing better understanding of dynamics in multidimensional phase space and global properties of investigated systems. This state of the art textbook provides an invaluable reference volume for all students and researchers in these subjects. Based on the recent joint NATO Advanced Study Institute and Scottish Universities Summer School in Physics entitled ‘The Restless Universe: Applications of Gravitational N-Body Dynamics to Planetary, Stellar and Galactic Systems’, the book, written by the lecturers at the School, is aimed at young scientists at PhD level who wish to learn of recent developments in their fields. By the nature of the different themes involved in N-body gravitational dynamics, the book is also relevant to research specialists in each field providing them with an up-to-date synoptic view of their own discipline, while enabling them to obtain a review of gravitational N-body dynamics from the viewpoint of the other disciplines.
A major aim of the volume, like that of the School, is to lead the reader from a strong element of review in tutorial form to a clear picture of the state-of-the-art of research being conducted in the application of gravitational N-body dynamics in the following fields. Within the Solar System (the traditional realm of celestial mechanics), studies of the three, four or few body problems come into their own. Recent numerical and analytical methods such as the use of fast Lyapunov indicators are being used to study chaos and resonance in the three body problem. While exciting applications of these new analyses are found in such studies as the distribution and impact probabilities of Near Earth Asteroids, the formation and evolution of planetary systems and more particularly in the dynamics of small bodies in the solar system, in recent years it has become apparent that the developing theory of chaotic motion in celestial mechanics also has serious application in the relaxation of galactic structures. Stellar dynamics is the application of the N-body problem to the formation, evolution and dynamics of galaxies, star clusters, and multiple star systems of few bodies. It therefore finds common ground with both cosmology and celestial mechanics. Specially tailored algorithms and specially designed computer hardware are developed to handle the N-body problem to high accuracy and high speed. Such techniques, for example, are becoming increasingly important in the study of galaxy formation where direct numerical
simulations of galaxy formation are fast approaching the resolution required to model galaxy morphology. Cosmology studies the formation and evolution of galaxies in the context of the standard Big Bang model, and so draws together several other strands of cosmology, including the analysis of large scale structure and the physics of the early Universe. Numerical simulations of large-scale structure are raising important questions about the relationship between galaxy evolution and the background cosmological model. Dynamical studies of the matter distribution in galaxy clusters, and their large scale streaming motions, are also being used to constrain the mean mass density of the universe and determine its eventual fate. Many of the well-established techniques of celestial mechanics and stellar dynamics are being rediscovered or reinvented to help study these problems. The SUSSP54/NATO ASI was held in the Atholl Arms Hotel, Blair Atholl, Scotland. The two week long ASI brought together 80 scientists from more than 30 countries. Blair Atholl in Perthshire was an ideal location for the School. It fitted the NATO criteria of being comparatively secluded and quiet, yet major towns were only one hour's train journey away. The Atholl Arms Hotel, built in 1832, is completely refurbished in the style of a traditional highland hotel. Located in the heart of the Scottish highlands, it enabled us to provide a variety of outdoor pursuits and cultural interests amidst spectacular mountain scenery. We would like to thank Professor Archie Roy, Dr Martin Hendry and Ms. Gail Penny (University of Glasgow), Dr Winston Sweatman (Napier University) and Mr Peter Duncan (Glasgow Caledonian University) for their unstinting help which contributed so much to the success of the School. The proprietor and hotel staff of the Atholl Arms Hotel are also in our debt for the high standard of service they provided. We are also indebted to the NATO Scientific Affairs Division and the SUSSP Committee for their valuable help and sponsorship. Further information on the School, including addresses of all participants, can be found at the web address: http://www.astro.gla.ac.uk/users/martin/nato/natoconf.html .
Bonnie A Steves and Andrzej J Maciejewski, Co-Directors, March 2001
Contents

Solar system dynamics
  N-body simulations of the Solar System, planet formation, and galaxy clusters (Thomas Quinn) ... 1
  On the Trojan problem (Rudolf Dvorak and Elke Pilat-Lohinger) ... 21
  Ideal resonance and Melnikov's theorem (Philip J Message) ... 43
  The Yarkovsky effect in the dynamics of the Solar System (David Vokrouhlický) ... 53
  Are science and celestial mechanics deterministic? Henri Poincaré, philosopher and scientist (Christian Marchal) ... 79

Stellar kinematics and dynamics
  Regularization methods for the N-body problem (Sverre J Aarseth) ... 93
  Escape in Hill's problem (Douglas C Heggie) ... 109

Galactic dynamics
  Galaxies: from kinematics to dynamics (Michael R Merrifield) ... 129
  Non-integrable galactic dynamics (David Merritt) ... 145
  Evolution of galaxies due to self-excitation (Martin D Weinberg) ... 167

Cosmology - Large scale structure dynamics
  Dynamical methods for reconstructing the large scale galaxy density and velocity fields (Martin Hendry) ... 191
  Cosmological numerical simulations: past, present and future (Gustavo Yepes) ... 217
  Gravitational N-body simulation of large-scale cosmic structure (H M P Couchman) ... 239

General dynamics
  Periodic orbits of the planar N-body problem with equal masses and all bodies on the same path (Carles Simó) ... 265
  Central configurations revisited (Jörg Waldvogel) ... 285
  Surfaces of separation in the Caledonian Symmetrical Double Binary Four Body Problem (Bonnie A Steves and Archie E Roy) ... 301
  The Fast Lyapunov Indicator (Claude Froeschlé, Massimiliano Guzzo and Elena Lega) ... 327
  Determination of chaotic attractors in short discrete time series (Alessandra Celletti, Claude Froeschlé, Igor V Tetko and Alessandro E P Villa) ... 339
  Non-integrability in gravitational and cosmological models (Andrzej J Maciejewski) ... 361

Index ... 387
N-body simulations of the Solar System, planet formation, and galaxy clusters
Thomas Quinn
University of Washington, USA
1 Introduction
The title of this section describes systems that vary by an enormous range in scale. The Solar System is a few AU in size, involves about a solar mass, and is several billion dynamical times old. Clusters of galaxies are 10^12 AU in size, involve upwards of 10^14 solar masses, and are only a few dynamical times old. The commonality that relates these systems is gravity, and the fact that both are well approximated as Hamiltonian systems. In the past decade, gravitational N-body simulations have been successfully used to make discoveries in both of these regimes. The physical interpretation of the simulations is different: with the Solar System we are following the orbits of actual bodies, while in galaxy clusters we are following packets of phase space density in order to solve the Collisionless Boltzmann Equation. Nevertheless, the similarities are such that cross-talk between the two areas has been beneficial for both.
2 Stability of the solar system

2.1 History
The issue of the long-term stability of the Solar System is one of the oldest unsolved problems in Newtonian physics, but recent (largely numerical) work has provided some new insight into the problem. The issue goes all the way back to Newton who recognised the problem, although he did not have the quantitative tools to address it. Laplace was the first to have something quantitative to say. He noted that if the mutual perturbations of the planets are expanded in powers of their masses, inclinations and eccentricities, then to first order, the orbits could be expressed as a sum of periodic terms. In this expansion,
the non-Kepler part of the Hamiltonian is composed of terms of the form:
T = C h_i^{n_1} h_j^{n_2} k_i^{n_3} k_j^{n_4} p_i^{n_5} p_j^{n_6} q_i^{n_7} q_j^{n_8} exp[i(k_i λ_i + k_j λ_j)]

where h_i, k_i, p_i, q_i are the Poincaré elements of the ith planet, λ_i = 2π ν_i t is the mean anomaly of the ith planet, ν_i is the corresponding mean motion, and the n_i and k_i are integers. If this is the case, then if we waited for a sufficiently long time, the solar system would return to an initial configuration. Successive work by Brouwer and van Woerkom (1950) and Bretagnon (1974) showed that this was also the case for higher order expansions. The problem with this approach was pointed out by Poincaré: the convergence of the expansions is not guaranteed. In particular, if one uses standard perturbation theory and tries to integrate the terms in the Hamiltonian along the unperturbed orbit, one gets terms with a coefficient of

1 / (k_i ν_i + k_j ν_j).

It is therefore obvious that no matter what values the mean motions ν_i and ν_j take, a sufficiently high order term in the expansion will have an arbitrarily large coefficient. Significant progress on the mathematical issue was provided by the work of Kolmogorov, Arnold and Moser (see Hénon, 1983 for a review and references). This work (which is now termed the KAM theorem) showed that the quasi-periodic trajectories of an integrable problem usually remain quasi-periodic under the influence of "sufficiently small" perturbations to the Hamiltonian. The exceptions are trajectories where the ratios of characteristic frequencies of the original problem are sufficiently well approximated by rational numbers - i.e. the theorem fails near resonances. Since rational numbers are inextricably mixed with irrationals along the real number line, regions of quasi-periodic and chaotic behaviour are similarly intertwined. Furthermore the tori which are "destroyed" form a set of finite measure which grows with the strength of the perturbation. The KAM theorem is thus fundamental in showing the persistence of tori under very small perturbations. Using this formalism, Arnold has shown that for very small masses, eccentricities, and inclinations, the solar system is stable; however, the solar system does not meet these stringent requirements. This does not mean, however, that the solar system is actually unstable. In the past couple of decades, much work has been done in looking at the stability of dynamical systems using numerical integrations. The remainder of this section will discuss the techniques used and some of the more interesting results.
2.2 Determining stability from numerical integrations
As a single planet orbit can be reduced to a one dimensional problem, the simplest nontrivial example of stability in a planetary orbit is the restricted three body problem. This is the situation where two bodies are in circular orbits around their mutual centre of mass, and a third massless body moves around them in the orbital plane. Since the only integral is the Hamiltonian, orbits lie on a 3-d hypersurface. Any orbit which exhibits a second integral divides this surface in two. The stability of any given orbit simply depends on whether it is constrained by such an orbit; therefore, its stability is usually determined after only a few dynamical times: the constraining integral is either there or it isn’t. For more bodies, there are many degrees of freedom, and orbits no longer divide phase space. In this case there is a possibility of long term evolution, and determining stability
is a non-trivial task. The issue then becomes how to determine stability without having to integrate an ensemble of orbits for an indefinite amount of time. Of course, trying to determine the ultimate stability of an orbit (in the sense of whether a body will escape from the system or a collision between two bodies will occur) from a relatively short integration is difficult. In the discussion below, I will use the term "stable" in the very technical sense of having all the integrals of motion, rather than the general sense of survival of the system for very long periods of time. Likewise by "unstable" I will mean chaotic, or lacking integrals of motion, but not necessarily destined for disruption.
Figure 1. Power spectra of the h and k Poincaré elements of two different possible orbits of the planets in the Upsilon Andromedae system.

One technique for assessing the stability of an orbit is to look at the Fourier transform of one of the coordinates. If the orbit is quasi-periodic then the motion should be expressible as a sum of Fourier components. This is easy to see if one remembers that if there is one integral for each degree of freedom, the Hamiltonian can be expressed as H = H(J), and any function of the phase space coordinates, f(J, θ), can be expressed as f(t) = f(J, ωt) with ω = ∂H/∂J. Therefore, the ω should show up as discrete lines in the Fourier transform of f(t). On the other hand, if the Fourier transform is continuous, it is an indication that the integrals of motion don't exist, and the orbit is unstable. Examples of these two types of motion can be seen in Figure 1, which shows the power spectrum of the Fourier transform of the h and k Poincaré elements of two possible orbits of Upsilon Andromedae "c". Another way of assessing the stability of an orbit is the Lyapunov exponent. The distance in phase space between initially adjacent orbits grows as a power of t for quasi-periodic orbits, but exponentially if the orbit has fewer integrals of motion than degrees of freedom. The Lyapunov exponent is defined as the limit of γ = ln[d(t)/d(0)]/t as t grows large, where d(t) is the distance in phase space as a function of time between two initially adjacent orbits. Use of this exponent for exploring the stability of systems is discussed elsewhere in this volume, but again: the correlation between a positive Lyapunov exponent and the ultimate disruption of a system is not straightforward.
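As a concrete illustration of these two diagnostics, the following sketch computes the power spectrum of a regularly sampled coordinate and a finite-time Lyapunov estimate from the separation of two neighbouring orbits. It is not taken from the original lectures: the function names, the toy signal and the sampling are assumptions made purely for illustration.

```python
import numpy as np

def power_spectrum(h, dt):
    """Power spectrum of a regularly sampled coordinate h(t).

    Discrete, well-separated peaks suggest quasi-periodic motion;
    a broad continuum suggests chaos (compare Figure 1)."""
    power = np.abs(np.fft.rfft(h - np.mean(h)))**2
    freq = np.fft.rfftfreq(len(h), d=dt)
    return freq, power

def lyapunov_estimate(d, t):
    """Finite-time estimate gamma ~ ln[d(t)/d(0)] / t from the phase-space
    separation d(t) of two initially adjacent orbits."""
    return np.log(d[-1] / d[0]) / (t[-1] - t[0])

if __name__ == "__main__":
    # Toy quasi-periodic signal with two incommensurate frequencies.
    t = np.linspace(0.0, 1000.0, 8192)
    h = 0.05 * np.cos(2 * np.pi * 0.031 * t) + 0.02 * np.cos(2 * np.pi * 0.0071 * t)
    freq, power = power_spectrum(h, dt=t[1] - t[0])
    print("strongest line at frequency", freq[np.argmax(power[1:]) + 1])
```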
Figure 2. A comparison of symplectic and non-symplectic integrators is made. The squares are a second order leapfrog integrator; the crosses are a 4th order Runge-Kutta integrator with the same timestep, and the solid line is the exact solution.
2.3 Symplectic integrators
The importance of the integrals of motion has led to the use of symplectic integrators. A symplectic integrator is an exact solution to a discrete Hamiltonian system that is close to the continuum Hamiltonian of interest. Therefore, it preserves all the Poincaré invariants, and places stringent conditions on the global geometry of the dynamics. A symplectic integrator will exactly conserve an integral in the discrete Hamiltonian that is an approximation to the true integral in the system. This approximate integral oscillates about the true integral without any numerical dissipation. The difference between the discrete and continuum Hamiltonians can be viewed as a small perturbation given by the truncation error of the integrator. In other words, the error is itself a Hamiltonian. If the error Hamiltonian is a sufficiently small perturbation, then the KAM theorem (Arnold, 1978) guarantees that the invariant curves destroyed are a set of finite measure. In other words, almost all orbits that are stable in the real system will continue to be stable in the numerical system. An illustration of these advantages is shown in Figure 2. Here the radial velocity, v_r, is plotted against the radius, r, for a Kepler orbit of eccentricity e = 0.5 using a leapfrog integrator (which is symplectic) and using a fourth order Runge-Kutta integrator. In each integration, approximately 24 steps were taken per orbit, and the integrations ran for 16 orbits. Note how the leapfrog integrator oscillates about the true solution but always remains on a one dimensional surface. This indicates that it is indeed conserving an energy-like quantity, i.e. having the orbit constrained to a one dimensional surface shows the existence of an isolating integral of motion. On the other hand, the Runge-Kutta orbit slowly becomes more circular. The poor performance of the Runge-Kutta integrator is remarkable given that it is a fourth order integrator and uses four times as many force evaluations as the leapfrog integrator. Also note the large
wiggles in the leapfrog integration at apoapse. These are indicative of the proximity of resonant islands that would lead to an instability for larger timesteps. General-purpose symplectic integrators (Gladman, Duncan and Candy 1991, Yoshida 1990) tend to be of low order because of their complexity and so are not suitable for long accurate simulations. However, the so-called "mixed variable symplectic" (or MVS) integrators (Wisdom and Holman 1991, Saha and Tremaine 1992) can be made more accurate by factors of the ratio of planetary to solar mass for a given timestep. The principle behind these integrators is to split the Hamiltonian into an unperturbed Kepler part and a perturbation part, and is essentially a generalisation of the leapfrog method. In each step of the integration, the system is first moved forward in time according to Kepler motion, and then a kick in momentum is applied which is derived from the perturbation part of the Hamiltonian. This second step is analytic since the perturbed part of the Hamiltonian can be made independent of the canonical momenta in Cartesian coordinates. The MVS integrators have the additional advantage that the errors are limited to high frequency terms. Over long integration periods these terms will then average out, giving no net contribution to the evolution.
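The experiment of Figure 2 is straightforward to reproduce. The sketch below is an illustrative reconstruction rather than the code used for the figure: it integrates an e = 0.5 Kepler orbit with a kick-drift-kick leapfrog and with classical fourth order Runge-Kutta at roughly 24 steps per orbit for 16 orbits, and prints the relative energy error of each, which makes the absence of secular drift in the symplectic case easy to see.

```python
import numpy as np

GM = 1.0  # Kepler problem in units with GM = 1

def accel(x):
    r = np.linalg.norm(x)
    return -GM * x / r**3

def leapfrog_step(x, v, dt):
    # Kick-drift-kick leapfrog: second order and symplectic.
    v = v + 0.5 * dt * accel(x)
    x = x + dt * v
    v = v + 0.5 * dt * accel(x)
    return x, v

def rk4_step(x, v, dt):
    # Classical fourth order Runge-Kutta on (x, v); not symplectic.
    def deriv(state):
        xs, vs = state[:2], state[2:]
        return np.concatenate([vs, accel(xs)])
    y = np.concatenate([x, v])
    k1 = deriv(y)
    k2 = deriv(y + 0.5 * dt * k1)
    k3 = deriv(y + 0.5 * dt * k2)
    k4 = deriv(y + dt * k3)
    y = y + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return y[:2], y[2:]

def energy(x, v):
    return 0.5 * np.dot(v, v) - GM / np.linalg.norm(x)

# e = 0.5 orbit started at perihelion of a unit semi-major axis orbit.
e, a = 0.5, 1.0
x0 = np.array([a * (1 - e), 0.0])
v0 = np.array([0.0, np.sqrt(GM / a * (1 + e) / (1 - e))])
dt = 2 * np.pi / 24          # roughly 24 steps per orbital period
nstep = 24 * 16              # 16 orbits, as in Figure 2

xl, vl = x0.copy(), v0.copy()
xr, vr = x0.copy(), v0.copy()
E0 = energy(x0, v0)
for _ in range(nstep):
    xl, vl = leapfrog_step(xl, vl, dt)
    xr, vr = rk4_step(xr, vr, dt)

print("leapfrog |dE/E|:", abs(energy(xl, vl) / E0 - 1))
print("RK4      |dE/E|:", abs(energy(xr, vr) / E0 - 1))
```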
2.4 Chaotic motion of the planets
The rather surprising results of the last decade are that our Solar System is chaotic, and that the Lyapunov time is short compared to its age. Integrations of the outer planets for periods up to 845 Myr were made with a special purpose machine (Applegate et al. 1986, Sussman and Wisdom 1988), and it was discovered that Pluto had a Lyapunov exponent of 1/(20 Myr). Laskar (1989) numerically integrated a rather extensive secular system of the entire Solar System. The Fourier analysis of this 200 million year integration showed that it was not possible to describe the solution as a sum of periodic terms. Laskar also estimated the maximum Lyapunov exponent and found the surprisingly high value of 1/(5 Myr). This very important conclusion has been checked by direct numerical calculations. These include direct comparison to shorter integrations (Quinn, Tremaine and Duncan 1991, Laskar et al. 1992) and independent estimates of the Lyapunov times by longer integrations (Sussman and Wisdom 1992). These results are intriguing, but they do not completely answer the fundamental question: how long will the Earth and other planets stay on their current nearly circular orbits? However, the presence of chaos in the system and the resulting sensitivity to initial conditions allows one to perform "computational steering" to find configurations within the observational errors that will eventually lead to catastrophic changes in the system. Laskar (1994) showed that it was indeed possible to find such a configuration, where Mercury's eccentricity increases to 1 on timescales of only 3-5 Gyr. Finally, the chaotic dynamics of the terrestrial planets may have played a role in their formation (the topic of the next section). One issue that may be resolved by the irregular motion is the number and size of the current terrestrial planets. If the motions were more regular, there would be fewer collisions between the proto-planets, and the inner Solar System may have consisted of more numerous, smaller planets. A second issue is the cleansing of planetesimals not incorporated into planets. If they were not removed from the terrestrial region then the subsequent impact rate on Earth would have been significantly different, to the possible detriment of life.
3 Simulations of terrestrial planet formation
Planet formation theories are modern versions of Kant's Nebular Hypothesis, divided into stages where dust grains become kilometre-sized bodies by non-gravitational interactions and these planetesimals agglomerate into the present-day planets owing to gravitationally driven pairwise accretion; see Lissauer (1993) for a review of this fundamental picture that dates back only as far as Safronov (1969). However, models of planetesimal evolution have been forced to rely on analytical approximations, statistical techniques, or direct N-body methods with comparatively few particles and severe spatial restrictions. Comprehensive direct simulation must evolve a prohibitive number of bodies (N ~ 10^6-10^7) for an equally daunting time, ~10^6-10^7 orbits. However, these kinds of simulations are starting to become tractable through improvements in computational technology. In this section I will show how N-body techniques apply to this problem, and the scientific questions we can address using them.
3.1 Scientific issues
There are a number of observations that one would hope to explain using simulations of terrestrial planet formation. These include the size of the planets, their composition, their spin, and the amount of debris (comets and asteroids) left over. All of these issues are particularly important because of the impact they have on the origin of life. Size may be critical for holding onto an atmosphere and for plate tectonics. Composition is obviously important. Spin has an effect on climate. As for debris, we have ample evidence that the terrestrial planets were subject to heavy bombardment for a period after their formation, which possibly frustrated the development of life. If that bombardment were to continue to today, the planet would continue to be sterile. The key question with all these issues is the uniqueness of our own system. Are the conditions we see special to our own system, or a generic product that is the usual result of the planet formation process? As mentioned above, the "standard model" for planet formation is the planetesimal hypothesis, where dust grains condense into kilometre-sized bodies before aggregating into planets. The other model is formation directly from a gas disk via gravitational instability. The latter appears implausible for several reasons. These include the observation that all planets are enhanced in condensable material, and the theoretical constraint that a disk mass comparable to the mass of the Sun is needed for the gravitational instability. Lastly, the gas instability model does not account for the formation of small bodies such as asteroids and comets, which fit quite naturally into the planetesimal picture. Terrestrial planet formation is divided into four loosely defined stages (see Lissauer (1993) for a review):

1. Initial Stage. Condensation and growth of grains in the hot nebular disk together with gradual settling to the midplane. Gravitational instability among the grains is resisted owing to continuous stirring by convective and turbulent motions.

2. Early Stage. Growth of grains to km-sized planetesimals via pairwise accretion in the turbulent disk. Planetesimals initially have low eccentricities and inclinations due to gas drag.
3. Middle Stage. Agglomeration of planetesimals by focused merging. Possible runaway accretion and subsequent energy equipartition (dynamical friction) may lead to polarisation of the mass distribution: a few large bodies with low e and i in a swarm of smaller planetesimals with high e and i.

4. Late Stage. Once runaway accretion has terminated due to lack of slow moving material, protoplanets gradually evolve into crossing orbits as a result of cumulative gravitational perturbations. This leads to radial mixing and giant impacts until only a few survivors remain, over timescales of ~10^8 yr.
Although the four stages make a plausible scenario, the details of the development are rather poorly understood. With the initial and early stages, the problem is understanding the interplay of all the complex physics involved. In these stages we cannot unambiguously order the dominant forces. The star formation involves complicated magnetic fields, turbulence, radiative transfer, and gravitational instability. The grain formation and aggregation involves complex chemistry interacting with shocks and radiation. Creating a predictive model is very difficult. On the other hand, the late stage is relatively easy to model. The physics and the numerical techniques are equivalent to the problem of the long term stability of the Solar System discussed in the previous section. Simulations in this regime have been successfully done by Chambers and Wetherill (1998) and Rivera et al. (1999); however, the initial conditions for this stage are a product of the highly nonlinear evolution in the Middle stage. If we do not understand where to lay down a few hundred bodies of lunar size, then we do not know what to make of the final state. For the middle stage, constructing initial conditions is rather straight-forward. This is because with enough particles, the initial conditions become relatively insensitive to the initial distribution of inclinations, eccentricities, and planetesimal masses. These should evolve quickly to representative states where one can follow the nonlinear dynamics of planetary build-up and disk cleansing. Therefore, we can create initial conditions for the middle stage that are realistic and characterised by a few global parameters such as surface density distribution and the properties of the giant planets. In fact, we have a good working hypothesis. This is the minimum mass solar nebula: augment each planet with its missing volatiles so it is of solar composition, then spread that mass out over an annulus of width corresponding to the separation of the nearest planets. Except for the asteroid belt, one obtains a remarkably smooth r^{-3/2} surface density profile from the terrestrial planets all the way out to Neptune (see Figure 3). For the rest of this section, I will focus on modelling the Middle stage of planet formation. There are a number of fundamental questions that simulations of this stage can address:
What are the planet formation timescales? The timescales are sensitive to the initial mass distribution and the nature of the growth processes (i.e. whether there was a period of runaway growth - see below). However, there are important observational constraints. For example, pre-main-sequence stars lose their infrared excesses in 1-10 Myr (Strom et al. 1993), setting a limit to the timescale for the planetesimals to become large enough to cease "grinding" collisions that return dust to the disk. Subsequent evolution from these protoplanets to planets in
the inner Solar System may take significantly longer (~10^8 yr; Chambers and Wetherill 1996). The transition from rapid growth to long-term interactions has been treated only qualitatively so far (Lissauer 1993).

Figure 3. The logarithm of the surface density in arbitrary units as a function of the logarithm of the distance from the Sun. The histogram is made by augmenting the planets with their missing volatiles and spreading the resulting mass into an annulus. The dotted line is a Σ ∝ r^{-3/2} power law.
What controls "runaway" growth? The search for runaway growth has been a consistent focus of planetesimal work. While it is now generally accepted that a few bodies do detach from the general planetesimal mass distribution with accelerated growth rates after a certain amount of time and under certain conditions (e.g. the form of the mass and velocity distribution is important), some of the details remain uncertain. This is because direct simulations have to date been too coarse to do more than scratch the surface of the problem. The ultimate goal is to have sufficient dynamic range and time coverage to quantify directly the conditions under which runaway growth both begins and ceases to become effective.
Was there strong radial mixing? Wetherill’s (1990) simulations suggest radial mixing during protoplanet accumulation sufficient to blur chemical gradients-at odds with the dependence of asteroid spectral type on semi-major axis seen by Gradie et al. (1989). Direct simulations can provide a detailed picture of radial mixing by merely comparing initial and final orbital radii. What determines planetary spin? Six of our planets have spin vectors aligned with the common orbital vector, while the remaining three (Venus, Uranus, and Pluto) are retrograde. Direct simulations can
track the spin vectors of planetesimals to determine the trends in obliquity and distinguish between models where planets are gradually spun up versus those where a massive late-stage impact dominates (Lissauer and Safronov 1991, Dones and Tremaine 1993, Greenberg et al. 1996). This issue is related to the likelihood of creating Earth's Moon with a large impact (Cameron and Benz 1991, Ida et al. 1997). Post-formation torquing by solar tides also affects planetary obliquities.
Why is the asteroid belt so sparse? There is only 3 x 10^24 g of material between 2.1 and 4 AU. The size distribution is collisionally evolved and the characteristic relative velocities (~5 km s^{-1}) are larger than the escape velocity of even the largest asteroid, Ceres. The blame for thwarting accretion and carving "the gaps" is nearly always attached to Jupiter. The first requires the rapid formation of Jupiter. The latter may face problems with the extent of mass depletion compared to the narrow width of the resonances, unless Jupiter's semi-major axis migrated during its evolution so the narrow resonance zones swept through the belt and ejected sufficient material (Lissauer and Stewart, 1993). What is the role of orbital migration? Recent discoveries (e.g. Marcy et al. 1998 and references therein) of giant planets in surprisingly small orbits around nearby stars have stimulated an interest in orbital migration (Lin et al. 1996) that may arise from gravitational torques (Goldreich and Tremaine 1980), excitation of spiral density waves in the gaseous disk (Ward 1986, 1997), or preferential scattering of planetesimals (Malhotra 1993, Murray et al. 1998). Jupiter could also drive a one-armed spiral density wave in the planetesimal disk of wavelength ~0.5 AU (for nominal disk parameters) at the g5 secular resonance near 2 AU, creating a spiral wave pattern that rotates on a timescale of ~10^5 yr (Ward and Hahn, 1998). These waves and their relative importance in the heating and long-term stability of the planetesimal disk can be examined with direct simulations.
3.2 Modelling
The simulations described here were performed using a modified version of a cosmological N-body code, PKDGRAV (Stadel and Quinn, in preparation; data structures described in Anderson 1993, 1997). This is a scalable, parallel treecode designed for ease of portability and extensibility. It was originally written for cosmological N-body simulations (see the next section), but it was easily adapted for this application. Load balancing among processors is achieved through domain decomposition: each processor works on particles within a subvolume. These subvolumes are adjusted each timestep according to the amount of work done in the previous force evaluation. The equations of motion in our simulations are integrated using a leapfrog integrator. Leapfrog has several advantages over other methods for this class of problems:

1. For second order accuracy, only one force evaluation and one copy of the physical state of the system are required. This is particularly beneficial for N-body simulations where the cost of a force evaluation is very expensive.
2. The force field in an N-body simulation is not very smooth, so higher order does not necessarily mean higher accuracy.
3. As discussed in the previous section, it is a symplectic integrator, i.e. it preserves properties specific to Hamiltonian systems. In the absence of collisions, a planetesimal system is Hamiltonian, and therefore should benefit from the use of an integrator that conserves phase space volume and has no spurious dissipation.
Spatial and temporal adaptivity

The hierarchical structure of a treecode allows us to follow extremely large dynamic ranges in densities at modest additional cost per force evaluation (e.g. Barnes and Hut 1986, Richardson 1993). However, large ranges in densities also imply a large range in timescales (∝ 1/√(Gρ)). Therefore, implementing a scheme such that the force on a given particle is evaluated at a frequency corresponding to its dynamical time can reduce the computational cost significantly. In PKDGRAV, all particle timesteps are chosen to be a power-of-two subdivision (called a "rung") of the basic timestep. In this way, we ensure that the particles are synchronised at the end of the basic timestep. In a planetesimal simulation, a logical choice for a timestep criterion is
Δt = η (r/F)^{1/2}     (1)
where F is the acceleration of the particle, r is the distance, either to the Sun or to the particle that contributes the largest acceleration, and η is a dimensionless constant. This criterion has the desirable property that in the absence of inter-planetesimal interactions, the planetesimals will have a fixed number of timesteps per orbital period (n = 2π/η). Other criteria have this property, but they also have drawbacks. For example, a criterion with Δt ∝ r/v, where v is the particle's velocity, is not Galilean invariant. A criterion of Δt ∝ F/Ḟ would be suitable, but calculating Ḟ proved to be computationally expensive.
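A minimal sketch of how such a criterion can be turned into power-of-two rungs is given below. It is not the PKDGRAV implementation; the function and parameter names (eta, base_dt, rung) are assumptions chosen for illustration.

```python
import math

def choose_rung(r, F, eta, base_dt, max_rung=29):
    """Assign a particle to a power-of-two timestep rung.

    The ideal step is dt = eta * sqrt(r / F) (Equation 1); the particle is
    placed on the smallest rung whose step base_dt / 2**rung does not
    exceed it, so all particles re-synchronise at the end of base_dt."""
    dt_ideal = eta * math.sqrt(r / F)
    rung = 0
    while base_dt / 2**rung > dt_ideal and rung < max_rung:
        rung += 1
    return rung, base_dt / 2**rung

# Example: a planetesimal at 1 AU feeling only the Sun's acceleration,
# in units of AU, years and solar masses (so GM_sun = 4*pi**2).
GM_sun = 4 * math.pi**2
r = 1.0
F = GM_sun / r**2
print(choose_rung(r, F, eta=2 * math.pi / 24, base_dt=1.0))
```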
Collision detection and resolution

Collisions are predicted at the beginning of each step by keeping the particle velocities fixed and extrapolating the particle positions. Since this is a linear transformation, the time to surface contact between a pair of approaching particles (i.e. for which r·v < 0, where r and v are the relative position and velocity) is given by

t_coll = [ -r·v ± √( (r·v)^2 - v^2 (r^2 - (R_1 + R_2)^2) ) ] / v^2
where R_1 and R_2 are the physical radii of the two particles. The sign ambiguity is resolved by choosing the smallest positive value of t_coll. For any given particle, the N_c nearest neighbours are considered (typically 8 ≤ N_c ≤ 32). The neighbours are found in order N_c log N time using a balanced k-d tree, which is slightly different from the tree used by the gravity solver. The neighbour search algorithm is described in Bentley and Friedman (1979). If a value of t_coll is found that is less than the size of the step, then a collision must be performed. If more than one pair of particles satisfies this condition, the pair with the smallest t_coll value is processed first.
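The contact-time prediction itself is short enough to write out. The function below is an illustration, not the PKDGRAV code; it simply solves |r + v t| = R_1 + R_2 for the earliest positive root.

```python
import math

def time_to_contact(r, v, R1, R2):
    """Earliest positive time at which two particles, moving on straight
    lines with relative position r and relative velocity v, touch
    (surface separation R1 + R2).  Returns None if they miss."""
    r2 = sum(ri * ri for ri in r)
    v2 = sum(vi * vi for vi in v)
    rv = sum(ri * vi for ri, vi in zip(r, v))
    if rv >= 0.0 or v2 == 0.0:          # not approaching
        return None
    disc = rv * rv - v2 * (r2 - (R1 + R2)**2)
    if disc < 0.0:                       # closest approach is a miss
        return None
    roots = ((-rv - math.sqrt(disc)) / v2, (-rv + math.sqrt(disc)) / v2)
    positive = [t for t in roots if t > 0.0]
    return min(positive) if positive else None

# Two 1 km bodies approaching head-on at 1 m/s from 10 km apart (SI units).
print(time_to_contact((1.0e4, 0.0, 0.0), (-1.0, 0.0, 0.0), 1.0e3, 1.0e3))
```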
To determine the collision outcome, the relative speed is compared to the mutual escape speed

v_esc = √( 2GM / (R_1 + R_2) )
where M ≡ m_1 + m_2 is the sum of the particle masses and G is the gravitational constant. If the relative speed is less than the mutual escape speed, the particles are merged to form a new (spherical) object with the same bulk density. Otherwise the particles are allowed to bounce, with some energy dissipation parameterised by coefficients of restitution ε_n and surface friction ε_t (Richardson 1994, 1995). At higher impact energies, cratering and fragmentation would be expected to take place; we do not model these effects currently but plan to add them in the future (see below). However, merged particles are checked to ensure that their post-collision angular speeds do not exceed the classical breakup limit

ω_crit = √( GM / R^3 )
where R is the radius of the newly merged body. Otherwise the particles involved are forced to bounce off rather than merge. This prevents unrealistic mergers resulting from grazing collisions. Once the collision outcome has been determined and new velocities have been calculated (either for the two rebounding particles or for the single merged body), the post-collision particles are traced back to the start of the step so that they can be included in any remaining collision checks. This ensures that all collisions are detected and treated in the correct order, even if particles are involved in more than one collision during the step. For an example calculation, we show the results for N = 10^6 identical 2 g cm^{-3} planetesimals in a cold Σ(a) ∝ a^{-3/2} disk of total mass 4.7 M⊕ that extends from 0.8 to 3.8 AU. The present-day outer planets were included in the calculation in order to gauge their effect on planetesimal accumulation. Since the disk started perfectly flat, the mutual inclination of the planets provided a vertical component of acceleration for the planetesimals. The run took approximately 200 wallclock hours to complete 890 years of integration using a 300-MHz Cray T3E with 128 dedicated processors. Timesteps were fixed at 0.01 years. Figure 4 shows the mass density of this system at the end of the simulation. The effect of Jupiter on the disk, which extends well into the present-day asteroid belt, can be clearly seen: a large density gap opens up at the 2:1 resonance (3.2 AU) and a narrow groove becomes visible at the 3:1 (2.5 AU). Strong transient spiral wave patterns and other telltale features also develop early on before dissolving away. Effects from the other outer planets are too weak to be seen yet. Meanwhile, planetesimal growth has proceeded uninterrupted in the inner region of the disk (under the assumption of perfect accretion). However, due to the realistic particle sizes the largest planetesimal at the end of the run is only 9 times its starting mass. Figure 5 shows the number of mergers as a function of a. Initially the merger rate was very large due to the cold start: there were 60,000 mergers in the first 50 yr, and only 15,000 in the remaining 840 yr. This may explain why there is little evidence in Figure 5 of any sharp changes to the merger rates near the Jupiter resonances, since the scattering
took a while to develop. Hence we cannot determine on the basis of our simulation so far whether the presence of the resonances impedes or enhances planetesimal growth.

Figure 4. Distribution in mass density of a planetesimal disk after 890 years of evolution under the influence of the giant planets. Jupiter is in the upper left-hand corner.
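The merge-or-bounce rule described above can be expressed compactly. The sketch below is a simplified illustration under the stated assumptions (perfect accretion below the escape speed, no fragmentation); the attribute and helper names are hypothetical.

```python
import math

G = 6.674e-11  # SI units

def collision_outcome(m1, R1, m2, R2, v_rel, spin_after_merge):
    """Decide between merging and bouncing for two colliding planetesimals.

    Merge if the relative speed is below the mutual escape speed and the
    merged body would spin below the classical breakup limit; otherwise
    bounce (with dissipation handled elsewhere)."""
    M = m1 + m2
    v_esc = math.sqrt(2.0 * G * M / (R1 + R2))
    if v_rel >= v_esc:
        return "bounce"
    # Radius of the merged sphere at the same bulk density.
    rho = m1 / (4.0 / 3.0 * math.pi * R1**3)
    R_new = (3.0 * M / (4.0 * math.pi * rho))**(1.0 / 3.0)
    omega_crit = math.sqrt(G * M / R_new**3)
    if spin_after_merge > omega_crit:
        return "bounce"     # a grazing merger would be spinning too fast
    return "merge"

# Two equal 1 km, 2000 kg/m^3 bodies hitting at 0.5 m/s with negligible spin.
m = 2000.0 * 4.0 / 3.0 * math.pi * 1.0e3**3
print(collision_outcome(m, 1.0e3, m, 1.0e3, 0.5, 0.0))
```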
3.3 Prospects
There are still a number of improvements to be made on the modelling described here. These include both better algorithms for doing the calculation, and better physical modelling of the system, particularly the collisions. An obvious improvement to try to make is to use an MVS integrator instead of the leapfrog. However, this makes collision detection very much harder, as one has to check for the close approaches of Kepler orbits. This is further complicated by the requirement that the collision search must be done in O(N log N) time. New techniques such as building trees in orbital element space are being investigated. Better collision models are also needed. To this end we have been doing parameter studies of collisions between 1 km bodies, with each body being modelled as a rubble pile; that is, it has no tensile strength and is held together by gravity (Leinhardt et al. 2000). We have done a large parameter study of such collisions so that we can determine what kinds of collisions result in mergers or bouncing or fragmentation of the constituents. These results can then be used to construct heuristic rules to employ in the large disk simulations. The initial particle size distribution is also an issue to be addressed. Small patches of the disk can be studied in very high resolution using the shearing sheet approximation, which will allow us to determine how bodies build up to the km size range.
Figure 5 . The number of mergers is plotted as a function of semi-major axis, a.
4 Simulations of clusters of galaxies
Numerical simulations are required to determine the nonlinear final states of theories of structure formation. These theories, based on the amount and nature of the matter in the Universe, easily predict the statistics of density fluctuations in the early Universe, and these can be directly measured with microwave background experiments such as Boomerang and Maxima. Testing these theories against observations of the clusters of galaxies we see today is significantly more difficult because of the nonlinear physics involved. As well as gravitational physics, there are issues of hydrodynamics and star formation. These eventually need to be addressed as the bulk of our observations of galaxies are of the starlight. Notwithstanding these complications, the numerical simulations can be compared against the observed clustering of galaxies to confirm or rule out the theory. For the past 15 years, the standard model for structure formation in the Universe has been gravitational instability in a Universe dominated by Cold Dark Matter (CDM). This model has several features which make it attractive despite some shortcomings. It is theoretically well motivated. Starting with a scale-free distribution of fluctuations (the Harrison-Zel'dovich spectrum), which then grows through the matter-radiation decoupling era and subsequently collapses gravitationally, does a remarkable job of reproducing a number of observed quantities, all the way from galaxy clustering to the microwave background. Furthermore there is just one free parameter: the amplitude of the fluctuations. On the other hand CDM has had its troubles. Recently the microwave background results coupled with high redshift supernovae distances indicate that the mass density of the Universe is less than the critical density preferred by the standard model. More troubling is that CDM predicts galaxy velocity dispersions that are much higher than those
observed. Different normalisations are also needed to explain observations on different scales. There are several ways to modify CDM to better fit these observations. Lowering the mass density and introducing a cosmological constant are obvious changes. Changing the initial power spectrum from the Harrison-Zel’dovich form is another modification. “Bias”, the idea that matter is clustered differently than the galaxies, adding a small component of hot dark matter, and changing the physical structure of the perturbations are all ideas that have been tried to reconcile CDM with more observations. However, as one adds more and more parameters, the predictive power of the model rapidly decreases. Despite all these issues, in the remainder of this section, I will be focusing on the standard CDM model.
4.1 Clusters of galaxies as dark matter probes

Clusters of galaxies make excellent laboratories for studying the nature of dark matter for several reasons. One is that there are several ways to directly observe the dark matter distribution. The X-ray luminosity (assuming hydrostatic equilibrium) provides a direct measure of the cluster potential. Galaxy velocities provide another measure of the matter distribution. Finally gravitational lensing can measure the surface mass density. Most of these measures are not available for probing galactic dark matter. Again in contrast to galaxies, the cluster dynamics is dominated by gravity. Whereas gas cooling and hydrodynamics play a significant role in the formation of galactic structure, clusters are large enough that gravity is the only dynamically important force. This makes them simpler to model, and also simpler to understand, since one does not need to analyse the possibly complicated interplay between different non-linear physics. A similar point is that the observable structure of a cluster is dominated by gravity. That is, if one just looks at a typical galaxy, the light distribution is very different than the mass distribution. What is seen is the stellar disk, while the dark matter is nearly spherically distributed. In a cluster, the distribution of dark matter is relatively similar to the distribution of galaxies. In other words, the observations are more closely linked to the dominant mass in a cluster. Lastly, there is a practical matter that makes clusters of galaxies good objects to study dark matter. Clusters are only a few dynamical times old, whereas galaxies are older both in physical age, and especially in terms of their internal dynamical times. The smaller age makes clusters that much easier to model in terms of computational cost.
4.2 What is N?
In studying the evolution of dark matter using particle simulations, we have to be very careful about the meaning of the particles. One is tempted to think of the simulation as solving for the orbits of the particles given by the differential equation

ẍ_i = -G Σ_{j≠i} m_j (x_i - x_j) / |x_i - x_j|^3 .
However the N in our simulation is many orders of magnitude different than the N in the physical system. How can following just a few million particles model the evolution of the perhaps 10^70 sub-atomic particles that make up the dark matter of a cluster of galaxies? What we really should be doing is solving the Collisionless Boltzmann Equation:

∂f/∂t + v·∂f/∂x - ∇Φ·∂f/∂v = 0
where f(x, v, t) is the distribution function of the dark matter, and Φ(x) is the gravitational potential. On the surface this is difficult: this is a partial differential equation in 7 dimensions. However, we can use the method of characteristics, where we follow the motion of packets of f: δf(x(t), v(t)). Now the equations of motion for these packets are:
ẋ = v,   v̇ = -∇Φ.

Upon inspection, it is easy to see that the equation of motion for these packets is identical to the equation of motion for our particle orbits. Thinking about the problem in this way helps us properly interpret the particle distribution. Firstly, any quantity involving the particles should be smoothed so that it is averaged over many particles. In particular, interparticle forces should be modified so that Φ is not dominated by single interactions, i.e. the forces should be softened. Even if the forces are softened, the discreteness will cause large scale fluctuations in Φ that will excite spurious solutions for the evolution of f. These should be small compared to quantities of interest.
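A common way of softening the forces is the Plummer form, sketched below. This is an illustration of the idea rather than a statement of what PKDGRAV does; the softening form, function names and parameters are assumptions.

```python
import numpy as np

def softened_accel(x, masses, eps, G=1.0):
    """Pairwise gravitational accelerations with Plummer softening.

    Replacing 1/r^2 by r/(r^2 + eps^2)^(3/2) bounds the force between any
    two particles, so the potential is smoothed over the scale eps and is
    not dominated by single close encounters."""
    n = len(masses)
    acc = np.zeros_like(x)
    for i in range(n):
        dx = x - x[i]                       # vectors from particle i to all others
        r2 = np.sum(dx * dx, axis=1) + eps**2
        inv_r3 = r2**-1.5
        inv_r3[i] = 0.0                     # no self-force
        acc[i] = G * np.sum((masses * inv_r3)[:, None] * dx, axis=0)
    return acc

# Three unit-mass particles, softening of 0.1 length units.
pos = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(softened_accel(pos, np.ones(3), eps=0.1))
```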
4.3 Strategy for simulating clusters
When simulating the formation of a cluster, how big should our simulation volume be in order to capture all the relevant cosmological context? An obvious minimum is 20 Mpc so that the simulation volume doesn’t collapse on itself. A more stringent condition is the ability to model the tides from the surrounding structure. Significant torques come from structure as far as 25 Mpc away (Ryden, 1988). However, to follow the evolution of these structures without significantly suppressing growth with a finite volume one needs to use a 100 Mpc volume (Gelb and Bertschinger, 1994). If the cluster under consideration is not going to be the largest object in the volume but representative of the structures in the simulation volume, then one needs to model a 600 Mpc volume to get a fair sample of clusters. Finally, if we want a volume large enough to determine the cluster-cluster correlations, then a volume of 1000 Mpc is needed. This is also the volume of current large scale galaxy surveys, such as the Sloan Digital Sky Survey (York et al. 2000). The next issue is the resolution needed. To have sufficient resolution that the simulation reproduces the overall structure of the cluster, 100 kpc spatial resolution is needed.
In order to suppress particle noise in the simulation at the 100 kpc scale, approximately 50 million particles are needed in the 1000 Mpc volume. If we wish to resolve the substructure, (that is the galactic halos) within the clusters, we need a spatial resolution of 10 kpc or less, and more than a million particles in the cluster itself. This implies more than 8 x 10" particles within the 1000 Mpc simulation volume. Clearly this is too large for computational resources available for the next few years. We need a simulation strategy that will give us the necessary resolution within the cluster and capture the surrounding structure with a minimum number of particles.
4.4 Simulation procedure
Our strategy is the following. First we simulate a large (1000 Mpc) cosmological volume at relatively low resolution (50 million particles). A candidate cluster is then identified in this simulation. The particles within the selected halo are traced back to the initial conditions to identify the region that will be re-simulated at higher resolution. The power spectrum is extrapolated down to smaller scales, matched at the boundaries such that both the power and waves of the new density field are identical in the region of overlap. Then this region is populated with a new subset of less massive particles. Beyond the high resolution region the mass resolution is decreased in a series of shells such that the external tidal field is modelled correctly in a cosmological context. The starting redshift is increased such that the initial fluctuations are less than one percent of the mean density and we then re-run the simulation to the present epoch. Figure 6 demonstrates the power of this technique. Note how we are able to resolve hundreds of halos within the cluster of interest while capturing the surrounding large scale structure.
Figure 6. A Coma-sized cluster in its cosmological context. The entire simulation is shown in the right hand panel, while the left hand panel shows the resolution we are able to achieve within the cluster itself.
4.5 Radial profiles
The properties of dark matter halos in the CDM model have been extensively investigated since the early 1980s; however, only in the past decade have computational facilities and software improved to the point that the central properties of dark halos can be compared directly with the observations. Navarro, Frenk and White (1996), hereafter referred to as NFW, made a systematic study of CDM halo structure over a range of mass scales. They found that the density profiles of halos follow a universal form, uniquely determined by their mass and virial radius, varying from $r^{-1}$ in the central regions, smoothly rolling over to $r^{-3}$ at the virial radii. (The virial radius, $r_{\rm vir}$, is defined as the radius of a sphere containing a mean mass over-density of 200 with respect to the global value.) The NFW halos typically contained 5,000-10,000 particles, a number that was claimed to be sufficient to resolve the density profile of halos beyond a distance of about 1% of the virial radius.

We investigated the robustness of this result as we varied the resolution of the simulation. We have run the same cluster varying the number of particles in the cluster from 1450 to 2.7 million, and the force softening from 1 Mpc to 10 kpc. Our result is that in the highest resolution simulations the central density rises more steeply than the NFW form given above: $\rho \propto r^{-1.5}$. We found that the large number of particles, as opposed to softening or substructure, was the key ingredient for obtaining the steeper profiles. In other words, the discreteness of the phase space sampling seems to be responsible for flattening the central density profile in the lower resolution simulations.

Is this density profile consistent with what is observed in clusters of galaxies? Unfortunately, results from gravitational lensing have so far been contradictory. Tyson et al. (1998) find evidence for a soft 70 kpc core, while Williams et al. (1999) show that massive clusters are essentially consistent with the N-body predictions. It also seems to be the case that the presence of a central galaxy and cluster substructure makes the comparison difficult. Both scaling arguments and dark matter simulations of galaxy halos show the same steep density profile for galaxy-sized halos. Comparison of these predictions with the rotation curves of disk galaxies is difficult because of the ambiguity of the contribution from the stellar component. The structure of dark matter halos is more directly revealed in systems where the disk contributes little to the dynamics. Several studies have noted that the rotation curves of dark matter dominated dwarf galaxies and low surface brightness galaxies are inconsistent with the profile we find, and instead indicate constant-density cores (Moore 1994, Navarro et al. 1996, Burkert & Silk 1997). On the other hand, beam smearing may be artificially flattening the profiles (van den Bosch et al. 2000). If the flatter density profiles hold up under higher resolution observations, then a significant modification of CDM is required. Recent proposals include collisional dark matter (Spergel and Steinhardt, 2000, Moore et al. 2000), degenerate dark matter, and annihilating dark matter (Calcaneo-Roldan and Moore, 2000).
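For reference, the NFW fitting formula and a generalised form with a free inner slope can be written down compactly. The generalised profile below is one common parametrisation (not necessarily the exact fitting form used in the papers cited here), with gamma = 1 recovering NFW and gamma = 1.5 matching the steeper central behaviour quoted above.

```python
import numpy as np

def nfw_density(r, rho_s, r_s):
    """NFW profile: rho ~ r^-1 inside the scale radius r_s, rolling over to r^-3."""
    x = r / r_s
    return rho_s / (x * (1.0 + x) ** 2)

def generalized_density(r, rho_s, r_s, gamma=1.5):
    """Generalised double power law with inner slope -gamma and outer slope -3."""
    x = r / r_s
    return rho_s / (x ** gamma * (1.0 + x) ** (3.0 - gamma))
```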
4.6 Substructure in clusters
A remarkable feature of the high resolution simulations is the number of subhalos that naturally form. This is in contrast to previous low resolution simulations which had almost no substructure, and determining the predicted galaxy distribution within the cluster required some more or less ad hoc methods (Gelb and Bertschinger, 1994a).
Figure 7. The cumulative number of halos greater than a given circular velocity is plotted for our simulated cluster, our simulated galaxy, the Virgo cluster, and the Milky Way. The circular velocities are scaled according to the circular velocity of the parent halo.

To identify the subhalos in our simulation we used an algorithm that uses local density maxima to find group centres, and assigns particles to those centres via a "watershed" algorithm. In Figure 7 we compare the cumulative distribution in circular velocities of these halos with the circular velocities of galaxies in the Virgo cluster, as derived from the Tully-Fisher relation, and of satellites of the Milky Way. Note how well the simulations match the cluster data, and how poorly they match the Milky Way satellites. This is another indication of trouble for the standard CDM model. Either the model of hierarchical formation is wrong or the substructure is really present in the Milky Way, but is somehow very dark. Numerous studies have invoked feedback from star formation or an ionising background to suppress or darken dwarfs (Dekel and Silk 1986, Quinn et al. 1996); however, the argument has always been weak. There just isn't enough energy to expel gas out of the halo of isolated dwarfs. Furthermore, even if the halos are dark, the spiral disk cannot survive in the presence of large amounts of substructure. The strongly fluctuating potential will turn a spiral galaxy into something that looks like an elliptical. Modifications to the CDM model do not have to be as drastic as what is needed for the core problem discussed above. For example, if the dark matter were just slightly "warm", e.g. as would be the case for a 1 keV neutrino, then structure on subgalactic scales would be erased, allowing for a good match with the observations.
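The comparison in Figure 7 is essentially a cumulative count of subhalos above a given scaled circular velocity. A minimal sketch of how such a curve is assembled from a subhalo catalogue is given below; the velocities and the parent value are made-up numbers used only for illustration.

```python
import numpy as np

def cumulative_vc_distribution(v_circ, v_parent):
    """Cumulative number of subhalos with circular velocity above a threshold,
    scaled by the parent halo's circular velocity (as in Figure 7)."""
    x = np.sort(np.asarray(v_circ, dtype=float) / v_parent)
    counts = np.arange(len(x), 0, -1)   # N(>= x_i) for each sorted threshold
    return x, counts

# hypothetical subhalo circular velocities (km/s) around a parent with v = 2000 km/s
x, n = cumulative_vc_distribution([150, 220, 90, 400, 310, 120], v_parent=2000.0)
```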
4.7 Future work
All of the above discussion has been in the context of only the dark matter. Especially on galaxy scales, non-gravitational processes such as gas dynamics, cooling and star formation play a major role. In the end, we need to model the things we can actually see, stars and gas, in order to make good comparisons with observations. Furthermore, simulated observations need to be performed so that direct comparisons can be made between models and observations without any simplifying assumptions. Simulations are being performed, but the resolution requirements are even higher than for the dark matter only calculations. If the recent history of the subject is any guide, we expect these studies to produce new and exciting discoveries about the fundamental properties of the Universe.
References
Anderson R J, 1993, Computer science problems in astrophysical simulation, in Silver Jubilee Workshop on Computing and Intelligent Systems, pp 48-61, Tata McGraw-Hill, New Delhi.
Anderson R J, 1996, Tree data structures for N-body simulation, Proc 37th Ann Symp Foundations of Comp Sci, 224.
Applegate J H, Douglas M R, Gursel Y, Sussman G J & Wisdom J, 1986, Astron J 92 176.
Arnold V I, 1978, Mathematical Methods of Classical Mechanics (New York: Springer-Verlag).
Bentley J L & Friedman J H, 1979, Data structures for range searching, Computing Surveys 11 397.
Bretagnon P, 1974, Termes à longues périodes dans le système solaire, Astron & Astrophys 30 141.
Brouwer D & van Woerkom A J J, 1950, Astron Pap Am Ephem 13: part 2, Washington: US Print Off.
Burkert A & Silk J, 1997, Dark Baryons and Rotation Curves, Astrophys J Lett 488 L55.
Calcaneo-Roldan C & Moore B, 2000, Phys Rev D, in press.
Cameron A G W & Benz W, 1991, Icarus 92 204.
Chambers J E & Wetherill G W, 1998, Icarus 136 304.
Chambers J E & Wetherill G W, 1996, AAS/Division of Planetary Sciences Meeting 28 1106.
Dekel A & Silk J, 1986, The origin of dwarf galaxies, cold dark matter, and biased galaxy formation, Astrophys J 303 39.
Dones L & Tremaine S, 1993, Icarus 103 67.
Gelb J M & Bertschinger E, 1994, Astrophys J 436 467.
Gelb J M & Bertschinger E, 1994a, Cold dark matter 2: Spatial and velocity statistics, Astrophys J 436 491.
Gladman B, Duncan M & Candy J, 1991, Cel Mech and Dyn Astr 52 221.
Goldreich P & Tremaine S, 1980, Ap J 241 425.
Gradie J C, Chapman C R & Tedesco E F, 1989, Asteroids II, 316.
Greenberg R, Fischer M, Valsecchi G B & Carusi A, 1996, AAS/Division of Planetary Sciences Meeting 28 1110.
Hénon M, in Chaotic Behaviour of Deterministic Systems, ed. G Iooss et al., Amsterdam: North-Holland, 54.
Ida S, Canup R M & Stewart G R, 1997, Nature 389 353.
Laskar J, 1989, Nature 338 237.
Laskar J, 1994, Large-scale chaos in the solar system, A & A 287 L9.
Laskar J, Quinn T & Tremaine S, 1992, Icarus 95 148.
Leinhardt Z, Richardson D C & Quinn T, 2000, Direct N-body Simulations of Rubble Pile Collisions, Icarus 146 133.
Lin D N C, Bodenheimer P & Richardson D C, 1996, Nature 380 606.
Lissauer J J, 1993, Ann Rev Astr Astrophys 31 129.
Lissauer J J & Stewart G R, 1993, Protostars and Planets III, 1061.
Lissauer J J & Safronov V S, 1991, Icarus 93 288.
Marcy G W, Butler R P, Vogt S S, Fischer D & Lissauer J J, 1998, Ap J Lett 505 L147.
Malhotra R, 1993, Nature 365 819.
Moore B, Gelato S, Jenkins A, Pearce F R & Quilis V, 2000, Collisional versus Collisionless Dark Matter, Astrophys J Lett 535 L21.
Moore B, 1994, Evidence against Dissipationless Dark Matter from Observations of Galaxy Haloes, Nature 370 629.
Murray N, Hansen B, Holman M & Tremaine S, 1998, Science 279 69.
Navarro J F, Eke V R & Frenk C S, 1996, The cores of dwarf galaxy haloes, Mon Not R Astron Soc 283 L72.
Navarro J F, Frenk C S & White S D M, 1996, The Structure of Cold Dark Matter Halos, Astrophys J 462 563.
Quinn T, Katz N & Efstathiou G, 1996, Photoionization and the formation of dwarf galaxies, Mon Not R Astron Soc 278 L49.
Quinn T R, Tremaine S & Duncan M, 1991, Astron J 101 2287.
Richardson D C, 1994, Tree code simulations of planetary rings, Mon Not R Astron Soc 269 493.
Richardson D C, 1995, A self-consistent numerical treatment of fractal aggregate dynamics, Icarus 115 320.
Rivera E, Lissauer J J, Duncan M J & Levison H F, 1999, AAS/Division of Dynamical Astronomy Meeting 31 0202.
Ryden B S, 1988, Astrophys J 333 78.
Safronov V S, 1969, Evolution of the Protoplanetary Cloud and Formation of the Earth and Planets, Nauka Press, Moscow.
Saha P & Tremaine S, 1992, Symplectic integrators for solar system dynamics, Astron J 104 1633.
Spergel D N & Steinhardt P J, 2000, Observational evidence for self-interacting cold dark matter, Physical Review Letters 84 3760.
Strom S E, Edwards S & Skrutskie M F, 1993, Protostars and Planets III, 837.
Sussman G J & Wisdom J, 1988, Science 241 433.
Sussman G J & Wisdom J, 1992, Science 257 56.
Tyson J A, Kochanski G P & dell'Antonio I P, 1998, Detailed Mass Map of CL 0024+1654 from Strong Lensing, Astrophys J Lett 498 L107.
Ward W R & Hahn J M, 1998, Astron J 116 489.
Ward W R, 1997, Icarus 126 261.
Ward W R, 1986, Icarus 67 164.
Wetherill G W, 1990, Annual Review of Earth and Planetary Sciences 18 205.
Williams L L R, Navarro J F & Bartelmann M, 1999, The Core Structure of Galaxy Clusters from Gravitational Lensing, Astrophys J 527 535.
Wisdom J & Holman M, 1991, Symplectic maps for the N-body problem, Astron J 102 1528.
van den Bosch F C, Robertson B E, Dalcanton J J & de Blok W J G, 2000, Constraints on the Structure of Dark Matter Halos from the Rotation Curves of Low Surface Brightness Galaxies, Astron J 119 1579.
York D G et al., 2000, Astron J 120 1579.
Yoshida H, 1990, Construction of higher order symplectic integrators, Phys Lett A 150 262.
On the Trojan problem

Rudolf Dvorak and Elke Pilat-Lohinger

University of Vienna, Austria
1 Introduction
Up to 1800 the knowledge of bodies in our Solar System was limited to seven planets moving around the Sun in more or less circular orbits and some strange species, the comets, which, on highly eccentric orbits, appeared in the night sky like messengers of pain for the human race. It was on New Year's Eve of 1801, in Sicily, that Piazzi discovered the first asteroid, Ceres (1); at that time this was regarded as a new planet between Mars and Jupiter, thus reinforcing belief in the Titius-Bode law which 'predicted' an undiscovered planet at approximately 2.8 AU. In the following years astronomers discovered more and more bodies with semimajor axes in the range 1.5 AU < a < 5.2 AU, which finally led to the, nowadays confirmed, assumption of a belt of asteroids. In 1846 the eighth planet, Neptune, was discovered by Galle in Berlin after the theoretical work of Leverrier and Adams. Besides the interesting discovery in 1898 of the first Earth-approaching asteroid, Eros (433) (coming as close as the Moon), it was a real surprise to find celestial bodies moving in the 1:1 resonance with Jupiter. The first asteroid librating on a stable orbit close to Jupiter's Lagrangian point L4 was observed in 1906 by Max Wolf in Heidelberg; it was named Achilles (588) after the hero of the Trojan war. The discovery of Pluto in 1930 led to the following conception of the Solar System: nine planets moving around the Sun on more or less circular orbits; in between the orbits of Mars and Jupiter, and also close to the Lagrangian equilibrium points of Jupiter, smaller bodies (asteroids) orbiting the Sun; and comets, with sometimes very eccentric orbits, coming from far outside the Solar System. But in 1932 the first discovered real Earth crosser, Apollo (1862), led to the conclusion that many more smaller objects may populate the region of the inner Solar System. In fact more minor planets were found with semimajor axes slightly larger than that of the Earth, and crossing its orbit. In 1976 the first asteroid moving inside the Earth's orbit and also crossing it was found: Aten (2062). The discovery of the first so-called Edgeworth-Kuiper-Belt object 1992 QB1 by Jewitt and Luu (1993), in the region where Pluto moves, answered the question whether other
planets (or asteroids) exist beyond Pluto. Nowadays we may distinguish globally 4 different groups of asteroids:

1. The Edgeworth-Kuiper objects (KBO) moving outside Neptune (275 objects).
2. The cloud of Jupiter Trojans moving close to the Lagrangian equilibrium points L4 and L5 (613 objects).
3. The main belt asteroids, with semimajor axes between those of Mars and Jupiter (7369 objects).
4. The asteroids with perihelion distances q < 1.3 AU, which we call Near Earth Asteroids (1014 objects).

There is extended literature available for all asteroid "belts"; we just mention some recent articles: Dvorak and Pilat-Lohinger (1999a, for the KBO objects), Moons (1997, for the dynamics of the main belt) and Dvorak and Pilat-Lohinger (1999b, for the NEAs). In the following we will extensively discuss the literature on the Trojan "swarms" (as some colleagues call them). Concentrating on the dynamics of the Jupiter Trojans, the theory of the Restricted Three-Body Problem (RTBP) will be introduced, which is the basis of understanding the motion of these asteroids. Then we will report on theoretical work, using analytical methods, to understand their dynamics and we will also briefly discuss the importance of other types of resonances within the 1:1 mean motion resonance with Jupiter. Numerical results will show how we deal with the problem of the long term evolution of the Trojans and their stability. Finally we give some examples of peculiar orbits in connection with the complicated dynamics of asteroids in stable chaotic motion, an expression which goes back to Milani & Nobili (1992).
2 The restricted three-body problem

2.1 Equations of motion
The classical restricted three-body problem consists of the following restrictions with respect to the general three-body problem:
1. The third body (mass $m_3$) is thought to be massless and thus does not affect the Keplerian motion of the two primaries (masses $m_1$ and $m_2$).

2. The motion of $m_3$ takes place in the plane of motion of the two primaries.

3. The motion of the primaries is circular.

The problem is reduced to the motion of $m_3$, described by a set of second order differential equations in the plane, and the usual way of formulating the problem is to use a uniformly rotating coordinate system, where the two primaries have fixed positions on the $\xi$-axis. When the motion of the primaries is such that their eccentricity is NOT negligible ($e > 0$) we speak of the elliptic restricted problem. Then a uniformly rotating frame cannot be
used and a rotating-pulsating coordinate system replaces it. When the orbit of $m_3$ is not confined to the plane of motion of the primaries we deal with the spatial restricted problem. The most effective way to derive the equations of motion in the rotating frame is to express the kinetic energy ($T$) and the potential energy ($U = r^{-1}$) in the new coordinates and then use the Euler-Lagrange equations:
$$\frac{d}{dt}\left(\frac{\partial L}{\partial \dot q_i}\right) - \frac{\partial L}{\partial q_i} = 0. \qquad (1)$$
For the transformation of the fixed coordinate system to a uniformly rotating one we use the following relations
$$x = \xi\cos\phi - \eta\sin\phi \qquad (2)$$
$$y = \xi\sin\phi + \eta\cos\phi \qquad (3)$$
with $\phi = nt$, where $n$ denotes the mean motion. Differentiating with respect to time and inserting in the well known expression for the kinetic energy in the fixed coordinate system,
$$T = \tfrac{1}{2}\left(\dot x^2 + \dot y^2\right), \qquad (4)$$
leads to
$$T = \tfrac{1}{2}\left[(\dot\xi - n\eta)^2 + (\dot\eta + n\xi)^2\right]. \qquad (5)$$
Likewise the potential energy can be expressed as $U = (\xi^2+\eta^2)^{-1/2}$. Building the required derivatives with respect to $\xi$, $\eta$, $\dot\xi$ and $\dot\eta$, we derive the Euler-Lagrange equations in the following form
$$\frac{d}{dt}\left(\dot\xi - n\eta\right) = n\left(\dot\eta + n\xi\right) - \frac{\xi}{r^3} \qquad (6)$$
$$\frac{d}{dt}\left(\dot\eta + n\xi\right) = -n\left(\dot\xi - n\eta\right) - \frac{\eta}{r^3}. \qquad (7)$$
Finally the equations of motion in a rotating frame become
$$\ddot\xi - 2n\dot\eta = \frac{\partial\Omega}{\partial\xi} \qquad (8)$$
$$\ddot\eta + 2n\dot\xi = \frac{\partial\Omega}{\partial\eta} \qquad (9)$$
where the effective potential in the rotating frame reads
$$\Omega = \frac{1}{r} + \frac{n^2}{2}r^2, \qquad (10)$$
$r$ being the distance from the centre of mass.

2.2 Jacobi integral and zero-velocity curves
The equations derived above possess an integral of motion which is easy to compute when we multiply equation (8) by $\dot\xi$ and equation (9) by $\dot\eta$ and add the two expressions. The new equation
can be integrated, leading to the well-known Jacobi integral (after Carl Gustav Jacob Jacobi, 1804-1851), which reads
$$V^2 = 2\Omega - C. \qquad (11)$$

Figure 1. Zero velocity curves (ZVC): (a) Large Jacobi constant: $m_3$ can move either close to $m_1$ or $m_2$ (a satellite), or far away from both (a planet). (b) Moderate Jacobi constant: $m_3$ can move around both primaries, because a channel close to $L_1$ is open (e.g. artificial satellites can orbit around both Earth and Moon). (c) Small Jacobi constant: $m_3$ can now leave the region around the primaries. With even smaller values of $C$ the ZVC degenerate to the equilateral points and there are no more restrictions for the motion of $m_3$ (e.g. comets). After Stumpf (1970).
The zero velocity curves (ZVC), defined by $V^2 = f(\xi,\eta) = 0$, form the border in the $(\xi,\eta)$ coordinate system between allowed ($V^2 \geq 0$) and forbidden ($V^2 < 0$) regions of motion for a small body under the gravitational attraction of the two primaries. See Figure 1, in which forbidden regions are shaded. The initial conditions $\xi$, $\eta$, $\dot\xi$ and $\dot\eta$ define the Jacobi constant $C$, which acts as the integral of relative energy in the restricted three-body problem. The discussion of the properties of the potential function $\Omega$ leads to 5 stationary points of equilibrium called the Lagrangian points $L_i$, $i = 1,\ldots,5$; three of these points lie on the $\xi$-axis (one to the left of one of the primaries, one in between the primaries and a third one to the right of the other primary, see Figure 1). A study of the stability properties shows that the three collinear equilibrium points, $L_1$, $L_2$ and $L_3$, are unstable points, which means that a body initially close to any of them will diverge exponentially fast. The two other, equilateral, Lagrangian points ($L_4$, which precedes the planet by 60° in longitude, if we take the Sun and Jupiter as primaries, and $L_5$, which trails Jupiter by 60°) are stable equilibrium points for mass ratios of the primaries up to $\mu = m_2/m_1 = 0.0385$. Thus, all equilateral Lagrangian points in frequently studied three-body subsystems of the Solar system are stable, even in the Earth-Moon system where $\mu = 1/81 = 0.0123$.
Nowadays (October 2000) 480 asteroids, all of which were given the names of warriors of the Trojan war, are known to be moving close to 60° ahead of Jupiter (L4) and 344 objects are trailing Jupiter at about 60° (L5). A possible explanation for this difference in the number of Trojans around the two equilibria was given by Barber (1986), who postulated that this effect may be due to the long term perturbations of Saturn, but this question has not been answered successfully yet.
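For readers who want to experiment with the dynamics sketched in this section, the planar circular problem is easy to integrate numerically. The following Python sketch writes the rotating-frame equations of motion and the Jacobi constant in the usual dimensionless units (primary separation and mean motion set to one, both primaries kept in the effective potential); the Sun-Jupiter mass ratio is an approximate, assumed value, and the code is only an illustration, not any of the integrators used later in this lecture.

```python
import numpy as np

MU = 0.000954  # approximate Jupiter/(Sun+Jupiter) mass ratio (assumed value)

def effective_potential(x, y, mu=MU):
    """Omega = (x^2 + y^2)/2 + (1-mu)/r1 + mu/r2 in the rotating frame (n = 1)."""
    r1 = np.hypot(x + mu, y)        # distance to the Sun at (-mu, 0)
    r2 = np.hypot(x - 1.0 + mu, y)  # distance to Jupiter at (1-mu, 0)
    return 0.5 * (x**2 + y**2) + (1.0 - mu) / r1 + mu / r2

def jacobi_constant(x, y, vx, vy, mu=MU):
    """C = 2*Omega - v^2; conserved along any orbit of the circular problem."""
    return 2.0 * effective_potential(x, y, mu) - (vx**2 + vy**2)

def derivs(state, mu=MU):
    """Rotating-frame equations: x'' - 2y' = dOmega/dx, y'' + 2x' = dOmega/dy."""
    x, y, vx, vy = state
    r1 = np.hypot(x + mu, y)
    r2 = np.hypot(x - 1.0 + mu, y)
    ax = x + 2.0 * vy - (1.0 - mu) * (x + mu) / r1**3 - mu * (x - 1.0 + mu) / r2**3
    ay = y - 2.0 * vx - (1.0 - mu) * y / r1**3 - mu * y / r2**3
    return np.array([vx, vy, ax, ay])
```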
3 Analytical approaches and qualitative results
Analytical estimates for the stability range of libration orbits in the RTBP were already carried out by Thüring (1931); only 7 Trojans were known at that time. The stability limit $\delta a = \pm 0.05\,a_J$, where $a_J$ is the semi-major axis of Jupiter, is well above the stable libration limit found by recent methods. Numerical integrations of libration orbits around the Lagrangian point L4 were carried out for the first time by Thüring (1959) in connection with the existence of periodic orbits. In a further step, the stability of the long-period libration of the Trojans was established on the basis of the linearised variational equations, valid only for infinitesimally small displacements from the exact periodic orbit, by Rabe (1961, 1962). Using the RTBP, Rabe (1961) established a limiting curve for the stability of the Trojans, depending on the eccentricity and the libration width, which is also valid for the planar elliptic restricted problem. Since the orbits of the real Trojans additionally show short-periodic oscillations, higher-order terms had to be included in the theoretical model. These correction terms led to the conclusion that the Trojans are stable, at least up to the third-order approximation (Rabe, 1967). In a qualitatively different approach Giorgilli and Skokos (1997) considered the problem of stability of the triangular Lagrangian equilibria in the RTBP. They were able to prove, in the spirit of Nekhoroshev's theory, the effective stability (i.e. orbital stability over exponentially long times) of orbits initiated in a region around the equilateral equilibrium points of Jupiter, which is big enough to include some known Trojan asteroids. Several analytical studies have been carried out during the last years using also simplified models, mostly based on the RTBP or extensions of it:

1. Garfinkel (1977) used the planar restricted three body problem.
2. Erdi (1977) generalised this work to the planar elliptic restricted three body problem.
3. Erdi (1978) extended it to the 3-dimensional elliptic restricted three-body problem and studied the motion of the Trojans in a series of papers (Erdi 1981, 1983, 1984, 1988, Erdi & Presler 1980, Erdi & Varadi 1983).
4. Zagretdinov (1986) studied the motion in the 3-dimensional restricted three body problem (an extension of Garfinkel's solution).
5. Erdi (1995) investigated the Trojan problem by taking into account the major perturbations of the giant planets.

It is remarkable that there exist four basic, well distinct periods of the Trojans' orbital motion around Jupiter's triangular points:
1. A period of revolution (approximately 12 yrs).
2. A period of libration around $L_4$ or $L_5$ (145.7-240 yrs).
3. A period of free motion of the perihelion (3000-5600 yrs).
4. A period of free motion of the ascending node (38000-2700000 yrs).

Evidently no low order resonances between the basic motions of the Trojans are present. The orbital evolution is mainly driven by the 1:1 mean motion resonance with Jupiter, the 5:2 mean motion resonance with Saturn and, at certain regions inside the 1:1 libration zone, by the presence of secular resonances. To study the problem of the Trojans theoretically one has to take into account that the inclinations of these asteroids can be large (up to 37°); this makes the problem different from the study of the main belt asteroids, which are mostly confined to low-inclination orbits.
Figure 2. The coordinate system: $r$, $\alpha$, $z$ are the cylindrical coordinates of the asteroid; $v$ is the true anomaly of Jupiter; the distance between the Sun and Jupiter is the unit distance. $R_1$ and $R_2$ denote the distances from the Sun and Jupiter, respectively. (After Erdi, 1981).
3.1 Basic considerations
In the following we will sketch the analytical work carried out by Erdi, who studied the motion in the gravitational field of the Sun and Jupiter, in the case where Jupiter's orbit is a fixed ellipse around the Sun (spatial elliptic restricted problem). The equations of motion in the cylindrical coordinates $(r, \alpha, z)$ of Figure 2 are:
$$\frac{d^2r}{dv^2} - r\left(\frac{d\alpha}{dv}\right)^2 - 2r\frac{d\alpha}{dv} = \frac{1}{1+e_J\cos v}\left[r - \frac{(1-\mu)\,r}{R_1^3} + \mu\left(\frac{\cos\alpha - r}{R_2^3} - \cos\alpha\right)\right]$$
$$\frac{d}{dv}\left[r^2\left(\frac{d\alpha}{dv}+1\right)\right] = -\frac{\mu\, r\sin\alpha}{1+e_J\cos v}\left(\frac{1}{R_2^3} - 1\right)$$
$$\frac{d^2z}{dv^2} + z = \frac{z}{1+e_J\cos v}\left[1 - \frac{1-\mu}{R_1^3} - \frac{\mu}{R_2^3}\right] \qquad (12)$$
where $r$ and $\alpha$ are polar coordinates in the orbital plane of Jupiter and the $z$ coordinate is perpendicular to this plane. $v$ is the true anomaly of Jupiter and $e_J$ denotes the eccentricity of Jupiter's orbit. $\mu$ is the mass ratio given by the ratio of Jupiter's mass to the total mass of the system (i.e. the sum of the masses of Sun and Jupiter). The distances of the asteroid to the Sun ($R_1$) and to Jupiter ($R_2$) are given by
$$R_2 = \sqrt{1 + r^2 - 2r\cos\alpha + z^2}, \qquad R_1 = \sqrt{r^2 + z^2}.$$
For the equations of motion Erdi assumed a solution in the form of a three-variable asymptotic expansion,
$$z = \sum_{n=1}^{N} \varepsilon^n z_n(w, u, \tau) + O(\varepsilon^{N+1}), \qquad (13)$$
with analogous expansions for $r$ and $\alpha$, where
$$\varepsilon = \sqrt{\mu}, \qquad u = \varepsilon(w - w_0), \qquad \tau = \varepsilon^2(w - w_0)$$
($w_0$ is the epoch). The replacement of $r$, $\alpha$ and $z$ in the equations of motion by these expansions results in a system of partial differential equations for the unknown functions $r_n$, $\alpha_n$, $z_n$, for which the arbitrary functions appearing in the solutions are chosen such that the solution for $r$, $\alpha$, $z$ should not contain secular terms. The solution was first determined to the second order (Erdi 1981) and then extended to the third order (Erdi 1984). From the solution for $r$, $\alpha$, $z$ one can determine the perturbations of the Trojan by means of the two-body problem. To take also the perturbations of the other giant planets into account, Erdi studied the motion of the Trojans in a model in which the elliptic orbit of Jupiter is secularly precessing (Erdi 1995). Therefore, the left-hand sides in the first two of equations (12) should be changed to
$$\frac{d^2r}{dv^2} - r\left(\frac{d\alpha}{dv}\right)^2 - 2r\frac{d\alpha}{dv} - 2p\,r\left(\frac{d\alpha}{dv}+1\right) = \ldots$$
where the parameter $p$, equal to $\dot\varpi_J/n_J$, is connected to the precession of Jupiter's orbit; $\dot\varpi_J$ is the secular rate of change of the perihelion of Jupiter's orbit and $n_J$ denotes Jupiter's mean motion. The application of the method of a four-variable asymptotic expansion results in a system of partial differential equations, where the solutions for $r$, $\alpha$, $z$ and the perturbations of the Trojan's motion are determined in the same way as in the spatial elliptic restricted three-body problem (for more details see Erdi 1995).
Figure 3. The variation of $a$ against $\alpha_0$. Each curve corresponds to a certain value of $h$ (starting with the innermost): 1.6, 2, 2.5, 3, 4.5 and 6. (After Erdi, 1997).
3.2 Libration around the triangular points

The main part of the libration around $L_4$ is given by the term $\alpha_0$ in the expansion of $\alpha$ in Equation (13). Figure 3 shows the variation of the semi-major axis against $\alpha_0$, where the curves correspond to different values of the "energy" integral $h$,
$$h = \frac{1}{2}\left(\frac{d\alpha_0}{du}\right)^2 + 3\left[2^{-1/2}(1-\cos\alpha_0)^{-1/2} - \cos\alpha_0\right], \qquad (14)$$
which comes from solving the equation of motion for $\alpha_0$,
$$\frac{d^2\alpha_0}{du^2} = -3\left[1 - 2^{-3/2}(1-\cos\alpha_0)^{-3/2}\right]\sin\alpha_0 ,$$
of which Equation (14) is the first integral.
3.3 Perturbations of the eccentricity and the perihelion

The theory of secular perturbations can be used to describe the long-term variations of Jupiter's eccentricity ($e_J$) and of the longitude of perihelion ($\varpi_J$); in the context of linear secular theory for Jupiter and Saturn:
$$h_J = \sum_{j=5}^{6} m_{J_j}\sin(g_j t + \beta_j), \qquad k_J = \sum_{j=5}^{6} m_{J_j}\cos(g_j t + \beta_j), \qquad (15)$$
with $h_J = e_J\sin\varpi_J$ and $k_J = e_J\cos\varpi_J$. The fundamental frequencies $g_j$ and the constants $m_{J_j}$ and $\beta_j$ can be found in the paper by Nobili et al. (1989). The following figures show the behaviour of the eccentricity and the longitude of perihelion in different models. In Figure 4a the two types of motion, i.e. libration and circulation, are shown in the planar elliptic restricted three-body problem, when Jupiter's orbit is a fixed ellipse around the Sun. A secularly perturbed orbit of Jupiter, in the planar system, causes a variation of the asteroid's eccentricity which can be seen in Figure 4b. Studying the same orbit in the spatial case, with an inclination of 19°, the variation of the eccentricity is stronger than in the planar case (see Figure 4c). Figure 4d shows the behaviour of the eccentricity and the longitude of perihelion of the Trojan asteroid (1173) Anchises, according to a numerical integration; this result confirms the validity of Erdi's analytical work.
Figure 4. Variation of the eccentricity versus $\varpi - \varpi_J$: (a) planar restricted three body problem, $e_J$ is constant; (b) planar problem, $e_J$ is secularly changing; (c) spatial problem, $e_J$ is secularly changing; (d) for the Trojan asteroid (1173) Anchises according to a numerical integration in the OSS model. (After Erdi, 1997).
4 The rôle of the resonances
In dynamical models describing the motion of asteroids, where there is more than one perturbing planet, three new types of resonance occur in addition to the mean motion resonance; these are (a) the secular resonances (SR), (b) the Kozai resonance and (c) the three-body mean motion resonance.
Secular resonances. These are important new resonances between the precession frequency of an asteroid's longitude of perihelion (or node), $\dot\varpi$ (or $\dot\Omega$), and one (or a linear combination) of the characteristic secular frequencies of the solar system, which describe the precession of the planetary orbits. Of special interest are the so-called linear secular resonances. The notation $\nu_i$ is used to describe the SR which arises when the mean precession rate of the perihelion longitude ($\dot\varpi$) of an asteroid is equal to the mean precession rate of the perihelion longitude of Jupiter ($i = 5$), Saturn ($i = 6$), Uranus ($i = 7$) or Neptune ($i = 8$); more precisely when $\langle\dot\varpi\rangle \approx g_i$, where the $g_i$ are the secular frequencies calculated from linear secular theory (see Equation 15). Similarly, the SR $\nu_i$ arises when the mean precession rate of the asteroid's node is equal to the mean precession rate of the node of Jupiter ($i = 15$), Saturn ($i = 16$), Uranus ($i = 17$) and Neptune ($i = 18$).
Already Bien and Schubart (1984) pointed out the perturbations acting on the orbital motion of the Trojans due to the non-constant nodes and perihelion distances of Jupiter and Saturn. The important rôle of the linear secular resonances, especially the $\nu_6$, $\nu_5$ and $\nu_{16}$, in the dynamical sculpting of the main belt and the evolution of the NEAs was recently demonstrated by several researchers (e.g. Michel and Froeschlé, 1997, Dvorak and Pilat-Lohinger, 1999b). These mechanisms act such that a celestial body initiated inside a SR suffers a strong, chaotic increase in the eccentricity (in $\nu_5$ or $\nu_6$) or in the inclination (in $\nu_{16}$). Recently Morais (1999) derived a secular theory for Trojan-type motion in the simplified framework of the restricted three-body problem; it is valid inside the entire regular region of the 1:1 mean motion resonance. An extension of the theory to include the secular perturbations from additional bodies and an oblate central mass is, according to the author, also possible under certain assumptions. Using Morais' theory it is also possible to locate the linear secular resonances which play an important role in the long-term stability of Trojan orbits. We will see this in the next section.
The Kozai resonance This resonance (Kozai, 1962) acts when the precession rate of the longitude of perihelion of a body in the Solar System is equal to the precession rate of its nodal longitude. In this situation the precession of the argument of the perihelion stops ($\dot\omega = 0$). In the context of linear theory it turns out that the Delaunay element $H = \sqrt{a(1-e^2)}\cos i$ is constant, as in Keplerian motion. Although in this simplified approach the semimajor axis is constant, this is not true for the inclinations and the eccentricities. However, due to the constant value of $H$, these orbital elements behave such that a minimum in eccentricity corresponds to a maximum of inclination and vice versa (see Figure 5).
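The anti-correlation between eccentricity and inclination implied by the constancy of H is easy to illustrate numerically. In the following sketch the orbit parameters are hypothetical and serve only to show the direction of the exchange.

```python
import numpy as np

def inclination_from_e(e, a, H):
    """Inclination implied by conservation of H = sqrt(a (1 - e^2)) cos(i)."""
    cos_i = H / np.sqrt(a * (1.0 - e**2))
    return np.degrees(np.arccos(cos_i))

# Hypothetical Kozai-locked orbit: a fixed, H fixed by the assumed initial (e, i).
a = 5.2                                   # semimajor axis in AU (assumed)
e0, i0 = 0.05, np.radians(30.0)           # assumed initial eccentricity and inclination
H = np.sqrt(a * (1.0 - e0**2)) * np.cos(i0)

for e in (0.05, 0.2, 0.4):
    print("e = %.2f  ->  i = %.1f deg" % (e, inclination_from_e(e, a, H)))
# as e grows the inclination drops, and vice versa, while H stays fixed
```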
The three-body mean motion resonance This new type of mean motion resonance in the dynamics of main-belt asteroids was investigated by Nesvorný and Morbidelli (1998). In such a resonance the critical argument is defined as a linear combination of the mean longitudes of the asteroid, Jupiter and Saturn and can be thought of as the analogue of the Laplace resonance of the Galilean satellites. The large variations of the semi-major axis of the asteroid (490) Veritas, another 'Asteroid in Stable Chaos' (ASC), were attributed to the action of such a mixed resonance (Milani et al. 1997). Such resonances may play a role for small inclinations, because the effective width of these resonances is a decreasing function of the inclination. Thus, for inclinations of the order of 20°, the presence of a secular resonance in the same region of the proper elements' space would dominate the dynamical evolution of a minor body.
Discussion Although most of the Trojans seem to be favoured by the L4 (or L5) stability region of the 1:1 mean motion resonance with Jupiter, the relatively high inclinations (about 20°) of many of them (e.g. Thersites, Phoinix) make them possible candidates for suffering instabilities induced in their orbits by the proximity of secular resonances, which may even lead to an escape from the Trojan clouds.
Figure 5. Orbital parameters a, e and sin(i) for an asteroid (1992 FE) in the Kozai resonance.

This has been shown recently by Tsiganis et al. (2000a) and Tsiganis and Dvorak (2000) for two Trojans, namely Thersites and Achates, and by Marzari and Scholl (1999) for 300 fictitious Trojans. In the latter case the orbits were integrated with the aid of the SWIFT program (Levison and Duncan, 1994), for initially small eccentricities and inclinations ($e < 0.05$ and $i < 5°$) and libration widths of the order of 60°. The dynamical model was (as usual for the Trojans) the Outer Solar System (OSS), and the time-span of the integration was 400 Myrs. The authors found the following interesting scenario: the Kozai resonance may increase relatively fast the inclination of initially low-inclined orbits up to $i \sim 10°$; then the secular resonance $\nu_{16}$ pumps up the inclinations to even larger values (up to $i \sim 20°$). Collisions within the Trojan clouds may reduce the large amplitude of libration of some Trojans, such that these asteroids stay in a stable orbit in the vicinity of the equilibrium point, while the others would escape quite fast.
5 Numerical results

5.1 Numerical approaches of Schubart, Milani and Levison
One of the first numerical studies has been undertaken by Schubart and Bien (1987) and Bien & Schubart (1987) where they derived three "characteristic orbital parameters" (the three proper elements: amplitude of libration D, proper eccentricity and proper inclination) for 40 Trojans based on numerical integrations over +/- 73000 years, which included the perturbations of Jupiter and Saturn. Their conclusion was that they expected
the proper elements to be stable over much longer intervals of time.

In an extensive study, Milani (1993, 1994) calculated the proper elements for 174 asteroids which are in the 1:1 resonance with Jupiter in the model of the OSS; the time-span of the numerical integration was $10^6$ years. The main purpose was to find possible families of asteroids also in the Trojan cloud, as in the main belt. Some indications were found, with the aid of the calculated proper elements, but Milani's conclusion was that "the number of Trojans with good enough orbits is marginal for a reliable detection of families". Furthermore, during his integration he also determined the maximal Lyapunov Characteristic Exponent (LCE), in order to be able to distinguish between regular and chaotic orbits. The LCE, $\gamma$, is the typical indicator of chaos and is defined as the average asymptotic rate of exponential divergence of infinitesimally nearby trajectories. If $\gamma = 0$ the orbit is regular (because this indicates a linear deviation of two nearby orbits), whereas $\gamma > 0$ corresponds to a chaotic orbit. The inverse of the LCE is called the Lyapunov time, $T_L = 1/\gamma$, and defines the time beyond which any orbital prediction is bound to fail. Some of these Trojan orbits were stable, in the sense that their proper elements were practically unchanged for times much greater than $T_L$, although they are chaotic; they are examples of asteroids in stable chaos (ASCs, see also Milani et al. 1997). For all these Trojans the Lyapunov time is less than $10^5$ years and, yet, they are permanent members of the Trojan belt. Although the nature of stable chaos is still not clear, it is possible that stable chaos is the manifestation of the stickiness effect, which is caused in Hamiltonian dynamical systems by the presence of stability islands, and the action of the cantori surrounding them, inside a chaotic domain of the phase space (see Murison et al. 1994; Varvoglis & Anastasiadis 1996; Tsiganis et al. 2000b and, for details, Dvorak 1997).

In a long-term integration Levison et al. (1997) calculated orbits of real asteroids and also of fictitious objects for very long times (up to several $10^9$ years). Their dynamical model was the Outer Solar System, and the integration method was the symplectic integrator SWIFT. Out of the 36 real Trojan orbits which they integrated in their sample, 21 turned out to be unstable in less than 4 billion years. For the fictitious Trojans they plotted level curves of equal escape time in the proper element plane $D$ (libration amplitude) versus $e_p$ (proper eccentricity). Between $10^8$ and $10^9$ years the stable regions shrank by almost 50% in the proper element plane defined above. Their results also showed that most of the real Trojans are in a "safe" region of long-term stability, and only some of them (e.g. Achates) lie above this region, in a dynamical neighbourhood of shorter lifetimes. In Figure 6 we show a similar plot to the one given by Levison et al. (1997), where Trojan asteroids are plotted in the plane of proper elements defined above. The full squares represent the asteroids of Table 1 and the dotted line is Rabe's stability curve (Rabe 1965), as determined in the model of the restricted three-body problem. Shoemaker et al. (1989) concluded from the objects above this stability curve that the true stability curve must be above the one of Rabe. In the study of Levison et al. (1997) the stable region exceeds Rabe's stability curve slightly; nevertheless there are still Trojans above the determined stability boundary, like Achates. We will deal in the following with the dynamical behaviour of the ASCs of the Trojan cloud, because understanding their orbital behaviour seems to be the key to understanding the existence of objects close to L4 and L5 of Jupiter's orbit.
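The two-trajectory (Benettin-type) recipe for estimating the maximal LCE, and hence the Lyapunov time, can be sketched in a few lines. This is only an illustration of the definition given above, not the variational method actually used in the studies cited, and the `step` function is assumed to be supplied by whatever integrator one uses.

```python
import numpy as np

def lyapunov_time(step, n_steps, x0, d0=1e-8, dt=1.0):
    """Estimate the maximal LCE by following two nearby trajectories and
    renormalising their separation; the Lyapunov time is 1/LCE.

    `step(state, dt)` advances a state vector by one time step and is
    assumed to be provided by the integrator of the problem at hand."""
    x = np.array(x0, dtype=float)
    y = x.copy()
    y[0] += d0                       # displace the shadow orbit slightly
    log_sum = 0.0
    for _ in range(n_steps):
        x, y = step(x, dt), step(y, dt)
        d = np.linalg.norm(y - x)
        log_sum += np.log(d / d0)
        y = x + (y - x) * (d0 / d)   # renormalise the separation back to d0
    lce = log_sum / (n_steps * dt)   # average exponential divergence rate
    return 1.0 / lce if lce > 0 else np.inf
```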
Figure 6. Rabe's curve in the proper element plane: proper eccentricity versus libration amplitude.
Asteroid             a [AU]   e       i [deg]   LCE [10^-5 yr^-1]   LT [10^3 yrs]
(1868) Thersites     5.290    0.110   16.8      1.12                89
(1869) Philoctetes   5.303    0.065   4.0       1.49                67
1988 AK              5.305    0.064   22.1      1.07                93
(4543) Phoinix       5.082    0.098   14.7      1.11                90
4523 P-L             5.236    0.048   0.9       2.12                47
5187 T-2             5.131    0.031   8.6       1.24                81
1991 HN              5.098    0.011   8.3       1.73                58
(1173) Anchises      5.326    0.137   6.9       2.04                49
(2594) Acamas        5.113    0.086   5.5       2.90                34
(3451) Mentor        5.086    0.070   24.7      1.90                53
(5144) Achates       5.232    0.273   8.9       1.10                91
1988 RN11            5.269    0.096   1.4       1.99                50
1989 UX5             5.104    0.024   4.3       6.38                16

Table 1. Trojan asteroids in stable chaotic motion (cf. Milani et al., 1997).
5.2 Asteroids on the edge of the Trojan cloud
Using the results of Pilat-Lohinger et al. (1999) we present the orbital evolution of the sample of 13 Trojans in stable chaos. These asteroids are given in Table 1, where the upper 7 asteroids belong to the L4 Trojans and the lower 6 asteroids to the L5 Trojan cloud. Former computations established the dynamical stability of these orbits over a much longer time interval than indicated by their Lyapunov time.
Figure 7. Orbital evolution for 100 Myrs of two L4 ASCs: (4543) Phoinix with $e_0 = 0.098$ and $i_0 = 14.7°$ (left) and 1991 HN with $e_0 = 0.011$ and $i_0 = 8.3°$ (right); on the y-axis we plotted the eccentricity and the sine of the inclination.
Figure 8. Orbital evolution for 100 Myrs of two L5 ASCs: (3451) Mentor with $e_0 = 0.07$ and $i_0 = 24.7°$ (left) and (5144) Achates with $e_0 = 0.273$ and $i_0 = 8.9°$ (right); on the y-axis we plotted the eccentricity and the sine of the inclination.

The asteroids of Table 1 were integrated over a time interval of $10^8$ years, whereby the equations of motion have been computed by means of a Lie integration method (Lichtenegger 1984, Hanslmeier & Dvorak 1984). The OSS (i.e. the Sun and the planets Jupiter through Neptune) has been used as the dynamical model, where the Sun's mass has been increased by the masses of the inner planets in order to approximate the neglected perturbations by the inner planets. Relativistic terms were not taken into account. The dynamical behaviour of these Trojan asteroids within 11 time intervals, i.e. subintervals of the whole time, was analysed by means of (1) a numerical frequency analysis (via a program by Chapront (1997)), (2) the root mean square (RMS) of the orbital elements and (3) the proper elements. The subintervals have been introduced in order to study possible variations of the different results over time. According to the results these selected asteroids show larger variations of the semimajor axis and, for some of them, significant changes of the inclination have been found. One of these asteroids, (5144) Achates, showed exceptionally strong variations in the inclination; another one, namely (1868) Thersites, escaped after some 30 Myrs. Figures 7 and 8 show the dynamical behaviour of 4 selected asteroids of the Trojan ASCs. The evolution of the eccentricities is quite similar for all asteroids, while the inclination changes significantly in some cases.
Figure 9. RMS(e) (boxes) and proper element $e_p$ (lines) versus 11 different time intervals of $\Delta t = 1$ Million years: [0,1], [10,11], ... [100,101] for the L4 Trojan Phoinix (left) and for the L5 Trojan Mentor (right).
Figure 10. RMS(i) (boxes) and proper element $\sin I_p$ (lines) versus 11 different time intervals of $\Delta t = 1$ Million years: [0,1], [10,11], ... [100,101] for the L4 Trojan 88AK (left) and for the L5 Trojan 88RN11 (right).

The inclination of the asteroid 1991 HN (Figure 7, right) shows larger variations at the beginning while, after about 60 Myrs, its behaviour looks quite regular. Larger variations over the whole time interval characterise the inclination of the L5 Trojan (5144) Achates (Figure 8, right). The results of the RMS and the proper elements are in quite good agreement, as can be seen in Figures 9 and 10, where we plotted as examples these two quantities for each of the 11 time intervals for Phoinix and Mentor (for the eccentricities) and for 88AK and 88RN11 (for the inclinations). Generally, we can say that an ASC near L5 shows larger variations between the different time intervals. For the RMS($i$) we have derived nearly the same values for almost all Trojan ASCs, independent of the initial inclination. There are only two exceptions: the L5 Trojan (5144) Achates, with significantly larger RMS values than the other asteroids, and the L4 Trojan 4523 P-L, also with relatively large variations. Both tools (the RMS and the proper elements) indicate larger variations of the semimajor axis between the different time intervals for the L5 Trojans than for the L4 ones. The eccentricities of all Trojans are nearly constant over 100 Myrs and seem not to be
affected significantly over these time intervals. On the contrary, the inclination seems to be more sensitive, which can be explained by secular resonances involving the nodes, which may act inside the 1:1 mean motion resonance with Jupiter.
5.3 Clones and neighbours of the ASCs
12 Trojans from Table 1 (Thersites was studied separately) were selected for a more detailed study of their orbital properties by Dvorak & Tsiganis (2000). For each of these objects 5 orbits were integrated, in order to obtain a better view of their dynamical neighbourhood: the 'original' Trojan had initial conditions taken from the catalogue of Bowell et al. (1994), while its four 'neighbours' were obtained by accounting for small deviations in the initial values of the semimajor axis, $\Delta a = \pm 0.01$ AU (first and second neighbour), or the eccentricity, $\Delta e = \pm 0.01$ (third and fourth neighbour), of the Trojan. For a chaotic orbit, it is only natural to expect that two 'different' integrations would have a different outcome. Thus, we decided to perform the integrations twice: (a) with the Lie integrator (Lichtenegger, 1984; Hanslmeier & Dvorak, 1984) and (b) with the MVS symplectic integrator from the SWIFT package (Levison & Duncan, 1994). Hence, the total number of integrated orbits is 120. The output of the integration consisted, as usual, of the time series of the osculating elements. A running-window averaging (see also Tsiganis et al. 2000a) was performed for the computation of mean elements from the semi-major axis, $a$, eccentricity, $e$, and inclination, $i$, time series. A large fraction of the integrated orbits showed large excursions in $\langle i\rangle$, while the eccentricity remained practically constant during the integration. Therefore, instabilities are most probably caused by a secular resonance involving the nodes (see also Milani, 1994). Indeed Dvorak and Tsiganis (2000) showed that the main source of instability is the $\nu_{16}$ resonance. Also, at small inclinations, the $s = 0$ resonance, which is defined by a pause in the precession of the node ($\dot\Omega = 0$), is acting. However, higher order (or 'non-linear', as they are sometimes called) secular resonances with frequencies close to $s_6$ were also identified, through the typical chaotic behaviour of the respective critical arguments. These resonances cause small-amplitude chaotic variations in the orbital elements, which in turn appear to be stable for very long times. Thus, a possible explanation for both the positive LCE and the seemingly stable orbital elements of the selected ASCs was found. Moreover, these resonances were found to overlap with the $\nu_{16}$ resonance. If an asteroid avoids entering the libration zone of the $\nu_{16}$ resonance, its orbit will continue to look stable, while in the opposite case large variations in the inclination, which will finally lead to ejection from the 1:1 mean motion resonance, will appear. Figure 11 shows two typical examples of this kind of behaviour. Out of the 120 integrated orbits 52 (40%) were found to be unstable (2 even escaped) and the rest were found to be stable, according to the criteria used in this study, within 50 Myrs. In Figure 12 we see the results in the plane of mean elements $\langle D\rangle$-$\langle e\rangle$ and $\langle D\rangle$-$\langle i\rangle$, where one can also see that high values of the inclinations do not necessarily mean that the Trojan will be on an unstable orbit! The time-scale for large-scale instabilities to grow in the vicinity of the $\nu_{16}$ resonance seems to be of the order of 100-200 Myrs. In Figure 13 we can see the distribution of escape times derived from a numerical experiment, where 50 fictitious Achates' neighbours were integrated over 1 Gyr.
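The running-window averaging used to obtain mean elements from the osculating time series can be sketched in a few lines. This is only an illustration of the idea, not the exact scheme of Tsiganis et al. (2000a); the window length is a free parameter of the sketch.

```python
import numpy as np

def running_mean(t, x, window):
    """Running-window average of an osculating element time series x(t).

    `window` is the averaging interval in the same units as t; the output
    is the mean element evaluated at each original sample time."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    for k, tk in enumerate(t):
        mask = np.abs(t - tk) <= 0.5 * window   # samples inside the window around tk
        out[k] = x[mask].mean()
    return out
```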
Figure 11. (a) The orbit of an unstable Achates clone. The figure shows (from top to bottom) the time development of the mean eccentricity, $\langle e\rangle$, the mean inclination, $\langle i\rangle$, and the critical argument of the $\nu_{16}$ resonance. The variations in $\langle i\rangle$ are correlated with the libration/circulation of the critical argument. (b) The same as in (a) but for the stable orbit of Anchises.
Figure 12. Distribution of the ASC clones and neighbours on the proper element planes D-e (upper graph) and D-i (lower graph).
Figure 13. Escape time versus number of escapers for fictitious Achates in the model of the Outer Solar System.

Now the question arises: why are these ASCs still there? According to the results presented for clones and neighbours of "real" Trojans there exists a great sensitivity of the orbital evolution with respect to the slightest change in the initial position of an asteroid. Therefore we may conclude that these ASCs, which span a region of orbital elements sparsely populated in comparison with the rest of the Trojan cloud (see Figure 12), may be regarded as the long-lived 'tail' of an initial population, which was scattered away by secular perturbations.
6 Conclusions
This lecture was given to introduce the reader to the interesting problem of the motion of the Trojans. In the historical remarks we sketched the different relevant discoveries after the observation of the very first minor planet. Since the basic properties of the dynamics close to the Lagrangian points are already evident in the RTBP, we shortly discussed Trojan motion on the basis of the equations of motion of a massless celestial body. In the following we stressed the importance of the secular and Kozai resonances acting on asteroids which stay in the vicinity of Jupiter's orbit. Then the very efficient theory developed primarily by Erdi was presented, which explains the main properties of the libration amplitude and also of the motion of the eccentricity and the longitude of perihelion. In the next chapter numerical results of integrations of real and of fictitious Trojans in more realistic dynamical models were discussed. Special emphasis was given to the Trojan motion which is unstable on very long time-scales and to estimates of dynamical life-times for these asteroids.

To conclude, we can summarise the results of analytical and numerical research on Trojan motion, which we tried to present in this course, as follows:

1. There exist significant results on the basis of simplified models, which correspond quite well to the actual dynamical behaviour of the Trojans (we understand the different distinct periods present in their motion, and we understand how the eccentricity and the longitude of the perihelion evolve).

2. The role of the different types of resonances is becoming obvious through the large number of extensive numerical simulations of the evolution of real and fictitious Trojans.

3. The stability behaviour of the Trojans with large eccentricities or inclinations is also better understood through numerical experiments; the asteroids which move on the edge of the Trojan cloud are characterised by instabilities which grow on time-scales of the order of some tens to hundreds of Myrs.

Future work will concentrate on models that take into account the resonances that may appear inside the 1:1 mean motion resonance with Jupiter and the not yet fully understood interplay between eccentricity, inclination and libration period. With the aid of computer simulations we will certainly gain additional information, but for understanding the basic mechanisms we need to develop better, more realistic, analytical models.
References
Barber G, 1986, The Orbits of Trojan Asteroids, in Lagerkvist C I, Lindblad B A, Lundstedt A and Rickman H (eds) Asteroids, Comets, Meteors II, University of Uppsala, 161.
Bien R and Schubart J, 1984, Trojan orbits in secular resonances, Celest Mech, 34, 425.
Bien R and Schubart J, 1987, Three characteristic parameters for the Trojan group of asteroids, Astron Astrophys, 175, 292.
Bowell E, Muinonen K, Wasserman L H, 1994, A Public-Domain Asteroid Orbit Data Base, in A Milani, M di Martino, A Cellino (eds), IAU Symposium 160, Asteroids, Comets, Meteors III, Kluwer Academic Publishers, The Netherlands, 477.
Chapront J, 1997, Representation of planetary ephemerides by frequency analysis. Application to the five outer planets, Astron Astrophys, 109, 181.
Dvorak R, 1997, Stickiness in Dynamical Systems, in The Dynamics of Small Bodies in the Solar System: a major key to Solar System Studies (eds A E Roy, B A Steves), NATO ASI Series, 522, 509.
Dvorak R and Pilat-Lohinger E, 1999a, The Edgeworth-Kuiper-Belt, in The Outer Heliosphere: Beyond the Planets (eds K Scherer, H Fichtner & E Marsch), Copernicus Gesellschaft, 305.
Dvorak R and Pilat-Lohinger E, 1999b, On the Dynamical Evolution of the Atens and the Apollos, PSS, 47, 665.
Dvorak R and Tsiganis K, 2000, Why do Trojan ASCs (not) escape?, Cel Mech Dyn Astron (in press).
Erdi B, 1977, An asymptotic solution for the Trojan case of the plane elliptic restricted problem of three bodies, Celest Mech, 15, 367.
Erdi B, 1978, The three-dimensional motion of Trojan asteroids, Celest Mech, 18, 141.
Erdi B and Presler W H, 1980, On long-periodic perturbations of Trojan asteroids, Celest Mech, 85, 1670.
Erdi B, 1981, The perturbations of orbital elements of Trojan asteroids, Celest Mech, 24, 377.
Erdi B, 1983, A note on the normalized period of libration of Trojan asteroids, Celest Mech, 30, 3.
Erdi B and Varadi F, 1983, Motion of the perihelion of Trojan asteroids, in Asteroids, Comets, Meteors; Proceedings of the Meeting, Uppsala, Sweden, 155.
Erdi B, 1984, Critical inclination of Trojan asteroids, Celest Mech, 34, 435.
Erdi B, 1988, Long period perturbations of Trojan asteroids, Celest Mech, 43, 303.
Erdi B, 1996, On the Dynamics of Trojan Asteroids, in S Ferraz-Mello, B Morando and J E Arlot (eds), IAU Symposium 172, Dynamics, Ephemerides and Astrometry in the Solar System, 171.
Erdi B, 1997, The Trojan Problem, Cel Mech Dyn Astron, 65, 149.
Garfinkel B, 1977, Theory of the Trojan asteroids I, AJ, 82, 368.
Giorgilli A and Skokos C, 1997, On the stability of the Trojan asteroids, Astron Astrophys, 317, 254.
Hanslmeier A and Dvorak R, 1984, Numerical integration with Lie series, Astron Astrophys, 132, 203.
Jewitt D and Luu J, 1993, Discovery of the candidate Kuiper belt object 1992 QB1, Nature, 362, 730.
Kozai Y, 1962, Secular perturbations of asteroids with high inclination and eccentricity, AJ, 67, 591.
Levison H and Duncan M, 1994, The long term dynamical behaviour of short period comets, Icarus, 108, 18.
Levison H, Shoemaker E M, Shoemaker C S, 1997, Dynamical Evolution of Jupiter's Trojan asteroids, Nature, 385, 42.
Lichtenegger H, 1984, The dynamics of bodies with variable masses, Cel Mech, 34, 357.
Marzari F and Scholl H, 1999, The growth of Jupiter and Saturn and the capture of Trojans, Astron Astrophys, 339, 278.
Michel P and Froeschlé Ch, 1997, The Location of Linear Secular Resonances for Semimajor Axes Smaller Than 2 AU, Icarus, 128, 230.
Milani A and Nobili A M, 1992, An example of stable chaos in the Solar System, Nature, 357, 569.
Milani A, 1993, The Trojan Asteroid Belt: Proper Elements, Stability, Chaos and Families, Cel Mech Dyn Astron, 57, 59.
On the nojan problem
41
Milani A, 1994, The dynamics of the Trojan asteroids, in Symposium 160, ACM 1993 IAU, edited by Milani A, di Martino N and Cellino A, Kluwer Academic Publishers, The Netherlands, 159. Milani A, Nobili A, KneSeviC Z, 1997, Stable chaos in the asteroid belt, Icarus, 125,13. Moons M, 1997, Review of the dynamics in the Kirkwood gaps Cel Mech Dyn Astr, 65 175. Morais M H M, 1999 A secular theory of Trojan-type motion, Astron Astrophys., 350,318. Murison M A, Lecar M, Franklin F A, 1994, Chaotic motion in the outer asteroid belt and its relation to the age of the solar system., A J, 108,p. 2323. Nesvornjl D and Morbidelli A, 1998, Three-Body Mean Motion resonances and the Chaotic Structure of the Asteroid Belt., A.J, 116,3029. Nobili A, Milani A, Carpino M, 1989, Fundamental frequencies and small divisors in the orbits of the outer planets, Astron Astrophys, 210,313. Pilat-Lohinger E, Dvorak R, Burger Ch, 1999, Trojans in stable chaotic motion, Cel Mech Dyn Astron, 73,117. Rabe J, 1961, Determination and survey of periodic Trojan orbits in the restricted problem of three bodies, A J, 66,500 Rabe J, 1962, Additional periodic trojan orbits and further studies of their stability features., A J, 67,382. Rabe J, 1965, Limiting eccentricities for stable Trojan librations, A J, 70,687. Rabe J, 1967, Third-order stability of the long-period Trojan librations., A J, 72,10. Schubart J and Bien R, 1987, Trojan asteroids - Relations between dynamical parameters, Astron Astrophys, 175,299. Shoemaker E M, Shoemaker C, Wolfe R F, 1989, Trojan asteroids: Population, dynamics, structure and origin of the L4 and L5 swarms, in R P. Binzel, T. Gehrels and M.S. Matthews (eds.) Asteroids II, University of Arizona Press, Tuscon, 487. Tsiganis K, Dvorak R, Pilat-Lohinger E, 2000a, Thersites: a ‘jumping’ Trojan?, Astron. Astrophys, 354,1091. Tsiganis K, Varvoglis H and Hadjidemetriou J D, 2000b, Stable chaos in the 12:7 mean motion resonance and its relation to the stickiness effect, Icarus, 146,240. Tsiganis K and Dvorak R, 2000, (5144) Achates: a Trojan on the edge of escape, in Freistetter and Dvorak (eds.) Second Austrian-Hungarian workshop on Trojans and related topics (in print) Thiiring B, 1931, Die Librationsperiode der Trojaner in ihrer Abhangigkeit von der Librationsamplitude. A N , 243,183. Thiiring B, 1959,Programmgesteuerte Berechnung von Librationsbahnen, A N , 285,71 Varvoglis H,Anastasiadis H, 1996, Transport in Hamiltonian Systems and its Relationship to the Lyapunov Time, A J , 111, 1718. Zagretdinov R V, 1986, Theory of the motion of Trojan asteroids., Kinematika i Fizika Nebesnykh Tell 2,68 (in russian).
43
Ideal resonance and Melnikov’s theorem P J Message University of Liverpool, UK
1
Discrete and continuous dynamical systems
Melnikov’s Theorem, gives a criterion for the occurrence of the homoclinic tangle in dynamical systems of a certain class, and hence for the occurrence of chaos. The theorem is applied to the basic ideal resonance problem, as subject t o a very simple perturbation, as an indication of the implications of the theorem in Celestial Mechanics. First we consider the relation between the solution sets of two equations which are related, but of different type. First, let us consider the logistic equation,
which provides a simple mathematical model for the evolution of the size of the population of a species from year to year (2, being the size in the year n ) , under suitably simple assumptions as to the factors governing it. Treatises dealing with chaos, or, as it used to be called, wildness (for example, Gleick, 1988), frequently describe the behaviour of solutions of this equation. Such texts should be referred t o for detailed descriptions of the types of solution encountered, but, very briefly, it is found that, for small enough values of the parameter A, the evolution behaves regularly, and there exist simple fixed points, which, for larger values of A, are replaced by fixed pairs of points, which are visited alternately in the motion, and which, for yet larger values of A, are in turn replaced by fixed quadruples of points, which are visited in sequence in the motion. For yet larger values of A, further successive doublings of these fixed sets of points are found, and motion not associated with these fixed sets of points appears to exhibit increasingly apparently random behaviour, and the system thus provides an example of chaos, deriving from a quite simple equation (the system being of course nevertheless completely deterministic). Now, in contrast, let us consider the continuously varying equation with the same right-hand side:
P J Message
44
This equation is of course exactly integrable, having the solution
where
20
= s(t0) # 0. Plainly, no chaotic phenomena can occur here.
So we see that the discretely varying system displays chaos, but the corresponding continuous one does not. Let us now remember that, when a solution to a differential equation system is sought by numerical integration, any method employed always replaces the differential equation system by a discrete system of recurrence relations, either explicitly or implicitly. Could it be that some, at least, of the chaotic type phenomena shown in numerical integration of differential equations are artifacts of the numerical algorithms used for the numerical integration, and not properties of the actual differential equation systems themselves? Can we be certain that the chaotic features found in such numerical integrations are real? To investigate whether this doubt may be allayed, we look for an analytical criterion for the existence of chaotic type phenomena, and this is what, for appropriate types of dynamical system, Melnikov’s theorem ( Melnikov, 1963) provides.
2
The pendulum: an integrable Hamiltonian system
Melnikov’s theorem relates to dynamical systems which can be expressed as resulting from perturbations of an integrable dynamical system. We will here be concerned with the case in which the integrable system is of Hamiltonian form. A simple case of this is provided by the motion of a simple pendulum, which may also be considered as the very simplest case of the Ideal Resonance Problem (see Garfinkel 1966, Garfinkel et al. 1971, Jupp 1973,). So we consider the motion of a particle, P , of mass m, which is suspended from a fixed point, 0, by a massless rod of length e, in such a way that it can move in a fixed vertical plane through 0. Then, if e denotes the angle which the rod O P makes with the downward vertical through 0, the kinetic energy of the particle is +nez282, and the momentum conjugate to 6’ is p = meZ28. The potential energy of P is V = -meg cos 8, where g is the acceleration due to gravity, and so the Hamiltonian function is
and the equation of motion is
8
= -w2sin0,
where w2 = g/e. The integral of energy is,
The phase space for this system is the surface of a cylinder, whose axis is parallel to the coordinate axis of p , the same configuration being represented by the values T and -T of B. The type of motion is determined by the value of the ratio C/w2. In the case in which this ratio is less than unity, the motion is one of libration, the solution curves being closed curves which enclose the stable equilibrium position, which is
Ideal resonance and Melnikov's theorem
45
given by e = 0 , p = 0 , and these curves are symmetric about both coordinate axes. Thus the angle e oscillates between the values &eo, where cos00 = C/w2. The solution of the equation of motion is in this case
e
sin 5 = k sn{w(t - to)},
(7)
the modulus, k , of the Jacobian elliptic function being equal to sin(eo/2),
In the case in which the ratio C/w2 is greater than unity, each solution curve passes right round the cylinder, and corresponds to a motion in which the particle makes complete revolutions about 0, so that 0 either always increases, or always decreases. The solution of the equation of motion is in this case
e
sin-2 = sn{ i ( t - t o ) } ,
e
where U is the value of at 6' = 0, and the modulus, k , of the Jacobian elliptic function is equal to 2wlu. In the case in which the ratio C/w2 is exactly equal to unity, the solution curve separates the two sets of solution curves corresponding to the two previous cases, and is called the separatrix, and is in two pieces. One of these extends from the point 6' = -ri,p = 0 through increasing values of 0, and positive values of p , to reach the same point at its appearance as 8 = + a , p = 0 , this motion corresponding to an infinite elapse of time. Thus this end-point corresponds to an unstable equilibrium configuration. The other piece extends from 0 = +ri,p = 0,through decreasing values of 8, and negative values of p , to reach the end-point at its appearance as e = - a , p = 0. The solution of the equation of motion is given in this case by
e
tan-
4
= ftanh
where the f gives the branch on which
3
(9)
e is increasingldecreasing.
A perturbed system, and Melnikov's theorem
Consider now a more general dynamical system with coordinate y, and its conjugate momentum 5. Suppose the equations of motion to be
where f and g have period T in t. Let r denote the pair ( y , ~ ) ,let Ei, denote (Yo,Xo), and let p(r, t ) = {f(y, 2 , t ) ,g(y, 2 , t ) } ,so that the equations may be written
.i- =
&(T)
+ Ep(r,t).
(11)
Consider now the case E = 0 (the unperturbed system), with equations:
1: = Ro(r),
(12)
P J Message
46
and suppose that this is a Hamiltonian system, i.e. there is a function %o(y,z) such that @LO
Yo(y,z) = dz
Xo(y,z)=
and
a310
(13)
Let us suppose that this unperturbed system has, like the system of the previous section, one stable equilibrium point, A, a t r = T A , and one unstable point, B , a t T = r g , with a separatrix which we now suppose to be of one piece only, and which begins and ends a t B , and encloses A. Denote the solution on the separatrix by = ro(t - t o ) = { y o ( t - t o ) , zo(t - t o ) )
(14)
where to is some, finite, instant of time, which we can think of as the initial, or starting, time. Then, since the separatrix solution is doubly asymptotic, T O -+ T g as t -+ &m. In the unperturbed system, this separatrix solution satisfies fo
=
&(To).
(15)
In the space with coordinates (y, z, t ) , the solutions corresponding to the separatrix are represented by a cylinder, whose generators are parallel to the axis of t , whose crosssections at right-angles to this axis are all replicas of the separatrix in the (y, z)-plane, and on which the solutions spiral around the cylinder, identical except that they are displaced relative to each other, having different time phases, corresponding to their having different values of to. Now when we pass to the perturbed case, in which E is different from zero, we must first notice that the separatrix solution will not in general be replaced by a single closed curve in the (y,z)-plane, and, indeed, because of the time-dependence of f and g, the actual form of a solution asymptotic to B will in general be different for different starting times to. In fact, let us denote the set of solutions which approach the unstable point B in the limit t + m by
d t , t o , E ) = (ys(t,
to1
€1, zs(4 t o , E ) ) ,
called the stable manifold, S , a t B , while we denote the solutions which approach B asymptotically on reversal of time, i.e. for which T -+ r g in the limit t -+ -m by
r u ( t ,t o , E ) = (ya(t1t o , €1, W ( t ,t o , E ) ) , called the unstable manifold, U, a t B . Because the functions f and g have period T in the time t , the figure in the (y, z, t)-space will repeat after each displacement in the t-direction by an amount T , except that each solution belonging to the stable manifold S will have moved nearer to B , while each solution belonging to the unstable manifold U will have moved further from B. Consider now the mapping 7, corresponding to the advance of time through the interval T , i.e 7 { T ( t ) } = r ( t T ) . Suppose S and U have an intersection, say T ~ in, that (y, 2)-plane corresponding to t = to. Denote by SOthat solution of S which passes through T O , and by U, that solution of U which passes through T O . Then T I , defined as TO}, is on SO,and also is on UO,so that T I is also an intersection of these two solutions. In turn, 7-2 = 7 { T 1 } is likewise also yet another intersection of SOand V O and , so we see that the
+
Ideal resonance and Melnikov’s theorem
47
existence of one intersection implies the existence, in turn, of an infinite number of them, say 7-3, 7-4,. . . . Now suppose that the system (10) is area preserving (which it certainly will be if it is Hamiltonian, from Liouville’s theorem), then the area in the projection onto the (y, z)-plane enclosed by the loops of So and U0 between ro and r1 will be equal in area to that enclosed by the loops between r:, and r3, and equal in turn to that enclosed by the loops between 1-4 and 7-5, and so on. But the intersections ro, r1, 7-2,. . . , must be crowded successively more closely together on So as B is approached, so the equal-area property must imply that the loops become more elongated, in the confined region, in which a similar phenomenon is occurring in reverse time, as B is approached along UO. So the two sets of loops must become increasingly entangled, to an infinite extent as the limit is approached. This homoclinic tangle ensures the occurrence of the sort of phenomena associated with chaos.
So we must conclude that, if SOand U0 possess one intersection, then chaos is unavoidable. Choose a starting time, to, so that the curves So and U. are then close together in the (y, 2)-plane, and close also to the separatrix of the unperturbed problem. Let us call that separatrix S. Consider the distance between So and U0 in a direction perpendicular to S. An intersection of the curves SOand U0 will of course occur wherever this distance is zero, which will be so at any zero of A(t0,to, E ) , where A(t, t o , E ) is the magnitude of the vector product of f o ( t - t o ) (which of course gives the tangent to S) with ru(t,to, E ) - rs(t,to, E )
P J Message
48
Then we have
A d 6 to, €1
=
Yo(ro(t- t O ) ) i U , l + I'o(ro(t - t o ) ) z v , 1 ( t 1
E(
to,
€1
-Xo(ro(t - tO))Yb,l - Xo(ro(t - tO))YU,l(t,tO, e ) } + O(E2). (23) We note that +cl
- +o
= RO(r0)- % ( T o )
from which, taking terms of order ay0
$U,l
= -(Yo,
and correspondingly for
Yo(ro(t
E
ZO)YU,l
i ~ r , ~Now, .
+ ~au, ( Y o , x O ) w , l+ !(Yo,
=
(24)
xo,t),
(25)
- to)),
(26)
using the components of (15),
ay0 - t o ) ) = -Yo(ro(t - t o ) )
Xo(ro(t - t o ) )
+€P(TU,t),
of the y and z components,
-to))
au, + -Xo(ro(t ax
+ x8x0 X o ( r o ( t- t o ) ) .
(27)
Then, on substitution into (23) from (25), (26), and (27), and noting the cancellation of = 0, we obtain terms and also noting that, because of (13), Y
+2
Acl(t,to,d = Em(t,to)+ O ( f 2 ) ,
(28)
where
4 4 to)
=
Yo{ro(t- to)g{Yo(t - t o ) , xo(t - t o ) , t> -Xo{ro(t - tO)f{YO(t - t o ) , ao(t .- t o ) , t ) .
(29)
Similarly
h s ( t ,t o , €) = € m ( t to) , + O(2) Integrating, noting that Au(t,to) --t 0 as t -+ 00, gives
(30)
so that, together, recalling (18)
Melnikov's theorem then states that, if the integral (33) has a simple zero for some value of to, then the corresponding solutions SOand U0 have an intersection, and so there must be a homoclinic tangle. An alternative, and often more useful, form of the integral is given by making the substitution T = t - t o , giving
Ideal resonance and Melnikov’s theorem
4
49
An application in celestial mechanics
Now consider perturbed orbital motion, in which two orbiting bodies (planets or satellites), have a ratio of orbital periods close to the ratio of two small integers. Evidence of chaos is sometimes found in association with such cases. Examples of such pairs occur, for example, within the satellite system of Saturn (e.g., the pairs Mimas and Tethys, Enceladus and Dione, Titan and Hyperion, and also the more recently-discovered smaller satellites to the larger ones) (see, for example, Message, 1998), as well as relating some minor planets to Jupiter. Let us, so that the present exploratory calculation may be kept simple, suppose that one of the orbiting bodies (let us call it J ) is very much more massive than the other, ( P ) , and to simplify matters further, suppose that J moves in an unperturbed circular orbit about the primary (S),so that we may use as a model the circular restricted gravitational problem of three bodies. Further suppose that the entire motion is confined to one plane. Then the orbital motion of P about S will be perturbed by J , the perturbed part of the acceleration of P relative to S being expressible as the gradient of a disturbing function, R, which is small with the ratio m‘ = m J / m s (where mJ and ms are the masses of J and S, respectively), and which can be expressed as a multiple Fourier series in the angles A (the orbital mean longitude of P ) , w (the longitude of its apse) and A’ (the mean longitude of J ) . Further suppose that the near-commensurability relation is given by (P + q b
(35) where n is the orbital mean angular motion of the smaller body P , n’ is that of J , and p and q are integers with no common factor. Then, amongst the linear multiples of these angles, which appear as arguments in this Fourier series, there will be the slowly-changing critical argument Pn’,
(P +
- PA’ - w . 4 Let us set up the equations of motion in Hamiltonian form, taking, as coordinates, the slow-moving critical argument 0, and also the difference of the mean longitudes q5 = A-A’, the latter being of course fast-changing. Then the conjugate momenta to these are found to be
o =
respectively. In these, p = Gms (G being the gravitational constant), E
=
JG e+ + 128 7 =
-e3 1
+
-e5 0(~7), (38) 8 and the major semi-axis a, of the orbit of P is, in terms of the momenta@ and @,
From (38) we note that 0 is small with E , and so with the orbital eccentricity e. Now the disturbing function can be expanded as i=O j=-w
P J Message
50
where the coefficients Kij are functions of 0 and Cp. The Hamiltonian function giving the equations of motion is
in which we must substitute for a its expression (39). The underlying long-period part of the evolution of the orbital parameters will be governed by those terms in the disturbing function R which are independent of 4. These features may be formally separated out of the full motion, that is, from the short-period features, by a transformation of Lie series type or equivalent (see, for example, Message, 1987),
(4,O;Q ,0 ) H ($*, O*; Q*,e*),
(42)
where 4 differs from 4* by terms of order E , which take the form of a double Fourier sine je* (with i # 0), and correspondingly for 0, Q,and 0 (with series with arguments cosine series in the latter two cases). The Hamiltonian function which gives the equations of motion for the transformed system is
+* +
R*
P
= --
2a*
L
- ~K,”COS(iO*) i=o
(43)
where a* is the same function of the starred quantities that a is of the un-starred, and the coefficients K,*, which, to first order, are equal to the Kio, are to be evaluated as functions of the starred quantities also. The d’Alembert property of the disturbing function expansion (see, for example, Message, 1987), has the consequence that K,o (for i > 0) has the factor et, and so, in terms of 0 , it will have the factor O ( i / 2 ) So, . if we consider cases of motion in which the orbital eccentricity is small, the main terms of %* which actually contribute to the equations of motion will be
1
%;
= -AO~-BCOSO
(44) 2 where we may, to a first approximation, regard A and B as constants, by being evaluated at some fixed values of 0 and Cp. This has brought us again to the very simplest “ideal resonance” problem” (Garfinkel 1966, Garfinkel e t al. 1971, Jupp 1973), and the equations of motion are of the familiar form
O=AO,
0 = -BsinO,
(45)
which lead to the simple pendulum equation
e
= -ABsinO.
(46)
Since we wish to apply Melnikov’s theorem, we are concerned especially with the solution on the separatrix, and that is given by
e
at
t a n -4 = tanh-,4 where
cy2
= AB,
which gives sine = 2 t a n h % s e c h q ,
0 = asechy.
(47)
Ideal resonance and Melnikov's theorem
51
We apply Melnikov's theorem, taking as the perturbing terms those terms which were omitted from the Hamiltonian function in order to reach the approximation given by the simplified Hamiltonian function (44), our aim being to investigate whether these (or other) perturbations lead to chaos. The topological nature of the solution will not in general be changed by the inclusion of terms in R which depend only on 0 and 0, such as the omitted terms K: cos(iO*) for i > 1, or by including the changing parts of the coefficients K: which arise from the changes in the motion of 0 Although these terms will certainly increase the complexity of the solution, we would not expect them to introduce that sort of breakdown of the regular nature of the solution which we associate with chaos. So we are led to examine the terms which depend on 4. To carry out a very simple exploratory calculation, while keeping the essential principle, let us examine the very simple case given by '
ax0
B=-+Ccos~, a0 where C and y are constants, the term y being introduced to model very simply a measure of slow dissipation which reduces the orbital eccentricity. Suppose also that 4 may be taken as vt 40, v and $0 being constants. Then the additional terms in (49) may be incorporated into our system by adding to the Hamiltonian function the perturbation
+
x1= CO cos 4 + ye.
(49)
We can now calculate the Melnikov function, using the results of the previous section, and we find, using the notation of (34)
= A @ { T ( T ) } ( --~ Bsin(B),(,)Ccos{4(r ) = -aysech-
ff7
2
ffr
ar
2
2
- 2BC tanh --sech-
+ to)}
COS{Y(T
+to) + 40).
(50)
We now evaluate the integral
by using the calculus of residues to give
which does indeed have zeroes if
-y< - 2 d B C 2 -
ff
(53)
which is so whenever y 5 4Cfi2BIA). So we conclude that, under perturbations of this very simple form, the homoclinic tangle, and therefore chaos, will occur whenever the dissipative influence is sufficiently small.
52
P J Message
References Garfinkel B, 1966, Astronomical Journal 71, 657-669. Garfinkel B, Jupp A H and Williams C, 1971, Astronomical Journal 76, 157-166. Gleick, James, 1988, Chaos; Making a New science, Heinemann, London. Jupp A H, 1973, Celestial Mechanics 7, 347-355. Melnikov V K, 1963, On the stability of the center for time-periodic perturbations, Tkans Moscow Maths SOC12, 1. Message P J, 1987, Planetary Perturbation Theory from Lie Series, including Resonance and Critical arguments, in Long-term Dynamical Behaviour of natural and Artificial N-Body Systems edited by Roy A E, Kluwer, 47-72. Message P J, 1993, Celestial Mechanics 56, 277-284. Message P J, 1999, Orbits of Saturn’s satellites: Some Aspects of Commensurabilities and Periodic Orbits, in The Dynamics of Small Bodies in the Solar System edited by Steves B A and Roy A E, Kluwer, 207-225.
53
The Yarkovsky effect in the dynamics of the Solar System David Vokrouhlickf Charles University, Prague, Czech Republic
1
Non-gravitational forces in solar system dynamics
Large bodies in the solar system (planets, their natural satellites, etc.) are most often considered as the ideal test bodies for the gravitational physics. As such, they even have a capability to fruitfully probe the structure of the gravity theory, which nowadays means the first post-Newtonian level of relativistic theories (see, e.g., Will 1993). On the contrary, the motion of small solar system bodies (dust particles, artificial satellites, etc.) is submitted to a number of non-gravitational forces that usually mask the tiny details of the gravitational action exerted upon them by the Sun, planets and other massive bodies. This conclusion is quantitatively due to the fact that the majority of non-gravitational forces are surface phenomena (absorption and emission of physical fields and/or particles of the interplanetary medium) and thus scale with the second power of the body’s size. On the other hand, it is a remarkable property of the gravitational interaction that it rather depends on the volume (mass) of the body and, hence, scales with the third power of its size. The ratio of the strength of diverse non-gravitational forces to that of gravity thus typically decreases with the body’s size. The accuracy of the available observations and the correctness of the theoretical arguments determine whether in a particular case one may neglect the influence of the non-gravitational forces or not, since these forces obviously act on the motion of even the biggest bodies. For instance, the motion of comets was a purely gravitational problem for astronomers at the beginning of the 19th century, while the precision of the late,l9th century observations already enabled one to conjecture the dynamical (non-gravitational) action of the out-gassing processes. While comets are possibly a special case, the transition size of the inactive solar system bodies for whose motion the non-gravitational forces has to be considered seems to be an interesting, “epoch-dependent” value. About a decade ago, centimetre to decimetre sized bodies were at the edge of this transition. It is a purpose of this review to demonstrate that understanding the dynamics of bodies with sizes up to small asteroids (E 1-lOkm), and that of the Moon, requires an analysis of the non-gravitational force influence today.
54
2
David VokrouhJick3;
The principle of the Yarkovsky effect
Since the time of Maxwell we know that electromagnetic radiation propagates energy as well as linear momentum (for historical notes about the astronomical context see, e.g., Mignard 1992). Absorption and/or emission of the radiation means interchange of linear momentum between the body and the radiation field by the obvious law of action and reaction. Early astronomical applications of radiation pressure, and the derivation of its velocity dependent component (later known as the Poynting-Robertson effect: for a final form see Robertson 1937), may be found for instance in Poynting (1903). A contemporary review of the radiation force influence on the dynamics of solar system bodies may be found in Burns et al. (1979).
,4special kind of radiation pressure effect occurs when the surface temperature of the body is nonuniform. Thermal photons emitted by hotter regions on the surface carry away more energy, and thus more linear momentum, than the corresponding photons emitted by cooler regions on the body‘s surface. .4s a consequence, the recoil action of the thermal radiation is not averaged out if integrated over the whole surface of the body and a net radiation force (and torque) appears. This force is by a curious historical tradition called the Yarkovsky force (see, e.g., Opik 1951, Vokrouhlick3; 1998a). Though we outlined the basic principle behind the Yarkovsky effect, we immediately face a number of questions and problems. Here are a few of them. First. let us understand how a cosmic body may keep some part of its surface at a higher temperature than other parts. There are several possibilities. A particular case occurs when a cosmic body has its own heat source that distributes the energy anisotropically. The space probes Pioneer 10 and 11 containing asymmetrically located radioactive thermal generators may be mentioned as an example. It seems that the recent observation of a solar-oriented acceleration acting on these probes (Anderson et al. 1998), that has been interpreted as a violation of Newton’s gravity, may be partly due to the thermal (Yarkovsky) force. While surface processes on active bodies (e.g. sublimation and ejection of dust and gas on comets) may also result in surface temperature gradients. the anisotropic absorption of external radiation is the most common way of keeping temperature gradients on the surface of inactive cosmic bodies (e.g. small asteroids and their fragments, passive satellites). The radiation-exposed parts of the surface become hotter by a partial absorption of the radiation energy. The most obvious source of radiation is the Sun, but the infrared radiation of the planets may also act as a source in particular cases (e.g. the artificial satellites). Second, as the bodies get small the temperature differences throughout their entire volume (and surface) naturally diminish due to heat conduction. This conclusion is independent of the particular mechanism of generation of these temperature differences. When the small bodies approach the temperature equilibrium the efficiency of the Yarkovsky effect on them decreases. We thus arrive at an important observation, i.e. that the Yarkovsky effect is most efficient in some range of sizes only (since for large bodies its efficiency also decreases as explained in Section 1). A precise evaluation of this range for bodies with particular values of thermal constants, rotation speeds, orbit geometry etc. is an important task for a quantitative modelling of the thermal effects. The simplest approach will be given in Section 3 below. Third, since the value of thermal conductivity is finite, there always exists a time delay
The Yarkovsky effect in the dynamics of the solar system
55
between the absorption of the external radiation and its reemission. The extent of this delay depends both on the thermal parameters of the body and on the frequency by which the incoming (external) radiation flux is modulated; actually we shall see that there is always a whole spectrum of such frequencies, naturally clustered around the rotation and revolution frequencies. It will be also explained in detail below that the inertia in the thermal response is essential for the dominant orbital effects (the semimajor axis drift, in particular). In the next section we shall quantify the above discussed concepts using the simplest model. More involved approaches, including the nonlinear thermal response to the external heating or eccentric orbits, could be found in the literature (e.g. Vokrouhlickj. and Farinella 1998, 1999; Spitale and Greenberg 2000).
3
A simple model for the Yarkovsky effect
The problem of the Yarkovsky force estimation naturally splits into two steps: (i) determination of the surface temperature distribution, and (ii) evaluation of the thermal radiation recoil force. The former problem has been studied in some detail, especially in the context of the radiometry of asteroids (e.g. Lebofsky and Spencer 1989, Spencer et al. 1989). However, the theory of the Yarkovsky effect is characterised by two subtle points. First, using the radiometric terminology, the Yarkovsky force computation inevitably requires a thermophysical model (the so called “standard” radiometry model is of no use here since it does not enclose the effects of thermal inertia). On the other hand, if the body’s orbit is quasi-circular one may significantly simplify the problem by assuming the temperature to be close to a constant mean value. Then, a linearised theory may be derived analytically, as it will be demonstrated below, without any need of an involved numerical solution.
Assumptions. The solution outlined briefly below is due to Vokrouhlicki (1999) and the reader is referred to this paper for a more detailed discussion. We should also point out a pioneering work of Rubincam (1995, 1998) who obtained a similar solution for the seasonal variant of the Yarkovsky effect, though in somewhat less compact formalism. Here follows a list of the simplifying assumptions used in the following: the body is spherical (for a generalisation see Vokrouhlickj 1998b, Vokrouhlickj and Farinella 1998), its surface emits thermally as a grey body with emissivity coefficient E and according to the Lambert law, the temperature T(r,t ) a t any position r in the body and t i m e t is close to a constant, mean value T,,, hence T(r,t ) = T,, AT(r, t ) with IAT(r, t)I << T,,, the body revolves around a radiation centre on a circular orbit (eccentricity-related corrections neglected), and the ratio m between the rotation rate wrot and the orbital mean motion w,,, is an integer number.
+
The third item above represents the most restrictive assumption, since it excludes the analysis of the Yarkovsky effect on very eccentric orbits (where large temperature variations
56
David VokrouhJick3;
are to be expected along one revolution around the Sun). Examples of a full, nonlinearized theory of the Yarkovsky effect on large bodies has been developed by Vokrouhlickj. and Farinella (1998, 1999) and in the most general formulation by Spitale and Greenberg (2000). The last assumption of the previous list allows us only to express the incident radiation flux on every surface element in a simple way. It can be easily removed by a technique discussed in Farinella and Vokrouhlickj. (1996).
Basic equations. The fundamental equations of the heat diffusion problem are both energy conservation constraints; either applied to the body's volume (called also the Fourier equation) LIT V .( K V T )= pCat or to its surface
(K:),
+ eaT4 = cd
The latter equation appears in the problem as a boundary condition for the temperature ( T ) distribution solution. The body is represented as a continuum with the following parameters: p the density. K the thermal conductivity and C the thermal capacity. All of these quantities may be, in turn, temperature and location dependent. In what follows, however, we neglect these effects considering their values for the mean temperature T,,. The surface absorptivity in the optical band is denoted by Q (one minus Bond albedo; see Vokrouhlickf and Bottke 2000) and the thermal emissivity by E . Both of these parameters are typically close to unity. The incident radiation flux on a given surface element is denoted by & and a is the Stefan-Boltzmann constant. Neglecting the heat conduction term in ( 2 ) and assuming a circular orbit of the body we define the mean temperature T,, by: 4 ~ a T :=~ a&* (where &, is the radiation flux at the body's distance from the radiation centre). Notice the factor 4 in this definition. It is a consequence of the fact that the thermal energy is radiated by the whole surface of the body (47~R')while absorption occurs by the cross-section (.rrR2)only. Before proceeding with the solution of Equations (1) and (2), it is useful to introduce new variables. As mentioned above, we shall rather use AT = T - T,, since this is a small quantity with respect to T,,. Moreover, a suitable scaling of the variables removes constants and numerical factors from the equations, compressing them into a single parameter 0 = m & / ~ a T ; . called the thermal parameter (here T* is a subsolar temperature defined by EOT:= a&*). The final, non-dimensional (primed) variables are the radial coordinate T , measured from the centre of the body, is scaled to where 1, =
T'
= r/1,
JK/pc.,,,,
the temperature change A T is scaled by the subsolar value T*, AT-+AT'=AT/T*, the variable part of the external radiation flux on a given surface element A& = &-&*I4 is scaled by the nominal value &* of the radiation flux a t the body's distance from the source, i.e. A& A&' i = A&/&*, and time t is to be replaced by a complex variable
= exp[iwrev(t- t o ) ] .
The Yarkovsky effect in the dynamics of the solar system
57
The choice of the time origin to has to do with the reference frame used throughout the solution. Given the assumed symmetry of the body we shall use a system of spherical coordinates with the origin in the body’s centre. The colatitude 0 is measured from the body’s spin axis and the longitude 4 from the equatorial axis X. The system rigidly rotates with the body so that the axis X points toward the radiation centre at an arbitrary time to (one easily proves that such an instant always exists on a circular orbit). Rewriting Equations (1) and (2) into the new set of variables, we obtain
The operator
A(O,4) in Equation (3) represents the angular part of the Laplacian:
The O(2) C3[(AT/T,,)2]term reminds us that the non-linear terms in the presumably small parameter AT/T,, are always omitted in our approach. Peterson (1976) developed an analytical theory quadratic in this parameter and noticed that his results differ by a small amount from the predictions of the linearised theory, provided orbits of low eccentricity are assumed. This conclusion has been later verified also within the complete non-linear theory (Vokrouhlick9 and Farinella 1998, 1999; Spitale and Greenberg 2000). The general goal of the solution is to find the temperature AT’ everywhere in the body (r’,8, 4), with r’ ranging up to RI, where the surface is located, and at any time C. The solution is determined by (i) the value of the thermal parameter 0 , and (ii) an explicit specification of the radiation flux term A&’.
Radiation flux term. The term AE’ is to be determined for any surface element parametrised by the angular variables (e,4). Since it represents a function on a sphere, one may seek a spherical function representation n
n2l
k=-n
An advantage of the solution is due to the fact that only the dipole ( n = 1) coefficients are to be needed. The latter read
where (6’0,do) are the colatitude and longitude of the local position vector of the radiation centre. As the body moves on its orbit, these two quantities change. After some algebra one finds i COSBO = - s i n y s i n X = -sinT(<-C-’) , (8) sin ,go e*imo
= sin2 2 2
where y is the obliquity of the spin axis.
p(m-i-1)
2 + cos2
2 ~ i ( m - 1 ). 2
(9)
58
David VokrouhlickS;
Temperature dipole solution. Having formulated the problem in detail, we can now embark on solving the Equations (3) and (4) with the external flux term (6). In particI) of the ular, we shall be again interested in obtaining the dipole part coefficients spherical function representation
n>l k = - n
The most important simplification of the problem is due to the linearity of Equations (3)(4). The orthogonality of the spherical harmonics Ynk(6’, 4) then results in a complete decoupling of the equations for the amplitudes tLk(r’;C). In particular, they satisfy
with the boundary constraints
Since the operator C(d/dC) ”recycles” the Fourier modes (given by <(d/dC)Ck= kCk for all integer values k), we may also treat separately all the components of the a,k-amplitudes in the Fourier representation; see Equations (8)-(9) for the dipole terms. We may thus easily separate the radial and temporal parts in tkk(r’;C). The radial functions then satisfy a system of homogeneous spherical Bessel equations with complex coefficients. The details of the solution may be found. for instance, in Vokrouhlick? (1998a, 1999).
: after some deal of algebra one can thus express the zonal dipole coefficient t $ ( ~ ’C) a t the surface of the body r’ = R’ as
where X = urev(t- t o ) is the mean orbital longitude. Here, we have introduced the 1 amplitude ER! and the phase lag 6 ~ by ER, exp (ZbR,) =
+ +
A(x) iB(z) iD(x)’
C(z)
with z = f i R ’ and the auxiliary functions A ( x ) ,B ( x ) ,C ( x ) ,D ( x ) reading
A(z) = - (z+ 2) - e” [(x- 2) cos x - x sin x] , B ( x ) = -z - e“ [xcos x + (z- 2) sin x] . C ( x ) = A ( z )+ - (3 (x + 2) + e” [3 (x - 2) cosz + z(x - 3) sinx]} , 1+x
D ( x ) = B ( z )+ - {x(x 1+x
+ 3) -e”
[x(x - 3)cosx - 3 (x - 2)sinxI) .
(15) (16) (17)
(18)
The parameter X is defined by: X = O / ( f i R ’ ) . Note that the zonal irradiation alo cx sinX (Equation (8)) and thus b p indeed plays a role of the delay-parameter between
The Yarkovsky effect in the dynamics of the solar system
59
the irradiation and the temperature response. Obviously, at zero conductivity K this parameter vanishes. The Fourier representation of the tesseral dipole coefficients t;+l (r’;() has a slightly more complicated form
The radial r-functions satisfy again a system of spherical Bessel equations and can be found, for instance, in Vokrouhlicki (1999).
Recoil force evaluation. Having determined the basic features of the temperature distribution on the body’s surface, we may then proceed to compute the recoil thermal (Yarkovsky) force. Given an oriented surface element dS with temperature T . the recoil force per unit mass of the body df reads: df = -(2/3) (eaT4/mc)dS. The structure of this formula is directly related to the assumed Lambert law for the directional characteristic of the thermal radiation: (i) the force is opposite to the surface orientation vector (df x -dS), and (ii) the factor 2/3 is a consequence of the radiation isotropy. Integrating over the whole surface of the body (sphere), and keeping the linear approximation T4 2: T,”, 4TfvAT . . ., we obtain
+
+
for the thermal recoil force per unit of mass of the body. Here, CP = (&*7rR2/mc)is the usual radiation force factor: m is the body’s mass, c the velocity of light and n the unit vector normal to the surface. Since the surface normal vector n in the integrand of (20) can be expressed as a combination of the spherical functions of degree 1 only, and the spherical functions are orthogonal, we obtain the following expression for the three Yarkovsky force components
fz(0
=
Obviously, these are again referred to the coordinate system rigidly rotating with the body (introduced in Section 3.2). In particular, the component fi yields a projection are the out-of-spin of the Yarkovsky force onto the spin axis direction, while (fx,fu) components of the Yarkovsky force. Notice that the spin-aligned component fi-(22) and (13)-does not depend on the rotation frequency but it is uniquely determined by the revolution frequency around the radiation centre. As a consequence, it is called the seasonal component of the Yarkovsky force. Interestingly, this component has been entirely omitted in the early works (Opik 1951, Radzievskii 1952, Peterson 1976) and has been derived only in the 1980’s in the context of the LAGEOS dynamics (Rubincam 1987, Slabinski 1997). Rubincam (1995) was the first to apply the seasonal variant of the Yarkovsky effect in planetary dynamics. The pioneering works of Opik and Radzievski dealt with the effects of the out-of-spin Yarkovsky force components fx and fu.With the help of explicit expressions of the radial
60
David VokrouhlickS;
r-functions in (19) we obtain
fx +ifY
4 a@ 91+x
= -- -(sin2 2 ER; exp(-zbR;)
2
Y C-’ +cos2 -ER! exp(-zdR:) C) C-”. 2 -
(23)
Here we have defined R‘, = \/1 f l / m RL (RL = fiRI), while the remaining quantities are the same as those we used previously. Since the force components in (23) depend on the rotation frequency--via the “commensurability” factor m - we call them diurnal.
Secular mobility of the semimajor axis. The three Yarkovsky force components (fx,f y , fi)from the previous section are given in a coordinate system rigidly rotating with the body. In order to estimate their orbital effects one has to transform them into the orbit-oriented reference frame. The evaluation of the transverse force component 7 is of a particular interest, since the semimajor axis change is given by: ( d a l d t ) = 2 7 / ~ ~ , ~ + C ? ( e ) . In what follows we shall not consider the perturbations of the other orbital elements, since they are of much lesser importance. First, let us consider the effect of the spin-aligned Yarkovsky force component fi. Averaging the corresponding Gauss equation mentioned above over one revolution around the centre we obtain a mean drift rate of the semimajor axis in the form
(2)
4a @ 9 Urw
=--
sin6Rt sin’?. 1+ X
ER,
(24)
Analogously, we may obtain the corresponding mean semimajor axis drift-rate due to the diurnal Yarkovsky force components (fx, fy) as 8cu @
N
[cos4 22 E R ,- sin bR[
8a @ ER:,sinb~k cos y . 9 U,,, 1 x
+
- sin4 2 ER; sin 6 ~ ; )
2
(25)
Here, we can note that in typical astronomical applications m >> 1, resulting in R’, N RL. We remind that RL = f i R ‘ , while R’ = R/ls and 1, is the penetration depth of the seasonal thermal wave. We may also introduce the penetration depth of the diurnal therso that RL = R/ld. It is important to keep in mind that the mal wave Id = penetration depth of the seasonal wave (at lower frequency U,,,) is larger than the corresponding penetration depth of the diurnal wave (at higher frequency LJ,,~). In particular, the diurnal temperature variations are typically surface phenomena only, since Id may be smaller than few centimetres. The Equations (24) and (25) giving the seasonal and the diurnal mean mobility of the semimajor axis resemble each other to a large extent apart from two important differences: (i) although structurally they both contain the (E sin6) factor, the scaling of R (radius of the body) is different in the two cases (notice that the X variable in the denominator is scaling-invariant), and (ii) there is an entirely different dependence on the obliquity y. As far as the first issue is concerned, we remind that RL = f i RI, where fi N 100 - 10000 in the typical astronomical applications. Secondly, thanks to the second power of sin y in Equation (24) the seasonal effect always results in an orbit decay: (daldt), < 0 (notice that the phase 6 ~ 1is negative). On the other hand, the diurnal semimajor axis mobility (da/dt)d may be either positive or negative.
The Yarkovsky effect in the dynamics of the solar system
61
Figure 1. Mean change of the semimajor axis A a (in AU) of objects in the inner part of the main asteroid belt vs their radius R (in km); both components (diurnal and seasonal) of the Yarkovsky effect included. Five different values of the surface conductivity K considered: (1) K = 0.002 W / m / K ; (2) K = 0.02 W/m/K; (3) K = 0.2 W / m / K ; (4) K = 2 W / m / K ; and K = 40 W / m / K (curve m, for metal-rich bodies). The low-K cases are dominated b y the diurnal effect, while for high-K cases the seasonal effect is more important. The dashed strips correspond to three astronomically important classes of bodies: (a) pre-atmospheric meteorite parent bodies (R = 0.1 -1.5m); (b) Tunguska-like small NEAs (R = 5-30m); and (e) the largest existing NEAs or smallest family members today observed (R = l-l0km). Note that Aa depends sensitively on the selected value of K in the (a) and (b) size ranges, but much less so in range (e).
To get a first glimpse of the semimajor axis mobility due to the Yarkovsky effect we performed a simple test-see Figure 1. Assuming bodies of different sizes in the inner part of the asteroid belt, we estimated the average change Aa of the semimajor axis that may be accumulated during the collisional lifetime Tdisr. Obviously, both A a and Q i s r are size-dependent quantities; in particular, for the latter we. assumed Tdisr N 1 6 . 8 0 Myr based on the computations of Farinella and Vokrouhlick$ (1999) with R being the characteristic radius of the body in meters (see also Farinella et al. 1998). We note that A a is acquired through small increments (da/dt)y,,k x bt, where (da/dt)y,,k is the total Yarkovsky drift of the semimajor axis (linear composition of (24)and (25) at this level of approximation) and 6t is the step-size of about 10 kyr. Initially, the obliquities y of the spin axes of all objects were assumed random in space but at each time-step 6t we consider a possibility of a collisional reorientation as a result of non-disruptive impacts. These phenomena are modelled as a Poissonian process with a characteristic timescale of Trot N 15.0 J?i Myr (see Farinella and Vokrouhlickjr 1999; Farinella et al. 1998). Rotation rates are assumed to be correlated with the size through the linear relation wrot = 5 R, that roughly approximates observations. The principal unconstrained parameter is then the surface conductivity K . For this reason we run several simulations with different
David Vokrouhlicki
62
values of K spanning the physically admissible range: from low value K N 0.001W/m/K, corresponding to a highly particularized, regolith-like surface (e.g., Wechsler et al. 1972. Langseth et al. 1973), to high value K rrl-2W/m/K, corresponding to bare basaltic rock. Increasing degree of the surface porosity results in a decrease of the value of K (e.g., Yomogida and Matsui 1983, Presley and Christensen 1997). Two special cases with typically higher conductivity might be also mentioned: (i) icy bodies, that might be appropriate for the outer part of the asteroid belt and further regions in the solar system, with K in the range 1 - 10W/m/K (depending on percentage of the dust pollution), and (ii) the iron-rich objects with even higher conductivity of K N 40W/m/K. The surface density and heat capacity may also vary according to the type of the surface. but typically in a tighter range. Data in the Figure 1 indicate several interesting results: (i) except from the highstrength iron objects, the maximum expected drift within the collisional lifetime is roughly 0.1 AU (up to 0.2 XU for regolith-covered objects; see Vokrouhlickj and Broi 1999), (ii) the drift becomes smaller for larger bodies (downto 0.01 AL at R rr5-10km), but on the other hand much less selective as far as the surface conductivity value is concerned. Another important result concerns high-conductivity objects (curve 4 in Figure 1) which have a significant peak of mobility for x 10m size range. This is an entirely novel finding due to the seasonal component of the Yarkovsky effect (Rubincam 1998, Farinella et al. 1998). The above mentioned quantitative values of Au (FZ 0.1 AL drift for smaller objects and x 0.01 AU for large objects) represent a key starting information for applications of the Yarkovsky effect in the solar system dynamics. For instance, 0.1-0.2 AL is a typical distance to the closest strong orbital resonance in the main belt and thus any collisional ejecta in the belt are possible feeders of the resonances. Similarly, 0.01-0.02 A'C' is a fair fraction of an asteroid family size in the semimajor axis direction and thus smaller asteroids in families may undergo a large mixing within a typical family age (a few Gyr). More details about these applications will be given below.
4
Applications of the Yarkovsky effect
Since we are presently only starting to understand and/or exploit all possible applications of the Yarkovsky effect in solar system dynamics, the next Section 4.1 is intended to be a brief overview of the results achieved so far; in Section 4 . 2 we shall attempt to envisage the future development of the theory and applications of the Yarkovsky effect.
4.1 Worked examples Observability through dynamics of NEAs. The present evidence of the Yarkovsky effect is mostly based on investigation of the statistical properties of large samples of objects (e.g. CRE age distribution of meteorites) or on qualitative arguments (e.g. origin of large near-Earth asteroids (NEAs), size-dependent dispersion of the asteroid families in the semimajor axis). It would surely be interesting to prove the existence of the k'arkovsky effect by detecting its orbital perturbation on an individual object. Though we succeeded to do so in the case of the artificial satellite motion around the Earth (e.g. Rubincam 1987), a similar evidence has not been obtained yet for the dynamics of natural bodies.
The Yarkovsky effect in the dynamics of the solar system
63
Torn
Figure 2. Left: Estimated secular drift of the semimajor-axis of several near-Earth asteroids due to the Yarkovsky effect vs admissible values of their surface conductivity (in W/m/K;most likely values are between 0.005 W/m/K for larger objects and 0.05 W / m / K for smaller objects); ordinate units in A U/Myr. Right: Uncertainty ellipsoids, based on current orbital data, projected onto the range R vs range-rate dR/dt plane of radar observables for epochs close to the next Earth-encounter of the asteroid Golevka (in May 2003). Solid/dashed lines for solution with/without the Yarkovsky effect. Ellipses labelled with 0 correspond to the epoch of the closest Earth-approach of the nominal orbit (without the Yarkovsky effect), others for f 3 and f 6 days. Coordinate origin always referred to the nominal orbit at the epoch.
There are two obvious reasons for this situation: (i) we know nearly nothing about the orbital dynamics of meter- t o deka-meter sized objects, for which the effects are maximum (and thus most likely observable), and (ii) the Yarkovsky force becomes rather small for large asteroids or satellites. Considering these facts, Vokrouhlickf et al. (2000) concluded that the NEAs are the most likely candidates for direct detection of the Yarkovsky effect. First, they are generally small objects with even a few cases of known orbits corresponding to deka-meter sized bodies (see, e.g., Ostro et al. 1999). Second, some of them have been observed by.the radar technology that allow both very precise astrometry (by a factor of 100-1000 better than the usual optical astrometry) and measurement of the physical parameters of the body (shape, surface properties and the rotation state) required for modelling the Yarkovsky effect. Vokrouhlickf et al. (2000) noticed that in all candidate cases the presently available observations are not precise enough to detect the Yarkovsky effect. It should be pointed out that the primary orbital effect, that might eventually be observed, is the longitude perturbation related to the secular change in the semimajor axis (see Figure 2 to have an idea about the order of magnitude of the drift in the most promising cases). Since the semimajor axis change has a linear component in time, the longitude effect is basically quadratic in time. Observability of this effect would thus profit from having two radar observations separated well enough in time. However, either we have NEAs observed over a long time interval, but the observations are of a low precision (e.g. in the case of Icarus we have two Doppler observations in 1968 and 1996), or we have NEAs with two very precise radar (delay) observations, but these are not separated enough in time (e.g. Goievka observed in 1991 and 1995).
64
David VokrouhlickJ;
The main result of Vokrouhlick? et al. (2000) is that the radar observations during the future apparitions of several objects (Golevka, Icarus, Geographos etc.) may reveal existence of the Yarkovsky perturbation of their orbital motion. An example is shown in the Figure 2 (right) where we compare uncertainty ellipsoids of the orbital solution with and without the Yarkovsky effect for the asteroid Golevka. Since the axes (relative distance and velocity of the Earth and the asteroid) are basically the radar observables, the future observations will project as a point(s) in this plot. The fact that the uncertainty ellipsoids of the two solutions do not overlap suggests a possibility to observationally disprove one of the models (at a given statistical level; note that 30 means roughly 98% confidence level). Similar results have been obtained also for other asteroids, notably Icarus, Geographos or 1998KY26. In collaboration with the JPL group of S. Ostro we are doing efforts to take the necessary radar observations. In some cases, pre-encounter optical observations might be also very suitable (e.g. Vokrouhlicki et al. 2000).
Meteorite transport from the main belt. The original motivation behind the Yarkovsky effect was related to the transport of bodies into Earth-crossing orbits (Opik 1951, Radzievskii 1952, Peterson 1976). However, since until the end of the 1970’s nothing precise was known about the role of the mean motion and the secular resonances for the Earth-crosser dynamics, Opik and even Peterson assumed a slow, permanent decay of semimajor axes from the typical values in the main belt to 1 AU. Very long timescales and/or unrealistically slow rotation periods were then required. This situation might have weakened the overall credibility of the Yarkovsky effect in the late 1970’s. The perspective started to change with improved understanding of the fundamental importance of the mean-motion and secular resonances for the meteoroids and NE,4s delivery toward the Earth (e.g., Wetherill and Chapman 1988, Greenberg and Nolan 1989). The source regions of the fragments thus appeared to be much closer to the escape hatches toward the Earth, so a Yarkovsky dominated drift of the semimajor axis of only a fraction of one AC (typically 0.05-0.15 AU) is necessary for starting the transport chain. Millions to tens of millions of years are then sufficient for the Yarkovsky dominated period in the transport scenario. Such a timescale becomes astronomically realistic, since it is comparable to the estimated lifetime of the objects and it also fits well to transport timescale as “measured” by the cosmic-ray exposure (CRE) ages of the meteorites (e.g. Marti and Graf 1992). The fundamental role of the Yarkovsky effect thus consists in delivery of the asteroidal fragments from a generic place in the belt toward the principal resonances that eventually direct the bodies to the collision with the Earth (mainly the 4 and 3/1 resonances; Morbidelli and Gladman 1998). Since the necessary transport time is also comparable (especially for large fragments) to the dynamical lifetime in the weak main-belt resonances (e.g. Yeworn? and Morbidelli 1998, Morbidelli and Nesvorni 1999), a detailed description of the transport toward the above mentioned escape hatches from the belt presents as a very complicated issue. Note, that the orbits may be temporarily trapped in one of the weak resonances, a process that may (or may not) eventually result in ejection onto Mars-crossing orbits prior to the 3/1 or v6 resonances. On the other hand, the Yarkovskydrifting orbits may “jump-over” the weak resonances, accelerating thus effectively the drift rate. The first attempt to quantify the mutual interaction of the Yarkovsky drifting orbits with the gravitational resonances by a direct numerical simulations has been performed by Bottke et al. (2000; apart from an older and less elaborated work by Afonso et al. 1995). Broi et al. (2000) then extended results of Bottke et al. by considering (i) bodies
Figure 3. Mean semimajor axis (in AU) vs time (in Myr) for numerically integrated orbits of 50 particles initially ejected from asteroid Hebe (all particles have 100 m size). Gravitational perturbations of the planets (except Pluto) and the Yarkovsky effect were included in this simulation. Spin axes were initially randomised and a low thermal conductivity of the surface was assumed (K = 0.0015 W/m/K). The diurnal Yarkovsky effect makes the semimajor axes drift nearly linearly with time, until the orbit reaches one of the many resonances in the belt. In the case of strong resonances (the 3/1 at ≈ 2.5 AU or the ν6 resonance at ≈ 2.33 AU) the particles are typically ejected from the system. If weaker resonances are encountered (e.g. the external 1/2 resonance with Mars at ≈ 2.42 AU or the mixed Jupiter-Saturn-asteroid resonance 4,-2,-1 at ≈ 2.4 AU), particles may be temporarily captured or quickly jump across the resonance (see also Figure 4). Scattered points are high-eccentricity orbits in the 3/1 or ν6 resonances before they were removed from the simulation.

of different sizes (in the 1 to 500 m radius range), and (ii) a larger sample of bodies, to evaluate statistical probabilities of capture/jump for several weak resonances in the inner part of the asteroid belt. In what follows we show data from this reference, though many of them were already reported by Bottke et al. (2000). Figure 3 shows the evolution of the mean semimajor axis of 50 fragments released initially from the asteroid Hebe, while Figure 4 illustrates the time evolution of a single particle exemplifying the whole palette of intricate details of the transport process in the inner part of the asteroid belt: temporary capture (for about 10 Myr) in the external 1/2 resonance with Mars, a rapid jump over the multiple resonance (4,-2,-1) with Jupiter and Saturn, scattering on Mars and final ejection via the 3/1 resonance. Figure 3 demonstrates the sinks from the main belt represented by the 3/1 resonance (at ≈ 2.5 AU) and the ν6 resonance (at ≈ 2.33 AU for the Hebe inclination). In both cases, however, the output from the numerical simulations also revealed unexpected results. First, an encounter of a drifting particle with the 3/1 resonance most often leads to its ejection onto very high-eccentricity orbits. Sometimes, however, fast-drifting particles may cross the resonance in both senses (see Figure 3); this result was already reported by Bottke et al. (2000). Second, the zone of influence of the ν6 resonance was found to be rather large, and the particles (slowly)
Figure 4. Mean semimajor axis (in AU) vs time (in Myr) for a 20 m-sized particle released initially from asteroid Hebe. Parameters of the simulation as in Figure 3, notably a low-conductivity surface is assumed. Periods of interaction with weak resonances are magnified in the two boxes: (a) temporary capture in the external 1/2 resonance with Mars lasting for about 10 Myr, and (b) rapid jump over the mixed Jupiter-Saturn-asteroid resonance (4,-2,-1). At ≈ 90 Myr the orbit was weakly scattered by Mars (due to close encounters), and the spin axis was modelled to be reoriented at this instant too. Within the next ≈ 30 Myr the particle crossed the inner part of the asteroid belt and was ejected from the simulation in the 3/1 resonance.
approaching this resonance from the main-belt region may gradually increase the amplitude of their eccentricity oscillations. As a result, they may interact with Mars, and consequently be ejected from the belt, even before they reach the “nominal” position of the ν6 resonance. Bottke et al. (2000) conjectured this property and Brož et al. (2000) quantified it on a large sample of integrated orbits (see also Figure 9). These latter authors also quantified the probabilities of capture/jump for several weak resonances in the inner asteroid belt and showed that these two processes are in approximate balance: the effective mean drift rate is not substantially affected by the interaction with the weak mean-motion resonances. As a novel finding, Brož et al. (2000) noticed a special role played by the higher-order secular resonances (z2, z3, g - 2g6 + g5, etc.) that may capture Yarkovsky-drifting orbits for interestingly long periods (see also Figure 9). Given the sense of the Yarkovsky drift, the mean eccentricity of the particles trapped in the higher-order secular resonances may be decreased or increased, thus adding to the stability or instability of the orbits in the belt. We may thus preliminarily conclude, from the numerical simulations performed, that the orbital evolution in the inner and middle zones of the asteroid belt is a very complex and essentially unpredictable process as far as an individual orbit is concerned. However, statistical properties of large samples of bodies may be derived from such simulations and these, in turn, may be used as input information in codes that focus more on the collisional processes during the delivery (see below).
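To make the drift-capture-jump logic concrete, the sketch below is a deliberately crude piece of bookkeeping for a single drifting orbit; the drift rate, resonance locations and capture statistics are made-up illustrative numbers, not values from the SWIFT-RMVSY integrations cited above.

```python
import random

DRIFT = 7.0e-4                 # assumed outward drift rate for a ~20 m body (AU/Myr)
WEAK_RES = [2.44, 2.47]        # made-up locations of weak resonances (AU)
STRONG_RES = 2.50              # 3/1 mean-motion resonance with Jupiter (AU)
P_CAPTURE = 0.3                # assumed probability of a temporary capture
MEAN_STICK = 10.0              # assumed mean duration of a temporary capture (Myr)

def transport_time(a0, rng):
    """Toy bookkeeping of one Yarkovsky-drifting orbit: steady outward drift,
    possible temporary captures at weak resonances, removal at the 3/1."""
    t, a = 0.0, a0
    for a_res in WEAK_RES:
        t += (a_res - a) / DRIFT                    # drift up to the weak resonance
        a = a_res
        if rng.random() < P_CAPTURE:
            t += rng.expovariate(1.0 / MEAN_STICK)  # temporarily captured
        # otherwise the orbit simply "jumps over" the weak resonance
    t += (STRONG_RES - a) / DRIFT                   # final leg to the strong resonance
    return t

rng = random.Random(1)
times = sorted(transport_time(2.42, rng) for _ in range(10000))
print(f"median transport time to the 3/1 resonance: {times[len(times) // 2]:.0f} Myr")
```

With these assumed numbers the transport takes of order 100 Myr, comparable to the single-particle history in Figure 4; the only point of the sketch is that captures delay, and jumps preserve, an otherwise nearly linear drift.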
Figure 5. Temporal evolution of the semimajor axis distribution (abscissa; in AU) of fragments from Flora. Time in hundreds of Myr labels the different curves. Shaded areas point out the location of the ν6 resonance (at the Flora proper inclination) and of the 3/1 mean-motion resonance. Left panel: meteoroids with a low value of the surface thermal conductivity (K = 0.0015 W/m/K); right panel: meteoroids with a high value of the surface thermal conductivity (K = 1 W/m/K). In the first case the diurnal Yarkovsky effect dominates the fragments' mobility and results in smearing the distribution over the whole inner asteroid belt. In the second case the seasonal Yarkovsky effect dominates the fragments' mobility and causes a collapse of the cloud toward the ν6 resonance. The same normalisation is used for all curves, so that the integrals ∫ N(a) da yield the actual number of simulated meteoroids in the main belt. Dashed curves are for epochs > 250 Myr that might be affected by the background chaotic diffusion from the belt (see Section 4.2).
Whereas the previous references focused on the dynamical aspects of the Yarkovsky effects (by investigating individual fragment orbits), Vokrouhlický and Farinella (2000) have modelled in a statistical way the evolution of large “swarms” of fragments released by catastrophic break-up events or impacts on large asteroids in the main belt. The complex dynamics discussed above is highly simplified in this model and is basically represented, in proper element space, by the secular semimajor axis drifts (24) and (25) due to the Yarkovsky effect. The statistical properties of delivery via the ν6 and 3/1 resonances are taken from the numerical simulations of Morbidelli and Gladman (1998). The principal effort is then focused on modelling the effects of random impact events resulting in the cascade-like generation of new populations of fragments. Each fragment, initially assumed to be ejected from a chosen parent asteroid, thus sooner or later becomes a myriad of smaller fragments that all drift according to the Yarkovsky effect toward the 3/1 and ν6 resonances. Typical intermediate results of our simulations are shown in Figures 5 and 6: Figure 5 shows the semimajor axis distribution of the simulated fragments for two large samples initially released from the asteroid Flora, while Figure 6 shows the flux of fragments into the resonances and of Earth impactors for fragments initially ejected from Hebe. The main features of the model are as follows. The combination of the two studied phenomena, the Yarkovsky drift and the collisional dynamics, can efficiently feed the main resonances with small asteroid fragments from nearly all locations in the main belt, implying that the transport mechanism of the meteorites and small NEAs is less selective than previously thought. Direct injections, considered in the “pre-Yarkovsky” studies (e.g.
Figure 6. The expected flux of fragments from Hebe (K = 0.1 W/m/K case) vs time, either into the ν6 and 3/1 resonances, or at the Earth (full line). The flux is dominated by small (R < 1 m) fragments, and the large fluctuations (about a factor of 100) of the resonance fluxes are due to secondary fragmentations of relatively large bodies into swarms of smaller ones. The flux at the Earth mimics the behaviour of that into ν6, which is the main delivery route in this case, although it is “smoothed out” by the chaotic character of the post-resonant orbits. Note that if the radius R1 of the largest body in the initial distribution of ejecta were changed (in this simulation R1 = 250 m), the quantities plotted along the vertical and horizontal axes in this diagram would scale roughly as powers of R1.

Farinella et al. 1993, Morbidelli et al. 1994), seem to dominate this feeding process only for sources close to the resonances. The flux of objects into the resonances is, contrary to the direct-injection scenario, spread over a long time span (see Figure 6). As an example, we quote that some 50 to 80% of the mass of the initial population of bodies released in the Flora region may be transported to the resonances (dominantly the ν6 resonance) over 0.5 to 1 Gyr. Another important quantitative result from this model is that the distribution of accumulated CRE ages in the population of fragments reaching the Earth is in fair agreement with the observations (e.g. Marti and Graf 1992, Welten et al. 1997). In general, the CRE age histograms are found to depend on the age of the last event capable of dominating the local Earth swarm: relatively old events are likely to generate the background CRE age profiles (as in the case of the L-chondrites), peaked at 20-50 Myr for stones and 200-500 Myr for irons, while comparatively recent and large events may create discrete peaks in the CRE age distributions (such as the prominent 7-8 Myr peak for the H-chondrites). In the latter case, the bulk of the original fragment population may still reside in the main belt and will supply a significant flux of meteorites in the future (next Myrs), altering the currently observed distribution of their CRE ages. Figure 7 shows a comparison of the simulated and observed CRE ages for different types of meteorites and different parent asteroids.
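The stretching of the CRE-age distribution by slow Yarkovsky mobility, compared with the short delivery times of direct injection, can be caricatured with a toy calculation. The drift-rate normalisation, its 1/R size scaling, the initial distances to the resonance and the post-resonance transfer times below are all illustrative assumptions, not values from the model described above.

```python
import random

def drift_rate_au_per_myr(radius_m):
    """Illustrative size scaling only: diurnal Yarkovsky drift taken as ~ 1/R,
    with an assumed rate of 1e-3 AU/Myr for a 1 m body."""
    return 1.0e-3 / radius_m

rng = random.Random(2)
ages = []
for _ in range(20000):
    radius = 10 ** rng.uniform(-0.7, 0.3)   # ~0.2 m to ~2 m fragments (assumed range)
    distance = rng.uniform(0.0, 0.15)       # initial distance to the resonance (AU, assumed)
    t_drift = distance / drift_rate_au_per_myr(radius)
    t_post = rng.uniform(1.0, 5.0)          # resonance + post-resonance transfer (Myr, assumed)
    ages.append(t_drift + t_post)

ages.sort()
print(f"median delivery time (~CRE age): {ages[len(ages) // 2]:.0f} Myr")
print(f"fraction shorter than 10 Myr: {sum(a < 10.0 for a in ages) / len(ages):.2f}")
```

Even this crude sketch pushes the bulk of the delivery times to tens of Myr and leaves only a small fraction of short ages, the qualitative behaviour that distinguishes the Yarkovsky scenario from direct injection in Figure 7.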
Long-term processes in the asteroid families. Farinella and Vokrouhlický (1999) have noted that the Yarkovsky effects are capable of providing some semimajor axis
Figure 7. Comparison of the modelled and observed CRE-age distributions for three different meteorite types (data: grey histograms). As for the predictions, we show results of the direct-injection scenario with no Yarkovsky mobility (D histogram) and of the model including Yarkovsky mobility of the meteoroids and their precursors (bold full-line histograms); labels 1, 2 and 3 refer to thermal conductivity values of 0.0015, 0.1 and 1 W/m/K, respectively. Both the data and the results of our simulations were normalised independently. Part (a) assumes ejecta from asteroid Flora, whose computed CRE ages are compared with the observed distribution for 240 L-chondrites. Part (b) assumes ejecta from asteroid Hebe and the comparison with 444 CRE ages of H-chondrites. Part (c) assumes ejecta from asteroid Vesta, compared to the CRE age data for 64 HED (howardite-eucrite-diogenite) meteorites. In all cases, the intermediate K value appears to provide the best match to the data. Note that the direct-injection scenario would always predict many more short CRE ages than are observed, and a shortage of ages between 20 and 50 Myr. Neither problem is present when the Yarkovsky mobility is taken into account.

mobility even to km-sized small asteroids in the main belt. We refer to Figure 1, which indicates that bodies in the 1 to 10 km diameter range may move in semimajor axis by ≈ 0.01 AU within their collisional lifetime of 0.1 to 1 Gyr. This mobility may be a key mechanism for several interesting dynamical processes in the solar system. First we mention the feeding of the high-order and/or multiple resonances in the inner asteroid belt, which have recently been identified as the most likely dynamical routes for multi-km sized Mars-crossers and NEAs (Migliorini et al. 1998, Bottke et al. 2000). Other likely consequences are the eventual fall into the main resonances of fragments/asteroids generated “on the brink” (e.g. Milani
Figure 8. Future evolution of the asteroid 7340 for 15 different assumptions of the spin axis orientation (gravitational and Yarkovsky perturbations included; low surface conductivity considered and a radius of ≈ 3 km estimated from the absolute magnitude). About half of the states terminate in the 5/2 resonance within 0.5 Gyr and may represent the past evolution of the asteroid Vysheslavia, located in the tiny chaotic zone between 2.828-2.829 AU. Notice also that within the estimated collisional lifetime (of the order of a Gyr) the extreme “clones” in this integration may separate in their semimajor axes by as much as ≈ 0.05 AU, comparable to the width of the Koronis family.

and Farinella 1995; Knežević et al. 1997) and the gradual spreading in semimajor axis of the small members of the asteroid families.
As a first step to quantitatively understand the above-mentioned processes we have studied the possible long-term orbital evolution of small asteroids close to the inner boundary of the Koronis family. A primary motivation of this work was the finding by Milani and Farinella (1995) that the Koronis member Vysheslavia is presently located on a very unstable orbit. By integrating orbits whose initial conditions were all bound within the uncertainty ellipsoid of Vysheslavia, Milani and Farinella found that this asteroid will fall into the 5/2 mean-motion resonance with Jupiter within 10 to 20 Myr. Such an extremely short dynamical lifetime is in contrast with the ≈ 1 Gyr (or more) age of the Koronis family, and Vysheslavia thus cannot be a primordial object. The most likely scenario, according to Milani and Farinella (1995), was that a recent secondary fragmentation in the family placed Vysheslavia onto its current orbit. However, given Vysheslavia's size (≈ 15 km), the probability of this hypothesis seemed very low. Vokrouhlický et al. (2000) recently revisited Vysheslavia's puzzling case by proposing an alternative scenario: the asteroid might have been put onto its orbit by a slow inward drift due to the Yarkovsky effect and thus have originally been located farther from the 5/2 resonance. Since the diurnal variant of the Yarkovsky effect likely dominates the semimajor axis mobility (due to the low surface conductivity that follows from the evidence reported by Müller et al. 1999), the Yarkovsky-origin hypothesis constrains the orientation of Vysheslavia's spin axis only to a hemisphere (thus not severely). Presently, we have no
observational evidence of the axis orientation, but efforts are under way to determine Vysheslavia's obliquity. To illustrate a possible past evolution of Vysheslavia's orbit we integrated the orbits of other Koronis members presently located farther from the 5/2 resonance. These orbits are stable when only the gravitational perturbations are considered, but evolve when the Yarkovsky effect is included in the simulation. Figure 8 shows one of these examples, notably the possible future orbital evolution of the unnamed asteroid 7340 (1991 UA2). Vysheslavia is about twice as large as 7340, which means that, if Vysheslavia was initially at the orbit of 7340, the timescale (abscissa) would be twice as long. Even in this case the necessary transport time to the unstable chaotic region, where Vysheslavia is presently located, would be “comfortably” shorter than the Koronis family age. Since this new scenario has no additional constraints, apart from the hemisphere orientation of the spin axis, it becomes much more likely than the secondary-collision hypothesis (although the latter cannot be formally ruled out). The presence of further Koronis members on orbits similar to Vysheslavia's seems to favour the Yarkovsky-driven origin (e.g. Brož and Vokrouhlický 2000). The case of Vysheslavia indicated for the first time that Yarkovsky-driven long-term processes may be occurring in real asteroid families. Further, less elaborated, cases will be commented on below.
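The factor-of-two argument above reflects the fact that, for bodies much larger than the thermal skin depth, the diurnal Yarkovsky drift rate falls off roughly as the inverse of the radius, so the time needed to cover a given distance in semimajor axis grows roughly linearly with size. A minimal sketch of this scaling, with an assumed (purely illustrative) normalisation of the drift rate:

```python
DRIFT_AT_1KM = 5.0e-5   # assumed diurnal drift rate for a 1 km body (AU/Myr), illustrative only

def drift_time_myr(delta_a_au, radius_km):
    """Time to drift delta_a_au, taking da/dt roughly proportional to 1/R."""
    return delta_a_au / (DRIFT_AT_1KM / radius_km)

for radius_km in (3.0, 6.0):   # a 7340-like body and a body about twice as large
    print(f"R = {radius_km:.0f} km: {drift_time_myr(0.01, radius_km):.0f} Myr to drift 0.01 AU")
```

With this normalisation a few-km body covers ≈ 0.01 AU in several hundred Myr, within the 0.1-1 Gyr range quoted above, and a body twice as large needs about twice as long.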
4.2 Outlook and future work
Refined models of the meteorite transport. As presented above, the current understanding of the Yarkovsky effect's role in the delivery of material from the inner part of the asteroid belt is based on the following lines of evidence: (i) numerical simulations of the long-term evolution of individual orbits with the Yarkovsky and gravitational perturbations included, and (ii) numerical simulations of large samples of collisionally evolving fragments with significant simplifications of the dynamical aspects of the transport. Though merging the two approaches in a single simulation is not possible now (because of computer-power constraints; note, e.g., that the collisional cascade effects imply that hundreds of millions of fragments typically have to be considered), their convergence is necessary in the future. Most straightforwardly, numerical integration of hundreds of Yarkovsky-evolving orbits may yield statistical properties of their interaction with the gravitational resonances that have so far not been included in the collisional modelling. Figures 9 and 10 illustrate some of these interesting phenomena. Figure 9 confirms that slowly drifting fragments may be delivered onto Mars-crossing orbits before they reach the nominal position of the ν6 resonance (see already Bottke et al. 2000), since the boundary at which these bodies escape from the main belt corresponds to about g - g6 ≈ 1-2 arcsec/yr. Already at this “distance” from the ν6 resonance the eccentricity undergoes variations large enough that encounters with Mars are possible. We note that this phenomenon diminishes for smaller and faster-drifting fragments, which may approach the ν6 resonance more closely. We also note that about 10% of the fragments were trapped by the z2 secular resonance. The right part of Figure 9 indicates that the middle zone of the asteroid belt is less affected by the background chaotic diffusion than the inner zone (also Morbidelli and Nesvorný 1999). Note here that the fast-moving fragments (20 m sized in this case) can cross the 3/1 resonance and continue drifting in the inner zone of the belt. Similarly, we have also recorded cases when particles from the inner zone of the belt crossed the 3/1 resonance into the middle zone (see Figure 3).
Figure 9. Left: Proper semimajor axis (in AU) vs proper eccentricity of 50 particles initially ejected from asteroid Flora (all particles have 200 m size, initially randomised orientation of the spin axis and low surface conductivity). The proper semimajor axis is affected by the Yarkovsky drift and the proper eccentricity by weak resonances and by close encounters with Mars as the orbits approach the ν6 resonance (thus undergoing mean eccentricity oscillations with increasing amplitude). Notice also about ten particles trapped temporarily in the high-order secular resonance z2. The positions of the z2 and z3 resonances are indicated by dashed lines (by their ±0.5 arcsec/yr boundaries; the locations of these resonances were determined using the analytic theory of Milani and Knežević 1994); the central part of the ν6 resonance is also indicated (dash-dotted line). These resonances are shown for the mean inclination of the initial orbits. Notice that the orbits are extracted before they reach the nominal position of the ν6 resonance. The mean pericentre line q_M = 1.665 AU approximates the limit of the Mars-crossing zone, but note that the osculating element q oscillates more than the mean element q_M, and Mars' maximum apocentre is about 1.72 AU. Right: Proper semimajor axis (in AU) vs proper eccentricity of 50 particles initially ejected from asteroid Maria (all particles have 20 m size, initially randomised orientation of the spin axis and low surface conductivity). This middle part of the asteroid belt contains far fewer weak mean-motion resonances, so that the evolution seems more regular (dominated by the Yarkovsky drift until the 3/1 or 5/2 resonances are reached). Only a few particles (about 7) interact with the 8/3 resonance (at ≈ 2.71 AU). As an interesting feature, notice the particles that crossed the 3/1 resonance without being immediately ejected from the system. However, they typically leave the resonance in a high-eccentricity state, so that they typically survive less than 10 Myr in the system after the 3/1 crossing.
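The Mars-crossing limits quoted in the caption can be checked with elementary two-body arithmetic; the present orbital elements of Mars are standard values, while the adopted secular maximum of its eccentricity is an illustrative assumption:

```python
A_MARS = 1.5237       # semimajor axis of Mars (AU)
E_MARS_NOW = 0.0934   # present eccentricity of Mars
E_MARS_MAX = 0.13     # approximate secular maximum of Mars' eccentricity (assumed)

print(f"present aphelion of Mars : {A_MARS * (1.0 + E_MARS_NOW):.3f} AU")
print(f"maximum aphelion of Mars : {A_MARS * (1.0 + E_MARS_MAX):.2f} AU")
# An orbit whose (mean) perihelion drops below ~1.67-1.72 AU can therefore
# encounter Mars, which is the boundary indicated in Figure 9.
```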
Figure 10 demonstrates another aspect of the strong background chaoticity in the inner zone of the asteroid belt. The diminution of the numerically simulated population of Flora fragments in the main belt is compared with the expectation of the collisional code of Vokrouhlický and Farinella (2000). This latter model disregarded the effects of the weak resonances and included only the ν6 and 3/1 resonances. As discussed in detail by
[Figure 10 plot: decay of the Flora basalt meteoroid population; fraction of the initial population (ordinate) vs time in Myr (abscissa), with curves/symbols for R = 1 m, 10 m, 50 m, 100 m and 500 m, the Monte Carlo model, and a no-Yarkovsky case.]
Figure 10. Comparison of the meteoroid leakage from the main asteroid belt in the simple Monte Carlo model of Vokrouhlický and Farinella (2000) and in the direct numerical integration of 50 sample particles by Brož et al. (2000; both gravitational and Yarkovsky effects included). The ordinate gives the number of bodies that are not on Mars-crossing orbits, given as a fraction of the initial population of main-belt bodies; the abscissa is time since the beginning of the simulation (in Myr). Lines are for the Monte Carlo model and symbols for the detailed numerical simulation. The legend below the figure indicates the sizes of the bodies; a high-conductivity surface is assumed (K = 2 W/m/K). Rapidly moving fragments (R = 10 m) show very good agreement, since the effect of the weak resonances is minimal. As the maximum possible Yarkovsky drift decreases (typically for large bodies), the agreement gets worse. Full squares correspond to the simulation without the Yarkovsky effect, when objects are moved onto Mars-crossing orbits by the chaotic diffusion in the network of weak resonances (the grey area indicates configurations that cannot be sustained in the inner part of the asteroid belt due to its background chaoticity).

Morbidelli and Nesvorný (1999), the population of fragments decays even without the Yarkovsky effect (see the full squares; after a 150 Myr simulation some 25% of the bodies had diffused along the mini-resonances to the Mars-crossing region). When the Yarkovsky effect is included in the simulation, we observe a faster decay of the main-belt population. At large sizes (R = 500 m) this result seems to support the hypothesis of Farinella and Vokrouhlický (1999), who suggested that a slight Yarkovsky mobility may bring bodies to the mini-resonances and thus enhance the population decrease. Small particles move fast enough to escape directly from the belt via the ν6 and 3/1 resonances, which is the reason for the very good agreement with the semi-analytic theory for R = 10 m particles. The fundamental result in Figure 10 concerns, however, the large fragments: we note that the simple modelling overestimates the abundance of these objects in the population. This is an important fact, since their late disruptions may feed the long-term flux into the principal resonances and consequently the flux of meteoroids to the Earth (see Figure 6). Inclusion of the background diffusion of large fragments from the belt is thus an important task
for the future development of the collisional transport codes. We note that this problem is particularly important for the Flora-basin region. We also checked that it does not occur in the simulations of the evolution of fragments from Hebe and Maria, where the expected decay of the main-belt population agrees well with the numerical data. Obviously, the lesser background chaoticity in these zones is responsible for the agreement.
So far we have reported the modelling of the CRE age distribution for the stony meteorites and the role of the Yarkovsky effects therein. The investigation of the CRE age distribution of the iron-rich meteorites has been excluded from that discussion because this latter problem is more difficult, basically for the reason pointed out in the previous paragraph. The slower drift in semimajor axis for the iron-rich bodies (see, e.g., Farinella et al. 1998) is consistent with the longer transport timescale to the principal resonances in the main belt (and thus with the longer observed CRE ages), but it also means that the principal resonances may not be the only routes that participate in their delivery. As explained above, rather detailed information about the manner in which the mini-resonances contribute to the diffusion of meteoroids from the main belt needs to be included in simulations similar to Vokrouhlický and Farinella (2000). Moreover, the source of the iron meteorites appears to be less evident, and thus the modelling of their CRE ages is a challenge for future research.

Semimajor-axis dispersion of the asteroid families. It has been demonstrated above that the semimajor axis mobility of small (multi-km) members of the asteroid families may be comparable to the family width. The families may thus be expanding and even losing small members via the Yarkovsky semimajor axis drift. Similar effects occur also in eccentricity and inclination due to the omnipresent weak chaoticity in the main asteroid belt related to the high-order and/or multiple resonances with the major planets (e.g. Milani et al. 1997, Nesvorný and Morbidelli 1998, Morbidelli and Nesvorný 1999). In the course of time the families may occupy a larger and larger volume in proper element space, having initially been much more tightly clustered. The quantitative aspects of this idea represent an interesting challenge, since more compact initial configurations of the families (corresponding to smaller velocity dispersions) may be in better agreement with the numerical simulations of the catastrophic disruptions of large asteroids. If the family lies very close to a major resonance in the main asteroid belt (e.g. Maria near the 3/1 resonance, Koronis, Dora or Gefion near the 5/2 resonance, Themis or Hygiea near the 2/1 resonance; Morbidelli et al. 1995), smaller asteroids may fall into the resonance and be ejected from the belt. Evaluation of the mass loss due to this process may also be an interesting research project for future work. Putting these ideas in another perspective, we might also argue that highly clustered families are likely to be very young. Milani and Farinella (1994) used this argument to constrain the age of the Veritas family by observing the rapid chaotic evolution of some of its present members. The quantitative aspects of the chronology of compact families (with all dispersive phenomena taken into account) also await further developments.

Fine details of the size distribution of the main-belt objects and NEAs. The Yarkovsky effect may also provide a natural mechanism for explaining the observed overabundance of 10-100 m bodies among NEAs (e.g. Rabinowitz 1993, 1994). The Yarkovsky mobility of a population of bare-rock (or iron-rich) fragments, dominated by the seasonal variant of the Yarkovsky effect, is maximum for bodies of size comparable to the penetration depth of the seasonal thermal wave (about 10 m for stones and 20 m for irons; see Figure 1). Thus, we may expect that these bodies are preferentially removed from
the main belt, and eventually show up in relative overabundance within the population of Earth-crossing objects (e.g. Farinella et al. 1998, Hartmann et al. 1999; we have estimated that up to 30-40% of 15 m-sized fragments may be removed from the whole bulk of the main asteroid belt within their collisional lifetime of ≈ 65 Myr). Moreover, their removal from the main-belt population would imply a longer collisional lifetime for bodies of about 100 metres in size (which are typically fragmented by impacts of the 10 metre bodies), allowing them to drift over a wider portion of the belt and eventually feed the resonances. Such a size-dependent removal/injection of objects from/to the system should create wavy patterns in the size distribution, an effect that might be tested observationally. Details of the intimately connected size-distribution statistics of main-belt and near-Earth objects can thus be significantly influenced by the Yarkovsky effect and require a quantitative study.
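The penetration depth referred to above is the usual thermal skin depth evaluated at the orbital (seasonal) frequency, l = sqrt(K/(ρ C n)), where K is the conductivity, ρ the density, C the specific heat and n the mean motion. The order-of-magnitude sketch below uses illustrative assumed material values and an assumed main-belt orbit; the precise size of maximum seasonal mobility depends on the detailed thermal model.

```python
import math

def seasonal_skin_depth_m(K, rho, C, period_years):
    """Thermal skin depth l = sqrt(K / (rho * C * n)) for the seasonal (orbital)
    frequency n; all material values passed in are illustrative assumptions."""
    n = 2.0 * math.pi / (period_years * 3.156e7)   # orbital mean motion (rad/s)
    return math.sqrt(K / (rho * C * n))

# Illustrative main-belt orbit (P ~ 4 yr) and generic material data (assumed):
stone = seasonal_skin_depth_m(K=1.0,  rho=3500.0, C=680.0, period_years=4.0)
iron  = seasonal_skin_depth_m(K=40.0, rho=7800.0, C=500.0, period_years=4.0)
print(f"seasonal skin depth: bare stone ~{stone:.1f} m, iron ~{iron:.1f} m")
# This sets the metres-to-tens-of-metres scale at which the seasonal Yarkovsky
# mobility is largest (the text quotes ~10 m for stones and ~20 m for irons).
```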
Yarkovsky effect and rotation (YORP). Rubincam (2000) pointed out that the recoil force due to the thermal radiation may also affect the rotation state of the body. In contrast to the similar effect due to the direct solar radiation, such a Yarkovsky torque (proposed to be called the YORP effect) can result in a quasi-secular change of the rotation period and of the obliquity of the spin axis as measured from the normal to the orbital plane. The evaluation of this effect needs a rather complicated approach, since it vanishes for bodies with high symmetries (spherical or ellipsoidal shapes) and becomes non-zero only for irregularly shaped objects. Rubincam (2000), whose results were based on a simplified approach, suggested that the YORP effect may cause the rotation state to oscillate between periods of very fast and slow rotation, with flipping of the obliquity, on a timescale of Myr to 100 Myr (depending on the object's size and shape). This would have profound implications for the orbital version of the thermal phenomenon, the Yarkovsky effect, since such rapid changes in the obliquity might to some extent inhibit the diurnal variant of the Yarkovsky drift of the semimajor axis (note, however, that the seasonal variant would be affected less, since it always results in a decay of the orbital semimajor axis). For this reason, we need a detailed understanding of the rotational effects due to YORP. Vokrouhlický (2000; work in progress) reconsidered Rubincam's approach with a slightly more involved modelling of the thermophysics and preliminarily concluded that: (i) the obliquity may be affected less by the YORP effect than previously mentioned, while (ii) the body may be spun up or despun as predicted by Rubincam (2000). The novel aspect included in Vokrouhlický's modelling, notably the inertia in the temperature response, was found to reduce the out-of-spin-axis torque component that drives the obliquity changes. Axis reorientation would then be dominated by collisions rather than by YORP. Put in a more general perspective, Vokrouhlický observed that the long-term non-periodic features of the rotation state might be more complex and inevitably ruled by collisions during some periods. However, it should be noted that more work and independent checks are needed to clarify the situation. Also, a full-fledged numerical simulation of the long-term evolution of the rotation state including the YORP effect, as well as the gravitational torques due to the planets and the Sun, seems a difficult but important task for future work.

Further topics. More Yarkovsky work might be directed along two lines: (i) improvements of the theory and of the modelling of the underlying thermal phenomena, and (ii) further applications. As far as the first item is concerned, we recall that the current models are still highly simplified. The influence of inhomogeneities (surface layers, fractures
running through the object, etc.), the temperature dependence of the thermal parameters, and the directional properties of the absorption and emission by the surface are just a few topics that should be studied. First steps in this direction (e.g. the work by Vokrouhlický and Brož 1999) indicated that we can still learn a lot. Optimisation of the numerical codes might also be considered, since the most precise approaches still require unrealistically long CPU time for long-term orbital applications. Further applications of the Yarkovsky effect are more difficult to predict, as usual in science, but a natural extension of the previous work might be directed toward the dynamics of planetary rings, trans-Neptunian objects, the long-term stability of binary asteroids, etc.
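The thermophysical core of all such models is the heat-conduction problem below the surface. The following minimal one-dimensional finite-difference sketch, with an idealised prescribed surface temperature instead of the full radiative boundary condition and with assumed material values, only illustrates how the thermal wave decays over the skin depth; it is not the scheme used in the works cited above.

```python
import math

# Minimal 1-D heat conduction sketch: a sinusoidal surface temperature forcing
# propagating into a half-space.  All numbers are illustrative assumptions.
K, RHO, C = 1.0, 3500.0, 680.0        # conductivity, density, heat capacity (assumed)
kappa = K / (RHO * C)                  # thermal diffusivity (m^2/s)
PERIOD = 4.0 * 3.156e7                 # forcing period ~ one main-belt orbit (s)
omega = 2.0 * math.pi / PERIOD

NZ, DEPTH = 200, 20.0                  # grid points, modelled depth (m)
dz = DEPTH / NZ
dt = 0.4 * dz * dz / kappa             # explicit (FTCS) stability limit with margin

T_mean, dT_surf = 200.0, 30.0          # mean temperature and surface amplitude (K, assumed)
T = [T_mean] * (NZ + 1)

t = 0.0
while t < 3.0 * PERIOD:                # run a few cycles to reach a periodic state
    T[0] = T_mean + dT_surf * math.sin(omega * t)        # prescribed surface temperature
    Tn = T[:]
    for i in range(1, NZ):
        Tn[i] = T[i] + kappa * dt / (dz * dz) * (T[i+1] - 2.0 * T[i] + T[i-1])
    Tn[NZ] = Tn[NZ - 1]                # insulated lower boundary
    T, t = Tn, t + dt

skin = math.sqrt(kappa / omega)
idx = round(3.0 / dz)
print(f"analytic skin depth: {skin:.1f} m; "
      f"T(3 m) - T_mean = {T[idx] - T_mean:+.1f} K at this instant")
```

A realistic thermophysical code replaces the prescribed surface temperature by the nonlinear balance between insolation, thermal emission and conducted flux, which is where most of the numerical cost arises.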
5 Conclusion
The Yarkovsky effect was introduced into planetary dynamics by Öpik and Radzievskii as a possible transport mechanism for meteorites. Since that time we have significantly improved our understanding of the manner in which matter is transported from the main belt toward the Earth, mainly by including the fundamental role of the resonances. We have also substantiated the modelling of the Yarkovsky effect, and we have seen that it still plays an important role in the new context of the delivery scenario. A novelty of the last decade consisted in extending the applications of the Yarkovsky effect to bodies larger than a few metres across. As argued above, the dynamics of small solar system bodies up to ≈ 10 km in size may be importantly affected by the Yarkovsky effect (the singular case of the lunar motion has been left aside; see Vokrouhlický 1997). Another important perspective that arose during the last few years is that of a possible observational test of the Yarkovsky orbital perturbation (Section 4.1). This is very important, since a successful detection of the Yarkovsky effect would indicate that our present modelling is generally correct and confirm that we do not overestimate any aspect of the thermophysics of small solar system objects. In some sense this is the crucial point about the Yarkovsky effect: we know from first principles that it unavoidably exists, and we hope that our way of modelling it is “about correct” (i.e. within a factor of 2 or so). We need to gain certainty about this second aspect, and only direct observations can yield it. A systematic failure to detect the Yarkovsky effect would prompt a deep re-evaluation of its fundamental assumptions. Let us conclude this review by giving some useful internet addresses where the interested reader may check further developments in the theory and applications of the Yarkovsky effect:
http://sirrah.troja.mff.cuni.cz/~mira/mp/,
http://astro.mff.cuni.cz/davok/davok.htm,
http://www.boulder.swri.edu/~bottke/.

Post scriptum. The profound impact of the Yarkovsky effect on understanding the orbital fate of small bodies in the solar system was foreseen in the early nineties by Paolo Farinella. I had the privilege to assist Paolo in shaping his ideas and putting them onto a quantitative basis. As on many other occasions, Paolo's approach to understanding the role of the Yarkovsky effect was that of a highly distinguished scientist: starting from an
original idea (mostly an intuitive feeling), he was able to quickly recognise its physical nature and importance, and to describe it by means of precise mathematics. His vast knowledge of planetary science then enabled him to propose many situations where the Yarkovsky effect may help in better explaining the observations, or even shape a new “paradigm”. Paolo passed away on March 25, 2000, at the peak of his scientific productivity. Like many of his friends and collaborators, I shall painfully miss his sparkle and the inspiration of his unique ideas. I dedicate this text to his memory.
Acknowledgements. The author thanks M. Brož for his help in drawing the figures and Bill Bottke for suggestions that improved the final version of this text.
References

Afonso G B, Gomes R S and Florczak M A, 1995, Planet Sp Sci 43 787-795.
Anderson J D et al, 1998, Phys Rev Lett 81 2858-2861.
Bottke W F, Burns J A and Rubincam D P, 2000, Icarus 145 301-331.
Bottke W F, Jedicke R, Morbidelli A, Petit J-M and Gladman B, 2000, Science 288 2190-2194.
Brož M and Vokrouhlický D, 2000, in: Proceedings of the US-European Workshop on Celestial Mechanics (eds. P K Seidelman, D Richardson and E Wnuk, Dordrecht: Kluwer Acad Publ) in press.
Brož M, Vokrouhlický D and Farinella P, 2000, Icarus submitted.
Burns J A, Lamy P L and Soter S, 1979, Icarus 40 1-48.
Farinella P and Vokrouhlický D, 1996, Planet Sp Sci 44 1551-1561.
Farinella P and Vokrouhlický D, 1999, Science 283 1507-1510.
Farinella P, Vokrouhlický D and Hartmann W K, 1998, Icarus 132 378-387.
Farinella P, Gonczi R, Froeschlé Ch and Froeschlé C, 1993, Icarus 101 174-187.
Farinella P et al, 1994, Nature 371 314-317.
Gladman B J et al, 1997, Science 277 197-201.
Greenberg R and Nolan M C, 1989, in: Asteroids II (eds. R P Binzel, T Gehrels and M S Matthews; Tucson: The University of Arizona Press) pp 778-826.
Hartmann W K et al, 1999, Meteoritics Planet Sci 34 A161-A168.
Knežević Z, Milani A and Farinella P, 1997, Planet Space Sci 45 1581-1585.
Langseth M G, Keihm S J and Chute J L, 1973, in: Apollo 17 Preliminary Science Report NASA-SP-330.
Lebofsky L A and Spencer J R, 1989, in: Asteroids II (eds. R P Binzel, T Gehrels and M S Matthews, Tucson: Arizona Univ Press) pp 128-146.
Marti K and Graf T, 1992, Ann Rev Earth Planet Sci 20 221-243.
Migliorini F et al, 1998, Science 281 2022-2024.
Mignard F, 1992, in: Interrelations between physics and dynamics for minor bodies in the solar system (eds. D Benest and C Froeschlé, Gif-sur-Yvette: Editions Frontières) pp 419-451.
Milani A and Farinella P, 1994, Nature 370 40-42.
Milani A and Farinella P, 1995, Icarus 115 209-212.
Milani A and Knežević Z, 1994, Icarus 107 219-254.
Milani A, Nobili A M and Knežević Z, 1997, Icarus 125 13-31.
Morbidelli A and Gladman B, 1998, Meteor Planet Sci 33 999-1016.
Morbidelli A and Nesvorný D, 1999, Icarus 139 295-308.
Morbidelli A, Gonczi R, Froeschlé Ch and Farinella P, 1994, Astron Astrophys 282 955-979.
Morbidelli A, Zappalà V, Moons M, Cellino A and Gonczi R, 1995, Icarus 118 132-154.
Müller T G, Lagerros J S V and Blommaert J A D L, 1999, Bull Am Astr Soc 31 1075.
Nesvorný D and Morbidelli A, 1998, Astron J 116 3029-3037.
Ostro S J et al, 1999, Science 285 557-559.
Öpik E J, 1951, Proc Roy Irish Acad 54 165-199.
Peterson C, 1976, Icarus 29 91-111.
Poynting J H, 1903, Phil Trans Roy Soc A 202 525-552.
Presley M A and Christensen P R, 1997, J Geophys Res 102 6535-6550.
Rabinowitz D L, 1993, Astrophys J 407 412-427.
Rabinowitz D L, 1994, Icarus 111 364-377.
Radzievskii V V, 1952, Astron Zh 29 162-170.
Robertson H P, 1937, Mon Not R Astr Soc 97 423-438.
Rubincam D P, 1987, J Geophys Res 92 1287-1294.
Rubincam D P, 1995, J Geophys Res 100 1585-1594.
Rubincam D P, 1998, J Geophys Res 103 1725-1732.
Rubincam D P, 2000, Icarus 148 2-11.
Slabinski V J, 1997, Celest Mech Dyn Astron 66 131-179.
Spencer J R, Lebofsky L A and Sykes M V, 1989, Icarus 78 337-354.
Spitale J N and Greenberg R, 2000, Icarus in press.
Vokrouhlický D, 1997, Icarus 126 293-300.
Vokrouhlický D, 1998a, Astron Astrophys 335 1093-1100.
Vokrouhlický D, 1998b, Astron Astrophys 338 353-363.
Vokrouhlický D, 1999, Astron Astrophys 344 362-366.
Vokrouhlický D and Bottke W F, 2000, Astron Astrophys submitted.
Vokrouhlický D and Brož M, 1999, Astron Astrophys 350 1079-1084.
Vokrouhlický D and Farinella P, 1998, Astron J 116 2032-2041.
Vokrouhlický D and Farinella P, 1999, Astron J 118 3049-3060.
Vokrouhlický D and Farinella P, 2000, Nature 407 606-608.
Vokrouhlický D, Milani A and Chesley S R, 2000, Icarus 148 118-138.
Vokrouhlický D, Chesley S R and Milani A, 2000, Celest Mech Dyn Astr in press.
Vokrouhlický D, Brož M, Farinella P and Knežević Z, 2000, Icarus in press.
Wechsler A E, Glaser P E, Little A D and Fountain J A, 1972, in: Thermal Characteristics of the Moon (ed. J W Lucas; Cambridge: MIT Press) pp 215-241.
Welten K C et al, 1997, Meteor Plan Sci 32 891-902.
Wetherill G W and Chapman C R, 1988, in: Meteorites and the Early Solar System (eds. J F Kerridge and M S Matthews; Tucson: Univ. of Arizona Press) pp 35-67.
Will C M, 1993, Theory and Experiment in Gravitational Physics (Cambridge: Cambridge University Press).
Yomogida K and Matsui T, 1983, J Geophys Res 88 9513-9533.
Are science and celestial mechanics deterministic? Henri Poincaré, philosopher and scientist

C Marchal

General Scientific Direction, ONERA, France
1 The absolute, or Laplacian determinism
The idea of determinism has a very long history and various meanings. Its absolute meaning was defined by Pierre Simon de Laplace in 1814 in his book “Essai philosophique sur les probabilités” (Philosophical essay on probabilities), where he wrote:
“Nous devons envisager l'état présent de l'Univers comme l'effet de son état antérieur et la cause de ce qui va suivre. Une intelligence qui, pour un instant donné, connaîtrait toutes les forces dont la nature est animée et la situation respective des êtres qui la composent, si d'ailleurs elle était assez vaste pour soumettre ces données à l'analyse, embrasserait dans la même formule le mouvement des plus grands corps de l'Univers et ceux du plus léger atome: rien ne serait incertain pour elle, l'avenir comme le passé serait présent à ses yeux.”

“We must consider the present state of the Universe as the effect of its past state and the cause of its future state. An intelligence that would know all the forces of nature and the respective situation of all its elements, if furthermore it were large enough to be able to analyse all these data, would embrace in the same expression the motions of the largest bodies of the Universe as well as those of the slightest atom: nothing would be uncertain for this intelligence, all the future and all the past would be as known as the present” (Laplace, 1814).

Such an absolute determinism is known as “Laplacian determinism”. Throughout the nineteenth century it was considered a fundamental part of science, and we must recognize that it played a very useful role in helping scientists to classify and understand a huge variety of physical, astronomical, chemical, medical and biological phenomena. It is certainly one of the major reasons for the fantastic scientific progress of the twentieth century.
2 The creed of Scientism and its discredit
In the decades 1880-1910 the impressive progress of science led to an entirely new situation. Most scientists, but also many writers and philosophers, as well as a very large proportion of the public, felt that mankind was at the dawn of a new era. Science was considered as almost infallible: able to solve all the problems, worries and miseries that were the age-old share of the human condition; able to answer all questions, especially the philosophical ones: Where are we? Where do we come from? Where do we go? Why are we on Earth? Many scientists, having acquired a high degree of pride, considered that any scientific progress was progress for mankind, and consequently refused all exterior interventions or considerations. This state of mind was particularly well reflected in the following profession of scientific faith presented at the general meeting of the French Association for the Advancement of Science (in Reims, 1880) by J. Mercadier, chairman of the Physics section.
“La liberté est la condition essentielle du développement des sciences. Aussi n'existe-t-il parmi nous ni castes, ni sectes, ni coteries; toutes les convictions sincères y sont respectées. Tout ce qui touche au domaine de la conscience est systématiquement écarté de nos débats. On ne discute ici que des questions véritablement discutables et sur lesquelles l'expérience a quelques prises; mais toutes les questions de ce genre sont admises à la discussion. Nous écoutons toutes les doctrines scientifiques, sérieuses ou non, peu nous importe, car celles qui ne le sont pas ne résistent pas à un examen rigoureux, fait librement et en pleine lumière. Nous avons une foi sincère dans le progrès continu de l'humanité et, jugeant de l'avenir d'après le passé et d'après les conquêtes que le siècle actuel a faites sur la nature, nous n'admettons pas qu'on vienne nous dire à priori en quelque branche que ce soit de la science positive: 'Tu t'arrêteras là!'. Il y a donc place parmi nous, vous le voyez, pour tout homme d'initiative, de bonne volonté et de bonne foi.”

“Freedom is the essential condition of the development of Science. Thus among us there exist neither castes, nor sects, nor political sets; all sincere convictions are respected. All that concerns the domain of consciousness is systematically discarded from our discussions. Here we only discuss debatable questions upon which some experiments are possible; but all such questions are open to discussion. We listen to any scientific doctrine, serious or not, as it matters little to us; those that are not serious will not resist a rigorous examination done freely and openly. We have a sincere faith in the continuous progress of mankind and, assessing the future with the help of the past and of the conquests of nature that our present century has granted, we forbid anyone to say 'a priori', for any branch of positive science: 'You will stop there!' Thus, as you can see, we are ready to welcome any active and honest man of good will.”
This very optimistic view of Science was still cautious: it avoided the domain of consciousness. But twenty years later this prudence was over and the triumphant “Scientism” claimed to rule even that domain. Its particularly optimistic and dominant ideology can be summarized in what can be called the “creed of Scientism”: Science will explain everything. Religions belong to the past (Auguste Comte). All that actually exists can be proved (I only believe what I can see). God is an invention of men (Freud, Feuerbach). The Universe is infinite and unchanging, it has always existed, it will exist forever. Man is an animal, that is, some organized matter. Evolution only depends on randomness (Darwin). The Bible, and its miracles, are only legends (Renan). Finality is only an appearance; only determinism actually exists. Of course the philosophy corresponding to this creed is materialism and determinism; the corresponding belief is atheism. But even in 1900 this creed was impossible to accept fully. For example, the German biologist and physiologist Ernst Wilhelm von Brücke (1819-1892) claimed: “Finality is an exacting mistress and a biologist cannot avoid her, but above all he refuses to be seen publicly with her!”. We will see below the more serious objections of Poincaré. Let us notice that even if this creed has met many difficulties, contradictions and refutations throughout the twentieth century, it remains for many scientists the unconscious, but still very active, basis of their vision of science and of their definition of the scientific endeavour. Furthermore, the civil laws of modern nations reflect this philosophy of determinism, and murderers are sometimes considered as not guilty: are they not predetermined? Today we know that this 1900 creed of Scientism rests on ever weaker grounds. It has been under fire from both inside and outside science. Scientists have met many limits of science, the most famous being the following: the uncertainty principle (Heisenberg); the Gödel theorem of incompleteness; chaotic motion, strange attractors, sensitivity to initial conditions (Poincaré, Julia, Mandelbrot, Hénon, Lorenz); the Liapunov time, the time of divergence (Ruelle, Takens, Bergé, Lighthill); the paradox of freedom; the limits of information theory. Even in astronomy, this stronghold of determinism, the time of divergence of motions is not infinite; it is about 10 to 100 million years for the motions of the planets (and much less for the motions of small asteroids). Celestial mechanics cannot decide alone on the origin of the Moon or on the long-term evolution of the solar system. A completely unexpected phenomenon arose in the first half of the twentieth century, exemplified by Robert Oppenheimer in a dramatic statement: “The scientists have met
sin!” Today it is difficult to imagine the disarray of people in the (nineteen) twenties and thirties: “How is it possible that scientists participated in the 1915-1918 war of asphyxiating gas! That they led experiments to determine which gas was the most efficient in killing human beings!” These scientists were chemists, and their inventions were also used for the industrial death of the Nazi camps. The physicists had their burden with the atom bomb; the biologists with the temptation of eugenics, the genetic manipulations and the experiments on aborted babies collected still alive. The image of science as the servant of mankind had gone.
As a result most scientists are now modest. They know that science cannot, by far, explain everything. Unthinkable for nineteenth-century scientists, many ethics committees have been established by teams of scientists, philosophers and even theologians. The best known examples are the following: the Nuremberg code of 1947, which gives the ethical limits of medical experiments on human beings (these human beings must have given their consent freely, they must have a fair knowledge of the purpose of the experiment and of the possible consequences for their health, they must have the right to stop the experiment at any time, etc.); the Russell-Einstein manifesto in 1955; since 1957, the yearly Pugwash conferences on atomic weapons (Nobel prize for peace in 1995); the non-proliferation treaty (1969); the Asilomar moratorium on genetic manipulations (1974).
To these we can add many scientific studies on the dangers related to nuclear waste, accidents of the Chernobyl type, contaminated blood, etc. Scientists have looked outside of science for directives and justifications; they have recognized, after René Cassin, that the main references of the human condition, such as the Rights of Man, have an ethical and religious origin: the belief in the dignity of human beings.
3 Henri Poincaré philosopher
Henri Poincaré wrote many books at the boundary of Science and Philosophy, such as “La Science et l'Hypothèse” (Science and Hypothesis), “La valeur de la Science” (The Value of Science) and “Science et méthode” (Science and Method). Here we will consider only his reflections on determinism and irreversibility as they appear in his last and unfinished book, “Dernières pensées” (Last Thoughts). In the relations between ethics and science, Henri Poincaré recognizes many beneficial effects: the scientists are looking for truth, and their ethics lead them to be honest and to have a collective and general point of view leading them (usually) to the good of all mankind. However, he was distressed by the philosophical problem of determinism:
“Mais nous sommes en présence d'un fait; la science, à tort ou à raison, est déterministe; partout où elle pénètre elle fait entrer le déterminisme. Tant qu'il ne s'agit que de physique ou même de biologie cela importe peu; le domaine de la conscience demeure inviolé; qu'arrivera-t-il le jour où la morale
deviendra à son tour objet de science? Elle s'imprégnera nécessairement de déterminisme et ce sera sans doute sa ruine.”

“However we are in the presence of the following fact: rightly or wrongly, science is deterministic; its extension is also an extension of determinism. As long as only physics or even biology are concerned the effects are minor; but what will happen the day ethics becomes, in its turn, a subject of science? It will necessarily be impregnated with determinism and this will probably be its ruin” (Poincaré, 1913).

We can almost read that Henri Poincaré was already horrified by the future horrors of the reign of such a dogmatic science and of “scientifically founded regimes” that send you to the gulag archipelago not because of your crimes but because of your social origins ... (today such a policy is qualified as a “crime against Mankind”).
4 Henri Poincaré scientist
We have seen in the first section the definition of absolute determinism. Its main application in science may be summarised in the statement that two experiments with exactly the same initial and limit conditions must give exactly the same results. It is easy to understand how precious this idea has been in the development of science and in the observation of the innumerable phenomena of nature. Celestial mechanics is the best example of the application of determinism. The wonderful law of universal attraction was sufficiently simple to be discovered by Newton's genius and sufficiently complex to give a wide variety of motions with many perturbations and inequalities. It was above all a deterministic law leading to accurate predictions of planetary motions and eclipses. These successes were the major reason for the consensus on determinism among nineteenth-century scientists; the discovery of the planet Neptune after the long calculations of Leverrier and Adams was, of course, an excellent positive argument. However, long before Heisenberg's uncertainty principle, Henri Poincaré presented scientific elements going against absolute determinism.
“Une cause très petite, qui nous échappe, détermine un effet considérable que nous ne pouvons pas ne pas voir, et alors nous disons que cet effet est dû au hasard ... Mais, lors même que les lois naturelles n'auraient plus de secret pour nous, nous ne pourrons connaître la situation initiale qu'approximativement. Si cela nous permet de prévoir la situation ultérieure avec la même approximation, c'est tout ce qu'il nous faut, nous disons que le phénomène a été prévu, qu'il est régi par des lois; mais il n'en est pas toujours ainsi, il peut arriver que de petites différences dans les conditions initiales en engendrent de très grandes dans les phénomènes finaux ...”.
“A very small, unnoticeable cause can determine a very large visible effect; in this case we claim that this effect is a product of chance ... However, even
if the natural laws were perfectly known, we will never be able to know the initial conditions without some approximation. If this allows us to foresee the later situation with the same approximation, then that is all we want. We will say that the phenomenon has been foreseen, that it is governed by laws; however this is not always the case, and it is possible that very small initial differences lead to very large differences in the final state ...” (Poincaré, 1908a).
As examples of this sensitivity to initial conditions, Henri Poincaré presents the trajectories of hurricanes (almost the “butterfly effect”) and, more strikingly, the conception of Napoléon by his parents ... (Poincaré, 1908b). Thus we must consider that the idea of absolute determinism only reflects a particular state of the conditions of the development of science. It was indeed easier to study first the most simple, regular and foreseeable phenomena, such as free fall, the rising of the Sun, the periodic recurrence of the full Moon, of the seasons, of high tides, etc. It was an obvious, but too large, generalization that led us to consider that all natural phenomena must be deterministic. We must then first make a clear distinction between what can be called “mathematical determinism” and “physical determinism”. Mathematical determinism reflects the definition “two experiments with exactly the same initial and limit conditions must give exactly the same results”: the mathematical model of a natural phenomenon is considered as deterministic if the mathematical conditions of existence and uniqueness of solutions are satisfied (which is generally the case for models using systems of differential equations). Physical determinism is very different. For many reasons, for instance because of the motions of the planets, it is impossible to repeat the same experiment exactly. Thus a useful physical definition of determinism must be: “two experiments with almost exactly the same initial and limit conditions must give almost exactly the same results”. In other words, the stability of a phenomenon is an essential condition of the usefulness of the idea of determinism. For unstable phenomena, as soon as we consider durations longer than the time of divergence, statistical analysis is more useful and more efficient than a deterministic analysis. Does physical indeterminism, this sensitivity to initial conditions, occur frequently? We have seen that Henri Poincaré presented several examples: meteorology, the conception of Napoléon, etc. But he is also the initiator of what is today called the theory of chaos, an essential feature of motions that are sensitive to initial conditions. Moreover, he recognised that chaos appears extremely often: it appears in all non-integrable problems.

“Que l'on cherche à se représenter la figure formée par ces deux courbes et leurs intersections en nombre infini dont chacune correspond à une solution doublement asymptotique, ces intersections forment une sorte de treillis, de tissu, de réseau à mailles infiniment serrées; chacune de ces deux courbes ne doit jamais se recouper elle-même, mais elle doit se replier sur elle-même de manière infiniment complexe pour venir recouper une infinité de fois toutes les mailles du réseau. On sera frappé de la complexité de cette figure, que je ne cherche même pas à tracer. Rien n'est plus propre à nous donner une idée de la complication du problème des trois corps et en général de tous les problèmes de la Dynamique où il n'y a pas d'intégrale uniforme et où les séries de Bohlin sont divergentes.”
Are science and celestial mechanics deterministic?
85
“If we try to represent the figure formed by these two curves, by their intersections in infinite number each of which corresponding to a doubly asymptotic solution; we will find a kind of lattice, a texture, a net with infinitely tightened meshes. Each of these two curves cannot intersect itself, but it is folded on itself in an infinitely complex way in order to cross an infinite number of times all the meshes of the net. The complexity of this figure is striking and I will even not try to draw it. Nothing can give a better idea of the complexity of the three body problem and of all problems of dynamics without uniform integral and with diverging Bohlin series” (PoincarC, 1957a). Of course the importance of chaotic motions varies very much with the domain of interest. When the perturbations are large almost all bounded solutions are chaotic while most of them are regular in almost-integrable problems. However, even in this latter case, the presence of a small proportion of chaotic solutions challenges the long-term stability. An example of an almost-integrable problem is the classical problem of planetary motions in the solar system: the Keplerian motion is an excellent first order approximation of these motions, and the method of small perturbations leads to very useful and very accurate expansions. However the accuracy of this method is limited and PoincarC has demonstrated that the corresponding series are generally diverging (PoincarC, 1954a, 1957b).
As an example of a problem with very large perturbations, we can consider the kinetic theory of gas (PoincarC, 195413). The instability is so large and the Avogadro number is so huge that the statistical methods give excellent results: the physicist uses the statistical measures called temperature, pressure, density, etc. and the corresponding system of partial differential equations as if this model was absolutely accurate and deterministic. Of course statistical models cannot have an infinite accuracy, but they do have also an unexpected property: they give irreversible evolutions even when they describe a reversible phenomenon, such as the motions described by the kinetic theory of gas. This property is a pure mathematical effect but it leads to the second principle of thermodynamics and to all the related irreversibilities, the essential elements of what is called “arrow of time”. There is a major problem: consider two vessels full of gas and let us open the communication between these two vessels. The Brownian motion will equalise the temperatures, the pressures and the compositions while the opposite evolution never appears. However: The Brownian motion and the kinetic theory of gas are conservative and reversible, as conservative and reversible as Celestial mechanics itself. Henri PoincarC has demonstrated that for bounded and conservative systems, almost all initial conditions lead to an infinite number of returns to the neighbourhood of the initial conditions (PoincarC, 1957 c). Of course these returns to the neighbourhood of the initial conditions contradict the equalization of temperatures, pressures and compositions. In face of this contradiction there are several classical but unsatisfactory answers: “There exist perhaps some very small, irreversible and dissipative hidden phenomena that forbid the application of the Poincar6 return theorem . . . ”
86
C Marchal
All known laws of nature are reversible (if we consider that the second principle of thermodynamics is a “principle” and not a law) and tjhis first answer is thus the rejection of a major symmetry of nature . . . We will see that it is not necessary. “For a given phenomenon the notion of trajectory remains accurate for only its time of divergence that is about fifty or one hundred “Liapounov times” and much less than the PoincarC return time that has never been observed in this type of experiment .’! This answer is true but insufficient. The impossibility of accurate long-term computations of future evolution doesn’t resolve the contradiction . . . ”In principle PoincarC is right and for strictly isolated systems there is indeed this mysterious correlation between initial and final conditions (after the PoincarC return time). But our systems are not strictly isolated and even very small perturbations, ” such as the attraction of planets, destroy this correlation These “mysterious correlations” are imaginary, and it is in a natural fashion that the system returns towards all states attainable from the given initial conditions. The “very small perturbations” will not modify the order of magnitude of the PoincarC return time, even if it is true that they can modify very much the evolution in a relatively short interval of time (a few “Liapunov times”) and thus contribute to the disappearance of correlations. The true answer is related to the chaotic motions. It is because a system is “sensitive to initial conditions” and because it depends on billions of parameters, while we measure only a few of them, essentially the statistical ones, that we ascertain an appearance of irreversibility and that the Poincard return time is very large, much larger than the age of Universe. We thus reach the physical irreversibility of our experiments in spite of reversible and conservative laws. Kotice that for non-chaotic evolutions, for instance for periodic or quasi-periodic evolutions, the deterministic previsions can be excellent even if the knowledge of initial conditions is weak. -4 solution of these types has a natural reversibility and remains in a very small part of phase space, a part much smaller than that corresponding to chaotic motions. The chaotic evolutions compensate their impossibility of long-term deterministic previsions by excellent long-term statistical predictions (notice the similarity with quantum mechanics). This ability is related to the chaos itself reintroducing randomness permanently: even if it is impossible to predict the future motion of a given molecule in the Brownian motion, we can model very accurately the statistical elements such as the temperature and the pressure. This strange result was reported with humour by Henri PoincarC:
“Vous me demandez de vous pre‘dire les phe‘nombnes qui vont se produire. Si, par malheur, j e connaissais les lois de ces phe‘nombnes, j e ne pourrais y arriver que par des calculs inextricables et j e devrais renoncer h vous ripondre; mais, comme j’ai la chance d e les ignorer, j e vais vous ripondre tout de suite. Et, ce qu ’a1 y a d e plus extraordinaire, c ’est que m a re‘ponse sera juste. ” ”You are asking me to predict the phenomena that are going to happen. If I was unlucky enough to know the exact laws of these phenomena my predictions
Are science and celestial mechanics deterministic?
87
would require tremendous computations and I would be unable to give you the answer; but fortunately I ignore the exact laws of these phenomena and thus I am going to give you the answer immediately . . . and , fantastically, my predictions will be correct!” (Poincard, 1 9 0 8 ~ ) . But how is it possible to reconcile the reversible laws of individual elements with the irreversible laws of averaged statistical elements? The reconciliation is in the difference between the average and the reality of these statistical elements. For systems with a large number of independent parameters this difference is usually extremely small and inappreciable but it can become large, after a “sufficiently long time”, for instance for the Poincark return to the neighbourhood of initial conditions. In most cases this Poincard return time is so long that it has no physical meaning. For instance, consider the example presented in Marchal (1995). Two identical vessels contain (a rather small) total of 10” identical molecules a t the same temperature, with initial pressures of 1.4 and 0.6 bar and with an exchange rate of 1015 molecules per second. We find, with the exception of the very small proportion of of initial conditions, the Poincard return time T to be
T = 10Rmillenia;
35,735,000,089,859,491 < R < 35,735,000,089,859,696.
This is of course a purely theoretical result! Thus the paradox of reversible laws associated with irreversible phenomena can be explained without “small hidden irreversibilities” , “perfect isolation” and/or “hidden correlations”. The main reasons of physical irreversibilities is the chaotic character and the very large number of parameters of irreversible systems. The Boltzmann hypothesis of “molecular chaos” is excellent and allows very accurate computations. The correlations will not increase slowly and insidiously after a very long time and we can almost claim that the return of Poincard occurs by chance and requires such a large delay, much larger than the age of Universe, that the corresponding decrease of entropy never appears in our experiments. If we meet so many phenomena with an increase of entropy, it is because disequilibrium is easy in our world: the smallest valley has a sunny side and a shady one . . . The fundamental reason is our existence in the middle of a giant stream of energy ( 1 . 7 3 ~ 1 0 ’ ~ watts) that arrives continuously from the Sun and escapes to the frozen space. At all scales of nature (quantum, microscopic, ordinary, global, astronomical, cosmological) the chaotic motions destabilize the individual elements (position and velocity of a particle) but stabilize the corresponding mean statistical elements (pressure, temperature) that become the basic elements of the larger scale. Phenomena are thus nested in one another up to the astronomical and cosmological scales that use the notion of “centre of mass of a celestial body”. The dynamics of motion a t one level is unaffected by the dynamics of internal levels. The corresponding time of divergence is a rapidly increasing function of the scale of the phenomenon of interest: extremely short a t quantum scale (in agreement with the statistic and probabilistic character of quantum mechanics); a few seconds or a few minutes for ordinary turbulent flows; about two weeks for meteorology; several millions of years for the planetary motions in our Solar System. Of course Poincark did not arrive a t indeterminism as a principle, in the way proposed later by Heisenberg for quantum mechanics; in 1910 these questions were not sufficiently
88
C Marchal
studied and understood. Nevertheless, in the last months of his life, he analysed the theory of quanta and recognised that the discontinuity of quanta was a necessity:
“Donc, quelle que soit la loa du rayonnement, si l’on suppose que le rayonnement total est fini, on sera conduit 6 une fonction w pre‘sentant des discontinuite‘s analogues a celles que donne l’hypothdse des quanta”. “Thus, for any law of radiation, if we assume that the total radiation is finite, we will be led to a function w with discontinuities similar to these given by the hypothesis of quanta” (PoincarC, 1 9 5 4 ~ ) .
5
“God doesn’t play dice”
In spite of Poincark’s philosophical and scientific investigations, in spite of the limits of science and the discredit of scientism, in spite of the ethical problems arising throughout the twentieth century. many conservatives remained stubborn supporters of the absolute determinism. Upset by the probabilistic character of quantum mechanics, Einstein claimed that “God doesn’t play dice” and, with two friends, he proposed in 1935 what is now known as the Einstein-Podolsky-Rosen paradox. The main idea is that quantum mechanics cannot be at the same time “complete”, i.e. with its statistical expression of reality without possible deterministic improvement, and, at the same time, “local1’,i.e. without the need of transmission of information a t large distances with velocities greater than that of light. Einstein, Podolsky and Rosen, for whom the velocity of light is an absolute limit and determinism a requirement, insisted that quantum mechanics must be improved. They suggested the possible existence of still unknown, or hidden, variables inside quantum particles: their different possible states would explain the different possible motions of particles from apparently the same initial conditions. On the contrary, for Niels Bohr and his supporters of the Copenhagen school, the probabilistic character of quantum mechanics is fundamental; their quantum theory is complete. They simply consider that their quantum theory is not local; this, for them, is not a major drawback. This controversy remained a philosophical one until 1964. Then J.S. Bell proposed an experiment in which the two opposite opinions lead to clearly different results. This difficult experiment has been realised by several teams with many results controversial until the beautiful tests of Alain Aspect over a metric scale in 1979. More recently (1997) experiments at a kilometric scale a t CERN have confirmed the same results: Niels Bohr is right and quantum physics cannot avoid an intrinsic randomness and a statistical character. However let us notice that Einstein is at least partially right: because of the random and statistic character of quantum mechanics the Bell experiment cannot be used for the transmission of information faster than the velocity of light . . .which is really an extraordinary conclusion!
Are science and celestial mechanics deterministic?
6
89
The second line of defense
“Of course, it is now obvious that quantum mechanics is intrinsically mixed with randomness and statistics. But let us be serious, these infinitesimal effects cannot affect the fundamentally deterministic character of ordinary Physics and above all of Astronomy”. Even today many scientists still continue to believe in the absolutely deterministic character of their own discipline. If you point out the phenomenon of the “butterfly effect” in meteorology, either they consider that this is something particular to that subject, in which much progress has yet to be made, or, worse, you discover that for them the butterfly effect is a pure fiction of the theoretician and nothing to do with reality. The mathematics are not ignored and most scientists know that in unstable phenomena (in mathematical terms: when one or several Liapunov coefficients are positive) there is “sensitivity to initial conditions” and “exponential divergence of neighbouring solutions”, but they consider that the gap between quantum mechanics and ordinary physics is so large that no divergence, exponential or not, can ever fill it. They also know that a diverging exponential is a very rapidly increasing function, but they have not realised just how fast it is. When you ask them to do the computation, which is easy, you get answers as: “so fast!, incredible!, I would never have believed that!”. It is then that they understand that the randomness of quantum mechanics invades rapidly all physics and how it is important to know if, in the conditions you are studying, the phenomenon of interest is either regular or chaotic. In the former case a deterministic analysis is the best, in the latter a statistical analysis is very useful. Fortunately, even in astronomy, many scientists have learned to deal with the new concepts as shown by several presentations of this meeting: “Close approaches of Earthcrossing asteroids: chaos and impact probabilities” by Andrea Milani; “(Statistical) evolution of galaxies due to self-excitation structure’’ by Martin Weinberg; “Detection of chaos in Hamiltonian-like dynamical systems: analysis of discrete times series from model and observations” by Alessandra Celetti and Claude FroeschlC; “Non-integrable galactic dynamics: limit of regularity and chaos” by David Merritt; etc.
7 The next step: freedom and free will The evolution of ideas leads now to a new major step: the scientific analysis of free will and freedom. Of course that subject has been analysed by philosophers since time immemorial. Are we really free? Is our impression of freedom a pure illusion? It is possible to classify the philosophers in terms of their answers to this essential question (Honderich, 1993), most of them remaining in doubt. Scientific analysis has led to a strange result: a scientific conclusion seems impossible and no known experiment has given unambiguous results. In face of this problem, and in spite of their scientific method and their huge scientific success, the scientists remain as powerless as the philosophers (Burns 1999). The present tendency is to consider that “freedom” and “free will” really do exist; indeed, with this hypothesis our world is much more understandable than with the opposite hypothesis, but neither hypothesis is provable. They must be considered postulates no
C Marchal
90
more provable than those of geometry or arithmetic. Nonetheless many would unhesitatingly postulate that “There is a source of freedom in each human being”.
For the philosopher Patricia Churchland in “The astonishing hypothesis” (Crick. 1994), the existence of so many chaotic motions with their corresponding butterfly effects is the real reason of the possibility, and the existence, of freedom: our free will has constantly a very large number of opportunities to act decisively for almost nothing . . . Along with the ethical problems of scientists, that stream of ideas has had an unexpected, but logical, consequence: a fantastic modification of the image of God. First we must understand how, in the past centuries, the common image of God was hard and severe, a horrific and repulsive God that counted our sins and took revenge. This was not unquestioned. For example Voltaire was so upset about people telling him that the sins of the people of Lisbon were the reason for the 40,000 deaths in the earthquake of 1755 that he wrote:
Lasbonne, qua n’est plus, eut-elle plus d e mces Que Londres, que Paras, plongis dans les de‘lices? Lisbon, that is no more, had it more vices Than London. than Paris. living in delight? Much later, in Paris as recently as 1897, the catastrophe of the fire of the “Bazar de la Charitk”(117 deaths, mostly women) raised up the same kind of rhetoric about the revenge of God . . . in total contradiction with the teaching of Christ. Today God is completely different from these images of the past. He is no longer Almighty: He has given to Man the marvellous but also terrible gift of Freedom and this limits His Power. God doesn’t correct the bad consequences of our sins for then we would not be free; instead He suffers from them. His interventions are in the light He brings to our consciences, as formerly Christ accepting arrest, condemnation, torture and execution in order to teach us concretely how much we can be unjust. This new image of God has spread surprisingly fast and it is now usual to hear, even among old people, comments as: “God is pure love. How is it possible that, for instance in Algeria, people kill in the name of God?” They have forgotten how God was, so few decades ago, . . . and how He remains in the minds of fanatics. And the scientific proofs of the existence or non-existence of God? It is certainly impossible to be dogmatic on this subject for to believe or not to believe is the primary freedom.
8
Conclusion
In less than one century, amidst tremendous scientific progress, the foundations of science have been upset. The classical and absolute determinism, so useful formerly, has been shown to be limited and all branches of physics and astronomy are today a mixture of determinism and randomness. Furthermore the ethical problems arising from misuse of science have completely modified the point of view of scientists on philosophical questions. Materialism is no more a “must” and freedom, will, free will, these essential pillars of human dignity. are no more considered as illusions. It is impressive to realise that all
Are science and celestial mechanics deterministic?
91
these fundamental transformations have their origin in the philosophical and scientific works of a great pioneer: Henri Poincar6.
"Le savant n'e'tudie pas la nature parce que c'est utile, il l'e'tudie parce qu'il et il y prend plaisir parce qu'elle est belle. Si la nature n'e'tait pas belle elle n e vaudrait la peine d'dtre connue, la vie n e vaudrait pas la peine d'dtre ve'cue". y prend plaisir,
"Scientists don't study nature because it is useful, they study it because they delight in it, and they delight in it because nature is beautiful. If nature was not beautiful it would not be worth studying, life would not worth living". (Henri Poincar6, Science et mdthode, 1908).
References Burns J E, 1999, Volition and Physical Laws, Journal of Consciousness Studies 6 (10) pp2747. Crick F, 1994,La scienza e l'anima, (Rizzoli editor, Milan). Appendice sul libero arbitrio p315. Honderich T, 1993, How free are you? The determinism problem. Oxford University Press. Laplace P S, 1814, Essai philosophique sur les probabilite's. Madame Veuve Courcier ed. Paris. Marchal C, 1955, Chaos as the true source of the irreversibilty of time. In From Newton to chaos, edited by Roy A E and Steves B A, Plenum Press, New-York, pp451-460. PoincarB H, 1908a,b,c. Science et me'thode, Edition Ernest Flammarion, Paris. (a) p68, (b) pp69,gl and (c) p66. PoincarB H, 1913, Derniires pense'es, edition Ernest Flammarion, Paris 1913, p245. PoincarB H, 1954a, Sur le problkme des trois corps et les Bquations de la dynamique. Divergence des series de M. Lindstedt. Oeuvres de Henri Poincare' 7 pp462-470, Gauthier-Villars ed. Paris. Originally published in Acta Matematica 13 ppl-270, 1890; PoincarB H, 1954b, RBflexions sur la thBorie cinktique des gaz. Oeuvres de Henri Poincare' 9 pp620-668, Gauthier-Villars ed. Paris. Originally published in Journal de Physique the'orique et applique'e, 4ibme se'rie 5 pp369-403, 1906, and also in Bulletin de la socie'te' francaise de Physique, pp150-184. 6 Juillet 1906. PoincarC H, 1954c. Sur la thBorie des quanta. Oeuvres de Henri Poincare' 9 p649, GauthierVillars ed. Paris. Poincark H, 1957a, Les me'thodes nouuelles de la Me'canique ce'leste, Volume 3. Dover Publications, New-York, p389. 1957. PoincarC H, 1957b, MBthode de M. Bohlin-Divergence des skries, in Les me'thodes nouuelles de la Me'canique ce'leste, Volume 2, Dover Publications Inc. New-York pp388-393. Poincark H, 1957c, Sur le problkme des trois corps et les Cquations de la dynamique, in Les me'thodes nouuelles de la Me'canique ce'leste, Volume 2, Dover Publications Inc. New-York pp140-157. Also in Oeuvres de Henri PoincarC 7, pp314-318, Gauthier-Villars ed. Paris, 1954, and also Acta Matematica 13, pp65-70, 28 Avril 1890.
93
Regularisation methods for the N-body problem Sverre J Aarset h Institute of Astronomy, University of Cambridge, UK
1
Introduction
This article discusses regularisation methods which have proved useful in direct N-body simulations of star cluster dynamics. First we consider the Kustaanheimo-Stiefel method which is used for studying perturbed binaries. The second part is concerned with the treatment of compact subsystems by several methods of multiple regularisation. Direct integration of self-gravitating stellar systems invariably poses many technical challenges. In this article we concentrate on methods for dealing with binaries and small subsystems. Such systems are characterised by short periods and successive close encounters which require special treatment in order to avoid the loss of efficiency as well as accuracy. The first lecture deals with a range of topics in two-body regularisation, starting from basic principles. This is a powerful technique for studying perturbed binaries which play an important role in star cluster simulations. We sketch a derivation of the classical Kustaanheimo-Stiefel (1965, hereafter KS) regularisation method. Some aspects relating to the study of self-gravitating systems will be discussed, with emphasis on practical implementations. The alternative Stumpff KS method will also be presented. Longlived hierarchies are a special feature and their treatment can be speeded up by adopting unperturbed motion for the inner binary. This approximation is justified by a stability criterion which has proved to be robust and reliable for surprisingly small period ratios and a wide range of outer masses and outer eccentricities. The second lecture describes methods for studying strong interactions between binaries and single stars or with other binaries. Such events often involve large energy changes and produce high-velocity escapers. The simplest case of relatively isolated subsystems are treated by unperturbed three-body and four-body regularisation. The concept of a chain of strongly interacting particles replaces these methods when external perturbers are significant. Finally, we discuss some practical features of N-body implementations which have proved useful during the past ten years. An earlier review of general integration methods which include regularisation has been presented elsewhere (Aarseth 1994).
94
2
Sverre J Aarseth
Two-body regularisation
Consider a binary with mass M = mk +ml and coordinates r k , rl. The equation of motion for the relative separation R = r k - ri then takes the form
where F k l is the external perturbation. We now introduce a fictitious time T by the differential relation d t = Rdr and apply the operator d l d t = (1/R) d l d r twice. This gives rise to the time-smoothed equation of motion
R’ M R” = -R’ - -R R R
+ R2Fkl.
This equation is better behaved numerically than (1) but is still undetermined as R However, it serves as an introduction to the removal of the two-body singularity.
+ 0.
The basic ideas of regularisation may be illustrated by a simple one-dimensional exercise. From the equation of motion
and the time transformation d t = x dr; the new equation of motion is
X
(4)
together with the definition x = x’/x then gives
x” = 2 h z + M . Although regular for z -+ 0, this displaced harmonic oscillator equation can be made even simpler by the coordinate transformation
z= U2.
(7)
Differentiating twice and substituting for z’= 2 . ~ 1 ~in’ the energy integral (5) converted to z’then leads to the final equation of motion U ‘I
=Z 1h u .
(8)
Thus the original non-linear equation of motion has been reduced to a harmonic oscillator which is a linear equation. It should be noted that the coordinate transformation halves the frequency of (6) which is a general property of the mapping (cf. Stiefel and Scheifele 1971). Since the physical time is readily obtained by integrating t’ = U’, it follows that the complete solution is regular as z -+ 0. It should be emphasised that the above regularisation is achieved by a transformation of both the time and the coordinate. Unfortunately, this simple formulat,ion is not possible in general because the two-body equation of motion has vectorial form.
Regularisation methods
2.1
95
Kustaanheimo-Stiefel formulation
In the following we shall derive the equations of motion for regularisation in 2D (cf. Bettis and Szebehely 1972). It is then straightforward to apply the KS transformation in 3D. The physical coordinates RI, Rz can be expressed in terms of the new coordinates u1, u2 by the complex plane mapping
RI = u : - u ~ R2 =
(9)
2 ~ 1 ~ 2 .
Using the Levi-Civita (1920) matrix
we write the coordinate transformation as
R = C(U)U. It is easily verified that R = U: matrix C has the properties
+ U;.
C*(U)C(U) C’(u) C(u)v U . uC(v)v
According to Stiefel and Scheifele (1971) the linear
= RI = ,c(u’) = C(v)u - 2u * v C ( u ) v
+ v .vC(u)u = 0 .
Differentiating (11) and using (12b) and (lac) yields
R‘ = ~ C ( U ) U ’ . From C‘(u) = C(u’) we have
RI’ = 2C(u)u” Substituting R, R’and R’ = 2u’ 2 u . uC(u)u”
U
+2 4 ~ ’ ) ~ ’ .
in the time-smoothed equation (2) results in
+ 2 u . uC(u’)u’ - 4 u . u’C(u)u’ +
(mk
+ ml)C(u)u = ( U . u ) ~ F ~ [
Simplification using (12d) leads to 2 u . uC(u)u” - 2u’. u’C(u)u We now multiply this equation by
+
{
(mk
,!?‘(U)
+ (mk + ml)C(u)u = ( U - u ) ~ F ~ ~ . and employ (12a) to obtain
+ mi) - 2u’ . U’ } U = 21U 2u. U
*
UC*(U)Fkl.
In order to proceed, we use the operator d / d t = ( l / R )d / d r and convert (13) to give the physical velocity ’ 2 R = -C(U)U’ R . (18)
Sverre J Aarseth
96
From the definition of the scalar product we then obtain ' T
.
4
R .R=-u'.u'. R
(19)
This enables us to introduce the binding energy per unit mass, h, as before and putting U ' U = R, the final equation of motion becomes U"
= ihu
+ ;RCT(u)Fkl.
(20)
From (19) the explicit expression for the binding energy is
h=
2u' . U' - (mk
+ ml)
R
The energy is not constant in the presence of perturbations. The rate of change is obtained from R . R which gives
Conversion according to h' = R' . Fkl and substitution for R leads to the differential equation h' = 2 ~ 'CT(u)Fkl. (23) 1
Since generalisation to 3D proved impossible, Kustaanheimo and Stiefel achieved their objective in going to 4D and adding a redundancy condition. Consequently, the 4 x 4 matrix takes the elegant form
C ( u )=
U:,
U1
-U4
-U3
U3
U4
U1
U2
Here the explicit components of R given by the transformation (11) become
RI = RZ =
U:-U;-U~+U: 2 ( ~ 1 -~~ 23 u q )
= 2(U1U3+~2~4) R4 = 0 . R3
It can be verified that summing the squares and taking the square root simplifies to
Initialisation of the KS solution proceeds as follows. If R1 > 0, we combine R1 and R which gives U: + U : = ;(RI R) (27)
+
Regularisation methods Applying the redundancy
214
97
= 0 then yields
u1 = [ $ ( R 1 + R ) I 1 ” U2
= $(Rz/ud
U3
= $(R3/u1).
In the alternative case RI < 0 we subtract R1 from R and obtain U; +U: =
$(R - R I ) ,
which together with the redundancy u3 = 0 results in
211
=
i ( R 2 / ~ )
214
=
$(R3/212).
To obtain the regularised velocity we invert (13) and use (12a), hence =
$L(u)R’/R,
or in the more convenient form U’
=
$L(u)R.
In summary, we have the ten differential equations of motion U”
=
$ h u + $RLTFkl
h’
=
2U‘.LTFkl
t’ =
U.U.
(33)
For practical work, the semi-major axis is useful and is given by
Likewise, the eccentricity can be obtained using R = U( 1 - e cos 6’) and Kepler’s equation
nt = 6’ - e sin 6’ which yields
2.2
Stumpff KS method
Although the standard KS method works very well for large perturbations, it is desirable to obtain more reliable long-term solutions without the use of stabilisation or rectification. In the following we describe briefly a recent formulation which achieves high accuracy without extra cost (Mikkola and Aarseth 1998). This idea is based on using truncated
Sverre J Aarseth
98
Taylor series, where additional correction terms represent the neglected higher orders and in fact exact solutions are obtained for unperturbed motion. The integration cycle begins with the usual perturber prediction of rj and vI to order Fj. Let us adopt the notation U for the standard KS coordinates. We predict U and U’ to order U(5)in the Taylor series using the modified Stumpff (1962) functions
n!C m
Cn(Z) =
k=O
(-2Ik
(72
+ 2k)! ,
(36)
1 with argument z = -2hAr’. Since the argument is small here, it suffices to take twelve terms in the expansion for convergence. Although the coefficients Cn(z) only deviate slightly from unity, re-evaluation after each step is needed at a small extra cost. Introducing R = - 21h for convenience. the equation of motion is given by U(’) = -RU
+;RPF
,
(37)
with F the external perturbation. Adopting the Hermite formulation (Makino 1991), we write the KS acceleration and its first derivative at the beginning of the step as
Here fi’) = i R Q with Q = LTF represents the perturbed force after the previous cycle. Note that the regularised derivative of the perturbation is obtained by F’ = RF. The two next Taylor series terms are constructed from the Hermite scheme. From the current value of R,predicted to fourth order, we form the perturbative derivatives a t the end of the step according to f(2)
=
(Ro- R ) U + ~ T Q
$3)
=
(Ro - R)U’ - R’U + ~
+~ T Q ‘ .
T ’ Q
(39)
The corrector derivatives fi4),fJ5) can now be constructed by the Hermite rule. This enables us to form the higher derivatives a t the beginning of the step as
which gives rise to the provisional solution. Likewise, from 0‘ = -U‘. Q and substituting from U(’), the second derivative of the energy becomes
Finally, an improved solution is obtained by one iteration starting from (39) without recalculating the external perturbation. The integration of the energy by R‘ = -U’ . Q remains as in standard KS. However, the treatment of the time also involves use of the Stumpff functions in the higher derivatives.
Regularisation methods
99
2.3 N-body interface The implementation of the KS method in N-body codes took place over 30 years ago (Aarseth 1972, Bettis and Szebehely 1972). The complete solution for the two-body motion is obtained by introducing the centre of mass. The corresponding acceleration is
where F, here refers to the external force per unit mass. For completeness, the global coordinates are given by
+
where p = mkml/(mk ml). and similarly for the velocities. In order to facilitate decision-making, we define the relative perturbation
Y=
IFk
- FilR2
(44) mk -k ml In the tidal approximation, perturbers of mass m3 are selected for distances T ~ ,< r y , with
is taken to be the boundary value a t apocentre for adopting unperwhere ?inin = turbed two-body motion. The regularised time-step is chosen by the modified frequency expression
which contains the dimensionless accuracy factor qu and an empirical reduction due to the external perturbation. The corresponding physical time-step is obtained by the Taylor series expansion 6 1 At = -tr'Ark (47) k ! k=l These time derivatives are evaluated using known quantities, with the second and third derivatives given by
t; = 2u'. U tl"' = 2u" . U + 2u' . U Conversely, a regularised subinterval corresponding to the physical interval bt may be obtained by iteration of (47) or from the inverse expansion
Sverre J Aarseth
100
In practice, it is sufficient to perform the inversion with i o = 1/R and the two additional terms
The division by R is not dangerous here because the c.m. approximation is used for large values of r / R . Integration of a regularised solution still introduces systematic errors. One device called energy stabilisation consists of adding a small correction term to the equation of motion during the prediction. Consequently, we write
where a is a constant which contains the inverse time-step and A is obtained explicitly from (21). Note that the multiplication by R makes the expression well behaved. Without this feature, the value of h as calculated by the transformed quantities begins to deviate from the integrated value. However, angular momentum is no longer conserved. An alternative scheme consists of rectifying the regularised coordinates and velocities to the correct energy value while maintaining angular momentum conservation. This is achieved by introducing the correction coefficients C1, C2 in the energy relation
h=
2uI2C,2- M U%:
From angular momentum conservation we have C1C2= 1, which yields the solution
c2=
{ -* [-+ M 4u12
hR
2Ul2
(M 4uI2
1'2}1'2
(53)
Here the positive root is chosen if M/4ut2 < 1. The corrected KS variables are then given by
ii = ClU iit = C2U' .
(54)
For completeness, we remark that the integration of the KS solutions is essentially performed by Taylor series, in analogy with the direct integration method. Here there is a choice of divided differences (Aarseth 1985) or the Hermite method (Makino 1991). In the latter case, this entails the prediction of both the perturber coordinates and velocities according to
Regularisation methods
101
where bti = t - t j represents the interval since the last force calculation and the particle index has been omitted from F. An attractive feature of the two-body description is that unperturbed motion may be adopted if the external effect is sufficiently small. Hence in the case of no perturbers inside a specified distance, e.g. 100a(l+e), the next physical time-step may be taken to be the Kepler period 2 7 r ( ~ ~ / M ) ’ In / ~ .fact, a careful examination of the perturber motions with respect to the associated centre of mass often allows a large number of unperturbed periods. To conclude this section, we may summarise some practical advantages of using the KS method as follows. Regular equations for small separations. The time-step is independent of eccentricity. Even circular orbits are more accurate. The perturbation falls off as l/r3. Unperturbed iwo-body motion for hard binaries. The c.m. approximation may be used. On the debit side should be mentioned the coordinate transformations which are needed in order to obtain the physical perturbation. Likewise, for other particles which are close to a KS pair. There is also an additional cost of obtaining the inverse time interval (50). However, the net gain in efficiency is substantial when compared to traditional direct integration, although it requires an effort to construct a scheme for an arbitrary number of KS pairs (cf. Aarseth 1985).
2.4
Hierarchical systems
Hard binaries in star clusters are usually quite stable and spend much of their time experiencing relatively small perturbations. Occasionally hard binaries combine with a single star or another binary to form long-lived hierarchical structures. It is well known that the inner semi-major axis tends to be constant in stable systems, and procedures for speeding up the calculation by adopting the c.m. approximation have been in place for a long time (cf. Aarseth 1985). This procedure can be justified by several numerical stability criteria (cf. Harrington 1972). A new approach, based on the binary-tides problem (Mardling 1995), has led to a semi-analytical stability criterion which applies to a wide range of outer mass ratios and arbitrary outer eccentricities (Mardling and Aarseth 1999). Here the boundary for the outer pericentre distance, R T t , is expressed in terms of the inner semi-major axis, ai,, by
Sverre J Aarseth
102
+
where qout = m3/( m 1 m2) is the outer mass ratio, eout is the corresponding eccentricity and C 21 2.8 is determined empirically. This criterion is valid for coplanar prograde orbits and ignores a weak dependence on the inner eccentricity and mass ratio. Since inclined systems tend to be more stable, we include a linear heuristic correction factor of up to 30% for retrograde motion. in qualitative agreement with early work (Harrington 1972) and recent unpublished experiments. The criterion (56) ensures stability against escape of the outermost body. However. exchange with one of the inner components also needs to be examined. For this purpose we employ the semi-analytical criterion (Zare 1977)
where J is the angular momentum and f(p), g ( p ) are algebraic functions of the masses. If J2E < (J2E),r,t(where E < 0). no exchange can occur between the outer body and one of the inner components. However, this condition only appears to be necessary and sufficient for small inclinations. In any case. the boundary for escape lies above the exchange limit for qout 5 5 and hence the exchange criterion is of less practical importance in star cluster simulations. Accordingly, if the outer orbit forms a hard binary and aout(l- eout)> I??'. a triple system is defined to be stable. subject to certain perturbation tests. The system is then treated as a KS solution in which the inner binary temporarily becomes a composite single body. Hence the KS period is now associated with the outer orbit and this leads to a significant gain in efficiency. This scheme may be generalised to situations where the outer component is another binary, as well as higher-order systems which are actually observed. Although somewhat complicated, the decision-making is well defined. There are several ways in which a hierarchical system may cease to be stable. The outer eccentricity may increase as a result of small perturbations until the stability condition is violated. Alternatively, strong perturbations may lead to exchange or ionisation of the outer component. On termination of the hierarchical structure, the system is restored to its original form, followed by standard initialisation, whereupon the integration can be continued
3
Compact subsystems
Strong interactions between binaries and single stars or other binaries are a characteristic feature of star cluster dynamics. This behaviour is particularly evident in the highdensity core. Consequently. the KS formulation based on dominant two-body motion leads to frequent switching and loss of accuracy as well as efficiency. The development of a three-body regularisation method (.i\arseth and Zare 1974) improved the treatment of compact triples. Here the basic idea is to introduce two coupled KS solutions with respect to the third body which plays a pivotal role. This is achieved by reducing the set of 18 differential equations to 12 by using the six integrals which define the local c.m. motion. In analogy with KS theory. the system of 12 equations is then expanded to 16. where each half system is governed by the well-known transformations.
103
Regularisation methods
Let us describe a triple system by the two distances RI, R2 of the mass-points m l , m2 relative to the reference body m3. This allows us to define standard KS coordinates for each two-body interaction ml, m3 and m2, m3. Using the notation Qk for each corresponding four-vector, we write
Q: = Rk,
(k=1,2)
(58)
Likewise, the generalised time transformation is taken to be
dt = RlR2 d r .
(59)
The actual derivation employs the well-known concept of generating functions (cf. Szebehely 1967). The crux consists of writing a separable generating function which leads to the regularised Hamiltonian in the extended phase space
r* = R l R z ( H - E o ) ,
(60)
where H is the Hamiltonian function and Eo is the initial value of the total energy. The explicit expression is given by
with Pk the regularised momenta and Ak twice the transpose of the generalised LeviCivita matrix. Moreover, P k 3 = mkma/(mkfT3)and 1 = [ ( k + l ) / k ] .The corresponding equations of motion take the standard form
These differential equations are regular for RI + 0 or R2
+ 0.
What makes this method work so well is that, after differentiation, the formally singular interaction terms containing m1m2are numerically smaller than the regular terms, provided we have that IR1 - Rz(> max ( R l ,R2). This condition is ensured by re-labelling the particles, followed by transformations to regularised variables. In fact, it is usually sufficient for the distance between ml and m2 to be the second smallest. This enables interactions of the fly-by type to be calculated without any switching if one of the binary components is taken to be the reference body. The effect of external perturbations may also be included (cf. Aarseth and Zare 1974). However, this has not been implemented yet and the method has therefore only been used for compact subsystems. The three-body formulation serves two purposes since it may also be used as a standalone method for studying isolated triples. In either case, the internal decision-making is extremely simple. However, when treating compact subsystems, it is necessary to introduce the associated c.m. body as a fictitious particle which must be advanced in a
Sverre J Aarseth
104
consistent manner. The calculation proceeds until one of the members escapes or moves out to a distance where the external perturbations can no longer be neglected. Here the maximum extent can be estimated by (45) using the total subsystem mass and the contribution mJ/r:Jdue to the dominant perturber. Following termination, a KS solution is initialised for the binary, whereas the escaper is treated as a single particle. A global formulation which includes all the N ( N - 1)/2 interactions was also developed a long time ago (Heggie 1974). Now the number of equations becomes larger but the case hT= 4 is still feasible and has been used extensively in earlier numerical work. Because of the complexity of the original derivation, an alternative version has been presented (Mikkola 1985). Numerical experiments show that the global method is only more accurate than three-body regularisation for extremely critical triple configurations which do not usually occur in practical calculations. If desired, external perturbations can again be included (Heggie 1974, 1988). However, in analogy with the three-body case, relatively compact quadruple systems have been treated in the unperturbed approximation. Finally, we remark that this method is well suited to studying binary-binary collisions which constitute an important feature of star cluster dynamics.
4
Chain regularisation
4.1 Basic formulation Whereas it took about 60 years to go from two to three dimensions in two-body regularisation, it only required 17 years to extend the three-body method to an arbitrary membership. This development is denoted chain regularisation (Mikkola and Aarseth 1990) because the interacting particles are connected along the vector giving the strongest neighbour force. Again each two-body vector is represented by a KS solution and all the other interactions are included as perturbations which are not assumed to be small. In the following we give a brief outline of the derivation. Consider a system of N particles with coordinates q, and momenta m,v, in the local c.m. system. Let us define relative momentum vectors by the recursive relation
with W1 = -pl and WNP1= pN. Substituting these momenta and the relative coordinates Rk = qk+l - qk into the Hamiltonian, we obtain
N-1
k=l
We now introduce the time transformation involving the inverse Lagrangian by 1 t’ = L’
Regularisation methods
105
which gives rise to the regularised Hamiltonian r*. Accordingly, the final equations of motion are again given by (62). Inspection of the differential equations shows that the solutions are regular for Rk + 0 with k = l , . . . , N - 1. Finally, using the traditional notation (cf. Mikkola and Aarseth 1993), the KS transformations take the form
Rk = L Q k
from which the global coordinates and velocities may readily be recovered.
4.2
Slow-down treatments
In order to be realistic, star cluster simulations need to include a wide distribution of primordial binary periods. Occasionally a binary with short period may become part of a compact subsystem and be treated by chain regularisation. Although typical subsystems are of small size compared to the interparticle separation, such binaries may be much smaller still. Consequently, integration of the relative motion becomes very expensive. However, it is possible to take advantage of small perturbations and exploit the concept of adiabatic invariance to speed up the calculation. The new idea (Mikkola and Aarseth 1996) is to slow down the dominant two-body motion such that one orbit represents several Kepler periods. This is achieved by scaling the small perturbation and corresponding physical time-step by a slowly varying factor, thereby neglecting any fluctuations on short time-scales. In other words, we only include the secular effects acting on the dominant binary. This may be illustrated by presenting the perturbed two-body equations in the form
i . =
(67)
K-lV.
Now the new period is n times the old one and the numerical integration is speeded up. Accordingly, we split the Hamiltonian for the chain subsystem into two separate parts by writing = K-'Hb ( H - Hb). (68) Here Hb represents the Hamiltonian for the weakly perturbed binary. The corresponding expression in the extended phase space (cf. Szebehely 1967) is then given by
+
rnew
= Hnew
-E ,
(69)
which can be multiplied by the time transformation (65) before performing the final differentiation (62). Since K is slowly varying, H,,, is not constant and the value of the total energy must be adjusted according to
In practice, n is changed by a small discrete amount after each step, with the actual value obtained from the local perturbation. In this way, binaries of arbitrarily small periods
Sverre J Aarseth
106
may be treated without expending an unduly large effort. Note that in defining the slowly varying perturbation, we evaluate the corresponding semi-major axis using regular quantities only. The same idea has also been implemented for the different KS formulations (cf. Mikkola and Aarseth 1996) and will be summarised here. Now we scale the perturbation factors F, F as well as the time-step (47) by K . Moreover, the strategy for determining K is more involved than in chain regularisation. In the first place, it is convenient to reevaluate the slow-down factor at the first point past apocentre, where the perturbation is usually largest. By restricting the choice of K to the hierarchical integer values 2"-' for different levels IC = 1.2,3, .... we limit the number of changes and thereby avoid frequent re-initialisations of the KS polynomials. After selecting the new perturbers, we estimate the time interval At for the perturbation to reach a specified small value yo (e.g. 5 x The relevant algorithms are based on the expression (45) as well as the relative velocities of nearby perturbers. Having decided a new discrete value of K , the orbit is integrated a small step back to the apocentre, whereupon the relevant KS polynomials can be initialised. Hence the slowdown procedure is performed over an integer number of orbits with the same value of K , since otherwise spurious fluctuations would have an effect. Note that, in principle, it is not necessary to adopt strictly unperturbed two-body motion since K may be chosen arbitrarily large. However, there are some technical advantages with the data structure in distinguishing the special case of zero perturber number.
4.3
Practical aspects
Implementation of the basic chain formulation into an IV-body code calls for a variety of algorithms to be developed. Some of the tasks required are similar to the treatment of unperturbed triple and quadruple systems described above. However, in addition to introducing external perturbations, the internal membership may change. Let us first discuss some aspects concerned with external perturbers. According to the original theory (cf. Mikkola and Aarseth 1993), the presence of perturbers gives rise to extra terms in the equations of motion (62). An equation of motion must also be introduced for the total energy of the the subsystem. in analogy with h' for KS. The perturbers are selected in a similar way, with the two-body separation in (45) replaced by the gravitational radius
Rgrav=
"3.
P oI
(71)
Hence particles with an effective tidal perturbation exceeding ymlnare selected initially and updated frequently. The regularised equations of motion are integrated by the powerful Bulirsch-Stoer (1966) method which evaluates the perturbing force as well as the regularised derivatives a large number of times per step. Although an expensive method, the overheads here are modest because of the relatively short duration of a typical interaction, with only a few hundred such events in a long calculation. Because of the non-linear time, special care is needed for the internal integration which must not exceed the c.m. The latter is advanced consistently together with the other particles. However. the c.m. force evaluation is more involved. First the total force is obtained as a single particle. This is followed by a differential correction in which each
Regularisation methods
107
perturber interaction is replaced by the proper force, analogous to the KS expression given by (42). Likewise, the c.m. force on perturbers is treated in a similar manner. A suitable chain configuration is identified by considering the impact parameter of an approaching single particle or another binary with respect to a binary. Such tests are carried out at each apocentre passage if the corresponding c.m. step is small. Several different outcomes are possible once a chain system has been initialised. The simplest situation is that one of the members (i.e. single particle or binary) escapes, after which any binary is initialised as a KS solution. Less common is the case of an approaching perturber being absorbed as an internal member. Hence we need to select natural configurations for the chain instead of treating a close neighbour as a perturber. In astrophysical simulations we also need to cater for actual collisions between stars of finite radii. Procedures are available for increasing the membership to six if necessary. It can be seen that the decisionmaking is complicated and a full description is beyond the scope of this article. Since the geometrical configuration of strongly interacting particles changes on a short time-scale. it is necessary to update the chain connecting the particles frequently. In fact, a check on the relative distance vectors is made after every step at very little extra cost. As in the unperturbed three-body case above, there is no strict requirement that the chain be constructed in an optimal way. In other words, the chain possesses a certain elasticity and yet remains effective.
A sequence of chain regularisation is normally terminated when one member escapes. Here the decision-making is based on the distances R k , such that a single escaper is characterised by the largest value. Likewise, an escaping binary is readily identified by the second distance at the beginning or end of the chain being largest. For this purpose, it is sufficient to make use of the approximate two-body relation
where d is the radial velocity with respect to the local c.m and M , is the subsystem mass. Hence the simple conditions d > 0, Ed > 0 together with d > 3R,,, ensures removal of the escaping object, whereupon the integration is continued if necessary. Another situation which calls for termination concerns the formation of stable hierarchies. The simplest case arises after a binary-binary collision in which one component escapes and leaves behind a stable hierarchy. For this purpose we employ the stability criteria discussed above. Likewise, more complicated configurations, such as quadruples or higher-order systems, may also occur and require special attention. In view of the expensive but accurate treatment of chain regularisation, it is essential to avoid the continuation of stable hierarchies which are better studied by the modified KS scheme. However, use of the chain regularisation facilitates the identification of hierarchies, and is therefore a useful tool for understanding dynamical processes as well as providing reliable solutions.
References Aarseth S J, 1972, in Gravitational N-Body Problem ed M Lecar, Reidel, Dordrecht, 373 Aarseth S J, 1985, in Multiple T i m e Scales ed J U Brackbill and B I Cohen, Academic Press, Orlando, 377
108
Sverre J Aarseth
Aarseth S J, 1994, in Lecture Notes in Physics ed G Contopoulos, N K Spirou and L Vlahos, Springer-Verlag, Berlin 433 277 Aarseth S J and Zare K, 1974, Celest Mech 10 185 Bettis D G and Szebehely V, 1972, in Gravitational N-Body Problem ed M Lecar, Reidel, Dordrecht, 388 Bulirsch R and Stoer J, N u m Math 8 1 Harrington R S, 1972, Celest Mech 6 322 Heggie D C, 1974, Celest Mech 10 217 Heggie D C, 1988, Long-Term Dynamical Behaviour of Natural and Artificial N-Body Systems ed A E Roy, Kluwer, Dordrecht, 329 Kustaanheimo P and Stiefel E, 1965, J Reine Angew Math 218 204 Levi-Civita T,1920, Acta Math 42 99 Makino J, 1991, Astrophys J369 200 Mardling R A, 1995, Astrophys J 450 722 Mardling R and Aarseth S, 1999, in The Dynamics of Small Bodies i n the Solar System ed B A Steves and A E Roy, Kluwer, Dordrecht, 385 Mikkola S, 1985, Mon Not R Astron SOC215 171 Mikkola S and Aarseth S J, 1990, Celest Mech Dyn Ast 47 375 Mikkola S and Aarseth S J, 1993, Celest Mech Dyn Ast 57 439 Mikkola S and Aarseth S J, 1996, Celest Mech Dyn Ast 64 197 Mikkola S and Aarseth S J, 1998, New Astron 3 309 Stiefel E L and Scheifele G, 1971, Linear and Regular Celestial Mechanics, Springer-Verlag, Berlin Stumpff K, 1962, Himmelsmechanik Band I, VEB Deutscher Verlag der Wissenschaften, Berlin Szebehely V, 1967, Theory of Orbits, Academic Press, New York Zare K, 1977, Celest Mech 16 35
109
Escape in Hill’s problem Douglas C Heggie University of Edinburgh, CK
1
Introduction and motivation
In the 19th century the American mathematician G W Hill devised a simple and useful approximation for the motion of the moon around the earth with perturbations by the sun. To most dynamical astronomers “Hill’s Problem” still means a model for motions in the solar system in which two nearby bodies move in nearly circular orbits about another much larger body at a great distance. These lectures have, however, been motivated by a problem in stellar dynamics. Consider a star in a star cluster which is itself in orbit about a galaxy (Figure 1). The star, cluster and galaxy take the place of the moon, earth and sun, respectively. The potentials of the cluster and galaxy are not those of a point mass, and the galactic orbits of the star and cluster may be far from circular. Nevertheless Hill’s problem is a good starting point, and it can be modified easily to accommodate the differences. In Section 2 we outline a derivation of Hill’s equations, and in Section 3 we summarise the appropriate extensions.
Figure 1. The cluster is treated as a point mass M , in uniform circular motion of angular velocity w at a distance R from a point-mass galaxy M g .
Douglas C Heggie
110
Tidal Models I 1 ' 1 ' 1
l l
JOOOO m
" C 3000
D
300
100 N
Figure 2. Results of numerical experiments (Aarseth and Heggie, unpublished) on the escape of stars from star clusters. The time for half the stars to escape is plotted against the original number N of stars in the simulation. Points are averages ouer several simulations at each N , except the largest value. The continuous line shows the prediction of theory, i.e. proportional to the relaxation time (see text), and the dashed line is an empirical fit. Stars gradually escape from star clusters. This has been expected on theoretical grounds for many years, ever since a paper by Ambartsumian (1938). Recently, deep observations have confirmed this (e.g. Leon et a1 2000), by revealing faint streams of stars around a number of the globular clusters of our Galaxy. Loosely speaking we can say that a star can only escape if its energy exceeds some critical energy. The energies of stars change slightly as a result of two-body gravitational encounters within clusters, though the time scale on which this happens. (the relaxation time scale) is very long, of order logyr. But the orbital motions of stars within clusters have much smaller time scales of order 106yr, and until recently it was 'thought that escaping stars would leave on a similar time scale. With this assumption, relaxation is the bottleneck, and so the escape time scale (e.g. the time taken for half the stars to escape) should vary with the relaxation time. Nowadays it is possible to simulate the evolution of modest-sized star clusters with 3 x lo4 or more members, and the predicted escape time scale can be checked empirically. Unfortunately the results contradict the theory (Figure 2). As these simulations require considerable extrapolation in particle number N to be applicable to real clusters (for which N lo6) the error of the theory is serious. N
It turns out that the assumption of rapid escape is the main source of error (Fukushige & Heggie 2000, Baumgardt 2000a,b). In fact some stars above the escape energy never escape (unless some other dynamical process comes into play), and others take much
Escape in Hill’s Problem
111
longer to escape than had been generally thought. With this motivation, the remaining sections of these lectures are devoted to the dynamics of escape. Section 4 analyses the very definition of escape, which is not as straightforward as in more familiar situations. The last two sections show some ways in which the computation of the escape rate can be approached. The main result of Section 5 concerns the way in which the time scale of escape depends on the energy, and outlines how this resolves the problem of Figure 2. Much more difficult, from a theoretical point of view, is determining the distribution of escape times, and some relevant ideas are introduced in Section 6.
2 2.1
Equations of motion Derivation
We now outline a derivation of the equations of Hill’s problem in the stellar dynamics context. To simplify matters as much as possible, however, we treat the cluster and galaxy as point masses M, and Mg >> M, (Figure l),and consider motion of a massless star in the same plane of motion. If x,y are the coordinates of the star in a rotating frame centred at the cluster centre, w[R Therefore the Lagrangian for its velocity relative to the galaxy is ( k - w y , its motion is
+
+ XI).
1 GM, GM, L = -2{ ( x - ~ y + (6 ) +~w(R + x ) ) ~+} -+ -, RI r
where r 2 = z2+ y 2 and R’2 = ( R + z ) *+ y 2 . (Note here that we are neglecting the motion of the galaxy, which will not affect the final approximate set of equations of motion.) For reasons that will become clear later we switch to a canonical formulation. The momenta conjugate to x and y are p,
=Lx = x - w y
Py
= LG = j ,
+ w(R + x ) ,
and the Hamiltonian is
3-1
= xp,
+ypv - c
The next step is common to applications in the solar system and stellar dynamics but has a different name. In applicat,ions to the earth-moon-sun problem it is referred to as “neglect of the parallax”, while in stellar dynamics it is always called a “tidal approximation”. (Even that phrase betrays how much the subject of stellar dynamics owes to the celestial mechanics of the earth-moon-sun system!) We suppose r << R and approximate 1
R’
-
R
Douglas C Heggie
112
We drop constant terms, substitute w2 = GMg/R3from the equations of circular motion (again assuming Mc << Mg), and replace p, -+ p , w R . (If the other variables are not changed this transformation is canonical.) We get
+
1
3t = $(Pi + p i )
+ w(yp,
1 2
- -w2(2x2
- xp,)
- y2) -
GMc 7.
Next we write down Hamilton’s equations
x
p , = -,Etx3
= g,,,
and on eliminating p , and p,, we get 2 - 2wy - 3W2X = --GMC X r3 y+2wx
=--y, GMc
r3 which differ from Hill’s equations only in notation, and then only slightly.
2.2
A generalised leapfrog
The leapfrog is a favourite integration algorithm for equations of motion in stellar dynamics. It is identical to the Verlet algorithm of molecular dynamics. For a one-dimensional problem with Hamiltonian p 2 / 2 V(x), for example, it m a y be written
+
xn+1 = x n + h P n (4) Pn+l = Pn - hV’(xn+l), (5) where h is the time step. Note that the new coordinate is used immediately, which is where the algorithm differs from an Euler algorithm. The effect is dramatic, as the long-term behaviour of the leapfrog is m u c h better.
One of the nice properties of the leapfrog is that it is symplectic, like a good Hamiltonian flow. Here we show how to construct a similar algorithm for the Hamiltonian of Hill’s problem. Euler’s algorithm would be , xn+1 = x n + h N p ( x n , P n ) , Pn+l = P n - h Z x ( x n 3 ~ n ) i where we have written x = ( s l y ) and p = (px,py). We can make this symplectic by replacing pn by pn+l in the arguments of the derivatives of X ,because it then takes the form xn+1 = F p ( X n , P n + l ) ’ P n F x ( x n , Pn+l)r where the g e n e r a t i n g f ~ ~ ~ t Fi o=nX n . P n + l h X ( X n , P n + l ) . Writing out these equations explicitly for the Hamiltonian of Hill’s problem, we obtain the algorithm
+
= Yn+l =
xn+1
Px,n
=
+ h ( p x , n + l + WYn) Yn + h(py,n+l- wxn)
(6) (7)
xn
px,n+l+
h
-WPy,n+l
- 2w2xn
+
Escape in Hill’s Problem
113
These equations look horribly implicit, a common difficulty with elementary derivations of symplectic methods, but in fact Equations (8) and (9) are easily solved explicitly for pn+l and then Equations (6) and (7) give xn+l.
2.3
Elementary properties
1. The Hamiltonian 31 is time-independent, and so its value is conserved. Rewriting the momenta in terms of the velocity components one finds that this value is - __ GMC , which is often referred to as the “energy”. Again E = -(i2 1 + c2) - -u2x2 3 2 2 r there is another name in the celestial mechanics community, who refer to the “Jacobi constant” C = -2E. In stellar dynamics this term is often applied to E . At any rate, one implication is that the motion is bounded to the region in which 3 252 - GMc 5 E . The boundaries of these regions are called Hill’s curves 2 r (Figure 3 ) . --(&I
It is sometimes tempting to refer to the expression for E as the Hamiltonian, and indeed the right-hand side has the same value as 31. It is, however, impossible to recover the equations of motion from the expression for E . 2. Hill’s equations have two equilibrium solutions, at ( x , y ) = k ( r t , O ) , where rf = G M c / ( 3 w 2 ) In . stellar dynamics rt is called the tidal radius or Jacobi radius, and in all subjects these points are referred to as the Lagrange points L1 and L2.
3. Hill’s equations have an obvious symmetry: if ( x ( t )y(t)) , is a solution, then so is ( ~ ‘ ( ty )’ (,t ) ) = ( x ( - t ) , -y(-t)). This is quite useful for studying asymptotic orbits. For example, if an orbit tends to L1 as t + 00, then the orbit obtained by this symmetry tends to L1 as t + - W . Also, if i ( 0 ) = y(0) = 0, the two orbits are the same, as they satisfy the same initial conditions. This helps to explain the amount of attention that has been paid in the literature to such orbits.
Douglas C Heggie
114
3
Variants of the problem 1. It is not necessary that one of the bodies is massless. Hill’s equations are also applicable to the relative motion of the moon and earth, under solar perturbation. as in Hill’s original research. .4 relatively accessible account of this research is Plummer (1918). A modern application is binary asteroids (e.g. Chauvineau 8L Mignard 1990). 2. It is not necessary that the two small bodies are bound. Another application is to near-conjunctions of coorbitals (e.g. Murray & Dermott 1999).
3. When the smallest body is treated as massless, as in the star cluster application, it is not necessary that the other bodies are treated as point masses. For a spherically symmetric galaxy potential and an arbitrary cluster potential q5c the threedimensional equations of motion are
x
-.
2wy
+ ( 2- 4 w 7 x
=
a4c
-.-
ax
where K. is the epicyclic frequency (Chandrasekhar 1942, Binney & Tremaine 1987) and the plane of motion of the cluster is the x , y plane. For a point-mass galaxy K. = w and the previous equations are recovered (when 9c = -GM,/r.) Very often the cluster potential $c would be chosen to be that of a King model (cf. Binney & Tremaine 1987). Qualitatively the most important difference from the point-mass potential is that the depth of the potential well is finite. 4. Returning to the point mass case, we now consider the situation in which the motion of the cluster is elliptic, with eccentricity e. There is now a formulation using the same coordinates as in Figure 1 but scaled by R (so-called rotating, pulsating coordinates P,Q).For coplanar motion the equations are $1
- 25’
= 1
+
1 (35 ecosf
2;)
where the prime denotes differentiation with respect to f , the true anomaly of the cluster orbit, and = P2 Q2. These equations can be easily derived from the corresponding formulation of the elliptic restricted problem (Szebehely 1967). An important difference from the circular case is that the Hamiltonian is no longer autonomous, and there is no Jacobi integral.
+
5 . One can equally well treat the previous problem in rotating, non-pulsating coordinates with origin at the centre of the cluster. For coplanar motion, a point-mass
115
Escape in Hill’s Problem galaxy and an arbitrary cluster potential, the equations are
x - 2wy -
R ($2- 2 4 x + 2w--y R
.
y+2wx--y
R R
ax
%Jc
= --
84
=-2
dY ’
but the corresponding three-dimensional equations can easily be derived for any spherical galaxy potential (Oh, Lin & Aarseth 1992). Here, of course, w is not constant in general. 6. For a still more general galaxy potential q ! ~ it ~ is simplest to use non-rotating, nonpulsating coordinates, 2.e. a coordinate frame with origin a t the cluster centre but with axes parallel to fixed directions in space. Then the equation of motion takes the simple vector form r = - V & - r.VV4,.
Though this may well be the most useful formulation for non-circular cluster motion, and certainly when the potential is not even spherical, one can’t help feeling that something is lost in this simplicity. For example, in the case of a point mass galaxy the equation of motion is h
l
r = -V& - w2(r- 3 ( r . R ) R ) ,
(10)
where is the unit vector from the galaxy to the cluster. Now the corresponding Hamiltonian is time-dependent, and it is no longer obvious that any integral exists. But the Jacobi integral is still conserved, taking the form
-
1 1’22 3 E = -r2 - w.r x i. + -w r - -w2(r.R)2 2
2
2
+ &.
This is an integral of Equation ( l o ) , but not quite an obvious one.
4
Escape criteria
4.1 Escapers An escaping star eventually travels far from the cluster, and the cluster potential is negligible. If the right sides of Equations (2) and (3) are neglected we have the approximate solution 2
=
x + acos(t + 4)
3 y = Yo - - X t 2
- 2asin(t + 4),
(11)
(12)
where X, Yo, a and 4 are constants, and we have scaled t so that w = 1. Typical orbits are shown in Figure 4, even in places where the cluster potential would not be negligible. Notice that stars like to revolve or spiral anticlockwise, while the axes are such (Figure 1) t,hat the galaxy is far away a t the bottom and the direction of motion of the cluster
Douglas C Heggie
116
-3 -
Figure 4. Orbits in Hill’s problem when the cluster potential is neglected. Note that the axes are orientated unconventionally. is to the right. Thus the stellar motions are retrograde. This is typical of epicycles, as these motions are often called. Two orbits drawn in Figure 4 are centred at the location of the cluster. There is a family of such orbits, for varying a. When the cluster potential is restored this family becomes a family of stable retrograde satellites of the cluster. Its existence has been known for a long time (Jackson 1913, Hdnon 1969, Benest 1971, Markellos 2000). (In the solar system context these are sometimes referred to as eccentric retrograde satellites, but the reference to the heliocentric eccentricity is not illuminating in the stellar dynamical context.) This family, referred to as Family f by H h o n , ranges from tiny, almost Keplerian orbits around the origin to huge orbits far beyond the tidal radius. As a star cluster loses mass by the escape of stars, it is conceivable that some stars in retrograde orbits are left on such orbits well outside the tidal radius, and it would be interesting to look for these in N-body simulations. Now consider the orbit in Figure 4 passing through the origin. Again such orbits of arbitrary size exist (Ross et a1 1997). Though severely distorted by the cluster potential near the origin, they show that stars can escape, recede to arbitrary distance, and then return to the cluster again. Thus distance by itself is no guarantee that escape is permanent. Rigorous escape criteria can be derived, but, to be frank, in practical terms it is quite enough to assume that stars that recede to a few times rt will escape; the fraction of such stars that do not escape is tiny.
4.2
Non-escapers
It is easy to obtain a rigorous criterion for nowescape, using the simple idea behind 3 GM, , and any star J has energy E, = Figure 3. A particle at rest at the L ~ points 2 rt with E 5 E,, and lying within the Hill curve passing through the Lagrangian points, can never escape. What now if E > E,? We already know one set of orbits on which a star can remain inside the cluster forever, even with energy above the escape energy: these are the stable
Escape in Hill's Problem
117
retrograde satellites (which move outside the tidal radius only for energies considerably above E,). Being stable, these orbits are surrounded by a region of phase space with the same properties. This is illustrated by the surfaces of section in Figure 5. Closed invariant curves predominate on the left side of the diagrams, which corresponds to retrograde motions. At the centre of this nested set of curves is a fixed point corresponding to the retrograde periodic orbit. Though these diagrams are plotted for energy E = E,, similar sets of invariant curves are obtained at somewhat higher energies in the standard Hill problem (Chauvineau & Mignard 1991, Sim6 & Stuchi 2000). They correspond to retrograde motions of stars permanently remaining inside the cluster and with energies above the energy of escape. The chaotic scattering of points on the right-hand half of the diagram would, however, generally correspond to escaping orbits for E > E,.
-2
2
0 X
6 --
-
41 7 -
c
i
W \
x
-c
0; 2 t 1.. L i
-6
~
'
-.5
' , I 1
1
l i t / !
0
'
'
1
l 4
.5
X
Figure 5. Surface of section in the two-dimensional Hill's problem at the escape energy E = E,. A point is generated on the surface each time an orbit crosses the line y = 0 with y > 0. The edges of the diagram are limiting curves corresponding to the condition y = 0. Upper diagram: the potential is that of a model star cluster called a King model (from Fukushige f3 Heggie 2000). Lower diagram: point-mass potential. It is just possible that such orbits have an astrophysical relevance. In two star clusters (Gunn & Griffin 1979, Meylan et a1 1991) there are stars whose radial velocity alone appears to exceed the escape velocity. Perhaps these are indeed stars permanently bound within the cluster at energies above the escape energy.
Douglas C Heggie
118
Escape rate
5 5.1
Motion near the Lagrangian points
Before attempting to determine the rate a t which stars escape, we study orbits in Hill’s problem a little longer. It is clear from the structure of Hill’s curves (Figure 3) that, at energies just above the energy of escape, an escaper must make its way at relatively low speed through one of the gaps in the Hill curves near L1 and L2. Therefore it pays to study motions near these points, which can be done by linearisation of the equations of motion. In the vicinity of (2.y) = (rt.O); when w = GM, = 1, Equations (2) and (3) take the approximate form
where z = rt
+ { and y = 77.
These have the general solution
where A, B, C and 8 are arbitrary constants. On this solution the “energyi! is
E = E,+C2(10d?+ 49)
+ AB(196 - 40d?).
Several cases have interesting properties: 1. A = B = C = 0: this is the Lagrange point, where E = E,
2. B = C = 0: this solution approaches L1 as t -+ invariant manifold of L1, and E = E,.
3. A = C = 0: this solution approaches L1 as t invariant manifold of L1, and E = E,.
-W.
(E 3GM,. in general). 2 rt
It is part of the local unstable
+ m.
It is part of the local stable
4. A = B = 0: the solution is periodic, and E > E,. Though derived in a linear approximation, there is indeed a family of periodic solutions of the full Hill problem, parametrised by E (Liapounov’sTheorem, cf. Moser 1968). They are named Family a and c (one for each Lagrangian point) in HCnon (1969).
5 . .4 = 0: part of the local stable invariant manifold of the Liapounov orbit.
6. B = 0: part of the local unstable invariant manifold of the Liapounov orbit.
5.2
The flux of escapers
Stars escaping from the interior of the star cluster have A as t -+ hx;thus C2(10fi 49) < E - E,. For fixed E
+
< 0 and B < 0, so that + *m > E,, then, this is the situation
Escape in Hill’sProblem
119
.2
0 h
-.2
-.4
2.6
2.8
3
3.2
X
Figure 6. Orbits in Hill’s problem around one of the Lagrangian points, at a fixed energy E just above E,, after Fukushige & Heggie (2000). The potential of the cluster is that of a King model. Several orbits are shown which approach a periodic orbit asymptotically. Other similar diagrams (for the point mass potential of the usual Hill problem) are given in Marchal (1990) and Simd & Stuchi (2000). for stars “inside” the tube formed by the stable invariant manifold of the Lagrange point (Figure 6). It is quite easy to estimate the rate at which the phase space occupied by these escapers flows out of the cluster. The general theory is given by MacKay (1990), though some trivial generalisation is needed because of the Coriolis forces in Hill’s problem. The rate of flow of phase space (per unit energy EO)is
where the &function singles out values of the phase-space variables ing to the required energy. This is readily transformed to
in the notation of Section 5.1. In fact E =
5 , y , p,,
p y correspond-
1 . 1 9 3 -<’ + -rj2 - -<’ + -q2 +E,, and so 2 2 2 2
This is a two-dimensional result (i.e. for the coplanar problem). In the three-dimensional problem it is found that .F 0: (EO- E,)’, with an equally simple coefficient. In each case, however, the flux of escaping phase space must be doubled, as there are two Lagrangian points.
Douglas C Heggie
120
In order to turn the flux into a time scale for escape, it is only necessary to estimate the volume of phase space inside the cluster at energy E . In turn this is given by an integral of the form V = J,,,, b(E - Eo)dzdydp,dp, in two dimensions. This does not change much with Eo in the vicinity of the critical energy, and there it is easily seen to be 27~times the area inside the last closed Hill curve (Figure 3). It follows that the time for escape varies as ( E - E,)-' in the three-dimensional problem, though this concerns the dimensionless case in which w = 1. When dimensional factors are reinserted it turns out that the result is a time scale proportional to E," -.1 This is a central result of these lectures. ( E - E,)2w
5.3
Numerical methods
It is not hard to obtain the rate of escape numerically. One possible procedure is the following. 1. Choose some E
> E,
2. Select initial conditions a t energy E inside the sphere distribution (cf. Fukushige & Heggie 2000).
T
= rtr according to some
3. Determine the escape time t,, defined to be the first time when problem mentioned in Section 4.1).
T
> rt (pace' the
4. Repeat 2-3 many times. 5. Compute P ( t ) ,defined to be the fraction of cases with t , > t . The third item in this procedure requires choice of a numerical integration scheme. Many are available, but it is worth mentioning here one of the favourites in this subject, which is a fourth-order Hermite scheme (cf. Makino & Aarseth 1992). It is a self-starting scheme, and we illustrate it for the one-dimensional equation of motion x = a(.). Suppose position and velocity are known a t the beginning of a time step of length At, and have values ZO, W O , respectively. From the equation of motion compute the initial acceleration and its initial rate of change, i.e. a. and uo, respectively. Compute the predicted position and velocity a t the end of the time step by xP =
ZO
+ wont + ao-At2 + bo-At3 2 6
Now compute the acceleration and its derivative at the end of the time step, using x p and wp. If the results are denoted by al and b l , respectively, the values of z and v a t the end of the time step are recomputed by x1 = zo 2'1
=
2'0
+ At2 + -(U0
2'1)
-
At2 -((a1
12
At At2 . + -(a0 + a1) 2 12 -((a1
- ao) - bo).
Escape in Hill's Problem
121
Now we return to the numerical problem of determining the escape rate. A typical set of results is shown in Figure 7. Curves at larger t correspond to smaller values of E - E,. It can be seen that these have a horizontal asymptote well above the t-axis; in other words, there is a substantial fraction of stars for which the escape time is extremely long. This is not unexpected, because of the stable retrograde motions shown in Figure 5 . The fraction of such stars decreases as E increases. We also see, as expected from Section 5.2, that the escape times decrease as E increases. Indeed, if we redefine P ( t ) to be the fraction of escapers with escape times t, > t (i.e. we exclude stars which never escape), and if we rescale the values o f t by ( E - E,)2,the resulting curves lie very nearly together, independent of energy (Fukushige & Heggie 2000).
a
Figure 7. Distribution of escape times from a generalised Hill's problem, for various values of the energy, after Fukushige & Heggie (2000).
5.4
Relaxation and escape
We now show how the escape rate which we have just determined leads to a resolution of the problems with the scaling of N-body simulations, with which these lectures were motivated (Figure 2). The ideas are based on those given in Baumgardt (2000b). We imagine that stars are present in a cluster with a distribution of energies n ( E ) . This distribution evolves as a result of two processes: (i) relaxation, which is a kind of diffusion process with a characteristic time scale t,; and (ii) escape, which takes place on
YE,>
E ', where P is the orbital period of the cluster round the a time scale of order P ( galaxy.
As a very simple model for this problem we shall consider the toy model defined by the differential equation
where n ( E ) d E is the number of stars with energies in the range ( E ,E Heaviside (unit step) function H confines escape to energies above E,.
+ d E ) , and the
Douglas C Heggie
122
There are several details missing from this problem. First, in addition to the diffusive term (i.e. the first term on the right side) one needs a “drift” term corresponding to dynamical friction (cf. Spitzer 1987, for this and other issues in what follows). We have also neglected the fact that the coefficient of the diffusion term depends on E and n ( E ) in a complicated way. Next, one needs to take into account the effect on the energies of the stars of the slowly changing gravitational potential of the cluster. Finally, we need to take into account the stars above the escape energy whose escape time scale appears to be infinite. If all those factors were included, we would obtain something close to a full Fokker-Planck equation for the evolution of the distribution function in the presence of energy-dependent escape. We shall see, however, that this toy model is quite illuminating. Let us now scale t by t, and let x = ( E - Ec)/lEcl.Then the escape boundary occurs a t x = 0, and the equation transforms to dn at
--
d2n 8x2
__
aH(x)x2n,
where a = t,/P. Now in astrophysical applications P varies with the crossing time scale in a star cluster, and so CY varies nearly as N / log N , where N is the total number of stars (cf. Spitzer 1987). Therefore CY can be used nearly as a proxy for N . In order to estimate an escape rate we adopt the strategy pioneered by Chandrasekhar in this context (Chandrasekhar 1943), which is to look for a separable solution n(x, t ) = exp(-Xt)y(x), where we expect X > 0. If we impose a no-flux boundary condition a t x = -1 (say) and the condition that y(x) + 0 as x + CO, then we find that
where A , B are constants. While the first of these solutions is elementary, the second deserves some explanation. As Maple shows, the solution of the differential equation for y(x) when x > 0 can be written in terms of Whittaker functions, and a search through Abramowitz & Stegun (1965) shows that these can be expressed as integrals. It is easier, however, to proceed directly, though the appropriate methods are not in common use (cf. Burkill 1962). In this particular case, for a reason that will become clear, we first change the independent variable to z = x2/2. Then the differential equation becomes 2zji
+ $ + (A - 2az)y = 0,
(19)
where a dot denotes a z-derivative. Now we get down to business. Motivated by the inversion integral for Laplace transforms, we seek a solution in the form
where both the function f and the contour C are to be chosen. Substituting into Equation (19), we find that we require
Escape in Hill’s Problem
123
No non-trivial choice o f f will make the integrand vanish. We can, however, integrate by parts to remove the z-dependent part of the last factor of the integrand. It follows that we require
where the first term is the end-point contribution. Now the integral can be made to vanish by making the integrand vanish, which in turn requires the solution of a separable first-order differential equation. (Without the precaution of changing from z to z, this would have been a second-order equation.) This gives the integrand in Equation (18b). To make the end-point contribution vanish, we require a function y(z) vanishing as z + m, and the exponential factor in the integrand has this behaviour if we restrict the contour to R e s < 0. One obvious choice for end-point is s = -m. For the other we choose the negative root of f ( s ) ( s 2- a ) , i.e. s = -&, which works if X < 4&. In fact the more stringent condition is the integrability of f(s) a t this point, which requires X < &. Now we must match y and y’ a t z = 0. Evaluating the integral a t z = 0 is straightforward, and the transformation s = -&(l 2t) gives a standard integral for a beta function. In order to evaluate the derivative y’(O+) one cannot simply differentiate the integral and substitute z = 0. For one thing the resulting integral diverges as s + -ca. This behaviour is killed by the exponential if 2 is small and positive, and in this case one can approximate the other factors in the integrand by their asymptotic form as s + -m. Again one obtains a standard integral, this time for a gamma function.
+
In the end one finds that the relation between X and a is
E rr (( - L + L ) ‘ -I>+:)
tan&=
4 6
4
As A/& + 1-, the gamma function in the denominator tends to infinity, and so fi -+ 0. Thus there is an asymptotic regime such that CY + 0 and X N &.If, on the other hand, A/& + O+, it is clear that the right hand side of Equation (20) tends t o infinity, and so X + 7r2 /4. Numerical study shows that there is a single solution which joins these two asymptotic regimes. In the second asymptotic regime ( a + 03, i.e. large N ) , escape is very efficient, and the time scale for loss of mass, 1/X, is determined by relaxation. Recalling that we have scaled time by the relaxation time, it follows that the time t o lose half the mass, say, varies as t,. In case a is small, however, escape is the bottleneck, and the escape timescale, in units of the relaxation time, increases as N (or a ) decreases. In fact in this regime we expect the half mass time to vary nearly as t , / f l . Since tr itself varies nearly as N (in the units of Figure 2), we expect a time scale varying as tfl2. These results correspond qualitatively to what is observed (Figure 2). It should be stressed, however, that the value of this toy model is purely qualitative. When one studies simulations of the evolution of star clusters quantitatively (Baumgardt 2000b, or those in Figure 2) it is found that, in the case of small N , the actual scaling is more like t,3/4. We now outline Baumgardt’s argument which leads to this scaling. We assume that the distribution of escapers (i.e. those with E > E,) is nearly in equilibrium. Then
Douglas C Heggie
124
Equation (17) shows that the width of the distribution is x cy-li4. (This scaling can also be seen in Equation (18b).) The number of such escapers is proportional to this width, and can be estimated to be of order N c x - ’ / ~The . escape time scale at this energy is of order 1/(cys2), and therefore the rate of escape is of order N c Y ’ / ~Thus . the time scale for losing (say) half the mass is of order cy-li4 in units of the relaxation time, i.e. the time scale of mass loss varies almost as t;i4. It is the assumption that the distribution of escapers reaches equilibrium which distinguishes this estimate from the toy model discussed previously, but the reason for this difference is not understood. N
N
Distribution of escape times
6
The results of the previous section relate to the time scale on which stars escape, and we conclude with some discussion of the actual distribution of escape times. This issue has been studied in a fairly wide variety of problems (e.g. those discussed in the book by Wiggins 1992, and Kandrup et a1 1999). In some problems the distribution is found to be approximately exponential, and in others it is better approximated by a power law. For escape in Hill’s problem, the numerical experiments summarised in Section 5.3 show that the distribution is approximately a power law, over the range of escape times that are relevant in applications and amenable to numerical study (Fukushige & Heggie 2000). In this section we shall not even come close to obtaining the distribution of escape times numerically. We shall, however, introduce two tools which show us how to think about this problem. One is a suitable theoretical framework (called turnstile dynamics), and the other is a toy model (HCnon 1988) which serves two purposes: (i) it can be used to illustrate turnstile dynamics, and (ii) it was inspired by Hill’s problem.
6.1
Hbnon’s toy model
We already presented a surface of section for Hill’s problem, and HCnon’s model could have been devised with.the properties of the corresponding PoincarC map in mind. Physically, however, it can be thought of as the problem of a ball falling under gravity and bouncing off two disks (Figure 8).
Figure 8. Hinon’s billiard model for Hall’s problem. When the radius of the disks is very large, HCnon showed that the relation between
Escape in Hill’sProblem
125
conditions at each bounce takes a particularly simple form, which is
Xj cosh $ + wjsinh 1c, - sj(cosh $ - 1)
Xj+,
=
Wj+l
= X j sinh $
+ wjcosh $ - ( s j cosh $ + sj+l)tanh -,1c,2
where $ is a parameter (related to the radius of disks, the strength of gravity, etc.), X, is the x-coordinate at the j t h bounce, wj is the tangential velocity component at this time, and sj = signX,. (There is a tiny subtlety at Xj = 0, which we ignore in this exposition.) The only non-linearities in this problem are the terms with s’s. Otherwise the map is just a hyperbolic rotation about the point X = fl,w = 0, in the left and right halves of the X, w plane, as appropriate. It is only when a point moves from one half to the other (across the discontinuity in the surface off which the ball bounces) that anything different happens. These two points are fixed points of the map. As usual in such situations, a fixed point corresponds to a periodic motion, which here refers to the ball bouncing repeatedly off either of the highest points of the disks (Figure 8). These motions are obviously unstable, and the fixed points on the surface of section have local stable and unstable invariant manifolds which are segments of the lines X = fl f w (Figure 9).
Figure 9. Schematic surface of section f o r Hdnon’s model. T h e dashed lanes are the local stable and unstable invariant manifolds of the fixed points at (kl,0 ) . What has this to do with Hill’s problem? For one thing the unstable periodic orbits have an analogy (in Hill’s problem) with the Liapounov orbits mentioned in Section 5.1. Using the linearised equations derived there it is also possible to derive equations for the local stable and unstable invariant manifolds of the corresponding fixed points on the surface of section (Figure 10). The main difference between the two models is the absence, in HQnon’smodel, of anything comparable with a limiting curve.
6.2
Turnstile dynamics
In Figure 10 it is fairly obvious how to define the part of the section “inside” the cluster, and how to define the part outside. In Figure 9, despite the absence of limiting curves,
Douglas C Heggie
126
J
Wdt 0
.‘./ ,
-\
Figure 10. Outline surface of section for Hill’s problem, at some energy E just above E,. The small elliptic arcs are the local stable and unstable manifolds of the fixed points corresponding to the Liapounov orbits, and the large curves are the limiting curves.
we shall define the inside and the outside by the naive resemblance of the two pictures. To be more precise, the inside (RI) will be defined as the rhombus lying within the stable manifolds of the fixed points, and the outside (R2) as everything else (Figure 9). This at least makes clear that the boundary between the two regions is to be defined by pieces of the stable and unstable manifolds. This is one of the main procedural points in the theory of turnstile dynamics (Wiggins 1992), which we now introduce via this example. In order to apply this theory to Hill’s problem, we would also have to define the inside and the outside a little more carefully near the fixed points, though we shall not dwell on the details here. The problem of escape in HCnon’s model now focuses on the parts of the surface of section which, under the PoincarC map, are exchanged between regions R1 and R2. A direct calculation (Roy 2000) shows that the region which, on one iteration of HCnon’s map, leaves the region RI consists of the union of two triangles. One of these is shown on Figure 9 and labelled Ll,2(l). The notation, which comes from Wiggins (1992), indicates that this region is a lobe, which moves from region R1 to region R2 on one iteration. In the usual situation considered by Wiggins, a lobe is bounded by parts of the stable and unstable manifolds of fixed points. This is only partly true in HCnon’s model. Two parts of the boundary of the little triangle on Figure 9 have this property: the lower right, which is part of the unstable manifold of the right-hand fixed point, and the boundary at upper right, which consists of part of the stable manifold of the left-hand point. The discontinuity a t X = 0 provides the remaining part of the boundary. We now consider capture of phase space from the region Rz into the region R1. Again we have two triangular regions, one of which is shown in Figure 9, and labelled L2,1(1). as the reader should by now appreciate. Also shown in the Figure are successive iterates of this lobe under the HCnon map H . It can be seen that these remain inside R1 until the map is iterated 5 times. The region H4(L2,1(l)),which is the black triangle furthest to the lower right, intersects L1,2(l), and after one further iteration this intersection leaves region R1. (It does so forever, actually. The number of iterations that elapse before such an intersection takes place depends on the value of $, of course.)
Escape in Hill’s Problem
127
Now we can see how the distribution of escape times can be analysed, at least in principle. Imagine that, at t = 0 (where t counts the number of iterations) the region RI is filled uniformly with points. At time t = 1,the area occupied by L1,2(1) escapes. The same happens at times t = 2, 3, and 4. At time 5 , however, the number of points that escape is given by the area of L1,2(l)\H4(L2,1(l)).At time 6 the area is now L1,2(l)\(H4(L2,1(l))U H5(L2,1(l))),and so on. HCnon’s toy problem is unusual in that some of these calculations can be carried out by elementary means. In almost all problems, by contrast, the work is necessarily numerical. Nevertheless the ideas of turnstile dynamics help to economise the work. The naive way of computing a distribution of escape times, as in Section 5.3, is to distribute points throughout region R1 and measure how long they take to escape. We now see, however, that we only need to consider the dynamics of points within L1,2(1) in order to reach the same results. This concentrates the numerical work where it is actually needed. When we apply these notions to Hill’s problem, a number of additional complicating factors arise. In the first place the area on the surface of section is not proportional to the volume of phase space (Binney et a1 1985): and therefore does not yield an appropriate measure of the escape rate. Secondly, not all escapers from the Hill potential actually intersect the obvious surface of section y = 0. Thirdly, the problem is three-dimensional, and the visualisation of turnstile dynamics becomes harder; Wiggins’ book shows some of the complications that arise. On the other hand, in the planar Hill problem some results are possible. In particular, the analogues of the escape and capture lobes, L1,2(1) and L2,1(1), and their iterates have been mapped out at one or two values of the energy (Roy 2000, Sim6 & Stuchi 2000). For small numbers of iterations one obtains fairly simple ovals on the surface of section. These are the intersections of the surface of section with the stable and unstable invariant manifolds of the Liapounov orbits, i.e. structures like the tube in Figure 6. For higher numbers of iterations their structure becomes highly convoluted, and further complicated by the fact that, at some intersections, only part of the tube actually intersects the surface. Another factor which turnstile dynamics clarifies is the relationship between escape, which is our interest here, and temporary capture, which has motivated other studies (e.g. Murison 1989).
Acknowledgments I thank J Waldvogel, T Quinn and C Sim6 for interesting conversations about the issues in these lectures, and B. Chauvinea and F. Mignard for a copy of their 1991 publication. Research with H. Baumgardt is supported by PPARC under grant PPA/G/S/1998/00044.
References Abramowitz M and Stegun I A, 1965, Handbook of Mathematical Functions, (New York: Dover). Ambartsumian V A, 1938, On the dynamics of open clusters, Uch. Zap. L.G. U,,No.22,p.19; translated in Dynamics of Star Clusters, (eds Goodman J and Hut P), Proc. IAU Symp. 113, 521-524 (Dordrecht: Reidel).
128
Douglas C Heggie
Baumgardt H, 2000a, in Dynamics of Star Clusters and the Milky Way, (eds Deiters S , Fuchs B, Just A, Spurzem R and Wielen R), in press (San Francisco: ASP). Baumgardt H, 2000b, Scaling of N-body calculations, MNRAS, submitted. Benest D, 1971, Elliptic restricted problem for sun-jupiter: existence of stable retrograde satellites at large distance, A&A 13 157-160. Binney J, Gerhard 0 E and Hut P, 1985, Structure of surfaces of section, MNRAS 215 59-65. Binney J and Tremaine S, 1987, Galactic Dynamics, (Princeton: Princeton Univ. Press). Burkill J C, 1962, The Theory of Ordinary Digerential Equations, (Edinburgh: Oliver & Boyd). Chandrasekhar S, 1942, Principles of Stellar Dynamics, (Chicago: Univ. of Chicago Press; also New York: Dover, 1960). Chandrasekhar S, 1943, Dynamical friction 11. The rate of escape of stars from clusters and the evidence for the operation of dynamical friction, ApJ 97 263-273. Chauvineau B, Mignard F, 1990, Dynamics of binary asteroids I (Hill’s case), Zcar 83 360-381. Chauvineau B and Mignard F, 1991, Atlas of the Circular Planar Hill’s Problem, (Grasse: Obs. de la Cote d’Azur). F’ukushige T, Heggie D C, 2000, The time scale of escape from star clusters, MNRAS in press. Gunn J E and Griffin R F, 1979, Dynamical studies of globular clusters based on photoelectric radial velocities of individual stars, I - M3 A J 84 752-773. HCnon M, 1969, Numerical exploration of the restricted problem. V. Hill’s case: periodic orbits and their stability. A&A 1 223-238. HCnon M, 1988, Chaotic scattering modelled by an inclined billiard, Physica D 33 132-156. Jackson J, 1913, Retrograde satellite orbits, MNRAS 74 62-82. Kandrup H E, Siopis C, Contopoulos G and Dvorak R, 1999, Diffusion and scaling in escapes from two-degree-of-freedom Hamiltonian systems, astro-ph/9904046. Leon S, Meylan G and Combes F, 2000, Tidal tails around 20 Galactic globular clusters. Observational evidence for gravitational disk/bulge shocking, A &A 359 907-931. MacKay R S, 1990, Flux over a saddle, Phys. Lett. A 145 425-427. Makino J and Aarseth S J, 1992, On a Hermite integrator with Ahmad-Cohen scheme for gravitational many-body problems, PASJ 44 141-51. Marchal C, 1990, The Three-Body Problem, (Amsterdam: Elsevier). Markellos V V, 2000, private communication. Meylan G, Dubath P and Mayor M, 1991, Two high-velocity stars shot out from the core of the globular cluster 47 Tucanae, ApJ 383 587-593. Moser J K, 1968, Lectures on Hamiltonian Systems, Mem. AMS 81 1-60; also in Hamiltonian Dynamical‘Systems, editors MacKay R S and Meiss J D, (Bristol: Adam Hilger, 1987). Murison M A, 1989, The fractal dynamics of satellite capture in the circular restricted three-body problem, A J 98, 2346-59 and 2383-6. Murray C D and Dermott S F, 1999, Solar System Dynamics, (Cambridge Univ. Press). Oh K S, Lin D N C and Aarseth S J, 1992, Tidal evolution of globular clusters. I. Method. ApJ 386 506-18. Plummer H C, 1918, A n Introductory Treatise on Dynamical Astronomy, (Cambridge: Cambridge Univ. Press; also New York: Dover, 1960). Ross D J, Mennim A and Heggie D C, 1997, Escape from a tidally limited star cluster. MNRAS 284 811-814. Roy A, 2000, PhD Thesis, Univ. of Edinburgh, in preparation. Sim6 C and Stuchi T J, 2000, Central stable/unstable manifolds and the destruction of KAM tori in the planar Hill problem, Physica D, 140,1-32. Spitzer L, Jr, 1987, Dynamical Evolution of Globular Clusters, (Princeton Univ. Press). Szebehely V, 1967, Theory of Orbits, (New York: Academic Press). 
Wiggins S, 1992, Chaotic Transport in Dynamical Systems, (Berlin: Springer-Verlag).
129
Galaxies: kinematics to dynamics Michael R Merrifield University of Nottingham, UK
1
Introduction
As will be apparent to anyone reading this book, the practitioners of N-body simulations have an enormous variety of preoccupations. Some are essentially pure mathematicians, who view the field as an exciting application for abstruse theory. Others enjoy formulating and tackling mathematically-neat problems, with little concern over whether the particular restrictions that they impose are also respected by nature. Still others are closer to computer scientists. inspired by the challenge of developing ever more sophisticated algorithms to tackle the N-body problem, but showing less interest in the ultimate application of their codes to solving astrophysical problems. This contribution is presented from yet another biased perspective: that of the observational galactic dynamicist. Observational astronomers tend to use N-body simulations in a rather cavalier manner, both as a tool for interpreting existing astronomical data, and as a powerful technique by which new observations can be motivated. The aim of this article is to illustrate this profitable interplay between simulations and observations in the study of galaxy dynamics, as well as highlighting a few of its shortcomings. To this end, the text of this chapter is laid out as follows. Section 2 provides an introduction to the sorts of data that can be obtained in order to study the dynamical properties of galaxies, and goes on to discuss the intrinsic stellar dynamics that one is trying to model with these observations, and the role that N-body simulations can play in this modelling process. Section 3 gives a brief overview of the historical development, of N-body simulations as a tool for studying galaxy dynamics. Section 4 provides some examples of the interplay between observations and N-body simulations in the study of elliptical galaxies, while Section 5 provides further examples from studies of disk galaxies, concentrating on barred systems. These sections are in no way intended to be encyclopedic in scope; rather, by selecting a few examples and examining them in some detail, the text seeks to give some flavour for the range of what is possible in this rich field. Finally, Section 6 contains some speculations as to what may lie in the future for this productive relationship between observations and N-body modelling.
Michael R Merrifield
130
2
Kinematics and dynamics
The astronomer can glean only limited amounts of information about galaxies from observations. Some of these limitations arise from the practical shortcomings of telescopes, which can only obtain data with finite signal-to-noise ratios and limited spatial resolution. However, some of the restrictions are more fundamental. One can, for example, only view a galaxy from a single viewpoint, from which it is not generally possible to reconstruct its full three-dimensional shape, even if the galaxy is assumed to be axisymmetric (Rybicki 1986). We must therefore draw a distinction between kinematics, which are the observable properties relating to the motions of stars in a galaxy, and dynamics, which fully describe the intrinsic properties of a galaxy in terms of the motions of its component stars. Much of the study of galaxy dynamics involves attempting to interpret the former in terms of the latter. Since stars are not the only constituents of galaxies, there is often additional information that one can glean from other components such as gas, whose kinematics may be revealed by its emission lines. These additional components can also confuse the issue, as selective obscuration by dust of some regions of a galaxy can have a major impact on the observed kinematics (e.g. Davies 1991). However, this text is concerned with N-body models, which are primarily used to describe the stellar components of galaxies, so here we concentrate just on the stellar dynamics. Nevertheless, it should be borne in mind that no description of a galaxy, particularly a later-type spiral system, is complete without considering these other components.
2.1
Kinematics
We begin by looking a t what properties of a dynamical stellar system are, at least in principle, observable. The simplest data that one can obtain is what is detected by an image, namely the distribution of light from the galaxy on the sky, p(z,y). Even for relatively nearby galaxies, the smallest resolvable spatial element will contain the light from many stars, so p provides a measure of the number of stars per unit area on the sky. By obtaining spectra of each of these spatial elements, we can start measuring the motions of the stars as well as their current locations. The observed spectrum will be a composite of the light from all the individual stars. Stellar spectra contain dark absorption lines due to the various elements in their atmospheres, but these absorption lines will be Doppler shifted by different amounts, depending on the line-of-sight velocities of the stars. Thus, as Figure 1 illustrates, the observed absorption lines will appear broadened and shifted due to the superposition of all the individual Doppler shifted spectra. Put mathematically, the observed spectrum of a galaxy made up from a large number of identical stars will be G(U)= d%sF(2'los)S(~- ?JlOS)l (1)
1
where U = clnX is the wavelength expressed in logarithmic units, S is the spectrum of the star in the same units, and F(u~,,) is the function describing the distribution of stars' line-of-sight velocities within the element observed. Equation 1 is a convolution integral equation, which, in principle, can be inverted to yield the kinematic quantity of interest, F(q,,), for a given galaxy spectrum, G ( u ) ,and
131
Galaxies: kinematics to dynamics 2
1.5
h
s s
1
0.5
520
500
540
A /nm
Figure 1. Spectra of a star and a galaxy, showing how the absorption lanes an the latter are shifted and broadened relative to the former. a spectrum S ( u ) obtained using an observation of a suitable nearby “template” star. In practice, such unconstrained deconvolutions are hopelessly unstable. The usual approach is therefore to assume some relatively simple functional form for this function, and adjust its parameters until Equation 1 is most closely obeyed. The best-fitting version of F(y,,) then provides a model for the line-of-sight velocity distribution of stars a t that point. Conventionally, and with little physical justification, F(wl,,) has usually been assumed t o be Gaussian, and the fitting returns optimal values for the mean velocity and dispersion of this model velocity distribution. More recently, however, the quality of data has improved to a point where more general functional forms’can be fitted, allowing a less restricted analysis (e.g. Gerhard 1993, Kuijken & Merrifield 1993). With spectra a t high signal-tonoise ratios, it is even possible to attempt a non-parametric analysis, yielding a best-fit form for F(y,,) subject only to the most generic constraints of positivity and smoothness (e.g. Merritt 1997). Although there are many practical difficulties involved in deriving a completely general description for F(q,,) [see Binney & Merrifield (1998) Chapter 111, it is a t least in principle measurable. Thus, the most general kinematic quantity that one can infer for a stellar dynamical system is the line-of-sight velocity distribution at each point on the sky where any of the galaxy’s stars are to be found, F ( z ,y , qOs).
2.2
Dynamics
To fully specify a galaxy’s stellar dynamics, we need t o know the gravitational potential, @(z,y, z ) , which dictates the motions of individual stars, and the “distribution function”, f(x,y, z , U,, uuy,wz),which specifies the phase density of stars, giving their velocity distribution and density at each point in the galaxy. We would therefore appear t o have a completely intractable problem, since we must somehow try to extract the six-dimensional distribution function from the rather complex observable projection of this quantity, F ( x ,y, ulOs),which only has three dimensions.
132
Michael R Merrifield
Fortunately, however, the form of the distribution function is not completely arbitrary. For example, it must be positive or zero everywhere, since one can never have a negative density of stars. Further, stars are (more-or-less) conserved as they orbit around a galaxy, and can only change their velocities in a continuous manner, dictated by acceleration due to gravity. This continuity can be expressed in the collisionless Boltzmann equation,
By manipulating the collisionless Boltzmann equation, one can derive a number of useful formulae for galaxy dynamics. A full discussion of this field is beyond the scope of this article, and the interested reader is referred to the excellent treatment by Binney & Tremaine (1987). Here, we simply summarise some of the key results: By taking a spatial moment of the collisionless Boltzmann equation, one can derive the virial theorem, which relates the total kinetic and potential energies of the system. The kinetic energy can be estimated from the observable line-of-sight motions of stars, from which the potential energy and hence the mass of the system can be inferred. It was this approach that provided the first evidence of dark matter, in clusters of galaxies (Zwicky 1937). By integrating Equation 2 over velocity, one obtains the continuity equation, which describes how the density of stars will vary with time due to any net flows in their motions. This equation is central to the dynamics of “cooler” stellar systems like disk galaxies, where mean streaming motions dominate the dynamics; as we shall see below, it has played an important role in studying the properties of barred galaxies. By multiplying Equation 2 by powers of velocity and integrating over velocity, one can derive the Jeans equations obeyed by the velocity dispersion, and their highermoment analogues. The Jeans equations describe the random motions of stars, and have proved particularly important in studies of the dynamics of elliptical galaxies, where there is little mean streaming, and random velocities are generally the dominant stellar motions (e.g. Binney & Mamon 1982). By considering integrals of motion, one can derive the strong Jeans theorem: “for a steady state galaxy in which almost all the orbits are regular, the distribution function depends on at most three integrals of motion.” For example, in an axisymmetric galaxy, one may write f (z,y, z , u2, wy,U,) f ( E ,Jz,I s ) , where E is the energy of the star, J, its angular momentum about the axis of symmetry, and 1 3 is the “third integral” respected by the star’s orbit, which cannot generally be written in a simple analytic form. This last result provides us with at least the hope that galaxy dynamics presents a tractable problem, since we now need only infer a three-dimensional distribution function from its three-dimensional observable projection, F ( z ,y, ulos). Equation 2 describes the continuity equation of a phase space fluid, which must be solved in order to understand the dynamics of galaxies. N-body simulation codes are really just Monte Carlo integrators tailored to solving this partial differential equation. It is very tempting to interpret the bodies in an N-body code as something more physical,
Galaxies: kinematics to dynamics
133
such as the individual stars in a galaxy. However, unlike star clusters, galaxies contain so many stars that current simulations are still several orders of magnitude away from such a one-to-one correspondence. It is therefore much healthier to view an N-body simulation simply as a Monte Carlo solver for the collisionless Boltzmann equation, which is, in turn, a fluid approximation to the description of the properties of the large (but finite) collection of stars that make up a galaxy.
3
A brief history of galaxy N-body simulations
Before discussing modern applications of N-body simulations to studies of galaxy dynamics, it is instructive to look at the historical development of the field. N-body simulations of galaxies date back to well before the invention of the computer. Probably the first example of the technique was presented by Immanuel Kant in his 1755 publication, Universal Natural History and Theory of the Heavens. Part of this book was concerned with the properties of the Solar System, discussing how the plane of the ecliptic reflects the ordered motions of the planets around the Sun, while the more random orbits of comets causes them to be distributed in a spherical halo. Kant’s N-body simulation involved using this understanding of the Solar System as an analog computer by which the Milky Way could be simulated. He pointed out that the same law of gravity applies to the stars in the Galaxy as to the planets in the Solar System. He therefore argued that the band of the Milky Way could be understood in the same way as the plane of the ecliptic, arising from the ordered motion of the stars around the Galaxy. The lack of apparent motion in the stars could be explained by the vastly larger scale of the Milky Way. He further pointed out that the scattering of isolated stars and globular clusters far from the Galactic plane could be compared to comets, their locations reflecting their more random motions. Finally, he speculated that other faint fuzzy nebulae were similar “island universes’’ whose stars followed similar orbital patterns. Quite amazingly, Kant’s simple analog N-body simulation had revealed most of the key dynamical properties of galaxies. The next major advance in galaxy N-body simulations was made by Holmberg (1941). He used the fact that the intensity of a light source drops off with distance in the same inverse-square manner as the force of gravity. He therefore constructed an analogue computer by arranging 74 light bulbs on a table: the intensity of light arriving at the location of each bulb from different directions told him how large a force should be applied at that position, and hence how that particular bulb’s location should be updated. With this analogue integrator, Holmberg was able to show that collisions between disk galaxies can throw off tidally-induced spiral arms (Figure 2), and that this process can rid the system of sufficient energy that the remaining stars can become bound into a single object. The subject really took off in the 1970s with the widespread availability of digital computers of increasing power. Numerical N-body simulations on such a machine allowed Toomre & Toomre (1972) to explore the parameter space of galaxy mergers far more thoroughly than Holmberg had been able. They were thus able to reproduce the observed morphology of tidal tails and other features seen in particular merging galaxies, allowing them to reconstruct the physical parameters of the collisions in these systems. Other fundamental insights into galaxies were also made by N-body simulations around this time, such as the demonstration that a self-gravitating axisymmetric disk of stars on circular
.
- .
.. . . . F - , .:.. e
*
.
.
.-:..’
,e--.
I
..
e
orbits is grossly unstable, rapidly evolving into a bar and spiral arms (see Figure 3).

Figure 2. Holmberg's original N-body simulation illustrating a merger between two disk galaxies. [From Holmberg (1941).]

Figure 3. N-body simulation of a disk of "cold" particles initially orbiting on orbits very close to circular. Note the rapid growth of a strong bar instability. [From Hohl (1971).]

More recently, progress has been driven by developments in algorithms and computer hardware, which allow N-body codes to follow the motions of ever larger numbers of particles. Although we are still a long way from being able to follow the motions of the billions of stars that make up a typical galaxy, the increased number of particles helps suppress various spurious phenomena that arise from the Poisson fluctuations in simulations using small numbers of particles. The increased number of particles also increases the dynamic range of scales that one can model within a single simulation. For example, it is now possible to look in some detail at the results of mergers between disk galaxies; it has long been suggested that such mergers may produce elliptical galaxies [see Barnes & Hernquist (1992) for a review], but the simulations are now so good that we can measure quite subtle details of the merger remnants' properties, such as how fast they rotate and the exact shapes of their light distributions (Naab et al. 1999). We can then compare these quantities with the properties of real elliptical galaxies to test
the viability of this formation mechanism. We are fast reaching the stage where a single simulation will have sufficient resolution to model simultaneously the growth of large-scale structure in the Universe and the formation of individual galaxies (e.g. Kay et al. 2000, Navarro & Steinmetz 2000). Thus, within the next few years, we will be able to perform simulations where the formation and evolution of galaxies can be viewed within the broader cosmological framework. However, since these studies depend critically on the treatment of gas hydrodynamics, they lie beyond the remit of this article on N-body analysis of the collisionless Boltzmann equation.
4 Modelling elliptical galaxies
Elliptical galaxies provide a good place to start in any attempt to model the stellar dynamics of galaxies. The simple elliptical shapes of these systems offer some hope that their dynamics may also be relatively straightforward to interpret; this high degree of symmetry means that the assumption of axisymmetry or even spherical symmetry may not be unreasonable. Further, the absence of dust in these systems means that the observed light accurately reflects the distribution of stars in the galaxy, greatly simplifying the modelling process.

In fact, elliptical galaxies are so simple that N-body simulations would not appear to have much of a role to play. The symmetry of these systems means that one can readily generate spherical or axisymmetric models with analytic distribution functions that reproduce many of the general properties of elliptical galaxies (e.g. King 1966, Wilson 1975). Where one seeks to reproduce the exact observations of a particular galaxy, Schwarzschild's method (Schwarzschild 1979) is often a much better tool than a full N-body simulation. This technique involves adopting a particular form for the gravitational potential (perhaps, for example, by assuming that the mass distribution follows the light in the galaxy) and calculating a large library of possible stellar orbits in this potential. One then simply seeks the weighted superposition of these orbits that best reproduces all the observational data for the galaxy. Originally, these fits were made just to reproduce the projected distribution of stars, but more recent implementations have also used kinematic constraints such as the line-of-sight streaming velocities and velocity dispersions at different projected locations in the galaxy. It is also possible to start using information from the detailed shape of the line-of-sight velocity distribution (e.g. Cretton et al. 2000); ultimately, one could look for the superposition of orbits that reproduces the entire projected kinematics, F(x, y, v_los).

There are, however, some aspects of the properties of elliptical galaxies where N-body simulations offer a powerful tool. In particular, if one is concerned with the stability of an elliptical galaxy, one needs to study the full non-linear time evolution of Equation 2, for which N-body solutions are the most natural technique. As an example of the sort of issues one can answer using this approach, consider the distribution of elliptical galaxy shapes. Observations of this distribution have revealed that very flattened elliptical galaxies do not exist: the most squashed systems have shortest-to-longest axis ratios of only 0.3. This observation could not be explained using the simple modelling techniques described above, since it is straightforward to derive a distribution function corresponding to a much flatter elliptical galaxy. However, if one takes such a distribution function as the initial solution
to Equation 2, and uses an N-body simulation to follow its evolution, one discovers that it is grossly unstable, usually to some form of buckling mode, which rapidly causes it to evolve into a rounder system, comparable to the flattest observed ellipticals (see Figure 4). Thus, the absence of flatter elliptical systems has a simple physical explanation: they are dynamically unstable.

Figure 4. N-body simulation of an elliptical galaxy set up in an initially very flat distribution, as viewed along the three principal axes. Note the rapid fattening via a bending instability. [From Jessop et al. (1997).]

Instability analysis using N-body codes has also shed light on other properties of elliptical galaxies. For example, Newton & Binney (1984) successfully constructed a distribution function that could reproduce the photometric and kinematic properties of M87: assuming only that the mass of the galaxy is distributed in the same way as its light and that the galaxy is spherical, they were able to match both the light distribution of M87 and the variation in its line-of-sight velocity dispersion with projected radius. Thus, they would appear to have a completely viable dynamical model for M87. However, Merritt (1987) took this distribution function as the starting point for an N-body simulation, and showed that the preponderance of stars on radial orbits at its centre rendered the model unstable - the N-body model rapidly formed a bar at its centre. Thus, the simple spherical model in which the mass followed the light was invalidated, implying either that M87 is not intrinsically spherical, or that it contains mass in addition to that contributed by the stars.
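Stepping back briefly to the orbit-superposition step of Schwarzschild's method described earlier in this section, the fitting stage can be sketched very compactly as a non-negative least-squares problem. In the minimal sketch below the orbit-library matrix and the data vector are invented placeholders rather than quantities derived from any real galaxy, and real implementations add further ingredients (regularisation, kinematic constraints, self-consistency checks) that are omitted here.

    import numpy as np
    from scipy.optimize import nnls

    # Hypothetical orbit library: column j holds the contribution of orbit j to each
    # observational constraint (e.g. surface brightness or velocity moments in apertures).
    rng = np.random.default_rng(0)
    n_constraints, n_orbits = 50, 200
    A = rng.random((n_constraints, n_orbits))

    # Hypothetical data vector, constructed here so that an exact fit exists.
    b = A @ rng.random(n_orbits)

    # Schwarzschild's method: the weighted superposition of orbits (weights >= 0)
    # that best reproduces the data, found by non-negative least squares.
    weights, residual = nnls(A, b)
    print(f"fit residual = {residual:.3e}; orbits with non-zero weight = {(weights > 0).sum()}")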
Although some instability analyses can be carried out analytically, the full calculation of the behaviour of an unstable system, particularly once the instability has grown beyond the linear regime, is almost always intractable, making N-body simulations the best available tool. Some care must be taken, however, to make sure that any instability detected is not a spurious effect arising from the numerical noise in the Monte Carlo N-body integration method (or indeed, that any real instability is not suppressed by the limitations of the method).

N-body simulations can also be applied to the study of elliptical galaxies by providing what might be termed "pseudo-data." When a new technique is proposed for extracting the intrinsic dynamical properties of a galaxy from its observable kinematics, one needs some way of testing the method. Ideally, one would take a galaxy with known dynamical properties, and see whether the method is able to reconstruct those properties. Unfortunately, it is most unlikely that the corresponding intrinsic dynamics of a real galaxy would be known - if they were, there would be no need to develop the new technique! However, with an N-body simulation, for which the intrinsic properties are all measurable, one can readily calculate the appropriate projections to construct its "observable" properties, F(x, y, v_los), from any direction. One can then test the method on these pseudo-data to see whether the intrinsic properties of the galaxy can be inferred.

An excellent example of this approach was provided by Statler (1994) in his attempt to reconstruct the full three-dimensional shapes of elliptical galaxies from their observable kinematics. Although these systems have a simple apparent structure, there is no a priori reason to assume that they are axisymmetric, and a more general model would be to suppose that they are triaxial, with three different principal axis lengths (like a somewhat deflated rugby ball). Indeed, there is strong observational evidence that elliptical galaxies cannot all be completely axisymmetric. Images of some ellipticals reveal that the position angles on the sky of their major axes vary with radius. Such an "isophote twist" cannot occur if a galaxy is intrinsically axisymmetric, as the observed principal axes of such a system would always coincide with the projection on the sky of its axis of symmetry. Thus, these elliptical galaxies must be triaxial in structure.

Statler made a study of the dynamics of some simple triaxial galaxy models, and concluded that one could obtain a much better measure of the shape of the system by considering the mean line-of-sight motions of stars as well as their spatial distribution. As a test of this hypothesis, he took an N-body model, and extracted from it the observable properties of the mean line-of-sight velocity and projected density at a number of positions. Unfortunately, the constraints on the intrinsic galaxy shape inferred from these data were found to be only marginally consistent with the true known shape of the N-body model. Although in some ways rather disappointing, this analysis reveals the true power of using N-body simulations to test such ideas: the N-body simulation did not contain the same simplifying assumptions as the analytic model that had originally motivated the proposed idea, so it provided a truly rigorous test of the technique.
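A minimal sketch of how such pseudo-data can be generated is given below: a particle snapshot (here a made-up rotating, flattened distribution standing in for real simulation output) is projected along a chosen viewing direction and binned into maps of projected density and mean line-of-sight velocity, the lowest moments of F(x, y, v_los).

    import numpy as np

    def pseudo_observe(pos, vel, theta, phi, nbins=32, extent=2.0):
        """Project particles on to the sky plane for a viewing direction given by the
        spherical angles (theta, phi); return projected counts and mean v_los maps."""
        # Unit vector along the line of sight and two perpendicular "sky" axes.
        los = np.array([np.sin(theta) * np.cos(phi), np.sin(theta) * np.sin(phi), np.cos(theta)])
        ex = np.array([-np.sin(phi), np.cos(phi), 0.0])
        ey = np.cross(los, ex)

        x, y = pos @ ex, pos @ ey              # projected coordinates on the sky
        v_los = vel @ los                      # line-of-sight velocities

        bins = np.linspace(-extent, extent, nbins + 1)
        counts, _, _ = np.histogram2d(x, y, bins=[bins, bins])
        vsum, _, _ = np.histogram2d(x, y, bins=[bins, bins], weights=v_los)
        mean_vlos = np.where(counts > 0, vsum / np.maximum(counts, 1), np.nan)
        return counts, mean_vlos

    # Stand-in "snapshot": a rotating, flattened particle distribution.
    rng = np.random.default_rng(2)
    pos = rng.normal(size=(10000, 3)) * [1.0, 0.7, 0.4]
    vel = np.cross([0.0, 0.0, 1.0], pos) + 0.1 * rng.normal(size=(10000, 3))

    density, vmap = pseudo_observe(pos, vel, theta=np.pi / 3, phi=0.3)

Repeating the projection for many viewing directions then supplies the pseudo-data sets against which a proposed analysis technique can be tested.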
As a final example of the way in which N-body simulations can interact with observations in the study of elliptical galaxies, let us turn to some work on "shell galaxies." Such systems typically appear to be fairly normal ellipticals, but careful processing of deep images reveals that their light distributions also contain faint ripple-like features in a series of arcs around the galaxies' centres (e.g. Malin & Carter 1983). The simplest explanation for these shells is that they are the remains of a small galaxy that is merging with the
larger elliptical from an almost radial orbit. Each shell is made up from stars of equal energy from the infalling galaxy, which have completed a half-integer set of oscillations back-and-forth through the larger galaxy, and are in the process of turning around. Since the stars slow to a halt as they turn around, they pile up at these locations, producing the observed shells. Shells at different radii contain stars with different energies, which have completed different numbers of radial orbits since the merger. Since the stars in any shell have a very small velocity dispersion compared to that of the host galaxy, they show up clearly as sharp edges in the photometry. N-body simulations (e.g. Quinn 1984) played a key role in confirming that such mergers could, indeed, produce sets of faint shells in the photometric properties of galaxies.

It is therefore interesting to go on to ask what the most generally-observable kinematic properties of one of these shells might be. Again, N-body simulations offer an excellent tool with which to address this question.

Figure 5. N-body simulation, projected to show the observable properties of the shells created in a minor merger. The upper panel shows the photometric properties, while the lower panel shows the kinematically-observable line-of-sight velocity versus projected distance along the major axis. The dashed lines show the predicted caustic shapes. [From Kuijken & Merrifield (1998).]

Figure 5 presents the results of such a simulation, showing both the faint photometric shells and the rather stronger kinematic signature of a minor merger. The line-of-sight velocity distribution as a function of position along the major axis shows a characteristic chevron pattern, whose origins are relatively straightforward to explain (Kuijken & Merrifield 1998). Consider the stars in a shell whose outer edge lies at r = r_s. By energy conservation, the radial velocities of stars at r < r_s in this shell are

  v_r = ±{2[Φ(r_s) − Φ(r)]}^(1/2),    (3)

where Φ(r) is the gravitational potential at radius r.
By simple geometry, the observable
line-of-sight component of this velocity is given by

  v_los = ±{2[Φ(r_s) − Φ(r)]}^(1/2) (1 − R²/r²)^(1/2),    (4)

where R is the projected distance of the line of sight from the galaxy centre. Close to the shell edge, where r_s − R << r_s, the maximum value of v_los can be shown, by expanding and differentiating Equation 4, to be approximately

  v_los,max ≈ (r_s − R) [(dΦ/dr)/r]^(1/2),    (5)

with the right-hand side evaluated at r = r_s. Examples of lines obeying this equation are shown in Figure 5; they clearly match the pattern seen in the N-body "observation." Thus, if one were to make a detailed kinematic observation of a shell galaxy and observed this chevron pattern, not only would one have dynamical evidence for the merger model, but one would also be able to use the slope of the chevrons to measure dΦ/dr at the radii of each of the shells. Combining these measurements would allow one to estimate the gravitational potential of the galaxy in a simple, robust manner.

Here, then, is an excellent example of the close interplay that is possible between observations and N-body simulations. The photometric discovery of shells in elliptical galaxies led to a merger theory that was validated by N-body simulations. The simulations then provided motivation for further observations to study the kinematics of shells in order to make a novel measurement of the gravitational potentials of elliptical galaxies.
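To see how well the linear chevron of Equation 5 (as reconstructed above) traces the exact envelope implied by Equations 3 and 4, one can evaluate both numerically. The fragment below does so for an assumed logarithmic (flat rotation curve) potential; the circular speed and shell radius are illustrative values, not parameters of any particular galaxy.

    import numpy as np

    v0, r_s = 200.0, 10.0                    # assumed circular speed [km/s] and shell radius [kpc]
    phi = lambda r: v0**2 * np.log(r)        # assumed logarithmic potential
    dphi_dr = lambda r: v0**2 / r

    def vlos_max(R, n=2000):
        """Maximum |v_los| at projected radius R for shell stars with apocentre r_s,
        obtained by maximising Equations 3 and 4 over the position r along the line of sight."""
        r = np.linspace(R * (1 + 1e-6), r_s, n)
        v_r = np.sqrt(2.0 * (phi(r_s) - phi(r)))          # Equation 3
        v_los = v_r * np.sqrt(1.0 - (R / r) ** 2)         # Equation 4
        return v_los.max()

    for R in [9.0, 9.5, 9.9]:
        exact = vlos_max(R)
        chevron = (r_s - R) * np.sqrt(dphi_dr(r_s) / r_s)  # Equation 5 (linear chevron)
        print(f"R = {R:4.1f} kpc:  exact = {exact:6.1f} km/s,  chevron approx = {chevron:6.1f} km/s")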
5 Modelling disk galaxies
We now turn to the use of N-body simulations in the study of disk galaxies. Here, the motivation for using N-body modelling is much clearer. Spiral galaxies contain a wealth of structure, much of which is probably transient in nature, so simple analytic models of the type that do such a good job of describing the basic properties of elliptical galaxies are clearly inappropriate. Instead, one needs a full time-dependent solution to Equation 2, for which N-body simulations provide the most obvious technique.

It should, however, be borne in mind that the use of Equation 2 is often significantly less appropriate in the study of disk galaxies than was the case for ellipticals. Active star formation in many spiral galaxies means that the continuity implied by the collisionless Boltzmann equation is not strictly valid, as stars appear in the formation process, and the brightest, most massive amongst them subsequently disappear in supernovae. Further, the location of these star formation regions is largely driven by the dynamics of the gas from which the stars form. The collisional nature of this gas means that it is poorly described by a collisionless N-body code, and should really be dealt with using much more sophisticated gas codes. As a further complication, the dust found in most spiral galaxies means that a significant fraction of the starlight is scattered or absorbed. Thus, there is a rather complicated relationship between the results of an N-body code (which essentially gives the distribution of stars in the system) and the observed photometric properties of a galaxy. Finally, the likely transient nature of many of the properties of spiral galaxies also complicates comparison between observation and theory: since one has only a snapshot view of a galaxy, one has to search through the complete evolution
in time of an N-body simulation to see if it matches the observed properties of the galaxy at any point. Despite these caveats, N-body simulations have provided a wide variety of insights into the dynamics of disk galaxies. As for the ellipticals, N-body simulations have not only been used to explain many of the observed properties of disk galaxies, but they have also provided data sets that can test novel analysis techniques, and they have provided the key motivation for a range of new observations.
As an example of this synergy between N-body simulations and observations, we consider in some detail the properties of barred galaxies. As we have already described in Section 3, one of the early triumphs of N-body simulations was in demonstrating that a rectangular bar-like structure, similar to those seen in more than a third of disk galaxies, appears due to an instability in a self-gravitating disk of stars. Subsequently, as we shall see below, N-body simulations have enabled us to understand a great deal about the properties of bars.

One of the simplest physical properties of a bar is its pattern speed, Ω_p, which is the angular rate at which the bar structure rotates. In a simulation like that shown in Figure 3, Ω_p is easy enough to calculate by comparing the bar position angles at different times. In a real galaxy, of course, we do not have the luxury of being able to wait the millions of years required to see the bar pattern move, so it is less obvious that Ω_p can be measured. However, Tremaine & Weinberg (1984) elegantly demonstrated that one can manipulate the continuity equation into a form that contains only the distribution of stars, their mean line-of-sight velocities (observable via the Doppler shift in the starlight at each point in the galaxy), and Ω_p. Since the pattern speed is the only unknown, one can derive its value directly from the other observable properties. At the time that this technique was proposed, no observations of barred galaxies had ever produced the quality of spectral data required to implement the method. However, Tremaine & Weinberg were able to prove its viability by taking a single snapshot of an N-body simulation and creating pseudo-observations of the line-of-sight velocities and projected locations of the objects in it. The pattern speed derived from this single pseudo-dataset was found to match that derived from watching the pattern rotate in the complete time sequence of the simulation.

More recently, kinematic observations have progressed to a point where this method can be applied to data from real barred galaxies (e.g. Merrifield & Kuijken 1995). These measurements led to the discovery that bar patterns seem to rotate rather rapidly, with the bar ends lying close to the "co-rotation radius," which is the radius in the galaxy at which the bar pattern rotates at the same speed that the stars themselves circulate. This finding proved interesting in the light of subsequent N-body simulations of bars (Debattista & Sellwood 2000). These simulations showed that although bars form with this rapid initial rotation rate, in many cases the bar pattern speed rapidly decreases almost to a halt. This deceleration is the result of dynamical friction: the passage of the bar disturbs the orbits of any material orbiting in the halo of the galaxy, concentrating this material into "wakes" of mass that lie behind the rotating bar, exerting a torque that serves to slow the bar's rotation. Since cosmological N-body models of galaxy formation predict that galaxies should form in centrally-concentrated dark matter halos with plenty of mass at small radii (e.g. Navarro, Frenk & White 1997), one would expect the dynamical friction effects from this halo mass to be strong, yielding slowly-rotating bars. Thus, either the bars with measured pattern speeds happen to have been caught very early in their lives
when they have not slowed significantly, or the dark halos in which these barred galaxies reside do not conform to the cosmologists' predictions.

Finally in this discussion of N-body studies of barred galaxies, let us turn to the ultimate demise of bars. Once a bar has grown, there are several ways that it can be destroyed. A minor merger with an in-falling satellite galaxy can put enough random motion into the stars to mean that they no longer follow highly-ordered bar-unstable orbits, thus destroying the bar [see, for example, the N-body simulations by Athanassoula (1996)]. A less violent solution involves the growth of a massive central black hole in the galaxy. Inside a bar, stars shuttle back and forth on ordered orbits aligned with the bar. However, N-body simulations have shown that if a central black hole exceeds a critical mass of a few percent of the bar mass, then the black hole scatters the passing stars so strongly that they end up on chaotic orbits that do not align with the bar, thus destroying its coherent shape (Sellwood & Moore 1999). This mechanism is particularly intriguing, as a bar provides a conduit by which material can be channelled toward the centre of a galaxy. If this inflowing matter is accreted by a central black hole, the central object's mass can grow to a point where the bar is disrupted, shutting off any further inflow of material - a remarkable case of the black hole biting the hand that feeds it!

Even if left in isolation with no mergers or central black holes, thin bars in disks can have only a very limited lifetime. N-body simulations (Combes & Sanders 1981, Raha et al. 1991) have shown that bars undergo a buckling instability perpendicular to the plane of the galaxy, rather similar to that shown in Figure 4. This instability initially just bends the bar, but the structure then flops back and forth until it fills a double-lobed fattened region perpendicular to the galaxy plane, rather like a peanut still in its shell (see Figure 6).
Figure 6. N-body simulation showing the peanut-shaped structure perpendicular to the disk plane into which a bar ultimately evolves. [From Combes & Sanders (1981).]

This N-body discovery has an interesting tie-in with observations: the bulges of approximately a third of edge-on galaxies are observed to have boxy or peanut-shaped isophotes, similar to that seen in Figure 6 (de Souza & dos Anjos 1987). Could it be that these systems are simply barred galaxies viewed edge-on? The fraction certainly corresponds to the percentage of more face-on systems seen to contain bars, but more direct evidence is clearly needed. Again, numerical simulations pointed the way forward: calculations of orbits in barred potentials have shown that they display a rich array of structure,
with highly elongated orbits, and changes in orientation at radii where one passes through resonances. Kuijken & Merrifield (1995) investigated the implications of this complexity for the observable kinematics of edge-on barred galaxies, and showed that the structure is apparent even in projection: as Figure 7 shows, the changing orientations of the different orbit families show up in a rather complex structure in the observable kinematics, F(x, v_los).

Figure 7. Simulations of the observable kinematics (line-of-sight velocity versus projected radius) along the major axes of edge-on galaxies, comparing the properties of barred and unbarred systems. [From Kuijken & Merrifield (1995).]

More sophisticated N-body and hydrodynamic simulations, allowing for the complex collisional behaviour of gas, confirm that this structure should also be apparent in the gas kinematics of an edge-on barred galaxy (Athanassoula & Bureau 1999). This N-body analysis motivated detailed kinematic observations of edge-on disk galaxies, which revealed a remarkably strong correlation: systems in which the central bulge appears round almost all have the simple kinematics one would expect for an axisymmetric galaxy, whereas galaxies with peanut-shaped central bulges almost all display the complex kinematics characteristic of orbits in a barred potential (Bureau & Freeman 1999, Merrifield & Kuijken 1999). Thus, the connection between peanut-shaped structures and bars suggested by the instability found in the N-body models has now been established in real disk galaxies. Here, then, is another excellent example of a case where N-body simulations have not only produced a prediction as to how galaxies may have evolved to their current structure, but have also provided the motivation for new kinematic observations that confirm this prediction.
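Returning briefly to the Tremaine & Weinberg (1984) method discussed earlier in this section, the pattern-speed estimate reduces to a ratio of two luminosity-weighted integrals evaluated along a strip parallel to the apparent major axis. The sketch below shows only this final arithmetic; the surface-brightness and velocity arrays are placeholders standing in for long-slit or integral-field measurements (or for pseudo-observations of a simulation snapshot), and i denotes the disk inclination.

    import numpy as np

    def tremaine_weinberg(x, sigma, v_los):
        """Pattern-speed estimator Omega_p * sin(i) = <Sigma * v_los> / <Sigma * x>,
        with both integrals taken along a strip parallel to the major axis.
        x: position along the strip; sigma: surface brightness; v_los: mean line-of-sight velocity."""
        numerator = np.trapz(sigma * v_los, x)
        denominator = np.trapz(sigma * x, x)
        return numerator / denominator

    # Placeholder data for a single strip; real measurements would replace these arrays.
    x = np.linspace(-5.0, 5.0, 201)                    # kpc
    sigma = np.exp(-np.abs(x) / 2.0)                   # arbitrary surface-brightness profile
    v_los = 30.0 * x * np.exp(-np.abs(x) / 2.0)        # arbitrary velocity profile [km/s]
    print("Omega_p * sin(i) =", tremaine_weinberg(x, sigma, v_los), "km/s/kpc")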
6 The future
Hopefully, the examples described in this article have given some sense of the productive interplay between kinematic observations of galaxies and N-body simulations of these systems, and there is every reason to believe that this relationship will continue to thrive as the fields develop. On the observational side, kinematic data sets become ever more expansive: the construction of integral field units for spectrographs has made it possible to obtain spectra for complete two-dimensional patches on the sky, thus allowing one to map out the complete observable kinematics of a galaxy, F(x, y, v_los), in a single observation. In the N-body work, developments in computing power result in ever-larger numbers of
particles in the code, allowing finer structure to be resolved, and giving some confidence that the results are not compromised by the limitations in the Monte Carlo solution of Equation 2. More powerful computers also allow one to analyse the completed N-body simulations more thoroughly: for example, when comparing transient spiral features in real galaxies to those in a simulation, one can search through the entire evolution of the simulation to see whether there are any times at which the data match the model.

Traditionally, one weakness in combining N-body analysis with kinematic observations is that although the simulations are very good at analysing the generic properties of galaxies, they do not provide a useful tool for modelling the specific properties of individual objects. However, there is now the intriguing possibility that this shortcoming could be overcome, through Syer & Tremaine's (1996) introduction of the idea of a "made-to-measure" N-body simulation. In such N-body simulations, in addition to its phase-space coordinates, each particle also has a weight associated with it. This weight can be equated with that particle's contribution to the total "luminosity" of the model. Syer & Tremaine presented an algorithm by which the weights can be adjusted as the N-body simulation progresses, such that the observable properties of the model evolve in any way one might wish while still providing a good approximation to a solution to the collisionless Boltzmann equation. Thus, for example, one can take as a set of initial conditions a simple analytic distribution function, and "morph" this model into a close representation of a real galaxy. In fact, one can go beyond just the photometric properties of the galaxy, and match the N-body model to kinematic data as well, thus yielding a powerful dynamical modelling tool. Syer & Tremaine's initial implementation of this method was fairly rudimentary: for example, they did not solve self-consistently for the galaxy's gravitational potential, but instead imposed a fixed mass distribution. However, there appears to be no fundamental reason why a more complete made-to-measure N-body code could not be developed as a sophisticated technique for modelling real galaxy dynamics.

There has also been a lot of progress in the techniques of stellar population synthesis (e.g. Bruzual & Charlot 1993, Worthey 1994). This approach involves determining the combination of stellar types, ages and metallicities that could be responsible for the integrated light properties of a galaxy such as its colours and spectral line strengths. Thus, one can now go beyond the simple-minded dynamicist's picture of a galaxy made up from a large population of identical stars, as assumed in Section 2; instead, one can begin to pick out the range of ages and metallicities that could be present in a galaxy, and even ask whether the different populations have different kinematics. Here, an extension of the made-to-measure N-body approach presents an exciting possibility. In addition to a weight, one could associate an age and a metallicity with each particle. One could then synthesise the stellar population associated with that particle, and hence calculate its contribution to the total spectrum of the galaxy. Projecting such an N-body model on to the sky, one could calculate the spectrum associated with any region of the model galaxy by simply adding up the spectral contributions from the individual particles (suitably Doppler shifted by their line-of-sight velocities).
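The weight-adjustment idea at the heart of the made-to-measure method can be illustrated with a deliberately simplified toy: static particles and a single set of binned-density observables, both invented for illustration, in place of Syer & Tremaine's full scheme in which the particles orbit in the potential while their weights evolve. Each weight is nudged against the fractional mismatch of the observable to which its particle contributes.

    import numpy as np

    rng = np.random.default_rng(3)
    n_part, n_bins = 5000, 10
    r = rng.uniform(0.0, 1.0, n_part)                 # stand-in particle coordinates
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_of = np.digitize(r, edges) - 1                # which observable each particle feeds

    w = np.ones(n_part)                               # particle weights (the quantities to adjust)
    target = np.linspace(2.0, 0.5, n_bins)            # invented target observables Y_j
    target *= n_part / target.sum()                   # same total "luminosity" as the initial model

    eps = 0.05                                        # adjustment rate
    for step in range(500):
        model = np.bincount(bin_of, weights=w, minlength=n_bins)   # y_j = sum_i w_i K_j(z_i)
        delta = (model - target) / target                          # fractional mismatch per observable
        w *= 1.0 - eps * delta[bin_of]                # nudge each weight against its bin's mismatch

    final = np.bincount(bin_of, weights=w, minlength=n_bins)
    print("max fractional residual:", np.abs((final - target) / target).max())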
By using the sorts of N-body morphing techniques introduced by Syer & Tremaine (1996), one could then evolve an N-body simulation until it matched the properties of a real galaxy not only in its light distribution and kinematics, but also in its colours, the strengths of all its spectral absorption lines, etc. This complete spectral modelling - in essence, a galaxy model that would fit the spatial coordinates and energy of every detected photon - would represent the ultimate match between N-body simulations and observations. It would be a truly amazing tool for use in the study of
galaxy dynamics, and would allow us to integrate the evolution of the galaxy's stellar population into the dynamical picture, opening up a whole new dimension of information in the study of galaxy formation, evolution and structure.
References

Athanassoula E and Bureau M, 1999, ApJ 522 699.
Athanassoula E, 1996, in Barred Galaxies, Astronomical Society of the Pacific, edited by Buta R, Croker D A and Elmegreen B G.
Barnes J E and Hernquist L, 1992, ARA&A 30 705.
Binney J and Mamon G A, 1982, MNRAS 200 361.
Binney J and Merrifield M, 1998, "Galactic Astronomy," Princeton University Press.
Bruzual A G and Charlot S, 1993, ApJ 405 538.
Bureau M and Freeman K C, 1999, AJ 118 126.
Combes F and Sanders R H, 1981, A&A 96 164.
Cretton N, Rix H-W and de Zeeuw P T, 2000, ApJ 536 319.
Davies J, 1991, in Dynamics of Disc Galaxies, edited by Sundelius B, Goteborg, p65.
Debattista V P and Sellwood J A, 2000, ApJ 543 704.
de Souza R E and dos Anjos S, 1987, A&A 70 465.
Gerhard O E, 1993, MNRAS 265 213.
Hohl F, 1971, ApJ 168 343.
Holmberg E, 1941, ApJ 94 385.
Jessop C M, Duncan M J and Levison H F, 1997, ApJ 489 49.
Kay S T et al., 2000, MNRAS 316 374.
King I, 1966, AJ 71 64.
Kuijken K and Merrifield M R, 1993, MNRAS 264 712.
Kuijken K and Merrifield M R, 1995, ApJ 443 L13.
Kuijken K and Merrifield M R, 1998, MNRAS 297 1292.
Malin D F and Carter D, 1983, ApJ 274 534.
Merrifield M R and Kuijken K, 1995, MNRAS 274 933.
Merrifield M R and Kuijken K, 1999, A&A 345 L47.
Merritt D, 1987, ApJ 319 55.
Merritt D, 1997, AJ 114 228.
Naab T, Burkert A and Hernquist L, 1999, ApJ 523 L133.
Navarro J F, Frenk C and White S D M, 1997, ApJ 490 493.
Navarro J F and Steinmetz M, 2000, ApJ 538 477.
Newton A J and Binney J, 1984, MNRAS 210 711.
Quinn P J, 1984, ApJ 279 596.
Raha N, Sellwood J A, James R A and Kahn F D, 1991, Nature 352 411.
Rybicki G B, 1986, in Proc IAU Symp 127, The Structure and Dynamics of Elliptical Galaxies, edited by de Zeeuw P T, Dordrecht, p. 397.
Schwarzschild M, 1979, ApJ 232 236.
Sellwood J A and Moore E M, 1999, ApJ 510 125.
Statler T, 1994, ApJ 425 500.
Syer D and Tremaine S, 1996, MNRAS 282 223.
Toomre A and Toomre J, 1972, ApJ 178 623.
Wilson C P, 1975, AJ 80 175.
Worthey G, 1994, ApJS 95 107.
Zwicky F, 1937, ApJ 86 217.
Non-integrable galactic dynamics

David Merritt
Rutgers University, New Brunswick, NJ, USA
1 Introduction

Galaxies have traditionally been viewed as integrable or nearly integrable systems, in which the majority of stellar orbits are regular, respecting as many integrals of motion as there are degrees of freedom. Three arguments have commonly been cited in support of this view. First, many reasonable potentials contain only modest numbers of stochastic orbits. (The terms "stochastic" and "chaotic" will be used interchangeably here.) This is always true for the potentials of rotationally symmetric models, and there is even a class of non-axisymmetric potentials for which the motion is globally integrable, including the famous "perfect ellipsoid" (Kuzmin 1973; de Zeeuw & Lynden-Bell 1985). Second, stochastic orbits often behave in ways that are very similar to regular orbits over astronomically interesting time scales. Therefore (it is argued) one need not make a sharp distinction between regular and stochastic orbits when constructing an equilibrium model. Third, following the successful construction by Schwarzschild (1979, 1982) of self-consistent triaxial equilibria, it has generally been assumed that the regular orbits, which are confined to narrow regions of phase space and thus have definite shapes, are the fundamental building blocks of real galaxies.

Schwarzschild's discovery that many orbits in non-axisymmetric potentials are effectively regular came as a surprise, since triaxial potentials admit only one classical integral of the motion, the energy. In fact a modest fraction of the orbits in Schwarzschild's models were subsequently shown to be stochastic (Merritt 1980; Goodman & Schwarzschild 1981), though only weakly. But it was clear early on that certain modifications of Schwarzschild's potential could lead to a much larger fraction of chaotic orbits. For instance, Gerhard & Binney (1985) showed that the addition of a central density cusp or "black hole" (i.e. point mass) to an otherwise integrable triaxial model would render most of the centre-filling box orbits unstable, due to deflections that occur when a trajectory comes close to the centre. This insight was followed by the discovery (Crane et al. 1993; Ferrarese et al. 1994) that stellar spheroids generically contain power-law cusps in the luminosity density rather than constant-density cores. Evidence for central supermassive black holes also gradually accumulated (Kormendy & Richstone 1995). It is now believed - not only that black holes are universal components of galactic nuclei - but that their masses are
predictable with high precision given the global properties of their host spheroids (Ferrarese & Merritt 2000; Merritt & Ferrarese 2000). Thus it is no longer possible to discuss galaxy dynamics in terms of idealised models like Schwarzschild's with finite central densities.

Non-integrability has two important consequences. First, some orbits in non-integrable potentials respect fewer isolating integrals than there are degrees of freedom. Such orbits are chaotic and behave in ways that are very different from regular orbits: they are exponentially unstable to small perturbations, and occupy a phase-space region of larger dimensionality than the invariant tori of regular orbits. The time-averaged shape of a chaotic orbit is similar to that of an equipotential surface, and hence such orbits are much less "useful" than regular orbits for reinforcing the shape of the galaxy's figure. Second, while regular orbits generally still exist in potentials that are not globally integrable, they are strongly influenced by resonances between the frequencies of motion in different directions. These resonances are present even in globally integrable potentials but have no effect on the structure of phase space; in non-integrable potentials, however, the resonances divide up phase space into alternating regions of regular and chaotic motion, with the lowest-order resonances "capturing" the largest parts of phase space. Most regular orbits in non-integrable potentials can be associated with a definite resonance and have a shape that reflects the order of the resonance.

This article reviews the following topics: (1) torus construction, a set of techniques for characterising regular motion in non-integrable potentials and for detecting departures from integrability; (2) resonances and their effect on the structure of orbits; (3) the orbital content of triaxial potentials with central point masses; (4) mixing, the process by which the phase-space density of stellar systems approaches a steady state; and (5) the relation between chaos in the gravitational N-body problem and chaos in smooth potentials.
2 Torus construction
In systems with a single degree of freedom, constancy of the energy allows the momentum variable p to be written in terms of the coordinate variable q via H(p, q) = E, and the dependence of both variables on time follows immediately from Hamilton's equations. In general systems with N ≥ 2 degrees of freedom (DOF), such a solution is generally not possible unless the Hamilton-Jacobi equation is separable, in which case the separation constants are isolating integrals of the motion. An isolating integral is a conserved quantity that in some transformed coordinate system makes ∂H/∂p_i = f(q_i), thus allowing the motion in q_i to be reduced to quadratures. Each isolating integral restricts the dimensionality of the phase-space region accessible to an orbit by one; if there are N such integrals, the orbit moves in a phase space of dimension 2N − N = N, and the motion is regular. The N-dimensional phase-space region to which a regular orbit is confined is topologically a torus (Figure 1).

Orbits in time-independent potentials may be either regular or chaotic; chaotic orbits respect a smaller number of integrals than N, typically only the energy integral E. Although chaotic orbits are not confined to tori, numerical integrations suggest that many chaotic trajectories are effectively regular, remaining confined for long periods of time to regions of phase space much more restricted than the full energy hypersurface.

The most compact representation of a regular orbit is in terms of the coordinates on
the torus (Figure 1) - the action-angle variables (J, θ). The process of determining the map (x, v) → (J, θ) is referred to as torus construction.

Figure 1. Invariant torus defining the motion of a regular orbit in a two-dimensional potential. The torus is determined by the values of the actions J_1 and J_2; the position of the trajectory on the torus is defined by the angles θ_1 and θ_2, which increase linearly with time, θ_i = ω_i t + θ_i^0.

There are a number of contexts in which it is useful to know the (J, θ). One example is the response of orbits to slow changes in the potential, which leave the actions (J) unchanged. Another is the behaviour of weakly chaotic orbits, which may be approximated as regular orbits that slowly diffuse from one torus to another. A third example is galaxy modelling, where regular orbits are most efficiently represented and stored via the coordinates that define their tori.

Two general approaches to torus construction have been developed. Trajectory-following algorithms are based on the quasi-periodicity of regular motion: Fourier decomposition of the trajectory yields the fundamental frequencies on the torus as well as the spectral amplitudes, which allow immediate construction of the map θ → x in the form of a Fourier series. Iterative approaches begin from some initial guess for x(θ), which is then refined via Hamilton's equations with the requirement that the θ_i increase linearly with time. The two approaches are often complementary, as discussed below.
2.1 Regular motion
In certain special potentials, every orbit is regular; examples are the Kepler and Stäckel potentials. Motion in such globally-integrable potentials can be expressed most simply by finding a canonical transformation to coordinates (p, q) for which the Hamiltonian is independent of q, H = H(p); among all such coordinates, one particularly simple choice is the action-angle variables (J_i, θ_i), in terms of which the equations of motion are
  J_i = constant,    θ_i = ω_i t + θ_i^0,    ω_i = ∂H/∂J_i,    i = 1, ..., N    (1)
(Landau & Lifshitz 1976; Goldstein 1980). The trajectory x(J, θ) is periodic in each of the angle variables θ_i, which may be restricted to the range 0 < θ_i ≤ 2π. The J_i define the cross-sectional areas of the torus while the θ_i define the position on the torus (Figure 1).
These tori are sometimes called "invariant" since a phase point that lies on a torus at any time will remain on it forever.

Most potentials are not globally integrable, but regular orbits may still exist; indeed these are the orbits for which torus construction machinery is designed. One expects that for a regular orbit in a non-integrable potential, a canonical transformation (x, v) → (J, θ) can be found such that

  dJ_i/dt = 0,    dθ_i/dt = ω_i,    i = 1, ..., N.    (2)

However there is no guarantee that the full Hamiltonian will be expressible as a continuous function of the J_i as in globally integrable potentials. In general, the map (x, v) → (J, θ) will be different for each orbit and will not exist for those trajectories that do not respect N isolating integrals. The uniform translation of a regular orbit on its torus implies that the motion in any canonical coordinates (x, v) is quasi-periodic:

  x(t) = Σ_k X_k(J) exp[i(l_k ω_1 + m_k ω_2 + n_k ω_3) t],
  v(t) = Σ_k V_k(J) exp[i(l_k ω_1 + m_k ω_2 + n_k ω_3) t],    (3)
with (l_k, m_k, n_k) integers. The Fourier transform of x(t) or v(t) will therefore consist of a set of spikes at discrete frequencies ω_k = l_k ω_1 + m_k ω_2 + n_k ω_3 that are linear combinations of the N fundamental frequencies ω_i, with spectral amplitudes X_k(J) and V_k(J).
2.2 Trajectory-following approaches
The most straightforward, and probably the most robust, approach to torus construction is via Fourier analysis of the numerically-integrated trajectories (Percival 1974; Boozer 1982; Binney & Spergel 1982, 1984; Kuo-Petravic et al. 1983; Eaker et al. 1984; Martens & Ezra 1985). The Fourier decomposition of a quasiperiodic orbit (Equation 3) yields a discrete frequency spectrum. The precise form of this spectrum depends on the coordinates in which the orbit is integrated, but certain of its properties are invariant, including the N fundamental frequencies ω_i from which every line is made up, ω_k = l_k ω_1 + m_k ω_2 + n_k ω_3. Typically the strongest line in a spectrum lies at one of the fundamental frequencies: once the ω_i have been identified, the integer vectors (l_k, m_k, n_k) corresponding to every line ω_k are uniquely defined, to within computational uncertainties. Approximations to the actions may then be computed using Percival's (1974) formulae; e.g. the action associated with θ_1 in a 3 DOF system is

  J_1 = Σ_k l_k (l_k ω_1 + m_k ω_2 + n_k ω_3) |X_k|².    (4)
Finally, the maps (θ → x) are obtained by making the substitution ω_i t → θ_i in the spectrum, e.g.

  x(t) = Σ_k X_k(J) exp[i(l_k ω_1 + m_k ω_2 + n_k ω_3) t]
       = Σ_k X_k(J) exp[i(l_k θ_1 + m_k θ_2 + n_k θ_3)]
       = x(θ_1, θ_2, θ_3).    (5)
Trajectory-following algorithms are easily automated; for instance, integer programming may be used to recover the vectors (l_k, m_k, n_k) (Valluri & Merritt 1998). Binney & Spergel (1982) pioneered the use of trajectory-following algorithms for galactic potentials. They integrated orbits for a time T and computed discrete Fourier transforms, yielding spectra in which each frequency spike was represented by a peak with finite width ~π/T centred on ω_k. They then fitted these peaks to the expected functional form X_k sin[(ω − ω_k)T]/(ω − ω_k) using a least-squares algorithm. They were able to recover the fundamental frequencies in a 2 DOF potential with an accuracy of ~0.1% after ~25 orbital periods. Binney & Spergel (1984) used Equation (4) to construct the "action map" for orbits in a principal plane of the triaxial logarithmic potential. Carpintero & Aguilar (1998) have applied similar algorithms to motion in 2- and 3 DOF potentials. The accuracy of Fourier transform methods can be greatly improved by multiplying the time series with a windowing function before transforming. The result is a reduction in the amplitude of the side lobes of each frequency peak at the expense of a broadening of the peaks; the amplitude measurements are then effectively decoupled from any errors in the determination of the frequencies. Laskar (1988, 1990) developed this idea into a set of tools, the "numerical analysis of fundamental frequencies" (NAFF), which he applied to the analysis of weakly chaotic motion in the solar system. Laskar's algorithm recovers the fundamental frequencies with an error that falls off as T⁻⁴ (Laskar 1996), compared with ~T⁻¹ in algorithms like Binney & Spergel's (1982). Even for modest integration times of ~10² orbital periods, the NAFF algorithm is able to recover fundamental frequencies with very high accuracy in many potentials. The result is a very precise representation of the torus (Figure 2).
Since Fourier techniques focus on the frequency domain, they are particularly well suited to identifying regions of phase space associated with resonances. Resonant tori are places where perturbation expansions of integrable systems break down, due to the "problem of small denominators". In perturbed (non-integrable) potentials, one expects stable resonant tori to generate regions of regular motion and unstable resonant tori to give rise to chaotic regions. Algorithms like NAFF allow one to construct a "frequency map" of the phase space: a plot of the ratios of the fundamental frequencies (ω_1/ω_3, ω_2/ω_3) for a large set of orbits selected from a uniform grid in initial condition space. Resonances appear on the frequency map as lines; either densely filled lines in the case of stable resonances, or gaps in the case of unstable resonances; the frequency map is effectively a representation of the Arnold web (Laskar 1993). Resonances are discussed in more detail in §3.
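The core of a trajectory-following analysis can be sketched in a few lines: integrate an orbit, window the time series, and locate the strongest spectral line. The fragment below does this for planar motion in a logarithmic potential; it is a bare-bones stand-in for NAFF (a simple windowed FFT peak rather than Laskar's refined frequency fitting), and the potential parameters and initial conditions are purely illustrative.

    import numpy as np
    from scipy.integrate import solve_ivp

    v0, q, rc = 1.0, 0.9, 0.14             # illustrative logarithmic-potential parameters

    def rhs(t, w):
        x, y, vx, vy = w
        s = x**2 + y**2 / q**2 + rc**2     # Phi = 0.5 v0^2 ln(x^2 + y^2/q^2 + rc^2)
        return [vx, vy, -v0**2 * x / s, -v0**2 * (y / q**2) / s]

    # Integrate a single orbit and sample it at equal time intervals.
    t = np.linspace(0.0, 200.0, 2**14)
    sol = solve_ivp(rhs, (t[0], t[-1]), [0.5, 0.0, 0.0, 0.4], t_eval=t, rtol=1e-10, atol=1e-12)

    # Window the time series (to reduce spectral leakage) and find the strongest line.
    x = sol.y[0] * np.hanning(t.size)
    spectrum = np.abs(np.fft.rfft(x))
    freqs = 2.0 * np.pi * np.fft.rfftfreq(t.size, d=t[1] - t[0])   # angular frequencies
    omega_1 = freqs[np.argmax(spectrum[1:]) + 1]                    # skip the zero-frequency term
    print("leading fundamental frequency ~", omega_1)

Repeating the measurement over a grid of initial conditions, and plotting the recovered frequency ratios, gives the frequency map described above.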
2.3 Iterative approaches
Iterative approaches to torus construction consist of finding successively better approximations to the map θ → x given some initial guess x(θ); canonical perturbation theory is a special case, and in fact iterative schemes often reduce to perturbative methods in appropriate limits. Iterative algorithms were first developed in the context of semi-classical quantisation for computing energy levels of bound molecular systems, and they are still best suited to assigning energies to actions, H(J). Most of the other quantities of interest to galactic dynamicists - e.g. the fundamental frequencies ω_i - are not easily recovered using these algorithms. Iterative schemes also tend to be numerically unstable unless the
initial guess is close to the true solution. On the other hand, iterative algorithms can be more efficient than trajectory-following algorithms for orbits that are near (but not exactly on) resonances.

Figure 2. Construction of a 2 DOF box-orbit torus in a Stäckel potential using the NAFF trajectory-following algorithm. (a) The orbit and its actions, computed via Equation (4) using k_max terms; dashed lines show the exact J_i. (b) The map y(θ_1, θ_2); dashed contours correspond to negative values of y. Δ(k_max) is the RMS error in the reconstructed map, calculated using an equation similar to (5).

Ratcliff, Chang & Schwarzschild (1984) pioneered iterative schemes in galactic dynamics. They noted that the equations of motion of a 2 DOF regular orbit,
  d²x/dt² = −∂Φ/∂x,    d²y/dt² = −∂Φ/∂y,    (6)

can be written in the form

  (ω_1 ∂/∂θ_1 + ω_2 ∂/∂θ_2)² x = −∂Φ/∂x,    (ω_1 ∂/∂θ_1 + ω_2 ∂/∂θ_2)² y = −∂Φ/∂y.    (7)
If one specifies ω_1 and ω_2 and treats ∂Φ/∂x and ∂Φ/∂y as functions of the θ_i, equations (7) can be viewed as nonlinear differential equations for x(θ_1, θ_2) and y(θ_1, θ_2). Ratcliff et al. expressed the coordinates as Fourier series in the angle variables,

  x(θ) = Σ_n X_n e^(i n·θ),    y(θ) = Σ_n Y_n e^(i n·θ).    (8)
Substituting (8) into (7) gives

  Σ_n (n·ω)² X_n e^(i n·θ) = ∂Φ/∂x,    Σ_n (n·ω)² Y_n e^(i n·θ) = ∂Φ/∂y,    (9)
where the right hand side is again understood to be a function of the angles. Ratcliff et al. truncated the Fourier series and required equations (9) to be satisfied on a grid of points around the torus. They then solved for the X_n by iterating from an initial guess. Convergence was found to be possible if the initial guess was close to the exact solution. A similar algorithm was developed for recovering tori in the case that the actions, rather than the frequencies, are specified a priori. Guerra & Ratcliff (1990) applied these algorithms to motion in the plane of rotation of a non-axisymmetric potential.

Another iterative approach to torus construction was developed by Chapman, Garrett & Miller (1976) in the context of semiclassical quantum theory. One begins by dividing the Hamiltonian H into separable and non-separable parts H_0 and H_1, then seeks a generating function S that maps the known tori of H_0 into tori of H. For a generating function of the F_2 type (Goldstein 1980), one has
  J(θ, J′) = ∂S/∂θ,    θ′(θ, J′) = ∂S/∂J′,    (10)
where (J, θ) and (J′, θ′) are the action-angle variables of H_0 and H respectively. The generator S is determined, for a specified J′, by substituting the first of equations (10) into the Hamiltonian and requiring the result to be independent of θ. One then arrives at H(J′). Chapman et al. showed that a sufficiently general form for S is

  S(θ, J′) = θ·J′ − i Σ_(n≠0) S_n(J′) e^(i n·θ),    (11)
where the first term is the identity transformation, and they evaluated a number of iterative schemes for finding the S_n. One such scheme was found to recover the results of first-order perturbation theory after a single iteration. McGill & Binney (1990) applied the Chapman et al. algorithm to 2 DOF motion in the axisymmetric logarithmic potential.

The generating function approach is not naturally suited to deriving the other quantities of interest to galactic dynamicists. For instance, equation (10) gives θ′(θ) as a derivative of S, but since S must be computed separately for every J′ its derivative is likely to be ill-conditioned. Binney & Kumar (1993) and Kaasalainen & Binney (1994a) discussed two schemes for finding θ′(θ); the first requires the solution of a formally infinite set of equations, while the latter requires multiple integrations of the equations of motion for each torus - effectively a trajectory-following scheme. Warnock (1991) presented a hybrid scheme in which the generating function S was derived by numerically integrating an orbit from appropriate initial conditions, transforming the coordinates to (J, θ) of H_0 and interpolating J on a regular grid in θ. The values
of the S_n then follow from the first equation of (10) after a discrete Fourier transform. Kaasalainen & Binney (1994b) found that Warnock's scheme could be used to refine substantially the solutions found via their iterative algorithm. Another hybrid scheme was discussed by Reiman & Pomphrey (1991). Having computed the energy on a grid of J′ values, one can interpolate to obtain the full Hamiltonian H(J′). If the system is not in fact completely integrable, this H may be rigorously interpreted as a smooth approximation to the true H (Warnock & Ruth 1991, 1992) and can be taken as the starting point for secular perturbation theory. Kaasalainen (1994) developed this idea and showed how to recover accurate surfaces of section in the neighbourhood of low-order resonances in the planar logarithmic potential. Percival (1977) described a variational principle for constructing tori. His technique has apparently not yet been implemented in the context of galactic dynamics.
2.4 Chaotic motion

Torus-construction machinery may be applied to orbits that are approximately, but not precisely, regular (Laskar 1993). The frequency spectrum of a weakly chaotic orbit will typically be close to that of a regular orbit, with most of the lines well approximated as linear combinations of three "fundamental frequencies" ω_i. However these frequencies will change with time as the orbit migrates from one "torus" to another. The diffusion rate can be measured via quantities like δω = |ω_1 − ω_1′|, the change in a "fundamental frequency" over two consecutive integration intervals. Papaphilippou & Laskar (1996, 1998), Valluri & Merritt (1998) and Wachlin & Ferraz-Mello (1998) used this technique to study chaos and diffusion in triaxial galactic potentials.

Measuring chaos via quantities like δω has a number of advantages over the traditional technique based on computation of the Liapunov exponents (Lichtenberg & Lieberman 1992). δω can be accurately determined after just a few tens of orbital periods, whereas determination of the Liapunov exponents may require much longer integrations. The Liapunov exponents measure only the rate of growth of infinitesimal perturbations around the trajectory, while δω measures the finite "movement" of the trajectory in action-angle space, a more physically interesting measure of chaos. It is possible for orbits to be extremely unstable in the sense of having large Liapunov exponents, but to behave nearly regularly in the sense of having small δω; an example is presented in §6.
3 Resonances
The character of a regular orbit depends critically on whether the frequencies ω_i are independent, or whether they satisfy one or more nontrivial linear relations of the form

  Σ_(i=1)^(N) m_i ω_i = 0,    (12)
with N the number of degrees of freedom and m_i integers, not all of which are zero. Generally there exists no relation like Equation (12); the frequencies are incommensurate; and the trajectory fills its invariant torus uniformly and densely in a time-averaged sense.
Figure 3. Resonant tori. (a) A two-dimensional torus as a square with identified edges. The plotted trajectory satisfies a 2:1 resonance between the fundamental frequencies, ω_1 − 2ω_2 = 0 (e.g. a "banana"). (b) A three-dimensional torus as a cube with identified sides. The shaded region is covered densely by a resonant trajectory for which 2ω_1 + ω_2 − 2ω_3 = 0. This trajectory is not closed, but it is restricted by the resonance condition to a two-dimensional subset of the torus. The orbit in configuration space is thin.

When one or more resonance relations are satisfied, however, the trajectory is restricted to a phase-space region of lower dimensionality than N. In the case of a two-dimensional regular orbit, the angle variables are
  θ_1 = ω_1 t + θ_10,    θ_2 = ω_2 t + θ_20,    (13)
which define the surface of a torus (Figure 1). Because of the quasi-periodicity of the orbit, its torus can be mapped onto a square in the (θ_1, θ_2)-plane, with each side ranging from 0 to 2π (Figure 3a); the top and bottom of the square are identified with each other, as are the left and right sides. In the general case, the frequencies ω_1 and ω_2 are incommensurate and the trajectory densely covers the entire (θ_1, θ_2)-plane after an infinite time. However if the ratio ω_1/ω_2 = |m_2/m_1| is a rational number, i.e. if m_1 and m_2 are integers, the orbit closes on itself after |m_2| revolutions in θ_1 and |m_1| revolutions in θ_2 and fills only a one-dimensional subset of its torus (e.g. Arnold 1963, p. 164). Its dimensionality in configuration space is also one - the orbit is closed. Such an orbit has a single fundamental frequency ω_0 = ω_1/m_2 = ω_2/m_1 = 2π/T, with T the orbital period; after an elapsed time T, the trajectory returns to its starting point in phase space. Examples of resonant orbits in two-dimensional galactic potentials are the "boxlets" (Miralda-Escudé & Schwarzschild 1989).

In the case of a three-dimensional regular orbit, the angle variables are
  θ_1 = ω_1 t + θ_10,    θ_2 = ω_2 t + θ_20,    θ_3 = ω_3 t + θ_30.    (14)
The orbit may now be mapped into a cube whose axes are identified with the θ_i (Figure 3b). If the ω_i are incommensurate, this cube will be densely filled after a long time. However if a single condition of the form

  m_1 ω_1 + m_2 ω_2 + m_3 ω_3 = 0    (15)
is satisfied with integer m_i, the motion is restricted for all time to a two-dimensional subset of its torus. Such an orbit is not closed; instead, as suggested by Figure 3b, it is
thin, confined to a sheet or membrane in configuration space, which it fills densely after infinite time. Just as in the two-dimensional case, the condition (15) may be used to reduce the number of independent frequencies by one. Defining the two "base" frequencies ω_0^(1), ω_0^(2) as

  ω_0^(1) = ω_3/m_1,    ω_0^(2) = ω_2/m_1,    (16)

we may write

  ω_1 = −m_3 ω_0^(1) − m_2 ω_0^(2),    ω_2 = m_1 ω_0^(2),    ω_3 = m_1 ω_0^(1).    (17)

Figure 4. Surfaces filled by a set of thin, or resonant, box orbits in a non-integrable triaxial potential (Merritt & Valluri 1999), as seen from vantage points on each of the three principal axes. The cross sections of these orbits are shown in Figure 5.
Since the motion is quasi-periodic, i.e.
x(t) =
C
~k
exp i ( 1 k w 1 +
mkw2
+ nkw3) t,
(18)
k
with
( l k , m k , nk) integers,
it will remain quasi-periodic when expressed in terms of the two
Non-integrable galactic dynamics
155
(3,0,-2)
Z
Y
x +
2 Y -
X -
Figure 5 . Intersections with the principal planes of the thin box orbits shown in Figure 4. Because the orbits are thin, their intersections with any plane define a curve or set of curves. The centre of the potential is indicated b y a cross. base frequencies:
A Fourier transform of the motion will therefore consist of a set of spikes whose locations can be expressed as linear combinations of just two frequencies. Equation (19) is a parametric expression for the Cartesian coordinates in terms of the angles on the 2-torus, i.e. it is a reconstruction of the (reduced) torus. A number of examples of resonant box orbits reconstructed in this way are illustrated in Figures 4 and 5 .
David Merri t t
156
Certain special orbits may satisfy two independent resonance relations simultaneously. In this case we can write:
and each frequency wi may be expressed as a rational fraction of any other:
-U1_ - m2n3 - m3n2 -- -11
w2
- m3n1 - m1n3
-
12
--(21) m1n2- m2n1 13’ w3 mln2 - m2nl 13’ with ( l 1 , l 2 , l 3 ) integers. The motion is therefore periodic with a single base frequency W O = w l / l l = w2/12 = w 3 / / 3 and the trajectory is closed - the orbit is a three-dimensional, closed curve. In a system with N degrees of freedom, N - 1 such conditions are required for closure; only in the 2 DOF case does a single resonance condition imply closure. w3
Following PoincarC (1892), it has commonly been assumed that closed orbits are the fundamental “building blocks” of phase space. However in three-dimensional potentials, one expects thin orbits to be more common than closed ones, in the sense that orbits satisfying one resonance condition are more likely than orbits satisfying two. Hence one expects that most regular orbits will be associated with families whose parent is a thin orbit. Numerical integrations of orbits in realistic non-axisymmetric potentials suggest that this is in fact the case: the majority of regular orbits have most of their “power” in frequencies that lie close to linear combinations of two fundamental frequencies (thin orbit) rather than one frequency (closed orbit) (Merritt & Valluri 1999).
4
Triaxial potentials with central singularities
Non-integrability is likely to be a generic feature of galactic potentials, for two reasons. First, galaxies are often observed to be non-axisymmetric, either due to the presence of embedded subsystems like bars, or because the stellar distribution is globally triaxial. Observational evidence for global triaxiality in elliptical galaxies is not particularly strong; few ellipticals exhibit significant minor-axis rotation (Franx, Illingworth & de Zeeuw 1991), and detailed modelling of a handful of nearby ellipticals suggests that their kinematics can often be very well reproduced by assuming axisymmetry (e.g. van der Mare1 et al. 1998). However, at least some elliptical galaxies and bulges exhibit clear kinematical signatures of non-axisymmetry (e.g. Schechter & Gunn 1979; Franx, Illingworth & Heckman 1989), and the observed distribution of Hubble types is likewise inconsistent with the assumption that all ellipticals are precisely axisymmetric (Tremblay & Merritt 1995, 1996; Ryden 1996). Mergers between disk galaxies also produce generically triaxial systems (Barnes 1996), and departures from axisymmetry (possibly transient) are widely argued to be necessary for the rapid growth of nuclear black holes during the quasar epoch (Shlosman, Begelman & Frank 1990), for the fuelling of starburst galaxies (Sanders & Mirabel 1996), and for the large radio luminosities of some ellipticals (Bicknell et al. 1997). These arguments suggest that most elliptical galaxies or bulges may have been triaxial at an earlier epoch, and perhaps that triaxiality is a recurrent phenomenon induced by mergers or other interactions. The second feature of galactic potentials conducive to non-integrability is the apparently universal presence at the centres of stellar spheroids of high stellar densities and
Non-integrable galactic dynamics
157
supermassive black holes. Low-luminosity ellipticals and bulges have stellar luminosity profiles that diverge as unbroken power laws at small radii, p r-7, with y M 2. Brighter galaxies also exhibit power laws in the space density of stars, but with shallower slopes, y 5 1; seen in projection, these weaker cusps appear as cores (Kormendy 1985). The gravitational force in an r W 2density cusp diverges as r - l , not steep enough to produce large-angle deflections in the motion of stars that pass near the centre. However galaxies also contain supermassive black holes, with masses that correlate astonishingly well with the velocity dispersion of the stars (Ferrarese & Merritt 2000); the ratio of black hole mass to spheroid mass is 0.0015 with small scatter (Merritt & Ferrarese 2000). The combination of non-axisymmetry in the potential with a steep central force gradient is conducive to non-integrability and chaos, since many orbits in such potentials pass near the centre where they undergo strong gravitational deflections (Gerhard and Binney 1985). N
N
V regular
Figure 6. Three zones in the phase space of triaxial potentials (see text). In a triaxial potential containing a central point mass, the phase space divides naturally into three regions depending on energy, i.e. on distance from the centre (Figure 6). In the innermost region, where the enclosed mass in stars is less than the mass of the black hole, the potential is dominated by the central singularity and the motion is essentially regular. The gravitational force from the stars acts as a small perturbation causing the nearly-Keplerian orbits around the black hole to precess slowly. The two major orbit families in this region are (a) the tube orbits, high angular momentum orbits that avoid the centre and (b) the pyramid orbits, Keplerian ellipses that precess in two orthogonal planes parallel to the short axis of the figure (Sridhar & Touma 1999; Sambhus & Sridhar 2000; Poon & Merritt 2001). Pyramid orbits are similar to the classical box orbits of integrable triaxial potentials except that their elongation is counter to that of the triaxial figure, making them less useful for self-consistently reconstructing a galaxy’s shape. At intermediate radii, the black hole acts as a scattering centre rendering almost all of the centre-filling or box orbits stochastic. (Tube orbits persist at these and higher energies and remain mostly regular.) This “zone of chaos” extends from a few times r g , the radius where the black hole dominates the gravitational force, out to a radius where the enclosed stellar mass is roughly lo2 times the mass of the black hole. The transition to chaos at r 2 rg is very rapid and occurs at lower energies in more elongated potentials (Poon & Merritt 2001). If the black hole mass exceeds times the mass of the stellar spheroid, as it may do in a few galaxies (Merritt & Ferrarese 2000), the chaotic zone will include essentially N
David Merrit t
158
-1
0
orbit
1
-1
0
1
orbit
Figure 7. Non-integrability i n triaxial potentials (Merritt & Valluri 1999). The mass model in (a) has a weak (7= 0.5) density cusp and no black hole; i n (b) the black hole contains 0.3% of the total mass. Each panel shows one octant of an equipotential surface, lying close to the half-mass radius of the model; the 2 (short) axis is vertical and the x (long) axis is to the left. The grey scale measures the degree of stochasticity of orbits started with zero velocity on the equipotential surface. Stable resonance zones - the white bands in (a) and (b) - are labelled b y their defining integers ( m l ,m2,m3).Panels (c) and (d) show the pericentre distance A of a set of IO3 orbits with starting points along the heavy solid lines in (a) and (b). Panels (e) and ( f ) plot a measure of the chaos for these orbits; G w l w ~is the fractional change in the frequency of the strongest line in the orbit's frequency spectrum.
Non-integrable galactic dynamics
159
the entire potential outside of T ~ .However if M. M 10-3Mg,1, as in the majority of galaxies, there exists a third, outermost region where the phase space is a complex mixture of chaotic and regular trajectories, including resonant box orbits like those in Figures 4 and 5 that remain stable by avoiding the centre (Carpintero & Aguilar 1998; Papaphillipou & Laskar 1998; Valluri & Merritt 1998; Wachlin & Ferraz-Mello 1998). Figure 7 illustrates the complexity of box-orbit phase space at large energies in two triaxial potentials: one with a weak density cusp and the other with a central point mass. N
Non-integrable potentials often exhibit a transition to global stochasticity as the magnitude of some perturbation parameter is increased. The results summarised above suggest that there are two such perturbation parameters associated with motion in triaxial galaxies containing central black holes. In a triaxial galaxy with a given M., the motion of centre-filling orbits undergoes a sudden transition to stochasticity as the energy is increased; the critical value is the energy at which the gravitational force from the stars is of order the force from the black hole. If one imagines increasing M. in an otherwise fixed, triaxial potential, the zone of chaos that extends outward from this radius will eventually encompass the entire potential; this occurs when the second “perturbation parameter,” M./Mgal, exceeds Thus at intermediate radii, in the “zone of chaos,” and perhaps throughout an elliptical galaxy containing a central black hole, triaxiality should be difficult to maintain. N
5
Mixing and collisionless relaxation
Stochastic motion introduces a new time scale into galactic dynamics, the mixing time. Mixing is the process by which a non-uniform distribution of particles in phase space relaxes to a uniform distribution, at least in a coarse-grained sense. A weak sort of mixing, phase mixing, occurs even in integrable potentials, as particles on adjacent tori gradually move apart (Lynden-Bell 1967; Figure 8a). Phase mixing is responsible for the fact that the coarse-grained phase space density in relaxed integrable systems is nearly constant around tori. A stronger sort of mixing takes place in chaotic systems. Chaotic motion is essentially random in the sense that the likelihood of finding a particle anywhere in the stochastic region tends toward a constant value after a sufficiently long time. An initially
Figure 8. ( a ) Phase mixing us. (b) chaotic mixing.
160
David Merri t t
tp t
PT=
6
‘1
Figure 9. Mixing in a triaxial potential with a central point containing 3% of the total mass (Valluri 63 Merritt 2000). Time is in units of the local crossing tame. Ensembles of lo4 phase points were distributed initially (r = 0) in patches on an equipotential surface with zero velocity. compact group of stars should therefore spread out until it covers the accessible phase space region uniformly in a coarse-grained sense (Kandrup & Mahon 1994; Figure 8b). This “chaotic mixing” is irreversible in the sense that an infinitely fine tuning of velocities would be required in order to undo its effects. It also occurs on a characteristic time scale, the Liapunov time associated with exponential divergence of nearby trajectories. Phase mixing, by contrast, has no associated time scale; its rate depends on the range of frequencies associated with orbits in the region of interest, and this rate tends to zero in the case of a set of trajectories drawn from a single invariant torus - a set of points on the torus translates, unchanged, around the torus. Figure 9 shows examples of chaotic mixing in a triaxial potential with a central point mass. Ensembles of orbits were started at rest on an equipotential surface and integrated in tandem for several crossing times. The central point had a mass M. = 0.03 in units of the galaxy mass. The first ensemble (a) was begun on an equipotential surface enclosing
Non-integrable galactic dynamics
161
a mass -3 times that of the central point; for ensembles (b) and (c) these ratios were -7 and -17 respectively - all within the “zone of chaos.” Mixing occurs very rapidly in these ensembles. At the lowest energy (ensemble a), the linear extent of the points in configuration space roughly doubles every crossing time until Tx4, when the volume defined by the equipotential surface appears to be nearly filled. At the highest energy (ensemble c), mixing is slower but substantial changes still take place in a few crossing times. The final distribution of points a t this energy still shows some structure, reminiscent of a box orbit. The irreversibility of mixing flows like the ones illustrated in Figure 9 implies a reduction in the effective number of orbits: all the stochastic trajectories at a given energy are gradually replaced by a single invariant ensemble, whose shape is typically not well matched to that of the galaxy (Merritt & Fridman 1996). If time scales for chaotic mixing are comparable to galaxy lifetimes, this reduction might be expected t o encourage a galaxy to evolve away from a triaxial shape toward a more axisymmetric one, in which most of the orbits are tubes that avoid the destabilising centre. Such evolution has in fact been observed in N-body simulations of the response of a triaxial galaxy to the growth of a central black hole. Merritt & Quinlan (1998) found that a triaxial galaxy evolves to axisymmetry in little more than the local crossing time at each radius when the black hole mass exceeds -2.5% of the total galaxy mass. This is about an order of magnitude larger than the typical black hole mass ratio in real galaxies (Merritt & Ferrarese 2000), but Merritt & Quinlan observed more gradual evolution even when the mass ratio was 10 times smaller, a t a rate that would imply substantial shape changes over a galaxy lifetime. These simulations suggest an explanation for the generally low level of triaxiality observed in real galaxies (Bak & Statler 2000).
6
Chaos in collisional systems
The discussion presented so far has assumed that galaxy potentials are smooth, or “collisionless.” In fact, the gravitational force on a star in a galaxy can be broken up into two components: a rapidly varying component that arises from the discrete distribution of stars, and a smoothly varying component that arises from the large-scale matter distribution. The effects of the discrete component relative to the smooth component are usually assumed to scale as In N / N , the ratio of dynamical to two-body relaxation times. For galaxies, which have N-lO1’, collisional effects should therefore be unimportant.
-
If this were the case, it should be possible to show that the N-body trajectories go over, in the limit of large N , to the orbits in the corresponding smoothed-out potential i.e., that the equations of motion of the N-body problem tend t o the characteristics of the collisionless Boltzmann equation as N + CO. However this has never been demonstrated, and in fact there is an important sense in which the equations of motion in an N-body system do not tend toward the trajectories of the corresponding smooth potential in the limit of large N . This surprising statement is justified in Figure 10, which shows the results of testparticle integrations in a potential consisting of N fixed point masses distributed randomly and uniformly within a triaxial ellipsoid. The mass of each of the N points is m = 1/N, so that the total mass and mean density of the ellipsoid remain constant as N is varied.
David Merri t t
162 6
5 4 UT,
3 2 0
1
0 N
N
I
h
'0
10
5
15
20
t
Figurs 10. Evolution of orbits in a potential consisting of N fixed point masses with m = 1/N, distributed randomly and uniformly an an ellipsoidal volume (Valluri 63 Merritt 2000) (see text). In the limit N -+ CO, one might expect the equations of motion to approach those of a 3-dimensional harmonic oscillator, since the potential of a uniform ellipsoid is quadratic A& (Chandrasekhar 1969). However the upper left-hand in the coordinates, @ = @po panel shows that the Liapunov exponents U of orbits in the N-body potential do not tend to zero with increasing N . Instead, the instability time scale appears to reach a roughly for N 2 lo3. Furthermore constant value (expressed as a fraction of the crossing time TcT) the instability time scale is very short, a fraction of the crossing time!
ci
The generic instability of the N-body problem was first noted by Miller (1964), who calculated the time evolution of the separation between two N-body systems with slightly different initial conditions. He defined this separation as
'1
*(4 = [E(X' - x1)' + E (v2 - vl)
1'2
with x1 and x2 the N configuration-space coordinates in N-body systems 1 and 2 and and v2 the velocities; the summations extend over all the particles. Miller found, for
v1
Non-integrable galactic dynamics
163
4 5 N 5 32, that A grew roughly exponentially with a characteristic time scale that was a fraction of the crossing time, as in the fixed N-body problem of Figure 10. What are the physical implications of this generic instability? Several suggestions have been made. Gurzadyan & Saviddy (1986), who first investigated the large-N dependence of the instability using an idealised model, suggested that the exponential divergence implies chaotic mixing on a similar time scale, and hence that stellar systems should relax much more rapidly than implied by the standard Chandrasekhar formula. Heggie (1991) disagreed, but suggested that the use of smooth potentials for approximating galaxies would need to be abandoned, a t least for studies of orbital instability. Kandrup (1998) suggested that - while individual orbits may always be exponentially unstable - ensembles of N-body systems might behave, on average, as if the potential were smooth. Figure 10 suggests an even stronger way in which the motion goes over to that of the collisionless problem as N -+ m. The open circles in the upper left-hand panel show a second measure of the orbital evolution: the RMS variation, over 20 orbital periods, of the action J , for each ensemble of orbits. Contrary t o the behaviour of the Liapunov exponents, the average changes in the actions tend uniformly to zero as N is increased - in other words, the orbits approach more and more closely, in their macroscopic behaviour, to that of integrable orbits even though they remain locally unstable (as measured by the Liapunov exponents) to a degree that is nearly independent of N . Plots of the trajectories of some typical orbits (lower left panel) confirm this interpretation. These results suggest the way in which trajectories in the N-body problem tend toward those in the corresponding smooth potential: as N is increased, orbits are confined more and more strongly to narrow regions of phase space around the invariant tori of the smooth potential. It is remarkable that orbits can be extremely unstable locally, as measured by their Liapunov exponents, and yet behave macroscopically in a way that is essentially identical to that of regular orbits. Apparently, the exponential growth of perturbations must saturate a t some finite amplitude, and this saturation amplitude must be a decreasing function of N . The lower right-hand panel of Figure 10 verifies this conjecture for a few pairs of orbits with nearly identical initial conditions. The early divergence takes place a t a rate that is independent of N, but for large N, the separation saturates at a value that is much smaller than the size of the system. These pairs of orbits act as if they are confined to the same, restricted region of phase space; saturation occurs when the separation between them is of order the width of this region. The fact that the exponential divergence saturates sooner for larger N suggests that the width of the confining regions decreases with increasing N. These results suggest that collisional relaxation in stellar systems is intimately connected with the evolution of orbits under conditions of weak chaos, i.e., with Arnold diffusion. This connection would be a fruitful topic for future study.
Acknowledgments Some of the work presented here was first published in collaboration with M.Valluri. I am grateful for her permission to reproduce the work here. The preparation of this review was supported by NSF grant AST-0071099 and by NASA grants NAG 5-280315-9046.
164
David Merrit t
References Arnold V I, 1963, Russian Mathematical Surveys, 18,85. Bak J and Statler T, 2000, A J , 120,110. Barnes J , 1996, in The Formation of Galaxies, Proceedings of the V Canary Islands Winter School of Astrophysics, editor Muiioz-Tuiibn C, (Cambridge University Press), 399. Bicknell G V, Koekemoer A, Dopita M A and O’Dea C P, 1997, in The Second Stromlo Symposium: The Nature of Elliptical Galaxies, A S P . Conf. Ser. Vol. 116, editors Arnaboldi M, Da Costa G S and Saha P) (Provo: ASP), 432. Binney J and Kumar S, 1993, MNRAS, 261,584. Binney J and Spergel D, 1982, ApJ, 252,308. Binney J and Spergel D, 1984, MNRAS, 206,159. Boozer A H, 1982, Phys Fluids, 25,520. Carpintero D D and Aguilar L A, 1998, MNRAS, 298,1. Chandrasekhar S, 1969, Ellipsoidal Figures of Equilibrium (New York: Dover). Chapman S, Garrett B C and Miller W H, 1976, J Chem Phys., 64,502. Crane P et al. 1993, A J , 106,1371. de Zeeuw P T and Lynden-Bell D, 1985, MNRAS, 215,713. Eaker C W, Schatz G C, De Leon N and Heller E J, 1984, J Chem Phys , 81,5913. Ferrarese L et al. 1994, A J , 108,1598. Ferrarese L and Merritt D, 2000, ApJ, 539,L9. Franx M, Illingworth G D and de Zeeuw P T, 1991, ApJ, 383, 112. F r a u M,Illingworth G D and Heckman T M, 1989, ApJ, 344,613. Gerhard 0 E and Binney J J, 1985, MNRAS, 216 467. Goldstein H, 1980, Classical Mechanics 2nd ed. (Reading: Addison-Wesley). Goodman J, Heggie D C and Hut P, 1993, ApJ, 415,715. Goodman J and Schwarzschild M, 1981, ApJ, 245, 1087. Guerra D V and Ratcliff S J, 1990, ApJ, 348,127. Gurzadyan V G and Savvidy G K, 1986, A and A , 160,203. Heggie D, 1991, in Predictability, Stability, and Chaos in N-Body Dynamical Systems, editor Roy A E, (Plenum Press, New York) p. 47. Kaasalainen M, 1994, MNRAS, 268, 1041. Kaasalainen M,and Binney J,1994a, MNRAS, 268,1033. Kaasalainen M, and Binney J , 1994b, Phys. Rev Lett , 73,2377. Kandrup H E, 1998, in Long-Range Correlations in Astrophysical Systems, edited by Buchler J R, Dufty J W and Kandrup H E, Ann N Y Acad Sci , 848,28. Kandrup H E and Mahon M E, 1994, Phys Rev E, 49,3735. Kormendy J , 1985, ApJ, 292,L9. Kormendy J and Richstone D 0, 1995, ARA and A, 33, 581. KuePetravic G, Boozer A H, Rome J A and Fowler R H, 1983, J Comp Phys , 51,261. Kuzmin, G G, 1973, in The Dynamics of Galaxies and Star Clusters, ed. G B Omarov (Nauka, Alma Ata). Landau L D and Lifshitz E M, 1976, Mechanics 3rd edition (Oxford: Pergamon). L a s h J, 1988, AAp, 198,341. Laskar J, 1990, Zcarus, 88,266. Laskar J , 1993, Physica D, 67,257. Laskar J , 1996, in Hamiltonian Systems with Three or More Degrees of Fkedom NATO-ASI, editors Simo C and Delshams A (Dordrecht: Kluwer). Lichtenberg A J and Lieberman M A 1992, Regular and Chaotic Dynamics (New York: Springer). Lynden-Bell D, 1967, MNRAS, 136,101.
Non-integrable galactic dynamics
165
Martens C C and Ezra G S, 1985, J Chem Phys , 83,2990. McGill C A, and Binney J, 1990, MNRAS, 244,634. Merritt D,1980, ApJS, 43,435. Merritt D and Ferrarese L, 2000, MNRAS, in press (astro-ph/0009076). Merritt D and Fridman T, 1996, ApJ, 460, 136. Merritt D and Quinlan G, 1998, ApJ, 498,625. Merritt D and Valluri M, 1999, A J , 118,1177. Miller R H, 1964, ApJ, 140,250. Miralda-Escudk J and Schwarzschild M, 1989, ApJ, 339,752. Papaphilippou Y and Easkar J, 1996, A and A , 307,427. Papaphilippou Y and Laskar J, 1998, A and A , 329,451. Percival I C, 1974, J Phys A , 7,794. Percival I C, 1977, J Phys A , 12,57. Poincark H, 1892, Les Me'thodes Nouvelle de la Me'canique Ce'leste Tome I. (Paris: Gauthier-Villars), ch. 3. Poon M and Merritt D, 2001, ApJ, in press (astro-ph/0006447). Ratcliff S J, Chang K M, and Schwarzschild M 1984, ApJ, 279,610. Reiman A H and Pomphrey N, 1991, J Comp Phys , 94,225. Ryden B S, 1996, ApJ, 461,146. Sambhus N and Sridhar S, 2000, Apj, 542, 143. Sanders D B and Mirabel, I F, 1996, A R A A , 34,749 Schechter P L and Gunn J E, 1979, ApJ, 229,472. Schwarzschild M, 1979, ApJ, 232,236. Schwarzschild M, 1982, ApJ, 263,599. Shlosman I, Begelman M C and Frank J, 1990, Nature, 345,679. Sridhar S and Touma J, 1997, MNRAS, 287,L1. Tremblay B and Merritt D, 1995, A J , 110,1039. Tremblay B and Merritt D, 1996, A J , 111, 2243. Valluri M and Merritt D, 1998, ApJ, 506, 686. Valluri M and Merritt D, 2000, in The Chaotic Universe, editors Gurzadyan V G and Ruffini R (Singapore: World Scientific), 229. van der Mare1 R P, Cretton N, de Zeeuw P T and Rix H W, 1998, ApJ, 493,613. Wachlin F C and Ferraz-Mello S, 1998, MNRAS, 298,22. Warnock R L, 1991, Phys Rev D, 66, 1803. Warnock R L and Ruth R D, 1991, Phys Rev Lett , 66, 990. Warnock R L and Ruth R D, 1992, Physica D,56,188.
167
Evolution of galaxies due to self-excitation Martin D Weinberg University of Massachusetts, Amherst, USA
1
Introduction
Much of our effort in understanding the long-term evolution and morphology of galaxies has focused on the equilibria of luminous disks and ellipticals. For example, lopsided ( m = 1) asymmetries are transient with gigayear time scales, bars may grow slowly or suddenly and, under certain circumstances may decay as well. Recent work shows that stellar populations depend on asymmetry. Because the properties of a galaxy depend on its history, an understanding of galaxy evolution requires that we understand the dynamical interplay between all components. These lectures will cover methods for addressing these topics and present some recent results. The first part will emphasise N-body simulation methods which minimise sampling noise. These techniques are based on harmonic expansions and scale linearly with the number of bodies, similar to Fourier transform solutions used in cosmological simulations. Although fast, until recently they were only efficiently used for a small number of geometries and background profiles. I will describe how this so-called expunsion or selfconsistent field method can be generalised to treat a wide range of galactic systems with one or more components. We will work through a simple but interesting two-dimensional example relevant for studying bending modes. These same techniques may be used to study the modes and response of a galaxy to an arbitrary perturbation. In particular, I will describe the modal spectra of stellar systems and the role of damped modes which are generic to stellar systems in interactions and appear to play a significant role in determining the common structures that we see. The general development leads indirectly to guidelines for the number of particles necessary to represent adequately the gravitational field such that the modal spectrum is resolvable. I will then apply this same excitation to understanding the importance of noise to galaxy evolution.
Martin D Weinberg
168
2 2.1
N-body simulation using the expansion method Potential solver overview
A number of 5-body potential solvers have already been mentioned in other lectures. To understand better the motivation for the development here, I will begin by briefly reviewing and contrasting their properties. Many of these have already been reviewed by Hugh Couchman but I would like to make a general point to start: the N-body problem of the galactic dynamicist or cosmologist differs considerably from the N-body problem of the celestial mechanician or the student of star clusters. For galactic or CDM simulations, one really wants a solution to the collisionless Boltzmann equation (CBE), not an N-body system with finite N . A direct solution of the CBE is not feasible, so simulate a galaxy by an intrinsically collisional problem of n-bodies but with parameters that best yield a solution to the CBE. In other words, you should consider an N-body simulation in this application as an algorithm for Monte Carlo solution of the CBE. The N bodies should be considered tracers of the density field that we simultaneously use to solve for the gravitational potential and sample the phase-space density.
Direct summation: the textbook approach This truly is the standard N-body problem. The force law is the exact pairwise combicouplings. (*y) = ( N 2)!2! One might use Sverre Aarseth’s advanced techniques for studying star clusters or vari-
nation of central force interactions; there are
’’2
-
-
ous special purpose methods to study the solar system as Tom Quinn and others have reviewed in this volume. Considered as a solution t o the CBE, the density is a distribution of points and the force from pairwise attraction of all points. For any currently practical value of N , this system is a poor approximation to the limit N + W . Furthermore, the direct problem is very expensive. Of course, this direct approach is easy to understand and implement, and with appropriate choice of softening parameter is useful in some cases. However in most cases, it makes sense to take a different approach: interpret the distribution of N points as a sampling of the true distribution. This motivates tree and mesh codes among others.
Tree code The tree algorithm makes use of differences in scales to do only the computational work that will make a difference to the end result. The algorithm treats distant groups of particles as single particles at their centres of mass. The criterion for replacing a group by a single particle is whether or not the angular subtended by that group is smaller than some critical openzng angle 6,. Figure 1 shows the recursive construction that gives the tree code its name. This particular tree is a quad tree although k-d trees and others have been used. The force computation only “opens” the nodes of the tree if they are larger than Bc. Thinking in terms of multipole expansions, one is keeping multipoles up to order 1 2 ~ 1 6 , :typical opening angles have 1 2 20.
-
Evolution of galaxies due to self-excitation L
1.1
169
.
I I I I I I I
Figure 1. Construction of the data structure for the tree algorithm in two dimensions: (left) illustrating the opening angle and (right) the mesh algorithm. Mesh code
A mesh code is simple in concept. The steps in the algorithm are as follows. First, assign the particle distribution to bins. Be aware there are good and bad ways of doing this. For example, one may wish to distribute the mass of a particle according to a smoothing kernel rather than using the position and bin boundaries naively. Then, represent density as a Fourier series by performing a discrete Fourier transform by FFT. Again, one must be very careful about boundary conditions; see Couchman’s paper in this volume and references therein. Finally, the gravitational potential follows directly from Fourier analysis: if we set p = Ckck exp(ik x) then a simple application of the Poisson equation yields @ = - Ck ck exp(ik. x)/4nGlc2. In short, we are using a mesh to represent the density and exploiting harmonic properties of the Poisson equation to write down the gravitational potential. Note that the particle distribution traces the mass but an individual particle does not interact with others as a point mass. Smoothed-particle hydrodynamics (SPH) This notion of density representation is explicit in smoothed-particle hydrodynamics (SPH), a topic which has also appeared several times in these lectures. In SPH, the gas particles must be considered as tracers of the gaseous density, temperature, and velocity fields. The hydrodynamic equations are solved, crudely speaking, by a finite difference solution on appropriately smoothed fields determined from the tracers. One can show that these algorithms reduce to Euler’s equations in the limit of large N . The choice of algorithm and smoothing kernel must be done with great care but most clearly, the gas particles are not stars or gas clumps in any physical sense but tracers of field quantities.
Summary All of these but direct summation are examples of density estimation: a statistical method for determining the density distribution function based on a sample of points. The algorithms follow the same three steps. (1) Estimate the density profile of the galaxy based on
Martin D Weinberg
170
the n bodies; (2) Exploit some property of the estimation to compute efficiently the gravitational potential, and in the case of SPH, other necessary field quantities; (3) Use the gravitational field to derive the accelerations, and in the case of SPH, the hydrodynamical equations of motion.
2.2
Expansion method
The expansion method is density estimation using an orthogonal function expansion. This is a standard technique in functional approximation and familiar to most readers. Its application to solving the Poisson equation is directly analogous to the grid method. In the standard grid method, one represents the density as a Fourier series
&) = L3 l
M
Clmnewlz+mY+n4
(1) l,m,n=-M
where Ak = 2n/L and the infinite sum of integers is truncated at f M .Then, by separation of variables, the gravitational potential is: @(r)= -
M eiAk(lr+my+nz) 1 Clmn ~ T G ( A ~ C ) ~ ~ ,l 2~ , m2 ~ = -n2~’
E’
+ +
(2)
There is a way to skip the binning and FFT steps altogether. We can write the density profile of the R point particles as N
P(X,
Y,2) =
1b(z - Xi)b(Y - Yi)b(Z - 4
(3)
i=l
The coefficient qmnis integral L3 J-LL i/ 22 dx
ILJ2 ILI2 -L/2
dy
-LIS
dz e-iAk(lx+my+nz)
P(X1
Y1 2 )
(4)
which immediately yields
and we are done! From these coefficients, we have the potential and force fields. This may be less efficient than an FFT scheme in some cases and a suboptimal method of density estimation because the lack of smoothing may increase the variance, but it is applicable to non-Cartesian geometries for which no FFT exists as we will see below.
2.3
General theory for gridless expansion
We tend to take for granted special properties of sines and cosines in solving the Poisson equation. However, most of the special properties are due to the equation not the rectangular coordinate system. In particular, the Poisson equation is separable in all
Evolution of galaxies due to self-excitation
171
conic coordinate systems (see, for example, Morse and Feshbach, 1953). Each separated equation takes the Sturm-Liouville (SL) form: d
dx
[p(z)F]
- q ( z ) @ ( z )= Xw(z)@(x)
where p ( z ) , q ( z ) ,w(z) are real and w(z) is non-negative. The eigenfunctions of this equation are orthogonal and complete. The implication of this is the existence of pairs of functions, one representing the density and one the potential, that are mutually orthogonal and together can be arranged to satisfy the Poisson equation. Such a set of pairs is called bi-orthogonal. Just as in the case of rectangular coordinates, the particle distribution can be used to determine the coefficients for a bi-orthogonal basis set and the coefficients yield a potential and force field.
Pedagogical example: semi-infinite slab Here, we will develop a simple but non-trivial example of a bi-orthogonal basis. Our system is a slab of stars, infinite in z and y directions but finite in z ; that is, p = 0 for /zI > L. Since the coordinates are Cartesian, the eigenfunctions of the the Laplacian (the SL equation) are sines and cosines again and we do not have to construct an explicit solution. The subtlety in the solution is the proper implementation of the boundary conditions. Proceeding, we know that we should find a bi-orthogonal basis of density potentialdensity pairs, p,, d,, with a scalar product (p,, d,) = -
J d x d y d z p i d , = 6,"
(7)
such that V 2 p , = d,. Inside the slab, solutions are sines and cosines in all directions. However, outside the slab, the vertical wave function must satisfy the Laplace equation
(9)
where k, is the wave vector in the horizontal direction. The Laplacian is self-adjoint with these boundary conditions. Therefore, the resulting eigenvalue problem is of SturmLiouville type whose eigenfunctions are a complete set. Taking the form Q = Acos(kz + cy) results in the following requirements on k: cy = m ~ / and 2 tan(kL) = k,/k m even, (10) cot(kL) = -k,/k m odd. n7r, + 7r/2] and Let k,: and kgn be the solutions of these two relations where k,: E [ n ~ kO,, E [n7r+ 7r/2, ( n + l ) ~. ]The normalised eigenfunctions are QE = A: cos(k",z) and = A;sin(k",z) with normalisation constants A; and A;. Finally, putting all of this together, the bi-orthogonal pairs can be defined as
{
Martin D Weinberg
172
where k and R are vectors in the x-y plane and Q,, and k* denote both the even and odd varieties. The orthogonality relationship is -
J d3xpEkd,,k,= b,,b(k
- k’).
(12)
The application to an N-body simulation requires two O ( N ) steps:
1. We obtain the coefficients by summing the basis functions over the N particles: N Cp k = Ci=Omippk(R.1, zi) where k = (kz,ky) is the in-plane wave vector now generalised to remove the identification of = 4 and R = (x, y).
2 . We compute the force by gradient of potential: F(r) = -z a ~kdk~pp,k(R,z). Because the slab is unbounded in the horizontal direction the values of k are continuous and therefore, construction of the potential requires an integral over k. This is indicated as a discrete sum over the volume in k space in the expression for F(r). A few short words about error analysis for this scheme. Nearly all results follow from the identification of this algorithm as a specific case of linear least squares (Dahlquist and Bjork, 1974). For our purposes, it is interesting to note that the coefficient determination in the expansion method is, therefore, unbiased: E{c,} = E,. This means that if one performs a large number of Monte Carlo realizations, the expectation values of the coefficients from this ensemble will be the true values. One can derive formal error estimates for this method, following the approach outlined in many standard probability and statistics texts. In this case we find that Var
0;
Pmax -
N
where pmaxis the maximum order in the expansion series and N is the number of sample points. This is broadly consistent with expectations: the variance in a Monte Carlo estimate scales as 1/N and each independent parameter contributes to this variance. More informative analyses are possible. In particular, it is straightforward to compute the variance of the coefficients (or the entire covariance matrix) and estimate the signal t o noise ratio for each coefficient. Then, one may truncate the series when the information content becomes small, or at the very least, use this information to inform future choices of pmax(see Hall, 1981, for general discussion in the density estimation context).
Example: spherical system The recurring slab example in this presentation is intended to give you a complete example which illustrates most aspects of the method, rather than be of use for a realistic astronomical scenario. Nonetheless, it is easy to implement and coupled with the analytic treatment in 53.2 is useful for exploring the effects of particle number (more on this below). Astronomically useful geometries include the spherical, polar and cylindrical bases, although as mentioned above, this approach can be applied to any conic coordinate system. For example, the Poisson equation separates in spherical coordinates and each equation yields an independently orthogonal basis: (1) trigonometric functions in the azimuthal
Evolution of galaxies due to self-excitation
173
direction, e*m#;(2) associated Legendre polynomials in latitudinal direction, qm(cos8); and (3) Bessel functions in the radial direction, qnlJl+l/2(anr/R).The first two bases combine to form the spherical harmonics, X m ( 8 , 4). The a, follow from defining physical boundary conditions that the distribution vanishes outside of some radius R and qnl is a normalisation factor. This bit of potential theory should be familiar to readers who have studied mathematical methods of physics or engineering. For N sampled particles at position r,, the gravitational potential is then
where the expansion coefficients are
This set is is easy to describe but the basis functions look nothing like a galaxy. Therefore, one requires many terms to represent the underlying profile and any deviations. Because the variance increases with pmax(cf. Equation 13), such a basis is inefficient.
2.4
Basis Sets
There is an obvious way around this problem. Nothing requires us to use the Bessel function basis directly and we can construct new bases by taking weighted sums to make the lowest order member have any desired shape. This method is nicely described in Clutton-Brock (1972, 1973) who shows that a suitably chosen coordinate transformation, followed by an orthogonality requirement, leads to a recursion relation for a set of functions whose lowest order members do look like a galaxy. He describes two sets in each of these papers, a spherical set whose first member is proportiona! to a Plummer model and a two-dimensional polar set whose first member is similar to a Toomre disk. At nearly the same time, Kalnajs (1976, 1977) described a two-dimensional set appropriate for studying spiral modes. More recently, Hernquist & Ostriker (1992) used Clutton-Brock’s construction to derive a basis whose lowest-order member is the Hernquist profile (Hernquist 1990). The lack of choice in basis functions in all but a few cases, however, seems to have limited the utility of the expansion approach. However, there is really no need for analytic bases (or those constructed from an analytic recursion relation). Saha (1993) advocates constructing bases by direct Gram-Schmidt orthogonalisation beginning with any set of convenient functions. Recall from 52.3 that the original motivation for using eigenfunctions of the Laplacian is that these are solutions to the Sturm-Liouville equation and therefore orthogonal and complete. The SL equation has many useful properties and recently these have led to very efficient methods of numerical solution (Pruess and Fulton, 1993). By numerical solution, we can construct spherical basis sets with any desired underlying profile and three-dimensional disk basis sets close to a desired underlying profile (Weinberg, 1999). The next section describes t,he method.
Martin D Weinberg
174
2.5
Empirical bases
The spherical case is straightforward and illustrates the general procedure. We still expand in spherical harmonics and only need to treat the radial part of the Poisson equation:
The most important point is to search for solutions of the form @ ( r )= Qo(r)u(r)and p ( r ) = po(r)u(r)where Qo(r) and po(r) are conditioning functions. Note that if we choose our conditioning functions so that V2Qo(r)= 4 ~ G p o ( r )the , lowest order basis function will be a constant, u ( r ) = constant, with unit eigenvalue X = 1. In other words, by choosing QOappropriately, we have achieved the goal of a basis whose lowest order member can be chosen to match the underlying profile and, furthermore, the entire basis will be orthogonal and complete. Figure 2(a) shows an example conditioned to the singular isothermal sphere, a case that would be challenging for other than standard bases (and other potential solvers). Note that the lowest order members have potential and density proportional to l n r and r-2. Each successive member has an additional radial node.
0
-4
3
U X i
Figure 2(a). Basis derived assuming the singular isothermal sphere profile as conditioning functions. The upper (lower) panel shows the potential (density) members for harmonic 1 = 0 . The density members are premultiplied b y r2 to suppress the dynamical range. Figure 2(b) illustrates the advantage of the basis by illustrating the convergence of the coefficients for a Monte Carlo simulation of N = lo5 particles. The 1 = 0 plot shows that all of the variance in the distribution is described by the lowest order basis function as expected by design. The 1 = 2 case is noise; the plot shows that nearly all of the variance is described by j 6 8.
175
Evolution of galaxies due to self-excitation 1
1
0.8
0.8
0.6
0.6 a-
a-
<-
<-
W
W
04
0.4
0.2
0.2
0
0 0
10
5
0
5
i
IO J
Figure 2(b). Convergence of the coeflcients for a Monte Carlo realization of the underlying profile for 1 = 0 (left) and 1 = 2 (right). The solid line (dashed line) shows the cumulative explained variance (values of the coeflcients).
A main deficiency of the expansion method has been the lack of suitable bases for simulating a galactic disk with non-zero scale height. This can also be accomplished by direct solution of the Sturm-Liouville equation but with an additional complication: we can only use the conditioning trick in one dimension. For the cylindrical disk, the separable equations give us trigonometric functions in both the azimuthal and vertical dimensions. A related approach has been described by Robijn and Earn (1996) but users must take care to apply appropriate boundary conditions. We now have a choice: we can condition in z or R. The other dimension can be orthogonalised ex post facto to provide a good match to the underlying distribution using an empirical orthogonal function analysis (also known as principal component analysis). Explicitly, the Laplace equation separates in cylindrical coordinates using @ (r) = R(r)Z(z)O(B) as follows:
d2
--z(z) dz2
+k2Z(z) = 0
We now assume solutions of the form @(.,.,e) = \Eo(r)u(r)Z(z)O(B)and p ( r , z , B ) = po(r)u(r)2(z)@(e)with radial conditioning functions. The Poisson equation becomes
together with Equations (17b,c) above, where X is an unknown constant. In SL form, this is:
Martin D Weinberg
176
Now, use standard SLE solver to table the eigenfunctions. These coefficient functions now provide the input to the standard packaged SLE solvers either in tabular or subroutine form. The orthogonality condition for this case is
-47rG
lm
dr T Qo(r)po(r)u(r)'= -47rG
/
CC
dr r Q p = 1.
0
(20)
The functions Q ( r ,z , 0) and p(r, z , 0) are potential-density pairs. Just as for the spherical case, the lowest eigenvalue is unity and the corresponding eigenfunction u ( r ) is a constant function if Qo and po solve the Poisson equation. Again Qo and po need not solve the Poisson equation, but the conditioning functions must obey appropriate boundary conditions at the centre and at the edge. This is especially appropriate for this cylindrical case where equilibria solutions for three-dimensional disks are not convenient.
rzl ::pj--j
~~~~~
0
-0 1
-0 1
-0 2
-0 2
::E{ w
-0 21 -0
-0 1
-2
-1
0
1
2 -2
-1
0
1
2
-0 2
x
-2
-1
0
1
2
X
Figure 3. Cylindrical basis set conditioned by an exponential radial density profile and sech-squared vertical profile f o r m = 0. The potential (density) members are shown on the left (right) labelled b y radial and vertical orders. Positive (negative) isovalues are shown as black (gray). Figure 3 shows the basis set for the SL method conditioned by an exponential radial density profile and sech' vertical profile. The steps in the construction were as follows: 1. The radial SL equation is solved numerically with conditioning functions given by Qo(R)0: (1 (R/a)')-' and po 0: exp ( - R / a ) . The vertical functions are sines and
+
cosines with vacuum boundary conditions.
2 . Linear combinations of the resulting eigenfunctions are found using an empirical orthogonal function analysis to find the best description of the Qo(R)sech2(z/h) in the least squares sense. 3. The resulting basis functions are tabulated and interpolated as needed. Note that the basis can be chosen to have definite parity which optimises table storage.
Evolution of galaxies due to self-excitation
2.6
177
What good is all of this?
So far, we have explored a general approach for representing the gravitational potential for an ensemble of particles using particular harmonic bases. These bases can be derived in any coordinate system in which the Poisson equation is separable; at the very least, this includes all conic coordinate systems. Other advantages include: This potential solver is fast: it is O ( N ) with a small coefficient. Recall that the most popular approaches, tree and grid codes, are O ( N In N ) . Direct summation is O ( N 2 ) .For large N , this method has optimal scaling. Each term in the expansion resolves successively smaller structure. By truncating the series at the minimum resolution of interest or when the coefficients have low S/N, the high-frequency fluctuations are filtered out. This approach results in a relatively low-noise simulation; the high-frequency part of the noise spectrum dominates the particle noise in the standard potential solvers. Note that all of the dynamical information in a simulation is represented by the expansion coefficients. In other words, the expansion coefficients significantly compress the structural information in the simulation. If the dynamical content of the density and potential fields is the goal, one does not need to keep entire phase space, only the coefficients. Similarly, velocity fields may be represented by a similar expansion (e.g. Saha, 1993). One is not restricted to individual components or single bases and can assign parts of phase space to separate bases depending upon its geometry or history. This is precisely what one needs to study a disk embedded in a spheroid and halo. There is no one method for solving the Poisson equation in a simulation. The major disadvantage of the expansion approach is its lack of spatial adaptivity. It efficiently resolves non-axisymmetric features and disturbances as long as the galaxy does not change its structure rapidly. The approach is not highly adaptive and would not be good for equal mass merger, for example. Similarly, these schemes (like most efficient algorithms) do not strictly conserve momentum. In the limit of a large number of bodies, the expansion centre is arbitrary because the distribution can be represented in an origin independent way regardless of the expansion centre. For a smaller number of bodies, the number of available high-signal-to-noise ratio coefficients is too small to permit resolving the expansion about an arbitrary origin. The offset of the origin allowed for a given error bound decreases with particle number. This demands that an efficient implementation of the expansion-based Poisson solver recentre the particle distribution. These advantages, properties and limitations motivate a set of ideal applications: 1. Simulating a multicomponent galaxy. A feature of N-body simulation of galaxies is the disparate length scales of the disk, bulge and halo. This is not a problem for the expansion. We can pick a separate basis tailored to each component and determine the total gravitational field from their sum.
2. Long-term evolution. For a fixed number of particles, Poisson fluctuations and the simulation’s self-gravitating response to those fluctuations limit the length of time
Martin D Weinberg
178
7 that the evolution remains a good approximation to the collisionless Boltzmann equation. For too few particles, the fluctuations can be so large that the angular momentum and energy of a particle orbit can drift or dz&e significantly over a single orbital time. Because the expansion method filters the high-frequency noise by construction, this is likely to give the largest value of 7 in most cases. 3. Weak, cumulative and tidal interactions. Similarly, this method is ideal for studying the response of a simulated galaxy to external global distortions. The scale sensitivity can be manipulated to represent efficiently the scales of interest and no others. Of course, limiting the resolution a priori is not always the best policy and this strategy must be motivated by a prior study with weaker constraints.
4. Stability. This Poisson solver is ideally suited to studying global stability. A timeseries analysis of the coefficients can empirically yield both the growth rate and shape of the unstable mode.
3
A numerical method for perturbation theory
N-body simulation is not the only use for this special Poisson-solving bi-orthogonal expansion. We can exploit the completeness property to transform a linearised solution of the collisionless Boltzmann equation to a system of linear equations. This has been given the name 'matrix method' by dynamicists but is a standard approach to solving partial differential equations (Courant and Hilbert, 1953). By using the same expansion for both an analytic linear solution and an N-body simulation, we explore a particular problem both ways and even apply the two together in various hybrid ways to increase further the dynamic range or time scale 7 . I will sketch the development in the next section and follow this with a simple but complete example based on the slab model.
3.1
Introduction
The response of our stellar galaxy to any distortion is mathematically described by the simultaneous solution of the collisionless Boltzmann and Poisson equations:
V@(X)= 47~Gp(x).
(22)
The steps in the solution are as follows. First we linearise equation (21) and note that equation (22) is already linear. We then separate the partial equations in their natural bases. In general, the two equations separate different bases and this presents a technical problem but not an insurmountable one. The Cartesian coordinate system is the exception: the bases are the same. For a spherical stellar distribution, the bi-orthogonal potential-density pairs take the following form: @(r)= ElmE, afmqm(O, $)ai"(.) with an analogous expression for p(r). The two partial differential equations are then transformed to Fourier space using these
Evolution of galaxies due to self-excitation
179
bases to yield a set of algebraic equations. To do this, we note that orbits are quasiperiodic in regular potential. If all conserved quantities exist then by the averaging principle (Arnold, 1978), we can represent any phase-space quantity by the following expansion in action and angles:
e If the gravitational potential admits chaotic orbits, this approach does not apply strictly. If the Lyapunov exponents are small, quasi-periodicity should still be a good approximation. With these tools and conditions, we begin by linearising the collisionless Boltzmann equation (Equation 21). After expressing all phase-space variables in actions and angles, a Fourier transform in angles followed by a Laplace transform in times yields the solution
where the hat denotes a Laplace transformed quantity and the subscript A? denotes an action-angle transform. Finally, we can integrate equation (24) over v to get Pl(1,w). We have not included the simultaneous solution of the Poisson equation but at this point, we tie the two together by expanding both Pl(I, w ) and the perturbing potential in the bi-orthogonal series. Explicitly, we can determine the scalar product of the potential component of the pair with the velocity integral of equation (24):
The left-hand side is density expansion coefficients a. The right-hand side may be written as the action of a matrix on the vector of coefficients describing the perturbing potential b. The matrix R depends on the underlying unperturbed distribution function and the Laplace expansion frequency s. In other words, the resulting solution for the response a given the perturbation b takes the form a = R(s)b. We may straightforwardly include the self-gravity in the response by noting that the self-gravitating response is the simultaneous solution of the system to both the perturbation and the response of the system to its own response. Mathematically, this is a = R ( s )(a + b) which upon solving for a yields a = [I - R(s)]-’ R(s)b Note our accomplishment: we began with a coupled set of partial-integrodifferential equations and end up with a matrix inversion. The computational work is all in determining the matrix elements of R. Finally, after solving these sets of linear equations, we perform an inverse transform to obtain the resulting response to the perturbation in physical space. I feel that the name matrix method is a bit of a misnomer, or at least not fully descriptive. The procedure described above has simple intuitive interpretation and this will be even more apparent as we proceed through the next example. In transforming to Fourier space, we are in essence solving for the spectrum of normal modes of the system. The perturbation picks out the discrete modes and excites “packets” of continuous modes. After transforming back to physical space, we see the result of the decaying (or growing) discrete modes and phase-mixing packets of continuous modes in configuration space. In this sense, this approach might be more aptly called stellar spectral dynamics.
3.2 Example: slab dispersion relation
In this section, we apply this spectral dynamics approach to the stellar slab described in §2.3. The natural coordinates here are Cartesian. The canonical variables describing the phase space are linear position and momentum in the slab plane and action-angle variables in the vertical direction. This simple case differs from a disk or halo in that trajectories are not bound in the two in-plane dimensions. Similarly, there is symmetry in the two in-plane dimensions, so with no loss of generality we are free to consider only one of these, say the $x$ degree of freedom. So, the canonical variables are linear momentum and position $(p_x, x)$ and vertical action and orbital angle $(I_z, \theta)$. Orbital angle is defined as:
where $E_z$ is the energy in the vertical degree of freedom and $\Omega_z(E_z) = \partial H/\partial I_z$. The unperturbed equilibrium model does not vary in the infinite horizontal plane, so the unperturbed quantities (density, potential and phase-space distribution function) depend only on the vertical coordinate. This presents a formal difficulty popularised by Binney & Tremaine as the "Jeans swindle". We will sidestep the subtleties here; please see Binney & Tremaine (1987) for discussion. We can now write our linearised equations of motion, the CBE and the Poisson equation, in these variables:
$$\nabla^2 V_1(\mathbf{r}) = 4\pi G\,\rho_1(\mathbf{r}).$$
We now perform the two transforms: Fourier in the angle (and a plane-wave expansion in the unbound in-plane coordinate) and Laplace in time. Again, the infinite horizontal extent causes a slight complication: we obtain a continuous set of plane waves rather than the discrete set that would obtain for a bound system. Let us denote the in-plane Fourier wave number as $k$, the index of the discrete vertical set as $n$ and the Laplace variable as $s$. A tilde indicates a Laplace-transformed quantity. The transformed CBE becomes
$$s\tilde f_{1nk} + in\Omega_z\,\tilde f_{1nk} + ikp_x\,\tilde f_{1nk} - in\Omega_z\,\frac{\partial f_0}{\partial E_z}\,\tilde V_{1nk} - ikp_x\,\frac{\partial f_0}{\partial E_x}\,\tilde V_{1nk} = 0. \qquad (27)$$
Solving for $\tilde f$, we now integrate over velocities to derive the Laplace-transformed density for each wave number and vertical index. Integrating over wave numbers and summing over vertical indices gives us the expression for the response density for each Laplace frequency.
Next, we incorporate the Poisson equation by expanding the density and potential distortions in the bi-orthogonal functions in canonical variables:
Using the bi-orthogonality condition we perform the scalar product with equation (28) to get a linear set of equations that determines the expansion coefficients,
$$a_{pk} = -4\pi G\int d^3x\, d^3v\;\rho_{pk}^{\,*}\, e^{in\theta}\, e^{ikx}\,\tilde f. \qquad (30)$$
Substituting the solution for $\tilde f$, we have explicitly
which can be written as the following matrix equation:
$$a_{pk} = \sum_u M_{pu}(s,k)\, b_{uk}.$$
To get the full self-gravitating response, we note that the potential imposed on the system is the sum of the internal response and the external perturbation, so that $a_{pk} = \sum_u M_{pu}(s,k)\,(a_{uk} + b_{uk})$. The solution for the response is then
$$\mathbf{a} = \mathcal{D}^{-1}(s,k)\, M(s,k)\,\mathbf{b}, \qquad \mathcal{D}(s,k) \equiv \mathbf{I} - M(s,k). \qquad (31)$$
Alternatively, we can look for a perturbation that has the same shape as its own response, an eigenmode. The equation for this solution takes the form $a_{pk} = \sum_u M_{pu}\, a_{uk}$. A non-trivial solution demands that $D(s) \equiv \det\{\mathbf{I} - M_{pu}(s)\} = 0$, and this is often called the dispersion relation by analogy with the same relation that defines the possible wave modes in a plasma. We can classify the resulting modes by the real part of $s$. For $\mathrm{Re}(s) > 0$, $\mathrm{Re}(s) = 0$ or $\mathrm{Re}(s) < 0$, the mode is respectively growing, oscillatory or damped. If we are interested in the evolution of a stable system, growing modes should be absent from the spectrum by design. Oscillatory modes are rare, requiring pattern frequencies which avoid commensurabilities with any integer combination of orbital frequencies. For this reason, pure oscillating modes are practically non-existent, although one can construct special cases theoretically. The damped part of the spectrum is analogous to Landau damping in a plasma. Physically, the damping results from resonant transfer between the pattern and stars whose orbital frequencies are commensurate with it. Note that all of these solutions are in Laplace space. To recover the time evolution, we must perform the inverse Laplace transform. This requires a bit of care but is straightforward; for details see the standard plasma literature, e.g. Krall & Trivelpiece (1973) or Ikeuchi & Nakamura (1974). Finally, let us evaluate the response explicitly for a specific case. Recall that we are assuming that the unperturbed state varies only in $z$. Let us further assume that we can factor the phase-space distribution function as $f_0(z,\mathbf{v}) = f_M(v_x, v_y)\, f_1(z, v_z)$. Let the in-plane part of the distribution function be Maxwellian and the vertical part be that for the $\mathrm{sech}^2(z/h)$ density profile. The matrix elements $M_{pu}(s, k)$ now take the form
Conveniently, the integrals over $v_x$ can be written as error functions of complex argument: the Maxwellian velocity integral reduces to the plasma dispersion function $Z(\zeta) = i\sqrt{\pi}\, e^{-\zeta^2}\,\mathrm{erfc}(-i\zeta)$, evaluated at $\zeta = (n\Omega_z - is)/\sqrt{2}\,k\sigma$. Routines for evaluating the complex error function are readily available (e.g. www.netlib.org).
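For readers who wish to experiment, the complex error function is available in SciPy as the Faddeeva function; the sketch below is illustrative only, and the slab parameters (`sigma`, `k`, `n`, `Omega_z`) and trial frequency are arbitrary placeholders.

```python
import numpy as np
from scipy.special import wofz   # Faddeeva function w(z) = exp(-z^2) erfc(-iz)

def plasma_dispersion(zeta):
    """Plasma dispersion function Z(zeta) = i sqrt(pi) w(zeta)."""
    return 1j * np.sqrt(np.pi) * wofz(zeta)

# Placeholder slab parameters (illustrative only).
sigma, k, n, Omega_z = 1.0, 0.5, 1, 0.8
s = 0.1 - 0.9j                      # a trial Laplace frequency
zeta = (n * Omega_z - 1j * s) / (np.sqrt(2.0) * k * sigma)
print(plasma_dispersion(zeta))
```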
Figure 4. Top: plot of the dispersion relation $|D(\omega)|$ for the slab with a Maxwellian and sech-squared distribution. Bottom left: odd mode (bending); bottom right: even mode (Jeans).
3.3 Modes in the slab

Modes are at the zeros of $D$, which is shown in Figure 4. The figure shows the dispersion relation as a function of $\omega$ (where $s = -i\omega$) rather than $s$. For reference, $\mathrm{Im}(\omega) > 0$ corresponds to instability. The dispersion relation is even in $\mathrm{Re}(\omega)$: hence an exploration of the half-plane $\mathrm{Re}(\omega) > 0$ is sufficient. We see two zeros. The first has $\mathrm{Im}(\omega) < 0$ and $|\mathrm{Im}(\omega)| \ll 1$: this is a damped mode, but very weakly damped. The second, with larger $\mathrm{Re}(\omega)$, is also weakly damped but more strongly than the first.
To get physical intuition for these modes, one can determine the shape of each mode by finding the null vector of the dispersion matrix $\mathbf{I} - M_{pu}(s, k)$ at each zero in Figure 4. The two modes are also shown in Figure 4. The more weakly damped of the two is odd about the mid-plane and is a travelling bending mode. The second mode is even about the mid-plane and is a breathing mode. The dispersion relation $D$ is also a function of $k$. The zeros of $D$ determine a branch for each mode. In this case, the damping increases as $|k|$ increases.
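A minimal sketch of the null-vector computation is given below; the matrix `D` stands for the dispersion matrix $\mathbf{I} - M(s_r, k)$ evaluated at a root $s_r$ of the dispersion relation, and the example substitutes a nearly singular placeholder matrix.

```python
import numpy as np

def mode_shape(D):
    """Return the (approximate) null vector of the dispersion matrix D = I - M(s_r, k),
    i.e. the right singular vector belonging to the smallest singular value."""
    U, sing, Vh = np.linalg.svd(D)
    return Vh[-1].conj()            # rows of Vh are right singular vectors

# Example with a placeholder matrix constructed to be nearly singular.
v = np.array([1.0, -2.0, 1.0]) / np.sqrt(6.0)
D = np.eye(3) - 0.999999 * np.outer(v, v)   # nearly annihilates v
print(mode_shape(D))                        # recovers v up to sign
```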
3.4 Excitation of a damped mode by a disturbance

We can use the information about the various modes in the dispersion relation to compute the excitation of the system due to a time-dependent disturbance. For example, let us consider the response of the slab to a body passing through the slab at constant velocity. This is an idealisation of a dwarf satellite moving through the disk (e.g. the Sgr dwarf and the Milky Way). We assume that we know the initial time dependence of the disturbance. After expansion in our chosen bi-orthogonal basis, we can write this as a vector of time-dependent coefficients. The Laplace transform of the perturbation vector is then
$$\hat{\mathbf{b}}(s) = \int_0^{\infty} dt'\,\exp(-st')\,\mathbf{b}(t'). \qquad (37)$$
The inverse Laplace transform of equation (31) gives
$$\mathbf{a}(t) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} ds\; e^{st}\,\mathcal{D}^{-1}(s,k)\, M(s,k)\int_0^{\infty} dt'\,\exp(-st')\,\mathbf{b}(t').$$
The Laplace transform was performed assuming a value of $s$ that ensured convergence. We are free to deform the integration path as long as we take care to continue the integrand analytically and to identify its singularities. In particular, if the slab is dynamically stable, then $\mathcal{D}$ is non-singular in the half-plane with $\mathrm{Re}(s) > 0$. There will be poles for $\mathrm{Re}(s) < 0$ corresponding to damped modes. In addition, the matrix elements $M$ have denominators of the form $s + ix$ for $x$ on the real line. The contour deformation rules are then: (1) for $t < t'$, deform to $\mathrm{Re}(s) \to \infty$, enclosing no poles; and (2) for $t > t'$, deform to $\mathrm{Re}(s) \to -\infty$, picking up poles at $s = -ix$ and at any poles of $\mathcal{D}^{-1}$ in the half-plane $\mathrm{Re}(s) < 0$ (damped modes). Performing the inverse Laplace transform and putting everything together gives the explicit expression for the self-gravitating time-dependent response to the perturbation:
$$\mathbf{a}(t) = -4\pi G\,(2\pi)\int dI_z\int dv_x\;\left(-n\Omega_z\,\frac{\partial f_0}{\partial E_z} + k v_x\,\frac{\partial f_0}{\partial E_x}\right)\times\cdots$$
The inverse of the dispersion matrix will have poles at the modes (recall Cramer's formula). The notation $\mathrm{Res}\,\mathcal{D}^{-1}$ denotes the residue of this matrix and may be determined numerically using singular value decomposition with the following procedure: (1) locate the damped modes $s_r$ and compute $\mathcal{D}_{pu}(s_r)$; (2) analyse by singular value decomposition and compute the determinant without the (near-zero) singular value, $D'(s_r)$ say; (3) compute the derivative of the determinant at $s_r$, $\left. dD/ds\right|_{s_r}$: we expect $D(s) = \alpha\,(s - s_r)\,D'(s_r)$ for some unknown constant of proportionality, whose solution is $\alpha = \left.dD/ds\right|_{s_r}\big/ D'(s_r)$; and (4) replace the singular value in the decomposition by the value of $\alpha$. I have given explicit details for readers interested in exploring this procedure numerically. The numerical computations are straightforward for this case of the slab; one should be able to investigate the full response of the slab to an arbitrary perturbation.
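A minimal numerical sketch of steps (1)-(4) follows; `dispersion_matrix` is a hypothetical user-supplied routine returning $\mathcal{D}(s) = \mathbf{I} - M(s)$ for a complex argument $s$, and `s_r` is a previously located root of its determinant. This is one plausible reading of the procedure, not the author's own code.

```python
import numpy as np

def residue_factor(dispersion_matrix, s_r, ds=1e-6):
    """Steps (1)-(4): SVD of D(s_r), the determinant with the near-zero singular
    value removed, a numerical derivative of det D(s) at s_r, and the constant
    alpha = (dD/ds)|_{s_r} / D'(s_r) that replaces that singular value."""
    D = dispersion_matrix(s_r)                        # step (1)
    U, sing, Vh = np.linalg.svd(D)                    # step (2)
    phase = np.linalg.det(U) * np.linalg.det(Vh)
    Dprime = phase * np.prod(sing[:-1])               # det without smallest singular value
    dDds = (np.linalg.det(dispersion_matrix(s_r + ds))
            - np.linalg.det(dispersion_matrix(s_r - ds))) / (2.0 * ds)   # step (3)
    alpha = dDds / Dprime                             # step (4)
    sing_mod = sing.astype(complex).copy()
    sing_mod[-1] = alpha
    D_mod = (U * sing_mod) @ Vh                       # decomposition with alpha inserted
    return alpha, D_mod
```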
4 Galaxy interactions
Let us finish with examples of these methods applied to two classes of astronomical scenarios. First, we will mention the excitation of structure by a passing galaxy, such as a weak encounter in a group, a fly-by. These interactions can cause off-centred disks and nuclei and can trigger bars. Similarly, an orbiting satellite will have a very similar effect on its primary. Second, we will describe noise-driven evolution, both the shape and magnitude of fluctuation-driven structure and the possibility of significant evolution of halo profiles due to these fluctuations.
4.1 Fly-bys and satellites
Another way of getting the same sort of excitation, perhaps more important for group galaxies than for the Milky Way, is a passing fly-by. A perturber on a parabolic or hyperbolic trajectory can excite similar sorts of halo asymmetries, which persist until long after the perturber itself has become unremarkable. Presumably, our Galaxy has suffered such events in the past, but because the satellite excitation is closely related to the fly-by excitation, the study of one will provide insight into the other. Vesperini and Weinberg (2000) describe the application of the response approach to this problem. From these analytic calculations, we can compute the standard asymmetry parameters (Abraham et al. 1996a, Abraham et al. 1996b, Conselice et al. 2000) obtained by summing over the mean square difference of the galaxy and its 180° rotated image:
For example, a perturber with 10% of the halo mass, with pericentre at the halo half-mass radius, and encounter velocity of 200 km/s will produce $A \approx 0.2$. Damped modes play a major role in both the morphology and longevity of these asymmetries. Figures 6 and 7 of Vesperini and Weinberg (2000) illustrate their importance by comparing the response with and without damped modes. The $m = 1$ mode is significantly altered by the discrete weakly damped mode. Please see Vesperini and Weinberg (2000) for more details. Because the halo response is dominated by the modes of the halo rather than by the properties of the perturber, we expect that the asymmetry should be dominated by contributions at well-defined radii, independent of the perturber parameters. We proposed a simple generalisation of Equation (40) to test this prediction: define $A(r)$ to be the sum over pixels restricted to those within projected radius $r$. More recently, we have shown that N-body simulations agree in magnitude and morphology with the perturbation theory.
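For concreteness, a rotational asymmetry of the kind described above can be measured as in the sketch below. This is illustrative only: the exact normalisation, centring and noise corrections differ between Abraham et al. (1996a,b) and Conselice et al. (2000), and the toy image is arbitrary.

```python
import numpy as np

def asymmetry(image):
    """Rotational asymmetry from the mean-square difference between an image
    and its 180-degree rotation (no centring or noise correction applied)."""
    rotated = np.rot90(image, 2)                 # rotate by 180 degrees
    return np.sqrt(np.sum((image - rotated) ** 2) / (2.0 * np.sum(image ** 2)))

# Toy example: a smooth symmetric blob plus an off-centre lump.
y, x = np.mgrid[-32:32, -32:32]
galaxy = np.exp(-(x**2 + y**2) / 200.0) + 0.3 * np.exp(-((x - 10)**2 + y**2) / 30.0)
print(asymmetry(galaxy))
```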
4.2 Noise
The possibility of long-lived damped modes leads to the possibility that global modes are continuously excited by a wide variety of events such as disrupting dwarfs on decaying orbits, infall of massive high velocity clouds, disk instability and swing amplification and the continuing equilibration of the outer galaxy. The dominant halo modes are low frequency and low harmonic order and therefore can be driven by a wide variety of transient noise sources. Some recent work (Weinberg, 2000a,b) provides a theory of
excitation by noise and applies this to the evolution of halos. In this section, I will first describe an application of our response theory to fluctuation noise. I will then describe some preliminary results suggesting that noise may drive halos toward approximately self-similar profiles. Additional work will be required to make precise predictions for these trends and to explore the consequences for the long-term evolution of disks in spiral systems.

Figure 5. Power (in energy units) of the response of a halo to noise for two different models as a function of radial basis index. Left: $W_0 = 5$ King model. Right: Hernquist model. The top row, left (right), shows the $l = 1$ ($l = 2$) response for each model. The bottom row shows the cumulative power. The radial basis set is similar to that shown in Figure 2; the index on the abscissa indicates the number of nodes for each basis function. The larger the index, the finer the spatial scale.
4.3 Halo noise

The simplest approach is a calculation of the power in a stellar system due to Poisson fluctuations. Consider the response of the entire system to a single orbiting star. Physically, each star excites a wake in the halo. This wake includes all the modes, from the weakly damped modes to very small-scale modes. We now sum up the wakes from all of the stars. The self-gravity of the lowest-order mode leads to significant excess power at large scales. The detailed theoretical computation is compared with N-body simulations in Figure 5. Note that the amplification of the noise by self-gravity is significant for the $l = 1$ component for halos both with and without cores. The analytic calculation is valid in the limit $N \to \infty$. However, if this limit is not attained in the N-body simulation, the power in fluctuations can be so large that individual orbits do not have well-defined conserved quantities (energies and angular momenta) over a dynamical time. In this noise-dominated regime, the diffusion of orbits is so fast that coherent large-scale dynamics is suppressed. In other words, with too few particles, one is simulating a star cluster, not a dark-matter dominated halo. We can see the effects of particle number by determining the number $N$ required to obtain the noise spectrum
predicted by the analytic solution of the underlying power spectrum. This is illustrated in Figure 6, which compares the same empirically determined power spectra shown in Figure 5 (left panel) but for various values of $N$. In short, one needs $N \gtrsim 10^6$ before the dynamics of the collisionless limit obtains. This result is largely independent of the N-body simulation technique.

Figure 6. Left: Fluctuation power as a function of particle number for each basis coefficient, scaled by the number of particles $N$. Right: Fluctuation power for $n = 5$ as a function of particle number. The upper (lower) horizontal lines show the expected results with self-gravity (Poisson).
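The $1/N$ scaling of the raw (un-amplified) Poisson power in an expansion coefficient is easy to verify numerically. The sketch below is a toy illustration only: it uses a single $l = 1$ coefficient of points drawn isotropically rather than the full halo basis, and it ignores the self-gravitating amplification that is the main point of Figure 6.

```python
import numpy as np

def coefficient_power(N, trials=200, rng=np.random.default_rng(1)):
    """Variance of a crude l=1 (cos theta) expansion coefficient for N points
    distributed isotropically; expected to scale as 1/N (pure Poisson noise)."""
    coeffs = []
    for _ in range(trials):
        costheta = rng.uniform(-1.0, 1.0, N)     # isotropic in cos(theta)
        coeffs.append(np.mean(costheta))         # crude l=1 coefficient
    return np.var(coeffs)

for N in (10**3, 10**4, 10**5):
    print(N, coefficient_power(N) * N)           # roughly constant => power ~ 1/N
```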
4.4 Evolution of a galaxy by noise
Given that fluctuations are a generic part of stellar dynamics, let us now ask what sort of evolution we can expect. To do this, I will sketch the development of a constitutive equation for the long-term evolution under noise. We could proceed as for globular clusters: expand the Boltzmann collision term using the Master-equation formalism (Binney and Tremaine, 1987). After a number of false starts, I found the more general transition-probability approach to be more natural (although the Master approach is formally equivalent). One begins with the probability that an orbit with phase-space state $\mathbf{x}$ at time $t$ makes a transition to $\mathbf{x}'$ at time $t+\tau$: $P(\mathbf{x}', t+\tau\,|\,\mathbf{x}, t)$. For the entire ensemble described by the distribution function $f(\mathbf{x}, t)$, we can describe the evolution using the transition probability as
$$f(\mathbf{x}, t+\tau) = \int d\mathbf{x}'\, P(\mathbf{x}, t+\tau\,|\,\mathbf{x}', t)\, f(\mathbf{x}', t). \qquad (41)$$
Now, expand the transition probability in its moments of $\mathbf{x} - \mathbf{x}'$ for small $\tau$. This gives
This is known as the Kramers-Moyal expansion (Risken 1989). We will derive the transition probability for our case by considering the change in the conserved quantities of orbits (the actions) over the correlation time of the fluctuation. This implies that the transition probability is only defined for time scales $\tau$ larger than the dynamical orbital time scales. Therefore, we can further simplify the computation by
using action-angle variables and averaging over the rapidly varying angles. For the phase-space distribution function, the Kramers-Moyal expansion becomes
$$f(\mathbf{I}, t+\tau) = \int d\mathbf{I}'\, P(\mathbf{I}, t+\tau\,|\,\mathbf{I}', t)\, f(\mathbf{I}', t). \qquad (43)$$
Now, to evaluate this equation, expand the integrand in a Taylor series about $\mathbf{I}$ and define $\boldsymbol{\Delta} \equiv \mathbf{I}' - \mathbf{I}$. In the limit $\tau \to 0$, we find
$$\frac{\partial f(\mathbf{I}, t)}{\partial t} = \sum_{n=1}^{\infty}\left(-\frac{\partial}{\partial\mathbf{I}}\right)^{n} D^{(n)}(\mathbf{I}, t)\, f(\mathbf{I}, t). \qquad (44)$$
where Dn is proportional to the time-derivative of the moments of A over the distribution P. However, despite the appearance of continuous functions in these formulae, note that P describes stochastic events. To write this explicitly in stochastic variables, let [ be the stochastic value of I. The expression D(n)may be written
If the stochastic excitation is a Markov process, this guarantees that the expansion terminates after two terms (Pawula, 1967). Our evolution equation is then a Fokker-Planck equation:
$$\frac{\partial f(\mathbf{I}, t)}{\partial t} = -\frac{\partial}{\partial\mathbf{I}}\left[D^{(1)}(\mathbf{I}, t)\, f(\mathbf{I}, t)\right] + \frac{1}{2}\,\frac{\partial^2}{\partial\mathbf{I}\,\partial\mathbf{I}}\left[D^{(2)}(\mathbf{I}, t)\, f(\mathbf{I}, t)\right]. \qquad (46)$$
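As a schematic illustration of what integrating such an equation involves (a toy sketch only: one action variable, arbitrary constant drift and diffusion coefficients, and a simple explicit finite-difference update rather than the schemes one would use in practice), consider the following.

```python
import numpy as np

# Toy 1-D Fokker-Planck integration: df/dt = -d(D1 f)/dI + 0.5 d^2(D2 f)/dI^2
I = np.linspace(0.0, 10.0, 201)
dI = I[1] - I[0]
f = np.exp(-I)                      # arbitrary initial distribution
D1 = np.full_like(I, 0.01)          # placeholder drift coefficient
D2 = np.full_like(I, 0.05)          # placeholder diffusion coefficient

dt = 0.2 * dI**2 / D2.max()         # well inside the explicit-scheme stability limit
for _ in range(1000):
    drift = -np.gradient(D1 * f, dI)
    diffusion = 0.5 * np.gradient(np.gradient(D2 * f, dI), dI)
    f = f + dt * (drift + diffusion)
    f[0], f[-1] = f[1], f[-2]       # crude zero-gradient boundaries

print(f.sum() * dI)                 # total mass, approximately conserved
```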
4.5 Noise-dominated halo evolution
First, some general observations. Noise from periodically orbiting bodies does not give rise to long-term evolution, even though it does give rise to significant orbital diffusion (as described above). This is easily argued. Changes over long time periods, so-called secular changes, will only occur if the disturbance presents a torque. Consider the mean density of an orbiting body over many dynamical times, for example. It will only present a torque if it is a closed, resonant orbit. At order $l = 1$, this requires that the radial and azimuthal frequencies be equal, as in a Keplerian orbit. For most halo profiles, these orbits populate the outer edge and therefore have little effect. Similarly, at order $l = 2$, we add the possibility of closed, stationary bar-like orbits that have radial frequencies that are twice the azimuthal frequencies. This can occur in homogeneous cores, but these conditions are thought to be rare or non-existent in realistic halos. Order $l = 3$ is the lowest order that admits resonant orbits over a wide range of energies. This is not inconsistent with the results of §4.2. Noise at orders $l = 1, 2$ caused by orbiting bodies can cause significant orbital diffusion without changing the equilibrium profile. This turns out to be a corollary of a more general proof of the stability of stellar equilibria against phase-mixing (Hjorth, 1994). Parenthetically, N-body folks have used the maintenance of an equilibrium as an indicator of the collisionless regime. However, the argument above shows that the equilibrium will persist even if the rate of orbital diffusion is high. Conversely, any transience in the noise source (orbital decay, fly-bys, disrupting or shearing stellar streams) can excite the weakly damped modes at low order. Since a
galactic halo will suffer all of these disturbances over its lifetime, direct numerical estimates suggest that excitation by transient noise will dominate orbital noise in driving evolution for realistic astronomical scenarios, and I will give examples of these below (Weinberg, 2000a,b). The overall procedure is as follows. We begin with an equilibrium halo and phase-space distribution function. To simplify the solution of the Fokker-Planck equation, the distribution is made isotropic. The evolution equation (46) is now solved in two steps. First, we solve the Fokker-Planck equation holding the underlying gravitational potential fixed for some $\tau$ greater than $1/\Omega$ but small compared to the overall evolution time scale. Second, we "turn off" the collision term and find a new self-consistent equilibrium. The two-step process is repeated to obtain the evolution. Figure 7 shows the evolution under three different noise sources: (1) a satellite with a decaying orbit; (2) a halo of black holes; and (3) satellite fly-bys. In the first two cases, we begin with a $W_0 = 5$ King model, and the third begins with a broken power-law profile (with a small core for numerical convenience). For Cases (1) and (3), the results can be characterised as follows. There are two distinct evolutionary phases: a transient readjustment to a double power-law profile followed by slow, approximately self-similar evolution. The outer profile is characterised by a power law with exponent close to $-3$. The profile continues to approach the $-3$ power-law form at increasing radius as the evolution continues. Weinberg (2000a) shows that this obtains for a wide variety of initial conditions and is caused by the reaction of the halo to the external $l = 1$ multipole, which explains the ubiquity of the profile. The inner profile has a shallower roll before reaching the core. A power law of $-1.5$ is shown for comparison. The more concentrated models, which have deeper potential wells and therefore shorter dynamical times, evolve most quickly. This is clear in the comparison of Cases (1) and (3), but Weinberg shows that this obtains for a variety of initial conditions. Case (2), evolution by orbiting black holes, does not result in the same asymptotic form and exhibits much weaker evolution overall. Because these models have cores, and both the radial and azimuthal orbital frequencies are nearly the same in the core, it is difficult to couple to these orbits in order to transfer angular momentum in and out of the core. The core, then, expands with the overall expansion of the halo due to the deposition of energy from the noise sources. These dynamics suggest that we restrict our consideration to evolution beyond the core. Further investigations of the importance of an initial cusp are in progress.
5 Summary and topics for future work
These lectures have described the use of bi-orthogonal expansions in N-body simulations and perturbation theory to understand the long-term evolution of galaxies. As a concrete example, I presented the infinite slab, which has a rich modal structure but can be treated analytically and by N-body simulation with a small amount of numerical computation. One can use these same procedures, with carefully chosen bases, to represent the gravitational fields of galaxies in order to perform smooth, low-diffusion N-body simulations. Multiple disk, bulge and halo components can be treated simultaneously by using separate bases for each component, since solutions of the Poisson equation are additive.
Figure 7. Left: Orbital decay in a $W_0 = 3$ King model halo for a satellite to halo mass ratio of 0.05. The straight lines are power laws with exponents $-1.5$ and $-3$, for comparison. Centre: Evolution of a King $W_0 = 3$ profile under 'black hole' noise. The times for each curve are shown with the scaling for the number of black holes per halo, assuming that the black hole fraction is 10%. This gives roughly $n_{bh} = 10^6$ and the evolution time scale is uninterestingly large. Right: Evolution for a double power-law model $\rho \propto (r+\epsilon)^{-\gamma}(r+1)^{\gamma-\beta}$ with $\gamma = 1$, $\beta = 4$ and $\epsilon = 0.1$.

The same expansion bases can be used to construct perturbation theories for understanding the stable and unstable modes and for deriving the response to time-dependent disturbances. The advantage of using the perturbation theory is its insensitivity to particle noise and the resulting orbital diffusion, which can wipe out correlations that are critical to the dynamics. Because both the N-body simulations and the perturbation theory can be represented by the same field expansion, the two approaches can be used together to understand the details of a complex interaction. Using these methods, we have seen that many if not all astronomical equilibria have weakly damped modes. These modes are easy to excite and slow to decay and therefore will tend to dominate the non-axisymmetric structure of galaxies. For example, the ubiquity of very weakly damped "sloshing modes" ($l = m = 1$) may cause lopsided disks and off-centred nuclei, including nuclear bars and black holes. The basic dynamics here was thoroughly explored decades ago by the pioneers in spiral structure (Lin and Shu 1964, Julian and Toomre 1966, Toomre 1969, Shu 1970a,b). In particular, global spiral structure was shown to be damped (Toomre, 1969) for the same physical reasons. We described several applications, satellite and fly-by induced lopsidedness and bars, and excitation of structure by noise, emphasising the latter. In particular, the Poisson noise from a simulation of a halo with $10^5$ particles drives enough power, when damped modes are included, to cause observable disturbances in the disk. Physically, this noise is comparable to a halo of black holes of 2 to $6\times 10^6\,M_\odot$. Conversely, one needs at least $10^7$ bodies to suppress the particle noise to the point that the collisionless limit is obtained with some confidence. We then considered the long-term consequences of this noise for the evolution of a galaxy halo. We argued that dwarf mergers, weak encounters with neighbours, and noise from the still-equilibrating outer halo can drive significant halo evolution through noise excitation over a galaxy lifetime. There is much more that needs to be done in this area, including careful analysis of more realistic galaxy models under a wide variety of possible perturbations and noise
spectra. Calculations to date have only considered stellar dynamics, but the response of the gas component to the large-scale structure discussed here may prove important to our understanding of galaxy evolution, as well as providing an important observational diagnostic. This all leads to the speculative possibility that galactic evolution may be stochastically driven, at least in part. It will be interesting to see whether a stochastic view rather than a static view is borne out.
References

Abraham R G, Tanvir N R, Santiago B X, Ellis R S, Glazebrook K and van den Bergh S, 1996a, MNRAS 279 L47.
Abraham R G, van den Bergh S, Glazebrook K, Ellis R S, Santiago B X, Surma P and Griffiths R E, 1996b, ApJS 107 1.
Arnold V I, 1978, Mathematical Methods of Classical Mechanics, Springer-Verlag, New York.
Binney J and Tremaine S, 1987, Galactic Dynamics, Princeton University Press, Princeton, New Jersey.
Clutton-Brock M, 1972, Astrophys Space Sci 16 101.
Clutton-Brock M, 1973, Astrophys Space Sci 23 55.
Conselice C J, Bershady M A and Jangren A, 2000, ApJ 529 886.
Courant R and Hilbert D, 1953, Methods of Mathematical Physics, Vol 1, Interscience, New York.
Dahlquist G and Bjork A, 1974, Numerical Methods, Prentice-Hall, Englewood Cliffs.
Hall P, 1981, Ann Stat 9 683.
Hernquist L, 1990, ApJ 356 359.
Hernquist L and Ostriker J P, 1992, ApJ 386(2) 375.
Hjorth J, 1994, ApJ 424 106.
Ikeuchi S, Nakamura T and Takahara F, 1974, Prog Theor Phys 52.
Julian W H and Toomre A, 1966, ApJ 146 810.
Kalnajs A J, 1976, ApJ 205 751.
Kalnajs A J, 1977, ApJ 212(3) 637.
Krall N A and Trivelpiece A W, 1973, Principles of Plasma Physics, McGraw-Hill, New York.
Lin C C and Shu F, 1964, ApJ 140 646.
Morse P M and Feshbach H, 1953, Methods of Theoretical Physics, McGraw-Hill, New York.
Pawula R F, 1967, Phys Rev 162 186.
Pruess S and Fulton C T, 1993, ACM Trans Math Software 63 42.
Risken H, 1989, The Fokker-Planck Equation, Springer-Verlag.
Robijn F H A and Earn D J D, 1996, MNRAS 282 1129.
Saha P, 1993, MNRAS 262 1062.
Shu F H, 1970a, ApJ 160 89.
Shu F H, 1970b, ApJ 160 99.
Toomre A, 1969, ApJ 158 899.
Vesperini E and Weinberg M D, 2000, ApJ 534 598.
Weinberg M D, 1999, AJ 117 629.
Weinberg M D, 2000a, Noise-driven evolution in stellar systems: Theory, submitted to MNRAS, astro-ph/0007275.
Weinberg M D, 2000b, Noise-driven evolution in stellar systems: A universal halo profile, submitted to MNRAS, astro-ph/0007276.
Dynamical methods for reconstructing the large scale galaxy density and velocity fields

Martin Hendry

University of Glasgow, Glasgow, UK
1 Introduction
Over the past decade the study of the large scale structure of the Universe from the analysis of galaxy redshift surveys has matured into an important and highly active area of cosmological research. Redshift surveys have become a powerful tool for probing both the dynamics of galaxy motions on large scales and the nature of the background cosmological model in which the galaxies are embedded. Indeed it is precisely the relationship between these two aspects of redshift surveys, galaxy dynamics and the background cosmology, which is the main focus of this article. While we will see that the subject is firmly rooted in gravitational dynamics, a continuous, fluid description is preferred to a discrete n-body treatment: galaxies are regarded as tracers of a smooth, underlying density and velocity field, and these fields are in turn treated as smooth perturbations on the background homogeneous and isotropic cosmological model. The aim of this article is to describe some of the techniques which have recently been developed to reconstruct the galaxy density and velocity fields on large scales, and to use these reconstructed fields to place constraints on the parameters of the underlying cosmological model. Since this topic lies somewhat removed from the research fields of many participants at this meeting, and perhaps many readers of this monograph, the approach of this article will be didactic, with the focus on the reconstruction methods and their context, discussed from first principles, rather than on specific results. We will set out to answer three basic questions: What are galaxy redshift surveys? Why are they (cosmologically) interesting? How do we extract useful cosmological information from them?
2 What are galaxy redshift surveys?
Figure 1 shows a representation of the so-called Lick map of the projected galaxy distribution in the nearby Universe, displaying the angular positions of about one million galaxies in the Northern hemisphere (Seldner et al. 1977). It is clear that the galaxy distribution across the sky is far from uniform: one sees a complex pattern of clusters, as well as regions which are almost devoid of galaxies.
Figure 1. Representation of the Lick map, showing the projected galaxy distribution of about one million galaxies in the northern hemisphere. Adapted from Peebles (1993).

In the same way as the distribution of stars in the night sky has been assembled into a pattern of constellations which are mere optical illusions, one must ask if the projected galaxy patterns in the Lick map, and later projected surveys such as the APM galaxy catalogue (Maddox et al. 1990), are merely a line-of-sight projection effect, or if they reflect the 3-D spatial distribution of galaxies. Application of the Copernican principle (which broadly speaking states that there should be nothing special about our place in, and viewpoint of, the Universe) implies that the clustering which one observes in the projected galaxy distribution should also be present in the 3-D distribution. Indeed, since the 1970s there has been a major research effort in cosmology to use the statistical properties of the 2-D angular galaxy distribution to infer the 3-D spatial distribution, under certain simplifying assumptions (see Groth and Peebles 1977). What is clearly desirable, however, is a direct probe of the 3-D spatial galaxy distribution; such a probe is provided by galaxy redshift surveys.
Figure 2. Slice of the CfA survey of Huchra et al. (1983), showing the distribution in right ascension and distance of around 1000 galaxies.

Edwin Hubble's observation of a linear relationship between the estimated distance and radial velocity of recession of nearby galaxies (Hubble 1929) revealed the expansion of the Universe and gave birth to modern cosmology. More straightforwardly, Hubble's law also provided us with a simple means to estimate accurately the distance of galaxies on cosmological scales. As any elementary textbook on cosmology will teach us, the linear relation
$$cz = v_{\rm rec} = H_0 d \qquad (1)$$
connecting the redshift, $z$ (deduced from e.g. the wavelength shift of emission lines in the spectrum of a galaxy), radial velocity, $v_{\rm rec}$, and distance, $d$, is an excellent description on scales of up to several hundred Mpc. The constant of proportionality, the Hubble constant, has a long and colourful history, featuring until quite recently a "factor of two" controversy in its measured value (see e.g. Hendry 1997; Mould et al. 2000). Nevertheless, irrespective of the true value of $H_0$, the measured redshift of two distant galaxies is certainly a reliable estimator of their relative distance. In short, then, a redshift survey is a 3-D map of the galaxy distribution where the radial coordinate is the measured recession velocity of each galaxy (or the estimated galaxy distance, assuming a value for the Hubble constant). The function of galaxy redshift surveys is (at least) twofold. First, they are useful as a purely cosmographic tool: by mapping the 3-D positions of galaxies they reveal directly patterns in the distribution which can be only indirectly inferred from the projected distribution. An example of an early redshift survey which made a significant cosmographic impact is the Harvard CfA survey (Huchra et al. 1983), one 'slice' of which, containing about 1000 galaxies, is shown in Figure 2. This picture, from the data of Huchra et al. (1990), shows the right ascension and inferred distance of around 1000 galaxies lying within a 'wedge' on the sky; the thickness (6° in declination) of the wedge has been suppressed in the plot. We can see that the distribution of galaxies in the nearby Universe is characterised by clusters, filamentary structures and voids, with a characteristic size of
around 50 Mpc. This was consistent with the findings of earlier 'pencil beam' redshift surveys, which had indicated the existence of large under-dense regions (Kirshner, Oemler and Schechter 1978). Note the presence of the coherent filamentary structure which stretches across almost all of Figure 2 at a distance of about 80 Mpc, which has been termed the 'Great Wall' (Geller and Huchra 1989). Since the CfA survey was first published many more, considerably deeper, slices of the nearby Universe have been mapped in a similar manner (see e.g. Giovanelli and Haynes 1991; Vogeley et al. 1994; Loveday et al. 1996; Shectman et al. 1996; Vettolani et al. 1997). Among the most recent of these is the 'Two degree field' or 2dF catalogue (Maddox 2000), which is more than an order of magnitude larger in size and depth than the CfA surveys. In the early years of the 21st century, the Sloan Digital Sky Survey (York et al. 2000) will also be completed, generating a catalogue of one million galaxy redshifts across approximately one quarter of the sky. Interestingly, however, the largest observed features in these much later and deeper surveys are still comparable in size to those shown in Figure 2, indicating that on very large scales the Universe begins to look homogeneous, as predicted by the cosmological principle. Around the same time as the first results from 'slice' surveys such as the CfA were being presented, systematic programmes studying all-sky redshift surveys were also being developed (Rowan-Robinson et al. 1990; Yahil et al. 1991), chiefly making use of the (almost) all-sky coverage of the infra-red sky in galaxy maps compiled by IRAS, the Infra-Red Astronomical Satellite, launched in 1983. As we will see in the next section, it is all-sky surveys which will be the main concern of this article.
3 Why are redshift surveys interesting?

3.1 Statistical analysis of galaxy clustering
The statistical analysis of large galaxy redshift surveys is, in itself, a powerful cosmological probe, meriting a lengthy review article in its own right. This is because the observed pattern of galaxy clustering can be directly related to the parameters of the underlying cosmological model. Over the past few decades a very extensive battery of statistical techniques has been developed to extract cosmological information from the analysis of galaxy clustering. These techniques include (with representative references): $n$-point correlation functions (Davis and Peebles 1983), galaxy counts-in-cells (Balian and Schaefer 1989), void probability functions (White 1979), Voronoi tessellations (van de Weygaert and Icke 1989), power spectra (Peacock and Dodds 1994), redshift distortions (Kaiser 1987; Hamilton 1998), spherical harmonic analyses (Scharf and Lahav 1993), percolation and multifractal methods (Klypin and Shandarin 1993), Minkowski functionals (Kerscher et al. 1997), minimal spanning trees (Krzewina and Saslaw 1996) and the genus and other topological measures (Bardeen et al. 1986). Lack of space forces us to avoid further discussion of these statistical techniques and instead to restrict our consideration to methods which make explicit use of dynamical information from redshift surveys. Such a distinction between galaxy clustering and galaxy dynamics is rather artificial, however, since (as we will see in Section 3.3) the spatial distribution and motions of galaxies are inextricably connected within the gravitational instability paradigm, the standard theoretical
framework for the formation of structure in the Universe. For a detailed review of how the statistics of galaxy clustering can be used as a cosmological probe, the reader is referred to e.g. Strauss and Willick (1995).
3.2 Cosmology from galaxy dynamics: overture
To understand fully the dynamical usefulness of all-sky redshift surveys requires the theoretical background which we will review in the next section. The essence of their usefulness is easily seen, however. In the gravitational instability paradigm, the observed radial velocity of a distant galaxy is not simply due to the Hubble expansion, but also includes a contribution from the local motions induced by the gravitational attraction of matter around the galaxy. This additional motion, known as the galaxy's radial peculiar velocity, distorts the positions of galaxies in a redshift survey compared with their true radial positions in space. Whilst for distant galaxies (e.g. $r \gtrsim 100$ Mpc) the contribution of the peculiar velocity to the total observed recession velocity becomes increasingly small and can be safely ignored, on smaller scales peculiar velocities can be significant, not least because they do not represent a random 'error' added to the Hubble expansion velocity, but rather a systematic pattern of deviations from a linear Hubble law which is coherent over large volumes of space. This is because these distortions, far from being peculiar, are exactly what one would predict within the gravitational instability framework: that structure forms through the evolution of density inhomogeneities, which move and grow under the influence of gravity. Examples of early studies of such large scale galaxy flows, from comparing observed galaxy redshifts with estimated galaxy distances (see also Section 4.8), include the detection of 'Virgocentric flow' of the Milky Way towards the nearby Virgo Cluster of galaxies (Aaronson et al. 1982) and the discovery of a streaming motion, coherent over more than 100 Mpc, towards the direction of the constellations Hydra and Centaurus, interpreted as evidence for a mass concentration which was termed the 'Great Attractor' (Dressler et al. 1987). The gravitational instability theory of structure formation allows one to relate the distribution of galaxies in an all-sky redshift survey to their peculiar motions. Moreover, the relationship between the spatial distribution and motions of galaxies (or at least of the matter, from which the galaxies form) depends on the underlying cosmological model in which the galaxy distribution is moving and evolving. This, then, highlights the second function of galaxy redshift surveys: they are also a cosmological tool, i.e. the galaxy distribution can be used to place constraints on parameters of the underlying cosmological model.
3.3 Friedmann's equations and the evolution of the scale factor
In this section we briefly review the essential theoretical elements which relate the parameters of cosmological models to the formation of structure in the Universe under the influence of gravity. There is a vast literature on this subject, with dozens of excellent textbooks and review articles. References which are particularly relevant to the theme of
velocity and density reconstruction include Strauss and Willick (1995), Peacock (1999) and Dekel and Ostriker (1999), and the interested reader is directed to these references for a more detailed (and in some places more rigorous!) introduction to this topic. In the standard Big Bang model of cosmology the evolution of the Universe is described in terms of the Friedmann-Robertson-Walker (FRW) homogeneous and isotropic background model, with line element (describing the interval between two neighbouring points in spacetime) given by (in spherical polar coordinates)
$$ds^2 = c^2dt^2 - a(t)^2\left[\frac{dr'^2}{1 - kr'^2} + r'^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right)\right]. \qquad (2)$$
Here $k$ is the curvature constant, which takes the value $-1$, $0$, and $+1$ for a Universe with negative, zero and positive spatial curvature respectively. The function $a(t)$ denotes the scale factor of the Universe, which is a measure of the typical physical separation of e.g. galaxies as a function of time, $t$, since the Big Bang. The coordinate triple $(r', \theta, \phi)$ denotes comoving coordinates, i.e. coordinates which expand with the background Universe. Thus, in the limit of a completely homogeneous and isotropic Universe, the comoving separation of two points in spacetime does not change with time, while the proper distance, $r$, between them is given by
$$r = a(t)\, f(r') \qquad (3)$$
where
$$f(r') = \begin{cases} \sin^{-1} r' & k = +1,\\ r' & k = 0,\\ \sinh^{-1} r' & k = -1.\end{cases} \qquad (4)$$
The dynamical evolution of the scale factor can be determined from the solution of Einstein's equations, assuming the FRW metric of Equation (2) and treating the mass-energy content of the Universe as a perfect fluid, with density, $\rho$, and pressure, $P$ (usually assumed to be zero). One obtains the following two equations for $a(t)$:
$$\frac{\ddot a}{a} = -\frac{4}{3}\pi G\left(\rho + \frac{3P}{c^2}\right) + \frac{\Lambda c^2}{3} \qquad (5)$$
and
$$\left(\frac{\dot a}{a}\right)^2 = \frac{8\pi G\rho}{3} + \frac{\Lambda c^2}{3} - \frac{kc^2}{a^2}. \qquad (6)$$
Here $\Lambda$ is the cosmological constant, introduced by Einstein to allow a static solution for $a(t)$. The Hubble parameter is defined as $H \equiv \dot a/a$, with present-day value (i.e. evaluated at time $t = t_0$) given by $H_0$, which is of course the constant of proportionality in Equation (1). For $\Lambda = 0$,
$$\rho_{\rm crit} = \frac{3H^2}{8\pi G},$$
where $\rho_{\rm crit}$ is the critical density which marks the division between a Universe which will expand indefinitely and one which will eventually re-collapse. The dimensionless matter density, $\Omega_m$, is defined as
$$\Omega_m \equiv \frac{\rho}{\rho_{\rm crit}} = \frac{8\pi G\rho}{3H^2},$$
which is, in general, time dependent due to the time dependence of $\rho$ and $H$. We denote by $\Omega_{m0}$ the present-day value of the density parameter. It is also customary to define
$$\Omega_\Lambda \equiv \frac{\Lambda c^2}{3H^2},$$
which denotes the (time-dependent, through $H$) contribution of the cosmological constant to the dimensionless density parameter. The generic prediction of inflationary cosmological models (Kolb and Turner 1990; Liddle and Lyth 2000), which undergo a period of very rapid (i.e. exponential) expansion during early times, is that the Universe has zero curvature (i.e. $k = 0$), from which it is straightforward to show that
$$\Omega_m + \Omega_\Lambda = 1,$$
although the relative importance of $\Omega_m$ and $\Omega_\Lambda$ changes as the Universe evolves. There has been much recent interest in the cosmological literature in attempts to measure the values of $\Omega_{m0}$ and $\Omega_{\Lambda 0}$ from the Hubble diagram of distant supernovae and the pattern of temperature fluctuations in the cosmic microwave background radiation (CMBR), the relic radiation from the Big Bang which emanates from the 'surface of last scattering': the epoch until which the Universe was hot enough to keep hydrogen ionised, and thus effectively opaque due to the scattering of photons by free electrons. For further discussion of this exciting new area of research, see e.g. White, Scott and Silk (1994), Lineweaver (1997), Perlmutter et al. (1999) and Tegmark and Zaldarriaga (2000). In this article, however, we will restrict our attention to methods for estimating the (present day) dimensionless matter density, $\Omega_{m0}$, from redshift surveys, and for notational convenience we will henceforth denote $\Omega_{m0}$ simply by $\Omega_0$.
3.4 How did structure form?
The gravitational instability paradigm is the standard theoretical framework which describes the growth of structure on cosmological scales. It asserts that in the early Universe (e.g. at the epoch of the CMBR, about 300,000 years after the Big Bang) there were already present small perturbations in the density of matter. The origin of these perturbations need not concern us here, although in most scenarios they are 'imprinted' on the microwave background from a much earlier epoch, e.g. due to quantum fluctuations during the inflationary phase of the Universe (see e.g. Liddle and Lyth 2000). What is important is that the density perturbations grow under the influence of gravity, and essentially one needs to consider no physical processes other than gravity in order to explain the distribution and motion of the matter as the Universe evolves. The problem is most readily treated using a fluid description (e.g. Weinberg 1972; Coles and Lucchin 1993; Peacock 1999), i.e. we define a (scalar) mass density field $\rho = \rho(\mathbf{r}, t)$ and a (vector) velocity field $\mathbf{V} = \mathbf{V}(\mathbf{r}, t)$ as a function of position, $\mathbf{r}$ (in proper coordinates), and cosmic time, $t$. (Note that, unlike in the homogeneous background cosmology, the density is now a function of position. Note also that $\mathbf{V}$ includes the Hubble expansion at $\mathbf{r}$.) The time evolution of $\rho$ and $\mathbf{V}$ will then be described by the following equations
$$\frac{\partial\rho}{\partial t} + \nabla_r\cdot(\rho\mathbf{V}) = 0 \qquad (11)$$
$$\frac{\partial\mathbf{V}}{\partial t} + (\mathbf{V}\cdot\nabla_r)\mathbf{V} + \nabla_r\phi = 0 \qquad (12)$$
and
$$\nabla_r^2\phi = 4\pi G\rho \qquad (13)$$
where $\nabla_r$ denotes the gradient operator in proper coordinates. These are, respectively, the equation of continuity, the equation of motion and the Poisson equation for the fluid, with (scalar) gravitational potential $\phi = \phi(\mathbf{r}, t)$. If we consider galaxies observed today in the nearby Universe as tracers of the velocity field, $\mathbf{V}$, the observed line-of-sight recession velocity (equal to the radial component of $\mathbf{V}$) of a galaxy at position $\mathbf{r}$ is given by
$$v_{\rm rec} = H_0 r + \hat{\mathbf{r}}\cdot\left[\mathbf{v}(\mathbf{r}) - \mathbf{v}_0\right]. \qquad (14)$$
Here $\mathbf{v}(\mathbf{r})$ is the peculiar velocity at position $\mathbf{r}$, $\hat{\mathbf{r}}$ is a unit vector in the direction of the observed galaxy and $\mathbf{v}_0$ is the peculiar velocity of our location with respect to the reference frame in which the temperature distribution of the CMBR appears isotropic, i.e. the reference frame comoving with the mean motion of galaxies in the local Universe. We can determine this peculiar velocity if we assume that it produces the observed dipole anisotropy in the CMBR (Fixsen et al. 1994; Lineweaver et al. 1996). Correcting for this peculiar velocity is usually referred to as correcting observed galaxy redshifts to the 'CMBR frame'. We next define the dimensionless density contrast, $\delta$,
$$\delta(\mathbf{r}, t) \equiv \frac{\rho(\mathbf{r}, t) - \bar\rho(t)}{\bar\rho(t)}, \qquad (15)$$
where $\bar\rho(t)$ is the mean mass density of the background FRW model. Expanding Equations (11) and (12) to first order in $\delta$ and $|\mathbf{v}|$, converting the gradient operator to comoving coordinates and subtracting the zeroth-order solution for the background model leads to the following equations
$$\frac{\partial\delta}{\partial t} + \frac{1}{a}\nabla\cdot\mathbf{v} = 0 \qquad (16)$$
and
$$\frac{\partial\mathbf{v}}{\partial t} + \frac{\dot a}{a}\,\mathbf{v} + \frac{1}{a}\nabla\phi = 0. \qquad (17)$$
Taking the time derivative of Equation (16), the divergence of Equation (17) and substituting into Equation (13) gives
$$\frac{\partial^2\delta}{\partial t^2} + 2\,\frac{\dot a}{a}\,\frac{\partial\delta}{\partial t} = 4\pi G\bar\rho\,\delta. \qquad (18)$$
We seek a solution which is separable in its spatial and time dependence, of the form
$$\delta = A(\mathbf{r})\, D_1(t) + B(\mathbf{r})\, D_2(t) \qquad (19)$$
where $D_1$ and $D_2$ are respectively growing and decaying modes (i.e. $D_1$ grows and $D_2$ decays as the scale factor increases). A simple analytic solution for $\delta$ exists in special cases, for example the Einstein-de Sitter model with $\Omega_0 = 1$ and $\Lambda = 0$. In this case it is easy to show that $a(t) \propto t^{2/3}$. Mass conservation implies that $\rho \propto a(t)^{-3}$, which then reduces Equation (18) to
$$\frac{\partial^2\delta}{\partial t^2} + \frac{4}{3t}\,\frac{\partial\delta}{\partial t} = \frac{2}{3t^2}\,\delta. \qquad (20)$$
This has the analytic solution
$$\delta(\mathbf{r}, t) = A(\mathbf{r})\, t^{2/3} + B(\mathbf{r})\, t^{-1}. \qquad (21)$$
For more general cosmological models, the solution for $\delta$ depends on the values of $\Omega_0$ and $\Omega_\Lambda$. At late times the second term in this solution becomes negligible, so that Equation (16) reduces to
$$\nabla\cdot\mathbf{v} = -H_0\, f\,\delta, \qquad (22)$$
where the growth factor, $f$, is given by
$$f = \frac{1}{H_0 D_1}\,\frac{dD_1}{dt}. \qquad (23)$$
An excellent approximation to $f$ (Lahav et al. 1991) is
$$f(\Omega_0, \Omega_\Lambda) \simeq \Omega_0^{0.6} + \frac{\Omega_\Lambda}{70}\left(1 + \frac{\Omega_0}{2}\right), \qquad (24)$$
from which it follows that (neglecting the very weak $\Omega_\Lambda$ dependence)
$$\delta = -\frac{\nabla\cdot\mathbf{v}}{H_0\,\Omega_0^{0.6}}. \qquad (25)$$
Equation (25) may be solved using standard techniques from the theory of electrostatics, to give
$$\mathbf{v}(\mathbf{r}) = \frac{H_0\,\Omega_0^{0.6}}{4\pi}\int d^3r'\;\delta(\mathbf{r}')\,\frac{\mathbf{r}'-\mathbf{r}}{|\mathbf{r}'-\mathbf{r}|^3}. \qquad (26)$$
Equations (25) and (26) epitomise the theme of this article: in the gravitational instability paradigm there is a precise relationship between the density field and the peculiar velocity field of matter, and that relationship depends on the parameters of the background cosmological model. Thus, if we use the distribution and motions of galaxies to reconstruct the matter density and velocity fields, we can in turn use these dynamical fields to place constraints on the background cosmological parameters. The importance of all-sky redshift surveys in this reconstruction process is seen from Equation (26), where the integral is formally over all space. In other words, the peculiar velocity field at a given position is not simply induced locally, but results from the mass distribution in a large surrounding volume. Thus, simply ignoring the contribution from a particular region of the survey volume, due to e.g. poor sampling, will result in systematic errors in the reconstructed dynamical fields. The theoretical framework presented thus far is a simplification in several key respects, and the remainder of this article is concerned with how that simplified picture is handled in practice. Specifically, we must address the following points:
We measure galaxy redshifts, not positions, but Equation (26) is an integral over spatial positions.

Our theoretical framework treats $\delta$ and $\mathbf{v}$ as smooth fields, following a fluid description. We use galaxy redshift surveys to trace $\delta$ and $\mathbf{v}$, but our tracers are only a sparse and noisy sample of those fields.

The analysis which leads to Equations (25) and (26) assumes linear perturbation theory. In practice we must deal with a density field which has evolved (at least on small scales) well beyond the linear regime.

$\delta$ measures the overdensity in the distribution of mass, whereas redshift surveys measure the distribution of galaxies. The relationship between the galaxy and mass distributions may be (and again, at least on small scales, clearly is!) non-trivial.

Despite these caveats, the essence of Equations (25) and (26) is a simple and rather profound idea: that the dynamics of galaxies can tell us something of the nature of the Universe in which we live, determining, for example, whether the Universe will eventually re-collapse or continue expanding indefinitely. One should not lose sight of this far-reaching result, while recognising, as we will see in the following sections, that the devil is in the detail.
4 How do we extract information from surveys?

4.1 Iterative reconstruction methods
A common methodology adopted in attempting to recover cosmologically useful information from all-sky redshift surveys is to solve Equation (26) in real space iteratively in a self-consistent manner. The procedure adopted is essentially as follows.

1. Estimate a smooth $\delta(\mathbf{r})$ from the observed distribution of galaxies in the redshift survey, assuming first that $cz = H_0 r$, i.e. ignoring the distorting effect of galaxy peculiar motions.

2. Using Equation (26), compute the peculiar velocity field, $\mathbf{v}(\mathbf{r})$, predicted by the assumed $\delta(\mathbf{r})$.

3. Use the radial components of the predicted velocity field to correct the 'observed' values of the smoothed density field at each position $\mathbf{r}$.

4. Repeat iteratively from step (2) until convergence is reached.

This iterative reconstruction procedure has been applied to a number of different redshift surveys, including for example the IRAS 1.9Jy and 1.2Jy surveys (Strauss et al. 1992; Fisher et al. 1995a), the QDOT survey (Kaiser et al. 1991), the Optical Redshift Survey (Santiago et al. 1995) and, recently, the IRAS PSCz survey (Branchini et al. 1999). Other notable implementations of this iterative scheme, for various galaxy surveys and catalogues, include Hudson (1993), Freudling et al. (1994) and Teodoro (1999).
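Step 2 can be carried out efficiently on a grid using the Fourier-space form of the linear-theory relation, $\mathbf{v}_{\mathbf{k}} = i H_0\,\Omega_0^{0.6}\,\delta_{\mathbf{k}}\,\mathbf{k}/k^2$, which follows from Equations (25) and (26). The sketch below is illustrative only: it assumes a periodic box, uses an arbitrary Gaussian random field in place of the smoothed galaxy overdensity, and the values of `H0` and `omega0` are placeholders.

```python
import numpy as np

def linear_velocity(delta, box_size, H0=100.0, omega0=0.3):
    """Linear-theory peculiar velocity field from an overdensity field on a
    periodic grid: v_k = i H0 Omega0^0.6 delta_k k / k^2 (cf. Equations 25, 26)."""
    n = delta.shape[0]
    kfreq = 2.0 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    kx, ky, kz = np.meshgrid(kfreq, kfreq, kfreq, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0                      # avoid division by zero; the k=0 mode carries no velocity
    delta_k = np.fft.fftn(delta)
    factor = 1j * H0 * omega0**0.6 * delta_k / k2
    v = [np.real(np.fft.ifftn(factor * ki)) for ki in (kx, ky, kz)]
    return np.stack(v)                     # shape (3, n, n, n)

# Toy example: a random overdensity field in a 100 Mpc box.
rng = np.random.default_rng(2)
delta = 0.1 * rng.standard_normal((32, 32, 32))
vel = linear_velocity(delta, box_size=100.0)
print(vel.shape, vel.std())
```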
4.2 The impact of sparse sampling
Sparse sampling of the galaxy distribution throughout the volume of the redshift survey introduces random 'shot noise' in the reconstruction. This can to some extent be controlled by filtering of the smoothed galaxy density field, using e.g. the Wiener filter, which minimises the variance between the reconstructed and true density fields (Wiener 1949). Further discussion of this topic lies beyond the scope of these lectures, but the reader is referred to Fisher et al. (1995b) for more details. It is worth noting in passing, however, that the QDOT survey (Rowan-Robinson et al. 1990) confronted directly the issue that galaxies are only discrete tracers of the underlying dynamical fields, and made something of a virtue of a sparse sampling strategy, designed to optimise the efficiency of the survey. In other words, instead of obtaining redshifts for every relevant galaxy within a given volume of the local Universe, the QDOT strategy was to sample only a fixed fraction of galaxies, sufficient to reconstruct iteratively the dynamical fields without substantial loss of cosmological information, while allowing the survey to probe a much larger volume, albeit only sparsely. More serious than 'shot noise' effects are variations in the sampling with redshift and/or direction, both of which introduce systematic errors in the reconstructed density and velocity fields.
4.3 The impact of observational selection effects

While the iterative reconstruction of the dynamical fields is a straightforward idea in principle, its practical implementation is limited by the fact that the integral of Equation (26) is over all space, while redshift surveys are generally flux-limited and masked. These two effects mean that galaxies, respectively, fainter than a specified limiting observable flux and within a particular 'masked' solid angle on the sky will be excluded from the survey; the two exclusions can be termed together 'observational selection effects', and also pose problems for other, non-iterative, reconstruction procedures, as we will see later. The presence of a directional mask arises because a redshift survey is never truly 'all-sky', since the observation of e.g. IRAS galaxies behind the plane of the Milky Way is rendered almost impossible by the contaminating effect of dust in our own Galaxy. The IRAS 1.9Jy and 1.2Jy surveys covered 88% of the sky, missing a strip 10° wide around the Galactic plane and high-latitude regions inadequately covered by the IRAS satellite. The more recent PSCz survey (Saunders et al. 2000) had similar sky coverage but to a deeper flux limit. The effect of a flux limit is generally radial rather than directional. It renders the sampling of galaxies increasingly sparse at large distances. This problem is usually addressed by defining a selection function for the survey, which measures the probability that a galaxy at a given distance meets the selection criteria to enter the survey catalogue. We can illustrate how a selection function is defined as follows. Suppose that the luminosity of all galaxies were a constant, $L = L_*$, say. Then, for a flux limit $F_{\min}$, it follows that a galaxy at distance, $r$, will be observable provided
$$4\pi r^2 F_{\min} \le L_*,$$
or
$$r \le r_{\rm lim} = \left(\frac{L_*}{4\pi F_{\min}}\right)^{1/2}.$$
Thus one could in principle sample completely the galaxy distribution within a sphere of radius $r_{\rm lim}$, producing a volume-limited survey. In practice, however, galaxy luminosities are not constant but are described by a luminosity function, $\Phi(L)$, such that $\Phi(L)\,dL$ is proportional to the number density of galaxies with luminosity between $L$ and $L + dL$. Thus for a sharp flux limit at $F_{\min}$, galaxies of different luminosities will 'fade out' at different distances, and the selection function for the survey is a convolution over the galaxy luminosity function. Explicitly, the selection function, $\phi(r)$, is given (up to normalisation) by
+
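As a concrete numerical illustration (not part of the original lectures), the minimal sketch below evaluates φ(r) for an assumed Schechter form of the luminosity function; the Schechter shape, its parameters and the luminosity cut-offs are assumptions made purely for the example.

```python
import numpy as np

def schechter(L, phi_star=1.0, L_star=1.0, alpha=-1.1):
    """Schechter luminosity function Phi(L), galaxies per unit volume per unit L (toy units)."""
    x = L / L_star
    return (phi_star / L_star) * x**alpha * np.exp(-x)

def selection_function(r, F_min, L_small=1e-3, L_max=1e3, n=2000):
    """phi(r): fraction of the luminosity function observable at distance r for a sharp
    flux limit F_min, i.e. the part above L_min(r) = 4 pi r^2 F_min.
    A lower cut-off L_small is needed because the faint end of the Schechter function diverges."""
    L = np.logspace(np.log10(L_small), np.log10(L_max), n)
    Phi = schechter(L)
    L_min_r = 4.0 * np.pi * r**2 * F_min
    num = np.trapz(np.where(L >= L_min_r, Phi, 0.0), L)
    den = np.trapz(Phi, L)
    return num / den
```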
The mean galaxy number density, n̄, can be estimated from the selection function computed for a discrete sample of galaxies as follows
$$\bar{n} = \frac{1}{V} \sum_{\rm galaxies} \frac{1}{\phi(r_i)}$$
and a smoothed galaxy overdensity field, δ_g(r), can be constructed from
$$\delta_g(\mathbf{r}) = \frac{1}{\bar{n}} \sum_i \frac{W(\mathbf{r} - \mathbf{r}_i)}{\phi(r_i)} - 1$$
where W(r − r_i) is a window function which smooths the point galaxy distribution. Examples of window functions commonly used in the literature include a spherical 'top hat' window and a Gaussian window (see e.g. Strauss and Willick 1995 for more details). Thus, the sparse sampling at large distances caused by the flux limit is compensated in the reconstruction procedure by the presence of the selection function, 'renormalising' our estimate of the smooth galaxy density field to take account of the 'missing' galaxies. The IRAS 1.9Jy survey mapped the redshifts of 2658 galaxies with 60 micron flux f_60 brighter than 1.936 Jy. The 1.2Jy survey increased this total to 5339 galaxies with f_60 ≥ 1.2 Jy, and the PSCz survey to 15411 galaxies with f_60 ≥ 0.6 Jy. The impact on the reconstruction of incomplete sky coverage, caused by the masking effects of e.g. the Galactic plane, must also be taken into account. One approach to this problem is to 'fill in' the masked regions by adding randomly generated galaxies to the redshift survey. Various strategies have been adopted to do this. Yahil et al. (1991) 'cloned' galaxies from the regions neighbouring the masked region, thus ensuring continuity of structure across the Galactic plane. Figure 3, from Yahil et al. (1991), shows the distribution of galaxies in the sky from the IRAS 1.9Jy redshift survey used in their reconstruction, including the cloned galaxies in the masked region.
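The selection-weighted smoothing just described can be sketched in code as follows (not from the original lectures); it assumes a catalogue of Cartesian galaxy positions with the selection function already evaluated at each galaxy, and uses a Gaussian window normalised to unit integral.

```python
import numpy as np

def smoothed_overdensity(gal_xyz, phi, grid_xyz, sigma, volume):
    """Selection-weighted, Gaussian-smoothed galaxy overdensity delta_g.

    gal_xyz  : (N, 3) galaxy positions (e.g. in km/s, taking H0 = 1)
    phi      : (N,) selection function phi(r_i) evaluated at each galaxy
    grid_xyz : (M, 3) points at which to evaluate delta_g
    sigma    : Gaussian smoothing length (same units as positions)
    volume   : survey volume used to estimate the mean density
    """
    # Mean number density, each galaxy weighted by 1/phi to account for
    # the fainter galaxies missed at its distance.
    nbar = np.sum(1.0 / phi) / volume

    # Gaussian window, normalised to unit integral in 3-D.
    norm = (2.0 * np.pi * sigma**2) ** -1.5
    delta = np.empty(len(grid_xyz))
    for k, r in enumerate(grid_xyz):
        d2 = np.sum((gal_xyz - r) ** 2, axis=1)
        W = norm * np.exp(-0.5 * d2 / sigma**2)
        delta[k] = np.sum(W / phi) / nbar - 1.0
    return delta
```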
Figure 3. The distribution in the sky of galaxies in the IRAS 1.9Jy redshift survey, including random galaxies in the masked region, 'cloned' from neighbouring regions. From Yahil et al. (1991).
In recent years a more sophisticated statistical technique has been developed to improve further the treatment of the masked region. This approach again uses the idea of the Wiener filter (Wiener 1949), and can be regarded as a Bayesian approach to the problem, using the observed variations outside of the mask to define a prior model for the all-sky variations (Fisher et al. 1995b; Zaroubi et al. 1995). A similar statistical formalism has also been used to reconstruct the temperature distribution of the cosmic microwave background radiation behind the Galactic Plane (Bunn, Hoffman and Silk 1996).
4.4 The impact of non-linear effects: triple valued zones
The iterative procedure outlined above uses Equation (26) to compute corrections to the redshift space distribution, but as we have already remarked Equation (26) is strictly valid only in the linear regime, where δ ≪ 1. How does one handle the situation when the linear approximation begins to break down? We will return to non-linear effects in detail in Section 4.6, but in the context of our iterative procedure we can easily see how they will already manifest themselves in a problematic way. Figure 4 is a cartoon diagram sketching the relation between distance and redshift along a particular line of sight close to an overdense region in the survey. The gravitational effect of the overdensity induces a positive radial peculiar velocity for galaxies in front of, and a negative radial peculiar velocity for galaxies behind, the overdensity. For a sufficiently small overdensity (or for a line of sight which does not directly cross the overdense region) the redshift-distance relation will deviate from the linear form expected for pure Hubble expansion (shown by the dotted line in Figure 4) but will still be monotonically increasing. As the size of the overdensity (and/or the proximity to it) increases, however, a redshift-distance relation similar to the solid curve will be obtained. Thus at observed redshift A, for example, a unique distance may still be inferred; at
Figure 4. Schematic illustration of the problem of 'triple valued zones' in non-linear evolution. The solid curve shows the relation between redshift and distance along a particular line of sight close to an overdensity in the matter distribution. At observed redshift A a unique distance may still be inferred, but at redshift B galaxies at three distinct distances along the line of sight have the same observed redshift.
redshift B, on the other hand, galaxies at three distinct distances along the line of sight have the same observed redshift. The obvious question arises: which distance should one adopt? An iterative reconstruction scheme based on linear theory alone has no exact means of dealing with such triple valued zones. The approach adopted by e.g. Yahil et al. (1991) and subsequent authors has been to assign a distance to each galaxy probabilistically, drawn from the three computed possible solutions, so that the line of sight galaxy distribution matches in a statistical sense the reconstructed underlying density field from which it is sampled. In other words, one builds into the reconstruction a model for p(r|cz), the probability distribution for the true galaxy distance, given the observed redshift. A more sophisticated version of this approach, known as the VELMOD maximum likelihood analysis, has recently been developed (Willick and Strauss 1998).
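To make the geometry of Figure 4 concrete, the following sketch (not from the original text) finds every distance compatible with a given observed redshift along a line of sight, for a model of the predicted radial peculiar velocity u_p(r); the Gaussian infall pattern used here is purely illustrative.

```python
import numpy as np

def compatible_distances(cz, u_of_r, r_max=5000.0, dr=1.0):
    """Return all distances r (in km/s, H0 = 1) satisfying cz = r + u_of_r(r).

    In a triple valued zone this list has three entries; elsewhere only one.
    """
    r = np.arange(0.0, r_max, dr)
    f = r + u_of_r(r) - cz                      # roots of f(r) = 0 are the solutions
    crossings = np.where(np.diff(np.sign(f)) != 0)[0]
    # Linear interpolation across each sign change to locate the root.
    return r[crossings] - f[crossings] * dr / (f[crossings + 1] - f[crossings])

# Illustrative infall pattern around an overdensity at r = 2000 km/s:
u_model = lambda r: -600.0 * (r - 2000.0) / 200.0 * np.exp(-0.5 * ((r - 2000.0) / 200.0) ** 2)

print(compatible_distances(2050.0, u_model))    # three distances near the overdensity
print(compatible_distances(3500.0, u_model))    # a single distance well away from it
```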
4.5 Relating the galaxy and mass density fields: linear biasing?
Perhaps the most significant approximation in Equations (25) and (26) is the assumption that galaxies are faithful tracers of the mass density field. On a very basic level this assumption is clearly false: we developed the theory of structure formation assuming a smooth mass density field, while galaxies are sampled sparsely at discrete locations. The question remains, however: even with a 'good' estimate of δ_g(r) provided by a densely sampled all-sky redshift survey, to what extent can we assume that δ_g(r) = δ(r), or, as the problem is usually expressed, to what extent does galaxy light trace mass?
Kaiser (1984) raised this question in the context of galaxy formation models and showed that, on the (reasonable) assumption that galaxies form at the peaks of the mass density field, galaxy clustering will be stronger than the clustering of the mass distribution. Expressed in more formal statistical language, the two-point correlation function of the galaxy distribution (which measures the mean excess number of galaxy pairs with a given spatial separation, compared to the expected number of pairs in the absence of galaxy clustering) has a higher amplitude, indicating stronger clustering, than the correlation function of the mass distribution. Kaiser considered the case where galaxies form at peaks in the mass distribution above a certain, fixed, threshold, and showed that in this case the following simple relation holds
$$\delta_g(\mathbf{r}) = b\,\delta(\mathbf{r})$$
where b is a constant related to the galaxy formation threshold. Kaiser's model introduced the concept of 'biased galaxy formation', and the constant b is known as the linear bias parameter. While it is widely recognised that the threshold picture of how and where galaxies form is a gross oversimplification, the adoption of a linear treatment of galaxy bias has been a very popular one, not least because it can be very easily incorporated into the basic theoretical framework summarised in Section 3.2. In short, if we assume δ_g(r) = b δ(r) then we can re-write Equations (25) and (26) as
$$\delta_g(\mathbf{r}) = -\frac{1}{H_0\,\beta}\,\nabla \cdot \mathbf{v}(\mathbf{r})$$
and
$$\mathbf{v}(\mathbf{r}) = \frac{H_0\,\beta}{4\pi} \int d^3 r'\,\delta_g(\mathbf{r}')\,\frac{\mathbf{r}' - \mathbf{r}}{|\mathbf{r}' - \mathbf{r}|^3}$$
where we have written
$$\beta = \frac{\Omega_0^{0.6}}{b}. \qquad (35)$$
Thus, the task of constraining Ω₀ from a comparison of the density and velocity fields becomes instead the task of constraining β. A great deal of work has been carried out in recent years testing the linear bias assumption (e.g. exploring the dependence of b on galaxy morphology and scale) and considering the possible application of reconstruction methods to measure non-linear biasing; see for example Coles and Lucchin (1995) and Coles (1997) for a general introduction, and Dekel and Lahav (1999) and Sigad, Branchini and Dekel (2000) for recent perspectives. Such reconstruction analyses are in their infancy, however (although greater progress has been made in the statistical analysis of galaxy clustering as a probe of non-linear biasing: see e.g. Verde et al. 1998; Verde, Heavens and Matarrese 2000; Scoccimarro et al. 2001) and in the remainder of our discussion of velocity and density reconstruction we shall restrict ourselves to the linear biasing model.
4.6 Beyond linear theory: the Zel'dovich approximation
In analysing the evolution of cosmic structure in terms of a fluid description, one can adopt either a Lagrangian or Eulerian formulation. In the former case one defines a coordinate system which is ‘attached’ to the fluid elements, so that the Lagrangian coordinates of
the elements remain fixed as the fluid evolves; in the latter case one's coordinate system is attached to points in space, with respect to which the fluid elements will move as the fluid evolves. Comoving coordinates are in some sense a hybrid of the two, since the comoving coordinates of the homogeneous background cosmology remain fixed as the Universe expands, but the evolving fluid which is a perturbation on the background model contains elements whose position and velocity change. Linear perturbation theory makes the assumption that the comoving positions of the fluid elements change negligibly as the Universe expands, so that structures grow simply according to the linear growth factor, f, of Equation (23). Zel'dovich (1970) introduced an important approximation which extended linear theory in a powerful, and rather elegant, manner. The Zel'dovich approximation assumes that the difference between the Lagrangian position (usually denoted by q) and Eulerian position (x) of a fluid element may be written as the product of a purely time-dependent function and a purely spatially-dependent function, i.e.
$$\mathbf{x}(\mathbf{q}, t) = a(t)\left[\mathbf{q} + D_1(t)\,\boldsymbol{\psi}(\mathbf{q})\right] \qquad (36)$$
where D₁(t) is the growing mode which appeared in Equation (19) and ψ(q) is proportional to the gradient of the gravitational potential. Thus we see that the Eulerian position is simply the Hubble expansion, plus a separable perturbation. The Zel'dovich approximation is very useful, in that it gives an excellent approximation to the true evolution of the velocity and density fields into the mildly non-linear regime (roughly to δ of order 5), as has been verified extensively by numerical simulations (see e.g. Sahni and Coles 1995; Dekel et al. 1999). It remains an excellent approximation so long as there remains a one-to-one mapping between the initial and final Eulerian positions of the fluid elements, i.e. so long as the trajectories of fluid elements do not intersect ('shell crossing'). Its usefulness in the context of dynamical reconstruction lies specifically in providing mildly non-linear versions of Equations (25) and (26), thus allowing the density and velocity fields to be related on smaller scales than permitted in linear theory.
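A one-dimensional toy version of Equation (36) (not from the original text) illustrates how Zel'dovich displacements generate structure and, for large enough D₁, shell crossing; the displacement field ψ(q), built from random Fourier modes, is purely illustrative.

```python
import numpy as np

def zeldovich_positions(n_part=4096, box=100.0, d1=1.0, n_modes=32, seed=1):
    """1-D Zel'dovich mapping x = q + D1 * psi(q) for a random displacement field.

    psi(q) is a sum of random sine modes, standing in for the scaled gradient
    of the initial gravitational potential.
    """
    rng = np.random.default_rng(seed)
    q = np.linspace(0.0, box, n_part, endpoint=False)    # Lagrangian positions
    psi = np.zeros_like(q)
    for n in range(1, n_modes + 1):
        k = 2.0 * np.pi * n / box
        amp = rng.normal(scale=1.0 / n)                  # more power on large scales
        phase = rng.uniform(0.0, 2.0 * np.pi)
        psi += amp * np.sin(k * q + phase)
    x = (q + d1 * psi) % box                             # comoving Eulerian positions
    return q, x

# Caustics ('shell crossing') appear where 1 + D1 * dpsi/dq passes through zero:
q, x = zeldovich_positions(d1=2.0)
density, edges = np.histogram(x, bins=200, range=(0.0, 100.0))
```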
4.7 Directly mapping redshift space to real space
One of the main drawbacks of the simple iterative reconstruction scheme outlined in Section 4.1 is that it is computationally intensive. Several attractive alternatives have gained in popularity in recent years, which involve establishing a direct, non-iterative, relation between the dynamical fields in real space and redshift space. The first direct approach which we consider is based on the fact that, as we have just seen in Section 4.6, we can establish a unique one-to-one mapping (the Zel'dovich approximation) between the initial and final positions of galaxies which is valid up until shell crossing occurs. Assuming that the peculiar velocity field is irrotational (a reasonable assumption on large scales, since in linear theory the growing mode, D₁ in Equation (19), is indeed irrotational) one can then write down a Jacobian for the transformation between the initial galaxy distribution in real space and the final galaxy distribution in redshift space. The irrotationality assumption implies that the peculiar velocity field can be written as the gradient of a scalar velocity potential, Φ, a differential equation for which can be obtained from the Jacobian. Nusser and Davis (1994) developed this approach, known
as the ITF method, by
expanding the density field and velocity potential (both in redshift space) in terms of spherical harmonics, producing the differential equation
$$\frac{1}{s^2}\frac{d}{ds}\!\left(s^2\,\frac{d\Phi_{lm}}{ds}\right) - \frac{l(l+1)}{s^2}\,\Phi_{lm} = \frac{\beta}{1+\beta}\left(\delta_{lm} - \frac{1}{s}\,\frac{d\log\phi}{d\log s}\,\frac{d\Phi_{lm}}{ds}\right) \qquad (37)$$
where s is the redshift space radial coordinate and φ is the radial selection function defined in Equation (29); clearly selection effects have an impact here in the same way as they did for the iterative solution of Equation (26). One can solve Equation (37) for the spherical harmonic coefficients, Φ_lm, on a given shell in redshift space, and then obtain the (β dependent) peculiar velocity field by differentiation of the velocity potential. No iteration is required, although, since the method assumes the validity of the Zel'dovich approximation, it is incompatible with the existence of triple valued zones, and thus cannot be applied on small scales. A related approach was adopted in Fisher et al. (1995b), where the density field was expanded in angular spherical harmonics and radial spherical Bessel functions, although this time assuming linear theory. Again one may establish the (β dependent) relationships between the expansion coefficients in real space and redshift space, taking account of the radial selection function and the angular mask of the redshift survey, and applying a Wiener filter to reduce the impact of shot noise. Rigorous testing of the reconstruction methods was carried out using mock redshift surveys generated from n-body simulations. A comparison of 'true' and reconstructed velocities showed that the ITF method (Zel'dovich approximation + 2-D spherical harmonics on radial shells) and the Fisher et al. method (linear theory + Wiener filtered 3-D spherical harmonics) gave very similar results, and both were clearly superior to the iterative reconstruction scheme. Further development of the 'redshift space to real space operator' approach, using a spherical harmonic decomposition of the density and velocity field, can be found in Taylor and Valentine (1999).
Due to the presence of the radial and angular selection effects, the spherical harmonics and spherical Bessel functions no longer provide an orthogonal set of basis functions for the decomposition of the density field. This introduces coupling between the different modes of the expanded density field, and means that one cannot solve separately for each mode. However, the problem is still reasonably tractable since the redshift space distortions produced by the radial peculiar velocities and the flux limit affect only the radial modes, while the angular mask affects only the angular modes. An interesting alternative approach, which preserves the orthogonality properties of the expansion coefficients, was considered in Schumacher (2000). Here a Gram-Schmidt orthogonalisation procedure was applied to the redshifts and angular positions of galaxies in the PSCz catalogue in order to generate a new, fully orthogonal, set of basis functions on the ‘masked’ space (i.e. including the angular and radial selection effects directly in the orthogonalisation procedure). This analysis was closely based on a similar treatment of the ‘masked’ sky maps of the microwave background radiation, first proposed by Gorski (1994). Another direct method of inferring the real space density and velocity field, proposed by Peebles (1990) and further developed by Giavalisco et al. (1993), is based on the least action principle and is (uniquely among the techniques discussed in this article) suitable for application to a discrete n-body system. One can derive the exact equations of motion for a multi-body gravitating system by finding the stationary points of the action. The
method is exact, provided shell crossing has not occurred, and converges very quickly. It was first applied to redshift survey data in Peebles (1994) and Shaya, Peebles and Tully (1995). The least action approach has recently been developed further, via the Path Interchange Zel'dovich Approximation (PIZA) method of Croft and Gaztañaga (1997) and Valentine, Saunders and Taylor (2000).
4.8 Comparison with real data: independent distance indicators
The essential point of our discussion so far is that one may use all-sky redshift surveys to reconstruct the (β dependent) peculiar velocity and density fields, taking account of sparse sampling, radial and angular selection effects and (to some extent) non-linear evolution of the cosmic structure. The question then remains: how does one use the reconstructed fields to constrain the value of β? To do this requires independent information about δ and v: we obtain this information from redshift-independent distance indicators. In the past decade many catalogues, containing thousands of galaxies with redshift-independent distance estimates, have been compiled, including the MAT sample (Mathewson, Ford and Buchhorn 1992); HM sample (Han and Mould 1992); CF sample (Courteau et al. 1993); Abell BCG sample (Lauer and Postman 1994); SCI sample (Giovanelli et al. 1997); KLUN sample (Theureau et al. 1997); nearby SNIa sample (Riess et al. 1997); Mark III dataset (Willick et al. 1997); SBF survey (Tonry et al. 1997); SFI sample (Giovanelli et al. 1998); SMAC sample (Hudson et al. 1999); EFAR and ENEAR samples (Colless et al. 1999 and Wegner et al. 1999); SCII sample (Dale et al. 1999); Shellflow survey (Courteau et al. 2000); and LP10k survey (Willick 1999).
A discussion of the astrophysical basis of redshift-independent distance indicators lies beyond the scope of this article, although more details can be found in e.g. Hendry (1997) and Webb (1999). A typical example is the Tully-Fisher relation, which is appropriate to mention since it is based on a sound dynamical principle: that more massive spiral galaxies tend to rotate more rapidly. The Tully-Fisher relation is an empirical power-law relationship between the luminosity and rotation velocity (as deduced e.g. from the width of the 21cm emission line of neutral hydrogen in the disk) of spiral galaxies. By measuring the galaxy's rotation velocity at some specified location in the disk, one can use Tully-Fisher to estimate its luminosity, which one then compares with the observed brightness of the galaxy to infer its distance, by application of the inverse square law. How can we use redshift-independent distance indicators to provide cosmological constraints? Consider, for example, the peculiar velocity field. From Equation (26), or another suitable reconstruction procedure, we can obtain v_p(r), i.e. given a value for β we can predict the peculiar velocity field at any position in our survey volume. In particular, therefore, we can infer the radial peculiar velocity, u_p(r_i) say, at the position of any observed galaxy. If r̂_i denotes an estimate of that galaxy's distance (expressed for convenience in km s⁻¹, equivalent to assuming H₀ = 1), then an estimate of the radial peculiar velocity of the galaxy is simply
$$u_i = cz_i - \hat{r}_i \qquad (38)$$
One can then compare the (β dependent) predicted and estimated radial peculiar velocities to obtain in turn a maximum likelihood estimate (or confidence region) for β.
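As a concrete illustration of this velocity-velocity comparison, the following sketch (not part of the original lectures) fits β by weighted least squares, using the fact that in linear theory the predicted peculiar velocities scale linearly with β. The array names are hypothetical, the distance errors are treated as Gaussian, and the Malmquist bias discussed below is ignored.

```python
import numpy as np

def estimate_beta(cz, r_est, sigma_r, v_unit):
    """Maximum likelihood beta from a velocity-velocity comparison.

    cz      : observed redshifts (km/s)
    r_est   : redshift-independent distance estimates (km/s, H0 = 1)
    sigma_r : 1-sigma distance errors (km/s), assumed Gaussian
    v_unit  : predicted radial peculiar velocities for beta = 1, so that the
              model prediction at each galaxy is beta * v_unit
    """
    u_est = cz - r_est                       # Equation (38)
    w = 1.0 / sigma_r**2                     # inverse-variance weights
    beta = np.sum(w * v_unit * u_est) / np.sum(w * v_unit**2)
    sigma_beta = np.sqrt(1.0 / np.sum(w * v_unit**2))
    return beta, sigma_beta
```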
The principal drawback of this approach is that redshift-independent distance indicators such as Tully-Fisher are generally rather noisy, with typical distance error dispersions of about 20% for an individual galaxy. More problematically, this intrinsic scatter will introduce systematic errors when distance indicators are applied to estimate peculiar velocities. These systematic errors are intimately linked to a classic problem in statistical astronomy, known generically in the literature as Malmquist bias.
4.9 Malmquist bias of estimated distances and velocities
Malmquist (1920) considered the luminosity distribution of observable galaxies in a flux-limited sample, and concluded that the mean luminosity of observable galaxies would be systematically brighter than the mean of the underlying population of galaxies, since intrinsically fainter galaxies would be missing from the sample at large distances. This bias in the luminosity of observable galaxies has important consequences for estimating galaxy distances. A detailed discussion of this problem can be found in e.g. Hendry and Simmons (1994), Strauss and Willick (1995) and Teerikorpi (1997), and in what follows we present only a very brief summary of the salient statistical points. One can understand the essence of the problem of Malmquist bias by considering the following scenario. Suppose one observes a spiral galaxy with true distance r_true, and uses e.g. the Tully-Fisher relation to obtain an estimate, r̂, of that true distance. Given the intrinsic Tully-Fisher scatter, it is likely that r̂ will differ significantly from r_true. What, then, is p(r_true|r̂), the conditional probability distribution of r_true, given r̂? One might reasonably suppose that, for a galaxy at a given true distance, the scatter in a properly calibrated distance indicator is equally likely to produce an under-estimated and an over-estimated value of r̂. More formally, we might assume that the conditional distribution of r̂ given r_true has expectation value equal to r_true, i.e.
$$E(\hat{r} \mid r_{\rm true}) = r_{\rm true} \qquad (39)$$
However, we are interested in the conditional distribution of true distance given our estimated distance (i.e. we want to know where the galaxy really came from). We can relate p(r̂|r_true) and p(r_true|r̂) via Bayes' formula, so that
$$p(r_{\rm true} \mid \hat{r}) = \frac{p(\hat{r} \mid r_{\rm true})\,p(r_{\rm true})}{p(\hat{r})} \qquad (40)$$
It immediately follows that, if E(r̂|r_true) = r_true, one will not in general obtain E(r_true|r̂) = r̂, because of the presence of the p(r_true) term in Equation (40). Moreover, removing the bias which this term introduces strictly speaking requires knowledge of p(r_true), which is closely related to the very galaxy density field which we are trying to reconstruct. And, of course, from Equation (38) any residual bias present in our adopted galaxy distance will also impact on our estimated radial peculiar velocity, possibly leading to a spurious signature of large scale galaxy motion. A number of different statistical approaches to correct for Malmquist bias have been proposed in the literature. Lynden-Bell et al. (1988) made the assumption of a uniform p(r_true), thus defining homogeneous Malmquist bias corrections for their distance indicator. This led, however, to some debate in the literature over their detection of radial back inflow into their 'Great Attractor' region: could this detection have been merely the signature of
uncorrected Malmquist bias? Landy and Szalay (1992) improved the uniform assumption in a novel way, using the distribution of galaxy distance estimates themselves to estimate p(r_true). Many other authors (e.g. Hudson 1993; Willick 1994) have preferred to use other 'prior' data for p(r_true), provided by e.g. iterative reconstruction of the IRAS density field as discussed in Section 4.1, although this approach has the disadvantage of weakening the independence of the redshift survey and distance indicator information. Newsam, Simmons and Hendry (1995) proposed a Monte Carlo approach to bias correction which simultaneously addressed all the systematic errors present in reconstructing the dynamical fields; a similar treatment was developed in Freudling et al. (1995). Malmquist bias alone is no longer regarded as a serious problem for reconstruction techniques, and with more recent data consistent results have been obtained using different bias correction procedures and calibration methods. Suffice it to say, however, that careful attention to the impact of Malmquist bias remains a very important issue for most reconstruction methods, and that the adage "applying the wrong Malmquist bias correction is often worse than applying no bias correction" is a useful mantra.
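In the spirit of the Monte Carlo treatments mentioned above (though not reproducing any particular published analysis), the following toy simulation illustrates homogeneous Malmquist bias: even for a distance indicator that is unbiased at fixed true distance, galaxies selected at fixed estimated distance lie, on average, further away, because more galaxies scatter inward from large true distances. The sample size, error level and geometry are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Galaxies distributed uniformly in space: p(r_true) proportional to r^2 within r_max.
n, r_max, frac_err = 200_000, 10_000.0, 0.20
r_true = r_max * rng.uniform(size=n) ** (1.0 / 3.0)

# Log-normal distance errors, unbiased in the sense E(r_est | r_true) = r_true.
delta = np.log(1.0 + frac_err)
r_est = r_true * rng.lognormal(mean=-0.5 * delta**2, sigma=delta, size=n)

# Select galaxies whose *estimated* distance lies in a narrow shell and ask
# where they really are: the mean true distance exceeds the shell distance.
shell = (r_est > 4900.0) & (r_est < 5100.0)
print(np.mean(r_true[shell]) / 5000.0)   # ratio > 1: homogeneous Malmquist bias
```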
4.10 The POTENT reconstruction procedure
The assumption that the peculiar velocity field is irrotational is the basis of another reconstruction procedure which has been studied intensively over the past decade: the POTENT method. First proposed by Bertschinger and Dekel (1989), POTENT is an extremely appealing, and rather simple, method in principle. As we have already remarked, if v is irrotational, then it may be written as the gradient of a scalar velocity potential, Φ. Hence, at any position, r, we may write
$$\Phi(\mathbf{r}) = \int_{0}^{\mathbf{r}} \mathbf{v}(\mathbf{r}') \cdot d\mathbf{r}' \qquad (41)$$
where the line integral is path independent. In particular, then, we may choose a purely radial path, so that
$$\Phi(\mathbf{r}) = \int_{0}^{r} u(r')\,dr' \qquad (42)$$
where u(r') is the radial component of the peculiar velocity at distance, r', along the line of sight to r. Having evaluated Φ(r) one then obtains v(r) by differentiation. The beauty of POTENT is, then, that one may evaluate the 3-D peculiar velocity field purely from knowledge of only the radial components, which are directly accessible from redshift-independent galaxy distance indicators. The drawback of POTENT, however, is that we need to know the radial component of v everywhere. This means that the practical implementation of POTENT requires a great deal of smoothing of the raw data. Our sparse and noisy estimates of the u_i at the positions of the galaxies in our survey must be interpolated onto a regular grid. This is achieved by tensor window smoothing (Dekel, Bertschinger and Faber 1990), in which a bulk flow velocity field model is fitted to the radial velocity estimates of galaxies in a large smoothing window centred on each grid point. A tensor approach is required since the radial velocities are not parallel across the smoothing window. Moreover, this means that the integration need not be restricted to radial paths. Simmons, Newsam and Hendry (1995) developed the MAXFLOW algorithm which adapts the standard POTENT procedure to include non-radial paths if, for example,
this avoids a region where galaxy sampling is particularly poor. In practice, however, the tangential components of the fitted bulk flow velocity in each smoothing window are generally much noisier than the radial component, which means that the optimal integration path is rarely significantly non-radial. The smoothing procedures and the nature of the data used when applying POTENT in practice introduce several possible sources of systematic error in the recovered peculiar velocity field, the two largest of which are Malmquist bias (due to scatter in the individual distance and peculiar velocity estimates) and sampling gradient bias (due to uneven sampling of the velocity field across each smoothing window). Since the precise magnitude of these biases is very sensitive to the details of the particular dataset to which POTENT is applied, a proper error analysis can only be carried out by Monte Carlo simulations (Dekel et al. 1999). Having reconstructed the peculiar velocity field (Bertschinger et al. 1990; Dekel et al. 1999) one can then use e.g. Equation (25) to determine the mass overdensity field, δ, on a regular grid. Comparison with the density field reconstructed from redshift survey data then permits the estimation of β (Dekel et al. 1993; Sigad et al. 1998). More recent implementations of this density-density comparison have extended the linear treatment to relate δ and v using the Zel'dovich approximation, or other non-linear schemes (Nusser et al. 1991). Again, the error analysis requires great care as the data points (the grid values of δ) are coupled not just by real correlations but also by noise, and a full Monte Carlo treatment is essential. Kolatt et al. (1996) built realistic n-body simulations which mimic the properties of the real galaxy data and used them to explore carefully different smoothing procedures and assess the impact of possible systematic errors at all stages of the velocity and density reconstruction and density-density comparison. In summary, POTENT is a very attractive reconstruction method based on a very elegant idea, but is undermined by the sparse and noisy nature of its raw data, which necessitates highly complex smoothing and interpolation. Perhaps its most appealing feature is that it measures the mass density field directly, independent of any assumptions about galaxy biasing, because it uses galaxies as tracers of the velocity field and not of the density field. Dekel and Rees (1994) suggested that POTENT could, therefore, be used to place a robust lower limit on the value of Ω₀ by considering the divergence of the velocity field around voids in the galaxy distribution. This is because δ, which in linear theory is of course proportional to ∇·v, is bounded below by −1, so that the lowest measured value of ∇·v can place a robust lower limit on the constant of proportionality, independent of biasing. Sparse sampling and data noise have prevented definitive application of this idea to date, but it remains an interesting possible future application of the POTENT method.
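The core POTENT step, Equation (42), amounts to a cumulative radial integration of the smoothed radial velocity field. The sketch below (not from the original text) assumes the tensor-window smoothing onto a regular grid has already been performed, and it ignores the Malmquist and sampling gradient corrections discussed above.

```python
import numpy as np

def potent_potential(u_grid, dr):
    """Velocity potential from smoothed radial peculiar velocities.

    u_grid : array of shape (n_lines_of_sight, n_radial_bins) holding the smoothed
             radial velocity u(r') sampled on a regular radial grid
    dr     : radial bin width
    Returns Phi(r), the cumulative radial integral of u along each line of sight
    (Equation 42); the 3-D velocity field then follows by differentiating Phi.
    """
    # Trapezoidal cumulative integral along the radial direction.
    mid = 0.5 * (u_grid[:, 1:] + u_grid[:, :-1]) * dr
    phi = np.concatenate([np.zeros((u_grid.shape[0], 1)), np.cumsum(mid, axis=1)], axis=1)
    return phi
```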
5 Conclusions
Although our aim in this article has been to describe reconstruction methods rather than results, it would be inappropriate to end without making at least some remarks about recent applications of these techniques to constrain cosmological models. We will restrict our attention to the parameter β, which has attracted most attention in the literature.
Despite the considerable effort to test rigorously reconstruction methods and ensure their freedom from systematic error, there remains a lack of consensus in recently published determinations of β, with results divided roughly between methods which compare predicted and observed velocities, and those which compare predicted and observed densities. Chief examples of the latter category are POTENT-based comparisons using the Mark III catalogue of redshift-independent galaxy distance estimates and the density field reconstructed from the IRAS 1.2Jy redshift survey. Sigad et al. (1998, and references therein) favours a value of β_I = Ω₀^0.6/b_I ≈ 1 from these analyses. (Here the subscript I denotes the fact that the comparison uses IRAS galaxies selected in the infra-red: galaxy biasing, and hence the value of b, will in general be wavelength dependent.) Velocity-velocity comparisons, on the other hand, have generally favoured β_I ≈ 0.5 (Davis, Nusser and Willick 1996; Riess et al. 1997; da Costa et al. 1998; Willick and Strauss 1998). The origins of this significant discrepancy have not yet been elucidated (see for example Strauss 2000; Willick 2000). At least one of the reconstruction methods is suffering from some residual systematic effects. What are those systematic effects? Two facts would seem to give some indication. Firstly, lower β values have generally (although not always) also been found from statistical analyses of galaxy clustering in redshift space (Taylor and Hamilton 1996; Fisher and Nusser 1996; Tadros et al. 1999), which make no use of redshift-independent distance information, although some assumptions about the power spectrum of density fluctuations may remain. Secondly, recent development and application of the ROBUST method (Rauzy and Hendry 2000) gives β_I ≈ 0.6. Although the details of this method are beyond the scope of this article, the salient point is that ROBUST uses redshift-independent distance indicators in a manner which requires no application of Malmquist bias corrections. Thus, the agreement between ROBUST and e.g. VELMOD (Willick and Strauss 1998) and ITF (Davis, Nusser and Willick 1996) suggests that the Malmquist bias corrections employed in the latter methods are indeed largely free from systematic error, and perhaps instead it is the sampling gradient biases associated with the POTENT smoothing procedure which are contributing to the higher β estimates from that method. Notwithstanding these remarks, it appears at least from the Monte Carlo analyses of Kolatt et al. (1996) and Dekel et al. (1999) that POTENT is not seriously contaminated by systematic error. Another possibility which must be faced is simply that the assumption of linear galaxy biasing is inadequate. Berlind, Narayanan and Weinberg (2000; 2001) are exploring the impact of various different non-linear, and even non-local, biasing schemes on reconstruction methods, and this work may yet shed light on the continuing β discrepancy. Certainly in the long term, POTENT-based methods offer an interesting means of disentangling the relationship between galaxies and mass, precisely because they treat galaxies as tracers of the velocity field, which responds to all gravitating mass, however it is clustered.
In summary, this article has hopefully made clear that the analysis of all-sky redshift surveys for the purpose of reconstructing the large scale density and velocity fields is a rapidly maturing topic, requiring sophisticated statistical machinery to cope with the sparse and noisy nature of the datasets used. Within the (possibly limited) framework of linear galaxy biasing, the best estimates of β to date have been obtained from infra-red surveys and indicate that β_I ≈ 0.5, consistent with the low Ω₀ values favoured by analyses of high redshift supernovae and the CMBR (see e.g. Tegmark and Zaldarriaga 2000) and with the assumption that IRAS galaxies appear to trace the mass distribution on large
scales. This picture, while far from complete, appears to vindicate the gravitational instability paradigm, which is alive and kicking in Large Scale Structure.
Acknowledgments The author is pleased to acknowledge many useful discussions with his collaborators in preparing this review, particularly Stéphane Rauzy, Elke Schumacher, John Simmons, Andrew Newsam and Kenton D'Mellow. The author would like to dedicate this article to the memory of Jeffrey Willick, who was killed tragically in June 2000, after more than a decade of research in this field. Jeff Willick's outstanding work in the analysis of galaxy redshift and redshift-distance surveys, which has been extensively referenced throughout this article, has made an enormous, and lasting, contribution to our understanding of the large scale structure of the Universe, and his loss to the community is sorely felt.
References Aaronson, M., et al., 1982, ApJS, 50, 241 Balian, R. and Schaefer, R., 1989, A&A, 220, 1 Bardeen, J., Bond, J.R., Kaiser, N. and Szalay, A., 1986, ApJ, 304, 15 Berlind, A., Narayanan, V.K. and Weinberg, D.H., 2000, ApJ, 537, 537 Berlind, A., Narayanan, V.K. and Weinberg, D.H., 2001, in prep. Bertschinger, E. and Dekel, A., 1989, ApJ, 336, L5 Bertschinger, E., Dekel, A., Faber, S.M. and Burstein D., 1990, ApJ, 364, 370 Branchini, E., et al., 1999, MNRAS, 308, 1 Bunn, E.F., Hoffman, Y. and Silk, J., 1996, ApJ, 464, 1 Coles, P. and Lucchin, F., 1995, Cosmology: The Origin and Evolution of Cosmic Structures, (Chichester: John Wiley) Coles, P., 1997, in From Quantum Fluctuations to Cosmological Structures, eds. D. Valls-Gabaud et al., ASP Conf. Ser. 126, 233 Colless, M., et al., 1999, MNRAS, 303, 813 Courteau, S., Faber, S.M., Dressler, A. and Willick J.A., 1993, ApJ, 412, L51 Courteau S., et al., 2000, in Cosmic Flows 1999: Towards an Understanding of Large-Scale Structure, eds S. Courteau et al., ASP Conf. Ser. 201, 17 Croft, R.A.C. and Gaztariaga, 1997, MNRAS, 285, 793 da Costa, L.N., et al., 1998, MNRAS, 229, 425 Dale, D.A., et al., 1999, ApJ, 510, 11 Davis, M., Nusser, A. and Willick,J.A., 1996, ApJ, 473, 22 Davis, M. and Peebles, P.J.E., 1983, ApJ, 267, 465 Dekel, A., Bertschinger, E. and Faber, S.M., 1990, ApJ, 364, 349 Dekel, A. and Rees, M.J., 1994, ApJ, 422, L1 Dekel, A., et al., 1993, ApJ, 412, 1 Dekel, A. and Lahay,’b., 1999, ApJ, 520, 24 Dekel, A., et al., 1999, ApJ, 522, 1 Dekel, A. and Ostriker, J.P., 1999, Formation of Structure in the Universe, (Cambridge: CUP) Dressler, A., et al., 1987, ApJ, 313, 42 Fisher, K.B., et al., 1995a, ApJS, 100, 69 Fisher, K.B., et al., 1995b, MNRAS, 272, 885
Fisher, K.B. and Nusser, A., 1996, MNRAS, 279, 1 Fixsen, U.J.: et al., 1994, ApJ, 470, 38 Freudling, W., da Costa, L.N. and Pellegrini, P.S., 1994, MNRAS, 268: 943 Freudling, W., et al., 1995, AJ, 110, 920 Geller, M.J. and Huchra, J.P., 1989, Science, 246, 897 Giavalisco, M., Mancinelli, B., Mancinelli, P.J. and Yahil A.. 1993, ApJ. 411, 9 Giovanelli, R. and Haynes, M.P.. 1991, ARA&A, 29, 499 Giovanelli, R., et al., 1997, AJ, 113, 22 Giovanelli, R., et al., 1998, ApJ, 505, 91 Gorski, K.M., 1994, 430, L85 Groth, E.J. and Peebles, P.J.E., 1977, ApJ, 217, 385 Hamilton. A.J.S.. 1998, inThe Evolving Universe, ed. D. Hamilton (Kluwer Academic: Dordrecht), 185 Han, M., Mould: J.R., 1992, ApJ, 396, 453 Hendry, M.A., 1997, in From Quantum Fluctuations to Cosmological Structures, eds. D. VallsGabaud et al., ASP Conf. Ser. 126, 385 Hendry, M.A. and Simmons, J.F.L., 1994, ApJ, 435, 515 Hubble, E.. 1929, Proc. Nat. Acad. Sei.. 15, 168 Huchra, J.P., Davis, M., Latham, D. and Tonry, J., 1983, ApJS, 52, 89 Huchra. J.P., Geller, M.J., de Lapparent, V. and Corwin, H.G., 1990, ApJS, 72, 433 Hudson, M.J., 1993, MNRAS, 265, 43 Hudson, M.J., et al., 1999, ApJ, 512, 79 Kaiser, N., 1984, ApJ, 284, L9 Kaiser, N., 1987. MNRAS, 227, 1 Kaiser, N., et al., 1991, MNRAS, 252, 1 Kerscher, M.. et al., 1997, MNRAS, 284, 73 Kirshner, R.P., Oemler, A. and Schechter, P.L., 1978, AJ, 83, 1549 Klypin, A. and Shandarin, S.F., 1993, ApJ, 413, 48 Kolatt, T., Dekel, A., Ganon, G. and Willick, J.A., 1996, ApJ, 458, 419 Kolb, E.W. and Turner, M.S., 1990, The Early Universe, (Redwood City: Addison-Wesley) Krzewina, L.G. and Saslaw, W.C., 1996, MNRAS, 278, 869 Lahav, O., Lilje, P.B., Primack, J.R. and Rees, M.J., 1991, MNRAS, 251, 128 Landy, S.D. and Szalay, A.. 1992, ApJ, 391, 494 Lauer, T.R. and Postman, M., 1994, ApJ. 425, 418 Liddle, A.R. and Lyth, D.H., 2000, Cosmological Inflation and Large Scale Structure, (Cambridge: CUP) Lineweaver, C.H., 1997, in From Quantum Fluctuations to Cosmological Structures, eds. D. Valls-Gabaud et al., ASP Conf. Ser. 126, 185 Lineweaver, C.H.. et al., 1996, ApJ, 470, 38 Loveday, J., Peterson, P.A., Maddox, S.J. and Efstathiou, G., 1994, ApJS, 107,201 Lynden-Bell, D., et al., 1988, ApJ, 326, 19 Maddox, S.J., 2000, in Clustering at High Redshifl, eds. A. Mazure et al., ASP Conf. Ser. 200, 63 Maddox, S.J., Efstathiou, G., Sutherland, W.J. and Loveday, J., 1990, MNRAS, 243, 692 Malmquist, K.G.. 1920, Medd. Lund. Astron. Obs. Ser. 11, 22, 1 Mathewson, D.S., Ford, V.L. and Buchhorn. M., 1992, ApJS, 81, 413 Mould, J.R., et al., 2000, ApJ, 529, 786 Newsam, A.M., Simmons, J.F.L. and Hendry, M.A., 1995, A&A, 294, 627 Nusser, A., Dekel, A.: Bertschinger, E. and Blumenthal, G.R., 1991, ApJ, 379. 6 Nusser. A. and Davis. M.. 1994. ApJ, 421, L1
Peacock, J.A., 1999, Cosmological Physics, (Cambridge: CUP) Peacock, J.A. and Dodds, S.J., 1994, MNRAS, 267, 1020 Peebles, P.J.E., 1990, ApJ, 362, 1 Peebles, P.J.E., 1993, Principles of Physical Cosmology, (Princeton: Princeton University Press) Peebles, P.J.E., 1994, ApJ, 429, 43 Perlmutter, S., et al., 1999, ApJ, 517, 565 Riess, A.G., Davis, M., Baker, J. and Kirshner, R.P., 1997, ApJ, 488, 1 Rowan Robinson, M. et al., 1990, MNRAS, 247, 1 Sahni, V. and Coles, P., 1995, Physics Reports, 262, 1 Santiago, B.X., et al., 1995, ApJ, 446, 457 Saunders, W., et al., 2000, MNRAS, 317, 55 Scharf, C. and Lahav, O., 1993, MNRAS, 264, 439 Schechtman, S.A., et al., 1996, ApJ, 470, 172 Schumacher, E., Optimal Representation of the Density Field from Redshifi Surveys, MSc Thesis, University of Glasgow, UK Scoccimaro, R., Feldman, H., Fry, J.N. and Frieman, J.A., 2001, ApJ, 546, 652 Seldner, M., Siebers, B., Groth, E.J. and Peebles, P.J.E., 1977, AJ, 84, 249 Shaya, E.J., Peebles, P.J.E. and Tully, R.B., 1995, ApJ, 454, 15 Sigad, Y., et al., 1998, ApJ, 495, 516 Sigad, Y., Branchini, E. and Dekel, A., 2000, ApJ, 540, 62 Simmons, J.F.L., Newsam, A.M. and Hendry M.A., 1995, A&A, 293, 13 Straws, M.A., 2000, in Cosmic Flows 1999: Towards an Understanding of Large-scale Structure, eds S. Courteau et al., ASP Conf. Ser. 201, 3 Straws, M.A., et al., 1992, ApJS, 83, 29 Straws, M.A. and Willick, J.A., 1995, Physics Reports, 261, 271 Tadros, H., et al., 1999, MNRAS, 305, 527 Taylor, A.N. and Hamilton, A.J.S., 1996, MNRAS, 282, 767 Taylor, A.N. and Valentine, H.: 1999, MNRAS, 306, 491 Teerikorpi, P., 1997, ARA&A, 35, 101 Tegmark, M. and Zaldarriaga, M., 2000, ApJ, 544, 30 Teodoro, L., 1999, The Density and Velocity Fields of the Local Universe, PhD Thesis, University of Durham, UK Theureau, G., et al., 1997, A&A, 319, 435 Tonry, J.L., Blakeslee, J.P., Ajhar, E.A. and Dressler A., 1997, ApJ, 475, 399 Valentine, H., Saunders, W. and Taylor, A.N., 2000, MNRAS, 319, L13 van de Waygaert, R. and Icke, V., 1989, A&A, 213, 1 Verde, L., Heavens, A.F. and Matarrese, S., 2000, MNRAS, 318, 584 Verde, L., Heavens, A.F., Matarrese, S.and Moscardini, L., 1998, MNRAS, 300, 747 Vetollani, G., et al., 1997, A&A, 325, 954 Vogeley, M.S. et al., 1994, ApJ, 420, 525 Webb, S., 1999, Measuring the Universe, (Chichester: Springer Praxis) Wegner, G., et al., 1999, MNRAS, 305, 259 Weinberg, S., 1972, Gravitation and Cosmology: Principles and Applications of the General Theory of Relativity, (New York: Wheeler) White, M., Scott, D. and Silk, J., 1994, ARA&A, 32, 319 White, S.D.M., 1979, MNRAS, 186, 145 Wiener, N., 1949, in Extrapolation and Smoothing of Stationary T i m e Series, (New York: Wiley) Willick, J.A., 1994, ApJS, 92, 1 Willick, J.A., 1999, ApJ, 516, 47
Willick, J.A., 2000, in Cosmic Flows 1999: Towards an Understanding of Large-Scale Structure, eds S. Courteau et al., ASP Conf. Ser. 201, 321 Willick, J.A., et al., 1997, ApJS, 109, 333 Willick, J.A. and Strauss, M.A., 1998, ApJ, 507, 64 Yahil, A., Strauss, M.A., Davis, M. and Huchra, J.P., 1991, ApJ, 372, 380 York, D.G., et al., 2000, AJ, 120, 1579 Zaroubi, S., Hoffman, Y., Fisher, K.B. and Lahav, O., 1995, ApJ, 449, 446 Zel'dovich, Y.B., 1970, A&A, 5, 84
Cosmological numerical simulations: past, present and future
Gustavo Yepes
Universidad Autónoma de Madrid, Spain
1 Introduction
One of the most important problems of modern Cosmology is to understand how the Universe evolved from an initial homogeneous and isotropic distribution of mass and energy to the complex structures we observe today: stars, galaxies, clusters of galaxies, and large-scale structures. The most accepted hypothesis within the paradigm of the Standard Cosmological Model (i.e., Einstein's General Relativity + the Cosmological Principle) to explain the formation of structures in the universe is the gravitational growth of primordial density fluctuations. This hypothesis has been very successful in explaining many of the observational features of the distribution of matter in the Universe. Nevertheless, for many years, the physical origin of the primordial density fluctuations remained unknown. In the early 1980's, Guth (1981) proposed the idea that a phase transition in the early universe, associated with Grand Unification symmetry breaking, resulted in an epoch of exponential expansion (inflation). This mechanism has proved to be very useful in solving some of the most important problems of the Standard Cosmological Models: the flatness problem, the horizon problem and the origin of density fluctuations. (See e.g. Liddle (1997) for a review on inflationary cosmology).
The gravitational evolution of density fluctuations in Friedmann-Lemaître-Robertson-Walker (FLRW) models can be studied analytically by means of Linear Perturbation Theory as long as δρ ≡ (ρ(r,t) − ρ_b)/ρ_b ≪ 1. After assuming the type of matter content of the Universe (i.e. an equation of state) the FLRW field equations can be linearised and solved (Peebles 1980) for different epochs of the Universe: from the radiation dominated epoch, to matter domination, to the epoch of decoupling between matter and radiation. Predictions from linear theory can be compared with observational results coming from early epochs (e.g. the Cosmic Background Radiation) or from scales on which δρ is still small (e.g. cluster correlations, void functions, etc). Results from linear theory cannot be extrapolated further than δρ ∼ 1. At that moment, non-linear effects become important and all the linear approximations break down.
At this point we are left with two possibilities: go to higher order in perturbation theory or numerically integrate the Boltzmann equations for the different mass components from the time where δρ ∼ 1 until the present. A considerable effort has been invested in developing higher order perturbation theory of gravitational instability in an expanding universe (see e.g. Sahni and Coles (1995) for a recent review). Unfortunately, because there is no analytical solution to the non-linear gravitational evolution of a self-gravitating fluid, there is no possibility of checking the validity of these methods except by comparing them with results from numerical simulations. A typical example of this is the validity of the Zel'dovich Approximation or the Press-Schechter formalism.
The first cosmological-related simulations were done in the late 1970's by Aarseth et al. (1979), who studied the evolution of fluctuations in an expanding universe using a system of 5000 particles randomly placed within a sphere. At that time, the only possible fluctuation spectrum that could be reproduced was a flat spectrum corresponding to a random distribution of points. They also tried to generate a realization of a fluctuation with power spectrum P(k) ∝ 1/k by placing particles in rods at the beginning. Needless to say, this initial configuration was far from being a realistic representation of initial cosmological conditions. In the early 1980's, Doroshkevich et al. (1980) and Klypin and Shandarin (1983) were the first to apply the Zel'dovich Approximation (Zel'dovich 1970) to generate realizations of density fluctuations with arbitrary power spectra using particles (see e.g. Yepes (1996) for more details). Since then, this has become the standard method to set up initial conditions for cosmological N-body simulations. During the 1980's and 1990's the field of numerical simulations bloomed. The exponential growth of computational power during these years, and the development of new and more sophisticated numerical algorithms, made it possible to increase the accuracy of simulations. More particles and/or grid cells could be treated, and therefore the spatial and mass resolution increased dramatically. In the first simulations, not more than a few thousand particles could be used. Today, the most sophisticated codes can treat up to 10⁹ particles using massively parallel supercomputers, and the spatial resolution they achieve can be a few kpc in a computational volume of ∼100 Mpc in size, more than 10⁵ times better resolution than 15 years ago. But, although a key ingredient, numerical resolution is not the only one needed in a realistic simulation. A simulation with infinite resolution but with an incorrect, or incomplete, modelling of the most relevant physical processes acting on the simulated system, would be useless. In the case of cosmological simulations, the "simulated system" is a patch of our Universe, and many things happened in any patch of Universe which must be modelled. Until very recently, gravity was the only interaction that was taken into account. In dark matter dominated cosmological scenarios, baryons represent a tiny fraction of the total matter density of the Universe (see e.g. Primack (2000) and references therein). However, most of the observations of our Universe come from the light emitted, or absorbed, by this tiny fraction of matter. Therefore, any realistic simulation of the process of structure formation would have to include, in addition to the dominant dark matter component, the baryons and the physical processes acting on them. Unfortunately the physics of baryons is much more complicated than gravitational interactions. As a first approximation, the primordial baryonic distribution can be considered as an ideal monoatomic gas made up of neutral hydrogen (76%) and helium (24%). As an ideal gas,
it generates pressure gradients as it is compressed, and heated, when falling into the dark matter potential wells. Pressure forces act against gravitational forces and shocks can develop in the gas fluid. The numerical treatment of these gas-dynamical effects adds an extra difficulty to the already complex problem of the gravitational evolution of density fluctuations. Besides, there exist other physical phenomena that can change considerably the thermodynamical behaviour of the baryonic gas. Radiative cooling due to atomic transitions and inverse Compton cooling between free electrons and Cosmic Background photons at early redshifts are two major sinks of the internal energy of the gas. Even more, to complicate the situation, when the gas cools below ∼10⁴ K, molecular clouds form, giving birth, by fragmentation, to stars, which behave as collisionless particles. The stars evolve and the most massive ones will explode as supernovae, injecting a fair amount of energy and metals into the surrounding gas. Unfortunately, there is little knowledge of the complicated physical processes that happen in the star-gas system. All the above-mentioned effects (gas dynamics, cooling, star formation, and star-gas feedback) happen at very different scales. Even with today's largest computers and foreseeable future computers, star formation and supernova feedback will remain unresolved in cosmological simulations and their effects must be included phenomenologically. Despite these difficulties, we now have at our disposal a tool that has changed the way of doing cosmology. Instead of questionable hypotheses to extract information from the dark matter distribution (e.g. constant M/L ratios, linear biasing between total and visible matter distributions, etc) to compare with observations, we can now test them by running self-consistent simulations which include gravity, gas dynamics, radiative processes and star physics. The results of these simulations (e.g. luminosity, colours, metallicity of dark halos) can be directly compared with observational data from present day galaxies and their progenitors at early redshifts. Computer simulation in Cosmology has become a standard, and popular, tool among the scientific community. In the early days, only a small number of scientists could do numerical simulations because they required access to the few supercomputer centres and because none of the numerical codes were publicly available. Fortunately, the situation has changed in the last few years. Now, there is no longer a need for large supercomputers to run cosmological simulations. Present day workstations are powerful enough to perform interesting numerical experiments. But the biggest change has come from the numerical cosmology community. Along with the current trend in the computing community of promoting free software (e.g. Linux, GNU, Open Source, etc) there is an increasing offer of publicly available numerical codes, even state-of-the-art parallel codes, to perform numerical simulations in a cosmological context. This has facilitated the access of many people to this area of research, which will, in turn, speed up the development of the field. This lecture is organised in 3 large sections in which I will review the past, the present and the foreseeable future of numerical simulations in Cosmology.
In Section 2 I will briefly summarise the basic ideas behind the N-body methods used to simulate the gravitational evolution of density fluctuations and the numerical techniques that have been developed during the last 20 years. Those interested in gravitational N-body simulations will find more details in Hugh Couchman's lectures in this volume. Present day simulations incorporate the effects of gas dynamics together with gravitational evolution. In Section 3, I will present the formulation of gas dynamics in Cosmology and will review the two major numerical techniques used to solve the fluid equations: (a) Eulerian methods and (b) SPH. Finally,
in Section 4 I will discuss the recent attempts to introduce the short-scale non-adiabatic processes related to star formation and star-gas interactions, and will speculate on the future of numerical simulations: what new developments, both in numerical and physical modelling, can be expected in this field in the years to come?
2 The past: gravitational N-body simulations
Gravity is the dominant interaction at large scales (≳1 Mpc). To study large-scale structure formation, a good approximation is to ignore the effects related to the electromagnetic nature of baryons. A collection of observational evidence (e.g. Primack (2000)) shows that most of the mass in the Universe (≳90%) is in the form of non-baryonic dark matter, which interacts only gravitationally. It can be considered as a collisionless fluid in which the particle trajectories are not substantially changed by close encounters (i.e. two-body relaxation is a negligible effect).
A statistical mechanics description is the most accurate representation of a collisionless self-gravitating system. In terms of the distribution functions, f_i(x, ẋ, t), for each mass component, the dark matter density in comoving space can be defined as
$$\rho(\mathbf{x}, t) = \sum_i \frac{m_i}{a^3(t)} \int f_i(\mathbf{x}, \dot{\mathbf{x}}, t)\,d^3\dot{x} \qquad (1)$$
where a(t) is the expansion factor and m_i is the mass of the individual particles (axions, WIMPs, neutrinos, etc). The dynamical evolution of such a system is governed by the Collisionless Boltzmann Equation, also known as the Vlasov equation (e.g. Binney and Tremaine 1987):
$$\frac{\partial f_i}{\partial t} + \dot{\mathbf{x}} \cdot \frac{\partial f_i}{\partial \mathbf{x}} - \nabla\phi \cdot \frac{\partial f_i}{\partial \dot{\mathbf{x}}} = 0 \qquad (2)$$
coupled with Poisson's equation for the gravitational potential
$$\nabla^2 \phi = 4\pi G\,a^2(t)\,\delta\rho(\mathbf{x}, t) \qquad (3)$$
2.1 Cosmological N-body simulations
In order to solve the Boltzmann equation, we can use the method of characteristics. As a reminder, the characteristics are the trajectories in phase space along which the distribution functions, f_i, are constant. It turns out that the characteristics for the Boltzmann equation are simply the Newtonian trajectories of fluid elements (see e.g. Saslaw (1985)): ẍ = −∇φ. The complete set of characteristics for each point of the phase space is equivalent to the solution of the partial differential equation. Of course, it is not possible to follow the trajectories of an infinite number of differential fluid elements along the characteristics. A good approximation is to split the phase space volume into a representative number of sub-volumes and calculate one characteristic (i.e. the trajectory of a particle in the gravitational potential) for each of them. Thus, the solution of the Vlasov equation is equivalent to the solution of the equations of motion of N gravitating bodies in an
expanding universe. In this regard, the direct summation N-body technique developed in the 1960’s (Aarseth 1963) to study the evolution of self-gravitating systems such as star clusters or galaxies, was applied to study the development of non-linear clustering in the Universe (Aarseth et al. 1979). Since then, many new numerical algorithms have been applied to the N-body problem in Cosmology. A detailed description of these algorithms as well as more technical details on Cosmological N-body simulations can be found in Couchman’s lectures in this book. Also, the interested reader may consult the recent reviews by Yepes (1996), Sellwood (1997), Bertschinger (1998) or Klypin (2000) which cover different aspects of this subject.
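As a minimal illustration of the direct summation technique (though without the comoving coordinates, softening refinements and adaptive time-stepping of a real cosmological code), the sketch below (not from the original text) advances a self-gravitating particle system by one leapfrog step; all parameter values are illustrative.

```python
import numpy as np

def direct_nbody_step(pos, vel, mass, dt, G=1.0, eps=0.05):
    """One leapfrog (kick-drift-kick) step of a direct-summation N-body system.

    pos, vel : (N, 3) positions and velocities
    mass     : (N,) particle masses
    eps      : Plummer softening length, to avoid divergences in close encounters
    """
    def accel(p):
        # Pairwise separations; O(N^2) direct summation.
        dx = p[None, :, :] - p[:, None, :]               # dx[i, j] = p[j] - p[i]
        r2 = np.sum(dx**2, axis=-1) + eps**2
        inv_r3 = r2 ** -1.5
        np.fill_diagonal(inv_r3, 0.0)                    # no self-force
        return G * np.sum(mass[None, :, None] * dx * inv_r3[:, :, None], axis=1)

    vel = vel + 0.5 * dt * accel(pos)    # kick
    pos = pos + dt * vel                 # drift
    vel = vel + 0.5 * dt * accel(pos)    # kick
    return pos, vel
```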
3 The present: gas-dynamical simulations
Despite the success of gravitational N-body simulations in understanding many features of non-linear clustering in hierarchical cosmological models, they have an important drawback: the impossibility of making a direct comparison between simulation results and observational data. Most of the information we get from the Universe comes from the electromagnetic radiation emitted or absorbed by baryons, which are not properly modelled in a collisionless N-body simulation. At short scales (< 1 Mpc) gas-dynamical effects dominate over gravitational effects. On shorter scales, non-adiabatic processes related to radiative transfer or star formation influence the thermodynamic behaviour of the system. The computational power required to integrate simultaneously Poisson's and the gas-dynamical equations was not reached until the early 1990's. During this decade, there has been a strong development of numerical codes which include all kinds of gravity solvers developed in the past, together with different kinds of numerical methods to solve the gas-dynamical equations. Nowadays, what seemed almost impossible to achieve with the biggest CRAY computers 10 years ago can be easily done on a laptop personal computer. But, although cosmological gas-dynamical simulations have become the current fashion, it does not mean that N-body simulations are outdated. There is a revival of cosmological N-body simulations because numerical resolution has increased by several orders of magnitude with respect to previous simulations. This has been possible due to successful parallel implementations of existing N-body algorithms (e.g. P3M, Tree) and to the development of new numerical N-body methods, based on adaptive techniques (Kravtsov, Klypin and Khokhlov 1997). Therefore, it is now possible to study in detail various important problems of the gravitational clustering process, directly related to the lack of resolution, such as numerical over-merging, or the central density profiles of dark matter halos (see Tom Quinn's lectures in this volume).
3.1
The Cosmological gas-dynamical equations
Baryonic gas is made of ions of hydrogen and helium. The electromagnetic interactions of these ions result in macroscopic pressure forces that act against gravity. In the fluid approximation, these interactions translate into a set of hyperbolic partial differential equations for the first moments of the distribution function (density, mean velocity and internal energy). They are known as Euler's equations and describe the evolution of a
compressible fluid in a gravitational field generated by the evolution of density fluctuations. In the non-relativistic (v \ll c), Newtonian approximation (r \ll c/H(t), with H = \dot{a}/a the Hubble function) they read

Continuity:
\frac{\partial \rho}{\partial t} + \nabla_r \cdot (\rho \mathbf{u}) = \Big(\frac{\partial \rho}{\partial t}\Big)_{*} ,    (4)

Momentum:
\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla_r)\,\mathbf{u} + \nabla_r \Phi + \frac{\nabla_r P}{\rho} = 0,    (5)

Energy:
\frac{\partial E}{\partial t} + \nabla_r \cdot \left[(E + P)\,\mathbf{u}\right] + \rho\,\mathbf{u} \cdot \nabla_r \Phi = \Gamma - \Lambda(\rho, T),    (6)

where E \equiv \rho(\varepsilon + u^2/2) is the total energy density of the gas and \varepsilon is the internal energy per unit mass. The right-hand-side terms in Equations (4) and (6) account for the effects of all possible non-adiabatic processes that act as sources or sinks of matter and energy (e.g. star formation, radiative transfer, etc.). This set of equations must be supplemented by an equation of state that relates pressure and internal energy. For a perfect gas P = (\gamma - 1)\,\varepsilon\rho, with \gamma = c_P/c_V the ratio of specific heats. For a monoatomic gas, \gamma = 5/3. Together with these equations, Poisson's equation

\nabla_r^2 \Phi = 4\pi G \rho
for the gravitational potential must also be solved to close the system.
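For concreteness, the perfect-gas closure can be expressed as a small helper; this is only an illustrative sketch (the function and variable names are mine, not from any code referenced in this chapter) that returns the pressure and adiabatic sound speed given the density and specific internal energy.

```python
def perfect_gas_eos(rho, eps_int, gamma=5.0 / 3.0):
    """Perfect-gas equation of state: P = (gamma - 1) * eps * rho.

    rho     : gas density
    eps_int : internal energy per unit mass
    gamma   : ratio of specific heats (5/3 for a monoatomic gas)
    Returns (pressure, adiabatic sound speed).
    """
    pressure = (gamma - 1.0) * eps_int * rho
    sound_speed = (gamma * pressure / rho) ** 0.5
    return pressure, sound_speed
```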
As in N-body cosmological simulations, it is easier to do the calculations in the comoving reference frame, in which the global expansion is removed. In this set of coordinates, positions (x) are time independent and velocities can be split into two parts: (a) the global term H(t)r corresponding to the expansion and (b) the peculiar term v = a(t)\dot{\mathbf{x}} (due to gravitational attractions). In the comoving coordinates (x, v), the cosmological Euler equations are transformed into

\frac{\partial \rho}{\partial t} + 3H(t)\rho + \frac{1}{a}\nabla_x \cdot (\rho \mathbf{v}) = \Big(\frac{\partial \rho}{\partial t}\Big)_{*} ,    (9)

\frac{\partial \mathbf{v}}{\partial t} + \frac{1}{a}(\mathbf{v} \cdot \nabla_x)\,\mathbf{v} + H(t)\mathbf{v} + \frac{1}{a}\nabla_x \phi + \frac{1}{a\rho}\nabla_x P = 0,    (10)

\frac{\partial E}{\partial t} + \frac{1}{a}\nabla_x \cdot \left[(E + P)\,\mathbf{v}\right] + H(t)\left\{3(E + P) + \rho v^2\right\} + \frac{1}{a}\rho\,\mathbf{v} \cdot \nabla_x \phi = \Gamma - \Lambda(\rho, T),    (11)
where \phi = \Phi + \frac{1}{2}a(t)\ddot{a}(t)x^2 is the peculiar potential (e.g. Peebles (1980)), which is governed by Poisson's equation (3).
3.1.1
The Shandarin transformation
Numerical methods developed over the years to study the dynamics of compressible fluids in other areas of Physics cannot be applied directly in Cosmology. As can be seen in Equations (9-11), the expansion of the universe introduces drag terms (depending on a(t)) that
produce a decay with time of \rho, v and E, even when \nabla\phi = \nabla P = 0. Fortunately, there is a way to get rid of these spurious effects by means of a coordinate transformation. Shandarin (1980) found a transformation from physical variables to comoving dimensionless variables, also known as super-comoving variables (Martel and Shapiro 1998):

\tilde{\rho} = a^3 \frac{\rho}{\rho_0}, \quad \tilde{\mathbf{v}} = a \frac{\mathbf{v}}{v_0}, \quad \tilde{P} = a^5 \frac{P}{P_0}, \quad \tilde{E} = a^5 \frac{E}{E_0}, \quad \tilde{\phi} = a^2 \frac{\phi}{\phi_0},    (12)

where r_0 is an arbitrary scale and the rest of the units are:

v_0 = \frac{r_0}{t_0}, \quad P_0 = E_0 = \rho_0 v_0^2, \quad \phi_0 = v_0^2.    (13)

Transforming to the dimensionless variables (\tilde{x}, \tilde{t}) by means of

d\tilde{t} = \frac{1}{a^2}\frac{dt}{t_0}, \qquad \tilde{\mathbf{x}} = \frac{\mathbf{x}}{r_0}.    (14)

After these transformations, Equations (9-11) take the form:

\frac{\partial \tilde{\rho}}{\partial \tilde{t}} + \nabla_{\tilde{x}} \cdot (\tilde{\rho}\,\tilde{\mathbf{v}}) = \Big(\frac{\partial \tilde{\rho}}{\partial \tilde{t}}\Big)_{*} ,    (15)

\frac{\partial \tilde{\mathbf{v}}}{\partial \tilde{t}} + (\tilde{\mathbf{v}} \cdot \nabla_{\tilde{x}})\,\tilde{\mathbf{v}} + \frac{\nabla_{\tilde{x}} \tilde{P}}{\tilde{\rho}} + \nabla_{\tilde{x}} \tilde{\phi} = 0,    (16)

\frac{\partial \tilde{E}}{\partial \tilde{t}} + \nabla_{\tilde{x}} \cdot \left[(\tilde{E} + \tilde{P})\,\tilde{\mathbf{v}}\right] + \tilde{\rho}\,\tilde{\mathbf{v}} \cdot \nabla_{\tilde{x}} \tilde{\phi} = \tilde{\rho}\tilde{H}\left(2\tilde{\varepsilon} - 3\tilde{P}/\tilde{\rho}\right) + (\tilde{\Gamma} - \tilde{\Lambda}),    (17)

where we have defined

\tilde{\varepsilon} = a^2 \frac{\varepsilon}{\varepsilon_0}; \quad \tilde{H} \equiv \frac{1}{a}\frac{da}{d\tilde{t}}; \quad \tilde{\Gamma} = a^7 \frac{\Gamma}{\Gamma_0}; \quad \tilde{\Lambda} = a^7 \frac{\Lambda}{\Lambda_0}; \quad (\Gamma_0 = \Lambda_0 = \varepsilon_0\rho_0/t_0).    (18)
Poisson's equation now reads \nabla_{\tilde{x}}^2 \tilde{\phi} = (6/a)\tilde{\delta}. After applying the Shandarin transformation, the cosmological equations for gas dynamics look almost like the standard Euler equations for a compressible fluid, except for the first term in the RHS of the energy equation (17). For a monoatomic gas (i.e. P = (\gamma - 1)\varepsilon\rho; \gamma = 5/3), which is a reasonable assumption for the primordial gas, this term vanishes and the equations in super-comoving coordinates are exactly the standard Euler equations. The physical reason why this drag term vanishes for a monoatomic gas can be found in Martel and Shapiro (1998). Thus, the classical techniques for compressible fluids can be directly applied to cosmological simulations, taking advantage of the vast knowledge achieved by many years of research in Computational Fluid Dynamics. Despite its usefulness, the Shandarin transformation has not been extensively used in Numerical Cosmology (Yepes et al. 1997; Kravtsov 1999).
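As a concrete illustration of the time transformation in Equation (14), the snippet below tabulates the super-comoving time as a function of the expansion factor for an Einstein-de Sitter background (a \propto t^{2/3}). It is a hedged sketch with illustrative names, not code from any of the packages cited above.

```python
import numpy as np

def supercomoving_time(a_grid):
    """Integrate d(t_tilde) = dt / (a^2 * t0) for an Einstein-de Sitter expansion,
    where a(t) = (t/t0)**(2/3), so d(t_tilde)/da = (3/2) a**(-3/2).
    Returns t_tilde on the supplied grid of expansion factors."""
    integrand = 1.5 * a_grid**(-1.5)
    steps = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(a_grid)   # trapezoid rule
    return np.concatenate(([0.0], np.cumsum(steps)))

a = np.linspace(1e-3, 1.0, 1000)
print(supercomoving_time(a)[-1])   # super-comoving time elapsed up to a = 1
```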
3.2
Numerical methods for gas dynamics
The numerical solution of a hyperbolic system of partial differential equations (like Equations (15-17)) is much more complicated than that of a parabolic differential equation (like the
Poisson's equation). In hyperbolic equations there can be discontinuities in the solutions (i.e. shocks in the fluid) where spatial derivatives diverge. The numerical algorithms can handle these singularities using an artificial viscosity term, which smooths the fluid variables. Another possibility to get around this problem is to integrate the equations in small volumes and transform the spatial derivatives into fluxes through volume interfaces, by means of the Gauss theorem. In either case, the numerical solution must fulfil the Courant stability condition (Courant, Friedrichs and Lewy 1967), which imposes severe restrictions on the time-steps of gas-dynamical simulations as compared with pure N-body simulations. The numerical methods used in gas-dynamical simulations in Cosmology can be divided into two large groups depending on whether the fluid elements are represented by particles or grids. The first category, smoothed Lagrangian hydrodynamics, or Smoothed Particle Hydrodynamics (SPH), is treated in Section 3.3. The basic grid methods are covered in Section 3.4 and grid methods on non-uniform meshes are covered in Section 3.5. The various numerical techniques are compared in Section 3.6.
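To make the Courant restriction concrete, a cell-based gas-dynamical code typically limits the time-step to a fraction of the cell crossing time of the fastest signal. The sketch below (illustrative names and a conventional Courant factor of 0.5, both my assumptions) shows the usual estimate.

```python
import numpy as np

def courant_timestep(dx, velocity, sound_speed, cfl=0.5):
    """Courant-Friedrichs-Lewy time-step limit for an explicit grid code:
    no signal may cross more than a fraction `cfl` of a cell per step."""
    signal_speed = np.abs(velocity) + sound_speed   # fastest characteristic speed per cell
    return cfl * np.min(dx / signal_speed)
```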
3.3
Particle methods: smoothed Lagrangian hydrodynamics
This method has become the most popular in gravitational gas-dynamical simulations, not only in Cosmology, but in many other areas of Astrophysics. Smoothed Particle Hydrodynamics (SPH) was originally proposed by Lucy (1977) and Gingold and Monaghan (1977). Here, I will briefly summarise the basic steps of this method. Those interested in more details on SPH should read the excellent review of Monaghan (1992). This method consists of a discretisation of the continuous fluid variables by a set of pseudo-particles which carry the dynamical information (density, pressure, velocity, energy, etc). These particles respond to gravitational and pressure forces and move according to them. To reconstruct the continuous variables of the fluid, the particle distribution is interpolated with a given interpolation kernel. In this sense, SPH is more a Monte Carlo than a pure Lagrangian method. But it possesses, like Lagrangian methods, the ability to move the computational elements (particles) with the fluid. In this sense, it is a natural extension of N-body methods, making it very easy to accommodate within existing N-body cosmological codes. The first cosmological N-body (P3M) and SPH code was written by Evrard (1988). Soon after, Hernquist and Katz (1989) developed a code, called TREESPH, which used a Tree algorithm as a Poisson solver together with SPH. Since then many other cosmological codes with SPH and different N-body algorithms have been written. I summarise the most representative ones:
SPH + PP (direct N-body):
  Special purpose hardware, GRAPE (Umemura et al. 1993)
  Parallel implementation on the CM5 (Serna, Alimi and Chieze 1996)

SPH + Tree N-body (octal rooted tree, Barnes-Hut algorithm):
  TREESPH (Hernquist and Katz 1989)
  PTREESPH, parallel TREESPH (Davé, Dubinski and Hernquist 1997)
  PSPH (Carraro, Lia and Chiosi 1998)
  GADGET (Springel, Yoshida and White 2000)
  GRAPESPH (Steinmetz 1996; Nakasato, Mori and Nomoto 1997)

SPH + binary-tree N-body:
  (Navarro and White 1993)

SPH + P3M:
  P3MSPH (Evrard 1988)
  AP3MSPH (Tissera, Lambas and Abadi 1997)
  HYDRA (Couchman, Thomas and Pearce 1995)

SPH + PM:
  SPH+HPM, Hierarchical Particle-Mesh (HPM) and SPH (Shapiro et al. 1991)
  ASPH+HPM, Adaptive SPH and HPM (Shapiro et al. 1996)
The first step in the SPH method is to substitute the continuous flow field (temperature, density, energy), A(\mathbf{r}, t), by a smoothed estimate A_s(\mathbf{r}, t) over a scale h:

A_s(\mathbf{r}, t) = \int d^3r' \, A(\mathbf{r}', t) \, W(\mathbf{r} - \mathbf{r}', h).    (19)

The interpolation kernel function W(\mathbf{r}) must be a normalised, strongly peaked function, i.e.

\int W(\mathbf{r} - \mathbf{r}', h) \, d^3r' = 1; \qquad \lim_{h \to 0} W(\mathbf{r} - \mathbf{r}', h) = \delta(\mathbf{r} - \mathbf{r}').    (20)

A spline kernel function is the usual choice. Sometimes, a Gaussian function is also used. The main advantage of SPH is that spatial gradients of the smoothed quantities (19) are easily obtained (see e.g. Monaghan (1992)) by

\nabla A_s(\mathbf{r}, t) = \int d^3r' \, A(\mathbf{r}', t) \, \nabla_r W(\mathbf{r} - \mathbf{r}', h).    (21)
Therefore, there is no need of a grid to compute gradients. The only requirement is that the interpolation function must be differentiable. The next step in SPH consists of two approximations. First, the continuous hydrodynamical fields are replaced by their smoothed estimates given by (19) and, second, they are divided into N sub-volumes (fluid particles). Under these assumptions, the gas density can be easily computed at any position by

\rho_s(\mathbf{r}) = \sum_{j=1}^{N} m_j \, W(\mathbf{r} - \mathbf{r}_j, h),    (22)

and the rest of the gas-dynamical variables as

A_s(\mathbf{r}) = \sum_{j=1}^{N} \frac{m_j}{\rho_j} A_j \, W(\mathbf{r} - \mathbf{r}_j, h).    (23)
Because W(\mathbf{r}, h) must be strongly peaked around \mathbf{r} = 0, there is no need to extend the sum to all particles. In fact, only nearby particles are required to perform the interpolation. In this regard, SPH shares the same problem as tree or P3M N-body methods: to find the nearest neighbours efficiently. Therefore, they are the “natural” Poisson solvers to be used with SPH. The error in approximating Equation (19) by (23) depends on the disorder of the particles and is normally O(h^2) (Monaghan 1992).
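The sketch below illustrates Equation (22) with a cubic spline kernel in 3D. The kernel normalisation and the brute-force neighbour loop are standard textbook choices rather than the implementation of any particular code mentioned here; a real code would use a tree or grid for the neighbour search.

```python
import numpy as np

def cubic_spline_w(r, h):
    """Standard 3D cubic spline kernel W(r, h) with support r < h (normalisation 8/(pi h^3))."""
    q = r / h
    w = np.where(q < 0.5, 1.0 - 6.0 * q**2 + 6.0 * q**3,
        np.where(q < 1.0, 2.0 * (1.0 - q)**3, 0.0))
    return (8.0 / (np.pi * h**3)) * w

def sph_density(pos, mass, h):
    """SPH density estimate rho_i = sum_j m_j W(|r_i - r_j|, h), as in Equation (22)."""
    rho = np.zeros(len(pos))
    for i in range(len(pos)):
        r = np.linalg.norm(pos - pos[i], axis=1)
        rho[i] = np.sum(mass * cubic_spline_w(r, h))
    return rho
```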
In most implementations of SPH, the smoothing scale h is variable and depends on the local density, h_i \propto \rho_i^{-1/3}, and the number of neighbours used for interpolation is fixed (typically 40-60). The equations of motion for the fluid particles can be derived in different ways. Using the density defined by (22) in the Lagrangian, one obtains the following equations of motion in physical coordinates (Gingold and Monaghan 1977):

Momentum:
\frac{d\mathbf{u}_i}{dt} = -\sum_j m_j \left(\frac{P_i}{\rho_i^2} + \frac{P_j}{\rho_j^2} + \Pi_{ij}\right) \nabla_i W(\mathbf{r}_i - \mathbf{r}_j, h) - \nabla\phi,    (24)

Energy:
\frac{d\varepsilon_i}{dt} = \frac{1}{2}\sum_j m_j \left(\frac{P_i}{\rho_i^2} + \frac{P_j}{\rho_j^2} + \Pi_{ij}\right) (\mathbf{u}_i - \mathbf{u}_j) \cdot \nabla_i W(\mathbf{r}_i - \mathbf{r}_j, h) + \frac{\Gamma_i - \Lambda_i}{\rho_i}.    (25)
In Equation (24), note that the force due to pressure gradients is antisymmetric with respect to the particle indices i, j. Therefore, the total linear and angular momentum are explicitly conserved quantities. The \Pi_{ij} tensor factor in the above equations represents the artificial viscosity term needed to treat shocks in SPH, because it works with the gas-dynamical equations in differential form. Shocks are, therefore, not very well resolved. They are spread over several scale lengths, h. Viscosity can also influence the dynamics of the system in different ways by producing spurious numerical effects. Nevertheless, most of these problems can be “reduced” by an adequate tailoring of the \Pi_{ij} term (see Steinmetz (1995) for an extended discussion on this issue). Another potential problem due to artificial viscosity has been raised by Shapiro et al. (1996). Artificial viscosity can preheat the gas in front of a shock to temperatures of several 10^4 K. This can be disastrous for calculations which include radiative cooling. Shapiro et al. proposed a new approach to SPH that, they claim, can solve this problem. The basic difference between this Adaptive SPH and the standard SPH approach is that the interpolation kernel function, W(\mathbf{r}, h), is no longer spherically symmetric but ellipsoidal in h, and adapts itself to the distribution of fluid particles. The numerical accuracy of SPH is more difficult to determine theoretically than for grid-based algorithms because of the absence of a mathematical proof of convergence of the solution to the fluid equations in the continuous limit (see Balsara (1995) for a study of the stability of the SPH method). Nevertheless, the easy implementation of this method and the ability to place the computational resources in the high density regions make it very attractive for cosmological simulations. A comparison of different SPH implementations in classical hydrodynamical tests can be found in Thacker et al. (2000).
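For illustration, a commonly used choice for the artificial viscosity is the Monaghan-Gingold pairwise form sketched below; the parameter values alpha = 1 and beta = 2 are conventional defaults, and the symbols are illustrative rather than taken from any specific code discussed above.

```python
import numpy as np

def pi_ij(r_ij, v_ij, h, rho_bar, c_bar, alpha=1.0, beta=2.0, eta=0.1):
    """Pairwise artificial viscosity: non-zero only for approaching particles.

    r_ij, v_ij       : relative position and velocity vectors of the pair
    h                : smoothing length
    rho_bar, c_bar   : pair-averaged density and sound speed
    """
    vr = np.dot(v_ij, r_ij)
    if vr >= 0.0:                 # particles receding: no viscosity
        return 0.0
    mu = h * vr / (np.dot(r_ij, r_ij) + (eta * h)**2)
    return (-alpha * c_bar * mu + beta * mu**2) / rho_bar
```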
3.4
Grid-based Eulerian methods
These are the standard methods used in Computational Fluid Dynamics. The fluid equations are solved by finite differences on a grid on which the computational volume is discretised. The main advantage of using these methods is the vast amount of information on stability and error analysis (see e.g. LeVeque et al. (1997) for a recent review). The first application of an eulerian method in Cosmology was done by Cen et al. (1990), who used
an aerodynamical finite-difference code (Jameson 1989) with artificial viscosity to treat shocks. They combined this code with a Particle-Mesh N-body code. It was soon superseded by a new generation of finite volume methods with sophisticated shock-capturing schemes based on the so-called Godunov algorithm (LeVeque et al. 1997). The Godunov algorithm is a finite-volume method: gas-dynamical equations are integrated in each volume element of the computational mesh. For instance, the continuity equation over a volume element would be

\frac{\partial}{\partial t}\int_V \rho \, dV = -\int_V \nabla \cdot (\rho \mathbf{v}) \, dV.

Using Gauss's theorem we can transform the volume integral of divergences into fluxes through the cell boundaries:

\frac{\partial \bar{\rho}}{\partial t} = -\frac{1}{V}\oint_{\rm sides} \rho \mathbf{v} \cdot \hat{\mathbf{S}} \, dS.
Here, \bar{\rho} is the average density in the cell and \hat{\mathbf{S}} is a unit vector normal to each of the volume surfaces. In this form Euler's equations are valid in all situations, even in the presence of strong shocks, and mass, energy and momentum are conserved quantities. The time evolution of volume-averaged fluid quantities can be easily computed once the fluxes through all cell boundaries are known. The Godunov method uses the approximation that all gas-dynamical quantities are constant within each cell. To compute fluxes from one cell to the next, a Riemann problem must be solved. I recall that the Riemann shock tube problem corresponds to the break up of a single discontinuity, where the gas-dynamical variables u(x, t) have the initial configuration u(x, 0) = u_L for x < 0 and u(x, 0) = u_R for x > 0. The solution to this problem is a set of non-linear discontinuities in the state variables which propagate from each interface with characteristic velocities and involves only algebraic equations. Using these propagating discontinuities, one can compute the difference between the initial state and the solution after a time step and find the fluxes. These are then used to update the averaged fluid variables. The main advantage of Godunov algorithms is that no artificial viscosity is needed to treat flow discontinuities. This means that all the spurious effects, like oscillations propagating away from a discontinuity due to the Gibbs effect, or sound waves propagating upwind in supersonic flows, are not present in these methods. Also, shocks are much better resolved. They are kept in place and do not spread out as in numerical methods that use artificial viscosity. The spatial accuracy of Godunov algorithms can be pushed to higher order by assuming that the volume-averaged gas-dynamical quantities are not constant within each grid cell. Instead, a linear or parabolic profile is considered. The value of the fluid variables at the discontinuities for the Riemann problem is then found by interpolation from these profiles. This procedure improves the spatial accuracy of the method to second or even third order (O(h^2) to O(h^3)), at least in regions away from shocks. Higher order Godunov methods have become the standard eulerian methods used today in cosmological gas-dynamical simulations. The Piecewise Parabolic Method (PPM) (Colella and Woodward 1984) is a high-order Godunov method which uses parabolic interpolation for the profiles. It is claimed to have third-order accuracy in smooth fluid regions. It has become the standard method for eulerian cosmological gas-dynamical
codes (Bryan et al. 1994; Quilis, Ibáñez and Sáez 1996; Sornborger et al. 1997; Yepes et al. 1997). Other methods that have also been utilised in Cosmology are:

TVD (Total Variation Diminishing): a higher-order Godunov method which uses linear profiles for the fluid variables (Ryu et al. 1993).

FCT (Flux-Corrected Transport): also a Godunov-type method with no artificial viscosity. It has second-order accuracy (Klypin, Kates and Khokhlov 1991; Yepes 1995).

ZEUS-3D: an eulerian 3-dimensional code with artificial viscosity, originally developed to solve magneto-hydrodynamical equations in astrophysical problems (Stone and Norman 1992). A modified version valid for cosmological problems has been implemented with different N-body algorithms: (a) Tree + ZEUS-3D (Roettiger, Burns and Loken 1993); (b) Particle-Mesh + ZEUS-3D (Anninos and Norman 1994).
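As a minimal illustration of the Godunov approach described above, the sketch below advances a 1D ideal-gas state by one step, solving an approximate (HLL) Riemann problem at each cell interface. This is a generic textbook scheme, not the PPM, TVD or FCT implementations cited here; the names, the value of gamma and the crude treatment of the domain edges (effectively zero flux) are my own simplifications.

```python
import numpy as np

GAMMA = 5.0 / 3.0

def primitives(U):
    """Convert conserved variables U = (rho, rho*u, E) to (rho, u, P)."""
    rho, mom, E = U
    u = mom / rho
    P = (GAMMA - 1.0) * (E - 0.5 * rho * u**2)
    return rho, u, P

def euler_flux(U):
    rho, u, P = primitives(U)
    return np.array([rho * u, rho * u**2 + P, (U[2] + P) * u])

def hll_flux(UL, UR):
    """Approximate (HLL) solution of the Riemann problem at one interface."""
    rhoL, uL, PL = primitives(UL)
    rhoR, uR, PR = primitives(UR)
    cL, cR = np.sqrt(GAMMA * PL / rhoL), np.sqrt(GAMMA * PR / rhoR)
    sL, sR = min(uL - cL, uR - cR), max(uL + cL, uR + cR)   # wave-speed estimates
    if sL >= 0.0:
        return euler_flux(UL)
    if sR <= 0.0:
        return euler_flux(UR)
    FL, FR = euler_flux(UL), euler_flux(UR)
    return (sR * FL - sL * FR + sL * sR * (UR - UL)) / (sR - sL)

def godunov_step(U, dx, dt):
    """One first-order finite-volume update of a 1D array of cells (shape (3, N))."""
    Unew = U.copy()
    for i in range(U.shape[1] - 1):
        F = hll_flux(U[:, i], U[:, i + 1])
        Unew[:, i]     -= dt / dx * F
        Unew[:, i + 1] += dt / dx * F
    return Unew
```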
3.5
Eulerian methods in non-uniform meshes
One of the major drawbacks of Eulerian methods as compared with SPH is that the resolution is determined by the cell separation in a fixed, uniform mesh. Because it is fixed in comoving coordinates, the physical resolution decreases with redshift. On the contrary, SPH can adapt the resolution to the local density because the number of fluid particles is strongly correlated with the density field. In order to take advantage of the good properties of Eulerian methods (no artificial viscosity, shock treatment, etc) and the good resolution of SPH, there have been some attempts to build Eulerian numerical codes on non-uniform grids that can adapt themselves to the arbitrary structures formed in a simulation. Different techniques have been applied in cosmological simulations to adapt the computational mesh.

Figure 1. An AMR grid hierarchy (from Norman and Bryan (1998)).

Adaptive Mesh Refinement (AMR) Algorithms. The original regular grid is refined with uniform sub-meshes to achieve the desired resolution (Figure 1). These methods
have been used for more than 20 years in Computational Fluid Dynamics. The first implementation in Cosmology was done by Anninos and Norman (1996), who used a finer grid to increase resolution only in a sub-volume of the simulation. The refined grid was fixed in space and time. A fully adaptive algorithm was developed by Bryan et al. (1994) for the PPM eulerian method. The computational mesh can be refined and de-refined at multiple levels to achieve the desired resolution in each density zone. The AMR gas-dynamical codes must be implemented with N-body codes that use the same grid structures to solve Poisson's equation. Bryan and Norman's adaptive PPM code was implemented with a hierarchical Particle-Mesh N-body method for collisionless particles. New N-body codes have been developed with the idea of using them with existing AMR eulerian gas-dynamical codes. The so-called Adaptive Refinement Tree (ART) N-body code by Kravtsov, Klypin and Khokhlov (1997) is an example of a fully adaptive N-body code, in which Poisson's equation is solved in a hierarchy of meshes (see Figure 2).
Figure 2. Left: particle distribution and its corresponding mesh structure for the N-body ART code. Each panel shows a blow-up of the area within the square of the previous one. Right: a test example of the gas-dynamical ART code; the grid structure of a 3D explosion at the centre after several time-steps (courtesy of A. Kravtsov).

The idea behind the construction of this code was to provide an existing adaptive PPM hydrodynamical code (Khokhlov 1998) with an efficient Poisson solver that uses the same computational mesh and data structure. The main advantage of this code is that it is fully adaptive, with no restrictions on the shape of the structures that are formed. More details on the gas-dynamical ART code can be found elsewhere (Kravtsov 1999). A test example of a 3D explosion and the mesh structure generated to resolve the propagating shock can be seen in Figure 2. The ART code is an example of the benefits of combining the latest developments in Computational Fluid Dynamics with the experience gathered in the last 30 years of studying the N-body problem. The application of hydrodynamical codes from other fields of Physics in Cosmology is rather straightforward, as we have shown in § 3.1.1, when the fluid equations are given in super-comoving coordinates.
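To give a flavour of how such adaptive codes decide where to refine, the sketch below flags cells for refinement when the mass they contain exceeds a threshold, which is one common criterion; the threshold value and the function itself are illustrative assumptions of mine, not the actual ART or AMR PPM refinement logic.

```python
import numpy as np

def flag_cells_for_refinement(density, cell_volume, mass_threshold):
    """Return a boolean mask of cells whose enclosed mass exceeds the threshold.
    Flagged cells would be split into 2^3 children on the next refinement level."""
    return density * cell_volume > mass_threshold

# Example: refine wherever a cell holds more than ~8 particle masses of material.
rho = np.random.lognormal(mean=0.0, sigma=1.0, size=(16, 16, 16))
mask = flag_cells_for_refinement(rho, cell_volume=1.0, mass_threshold=8.0)
print(mask.sum(), "cells flagged out of", mask.size)
```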
The FLASH code (Ricker et al. 2000) (see http://flash.uchicago.edu/ for more information) is another initiative to incorporate AMR techniques in astrophysical fluid dynamical simulations, with the advantage that it is fully parallel using MPI. The modularity of this code makes it very useful as a framework to build fully adaptive and fully parallel eulerian gas-dynamical codes with different N-body algorithms.
Moving Mesh Algorithms. In this case, a regular grid with a fixed number of cells is distorted to adapt itself to the structures developed in the fluid. The algorithms to distort the grid are based on different physical assumptions. For instance, in the Smoothed Lagrangian Hydrodynamics (SLH) code developed by Gnedin (1995) the fluid equations are solved in a fixed grid in lagrangian coordinates. Therefore, the eulerian (physical) grid is heavily distorted, as is shown in Figure 3. In contrast, in the Moving Mesh Hydrodynamics (MMH) algorithm developed by Pen (1998), the eulerian grid is distorted such that each computational cell has the same amount of fluid mass inside. Again, when strong clustering develops, the mesh is heavily deformed, as shown in Figure 3.

Figure 3. Three examples of non-uniform meshes. Left: a planar projection of a 64^3 uniform lagrangian mesh shown in eulerian coordinates, in arbitrary units, using the SLH code (from Gnedin 1995). Central panel: a 128^3 mesh projection, in cell units, produced by a Moving-Mesh Hydrodynamics code (from Pen 1998). Right: an unstructured mesh obtained from a Delaunay triangulation for an arbitrary set of 2D points (from Xu 1997).

The major drawback of these algorithms is the unknown numerical errors that strong mesh distortions, associated with the development of clustering, could introduce in the solutions of the fluid equations. It will be necessary to undertake more detailed studies of error analysis and numerical stability of these methods in situations of strong density contrast.
Unstructured Meshes. Finite element methods are well-known methods for fluid dynamics in engineering. The fluid equations are solved on a mesh with cells of arbitrary shapes and sizes. One attempt to use this kind of method in Cosmology was made by Xu (1997), whose mesh is constructed from an arbitrary set of points in 3D through a Delaunay triangulation. A 2D example can be seen in Figure 3. As in Godunov methods, volume-averaged quantities within each cell are used. Fluxes from one cell to the nearby ones are computed by applying gas-kinetic theory to the Riemann problem. He also developed algorithms to solve Poisson's equation in unstructured meshes. So far, this type of code has proved to be very memory expensive compared with others, because of the considerable overhead needed to account for cell boundaries of arbitrary position and shape.
3.6
Comparison of different numerical techniques
The inherent complexity of gas dynamics makes it difficult to validate the results from codes based on different numerical techniques. The classical tests with known analytical solutions (e.g. Sod's solution to the shock tube problem, Sedov's solution for the propagation of an explosion shock front, etc.) are passed with flying colours by all codes. But these ideal physical situations are far from those arising in cosmological conditions. Therefore, in order to assess the reliability of cosmological gas-dynamical simulations, it is necessary to compare the results obtained with different numerical techniques. There have been two attempts to carry out this comparison. Kang et al. (1994) compared the statistical properties of large-scale structure generated in a cosmological simulation made with different codes (SPH and eulerian). Some conclusions were drawn from this comparison. SPH was able to resolve higher central density concentrations than eulerian methods, but it failed to sample the profiles of objects at intermediate densities and completely failed to resolve low density areas. Also, there was agreement on the temperature estimation in high density areas among the different codes. The main differences occurred in the temperature estimates of low density areas of the simulation. More recently, Frenk et al. (1999) made a more detailed comparison of many different cosmological gas-dynamical codes. Different versions of SPH combined with either tree or P3M N-body, eulerian codes with either fixed or adaptive grids, and even parallel and serial versions of the same code were compared. The purpose of the comparison was to study the non-linear properties of an individual cluster of galaxies. A constrained Gaussian realization of a Cold Dark Matter (CDM) model that produces a Coma-like cluster at the centre of the volume was the initial condition for all simulators. A detailed analysis of the dark matter, gas density, temperature and X-ray emission of the central cluster was done in exactly the same way for all codes. The agreement in the dark matter distribution among all the different N-body techniques was much better than the agreement in the gas properties. This reflects the complexity of gas-dynamical simulations as compared with N-body simulations. Despite some discrepancies, mainly related to differences in the time integration (possibly due to variations in internal timing among different codes), there is a clear indication of a different numerical treatment of shocks between SPH and eulerian codes. The gas entropy radial profile decreases towards the centre of the cluster in all SPH simulations. This decline is not seen in the eulerian simulations. In fact, in the high resolution AMR eulerian simulation (Bryan et al. 1994) a flattening of the entropy at the cluster centre was found. Recently, this result has been confirmed in a re-simulation of the cluster formation with the new gas-dynamical ART code with twice the resolution (Kravtsov, private communication).
4
The future of cosmological simulations
Future developments in cosmological simulations will be related, as in the past, to developments in computer technology. The tendency towards massively parallel computing will be the dominant factor in the development of new numerical codes. In the previous section, I have already mentioned different examples of parallel N-body/gas-dynamical codes (e.g. GADGET, PTREESPH, FLASH, etc) that can, potentially, use hundreds or thousands of processors, linked together with high-speed Ethernet connections, by means
of MPI or PVM software tools. The use of specially designed hardware like GRAvity PipE (GRAPE) to speed up the computations is another alternative, although it is not clear whether there will be a continuous effort in the design and construction of special hardware for the N-body problem. It is simply not profitable, as compared with cheap, home-made parallel computers such as the well-known Beowulf project (see http://www.beowulf.org/ for more information). We will see, for sure, new and more efficient parallel algorithms for particle methods (N-body and SPH). As I said before, the use of adaptive techniques for eulerian gas dynamics in Cosmology is just starting. More work is necessary to obtain parallel and efficient AMR algorithms, specially designed for cosmological applications. The other major development in future cosmological simulations will be physical rather than technological or numerical. It has to do with the proper modelling of the physics of the simulated system. During the last decade, gas dynamics was introduced in simulations in addition to gravity, but the physics of baryons is far from being completely described by them. As I commented in the Introduction, there are non-adiabatic phenomena that can change the physical behaviour of the system under study. They have been simply represented as source and sink terms in the gas-dynamical equations (9-11). To estimate them, a proper modelling of all the relevant short-scale baryonic processes is necessary. In the rest of this section, I will discuss what has been done up to now in order to include the non-adiabatic processes in cosmological simulations and what we would expect to be done in the coming years.
4.1
The short-scale baryonic physics
Most of the information we gather from the Universe comes from the light generated by the stars in galaxies, or by the interaction of this light with the interstellar and/or intergalactic medium. A realistic simulation should give us the same information that we observe from galaxies in the Universe: luminosity, colours, spectra, abundances, etc. To this end, it is customary to include the complex physical processes that the baryonic component experiences at short scales, such as:

Radiative cooling
Photoionisation processes
Formation of molecular clouds
Star formation within clouds
Feedbacks associated with stellar evolution (winds, supernova explosions)
Chemical evolution
Multiphase structure of the interstellar medium

The inclusion of these physical processes in a cosmological simulation is not straightforward. On one hand, the physics of some of the above processes is poorly understood, so the modelling will not necessarily be very accurate. On the other hand, the short scale range of these processes makes the models depend strongly on numerical resolution and on the type of numerical method employed to model the large-scale interactions (gravity and gas dynamics).
4.1.1
Heating and cooling processes
Radiative and Compton cooling in a homogeneous gas are very well-known processes in Astrophysics. For a plasma of primordial composition it is necessary to integrate the chemical reactions for the different ionic species: e^-, H, He, H^+, He^+ and He^{++}. This has been done by Cen (1992) in the context of eulerian codes (see also Anninos et al. (1997)) and by Katz, Weinberg and Hernquist (1996) in SPH simulations. When collisional ionisation equilibrium is considered, there is no need to follow the different species separately. The cooling rates can be estimated directly as a function of the local gas density and temperature (cell-averaged quantities in eulerian gas dynamics, or values at the particle positions in SPH) as \Lambda(\rho_g, T) = \rho_g^2\,\Lambda(T, Z), where \Lambda(T, Z) depends only on the chemical composition (Z) and the temperature (see Figure 4).
Figure 4. Top left: cooling rates for a primordial composition plasma and the contribution of the different chemical reactions. Top right: primordial cooling rates in the presence of a UV radiation field for different gas densities. Bottom: cooling rates (Sutherland and Dopita) for plasmas with variable chemical composition, plotted against temperature (10^4 - 10^8 K); the thick upper solid line corresponds to a plasma with solar metallicity.
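As an illustration of how such tabulated rates enter a simulation, the snippet below estimates the radiative cooling time of a gas element from its density, temperature and an interpolated cooling function. The table used here is a made-up placeholder, standing in for tabulations such as those shown in Figure 4, and the assumed mean molecular weight and hydrogen fraction are illustrative.

```python
import numpy as np

K_B = 1.380649e-16        # erg / K
M_P = 1.6726e-24          # g

# Placeholder cooling table: log10 T [K] versus Lambda(T) [erg cm^3 s^-1].
LOG_T_TABLE = np.array([4.0, 5.0, 6.0, 7.0, 8.0])
LAMBDA_TABLE = np.array([1e-24, 3e-22, 2e-23, 8e-24, 3e-23])

def cooling_time(rho, T, mu=0.6, X=0.76):
    """t_cool = (3/2) n k T / (n_H^2 Lambda(T)), a crude estimate for a fully
    ionised primordial plasma with hydrogen mass fraction X."""
    n = rho / (mu * M_P)
    n_h = X * rho / M_P
    lam = np.interp(np.log10(T), LOG_T_TABLE, LAMBDA_TABLE)
    return 1.5 * n * K_B * T / (n_h**2 * lam)

print(cooling_time(rho=1e-26, T=1e6))   # seconds, for a diffuse hot gas element
```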
High redshift observations show that there was very little neutral hydrogen at early epochs. It is thought that a UV background coming from QSO and AGN was responsible for the complete re-ionisation of the primordial gas. Hence, photoionization processes must be taken into account, together with radiative cooling, if we want to estimate the characteristics of Lyman-alpha absorption systems from cosmological simulations and compare them with observational data (e.g. Hernquist et al. 1996; Miralda-Escudé et al. 1996; Zhang et al. 1998; Machacek et al. 2000). Photo-ionisation heating is more important in low density, primordial gas regions, as can be seen in Figure 4. Therefore, it was considered a possible mechanism to avoid the over-cooling problem (e.g. White and Frenk 1991; Efstathiou 1992) in hierarchical models of galaxy formation. In these models, short scale structures become non-linear first. Then gas cools so efficiently in the shallow potential wells of the small dark matter halos that it should have been converted into stars long before the assembly of present-day disk galaxies. Recent results from simulations seem to indicate that UV photoionization alone cannot solve this problem (Navarro and Steinmetz 1997; Sommer-Larsen, Gelato and Henrik 1999). Up to now, the UV radiation field has been considered to be isotropic and homogeneous and only a frequency dependence is allowed. This is a rather crude approximation because, as inhomogeneities develop, the optical depth of the gas changes. For a correct treatment of this physical situation, the radiative transfer equations must be solved to find the UV intensity as a function of position, frequency and direction. So far, there has not been any attempt to simultaneously integrate Euler's and the radiative transfer equations in 3D. A substantial increase in computational resources is needed for this problem. So far, only 1D (Ducloux et al. 1992) and 2D (Stone, Mihalas and Norman 1992) radiation hydrodynamical simulations have been done. With the increase of computing power, full 3D radiative and gas-dynamical simulations could become feasible. To this end, new algorithms for the 3D radiative transfer equations are necessary. Radiation hydrodynamics will constitute a future line of research in Numerical Cosmology. In all these calculations, the multiphase nature (cold clouds embedded in hot gas) of the interstellar medium has not been considered. Almost all computer simulations assume a homogeneous ISM. It has been shown (Mucket and Kates 1997) that the physics of a multiphase ISM can be very surprising. Gravitational instabilities can be triggered after thermal instabilities develop in the multiphase ISM for a particular range of densities, temperatures and UV fluxes. Yepes et al. (1997) attempted to model a multiphase medium (two fluid components tightly coupled) using an eulerian PPM code, and the approach has been implemented in SPH simulations by Hultman and Pharasyn (1999).
4.1.2
Star formation and star-gas back-reactions
Any realistic simulation of galaxy formation and evolution should model, in some way, the formation of stars from cold gas clouds. When stars are formed they behave as collisionless particles. Therefore, the inclusion of star formation in a simulation implies a transfer of collisional to collisionless material. The dynamics of the system are strongly altered with respect to simulations without star formation (see e.g. Katz, Weinberg and Hernquist (1996)). In fact, recent results seem to indicate that star formation could play a key role in stabilising the disks of gas particles generated in SPH simulations of galaxy formation (Dominguez-Tenreiro, Tissera and Sáiz 1998).
The physics of star formation is not yet understood. Therefore, the modelling of a poorly understood process must necessarily be not very sophisticated. In most codes, a simple star-formation law is considered:

\frac{d\rho_*}{dt} = -\frac{d\rho_{\rm gas}}{dt} = \frac{\rho_{\rm gas}}{t_*},
where \rho_{\rm gas} is the local gas density (at the particle position in SPH or averaged over a volume element in eulerian codes). Estimation of the characteristic timescale t_* is done by different criteria (Katz 1992; Navarro and White 1993; Yepes et al. 1997; Berczik 1999). The total mass in stars obtained from the above equation has to be distributed according to an Initial Mass Function (IMF). Almost all simulations which include star formation use the standard IMF of the solar vicinity (i.e. Scalo or Salpeter), and it is kept constant along the time evolution. This is a rather strong assumption because in a primordial gas cloud, stars could have formed according to different IMFs that could be shifted towards massive stars. However, we know that massive stars (\gtrsim 10 M_\odot) explode as Type II supernovae in a very short time and inject enormous amounts of energy (\langle E_{SN}\rangle = 10^{51} erg, \langle M_{SN}\rangle = 22 M_\odot) into the surrounding gas. They also modify the chemical composition of the gas by the deposition of metals. Also, stellar winds from hot stars (O and B) can deposit the same amounts of mass and energy back into the gas (Leitherer, Robert and Drissen 1992). UV radiation from stars can escape from where stars are formed and ionise molecular clouds in low density regions, making them another source of photoionization that should be taken into account in addition to QSO and AGN (Carraro, Lia and Chiosi 1998). The complex physics of the star-gas interactions is also very poorly understood. Nevertheless, it is very important to include these effects in simulations. Stellar feedbacks can provide the mechanism to self-regulate star formation within dark matter halos, which could explain some of the observational properties of galaxies, like the Tully-Fisher relation (Elizondo et al. 1999a), or the morphology (colour)-density relation (Elizondo et al. 1999b). Feedback is also required to prevent the over-cooling problem (see § 4.1.1) and the related disk angular momentum problem in gas-dynamical simulations of hierarchical galaxy formation (see e.g. Thacker and Couchman (2000) and references therein). The modelling of stellar feedbacks in cosmological simulations is just starting. In most SPH simulations of galaxy formation (e.g. Navarro and White 1993; Katz, Weinberg and Hernquist 1996; Navarro and Steinmetz 2000) the thermal energy from supernovae is distributed among SPH particles within a sphere of radius corresponding to the smoothing length. The way energy is distributed among particles is arbitrary. However, no matter how this procedure is done, results from simulations show that considering only thermal re-heating as a feedback mechanism is not enough to reconcile the properties of simulated disk galaxies with observations (Thacker and Couchman 2000). In eulerian codes, the energy from supernovae affects only the gas inside the cell where they explode. But the conclusions are similar to those for SPH simulations: thermal energy from supernovae hardly affects the gas temperature inside large dark matter halos. Gas density is very high in regions of star formation. Therefore, the injected energy will be radiated away very efficiently because \Lambda \propto \rho_g^2. This effect will be more important for SPH simulations as they can reach higher densities due to the higher resolution of this method (see § 3.3). In simulations which also take into account the effects of chemical enrichment by supernovae (Steinmetz and Muller 1995; Yepes et al. 1997; Carraro, Lia
and Chiosi 1998) this effect is even stronger because, as shown in Figure 4, the gas cools faster as its metallicity increases. Navarro and White (1993) showed that if part of the energy released by supernovae is put back as kinetic energy (by giving a kick to nearby SPH particles) a dramatic increase in feedback occurs. In order to avoid the galaxy being blown out by this effect, only a very small fraction of the supernova energy (f_v) must be kinetic, but simulations of supernova explosions predict that at least 5 to 20% of the total supernova energy is mechanical (Gerritsen 1997). Hence, f_v ~ 0.1 seems a more physical guess. But taking this value in an SPH simulation would simply destroy the baryonic content of dark halos. Apart from other considerations, such as whether the algorithm to implement kinetic effects violates total momentum conservation, it is clear to me that a better treatment of supernova feedback in SPH simulations is necessary.
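The sketch below shows the kind of bookkeeping such feedback schemes involve: a star particle's supernova energy budget is split into a thermal part and a kinetic part controlled by a fraction f_v. The numbers (10^51 erg per supernova, one supernova per 100 solar masses of stars formed) are illustrative assumptions in the spirit of the discussion above, not the prescription of any specific code.

```python
E_SN = 1.0e51          # erg per supernova (illustrative)
SN_PER_MSUN = 0.01     # supernovae per solar mass of stars formed (illustrative IMF average)

def supernova_feedback(stellar_mass_formed, f_v=0.1):
    """Split the supernova energy of a newly formed stellar population into
    thermal and kinetic channels; returns (thermal_energy, kinetic_energy) in erg."""
    e_total = stellar_mass_formed * SN_PER_MSUN * E_SN
    return (1.0 - f_v) * e_total, f_v * e_total

thermal, kinetic = supernova_feedback(stellar_mass_formed=1.0e6, f_v=0.1)
print(f"thermal = {thermal:.2e} erg, kinetic = {kinetic:.2e} erg")
```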
A possible way of attacking the problem of star-gas back-reactions would be through a multiphase modelling of the ISM. There is not much work done yet, although it will certainly be an active area of research in the future. Yepes et al. (1997) were the first to propose a numerical model in which a multiphase medium (stars, hot gas, cold clouds and dark matter) and eulerian (PPM) gas dynamics are used to study galaxy formation in a cosmological context. In this model, supernovae affect the system in two ways: heating of the hot gas and evaporation (which mimics the effects of mechanical energy) of the cold clouds within a cell volume. Our simulations show that when most of the energy from supernovae goes into re-heating the hot gas (thermal energy), the pressure gradients are very effective in suppressing subsequent star formation in shallow potential wells. If we allow supernova energy to evaporate cold clouds (kinetic energy), and transfer mass from the cold to the hot component, star formation is not completely suppressed and small halos tend to be brighter (Elizondo et al. 1999a). This multiphase model has also been implemented in SPH (Hultman and Pharasyn 1999), assuming that each SPH particle has a multiphase structure. An attempt to implement a multiphase SPH with particles of different density, aimed at representing the hot and cold phases, has recently been proposed (Ritchie and Thomas 2000). Despite the advances made up until now in the modelling of all the physical processes described above, there is still a long way to go until we will be able to simulate, in a computer, patches of the Universe with enough physical and numerical resolution to zoom in from the large scale filamentary structure, described by the dark matter, to the inner regions of the molecular clouds where stars form. To this end, an enormous amount of work in modelling the large (gravity coupled with gas dynamics) and small (non-adiabatic baryonic processes) scale physics has to be done. Fortunately, the availability of new numerical codes as soon as they are written, and of fast and cheap parallel computers, will allow many cosmologists and astrophysicists to contribute to the development of Numerical Cosmology. The future looks very promising for those young researchers who want to start their scientific career in this field.
References
Aarseth S J, 1963, MNRAS 126 223.
Aarseth S J, Gott J R and Turner E L, 1979, ApJ 228 664.
Anninos P and Norman M L, 1994, ApJ 436 11.
Anninos P and Norman M L, 1996, ApJ 459 12.
Anninos P, Zhang Y, Abel T, Norman M L, 1997, New Astronomy 2 209.
Balsara D S, 1995, J Comp Phys 121 357.
Berczik P, 1999, A&A 348 371.
Bryan G et al., 1994, ApJ 428 405.
Bertschinger E, 1998, ARAA 36 599.
Binney J and Tremaine S, 1987, Galactic Dynamics, Princeton Univ Press.
Carraro G, Lia C and Chiosi C, 1998, MNRAS 297 1021.
Cen R, 1992, ApJS 78 341.
Cen R, Jameson A, Liu F, Ostriker J P, 1990, ApJ 363 L41.
Colella P and Woodward P R, 1984, J Comp Phys 54 174.
Couchman H M P, Thomas P and Pearce F, 1995, ApJ 452 797. HYDRA is available at http://hydra.mcmaster.ca/
Courant R, Friedrichs K O and Lewy H, 1967, IBM Journal 11 215.
Davé R, Dubinski J and Hernquist L, 1997, New Astronomy 2 277.
Dominguez-Tenreiro R, Tissera P B and Sáiz A, 1998, ApJ (Letters) 508 123L.
Doroshkevich A et al., 1980, MNRAS 192 321.
Ducloux E, Leorat J, Gerbal D and Alecian G, 1992, A&A 257 425.
Efstathiou G, 1992, MNRAS 256 43P.
Elizondo D, Yepes G, Kates R, Muller V and Klypin A, 1999a, ApJ 515 525.
Elizondo D, Yepes G, Kates R and Klypin A, 1999b, New Astronomy 4 101.
Evrard A E, 1988, MNRAS 235 911.
Frenk C S et al., 1999, ApJ 525 554.
Gerritsen J P E, 1997, Ph.D. Thesis, Univ of Groningen, Netherlands.
Gingold R A and Monaghan J J, 1977, MNRAS 181 375.
Gnedin N Y, 1995, ApJS 97 231.
Guth A, 1981, Phys Rev D 23 347.
Hernquist L and Katz N, 1989, ApJS 70 419.
Hernquist L, Katz N, Weinberg D and Miralda-Escudé J, 1996, ApJ 457 51.
Hultman J and Pharasyn A, 1999, A&A 347 769.
Jameson A, 1989, Science 245 361.
Kang H et al., 1994, ApJ 430 83.
Mucket J P and Kates R E, 1997, A&A 324 1.
Katz N, 1992, ApJ 391 502.
Katz N, Weinberg D H and Hernquist L, 1996, ApJS 105 19.
Khokhlov A, 1998, J Comp Phys 143 519.
Klypin A, 2000, in Proc of the Summer School "Relativistic Cosmology: Theory and Observations", Como, Italy. (astro-ph/0005502).
Klypin A, Kates R, Khokhlov A M, 1991, Lecture Notes in Physics: Insights into the Universe, Springer-Verlag: Berlin.
Klypin A and Shandarin S F, 1983, MNRAS 204 891.
Kravtsov A, 1999, Ph.D. Thesis, New Mexico State Univ, USA.
Kravtsov A, Klypin A and Khokhlov A, 1997, ApJS 111 73.
Leitherer C, Robert C and Drissen L, 1992, ApJ 401 596.
LeVeque R J, Mihalas D, Dorfi E A and Muller E, 1997, Computational Methods for Astrophysical Fluid Flow, Springer-Verlag: Berlin.
Liddle A, 1997, ASP Conference Series 126 31.
Lucy L B, 1977, AJ 82 1013.
Machacek M E et al., 2000, ApJ 532 118.
Martel H and Shapiro S, 1998, MNRAS 297 467.
Miralda-Escudé J, Cen R, Ostriker J P, Rauch M, 1996, ApJ 471 582.
Monaghan J J, 1992, ARAA 30 543.
Nakasato N, Mori M, Nomoto K, 1997, ApJ 484 608.
Navarro J F, Steinmetz M, 1997, ApJ 478 13.
Navarro J F, Steinmetz M, 2000, ApJ 538 477.
Navarro J F and White S D M, 1993, MNRAS 265 271.
Norman M L and Bryan G, 1998, in Proc of Int Conf Numerical Astrophysics, Ap & Sp Sc Library 210, 19. (astro-ph/9807121).
Peebles P J E, 1980, The Large Scale Structure of the Universe, Princeton Univ Press.
Pen U L, 1998, ApJS 115 19.
Primack J, 2000, Proc. 4th International Symposium on Sources and Detection of Dark Matter in the Universe, Marina del Rey, USA. (astro-ph/0007187).
Quilis V, Ibáñez J M, Sáez D, 1996, ApJ 469 11.
Ricker P M et al., 2000, Procs. VII International Workshop on Advanced Computing and Analysis Techniques in Physics Research, Chicago, USA. (astro-ph/0011502).
Ritchie B W and Thomas P A, 2000, MNRAS (in press) (astro-ph/0005357).
Roettiger K, Burns J O, Loken C, 1993, ApJ 407 53.
Ryu D, Ostriker J, Kang H and Cen R, 1993, ApJ 414 1.
Sahni V and Cole P, 1995, Phys Rep 262 1.
Saslaw W C, 1985, Gravitational physics of stellar and galactic systems, Cambridge Univ Press.
Sellwood J A, 1987, ARAA 25 151.
Serna A, Alimi J M and Chieze J P, 1996, ApJ 461 884.
Shandarin S F, 1980, Astrofizika 16 439.
Shapiro P A, Kang H, Villumsen J V, 1991, ASP Conf Proc 15: Large-scale Structures and Peculiar Motions in the Universe.
Shapiro P A, Martel H, Villumsen J V, Owens J M, 1996, ApJS 103 269.
Sornborger A, Brandenberger R, Fryxell B and Olson K, 1997, ApJ 482 22.
Springel V, Yoshida N, White S D M, 2000, New Astronomy (in press) (astro-ph/0003162). GADGET is available at http://ibm-2.MPA-Garching.MPG.DE/gadget/
Steinmetz M, 1995, Proc of the International School of Physics Enrico Fermi: Dark Matter in the Universe, Varenna, ed. J Primack, A Provenzale, S Bonometto. (astro-ph/9512013).
Steinmetz M and Muller E, 1995, MNRAS 276 549.
Stone J M, Mihalas D and Norman M L, 1992, ApJS 80 819.
Stone J M and Norman M L, 1992, ApJS 80 753.
Steinmetz M, 1996, MNRAS 278 1005.
Sommer-Larsen J, Gelato S and Henrik V, 1999, ApJ 519 501.
Thacker R and Couchman H M P, 2000, ApJ (in press) (astro-ph/0001276).
Thacker R J, Tittley E R, Pearce F R, Couchman H M P and Thomas P A, 2000, MNRAS 319 619.
Tissera P B, Lambas D G and Abadi M G, 1997, MNRAS 286 384.
Umemura M et al., 1993, PASJ 45 311.
White S D M and Frenk C S, 1991, ApJ 379 52.
Xu G, 1997, MNRAS 288 903.
Yepes G, 1996, ASP Conf Series 126 279.
Yepes G, Kates R, Klypin A, Khokhlov A M, 1995, Proc of the XVth Rencontres de Moriond: Clustering in the Universe, Ed. Frontieres, Paris.
Yepes G, Kates R, Khokhlov A and Klypin A, 1997, MNRAS 284 235.
Zeldovich Ya B, 1970, A&A 5 84.
Zhang Y, Meiksin A, Anninos P and Norman M L, 1998, ApJ 495 63.
Gravitational N-body simulation of large-scale cosmic structure
H M P Couchman
McMaster University, Canada
1
Introduction
The purpose of cosmological simulation is to describe the formation and evolution of the cosmic fluid from an almost smooth state at early epochs to the rich variety of structures observed at the present epoch: from galaxies, clusters of galaxies, walls and sheets of galaxies to superclusters and apparently empty voids. On the largest scales the behaviour of matter is dominated by gravitational interactions. These lectures will consider the behaviour of the cosmic fluid under the influence of gravity, with a particular focus on the numerical simulation of nonlinear gravitational evolution. On galactic scales, hydrodynamic interactions become important in addition to gravity; a detailed discussion of these issues is given in the lectures presented elsewhere in this volume by Yepes.
A wealth of observational data over the last decade is providing increasingly important constraints on models of structure formation. Observations of the Cosmic Microwave Background, a relic of the hot early universe, now convincingly demonstrate that structure grew from a state of near uniformity at early epochs, with fractional fluctuations in the matter density at the level of ~10^{-5}. Upcoming experiments promise improved measurement of the spectrum of the initial fluctuations. At more recent epochs we have increasingly detailed information about the spatial distribution of cosmic structure and are beginning to elucidate the evolution of cosmic populations. A key thrust of contemporary post-recombination cosmology is to connect these two regimes: to understand how structure grows from small linear perturbations to the range of structures, some highly non-linear, that we observe at present. Figure 1 illustrates this physical domain.
As noted, at early epochs the fluctuation amplitude is small and may be treated through perturbation theory. For fluctuation amplitudes approaching unity and for bound virialised objects, we have no satisfactory analytic descriptions of gravitational evolution beyond very simple models. It is to describe non-linear gravitational evolution that we turn to numerical simulation. This provides a means both to model the non-linear behaviour of self-gravitating systems and to generate realizations of initial density fields which we may compare with observed distributions.
[Figure 1 shows a redshift timeline running from the Cosmic Microwave Background and recombination, through the "Dark Ages" and the first objects, to quasars, Lyman-alpha systems, X-ray clusters and present-day large-scale structure (Hubble Deep Field).]
Figure 1. A schematic representation of the post-recombination universe and the growth of structure. The redshift, z, is such that 1 + z \propto 1/a, where a is the expansion factor of the universe.
The layout of these lectures is as follows. In Section 2 the linear growth of small fluctuations in an expanding background is reviewed, together with a description of the formation of bound structures in common cosmological models. Section 3 discusses the requirements of cosmological simulations, followed in Section 4 by an extended discussion of the numerical techniques employed in these simulations. Section 5 discusses implementing large-N particle techniques for parallel computers and Section 6 discusses issues relevant to practical simulation. The lectures conclude with a brief look to future challenges.
2
Structure formation in hierarchical cosmologies
The focus of these lectures is on the evolution of the self-gravitating cosmic fluid. We will consider epochs after recombination (t > t_{recombination}), implying that P/c^2 \ll \rho, and focus on mass scales such that M \gtrsim M_{Jeans}, so that thermal pressure may be ignored. Under these circumstances the fluid has no effective pressure (at least until random velocities can provide support in bound objects) and is termed "pressure-less dust". We will further restrict our attention to scales very much less than the horizon scale ~ ct_0, where t_0 is the present age of the universe. This latter assumption is justified by the observational fact that deviations from homogeneity are only observed to be significant on scales much smaller than the horizon scale. Under these conditions it is sufficient to treat the fluid as Newtonian.
Let us begin by considering the fluid (Euler) equations:

\frac{\partial \rho}{\partial t} + \nabla_r \cdot (\rho \mathbf{u}) = 0,    (1)

\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla_r)\,\mathbf{u} = -\nabla_r \Phi,    (2)

which together with Poisson's equation,

\nabla_r^2 \Phi = 4\pi G \rho,    (3)

describe the evolution of the fluid. It is convenient to cast these equations into comoving coordinates to remove the universal expansion:
\mathbf{r} = a\mathbf{x},    (4)

\mathbf{u} = \frac{\dot{a}}{a}\mathbf{r} + \mathbf{v};    (5)

\dot{a}/a is the Hubble parameter and \mathbf{v} is the peculiar velocity describing local departures from the Hubble flow. In comoving coordinates, Equations (1)-(3) become:
\frac{\partial \delta}{\partial t} + \frac{1}{a}\nabla \cdot \left[(1 + \delta)\,\mathbf{v}\right] = 0,    (6)

\frac{\partial \mathbf{v}}{\partial t} + \frac{1}{a}(\mathbf{v} \cdot \nabla)\,\mathbf{v} + \frac{\dot{a}}{a}\mathbf{v} = -\frac{1}{a}\nabla\phi,    (7)

\nabla^2\phi = 4\pi G \bar{\rho} a^2 \delta,    (8)

where \delta = \rho/\bar{\rho} - 1 is the fractional over-density, \bar{\rho} is the mean density, \phi is the peculiar potential and all spatial derivatives are now in terms of the comoving coordinate \mathbf{x}. Note that the first two terms of equation (7) form the convective derivative d\mathbf{v}/dt; the momentum equation obtained by writing the equation in terms of the convective derivative is exactly that required for Lagrangian methods such as those using particles as described below. Linearising these equations, |\delta|, |\mathbf{v}| \ll 1, and taking \nabla\cdot(7) - \partial/\partial t\,(6) together with (8) gives

\ddot{\delta} + 2\frac{\dot{a}}{a}\dot{\delta} = \frac{3}{2}\Omega\left(\frac{\dot{a}}{a}\right)^2\delta,    (9)
where \Omega is the ratio of the mean density to the critical density of a scale-free universe with flat spatial sections (e.g., Peebles 1993). The temporal behaviour of the expansion parameter, a, is derived from the field equations. It is sufficient for our purposes to note that at early times the universe behaves as though \Omega \simeq 1 and has a \propto t^{2/3}. In this case we see immediately that equation (9) is homogeneous and has solutions \delta \propto t^n with n = 2/3, -1. The decaying solution corresponds to a freely moving object; it is always overtaking observers moving away with the Hubble expansion and so in comoving coordinates its peculiar speed decays. The growing solution shows that the over-density increases proportionately to the expansion factor. Thus the imposed drag from the expansion slows the growth from the normal exponential perturbative increase to a power law growth.
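The sketch below checks this result numerically by integrating the linear growth equation for an Einstein-de Sitter background; the step count and the initial conditions are arbitrary illustrative choices.

```python
import numpy as np

def grow_delta(t0=1.0, t1=100.0, n=200000):
    """Integrate d2(delta)/dt2 + 2(adot/a) d(delta)/dt = (3/2) Omega (adot/a)^2 delta
    for Omega = 1 and a ~ t^(2/3), starting from the growing mode delta ~ t^(2/3)."""
    t = np.linspace(t0, t1, n)
    dt = t[1] - t[0]
    delta, ddelta = 1.0, (2.0 / 3.0) / t0          # growing-mode initial conditions
    for ti in t[:-1]:
        H = 2.0 / (3.0 * ti)                        # adot/a for Einstein-de Sitter
        accel = -2.0 * H * ddelta + 1.5 * H**2 * delta
        ddelta += accel * dt
        delta += ddelta * dt
    return delta

# delta should grow roughly as a ~ t^(2/3): from t=1 to t=100 that is a factor ~21.5.
print(grow_delta())
```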
We may extrapolate these results to a simple picture of the growth of cosmic structure. At recombination the fluctuation amplitudes are small and grow in lock-step with the universal expansion. Further, if we consider the spectrum of fluctuation amplitudes, the different comoving Fourier modes grow independently at the same rate. As the overdensity, \delta, on a particular scale approaches unity, the perturbation will break away from the universal expansion, stop growing in physical coordinates and collapse to a bound object. Depending upon the initial fluctuation spectrum, a snapshot at a particular epoch may show a range of scales in different stages of this growth cycle. When an object collapses and virialises it does so at a density which is a roughly constant multiple of the mean density at the epoch at which it broke away from the universal expansion. Thus objects collapsing at early times have higher densities than those collapsing at late times. Currently popular cosmological models favour initial spectra in which the fluctuation amplitude rises monotonically to small scales. Thus small objects become non-linear first, followed by larger and larger structures. We may thus expect that objects collapsing at high densities at early times will be subsumed in a hierarchical fashion into later, larger collapsing objects. This appears to be consistent with the observed distribution, densities and ages of cosmic objects such as galaxies and clusters.
3
Requirements of a cosmological simulation
We turn now to the question of modelling the behaviour of a self-gravitating fluid in the cosmological context. Contemporary cosmologies postulate a component of dark matter which interacts only gravitationally. Further, since the elements of which this dominant matter component is composed are typically believed to be elementary particles, it will be a very good approximation on scales of cosmological interest to consider the dark matter as a collisionless fluid. Thus, in order to model the behaviour of the gravitationally dominant component, we need to be able to approximate successfully a collisionless fluid. Ideally, we would model numerically the full six-dimensional phase space of the collisionless Boltzmann (or Vlasov) equation. This is not possible at useful resolution at present. A common approach, and the one to be discussed here, is to split the initial phase-space into volume elements which are approximated by elements of fixed Eulerian size, in this case particles. In reality the initial phase-space elements would change shape and volume to preserve the phase-space density of the collisionless fluid under the action of gravity. The key to ensuring that this numerical description is a useful approximation of a collisionless fluid is to minimise fluctuations within the volume elements. It is certainly necessary to ensure that two-body scattering is not significant on the time scales under consideration. Beyond this, little detailed quantitative work has been done to determine the quality or convergence to a collisionless fluid of the numerical approximation. Two further features of cosmic structures will motivate the choice of numerical method. First, cosmic structures are encountered at very high density contrasts. As noted previously, an object will first virialise at a fixed over-density relative to the background of roughly 200. This density contrast will grow as the object maintains a fixed physical size within the expanding background. A galaxy, for example, (in which dissipation is
important) has a density contrast of 10⁵-10⁶. These large contrasts lead immediately to a requirement for high resolution forces and indicate that a wide range of timescales (t_dyn ∝ ρ^(-1/2)) will be present in the simulation. Second, structure exists in the universe over a very large range of scales. These features of the cosmos suggest that Lagrangian particle methods will be well suited to modelling the collisionless component. Indeed, it is hard to see how otherwise to model easily the multiple streams that will occur from orbit crossing as structures virialise and merge. As noted previously, the requirement that particle scattering and discreteness fluctuations be minimised will be an essential feature of a plausible model of the collisionless fluid. This is achieved in particle methods through the use of particle softening. Bound objects are a key feature of interest in cosmological investigations, and in order to model properly the matter distribution within these haloes it will frequently be necessary to have a particle softening which is very much smaller than the mean interparticle separation. This is simply a reflection of the large density contrasts that arise. It is important to note that this does not represent an attempt to resolve structure on scales smaller than those at which information is present in the initial conditions. The information at the Nyquist wavelength of the particles will be transferred to scales very much smaller than the original comoving Nyquist scale as the first bound objects break away from the universal expansion. The optimal choice of softening is discussed in Section 6. The very large range of masses of structures of interest in the universe has motivated a continuing push for simulations incorporating very large numbers of particles. An immediate corollary, if the simulations are to be practicable, is a continuing drive to develop highly efficient methods for solving the gravitational N-body problem. In the cosmological context, an efficient method is generally believed to require the following properties:

1. Computational scaling with particle number, N, as close to O(N ln N) as possible. For a fractal distribution of particles, computing the forces to a resolution comparable to the minimum separation is now widely believed to have a complexity of this order.
2. Low memory overhead beyond the essential 6N storage required for particle positions and velocities.

3. Forces accurate to ∼1/2%. This value appears to be the empirical standard at present although relatively little work has been done to quantify under what conditions this level of force error (typically equivalent to a stochastic fluctuation in the force) leads to an acceptable representation of the collisionless fluid.
A primary focus of these lectures is a discussion of numerical methods for the gravitational N-body problem satisfying these three properties. We will conclude this section by estimating an appropriate N for the problem of modelling the large-scale distribution of galaxies in a representative volume of the universe.
We may express the average number of particles within a distance s of another particle directly in terms of an integral of the two-point correlation function, ξ, of the particle distribution:
n_s = 4π n̄ ∫_0^s r² ξ(r) dr,    (10)
where n̄ is the mean particle density. Suppose that we wish to model the cosmic density distribution, ξ(r) ≈ σ₈ (r₀/r)^γ, where γ ≈ 1.8 and r₀ ≈ 5h⁻¹ Mpc. The factor σ₈ parameterises our lack of knowledge of the autocorrelation of the underlying dark matter distribution and assumes that it is a constant multiple of the observed galaxy two-point correlation. We will assume σ₈ ≈ 0.5. We can then determine the total required number of particles by setting the size of the simulation volume (assumed to be cubic) and demanding that we have a sufficient number of particles within the gravitational softening that the fluid will be approximately collisionless, with the softening being determined by the spatial resolution required. From equation (10) we have:
Taking the scaled quantities from left to right we have first an expression of the empirical result that at least 10 particles are needed per gravitational softening to ensure that we approximate a collisionless fluid. The second term sets the dynamic clustering range: here it is set at roughly 5h⁻¹ kpc, enough to resolve galaxies but not enough to distinguish internal properties. The third term ensures a fair sample; empirical results have shown that the imposed boundary begins to influence the simulated correlation length, r₀, if r₀ is more than a few percent of the box size. Thus we would require roughly 10⁹ particles for a simulation of a representative volume of the universe of size 250h⁻¹ Mpc with 5h⁻¹ kpc resolution. It is noteworthy that in 1986 a state-of-the-art simulation had N ≈ 30,000; 12 years later a 10⁹ particle simulation has been run. This represents an increase in particle number by a factor of roughly 30,000. In the same period computer "power" has increased (according to Moore's law of a doubling every 18 months) by 2⁸ = 256. Clearly large simulations now use multiple processor machines (as well as the large amounts of memory they command), but it is clear that substantial algorithmic advances have also been necessary.
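The displayed form of this scaling relation has not survived extraction, but the estimate itself is easy to check numerically. The short sketch below is a hypothetical illustration only: it assumes ξ(r) = σ₈(r₀/r)^1.8 with the values quoted in the text and evaluates equation (10) within the softening length for a trial particle number.

```python
import numpy as np

# Hypothetical check of the particle-number estimate in the text.
# Assumes xi(r) = sigma8 * (r0/r)**gamma; all lengths in h^-1 Mpc.
sigma8, gamma, r0 = 0.5, 1.8, 5.0
L = 250.0            # box size
s = 0.005            # softening, 5 h^-1 kpc
N = 1e9              # trial particle number

nbar = N / L**3      # mean particle density

# Equation (10) for a power-law xi, integrated analytically:
# n_s = 4 pi nbar int_0^s r^2 xi(r) dr
n_s = 4 * np.pi * nbar * sigma8 * r0**gamma * s**(3 - gamma) / (3 - gamma)
print(f"particles within the softening: n_s = {n_s:.1f}")   # comes out near 10
```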
4 The computational framework
These lectures will concentrate on the simulation of large cosmological volumes. A great deal of work has also been done using simulations with vacuum boundary conditions; examples of these simulations are discussed by Quinn (this volume).
A very frequent choice for cosmology is to use boundary conditions which are triply periodic in a cube. Apart from the intrinsic convenience of these boundary conditions, for large enough volumes this may represent a useful approximation to the observed approach
to homogeneity on large scales. It is useful to consider in a little more detail how the assumption of triply periodic boundary conditions corresponds to a universe which is homogeneous on large scales. In a triply periodic system forces satisfy:
F(x + nL) = F(x),    (13)
where n is an integer triple and L is the period (we will assume a periodic cube here, but the argument does not depend upon this). The periodicity immediately implies
∫_{L²} F · dS = 0,    (14)
where the integral is taken over opposite faces of the cube and dS is the outward normal. Using Poisson's equation, the divergence theorem and equation (14), we can write:

4πG ∫_{L³} ρ dV = ∫_{L³} ∇²φ dV = −∫_{L³} ∇ · F dV = −∮_S F · dS = 0,    (15)
where S is taken over all faces of the cube. Thus the total mass in the cube is zero and we are therefore investigating departures from homogeneity as expected. We may compare the two views of the universe:

1. Transforming the fluid equations to comoving coordinates and subtracting the homogeneous part using the field equations gives
∇²φ = 4πG ρ̄ a² δ,    (16)

where

⟨δ⟩ = ⟨(ρ − ρ̄)/ρ̄⟩ = 0.    (17)

The transformation is well-defined because of large-scale homogeneity.
2. Zero mean density in the simulation cube is simply a result of the imposed periodicity. The two views coincide only for L sufficiently large that the small wavenumbers, 2π/L × (1, 2, 3, ...), describe a smooth transition to zero mean value and homogeneity.
4.1 Initial conditions

The aim is to impose a specified spectrum of fluctuations on an initial particle distribution which can then be evolved forward using one of the N-body methods described below.
A straightforward method for displacing particles to represent the required density field is to use the Zel'dovich (1970) map:

x = q − b(t_i) ∇ψ(q),    (18)
where x gives the perturbed particle position, q is the unperturbed coordinate and b is the linear growth factor at the initial time t_i.
Mass conservation implies
The over-density is then

δ = |∂x_i/∂q_j|⁻¹ − 1 ≃ b ∇²ψ,
and thus we can derive the appropriate ψ for equation (18) by solving Poisson's equation. Initial conditions are usually specified in terms of the spectrum of δ, P(k), implying
(here, and throughout, X̂ denotes the Fourier representation of X).
The process is easily carried out computationally using Fourier methods.
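A minimal sketch of this Fourier procedure is given below. It is illustrative only: the grid size, box length and power-law spectrum are arbitrary, normalisation factors and the exact enforcement of Hermitian symmetry are glossed over (the real part of the inverse transform is simply taken), and the growth factor is absorbed into the amplitude of the generated field.

```python
import numpy as np

ng, boxsize, n_index = 32, 100.0, -1.0      # illustrative grid, box and P(k) ~ k^n
rng = np.random.default_rng(1)

k1 = 2 * np.pi * np.fft.fftfreq(ng, d=boxsize / ng)
kx, ky, kz = np.meshgrid(k1, k1, k1, indexing="ij")
k2 = kx**2 + ky**2 + kz**2
k2s = np.where(k2 > 0, k2, 1.0)                     # guard against division by zero
P = np.where(k2 > 0, np.sqrt(k2s)**n_index, 0.0)    # power spectrum P(k)

# Gaussian field: Rayleigh amplitudes with <|delta_k|^2> = P(k), uniform random phases
amp = rng.rayleigh(scale=np.sqrt(P / 2))
phase = rng.uniform(0.0, 2 * np.pi, size=P.shape)
delta_k = amp * np.exp(1j * phase)
delta_k[0, 0, 0] = 0.0                              # zero-mean overdensity

# Zel'dovich displacement field: d_k = i k delta_k / k^2, so that delta = -div(d)
disp = [np.fft.ifftn(1j * kv * delta_k / k2s).real for kv in (kx, ky, kz)]

# Displace particles from the unperturbed grid positions q (equation 18)
q = (np.indices((ng, ng, ng)) + 0.5) * (boxsize / ng)
x = [(qi + di) % boxsize for qi, di in zip(q, disp)]    # periodic wrap
```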
4.2 The unperturbed field
There are many choices for the unperturbed field ρ(q): uniform; random; random in cell; glass. The uniform grid has the advantage that it is initially noise-free and an input spectrum can be very faithfully reproduced in the particle distribution. A potential disadvantage is that small-scale regularity exists which may influence the imposed spectrum as the distribution evolves. It is not clear to what extent this is significant. The impact will likely depend upon the input spectrum and, if power extends to the particle Nyquist scale, discreteness noise rapidly overwhelms any regularity in highly evolved regions. A random initial particle distribution avoids correlations on small scales but typically introduces unacceptably large fluctuations on small scales which limit the range of scales over which the desired spectrum is reproduced. Other sub-random distributions have been used by various authors to try to ameliorate this effect.
A popular choice at present is the use of a glass initial distribution. This is obtained by allowing a near uniform particle distribution to relax under mutually repulsive forces such as negative gravity or SPH pressure forces (Thomas & Couchman 1992). The resulting distribution is smooth on scales of roughly twice the mean interparticle separation (the fluctuation amplitude is roughly 1% on this scale), but is disordered. The two point function of the distribution resembles that of a glass.
4.3 Gaussian initial conditions
Many current inflationary theories predict that the spectrum of fluctuations at the epoch of recombination should be Gaussian. For an ensemble of simulation cubes we can ensure that statistical quantities measured using an ensemble average are Gaussian if ℜ(δ̂_k) and ℑ(δ̂_k)
are independent random variables with zero mean and variance = P(k). Equivalently, if

δ̂_k = |δ̂_k| e^{iφ_k},  such that δ is real,    (24)

then |δ̂_k| must be drawn from a Rayleigh distribution,

p(|δ̂_k|) = (2|δ̂_k|/P(k)) exp(−|δ̂_k|²/P(k)),    (25)

and φ_k is uniformly distributed in [0, 2π). Provided that a sufficient number of waves contribute to the statistic of interest, the Central Limit Theorem will ensure a Gaussian distribution if φ_k is random, and it will have the desired spectrum if ⟨|δ̂_k|²⟩ = P(k).
Usually, few realisations are constructed and we rely on ergodicity to measure averages from a single realisation. (Note that we are not discussing here relaxation to ergodicity under evolution of the system, simply the device of estimating ensemble averages using volume averages. For the purposes and scope of this discussion, these procedures could be carried out in an un-evolved set of linear initial states.) It is unclear if any benefit accrues from having randomly sampled amplitudes (drawn from the Rayleigh distribution, equation 25) on large scales where so few waves contribute that the distribution in a single realisation is a long way from Gaussian.
4.4 N-body methods in cosmology
One of the central algorithmic goals of particle methods in numerical cosmology is, then, to integrate the orbits of N particles under their mutual gravity (possibly including a system of periodic images) as described by the following set of tightly coupled equations:
ẍ_i = −G Σ_{j≠i} Σ_n m_j (x_i − x_{j,n}) / |x_i − x_{j,n}|³,    (26)
where n is an integer triple (in 3-D) labelling an image cell such that x_{j,n} = x_j + nL. We will ignore here, for simplicity, the effect of comoving coordinates: accounting for this introduces a drag ∝ v_i in equation (26).
For convenience, we will divide the popular N-body methods into two broad categories: "accurate" methods which are structured in such a way that the method permits interparticle forces to be computed to machine precision, and "approximate" methods which are typical of those used in cosmology in which force accuracies with RMS errors of 1/2% are commonplace. As noted previously this level of force accuracy is believed to be adequate for investigating the gravitational instability in cosmological simulations. We will investigate the following topics in this subsection.

• "Accurate" methods: (a) Direct methods. (b) Fast multipole method.

• "Approximate" methods: (a) Tree codes. (b) Grid-based codes; the Ewald method for periodic boundary conditions.

• Time-stepping or time integration schemes.
[Figure 2 (schematic) classifies the N-body algorithms used in cosmology, from direct Lagrangian methods (explicit particle-particle summation, Aarseth; O(N²), N up to ~10⁴-10⁵) and tree methods (Barnes-Hut, Appel; O(N ln N)), through the Fast Multipole Method (Greengard; O(N ln N)) and hybrid schemes (Tree-PM/TPM, Xu; P3M; adaptive P3M, Couchman; O(N ln N)), to spatially adaptive nested-grid field methods (HPM, Villumsen; ART, Kravtsov; Hercules, Bryan) and fixed-basis Eulerian field representations (PM, Birdsall, O(N ln N) for FFT or O(N) for multigrid relaxation; SCF, Hernquist).]

Figure 2. Classification of cosmological N-body algorithms.

We note, in passing, that special-purpose hardware such as "GRAPE" exists which implements the direct particle-particle summation in hardware, but this will not be the focus of discussion here. We summarise the various algorithms that have been used for the large-N gravitational particle problem and their relationships in Figure 2. The approximate memory requirements of the popular N-body methods in cosmology are summarised in Table 1. The substantial memory saving of the grid-based methods reflects the simplicity of sorting into, and indexing of, regular grids. As we will see however, this simplicity presents substantial difficulties in achieving a satisfactory data decomposition relative to the flexibility of the tree codes when these methods are adapted for parallel computers.
Method       Storage (words)
P3M          8N
AP3M         10N
Tree, FMM    ~25N-35N

Table 1. Memory requirements of the popular large-N particle methods.

We will treat each of the main methods briefly in turn.
Direct methods

The O(N²) scaling of the direct sum rapidly becomes very restrictive for large N and these techniques have not been widely used in cosmology. It is worth noting, however, that a number of improvements can be made which substantially alter the naive scaling under appropriate conditions. The most significant improvement that can be made is to integrate each particle forward on individual timesteps computed to maintain accuracy: particles travelling at high velocity or experiencing large accelerations need to be evolved with shorter time steps. Most direct N-body integrators employ individual particle timesteps in some manner (e.g., Aarseth 1985).
A further, related, technique is to split the force computation for a given particle into two components: a part coming from near neighbours which it may be expected will vary on a short time-scale, and a relatively smooth long range part which will need only relatively infrequent re-calculation (Ahmad & Cohen 1973). Combining this technique with individual timestepping produces a scheme which is amenable to parallel execution (e.g., Spurzem 1999). These techniques allow N ≈ 10⁵ by reducing the effective scaling relative to the naive O(N²) direct summation. The development of high-flop rate application-specific ICs has also permitted the possibility of performing large-N direct calculations. The GRAPE (GRAvity PipE: Sugimoto et al. 1990, Athanassoula et al. 1998) chips perform the direct summation in hardware and have been used for calculations with N ≈ 10⁶. Direct summation techniques do not have a firm hold in cosmology primarily because the scaling is perceived to be a continued barrier to achieving the very large N calculations which are emphasised, and because highly accurate force calculations or time integrations are not considered relevant to many cosmological investigations.
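For reference, the softened direct summation itself takes only a few lines. The sketch below is a schematic O(N²) illustration with Plummer softening (the softening form, units and parameter values are assumptions, and none of the individual-timestep or Ahmad-Cohen machinery described above is included).

```python
import numpy as np

def direct_accelerations(pos, mass, eps, G=1.0):
    """O(N^2) pairwise accelerations with Plummer softening eps."""
    dx = pos[None, :, :] - pos[:, None, :]            # dx[i, j] = x_j - x_i
    r2 = (dx**2).sum(axis=-1) + eps**2
    np.fill_diagonal(r2, np.inf)                      # remove self-interaction
    inv_r3 = r2**-1.5
    return G * (dx * (mass[None, :, None] * inv_r3[:, :, None])).sum(axis=1)

# Example: 100 unit-mass particles in a unit box
rng = np.random.default_rng(0)
pos = rng.uniform(size=(100, 3))
acc = direct_accelerations(pos, np.ones(100), eps=0.01)
```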
Fast Multipole Method

The technique introduced and popularised by Greengard & Rokhlin (1987) allows an efficient computation of the mutual gravity of N particles to arbitrary accuracy. The method decomposes the particle distribution into a hierarchy of cells which describe the spatial relationships of particles (structure and substructure etc.). The hierarchy is organised into a tree data structure. At each level of the tree a multipole expansion is constructed which describes the density field to a specified level of accuracy.
[Figure 3 (schematic): first, going up the tree, a p-term multipole expansion is computed for each cell as the sum of the shifted expansions from its children; then, going down, local expansions due to non-neighbour cells are computed, shifted and propagated to the children.]

Figure 3. Schematic of the Fast Multipole Algorithm. The sub-divided box at the top right shows the quad-tree (oct-tree in 3-D) subdivision for a hypothetical distribution of four particles. The "tree" (with empty branches pruned) shown to the left of the figure corresponds to the illustrated particle distribution and indicates the basic mechanism of the Fast Multipole method.
The efficiency of the methods relies on the fact that the multipole expansions on larger scales (higher up the tree hierarchy) may be constructed efficiently through the use of shift theorems for the harmonic multipoles from those describing the density field on smaller scales. The result is a hierarchy describing the potential at all levels of structure. Potentials are found at each particle position by an analogous inverse process. Starting at the top of the tree on the largest scales, local expansions are computed due to non-neighbour cells; these expansions are then shifted and propagated to the node's children. The scheme is illustrated in Figure 3. In general terms the method constructs a field describing the particle distribution which is spatially adaptive. As noted, the efficiency arises because of the existence of shift theorems in the space of the basis expansions. (This is analogous to the shifts in Fourier space which make an FFT convolution efficient. In the case of the FFT, of course, the gain in computational efficiency is offset by the uniform sampling and consequent lack of spatial adaptivity.) The precision of the method is controlled by the number of terms included in the expansion. The practical implementation of this method requires care if efficiency is to be maintained for highly clustered particle distributions. Furthermore it is hard to incorporate
individual particle timesteps in the standard technique. The image charges required for periodic boundary conditions are fairly naturally incorporated (note that even though the total mass is non-zero, ∇φ can be computed unambiguously). A number of optimisations and refinements may be found in the literature. The technique is usually claimed to be O(N), which arises from a consideration of the operations count as one traverses the tree shown in Figure 3. However a number of authors, e.g., Aluru et al. (1994), have shown that the technique is, in fact, O(N ln N). This can be understood because the tree (an "oct" tree as illustrated) can have arbitrary depth for particles arbitrarily close together. N-node trees, such as nearest neighbour trees, indeed reduce the cost of the expansion and potential computations to O(N), but the construction of the tree, which involves the spatial sorting of the particle distribution, is probably O(N ln N). It is worth pointing out that the Fast Multipole Method exists in its most efficient practical implementations for particle distributions which are close to uniform. In this case the method is O(N), as are many other techniques to be described. When the particle distribution is fractal and has structure on all scales down to that of the individual particle, the method almost certainly is O(N ln N). It is important to point out, however, that the true complexity of the gravitational N-body problem is unknown. Furthermore the complexity, as a measure of computational effort, cannot be separated from the issue of required force accuracy. These methods have not been widely used in cosmology. The primary reason given has been that the coefficient to the apparent scaling has been believed to be too large to make them competitive. This is not clear however: although the 3-D multipole expansion method is certainly known to require very careful coding to be efficient, recent techniques using Gaussian expansions have shown promise (Pham 1999). It remains to be seen, however, how well these methods perform in the context of general fractal particle distributions.
Tree methods

In this section we will discuss a generic class of approximate algorithms for solving the gravitational N-body problem which use trees of the type described in the previous section to describe the spatial relationships of particles. These methods, however, rely on real-space (or direct) interactions between groups of particles at different levels of the tree rather than expressing the potential using a set of basis functions. In this sense the methods have a close relationship with direct N-body methods, although they are unable to achieve the same level of accuracy without losing efficiency. Very many refinements of these techniques have been discussed in the literature. We will focus here on outlining the basic Barnes-Hut method (Barnes & Hut 1986). The particles are partitioned using a tree to describe the spatial distribution of particles (as for the Fast Multipole method). Groups of particles at various levels are treated as monopoles (or quadrupoles). A group is "opened" (i.e., the tree is descended) if the point at which the force is required is too close to the cell to achieve the desired accuracy. This "opening" is shown explicitly in Figure 4. The accuracy of the scheme is set by the opening angle θ as described in Figure 4. In typical use θ ≈ 0.5-1 radian. If the opening angle (or tolerance) is set to be very small in order to achieve high accuracy, the method becomes extremely inefficient.
Figure 4. Schematic illustration of the operation of the Barnes-Hut tree code for the same particle distribution as in Figure 3. The force calculation for particle i opens cell j and descends to further levels if the angle subtended by the cell exceeds θ, where θ sets the accuracy of the scheme.

Ensemble RMS force errors of order 1/2% are typically achieved for θ ≈ 0.75 and quadrupole cell expansions. Under normal operation, the scaling of the method is O(N ln N). The precise method for setting the angle subtended by a cell is illustrated only schematically in Figure 4 and a number of different criteria have been used in order to avoid pathologies of certain particle distributions.
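The opening test at the heart of the scheme can be sketched as follows. This is a simplified, monopole-only walk over a prebuilt tree, and the cell layout is purely hypothetical; a real code would also exclude the particle's own contribution and normally add quadrupole terms.

```python
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class Cell:
    com: np.ndarray                 # centre of mass of the cell
    mass: float                     # total mass in the cell
    size: float                     # side length of the cell
    children: List["Cell"] = field(default_factory=list)

def bh_accel(cell, x, theta=0.75, eps=0.01, G=1.0):
    """Acceleration at x: open a cell if it subtends too large an angle,
    otherwise accept its monopole (centre-of-mass) contribution."""
    d = cell.com - x
    r = np.sqrt((d * d).sum())
    if cell.children and cell.size / max(r, 1e-12) > theta:    # too close: open the cell
        return sum(bh_accel(c, x, theta, eps, G) for c in cell.children)
    return G * cell.mass * d / (r * r + eps * eps) ** 1.5      # monopole or leaf particle
```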
The method has a number of attractive features and a great deal of effort has been expended in optimising the details of the technique. In particular, the algorithm may be written very compactly indeed as a recursive algorithm, but this does not lead to efficient use of modern hardware. In addition to removing recursion, the tree may be traversed for several particles simultaneously to avoid multiple similar tree walks. Multiple timesteps have been introduced and a large number of refinements made for efficient parallel execution. The Barnes-Hut tree may have an arbitrary number of levels depending upon the particle distribution. N-node trees, such as the nearest-neighbour tree (Appel 1985, Benz et al. 1990), although more expensive to construct than the Barnes-Hut tree, can often be used for several steps and are simpler to modify before a complete rebuild of the tree is necessary. For cosmological investigations periodic boundary conditions can be included fairly readily. This is frequently achieved via table lookup from stored values derived from the Ewald method (see below). These techniques are somewhat cumbersome but improvements have been made by computing high-order expansions of the far-field image distribution. A key issue in cosmology is the difficulty of accurately evolving small perturbations in a nearly homogeneous particle distribution. As we will see in the next subsection, it is in this area that grid-based codes have a distinct advantage. Tree methods are in a state of active development in many fields of computational science and improvements and new technical variations appear frequently. Of interest is a class of codes which combines aspects of the Fast Multipole method with the Barnes-Hut-type tree algorithm (e.g., Dehnen 2000); many of the ideas in these approaches look very promising.
Grid-based codes

Grid methods became established early on as an important technique for solving for the mutual gravity of a large number of particles. The introduction of the Fast Fourier Transform, in particular, led to extensive use of grid-based particle methods in plasma physics which subsequently found widespread acceptance in astrophysics (see, for example, Birdsall & Langdon 1985; Hockney & Eastwood 1991). The standard Particle-Mesh (PM) method operates as follows. The particle distribution is sampled (typically) on a regular mesh and the potential corresponding to the sampled density is computed by one of the standard elliptic solution techniques, frequently via FFT convolution. Forces are then obtained on the grid by differencing this distribution and are finally interpolated back to the particle positions. The method has a number of key features:
• FFT convolutions produce periodic boundary conditions automatically (Σᵢ mᵢ ≡ 0).
• Forces are accurate and the pairwise interaction is well determined and conserves momentum.
• The existence of rapid solution methods for regular grids leads to a very efficient method.
• The force resolution is fixed by the grid size and results in a limited ability to model large density contrasts.

In order to address the limited spatial resolution of the standard Particle-Mesh (PM) method, Hockney & Eastwood (1991) popularised the Particle-Particle-Particle-Mesh (P3M) method. This technique augments the limited spatial force resolution of the standard PM calculation with a direct, short-range, force accumulation from neighbours within roughly 2 grid cells (set by the grid Nyquist), using a pairwise force which, when added to the mesh force, is designed to reproduce a Coulombic (or other desired) interaction to much smaller scales. This technique maintains many of the advantages of the PM method. The timing of P3M (assuming that the number of grid cells L ∝ N^{1/3}) is given approximately as
t ≃ N n_s + β L³ ln L,    (27)
where N is the total number of particles and β is a constant. Short range forces must be accumulated out to a cutoff distance r_c, and the computational effort per particle will thus be proportional to the average number of neighbours of a particle out to the distance r_c, given by n_s = 4π n̄ ∫_0^{r_c} r² ξ(r) dr. The basic features of the scheme are outlined in Figure 5. Whilst the P3M technique has found extensive use in cosmology (e.g., Efstathiou et al. 1985) it is not ideally suited to situations in which large density contrasts develop. In circumstances where ξ(r_c), and hence n_s, become large, the method becomes very inefficient as a very large number of neighbours must be included in the direct sum. The accuracy of the scheme is limited by the mesh errors (aliasing etc.): smaller mesh errors require a smoother mesh force which implies a larger cutoff radius r_c and hence larger n_s, reducing the computational efficiency.
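The mesh (PM) stage that the short-range sum corrects is itself compact. The sketch below is a bare-bones illustration: nearest-grid-point assignment stands in for the usual cloud-in-cell scheme, and the Green's function is the plain −4πG/k² kernel rather than the shaped, optimised kernel described above.

```python
import numpy as np

def pm_potential(pos, mass, ng, boxsize, G=1.0):
    """NGP density assignment followed by an FFT solution of Poisson's equation."""
    cell = boxsize / ng
    idx = np.floor(pos / cell).astype(int) % ng
    rho = np.zeros((ng, ng, ng))
    np.add.at(rho, (idx[:, 0], idx[:, 1], idx[:, 2]), mass / cell**3)

    drho = rho - rho.mean()                      # periodic box: zero-mean source
    k1 = 2 * np.pi * np.fft.fftfreq(ng, d=cell)
    kx, ky, kz = np.meshgrid(k1, k1, k1, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0                            # k = 0 mode is zeroed below

    phi_k = -4 * np.pi * G * np.fft.fftn(drho) / k2
    phi_k[0, 0, 0] = 0.0
    return np.fft.ifftn(phi_k).real              # potential on the mesh

# Mesh forces follow by differencing phi and interpolating back to the particles.
```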
[Figure 5 (schematic): the required total (Coulombic) pairwise force is split into a smooth mesh force F_PM, shaped to optimise accuracy by minimising the force error with respect to the mesh Green's function, plus a short-range pairwise correction applied only within the cutoff radius r_c, beyond which the mesh force alone is used.]

Figure 5. Schematic of the P3M algorithm.

A modification to standard P3M is to replace regions with expensive direct sums (the
PP part) with a further P3M calculation on a finer, localised mesh (Couchman 1991). The "refined" regions can themselves be further refined. The method is illustrated schematically in Figure 6. Figure 6 demonstrates the relationship between P3M and AP3M. More importantly it shows how AP3M behaves like a crude tree code but with coarse resolution blocks. Although this gives only limited control over the data it leads to low storage requirements and maintains the efficiency associated with the potential solution on regular (sub-)grids. Most data operations consist of binning (sorting into cells) and the code generally has a low computation to communication ratio. In computing the force on a particle approximately 150 effective particle interactions are computed, compared with roughly 2000 for a tree code. Significant advantages can be gained by ordering data sequentially to maximise the use of memory hierarchies on modern RISC computer systems.
Figure 6. Schematic force splitting employed in the AP3M algorithm.

In the implementation of AP3M each component of the force is computed and accumulated sequentially, which minimises storage but means that a particle in a dense region may be processed several times as the force on it from each refinement is accumulated. It is worth noting that, within the errors, the forces produced are fully equivalent to those derived from standard P3M. Under the assumption that all the direct-sum, PP, work not done at the first P3M stage is done with one level of refined meshes, simple estimates show that the cost of the AP3M method is roughly twice that of standard P3M under light clustering. In practice it may not be possible to achieve this because of restrictions on the allowable grid sizes and because it is not always easy to cover optimally dense regions of arbitrary geometric distribution. In practice, during execution the code slows by a factor of roughly five as clustering develops. Further efficiencies may be derived by including multiple time steps in the method, probably along the lines of an Ahmad-Cohen scheme, but this idea has not yet been implemented. It is worth making a final further comment on the complexity of the gravitational N-body problem. It is usually stated that the complexity of PM (or P3M under light clustering) is O(N ln N) since the FFT has an operations count L³ ln L and N ∝ L³. It should be noted, however, that efficient elliptic solvers exist which are O(L³): Brandt's (1977) multigrid method for example. It is clear that the O(N ln N) complexity in AP3M arises from the need to have several levels of refinement in cases where the particle distribution is clumpy (fractal). This again highlights the need to determine empirically the scaling of N-body methods under conditions in which a fractal distribution of particles obtains; it is clear that the complexity of the N-body problem is only O(N) under conditions of near uniform particle clustering, a situation of limited interest in cosmology.
Other grid-based methods
A number of alternative techniques to the P3M method have been developed for providing enhanced resolution in regions of high particle density. Several authors have experimented with a hierarchy of grids to achieve high spatial resolution where desired but without adding a direct sum component. These include “Hierarchical PM” (HPM),
Villumsen (1982); "Adaptive Refinement Tree" (ART), Kravtsov et al. (1997); and "Hercules", Bryan & Norman (1995). It is not generally efficient, however, to ensure that twice the local cell size is less than the minimum local particle separation, and this leads to a spatially varying force resolution. It is not clear to what extent this is important. The refined grids in these methods are solved in a variety of ways, some analogous to the sequential sweeps of AP3M. An important philosophical difference with some iterative solvers, multigrid for example, is that the full solution is computed on all mesh points, both of the main grid and any refined grid points, simultaneously. The potential is then differenced to produce the total force which is interpolated back to the particles in a single step. This approach avoids processing the particles many times but requires the simultaneous storage of all levels of the grid hierarchy. The non-Fourier techniques also do not permit the force shaping which is a key component of deriving accurate forces in the P3M method.
A different approach to improving the resolution of standard PM methods has been achieved by marrying PM with Tree codes (e.g., Xu 1995). These hybrids have been motivated by the desirable features of Fourier PM methods for cosmological simulations, in particular the automatically periodic boundary conditions and the ease with which small perturbations in a near uniform distribution may be evolved. The most straightforward approach to an effective algorithm is simply to replace the PP part of the P3M algorithm with a tree-based computation in which the pairwise force is appropriately shaped to match the PM force.
4.5 The Ewald method for periodic boundary conditions
As noted previously, many direct methods such as the tree code require the explicit introduction of images in order to model correctly periodic boundary conditions. The method used almost universally is some variant of the Ewald (1921) method which was originally developed in the context of ionic crystals for finding lattice potentials. The key idea is closely analogous to the force splitting of P3M. The method splits the interaction potential (a Coulombic law softened at small scales) into two parts:
φ(r) = φ_near + φ_far.    (28)
In order to find the contribution at a point from all masses in the fundamental zone and their images, the split is chosen so that the first part is summed directly over roughly 5³ image cells. The second component is slowly converging in real space but is sharply peaked in Fourier space. A similar sum over approximately the 5³ lowest wavemodes allows the total potential to be accumulated and converged to 32-bit precision. The P3M method is equivalent to this process except that:

1. The split is tilted far towards the φ_far sum, because efficient Fourier summation methods are available (FFT), and hence the sum is taken only over a part of the fundamental cell (and no images are included).
2. Fourier values are interpolated from a stored grid of values which implies that mesh errors will be present.
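A common concrete choice for the split in equation (28) screens each mass with a Gaussian, giving a complementary-error-function near part and a rapidly converging Fourier far part. The sketch below evaluates the periodic potential at a single point in this way; the screening parameter, the 5³ cells and modes, and the neglect of the constant self- and background terms are all simplifications made for illustration.

```python
from itertools import product
from math import erfc, exp, pi
import numpy as np

def ewald_potential(x, pos, mass, L, alpha=2.0, nmax=2):
    """Ewald-split potential at x from particles pos in a periodic box of side L."""
    phi = 0.0
    # Near part: erfc-screened direct sum over (2*nmax+1)^3 image cells.
    for n in product(range(-nmax, nmax + 1), repeat=3):
        shift = L * np.array(n, dtype=float)
        for m, p in zip(mass, pos):
            r = np.linalg.norm(x - (p + shift))
            if r > 1e-10:
                phi -= m * erfc(alpha * r / L) / r
    # Far part: smooth remainder summed over the lowest wavemodes.
    for h in product(range(-nmax, nmax + 1), repeat=3):
        if h == (0, 0, 0):
            continue
        k = 2 * pi * np.array(h, dtype=float) / L
        k2 = float(k @ k)
        s = sum(m * np.exp(-1j * (k @ (x - p))) for m, p in zip(mass, pos))
        phi -= (4 * pi / (L**3 * k2)) * exp(-k2 * L**2 / (4 * alpha**2)) * s.real
    return phi
```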
4.6 Timestepping
A number of features of the gravitational N-body problem in cosmology dictate the choice of simple time integration schemes. First, the stochastic nature of particle orbits in a softened many-body bound halo limits the accuracy with which we can or wish to follow particular particle trajectories. Second, we are frequently investigating the collapse of objects in the cosmological setting and are thus modelling an instability. Third, the large particle number frequently dictates that forces have limited accuracy; there is little point using an integration scheme with an accuracy significantly incommensurate with that of the forces. Finally, for a large number of particles it is not feasible to store a large number of intermediate force evaluations. These factors dictate rather limited integration schemes. For pure N-body calculations (no hydrodynamics or shocks, so that f ≠ f(v)), time-centred leapfrog is frequently used. Note that it is simple to include the effective cosmic drag (∝ v) when using comoving coordinates.
This method is simple to programme, is second order in δt and requires no extra storage. It is also symplectic and, although this property is highly desirable in integrations of stable or quasi-stable systems, it is not clear to what extent this property is useful in cosmology. In hydrodynamic simulations a simple fixed-timestep scheme such as the above is not useful: particles may suffer very rapid changes in acceleration and velocity as they pass through shocks, for example. In these cases, adaptive timestepping schemes are commonly used.
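In code the time-centred leapfrog amounts to a handful of lines. The sketch below assumes a user-supplied function accel(x) returning the accelerations (for instance the direct sum sketched earlier) and, as in the text, omits the comoving drag term.

```python
def leapfrog(x, v, accel, dt, nsteps):
    """Time-centred leapfrog with a fixed timestep dt (velocities staggered by dt/2)."""
    v = v + 0.5 * dt * accel(x)           # offset velocities by half a step
    for _ in range(nsteps):
        x = x + dt * v                    # drift
        v = v + dt * accel(x)             # kick with the force at the new positions
    return x, v - 0.5 * dt * accel(x)     # synchronise velocities back to the time of x
```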
Individual timesteps

Cosmological simulations naturally give rise to a wide range of densities and hence characteristic timescales t ∝ ρ^(-1/2). It will frequently be wasteful to integrate all particle orbits forward with the same conservative timestep. Many cosmological tree codes use a simple binary blocking scheme to permit flexibility in the number of particles which are being actively integrated forward at any time (e.g., Hernquist & Katz 1989). Particles are sorted into bins with timestep

Δt_n = Δt_g / 2ⁿ,    (31)

where Δt_g is a global synchronising timestep. Although the tree is rebuilt on the shortest timestep, the force calculations are performed on only a subset of the particles; generally the tree rebuild is fast compared with the force calculation. There have been several suggestions to extend the adaptivity of the tree into the temporal domain to avoid explicit tree rebuilds on the shortest timescales, but this requires care in order to avoid the tree data structure becoming very tangled. In many cases the gain by using individual timesteps is limited for pure N-body, perhaps up to a factor of 3 in speed gain compared with a fixed global timestep. Further,
in parallel implementations great care has to be exercised in order to avoid poor load balance when using individual timesteps. In hydrodynamic simulations the range of natural particle timesteps can vary by very much larger factors and in this case individual timesteps may represent a huge improvement in efficiency.
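The binary blocking of equation (31) reduces, in practice, to choosing for each particle the smallest power-of-two subdivision of the global step that satisfies its own accuracy criterion; a hypothetical helper might look like this.

```python
import numpy as np

def timestep_level(dt_wanted, dt_global, max_level=20):
    """Level n such that dt_global / 2**n <= dt_wanted (cf. equation 31)."""
    n = np.ceil(np.log2(dt_global / np.asarray(dt_wanted))).astype(int)
    return np.clip(n, 0, max_level)

# Example: three particles whose desired steps span two orders of magnitude
levels = timestep_level([0.1, 0.011, 0.0009], dt_global=0.1)
# levels -> [0, 4, 7]; a particle on level n is stepped 2**n times per global step
```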
4.7 Comparison of methods
We conclude this section with a brief comparison of the main methods used for cosmological simulations in Table 2. The direct summation technique, although not widely used in cosmology, is included for reference.

Method   Speed            Storage   Comment
Direct   3000/(10⁻³N)     6+        Straightforward implementation
FMM      600 (N > 10⁴)    25-30     Care needed for 3-D implementations
Tree     1000             25-30     Barnes-Hut-type code; periodic b.c.s will reduce speed
AP3M     8000             10-15     Lack of detailed control over data

Table 2. Comparison of commonly used N-body methods in cosmology. Code speeds are quoted in particles/s on a serial DEC EV5 250MHz processor. Storage requirements are quoted in words/particle and include the 6 words per particle required for storing positions and velocities in 3-D.
5 Parallel implementation of N-body algorithms
The advent of massively parallel computers is providing a huge impetus to many areas in which the realistic simulation of physical systems requires modelling large spatial and temporal scales. This is certainly true of cosmology: within the last two or three years simulations with 3 × 10⁸ and 10⁹ particles have been run on such systems. A simple example will illustrate the necessity of parallel execution. A 10⁸ particle gravity-only cosmological simulation running at 8000 particles/s would require roughly 3 hours per step and 4 months to complete a 1000 step run. This is just marginally possible on a single processor provided it has access to at least 5Gb of RAM. Add more particles or hydrodynamics and it rapidly becomes essential to run in parallel to obtain manageable run times and, frequently, to get access to large aggregate memory. The relatively large communications overhead of particle codes, relative to, say, finite difference codes or much linear algebra, means that a problem of a particular size will scale in speed only over a relatively small range of processor numbers. Cosmological simulation is a good example, however, of Gustafson's modification of Amdahl's law: that although the scaling on a particular problem size may be limited, parallel machines enable one to run larger problems on a greater number of processors in the same time. Behind these large simulations lies a substantial algorithmic effort to make effective use of these new parallel computer architectures. This section considers the basic features of parallel algorithms for both tree and grid codes. Although there are several other
aspects of large-scale simulation which are consuming a great deal of attention (post-processing of large data sets and, indeed, the general issue of the data quantity from large simulations), we will focus on general issues of algorithmic design.
5.1 Parallelising a tree code
A key feature of the tree algorithm is the flexibility it provides in terms of dynamic decomposition of the data across processors. The approach taken by Salmon & Warren (1994) and followed by others (Dubinski 1996, for example) will be described here.

1. Divide the particle distribution amongst the processors by Orthogonal Recursive Bisection. The bisection continues until the distribution has been subdivided into a number of parts equal to the number of processors. The particle distribution may be divided according to particle number or, more usually, according to an estimate of work.
This procedure will split the distribution into 2ⁿ groups, which fits well with the power-of-two processor groups on many parallel machines. The load balance, which is usually estimated initially, can be poor at the outset, but rapidly improves as direct measures of the work become available. Splitting the work in this way can lead to an imbalance of up to a factor of two in particle number across the processors, but this is usually tolerable.

2. The particle distribution local to each processor is then decomposed using a regular Barnes-Hut oct tree.

3. Construct "locally essential trees". It is not necessary for each processor to hold a copy of the full tree: a tree on a "distant" processor will never be opened below a certain level.
4. Proceed with the regular tree calculation fetching data as indicated by the locally essential tree from other processors as required. This algorithm is relatively straightforward to develop for parallel systems. It requires little fundamental change to the structure or philosophy of the serial algorithm. Parallelism can often be introduced as a localised code layer which makes the algorithms robust and portable.
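Step 1 of this procedure, the Orthogonal Recursive Bisection itself, can be sketched in a few lines; here the split is by particle count along the longest axis (a per-particle work estimate could replace the simple count), and the helper is illustrative rather than taken from any particular code.

```python
import numpy as np

def orb_split(pos, index, nproc):
    """Orthogonal Recursive Bisection: return nproc index arrays (nproc a power of two)."""
    if nproc == 1:
        return [index]
    axis = np.ptp(pos[index], axis=0).argmax()         # bisect along the longest extent
    order = index[np.argsort(pos[index, axis])]
    half = len(order) // 2                             # median split by particle count
    return (orb_split(pos, order[:half], nproc // 2) +
            orb_split(pos, order[half:], nproc // 2))

# Example: distribute 10000 particles over 8 domains
rng = np.random.default_rng(0)
pos = rng.uniform(size=(10000, 3))
domains = orb_split(pos, np.arange(len(pos)), nproc=8)
```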
5.2 Parallelising AP3M
At first glance, P3M might seem straightforward to parallelise; the PP part is local and the PM part is localised in Fourier space. However, especially when particle clustering occurs, the ideal distribution of data and work (load balance) becomes hard to achieve and is different in the PM and PP parts of the code. A particular problem is that the simplicity of the regular grids does not lend itself readily to a convenient dynamic distribution of data in the way that is easily achieved in tree algorithms. Two different general approaches to parallelising this method will be discussed: explicit message-passing using MPI and a directive-based approach using either CRAFT or OpenMP.
Directive-based parallelism

Initial attempts to parallelise AP3M used Cray's proprietary directive-based parallel Fortran language, CRAFT, on the T3D. This enabled a simple shared-memory environment on the distributed-memory T3D by constructing in software a global address space. This was a very useful paradigm and provided a useful entry into parallel processing, although the software caused a significant performance degradation: a factor of three on a single processor between code compiled and run with and without CRAFT.
A very useful feature of CRAFT was the provision of the "ATOMIC UPDATE" directive. This acted as a single lock-fetch-increment-store-unlock procedure which provided an easy way to avoid data conflicts as many particles cluster to the same grid points in the PM part of the calculation. Despite acting on single words in memory, this part of the algorithm scaled with processor number almost perfectly even under heavy clustering. The PP part of the algorithm loaded cells and neighbours into local arrays and computed them in round robin fashion over the processor list (Pearce & Couchman 1997). A significant drawback of the CRAFT model was the inability to split the machine into processor groups. This meant that refinements in the AP3M algorithm either had to run on a single processor or over the full machine in parallel. There are typically a wide range of refinements which perform inefficiently in either of these extremes and this represents a serious limit to the possible efficiency. OpenMP is an attempt to develop a standard API for directive-based parallelism on shared-memory machines. It generally permits a very rapid route to production parallelism and incorporates the features of many proprietary vendor-supplied directive-based APIs. Like CRAFT it does not have processor groups. A significant difference is that there is no efficient equivalent of the ATOMIC directive. In many cases this forces the programmer to pay more careful attention to the layout of the data in order to explicitly avoid data conflicts. Such considerations almost always lead to improved performance on typical machines. The PM part of the AP3M algorithm uses a slab decomposition of the simulation cube in one dimension, with data sorted in this direction and addressed sequentially to avoid memory conflicts. Good scaling has been achieved with AP3M on several shared-memory machines such as the SGI Origin 2000, Sun E10000 and Compaq GS320 for moderately-sized problems up to 10⁷ particles (Thacker & Couchman 2000). It is worth noting that achieving optimum serial performance is still an area which should receive detailed attention. Particularly on modern RISC processors the obvious rule of understanding the memory hierarchy and maximising cache use (avoiding indirect addressing and linked lists, for example) can often gain substantial performance improvements.
Message-passing algorithm

In order to maintain the simplicity of the grid-based AP3M algorithm, and to avoid the addition of a complex data addressing scheme, it was decided to attempt a static decomposition of the particle distribution. Since these are cosmological simulations we can use the fact that the matter distribution tends to homogeneity on large scales to achieve a good load balance. It is worth noting that many of the issues which must be explicitly addressed in order to construct an efficient distributed-memory algorithm (the
[Figure 7 shows square blocks of typically 5×5 or 7×7 PP cells.]

Figure 7. The cyclic data decomposition used for large cosmological simulations. The cells are labelled with the processor number.
model usually associated with message-passing), are also relevant for SMP architectures, particularly with the widespread appearance of NUMA (non-uniform memory access) architectures. For fewer than 64 processors it is advantageous to use a slab decomposition as outlined previously. For more than 64 processors, the simulation cube was split into a chequerboard of square cylinders running through a full width of the cube in one dimension. These cells, which cover an integral number of chaining cells, are then distributed cyclicly across the processors. This choice ensures that a reasonable load and data balance is achieved across the machine by averaging over clustering within the simulation volume; moreover it ameliorates the effect of a large surface area-to-volume ratio which would pertain if cubic blocks were used (thus increasing the communications overhead). Figure 7 illustrates this decomposition. The implementation of AP3M was originally coded using Cray's message-passing library shmem; this permits single-sided, asynchronous messaging and has many features in common with MPI-2, to which it has now been ported.
As noted, the static decomposition was chosen for simplicity. The expectation was that refinements, which address inefficiencies due to clustering in the serial code, would in addition help load imbalances in the parallel code. The blocking factor (typically either 5 or 7 PP cells) is chosen to optimise the better averaging which occurs with a greater number of smaller cells, against the lower communications cost (which is related to the surface-area-to-volume ratio) that is achieved with larger cells. The PP neighbour work is done in a fairly standard manner by importing data from each of the neighbouring processors as required, in each of the 4 directions in turn in which off-processor neighbour cells lie (see MacFarland et al. 1998). Since the data are stored in blocks of PP cells a redistribution is required for the PM part of the algorithm. Processors write to local PM mesh sites and to the “ghost” cells required to handle overlap between processors. These local mesh sites are then repartitioned to a slab array distributed across processors for computation of the 3-D FFT convolution in parallel. The process is then inverted and forces are interpolated back to the particles which remain stored in the PP blocks. The mesh repartition is performed as
a free-for-all data exchange. Although this causes observable network congestion on the T3E the effect is very small and it has a negligible effect on the performance or scaling of the PM part, which is very good. Refinements are computed as follows. Refinements are ordered by the amount of work required to solve them, with the largest first. The data is gathered into temporary arrays and the large refinements are computed. When the work required to solve a refinement becomes too small to occupy the full machine efficiently, the machine is subdivided into smaller processor groups and the process continued until all refinements are solved. This process involves significant data movement which works well on the Cray T3E: it is not clear what penalty this will incur on systems with slower networks.
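The chequerboard assignment of Figure 7 can be expressed as a simple cyclic mapping of blocks of PP cells to processors; the block size and processor count below are illustrative, not values from any particular run.

```python
import numpy as np

def cyclic_decomposition(ncells, block, nproc):
    """Assign an ncells x ncells chequerboard of block x block PP-cell groups
    cyclically to processors (cf. Figure 7)."""
    nblock = ncells // block
    owner = np.empty((ncells, ncells), dtype=int)
    for by in range(nblock):
        for bx in range(nblock):
            p = (by * nblock + bx) % nproc          # cycle through the processor list
            owner[by*block:(by+1)*block, bx*block:(bx+1)*block] = p
    return owner

owner = cyclic_decomposition(ncells=70, block=5, nproc=16)
# Each processor ends up owning ~12 scattered blocks, averaging over clustering.
```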
6 Practical simulation
A practical simulation involves choosing various key parameters. We will discuss here the considerations involved in setting these in a typical cosmological simulation.
As before, suppose that we want to model the observed two-point correlation function, ξ(r) ≈ (r₀/r)^γ. Then from equation (12) we have:

where s is the softening. The softening must be small enough that we have good spatial resolution and yet large enough that we approximate a collisionless fluid.
Softening

Ideally, we want a softening such that bound haloes are modelled as collisionless objects. This suggests a softening which is constant in physical coordinates (and hence which shrinks in comoving coordinates). In this case, however, the softening will only be appropriate for haloes virialising at one epoch, since the epoch at which a halo virialises determines its density and hence mean interparticle separation. An alternative would be to use a spatially variable softening which was tailored to ensure that a roughly constant number of particles was maintained within the softening everywhere in the simulation at all times (in a manner analogous to the smoothing used in Smoothed Particle Hydrodynamics). This has generally been perceived as being undesirable as it would involve changes in the binding energy of bound haloes as they move into regions of higher density, although the importance of this has not been quantified.
Timestep

In many codes, the choice of timestep automatically adapts to the velocities and accelerations encountered instantaneously during the evolution. We can estimate the timestep using the following simple argument. The crossing time of a region of density ρ is t_D ≈ (3/8πGρ)^(1/2) and the age of the universe is t_H ≈ (6πGρ̄)^(-1/2). Since ρ/ρ̄ ≈ ξ, t_D/t_H ≈ 1/√ξ. With s = 30h⁻¹ kpc, ξ(s) ≈ 10⁴, and requiring roughly 10 steps per crossing time, this implies roughly 10√ξ ≈ 10³ steps.
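The numbers in this estimate can be verified directly; the values below are simply those quoted in the text.

```python
# Rough check of the step-count estimate (values as quoted in the text).
xi_s = 1.0e4                  # xi at the softening scale s ~ 30 h^-1 kpc
steps_per_crossing = 10
print(steps_per_crossing * xi_s**0.5)   # ~ 10^3 steps, since t_D/t_H ~ 1/sqrt(xi)
```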
Start epoch
Clearly a simulation has to be started well before the first non-linear structures develop. The choice is again something of a compromise. Starting at a later epoch risks serious contamination from the initial Zel'dovich setup procedure, which does not imprint the correct higher-order statistics on the particle distribution. Several expansion factors are required to generate the correct higher-order statistics (Scoccimarro 1998). A late start minimises the difficulty that tree codes have in integrating the quasi-linear particle evolution. The disadvantage of starting at an early epoch is that linear growth of high wavenumber modes is somewhat suppressed. This is generally not of great significance however. Certainly, for grid codes there appears to be little to lose by starting at early epochs. Contemporary high resolution simulations of popular cosmological models will frequently integrate through between 30 and 100 expansion factors.
7 The future
It is clear that numerical simulations are now a central part of our understanding of contemporary cosmology. It is equally clear that there will be a continuing drive towards higher resolution simulations in order to ensure numerical convergence and to better compare with rapidly expanding observational datasets. A simulation which today is a significant undertaking will rapidly become routine. Such a routine simulation might have N = 10⁸, s = 10h⁻¹ kpc, L = 150h⁻¹ Mpc, and require 7000 timesteps. Each output would occupy 2.5Gb per timeslice of storage and 5Gb of RAM to evolve (≈ 10Gb for a tree code). A simulation of this size would take 1 minute per step on 128 833MHz EV6 Alpha processors and would complete in 5 days. Data handling will become an increasingly significant issue and we are likely to see increased use of selective volume renormalisation in which, perhaps, several simultaneous regions of a cube are simulated at high resolution. On the algorithmic side, the effort expended in code development appears unabated. We can expect to see improved time adaptivity both in tree codes and a serious effort to include this in grid codes. Finally, we are already seeing a shift in emphasis from purely collisionless simulations to those incorporating hydrodynamics and we may expect this trend to accelerate.
References

Aarseth S J, 1985, in Multiple Time Scales, eds. J U Brackbill and B I Cohen, Academic Press: Orlando.
Ahmad A and Cohen L, 1973, J Comp Phys 12 389.
Aluru S, Prabhu G M and Gustafson J, 1994, http://ww.scl.ameslab.gov/Publications/N-Body/N-Body.html.
Appel A W, 1985, Siam J Sci Stat Comp 6 85.
Athanassoula E, Bosma A, Lambert J-C and Makino J, 1998, Mon Not R Astron Soc 293 369.
Barnes J E and Hut P, 1986, Nature 324 446.
Benz W, Cameron A G W, Press W H and Bowers R L, 1990, Ap J 348 647.
Birdsall C K and Langdon A B, 1985, Plasma Physics via Computer Simulation, New York: McGraw-Hill.
Brandt A, 1977, Math Comput 31 333.
Bryan G L and Norman M L, 1995, Bull Amer Astron Soc 187 95.04.
Couchman H M P, 1991, Ap J 368 L23.
Dehnen W, 2000, Ap J 536 L39.
Dubinski J, 1996, New Astron 1 133.
Efstathiou G, Davis M, White S D M and Frenk C S, 1985, Ap J Supp 57 241.
Ewald P P, 1921, Ann Physik 64 253.
Greengard L and Rokhlin V, 1987, J Comp Phys 73 325.
Hernquist L and Katz N, 1989, Ap J Supp 70 419.
Hockney R and Eastwood J W, 1981, Computer Simulation Using Particles, Inst of Physics Publishing, Bristol.
Kravtsov A V, Klypin A A and Khokhlov A M, 1997, Ap J Supp 111 73.
MacFarland T, Couchman H M P, Pearce F R and Pichlmeier J, 1998, New Astron 3 687.
Pearce F R and Couchman H M P, 1997, New Astron 2 411.
Peebles P J E, 1993, Principles of Physical Cosmology, Princeton: Princeton University Press.
Pham H H, 1999, Ph D Thesis, Waterloo.
Salmon J K and Warren M S, 1994, J Comp Phys 111 136.
Scoccimarro R, 1998, Mon Not R Astron Soc 299 1097.
Spurzem R, 1999, in Computational Astrophysics, J Comp Appl Math 109 407.
Sugimoto D, Chikada Y, Makino J, Ito T, Ebisuzaki T and Umemura M, 1990, Nature 345 33.
Thacker R J and Couchman H M P, 2000, in preparation.
Thomas P A and Couchman H M P, 1992, MNRAS 257 11.
Villumsen J, 1982, Mon Not R Astr Soc 199 493.
Xu G, 1995, Ap J Supp 98 355.
Zel'dovich Y B, 1970, Astron Ap 5 84.
Periodic orbits of the planar N-body problem with equal masses and all bodies on the same path

Carles Simó
Universidad de Barcelona, Spain
1 Introduction
The differential equations for the planar Newtonian N-body problem are the well known

q̈_i = Σ_{j≠i} G m_j (q_j − q_i) / r_ij³,   i = 1, ..., N,    (1)
where q_i ∈ R² and r_ij = |q_i − q_j|, with |·| being the Euclidean norm. From now on we assume the centre of mass located at the origin, by using a suitable reference system. Units are chosen so that the gravitational constant, G, is equal to 1. While the Newtonian two-body problem is an elementary problem and the three-body problem is a major problem in celestial mechanics, only partially understood, the N-body problem, for N > 3, remains completely unsolved. Only a few solutions are known analytically, related to simple central configurations. In the planar case they give rise to the so-called relative equilibrium solutions, in which all the bodies rotate around the centre of mass, with constant angular velocity, keeping the mutual distances constant. In the simplest of these solutions the N bodies are located at the vertices of a regular N-gon. In the case N = 3 these solutions are the celebrated Lagrange equilateral solutions (Lagrange, 1772), and they exist for any values of the masses m₁, m₂ and m₃. In the general case the existence of a regular N-gon requires all masses equal. From now on the masses will be taken as equal and normalised to m = 1.
Carles Simd
266
1.1
Motivation
The N-gon solutions offer, in particular, another interesting aspect: all bodies move periodically, tracing the same curve on the plane, with a constant time interval between the passage of one body and the next at any particular point. We can assume, without restriction, that the period is 2 7 ~(see later) and, hence, there is a function of time q :R
+ R2, t H q ( t )
(2)
such that the positions of the bodies are given by 9 d t ) = q(t - P.rrk/N),
k = O , l , ...,N-1.
(3)
A natural question is to ask if, beyond the N-gon solutions, there are other solutions of the N-body problem with all masses equal, either planar or spatial (the spatial case being simpler) such that all the bodies move along the same path. We shall confine our study to the planar case.
1.2
Historical note
The first example of these solutions was found numerically by Moore (1993). A rigorous existence proof has been given by Chenciner and Montgomery (2000), making use of variational methods and of the symmetries of the problem. The solution turns out to be a Jigure of eight curve; see Figure 1. A description of the dynamical properties of the figure of eight solution, the %atellites” of the figure of eight, the nearby solutions and bifurcations can be found in Chenciner et al. (2000) and Sim6 (2000, 2001b). Soon after the announcement of the Chenciner-Montgomery result, in December 1999, Gerver found another such solution for N = 4 (Figure 1). At the beginning of 2000 I found a lot of solutions with curves of different shape and values of N ranging from 4 to several hundreds. The name choreographic solutions has been introduced for these solutions, inspired by the motion of the N bodies along the path, as seen in animations (see, for example, h t t p : //www .maia.ub.es/dsg/). The N-gon solutions will be considered as the trivial choreographies. Fixing the values of G and m, still allows us to normalise time. We shall use a normalisation such that the period of the choreographies is 2a. This fixes the levels of energy and angular momentum of the choreography. Note that choreographies which are obtained one from the other by rotation or axial symmetry will be considered as equivalent.
1.3
Other potentials
As will be clear in what follows, it is important to consider a more general problem. Instead of the Newtonian potential in 1/r we can consider other potentials. The homogeneous potentials in l/ra (or even logr) are particularly relevant and we shall make use of them in the subsequent sections. If the potential is of the form
Periodic orbits of the N-body problem on the same path
267
where f ( r ) behaves like l / r a ( a 1 2) as r + 0, the potential is known as a strong-force potential. Strong-force potentials are easy to deal with in order to show the existence of choreographies. For the homogeneous cases we can always scale the period t o 27r.
1.4
Theoretical approach
The first question to study is the existence of choreographies. As the problem is of global nature, i.e. we cannot rely on local or perturbation methods, it is natural to consider a global approach. This is provided by the variational formalism. The problem is rephrased in terms of the action functional. We should look for extremals of the action. The space of paths used to make the action extrema1 must exclude collisions. Otherwise we have solutions of ( 1 ) which fail to satisfy the differential equations a t collisions. We shall introduce some choreography classes as suitable sets where we look for extremals of the action. As mentioned, strong-force potentials will lead to simpler solutions. On the other hand, for non strong-force potentials we fail to obtain existence proofs by variational methods. Only the figure of eight choreography has been proved t o exist for the Newtonian potential, by using these methods. The difficulty lies in the fact that we need a priori estimates of the action, on some ‘‘candidate” choreography, t o show that in minimising the action we do not tend to some path for which collisions occur. Another relevant point is the requirement that all masses be equal. The existence of choreographies with unequal masses is still an open question, but Chenciner (2000) has shown that this is impossible for N 5 5 .
1.5
Numerical methods
Given existence results there remains the problem of how to find choreographies explicitly. Even if we do not have existence proofs, we can try to approximate these solutions. We have to resort to numerical methods. As is customary in celestial mechanics and other areas in dynamical systems, the most efficient method of finding periodic solutions (in this case with period 27r) is to look for initial conditions such that we return to those initial conditions after integration of ( 1 ) for one period. That is, we have to look for the zeros of some function. Newton’s method is suitable to this end. In our case we must also use the fact that the time shift from one particle to the next is 27r/N. Furthermore any solution gives rise to other solutions by shifting the origin of time. Hence, we have to ‘‘normalise” the initial conditions (e.g. by asking that a given particle has y = 0 and > 0 a t t = 0, if (2,y) denote the coordinates in R2.) For convergence of Newton’s method, it is essential t o have a good starting point. We shall deal with this by a numerical implementation of the minimisation of the action. This can give the desired initial conditions for Newton’s method. Another possibility is to use minimisation of the action for a homogeneous strong-force potential. This converges t o a solution of (1). As soon as a sufficiently good approximation is found, we shift to Newton’s method. Then we use continuation with respect t o the exponent of the potential until we reach, if possible, the desired exponent (e.g. a = 1 for the Newtonian case).
268
1.6
Carles Simd
Extensions
It is also interesting to look for relative choreographies. Here “relative” means that they can be seen as choreographies in rotating axes, but not with respect to a fixed reference frame. Hence the particles move on a 2D torus instead of moving on a closed curve. For suitable angular velocity of the rotating axes (i.e. rotation with rational frequency) we can have periodic solutions of the N-body problem. These solutions may or may not be choreographies, depending on whether the different particles follow the same path or not. A given choreography can be seen as a fixed point of a canonical transformation (PoincarC map) on a transverse section with fixed energy. Now it can happen that the differential of the PoincarC map at the fixed point has some elliptic component. Around the given solution we can find satellite solutions moving around the basic solution and closing after IC revolutions. These solutions can be satellite choreographies if all bodies move on the same path. See Chenciner et al. (2000) for further discussion and several examples of relative and satellite choreographies. In what follows we shall concentrate on choreographies for N 2 4. The greater emphasis will be on methods to find these solutions. We believe that these methods can be useful for many other applications to celestial mechanics.
2 2.1
The variational formalism The action functional
Let us denote by K and -U the kinetic and potential energies of the N-body problem. We have 1 N-1 K = - m&12, (5) 2 i=o while U is given by (4) with f ( r ) depending on the chosen potential. Let us introduce the Lagrangian L = K U. It is well known that classical solutions of (1) (i.e. solutions without singularities) must come from extremals of the action A . In particular, to find 27r periodic solutions, we can look for extremals of
+
If we want to find choreographies we must consider functions q j ( t ) as defined by (3) being q ( t ) as in (2). As q is a 27r periodic function the image { q ( t ) , t E [0,27r)} is a closed curve on the plane. That is, the image of a circle S’. In what follows we shall denote such functions q simply as a path. It must be emphasised that the relevant object is not only the curve q(S’), obtained as image S’ by q , but the parameterised curve, that is, the curve and how it is travelled with respect to time.
Periodic orbits of the .%'-body problem on the same path
2.2
269
The functional space
However not all such paths are admissible. The first condition is that the integral in ( 6 ) must be well defined. To this end we require that q belongs to the space Z of functions with square integrable first derivative. On the other hand we must rule out collisions. A collision will occur if there are two values o f t , say tl and tz, such that q(t1) = q ( t 2 )with t l - t 2 = 2 ~ j / NIn . that case bodies numbers 0 and j coincide at the same point when t = t l , and the same happens after time shift 27r/N with bodies 1 and j 1, etc. The subset of functions in Z giving rise to collisions will be denoted as A. Hence we must search for q E 31 \ A.
+
For further use let us study the double points in q ( S ' ) . Let tl and tz be time values such that q(t1) = q ( t 2 ) . Then, going from q ( t l ) to q(t2) by increasing (or decreasing) t , we have a loop associated to the double point. Let us measure the time length of the loop, that is, It2 - tlI taking 27r/N as the unit of measure. Let e = It2 - tll/(27r/N).Then we say that the length of the loop is e. Of course, we can also take N - e instead of e (this depends of which sense we use to go from q ( t l ) to q ( t 2 ) ) . For concreteness, we shall assume from now on (except where otherwise stated) that we take as length of the loop associated with the double point (or, simply, length of the double point) the minimum of 1 and N - e. We shall denote again the minimum length as 1. We use the symbol 2 to denote the integer part of e and refer to 2 as the integer length associated to the double point. In the exceptional case that a point has multiplicity higher than 2 , then several lengths can be associated to that point in a natural way. It is clear that if 1 = 2 then q E A. Hence, to avoid collisions we should restrict the admissible paths to those such that the lengths of the double points are not integers. Note that double points with 2 = 0 are non-essential. They can be removed (or introduced) by a deformation of the parameterisation { q ( t ) } without passing through a double point of integer length during the deformation. Given q, it is possible that by deformation we obtain new paths having a tangency (or loosing it). This is allowed provided that the tangency point does not have an integer value of 1.
2.3
Choreographic classes
This defines a partition of Z \ A in connected components, that will be denoted as choreographic classes. In other words: two admissible parameterised closed curves { q(O)(t)}and { q ( ' ) ( t ) } belong to the same class if there exists a one-parameter family of parameterisations, { q ( ' ) ( t ) ) , such that no double points of integer length exist for q(') for any E E [0,1] and it reduces to {q(O)(t)}for E = 0 and to { q ( l ) ( t ) }for E = 1. Hence, both q ( O ) and q(') have the same topology (apart from small loops created by non-essential double points and allowed tangencies) and the integer lengths of the corresponding double points are the same. The boundary of a class consists of paths having double points of integer length.
Carles Sim6
270
2.4
Results
It is clear that if we multiply a curve { q ( t ) } by a factor p then the kinetic energy is multiplied by pz and the potential energy by p-’ (or by p-a if we have f ( r ) = l / r a ) .This prevents p from going to 03 or to 0 and sets bounds on the size of the choreographies (with prescribed masses and period). The same reason shows that the extremals of A cannot be maxima. Therefore, only minima or saddles are possible. Thus the idea is t o take an initial function q belonging t o the desired class and then start a minimisation process. Certainly, in this way the probability of reaching a saddle is zero; we shall return to this point later. To minimise A we can follow (theoretically) the gradient flow of A in q in the sense which decreases A . The problem is that we cannot exclude the probability that this gradient flow tends t o a collision. Indeed, if f ( r ) = l / r , assume that on q ( t ) there is a collision. At collision both K and U become unbounded. Let r,in be the minimum distance between two of the N bodies. By shifting the origin of time we can assume that the collision occurs at t = 0. When t --t 0 we have r,in --t 0. But we know that at a collision of the Newtonian N-body problem r,in behaves as lt12/3.Furthermore the functions K and U differ in the total energy of the system ( K - U = h ) . Hence, the contribution of a collision to the integral in (6) is finite. There is nothing against having a local minimum of A for a q having collisions: see, however, Sim6 (2000) for some constraints that must be satisfied by the shape of q near collision in order that a q minimising A ( q ) contains a collision. This is impossible for the strong-force cases. Indeed, if a = 2 then r,in behaves as
Itll/’. Then U behaves, locally, as t-’ and we have divergence of A . This fact was realised by PoincarC (1896) who was able to prove existence of periodic solutions of the N-body problem for strong-force cases using variational methods. Using the ideas sketched here the following is proved in Chenciner et al. (2000).
Theorem 1 Given any choreographic class, there is a periodic solution of any planar strong-force N-body problem which belongs to this class.
Remark 1 Note that nothing is said about the uniqueness of the solution. I n fact the numerical results show that there are classes with more than one solution, at least i n the Newtonian case and also for all values of ‘a’ close to 1.
2.5
Symmetries
The simplest choreographies (see Figures 1, 2 and 4) have one or several symmetries. It is easy to take this into account both in the variational formulation and in the effective computations. For instance, in Figure 1 the number of essential initial conditions to determine the solution in the three examples displayed is 2, 3 and 6, respectively. However we shall not use any such reduction in the effective computations. Choreographies without any symmetry have been found starting with N = 6; see Figure 3, case 1.
Periodic orbits of the N-body problem on the same path
2.6
271
Linear chains
Some choreographies present a very simple structure. They are symmetric with respect to the z axis and all double points are on the z axis. Furthermore, if q(0) is the rightmost point on q(S’) then the first component, z ( t ) , of q is decreasing from t = 0 to t = 7r and, hence, has a single minimum and a single maximum. These choreographies will be denoted as linear chains: examples are Figure 1 (all cases), Figure 2 (cases 1 and 2 ) , Figure 3 (case 3 ) , and Figure 4 (cases 1 to 4). All these examples are characterised by the increasing sequence of integer lengths of the double points (taking the suitable determination, by choosing e or N - e as appropriate). A simple count (Sim6, 2000) shows that the number of possible linear chains increases exponentially with N . All of them exist for strong-force potentials. It seems reasonable to conjecture that they also exist for the Newtonian potential. We now name the above examples as direct linear chains and we shall use the name folded linear chains if the number of extrema is greater than two. This is what happens in Figure 2 cases 3-6: note that one has to rotate case 5 by ~ / 2 All . of them can be seen as a linear chain with some loops “folded”: the left half loop in case 3, the left loop in case 4, the extreme loops folded inside in case 5 , one and one half loops in case 6. This is also true in several cases in Figure 3. Note that in Figure 2 case 6 we have 4 double points on the z axis and only 4 bodies. This seems impossible for direct linear chains.
2.7
Open questions
At this point it is convenient to summarise some open questions, without being exhaustive. The most important question is to characterise which choreographic classes are realised for the Newtonian potential. Which folded linear chains can be realised? How should they be characterised? Find objections to the existence of solutions inside some class for the Newtonian potential. What is the behaviour of a choreography which exists for strong-force potentials, but fails to exist for the Newtonian one, when we continuously change the exponent in the potential? Up to what values of the exponent can the choreographies be continued?
.
To study, for general choreographies, whether equal masses are necessary for one such periodic solution: see the Section 1.4 above. Other related questions refer to relative or satellite choreographies.
The greatest difficulty in answering these questions is that the use of variational methods relies on avoiding collisions. This requires “good enough” candidates for a minimum of the action and a priori estimates to rule out the existence of collisions. However, the estimates depend strongly on the unknown details of the shape of the curve and the time parameterisation.
Carles Sim6
2 72
-0.5
-0.5
-0.5 0
1
1
0
1
1
1
1
0
Figure 1. The j r s t choreographies found, with 3, 4 and 5 bodies, respectively. The dots show the position of the bodies at some initial time.
1 1)
0.5
0.5
0
O.1: -0.5 -1
- :0 -05
.
1
-0.5
i i -1
0
1
0
1
1
1
0 -0 0 '55 :
1
m: 0
1
0
1
1 0.5
1
1
0
-0.5
0
1
1
1 -1
1
0
1
Figure 2. Choreographies for four bodies under the Newtonian potential. Note that there is another choreography, also with N = 4, very similar to case number 5.
o,::J3hq
-0.5 .
-1
1
J 1
0
1
-1
I 1
0
1
-2
-1
0
1
2
Figure 3. Case 1: A choreography without any geometrical symmetry for N = 6 . Case 2: Idem for N = 7 . Case 3: A circle-like choreography with a small outer loop.
Periodic orbits of the N-body problem on the same path 1 ,
0.5
0 -0.5 -1
1
-1
1
1
273
1 i -1
1
0
1
1
0
1
4)
1
0
1
0.5
0 -0.5 1
1
0
0
1
1
71
0.5
1
0
1
9)
I
0 -0.5 1
0
I 0.5
1
1
10)
I
0
1
0
1
1
1 11)
.
0.5 0 -0.5
-0.5
1
1
0
1
1
0
1
1
1
0
0.5 0.5
O ' l K
-0.5
0
-0.5
0
0
-0.5
-0.5 1
0
-0.5 0
1
-1
1
0
-0.5 0
1
0
1
Figure 4. Choreographies found for 5 bodies. The dots denote initial conditions. Note that all of them have, at least, one (geometrical) symmetry.
Carles Sim6
274
3
Numeric variational methods
We pass to the effective computation of choreographies. The method can be quite different according to whether or not we have approximate initial conditions (i.e. the values q(2.rrjlN) and 4 ( 2 ~ j / N for ) j = 0,1,. . . , N - 1). In the first instance we assume that no information on the choreography is available-only a desired topology and a rough idea of the values o f t associated with the double points which make the parameterised curve admissible.
3.1
A representation problem
To look for q is not a numerical problem: we want to find a function. This must be reduced to a numerical one by representing an approximation of the function by a finite set of numbers. As the components ( x , y ) of q are 2r-periodic functions, a convenient representation for the components ( 2 ,Q) of is by means of a truncated Fourier series (trigonometrical polynomials)
+
c(ap’ kt + K
2(t)x
cos
K
b p ) sin k t ) ,
k=l
Q(t) z
c(at’cos k t + b t ) sin k t ) ,
(7)
k=l
It is easy to take into account symmetries in ( 7 ) . For instance, (direct) linear chains can be parameterised in such a way that x contains only cosine terms and y contains only sine terms. Other representations can be useful.
Remark 2 Due to the centre of mass condition, a choreography with N bodies cannot have harmonics of any order which is a multiple of N in the ( x , y ) components. Note that it is not necessary to impose this condition a priori. The variational method will give these coefficients as zero.
3.2
The starting point
We need some initial guess of the form (7) to start with a minimisation process. For instance, one can start with a handmade sketch of the curve defined with the mouse on a graphical screen and some indications on timing. The values of (2,y ) on the marked points can be converted to equally spaced data with respect to time (by interpolation). Then a Fourier analysis (and some filter to skip irregularities of hand drawing) is used to get the initial approximation. With these data the position of the centre of mass will be not constant, in general; however, as said before, this is irrelevant for the initial data. Any method we use will take care of this and lead to a constant position of the centre of mass, which can then be shifted to the origin by a translation. Alternatively we can omit the coefficients of the constant part of 2 and jj from the plot giving rise to our initial guess.
Periodic orbits of the N-body problem on the same path
3.3
275
Minimising the action
For a numerical implementation of the variational method the simplest approach is to mimic the ideas of the proof. We want to approximate the function q by a function 4^; for example by using trigonometric polynomials as in (7). Alternatively we can use the values at a set of equally spaced points--or some other method. Let us denote as Q E RM the finite set of parameters (Fourier coefficients, coordinates of points, etc. ) needed to represent The action A as defined in (6) is approximated by a discretised map
c.
where the integral is computed by a numerical quadrature formula. The time derivatives are computed by differentiation of the trigonometric polynomial, or by finite differences (which introduce an additional error in A ) , etc. Due to the time shift appearing in (3), which must be preserved by the discretisation, it is sufficient to integrate from 0 to 2 7 ~ / N . Taking into account that collision-free solutions are analytical, the trapezoid rule is highly convenient if trigonometric polynomials are used; in that case any time step can be used to compute the integral. On the other hand, for the case of a discrete set of points we can use only the data at the prescribed times for which we know approximations of q. Using trigonometric polynomials one must compute the components of i j ( t j ) , for the required t j , before evaluation of 2, but this can be done quickly by using FFT methods. We then proceed to minimise bination of variants.
3.4
A(Q) using a gradient method, some variant or a com-
Advantages of the method It is quite robust. It allows very rough starting values, before, and leads to some local minimum of A^.
e(*),obtained as described
It is easy to program and can be used with any potential.
As we have an explicit formula for the discretised action A^, the gradient can be obtained from Q without the need of numerical differentiation. When an approximate minimum is reached we can do checks, such as looking for the invariance of the energy at different time steps. We can also compute the residual acceleration: the difference between the acceleration of the masses computed from i j (by differentiating twice) and using the equations of motion (1). This gives a clear indication of the goodness of the solution found.
3.5
Difficulties of the method Other than in quite simple cases M has to be chosen large-especially if there are passages close to collision. In this case there are sudden changes of q ( t ) and this means that the number of harmonics must be large or that the number of equally spaced (in time) points must be very large. Typical values of M are in the range [lo3,lo4]. It is possible to use different spacings in time in different parts of the curve but this adds some complexity to the algorithm.
2 76
Carles Sim6 The function A^ is very flat, possibly with lots of extrema. The number of iterations to achieve a good approximation is also large (say, again in [lo3,lo‘]). This slows down the process. It can happen that jumps from one choreographic class to another when the minimisation process leads to the neighbourhood of a collision. The variational technique does not provide direct information on the stability properties of the periodic solutions. One should be aware, when looking a t the residual acceleration, that a bad determination of the higher order harmonics can have a mild influence on the potential, a medium size influence on the kinetic energy and a large influence on the acceleration This implies that, after obtaining an approximation, it is suitable to look a t the contribution of the highest order harmonics in ( 7 ) and to increase the number of harmonics to obtain a better approximation. Working with an increasing number of harmonics can also give significant savings in computing time.
8.
The difficulties related to the number of harmonics and to changes of class are much less when the potential is a strong-force. Then one cannot come too close to a collision. This suggests (see later) that it can be faster to proceed t o the computation of the approximate solution with a = 2 (even if one is interested in a = l), then refine the solution by Newton’s method and use continuation to go back to a = 1, (if this is possible).
3.6
Looking for saddle points of the action
Up t o now we have considered only minima as extremals of the action. The outstanding question is how to find saddle points. As before we are interested in obtaining approximations by using a variational method and then using Newton’s method for further refinement. Newton’s method will not care whether the choreography is a minimum or a saddle of the action. Assume we have two different local minima of A on the same choreographic class. The so-called “mountain pass lemma” ensures that there is an arc (in the functional space X \ A) joining both minima and such that the point of the arc on which the action has a maximum (along the arc) is a saddle point of the action. The intuitive idea is quite “geographical”. We can implement this idea numerically. Let us consider two approximate minima for s E [0,1], be an arc joining $O) and # I ) . We of the action #O) and #‘I. Let can take a segment in the space of Fourier coefficients, for instance, but we should check that the elements of the arc do not leave the choreography class. (This can need more sophisticated arcs, like some polygon in the space of approximations). Then we can subdivide the arc in a prescribed number of points; for example they can be equally spaced with s, = i / L where i = 1 , .. . , L - 1. Starting at each one of these points we can minimise the action. After every step of minimisation of all the points in the arc, we can re-examine the distance between those points. Those points close to one of the minima will approach it, so they can be discarded. Intermediate points may increase their distance so that new points have to be added (by interpolation).
Periodic orbits of the IV-body problem on the same path
277
At every step we also check for the maximum of the action along the arc. The process finishes when in two (or more) consecutive steps the value of this maximum remains essentially constant. This gives the saddle point. The process is time-consuming. It is very convenient to implement the calculation on an array of processors, so that every processor deals with one (or a few) of the points on the arc.
4
Zero-finding methods
Let pt denote the flow time t of the differential equation (l),with the natural modifications in cases where we are dealing with a non-Newtonian potential. Let us denote by w, the vector containing the components of q, and q, at t = 0, (4 components in the planar case). Let w the vector containing all the wj components. The principle adopted consists of searching for some w such that
q w ) := CpZ,(W) - w = 0.
(9)
In this section we discuss methods of solving (9), the reductions to be done, the difficulties which appear and how to overcome them. Finally we comment on how to turn the numerical determination of these periodic orbits into a computer assisted rigorous proof.
4.1 Fundamentals To solve (9) we recall, as is well known, that the solution, if it exists, is not unique. Given any vector w of initial conditions then p t ( w ) gives another set of initial conditions. To single out one of them we can do the following: Consider the function q and look for a value o f t * such that Iq(t*)l is maximal. By replacing t by t-t* and rotating the reference frame we can always assume that, at t = 0, q(0) is on the z axis and q(0) is orthogonal to q(0). That is, z(0) is at maximal distance from the origin, y(0) = 0, i ( 0 ) = 0 and Y(0) 2 0. It is possible to reduce the dimension of the system to be integrated by using the centre of mass condition. I have not used this reduction: instead the preservation of the first integrals has been used as a test. After all, faced to a problem of, say, 100 bodies, the reduction of the size of the system from 400 to 396 is not very important. As mentioned in the variational part, it is not necessary to use pzr. The choreographic character of the solution allows us to replace (9) by
where o(w)= ( w N - ~ ,W O , w1,. . . , w N - ~ is) ~obtained from w by (right) cyclic permutation. This saves computations and errors. We shall represent o by a permutation matrix with the same name. To solve (10) we use Newton's method. This requires evaluation of ( P Z r / N (w), numerical integration of (1) and the computation of the differential D G ( w ) = D p ~ , / ~ ( w ) - o .
Carles Sim6
278
4.2
Numerical integration of ODE
It is preferable to obtain solutions of (1) with small error. A convenient procedure is to use Taylor’s method. For simplicity of notation, and assuming no collisions, let us consider the general autonomous, analytic differential equation U! = F ( w ) .
(11)
Assuming w ( t ) known, we want to compute the coefficients of the expansion
~ (+ ht ) = ~ ( t+)
1akhk.
k>l
We have to select a suitable value of h and decide on the number of terms to be kept in order that the relative errors are of the order of magnitude of the rounding errors with the current accuracy (or tolerance) used. Using this scheme the computations are done in the most efficient way. The algorithm producing the components of F ( w ) from the components of w uses, in general, the arithmetic operations, logarithms and exponentials (eventually trigonometric functions). The expansions for all the intermediate variables can be obtained recurrently by repeated application of Leibniz rule. Note that it is essential to produce the numeric coefficients in the expansions rather than long analytic expressions for each of them. It . is obvious than the computational cost to reach some given order, K , is O ( K Z ) The appendix in Sim6 (2001a) contains some details and a proof about the optimal value of h (from the point of view of efficiency); h has to be taken equal to the current radius of convergence of (12) divided by e’. The current radius of convergence can be estimated from the experimental behaviour of the coefficients in (12). This also gives a suitable order a t which to truncate the series. For small enough tolerances a “typical” optimal order is Kept FZ log(l/&)/2, where E denotes the current tolerance. In Jorba and Zou (2001) there is an explicit procedure for the automatic implementation of Taylor’s method for arbitrary analytic F . Note that the method is the most suitable for the case where we are interested in very small rounding errors (say 10-’Oo, leading to Kept x 115). This will be relevant later on.
4.3
Variational equations and stability
To compute D G ( w ) ,which is required for the successive Newton iterates W ( m + l ) = ,(m)
-(DG(~(~)))-~G(~(”)),
we must integrate the first order variational equations associated with (ll), d
-dta u ( O ) ( P t ( w ) ) =
D,F(cpt(w(O)))D,(o)(Pt(w(O))
(13)
subject to the initial condition D,(o)cpo(w(0))= I . Furthermore, having computed the monodromy matrix Dw(o,cpzn(w(0))we can study the stability of the periodic solution is )available, ) but we found so far. Note that only the “partial matrix” D , p ) c p ~ ~ , ~ ( w ( O
Periodic orbits of the N-body problem on the same path
279
can recover easily the full period matrix by composing with copies of the same matrix shifting, successively, rows and columns by CT,g 2 , .. . etc. We remark that (13) has to be integrated simultaneously with (11). Taylor’s method is again suitable for this integration. In the choreography problem and for large N the integration of the variational equations can be time-consuming. For 100 bodies we need to integrate, in all, around 160,000 equations. However, the columns in (13) evolve independently. This means that the task can be distributed to different processors. Each one of them can compute the contribution to (13) of a few columns. Although all processors have to integrate (ll),this results in important savings in computer time.
For further use we also display here the second order variational equation d
-dDt w ( O ) , w ( o ) P t ( ~ ( O ) )
= ~wwF(cpt(.w(O)))(V V, )+ ~
w
~
~
(
P
t
~
~
~
~
~
~
~
(14) ~ w
where V stands for Dw(0)pt(w(O));we must take D w ( o ) , w ( o ) ~ o ( w = ( 0 )0,) as the initial condition.
4.4
Parallel shooting
One of the difficulties in finding same kinds of choreographies, especially if N becomes large, is the strong instability they show. Dominant eigenvalues of the monodromy matrix easily reach values like 1O1O0 and larger. This means that initial (unavoidable) errors in the starting point or the errors introduced by the numerical methods increase in such a way that the computed orbit goes away from the true one. But this is a common problem in boundary value problems and there are simple ways to prevent the growth of the errors. The most popular one is the parallel shooting method (also known as multiple shooting). See, for example, Stoer and Bulirsch (1983). For the moment we forget that we are interested in choreographies and assume that we are faced with the problem of finding a 2r-periodic solution of an equation like (ll), (still autonomous, but there is no need of that). We saw before that this can be solved by looking for zeros of (9). To cope with the strong instability we can introduce P-1 auxiliary intermediate values oft: 0 < tl < tz < . . . < t p - 1 < 2r, either equally spaced or not. Instead of looking for an initial value w(0) we try to find also the values of w at the intermediate epochs w(tl), w(t2), . . . , w(tp-1) and impose the “matching and closing” conditions:
Ptp-
1- t p - p
(20 ( t P - 2 ) )
(P211-tp-l(w(tP-l))
= =
(b-1) w(0).
I
As the time intervals are now shorter, the partial instabilities are less dramatic than in the previous approach. Again (15) can be solved by Newton’s method. The variational matrices associated to qt,-t,-l are required. The system has larger dimension (if w E R” instead of dimension
(
o
,
,
280
Carles Simd
n we shall have nP ) . But the numerical errors in computing p t , - t , - l are much less than before and the condition number of the linearised system is smaller. Readers familiar with topics in dynamical systems can note that this idea is similar t o the use of pseudo-orbits to obtain true orbits in a hyperbolic system by using the shadowing lemma. Readers familiar with spacecraft mission design will realise that the closing errors in the matching when we stop Newton's iterations play the same role as small mid course manoeuvres applied to correct errors coming from model, tracking and execution of previous manoeuvres. In the present problem one must modify slightly the conditions in (15). The values , belong to (0,27r/N) instead of (0,27r). The intermediate conditions have identical form, but the last condition should read
tl,. . . tp-1
'&IN-tp-1
(w(tP-1)) w ( 2 7 r / N = ) aw(O).
The unknowns are the components of w(O), w(tl), . . . , w(tp-I).
4.5
Continuation with respect to the exponent of the potential
The choreographies obtained by the previous procedures can be continued with respect to the exponent a. To this end we only need take into account that a can be viewed as a parameter in any one of the above formulations. Using the version of (10) we have the condition G ( w , a ) = 0. If w E R" it is convenient to consider (w,a) as a new variable z E R"+l and not to make a distinction between the components of z .
G is then a map from an open set in Rn+' to R". If rank D,G is maximal (equal to n) then G ( z ) = 0 defines locally a curve and we are at a regular point. The tangent to the curve is given by ker D,G. Otherwise we are in presence of bifurcations. Note that a "turning point" along the curve where a reaches a local extremum is, in general, a regular point. As the tangent to the curve is available at every regular point, we can set up a differential equation for the curve. A convenient parameter is the arc length. From time to time it is convenient to redefine the point to prevent the values of G ( z )making over-large deviations from zero. This can be done by using a modijed Newton's method where no variable is privileged (see Sim6 1990). The only additional thing we need are the derivatives of G with respect to a. To obtain them one has to add an extra variational equation which accounts for the variation of the flow with respect to the parameter.
4.6
Computer assisted proofs
An interesting question (and not only for the present problem) is how to turn the numerical computations into mathematical proofs. This aspect has not yet been implemented; however, we proceed with the description of an algorithm able to lead to a complete computer assisted proof of the existence of choreographies for the Newtonian potential and different values of N . The first idea is to have some "explicit" version of the implicit function theorem. Different possibilities are available. For our purposes it is convenient to state the following
Periodic orbits of the N-body problem on the same path
281
version of the Newton-Kantorovich theorem; see Isaacson and Keller (1966), Stoer and Bulirsch (1983) for different versions. Assume that we want to solve the equation G ( w )= 0 by Newton's method starting at some initial point WO. Then
Theorem 2 Assume G is a C2 function on a ball B of radius 2 a around a point the following holds: e
W O and
The first Newton's correction -(DG(wo))-lG(wo)has norm bounded b y a . The norm of (DG(wo))-' is bounded b y p. On B the norm of D2G is bounded b y y.
Then if a& < 112 all the iterates of Newton's method fall in 23, there is a unique solution of G ( w ) = 0 in B and the convergence is quadratic. If aPy = 112 one has still a unique solution and convergence, but the convergence is only linear. Note that the theorem is valid in Banach spaces (no need of finite dimension) and that other versions require only that G should have a Lipschitz first derivative. To apply this theorem as an existence proof of choreographies we take the function
G as the reduced version appearing in (10) or, even more convenient, the modification introduced in (15) by using some intermediate time values for the parallel shooting. For concreteness let us denote as G * ( W ) = 0 the current system of equations. We proceed with the following steps: 1. Find a good estimate of the solution that we take as WO.To this end it is suitable to work with high accuracy (say, 100 decimal digits).
2. Starting at WOwe integrate both the equations of motion and the first order variations using interval arithmetic (based on the 100 decimal digits arithmetic used previously, for instance). We end up with G*(Wo)and DG*(Wo). Let b be a bound of the components of G*(Wo) (now given as intervals). Let /? be a bound derived from the interval containing the norm of (DG*(Wo))-'. The value of p is the one required by the theorem and 6p is a bound for a.
3. The more delicate step is the estimate of y. Values of the norm of D2G*(Wo))can be obtained by again using interval arithmetic in the integration of (14), but what is needed now is a bound on a ball around WO. To shorten notation in (14) we shall skip the subscripts which refer to the variable used for the differentials. Instead of D,, we shall use D2, for instance. The meaning and the variables used become clear from the context. 4. Assume we start the integration of (14) not at WO but at a point W1 E B. The norm o f t := Wl - WO is bounded by 2a. Let c ( t ) be the supremum of the norm of D2pt(Wl)for W1 E B. Then c ( t ) satisfies the differential inequality
d
-44 dt
I SUP IlD2F(77)11(SUPIIDvt(W0) + c(t)(5)11)2+SUP IIDF(77)IIc(t)1
where
c(0) = 0 ,
1
77 = vt(W0)+ DvtW0)E + #)(E,"3
and the supremum in (16) is taken with respect to all E with
llEll i 2au.
(16)
282
CarJes Sim6
5. We bound 117 - (pt(W0)llby IIDqt(Wo)112a+c(t)2a2.This defines a tubular neighbourhood around (pt(W0)where we have to bound llD2Fll and IlDFll on that neighbourhood. The bounds are rather simple because they rely, essentially, in bounds of the mutual distances ri3 between bodies. It is clear that the maximum radius of the neighbourhood goes to zero when cy + 0.
6. We proceed to the estimates by trial and error. Assume that the maximum radius of the tubular neighbourhood is bounded by p and let ml, m2 be upper bounds of IlDFll and IID2FII, respectively. Note that estimates of ml, m2 when p = 0 are easy to obtain. Solve the equality of equation (16) (using interval arithmetic) by using these bounds. Denote by F(t) the variable which will be a bound for c(t).This gives d -E(t) = m2(llD(pt(Wo)ll pE(t))’ mlE(t), E(O) = 0. (17) dt
+
+
7. With the value of c(t) attained at the end of the time interval, check if the assumption on p is satisfied. This has to be true for all time intervals used for the parallel shooting. If it holds, let E* be the maximum of the bounds on E ( t ) at the end of the intervals. This is an upper bound for y. Finally check the condition aPr 5 112. If the assumption on p is not satisfied we can try different values for p. If none of them works we have to decrease a. The same happens if the condition on p holds but cyPy > 1 / 2 . To try to reduce a we can improve the estimate on WOby working with interval arithmetics with higher accuracy. We conclude that if we are not in presence of some bifurcation (i.e., if (DG*(Wo))-’has bounded norm) and we are able to produce estimates of WOwith increasing accuracy, we will obtain a complete proof of the existence of the choreography.
5
A sample of choreographies
Figures 2 and 4 display several examples of Newtonian choreographies, for 4 bodies, 5 bodies respectively. All these choreographies have at least an axial symmetry. In Figure 3 (cases 1 and 2) we show examples of Newtonian choreographies without any symmetry. The lowest value of N for such examples has been found to be N = 6. Figure 3 (case 3) illustrates a different phenomenon. It contains 11 bodies and has a small loop with integer length = 1. Figure 2 (case 1) and Figure 4 (case 1) show similar choreographies with N = 4 and N = 5 respectively. In fact no problem has been found in finding this type of solution (small outer loop) for all values of N up to 100. However, we had no success when trying a similar pattern but with the small loop (of integer length 1) inside the large loop. Hence, it seems that not all the possible choreographies are realised for the Newtonian potential. On the other hand there is no problem in finding N-body choreographies of circle-like type with a small loop inside for a = 2. Using continuation with respect to a it is possible to proceed up to values of a close to a = 1. But, before reaching a = 1, a turning point appears and, along the family of choreographies of these type, a increases from the turning point on. Other turning points can appear later. If we denote by a~ the value of a at
Periodic orbits of the N-body problem on the same path
283
which the turning point appears the numerical experiments carried out up to now give a good evidence that U N M 1 + cNd2, for some c > 0. If this is the real behaviour we have the surprising fact that for any fixed a > 1 these choreographies exist from some N on, but never for a = 1. We now return to the circle-like choreography with a small outer loop. As mentioned before, such examples seem to exist for all N if a = 1. Again, using continuation, a turning point appears for some value a: c 1.: in this case one has a; x 1 - d / N for some d > 0. For N = 4 two different choreographies have been found on the same choreographic class. This has been obtained by continuation of case 5 in Figure 2. Trying to decrease a a turning point is found close to a = 1. The family of periodic solutions returns to a = 1 producing a similar, but definitely different, choreography. The figure of eight solution for N = 3 is, in some sense, quite exceptional, for the Newtonian potential (a=l)is the only stable choreography found up until now. Certainly, other choreographies have been found to be stable for some ranges of a, but not for a = 1. Furthermore is seems quite robust concerning continuation. It persists up to values of a beyond the logarithmic potential (which is the potential replacing the case a = 0). Among the other choreographies the only ones which persist until the logarithmic potential are also the figure of eight ones with an odd N. Finally, it is worth mentioning that when N goes to infinity there are some shapes which seem to tend to a limit after scaling (eventually using different scalings in the z and y directions). These are the figure of eight solutions and the direct linear chains with N bodies and N - 1 loops. For further information see Chenciner et al. (2000) and Sim6 (2000).
6
Discussion
The existence of the choreographies and the variety of patterns they display comes as a surprise, especially in what concerns complicated or even asymmetric patterns. Every piece of the N-body problem we start to understand opens a new world of questions. The choreographies are, in some sense, the simplest examples after the central configurations. For both topics the existence and classification problems seem to be far from reach. In any case they will probably contribute to the advance of celestial mechanics due to the new ideas and methods to be developed in order to study them.
Acknowledgments This work arose from the information that A. Chenciner, R. Montgomery and J. Gerver shared with me and has been largely influenced by many discussions on the topic. I am also indebted to R. Martinez and A. Jorba for many useful suggestions. A large part of the work was started during a sabbatical leave at the Institute de Mkcanique Celeste (Observatoire de Paris), thanks to the support of CNRS. My gratitude to my colleagues at that institution for their hospitality and interest on the work. The parallel
Carles Sim6 computing facilities of the UB Grup de Sistemes Dinamics have been widely used. The support of grants BFM2000-805 (Spain), 2000SGR-27 (Catalonia) and INTAS 97-771 is also acknowledged.
References Chenciner A, 2000. Private communication. Chenciner A and Montgomery R, 2000, A remarkable periodic solution of the three body problem in the case of equal masses, Annals of Mathematics - to be published. Chenciner A, Gerver J , Montgomery R and Sim6 C, 2000, Simple Choreographic Motions of N Bodies: A Preliminary Study, Geometry, Mechanics and Dynamics, Springer, New York, to be published. Isaacson E and Keller H B, 1996, Analysis of Numerical Methods, John Wiley, New York. Jorba A and Zou M, 2001, On the numerical integration of ODE by means of high-order Taylor methods. Preprint. Lagrange J L, 1772, Essai sur le problitme des trois corps, iEuures 6 p273. Moore C, 1993, Braids in Classical Gravity, Physical Review Letters 70 3675-3679. Poincark H, 1896, Sur les solutions pkriodiques et le principe de moindre action, Comptes Rendus Acad. Sci. Paris 123 915-918. Sim6 C, 1990, Analytical and numerical computation of invariant manifolds, Modern Methods in Celestial Mechanics, Editions F’rontiitres, Paris, edited by Benest D and F’roeschlk C, 285-330. Sim6 C, 2000, New families of Solutions in N-Body Problems, Proceedings of the ECM 2000, Birkhauser (Basel), to be published. Sim6 C, 2001a, Global Dynamics and Fast Indicators, Global Analysis of Dynamical Systems, IOP Publishing, Bristol, edited by Broer H W, Krauskopf B and Vegter G , to be published. Sim6 C, 2001b, Dynamical properties of the figwe eight solution of the three-body problem, Proceedings of the Chicago Conference on Celestial Mechanics dedicated to Don Saari, to be published. Stoer J and Bulirsch R, 1983, Introduction to Numerical Analysis, Springer, Heidelberg, second printing.
285
Central configurations revisited Jorg Waldvogel ETH, Zurich, Switzerland
1
Introduction
The classical subject of equilibrium, or central, configurations of N point masses has its role in our restless universe: they describe the patterns in which N bodies engage in simultaneous collisions. In this paper we take one more look at the classical conditions defining central configurations: they are easily seen to be expressible as systems of algebraic equations in one or several unknowns. A well-known result is that the 1-dimensional central configurations of N = 3 point masses (the Eulerian configurations) are given by the unique real root of a univariate polynomial of degree 5 . In this case the univariate polynomial also provides an efficient numerical algorithm for computing the central configurations. This is not necessarily true in the more complicated situations of N = 4 bodies. Nevertheless, it is an interesting problem to find the degrees of the univariate polynomials defining, e.g., the central configurations of 4 bodies. The collinear configurations considered by Moulton will be shown to be given by the unique real root of a univariate polynomial of degree 35 (with coefficients of degree 9 in the masses). In the symmetric case of pairwise equal masses this degree reduces to 7 . The planar central configurations of 4 bodies are much more complicated. Satisfying polynomial equations of degrees 6, 9, 22, 22, 2 2 , respectively, in 5 unknowns, the data of these central configurations may well be roots of univariate polynomials with thousands of terms. Whereas this configuration is presently out of reach, the cases of trapezoidal or axial symmetry were found to be of the algebraic degree 24 or 102, respectively.
2
Basics
The notion of a central configuration is conveniently defined by means of equilibrium configurations of N gravitationally interacting bodies in a rotating coordinate system. Let mj > 0 ( j = 1,.. . , N ) be the masses of the bodies, located at positions xj E I?* with
Jorg Waldvogel
286 respect to the centre of mass, i.e.
N
1m j x j = o
where G is the constant of gravity. The force exerted on mj by the other bodies is the gradient d U / d x j . Therefore the N bodies are in equilibrium in a coordinate system rotating about the origin if the positions xj and the angular velocity w can be chosen such that for each body the resultant gravitational force is balanced by the centrifugal force,
w’mjxj. Generalising this concept to d-dimensional space R d , and putting p = w 2 , we define X I , 2 2 , . . . , X N E Rd with Cy=,mjxj = 0 is called a central configuration with respect to the masses m3 > 0, if there exists a constant p > 0 such that
Definition 2.1 A set of N distinct points dU
--dXj
-pm3xj,
(j=1,2,...,N).
(3)
Remarks: 1. Due to the homogeneity of U (degree = -1) the condition (3) remains invariant under the scaling transformation x j Hc x g ,p c--j c - ~ P . Therefore central configurations are a t most determined up to a scaling transformation. A particular choice of p amounts to a normalisation of the size of the configuration.
2. (ii) Condition (3) is also invariant under Rotations x3 H R x j with R an orthogonal matrix E R d x d ; therefore a central configuration is a t most determined modulo a i rotation and a scaling. A useful way (Wintner, 1941) of dealing with the undetermined size of a central configuration consists of introducing the (polar) moment of inertia N
3=1 Multiplying Equation 3 by x3 and summing over j gives, using Euler’s theorem on homogeneous functions as well as the definition (4),
P=rU
(5)
Substituting (5) into (3) and multiplying by 2UI shows that the central configuration satisfies d -(U ’I)= O , j=l,...,N.
5
8x3
Since U 2 1 is homogeneous of degree 0 in the x j (i.e. independent of the scaling) the size of the configuration is not determined by (6).
Central configurations revisited
287
One way of deriving necessary and sufficient conditions for a central configuration is to parameterise the configuration by the mutual distances
x j
These distances may have to satisfy certain constraints which make sure the T j k are the mutual distances between N points of Rd. The force function is naturally expressed in terms of T j k by Equation (2). In view of (6) and the rotational invariance of central configurations it is expected that the moment of inertia I may be expressed in terms of the r j k as well. In fact, introducing the total mass N
m=C
m j
j=1
and considering the product m I as well as Equation (1) results in ma I - o = (ml =
+m 2 + .. . +m N ) ( m l x : + . .. +"2%)
mjmk(xj j
- x K ) ~=
m j m k Ixj
-(mlxl+
.. .+m N x N ) 2
- xkl2 .
j
In view of this and Equation (6) we have (Waldvogel 1972, Wintner 1941)
Theorem 2.2 Let m j > 0 , ( j = 1,.. . , N ) be N masses, and let x j E Rd, ( j = 1,.. . , N ) be N distinct points of R d . Necessary and suficient for x = ( 5 1 ; . . . ; E N ) E RNd to form a central configuration is b(U2I) = 0 , (9) where
are the force function and the moment of inertia of the N-body system. The variational condition (9) means that U21 is stationary, i.e. its gradient vanishes under the geometric constraints making sure that the T j k ( 1 5 j < k 5 N ) are the mutual distances of N distinct points of Rd.
A useful equivalent formulation was suggested by Marchal (personal communication). Define a typical distance A in the configuration by I.L
with G , m , p from (2), (8), ( 5 ) . Then (9) is seen to be equivalent to
The distance A proves useful in the discussion of many cases with 3 or 4 masses.
288
Jorg Wald vogel
Central configurations are important in the dynamics of the N-body problem since the above-mentioned equilibrium solutions involving synchronised circular motions of all bodies can be generalised. If the central configuration is flat, i.e. if xJ E Rz, the N bodies can move’on synchronised coplanar Keplerian orbits (homographic solutions). If the Keplerian orbits degenerate into rectilinear orbits, the motion is also possible if the central configuration is spacial. These solutions of the gravitational N-body problem, referred to as homothetic solutions, clearly exhibit collisions of all N bodies. It is conjectured that a solution exhibiting a general collision of N bodies is asymptotic to a homothetic solution as the instant of collision is approached. For N = 3 this was proven by Siege1 (1941), but for N > 3 the conjecture is still open. Here Siegel’s proof fails since it is not known if the number of central configurations modulo rotations and scalings is finite. Central configurations, therefore, bear the key for understanding collisions of several bodies as well as their inverse processes (explosions).
3
Classical examples
A simple approach to solving (9) is to normalise the size of the configuration and to describe it by parameters p l , p z , . . . ,,of,where f is the number of degrees of freedom modulo the rotations, such that the geometric constraints are automatically satisfied. Since condition (9) is invariant under non-degenerate coordinate transformations, it now becomes
Example (i). No constraints, e.g., X = 3 points of R2; the 3 distances r12, r23, r13 may be chosen independently. We have U
+ ml m3 rG1, ml m2 r:2 + m2 m3 ri3 + ml m3 r f 3 ,
= ml m2 r;: -I m2 m3 r;’
mI = and (11) yields
There follows
mI 1 Ij < k r& = 7 ,
IN ,
(12)
i.e. all three mutual distances are equal. For N = 3 the central configuration is the Lagrangian configuration of an equilateral triangle (even for unequal masses!). The same reasoning applies for N = 2 points of the line R’: the central configuration consists of two distinct points of R’, the “one-dimensional equilateral simplex”. Analogously, we may have N = 4 points of R3 and 6 independent mutual distances. According to Equation (12) the central configuration for any set of masses is the regular tetrahedron.
289
Central configurations revisited
Example (ii). N = 3 points of R', the collinear (Eulerian) configuration of three masses ml, m2, m3. Three different central configurations are possible according as ml,m2, or m3 is the inner mass. We consider the arrangement (m1,m2, m3),normalise the configuration by ~ 2 3= 1 and introduce p = 7 3 2 > 0 as (the only) independent parameter. With ~ 1= 3 p + 1 we obtain
+ ml m3(p+ 1)' + m2 m3, +-mlm3 p+l
mI = ml m2 p2
U
[ml m2
m3]
=
mlmz p
+m2m3
0 1
0 2
0 1
-3 -1
1
3
3
0
-3 -2
-1 -1
0
0
1 -::P5
=0.
(13)
P2
It may be shown that for mj > 0 Equation (13) has exactly one real solution p with > 0. Clearly, the inner mass m2 plays a distinguished role, whereas the interchange of ml and m3 corresponds to replacing p with l / p , as may also be deduced from geometric considerations. p
4
The algebraic problem of central configurations
If parameterisations of N-body configurations in Rd with the correct number f of degrees of freedom are not available the technique of Lagrange multipliers may be used. This technique allows to treat (9) as a variational problem with L geometric constraints
R e ( z ) = O , e = l , 2 , . . . ,L .
(14)
Equations (14) must be necessary and sufficient conditions for the N(N - 1 ) / 2 quantities T j k , 1 5 j < k 5 N to be the mutual distances of N points of Rd. The Lagrange technique then calls for solving the unconstrained variational problem
together with the constraints (14), where the Lagrange multipliers Xe appear as additional unknowns. Any parameterisation may be used, but the mutual distances T j k are preferable since they allow for an elegant representation of the constraints Re = 0. These constraints, expressing the requirement that the ( d + 1)-dimensional volumes of all sub-simplices of 2 with d + 2 vertices vanish, may be formulated by means of the Cayley-Menger determinant (Arthur Cayley 1821-1895, Karl Menger 1902-1985) expressing the volume of a simplex in terms of its edges (Cayley, 1895):
Jorg Waldvogel
290
Theorem 4.1 Let ~ 0 ~ x .1. xd , . E Rd (d 2 0 ) be the vertices of a simplex S
C
Rd, and let
M = (lXj be the matrix of its squared edges. Then the d-dimensional volume V of S satisfies
where e = (1,1 , .. . ,
is the column vector with d
E R~+'
+ 1 ones.
Remarks: (i) d = 0. The O-dimensional simplex is a point; its O-dimensional volume is 1, namely the number of points. Equation (17) yields V = 1. (ii) In d = 1 dimension the simplex is a segment; (17) yields its length. (iii) For d = 2 Equation (17) yields Heron's famous formula for the area of a triangle. (iv) With d = 3 Equation (17) yields a remarkable formula for the volume of a tetrahedron from its 6 edges, also mentioned by Wintner (1941). We are now able to assess the mathematical nature of the conditions (14), (15) when the mutual distances rJk are used as parameters. It is seen that the geometric constraints Re = 0 are all polynomial equations in terms of the rjk, whereas U21is a rational function of the rjk. Equation (15),
also yields polynomial equations in the rjk after multiplication with the product of all denominators. We therefore have
Theorem 4.2 Central configurations of N masses in Rd are determined b y a system of polynomial equations in the mutual distances rjk. Indeed, the second example of Section 3 yields the algebraic degree 5 for N = 3 points on the line, whereas the unconstrained equilateral arrangement of the planar configuration may be considered to be of the algebraic degree 1. Surprisingly, the general case of N > 3 bodies has still not been discussed; not even N = 4 is completely understood. Open questions concern the existence and the number of solutions, the possibility of degeneracies (finite or infinite number of configurations) , the algebraic, quantitative and topological dependence on the masses. Here we will list some of the known results; in the next section the new results of this paper will be summarised.
Central configurations revisited
291
(i) Moulton (1910) gives a complete discussion of the existence of central configurations of N 2 2 bodies on the line:
Theorem 4.3 For every distinct arrangement of N 2 2 points on the line R' there exists a unique central configuration. Therefore their total number is 1 N ! (ii) Important theoretical contributions were made by Palmore (1977), Smale and many others. However, it is still not known if any given set of masses always has a finite number of central configurations. (iii) Lacking theoretical understanding, many authors have resorted to numerical investigations. Algorithms for the numerical solution of systems of nonlinear equations, such as Newton-Raphson-type methods, generally work well here. Sim6 (1978) was able to qualitatively and quantitatively discuss the case of N = 4 bodies in the plane.
5
New results
The Eulerian configuration of N = 3 bodies on the line is determined by the only real root of a univariate polynomial ps(p) of degree 5 , as was seen in Section 3. The smallest degree of a univariate polynomial (with coefficients being functions of the masses) determining a central configuration C is called the algebraic degree a, of C . Therefore, e.g., the Eulerian ~ computing an Euler configuration has the algebraic degree U E = ~5 . ~For ~numerically configuration a root finding algorithm for the polynomial p5 is an excellent method. We now consider the problem of finding the algebraic degree of three particular 4-body configurations by means of currently available computer algebra systems. This degree will give insight into the existence of solutions and their number. However, the question of reality and general validity of the solutions corresponding to the zeros of the univariate polynomial pa is difficult and will not be considered here. We were using the two computer algebra systems: MAPLE, (University of Waterloo, Canada; Bruce W. Char et al. ) See http://www.maplesoft .com/ PARI, (UniversitC de Bordeaux, France: H. Cohen et al. ). See http://www.parigp-home.de/ In the basic capabilities of handling polynomials in several variables both systems were about equivalent; the syntax and automatic simplification capabilities of PARI seem to be more convenient, however. The goal of our study is the reduction of a system of polynomial equations to the problem of the zeros of a univariate polynomial, i.e. to eliminate all but one of the unknowns. There is a wide literature discussing this fundamental problem of algebraic geometry using the theory of polynomial ideals, see e.g. Collins (1971), Cox et al. (1998), Geddes et al. (1992). The preferred tool is the Grobner basis which is a normal form of the polynomial ideal allowing for the successive determination of the unknowns in lexicographic
Jorg Waldvogel
292
order. Unfortunately, Grobner bases turn out to be computationally unfeasible in the problems a t hand. However the following approach proved to be successful. (i) Use rational parameterisations, including rational parameterisations of orthogonal matrices, e.g. cosp -sinp sinp
1- t 2
-2t
cosp
(ii) Work with polynomials. i.e. multiply with the common denominator as soon as possible. (iii) The use of several symbolic parameters (e.g. the masses) often produces very long expressions. Begin with small integer values of the parameters, use a t most 1 symbolic parameter, say m l , in order to determine the polynomial dependence on this parameter. (iv) Cse polynomial resultants for the elimination of one or several unknowns. This is computationally feasible for univariate degrees up to 100. (v) The univariate polynomial pa often contains trivial factors, sometimes in high multiplicities. E.g., the use of the parameterisation (18) may produce the factor (1+t2)n in the resultant R(t). If the degree of R(t)is sufficiently small, polynomial factorisation may work. In more complicated cases it is recommended to calculate the resultant by elimination in different lexicographic orders, producing, e.g., resultants R l ( t ) ,Rz(t). We then compute the greatest common divisor (GCD) of RI and R2 by the Euclidean algorithm, and we have pa 1 GCD(R1, R2) , a much easier computation than the full factorisation of R1.
5.1
The collinear configuration of N=4 bodies
Let the masses ml, m2, m3, m4 be lined up at the positions 0, x, y, 1 respectively, where x , y are the 2 parameters of the configuration, and 0 < x < y < 1. We then have the force function
and a similar expression for mI where the powers -1 are replaced by the squares. It is seen that
0
= U . x ( l - x) ’ y(1 - y) * (y - x)
are polynomials in x and y. The conditions (ll),after multiplication with appropriate powers of the denominators, may be written as
Central configurations revisited
(;z)
y ( 1 - y ) . &I
=-(
z(1 - z)
293
+ fr x ( l - x ) ( y - x) i71z ’
. OYI+ f y ( 1 - y ) ( y - x) . DIY
1
=0,
(20)
where f1 and f2 turn out to be polynomials of total degree 9 in x and y . The elimination of y from the system (20) by means of the technique of the resultant yields
where pa(.) is a polynomial of degree a in x. The degree of Res(f l , fz, y ) in x is 67, whereas the theorems of BCzout and Bernstein (see e.g. Cox et al. 1998) yield the upper bounds 9 . 9 = 81 and 69, respectively. The factors x and (1 - x) in (21) obviously must be excluded since they correspond to the collisions (ml, mz) and (m2, m4) respectively. Furthermore we find p 2 ( z ) = (mz
+ m3)(m1+ m4)x2 - 2(mz + m3)m4x + (ml + m2+ m3)m4
and
x3(1- 2 ) 3 ~ 2 ( 5 =)f l ( z , 4 ; therefore we have to drop the factor p z ( z ) as well since it corresponds to x = y , i.e. the collision between m2 and m3. Finally we have pio(x) = Res@, m I , Y ) , i.e. p l o ( s ) = 0 corresponds to U = I = 0 which is not meaningful for central configurations. We therefore have
Theorem 5.1 The ratio r = dist(mQQ1, m2)/dist(ml, m4) an a collinear central configuration of 4 bodies is the root of a polynomial of degree 35 whose coeflcients are polynomials of total degree 5 9 an the masses. Remarks: (i) The degree of the coefficients is obtained in a computationally feasible way by using sets of masses with a single symbolic parameter. (b) In a large number of cases the polynomial p 3 5 ( x ) has exactly 1 real root, x = 21. It is conjectured that this is generally true for positive masses m3 > 0. (iii) Lacking a Grobner basis the second unknown, y, may be determined from, e.g., the equation fib, Y ) = 0 , (22) which is found to be of degree 7 in y with exactly 1 real root. If this is generally true the search for the “correct” root of (22) is not necessary. (iv) For computing numerical approximations the direct treatment of Equation (20) by a 2-dimensional equation solver, e.g., a Newton-Raphson algorithm is recommended.
Jorg Wald vogel
294
(v) In the case of symmetric masses, m4 = m l , m3 = m2 the configuration is also symmetric, 2 y = 1. Then we always have the factorisation
+
P35 (2) = P7
PZS (2)
'
.
where p7 carries the only real root. With [ = 1 - 22 we have 1 0 - 2
0
0
0 1 7 0 0
8-1 1
(23)
5.2
The trapezoidal configuration of 4 bodies
For two pairs of equal masses, for simplicity normalised as mI=mZ=l and m3=m4=p (0 5 p < l ) ,there exists a symmetric central configuration in the shape of a trapezium (Figure 1). Vc'e normalise the distance between ml and m2 by 2 and introduce the two
-1
-0.5
0
0.5
1
Figure 1. The trapezoidal configuration of 4 bodies, ml = m2 = 1, m3 = m4 = diagonals t , s as the parameters of the configuration: only solutions with t + s > 2, It - SI < 2 are valid. This is a rational parameterisation since the distance between m3 and m4 is
x = -21 (t2 - s2) .
Central configurations revisited We have
U =
21 -+-+-+-
2p
t
mI = 2p(t2
s
2p2
t2-h-2
295
1 2
P2 ( t 2 + 2 )+ 4
s2)2
+4,
and with the abbreviation q = ts(t2-s2)2 the conditions (11)may be written in polynomial form" 4m 1 fl(t,S) = - (UJ+ - UIt).qt = 0 CL 2 (25) m f 3 ( t , s ) = - (U& - U&) . qts = 0 . 2P2 The elimination of s from these polynomial equations of total degrees 10 or 9, respectively, yields the following resultant of degree 72 in t:
+
+
Res(f1, f3, s) = p8 . t34(pt2 4)4(p3t3 8)2 p24(t) .
(26) As in the previous section only roots of the last factor are valid, i.e., the trapezoidal central configurations have the algebraic degree 24. Remarks: (i) The coefficients of p24 are polynomials of degree _< 9 in p . (ii)
always seems to have exactly 1 positive and 1 negative root; only the positive root t = tl is valid.
p24
(iii) s may be determined from in s.
5.3
j 3 ( t l ,s)
= 0; this is a polynomial equation of degree 9
The diamond configuration of 4 bodies
In the second type of symmetric arrangement of 4 bodies (Figure 2) the masses are normalised to ml = m3 = 1, m2 = p , m4 = U > 0. The geometry of the configuration may be parameterised by polynomials in u , u as follows: U = U2 212
+
+ +
b = 1 u2v2
c = (1 u2)(1 - 212)
(27)
d = 42/21.
The third line stems from c = c1 + c2 with c1 = u2 - u2, c2 = 1 - u2u2. The force function and moment of inertia are given by 2p 2u pu 1 U = -+-+-+-
abed mI = 2pu2 + 2ub2 + puc2 + 8.
Elimination of U from the polynomial equations resulting from d(U21)/au = 0, a(U21)/ av = 0 generically yields an equation of degree 102 in u2. Indeed, the approach of the Lagrange multipliers, applied to a parameterisation involving the ratio of the diagonals T = c/d yields an equation of the form p l ~ z ( r= ) 0 leading to
Jorg Waldvogel
296 1,
-1
-0.5
0
0.5
Figure 2. T h e diamond configurataon of 4 bodies,
Theorem 5.2 T h e diamond central configuration of f o r generic values of the masses.
ml
1
= m3 = 1, m2 =
i, m4 = 2
4 bodies has the algebraic degree 102
Remarks: (i) In the particular case pv = 1, i.e.
m2m4
= 77317713, this degree reduces to 84.
(ii) In the case m2 = m4, the obvious solutions with double symmetry have degree 12. For certain masses there are also nontrivial solutions with only one symmetry: they are of degree 45. (iii) In the equal-mass case ml = m2 = m3 = m4 there are the obvious solutions of the square and the equilateral triangle with its barycentre. There is also a nontrivial configuration with ci/d = -.32518 92364 76032 99122 73931 77123 43900 cz/d =
.908619697361919 20548473696650238103
(29)
where the ratio r = (c1+cg)/d of the diagonals is a root of the irreducible polynomial of degree 37,
p37
Central configurations revisited
297
+
p3,(r) = 39858075 * r37 39858075 * r3‘ - 139060395 * r34
+ 17124210 * r33- 115440795 * r32+ 217615248 r31 - 42764598 * + 160917273 * rm- 172452240 * rz8 + 44215308 * r27 128353329 * rZ6+ 53738964 * rZ5 - 19889496 * r24+ 55894536 * r23+ 11212992 * rm + 1386909 * r2’- 12287376 * rZo- 15790278 * + 2319507 * ri8+ 642162 * + 6241506 * r’’ - 1081984 * I-’’ + 338994 * r14 - 1364385 * ri3 + 241040 * r12 - 88548 * rl1+ 175113 * r10 - 30456 * rs + 8400 * U’ - 12120 * u7 + 2052 * r6- 240 * r5 + 336 * r4 - 61 * r3 + 1. 1;
r30
-
(30)
r17
6
Outlook and conclusions
Unfortunately, the algebraic treatment of the general planar central configuration Q of 4 bodies (Simo, 1978) is presently out of reach. As no rational parameterisation of the quadrilateral and its diagonals with 4 parameters is known, one has to resort to Lagrange multipliers. In this way a system of 5 polynomial equations of degrees 6,9,22,22,22, respectively may be obtained. Section 5.3 and B6zout’s theorem therefore yield the interval 102 < UQ < 574992 (= 6 . 9 ’ Z3)for the algebraic degree UQ of this configuration. To conclude we mention two problems unrelated to central configurations, where the algebraic method provides complete insight into the maximum number of solutions. (i) The s t a t i o n a r y points of the distance between two ellipses in R3 (Gronchi 2000, Kholshevnikov and Vassiliev 1999 and Schnider 2000). A practical application of this geometric problem is the “Minimum Orbital Intersection Distance’’ (MOID) between the Earth’s orbit E and a cometary orbit C. For predicting close encounters of comets with the Earth all local minima of the distance between points of E and C are needed. The technique of Sections 4, 5 reduces the problem to finding the roots of a polynomial of degree 16. There follows that 16 is the maximum number of stationary points of the distance between two ellipses in R3 (if there are finitely many). Surprisingly, there are situations where all 16 stationary points are real. Figure 3 shows the distance function D(p,$) in such a case, plotted over the square -7r 5 p , $ 5 7 r , where p , $ are the eccentric anomalies on the two ellipses. There are 4 local maxima, 4 local minima, and 8 saddles (“The Scottish Hills”). (ii) The Stewart p l a t f o r m (Stewart 1965, Dietmaier 1998 and Husty 1996). This device used in robotics is a moving coordinate system with origin 0, e.g. the box of a flight simulator, that can be controlled in 3 translatory and 3 rotational degrees of freedom. One way of implementing this is to support it with 6 “legs” of variable length. Consider the following positioning problem: given 6 fixed hinges pk E R3 on the ground, 6 hinges q k E R3 on the platform and 6 distances d k E R’, (k = 1, . . . , 6 ) . Find the position x E R3 of 0 with respect to the ground as well as the orientation of the moving coordinate system.
Jorg Waldvogel
298
4
psi
4
phi
Figure 3. Distance between two ellipses an space: 4 maxama, 4 minima, 8 saddles Again, this problem may be described by a system of polynomial equations: 6 equations with 6 unknowns. The elimination process, simplified by the linearity of these equations in 3 of the unknowns, yields a single equation p40(z1) = 0 of degree 40 in the first unknown. Therefore the problem can have at most 40 solutions (unless there are degeneracies; then there are infinitely many). Surprisingly, there exist data sets pk, q k , dk with 40 distinct real solutions (Dietmaier 1998).
To find all of them is an important problem in robotics. It is computationally unfeasible, however, to generate the symbolic expressions for the coefficients of p40 in terms of the data. Instead, numerical methods for computing the values of these coefficients for a given data set have to be used. Concluding remarks Central configurations in celestial mechanics, as well as many other geometric configurations in engineering, are determined by systems of polynomial equations. The methods of algebraic geometry provide insight into the existence of solutions and into their maximum number. In the case of central configurations the technique is computationally feasible for N 5 4 bodies, except for the general planar case of 4 bodies. For the Eulerian configurations of 3 bodies and for the symmetric collinear 4-body configurations algebraic geometry also provides competitive numerical algorithms. In all the other cases numerical values are best obtained by means of multidimensional nonlinear equation solvers.
Central configurations revisited
299
Acknowledgment The author thanks Christian Marchal for his valuable comments and discussions on central configurations during the Blair Atholl Summer School. The idea of looking a t the diamond configuration of 4 bodies was brought up by Bonnie Steves; this is gratefully acknowledged.
References Cayley A, 1895, The Collected Mathematical Papers of Arthur Cayley, Volume I, Cambridge. Collins G E, 1971, The calculation of multivariate polynomial resultants. Journal of the AGM, 18 515-532. Cox D, Little J and O’Shea D, 1998, Using Algebraic Geometry, Springer Graduate Texts in Mathematics 185. Dietmaier P, 1998, The Stewart-Gough platform of general geometry can have 40 real postures, in Advances in Robot Kinematics: Analysis and Control, edited by Lenarcic J and Husty M L, Kluwer Academic Publishers. Geddes K 0, Czapor S R, Labahn G, 1992, Algorithms for Computer Algebra, Kluwer Academic Publishers. Gronchi G F, 2000, On the stationary points of the squared distance between two ellipses with a common focus, Res Report 4.73.1251, Dept Mathematics, University of Pisa. Husty M L, 1996, An algorithm for solving the direct kinematics of general Stewart-Gough platforms, Mechanism and Machine Theory 31 365-380. Kholshevnikov K and Vassiliev N, 1999, On the distance function between two Keplerian elliptic orbits. Celestial Mechanics and Dynamical Astronomy 75 75-83. Moulton F R, 1910, The straight-line solutions of the problem of N bodies, Annals of Math 12 1-17. Palmore J, 1977, Minimally classifying relative equilibria, Lett Math Phys 1 395-399. Schnider Th, 2000, Berechnung schmiegungsoptimaler kollisionsfreier Werkzeugverfahrwege fur die funfachsige Frkisbearbeitung von Freiformflachen mit Torusfrbern. Diss. ETH Zurich Nr. 12522. Siege1 C L, 1941, Der Dreierstoss, Ann Math 42 127-168. Sim6 C, 1978, Relative-equilibrium solutions in the four-body problem, Celestial Mechanics 18 165-184. Stewart D, 1965, A platform with six degrees of freedom. Institution of Mechanical Engineers 180 371-386. Waldvogel J, 1972, Note Concerning a Conjecture by A. Wintner, Celestial.Mechanics 5 37-40. Wintner A, 1941, The Analytical Foundations of Celestial Mechanics. Princeton University Press.
301
Surfaces of separation in the Caledonian symmetrical double binary four body problem Bonnie A Steves’ and Archie E Roy’ ‘Glasgow Caledonian University and 2Glasgow University, Glasgow, UK
1
Introduction
Only ten integrals exist in the N-body dynamical problem. If however, time t is replaced as the independent variable and the method known as the “elimination of the nodes” is used, then effectively the order of the N-body problem’s equations of motion can be reduced from 6N to 6N - 12. The two-body problem is soluble since N = 2; if N 2 3, then such problems are insoluble. Much of the work of the eighteenth and nineteenth centuries’ celestial mechanicians was devoted to finding general perturbation methods to give the predicted positions of N bodies to a high a degree of accuracy for as long a time as possible in the future or the past. In addition, efforts were made to devise three-body systems that would enable useful information to be given about the dynamics of the system. The restricted circular threebody problem (RCTBP) was one of these. As is well known, it was a model where two of the bodies of finite masses, ml and m2,revolved about their centre of mass in circular orbits: the third body was a test particle attracted by the two finite masses but of such infinitesimally small mass that it could not disturb the circular orbits of the two finite masses. The particle’s motion could be either (a) coplanar with the orbits of the two bodies of finite mass, the so-called coplanar restricted circular three-body problem, or (b) could move in three dimensions, the three-dimensional restricted circular three-body problem. The equations of motion of the particle in these models, respectively of order 4 and 6, admitted the well-known Jacobi’s Integral viz. V 2 = 2U - C where V 2 is the
302
Bon~iie.4 Steves and Archie E Roy
square of the particle’s velocity, U is a function of the particle’s positional coordinates in a rotating frame and C is the Jacobi constant (Roy, 1988). The Jacobi Integral was used to derive curves in the ay-plane or surfaces in the ayz-space of zero velocity which separate regions where the particle’s velocity is real from regions where it’s velocity is imaginary. The topology of these zero velocity curves or surfaces depends solely on the value of the Jacobi constant. If the topology is such that the regions of possible real motion are isolated from each other, then it can be said that the hierarchical arrangement of the bodies existing in one region of real motion cannot evolve into the hierarchical arrangement of the bodies existing in another separate region of real motion. This situation would be hierarchically stable for all time. Thus zero velocity curves can give information on the hierarchical stability of a system.
A modification of this model, in which coplanarity was assumed and the two massive bodies had equal masses, became known as the Copenhagen problem because the school of celestial mechanicians there devoted much time to exploring in an exhaustive fashion the families of periodic orbits existing in that problem and their stability. The RCTBP, with two unequal massive bodies, has been of great use in giving insight into real three-body problems in the solar system, for example a Sun-planetsatellite/asteroid/comet system where one of the bodies is small, but not massless, and the two massive bodies are not in circular orbits about their centre of mass but in elliptic orbits of small eccentricity. In the last twenty-five years a generalisation to the Jacobi relation has been obtained (Zare 1976 and 1977, Marchal and Saari 1975) when it was found possible to derive a relationship in the general three-body problem using the energy ( H ) and the angular , G is the constant of gravmomentum (c) integrals in the combination c 2 H / G 2 M 5 where itation and M is the sum of the masses of the three finite bodies. If the masses ml and m2 were originally in a binary arrangement with the third mass m3 sufficiently distant and orbiting about the centre of mass of ml and m2 , then the relationship can be used, together with the initial conditions, to discover whether the binary can or cannot be disrupted by the third body. It is also possible that the end-result will be the departure of the third body to infinity. Comparatively little work of this nature has been done in the four-body problem, either in the attempt to derive a simplified four-body problem analogous to the restricted three-body problem or to produce and examine the topology of surfaces separating regions in which the motion is imaginary from regions in which the motion is real. The greater number of variables in even very restricted four body problems pose real difficultiesin any analytical study. In the restricted three body problem, one of the bodies can be made a particle which has no gravitational effect on the remaining two bodies. This reduces the problem of the motion of the two massive bodies to a two body problem which can be solved explicitly. It therefore only remains to study the motion of the particle under the gravitational influence of the two massive bodies of predetermined orbit. In any restricted four body problem, where one of the bodies is made a particle, there still remain three massive bodies whose orbits cannot be determined explicitly unless they are placed in configurations which are known solutions of the general three body problem. Such a model was first proposed by Eckstein (1963) who placed three finite masses at the Lagrangian equilateral triangular solutions to the three body problem and
The Caledonian symmetrical double binary problem
303
studied the fourth massless body’s orbit in the same plane using Hill’s boundary curves. Matas (1968, 1971) analysed the stability of this model and the model where three finite masses are placed at the Lagrangian collinear solutions to the three body problem, by deriving the equivalent Jacobi integral for these models and studying the resulting zero velocity curves.
If special or known solutions to the three body problem are not used, then the four body problem must be further restricted. The “very restricted four body problem” (Huang, 1960) has two masses m2 and m3 moving in fixed circular orbits about each other, while their centre of mass moves in a fixed circular orbit about a third mass ml. The fourth mass m4 is massless. Further restrictions include: ml >> m2 + ma; all motion is coplanar; and the separation of the first two masses ~ 2 3must be much smaller than 3 the third massive body. This problem has applications to the their distances ~ 1 2 ~, 1 to Sun ml, Earth m2,Moon m3’satellite m4 system. Huang (1960) used osculating surfaces of zero velocity to find possible regions of real motion. He proved that the orbit of any artificial satellite about the Moon will be stable if the orbit is periodic. Periodic orbits for the satellite m4 were discovered and studied near the triangular libration points of the Earth-Moon m2-m3 system (Cronin et al. 1964 and 1968, Kolenkiewicz and Carpenter 1967, Barkham et al. 1977). Further generalisations were made to Huang’s model using fixed elliptical orbits with applications to a Sun-Earth-Moon-satellite system which included radiation pressure from the Sun (Matas 1969, 1970). A Huang-type restricted four body problem was used by Llibre and Pinol (1987) to explain the Titius-Bode law. Here the Sun m2 and inner planet m3 are taken to move in a fixed circular orbit while their centre of mass moves in a fixed circular orbit about the galactic centre ml. Llibre and Pinol studied the gravitational effect of these three primaries on a fourth massless outer planet. Scheeres (1998) revisited the periodic orbits near the triangular libration points of two primaries using a generalisation of the restricted three-body problem and the Hill three body problem to the four body problem in which the motion of an infinitesimal mass ( e . g . satellite) is studied acting under the gravitational influence of two primaries ( e . g . Earth-Moon) which orbit a third larger mass ( e . g . Sun). Planetary type four body problems involving three bodies of small mass revolving around a more massive body in the same plane have been used to explore the stabilising role of Saturn in the evolution of the Sun-Jupiter-Saturn-asteroid system (Hadjidemetriou 1980) and the resonant motion in the Jupiter-Io-Europa-Callisto system (Hadjidemetriou and Michalodimitrakis 1981). Hadjidemetriou (1980) studied the motion of a fourth small planet under the gravitational influence of a three body system following a known periodic orbit which was close to the real motions of the Sun, Jupiter and Saturn. Hadjidemetriou and Michalodimitrakis (1981) computed numerically many families of periodic orbits of the planetary type which are separated by resonant orbits showing that there exists a stable periodic orbit near the observed motion of the Galilean satellites. Wiesel (1980) used the planetary type model to show it was possible for the Jovian system to evolve into the current resonant state under the influence of tidal forces. Hadjidemetriou’s method of finding period orbits in the planetary type problem was used by Michalodimitrakis and Grigorelis (1989) to show that a system of two massive bodies moving in circular orbits about each other with two very small bodies (e.g. two planets in a binary star system) can have some stable motions.
304
Bonnie A Steves and Archie E Roy
Kumerical integrations of specific four body problems such as a Sun-Jupiter-Saturnasteroid system have also been a fruitful line of attack. Froeschle and Scholl (1987) explored the existence of asteroids near the 76 secular resonance finding only two asteroids after numerical integrations for 1 Myr. Zhang and Innanen (1988) studied asteroids near the triangular Lagrangian points of Saturn and found stable solutions there. FerrazMello (1994), using numerical integrations and Nesvorny et al. (1997), using frequency map analysis , examined asteroids near the 2:l resonance, showing that these asteroids can diffuse to high enough eccentricities to enable close approaches to the inner planets, thereby providing a possible cause of the 2:l Kirkwood gap. Another method of analysing the four body problem has been the search for special solutions to the four body problem, similar to the Lagrangian equilibrium solutions in the three body problem. It has long been known that special solutions of the N-body problem exist where equilibrium solutions appear for particular geometrical configurations, for example where N masses are placed a t the vertices of an N-gon of equal sides. Moulton (1910) gave straight-line solutions of the problem of N-bodies. In the four-body equal coplanar mass problem there are four equilibrium solutions: a square with a body a t each vertex; a collinear arrangement; an equilateral triangle with a body at each vertex, the fourth body being a t the centre of mass; and an isoceles triangle with a body a t each vertex, the fourth body lying above the centre of mass. Each of the four bodies can be shown to perform circular orbits about the centre of mass of the system, all orbits having the same period of revolution (Palmore 1973, 1975a, 197513, 1976). Using arguments involving the counting of bifurcation sets and different invariant manifolds, Simo (1978) presented a comprehensive survey of arbitrary mass four-body equilibrium configurations and also equilibrium solutions to systems of three masses and a particle. Roy and Steves (1998) demonstrated that most of the equilibrium solutions of the equal mass foilr-body problem can be reduced to the Lagrangian solutions of the Copenhagen problem by setting ml=m2=pLM and m3=md=M and letting p approach zero. In this way families of equilibrium solutions for all values of p from 1 2 p 2 0 are shown to exist. An important analysis of the general planar four body problem was made by Loks and Sergysels (1985, 1987). Using the angular momentum c and the energy E integrals, they obtained hypersurfaces which defined regibhs of the five dimensional space where motion was allowed to take place. In their study, hyperplanes are showfl to exist which correspond to singularities in the potential, $.e. collisions between the bodies; it was also shown that the hypersurfaces were symmetric with respect to a particular plane. The four-body Caledonian problem introduced by the present authors (Steves and Roy 1998, Roy and Steves 2001) enables considerable simplification to be made particularly in the form of the Caledonian Symmetrical Double Binary Problem (CSDBP). The present paper shows how the CSDBP model can be used to obtain surfaces of separation which enable predictions to be made of the possible paths of evolution the initial hierarchy of the four-body system can take. In the following sections, the model is described. 
First the energy integral is used, and subsequently Sundman’s inequality, to produce respectively surfaces of zero velocity and surfaces of separation which define regions of real motion. In the process, a non-dimensional constant COis found; it is very similar to the one discovered in the general three-body problem. The value of CO obtained from the initial conditions, enables the precise topology of the surfaces of separation in the problem to be obtained.
The Caledonian symmetrical double binary problem
2
305
The Caledonian problem
The CSDBP is formulated by using all possible symmetries. The main feature of the model is its use of two types of symmetry: (a) past-future symmetry and (b) dynamical symmetry. Past-future symmetry exists in an N-body system when the dynamical evolution of the system after t = 0 is a mirror image of the dynamical evolution of the system before t = 0. It occurs whenever the system passes through a mirror configuration, i.e. a configuration in which the velocity vectors of all the bodies are perpendicular to all the position vectors from the system’s centre of mass (Roy and Ovenden, 1955). Dynamical symmetry exists when the dynamical evolution of two bodies on one side of the system’s centre of mass is paralleled by the dynamical evolution of the two bodies on the other side of the system’s centre of mass. The resulting configuration is always a parallelogram, but of varying length, width and orientation. The CSDBP is three-dimensional and involves initially two binaries, each binary having components of unequal masses, but the same two mass values as the other binary. To set up the model we make the following assumptions: All four bodies are finite point masses, with two bodies PI and P4 on opposite sides of the centre of mass of the system having mass m and the other two bodies P2 and P3 having mass M . We define p = m / M so that 0 < p 5 1. See Figure 1. At t = 0, the bodies are collinear with their velocity vectors perpendicular t o the line the bodies lie on. This ensures past-future symmetry. At t = 0, the radius and velocity vectors of the bodies with respect to the four body system’s centre of mass C are given by rl = -r4; r2 = -r3; V1 = -V4; VZ= -V3. Note that VI and V4 do not need to be coplanar with Vz and Vs. This ensures that dynamical symmetry holds for all time, the configuration of the four bodies always being a parallelogram. z
\.’4I
‘
a
Figure 1. The initial configuration of the general form of the CSDBP
Bonnie A Steves and Archie E Roy
306
Taking the centre of mass of the system to be a t rest and as origin, the equations of motion of the general four body system may be written as:
and
a
a + k- a
V, = i- +jax, ay,
at,
The dynamical symmetry condition is given by
rl = -r4;
r2 = -r3;
tl = -i4; i2 = - r 3 .
('4
Given (1) and ( 2 ) , the differential equations for the CSDBP reduce to miri = ViU where
i = 1,2.
(3)
The energy integral is given by T - U = E where T is the kinetic energy, U is the force function and E is the energy constant. For the CSDBP, U can be written as:
1 7-12
J2(r?
+ r;) -
(4) 7-:2
where
M is the mass of the two more massive bodies P2 and m is the mass of the other two PI and P4, so that U ,
P3,
= m / M and is therefore the mass ratio,
r , is the length of the radius vector of Pi from C for
i = 1,2,
r12is the separation distance of Pl from P2.
Suppose now that be written as
where
TI
and
7-2 are
given prescribed values, so that in equation (4) U may
The Caledonian symmetrical double binary problem
307
both of which are greater than zero. It is simple to show that the minimum value of U is given by putting r12 = = J-. We then have
The energy integral may be written as
E = (pMV:+MV:)-GM2 7-12
d 2 (7-:
+17-22) - 7-:2 )+;(;+:)]VI
with the limitation that
Let Eo = -E. Since the kinetic energy must be greater or equal to zero for real motion, Equation (7) provides the condition that
Gw'[~++
7-12
J2
(7-;
+17-22) - 7-y2 ) + ; ( ; + 9 ] - E 0 z 0
(9)
We now introduce Sundman's inequality (Bocalletti and Pucacco, 1996). Let the moment of inertia of the system be I , where 4
I =
mi7-2 i=I
By symmetry we may write I as
I = 2M
(,UT;
+ 7-z) .
Then Sundman's inequality relates I , T and c2, where c is L e magnitude of the constant angular momentum vector for the system, giving
We then have.
This is a much stricter condition. In the case where E is considered alone, the minimum kinetic energy T is 0. When both E and c2 are considered, the minimum kinetic energy Tminvaries with 7-1 and 7-2, since I is a function of these variables. The surfaces now obtained which determine real motion are more correctly called surfaces of separation.
Bonnie A Steves and Archie E Roy
308
3
Regions of motion in the CSDBP
We note that in (9) and (13). G, Eo, c2, A1 and p are constants and that the only variables that appear are r l , r2 and r12. The regions of possible and forbidden motions of the bodies PI and P2 can therefore be displayed in the three-dimensional space r l , r2 and r12. (By virtue of the imposed dynamical symmetry the position and velocity vectors of P3 and P 4 follow immediately). We now introduce dimensionless variables p1, p2 and
p12
in place of r l , r2 and
r12:
The more general problem valid for any energy EOand mass M can therefore be studied. The only parameter which appears explicitly is the mass ratio /I. There are three procedures. successively more precise, which divide the into regions of possible and forbidden motions. They involve finding:
p1p2p12
space
(a) surfaces defined by the kinematic constraints of Equation (8): (b) surfaces of zero velocity using the energy integral. Equation (9); (c) surfaces of separation using Sundman's inequality, Equation (13);
4
Kinematic constraints and collisions
An arbitrary point ( p l , p2, p12) is subject to the constraints, derived from (8). lP1
- P21
5 PlZ 5 PI + P2
(15)
I
+
The upper bound is achieved on the plane OAB in Figure 2; its equation is ,012 = p1 pz. When p1 > pz the lower bound is achieved on p12 = p1 - p2, the plane OAC. When p1 < p2 the lower bound is achieved on p12 = p2 - p l , the plane OBC. The solutions must lie within the (infinite) region bounded by these three planes. Various collisions are possible. They correspond to lines in Figure 2: (a) If p1 = 0 then PI collides with P4. The inequalities (15) are satisfied only if p12 = p2. This collision corresponds to any point on OB. (b) If p2 = 0 then P2 collides with P3. The inequalities (15) are satisfied only if p12 = p1. This collision corresponds to any point on 0.4. (c) If p12 = 0 then PI collides with P2and P3collides with P4.The inequalities (15) are satisfied only if p2 = p1. This collision corresponds to any point on OC.
d m ,
(d) If plz = then Pl collides with P3 and P2 collides with P4. This condition defines a cone which touches the plane p12 = p1 p2 along the line OD. The equation of this line is best written in terms of an axial distance in the plane p1 = p2 denoted by p = f i p l = f i p z so that OD is given by the pair of equations p1 = p2 and p12 = f i p .
+
The Caledonian symmetrical double binary problem 2
=
12
= 0;
p
309
P1 = P 2
Pl = P?
Figure 2. The domain holding the surfaces of zero velocity
5
Surfaces of zero velocity using the energy integral
In terms of the dimensionless variables of (14) the condition for positive kinetic energy given in (9) becomes
The equality defines a surface which confines the possible motions for coordinates satisfying the inequalities (13). In this section we construct explicit formulae for this surface and show how it may be drawn. It will be convenient to parameterise the surface in terms of variables x and y in two ways, depending on the relative magnitudes of p1 and p2: (a) If
p1
> p2 then we set
and x =
y=
so that p2 = ypl and p12 = xpl.
2
(b) If p2 > p1 then we set y = and x = so that p1 = yp2 and p12 = xp2. These definitions and the inequalities (13) imply that 0 5 y 5 1 . 1-y I 5 I l + y Case (a) PI
> PZ
Multiplying (16) by p1 and substituting y = p2/p1 and x = pt2/pt gives P1
5
x1,
The surface is given by P1
= X l b : Y) ;
pz = YP1 ;
P12
= ZPl
'
Bonnie .1Steves and .Archie E Roy
310 Case (b)
P2
> P1
Multiplying (16) by p2 and substituting y = pI/p2 and z = p12/p2 gives P2
where = 2P
X 2 ( 5 . Y)
(:
+
J2(1
Ix 2 . 1yz) - Z2 ) + ; ( 1 + $ )
+
The surface is given by P2
= X2(5, Y) :
pi = yp2 ;
pi2 = xp2
The case of p = 1 \+'hen p = 1 we have X I = ,Ti2 = S say. 2 X(5. y) = s
The surfaces in the = Pz and Pi2.
p1
p2
pi2
+ J2(1 +2y')
-22
+ -21+ -2Y1
space are clearly symmetric about the plane formed by
PI
Minimum values of Xi, X2 and X These functions all have minima attained at
Z ,
=
d m :
and. when p = 1.
Maximum values of Xi, X2 and X A sketch of any of the X functions (see Figure 3a) shows that they achieve their maximum value at the extreme values of z, namely 5- = 1 - y and Z+ = 1 y .
+
and. when p = 1. X,(y) = X ( z - . y ) = X ( z + . y ) = -+'(I+;). 1-y* 2
311
The Caledonian symmetrical double binary problem
(b)
Figure 3. (a) X(x). (b) T h e inverse x(X) obtained by exchanging the axes of X ( x ) . (c) plZ(p) is obtained by a horizontal linear rescaling and a vertical shear proportional to the X value. (d) T h e special cases of y = 0 and y = 1.
Plotting the regions For simplicity we shall plot the surface for /I = 1: since this surface is symmetric about the plane p1 = p2 we need consider only the case p1 > p2. The analysis is similar for /I # 1, except that both cases p1 > pz and p1 < p2 must be considered. Our strategy is to find the intersection of the surface on the plane through the pl2-axis with p2 = y p l , (where y is a chosen constant value between 0 and 1). It is convenient to denote the = d m p l . First consider the function X ( x ) plotted axial distance by p = in Figure 3a. It is defined by Equation 25 and its values at x* and x, are X, and X , respectively (given in Equations (31) and (28)).
d
m
Denote the inverse of X ( x ) as x ( X ) : this function is plotted in Figure 3b. First we = this is simply transform the horizontal axis from X to p = a linear scaling by the factor We then transform the vertical axis from x to plz = x p l = x X . This transform is a shear parallel to the vertical axis: the ordinates of P and R are multiplied by X,; the ordinate of Q is multiplied by X,; the ordinates of Po, QOand & are multiplied by X = 0 so that they are all mapped into the origin. The result of these transformations is shown in Figure 3c.
d m .
dmpl dmX:
Note that the line &R is mapped into a line with slope
z+/dmso that
its
312
Bonnie A Steves and Archie E Roy
j p 12=2p1
Figure 4. The surfaces of zero velocity obtained from the energy integral
+
+
equation is p12 = ( x + / d m ) p = z+pl = (1 y)pl = p1 pz. This is the line in which the axial plane intersects the plane OXB in Figure 2. Similarly the line POPis the line in which the axial plane intersects the plane OAC in Figure 2. There are two limiting cases. arising when y = 0 and y = 1. On the plane pz = 0, given by y = 0, both OP and OR have a slope of unity and both X , and X, tend to infinity: the region degenerates to the collision line OX in Figure 2 . On the plane p1 = pp. given by y = 1. the slopes of OR and OP are and 0 respectively. whilst X, tends to infinity and X , tends to 2& + 1. That is, the points R and P go to infinity along the collision lines OD and OC of Figure 2. The above analysis shows that the real motions lie between hyperbola-like curves and the plp axis whilst also remaining inside the region of Figure 2 . The resulting physical region is shown in Figure 4. The figure also shows the projections of the loci of the points P. Q and R in the plpp-plane for 0 5 y 5 1. We can now discuss the main features of the zero velocity surfaces (the tubes) shown in Figure 4. It is seen that the four tubes in which real motion can take place join in the
313
The Caledonian symmetrical double binary problem
vicinity of the origin. Recalling the symmetry of the original problem, we see that the lower and upper tubes bisected by the plane of symmetry are regions where, for p1 and p2 large, the four body system is respectively, the original double binary or a double binary now formed by Pl and P3 and by P2 and P4. The side wall tubes are tubes that, far from the origin, have one but not both of p1 and p2 large. The tube attached to the wall p1Op12, with p2 small compared t o p 1 , then represents a binary PIP4 with two single bodies P2 and P3 orbiting the binary PlP4. The other side wall tube, with p1 small compared to p2. represents a binary P2P3 with two single bodies Pl and P4 orbiting the binary P2P3. The region near the origin where the four tubes join is therefore a transition region in which strong interplay among the four bodies takes place from which presumably, unless collisions occur, one of the four possible configurations will subsequently emerge to continue the evolutionary progress of this four body problem.
6
Separation surfaces from Sundman’s inequality
A more rigorous sculpting of the p l , p2 and p12 space is now obtained by using Sundman’s inequality. With Sundman’s inequality real motion is possible only when the left hand side of relation (13) is greater than or equal t o the right hand side; imaginary movement occurs when the right hand side is greater than the left hand side. As in (14) we introduce dimensionless variables new dimensionless constant CO:
Eon .
p1
=GM2 1
EO7-2
p2=-
GM2
p~ and
E0732
, 1
p1,
p12=-
GMZ ;
p12;
we also introduce a
Eo c2 C O=G 2M 5 ‘
In terms of these variables Sundman’s inequality (13) becomes
We now show that a relation exists which, for a given mass ratio p , is invariant t o every initial set of conditions of the CSDBP. Replacing the inequality sign by the equality sign in (33) and rearranging the terms, we obtain:
This equation must describe some surface in the p1p2p12 space. It will be much more complicated than that defined by the zero velocity surface but it must approximate that surface as COapproaches zero. We shall find that there are forbidden regions close t o the origin; these regions increase in extent as COincreases and it will be shown that the ‘tubes’ eventually become disconnected. Our method parallels that of the previous section. In particular we introduce the same parameterisation of the surface. (a) If p1 > p2 then we set y = ez and z = P1
(b) If
p2
> p1 then we set y
= PL P2 and z =
so that
p2
= ypl and p12 = z p l .
so that
p1
= yp2 and pI2 = xp2.
Bonnie A Steves and -4rchie E Roy
314 where as before
0 1-y
5 5
y 5 1 . x
(35) (36)
5 l+y
We again consider the two cases separately.
Multiplying (34) by p1 and substituting y = p 2 / p I and x = pIz/p1 gives
where X l ( z ,y ) is the same function as that in Section 5 :
Multiplying (37) by p:: we obtain the following quadratic equation in P: - XlPl
+ 4(pC+O y2)
p1:
=0.
(39)
with solution
If we set
then
This equation for p1, together with p2 = yp1 and p12 = x p l , defines the surface of separation for p1 > p2. Case (b) PZ
> PI
Multiplying (34) by
p2
and substituting y = p l / p 2 and x = p12/p2 gives 1 -xz P2
=
CO 4 P U + PY2)
+I,
(43)
where (44)
The Caledonian symmetrical double binary problem
315
Rewriting (43) gives the quadratic:
P;
- ‘7i2P2
+ 4 ( 1 +COpy2)
=O
(45)
with so1ut)ion
(46) where we have set
C’(x.y) = (1 + pg2)X,’.
(47)
This equation for p 2 , together with p1 = yp2 and p12 = x p z , gives the parameterisation of the surface of separation for p1 < pz.
Properties of C and C’ The two relations (41) and (47) for C and C‘ are functions of p, z and y and therefore are independent of the initial conditions (i.e. initial positions and velocities) of any CSDBP. Consequently they form a set of relations invariant to any change in the initial conditions. The constant COis a function of the initial conditions of the particular CSDBP being studied, but plays no part in these relations. Note that for fixed y both relations have minimum values when 5 = ,/G5:
Throughout the range of values ,U can have, viz. 0 5 p 5 1, the two relations are wellbehaved. For given values of x and y, an increase in p within p’s range increases C and C‘ in value. In particular, for p = 1,
Hence for all pairs of values of x and y in this case, C = C’.
7
Building Szebehely’s Ladder
We now show that a single diagram essential to obtain the topology of the surfaces of separation exists which, for a given mass ratio p, is invariant to every initial set of conditions of the CSDBP. It is derived from the condition that the discriminants in the solutions to the quadratic equations (42) and (46) must be greater than or equal to zero for real systems.
Bonnie A Steves and Archie E Roy
316
7.1 The projection in the plpz-plane of the maximum extension of the region of real motion in the p1, p2, p12 space
+
Let z+ = 1 y and z- = 1 - y . They give the maximum widths of the regions of real movement extending from the plzpplane where p = f i p l = fip2. and extending from the line given by p12 = p l . p2 = 0 and the line p12 = p2, p1 = 0. We again consider the two cases y = p2/p1 and y = p1/p2 separately.
> p2 Here we have y = p2/p1. 15'hen x+ and x- are substituted in turn into the equation (38), it is found that the resulting equations are again identical showing that the upper and lower widths of the extensions from the plane and the lines are equal. Case (a) PI
Substituting z+ = 1
+ y into equation (42) gives
The corresponding variable p2 is given by P2
Case (b) PZ
(53)
= YP1
> PI
In this case y = pI/p2 and a similar solution is found as follows:
where
(55) The corresponding variable p1 is given by P1
= YP2
'
(56)
7.2 The projection in the pppplane of the minimum of the region of real motion in the p1, p2, p12 space The functions X1(x,y) and XZ(Z, y) involved in the solutions for the surfaces of separation in Section 6, Equations (38, 44), are exactly the same as those used in Section 5 . They The actual minimum values are therefore have a minimum at the value z =
d m .
The Caledonian symmetrical double binary problem
317
and
(58) We again consider the two cases y = p2/p1 and y = p1/p2 separately. Case (a) PI > PZ Substituting 5 =
Jminto equation (42) gives
where
The corresponding variable pz is given by P2
Case (b)
P2
= YP1
> P1
For the case of y = p 1 / p 2 ,a similar solution is found as follows:
The corresponding variable p1 is given by P1
= YP'
7.3 Discussion In summary, in sections 7.1 to 7.2, four functions of p and y, i.e. C,, CL, C , and Ch given by equations (52), ( 5 5 ) , (60) and (63) have been obtained. For clarity the two C functions obtained from the maximum or extreme extension of the region of real motion in the pl, pz, p12space are given the suffix e , while the two C functions resulting from the minimum of the region of real motion are given the suffix m. The four C functions are invariant to values of the constant CO,a function of the initial conditions of any particular CSDBP. When a value of p is stipulated, the surfaces become functions only of y. In Section 6, we found that the more general functions C and C' were well behaved with respect to all values of p. In Figure 5 , we display the behavior of the four curves C,, CL, C, and C& as functions of y for five characteristic values of p , namely p = 1,
Bonnie A Steves and Archie E Roy
318
(d) p = 0.01 C
I
c"
t
0.2
0.4
C.6
3.8
i y
(b p = 0.1
0.2
0.4
6
C'.
C.6
0.8
. y
3.6
0.8
iL
(e) p = 0.001
C
0.2
0.4
0.6
0.8
0.4
0.6
0.8
0.2
0.4
(c) p = 0.05 C
I
0.2
i y
Figure 5 . The behaviour of the jour C junctions as junctions of y for jive values of p 0.1, 0.05, 0.01 and 0.001. In particular, when p = 1, then C, = C A and C, = C& for all values of y. In all other values of p . the four curves are independent. The four minima of these curves in the range 0 5 y 5 1 form the rungs of a ladder, which is invariant to the initial conditions of every CDBSP. In the next section. we show how the rungs of this ladder provide the essential information to enable the exact topology of the connectivity of the regions of real motion in any CDBSP to be found, once its constant COhas been computed. The authors suggest naming it Szebehely's ladder after Professor Victor Szebehely (1921-1997), the renowned celestial mechanician and cherished teacher. The authors also proffer the suggestion that the constant CO= E c2 G M" be named Szebehely's Constant.
+
The Caledonian symmetrical double binary problem
8
319
Climbing the rungs of Szebehely’s Ladder
We now show how the constant COcan be used with Szebehely’s ladder to determine the topology of the surfaces of separation in the space p1,p2 and p12. For simplicity we shall consider the case of equal masses ( p = 1). Note first that in the quadratic solutions given by equations (42) and (46) the roots are real, single or complex depending upon whether COis less than, equal to or greater than C for equation (42) or C‘ for equation (46). Consider the C functions for p = 1 shown in Figure 5a. In this case, there are only two rungs on the ladder. The lower rung arises from the minimum of equation (SO), over the range 0 5 y 5 1. Its value is Cmmin= 29.314. Given a value of CO,equations (59) to (61) enable the projections of the minima in the plpz-plane to be found for all values of y. The upper rung arises from the minimum of equation (52). Its value is Cemin= 46.4. Given the same value of CO,equations (51) to (53) enable the projections of the extreme (maximum) extensions in the plpz-plane to be found for all values of y. Because of the symmetry existing when p = 1, the resulting diagrams are symmetric about the line p1 = pz (y = 1). We distinguish the cases (1) CO< Cmmin, (2) CO= Cm,in, (3) Cm,in < CO< Cemin, (4) CO= Cemin, (5) CO> Cemin.
Figure 6. (a) Projections in the plp2-plane of the m i n i m a and extreme extensions for CO< Cmmin.(b) Corresponding region of real m o t i o n in the plzp-plane. Case (1) CO< Cmmin (Figure 6)
In case ( l ) , the roots of both equations (51) and (59) are real. Recall that a value of y defines a straight line through the origin in the plpz plane. Consider firstly the extreme projection curve arising from equation (51). Each value of y will give two points in the plpz-plane that lie on this curve. Figure 6(a) shows that on the pz < p1 side the projections of real motion are bounded by two curves A, and Be. By symmetry the curves A: and BL form the equivalent on the other side of the line p1 = p2. The shaded area therefore is the projection of the region of real motion in the plh-plane. All real motion must therefore occur in the p1, p ~ plz , space above the shaded area. In Figure 6(b) we show the shaded region of real motion in the pl~p-plmeof symme try where p = fip1 = fib. It therefore gives additional information on the form of the region of real motion in the three dimensional space p l , pz, p12. We note that the
320
Bonnie A Steves and Archie E Roy
boundaries QK and PH for the region of real motion connecting the upper and lower segments of real motion project in Figure 6(a) onto the points K and H. As expected, the curves E,,,, EL, F, and F& indicating the minima extension of the region of real motion from the plane of symmetry and the line p12 = pl, p2 = 0 and the line p12 = p2, p1 = 0 lie within the projection area of the region of real motion. When only the energy integral was considered, the region of real motion was found to form four tubes connected to a volume that included the origin. Use of the Sundman inequality produces a region of real motion very similar to that shown in Figure 4 (the zero velocity surface). The major difference involves the inclusion of a small region of imaginary motion in the vicinity of the origin. It forms a tube of imaginary motion which curls from the footprint given by curve Be (Figure 6(b)) before curling down again symmetrically to its footprint given by the curve B: (Figure 6(a)). The four tubes and their connectivity still exist and far from the origin, each tube of real motion involves one distinct possible hierarchical arrangement of the four bodies. Because of the connectivity of the four tubes in Figure 6, each hierarchical arrangement is free to evolve into any of the other three. In this case there is therefore no restriction on hierarchical evolution.
Case (2) C0 = Cmmin (Figure 7)
In this case equation (51) has two real roots, but equation (59) has a double point real root. The resulting situation in the ρ1ρ2-plane and in the ρ12ρ-plane is shown in Figure 7. In Figure 7(a), curves Em, Em', Fm and Fm' all meet at the point D, the projection of the point D' in Figure 7(b). Direct connection between the upper and lower tubes is about to be lost.

Case (3) Cmmin < C0 < Cemin (Figure 8)
In this case equation (51) has two real roots, but equation (59) now has complex roots. This situation, shown in Figure 8, is an intermediary phase where direct connection between the upper and lower tubes has been lost, but connection still exists between each of these tubes and the side wall tubes (see Figure 8(b)). In principle no tube is yet completely separated from any of the other three, though the tube of imaginary motion has now joined itself to the region of imaginary motion between the upper and lower tubes of real motion. Evolution from one hierarchical arrangement into any of the other three is theoretically possible, with the restriction, however, that a hierarchical arrangement consisting of a pair of binaries must first evolve into a hierarchical arrangement of a binary and two single stars before evolving into a hierarchical arrangement consisting of a different pair of binaries.

Case (4) C0 = Cemin (Figure 9)
In this case equation (51) now has a double point real root, with equation (59) continuing to have complex roots. Figure 9(a) shows that in this situation curve Ae meets curve Be at the point K, with curve Ae' simultaneously meeting curve Be' at the point K'. At this value of C0, connection between the plane of symmetry tubes and the side wall tubes is about to be lost.
Figure 7. (a) Projections in the ρ1ρ2-plane of the minima and extreme extensions for C0 = Cmmin. (b) Corresponding region of real motion in the ρ12ρ-plane.
Figure 8. (a) Projections in the ρ1ρ2-plane of the minima and extreme extensions for Cmmin < C0 < Cemin. (b) Corresponding region of real motion in the ρ12ρ-plane.
Figure 9. (a) Projections in the ρ1ρ2-plane of the minima and extreme extensions for C0 = Cemin. (b) Corresponding region of real motion in the ρ12ρ-plane.
Figure 10. (a) Projections in the ρ1ρ2-plane of the minima and extreme extensions for C0 > Cemin. (b) Corresponding region of real motion in the ρ12ρ-plane.

Case (5) C0 > Cemin (Figure 10)

In this case both equations (51) and (59) now have complex roots. All connections between the tubes of real motion have been lost (Figure 10). From the hierarchical evolution point of view, each of the four possible hierarchies given in Figure 4 must remain for all time, with no transition possible between any two of them. Thus, absolute hierarchical stability is ensured for all CSDBP systems with a value of C0 > Cemin.
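The five cases above lend themselves to a trivial computational check. The following minimal sketch (Python; our own illustration, not part of the original paper, with function and constant names of our choosing) classifies a given value of C0 against the two equal-mass rungs quoted above and reports the corresponding freedom of hierarchical evolution.

C_M_MIN = 29.314   # lower rung for mu = 1 (minimum of the "minima" C function)
C_E_MIN = 46.4     # upper rung for mu = 1 (minimum of the "extreme extension" C function)

def classify_c0(c0):
    """Return the case number (1-5) and the allowed hierarchical evolution
    for the equal-mass CSDBP ladder described in the text."""
    if c0 < C_M_MIN:
        return 1, "all four tubes connected: any hierarchy may evolve into any other"
    if c0 == C_M_MIN:
        return 2, "direct connection between upper and lower tubes about to be lost"
    if c0 < C_E_MIN:
        return 3, "evolution possible only via a binary plus two single stars"
    if c0 == C_E_MIN:
        return 4, "connection to the side wall tubes about to be lost"
    return 5, "all tubes disconnected: absolute hierarchical stability"

print(classify_c0(35.0))   # example: falls in case (3)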
9
Szebehely’s Constant
Szebehely's constant C0 is, by its form, a function of the starting conditions. We had

C0 = c² E0 / (G² M⁵),
where E0 is the negative of the energy, c² is the square of the angular momentum, G is the constant of gravitation and M is the mass of each of the larger two masses. By a suitable choice of units for mass, distance and time we can set G = 1. Now μ = m/M and α = a/b, where m is the mass of each of the two smaller masses, and α is the ratio of (i) the initial separation a of P1 from P2 and of P3 from P4 and (ii) the
initial distance b of the centre of mass of each pair of bodies from the system's centre of mass (see Figure 1). Then at t = 0, in the double binary initial collinear arrangement, the starting conditions are given by a, b, e, e1, i, with M and m as the different mass values. The quantities e and e1 are respectively the eccentricities of P1 and P2's initial relative orbit and of the centres of mass C12 and C34's initial relative orbit. The inclination i is the initial inclination of the orbital plane of P1 and P2 to that of C12 and C34. Now
E0 = E0(M, μ, b, α, e, e1, i),
c² = c²(M, μ, b, α, e, e1, i),
and it is found that C0 takes the form
C0 = C0(μ, α, e, e1, i),    (68)
both M and b disappearing from C0. Therefore C0 is a function of only five parameters. But C0 is a constant of the motion, so that the choice of a value for C0 gives a five-dimensional surface for equation (68) relating the five parameters. Various strategies may be adopted for simplifying the study of this function giving C0. (1) If the initial two-body orbits are circular and coplanar, C0 = C0(μ, α).
(2) If the bodies are of equal mass, we have the even more trivial case of C0 = C0(α). (3) If a number of values of α and μ are taken, then a set of surfaces for each pair of values of α and μ can be found, viz. C0 = C0(e, e1, i), giving a surface that may be plotted in three dimensions.
(4) The case where the initial conditions are of a symmetric four-body system with both the outside bodies P1 and P4 initially in orbits about the other two, P2 and P3, which initially form a binary about the system's centre of mass, may obviously also be processed to provide C0 as a function of five parameters. Further exploration of these relations among the initial conditions will be the subject of a future paper. Obtaining the values r1, r2 and r12 of any particular four-body system is trivial and consists of applying an inverse transformation of equation (32) to the quadratic solutions ρ1, ρ2 and ρ12 found in Sections 6 and 7. Finally, the C0 constant which is used in the present symmetrical four-body model (CSDBP) is obviously closely related to the constant occurring in the past twenty years in many studies of the general three-body problem. It now appears that it is of wider application than previously thought.
10
Conclusions
The study by Loks and Sergysels (1985, 1987) of zero velocity hypersurfaces in the general planar four-body problem obtained hypersurfaces which defined regions of the five-dimensional space where motion was allowed to take place. Hyperplanes were shown to exist corresponding to singularities in the potential, i.e. collisions between the bodies; it was also shown that the hypersurfaces were symmetric with respect to a particular plane. In the present study using the CSDBP, the symmetry condition enables a three-dimensional representation of the surfaces of zero velocity and separation to be obtained. It may be noted that many of the features of the general four-body problem found by Loks and Sergysels exist in the present study but are more amenable to visualisation. Additionally, the ability of the Caledonian problem to utilise a large number of initial parameters and still preserve symmetry enables a large family of such models to be studied. It raises the hope that this family of restricted four-body models will have the potential to
play an analogous role in the general four-body problem to that played by the restricted three-body model in gaining insight into the general three-body problem.
As expected, the introduction of the square of the angular momentum, c², and the use of Sundman's inequality produces a picture of the surfaces of connectivity showing highly significant differences from that obtained by using the energy alone. In particular, for a given value of the mass ratio μ, it is possible to compute the four rungs of a ladder whose heights (C values) are invariant to the set of initial conditions of any CSDBP with that value of μ. The place on the ladder, not necessarily on any of the rungs, of the constant C0, a function of the initial conditions of the particular four-body system under consideration, immediately gives the complete topology of the connectivity of the surfaces of separation for that problem. The identification of the tube of real motion in which the given four-body system resides at t = 0 then enables statements to be made regarding its ability or otherwise to change its hierarchical arrangement. If it can change its hierarchical arrangement, the mode of change can probably be predicted, giving the possible hierarchical arrangements it is free to evolve into. The present authors hope to give, in a future paper, a study of hierarchical stability using the present model. There is at least one major question to be answered. Given an initial departure from the symmetry of the Caledonian symmetrical double binary problem, either in one of the masses, or in one of the initial velocities, or in a difference in the separation of the two components of each binary, for how long is the CSDBP capable of predicting the behaviour of the perturbed model? Historically, the surprising usefulness of the essentially unreal restricted circular three-body model in real system exploration may hopefully be repeated in real system four-body studies that at least approximate to the CSDBP.
Acknowledgments

We would like to thank Mr Andras Szell, Mr Iain Hannah and Dr Peter Osborne for their useful comments on the paper and their invaluable advice and help with the creation of some of the diagrams.
References

Ahmad A, 1995, Bull Astr Soc India 23 165
Arazov G T, 1975, Sov Astron Lett 1 153
Barkham P G D, Modi V J and Soudack A C, 1977, Celest Mech 15 5
Boccaletti D and Pucacco G, 1996, Theory of Orbits 1: Integrable Systems and Non-perturbative Methods, Springer-Verlag
Cronin J, Richards P B and Bernstein I S, 1968, Icarus 9 281
Cronin J, Richards P B and Russell L H, 1964, Icarus 3 423
Eckstein M C, 1963, Astron J 68 535
Ferraz-Mello S, 1994, Astron J 108 2330
Froeschlé Ch and Scholl H, 1987, Astron Astrophys 179 294
Hadjidemetriou J D, 1980, Celest Mech 21 63
Hadjidemetriou J D and Michalodimitrakis M, 1981, Astron Astrophys 93 204
Huang S, 1960, Astron J 65 347
Kolenkiewicz R and Carpenter L, 1967, Astron J 72 180
Llibre J and Pinol C, 1987, Astron J 93 1272
Loks A and Sergysels R, 1985, Astron Astrophys 149 462
Marchal C and Saari D G, 1975, Celest Mech 12 115
Matas V, 1968, Bull astr Inst Csl 19 354
Matas V, 1969, Bull astr Inst Csl 20 322
Matas V, 1970, Bull astr Inst Csl 21 139
Matas V, 1971, Bull astr Inst Csl 22 72
Michalodimitrakis M and Grigorelis F, 1989, J Astrophys Astr 10 347
Moulton F R, 1910, Ann Math 12 1
Nesvorny D and Ferraz-Mello S, 1997, Astron Astrophys 320 672
Palmore J I, 1973, Bull Amer Math Soc 79 904
Palmore J I, 1975a, Bull Amer Math Soc 81 489
Palmore J I, 1975b, Lett Math Phys 1 71
Palmore J I, 1976, Ann of Math 104 421
Roy A E, 1988, Orbital Motion, Adam Hilger, Bristol, 3rd ed
Roy A E and Ovenden M W, 1955, Mon Not Roy Astron Soc 115 296
Roy A E and Steves B A, 1998, Planet Space Sci 46 1475
Roy A E and Steves B A, 2001, Celest Mech, accepted for publication
Sergysels R and Loks A, 1987, Astron Astrophys 182 163
Simó C, 1978, Celest Mech 18 165
Scheeres D J, 1998, Celest Mech and Dyn Astr 70 75
Steves B A and Roy A E, 1998, Planet Space Sci 46 1465
Thanos D A, 1989, Astron J 97 1220
Wiesel W, 1980, Celest Mech 21 265
Zare K, 1976, Celest Mech 14 73
Zare K, 1977, Celest Mech 16 35
Zhang S P and Innanen K A, 1988, Astron J 96 1983
The Fast Lyapunov Indicator
Detection of the Arnold web for Hamiltonian systems and symplectic mappings with 3 degrees of freedom
C Froeschlé, M Guzzo and E Lega
Observatoire de Nice, France
1
Introduction
It is well known that the long-term behaviour of a mechanical system is in general unpredictable. In the framework of Hamiltonian systems, a remarkable exception corresponds to those systems which are integrable in the sense of Liouville-Arnold. In these systems the phase-space is fibrated by invariant tori, and on each invariant torus all motions are quasi-periodic with the same frequencies ω1, ..., ωn, where n is the number of degrees of freedom. Though Liouville-Arnold integrability is a very special property, many mechanical systems of great interest are integrable. Among these, we quote the Euler-Poinsot rigid body, the two-body problem, and the Birkhoff normal forms around elliptic equilibria truncated at suitable order. Many interesting problems of physics, like for instance those arising in Celestial Mechanics, can be represented as small perturbations of integrable systems. In general, a whatsoever small perturbation breaks the integrability of the system. Consequently, the behaviour of the solutions can be rather complex and unpredictable, to such an extent that it is generically called chaotic. Small perturbations of integrable systems transform them into quasi-integrable systems, and their study is the subject of Hamiltonian Perturbation Theory. One of the most celebrated results of Hamiltonian Perturbation Theory is the KAM theorem (Kolmogorov 1954, Moser 1958, Arnold 1963), which applies if the perturbation is smooth (i.e. analytic), suitably small, and the integrable approximation of the system satisfies a non-degeneracy property (the non-degeneracy condition requires essentially that the set of invariant tori can be locally labelled by means of the frequencies on each torus; another possible condition, independent from the previous one, is the so-called isoenergetic non-degeneracy condition, see Arnold 1963). The KAM theorem establishes
that on a large volume set of initial conditions, which we call the regularity set, the features of the motions of the system are essentially those of the integrable approximation: in the regularity set, motions occur on invariant tori, and on the same torus all motions are quasi-periodic with the same frequencies. More precisely, the KAM theorem proves that for any invariant torus of the original system with strongly non-resonant frequencies (more precisely Diophantine¹), there exists an invariant torus in the regularity set which is a small deformation of the unperturbed one. Conversely, nothing is predicted for the initial conditions in the set made of invariant unperturbed tori with frequencies which satisfy a resonance condition Σi kiωi = 0, with some integers k1, ..., kn ∈ Zⁿ\0, within a precision which increases with the order Σi |ki|. Therefore in such a set², which is called the Arnold web, the motions of the system can exhibit chaotic features.
The topology of the Arnold web is a peculiar one. To describe it we resort to the frequency space ω1, ..., ωn. In this space, the Arnold web projects on all hyperplanes Σi kiωi = 0 with a neighbourhood which decreases with the order Σi |ki|. Therefore, it is open, dense and, if the perturbation is suitably small, it has a small relative measure.
Though the structure of the Arnold web was clearly explained already in Arnold's 1963 articles, only quite recently have researchers numerically investigated its existence, both in model systems (Laskar 1993) and in physically interesting systems. In this regard, we stress that in different fields of physics the question of stability of quasi-integrable Hamiltonian systems in the sense of the KAM theorem has been considered of crucial importance: it has several implications in the problem of beam-beam interactions (Month and Herrera 1979), of asteroid diffusion (Nesvorny and Ferraz-Mello 1997) and of galactic models (Papaphilippou and Laskar 1998). All these works have been based on numerical applications of the frequency map analysis (Laskar et al. 1992). The importance of this kind of numerical check is not only explanatory or didactic. In fact, the KAM theorem predicts the regularity of the orbits with initial conditions in the regularity set, while the rigorous proof of the existence of instability and irregularity in the Arnold web is a delicate, not completely solved problem. In this article we give a graphical representation of the Arnold web, obtained with a numerical test of regularity of the solutions of the system, with an accuracy which, to our knowledge, has never been published before. As we will show below, the pictures of the Arnold web for different perturbations of the same system clearly show that for very slightly perturbed systems the web seems to have indeed the described structure, while increasing the strength of the perturbation the regular set shrinks until it almost completely disappears. In this way, the evolution from a mostly ordered system to a largely chaotic one is clearly represented, and it turns out to be in complete agreement with theoretical representations. The paper is organised as follows: in Section 2 we introduce the Hamiltonian model and the symplectic mapping that we have studied and we describe the expected phenomenology of the motions in the Arnold web. In Section 3 we recall the definition of the Fast Lyapunov Indicator and we give a simple example of applications on some characteristic orbits of the mapping. Results on the Arnold web are presented in Section 4. Conclusions are provided in Section 5.
¹ω1, ..., ωn are said to be Diophantine if there exist positive constants γ, τ such that |Σi kiωi| > γ/|k|^τ, with |k| = Σi |ki|, for all k = (k1, ..., kn) ∈ Zⁿ\0. The Diophantine condition considered in the KAM theorem requires τ > n − 1 and a γ which suitably re-scales with the strength of the perturbation.
²More precisely, all frequencies which are not Diophantine.
2

A model problem

2.1

The Hamiltonian model
I: 1; H, = - + - + I 3 + € 2
1
cos 41
2
+ cos 4 2 + cos 4 3 + 4
where IllI z , I3 E R and 42,43 E S are canonically conjugate variables and E is a small parameter. The canonical equations of the integrable Hamiltonian Ho are trivially integrated: 11,12,I3 stay constant while the angles increase linearly with time according to: $ l ( t ) = & ( O ) Ilt, 4 2 ( t ) = $2(0) 12t and 43(t)= 43(0)+t . Therefore, each couple of actions 11,12characterises an invariant torus T 3 and all motions on the considered torus are quasi-periodic with frequencies: w1 = I1, w2 = 1 2 , w3 = 1. Conversely, for any whatsoever small E different from zero, H , is not expected to be integrable. However, we expect that KAM theorem applies, and consequently the phase-space is filled by a large volume of invariant tori, surrounded by the Arnold web. Our goal is to determine numerically the structure of the Arnold web. Before describing the numerical indicator used to discriminate between regular and chaotic orbits, we remark that the Arnold web can be conveniently represented in the two-dimensional plane 11,12. Indeed, each point on this plane individuates univocally the frequency of an unperturbed torus. Moreover, all resonances klwl k2wz k3w3 = 0 are represented by straight lines k l l l k212 IC3 = 0. Of course, the set of all resonances is dense on the plane. However, one can expect that irregular orbits surround each resonance line up to a distance which decreases as & / l k l T , and consequently the volume of the Arnold web is expected to be as small as &.
+
+
+
+
+
+
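For readers who wish to experiment with this model, the following sketch (ours, not the authors' code) implements one kick-drift-kick leap-frog step for the Hamiltonian of Equation (1), under the assumption of a standard splitting with the kick acting on the actions; it uses ∂/∂φi [ε/(cos φ1 + cos φ2 + cos φ3 + 4)] = ε sin φi/(cos φ1 + cos φ2 + cos φ3 + 4)².

import math

def leapfrog_step(I, phi, eps, dt):
    """One kick-drift-kick step for H_eps of Equation (1); I and phi are 3-element lists."""
    I = list(I)
    phi = list(phi)

    def kick(h):
        # dI_i/dt = -dH/dphi_i = -eps*sin(phi_i)/S**2, with S = cos(phi1)+cos(phi2)+cos(phi3)+4
        S = math.cos(phi[0]) + math.cos(phi[1]) + math.cos(phi[2]) + 4.0
        for i in range(3):
            I[i] -= h * eps * math.sin(phi[i]) / S ** 2

    kick(dt / 2)
    # drift: dphi/dt = dH0/dI = (I1, I2, 1)
    phi[0] = (phi[0] + dt * I[0]) % (2 * math.pi)
    phi[1] = (phi[1] + dt * I[1]) % (2 * math.pi)
    phi[2] = (phi[2] + dt) % (2 * math.pi)
    kick(dt / 2)
    return I, phi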
We now describe very quickly the expected phenomenology of the motions with initial conditions in the Arnold web. Within resonances, both chaotic and regular motions can be observed. Regular resonant motions are topologically different from the regular non-resonant ones because they are quasi-periodic with a number of frequencies which is strictly smaller than the number of degrees of freedom. Islands of regular resonant orbits can be surrounded by chaotic zones. However, orbits with initial conditions in such chaotic regions do not diffuse in the action space, thanks to the famous Nekhoroshev theorem (Nekhoroshev 1977), which applies if ε is small and some non-degeneracy condition is satisfied (in particular satisfied by the Hamiltonian of Equation 1). (Actually, only diffusion with a velocity exponentially small with respect to 1/ε can be expected.) The picture drastically changes by increasing the perturbation parameter. As already remarked, when ε is high enough the global volume of resonances does not leave any place for invariant tori. In this case the dynamics is no longer controlled by the Nekhoroshev theorem. To describe it we resort to the well known Chirikov (Chirikov 1979) overlapping criterion, which allows the resonant chaotic orbits to go from one resonance to the other, possibly giving rise to large scale diffusion. As a global picture, all the action space seems to be constituted by a large-volume chaotic region with some robust resonant islands in it.
2.2

The mapping model
For the application to the discrete case we consider the following mapping (Froeschlé and Lega 2000):
x(t+1) = x(t) + ε1 sin(x(t) + y(t)) + b[ sin(x(t) + y(t) + z(t) + t(t)) + sin(x(t) + y(t) − z(t) − t(t)) ]   (mod 2π)
y(t+1) = y(t) + x(t)   (mod 2π)
z(t+1) = z(t) + ε2 sin(z(t) + t(t)) + b[ sin(x(t) + y(t) + z(t) + t(t)) − sin(x(t) + y(t) − z(t) − t(t)) ]   (mod 2π)
t(t+1) = t(t) + z(t)   (mod 2π)    (2)
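A direct transcription of mapping (2) might read as follows (our sketch, not the authors' code; the fourth angle, written t(t) in the text, is renamed w here to keep it distinct from the discrete time).

import math

TWO_PI = 2 * math.pi

def coupled_map_step(x, y, z, w, eps1, eps2, b):
    """One iteration of the coupled standard map of Equation (2)."""
    s_plus = math.sin(x + y + z + w)
    s_minus = math.sin(x + y - z - w)
    x_new = (x + eps1 * math.sin(x + y) + b * (s_plus + s_minus)) % TWO_PI
    y_new = (y + x) % TWO_PI
    z_new = (z + eps2 * math.sin(z + w) + b * (s_plus - s_minus)) % TWO_PI
    w_new = (w + z) % TWO_PI
    return x_new, y_new, z_new, w_new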
This mapping, which has been obtained by adding some harmonics in the coupling term of the original 4-dimensional standard map (Froeschlé 1971, 1972), corresponds to the leap-frog integrator of the following Hamiltonian:
with εi = aiΔt², i = 1, 2, and b = BΔt², Δt being the integration step.
As we have described above for the Hamiltonian case, Nekhoroshev's theorem provides a hint about the "practical" stability of the dynamical system. The situation is less clear in the case of symplectic maps. However, as we will show below, the numerical experiments show structure and evolution very similar to those arising in Hamiltonian systems (Guzzo et al. 2000). Because of these reasons Froeschlé and Lega (2000) used the appellation Nekhoroshev-like regime for the study of a map. For the sake of simplicity, we will write here Nekhoroshev regime for both the Hamiltonian and the symplectic case.
3
The Fast Lyapunov Indicator (FLI)
This kind of work requires tools for a very sensitive analysis of a lot of orbits. The classical tool for discriminating between chaotic and ordered orbits is the largest Lyapunov exponent. Recall that, under some suitable regularity conditions, the Lyapunov exponents are computed by integrating the equations of motion and the variational equations:

dx/dt = F(x),   dv/dt = (∂F/∂x)(x(t)) v,    (4)

where v is any n-dimensional vector. The largest Lyapunov exponent is defined in such a way that, unless v(0) belongs to some lower dimensional linear spaces, the quantity ln ||v(t)||/t tends to it for t going to infinity. If Equation (4) is of Hamiltonian type, or if we have a symplectic map, and if the motion is regular, then the largest Lyapunov exponent is zero, otherwise its value is positive.
Figure 1. Variation of the Fast Lyapunov Indicator with time for four orbits of the standard map with ε = 0.3. The upper curve is for a slow chaotic orbit with initial conditions x(0) = 0.00001, y(0) = 0, the second one is for a non-resonant orbit with x(0) = 0.5, y(0) = 0 and the third one is for a regular resonant orbit with x(0) = 0, y(0) = 1. The curve with quasi-constant FLI is obtained for the periodic orbit, of order 2, of initial conditions x(0) = π, y(0) = 0.

This property has been largely used to discriminate between chaotic and ordered motion. However, among regular motions the Lyapunov exponent does not enable the distinction between KAM tori and resonant islands. If instead of computing the quantity ln ||v(t)||/t we just look at the variation of ln ||v(t)|| with time, then we can distinguish not only chaotic motion but also resonant and non-resonant regular motions (Froeschlé and Lega 2000). The value of ln ||v(t)|| for a given t is called the Fast Lyapunov Indicator (hereafter called FLI, Froeschlé et al. 1997, Lega and Froeschlé 1997). Figure 1 shows the evolution of the FLI as a function of time for 4 orbits of the 2-dimensional standard map (Froeschlé 1970, Lichtenberg & Lieberman 1983):
M:  x(t+1) = x(t) + ε sin(x(t) + y(t))   (mod 2π)
    y(t+1) = y(t) + x(t)   (mod 2π)    (5)
The perturbation parameter considered is ε = 0.3. For this low value of the perturbing parameter the majority of orbits are invariant tori (Figure 2). Some resonant curves surround the elliptic point (0, −π) = (0, π). A small chaotic zone recalling the separatrix of the pendulum is generated by the existence of the hyperbolic point at the origin. The upper curve in Figure 1, with initial conditions (0.00001, 0) in the chaotic zone just described, shows the variation of the FLI with time for a well confined chaotic orbit. Despite the fact that the chaotic zone is very small, such a curve shows an exponential
a.0.3
3
7
2. I
1-
0I
I -
I I ,
2-1
I I
33
-1
0
1
2
X
Figure 2. A set of orbits of the standard map for E = 0.3.
increase of the FLI with time. The lowest curve with quasi-constant FLI corresponds to the periodic orbit of order 2, with initial conditions ( 7 r , O ) . The peculiar behaviour of the FLI for periodic orbits is studied in Lega and FroeschlC (2000). The intermediate curve corresponds to a regular invariant torus of initial conditions (0.5,O) and the curve nearby corresponds to a regular resonant curve of initial conditions ( 0 , l ) . The oscillations are due to the distortion of the orbits. This fact, does not prevent the distinction between the two cases and the same is true for the Hamiltonian case. For a finer study of the phase space, like for instance the study of the set of small resonances of Fibonacci around the golden torus, Lega and FroeschlC (2000) have recently introduced a slightly modified definition of the FLI. Considering the S U ~ ~ log < ~1 Iv(k)/ < ~ 1 they obtain the same fundamental results avoiding the oscillations due to the distortions of orbits which may prevent to distinguish correctly the dynamics of very close orbits. In recent years other tools of analysis have been introduced: the frequency map analysis (Laskar et al. 1992, Laskar 1993), the sup-map analysis (Laskar 1990, FroeschlC and Lega 1996) and the twist angle (Contopoulos and Voglis 1997,FroeschlC and Lega 1998). The twist angle is particularly suited for two dimensional area-preserving map while the frequency map analysis and the sup-map analysis are less sensitive (FroeschlC and Lega 2000) and therefore more expensive in computational time. As far as chaos is concerned we remark that the FLI is defined for each orbit and provide quantitative informations about the strength of chaos. Moreover, the advantage of this method is to be directly related to the definition of chaos. and that is why the transition from the Nekhoroshev to Chirikov regime that we will describe in next section appears in such a spectacular way.
4
Detection of the Arnold web: graphical evolution.
In this section we show how the Arnold web evolves in the transition from order to chaos for both the Hamiltonian model and the symplectic mapping. For the Hamiltonian given in Equation 1 we have computed the FLI, using a leap-frog symplectic integrator, on a 500 × 500 grid of initial conditions regularly spaced in the action space (the unimportant choice of initial angles was φ1 = φ2 = φ3 = 0). A delicate role in the method is played by the initial choice of the tangent vector. Indeed, it turns out that resonances which are aligned with the action components of the initial tangent vector are not detected by the method. In order to avoid the loss of resonances parallel to the initial vector we have chosen these two components to be in a strong irrational ratio; the other components play a minor role, and we have chosen (1, 1). Results are presented in Figure 3 (Froeschlé et al. 2000). On each of the pictures, the initial conditions I1, I2 are associated to the corresponding FLI value using a grey scale. The lowest values of the FLI appear in black and they correspond to the resonant islands of the Arnold web, while the highest values appear in white and they correspond to chaotic motion arising at the crossing nodes of resonant lines or near the separatrices. The FLI of all the KAM tori have approximately the same value, called the reference value, and therefore they appear with the same grey colour. Thanks to these characteristics, the resonant lines appear very clearly, embedding large zones filled with KAM tori (in particular in Figure 3a and in the enlargement shown in Figure 3b). The choice of the grey scale is suited to the value of the perturbation parameter and to the integration time. Due to the choice of the perturbation with a full Fourier spectrum, i.e. all harmonics are present at order ε, a high number of resonances is already visible at small ε (Figure 3a,b; in principle all resonances should appear just by increasing the integration time). Instead in Figure 3c,d, which refer to ε = 0.01, the volume of invariant tori decreases, but the system is still in the Nekhoroshev regime. In these figures the chaotic regions become evident at the crossing of resonances, following the Nekhoroshev theorem. In Figure 3e,f, which refer to ε = 0.04, it appears very clearly that the dynamical regime has completely changed. As expected, the majority of invariant tori has disappeared due to resonance overlapping, and a big chaotic connected region has replaced the regularity set. We can say that the transition from Nekhoroshev's to Chirikov's regime occurs in the interval 0.01 < ε < 0.04.
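The grid computation just described can be organised as in the following schematic loop (our sketch); fli_of_orbit stands for an FLI routine built on a leap-frog integration of Equation (1) plus its variational equations, and is an assumed helper rather than the authors' code, as are the action window and default parameters.

import numpy as np

def fli_chart(fli_of_orbit, n_grid=500, eps=0.01, n_steps=1000,
              I1_range=(0.0, 0.5), I2_range=(0.0, 0.5)):
    """Scan a regular grid of initial actions (I1, I2) and store the FLI of each orbit;
    the array can then be rendered on a grey scale as in Figure 3."""
    I1_values = np.linspace(*I1_range, n_grid)
    I2_values = np.linspace(*I2_range, n_grid)
    chart = np.empty((n_grid, n_grid))
    for i, I1 in enumerate(I1_values):
        for j, I2 in enumerate(I2_values):
            chart[i, j] = fli_of_orbit(I1, I2, eps, n_steps)
    return chart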
For the model of coupled standard maps (Equation 2) we have computed the FLI on a grid of 500 × 500 initial conditions regularly spaced on the x and z axes. For the initial angles we have considered y = t = 0. The initial vector has components w1(0) = 0.5(3 − √5), wi = 1, i = 2, ..., 4. The number of iterations for the mapping is 1000. The parameters of the two mappings are ε1 = 0.4, ε2 = 0.3 and the coupling parameter b varies from b = 0.01 to b = 0.05. In Figure 4 and Figure 5 the FLI values are represented on a grey scale. For b = 0.01 in Figure 4 a lot of points are grey, i.e. the corresponding FLI is about 3, which is the reference FLI value for the tori. A white band on the left hand side appears, corresponding to the chaotic zone generated by the hyperbolic point at the origin. The dark zones correspond to resonant regular orbits. The problem of detection of the transition interval for passing from one regime to the other is more complicated than for perturbed Hamiltonian systems. When increasing the perturbation parameter the chaotic zone becomes larger and larger at the crossing of resonant lines, many more resonant lines appear and the volume occupied by the tori
Figure 3. FLI values computed on a grid of 500 × 500 initial conditions regularly spaced in the action axes I1 and I2 for increasing values of the perturbation parameter ε. The other initial conditions are I3 = 1, φ1 = 0, φ2 = 0, φ3 = 0. Left column: a global picture; right column: enlargement around the resonance centred on I1 = 1/3, I2 = 1/6. The transition from Nekhoroshev's to Chirikov's regime occurs in the interval 0.01 < ε < 0.04. Low FLI values are black (≈2.5); high FLI values are white (≈4.5).
Figure 4. Graphical evolution of the Arnold web, for the standard map of Equation 2, for increasing values of the coupling parameter b.

shrinks to zero (Figure 4). The transition from Nekhoroshev to Chirikov, represented by the increase of the white zone, seems to occur in the interval 0.02 < b < 0.05. Figure 5 shows the transition between the two regimes in a local zone of the phase space which for b = 0.01 is almost completely grey, i.e. filled by invariant tori, but streaked with thin resonant lines. It is interesting to note that for increasing b we have locally the same evolution of the grey scale topology as in the global case. We know that this is indeed the case for Hamiltonian systems fulfilling the hypothesis of the Nekhoroshev theorem.
5
Conclusion
We have shown, using a very simple numerical tool (the FLI), (a) the structure of the Arnold web in a Hamiltonian system and in a symplectic mapping and (b) the transition from an ordered to a chaotic system which occurs as the perturbation parameter increases.
Figure 5. Graphical evolution of the Arnold web, for the standard map of Equation 2, for an enlargement of a small zone of the phase space of Figure 4.

The main reason for obtaining so easily the Arnold web is due to the fact that the FLI not only distinguishes between chaotic, even slow chaotic, and regular motion, but also between regular resonant motion and regular non-resonant motion. Using the two-dimensional standard mapping, Froeschlé and Lega (2000) have investigated the behaviour of the FLI for the regular cases. Actually, in both regular resonant and non-resonant dynamics, the tangent vector grows linearly with time, but with a different speed. This fact is explained by studying (Froeschlé and Lega 2000) the differential rotation inside a chain of islands, which turns out to be lower than between tori, with a derivative which becomes close to zero towards the centre of the islands. For the particular case of periodic orbits, the FLI grows linearly with time for a time proportional to the logarithm of the order of the orbit and then becomes quasi-constant. This peculiar behaviour has been studied, again on the two-dimensional standard map, by Lega and Froeschlé (2000) introducing a model based on linear elliptic rotation. Although the results have been obtained up to now on 'toy' models, they are so encouraging that we intend to use the method for applications to dynamical astronomy, like problems of celestial mechanics or of galactic dynamics.
References

Arnold V I, 1963, Proof of a theorem by A N Kolmogorov on the invariance of quasi-periodic motions under small perturbations of the Hamiltonian, Russ Math Surv, 18 9.
Chirikov B V, 1979, A universal instability of many dimensional oscillator systems, Phys Rep, 52 263.
Contopoulos G and Voglis N, 1997, A fast method for distinguishing between ordered and chaotic orbits, Astron Astrophys, 317 73.
Froeschlé C, 1970, A numerical study of the stochasticity of dynamical systems with two degrees of freedom, Astron Astrophys, 9 15.
Froeschlé C, 1971, On the number of isolating integrals in systems with three degrees of freedom, Astrophys and Space Sciences, 14 110.
Froeschlé C, 1972, Numerical study of a four-dimensional mapping, Astron Astrophys, 16 172.
Froeschlé C and Lega E, 1998, Twist angles: a fast method for distinguishing islands, tori and weak chaotic orbits. Comparison with other methods of analysis, Astron Astrophys, 334 355.
Froeschlé C and Lega E, 1996, On the measure of the structure around the last KAM torus before and after its break-up, Celest Mech and Dynamical Astron, 64 21.
Froeschlé C, Lega E and Gonczi R, 1997, Fast Lyapunov Indicators. Application to asteroidal motion, Celest Mech and Dynam Astron, 67 41.
Froeschlé C, Guzzo M and Lega E, 2000, Graphical evolution of the Arnold's web: from order to chaos, Science, 289 2108.
Froeschlé C and Lega E, 2000, On the structure of symplectic mappings. The Fast Lyapunov Indicator: a very sensitive tool, Celest Mech and Dynamical Astronomy, in press.
Guzzo M, Lega E and Froeschlé C, 2000, Stable and unstable chaos. Recent numerical tools for the transition from Nekhoroshev to Chirikov regimes, in preparation.
Hénon M, 1969, Numerical study of quadratic area-preserving mappings, Quarterly of Applied Mathematics, 27 291.
Hénon M and Heiles C, 1964, The applicability of the third integral of motion. Some numerical experiments, Astron J, 69 73.
Kolmogorov A N, 1954, On the conservation of conditionally periodic motions under small perturbation of the Hamiltonian, Dokl Akad Nauk SSSR, 98 524.
Laskar J, 1990, The chaotic motion of the Solar System. A numerical estimate of the size of the chaotic zones, Icarus, 88 266.
Laskar J, 1993, Frequency analysis for multi-dimensional systems. Global dynamics and diffusion, Physica D, 67 257.
Laskar J, Froeschlé C and Celletti A, 1992, The measure of chaos by the numerical analysis of the fundamental frequencies. Application to the standard mapping, Physica D, 56 253.
Lega E and Froeschlé C, 1997, Fast Lyapunov Indicators. Comparison with other chaos indicators. Application to two and four dimensional maps, in The Dynamical Behaviour of our Planetary System, edited by Henrard J and Dvorak R, Kluwer Academic Press.
Lega E and Froeschlé C, 2000, On the relationship between Fast Lyapunov Indicator and periodic orbits for symplectic mappings, Celest Mech and Dynamical Astronomy, in press.
Lichtenberg A J and Lieberman M A, 1983, Regular and Stochastic Motion, Springer, Berlin, Heidelberg, New York.
Month M and Herrera J H, 1979, Nonlinear dynamics and the beam-beam interaction, American Institute of Physics.
Moser J, 1958, New aspects in the theory of stability of Hamiltonian systems, Comm on Pure and Appl Math, 11 81.
Nekhoroshev N N, 1977, Exponential estimates of the stability time of near-integrable Hamiltonian systems, Russ Math Surveys, 32 1.
Nesvorny D and Ferraz-Mello S, 1997, On the asteroidal population of the first-order Jovian resonances, Icarus, 130 247.
Papaphilippou Y and Laskar J, 1998, Global dynamics of triaxial galactic models through frequency map analysis, Astronomy and Astrophysics, 329 451.
Determination of chaotic attractors in short discrete time series

A Celletti¹, C Froeschlé², I V Tetko³, A E P Villa⁴
¹Università di Roma, ²Observatoire de Nice, ³Academy of Sciences of Ukraine, ⁴Université de Lausanne.
1
Introduction
Discrete time series can represent the occurrences of either a deterministic or a random process. Dynamical system theory provides powerful techniques to assess whether a set of equations (in a suitable embedding space) underlies the dynamics. In this case the trajectory can be predicted whenever the initial conditions are known with absolute precision. On the contrary, a stochastic system is characterised by a complete unpredictability of the trajectories. Time series may be derived from mathematical models, either from mappings or from continuous models. The time series may also be provided by experimental data, derived, e.g., from astronomy, physics, medicine and biology. In particular, we present an analysis of neuro-biological data, where the discrete time series are obtained from the epochs of action potentials of nervous cells (i.e., spike trains). We refer to Babloyantz and Salazar (1985), Celletti and Villa (1996), Mpitsos et al. (1988), Rapp et al. (1985) for extensive applications of dynamical system methods to neurobiology. In recent years several techniques have been extensively developed to determine the deterministic or stochastic behaviour of a time series (Abarbanel et al. 1993, Boffetta et al. 1998, Celletti et al. 1999, Cellucci et al. 1997, Gao and Zheng 1993, Eckmann and Ruelle 1985, Kaplan and Glass 1992, Rapp et al. 1993, Sugihara and May 1990, Theiler et al. 1992). An exhaustive description of methods in nonlinear time series analysis is presented in Hegger et al. (1999) and Schreiber (1998). Beside the characterisation of the embedding space, topological and metric invariants can be determined. On the one hand, the method developed by Grassberger and Procaccia (1983) allows computation of the size of the attractor, i.e. the so-called correlation dimension. On the other hand, the computation of Lyapunov exponents quantifies the divergence of nearby trajectories, providing an analysis of the structure of the attractor (Damming and Mitschke 1993, Eckmann et al. 1986, Kantz 1994, Packard et al. 1980, Rosenstein et al. 1993, Wolf et al.
1985, Zeng et al. 1991). We devote Section 2 to the definitions of fractal dimensions; a review of the Grassberger and Procaccia method and of the basic techniques to compute Lyapunov exponents is presented in Sections 3 and 4, respectively.
A common hindrance of most methods is a severe constraint due to the necessity of having a sufficiently large number of points in the time series in order to avoid unreliable results due to poor statistics. During physical experiments long observations may be corrupted by drifts and non-stationarities which may lead to incorrect results. Therefore, the availability of long time series can be a serious limitation in the investigation of nonlinear dynamics in physical systems. We stress that the statistics required by standard investigation methods usually prevents the applicability of the algorithms in realistic situations. The development of methods able to distinguish the deterministic character of short time series becomes an important issue for future research in this field. A new method to provide information on the deterministic properties of time series
{xi}, i = 1, ..., K, with a significant but not too large number of points was presented in Celletti et al. (1999). In particular, this algorithm was applied to the 2-dimensional Hénon mapping taking K = 400 and to the Rössler system with K = 1000. In Section 5 we explore in more detail the method presented in Celletti et al. (1999), providing a large variety of applications to discrete and continuous systems, as well as to surrogate data (see Section 6). We also provide a validation for the choice of the parameters on which the method depends. Among the mathematical models we have investigated, we selected the mappings known as Hénon (and its extension in 4 dimensions), Kaplan-Yorke, Zaslavskii, Ikeda, Sinai and the continuous systems known as Lorenz, Rössler and the hyperchaotic Rössler attractor. The time series were constructed as the iterates of one variable with only K = 1000 points. In all cases the deterministic behaviour of the system was correctly detected. Moreover, we use the method presented in Celletti et al. (1999) to give an estimate of the maximum Lyapunov exponent (or Lyapunov characteristic estimate, hereafter LCE). We perform a comparison of the LCE with the classical numerical expectations. In order to explore the robustness of the method, we analyse the effect of additive, dynamical and experimental noise. The stochastic behaviour is already observed with noise levels of 5%. The results suggest a much higher sensitivity of our method in comparison with other algorithms, such as that of Grassberger and Procaccia. As a further check, we test the method on several sets of surrogate data and we always observe a stochastic behaviour. A question was left open in Celletti et al. (1999), namely the effectiveness of the method when applied to realistic situations. To this end, we consider in Section 6 experimental time series derived from electro-physiological recordings of neuronal discharges in the cerebral cortex of anaesthetised mice and in the red nucleus of behaving rats. Although the majority of these experimental data show a stochastic behaviour, some cases reveal a deterministic behaviour in low-dimensional spaces. The method presented in Section 5 does not provide reliable results in some specific cases of symplectic mappings (precisely, regular motions and weak chaos). In Lega et al. (2000) alternative methods based on the technique of Section 5 have been developed to deal with such degenerate cases. The most promising algorithm is briefly summarised in Section 7. The conclusions are discussed in Section 8.
A practical implementation of many algorithms from time series analysis can be found at http://www.neuroheuristic.org where a virtual laboratory is installed.
2
Fractal dimensions
Given a set of points, fractal dimensions are related to the way the density of points scales with small volumes surrounding the points (Abarbanel 1996). The simplest way to assign a fractal dimension is obtained by a box-counting method. Sets with non-integer dimension are called fractals. To introduce the box-counting dimension, consider a set in an N-dimensional space, which we cover by a grid of N-dimensional cubes of side r. Let Ñ(r) be the number of cubes which are needed to cover the set. The box-counting dimension is defined as

D0 = lim_{r→0} log Ñ(r) / log(1/r).
As an example, we consider the middle third Cantor set. To compute its box-counting dimension, we define a sequence r_n with the property that lim_{n→∞} r_n = 0. Then D0 can be rewritten as

D0 = lim_{n→∞} log Ñ(r_n) / log(1/r_n).

In particular, one can take r_n = 1/3ⁿ, so that Ñ(r_n) = 2ⁿ, and D0 = log 2 / log 3 ≈ 0.63, providing the fractal property of the Cantor set.
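The Cantor-set value can be checked with a crude box-counting sketch (ours): generate approximate points of the set from random ternary expansions with digits 0 and 2, count the occupied boxes of size r = 3^(-n), and fit the slope of log Ñ(r) against log(1/r).

import math
import random

def cantor_points(n_points=200000, depth=12):
    """Approximate points of the middle-third Cantor set via random ternary digits 0 or 2."""
    return [sum(random.choice((0, 2)) / 3 ** k for k in range(1, depth + 1))
            for _ in range(n_points)]

def box_counting_dimension(points, n_min=2, n_max=8):
    """Least-squares slope of log N(r) against log(1/r) for box sizes r = 3**(-n)."""
    xs, ys = [], []
    for n in range(n_min, n_max + 1):
        r = 3.0 ** (-n)
        boxes = {int(p / r) for p in points}    # indices of occupied boxes
        xs.append(math.log(1.0 / r))
        ys.append(math.log(len(boxes)))
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

print(box_counting_dimension(cantor_points()))   # close to log 2 / log 3 ~ 0.63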
As an extension of the box-counting dimension, one defines the following family of generalised dimensions, which depend on an index q:

D_q = (1/(q − 1)) lim_{r→0} log I(q, r) / log r,
where I(q, r) ≡ Σ_{i=1}^{Ñ(r)} p_i^q and the sum is over all Ñ(r) cubes of size r which are needed to cover the attractor. The quantities p_i are the natural measure of the attractor. More precisely, if the attractor is covered by a grid of cubes C_i, for any x0 in the basin of attraction, let us define

p_i = lim_{T→∞} ν(C_i, x0, T) / T,

where ν(C_i, x0, T) is the time spent in C_i by the orbit starting from x0 for any 0 ≤ t ≤ T. For q = 1 one has the information dimension D1: let the attractor be covered by Ñ(r) cubes of size r and let p_i be the probability to visit the i-th cube. Then D1 is defined as

D1 = lim_{r→0} Σ_i p_i log p_i / log r.
For q = 2, one obtains the correlation dimension D2, which will be introduced in the following section, being at the basis of the Grassberger and Procaccia method.
3
The method of Grassberger and Procaccia
A basic problem when dealing with discrete time series is to ascertain whether the series is produced by a deterministic or stochastic system. In the first case, one assumes there
exists a set of equations governing the dynamics in a suitable embedding space. In the latter case, due to the randomness of stochastic motion, no forecasting can exist on the dynamics. Among the several methods available for the classification of discrete time series, the algorithm of Grassberger and Procaccia (1983), hereafter referred to as the GP method, has been widely applied to theoretical and experimental cases. If the observable is deterministic the GP method enables the determination of the dimensions of the embedding space and of the attractor. We briefly recall the method as follows. Let {x1, ..., xK} (xi ∈ R) be a discrete time series composed of K points. In a d-dimensional embedding space, define the set Y = {y1, ..., yN} (N = K − d + 1) of delay vectors as

y1 = (x1, ..., xd)
y2 = (x2, ..., xd+1)
...
yN = (xN, ..., xK).

Let r > 0 and for any yj ∈ Y, let nj(r; d) be the number of points yi ∈ R^d (i ≠ j) which are contained in the d-dimensional hypersphere of radius r around yj, i.e.

nj(r; d) ≡ Σ_{i=1, i≠j}^{N} Θ(r − ||yi − yj||_d),

where Θ is the Heaviside function (i.e. Θ(x) = 1 for x ≥ 0, Θ(x) = 0 for x < 0) and ||·||_d is the distance in the d-dimensional embedding space. The correlation integral C_{N,d}(r) is the fraction of pairs of delay vectors whose mutual distance does not exceed r, i.e. the average of nj(r; d) over all j, normalised by the number of pairs. The correlation dimension D2 is related to C_{N,d}(r) by

D2 = lim_{r→0} log C_{N,d}(r) / log r
for d sufficiently large. We remark that the correlation dimension corresponds to the generalised dimension of order q = 2, since it can be shown that C_{N,d}(r) scales as I(2, r). Moreover, one has the inequalities D2 ≤ D1 ≤ D0; if the points on the attractor are uniformly distributed, then D2 = D1 = D0. The correlation dimension corresponds to the slope of the graph of log C_{N,d}(r) against log r, whenever its value is nearly constant as the embedding dimension d is varied (see Figure 1a). This algorithm enables computation of the correlation dimension as well as the dimension of the embedding space, provided that the slopes of the above curves are definitely convergent. A stochastic behaviour is indicated by a constant increase of the slopes with d (see Figure 1b). In practical applications, the slope of the curves log C_{N,d}(r) against log r must be evaluated in a meaningful range of values of the radius, say (r0, r1), denoted as the scaling region. Below r0 the curves are distorted since few points are counted in the hypersphere of radius r0, while above r1 the curves tend to flatten since the attractor has finite size. The relation between the minimum amplitude of the scaling region and the number of points forming the time series was investigated in Eckmann and Ruelle (1992). An extension of the Grassberger and Procaccia method to analyse the joint behaviour of two (or more) time series was investigated in Celletti et al. (1998).
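A minimal transcription of the procedure just recalled (our sketch, not the authors' implementation; the maximum norm is our choice of distance) builds the delay vectors, counts the pairs closer than r, and returns the correlation integral over a range of radii; the slope of log C_{N,d}(r) against log r in the scaling region then estimates D2.

import numpy as np

def correlation_integral(series, d, radii):
    """Grassberger-Procaccia correlation integral C_{N,d}(r) for a scalar time series."""
    x = np.asarray(series, dtype=float)
    N = len(x) - d + 1
    Y = np.column_stack([x[i:i + N] for i in range(d)])          # delay vectors y_1, ..., y_N
    diffs = np.abs(Y[:, None, :] - Y[None, :, :]).max(axis=2)    # pairwise max-norm distances
    dists = diffs[np.triu_indices(N, k=1)]                       # each pair counted once
    return np.array([np.count_nonzero(dists <= r) / dists.size for r in radii])

# Example: C = correlation_integral(series, d=4, radii=np.logspace(-3, 0, 20));
# the slope of np.log(C) against np.log(radii) in the scaling region estimates D2.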
Figure 1. Graphs of log C_{N,d}(r) against log r for embedding dimensions d = 1, ..., 8: (a) the deterministic behaviour is indicated by the parallelism of the curves from d = 2; (b) the stochastic character is determined by the divergence of the slopes of the correlation integral curves.
4
Lyapunov exponents
The calculation of the Lyapunov exponents (Benettin and Galgani 1979, Benettin et al. 1980, Damming and Mitschke 1993, Eckmann et al. 1986, Kantz 1994, Packard et al. 1980, Rosenstein et al. 1993, Wolf et al. 1985, Zeng et al. 1991) provides information on the evolution of the motion and, more precisely, the rate of divergence of nearby orbits. Most methods for determining the Lyapunov exponents are based on the following idea, introduced in Benettin and Galgani (1979) and Benettin et al. (1980). Compute the spectrum of the Lyapunov exponents following the evolution of a set of tangential vectors, which might be approximated by small distance vectors. A renormalisation procedure is applied at given intervals of time in order to control the overflow of chaotic orbits. More precisely, consider two orbits starting at P0 and P0' with dist(P0, P0') = d0 (Figure 2).
Figure 2. The Benettin and Galgani method (see Benettin and Galgani, 1979).

After a time h, P0 evolves into P1 and P0' into P1' with dist(P1, P1') = d1. By a homothesis of centre P1 and of ratio d0/d1, one finds a new point P1'' at distance d0 from P1. Iterating this process (with new initial data P1, P1'') one obtains a sequence of points at distances d1, d2, d3, ... The quantity

(1/(n h)) Σ_{k=1}^{n} ln(d_k/d_0)
tends to a limit, which is the largest Lyapunov characteristic estimate, as the number n tends to infinity and as the distance d0 tends to zero. We remark that in order to apply the above method it is essential to know the explicit equations governing the dynamics. However, the previous technique can be adapted to investigate discrete time series as described in Wolf et al. (1985). More precisely, follow the evolution of two points P0, P0' until their distance exceeds a given value (Figure 3). Let P1, P1' be the evolved points; replace P1' with a point P1'' closer to P1 and such that the vector P1P1'' has the same orientation as P1P1'. Let {t_k} be the sequence of times at which the replacements take place and let d(t_k) = dist(P_k, P_k'), d'(t_k) = dist(P_k, P_k''). The largest Lyapunov exponent is defined as

λ1 = (1/(t_n − t_0)) Σ_{k=1}^{n} ln( d(t_k) / d'(t_{k−1}) ),
where n is the total number of replacements.
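The renormalisation of Figure 2 can be sketched as follows for a map on the 2-torus such as the standard map of Equation (5) of the previous chapter (our illustration; one renormalisation per iteration, i.e. h = 1, and angle differences are taken modulo 2π).

import math

def largest_lce_two_orbits(step, p0, d0=1e-8, n_iter=10000):
    """Benettin-Galgani estimate of the largest Lyapunov exponent for a 2D torus map.
    step(x, y) -> (x', y') is the map; the shadow orbit is rescaled to distance d0 each step."""
    x, y = p0
    xs, ys = (x + d0) % (2 * math.pi), y
    total = 0.0
    for _ in range(n_iter):
        x, y = step(x, y)
        xs, ys = step(xs, ys)
        dx = (xs - x + math.pi) % (2 * math.pi) - math.pi   # shortest angular differences
        dy = (ys - y + math.pi) % (2 * math.pi) - math.pi
        d1 = math.hypot(dx, dy)
        total += math.log(d1 / d0)
        xs, ys = x + dx * d0 / d1, y + dy * d0 / d1          # homothesis of centre (x, y), ratio d0/d1
    return total / n_iter

# e.g. with the standard map of the previous chapter:
# std = lambda x, y: ((x + 0.3 * math.sin(x + y)) % (2 * math.pi), (y + x) % (2 * math.pi))
# print(largest_lce_two_orbits(std, (1e-5, 0.0)))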
Figure 3. The Wolf et al. method (see Wolf et al., 1985).

An alternative method to compute the whole spectrum of the Lyapunov exponents was developed in Eckmann et al. (1986). Suppose that the dynamics is ruled by x_{n+1} = f(x_n)
and let D_{x_n} = (∂f/∂x)(x_n). We look for an approximation of D_{x_n} using the experimental
data as follows. Consider the evolution of the points P_j whose distance from a preassigned point P_i is less than r (Figure 4). Consider those points whose images P_{j+m} are still at distances less than r from P_{i+m}. Determine D_{x_i} with a least-squares approximation over the points P_j, so that
D_{x_i}[P_j − P_i] ≈ P_{j+m} − P_{i+m}.

Determine the matrices D_{x_{i+m}}, D_{x_{i+2m}}, ... in the same way. Next, let us decompose the matrix D_{x_i} as D_{x_i} = Q1R1, where Q1 is an orthogonal matrix, while R1 is an upper triangular matrix with non-zero diagonal elements. Analogously, let D_{x_{i+m}}Q1 = Q2R2, ..., D_{x_{i+nm}}Qn = Q_{n+1}R_{n+1}. The Lyapunov exponents are given by the formula

λ_k = (1/(Mτ)) Σ_j ln (R_j)_{kk},

where τ is the sampling time step, (R_j)_{kk} is the k-th diagonal element of R_j, M is the available number of matrices and the sum runs over all of them.
Figure 4. The Eckmann et al. method (see Eckmann et al., 1986)
5
A method for short time series
The main drawback of the methods for computing Lyapunov exponents and correlation dimensions consists in the large number of points required to avoid inaccuracy and errors during the calculation. A careful analysis of the minimal number of points necessary to compute the correlation dimension and Lyapunov exponents has been presented in Eckmann and Ruelle (1992). In addition, the calculation of the methods presented in Sections 3 and 4 often relies on the choice of some parameters, which are not easily selected. In this section, we review the method presented in Celletti et al. (1999) to assess the deterministic character of short time series (typically composed of 1000 points), providing more details on the criterion of selection of the parameters on which the method depends. Let {x_l}, l = 1, ..., K, be a discrete time series with K points. We then consider delay coordinates in a d-dimensional embedding space, setting y_j = (x_j, ..., x_{j+d−1}) for j = 1, ..., K − d + 1. Denote by P_j ≡ (y_j) a point in the embedding space and let P_j^(k) ≡ (y_{j+k}) be the k-th iterate of P_j. For r0 > 0, let n(r0) be the number of pairs (P_i, P_j), i < j, such that d_ij^(0) ≡ ||P_i − P_j||_d ≤ r0. We denote by d_ij^(k) ≡ ||P_i^(k) − P_j^(k)||_d the distance between the k-th iterates of P_i, P_j. Let α_ij^(k) = log(d_ij^(k)/d_ij^(0)) and let

λ_k(r0) ≡ (1/n(r0)) Σ α_ij^(k),    (1)

where the sum is taken over the n(r0) pairs (P_i, P_j), i < j, with d_ij^(0) ≤ r0.
We refer to λ1(r0) as the Lyapunov characteristic estimate (LCE). An estimate of the LCE can be equivalently obtained as λ_k(r0)/k. We may therefore set the following criteria:

1. In a low-dimensional deterministic system, there exists a suitable interval of initial
distances r0 in which, for a fixed k, the value of the LCE is nearly constant; the curves λ_k(r0) against r0 tend to become parallel because the values λ_k(r0)/k are nearly equal as k is varied.

2. In stochastic or higher dimensional deterministic systems, the value of the LCE, for a fixed k, decreases with r0, due to the unpredictable character of stochastic dynamics; the curves λ_k(r0) against r0 tend to converge to the same limit.
Note that the above criteria depend only on two parameters: the iteration parameter k and the initial distance r0. The value of the largest LCE is given by λ1(r0) when it is nearly constant with the initial distance r0.
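A compact sketch of the estimator (ours; it relies on the reconstruction of Equation (1) given above, and the choice of r0 can follow the rule of thumb n(r_max) ≈ K²/100 discussed in Section 5.2) could read as follows.

import numpy as np

def lce_short_series(series, d, r0, k_max=5):
    """lambda_k(r0), k = 1..k_max, for a short scalar time series: the average of
    log(d_ij^(k)/d_ij^(0)) over the pairs of delay vectors initially closer than r0."""
    x = np.asarray(series, dtype=float)
    N = len(x) - d + 1
    Y = np.column_stack([x[i:i + N] for i in range(d)])
    lam = []
    for k in range(1, k_max + 1):
        logs = []
        for i in range(N - k):
            d0 = np.linalg.norm(Y[i + 1:N - k] - Y[i], axis=1)
            close = np.nonzero((d0 <= r0) & (d0 > 0))[0]
            if close.size:
                j = close + i + 1
                dk = np.linalg.norm(Y[j + k] - Y[i + k], axis=1)
                logs.extend(np.log(dk / d0[close]))
        lam.append(np.mean(logs) if logs else np.nan)
    return lam   # lambda_1(r0) is the LCE; lambda_k(r0)/k is roughly constant if deterministic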
Remark: The measure at the basis of our method (i.e., the quantity λ_k(r0)) was introduced in other studies (Boffetta et al. 1998, Cellucci et al. 1997, Gao and Zheng 1993, Kantz 1994), though the analysis of short time series (like those studied in the present section) was not performed and simple criteria for applicability to real situations were not discussed. In particular, Gao and Zheng (1993) proposed a local exponential divergence plot aimed at determining the minimal embedding dimension. An algorithm to detect noise corruption was presented by Cellucci et al. (1997). A measure similar to the LCE has been suggested to compute Lyapunov exponents for dynamical systems characterised by different time scales (Boffetta et al. 1998). The dependence of the LCE upon noise was investigated in Damming and Mitschke (1993).
5.1
Choice of the iteration parameter
A variation of the iteration parameter k implies a comparison of the initial distance d_ij^(0) of some pairs of points, say P_i and P_j, with the distance d_ij^(k) after k iterations of the above points. In order to keep control of the divergence of the corresponding trajectories, it is essential to take a reasonably low value of the iteration parameter, since in a deterministic chaotic system the trajectories diverge exponentially. It is rational to consider a maximum number of k = 5 iterations as sufficient to control the separation of the orbits.
5.2
Choice of the initial distance
As mentioned in Celletti et al. (1999), the value of the initial distance is crucial for the statistics of our method: for a small value of r0, the number of pairs within r0 is generally too small to provide meaningful results. On the contrary, if r0 is too large all points of the embedding space will be included, eventually exceeding the actual size of the attractor, if any. If K denotes the number of points which form the time series, we have heuristically determined to select an optimal value r_max for the distance r0 such that n(r_max) = K²/100, where n(r_max) is the number of pairs (P_i, P_j), i < j, whose distance is less than or equal to r_max. The validation of this 'rule of thumb' has been performed by a χ² comparison of the distribution of the λ1(r_max) with the classical numerical result, when dealing with explicitly known dynamical systems. The rationale is that for optimal values of r0 the LCE curves are flat parallel lines and in the case of low-dimensional systems the value of the estimated LCE is nearly constant on this plateau. The classical numerical estimate of the LCE is performed as follows (Froeschlé, 1984): let M : R^d → R^d be a mapping in a d-dimensional embedding space. We derive the tangent mapping at a point z0 ∈ R^d, say DM(z0), and for a given initial vector v0 ∈ R^d we compute the image point as v1 = DM(z0)v0. After normalising the sequence of vectors, the largest LCE is given by
where No is a suitable number of iterations a t which convergence is reached.
Determination of chaotic attractors in short discrete time series
347
A X2-test is performed between the LCE computed as in (2) and as in (1) with k = 1. More precisely, let [ a ,b ) be an interval in R; consider a pairwise disjoint partition of [a,b ) , say.[a,b) ~ l " , ~ [ a ~for , ba~suitable ) n > 1. For a given ro, we denote by I l 1 ( [ a l l b l ) ) the number of values a!;', i = 1,..., K - 1, j = i 1, ...,K , belonging to the interval [al,b l ) .
+
Using the same notation, II2([al, b o ) is the number of values log which fall in the interval [ai, bl).
I' v z+ll' d i m'
= 11 ...1 No,
The standard X2-value is provided by n
x2(ro) f
C[n,([a,,b l ) ) - n2([a1,
b1))I2
'
1=1
Notice that the quantities CY!;) and henceforth n , ( [ a l , b l ) )depend on T O . The value Toptima] a t which xz as a function of T O reaches its minimum is the optimal initial distance a t which the LCE computed as Xl(ro) (see (1)) is the nearest to the classical numerical value. The validation by this test was performed on well known low-dimensional mathematical models, the H h o n and the Sinai mappings (see also section 6). In Figure 5 we determined as the minimum illustrate the corresponding X2-functions. Note that rOptimal such that n(rmaX) = K2/100: for the H6non of the Xz-function nearly coincides with T, N 0.02, rmaX N 0.014, while for the Sinai map we case, taking K = 1000 we have rOptimal have rOptimdN 0.067, T, 1: 0.064.
0.041 0.02/j n
0 L W
L 0.2 0.4
OO
ro
0.01
\ 0.4
0.2
r,1
Figure 5. The curves x2(ro) against ro are displayed for: ( a ) The He'non mapping with initial data xo = 0.6, yo = 0.19 and for the parameters: a = 1.6, b = 0.1 (d = 2); ( b ) The Sinai mapping with xo = 0.1, yo = 0.1 and A = 0.1 (d = 2).
5.3
Embedding dimension
We shall now address the problem of the choice of the embedding dimension d , which was considered as fixed in the above discussion. For a generic time series {x3}, j = 1,..., K , we consider embedding spaces whose dimension is related t o the number K of points available. For example, if K = 1000 we let d vary between 2 and 8, according t o the fundamental limitation provided in Eckmann and Ruelle (1992). When a deterministic
A Celletti, C Froeschlk,I V Tetko, A E P Villa
348
case is detected its embedding dimension is computed as the value a t which the curves X k ( ~ 0 ) against T~ are straight and parallel lines as the iteration parameter IC is varied. However, we want to stress that in order to reconstruct the attractor’s dynamics it might be necessary to embed the trajectory in a space whose dimension is greater than the true dimension of the state space (compare with Eckmann et al., 1986).
6
Applications
Discrete time series may be derived from mappings, continuous systems (taking the discretisation over finite times) or experimental data. With explicit dynamical systems, the time series is formed by the iterations of one observable, typically one of the system’s coordinate. IVe consider several examples of low-dimensional mappings and apply the method to characterise the deterministic system, compute the Lyapunov exponents and compare the results with the classical numerical ones (see Section 6.1). Continuous systems are analysed in Section 6.2 and analyses of experimental data derived from neurobiology (precisely. from the epochs of action potentials in electro-physiological recordings) are presented in Section 6.3. 1;alidation of the method by considering surrogate data and simulated spike trains is presented in Section 6.4.
6.1
Mappings
Below is a list of dissipative mappings and the corresponding tables reporting the values of the LCE (A,) and its classical numerical estimate (XC) for several choices of the parameters of the mappings. In all cases the time series was constructed as the iterates of the xcomponent with K = 1000 points and XI was calculated for T O = rmU. We refer to Celletti et al. (1999) for a more extensive discussion of the 2-dimensional H6non mapping and of the Rossler system. 6.1.1
HBnon mapping (2-dimensional)
It is defined by the equations
x’ = -ax2+y+l y‘ = bx , x,y E R and a. b E R. Let Xc be the classical numerical estimate of the LCE and XI be the value obtained as in Section 5. with an embedding dimension d = 2; let initial conditions. 20
Yo
U
b
XC
X I ( d = 2)
0.6 0.5 0.4
0.19 0.20 0.10
1.6 1.6 1.4
0.1 0.1 0.15
0.352 0.354 0.349
0.325 0.311 0.331
5 0 , yo
be the
Determination of chaotic attractors in short discrete time series
349
6.1.2 HBnon mapping (4-dimensional)
It is defined by the equations
x’ = - a x 2 + y + l y’ = bx h t ,
+
z’ = - d z 2 + t + 1 , t’ = - b ’ t + hx , x, y, t , t E R and a, b, a’, b’, h E R. We adopt the same notations as before for Xc and Al.
20
0.6 0.6 0.6 0.6 0.6 0.5 0.5 0.62 0.62 0.62
6.1.3
YO
0.19 0.19 0.19 0.19 0.19 0.21 0.21 0.18 0.18 0.18
to 0.62 0.62 0.62 0.62 0.62 0.82 0.82 0.62 0.62 0.62
a
to
0.192 0.192 0.192 0.192 0.192 0.4 0.4 0.19 0.19 0.19
b
b‘
a’
1.4 0.3 1.6 1.4 0.3 1.6 1.4 0.3 1.6 1.2 1.6 0.2 1.6 0.2 1.2 1.6 0.2 1.2 1.6 0.2 1.2 1.6 0.2 1.2 1.4 0.31 1.61 1.4 0.31 1.61
0.6 0.6 0.6 0.1 0.1 0.1 0.1 0.1 0.6 0.6
h 0.01 0.001 0.03 0.001 0.005 0.001 0.005 0.001 0.01 0.001
Xc 0.405 0.420 0.400 0.440 0.440 0.440 0.441 0.440 0.430 0.423
X1 (d = 4) 0.378 0.380 0.363 0.460 0.474 0.493 0.468 0.465 0.394 0.371
Kaplan-Yorke map
It is defined by the equations
(mod 1) y’ = cyy+bcos(2c~x),
5’
= ax
x, y E R, a, b, c, cy E R. In a 2-dimensional embedding space, for any initial conditions we have the following results. a
b
c
cy
Xc
3 3 3
2 2 2
1 1 1
0.2 0.25 0.5
1.099 1.099 1.099
A1
( d = 2)
1.164 1.164 1.164
A Celletti, C Froeschle‘, I V Tetko, il E P Villa
350 6.1.4
Zaslavskii map
It is defined by the equations z’ = z + 2 : ( l + / * y ) + E V p C o S z
(mod 27r)
.
y’ = e-’(y+Ecosz)
where z,y E R and the parameters are real numbers with p = (1 - e-7)/T 2:
= (4/3). 100.
Taking the initial conditions 50 = yo = 0, we have the following results.
3 3 3 2 4 4
0.1 0.15 0.3 0.2 0.2 0.3 0.4
6.1.5
0.758 1.278 1.928 1.426 1.358 1.922 2.158
4
0.683 1.324 1.773 1.468 1.347 1.765 1.931
Ikeda map
Let =
2’
+B
.
e~k-zal(l+l~/?
(3)
where z E C and p , B, k . a E R. We rewrite (3) in its real form as
z‘
= p
+ Bcos(k
Q
Ly
+ x2 + + x2 + y2 )Y B c o s ( k - 1 + + y2 ) y + Bsin(k - 1 + x2 + y* ) x -
) x - Bsin(k - 1 y2
1
0
y‘ =
cy
22
and take k = 0.4. In this example, the best XI is always obtained in a 3-dimensional embedding space. For comparison, we report also the value corresponding to d = 2. 20
0.1 0.1 0.1 0.1 0.1
Yo
P
B
Q
Xc
0.5 0.1 0.1 0.1 0.1
1 1 1 0.9 1.1
0.9 0.9 0.9 0.9 0.8
6 6 6.5 6 6
0.507 0.507 0.487 0.420 0.466
X1(d=
3)
0.488 0.410 0.443 0.382 0.454
X1(d=
2)
1.092 1.030 1.178 0.943 0.800
Determination of chaotic attractors in short discrete time series
6.1.6
Sinai map
x’ = x + y + Acos(27ry)
(mod 1)
y’ = x + y
(mod 1) ,
where x, y E R, A E R. We take xo = yo = 0.1; the correct value of obtained taking a higher dimensional embedding space.
6.2
351
A
Xc
X l ( d = 3)
0.1 0.3 0.01 0.005 0.12 0.15
0.687 0.614 0.693 0.693 0.685 0.681
0.555 0.705 0.661 0.706 0.617 0.612
X1(d=
X1
is in some cases
2)
0.638 1.080 0.577 0.644 0.758 0.993
Continuous systems
For continuous systems the 1000 points time series are derived from the x’ iterates of the discretised x-component, when integrating by, e.g., a Runge-Kutta method. The experimental time series x” were constructed by taking the series formed by the intervals x!’ I = x! - x:. The resolution of the time series generated from the regular dynamics of conservative systems should be carefully taken into account. In the present study the experimental time series x” were arbitrarily scaled in such a way that xkax = 100000, which represents a very high resolution. For comparison with the electro-physiological spike trains the series generated out of the continuous systems were characterised by a pseudo ‘firing rate’ of 0.1 spikes/s. We have tested our method for three continuous systems whose data were contaminated by various levels of noise . In particular, we show the results of the effect of 5% additive, dynamical and experimental noise. For continuous systems, the computation of the LCE definitely requires more than 1000 points. However, we show that the method is still able to detect the deterministic behaviour when data series of length K = 1000 are considered. 6.2.1
Lorenz system (3-dimensional)
It is defined by the equations 4 Y - x) = z(R-z)-y
x =
6
i =
XY-
bz
Figure 6 shows the curves X ~ ( T ~as) functions of the initial distance TO (for k = 1,...,5) for the parameters o = 16, R = 45.92, b = 4. In Figure 6(a) the application of our
A Celletti. C Froeschle', I V Tetko, A E P Villa
352
method to the plain time series indicates that the curves tend to become parallel and flat. Conversely, for 5% additive. dynamical and experimental noise. Figures 6(b), 6(c) and 6(d) respectively, show that curves are not parallel. We tested the Lorenz system also for two other choices of parameters, namely for 0 = 10. R = 28, b = 8/3 and for 0 = 16, R = 40, b = 4. In both such cases our method differentiated the plain time series from the 5% noisy series.
z
2-l
0
002
0.01
0 03
0
001
0.02
h
L Y 0
I]
1
x 0
0 03
0.04
'0
'0
h,, h, ;(
0
0
0 01
0.02
0
0.03
'0
0.005
0.01
'0
Figure 6. The Lorenz system embedded in a 3-dimensional space is considered for the parameters 0 = 16, R = 45.92, b = 4. ( a ) The graphs refer to the curves Xk(r-0) against ro as IC = 1, ..., 5 for a 1000 points time series. ( b ) Analysis of the original time series with 5% additive noise; (c) 5% dynamical noise; ( d ) 5% experimental noise.
6.2.2
RGssler system (3-dimensional)
It is defined by the equations
x =
-y - z
y = z+ay
i = b+z(s-c) Figure 7 shows the curves X I ; ( T ~ ) against ro as IC = 1,.... 5 for the parameters a = 0.2, b = 0.2 and c = 10 for the plain, 5% additive, dynamical and experimental noisy time series. We tested the Rossler system also for the following choices of parameters: a = 0.2, b = 0.4, c = 5 . 7 : a = 0.2, b = 0.2, c = 5.7; a = 0.15, b = 0.2, c = 10. In all cases the plot of & ( T O ) against T~ allowed to discriminate plain series from 5% noisy series.
Determination of chaotic attractors in short discrete time series
0
001
002
0.03
353
004
' l J
I
0
0.02
0.04
0.06
0
0.005
0.01
0.015
r0
'0
Figure 7. The Rossler system in a 3-dimensional space is considered for the parameters a = 0.2, b = 0.2, c = 10. (a) The graphs refer to the curves Xk(r0) against TO as k = 1, ..., 5 for a 1000 points time series. ( b ) Analysis of the original time series with 5% additive noise; ( c ) 5% dynamical noise; ( d ) 5% experimental noise. 6.2.3 Hyperchaotic Rossler system (4-dimensional) It is defined by the equations 5
= -y-z = z+ay+w
i = b+xz W
= cw-dz,
We considered this system for the parameters a = 0.25, b = 0.3, c = 0.05, d = 0.5. Figure 8 shows the curves Xk(r0) against ro as k = 1,...,5 for the plain, 5% additive, dynamical and experimental noisy time series.
6.3
Neuro-biological data
The basic frequency of a neuron is usually in the range 1-5 Hz, allowing the neuron to be ready to transmit information. When excited, the frequency of the neuron may increase up to 50 H z , sometimes even up to 500 H z for few a milliseconds. Let { t j } be the sequence of firing times of a given neuron. We analysed several data sets collected during neuro-biological experiments under different recording conditions. All experiments were performed in compliance with the guidelines for the care and use of laboratory animals edited by the Society of Neuroscience and after receiving governmental veterinary a p proval. Extracellular single unit recordings were made with glass-coated tungsten microelectrodes having an impedance in the range 0.5-2 M O measured at a frequency of 1kHz.
A Celletti, C Fkoeschle', I V Tetko, A E P Villa
354
,
,
0.01
0.02
,,
-
L
v
Y
0.2 0
-0
0
0.03
0
0.05
0
0.01 0.02 0.03 0.04 0.05
'0
0.1
0.15 '0
Ill
0
0.01
0.02
'0
Figure 8. The hyperchaotic Rossler system in a 4-dimensional space is considered for the parameters a = 0.25, b = 0.3, c = 0.05, d = 0.5. ( a ) The graphs refer to the curves &(TO) against ro as k = 1, ..., 5 for a 1000 points time series. (b) Analysis of the original time series with 5% additive noise; (c) 5% dynamical noise; ( d ) 5% experimental noise. All recordings were stationary as it pertains to the normal electro-physiological criteria. The firing times {t,} of the nervous cells were stored digitally for off-line analysis. The spike train is provided by the discrete time series formed by consecutive intervals of firing, i.e. { q ,...,x3,..., x ~ }5 { t l -to, ...,t, - t J - l ,...,t~ - tK-l}, where the experiments are performed up to time t ~We . considered only time series with a minimum of 800 points. The dynamics of 214 spike trains recorded in the temporal cortex of anesthetised mice (Schwaller et al., 1998) were investigated by using both the GP method and the method of Section 5. The firing rate of these cells extended over the range 0.43-1.99 spikes/s. The accuracy of the time epochs was set to 1 ms. Note that the temporal cortex receives inputs from the auditory system and is connected to other sensory and associative cortical areas. The anaesthetic condition was maintained in a steady state throughout the recording session. The data analysed here concern only those periods of time when no external stimuli was applied (so-called 'spontaneous activity', labelled as sp). Up to ten blocks of 100 seconds each were cumulated for the analysis of single spike trains. Another set of data consisted in 139 spike trains recorded in the red nucleus neurons of conscious freely moving rats while they performed a simple forelimb reaching movement with the contralateral forepaw. This nucleus is an important centre of the motor system. The data were provided by Brian Hyland at the Department of Physiology, University of Otago, New Zealand. The firing rate of these neurons varied between 7.5 and 43.2 spikes/s. The accuracy of spikes timing was set to 0.1 ms, thus providing a comparable resolution with the other data set described above. The relationship of activity changes in these neurons to phase of task performance is reported elsewhere (Hyland and Jarratt, 1999). Data were recorded continually while the animals repeatedly reached for, grasped,
Determination of chaotic attractors in short discrete time series
355
and then consumed small pieces of food. For analysis, data were segmented into 2 sets thus providing 278 time series. Each set made up of multiple blocks of 4 seconds each. One set, referred to as mvt, included the 2 seconds before and after each occasion the food was grasped, and so included the acts of reaching, grasping, and withdrawing of the food. The other set was made up of blocks taken from periods between reaching episodes and is referred to as a control period (ctl). Spike train
Rec
K
d
dGP
D2
m2agc2.12A mlahc1.13-3 mlbac2.12-B m l bdc6.09-4
sp sp sp sp
921 1073 1050 2367
4 5 5 4
4 5 5 4
1.50 0.92 1.66 0.28
rnOlc6-1.All rn22c08.A2 rn18c07.All rn04c05.Al rn29c08.Al rn07c07.All rn08c08.A2
ctl ctl ctl ctl ctl mvt mvt
1037 1248 1441 1941 1083 1050 2072
6 5 5 5 4 4 4
5 4 4 4 4
3.80 3.40 0.46 2.03 1.39 0.31 2.60
4
4
By using the G P method we found 13/214 and 21/278 spike trains which exhibited a chaotic attractor in the mice temporal cortex and the rat red nucleus, respectively. Our method confirmed that 11/34 cases do show clear features of deterministic systems embedded in a low-dimensional space. These results are reported in Table I, where Rec indicates the recording condition, K denotes the total number of points, d is the embedding dimension at which the series are deterministic according to our algorithm, dGp and 0 2 are the embedding and correlation dimensions as provided by the G P method. Figure 9 illustrates one experimental case of deterministic dynamics observed in the red nucleus of freely moving rats during the control period.
6.4
Surrogate data
The method was tested on several sets of surrogate data derived from the original discrete time series. We considered the time series as point processes, i.e. {x,} z { t 3 } . The intervals between two consecutive points {t, - t 3 - l } were randomly shuffled. In such a way the first-order statistics (i.e., the time interval histogram) remained unchanged but the dynamics was completely scrambled. This construction of surrogate time series was applied to deterministic mappings, continuous systems and experimental data. In addition, to test the method on simulated spike trains we created surrogate data according to Abeles and Gerstein (1988) as realisations of non-stationary Poisson processes at different firing rates and different rates of fluctuations. We examined about 200 surrogate data sets with K = 1000 points and we always found that the LCE curves were not parallel, not straight lines, no matter what the embedding dimension was (up to d = 8). In a few cases we noticed that the application of the GP method to simulated spike
A Celletti, C Froeschlk, I V Tetko, il E P Villa
356
;i-,
81 6 2x Y 4
2
0 0
0.005
0.001
0 0
0.010 "(
- 0 0
2
- 0 0
0.003
0.002
3t
0.0021
0.002
0.004
0.0027
0.0054
31
h
h
0
0
t
2-
Y
x
0.0014 1'0
'0
t
0.0007
1
2-
Y
x
1.
oL 0
0.0025
0.0050
1. 01 0
"
Figure 9. A neuro-biological application is considered. The example refers to the spike train rn18c07.All and includes 1441 points. The curues X ~ ( T O )against ro are shown as k = 1, ...,5 and for embedding dimensions d = 2, ...,7 . A deterministic behaviour is observed for an embedding dimension d 2 5 .
trains could suggest a deterministic behaviour, even if all spike intervals were generated by chance. Figure 10 illustrates the analysis of a spike train generated according to a non-stationary Poisson distribution with fast fluctuation (0.05 s) of firing rate (Tetko and Villa, 1997). The average firing rate was 3.7 spikes/s and it fluctuated in range of 0-54 spikes/s. The application of our algorithm did not detect any deterministic behaviour for all analysed embedding dimensions. However, such behaviour was detected by the G P method.
357
Determination of chaotic attractors in short discrete time series
(b) h
0 v b
Y
x
iF.“i.i.- q-k, 6
Tr
0 0
ld=41
0.02
0.04
0.06
0
0.05
0.10
‘0
0.15
‘0
Figure 10. Dynamical system analysis of a simulated spike train generated b y nonstationary Poisson distribution. ( a ) The GP algorithm detects deterministic behaviour with d G p = 4, D2 = 2.08. The scaling region is enhanced b y grey lines. The curves X k ( r - 0 ) against ro are plotted for k = 1 , .. . , 5 within an embedding space of dimension d = 4 in ( b ) and dimension d = 7 in ( e ) . Note that no deterministic behaviour is observed.
7
Other methods for short time series
The method presented in Section 5 works properly in many dynamical situations, particularly when dealing with chaotic regimes of conservative mappings and when analysing dissipative systems. However, some dynamics of conservative mappings cannot be satisfactorily investigated with the method of Section 5 . Precisely, for the analysis of regular orbits and of weak chaotic motions, one needs to apply slightly different techniques as provided in (Lega et al., 2000). Recalling the notations of Section 5 , let a::)
3
d!k! log%. Let
4.i
x k
be the average of a$)
over the (N - k ) pairs of nearest neighbours having distances d!P,) ( N = K - d
+ 1):
The LCE is provided by or equivalently by x k / k . A mixed method which results as a compromise between (1) and ( 4 ) can be obtained as follows. For each point P3 locate its nearest neighbour P,. Order the distances d!:) between nearest neighbours from the smallest to the largest and consider only the first h pairs of nearest neighbours. In Lega et al. (2000) the first h = ( N - k)/10 pairs were considered in order to cover the orbit as much as possible and to have small initial distance vectors. Consider the images P,’k’and PJk)after k iterations until their distance d$) becomes greater than a given threshold T O .
358
A Celletti, C Froeschle', I V Tetko, A E P Villa
Let k(P,,P,) be the time necessary for d{P,) to become greater than the threshold Excluding the pairs whose initial distance is greater than T O , define
TO.
and call TO) the average over the h = ( N - k)/10 first pairs of nearest neighbours (renumbered from 1 to h ) having distances dF,F'PJ)> ro after k(P,, P') iterations:
The Lyapunov exponents as computed using ( l ) ,(2), (4), ( 5 ) were compared in Lega et al. (2000) for the 2-dimensional H h o n ' s map and for the standard mapping described by the equations x,+1 = 2 , E sin(z, yz) mod (27r)
+
Yz+l =
2'
+YE
+
mod (27r).
Several dynamical behaviour of the standard map were investigated (circulation torus, libration island, weak chaos, chaos, strong chaos). In all cases (both dissipative and conservative), the mixed (Equation 5) provides the best result when compared to the other methods. With the speed of computation, it is wise to use all algorithms to crosscheck the results.
8
Discussion
In Celletti et al. (1999) we presented a new algorithm to test for low-dimensional determinism of a short time series and to provide a good estimate of the maximum Lyapunov exponent (LCE) based on the measure suggested in Boffetta et al. (1998), Cellucci et al. (1997), Gao and Zheng (1993), Kantz (1994). The method depends only on two free parameters: the iteration parameter and the initial distance. In Section 5 , we have provided simple arguments for the choice of these parameters and performed a X2-test on some mathematical models to validate these criteria. We have applied the method of Celletti et al. to many discrete systems, showing that it is able to discriminate correctly the deterministic behaviour of the time series. Also in the case of continuous systems, our method correctly detects the deterministic dynamics. Our method of discretising these systems was aimed a t providing time series with comparable resolution to experiment. Then, we have arbitrarily scaled the values in order to obtain point processes with characteristics similar to those observed in spike trains, namely an average rate of 0.1-5.0 spikes/s a t a resolution of lms; the LCE curves showed a stable behaviour over this range. Surrogate data were also considered and in all cases a stochastic behaviour was found. As a concrete application of the method presented in Section 5 , we considered experimental time series derived from neuro-biological studies. In this case, time series are usually short and are characterised by noise variances of 10% or more of the signal variance (Rapp 1993, Theiler and Rapp 1996). The test for determinism of such series, in
Determination of chaotic attractors in short discrete time series
359
particular for the analysis of time series derived from brain activity such as extracellular single unit spike trains, has often been put in doubt because of the limited possibility to discriminate low levels of noise offered by established methods of analysis. To this respect, a method based on the measure of Gao and Zheng (Gao and Zheng, 1993) has been recently developed (Cellucci e t al., 1997) to detect noise in time series derived from Rossler, Lorenz and Mackey-Glass attractors with more than 8000 points. The problem of estimating the effect of noise corruption in time series data is difficult (Schreiber and Kantz, 1996) and depends on the nature of noise, either observational or dynamical (Theiler e t al., 1992). To this end, we considered the effect of additive, dynamical and experimental noise and compared our algorithm to the method developed by Grassberger and Procaccia (Grassberger and Procaccia, 1983). Without applying any noise filtering technique the noisy time series was identified as deterministic by the G P method, but not by our method. This result indicates a high level of sensitivity to noise by our technique. However, the G P method has the advantage of providing information on the dimensions of the embedding and of the attractor, if any, and was successfully used in studies of neuro-biological data (Babloyantz and Salazar 1985, Celletti and Villa 1996, Mpitsos e t al. 1988, Rapp e t al. 1985). Therefore, we would suggest the application of the G P method at first in order to discriminate the candidate time series for deterministic dynamics. On these selected series, the complementary use of our method would provide a more precise evaluation of which data may follow a strict deterministic behaviour. The finding of strict deterministic dynamics in several spike trains investigated in this study confirms the previous results obtained by applying the G P method (Babloyantz and Salazar 1985, Celletti and Villa 1996, Mpitsos e t al. 1988, Rapp e t al. 1985). These results establish the existence in the brain of some mechanisms able to support stable nonlinear dynamics of neuronal firing over a time that must be suitable to process some meaningful information in the brain. Theoretical prediction of the existence of such attractor networks was suggested in relation to representation of learned stimuli and was simulated in large scale neural networks with simple but reasonable assumptions of interactions between neurons (e.g., Amit and Brunel 1997, Herrmann e t al. 1993). We may raise the hypothesis that a number of neuronal networks, each one being potentially described by a limited set of differential equations (given the low-dimensionality in the experimental findings) , may interact at the level of selected single-units. Therefore, the analysis of deterministic dynamics in the brain might provide a new measure of the level of interacting networks at different conditions, encompassing also clinical and pharmacological manipulations. The method adopted in this work is simple enough to be implemented in an efficient computer program and could be used as a complementary method to the routinely accepted time domain analyses of spike trains.
Acknowledgments We are grateful to B. Hyland for many discussions during the accomplishment of this work and for providing us with experimental data. We thank G. Della Penna and E. Lega for helping us in the numerical computation of the Lyapunov exponents. A. Celletti was partially sponsored by GNFM (Gruppo Nazionale per la Fisica Matematica); A. Villa and I. Tetko were partially supported by Swiss National Science Foundation grant #2150045689.95 and INTAS-OPEN grant #97-0168 grants.
360
A Celletti, C Froeschl6,I V Tetko, A E P Villa
References Abarbanel H D I, 1996, Analysis of observed chaotic data, New York: Institute for Nonlinear Science, Springer-Verlag Abarbanel H D I, Brown R, Sidorowich J J and Tsimring L S, 1993, Rev Mod Phys 65, 1331. Abeles M and Gerstein G L, 1988, J Neurophysiol60, 909. Amit D J and Brunel N, 1997, Cerebral Cortex 7, 237. Babloyantz A and Salazar J M, 1985, Phys Lett A 111, 152. Benettin G and Galgani L. 1979, “Lyapunov characteristic exponents and Stochasticity, Intrinsic Stochasticity in plasma” in Les iditions de Physique, ed G Lava1 and D Gresillon (Coutaboeuf Orsay - France) Benettin G, Galgani L, Giorgilli A and Strelcyn J M 1980, Meccanica 15, 9. Boffetta G, Crisanti A, Paparella F, Provenzale A and Vulpiani A, 1998, Physica 116D, 301. Celletti A, Bajo V and Villa A E P. 1998, Meccanica 33, 381. Celletti A, FroeschlB C, Tetko I V and Villa A E P, 1999, Meccanica 252, 1. Celletti A and Villa A E P, 1996, Biological Cybernetics 74, n. 5, 387. Cellucci C J, Albano A M, Rapp P E, Pittenger R A and Josiassen R C, 1997, Chaos 7, 414. Damming M and Mitschke F, 1993. Phys Lett 178, 385. Eckmann J P, Kamphorst Oliffson S. Ruelle D and Ciliberto S. 1986, Phys Rev A 34, 4971. Eckmann J P and Ruelle D, 1985, Rev Mod Phys 57, 617. Eckmann J P and Ruelle D, 1992, Physica 56D, 185. FroeschlB C, 1984, Cel Mech 34, 95. Gao J and Zheng Z, 1993, Phys Letters A 181, 153. Grassberger P and Procaccia I, 1983, Phys Rev A 28, 2591. Hegger R, Kantz H and Schreiber T, 1999, Chaos 9, n. 2, 413. Herrmann M, Ruppin E and Usher M, 1993, Biol Cybern 68, 455. Hyland B I and Jarratt H, 1999, Neuroscience 88, 629. Kantz H, 1994, Phys Lett A 185, 77. Kaplan D T and Glass L, 1992, Phys Rev Lett 68, n. 4, 427. Lega E, Celletti A, Della Penna G and FroeschlB C, 2000, to appear in Int J Bif and Chaos Mpitsos G J, Burton R M Jr, Creech H C and Soinila S 0, 1988, Brain Res Bull 21, 529. Packard N H, Crutchfield J P, Farmer J D and Shaw R S, 1980, Phys Rev Lett 45, 712. Rapp P E, 1993, The Biologist 40, 89. Rapp P E, Albano A M, Schmah T I and Farwell L A, 1993, Phys Rev 47E, 2289. Rapp P E, Zimmerman I D, Albano A M, Deguzman G C and Greenbaun N N, 1985, Phys Lett A 110, 335. Rosenstein M T, Collins J J and de Luca C J, 1993, Physica 65D, 117. Schreiber T, 1998, Phys Rep 308, 1. Schreiber T and Kantz H, 1996, “Observing and Predicting chaotic signals: Is 2% noise too much?” in Predictability of Complex Dynamical Systems, ed Y A Kravtsv and J B Kadtke 69, 43, Springer Series in Synergetics. Schwaller B, Villa A E P, Tetko I V, Hunziker W, Tandon P, Silveira D C and Celio M, 1998, Europ J Neurosci Suppl 10, 4. Sugihara G and May R M, 1990, Nature 344, 734. Tetko I V and Villa A E P, 1997, Biol Cybern 76, 397. Theiler J, Eubank S, Longtin A, Galdrikian B and Farmer J D, 1992, Physica 58D, 77. Theiler J and Rapp P E, 1996, Electroenceph Clin Neurophysiol98, 213. Wolf A, Swift J B, Swinney H L and Vastano J A, 1985, Physica 16D, 285. Zeng X: Eykholt R and Pielke R A, 1991, Phys Rev Lett 66, 3229. ~
361
Non-integrability in gravitational and cosmological models Introduction to Ziglin theory and its differential Galois extension Andrzej J Maciejewski N Copernicus University, Torun and Pedagogical University, Zielona G h a , Poland
1
Introduction
At least half of mathematical models describing different phenomena in physics and astronomy, as well as many in chemistry, biology, economics and other sciences are given as a system of ordinary differential equations of the form
where z = (x',.. . , zn) are quantities parameterising our phenomenon, i.e. z describes a state of the model, and t is an evolution variable (usually the time). On the right hand sides w(z) = ( ~ ' ( z ).,. . , wn(z))of (1) we code our knowledge about the phenomenon; they say how velocity of state changes depend on the state itself. Having a model of the form (l),which we call a dynamical system, our aim is to find an explicit form of the evolution, i.e., we look for its general solution z ( t ) := cp(t,zo),where cp is a 'known' function of its variables and zo = z ( t o )is an arbitrary initial condition. For a long time it was believed that for 'reasonably simple' systems it is always possible to find such solutions. Thanks to this belief in the eighteenth and the nineteenth centuries many classes of differential equations where solved and many techniques for this purpose were developed. One of them, important for this lecture, is connected with the notion of the first integral. A continuous function F ( z ) is a first integral of system (l),if it is constant on its solutions, i.e., F(p(t,zo))= F ( z 0 ) does not depend on t for an arbitrary zo. In other words, a first integral gives us a conservation law for the system described by (1). When function F ( z ) is differentiable, the above definition of a first integral can be formulated in a more usable form. Namely, F ( z ) is a first integral of
Andrzej J Maciejewski
362
system (1) if it satisfies the following equation n
L , ( F ) ( z ) := C u " z ) a i F ( z )= 0 , i=l
where a, denotes the partial derivative with respect to z2,and L,(F) is named the Lie derivative of function F with respect to vector field v. h level of constant value of F , i.e. M j := {z E U?"
I F ( z )= f } ,
when non-empty, consists of whole phase curves-if zo E M f then cp(t,zo)E M f for all t. We say also that h f f is invariant with respect to the flow of system (1). It is clear that M j is a (n- 1)-dimensional hyper-surface in R". Thus, we can fix f and look for solutions p(t.zo) 'lying' in M j , i.e.. for which F(zo)= f . The gain is that in solving our problem we have one equation less because we can eliminate one variable from system (1) using equation F ( z ) = f. It is obvious that with enough first integrals we can solve Equation (1). In fact we have the following.
Theorem 1. If F,, for 2 = 1 , .. . , n - 1, are functionally independent first integrals of system (l), then it is integrable b y quadratures . Integrals are functionally independent if their gradients are linearly independent. In the above theorem the meaning of 'integrable by quadratures' is as follows. .4s first integrals are functionally independent , we can choose them as new coordinates. Without loss of generality we can put y2 = F,(z) for i = 1, . . . , n - 1, and y" = z".Then in new variables, system (1) reads d -y2 = 0 , for i = I,..'. , n - 1, dt d -y" = P ( y ) := v"(x'(y),. . . , z"-'(y), y"), dt
and its solution is given by yi(t) := ci for i = 1 , .. . , n - I, and
Thus the whole set of calculation reduce to inversions of known functions (we need them to express z as a function of y) and inversion of integral ( 2 ) . Not only do first integrals help to integrate explicitly a system of differential equations, we can also look for more complicated objects which are constant along solutions of the such that system. Generally, we can look for a tensor T with coordinates q'll,',';I(z) T;i;;i;(p(t,zo)) do not depend on t . The last requirement is equivalent to the following condition
which expressed in coordinates reads
Non-integrability in gravitational and cosmological models
363
where the summation convention is assumed. Among tensor invariants symmetries play an important role. A symmetry is, by definition, an invariant vector field. Thus, according to (3) U(.) = ( u ' ( x ) ,. . . , u n ( z ) )is a symmetry of system (1) if
LtJ(u)(z) = [ v , u l ( z )= 0, where [., .]denotes the Lie bracket of vector fields v and U . The following theorem explains the importance of symmetries.
Theorem 2 (Lie). If there exist linearly independent symmetries U' = U , 212, . . . , U , of system (1) such that [ui,u j ] = 0 for i , j = 1,.. . , n then it as integrable b y quadratures. Other important tensor invariants are the n-forms w = Mdx' A . . . A dxn:
(4)
where M = M ( s ) is a function. In older literature an invariant n-form is called the Jacobi Last Multiplier . A form (4) is invariant with respect to system (1) if d t ( M v i )= 0. In his Vorlesungen uber Dynamik Jacobi devoted nine of the thirty five lectures to the investigation of properties of this kind of invariant. The most important; is the following theorem
Theorem 3 (Jacobi). If there exist n - 2 functionally first integrals Fi, i = 1,. . . , n - 2, and an invariant n-form of system (1) then it is integrable b y quadratures. It is relatively easy to formulate a whole series of theorems about integrability of system (1) assuming the existence of a large enough number of tensorial invariants of different kinds. All of them, as the three given above, have generally local character. It is worth mentioning that even when we have a large enough number of first integrals or other tensor invariants the problem of finding an explicit form of the global, i.e., valid for all t , solution is a difficult one. Now, it is almost unquestionable that dynamical systems describing real processes are typically Hamiltonian if we can neglect dissipation of energy, (Novikov, 1982). Therefore, in what follows we consider only Hamiltonian dynamical systems: d --(I' dt
-
'-
dH dpi
d dtpi =
-,
--,aH aqi
i = 1, . . . , n ,
(5)
where H = H ( q , p ) is a Hamiltonian function. Our aim is to study the question of integrability of such systems. The concept of Poisson bracket plays the fundamental role in the Hamiltonian formalism. In our canonical settings it is defined as follows n
(atf(z)at+ng(z) - (at+nf(z)atg(z)) >
g ) ( z ) := a=1
where f and g are smooth functions of z := (21,.. . , ZZn) := ( q 1 , . . . , q,,pl,. . . , p , ) . It is easy to check that Poisson bracket is bilinear and antisymmetric. Moreover, it satisfies the Leibniz rule
Andrzej J Maciejewski
364 and the Jacobi identity
{f,{g,h } ) + { h ,{f,9 ) ) +
(91
{ h ,f>>= 0;
where f , g and h are arbitrary smooth functions of t . Using Poisson brackets we can rewrite Equations ( 5 ) in the form
(6) Now, it is obvious that F = F ( z ) is a first integral of (6) if { F , H } = 0 . Hamiltonian systems have many specific properties. The most important is that they always possess a t least one first integral, namely H , and moreover, they possess invariant 2k-forms uk, with k = 1 . . . . . n. These forms are defined as follows n
= u i:= x d q , A d p i ,
!J&=ii!AWk-l,
k = 2 ,... ,n.
2=1
These properties are the reason why, for the integrability in quadratures of 2n Hamilton’s equations of motion, we need only n commuting first integrals Fl=H? F2,. . . F,, {Fi,F’} = 0, i, j = 1 , .. . , R. More precisely we have the following theorem. ~
Theorem 4 (Liouville). Assume that Hamiltonian system (6) possesses n commuting first integrals Fl = H , . . . F,, which are functio, zlly independent. Then a common constant value level IWf := ( 2 E R2n Fi(z)= fi}? is an invariant smooth manifold, and
1. If hff is compact and connected then it
;
diffeomorphic to an n-dimensional torus.
2. System ( 6 ) is integrable by quadratures. When a Hamiltonian system with n degrees of freedom possesses n independent and commuting first integrals then we say that it is completely integrable . All the facts given above are widely known. For more complete exposition refer to books of Arnold (1978); Arnold et al. (1988): Kozlov (1996) as basic sources of information about integrability. There exist several approaches for proving non-integrability of a Hamiltonian system, see e.g. Kozlov (1983, 1996). In this lecture I will present one of them. The main ideas of this approach have their roots in works of K Weierstrass, P Hoyer, S V Kovalevskaya, A M Lyapunov, H Poincard, P Painlevd, E Picard and many others. In the 1980‘s S L Ziglin published two papers (Ziglin, 1982, 1983) where the basic theorems of what is now known as Ziglin theory, were formulated. Clearly he found a beautiful and very powerful unification of old ideas. Later on, thanks to works of H Yoshida, H Ito, R C Churchill, D R Rod, and many others the Ziglin theory was applied to study non-integrability of various systems. Xt the same time the theory was developed. Quite recently thanks to works of J J Morales-Ruiz, J P Ramis, C Simo, R C Churchill and D R Rod the Ziglin
Non-integrability in gravitational and cosmological models
365
theory evolved in a new direction and was enriched by the so-called differential Galois approach. Although the theoretical background for the Ziglin theory and its differential Galois extension is rather mathematically involved, for applications of these theories it is enough to be equipped with the standard mathematical knowledge supplemented with some basic facts from higher algebra, Riemann surfaces and analytic theory of differential equations. The aim of this lecture is to present the method in relatively simple settings and point out those aspects which are important from the point of view of applications.
2
Integrability and variational equations
The Ziglin theory, as well as its extension due to Morales and Ramis, relates the integrability of the original system to the appropriately defined integrability of variational equations around a particular solution. In what follows I demonstrate the simplest forms of this relation. Let us consider system (1) and let us assume that we know its particular solution - p ( t ) we can rewrite it in the form
z = p ( t ) and its first integral F ( z ) . Then introducing new variables y = 5
Truncating the right hand side of (7) on the first term we obtain the variational equation around the particular solution p(t)
As we assumed that F ( z ) is the first integral of (l),F ( y ( 7 ) . We can expand it into the Taylor series
+ p ( t ) )is the first integral of
where Fk(Y,t) is a homogeneous polynomial of degree IC in y , F{cp(t))is a constant, and Fko(y,t ) # 0. Now, it is easy to see that variational equations (8) have also a first integral and that this integral is given by the first non-trivial term in expansion (9), i.e., it is Fko(y,t ) . This simple consideration gives us the following implication
Proposition 1. I f an analytic system possesses an analytic first integral then the variational equations around a particular solution possess a polynomial first integral. Thus, if we are able to prove that the variational equation around a particular solution does not have a first integral we prove that the studied system does not have a first integral. It seems that this fact is useless because it is difficult in general to decide whether a variational equation does not have a first integral. However, it is not like that. To see this let us consider the simplest situation when our particular solution is an equilibrium. Then A ( t ) is a constant matrix and we can apply the following theorem.
Andrzej J Maciejewski
366
Theorem 5. System ( 8 ) with constant coeflcients matrix possesses a polynomial first integral if and only if eigenvalues A I , . . . , A, of the matrix are resonant, i.e., they satisfy the following relation ki A i
+ . . + k,A, ,
= 0,
kl
+ . . + k, *
# 0,
for some non-negative integers k l . . . . , k,. We can extend this kind of reasoning to a more general situation when we are looking for conditions of existence of a meromorphic first integral, i.e. when F ( z ) is a ratio of two analytic functions.
Proposition 2. If an analytic system possesses a meromorphic first integral then the variational equation possesses a rational first integral. Then we need a theorem analogous to Theorem 5 .
Theorem 6. System ( 8 ) with constant diagonalisable coeficients matrix possesses a rational first integral if and only if its eigenvalues AI, . . . A, are resonant, i.e., they satisfy the following relation ~
+ . " + knAn = O .
klAl
Ikll
+ . . . + lknl # 0:
for some integers ki... . k,. ~
For proofs of the last two theorems see Nowicki (1994, 1996) When a particular solution is periodic, let us say with the period 27r. then A(t) is 271.-periodic. In such cases. as is well known. system (8) can be transformed to a linear system with constant coefficients
Matrix B in the above equation is defined by exp B = Y(27r),where Y ( t ) is the matrix of fundamental solutions of (8), i.e. it is the solution of the following matrix initial value problem d -Y dt
= A(t)Y,
Y ( 0 )= E ,
where E denotes the unit matrix. Eigenvalues of matrices Y(27r)and B are called multipliers and characteristic exponents of the periodic solution, respectively. Matrix Y (271.) is called the monodromy matrix of system ( 8 ) . Matrix A(t) is not arbitrary, it is the Jacobian matrix of an autonomous system. From this fact it follows that a t least one multiplier is equal one, or, equivalently, one characteristic exponent is equal zero. What is more interesting, we have the following two theorems.
Theorem 7 (Poincark). If system (1) possesses k first integrals which are independent along a periodic solution then at least k + 1 multipliers are equal one. Theorem 8 (Poincar6). Assume that Hamiltonian system ( 6 ) possesses k commuting first integrals Fl = H , . . . F k which are functionally independent along a periodic solution. Then at least 2k multipliers are equal one.
Non-integrability in gravitational and cosmological models
367
From these theorems it follows that the existence of periodic solutions with nonvanishing characteristic exponents is an obstacle for integrability. This observation is the basic idea of a method developed by PoincarC for proving non-integrability of Hamiltonian systems. A more detailed explanation of the last two theorems, as well as a clear exposition of the PoincarC method can be found in (Kozlov, 1996).
3
Method of Kovaleveskaya
We know from complex analysis that to explain certain properties of real functions it is good to consider their real argument as a complex variable. Similarly, to better understand the properties of solutions of differential equations it is helpful to consider the independent variable t as a complex quantity. Thus, we consider a complex dynamical system of the form
where w(x) = (wl(x),. . . , w"(x)), and functions w z ( ( z ) are holomorphic. In the literature instead of a complex dynamical system the name analytic system is also used. The basic definitions and theorems about the existence and uniqueness of solutions are similar to those corresponding to the real case. However, we have to notice important differences which are specific only for complex systems. First of all let us observe that a solution p(t) of a real system defines a curve, i.e., a one dimensional object in the phase space. A solution p ( t ) of a complex system is a two dimensional object because coordinates of a point lying on the phase curve corresponding to p ( t ) depend on two real parameters: real and imaginary parts o f t . In the real situation we assumed that solutions exist for all values o f t E R. Of course this is not always true-there are simple examples showing that a solution of a differential equation exists only on a finite interval. Similarly, when time is complex, a solution is defined only for t belonging to a certain subset of the complex time plane. However, in the complex case a very specific phenomenon can occur. Imagine that we know a solution q(t) in a neighbourhood U of to E C. Because we are on the complex time plane we can calculate this solution along different paths. Let us choose a path which starts from to, leaves neighbourhood U and returns to to. It can happen that the value of solution obtained in the end of the 'travel' along the closed path is different from that we started with. Thus, we can have a multi-valued solution-the value of solution p(t1) at time tl depends on the path from to to tl we chose to calculate .p(tl).The most astonishing is that this phenomenon is rather a rule than exception, i.e., 'typical' solutions of 'typical' differential equations are multi-valued.
It is important to understand that this phenomenon is not in contradiction with the uniqueness of solution of the initial value problem. Calculating our solution along the chosen path we proceed from point to point using analytic continuation. This process can be described as follows. Our path is given by a continuous function y : [0, 11 + C, such that y(0) = to and y(1) = t l . Being at to we calculate a local solution p ( t ) with initial condition p(t0) = xo which is unique and defined in a disc D, = { t E C I It-tl < a } , where a > 0. Then we move on the path choosing s > 0 such that y(s) E D,, and we find the
Andrzej J Maciejewski
368
t Im t
Re t
Figure 1. Analytic continuation o,f a solution along path y solution $ ( t ) satisfying initial condition $(y(s)) = p(y(s)). This new solution is unique and defined in a certain disc DL = { t E C I It-tl < zl}. Moreover, for t E D a nDi we have y ( t ) = $ ( t ) ,i.e. they coincide at points where they are both defined. It can be shown that making a finite number of described steps we reach the end point of the path. This process is illustrated in Figure 1. Nowadays, instead of multi-valued functions we use the notion of an analytic function. It is an object obtained from a locally given holomorphic function by all possible analytic continuations along all possible paths. We can associate with an analytic function a geometrical object-a Riemann surface. A multi-valued function is a 'usual' single-valued function defined on its Riemann surface. A classical and beautiful exposition of the theory of analytic differential equations, as well as Riemann surfaces is given in the book of Golubev (1950). See also Weyl (1955) and Hille (1976). In our further considerations the following example will be important. Let us assume that the particular solution of system (10) is a rational (and thus single-valued) function
where s ( t ) and ~ ( tare ) polynomial functions. It is well defined for all t except a finite number of poles d,. z = 1.. . . , m, i.e. zeros of r ( t ) . Thus, the Riemann surface of this function can be identified with C\{d,.. . . . d m } . However, when we study the behaviour of the solution when It1 + m it is customary to compactify the complex plane adding to it one point {m}. In this way we obtain the classical Riemann sphere C, = C U {CO}. The name 'sphere' for this object is given because C, and the unit sphere S2 are biholomorphic by means of the stereographic projection. The Riemann sphere can also be identified with the complex projective line CP', see the book of Miranda (1995) for a contemporary exposition of Riemann surfaces theory. Thus, the compact Riemann . . , dm}-z.e. the Riemann surface associated with the rational function (11) is C,\{dl,. sphere with some points removed. Examples of more complicated Riemann surfaces which appear in applications will be given later.
Non-integrability in gravitational and cosmological models
369
Figure 2. So$a Vasil’ievna Kovalevskaya 1850-1891 The natural question arises: why investigate systems with complex time when our models with real time are complex enough? In the nineteenth century it was observed that solutions of integrable mechanical problems considered as functions of complex time are single-valued. Thus, the question appeared in the context of asking if, and in what sense, this is connected with integrability. In fact, the Ziglin and Morales-Ramis theories arose from many different attempts which were made during the century to give an answer to this question. It seems that the idea of connecting integrability with properties of solutions as functions of the complex time attracted so many scientist because of brilliant works of S V Kovalevskaya (Figure 2) connected with the investigation of the heavy top problem (Kowalevski, 1888, 1890). She wondered when all solutions of the heavy top equations of motion would be single valued and she found that it occurs in all the then known integrable cases (i.e. Euler and Lagrange cases) and in one additional case. She showed that this additional case is integrable and she found an explicit form of the solutions for this case. The first step she made---searching for those parameters’ values for which all solutions of a differential equations are single valued -now is known as the Kovalevskaya analysis. To describe the method of Kovalevskaya we start with the following observation made by Lyapunov (1894) who improved the Kovalevslbya considerations.
Proposition 3. If all solutions of a system are single-valued then all solutions of the variational equation around a particular solution are also single-valued. Equations of motion of the heavy top problem are quasi-homogeneous and this p r o p erty allows one to find very simple particular solutions. The concept of quasi-homogeneity is as follows. A function F ( z ) is quasi-homogeneous of degree m if there exist non-zero
Andrzej J Maciejewski
370 integers 91,. . . , g, such that
F(Xg'21,. .
for all z and X
. , Pnz") = X"F(z1,. . . , z"),
> 0. An equation of the form (10) is quasi-homogeneous if vz(Xglxl,.. . , Xgnx") = Xg~+lvz(zl,. . . ,P), i = 1 , . . . , n.
For a more general definition of quasi-homogeneity and properties of quasi-homogeneous equations see (Maciejewski afid Popov, 1998). For a quasi-homogeneous system we are able to find a particular solution. It is easy to check that formulae z'((t) = czt@,
i = 1 , . . . , n,
(12)
where c = (c1, . . . , c,) are solutions of nonlinear equations ~ ' ( c )= -gtcz,
i = 1,.. . , n,
given a particular solution of the system. Moreover, variational equations around a solution (12) have the form (13) They have a particular solution y'((t) = ettP-gl,
i = 1 , .. . , n,
(14)
where p is an eigenvalue and e = ( e l , . . . , e n ) is the corresponding eigenvector of the Kovalevskaya matrix:
Let us notice that from (14) it follows that if the Kovalevskaya matrix has a non-integer eigenvalue then there is a solution of variational equations (13) which is not single-valued. More precisely, we have the following theorem.
Theorem 9 (Lyapunov). If all solutions of a quasi-homogeneous system are singlevalued then the Kovalevskaya matrix is diagonalisable and has integer eigenvalues. Eigenvalues of the Kovalevskaya matrix are called Kovalevskaya exponents. Although in the epoch of Kovalevskaya and later, it was unclear why the system can be integrable when all solutions are single-valued, the described approach allowed the finding of many integrable systems. The so called Peinlevk analysis, very popular in the physical literature, is in fact a version of the Kovalevskaya analysis adopted to arbitrary (not only quasihomogeneous) systems, see Ramani et al. (1989). The literature on this subject is very rich, see an overview of Conte (1999) and references therein. The first result connecting the existence of integrals with the non-branching of solutions of a quasi-homogeneous system was found by Yoshida (1983a,b).
Non-integrability in gravitational and cosmological models
371
Theorem 10 (Yoshida). If a quasi-homogeneous function F ( s ) of degree m is a $rst integral of the quasi-homogeneous system (10) and V F ( c )# 0 , then m is an eigenvalue of the Kovalevskaya matrix. We notice here that the assumption V F ( c ) # 0 can not be weakened in the above theorem. It is easy to give an example such that all Kovalevskaya exponents except one (equal to -1) are irrational but, nevertheless, the system possesses first integrals or is integrable. Anyway, the Kovalevskaya analysis is still used in the search for integrable cases, see e.g. (Borisov and Tsygvintsev, 1996, 1997; Borisov and Dudoladov, 1999). It is important also to notice a remarkable similarity of the above theorem wit,h Theorem 7. After works of Yoshida there appeared works where the authors investigated the relations between the existence of tensor invariants and Kovalevskaya exponents, see (Lochak, 1985; Kummer et al., 1991; Kozlov, 1992, 1996; Sadktov, 1993,1994). The existence of such invariants is always connected with a certain resonance relation between Kovalevskaya exponents. Let us make a historical remark. In the PhD thesis of P Hoyer (1879) we can find all the basic steps of the Kovalevskaya analysis. Hoyer was a pupil of K, Weierstrass and it seems that he was the first to be interested in the investigation of systems wit,h singlevalued solutions. Kovalevskaya had good contacts with Weierstrass and she had to know the works of his pupils. However, independently of the fact that some techniques for tests when solutions are single valued were developed before her, it is undoubted that the idea of a connection of integrability with the lack of multi-valued solutions is her own. A brief history of the Kovalevskaya exponents can be found in Goriely (2000).
4
Ziglin theory
We continue to study the question of the integrability of a complex dynamical system of the form (1). In the above considerations we underline the importance of variational equations t o the study of the integrability of this system. The implication formulated in Proposition 2 plays the fundamental role but to make it useful we need an effective and universal method for testing whether the variational equations around a particular solution possess a first integral. There are many examples when the particular solution of the system is more complicated than that used for investigating quasi-homogeneous systems. In applications, most often, we find solutions which are expressed in terms of elliptic or hyper-elliptic functions, and variational equations (8) for these solutions are complicated. The existence of a first integral imposes a restriction on the behaviour of solutions of variational equations. Thus, we have t o know how to describe this behaviour without the explicit knowledge of the solutions. It appears extremely fruitful to investigate how solutions of the variational equations are changing when we calculate them along different closed paths on the complex time plane. This leads to the notion of the monodromy group which we explain below. Let p ( t ) be a particular solution of system (10) which is not an equilibrium and let r be the Riemann surface associated with this solution. It is convenient for better understanding to work with a possibly simple example. We just assume that our solution is rational, and thus its Riemann surface is r = C,\{d,, . . . , &}. We consider matrix
Andrzej J Maciejewski
372
Figure 3. Closed oriented paths on ℂ \ {d₁, d₂, d₃, d₄, d₅} with the common point t₀. Paths γ and γ′ are homotopic.

We consider the matrix variational equations for our solution,

    d/dt Y = A(t) Y,    (16)
Entries of the matrix A(t) are rational functions which have poles at the points d_k, k = 1, …, m. We choose one pole, let us say d_j, and we fix t₀ ∈ Γ which is close to d_j. Then we find a local solution Y(t) such that Y(t₀) = E. Now, we choose a loop γ which encircles the pole d_j and we make an analytic continuation of Y(t) along this loop. As a result, we obtain a new solution Ỹ(t) in a neighbourhood of t₀. As is known, a system of n linear differential equations has n linearly independent solutions. Thus, there must exist a non-singular matrix M_γ such that Ỹ(t) = M_γ Y(t). It can be shown that the matrix M_γ does not depend on a particular choice of the loop γ provided that it encircles only the one pole d_j. When M_γ is not the identity matrix, the variational equations (16) have multi-valued solutions. The matrix M_γ is called the local monodromy around the singular point d_j. Now, we consider all closed oriented loops on Γ starting and ending at t₀. We can identify two loops if one of them is a continuous deformation of the other. In this way all loops are divided into separate classes, the so-called homotopy classes. We can also define the multiplication of two loops: we say that γ = γ₁ · γ₂ if we go first along γ₁ and then along γ₂. We say that γ⁻¹ is the inverse of the loop γ if it has the orientation opposite to γ. In fact, these operations are correctly defined on homotopy classes and give them the structure of a group. It is the first homotopy group and is denoted by π₁(Γ). In the same way as for the local monodromy we obtain a monodromy matrix M_γ for an arbitrary loop γ, and this matrix does not depend on a specific choice of the loop, only on its homotopy class, see Figure 3. Moreover, the group structure of π₁(Γ) is transformed to the group structure of the matrices M_γ: if γ = γ₁ · γ₂ then M_γ = M_{γ₂} M_{γ₁}. In this way we obtain the monodromy group of equation (16), and we denote it ℳ. Assume now that our system has a first integral F(z); then, as we explained in Section 2, the variational equations (8) also
have a first integral F_{k₀}(y, t). This integral is constant along any analytic continuation of a local solution of the variational equations. In particular, if we start from the point t₀ and return to it along a loop γ, then from the local solution y(t) we obtain M_γ y(t). It follows that the first integral of the variational equations satisfies the following condition

    F_{k₀}(M_γ y, t) = F_{k₀}(y, t),    (17)

for all matrices M_γ from the monodromy group. We can see now that the more matrices we have in the monodromy group, the more restrictions we have on the first integral. In particular, when the monodromy group is big enough, it is possible to show that there is no non-constant function satisfying (17). Functions satisfying (17) are called integrals of the monodromy group.
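The following small numerical sketch (my own illustration; the matrix C, the base point and the step number are arbitrary choices, not taken from the lectures) shows how a local monodromy matrix can be computed in practice. For the Euler system dY/dt = (C/t)Y, with C a constant matrix and a single pole at t = 0, the continuation of the fundamental solution Y, Y(t₀) = E, along the loop t(s) = t₀ e^{2πis} can be written in closed form, Y(t) = (t/t₀)^C, so the local monodromy equals exp(2πiC); integrating the system numerically along the loop reproduces this matrix:

    import numpy as np
    from scipy.linalg import expm

    C = np.array([[0.0, 1.0], [-0.25, 0.0]], dtype=complex)   # constant matrix; A(t) = C/t has a pole at t = 0

    def dY_ds(s, Y, t0=1.0):
        """Right-hand side of dY/ds = A(t(s)) Y dt/ds along the loop t(s) = t0*exp(2*pi*i*s)."""
        t = t0 * np.exp(2j * np.pi * s)
        dt_ds = 2j * np.pi * t
        return dt_ds * (C / t) @ Y

    def monodromy(steps=4000):
        """Analytic continuation of Y(t), Y(t0) = E, once around the pole (classical RK4 in s)."""
        Y, h = np.eye(2, dtype=complex), 1.0 / steps
        for n in range(steps):
            s = n * h
            k1 = dY_ds(s, Y)
            k2 = dY_ds(s + h/2, Y + h/2*k1)
            k3 = dY_ds(s + h/2, Y + h/2*k2)
            k4 = dY_ds(s + h, Y + h*k3)
            Y = Y + h/6*(k1 + 2*k2 + 2*k3 + k4)
        return Y

    M_loop = monodromy()
    print(np.allclose(M_loop, expm(2j*np.pi*C)))   # True: the local monodromy equals exp(2*pi*i*C)

In a genuine application the continuation has to be done numerically in exactly this way, because the variational equations along an elliptic or hyper-elliptic particular solution rarely admit closed-form solutions.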
As our variational equations are obtained from an autonomous system, all monodromy matrices have at least one eigenvalue equal to one. If we remove them, our considerations become simpler. To this end, we assume that the coordinates z are chosen in such a way that the investigated particular solution has the form φ(t) = (0, …, 0, φ_n(t)). (This assumption is not restrictive: for an arbitrary particular solution we can always choose local coordinates in this way.) Then the first n − 1 variational equations form a closed subsystem called the normal variational equations. The following lemma is due to Ziglin (1982).

Lemma 1 (Ziglin). If system (10) possesses k functionally independent meromorphic first integrals in a neighbourhood of a particular solution, then the monodromy group of the normal variational equations possesses k independent first integrals.

This lemma contains the great idea of Ziglin and is the kernel of his theory. We notice that the notion of the monodromy group has been known since the works of Riemann; however, Ziglin was the first who, following the ideas of Kovalevskaya and Lyapunov, showed its importance for integrability. The above lemma can be used effectively for proving the non-integrability of non-Hamiltonian systems. An example is the so-called (ABC)-system investigated by Ziglin (1996, 1998). When the system is Hamiltonian, we can achieve more than what follows directly from Lemma 1; this is the subject of the Ziglin theory. To present the basic facts we consider a complex Hamiltonian system given by a complex Hamiltonian function H : ℂ^{2n} → ℂ. We assume that Hamilton's equations

    dz/dt = J ∇H(z),    J = [ 0  E ; −E  0 ],    (18)

have a non-equilibrium solution z = φ(t). To simplify the exposition, we assume that this solution lies in the two-dimensional invariant plane
    Π = { (z₁, …, z_{2n}) ∈ ℂ^{2n} : z_i = 0, i = 1, …, 2(n − 1) }.

The phase curve Γ = { φ(t) ∈ ℂ^{2n} : t ∈ ℂ } is a Riemann surface with local coordinate t. Together with equations (18) we also consider the variational equations along the solution φ(t),

    dy/dt = A(t) y,    A(t) = J ∇²H(φ(t)).
This system separates into the normal and the tangential subsystems. In our setting this separation takes a very simple form: the matrix A(t) has a block diagonal structure. We consider the normal variational equations (NVE) defined by B(t), the 2(n − 1) × 2(n − 1) upper diagonal block of the matrix A(t). Let us note that for a Hamiltonian system we can lower the dimension of the variational equations by two due to the existence of its first integral. As the considered system is Hamiltonian, the monodromy matrices of the normal variational equations are symplectic. Let us take an element of the monodromy group, M ∈ ℳ. Its spectrum has the form

    spectr(M) = (λ₁, λ₁⁻¹, …, λ_{n−1}, λ_{n−1}⁻¹),    λ_i ∈ ℂ.

The element M is called resonant if

    ∏_{i=1}^{n−1} λ_i^{k_i} = 1    for some    (k₁, …, k_{n−1}) ∈ ℤ^{n−1} \ {0}.
Theorem 11 (Ziglin, 1982). Let us assume that there exists a non-resonant element M ∈ ℳ. If the Hamiltonian system has n meromorphic first integrals F₁ = H, F₂, …, F_n which are functionally independent in a connected neighbourhood of Γ, then any other monodromy matrix M′ ∈ ℳ transforms eigenvectors of the matrix M into its eigenvectors.

In the case of a system with two degrees of freedom this theorem can be formulated in a more operational way.
Theorem 12. Let us assume that there exists a non-resonant element M ∈ ℳ. If there exists another element M′ ∈ ℳ such that

1. Tr M′ ≠ 0 and M M′ ≠ M′ M, or

2. Tr M′ = 0 and M M′ M ≠ M′,

then there is no additional meromorphic first integral functionally independent of H in a connected neighbourhood of Γ.

The main difficulty with the application of the Ziglin theorem is the determination of the monodromy group of the NVE. Only in very special cases can we do this analytically. Nevertheless, Ziglin (1983) was able to prove that the heavy top problem is non-integrable except for the known classical Euler, Lagrange and Kovalevskaya cases. Yoshida (1987, 1989) adapted the Ziglin approach to the special case when the Hamiltonian of a system has a natural form and the potential is a homogeneous function. In this case we can find a particular solution in the form of a 'straight line solution', and the normal variational equations for it can be transformed to a product of certain copies of hypergeometric equations for which the monodromy group is known. This allows us to formulate adequate theorems in the form of an algorithm. The Ziglin theory was developed and analysed by Ito (1985, 1987, 1990), Braider and Churchill (1990), Churchill and Rod (1988, 1991) and others. It was applied successfully
for proving the non-integrability of many problems connected with: rigid body dynamics by e.g. Christov (1993) and Ziglin (1987, 1997); orbital dynamics by e.g. Irigoyen (1995) and Ferrándiz and Sansaturio (1995); cosmological models by e.g. Maciejewski and Szydlowski (2000); other branches of physics by e.g. Françoise and Irigoyen (1993), Roekaerts and Yoshida (1994), Maciejewski and Goździewski (1999), and Rod and Sleeman (1995), to mention only a few examples. In 1997 Ziglin published a paper (Ziglin, 1997) in which he considerably improved the power of his method. Namely, he extended it so that it became possible to prove the non-existence of real meromorphic first integrals. Using this approach he proved the real non-integrability of the heavy top problem except for the known integrable cases, as well as the non-integrability of the Hénon-Heiles and Yang-Mills systems.
5 Morales-Ramis theory
At the end of the nineteenth century there appeared a new problem which attracted mathematicians working on linear differential equations, and which is important for our further story. Having a system of linear equations of the form (8), or a single n-th order linear homogeneous differential equation

    y^(n) + a_{n−1}(t) y^(n−1) + ⋯ + a₀(t) y = 0,
we can ask when its solutions can be expressed in terms of 'known' functions. To make this question reasonable we have to be precise as to what we mean by a known function. We have several possibilities; however, for our purposes we need to understand what a Liouvillian function is. We start with the set of rational functions, which we denote ℂ(t). As is well known, this set is a field (in the algebraic sense) with the usual addition and multiplication. Moreover, we know how to differentiate such functions. It is an example of a differential field. We can construct a bigger differential field by making the following three operations.

1. An algebraic extension. We pick a polynomial whose coefficients are rational functions. Then a solution of the equation

       f^k + b_{k−1}(t) f^{k−1} + ⋯ + b₀(t) = 0,    b_l ∈ ℂ(t), l = 0, …, k − 1,

   defines an algebraic function f(t). We can differentiate it according to the known rules. Moreover, we can construct the smallest field containing it.

2. An extension by an integral. We pick a rational function g ∈ ℂ(t) and we calculate f(t) = ∫^t g(s) ds. Then we construct the smallest field containing f.

3. An extension by an exponential of an integral. We pick g ∈ ℂ(t) and we calculate f(t) = exp ∫^t g(s) ds. Then we construct the smallest field containing f.

In all three points the construction of the smallest field is easy. It is just the field of rational functions of the variable f with coefficients taken from ℂ(t). We have described these elementary extensions starting from the field of rational functions but, in fact, we can start from an arbitrary differential field. Now imagine that starting from ℂ(t) we make an elementary
extension, and we obtain a field K₁; then starting from K₁ we obtain a field K₂ by an elementary extension, and, after a finite number of such steps, we obtain a field K_m. Elements of K_m are Liouvillian functions. It is easy to observe that all elementary functions are Liouvillian. Now, our question can be formulated more precisely: under what conditions are all solutions of a linear differential equation Liouvillian? It is amazing that the answer to this question is similar to that given by Galois answering the question about the solvability of a polynomial equation by radicals. Moreover, the theory which gives the answer is similar to the Galois theory, and is called the differential Galois theory. The differential Galois group plays the crucial role in this theory. We will skip the precise definition of this group and describe it informally. From the previous section we know that with a system of linear equations we can connect the monodromy group. However, with the same system we can connect a bigger group, the differential Galois group. We can think about it as a group of invertible matrices which transform a solution of the system into a solution. More precisely, it is the group of matrices which preserve all polynomial relations between solutions of the considered equations. It is important that this group is an algebraic group, because the properties of this kind of group are well known. We can consider such a group as a disjoint finite union of connected (in an appropriate topology) sets. One of these sets contains the identity matrix and it is called the identity component of the differential Galois group.
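Two standard examples, neither of them taken from the text above, may help to fix these ideas. Starting from ℂ(t), the extension by the exponential of an integral exp(∫^t 2s ds) = e^{t²}, followed by the extension by the integral ∫^t e^{s²} ds, produces a tower ℂ(t) ⊂ K₁ ⊂ K₂; hence the non-elementary function ∫^t e^{s²} ds is Liouvillian. On the other hand, for the Airy equation y″ = t y the differential Galois group is known to be the full group SL(2, ℂ); this group is connected and cannot be put into triangular form, and indeed the Airy equation has no non-zero Liouvillian solutions.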
Now, we can formulate the following.

Theorem 13. All solutions of a system of linear equations are Liouvillian if and only if the elements of the identity component of the differential Galois group can be put simultaneously into triangular form.

In the above we have joined two theorems from the classical exposition of the subject, see e.g. the book of Kaplansky (1976) and the paper of Singer (1990). Remembering what was said about the Ziglin theory, we can now ask what is the connection between the integrability of a system and the properties of the differential Galois group of the variational equations around a particular solution of the system. In the context of Hamiltonian systems, this question was analysed first by Morales-Ruiz (1989) and independently by Churchill and Rod (1991). Later investigations gave rise to a whole theory which is fully described in the book of Morales-Ruiz (1999). We point out the basic ideas of this theory. First of all, as in the Ziglin theory, we consider a complex Hamiltonian system for which we know a particular solution. Then, we analyse the variational (or normal variational) equations. Here the differences between the Ziglin and Morales-Ramis theories appear. In the Ziglin theory we ask when the system with n degrees of freedom possesses n, not necessarily commuting, first integrals. In the Morales-Ramis theory we ask whether the system is completely integrable. Moreover, in the Morales-Ramis theory we ask how the assumed integrability manifests itself in the properties of the differential Galois group. These differences are important because, on the one hand, the imposed integrability is more restrictive than in the Ziglin theory, and, on the other hand, the differential Galois group is bigger than the monodromy group; thus, in general, it must be easier to prove non-integrability. The basic theorem of the Morales-Ramis theory is the following.
Theorem 14. If a Hamiltonian system is completely integrable in a neighbourhood of a particular solution, then the identity component of the differential Galois group of the variational equations along this solution is Abelian.

The first integrals mentioned in this theorem are assumed to be meromorphic. In order to apply the above theorem we have to know how to check whether the identity component of the differential Galois group is Abelian. Generally speaking, this is a difficult problem; however, for some classes of equations we know the differential Galois group and, for low order systems, there exist algorithms which allow us to answer this question. An application (Morales-Ruiz, 1999) of the present theory to the investigation of the integrability of a natural Hamiltonian system with a homogeneous potential shows the full power of the theory. As this general result can be used in many examples, we summarise it here. We consider a natural complex Hamiltonian system given by the following Hamiltonian function

    H = (1/2) ∑_{i=1}^{n} p_i² + V(q),    (20)
where V(q) is a homogeneous function of integer degree k. Taking into account the quasi-homogeneous property of the equations generated by the above Hamiltonian, we can look for particular solutions of the form

    q_i(t) = c_i φ(t),    p_i(t) = c_i φ̇(t),    i = 1, …, n,

where φ(t) is a scalar function satisfying the following equation

    φ̈(t) = −φ(t)^{k−1},    (21)

and c = (c₁, …, c_n) is a solution of the nonlinear equations

    ∇V(c) = c.

Equation (21) has the first integral

    E = (1/2) φ̇(t)² + (1/k) φ(t)^k,

and its solution can be obtained by the inversion of a hyper-elliptic integral (an explicit form of this solution is unnecessary for further considerations). Variational equations around our particular solution have the form

    ÿ = −φ(t)^{k−2} ∇²V(c) y,    y = (y₁, …, y_n).

As the Hessian ∇²V(c) is symmetric, by means of a linear change of variables y → w we can transform the above equations to the form

    ẅ_i = −φ(t)^{k−2} λ_i w_i,    i = 1, …, n,    (22)

where λ_i are the eigenvalues of the Hessian. It can be shown that one eigenvalue, let us say λ_n, is equal to k − 1. The corresponding equation describes the tangential equations. Thus
the first n − 1 equations in (22) give the normal variational equations. Now, changing the independent variable t → z := φ(t)^k/(kE), we transform each of the equations (22) into the form of a hypergeometric differential equation; this can be done for arbitrary values of E ≠ 0. For this equation we know when the identity component is Abelian, see Kimura (1969). Based on this fact Morales-Ruiz (1999) proved the following.
Theorem 15. If the Hamiltonian system with Hamiltonian (20) is completely integrable, then each pair (k, λ_i) belongs to one of the following families:

    (k, p + p(p − 1)k/2),    (2, λ),    (−2, λ),
    (−5, 49/40 − (1/40)(10/3 + 10p)²),    (−5, 49/40 − (1/40)(4 + 10p)²),
    (−4, 9/8 − (1/8)(4/3 + 4p)²),
    (−3, 25/24 − (1/24)(2 + 6p)²),    (−3, 25/24 − (1/24)(3/2 + 6p)²),
    (−3, 25/24 − (1/24)(6/5 + 6p)²),    (−3, 25/24 − (1/24)(12/5 + 6p)²),
    (3, −1/24 + (1/24)(2 + 6p)²),    (3, −1/24 + (1/24)(3/2 + 6p)²),
    (3, −1/24 + (1/24)(6/5 + 6p)²),    (3, −1/24 + (1/24)(12/5 + 6p)²),
    (4, −1/8 + (1/8)(4/3 + 4p)²),
    (5, −9/40 + (1/40)(10/3 + 10p)²),    (5, −9/40 + (1/40)(4 + 10p)²),
    (k, (1/2)((k − 1)/k + p(p + 1)k)),
where p is an arbitrary integer and λ an arbitrary complex number. It is important to compare the above theorem with the theorem of Yoshida (1987), who investigated systems of the same type applying the Ziglin theory. His results are weaker. First of all, his theorem is formulated for a system with two degrees of freedom. Moreover, in Yoshida's formulation the system is not integrable if λ₁ belongs to certain intervals, and thus the system can be integrable when λ₁ belongs to the complement of these intervals. In the above formulation the system can be integrable only for a discrete set of values of λ_i. A detailed exposition of the Morales-Ramis theory with proofs and many examples is contained in (Morales-Ruiz, 1999). For a more informal presentation see also (Morales-Ruiz, 2000). For general texts about differential Galois theory see (Ramis and Martinet, 1990; Beukers, 1992; Magid, 1994).
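As a worked illustration of how the theorem is applied (my own sketch; the potential below is a hypothetical example, not one considered in the lectures), take n = 2 and the homogeneous cubic potential V(q) = q₁²q₂ + (e/3)q₂³. The equation ∇V(c) = c has the solution c = (0, 1/e); the Hessian ∇²V(c) has the tangential eigenvalue k − 1 = 2 and the normal eigenvalue λ = 2/e, so integrability requires the pair (3, 2/e) to belong to one of the families listed above, for instance 2/e = p + 3p(p − 1)/2 for some integer p. The same computation in sympy:

    import sympy as sp

    q1, q2, e, p = sp.symbols('q1 q2 e p')

    k = 3
    V = q1**2*q2 + sp.Rational(1, 3)*e*q2**3           # hypothetical homogeneous potential of degree 3
    eqs = [sp.diff(V, q1) - q1, sp.diff(V, q2) - q2]   # Darboux point condition: grad V(c) = c

    sols = sp.solve(eqs, [q1, q2], dict=True)
    c = [s for s in sols if s[q1] == 0 and s[q2] != 0][0]          # pick the point c = (0, 1/e)

    lams = list(sp.hessian(V, (q1, q2)).subs(c).eigenvals())
    lam = [l for l in lams if sp.simplify(l - (k - 1)) != 0][0]    # discard the tangential eigenvalue k-1

    family1 = p + sp.Rational(k, 2)*p*(p - 1)          # first admissible family of Theorem 15
    print('normal eigenvalue:', sp.simplify(lam))      # 2/e
    print('first family requires:', sp.Eq(lam, family1), 'for some integer p')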
6 Applications
Before we start to present applications of the Morales-Ramis theory it is necessary to say how the differential Galois group can be determined. As already mentioned, it is known for the hypergeometric equation. The same is true for the Lamé equation and some other classes of equations. Generally, when a normal variational equation can be transformed to the form of a second order linear equation with rational coefficients, then there exists an algorithm, due to Kovacic (1986), allowing the determination of its differential Galois group; see also Ulmer and Weil (1996). For higher order equations the problem is more difficult, see Singer and Ulmer (1993a,b, 1995) and van Hoeij et al. (1998). In most of the examples given below, a version of the Kovacic algorithm was used in order to obtain the final result. Since it is easy to understand, this algorithm is the main tool used when the Morales-Ramis theory is applied. However, lack of space does not allow its presentation here. We refer the interested reader to the papers cited above. Applying the Morales-Ramis theory we have to perform three steps. First, given a Hamiltonian system, we have to find its particular solution. The more solutions we find,
the stronger is the result obtained. In most of the known examples the system possesses an invariant plane on which it is integrable. The next step consists of the derivation of the normal variational equations. This is easy but, bearing in mind the next step, we have to transform these equations so as to obtain a system with rational coefficients. It is not always obvious how to do that. Finally, in the last step, we check whether the identity component of the differential Galois group of the normal variational equations is Abelian. If that is the case, then the system can be integrable, and we can examine it further by using e.g. a Poincaré cross-section, as in the numerical sketch below. When the identity component is not Abelian, the system is not integrable.
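Before turning to the examples, here is what the final, numerical part of this procedure may look like in practice (a minimal sketch of my own; the Hénon-Heiles-type potential and all numerical parameters are arbitrary choices and are not related to the examples below). For a natural Hamiltonian H = (p₁² + p₂²)/2 + V(q₁, q₂) one integrates the equations of motion and records the crossings of a surface of section, say q₁ = 0 with p₁ > 0:

    import numpy as np

    # Gradient of the classical Henon-Heiles potential, used here only as an illustration.
    def grad_V(q1, q2):
        return np.array([q1 + 2.0*q1*q2, q2 + q1**2 - q2**2])

    def rhs(y):
        q1, q2, p1, p2 = y
        g = grad_V(q1, q2)
        return np.array([p1, p2, -g[0], -g[1]])

    def section(y0, h=1e-3, nsteps=200_000):
        """Collect crossings of the plane q1 = 0 with p1 > 0 (classical RK4 + linear interpolation)."""
        pts, y = [], np.array(y0, dtype=float)
        for _ in range(nsteps):
            k1 = rhs(y); k2 = rhs(y + h/2*k1); k3 = rhs(y + h/2*k2); k4 = rhs(y + h*k3)
            ynew = y + h/6*(k1 + 2*k2 + 2*k3 + k4)
            if y[0] < 0.0 <= ynew[0] and ynew[2] > 0.0:
                s = -y[0] / (ynew[0] - y[0])            # fraction of the step at which the crossing occurs
                pts.append(y[[1, 3]] + s*(ynew[[1, 3]] - y[[1, 3]]))
            y = ynew
        return np.array(pts)

    pts = section([0.0, 0.1, 0.3, 0.0])                 # initial point [q1, q2, p1, p2]
    print(len(pts))

Points of the section lying on smooth closed curves are consistent with the existence of an additional first integral, while a scattered cloud of points indicates chaotic behaviour and hence non-integrability.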
Example 1

Our first example is connected with cosmology. The Einstein field equations describe the dynamical evolution of space-time, as well as the motion of matter and physical fields. They provide a system of coupled, non-linear partial differential equations. However, when we postulate a certain symmetry of space-time, we can reduce Einstein's field equations to a system of ordinary differential equations. Assuming a certain homogeneity of the space-time, the Einstein equations can be reduced to a Hamiltonian system whose Hamiltonian is built from the matrix

    [ −q₁²   q₁q₂   q₁q₃ ]
    [ q₂q₁   −q₂²   q₂q₃ ]
    [ q₃q₁   q₃q₂   −q₃² ]
and the parameters n₁, n₂, n₃ ∈ {+1, −1, 0}. We consider the case when (n₁, n₂, n₃) = (1, 1, −1). For this choice the system is called the Bianchi VIII model. We want to know if this system is integrable. First of all, let us notice that this system possesses the four-dimensional invariant plane

    Γ = { (q₁, q₂, q₃, p₁, p₂, p₃) ∈ ℂ⁶ : q₁ = q₂, p₁ = p₂ }.
Moreover, it can be shown that the system restricted to Γ is integrable. This allows us to find particular solutions. To obtain these solutions it is convenient to introduce new variables (Q₁, Q₂, Q₃, Z₁, Z₂, Z₃) (for further considerations the explicit form of the transformation is irrelevant). In these new variables the particular solutions have the following form: Q₁(t) = 0, Z₁(t) = 0, while the remaining variables are given by elementary functions of t,
and (A, B, C, D, k) are constants satisfying

    2C (A² + B)^{3/2} = kAD.

The normal variational equation for these solutions has the form

    ξ̈ + a(t) ξ̇ + b(t) ξ = 0,

where the coefficients a(t) and b(t) are determined by the functions Z₂(t) and Z₃(t) evaluated along the particular solution.
Now, in order to apply the Kovacic algorithm, we have to find a transformation t → z which gives us an equation with rational coefficients. For B ≠ 0 the author does not know such a transformation; however, when B = 0 we can take

    z(t) := k exp[At]   for k ≠ 0,        z(t) := exp[At]   for k = 0.
Applying the Kovacic algorithm we can easily check that the identity component of the differential Galois group of the resulting equations is SL(2, ℂ), i.e. it is not Abelian and, as a result, the system is not integrable. For details see (Maciejewski et al., 2001).
Example 2

Our next example is also connected with cosmology. We consider a Hamiltonian system with two degrees of freedom depending on the parameters (Λ, λ, μ); this is the so-called Friedman-Robertson-Walker system. We will study its integrability using Theorem 15. For this system the equation

    ∇V(q) = q,    q = (q₁, q₂),

has several solutions. For each of these solutions, let λ_l denote the eigenvalue of ∇²V which is different from 3, and let I = I₁ ∪ I₂ ∪ I₃ denote the set of admissible values of λ given by Theorem 15 for k = 4, where

    I₁ = { p(2p − 1) : p ∈ ℤ }

corresponds to the first family. Then, from Theorem 15 it follows that if {λ₁, λ₂, λ₃} ⊄ I, the system is non-integrable. Calculations show that

    λ₁ = −μ/λ,    λ₂ = −Λ/μ,    λ₃ = (λ₁λ₂ − 2(λ₁ + λ₂) + 3)/(1 − λ₁λ₂).
Figure 4. A rigid satellite in a circular orbit around a gravitational centre.

It is interesting to select those cases when {λ₁, λ₂, λ₃} ⊂ I, i.e. those values of the parameters for which the system can be integrable. Note that λ₃ is a symmetric function of λ₁ and λ₂. It is easy to observe that if λ₁ = 1 or λ₂ = 1 then λ₃ = 1. Assume for example that λ₂ = λ₃ = 1. Then μ = −Λ and λ₁ = Λ/λ. If the system is integrable then Λ/λ ∈ I. As we can see, there exists a discrete but infinite set of cases suspected of being integrable. This example was analysed in (Maciejewski and Szydlowski, 2000).
Example 3

Our next example is connected with the dynamics of the rotational motion of a satellite under the influence of the gravity torque of the central body. Consider a rigid body B moving in a circular orbit around a fixed gravitational centre O (see Figure 4). The equations of the rotational motion of the body can be written in the form (Beletskii, 1965; Bogoyavlenskii, 1991)

    d/dt M = M × Ω + 3 Γ × IΓ,
    d/dt Γ = Γ × (Ω − N),                    (23)
    d/dt N = N × Ω,

where Ω is the angular velocity, M = IΩ is the angular momentum and I = diag(A, B, C) is the inertia tensor; Γ is the unit vector in the direction of the radius vector of the centre of mass of the satellite; N is the unit vector normal to the plane of the orbit. All vectors are taken with respect to the principal axes reference frame. We chose the time unit in such a way that the Keplerian orbital angular velocity is equal to one:

    √(G M / r³) = 1.

In the above formula M is the mass of the gravitational centre O, and r is the radius of the orbit.
System (23) possesses the energy integral

    H₁ = H(M, Γ, N) = (1/2)(M, I⁻¹M) − (M, N) + (3/2)(Γ, IΓ),

and the following geometric integrals

    H₂ = (Γ, Γ),    H₃ = (N, N),    H₄ = (Γ, N).
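As a quick numerical sanity check of equations (23) and of the integrals listed above (my own sketch; the inertia moments, the step size and the initial data are arbitrary choices), one can integrate the system and verify that H₁ is conserved:

    import numpy as np

    A_, B_, C_ = 1.7, 1.0, 1.0                 # arbitrary inertia moments (B = C = 1 as in the text)
    I = np.diag([A_, B_, C_]); Iinv = np.linalg.inv(I)

    def rhs(y):
        M, G, N = y[0:3], y[3:6], y[6:9]       # angular momentum, radius unit vector, orbit normal
        Om = Iinv @ M
        dM = np.cross(M, Om) + 3.0*np.cross(G, I @ G)
        dG = np.cross(G, Om - N)
        dN = np.cross(N, Om)
        return np.concatenate([dM, dG, dN])

    def H1(y):
        M, G, N = y[0:3], y[3:6], y[6:9]
        return 0.5*M @ Iinv @ M - M @ N + 1.5*G @ I @ G

    # arbitrary initial data with |G| = |N| = 1 and G.N = 0
    y = np.concatenate([[0.3, -0.2, 0.5], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    h, e0 = 1e-3, H1(y)
    for _ in range(20000):                     # classical RK4
        k1 = rhs(y); k2 = rhs(y + h/2*k1); k3 = rhs(y + h/2*k2); k4 = rhs(y + h*k3)
        y = y + h/6*(k1 + 2*k2 + 2*k3 + k4)
    print(abs(H1(y) - e0))                     # stays at the integration-error level: H1 is a first integral

The geometric integrals H₂, H₃ and H₄ can be checked in exactly the same way.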
Our system (23) is a Hamiltonian one, but it is written in non-canonical variables. In what follows we consider the case when the body is dynamically symmetric, with B = C = 1. It can be shown that for this case the Hamiltonian function has the form

    H = p₁²/(2 cos²q₂) + p₂²/2 − p₁ + (3/2)(A − 1) cos²q₁ cos²q₂,
where (q₁, q₂, q₃) are the Euler angles of the type 3-2-1 which describe the orientation of the principal axes reference frame with respect to the orbital frame defined by the three orthonormal vectors (Γ, N × Γ, N). The canonical coordinate q₃ is cyclic and the value of the corresponding first integral was taken to be equal to zero. It is easy to notice that the (q₁, p₁) plane is invariant. As a particular solution we choose

    sin q₁(t) = 1/cosh(ωt),    p₁(t) = 1 − ω/cosh(ωt),    for A > 1,
    sin q₁(t) = tanh(ωt),      p₁(t) = 1 − ω/cosh(ωt),    for A < 1,

where ω = √(3|A − 1|) ∈ (0, √3].
The normal variational equations have the form

    ξ̈ = a(t) ξ,    (24)

where

    a(t) = ω² − 1 + 2ω/cosh(ωt) − 2ω²/cosh²(ωt)    for A > 1,
    a(t) = −1 + 2ω/cosh(ωt) − 2ω²/cosh²(ωt)        for A < 1.
After the following transformation

    t → z := tanh(ωt/2),

equation (24) reads

    ξ″ + p(z) ξ′ + q(z) ξ = 0,    ′ ≡ d/dz,    (25)

where p(z) and q(z) are rational functions of z.
Now, applying the Kovacic algorithm we can prove that the considered system is not integrable for A ≠ 1. In the proof it appeared important that the local monodromy of equation (25) around the singular point z = i is given by a triangular matrix. This example was analysed by the author and C Simó.
Other Examples

The above examples illustrate the power of the Morales-Ramis theory. For more applications see the references in Morales-Ruiz (2000). Without doubt, the most interesting results in this field were obtained by Boucher (2000) and Tsygvintsev (2000). In these works proofs of the non-integrability of the planar three body problem are given.
Acknowledgements

I would like to thank J J Morales-Ruiz and F Ulmer who helped me to understand many aspects of the theory. I am very thankful to H Żołądek who sent me a part of his unpublished book and to A V Borisov for letting me use the photograph of S V Kovalevskaya.
References

Arnold V I, 1978, Mathematical Methods of Classical Mechanics, Graduate Texts in Mathematics, Springer-Verlag, New York.
Arnold V I, Kozlov V V, and Neishtadt A I, 1988, Dynamical Systems III, vol. 3 of Encyclopaedia of Mathematical Sciences, Springer-Verlag, Berlin-New York.
Beletskii V V, 1965, Motion of a Satellite about its Mass Center, Nauka, Moscow, in Russian.
Beukers F, 1992, in From Number Theory to Physics (Les Houches, 1989), 413-439, Springer, Berlin.
Bogoyavlenskii O I, 1991, Overturning Solitons. Nonlinear Integrable Equations, Nauka, Moscow, in Russian.
Borisov A V and Dudoladov S L, 1999, Regul. Chaotic Dyn., 4(3), 13-20.
Borisov A V and Tsygvintsev A V, 1996, Regul. Khaoticheskaya Din., 1(1), 15-28, 29-37.
Borisov A V and Tsygvintsev A V, 1997, Prikl. Mat. Mekh., 61(1), 30-36.
Boucher D, 2000, Sur des équations différentielles linéaires paramétrées, une application aux systèmes hamiltoniens, Ph.D. thesis, Université de Limoges, France.
Braider A and Churchill R C, 1990, J. Differential Equations, 2(4), 451-481.
Christov O B, 1993, C. R. Acad. Bulgare Sci., 46(9), 33-36 (1994).
Churchill R C and Rod D L, 1988, J. Differential Equations, 76(1), 91-114.
Churchill R C and Rod D L, 1991, SIAM J. Math. Anal., 22(6), 1790-1802.
Conte R, 1999, in The Painlevé Property, 77-180, Springer, New York.
Ferrándiz J M and Sansaturio M E, 1995, Phys. Lett. A, 207(3-4), 180-184.
Françoise J P and Irigoyen M, 1993, J. Geom. Phys., 10, 231-243.
Golubev V V, 1950, Lectures on Analytical Theory of Differential Equations, Gosud. Izdat. Teh. Teor. Lit., Moscow, second edn., in Russian.
Goriely A, 2000, Regul. Chaotic Dyn., 5(1), 3-15.
Hille E, 1976, Ordinary Differential Equations in the Complex Domain, John Wiley and Sons, New York.
Hoyer P, 1879, Über die Integration eines Differentialgleichungssystems von der Form ẋ₁ = a₁x₂x₃ + a₂x₃x₁ + a₃x₁x₂, ẋ₂ = b₁x₂x₃ + b₂x₃x₁ + b₃x₁x₂, ẋ₃ = c₁x₂x₃ + c₂x₃x₁ + c₃x₁x₂ durch elliptische Funktionen, Ph.D. thesis, Königl. Friedrich-Wilhelms Univ., Berlin.
Irigoyen M, 1995, in From Newton to Chaos (Cortina d'Ampezzo, 1993), 247-251, Plenum, New York.
Ito H, 1985, Kodai Math. J., 1, 120-138.
Ito H, 1987, Z. Angew. Math. Phys., 38, 459-476.
Ito H, 1990, Bol. Soc. Brasil. Mat., 21(1), 95-120.
Kaplansky I, 1976, An Introduction to Differential Algebra, Hermann, Paris, second edn., Actualités Scientifiques et Industrielles, No. 1251, Publications de l'Institut de Mathématique de l'Université de Nancago, No. V.
Kimura T, 1969/1970, Funkcial. Ekvac., 12, 269-281.
Kovacic J J, 1986, J. Symbolic Comput., 2(1), 3-43.
Kowalevski S, 1888, Acta Math., 12, 177-232.
Kowalevski S, 1890, Acta Math., 14, 81-93.
Kozlov V V, 1983, Uspekhi Mat. Nauk, 38(1(229)), 3-67.
Kozlov V V, 1992, Mat. Zametki, 51(2), 46-52.
Kozlov V V, 1996, Symmetries, Topology and Resonances in Hamiltonian Mechanics, Springer-Verlag, Berlin.
Kummer M, Churchill R C, and Rod D L, 1991, in J A Ellison and H Überall, eds., Essays on Classical and Quantum Dynamics, 71-76, Gordon and Breach Science Publishers, Philadelphia.
Lochak P, 1985, C. R. Acad. Sci. Paris Sér. I Math., 300(11), 369-372.
Lyapunov A M, 1894, Soobshch. Khar'kovsk. Mat. Obshch., 4(3), 123-140, in Russian.
Maciejewski A J and Goździewski K, 1999, Rep. Math. Phys., 44(1/2), 133-142.
Maciejewski A J and Popov S I, 1998, Rep. Math. Phys., 41(3), 287-310.
Maciejewski A J, Strelcyn J M, and Szydlowski M, 2001, J. Math. Phys., 42(4), 1-16.
Maciejewski A J and Szydlowski M, 2000, J. Phys. A, 33, 9241-9254.
Magid A R, 1994, Lectures on Differential Galois Theory, vol. 7 of University Lecture Series, American Mathematical Society, Providence, RI.
Miranda R, 1995, Algebraic Curves and Riemann Surfaces, American Mathematical Society, Providence, RI.
Morales-Ruiz J J, 1989, Técnicas algebraicas para el estudio de la integrabilidad de sistemas hamiltonianos, Ph.D. thesis, University of Barcelona, Barcelona.
Morales-Ruiz J J, 1999, Differential Galois Theory and Non-integrability of Hamiltonian Systems, Birkhäuser Verlag, Basel.
Morales-Ruiz J J, 2000, Regul. Chaotic Dyn., 5(3), 251-272.
Novikov S P, 1982, Hamiltonian Formalism and Many-valued Analog of Morse Theory, Uspekhi Mat. Nauk, 5(227), 3-49.
Nowicki A, 1994, Polynomial Derivations and their Rings of Constants, Nicolaus Copernicus University Press, Toruń.
Nowicki A, 1996, J. Pure Appl. Algebra, 235, 107-120.
Ramani A, Grammaticos B, and Bountis T, 1989, Phys. Rep., 180(3), 159-245.
Ramis J P and Martinet J, 1990, in Computer Algebra and Differential Equations, 117-214, Academic Press, London.
Rod D L and Sleeman B D, 1995, Proc. Roy. Soc. Edinburgh Sect. A, 125(5), 959-974.
Roekaerts D and Yoshida H, 1994, in J Leon, ed., Nonlinear Evolutions, 721-726, World Scientific, Singapore.
Sadetov S T, 1993, Mat. Zametki, 54(4), 152-154.
Sadetov S T, 1994, Vestnik Moskov. Univ. Ser. I Mat. Mekh., (1), 82-87, 97.
Singer M F, 1990, in Computer Algebra and Differential Equations, 3-57, Academic Press, London.
Singer M F and Ulmer F, 1993a, J. Symbolic Comput., 16(1), 9-36.
Singer M F and Ulmer F, 1993b, J. Symbolic Comput., 16(1), 37-73.
Singer M F and Ulmer F, 1995, Appl. Algebra Engrg. Comm. Comput., 6(1), 1-22.
Tsygvintsev A, 2000, The Meromorphic Non-integrability of the Three-body Problem, Tech. rep., Université Paul Sabatier, Mathématiques, Toulouse, France.
Ulmer F and Weil J A, 1996, J. Symbolic Comput., 22(2), 179-200.
van Hoeij M, Ragot J F, Ulmer F, and Weil J A, 1998, J. Symbolic Comput., 28, 589-609.
Weyl H, 1955, The Concept of a Riemann Surface, Addison-Wesley Publishing Company, London.
Yoshida H, 1983a, Celestial Mech., 31, 363-379.
Yoshida H, 1983b, Celestial Mech., 31, 381-399.
Yoshida H, 1987, Phys. D, 29(1-2), 128-142.
Yoshida H, 1989, Phys. Lett. A, 141(3-4), 108-112.
Ziglin S L, 1982, Functional Anal. Appl., 16, 181-189.
Ziglin S L, 1983, Functional Anal. Appl., 17, 6-17.
Ziglin S L, 1987, Dokl. Akad. Nauk SSSR, 292(4), 804-807.
Ziglin S L, 1996, Funktsional. Anal. i Prilozhen., 30(2), 80-81.
Ziglin S L, 1997, Funktsional. Anal. i Prilozhen., 31(1), 3-11, 95.
Ziglin S L, 1998, Chaos, 8(1), 272-273.