High Performance Computing in Science and Engineering ’11
Wolfgang E. Nagel · Dietmar B. Kröner · Michael M. Resch
Editors
High Performance Computing in Science and Engineering ’11
Transactions of the High Performance Computing Center, Stuttgart (HLRS) 2011
Editors

Wolfgang E. Nagel
Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)
Technische Universität Dresden
Zellescher Weg 12-14
01069 Dresden
Germany
[email protected]
Dietmar B. Kröner
Abteilung für Angewandte Mathematik
Universität Freiburg
Hermann-Herder-Str. 10
79104 Freiburg
Germany
[email protected]
Michael M. Resch
Höchstleistungsrechenzentrum Stuttgart (HLRS)
Universität Stuttgart
Nobelstraße 19
70569 Stuttgart
Germany
[email protected]
Front cover figure: Laminar-turbulent transition on a common dolphin: Turbulent kinetic energy at 1 m/s and 1% turbulence intensity. (Simulation by D. Riedeberger and U. Rist, IAG, Stuttgart University, Germany. Geometry DIXIE kindly provided by V. Pavlov, FTZ, Kiel University, Germany)
ISBN 978-3-642-23868-0
e-ISBN 978-3-642-23869-7
DOI 10.1007/978-3-642-23869-7
Springer Heidelberg Dordrecht London New York

Library of Congress Control Number: 2011942929
Mathematics Subject Classification (2010): 65Cxx, 65C99, 68U20

© Springer-Verlag Berlin Heidelberg 2012

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Cover design: WMXDesign GmbH
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface
It is a pleasure to announce that after the installation of the IBM Blue Gene/P at NIC/JSC in May 2009, as part of the Gauss Centre for Supercomputing (GCS) – linking together the three national supercomputing centres HLRS (Stuttgart), NIC/JSC (Jülich), and LRZ (Garching) – the next major step has been realized at HLRS with the installation of a new system, HERMIT, in October 2011. HERMIT is a large Cray XE6 with a peak speed of more than 1 PFLOP/s. Based on the new AMD Interlagos chip, two sockets form a node, and HERMIT brings together 3,552 compute nodes with 113,664 cores, integrated into 38 water-cooled cabinets. Additionally, the system is tightly integrated with external servers for pre- and postprocessing to support complex workflows. The new system entered the Top500 list in November 2011 and ranked number 12 worldwide, achieving a Linpack value of 831.4 TFLOP/s.

Having the new Petaflop system as an infrastructure, and together with the new research buildings for VISUS, SimTech, and HLRS, the Universität Stuttgart is well positioned to become one of the leading science nodes for simulation technology in Germany, as well as abroad. In 2013, the second delivery phase will follow, and the final Cray system will then have a peak performance of roughly 5 PFLOP/s. Additionally, the LRZ will upgrade its own systems accordingly to a 3 PFLOP/s system in summer 2012. The plan is to have a Tier-0 HPC system within the GCS operating at any time within the five-year period.

The HLRS also participates in the European project PRACE (Partnership for Advanced Computing in Europe) as part of the GCS, extending its reach to all European member countries. Within the PRACE project, the GCS will provide access to high performance computing resources valued at 100 million Euros. Moreover, the PRACE activities are well aligned with the HLRS activities in the European HPC support project, HPC-Europa2. Additionally, HLRS participates with partners in Germany in two Exascale software initiatives at the European level, namely TEXT and CRESTA, in which the challenges of the efficient use of current and future computing systems are investigated.

While the GCS has successfully addressed the high-end computing needs, it was clear from the very beginning that an additional layer of support is required
to maintain the longevity of the Centre, via a network of competence centres across Germany. This gap is addressed by the Gauß-Allianz (GA), in which regional and local centres have teamed up to create the necessary infrastructure, knowledge, and the required methods and tools. The mission of the Allianz is to coordinate the HPC-related activities of its members. By providing versatile computing architectures and by combining the expertise of the participating centres, the necessary ecosystem for computational science has been created. Strengthening the research and increasing the visibility to compete at the international level are further goals of the Gauß-Allianz. To disseminate information about its activities, the Gauß-Allianz has started to publish a flyer (GA-Infobrief, http://www.gauss-allianz.de/infobrief), issued several times a year.

A number of projects of the second BMBF HPC call started as early as April 2011. This call was directed towards proposals that enable and support petascale applications on more than 100,000 processors, as they are also currently available at HLRS. While the projects of the first funding round started in early 2009 and will complete within the next 6 months, the follow-up call had been delayed by more than 18 months. Nevertheless, all experts and administrative authorities continue to acknowledge the strong need for such a funding program, given that the main issue identified in nearly all applications is that of scalability. The strategic funding plan involves another 20 million Euros, with a yearly follow-up call over the next three years, for projects that develop scalable algorithms, methods, and tools to support massively parallel systems. This can be seen as a very large investment. Nevertheless, in relation to the investment in computing hardware within Germany over this five-year period, the investment in software is still comparatively small, amounting to less than 20 per cent of the hardware investment. Furthermore, the investment in software will produce the ‘brains’ that will be needed to use the newly developed innovative methods and tools, to accomplish technological breakthroughs in scientific as well as industrial fields of application.

It is widely known that the long-term target is not only Petascale but Exascale systems as well. We need not only competitive hardware but also excellent software and methods to address – and solve – the most demanding problems in science and engineering. The success of this approach is of significant importance for our community, and will also greatly influence the development of new technologies and industrial products. Beyond being important, the success of this approach will finally determine whether Germany will be an accepted partner alongside the leading technology and research nations. It is, therefore, a pleasure to announce that in October 2011, the German Research Foundation (DFG) funded an additional Priority Program 1648 “Software for Exascale Computing (SPPEXA)” in the field of HPC. The funding is available for 6 years, starting in January 2013 with 4 million Euros per year, to support fundamental and basic research questions in several specific areas related to HPC.

Since 1996, the HLRS has supported the scientific community as part of its official mission.
Just as in the past years, the major results of the last 12 months were presented at the 15th annual Results and Review Workshop on High Performance Computing in Science and Engineering, which was held on October 4–5, 2011 at
the Universität Stuttgart. The workshop proceedings contain the written versions of the research work presented. The papers were selected from all projects running at the HLRS and the SSC Karlsruhe during the one-year period between October 2010 and October 2011. Overall, 47 papers were chosen from Physics, Solid State Physics, Chemistry, Reactive Flow, Computational Fluid Dynamics (CFD), Transport, Climate, and numerous other fields. The largest number of contributions originated from the CFD field, just as in many previous years, with 13 papers. Even though such a small collection cannot entirely represent an area this vast, the selected papers demonstrate the state of the art in high performance computing in Germany. The authors were encouraged to emphasize the computational techniques used in solving the problems examined. This is an often forgotten aspect, and it was the major focus of the workshop proceedings. Nevertheless, the importance of the newly computed scientific results for the specific disciplines is impressive.

We gratefully acknowledge the continuing support of the federal state of Baden-Württemberg in promoting and supporting high performance computing. Grateful acknowledgments are also due to the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) and the German Ministry for Research and Education (BMBF), as many projects pursued on the HLRS and SSC computing machines could not have been carried out without their support. Also, we thank Springer Verlag for publishing this volume and, thus, helping to position the national activities in an international framework. We hope that this series of publications contributes to the global promotion of high performance scientific computing.

Stuttgart, November 2011
Wolfgang E. Nagel · Dietmar Kröner · Michael Resch
Contents
Physics
P. Nielaba

The Influence of the Mass Ratio on Particle Acceleration by the Filamentation Instability . . . 5
P. Kilian, T. Burkart, and F. Spanier
The SuperN-Project: Neutrino Hydrodynamics Simulations of Core-Collapse Supernovae . . . 15
B. Müller, L. Hüdepohl, A. Marek, F. Hanke, and H.-Th. Janka

Simulation of Pre-planetesimal Collisions with Smoothed Particle Hydrodynamics . . . 29
R.J. Geretshauser, R. Speith, and W. Kley

Copper Substrate Catalyzes Tetraazaperopyrene Polymerization . . . 47
W.G. Schmidt, E. Rauls, U. Gerstmann, S. Sanna, M. Landmann, M. Rohrmüller, A. Riefer, S. Wippermann, and S. Blankenburg

QCD Critical Surfaces at Real and Imaginary μ . . . 57
O. Philipsen and Ph. de Forcrand

Higgs Boson Mass Bounds from a Chirally Invariant Lattice Higgs-Yukawa Model . . . 67
P. Gerhold, K. Jansen, and J. Kallarackal

Massive and Massless Four-Loop Integrals . . . 83
P. Baikov, K. Chetyrkin, J.H. Kühn, P. Marquard, and M. Steinhauser

Solid State Physics
H. Fehske

Laser Ablation of Aluminium: Drops and Voids . . . 93
J. Roth, J. Karlin, C. Ulrich, H.-R. Trebin, and S. Sonntag
Cysteine on Gold: An ab-initio Investigation . . . 105
B. Höffling, F. Ortmann, K. Hannewald, and F. Bechstedt

Ab-initio Calculations of the Vibrational Properties of Nanostructures . . . 119
G. Bester and P. Han

Entropy and Metal-Insulator Transition in Atomic-Scale Wires: The Case of In-Si(111)(4 × 1)/(8 × 2) . . . 131
W.G. Schmidt, E. Rauls, U. Gerstmann, S. Sanna, M. Landmann, M. Rohrmüller, A. Riefer, and S. Wippermann

Obtaining the Full Counting Statistics of Correlated Nanostructures from Time Dependent Simulations . . . 141
P. Schmitteckert

Phase Diagram of the 1D t-J Model . . . 153
A. Moreno, A. Muramatsu, and S. Manmana

Chemistry
C. van Wüllen

Constrained Density Functional Theory of Molecular Dimers . . . 169
J.-H. Franke, N.N. Nair, L. Chi, and H. Fuchs

Atomistic Simulations of Electrolyte Solutions and Hydrogels with Explicit Solvent Models . . . 185
J. Walter, S. Deublein, S. Reiser, M. Horsch, J. Vrabec, and H. Hasse

cuVASP: A GPU-Accelerated Plane-Wave Electronic-Structure Code . . . 201
S. Maintz, B. Eck, and R. Dronskowski

Reacting Flows
D. Kröner

Assessment of Conventional Droplet Evaporation Models for Spray Flames . . . 209
M.R.G. Zoby, A. Kronenburg, S. Navarro-Martinez, and A.J. Marquis

Analysis of the Effects of Wall Boundary Conditions and Detailed Kinetics on the Simulation of a Gas Turbine Model Combustor Under Very Lean Conditions . . . 229
F. Rebosio, A. Widenhorn, B. Noll, and M. Aigner

Oxy-coal Combustion Modeling at Semi-industrial Scale . . . 245
M. Müller, U. Schnell, and G. Scheffknecht
Delayed Detached Eddy Simulations of Compressible Turbulent Mixing Layer and Detailed Performance Analysis of Scientific In-House Code TASCOM3D . . . 259
M. Kindler, P. Gerlinger, and M. Aigner

Computational Fluid Dynamics
S. Wagner

Discontinuous Galerkin for High Performance Computational Fluid Dynamics (hpcdg) . . . 277
C. Altmann, A. Beck, A. Birkefeld, F. Hindenlang, M. Staudenmaier, G. Gassner, and C.-D. Munz

Highly Efficient and Scalable Software for the Simulation of Turbulent Flows in Complex Geometries . . . 289
D.F. Harlacher, S. Roller, F. Hindenlang, C.-D. Munz, T. Kraus, M. Fischer, K. Geurts, M. Meinke, T. Klühspies, V. Metsch, and K. Benkert

A Computation Technique for Rigid Particle Flows in an Eulerian Framework Using the Multiphase DNS Code FS3D . . . 309
P. Rauschenberger, J. Schlottke, and B. Weigand

Optimization of Chaotic Micromixers Using Finite Time Lyapunov Exponents . . . 325
A. Sarkar, A. Narváez, and J. Harting

Numerical Simulation of Particle-Laden Turbulent Flows Using LES . . . 337
M. Breuer and M. Alletto

Large-Eddy Simulation of Supersonic Film Cooling at Finite Pressure Gradients . . . 353
M. Konopka, M. Meinke, and W. Schröder

Prediction of Stability Limits of Combustion Chambers with LES . . . 371
B. Pritz, F. Magagnato, and M. Gabi

Numerical Simulation of Laminar-Turbulent Transition on a Dolphin Using the γ-Re_θ Model . . . 379
D. Riedeberger and U. Rist

Wall Effects and Corner Separations for Subsonic and Transonic Flow Regimes . . . 393
A. Klein, S. Illi, K. Nübler, T. Lutz, and E. Krämer

Numerical Simulation of Helicopter Wake Evolution, Performance and Trim . . . 409
F. Bensing, M. Embacher, M. Hollands, B. Kutz, M. Keßler, and E. Krämer
Parameter Study for Scramjet Intake Concerning Wall Temperatures and Turbulence Modeling . . . 425
B. Reinartz

Unsteady Numerical Study of Wet Steam Flow in a Low Pressure Steam Turbine . . . 437
J. Starzmann, M.V. Casey, and J.F. Mayer

Turbulence Modelling for CFD-Methods for Containment Flows . . . 451
A. Zirkel and E. Laurien

Transport and Climate
C. Kottmeier

The Transport of Mineral Dust Towards Hurricane Helene (2006) . . . 471
J. Schwendike, S. Jones, H. Vogel, and B. Vogel

Numerical Modelling of Mediterranean Cyclones . . . 489
C.-J. Lenz, U. Corsmeier, and C. Kottmeier

Modelling Near Future Regional Climate Change for Germany and Africa . . . 503
H.-J. Panitz, P. Berg, G. Schädler, and G. Fosser

High-Resolution Climate Predictions and Short-Range Forecasts to Improve the Process Understanding and the Representation of Land-Surface Interactions in the WRF Model in Southwest Germany (WRFCLIM) . . . 513
H.-S. Bauer, K. Warrach-Sagi, V. Wulfmeyer, T. Schwitalla, and M. Kirn

Direct Numerical Simulation and Implicit Large Eddy Simulation of Stratified Turbulence . . . 523
S. Remmler and S. Hickel

Miscellaneous Topics
W. Schröder

Allocation of Economic Capital in Banking: A Simulation Approach . . . 541
H.-P. Burghof and J. Müller

The Influence of Partial Melt on Mantle Convection . . . 551
A.-C. Plesa and T. Spohn

Molecular Modeling of Hydrogen Bonding Fluids: Phase Behavior of Industrial Fluids . . . 567
S. Eckelsbach, M. Bernreuther, C. Engin, G. Guevara-Carrion, Y.-L. Huang, T. Merker, H. Hasse, and J. Vrabec
“Brute-Force” Solution of Large-Scale Systems of Equations in a MPI-PBLAS-ScaLAPACK Environment . . . 581
M. Roth, O. Baur, and W. Keller

Metallic Foam Structures, Dendrites and Implementation Optimizations for Phase-Field Modeling . . . 595
A. Vondrous, B. Nestler, A. August, E. Wesner, A. Choudhury, and J. Hötzer

Quaero 2010 Speech-to-Text Evaluation Systems . . . 607
S. Stüker, K. Kilgour, and F. Kraft

Accurate Simulation of Wireless Vehicular Networks Based on Ray Tracing and Physical Layer Simulation . . . 619
T. Gaugel, L. Reichardt, J. Mittag, T. Zwick, and H. Hartenstein

Reduction of Numerical Sensitivities in Crash Simulations on HPC-Computers (HPC-10) . . . 631
O. Mangold, R. Prohl, A. Tkachuk, and V. Trickov

Three-Dimensional Gyrotron Simulation Using a High-Order Particle-in-Cell Method . . . 637
A. Stock, J. Neudorfer, B. Steinbusch, T. Stindl, R. Schneider, S. Roller, C.-D. Munz, and M. Auweter-Kurtz
Physics

Prof. Dr. Peter Nielaba
Fachbereich Physik, Universität Konstanz, 78457 Konstanz, Germany
e-mail: [email protected]
Many important results have been achieved with the computer time granted at the HLRS. The contributions in these proceedings present the results of large scale simulations of astrophysical phenomena, nano-systems, and elementary particle models; they are summarized and commented on below.

Patrick Kilian, Thomas Burkart, and Felix Spanier (University of Würzburg, “iotmrofi”) have studied the influence of the mass ratio on particle acceleration by the filamentation instability. Observations indicate that several types of astrophysical sources produce relativistic jets that interact with the intergalactic medium, creating regions of counter-streaming plasma. Under these conditions the plasma is susceptible to filamentation instabilities. The analytical analysis of this environment is highly non-trivial, which leads to the extensive use of computer simulations to study these conditions and the connection to the energetic photons and particles emanating from these sources. To make simulations feasible one has to make a couple of simplifications to reduce the computational complexity to a level that is reachable with today’s computers. One such simplification is the reduction of the proton mass compared to the electron mass. The authors try to assess the lower limit of this quantity that still allows a realistic representation of the situation in nature. The authors have used the particle-in-cell code ACRONYM, in parts on the NEC Nehalem Cluster at the High Performance Computing Center Stuttgart (HLRS). The largest jobs there used 64 nodes with 8 processors per node; the largest simulation used 37,709 CPU hours and was split into 17 consecutive jobs using 128 CPUs each. In that time 3.4 · 10⁹ particles were moved through 3600 time steps.

B. Müller, L. Hüdepohl, A. Marek, F. Hanke, and H.-Th. Janka from the MPI for Astrophysics in Garching (“SuperN”) have investigated two-dimensional (core collapse) supernova models by simulations, give an overview of the relevant equations and the algorithm for their solution that are employed in their code, and report on their efforts to improve the physics in their supernova code VERTEX as well as its computational efficiency. Recent results of simulations performed on the NEC SX-8
at the HLRS include findings about the role of general relativity in core-collapse supernovae, about nucleosynthesis conditions in O-Ne-Mg core supernovae, and about the proto-neutron star cooling phase.

R.J. Geretshauser, R. Speith, F. Meru, and W. Kley (University of Tübingen, “SPH-PPC”) have analyzed pre-planetesimal collisions with their solid-body smoothed particle hydrodynamics (SPH) code parasph. The formation of planetesimals requires the right amount of sticking, bouncing, and fragmentation to be consistent with observations. Therefore, collisions of pre-planetesimals have to be investigated as thoroughly as possible, and their outcome has to be mapped as precisely as possible depending on all relevant parameters such as initial porosity, collision partner size, and impact velocity. The code parasph is based on the ParaSPH library, featuring domain decomposition, load balancing, nearest neighbor search, and internode communication; it was extended for the simulation of elasticity and plasticity, including the time evolution of the deviatoric stress tensor, and by an implementation of a porosity model. The parallel implementation utilizes the Message Passing Interface (MPI) library, and HDF5 was included as a compressed input and output file format with increased accuracy, decreasing the amount of required storage space. The authors have developed the first damage model which is based on the inhomogeneity of SiO2 dust aggregates, and they have investigated the occurrence of sticking and bouncing for macroscopic and microscopic aggregates and head-on collisions of pre-planetesimals. The simulations were carried out on the NEC Nehalem cluster of the HLRS with 240,143 to 476,476 SPH particles, depending on the size of the projectile. 32 to 80 cores were used, and simulations roughly took 72 to 240 h for 1 s of simulated time, depending on the size of the problem and the physical process involved.

W.G. Schmidt, E. Rauls, U. Gerstmann, S. Sanna, M. Landmann, M. Rohrmüller, A. Riefer, S. Wippermann, and S. Blankenburg (University of Paderborn, “MolArch1”) have investigated by density functional theory (DFT) calculations the polymerization of 1,3,8,10-tetraazaperopyrene (TAPP) molecules on a Cu(111) substrate, as observed in recent STM experiments. According to the authors’ computations, the substrate catalyzes this tautomerization, and metal-coordinated (provided the process is accompanied by a dehydrogenation) as well as uncoordinated polymerization processes are possible. The authors conclude that the catalytic effect of metallic substrates may thus assist in the formation of covalently bonded molecular networks which are not formed in the gas phase or in solution. In the calculations, the electron-ion interaction was described by the projector-augmented wave (PAW) method with a relatively moderate energy cutoff of 340 eV, and the adstructures were modeled in periodically repeated supercells containing two atomic Cu layers. The calculations within this project were performed on the NEC SX-8 and SX-9 of the Höchstleistungsrechenzentrum Stuttgart. The DFT calculations were performed within the local density approximation (LDA) for exchange and correlation as implemented in VASP. The ground-state DFT calculations have been parallelized over different bands and sampling points in the Brillouin zone using the Message Passing Interface (MPI), and parallelization over bands and plane wave coefficients at the same time reduced the communication overhead (a schematic illustration follows below).
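Such a two-level distribution over k-points and bands can be pictured as successive communicator splits. The following mpi4py sketch is purely illustrative; the group counts and variable names are invented for this example, and VASP's actual Fortran implementation differs:

```python
# Illustrative two-level MPI splitting: processes are first grouped by
# k-point, then each k-point group is subdivided over bands.
# Hypothetical sketch, not VASP code. Run e.g.: mpirun -n 8 python split.py
from mpi4py import MPI

world = MPI.COMM_WORLD
rank, size = world.Get_rank(), world.Get_size()

N_KGROUPS = 4                                  # assumed number of k-point groups
kcolor = rank % N_KGROUPS                      # which k-point group this rank joins
kcomm = world.Split(color=kcolor, key=rank)    # communicator for one k-point's data

# Within a k-point group, split again over bands; the plane-wave coefficients
# of one band then stay local, which reduces communication overhead.
bcolor = kcomm.Get_rank() % 2                  # assumed two band groups per k-group
bcomm = kcomm.Split(color=bcolor, key=kcomm.Get_rank())

print(f"world rank {rank}: k-group {kcolor}, band group {bcolor}, "
      f"local band-comm size {bcomm.Get_size()}")
```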
Ph. de Forcrand from the ETH Zürich and O. Philipsen from the University of Münster (“muQCD”) have calculated the critical surface bounding the region featuring chiral phase transitions in the quark mass and chemical potential parameter space of quantum chromodynamics (QCD) with three flavors of quarks. Their calculations are valid for small to moderate quark chemical potentials, μ ≤ T. Previous calculations were done on coarse N_t = 4 lattices, corresponding to a lattice spacing of a ∼ 0.3 fm. Now, the authors present updated results for three degenerate flavors at zero and finite density on N_t = 6 lattices, corresponding to a lattice spacing of a ∼ 0.2 fm, and for the phase structure at imaginary chemical potential μ/T = iπ/3, finding tricritical lines which bound the continuation of the chiral as well as the deconfinement transition surfaces to imaginary chemical potentials, and explaining their curvature. For their Monte Carlo simulations the authors used the standard Wilson gauge and Kogut-Susskind fermion actions. Configurations are generated using the Rational Hybrid Monte Carlo (RHMC) algorithm. In order to investigate the critical behavior of the theory, the authors use the Binder cumulant as an observable (see the sketch below). For each set of fixed quark mass and chemical potential, the critical coupling β_c has been interpolated from a range of typically 3–5 simulated β-values by Ferrenberg-Swendsen reweighting. The simulations have been performed on the NEC SX-8 at the HLRS in Stuttgart and the EGEE Grid at CERN. An estimate of the Binder cumulant for one set of mass values consisted of at least 200k trajectories, and the estimate of a critical point required at least 500k trajectories. The derivatives on the N_t = 4 lattices are based on 500k trajectories; on N_t = 6 the authors have so far collected 1M trajectories.

Philipp Gerhold, Karl Jansen, and Jim Kallarackal from the Humboldt University and DESY Zeuthen (“Xyukawa”) considered a chirally invariant lattice Higgs-Yukawa model based on the Neuberger overlap operator. The model has been evaluated using PHMC simulations, and the authors present final results on the upper and lower Higgs boson mass bounds. The question of a fourth generation of heavy quarks has recently gained attention, and the authors illustrate preliminary results on the Higgs boson mass bounds within this framework. The authors also discuss their progress on properties of the Higgs boson with respect to its unstable nature, such as the decay width and the resonance mass of the Higgs boson. The current program, which is running on the XC2 (Karlsruhe), has been improved in many ways, with respect to algorithmic as well as technical issues. The authors found that most of the used computer time is spent on performing the Fast Fourier Transforms. The implemented parallelization techniques therefore mainly focused on exploiting the available multi-processor system for calculating the FFT, which is mostly limited by memory access speeds. The authors’ idea was to align processes with the memory segments where their memory had been allocated, leading to a good scaling behavior. At present, the authors report on runs on 32⁴ and 40⁴ lattices on the fat nodes having 8 cores each. The simulation program needs approximately 20 GB of main memory on each node for the 40⁴ lattice, and the fat nodes with their 128 GB of memory are therefore well suited for these computations. About 200 simulations are stored in archive space and occupy roughly 8.4 TB.
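As an aside on the Binder cumulant used as the critical-behavior observable in the muQCD project above: with the standard definition B₄ = ⟨(δX)⁴⟩ / ⟨(δX)²⟩², it can be estimated from a chain of measurements of an observable X as in the following generic sketch (the actual analysis additionally interpolates β_c by Ferrenberg-Swendsen reweighting, which is not shown here):

```python
import numpy as np

def binder_cumulant(samples):
    """Fourth-order Binder cumulant B4 = <dX^4> / <dX^2>^2, where dX are
    fluctuations of the observable around its mean. At a 3D-Ising critical
    point B4 approaches a universal value of about 1.604; in the Gaussian
    (crossover) limit it tends to 3."""
    x = np.asarray(samples, dtype=float)
    dx = x - x.mean()
    return np.mean(dx**4) / np.mean(dx**2) ** 2

# Toy usage with synthetic 'measurements' of an order parameter:
rng = np.random.default_rng(0)
print(binder_cumulant(rng.normal(size=200_000)))   # close to 3 (Gaussian limit)
```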
P. Baikov, K. Chetyrkin, J.H. Kühn, P. Marquard, and M. Steinhauser from the KIT Karlsruhe (“ParFORM”) have investigated massive and massless four-loop integrals; the computations were mainly performed on the Landeshöchstleistungsrechner XC4000. The problems treated within their project aim at the evaluation of so-called Feynman diagrams, which in turn lead to quantum corrections within a given quantum field theory like Quantum Electrodynamics or Quantum Chromodynamics, but also supersymmetric theories. Each Feynman diagram has a one-to-one translation to a high-dimensional momentum integral to be computed, if possible analytically. Several algorithms have been suggested for the computation. All of them require huge computing resources, which can only be handled in combination with effective programs. The authors’ workhorse for such calculations is the computer algebra program FORM and its parallel versions ParFORM and TFORM, developed at the authors’ institute. Since August 2010 all versions of FORM are open source. The parallelization concept for FORM is roughly described as follows: the original expression is divided into several pieces which are then distributed to the individual processors or cores (workers). Once the workers have finished their job, the resulting expressions are collected by one processor which combines the results (see the sketch below). A computer architecture running ParFORM or TFORM requires a fast connection to the (in general) local hard disks, of the order of one terabyte per core. The typical CPU time ranges from several hours to several months, depending on the concrete problem under consideration.

One part of the authors’ project deals with two different kinds of integrals. The authors investigated massive vacuum diagrams, also called tadpoles. These diagrams play an important role in the determination of the low-energy expansion of the vacuum polarization functions, which can be used for the extraction of the masses of the charm and bottom quarks from experimental data. The calculation of the low-energy expansion and the analysis of the available data has led to the most precise determination of the masses of the heavy quarks. A new class of integrals, so-called four-loop on-shell integrals, appear as building blocks in important physics applications like the MS-on-shell relation or the anomalous magnetic moment of the muon. The MS-on-shell relation allows one to relate the values of the quark masses in different renormalization schemes and is up to now only known at three-loop order. First results are already available: the calculation of the contributions from diagrams with at least two closed massless quark loops. The calculation of the anomalous magnetic moment of the muon, which, together with that of the electron, is one of the most precisely measured quantities, is scheduled by the authors once the calculation of the MS-on-shell relation is complete. Since it is based on the same families of integrals as the MS-on-shell relation, many results can be reused.
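The scatter/gather parallelization concept described above can be caricatured in a few lines: split a large sum of terms into pieces, let workers process the pieces independently, and let one process combine the results. The following is only a schematic Python analogy; FORM is a standalone symbolic program, and the function names here are invented for illustration:

```python
from multiprocessing import Pool

def process_term(term):
    # Stand-in for the symbolic manipulation a worker applies to one term.
    return term * term

def combine(partials):
    # One process collects the workers' results and merges them.
    return sum(partials)

if __name__ == "__main__":
    expression = range(1_000_000)          # the 'terms' of the original expression
    with Pool(processes=8) as pool:        # workers
        partial_results = pool.map(process_term, expression, chunksize=10_000)
    print(combine(partial_results))
```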
The Influence of the Mass Ratio on Particle Acceleration by the Filamentation Instability

Patrick Kilian, Thomas Burkart, and Felix Spanier
Institut für Theoretische Physik und Astrophysik, Universität Würzburg, Emil-Fischer-Str. 31, 97074 Würzburg, Germany
e-mail: [email protected]
Abstract Observations indicate that several types of astrophysical sources produce relativistic jets that interact with the intergalactic medium, creating regions of counterstreaming plasma. Under these conditions the plasma is susceptible to filamentation instabilities. Analytical analysis of this environment is highly non-trivial, which leads to the extensive use of computer simulations to study these conditions and the connection to the energetic photons and particles emanating from these sources. To make simulations feasible one has to make a couple of simplifications to reduce the computational complexity to a level that is reachable with today’s computers. One such simplification is the reduction of the proton mass compared to the electron mass. This project tries to assess the lower limit of this quantity that still allows a realistic representation of the situation in nature.
1 Introduction

In this project we wanted to study the influence of the proton-to-electron mass ratio on the particle acceleration due to the filamentation instability in relativistic plasma. In nature we find relativistic counterstreaming plasmas in interaction regions between the intergalactic medium and jets from sources like active galactic nuclei (AGNs) and gamma-ray bursts (GRBs). These sources are—just like supernova remnants (SNRs)—known to produce more energetic photons than expected from a purely thermal source [1]. This non-thermal tail extends to extremely large energies and often shows a connection between energy and flux following a power law. Furthermore, we can infer from observations of strong synchrotron radiation that these sources also accelerate particles up to very high energies. Unlike the photon spectrum, the particle spectrum is not directly observable here on earth, but most likely it follows a power law too.
The precise mechanism that accelerates particles up to the Petaelectronvolt scale, which we observe in particles reaching earth as part of the cosmic rays, is not entirely clear yet. Several plausible mechanisms, including Fermi acceleration of type I or II, are known from theoretical considerations and semi-analytical calculations. AGNs and GRBs with their relativistic outflows also allow for a different mechanism, the filamentation instability. This plasma instability may occur whenever collisionless plasmas are counterstreaming in an unmagnetized medium. Here not only magnetic fields are created, but electrical fields may also accelerate particles. The filamentation instability is especially interesting as it may provide the necessary pre-acceleration required for the Fermi-I mechanism. The non-isotropic and non-homogeneous situation in the jet, the non-linear coupling between particles and electromagnetic fields, and the collective and correlated behavior of the plasma itself require either quite extensive simplifying assumptions to make analytical calculations feasible, or self-consistent three-dimensional simulations. The tool of choice for self-consistent simulations of collisionless plasmas are particle-in-cell codes. For the simulations our in-house code ACRONYM was used, which is described in Sect. 3.

Even simulations on the largest available computers can’t track position and momentum of each and every single particle in a relativistic jet. So even large-scale simulations need some, albeit less drastic, simplifications. The first simplification one usually makes is lumping a number of particles of one species, say one billion electrons, which are close together in phase space, into one metaparticle with accordingly larger charge and mass. This leaves the q/m ratio constant and thus doesn’t alter the acceleration caused by the Lorentz force. Obviously there is a limit on how many particles may be lumped together and how many metaparticles need to remain, which will be discussed later on. The next step is to replace the direct interaction between particles due to the Coulomb force by an indirect interaction via the strength of the electromagnetic field, stored on a spatially discrete grid. Using this particle-mesh method improves the computational complexity from O(n²) to O(n) in the particle number. This method under-represents short-range forces (between particles that are much closer to each other than the size of a grid cell) but reflects long-range forces quite accurately [2]. In the environments mentioned above long-range interactions dominate, rendering short-range interactions comparatively unimportant; hence the plasma is often referred to as being “collisionless”.

The third big simplification concerns the mass ratio between protons and electrons and is the topic of this project. The rate at which protons evolve in time and form filaments is closely connected to the mass of the proton m_p. However, to faithfully represent the plasma, the simulation code has to resolve the short-range and transient fluctuations which are mainly carried by electrons and are therefore connected to the electron mass m_e. The typical length scale of these fluctuations is the Debye length λ_D [3], which restricts the edge length of the grid cells to dx < λ_D [4]. In turn the Courant-Friedrichs-Lewy condition [5] limits the length of each explicit time step to dt < 3^{−1/2} dx/c.
If one wants to look at the behavior at late times, say after a couple of proton gyro periods, the simulation has to run for tens of thousands of
time steps. To alleviate this problem one can reduce the mass of the protons in the simulation, creating mass ratios smaller than the rather large value of 1836 that we find in nature. In other words, this project attempts to shed light on the question: “How small can we make the mass ratio without misrepresenting nature?”
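To make the resolution argument concrete, the following back-of-the-envelope sketch evaluates the Debye-length and CFL constraints and counts how many time steps fit into one proton gyro period for different mass ratios. The plasma parameters are purely illustrative assumptions, not the values used in the project:

```python
import numpy as np

# Physical constants (SI)
c, e, me = 2.998e8, 1.602e-19, 9.109e-31
eps0, kB = 8.854e-12, 1.381e-23

# Illustrative plasma parameters (assumptions, not the project's setup)
n_e = 1e14        # electron density [1/m^3]
T_e = 1e8         # electron temperature [K]
B   = 1e-6        # magnetic field [T]

lam_D = np.sqrt(eps0 * kB * T_e / (n_e * e**2))   # Debye length
dx = 0.5 * lam_D                                  # grid spacing, dx < lam_D
dt = dx / (np.sqrt(3.0) * c)                      # 3D CFL limit, dt < dx/(sqrt(3) c)

for ratio in (1, 20, 100, 1836):
    mp = ratio * me
    T_gyro = 2 * np.pi * mp / (e * B)             # proton gyro period
    print(f"mass ratio {ratio:5d}: {T_gyro / dt:.3e} time steps per gyro period")
```

Since the proton gyro period scales with m_p while dt is fixed by the electron-scale grid, the required step count grows linearly with the mass ratio, which is exactly the cost that motivates reducing it.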
2 Scientific Results

The description of the project status closely follows the PhD thesis of Thomas Burkart [6], who ran and analyzed most of the simulations presented here. A more detailed scientific treatment of the results can be found in [7]. For numerical reasons the simulations were conducted neither in the rest frame of the background medium nor in the rest frame of the jet plasma, but rather in the frame where the common center of mass is at rest. In this frame both populations are streaming with a velocity close to the speed of light. Initially the density of particles is uniform in the plane normal to the streaming direction for both populations. As the filamentation instability grows it creates strong flux tubes in which particles from one initial population (either jet or background) propagate preferentially, creating a net current in the flux tube. This current generates a strong magnetic field around the flux tube, further compressing it. The magnetic field around the flux tubes has large components in the plane normal to the streaming direction. Thus the energy stored in the two perpendicular magnetic field components is a good proxy for the growth of the flux tubes (see the sketch below). Figure 1 shows five reconstructed field lines of the magnetic field in the neighborhood of a flux tube. The flux tube itself is represented by the particle density shown semi-transparently in gray-scale. The component of the magnetic field in the direction of the flux tube is rather small and the field lines form nearly closed loops around the flux tube, compressing it.
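This energy proxy is straightforward to evaluate in post-processing. The sketch below assumes z is the streaming direction and that the perpendicular field components are available as arrays on the grid; it is a generic illustration, not a routine from ACRONYM:

```python
import numpy as np

def perpendicular_field_energy(Bx, By, dV, mu0=4e-7 * np.pi):
    """Energy in the two magnetic field components perpendicular to the
    streaming (z) direction: sum over cells of (Bx^2 + By^2) / (2 mu0) * dV."""
    return np.sum(Bx**2 + By**2) / (2.0 * mu0) * dV

# Toy usage: random fields on a small grid with unit cell volume
rng = np.random.default_rng(1)
Bx, By = rng.normal(size=(2, 32, 32, 32))
print(perpendicular_field_energy(Bx, By, dV=1.0))
```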
2.1 Small and Intermediate Mass Ratios

Coming back to the topic of the mass ratio, it is obvious that the light electrons react much more strongly to a given magnetic field than the heavy protons. Therefore small fluctuations in the electron density suffice to generate enough magnetic field to bunch the electrons a little closer, amplifying the initial fluctuation. If the mass of a proton is not much larger than the electron mass, the protons will be strongly affected by the small magnetic fields too. This has two effects. The first is that the protons are—due to their opposite charge—repulsed from the magnetic field that bundles the electrons. As the protons are ejected from the forming electron flux tubes, charge separation increases, which amplifies the magnetic field. The expelled protons form flux tubes too, which tighten due to the same filamentation instability, creating proton flux tubes on the same timescale as the electron flux
Fig. 1 Exemplary reconstructed field lines (shown in green) around a single flux tube stay almost completely in the perpendicular plane. The background image in gray-scale shows the mass density which peaks in the flux tubes
tubes, much faster than one would expect based on the mass ratio. The second effect is that the instability of the electrons develops more slowly than in the case of the physical mass ratio. This is due to the fact that moving charged particles within a magnetic field create an opposite magnetic field following Lenz’s rule. As the magnetic field that was originally created by the electrons pushes the protons away, it is weakened and the bunching of the electrons is slowed down. Both effects can be seen in Fig. 2. For all mass ratios shown in this plot, electron and proton instability generate one shared peak of the perpendicular field, and larger mass ratios shift that peak to later times. Mass ratios 1 and 5 show similar results; the differences are no larger than the statistical fluctuations one would find for two runs of equal mass ratio. A mass ratio of 20, as often used in kinetic plasma simulations, results in a slightly later peak after 60 ω_pe^{−1} instead of 44 ω_pe^{−1}. A mass ratio of 42.8 (the square root of the mass ratio found in nature) results in an even later
Fig. 2 Temporal development of the perpendicular magnetic field for small and intermediate mass ratios
(68.3 ω_pe^{−1}) peak. Furthermore, two distinct growth periods from 20 ω_pe^{−1} to 35 ω_pe^{−1} and from 55 ω_pe^{−1} to 68 ω_pe^{−1} can be seen, but the separation between the two is not large enough to allow the electron flux tubes to decay before the proton flux tubes develop.
2.2 High Mass Ratios

For mass ratios larger than the ones discussed in the subsection above, the electron and proton instabilities decouple and develop on their own undisturbed timescales. This can be seen quite well in Fig. 3, where the electron instability happens in the range from 30 ω_pe^{−1} to 50 ω_pe^{−1}, independently of the mass of the protons.

The time of the peak magnetic field of the proton instability follows the linear relation $t_\mathrm{peak} = 33.4\,\omega_{pe}^{-1} + 0.8\,\omega_{pe}^{-1}\, m_p/m_e$ very closely. Without wanting to over-interpret this linear fit through just three data points, we claim that the time scale of the proton instability grows linearly with the proton mass, as was expected from theory. Given that the electron instabilities decay to half of their peak value by the time 57 ω_pe^{−1} have passed, and that the proton instability needs—just like the electron instability—at least 20 ω_pe^{−1} to grow, one can conclude that a mass ratio of 42.8 is indeed too small but 100 suffices to decouple the two fairly well.
Fig. 3 Temporal development of the perpendicular magnetic field for large mass ratios
2.3 Other Influences

To check whether a mass ratio of 100 is indeed large enough, as claimed in the preceding subsection, we ran two more simulations. Both use the same mass ratio but are changed in either composition or size of the simulation box. Looking at Fig. 4, the dashed line belongs to the simulation where both plasma populations are hydrogen plasma. Contrast this with the other simulations, where one direction contains fewer protons and some positrons (so effectively a mixture of hydrogen and pair plasma). The smaller number of light constituents results in a smaller than usual electron peak, and the larger number of protons enhances the second peak in the magnetic field. The timing however remains unaffected.

The other investigated variant is shown as a dotted line in Fig. 4. In this case the simulation used twice the number of grid cells in both perpendicular directions, increasing the size of the simulation box in physical units correspondingly. The time evolution up to the point of 60 ω_pe^{−1}, which is dominated by the electron instability, remains nearly unchanged. However, the following proton instability grows more slowly and for longer, reaching a later and stronger peak in the perpendicular magnetic field. The most likely reason is that a larger simulation domain reduces interaction with other filaments and, more importantly, the interaction of filaments with their own mirror image via the periodic boundaries. This point is the topic of further studies but is outside the focus of this project and doesn’t affect the key conclusion that a
Fig. 4 Influences on the temporal development of the perpendicular magnetic field other than the mass ratio
mass ratio significantly smaller than 100 results in an unphysical coupling between electrons and protons.
3 Numerical Performance

The particle-in-cell code ACRONYM that has been developed at the chair of astronomy at the University of Würzburg has been described in detail in the application for this project. The following paragraphs give a short overview of the code and its main features for readers who are not familiar with it. A particle-in-cell code belongs to the class of particle-mesh methods commonly used in numerical studies of many-body problems. For a detailed introduction see [2] or [8]. The field quantities like electric and magnetic fields as well as current densities are stored on a Yee lattice [9]. The particles deposit currents on the grid in each time step using Esirkepov’s method [10], which influences the electric and magnetic fields in the next update step. The electromagnetic fields act on the particles through the Lorentz force, which is implemented using the Boris push [11] (see the sketch below). A more detailed, albeit slightly dated, description of our code can be found in [12].
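The Boris push advances a particle's momentum through a half electric kick, a magnetic rotation, and a second half kick. The sketch below shows the non-relativistic textbook form of the scheme for a single particle; it is a generic illustration, not code taken from ACRONYM:

```python
import numpy as np

def boris_push(v, E, B, q, m, dt):
    """One non-relativistic Boris step: half E-kick, B-rotation, half E-kick."""
    v_minus = v + (q * dt / (2 * m)) * E           # first half electric kick
    t = (q * dt / (2 * m)) * B                     # rotation vector
    s = 2 * t / (1 + np.dot(t, t))
    v_prime = v_minus + np.cross(v_minus, t)       # auxiliary rotation step
    v_plus = v_minus + np.cross(v_prime, s)        # completed magnetic rotation
    return v_plus + (q * dt / (2 * m)) * E         # second half electric kick

# Toy usage: electron gyrating in a uniform magnetic field
v = np.array([1e6, 0.0, 0.0]); E = np.zeros(3); B = np.array([0.0, 0.0, 1e-3])
q, m, dt = -1.602e-19, 9.109e-31, 1e-12
for _ in range(3):
    v = boris_push(v, E, B, q, m, dt)
print(v)
```

The scheme is popular because the magnetic rotation conserves kinetic energy exactly in the absence of an electric field.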
Some of the simulations of this project were run in-house; the others were done using the NEC Nehalem Cluster at the High Performance Computing Center Stuttgart (HLRS). The largest jobs there used 64 nodes with 8 processors per node. Our code would easily scale to the 512 cores reserved this way, but we had some problems with MPI threads running out of memory. Consequently most runs were conducted using two or four MPI threads per node to increase the amount of RAM available to each thread. The source of the problem is that a huge amount of particles is concentrated inside the flux tubes, resulting in quite unbalanced memory requirements between the MPI threads, depending on the presence or absence of a large flux tube in the part of the simulation domain covered by each MPI thread. As this was the first time that the available memory was the limiting factor, our code has no support for OpenMP yet. Currently we are working on a dynamic rebalancing of the computational domains between different MPI threads, which should remove the problem of the imbalanced memory usage. This will remove the need to run with a reduced number of threads per node, and consequently the incentive to add OpenMP support will likely go away again.

The largest simulation in this project used 37,709 CPU hours and was split into 17 consecutive jobs using 128 CPUs each. In that time 3.4 · 10⁹ particles were moved through 3600 time steps. This is just short of 700 particle updates per CPU and second. This falls well short of the peak performance of 200,000 particles per second of our code. The reason is that each job had to read 250 gigabytes of particles as well as several gigabytes of electromagnetic fields from disk to reconstruct the state that the preceding job reached, as well as write back all that information at the end of the run. The majority of the 15 to 20 hours of runtime was spent waiting for the IO to happen. Other jobs, which used one fifth the number of particles per cell, were not impeded by this problem and performed as expected. Recent improvements in the IO part of the code have shown a large speed-up on other systems, and we plan to port the improvement to the code version running at HLRS prior to any new simulations.
References

1. F.A. Aharonian, A.G. Akhperjanian, K.M. Aye, A.R. Bazer-Bachi, M. Beilicke, W. Benbow, D. Berge, P. Berghaus, K. Bernlöhr, O. Bolz, C. Boisson, C. Borgmeier, F. Breitling, A.M. Brown, J. Bussons Gordo, P.M. Chadwick, V.R. Chitnis, L.M. Chounet, R. Cornils, L. Costamante, B. Degrange, A. Djannati-Ataï, L.O. Drury, T. Ergin, P. Espigat, F. Feinstein, P. Fleury, G. Fontaine, S. Funk, Y.A. Gallant, B. Giebels, S. Gillessen, P. Goret, J. Guy, C. Hadjichristidis, M. Hauser, G. Heinzelmann, G. Henri, G. Hermann, J.A. Hinton, W. Hofmann, M. Holleran, D. Horns, O.C. de Jager, I. Jung, B. Khélifi, N. Komin, A. Konopelko, I.J. Latham, R. Le Gallou, M. Lemoine, A. Lemière, N. Leroy, T. Lohse, A. Marcowith, C. Masterson, T.J.L. McComb, M. de Naurois, S.J. Nolan, A. Noutsos, K.J. Orford, J.L. Osborne, M. Ouchrif, M. Panter, G. Pelletier, S. Pita, M. Pohl, G. Pühlhofer, M. Punch, B.C. Raubenheimer, M. Raue, J. Raux, S.M. Rayner, I. Redondo, A. Reimer, O. Reimer, J. Ripken, M. Rivoal, L. Rob, L. Rolland, G. Rowell, V. Sahakian, L. Saugé, S. Schlenker, R. Schlickeiser, C. Schuster, U. Schwanke, M. Siewert, H. Sol, R. Steenkamp, C. Stegmann, J.P. Tavernet, C.G. Théoret, M. Tluczykont, D.J. van der Walt, G. Vasileiadis, P. Vincent, B. Visser, H.J. Völk, S.J. Wagner, Nature 432, 75 (2004). DOI 10.1038/nature02960
2. R.W. Hockney, J.W. Eastwood, Computer simulation using particles (Bristol: Hilger, 1988)
3. Mitteilungen der Astronomischen Gesellschaft Hamburg 13, 29 (1959)
4. B.I. Cohen, A.B. Langdon, D.W. Hewett, R.J. Procassini, Journal of Computational Physics 81(1), 151 (1989). DOI 10.1016/0021-9991(89)90068-5
5. R. Courant, K. Friedrichs, H. Lewy, Mathematische Annalen 100(1), 32 (1928). DOI 10.1007/BF01448839
6. T. Burkart, Der Einfluss des fundamentalen Massenverhältnisses auf die Teilchenbeschleunigung durch Plasmainstabilitäten. Ph.D. thesis, Julius-Maximilians-Universität Würzburg (2010)
7. T. Burkart, O. Elbracht, U. Ganse, F. Spanier, ApJ 720, 1318 (2010). DOI 10.1088/0004-637X/720/2/1318
8. C.K. Birdsall, A.B. Langdon, Plasma physics via computer simulation, 1st edn. (New York: Taylor and Francis, 2005)
9. K. Yee, Antennas and Propagation, IEEE Transactions on 14(3), 302 (1966)
10. T.Z. Esirkepov, Computer Physics Communications 135(2), 144 (2001). DOI 10.1016/S0010-4655(00)00228-9
11. G. Penn, P.H. Stoltz, J.R. Cary, J. Wurtele, Journal of Physics G: Nuclear and Particle Physics 29(8), 1719 (2003). URL http://stacks.iop.org/0954-3899/29/1719
12. T. Burkart, O. Elbracht, F. Spanier, Astronomische Nachrichten 328, 662 (2007)
The SuperN-Project: Neutrino Hydrodynamics Simulations of Core-Collapse Supernovae

B. Müller, L. Hüdepohl, A. Marek, F. Hanke, and H.-Th. Janka
Max-Planck-Institut für Astrophysik, Karl-Schwarzschild-Strasse 1, Postfach 1317, D-85741 Garching bei München, Germany
e-mail: [email protected]
Abstract We give an overview of the challenges and the current status of our two-dimensional (core collapse) supernova modelling, and present the system of equations and the algorithm for its solution that are employed in our code VERTEX. We also discuss the parallelisation of VERTEX, give scaling results on different architectures, and report on our ongoing efforts to increase the computational efficiency of the code. Furthermore, we outline some of the recent results obtained from simulations performed on the NEC SX-8 at the HLRS. Specifically, we report our findings about the role of general relativity in core-collapse supernovae, about nucleosynthesis conditions in O-Ne-Mg core supernovae, and about the proto-neutron star cooling phase.
1 Introduction

A star more massive than about 8 solar masses ends its life in a catastrophic explosion, a supernova. Its quiescent evolution comes to an end when the pressure in its inner layers is no longer able to balance the inward pull of gravity. Throughout its life, the star sustained this balance by generating energy through a sequence of nuclear fusion reactions, forming increasingly heavier elements in its core. However, when the core consists mainly of iron-group nuclei, central energy generation ceases. The fusion reactions producing iron-group nuclei relocate to the core’s surface, and their “ashes” continuously increase the core’s mass. Similar to a white dwarf, such a core is stabilised against gravity by the pressure of its degenerate gas of electrons. However, to remain stable, its mass must stay smaller than (roughly) the Chandrasekhar limit. When the core grows larger than this limit, it collapses to a neutron star, and a huge amount (∼ 10⁵³ erg) of gravitational binding energy is set
free. Most (∼ 99%) of this energy is radiated away in neutrinos, but a small fraction is transferred to the outer stellar layers and drives the violent mass ejection which disrupts the star in a supernova. Despite 40 years of research, the details of how this energy transfer happens and how the explosion is initiated are still not well understood. Observational evidence about the physical processes deep inside the collapsing star is sparse and almost exclusively indirect. The only direct observational access is via measurements of neutrinos or gravitational waves. To obtain insight into the events in the core, one must therefore heavily rely on sophisticated numerical simulations. The enormous amount of computer power required for this purpose has led to the use of several, often questionable, approximations and numerous ambiguous results in the past. Fortunately, however, the development of numerical tools and computational resources has meanwhile advanced to a point where it is becoming possible to perform multi-dimensional simulations with unprecedented accuracy. Therefore there is hope that the physical processes which are essential for the explosion can finally be unravelled.

An understanding of the explosion mechanism is required to answer many important questions of nuclear, gravitational, and astrophysics like the following:

• How do the explosion energy, the explosion timescale, and the mass of the compact remnant depend on the progenitor’s mass? Is the explosion mechanism the same for all progenitors? For which stars are black holes left behind as compact remnants instead of neutron stars?
• What is the role of the—incompletely known—equation of state (EoS) of the proto-neutron star? Do softer or stiffer EoSs favour the explosion of a core collapse supernova?
• How do neutron stars receive their natal kicks? Are they accelerated by asymmetric mass ejection and/or anisotropic neutrino emission?
• What are the generic properties of the neutrino emission and of the gravitational wave signal that are produced during stellar core collapse and explosion? Up to which distances could these signals be measured with operating or planned detectors on earth and in space? And what can one learn about supernova dynamics or nuclear and particle physics from a future measurement of such signals in the case of a Galactic supernova?
• How do supernovae contribute to the enrichment of the intergalactic medium with heavy elements? What kind of nucleosynthesis processes occur during and after the explosion? Can the elemental composition of supernova remnants be explained correctly by the numerical simulations? Does the rapid neutron capture process (r-process), which produces e.g. gold and the actinides, take place in supernovae?
2 Numerical Models

2.1 History and Constraints

According to theory, a shock wave is launched at the moment of “core bounce” when the neutron star begins to emerge from the collapsing stellar iron core. There is general agreement, supported by all “modern” numerical simulations, that this shock is unable to propagate directly into the stellar mantle and envelope, because it loses too much energy in dissociating iron into free nucleons while it moves through the outer core. The “prompt” shock ultimately stalls. Thus the currently favoured theoretical paradigm exploits the fact that a huge energy reservoir is present in the form of neutrinos, which are abundantly emitted from the hot, nascent neutron star. The absorption of electron neutrinos and anti-neutrinos by free nucleons in the post-shock layer is thought to reenergise the shock, thus triggering the supernova explosion.

Detailed spherically symmetric hydrodynamic models, which recently include a very accurate treatment of the time-dependent, multi-flavour, multi-frequency neutrino transport based on a numerical solution of the Boltzmann transport equation [1, 2], reveal that this “delayed, neutrino-driven mechanism” does not work as simply as originally envisioned. Although in principle able to trigger the explosion (e.g., [3–5]), neutrino energy transfer to the post-shock matter turned out to be too weak. For inverting the infall of the stellar core and initiating powerful mass ejection, an increase of the efficiency of neutrino energy deposition is needed.

A number of physical phenomena have been pointed out that can enhance neutrino energy deposition behind the stalled supernova shock. They are all linked to the fact that the real world is multi-dimensional instead of spherically symmetric (or one-dimensional; 1D) as assumed in the works cited above:

(1) Convective instabilities in the neutrino-heated layer between the neutron star and the supernova shock develop to violent convective overturn [6]. This convective overturn is helpful for the explosion, mainly because (a) neutrino-heated matter rises and increases the pressure behind the shock, thus pushing the shock further out, (b) cool matter is able to penetrate closer to the neutron star where it can absorb neutrino energy more efficiently, and (c) the rise of freshly heated matter reduces energy losses by the reemission of neutrinos. These effects allow multi-dimensional models to explode more easily than spherically symmetric ones [7–9].

(2) Recent work [10–13] has demonstrated that the stalled supernova shock is also subject to a second non-radial low-mode instability, called the standing accretion shock instability or “SASI” for short, which can grow to a dipolar, global deformation of the shock [12, 14, 15].

(3) Convective energy transport inside the nascent neutron star [16–18] might enhance the energy transport to the neutrinosphere and could thus boost the neutrino luminosities. This would in turn increase the neutrino heating behind the shock.
This list of multi-dimensional phenomena (limited to non-magnetised supernova cores) awaits more detailed exploration by multi-dimensional simulations. Until recently, such simulations have been performed with only a grossly simplified treatment of the involved microphysics, in particular of the neutrino transport and neutrino-matter interactions. At best, grey (i.e., single energy) flux-limited diffusion schemes were employed. Since, however, the role of the neutrinos is crucial for the problem, and because previous experience shows that the outcome of simulations is indeed very sensitive to the employed transport approximations, studies of the explosion mechanism require the best available description of the neutrino physics. This implies that one has to solve the Boltzmann transport equation for neutrinos.
2.2 The Mathematical Model

As core-collapse supernovae involve such a complex interplay of hydrodynamics, self-gravity, and neutrino heating and cooling, numerical modellers face a classical “multiphysics” problem. Although the overall problem can still be formulated as a system of non-linear partial differential equations, rather dissimilar methods (sometimes with conflicting requirements on the computer architecture and the parallelisation strategy) need to be applied to treat individual subsystems. In the case of our code, the system of equations that needs to be solved consists of the following components:

• the multi-dimensional Euler equations of (relativistic) hydrodynamics, supplemented by advection equations for the electron fraction and the chemical composition of the fluid, and formulated in spherical polar coordinates;
• equations for the space-time metric (or, in the Newtonian case, the Poisson equation) for calculating the gravitational source terms in the Euler equations;
• the Boltzmann transport equation and/or its moment equations, which determine the (non-equilibrium) distribution function of the neutrinos;
• the emission, absorption, and scattering rates of neutrinos, which are required for the solution of the neutrino transport equations;
• the equation of state of the stellar fluid, which provides the closure relation between the variables entering the Euler equations, i.e. density, momentum, energy, electron fraction, composition, and pressure.

In what follows we will briefly summarise the neutrino transport algorithms, thus focusing on the major computational kernel of our code. For a more complete description of the entire code we refer the reader to [19, 20] and the references therein.
Fig. 1 Illustration of the phase space coordinates (see the main text)
2.3 “Ray-by-Ray Plus” Method for the Neutrino Transport Problem

The crucial quantity required to determine the source terms for the energy, momentum, and electron fraction of the fluid owing to its interaction with the neutrinos is the neutrino distribution function in phase space, f(r, ϑ, φ, ε, Θ, Φ, t). Equivalently, the neutrino intensity I = c/(2πħc)³ · ε³ f may be used. Both are time-dependent functions in a six-dimensional phase space, as they describe, at every point in space (r, ϑ, φ), the distribution of neutrinos propagating with energy ε into the direction (Θ, Φ) at time t (Fig. 1). The evolution of I (or f) in time is governed by the Boltzmann equation, and solving this equation is, in general, a six-dimensional problem (as time is usually not counted as a separate dimension). A solution of this equation by direct discretisation (using an S_N scheme) would require computational resources in the PetaFlop range. Although there are attempts by at least one group in the United States to follow such an approach, we feel that, with the currently available computational resources, it is mandatory to reduce the dimensionality of the problem. Actually this should be possible, since the source terms entering the hydrodynamic equations are integrals of I over momentum space (i.e. over ε, Θ, and Φ), and thus only a fraction of the information contained in I is truly required to compute the neutrino effects on the dynamics of the flow. It therefore makes sense to consider angular moments of I, and to solve evolution equations for these moments, instead of dealing with the Boltzmann equation directly. The 0th to 3rd order moments are defined as

\[
\{J, H, K, L, \dots\}(r, \vartheta, \phi, \varepsilon, t) = \frac{1}{4\pi} \oint I(r, \vartheta, \phi, \varepsilon, \Theta, \Phi, t)\, \mathbf{n}^{0,1,2,3,\dots}\, d\Omega \qquad (1)
\]
where dΩ = sin Θ dΘ dΦ , n = (cos Θ , sin Θ cos Φ , sin Θ sin Φ ), and exponentiation represents repeated application of the dyadic product. Note that the moments are tensors of the required rank. This leaves us with a four-dimensional problem. So far no approximations have been made. In order to reduce the size of the problem even further, one needs to
resort to assumptions on its symmetry. At this point, one usually employs azimuthal symmetry for the stellar matter distribution, i.e. any dependence on the azimuth angle φ is ignored, which implies that the hydrodynamics of the problem can be treated in two dimensions. It also implies I(r, ϑ , ε , Θ , Φ ) = I(r, ϑ , ε , Θ , −Φ ). If, in addition, it is assumed that I is even independent of Φ , then each of the angular moments of I becomes a scalar, which depends on two spatial dimensions, and one dimension in momentum space: J, H, K, L = J, H, K, L(r, ϑ , ε ,t). Thus we have reduced the problem to three dimensions in total.
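To make the moment definition concrete, the following minimal sketch (our own illustration, not part of the production code; the quadrature grid and test field are assumptions) computes the scalar moments that remain once I is taken to be independent of Φ, so that the Φ-integration in Eq. (1) contributes only a factor 2π:

```python
import numpy as np

def angular_moments(I, theta):
    """Scalar moments J, H, K of Eq. (1) for a Phi-independent intensity
    I(Theta); theta is the grid of propagation angles in [0, pi]."""
    mu = np.cos(theta)                          # components of n are powers of cos(Theta)
    w = np.sin(theta)                           # from dOmega = sin(Theta) dTheta dPhi
    J = 0.5 * np.trapz(I * w, theta)            # 0th moment: mean intensity
    H = 0.5 * np.trapz(I * mu * w, theta)       # 1st moment: flux
    K = 0.5 * np.trapz(I * mu ** 2 * w, theta)  # 2nd moment: pressure
    return J, H, K

# sanity check: an isotropic field gives J = 1, H = 0, K = J/3
theta = np.linspace(0.0, np.pi, 401)
J, H, K = angular_moments(np.ones_like(theta), theta)
```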
2.3.1 The System of Equations

With the aforementioned assumptions it can be shown [19] that in the Newtonian approximation the following two transport equations need to be solved in order to compute the source terms for the energy and electron fraction of the fluid:

\[
\begin{aligned}
&\left(\frac{1}{c}\frac{\partial}{\partial t}
 + \beta_r\frac{\partial}{\partial r}
 + \boldsymbol{\frac{\beta_\vartheta}{r}\frac{\partial}{\partial\vartheta}}\right) J
 + J\left(\frac{1}{r^2}\frac{\partial(r^2\beta_r)}{\partial r}
 + \boldsymbol{\frac{1}{r\sin\vartheta}\frac{\partial(\sin\vartheta\,\beta_\vartheta)}{\partial\vartheta}}\right)
 + \frac{1}{r^2}\frac{\partial(r^2 H)}{\partial r}
 + \frac{\beta_r}{c}\frac{\partial H}{\partial t} \\
&\; - \frac{\partial}{\partial\varepsilon}\left[\varepsilon J\left(\frac{\beta_r}{r}
 + \boldsymbol{\frac{1}{2r\sin\vartheta}\frac{\partial(\sin\vartheta\,\beta_\vartheta)}{\partial\vartheta}}\right)\right]
 - \frac{\partial}{\partial\varepsilon}\left[\varepsilon K\left(\frac{\partial\beta_r}{\partial r}
 - \frac{\beta_r}{r}
 - \boldsymbol{\frac{1}{2r\sin\vartheta}\frac{\partial(\sin\vartheta\,\beta_\vartheta)}{\partial\vartheta}}\right)\right]
 - \frac{\partial}{\partial\varepsilon}\left(\frac{\varepsilon H}{c}\frac{\partial\beta_r}{\partial t}\right) \\
&\; + J\left(\frac{\beta_r}{r}
 + \boldsymbol{\frac{1}{2r\sin\vartheta}\frac{\partial(\sin\vartheta\,\beta_\vartheta)}{\partial\vartheta}}\right)
 + K\left(\frac{\partial\beta_r}{\partial r}
 - \frac{\beta_r}{r}
 - \boldsymbol{\frac{1}{2r\sin\vartheta}\frac{\partial(\sin\vartheta\,\beta_\vartheta)}{\partial\vartheta}}\right)
 + \frac{2}{c}\frac{\partial\beta_r}{\partial t}\,H = C^{(0)}, \qquad (2)
\end{aligned}
\]

\[
\begin{aligned}
&\left(\frac{1}{c}\frac{\partial}{\partial t}
 + \beta_r\frac{\partial}{\partial r}
 + \boldsymbol{\frac{\beta_\vartheta}{r}\frac{\partial}{\partial\vartheta}}\right) H
 + H\left(\frac{1}{r^2}\frac{\partial(r^2\beta_r)}{\partial r}
 + \boldsymbol{\frac{1}{r\sin\vartheta}\frac{\partial(\sin\vartheta\,\beta_\vartheta)}{\partial\vartheta}}\right)
 + \frac{\partial K}{\partial r} + \frac{3K - J}{r}
 + H\frac{\partial\beta_r}{\partial r}
 + \frac{\beta_r}{c}\frac{\partial K}{\partial t} \\
&\; - \frac{\partial}{\partial\varepsilon}\left(\frac{\varepsilon K}{c}\frac{\partial\beta_r}{\partial t}\right)
 - \frac{\partial}{\partial\varepsilon}\left[\varepsilon L\left(\frac{\partial\beta_r}{\partial r}
 - \frac{\beta_r}{r}
 - \boldsymbol{\frac{1}{2r\sin\vartheta}\frac{\partial(\sin\vartheta\,\beta_\vartheta)}{\partial\vartheta}}\right)\right]
 - \frac{\partial}{\partial\varepsilon}\left[\varepsilon H\left(\frac{\beta_r}{r}
 + \boldsymbol{\frac{1}{2r\sin\vartheta}\frac{\partial(\sin\vartheta\,\beta_\vartheta)}{\partial\vartheta}}\right)\right]
 + \frac{1}{c}\frac{\partial\beta_r}{\partial t}(J + K) = C^{(1)}. \qquad (3)
\end{aligned}
\]
These are evolution equations for the neutrino energy density, J, and the neutrino flux, H, and follow from the zeroth and first moment equations of the comoving frame (Boltzmann) transport equation in the Newtonian, O(v/c) approximation. The quantities C(0) and C(1) are source terms that result from the collision term of the Boltzmann equation, while βr = vr /c and βϑ = vϑ /c, where vr and vϑ are the components of the hydrodynamic velocity, and c is the speed of light. The functional dependencies βr = βr (r, ϑ ,t), J = J(r, ϑ , ε ,t), etc. are suppressed in the notation. This system includes four unknown moments (J, H, K, L) but only two equations,
and thus needs to be supplemented by two more relations. This is done by substituting K = f_K · J and L = f_L · J, where f_K and f_L are the variable Eddington factors, which for the moment may be regarded as being known, but in our case are indeed determined from a separate simplified (“model”) Boltzmann equation. The moment equations (2) and (3) are very similar to the O(v/c) equations in spherical symmetry which were solved in the 1D simulations of [21] (see (7), (8), (30), and (31) of the latter work). This similarity has allowed us to reuse a good fraction of the one-dimensional version of VERTEX for coding the multi-dimensional algorithm. The additional terms necessary for this purpose have been set in boldface above. Finally, the changes of the energy, e, and electron fraction, Y_e, required for the hydrodynamics are given by the following two equations

\[
\frac{de}{dt} = -\frac{4\pi}{\rho} \int_0^\infty d\varepsilon \sum_{\nu \in (\nu_e, \bar{\nu}_e, \dots)} C^{(0)}_\nu(\varepsilon), \qquad (4)
\]
\[
\frac{dY_e}{dt} = -\frac{4\pi m_{\rm B}}{\rho} \int_0^\infty \frac{d\varepsilon}{\varepsilon} \left( C^{(0)}_{\nu_e}(\varepsilon) - C^{(0)}_{\bar{\nu}_e}(\varepsilon) \right) \qquad (5)
\]
(for the momentum source terms due to neutrinos see [19]). Here mB is the baryon mass, and the sum in (4) runs over all neutrino types. The full system consisting of (2)–(5) is stiff, and thus requires an appropriate discretisation scheme for its stable solution.
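To see why this stiffness forces an implicit treatment, consider a toy emission–absorption source C⁽⁰⁾ = κ(J_eq − J), a model form assumed purely for illustration rather than the full collision term. A minimal sketch:

```python
def implicit_source_step(J, J_eq, kappa, c_dt):
    """Backward-Euler update for the model source C0 = kappa*(J_eq - J):
    solve (J_new - J)/(c dt) = kappa*(J_eq - J_new) for J_new."""
    return (J + c_dt * kappa * J_eq) / (1.0 + c_dt * kappa)

# In the optically thick regime (kappa*c_dt >> 1) this relaxes J to J_eq in
# a single stable step; an explicit update would need kappa*c_dt < 1 and
# hence prohibitively small time steps.
print(implicit_source_step(J=0.0, J_eq=1.0, kappa=1.0e6, c_dt=1.0))  # ~1.0
```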
2.3.2 Method of Solution

In order to discretise (2)–(5), the spatial domain [0, r_max] × [ϑ_min, ϑ_max] is covered by Nr radial and Nϑ angular zones, where ϑ_min = 0 and ϑ_max = π correspond to the north and south poles, respectively, of the spherical grid. (In general, we allow for grids with different radial resolutions in the neutrino transport and hydrodynamic parts of the code. The number of radial zones for the hydrodynamics will be denoted by $N_r^{\mathrm{hyd}}$.) The number of bins used in energy space is Nε and the number of neutrino types taken into account is Nν. The equations are solved in two operator-split steps corresponding to a lateral and a radial sweep. In the first step, we treat the boldface terms in the first lines of (2) and (3), respectively, which describe the lateral advection of the neutrinos with the stellar fluid, and thus couple the angular moments of the neutrino distribution of neighbouring angular zones. For this purpose we consider the equation

\[
\frac{1}{c}\frac{\partial \Xi}{\partial t} + \frac{1}{r \sin\vartheta}\frac{\partial(\sin\vartheta\, \beta_\vartheta\, \Xi)}{\partial\vartheta} = 0, \qquad (6)
\]
where Ξ represents one of the moments J or H. Although it has been suppressed in the above notation, an equation of this form has to be solved for each radius, for
each energy bin, and for each type of neutrino. An explicit upwind scheme is used for this purpose. In the second step, the radial sweep is performed. Several points need to be noted here:

• Terms in boldface not yet taken into account in the lateral sweep need to be included in the discretisation scheme of the radial sweep. This can be done in a straightforward way, since these remaining terms do not include derivatives of the transport variables J or H. They only depend on the hydrodynamic velocity vϑ, which is a constant scalar field for the transport problem.
• The right-hand sides (source terms) of the equations and the coupling in energy space have to be accounted for. The coupling in energy is non-local, since the source terms of (2) and (3) stem from the Boltzmann equation, which is an integro-differential equation and couples all the energy bins.
• The discretisation scheme for the radial sweep is implicit in time. Explicit schemes would require very small time steps to cope with the stiffness of the source terms in the optically thick regime, and with the small CFL time step dictated by neutrino propagation at the speed of light in the optically thin regime. Still, even with an implicit scheme about 10⁵ time steps are required per simulation. This makes the calculations expensive.

Once the equations for the radial sweep have been discretised in radius and energy, the resulting solver is applied ray-by-ray for each angle ϑ and for each type of neutrino; i.e. for constant ϑ, Nν two-dimensional problems need to be solved. The discretisation itself is done using a second-order accurate scheme with backward differencing in time according to [21]. This leads to a non-linear system of algebraic equations, which is solved by Newton-Raphson iteration with explicit construction and inversion of the corresponding Jacobian matrix with the block-Thomas algorithm.
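As an illustration of the lateral sweep described above, the following minimal sketch (not the actual VERTEX implementation; the uniform grid, fixed pole values, and variable names are our assumptions) advances Eq. (6) with a first-order explicit upwind scheme for a single radius, energy bin, and neutrino type:

```python
import numpy as np

def lateral_sweep_step(xi, beta_th, theta, r, c, dt):
    """One explicit first-order upwind step of Eq. (6); xi is J or H
    sampled on the angular grid theta for one radius/energy/species."""
    sin_th = np.sin(theta)
    flux = sin_th * beta_th * xi              # F = sin(th) * beta_th * Xi
    dth = theta[1] - theta[0]                 # uniform grid assumed
    xi_new = xi.copy()                        # poles kept fixed in this sketch
    for j in range(1, len(theta) - 1):
        # one-sided difference taken from the side the flow comes from
        dF = flux[j] - flux[j - 1] if beta_th[j] > 0.0 else flux[j + 1] - flux[j]
        xi_new[j] -= c * dt / (r * sin_th[j] * dth) * dF
    # stability requires the CFL condition c*dt*|beta_th| <= r*dth
    return xi_new
```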
2.4 Parallelisation

The ray-by-ray approximation readily lends itself to parallelisation over the different angular zones. In order to make efficient use of modern supercomputer systems with relatively small shared-memory units (e.g. 8 CPUs per node on the NEC SX-8, 16 on the NEC SX-9), distributed memory parallelism is indispensable. An MPI version of the VERTEX code using domain decomposition was initially developed within a cooperation between MPA and the Teraflop Workbench at the HLRS in 2007/2008. Since then, the parallelisation of VERTEX has been further extended to allow good scaling on several thousands of cores as required for future 3D supernova simulations. The I/O is now handled by means of parallel HDF5 to ensure high scalability and to eliminate the excessive memory consumption associated with temporary I/O arrays on the root node. In order to optimise memory usage, we have also eliminated most of the remaining non-distributed global data structures.
Fig. 2 Left: Parallel efficiency of strong scaling (i.e. the ratio of the speed-up to the number of cores) for a 2D setup with 256 angular rays on different machines. Right: Wall clock time per time step as a function of problem size (indicating the parallel efficiency of weak scaling) for different machines with more than 1000 available cores
Furthermore, introducing virtual topologies has provided a more robust and convenient way for handling the communication between the different domains. Scaling tests on different architectures demonstrate the excellent parallel performance of VERTEX. Figure 2 (left panel) shows the parallel efficiency of strong scaling as a function of the number of processors for a test run with 256 angular rays on the NEC machine SX-8 (8 CPUs per node) at the HLRS, on the JUROPA cluster at the Jülich Supercomputing Centre (JSC), on the SGI Altix 4700 at the Leibniz Rechenzentrum (LRZ), and on the IBM Power6 575 (32 CPUs/node) at the Rechenzentrum Garching (RZG) of the Max-Planck-Gesellschaft. On the SX-8, a speed-up of 7.85 could be obtained by using eight nodes instead of one, which corresponds to a parallel efficiency of more than 98%. However, with the expected partial phase-out of the vector systems at the HLRS and the transition to the new Cray XE6 system, the performance on massively parallel scalar systems offering a larger number of cores for sustained use becomes a more immediate concern. Fortunately, measurements on the JUROPA system at JSC (which is in some respects similar to the NEC Nehalem cluster at the HLRS) show very good strong scaling on up to ∼ 256 cores for large 2D simulations. For future 3D simulations we expect even better scaling: preliminary tests using an experimental 3D version of VERTEX on three different machines (Blue Gene/P and IBM Power6 575 at RZG, JUROPA cluster at JSC) indicate that a high weak-scaling efficiency can be achieved on several thousands of cores (Fig. 2, right panel). With the recent improvements of the VERTEX code, we are therefore well prepared for efficiently utilising the next generation of computing platforms at the HLRS.
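The decomposition strategy can be pictured with a short schematic (hypothetical function names, not taken from the actual VERTEX source; assumes mpi4py is available): each MPI rank owns a contiguous block of angular rays and performs its radial sweeps independently, which is what makes the ray-by-ray approach scale so well:

```python
from mpi4py import MPI

def radial_sweep(ray):
    """Stub for the implicit per-ray transport solve (placeholder only)."""
    pass

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_rays = 256                          # angular zones of the 2D test setup
lo = rank * n_rays // size            # contiguous block of rays per rank
hi = (rank + 1) * n_rays // size

for j in range(lo, hi):               # rays are independent, so the radial
    radial_sweep(j)                   # sweeps need no communication at all

comm.Barrier()                        # hydro and the lateral sweep then
                                      # exchange boundary data between ranks
```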
3 Recent Results and Ongoing Work

We make use of the computer resources available to us at the HLRS to address some of the important questions in SN theory (see Sect. 1) with 2D simulations. We typically run our code on one node of the NEC SX-8 (8 processors, OpenMP-parallelised)
with 98.3% of vector operations and up to 30,000 MFLOP/s, settling for a compromise between parallel speed-up and throughput on the currently available system. In the following we present some results from these simulations, which are currently conducted at the HLRS. For the neutrino interaction rates we use the full set as described in [19, 21] unless noted otherwise.
3.1 Relativistic Supernova Models

During the last year, we continued several multi-dimensional simulations of different progenitor stars using the relativistic version of the VERTEX code [20]. These simulations have now been advanced to sufficiently late times and allow us to quantify the impact of general relativistic effects on the explosion conditions and the gravitational wave signal for the first time [22]. For an 11.2 M⊙ star, we already confirmed last year the explosions previously found by our group in calculations [23, 24] with a simple, “pseudo-Newtonian” treatment of strong-field gravity [21, 25], and the simulation has been extended considerably to 560 ms after bounce to study the post-explosion phase in more detail and provide data for future nucleosynthesis studies. However, a 15 M⊙ model, now advanced to 690 ms after bounce, turned out to be an even more interesting case. Although we had obtained an explosion for this progenitor in a simulation assuming relatively rapid rotation [24] and an overly strong gravitational potential, no explosion was observed in our best recent non-rotating pseudo-Newtonian models. By contrast, a strongly asymmetric explosion develops shortly after 400 ms in our relativistic 15 M⊙ simulation (Fig. 3). Upon comparing this relativistic simulation to the corresponding pseudo-Newtonian run, we find the neutrino heating conditions to be significantly more favourable for an explosion in the relativistic case throughout the entire post-bounce phase. We also observe conspicuous differences in the gravitational wave signal between these models. The relativistic simulation not only yields a stronger overall signal because of enhanced gravitational wave emission during the first 200 ms after the onset of the explosion; the spectrum also peaks at significantly lower frequencies (between 800 Hz and 900 Hz) than for the pseudo-Newtonian model (between 1000 Hz and 1120 Hz, see Fig. 4).
3.2 Oxygen-Neon-Magnesium Core Supernovae

Using our computing time at the HLRS, we have long been investigating a special class of progenitor stars with an O-Ne-Mg core. For these progenitors, explosions can be obtained fairly easily and without support from multi-dimensional hydrodynamical instabilities like convection because of the peculiar density structure at the edge of the core. These progenitors are therefore an ideal case for studying the
Fig. 3 Specific entropy s along the north and south polar axis during the explosion of a 15 M⊙ star. The shock is visible as the transition from dark indigo (representing cold infalling material at large radii) to brighter colours (blue to red). At about 450 ms, the shock begins to accelerate outwards rapidly along the north polar axis, indicating the onset of a strongly asymmetric explosion
Fig. 4 Comparison of the spectral energy distribution of gravitational waves for a relativistic and a pseudo-Newtonian model of a 15 M⊙ star
nucleosynthetic yields from the explosion and the cooling phase of the newly-born neutron star. 2D simulations conducted at the HLRS have now allowed us to analyse the effects of convective mixing after the onset of the explosion on the nucleosynthesis [26]. Although convection is of little consequence for the energetics of the explosion, we were able to show that it has a considerable impact on the composition of the ejecta. While the spherically symmetric models studied by our group so far suffered from a harmful overproduction of certain isotopes with neutron number N = 50 (mainly ⁹⁰Zr), buoyant neutron-rich bubbles with moderate entropy contribute elements from the iron group to zirconium with relatively uniform abundances in the 2D simulation.
Fig. 5 Neutrino luminosity and mean energy for the first seconds of the cooling phase of a proto-neutron star formed during the explosion of an 8.8 M⊙ star. This plot demonstrates the impact of convection in the proto-neutron star (solid lines) on the emission of electron neutrinos (νe, black), electron antineutrinos (ν̄e, red) and heavy-flavour neutrinos (νμτ); the results from a model without convection are also shown for reference (dashed lines)
Small model variations, which could result in more neutron-rich convective bubbles, might even allow for a so-called “weak r-process” producing elements up to palladium, silver, and cadmium. These findings suggest that O-Ne-Mg core supernovae are very likely candidates for explaining the abundance pattern of low-metallicity stars in the Galactic halo that are deficient in heavy r-process elements. We also continued to investigate the cooling phase of the proto-neutron star for O-Ne-Mg core supernovae over several seconds [27], focusing in particular on the late-time neutrino signal, which may contain important clues to the properties of the compact remnant and the equation of state beyond nuclear matter density. In addition, the neutrino emission also regulates the nucleosynthesis conditions in the wind emanating from the young neutron star, which has long been advocated as a site of r-process nucleosynthesis. Currently, it is not yet feasible to address this phase with multi-dimensional simulations due to severe time-step limitations, and even long-time simulations in spherical symmetry have only become possible recently [27, 28]. The limitation to 1D is rather unsatisfactory, as convection inside the proto-neutron star remains an important factor of uncertainty in these models. However, we have now been able to take the effects of proto-neutron star convection into account in 1D simulations with the help of a mixing-length-type approach. Our first results indicate that convection has a minor effect during the neutron star cooling phase in the case of O-Ne-Mg core supernovae and only results in a moderate enhancement of the neutrino luminosities and mean energies at early times, thus effectively leading to somewhat faster cooling (Fig. 5). Likewise, we observe only small changes in the nucleosynthesis conditions in the neutrino-driven wind, and still find no sign of r-process conditions in this phase, thus confirming our previous findings [27] with non-convective models.
4 Conclusions and Outlook

We continued to simulate 1D and 2D models of core-collapse supernovae with detailed neutrino transport at the HLRS. Using a recent extension of our VERTEX code, we explored the role of general relativistic effects in the explosion mechanism and the emission of gravitational waves in multi-dimensional supernova models, and found sizable effects both on the heating conditions and on the typical gravitational wave frequencies. We also extended our investigations of the nucleosynthesis yields and the neutrino signal from O-Ne-Mg core supernovae. For the first time, nucleosynthesis calculations for these events have been conducted on the basis of a self-consistent multi-dimensional simulation. Due to the interesting implications of these results, we plan to further analyse the nucleosynthesis conditions in core-collapse supernovae by calculating self-consistent explosion models for more massive progenitors, by exploring model variations with further high-resolution runs of O-Ne-Mg core supernovae, and also by investigating the accretion-induced collapse and the subsequent explosion of white dwarfs (another scenario of high relevance for chemogalactic evolution). Furthermore, we have been able to improve our predictions for the late-time neutrino signal for these progenitors with the help of an adapted mixing-length treatment of convection in 1D models. After having thoroughly studied the neutron star cooling phase for O-Ne-Mg core supernovae, we are presently in the process of addressing a wider range of more massive progenitor stars. Since last year, we have also made considerable progress in our attempts to improve the scaling behaviour of the VERTEX code. With the implementation of parallel I/O and the reduction of memory consumption, VERTEX can efficiently utilise hundreds of cores for large 2D simulations and potentially thousands of cores for 3D models. The recent adaptations of the code provide us with an excellent starting point for using the large Cray XE6 system as the next high-performance computing platform at the HLRS.

Acknowledgements. We thank especially K. Benkert for her extremely valuable and fruitful work on the MPI version of VERTEX. Support by the Deutsche Forschungsgemeinschaft through the SFB/TR27 “Neutrinos and Beyond” and the SFB/TR7 “Gravitational Wave Astronomy”, and by the Cluster of Excellence EXC 153 “Origin and Structure of the Universe” (http://www.universecluster.de) is acknowledged, as are computer time grants of the HLRS, NIC Jülich, and Rechenzentrum Garching.
References

1. Rampp, M., Janka, H.T.: Spherically symmetric simulation with Boltzmann neutrino transport of core collapse and postbounce evolution of a 15 M⊙ star. Astrophys. J. 539 (2000) L33–L36
2. Liebendörfer, M., Mezzacappa, A., Thielemann, F., Messer, O.E., Hix, W.R., Bruenn, S.W.: Probing the gravitational well: No supernova explosion in spherical symmetry with general relativistic Boltzmann neutrino transport. Phys. Rev. D 63 (2001) 103004
3. Bethe, H.A.: Supernova mechanisms. Reviews of Modern Physics 62 (1990) 801–866
4. Burrows, A., Goshy, J.: A theory of supernova explosions. Astrophys. J. 416 (1993) L75
5. Janka, H.T.: Conditions for shock revival by neutrino heating in core-collapse supernovae. Astron. Astrophys. 368 (2001) 527–560
6. Herant, M., Benz, W., Colgate, S.: Postcollapse hydrodynamics of SN 1987A – Two-dimensional simulations of the early evolution. Astrophys. J. 395 (1992) 642–653
7. Herant, M., Benz, W., Hix, W.R., Fryer, C.L., Colgate, S.A.: Inside the supernova: A powerful convective engine. Astrophys. J. 435 (1994) 339
8. Burrows, A., Hayes, J., Fryxell, B.A.: On the nature of core-collapse supernova explosions. Astrophys. J. 450 (1995) 830
9. Janka, H.T., Müller, E.: Neutrino heating, convection, and the mechanism of Type-II supernova explosions. Astron. Astrophys. 306 (1996) 167
10. Thompson, C.: Accretional heating of asymmetric supernova cores. Astrophys. J. 534 (2000) 915–933
11. Blondin, J.M., Mezzacappa, A., DeMarino, C.: Stability of standing accretion shocks, with an eye toward core-collapse supernovae. Astrophys. J. 584 (2003) 971–980
12. Scheck, L., Plewa, T., Janka, H.T., Kifonidis, K., Müller, E.: Pulsar recoil by large-scale anisotropies in supernova explosions. Phys. Rev. Lett. 92 (2004) 011103
13. Foglizzo, T., Galletti, P., Scheck, L., Janka, H.T.: Instability of a stalled accretion shock: Evidence for the advective-acoustic cycle. Astrophys. J. 654 (2007) 1006–1021
14. Scheck, L., Kifonidis, K., Janka, H.T., Müller, E.: Multidimensional supernova simulations with approximative neutrino transport. I. Neutron star kicks and the anisotropy of neutrino-driven explosions in two spatial dimensions. Astron. Astrophys. 457 (2006) 963–986
15. Scheck, L., Janka, H.T., Foglizzo, T., Kifonidis, K.: Multidimensional supernova simulations with approximative neutrino transport. II. Convection and the advective-acoustic cycle in the supernova core. Astron. Astrophys. 477 (2008) 931–952
16. Keil, W., Janka, H.T., Müller, E.: Ledoux convection in protoneutron stars – A clue to supernova nucleosynthesis? Astrophys. J. 473 (1996) L111
17. Burrows, A., Lattimer, J.M.: The birth of neutron stars. Astrophys. J. 307 (1986) 178–196
18. Pons, J.A., Reddy, S., Prakash, M., Lattimer, J.M., Miralles, J.A.: Evolution of proto-neutron stars. Astrophys. J. 513 (1999) 780–804
19. Buras, R., Rampp, M., Janka, H.T., Kifonidis, K.: Two-dimensional hydrodynamic core-collapse supernova simulations with spectral neutrino transport. I. Numerical method and results for a 15 M⊙ star. Astron. Astrophys. 447 (2006) 1049–1092
20. Müller, B., Janka, H., Dimmelmeier, H.: A new multi-dimensional general relativistic neutrino hydrodynamic code for core-collapse supernovae. I. Method and code tests in spherical symmetry. Astrophys. J. Suppl. 189 (2010) 104–133
21. Rampp, M., Janka, H.T.: Radiation hydrodynamics with neutrinos. Variable Eddington factor method for core-collapse supernova simulations. Astron. Astrophys. 396 (2002) 361–392
22. Müller, B., Janka, H., Marek, A., Dimmelmeier, H.: in preparation (2011)
23. Buras, R., Janka, H.T., Rampp, M., Kifonidis, K.: Two-dimensional hydrodynamic core-collapse supernova simulations with spectral neutrino transport. II. Models for different progenitor stars. Astron. Astrophys. 457 (2006) 281–308
24. Marek, A., Janka, H.T.: Delayed neutrino-driven supernova explosions aided by the standing accretion-shock instability. Astrophys. J. 694 (2009) 664–696
25. Marek, A., Dimmelmeier, H., Janka, H.T., Müller, E., Buras, R.: Exploring the relativistic regime with Newtonian hydrodynamics: An improved effective gravitational potential for supernova simulations. Astron. Astrophys. 445 (2006) 273–289
26. Wanajo, S., Janka, H., Müller, E.: Electron-capture supernovae as the origin of elements beyond iron. Astrophys. J. 726 (2011) L15
27. Hüdepohl, L., Müller, B., Janka, H., Marek, A., Raffelt, G.G.: Neutrino signal of electron-capture supernovae from core collapse to cooling. Phys. Rev. Lett. 104 (2010) 251101
28. Fischer, T., Whitehouse, S.C., Mezzacappa, A., Thielemann, F., Liebendörfer, M.: Protoneutron star evolution and the neutrino-driven wind in general relativistic neutrino radiation hydrodynamics simulations. Astron. Astrophys. 517 (2010) A80
Simulation of Pre-planetesimal Collisions with Smoothed Particle Hydrodynamics

R.J. Geretshauser, R. Speith, and W. Kley
R.J. Geretshauser · W. Kley: Institut für Astronomie und Astrophysik, Abteilung Computational Physics, Eberhard Karls Universität Tübingen, Auf der Morgenstelle 10, 72076 Tübingen, Germany, e-mail: [email protected], [email protected]

R. Speith: Physikalisches Institut, Eberhard Karls Universität Tübingen, Auf der Morgenstelle 14, 72076 Tübingen, Germany, e-mail: [email protected]

1 Introduction

It is widely accepted that planets form in protoplanetary discs. These discs, which form as a byproduct of star formation in the collapse of a molecular cloud [23], initially consist of gas and dust, which interact with each other. The interaction induces size-dependent relative velocities between dust aggregates: the effects of Brownian motion, radial drift, vertical settling, and turbulent mixing lead to mutual collisions [38]. Three size steps can be distinguished on the path to a planet: (1) dust monomers and fractal dust aggregates (up to millimetre size), where van der Waals forces act as the sticking mechanism, (2) millimetre- to kilometre-sized pre-planetesimals, where the sticking mechanism is under discussion, and (3) planetesimals of kilometre size and larger, for which the growth to planets is assisted by gravity. Once a sufficient population of planetesimals exists, the formation of planets is ensured [15]. Therefore, we briefly review only the first two steps. In the first stage, micrometre-sized dust grains grow up to millimetre to centimetre size, depending on the disc model [6, 39]. This regime is characterised by low collision velocities and van der Waals forces as the dominant sticking mechanism. At first, chain-like, fractal aggregates with a low mass-to-surface ratio are formed. With increasing collision velocities these aggregates get compacted and their mass-to-surface ratio increases. The result of the first stage is a population of pre-planetesimals: millimetre- to centimetre-sized fluffy aggregates with a high degree of porosity. Global coagulation simulations revealed that porous aggregates are able to acquire a much larger
mass compared to non-porous particles [27]. This growth step was investigated thoroughly in the laboratory [5] and in molecular dynamics simulations [28, 35].

The second growth step from pre-planetesimals to planetesimals is problematic and the subject of extensive numerical and experimental effort. The hindrances in this stage can be characterised according to three types of barriers: the fragmentation, drift, and bouncing barriers. The most serious is the fragmentation barrier: with increasing size the relative velocities between pre-planetesimals increase and potentially lead to catastrophic disruption in mutual collisions. Often a velocity threshold of 1 ms−1 for disruptive events is assumed [3, 17], which is treated as independent of other parameters such as porosity or mass ratio of the collision partners. The fragment distribution is often described by a power law [3, 17]. With these assumptions, it was found that dust coagulation is halted at centimetre or even millimetre size, depending on the disc model [6]. Recently, a growth model was proposed [34] in which small fragments are not lost for planetesimal formation. Instead, planetesimals mainly grow by accretion of fragments smaller than 1 mm.

The second obstacle to planetesimal formation is a barrier caused by radial drift of larger pre-planetesimals [37]. Due to its pressure support the gas in the disc rotates at sub-Keplerian velocity, whereas the solid material lacks this pressure support and tends to rotate at Keplerian velocity. As a consequence, the solid objects feel a headwind which causes an inward drift. In a minimum mass solar nebula, it takes roughly a century for metre-sized objects to drift into the star from 1 AU. In contrast, for centimetre- and for kilometre-sized objects it takes ∼ 10⁵ yr. As a result, metre-sized objects are quickly lost for planetesimal formation by accretion onto the host star or photoevaporation. Collective effects in the midplane of the protoplanetary disc might diminish the problem [9].

Based on a collection of empirical collision data [17], which includes rebound as a third type of pre-planetesimal collision outcome besides sticking and fragmentation, a possible new barrier was introduced [39]: the bouncing barrier. In this scenario, collisional growth is halted at aggregate masses of about 1 g. After that, aggregates only get compacted in mutual collisions but do not grow any further.

In addition to these barriers, the planet formation process must not be too efficient. Observations [26] suggest that (sub-)millimetre-sized dust and millimetre- to centimetre-sized aggregates are present for as long as 10⁶ yr in T Tauri discs. Without fragmentation, however, the disc is depleted of small grains within 10³ yr [10]. This issue is commonly referred to as the grain retention problem.

To summarise, the formation of planetesimals requires the right amount of sticking, bouncing, and fragmentation to be consistent with observations. Therefore, collisions of pre-planetesimals have to be investigated as thoroughly as possible and their outcome has to be mapped as precisely as possible depending on all relevant parameters, such as initial porosity, collision partner size, impact velocity, etc. However, due to restrictions of the apparatus, laboratory experiments cover only small parts of the relevant parameter space. Moreover, many findings are not deduced from collisions between porous dust aggregates but from collisions of dust aggregates with solid objects [17].
Collisions between aggregates of decimetre size are
not possible at all. In addition, not all experiments can be carried out under protoplanetary disc conditions, i.e. vacuum and microgravity. To bridge this gap in the experimental data, we perform simulations of pre-planetesimal collisions with our solid body smoothed particle hydrodynamics (SPH) code parasph. This code was introduced, calibrated, and tested thoroughly for the simulation of pre-planetary dust material [13] based on benchmark experiments [18]. A short introduction to solid body SPH and the code parasph is given in Sect. 2. In Sect. 3.1 we show that the code is capable of reproducing all sticking, bouncing, and fragmentation types found in the collection of dust experiments of Ref. [17]. Mapping our simulation results to this categorisation reveals that there are transitions between the sticking, bouncing, and fragmentation categories. We propose a new model for mapping collisional data, which is oriented towards the collisional outcome only: by categorising according to fragment masses as continuous quantities, our model allows for transitions [12]. In the protoplanetary disc, pre-planetesimals grow by subsequent collisions. Therefore, they feature a collision history. To capture this, we develop a damage model which is based on the inhomogeneity of dust aggregates [11]. Results from test simulations with this model are presented in Sect. 3.2. Simulations of homogeneous dust aggregates and of aggregates with hard shells are carried out to assess the danger of the bouncing barrier for planetesimal formation [11]. In Sect. 3.3 we summarise the findings from this study: bouncing is very unlikely for homogeneous highly porous aggregates, but even thin hard shells can cause them to rebound. To investigate the influence of aggregate porosity and projectile size, we perform 160 head-on collisions between homogeneous aggregates of high, intermediate, and low porosity [11]. The results are collected in Sect. 3.4: collision velocity thresholds for transitions between sticking, bouncing, and fragmentation strongly depend on aggregate porosity and projectile size. An outlook on future work is given in Sect. 4.
2 SPH Code and Porosity Model

2.1 Solid Body SPH

The numerical method smoothed particle hydrodynamics (SPH) was developed in Refs. [22] and [14]. Originally designed for the simulation of compressible flows in an astrophysical context, it was enhanced over the past decades and is nowadays applicable to a variety of physical problems. The research contexts in which SPH is used are too numerous to be listed here. To the interested reader, we recommend the review [24], which offers a broad overview of research fields and method improvements. The application of SPH in astrophysics was discussed in Ref. [30]. SPH is mainly used because of its advantages in pure hydrodynamics. However, by including stress-strain relations the scheme can be extended to the simulation of solid bodies instead of flows and fluids. This was carried out in Ref. [21] and later pursued, e.g. in Refs. [1] and [29]. Since the simulation of porous pre-planetesimals
belongs to the realm of solid body mechanics, the numerics of our approach are based on these works. To simulate porosity we modify the approach presented in Ref. [33].

SPH is a mesh-free Lagrangian particle method. The SPH particles are the sampling points of the scheme. Particularly with fragmentation in mind, they must not be confused with real particles. Instead, the continuum of a solid body is discretised into small mass packages, which interact with each other. The contribution of each SPH particle is weighted by the smoothing kernel W^{ab} = W(x^a, x^b; h), which is usually a normalised and first-order differentiable function of the particle positions x^a and x^b and the smoothing length h. Moreover, the kernel is spherically symmetric and possesses compact support. We use the cubic B-spline kernel [25]. The basic equations of SPH are the equations of hydrodynamics, namely the continuity, momentum, and energy equation in their Lagrangian form. Since the equations of state (EOS) used for our work are independent of energy, we consider only the former two equations. We use the following SPH representations [29]

\[
\frac{d\rho^a}{dt} = -\rho^a \sum_b \frac{m^b}{\rho^b}\, (v_\alpha^b - v_\alpha^a)\, \frac{\partial W^{ab}}{\partial x_\alpha^a}, \qquad (1)
\]
\[
\frac{dv_\alpha^a}{dt} = \sum_b m^b \left( \frac{\sigma_{\alpha\beta}^a}{(\rho^a)^2} + \frac{\sigma_{\alpha\beta}^b}{(\rho^b)^2} \right) \frac{\partial W^{ab}}{\partial x_\beta^a}. \qquad (2)
\]

We follow the usual notation, where ρ, v_α, and m^b denote the density, velocity, and mass, respectively. Greek indices denote the spatial components, and the Einstein summation convention is applied throughout this work. The quantity σ_{αβ} is the stress tensor, defined as

\[
\sigma_{\alpha\beta} = -p\,\delta_{\alpha\beta} + S_{\alpha\beta}, \qquad (3)
\]

where p and S_{αβ} denote the hydrostatic pressure and the deviatoric stress tensor of the solid body, respectively. The time evolution of the deviatoric stress tensor, which accounts for pure shear, follows Hooke's law. Its frame-invariant formulation (Jaumann rate form) is given by [1, 13, 32]

\[
\frac{dS_{\alpha\beta}}{dt} = 2\mu \left( \dot{\varepsilon}_{\alpha\beta} - \frac{1}{3}\delta_{\alpha\beta}\dot{\varepsilon}_{\gamma\gamma} \right) + S_{\alpha\gamma} R_{\gamma\beta} - R_{\alpha\gamma} S_{\gamma\beta}, \qquad (4)
\]

where μ is the shear modulus. The quantity ε̇_{αβ} is the strain rate tensor and R_{αβ} is the rotation rate tensor. The SPH representations of these quantities are given by

\[
\dot{\varepsilon}^a_{\alpha\beta} = \frac{1}{2} \sum_b \frac{m^b}{\rho^b} \left[ (v_\alpha^b - v_\alpha^a) \frac{\partial W^{ab}}{\partial x_\beta^a} + (v_\beta^b - v_\beta^a) \frac{\partial W^{ab}}{\partial x_\alpha^a} \right], \qquad (5)
\]
\[
R^a_{\alpha\beta} = \frac{1}{2} \sum_b \frac{m^b}{\rho^b} \left[ (v_\alpha^b - v_\alpha^a) \frac{\partial W^{ab}}{\partial x_\beta^a} - (v_\beta^b - v_\beta^a) \frac{\partial W^{ab}}{\partial x_\alpha^a} \right]. \qquad (6)
\]
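For concreteness, the cubic B-spline kernel can be sketched as follows (this is the widely used Monaghan form with support radius 2h and 3D normalisation; the conventions shown are the common ones and are not quoted from Ref. [25]):

```python
import numpy as np

def cubic_spline_kernel(r, h):
    """Cubic B-spline kernel W(|x_a - x_b|, h) with compact support of
    radius 2h and 3D normalisation 1/(pi h^3)."""
    q = r / h
    sigma = 1.0 / (np.pi * h ** 3)
    if q < 1.0:
        return sigma * (1.0 - 1.5 * q ** 2 + 0.75 * q ** 3)
    if q < 2.0:
        return sigma * 0.25 * (2.0 - q) ** 3
    return 0.0
```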
To model porous structures below the resolution limit by means of continuous quantities we utilise the filling factor, which is defined as

\[
\phi = \frac{\rho}{\rho_{\rm s}}, \qquad (7)
\]

where ρ is the density of the porous material and ρ_s is the density of the matrix material. As an analogue for protoplanetary dust we utilise mono-disperse spherical SiO2 dust, because a broad experimental and numerical basis exists for this material. The matrix density for SiO2 is ρ_s = 2000 kg m−3 [4]. According to a pre-existing porosity model [33], porosity enters via the equation of state and filling-factor-dependent strength quantities. We modify this model (see Ref. [13] for details and Fig. 1 for an illustration) such that the hydrostatic pressure is given by

\[
p(\phi) = \begin{cases} \Sigma(\phi) & \text{for } \phi_{\rm c}^+ < \phi \\ K(\phi_0)\,(\phi/\phi_0 - 1) & \text{for } \phi_{\rm c}^- \le \phi \le \phi_{\rm c}^+ \\ T(\phi) & \text{for } \phi < \phi_{\rm c}^- \end{cases} \qquad (8)
\]

where φ_c^+ > φ_c^−, and φ_c^+ and φ_c^− are critical filling factors. The value of φ_c^+ marks the transition between elastic and plastic compression and φ_c^− defines the transition between elastic and plastic tension. The filling factors in between the critical values represent the elastic regime. The governing quantities of the elastic regime are the filling-factor-dependent shear modulus μ = μ(φ) and bulk modulus K = K(φ). Both are defined by a modification of the Murnaghan equation of state

\[
K(\phi) = 2\mu(\phi) = K_0 \left( \frac{\phi}{\phi_{\rm RBD}} \right)^{\gamma}, \qquad (9)
\]

where γ = 4 and K_0 is the bulk modulus of an uncompressed random ballistic deposition (RBD) dust sample with φ_RBD = 0.15 [4]. This value was calibrated to be K_0 = K(φ_RBD) = 4.5 kPa [13]. For the compressive, tensile, and shear strength, which represent transition thresholds from the elastic to the plastic regime, we adopt the relations found by a joint experimental and numerical approach [13, 18]. The tensile strength is given by a power law

\[
T(\phi) = -10^{\,2.8 + 1.48\phi}\ {\rm Pa}. \qquad (10)
\]

The compressive strength takes the form

\[
\Sigma(\phi) = p_{\rm m} \left( \frac{\phi_{\rm max} - \phi_{\rm min}}{\phi_{\rm max} - \phi} - 1 \right)^{\Delta \ln 10}, \qquad (11)
\]

with φ_max = 0.58 and φ_min = 0.12, which denote the maximum and minimum filling factors of the compressive strength relation. The quantity Δ ln 10 is the power of the expression, with Δ = 0.58. The constant p_m = 260 Pa is its mean pressure. Plasticity for pure hydrostatic tension or compression is modelled such that the material follows the path given by (10) or (11), respectively (see Fig. 1).
Fig. 1 Within our porosity model the regime of elastic, purely hydrostatic deformation is limited by the compressive strength Σ (φ ) and the tensile strength T (φ ) relations. Σ (φ ) represents the transition threshold to plastic compression for p > Σ (φ ), while T (φ ) marks the transition to plastic tension for p < T (φ ). In between those regimes the material moves on elastic paths. Two examples are given by E0 (φ ) and E1 (φ ), which intersect the φ -axis at the reference filling factors φ0 and φ1 , respectively. The blue arrows represent an example for the plastic compression path whereas the red arrows exemplarily show a typical tension path
The shear strength is given by the geometric mean of (10) and (11),

\[
Y(\phi) = \sqrt{\Sigma(\phi)\,|T(\phi)|}. \qquad (12)
\]

To model plasticity due to shear, we apply the von Mises plasticity criterion, which is based on the second irreducible invariant of the deviatoric stress tensor, J_2 = S_{αβ}S_{αβ}. The implementation of the deviatoric stress reduction follows [2] and [32]:

\[
S_{\alpha\beta} \rightarrow f\,S_{\alpha\beta}, \qquad (13)
\]

where f = min[Y²(φ)/3J_2, 1] and Y = Y(φ) is the shear strength.
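The material laws (8)–(13) translate almost directly into code. The following sketch (our own illustration with the calibrated constants quoted above; function names are ours, and the J_2 definition follows the text) collects the strength relations and the von Mises reduction:

```python
import numpy as np

# Constants as quoted in the text (pressures in Pa)
PHI_MAX, PHI_MIN, DELTA, P_M = 0.58, 0.12, 0.58, 260.0    # Eq. (11)
K0, PHI_RBD, GAMMA = 4.5e3, 0.15, 4.0                     # Eq. (9)

def tensile_strength(phi):
    """Eq. (10): T(phi) = -10^(2.8 + 1.48 phi) Pa (negative, i.e. tension)."""
    return -10.0 ** (2.8 + 1.48 * phi)

def compressive_strength(phi):
    """Eq. (11); equals the mean pressure P_M at phi = 0.35."""
    x = (PHI_MAX - PHI_MIN) / (PHI_MAX - phi) - 1.0
    return P_M * x ** (DELTA * np.log(10.0))

def bulk_modulus(phi):
    """Eq. (9): K(phi) = 2 mu(phi) = K0 (phi/phi_RBD)^gamma."""
    return K0 * (phi / PHI_RBD) ** GAMMA

def shear_strength(phi):
    """Eq. (12): geometric mean of compressive and tensile strength."""
    return np.sqrt(compressive_strength(phi) * abs(tensile_strength(phi)))

def limit_deviatoric_stress(S, phi):
    """Eq. (13): von Mises reduction with J2 = S_ab S_ab as in the text."""
    J2 = np.tensordot(S, S)                    # double contraction
    f = min(shear_strength(phi) ** 2 / (3.0 * J2), 1.0) if J2 > 0.0 else 1.0
    return f * S
```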
2.2 The Code parasph

The parasph code was developed by M. Hipp and has been described in Refs. [20] and [31]. It is based on the ParaSPH library [7, 8], a set of routines developed for an easier and faster handling of parallel particle codes. By means of this library the physical problem and the parallel implementation are clearly separated. ParaSPH features domain decomposition, load balancing, nearest neighbour search,
and inter-node communication. The adaptive Runge-Kutta-Cash-Karp integrator has been used for the simulations presented here. The parallel implementation utilises the Message Passing Interface (MPI) library. Test simulations yielded a speed-up of 120 on 256 single-core processors of a Cray T3E and of 60 on 128 single-core processors of a Beowulf cluster. The code by M. Hipp was extended [31, 32] for the simulation of elasticity and plasticity, including the time evolution of the deviatoric stress tensor, which consumes a large amount of computing time. The extensions also include the implementation of the first version of the porosity model presented in Ref. [33] and of the Murnaghan and Tillotson equations of state. With respect to numerics, an adaptive second-order Runge-Kutta and an Euler integrator were added. Moreover, SPH enhancements such as additional artificial stress and XSPH were implemented. In Ref. [13] the porosity model implementation was corrected and improved. In addition, it was calibrated for the simulation of protoplanetary dust with the aid of benchmark experiments [18]. A treatment of fixed boundaries was added. HDF5 was included as a compressed input and output file format with increased accuracy, which decreases the amount of required storage space considerably.
3 Results

The simulations in this section are carried out with 240,143 to 476,476 SPH particles, depending on the size of the projectile. The program parasph is run on the NEC Nehalem cluster of the HLRS. Depending on the size of the problem, 32 to 80 cores were used. The simulation time is strongly dependent on the collision velocity. In particular, for fragmenting collisions the adaptive Runge-Kutta-Cash-Karp integrator reduces the time step to ∼ 10⁻⁵ s throughout the whole simulation. In contrast, in bouncing and sticking collisions the time step is initially low and then increases for the rest of the simulation. To give a rough number, simulations take 72 to 240 h for 1 s of simulated time, depending on the size of the problem and the physical processes involved.
3.1 Introducing the Four-Population Model

Beyond the successful calibration for the simulation of porous SiO2 dust [13], we demonstrate that the presented SPH code is capable of reproducing all sticking, bouncing, and fragmentation types (see Fig. 2) that appear in the collision experiments with macroscopic porous dust aggregates collected in Ref. [17]. This consolidates the validity of the applied porosity model and shows its readiness for application to the investigation of pre-planetesimal collisions. For a successful transfer of collision data to global dust coagulation models, a suitable mapping method is desired.
Fig. 2 The outcome of the simulations of pre-planetesimal collisions encompasses all sticking (S), bouncing (B), and fragmentation (F) types proposed in Ref. [17]. The initial configuration for each simulation is a sphere with radius rt = 10 cm as resting target and a sphere with rp = 6 cm as projectile, except for S3 and F2, where rp = 2 cm. The colour code indicates the filling factor φ. Both objects are initially set up with φ = 0.35, except for the F2 case, where φ = 0.55. The simulations are carried out with different impact velocities and the snapshots are taken at different times (figure from Ref. [12])
In the attempt to map our simulation data to the categorisation of Ref. [17], which represents the most elaborate collision model available, we experience difficulties. On the one hand, the distinction between the four sticking, two bouncing, and three fragmentation types of Ref. [17] introduces unnecessary complexity, caused by distinguishing between a mixture of qualitative and quantitative attributes and by adhering to the classification into sticking, bouncing, and fragmentation events. On the other hand, some simulation outcomes cannot clearly be attributed to one of the proposed categories. Because of the difficulties of mapping the simulation data to the existing format, we propose a new model based on quantitative aspects: we divide the set of fragments of a collision into four populations. The largest and second largest fragments are described by distinct values of characteristic quantities such as mass, filling factor, and kinetic energy. The power-law population is described by distributions and the sub-resolution population by averaged values of the characteristic quantities. The largest fragment indicates growth or erosion, the second largest fragment accounts for bouncing, and the power-law population
quantitatively describes the amount of fragmentation, while the sub-resolution population gives an upper limit for smaller fragments, which are not captured due to insufficient resolution. Since the SPH code is not restricted to small aggregate sizes, the importance of the sub-resolution population becomes significant for collisions between aggregates of approximately metre size and larger. Details on this model are presented in Ref. [12].
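A minimal sketch of the proposed mapping (our own illustrative function, not taken from parasph; the resolution threshold m_res is an assumed input) could look as follows:

```python
def four_populations(masses, m_res):
    """Split a list of fragment masses into the four populations; m_res is
    the smallest reliably resolved fragment mass (an assumed input)."""
    resolved = sorted((m for m in masses if m >= m_res), reverse=True)
    largest = resolved[0] if resolved else 0.0
    second = resolved[1] if len(resolved) > 1 else 0.0
    power_law = resolved[2:]                       # described by distributions
    sub_res = sum(m for m in masses if m < m_res)  # upper limit only
    return largest, second, power_law, sub_res
```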
3.2 An Inhomogeneity Damage Model

We develop the first damage model which is based on the inhomogeneity of SiO2 dust aggregates as measured in Ref. [18]. The approach is based on the concept that, according to the porosity model (see Sect. 2.1), inhomogeneities in filling factor cause fluctuations in the compressive, shear, and tensile strength of the aggregate. These fluctuations can be regarded as flaws in the material. In contrast to previous approaches designed for brittle material [16], the propagation of these flaws is not explicitly evolved. Instead, the defects in the dust material, which behaves more like a fluid, are determined by the time evolution of the filling factor or, equivalently, the density. The inhomogeneity of an aggregate is imposed on the initial SPH particle distribution as a Gaussian distribution of the filling factor. The measure for the inhomogeneity is the standard deviation φ_σ of the Gaussian

\[
n(\phi) = \frac{1}{\sqrt{2\pi}\,\phi_\sigma} \exp\left[ -\frac{1}{2} \left( \frac{\phi - \phi_\mu}{\phi_\sigma} \right)^2 \right], \qquad (14)
\]

where n(φ) is the number density for a filling factor φ and φ_μ the median filling factor. Using this inhomogeneity damage model, we perform test simulations of collisions between spherical dust aggregates of intermediate porosity (φ_μ = 0.35). The target and projectile radii are rt = 10 cm and rp = 6 cm, respectively. Both objects feature the same value of φ_σ. The projectile hits the target with an impact velocity of v0 = 10 ms−1 or v0 = 12.5 ms−1. The former velocity is below and the latter above the fragmentation threshold for homogeneous aggregates (∼ 11 ms−1). The resulting fragment distribution is shown in Fig. 3 for the lower velocity and different values of φ_σ. For the lower collision velocity, inhomogeneity leads to fragmentation. For both velocities the number of fragments and the fraction of small fragments increase with increasing φ_σ. These findings demonstrate the qualitative and quantitative functionality of the inhomogeneity approach as a damage model. They indicate that inhomogeneous dust aggregates are weaker than their homogeneous equivalents: a slight inhomogeneity is sufficient to result in catastrophic disruption instead of growth as the result of a dust aggregate collision. Furthermore, macroscopic dust aggregates in protoplanetary discs are produced by subsequent impacts of smaller aggregates at different impact velocities, i.e. pre-planetesimals feature a collision history and thus are very likely to be inhomogeneous.
Fig. 3 Outcome of a collision between a target and projectile with the same φσ for different inhomogeneities and the collision velocity v0 = 10 ms−1. The standard deviations for the Gaussian were φσ = 0 (a), φσ = 0.01 (b), φσ = 0.02 (c), φσ = 0.03 (d), φσ = 0.04 (e), and φσ = 0.05 (f). The collision outcome is shown in the impact direction. In the homogeneous case (a) and for small inhomogeneities (b) the target stays intact and forms one massive object with the projectile. For φσ ≥ 0.02 the target fragments (c–f). The fragment sizes decrease with increasing φσ and at the same time the number of fragments increases (figure from [11])
With the inhomogeneity damage model this feature can be simulated. The filling factor distribution of dust aggregates can be determined in the laboratory by X-ray tomography measurements [18]. These empirical data can be implemented directly into the inhomogeneity damage model, whose input parameters can be obtained more easily than the values for the Weibull distribution [36], which is used for brittle material. By considering laboratory measurements of a size range of aggregates of the same filling factor, scaling laws of the inhomogeneity with size could be derived. This eventually introduces a length scale for simulations of pre-planetesimals of sizes ranging from centimetres to hundreds of metres. Details on the inhomogeneity damage model can be found in Ref. [11].
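Imposing the inhomogeneity of Eq. (14) on the initial conditions is straightforward; a sketch (the clipping bounds are our assumption, chosen as φ_min and φ_max of Eq. (11) to keep the sampled values physical) might read:

```python
import numpy as np

def init_filling_factors(n_particles, phi_mu=0.35, phi_sigma=0.02, seed=0):
    """Draw per-particle filling factors from the Gaussian of Eq. (14)."""
    rng = np.random.default_rng(seed)
    phi = rng.normal(phi_mu, phi_sigma, n_particles)
    return np.clip(phi, 0.12, 0.58)  # assumed physical bounds
```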
3.3 Hard Shells and Aggregate Bouncing

We investigate the occurrence of sticking and bouncing for macroscopic and microscopic aggregates. It was proposed that bouncing is a typical outcome for centimetre-sized porous aggregates at collision velocities v0 ≤ 1 ms−1 [17]. On this empirical basis the bouncing barrier for pre-planetesimal growth was found in dust coagulation simulations [39]. Our study is carried out to assess whether a large bouncing regime for similar-sized porous aggregates is realistic. For this purpose we carry out simulations of collisions between homogeneous aggregates at collision velocities v0 ≤ 1 ms−1. Both the target and the projectile have an initial filling factor of φ = 0.15 in the high porosity case and φ = 0.35 in the intermediate porosity case. The projectile radius is rp = 0.6 × rt, where rt is the target radius, which is roughly a decimetre. The masses are chosen such that the impact energies of the high and intermediate porosity cases are comparable. We show that bouncing is characteristic for homogeneous aggregates of intermediate porosity (φ = 0.35) and collision velocities v0 ≤ 1 ms−1. For highly porous aggregates, however, sticking is much more frequent than stated in Ref. [17]. Yet there is experimental evidence [19] which points to bouncing of highly porous aggregates as well. We note that the outer regions of these aggregates are compressed while preparing the collision experiment. To assess the effect of this compacted shell, we carry out simulations of dust aggregates with a hard shell of intermediate porosity and a highly porous core. Both the target and the projectile feature the same hard-shell thickness, which is varied from 0.1 × r to 0.4 × r, where r is the radius of the respective object. The simulation results are shown in Fig. 4 for the thickest hard shells. Summarising, we find that for the low collision velocities of Ref. [19] (v0 ∼ 0.4 ms−1) even a thin hard shell can produce bouncing of the aggregates instead of sticking. We conclude that in the collision parameter space bouncing is much less frequent than assumed in Ref. [17] and that the bouncing barrier [39] could be less of a threat to planetesimal formation than hypothesised by the authors. This is discussed in more detail in Ref. [11].
3.4 Head-on Collisions of Pre-planetesimals

For the study of head-on collisions between dust aggregates in the centimetre size regime the following setting is used: the target radius is fixed at rt = 10 cm, whereas the projectile radius is varied from rp = 2 cm to rp = 10 cm. The target is at rest and the projectile velocity ranges from v0 = 0.1 ms−1 to v0 = 27.5 ms−1. Both spherical aggregates are homogeneous and feature the same initial filling factor φ. Three values are considered: φ = 0.15 (high porosity, see Fig. 6 for collision sequences), φ = 0.35 (intermediate porosity), and φ = 0.55 (low porosity). In total, 160 simulations are carried out for this study.
Fig. 4 Cross-section through aggregates with a hard shell of 0.4 r. The hard shell has an intermediate filling factor (φ = 0.35) and the interior is highly porous (φ = 0.15). The initial setup is shown in (a). The remaining cross-sections show the situation after the impact with 0.1 (b), 0.3 (c), 0.5 (d), and 1.0 ms−1 (e) (figure from [11])
The results are shown in Fig. 5 for the largest fragment, distinguishing between high porosity (top), intermediate porosity (middle), and low porosity (bottom). With respect to pre-planetesimal growth, we distinguish between three different regimes, the gain, neutral, and loss regime, and analyse how the transitions between these regimes change with impact velocity. In Fig. 5 the gain regime can be identified by m1/mt > 1, the neutral regime by m1/mt = 1, and the loss regime by m1/mt < 1. A neutral-gain transition can be seen for aggregates of intermediate porosity at v0 ∼ 1 ms−1. It can also be regarded as a transition from bouncing to sticking collisions. Neutral-gain transitions can be expected for all initial filling factors in an intermediate porosity regime. A gain-loss transition exists only for aggregates of high and intermediate porosity. This is because only these aggregates are sufficiently compressible to allow for sticking and hence a gain regime. The threshold velocity varies with projectile size for the same filling factor, and it is important to note that it also varies with initial filling factor. Very porous aggregates can be compressed easily, but they also fragment more easily. Aggregates of intermediate porosity are more stable and feature higher gain-loss thresholds.
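The regime bookkeeping itself is simple; as a sketch (the tolerance band around unity is an assumption of this illustration):

```python
def growth_regime(m1, m_target, tol=1e-3):
    """Classify a collision by the normalised largest fragment m1/mt."""
    ratio = m1 / m_target
    if ratio > 1.0 + tol:
        return "gain"     # largest fragment heavier than the target
    if ratio < 1.0 - tol:
        return "loss"     # target eroded or fragmented
    return "neutral"      # bouncing or mass-neutral outcome
```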
Fig. 5 The mass of the largest fragment m1 was normalised by the target mass mt . The colours denote collisions with different projectile radii rp . The homogeneous aggregates have an initial filling factor φ = 0.15 (top), φ = 0.35 (middle), and φ = 0.55 (bottom) (figure from [11])
A neutral-loss transition is only visible for aggregates of low porosity, because these aggregates lack the ability to be compacted. This transition is characterised by a rebounding projectile, but the elastic waves induced in both objects cause them to shatter. It is important to note that the neutral-loss transition velocity is much lower than the gain-loss transition velocities of the high and intermediate porosity cases. Since Ref. [17] contains the most comprehensive collection of laboratory experiments with SiO2 dust, we compare the simulation results to the findings for nearly equal-sized aggregates presented in this reference.
Fig. 6 The homogeneous aggregates have an initial filling factor of φ = 0.15. The projectiles with radii rp = 4 cm (a), rp = 6 cm (b), and rp = 10 cm (c) hit a 10 cm target with a collision velocity of v0 = 7.5 ms−1. From left to right the simulation times are 0, 40, 80, and 160 ms. For the latter two panels, single SPH particles have been removed for better visibility. The filling factor is colour coded (figure from [11])
For the sake of simplicity, in Ref. [17] no distinction was made between different filling factors; instead, the aggregates were categorised as “porous” with φ ≤ 0.40 and “compact” with φ > 0.40. The simulation results of this study reveal that these assumptions of Ref. [17] are too simplistic: it is not sufficient to distinguish between porous and compact aggregates. The simulated collisions show that not only the threshold values for the transitions but also the transition types vary with the filling factor. In addition, the gain-loss transition threshold velocity, which is the most important one for pre-planetesimal growth, also varies with projectile size, which is most evident in the intermediate porosity case. The simulations also indicate that the gain-loss threshold might be much higher than estimated in Ref. [17]. This might be sufficient for pre-planetesimals to break through the fragmentation barrier. A more detailed analysis of the head-on collisions is given in Ref. [11].
4 Outlook

The results presented here encourage further investigations. Expanding the parameter space, more filling factors should be studied, and the variation of the transition thresholds with filling factor should be investigated. It is very likely that pre-planetesimal collisions occur off-centre and that the two aggregates possess different filling factors. Therefore, the effect of a non-zero impact parameter and of differing porosities can be studied, building on a reliable basis of head-on collisions. Simulations with new materials could be carried out to assess the effect of different materials on the outcome of pre-planetesimal collisions. The results could be compared to those presented here and delivered to global coagulation simulations. In turn, the latter simulations could constrain collision outcomes in critical parameter ranges which allow for planetesimal formation. Utilising the solid-body SPH code, material parameters could be varied until these desired collisional outcomes are produced. Approaching this “inverse problem”, it could be assessed whether planetesimal formation by coagulation is possible with a realistic material, and what the properties of this material must be.
References

1. Benz, W., Asphaug, E.: Impact simulations with fracture. I. Method and tests. Icarus 107(1), 98–116 (1994).
2. Benz, W., Asphaug, E.: Simulations of brittle solids using smooth particle hydrodynamics. Computer Physics Communications 87(1–2), 253–265 (1995).
3. Blum, J., Münch, M.: Experimental investigations on aggregate-aggregate collisions in the early solar nebula. Icarus 106(1), 151–167 (1993).
4. Blum, J., Schräpler, R.: Structure and mechanical properties of high-porosity macroscopic agglomerates formed by random ballistic deposition. Phys. Rev. Lett. 93(11), 115503 (2004).
5. Blum, J., Wurm, G.: The growth mechanisms of macroscopic bodies in protoplanetary disks. Annual Review of Astronomy and Astrophysics 46(1), 21–56 (2008).
6. Brauer, F., Dullemond, C.P., Henning, T.: Coagulation, fragmentation and radial motion of solid particles in protoplanetary disks. Astronomy and Astrophysics 480(3), 859–877 (2008).
7. Bubeck, T., Hipp, M., Huettemann, S., Kunze, S., Ritt, M., Rosenstiel, W., Ruder, H., Speith, R.: SPH test simulations on a portable parallel environment. In: Proceedings of the Workshop on Physics and Computer Science, pp. 139–155. Spring meeting of the DPG (1999).
8. Bubeck, T., Hipp, M., Hüttemann, S., Kunze, S., Ritt, M., Rosenstiel, W., Ruder, H., Speith, R.: Parallel SPH on Cray T3E and NEC SX-4 using DTS. In: E. Krause, W. Jäger (eds.) High Performance Computing in Science and Engineering ’98, pp. 396–410. Springer-Verlag (1998).
9. Dominik, C., Blum, J., Cuzzi, J.N., Wurm, G.: Growth of dust as the initial step toward planet formation. In: B. Reipurth, D. Jewitt, K. Keil (eds.) Protostars and Planets V, pp. 783–800. University of Arizona Press, Tucson (2007).
10. Dullemond, C.P., Dominik, C.: Dust coagulation in protoplanetary disks: A rapid depletion of small grains. Astronomy and Astrophysics 434(3), 971–986 (2005).
11. Geretshauser, R.J.: Simulation of Pre-planetesimal Collisions with Smoothed Particle Hydrodynamics. Ph.D. thesis, Universität Tübingen, Tübingen (2011).
12. Geretshauser, R.J., Meru, F., Speith, R., Kley, W.: The four-populations model: A new classification scheme for pre-planetesimal collisions. Astronomy and Astrophysics 531, A166 (2011).
13. Geretshauser, R.J., Speith, R., Güttler, C., Krause, M., Blum, J.: Numerical simulations of highly porous dust aggregates in the low-velocity collision regime: Implementation and calibration of a smooth particle hydrodynamics code. Astronomy and Astrophysics 513, A58 (2010).
14. Gingold, R.A., Monaghan, J.J.: Smoothed particle hydrodynamics – Theory and application to non-spherical stars. Monthly Notices of the Royal Astronomical Society 181, 375–389 (1977).
15. Goldreich, P., Lithwick, Y., Sari, R.: Final stages of planet formation. The Astrophysical Journal 614(1), 497 (2004).
16. Grady, D.E., Kipp, M.E.: Continuum modelling of explosive fracture in oil shale. International Journal of Rock Mechanics and Mining Sciences & Geomechanics Abstracts 17(3), 147–157 (1980).
17. Güttler, C., Blum, J., Zsom, A., Ormel, C.W., Dullemond, C.P.: The outcome of protoplanetary dust growth: Pebbles, boulders, or planetesimals? I. Mapping the zoo of laboratory collision experiments. Astronomy and Astrophysics 513, A56 (2010).
18. Güttler, C., Krause, M., Geretshauser, R.J., Speith, R., Blum, J.: The physics of protoplanetesimal dust agglomerates. IV. Toward a dynamical collision model. The Astrophysical Journal 701, 130–141 (2009).
19. Heißelmann, D., Fraser, H.J., Blum, J.: Experimental studies on the aggregation properties of ice and dust in planet-forming regions. International Astronautical Congress Abstracts 58, 1–6 (2007).
20. Hipp, M., Rosenstiel, W.: Parallel hybrid particle simulations using MPI and OpenMP. In: M. Danelutto, M. Vanneschi, D. Laforenza (eds.) Euro-Par, Lecture Notes in Computer Science, vol. 3149, pp. 189–197. Springer (2004).
21. Libersky, L.D., Petschek, A.G.: Smooth particle hydrodynamics with strength of materials. In: H. Trease, M.J. Fritts, W.P. Crowley (eds.) Advances in the Free-Lagrange Method: Including Contributions on Adaptive Gridding and the Smooth Particle Hydrodynamics Method, Lecture Notes in Physics, vol. 395. Springer (1991).
22. Lucy, L.B.: A numerical approach to the testing of the fission hypothesis. The Astronomical Journal 82, 1013–1024 (1977).
23. McKee, C.F., Ostriker, E.C.: Theory of star formation. Annual Review of Astronomy and Astrophysics 45(1), 565–687 (2007).
24. Monaghan, J.J.: Smoothed particle hydrodynamics. Reports on Progress in Physics 68(8), 1703 (2005).
25. Monaghan, J.J., Lattanzio, J.C.: A refined particle method for astrophysical problems. Astronomy and Astrophysics 149(1), 135–143 (1985).
26. Natta, A., Testi, L., Calvet, N., Henning, T., Waters, R., Wilner, D.J.: Dust in proto-planetary disks: Properties and evolution. In: B. Reipurth, D. Jewitt, K. Keil (eds.) Protostars and Planets V, pp. 767–781. University of Arizona Press, Tucson (2007).
27. Ormel, C.W., Spaans, M., Tielens, A.G.G.M.: Dust coagulation in protoplanetary disks: Porosity matters. Astronomy and Astrophysics 461(1), 215–232 (2007).
28. Paszun, D., Dominik, C.: Collisional evolution of dust aggregates. From compaction to catastrophic destruction. Astronomy and Astrophysics 507(2), 1023–1040 (2009).
29. Randles, P.W., Libersky, L.D.: Smoothed particle hydrodynamics: Some recent improvements and applications. Computer Methods in Applied Mechanics and Engineering 139(1–4), 375–408 (1996).
30. Rosswog, S.: Astrophysical smooth particle hydrodynamics. New Astronomy Reviews 53(4–6), 78–104 (2009).
31. Schäfer, C.: Application of Smooth Particle Hydrodynamics to Selected Aspects of Planet Formation. Ph.D. thesis, Universität Tübingen, Tübingen (2005).
32. Schäfer, C., Speith, R., Kley, W.: Collisions between equal-sized ice grain agglomerates. Astronomy and Astrophysics 470(2), 733–739 (2007).
33. Sirono, S.: Conditions for collisional growth of a grain aggregate. Icarus 167(2), 431–452 (2004).
34. Teiser, J., Wurm, G.: Decimetre dust aggregates in protoplanetary discs. Astronomy and Astrophysics 505(1), 351–359 (2009).
35. Wada, K., Tanaka, H., Suyama, T., Kimura, H., Yamamoto, T.: Collisional growth conditions for dust aggregates. The Astrophysical Journal 702(2), 1490 (2009).
36. Weibull, W.: A Statistical Theory of the Strength of Materials, Ingeniörsvetenskapsakademiens handlingar, vol. 151. Generalstabens Litografiska Anstalts Förlag, Stockholm (1939).
37. Weidenschilling, S.J.: The distribution of mass in the planetary system and solar nebula. Astrophysics and Space Science 51(1), 153–158 (1977).
38. Weidenschilling, S.J.: Aerodynamics of solid bodies in the solar nebula. Monthly Notices of the Royal Astronomical Society 180(1), 57–70 (1977).
39. Zsom, A., Ormel, C.W., Güttler, C., Blum, J., Dullemond, C.P.: The outcome of protoplanetary dust growth: Pebbles, boulders, or planetesimals? II. Introducing the bouncing barrier. Astronomy and Astrophysics 513, A57 (2010).
Copper Substrate Catalyzes Tetraazaperopyrene Polymerization

W.G. Schmidt, E. Rauls, U. Gerstmann, S. Sanna, M. Landmann, M. Rohrmüller, A. Riefer, S. Wippermann, and S. Blankenburg
Abstract The polymerization of tetraazaperopyrene (TAPP) molecules on a Cu(111) substrate, as observed in recent STM experiments, has been investigated in detail by first principles calculations. Tautomerization is the first step required for the formation of molecular dimers and polymers. The substrate is found to catalyze this tautomerization.
W.G. Schmidt · E. Rauls · U. Gerstmann · S. Sanna · M. Landmann · M. Rohrmüller · A. Riefer
Lehrstuhl für Theoretische Physik, Universität Paderborn, 33095 Paderborn, Germany

S. Wippermann
University of California, Dept. of Chemistry, One Shields Avenue, Davis, CA 95616, USA

S. Blankenburg
Eidgenössische Materialprüfungs- und Forschungsanstalt (EMPA), 8600 Dübendorf, Switzerland

1 Introduction

The ongoing miniaturization of electronic devices drives the search for alternatives to the present approaches of lithographic manufacturing. In this context, the spontaneous ordering and assembly of atoms and molecules on atomically well-defined surfaces, the so-called bottom-up approach, appears to be a very promising way to fabricate functional systems with nanometer dimensions [1–3]. The intermolecular interactions, ranging from indirect, substrate-mediated interplay [4] and direct Coulomb forces [5] to weak dispersion interactions, metal complexation [6] and hydrogen [7, 8] or covalent bonds [9], set the stage for a large number of possibilities to form one- and two-dimensional molecular networks of varying robustness. Many potential applications of supramolecular structures, e.g., as templates in bottom-up device technology, will require a thermal and chemical stability that can only be achieved by covalent bonding. Layers in which the adsorbates are interlinked by strong covalent bonds are advantageous also in the field of organic
electronics, e.g., for organic field effect transistors or organic solar cells. Provided nanoporous molecular networks are sufficiently robust, they may also offer well-defined surroundings to control surface chemical reactions in confined spaces. For these reasons, a number of experimental investigations into covalently interlinked molecular structures on surfaces under ultrahigh vacuum (UHV) conditions have been performed. For instance, the vapor-deposition polymerization of ultrathin films was studied by scanning tunneling microscopy (STM) [10, 11]. The STM was not only used to study, but also to initiate and control the polymerization of adsorbates [12]. Photoinduced polymerization of surface-adsorbed diacetylene was demonstrated to allow for the switching between different adsorbate phases [13]. Recent years have seen a surge of interest in the formation and characterization of two-dimensional, covalent organic networks based on chemical reactions at surfaces, see, e.g., Refs. [8, 14–23]. Using computer grants of the HLRS, we studied the tautomerization reaction of 1,3,8,10-tetraazaperopyrene (TAPP) on a Cu(111) surface on the basis of DFT calculations [24]. This system has previously been investigated experimentally by Matena et al. [16, 23]. The authors found that the copper substrate is crucial for the formation of the various aggregates they observed, i.e. differently ordered close-packed as well as porous networks on the one hand, and covalently bonded linear chains on the other hand. From a surface science point of view, especially the latter are of great interest, since their formation requires a multi-step chemical reaction which does not occur in the gas phase. The Cu(111) surface interacts only weakly with the adsorbed molecules, but this interaction is obviously sufficient to change the reaction kinetics. Recent DFT calculations focused on the energetic stability of the molecular network or the chain, respectively [23, 25]. However, the detailed mechanism of the tautomerization reaction has not been investigated yet. With the work presented here, we want to close this gap.
2 Computational Method

The DFT calculations are performed within the local density approximation (LDA) for exchange and correlation as implemented in VASP [26]. Thereby the system of Kohn-Sham equations

\left[ -\frac{\hbar^2}{2m}\nabla^2 + V_{\mathrm{ext}}(\mathbf{r}) + \int \frac{n(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\, d\mathbf{r}' + V_{xc}(\mathbf{r}) \right] \psi_{n\mathbf{k}}(\mathbf{r}) = \varepsilon_{n\mathbf{k}}\, \psi_{n\mathbf{k}}(\mathbf{r}) ,    (1)

n(\mathbf{r}) = \sum_{n,\mathbf{k}} f_{n\mathbf{k}}\, |\psi_{n\mathbf{k}}(\mathbf{r})|^2 ,    (2)
is solved iteratively for the external potential V_ext(r) until self-consistency in the total electron density n(r) is reached. Plane waves serve as the basis set for the Kohn-Sham orbitals ψ_nk(r). The ground-state DFT calculations were parallelized over different bands and sampling points in the Brillouin zone using the message passing interface (MPI).
Fig. 1 Molecular model for the single TAPP molecule (left), the first tautomer TA1 (right) and the dimer adsorbed on the Cu(111) surface (bottom)
Parallelization over bands and plane-wave coefficients at the same time reduces the communication overhead significantly. In order to account approximately for the influence of the dispersion interaction, a semiempirical scheme based on the London dispersion formula was used [27, 28]. The electron-ion interaction was described by the projector-augmented wave (PAW) method [29, 30], which allows for an accurate treatment of the first-row elements as well as the Cu 3d electrons with a relatively moderate energy cutoff of 340 eV. The adstructures were modeled in periodically repeated supercells containing two atomic Cu layers arranged in a \begin{pmatrix} 14 & 0 \\ 3 & 6 \end{pmatrix} translational symmetry, the adsorbed molecules, and a vacuum region of 15 Å. Adsorption energies were obtained as

E_{\mathrm{ads}} = E_{\mathrm{tot}} - E_{\mathrm{sub}} - E_{\mathrm{mol}} - N_{\mathrm{Cu}}\,\mu_{\mathrm{Cu}} + \tfrac{1}{2} N_{\mathrm{H}}\, E_{\mathrm{H}_2}

from the total energy of the adsystem E_tot, the substrate E_sub and the molecule E_mol. Here the number of additional metallic adatoms is given by N_Cu, N_H denotes the number of dissociated hydrogens, μ_Cu is the copper adatom chemical potential, and E_H2 the energy of molecular hydrogen. Potential energy surfaces (PES) are calculated by lateral displacement of the molecule followed by a relaxation with a laterally constrained carbon atom (Cc in Fig. 1). Figure 2 shows benchmark calculations to determine the electronic ground state of a typical 150-atom cell used for surface modeling in our project. The calculations within this project were performed on the NEC SX-8 and SX-9 of the Höchstleistungsrechenzentrum Stuttgart.
Fig. 2 CPU time and speedup for DFT calculations for Perfluoro-anthracene (PFA) on the Au(111) surface containing around 150 atoms. The calculations were performed with the Stuttgart optimized VASP version on the HLRS NEC SX-8 and SX-9 machines. In comparison, we show data for the HLRS Cray XE6, a local Linux cluster (Intel Core i7, 24 Twin-nodes with 4 CPUs 2.5 GHz Quad Core Xeon each) and Mac Pro workstations (Intel Core i7)
As can be seen in Fig. 2, reasonable scaling is achieved for up to 32 CPUs.
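For reference, the adsorption-energy definition given above translates directly into a small helper function (a sketch under our own naming conventions; all energies are in eV and would be supplied by the DFT runs):

```python
def adsorption_energy(e_tot, e_sub, e_mol, n_cu=0, mu_cu=0.0, n_h=0, e_h2=0.0):
    """E_ads = E_tot - E_sub - E_mol - N_Cu*mu_Cu + (1/2)*N_H*E_H2,
    with the total energy of the adsystem, the clean substrate slab and
    the isolated molecule; n_cu counts additional Cu adatoms at chemical
    potential mu_cu, n_h the dissociated hydrogens, e_h2 the H2 energy."""
    return e_tot - e_sub - e_mol - n_cu * mu_cu + 0.5 * n_h * e_h2
```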
3 Results

A single TAPP molecule adsorbs completely flat on the Cu(111) surface. The adsorption geometry can be seen in the top left part of Fig. 1. The molecule–substrate interaction is weak and dominated by dispersion forces. The center ring of the molecule is located on top of a copper atom, which is the energetically most favorable adsorption position. In this minimum-energy position, the adsorption energy amounts to −2.55 eV. However, due to the weak interaction with the substrate and its mainly dispersive character, the molecule can be expected to be rather mobile, which is crucial for the polymerization reactions discussed in the following. The calculated PES (Fig. 2 in Ref. [24]) indicates only small energy barriers hindering lateral movements of the molecule on the surface. Moving the center ring of the molecule to a bridge site increases the energy by less than 0.01 eV, and even the most unfavorable hollow site is only 0.02 eV less favorable in energy than the top site. Tautomerization is an essential prerequisite for chain and network formation of the TAPP molecules. The hydrogen atom at the central edge carbon has to move to one of the adjacent nitrogen atoms in order to form the first tautomer (TA1, cf. Fig. 1).
Fig. 3 Energetics of the reaction. Towards the left, results for the gas phase are shown. Towards the right, the reaction on the substrate is shown. All values in eV/molecule. Energies including dispersion corrections are marked with an asterisk. See text for details
In this configuration, the central carbon is left with only two bonds to the neighboring nitrogen atoms and a free electron pair. The same reaction happens for the second tautomer (TA2) at the central edge carbon on the opposite side of the molecule. With an energy difference of 2.09 eV between TAPP and TA1, the described tautomerization process is unfavorable in the gas phase (see Fig. 3 for an overview of the energetics). Analogously, the same holds for the formation of TA2. In the gas phase, the polymerization is hindered by this costly removal of the hydrogen atom. The energetics change significantly for molecules adsorbed on Cu(111). However, the polymerization process is preceded by a copper coordination network, and additional surface adatoms may also modify the reaction kinetics. To get an idea of the impact of the various processes, we start with the discussion of the clean unperturbed surface, followed by a discussion of the copper adatom influence. Since the TA1 tautomer can deform upon adsorption and bind covalently to a Cu atom, the adsorbate system can remain in an energetically more favorable geometry than in the gas phase. Thus, the energetic difference between TAPP and TA1 (in the adsorbed geometry) decreases by a factor of five, to 0.42 eV. Figure 1 shows the adsorption geometries for TAPP (top left part of the figure) and TA1 (top right part). The diffusion barrier of the tautomer TA1 (TA2) is higher than that of TAPP and amounts to 0.5 eV (0.7 eV). Also, the energetically favored bonding position of both tautomers is shifted towards the bridge site with respect to the central ring of the molecule. Therefore, the mobility of the tautomers is reduced compared to the TAPP molecule, suggesting covalent bonding to the substrate. This is supported by the calculated charge-density differences, which reveal that the free electron pair of the central carbon atom of the TA1 tautomer binds to the substrate.
Fig. 4 PES (including dispersion corrections) of the tautomerization process from TAPP towards TA1 (one hydrogen moves from the carbon to the nitrogen). All values in eV
Additionally, the nitrogen lone pair interacts weakly with the surface. At this point, the catalyzing effect of the surface can already be understood to some extent. While the total energy differences between the tautomers explain the relative stabilities, the activation energies for the tautomerization reactions have to be calculated in order to understand the reaction kinetics. We have calculated the PES for the moving hydrogen both for the molecule in the gas phase and in the adsorbed state (Fig. 4). The barrier for the first tautomerization (TAPP → TA1) is found to be significantly smaller in the adsorbed state (0.56 eV) than in the gas phase (2.10 eV). Figure 4 shows the path taken by the H atom during the tautomerization process in the adsorbed state and reveals that the final state is a local energy minimum. An energy barrier of 0.14 eV separates this state from the initial configuration (TAPP). In the gas phase, in contrast, TA1 is not stable, since the energy barrier for the reverse reaction (TA1 → TAPP) is, at 0.01 eV, negligible and even within the error bar of our calculations. Thus, on the substrate, TA1 can be viewed as a metastable state. This fact, together with the total energy differences discussed above, enhances the tautomerization and makes the reaction, contrary to the gas phase, likely to happen on the substrate. The tautomerization is the first step towards chain or network formation. The second step towards the formation of longer polymers is the dimerization of two tautomers. The formation of a dimer (DI) consisting of two TA1 tautomers is highly favored on the substrate as well as in the gas phase. The energy gain has been
calculated to be a factor of two higher in the gas phase (−2.33 eV) than on the substrate (−1.09 eV). One reason for this is that the isolated TA1 tautomer is unstable in the gas phase; another reason is the covalent bonding of TA1 to the substrate, which has to be lifted for the dimer bond formation. Since the dimer has a flat adsorption geometry (see bottom part of Fig. 1), the molecule–substrate bonds of TA1 must be broken. Nevertheless, there is an overall energy gain due to the formation of a C=C double bond between the two TA1. Comparing DI with the original TAPP, this energy gain is a factor of two higher for the molecules adsorbed on the Cu surface than in the gas phase. Figure 3 summarizes the results of our calculations in a reaction diagram. Starting from the TAPP molecule in the center, the two-step process via TA1 to DI is shown for the gas phase (towards the left) and for the adsorbate (towards the right), respectively. Values marked with an asterisk include dispersion interactions, while those without are DFT-GGA values. Although the dispersive interaction is the main contribution to the binding energy of TAPP to the substrate, it does not change the reactions qualitatively, as could have been expected based on our previous experience with comparable systems [31]. An explanation for this finding might be the crucial role of the formation of covalent bonds for TA1. From TA1, the second tautomerization step to TA2 is unlikely in both cases. For chain formation, however, it is not needed. The deformation of the adsorbed TA1 due to the bond to the substrate does not affect more than half of the molecule; the other end of the molecule remains approximately flat. In DI the situation is similar, thus the deformation is not hindered, and the repositioning of the hydrogen atom at the central carbon of DI can happen completely analogously to the reaction from TAPP to TA1. In this way, a third and any further TA1 can be added, resulting in linear molecular chains. However, additional adatoms can also play a crucial role during the covalent synthesis. Figure 5 summarizes the potential influence of Cu adatoms on the reaction process. Here, the energy levels depend on the chemical potential of the adatoms μ_Cu, which ranges from the energy of copper bulk, μ_Cu^bulk, up to the adsorption energy of a single isolated adatom, μ_Cu^adatom. This range is shown in Fig. 5 with shaded areas. We first discuss the case that the system is in equilibrium with a copper bulk reservoir, i.e., adatoms are not freely available but have to be generated from the substrate bulk. For the single TAPP, one molecular nitrogen is coordinated to the metal, and the energy difference of 0.09 eV compared to the clean surface shows a slight preference for the non-coordinated geometry at lower coverages. After the first tautomerization, the total-energy difference between coordinated and non-coordinated TA1 increases to 0.38 eV. Now the central carbon is bonded to the additional metal atom, resulting in just a small deviation from the planar structure, which reduces the strain of the molecule compared to the uncoordinated case. The corrugation of the PES decreases for the TA1-M tautomer compared to the uncoordinated case from 0.6 eV to 0.15 eV, as seen in Fig. 2 in Ref. [24]. Thus, the metal coordination reduces the diffusion hindrance of the tautomer. Hydrogen diffusion from the molecule to the substrate, possibly assisted by the metal adatom, is another possibility to form the TA1 state.
The resulting molecule is now partially dehydrogenated and bonds to the coordination atom.
Fig. 5 Energetics of the reaction with coordination adatoms (right) and an additional dehydrogenation (left). The corresponding molecular structures are visualized as well. The ranges of the chemical potential of copper are marked with shaded areas for metal-coordinated structures (ranging from μ_Cu^bulk to μ_Cu^adatom as indicated). All values in eV/molecule. Energies including dispersion corrections are marked with an asterisk. See text for details
However, this increases the energy by 1.81 eV and is thus not likely. The dimerization of two TA1 tautomers results in an energy gain for all three possibilities: DI with metal coordination (−0.25 eV), DI with metal coordination and hydrogen loss (−0.48 eV), as well as plain DI (−1.09 eV). The dehydrogenated state DI-MH is now more stable than the solely coordinated dimer (DI-M). Nevertheless, for the upper limit of the chemical potential, the uncoordinated reaction pathway described above is preferred. Indeed, at high temperatures additional adatoms become available through thermal degradation of step edges, shifting the chemical potential to lower values and increasing the probability of metal-coordinated molecules at the surface. This situation is reflected by the lower limit of the chemical potential. For constant temperature, the shaded area in Fig. 5 can be interpreted in terms of the partial pressure of available adatoms. The copper-coordinated molecular dimerization may thus be favorable depending on the availability of adatoms. The dehydrogenated, metal-coordinated dimer is more stable than the hydrogenated one, in contrast to the tautomer case, where the hydrogenated TA1-M is more stable. Thus, the preferred reaction pathway starts with a metal-coordinated TAPP molecule, followed by a tautomerization step without losing the coordination to the adatom. Then tautomers can dimerize, while an additional dehydrogenation is preferred (energy gain compared to TAPP with dehydrogenation alone). In this way, more tautomers can be added, resulting in metal-coordinated, linear molecular chains. In the end, the preferred reaction
pathway depends on the number of adatoms available, ranging from an uncoordinated to a metal-coordinated polymerization.
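To put the computed barriers into perspective, a rough Arrhenius estimate compares tautomerization rates for the adsorbed (0.56 eV) and gas-phase (2.10 eV) barriers; the attempt frequency of 10^13 s−1 used below is a typical order-of-magnitude assumption of ours, not a computed quantity:

```python
import math

K_B = 8.617e-5   # Boltzmann constant in eV/K
NU = 1.0e13      # assumed attempt frequency in 1/s

def arrhenius_rate(barrier_ev, temperature_k):
    """Transition-state estimate k = nu * exp(-E_a / (k_B * T))."""
    return NU * math.exp(-barrier_ev / (K_B * temperature_k))

T = 300.0
ratio = arrhenius_rate(0.56, T) / arrhenius_rate(2.10, T)
print(f"surface/gas rate ratio at 300 K: {ratio:.1e}")  # ~ 1e26
```

At room temperature the surface reaction would thus be faster by roughly 26 orders of magnitude, which makes the catalytic role of the substrate plain.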
4 Summary

Summarizing, we have presented detailed first-principles investigations of the polymerization process of TAPP molecules on a Cu(111) surface. We have identified the mechanisms that lead to the tautomerization of the molecules as a necessary first step (formation of TA1) and to the dimer (DI) formation as the second step in the reaction. A catalyzing effect could be attributed to the Cu(111) substrate. It is crucial for the stability of the tautomer TA1, the formation of which is required for the polymerization. We find that metal-coordinated (provided the process is accompanied by a dehydrogenation) as well as uncoordinated polymerization processes are possible. The catalytic effect of metallic substrates may thus assist in the formation of covalently bonded molecular networks which are not formed in the gas phase or in solution.

Acknowledgments. Generous grants of computer time from the Höchstleistungsrechenzentrum Stuttgart (HLRS) and the Paderborn Center for Parallel Computing (PC²) are gratefully acknowledged. We thank the Deutsche Forschungsgemeinschaft for financial support.
References

1. J V Barth, Annu. Rev. Phys. Chem. 58, 375 (2007).
2. A Nilsson and L G M Pettersson, Surf. Sci. Rep. 55, 49 (2004).
3. J A A W Elemans, S Lei, and S De Feyter, Angew. Chem. Int. Ed. 48, 7298 (2009).
4. S Lukas, G Witte, and C Wöll, Phys. Rev. Lett. 88, 028301 (2001).
5. Q Chen and N V Richardson, Nature Materials 2, 324 (2003).
6. S L Tait, A Langner, N Lin, R Chandrasekar, O Fuhr, M Ruben, and K Kern, ChemPhysChem 9, 2495 (2008).
7. N Nyberg, M Odelius, A Nilsson, and L G M Pettersson, J. Chem. Phys. 119, 12577 (2003).
8. R Pawlak, S Clair, V Oison, M Abel, O Ourdjini, N A A Zwaneveld, D Gigmes, D Bertin, L Nony, and L Porte, ChemPhysChem 10, 1032 (2009).
9. S Weigelt, C Busse, L Petersen, E Rauls, B Hammer, K V Gothelf, F Besenbacher, and T R Linderoth, Nature Materials 5, 112 (2006).
10. S F Alvarado, W Rieß, M Jandke, and P Strohriegel, Org. Electronics 2, 75 (2001).
11. C H Schmitz, J Ikonomov, and M Sokolowski, J. Phys. Chem. C 113, 11984 (2009).
12. Y Okawa and M Aono, Nature 409, 683 (2001).
13. O Endo, H Ootsubo, N Toda, M Suhara, H Ozaki, and Y Mazaki, J. Am. Chem. Soc. 126, 9894 (2004).
14. L Grill, M Dyer, L Lafferentz, M Persson, M V Peters, and S Hecht, Nature Nanotech. 2, 687 (2007).
15. S Weigelt, C Busse, C Bombis, M M Knudsen, K V Gothelf, T Strunskus, C Wöll, M Dahlbom, B Hammer, E Laegsgaard, F Besenbacher, and T R Linderoth, Angew. Chem. Int. Ed. 46, 9227 (2007).
16. M Matena, T Riehm, M Stöhr, T A Jung, and L H Gade, Angew. Chem. Int. Ed. 47, 2414 (2008).
17. M In't Veld, P Iavicoli, S Haq, D B Amabilino, and R Raval, Chem. Commun., 1536 (2008).
18. S Weigelt, C Busse, C Bombis, M M Knudsen, K V Gothelf, E Lægsgaard, F Besenbacher, and T R Linderoth, Angew. Chem. Int. Ed. 47, 4406 (2008).
19. N A A Zwaneveld, R Pawlak, M Abel, D Catalin, D Gigmes, D Bertin, and L Porte, J. Am. Chem. Soc. 130, 6678 (2008).
20. M Treier, N V Richardson, and R Fasel, J. Am. Chem. Soc. 130, 14054 (2008).
21. M Treier, R Fasel, N R Champness, S Argent, and N V Richardson, Phys. Chem. Chem. Phys. 11, 1209 (2009).
22. J A Lipton-Duffin, O Ivasenko, D F Perepichka, and F Rosei, Small 5, 592 (2009).
23. M Matena, M Stöhr, T Riehm, J Björk, S Martens, M S Dyer, M Persson, J Lobo-Checa, K Müller, M Enache, H Wadepohl, J Zegenhagen, T A Jung, and L H Gade, Chem. Eur. J. 16, 2079 (2010).
24. S Blankenburg, E Rauls, and W G Schmidt, J. Phys. Chem. Lett. 1, 3266 (2010).
25. J Björk, M Matena, M S Dyer, M Enache, J Lobo-Checa, L H Gade, T A Jung, M Stöhr, and M Persson, Phys. Chem. Chem. Phys. 12, 8815 (2010).
26. G Kresse and J Furthmüller, Comp. Mat. Sci. 6, 15 (1996).
27. F London, Z. Phys. Chem. Abt. B 11, 222 (1930).
28. F Ortmann, F Bechstedt, and W G Schmidt, Phys. Rev. B 73, 205101 (2006).
29. P E Blöchl, Phys. Rev. B 50, 17953 (1994).
30. G Kresse and D Joubert, Phys. Rev. B 59, 1758 (1999).
31. E Rauls, S Blankenburg, and W G Schmidt, Phys. Rev. B 81, 125401 (2010).
QCD Critical Surfaces at Real and Imaginary μ

O. Philipsen and Ph. de Forcrand
In a long-term project, we calculate the critical surface bounding the region featuring chiral phase transitions in the quark mass and chemical potential parameter space of QCD with three flavours of quarks. Our calculations are valid for small to moderate quark chemical potentials, μ ≲ T. Previous calculations were done on coarse Nt = 4 lattices, corresponding to a lattice spacing a ∼ 0.3 fm. Here we present results for three degenerate flavours at zero and finite density on Nt = 6 lattices, corresponding to a lattice spacing of a ∼ 0.2 fm. This part of the report is very similar to last year's, with updated numbers. Furthermore, we compute the phase structure at imaginary chemical potential μ/T = iπ/3, finding tricritical lines which bound the continuation of the chiral as well as the deconfinement transition surfaces to imaginary chemical potentials, and we explain their curvature.
O. Philipsen
Institut für Theoretische Physik, Goethe-Universität Frankfurt, 60438 Frankfurt am Main, Germany

Ph. de Forcrand
Institut für Theoretische Physik, ETH Zürich, CH-8093 Zürich, Switzerland
Physics Department, TH-Unit, CERN, CH-1211 Geneva, Switzerland

1 Introduction

The fundamental theory describing the strong interactions is Quantum Chromodynamics (QCD) with two light quark flavours, the u- and d-quarks, and a heavier s-quark. Since the interaction weakens at asymptotically large energy scales, QCD predicts at least three different forms of nuclear matter: the usual hadronic matter at low temperature and density, a quark gluon plasma at high temperature and low density, and colour superconducting nuclear matter at low temperatures and high density. Direct Monte Carlo simulations of the finite-density QCD phase diagram are impossible because of the so-called sign problem.
Fig. 1 Top: Schematic phase transition behaviour of Nf = 2 + 1 QCD for different choices of quark masses (mu,d, ms) at μ = 0. Left, Right: The same with the chemical potential for quark number as an additional parameter. The chiral critical line sweeps out a critical surface as μ is turned on. Depending on the curvature, a QCD chiral critical point is present or absent. Also shown is the deconfinement critical surface for heavy quark masses
Indirect methods therefore need to be employed, which work for small enough μ/T only (for an overview and references, see [1]). Moreover, because simulations with dynamical fermions cannot yet be performed for physically light quark masses and on sufficiently fine lattices, Monte Carlo investigations have to proceed step by step in quark masses and lattice spacings in order to achieve an eventual extrapolation to the physically interesting case. At zero chemical potential, the nature of the quark-hadron phase transition depends on the quark masses, as summarised in Fig. 1. In the limits of zero and infinite quark masses, order parameters for the breaking of the global chiral and centre symmetry, respectively, can be defined, and one finds numerically that first-order phase transitions take place at some finite temperature Tc. On the other hand, for intermediate quark masses the transition is an analytic crossover. Hence, each corner of first-order phase transitions is bounded by a second-order critical line, as in Fig. 1. The physical quark masses are light, so our interest is in the lower left boundary, which
is called the chiral critical line, as opposed to the deconfinement critical line in the heavy mass region. When a chemical potential is switched on, the chiral critical line will sweep out a surface, as shown in Fig. 1. According to standard expectations, for small but non-zero mu,d the chiral critical line should continuously shift with μ to larger quark masses until it passes through the physical point at μE, corresponding to the endpoint of the QCD phase diagram. This is depicted in Fig. 1 (middle). However, it is also possible for the chiral critical surface to bend towards smaller quark masses, cf. Fig. 1 (right), in which case there would be no chiral critical point or phase transition at moderate densities. Here we specialise to the theory with three degenerate quarks, which lives on the diagonal in the quark mass plane. The critical quark mass corresponding to the point on the chiral critical line can be expanded as a function of the chemical potential,

\frac{m_c(\mu)}{m_c(0)} = 1 + \sum_{k=1} c_k \left( \frac{\mu}{\pi T} \right)^{2k} .    (1)

A strategy to learn about the chiral critical surface is to tune the quark mass to mc(0) and evaluate the leading coefficients of this expansion. In particular, the sign of the curvature c1 will tell us which of the scenarios in Fig. 1 is realised. In previous work the location of the boundary line has been determined for the case of degenerate quark masses, Nf = 3 [2, 3], where it was also shown that it belongs to the 3d Ising, or 3d Z(2), universality class. On the lattice, temperature and lattice spacing are related by T = 1/(a Nt), i.e. larger Nt corresponds to finer lattices at a fixed physical temperature. We have used Nt = 4 lattices, corresponding to a lattice spacing a ∼ 0.3 fm, to map out how this line changes (i) for Nf = 3 as a function of the chemical potential μ [3, 4] and (ii) for μ = 0 in the case of non-degenerate quark masses mu,d ≠ ms [4]. It was found that the physical point is located close to the boundary line, on the crossover side. When a chemical potential for the baryon density is switched on, the Nf = 3 theory has leading coefficients c1 = −3.3(3), c2 = −47(20) in (1) [5, 6], i.e. the chiral critical point recedes towards smaller masses, as in Fig. 1 (right). The same behaviour is found for non-degenerate quark masses. Tuning the s-quark mass to its physical value, we calculated m^c_{u,d}(μ) with c1 = −39(8) and c2 < 0 [7]. As a consequence, the physical point remains in the crossover region also for moderate baryon densities. This is in contrast to the expectations sketched above, and therefore a careful examination of the systematic errors in such simulations needs to be performed. Here, we check on cut-off effects and present our current results for mc(0) as well as the curvature of the Nf = 3 theory on a finer Nt = 6 lattice, corresponding to a ∼ 0.2 fm. Part of these results have been shown before. Due to the very low signal-to-noise ratio, they require large statistics in order to be reliable, and hence simulation times on the order of years.
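For illustration, the truncated expansion (1) with the Nt = 4 coefficients quoted above can be evaluated directly (a minimal sketch; the evaluation point μ/(πT) = 0.1 is arbitrary):

```python
def critical_mass_ratio(x, c=(-3.3, -47.0)):
    """m_c(mu)/m_c(0) = 1 + sum_k c_k * x**(2k) with x = mu/(pi*T), Eq. (1),
    truncated at the order of the supplied coefficients; the defaults are
    the Nt = 4, Nf = 3 values c1 = -3.3, c2 = -47 quoted in the text."""
    return 1.0 + sum(ck * x ** (2 * (k + 1)) for k, ck in enumerate(c))

print(critical_mass_ratio(0.1))  # 1 - 3.3e-2 - 47e-4 = 0.9623
```

The negative coefficients make the ratio drop below one, i.e. the critical quark mass shrinks with increasing chemical potential, as stated above.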
Fig. 2 Schematic behaviour of the Binder cumulant as a function of quark mass for μ = 0. First order transitions and crossovers correspond to B4 = 1, 3, respectively, whereas a second order 3d Ising transition is characterised by B4 ≈ 1.604. On finite volumes the step function gets smeared out
2 The Binder Cumulant and Universality

In order to investigate the critical behaviour of the theory, we use the Binder cumulant [8] as an observable. It is defined as

B_4(m, \mu) = \frac{\langle (\delta X)^4 \rangle}{\langle (\delta X)^2 \rangle^2} ,    (2)
with the fluctuation δX = X − ⟨X⟩ of the order parameter of interest. Since we investigate the region of chiral phase transitions, we use the chiral condensate, X = ψ̄ψ. For the evaluation of the Binder cumulant it is implied that the lattice gauge coupling has been tuned to its pseudo-critical value, β = βc(m, μ), corresponding to the phase boundary between the two phases. In the infinite volume limit the Binder cumulant behaves discontinuously, assuming the values 1 in a first-order regime, 3 in a crossover regime, and the critical value ≈ 1.604 reflecting the 3d Ising universality class at a chiral critical point. On a finite volume the discontinuities are smeared out and B4 passes continuously through the critical value. This is sketched in Fig. 2. In the neighbourhood of the chiral critical point at zero chemical potential it can be expanded linearly,

B_4(m, \mu) = A + B\,(am - am_c) + C\,(a\mu)^2 + \dots ,    (3)

with A → 1.604 for V → ∞. The curvature of the critical surface in lattice units is directly related to the behaviour of the Binder cumulant via the chain rule,

\frac{d\, am_c}{d(a\mu)^2} = - \frac{\partial B_4}{\partial (a\mu)^2} \left( \frac{\partial B_4}{\partial am} \right)^{-1} .    (4)
Fig. 3 First results on Nt = 6 for Nf = 3. Left: The critical quark mass mc(0), obtained from the intersection of B4 with the Ising value, shrinks compared to Nt = 4. Right: Curvature of the Binder cumulant. The dots on the left give the value of the curvature when the difference quotient is fitted as a constant or with a subleading term
While the second factor is sizeable and easy to evaluate in a simulation, the μ-dependence of the cumulant is excessively weak and requires enormous statistics to extract. In order to guard against systematic errors, we compute the derivative directly, without recourse to fitting, via the finite difference quotient [5]

\frac{\partial B_4}{\partial (a\mu)^2} = \lim_{(a\mu)^2 \to 0} \frac{B_4(a\mu) - B_4(0)}{(a\mu)^2} .    (5)
Because the required shift in the couplings is very small, it is adequate and safe to use the original Monte Carlo ensemble at am_c(0), μ = 0 and reweight the results by the standard Ferrenberg-Swendsen method [9]. Moreover, by reweighting to imaginary μ the reweighting factor is real, positive and close to 1.
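As an illustration of Eqs. (2) and (5), the following sketch (Python, operating on synthetic placeholder data; names and conventions are ours, not those of the production code) computes the Binder cumulant from a time series of condensate measurements and forms the difference quotient with Ferrenberg-Swendsen weights:

```python
import numpy as np

def binder_cumulant(x, weights=None):
    """B4 = <(dX)^4>/<(dX)^2>^2, Eq. (2); optional reweighting factors
    give expectation values at slightly shifted couplings (Ref. [9])."""
    x = np.asarray(x, dtype=float)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    dx = x - np.sum(w * x) / np.sum(w)
    m2 = np.sum(w * dx ** 2) / np.sum(w)
    m4 = np.sum(w * dx ** 4) / np.sum(w)
    return m4 / m2 ** 2

# Difference quotient of Eq. (5), reweighting the mu = 0 ensemble to a small
# imaginary mu, where the weights are real, positive and close to 1.
rng = np.random.default_rng(0)
condensate = rng.normal(size=50_000)         # stand-in for measured data
log_w = rng.normal(scale=1e-3, size=50_000)  # stand-in for ln of the FS factor
a_mu_sq = -1e-4                              # (a*mu)^2 < 0 for imaginary mu
w = np.exp(log_w - log_w.max())              # stabilised weights
dB4 = (binder_cumulant(condensate, w) - binder_cumulant(condensate)) / a_mu_sq
```

For purely Gaussian fluctuations the estimator returns B4 ≈ 3, the crossover value, which serves as a quick sanity check.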
3 Results on Nt = 6

Having found that the curvature is negative for both Nf = 3 and Nf = 2 + 1, there remains a source of considerable systematic errors: on coarse lattices the discretisation effects can be larger than the finite-density effects. This has been demonstrated recently by repeating the zero-density studies on Nt = 6 lattices. For the case of Nf = 3, it was found that the critical quark mass shrinks by almost a factor of five on Nt = 6 compared to Nt = 4 [5], as shown in Fig. 3 (left). Again, this effect corresponds to a weakening of the transition. However, since it is much larger than the actual finite-density effects, the question now is what happens to the curvature of the critical surface as the continuum is approached. First results are shown in Fig. 3 (right). The next-to-leading-order fit is clearly preferred, but both fits give negative values for the curvature. The result still needs to be improved in accuracy and possibly be extended to Nf = 2 + 1. A continuum extrapolation would require the same computations for at least one additional lattice spacing.
Fig. 4 Left: Phase diagram for imaginary μ. Vertical lines are first-order transitions between Z(3) sectors, arrows show the phase of the Polyakov loop. The μ = 0 chiral/deconfinement transition continues to imaginary μ; its order depends on Nf and the quark masses. Right: Phase diagram for Nf = 3 at μ = iπT. Solid lines are lines of triple points ending in tricritical points, connected by a Z(2) critical line
4 Continuation of the Critical Surfaces to Imaginary μ

4.1 The QCD Phase Diagram at Imaginary Chemical Potential

Let us now consider imaginary chemical potential, μ = iμ_i. The QCD partition function exhibits two important exact symmetries, reflection symmetry in μ and Z(3) periodicity in μ_i, which hold for quarks of any mass [10],

Z(\mu) = Z(-\mu) , \qquad Z\!\left(\frac{\mu}{T}\right) = Z\!\left(\frac{\mu}{T} + i\,\frac{2\pi n}{3}\right) ,    (6)

for general complex values of μ. The symmetries imply transitions between adjacent centre sectors of the theory at fixed μ_i^c = (2n + 1)πT/3, n = 0, ±1, ±2, .... The Z(3) sectors are distinguished by the Polyakov loop

L(\mathbf{x}) = \frac{1}{3}\,\mathrm{Tr}\prod_{\tau=1}^{N_\tau} U_0(\mathbf{x},\tau) = |L|\, e^{-i\varphi} ,    (7)
whose phase ϕ cycles through ϕ = n(2π/3), n = 0, 1, 2, ..., as the different sectors are traversed. Moreover, the above also implies reflection symmetry about the Z(3) phase boundaries, Z(μ_i^c + μ_i) = Z(μ_i^c − μ_i). Transitions in μ_i between neighbouring sectors are of first order for high T and analytic crossovers for low T [10–12], as shown in Fig. 4 (left). Correspondingly, at fixed μ_i = μ_i^c, there are transitions in T between an ordered phase with two-state coexistence at high T and a disordered phase at low T. The order parameter is the shifted phase of the Polyakov loop, φ = ϕ − μ_i/T. Away from μ_i = μ_i^c, there is a
chiral or deconfinement transition line separating the high and low temperature regions. This line represents the analytic continuation of the chiral or deconfinement transition at real μ. Its nature (first order, second order, or crossover) depends on the number of quark flavours and their masses. Here we are interested in the nature of the endpoint of the Z(3) transition line as a function of the quark masses. Similar investigations have been carried out for Nf = 2 [13]. We thus fix the chemical potential to an imaginary critical value, μ_i = πT, and investigate the order of the transition by scanning vertically in T for Nf = 3 with various masses, cf. Fig. 4 (right).
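For orientation, the sector assignment and the shifted phase of this subsection amount to only a few lines (an illustrative sketch of ours, using the sign convention L = |L| e^{−iϕ} of Eq. (7)):

```python
import cmath

TWO_PI_THIRD = 2 * cmath.pi / 3

def polyakov_phase(L_avg):
    """Phase phi of the volume-averaged Polyakov loop, L = |L|*exp(-i*phi)."""
    return -cmath.phase(L_avg)

def z3_sector(L_avg):
    """Nearest Z(3) centre sector: phi ~ n*(2*pi/3), n = 0, 1, 2."""
    return round(polyakov_phase(L_avg) / TWO_PI_THIRD) % 3

def shifted_phase(L_avg, mu_i_over_T):
    """Order parameter phi_shift = phi - mu_i/T used at mu_i = mu_i^c."""
    return polyakov_phase(L_avg) - mu_i_over_T
```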
4.2 Nature of the Z(3) Endpoint for Nf = 3

We use again the Binder cumulant, (2), this time for the imaginary part of the Polyakov loop, X = Im L, which is an order parameter for the ordered and disordered regimes. For μ/T = iπ, every β-value represents a point on the phase boundary and thus is pseudo-critical. In the thermodynamic limit, B4(β) = 3, 1.5, 1.604, 2 for crossover, first-order triple point, 3d Ising and tricritical transitions, respectively. On finite L³ volumes the steps between these values are smeared out to continuous functions whose gradients increase with volume. The critical coupling βc for the endpoint is obtained as the intersection of curves from different volumes. In the scaling region around βc, B4 is a function of x = (β − βc) L^{1/ν} alone and can be expanded as

B_4(\beta, L) = B_4(\beta_c, \infty) + a_1 x + a_2 x^2 + \cdots ,    (8)

up to corrections to scaling, with the critical exponent ν characterising the approach to the thermodynamic limit. The relevant values for us are ν = 1/3, 0.63, 1/2 for a first-order, 3d Ising or tricritical transition, respectively. For each quark mass, we simulated lattices of sizes L = 8, 12, 16 at typically 8–14 different β-values, calculated B4(Im L), and filled in additional points by Ferrenberg-Swendsen reweighting [9]. Figure 5 (left) shows an example for quark mass am = 0.04. B4 moves from large values (crossover) at small β (i.e. low T) towards 1 (first-order transition) at large β (i.e. high T). In the neighbourhood of the intersection point, we then fit all curves simultaneously to (8), thus extracting βc, B4(βc, ∞), ν, a1, a2 (a sketch of such a simultaneous fit is shown below). We have investigated quark mass values ranging from the chiral to the pure gauge regime. The exponents ν pertaining to each of them are shown in Fig. 5 (right). There is unambiguous evidence for a change from first-order scaling to 3d Ising scaling, and back to first-order scaling. Note that, in the infinite volume limit, the curve would be replaced by a non-analytic step function, whereas the smoothed-out rise and fall in Fig. 5 (right) corresponds to finite-volume corrections. Hence, for small and large masses, we have unambiguous evidence that the boundary point between a first-order Z(3) transition and a crossover at μ = iπT corresponds to a triple point. We are thus ready to discuss the (T, m) phase diagram of Nf = 3 QCD at fixed imaginary chemical potential, μ = i(2n + 1)πT/3. The qualitative situation is shown in Fig. 4 (right).
Fig. 5 Left: Finite-size scaling of B4 for small quark mass, fitted to (8). Insets show the data rescaled with fixed ν = 0.33, corresponding to a first-order transition. Right: Critical exponent ν vs. quark mass am at μ/T = iπ
For high temperatures, we have two-phase coexistence, with the phase φ of the Polyakov loop flipping between two possible values. At low temperatures, instead, we observe averaging over the possible phases of the Polyakov loop. Since the transition between these regimes is associated with the breaking of a global symmetry, it is always non-analytic.
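The simultaneous fit of all volumes to the scaling form (8) can be sketched as follows (Python with SciPy on synthetic placeholder data; the "true" parameter values generating the data are invented for the demonstration and are not simulation results):

```python
import numpy as np
from scipy.optimize import curve_fit

def b4_scaling(data, b4c, beta_c, nu, a1, a2):
    """Eq. (8): B4(beta, L) as a function of x = (beta - beta_c)*L**(1/nu)."""
    beta, L = data
    x = (beta - beta_c) * L ** (1.0 / nu)
    return b4c + a1 * x + a2 * x ** 2

# Synthetic data mimicking L = 8, 12, 16 lattices near the intersection point
rng = np.random.default_rng(1)
beta = np.tile(np.linspace(5.10, 5.14, 7), 3)
L = np.repeat([8.0, 12.0, 16.0], 7)
b4 = b4_scaling((beta, L), 1.604, 5.12, 0.63, 0.8, 0.05)
b4 += rng.normal(scale=0.02, size=b4.size)

popt, pcov = curve_fit(b4_scaling, (beta, L), b4, p0=[1.6, 5.12, 0.5, 1.0, 0.0])
b4c, beta_c, nu, a1, a2 = popt
# nu then discriminates: 1/3 (first order), 0.63 (3d Ising), 1/2 (tricritical)
```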
4.3 Connection to and Relevance for μ = 0

Combining our knowledge of Nf = 2, 3, the nature of the Z(3) transition endpoint can be characterised as a function of the quark masses as in Fig. 6 (left), in complete analogy to the corresponding plot at μ = 0, Fig. 1 (top). Schematically, we have a first-order region of triple points for both heavy and light masses, which are separated from a region of second-order points by a chiral and a deconfinement tricritical line, respectively. This entire diagram is computable by standard Monte Carlo methods and constitutes a useful benchmark for model studies of the QCD phase diagram. How is this diagram connected to the one at μ = 0? Generally, a tricritical point represents the confluence of two ordinary critical points. In the heavy mass region the critical endpoints of the deconfinement transition, representing the deconfinement critical surface, merge with the endpoints of the Z(3) transition. Thus, the deconfinement tricritical line is the boundary of the deconfinement critical surface at μ = iπT/3. This is shown in Fig. 6 (right). It could also be demonstrated within a Potts model, which is in the same universality class as QCD with heavy quarks, that the shape of the deconfinement critical surface for heavy quarks follows tricritical scaling with mean-field exponents. The results were published in [14] and reported at conferences [15]. A similar situation is expected for the chiral critical surface.
Fig. 6 Left: Schematic phase diagram of the Roberge-Weiss endpoint. Right: 3d depiction of the deconfinement and chiral critical surfaces
5 Simulation Details

For our Monte Carlo simulations we use the standard Wilson gauge and Kogut-Susskind fermion actions. Configurations are generated using the Rational Hybrid Monte Carlo (RHMC) algorithm [16]. Our numerical procedure to compute the Binder cumulant is as follows. For each set of fixed quark mass and chemical potential, we interpolate the critical coupling βc from a range of typically 3–5 simulated β-values by Ferrenberg-Swendsen reweighting [9]. For each simulation point 50 k–100 k RHMC trajectories have been accumulated, measuring the gauge action, the Polyakov loop and up to four powers of the chiral condensate after each trajectory. Thus, the estimate of B4 for one set of mass values rests on at least 200 k trajectories, and the estimate of a critical point on at least 500 k trajectories. The derivatives on the Nt = 4 lattices are based on ∼ 500 k trajectories; on Nt = 6 we have so far collected ∼ 700 k trajectories. The simulations are performed on the NEC SX-8 at the HLRS in Stuttgart. A scan in parameter space involves simulations of many parameter sets. For such a problem, parallelisation is achieved trivially by running one set of couplings per node, each node running in vector mode. This way of parallelising allows us to explore large regions of the parameter space at the same time, which is necessary when mapping out a phase diagram. At the same time, there is no overhead for parallelisation and communication, ensuring maximal computing efficiency and one-to-one scaling of compute power with the number of processors. The vector mode ensures maximal throughput for each individual lattice. Typically we work with 4–6 nodes in parallel, using each of their 8 cores in parallel and avoiding communication between the nodes. Moreover, this procedure permits efficient use of Grid computing, which we have employed earlier for the more expensive Nf = 2 + 1 [7] and part of the Nt = 6 calculations. The NEC machine in Stuttgart is used to thermalise multiple decorrelated copies of the configurations, which are then transferred to the Grid for production runs.
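The trivial parallelisation strategy can be pictured schematically as follows (Python multiprocessing as a stand-in for one-job-per-node batch submission; run_rhmc is a placeholder for the actual simulation binary):

```python
from multiprocessing import Pool

def run_rhmc(params):
    """Placeholder for one independent RHMC run at fixed couplings; in
    production each parameter set occupies one vector node and the runs
    exchange no data with each other."""
    am, beta, a_mu = params
    # ... generate trajectories, measure action, Polyakov loop, condensate ...
    return params, None  # would return the measured observables

# One entry per (quark mass, gauge coupling, chemical potential) set
scan = [(0.04, beta, 0.0) for beta in (5.10, 5.11, 5.12, 5.13, 5.14)]

if __name__ == "__main__":
    with Pool(processes=len(scan)) as pool:
        results = pool.map(run_rhmc, scan)  # embarrassingly parallel scan
```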
6 Conclusions

On coarse Nt = 4 lattices, we have found that the chiral phase transition for light quarks weakens as a real chemical potential μ ≲ T for fermion number is switched on. This implies that for physical QCD, whose transition is an analytic crossover at zero chemical potential, the transition remains a crossover also at finite chemical potential. These findings persist on finer lattices, Nt = 6. We stress, however, that we have only investigated the chiral critical surface; our findings do not exclude a critical point not belonging to that surface. On the other hand, we showed that the critical surfaces continue to imaginary chemical potentials and that their curvature is influenced by tricritical scaling, which explains the weakening of the phase transition.
References

1. O. Philipsen, Eur. Phys. J. ST 152 (2007) 29 [arXiv:0708.1293 [hep-lat]].
2. F. Karsch, E. Laermann and C. Schmidt, Phys. Lett. B 520 (2001) 41 [arXiv:hep-lat/0107020].
3. P. de Forcrand and O. Philipsen, Nucl. Phys. B 673 (2003) 170 [arXiv:hep-lat/0307020].
4. P. de Forcrand and O. Philipsen, JHEP 0701 (2007) 077 [arXiv:hep-lat/0607017].
5. P. de Forcrand, S. Kim and O. Philipsen, PoS LAT2007 (2007) 178 [arXiv:0711.0262 [hep-lat]].
6. P. de Forcrand and O. Philipsen, JHEP 0811 (2008) 012 [arXiv:0808.1096 [hep-lat]].
7. J. T. Moscicki, M. Wos, M. Lamanna, P. de Forcrand and O. Philipsen, Comput. Phys. Commun. 181 (2010) 1715 [arXiv:0911.5682].
8. K. Binder, Z. Phys. B 43 (1981) 119.
9. A. M. Ferrenberg and R. H. Swendsen, Phys. Rev. Lett. 63 (1989) 1195.
10. A. Roberge and N. Weiss, Nucl. Phys. B 275 (1986) 734.
11. P. de Forcrand and O. Philipsen, Nucl. Phys. B 642 (2002) 290 [arXiv:hep-lat/0205016].
12. M. D'Elia and M.P. Lombardo, Phys. Rev. D 67 (2003) 014505 [arXiv:hep-lat/0209146].
13. M. D'Elia and F. Sanfilippo, Phys. Rev. D 80 (2009) 111501 [arXiv:0909.0254 [hep-lat]].
14. P. de Forcrand and O. Philipsen, Phys. Rev. Lett. 105 (2010) 152001 [arXiv:1004.3144 [hep-lat]].
15. O. Philipsen and P. de Forcrand, PoS LATTICE2010 (2010) 211 [arXiv:1011.0291 [hep-lat]].
16. M.A. Clark and A.D. Kennedy, Phys. Rev. Lett. 98 (2007) 051601 [arXiv:hep-lat/0608015].
Higgs Boson Mass Bounds from a Chirally Invariant Lattice Higgs-Yukawa Model

Philipp Gerhold, Karl Jansen, and Jim Kallarackal
Abstract We consider a chirally invariant lattice Higgs-Yukawa model based on the Neuberger overlap operator D(ov). The model is evaluated by means of PHMC simulations, and we will present final results on the upper and lower Higgs boson mass bounds. The question of a fourth generation of heavy quarks has recently gained attention, and we will illustrate the effect of heavy quarks on the Higgs boson mass bounds. Finally, we report on the unstable nature of the Higgs boson. The resonance mass and width have been computed in a genuinely non-perturbative manner. The results are compared to the former Higgs boson mass bounds.
Philipp Gerhold
Humboldt-Universität Berlin, Newtonstr. 15, D-12489 Berlin, Germany, e-mail: [email protected]

Karl Jansen
DESY-Zeuthen, Platanenallee 6, D-15738 Zeuthen, Germany, e-mail: [email protected]

Jim Kallarackal
Humboldt-Universität Berlin, Newtonstr. 15, D-12489 Berlin, Germany, e-mail: [email protected]

1 Introduction

In the early 1990s there was a large activity in investigating lattice Higgs and Higgs-Yukawa models, driven by the desire for a non-perturbative determination of lower and upper Higgs boson mass bounds as well as of its decay properties, see e.g. Refs. [1–7] for reviews. These earlier investigations were, however, blocked by the lack of a consistent formulation of chiral symmetry on the lattice, which is, of course, indispensable for the lattice construction of chiral theories like the Higgs sector of the standard model. There are two main developments which warrant reconsidering these questions: firstly, with the advent of the LHC, we can expect that properties of the
dard model Higgs boson, such as the mass and the decay width, will be revealed experimentally. Secondly, there is, in contrast to the situation of the earlier investigations, a consistent formulation of a lattice Higgs-Yukawa model with an exact lattice chiral symmetry [8] based on the Ginsparg-Wilson relation [9], allowing thus to go beyond those earlier models. Based on this development, the interest in lattice studies of Higgs-Yukawa models has recently been renewed [10–14]. Establishing non-perturbative bounds for the Higgs boson mass was the main subject of research in the last three years and led to the successful completion of a PhD thesis of PG [15]. Within this time it was possible to establish bounds for the lower as well as the upper Higgs boson masses using dynamical fermions obeying an exact lattice chiral symmetry. The final results were published in [15–17] and will be presented in Sect. 4. Recently, we focused our interest on the possibility of a fourth generation of heavy quarks and leptons. Earlier analysis of electroweak precision measurements of the Z resonance peak stated that the number of fermion generations shall be three. This analysis though, was based on the assumption that the fourth generation neutrino is as light as the other three. An extension of the standard model of particle physics with heavy fermions, i.e. a heavy doublet in the quark sector, denoted by (t , b ) as well as a heavy lepton doublet, denoted by (τ , ντ ), is not excluded [18]. Beyond the mere possibility of a heavy fermion generation, this extension has appealing properties. The extension of the standard model with a fourth generation of heavy quarks and leptons (SM4) can provide a sufficient source for CP violation in order to explain the observed asymmetry between matter and antimatter [18]. Furthermore, the heavy fermions have a significant impact on the running of the coupling constants such that a unification of the gauge couplings is possible without imposing super symmetry [19]. The electroweak precision data is well compatible with more than three generations of quarks and leptons [18]. Investigating such heavy fermions implies a strong Higgs-Yukawa coupling, which in turn necessitates a genuine non-perturbative study. We will present the final results on the shift of the Higgs boson mass bounds due to the heavy quarks. Finally, we report on our results on computing the resonance parameters of the standard model Higgs boson and address its decay properties. The standard model Higgs boson can decay into two weak gauge bosons (W ± , Z) or a pair of fermion and anti-fermion (tt, bb). The Goldstone equivalence theorem states that the decay into the weak gauge bosons can be determined by computing the decay of the Higgs boson into two Goldstone bosons and allows us to connect our computations with upcoming experimental results. Due to the fact that the Higgs boson can decay into lighter particles the computation of the Higgs boson mass by means of the two point correlation function is not anymore reliable. It has been shown [20, 21], that the volume dependence of two-particle energy states are connected to scattering phases as long as the energy levels are within the purely elastic region. The scattering phases allow us to compute the resonance mass as well as its decay width. Three values of the bare quartic coupling are investigated and enable to study the dependence of the resonance width and mass on the strength of the quartic coupling.
Higgs Boson Mass Bounds and Resonance Parameters
69
2 The SU(2)_L × SU(2)_R Invariant Higgs-Yukawa Model
The model we consider here is a four-dimensional, chirally invariant SU(2)_L × SU(2)_R Higgs-Yukawa model, discretized on a finite lattice with L_s lattice sites in the spatial directions and L_t lattice sites in the time direction. We set the lattice spacing a to unity throughout this paper. The model contains one four-component, real Higgs field Φ (equivalent to the complex-doublet notation used in the standard model) and N_f degenerate fermion doublets represented by eight-component spinors ψ^(i), ψ̄^(i), i = 1, …, N_f, with the total action being decomposed into the Higgs action S_Φ and the fermion action S_F. The partition function can hence be written as
Z = ∫ DΦ Dψ Dψ̄ exp( −S_Φ[Φ] − S_F[Φ, ψ, ψ̄] ) ,    (1)
where N_f is the number of fermion doublets. It should be stressed here that no gauge fields are included within this model. The reason for neglecting the gauge fields is that we are interested in the Higgs and fermion masses, which are dominated by the quartic self-coupling and the Yukawa coupling, whereas the coupling to the gauge bosons is strongly suppressed, the renormalized gauge coupling constant being much smaller than the Yukawa coupling constant at the energy scale considered here, which is given by the Higgs mass in this setup. Furthermore, the Goldstone equivalence theorem states that the obtained results can be related to the standard electroweak model including gauge fields. In addition, we restrict ourselves to mass-degenerate fermion doublets, such that the Yukawa couplings of the scalar field to the fermions are equal. The fermion action S_F is based on the Neuberger overlap operator D^(ov) [22] and can be written in terms of the fermionic matrix M according to

S_F = ∑_{i=1}^{N_f} ψ̄^(i) M ψ^(i) ,  ψ^(i) = (t, b)ᵀ ,    (2)
M = D^(ov) + y_N B[Φ] ( 1 − D^(ov) / (2ρ) ) ,    (3)
B[Φ] = Φ_μ ( P_+ θ_μ† 1_2 + P_− θ_μ 1_2 ) ,    (4)
θ_μ = (1_2, −iτ) .    (5)
τ denotes the vector of the three Pauli matrices. The fermion matrix M describes the propagation of the fermion fields as well as their coupling to the scalar field Φ. The (doublet) Dirac operator D^(ov) = D̂^(ov) ⊗ D̂^(ov) is given by the Neuberger overlap operator D̂^(ov), which is related to the Wilson operator D̂^(W) = γ_μ^E (1/2)(∇_μ^f + ∇_μ^b) − (r/2) ∇_μ^b ∇_μ^f by

D̂^(ov) = ρ ( 1 + Â / √(Â†Â) ) ,  Â = D̂^(W) − ρ ,  1 ≤ ρ < 2r ,    (6)
with ∇_μ^f, ∇_μ^b denoting the forward and backward difference quotients, and ρ being, within its bounds, a free parameter of the Neuberger Dirac operator. Note that in the absence of gauge fields this kinetic part corresponds to that of free fermions, which will be exploited in the numerical construction of the overlap operator later. The model then obeys an exact, but lattice-modified, global SU(2)_L × SU(2)_R chiral symmetry according to
ψ → Ω_L P̂_− ψ + Ω_R P̂_+ ψ ,
ψ̄ → ψ̄ P_+ Ω_L† + ψ̄ P_− Ω_R† ,
φ → Ω_R φ Ω_L† ,
φ† → Ω_L φ† Ω_R† ,

with Ω_L, Ω_R ∈ SU(2), recovering the chiral symmetry in the continuum limit [8]. Finally, the lattice Higgs action is given in the usual lattice notation by
S_Φ = −κ_N ∑_{n,μ} Φ_n† ( Φ_{n+μ̂} + Φ_{n−μ̂} ) + ∑_n Φ_n† Φ_n + λ_N ∑_n ( Φ_n† Φ_n − N_f )² ,    (7)
with the only peculiarity that the fermion generation number N_f appears in the quartic coupling term, which was a convenient convention for the large-N_f analysis. However, this version of the lattice Higgs action is equivalent to the usual continuum notation

S_ϕ = ∑_n [ (1/2) (∇_μ^f ϕ)_n† (∇_μ^f ϕ)_n + (1/2) m_0² ϕ_n† ϕ_n + λ_0 (ϕ_n† ϕ_n)² ] ,    (8)

with the bare mass m_0 and the bare quartic coupling constant λ_0 corresponding to the continuum notation. The connection is established through a rescaling of the Higgs field and of the involved coupling constants according to

ϕ_x = √(2κ_N) ( Φ_x² + iΦ_x¹ , Φ_x⁰ − iΦ_x³ )ᵀ ,
λ_0 = λ_N / (4κ_N²) ,  m_0² = (1 − 2N_f λ_N − 8κ_N) / κ_N ,  y_0 = y_N / √(2κ_N) .
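As a quick consistency aid, the rescaling relations above translate directly into code. The following sketch is our illustration only (it is not part of the original simulation software), and the sample parameter values passed at the end are invented.

```python
# Sketch: lattice -> continuum bare parameters via the rescaling relations
# above. The numerical inputs in the final line are invented examples.
import numpy as np

def continuum_parameters(kappa_N, lambda_N, y_N, N_f=1):
    """Translate (kappa_N, lambda_N, y_N) into (lambda_0, m_0^2, y_0)."""
    lambda_0 = lambda_N / (4.0 * kappa_N**2)
    m0_sq = (1.0 - 2.0 * N_f * lambda_N - 8.0 * kappa_N) / kappa_N
    y_0 = y_N / np.sqrt(2.0 * kappa_N)
    return lambda_0, m0_sq, y_0

print(continuum_parameters(kappa_N=0.12, lambda_N=0.0, y_N=0.35))
```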
3 Implementation, Performance, and Parallelization
The simulation algorithm and the software components essential for producing the space-time configurations were completed in 2009. The following chapter describes the numerical challenge and the implemented algorithm.
We will refer to the typical performance achieved on the XC4000 at the computing center in Karlsruhe. The first step towards a numerical treatment of the considered Higgs-Yukawa model is to integrate out the fermionic degrees of freedom, leading to the appearance of the determinant of the fermionic matrix M in the partition function
Z = ∫ DΦ [det(M)]^{N_f} · exp( −S_Φ[Φ] ) ,    (9)
where the fermionic matrix M was given in (3). We remark that M is neither hermitian nor does it possess an orthogonal eigenbasis. The fermion matrix M is, in contrast to QCD, not even γ5-hermitian. For the numerical treatment of the fermion dynamics, the determinant in (9) is rewritten in terms of Gaussian integrations over N_f complex, so-called pseudofermionic fields ω_i, i = 1, …, N_f. This step, however, requires the positivity and hermiticity of the underlying fermion matrix, which can be guaranteed by rewriting the partition function according to
Z = ∫ DΦ [sign det(M)]^{N_f} · [det(M M†)]^{N_f/2} · exp( −S_Φ[Φ] )    (10)
  = ∫ DΦ Dω Dω† [sign det(M)]^{N_f} · exp( −S_Φ[Φ] − ∑_{i=1}^{N_f} ω_i† (M M†)^{−1/2} ω_i ) ,    (11)
leading to the numerically demanding problem of calculating the inverse square root of M M†. Note that (10) does not involve a general complex phase of the determinant but only its sign, since the determinant of M is real, as one can show. In this reformulation the model can finally be evaluated numerically using Monte Carlo techniques. Here we have implemented a PHMC algorithm (see [23] for a general review), which evaluates the extremely high-dimensional, i.e. 4 · 16^{N_f} · L_s^{3(N_f+1)} · L_t^{N_f+1}-dimensional, functional integral in (11) stochastically and solves the problem of determining the inverse square root of M M† acting on a vector ω by applying polynomial approximations P(M M†)ω. The basic numerical task for performing this Monte Carlo simulation is therefore the computation of M M† ω. However, since M is an 8V × 8V matrix for a lattice with volume V = L_s³ × L_t, this is a very time-consuming operation. In fact, M is too large to be held in computer memory. Instead, only the prescription of how M acts on a given vector is implemented in the software code. At this point we exploit the fact that there are no gauge fields included in our model: the Dirac operator D^(ov) is block-diagonal in momentum space. The fermion matrix M is therefore composed of the coupling matrix φ, which is diagonal in position space, and the Dirac operator, which is diagonal in momentum space. In our approach we use a Fast Fourier Transform for the computation of M M† ω to switch between position and momentum representations, such that all operator applications can be performed trivially due to their block-diagonal structure.
Table 1 Typical speeds, auto-correlation times τ for the vacuum expectation value of the scalar field, and configuration file sizes for typical lattice sizes. The main improvement, which enables us to perform calculations on lattice sizes up to 40⁴, was to exploit the multi-processor environment, together with algorithmic efforts which helped to reduce the maximum polynomial degree down to 20.

Typical performance in 2008:
Lattice size L_s³ × L_t | Speed [configurations per day] | Auto-correlation time τ (Higgs vev) | File size per configuration
12³ × 32 | 700 to 900 | 5 to 6 | 1.7 MBytes
16³ × 32 | 300 to 400 | 5 to 6 | 4.0 MBytes
20³ × 32 | 80 to 120  | 5 to 6 | 7.8 MBytes

Current typical performance:
Lattice size L_s³ × L_t | Speed [configurations per day] | Auto-correlation time τ (Higgs vev) | File size per configuration
32³ × 32 | 80 to 150 | 3 to 5 | 32.0 MBytes
40³ × 40 | 80 (1)    | 3 to 5 | 78.0 MBytes

(1) This lattice has been computed on the XC2 fat nodes, while the other numbers refer to computations on the thin nodes. Due to the high memory demands, the 40⁴ lattices can only run on the fat nodes.
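To indicate how auto-correlation times like those quoted in Table 1 can be estimated, the sketch below computes a naive integrated auto-correlation time from a Monte Carlo time series. This is our illustration only; the Γ-strategy of [24] used by the authors additionally provides an automatic summation-window criterion and error estimates, which are omitted here.

```python
# Sketch: naive integrated auto-correlation time of a Monte Carlo time
# series (e.g. the Higgs vev measured on successive configurations).
import numpy as np

def tau_int(series, w_max=100):
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    c0 = np.dot(x, x) / len(x)
    tau = 0.5
    for w in range(1, w_max):
        rho = np.dot(x[:-w], x[w:]) / ((len(x) - w) * c0)
        tau += rho
    return tau

# Toy series with correlation between successive "configurations" (AR(1)).
rng = np.random.default_rng(0)
x = np.zeros(20000)
for i in range(1, len(x)):
    x[i] = 0.7 * x[i - 1] + rng.normal()
print(tau_int(x))   # ~ (1 + 0.7) / (2 * (1 - 0.7)) ≈ 2.8 for this process
```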
This is particularly advantageous for the overlap operator, since the standard construction of this operator would be based on much more demanding computations and, moreover, would only be approximate. A second advantage of this approach is that the applied Dirac operator can easily be replaced by other operators, simply by adopting the corresponding eigenvalues.
The current program, which is running on the XC2, has been improved in many ways, with respect to algorithmic as well as technical issues, as illustrated below. Table 1 shows the typical performance on the XC2 and a comparison to the performance achieved in 2008. The auto-correlation times τ are calculated according to the Γ-strategy [24] and refer to the vacuum expectation value of the scalar field.
When examining the differences in the running speeds in Table 1, one notices that the speed does not scale exactly proportionally to the volume. Although a Fast Fourier Transform belongs in general to the complexity class V · log(V), one needs to keep in mind that an FFT runs optimally only on lattices whose sizes are a power of two. If other prime factors appear in the lattice size, the algorithm loses efficiency. This is clearly seen on the 20³ × 32 lattice, which contains a 5 as prime factor. This observation also reveals that the biggest portion of the used computer time is spent on performing the Fast Fourier Transforms. The implemented parallelization techniques therefore mainly focus on exploiting the available multi-processor system for calculating the FFT. Figure 1a shows a straightforward use of the multi-threaded FFTW package. The reason for this bad scaling behavior lies in the fact that FFTs are mostly limited by memory access speeds, i.e. bus speeds, and not by CPU power. Figure 2 shows the ccNUMA architecture underlying an AMD Opteron system. Each processor is connected to its neighboring unit through a HyperTransport link, which has a throughput of 4 GB/s.
Fig. 1 a Scaling behavior of FFTW using several threads (cores) on a single processor. b Scaling performance using one or two cores and up to four processors. The number of threads is given as n_p × n_c, where n_p denotes the number of processors and n_c the number of cores. Both performance measurements were performed on a 32⁴ lattice
Fig. 2 ccNUMA architecture. Image taken from the 4P Server Comparison at http://www.amd.com/us-en/assets/content_type/DownloadableAssets/
The figure also shows that each processor has its own memory segment. A thread running on one node but accessing memory resident on a different node will have to make use of the HyperTransport link. The idea was therefore to align processes with the memory segments where their data had been allocated. The eight components of the fermionic vector can
then be distributed over the four segments, such that each thread can access its data through its own bus, leading to a very good scaling behavior. Figure 1b shows the scaling performance with respect to the number of threads when the processes are aligned with a specific memory segment. Currently, we are running 32⁴ and 40⁴ lattices on the fat nodes, which have 8 cores each. The simulation program needs approximately 20 GB of main memory on each node for the 40⁴ lattice; the fat nodes, with their 128 GB of memory, are therefore perfectly suited for our computations. About 200 simulations are stored in archive space, occupying 8.4 TB.
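The alternation between position and momentum space described above can be illustrated with a few lines of code. The following sketch is our illustration and not the group's program: it applies an operator that is diagonal in momentum space via forward and inverse FFTs, and a position-space coupling by point-wise multiplication. The eigenvalue array is an invented stand-in for the free Dirac-operator eigenvalues.

```python
# Sketch: applying a momentum-diagonal operator with FFTs, and a
# position-diagonal coupling by point-wise multiplication.
import numpy as np

Ls, Lt = 8, 16                    # small example lattice
shape = (Ls, Ls, Ls, Lt)

# Invented stand-in for the free Dirac eigenvalues on each momentum mode.
p = np.meshgrid(*[2.0 * np.pi * np.fft.fftfreq(L) for L in shape],
                indexing="ij")
eig = 1.0 + 0.5 * sum(4.0 * np.sin(pm / 2.0)**2 for pm in p)

def apply_momentum_diagonal(omega):
    # FFT to momentum space -> multiply by eigenvalues -> inverse FFT.
    return np.fft.ifftn(eig * np.fft.fftn(omega))

rng = np.random.default_rng(1)
phi = rng.random(shape)                             # position-space coupling
omega = rng.random(shape) + 1j * rng.random(shape)  # pseudofermion-like vector

result = phi * apply_momentum_diagonal(omega)       # schematic operator action
```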
4 Results
We examine the model choosing various values of the bare quartic self-coupling while holding the cut-off Λ fixed. The bare parameters of the theory then have to be tuned such that the phenomenologically known physical values of the top quark mass and the scalar vacuum expectation value are kept unchanged. The latter requirement restricts the freedom in the choice of the bare parameters κ, y_{t,b}, λ. The Yukawa couplings are tuned to yield top and bottom quark masses of m_t/a ≈ 175 GeV, where m_t, m_b denote the lattice masses and a is the lattice spacing. For our numerical simulations we use the tree-level relation

y_{t,b} = m_{t,b} / v_r    (12)
as an approximate starting point to set the bare Yukawa coupling constants y_t and y_b. Furthermore, the model has to be evaluated in the broken phase, i.e. at a non-vanishing vacuum expectation value of the Higgs mode, v ≠ 0, however close to the phase transition to the symmetric phase. We also use the phenomenologically known value of the renormalized vev, v_r/a = 246 GeV, to determine the lattice spacing a and thus the physical cutoff Λ according to

246 GeV = v_r / a ≡ v / (√(Z_G) · a) ,  Λ = a⁻¹ ,  Z_G⁻¹ := ∂/∂p² Re[ G_G^c(p²) ]⁻¹ |_{p² = −μ_G²} .    (13)
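Equation (13) translates directly into a one-line determination of the cutoff once v and Z_G are measured. The sketch below (our illustration only) reproduces, under the illustrative assumption Z_G = 1, the cutoff of the third row of Table 3 from its renormalized vev.

```python
# Sketch: cutoff Lambda = 1/a from the measured lattice vev via (13).
import numpy as np

def cutoff_GeV(v, Z_G):
    v_r = v / np.sqrt(Z_G)       # renormalized vev in lattice units
    return 246.0 / v_r           # Lambda = 1/a in GeV, since v_r/a = 246 GeV

print(cutoff_GeV(v=0.1539, Z_G=1.0))   # ~1598 GeV, cf. Table 3 (Z_G = 1 assumed)
```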
The Goldstone propagator G_G^c denotes the infinite-volume Goldstone propagator, which can be fitted to the obtained simulation data in order to perform an infinite-volume extrapolation. μ_G is an arbitrary mass scale; it is natural to choose the so-called on-shell scheme, where the renormalization point is chosen at the physical value. Once the infinite-volume propagator is known, one can compute the field renormalization factors Z_G, Z_H as well as the physical mass by determining the pole of the propagator. In the case of unstable particles, such as the Higgs boson, care is needed, as the pole of the Higgs boson propagator is complex. We define the physical Higgs boson mass by
Re[ G_H^c(p_c²) ]⁻¹ |_{p_c² = −m_H²} = 0 .    (14)
The lattice Goldstone and Higgs propagators in position space are given as

G_G(x, y) = (1/3) ∑_{α=1}^{3} ⟨ g_x^α g_y^α ⟩  and  G_H(x, y) = ⟨ h_x h_y ⟩ .    (15)
We derive the physical top and bottom quark masses m_t, m_b from the fermionic time-slice correlators

C_f(Δt) = 1/(L_t · L_s⁶) ∑_{t=0}^{L_t−1} ∑_{x,y} 2 Re Tr ⟨ f_{L,t+Δt,x} · f̄_{R,t,y} ⟩ ,    (16)
with f = t, b selecting the quark flavor. We remark that the full all-to-all correlator as defined in (16) can be computed trivially by using sources which have a constant value of one on a whole time slice for a selected spinor index and a value of zero everywhere else. This all-to-all correlator yields very clean signals for the top and bottom quark mass determination.
Two restrictions limit the range of accessible energy scales: on the one side, all particle masses have to be small compared to Λ to avoid unacceptably large cutoff effects; on the other side, all masses have to be large compared to the inverse lattice size to avoid finite-volume effects. As a minimal requirement we demand here that all particle masses m̂ in lattice units fulfill

m̂ < 0.5  and  m̂ · L_{s,t} > 2 .    (17)
4.1 The Higgs Boson Mass Bounds
Our aim is to compute non-perturbative upper and lower Higgs boson mass bounds. The lower Higgs boson mass bound is obtained at λ_N = 0, while the upper mass bound is determined at λ = ∞. We perform our simulations on various lattice sizes in order to analyze the finite-volume effects, and we have extrapolated the physical masses to infinite volume. Figure 3a shows our final result for the upper and lower Higgs boson masses m_H/a versus the cutoff Λ. All presented results have been obtained in Monte Carlo simulations for which finite-size analyses have been performed. The details of this work were published in [15–17].
4.2 Preliminary Data on the Effects of a Heavy Fourth Generation
One of our current main targets is to investigate the influence of a potential fourth heavy fermion generation on the upper and lower Higgs boson mass bounds. For that purpose we apply basically the same strategy as for the standard model Higgs boson, as described above.
Table 2 Overview of the chosen bare parameters. The fermion mass was evaluated at each value of the cut-off in order to ensure that it is within 3% of the average fermion mass of 676 GeV. Stat. is the number of configurations produced on a 16³ × 32 lattice with the given parameter set; τ is the auto-correlation time. The temporal extent L_t was set to 32 in all cases.

κ       | m_0²/a² | λ_0 | y_0     | L_s                | Stat. | τ   | Λ [GeV]
0.09442 | 2.59098 | 0.0 | 3.21224 | 12, 16, 18, 20, 24 | 20000 | 1.3 | 3498 ± 48
0.09463 | 2.56747 | 0.0 | 3.20867 | 12, 16, 18, 20, 24 | 15000 | 1.2 | 2929 ± 27
0.09485 | 2.54296 | 0.0 | 3.20495 | 12, 16, 18, 20, 24 | 15000 | 0.8 | 2548 ± 22
0.09545 | 2.47669 | 0.0 | 3.19486 | 12, 16, 18, 20, 24 | 20000 | 0.7 | 1883 ± 16
0.09560 | 2.46025 | 0.0 | 3.19235 | 12, 16, 18, 20, 24 | 20000 | 0.8 | 1786 ± 18
0.09605 | 2.41124 | 0.0 | 3.18486 | 12, 16, 18, 20, 24 | 20000 | 1.5 | 1511 ± 20
0.21300 | −∞      | ∞   | 3.37068 | 12, 16, 18, 20, 24 | 6000  | 7   | 3566 ± 48
0.21500 | −∞      | ∞   | 3.35497 | 12, 16, 18, 20, 24 | 6000  | 3   | 2701 ± 17
0.22200 | −∞      | ∞   | 3.18159 | 12, 16, 20, 24     | 15000 | 4   | 2563 ± 38
0.22320 | −∞      | ∞   | 3.17303 | 12, 16, 20, 24     | 15000 | 3   | 2299 ± 21
0.22560 | −∞      | ∞   | 3.15610 | 12, 16, 20, 24     | 15000 | 4   | 1932 ± 14
0.23040 | −∞      | ∞   | 3.12305 | 12, 16, 20, 24, 32 | 15000 | 3   | 1516 ± 7
Including a heavy fourth generation involves another mass scale compared to the known three generations. In our numerical studies we neglect the known three generations of quarks and leptons and restrict ourselves solely to the fourth generation, as we expect its coupling to the scalar field to dominate the contribution to the scalar masses. With this restriction the model is unchanged, and the fourth generation can be included by tuning the bare Yukawa couplings such that the fourth-generation quark mass lies in the phenomenologically allowed region. Experimental constraints from the Tevatron suggest that the masses of such heavy quarks should be larger than m_t, m_b > 250 GeV. Table 2 lists the chosen bare parameters for the lower as well as the upper Higgs boson mass bound, chosen to meet the phenomenologically interesting case. The table also reveals that the condition number of the fermion matrix rises, such that a polynomial degree of 64 is needed in order to keep the simulation cost comparable to that of the simulations performed for the standard model fermions. All results for a heavy fourth generation of fermions were obtained on a 16³ × 32 lattice, and computations on larger lattices are currently running. The aim is to keep the quark masses fixed while the cut-off is varied. In numerical simulations, where the results for observables involve statistical errors, it is a difficult task to satisfy such a condition to arbitrarily good precision. In this work the quark masses are about m_t = m_b = 676 ± 22 GeV. Finally, Fig. 3 shows the infinite-volume result for the Higgs boson mass bounds at cut-off values between 1500 GeV and 3500 GeV. It is known from e.g. [25] that the upper Higgs boson mass bound follows the functional form
(18)
Am , Bm are free fit parameters and μ is an arbitrary scale, which is set to μ = 1 TeV.
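Fitting (18) to the measured infinite-volume masses is a standard non-linear least-squares problem. The sketch below shows one way to do it; the data arrays are invented placeholders (the real data are those entering Fig. 3).

```python
# Sketch: least-squares fit of the cutoff dependence (18) of the upper
# Higgs boson mass bound; the data arrays are invented placeholders.
import numpy as np
from scipy.optimize import curve_fit

mu = 1000.0  # GeV, arbitrary scale set to 1 TeV as in the text

def m_up(Lam, A, B):
    return A * (np.log(Lam**2 / mu**2) + B) ** (-0.5)

Lam = np.array([1500.0, 2000.0, 2500.0, 3000.0, 3500.0])   # cutoffs in GeV
mH = np.array([630.0, 600.0, 580.0, 565.0, 550.0])         # placeholder masses
(A, B), cov = curve_fit(m_up, Lam, mH, p0=(600.0, 1.0),
                        bounds=([0.0, -0.5], [2000.0, 10.0]))
print(A, B)
```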
Fig. 3 The figure shows the infinite-volume extrapolation of the Higgs boson masses. The mass was extracted from the pole of the Higgs boson propagator. For direct comparison, the results obtained in previous work [16, 17] are displayed in the left plot (a). The right plot (b) shows the case of a heavy quark doublet; the quark mass is m_t = m_b = 676 ± 22 GeV
Compared to the standard model case, the relative shift in the upper Higgs boson mass bound is less than 200 GeV, and the cut-off dependence is weaker. The bound is well compatible with the logarithmic decay given in (18). All parameter sets have been computed on lattices with spatial extents of L_s ∈ {12, 16, 18, 20, 24} in order to analyze the finite-volume effects and to perform an infinite-volume extrapolation (see Table 2 for details on the simulations).
4.3 Resonance Parameters of the Higgs Boson
Our second main target is to address the unstable nature of the Higgs boson and to determine its resonance parameters. The connection between the volume dependence of two-particle energies and the scattering phase of unstable particles was demonstrated in [20, 21, 26, 27]. The main result is that there exists an effective Schrödinger equation with corresponding wave functions. Expanding these wave functions in terms of spherical harmonics yields the momentum-dependent scattering phase, which in turn can be used to compute the resonance mass. The straightforward application of this proposal requires lattice sizes of at least 40⁴ and about 10000 configurations in order to evaluate the scattering phase with the desired accuracy. The above-mentioned procedure to evaluate the scattering phase in the pure O(4) model was investigated in the mid-nineties [28]. We have now extended these results by including the fermions, which brings us closer to the standard model of elementary particles. The Higgs boson decays dominantly into any even number of Goldstone bosons, if kinematically allowed. The physical set-up chosen here always allows for such a decay.
Table 3 Bare parameters for the Monte Carlo simulations which were performed in order to determine the scattering phases. The next columns show the Higgs boson mass extracted from the propagator, the Goldstone boson mass, the quark mass, and the renormalized vev. The last column shows the cut-off Λ. The physical quantities are obtained after an extrapolation to infinite volume.

κ       | λ̂    | ŷ       | J     | M_H^p     | M_G^p    | m_t [GeV] | v_R       | Λ [GeV]
0.12950 | 0.01 | 0.36274 | 0.001 | 0.278(1)  | 0.085(2) | 174(1)    | 0.2786(3) | 883(1)
0.24450 | 1.0  | 0.49798 | 0.002 | 0.386(28) | 0.133(4) | 179(2)    | 0.1637(5) | 1503(5)
0.30200 | ∞    | 0.57390 | 0.002 | 0.405(4)  | 0.129(1) | 178(1)    | 0.1539(2) | 1598(2)
The bare parameters as well as the physical cut-off, the Higgs boson propagator mass, the Goldstone boson mass, and the obtained top quark mass are summarized in Table 3. The infinite-volume results are obtained after a linear fit to the data, starting from lattice volumes of at least 16³ × 40. This procedure reflects the method which was followed in order to determine the mass bounds of the Higgs boson [16, 17]. The Goldstone theorem ensures that the Goldstone bosons are massless. Due to an external current J which couples to one of the scalar fields in the complex SU_W(2) doublet, the symmetry is broken explicitly in the Lagrangian; the Goldstone bosons acquire a mass, and they form a vector under cubic rotations. The magnitude of the current J is chosen such that the ratio of the Higgs boson mass M_H^p to the Goldstone boson mass is roughly 3. Here and below the superscript p in M_H^p and M_G^p denotes that the mass was extracted from the analysis of the momentum-space propagator and a fit formula motivated by perturbation theory. The resonance mass, which corresponds to the physical Higgs boson mass, is obtained with the help of a correlation matrix analysis [29, 30] and a fit of the corresponding scattering phases to the generic Breit-Wigner curve. Figure 4 shows the obtained scattering phases for the three different physical situations. The scattering phase takes values in the interval [0, π] and is plotted against the momentum k. If the scattering phase δ(k) passes through π/2, this indicates the existence of a resonance. Hence, all three set-ups involve an unstable Higgs boson, and its resonance parameters are obtained by a fit of the obtained scattering phases to the Breit-Wigner function. The cross section can be decomposed into spherical harmonics and is then given by
σ(k) = (4π/k²) ∑_{j=0}^{∞} (2j + 1) sin²(δ_j(k))    (19)
     ≈ (4π/k²) sin²(δ_0(k)) .    (20)
As mentioned at the beginning of this section, the total cross section resembles a Breit-Wigner curve near a resonance. As argued before, the contribution of the higher angular momenta j > 0 is neglected. Here the Breit-Wigner function is used as a fit function in order to extract the resonance mass and the width. The first column in Fig. 4 shows the cross sections and the Breit-Wigner fit.
Fig. 4 The figure shows the scattering phases obtained in the three different physical situations for λ ∈ {0.01, 1.0, ∞}, ordered vertically. The red points refer to scattering phases obtained from the analysis in the center-of-mass frame, as originally proposed in [21]. The blue points denote the scattering phases computed within a moving frame; this modification was proposed in [31]. The vertical dotted line indicates the inelastic threshold. The computations were performed on various lattice volumes L_s³ × 40 with L_s ∈ {12, 16, 18, 20, 24, 32, 40}
Table 4 Final results for the resonance mass and the resonance width of the Higgs boson. λ̂ denotes the bare quartic coupling. The first line is a preliminary result from Chap. 8 in [15]. Λ is the cut-off of the theory. The following two columns display the resonance parameters computed from the scattering phases. Γ_H^p is the width obtained from perturbation theory, where a non-vanishing mass for the Goldstone bosons has been taken into account. Finally, the mass extracted from the propagator as well as the mass eigenvalue computed with the help of the correlation matrix are shown. The latter results were obtained after an extrapolation to infinite volume.

λ̂    | Cut-off Λ   | Resonance mass M_H | Resonance width Γ_H | Γ_H^p     | Propagator mass M_H^p | GEVP
0.01 | 593(1) GeV  | 0.428(3)  | 0.009(3) | 0.0076(2) | 0.433(3)  | —
0.01 | 883(1) GeV  | 0.2811(6) | 0.007(1) | 0.0054(1) | 0.278(2)  | 0.274(4)
1.0  | 1503(5) GeV | 0.374(4)  | 0.033(4) | 0.036(8)  | 0.386(28) | 0.372(4)
∞    | 1598(2) GeV | 0.411(3)  | 0.040(4) | 0.052(2)  | 0.405(4)  | 0.403(7)
The explicit form of the fit function is

f(k) := 16π M_H² Γ_H² / [ (M_H² − 4 m_G²) ((W_k² − M_H²)² + M_H² Γ_H²) ] .

The solid curve in the second column of Fig. 4 is then obtained by inverting equation (19), which gives the scattering phases. Finally, Table 4 summarizes the results obtained by the different approaches. The physical Higgs boson mass is compared to the mass obtained from the Higgs propagator and to the energy eigenvalues obtained with the help of a correlation matrix analysis. The latter results were obtained after an extrapolation to infinite volume.
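Extracting (M_H, Γ_H) from the measured cross section then amounts to a non-linear fit of the function above. The sketch below is our illustration only: it generates toy data at the Table 4 values and refits them, and the dispersion relation W(k) is a continuum stand-in for the actual lattice energies.

```python
# Sketch: Breit-Wigner fit for the resonance mass and width; all data are
# toy values, and W(k) uses a continuum two-particle dispersion relation.
import numpy as np
from scipy.optimize import curve_fit

m_G = 0.129  # Goldstone mass in lattice units (Table 3, third row)

def W(k):
    return 2.0 * np.sqrt(m_G**2 + k**2)   # two-particle c.m. energy

def f(k, MH, GH):
    return (16.0 * np.pi * MH**2 * GH**2 /
            ((MH**2 - 4.0 * m_G**2) * ((W(k)**2 - MH**2)**2 + MH**2 * GH**2)))

k = np.linspace(0.05, 0.35, 12)
sigma = f(k, 0.411, 0.040)                  # toy cross-section data
(MH, GH), cov = curve_fit(f, k, sigma, p0=(0.4, 0.05))
print(MH, GH)                               # recovers 0.411 and 0.040
```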
5 Summary and Outlook
With the computing time allocated to us at the Karlsruhe computing center we could establish the lower and the upper Higgs boson mass bounds, as published in [15–17]. The bounds were calculated without relying on arguments about vacuum instability or triviality, as normally used e.g. in perturbation theory. This was achieved by investigating a lattice model of the pure Higgs-Yukawa sector of the standard model. The main idea of this approach is to apply direct Monte Carlo simulations to determine the maximal interval of Higgs boson masses attainable within this model while keeping the phenomenologically known physical values (the top quark mass and the renormalized vacuum expectation value of the scalar field) fixed. To maintain the chiral character of the Higgs-fermion coupling structure on the lattice, we have considered a chirally invariant lattice Higgs-Yukawa model based on the Neuberger Dirac operator. The main result of the presented findings is that a lower Higgs boson mass bound is a manifest property of the pure Higgs-Yukawa sector that arises directly from the Higgs-fermion interaction for a given Yukawa coupling parameter. The result for the upper Higgs boson mass bound is in good agreement with renormalized perturbation theory.
The magnitude of the upper Higgs boson mass bound decreases logarithmically with rising values of the cut-off Λ; at a cut-off value of 1.5 TeV the upper bound is 630 GeV. Moreover, we have also computed the effects of a heavy fourth generation of fermions on the Higgs boson mass bounds. We find that the lower bound is significantly shifted to higher Higgs boson masses, while the upper bound is altered less, as summarized in Fig. 3. However, these results need to be corroborated by further simulations to control the finite-size effects. This will make it necessary to use lattices of size 32⁴ in the future in order to obtain a reliable infinite-volume extrapolation. Moreover, it will be important to investigate the Higgs boson mass bounds at several values of the fourth-generation quark masses and to study the cut-off dependence.
As another result of our work, we have determined the resonance parameters of the Higgs boson. Our results are shown in Fig. 4 and summarized in Table 4. We have studied the resonance parameters of the Higgs boson from small to strong values of the quartic coupling. Our results show that even for large values of the coupling the resonance width is only about 10% of the resonance mass and fully compatible with perturbation theory incorporating a non-vanishing Goldstone boson mass. It will be very interesting to extend this resonance analysis of the Higgs boson also to the case of a fourth generation of quarks.
Acknowledgments. We thank the "Scientific Supercomputing Center" in Karlsruhe for granting computing time on the HP XC4000 system. We further acknowledge the support of the "Deutsche Telekom Stiftung", which provided a Ph.D. scholarship for P.G., and the support of the DFG through the DFG project Mu932/4-1.
References
1. J. Smit. Standard model and chiral gauge theories on the lattice. Nucl. Phys. Proc. Suppl., 17:3–16, 1990.
2. J. Shigemitsu. Higgs-Yukawa chiral models. Nucl. Phys. Proc. Suppl., 20:515–527, 1991.
3. M. F. L. Golterman. Lattice chiral gauge theories: Results and problems. Nucl. Phys. Proc. Suppl., 20:528–541, 1991.
4. I. Montvay and G. Münster. Quantum Fields on a Lattice (Cambridge Monographs on Mathematical Physics). Cambridge University Press, 1997.
5. A. K. De and J. Jersák. Yukawa models on the lattice. HLRZ Jülich, HLRZ 91-83, preprint edition, 1991.
6. M. F. L. Golterman, D. N. Petcher, and E. Rivas. On the Eichten-Preskill proposal for lattice chiral gauge theories. Nucl. Phys. Proc. Suppl. B, 29C:193–199, 1992.
7. K. Jansen. Domain wall fermions and chiral gauge theories. Phys. Rept., 273:1–54, 1996.
8. M. Lüscher. Exact chiral symmetry on the lattice and the Ginsparg-Wilson relation. Phys. Lett. B, 428:342–345, 1998.
9. P. H. Ginsparg and K. G. Wilson. A remnant of chiral symmetry on the lattice. Phys. Rev. D, 25:2649, 1982.
10. T. Bhattacharya, M. R. Martin, and E. Poppitz. Chiral lattice gauge theories from warped domain walls and Ginsparg-Wilson fermions. Phys. Rev. D, 74:085028, 2006.
11. J. Giedt and E. Poppitz. Chiral lattice gauge theories and the strong coupling dynamics of a Yukawa-Higgs model with Ginsparg-Wilson fermions. JHEP, 10:076, 2007.
12. E. Poppitz and Y. Shang. Lattice chirality and the decoupling of mirror fermions. arXiv:0706.1043 [hep-th], 2007.
13. Z. Fodor, K. Holland, J. Kuti, D. Nogradi, and C. Schroeder. New Higgs physics from the lattice. PoS, LAT2007:056, 2007.
14. P. Gerhold and K. Jansen. The phase structure of a chirally invariant lattice Higgs-Yukawa model for small and for large values of the Yukawa coupling constant. arXiv:0705.2539 [hep-lat], 2007.
15. P. Gerhold. Upper and lower Higgs boson mass bounds from a chirally invariant lattice Higgs-Yukawa model. 2010.
16. P. Gerhold and K. Jansen. Lower Higgs boson mass bounds from a chirally invariant lattice Higgs-Yukawa model with overlap fermions. JHEP, 0907:025, 2009.
17. P. Gerhold and K. Jansen. Upper Higgs boson mass bounds from a chirally invariant lattice Higgs-Yukawa model. JHEP, 1004:094, 2010.
18. B. Holdom et al. Four statements about the fourth generation. PMC Phys. A, 3:4, 2009.
19. P. Q. Hung. Minimal SU(5) resuscitated by long-lived quarks and leptons. Phys. Rev. Lett., 80:3000–3003, 1998.
20. M. Lüscher. Signatures of unstable particles in finite volume. Nucl. Phys. B, 364:237–254, 1991.
21. M. Lüscher. Two particle states on a torus and their relation to the scattering matrix. Nucl. Phys. B, 354:531–578, 1991.
22. H. Neuberger. More about exactly massless quarks on the lattice. Phys. Lett. B, 427:353–355, 1998.
23. R. Frezzotti and K. Jansen. The PHMC algorithm for simulations of dynamical fermions. I: Description and properties. Nucl. Phys. B, 555:395–431, 1999.
24. U. Wolff. Monte Carlo errors with less errors. Comput. Phys. Commun., 156:143–153, 2004.
25. M. Lüscher and P. Weisz. Scaling laws and triviality bounds in the lattice phi**4 theory. 3. N component model. Nucl. Phys. B, 318:705, 1989.
26. M. Lüscher. Volume dependence of the energy spectrum in massive quantum field theories. 1. Stable particle states. Commun. Math. Phys., 104:177, 1986.
27. M. Lüscher. Volume dependence of the energy spectrum in massive quantum field theories. 2. Scattering states. Commun. Math. Phys., 105:153–188, 1986.
28. M. Göckeler, H. A. Kastrup, J. Westphalen, and F. Zimmermann. Scattering phases on finite lattices in the broken phase of the four-dimensional O(4) phi**4 theory. Nucl. Phys. B, 425:413–448, 1994.
29. M. Lüscher and U. Wolff. How to calculate the elastic scattering matrix in two-dimensional quantum field theories by numerical simulation. Nucl. Phys. B, 339:222–252, 1990.
30. B. Blossier, M. Della Morte, G. von Hippel, T. Mendes, and R. Sommer. On the generalized eigenvalue method for energies and matrix elements in lattice field theory. JHEP, 0904:094, 2009.
31. K. Rummukainen and S. A. Gottlieb. Resonance scattering phase shifts on a nonrest frame lattice. Nucl. Phys. B, 450:397–436, 1995.
Massive and Massless Four-Loop Integrals
P. Baikov, K. Chetyrkin, J.H. Kühn, P. Marquard, and M. Steinhauser
This is the report for the project ParFORM for the period June 2010 to June 2011.
1 Introduction
The major task of perturbative quantum field theory is the evaluation of so-called Feynman diagrams, which constitute an intuitive representation of physical processes in elementary particle physics. Each Feynman diagram has a one-to-one translation to a high-dimensional momentum integral to be computed, if possible analytically. Several algorithms have been suggested for the computation. All of them require huge computing resources, which can only be handled in combination with effective programs. In our applications the main part of the calculation is performed analytically with the help of the program FORM [1]. At our institute, parallel versions have been developed, ParFORM [2] and TFORM [3], which to date support the full command set of FORM. The main features of FORM have been discussed extensively in previous reports. Since August 2010 all versions of FORM are open source [4].
P. Baikov · K. Chetyrkin · J.H. Kühn · P. Marquard · M. Steinhauser
Institut für Theoretische Teilchenphysik, Karlsruher Institut für Technologie, 76128 Karlsruhe, Germany

2 Massless Four-Loop Integrals
The current precision measurements of the Z decay rate into hadrons, Γ_Z^h, have developed into an important experimental tool for a reliable determination of α_s.
Fig. 1 Sample diagrams contributing to Γ_Z^{A,S} at order α_s², α_s³ and α_s⁴. Wavy, curly, thin and thick straight lines denote photons, gluons, massless and massive quarks, respectively
From the theoretical point of view, the QCD corrections to Γ_Z^h used to be treated in close analogy to the corrections involved in studying the total cross-section of e⁺e⁻ annihilation into hadrons through a virtual photon. Within the standard model the interaction of the Z boson with quarks is described (in the lowest-order approximation in the weak coupling constant) by adding to the QCD Lagrangian an extra term of the form M_Z (G_F/√8)^{1/2} Z^α J_α⁰, with J_α⁰ = ∑_i ψ̄_i γ_α (g_V^i − g_A^i γ_5) ψ_i being the neutral current. The decay rate Γ_Z^h, including all strong-interaction corrections, may be viewed as an incoherent sum of vector (Γ_Z^V) and axial (Γ_Z^A) contributions. Technically, it is convenient to divide both parts into so-called singlet and non-singlet contributions:

Γ_Z^V = Γ_Z^{V,NS} + Γ_Z^{V,S} ,  Γ_Z^A = Γ_Z^{A,NS} + Γ_Z^{A,S} .    (1)

For three of the four contributions, namely Γ_Z^{V,NS}, Γ_Z^{A,NS} and Γ_Z^{V,S}, the corresponding two-point functions are identical (apart from a simple modification of some overall factors) to the electromagnetic-current two-point function, which in turn is uniquely related to another important physical quantity: the total cross-section of electron-positron annihilation (the so-called R-ratio). On the other hand, the axial-singlet part Γ_Z^{A,S} receives contributions from two types of diagrams: completely massless ones and those containing a virtual top quark loop [5, 6]. Some sample diagrams are shown in Fig. 1. The technique of asymptotic expansions allows one to use effectively the heaviness of the top quark and to take into account the leading, power-unsuppressed terms (that is, terms proportional to various powers of log(M_Z²/M_t²)). The resulting factorized expressions contain two main ingredients: massless propagators (more precisely, their absorptive parts) and massive vacuum diagrams (so-called tadpoles).
Until now the decay rate Γ_Z^h has been known completely to order α_s³ [6]. The important problem of calculating the next, α_s⁴, correction has been on the agenda of our group during the last 10 years. Note that there are significant technical differences between the calculation of the singlet and the non-singlet contributions. In addition, the singlet contributions are usually numerically significantly smaller than the non-singlet ones. Thus, we started our calculations with the latter. By now we have computed and published Γ_Z^{V,NS}, Γ_Z^{A,NS} as well as the (non-singlet) part of R(s) to order α_s⁴. This activity has been covered in detail in previous reports, so we will not dwell on it in what follows.
Starting from the early spring of 2010 we began to compute the two remaining ingredients, Γ_Z^{V,S} and Γ_Z^{A,S}, to order α_s⁴. We have already performed an analytical calculation of the coefficient function of the Gross-Llewellyn Smith sum rule in a generic gauge theory at order O(α_s⁴) [7]. It is demonstrated that the corresponding Crewther relation allows one to fix two of the three color structures in the O(α_s⁴) contribution to the singlet part of the Adler function. We have also computed independently the singlet part of the Adler function (which is equivalent to the computation of R^S) and have found that the Crewther relation is indeed met analytically for all rational and irrational contributions at order O(α_s⁴) (altogether about 10 independent relations), except for one piece: in one of the three different color structures a rational part of the Adler function is numerically close, but not identical, to the one predicted from the Crewther relation. We are now investigating the possible reasons for this disagreement.
For most of the compute jobs connected to the massless four-loop propagator integrals we have used 12 processors, which leads to a total of 120 processors, assuming 10 jobs in the batch queue.
3 Four-Loop Vacuum Integrals
In the second part of the project we are concerned with two different kinds of integrals. First of all, we investigated massive vacuum diagrams, also called tadpoles. These diagrams play an important role in the determination of the low-energy expansion of the vacuum polarization functions, which can be used for the extraction of the masses of the charm and bottom quarks from experimental data. The calculation of the low-energy expansion and the analysis of the available data were the central points of the last report and are by now complete, leading to the most precise determination of the masses of the heavy quarks. Since this topic has been extensively discussed in our previous reports, we will not go into further detail.
Fig. 2 Sample diagrams contributing to the calculation of the n_l² and n_l³ parts of the MS-on-shell relation. Curly, thin and thick straight lines denote gluons, massless and massive quarks, respectively
4 Four-Loop On-Shell Integrals
Let us instead discuss the new class of integrals which has become the main focus of our attention. This new class of integrals, so-called four-loop on-shell integrals, appears as building blocks in important physics applications like the MS-on-shell relation or the anomalous magnetic moment of the muon. The MS-on-shell relation allows one to relate the values of the quark masses in different renormalization schemes and is up to now only known at three-loop order. The calculation has been started and has already led to partial results, namely the calculation of the n_l³ and n_l² contributions, i.e. contributions from diagrams with at least two closed massless quark loops; see Fig. 2 for sample diagrams. This part of the calculation required only five of a total of about 100 families of integrals; nevertheless, it represents a new result and shows that our approach to computing four-loop on-shell integrals works quite well. In the course of the calculation two main obstacles have to be overcome. In all the different families the integrals have to be reduced to a small set of basis integrals, which then have to be identified and calculated separately. For the first part of the calculation, the reduction, two programs, CRUSHER and FIRE [8], exist and have to be applied to the problem. Both programs are very memory-demanding and consume a considerable amount of CPU time. The second part, the calculation of the basis integrals, can be approached in two different ways, numerically or analytically. The numerical approach, using FIESTA [9], again relies heavily on computing power, while the analytical approach requires a highly sophisticated analysis of the properties of every basis integral. Both methods will be applied and used as cross-checks. First results can probably be presented in the next report. The calculation of the anomalous magnetic moment of the muon, which, together with that of the electron, is one of the most precisely measured quantities, will follow once the calculation of the MS-on-shell relation is complete. Since it is based on the same families of integrals as the MS-on-shell relation, many results can be reused.
Acknowledgments. Most of the computations presented in this contribution were performed on the Landeshöchstleistungsrechner XC4000.
References
1. FORM version 3.0 is described in: J. A. M. Vermaseren, "New features of FORM", arXiv:math-ph/0010025; for more developments, see also: M. Tentyukov and J. A. M. Vermaseren, "Extension of the functionality of the symbolic program FORM by external software", arXiv:cs.sc/0604052; FORM can be obtained from the distribution site at http://www.nikhef.nl/~form.
2. M. Tentyukov, D. Fliegner, M. Frank, A. Onischenko, A. Retey, H. M. Staudenmaier and J. A. M. Vermaseren, "ParFORM: Parallel Version of the Symbolic Manipulation Program FORM", arXiv:cs.sc/0407066; M. Tentyukov, H. M. Staudenmaier and J. A. M. Vermaseren, "ParFORM: Recent development", Nucl. Instrum. Meth. A 559 (2006) 224; H. M. Staudenmaier, M. Steinhauser, M. Tentyukov, J. A. M. Vermaseren, "ParFORM", Computeralgebra Rundbriefe 39 (2006) 19; see also http://www-ttp.physik.uni-karlsruhe.de/~parform.
3. M. Tentyukov and J. A. M. Vermaseren, "The multithreaded version of FORM", arXiv:hep-ph/0702279.
4. http://www.nikhef.nl/form/formcvs.php.
5. K. G. Chetyrkin and J. H. Kuhn, Phys. Lett. B 308 (1993) 127.
6. K. G. Chetyrkin and O. V. Tarasov, Phys. Lett. B 327 (1994) 114 [arXiv:hep-ph/9312323].
7. P. A. Baikov, K. G. Chetyrkin, J. H. Kuhn, Nucl. Phys. Proc. Suppl. 205–206 (2010) 237–241 [arXiv:1007.0478 [hep-ph]].
8. A. V. Smirnov, JHEP 0810 (2008) 107 [arXiv:0807.3243 [hep-ph]].
9. A. V. Smirnov, M. N. Tentyukov, Comput. Phys. Commun. 180 (2009) 735–746 [arXiv:0807.4129 [hep-ph]].
Solid State Physics
Prof. Dr. Holger Fehske
The following chapter reveals that solid state physics based research has profited substantially from the supercomputing facilities of the High Performance Computing Center Stuttgart during this funding period. Out of numerous excellent projects in the area of condensed matter and materials science, six contributions, ranging from first-principles simulations of ablation and adsorption processes via the calculation of vibrational and transport properties of nanostructures to the numerical study of strongly correlated electron systems, have been selected by the reviewing committee for presentation.
The first topic concerns the ablation of particles from the surface of a bulk material by intense laser radiation. The collaborative work by the Stuttgart groups of J. Roth, J. Karlin, H.-R. Trebin and S. Sonntag from the University and Ch. Ulrich from the Fraunhofer-IWM demonstrates how the laser ablation process can be adequately simulated by large-scale molecular dynamics, even when femtosecond pulses hit metallic surfaces and the direct interaction of the laser beam with the conduction electrons has to be taken into account. Of course, a molecular dynamics calculation produces only particle trajectories, i.e., it gives no information about the formation of drops and clusters in the gas plume or of voids close to the sample's surface. In order to find such objects, an additional clustering algorithm (DBSCAN) had to be implemented. In the actual simulation of the Stuttgart group, a 60 million atom aluminum sample was considered, and the laser energy was introduced by rescaling the kinetic energy of the atoms in each time step. The threshold where ablation sets in could then be determined, and the cluster, velocity and angular distributions in the plume could be analyzed. The formation of voids and the void dynamics were studied for a sample with more than 7 million atoms in the bulk. It was found that all accessible quantities compare well to experiments, which showed that scaling invariance was present.
Institut für Physik, Lehrstuhl Komplexe Quantensysteme, Ernst-Moritz-Arndt-Universität Greifswald, Felix-Hausdorff-Str. 6, 17489 Greifswald, Germany, e-mail: [email protected]
Density functional theory (DFT) based ab-initio calculations of the structural, electronic and vibronic properties of organic molecules or molecular aggregates on crystal surfaces, of semiconductor nanostructures, and of quantum dots or nanowires are another focal point of the solid state physics section. Along this line, a first-principles analysis of the adsorption of the amino acid cysteine on the gold (110) surface has been performed by B. Höffling, K. Hannewald and F. Bechstedt from the European Theoretical Spectroscopy Facility and the University of Jena, together with F. Ortmann from the CEA Grenoble. Here the huge number of atoms and configurations and the complicated bonding processes involved call for massively parallel codes that run efficiently on present-day supercomputers. Höffling et al. used the repeated-slab supercell method to study the molecule-surface interaction for different adsorption geometries. Their DFT calculations are based on VASP, with the exchange-correlation functional taken in a generalized gradient approximation. The electron-ion interaction was modelled by the projector-augmented wave method. To gain insight into the chemical nature of the molecule-substrate interaction, the spatial rearrangement of Au atoms caused by molecular adsorption was studied. The authors point out the dominant role of the thiolate-Au bond in the determination of the adsorption geometry, despite the fact that the amino-Au bond was found to be weaker and the adsorption-induced strain in the substrate larger than for the other, energetically favorable flat adsorption geometries. These results should be of interest for other self-assembled cysteine monolayers as well.
As soon as realistic applications of semiconductor-based nanostructures or quantum dots are addressed, the effects of finite temperature, e.g. vibrations and other dynamical processes, come into play. For this reason G. Bester and P. Han from the Max Planck Institute for Solid State Research Stuttgart studied the vibrational properties of III-V semiconductor colloidal nanostructures with a hybrid MPI/OpenMP DFT implementation on the HLRS NEC Nehalem cluster. In particular, the authors investigated the effects of surface passivation for indium phosphide nanoparticles. The structure-geometry relaxation was performed using the Broyden-Fletcher-Goldfarb-Shanno procedure for the optimization of the atomic positions, the charge densities were obtained by solving the Kohn-Sham equations self-consistently in a symmetric setting, and the eigenvalues and eigenvectors of the dynamical matrix were determined by direct diagonalization. Comparing the vibrational density of states of the nanoparticles with the phonon density of states of bulk InP, some additional modes were found in the gap between the acoustic and optical phonon modes. These surface optical modes show a blue shift with decreasing size of the nanoparticle. By contrast, for the unrelaxed, unpassivated nanostructures, unpaired electrons reduce the bond strength and lead to a red shift of the vibrational modes.
A rather different, but nevertheless very interesting DFT application has been presented by W.G. Schmidt, E. Rauls, U. Gerstmann, S. Sanna, M. Landmann, M. Rohrmüller, A. Riefer and S. Wippermann (University Paderborn) on the topic "Entropy and metal-insulator transition in atomic-scale wires". Nanowires that self-assemble on a Si surface provide a fascinating model system for studying the interplay between electron-electron and electron-phonon interactions on an atomic scale.
In this project the focus was on the origin of the heavily debated (4 × 1)-metal to (8 × 2)-insulator phase transition of Si(111)-In nanowire arrays (order-disorder vs. (triple-band) Peierls instability). Again the DFT was implemented in VASP, with a reasonable scaling up to 32 CPUs. In contrast to earlier work, both the vibrational and the electronic entropy of the In wire were included in the calculations. The authors arrive at the conclusion that both the (4 × 1) and (8 × 2) configurations are stable structures; soft shear and rotational vibrations transform between the In zigzag chains at high (room) temperature and the hexagons at low temperature. The calculations show that the phase transition is caused by the gain in vibrational entropy, which at higher temperatures overcompensates the gain in band-structure energy realized when the metallic chains pass over to the semiconducting hexagons. This mechanism might apply to other quasi-1D materials with competing interactions as well.
The last category of projects deals with the investigation of strongly correlated electron systems. In solid state theory these are the most complicated systems we are aware of, quite simply because all relevant energy scales have the same order of magnitude. The lack of exact results for the prototype microscopic models has stimulated the development of unbiased numerical methods. Nowadays finite-cluster diagonalizations, quantum Monte Carlo simulations, and, especially for one-dimensional systems, density matrix renormalization group (DMRG) calculations have become powerful tools for solving the underlying complicated many-body Hamiltonians. The remaining two projects make use of this very elaborate but hard-to-parallelize DMRG technique.
P. Schmitteckert from the Karlsruhe Institute of Nanotechnology studied the transport properties of quantum devices attached to metallic leads. Exploiting time-dependent DMRG, he followed two complementary approaches: (i) the Kubo approach to obtain the conductance and (ii) a direct time-evolution approach for an initial state with charge imbalance. In great detail he analyzed the current-current correlation function for an interacting resonant level model at its self-dual point. The fluctuations then allow for the calculation of the noise power spectrum (in the low-frequency limit). The framework could be extended to access the so-called full counting statistics via the cumulant generating function. Most strikingly, P. Schmitteckert proved the existence of a negative differential conductance regime even for the single-resonant-level model. Moreover, he argued that color-charge separation takes place in cold SU(3) fermionic systems confined in 1D optical lattices.
The 2D t-J model constitutes a paradigm for the theoretical description of high-temperature superconductivity. But its 1D version also allows deep insight into the nature of rather exotic states, such as the spin-gap phase, which is of importance for the 2D case as well. Advantageously, the 1D t-J model can be investigated by DMRG on a sequence of fairly large lattices, which allows one to extrapolate the results within a controlled finite-size scaling to the thermodynamic limit. A. Moreno, A. Muramatsu (University Stuttgart) and S. Manmana (JILA, University of Colorado) carefully explored the ground-state properties of the 1D t-J model.
The project partners determined the phase boundaries between metallic, singlet superconducting spin-gapped, gapless superconducting, and phase-separated states by DMRG, calculating density-density correlation functions, the structure factor, the Luttinger liquid
anomalous dimension, the compressibility, and the spin gap with extreme precision. Their large-scale numerics make it possible to resolve several controversial issues, related, e.g., to the extent of the spin-gap region or to the appearance of particle- and hole-rich domains which, in comparison to previous studies, were found to be stable only at higher values of the antiferromagnetic spin-exchange coupling. In summary, in the field of computational solid state physics a variety of interesting activities have been put into effect. Without a doubt, the presented projects were of high quality, demonstrating the large impact of supercomputing in this area.
Laser Ablation of Aluminium: Drops and Voids
Johannes Roth, Johannes Karlin, Christian Ulrich, Hans-Rainer Trebin, and Steffen Sonntag
1 Introduction

In [16] we have presented a general introduction to the process of laser ablation, its simulation by the molecular dynamics method, and results for aluminium and a complex metallic alloy. Here we will concentrate on how the drops or clusters and the voids which form during laser ablation can be simulated. Laser ablation is a process where material is removed from the surface of a solid or liquid by very intense laser radiation. We are especially interested in femtosecond pulses on metals, where the direct interaction of the laser beam with the conduction electrons plays an important role and must in general be described by a two-temperature model with separate temperatures for the electrons and the atoms. For details on the two-temperature model and its implementation in conjunction with molecular dynamics see [16]. We will not apply this model here, for reasons explained later. To actually drill a hole with femtosecond pulses requires of the order of ten thousand pulses and time frames up to seconds. This is far beyond the scope of atomistic simulations. But we can treat each laser pulse separately, since the time between two pulses is so large that the material cools down; the only difference is the modified surface. Here we report on large-scale ablation simulations and the analysis of the size and angular distribution of single atoms forming a gas phase together with a fraction of droplets. A second part covers the formation of voids in the irradiated region and their temporal evolution.

Johannes Roth · Johannes Karlin · Hans-Rainer Trebin · Steffen Sonntag
Institut für Theoretische und Angewandte Physik, Universität Stuttgart, Stuttgart, Germany, e-mail:
[email protected]
Christian Ulrich
Fraunhofer-Institut für Werkstoffmechanik, Freiburg im Breisgau, Germany, e-mail: [email protected]
2 Method

The effectiveness of laser ablation and the precision of the generated hole depend directly on the phenomena observed. If drops are formed, they can fall back and stick to the surface, which is regarded as pollution, or they can stay in the hole and fill it up. Void formation, on the other hand, may be a desired effect for patterning surfaces. This was our motivation to study this process. For the characterization of the ablated material as a function of the angular distribution we had to illuminate the sample inhomogeneously, with a Gaussian spatial distribution. The electron heat conductivity, however, is so large that it would completely smear out the spatial distribution in our sample, which is large, but still small on an experimental size scale. Thus we had to resort to a simpler method, the so-called rescale method, which neglects the electron contribution completely. This is certainly a serious shortcoming, but it cannot be avoided currently. A molecular dynamics simulation produces only particle trajectories and generates no information about drops or clusters and voids. To find these objects, one has to use an algorithm to figure out which atoms form a cluster and which places are void of atoms. One can actually use the same algorithm for both problems. First we applied the public domain program rapidminer, which features the DBSCAN algorithm. This program, however, runs only serially and is rather slow for large datasets. Therefore we implemented our own version of the algorithm. The article is organized as follows: first we shortly describe the methods to find clusters and voids. Then we present the most interesting results. At the end we also present some benchmark results and close with a summary. More details can be found in the theses by S. Sonntag [17] and J. Karlin [9] and in [18].
3 Clustering Algorithm

3.1 Clusters

Clusters of very different sizes can develop in the plume generated by laser ablation. We used the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm [5, 6] to analyze the cluster size distribution. DBSCAN is a hierarchical algorithm which divides and merges a data set iteratively into a tree structure. The problem is the termination condition, i.e. when the dividing and merging should stop. This condition could be a critical distance dcrit between the clusters: if dcrit is too small, large clusters are split; it should not be too large, so that different nearby clusters can still be distinguished. A comparison with a number of other algorithms shows several advantages of DBSCAN: (i) it does not require a priori knowledge of the number of clusters in a volume, (ii) the shape of the clusters is arbitrary, (iii) noise is permitted, and (iv) it performs well for large data sets. In our case noise corresponds to clusters consisting of exactly one atom. These
“clusters” form the gas phase of the ablated material. Two atoms are already considered a proper cluster if the distance between the atoms is below 0.35 nm. Point (iv) holds because the density of points inside a cluster is much higher than outside of it, which is exactly the property the DBSCAN algorithm exploits. In general the minimum number of points MinPts forming a proper cluster can be taken larger than one; for the voids, for example, it was 5. The algorithm was applied to the coordinates of all atoms above the sample surface, which means that the bulk of the sample is not considered. It clusters the data points according to a minimum number of neighbor points, which means that a lower threshold density exists. Pairs of points are classified as directly density reachable, density reachable, or density connected if their distance is below the threshold, if they are connected by a chain of points with the first property, or if a point exists through which both are density reachable, respectively. A cluster is then an object which is maximal with respect to density reachability and in which all points are at least density connected [5]. For the actual analysis we first used the public domain program rapidminer available at http://rapid-i.com/ [14]. Rapidminer is a tool for data mining, i.e. for extracting patterns from data. It has a powerful graphical user interface and an on-the-fly error recognition system, and it can handle all kinds of input data without preprocessing. Rapidminer can be applied to a large variety of data mining problems. For our purposes we did the following: (i) read the data with the csv reader, (ii) scan the data sets in xyz format for clusters using the DBSCAN algorithm; after the analysis has been carried out, each atom is marked with a new attribute CN called clusterID, and two atoms have the same CN if they belong to the same cluster, (iii) convert the attributes from strings into numbers, which makes post-processing with awk much easier, (iv) write the data in xyz, id format. The drawback of rapidminer is that it is programmed in Java and lacks efficiency. Clustering a dataset of 10⁶ atoms, for example, takes up to 7 days; the same holds for visualization. So rapidminer cannot be applied in a user-friendly way to more than 10⁵ atoms. This was the reason why we produced our own implementation of the DBSCAN algorithm, since the plumes produced in our simulations contained up to 10⁶ atoms. A second reason was that rapidminer cannot treat periodic boundary conditions, which are essential for molecular dynamics simulations, and thus could not combine clusters or voids at opposite sides of the samples.
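To make the procedure concrete, the following minimal sketch (in Python, not the authors' actual tool chain) reproduces the cluster criterion described above with an off-the-shelf DBSCAN; the file name and the column layout of the xyz dump are assumptions, and periodic boundaries are ignored here.

```python
# Cluster analysis of a plume snapshot, assuming (N, 3) coordinates in nm.
import numpy as np
from sklearn.cluster import DBSCAN

coords = np.loadtxt("plume.xyz", skiprows=2, usecols=(1, 2, 3))  # hypothetical dump

# eps = 0.35 nm: two atoms closer than this belong to the same cluster;
# min_samples = 1 lets even a single atom form a "cluster" (gas phase).
labels = DBSCAN(eps=0.35, min_samples=1).fit_predict(coords)

sizes = np.bincount(labels)          # atoms per cluster
gas_atoms = np.sum(sizes == 1)       # monomers = gas phase
print(f"{labels.max() + 1} clusters, {gas_atoms} gas-phase atoms")
print("largest cluster:", sizes.max(), "atoms")
```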
3.2 Voids

Voids are the opposite of clusters: under certain conditions they are generated close to the surface of the sample and are in general completely empty. Thus we cannot classify them directly by looking at data points. Clustering the surrounding material instead would lead to a prohibitively large overhead and would tell us nothing about the individual voids.
We came up with a simple yet effective method for detecting voids: overlaying the whole sample with a simple cubic grid and deleting all grid vertices which are too close to existing atoms. The remaining grid points are exactly the objects that represent the voids. To run the algorithm effectively in linear time we can make use of the link-cell method already implemented in IMD. There is one free parameter, namely the size of the grid cells. If it is too large, the void shape will be represented rather crudely. If it is too small, the grid points will leak into the interstitial regions between the atoms. Tests of different grid sizes showed that the optimal grid cell length is about the same as the lattice constant.
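A minimal sketch of this grid-based void detection might look as follows, assuming an orthorhombic box and using a k-d tree in place of IMD's link-cell search; the grid spacing a (set here to the aluminium lattice constant) and the occupation cutoff are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree
from sklearn.cluster import DBSCAN

def find_voids(coords, box, a=0.405, cutoff=0.3):
    """coords: (N, 3) atom positions; box: (3,) box lengths (same unit as a);
    a: grid spacing ~ lattice constant; cutoff: max atom-vertex distance."""
    axes = [np.arange(0.0, L, a) for L in box]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    # keep only vertices farther than `cutoff` from every atom
    dist, _ = cKDTree(coords).query(grid)
    empty = grid[dist > cutoff]
    # group surviving vertices into individual voids; MinPts = 5 as in the text
    labels = DBSCAN(eps=1.1 * a, min_samples=5).fit_predict(empty)
    return empty, labels  # void volumes follow as (vertices per void) * a**3
```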
4 Simulation Results

4.1 Inhomogeneous Ablation Simulations on Aluminium

Here we will present results of a quantitative study of the plume generated by an inhomogeneous laser beam. The ablated particles leave behind a crater in the bulk material. For practical reasons the system size was limited to 60 million atoms: a single snapshot requires about 6 GB of disc space, and the tools for data manipulation like MegaMol [7] are at the limit of handling samples that large. A second reason is the computing time, which is of the order of a million CPU hours. A 60 million atom sample has a cross section of 101 × 101 nm² and a length of 108 nm. This is too small to apply the two-temperature model with an inhomogeneous beam, since the high electron conductivity would immediately remove all inhomogeneities. Therefore we introduced the laser energy by rescaling the kinetic energy of each atom i at each time step δt:

E^i_kin(t + δt) = E^i_kin(t) + (δt/ρ) S(x_i, t)

with the laser energy distribution

S(x, t) = (1 − R) S₀ α e^(−αx) I(y, z) e^(−(t−t₀)²/(2σ_t²))

and the spatial distribution

I(y, z) = e^(−(y²+z²)/ω²).
Here ρ is the number density of the sample, R the reflectivity, S₀ the amplitude factor, α the inverse absorption length, σ_t the pulse duration, and ω the half beam width. The pulse duration was set to σ_t = 100 fs, α⁻¹ to 8.0 nm, and the laser fluences were chosen equidistantly between 1373 J/m² and 3204 J/m². The values may seem high, but one has to take into account that they correspond to the peak intensities at the very center of the profile. For all simulations ω was set to 15 nm.
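The rescale method amounts to a per-atom velocity rescaling at each time step. The sketch below illustrates the equations above under the stated parameters; the amplitude factor S0, the pulse peak time t0, the number density value, and the array layout are placeholders, not values from the actual simulations.

```python
import numpy as np

R, S0 = 0.86, 1.0e21          # reflectivity; amplitude factor (S0 hypothetical)
alpha = 1.0 / 8.0             # inverse absorption length, 1/nm
sigma_t, t0 = 0.1, 0.3        # pulse width and peak time, ps (t0 assumed)
omega, rho = 15.0, 60.2       # half beam width, nm; number density, atoms/nm^3

def laser_source(x, y, z, t):
    """Energy deposited per atom and unit time, S(x, t) of the text."""
    return ((1.0 - R) * S0 * alpha * np.exp(-alpha * x)
            * np.exp(-(y**2 + z**2) / omega**2)
            * np.exp(-(t - t0)**2 / (2.0 * sigma_t**2)))

def rescale_step(pos, vel, mass, t, dt):
    ekin = 0.5 * mass * np.sum(vel**2, axis=1)
    dE = (dt / rho) * laser_source(pos[:, 0], pos[:, 1], pos[:, 2], t)
    # scale each velocity so the new kinetic energy is ekin + dE
    scale = np.sqrt((ekin + dE) / np.maximum(ekin, 1e-12))
    return vel * scale[:, None]
```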
4.1.1 Ablation Threshold

For the given parameters we determined the threshold where ablation sets in and obtained a value of F_th = (1137 ± 166) J/m², together with a penetration depth α⁻¹ = (8.6 ± 1.7) nm, very close to the optical penetration depth [17]. The reason for this behavior is the missing electron heat conductivity. Ablation simulations have then been carried out at five fluences above the threshold.
4.1.2 Cluster Distribution in the Plume

An average cloud of particles contains between 250,000 and 850,000 atoms in the setups studied in our simulations. Figure 1 shows that most of the clusters no longer connected to the filaments have spherical shape and can therefore be characterized by an average radius R_N = ∛N_atoms. The largest clusters in the plume contain approximately 100,000 atoms, corresponding to a radius of r = 7–8 nm. Figure 2 summarizes our results. It is quite obvious that we observe a bimodal distribution. We followed the common analysis procedure in which the yield distribution is described by two power laws Y(N) ∝ N^(−δ) [11, 22]. There is no clear cut between the two ranges; thus we fit the data in the ranges N < 6 and N > 6 separately. The exponents δ range from 3.9 to 5.1 in the small cluster region and from 0.3 to 0.9 in the large cluster region. These values are, however, very sensitive to the fitting intervals and are therefore only rough estimates. Sputtering experiments are somewhat comparable to laser ablation; they yield values for δ of 7.7 [21] or 9.3 [3]. Other simulations by Zhigilei [22] give values from 1.21 to 1.31 for the large cluster regime.
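The two-range power-law fit can be carried out by linear regression in log-log space, as in the following sketch; `sizes` is assumed to be the per-cluster atom count from the clustering step, and the exact bin boundaries beyond the N = 6 split are assumptions.

```python
import numpy as np

def fit_delta(sizes, nmin, nmax):
    """Fit Y(N) ∝ N^(-δ) over nmin <= N <= nmax; returns δ."""
    N, Y = np.unique(sizes[(sizes >= nmin) & (sizes <= nmax)],
                     return_counts=True)
    slope, _ = np.polyfit(np.log(N), np.log(Y), 1)  # log Y = -δ log N + c
    return -slope

# delta_small = fit_delta(sizes, 2, 6)       # expected ~3.9-5.1
# delta_large = fit_delta(sizes, 6, 100000)  # expected ~0.3-0.9
```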
Fig. 1 Snapshots of a 60-million-atom simulation 4, 40, and 70 ps after a 100 fs laser pulse hits the surface. The colors map the kinetic energy of the atoms. For visualization MegaMol [7] was used
Fig. 2 Cluster distribution for five different fluences between 1373 J/m² and 3204 J/m². For the ablation yield Y(N) a power law Y(N) ∝ N^(−δ) can be found, with different δ values in the low-mass and the high-mass (N > 10) ranges
4.1.3 Velocity Distribution of the Plume

From the molecular dynamics simulation data the velocity distribution function of the plume can be calculated (Fig. 3). The temperature of the plume is estimated by averaging the kinetic energy. Due to the directional expansion, the center-of-mass velocity of the clusters has to be subtracted. We find that the gas phase has nearly the same temperature for all pulses. A fit to a Maxwell-Boltzmann distribution shows that all temperatures lie within an error of 200 K; thus the temperatures for the different fluences cannot be distinguished. The mean temperature obtained is T = 3107 K. A slight deviation in the regions with small and large velocities can be seen. This is usually accounted for by a modified distribution function used to describe the ablated particles. The fastest particles move at about 6 km/s. In experiments velocities between 3 and 30 km/s have been reported [15].
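A sketch of the temperature fit, assuming the gas-phase speeds (with the cluster center-of-mass motion already removed) are available as an array in m/s; the bin count and starting values are assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.constants import k as kB, atomic_mass

m_Al = 26.98 * atomic_mass  # mass of an aluminium atom, kg

def maxwell_boltzmann(v, T, A):
    # speed distribution f(v) = A v^2 exp(-m v^2 / (2 k_B T))
    return A * v**2 * np.exp(-m_Al * v**2 / (2.0 * kB * T))

def fit_temperature(speeds):
    hist, edges = np.histogram(speeds, bins=100, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    (T, A), _ = curve_fit(maxwell_boltzmann, centers, hist, p0=(3000.0, 1e-9))
    return T  # expected ≈ 3107 K for the pulses studied in the text
```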
4.1.4 Angular Distribution in the Plume

The angular distribution function of the clusters with respect to the angle ϑ between their flight vector and the surface normal can be calculated from the same simulation data. This is similar to experiments where the ablated material is collected on a metal foil above the experimental setup. Such ablation is carried out in vacuum with a Gaussian-like intensity distribution and is thus close to our simulation. Unfortunately there are no such data for aluminium [4, 10]. All data points can be fitted well by the functions also used in experiment [10]. A first case contains the sum of two distributions

f₁(ϑ) = r [a cos^m(ϑ) + (1 − a) cos^n(ϑ)]  (m < n),   (1)
Fig. 3 Velocity distribution calculated from MD simulations for the different fluences. For comparison a 3107 K Maxwell-Boltzmann velocity distribution is shown
where r is the value at ϑ = 0 and a describes the ratio of the cos^m(ϑ) component to the whole distribution. The values of m range from 1 to 4 in experiment and from 0.65 to 7 in the simulations; the values of n are between 3 and 24 in experiment and between 8 and 53 in the simulations. A second equation used to interpret experimental data can be derived from a model based on gas dynamics [1, 2, 20]:

f₂(ϑ) = r (1 + tan²(ϑ))^(3/2) / (1 + k² tan²(ϑ))^(3/2).   (2)
The fitting parameter k denotes the ratio of the cloud front extension along the surface normal to that along the surface, k = X_inf/Y_inf. While for small angles both functions fit the data set reasonably well, for angles larger than 1 rad (2) leads to a smaller error in general, see Fig. 4. In the literature, values between 1.1 and 3.2 are reported for k, while we find values between 1.6 and 3.2. The dependence of k on the applied fluence can be fitted well by k(F) = χ √(F − F₀) + k₀ with χ = (38 ± 2) · 10⁻³ m/√J, k₀ = 1.68 ± 0.05, and F₀ = (1373 ± 96) J/m². In general we find a more directed ablation process for higher fluences, as reflected by the decreasing a and the larger value of k. This has also been reported in experiment, at least for ions [4], while for uncharged particles the opposite is reported.
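Both fit functions are straightforward to evaluate and fit numerically, e.g. as in the following sketch; `theta` (in rad) and `counts` are assumed to be the binned angular data, and the starting values are guesses.

```python
import numpy as np
from scipy.optimize import curve_fit

def f1(theta, r, a, m, n):
    # Eq. (1): sum of two cosine powers
    return r * (a * np.cos(theta)**m + (1 - a) * np.cos(theta)**n)

def f2(theta, r, k):
    # Eq. (2): gas-dynamics model
    t2 = np.tan(theta)**2
    return r * ((1 + t2) / (1 + k**2 * t2))**1.5

# p1, _ = curve_fit(f1, theta, counts, p0=(counts[0], 0.5, 2.0, 10.0))
# p2, _ = curve_fit(f2, theta, counts, p0=(counts[0], 2.0))
```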
4.2 Void Distribution in Homogeneous Ablation Simulations

Here we present a study of the voids generated in the ablation process. The sample contained more than 7 million atoms in a volume of 211 × 27 × 27 nm³. The laser fluence studied in detail was 572 J/m²; the reflectivity R was 0.86.
Fig. 4 Angular distributions for different fluences between 1373 J/m² and 3204 J/m². The line shows the fit to (1). The inset shows a direct comparison of the two analytic functions f₁ and f₂. In general (2) describes the data better than (1). The red crosses exemplarily show one azimuthal distribution f̂(ϕ) for the lowest fluence, while the other symbols denote the ϑ dependence
Pulse durations σ_t have been varied between 100 fs and 1 ps. All simulations were carried out at 300 K. The simulation time was 60 ps, i.e. 60,000 time steps. Periodic boundary conditions have been applied in the lateral directions, whereas open boundaries were used along the direction of the laser beam. Since no damping was applied at the back end, essentially a thin film was simulated. Besides the heating, the laser pulse also generates tensile waves which travel through the sample at the velocity of sound, 6734 m/s (experimental value 6420 m/s), and are reflected as pressure waves at the back end. In general, the reflected wave will interact with the void formation. In our simulations this does not matter, since the wave returns only after the voids have decayed. Therefore the lifetime of the voids is governed by their intrinsic properties and not by the interaction with the pressure wave.
4.2.1 Void Dynamics

The most interesting case occurs at an applied laser fluence of 572 J/m², due to the formation of an unstable ablation layer. There are three competing processes: (i) the formation of voids, (ii) the growth of voids by coalescence, and (iii) the shrinkage of the voids caused by the layer falling back onto the bulk. The voids form within about 1 ps at around 5 ps, for long pulses a little later. The number of voids decays following a Lorentz law, and the decay time increases with the pulse duration from 40 ps for the 100 fs pulses up to 55 ps for 400 to 600 fs pulses, and then decreases to 25 ps for the 1000 fs pulses. A similar behavior is obtained for the total volume of the voids and for the volume of the largest void.
Fig. 5 Evolution of the entire void volume in time
Figure 5 shows the volume evolution of the voids. It is a bimodal distribution, with a second maximum at about 30 ps, slightly increasing with time, and a first maximum around 15 to 20 ps, overlapping with the second maximum. In most cases the overlap causes a minimum at about 25 ps. The second maximum is generated by the unstable detachment of a layer from the bulk and its subsequent fall back. The first maximum is formed by the nucleation and coalescence of small voids. These results are in good agreement with similar simulation work in two dimensions [12, 13] and three dimensions [8]. A similar picture emerges if we concentrate only on the void with the largest volume. Again a bimodal distribution is observed. The growth rate drops to zero after the time of the minimum, which tells us that the coalescence of voids has terminated. For pulse durations shorter than 400 fs the volume of the largest void continuously increases with pulse duration. Figure 6 depicts the evolution phases of the voids: in the beginning (represented at 11 ps) there are many voids of all sizes. Then the small voids merge to form larger voids (represented here at 25 ps), and finally only large voids are left (for example at 44 ps), which then shrink in the last phase until the ablation layer has reattached to the bulk.
Fig. 6 Evolution of the voids from 11 to 25 and 44 ps (from left to right)
4.2.2 Larger Fluences

We find that at a laser fluence of 572 J/m² the formation of an ablation layer is unstable: the voids grow and shrink again. If the fluence is increased slightly to 687 J/m², stable spallation occurs: voids form, and a layer of material is removed from the bulk irreversibly, moving away at about 121 m/s.
4.2.3 Summary

The thickness of the ablated layer increases for pulse durations below 400 fs, from 4.5 nm at 100 fs up to 9 nm at 400 fs. The temperature of the layer is close to or above the boiling temperature at the front of the sample. Thus the layer is liquid, and the voids are empty, as was also found in [8]. The formation and decay of the voids is therefore a mechanical process in the molten phase of aluminium.
5 Performance

5.1 Benchmarks

General benchmark data for IMD have been given by Stadler et al. [19]. The data show that IMD scales almost linearly in weak scaling (same number of atoms per processor) and fairly well in strong scaling (total number of atoms constant, hence growing communication load).
5.2 Performance of the Cluster Distribution Simulation

Here we present some timing data for the big quantitative cluster distribution simulation. For these simulations the samples contained 60 million atoms, and each sample
was studied at 5 different fluences (15–35 eV/Å² in equal steps). Each condition was treated with 5 different initial conditions to get better statistics, and the last simulation was run with the additional option STRESS. At the beginning, all simulations run equally fast: a 12 h run on 512 cores yields 40 ps of sample time, equal to 40,000 iteration steps. Since the sample gets more and more inhomogeneous, the simulation becomes slower and slower, yielding only 30 ps per 12 h run at the end. The option STRESS requires some additional computing power, leading to 35 ps of sample time already in the first run. Usually the sample is cut off at a certain level in the bulk after a cumulative simulation time of 200 ps, since the simulations slow down, resulting in 25 ps per 12 h run on 512 cores at the end. In the ideal case the total study would have taken T_total = 5 fluences × 5 initial conditions × 5 runs of 12 h each (5 × 40 ps = 200 ps) = 1500 h on 512 cores = 768,000 CPU hours. A time step takes 2.02 × 10⁻⁶ s per core, step, and atom. Due to non-optimal runs and an additional simulation of 50 ps of the gas phase at the end, we estimate that a total of 1 million CPU hours was consumed.
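For transparency, the ideal-case budget quoted above can be checked in two lines:

```python
# Quick arithmetic check of the ideal-case CPU-hour budget.
fluences, samples, runs, hours_per_run, cores = 5, 5, 5, 12, 512
wall_hours = fluences * samples * runs * hours_per_run  # 1500 wall-clock hours
print(wall_hours * cores)                                # 768000 CPU hours
```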
6 Summary

We presented results of ablation simulations with inhomogeneous laser beams. It has been shown that all accessible quantities compare well to experiments. This is remarkable, since our large-scale simulation is still an order of magnitude smaller in the lateral direction than the smallest experiments; it indicates that a scaling invariance is present. It is certainly debatable whether our simulations capture all the complex processes occurring in the expanding plume, since we could not account for plasma formation.
References

1. Amoruso, S., Toftmann, B., Schou, J., Thermalization of a UV laser ablation plume in a background gas: From directed to a diffusion-like flow, Phys. Rev. E 69 1–6 (2004).
2. Anisimov, S.I., Luk'yanchuk, B.S., Luches, A., An analytical model of three-dimensional laser plume expansion into vacuum in hydrodynamic regime, Appl. Surf. Sci. 96–98 24–32 (1996).
3. Coon, S., Calaway, W., Pellin, M., White, J., New findings on the sputtering of neutral metal clusters, Surf. Sci. 298 161–172 (1993).
4. Donnelly, T., Lunney, J., Amoruso, S., Bruzzese, R., Wang, X., Ni, X., Angular distributions of plume components in ultrafast laser ablation of metal targets, Appl. Phys. A 100 569–574 (2010).
5. Ester, M., Kriegel, H.P., Sander, J., Xu, X., Density-based clustering in spatial databases: The algorithm GDBSCAN and its applications, Data Mining and Knowledge Discovery 2 169–174 (1998).
6. Fisher, D., Fraenkel, M., Henis, Z., Moshe, E., Eliezer, S., Interband and intraband (Drude) contributions to femtosecond laser absorption in aluminum, Phys. Rev. E 65 1–8 (2001).
7. Grottel, S., Reina, G., Vrabec, J., Ertl, T., Visual verification and analysis of cluster detection for molecular dynamics, IEEE Trans. on Visual. and Comp. Graph. 13 1624–1631 (2007).
8. Ivanov, D.S., Zhigilei, L.V., Combined atomistic-continuum modeling of short-pulse laser melting and disintegration of metal films, Phys. Rev. B 68 064114 (2003).
9. Karlin, J., Formation of Voids in Laser-Irradiated Aluminium, Diploma Thesis, Stuttgart (2011).
10. Konomi, I., Motohiro, T., Asaoka, T., Angular distribution of atoms ejected by laser ablation of different metals, J. Appl. Phys. 106 013107 (2009).
11. Leveugle, E., Zhigilei, L.V., Molecular dynamics simulation study of ejection and transport of polymer molecules in matrix-assisted pulsed laser evaporation, J. Appl. Phys. 102 074914 (2007).
12. Lewis, L.J., Perez, D., Laser ablation with short and ultrashort laser pulses: Basic mechanisms from molecular dynamics simulations, Appl. Surf. Sci. 255 5101–5106 (2009).
13. Lewis, L.J., Perez, D., Molecular dynamics study of ablation of solids under femtosecond laser pulses, Phys. Rev. B 67 184102 (2003).
14. Mierswa, I., Wurst, M., Klinkenberg, R., Scholz, M., Euler, T., YALE: Rapid Prototyping for Complex Data Mining Tasks, in KDD'06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, eds. Ungar, L., Craven, M., Gunopulos, D., Eliassi-Rad, T., ACM, New York, 2006, pp. 935–940.
15. Okano, Y., Oguri, K., Nishikawa, T., Nakano, H., Observation of femtosecond-laser-induced ablation plumes of aluminium using space- and time-resolved soft x-ray absorption spectroscopy, Appl. Phys. Lett. 89 221502 (2006).
16. Roth, J., Trichet, C., Trebin, H.-R., Sonntag, S., Laser Ablation of Metals, in High Performance Computing in Science and Engineering '10, eds. Nagel, W.E., Kröner, D.B., Resch, M.M., Springer, Heidelberg, 2011, pp. 159–168.
17. Sonntag, S., Computer Simulations of Laser Ablation from Simple Metals to Complex Metallic Alloys, PhD Thesis, Stuttgart (2011).
18. Sonntag, S., Trichet, C., Roth, J., Trebin, H.-R., Molecular dynamics simulations of cluster distribution from femtosecond laser ablation in aluminum, Appl. Phys. A 104 559–565 (2011).
19. Stadler, J., Mikulla, R., Trebin, H.-R., IMD: A software package for molecular dynamics studies on parallel computers, Int. J. Mod. Phys. C 8 1131–1140 (1997).
20. Toftmann, B., Schou, J., Lunney, J., Dynamics of the plume produced by nanosecond ultraviolet laser ablation of metals, Phys. Rev. B 67 1–5 (2003).
21. Wucher, A., Wahl, M., The formation of clusters during ion induced sputtering of metals, Nucl. Instr. Meth. Phys. Res. B 115 581–589 (1996).
22. Zhigilei, L.V., Dynamics of the plume formation and parameters of the ejected clusters in short-pulse laser ablation, Appl. Phys. A 76 339–350 (2003).
Cysteine on Gold: An ab-initio Investigation
B. Höffling, F. Ortmann, K. Hannewald, and F. Bechstedt
Abstract We present a first-principles analysis of the adsorption of the amino acid cysteine on the Au(110) surface. We carry out density functional theory calculations using the repeated-slab supercell method to investigate the molecule-surface interaction. We investigate the adsorption for four different adsorption geometries: one upright configuration, in which the molecule binds to the surface solely via the deprotonized thiolate head group, and three flat configurations, which form an additional bond via the amino side group. We analyze the bonding energy, the charge redistribution, and the changes in the density of states. We find that a flat geometry with the Au-thiolate bond at an off-bridge site and the Au-amino bond close to the Au top site is energetically favored. The electron redistributions exhibit the combined characteristics of the isolated bonds found in earlier studies, supporting the view of a strongly localized interaction between the functional groups and the metal surface. The electrostatic nature of the Au-amino bond and the covalent character of the Au-thiolate bond are still visible in the adsorption of the complete molecule.
1 Introduction

Recently, there has been much interest in the adsorption of organic molecules on crystal surfaces, especially in metal/organic interfaces [1–9], which are of great importance for the development of molecular electronics [10–12] or molecular spintronics [13]. Organic molecules exhibit a large variety of different functional groups, which can be exploited for the organic functionalization of various substrates, making use

B. Höffling · K. Hannewald · F. Bechstedt
European Theoretical Spectroscopy Facility (ETSF) and Institut für Festkörpertheorie und -optik, Friedrich-Schiller-Universität Jena, Max-Wien-Platz 1, 07743 Jena, Germany, e-mail:
[email protected]
F. Ortmann
CEA Grenoble, INAC/SPRAM/GT, 17 Rue des Martyrs, 38054 Grenoble Cedex 9, France
of their unique electronic properties as revealed by single molecule conductance experiments [14] or reflectance anisotropy spectroscopy [15]. Theoretical ab-initio investigations of molecular adsorption using high performance computers have greatly advanced the understanding of these complex processes. The huge number of atoms and configurations as well as the intricate nature of the bonding processes involved create a high demand for massively parallel machines and for codes that run efficiently on those machines. These simulations can yield atomistic bonding models that can be used for calculating spectroscopic properties which may be measured experimentally. Surface functionalization can be achieved by a large variety of functional groups, such as the amino group, which, for instance, allows for modifications of the metal work function. From this viewpoint, cysteine is of particular interest. The adsorption of cysteine on metal surfaces has been the subject of intense experimental [15–20] and theoretical [21–24] investigations. Since cysteine contains an -SH thiol head group as well as an NH2 amino group, its adsorption behavior on Au surfaces is rather complex and hence interesting. While the bonding of cysteine on Au(111) has been investigated and analyzed theoretically [21, 22], prior to our work [23–25] the interaction of cysteine with the Au(110) surface was rather unexplored. In particular, the question of the precise bonding geometry, of the possible formation of a bond via the amino group, and of its interplay with the thiolate bond in the formation of a flat or more vertical adsorption configuration is still under discussion. This article tries to clear up this question by a detailed analysis of the properties of different molecule-substrate geometries. After introducing the methodology in Sect. 2, we present the results in Sect. 3. We study the atomic geometry of different adsorption configurations in Sect. 3.1. The binding energies with respect to the non-bonded molecule and the deprotonized cysteine radical are investigated in Sect. 3.2. The energetic cost of surface rearrangements due to adsorbate-substrate interaction is considered in Sect. 3.3. An investigation of adsorption-induced changes in the electronic states in Sect. 3.4 and of charge rearrangements in Sect. 3.5 rounds off the characterization of the molecule-surface interaction. Finally, in Sect. 4, we summarize and draw conclusions.
2 Computational Methods

The calculations are performed in the framework of density functional theory (DFT) using the Vienna ab initio simulation package (VASP) [26, 27]. The exchange and correlation (XC) functional is calculated in the generalized gradient approximation (GGA) as parametrized by Perdew and Wang [28, 29]. The GGA treatment has been shown by Maul et al. [30, 31] to be superior to the local density approximation for the description of amino acid bonds. In contrast to previous adsorption studies [32, 33] we did not explicitly include long-range correlation, because we expect such contributions to be of minor importance in the present case. The electron-ion interaction is modelled within the projector-augmented wave (PAW) method [34].
Fig. 1 a Used notation for the atoms of cysteine. b Unreconstructed Au(110) surface unit cell. High-symmetry positions are the top site (1), the [001] bridge site (2), the [1̄10] bridge site (3), and the [1̄10] off-bridge site (4)
The electron wave functions are expanded in plane waves up to a cutoff energy of 500 eV. This allows for an accurate treatment of the first-row elements as well as of the Au 5d electrons. The Brillouin zone (BZ) integration is modelled as a sum over Monkhorst-Pack points [35]. The supercells consist of material slabs of five atomic layers and a vacuum region equivalent to ten atomic layers. The lateral cell size is the Au(110) 4×4 surface cell; this size was chosen to minimize the interaction between the adsorbed molecules. Energy convergence with respect to all these parameters has been checked carefully. Fundamental material parameters such as the surface energy of the Au(110) surface were reproduced accurately by our supercells [36]. All calculations were carried out on the NEC SX-9 system, on which both the scaling behavior and the per-CPU performance of our code are most efficient (see Ref. [37] for details on performance and scalability).
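For illustration, a comparable slab calculation could be set up as follows via the ASE interface to VASP. This is a hypothetical sketch, not the authors' input: the k-point mesh and the vacuum thickness are assumptions, while the cutoff, functional, slab geometry, and force criterion follow the text (the 0.01 eV/Å criterion is quoted in Sect. 3.1).

```python
from ase.build import fcc110
from ase.calculators.vasp import Vasp

# five-layer Au(110) 4x4 slab; vacuum value (in Angstrom) assumed
slab = fcc110("Au", size=(4, 4, 5), vacuum=7.2)

calc = Vasp(
    xc="pw91",       # Perdew-Wang GGA exchange-correlation functional
    encut=500,       # plane-wave cutoff, eV
    kpts=(2, 2, 1),  # Monkhorst-Pack mesh (density assumed)
    ibrion=2,        # ionic relaxation via conjugate gradient
    ediffg=-0.01,    # stop when residual forces < 0.01 eV/Angstrom
)
slab.calc = calc
# energy = slab.get_potential_energy()  # triggers the DFT run
```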
3 Results and Discussion

3.1 Relaxed Geometries

Investigation of the isolated bonding of the cysteine molecule to the Au(110) surface solely via the thiolate head group or solely via the amino side group [23, 25] has revealed that the energy gain of both bonds is considerable: about 0.6 eV for the Au-thiolate bond and about 0.5 eV for the Au-amino bond. The preferred bonding sites were bridge and off-bridge positions for the deprotonized -S head group and top positions for the NH2 amino group (cf. Fig. 1b). The bonding energy of the Au-S bond was found to be very sensitive to changes in the bonding geometry, which favors an arrangement with tetrahedron structure. These two possibilities for the bonding of a cysteine molecule and the accompanying energy gains suggest a flat adsorbate geometry with two bonds, provided the deformation of the molecule does not cost too much elastic energy and the binding angle at S can be kept close to a tetrahedron angle.
108
B. H¨offling et al.
Table 1 Bond angles (in degrees) of the relaxed adsorbed and the gas phase cysteine configurations. The notation for the atoms is explained in Fig. 1a

Configuration   a       b       c       d       Gas phase
Au1-S-C3        113.8   111.8   115.6   107.3     –
Au2-S-C3        109.7   104.0   111.5   114.7     –
Au1-S-Au2        84.1    97.3    98.6    85.6     –
S-C3-C2         117.4   114.8   119.8   108.8   113.4
C3-C2-C1        109.4   108.6   109.9   110.1   109.9
C3-C2-N         114.4   111.9   113.6   110.8   110.7
Au-N-C2         118.5   124.8   118.2     –       –
C2-N-H2         109.8   109.9   110.8   111.0   111.4
C2-N-H3         111.3   110.4   109.9   107.5   110.3
H2-N-H3         109.1   106.0   107.8   110.8   106.9
Table 2 Bond lengths (in Å) of the relaxed adsorbed and the gas phase cysteine configurations. The notation for the atoms is explained in Fig. 1a

Configuration   a       b       c       d       Gas phase
Au1-S           2.43    2.46    2.47    2.41      –
Au2-S           2.43    2.48    2.47    2.42      –
S-C3            1.84    1.84    1.84    1.84    1.82
Au-N            2.36    2.35    2.33      –       –
N-C2            1.48    1.48    1.48    1.46    1.46
Au-H3             –       –     2.38      –       –
C2-H3           1.10    1.10    1.12    1.10    1.10
When one takes into account all parameters influencing the adsorption energy of the cysteine radical, i.e. the positions of S and NH2, the bonding angles at the thiolate sulfur atom as well as the intramolecular strain, flat adsorption geometries, which use two functional groups to register to the surface, seem to be energetically favored over upright configurations. To check this hypothesis we have studied three different flat adsorption geometries, registering to the surface via both the amino side group and the thiolate head group, and one upright configuration, registering solely via the thiolate group. We performed ionic relaxations for all configurations by minimizing the Hellmann-Feynman forces until they were below 0.01 eV/Å. The resulting geometries can be seen in Fig. 3. The three flat adsorbate configurations differ with respect to the positions of the amino and thiolate bonds. Configuration a has N at an off-top site and S at the [1̄10] bridge site¹. Configurations b and c both have N at a top site and S at the [001] bridge site but differ from each other due to the chirality of the molecule. The upright configuration d has S at the most favored [1̄10] off-bridge site. The bond angles and bond lengths are listed and compared with the parameters of the gas phase cysteine molecule in Tables 1 and 2. Hereby, our reference gas phase molecule is cysteine conformer 9 of Ref. [30], the geometry of which closely

¹ Before ionic relaxation, S was located at the [1̄10] off-bridge site; the thiol side group was relocated during the calculations due to intramolecular strain caused by the pull of the amino-Au bond.
resembles that of the adsorbed molecule. The tables demonstrate that the bond lengths are almost conserved and are essentially the same in the different adsorbate configurations. More important are the variations of the bond angles in the adsorbate-substrate bond region. A comparison of the bond angles in Table 1 shows that the thiolate bond angles Au-S-C3 are close to the tetrahedron angle of 109.4° in all four configurations, which confirms the influence of the tetrahedron structure of this bond discovered in earlier studies [23, 25]. The Au1-S-Au2 angles, however, are considerably smaller due to the S-Au and Au-Au bond lengths, which are incommensurate with an ideal tetrahedron geometry, especially in configuration a and the upright configuration d, since both have S located at the short [1̄10] bridge site. The intramolecular bonding angles remain close to their gas phase values; only S-C3-C2 varies significantly. We observe one extra molecule-substrate bond in configuration c. In addition to the interaction with the amino and thiolate groups, there is a hydrogen bond between H3 and the metal slab. Table 2 shows an increase of the C2-H3 bond length by 0.02 Å. Using the lengthening of the carbon-hydrogen bond as an indicator for the bond strength, we estimate the bond to have an energy of 0.2 eV or more [38]. We estimate the amino bond of configuration c to be noticeably weaker than that of configuration b (see below). The extra bond is likely the reason why configuration c is nevertheless energetically slightly more favorable. Comparing our adsorption configurations with those of earlier experimental and theoretical studies [16, 17] of cysteine adsorption on Au(110), we find our conclusions regarding the most favorable adsorption geometry in agreement with these works. Although the geometries proposed by Kühnle et al. are dimer adsorption geometries, they too exhibit the preference of the thiolate group to bind at the [1̄10] off-bridge site, which causes the amino group to move from its favored top position to an off-top position.
3.2 Adsorption Energies

The bonding of cysteine to the Au substrate is accompanied by a dissociation of the molecule into a cysteine radical and a hydrogen atom. Therefore the computation of the binding and adsorption energies requires knowledge of the hydrogen chemical potential. To obtain this, we assume that the hydrogen atoms dissociated from the cysteine form H2 molecules. For that reason we consider the total energy E_H2 of the gas phase H2 molecule. A total-energy optimization of the H2 molecule with respect to the bond length was performed, using a cubic supercell of 10 Å edge length, in the DFT framework described in Sect. 2. Again, the system was relaxed until the residual forces were below 0.01 eV/Å for each proton. The resulting molecular bond length of 0.75 Å is in good agreement with the literature value (0.74 Å [39]). Besides E_H2, we need the total energies E_ads/sub of the slab with adsorbate, E_sub of the clean substrate, E_rad of the deprotonized gas phase cysteine radical, and E_mol of the intact gas phase cysteine molecule to compute the characteristic energy gains. The adsorption energies with respect to radical and molecule are then calculated as
ΔE_rad^ads = E_ads/sub − E_sub − E_rad   (1)

and

ΔE_mol^ads = E_ads/sub + ½ E_H2 − E_sub − E_mol.   (2)

Table 3 Adsorption energies with respect to the free deprotonized cysteine radical ΔE_rad^ads and with respect to the gas-phase cysteine molecule ΔE_mol^ads, and slab deformation energy ΔE_sub of the cysteine-Au(110) adsorbate configurations, in eV. Displacements of top-layer Au atoms in Å. Au1 and Au2 are the atoms binding to S, Au3 the atom binding to N. Positive values describe displacement towards the bonding partner, negative values away from it. Δx, Δy, and Δz denote the displacement in the [001], the [1̄10], and the [110] direction, respectively; |Δx| is the magnitude of the displacement

Configuration  ΔE_rad^ads  ΔE_mol^ads  ΔE_sub   Atom    Δx      Δy      Δz     |Δx|
a              −2.637      −0.743      0.270    Au1    −0.06   −0.17    0.19   0.26
                                                Au2    −0.10   −0.14    0.19   0.26
                                                Au3     0.01    0.04    0.09   0.10
b              −2.501      −0.607      0.246    Au1     0.24    0.06    0.12   0.28
                                                Au2     0.22    0.04    0.13   0.26
                                                Au3     0.07    0.10    0.06   0.14
c              −2.526      −0.632      0.254    Au1     0.22    0.04    0.13   0.26
                                                Au2     0.21   −0.06    0.16   0.27
                                                Au3     0.07    0.05    0.04   0.09
d              −2.166      −0.272      0.292    Au1    −0.05   −0.20    0.16   0.26
                                                Au2    −0.11   −0.13    0.28   0.32

The adsorption energies are listed in Table 3. We see that configuration a is the energetically most favored adsorption configuration, while configurations b and c are energetically very close to each other but 0.1 eV higher than configuration a. The energy gain for the upright adsorbate geometry d is much smaller. This indicates that the docking of the cysteine via both the amino N atom and the thiol S atom gives rise to flat adsorption geometries as the most favorable ones. For the three considered flat adsorption geometries, we find that the binding energy varies with the bonding sites of the adsorbate molecule at the surface. Finally, docking at the off-top site (N) and the [1̄10] bridge site (S), i.e. configuration a, is favored.
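As a worked illustration of Eqs. (1) and (2), the adsorption energies follow from the five total energies by simple differences; the numbers in the usage comment below are placeholders, not the computed totals of this study.

```python
def adsorption_energies(E_ads_sub, E_sub, E_rad, E_mol, E_H2):
    """Adsorption energies (eV) with respect to radical and molecule."""
    dE_rad = E_ads_sub - E_sub - E_rad               # Eq. (1)
    dE_mol = E_ads_sub + 0.5 * E_H2 - E_sub - E_mol  # Eq. (2)
    return dE_rad, dE_mol

# example with placeholder totals (eV):
# dE_rad, dE_mol = adsorption_energies(-512.3, -498.1, -11.6, -15.4, -6.8)
```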
3.3 Adsorption-Induced Surface Rearrangements

To gain additional insight into the molecule-substrate interaction, we investigate the spatial rearrangement of Au atoms caused by the molecular adsorption. We compare the positions of the Au ions in the clean relaxed slab with those in the relaxed adsorbate-substrate systems. The displacement of the Au atoms engaged in bonds with the molecule was then calculated as

Δx = x_ads/sub − x_sub,   (3)
Cysteine on Gold: An ab-initio Investigation
111
where x_ads/sub and x_sub denote the position of the ion in the adsorbate/substrate system and in the clean relaxed slab, respectively. Table 3 contains Δx for all atoms engaged in thiolate and amino bonds; the atomic displacement is given for all Cartesian directions. One feature common to all atoms participating in thiolate bonds is that the lateral shift exceeds the vertical movement. In configurations b and c, the atoms are laterally repositioned towards S. In the configurations with the thiol group near the short [1̄10] bridge, i.e., configuration a and the upright configuration d, the displacement occurs laterally away from the thiolate group, creating a thiolate bonding geometry closer to the tetrahedron structure by widening the Au-S-Au angle. This relaxation away from the bonding partner confirms the important role that the bonding geometry plays in the Au-thiolate bond, as found in earlier studies [23, 25]. The shifts of the Au atoms interacting with the amino group are considerably smaller than those of the thiolate bonding partners. This could be expected from the relative strengths of the bonds. The displacement is most pronounced in configuration b. The spatial movement is a good measure of the 'pull' exerted by the bonding partner; therefore it seems that the Au-amino bond is strongest in this configuration. The deformation energy due to adsorption-induced substrate rearrangement was calculated as

ΔE_sub = E_sub − E_sub,0,   (4)

where E_sub and E_sub,0 denote the total energies obtained from calculations using the adsorption-restructured Au(110) slab and the clean relaxed slab, respectively. The results are listed in Table 3. It is interesting that for configurations a and d, the geometries with the molecule adsorbed at the short bridge, the strain is noticeably greater than for the other adsorption geometries, where the thiol head group docks at the long [001] bridge. This greater strain is one more indication of stronger bonds at the short bridge. The energy cost for the slab rearrangement in configuration a is about 20 meV larger than for the other flat adsorption geometries. Nevertheless, this configuration is energetically favored with respect to the other flat geometries by about 100 meV. This underlines the important role of the thiol group docking site in the adsorption process.
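The displacement analysis of Eq. (3) and the magnitudes |Δx| in Table 3 reduce to a vector difference and a norm; a minimal sketch, assuming the position arrays are matched atom-by-atom and given in Å:

```python
import numpy as np

def displacements(x_ads_sub, x_sub):
    """Per-atom displacement vectors (Eq. (3)) and their magnitudes."""
    d = x_ads_sub - x_sub
    return d, np.linalg.norm(d, axis=1)

# e.g. configuration a, atom Au1: (-0.06, -0.17, 0.19) gives |Δx| ≈ 0.26 Å,
# consistent with the value quoted in Table 3.
```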
3.4 Electronic State Changes

The adsorption via the two docking bonds, S-Au and N-Au, causes a redistribution of electronic states. To analyze this redistribution, we make use of the possibility of projecting the wave functions onto single atomic states to obtain an atom-resolved projected density of states (PDOS). These subsystem- and site-projected densities of states are shown in Fig. 2. The PDOS of the Au(110) substrate is almost independent of the adsorption geometry. The effect of the adsorption configuration on the cysteine-projected DOS is much more noticeable. To investigate the changes in greater detail, the densities of states projected onto S and N, the bonding atoms,
Fig. 2 Electronic density of states projected on the Au slab, the cysteine molecule, the thiolate S atom and the amino N atom for the four relaxed adsorbed cysteine configurations and for the unadsorbed system, i.e. the clean relaxed Au slab and the gas phase cysteine molecule. The Fermi level of the metal is aligned with the origin of the energy scale. Colored regions indicate states as described in the text
are presented in Fig. 2 as well. One observes that the differences in the cysteine-projected DOS of the four configurations are mainly due to differences in the amino-gold interaction. While the S-projected DOS has a very similar shape for all four configurations, the N-projected DOS displays the same differences as the molecule-projected DOS. In particular the peaks below −6 eV, i.e., below the Au d bands, possess the same lineshape as the cysteine-localized states. This indicates that the main difference in the adsorption characteristics of the configurations lies in the variations of their Au-amino bond. This fact leads to an important conclusion: the adsorption process is dominated by the stronger thiolate bond; the amino bond is largely formed
within the constraints imposed by the intramolecular and the preferred thiolate bonding geometries. None of the adsorbed molecules shows the pronounced HOMO peak of the gas phase molecule (yellow shading). In all configurations the S-projected DOS shows the same pattern of bonding and antibonding states, with peaks closely above and below the Au d bands (green shading) and a plateau in between. This is the same basic pattern as observed for the isolated thiolate bond [25]. This lineshape can be explained as a hybridization of the S p orbitals with metal states forming bonding and antibonding states. A similar lineshape has also been found for thiolate-Au bonds on the Au(111) surface [22]. There is no qualitative change in the lineshape of the thiolate-localized states due to the nearby amino bond; the basic lineshape of the S-PDOS is the same for the upright geometry d as for the flat adsorption geometries. This supports our assumption that the S-Au bond is a strictly localized interaction. The fact that the S-projected DOS remains qualitatively unchanged while the N-projected DOS varies in all relaxed configurations confirms the leading role of the thiolate bond in the adsorption process. We want to mention that the additional hydrogen bond in configuration c also changes the electronic states of the interacting hydrogen atom in the energy range of the Au d bands, indicating interaction with the d orbitals. In all flat adsorption geometries the amino-localized HOMO−1 state of the free molecule (blue shading) is largely suppressed and new peaks appear below the Au d bands. This confirms that in the amino-bonded system N-localized states are shifted downwards in energy due to N-localized regions of electron depletion, as predicted in our studies of the isolated bonds [25]. The retention of the N-localized HOMO−1 state in the upright geometry d is a result of the missing amino-Au interaction. The N-projected DOS of the upright molecule retains its gas phase profile, except for a broadening of the states about 5 eV below the Fermi level, probably due to intramolecular charge redistribution.
3.5 Electron Transfer

The charge density difference for all four configurations is calculated according to

Δρ(x) = ρ_ads/sub(x) − ρ_ads(x) − ρ_sub(x),   (5)
where ρ_ads/sub is the charge density of the adsorbate-substrate system, and ρ_ads and ρ_sub are the charge densities of the adsorbate without the surface and of the clean surface, respectively. For the calculation of Δρ(x) the isolated systems are taken in the geometry of the respective relaxed adsorbate-substrate system, i.e. the locations of the ions are exactly the same in all three calculations. The charge density difference describes the charge redistribution due to the chemical interaction of the subsystems,
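A sketch of how Eq. (5) and the slab-integrated charge transfer (discussed below) can be evaluated once the three densities are available on the same real-space grid; reading the densities (e.g. from VASP output files) and the choice of the slab boundary z_slab_top are assumptions.

```python
import numpy as np

def charge_transfer(rho_ads_sub, rho_ads, rho_sub, z, z_slab_top, cell_volume):
    """rho_*: (nx, ny, nz) densities in e per volume on a uniform grid;
    z: (nz,) grid coordinates along the surface normal."""
    d_rho = rho_ads_sub - rho_ads - rho_sub   # Eq. (5)
    dv = cell_volume / d_rho.size             # volume per grid point
    slab = z <= z_slab_top                    # region occupied by the Au slab
    return np.sum(d_rho[:, :, slab]) * dv     # electrons gained by the slab
    # expected ≈ +0.51 / +0.66 / +0.66 e for configurations a/b/c, ≈ 0 for d
```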
Fig. 3 Charge density difference plots for the relaxed adsorbed cysteine configurations. Regions of electron accumulation/depletion are marked in blue (+)/red (−). Isosurface value: ±0.025 e Å⁻³. For configuration d the surface unit cell is indicated
with Δρ(x) < 0 in regions of electron depletion and Δρ(x) > 0 in regions of electron accumulation. Figure 3 shows Δρ isosurface plots for the four configurations. Qualitatively, the charge density difference in all four systems closely resembles the charge rearrangement caused by the isolated bonds [25], which indicates very similar bond characteristics. Both bonds show complicated patterns of charge rearrangement. However, as for the isolated bonds, we can conclude by integrating over Δρ that the Au-amino bond is largely electrostatic in nature, in accordance with earlier studies of amino-metal interaction [40], while the Au-thiolate bond is covalent without an ionic contribution. Configuration a displays far less electron accumulation at the Au atom interacting with the cysteine amino group than configurations b and c. This indicates that the N-Au interaction, and hence the amino-Au bond, is weaker than in the other two flat geometries, probably due to the more disadvantageous off-top position of the N atom. In configuration b we observe the largest isosurface-enclosed regions at the site of the N-Au bond. This seems to indicate that in this geometry the Au-amino bond is stronger than in the other configurations. In our investigations of the isolated amino-Au bond we found that the charge transfer from molecule to substrate was largest for the adsorption site with the strongest bond. Therefore, the charge transfer can be used as an approximate measure of the strength of the amino bond. The total charge transfer between the amino acid molecule and the Au(110) substrate has been calculated for all four adsorption geometries by integrating the charge density difference over the region of the Au slab in the
adsorbate/substrate system. While for the upright configuration d no net charge transfer was observed, we obtain 0.51, 0.66 and 0.66 electrons for the configurations a, b, and c, respectively, i.e. more than half an electron is transferred from the molecule into the substrate. This is similar to, though somewhat less than, the case of the isolated bond discussed in Ref. [25] (0.79 electrons). The largest total charge transfer is observed in configuration b (slightly larger than in configuration c), which supports the assumption made above that the N-Au bond is strongest in this geometry. For configurations a and c, Δρ shows S-localized charge transfer patterns similar to the isolated thiolate bond. The electron accumulation along the S-Au bond axes is asymmetric, and we observe a region of decreasing electron density opposite the bond axis, around which less charge accumulates. In configuration b, the thiolate-localized charge redistribution displays almost complete symmetry in the [001] direction, although the S-C3 bond does not lie in the (001) plane and the two S-Au bond lengths are not entirely equal. The upright configuration d shows a thiolate bonding pattern that, although not entirely symmetric, displays more [1̄10] line symmetry than both the model isolated thiolate bond and configurations a and c, although the S-C3 bond is considerably tilted with respect to [001]. Nevertheless, asymmetric features similar to those of configuration c and of the isolated [001] bridge bond are visible, although less pronounced. The charge density difference in configuration c shows an additional feature not present in the other configurations: the H3-localized electronic density decreases through adsorption, and on the axis towards the upper-right Au atom there is a small area of electron accumulation (upper right side of Fig. 3c). This pattern can be attributed to the weak hydrogen bond mentioned above.
4 Summary and Conclusions

We have investigated the adsorption of the amino acid molecule cysteine on the Au(110) surface through density functional theory calculations within the framework of a semilocal exchange-correlation description and the projector-augmented wave method. The total-energy optimizations with respect to the atomic positions have been accompanied by investigations of the electron transfer and of the changes in the electronic structure of both molecule and substrate upon adsorption, to provide insight into the chemical nature of the molecule-substrate interaction. We have analyzed four different adsorption geometries by performing full ionic relaxations. Previous studies using a rigid configuration indicated that flat adsorption configurations with different N and S adsorption sites should be energetically favored. For comparison, one upright adsorption geometry, docking to the surface solely via the thiol head group, was studied as well. Indeed, the flat molecular adsorption geometries turned out to be energetically favored over the upright adsorption, taking advantage of both the thiolate-Au and the amino-Au bond. Adsorption with the deprotonized thiolate sulfur atom at the short [1̄10] bridge site and the amino
nitrogen at an off-top position is the most favored geometry. Molecular adsorption was discussed in terms of electron redistribution, changes in the site-projected density of states, and adsorption-induced surface rearrangements. The combined characteristics of the isolated amino-Au and thiolate-Au bonds were present in the respective aspects of the adsorbed systems, which supports our earlier findings of largely localized bonds that do not influence each other in any direct way. The dominant role of the thiolate bond in the determination of the adsorption geometry is illustrated by the fact that the adsorption configuration with the optimum thiolate bond site was determined to be energetically most favorable, despite the fact that its amino-Au bond is weaker and the adsorption-induced strain in the substrate larger than for the other flat adsorption configurations. Since the cysteine adsorption process is mainly dominated by the characteristics of the thiolate bond, the preference for flat adsorption configurations might not hold for other surface orientations and metals with filled d shells. Since the bonding energy of the thiolate bond is very sensitive to changes in the bonding geometry, the adsorption geometry depends on the possibility of forming amino (or other) bonds while retaining the favored tetrahedron-like geometry of the thiolate-Au bond, i.e., it depends on the surface structure and the lattice constant of the substrate. Our results should also be of interest for the investigation of self-assembled cysteine monolayers, dimer adsorption and other more complex coverage structures. In the case of larger biomolecules attached to metallic supports through cysteine residues, the discussed flat adsorption configurations and energies might be less relevant, since in those systems the NH2 group is usually engaged in peptide bonds along the primary chain. For future investigations of such systems, the results concerning the upright adsorption configuration should be of greater interest. Due to the localized nature of the chemical bonds, the influence of the molecular tail on the properties of the single bonds proved to be small; therefore the results obtained by the analysis of the S-Au and N-Au bonds should be applicable to other problems such as the adsorption of nucleobases or alkanethiols. In particular, the results concerning the importance of the bonding geometry for metal-thiolate bonds could be exploited for practical purposes, such as the use of cysteine as an anchoring group for functional biomolecular structures on metal substrates.

Acknowledgments. We gratefully acknowledge fruitful discussions with J. Furthmüller and M. Preuss. The work was financially supported by the EU e-I3 Project ETSF (Grant No. 211956) and the German Federal Government (BMBF Project No. 13N9669). Grants of computer time from the Höchstleistungsrechenzentrum Stuttgart are gratefully acknowledged.
Ab-initio Calculations of the Vibrational Properties of Nanostructures
Gabriel Bester and Peng Han
1 Introduction
The new class of materials formed by semiconductor-based nanostructures has a large and mostly unexplored range of application fields. For instance, colloidal semiconductor nanostructures or "quantum dots" are used today in biological science and medicine as bio-compatible light-emitting markers [1, 2]. In the area of classical information science and technology (e.g. wavelength-division multiplexing), monochromatic light sources and light detectors of tunable wavelength are often required. In general, optoelectronics could profit from the developments of semiconductor nanostructures [3]. Following a rather long-term goal in the area of quantum information science and technology, the use of quantum dots is one of the most promising concepts and is led by a world-wide effort. As soon as realistic applications are addressed, the effect of temperature and hence the dynamical processes come prominently into play. A theory at T = 0 K yields very valuable results to unveil certain aspects of the underlying physics, but to make predictions valid in the world of technology the effects of vibration and temperature on the dynamical processes must be addressed. Beyond a change in the quantitative values obtained at T = 0 K, the inclusion of dynamical processes in the theory opens the way to effects that simply do not exist without them, such as spin-flip transitions [4] or the relaxation of excited carriers to the ground state [5, 6]. For nanostructures with typically thousands to hundreds of thousands of atoms, quantum confinement and surface effects play a dominant role for the physical properties. The vibrational properties of such semiconductor nanoparticles have been calculated earlier using continuum models [7] or empirical atomic potential models [8]. Within these models, the change of bond lengths induced by the surface on the entire nanocrystal, as well as the details of the surface relaxation and reconstruction, are missing. These effects are treated accurately by ab initio methods,
Gabriel Bester · Peng Han Max Planck Institut für Festkörperforschung, Stuttgart, Germany, e-mail:
[email protected],
[email protected]
where charge redistribution is taken into account self-consistently. We therefore calculated the vibrational density of states (DOS) based on ab initio density functional theory (DFT) for III–V semiconductor nanostructures ranging in diameter from 26 to 36 Å. We study the effect of surface passivation, the change of bond length, the modification of the electric field at the surface, and the changes in the chemical bonds.
2 Research Methodology
A detailed review on density functional theory applied to lattice-dynamical calculations has been given elsewhere [9]. Here, we only outline our research methodology briefly. Based on the adiabatic approximation (Born-Oppenheimer approximation), the lattice-dynamical properties of a system are given by

  [ −Σ_I (ħ²/2M_I) ∂²/∂R_I² + E({R}) ] Φ({R}) = ε Φ({R})   (1)

where R_I is the coordinate of atom I and M_I is its mass, {R} is the nuclei configuration given as a set of atomic positions, ħ is the reduced Planck constant, and ε and Φ({R}) are the eigenvalue and eigenvector of the lattice vibrations, respectively. E({R}) is the ground-state energy of the system, which is determined by the many-body Hamiltonian

  H_BO({R}) = −(ħ²/2m) Σ_i ∂²/∂r_i² + (e²/2) Σ_{i≠j} 1/|r_i − r_j| + V_R(r) + V_N({R})   (2)

where m is the mass of the electron, −e is the electron charge, and r_i is the coordinate of electron i. The electron-nucleus interaction potential V_R(r) is given by

  V_R(r) = −Σ_{i,I} Z_I e² / |r_i − R_I|   (3)

with Z_I the charge of nucleus I. The electrostatic interaction potential V_N({R}) is written as

  V_N({R}) = (e²/2) Σ_{I≠J} Z_I Z_J / |R_I − R_J|.   (4)

Based on the Hellmann-Feynman theorem, the force acting on nucleus I is

  F_I = −∂E({R})/∂R_I = −⟨Ψ(r,{R})| ∂H_BO({R})/∂R_I |Ψ(r,{R})⟩   (5)

and the force-constant matrix elements are

  ∂²E({R})/∂R_I∂R_J = ∫ [∂ρ_R(r)/∂R_J] [∂V_R(r)/∂R_I] dr + ∫ ρ_R(r) [∂²V_R(r)/∂R_I∂R_J] dr + ∂²V_N({R})/∂R_I∂R_J   (6)

where Ψ(r,{R}) is the electronic ground-state wave function and ρ_R(r) is the electron charge density for the nuclei configuration {R}. The charge density ρ_R(r) is obtained by mapping the problem onto a set of one-particle equations (Kohn-Sham equations):

  [ −(ħ²/2m) ∂²/∂r² + V_R(r) + e² ∫ ρ_R(r′)/|r − r′| dr′ + δE_xc/δρ_R(r) ] ψ_n(r) = ε_n ψ_n(r)   (7)

and

  ρ_R(r) = 2 Σ_{n=1}^{N/2} |ψ_n(r)|²   (8)

where E_xc is the exchange-correlation energy, and ε_n and ψ_n(r) are the eigenenergy and wave function of the electronic states, respectively. Based on the harmonic approximation of lattice dynamics, the frequencies ω and the corresponding eigenmodes u_I are obtained by solving the eigenvalue equation

  Σ_J (1/√(M_I M_J)) [∂²E({R})/∂R_I∂R_J] u_J = ω² u_I.   (9)
3 Computational Demand
3.1 Structure Relaxation
The geometry relaxation is performed using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) procedure for the optimization of the ionic positions. This method usually reduces the forces on the atoms in a minimal number of iterations. The information of several previous steps in the history of the atomic relaxation is kept and used in the guess for the new positions. This part of the calculation remains one of the most time-consuming and requires several tens of cycles in the self-consistent loop.
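The structure of such a quasi-Newton relaxation loop can be illustrated with a minimal sketch. Here `energy_and_forces` is a hypothetical stand-in for one self-consistent DFT calculation (replaced by a toy pair potential only to keep the example runnable), and SciPy's generic BFGS driver stands in for the optimizer built into the electronic-structure code:

```python
import numpy as np
from scipy.optimize import minimize

def energy_and_forces(positions):
    # Hypothetical stand-in for a self-consistent DFT step: returns the
    # total energy and the forces F_I = -dE/dR_I on all atoms.
    disp = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(disp, axis=-1)
    np.fill_diagonal(dist, 1.0)                 # self-terms vanish below
    energy = 0.5 * np.sum((dist - 1.0) ** 2)
    forces = -2.0 * np.sum((1.0 - 1.0 / dist)[..., None] * disp, axis=1)
    return energy, forces

natoms = 8
rng = np.random.default_rng(0)
x0 = rng.normal(scale=2.0, size=(natoms, 3))

def objective(x):
    e, f = energy_and_forces(x.reshape(natoms, 3))
    return e, -f.ravel()                        # gradient = -forces

# gtol mimics the force threshold of 3e-6 Ha/Bohr quoted in Sect. 5.1
res = minimize(objective, x0.ravel(), jac=True, method="BFGS",
               options={"gtol": 3e-6})
relaxed = res.x.reshape(natoms, 3)
```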
3.2 Calculation of Dynamical Matrix Elements
The dynamical matrix elements ∂²E({R})/∂R_I∂R_J are obtained by solving (6). In the calculation, the charge density ρ_R(r) is obtained by solving the Kohn-Sham equations self-consistently, and the values of ∂ρ_R(r)/∂R_J are calculated using a finite-difference approach. In principle we need 3N atomic displacements to obtain all the elements of the dynamical matrix (N being the number of atoms). In practice we calculate a significantly lower number of displacements (3N/24) and use the symmetry elements of the point group to deduce the missing elements. This is one of the key points for being able to treat these large structures. A minimal sketch of the finite-difference step is given below.
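The following sketch shows the plain finite-difference construction of the force-constant matrix from displaced geometries; `forces_of` is a hypothetical callable representing one self-consistent force calculation, and the point-group reduction from 3N to 3N/24 displacements mentioned above is omitted for brevity:

```python
import numpy as np

def finite_difference_hessian(positions, forces_of, delta=0.01):
    """Force-constant matrix d2E/dR_I dR_J from central finite
    differences of the forces, one Cartesian displacement at a time."""
    natoms = positions.shape[0]
    n = 3 * natoms
    hessian = np.zeros((n, n))
    flat = positions.ravel()
    for k in range(n):                        # 3N displaced geometries
        plus, minus = flat.copy(), flat.copy()
        plus[k] += delta
        minus[k] -= delta
        f_plus = forces_of(plus.reshape(natoms, 3)).ravel()
        f_minus = forces_of(minus.reshape(natoms, 3)).ravel()
        # d2E/(dR_k dR_l) = -dF_l/dR_k, evaluated by central differences
        hessian[k, :] = -(f_plus - f_minus) / (2.0 * delta)
    return 0.5 * (hessian + hessian.T)        # symmetrize numerical noise
```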
3.3 Diagonalization of the Dynamical Matrix
The eigenvalues and eigenvectors of the dynamical matrix are obtained by direct diagonalization. This is a feasible task with standard direct diagonalization methods and does not represent a bottleneck of the calculations. All the calculations are performed with the CPMD code package developed at the Max Planck Institute in Stuttgart and at IBM [10]. The CPMD code is a high-performance parallelized plane-wave/pseudopotential implementation of Density Functional Theory (DFT). It offers, at the moment, the best scaling among the DFT codes, using a hybrid scheme of MPI and OpenMP.
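A sketch of the diagonalization step of eq. (9), assuming the force-constant matrix from the previous step, could read:

```python
import numpy as np

def vibrational_modes(hessian, masses):
    """Solve eq. (9): mass-weight the force-constant matrix and
    diagonalize; `masses` holds one mass per atom."""
    m = np.repeat(masses, 3)                  # one entry per coordinate
    dyn = hessian / np.sqrt(np.outer(m, m))   # D_IJ = H_IJ / sqrt(M_I M_J)
    w2, modes = np.linalg.eigh(dyn)           # direct diagonalization
    # clip tiny negative eigenvalues (translations, numerical noise)
    freqs = np.sqrt(np.clip(w2, 0.0, None))
    return freqs, modes
```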
4 Computational Platform and Scaling
In this project, we use the NEC Nehalem cluster at the High Performance Computing Center Stuttgart (HLRS) with 2.8 GHz and 12 GB memory per node, and InfiniBand interconnects. We used 8 to 128 tasks with typical runtimes between 1 and 24 hours. The total memory requirements are between 4 and 192 GB, depending on system type and size. We usually run using the maximum available memory on each node, i.e., 1.5 GB per task, so the large memory demand sets a lower limit on the number of tasks. The parallelization scheme is a combination of the Message Passing Interface (MPI) and OpenMP, as implemented in the CPMD package. The CPMD package is compiled with the Fortran compiler mpiifort, using Intel's version of MPI and the Intel Math Kernel Library (MKL) with -L/opt/numlib/intel/mkl/10.1.2.024/lib/em64t -lmkl_intel_lp64 -lmkl_sequential -lmkl_intel_thread -lmkl_core -lguide on the Nehalem cluster at HLRS. In Fig. 1 we show the results of our scaling tests for CPMD using three different cluster sizes: In135As140H*172 (with 447 atoms and 1272 electrons), Ga141P140H*172 (with 453 atoms and 1296 electrons), and In225As240H*228 (with 693 atoms and 2088 electrons). These scaling tests are performed on the clusters used in the later calculation of the vibrational properties, but executing only one self-consistent step. We have chosen this approach, as opposed to a downscaled problem size, since the results depend sensitively on the size of the system. With this approach we are not able to go below a certain number of tasks due to the memory requirement, but we are able to identify the point at which the performance deviates from the ideal scaling. In the top panel, for the smallest structure, we could perform one full iteration on two nodes, using a minimum of 16 tasks. For the medium and large clusters we need a minimum of 24 and 96 tasks, respectively.
Fig. 1 Scaling of CPMD on the Nehalem cluster at HLRS (circles with solid line) for different system sizes
For the small cluster size (top panel) we can increase the number of tasks from 16 to 32 with nearly perfect scaling. For the medium-size cluster we can use up to 40 tasks before the performance drops. For the large cluster the performance drops after 96 tasks. The blue dots in Fig. 1 show the number of tasks used in the calculations. One should mention that the type of problems we address is hard to parallelize and requires a significant amount of internode communication. One of the largest computational demands comes from the FFTs used to move from real to reciprocal space in each iteration.
5 Results
5.1 Vibrational Properties of Passivated Nanostructures
The nanostructure is spherical and made of zinc-blende material. It is centered around a cation and has Td symmetry. To passivate the dangling bonds of the nanoparticles, fictitious pseudohydrogen atoms H* with a nuclear charge of 5/4 or 3/4 were used. In order to obtain the optimized geometry of the nanoparticles, the atomic positions of the clusters are relaxed until the residual forces are less than 3×10−6 Ha/Bohr. The relaxed geometry of the fully passivated In225P240H*228 nanoparticle is shown in Fig. 2. The vibrational DOS of the passivated In321P312H*300, In225P240H*228, and In141P140H*172 nanostructures are plotted in Figs. 3a–c as thin blue curves, together with the projection of the eigenmodes on the core atoms (black curves), surface atoms (red curves), and the passivants (green curves). The phonon DOS of bulk InP is plotted in Fig. 3d. In the low-frequency part of Figs. 3a–c, the vibrational modes are generated by the collective vibration of all the atoms in the nanostructure; they correspond to the acoustic phonon modes of the bulk. With increasing frequency, the vibrational modes of the nanoparticles appear around the peaks of the transverse and longitudinal optical (TO and LO) phonon modes of the bulk material. Comparing the vibrational DOS of nanostructures with various diameters, the optical-like modes exhibit a blue shift relative to the bulk optical modes with decreasing cluster size. In addition, a broadening of the optical-like TO-/LO-modes is observed in these nanoparticles. The properties of these vibrational modes are attributed to the change of the bond length caused by the geometry relaxation and the modification of the polarization at the surface.
Fig. 2 Relaxed structure of the fully passivated In225P240H*228 nanoparticle. The indium, the phosphorus, and the fictitious hydrogen atoms are represented by tan, purple, and white spheres, respectively
Fig. 3 The vibrational DOS of the passivated a In321 P312 H∗300 , b In225 P240 H∗228 , c In141 P140 H∗172 , and d phonon DOS of bulk InP
Comparing the vibrational DOS of the nanoparticles with the phonon DOS of bulk InP in Fig. 3, some additional vibrational modes are found in the gap between the acoustic and optical phonon modes of the bulk material. From the projection of the vibrational DOS presented in Figs. 3a–c, these additional modes have dominant surface character; they are surface optical modes. These surface optical modes exhibit a blue shift with decreasing nanoparticle size. Besides the bulk-like and the surface modes, another type of mode, caused by the vibration of the passivant atoms, appears in the high-frequency range. As shown in Figs. 3a–c, the passivant modes are separated into two frequency ranges, one just above the optical-like modes and the other significantly higher. In the lower frequency range (about 400 to 750 cm−1), the passivants rotate and shear around the surface atoms. In the higher frequency range, the passivant vibrational modes are separated into rocking modes at about 1900 cm−1 and stretching modes at about 2200 cm−1.
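The atom-resolved curves in Fig. 3 amount to weighting each eigenmode by its squared eigenvector components on a chosen group of atoms. A minimal sketch of such a projected DOS (the Lorentzian broadening width is an assumption of this example):

```python
import numpy as np

def projected_dos(freqs, modes, group, grid, width=4.0):
    """DOS projected on the atoms in `group`: each mode (column of
    `modes`) enters with the squared eigenvector weight on that
    group's coordinates, broadened by a Lorentzian of `width`."""
    coords = np.concatenate([(3 * i, 3 * i + 1, 3 * i + 2) for i in group])
    weights = np.sum(np.abs(modes[coords, :]) ** 2, axis=0)
    dos = np.zeros_like(grid, dtype=float)
    for w, wt in zip(freqs, weights):
        dos += wt * (width / np.pi) / ((grid - w) ** 2 + width ** 2)
    return dos
```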
5.2 Vibrational Properties of Unpassivated Nanostructures
To understand the effects of the surface passivation on the vibrational properties of nanoparticles of various sizes, we calculated the vibrational DOS of the unpassivated
Fig. 4 The vibrational DOS of the unpassivated a In141 P140 , b In225 P240 , c In321 P312 , and d the phonon DOS of bulk InP
In321P312, In225P240, and In141P140 nanostructures and plotted them in Figs. 4a–c together with the phonon DOS of bulk InP in Fig. 4d. In the calculation, we fixed the In and P atoms at their optimized positions and removed all the passivants. For this "frozen" geometry, one or two sp3 orbital(s) of the surface atoms is (are) occupied only by a single unpaired electron. As a result, the potential of the cluster changes and additional forces are introduced in this structure. Due to these unpaired sp3 orbitals, the bond strength at the surface is reduced. This leads to a red shift of the vibrational modes, which can be seen by comparing Figs. 3a–c with Figs. 4a–c. Furthermore, we studied the vibrational properties of the unpassivated nanostructures including geometry relaxation. The relaxed structure of the Ga225P240 cluster is given in Fig. 5, where the Ga and P atoms are shown as green and purple spheres, respectively. As shown in this figure, the relaxation introduces massive rearrangement of the atoms and leads to some "buckling" at the surface. The vibrational DOS of the relaxed Ga141P140, Ga225P240, and Ga321P312 nanoparticles is plotted in Figs. 6a–c together with the phonon DOS of bulk GaP in Fig. 6d. In these figures, the vibrational spectrum becomes broader and the "phonon gap" between the bulk-like acoustic and optical modes disappears. This phenomenon can be understood as follows. In the unpassivated nanostructures, the tetrahedral structure with one unpaired sp3 orbital evolves into a trigonal planar structure with sp2 hybridization and an unpaired p orbital after the symmetry-constrained relaxation. The geometry of the trigonal planar configuration can be seen in Fig. 5. From the geometry of the relaxed Ga225P240,
Fig. 5 Relaxed structure of the unpassivated Ga225P240 nanoparticle. The gallium and phosphorus atoms are represented by green and purple spheres, respectively
Fig. 6 Vibrational DOS of the relaxed unpassivated a Ga321 P312 , b Ga225 P240 , c Ga141 P140 , and d the phonon DOS of bulk GaP
the bond angle between the surface atom with two dangling bonds and its two nearest neighbors is not 180°. This geometry shows that the surface atoms with two dangling bonds keep their unpaired sp3 orbitals rather than form sp hybridized orbitals during our symmetry-constrained relaxation. Because the bonds formed by sp2 hybrid orbitals are stronger than those of the fully occupied sp3 orbitals and than those of the sp3 orbitals with unpaired electrons, some
vibrational modes move to higher frequency while others move to lower frequency. This leads to the broadening of the vibrational spectrum and the disappearance of the "phonon gap".
6 Summary
In summary, we performed DFT calculations to study the vibrational properties of III–V semiconductor colloidal nanostructures on the NEC Nehalem Cluster at HLRS. These represent the largest calculations of vibrational properties based on DFT to date. We find that the vibrations of the passivated nanostructures can be divided into bulk-like, surface, and passivant modes. The bulk-like modes exhibit a blue shift compared to the vibrational frequencies of the bulk material. This blue shift is the result of the bond-length reduction that originates from surface effects. In the unrelaxed unpassivated nanostructures, the unpaired electrons in the sp3 hybrid orbitals reduce the bond strength and lead to a red shift of the vibrational modes. For the unpassivated nanostructures with symmetry-constrained relaxation, the formation of sp2 orbitals increases the bond strength and induces a blue shift of the vibrational frequencies.
7 Publication
The following publications resulted from the calculations performed on the NEC Nehalem Cluster at HLRS:
1. Peng Han and Gabriel Bester, "Interatomic potentials for the vibrational properties of III-V semiconductor nanostructures", Phys. Rev. B 83, 174304 (2011).
2. Peng Han and Gabriel Bester, "Effects of the size and surface passivant on the vibrational properties of III–V semiconductor nanoparticles: An ab-initio study" (submitted to Phys. Rev. B).
References
1. Rogach, A. L., Eychmüller, A., Hickey, S. G., Kershaw, S. V.: Infrared-emitting colloidal nanocrystals: Synthesis, assembly, spectroscopy, and applications. Small 3, 536–557 (2007).
2. Gaponik, N., Hickey, S. G., Dorfs, D., Rogach, A. L., Eychmüller, A.: Progress in the light emission of colloidal semiconductor nanocrystals. Small 6, 1364–1378 (2010).
3. Talapin, D. V., Lee, J.-S., Kovalenko, M. V., Shevchenko, E. V.: Prospects of colloidal nanocrystals for electronic and optoelectronic applications. Chem. Rev. 110, 389–458 (2010).
4. Fischer, J., Loss, D.: Hybridization and spin decoherence in heavy-hole quantum dots. Phys. Rev. Lett. 105, 266603 (4pp) (2010).
5. Kilina, S. V., Kilin, D. S., Prezhdo, O. V.: Breaking the phonon bottleneck in PbSe and CdSe quantum dots: Time-domain Density Functional Theory of charge carrier relaxation. ACS Nano 3, 93–99 (2009).
6. An, J. M., Califano, M., Franceschetti, A., Zunger, A.: Excited-state relaxation in PbSe quantum dots. J. Chem. Phys. 128, 164720 (7pp) (2008).
7. Trallero-Giner, C., Comas, F., Marques, G. E., Tallman, R. E., Weinstein, B. A.: Optical phonons in spherical core/shell semiconductor nanoparticles: Effect of hydrostatic pressure. Phys. Rev. B 82, 205426 (14pp) (2010).
8. Fu, H. X., Ozolins, V., Zunger, A.: Phonons in GaP quantum dots. Phys. Rev. B 59, 2881–2887 (1999).
9. Baroni, S., de Gironcoli, S., Corso, A. D.: Phonons and related crystal properties from density-functional perturbation theory. Rev. Mod. Phys. 73, 515–562 (2001).
10. The CPMD consortium page, coordinated by Parrinello, M., and Andreoni, W., Copyright IBM Corp 1990–2008, Copyright MPI für Festkörperforschung Stuttgart 1997–2001, http://www.cpmd.org.
Entropy and Metal-Insulator Transition in Atomic-Scale Wires: The Case of In-Si(111)(4 × 1)/(8 × 2)
W.G. Schmidt, E. Rauls, U. Gerstmann, S. Sanna, M. Landmann, M. Rohrmüller, A. Riefer, and S. Wippermann
Abstract Density functional theory (DFT) calculations are performed to determine the mechanism and origin of the intensively debated (4 × 1)–(8 × 2) phase transition of the Si(111)-In nanowire array. The calculations (i) show the existence of soft phonon modes that transform the nanowire structure between the metallic In zigzag chains of the room-temperature phase and the insulating In hexagons formed at low temperature and (ii) demonstrate that the subtle balance between the energy lowering due to the hexagon formation and the larger vibrational entropy of the zigzag chains causes the phase transition.
1 Introduction
Quasi-one-dimensional (1D) electronic systems attract considerable interest, related on the one hand to the search for fascinating collective phenomena such as spin-charge separation. On the other hand, modulation and controlled tuning of the electrical characteristics of nanoscale structures are essential for their future use in nanoelectronics. The ordered array of In nanowires that self-assembles at the Si(111) surface is one of the most fascinating and most intensively studied model systems in this context. It provides a robust testbed for studying electron transport at the atomic scale [1, 2]. In addition, the experimentally observed phase transition from the metallic Si(111)-(4 × 1)-In structure (Fig. 1a), stable at room temperature (RT), to an insulating (8 × 2) reconstruction below 120 K [3] has provoked many fundamental questions and intensive research. While the atomic structure of the low-temperature (LT) (8 × 2) phase has recently been explained [4, 5] in terms of a
W.G. Schmidt · E. Rauls · U. Gerstmann · S. Sanna · M. Landmann · M. Rohrmüller · A. Riefer Lehrstuhl für Theoretische Physik, Universität Paderborn, 33095 Paderborn, Germany
S. Wippermann University of California, Dept. of Chemistry, One Shields Avenue, Davis, CA 95616, USA
Fig. 1 Schematic top views of a room temperature (4 × 1) and b the low-temperature (8 × 2) hexagon structure of the Si(111)-In nanowire array. Red balls indicate In atoms
hexagon structure (Fig. 1b), the nature and driving force of the metal-insulator transition remained an open question. Originally, it was explained as a charge-density wave (CDW) formation due to the Peierls instability [3]. However, only one of the metallic bands nests properly. Therefore, a triple-band Peierls instability has been proposed, where an interband charge transfer modifies the Fermi surface to improve nesting [6, 7]. A periodic lattice distortion that lowers the energy has also been suggested [8–11]. On the other hand, many-body interactions were made responsible for the low-temperature phase [12]. Several theoretical studies proposed the phase transition to be of order-disorder type [4, 9, 13] and explained the RT phase in terms of dynamic fluctuations between degenerate ground-state structures. However, photoemission [14, 15] and Raman spectroscopy [16] results have cast doubt on this model. Using computer time granted by the HLRS, we studied the (4 × 1)–(8 × 2) phase transition on the basis of DFT calculations [17]. In contrast to earlier work, the vibrational and electronic entropy of the In nanowire array is included in the calculations.
2 Computational Method
For a fixed stoichiometry, the ground state of the surface-supported nanowires is characterized by the minimum of the free energy F as a function of the substrate crystal volume V and the temperature T. It can be obtained using atomistic thermodynamics, see, e.g., Ref. [18]. Within the adiabatic approximation, F is given by

  F(V,T) = F_el(V,T) + F_vib(V,T),   (1)

with F_el = E_tot − T S_el, where we approximate the total energy E_tot by the zero-temperature DFT value and calculate the electronic entropy S_el from

  S_el = −k_B ∫ dE n_F [ f ln f + (1 − f) ln(1 − f) ].   (2)

Here n_F and f denote the density of electronic states and the Fermi distribution function, respectively. The vibrational free energy of the supercell with volume Ω is calculated in the harmonic approximation,

  F_vib = (Ω/8π³) ∫ d³k Σ_i [ ħω_i(k)/2 + k_B T ln(1 − e^{−ħω_i(k)/k_B T}) ].   (3)

The wave-vector dependent phonon frequencies ω_i(k), as well as the corresponding eigenvectors, are obtained from the force-constant matrix calculated by assuming F_el(V,T) ∼ E_tot(T = 0), i.e., neglecting the explicit temperature and volume dependence. The DFT calculations are performed within the local density approximation (LDA) for exchange and correlation as implemented in VASP [19]. Thereby the system of Kohn-Sham equations

  [ −(ħ²/2m) ∇² + V_ext(r) + e² ∫ n(r′)/|r − r′| dr′ + V_xc(r) ] ψ_nk(r) = ε_nk ψ_nk(r),   (4)

  n(r) = Σ_{n,k} f_nk |ψ_nk(r)|²   (5)

is solved iteratively for a given external potential V_ext(r) until self-consistency in the total electron density n(r) is reached. Plane waves serve as the basis set for the Kohn-Sham orbitals ψ_nk(r). The ground-state DFT calculations were parallelized over bands and sampling points in the Brillouin zone using the Message Passing Interface (MPI). Parallelizing over bands and plane-wave coefficients at the same time reduces the communication overhead significantly. Concerning numerical details, we follow Stekolnikov et al. [2]. The Brillouin-zone (BZ) integrations in the electronic-structure calculations are performed using uniform meshes equivalent to 64 points for the (4 × 1) unit cell. This number was increased to 3200 points for the electronic entropy calculations. Frozen-phonon calculations have been performed using an (8 × 4) translational symmetry that yields the Γ- and X-point modes of the (8 × 2) unit cell. Figure 2 shows benchmark calculations to determine the electronic ground state of the 200-atom cell used for surface modeling in our project. The calculations within this project were performed on the NEC SX-8 and SX-9 of the Höchstleistungsrechenzentrum Stuttgart. As can be seen, reasonable scaling is achieved for up to 32 CPUs.
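A compact sketch of how eqs. (2) and (3) translate into code (the phonon energies and the electronic DOS are assumed to come from the frozen-phonon and DFT steps; all energies in eV):

```python
import numpy as np

kB = 8.617333e-5  # Boltzmann constant in eV/K

def f_vib(hw, T):
    """Vibrational free energy, eq. (3), for a discrete set of phonon
    energies hw = hbar*omega_i(k) sampled at the supercell k points:
    zero-point term plus thermal term."""
    return np.sum(0.5 * hw + kB * T * np.log1p(-np.exp(-hw / (kB * T))))

def s_el(energies, dos, mu, T):
    """Electronic entropy, eq. (2): Fermi-Dirac entropy density
    integrated over the electronic density of states n_F."""
    f = 1.0 / (np.exp((energies - mu) / (kB * T)) + 1.0)
    f = np.clip(f, 1e-12, 1.0 - 1e-12)        # avoid log(0) at the edges
    return -kB * np.trapz(dos * (f * np.log(f) + (1 - f) * np.log(1 - f)),
                          energies)
```

Evaluating F(T) = E_tot − T S_el + F_vib for both phases on a temperature grid and locating the zero of the difference then yields the transition temperature discussed in Sect. 3.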
Fig. 2 CPU time and speedup for DFT calculations for the hexagon model of the Si(111)-In nanowire array containing around 200 atoms. The calculations were performed with the Stuttgart optimized VASP version on the HLRS NEC SX-8 and SX-9 machines. In comparison, we show data for the HLRS Cray XE6, a local Linux cluster (Intel Core i7, 24 Twin-nodes with 4 CPUs 2.5 GHz Quad Core Xeon each) and Mac Pro workstations (Intel Core i7)
3 Results
The calculated Γ-point frequencies for strongly surface-localized vibrational modes of the Si(111)-In nanowire array are compiled in Table 1. The table contains the present results for the (4 × 1) phase as well as their assignment to the frequencies of geometrically similar eigenvectors of the (8 × 2) phase, in comparison with the Raman data from Fleischer et al. [16]. The overall very good description of the distinct, but similar, sets of vibrational modes measured for the LT and RT phases by calculations for (8 × 2) and (4 × 1) geometries is a strong argument against the dynamical fluctuation model [4, 9, 13]. Also, if at elevated temperatures the system were frequently visiting configurations associated with (8 × 2) structures, significant contributions from the LT structure should be present in the RT spectra, in contrast to the actual experimental findings [16]. Interestingly, the calculations confirm the existence of a low-frequency shear mode of A″ symmetry for the Si(111)-(4 × 1)-In phase at 28 cm−1. This mode, which was also detected by Raman spectroscopy [16], is energetically below the phase-transition temperature of about kBT ∼ 83 cm−1 and has been suggested to correspond to the lattice deformation characteristic of the (4 × 1) −→ (8 × 2) phase transition [4, 7, 13]. The calculated eigenvector of this mode (Fig. 3a) shows the two In atom zigzag chains oscillating against each other.
Table 1 Calculated Γ-point frequencies for strongly surface-localized A′ (upper part) and A″ phonon modes (lower part) of the Si(111)-(4 × 1)/(8 × 2)-In phases in comparison with experimental data [16]. The symmetry assignment of the (8 × 2) modes is only approximate, due to the reduced surface symmetry

THEORY ω0 [cm−1] (4 × 1) → (8 × 2)              | EXPERIMENT ω0 [cm−1] (4 × 1) → (8 × 2)
22 → 20                                         | 31 ± 1 → 21 ± 1.6
   → 27 (hexagon rotary mode)                   |        → 28 ± 1.3
44 → 47                                         | 36 ± 2 → 41 ± 2
51 → 53                                         | 52 ± 0.6 → 57 ± 0.7
62 → 58, 69                                     | 61 ± 1.3 → 62, 69 ± 1.5
65, 68 → 70, 69, 78, 82                         | 2·72 ± 3.3 → 83 ± 2.3
100, 104 → 97, 106, 113, 142                    | 105 ± 1 → 100–130
129, 131 → 137, 142                             | 118 ± 1 → 139 ± 1.2
143, 145 → 139, 145, 146, 147                   | 2·148 ± 7 → 139, 2·154 ± 2

28 → 18, 19 (shear mode → antisym./sym. shear)  | 28 ± 0.9 → 2·23.5 ± 0.8
35                                              | 3·42 ± 3.5
51                                              | 2·59 ± 3
75                                              | 69 ± 1.5
82                                              | 85 ± 1.7
We find that the structural transformation from the In zigzag-chain structure with (4 × 1) symmetry (Fig. 1a) to the In hexagons with (8 × 2) translational symmetry (Fig. 1b) can in fact be perfectly described by superimposing the calculated eigenvector of the 28 cm−1 mode with the two degenerate low-frequency X-point modes at 17 cm−1 (one of the symmetrically equivalent modes is shown in Fig. 3b). Similarly, the combination of the corresponding shear mode of the Si(111)-(8 × 2)-In phase at 18 cm−1 with the hexagon rotary mode at 27 cm−1 (Fig. 3c) transforms the In hexagons back to parallel zigzag chains. The calculated phonon modes support the geometrical path for the phase transition proposed in Refs. [4, 7, 13]. They give an atomistic interpretation of the triple-band Peierls model [7, 20, 21]: The soft shear mode lifts one metallic band above the Fermi energy, while the rotary modes lead to a band-gap opening for the remaining two metallic In surface bands. What, however, is causing the phase transition? Before we discuss the difference in the free energies calculated for the two phases of the Si(111)-In surface (cf. Fig. 4), a word of caution is in order. The weak corrugation of the In atom potential-energy surface, which leads to small and error-prone force constants, as well as the harmonic approximation impair the accuracy of the calculated phonon frequencies. In order to minimize systematic errors, we compare results obtained for supercells of identical size and use identical numerical parameters. The calculations are performed at the equilibrium lattice constant. From calculations where the measured lattice expansion has been taken into account, we estimate the corresponding error to be of the order of 0.1 meV per surface In atom. The sampling of the phonon dispersion curves is another crucial point.
Fig. 3 Calculated eigenvectors for three prominent phonons modes (notation as in Table 1) of the Si(111)-(4 × 1)-In (a), (b) and Si(111)-(8 × 2)-In phase (c). The mode shown in b—occurring at the X point of the (4 × 1) BZ—is twofold degenerate due to the existence of an equivalent mode at the neighboring In chain
It is performed here by using only the Γ and the X point of the (8 × 2) BZ. However, as shown in the inset of Fig. 4, further restricting the sampling to the Γ point results in an energy shift of less than 0.3 meV, indicating that the unit cell is large enough to compensate for poor BZ sampling. Stekolnikov and co-workers [2] have shown that the energetics of the In nanowires depends sensitively on the functional used to model the electron exchange and correlation energy and on the treatment of the In 4d electrons. We find the inclusion of the In 4d states and/or the usage of the generalized gradient rather than the local density approximation to result in typical (maximum) frequency shifts of ±2(4) cm−1. This affects the vibrational free energy by at most 1 meV per surface In atom at 130 K. In Fig. 4 we present the free energy difference between the Si(111)-(4 × 1)-In and Si(111)-(8 × 2)-In phases. It vanishes at 128.5 K if only the vibrational entropy is taken into account. Additional consideration of the electronic entropy lowers the calculated phase-transition temperature to 125 K. At this temperature, the vibrational and electronic entropy is large enough to compensate for the lower total energy of
Fig. 4 Difference of the free energy F(T) calculated for the (4 × 1) and (8 × 2) phase of the Si(111)-In nanowire array. The stable phase is indicated. The inset shows an enlarged view of the entropy difference calculated by neglecting the electronic contributions and by restricting the BZ sampling to the Γ point
the insulating (8 × 2) phase compared to the metallic (4 × 1) phase. The calculated phase-transition temperature is slightly above the experimental value of about 120 K. However, given the approximations and uncertainties discussed above, the agreement between theory and experiment should be considered fortuitously close. The present calculations show that the phase transition is caused by the gain in (mainly vibrational) entropy that, at higher temperatures, overcompensates the gain in band-structure energy realized upon transforming the metallic In zigzag chains into semiconducting In hexagons. Is it possible to trace the change in vibrational entropy to the frequency shift of a few illustrative modes? Due to the reduced symmetry of the hexagon structure, the phase transformation results in modified phonon eigenvectors. This complicates the one-to-one comparison of the vibrational frequencies. However, a general trend to higher surface phonon frequencies upon hexagon formation is clearly observed. This can be seen from most values in Table 1 (with the shear mode as a notable exception) as well as from the comparison of the respective phonon densities of states shown in Fig. 5. The present calculations essentially confirm earlier experimental work that states "all major modes of the (4 × 1) surface are found in the (8 × 2) spectra, though blueshifted" [16]. A typical example is shown as an inset in Fig. 5. The eigenvector corresponding to the alternating up and down movements of the In atoms hardly changes upon the (4 × 1)–(8 × 2) phase transition. The corresponding frequency, however, goes up from 63 to 67 cm−1. This shift in frequency is easily understood from the formation of additional In-In bonds upon hexagon formation, resulting in larger force constants.
Fig. 5 Phonon density of states calculated for the (4 × 1) and (8 × 2) phase of the Si(111)-In nanowire array (4 cm−1 broadening). The inset shows a specific displacement pattern that hardly changes upon the phase transition but shifts in frequency. Arrows (feathers/heads) indicate down/up movements
4 Summary
In summary, free-energy calculations based on density functional theory are performed that explain the (4 × 1)–(8 × 2) phase transition of the Si(111)-In nanowire array in terms of a subtle interplay between the lower total energy of the insulating In hexagon structure and the larger vibrational and electronic entropy of the less tightly bound and metallic In zigzag-chain structure at finite temperatures. Both the (4 × 1) and (8 × 2) phases are stable and well-defined structural phases. Soft shear and rotary vibrations transform between the In zigzag chains stable at room temperature and the hexagons formed at low temperatures. The present work resolves the discrepancies arising from the interpretation of the (4 × 1) reconstruction as a time-averaged superposition of (8 × 2) structures given by the dynamic fluctuation model. It clarifies the long-standing issue of the temperature-induced metal-insulator transition in one of the most intensively investigated quasi-1D electronic systems. We expect the mechanism revealed here to apply to many more quasi-1D systems with intriguing phase transitions, e.g., Au nanowires on high-index silicon surfaces.
Acknowledgments. Generous grants of computer time from the Höchstleistungsrechenzentrum Stuttgart (HLRS) and the Paderborn Center for Parallel Computing (PC2) are gratefully acknowledged. We thank the Deutsche Forschungsgemeinschaft for financial support.
References
1. T. Tanikawa, I. Matsuda, T. Kanagawa, and S. Hasegawa, Phys. Rev. Lett. 93, 016801 (2004).
2. A. A. Stekolnikov, K. Seino, F. Bechstedt, S. Wippermann, W. G. Schmidt, A. Calzolari, and M. Buongiorno Nardelli, Phys. Rev. Lett. 98, 026105 (2007).
3. H. W. Yeom, S. Takeda, E. Rotenberg, I. Matsuda, K. Horikoshi, J. Schaefer, C. M. Lee, S. D. Kevan, T. Ohta, T. Nagao, and S. Hasegawa, Phys. Rev. Lett. 82, 4898 (1999).
4. C. Gonzalez, F. Flores, and J. Ortega, Phys. Rev. Lett. 96, 136101 (2006).
5. S. Chandola, K. Hinrichs, M. Gensch, N. Esser, S. Wippermann, W. G. Schmidt, F. Bechstedt, K. Fleischer, and J. F. McGilp, Phys. Rev. Lett. 102, 226805 (2009).
6. J. R. Ahn, J. H. Byun, H. Koh, E. Rotenberg, S. D. Kevan, and H. W. Yeom, Phys. Rev. Lett. 93, 106401 (2004).
7. S. Riikonen, A. Ayuela, and D. Sanchez-Portal, Surf. Sci. 600, 3821 (2006).
8. C. Kumpf, O. Bunk, J. H. Zeysing, Y. Su, M. Nielsen, R. L. Johnson, R. Feidenhans'l, and K. Bechgaard, Phys. Rev. Lett. 85, 4916 (2000).
9. J.-H. Cho, D.-H. Oh, K. S. Kim, and L. Kleinman, Phys. Rev. B 64, 235302 (2001).
10. J.-H. Cho, J.-Y. Lee, and L. Kleinman, Phys. Rev. B 71, 081310(R) (2005).
11. X. Lopez-Lozano, A. Krivosheeva, A. A. Stekolnikov, L. Meza-Montes, C. Noguez, J. Furthmüller, and F. Bechstedt, Phys. Rev. B 73, 035430 (2006).
12. G. Lee, S.-Y. Yu, H. Kim, and J.-Y. Koo, Phys. Rev. B 70, 121304(R) (2004).
13. C. González, J. Guo, J. Ortega, F. Flores, and H. H. Weitering, Phys. Rev. Lett. 102, 115501 (2009).
14. H. W. Yeom, Phys. Rev. Lett. 97, 189701 (2006).
15. Y. J. Sun, S. Agario, S. Souma, K. Sugawara, Y. Tago, T. Sato, and T. Takahashi, Phys. Rev. B 77, 125115 (2008).
16. K. Fleischer, S. Chandola, N. Esser, W. Richter, and J. F. McGilp, Phys. Rev. B 76, 205406 (2007).
17. S. Wippermann and W. G. Schmidt, Phys. Rev. Lett. 105, 126102 (2010).
18. M. Valtiner, M. Todorova, G. Grundmeier, and J. Neugebauer, Phys. Rev. Lett. 103, 065502 (2009).
19. G. Kresse and J. Furthmüller, Comp. Mat. Sci. 6, 15 (1996).
20. J. R. Ahn, J. H. Byun, H. Koh, E. Rotenberg, S. D. Kevan, and H. W. Yeom, Phys. Rev. Lett. 93, 106401 (2004).
21. C. Gonzalez, J. Ortega, and F. Flores, New J. Phys. 7, 100 (2005).
Obtaining the Full Counting Statistics of Correlated Nanostructures from Time Dependent Simulations
Peter Schmitteckert
Transport properties of strongly interacting quantum systems are a major challenge in today's condensed matter theory. In our project we apply the density matrix renormalization group (DMRG) method [1–6] to study transport properties [7–14] of quantum devices attached to metallic leads. To this end we have developed two complementary approaches to obtain the conductance of a structure coupled to left and right leads. First, we used the Kubo approach [15] to obtain the linear conductance. Combined with leads described in momentum space [16, 17], we were able to achieve high resolution in energy. The second approach is based on simulating the time evolution [18–20] of an initial state with a charge imbalance and is reviewed in [21]. In cooperation with Edouard Boulat and Hubert Saleur we have been able to show that our approach is in excellent agreement with analytical calculations in the framework of the Bethe ansatz [12]. This agreement is remarkable, as the numerics is carried out for a lattice model, while the analytical result is based on field-theoretical methods in the continuum. Therefore we have to introduce a scale TB to compare the field-theoretical result to our numerics. Remarkably, at the so-called self-dual point the complete regularization can be expressed by a single number, even for arbitrary contact hybridization t′. Most strikingly, we proved the existence of a negative differential conductance (NDC) regime even in this simplistic model of a single resonant level with interaction on the contact link. In an extension of this approach we presented results for current-current correlations, including shot noise, based on our real-time simulations in our last report [22], see also [13, 14]. In this report we extend this scheme in order to access the cumulant generating function (CGF) of the electronic transport within the interacting resonant level model (IRLM) at its self-dual point [23].
Peter Schmitteckert Institute of Nanotechnology, Karlsruhe Institute of Technology, 76344 Eggenstein-Leopoldshafen, Germany
1 Shot Noise in the Interacting Resonant Level Model
The study of current fluctuations in nanodevices such as quantum point contacts and tunnel junctions is deeply connected with some of the most important physical questions. These include the nature of fundamental excitations in strongly interacting electronic systems [24, 25], the possibility of fluctuation theorems out of equilibrium [26], and the time evolution of many-body entanglement [27]. Experimental progress in this area has been swift: second and third cumulants have been measured in several systems [28, 29], shot noise of single hydrogen molecules has been measured [30], and even the full counting statistics has been obtained in semiconductor quantum dots [31]. In [13, 14, 22] we presented results for the calculation of shot noise from simulating the time evolution of a one-dimensional system with two leads coupled to a single impurity subject to a voltage quench. We described a method that allows us to extract the noise power spectrum by means of a Fourier transform of the current fluctuations. To remove finite-size effects, we showed that in leading order they scale like aG²(VSD)/M, where G(VSD) is the differential conductance, M the number of sites, and a a regularization constant independent of the system parameters. In detail we looked at the current-current correlation function

  S(t,t′) = ⟨ΔÎ(t) ΔÎ(t′)⟩_Ψ,  ΔÎ(t) = Î(t) − ⟨Î(t)⟩_Ψ,  Î(t) = e^{iĤt} Î e^{−iĤt},  ⟨·⟩_Ψ = ⟨Ψ| · |Ψ⟩,   (1)

for the IRLM

  Ĥ = Ĥ_0 + Ĥ_B,   (2)

  Ĥ_0 = −J [ Σ_{m=−M_L}^{−2} ĉ†_{m+1} ĉ_m + Σ_{m=1}^{M_R−1} ĉ†_{m+1} ĉ_m + H.c. ],   (3)

  Ĥ_B = Σ_{m=±1} [ U (n̂_m − 1/2)(n̂_d − 1/2) − J′ (ĉ†_m d̂ + d̂† ĉ_m) ] + V_g n̂_d,   (4)
with the creation and annihilation operators as well as the density operators in the leads (ĉ†_j, ĉ_j, n̂_j = ĉ†_j ĉ_j) and on the level (d̂†, d̂, n̂_d = d̂† d̂). The number of lattice sites in the left (right) lead is denoted by M_L (M_R). The fluctuations then allow for the calculation of the noise power spectrum

  S(ω) = 4 Re ∫_0^∞ dt e^{iωt} S(t,t′),   (5)
where we restricted ourselves to the low-frequency limit ω = 0. The initial state |Ψ⟩ is obtained as the quantum mechanical ground state of the Hamiltonian Ĥ + VSD (N̂_L − N̂_R)/2. The particle number operators in the left (N̂_L) and right (N̂_R) lead enter with opposite sign, which generates a charge imbalance in the initial state corresponding to a difference of the chemical potentials in the left and the right lead.
While we have been able to prove excellent agreement between our numerical simulations and the analytical results based on the thermodynamic Bethe ansatz [13], it turns out that the extraction of the results is a tedious fitting problem. First, one has to ensure that the DMRG itself is converged, i.e. that the projected subspace is large enough to provide accurate results for the time-dependent simulations. Next, one has to ensure that the initial time t0 is large enough to provide steady-state results, and the transient and finite-size oscillations have to be taken into account [21]. However, the main difficulty consists in choosing suitable boundaries for the Fourier transformation of S(t,t0) and in ensuring that the results are indeed independent of the choice. And finally, there is the physical finite-size effect which originates from the finite time of the simulations. This is the one we could show to scale ∼ G²/M.
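In the zero-frequency limit, the transform of eq. (5) reduces to a plain time integral; the sketch below illustrates the procedure, with the window boundaries (whose choice has to be checked for convergence, as just discussed) as explicit parameters and `corr` standing for the simulated correlation data:

```python
import numpy as np

def zero_frequency_noise(times, corr, t_start, t_end):
    """S(omega=0) = 4 Re Int dt S(t, t0), cf. eq. (5); `corr` holds the
    current-current correlations S(t, t0) on the grid `times`, and the
    integration window [t_start, t_end] must lie in the steady-state
    regime reached after the voltage quench."""
    mask = (times >= t_start) & (times <= t_end)
    return 4.0 * np.real(np.trapz(corr[mask], times[mask]))
```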
2 Full Counting Statistics of the Interacting Resonant Level Model
Having established the results for the finite-bias conductance [12] and the associated shot noise [13], we now extend our framework to get access to the full counting statistics (FCS) via the cumulant generating function [32]. In this approach we start by introducing a counting field χ which couples like a vector potential,
  H_χ = H + (t_1 e^{iχ/2} − t′) ĉ†_{n_1} ĉ_{n_1−1} + (t_1 e^{−iχ/2} − t′) ĉ†_{n_1−1} ĉ_{n_1}.   (6)

As in the simulation of the current-current correlations, we now perform a voltage quench and follow the transient up to time T0:

  |Ψ(T0)⟩ = e^{−i(H−E0)T0} |Ψ0⟩,   (7)

  |Ψ_χ(T)⟩ = e^{−i(H_χ−E0)(T−T0)} |Ψ(T0)⟩,   (8)

  Z(χ,T) = ⟨Ψ_{−χ}(T) | Ψ_χ(T)⟩,   (9)

  F(χ,T) = −ln Z(χ,T).   (10)

At time T0 we switch on the counting field χ and define the cumulant generating function F via the logarithm of the Keldysh partition function Z. The reason for this approach is that one obtains the cumulants of the charge transport by taking derivatives of F with respect to the counting field,

  C_n = −(∂/i∂χ)^n F(χ,t_m) |_{χ=0, t→∞}.
144
P. Schmitteckert
flowing in the system, and C2 = Stm where S is the zero-frequency shot noise. Note, that since the cumulants are real it follows directly that the even cumulants are given by the real part of F, while then odd cumulants are given by the imaginary part of F. While the FCS for non-interacting electrons is fairly comprehensively understood [34, 35], far less is known in the presence of interactions [33, 36], particularly when the interactions drive the system into a strongly correlated regime. In the following we will show that this approach can be applied to interacting systems. where we study the CGF for the IRLM at the self dual point as discussed in the previous section.
2.1 Current We begin the discussion by looking at the overlap for small values of χ , where one can write F(χ ) ≈ iC1 χ tm −C2 χ 2tm /2. Hence looking at the imaginary part of (∂ F( χ )/∂ tm )/χ for small values of χ will give the current. We plot this in Fig. 1 for various values of bias voltage VSD and counting fields χ , where we see relatively weak transient effects, and a clear convergence to the correct analytic value, previously discussed in Ref. [12].
Fig. 1 Time evolution of the imaginary part of ∂tm F(χ )/( χVSD ) for various values of bias voltage VSD , system size M, and small values of measuring field χ . This converges in the long time limit to the current of the model, where we rescaled the current by 1/VSD for visibility reasons. The analytic result for the conductance is also plotted. It is seen that after an initial transient, excellent agreement is obtained between the time evolved simulation in the presence of the measuring field and the analytic result
Fig. 2 Time evolution of the real part of ∂tm F(χ )/(χ 2 /2) for various values of V , M and small values of χ . This gives the zero frequency shot-noise of the model, which is known to have a noticeable finite-size correction [13]. Excellent agreement is found between the time evolved simulation and the finite size corrected analytic result, which is also plotted
2.2 Noise
Turning to the real part gives the noise, which is plotted in Fig. 2. This can be compared to the results in Ref. [13]. Despite the rather large transient behavior, the asymptotic behavior is quite clear and is in perfect agreement with the previously discussed analytic result for the noise (which should be corrected for finite-size effects [13]). We note that the present method of obtaining the (zero-frequency) noise is computationally a lot easier than obtaining the correlation function directly. The main advantage is that the noise is obtained directly as the long-time limit of F, without having to resort to the Fourier transform and its associated finite-size-related difficulties, as explained in the previous section.
2.3 Full Counting Statistics of the IRLM at the Self-Dual Point
Let us now turn to the full χ-dependent curve of F̃(χ) = ∂_{tm} F(χ). Following the procedure outlined in [12, 13], one can extend the Bethe ansatz based analysis to obtain all cumulants [32],

  F̃(χ) = (1/2π) [ iVχ − V Σ_{m>0} (a_4(m)/2m) (V/T_B)^{6m} (e^{−2miχ} − 1) ],   (11)
Fig. 3 Cumulant generating function for self-dual interacting resonant level model, at small voltages compared to analytic results
  F̃(χ) = V Σ_{m>0} (2 a_{1/4}(m)/m) (V/T_B)^{−3m/2} (e^{imχ/2} − 1).   (12)

Equations (11) and (12) hold for V < V_c and V > V_c, respectively. Here T_B ∼ (t′)^{4/3} is the natural energy scale of the problem, and V_c = 4^{2/3} √3 T_B is the convergence radius of each of the series. This (non-universal) parameter of proportionality relates the regularization of the field theory (on which the Bethe ansatz solution is based) to that of the original lattice model, and can be taken from previous works [12]. This leaves zero free fitting parameters for all comparisons between analytic and numerical results in this report. The coefficients are given by

  a_K(m) = (−1)^{m+1} Γ(1 + Km) / [ √(4π) m! Γ(3/2 + (K−1)m) ].   (13)
We now look at the results, where we first consider the low-bias regime. We plot F̃(χ) in Fig. 3 along with the analytic result (11), where we find nice agreement between the imaginary part of F and our simulations. It is known that at small voltages the finite-size effects in the noise are quite large [13]. While we have been able to take the finite-size corrections into account for the shot-noise simulations, we find extremely large finite-size effects for the imaginary part of F at large values of χ which are still waiting to be understood.
Fig. 4 Cumulant generating function for self-dual interacting resonant level model at large voltages, compared to analytic results
Looking at the analytic expansion (11), we see that the backscattering current may be thought of as a sum of Poissonian processes of effective quasi-particles of charges 2me, where m is an integer [37, 38]. As a result of quantum interference, the effective "probabilities" in this equivalent Poissonian process are not all positive, and therefore the electronic transport is not classical. However, from the periodicity of F(χ) it is clear that the fundamental quasi-particles being scattered carry charge 2e. We note that this behavior was already hinted at by the Fano factor [13]; knowledge of the full CGF confirms this. We now turn to the large-voltage regime: In Fig. 4 we plot numerical results along with the appropriate analytic curves from (12). This time we find very good agreement between the numerics and the analytic result for both the real and the imaginary part of F, adding further evidence to justify the technique first presented in Ref. [39]. As in the low bias voltage case, we can interpret the expansion (12) as effective Poissonian processes, but this time with fractionally charged quasi-particles of charge me/2 [37, 38, 40]. For a detailed discussion see Ref. [32]. In summary, we have presented a numerical technique based on time-dependent DMRG from which one can extract the finite-bias CGF of the FCS for a generic quantum impurity problem. We have demonstrated the method for the self-dual IRLM, for which we can compare the results to those obtained from Bethe ansatz calculations, finding excellent agreement, except for the imaginary part of F in the low-voltage regime, where the results are dominated by finite-size effects.
2.4 Remarks on the Numerics
In our DMRG simulations we typically start by keeping up to 1200 states per block and performing only 2 time steps to initialize the calculations. We then successively increase the number of states kept per block up to 3100 and the time frame until t = vF · M/2, as outlined in our previous reports. The target subspace reaches dimensions of ≈ 8·10⁶ states. The actual wall time of the jobs is of the order of three to five months of constantly self-submitting batch jobs, which shows that we are currently facing the limit of what we can achieve on the XC2. Yet, the mere existence of our results shows that the code is running very stably on the XC2. The code runs embarrassingly parallel with respect to the nodes, as on each node a different χ value was running; within a node the code is parallelized using POSIX threads, making use of high concurrency within the code. The work chunks which are distributed within our master-worker approach are dominantly BLAS-3 calls. Therefore the overhead induced by the parallelization is very small compared to the actual work load; compare our previous reports [8–11, 22].
3 Color-Charge Separation in Trapped SU(3) Fermionic Atoms
One of the most intriguing effects of strong correlations in low-dimensional systems is the separation of charge and spin degrees of freedom. Within this project we have shown that the spin-charge separation should be observable by means of a transport experiment [10, 18]. Here we extend our simulations to the case of three flavors, which could be realized by cold fermionic atoms with three different hyperfine states confined in one-dimensional optical lattices. Extending our previous work to SU(3) fermions, we tracked the evolution of a particle injected into a one-dimensional system subject to on-site interaction. Here the Hamiltonian is given by an SU(3) variant of the Hubbard model,

  H = −t Σ_{⟨ij⟩,α} ( f†_{iα} f_{jα} + h.c. ) + Σ_{i, α≠β} (U_{αβ}/2) n_{iα} n_{iβ}.   (14)
In Ref. [41] we showed that, similar to the SU(2) case, the initial wave packet splits into the charge and the corresponding spin degrees of freedom. As an example we show several snapshots in Fig. 5. A detailed analysis [41] reveals nice agreement between the measured velocities and an analytical treatment in the framework of a Luttinger liquid description. It shows that the color-charge separation exists for repulsive and not too strongly attractive interaction.

Acknowledgments. The DMRG calculations were performed on the HP XC4000 at the Steinbuch Center for Computing (SCC) Karlsruhe under project RT-DMRG, with support through project B2.10 of the DFG Centre for Functional Nanostructures. The work on the full counting statistics
Fig. 5 Snapshots of the time evolution of an additional fermion for filling ν = 0.5 and U = 0 (a), U = +1 (b), U = −1 (c) and U = +5 (d). The initial state is a Gaussian wave packet with average momentum k = 0.6π . In (b), (c), (d) the color density of quantum number j8 (magenta) separates from the charge density (black). Lines serve as guides to the eye
is a joint project with Sam Carr and Dmitry Bagrets. I would also like to thank Edouard Boulat and Hubert Saleur for many insightful discussions. The project on color-charge separation was performed together with Tobias Ulbricht, Rafael A. Molina, and Ronny Thomale.
References

1. S. R. White. Phys. Rev. Lett., 69:2863, 1992.
2. S. R. White. Phys. Rev. B, 48:10345, 1993.
3. I. Peschel, X. Wang, M. Kaulke, and K. Hallberg, editors. Density Matrix Renormalization, 1999.
4. R. M. Noack and S. R. Manmana. Diagonalization- and numerical renormalization-group-based methods for interacting quantum systems. In A. Avella and F. Mancini, editors, Lectures on the Physics of Highly Correlated Electron Systems IX: Ninth Training Course in the Physics of Correlated Electron Systems and High-Tc Superconductors, volume 789, pages 93–163, Salerno, Italy. AIP, Melville, NY, USA, 2005.
5. K. A. Hallberg. New trends in density matrix renormalization. Adv. Phys., 55(5):477–526, 2006.
6. U. Schollwöck. The density-matrix renormalization group. Rev. Mod. Phys., 77(1), 2005.
7. P. Schmitteckert. Nonequilibrium electron transport using the density matrix renormalization group. Phys. Rev. B, 70:121302(R), 2004.
8. P. Schmitteckert and G. Schneider. Signal transport and finite bias conductance in and through correlated nanostructures. In W. E. Nagel, W. Jäger, and M. Resch, editors, High Performance Computing in Science and Engineering ’06, pages 113–126. Springer, Berlin, 2006.
9. P. Schmitteckert. Signal transport in and conductance of correlated nanostructures. In W. E. Nagel, D. B. Kröner, and M. Resch, editors, High Performance Computing in Science and Engineering ’07, pages 99–106. Springer, Berlin, 2007.
10. T. Ulbricht and P. Schmitteckert. Signal transport in and conductance of correlated nanostructures. In W. E. Nagel, D. B. Kröner, and M. Resch, editors, High Performance Computing in Science and Engineering ’08, pages 71–82. Springer, Berlin, 2008.
11. A. Branschädel, T. Ulbricht, and P. Schmitteckert. Conductance of correlated nanostructures. In W. E. Nagel, D. B. Kröner, and M. Resch, editors, High Performance Computing in Science and Engineering ’09, pages 123–137. Springer, Berlin, 2009.
12. E. Boulat, H. Saleur, and P. Schmitteckert. Twofold advance in the theoretical understanding of far-from-equilibrium properties of interacting nanostructures. Phys. Rev. Lett., 101(14):140601, 2008.
13. A. Branschädel, E. Boulat, H. Saleur, and P. Schmitteckert. Shot noise in the self-dual interacting resonant level model. Phys. Rev. Lett., 105(14):146805, Oct 2010.
14. A. Branschädel, E. Boulat, H. Saleur, and P. Schmitteckert. Numerical evaluation of shot noise using real-time simulations. Phys. Rev. B, 82(20):205414, Nov 2010.
15. D. Bohr, P. Schmitteckert, and P. Wölfle. DMRG evaluation of the Kubo formula – conductance of strongly interacting quantum systems. Europhys. Lett., 73:246, 2006.
16. D. Bohr and P. Schmitteckert. Strong enhancement of transport by interaction on contact links. Phys. Rev. B, 75(24):241103(R), 2007.
17. P. Schmitteckert. Calculating Green functions from finite systems. J. Phys.: Conf. Ser., 220:012022, 2010.
18. T. Ulbricht and P. Schmitteckert. Is spin-charge separation observable in a transport experiment? EPL, 86(5):57006, 2009.
19. T. Ulbricht and P. Schmitteckert. Tracking spin and charge with spectroscopy in spin-polarised 1d systems. EPL, 89:47001, 2010.
20. T. Ulbricht, R. A. Molina, R. Thomale, and P. Schmitteckert. Color-charge separation in trapped SU(3) fermionic atoms. Phys. Rev. A, 82(1):011603, July 2010.
21. A. Branschädel, G. Schneider, and P. Schmitteckert. Conductance of inhomogeneous systems: Real-time dynamics. Ann. Phys. (Berlin), 522:657, 2010.
22. A. Branschädel and P. Schmitteckert. Conductance of correlated nanostructures. In High Performance Computing in Science and Engineering ’10. Springer, Berlin, 2010.
23. D. Bagrets, S. Carr, and P. Schmitteckert. Full counting statistics in the self-dual interacting resonant level model. arXiv:1104.3532.
24. R. de Picciotto, M. Heiblum, H. Shtrikman, and D. Mahalu. Phys. Rev. Lett., 75:3340, 1995.
25. A. Kumar, L. Saminadayar, D. C. Glattli, Y. Jin, and B. Etienne. Phys. Rev. Lett., 76:2778, 1996.
26. M. Esposito, U. Harbola, and S. Mukamel. Rev. Mod. Phys., 81:1665, 2009.
27. I. Klich and L. Levitov. Phys. Rev. Lett., 102:100502, 2009.
28. B. Reulet, J. Senzier, and D. E. Prober. Phys. Rev. Lett., 91:196601, 2003.
29. Y. Bomze, G. Gershon, D. Shovkun, L. S. Levitov, and M. Reznikov. Phys. Rev. Lett., 95:176601, 2005.
30. D. Djukic and J. M. van Ruitenbeek. Shot noise measurements on a single molecule. Nano Letters, 6(4):789, 2006.
31. S. Gustavsson, R. Leturcq, B. Simovic, R. Schleser, T. Ihn, P. Studerus, and K. Ensslin. Phys. Rev. Lett., 96:076605, 2006.
32. S. Carr, D. Bagrets, and P. Schmitteckert. Full counting statistics in the self-dual interacting resonant level model. Phys. Rev. Lett., 2011.
33. D. S. Golubev, D. A. Bagrets, Y. Utsumi, and G. Schön. Fortschr. Phys., 54:917, 2006.
34. H. Lee, L. S. Levitov, and G. B. Lesovik. J. Math. Phys., 37:4845, 1996.
35. K. Schönhammer. Full counting statistics for noninteracting fermions: Exact results and the Levitov-Lesovik formula. Phys. Rev. B, 75(20):205329, May 2007.
36. A. W. W. Ludwig, A. O. Gogolin, R. M. Komnik, and H. Saleur. Ann. Phys. (Leipzig), 16:678, 2007.
37. H. Saleur and U. Weiss. Point-contact tunneling in the fractional quantum Hall effect: An exact determination of the statistical fluctuations. Phys. Rev. B, 63(20):201302, Apr 2001.
38. A. Komnik, B. Trauzettel, and U. Weiss. Ann. Phys. (Leipzig), 16:661, 2007.
39. P. Fendley, A. W. W. Ludwig, and H. Saleur. Exact nonequilibrium transport through point contacts in quantum wires and fractional quantum Hall devices. Phys. Rev. B, 52(12):8934–8950, Sep 1995.
40. L. S. Levitov and M. Reznikov. Counting statistics of tunneling current. Phys. Rev. B, 70(11):115305, Sep 2004.
41. T. Ulbricht, R. A. Molina, R. Thomale, and P. Schmitteckert. Color-charge separation in trapped SU(3) fermionic atoms. Phys. Rev. A, 82(1):011603, July 2010.
Phase Diagram of the 1D t-J Model

A. Moreno, A. Muramatsu, and S. Manmana

A. Moreno · A. Muramatsu
Institut für Theoretische Physik III, Universität Stuttgart, Pfaffenwaldring 57, D-70550 Stuttgart, Germany

S. Manmana
JILA, Department of Physics, University of Colorado, Boulder, CO 80309-0440, USA
Abstract The ground-state phase diagram of the t-J model in one dimension is studied by means of the Density Matrix Renormalization Group. The phase boundaries separating the repulsive from the attractive Luttinger-liquid (LL) phase, as well as the boundaries of the spin-gap region and of phase separation, are determined on the basis of correlation functions and energy gaps. In particular, we shed light on a contradiction between variational and renormalization-group (RG) results about the extent of the spin-gap phase, which turns out to be larger than the variational one but smaller than the RG one.
1 Overview

The work that we present in the remainder of this report was recently accepted for publication in Physical Review B [1]. Apart from the work detailed below, we considered in the last grant period the following topics: We studied the finite-temperature ordering of defect-induced magnetic moments in graphene [2], based on an effective spin model that accounts for the long-ranged RKKY interactions mediated among the moments by the conduction electrons. We verified the mean-field character of the finite-temperature ordering transition, and analyzed the dependence of the Néel temperature on the defect concentration. Furthermore, we analyzed the Hubbard model of spin-1/2 fermions on the honeycomb lattice at half-filling using large-scale quantum Monte Carlo simulations [3]. We found that the weak-coupling semimetal and the antiferromagnetic Mott insulator at strong interaction are separated by an extended spin-liquid phase in an
intermediate coupling regime, best described by a resonating valence bond state. We presented a summary in the previous annual report. Further work awaiting publication comprises the simulation of supersolidity in polar molecules [4], edge-state magnetism in graphene nanoribbons [5], quantum criticality in dimerized antiferromagnets [6], and pair superfluidity for bosons with three-body constraints [7].
2 Introduction

The t-J model constitutes, together with the Hubbard model, a paradigm for the theoretical description of high temperature superconductors (HTS) since its derivation by Zhang and Rice [8] from a three-band Hubbard (spin-fermion) model describing the copper-oxide planes present in HTS. It can also be derived in second order of perturbation theory around U = ∞, where U is the strength of the interaction in the Hubbard model [9]. Its Hamiltonian is as follows:

H = −t ∑_{⟨i,j⟩,σ} ( f†_{i,σ} f_{j,σ} + h.c. ) + J ∑_{⟨i,j⟩} ( S_i · S_j − (1/4) n_i n_j ) ,   (1)
where the operator f†_{i,σ} (f_{i,σ}) creates (destroys) a fermion with spin σ = ↑,↓ on site i. They are not canonical fermionic operators, since they act on a restricted Hilbert space without double occupancy. S_i = f†_{i,α} σ_{αβ} f_{i,β} is the spin operator and n_i = f†_{i,σ} f_{i,σ} is the density operator. In all expressions a summation over repeated indices is understood. Furthermore, ⟨i,j⟩ denotes nearest-neighbor bonds. While the main interest in this model resides in its two-dimensional (2D) realization, it presents already in one dimension (1D) a number of very interesting features. In contrast to the 1D repulsive Hubbard model, where only a Luttinger liquid (LL) phase for density n ≠ 1 and an insulating phase for n = 1 are present, the t-J model possesses a rich phase diagram, as shown first by M. Ogata et al. [10] long ago. Interestingly, the phases display a correspondence to the ones present in HTS, like superconductivity, spin gap and phase separation, albeit for values of J/t outside the range pertaining to HTS. In spite of this fact, since unbiased results for the 2D model can only be obtained by exact diagonalization [11] or by density matrix renormalization group (DMRG) [12] on small systems, a controlled finite-size scaling determining the phase diagram of the 2D t-J model is up to now not available. Therefore, the 1D version presents a possibility of gaining insight into exotic phases like the spin-gap region. The 1D t-J model has been solved exactly only for J/t → 0, where it is equivalent to the U → ∞ Hubbard model [13–15], and for the supersymmetric point J = 2t [16, 17]. In both cases the model behaves as an LL [18–22]. For very large J/t the attractive interaction dominates over the kinetic energy and the system phase-separates into hole-rich regions and antiferromagnetic islands. Although the first
phase diagram appeared almost twenty years ago, there are still issues to be clarified, like the boundaries of the spin-gap phase. In previous studies the existence of a spin gap has been deduced from exact diagonalization (ED) of small systems [10], from variational [23] or projection [24] methods, and from a combination of renormalization group (RG) arguments with ED [25]. In particular, the last study found a much larger region in density for the spin-gap phase. A main goal of the present work is to achieve a precise determination of the phase diagram by performing finite-size extrapolations to the thermodynamic limit for the correlation functions and energy differences relevant for the different phases. The results are obtained using the DMRG method [26–28] on lattices with up to L = 200 sites. The method will be briefly described in the next section.
3 Method

In order to give the basic idea underlying the density matrix renormalization group (DMRG) [26–28], we consider a chain with L sites, as displayed in Fig. 1. Each site j has an associated set of vectors |s_{i_j}⟩ of dimension d, where i_j = 1, …, d. The basis of the Hilbert space of the entire system is constructed as the tensor product of all single-site states. Therefore, the dimension of this space, which is given by d^L, increases exponentially with the system size L. Consequently, the computational effort to solve the problem exactly increases exponentially, and with current classical computers we can reach up to around 40 sites for spin-1/2 chains. The DMRG circumvents the necessity of dealing with the whole Hilbert space by selecting a set of states that best represents the targeted state for a given accuracy. The general idea of the method is to start with a chain small enough to diagonalize the problem exactly, and then to discard unimportant degrees of freedom while the system grows. After partitioning the system into two parts, the most important degrees of freedom to be kept are given by the eigenvectors with the largest eigenvalues of the reduced density matrix of one part, obtained by tracing out the states of the other part. More explicitly, we divide the chain into two parts: a left and a right part. Each part is spanned by a set of vectors: |i⟩ for the left part and |j⟩ for the right one. Hence, the basis of the entire system is given by the tensor product |i⟩|j⟩, and any state |ψ⟩ can be written as a linear combination of this basis as follows
Fig. 1 One-dimensional quantum system with L sites. The state of each site is described by a set of vectors |s_{i_j}⟩
|ψ⟩ = ∑_{i}^{d_L} ∑_{j}^{d_R} ψ_{ij} |i⟩|j⟩ ,   (2)
where d_L is the dimension of the left subspace and d_R that of the right one. We can construct a reduced density matrix operator ρ̃ associated to the state |ψ⟩ by taking the usual density matrix ρ = |ψ⟩⟨ψ| for the pure state |ψ⟩ and tracing out the degrees of freedom of one part, say the right part,

ρ̃ = Tr_R |ψ⟩⟨ψ| = ∑_j ⟨j|ψ⟩⟨ψ|j⟩ .   (3)
For an operator A acting on the left part we can prove that its mean value is given by

⟨ψ|A|ψ⟩ = Tr_L (ρ̃ A) = ∑_i ν_i ⟨ν_i|A|ν_i⟩ ,   (4)
where ν_i and |ν_i⟩ are the eigenvalues and eigenvectors of the reduced density matrix ρ̃. Since the ν_i fulfill ν_i > 0 and ∑_i ν_i = 1, and since they decay quite fast, we can introduce a cutoff and compute this mean value up to a certain error by choosing the m largest eigenvalues of ρ̃. This error is also called the discarded weight and is defined as

ε = 1 − ∑_{i=1}^{m} ν_i ,   (5)
where m < d_L and we have assumed that the ν_i are in decreasing order. Therefore, given the discarded weight ε, (5) determines m and consequently defines a truncation procedure to reduce the Hilbert space. Typical values of m are 200–400 for chains with L = 100–200 and ε of the order of 10^{-8}. One way to see that the set |ν_i⟩ represents the optimal basis is to write down an approximation |ψ̃⟩ of the state (2) given by

|ψ̃⟩ = ∑_{α}^{m} ∑_{j}^{d_R} a_{αj} |α⟩|j⟩ ,   (6)
where the vectors |α⟩ stand for an arbitrary truncated basis of the left part with dimension m < d_L. After minimizing the quantity ‖ |ψ⟩ − |ψ̃⟩ ‖, it can be proved that |α⟩ = |ν_i⟩, i.e., the optimal truncated basis is given by the eigenvectors of the reduced density matrix [26–28]. The DMRG method was originally introduced to deal with both infinite and finite systems. In the infinite-system algorithm the size of the chain L increases, while in the finite-system algorithm L is kept constant by increasing one part of the chain and decreasing the other one simultaneously. In the finite-system algorithm we say that we have completed a sweep after going back and forth through the chain. This sweeping helps to converge to more precise values of the measurements. For more details about the DMRG we refer the reader to the original references [26, 27] and a review article [28].
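To make the truncation of (2)–(6) concrete, the following sketch (NumPy, with a random state and toy dimensions) builds the reduced density matrix, keeps the m largest eigenvalues, and verifies that the approximation error equals the discarded weight; it is an illustration, not the production DMRG code.

```python
# Sketch of the density-matrix truncation, Eqs. (2)-(6): keep the m largest
# eigenvalues of the reduced density matrix; the approximation error of the
# truncated state equals the discarded weight (random state, toy dimensions).
import numpy as np

dL, dR, m = 64, 64, 16
rng = np.random.default_rng(0)
psi = rng.standard_normal((dL, dR))
psi /= np.linalg.norm(psi)              # normalized coefficients psi_ij, Eq. (2)

rho = psi @ psi.T                       # reduced density matrix, Eq. (3)
nu, V = np.linalg.eigh(rho)
nu, V = nu[::-1], V[:, ::-1]            # eigenvalues in decreasing order

eps = 1.0 - nu[:m].sum()                # discarded weight, Eq. (5)
P = V[:, :m]                            # optimal truncated basis |nu_i>
psi_trunc = P @ (P.T @ psi)             # best rank-m approximation, Eq. (6)
print(eps, np.linalg.norm(psi - psi_trunc) ** 2)   # both give the same number
```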
All the results in this report were obtained using at least m = 200 DMRG vectors, 4 sweeps, and a discarded weight of 10^{-8}. This translates into errors in the energy of the order of 10^{-8} and in the correlation functions of the order of 10^{-5} at the largest distances. Typical run times on the NEC SX-9 platform are 8–12 hours, and the number of parallel processes (MPI tasks) is 16–24. The NEC SX-9 platform is a vector machine with a shared memory of 512 GB per node. This makes the SX-9 a suitable platform for algorithms in which matrix-matrix and matrix-vector multiplications are the most time-consuming operations, as is the case for the DMRG method. To date, a deep parallelization of DMRG codes is not available; however, MPI parallelization is used to achieve a high throughput of data.
4 Phase Diagram

The phase diagram of the 1D t-J model obtained by Ogata et al. [10] is based on ED of systems with up to 16 lattice sites. They found three phases: a repulsive LL phase (metal), an attractive LL phase (superconductor) and phase separation. At that time they suspected the existence of a spin gap at low density, but they could not prove its existence due to the limitation to small system sizes. Subsequent works [23, 24] found evidence of the spin gap using variational methods. However, Nakamura et al. [25], based on an RG analysis, found that the spin-gap region is larger than expected. Here we present results from a direct measurement of the spin gap and an extrapolation to the thermodynamic limit. Details will be discussed in Sect. 4.2. Our results can be summarized in the phase diagram shown in Fig. 2. We obtained four phases: a metallic phase (M) or repulsive LL, a superconducting (SC) region without spin gap, a singlet-superconducting phase with spin gap (SG + SS), and phase separation (PS), where the system separates into a hole-rich part and domains of Heisenberg islands. The number attached to each line stands for the value of the Luttinger parameter K_ρ, with K_ρ < 1.0 in M and K_ρ > 1.0 in both superconducting phases. The determination of K_ρ will be discussed in the subsequent sections. In order to characterize the phases and to find the boundaries between them, we calculated the energy gap to triplet excitations directly and measured the density-density correlation function,

N_{ij} = ⟨n_i n_j⟩ − ⟨n_i⟩⟨n_j⟩ .   (7)
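For later reference, the structure factor analyzed in the next subsection is the lattice Fourier transform of (7); a minimal sketch (with the correlation matrix N_ij as a placeholder input) reads:

```python
# Structure factor N(k) as the lattice Fourier transform of the
# density-density correlations N_ij of Eq. (7) (sketch; in practice N_ij
# is measured with the DMRG).
import numpy as np

def structure_factor(Nij):
    """N(k) = (1/L) sum_{j,l} exp(i k (j - l)) N_jl on a chain of L sites."""
    L = Nij.shape[0]
    k = 2.0 * np.pi * np.arange(L) / L          # allowed wavevectors (a = 1)
    phase = np.exp(1j * np.outer(k, np.arange(L)))
    Nk = np.einsum('kj,jl,kl->k', phase, Nij, phase.conj()).real / L
    return k, Nk

# example with an uncorrelated placeholder, N_ij = n(1 - n) * delta_ij:
k, Nk = structure_factor(0.25 * np.eye(40))
```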
4.1 Metallic Phase

In order to characterize this phase, we compute the Luttinger parameter K_ρ, with K_ρ < 1 (K_ρ > 1) for a repulsive (attractive) interaction, and K_ρ = 1 for the free case. In order to obtain K_ρ, we consider the limit k → 0, where the structure factor for
Fig. 2 Phase diagram for the 1D t-J model from DMRG for densities 0.1 ≤ n ≤ 0.9 and in the range 0 < J ≤ 4, where we set t = 1. n = N/L is the electronic density (N is the total number of particles and L the number of lattice sites). Four phases are present: a metallic phase (M) or repulsive Luttinger liquid, a superconducting (SC) phase without spin gap, a singlet-superconducting phase with spin gap (SG + SS), and phase-separation (PS). The number given to each line stands for the value of the Luttinger parameter Kρ
Fig. 3 Structure factor N(k) of the density-density correlation function for n = 0.5, J = 2.0 and L = 40, 80, 120, 160, 200
the density correlations displays a linear behavior with a slope proportional to K_ρ [18, 29, 30]:

N(k) → K_ρ |k| a/π   for k → 0 .   (8)

Here a is the lattice constant (we set a = 1). Figure 3 shows N(k) for n = 0.5, J = 2.0 and L = 40, 80, 120, 160, 200. We observe a clear linear behavior for small k, with N(k = 0) = 0 due to the conservation of the total particle number.
Fig. 4 Extrapolation to the thermodynamic limit of N(k = 2π /40) for n = 0.5 and J = 2.0
Fig. 5 K_ρ as a function of J for different densities n
Although N(k) appears to be almost independent of the lattice size, a more precise value of the slope is obtained by extrapolating the value of N(k) at the point k = 2π/40, the smallest wavevector in our smallest system, to L → ∞. Using this extrapolated value together with N(k = 0) = 0, we obtain the slope and can then extract K_ρ in the thermodynamic limit using (8). This extrapolation is shown in Fig. 4. We repeated this procedure for different values of n and J. K_ρ as a function of J for different densities n is plotted in Fig. 5. Note that K_ρ → 0.5 when J → 0 for all densities, which is in agreement with the results obtained for the U/t → ∞ Hubbard model [13]. It can also be observed that, for K_ρ > 1, K_ρ increases quite fast with the interaction constant J; it should actually diverge in the phase-separated regime.
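In code, this procedure amounts to a polynomial extrapolation in 1/L followed by an application of (8); in the sketch below (NumPy) the data values and the fit order in 1/L are placeholders/assumptions.

```python
# Sketch of the K_rho extraction: extrapolate N(k0), k0 = 2*pi/40, to
# L -> infinity and use Eq. (8), N(k) -> K_rho |k| / pi (a = 1).
# The data values are placeholders and the fit order in 1/L is an assumption.
import numpy as np

Ls  = np.array([40, 80, 120, 160, 200])
Nk0 = np.array([0.1210, 0.1220, 0.1225, 0.1228, 0.1230])  # N(k0) per size L
k0  = 2.0 * np.pi / 40.0

coef    = np.polyfit(1.0 / Ls, Nk0, deg=2)  # polynomial in 1/L
Nk0_inf = np.polyval(coef, 0.0)             # value at 1/L = 0
K_rho   = np.pi * Nk0_inf / k0              # slope through N(0) = 0, Eq. (8)
print(K_rho)
```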
Fig. 6 K_ρ as a function of the density n for J = 2 (supersymmetric point)
Fig. 7 Regions with different Luttinger parameter Kρ (n, J). Each curve corresponds to a constant value of Kρ . The red line (Kρ (n, J) = 1) denotes the boundary between the metallic phase and the superconducting phases. Note that the density of lines increases with J showing the fast growth of Kρ
The critical exponents K_ρ at the supersymmetric point J = 2 and for all densities were obtained exactly long ago by means of the Bethe ansatz [31]. In Fig. 6 we compare our DMRG results with this exact solution and observe very good agreement between the two. From the data set presented in Fig. 5 we can extract all the points which fulfill K_ρ(n, J) = const. These are curves which separate regions with different Luttinger parameters. The different regions and curves are plotted in Fig. 7. The red line (K_ρ(n, J) = 1) denotes the boundary between the metallic phase and the superconducting phases. This line and a few others are also plotted in the phase diagram (Fig. 2). Note that the density of lines increases with J, showing the fast growth of K_ρ.
4.2 Singlet-Superconductivity and Spin Gap Phase

The possibility of a region with a spin gap was first analyzed by Ogata et al. [10] for the low-density limit, where a gas of singlet bound pairs may form. They compared the ground-state energy of a system containing four particles to twice the energy of a system with only two particles. The energy for the latter situation is 2(−J − 4/J), where the expression −J − 4/J is obtained by solving the Hamiltonian (1) exactly for two particles. The ground-state energy for the four particles is obtained here numerically using the DMRG. The comparison between these energies is shown in Fig. 8, using 1000 lattice sites, which corresponds to a very low density of n = 0.004. We observe a region, 2.0 < J < 3.0, where both energies are the same within an error of 10^{-5}. At least from energy considerations, this is an indication of the possible formation of a gas of singlet bound pairs and consequently of the existence of a spin gap at very low densities. A more precise estimate can be obtained by measuring the spin gap Δ E_S directly. The spin excitation energy from a singlet to a triplet state is given by the energy difference

Δ E_S = E_0(N, S^z_tot = 1) − E_0(N, S^z_tot = 0) ,   (9)
Fig. 8 Energy comparison between a system containing four particles and a gas of singlet bound pairs. We observe a region 2.0 < J < 3.0 where both energies are the same within an error of 10^{-5}. This opens the possibility of the formation of a gas of singlet bound pairs and consequently of the existence of a spin gap at very low densities
Fig. 9 Δ E_S vs 1/L for n = 0.2 and various values of J. For J = 2, where the system is still in the metallic phase, the spin gap extrapolates to zero in the thermodynamic limit
where the subindex 0 means that we take the lowest energy level with given quantum numbers N and S^z_tot. In order to go to the thermodynamic limit we plot Δ E_S as a function of 1/L and extrapolate to 1/L → 0 using L = 40, 80, 120, 160, and 200. Figure 9 shows Δ E_S vs 1/L at n = 0.2 and for J = 2.0, 2.7 and 2.9. The extrapolations to the thermodynamic limit are performed with third-order polynomials. While for J = 2, where the system is still in the metallic regime, the gap extrapolates to zero, for J = 2.7 and 2.9 a finite gap can clearly be seen. Proceeding in the same manner for different values of n and J, we obtain the spin gap in the thermodynamic limit (Fig. 10). For J < 2 (metallic phase) we observe that the spin gap is zero for all densities. For J > 2 a finite spin gap emerges that increases as the density diminishes. For definiteness, J_c is defined in our case as the value of J for which Δ E_S > 10^{-4} t, this value being the range over which Δ E_S fluctuates around zero before it definitively increases as a function of J for a given density. In this manner we have obtained the lower boundary of the spin-gap phase in the phase diagram (Fig. 2). We also present in Fig. 11 the spin gap as a function of n for J = 2.1–2.8. Note that Δ E_S smoothly closes to zero when the density is increased. Δ E_S attains its largest values, as J increases, in the limit of vanishing density, reaching Δ E_S ≈ J/20. Since in one dimension superconductivity can only set in at temperature T = 0, such a finite value of the gap gives the energy scale at which pairs form, signaling the existence of preformed pairs in this regime. A further increase of J leads to phase separation, which we discuss next.
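The extrapolation just described can be sketched in a few lines (NumPy); the gap values below are placeholders, while the third-order polynomial in 1/L follows the procedure stated above.

```python
# Sketch of the spin-gap extrapolation: third-order polynomial fit of
# Delta E_S(L) in 1/L, evaluated at 1/L = 0 (gap values are placeholders).
import numpy as np

Ls   = np.array([40, 80, 120, 160, 200])
gaps = np.array([0.110, 0.062, 0.047, 0.040, 0.036])  # Delta E_S(L), Eq. (9)

coef    = np.polyfit(1.0 / Ls, gaps, deg=3)
gap_inf = np.polyval(coef, 0.0)             # thermodynamic-limit spin gap
print(gap_inf > 1e-4)                       # criterion used to locate J_c
```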
Fig. 10 Spin gap Δ ES in the thermodynamic limit as a function of J and for different densities n. For J < 2 (metallic phase) the spin gap is zero for all densities. For J > 2 a finite spin gap emerges with a value that increases for diminishing densities. We can observe a small but finite spin gap up to n = 0.55
Fig. 11 Spin gap Δ E_S in the thermodynamic limit as a function of n for J = 2.1–2.8
4.3 Phase Separation

In this phase the attraction among the particles is so strong that they start to form antiferromagnetic domains, such that the system separates into particle-rich and hole-rich regions. In the limit J → ∞ all the particles join in a single island, which can be described by the Heisenberg model, forming an electron-solid phase in which the kinetic fluctuations are completely eliminated and only spin fluctuations remain. We first consider the inverse of the compressibility, which vanishes at the onset of phase separation.
Fig. 12 Inverse of the compressibility κ^{-1} as a function of J for n = 0.1–0.9 with Δn = 0.05
At zero temperature the expression for the inverse compressibility is given by
κ^{-1}(n) = n² ∂²e_0(n)/∂n² ≈ n² [ e_0(n + Δn) + e_0(n − Δn) − 2 e_0(n) ] / Δn² ,   (10)
where e_0(n) = E_0/L is the energy density per site, and the second expression gives the approximation for finite (Δn = 0.05) changes in the density. For the extrapolation of e_0(n) we use L = 40, 80, 120, 160 and a third-order polynomial fit. Figure 12 shows κ^{-1} vs J for different densities. In this manner we found the boundary of the phase-separated region in Fig. 2. Note that, in comparison to other studies [10, 23, 24], the phase-separation boundary is shifted to higher values of J.
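As a sketch, the second expression of (10) is a central second difference of the extrapolated energy density; the energy function below is a placeholder.

```python
# Sketch of Eq. (10): inverse compressibility as a central second difference
# of the energy density e0(n) with Delta n = 0.05 (placeholder e0 below; in
# the actual procedure e0 is first extrapolated to the thermodynamic limit).
def inv_compressibility(e0, n, dn=0.05):
    """kappa^{-1}(n) ~ n^2 [e0(n + dn) + e0(n - dn) - 2 e0(n)] / dn^2."""
    return n**2 * (e0(n + dn) + e0(n - dn) - 2.0 * e0(n)) / dn**2

e0 = lambda n: -1.2 * n + 0.8 * n**2        # placeholder energy density
print(inv_compressibility(e0, 0.5))
```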
5 Summary

We have revisited the phase diagram of the one-dimensional t-J model and determined, on the basis of finite-size extrapolations of results obtained with DMRG, the boundaries between the four known phases: metal (M), singlet superconductivity with spin gap (SG+SS), gapless superconductivity (SC), and phase separation (PS) (see Fig. 2). The most controversial issue was the boundary between SC and SG+SS, where appreciable differences existed between the results from variational methods [23, 24] and results based on the renormalization group [25]. The highest densities at which the spin-gap phase was predicted were n ∼ 0.4 and n ∼ 0.8, respectively. In our case it corresponds to n ∼ 0.6. The boundary between M and SC was determined by extracting the Luttinger liquid anomalous dimension K_ρ from the
slope of the structure factor for density correlations in the limit k → 0, extrapolated to the thermodynamic limit. The extrapolations were performed using system sizes L = 40, 80, 120, 160, and 200. The opening of the spin gap was determined directly by examining the gap to the lowest triplet state, and again extrapolating to the thermodynamic limit. Finally, the boundary to PS was determined by extrapolating the inverse compressibility.

Acknowledgments. We wish to thank HLRS-Stuttgart (Project CorrSys) for the allocation of computer time. We also acknowledge financial support by the DFG program SFB/TRR 21. S. R. M. acknowledges financial support by PIF-NSF (grant No. 0904017). A. M. and S. R. M. acknowledge interesting discussions with A. M. Rey and A. V. Gorshkov, and A. M. is grateful to KITP, Santa Barbara for hospitality during the completion of this work. This research was supported in part by the National Science Foundation under Grant No. PHY05-51164.
References

1. A. Moreno, A. Muramatsu, and S. R. Manmana, Phys. Rev. B 83, 205113 (2011).
2. T. Fabritius, N. Laflorencie, and S. Wessel, Phys. Rev. B 82, 035402 (2010).
3. Z. Y. Meng et al., Nature 464, 847 (2010).
4. L. Bonnes and S. Wessel, arXiv:1101.0913 (2011).
5. H. Feldner et al., arXiv:1101.1882 (2011).
6. L. Fritz et al., arXiv:1101.3784 (2011).
7. L. Bonnes and S. Wessel, arXiv:1101.5991 (2011).
8. F. C. Zhang and T. M. Rice, Phys. Rev. B 37, 3759 (1988).
9. K. A. Chao, J. Spalek, and A. M. Oles, J. Phys. C 10, L271 (1977).
10. M. Ogata, M. Luchini, S. Sorella, and F. Assaad, Phys. Rev. Lett. 66, 2388 (1991).
11. E. Dagotto, Rev. Mod. Phys. 66, 763 (1994).
12. S. R. White and D. Scalapino, Phys. Rev. B 57, 3031 (1998).
13. H. J. Schulz, Phys. Rev. Lett. 64, 2831 (1990).
14. N. Kawakami and S.-K. Yang, Phys. Lett. A 148, 359 (1990).
15. H. Frahm and V. E. Korepin, Phys. Rev. B 42, 10553 (1990).
16. P.-A. Bares and G. Blatter, Phys. Rev. Lett. 64, 2567 (1990).
17. P.-A. Bares, G. Blatter, and M. Ogata, Phys. Rev. B 44, 130 (1991).
18. T. Giamarchi, Quantum Physics in One Dimension (Clarendon Press, Oxford, 2004).
19. F. D. M. Haldane, J. Phys. C 14, 2585 (1981).
20. F. D. M. Haldane, Phys. Rev. Lett. 45, 1358 (1981).
21. J. Sólyom, Adv. Phys. 28, 201 (1979).
22. V. J. Emery, in Highly Conducting One-Dimensional Solids (Plenum, New York, 1979).
23. Y. C. Chen and T. K. Lee, Phys. Rev. B 47, 11548 (1993).
24. C. S. Hellberg and E. J. Mele, Phys. Rev. B 48, 646 (1993).
25. M. Nakamura, K. Nomura, and A. Kitazawa, Phys. Rev. Lett. 79, 3214 (1997).
26. S. R. White, Phys. Rev. Lett. 69, 2863 (1992).
27. S. R. White, Phys. Rev. B 48, 10345 (1993).
28. U. Schollwöck, Rev. Mod. Phys. 77, 259 (2005).
29. R. T. Clay, A. W. Sandvik, and D. K. Campbell, Phys. Rev. B 59, 4665 (1999).
30. S. Ejima, F. Gebhard, and S. Nishimoto, Europhys. Lett. 70, 492 (2005).
31. N. Kawakami and S.-K. Yang, Phys. Rev. Lett. 65, 2309 (1990).
Chemistry

Prof. Dr. Christoph van Wüllen
Fachbereich Chemie, Technische Universität Kaiserslautern, Erwin-Schrödinger-Straße, D-67663 Kaiserslautern, Germany
In this section, three contributions have been selected that fall into the wider area of computational materials science. The first two report on applications of high performance computing, while the third one deals with the modification of a computer code such that it runs well on modern hybrid machines in which there are both general-purpose CPUs and graphics processor units. A promising new class of materials has found its way into real applications in the last decades: organic semiconductors. They are used in devices such as OLEDs (organic light emitting diodes) but also in organic solar cells and organic transistors (organic thin film transistor, OTFT, or organic field effect transistor, OFET). Materials used in such devices are often based on polycyclic aromatic hydrocarbons such as tetracene, pentacene or rubrene (tetraphenyl-tetracene). The performance of these materials is limited by their conductance or, more specifically, by the mobility of the charge carriers. Understanding what structural factors determine this mobility is the key to making better devices. In many cases charge transport occurs through the hopping of a localized charge from one molecule to the other. In the report by Franke, Nair, Chi and Fuchs, the benzene and pentacene dimers are considered as simple model systems. If one calculates the ground state of such a mono-charged dimer, the charge does not necessarily localize on one fragment, even if the fragment carrying the charge is allowed to adapt its molecular structure to the charge. To prepare an initial state with a localized charge, one has to impose an additional constraint on the Kohn-Sham wave function. The implementation and first results of such a constrained DFT procedure are reported in some detail. This is now the starting point for future calculations of the charge transfer from one fragment to the other when they come close. The second report comes from the groups of Vrabec (Paderborn) and Hasse (Kaiserslautern) on force field simulations of electrolyte solutions and hydrogels. Here we mention in particular the hydrogels. These are hydrophilic polymers which swell in water by absorbing large quantities of the latter. This has found much use
in different contexts (“superabsorbers”), but the resulting wet material is also of interest because of its biocompatibility (hydrogel contact lenses). Sometimes there is a relatively sharp transition between a swollen and a compact structure, depending on electrolyte concentration, temperature, solvent, etc. Atomistic simulations of this process are challenging because one has to include both the polymer and the solvent in atomic resolution. The agreement between experiment and simulation is thus still qualitative. In high performance scientific computing, big efforts are necessary from time to time in order to adapt well-established computer codes to new types of hardware. For example, when vector computers such as the Cray 1 or the Cray XMP became available to the scientific computing community in the beginning of the 1980s, most codes had to be extensively restructured to make use of the vector pipelines. Meanwhile, the era of vector supercomputers has come to an end, and at first sight the effort invested in ‘vectorizing’ the codes seems lost. This is not at all true because, as a by-product of vectorization, many codes also run faster on today's superscalar CPUs with their long instruction pipelines. For a few years now, graphics processor units (GPUs) have been available which can also be used for general-purpose computing. Such a GPU can only be used for those parts of the calculation which profit from the high degree of parallelization within the GPU. The contribution of Maintz, Eck and Dronskowski reports a modification of the program suite VASP (a density functional code for periodic systems much used in computational materials science) to make use of GPUs. These investigations are of interest beyond GPUs because future multi-core CPUs (100 and more CPU cores on a chip) will possibly require similar considerations and techniques. As we have seen, the rational (that is, knowledge-driven) design of new materials requires interdisciplinary work having both experimental and computational methods at hand. Computational materials science is a branch of chemistry (and physics) which very much depends on having access to high performance computing facilities. These are therefore a prerequisite for future breakthroughs in this field.
Constrained Density Functional Theory of Molecular Dimers

J.-H. Franke, N.N. Nair, L. Chi, and H. Fuchs
Abstract For charge transport in organic semiconductors the geometrical response to the presence of the charge plays a crucial role. Often, charge transport in these materials can be considered as the hopping of a localized polaron. Unfortunately, the description of localized charge carriers within semilocal Density Functional Theory (DFT) is prevented by the self-interaction error that artificially delocalizes the charge. Here, we present a computational scheme for the description of localized charges in an organic semiconductor. Constrained DFT is used to localize the charge on one of the molecules of a molecular dimer. The availability of the forces from this constraint enables ab initio molecular dynamics calculations and gives access to the geometrical response of neighboring molecules to the presence of a charged neighbor. This is demonstrated for a pentacene dimer. The reorganization energy is found to increase from 91 meV to 108 meV when decreasing the distance between two Pentacene molecules from 7 Å to 4 Å.
J.-H. Franke · L. Chi · H. Fuchs
Physikalisches Institut, WWU Münster, Wilhelm-Klemm-Str. 10, 48149 Münster, Germany, e-mail: [email protected]

N.N. Nair
Department of Chemistry, IIT Kanpur, Uttar Pradesh 208016, India

1 Introduction

We study here the charge hopping process between two charge states of a molecular dimer. This process can be considered as the elementary charge transport step in the thermally activated hopping regime of strongly localized polarons in small-molecule organic semiconductors. It is thus of tremendous interest for research into higher-performance organic semiconductors. To be able to describe the charge transfer process it is first necessary to localize a charge on one molecule. To this end we implement a constrained DFT (CDFT)
approach into the CPMD code supporting ultrasoft pseudopotentials. Different definitions of what constitutes a charge on a molecule are tested and found to be able to produce correct charge states. To access the geometrical response of the molecules to the charge state and to sample the thermodynamic properties, it is further necessary to implement the ionic forces resulting from the charge constraint. The presented methodology is finally used to calculate the reorganization energy of an isolated molecule-positive-ion Pentacene dimer as a function of intermolecular distance. In the following sections the constrained DFT approach is first outlined, then applied to Benzene and Pentacene dimers. The focus of the Benzene calculations is to establish the validity of the electronic structure optimization and to check whether the method is able to prepare an isolated ion-neutral molecule electronic structure. The focus of the Pentacene calculations is on the geometry-electronic structure relationship. Here, structural relaxations of the dimer via simulated annealing and energy conservation of a Born-Oppenheimer molecular dynamics run are also demonstrated.
2 Constrained DFT Methodology

To realistically describe systems containing strongly localized charges like small polarons, it is a prerequisite to localize the charge. If this is achieved in a proper way, a meaningful description of the electronic polarization and structural relaxation response in the surrounding medium due to the presence of a charged molecule becomes possible on the basis of ab initio methods. Combining this with molecular dynamics, dynamical quantities can also be studied, and eventually even charge hopping rates could be calculated. The inherent self-interaction error in widely used semilocal exchange-correlation functionals artificially delocalizes the charge density of a molecular ion over many molecules [3, 10–12, 16, 24]. This stems from the convexity of the energy curve as a function of fractional electron number, making a fractional electron occupation of isolated fragments energetically more favorable than integer occupations. In exact DFT this energy curve is a linear function with its slope given by the ionization energy or electron affinity [14].1 The artificial delocalization of charges makes current semilocal exchange-correlation functionals ill suited to the description of strongly localized charges in small polarons. Apart from constructing self-interaction-free density functionals, a straightforward way to overcome this limitation is to put a constraint on the charge of the molecules [2, 4, 13, 15, 19–23]. This implies the need to define a measure for the charge residing on one molecule, which is basically a partitioning problem of the spatially delocalized charge into regions in space defined to belong to a certain
1 The energy of an additional charge distributed over two identical entities in exact DFT is thus invariant towards its distribution on the two units, consistent with the ensemble interpretation of fractional electron numbers. If the molecular geometries differ, the charge would become correctly localized.
molecule. The arbitrariness of this definition is evident from the many different partitioning schemes suggested in the framework of constrained DFT, the most prominent probably being Mulliken and Löwdin population analyses [17] as well as Hirshfeld [8] or Becke [1] real-space weight-function based charges [2, 13, 21, 22]. In general, different charge partitioning schemes can be defined via a projection operator P̂_{ij}. With the KS states |Ψ_n⟩ and occupation numbers f_n, this operator defines an occupation matrix N_{ij}:

N_{ij} = ∑_n f_n ⟨Ψ_n| P̂_{ij} |Ψ_n⟩ .   (1)
The charges of parts of the system A, e.g. orbitals, atoms or molecules, can be obtained as the partial trace of the occupation matrix,

n_A = ∑_{i∈A} N_{ii} .   (2)
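A generic implementation of (1) and (2) is a double loop over projectors followed by a partial trace; the sketch below (NumPy, with toy projectors onto an orthonormal basis) is illustrative and independent of the particular partitioning scheme.

```python
# Generic sketch of Eqs. (1)-(2): occupation matrix from projector matrices
# P[i][j] and KS states psi[:, n], and a fragment charge as a partial trace.
# The toy projectors below act on an orthonormal basis (illustrative only).
import numpy as np

def occupation_matrix(P, psi, f):
    """N_ij = sum_n f_n <Psi_n| P_ij |Psi_n>, Eq. (1)."""
    nproj = len(P)
    N = np.zeros((nproj, nproj))
    for i in range(nproj):
        for j in range(nproj):
            N[i, j] = sum(fn * np.vdot(psi[:, n], P[i][j] @ psi[:, n]).real
                          for n, fn in enumerate(f))
    return N

def fragment_charge(N, frag):
    """n_A = sum_{i in A} N_ii, Eq. (2)."""
    return sum(N[i, i] for i in frag)

dim = 4
e = np.eye(dim)
P = [[np.outer(e[i], e[j]) for j in range(dim)] for i in range(dim)]
psi, _ = np.linalg.qr(np.random.default_rng(0).standard_normal((dim, 2)))
N = occupation_matrix(P, psi, f=[2.0, 2.0])       # two doubly occupied states
print(fragment_charge(N, frag=[0, 1]))            # charge on fragment {0, 1}
```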
The projection operator can take different forms; for example, it can be a weighting function in real space as in the work of Dederichs et al. [4] or in the Hirshfeld scheme [13],

⟨r| P̂^Hirsh_{ij} |r′⟩ = δ(r − r′) δ_{ij} n^iso_i(r) / ∑_l n^iso_l(r) ,   (3)

where n^iso_i(r) denotes the electron density of an isolated (promolecular) atom. The projection operators for Mulliken, Löwdin and dual charges are

P̂^Mull_{ij} = |φ_i⟩⟨φ̃_j| ,   (4)
P̂^Loew_{ij} = U^{-1/2} |φ_i⟩⟨φ_j| U^{-1/2} ,   (5)
P̂^Dual_{ij} = (1/2) ( |φ_i⟩⟨φ̃_j| + |φ_j⟩⟨φ̃_i| ) ,   (6)

where |φ_i⟩ denotes the non-orthogonal local projectors and |φ̃_i⟩ = ∑_j U^{-1}_{ij} |φ_j⟩ the orthogonal ones (U_{ij} = ⟨φ_i|φ_j⟩). Here the projectors are simply taken as the pseudo wave functions of the pseudopotentials used. In our case we have a rather well-defined situation that we have to reproduce with the constrained DFT calculations. We know the ionic state and the neutral state of the isolated molecule, and the constrained DFT should naturally produce a neutral molecule and an ion. The arbitrariness in the definition of the charges is thus to be eliminated by comparing the produced molecular states to the isolated molecular states. The constraint needs to be formulated in such a way that these states are reproduced most closely. A margin of error is, however, introduced by the unknown magnitude of polarization effects on the neighboring molecule. We implemented the Hirshfeld charge constraint and the Dual [7] as well as the Löwdin orbital projection schemes. The Hirshfeld scheme is complete in the sense that the partial charges add up to the total charge of the system. For the orbital projection schemes this depends on the completeness of the projector set. As we project on the atomic pseudo-orbitals, yet expand the Kohn-Sham wave functions in a plane
wave basis, the projection remains incomplete, and absolute numbers for the charges should be taken with a grain of salt. The beauty of the orbital projection schemes (Mulliken, Löwdin and Dual) lies in their simplicity, as all quantities can be straightforwardly evaluated in reciprocal space and are thus well parallelized in CPMD.2 For the Hirshfeld charges, the regions with small electron densities produce numerical instabilities in the derivative of the weighting function needed for the ionic forces, which necessitates a real-space cutoff. Although this problem can be solved [13], it still requires additional effort that is not needed in the projection schemes. Also, the evaluation of the forces requires additional three-dimensional Fourier transforms, which constitutes a serious bottleneck in massively parallel calculations. Furthermore, the implementation in conjunction with ultrasoft pseudopotentials is more involved than in the case of the projection schemes. Therefore, ultrasoft pseudopotentials and ionic forces are not implemented for the Hirshfeld charge constraint, and the results presented here for this constraint scheme are limited to norm-conserving pseudopotentials and static electronic self-consistency calculations. The Dual and Löwdin orbital projection schemes are implemented in CPMD including support for ultrasoft pseudopotentials. The Löwdin projection scheme implementation also comprises ionic forces. Using ultrasoft pseudopotentials, the occupation matrix for the projection operator becomes

N_{ij} = ∑_n f_n ⟨Ψ_n| Ŝ P̂_{ij} Ŝ |Ψ_n⟩ ,   (7)

with Ŝ being the S-operator of ultrasoft pseudopotentials. The difference between the charges on the donor and acceptor molecules can now be constrained to a given value N_c:

( n_A − ∑_{I∈A} Z_I ) − ( n_D − ∑_{I∈D} Z_I ) = N_c ,   (8)

with Z_I denoting the core charges of the pseudopotentials. Under this constraint the DFT functional to minimize becomes

E_CDFT[n] = E_DFT[n] + V_c [ ( n_A − ∑_{I∈A} Z_I ) − ( n_D − ∑_{I∈D} Z_I ) − N_c ] ,   (9)

with the Lagrange multiplier V_c. In practice, the minimization is performed by optimizing the wave functions for a given V_c, then calculating N_c from these wave functions and predicting a new V_c from the error in N_c. Thus, the optimization of the charge constraint is done in an outer loop around the electronic self-consistency loop.
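This outer loop is a one-dimensional root search for V_c; the sketch below uses a secant update, which is one plausible choice and not necessarily the update formula used in the actual implementation, and a mock monotonic N_c(V_c) in place of the full wave-function optimization.

```python
# Sketch of the outer constraint loop: adjust V_c until the charge difference
# N_c(V_c) matches the target.  scf_charge_difference stands for a full
# electronic self-consistency cycle at fixed V_c (mocked below); the secant
# update is one plausible choice, not necessarily the one used in CPMD.
import math

def optimize_constraint(scf_charge_difference, Nc_target,
                        Vc0=0.0, Vc1=0.1, tol=1e-6, maxiter=50):
    f0 = scf_charge_difference(Vc0) - Nc_target
    for _ in range(maxiter):
        f1 = scf_charge_difference(Vc1) - Nc_target
        if abs(f1) < tol:
            return Vc1
        Vc0, Vc1, f0 = Vc1, Vc1 - f1 * (Vc1 - Vc0) / (f1 - f0), f1
    raise RuntimeError("constraint loop did not converge")

# mock, monotonic N_c(V_c); the physical curve is nearly linear (Fig. 1a)
print(optimize_constraint(lambda v: math.tanh(2.0 * v), Nc_target=0.5))
```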
2 In fact, the additional overhead in the projection schemes scales linearly up to at least 128 cores. This test was done on the NEC Nehalem cluster of HLRS, which boasts 8 cores per node and an InfiniBand interconnect.
For the calculation of the self-consistent wave functions an additional term appears in the wave function forces due to the constraint:

δE_CDFT/δ⟨Ψ_n| = δE_DFT/δ⟨Ψ_n| + V_c ( δn_A/δ⟨Ψ_n| − δn_D/δ⟨Ψ_n| )   (10)
             = δE_DFT/δ⟨Ψ_n| + V_c ∑_{i∈D,A} σ_i Ŝ P̂_{ii} Ŝ |Ψ_n⟩ .   (11)

The function σ_i is hereby defined as being +1 for i ∈ A and −1 for i ∈ D. The additional forces on the ions due to the constraint are

∂E_CDFT/∂R_I = ∂E_DFT/∂R_I + V_c ( ∂n_A/∂R_I − ∂n_D/∂R_I ) .   (12)

The evaluation of the ionic forces is much more involved, as many terms depend on the ionic positions. For the Löwdin projection scheme this becomes

∂n_A/∂R_I = ∑_n ∑_{i∈A} f_n (∂/∂R_I) |⟨φ̃_i| Ŝ |Ψ_n⟩|²   (13)
         = 2 ∑_n ∑_{i∈A} f_n ⟨φ̃_i| Ŝ |Ψ_n⟩ [ ⟨Ψ_n| (∂Ŝ/∂R_I) |φ̃_i⟩ + ⟨Ψ_n| Ŝ (∂/∂R_I) |φ̃_i⟩ ]   (14)

with

(∂/∂R_I) |φ̃_i⟩ = ∑_l [ (∂U^{-1/2}_{il}/∂R_I) |φ_l⟩ + U^{-1/2}_{il} (∂/∂R_I) |φ_l⟩ ] .   (15)

While the evaluation of the ∂Ŝ/∂R_I and ∂|φ_l⟩/∂R_I terms is straightforward, the derivative involving the matrix U^{-1/2}_{il} is more complicated. Nevertheless, it can be calculated by first diagonalizing the overlap matrix U_{il} = ⟨φ_i|φ_l⟩ [9],

U D_i = u_i D_i ,   (16)

and then evaluating the derivative in the orthogonal basis of the eigenvectors D_i (which constitute the rows of the matrix D_{ij}) as

( ∂U^{-1/2}/∂R_I )_{mn} = − [ u_m^{-1/2} u_n^{-1/2} / ( u_m^{1/2} + u_n^{1/2} ) ] ( ∂U/∂R_I )_{mn} .   (17)

The derivative of the matrix in its non-orthogonal basis is then obtained through back-transformation via

( ∂U^{-1/2}/∂R_I )_{il} = ∑_{m,n} D_{mi} ( ∂U^{-1/2}/∂R_I )_{mn} D_{nl} .   (18)
174
J.-H. Franke et al.
3 CDFT on the Benzene Dimer Calculations using different target values of the charge difference Nc yield different Lagrange multipliers. Integrating these along the coordinate of charge difference should yield the energy difference between the different charge states. The mutual consistency of the Lagrange multiplier and the energy can be considered as a selfconsistency check of the numerical implementation of the constrained DFT. The process of localizing a charge on one constituent of a dimer system can be understood as a combination of adding charge on one and removing the same amount from the other unit. In exact DFT, changing the charge amount in an isolated system, i.e. one part of the dimer, changes the total energy proportional to the ionization energy or electron affinity, depending on the overall charge of the system [15]. Since, due to the constraint, one constituent of the dimer gains the charge that is lost on the other, the net effect on the energy should vanish in exact DFT. The finite Lagrange multipliers are thus a direct consequence of the self-interaction error present in the PBE functional used here. A parabolic energy contribution as a function of fractional electron number can be expected for the Hartree contribution of the selfenergy, under the additional assumption of fixed shape of all orbitals. The Lagrange multiplier of the constraint needed to change the occupation number of one orbital should be linear in the fractional electron number. At this stage a simpler system, the Benzene dimer, is used for testing. For this system data from self-interaction corrections and restricted open-shell Hartree-Fock calculations are readily available [10]. Stacking the Benzene molecules in its neutral geometry at varying distances, the charge difference is constrained to different values Nc . Importantly, the two molecules considered have the same geometry and therefore only absolute charge differences between the two molecules need to be considered. The overall system contains one positive elementary charge. For these tests the Dual projection scheme and the PBE density functional were used. The resulting Lagrange multiplier Vc is plotted against Nc in Fig. 1a. A linear relationship ˚ The mutual consistency of the difis found for a molecule-molecule distance of 7A. ferent charge projection schemes is highlighted by the almost identical Mulliken and L¨owdin charges calculated for these wave functions. The corresponding energies are plotted in Fig. 1b, together with the energies predicted from the linear fit of Fig. 1a. Here, the excellent agreement gives evidence of the numerical self-consistency of the code. To further elucidate the validity of the constraint scheme it is instructive to look at the magnetization density of the Benzene dimer. The single hole on the dimer should produce a magnetization density that corresponds to the HOMO at the ion ˚ while vanishing at the neutral molecule for the quasi isolated molecules, i.e. the 7 A dimer. In Fig. 2 one can see that in the unconstrained case the magnetization density is completely delocalized over the dimer. Constraining the charge difference to higher values then increasingly concentrates the charge on one molecule, with finally vanishing magnetization density at the other. The charge difference value Nc at which this occurs, is larger then 1.0 for the Dual charge constraint. Nevertheless, the magnetization density at a charge difference of 1.3 resembles the sought-for
Constrained Density Functional Theory of Molecular Dimers
Fig. 1 Constrained DFT using the Dual constraint scheme on the Benzene dimer with one positive charge. a Lagrange multiplier V_c plotted against the charge difference N_c between the two molecules, which are 7 Å apart. The relationship is linear, as can be seen from the linear fit. Mulliken and Löwdin population analysis schemes are tested on the converged wave functions and give almost identical charge differences. b The energies obtained for the Benzene dimer at 7 Å as a function of the charge difference between the two molecules. The parabolic behavior is evident from the coincidence of the energies calculated as the integral of the fit function from (a) with the actual energies obtained during the calculations. c V_c-N_c plots obtained for different intermolecular distances. At smaller distances the initial slope of the curve is smaller than in the 7 Å case, i.e. the V_c values are smaller for small N_c. At higher N_c and smaller distances, deviations from linear behavior appear. d Energies for Benzene dimers at different distances as a function of N_c. The total energy decreases with increasing distance. Secondly, the parabolas at smaller distances are less steep than for the isolated case. e and f Lagrange multipliers and energies at high N_c for the Dual projection and Hirshfeld schemes. Nonlinearities occur in the Lagrange multiplier at N_c values of 1.3 and 1.0, respectively. The energies are parabolic up to the nonlinearity in the Lagrange multiplier, yet differ drastically at even higher values
magnetization density of the isolated ion and the neutral molecule. For comparison, the results for the Hirshfeld charge constraint are also given. Here, the charge is completely localized at a charge difference of 1.0 already for the 7 Å dimer.
Fig. 2 Spin density isosurfaces (isovalue m = 0.002) of the Benzene dimer with different constraints at an intermolecular distance of 7 Å. In this well-separated limit, the spin density should be limited to one molecule only. The spin density of an isolated positive ion is given at the right of the figure. The unconstrained dimer shows the complete delocalization of the single unpaired electron into the two molecular HOMOs. Switching on the constraint leads to localization of the spin density. At a charge difference N_c of 0.9, the Dual and the Hirshfeld charge constraints both yield the same spin density, which corresponds almost to the isolated ion, with some residual occupation still on the other molecule. At N_c = 1.0 the Hirshfeld constraint yields basically the correct ion-neutral system, while the charge transfer under the Dual constraint is still incomplete. However, the Dual constraint also generates the correct ion, albeit only at the larger N_c value of 1.2. Constraining the Hirshfeld charges to this value leads to additional charge transfer, probably tending towards the negative plus doubly positive ionic state. The 3 Å dimer shown on the right exhibits basically the same pattern, although the charge localization occurs at higher N_c values
Comparing the total energy of the constrained dimer (7 Å case) to the sum of the energies of the neutral molecule plus that of the ion gives further insight into the validity of the scheme presented. Without constraint, the energy of the dimer at 7 Å is about 1 eV smaller than the sum of the single-constituent energies. This energy difference is reduced with increasing charge difference, following the parabolic energy curves. At a charge difference N_c = 1.0, it is still smaller by 315 meV and 47 meV for the Dual and Hirshfeld charge constraints, respectively. However, it is difficult to quantify the effect on the total energy of the polarization response of the neutral molecule to the presence of the charged molecule. This will lower the total energy, bringing the real, physical energy of the dimer closer to the calculated ones. The crossing point of the energy parabola with the expected energy value is very slightly larger than N_c = 1.0 in the Hirshfeld scheme and around N_c = 1.2 for the Dual
projection scheme, corresponding quite well to the N_c values at which the magnetization density becomes completely localized on one molecule. Plotting the V_c-N_c behavior for N_c values up to 1.3 for the 7 Å dimer, deviations from linearity are observed. In the Hirshfeld constraint scheme, these take the form of a sudden jump in the Lagrange multiplier at N_c = 1.0, while in the Dual projection scheme this jump occurs at higher charge differences and is much smaller. In fact, for the 7 Å dimer this discontinuity in the Lagrange multiplier occurs very close to the charge differences of N_c = 1.0 and N_c = 1.2 for which the total energy and spin density indicate the preparation of an isolated ion-neutral molecule complex. Considering again the gradual transfer of electrons from one molecule to the other with increasing charge difference as adding infinitesimal amounts of charge to one molecule and withdrawing the same amount from the other, one can immediately see the origin of this discontinuity. In exact DFT this jump would correspond to the sum of two derivative discontinuities of the energy with respect to fractional electron numbers: one as the ion passes from the singly to the doubly positive ionic state, and one from the neutral-to-negative ionic state transition. In the semilocal PBE functional this discontinuity is underestimated, as only the discontinuity in the kinetic energy is present. However, there is still a jump at crossings of integer electron numbers. V_c-N_c plots at different dimer distances (Fig. 1c) show that the slope of the curve differs from the 7 Å case and also that the deviations from linearity occur at smaller charge differences. Localization of the spin density occurs at larger charge differences, see Fig. 2. Note that in Hartree-Fock theory the spin density of the 3 Å dimer is not fully localized on one molecule [10], and the complete charge localization might thus be unphysical here, as Hartree-Fock should overlocalize fractional charges [11]. In conclusion, one can say that the calculations discussed above show that the electronic structure of the ion-neutral molecule dimer can be prepared by our charge-constrained DFT implementation.
4 CDFT on the Pentacene Dimer

The next point is to study the influence of the geometrical structure and to check whether our method gives the correct reorganization energy of an isolated molecular dimer. For this purpose the DFT-D method using the PBE functional is chosen in conjunction with the Löwdin charge constraint, as this is the only constraint scheme with an implementation of the ionic forces. All calculations presented below were done using ultrasoft pseudopotentials. Occasional cross-checking with norm-conserving pseudopotentials of the Troullier-Martins type gave identical results.
First, the inner reorganization energy of Pentacene is evaluated using the "4-point method" [5, 18]. It is calculated as the sum of the two contributions of distorting the neutral molecule to its charged geometry and distorting the ion to its neutral geometry. These quantities are directly accessible through isolated molecule calculations. Calculating the energies of charged or neutral isolated molecules (E^+ and E^0) in their charged or neutral geometries ({R_N^+} and {R_N^0}) gives for the reorganization energy

λ = E^+({R_N^0}) − E^+({R_N^+}) + E^0({R_N^+}) − E^0({R_N^0}) .    (19)
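As a minimal illustration of (19), the four-point evaluation reduces to four single-point energies; the numbers below are hypothetical placeholders, not values from this work:

```python
# Four-point evaluation of the inner reorganization energy, Eq. (19).
# All energies are hypothetical placeholder values in Hartree.
E = {
    ("+", "R0"): -851.2310,  # ion in the neutral geometry {R_N^0}
    ("+", "R+"): -851.2325,  # ion in the ionic geometry {R_N^+}
    ("0", "R+"): -851.4978,  # neutral molecule in the ionic geometry
    ("0", "R0"): -851.4990,  # neutral molecule in the neutral geometry
}

def reorganization_energy(E):
    """lambda = E+({R0}) - E+({R+}) + E0({R+}) - E0({R0})"""
    return (E[("+", "R0")] - E[("+", "R+")]
            + E[("0", "R+")] - E[("0", "R0")])

HARTREE_TO_MEV = 27211.386  # 1 Hartree in meV

lam = reorganization_energy(E)
print(f"lambda = {lam:.5f} Hartree = {lam * HARTREE_TO_MEV:.1f} meV")
```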
Since there are only energy differences involved, contributions of possible interaction energies of a neutralizing background charge in periodic calculations cancel out. The above mentioned quantity was calculated using CPMD at the PBE-D level of theory, carefully converging the energy cutoffs of the plane wave basis set and the unit cell size. Also the BLYP functional in conjunction with semiempirical dispersion corrections (BLYP-D) was tested. All values obtained are in very good agreement with each other, as the CPMD calculations yield values of 62.4 meV and 65.6 meV for PBE-D and BLYP-D, respectively. Markedly, this is in contrast to the reorganization energy of 98 meV reported in the literature using the hybrid B3LYP functional [6]. The difference is here attributed to the semilocality of the functionals used, as technical effects of pseudopotentials, incomplete basis sets or spurious interactions due to small supercells can be excluded. In the next step, a Pentacene dimer was constructed from the relaxed ionic and neutral geometries. Vc-Nc and E-Nc plots for dimer distances of 4 Å, 5 Å and 7 Å are given in Fig. 3a, b. They are similar to the ones obtained for Benzene before, but here the dimer is made up of the neutral and ionic geometries. Positive charge differences hereby correspond to the positive charge being more on the Pentacene in its ionic geometry. As a consequence of the geometrical differences of the molecules, the charge is more easily accommodated by the ionic Pentacene, thus giving a charge difference of 0.04e for the unconstrained 7 Å dimer already. This translates to a y-axis intercept (point of vanishing charge difference) of the Lagrange multipliers at negative values. Since the Vc-Nc curves are linear for all dimers, the E-Nc curves, which correspond to the integral of this function, form parabolas with minima at the unconstrained charge differences. The energy differences between the energies obtained for positive and negative charge differences,
ΔE(Nc) = E(−Nc) − E(Nc) ,    (20)
can be interpreted as the reorganization energy at a given charge difference. This quantity is plotted in Fig. 3c. As expected from the parabolic behavior of the E-Nc plots, this energy difference is linear in the charge differences. Its intercept with the reorganization energy of isolated molecules gives the charge difference at which the ionic state should be reached when interpreting the 7 Å dimer as isolated molecules. Again, this can be compared to the charge differences at which the spin density is localized on the ion (Fig. 4) and the location of the jump in the Lagrange multiplier of Fig. 3a. The result is that the ionic-neutral dimer is obtained at a Löwdin charge difference of 1.3e. The mutual consistency of the different indicators shows that the physical electronic state is obtained at this charge difference. Interestingly, the situation is very similar for the smaller dimer distances of 4 Å and 5 Å. The forces acting on the ions show that the molecules are not in their respective ground state at this charge difference. The forces are largest for the dimer with the smallest intermolecular distance. To get insight into the molecular relaxation
Fig. 3 Constrained DFT using the Löwdin constraint scheme with ultrasoft pseudopotentials tested on the cofacially stacked Pentacene dimer. One molecule is in the frozen ionic geometry, the other in the neutral one, with positive charge differences corresponding to more charge on the ion. a Lagrange multiplier Vc plotted against the charge difference Nc with molecules at different distances. The relationship is linear up to charge differences of 1.2e for the 5 Å and 7 Å dimers. It jumps at Nc = 1.3e in these cases and shows increasing slope already for smaller Nc values for the 4 Å dimer. b The energies obtained for the Pentacene dimers as a function of the charge difference between the two molecules. The parabolas are slightly offset along the x-axis by the finite positive charge difference observed already in the unconstrained electronic state. c The energy differences (20) as a function of Nc. The plots are linear and almost identical. The isolated molecule reorganization energy is reached at a charge difference of Nc = 1.3e. d NVE ensemble trajectory of the 7 Å Pentacene dimer at a charge difference of Nc = 1.3e. Shown are the Kohn-Sham energy and the sum of Kohn-Sham and kinetic energy that should be conserved throughout the run. The system temperature is initialized at 300 K and the conserved energy does not exhibit any significant drift and fluctuates by about 2·10^-3 Hartree
pattern in the presence of the other molecule, the structure is relaxed by simulated annealing Born-Oppenheimer molecular dynamics. In these runs, 6 carbon atoms are constrained to move only along the axis connecting the two molecules, i.e. they are constrained such that the intermolecular distance stays fixed. The fixed atoms are at both ends and in the middle of the molecule. The two middle atoms are also constrained along the other coordinates, so that slipping motion and in-plane rotations do not occur. To test the accuracy of the constraint contribution to the ionic forces and the other molecular dynamics parameters (timestep of 40 a.u., convergence of the wave function gradients to < 10^-6 a.u., charge difference constraint converged to < 10^-5 e), a trajectory of the Pentacene dimer at a charge difference of 1.3e is generated. Figure 3d shows the fluctuations of the Kohn-Sham energy and the conserved energy. It is evident that no significant drift occurs in the conserved energy, and its fluctuations are also reasonable. After extensive optimization of the molecular dynamics scheme
Fig. 4 Spin density isosurfaces (isovalue m = 0.002) of the Pentacene dimer with the Löwdin and Hirshfeld constraints at varying intermolecular distances. The spin density is completely localized on the molecule in the ionic geometry (the left one) for Nc = 1.3e and on the molecule in the neutral geometry at Nc = −1.3e for the Löwdin constraint. The Hirshfeld constraint localizes the spin density already at Nc = 1.0e. All spin density isosurfaces correspond well to the spin density of the isolated molecule shown on the right
involving a second order Lagrange polynomial prediction of the Lagrange multiplier [13] at the new timestep and occasional memory resets of the preconditioned conjugate gradient minimizer, the generation of this 160 fs trajectory still cost one day of wallclock time on 64 cores. In conclusion, this shows that our constrained DFT approach is able to generate trajectories of physically correct charge localized systems.
The relaxation of the three dimers showed significant additional relaxations for all distances. Relaxation was carried out by simulated annealing subject to the above mentioned constraints, and the forces were converged to maximum gradients of 10^-4 a.u. The energy of the electronic state with the charge on the formerly neutral molecule, i.e. E(−Nc), was then calculated at the relaxed geometry, giving the reorganization energy via (20). The results (see Table 1) show a significant increase of the reorganization energy already for the dimer with 7 Å intermolecular distance, to 0.00336 Hartree (91 meV). The reorganization energy is even larger for the smaller distances
Table 1 Reorganization energies of Pentacene dimers at Nc = 1.3e with varying distances, calculated from (20). The second column contains the results for the geometry frozen in its isolated molecule – isolated ion configuration, and the third the values obtained when relaxing the geometry self-consistently. In these relaxations the intermolecular distances are fixed by the constraints mentioned in the text

Intermolecular distance   Reorganization energy
                          isolated configuration   relaxed configuration
7 Å                       0.00231 a.u.             0.00336 a.u.
5 Å                       0.00231 a.u.             0.00359 a.u.
4 Å                       0.00227 a.u.             0.00398 a.u.
(0.00359 a.u. and 0.00398 a.u., corresponding to 98 meV and 108 meV). The fact that the reorganization energy at frozen isolated molecule – isolated ion configurations is similar for all distances (cf. Fig. 3c) points to the mutual cancellation of electronic polarization energies in these cases, i.e. the polarization energies of the charge states Nc and −Nc cancel out. This leads to reorganization energies independent of molecular distance. However, the polarization becomes important for the relaxations of the molecules, increasing the reorganization energy with decreasing distance. Since the electrostatic interactions are long ranged, they are important already for the 7 Å configuration. This result gives evidence of the importance of the surrounding medium for the calculation of reorganization energies. The surrounding molecules contribute significantly to the overall reorganization energy. Since the reorganization energy enters the rate constant of Marcus theory in the exponential, these effects are highly significant.
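For context, the textbook high-temperature nonadiabatic Marcus expression makes this exponential dependence explicit; the sketch below uses the standard formula with an assumed electronic coupling H_ab of 10 meV, none of which is taken from this chapter:

```python
import math

# Textbook high-temperature nonadiabatic Marcus rate (not from this work):
# k = (2*pi/hbar) |Hab|^2 (4*pi*lambda*kB*T)^(-1/2)
#     * exp(-(dG + lambda)^2 / (4*lambda*kB*T))
HBAR = 6.582119569e-16  # eV s
KB = 8.617333262e-5     # eV/K

def marcus_rate(h_ab, lam, dG, T=300.0):
    kBT = KB * T
    pref = (2.0 * math.pi / HBAR) * h_ab**2
    dos = 1.0 / math.sqrt(4.0 * math.pi * lam * kBT)
    return pref * dos * math.exp(-(dG + lam)**2 / (4.0 * lam * kBT))

# Exponential sensitivity: the 91 -> 108 meV increase found above changes
# the self-exchange rate (dG = 0) noticeably; h_ab is an arbitrary guess.
for lam in (0.091, 0.098, 0.108):
    print(f"lambda = {lam:.3f} eV: k = {marcus_rate(0.010, lam, 0.0):.3e} 1/s")
```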
5 Outlook

Through the CDFT based molecular dynamics, not only is the electronic and structural response of the surrounding medium to the presence of a charged molecule accessible, but the fact that the electronic structure is available on-the-fly can also be exploited. The final quantity of interest, the charge carrier mobility, can be obtained from charge hopping rates between nearest neighbors. These, in turn, can be calculated from the electronic structure of the molecules via the Tully surface hopping probability to the electronic state where the charge resides on the formerly neutral molecule. The hopping probability follows the geometrical configurations directly, and an ensemble average could be obtained from a trajectory of adequate length. Moreover, the fact that the charge states are ground states of the constrained system means that this technique rests on the solid ground of the applicability of DFT, not using virtual states.

Acknowledgments. The simulations were performed on the national supercomputer NEC Nehalem Cluster at the High Performance Computing Center Stuttgart (HLRS) under the grant number AIMDPOLH/12841. The authors especially thank Dominik Marx for instrumental discussions concerning this project.
References

1. Becke, A.D.: A multicenter numerical integration scheme for polyatomic molecules. The Journal of Chemical Physics 88, 2547–2553 (1988). DOI 10.1063/1.454033. http://link.aip.org/link/?JCP/88/2547/1
2. Behler, J., Delley, B., Reuter, K., Scheffler, M.: Nonadiabatic potential-energy surfaces by constrained density-functional theory. Physical Review B 75, 115409 (2007). http://link.aps.org/doi/10.1103/PhysRevB.75.115409
3. Cohen, A.J., Mori-Sanchez, P., Yang, W.: Insights into current limitations of density functional theory. Science 321, 792–794 (2008). DOI 10.1126/science.1158722. http://www.sciencemag.org/cgi/content/abstract/321/5890/792
4. Dederichs, P.H., Blügel, S., Zeller, R., Akai, H.: Ground states of constrained systems: Application to cerium impurities. Physical Review Letters 53, 2512–2515 (1984). http://link.aps.org/doi/10.1103/PhysRevLett.53.2512
5. Deng, W.Q., Goddard III, W.A.: Predictions of hole mobilities in oligoacene organic semiconductors from quantum mechanical calculations. The Journal of Physical Chemistry B 108, 8614–8621 (2004). http://dx.doi.org/10.1021/jp0495848
6. Gruhn, N.E., da Silva Filho, D.A., Bill, T.G., Malagoli, M., Coropceanu, V., Kahn, A., Bredas, J.L.: The vibrational reorganization energy in pentacene: Molecular influences on charge transport. Journal of the American Chemical Society 124, 7918–7919 (2002). http://dx.doi.org/10.1021/ja0175892
7. Han, M.J., Ozaki, T., Yu, J.: O(N) LDA+U electronic structure calculation method based on the nonorthogonal pseudoatomic orbital basis. Physical Review B 73, 045110 (2006). http://link.aps.org/doi/10.1103/PhysRevB.73.045110
8. Hirshfeld, F.L.: Bonded-atom fragments for describing molecular charge densities. Theoretical Chemistry Accounts: Theory, Computation, and Modeling (Theoretica Chimica Acta) 44, 129–138 (1977). http://dx.doi.org/10.1007/BF00549096
9. Jorgensen, P., Simons, J.: Ab initio analytical molecular gradients and Hessians. The Journal of Chemical Physics 79, 334–357 (1983). DOI 10.1063/1.445528. http://link.aip.org/link/?JCP/79/334/1
10. Mantz, Y.A., Gervasio, F.L., Laino, T., Parrinello, M.: Charge localization in stacked radical cation DNA base pairs and the benzene dimer studied by self-interaction corrected density-functional theory. The Journal of Physical Chemistry A 111, 105–112 (2007). http://dx.doi.org/10.1021/jp063080n
11. Mori-Sanchez, P., Cohen, A.J., Yang, W.: Many-electron self-interaction error in approximate density functionals. The Journal of Chemical Physics 125, 201102 (2006). DOI 10.1063/1.2403848. http://link.aip.org/link/?JCP/125/201102/1
12. Mori-Sanchez, P., Cohen, A.J., Yang, W.: Localization and delocalization errors in density functional theory and implications for band-gap prediction. Physical Review Letters 100, 146401 (2008). http://link.aps.org/doi/10.1103/PhysRevLett.100.146401
13. Oberhofer, H., Blumberger, J.: Charge constrained density functional molecular dynamics for simulation of condensed phase electron transfer reactions. The Journal of Chemical Physics 131, 064101 (2009). DOI 10.1063/1.3190169. http://link.aip.org/link/?JCP/131/064101/1
14. Parr, R.G., Yang, W.: Density-Functional Theory of Atoms and Molecules. Oxford University Press (1988)
15. Perdew, J.P., Parr, R.G., Levy, M., Balduz, J.L.: Density-functional theory for fractional particle number: Derivative discontinuities of the energy. Physical Review Letters 49, 1691–1694 (1982). http://link.aps.org/doi/10.1103/PhysRevLett.49.1691
16. Perdew, J.P., Zunger, A.: Self-interaction correction to density-functional approximations for many-electron systems. Physical Review B 23, 5048–5079 (1981). http://link.aps.org/abstract/PRB/v23/p5048
17. Szabo, A., Ostlund, N.S.: Modern Quantum Chemistry: Introduction to Advanced Electronic Structure Theory. Dover Publications (1996)
18. Wang, L., Nan, G., Yang, X., Peng, Q., Li, Q., Shuai, Z.: Computational methods for design of organic materials with high charge mobility. Chemical Society Reviews 39, 423–434 (2010). http://dx.doi.org/10.1039/b816406c
19. Wu, Q., Cheng, C.L., Van Voorhis, T.: Configuration interaction based on constrained density functional theory: A multireference method. The Journal of Chemical Physics 127, 164119 (2007). DOI 10.1063/1.2800022. http://link.aip.org/link/?JCP/127/164119/1
20. Wu, Q., Kaduk, B., Van Voorhis, T.: Constrained density functional theory based configuration interaction improves the prediction of reaction barrier heights. The Journal of Chemical Physics 130, 034109 (2009). DOI 10.1063/1.3059784. http://link.aip.org/link/?JCP/130/034109/1
21. Wu, Q., Van Voorhis, T.: Direct optimization method to study constrained systems within density-functional theory. Physical Review A 72, 024502 (2005). http://link.aps.org/doi/10.1103/PhysRevA.72.024502
22. Wu, Q., Van Voorhis, T.: Constrained density functional theory and its application in long-range electron transfer. Journal of Chemical Theory and Computation 2, 765–774 (2006). http://dx.doi.org/10.1021/ct0503163
23. Wu, Q., Van Voorhis, T.: Extracting electron transfer coupling elements from constrained density functional theory. The Journal of Chemical Physics 125, 164105 (2006). DOI 10.1063/1.2360263. http://link.aip.org/link/?JCP/125/164105/1
24. Zhang, Y., Yang, W.: Comment on "Generalized gradient approximation made simple". Physical Review Letters 80, 890–890 (1998). http://link.aps.org/abstract/PRL/v80/p890
Atomistic Simulations of Electrolyte Solutions and Hydrogels with Explicit Solvent Models Jonathan Walter, Stephan Deublein, Steffen Reiser, Martin Horsch, Jadran Vrabec, and Hans Hasse
1 Introduction

Two of the most challenging tasks in molecular simulation are capturing the properties of systems with long-range interactions (e.g. electrolyte solutions) and of systems containing large molecules such as hydrogels. These tasks become particularly demanding when explicit solvent models are used; therefore, massively parallel supercomputers are needed for both. For the development and optimization of molecular force fields and models, a large number of simulation runs has to be evaluated to obtain the sensitivity of thermodynamic properties with respect to the model parameters. This requires both an efficient workflow and even more computational resources. The present work discusses the force field development for electrolytes regarding the thermodynamic properties of their solutions. Furthermore, simulation results for the volume transition of hydrogels in electrolyte solutions are presented. Both applications are of interest for engineering. It is shown that the properties of these complex systems can be reasonably predicted by molecular simulation.
2 Development of Force Fields for Alkali and Halogen Ions in Aqueous Solution

2.1 Outline

The simulation of electrolyte systems is computationally very expensive due to the long-range interactions, which have to be taken into account by suitable algorithms.

Jonathan Walter · Stephan Deublein · Steffen Reiser · Martin Horsch · Hans Hasse
Lehrstuhl für Thermodynamik, Technische Universität Kaiserslautern, Erwin-Schrödinger-Straße 44, 67663 Kaiserslautern, Germany, e-mail: [email protected]

Jadran Vrabec
Lehrstuhl für Thermodynamik und Energietechnik, Universität Paderborn, Warburger Straße 100, 33098 Paderborn, Germany
Examples of such algorithms are the classical Ewald summation [1] and its derivatives like the particle mesh Ewald summation [2] or the particle-particle/particle-mesh method [3]. Early efforts in this research area were mainly directed at the development of ion force fields capable of reproducing static structural properties of solutions [4–6]. Recently, however, these models have been shown to be too inaccurate for the prediction of basic thermodynamic properties [7, 8]. Since the models were parameterized solely at short distances, long distance effects are underestimated. These effects can be seen in particular in the density and the activity of electrolyte solutions. In the present work, new atomistic models for alkali and halogen ions are presented which accurately describe not only structural properties of aqueous electrolyte solutions but also basic thermodynamic properties like their density. The main focus of the model development is placed on transferability, i.e. the ion models are intended to describe solution properties independently of the cation/anion combination. The ions were modeled as Lennard-Jones (LJ) spheres with superimposed charges of ±1 in units of the elementary charge, located at the center of mass. For the solvent water, the SPC/E model [9] was used, which consists of one LJ site and three point charges.
2.2 Simulation Details

All simulations in Sect. 2 were performed using the Monte Carlo technique in the isobaric-isothermal (NpT) ensemble at 20 °C and 0.1 MPa. The simulation volume contained a total of 1000 molecules and ions. The electrostatic long-range interactions were calculated using the Ewald summation with an Ewald parameter κ of 5.6/L, where L denotes the length of the cubic simulation volume. The dispersive and repulsive long-range contributions were approximated assuming a homogeneous fluid beyond the cut-off distance of 11.9 Å. The simulation program employed was an extended version of ms2 [10].
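The homogeneous-fluid assumption beyond the cut-off corresponds to the standard Lennard-Jones tail correction; a sketch with merely SPC/E-like illustrative parameters (not the actual ms2 internals) might look as follows:

```python
import math

# Standard Lennard-Jones tail correction per particle for a homogeneous
# fluid beyond the cut-off r_c (textbook formula; the parameter values
# are illustrative, not the input of this work).
def lj_tail_energy(rho, sigma, epsilon, r_c):
    """u_tail = (8/3) pi rho eps sigma^3 [ (1/3)(sigma/r_c)^9 - (sigma/r_c)^3 ]"""
    x = sigma / r_c
    return (8.0 / 3.0) * math.pi * rho * epsilon * sigma**3 * (x**9 / 3.0 - x**3)

rho = 0.0334  # number density of liquid water in 1/Angstrom^3
print(lj_tail_energy(rho, sigma=3.17, epsilon=78.2, r_c=11.9), "K per particle")
```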
2.3 Solvent Model and Reduced Properties

For the calculation of aqueous electrolyte solutions, the model for water is crucial, since water represents the largest fraction of the mixture. The SPC/E water model [9] was employed, which reproduces the density and other thermodynamic properties at the chosen conditions in good agreement with experimental data (ρ_SPC/E(T = 20 °C, p = 0.1 MPa) = 999.5 g/l). To minimize the influence of errors in the solvent model on the parameterization of the ion force fields, the reduced density ρ of the aqueous solution was chosen as the objective function, which is defined as the ratio
Fig. 1 Reduced density ρ as a function of sodium chloride mass fraction. The line indicates the experimental values [11], while the symbols represent simulation results using varying ion force fields [12–15]
of the density of the electrolyte solution ρ_ES to the density of the pure solvent ρ_S at the same temperature and pressure,

ρ = ρ_ES / ρ_S .    (1)
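A small sketch of how (1) and its limiting slope could be evaluated from simulated state points; all numerical values are hypothetical placeholders:

```python
import numpy as np

# Reduced density, Eq. (1), and its limiting slope from simulated
# state points; all numbers are hypothetical placeholders.
rho_S = 999.5                                        # g/l, pure SPC/E water
x_m = np.array([0.00, 0.02, 0.04, 0.08])             # salt mass fraction
rho_ES = np.array([999.5, 1013.6, 1027.9, 1056.8])   # g/l, solution density

rho_red = rho_ES / rho_S                             # Eq. (1)

# rho_red is nearly linear in x_m, so a linear fit yields the slope
# d(rho)/dx^(m) in the limit x^(m) -> 0 used as objective function below.
slope, intercept = np.polyfit(x_m, rho_red, 1)
print(f"limiting slope: {slope:.3f}")
```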
Figure 1 shows a typical plot of the reduced density of an aqueous electrolyte solution as a function of the salt mass fraction, using sodium chloride as an example. The dependence of ρ on x^(m) is almost linear. It can also be seen from Fig. 1 that electrolyte force fields from the literature predict this linear correlation, but with a wrong slope. Note also that the solvent activity (or activity coefficient) can be regarded as a normalized property. However, a similar approach fails for properties like the ion self-diffusion coefficient.
2.4 Ion Force Field

A global parameterization strategy was used in the present study. Due to the simple model for the ions, the overall parameter space is small. It contains the two LJ parameters σ and ε for each ion. The position and magnitude of the charges are constant. In a preliminary analysis, the sensitivity of the reduced density to the Lennard-Jones parameters was evaluated. It shows that the influence of the LJ energy parameter ε on ρ is very small. Therefore, the σ parameters were determined by a fit to the reduced density, while the ε parameters were set here to a constant value, i.e. 200 K for sodium and chloride, based on the results of a study on the water activity in the electrolyte solution, which is not described in detail here. The objective function of the optimization used to determine the LJ σ parameters is the slope of the reduced density ρ with increasing salt mass fraction at x^(m) → 0,

dρ/dx^(m) |_{x^(m)→0} = dρ/dx^(m) (σ_anion, σ_cation) |_{x^(m)→0} .    (2)

In molecular simulations, these slopes were systematically derived for varying values of σ_anion and σ_cation. The data were smoothed by a two-dimensional polynomial fit. The ion parameters for all alkali and halogen ions were chosen such that the sum of the squared deviations between the functional fit and the experimental data is minimized.
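The described two-step strategy (smoothing the simulated slopes with a two-dimensional polynomial, then matching the experimental slope) can be sketched as follows; note that the actual parameterization fits all alkali and halogen salts simultaneously, whereas this single-salt sketch with a synthetic slope surface and a hypothetical target is only an illustration:

```python
import numpy as np

# Sketch of the sigma parameterization: smooth simulated slopes over the
# (sigma_anion, sigma_cation) plane with a 2D quadratic polynomial, then
# pick the sigma pair that best matches the experimental slope. The grid,
# the synthetic "simulation" slopes and the target are placeholders.
def quad2d(c, sa, sc):
    return c[0] + c[1]*sa + c[2]*sc + c[3]*sa**2 + c[4]*sc**2 + c[5]*sa*sc

sa, sc = np.meshgrid(np.linspace(3.5, 5.0, 4), np.linspace(1.5, 3.0, 4))
sa, sc = sa.ravel(), sc.ravel()
slopes = 0.9 - 0.05*sa - 0.12*sc + 0.004*sa*sc   # placeholder slope data

# 1) linear least-squares fit of the polynomial coefficients:
A = np.column_stack([np.ones_like(sa), sa, sc, sa**2, sc**2, sa*sc])
coeff, *_ = np.linalg.lstsq(A, slopes, rcond=None)

# 2) minimize the squared deviation from the experimental slope:
slope_exp = 0.52                                  # hypothetical target
fa, fc = np.meshgrid(np.linspace(3.5, 5.0, 151), np.linspace(1.5, 3.0, 151))
err = (quad2d(coeff, fa, fc) - slope_exp) ** 2
i = np.unravel_index(np.argmin(err), err.shape)
print("sigma_anion =", fa[i], "sigma_cation =", fc[i])
```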
2.5 Results

New atomistic models for alkali and halogen ions were determined that describe the reduced density of the aqueous electrolyte solution in good agreement with experiments. As an example, the force field for sodium chloride is shown in Table 1. Figure 1 shows the excellent agreement between the simulation using the new model and the experimental results. The new atomistic models for alkali and halogen ions were investigated regarding the representation of structural properties of aqueous solutions. Radial distribution functions g_ij(r) were used for the characterization, which are defined by

g_ij(r) = ρ_j(r) / ρ_j,bulk .    (3)
Here, ρ_j(r) is the density of component j as a function of the distance r between two particles of components i and j, respectively, and ρ_j,bulk is the number density of component j in the bulk phase. For the characterization of aqueous electrolyte solutions, the radial distribution function of water around the ions is of particular interest. In this case, water is represented by the position of the oxygen atom. For a solution of sodium chloride in water, g_i,H2O(r) is shown in Fig. 2. The locations of the first maximum and minimum, respectively, for both ions are in good agreement with experimental measurements [16], as shown in Table 2. The hydration shell of the cation is more strongly attached to the ion than that of the anion, which is reflected by the height of the first peak in the distribution function.
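A minimal single-frame estimator of (3), assuming a cubic box with the minimum-image convention, could look like this (the coordinates in the usage example are random placeholders; a real analysis averages over all ions and many configurations):

```python
import numpy as np

# Minimal estimator of Eq. (3) for one ion: histogram of ion-oxygen
# distances divided by the ideal-gas shell count.
def rdf(r_ion, r_solvent, box, r_max=8.0, nbins=160):
    d = r_solvent - r_ion
    d -= box * np.round(d / box)                     # minimum image
    dist = np.linalg.norm(d, axis=1)
    counts, edges = np.histogram(dist, bins=nbins, range=(0.0, r_max))
    shell_vol = 4.0 / 3.0 * np.pi * (edges[1:]**3 - edges[:-1]**3)
    rho_bulk = len(r_solvent) / box**3               # bulk number density
    r = 0.5 * (edges[:-1] + edges[1:])
    return r, counts / (shell_vol * rho_bulk)

# usage with random placeholder coordinates:
box = 31.0
r, g = rdf(np.zeros(3), np.random.rand(1000, 3) * box, box)
```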
Table 1 Force field of sodium chloride

Ion    σ / Å    ε / K
Na+    1.88     200
Cl−    4.41     200
Fig. 2 Radial distribution function g_i,H2O(r) of the solvent water around the sodium cation and chloride anion, respectively, at T = 20 °C and p = 0.1 MPa

Table 2 Location of the first maximum r_max and minimum r_min of the radial distribution function g_i,H2O(r) for an aqueous sodium chloride solution. The simulation results are compared to experimental data from the literature [16]

Ion i    r_max^Sim / Å    r_max^Exp / Å    r_min^Sim / Å    r_min^Exp / Å
Na+      2.2              2.3              3.0              3.0
Cl−      3.4              3.3              3.9              4.0
Table 3 Hydration number n of Na+ and Cl− in aqueous solution. The simulation results are compared to experimental data from the literature [16]

Ion    n^Sim    n^Exp
Na+    5.6      5–6
Cl−    7.5      7–8
The attraction of the solvent to the cation is also visible in the hydration number, which describes the number of solvent molecules in the closest vicinity of the solute i and is defined by

n_i,H2O = 4π ρ_H2O ∫_0^{r_min} r² g_i,H2O(r) dr .    (4)
Here, ρ_H2O is the number density of water and r_min the distance up to which the hydration number is calculated. For the first shell, the value of r_min is chosen to be the distance of the first minimum in the radial distribution function. For sodium chloride, for example, the calculated hydration numbers reproduce experimental results in good agreement, cf. Table 3.
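Given an RDF on a radial grid, (4) reduces to a one-dimensional quadrature; a sketch, reusing the arrays from the rdf() function above:

```python
import numpy as np

# Hydration number, Eq. (4): quadrature of r^2 g(r) up to the first
# minimum r_min of the RDF.
def hydration_number(r, g, rho_w, r_min):
    mask = r <= r_min
    return 4.0 * np.pi * rho_w * np.trapz(r[mask]**2 * g[mask], r[mask])

# For Na+ with r_min = 3.0 A and rho_w = 0.0334 1/A^3, a converged g(r)
# should give a value close to the n = 5.6 of Table 3.
```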
2.6 Computational Demands

Typical simulations to generate data points for the studies discussed in Sect. 2 were carried out on 16 CPUs running for 72 hours. For the prediction of other thermodynamic data like activity coefficients, up to 32 CPUs running for 72 hours are required, depending on the system. These simulations required 216 MB of virtual memory.
3 Self-diffusion Coefficients of Solutes in Electrolyte Systems

3.1 Outline

The self-diffusion coefficient is, in comparison to the density, an individual property of the different solute species and the solvent in electrolyte systems. Self-diffusion coefficients of anions and cations in aqueous solution are experimentally accessible, and numerous experimental data are available in the literature [17, 18]. Therefore, it is worthwhile trying to fit model parameters of ion force fields to self-diffusion coefficient data. As a suitable strategy for determining the size parameters σ is available (cf. Sect. 2), the question may be raised whether self-diffusion coefficient data are useful for determining the energy parameters ε. In molecular simulations, the self-diffusion coefficient is usually determined by time- and memory-consuming methods, which require simulations of large systems. Examples of such methods in equilibrium molecular dynamics are the mean square displacement [19] and the Green-Kubo formalism [20, 21]. In this formalism, the self-diffusion coefficient is related to the time integral of the velocity autocorrelation function. The calculation of self-diffusion coefficients of solutes in electrolyte systems is computationally expensive due to additional time consuming algorithms (e.g. Ewald summation [1]) that allow for a truncation of the ionic interactions in molecular simulations. In aqueous solution, the cations and anions are surrounded by a shell of strongly bonded water molecules (hydration shell). These hydrated ions diffuse within a bulk fluid which is itself also highly structured. Therefore, the mobility and accordingly the self-diffusion coefficient of the ions is strongly related to the structure of the water molecules around the ions [22].
3.2 Methods The investigated solution consisted of sodium and chloride ions as solutes and explicit water as solvent. The ions were modeled as Lennard-Jones spheres with a central charge. Size parameters σ for different ion force fields were determined as
discussed in Sect. 2. The energy parameter ε was varied in the range of 50 to 250 K. Water models were taken from the literature. First, the density of the solution was determined in an isobaric-isothermal (NpT) molecular dynamics (MD) simulation at the desired temperature and pressure. Then, the velocity autocorrelation function and, according to the Green-Kubo formalism [20, 21], the self-diffusion coefficients were determined in an isochoric-isothermal (NVT) MD simulation at this temperature and density. The sampling length of the velocity autocorrelation function was set to 11 ps and the time span between the origins of two autocorrelation functions was 0.06 ps. The separation between the time origins was chosen such that all autocorrelation functions have decayed at least to 1/e of their normalized value. For both NpT and NVT simulations, the molecular dynamics unit cell with periodic boundary conditions contained 4420 water molecules, 40 sodium and 40 chloride ions. The long-range charge-charge interactions were calculated using Ewald summation [1]. The simulation program employed was an extended version of ms2 [10], which is developed by our group.
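A bare-bones Green-Kubo evaluation along these lines, with the velocity autocorrelation function averaged over equally spaced time origins, might be sketched as follows (production codes like ms2 are, of course, far more elaborate):

```python
import numpy as np

# Green-Kubo estimate: D = (1/3) * time integral of <v(0).v(t)>.
# v has shape (n_frames, n_ions, 3); dt is the time between frames;
# n_corr is the correlation length in frames (11 ps in the text).
def self_diffusion(v, dt, n_corr):
    n_frames = v.shape[0]
    vacf = np.zeros(n_corr)
    n_origins = n_frames - n_corr
    for t0 in range(n_origins):                       # loop over time origins
        prod = np.sum(v[t0] * v[t0:t0 + n_corr], axis=2)
        vacf += prod.mean(axis=1)                     # average over ions
    vacf /= n_origins
    return np.trapz(vacf, dx=dt) / 3.0
```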
3.3 Results

The self-diffusion coefficients of anions and cations determined by molecular simulation depend strongly on the water model used, as the mobility of the ions is influenced by the hydration shell and the structure of the surrounding water.
3.3.1 Self-diffusion Coefficient of Water Models

The accuracy of the estimated self-diffusion coefficients of pure water for different water models is verified with respect to experimental data [23]. The self-diffusion coefficients were determined with the Green-Kubo formalism [20, 21] in molecular dynamics simulations at a temperature of 25 °C and a pressure of 0.1 MPa. For this study, three commonly used rigid nonpolarizable molecular water models of united-atom type, namely SPC/E [9], TIP4P [24] and TIP4P/2005 [25], were chosen. The determined self-diffusion coefficients of the different water models vary over a wide range, cf. Table 4. For TIP4P/2005, the self-diffusion coefficient of pure water is in good agreement with the experimental value. In contrast, both the SPC/E and the TIP4P model overestimate the mobility of water molecules in pure water. The obtained values for the self-diffusion coefficient are in good agreement with the results published by other authors [26].

Table 4 Self-diffusion coefficients of pure water at 25 °C and 0.1 MPa for different molecular water models. The number in parentheses indicates the uncertainty of the last digit

Model                   SPC/E      TIP4P      TIP4P/2005    Experiment
D_w [10⁻¹⁰ m² s⁻¹]      26.2 (1)   36.7 (2)   21.9 (1)      22.3 (−)
3.3.2 Model Development

For fitting the energy parameter ε of the ion force fields to experimental self-diffusion coefficients of anions and cations in aqueous solution, the influence of ε on the self-diffusion coefficients determined by molecular simulation was investigated. In this study, the three water models mentioned above were used. However, it turned out that the value of the energy parameter ε has no significant influence on the self-diffusion coefficient. This study is not discussed here in detail. In the following, ε = 200 K is used. The results for both sodium and chloride are shown in Fig. 3. As can be seen in Table 4 and Fig. 3, there is no correlation between the accuracy of the determined self-diffusion coefficient of pure water and the accuracy of the estimated ion mobility for the different water models. For example, the TIP4P model significantly overestimates the self-diffusion coefficient of pure water, whereas the simulation results for the self-diffusion coefficients of the sodium and the chloride ion are in fair agreement with experimental data (deviation of 4% for sodium; deviation of 18% for chloride). All water models overestimate the interaction between the water molecules in the hydration shell, the sodium ion and the bulk fluid. Hence, the cation mobility is too low. The same is true for the chloride ion only for the SPC/E and TIP4P/2005 water models, whereas TIP4P underestimates the interaction in the hydration shell.
Fig. 3 Self-diffusion coefficients of sodium (D_Na) and chloride (D_Cl) in aqueous solution at 25 °C and 0.1 MPa for different water models. The energy parameter ε for both the sodium and chloride force fields was set to 200 K. Experimental data for sodium [17] and chloride [18] were taken from the literature
3.4 Computational Demands

All molecular simulations in Sect. 3 were carried out with the MPI based molecular simulation program ms2, which is developed in our group. The total computing time for determining the self-diffusion coefficients of ions in electrolyte systems was 216 hours on 36 CPUs (72 hours for the NpT run and 144 hours for the NVT run). These simulations require large systems, as the accuracy of the Green-Kubo formalism for ions increases with an increasing number of solutes while, at the same time, infinite dilution is targeted. For these simulations, a maximum virtual memory of 1.76 GB was used.
4 Hydrogels in Electrolyte Solutions

4.1 Outline

Hydrogels are three-dimensional hydrophilic polymer networks. Their most characteristic property is their swelling in aqueous solutions by absorbing the solvent, which is influenced by various factors. Hydrogels are used in many applications, e.g. as superabsorbers in diapers [27] and in contact lenses [28]. To fully exploit the potential of hydrogels in all these applications, it is important to understand, describe and predict their swelling behavior. The hydrogel studied in the present work is built up of poly(N-isopropylacrylamide) (PNIPAAm) cross-linked with N,N'-methylenebisacrylamide (MBA). PNIPAAm is one of the most extensively studied hydrogels in the scientific literature and is mainly used in bioengineering applications [29]. The degree of swelling of PNIPAAm in equilibrium is significantly influenced by many factors [30–33]. On the one hand, the swelling depends on the structure of the hydrogel itself, like the type of the monomer, but also the amount and type of cross-linker and of co-monomers. On the other hand, environmental conditions like temperature, type of solvent, solvent composition, electrolyte concentration or pH-value of the solvent influence the swelling. Upon varying these factors, the hydrogel typically shows a region where it is swollen and a region where it is collapsed. In between those two regions lies the region of volume transition. The solvent composition which is characteristic for that transition is called Θ-solvent here. The Θ-solvent mainly depends on the environmental factors and the nature of the polymer chain, but not on the amount of cross-linker. For the quantitative description of the swelling of hydrogels, various types of models are used [34]. It is normally not possible to quantitatively predict the swelling of hydrogels or its dependence on factors the models were not adjusted to. With molecular simulation it is possible to predict the swelling of different hydrogels upon varying any environmental factor, as was shown in a previous study [35]. In the present work, the swelling of the PNIPAAm hydrogel is studied by atomistic molecular dynamics simulation. The results for the Θ-solvent are compared to
experimental data for the PNIPAAm hydrogel as a function of the temperature in water [36]. The experimental data show that the Θ-solvent of PNIPAAm in different electrolyte solutions follows the Hofmeister series. For this study, the electrolyte sodium chloride (NaCl) was considered.
4.2 Models

For the molecular dynamics simulations of PNIPAAm in aqueous solutions, the OPLS-AA force field [37, 38] was employed to describe PNIPAAm. It was combined with the SPC/E water model [9]. In previous studies, it was shown that this combination allows predicting the volume transition of PNIPAAm in water as a function of the temperature [35]. Different NaCl models from the literature were used and compared to the NaCl model developed in this work. The electrolyte models taken from the literature are GROMOS-96 53A6 (G53A6) [39] and KBFF [40].
4.3 Simulation Details

Molecular simulations of PNIPAAm single chains were carried out with version 4.0.5 of the GROMACS simulation package [41, 42]. Simulations of PNIPAAm in aqueous NaCl solutions at 25 °C were performed in order to find the best NaCl model for the simulation of the volume transitions of PNIPAAm in NaCl solutions. In a previous study, it was shown that the amount of cross-linker in the hydrogel has no significant influence on the Θ-solvent of the hydrogel and that this value is the same as for the PNIPAAm polymer [35]. Therefore, the simulations can be performed with a single PNIPAAm chain. For equilibration, single PNIPAAm chains in water were simulated in the isobaric-isothermal ensemble (NpT) over 1 to 5 × 10^7 timesteps. The pressure was 0.1 MPa and was controlled by the Berendsen barostat [43], the temperature was controlled by the velocity rescaling thermostat [44], and the timestep was 1 fs for all simulations. Newton's equations of motion were numerically solved with the leap frog integrator [45]. For the long-range electrostatic interactions, particle mesh Ewald [46] with a grid spacing of 1.2 Å and an interpolation order of four was used. A cutoff radius of r_c = 15 Å was assumed for all interactions. After equilibration, 2 to 6 × 10^7 production time steps were carried out with constant simulation parameters. Note that the production steps include the conformation transition as well as the simulation of the equilibrium. In order to analyze the results, the radius of gyration R_g was calculated,

R_g = ( Σ_i ||r_i||² m_i / Σ_i m_i )^{1/2} ,    (5)
which characterizes the degree of stretching of the single chain. Here, m_i is the mass of site i and ||r_i|| is the norm of the vector from site i to the center of mass of the single chain. The radius of gyration in equilibrium was calculated as the arithmetic mean over the last 5 × 10^6 time steps of the simulation, together with its standard deviation.
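A direct transcription of (5) for a single configuration could read:

```python
import numpy as np

# Radius of gyration, Eq. (5), for one configuration; r has shape
# (n_sites, 3), m has shape (n_sites,). The analysis above averages
# this quantity over the last 5e6 time steps.
def radius_of_gyration(r, m):
    com = np.average(r, axis=0, weights=m)   # center of mass
    d2 = np.sum((r - com)**2, axis=1)        # |r_i - r_com|^2
    return np.sqrt(np.sum(m * d2) / np.sum(m))
```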
4.4 Results and Discussion

At the temperature of 25 °C, the hydrogel is swollen in pure water and collapsed above an NaCl concentration x^(m)_NaCl of about 0.03 g·g⁻¹ [36]. Therefore, the single chain should be stretched in pure water and at low electrolyte concentrations, and collapsed at electrolyte concentrations above 0.03 g·g⁻¹. Figure 4 shows the simulation results as the radius of gyration over the NaCl concentration. The NaCl model from this work and G53A6 are able to predict the Θ-solvent of PNIPAAm at an NaCl concentration of 0.03 g·g⁻¹. The NaCl model KBFF does not yield the fully collapsed conformation. The models that show the Θ-solvent at 0.03 g·g⁻¹ also show a more or less stretched single chain at electrolyte concentrations above this point. The model with the best results here is the one from this work, for it only yields a slightly stretched conformation of the chain at electrolyte concentrations above the Θ-solvent. It is therefore clearly possible to obtain qualitative predictions for the swelling of PNIPAAm hydrogels in electrolyte solutions of NaCl by molecular simulation of
Fig. 4 Radius of gyration R_g of a PNIPAAm chain of 30 monomers in NaCl solutions of about 14,000 solvent molecules in equilibrium as a function of the NaCl concentration x^(m)_NaCl for the different NaCl models at a temperature of 25 °C. The error bars indicate the standard deviation
a single chain. For the Θ-solvent in NaCl solutions, it was even possible to quantitatively reproduce the experimental data. First studies on the volume transition of PNIPAAm hydrogels in electrolyte solutions of sodium sulfate (Na2SO4) have recently been carried out (results not shown here). By comparing the results for the two electrolyte solutions NaCl and Na2SO4 in water, it is also possible to determine the effect of the Hofmeister series on the solubility of PNIPAAm in aqueous electrolyte solutions. The Hofmeister series leads to a Θ-solvent of PNIPAAm in Na2SO4 solutions at a lower concentration than in NaCl solutions [36]. This correlation could be reproduced by the molecular simulations. In summary, this is an unexpectedly favorable agreement between predictions by molecular simulation and experimental data, especially considering that the force fields were not adjusted to any such data.
4.5 Computational Demands

All simulations presented in Sect. 4 were carried out with the MPI based molecular simulation program GROMACS. The parallelization of the molecular dynamics part of GROMACS is based on the eighth shell domain decomposition method [42]. With GROMACS, typical simulation runs to determine the radius of gyration in equilibrium employ 128 CPUs running for 24–72 hours. For these simulations, very large systems must be considered, typically comprising about 58,800 interaction sites. A maximum memory of 284 MB and a maximum virtual memory of 739 MB was used.
5 Conclusion

This work covers the development of ion force fields for describing the thermodynamic properties of electrolyte solutions and the application of these force fields to predicting the volume transition of hydrogels by atomistic molecular simulations with explicit solvent models. The present work shows that alkali and halogen ions can be reliably modeled by an LJ sphere with a superimposed charge located at the center of mass. The developed force fields allow predicting structural properties like the radial distribution function and the hydration number as well as thermodynamic properties like the density. The self-diffusion coefficient of pure water was determined using various well known molecular models of water. The agreement with experimental data is often poor. Furthermore, ion self-diffusion coefficients were determined in simulations using these different water models. No correlation was observed between the accuracy of the self-diffusion coefficients of pure water and the accuracy of the self-diffusion coefficients of the ions. Further research effort in developing a new water model is needed to obtain accurate predictions of transport properties in aqueous electrolyte
systems. In addition, the study shows that the self-diffusion coefficient of ions in aqueous solutions is almost independent of the LJ energy parameter ε of the ion. With the developed electrolyte models, it was possible to predict the volume transition of hydrogels in electrolyte solutions qualitatively and in some cases even quantitatively. The results also reproduce the effect of the Hofmeister series on the swelling of hydrogels. Acknowledgments. The computer simulations were performed on the supercomputer HP XC4000 at the Steinbuch Centre for Computing in Karlsruhe (Germany) under the grant LAMO. This work was carried out under the auspices of the Boltzmann-Zuse Society for Computational Molecular Engineering (BZS).
References

1. P. P. Ewald. Die Berechnung optischer und elektrostatischer Gitterpotentiale. Annalen der Physik, 64:253–287, 1921.
2. T. Darden, D. York, and L. Pedersen. Particle mesh Ewald – An N·log(N) method for Ewald sums in large systems. Journal of Chemical Physics, 98:10089–10092, 1993.
3. J. W. Eastwood, R. W. Hockney, and D. N. Lawrence. P3M3DP – The 3-dimensional periodic particle-particle/particle-mesh program. Computer Physics Communications, 19:215–261, 1980.
4. B. G. Rao and U. C. Singh. A free-energy perturbation study of solvation in methanol and dimethyl-sulfoxide. Journal of the American Chemical Society, 112:3803–3811, 1990.
5. F. G. Fumi and M. P. Tosi. Ionic sizes + Born repulsive parameters in NaCl-type alkali halides. I. Huggins-Mayer + Pauling forms. Journal of Physics and Chemistry of Solids, 25:31–44, 1964.
6. L. X. Dang and T. M. Chang. Molecular mechanism of ion binding to the liquid/vapor interface of water. Journal of Physical Chemistry B, 106:235–238, 2002.
7. B. Hess, C. Holm, and N. van der Vegt. Osmotic coefficients of atomistic NaCl (aq) force fields. Journal of Chemical Physics, 124:164509, 2006.
8. J. Walter, S. Deublein, J. Vrabec, and H. Hasse. Development of models for large molecules and electrolytes in solution for process engineering. High Performance Computing in Science and Engineering '09, pages 165–176, 2010.
9. H. J. C. Berendsen, J. R. Grigera, and T. P. Straatsma. The missing term in effective pair potentials. Journal of Physical Chemistry, 91:6269–6271, 1987.
10. S. Deublein, B. Eckl, J. Stoll, S. V. Lishchuk, G. Guevara-Carrión, C. W. Glass, T. Merker, M. Bernreuther, H. Hasse, and J. Vrabec. ms2: A molecular simulation tool for thermodynamic properties. Computer Physics Communications, in press, doi:10.1016/j.cpc.2011.04.026, 2011.
11. R. C. Weast, editor. Handbook of Chemistry and Physics. CRC Press, 68th edition, 1987.
12. P. B. Balbuena, K. P. Johnston, and P. J. Rossky. Molecular dynamics simulation of electrolyte solutions in ambient and supercritical water. 1. Ion solvation. Journal of Physical Chemistry, 100:2706–2715, 1996.
13. K. P. Jensen and W. L. Jorgensen. Halide, ammonium, and alkali metal ion parameters for modeling aqueous solutions. Journal of Chemical Theory and Computation, 2:1499–1509, 2006.
14. T. P. Straatsma and H. J. C. Berendsen. Free-energy of ionic hydration – Analysis of a thermodynamic integration technique to evaluate free-energy differences by molecular-dynamics simulations. Journal of Chemical Physics, 89:5876–5886, 1988.
15. P. J. Lenart, A. Jusufi, and A. Z. Panagiotopoulos. Effective potentials for 1:1 electrolyte solutions incorporating dielectric saturation and repulsive hydration. Journal of Chemical Physics, 126:044509, 2007.
16. Y. Marcus. Ionic radii in aqueous solutions. Chemical Reviews, 88:1475–1498, 1988.
17. J. H. Wang and S. Miller. Tracer diffusion in liquids. II. The self-diffusion of sodium ion in aqueous sodium chloride solutions. Journal of the American Chemical Society, 74:1611–1612, 1952.
18. J. M. Nielsen, A. W. Adamson, and J. W. Cobble. The self-diffusion coefficients of the ions in aqueous sodium chloride and sodium sulfate at 25 °C. Journal of the American Chemical Society, 74:446–451, 1952.
19. R. Zwanzig. Time correlation functions and transport coefficients in statistical mechanics. Annual Review of Physical Chemistry, 16:67–102, 1965.
20. M. S. Green. Markoff random processes and statistical mechanics of time-dependent phenomena. Journal of Chemical Physics, 22:398, 1954.
21. R. Kubo. Statistical-mechanical theory of irreversible processes: I. General theory and simple applications to magnetic and conduction problems. Journal of the Physical Society of Japan, 12:570, 1957.
22. S. Koneshan, J. C. Rasaiah, R. M. Lynden-Bell, and S. H. Lee. Solvent structure, dynamics and ion mobility in aqueous solutions at 25 °C. Journal of Physical Chemistry, 102:4193–4202, 1998.
23. K. T. Gillen, D. C. Douglass, and M. J. R. Hoch. Self-diffusion in liquid water to −31 °C. Journal of Chemical Physics, 57:5117–5119, 1972.
24. W. L. Jorgensen, J. Chandrasekhar, and J. D. Madura. Comparison of simple potential functions for simulating liquid water. Journal of Chemical Physics, 79:926–935, 1983.
25. J. L. F. Abascal and C. Vega. A general purpose model for the condensed phases of water: TIP4P/2005. Journal of Chemical Physics, 123:234505, 2005.
26. G. Guevara-Carrión, J. Vrabec, and H. Hasse. Prediction of self-diffusion coefficient and shear viscosity of water and its binary mixtures with methanol and ethanol by molecular simulations. Journal of Chemical Physics, 134:074508, 2011.
27. H. A. Abd El-Rehim. Swelling of radiation crosslinked acrylamide-based microgels and their potential applications. Radiation Physics and Chemistry, 74:111–117, 2005.
28. V. N. Pavlyuchenko, O. V. Sorochinskaya, S. S. Ivanchev, S. Y. Khaikin, V. A. Trounov, V. T. Lebedev, E. A. Sosnov, and I. V. Gofman. New silicone hydrogels based on interpenetrating polymer networks comprising polysiloxane and poly(vinyl alcohol) networks. Polymers for Advanced Technologies, 20:367–377, 2009.
29. Z. M. O. Rzaev, S. Dinçer, and E. Pişkin. Functional copolymers of N-isopropylacrylamide for bioengineering applications. Progress in Polymer Science, 32:534–595, 2007.
30. A. Hüther, X. Xu, and G. Maurer. Swelling of N-isopropyl acrylamide hydrogels in water and aqueous solutions of ethanol and acetone. Fluid Phase Equilibria, 219:231–244, 2004.
31. A. Hüther, X. Xu, and G. Maurer. Swelling of N-isopropyl acrylamide hydrogels in aqueous solutions of sodium chloride. Fluid Phase Equilibria, 240:186–196, 2006.
32. K. Mukae, M. Sakurai, S. Sawamura, K. Makino, S. W. Kim, I. Ueda, and K. Shirahama. Swelling of poly(N-isopropylacrylamide) gels in water-alcohol (C1-C4) mixed solvents. Journal of Physical Chemistry, 97:737–741, 1993.
33. H. M. Crowther and B. Vincent. Swelling behavior of poly-N-isopropylacrylamide microgel particles in alcoholic solutions. Colloid Polymer Science, 276:46–51, 1998.
34. S. Wu, H. Li, J. P. Chen, and K. Y. Lam. Modeling investigation of hydrogel volume transition. Macromolecular Theory and Simulations, 13:13–29, 2004.
35. J. Walter, V. Ermatchkov, J. Vrabec, and H. Hasse. Molecular dynamics and experimental study of conformation change of poly(N-isopropylacrylamide) hydrogels in water. Fluid Phase Equilibria, 296:164–172, 2010.
36. Y. Zhang, S. Furyk, D. E. Bergbreiter, and P. S. Cremer. Specific ion effects on the water solubility of macromolecules: PNIPAM and the Hofmeister series. Journal of the American Chemical Society, 127:14505–14510, 2005.
37. W. L. Jorgensen and J. Tirado-Rives. The OPLS potential functions for proteins. Energy minimizations for crystals of cyclic peptides and crambin. Journal of the American Chemical Society, 110:1657–1666, 1988.
38. W. L. Jorgensen, D. S. Maxwell, and J. Tirado-Rives. Development and testing of the OPLS all-atom force field on conformational energetics and properties of organic liquids. Journal of the American Chemical Society, 118:11225–11236, 1996.
39. C. Oostenbrink, T. A. Soares, N. F. A. van der Vegt, and W. F. van Gunsteren. Validation of the 53A6 GROMOS force field. European Biophysical Journal, 34:273–284, 2005.
40. S. Weerasinghe and P. E. Smith. A Kirkwood-Buff derived force field for sodium chloride in water. Journal of Chemical Physics, 119:11342–11349, 2003.
41. D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark, and H. J. C. Berendsen. GROMACS: Fast, flexible, and free. Journal of Computational Chemistry, 26:1701–1718, 2005.
42. B. Hess, C. Kutzner, D. van der Spoel, and E. Lindahl. GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable molecular simulation. Journal of Chemical Theory and Computation, 4:435–447, 2008.
43. H. J. C. Berendsen, J. P. M. Postma, W. F. van Gunsteren, A. DiNola, and J. R. Haak. Molecular dynamics with coupling to an external bath. Journal of Chemical Physics, 81:3684–3690, 1984.
44. G. Bussi, D. Donadio, and M. Parrinello. Canonical sampling through velocity rescaling. Journal of Chemical Physics, 126:014101, 2007.
45. R. W. Hockney, S. P. Goel, and J. W. Eastwood. Quiet high-resolution computer models of a plasma. Journal of Computational Physics, 14:148–158, 1974.
46. U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee, and L. G. Pedersen. A smooth particle mesh Ewald method. Journal of Chemical Physics, 103:8577–8592, 1995.
cuVASP: A GPU-Accelerated Plane-Wave Electronic-Structure Code Stefan Maintz, Bernhard Eck, and Richard Dronskowski
Abstract We report on a source-code modification of the density-functional program suite VASP which greatly benefits from the use of graphics-processing units (GPUs). The blocked Davidson iteration scheme (EDDAV) has been optimized for GPUs and attains speed-ups of up to 3.39 on S1070 devices and of 6.97 on a C2050 device. Using the Fermi card, the code reaches an impressive 61.7% efficiency while not suffering from any accuracy losses. The algorithmic bottleneck lies in the multiplication of rectangular matrices. We also give some initial thoughts on introducing a different level of parallelism in order to harness the computational power of multi-GPU installations.
1 Serial cuVASP

With the advent of cards using graphics-processing units (GPUs), the traditional concept of vector computing has been facing a noteworthy comeback. Due to the fact that the wide-spread density-functional theory (DFT) program VASP is known to run extremely well on vector computers, the choice of VASP for a source-code modification was quite obvious. We have recently published a detailed contribution [1] which reports stunning results in terms of such source-code modifications of a serial VASP version. As a first finding, simply renaming routines and linking against the CUFFT and CUBLAS libraries provided by NVIDIA did not lead to any computational speed-up. On the contrary, the overall code execution was significantly slowed down. This unfortunate effect was identified as being due to the PCI bottleneck with respect to memory transfers between the CPU and the GPU and the associated latencies.
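A simple cost model makes the effect plausible; the latency and bandwidth values below are rough assumptions for illustration, not measurements from this work:

```python
# Toy cost model for offloading a single routine across PCIe. The
# latency and bandwidth values are rough assumptions for illustration.
LATENCY = 10e-6      # s per host<->device copy (order of magnitude)
BANDWIDTH = 6e9      # bytes/s sustained PCIe throughput (assumed)

def offload_time(bytes_in, bytes_out, t_gpu):
    t_copy = 2 * LATENCY + (bytes_in + bytes_out) / BANDWIDTH
    return t_copy + t_gpu

# A kernel running 5x faster on the GPU can still lose overall when the
# transferred data is large relative to the arithmetic work:
t_cpu = 200e-6
t_off = offload_time(4_000_000, 4_000_000, t_cpu / 5)
print(f"offloaded: {t_off * 1e6:.0f} us vs CPU: {t_cpu * 1e6:.0f} us")
```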
Stefan Maintz · Bernhard Eck · Richard Dronskowski Institute of Inorganic Chemistry, RWTH Aachen University, D-52056 Aachen, Germany, e-mail:
[email protected]
Fig. 1 The profiling results of a typical calculation yield the number of calls and the corresponding GPU times of the CUDA kernels. For clarity, these kernels were sorted into groups depending on their purpose. The results illustrate that even in an optimized version lots of memory transfers are necessary, but they are negligible regarding their share of the GPU time. The high matrix-multiplication share is a consequence of the mentioned badly performing routine for rectangular matrices
Subsequently, other source-code modifications were undertaken, for example porting those routines to the GPU which account for about 97% of the CPU time used. In particular, coalesced memory access is vital to gain the optimum speed-up, and most routines transferred to the GPU were found to be limited by the GPU's own restrictions. The final speed-up grows with the number of (sequentially stepped) routines that have been transferred to the GPU, but we note that the shared memory of the GPU becomes another limiting factor for many routines. A couple of routines show an impressive speed-up, which does not come as a surprise because fast Fourier transformation (FFT) or multiplication of rather large matrices are typical cases where vector computing performs exceptionally well. Nonetheless, it must be stressed that the hardware's internals must also be taken into account. For example, carrying out the FFT on the GPU with a standard grid size proposed by VASP (54×56×54 in our case) resulted only in a moderate speed-up. By enlarging the grid size to 64×64×64, not only is the accuracy (!) greatly improved but, at the same time, this leads to an enormous speed-up of 3.4. This interesting behavior goes back to the nature of the hardware design, since the FFT is performed fastest if the grid size adheres to powers of two, 2^n. The most severe drawback was found in the multiplication of a heavily rectangular (that is, almost vector-like) matrix, a problem CUBLAS currently does not seem to be tuned for. The consequences with respect to GPU times are illustrated in Fig. 1.
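The grid-size effect can be probed even on the CPU; the toy benchmark below (using numpy's FFT, which behaves differently from CUFFT in detail and may or may not show the same trend) merely illustrates the idea of padding to a power-of-two grid:

```python
import numpy as np
from time import perf_counter

# Toy comparison of FFT grid sizes; numpy's FFT differs from CUFFT in
# detail, so this only illustrates the idea of power-of-two padding.
def time_fft(shape, repeats=5):
    data = np.random.rand(*shape) + 1j * np.random.rand(*shape)
    t0 = perf_counter()
    for _ in range(repeats):
        np.fft.fftn(data)
    return (perf_counter() - t0) / repeats

print("54x56x54:", time_fft((54, 56, 54)))
print("64x64x64:", time_fft((64, 64, 64)))
```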
Because the CPU performs amazingly well on this operation when using Intel's MKL, the problem may arise from the fact that this GPU has no cache, while the CPU can process the entire operation inside its large (8 MB) L3 cache. We expect the GPU to perform even better with the new Fermi cards, given that their cache is used as well. Indeed, a newly written CUDA kernel that parallelizes the matrix multiplication in a more vector-like way outperforms CUBLAS 3.1 by one order (!) of magnitude. Nonetheless, it is still only slightly faster than the CPU because of a data-reduction operation which is needed in the present scheme, and because no more shared memory is available to re-use data that has already been read from the main memory. Overall, a speed-up of 3.39 on a Tesla card (S1070) and of 6.97 on a Fermi card (C2050) compared to a Xeon X5560 running at 2.8 GHz may be achieved. Here, the efficiency on the GPU is 46% (with respect to computing power on Tesla) and 61% (with respect to memory bandwidth on Fermi). No noteworthy numerical deviations compared to pure CPU-based calculations were found when using double-precision GPU calculations.
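The shape dependence of matrix-multiplication throughput can be reproduced with a hedged CPU-side sketch; the shapes below are chosen by us purely for illustration and the measured rates will depend on the BLAS backend, but the almost vector-like multiplication is typically bandwidth-bound and far slower than the square one at comparable arithmetic cost:

```python
import time
import numpy as np

def gflops(m, n, k, reps=3):
    """Measured rate of C(m,n) = A(m,k) @ B(k,n) in GFLOP/s."""
    A = np.random.rand(m, k)
    B = np.random.rand(k, n)
    t0 = time.perf_counter()
    for _ in range(reps):
        A @ B
    dt = (time.perf_counter() - t0) / reps
    return 2.0 * m * n * k / dt / 1e9

# Similar arithmetic work, very different shapes:
print("square     :", gflops(512, 512, 512))
print("rectangular:", gflops(16, 16, 512 * 512))  # heavily rectangular, almost a dot product
```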
2 Outlook on Parallelization

The speed-up currently reached by the serial version is quite impressive, but before dealing with parallelism the term "serial" needs further consideration. In general, a code is called serial if it does not process multiple data concurrently. As long as we disregard SIMD extensions, this holds for one core of an x86-based CPU. Within this definition, computations that profit from using GPUs are not serial, which becomes obvious if one looks at their architecture. The first question that always needs to be answered when thinking about parallelization is whether the effort is necessary at all. VASP calculations employing 64 CPU cores are nothing special even today, although today's CPU cores are incredibly fast compared to the rather slow cores of just a few years ago. Nonetheless, faster CPUs do not necessarily make users entirely happy, simply because the faster hardware allows them to tackle larger problems that were previously unthinkable to calculate. Also, somebody will always find a larger problem than the previous one. So, speaking of CPUs, the clear answer to the initial question is yes: it is necessary to access as many CPUs as possible concurrently; in other words, there is an obvious need for parallelization. Another aspect to keep in mind is that serial optimization is always necessary and often also effective for the parallel version. As far as we can tell, VASP seems to be very well optimized serially, so there is not much to do here. Can this simple consideration, however, also be applied to GPUs? The principles of serial tuning and the necessity to handle larger problems hold as well. Because 8-core CPUs are available, they will outperform a single GPU with its speed-up of almost 7, so the parallel CPU solution seems preferable for calculating large problems. For cuVASP to compete, it would thus be desirable to have more parallelism even when using GPUs. Concerning the graphics-card version, all
computationally demanding routines are executed on the GPU anyway, so using more cores of the CPU combined with one GPU to gain more parallelism does not seem to make much sense. As mentioned before, GPUs work in parallel by definition. Hence, to gain more parallelism it is necessary to introduce another level of parallelism that goes under the term "multi-GPU" usage. As the term suggests, this level of parallelism involves more than one graphics card. At the latest when trying to use clustered GPU installations in modern computing centers, one will want to use graphics cards that are connected to different hosts. When data need to be exchanged between those cards, it becomes necessary to distribute the data over the network connecting these hosts. A great tool for that is the MPI framework, and serendipitously the CPU parallelization already included in VASP was carried out using MPI. This might give the impression that enabling multi-GPU usage is trivial. In the following, we will see why this is not the case. The parallelization in VASP nicely follows the paradigm of parallelizing as finely grained as possible, at least for the EDDAV algorithm. Of course, the fineness of the data distribution depends on multiple factors. If the algorithm allows distribution, the efficiency primarily depends on the network link bandwidth, its latency, and on how often data need to be exchanged. Especially within one host, but also on low-latency (e.g., InfiniBand) networks, these factors are in favor of fine-grained distribution, as data exchange times are low. In the work on the single-GPU version of cuVASP, we found that data transfers between the host memory and the GPU had to be avoided as much as possible. This is caused by latencies on the order of 10 µs for each memory copy, even on the newest devices. This value can be compared to typical MPI-over-InfiniBand latencies of up to 3 µs. The MPI-enabled version of the EDDAV algorithm supports parallelization over plane-wave coefficients. Despite distributing the workload nicely between all available cores, this requires lots of MPI communication in the FFT routines. For cuVASP, this would also require one memory transfer from the GPU to the host memory ahead of the MPI call on the sending host. Of course, the receiving host must copy those data to its own GPU. Accordingly, this would require two memory transfers for every single FFT! A visualization of the data transfers is given in Fig. 2. This is basically the same situation as the aforementioned idea of linking NVIDIA's FFT library to the VASP code without further alteration, and since the MPI latency would still have to be taken into account, it would be even worse. Hence we reason that this multi-GPU parallelization scheme using the EDDAV algorithm cannot be implemented efficiently. On the other hand, the RMM-DIIS scheme is able to distribute the workload over the electronic bands. Within each band very little communication is needed. Except for the re-orthogonalization and subspace rotation, the bands can be calculated largely independently of each other, which should be especially true for the data kept on the GPU. Thus, the nicely performing FFTs and other routines used to find the Hartree potential could be executed just the way they work in the serial version. Because the routines used in the RMM-DIIS scheme differ slightly
Fig. 2 Flowchart to visualize the necessary memory transfers when using more than one GPU. Typical latencies are given to illustrate the costs of a data exchange in the multi-GPU scenario
from those used by EDDAV, they cannot be expected to work without further effort, but they also do not require a complete rewrite of all CUDA kernels. We therefore expect a working multi-GPU version of VASP soon. The aforementioned reasoning might lead to the conclusion that multi-GPU parallelization works better the more coarsely the data are distributed, at least as long as the problems are large enough. For the application of that unusual paradigm in cuVASP, the coarsest and easiest scheme would be parallelization over k points, because they are completely independent of each other and, therefore, allow trivial parallelism. Nonetheless, a problem arises here that makes k-point parallelism rather unattractive: chemical or physical systems that demand computational resources large enough to make multi-GPU parallelism necessary are expected to be extraordinarily large. For these, a single k point will often suffice to describe them accurately in reciprocal space. The latter expectation brings us back to the original paradigm of using finer workload distributions, but, on the other hand, the performance is then limited by the greatly enlarged data-exchange latencies when GPUs are employed. In conclusion, it will be necessary to find the best compromise between the two extremes described; it appears that band distribution fits this criterion. Our next step in the immediate future will be the implementation of the described schemes and a careful analysis of their impact on efficiency.
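To make the transfer chain of Fig. 2 concrete, the following minimal sketch mimics the pattern with mpi4py. Note that cuVASP itself is not Python: the functions gpu_download and gpu_upload are hypothetical stand-ins for the device-to-host and host-to-device copies (cudaMemcpy in CUDA), introduced here only to mark where the two extra latencies accrue:

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

def gpu_download(dev_buf):
    # hypothetical device -> host copy; each such copy costs ~10 us latency
    return np.asarray(dev_buf)

def gpu_upload(host_buf):
    # hypothetical host -> device copy; again ~10 us latency
    return host_buf.copy()

n = 4096
if rank == 0:
    coeffs = np.random.rand(n)          # stand-in for plane-wave coefficients
    host_buf = gpu_download(coeffs)     # transfer 1: GPU -> host
    comm.Send(host_buf, dest=1, tag=0)  # MPI over InfiniBand (~3 us latency)
elif rank == 1:
    host_buf = np.empty(n)
    comm.Recv(host_buf, source=0, tag=0)
    dev_buf = gpu_upload(host_buf)      # transfer 2: host -> GPU
```

In the fine-grained EDDAV distribution, this chain would run for every single FFT, which is exactly why the coarser band distribution of RMM-DIIS looks more attractive.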
References

1. S. Maintz, B. Eck, R. Dronskowski, Comp. Phys. Commun. 182 (2011) 1421.
Reacting Flows

Prof. Dr. Dietmar Kröner
Abteilung für Angewandte Mathematik, Universität Freiburg, Hermann-Herder-Str. 10, 79104 Freiburg, Germany
During the last years, great effort has been made to reduce the pollution of the atmosphere by CO2 and NOx, e.g. by lean combustion. At the same time, high performance numerical simulations on very powerful computer platforms like the NEC SX-9 and the new Cray XE6 at the HLRS lead to a comprehensive understanding of the complex and fundamental processes during combustion and can help to develop new technologies for reducing emissions. First numerical experiments comparing the performance of these powerful codes on the NEC SX-9 and the Cray XE6 at the HLRS are now also available. The "Assessment of conventional droplet evaporation models for spray flames" is considered by Zoby et al. In particular, they study the influence of droplet density on the evaporation rates in a reactive turbulent environment and the induced combustion. Different droplet arrangements are taken into account. The mathematical model is given by the compressible Navier-Stokes equations with a source term representing surface forces. A two-fluid method is used, capturing the interface by a level set approach. By direct numerical simulation, "all" relevant scales of turbulence are resolved. The numerical algorithm is based on a fully implicit low Mach number in-house code. This code allows the direct computation of evaporation rates and can be used for a direct comparison with droplet evaporation models that compute evaporation rates when droplets are not fully resolved. In order to reduce pollution, the development of a new gas turbine combustor concept is necessary. This is studied in the paper "Analysis of the effects of wall boundary conditions and detailed kinetics on the simulation of gas turbine model combustor under very lean conditions" by Rebosio et al. The use of lean combustion processes can be useful within these developments. Unfortunately, lean combustion presents some new challenges; in particular, strong pressure fluctuations in the combustion chamber have to be controlled. These can lock with the eigenfrequencies of the combustion chamber and lead to serious structural damage. The simulations are based on the compressible Navier-Stokes equations including
a three-step chemical kinetic mechanism for methane combustion. A turbulence model which can change dynamically between URANS and LES is used. The numerical solver is an implicit finite volume scheme on structured and unstructured meshes, and a multi-grid solver is used for solving the linear systems. Second order discretization in space and time provides sufficient accuracy. The code is parallelized on the basis of MPI. A comparison with experimental measurements shows good agreement. In the paper about "Oxy-coal combustion modeling at semi-industrial scale", Müller et al. study the so-called oxy-coal combustion technology, which is important for CO2 capture and storage (CCS) technology. In contrast to conventional combustion in a pure air atmosphere, oxy-coal combustion processes are based on a mixture of O2 and recycled flue gas. Therefore, the existing code for conventional combustion simulation has to be adapted to this new situation. The reacting flow problem has been discretized by a finite volume scheme including a k–ε model. The complete algorithm has been implemented within the AIOLOS environment and has been validated against experimental data obtained from measurements on a semi-industrial scale furnace at the authors' institute. In the paper "Delayed detached eddy-simulations of compressible turbulent mixing layer and detailed performance analysis of scientific in-house code TASCOM 3D" by M. Kindler et al., a compressible turbulent reacting flow with 13 species and 32 reactions is considered. The underlying mathematical model consists of the compressible Navier-Stokes system together with transport equations for the species. For the turbulence modeling, a modification of the "detached eddy-simulation" (DES) method is used. The DES behaves like an LES if the grid spacing is small compared to the wall distance and like a RANS model otherwise. The code TASCOM 3D is based on a 5th order upwind biased scheme combined with an improved multidimensional limiting process for the spatial discretization. The inviscid fluxes are calculated using the AUSM flux vector splitting. The unsteady equations are solved using an implicit lower-upper symmetric Gauss-Seidel finite volume algorithm. The code was implemented on the NEC SX-9 and the new Cray XE6 computers, and a performance analysis is presented. It turns out that the scaling on the Cray XE6 gives better results. The results are in good agreement with experimental data.
Assessment of Conventional Droplet Evaporation Models for Spray Flames

M.R.G. Zoby, A. Kronenburg, S. Navarro-Martinez, and A.J. Marquis
1 Introduction

In turbulent spray flames, the fuel is atomised, the droplets evaporate and combustion takes place. The flow in most applications of technical relevance is turbulent, and studies of liquid fuel combustion should include the complex interactions between turbulence, spray break-up and evaporation. In particular, the local temperature affects the evaporation rates of droplets, but conventional spray studies are very complex and do not allow easy quantification of the local temperature gradients at the droplet surface and of the interactions between the various droplets. It can be shown that most spray flames are not dense but consist of dilute sprays, so droplet interactions do not have to be accounted for [1]. The parameter space needed to describe droplet arrangements in real turbulent flows is too large to be covered, and studies of regularly ordered droplet arrays, as proposed in the current paper, facilitate the analysis of the local effects that determine evaporation rates. Still, only very few numerical works exist [2] that are capable of directly quantifying the effect of droplet interactions on evaporation rate and subsequent combustion. The present work assesses the effects of droplet density and turbulence on evaporation rates in inert and reactive turbulent environments using Direct Numerical Simulations (DNS) of n-heptane and kerosene droplet arrays. The two phases are fully represented, capturing the interface location and resolving all the scales of turbulence. The model assumes that the droplet temperature is constant, below the critical point and equal to the saturation value. This assumption is reasonable for very hot environments where droplets reach saturation temperature within a short
M.R.G. Zoby · S. Navarro-Martinez · A.J. Marquis Department of Mechanical Engineering, Imperial College London, SW7 2AZ London, UK, e-mail:
[email protected],
[email protected],
[email protected] A. Kronenburg Institut f¨ur Technische Verbrennung, University of Stuttgart, Stuttgart, Germany, e-mail:
[email protected] W.E. Nagel et al. (eds.), High Performance Computing in Science and Engineering ’11, DOI 10.1007/978-3-642-23869-7 17, © Springer-Verlag Berlin Heidelberg 2012
time compared to their lifetime. The conditions are typical for high-pressure combustion chambers after injection of the droplets and their deceleration due to fluid-droplet momentum exchange, i.e. only a low relative velocity between droplets and surrounding fluid remains. Therefore, trans- and super-critical characteristics do not need to be taken into account and are beyond the scope of the presented work. The evaporation rates obtained with the DNS are then compared to existing models of droplet evaporation [3, 4], quantifying the differences.
2 Numerical Approach

The fluid motion is described by the Navier-Stokes equations. Assuming a Newtonian fluid, the conservation of mass and momentum can be written using standard notation as
$$\frac{\partial \rho}{\partial t} + \frac{\partial (\rho u_j)}{\partial x_j} = 0, \qquad (1)$$

$$\frac{\partial (\rho u_i)}{\partial t} + \frac{\partial (\rho u_i u_j)}{\partial x_j} = -\frac{\partial p}{\partial x_i} + \frac{\partial}{\partial x_j}\left(\mu \frac{\partial u_i}{\partial x_j}\right) + F_S, \qquad (2)$$
where F_S represents the surface forces. The initial size of the droplets and the relative velocity between the liquid and gas phases in the simulated test cases suggest that inertial forces dominate at the small scales considered here, and gravity and other volume forces have been neglected. At low Mach numbers and constant thermodynamic pressure, acoustic interactions and viscous heating effects are small. Then, the species concentration and energy transport equations can be rewritten for a general reactive scalar \Phi_k, with k = 1, \ldots, N_s, where N_s is the number of scalars required to describe the system, i.e. the number of species plus enthalpy, as

$$\frac{\partial (\rho \Phi_k)}{\partial t} + \frac{\partial (\rho \Phi_k u_j)}{\partial x_j} = \frac{\partial}{\partial x_j}\left(D \frac{\partial \Phi_k}{\partial x_j}\right) + S_{evap} + S_{comb}, \qquad (3)$$

where the thermal and species diffusivities, D, have been set equal (unity Lewis number assumption). S_{evap} and S_{comb} are the source terms due to evaporation and chemical reactions, respectively. The formation of soot is not modelled; the droplet temperature is much less than the mean gas temperature, and droplet heating by radiation can be ignored as discussed in Sazhin [4]. In the present model, the energy transport equation is solved using a one-fluid formulation while the species, velocity and continuity equations are solved using a two-fluid formulation. It is noted here that a two-way coupling between droplets and surrounding fluid is implemented. The droplets are not held fixed in position, but movement within the computational domain remains small due to low relative velocities and short evaporation (and simulation) time scales. In order to track the interface between liquid and gas, the Level Set method [5] has been combined with
the Ghost Fluid method [6] to account for pressure jumps in a mass-conserving approach. The introduction of an interface thickness and the smoothing of fluid properties (density, diffusivity, specific heat, conductivity and surface tension) in variable density flows are avoided because of the two-fluid formulation and the use of the Ghost Fluid method. The physical properties are defined in each cell according to the species concentrations and temperature, following mixing rules. The Level Set function is defined in every cell of the domain as the signed minimum distance of the cell centre to the interface. Negative values correspond to the liquid phase, positive values to the gas phase, and the interface is represented by the zero level of the function. Advection of the level set function, φ, is given by
$$\frac{\partial \phi}{\partial t} + u_i^I \frac{\partial \phi}{\partial x_i} = 0 \qquad (4)$$
where u_i^I is the local interface velocity of the ith component. Equation (4) is solved explicitly for the cells close to the interface (|φ| ≤ 1.5Δx, where Δx is the grid spacing and constant in the present work), using a second-order accurate method in time [7] and a first-order upwind method in space. The relatively low order is justified since (4) provides a first estimate that needs to be corrected by a reinitialisation procedure (see below). The interface velocity, u_i^I, is calculated by [8]

$$u_i^I = u_i^d - \frac{k}{\rho_l \, h_{fg}} \frac{\partial T}{\partial x_i} \qquad (5)$$

where u_i^d is the liquid phase velocity, k is the thermal conductivity, ρ_l is the liquid density, h_{fg} is the enthalpy of evaporation and T is the temperature, whose gradient is approximated by a fourth-order central difference scheme. To maintain φ as a distance function, it is reinitialised by calculating the new Level Set function in the entire domain by

$$\frac{\partial \phi}{\partial \tau} + \mathrm{sign}(\phi)\left(\left|\frac{\partial \phi}{\partial x_i}\right| - 1\right) = 0 \qquad (6)$$

where τ is the non-dimensionalised time. Equation (6) ensures that |∇φ| = 1. Integration in time is achieved by a third-order Runge-Kutta scheme with a time step, Δτ, equal to 0.8Δx. The error is controlled in a band close to the interface (|φ| ≤ 2.5Δx), i.e. a band of total width 5Δx. The derivatives of the level set function are approximated by a fifth-order WENO scheme, since WENO is non-oscillatory and preserves monotonicity [9]. Advancing the level set function can be improved by extending the interface velocity to every point in the domain. There are problems in which the speed of the interface changes rapidly or discontinuously as the front moves through the domain and the exact location of the interface determines the speed. In these cases, constructing a velocity from the position of the interface itself rather than from less accurate grid velocities improves subgrid resolution [10]. In addition, the extension of the velocities helps maintain an accurate level set representation and reduces the number of iterations in the reinitialisation procedure.
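As a concrete illustration of the reinitialisation step, the following hedged sketch iterates a one-dimensional version of Eq. (6) with simple first-order Godunov upwinding; the paper itself uses fifth-order WENO derivatives and a third-order Runge-Kutta scheme, and the function names and the periodic boundary treatment via np.roll are our own simplifications:

```python
import numpy as np

def reinitialise(phi, dx, n_steps=20):
    """Iterate d(phi)/d(tau) + sign(phi)(|phi_x| - 1) = 0 (cf. Eq. (6)) in
    pseudo-time with step 0.8*dx, driving |grad phi| towards 1."""
    dtau = 0.8 * dx
    s = np.sign(phi)                                  # fixed sign of the initial field
    for _ in range(n_steps):
        # one-sided differences (periodic wrap via np.roll, a sketch-only shortcut)
        dminus = (phi - np.roll(phi, 1)) / dx
        dplus = (np.roll(phi, -1) - phi) / dx
        # Godunov upwinding: pick the characteristics pointing away from the interface
        grad = np.where(
            s > 0,
            np.maximum(np.maximum(dminus, 0.0) ** 2, np.minimum(dplus, 0.0) ** 2),
            np.maximum(np.minimum(dminus, 0.0) ** 2, np.maximum(dplus, 0.0) ** 2),
        ) ** 0.5
        phi = phi - dtau * s * (grad - 1.0)
    return phi

# Example: a deliberately distorted distance function around an interface at x = 0.5
x = np.linspace(0.0, 1.0, 101)
phi0 = 3.0 * (x - 0.5)                 # |grad phi| = 3 instead of 1
phi = reinitialise(phi0, dx=x[1] - x[0])
```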
Fig. 1 Extension of the interface velocity scheme
The extension velocity is defined so that the level set function, which is initialised with the signed minimum distance from the interface to the centre of the cell, evolves accurately in the entire domain as the front moves. However, it is noted that some correction is still necessary and the reinitialisation procedure cannot be avoided if the minimum-distance property is to be well preserved. The advantage of the velocity extension algorithm is that it both reduces the need for reinitialisation (the number of timesteps in this procedure), which can cause some distortion of the interface, and avoids the bunching and stretching of neighbouring level set lines. Perturbations of the level set lines close to the interface lead to mass conservation issues, so the extension of the velocities also improves mass conservation. In this work, the extension is performed following the idea that each cell of the domain takes the interface velocity of the closest interface point. The main issue is to design an efficient algorithm to track the closest interface point for each cell; if not efficient, the procedure might increase the computational cost significantly. In order to optimise the algorithm, the value of the level set function at the cell is used to define a region of the domain where the closest points are tracked, as shown by the yellow region in Fig. 1. The region is defined with a width of 4Δx. Only the cells inside the region are tracked, making the algorithm more efficient. Then, for each interface point found in this region, the distance from the interface to the cell centre is computed and the velocity of the closest one is assigned to the cell (see the sketch below). Even though the extension of the velocities reduces the computational time of the reinitialisation procedure, it increases the total time of the whole Level Set procedure; in return, the method increases stability and improves accuracy. Therefore, in cases where the interface is not moving outside a certain band or region, the extension of the velocities can be done only in the neighbourhood of this region and not necessarily in the entire domain.
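A minimal sketch of this closest-point assignment follows; the brute-force search within the band stands in for the more refined region construction of Fig. 1, and the array layout and names are our own assumptions:

```python
import numpy as np

def extend_velocity(phi, u_interface, pts, interface_pts, dx, band=4):
    """Assign to each cell with |phi| <= band*dx the interface velocity of its
    closest interface point (brute-force nearest-neighbour search)."""
    u_ext = np.zeros(len(pts))
    near = np.abs(phi) <= band * dx              # restrict the search region
    for i in np.where(near)[0]:
        d2 = np.sum((interface_pts - pts[i]) ** 2, axis=1)
        u_ext[i] = u_interface[np.argmin(d2)]
    return u_ext

# Usage: pts are cell centres (N,3), interface_pts are points on the zero level
# set (M,3), and u_interface their normal velocities (M,).
```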
Fig. 2 Ghost Fluid method—scheme for ghost cells of variable p
depending on the phase which is being evaluated. These values are then used to calculate derivatives. Jump conditions, which must be known or defined, are added appropriately to the derivatives. The classical formulation of a second-order central difference derivative for pressure, for example, is

$$\frac{\partial p}{\partial x} = \frac{p_{i+1} - p_{i-1}}{x_{i+1} - x_{i-1}}. \qquad (7)$$

In the Ghost Fluid method, using the notation of Fig. 2, the derivative becomes

$$\frac{\partial p}{\partial x} = \frac{p_{i+1}^{+} + J_p - p_{i-1}}{x_{i+1} - x_{i-1}}. \qquad (8)$$

In this work, the Ghost Fluid method is used to account for pressure, velocity and species discontinuities at the interface, as the value of the jump is known (J_p = σκ and J_ρ = ρ_l − ρ_g). As shown in the next section, its use significantly reduces the spurious currents due to the implementation of surface tension. The interfacial curvature, κ, is calculated from the level set as

$$\kappa = \frac{\partial}{\partial x_i}\left(\frac{\partial \phi / \partial x_i}{\left|\partial \phi / \partial x_j\right|}\right). \qquad (9)$$
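The difference between Eqs. (7) and (8) is small enough to show directly; a hedged one-dimensional sketch (the function names and data layout are ours):

```python
def dpdx_central(p, x, i):
    """Classical second-order central difference, Eq. (7)."""
    return (p[i + 1] - p[i - 1]) / (x[i + 1] - x[i - 1])

def dpdx_ghost(p_other, p, x, i, jump):
    """Ghost Fluid variant, Eq. (8): the neighbouring value taken from the
    other phase, p_other[i+1], is corrected by the known pressure jump
    J_p = sigma * kappa before it enters the stencil."""
    return (p_other[i + 1] + jump - p[i - 1]) / (x[i + 1] - x[i - 1])
```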
The Level Set method is not inherently mass conserving [11]. Depending on the interface movement and because of the reinitialisation procedure, very high mass losses and gains may occur. In order to overcome this problem, a simple and computationally efficient method is implemented. The approach assumes that the mass loss or gain is uniformly distributed over the interface. At the end of each step, the mass of one of the phases defined as the controlling phase (liquid phase in this work) is compared to the theoretical mass. The total theoretical mass is determined from a global mass balance equation in the controlling phase. If the difference between the
theoretical mass and the actual mass is higher than a specified tolerance (0.1%), then a corrective interface velocity is imposed over the entire interface, causing an artificial movement of the interface that corrects the mass. It is noted here that any other forces that exist at the interface, such as shear stresses, are neglected. The focus of the present work is on the evaporation of the spray droplets, its modelling and its effects on combustion. The primary driving force for the evaporation process is the heat transfer to the droplet, and this is fully resolved. The droplets are relatively small and likely to remain close to spherical, so the droplet surface and thus the evaporation rates can be well approximated by the approach presented here. The models are implemented in an in-house low-Mach number CFD code (BOFFIN) [12]. The code is based on a fully implicit low Mach number finite volume formulation, second order in time and space, and is described in detail elsewhere [13]. We use a structured, Cartesian grid for the discretisation. Up to 512 cores have been used, but parallelisation efficiency drops to around 70% when compared to an ideal scaling law. The chemistry for kerosene combustion is approximated by a 4-step, 7-species (CO, CO2, H2, N2, H2O, O2 and C12H23) mechanism for hydrocarbon combustion [14].
3 Examples of Code Validation

Potential sources of errors in the model must be investigated in order to verify the accuracy and to quantify the uncertainties. The first test case compares the use of the less computationally expensive one-fluid formulation for the solution of the velocity transport equations with the two-fluid formulation. Other important sources of errors are the surface tension modelling and the spurious currents generated by this modelling. The interface area and curvature calculations can also induce significant deviations in the surface tension modelling and the heat transfer calculation. A number of different test cases are performed in order to quantify these errors, and three exemplary tests are described here. The fuel is assumed to be methanol at a temperature of T = 338 K with ρ_l = 750 kg/m³, σ = 1.85 · 10⁻² N/m, μ = 3.5 · 10⁻⁴ Pa s and h_fg = 1.097 MJ/kg.
3.1 One- and Two-Fluid Formulations

Initially, all the governing equations were implemented following a one-fluid formulation. However, it was observed that, with the one-fluid formulation for the velocities, it is necessary to smooth the properties across the interfaces in order to maintain model stability. The smoothing of the properties introduces unrealistic physics into the model, and the errors induced must be evaluated.
Fig. 3 Fixed radial temperature profile of a 2-droplet case
Fig. 4 Comparison of radial velocity profile induced by Stefan flow in one and two-fluid formulations
Figure 4 shows the velocity profiles for a test case where two 100 µm droplets are placed in a hot stagnant environment. The hot ambient is characterised by a fixed radial temperature gradient that decreases from 2500 K far from the droplets to 338 K at the droplet surface (see Fig. 3). The droplets are at constant temperature (338 K). Radial velocities are induced by evaporation of the droplets. The profile shown in Fig. 4 is along a line passing through the centres of the two droplets. The circles indicate the regions where the radial velocity is close to zero, i.e. where the droplets reside. Far from the droplets the velocity increases because of the density difference due to the imposed radial temperature gradient. In the one-fluid formulation, it can be seen that the velocity peaks are not symmetric and the peaks between the droplets are lower than in the outer region. Furthermore, because of the density smoothing around the droplet, the peak is lower than in the two-fluid formulation where the properties are treated as a steep jump. The two-fluid formulation captures the peak value well and represents the symmetry of the problem well. As the one-fluid formulation fails to accurately treat the immediate neighbourhood of the droplets, the two-fluid formulation is adopted in this work for the velocity and species equations. For the energy equation, because the relative jump in
enthalpy, J_h = h_{fg}/(c_p(T_{max} − T_{min})), is much lower, and in most of the cases studied the liquid phase remains at constant temperature, the one-fluid formulation is used for the energy equation.
3.2 Curvature and Surface Tension

Errors in the curvature calculation directly affect the pressure jump across the surface due to the surface tension modelling (J_p = σκ). The Laplace problem is simulated to investigate the effects of the surface tension modelling. This problem is chosen because the dynamical system is close to equilibrium; under these conditions it is easier to evaluate the numerical errors. A major consequence is the appearance of spurious or parasitic currents, observed in many calculations where the surface tension effects are dominant. These currents are generated in the vicinity of the interfaces, and their magnitudes can be evaluated by the simulation of static droplets surrounded by a different fluid in a zero-gravity environment. A 50 µm droplet is placed in a stationary environment and different combinations of surface tension, σ, and viscosity, μ, are investigated in three-dimensional cases. Figure 5 shows that these currents can generate vortical flows despite the absence of external forces. The currents depend on the surface tension coefficient, σ, and the viscosity, μ. Figure 5 (right) presents the maximum velocities generated by spurious currents. For the same droplet diameter, these values are correlated to the ratio σ/μ in accordance with [15]. For the entire range investigated, 0.1 < σ/μ < 100, the maximum velocities reached are lower than 0.1 m/s. Moreover, the model used in this work (present implementation) presents better results than published in [16], both for the pure
Fig. 5 Spurious currents generated by surface tension modelling (left) and maximum velocity induced by spurious currents for different σ/μ
Fig. 6 Error in surface area calculation as function of number of cells per radius
Level Set method (denoted LS) (σ/μ > 8) and for a mass-conserving Level Set (denoted MCLS) (σ/μ > 0.2). The errors in curvature increase when the number of cells across the droplet radius decreases. The errors are lower than 2% for droplets with more than 5 cells and are around 10% for droplets with 2 cells across the radius.
3.3 Surface Calculation

The interface surface that is used to calculate the heat transfer and the mass evaporation rate is computed in each cell using the level set function. Since surfaces are assumed to be planar in each cell, the errors strongly depend on the mesh resolution and the curvature of the surface. The tests performed evaluate the difference between the surface computed by the code and the theoretical surface for droplets with different numbers of cells across the radius. The tests show that even for droplets with only 2 cells across the radius, the errors in the total droplet surface calculation are lower than 7%. Figure 6 shows the effects of the number of cells on the droplet surface errors. Increasing the number of cells, the error decreases significantly, and for droplets with more than 5 cells across the radius, the errors in the total surface area calculation are lower than 1%.
4 Modelling of the Droplet Evaporation

The implementation described above allows the direct computation of the evaporation rates and the subsequent comparison with droplet evaporation models that are used in the literature to compute evaporation rates when droplets are not fully resolved
and treated in a Lagrangian framework as point sources of fuel. Here, we compare the DNS to two well-known models commonly used in Reynolds-averaged Navier-Stokes (RANS) methods and Large Eddy Simulations (LES). These models assume single-droplet evaporation in equilibrium, with constant droplet temperature and uniform properties of the gas phase. The first model, here defined as M1, is based on the energy equation and assumes that all the heat conducted to the droplet goes into evaporation [4], as the droplet temperature, T_drop, is at boiling temperature. The evaporation rate is calculated from

$$\dot m_{evap} = \frac{\pi \mu_g d \, Nu}{Pr} \ln(B_q + 1) \qquad (10)$$

where the heat transfer number, B_q, based on the gas temperature, T_\infty, is given by

$$B_q = \frac{C_{p_g}(T_\infty - T_{drop})}{h_{fg}} \qquad (11)$$

and the modified Nusselt number is

$$Nu = 2 + 0.552\, Re_k^{1/2}\, Pr^{1/3}\, F(B_q). \qquad (12)$$

A correction factor through the function F(B) is introduced to account for Stefan flow [17],

$$\frac{1}{F(B)} = (1 + B)^{0.7}\, \frac{\ln(1 + B)}{B}. \qquad (13)$$

The Reynolds number is based on the gas properties, the relative velocity and the droplet diameter,

$$Re_k = \frac{\rho_g U_{rel}\, d}{\mu_g}. \qquad (14)$$
π μg d Sh ln(Bk + 1) Sc
(15)
where the mass transfer number, Bk , is defined as Bk =
YS −Y∞ 1 −YS
(16)
and the modified Sherwood number is given by 1/2
Sh = 2 + 0.552Rek Sc1/3 F(Bk ).
(17)
The Prandtl and Schmidt numbers are assumed to be 0.7. Model M1 is appropriate when the evaporation is controlled by heat transfer (T_drop = T_sat) and M2 when it is controlled by mass transfer (T_drop ≤ T_sat) [19]. It is well known that the performance of both models is strongly dependent on the transport properties of the gas, which depend on the temperature and species composition. Mixing rules are often used to calculate reference values for temperature and vapour fraction, T_r and Y_r, in order to compute the transport properties. The most popular mixing rule is a linear combination of gas (T_G, Y_G) and vapour properties at the surface (T_S, Y_S):

$$T_r = T_S + A(T_G - T_S), \qquad (18)$$

$$Y_r = Y_S + A(Y_G - Y_S) \qquad (19)$$
where 0 ≤ A ≤ 1. Yuen and Chen [20] proposed the value A = 1/3, commonly known as the 1/3-rule. This weighting factor is most often used in modelling studies of droplet evaporation and combustion, but it should not be forgotten that Miller et al. [21] found better agreement for A = 0 in cases of high temperature environments. A further uncertainty is the determination of T_G and Y_G. In principle, these should be the conditions at infinity or at the flame front, if single droplet combustion occurs; in practice, however, their determination is not so straightforward, as will be demonstrated below.
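For reference, Eqs. (10)–(17) are compact enough to be stated as code. The following hedged sketch uses our own function signatures, with the Stefan-flow factor F(B) obtained by inverting Eq. (13); property evaluation via the mixing rules (18)–(19) is left to the caller:

```python
import numpy as np

def evap_rate_M1(d, mu_g, Pr, Re, cp_g, T_inf, T_drop, h_fg):
    """Model M1, Eqs. (10)-(13): heat-transfer-controlled evaporation rate."""
    Bq = cp_g * (T_inf - T_drop) / h_fg                       # Eq. (11)
    F = Bq / ((1.0 + Bq) ** 0.7 * np.log(1.0 + Bq))           # inverted Eq. (13)
    Nu = 2.0 + 0.552 * np.sqrt(Re) * Pr ** (1.0 / 3.0) * F    # Eq. (12)
    return np.pi * mu_g * d * Nu / Pr * np.log(Bq + 1.0)      # Eq. (10)

def evap_rate_M2(d, mu_g, Sc, Re, Y_s, Y_inf):
    """Model M2, Eqs. (15)-(17): mass-transfer-controlled evaporation rate."""
    Bk = (Y_s - Y_inf) / (1.0 - Y_s)                          # Eq. (16)
    F = Bk / ((1.0 + Bk) ** 0.7 * np.log(1.0 + Bk))           # inverted Eq. (13)
    Sh = 2.0 + 0.552 * np.sqrt(Re) * Sc ** (1.0 / 3.0) * F    # Eq. (17)
    return np.pi * mu_g * d * Sh / Sc * np.log(Bk + 1.0)      # Eq. (15)
```

For the stagnant case of Sect. 5.1 one would set Re = 0, recovering Nu = Sh = 2 as used for Fig. 9.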
5 Test Cases

The test cases performed for n-heptane and kerosene droplets are summarised in Tables 1 and 2. The droplet distribution inside the domain is shown in Fig. 7: the droplets are located in planes, regularly ordered and equally spaced. Only selected results that are representative of all evaporation rates are presented here.

Table 1 3D n-heptane test cases

Test                         u∞ [m/s]   T∞ [K]   Drops   d [µm]   Δx [µm]   p [atm]
1-drop Stagnant                 0         556       1      794       78        5
1-drop Inert Laminar            2        1350       1      100        8        5
64-drop Inert Laminar           2        1350      64      100        8        5
1-drop Inert Turbulent          2        1350       1      100        8        5
64-drop Inert Turbulent         2        1350      64      100        8        5
1-drop Reacting Laminar         2        1350       1      100        8        5
64-drop Reacting Laminar        2        1350      64      100        8        5
1-drop Reacting Turbulent       2        1350       1      100        8        5
64-drop Reacting Turbulent      2        1350      64      100        8        5
Table 2 3D kerosene test cases

Test                         u∞ [m/s]   T∞ [K]   Drops   d [µm]   Δx [µm]   p [atm]
1-drop Inert Laminar            1        1530       1       60        5       40
8-drop Inert Laminar            1        1530       8       60        5       40
27-drop Inert Laminar           1        1530      27       60        5       40
64-drop Inert Laminar           1        1530      64       60        5       40
1-drop Inert Turbulent          1        1530       1       60        5       40
8-drop Inert Turbulent          1        1530       8       60        5       40
27-drop Inert Turbulent         1        1530      27       60        5       40
64-drop Inert Turbulent         1        1530      64       60        5       40
1-drop Reacting Laminar         1        1530       1       60        5       40
8-drop Reacting Laminar         1        1530       8       60        5       40
27-drop Reacting Laminar        1        1530      27       60        5       40
64-drop Reacting Laminar        1        1530      64       60        5       40
1-drop Reacting Turbulent       1        1530       1       60        5       40
8-drop Reacting Turbulent       1        1530       8       60        5       40
27-drop Reacting Turbulent      1        1530      27       60        5       40
64-drop Reacting Turbulent      1        1530      64       60        5       40
Fig. 7 Slice of the domain with the droplet arrangements of the evaporation test cases for n-heptane and kerosene
The next subsection presents a single droplet case for n-heptane without combustion, since this is one of the very few experimental validation test cases available. All other results are for kerosene droplets in reactive environments.
5.1 Evaporation of Non-reacting n-Heptane Droplets

In order to evaluate the accuracy of the DNS, the results for single n-heptane droplets in inert environments are compared to available experimental data [22] and to other computational results [23]. The environment is stagnant and inert at a temperature of T = 556 K. The simulated domain is discretised by 128³ nodes with a grid spacing of Δx = 78 µm. Figure 8 shows that the initial droplet diameter decay is not well captured by the DNS. This is due to the assumption that the droplet is at saturation temperature. As the ambient temperature in this test case is relatively low (556 K), the transient period during which the droplet temperature increases until it reaches a constant value is significant. However, at a later stage, when the droplet temperature has reached its maximum value, the results of the DNS accurately represent the decay of the droplet diameter as measured in the experiments. In order to compare
Fig. 8 Comparison of the individual droplet diameter evolution in time for the 1-droplet stagnant case with other numerical [23] and experimental [22] works
Fig. 9 Comparison of the DNS results with models M1 and M2 for the stagnant test case using T∞ = T¯
the slopes, the DNS results have been plotted with a shift in the x-axis of 2 s/mm². The droplet diameter variation follows the well-known d²-law [24]. The results of the stagnant test case are also compared to the two models M1 and M2 (see Fig. 9) with Nu = Sh = 2. The main assumptions are single-droplet evaporation in equilibrium with constant droplet temperature and uniform properties of the gas phase. The definition of appropriate properties and of the gas-phase temperature is essential to guarantee the predictive capabilities of evaporation models. As in RANS and LES only mean (or filtered) values of the temperature and composition are available, the "ambient" temperature needed as input to B_q is taken to be the mean (or filtered) value T_∞ = T̄, and Y_g is used to define the gas properties. Here, it is assumed that a RANS or LES cell is of the size of the computational domain, and T̄ denotes the averaged DNS gas-phase temperature. Two different ways of evaluating the model properties are investigated: firstly the 1/3-rule, A = 1/3, and secondly A = 0, which is equivalent to the assumption that the gas is pure vapour at saturation conditions. The DNS results are within the values predicted by the two models for the entire computation. Initially, the DNS rates are between the values of M2 for A = 0 and A = 1/3. Then, the model M2 with A = 0 seems to accurately capture the DNS
rates, and in the last 50 ms of the simulations, the rates are between the values predicted by model M2 with A = 0 and M1 with A = 1/3. This comparison shows that no model with gas properties obtained through the mixing rule seems to capture the rates accurately during the entire droplet lifetime. The errors seen in Fig. 8 are likely to be less pronounced in reacting cases due to the much higher temperature of the surrounding gas. However, uncertainties in the modelling of A are likely to remain, as will be shown below.
5.2 Evaporation of Reacting Kerosene Droplets

The configuration for the kerosene test cases is similar to the n-heptane test cases. The droplet mass loading varies from 0.3 to 21.3 kg of fuel per m³, resulting in volume fractions from around 0.04% to 3.0%. Turbulent and laminar cases are simulated. In the turbulent case, the gas velocity is initialised with a turbulent field from a spectral isotropic DNS code. The Taylor microscale is 0.12 mm and the associated Kolmogorov microscale is 15 µm, equivalent to 3Δx. The computed integral length scale, ℓ, is relatively close to the Taylor length scale. The ratio between the integral scale and the droplet diameter is ℓ/d₀ = 2.5 and is of the same order as in the experiments of [25]. In order to sustain turbulence during the droplet lifetime, turbulence is superimposed on the convective inflow, which has a mean inflow velocity of 1 m/s in both the laminar and turbulent cases. The Reynolds number is around 100, based on the gas-phase properties and the Taylor length scale. The gas phase is initialised with the same conditions as the inflow, which consists of pure air at high temperature (1530 K) and high pressure (40 bar). The liquid fuel is assumed to be at 332.8 K with ρ_l = 770 kg/m³, σ = 2.0 · 10⁻² N/m, μ = 8.2 · 10⁻⁴ Pa s and h_fg = 310 kJ/kg. A temperature gradient from the gas temperature to the kerosene saturation temperature is initially imposed close to the droplets. This initialisation has a strong effect on the initial evaporation rates during an initial transient of approximately 50–100 µs and should be considered when interpreting the results of the DNS. Figure 10 shows snapshots of the temperature fields for droplet evaporation in turbulent reacting flows with 1, 8, 27 and 64 drops in the domain. It is evident that the droplet number density strongly affects the combustion process. Single droplet combustion occurs for larger droplet spacings; however, there is a clear transition to group combustion once the droplet spacing falls below 5 initial droplet diameters. For these rather dense sprays, group combustion can be observed with one leading flame zone upstream of the droplet arrays. The flame zone downstream of the arrays is less pronounced due to the relatively rich mixture in this region. The evaporation rates per droplet of kerosene are investigated here by comparing the results from the DNS with the two models, M1 and M2, using the correlations for the modified Nusselt and Sherwood numbers as proposed by [17]. Figure 11 demonstrates that the square of the average droplet diameter varies linearly with time, independent of the droplet loading. The droplet lifetime can now
Fig. 10 Instantaneous temperature profiles for the 1, 8, 27 and 64-droplet cases with turbulent inflow and combustion
Fig. 11 Individual droplet diameter squared as function of time
be derived from the well-known d²-law for all cases, independent of the existence of a flame and of the flame regime. For the 1 and 8-drop cases, the average droplet lifetime is around 1.25 ms, while for the most dense case with 64 droplets it is approximately 1.60 ms. In contrast, it does not seem that under these high temperature conditions the evaporation rates, and thus the droplet lifetimes, are much affected by the presence of turbulence, nor by the presence of combustion. The evaporation rates for the 8-drop cases are shown in Fig. 12; the transients in the laminar case are somewhat different due to differences in initialisation, but general trends and peak rates differ by a mere 20% after approximately 200 µs, when ignition of the gas-vapour mixture occurs. As in the n-heptane cases, the maximum evaporation rate of the 1-drop non-reactive laminar case is used as a reference value for normalisation. The high inflow temperature of 1530 K sustains high evaporation rates even in the inert cases, and this leads to the relatively small differences. It is also noticed that there are no significant variations in the evaporation rates of the different droplets, and the evaporation rate of a single droplet is independent of its position within the droplet array. This changes if the droplet density is increased and the droplet spacing falls below 10 droplet diameters. As observed in Fig. 12, turbulence does not seem to have significant effects on evaporation rates under any of the conditions considered here. No correlation between the evaporation rates and the subgrid kinetic energy can be observed (not shown here).
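The quoted lifetimes are consistent with the d²-law; as a quick check (our arithmetic from the numbers above, not values stated by the authors), the implied evaporation constants K are

$$d^2(t) = d_0^2 - K\,t, \qquad t_{life} = \frac{d_0^2}{K},$$

$$K_{8\text{-drop}} \approx \frac{(60\ \mu\mathrm{m})^2}{1.25\ \mathrm{ms}} \approx 2.9 \times 10^{-6}\ \mathrm{m^2/s}, \qquad K_{64\text{-drop}} \approx \frac{(60\ \mu\mathrm{m})^2}{1.60\ \mathrm{ms}} \approx 2.3 \times 10^{-6}\ \mathrm{m^2/s},$$

i.e. the densest spray evaporates roughly 20% more slowly per droplet.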
Fig. 12 Mean evaporation rate per droplet for the 8-droplet test cases
Fig. 13 Comparison of the DNS results with models M1 and M2 for the non-reactive turbulent 8-droplet test case (left), for the reactive turbulent 8-droplet test case (centre) and for the reactive turbulent 64-droplet test case (right) using T∞ = T¯
The DNS results of the 8-droplet and of the 64-droplet cases are compared with results obtained using models M1 and M2. Based on Fig. 10, the 8-droplet case may represent single droplet evaporation and combustion and good agreement with standard models can be expected, while the performance of the models for the 64-droplet case is less certain. The results in Fig. 13 show that neither model predicts the evaporation rate satisfactorily at any time if the conventional 1/3-rule is used. It is not surprising that the initial transients are not captured; however, even at later times, during the linear decay of d² with time, the evaporation rates are considerably underpredicted, by 30% to 50%. The model M1 gives slightly better results, but both models seem unsuited for modelling evaporation under engine-like conditions. However, both models perform much better when setting A = 0. Now, deviations from the DNS are between 1% and 22%, with M1 generally being closer to the DNS data. In addition, M2 does not seem to capture the decay of the evaporation with time appropriately for high droplet densities with combustion, as can be seen in Fig. 13 (right). In reacting flows, it can be argued that conditions at infinity should be approximated by conditions in the flame, since volume averaging does not capture local conditions correctly. Figure 14 compares the DNS with modelled evaporation rates, using the 1/3-rule and approximating T_g and Y_g from compositions averaged along the flame contours where the mixture fraction is close to stoichiometric. T_∞ is set to the
Fig. 14 Comparison of the DNS results with models M1 and M2 for the reactive turbulent 8-droplet test case (left) and for the reactive turbulent 64-droplet test case (right) using A = 1/3
Fig. 15 Comparison of the DNS results with model M1 using properties averaged within a shell around the droplet for the reactive turbulent 8- and 64-droplet test cases
maximum flame temperature. Results for M1 and M2 using filtered values are also included in the figure for better comparison. Due to the assumption of higher temperatures around the droplets, the modelled evaporation rates increase, but the evaporation rates are still underpredicted by 20% in the 8-droplet case. Due to the logarithmic dependence of the evaporation rate on B_q, the temperature adjustment does not significantly change the results of M1, and it does not really affect the results of M2. However, the change in reference temperature and composition changes the properties more significantly and leads to the observed differences in M1 and M2. The good agreement in the 64-droplet case is somewhat fortuitous, since models for single droplet combustion should overpredict group combustion, and the general trend of underprediction is thus counteracted by the existence of group evaporation and combustion. As for the n-heptane cases, we may conclude that in particular a "correct" choice of properties may improve the predictive capabilities of M1 and M2. A shell around the droplet (δ₀ ≈ Δx) is again defined, and the average temperature and composition within this shell are calculated. Figure 15 demonstrates that M1 with reference properties from this shell gives very good predictions independent of reaction and of the combustion regime (errors are around 2%). The transients are also much better captured, and this confirms that a "shell" model is appropriate for modelling droplet evaporation under the conditions considered here.
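Computing such shell-averaged reference properties from a level-set field is straightforward; the following hedged sketch (the exact masking of the shell δ₀ ≈ Δx is our assumption about the procedure) illustrates the idea:

```python
import numpy as np

def shell_average(field, phi, dx):
    """Average a gas-phase field over the thin shell 0 < phi <= dx just
    outside the droplet surface, where phi is the signed-distance level set
    (negative inside the liquid)."""
    shell = (phi > 0.0) & (phi <= dx)
    return field[shell].mean()

# Usage: T_ref = shell_average(T, phi, dx); Y_ref = shell_average(Y_fuel, phi, dx)
```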
6 Conclusions

The present work investigates the effects of droplet density and turbulence on the evaporation rates of n-heptane and kerosene droplet arrays in inert and reactive turbulent environments using Direct Numerical Simulations (DNS). The DNS results are compared against those obtained using two models commonly used in RANS- and LES-based procedures and, in the case of n-heptane, against experimental results. The models assume single-droplet evaporation with constant droplet temperature and uniform properties of the gas phase; in the first, the droplet temperature is assumed to be equal to the boiling temperature and the evaporation rate is related to the heat transfer, while in the second the evaporation rate is related to the mass transfer rate. The models make use of Nusselt or Sherwood number correlations to relate the local physical conditions to the relevant heat and mass transfer coefficients. It is shown that higher droplet densities generate group combustion instead of single droplet combustion. When single droplet combustion occurs, the evaporation rates are independent of the droplet loading. However, when combustion occurs as a group phenomenon, a reduction in the evaporation rates is observed. No evident correlation is observed between the evaporation rates and the subgrid kinetic energy, and the effects of turbulence are mostly related to vapour dispersion far from the droplet surface. The evaporation rates of the 1, 8 and 64-droplet cases are compared to two models commonly used in RANS and LES computations. The results show that model M1 is more accurate than M2 for the cases studied here. This is more evident in the kerosene test cases, where the results of the models differ more. The estimation of the gas properties as properties of pure vapour at saturation temperature yields much better results for both fuels; it is noted, however, that the evaporation rates at intermediate times are not well reproduced. The use of mean properties obtained from a shell around the droplet as an estimate for the modelled gas properties allows the models to capture the initial transients and provides the most accurate predictions of the evaporation rates at later times. However, it is not yet evident how these values can be computed in RANS and LES.
References

1. R. W. Bilger, Combust. Flame 158 (2011) 191–202.
2. R. I. Imaoka, W. A. Sirignano, Proc. Comb. Inst. 30 (2005) 1981–1989.
3. K. K. Kuo, Principles of Combustion, John Wiley and Sons, Hoboken, New Jersey 1986.
4. S. S. Sazhin, Prog. Energy Combust. Sci. 32 (2006) 162–214.
5. S. Osher, J. A. Sethian, J. Comp. Phys. 79 (1988) 12–49.
6. R. Fedkiw, T. Aslam, B. Merriman, S. Osher, J. Comp. Phys. 152 (1999) 457–492.
7. N. N. Yanenko, The Method of Fractional Steps, Springer, Berlin 1971.
8. R. P. Selvam, L. Lin, R. Ponnappan, Int. J. Heat and Mass Transfer 49 (2006) 4265–4278.
9. G. S. Jiang, D. Peng, J. Sci. Comput. 21 (2000) 2126–2143.
10. D. Adalsteinsson, J. A. Sethian, J. Comp. Phys. 148 (1999) 2–22.
11. T. Menard, S. Tanguy, A. Berlemont, Int. J. Multiphase Flow 33 (2006) 510–524.
12. W. P. Jones, F. di Mare, A. J. Marquis, LES-BOFFIN: Users Guide, Technical Memorandum, Imperial College, London (2002).
13. M. R. G. Zoby, S. Navarro-Martinez, A. Kronenburg, Proc. of the 4th European Combustion Meeting.
14. W. P. Jones, R. P. Lindstedt, Combustion and Flame 73 (1988) 233–249.
15. Y. Renardy, M. Renardy, J. Comp. Phys. 183 (2002) 400–421.
16. E. R. A. Coyajee, M. Herrmann, B. Boersma, Proc. Summer Program, Center for Turbulence Research.
17. B. Abramzon, W. A. Sirignano, Int. J. Heat and Mass Transfer 32 (1989) 1605–1618.
18. S. R. Turns, An Introduction to Combustion: Concepts and Applications, McGraw-Hill, Inc., New York 1996.
19. T. Lederlin, H. Pitsch, Annual Research Briefs 9, Center for Turbulence Research, Stanford (2008) 479–490.
20. M. C. Yuen, L. W. Chen, Comb. Sci. Tech. 14 (1976) 147–154.
21. R. S. Miller, K. Harstad, J. Bellan, Int. J. Multiphase Flow 24 (1998) 1025–1055.
22. H. Nomura, Y. Ujiie, H. J. Rath, J. Sato, M. Kono, Proc. Comb. Inst. 26 (1996) 1267–1273.
23. M. Birouk, M. M. A. Al-Sood, Int. J. Thermal Sc. 49 (2010) 264–271.
24. G. A. E. Godsave, Proc. Comb. Inst. 4 (1953) 818–830.
25. M. Birouk, C. Chauveau, I. Gokalp, Int. J. Heat Mass Transfer 44 (2000) 4593–4603.
Analysis of the Effects of Wall Boundary Conditions and Detailed Kinetics on the Simulation of a Gas Turbine Model Combustor Under Very Lean Conditions

F. Rebosio, A. Widenhorn, B. Noll, and M. Aigner
Abstract This numerical study presents the simulation of the DLR gas turbine model combustor operated at very lean conditions, near the lean extinction limit. The results have been validated against experimental data: while the hybrid LES-RANS model adopted for the turbulence closure proved to be very well suited for such complex simulations, the combustion results turned out to depend on the chemical kinetic mechanism adopted for the finite rate chemistry module. The latter was used in combination with the eddy dissipation model, and it was possible to show that the flame root zone is mainly controlled by chemical kinetic effects.
1 Introduction

The development of new gas turbine combustor concepts is intrinsically related to the necessity of reducing pollutants [5]. In the past years, great interest has been shown in technologies which enable a drastic reduction of NOx emissions. The use of lean combustion processes in swirled flows is a winning strategy for this purpose [11]. Unfortunately, this solution presents some challenges, above all the possible onset of strong pressure fluctuations in the combustion chamber [4, 14]. These can lead to very high thermal loads, can eventually lock with the eigenfrequencies of the combustion chamber and can finally lead to serious structural damage. A great research effort has been dedicated to control techniques for combustion oscillations; both active and passive control methods have been developed [1]. Both methods require a deep understanding of the dynamics leading to and sustaining the instabilities, and finally require major modifications to the combustor [13, 15].
F. Rebosio · A. Widenhorn · B. Noll · M. Aigner Institut für Verbrennungstechnik der Luft- und Raumfahrt, Universität Stuttgart, Pfaffenwaldring 38-40, 70569 Stuttgart, Germany, e-mail:
[email protected] W.E. Nagel et al. (eds.), High Performance Computing in Science and Engineering ’11, DOI 10.1007/978-3-642-23869-7 18, © Springer-Verlag Berlin Heidelberg 2012
Although many mechanisms causing the onset of unstable conditions have been identified [1], constant research in this field is still required. Moreover, when pushing the lean premixed swirled technology to its very limit, i.e. burning at very lean conditions, a new band of problems arises, ranging from local flame extinction events to lean blow off (LBO). The dynamic behavior of a combustion process close to lean blow off is very difficult to analyze in complex, technical configurations. Shanbhogue et al. [17] presented a review of the blow off phenomenon for a simplified configuration, i.e. a bluff-body stabilized premixed flame. Their publication points out clearly that flames close to the lean blow off limit are already highly unsteady. Furthermore, the LBO process is divided into two phases: first the flame experiences local extinctions due to very high strain rates; the holes in the flame front can grow or eventually reconnect into a continuous flame. This phase is not yet LBO, and the flame can remain in this state enduringly. The complete extinction of the flame happens only in a second phase, in which the flame experiences a disruption of the wake/recirculation zone and finally blows off. Eggenspieler et al. [3] performed an accurate numerical study of a swirled premixed flame stabilized on a bluff body. They emphasize the role of the chemistry-turbulence interaction: LBO first takes place in a so-called broken reaction zone regime, i.e. when the flame front configuration is completely disrupted due to the penetration of turbulent structures into the reaction zone, and flame quenching becomes relevant. A study by Stöhr et al. [19] experimentally analyzed the onset and development of LBO in the model combustor adopted for the present numerical study. Their findings made it possible to correlate the LBO onset with a prolonged extinction of the flame root; moreover, the authors were able to identify the crucial role of the flame root in the flame stabilization process. The complete process is still not entirely disclosed, and numerical simulation can become a very effective tool to address some of the unresolved questions. Indeed, numerical simulation tools are becoming more important not just for simple studies but also for the predictive analysis of complex configurations. The main problem to assess when using computational fluid dynamics tools for predictive purposes is still the reliability of the models adopted for turbulence and combustion. Several papers published in previous years on the numerical modeling of the gas film nozzle model combustor presented in the following were able to prove the capability of modern hybrid LES-RANS models to predict complex turbulent flows, not just in their time-averaged configurations but also in the Root Mean Square (RMS) quantities [24]. Moreover, results from [23] demonstrated that combustion phenomena could also be reproduced properly, and in [16] a first attempt to simulate the conditions next to LBO was successfully accomplished. In particular, Rebosio et al. [16] showed that the precursory phenomena of LBO, i.e. local extinction and re-ignition, could be properly reproduced in the computations. It is important to point out that such complex studies as described in [16, 23, 24] are possible only when adopting strong parallelization techniques and having very powerful supercomputers at one's disposal. Even then, a simulation can take up to months.
2 Physical Model

2.1 Conservation Equations

The initial set of equations for the numerical simulation of reacting flows includes the continuity, momentum, energy, turbulence and species equations. In this paper the compressible formulation is used. The equations are given by:
$$\frac{\partial Q}{\partial t} + \frac{\partial (F - F_v)}{\partial x} + \frac{\partial (G - G_v)}{\partial y} + \frac{\partial (H - H_v)}{\partial z} = S. \qquad (1)$$
The conservative variable vector Q consists of the density, the velocity components, the total specific energy, the turbulent kinetic energy, the specific dissipation rate and the species mass fractions and is defined as:

$$Q = \left[\bar{\rho},\ \bar{\rho}\tilde{u},\ \bar{\rho}\tilde{v},\ \bar{\rho}\tilde{w},\ \bar{\rho}\tilde{E},\ \bar{\rho}k,\ \bar{\rho}\omega,\ \bar{\rho}\tilde{Y}_i\right]^T, \quad i = 1, 2, \ldots, N_k - 1. \qquad (2)$$
Here, Favre-averaged quantities are used. F, G and H are the inviscid and F_v, G_v and H_v the viscous fluxes in x-, y- and z-direction, respectively. The vector S in (1) contains the source terms and is defined as:

$$S = \left[0,\ S_u,\ S_v,\ S_w,\ S_E,\ S_k,\ S_\omega,\ S_{Y_i}\right]^T, \quad i = 1, 2, \ldots, N_k - 1. \qquad (3)$$
2.2 Turbulence Modeling

For the closure of the above system of partial differential equations for turbulent flows, the Boussinesq hypothesis is used. The required values of the eddy viscosity can be obtained from appropriate turbulence models. In the present work the Scale Adaptive Simulation (SAS) model in the formulation given by Menter and Egorov [9, 10] is used. The model is based on a particular formulation of the SST model [8], to which the von Kármán length scale is added within the turbulence length scale equation, so that the model is able to change dynamically from a URANS to an LES resolution scheme. The model equations are presented in [16].
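To illustrate the key ingredient of the SAS approach, the following sketch evaluates the von Kármán length scale, generally defined as L_vK = κ|U′|/|U″|, from a one-dimensional velocity profile. This is a minimal illustration of the general definition only; the function name and the test profile are invented for the example and do not reflect the actual SAS implementation in CFX.

```python
import numpy as np

KAPPA = 0.41  # von Karman constant

def von_karman_length_scale(u, y):
    """Von Karman length scale L_vK = kappa * |du/dy| / |d2u/dy2|,
    evaluated pointwise from a 1-D velocity profile u(y)."""
    dudy = np.gradient(u, y)
    d2udy2 = np.gradient(dudy, y)
    eps = 1e-30  # guard against division by zero where curvature vanishes
    return KAPPA * np.abs(dudy) / np.maximum(np.abs(d2udy2), eps)

# Example: hyperbolic-tangent shear layer, a common model profile
y = np.linspace(-1.0, 1.0, 201)
u = np.tanh(5.0 * y)
L_vk = von_karman_length_scale(u, y)
print(f"min L_vK = {L_vk.min():.3e} at y = {y[np.argmin(L_vk)]:+.2f}")
```

Because L_vK shrinks where the velocity profile is strongly curved, a model sensitized to it can locally reduce the eddy viscosity and resolve turbulent structures instead of damping them.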
2.3 Combustion Modeling

To take combustion processes into account, the respective chemical production rates of all species have to be modeled. The required values for this quantity can be obtained from appropriate combustion models. In the present work the combined Eddy
Dissipation/Finite Rate Chemistry combustion model is used. The Eddy Dissipation Model (EDM) relies on the assumption that reactions take place only when reactants are mixed at the molecular level and that the reaction rate is controlled by the mixing time rather than by the chemical kinetic time [6]. Thus the rate of consumption of the fuel or oxidant is proportional to the turbulent kinetic energy and the turbulent length scale, or the turbulent dissipation rate. The assumption made for the EDM no longer holds true when the chemical time scale becomes relevant, i.e. comparable to the mixing time. However, the limits of application of the EDM can be extended by combining it with the Finite Rate Chemistry (FRC) model, in which the reaction rates are calculated by means of a chemical kinetic mechanism. For the present simulation a three step chemical kinetics mechanism is used for methane combustion.
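As a rough illustration of how such a combined model limits the fuel consumption rate, the sketch below takes the minimum of a mixing-controlled EDM rate and a kinetically controlled Arrhenius rate. The constants and the blending by a simple minimum are illustrative assumptions only, not the exact CFX formulation.

```python
import numpy as np

def edm_rate(rho, k, eps, Y_fuel, Y_ox, s, A=4.0):
    """Mixing-controlled rate ~ rho * (eps/k) * min(Y_fuel, Y_ox/s)."""
    return A * rho * (eps / k) * min(Y_fuel, Y_ox / s)

def frc_rate(rho, T, Y_fuel, Y_ox, A_k=2.0e11, Ea=2.0e5, R=8.314):
    """Kinetically controlled Arrhenius rate (illustrative constants)."""
    return A_k * rho**2 * Y_fuel * Y_ox * np.exp(-Ea / (R * T))

def combined_rate(rho, T, k, eps, Y_fuel, Y_ox, s):
    # The slower of the two processes limits the overall conversion.
    return min(edm_rate(rho, k, eps, Y_fuel, Y_ox, s),
               frc_rate(rho, T, Y_fuel, Y_ox))

# Example: near-extinction conditions, where chemistry becomes rate limiting
# (s = 4.0 is the stoichiometric O2/CH4 mass ratio of methane)
print(combined_rate(rho=0.5, T=1100.0, k=10.0, eps=500.0,
                    Y_fuel=0.02, Y_ox=0.2, s=4.0))
```

At low temperatures the Arrhenius branch dominates the limiter, which is exactly the regime relevant for the near-LBO behavior discussed in this paper.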
3 Numerical Method

The simulations were performed using the commercial software package ANSYS CFX-11.0. The fully implicit solver is based on a finite volume formulation for structured and unstructured grids. A multigrid strategy is applied to solve the linear set of coupled equations. For the spatial discretization a second order scheme is used, except for the species and energy equations; for the latter, a high resolution scheme, which is essentially second order accurate and bounded, is exploited. For the time discretization an implicit second order time differencing scheme is used. SAS-SST was used to model turbulence and EDM-FRC provided the combustion model. A three step chemical kinetics mechanism with a CO-equation subset has been used to describe the flame chemistry [12], both in its original formulation and with slightly modified constants, as explained in Sect. 4.2. The parallelization in CFX is based on the Single Program Multiple Data (SPMD) concept. The numerical domain is decomposed into separate tasks, which can be executed separately. The communication between the processes is realized using the Message Passing Interface (MPI). The partitioning process is fully automated and the memory usage is equally distributed among all processors.
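The SPMD idea can be sketched with a toy ghost-cell exchange on a one-dimensionally decomposed domain using mpi4py; this illustrates the general concept only and says nothing about CFX's internal implementation, whose partitioning and communication layers are proprietary.

```python
# Run with e.g.: mpiexec -n 4 python spmd_demo.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_global = 1024                        # total number of cells
n_local = n_global // size             # equal partition -> balanced memory
u = np.full(n_local + 2, float(rank))  # local cells plus two ghost cells

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

# Exchange ghost-cell values with the neighboring partitions
comm.Sendrecv(sendbuf=u[1:2], dest=left, recvbuf=u[-1:], source=right)
comm.Sendrecv(sendbuf=u[-2:-1], dest=right, recvbuf=u[0:1], source=left)

print(f"rank {rank}: ghosts = ({u[0]}, {u[-1]})")
```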
4 Results and Discussion

4.1 Test Case Description

The gas turbine test combustor presented in the following is a modified version of an aero engine gas turbine combustor and has been investigated in many experimental works [2, 7, 18, 20, 22], providing a large experimental database, which has been used to validate the present results as well.
Fig. 1 Schematic drawing of the gas film nozzle model combustor

Table 1 Flame parameters
  Air [g/min]   CH4 [g/min]   Pth [kW]   φ_glob [–]   f_glob [–]
  281           9.0           7.4        0.55         0.031
A sketch of the model combustor is shown in Fig. 1. Air is introduced into the combustion chamber from a plenum through two concentric co-rotating radial swirlers, with 8 and 12 channels for the central and the annular nozzles, respectively. The fuel inlet is positioned between the two swirled air flows and consists of a non-swirling annular nozzle with 72 channels. The combustion chamber has a square cross section (85 mm × 85 mm) and is 114 mm long. The exhaust gas exit is formed by a conical top plate with a central exhaust tube. The exit plane of the outer air nozzle was taken as the reference height in the combustion chamber and corresponds to x = 0. The flame analyzed was a very lean methane-air flame near blow off conditions. The thermal power was 7.4 kW, the overall mixture fraction was 0.031 and the global equivalence ratio φ was 0.55. The extinction limit equivalence ratio for this flame has been measured and is equal to 0.53 [22]. Table 1 summarizes the parameters of the flame.
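The global flame parameters in Table 1 are mutually consistent, which can be verified with a few lines of arithmetic. The stoichiometric fuel-air mass ratio and lower heating value of methane used below are standard combustion data; the rest follows from the table (a minimal consistency sketch, not part of the original study):

```python
# Consistency check of the global flame parameters in Table 1
m_air = 281.0   # g/min
m_fuel = 9.0    # g/min
FAR_stoich = 1.0 / 17.2  # stoichiometric fuel-air mass ratio of methane (approx.)

f_glob = m_fuel / (m_fuel + m_air)          # global mixture fraction
phi_glob = (m_fuel / m_air) / FAR_stoich    # global equivalence ratio
P_th = m_fuel / 60.0 / 1000.0 * 50.0e3      # thermal power [kW], LHV(CH4) ~ 50 MJ/kg

print(f"f_glob = {f_glob:.3f}")    # ~0.031, as in Table 1
print(f"phi    = {phi_glob:.2f}")  # ~0.55,  as in Table 1
print(f"P_th   = {P_th:.1f} kW")   # ~7.5 kW, close to the quoted 7.4 kW
```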
4.2 Numerical Setup

The computational grid, Fig. 2, consists of 1.91 million grid points. For the injector system and the combustion chamber an unstructured hexahedral grid with 1.6 million grid points was created. In the regions of potential turbulence generation and large velocity gradients a fine mesh was used. Furthermore, the growth of adjacent cells was limited to 10% in these zones. For the plenum and the air supply to the swirlers an unstructured tetrahedral mesh was applied. It consists of 1.79 million tetrahedral elements and 0.31 million grid points. Figure 3a shows the employed domain with the air and fuel inlets, as well as the outlet, highlighted. Mass flow boundary conditions were set at both inlets, realizing the global equivalence ratio of 0.55, see Table 1.
Fig. 2 Computational grid
Fig. 3 Computational domain
Table 2 Main features of the performed test cases
  Testcase   Feature                Wall temperature   Combustion modeling
  Case A     Baseline               1050 K             EDM+FRC, Nicol's chemistry model
  Case B     Low wall temperature   800 K              EDM+FRC, Nicol's chemistry model
  Case C     Modified chemistry     800 K              EDM+FRC, modified chemistry model
Air and fuel entered the combustion chamber at 330 K. At the outlet, a static pressure boundary condition was set. The plenum wall temperature was 330 K and the swirler walls were adiabatic. At first, the temperature of the combustor walls was set to 1050 K, an estimated value consistent with measured values from other operating points. Further simulations were performed with a lower temperature prescribed at the combustion chamber wall, in order to investigate the influence of this parameter. Table 2 summarizes all performed test cases. As already reported in [16], the results for case A showed that the overall simulated heat release was too high compared to the available experimental results [2, 7, 21, 22]. In Sect. 4.3 it is shown that the combustion process in case A starts too close to the burner outlet, causing a higher temperature next to the flame holder compared to the temperature reported in the experiments. Therefore, two possible causes for this behavior have been addressed in cases B and C, respectively: a too high temperature prescribed at the combustion chamber walls, and an overprediction of the chemical reaction rate by the combustion model. Furthermore, Fig. 3b shows the locations at which profiles of the most relevant quantities have been extracted, according to the available experimental results.
4.3 Results

The typical structures of a confined swirling flow were properly predicted already in the baseline simulation and are shown in Fig. 4. In particular, the black line denotes zero axial velocity and enables a clear identification of the large central recirculation zone (CRZ) and of the outer recirculation zones (ORZ). The CRZ enhances the recirculation of the hot products and plays a fundamental role in the flame stabilization mechanism. The dynamics, which are characteristic of the combustion chamber and of the operating point, Fig. 5, were well reproduced for all three test cases. The results were confirmed by the frequency analysis of velocity and pressure signals at relevant points in the chamber. As an example for case A, Fig. 5a shows the magnitude spectrum of the pressure signal at the chamber wall; a frequency peak at 290 Hz is clearly distinguishable, corresponding to the dynamics of the low frequency oscillation, which was documented in the experiments as well [21].
Fig. 4 Baseline case: swirling flow characteristics, contour plot of time-averaged axial velocity
Fig. 5 Baseline case: dynamic behavior
Figure 5b shows the precessing vortex core (PVC) structure for the simulation of case A. The frequency associated with the PVC was found to be 446 Hz, which differs slightly from the frequency of 510 Hz evaluated by Stöhr et al. [19]. Moreover, in Fig. 5a a lower broadband frequency around 150 Hz is clearly visible. This frequency can be associated with the phenomena occurring at very lean conditions near the extinction limit, i.e. with the local extinctions and re-ignitions in the flame front. The highly unsteady character of the flame is highlighted in Fig. 6 for case A. Here the instantaneous temperature field is overlaid with the flow vectors.
Fig. 6 Baseline case. Interaction between chemistry and turbulence: Instantaneous contour plots of temperature and velocity vectors
Fig. 7 Temperature profiles. The blue dots are the experimental results, the green lines the numerical simulation, case A
It can be seen that the vortices arising from the roll-up of the inner shear layer entrap the cold flow coming from the inlet. The vortices are transported downstream, causing local ignition phenomena downstream of the flame root. Moreover, the interaction of the vortices with the flame front disrupts the coherence of the flame sheet, breaking it locally. Figure 7 compares the temperature profiles extracted along the chamber for the baseline simulation with the experimental data. As examples, the profiles at 10 mm and at 40 mm above the inlet are reported; the differences between experiment and numerical data were observed all along the combustion chamber and reveal an overestimation of the heat release in the numerical simulation near the burner outlet.
Fig. 8 Chemistry relevant quantities. The blue dots are the experimental results, the green lines the numerical simulation, case A
Fig. 9 Temperature profiles extracted at h = 90 mm. The blue dots are the experimental results, the green lines the simulation results for case A and the black lines those for case B
On the other hand, the profiles of mixture fraction, Fig. 8a, are in good agreement with the experiments. This indicates that the turbulent mixing processes are properly captured. Moreover, the profiles of the mass fraction of methane, Fig. 8b, show that the flame burns too early compared to the experiments. Therefore, as already anticipated in Sect. 4.2, cases B and C were performed. Case B revealed a flow structure identical to case A. The lower temperature prescribed at the combustion chamber walls mainly influenced the outlet temperature of the mixture, as can be seen in Fig. 9. The mass fraction of methane as well as the temperature at the flame root were not significantly influenced by the new boundary condition. A further analysis confirmed that the flame root behavior is in fact largely controlled by the finite rate chemistry module, as can be seen in Fig. 10. Thus, the mechanism of Nicol et al. [12] was tuned by means of the laminar flame speed. The results of the simulation with the modified reaction mechanism are reported in Figs. 11 and 12. The time-averaged temperature contours of the new simulation, Fig. 11b, exhibit a lower overall temperature compared to the baseline test case, Fig. 11a; the outer recirculation zone is colder in case C and the temperature at the outlet is also lower, although this effect is mainly introduced by the lower temperature imposed at the combustion chamber walls, as already evidenced for case B.
Fig. 10 Case B. Analysis of the active chemistry model. The red zones are the EDM controlled zones, while the blue zones are FRC controlled
Fig. 11 Contour plot of time-averaged temperature
The modification of the chemical kinetics in the FRC model results in a different time-averaged flame shape, Fig. 12b; the V-form of the flame is more compact and located mainly in the inner shear layer, and the averaged flame position extends deeper into the combustion chamber. The change in the averaged flame configuration barely influenced the flow field. As an example, the profiles of time-averaged and RMS values of the velocity components at 20 mm above the inlet of the combustion chamber are reported in Fig. 13, but the same behavior was identified for all extracted profiles.
Fig. 12 Time-averaged reaction rate for CH4 equation
The results of case C (red) reproduce the experimental data (blue dots) very well and, compared to those of case A (green lines), capture the shape and intensity of the ORZ slightly better. However, the most significant improvement from case A to case C concerns the combustion relevant quantities: Fig. 14a reports the extracted profiles of the mass fraction of methane for the two test cases in comparison to the experiments. The combustion process is clearly better modeled with the modified chemical kinetic mechanism. Yet, the temperature profiles in the center of the chamber are too high, Fig. 14b, and this trend is visible for all the extracted profiles, Fig. 3b, at different heights in the combustion chamber. In any case, the results from the simulation of case C show the crucial role played by detailed chemistry. The model adopted for case C is a global reaction model with 5 species and 3 global reactions, i.e. it represents a very coarse description of the chemical kinetics. The available results indicate the need for a more detailed description of the chemical kinetics.
4.4 Computational Time

The simulations were run on the high performance cluster HP XC4000 at the Steinbuch Center for Computing (SCC) in Karlsruhe. Each simulation was run on 20 AMD Opteron processors. The baseline test case, case A, was initialized with a steady RANS simulation. Afterwards, two whole combustion chamber residence times, estimated from the time-mean velocity, were simulated to obtain a start-up solution, from which statistical averaging was performed.
Fig. 13 Profiles of velocity components extracted 20 mm above the combustion chamber inlet. The blue dots are the experimental data, the green lines the simulation, case A, and the red lines case C
Averaged quantities were obtained over 4 combustion chamber residence times, which correspond to 0.7 seconds of real time and a CPU expense of 4.23e+04 hours. Cases B and C were initialized with the converged solution of case A, and for both simulations statistics were collected over 2 residence times. In case B, 0.42 seconds were simulated altogether, corresponding to a computational expense of 2.9e+04 CPU hours. For case C, the total simulated time was 0.46 s, for which 4.1e+04 CPU hours were needed.

Acknowledgments. The authors would like to thank the Steinbuch Center for Computing (SCC) in Karlsruhe for the computational time put at their disposal and for the technical support.
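For orientation, the CPU expense quoted above translates into wall-clock time as follows (a back-of-the-envelope sketch based only on the numbers in this section; the 20-processor job size is taken from the text):

```python
# Rough wall-clock estimate from the quoted CPU expense of case A
cpu_hours = 4.23e4
n_cores = 20
wall_hours = cpu_hours / n_cores   # assumes near-perfect load balancing
wall_days = wall_hours / 24.0
print(f"case A: ~{wall_hours:.0f} h wall clock = ~{wall_days:.0f} days")
# -> ~2115 h, i.e. roughly three months, consistent with the statement in
#    the introduction that such a simulation can take up to months
```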
Fig. 14 Profiles of combustion relevant quantities extracted 20 mm above the combustion chamber inlet. The blue dots are the experimental data, the green lines the simulation, case A, and the red lines case C
References

1. S. Candel. Combustion dynamics and control: Progress and challenges. In Proceedings of the Combustion Institute, volume 29, pages 1–28, 2002.
2. X. R. Duan, P. Weigand, W. Meier, O. Keck, W. Stricker, M. Aigner, and B. Lehmann. Experimental investigations and laser based validation measurements in a gas turbine model combustor. Progress in Computational Fluid Dynamics, 4(3–5):175–182, 2004.
3. G. Eggenspieler and S. Menon. Combustion and emission modelling near lean blow-out in a gas turbine engine. Progress in Computational Fluid Dynamics, 5(6):281–297, 2005.
4. J. J. Keller. Thermoacoustic oscillations in combustion chambers of gas turbines. AIAA Journal, 33(12):2280–2287, December 1995.
5. A. H. Lefebvre. Gas Turbine Combustion. Taylor & Francis, 2nd edition, 1999.
6. B. F. Magnussen. The eddy dissipation concept: a bridge between science and technology. In ECCOMAS Thematic Conference on Computational Combustion, Lisbon (Portugal), 24 June 2005.
7. W. Meier, X. R. Duan, and P. Weigand. Investigation of swirl flames in a gas turbine model combustor. II. Turbulence-chemistry interaction. Combustion and Flame, 144:225–236, 2006.
8. F. R. Menter. Two equation eddy viscosity turbulence models for engineering applications. AIAA Journal, 32(8):1598–1605, 1994.
9. F. R. Menter and Y. Egorov. A scale-adaptive simulation model using two-equation models. In 43rd AIAA Aerospace Sciences Meeting and Exhibit, 10 January 2005.
10. F. R. Menter, M. Kuntz, and R. Bender. A scale adaptive simulation model for turbulent flow prediction. AIAA Paper, (2003-0767), 2003.
11. H. C. Mongia. GE aviation low emission combustion technology evolution. In AeroTech Congress and Exhibition, Los Angeles, CA, number 2007-01-3924 in SAE Technical Paper Series. SAE International, September 2007.
12. D. G. Nicol, P. C. Malte, A. J. Hamer, R. J. Roby, and R. C. Steele. Development of a Five-Step Global Methane Oxidation-NO Formation Mechanism for Lean-Premixed Gas Turbine Combustion. Journal of Engineering for Gas Turbines and Power, 121:272–280, April 1999.
13. C. O. Paschereit and E. Gutmark. Combustion instabilities and emission control by pulsating fuel injection. Journal of Turbomachinery, 130, January 2008.
14. C. O. Paschereit, E. Gutmark, and W. Weisenstein. Coherent structures in swirling flows and their role in acoustic combustion control. Physics of Fluids, 11(9):2667–2678, September 1999.
15. C. O. Paschereit and E. J. Gutmark. Control of high-frequency thermoacoustic pulsations by distributed vortex generators. AIAA Journal, 44(3):550–557, March 2006.
16. F. Rebosio, A. Widenhorn, B. Noll, and M. Aigner. Numerical simulation of a gas turbine model combustor operated near the lean extinction limit. In Proceedings of ASME Turbo Expo 2010, number GT2010-22751, 2010.
17. S. J. Shanbhogue, S. Husain, and T. Lieuwen. Lean blowoff of bluff body stabilized flames: Scaling and dynamics. Progress in Energy and Combustion Science, 35:98–120, 2009.
18. M. Stöhr and W. Meier. Coherent structures in partially premixed swirling flames. In 12th International Symposium on Flow Visualization, 10 September 2006.
19. M. Stöhr, I. Boxx, C. Carter, and W. Meier. Dynamics of lean blowout of a swirl-stabilized flame in a gas turbine model combustor. Proceedings of the Combustion Institute, 33(2):2953–2960, 2011.
20. M. Stöhr and W. Meier. Investigation of a periodic combustion instability in a swirl burner using phase-resolved PIV. In 3rd European Combustion Meeting ECM 2007, 2007.
21. P. Weigand. Untersuchung periodischer Instabilitäten von eingeschlossenen turbulenten Drallflammen mit Lasermessverfahren. PhD thesis, Institut für Verbrennungstechnik der Luft- und Raumfahrt an der Universität Stuttgart, Germany, 2007.
22. P. Weigand, W. Meier, X. R. Duan, W. Stricker, and M. Aigner. Investigation of swirl flames in a gas turbine model combustor. I. Flow field, structures, temperature and species distribution. Combustion and Flame, 144:205–224, 2006.
23. A. Widenhorn, B. Noll, and M. Aigner. Numerical characterization of a gas turbine model combustor applying scale adaptive simulation. In Proceedings of the ASME Turbo Expo 2009: Power for Land, Sea and Air, volume GT2009, 8 June 2009.
24. A. Widenhorn, B. Noll, and M. Aigner. Numerical study of a non-reacting turbulent flow in a gas turbine model combustor. In 47th AIAA Aerospace Sciences Meeting Including the New Horizons Forum and Aerospace Exposition, number AIAA2009-647, January 2009.
Oxy-coal Combustion Modeling at Semi-industrial Scale

Michael Müller, Uwe Schnell, and Günter Scheffknecht
Abstract The use of simulation tools such as CFD leads to a detailed fundamental understanding of the complex processes during coal combustion. Thus, mathematical modeling provides an important instrument for research and development of the oxy-coal combustion technology, which represents one of the promising CO2 capture and storage (CCS) technologies. In this process, coal is burnt in an atmosphere consisting of pure O2 mixed with recycled flue gas, leading to an exhaust gas with high CO2 concentration which is ready for storage after further conditioning. However, the specific conditions of the oxy-coal combustion process require several adjustments in the CFD code, which was originally developed for conventional air-firing combustion. Accordingly, advanced sub-models concerning global homogeneous and heterogeneous chemistry as well as gas phase radiation have been developed and implemented in IFK's CFD code AIOLOS. The objective of this study was to investigate the accuracy and prediction quality of the enhanced modeling approach. The simulation results focus on temperature profiles and gas species concentrations. For validation purposes, extensive tests have been carried out at IFK's semi-industrial scale furnace (500 kWth) firing dried pulverized lignite. In general, the simulation results show satisfactory agreement with the corresponding measurements, indicating that the developed mathematical models are suitable for the application of the AIOLOS code in the field of oxy-coal combustion.
1 Introduction

Power generation from combustion of fossil fuels leads to emission of greenhouse gases, the dominant contributor being CO2.

Michael Müller · Uwe Schnell · Günter Scheffknecht
Institute of Combustion and Power Plant Technology – IFK, University of Stuttgart, Pfaffenwaldring 23, 70569 Stuttgart, Germany
Fig. 1 Simplified schematic of the oxy-coal process with its main stages: Air separation, combustion with flue gas recycling, exhaust gas conditioning, and CO2 separation (adapted from [1])
As rapidly growing energy demand is expected to be met in large part with fossil fuel fired power plants, reduction of greenhouse gas emissions within the process of power generation is strongly required. Hence, in recent years oxy-coal combustion has been receiving growing attention as one of the promising CO2 capture and storage (CCS) technologies. Whereas conventional pulverized coal fired boilers use air as combustion medium, the oxidizing atmosphere within the oxy-coal process consists of a mixture of O2 and recycled flue gas. In theory, post-combustion gases with CO2 concentrations of about 95% may be produced. After further conditioning, the highly concentrated CO2 may subsequently be compressed and stored. A simplified schematic of the oxy-coal process is illustrated in Fig. 1. Due to the auxiliary power required for the O2 generation and the purification and liquefaction of the exhaust gas, the net efficiency is lowered by about 7 to 11% [1]. Nonetheless, the oxy-coal combustion process is regarded as a relatively cost effective CCS method [2, 3], and existing conventional power plants may also be retro-fitted to oxy-coal operation without major complications. As a consequence, continuing research and development—employing experimental studies as well as simulation tools—has been triggered worldwide. Mathematical modeling has proven to be a useful and cost-reducing instrument for design and optimization of combustion systems in general, and pulverized coal fired power plants in particular. The application of Computational Fluid Dynamics (CFD) leads to a better understanding of the complex physical and chemical processes in coal combustion. Hence, this methodology is expected to play an important role in the future development of oxy-coal combustion systems. The oxy-coal process results in specific conditions compared to the standard air-firing process. The most important differences between conventional and oxy-coal combustion—such as flame characteristics and emission behavior—stem from the flue gas recycling and the corresponding composition of the oxidizing atmosphere. Within the oxy-coal process there are much higher partial pressures of CO2 and H2O; accordingly, the radiation of the flue gas is enhanced. Furthermore, the main diluting gases, i.e. CO2 in oxy-coal and N2 in air operation, have different thermo-
physical properties; these are primarily the molecular weight, the heat capacity, and the O2 diffusion rate. In order to account for those particular oxy-coal conditions which affect both combustion characteristics and heat transfer [4, 5], some of the commonly used sub-models have to be adjusted. The reaction sub-model has been identified as being the most relevant to allow the transition from simulation of conventional air-fired to oxy-coal combustion. Consequently, the present study focuses on extended chemical reaction mechanisms implemented in the CFD code AIOLOS which was developed at the Institute of Combustion and Power Plant Technology (IFK), University of Stuttgart.
2 Modeling Details

Modeling of turbulent reacting flows, and in particular of pulverized coal combustion systems, has to account for fluid flow, chemical reactions, and heat transfer phenomena. The mutual influence of these physicochemical processes requires a simultaneous numerical solution of a system of strongly coupled differential equations, i.e. the governing equations of mass, momentum, energy, and the mean mass fraction of each species which participates in the considered chemical reactions. The general transport equation in direction j may be formulated as

$$\frac{\partial(\rho\Phi)}{\partial t} + \frac{\partial(\rho u_j \Phi)}{\partial x_j} = \frac{\partial}{\partial x_j}\left(\Gamma_\Phi \frac{\partial\Phi}{\partial x_j}\right) + S_\Phi \qquad (1)$$

with ρ, t, u, x, Γ_Φ, and S_Φ denoting density, time, velocity, coordinate, diffusion coefficient, and source term, respectively. Equation (1) describes the local change of the Favre-averaged variable Φ with the corresponding transient, convective, diffusive, and source/sink terms. Furthermore, additional sub-models are needed to account for the broad range of physical and chemical processes within the combustion system. Simulations have been performed using the CFD code AIOLOS, which is based on the Finite Volume method. Incompressible flow and stationary boundary conditions are assumed. The two-phase flow is treated as a simplified Eulerian quasi-one-phase flow, neglecting slip between gas and dispersed particle phase. Due to the Favre averaging a turbulence model is required for solving the Navier-Stokes equations. Accordingly, the standard k-ε model [6] is applied to calculate an eddy viscosity in order to resolve turbulence closure and to describe turbulence phenomena. Pressure-velocity coupling is modeled by the SIMPLE method in combination with the interpolation scheme of Date [7] for pressure correction. The discrete ordinates method is utilized for modeling radiative heat transfer. The Eddy-Dissipation concept accounts for turbulence-chemistry interactions. Further details of the developed chemical reaction scheme and the radiation model are illustrated in the following sections.
Moreover, the code has been optimized for vectorization and parallelization and allows domain decomposition. Hence, the efficient use of high performance vector platforms is enabled. Further information concerning the AIOLOS code is given elsewhere [8–10]. The coal combustion process can be split into a sequence of stages: devolatilization, gasification and combustion of the remaining char, and combustion of the volatiles released during pyrolysis. Due to the number of reactions involved, the modeling approach within a CFD framework has to be simplified in order to maintain reasonable computational effort.
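Before turning to the individual sub-models, the structure of the general transport equation (1) can be made concrete with a minimal one-dimensional finite-volume sketch that advances a single scalar Φ with first-order upwind convection and central diffusion. This is a generic textbook discretization for illustration only, not the scheme implemented in AIOLOS.

```python
import numpy as np

def advance_scalar(phi, rho, u, gamma, dx, dt, s=0.0):
    """One explicit step of d(rho*phi)/dt + d(rho*u*phi)/dx
       = d/dx(gamma * dphi/dx) + s on a periodic 1-D grid,
       with constant rho and u. Upwind convection (u > 0 assumed)."""
    conv = rho * u * (phi - np.roll(phi, 1)) / dx
    diff = gamma * (np.roll(phi, -1) - 2.0 * phi + np.roll(phi, 1)) / dx**2
    return phi + dt / rho * (-conv + diff + s)

# Example: transport of a smooth pulse
nx, dx, dt = 200, 0.01, 2.0e-4
x = np.arange(nx) * dx
phi = np.exp(-((x - 0.5) / 0.05) ** 2)
for _ in range(1000):
    phi = advance_scalar(phi, rho=1.0, u=1.0, gamma=1.0e-3, dx=dx, dt=dt)
```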
2.1 Coal Devolatilization

The devolatilization stage is generally considered as a thermal decomposition process. Thereby, the surrounding atmosphere has only an indirect impact via heat transfer to the coal particles. As a consequence, commonly used devolatilization models are considered to be applicable to oxy-coal conditions. Whereas the primary pyrolysis is often represented by a single hypothetical reaction of dry-ash-free coal decomposing to char and volatiles containing mostly CO, CO2, H2O, H2, light hydrocarbons (CnHm), and tars (CxHyOz), the secondary pyrolysis involves several reactions to describe tar conversion and soot formation. The suggested standard reaction scheme includes the corresponding reactions: tar gasification at low temperatures, tar decomposition into soot at high temperatures, and further oxidation of tar and soot. A detailed description of the coal devolatilization model used in AIOLOS can be found in [9]. Experiments of coal pyrolysis in N2 and CO2 atmospheres indicate that the model described above is suitable for oxy-coal conditions if an appropriate gas phase chemistry model is used which accounts for shift reactions and gas phase equilibrium [11].
2.2 Homogeneous Chemistry

At conventional air-firing combustion the most important gas phase reaction, which determines flame speed and the promotion of chain branching, is

H + O2 ⇌ O + OH.   (2)
At oxy-coal combustion, however, the reaction

H + CO2 ⇌ CO + OH   (3)
is of similar relevance due to the specific CO2-rich atmosphere [12]. Reactions (2) and (3) indicate that in oxy-coal systems CO2 competes with O2 for the available H-radicals, which may cause a reduction of flame speed and a lower availability of O-radicals.
Furthermore, high CO2 levels locally promote the formation of CO in fuel-rich regions via reaction (3). Because detailed chemistry models, which in general are capable of modeling radical reactions, are computationally prohibitive in engineering applications, the standard global gas phase reaction model has to be extended to consider the specific oxy-coal conditions [13]. In order to account for the chemical effects of high CO2 concentrations in the oxidizing atmosphere, the homogeneous water-gas-shift reaction

H2O + CO ⇌ CO2 + H2   (4)
needs to be included in the global reaction scheme as an equilibrium reaction [13]. The combustion of light hydrocarbons CnHm, which are formed during the devolatilization stage, is modeled as

CnHm + n/2 O2 → n CO + m/2 H2.   (5)
The intermediate species H2 is supposed to be in chemical equilibrium with H2O, stated as

H2 + 0.5 O2 ⇌ H2O.   (6)

This assumption has proven to be very important for high temperature flames with elevated O2 concentrations, which may occur under oxy-coal conditions, since the chemical equilibrium of reaction (6) is shifted towards the educt side at temperatures exceeding 2000 K. Neglecting the reverse reaction would then lead to a local over-prediction of the flame temperature. The calculation of the respective equilibrium constants as a function of the standard Gibbs free energy is achieved by polynomial fitting to the respective data from the JANAF tables [14]. The rate expressions and kinetic parameters of reactions (4)–(6) are compiled in [13]. As previously mentioned, the Eddy-Dissipation concept (EDC) is applied to account for the turbulence-chemistry interaction within the homogeneous reactions. Within the EDC approach, the total reaction space is split into the fine structure region, where perfect mixing of the reactants is assumed, and the surrounding fluid [15]. All gas phase reactions are considered to take place only in the fine structures, except reactions which are slow and/or do not require mixing, such as the pyrolytic secondary reactions of tar [13].
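The relation behind the equilibrium treatment is K = exp(−ΔG°/(RT)). The sketch below evaluates it for reaction (6); the linear ΔG°(T) fit is an approximation to tabulated JANAF-type data, inserted here purely for illustration and not taken from the polynomial fits used in AIOLOS.

```python
import numpy as np

R = 8.314  # J/(mol K)

def K_eq(delta_g):
    """Equilibrium constant from the standard Gibbs free energy of
    reaction, delta_g(T) in J/mol."""
    return lambda T: np.exp(-delta_g(T) / (R * T))

# Reaction (6): H2 + 0.5 O2 <-> H2O.
# Rough linear fit dG0(T) ~ -247000 + 55.8*T [J/mol] (approximate data)
dG0_r6 = lambda T: -247.0e3 + 55.8 * T
K6 = K_eq(dG0_r6)

for T in (1500.0, 2000.0, 2500.0):
    print(f"T = {T:6.0f} K: K_eq = {K6(T):.3e}")
# K_eq drops by orders of magnitude with temperature: towards ~2000 K and
# beyond, the equilibrium shifts noticeably back to the educts H2 and O2,
# which is why neglecting the reverse reaction overpredicts flame temperature.
```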
2.3 Heterogeneous Chemistry

Heterogeneous chemistry describes the gasification reactions of the residual solid with gaseous oxidizers after pyrolysis is finished. Despite the development of various phenomenological models in the past, the heterogeneous chemistry within coal combustion is still a topic of ongoing research due to the number of occurring chemical and physical processes and their mutual interactions.
The most commonly used rate equations are simple Arrhenius-type expressions employing the kinetic parameters char specific frequency factor and char specific activation energy. In doing so, the reaction progress is assumed to depend on the interaction of the chemical reaction and the physical diffusion of oxidizer and products through the particle boundary layer. Measurements of char reactivity indicate that the intrinsic char combustion rate is similar under oxy-coal and air combustion conditions. It was found that the oxygen consumption, and thus char conversion, proceeds only slightly slower in CO2 than in N2 atmosphere [11]. Since the char combustion rate differs only insignificantly between both experimental set-ups, it can be considered to be approximately the same under air and oxy-coal conditions. As generally accepted in most modeling approaches, char is assumed to consist of pure carbon. At conventional combustion, char oxidation with O2 (reaction (7)) is the dominating reaction. In oxy-coal systems, however, especially CO2 and H2O have to be considered as oxidizers. This leads to the following set of main net reactions:

(1 + f) C + O2 → (1 − f) CO2 + 2f CO   (7)
C + CO2 → 2 CO   (8)
C + H2O → CO + H2   (9)
with f denoting a statistical mechanism factor within the range of 0 to 1. At combustion temperatures above 1000 °C, f may be presumed to be constant, with the main product of reaction (7) being CO, resulting in f ≈ 1 [16]. The Boudouard reaction (8) and the heterogeneous water-gas-shift reaction (9) are mostly neglected in air combustion simulations. But in O2-lean regions both reactions may have a major impact, because the partial pressures of CO2 and H2O are generally higher in oxy-coal combustion systems, the latter especially in case of wet recycle systems. In general, both reactions should be considered as equilibrium reactions since product inhibition may occur [17]. Yet, inspection of the equilibrium constants of each reaction reveals that—at typical combustion temperatures and ambient pressure—the equilibrium is shifted towards the product side. On this basis, reactions (8) and (9) may be considered irreversible. A comprehensive analysis of char gasification models under oxy-coal combustion conditions, including details about modeling the char morphology, is given in [13].
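A common way to express the interplay of surface kinetics and boundary-layer diffusion described above is a series-resistance combination of an Arrhenius rate and a film diffusion rate. The sketch below shows this generic form; all parameter values are illustrative placeholders and are not the AIOLOS coefficients.

```python
import numpy as np

R = 8.314  # J/(mol K)

def char_rate(p_ox, T_p, d_p, A=0.3, E_a=1.3e5, C_d=7.0e-12):
    """Effective char consumption rate per unit surface [kg/(m^2 s)]:
       series combination of kinetic and diffusive resistances,
       driven by the oxidizer partial pressure p_ox [Pa]."""
    k_kin = A * np.exp(-E_a / (R * T_p))                  # surface kinetics
    k_dif = C_d * ((T_p + 1500.0) / 2.0)**0.75 / d_p      # film diffusion
    return p_ox / (1.0 / k_kin + 1.0 / k_dif)

# Low particle temperature -> kinetically limited;
# high temperature / small particles -> diffusion limited
for T in (1200.0, 1600.0, 2000.0):
    print(f"T_p = {T:.0f} K: rate = {char_rate(2.0e4, T, 1.0e-4):.3e} kg/(m2 s)")
```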
2.4 Radiation Modeling

The heat transfer in coal-fired furnaces is dominated by thermal radiation, which is mainly influenced by temperature and the composition of the participating medium. In general, the radiative heat transfer may be split into gas radiation and particle radiation, with the participating media being CO2 and H2O, and char, soot and fly ash.
Compared to conventional combustion, the oxy-coal conditions result on the one hand in a higher gas emissivity because of the specific gas composition in the furnace, and on the other hand in enhanced particulate matter concentrations due to lower gas volumes [18]. Indeed, models for the radiative properties of CO2, H2O, and particulate matter have been established for air combustion; yet, there is uncertainty regarding their validity for oxy-coal combustion systems.

Gas radiation. In general, the numerical solution of the radiative transfer equation (RTE) has to account for both the spatial distribution of the radiative intensity and the spectral dependency of the optical properties of the participating medium. A full integration of the RTE involving all spectral lines in the gas spectrum via line-by-line calculations is computationally prohibitive for application in CFD simulations. Another rather detailed option is to approximate the absorption lines within the wave number spectrum by spectral bands (narrow- and wide-band models). However, CFD codes often employ simplified global models which neglect the spectral variation by treating the gas mixture as one grey medium in order to solve a spectrally integrated RTE. One of those models is the Weighted Sum of Grey Gases (WSGG) model, which approximates the medium as a mixture of grey gases with constant absorption coefficients associated with weighting factors [19]. Within the model's standard formulation those coefficients and the according weighting factors are given for a fixed partial pressure ratio of CO2 and H2O. As this is only valid in regions where combustion is almost finished, the resulting total absorption coefficient is often deemed inappropriate for strongly varying CO2/H2O ratios. As a consequence, adapted formulations of the WSGG model have been published recently in order to account for the different partial pressure ratios of CO2 and H2O specific to oxy-coal conditions [20, 21]. To overcome the limitations of the standard WSGG model, Leckner proposed a more general model which predicts the total emissivity of a gas mixture depending on its composition and temperature [22]. The corresponding total absorption coefficient, which is calculated from the total emissivity, has been shown to yield very accurate results compared to benchmark calculations carried out with band models, especially for path lengths typical of industrial boilers [23]. Consequently, Leckner's model is used within the present simulations.

Particle radiation. The particle phase affects the radiative heat transfer via emission, absorption and scattering of thermal radiation, the main influencing factors being the ash concentration, the particle size distribution of the cloud, and the complex index of refraction and absorption index. Since the calculation of the optical properties using the general Mie theory is far too complex for the numerical simulation of industrial furnaces, a simplified engineering approach is applied [10, 13]. The radiative properties of coal and ash are derived from the specific area and a mean efficiency factor of the particle cloud. The scattering phase function is modeled by the Delta-Eddington approximation [24].
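The WSGG idea mentioned above can be written in a few lines: the total emissivity is a weighted sum of grey-gas contributions, ε = Σ_i a_i(T)(1 − exp(−k_i p L)). The coefficient set below is an invented placeholder purely to show the structure; real sets, such as those of [19–21], must be taken from the literature.

```python
import numpy as np

def wsgg_emissivity(T, p_atm, L, kappas, weight_polys):
    """Total emissivity of a gas mixture as a weighted sum of grey gases.
    kappas: absorption coefficients [1/(atm m)]; weight_polys: polynomial
    coefficients of the temperature-dependent weights a_i(T)."""
    eps = 0.0
    for kappa, poly in zip(kappas, weight_polys):
        a_i = np.polyval(poly, T / 1000.0)   # weight as polynomial in T
        eps += a_i * (1.0 - np.exp(-kappa * p_atm * L))
    return eps

# Placeholder 3-grey-gas set (NOT a published coefficient set)
kappas = [0.4, 7.0, 80.0]
weights = [[-0.05, 0.30], [0.02, 0.25], [-0.01, 0.12]]  # linear in T/1000

print(wsgg_emissivity(T=1600.0, p_atm=0.5, L=1.0,
                      kappas=kappas, weight_polys=weights))
```

The fixed κ_i values are exactly the limitation discussed above: a single coefficient set implies one CO2/H2O partial pressure ratio, which is why oxy-coal-specific sets or Leckner's composition-dependent model are needed.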
3 Simulation Results

3.1 Experimental Setup and CFD Model

The model evaluation was carried out using experimental data from IFK's semi-industrial scale test facility (500 kWth) fired with dried pulverized Lausitz lignite. The performed measurements include both detailed in-flame and continuous exhaust gas measurements. The detailed in-flame measurements focus on combustion gas temperature and flue gas composition. In combination with the continuous monitoring at the exit of the furnace, reliable information about the main characteristics of the combustion process is provided, which is of major importance for the validation of the CFD models. The IFK test facility is a vertical down-fired furnace with a total length of about 7 m and an inner diameter of 0.8 m, as illustrated in Fig. 2. The test rig is optimized for the investigation of pulverized fuel combustion processes and has been retro-fitted for oxy-coal combustion by integration of a flue gas recycle path after the electrostatic precipitator [25]. For the oxy-coal combustion experiments, pure oxygen from an external storage tank may either be premixed with the wet recycled flue gas, or directly injected into the furnace. The top-mounted swirl burner has been designed specifically for oxy-coal combustion. The coal and the oxidants enter the furnace through several concentric registers.
Fig. 2 Schematic illustration of IFK 500 kWth test facility with its measurement ports
Fig. 3 Clip of the numerical grid
Since each of the inlet flows has a separate control system, this burner design allows high flexibility concerning distribution and composition of the individual flows at the inlet. Swirl generators are integrated in the ducts of both secondary oxidant streams. Some burner details are shown in Fig. 3, which shows a clip of the numerical grid. Here, the black-colored cells denote inlet cells which correspond to the respective inlet streams named Core, Primary, Secondary1, and Secondary2 (from inside outward). Due to the cylindrical shape of the furnace and the burner geometry, the system may be assumed to be axisymmetric. A three-dimensional multi-domain grid has been used to describe the furnace geometry (see also Fig. 3). The first domain represents the burner with a cylindrical grid, and the second domain specifies the combustion chamber in Cartesian coordinates. Despite the test facility's height of about 7 m, the numerical mesh may be reduced to a length of about 3 m because the combustion process is essentially finished within this section. Hence, computational time may be saved without any loss of accuracy. The entire grid consists of approximately 1 million computational cells. Since no temperature measurements at the furnace walls are available, an estimated temperature is mapped onto the refractory lining as boundary condition. In combination with an assumed deposit layer, a variable surface temperature may be calculated depending on the local wall heat flux.
3.2 Operating Conditions

Fuel properties are compiled in Table 1. The approximated particle size distribution of the pulverized Lausitz lignite is derived from sieve analysis, resulting in ten discrete particle classes. The most relevant operating conditions of the benchmark oxy-coal test case are listed in Table 2. The thermal input is 321 kWth. While the Primary stream contains the pulverized coal and pure CO2 as carrier gas, the secondary streams consist of recycled flue gas premixed with O2. The flue gas recycling is operated in wet mode at a recycling rate of 65.2%, resulting in an O2 enrichment in the secondary oxidant of about 36 vol.-%.
Table 1 Properties of Lausitz lignite

Proximate analysis [wt.-%]:
         Moisture   Ash     Cfix    Volatiles
  ar*    10.15      16.75   31.45   41.65
  daf**  –          –       43.02   56.98

Ultimate analysis [wt.-%]:
         C       H      N      S      O
  ar*    49.70   3.89   0.53   1.69   17.29
  daf**  67.99   5.33   0.73   2.31   23.65

* as received; ** dry, ash-free basis
Table 2 Boundary conditions of the benchmark test case

Inlet streams:
  Coal* [kg/h]             48.0
  Core [m3_STP/h]          0.0
  Primary [m3_STP/h]       24.0
  Secondary1 [m3_STP/h]    156.0
  Secondary2 [m3_STP/h]    61.1

Composition of Secondary1 and Secondary2 [vol.-%]:
  O2      CO2     H2O     inerts (N2+Ar)
  34.74   43.98   16.77   4.51

* Coal is carried by Primary stream (pure CO2)
The global oxidant-to-fuel ratio is λ ≈ 1.14. All inlet gas streams listed in Table 2 are characterized by high Reynolds numbers (Re > 10^4); thus, due to the imposed flow pattern, convective transport dominates over diffusive transport. Furthermore, in combination with the induced swirl, stable flame conditions are attained within this particular configuration.
3.3 Comparison of Simulation and Experiment

The simulation results are summarized focusing on gas temperature profiles and the main gas species concentrations (O2, CO2, CO). Figure 4 shows vertical slices through the computational model. The illustrated conical flame shape is typically attributed to swirl burners. Ignition starts directly in the burner quarl, resulting in peak temperatures of about 1500 °C (see Fig. 4a). Regarding the gas concentrations, the simulation indicates reasonable predictions, since in regions with considerable O2 levels there is no CO present, and accordingly the CO2 content is rather low. The CO peak is located in the near burner region where internal recirculation occurs. Figure 5 presents axial profiles along the centerline with symbols referring to measurement data. In general, the predictive quality of the model appears to be satisfactory, as the simulation reflects the fundamental trends. Nevertheless, especially in the near burner region, discrepancies between measurements and simulation are identified. As indicated by the sharp temperature gradient right after the burner exit, the simulations predict the ignition of the flame too early.
Fig. 4 Simulation results illustrated on a vertical slice along the furnace centerline (temperature in [◦C], concentrations in [vol.-%])
This leads to a peak value which is somewhat over-predicted compared to the measured peak (see Fig. 5a). Yet, after about 1 m the temperature corresponds very well to the measurements. Referring to the gas concentrations, it can be observed that the trends of the simulations concur with the experimental data. While the O2 levels agree well with the measured values over the entire range (Fig. 5b), the CO2 levels are in general predicted higher than the measured concentrations (Fig. 5c). Accordingly, the CO concentrations are too low compared to the experiment (Fig. 5d). Similar to the temperature profile, the calculated CO peak is shifted towards the burner exit; but the peak value is predicted rather accurately. However, the identified deviations between simulations and experiments may be a consequence of partially uncertain boundary conditions, which are still essential for numerical modeling. Accordingly, the approximate procedure concerning the thermal boundary conditions at the furnace walls has to be taken into account when regarding the differences between experiment and simulation.
Fig. 5 Axial profiles along the furnace centerline
Moreover, slagging at the quarl and in the upper part of the furnace, detected during the experiments, may have a distinct impact on the flow that cannot be considered within a predictive CFD study. Hence, measurements of the flow field and the wall temperature should be conducted to avoid potentially misleading assumptions, and to identify potential shortcomings of the implemented sub-models. Furthermore, the measured recycled flue gas composition hints at distinct air in-leakage at the test facility, which may not be reproduced adequately by the simulations.
3.4 Performance

The simulations are computationally expensive due to the coupling of flow field and chemical reactions; thus, the usage of high performance computers is usually inevitable. Accordingly, the AIOLOS simulations are performed on one of the NEC vector systems (SX-8/SX-9) installed at the High Performance Computing Center Stuttgart (HLRS). The simulations of this study have been carried out on the NEC SX-8 platform. The numerical grid of the furnace consists of about 1 million cells, corresponding to a memory demand of about 2.3 GB. Each simulation requires 220 000 iterations until convergence, which results in a total elapsed time of about 18 h while running on one single vector node using 8 CPUs.
Performance analysis shows quite good vectorization efficiency of the AIOLOS code for both SX platforms. On the NEC SX-8 the vector operation ratio is 99.7% with an average vector length of 252, and an overall computational performance of 4.3 GFLOPS is obtained.
4 Conclusions

An efficient computational modeling framework for oxy-coal combustion was developed and implemented into the CFD combustion code AIOLOS. The advanced sub-models, which are mainly based on the work of Leiser [13], were adjusted to account for the specific oxy-coal conditions. Validation of the entire model was carried out against an oxy-coal benchmark test case with pre-dried lignite. The corresponding experiments have been conducted at IFK's 500 kWth furnace. The comparison of measurements and numerical results reveals that the extended sub-models are applicable to oxy-coal combustion. The combustion characteristics could be described reasonably well, and fundamental trends are predicted correctly for gas temperature and the main combustion gas species, indicating that the combustion model used in the present work is conceptually consistent. Nonetheless, some discrepancies were detected, particularly in the near burner region, as the simulations predicted the flame ignition further upstream than observed in the experiments. Thus, potential for further improvement within the modeling approach has been identified. The validation of the models is based on experimental data obtained in a semi-industrial scale test facility. In order to ensure the reliability of the presented modeling approach, further validation with oxy-coal flames at larger scale is thus recommended.

Acknowledgments. The authors gratefully thank Vattenfall AB and ALSTOM Power Systems GmbH for funding the presented work. Computational resources have been provided by the High Performance Computing Center Stuttgart (HLRS).
References

1. A. Kather, G. Scheffknecht, The oxycoal process with cryogenic oxygen supply, Naturwissenschaften 96 (2009) 993–1010.
2. D. Singh, E. Croiset, P. L. Douglas, M. A. Douglas, Techno-economic study of CO2 capture from an existing coal-fired power plant: MEA scrubbing vs. O2/CO2 recycle combustion, Energy Conversion and Management 44 (19) (2003) 3073–3091.
3. T. F. Wall, Combustion processes for carbon capture, Proceedings of the Combustion Institute 31 (2007) 31–47.
4. B. J. P. Buhre, L. K. Elliott, C. D. Sheng, R. P. Gupta, T. F. Wall, Oxy-fuel combustion technology for coal-fired power generation, Progress in Energy and Combustion Science 31 (2005) 283–307.
5. S. P. Khare, T. F. Wall, A. Z. Farida, Y. Liu, B. Moghtaderi, R. P. Gupta, Factors influencing the ignition of flames from air-fired swirl pf burners retrofitted to oxy-fuel, Fuel (87) (2008) 1042–1049.
6. B. E. Launder, D. B. Spalding, The numerical computation of turbulent flows, Computer Methods in Applied Mechanics and Engineering 3 (2) (1974) 269–289.
7. A. W. Date, Complete pressure correction algorithm for the solution of compressible Navier-Stokes equations on a nonstaggered grid, Numerical Heat Transfer, Part B: Fundamentals 29 (4) (1996) 441–458.
8. U. Schnell, Numerical modelling of solid fuel combustion processes using advanced CFD-based simulation tools, Progress in Computational Fluid Dynamics 1 (4) (2001) 208–218.
9. D. Förtsch, A Kinetic Model of Pulverised Coal Combustion for Computational Fluid Dynamics, Ph.D. thesis, Universität Stuttgart, Stuttgart (2003).
10. J. Ströhle, Spectral Modelling of Radiative Heat Transfer in Industrial Furnaces, Shaker Verlag, Aachen, 2003.
11. L. Al-Makhadmeh, Coal Pyrolysis and Char Combustion Under Oxy-Fuel Conditions, Shaker Verlag, Aachen, 2009.
12. P. Glarborg, L. L. B. Bentzen, Chemical effects of a high CO2 concentration in oxy-fuel combustion of methane, Energy & Fuels 22 (1) (2008) 291–296.
13. S. Leiser, Numerical Simulation of Oxy-Fuel Combustion, Shaker Verlag, Aachen, 2010.
14. E. W. Lemmon, M. O. McLinden, D. G. Friend, NIST Chemistry WebBook (2009).
15. I. R. Gran, B. F. Magnussen, A numerical study of a bluff-body stabilized diffusion flame. Part 2. Influence of combustion modeling and finite-rate chemistry, Combustion Science and Technology 119 (1) (1996) 191–217.
16. J. R. Arthur, Reactions between carbon and oxygen, Transactions of the Faraday Society 47 (1951) 164–178.
17. D. G. Roberts, D. J. Harris, Char gasification in mixtures of CO2 and H2O: Competition and inhibition, Fuel 86 (2007) 2672–2678.
18. T. Wall, Y. Liu, C. Spero, L. Elliott, S. Khare, R. Rathnam, F. Zeenathal, B. Moghtaderi, B. Buhre, C. Sheng, R. Gupta, T. Yamada, K. Makino, J. Yu, An overview on oxyfuel coal combustion – State of the art research and technology development, Chemical Engineering Research and Design 87 (2009) 1003–1016.
19. H. C. Hottel, A. F. Sarofim, Radiative Heat Transfer, McGraw-Hill, New York, 1967.
20. R. Johansson, K. Andersson, B. Leckner, H. Thunman, Models for gaseous radiative heat transfer applied to oxy-fuel conditions in boilers, Journal of Heat and Mass Transfer 53 (2010) 220–230.
21. R. Johansson, B. Leckner, K. Andersson, F. Johnsson, Account for variations in the H2O to CO2 molar ratio when modelling gaseous radiative heat transfer with the weighted-sum-of-grey-gases model, Combustion and Flame 158 (2011) 893–901.
22. B. Leckner, Spectral and total emissivity of water vapor and carbon dioxide, Combustion and Flame 19 (1) (1972) 33–48.
23. N. Lallement, A. Sayre, R. Weber, Evaluation of emissivity correlations for H2O-CO2-N2/air mixtures and coupling with solution methods of the radiative transfer equation, Progress in Energy and Combustion Science 22 (6) (1996) 543–574.
24. J. H. Joseph, W. J. Wiscombe, The delta-Eddington approximation for radiative flux transfer, Journal of the Atmospheric Sciences 33 (12) (1976) 2452–2459.
25. S. Grathwohl, O. Lemp, U. Schnell, J. Maier, G. Scheffknecht, F. Kluger, B. Krohmer, P. Mönckert, G. N. Stamatelopoulos, Highly Flexible Burner Concept for Oxyfuel Combustion, in: 1st Oxyfuel Combustion Conference, Cottbus, Germany, 2009.
Delayed Detached Eddy Simulations of Compressible Turbulent Mixing Layer and Detailed Performance Analysis of Scientific In-House Code TASCOM3D

Markus Kindler, Peter Gerlinger, and Manfred Aigner
Abstract In the present paper a compressible turbulent mixing layer is investigated using Delayed Detached Eddy Simulation (DDES). Two compressible streams, divided by a splitter plate, join downstream of the plate and form a mixing and shear layer. The simulations are performed with the scientific code TASCOM3D (Turbulent All Speed Combustion Multigrid Solver) using a fifth-order upwind biased scheme combined with an improved multi-dimensional limiting process (MLP) [1] for spatial discretization. The inviscid fluxes are calculated using the AUSM+-up flux vector splitting. DDES is a hybrid RANS/LES approach which uses a traditional Reynolds averaged Navier-Stokes (RANS) approach for wall-bounded regions and a Large Eddy Simulation (LES) approach for the mixing section. The simulations show a quasi two-dimensional flow field right after the splitter plate, followed by a conversion to a turbulent and highly unsteady flow field after a short distance. Furthermore, the performance of TASCOM3D on different HPC systems is analyzed: a vector based (NEC SX-9) and a scalar processor based (Cray XE6) system, both installed at the High Performance Computing Center Stuttgart (HLRS). The investigation points out the challenges and problems in HPC and may serve other researchers as a comparison and an aid to achieving good performance on the different architectures.
1 Introduction

In a turbulent flow the large eddies are responsible for the greatest part of the energy, mass and momentum transport and strongly depend on boundary conditions and geometry. Large Eddy Simulations (LES) resolve these scales directly, while the small scales of turbulence are modeled.

Markus Kindler · Peter Gerlinger · Manfred Aigner
Institut für Verbrennungstechnik der Luft- und Raumfahrt, Universität Stuttgart, Pfaffenwaldring 38-40, 70569 Stuttgart, Germany
In the past many researchers showed excellent results obtained by LES of unsteady flows, which cannot be predicted accurately by solving the traditional Reynolds averaged Navier-Stokes (RANS) equations. However, especially in wall-bounded high speed flows, the resolution of the grid in the boundary layer has to be extremely high to resolve the required scales accurately. In such near wall regions traditional RANS strategies predict the boundary layer very well at a much smaller computational effort. Hybrid RANS/LES techniques combine the advantages of both approaches at reasonable computational effort. The Delayed Detached Eddy Simulation (DDES) is one of the most common hybrid RANS/LES approaches and is used in the present paper to investigate a compressible turbulent mixing layer that has been studied experimentally by Goebel and Dutton [2]. Especially in high speed flows, numerical investigations using hybrid RANS/LES techniques are computationally expensive. However, depending on the object of investigation, even traditional RANS approaches are very costly. The computational effort of simulations of reacting flows using detailed chemistry with approaches to account for turbulence-chemistry interaction is comparable to an inert DNS in some cases. Therefore the use of high performance computers (HPC) is indispensable. Most HPC systems can be divided into two groups: vector and scalar processor based systems. Both types of systems are installed at the High Performance Computing Center Stuttgart (HLRS). In the past TASCOM3D was mainly used on high performance computing systems with vector processors (e.g. NEC SX-9). During the whole development of the code, attention has been paid to optimizations for this architecture, e.g. avoiding data dependencies, increasing vector lengths or avoiding power-of-2 strides. However, a majority of next generation high performance computing systems are expected to be based on scalar processors. The High Performance Computing Center Stuttgart (HLRS) is installing a massively parallel supercomputer with AMD Opteron processors. The supercomputer is a Cray XE6 system and its final installation step is planned for 2013. At the end of 2010 a part of the Cray XE6 system was installed in order to provide users the possibility to migrate, test and optimize applications. In the second part of the present paper the performance of TASCOM3D on both systems is investigated and the results are discussed in detail.
2 Governing Equations and Numerical Scheme

The investigations presented in this paper are performed using the scientific in-house code TASCOM3D. The code has been used successfully for two decades to simulate reacting and non-reacting flows. It describes reacting flows by solving the full compressible Navier-Stokes, species and turbulence transport equations. Additionally, an assumed PDF (probability density function) approach is used to take turbulence-chemistry interaction into account. The set of averaged equations in three-dimensional conservative form is given by
$$
\frac{\partial Q}{\partial t} + \frac{\partial (F - F_\nu)}{\partial x} + \frac{\partial (G - G_\nu)}{\partial y} + \frac{\partial (H - H_\nu)}{\partial z} = S, \qquad (1)
$$

where

$$
Q = \left[\,\bar\rho,\ \bar\rho\tilde u,\ \bar\rho\tilde v,\ \bar\rho\tilde w,\ \bar\rho\tilde E,\ \bar\rho k,\ \bar\rho\omega,\ \bar\rho\sigma_T,\ \bar\rho\sigma_Y,\ \bar\rho\tilde Y_i\,\right]^T, \quad i = 1, 2, \ldots, N_k - 1. \qquad (2)
$$
The variables in the conservative variable vector Q are the density $\bar\rho$ (Reynolds averaged), the Favre averaged velocity components $\tilde u$, $\tilde v$ and $\tilde w$, the total specific energy $\tilde E$, the turbulence variables $k$ and $\omega = \varepsilon/k$ (where $k$ is the turbulent kinetic energy and $\varepsilon$ the dissipation rate of $k$), the variance of the temperature $\sigma_T$, the variance of the sum of the species mass fractions $\sigma_Y$ and finally the species mass fractions $\tilde Y_i$ ($i = 1, 2, \ldots, N_k - 1$). Thereby $N_k$ is the total number of species used to describe the gas composition. The vectors F, G and H specify the inviscid fluxes in x-, y- and z-direction, and $F_\nu$, $G_\nu$ and $H_\nu$ the corresponding viscous fluxes. The source vector S in (1) includes terms from turbulence and chemistry and is given by

$$
S = \left[\,0,\ 0,\ 0,\ 0,\ 0,\ \bar S_k,\ \bar S_\omega,\ \bar S_{\sigma_T},\ \bar S_{\sigma_Y},\ \bar S_{Y_i}\,\right]^T, \quad i = 1, 2, \ldots, N_k - 1, \qquad (3)
$$

where $\bar S_k$ and $\bar S_\omega$ are the averaged source terms of the turbulence variables, $\bar S_{\sigma_T}$ and $\bar S_{\sigma_Y}$ the source terms of the variance variables ($\sigma_T$ and $\sigma_Y$), and $\bar S_{Y_i}$ the source terms of the species mass fractions. As the basis RANS turbulence model for the hybrid RANS/LES approach, the two-equation low-Reynolds-number Wilcox k-ω turbulence model [3] is applied, as described below. The momentary chemical production rate of species i in (3) is given by

$$
S_{Y_i} = M_i \sum_{r=1}^{N_r} \left( \nu''_{i,r} - \nu'_{i,r} \right) \left( k_{f,r} \prod_{l=1}^{N_k} c_l^{\nu'_{l,r}} - k_{b,r} \prod_{l=1}^{N_k} c_l^{\nu''_{l,r}} \right), \qquad (4)
$$
where $k_{f,r}$ and $k_{b,r}$ are the forward and backward rate constants of reaction r (defined by the Arrhenius function), $M_i$ the molecular weight of species i, $c_i = \rho \tilde Y_i / M_i$ the species concentrations, and $\nu'_{i,r}$ and $\nu''_{i,r}$ the stoichiometric coefficients of species i in reaction r. The averaged chemical production rate of a species i resulting from the assumed PDF approach is described in detail in Refs. [4, 5]. For the reconstruction of the interface values of the finite-volume scheme, an upwind biased polynomial reconstruction of fifth-order spatial accuracy is employed, which takes unequal grid spacing into account. Based on these interface values the inviscid fluxes are calculated using the AUSM+-up flux vector splitting [6]. The unsteady set of equations (1) is solved using an implicit Lower-Upper Symmetric Gauss-Seidel (LU-SGS) [7–10] finite-volume algorithm, where the finite-rate chemistry is treated fully coupled with the fluid motion. More details concerning TASCOM3D may be found in Refs. [5, 9–12].
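To make the structure of (4) concrete, the following minimal Python sketch evaluates the net chemical production rates by the law of mass action for a toy mechanism. All species, stoichiometric coefficients and rate constants are illustrative placeholders, not values from TASCOM3D (whose Fortran implementation is not shown in the paper).

```python
import numpy as np

def production_rates(c, M, nu_f, nu_b, kf, kb):
    """Species production rates S_Yi according to Eq. (4).

    c    : (Nk,)     molar concentrations c_l
    M    : (Nk,)     molecular weights M_i
    nu_f : (Nr, Nk)  educt stoichiometric coefficients nu'_{l,r}
    nu_b : (Nr, Nk)  product stoichiometric coefficients nu''_{l,r}
    kf   : (Nr,)     forward rate constants k_{f,r} (from Arrhenius fits)
    kb   : (Nr,)     backward rate constants k_{b,r}
    """
    # rate of progress of each reaction r: kf*prod(c^nu') - kb*prod(c^nu'')
    q = kf * np.prod(c ** nu_f, axis=1) - kb * np.prod(c ** nu_b, axis=1)
    # net stoichiometric change (nu'' - nu') of species i in reaction r
    dnu = nu_b - nu_f
    return M * (dnu.T @ q)

# toy 3-species, 2-reaction mechanism (placeholder numbers)
c    = np.array([1.0, 0.5, 0.1])
M    = np.array([2.0, 32.0, 18.0])
nu_f = np.array([[2.0, 1.0, 0.0],     # 2 A + B -> 2 C
                 [0.0, 0.0, 2.0]])    # 2 C -> 2 A + B
nu_b = np.array([[0.0, 0.0, 2.0],
                 [2.0, 1.0, 0.0]])
kf   = np.array([1.0e3, 1.0e1])
kb   = np.array([1.0e-2, 1.0e-3])
print(production_rates(c, M, nu_f, nu_b, kf, kb))
```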
2.1 Delayed Detached Eddy Simulation

The Detached Eddy Simulation (DES) was originally formulated by Spalart [14]. He realized a hybrid RANS/LES approach by replacing the length scale in the Spalart-Allmaras turbulence model by a new length scale $\tilde l = \min(d, C_{DES}\,\Delta)$, where d is the wall distance, Δ the local maximum grid spacing and $C_{DES}$ an empirical constant of the order of one. Due to the introduction of the new length scale the model is able to control the eddy viscosity: the DES works as an LES if the grid spacing is small compared to the wall distance and as a RANS model otherwise. However, because the length scale depends only on the grid spacing, problems can arise if the grid in the boundary layer is relatively fine in the wall-normal and streamwise directions. In this case the local maximum grid spacing Δ becomes smaller than the wall distance d inside the boundary layer. Therefore Spalart introduced a blending function
$$
f_d = 1 - \tanh\left( \left[ 8 r_d \right]^3 \right), \qquad (5)
$$
$$
r_d = \frac{\nu_t + \nu}{\sqrt{\dfrac{\partial u_i}{\partial x_j}\dfrac{\partial u_i}{\partial x_j}}\ \kappa^2 d^2} \qquad (6)
$$
that preserves the RANS mode in the boundary layer [15], where $\nu_t$ and $\nu$ are the kinematic eddy viscosity and the molecular viscosity, d is the wall distance and κ = 0.41 the von Kármán constant. The new model is referred to as Delayed Detached Eddy Simulation (DDES) and the newly formulated length scale is

$$
\tilde l = d - f_d \max\left( 0,\ d - C_{DES}\,\Delta \right). \qquad (7)
$$
Thereby $C_{DES}$ is an adjustable parameter, whose effect on a compressible turbulent mixing layer is investigated in [16]. Travin et al. [17] introduced the DES formulation in the k-ω framework by replacing the length scale $l_{k-\omega}$ in the dissipative term of the k-equation,

$$
D_k^{RANS} = \rho \beta^* k \omega = \frac{\rho k^{3/2}}{l_{k-\omega}}, \qquad (8)
$$

by the length scale of the DES model ($\beta^*$ is a constant of the Wilcox turbulence model). This procedure is used in the present paper, and the dissipative term of the k-equation for the DDES model in the k-ω framework is given by

$$
D_k^{DDES} = \frac{\rho k^{3/2}}{\tilde l}. \qquad (9)
$$
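To illustrate how Eqs. (5)-(7) switch between the two modes, the following sketch evaluates the DDES length scale cell-wise. The wall distance, grid spacing, viscosities and velocity-gradient magnitude are placeholder inputs that would come from the flow solver; this is a minimal sketch, not the TASCOM3D implementation.

```python
import numpy as np

KAPPA = 0.41  # von Karman constant

def ddes_length_scale(d, delta, nu_t, nu, grad_u_sq, c_des=0.5):
    """DDES length scale l_tilde, Eqs. (5)-(7).

    d         : wall distance
    delta     : local maximum grid spacing
    nu_t, nu  : kinematic eddy and molecular viscosity
    grad_u_sq : (du_i/dx_j)(du_i/dx_j), summed over i and j
    """
    r_d = (nu_t + nu) / (np.sqrt(grad_u_sq) * KAPPA**2 * d**2)  # Eq. (6)
    f_d = 1.0 - np.tanh((8.0 * r_d)**3)                         # Eq. (5)
    return d - f_d * np.maximum(0.0, d - c_des * delta)         # Eq. (7)

# deep inside a boundary layer (r_d large, f_d -> 0): RANS mode, l_tilde -> d
print(ddes_length_scale(d=1e-4, delta=5e-4, nu_t=1e-4, nu=1.5e-5, grad_u_sq=1e8))
# far from walls (r_d small, f_d -> 1): LES mode, l_tilde -> C_DES * delta
print(ddes_length_scale(d=0.1, delta=1e-3, nu_t=1e-4, nu=1.5e-5, grad_u_sq=1e4))
```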
3 Compressible Turbulent Mixing Layer

Goebel and Dutton [2] performed several experiments in which two compressible flows, divided by a splitter plate, join downstream of the plate and form a mixing and shear layer (see Fig. 1). The flow fields have been examined by Laser Doppler velocimetry and Schlieren imaging.
Table 1 Inflow conditions for the compressible turbulent mixing layer test case according to the experiments of Goebel and Dutton [2] (δ = boundary layer thickness)

                  Mach number   u (m/s)   T (K)   p (Pa)   δ (mm)
top flow (1)      1.91          700       334     49000    2.9
bottom flow (2)   1.36          399       215     49000    2.5
Fig. 1 Sketch of the compressible turbulent mixing layer with mixing layer thickness b
The measurements provide average and rms velocities over a wide range of convective Mach numbers. The test channel has a height of 48 mm, a width of 96 mm, and the test section covers 450 mm downstream of the splitter plate. The splitter plate has a thickness of 0.5 mm at the trailing edge; its lower side is inclined by 2.5°, whereas its upper side shows no inclination. The present investigation is based on the experiment with flow Mach numbers Ma1 = 1.91 and Ma2 = 1.36, which result in a convective Mach number of 0.46. Table 1 summarizes the inflow conditions of the investigated test case. The mixing layer thickness b (see Fig. 1) is defined by the distance between the transverse locations where u = u1 − 0.1Δu and u = u2 + 0.1Δu, with Δu = u1 − u2. The simulation domain covers the full channel height and a test section of 350 mm in the downstream direction. For the inflow conditions used in this paper a similarity region is observed experimentally for positions x > 300 mm. Due to the high numerical effort only a section of the channel width is simulated, using periodic boundary conditions; the domain width in the span-wise direction is z = 3 mm. The grid consists of 1474 × 128 × 30 cell volumes and is clustered towards the splitter plate in the wall-normal direction (y+ < 1) and towards the tip of the plate. The boundary condition at the splitter plate is no-slip, while slip conditions are used at the upper and lower channel walls. At the inflow, steady-state boundary layer profiles (calculated by RANS simulations) are prescribed which match the incoming boundary layer thicknesses specified by Goebel and Dutton. Due to the supersonic flow, outlet values are extrapolated from the flow field. The adjustable parameter of the DDES approach is chosen as C_DES = 0.5. For the reconstruction of the interface values of the finite-volume scheme a fifth-order upwind biased scheme is used. The simulation first covers a time span of two residence times (t_r = domain length/u2) in order to eliminate initialization errors; statistical information is then gathered over a time span of four residence times. Figure 2 shows an instantaneous iso-surface of the vortex identification criterion Q = 0.5(Ω² − S²), where Ω denotes the vorticity and S the strain rate tensor.
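Both diagnostics used here, the mixing layer thickness b and the vortex identification criterion Q, are simple to evaluate in postprocessing. The sketch below assumes a mean velocity profile that increases monotonically from u2 to u1 across the layer; the tanh profile only stands in for real data and is not taken from the paper.

```python
import numpy as np

def mixing_layer_thickness(y, u):
    """b = distance between the transverse positions where u = u1 - 0.1*du
    and u = u2 + 0.1*du, with du = u1 - u2 (u monotonically increasing in y)."""
    u1, u2 = u.max(), u.min()
    du = u1 - u2
    return np.interp(u1 - 0.1 * du, u, y) - np.interp(u2 + 0.1 * du, u, y)

def q_criterion(grad_u):
    """Q = 0.5*(||Omega||^2 - ||S||^2) from the velocity gradient tensor;
    grad_u[..., i, j] = du_i/dx_j. Positive Q: rotation dominates strain."""
    S = 0.5 * (grad_u + np.swapaxes(grad_u, -1, -2))  # strain rate tensor
    W = 0.5 * (grad_u - np.swapaxes(grad_u, -1, -2))  # vorticity tensor
    return 0.5 * (np.sum(W * W, axis=(-1, -2)) - np.sum(S * S, axis=(-1, -2)))

# model profile between u2 = 399 m/s and u1 = 700 m/s (placeholder shape)
y = np.linspace(-0.02, 0.02, 401)
u = 0.5 * (700.0 + 399.0) + 0.5 * (700.0 - 399.0) * np.tanh(y / 0.004)
print(mixing_layer_thickness(y, u))      # thickness b in meters

# solid-body rotation about z: pure rotation, Q = 1 > 0
print(q_criterion(np.array([[0.0, -1.0, 0.0],
                            [1.0,  0.0, 0.0],
                            [0.0,  0.0, 0.0]])))
```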
Fig. 2 Iso-surfaces of the vortex identification criterion Q = 0.5(Ω 2 − S2 ) for the compressible turbulent mixing layer obtained by DDES. The color indicates the velocity magnitude
In the near field of the splitter plate the flow shows vortex shedding with quasi two-dimensional characteristics. However, after a short distance (≈ 50 mm) a rapid conversion to a highly unsteady and turbulent flow field is observed. This demonstrates that the numerical framework of TASCOM3D using the DDES approach is able to resolve the larger scales of turbulence in a compressible turbulent mixing layer. Figure 3 compares the measured similarity profiles with the results of the DDES calculations. The mean velocity profile is predicted satisfactorily, apart from some smaller deviations towards the top flow. The peak value of the rms velocity component in the stream-wise direction is under-predicted; however, the shape of the profile is reproduced well. For the wall-normal rms velocity components the agreement between simulation and experiment is very good. Similar observations hold for the kinematic Reynolds stresses u'v': the development of the stresses and the peak value are predicted well by the simulation. The discrepancies in the rms velocity components and Reynolds stresses can be attributed to several causes. Studies varying the modeling parameter C_DES [16] showed that the effect of this parameter on the rms velocity components is limited. Therefore the grid spacing and the domain width in the span-wise direction probably have to be increased to obtain better results. Furthermore, the steady-state inflow conditions may be inadequate for the current investigation, and unsteady inflow conditions might improve the results as well.
Fig. 3 Similarity profiles of mean velocity (top left), rms velocity profiles in stream-wise direction (top right), rms velocity profiles in wall-normal direction (bottom left) and Reynolds stresses (bottom right) for experimental data and DDES results, respectively
4 Performance Analysis

In the following the performance of TASCOM3D on the NEC SX-9 and the Cray XE6 system is investigated. Note that so far no optimizations regarding the performance on scalar processors have been carried out. Hence the aim of this analysis is, on the one hand, to assess the performance on both systems and, on the other hand, to detect problems and optimizations needed on the Cray XE6 system. The test case for the present investigation is a reacting air flow with 13 species and 32 reaction steps in a cubic domain of dimension 2π × 2π × 2π. The grid resolution varies between 32 × 32 × 32 and 256 × 256 × 256 volumes depending on the specific test case. Besides the performance measurements for a single CPU, the performance with parallelization by domain decomposition (using MPI) is investigated. Two types of scaling procedures are applied: strong and weak scaling.
4.1 Single CPU Performance

Table 2 summarizes the five most time consuming subroutines of TASCOM3D and their performance key data on the NEC SX-9, using a grid with a resolution of 128 × 128 × 128 volumes. The subroutines PROP and REACTION perform calculations on the right hand side (RHS) of the set of equations and only require local data of each volume. The subroutines LINE3D, UFSWEEP and LFSWEEP are part of the implicit left hand side (LHS) of the solver and require data from neighboring cells. The LHS is solved using an implicit lower-upper symmetric Gauss-Seidel (LU-SGS) [7–10] algorithm. The resulting data dependency makes the algorithm unvectorizable if the implicit lower and upper sweeps are performed in the i, j, k-directions. The chosen solution method eliminates these data dependencies by sweeping along hyperplanes in a lower and an upper solution step through the computational domain. The hyperplanes on the structured i, j, k-ordered grid are defined by i + j + k = constant and have to be predefined and stored in a list vector at the start of the simulation; hence indirect addressing is required in the solution steps of the LHS (a sketch of this construction is given below). All subroutines listed in Table 2 show very good vector operation ratios (99.29%–99.96%) and vector lengths (250–256). However, the MFLOPS achieved differ considerably: the performance varies between 10510 and 12483 MFLOPS (10.3%–12.2% of peak performance) for the subroutines of the RHS and between 2633 and 5449 MFLOPS (2.6%–5.3% of peak performance) for the subroutines of the LHS. The large performance differences between these subroutines are explained by the bank conflicts (per iteration): while the subroutines PROP and REACTION show only minor conflicts (0.0004–0.007), these values increase significantly for the subroutines of the LHS (0.1861–0.1982). The bank conflicts are assumed to result from the indirect addressing required for the hyperplanes, which probably causes memory latencies. The minimization of bank conflicts, and hence an increase in performance, is still an open task and is being investigated further. In total, TASCOM3D reaches 7692 MFLOPS (7.5% of peak performance), a 99.29% vector operation ratio and an average vector length of 218.5 for the current test case. In addition to the single CPU performance analysis on the NEC SX-9, an identical analysis is performed on the previous vector-processor based HPC system, the NEC SX-8. The corresponding results are summarized in Table 3. The theoretical peak performance of a NEC SX-9 CPU is 102.4 GFLOPS, which is an increase by a factor of about 6 compared to a NEC SX-8 CPU (16 GFLOPS).
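As referenced above, a minimal sketch of how such hyperplane list vectors can be built; the actual list-vector layout in TASCOM3D is not given in the paper, so this is only an illustration of the idea.

```python
import numpy as np

def hyperplane_lists(ni, nj, nk):
    """Group the cells of a structured (i, j, k) grid into hyperplanes
    i + j + k = const. All cells of one plane can be updated independently
    in the LU-SGS sweeps, so the inner loop vectorizes; data dependencies
    remain only between consecutive planes. Returns one array of flat cell
    indices per hyperplane (the 'list vector')."""
    i, j, k = np.meshgrid(np.arange(ni), np.arange(nj), np.arange(nk),
                          indexing="ij")
    plane = (i + j + k).ravel()
    flat = np.arange(ni * nj * nk)
    return [flat[plane == p] for p in range(ni + nj + nk - 2)]

planes = hyperplane_lists(4, 4, 4)
# plane sizes = achievable vector lengths: 1, 3, 6, 10, 12, 12, 10, 6, 3, 1
print([len(p) for p in planes])
```

The plane sizes grow from 1 at the domain corners to a maximum in the middle, which is why smaller blocks per CPU imply shorter hyperplanes and, on the Cray XE6, better cache behavior (see Sect. 4.2).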
Table 2 Summary of performance data using the NEC SX-9 for the most important subroutines

Subroutine   Time    MFLOPS   Vec. oper. ratio   Av. vec. length   Bank conflicts   Quota peak perform.
PROP         21.8%   10510    99.45%             256               0.14             10.3%
LINE3D       18.5%   5449     99.59%             250               59.46            5.3%
REACTION     15.7%   12483    99.39%             256               2.14             12.2%
UFSWEEP      11.0%   2633     99.96%             250               56.25            2.6%
LFSWEEP      11.0%   2653     99.95%             250               55.84            2.6%
TOTAL        100%    7692     99.29%             218.5             182.1            7.5%
Table 3 Comparison of performance data using NEC SX-9 and NEC SX-8 for the most important subroutines

Subroutine   MFLOPS            Quota peak perform.    Speed up
             SX-9      SX-8    SX-9       SX-8
PROP         10510     7082    10.3%      44.3%       1.48
LINE3D       5449      2143    5.3%       13.4%       2.54
REACTION     12483     5477    12.2%      34.2%       2.27
UFSWEEP      2633      1198    2.6%       7.5%        2.2
LFSWEEP      2653      1203    2.6%       7.5%        2.19
TOTAL        7962      3953    7.5%       24.7%       2.01
Table 4 Summary of performance data using the Cray XE6 for the most important subroutines

Subroutine   Time    MFLOPS   Cache hits   Quota peak perform.
LINE3D       20.4%   306.15   84.2%        3.8%
REACTION     18.4%   657.25   97.7%        8.2%
UFSWEEP      13.7%   122.87   54%          1.5%
LFSWEEP      13.6%   123.24   53.9%        1.5%
PROP         11.8%   789.85   97.9%        9.9%
TOTAL        100%    406.94   89.1%        5.1%
However, a comparable speed-up is not observed in a practical simulation. While the quota of peak performance achieved ranges between 2.6% and 12.2% on the NEC SX-9, much higher quotas are reached on the NEC SX-8 (7.5%–44.3%). Hence the total speed-up of TASCOM3D on a single NEC SX-9 CPU compared to a NEC SX-8 CPU is only a factor of two and does not scale according to the theoretical peak performance values. Table 4 summarizes the performance data achieved on the Cray XE6 system. Due to the limited memory of a single node, only a grid with a resolution of 32 × 32 × 32 volumes could be used for the single CPU performance investigation. The subroutines with the highest computational effort are the same as on the NEC SX-9, and similar observations are made concerning the performance: the subroutines requiring local data only (PROP and REACTION) perform well (657.25 and 789.85 MFLOPS, i.e. 8.2% and 9.9% of peak performance), while the subroutines of the LHS perform worse (122.87–306.15 MFLOPS, i.e. 1.5%–3.8% of peak performance). The reason for the lower performance is again found in the memory access (cache hits). The cache hit ratios (i.e. how often the cache contains the requested data) are high for the RHS subroutines (97.7%–97.9%), whereas the LHS subroutines show far fewer hits (53.9%–84.2%). In total TASCOM3D reaches 406.9 MFLOPS, 89.1% cache hits and 5.1% of peak performance for the current test case on the Cray XE6 system.
4.2 Scaling Performance

In the following the scaling of TASCOM3D on the different HPC systems is investigated. As mentioned above, two types of scaling procedures are applied (see Fig. 4). In the case of strong scaling the total number of cell volumes is kept constant while the number of CPUs is increased, i.e. the number of volumes per CPU decreases. In the case of weak scaling the number of cell volumes per CPU is kept constant while the number of CPUs is increased, i.e. the total number of volumes increases. Table 5 summarizes the corresponding block sizes for strong and weak scaling. The investigations cover between 1 and 64 CPUs on the NEC SX-9 and between 4 and 512 CPUs on the Cray XE6; due to the limited memory on the Cray XE6 the simulations there start with 4 CPUs. Figure 5 shows the speed-up and size-up on the NEC SX-9 and Cray XE6, respectively, and Table 6 summarizes the performance key data for the minimum and maximum numbers of CPUs used for weak and strong scaling on both systems. Note that in simulations with more than one CPU, additional subroutines are required to provide and process data for the MPI communication, which increases the computational costs compared to a single CPU simulation. In the case of the NEC SX-9 and weak scaling, a size-up by a factor of 50 is obtained using 64 CPUs.
Fig. 4 Sketch of the strong (top) and weak (bottom) scaling procedures
Table 5 Test matrix for the investigation of scaling performance using the NEC SX-9 and Cray XE6 systems, respectively

No. CPUs   Weak scaling           Strong scaling           NEC SX-9   Cray XE6
1          1 · (32 × 32 × 32)     1 · (128 × 128 × 128)    yes        no
2          2 · (32 × 32 × 32)     2 · (64 × 128 × 128)     yes        no
4          4 · (32 × 32 × 32)     4 · (64 × 64 × 128)      yes        yes
8          8 · (32 × 32 × 32)     8 · (64 × 64 × 64)       yes        yes
16         16 · (32 × 32 × 32)    16 · (32 × 64 × 64)      yes        yes
32         32 · (32 × 32 × 32)    32 · (32 × 32 × 64)      yes        yes
64         64 · (32 × 32 × 32)    64 · (32 × 32 × 32)      yes        yes
128        128 · (32 × 32 × 32)   128 · (16 × 32 × 32)     no         yes
256        256 · (32 × 32 × 32)   256 · (16 × 16 × 32)     no         yes
512        512 · (32 × 32 × 32)   512 · (16 × 16 × 16)     no         yes
Fig. 5 Speed-up/size-up using NEC SX-9 (left) and Cray XE6 (right) system, respectively
Hence the losses add up to 22% of the performance of a single CPU. Strong scaling is observed to be less efficient: a speed-up by a factor of 40 is obtained using 64 CPUs, and hence the losses add up to 36% of the single CPU performance. As the grid dimension per CPU is constant in the case of weak scaling, the majority of the losses arise from increased bank conflicts. In strong scaling, additionally the reduction of the average vector length due to the smaller grid dimensions per CPU lowers the vector operation ratio and decreases the performance further. The total performance for 64 CPUs amounts to 315.6 GFLOPS for weak scaling and 340.7 GFLOPS for strong scaling, respectively. On the Cray XE6, TASCOM3D shows very good scalability. For weak scaling a size-up by a factor of 500 is reached using 512 CPUs, which corresponds to only 2.3% losses. The key performance data show only a slight deterioration of the cache hits, which results in a small decrease in performance; MPI communication is not significant for the current test case. For strong scaling a super-ideal speed-up factor of 685 is observed using 512 CPUs. This behavior can be explained by analyzing the memory access listed in Table 6: with decreasing grid dimension per CPU the cache hits increase (by up to 3.8%), which leads to a significant improvement in performance (about 30% more MFLOPS in the case of 512 CPUs).
Table 6 Summary of key performance data with respect to scalability (weak and strong scaling) using the NEC SX-9 and Cray XE6 systems, respectively

                                weak scaling            strong scaling
                                min CPUs   max CPUs     min CPUs   max CPUs
NEC SX-9   vect. oper. ratio    98.61%     98.5%        99.3%      98.6%
           aver. vect. length   112        112          218.5      112
           bank conflicts       0.106      8.385        4.343      8.211
           MFLOPS/CPU           6332       4930         7962       5322
           max. total MFLOPS               315570                  340650
CRAY XE6   cache hits           91%        89.6%        87.7%      91.5%
           MFLOPS/CPU           298.5      286.5        262.6      342.3
           max. total MFLOPS               175269                  146676
Fig. 6 MPI communication, cache misses and total performance in MFLOPS for block sizes per CPU of 4³, 8³, 16³ and 32³ using the Cray XE6
The memory access seems to be strongly coupled with the size of the hyperplanes: when the size of the planes decreases, the cache hits increase. This effect is investigated further by simulations with different grid dimensions using 512 CPUs. Four different grid sizes are investigated, 256³, 128³, 64³ and 32³ volumes, which result in block sizes of 32³, 16³, 8³ and 4³ volumes on each CPU. Figure 6 shows the fraction of MPI communication and the cache misses as well as the total MFLOPS versus the different block sizes. The cache misses decrease strongly as the block size gets smaller (9% misses for 32³, 1.6% misses for 4³). However, with decreasing block size the time consumed by MPI communication rises: while only about 2.9% of the computational time is spent on MPI in the case of 256 × 256 × 256 volumes, about 32.2% is used in the case of 32 × 32 × 32 volumes. Regarding the total MFLOPS of each simulation, the block size of 8³ performs best and reaches about 68% more MFLOPS than the simulation using 32³ volumes per CPU. For the smallest block size the large effort for the MPI communication outweighs the gain in memory access, and the total performance decreases again.
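The loss figures quoted in this section follow directly from the measured speed-up and size-up factors relative to ideal linear scaling; a minimal sketch using the numbers reported above:

```python
def weak_scaling_loss(sizeup, n_cpus):
    """Weak scaling: the ideal size-up equals the number of CPUs."""
    return 1.0 - sizeup / n_cpus

def strong_scaling_loss(speedup, n_cpus):
    """Strong scaling: the ideal speed-up equals the number of CPUs."""
    return 1.0 - speedup / n_cpus

# NEC SX-9, 64 CPUs (the factors 50 and 40 are rounded in the text)
print(f"{weak_scaling_loss(50.0, 64):.0%}")      # ~22%
print(f"{strong_scaling_loss(40.0, 64):.1%}")    # ~36% in the text
# Cray XE6, 512 CPUs
print(f"{weak_scaling_loss(500.0, 512):.1%}")    # 2.3%
print(f"{strong_scaling_loss(685.0, 512):.1%}")  # negative, i.e. super-ideal
```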
5 Conclusion

A Delayed Detached Eddy Simulation (DDES) of a compressible turbulent mixing layer has been performed. The simulation shows vortex shedding with quasi two-dimensional characteristics in the near field of the splitter plate, followed by a rapid conversion to a highly unsteady and turbulent flow field. The numerical prediction obtained using DDES shows good agreement with the measurements. However, the rms velocity in the stream-wise direction is under-predicted, which probably results from an insufficient domain width and grid spacing in the span-wise direction; the use of unsteady inflow conditions might improve the results as well. In the second part of the paper the performance of TASCOM3D on the NEC SX-9 and the Cray XE6 system has been investigated. Parts of the algorithm using local data only perform well on both systems, whereas the subroutines dealing with the implicit part of the solver show lower performance on the NEC SX-9 as well as on the Cray XE6. The indirect addressing required for the LU-SGS solver causes bank conflicts on vector processors and reduces the cache hits on scalar processors, which is problematic on both systems. Additionally, the scalability of TASCOM3D on the different systems has been investigated. In the case of the NEC SX-9 the scaling losses range between 22% (weak scaling) and 36% (strong scaling) using 64 CPUs compared to single CPU performance. On the Cray XE6 system TASCOM3D shows a much better scalability, and the losses for weak scaling are only 2.3%. In the case of strong scaling even super-ideal scaling is observed; this behavior is explained by the increased cache hits due to the shorter hyperplanes required for the LU-SGS solver.
References

1. Gerlinger, P.: High-Order Multi-Dimensional Limiting for Turbulent Flows and Combustion, AIAA paper 2011-296, 2011.
2. Goebel, S.G., Dutton, C.D.: Experimental Study of Compressible Turbulent Mixing Layers, AIAA Journal, 28, pp. 538–546, 1991.
3. Wilcox, D.C.: Formulation of the k-ω Turbulence Model Revisited, AIAA Journal, 46, pp. 2823–2838, 2008.
4. Gerlinger, P.: Numerische Verbrennungssimulation, Springer, Berlin-Heidelberg 2005, ISBN 3-540-23337-7.
5. Gerlinger, P.: Investigations of an Assumed PDF Approach for Finite-Rate-Chemistry, Combustion Science and Technology, 175, pp. 841–872, 2003.
6. Liou, M.-S.: A Sequel to AUSM, Part II: AUSM+-up for all Speeds, Journal of Computational Physics, 214, pp. 137–170, 2006.
7. Shuen, J.S.: Upwind Differencing and LU Factorization for Chemical Non-Equilibrium Navier-Stokes Equations, Journal of Computational Physics, 99, pp. 233–250, 1992.
8. Jameson, A., Yoon, S.: Lower-Upper Implicit Scheme with Multiple Grids for the Euler Equations, AIAA Journal, 25, pp. 929–937, 1987.
9. Gerlinger, P., Brüggemann, D.: An Implicit Multigrid Scheme for the Compressible Navier-Stokes Equations with Low-Reynolds-Number Turbulence Closure, Journal of Fluids Engineering, 120, pp. 257–262, 1998.
10. Gerlinger, P., Möbus, H., Brüggemann, D.: An Implicit Multigrid Method for Turbulent Combustion, Journal of Computational Physics, 167, pp. 247–276, 2001.
11. Stoll, P., Gerlinger, P., Brüggemann, D.: Domain Decomposition for an Implicit LU-SGS Scheme Using Overlapping Grids, AIAA paper 97-1896, 1997.
12. Stoll, P., Gerlinger, P., Brüggemann, D.: Implicit Preconditioning Method for Turbulent Reacting Flows, Proceedings of the 4th ECCOMAS Conference, 1, pp. 205–212, John Wiley & Sons, New York 1998.
13. Gerlinger, P., Brüggemann, D.: Numerical Investigation of Hydrogen Strut Injections into Supersonic Air Flows, Journal of Propulsion and Power, 16, pp. 22–28, 2000.
14. Spalart, P.R., Jou, W.-H., Strelets, M., Allmaras, S.R.: Comments on the Feasibility of LES for Wings, and on a Hybrid RANS/LES Approach, Advances in DNS/LES, 1997.
15. Spalart, P.R., Deck, S., Shur, M.L., Squires, K.D., Strelets, M., Travin, A.: A New Version of Detached-Eddy Simulation, Resistant to Ambiguous Grid Densities, Theor. Comput. Fluid Dyn., 20, pp. 181–195, 2006.
16. Kindler, M., Gerlinger, P., Aigner, M.: Investigation of Hybrid RANS/LES Approaches for Compressible High Speed Flows, to be published, 2011.
17. Travin, A., Shur, M.L., Strelets, M.: Physical and Numerical Upgrades in the Detached-Eddy Simulation of Complex Turbulent Flows, Advances in LES of Complex Flows, 65, pp. 239–254, 2004.
18. Gerlinger, P., Stoll, P., Kindler, M., Schneider, F., Aigner, M.: Numerical Investigation of Mixing and Combustion Enhancement in Supersonic Combustors by Strut Induced Streamwise Vorticity, Aerospace Science and Technology, 12, pp. 159–168, 2008.
Computational Fluid Dynamics

Prof. Dr.-Ing. Siegfried Wagner
Because of limited space, only approximately two out of three of the papers proposed within CFD could be selected for inclusion in this book. All papers were reviewed by two reviewers. Besides the usual papers that deal with Direct Numerical Simulation (DNS) and Large Eddy Simulation (LES), with hybrid methods, i.e. combinations of DNS or LES with RANS (Reynolds Averaged Navier-Stokes) methods, or with finite volume solutions of the Navier-Stokes equations, usually second order accurate in space and time, a new trend towards higher order methods, i.e. Discontinuous Galerkin (DG) methods, is noticeable. In the past, higher order methods were applied within DNS and were restricted to low Reynolds numbers and usually to flows around simple geometries. Three highly efficient explicit Discontinuous Galerkin schemes for high performance calculations are presented in the paper by Altmann et al.; they allow Direct Numerical Simulations of isotropic turbulence and turbulent channel flow, Large Eddy Simulations (LES) of cavity flow and hybrid simulations of aeroacoustic phenomena. The authors demonstrate a good scaling behavior up to a thousand cores on HLRS clusters. In the paper by Harlacher et al., members of university institutes and of industry apply DG methods to simulate compressible turbulent flows with noise generation in complex geometries and use hybrid methods (a zonal coupling of LES and RANS) to enhance the efficiency of turbulence simulations. Rauschenberger et al. apply their 3D Free Surface code to simulate the freezing of supercooled droplets in the atmosphere. The numerical procedure does not use a model for the fluid-solid interactions and can thus be regarded as DNS. They reach a performance of 100 GFLOPS per CPU using 16 CPUs of the NEC SX-9 of HLRS and indicate that there is still room for further optimization of the performance. Sarkar et al. use the Lattice Boltzmann Method to find the best procedure for simulating micro-fluidic mixing. They obtain a moderate performance on the relatively small XC2 of SCCK (Steinbuch Centre for Computing Karlsruhe) but mention that their code scales almost linearly for up to 262144 cores on the BlueGene/P system JUGENE of the Jülich Supercomputing Centre.
Institut für Aerodynamik und Gasdynamik, Universität Stuttgart, Pfaffenwaldring 21, 70550 Stuttgart, Germany
Breuer and Alletto use LES to simulate particle-laden turbulent flows in complex geometries and at high mass loading. Their grids contain up to 17 million cells. A special collision detection procedure allows the computational effort to be reduced from O(Np²) to O(Np), where Np denotes the number of particles. Konopka et al. analyze film cooling in supersonic combustion ramjets (scramjets) by LES. The simulations on the NEC SX-9 of HLRS comprise 24 million grid points, were run on one node (16 CPUs) occupying a total memory of 31.98 GB, and reach a performance of 285.881 GFLOPS. Using their own code SPARC, Pritz et al. detect a very important source of self-excited combustion instabilities in a combustion system consisting of Helmholtz resonator type components. They show that LES can provide not only the damping ratio for the analytical model but also the eigenfrequency of the resonator. The computer time for one point of the excitation frequency amounts to about 400 hours using 108 Opteron processors of the HP XC4000 of SCC Karlsruhe. Riedeberger and Rist employ the so-called γ-Reθ model, a two-equation, correlation-based transition model using local variables, to predict the extent of the laminar regions on a rigid geometry of the dolphin skin swimming at Reynolds numbers between 5.5 · 10⁵ and 10⁷. They gain a very interesting insight into the potential for active laminarization due to the anisotropic structure of the dolphin skin. They use the NEC Nehalem cluster of HLRS and show a very good scaling behavior of the STAR-CCM+ solver if an additional control processor is provided for each running simulation. The important result is that, due to the modeling of both turbulence and transition within the RANS equations, it is possible to use HPC to address complex geometric setups, in contrast to DNS, where the focus is on understanding the physics on the basis of necessarily simple flow geometries. Klein et al. demonstrate the impact of corner flow effects on results obtained in wind tunnel tests by simulating the flow not only over the model but also including the wind tunnel wall effects, using the FLOWer and TAU codes of DLR. They perform scaling tests with TAU on the NEC 144 Rv-1 server configuration of HLRS with a grid size of 2.67 million points. Using 28 nodes, the total costs for one run are 2698 CPUh on the basis of 15000 iterations. Bensing et al. perform numerical simulations on the NEC Nehalem cluster of HLRS in order to investigate the hovering performance of an isolated helicopter rotor in ground effect, to optimize the shape of the rotor blade and to compute the flow around a complete helicopter in forward flight. The computational effort on the NEC Nehalem cluster, using a mesh with 25 million grid cells and 398 grid blocks as well as 160 parallel processors, amounts to 18 hours of wall clock time per rotor revolution. Complete helicopter aeromechanics simulations in forward flight require about 70000 CPU hours. Reinartz investigates the flow state of the boundary layer and of shock-wave/boundary-layer interactions at the intake of a scramjet. The computations are performed with the FLOWer code on the NEC SX-9 of HLRS with a memory requirement of approximately 80 GB. For a typical problem to converge, nearly 100000 iterations
are needed, whereas a single batch job performs approximately 12000 iterations and requires 10 hours of CPU time per node; afterwards the job is re-submitted to the batch queue. Reinartz tested the scaling behavior by performing 100 iterations using 4, 8 and 16 processors, with speedups of 1.88 and 3.4 when switching from 4 to 8 and from 4 to 16 processors, respectively. Starmann et al. investigate the unsteady effects on the droplet formation process due to rotor-stator interactions in steam power plants. The numerical procedure is based on the solution of the RANS equations, extended by a wet-steam specific nucleation and a droplet growth model. The computations are performed on the NEC Nehalem cluster using the commercial code Ansys CFX V12.1. The authors find that a high number of iterations is necessary to reach convergence and that the solutions of this unsteady flow oscillate. The simulation time is approximately 60 days when 48 CPUs are used. To contribute to the improvement of the safety analysis of light-water reactors, Zirkel and Laurien study possibilities to improve CFD methods for predicting the mixing of a stable stratification with a free jet. They find that the Reynolds stress model is capable of calculating non-isotropic Reynolds stresses but is not sufficient to simulate the mixing accurately; they had to add a non-isotropic turbulent scalar flux model to enhance the Reynolds stress model. They run the commercial CFX code on the NEC Nehalem cluster of HLRS using 64 CPUs and present interesting information: the wall clock time for one run is 22 h 40 min compared to a total CPU time of 14 h 45 min, which means that in their example only 65 percent of the total time is spent on the actual simulation and the rest on communication. A shortcoming in this year's reports is again the use of a relatively low number of CPUs. This situation has to be improved, since some of the HPC systems within the GCS (Gauss Centre for Supercomputing) are already massively parallel platforms and more will follow in the very near future. Although the three supercomputing centres in Jülich, München and Stuttgart already do a lot with respect to education and courses for their customers, there is still a need to continue this effort and perhaps even to increase it.
Discontinuous Galerkin for High Performance Computational Fluid Dynamics (hpcdg)

Christoph Altmann, Andrea Beck, Andreas Birkefeld, Florian Hindenlang, Marc Staudenmaier, Gregor Gassner, and Claus-Dieter Munz
Abstract In this paper we present selected ongoing computations performed on HLRS clusters. Three efficient explicit Discontinuous Galerkin schemes, suitable for high performance calculations, are employed to perform direct numerical simulations of isotropic turbulence and turbulent channel flow, large eddy simulations of cavity flows, as well as hybrid simulations of aeroacoustic phenomena. The computations were performed on hundreds to thousands of compute cores.
1 Introduction

The group of Prof. Claus-Dieter Munz is working in the field of high order discretization schemes for a wide range of continuum mechanics problems with a special emphasis on fluid dynamics. The main research focus lies on the class of Discontinuous Galerkin (DG) schemes. These schemes are gaining more and more popularity in the community due to their promising properties, such as: arbitrary high order of accuracy on unstructured grids, compact data layout, low dispersion and dissipation errors, suitability for advection dominated problems, the ability to handle non-conforming (grid and/or functional) approximations, and high parallel and computational efficiency. Investigations show that our DG based codes are very efficient on modern CPU architectures, as the ratio of memory accesses/loads to operation count is advantageous compared to traditional methods, especially when higher polynomial degrees are used for the approximation. Efficiency studies show that 9–17% of peak performance is sustained on a Nehalem CPU. To obtain an estimate of the performance, a script provided by Holger Berger is executed on a Nehalem node during simulation runtime.
Christoph Altmann · Andrea Beck · Andreas Birkefeld · Florian Hindenlang · Marc Staudenmaier · Gregor Gassner · Claus-Dieter Munz
Institut für Aerodynamik und Gasdynamik, Universität Stuttgart, 70569 Stuttgart, Germany, e-mail: altmann/beck/birkefeld/hindenlang/staudenmaier/gassner/[email protected]
Table 1 Specific CPU times for different codes and schemes

Method                                  Spec. CPU time for 3D NSE (Nehalem) [µs]
Compact Finite Difference (O6, NS3D)    4
Modal DG (p = 5, HALO)                  10
DGSEM (p = 5, STRUKTI)                  2
For example, the structured DG code sustained a performance of about 1.9 GFlop/s, resulting in about 17% of peak performance. To estimate the efficiency of the discussed schemes and to facilitate a comparison to other state-of-the-art DNS flow solvers, we compared the specific CPU times. This figure of merit is computed as follows: the required CPU time for a typical simulation run is divided by the overall number of degrees of freedom (DOF) and by the overall number of time steps (as well as by the number of intermediate stages, such as those used in the RK time integration scheme). Thus, the specific CPU time is the CPU time needed to evaluate the spatial operator exactly once, normalized by the overall number of spatial degrees of freedom. If the overall number of spatial degrees of freedom and the number of time steps of a new simulation are known, this specific CPU time can be used to estimate the total CPU resources required. Table 1 shows a comparison of such specific CPU times determined on the HLRS Nehalem cluster. We compare the specific CPU times of our unstructured code HALO and the structured code STRUKTI with an established IAG in-house compact finite difference code, NS3D, for the three-dimensional compressible Navier-Stokes equations. From our experience, the accuracy of these schemes is comparable for the same number of degrees of freedom and the same explicit time step size, which demonstrates the high efficiency of the DG scheme and the motivation for this project, in which we want to apply our DG framework to relevant three-dimensional test cases and evaluate its capability. The advantage of a DG-based framework compared to classic discretization methods lies in its inherent parallel efficiency. As an example, Fig. 1 shows the strong scaling of the structured DG code STRUKTI up to 512 processors. The special feature of this scaling test is that the domain consists of only 512 grid cells. Thus, the strong scaling goes down to one grid cell per processor when using 512 processors, while still reaching about 70% performance. The current report includes our results from DNS of compressible turbulent flows, preliminary aeroacoustic calculations and LES of a cavity flow.
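A small sketch of the specific CPU time figure of merit and of how it can be inverted to budget a new run; the five Runge-Kutta stages in the example are an assumption for illustration, not a documented setting of the codes.

```python
def specific_cpu_time(cpu_time_s, n_dof, n_steps, n_stages=1):
    """CPU time per spatial-operator evaluation and per DOF, in microseconds
    (the figure of merit of Table 1)."""
    return cpu_time_s / (n_dof * n_steps * n_stages) * 1.0e6

def estimate_cpu_time(spec_us, n_dof, n_steps, n_stages=1):
    """Invert the metric to estimate the CPU seconds of a planned run."""
    return spec_us * 1.0e-6 * n_dof * n_steps * n_stages

# hypothetical budget: DGSEM at 2 us (Table 1), 80e6 DOF, 40 000 steps,
# assuming a 5-stage Runge-Kutta scheme
t = estimate_cpu_time(2.0, 80e6, 40_000, n_stages=5)
print(f"{t / 3600.0:.1e} CPU hours")
```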
2 Description of Methods and Algorithms

We briefly describe the key features of the numerical schemes used for the calculations. In the group of Prof. Munz at IAG, the focus lies on the development of high order numerical schemes. The most prominent candidate is the discontinuous Galerkin (DG) method.
Fig. 1 Strong scaling up to 512 processors. The domain consists of 512 grid cells in total, thus the scaling is down to one element per processor
Discontinuous Galerkin (DG) schemes may be considered a combination of finite volume (FV) and finite element (FE) schemes. While the approximate solution is a continuous polynomial in every grid cell, discontinuities are allowed at the grid cell interfaces, which enables the resolution of strong gradients. The jumps at the cell interfaces are resolved by Riemann solver techniques, well known from the finite volume community. Due to the interior grid cell resolution with high order polynomials, DG schemes can use coarser grids. The main advantage of DG schemes compared to other high order schemes (finite differences, reconstructed FV) is that the high order accuracy is preserved even on distorted and irregular grids.
2.1 High Order Discontinuous Galerkin Solver HALO

A nodal discontinuous Galerkin scheme on a modal basis is implemented in the code HALO (Highly Adaptive Local Operator). The code runs on unstructured meshes composed of hexahedra, prisms, pyramids and tetrahedra. To maintain the high order accuracy at curved wall boundaries, a high order representation of the element boundaries is required; several techniques for the construction of curved element boundaries are used, see [2, 9]. The code is designed for the computation of unsteady flow problems and is fully parallelized with MPI [2]. The scheme is explicit and therefore each grid cell only needs direct neighbor information. This property allows a very efficient parallelization. The computational domain is decomposed by either ParMetis or, more recently, by the use of space-filling curves. A major disadvantage of an explicit DG scheme may be the global time step restriction needed to assure stability. This restriction depends on the grid cell size, on the degree of the polynomial approximation, on the wave speeds for advection terms and on the diffusion
coefficients for diffusion terms. In HALO, this drawback is overcome by a special time discretization, the so-called time-consistent local time stepping [3, 4]. The stability criterion is applied only locally to each grid cell, so each cell runs with its optimal time step. Hence, the computational effort is concentrated on the grid cells with small time steps. On meshes with strongly varying grid cell sizes as well as flow velocities, the number of operations is greatly reduced compared to an explicit global time stepping approach.
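The potential saving of local over global time stepping can be estimated from the distribution of admissible cell time steps alone. Below is a toy example; the actual time-consistent scheme [3, 4] additionally keeps neighboring cells consistent in time, which is not modeled here.

```python
import numpy as np

def local_timestepping_gain(dt_cell):
    """Ratio of cell updates per unit of physical time for global vs. local
    time stepping: globally every cell is advanced with min(dt), locally
    each cell i takes about 1/dt_i steps."""
    updates_global = dt_cell.size / dt_cell.min()
    updates_local = np.sum(1.0 / dt_cell)
    return updates_global / updates_local

# toy mesh: 10 strongly refined cells next to 990 coarse ones
dt = np.concatenate([np.full(10, 1.0e-4), np.full(990, 1.0e-2)])
print(f"speed-up ~ {local_timestepping_gain(dt):.1f}x")   # ~50x
```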
2.2 High Order DGSEM Solver STRUKTI

A very efficient variant of a discontinuous Galerkin formulation is the discontinuous Galerkin spectral element method (DGSEM). This special variation of the DG method is based on a nodal tensor-product basis with collocated integration and interpolation points on hexahedral elements, allowing for very efficient dimension-by-dimension element-wise operations. An easy-to-use structured code (STRUKTI) was set up to test the performance of this method, especially for large scale calculations. First promising results clearly indicate its ability to handle such tasks.
2.3 The Acoustic Solver NoisSol

For the simulation of flow-induced acoustic phenomena in complex domains, a high order discontinuous Galerkin based solver is very well suited: it combines the use of unstructured grids, a low sensitivity to grid quality, and low dispersion and dissipation errors. NoisSol is a solver for the linearized acoustic equations (Linearized Euler Equations and Acoustic Perturbation Equations [17]). It applies a discontinuous Galerkin scheme on triangular or tetrahedral grid cells. The time discretization employs either an ADER scheme (Arbitrary High Order Scheme using Derivatives) or a Taylor-DG scheme [6]. These schemes offer an arbitrary high order of convergence in space and time. NoisSol includes a local time stepping mechanism and allows MPI based parallel computations to reduce the overall wall-clock computation time.
3 DNS and LES of Isotropic Turbulence

The classical Taylor-Green Vortex problem constitutes the simplest flow for which a turbulent energy cascade can be observed numerically [20]. Starting from an initial analytical solution containing only a single length scale, the flow field undergoes a rapid build-up of a fully turbulent dissipative spectrum because of non-linear interactions of the developing eddies (Fig. 2).
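For reference, the single-length-scale analytic initial condition of the classical Taylor-Green Vortex can be set up in a few lines. The constants below are illustrative; the paper does not spell out its exact nondimensionalization.

```python
import numpy as np

def taylor_green_ic(n, U0=1.0, rho0=1.0, p0=100.0):
    """Classical Taylor-Green initial condition on the periodic box [0, 2*pi]^3."""
    x, y, z = np.meshgrid(*(np.linspace(0.0, 2.0 * np.pi, n, endpoint=False),) * 3,
                          indexing="ij")
    u = U0 * np.sin(x) * np.cos(y) * np.cos(z)
    v = -U0 * np.cos(x) * np.sin(y) * np.cos(z)
    w = np.zeros_like(u)
    p = p0 + rho0 * U0**2 / 16.0 * (np.cos(2*x) + np.cos(2*y)) * (np.cos(2*z) + 2.0)
    return u, v, w, p

u, v, w, p = taylor_green_ic(64)
print(0.5 * np.mean(u**2 + v**2 + w**2))   # mean kinetic energy = U0^2/8
```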
Fig. 2 Vorticity contours of the Taylor-Green Vortex colored by helicity after t = 1 s (left) and for fully developed turbulence after t = 9 s (right)
Due to its simplicity in both initial and boundary conditions, the Taylor-Green Vortex has been studied extensively in the literature and serves as a well-established reference and benchmark problem for direct numerical simulation (DNS) solvers and large eddy simulation (LES) subgrid scale models [11, 20, 21]. Our rationale for choosing this flow as an initial test case for the newly developed DGSEM solver was two-fold: firstly, the Taylor-Green Vortex allows for an easy validation of the code due to the readily available reference data, while at the same time being ideally suited for a structured grid approach. Secondly, the physics of the flow field and the absence of wall boundaries constitute an excellent testbed for first LES implementations. Our simulations were run at a relatively high Reynolds number of 5000, which causes a broad spectrum of turbulent scales and an associated well-developed energy spectrum with typical turbulent features. Due to the periodic, isotropic nature of the flow, periodic boundary conditions on all sides of the hexahedral domain (2π × 2π × 2π) were selected. Since the flow field is essentially incompressible, the flow Mach number was set to 0.1. The flow domain was discretized by up to 72³ sixth-order elements, resulting in a maximum total of 80 million DOF and a time step size of about 2.5 × 10⁻⁴ (overall about 40,000 time steps). The computations were run on 512 processors, requiring slightly over 30 hours to simulate the physical time frame from 0 to 10 seconds at the highest resolution. The sustained performance of this computation was about 0.94 TFlop/s. Figure 3 shows the spectra of the kinetic energy contained within the scales of the Taylor-Green Vortex at a simulated physical time of t = 9 s. These spectra were obtained by postprocessing the DGSEM solution by means of a parallel Fast Fourier Transform (FFT). Since this operation was very memory consuming, it was also run in parallel on up to 64 processors, requiring only about 1 minute of CPU time. The resulting data show a fast convergence of the spectra up to the smallest scales for the high resolution test cases, proving the DNS character of our simulation. In addition, the spectra and their features agree very well with the reference data published e.g. in [11], thereby validating our code and its postprocessing toolchain for this type of simulation [22].
Fig. 3 Resolution study of the kinetic energy spectra of the Taylor-Green Vortex at Re = 5000, showing the convergence to the typical turbulent spectrum with pronounced inertial subrange and dissipation region
As a next step and as part of ongoing research, an LES formulation for the modeling of the small scales of the flow field was implemented and tested. Preliminary results at a lower Reynolds number indicate the high potential of LES for correctly capturing the important features of isotropic turbulence.
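The shell-averaged spectra of Fig. 3 were produced with a parallel FFT; the following serial numpy sketch shows the same kind of postprocessing, assuming the DG solution has already been interpolated to a uniform periodic grid.

```python
import numpy as np

def energy_spectrum(u, v, w):
    """Shell-averaged kinetic-energy spectrum E(k) of a periodic velocity
    field sampled on a uniform n^3 grid."""
    n = u.shape[0]
    e_hat = 0.0
    for comp in (u, v, w):
        e_hat = e_hat + 0.5 * np.abs(np.fft.fftn(comp) / comp.size) ** 2
    k1 = np.fft.fftfreq(n, d=1.0 / n)          # integer wavenumbers
    kx, ky, kz = np.meshgrid(k1, k1, k1, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)
    E = np.bincount(shell.ravel(), weights=e_hat.ravel())
    return np.arange(E.size), E

# e.g. with the Taylor-Green field of the previous sketch, the energy sits
# in the lowest shells at t = 0 and spreads to high k as turbulence develops:
# kbins, E = energy_spectrum(u, v, w)
```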
4 DNS of Turbulent Channel Flows

As a first compressible turbulent test case, a channel flow is chosen, since many DNS and LES results are published in the literature for this type of problem, see e.g. [13] and [14]. At the upper and lower boundaries of the computational domain (length 0 ≤ x ≤ 4π, width 0 ≤ z ≤ 4/3π, height −1 ≤ y ≤ 1), isothermal wall boundary conditions are applied; all other boundaries are periodic. A forcing term is added to sustain the flow. The flow is initialized using a laminar velocity profile in x-direction, superimposed by random noise (a sketch of such an initialization is given after Table 2). The channel flow was simulated at Ma = 0.5 and Ma = 1.5, while the Reynolds number was kept between Re = 3000 and Re = 4880. The final grid consists of 286,720 fourth-order elements, resulting in a total of 35.84 million DOF. The average explicit time step size was Δt_avg = 1.194 × 10⁻⁴, resulting in about 8.74 × 10⁵ time steps for the overall simulation. The calculation was run on 1000 processors in parallel for 24 h total runtime. The overall performance of this computation was about 1.5 TFlop/s. Figure 4 shows a slice of the instantaneous x-component of the flow field at Ma = 1.5; turbulent structures as well as the boundary layers are clearly visible. For closer investigation, post-processing steps were taken. Figure 5 shows the x-velocity u and the Mach number Ma in the wall-normal direction.
Fig. 4 Slice of instantaneous x-velocity of the turbulent channel
Fig. 5 Left: Near-wall velocity profile for the Ma = 0.5, Re = 4880 case. Right: Space-time averaged flow quantities for the Ma = 0.5, Re = 4880 case

Table 2 Comparison of selected flow quantities with results from literature (uc, Tc, ρc taken at the channel mid plane, Tw, ρw taken at the wall)

            DNS      DNS (Coleman et al.)   LES (Lenormand et al.)
uc          1.166    1.168                  1.175
Tc/Tw       1.259    1.378                  1.393
ρc/ρw       0.792    0.723                  0.723
The profiles are typical for turbulent flows: velocity and Mach number show a distinct maximum at the channel center. The velocity maximum coincides well with results from the literature [13, 14], as indicated by Table 2.
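The initialization described at the beginning of this section, a laminar profile with superimposed random noise, can be sketched as follows; the parabolic profile shape and the 5% noise amplitude are assumptions, since the paper does not specify them.

```python
import numpy as np

def channel_initial_velocity(nx, ny, nz, u_max=1.0, noise=0.05, seed=42):
    """Laminar x-velocity over the channel height -1 <= y <= 1 plus random
    noise to trigger transition (profile shape and noise level assumed)."""
    rng = np.random.default_rng(seed)
    y = np.linspace(-1.0, 1.0, ny)
    u_lam = u_max * (1.0 - y**2)                    # parabolic laminar profile
    u = u_lam[None, :, None] + noise * u_max * (rng.random((nx, ny, nz)) - 0.5)
    u[:, 0, :] = 0.0
    u[:, -1, :] = 0.0                               # no-slip at the walls
    return u

u = channel_initial_velocity(128, 65, 48)
print(u.shape, float(u[:, 32, :].mean()))           # centerline mean near u_max
```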
5 Simulation of Cavity Flows

The goal of this computation is an LES of a cavity in three space dimensions at Ma = 0.8 and Re = 200,000 based on the cavity length, with a length L to depth D ratio of five (L/D = 5). The calculations were used in an applicability/robustness study.
Fig. 6 Time averaged volume ribbons or streamlines in y- and z-direction
If, for a DG method, the resolution is close to resolving all small scale phenomena, then the DG approximate solution is either continuous or contains only small jumps at the grid cell interfaces. For LES, however, the resolution is by definition too coarse to resolve all scales. Thus, jumps occur, as the smallest scale phenomena cannot be resolved by a continuous approximate solution within the given resolution. The DG method has the basic property of tolerating discontinuous approximations, but only to a certain extent. Therefore, these small scale and highly oscillatory phenomena have to be damped without harming or changing the overall large scale solution. This can be achieved by adding some amount of viscosity, triggered by the local resolution of scales. Cavity flows, in general, can be classified into two types based on their length to depth ratio L/D: the 'open cavity flow' and the 'closed cavity flow'. The latter occurs at ratios L/D ≈ 7–11 for sub- and transonic flows. The flow over such a cavity can be thought of as a flow over a backward facing step followed by a forward facing step, with a reattachment in the middle. Our cavity length is set, depending on the Reynolds number, to a non-dimensional L = 200 units; its depth and width are D = 40 units. At the upstream inflow boundary face, a disturbed boundary layer is applied, while the remaining boundary conditions are fairly standard subsonic outflow and isothermal wall conditions. In the following, three-dimensional calculations of this open cavity flow are shown. The presented computations were run on a mesh consisting of 130,060 cells with 5,282,480 DOF. The minimum explicit time step of this calculation was Δt_min = 4.11 × 10⁻⁴, while due to the local time stepping the average explicit time step could be increased to Δt_avg = 2.37 × 10⁻²; the local time stepping scheme thus speeds up the computation by a factor of about 57.6 (≈ Δt_avg/Δt_min). These computations were carried out on 320 processors in parallel, with a total runtime of 450.5 h. The overall performance of the computation was 200 GFlop/s. Figure 6 shows volume ribbons or streamlines of the time averaged velocity field. At the distinct position z/D = 0.55 (right), the separation of the flow occurs.
Fig. 7 Perspective view of the isosurface Λ2 = −0.0005, showing the eddy structures at time level t = 24,000 with color denoting the normal y-coordinate
Everything left of this imaginary line is drawn towards the cavity; everything on the other side is deflected away from it. When looking closer at this position, using 28 streamlines in the vertical direction for flow visualization, one can see that the lowermost particles are generally pushed away, while the flow above is still drawn into the cavity and towards the center. Figure 7 shows isosurfaces of the Λ2 vortex detection criterion. The sidewalls of the cavity confine the eruption of vortices to the centerline. Triggered by the cavity edge, a shear layer develops which starts to roll up into vortices at around x/D ≈ 1. Restricted by the depth and width of the cavity, these vortices cannot grow much in size. This results in a small vortical flow regime with only one hairpin-like vortex at the centerline. These results indicate that, in the context of LES modeling, the proposed approach can be classified as an implicit LES with an intuitive choice for the amount of added artificial viscosity. All computations remained stable. From a qualitative viewpoint, the obtained results are plausible and valid and will be used for comparisons in future work.
6 Hybrid Simulation of Aeroacoustic Phenomena

For the acoustic simulation of low Mach number flows, the use of a hybrid approach has proved useful. In this method, the acoustic simulation is carried out separately from the flow simulation. The coupling is achieved by acoustic source terms, which can be deduced from the computation of time-resolved equations (e.g. with LES or DES) or of time-averaged equations (RANS) with a synthetic source term approach (RPM/SNGR) [19].
Fig. 8 Single cylinder scattering, mean flow O1
Fig. 9 Single cylinder scattering, mean flow O4
In our recent research we focused on several issues:

1. Hybrid grid coupling with the Finite Difference acoustic solver PIANO [15, 18]
2. Better handling of source terms
3. Better handling of highly inhomogeneous mean flows
4. Wall boundary conditions
While the first two issues were the main topics of the proposal, the last two showed their relevance during the ongoing work. Larger test runs revealed serious problems with mean flow inhomogeneities close to the wall and the coupling interface. In particular, a very good representation of the vanishing mean flow velocity at the wall is necessary to ensure the validity of the wall boundary condition. The same must be fulfilled for the velocity source terms of the Acoustic Perturbation Equations. To satisfy these constraints, a space-dependent mean flow representation within each grid cell and a combined nodal/modal scheme [1, 5] have been implemented [16]. Another building block that proved helpful in suppressing aliasing-related disturbances is a modal filter as presented by Hesthaven and Warburton [1]. These improvements required extensive changes in NoisSol, which have already shown their operability and potential in small test cases [16]. Figures 8 and 9 show the results of a single cylinder scattering test case in 2D. The acoustic field is excited by a sinusoidally pulsating monopole source upstream of the cylinder in a potential flow coming from the right. While the calculation with a cell-constant mean flow (Fig. 8) suffered from severe instabilities due to a violation of the boundary condition, the improved simulation with the high order mean flow representation showed convincing results (Fig. 9). We still need a larger number of simulations to find suitable scheme and parameter combinations for the planned applications. These are being performed and analyzed at the moment; hence, large scale computations have not been performed yet.
7 Summary and Outlook

This report summarizes our efforts and results obtained during the first months of this project. We have adapted and applied our DG schemes to compute relevant test problems and evaluated the HPC capability of our framework. Several successful runs with up to 1000 processors in parallel clearly demonstrate its potential. Compressible flow problems, such as turbulent channel flow and isotropic homogeneous turbulence, are simulated with full spatial and temporal resolution, i.e. as direct numerical simulations. Another application is an implicit large eddy simulation, based on local resolution adapted viscosity, of a compressible flow past a cavity. Moreover, preliminary pure acoustic computations were set up and performed in preparation for the large scale application. In the next few months of this project the goal is to establish our DG based framework as a high performance compressible LES solver for complex geometries. To achieve this, several applications of our scheme with LES models to turbulent test cases will be performed and compared to our own DNS results generated within this project. As LES models for DG approximations of compressible flows are a rather new field of research with only few available results, those computations are essential to evaluate different LES strategies within our high order DG framework.

Acknowledgments. The research presented in this paper was supported in part by the Deutsche Forschungsgemeinschaft (DFG), amongst others within the Schwerpunktprogramm 1276: MetStroem and the Graduiertenkolleg 1095: Aerothermodynamische Auslegung eines Scramjet-Antriebssystems für zukünftige Raumtransportsysteme, and by the research projects ADIGMA/IDIHOM within the European Research Framework Programme.
References
1. J. S. Hesthaven and T. Warburton, Nodal Discontinuous Galerkin Methods: Algorithms, Analysis, and Applications, 1st Edition, Springer, Berlin, 2007.
2. F. Lörcher, Predictor Corrector DG, PhD thesis, University of Stuttgart, (2008).
3. F. Lörcher, G. Gassner and C.-D. Munz, A discontinuous Galerkin scheme based on a space-time expansion I. Inviscid compressible flow in one space dimension, J. Sci. Comp., Vol. 32, pp. 175–199, (2007).
4. G. Gassner, F. Lörcher and C.-D. Munz, A discontinuous Galerkin scheme based on a space-time expansion II. Viscous flow equations in multi dimensions, J. Sci. Comp., Vol. 34, pp. 260–286, (2007).
5. G. J. Gassner, F. Lörcher, C.-D. Munz and J. S. Hesthaven, Polymorphic nodal elements and their application in discontinuous Galerkin methods, Journal of Computational Physics, Vol. 228, Issue 5, (20 March 2009).
6. F. Lörcher, G. Gassner and C.-D. Munz, Arbitrary High Order Accurate Time Integration Schemes for Linear Problems, European Conference on Computational Fluid Dynamics, ECCOMAS CFD, (2006).
7. G. Gassner, F. Lörcher and C.-D. Munz, An explicit discontinuous Galerkin scheme with local time-stepping for general unsteady diffusion equations, J. Comput. Phys., Vol. 227, pp. 5649–5670, (2008).
8. G. Gassner, M. Dumbser, F. Hindenlang and C.-D. Munz, Explicit one-step time discretizations for discontinuous Galerkin and finite volume schemes based on local predictors, J. Comput. Phys., In Press, Corrected Proof, (2010).
9. F. Hindenlang, G. Gassner, T. Bolemann and C.-D. Munz, Unstructured high order grids and their application in discontinuous Galerkin methods, Conference Proceedings, V European Conference on Computational Fluid Dynamics ECCOMAS CFD 2010, Lisbon, Portugal, (2010).
10. S. Hickel, Implicit turbulence modeling for large-eddy simulation, PhD thesis, TU Dresden, (2005).
11. M. E. Brachet, Direct simulation of three-dimensional turbulence in the Taylor–Green vortex, Fluid Dynamics Research, Vol. 8, pp. 1–8, (1991).
12. P.-O. Persson and J. Peraire, Sub-Cell Shock Capturing for Discontinuous Galerkin Methods, Proc. of the 44th AIAA Aerospace Sciences Meeting and Exhibit, (2006).
13. G. N. Coleman, J. Kim and R. D. Moser, A numerical study of turbulent supersonic isothermal-wall channel flow, J. Fluid Mech., Vol. 305, pp. 159–183, (1995).
14. E. Lenormand, P. Sagaut and L. Ta Phuoc, Large eddy simulation of subsonic and supersonic channel flow at moderate Reynolds number, Int. J. Numer. Meth. Fluids, Vol. 32, pp. 369–406, (2000).
15. A. Birkefeld and C.-D. Munz, A Hybrid Method for CAA, Proc. of the 36. Jahrestagung für Akustik der Deutschen Gesellschaft für Akustik, DAGA, Berlin, (2010).
16. A. Birkefeld, A. Beck, M. Dumbser, C.-D. Munz, D. König and W. Schröder, Advances in the Computational Aeroacoustics with the Discontinuous Galerkin Solver NoisSol, Proc. of the 16th AIAA/CEAS Aeroacoustics Conference (31st AIAA Aeroacoustics Conference), Stockholm, (2010).
17. R. Ewert and W. Schröder, Acoustic perturbation equations based on flow decomposition via source filtering, Journal of Computational Physics, Vol. 188, pp. 365–398, (2003).
18. J. W. Delfs, M. Bauer, R. Ewert, H. A. Grogger, M. Lummer and T. G. W. Lauke, Numerical Simulation of Aerodynamic Noise with DLR's aeroacoustic code PIANO, Deutsches Zentrum für Luft- und Raumfahrt e.V., Institut für Aerodynamik und Strömungstechnik.
19. R. Ewert, Broadband slat noise prediction based on CAA and stochastic sound sources from a fast random particle-mesh (RPM) method, Computers & Fluids, Vol. 37, pp. 369–387, (2008).
20. S. Orszag, Numerical simulation of the Taylor-Green vortex, Computing Methods in Applied Sciences and Engineering Part 2, Lecture Notes in Computer Science, Vol. 11, pp. 50–64, (1974).
21. D. Fauconnier, Development of a Dynamic Finite Difference Method for Large-Eddy Simulation, PhD thesis, University of Gent, Belgium, (2009).
22. A. Beck, G. Gassner, I. Horenko, R. Klein and C.-D. Munz, Technischer Report im Rahmen des Schwerpunktprogramms 1276 (MetStroem), Lugano, Berlin, Stuttgart, (2011).
Highly Efficient and Scalable Software for the Simulation of Turbulent Flows in Complex Geometries
Daniel F. Harlacher, Sabine Roller, Florian Hindenlang, Claus-Dieter Munz, Tim Kraus, Martin Fischer, Koen Geurts, Matthias Meinke, Tobias Klühspies, Volker Metsch, and Katharina Benkert
Abstract This paper investigates the efficiency of simulations of compressible turbulent flows with noise generation in complex geometries. It analyzes two different approaches and their suitability with respect to quality as well as the turn-around times required in industrial DoE processes. One approach makes use of a high order discontinuous Galerkin scheme; here the efficiency of high order schemes on coarser meshes is compared to lower order schemes on finer meshes. The second approach is a 2nd order Finite Volume scheme, which employs a zonal coupling of LES and RANS to enhance the efficiency of turbulence simulation. The schemes are applied to three industrial test cases, which are described. Difficulties on HPC systems, especially load-balancing, MPI and IO, are pointed out and solutions are presented.
Daniel F. Harlacher · Sabine Roller
Applied Supercomputing in Engineering, German Research School for Simulation Sciences and RWTH Aachen University, 52056 Aachen, Germany, e-mail: [email protected], [email protected]
Florian Hindenlang · Claus-Dieter Munz
Institut für Aerodynamik und Gasdynamik, Universität Stuttgart, 70569 Stuttgart, Germany, e-mail: [email protected], [email protected]
Tim Kraus · Martin Fischer
Robert Bosch GmbH, 70049 Stuttgart, Germany, e-mail: [email protected], [email protected]
Koen Geurts · Matthias Meinke
Chair of Fluid Mechanics and Institute of Aerodynamics, RWTH Aachen University, 52056 Aachen, Germany, e-mail: [email protected], [email protected]
Tobias Klühspies · Volker Metsch
Trumpf Werkzeugmaschinen GmbH + Co. KG, 71254 Ditzingen, Germany, e-mail: [email protected], [email protected]
Katharina Benkert
Höchstleistungsrechenzentrum Stuttgart (HLRS), Nobelstr. 19, 70569 Stuttgart, Germany, e-mail: [email protected]
1 Introduction
Numerical simulation of turbulent flows in industrial settings is often realized with the Reynolds-averaged Navier-Stokes equations (RANS). These methods produce stationary, time-averaged solutions, but the physical processes relevant in industry, especially those concerning noise generation and aero-acoustics, are time dependent. Therefore, RANS methods are not suitable here. At the other end of the spectrum are Direct Numerical Simulations (DNS), which resolve all flow phenomena but are computationally too expensive. The combination of both is Large Eddy Simulation (LES), where larger structures (eddies) are resolved as in DNS, but everything below the mesh resolution is approximated by models as in RANS simulations. LES is the current state-of-the-art method used in research to combine accuracy with acceptable turn-around times. Nevertheless, even this approach is still computationally too expensive to be deployed in industrial development processes.

This is the starting point for the current project. We aim to enhance and advance the methods further to reach industrially acceptable turn-around times. The project aims to simulate real life industrial applications and to validate the efficiency of two different approaches. One of the methods is a high order Discontinuous Galerkin solver (HALO, developed by IAG and GRS), the other is a second order Finite Volume scheme (TFS, developed by AIA). Efficiency is sought in a numerical sense, i.e. quality vs. stability, as well as in parallel and single-CPU efficiency. Both codes require communication with direct neighbors only; the HALO code additionally uses local time steps, which means that global (all-to-all) communications are avoided. The acoustics is simulated in a closely coupled way in the HALO approach, i.e. flow and acoustics are calculated within one scheme, whereas in the weakly coupled approach the TFS flow simulation is coupled with a subsequent acoustic simulation with PIANO. Within the project, different issues have been tackled in the context of high scalability on the supercomputing facilities; especially memory consumption, load-balancing, MPI-IO, and hybrid parallelization are of interest.

Three main test-cases are investigated in the STEDG project. Proposed by industrial partners, they comprise unsteady turbulent flow features in complex geometries, requiring a high resolution both in space and time. For these flow regimes, commercial solvers reach their limit of applicability, especially regarding product design processes. The first test-case is a natural gas injector; here we focus on the direct simulation of aero-acoustics to predict the noise generated by the break-up of a turbulent supersonic jet. The second test-case investigates the flow through the nozzle and the kerf in a laser cutting configuration; the cut pattern depends strongly on the internal gas dynamics, since the high pressure ratios lead to strong shock interactions. The third test-case is an airfoil at high angle of attack, high Reynolds number and low Mach number.

The paper is structured as follows: first, the numerical schemes are introduced in Sect. 2, then the HPC aspects are described in Sect. 3. The three industrial test-cases are presented in Sects. 4, 5 and 6. In Sect. 7 turbulence modeling using high order schemes is described. The results are summarized and an outlook on future work is given in Sect. 8.
2 Description of Methods and Algorithms
We briefly describe the key features of the two numerical schemes. One is a high order discontinuous Galerkin (DG) scheme, the other a second order finite volume (FV) scheme with special implementations regarding LES computations. Finite volume schemes can be found in most commercial flow solvers, e.g. CFX or FLUENT, and are thus state of the art in industrial development processes.
2.1 High Order Discontinuous Galerkin Solver
Discontinuous Galerkin (DG) schemes may be considered as a combination of finite volume (FV) and finite element (FE) schemes. While the approximate solution is a continuous polynomial in every grid cell, discontinuities at the grid cell interfaces are allowed, which enables the resolution of strong gradients. The jumps at the cell interfaces are resolved by Riemann solver techniques, well-known from the finite volume community. Due to their interior grid cell resolution with high order polynomials, DG schemes may use coarser grids. The main advantage of DG schemes compared to other high order schemes (finite differences, reconstructed FV) is that the high order accuracy is preserved even on distorted and irregular grids. The discontinuous Galerkin scheme developed in the group of Prof. Munz is implemented in the code HALO (Highly Adaptive Local Operator). The code runs on unstructured meshes consisting of hexahedra, prisms, pyramids and tetrahedra. To maintain the high order accuracy at curved wall boundaries, a high order representation of the element boundaries is required; several techniques for the construction of curved element boundaries are used, see [3, 8]. The code is designed for the computation of unsteady flow problems and is fully parallelized with MPI [3]. The scheme is explicit, and therefore each grid cell only needs direct neighbor information. This property allows a very efficient parallelization. The computational domain is decomposed by either ParMetis or, recently, by the use of space filling curves. A major disadvantage of an explicit DG scheme may be the global time step restriction to establish stability. This restriction depends on the grid cell size, on the degree of the polynomial approximation, on wave speeds for advection terms, and on diffusion coefficients for diffusion terms. In HALO, this drawback is overcome by a special time discretization, the so-called time-consistent local time stepping [4, 5]. The stability criterion is only locally applied to each grid cell, so each cell runs with its optimal time step. Thus, the computational effort is concentrated on the grid cells with small time steps. On meshes with strongly varying grid cell sizes as well as flow velocities, the number of operations is greatly reduced compared to an explicit global time stepping approach. With this time-stepping approach, the scheme is highly local with respect to spatial as well as temporal behavior, and therefore highly suitable for massively parallel systems and high scalability. Additionally, due to the high order representation with polynomials of high degree, the scheme is expected to be less memory bandwidth bound than low order schemes.
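To make the idea concrete, the following minimal sketch illustrates the evolve condition behind such a time-consistent local time stepping loop. It is our own simplified illustration, not the HALO implementation: the element update `advance` is a stub, and the data layout is assumed for the example only.

```python
def local_time_stepping(elements, t_end):
    """Advance each element with its own admissible time step.

    elements: list of dicts with keys
      't'    -- current local time of the element,
      'dt'   -- local stable time step (from the local CFL condition),
      'nbrs' -- indices of the face neighbours.
    """
    def advance(i, dt):
        # stub for the actual DG update of element i over dt, which
        # would use space-time interpolated data from the neighbours
        pass

    finished = False
    while not finished:
        finished = True
        for i, e in enumerate(elements):
            if e['t'] >= t_end:
                continue
            finished = False
            t_next = e['t'] + e['dt']
            # evolve condition (cf. [4, 5]): an element may only step
            # if it does not overtake any neighbour's next time level
            if all(t_next <= elements[j]['t'] + elements[j]['dt']
                   for j in e['nbrs']):
                advance(i, min(e['dt'], t_end - e['t']))
                e['t'] = min(t_next, t_end)
```

In such a loop the work automatically concentrates in the cells with the smallest admissible steps, which is exactly the effect described above.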
2.2 Zonal RANS-LES Coupling on a 2nd Order Finite Volume Scheme
The flow solver of the Institute of Aerodynamics Aachen (TFS) solves the Navier-Stokes equations for three-dimensional compressible flows on a block-structured grid. A modified AUSM method as introduced in [12] is used for the Euler terms, which are discretized to second-order accuracy by an upwind-based approximation. For the non-Euler terms, a centered approximation of second-order accuracy is used. The temporal integration is done by a second-order accurate explicit 5-stage Runge-Kutta method with coefficients optimized for maximum stability [13]. The sub-grid scale modeling for the large-eddy simulations is based on an implicit ansatz, i.e., the MILES (monotone integrated LES) approach of Boris et al. [14]. For the RANS zones the Spalart-Allmaras turbulence model was chosen to close the Reynolds-averaged Navier-Stokes equations. To reduce computation time, the solution methods based on RANS and LES can be combined into a zonal method. The LES regions are used to resolve the leading and trailing edge regions where flow separation occurs, while the RANS zones are used for the attached flow regions. The schematics of the overlapping zones is shown in Fig. 1, where the values that have to be communicated back and forth between the two different approaches are indicated. In the overlapping region where the flow is directed from a RANS to an LES zone, synthetic eddies are introduced to accelerate the generation of coherent turbulent structures, using the method of Jarrin [19]. Furthermore, control planes are used to drive the solution towards the correct turbulence level in the LES domain according to Spille and Kaltenbach [18]. When the flow is coming from the LES into the RANS domain, the RANS side requires a definition of the eddy viscosity νt, the value of which is reconstructed from time and spatial averaging of the LES data (one possible reconstruction is sketched after this subsection). The aeroacoustic computations in the PIANO code are performed by solving the acoustic perturbation equations (APE) [15], derived from the viscous conservation laws by applying source filtering based on an eigendecomposition in Fourier/Laplace space. Only transport effects related to acoustic eigenmodes contribute to the operator on the left-hand side, while the remaining terms form the corresponding acoustic sources on the right-hand side. The APE offer stable linear acoustical propagation in arbitrary mean flows while taking into account convection and refraction effects. They have been successfully applied to several aero-acoustic problems, including trailing edge noise [16], high-lift airfoil noise [17], and others.
Fig. 1 Schematic overview of the overlapping regions between the RANS and LES regions. Here ρ denotes the density, V the velocity vector, p the pressure and νt the turbulent eddy viscosity, respectively
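The paper does not spell out the averaging formula for νt. As a purely illustrative sketch under that caveat, a common Boussinesq-based estimate reconstructs it from the resolved Reynolds shear stress and the mean shear of the LES data:

```python
import numpy as np

def reconstruct_nu_t(u, v, dudy, eps=1e-12):
    """Illustrative eddy-viscosity reconstruction from LES samples.

    u, v  : time series of instantaneous LES velocities at one point
            of the overlap region
    dudy  : mean shear at that point
    This Boussinesq-based estimate is our assumption; the actual TFS
    coupling may use a different averaging procedure.
    """
    up = u - u.mean()            # resolved velocity fluctuations
    vp = v - v.mean()
    uv = np.mean(up * vp)        # resolved Reynolds shear stress <u'v'>
    # Boussinesq ansatz: -<u'v'> = nu_t * dU/dy
    return max(0.0, -uv / (dudy + eps))
```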
3 HPC: Issues, Techniques, and Optimization
Several issues have been observed when analyzing the efficiency and scalability of the schemes. Not all of them were due to the application itself; some were caused by libraries like MPI or ParMetis. The problems and the work-arounds are described in this section.
3.1 Memory Problems/MPI-IB Buffer
The first issue is the influence of communication buffers on the available memory. In particular, the behavior of the MPI buffers of socket based communication protocols, as they appear in commodity clusters using an Infiniband interconnect, is investigated. Communication buffers are allocated at runtime for each process communicated with, and they decrease the memory left for the application. The number of allocated buffers on each core depends strongly on the communication patterns and can vary strongly from core to core. For example, OpenMPI by default uses a communication buffer of 512 KB per connection for each process. These buffers are not deallocated after usage, so a program with highly dynamic communication patterns (for example due to dynamic load-balancing) will suffer from a steady increase of buffers filling up the memory. Sophisticated MPI collectives have the potential to reduce the number of connections and with that the needed communication buffers. At the cost of runtime, memory usage could be reduced internally by the MPI library, but the application cannot rely on it. Figure 2 shows the behavior of MPI_ALLTOALL with small message sizes (here: one double precision float) in comparison to large messages (one thousand double precision floats). Only for small messages is a memory saving communication pattern used internally by MPI.
Fig. 2 Memory demand of MPI ALLTOALL call depending on message and communicator size
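The buffer growth described above can be probed with a small experiment. The sketch below (assuming mpi4py on a Unix system; the script name and the message-size argument are our own choices) records the resident-set-size high-water mark around an MPI_ALLTOALL:

```python
# Run with e.g.:  mpirun -np 256 python alltoall_mem.py 1000
import sys
import resource
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
n = int(sys.argv[1]) if len(sys.argv) > 1 else 1   # doubles per peer

def rss_kb():
    # peak resident set size of this process (KB on Linux)
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = rss_kb()
send = np.ones(comm.size * n, dtype='d')
recv = np.empty_like(send)
comm.Alltoall(send, recv)                          # triggers buffer setup
after = rss_kb()

growth = comm.gather(after - before, root=0)
if comm.rank == 0:
    print(f"peers={comm.size}, msg={n} doubles, "
          f"max RSS growth per rank: {max(growth)} KB")
```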
The restrictions imposed by the memory requirements of the MPI library result in the need for strategies to overcome these problems. Algorithms with all-to-all or many-to-all communication patterns have to be avoided. In our case, the graph partitioning library ParMetis (which contains all-to-all communication) was replaced by load-balancing on the basis of a space filling curve.
3.2 Load Balancing: Importance of Accurate Weight Prediction
Load balancing is essential for highly dynamic applications like those presented in this paper. This is not only due to the highly unsteady flow, but also to the specific demands of the chosen DG scheme in HALO, e.g. curved elements and local time-stepping. To ensure a proper load-balancing, heuristics were initially used to estimate the computational weight of each element. However, this was found to be too inaccurate for sophisticated balancing. To improve the load estimation, time measurements for each element were introduced, which capture all possible influences and provide highly accurate computational loads. The load is then distributed by a partitioning algorithm acting on a space-filling curve (SFC). The algorithm ensures an even distribution of the weights across the used cores under the constraint of not changing the order of the elements. Leveraging the implicitly given area to volume ratio of the SFC and combining it with the exact computational cost for each element results in a performance increase of 10% on 1024 cores over the conventional combination of heuristic weights and a graph based partitioning algorithm (a sketch of such an SFC-based partitioning is given below). The overhead of the time measurements certainly cannot be neglected, and it still has to be shown whether a systematic model can be derived from the gathered data. For now, the proposed process shows the best results even with the frequent measurements.
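A minimal sketch of such an SFC-based partitioning follows; it uses a Morton (Z-order) curve and a prefix-sum split of the measured per-element timings. The curve type and the splitting rule are our assumptions, since the paper does not specify them:

```python
import numpy as np

def morton3d(ix, iy, iz, bits=10):
    """Interleave the bits of integer cell coordinates to obtain a
    Morton (Z-order) space-filling-curve key."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key

def sfc_partition(weights, nparts):
    """Split elements, kept in SFC order, into contiguous chunks of
    (nearly) equal measured load; returns the part id per element."""
    prefix = np.cumsum(weights)
    target = prefix[-1] / nparts
    # element e goes to the part whose load interval contains it
    return np.minimum((prefix / target).astype(int), nparts - 1)

# usage: order the elements along the curve once, then rebalance with
# the per-element timings gathered during the previous time steps
coords = np.random.randint(0, 1024, size=(10000, 3))
order = np.argsort([morton3d(*c) for c in coords])
timings = np.random.rand(10000)            # measured cost per element
parts = sfc_partition(timings[order], nparts=1024)
```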
3.3 Hybrid MPI/OpenMP
To ensure efficiency on hierarchical machines with an increasing number of cores per node, hybrid parallelization using MPI and OpenMP is necessary. In the following, the results of cluster benchmarking using the flow solver TFS of the AIA Aachen are presented. Computations are performed on three different clusters, namely the in-house AIA cluster (Intel Xeon), the Nehalem cluster in Stuttgart and the IBM BlueGene in Jülich, and the speed-up of the computations with an increasing number of processes is compared. Figure 3 shows the speed-up when increasing the number of OpenMP threads per CPU. The speed-up from one to two OpenMP threads is relatively good on all clusters. Increasing the number further on the AIA cluster, however, does not increase performance as expected. All three clusters show good scaling up to 4 processes. The BlueGene has only 4 cores per CPU and achieves the best speed-up; however, it should be noted that the BlueGene CPUs are slower than the Nehalem or AIA Intel CPUs, resulting in a higher speed-up, while the Nehalem and AIA cluster cores are not fully loaded due to memory bandwidth limitations. The BlueGene cores do not run into the same bandwidth limits because of their lower CPU speed; this shows that the BlueGene has a more favorable byte to flop ratio. Figure 4 presents the speed-up for an increasing number of CPUs and thus an increasing number of MPI processes. The maximum number of cores per CPU is used. The reference for the speed-up is the computation time of 1 CPU with all cores working on one OpenMP thread each. For the Nehalem and AIA clusters, this results in 8 threads per CPU, while the maximum number of threads on the BlueGene is 4 per CPU. Since the inter-thread communication of the AIA cluster is slow, the internode bandwidth is not the bottleneck there, and the code scales perfectly with increasing CPU numbers. The BlueGene is known to have very fast internode communication, which is clearly visible, while the Nehalem cluster shows a drop for two nodes; after that, it scales up linearly as does the BlueGene.
Fig. 3 Speedup comparison for increasing number of OpenMP processes
Fig. 4 Speedup comparison for increasing number of MPI processes
3.4 MPI-IO with PIANO
All test cases presented in this section have been performed on an IBM BlueGene/P. For data intensive applications like the present hybrid aeroacoustic prediction method, I/O is probably the most severe bottleneck on today's supercomputer architectures. For its I/O operations PIANO utilizes the HDF5 library, which offers an open data format standard supporting collective parallel I/O via MPI-IO. However, the strong scaling of the output of four double precision variables on a 256³ grid in Fig. 5 clearly reveals no speedup at all. Figure 6 indicates that the I/O bottleneck becomes dominant when scaling up at a fixed grid size while writing periodic snapshots of the computational domain every tenth time step.
Fig. 5 Strong scaling speedup of the output written by PIANO for a 256³ grid using the HDF5 I/O library
Fig. 6 Strong computation and output scaling speedup of PIANO for a 256³ grid using the HDF5 I/O library
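For reference, a collective snapshot write of this kind could look as follows with a parallel build of h5py on top of HDF5/MPI-IO. The file name, variable names and the 1D slab decomposition are illustrative only and do not reflect PIANO's actual layout:

```python
import h5py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
N = 256
# 1D slab decomposition along the first axis for simplicity
lo = comm.rank * N // comm.size
hi = (comm.rank + 1) * N // comm.size

with h5py.File('snapshot.h5', 'w', driver='mpio', comm=comm) as f:
    for name in ('rho', 'u', 'v', 'p'):        # four double variables
        dset = f.create_dataset(name, (N, N, N), dtype='f8')
        local = np.zeros((hi - lo, N, N))      # this rank's slab
        with dset.collective:                  # collective MPI-IO write
            dset[lo:hi, :, :] = local
```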
4 Industrial Test-Cases I: Gas Injection Nozzle
The setting of the simulation is given by the expansion of compressed natural gas into free space through an injection nozzle, as commonly used in the automobile industry. The focus of investigation is the resulting highly turbulent jet producing acoustic waves which propagate into the far field. Both experimental and numerical data are available for comparison [1, 2]. The inlet conditions at the nozzle exits result in a flow of four separated under-expanded jets, each dominated by a system of shock-cells (see Fig. 7). After the collapse of the individual shock-cells, these jets combine into a highly fluctuating jet flow with strong acoustic activity. To capture the acoustic wave generation of the jets separately as well as the flow and wave interaction, a simulation of the entire three-dimensional geometry is necessary. Due to the high flow speed, the sophisticated device geometry and the small size of the shock-cells, a very high resolution of the physics is required within the silencer geometry. Therefore this domain demands the largest part of the computational effort in the simulation. With a calculated Kolmogorov length of ∼10⁻⁷ m, a full direct numerical simulation of all turbulent scales is out of reach. As we are mainly interested in the resulting far-field aero-acoustics of this case, it is not required to fully resolve the Kolmogorov scale of turbulence [1] here. Even though the simulations are under-resolved, high quality aero-acoustic results are achieved.
4.1 Computational Domain and Simulation Parameters
The geometry, as depicted in Fig. 8, represents a silencer duct with a main diameter of 6.11 × 10⁻³ m. In total, the length of the geometry is 3.74 × 10⁻³ m. At the outlet of the silencer, a notch reduces the inner diameter to 5.95 × 10⁻³ m. At the base of the duct a step narrows the flow domain to a diameter of 2.26 × 10⁻³ m. In the bottom of the duct, four kidney shaped orifices are symmetrically positioned around
Fig. 7 Picture from a Schlieren optic measurement of the jet, showing the density gradient in streamwise direction
Fig. 8 Nozzle exit geometry
a flow separating truncated cone that arises in the middle of the silencer. These inlets are located at a radius of r = 1.71 × 10⁻³ m (measured from the middle axis of the duct to the averaged free-stream center). For the simulation, the orifices were extended by 0.05 × 10⁻³ m to define proper face geometries for the specific inlet boundary profiles resulting from a stationary simulation of the injector geometry [2]. Air (γ = 1.4, R = 288 J/(kg K)) enters the domain with a non-uniform normal velocity profile with mean value w = 563.1 m/s. The corresponding pressure and density distributions over the orifices, with mean values of p = 139600 Pa and ρ = 1.221 kg/m³, yield a mean Mach number of Ma = 1.4 and a mean temperature of 400 K. The ambient conditions are ρ∞ = 1.2 kg/m³, u∞ = v∞ = w∞ = 0.0 m/s and p∞ = 120000 Pa. The Reynolds number based on the width of an orifice is Re = 52000 (μ = 3.7 × 10⁻⁵ m²/s, Pr = 0.72). At all free flow boundaries, sponge layers based on explicit relaxation are employed to prevent spurious reflections of acoustic waves. A cut through the cylindrical computational domain is shown in Fig. 9. The 3D domain is subdivided into three zones according to the different computational needs originating from the varying flow phenomena in this complex setup. The first zone consists of the nozzle outlet region, where shock structures and the non-linear interaction of acoustic waves with the nozzle geometry and the flow have to be resolved. The second domain captures the expansion of the resulting jet into the far-field, and the third needs to resolve the transport of acoustic waves. The employed mesh has to account for these zones in order to reach satisfying performance. Table 1 gives an overview of the mesh properties in the three different domains. The different zones are connected by tetrahedral interfaces which coarsen the element size to match the appropriate cell sizes in the adjoining zones. The resulting mesh contains 2,566,528 hexahedra, 1,522,760 tetrahedra and 76,864 pyramids, which yields a total number of 4,166,152 elements. The solution is computed with a second order DG scheme using a local time stepping algorithm.
Fig. 9 Cut view of the used mesh. The three regimes are represented by different colors
Fig. 10 Averaged Mach number distribution on a cut plane

Table 1 Computational domains with mesh properties
Domain  Element type        No. of elements  Δt_avg (s)       Δt_min (s)
1       unstr. hexahedral   1.3 mio          0.515 × 10⁻⁶     0.7509 × 10⁻⁹
2       unstr. hexahedral   1 mio            0.9238 × 10⁻⁶    0.8868 × 10⁻⁹
3       unstr. hexahedral   200,000          0.3915 × 10⁻⁵    0.5661 × 10⁻⁸
With this approach, a total of 16,664,608 degrees of freedom has to be computed in the whole computational domain. To avoid instabilities caused by the shock structures near the nozzle exit, a shock capturing scheme based on the work of Persson and Peraire [11] is used. For the simulation of 2 × 10⁻³ s of physical time, the computation ran for 44 hours on 1024 cores.
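The quoted number of degrees of freedom is consistent with four modes per element (per variable), as a quick check shows: a second order scheme corresponds to polynomial degree N = 1, and a complete degree-1 polynomial basis in 3D has four modes.

```python
from math import comb

def dofs_per_element(N, dim=3):
    # complete polynomial basis of degree N in `dim` dimensions
    return comb(N + dim, dim)

n_elem = 2_566_528 + 1_522_760 + 76_864   # hexahedra + tetrahedra + pyramids
assert n_elem == 4_166_152
print(n_elem * dofs_per_element(1))       # -> 16664608
```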
4.2 Results
Flow field. According to Fig. 7, the flow field of the jet is dominated by shock structures at the nozzle exits and a subsequent combination of the four single jets into one turbulent expanding jet. The simulation predicts up to four shock cells with supersonic velocities, Fig. 10. This is in good agreement with the Schlieren measurements, showing that the mesh resolution in this domain is capable of resolving the physical effects. To describe the successive expansion in streamwise direction, the axial profile of the jet width δ0.03 is presented in Fig. 11. This quantity characterizes the distance from the jet axis to the location where the mean velocity in streamwise direction drops below 3% of the respective peak value. In Fig. 11, δ0.03 is plotted against the axis coordinate z, both referred to the radius r on which the kidney-shaped orifices are located. The prediction by the simulation matches the experimental data from Particle Image Velocimetry (PIV) very well. Thus, an exact representation of the jet development is provided by the used grid.
Fig. 11 Axial profile of the jet width δ0.03
Fig. 12 Contour plot of density distribution (rainbow scale from 0.8 to 1.3 kg/m³)
Acoustics. To evaluate the accuracy of the aeroacoustic prediction, the Sound Pressure Level (SPL) spectra at the emission angles of 45 and 90 degrees to the jet axis are compared with measurement data. The density contour plot in Fig. 12 shows the propagation of sound waves into the far field produced by the turbulent jet. As we are interested in sound of the audible frequency range, microphones with a sensitivity up to 20 kHz and a sampling rate of 96000 Hz were used in the experimental setup. The Fourier analysis of the results uses 4096 DFT (discrete Fourier transform) points, which yields a physical evaluation time of 43 ms. Additionally, the data is averaged over 10 measurement samples. In the simulation setup, 10 monitor points recording the pressure history in time were positioned on a half circle around the jet axis, corresponding to the defined emission angles. Due to the local time stepping, the data is available at non-equidistant time steps. As the FFT algorithm needs an equidistant representation of the time data, it is first interpolated to an averaged time step before being transformed into the frequency domain. An additional improvement of the FFT results is obtained by averaging over the data of the 10 monitor points for each emission angle, respectively. Because periodic simulation data is available for only 1 ms and a minimum of 5 wavelengths should be represented for reliable FFT data, the examined frequencies range from 5–20 kHz. As the microphones were positioned at 250 mm from the silencer exit, the simulation data is projected using the inverse square law for sound emission, assuming a point source at the silencer exit (the sketch below illustrates this evaluation chain). In Figs. 13 and 14 the resulting Sound Pressure Level spectra are shown. One can conclude that the simulation data agrees very well with the experiments. The overall sound pressure level (OASPL) is slightly over-predicted, by 3 dB in both emission directions. Due to the quite short physical simulation time (2 ms), these results are expected to improve when a time period of 5 ms is simulated.
Fig. 13 Sound Pressure Level spectrum at 45° emission angle
Fig. 14 Sound Pressure Level spectrum at 90° emission angle
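The evaluation chain described above (resampling of the non-equidistant monitor data, transformation to the frequency domain, inverse square law projection) can be sketched as follows. The helper names and the plain rectangular-window FFT are our simplifications, not the evaluation tool actually used:

```python
import numpy as np

def spl_spectrum(t, p, fs=96_000, p_ref=2e-5):
    """Sound pressure level spectrum from a monitor point time series.

    t, p : non-equidistant time samples from the local time stepping
    1. interpolate to an equidistant grid at sampling rate fs,
    2. FFT to the frequency domain,
    3. convert to SPL in dB re 20 uPa.
    """
    teq = np.arange(t[0], t[-1], 1.0 / fs)
    peq = np.interp(teq, t, p - np.mean(p))   # resample, remove mean
    spec = np.fft.rfft(peq) / len(peq)
    freq = np.fft.rfftfreq(len(peq), 1.0 / fs)
    prms = np.abs(spec) / np.sqrt(2.0)        # RMS amplitude per bin
    return freq, 20 * np.log10(np.maximum(prms, 1e-30) / p_ref)

def project_to_distance(spl, r_src, r_mic):
    """Inverse square law: shift the spectrum from radius r_src to the
    microphone radius r_mic (point source assumption)."""
    return spl - 20 * np.log10(r_mic / r_src)
```

In the same spirit, the spectra of the 10 monitor points per emission angle would simply be averaged before comparison with the measurements.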
5 Industrial Test-Cases II: Laser Cutting Nozzle
The development of laser cutting nozzles requires a large number of geometry variations. The investigation of cutting behavior and cut quality based on experiments is time consuming and expensive. Therefore, simulations of the gas dynamics in addition to experimental tests are necessary. The goal is an unsteady simulation, resolving the time-dependent phenomena which greatly influence the quality of the cut. The results shown below are preliminary work towards these simulations and were performed as steady-state computations to prove the principal capability of modern flow solvers to capture the flow phenomena within a certain accuracy. The simulation results are compared to experimental data. The setup considered consists of a nozzle geometry and a kerf as depicted in Fig. 15. A high-pressure gas flow is led into the nozzle and accelerated throughout the geometry. This leads to very high velocities and pressure levels. The outlet of the nozzle is located 1 mm from the kerf, so that the expansion of the flow interacts with the kerf as well as with the upper side of the workpiece and the nozzle itself. In the kerf, high Mach numbers and shocks are a result of the abrupt change of diameter in the mean flow.
5.1 Computational Domain and Simulation Parameters
The nozzle, depicted in Fig. 15, has an inlet to outlet ratio of 7 : 1 and a total length of 15.5 × 10⁻³ m. The complete computational domain is shown in Fig. 16. Nitrogen (γ = 1.4 and R = 296 J/(kg K)) enters at the top of the domain with a prescribed pressure of p = 2.2 × 10⁶ Pa and a velocity normal to the inlet surface of 0.0 m/s. The ambient conditions at the pressure outlet are given as ρ∞ = 1.2 kg/m³ and p∞ = 101325 Pa. The mesh used for the steady state analysis consists of 669,450 elements, simulated with the commercial flow solver Fluent. The simulation of the
Fig. 15 Experimental setup of the test-case
Fig. 16 Computational domain
half-section was performed using second order accuracy in space and a standard k–ω turbulence model. After 4000 iterations the necessary residual values were reached. The computational cost for the simulation is in the range of 70 cores.
5.2 Results
The characteristic flow phenomena are depicted in Fig. 17. One can observe a curved shock, a shock disc at the outflow of the kerf, as well as a boundary layer separation at the cutting front. Figure 17 also shows the measurement points of the experiment, which was conducted to ensure not only qualitatively but also quantitatively good results. Figure 18 shows the comparison of experimental and simulated results. The predictions using Fluent compare very well to the experimental results, apart from the positions of the shocks, which do not seem to be predicted completely accurately. In order to obtain unsteady flow results, the complete geometry needs to be simulated to account for the time-dependent turbulence. Much higher computational costs are expected, and therefore the suitability of Fluent as a simulation tool is questionable. Figure 19 shows the strong scaling behavior of Fluent, whose parallel efficiency drops below 65% at 32 processes. Thus, the use of HALO for the unsteady simulations is targeted; in particular, the shock capturing should improve when using this method.
6 Industrial Test-Cases III: High-Lift Configuration
The airfoil profile in this study is the research airfoil known as HGR-01. This airfoil is designed to have a mixed stall behavior of leading edge and trailing edge stall [20]. Specific to this airfoil is that the trailing edge separation moves upstream from
Fig. 17 Pressure contour plot of the symmetry plane (rainbow scale from 0 to 150 kPa)
Fig. 18 Comparison between experimental and predicted pressure values using Fluent
Fig. 19 Strong scaling of the half geometry on the HLRS cluster
the trailing edge before full stall occurs. An extensive database of experimental results is available. The chosen test configuration for this study has a Reynolds number of 0.6565 × 10⁶ and an angle of attack of 12° at the low speed Mach number of 0.15.
6.1 Zonal RANS/LES Computation Grid
Two separate LES zones are built around the leading and trailing edge areas where flow separation occurs. In Fig. 21 these LES zones are shown in red. Around the LES zones a RANS domain is defined, which is shown in black in the same figure and overlaps as required with the LES domain. In this overlap region the flow variables of the different turbulence model regions are transferred as described in Sect. 2.2. A sketch of the setup is shown in Fig. 20. The purpose of the applied zonal RANS/LES computations is to reduce calculation time and cost, while maintaining the same accuracy as achieved with a full LES. The most important quantities for airfoils at high angle of attack, where separated flow regions come into play, are the friction and pressure coefficients.
Fig. 20 Test configuration with flow lay-out
Fig. 21 Zonal RANS/LES grid consisting of 16 blocks and 13 × 10⁶ cells. The LES zones are plotted in red
Fig. 22 Visualization of the smooth transition of the Mach number contours
Fig. 23 Comparison of the pressure coefficient cP for a full LES and the zonal computation, and the experimental results [20]
Figure 22 shows the Mach number contour plots for the zonal RANS/LES computation. A smooth transition from the RANS to the LES domain is visible in the Mach number contours, where the black line indicates the outflow boundary of the RANS domain and the red line the boundaries of the LES domain. Furthermore, when zooming in, the laminar separation bubble is clearly visible, proving the ability of the applied zonal method to capture this phenomenon. Figure 23 presents the pressure distribution of the pure LES computation and the experiments, together with the pressure distribution of the zonal RANS/LES solution. It can be seen that the curve has a smooth and continuous transition between the LES and RANS domains. At the trailing edge, the zonal solution still shows pressure fluctuations due to the relatively small averaging time. The interface locations between RANS and LES are indicated by short vertical black lines in the pressure distribution. The pressure distribution shows that the size of the laminar separation bubble (LSB) is reproduced by the zonal solution as well; however, the pressure value in the laminar separation is slightly lower than in the full LES and the experiments. Further work on the boundary conditions in the overlapping region is being performed to reduce the existing small pressure offset going from the RANS domain into the LES domain in front of the leading edge.
Fig. 24 Taylor-Green vortex (Re = 5000, Ma = 0.1), time evolution of the vortical structures (isosurfaces of vorticity, left t = 1, right t = 9)
7 Turbulence Modeling Using High Order Schemes
The simulation of turbulent flows with DG methods is a relatively new topic, and the behavior of turbulence models like LES models has to be investigated thoroughly. Here we focus on the analysis of the Taylor-Green vortex [10], where a laminar-turbulent transition produces isotropic homogeneous turbulence by the consecutive break-up of large vortices into smaller ones, as shown in Fig. 24. We implemented a sub-grid-scale model, namely the standard Smagorinsky model. The sub-grid-scale viscosity is defined as μ_sgs^SM = (C_S Δ)² |S̃|, with the Smagorinsky constant C_S, the filter width Δ and the filtered strain rate tensor S̃. The resolution of a DG based approximation is determined by the size of the element Δx and furthermore by the number of internal DOF, i.e. by the polynomial degree N. Typically, the resolution is proportional to ∼Δx/N, which is used to determine the filter width of our LES discretization. The LES results of coarse Taylor-Green vortex simulations are shown in Fig. 25. We choose a Mach number Ma = 0.1 to compare our results with incompressible DNS calculations [9]. For the discretization, we choose N = 8 with 4³ elements. The filter is defined via L²-projection onto polynomial degree Ñ = 4. In a preliminary step, the effect of different Smagorinsky constants for different Reynolds numbers is shown. We kept the overall resolution constant to solely investigate the different model effects. The standard Smagorinsky constant C_S = 0.18 results in too much dissipation. Tuning the constant reveals that it is possible to find a suitable sub-grid-scale viscosity. However, the results demonstrate that this optimal constant depends on the simulated problem, as we get C_S = 0.13 for Re = 200 and C_S = 0.09 for Re = 400. More investigations of the different sub-grid-scale model aspects (Smagorinsky constant, filter, definition of filter width, higher Re numbers) are necessary. In Fig. 26 we plot the dissipation rate for a low order and a high order discretization without modeling and the result of the LES from Fig. 25. The low order simulation is over-dissipative, even for a very high resolution of 2.1 million DOF, whereas the high order simulation perfectly reproduces the DNS results. We list the computational cost of each simulation to show that a successful high order LES is very promising.
Fig. 25 LES of Taylor-Green vortex for different Smagorinsky constants CS
Fig. 26 Taylor-Green vortex (Re = 400, Ma = 0.1), dissipation rate for varying discretizations, with computational costs
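For illustration, the sub-grid-scale viscosity above can be evaluated as in the following sketch, with the filter width taken as Δ ≈ Δx/N as in the text; the magnitude convention |S̃| = sqrt(2 S̃ : S̃) is a common choice and our assumption here:

```python
import numpy as np

def smagorinsky_sgs_viscosity(grad_u, dx, N, C_S=0.18):
    """Sub-grid-scale viscosity mu_sgs = (C_S * Delta)^2 * |S|.

    grad_u : (..., 3, 3) array of velocity gradients du_i/dx_j
    dx     : element size, N : polynomial degree of the DG ansatz
    """
    delta = dx / N                                    # filter width ~ dx/N
    S = 0.5 * (grad_u + np.swapaxes(grad_u, -1, -2))  # strain rate tensor
    S_mag = np.sqrt(2.0 * np.einsum('...ij,...ij->...', S, S))
    return (C_S * delta) ** 2 * S_mag
```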
8 Summary and Outlook
The applications shown in this paper demand highly scalable and capable codes that can leverage today's many-core HPC machines. These machines will be the standard development machines in tomorrow's industry. The results of this paper show that numerical simulations are capable of producing high quality solutions for sophisticated geometries. However, the computational costs are still very high for DoE processes, in which up to hundreds of simulations are needed for one development cycle. To reduce them, knowledge from all fields of computer science, numerical mathematics and engineering has to be combined. Section 3 showed several optimization and scalability issues that have to be tackled and resolved. But with these alone, the necessary speed-up cannot be achieved. New numerical schemes with their own mathematical properties have to be developed, like zonal RANS-LES coupling (Sect. 2.2) or higher order schemes (Sect. 7). They have to be validated on industrial test-cases (Sects. 4–6), which requires good engineering knowledge to find e.g. suitable meshes and parameters that do not contradict performance and efficiency considerations. The current paper has shown that the necessary quality of the new methods is achieved not only for academic test cases, but for real life
applications as well. Optimization and scalability issues have been tackled. This development has to be continued in the future, bringing together the current developments with future applications. Acknowledgments. The project STEDG is funded by the German Federal Ministry for Education and Research (BMBF) in the call "HPC Software for scalable Parallel Computers". We also thank the Gauss Centre for Supercomputing (GCS), which provided us with the necessary resources on different HPC systems.
References
1. J. Utzmann, T. Schwartzkopff, C. Munz, and M. Dumbser, Heterogeneous domain decomposition for computational aero-acoustics, AIAA Journal, Vol. 44, pp. 2231–2250, (2006).
2. O. Schönrock, Aeroacoustic simulation using SAS-SST turbulence model in ANSYS CFX, Proceedings of Int. Conf. on Jets, Wakes and Separated Flows, ICJWSF-2008, (2008).
3. F. Lörcher, Predictor Corrector DG, PhD thesis, University of Stuttgart, (2008).
4. F. Lörcher, G. Gassner, and C.-D. Munz, A discontinuous Galerkin scheme based on a space-time expansion I. Inviscid compressible flow in one space dimension, J. Sci. Comp., Vol. 32, pp. 175–199, (2007).
5. G. Gassner, F. Lörcher, and C.-D. Munz, A discontinuous Galerkin scheme based on a space-time expansion II. Viscous flow equations in multi dimensions, J. Sci. Comp., Vol. 34, pp. 260–286, (2007).
6. G. Gassner, F. Lörcher, and C.-D. Munz, An explicit discontinuous Galerkin scheme with local time-stepping for general unsteady diffusion equations, J. Comput. Phys., Vol. 227, pp. 5649–5670, (2008).
7. G. Gassner, M. Dumbser, F. Hindenlang, and C.-D. Munz, Explicit one-step time discretizations for discontinuous Galerkin and finite volume schemes based on local predictors, J. Comput. Phys., in press, corrected proof, (2010).
8. F. Hindenlang, G. Gassner, T. Bolemann, and C.-D. Munz, Unstructured high order grids and their application in discontinuous Galerkin methods, Conference Proceedings, V European Conference on Computational Fluid Dynamics ECCOMAS CFD 2010, Lisbon, Portugal, (2010).
9. S. Hickel, Implicit Turbulence Modeling for Large-eddy Simulation, PhD thesis, TU Dresden, (2005).
10. M. E. Brachet, Direct simulation of three-dimensional turbulence in the Taylor–Green vortex, Fluid Dynamics Research, Vol. 8, pp. 1–8, (1991).
11. P.-O. Persson and J. Peraire, Sub-cell shock capturing for discontinuous Galerkin methods, Proc. of the 44th AIAA Aerospace Sciences Meeting and Exhibit, (2006).
12. M.-S. Liou and C. J. Steffen, A new flux splitting scheme, Journal of Computational Physics, Vol. 107, pp. 23–39, (1993).
13. M. Meinke, W. Schröder, E. Krause, and Th. Rister, A comparison of second- and sixth-order methods for large-eddy simulations, Computers and Fluids, Vol. 31, pp. 695–718, (2002).
14. P. Boris, F. F. Grinstein, E. S. Oran, and R. L. Kolbe, New insights into large eddy simulation, Fluid Dynamics Research, Vol. 10, pp. 199–228, (1992).
15. R. Ewert and W. Schröder, Acoustic perturbation equations based on flow decomposition via source filtering, Journal of Computational Physics, Vol. 188, pp. 365–398, (2003).
16. R. Ewert and W. Schröder, On the simulation of trailing edge noise with a hybrid LES/APE method, Journal of Sound and Vibration, Vol. 270, pp. 509–524, (2004).
17. D. König, S. R. Koh, M. Meinke, and W. Schröder, Two-step simulation of slat noise, Computers and Fluids, Vol. 39, No. 3, pp. 512–524, (2010).
18. A. Spille and H.-J. Kaltenbach, Generation of turbulent inflow data with a prescribed shear-stress profile, Third AFOSR Conference on DNS and LES, (2001).
19. N. Jarrin, S. Benhamadouche, D. Laurence, and R. Prosser, A synthetic-eddy-method for generating inflow conditions for large-eddy simulations, International Journal of Heat and Fluid Flow, Vol. 27, pp. 585–593, (2006).
20. R. Wokoeck, N. Krimmelbein, J. Ortmanns, V. Ciobaca, R. Radespiel, and A. Krumbein, RANS simulation and experiments on the stall behaviour of an airfoil with laminar separation bubbles, AIAA Paper AIAA-2006-0244, (2006).
A Computation Technique for Rigid Particle Flows in an Eulerian Framework Using the Multiphase DNS Code FS3D Philipp Rauschenberger, Jan Schlottke, and Bernhard Weigand
Abstract A new technique to simulate the motion of rigid particles was implemented in the in-house VOF-based code FS3D, with the aim of studying freezing processes of undercooled water droplets in the atmosphere. Particle deformation is determined, and the terminal velocities of a free falling sphere are compared to analytic results for validation. The computations were performed on the NEC SX-9 platform of the HLRS.
1 Introduction
The ITLR code Free Surface 3D (FS3D) shall be extended to simulate the freezing of supercooled droplets in the atmosphere. This requires the code to handle three phases: ice (solid), water and air (fluids). A prerequisite is the ability to treat the motion of rigid bodies in fluids. This method is presented here; it is based on the works of Patankar [6] and Sharma et al. [13]. The numerical technique does not use any model for fluid-solid interaction and can hence be used for direct numerical simulations (DNS). The idea is to treat the whole computational domain (including the particle domain) as fluid. A body force is formally introduced in the momentum equation to force rigid body motion. The translatory and angular momenta of the rigid body are computed by taking into account the control volumes occupied by the particle. The resulting rigid body velocity is then projected onto the velocity field in the particle domain. Sharma et al. [13] move the particle centroid according to its current translatory and rotatory velocity. In contrast, the method presented here convects the VOF variable defining the particle with the linear transport equation. The advantage is that existing features of FS3D like evaporation and future
Philipp Rauschenberger
Institut für Thermodynamik der Luft- und Raumfahrt, Universität Stuttgart, Pfaffenwaldring 31, 70569 Stuttgart, Germany, e-mail: [email protected]
developments like the freezing of water can be used and implemented in a coherent fashion. First, the mathematical formulation of the technique is given, followed by a description of the numerical method. Then, the results of two test cases are presented. The report concludes with a section on performance analysis and improvements on the NEC SX-9.
2 Mathematical Formulation
The whole computational region is treated as fluid in a first step. However, an additional condition, forcing the fluid region to perform rigid body motion, must be imposed, and so Carlson [3] introduced the term rigid fluid method. No model for fluid-particle interaction is necessary. The viscosity of the fluid representing a particle is the same as that of the surrounding fluid, and any model for the viscous behavior of the base fluid may be used; however, a Newtonian fluid is assumed here. Let Ω designate the whole computational domain (fluid and particle) and P(t) be the region occupied by a particle. For simplicity, only one particle will be considered, but the extension to multiple particles is straightforward. The condition of rigid body motion is represented by the familiar equation of mechanics:

u = U + ω × r    (1)

Here u is the velocity vector at a point of P(t), U and ω are the translatory and angular velocities relating to the particle centroid, and r is the vector pointing from the centroid to the respective point within the particle. In the fluid domain Ω \ P(t),

∇ · u = 0  in Ω \ P(t)    (2)

must hold. In the particle domain P(t), a supplementary condition (4) appears:

∇ · u = 0  in P(t)    (3)
D[u] = (1/2)(∇u + (∇u)ᵀ) = 0  in P(t)    (4)

Equation (4) demands no deformation in P(t) with D[u] = 0. This constraint also assures that the velocity field is divergence free and makes (3) redundant. Formally, the rigidity constraint gives rise to a tensor field L in the particle domain assuring rigid body motion. Thus, the stress tensor reads τ = −pI + L + S, where p is the pressure and I the identity matrix; the viscous stress tensor S = 2μ D[u] turns out to be zero due to (4). According to Patankar et al. [7], the rigidity constraint can be imposed with
∇ · (D[u]) = 0  in P(t)    (5)
D[u] · n_γ = 0  on ∂P(t)    (6)

Hence, the main problem is how to impose the rigidity constraint within the numerical method of FS3D.
3 Numerical Method
FS3D uses a finite volume method on a staggered Cartesian grid according to the MAC method by Harlow and Welch [5]. All scalar variables (e.g. f, ρ, p, T) are located at the cell centers, while the velocities u, v, w are stored at the centers of the cell faces. Convective terms are discretized by second order accurate upwind schemes, while diffusive terms are approximated by central difference schemes of second order accuracy. Fluxes are determined with classical Godunov type schemes. However, fluxes of the volume fraction f over cell faces are computed with the reconstructed interface plane provided by the PLIC algorithm of Rider and Kothe [8]. The numerical scheme to solve the momentum equations is based on the projection method by Bell et al. [1]. It basically consists of two steps. First, all terms except for the pressure gradient are treated by explicit schemes, yielding an intermediate velocity. In the second, projection step, the pressure field is calculated implicitly from the Poisson equation:

∇ · ( (1/ρ(f)) ∇p ) = (∇ · u) / δt    (7)

It enforces the solenoidal continuity equation, which is in fact a pure velocity condition due to incompressibility. A multigrid solver is used to solve the resulting linear system of equations. It is based on a coarse grid Galerkin approximation, where the efficient red-black Gauss-Seidel algorithm in a W-cycle scheme is applied as smoother. The numbers of pre- and post-smoothing steps are adjusted during runtime, and an overrelaxation of the Gauss-Seidel scheme is introduced depending on the convergence rate. More detailed information about the applied numerical procedure in FS3D can be found in [9] and [12]. In the rigid fluid method, the whole computational domain is treated as fluid. Solid and fluid are only distinct due to a density difference in Ω \ P(t) and P(t). The surface tension of the rigid particle is zero. Let û be an intermediate velocity field in Ω. Rigid body motion must be imposed in P(t). Another source term f_rigid is formally introduced, and rigid body motion is projected onto P(t):

u_rigid = û + (Δt/ρ(f)) f_rigid  in P(t)    (8)

Equations to compute f_rigid can be obtained from (5) and (6):
∇ · D[u_rigid] = ∇ · D[û + (Δt/ρ(f)) f_rigid] = 0    (9)
D[u_rigid] · n = D[û + (Δt/ρ(f)) f_rigid] · n = 0    (10)

In contrast, Patankar [6] proposed that

u_rigid = Û + ω̂ × r    (11)

must hold and that the linear and angular momentum must be conserved in P(t). The momenta can be determined from a simple integration over all particle cell volumes:

M Û = ∫_P(t) ρ(f) û dV    (12)
J ω̂ = ∫_P(t) r × ρ(f) û dV    (13)

Then, the velocity field ŭ accounting for rigid body motion in P(t) reads:

ŭ = û  in Ω \ P(t)    (14)
ŭ = u_rigid  in P(t)    (15)

Some remarks must be made about the computation of Û and ω̂ on a staggered grid. Each element of the translatory velocity Û may be computed separately on the respective momentum control volumes where the velocities u, v, w are defined. To do so, the exact mass in the momentum control volumes (mass of solid + mass of fluid) is determined with the PLIC-reconstructed planes in each cell. By this means, a no-slip condition is obtained at the interface of particle and fluid, and momentum is conserved. The elements of ω̂ are computed with a vector product of u and r (e.g. ω_x = r_y w − r_z v), where r points to the center of the control volume, while the velocities are defined on the interfaces. Hence, mass averages of the velocity (denoted u_mcv) are determined in the centers of the control volumes to compute r × u_mcv. The update of the particle position and orientation is done within the Eulerian framework, when the VOF variable f is advected at the next time step. The velocity field ensures rigid body motion due to the above adaptations; therefore, the rigid body is not treated as a Lagrangian particle as in other methods (e.g. Sharma et al. [13]). Rigid body motion is divergence free by definition, but there will be a divergence in the velocity field on ∂P(t) (i.e. the interface cells) due to the projection of the rigid body velocity onto the velocity field. Therefore, a pressure correction according to (7) must be conducted after having imposed the rigid body velocity. In return, the pressure correction disturbs the velocity field within the particle and particularly in the interface cells. This is no problem as long as the particle density is much higher than the fluid density, ρ_p/ρ_f ≫ 1, because then the solver adds most of the acceleration due to pressure to the fluid phase and the particle velocity is almost untouched.
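The momentum integration and projection steps (11)–(15) can be sketched as follows. This is a simplified illustration on a collocated grid; the actual FS3D implementation works on the staggered MAC grid with PLIC-based masses, and the 0.5 volume-fraction threshold for the projection region is our assumption:

```python
import numpy as np

def project_rigid_body(u, f, rho, xc, x):
    """Project rigid body motion onto the velocity field in P(t).

    u   : (nc, 3) intermediate velocity at cell centres
    f   : (nc,)  particle volume fraction per cell
    rho : (nc,)  density per cell
    x   : (nc, 3) cell-centre coordinates, xc : particle centroid
    """
    m = rho * f                                    # particle mass per cell
    M = m.sum()
    U = (m[:, None] * u).sum(axis=0) / M           # (12): translatory velocity
    r = x - xc
    L = (m[:, None] * np.cross(r, u)).sum(axis=0)  # angular momentum
    # inertia tensor J = sum_cells m * (|r|^2 I - r r^T)
    J = (m[:, None, None] *
         (np.einsum('ni,ni->n', r, r)[:, None, None] * np.eye(3)
          - np.einsum('ni,nj->nij', r, r))).sum(axis=0)
    omega = np.linalg.solve(J, L)                  # (13): angular velocity
    inside = f > 0.5                               # cells treated as rigid
    u = u.copy()
    u[inside] = U + np.cross(omega, r[inside])     # (11), (15)
    return u
```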
However, if ρ_p ≈ ρ_f, the particle might rather be deformed by the non-rigid-body velocity field. Two pressure corrections, each with a preceding rigid body correction, are done at each time step.
4 Results
4.1 Deformation
In this section, the rigid fluid method is tested with respect to the deformation of a rigid particle. The setup is an oblate ellipsoid in a uniform inflow (see Fig. 1). The lateral boundaries are set to be free slip boundary conditions. The outlet has a continuous (Neumann) condition, where an additional damping zone (grey zones in Fig. 1) avoids backflow [10]. The uniform inflow velocity on the left side, u∞ = 8 m/s, corresponds to a relative Weber number of We = ρ_g u∞² D_e / σ = 3.5, with D_e being the equivalent diameter of a sphere and the surface tension σ = 0.072 kg/s². The particle density is ρ = 998.2 kg/m³ and the surrounding fluid is air (ρ = 1.2045 kg/m³, μ = 18.2 μPa s). A water droplet of the same shape is computed for reference. The test cases are computed with different resolutions: 32·2ⁿ × 16·2ⁿ × 16·2ⁿ cells for the nth run. The resulting deformation is depicted in Fig. 2. The reference water droplet is computed with a resolution of 128 × 64 × 64 cells. It oscillates between an oblate and a prolate form. The water droplet shape is depicted in Fig. 2 at distinct times (light blue droplets). The maximum relative deviations from the initial surface and the resolutions of the ellipsoid semiaxes are given in Table 1. With the rigid fluid method, the deformation is reduced considerably already on the coarsest grid, although the ellipsoid fills only two cells in the x-direction. Admittedly, the ellipsoid keeps changing its shape throughout the considered period. Doubling the number of cells in each direction (blue curve) leads to a constant value
Fig. 1 Geometry and boundary conditions of an ellipsoid in steady inflow, semiaxes a = 5·10⁻⁴ m, b = c = 3·10⁻³ m; length L = 1.6·10⁻² m
Table 1 Resolution and maximum deviation of surface area

Mesh                  res. of 2a   res. of 2b, 2c   |A/A_0 − 1|_max
128 × 64 × 64, ref    4            24               39%
64 × 32 × 32          2            12               5.2%
128 × 64 × 64         4            24               3.2%
256 × 128 × 128       8            48               0.2%
Fig. 2 Deviation of surface area related to initial surface of an ellipsoid in steady inflow with the rigid fluid method in comparison with a water droplet
Doubling the cell numbers in each direction (blue curve) leads to a constant value of the surface area after some oscillations for times t < 0.2 s. These are due to the strong acceleration of the flow field, which is initially at rest. Once the flow has developed, the surface area is virtually constant. On the finest grid, the maximum relative deviation of the surface area is only 0.2%. Thus, the presented method is able to conserve the ellipsoid's shape if the particle is resolved with enough cells. The rigid ellipsoids are illustrated at time t = 0.1 s in Fig. 2. The lower resolution obviously leads to a thickening of the ellipsoid and a reduction of the two big semiaxes, i.e. a more spherical form.
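The quoted Weber number can be checked with a few lines of C. The snippet below assumes, as the text suggests, that D_e is the diameter of the volume-equivalent sphere of the ellipsoid; with the given semiaxes it reproduces We ≈ 3.5:

#include <math.h>
#include <stdio.h>

int main(void) {
    const double a = 5e-4, b = 3e-3, c = 3e-3;   /* ellipsoid semiaxes [m]   */
    const double rho_g = 1.2045;                 /* air density [kg/m^3]     */
    const double u_inf = 8.0;                    /* inflow velocity [m/s]    */
    const double sigma = 0.072;                  /* surface tension [kg/s^2] */
    double V  = 4.0 / 3.0 * M_PI * a * b * c;    /* ellipsoid volume         */
    double De = cbrt(6.0 * V / M_PI);            /* volume-equivalent sphere */
    double We = rho_g * u_inf * u_inf * De / sigma;
    printf("De = %.3e m, We = %.2f\n", De, We);  /* De ~ 3.30e-3 m, We ~ 3.5 */
    return 0;
}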
4.2 Terminal Velocity of a Free Falling Rigid Sphere

This section treats a rigid sphere falling freely in an unbounded fluid (see Fig. 3) with the following properties: gravitational acceleration g = 9.81 m/s², fluid dynamic viscosity μ = 1·10⁻³ kg/(m s), fluid density ρ_f = 998.2 kg/m³ and rigid particle density ρ_p = 1200 kg/m³. The boundary conditions are a no-flow condition at the left and right boundaries and slip walls (symmetry) on the other sides. Clift et al. [4] give a correlation to determine the theoretical terminal velocity.
Fig. 3 Geometry and boundary conditions of a free falling rigid sphere in an unbounded fluid: D = 5·10⁻⁴ m, channel width L = 4·10⁻³ m (L/D = 8)
It depends on the dimensionless diameter N_D, from which the Reynolds number can be determined:

N_D = C_D Re_T² = 4 ρ_f (ρ_p − ρ_f) g d³ / (3 μ²)    (16)

log₁₀ Re = −1.7095 + 1.33438 W − 0.11591 W²,   W = log₁₀ N_D    (17)
Equation (17) is valid in the range 73 < N_D ≤ 580 and 2.37 < Re ≤ 12.2. For this setup, N_D = 329.35 and Re = 8.23, and it follows that the terminal velocity is U_T = 0.01649 m/s. The influence of the walls (due to L/D = 8) on the terminal velocity is considered to be small, which allows a high number of cells per diameter. A stability problem arises: the terminal velocity experiences a sudden jump, as can be seen in Fig. 4. After the peak is reached, the velocity drops back to a steady state value. Figure 4 shows curves of the terminal velocity on the same grid but with different CFL numbers. The jump in velocity can be smoothed by lowering the time step and vanishes for CFL ≤ 0.1. It was discovered that the divergence of the velocity also experiences a sudden strong increase right before the velocity jump. By reducing the CFL limit, the increase in divergence is damped. The CFL limitation holds at each time step, but an additional restriction is only needed when the divergence exceeds a certain limit. The following time step limit is governed by the infinity norm of the divergence, ‖∇·u‖_∞, because it is the most restrictive one:
δt = f_div ‖∇ · u‖_∞⁻¹    (18)

Fig. 4 Terminal velocity of the sphere with stability problem

Table 2 Free falling rigid sphere: discretization parameters, relative error in terminal velocity and deformation

Mesh              f_div      res. of D   relative error in U_T   |A/A_0 − 1|_max
080 × 032 × 032   1.0·10⁻²   4           27.1%                   0.72%
160 × 064 × 064   5.0·10⁻³   8           15.9%                   0.83%
320 × 128 × 128   2.5·10⁻³   16          9.70%                   0.52%
640 × 256 × 256   2.5·10⁻³   32          6.30%                   0.28%
f_div is a factor decreasing with cell size. Results obtained with the above time step limit are depicted in Fig. 5. f_div as well as the relative error in terminal velocity and the deformation are listed in Table 2. The terminal velocities converge with decreasing cell sizes, differing by 6.7% from the analytic value on the finest grid. A grid refinement analysis according to Roache [11] yields an extrapolated relative error between the solution on the finest grid U_T,1 and the asymptotic one on an infinitely fine grid U_T,ext:

ε_ext = (U_T,ext − U_T,1) / U_T,ext = 2.6%    (19)
The fine grid convergence index is GCI = 3.4%. The error of U_T,ext compared to the theoretical solution is then ε_th = 4.1%.
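The correlation (16)–(17) can be evaluated directly; the following small C program, using only the material properties given above, reproduces the quoted values N_D = 329.35, Re = 8.23 and U_T = 0.01649 m/s:

#include <math.h>
#include <stdio.h>

int main(void) {
    const double g = 9.81, mu = 1e-3, rho_f = 998.2, rho_p = 1200.0, d = 5e-4;
    double ND = 4.0 * rho_f * (rho_p - rho_f) * g * d * d * d / (3.0 * mu * mu);
    double W  = log10(ND);                                      /* eq. (17) */
    double Re = pow(10.0, -1.7095 + 1.33438 * W - 0.11591 * W * W);
    double UT = Re * mu / (rho_f * d);        /* from Re = UT d / (mu/rho_f) */
    printf("ND = %.2f, Re = %.2f, UT = %.5f m/s\n", ND, Re, UT);
    /* prints: ND = 329.35, Re = 8.23, UT = 0.01649 m/s */
    return 0;
}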
Fig. 5 Terminal velocity of the free falling rigid sphere in an unbounded fluid with time step restriction
5 Performance Analysis

Switching from the NEC SX-8 to the NEC SX-9 at the High Performance Computing Center Stuttgart (HLRS) revealed severe performance issues of the FS3D code on the new platform. In particular, the speed-up of parallel versions proved to be intolerable. As reported by Weking et al. [15], computational performance on the NEC SX-8 is comparatively good. Vector lengths of 208 and vector operation ratios above 98% were achieved on a grid with 512 × 256 × 256 cells. The speed-up is 1.75, 2.81 and 3.85 computing on 2, 4 and 8 CPUs, respectively. Admittedly, these results are far from ideal. However, FS3D is a code for DNS of incompressible multiphase flow, and the numerical solution of the incompressible Navier-Stokes equations makes it necessary to solve a Poisson equation for the pressure. This results in a huge set of linear equations using up to 90% of the computation time. In FS3D, a multigrid solver (cf. P. Wesseling [16]) is used that essentially solves the problem on consecutively coarser grids, leading to faster convergence. The coarsest grid is obtained by successively dividing all three grid dimensions by two until one of them is no longer divisible without remainder. The number of divisions plus one is the number of levels n_l. The coarsest grid is solved with a direct solver (i.e. dgesv of the LAPACK library). On coarse grids, the vector lengths are quite small. Below a certain coarse grid size, there is no speed-up anymore when splitting up the computation over several CPUs/threads (in fact, a thorough examination has shown that using multiple threads on these coarse grids slows down the computation). Even worse, the multigrid solver smooths the levels in a W-cycle, which means that the coarser the grid, the more often the smoothing step has to be performed. A W-cycle for a grid with 16³ cells is shown in Fig. 6a.
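A minimal sketch of the level count just described (illustrative, not FS3D code):

/* Halve each grid dimension while all three remain even;
 * the number of levels is the number of divisions plus one. */
int multigrid_levels(int nx, int ny, int nz) {
    int nl = 1;
    while (nx % 2 == 0 && ny % 2 == 0 && nz % 2 == 0) {
        nx /= 2; ny /= 2; nz /= 2;
        ++nl;
    }
    return nl;   /* e.g. 512 x 256 x 256 -> 8 divisions, nl = 9 */
}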
Fig. 6 Multigrid W-cycle of a grid with 16³ cells
The key feature of the newly implemented parallelization is that the coarse grids are smoothed by one CPU only. This is realized with conditional OMP directives. Furthermore, coarsening is restricted to a certain limit, such that the direct solver treats a somewhat larger set of linear equations. The limit is probably architecture dependent. On the NEC SX-9 it was found to be 64 cells. This new restriction is depicted in Fig. 6b. Hence, using the multigrid solver inevitably leads to a higher serial fraction of the code than a classic solution method that solves the problem directly on the finest grid. Nevertheless, the multigrid solver leads to considerably faster convergence. The bottom line is that no ideal speed-ups can be expected from FS3D due to the large fraction of serial code in the multigrid solver. However, the finer the resolution (i.e. the more grid cells), the lower the serial fraction and hence the better the speed-up, because the finer grids then play a more important role. Typical numerical setups computed with FS3D involve single drops. The volume of fluid method (VOF) with piecewise linear interface reconstruction (PLIC) frequently implies loops that only do computations if the cell has a phase interface. Dynamic scheduling of the OMP do loops is used to obtain good load balance in these cases, and work arrays are assigned that save the indices of the interface cells prior to the actual computation. On each multigrid level, one smoothing operation is done with the Gauss-Seidel method. The structure of the pressure Poisson equation leads to an algorithm that accesses all six direct neighbors of an element of the pressure array p. In order to vectorize this code, the Red-Black ordering method is used. At each smoothing step, two loops run through the whole array successively. This approach is also shown by Soga et al. [14], who describe the implementation of the Red-Black ordering method on the NEC SX-9. In contrast to their approach, here a mask is defined that hides the halo cells. This allows the loop to pass through the array in 1D with stride 2, allowing for longer vector lengths (especially on coarse grids). As mentioned in [14], the compiler directive on_adb is used on array p in order to store it in the NEC SX-9 assignable data buffer (ADB) to reduce access time.
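The following C/OpenMP sketch illustrates the two measures just described. It is not the FS3D code (which is Fortran); the names, the 7-point Gauss-Seidel update, and the assumption that the halo padding makes linear-index parity coincide with the checkerboard color are ours:

void smooth_red_black(double *p, const double *rhs, const char *mask,
                      int ntotal, int sx, int sy, int sz,
                      double h2, int color /* 0 = red, 1 = black */)
{
    /* conditional OpenMP: coarse levels are smoothed by a single thread;
       the threshold is architecture dependent (64 cells on the NEC SX-9) */
    #pragma omp parallel for if (ntotal > 64)
    for (int i = color; i < ntotal; i += 2) {   /* 1D pass with stride 2  */
        if (!mask[i]) continue;                 /* mask hides halo cells  */
        p[i] = ( p[i - sx] + p[i + sx]          /* 7-point Gauss-Seidel   */
               + p[i - sy] + p[i + sy]          /* update of the pressure */
               + p[i - sz] + p[i + sz]          /* Poisson equation       */
               - h2 * rhs[i] ) / 6.0;
    }
}

Updating one color at a time in parallel is safe because points of the same color are never direct neighbors in the checkerboard ordering.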
All of the following performance results are obtained with a representative test case of a water droplet in air exposed to an incoming flow with a velocity of 2 m/s. Its diameter is 0.01 m and the computational domain is cubic with a lateral length of 0.05 m. The speed-up S of parallel versions is defined by

S = T_1 / T_N    (20)

where T_i is the computation time of one time step. The index N denotes the number of CPUs/threads used for the computation. Hence, T_1 marks the reference computation time of the parallel code on one CPU. The ideal speed-up is reached if S = N. This leads to the definition of the efficiency E:

E = S / N = T_1 / (N T_N)    (21)
The serial fraction F of the parallel code can be estimated experimentally using the Karp-Flatt metric, which can be derived from Amdahl's law [2]:

S = 1 / ((1 − P) + P/N)    (22)

F = (1/S − 1/N) / (1 − 1/N)    (23)

In this formulation the parallel fraction is P = 1 − F. Knowing the estimated serial fraction F, the maximum possible speed-up on an infinite number of CPUs follows from (22):

S_max = lim_{N→∞} 1 / ((1 − P) + P/N) = 1 / (1 − P) = 1/F    (24)
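Applying (23) and (24) to measured data is straightforward; the sketch below reproduces the v51 values quoted further down (S = 1.53 on N = 16 CPUs):

#include <stdio.h>

int main(void) {
    const double S = 1.53;                             /* measured speed-up */
    const int N = 16;                                  /* number of CPUs    */
    double F = (1.0 / S - 1.0 / N) / (1.0 - 1.0 / N);  /* eq. (23)          */
    double Smax = 1.0 / F;                             /* eq. (24)          */
    printf("F = %.1f%%, Smax = %.1f\n", 100.0 * F, Smax);
    /* prints: F = 63.0%, Smax = 1.6 */
    return 0;
}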
The first step in improving the performance on the NEC SX-9 was to have a look at the vectorization of the code. Afterwards, the code was parallelized with OpenMP. The current FS3D version is v52, including all vectorization and parallelization developments. It shall be compared to the former version v51. First, the average computation time of one time step is compared using the serial codes. The average is taken over 200 time steps and the initialization time is subtracted. Version v51 takes 6.33 s/timestep, while version v52 takes 4.46 s/timestep. Computation time is thus reduced by a factor of 1.4. Hence, vectorization adapted to the NEC SX-9 architecture already improves computation time considerably. Next, the computation times of the two versions are compared on different numbers of CPUs. Both versions are compiled on the NEC SX-9 with the -P auto compiler flag. Figure 7 shows that computation time is considerably reduced with version v52. When using the OpenMP directives, an additional reduction of computation time can be achieved.
Fig. 7 Comparison of average computation time per time step of versions v51 auto parallelized, v52 auto parallelized and v52 OpenMP on a grid with 256³ cells

Table 3 Computation time compared to v51 on multiple CPUs

CPUs   T_v51,ap/T_v52,ap   T_v51,ap/T_v52,omp
1      2.65                2.57
2      3.38                3.67
4      4.09                5.15
8      4.72                6.81
16     5.05                8.24
The factors between the computation times of version v51 and those of version v52, auto parallelized and OpenMP, on different numbers of CPUs are presented in Table 3. On one CPU the factor is approximately 2.5. Using multiple CPUs, it increases up to 5.05 with auto parallelization and 8.24 with the OpenMP version. These results clearly show the improvements in computation time compared to version v51. Figure 8 depicts the speed-up of all three versions and the ideal speed-up. The axes are scaled logarithmically with base 2. Version v51 has a speed-up of S = 1.53 on 16 CPUs with an estimated serial fraction of F = 63%. Thus, the maximum speed-up on an infinite number of CPUs is S_max = 1.6. Considerable improvement was achieved with version v52. The serial fraction reduces to F = 29.4% with the auto parallel compiler option set and F = 14.9% when using the OpenMP directives. Therefore, the maximum speed-up of the auto parallel version is S_max = 3.4; with OpenMP directives it is S_max = 6.7. This clearly shows that the OpenMP version gives the best results and shall now be tested on cases with more grid cells, where even better results are expected. This is due to the fact that the fine grids show a very good speed-up and then play a more important role in the total computation time. The results are shown in Fig. 9.
Fig. 8 Comparison of speed-up of versions v51 auto parallelized, v52 auto parallelized and v52 OpenMP on a grid with 256³ cells
Fig. 9 Comparison of speed-up of version v52 OpenMP on grids with 256³, 512³ and 768³ cells
The serial fraction reduces to F = 10.4% (S_max = 9.6) on the grid with 512³ cells and F = 8.8% (S_max = 11.4) on the grid with 768³ cells. A computation with 1024³ cells is not possible on one node of the NEC SX-9 due to memory limitations. The average vector operation ratio is above 99% for all computations and the average vector lengths are 174, 211 and 222 on the grids with 256³, 512³ and 768³ cells, respectively. Table 4 shows the concurrent GFLOPS reached during the computation of the different grids. The peak performance of the NEC SX-9 being 100 GFLOPS per CPU, there is still some room for optimization, since only approximately one seventh of the peak FLOPS is obtained.
Table 4 Concurrent GFLOPS of v52 OpenMP on different grids

CPUs   256³   512³   768³
1      12.0   15.1   16.7
2      20.6   27.1   30.4
4      32.6   45.2   51.8
8      45.8   67.8   79.7
16     56.8   89.0   107.8
Table 5 Comparison of speed-up and efficiency of v52 OpenMP on different grids

       256³             512³             768³
CPUs   S      E [%]     S      E [%]     S      E [%]
1      1.00   100.00    1.00   100.00    1.00   100.00
2      1.73   86.72     1.81   90.62     1.84   91.95
4      2.77   69.24     3.05   76.28     3.16   79.05
8      3.93   49.14     4.64   57.95     4.96   61.96
16     4.92   30.76     6.18   38.63     6.81   42.54
Table 5 lists speed-up and efficiency for each grid and can be taken as guidance for the choice of the number of CPUs used in future computations. Bearing in mind that a certain amount of serial fraction is inevitable in FS3D, the results are relatively good.
6 Conclusion

A new rigid fluid method was implemented in FS3D and is able to simulate rigid body motion. The implementation of this method was necessary in order to investigate ice-water particles in clouds in the future. However, deformation cannot be completely inhibited due to the convection of the VOF-variable and the necessary pressure correction after having imposed the rigid body velocity. The instability that occurs in accelerated motion, such as in the presented free fall test case, can be avoided with an additional time step restriction. The performance improvements on the NEC SX-9 were successful. Ideal speed-ups cannot be reached due to the serial fraction of the applied multigrid solver. However, computationally expensive cases on large grids can now be computed efficiently.

Acknowledgments. The authors greatly appreciate the support of the High Performance Computing Center Stuttgart (HLRS) and the supply of computational time on the NEC SX-9 platform under the Grant No. FS3D/11142. Sincere thanks are given to J. Hertzer and the HLRS team for very helpful advice on the code optimization and the considerable time they devoted to it. The authors also gratefully acknowledge financial support of this project from the DFG for the collaborative research center SFB-TRR 75.
References

1. Bell, J.B., Colella, P., Glaz, H.M.: A second-order projection method for the incompressible Navier-Stokes equations. Journal of Computational Physics 85(2), 257–283 (1989)
2. Bengel, G., Baun, C., Kunze, M., Stucky, K.U.: Masterkurs Parallele und Verteilte Systeme. Vieweg+Teubner, Wiesbaden (2008)
3. Carlson, M.: Rigid, Melting and Flowing Fluid. Ph.D. thesis, College of Computing, Georgia Institute of Technology (2004)
4. Clift, R., Grace, J.R., Weber, M.E.: Bubbles, Drops and Particles. Dover Publications, Inc., Mineola, New York (2005)
5. Harlow, F.H., Welch, J.E.: Numerical calculation of time-dependent viscous incompressible flow of fluid with free surface. Physics of Fluids 8(12), 2182–2189 (1965)
6. Patankar, N.A.: A formulation for fast computations of rigid particulate flows. Center for Turbulence Research, Annual Research Briefs, pp. 185–196 (2001)
7. Patankar, N.A., Singh, P., Joseph, D.D., Glowinski, R., Pan, T.W.: A new formulation of the distributed Lagrange multiplier/fictitious domain method for particulate flows. International Journal of Multiphase Flow 26, 1509–1524 (2000)
8. Rider, W.J., Kothe, D.B.: Reconstructing volume tracking. Journal of Computational Physics 141(2), 112–152 (1998)
9. Rieber, M.: Numerische Modellierung der Dynamik freier Grenzflächen in Zweiphasenströmungen. Ph.D. thesis, University of Stuttgart (2004)
10. Rieber, M., Graf, F., Hase, M., Roth, N., Weigand, B.: Numerical simulation of moving spherical and strongly deformed droplets. Proceedings ILASS-Europe, pp. 1–6 (2000)
11. Roache, P.J.: Perspective – a method for uniform reporting of grid refinement studies. Journal of Fluids Engineering – Transactions of the ASME 116(3), 405–413 (1994)
12. Schlottke, J., Weigand, B.: Direct numerical simulation of evaporating droplets. Journal of Computational Physics 227(10), 5215–5237 (2008)
13. Sharma, N., Patankar, N.A.: A fast computation technique for the direct numerical simulation of rigid particulate flows. Journal of Computational Physics 205, 439–457 (2005)
14. Soga, T., Musa, A., Okabe, K., Komatsu, K., Egawa, R., Takizawa, H., Kobayashi, H., Takahashi, S., Sasaki, D., Nakahashi, K.: Performance of SOR methods on modern vector and scalar processors. Computers & Fluids 45(1), 215–221 (2011). http://www.sciencedirect.com/science/article/B6V26-51TYDY8-4/2/2656643dcf6a469e4321b243c552060d
15. Weking, H., Huber, C., Weigand, B.: Direct numerical simulation of single gaseous bubbles in viscous liquids. HLRS, High Performance Computing in Science & Engineering, pp. 1–13 (2009)
16. Wesseling, P.: An Introduction to Multigrid Methods. John Wiley & Sons (1991)
Optimization of Chaotic Micromixers Using Finite Time Lyapunov Exponents

Aniruddha Sarkar, Ariel Narváez, and Jens Harting
Abstract In microfluidics, mixing of different fluids is a highly non-trivial task due to the absence of turbulence. The dominant process allowing mixing at low Reynolds number is therefore diffusion, rendering mixing in plain channels very inefficient. Recently, passive chaotic micromixers such as the staggered herringbone mixer were developed, allowing efficient mixing of fluids by repeated stretching and folding of the fluid interfaces. The optimization of the geometrical parameters of such mixer devices is often performed by time-consuming and expensive trial-and-error experiments. We demonstrate that the application of the lattice Boltzmann method to fluid flow in highly complex mixer geometries, together with standard techniques from statistical physics and dynamical systems theory, can lead to a highly efficient way to optimize micromixer geometries. The strategy applies massively parallel fluid flow simulations inside a mixer, where massless and non-interacting tracer particles are introduced. By following their trajectories we can calculate finite time Lyapunov exponents in order to quantify the degree of chaotic advection inside the mixer. The current report provides a review of our results published in [1] together with additional details on the simulation methodology.
Aniruddha Sarkar · Ariel Narváez · Jens Harting
Institute for Computational Physics, University of Stuttgart, Pfaffenwaldring 27, 70569 Stuttgart, Germany

Ariel Narváez · Jens Harting
Department of Applied Physics, Eindhoven University of Technology, Den Dolech 2, 5600MB Eindhoven, The Netherlands

1 Introduction

Microfluidics is an interdisciplinary engineering and science branch which connects physics, chemistry, biology and engineering and has applications in various scientific and industrial areas. Here, we are interested in a common building block for microfluidic systems, namely micromixers. A micromixer is a microfluidic device
used for the effective mixing of different fluid constituents. A typical example is its integration as an important component of chemical and biological sensors [2]. It can be used to efficiently mix, for example, a variety of bio-reactants such as bacteria cells, large DNA molecules, enzymes and proteins in portable integrated microsystems with minimum energy consumption. It is also used in the mixing of solutions in chemical reactions [3], the sequencing of nucleic acids or drug solution dilution. In recent years the demand for highly efficient and reliable micromixers has increased substantially in research and in industry. Hence, their optimized design has become an important field of research [4]. Due to the small dimensions of micromixers, laminar flows are created inside the channels, causing the mixing performance to be limited. Experiments on channels with complex surface topology have revealed that microscale mixing is enhanced by "chaotic advection", a process which was first reviewed by Aref in 1984 [5]. He describes how mixing is still possible even at low Reynolds number by repeated stretching and folding of fluid elements. If properly applied, this mechanism causes the interfacial area between the fluids to increase exponentially, which can then lead to enhanced inter-material transport and thus better mixing [6]. If an external energy source is used to drive the mixing process, the micromixer is termed an "active mixer". These external energy sources could be acoustic bubble induced vibrations, periodic variation of the flow rate, piezoelectric vibrating membranes, valves, etc. The external sources are often moving components such as micropumps and they require advanced fabrication steps [7]. The second category of micromixers is based on restructuring the flow profile using static but sophisticated mixer geometries. These are termed "passive mixers". While the fabrication of passive micromixers is generally much simpler than producing an active device, further advantages are increased reliability and the lack of any elements which generate heat. The absence of heating is an important factor for applications in biological studies where temperature is a sensitive parameter. The mixing length and mixing time are defined as the distance and time span the fluid constituents have to flow inside the mixer in order to obtain a homogeneous mixture. An effective micromixer should reduce the mixing length and time substantially in order to achieve rapid mixing. A common practice to achieve this goal is to design passive micromixers that create alternating thin fluid lamellae. These result in an interfacial area that increases linearly with the number of lamellae, rendering the diffusion process more effective and hence allowing faster mixing [8]. There are many examples of bi-lamellation [9, 10] and multi-lamellation [4], but the drawback of such devices is that the number of lamellae is generally limited due to the negative impact of the microstructures inside the channel on the applied pressure drop. The drawbacks of conventional mixers based on multi-lamination techniques are overcome in the so-called "chaotic micromixer". Such a device consists of microstructured objects such as "herringbones", placed inside a microchannel. The staggered herringbone mixer (SHM) shown in Fig. 1 is the first chaotic micromixer that can be found in the literature. It was developed in 2002 by Stroock et al. [11]. The half cycles of the SHM consist of grooves with two arms which are asymmetric
Fig. 1 A snapshot from a typical simulation of flow inside a staggered herringbone micromixer demonstrating the highly regular arrangement of tracer particles at the beginning of the simulation (left) and a fully mixed state at a late stage of the simulation (right). The fluid itself is not shown
and unequal in length. These arms are inclined at an angle of 45° and the pattern interchanges every half cycle of the herringbone. The peculiar arrangement of the herringbone structure enhances the mixing process by "chaotic advection", where the interfacial area between the fluids grows exponentially in time, which is the most important advantage over mixers using the concept of multi-lamellation. To compare different micromixers and to develop better ones, it is important to develop schemes to quantify their performance. Efficiency and mixing quality have been studied by various methods in the past. These include the analysis of the probability density function of the flow profiles, studying the stretching of the flow field, the Poincaré section analysis, or the intensity of segregation as introduced by Danckwerts in 1952 [6, 12]. Here, an alternative numerical optimization procedure is presented which is tailored for the optimization of chaotic micromixers and which is able to harness the power of today's high performance computers for a highly practical problem. It is based on lattice Boltzmann (LB) simulations to describe the flow inside complex mixer geometries, together with a measurement of finite time Lyapunov exponents (FTLE) as obtained from trajectories of massless tracer particles immersed in the flow. The Lyapunov exponent provides a quantitative measure of the long term average growth rates of small initial flow perturbations and thus allows a quantification of the efficiency of chaotic transport [13, 14]. We apply Wolf's method to calculate the FTLE since the systems of interest are finite and simulations are limited to a finite time span [8]. The numerical scheme has the potential to assist an experimental optimization, since geometrical parameters or fluid properties can easily be changed without requiring a new experiment. To demonstrate its applicability, the scheme is applied to evaluate the optimal parameters of the staggered herringbone mixer. Figure 1 depicts two typical snapshots from our simulations. The left figure shows a snapshot of the tracer positions just after the start of the simulations. The fluid itself is not shown. One can see that the tracer particles, which are initially placed at the inflow plane of the SHM, start to travel with the flow. As shown on the right hand side, towards the end of the simulation all tracer particles are homogeneously distributed throughout the mixer, demonstrating that the system is fully mixed.
2 Simulation Method

For a description of the fluid flow inside the micromixer, we apply the lattice Boltzmann method, a simplified approach to solve the Boltzmann equation in discrete space and time and with a limited set of discrete velocities [15]. The Boltzmann equation, given as

∂_t f + c · ∇f = Ω(f),    (1)

describes the evolution of the velocity distribution function by molecular transport and binary intermolecular collisions. f(r, c, t) represents the distribution of velocities in continuous position and velocity space, r and c respectively. The position x at which f(x, c_k, t) is defined is restricted to a discrete set of points on a regular lattice with lattice constant Δx. The velocity is restricted to a set of velocities c_k, implying that velocity is discretized along specific directions. Δt denotes the discrete time step. The model we adopt is a D3Q19 model, which is a 3-dimensional model with 19 different velocity directions, k = 0, 1, …, 18 [16]. The right hand side of the above equation represents the collision operator, which is simplified to a discretized linear Bhatnagar-Gross-Krook (BGK) form [17] that can be written as

Ω_k = −ω ( f_k(x,t) − f_k^eq(x,t) ).    (2)
Here, ω is the inverse of the relaxation time of the system, which controls the relaxation towards the Maxwell-Boltzmann equilibrium distribution f_k^eq(x,t). Considering small velocities and constant temperature, a discretized second order Taylor expansion of the equilibrium distribution function can be written as

f_k^eq(x,t) = ζ_k (ρ/ρ°) [ 1 + (c_k · u^eq)/c_s² + (c_k · u^eq)²/(2 c_s⁴) − (u^eq · u^eq)/(2 c_s²) ],    (3)

where ζ_k are the lattice weights, ρ is the density, ρ° a reference density, and c_s = (1/√3) Δx/Δt is the speed of sound. u^eq is the equilibrium velocity of the fluid, which is shifted from the mean velocity by an amount g/ω under the influence of a constant acceleration g. The evolution of the LB process takes place in two steps: the collision step, where the velocities are redistributed along the directions of the lattice, and the propagation step, by which they are displaced along these directions. This leads to the discretized Boltzmann kinetic equation:

f_k(x + Δt c_k, t + Δt) − f_k(x,t) = −ω Δt ( f_k(x,t) − f_k^eq(x,t) ).    (4)

Here, the macroscopic fluid density is given by

ρ(x,t) = ρ° Σ_k f_k(x,t)    (5)
and the macroscopic fluid velocity in the presence of external forcing is given by
u(x,t) = (ρ°/ρ(x,t)) Σ_k f_k(x,t) c_k − (Δt/2) g.    (6)
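As a minimal illustration of (2)–(6), the following C sketch performs the BGK collision for a single lattice site in lattice units (ρ° = 1). The weights and velocity set are assumed to be the D3Q19 ones of [16], and the forcing shift of (6) is omitted for brevity:

/* One on-site BGK collision step; streaming is done separately. */
typedef struct { double f[19]; } site_t;

void bgk_collide(site_t *s, const double zeta[19], const int c[19][3],
                 double omega_dt, double cs2)
{
    double rho = 0.0, u[3] = {0.0, 0.0, 0.0};
    for (int k = 0; k < 19; ++k) {                 /* moments (5) and (6) */
        rho += s->f[k];
        for (int d = 0; d < 3; ++d)
            u[d] += s->f[k] * c[k][d];
    }
    for (int d = 0; d < 3; ++d)
        u[d] /= rho;
    double uu = u[0]*u[0] + u[1]*u[1] + u[2]*u[2];

    for (int k = 0; k < 19; ++k) {
        double cu = c[k][0]*u[0] + c[k][1]*u[1] + c[k][2]*u[2];
        double feq = zeta[k] * rho * (1.0 + cu / cs2
                                          + cu * cu / (2.0 * cs2 * cs2)
                                          - uu / (2.0 * cs2));   /* eq. (3) */
        s->f[k] -= omega_dt * (s->f[k] - feq);                   /* eq. (4) */
    }
    /* the propagation step then shifts each f[k] to the neighbor along c[k] */
}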
It can be shown by a Chapman-Enskog expansion that the macroscopic fields u and ρ from the above equations fulfill the Navier-Stokes equation in the low Mach number limit and for isothermal systems [15]. In order to simulate a fluid flow through microchannels, periodic boundary conditions are implemented along the flow direction (see Fig. 1) and no-slip bounce back boundary conditions are imposed at the channel walls. We simulate a fluid which is hydrodynamically similar to water, flowing inside a SHM with a cross section of 96 µm × 192 µm. The length of the channel is of the order of 1536 µm, but can be varied in order to always accommodate a full cycle of the herringbone structure. For computational efficiency we have chosen a lattice resolution of Δx = 3 µm. Such a resolution is sufficient for a comparably simple geometry as studied in this report. However, if more sophisticated mixer geometries are to be optimized, the resolution needs to be increased. Further, if mixing of multiple phases is to be simulated, periodic boundary conditions cannot be applied, requiring the simulation of the full length of the mixer. In the LB method, the kinematic viscosity is related to the discrete time step through the expression ν = c_s² (1/ω − Δt/2). The product ω Δt is chosen to be 1 and the simulated fluid has the kinematic viscosity of water, ν = 10⁻⁶ m² s⁻¹. This implies for the current choice of Δx that Δt = 1.5×10⁻⁶ s and c_s = 1.15 m s⁻¹. The Reynolds number Re = u L/ν is set to the values 0.4 and 1.3, where L = √(H² + W²) is the characteristic length of the channel. H denotes the height of the channel and W denotes the width of the channel. Trajectories of massless and non-interacting tracer particles introduced into the flow are obtained by integrating the vector equation of motion

dR_j/dt = u(R_j),   j = 1, …, P    (7)
where R_j denotes the position vector of an individual tracer particle. The velocity u(R_j) is obtained from the discrete LB velocity field through a trilinear interpolation scheme. After the flow simulation has reached its steady state, P = 1,000 particles are introduced at fluid nodes in the inlet and their velocities are then integrated at each time step. A general feature of chaotic systems is that two nearby trajectories diverge exponentially in time. The rate of divergence can be related to the strength of the flow field to create conditions for chaotic mixing. The Lyapunov exponent is a possible measure for this effect since it is related to the rate of stretching of the trajectories. It is defined by

λ_∞ = lim_{t→∞} (1/t) ln( D(t)/D(0) ),    (8)

where D(t) is the distance between two trajectories at time t. λ_∞ gives the value of the Lyapunov exponent as t tends to infinity. Since any real system is finite, it is not possible to implement this definition to quantify mixing.
Also, when two trajectories separate from each other, this definition does not allow one to understand the ongoing stretching and folding dynamics. A quantitative measure of mixing based on the Lyapunov exponent can be obtained by using the FTLE instead [18, 19]. It is defined as [20]

λ_FTLE = (1/δt) ln( D(t + δt)/D(t) ),    (9)

where t is any particular instant of time and δt is a finite time after which the FTLE is measured. The same process is repeated N times. For large N the average FTLE converges to the Lyapunov exponent [19]:

lim_{N→∞} ⟨λ_FTLE⟩_N = λ_∞.    (10)
Wolf et al. suggested a method to calculate the FTLE from a set of experimental data [8, 21]. Following Wolf's approach, we implement the following equation to quantify the mixer performance on the basis of the average FTLE:

⟨λ⟩_N = (1/N) Σ_{i=0}^{N−1} (1/τ_i) ln( D(t_i + τ_i)/D(t_i) ),    (11)
where t_i is the i-th time at which an FTLE is evaluated, and D(t_i + τ_i) and D(t_i) are the distances at time steps t_i + τ_i and t_i, respectively. τ_i is a multiple of Δt and N is the total number of times the particle positions are re-adjusted. If ⟨λ⟩_N has a positive, non-zero value, the distance between two nearby particles diverges at an exponential rate. Particle pairs which are initially very close to each other (i.e. with a distance Δx) are chosen to evaluate the FTLE. When these particles evolve in time, the distance between them either increases or decreases. If the separation becomes greater than a maximum distance, chosen as half the minimum dimension of the system, H/2, the distance between the particles is re-adjusted to the initial distance D(t_0) and one of the particles is placed along the line of separation in order to avoid errors due to orientation. If a replacement point cannot be found because a surface node is present at that location, a nearby fluid node is selected instead. If even such points cannot be found, the replacement is postponed to a later time step. For the implementation of the scheme, for every particle pair one of the trajectories is chosen as the fiducial path, while the position of the other particle is replaced if the distance becomes larger than the threshold value.
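A compact sketch of this renormalization procedure is given below. advance() (one integration step of (7)) and dist() are hypothetical helpers, and the handling of blocked replacement points is omitted:

#include <math.h>

void advance(double x[3], double dt);               /* hypothetical: one   */
double dist(const double a[3], const double b[3]);  /* tracer step of (7)  */

double ftle_average(double x1[3], double x2[3], double d0, double dmax,
                    int nsteps, double dt)
{
    double sum = 0.0, tau = 0.0;
    int n = 0;
    for (int it = 0; it < nsteps; ++it) {
        advance(x1, dt);                /* x1 follows the fiducial path    */
        advance(x2, dt);
        tau += dt;
        double d = dist(x1, x2);
        if (d > dmax) {                 /* threshold, e.g. H/2             */
            sum += log(d / d0) / tau;   /* one term of the sum in (11)     */
            for (int k = 0; k < 3; ++k) /* pull particle 2 back along the  */
                x2[k] = x1[k] + (x2[k] - x1[k]) * d0 / d; /* separation line */
            tau = 0.0;
            ++n;
        }
    }
    return (n > 0) ? sum / n : 0.0;     /* <lambda>_N of eq. (11)          */
}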
3 Implementation

During the last twelve years, our massively parallel 3D LB code (LB3D) was developed. LB3D is based on Shan and Chen's multiphase LB model [22, 23], which can be utilized to simulate a number of miscible or immiscible fluids. In addition, amphiphiles were added to the model [24]. Interactions between different fluid species are modeled by a mesoscopic force between the phases. The code was applied to a
large number of problems to study, for example, the behavior of binary and ternary fluid mixtures under shear [25], the formation of surfactant mesophases [26, 27], or flow in porous media [28, 29]. Collaborations with computer scientists and software developers led to a large number of improvements to the simulation code. These include computational steering facilities, which allow transparent access to a running simulation over the network and the ability to change parameters or visualize output data on the fly from the user's workstation [30]. Recently, LB3D was extended to simulate typical problems arising in microfluidics, including fluid flow along rough and hydrophobic surfaces [31–34]. Our group has a long standing reputation and experience in simulating suspensions using different hybrid methods consisting of an MD solver for the particle motion and various solvers for the fluid solvent [35–39]. Within the last three years our lattice Boltzmann code LB3D was combined with a parallel MD code, which is also the basis of the results presented in this report. Within the same activity, a model for red blood cells (RBC) in plasma was implemented [38]. The cells are described as hard ellipsoids interacting with the hydrodynamic field. It was demonstrated that the code allows the description of systems containing several million RBCs on current supercomputers. The code was ported to most available supercomputer platforms and shows a very good performance and scaling behavior. The Edinburgh parallel computing center awarded an earlier version of LB3D its gold medal for scaling almost linearly to 1024 processors already in 2004. Recently, improvements in the MPI communication code made it possible to demonstrate strong scaling for up to 262144 cores on the BlueGene/P system Jugene in Jülich [40, 41]. The code was the main application within the TeraGyroid project, where it was used on a prototype computational grid consisting of all national supercomputers in the UK and a number of machines in the US, and received various prizes. See Fig. 2 for a comparison of the performance on the XC2 (Karlsruhe), HECToR (Edinburgh), Huygens (Amsterdam) and Juropa (Jülich).
Fig. 2 Scaling and performance comparison of LB3D on the XC2 at SSC Karlsruhe, HECToR at EPCC, Huygens at SARA, and JUROPA at JSC. The studied system is comparably small, which is the reason for the poorer scaling compared to what we generally observe
It is interesting to note that even though the XC2 is the oldest available platform, it still performs reasonably well and has proven to be a reliable workhorse during the last years. The scaling behavior in the presented plot is not very good at larger core counts because of the too small system used for the benchmark.
4 Results

In this section we present how the FTLE can be utilized for an optimization strategy for chaotic micromixers. As an example, the influence of different parameters which directly affect the performance of the SHM is evaluated. These are the ratio of the height of the grooves to the height of the channel α, the ratio of the horizontal length of the long arm to the channel width β, the ratio of the distance between the grooves to the length of the channel γ, and the number of grooves per half cycle n. While keeping all other parameters fixed, the width fraction (β) is varied within the range of 0.22 and 0.82 and the distance fraction (γ) from 0.04 to 0.11. The width of the grooves is kept fixed at 24 µm for all simulations. Then, the number of grooves per half cycle (n) is varied from 2 to 10 and the height fraction (α) from 0.125 to 0.343. One has to ensure thorough convergence of the simulations, since ⟨λ⟩_N fluctuates before finally converging to a particular value after ∼ 6.0×10⁵ time steps. Therefore, simulations are run until the FTLE have thoroughly converged before evaluating the tracer trajectories. The effect of the geometry can be measured by comparing the average of the converged FTLE, which is denoted by ⟨λ⟩. The error bars in Figs. 3 and 4 are given by the standard deviation of the data from the point where it has converged.
Fig. 3 Left: The variation of the averaged finite time Lyapunov exponent ⟨λ⟩ with the width fraction β shows a maximum at β = 2/3. While the position of the maximum is not affected by changing Re, the absolute values change. Right: The FTLE rises with increasing distance fraction γ until it reaches a distinct peak. Then, the curve decreases, demonstrating an optimized performance of the mixer at γ = 0.07 [1]
Fig. 4 Left: The variation of ⟨λ⟩ with the number of grooves per half cycle (n) shows that the SHM with n = 5 performs best. Right: A variation of the height fraction (α) indicates that the maximum FTLE is obtained for α = 0.25 [1]
A possible reduction of the required computing time can be achieved by stopping the lattice Boltzmann simulation when a steady state flow field has been obtained, since the computational effort for the tracer particles is small compared to the time required for the fluid solver. However, for complex geometries and chaotic flows several hundred thousand time steps can be required to obtain the steady state, and a thorough convergence study would be needed for every different channel geometry. The left part of Fig. 3 depicts the variation of ⟨λ⟩, and as such the performance of the SHM, with respect to β for two different Reynolds numbers, Re = 0.4 and 1.3. Due to the symmetry of the mixer geometry, only values for β ≥ 0.5 are plotted. The datasets peak at β = 2/3, implying that the degree of chaotic advection is maximized for this particular value of the width fraction β. The measurements at different Reynolds numbers show that changing the driving force does change the absolute value of ⟨λ⟩, but has no influence on the general shape of the curve. This is confirmed by similar studies of the Re dependence for other geometrical parameters and various different driving forces. Therefore, we restrict ourselves to Re = 1.3 for all further simulations. Our findings are consistent with the original experimental work of Stroock et al. [11] as well as numerical optimizations by Stroock and McGraw [42]. Both publications show that β = 2/3 generates a maximum swirling motion of the fluid trajectories. However, such an analysis with dyes or concentration profiles does not allow one to obtain an insight into the behavior of the flow field, while the FTLE does. In the right part of Fig. 3, data from a set of simulations with β fixed at the optimized value of 2/3 and the distance fraction γ varied from 0.04 to 0.11 is shown. It can be observed that after a moderate increase of ⟨λ⟩ with γ, the curve has a sharp peak at γ = 0.07, which corresponds to a value of d = 105 µm for the current choice of Δx. Afterwards, ⟨λ⟩ decreases in a similar fashion as for small γ, but still at higher absolute values. In the following, the number of grooves per half-cycle n is varied from 2 to 10. It can be understood from the left part of Fig. 4 that a variation of n has the largest impact on the performance of the mixer as compared to β or γ. For the current setup, by variation of n it is possible to change the value of ⟨λ⟩ by a factor of 2.3, as compared to 1.2 for β and 1.3 for γ. The data clearly demonstrate that a staggered herringbone mixer with n = 5 performs best. Similar to our work, Li and Chen performed LB simulations and used tracers to follow the flow field [43]. They, however, quantify
mixing by computing the standard deviation of the local tracer concentration and conclude that mixers with n = 5 or n = 6 perform best. This result is in agreement with our finding, but the FTLE analysis clearly shows that the channel with n = 5 performs better than the one with n = 6. The final parameter to be considered is the ratio of the half depth of the grooves to the height of the channel α. Figure 4 (right) depicts the average value of the converged Lyapunov exponents for α between 0.125 and 0.343. After a strong increase of the curve, the data has a maximum at α = 0.25. For larger α, the value of ⟨λ⟩ decreases again. Our results are confirmed by the original experimental analysis of Stroock et al. [11].
5 Summary

Passive chaotic micromixers can be successfully applied to improve mixing at the microscale, where turbulence is absent and only diffusion can be used for mixing. These mixers provide a large fluid-fluid interface by repeated stretching and folding of these interfaces. The performance of such mixers depends on the rate at which "chaotic advection" of the fluid takes place. In this work we have demonstrated an efficient numerical scheme which allows the quantification of "chaotic advection" and thus the performance of a micromixer. The scheme is based on our well developed massively parallel LB solver LB3D to describe the time dependent flow field in complex mixer geometries, combined with Wolf's method to compute FTLE from passive tracer trajectories. We have utilized the XC2 in Karlsruhe to demonstrate the applicability of the quantification method by applying it to optimize the geometry of the staggered herringbone mixer. By performing a systematic variation of the relevant geometrical parameters we obtained a set of optimal values α = 0.25, β = 2/3, γ = 0.07 and n = 5, which is consistent with literature data published by others. An important feature of the method presented here is that it allows optimization of the mixing performance by direct investigation of the underlying dynamical process [1]. Currently we are extending our method to make use of the multiphase and multicomponent capabilities of our lattice Boltzmann implementation in order to study the mixing of multiphase flows in microchannels. This is of course a more realistic scenario which has received surprisingly little attention in the literature so far. A possible explanation for the small number of publications on this topic is the complicated interaction between the process of chaotic advection and the parameters determining the diffusion between different fluid species.

Acknowledgments. The authors thank F. Janoschek, F. Raischel, G.J.F. van Heijst, and M. Pattantyús-Ábrahám for fruitful discussions. This work was financed within the DFG priority program "nano- and microfluidics", the DFG collaborative research center 716, and by the NWO/STW VIDI grant of J. Harting. A. Narváez thanks the Deutscher Akademischer Austauschdienst (DAAD) for financial support. We thank the Scientific Supercomputing Center Karlsruhe for providing the computing time and technical support for the presented work.
References

1. A. Sarkar, A. Narváez, and J. Harting. Quantification of the degree of mixing in chaotic micromixers using finite time Lyapunov exponents. Submitted for publication, arXiv:1012.5549, 2010.
2. M. A. Burns. Microfabricated structures for integrated DNA analysis. Proc. National Acad. Sci. USA, 93:5556–5561, 1996.
3. P. Watts and S. Haswell. Microfluidic combinatorial chemistry. Curr. Opin. Chem. Biol., 7:380–387, 1996.
4. V. Hessel, H. Loewe, and F. Schoenfeld. Micromixers—A review on active and passive mixing principles. Chem. Eng. Sci., 60:2479–2501, 2005.
5. H. Aref. Stirring by chaotic advection. J. Fluid Mech., 143:1–21, 1984.
6. H. Kim and A. Beskok. Quantification of chaotic strength and mixing in a micro fluidic system. J. Micromech. Microeng., 17:2197–2210, 2007.
7. C. Zhang, D. Xing, and Y. Li. Micropumps, microvalves and micromixers within PCR microfluidic chips: Advances and trends. Biotech. Advances, 25:483–514, 2007.
8. F. G. Bessoth, A. de Mello, and A. Manz. Microstructure for efficient continuous flow mixing. Analyt. Comm., 36:213–215, 1999.
9. D. Gobby, P. Angeli, and A. Gavriliidis. Mixing characteristics of T-type microfluidic mixers. J. Micromech. Microeng., 11:126–132, 2001.
10. Y. Mingquiang and H. Bau. The kinematics of bend-induced stirring in micro-conduits. In Proc. ASME Intl. Mech. Sys. (MEMS'97), Nagoya, Japan, pp. 96–101, 2000.
11. A. D. Stroock, S. K. W. Dertinger, A. Ajdari, I. Mezić, H. A. Stone, and G. M. Whitesides. Chaotic mixer for microchannels. Science, 295:647–651, 2002.
12. T. K. Kang, M. K. Singh, T. H. Kwon, and P. D. Anderson. Chaotic mixing using periodic and aperiodic sequences of mixing protocols. Microfluidics and Nanofluidics, 4(6):589–599, 2007.
13. C. Ziemann, L. A. Smith, and J. Kurths. Localized Lyapunov exponents and the prediction of predictability. Phys. Lett. A, 4:237–251, 2000.
14. G. Lapeyre. Characterization of finite-time Lyapunov exponents and vectors in two-dimensional turbulence. Chaos, 12(3):688–698, 2002.
15. S. Succi. The Lattice Boltzmann Equation for Fluid Dynamics and Beyond. Oxford University Press, London, 2001.
16. Y. H. Qian, D. d'Humières, and P. Lallemand. Lattice BGK models for Navier-Stokes equation. Europhys. Lett., 17(6):479–484, 1992.
17. P. Bhatnagar, E. Gross, and M. Krook. A model for collision processes in gases. Small amplitude processes in charged and neutral one-component systems. Phys. Rev., 94:511–525, 1954.
18. D. Ruiquiang and L. Jianping. Nonlinear finite-time Lyapunov exponent and predictability. Phys. Lett. A, 364:396–400, 2007.
19. X. Tang and A. Boozer. Finite time Lyapunov exponent and chaotic advection-diffusion equation. Physica D, 95:283–305, 1996.
20. Y. Lee, C. Shih, P. Tabeling, and C.-M. Ho. Experimental study and non-linear dynamics of time-periodic micro chaotic mixers. J. Fluid Mech., 575:425–448, 2007.
21. A. Wolf, J. B. Swift, H. L. Swinney, and J. A. Vastano. Determining Lyapunov exponents from a time series. Physica D, 16:285–317, 1985.
22. X. Shan and H. Chen. Lattice Boltzmann model for simulating flows with multiple phases and components. Phys. Rev. E, 47(3):1815, 1993.
23. X. Shan and H. Chen. Simulation of nonideal gases and liquid-gas phase transitions by the lattice Boltzmann equation. Phys. Rev. E, 49(4):2941, 1994.
24. H. Chen, B. M. Boghosian, P. V. Coveney, and M. Nekovee. A ternary lattice Boltzmann model for amphiphilic fluids. Proc. R. Soc. Lond. A, 456:2043, 2000.
25. J. Harting, M. Harvey, J. Chin, M. Venturoli, and P. V. Coveney. Large-scale lattice Boltzmann simulations of complex fluids: Advances through the advent of computational grids. Phil. Trans. R. Soc. Lond. A, 363:1895–1915, 2005.
26. G. Giupponi, J. Harting, and P. V. Coveney. Emergence of rheological properties in lattice Boltzmann simulations of gyroid mesophases. Europhys. Lett., 73:533–539, 2006.
27. N. González-Segredo, J. Harting, G. Giupponi, and P. V. Coveney. Stress response and structural transitions in sheared gyroidal and lamellar amphiphilic mesophases: Lattice-Boltzmann simulations. Phys. Rev. E, 73:031503, 2006.
28. A. Narváez, T. Zauner, F. Raischel, R. Hilfer, and J. Harting. Quantitative analysis of numerical estimates for the permeability of porous media from lattice-Boltzmann simulations. J. Stat. Mech: Theor. Exp., 2010:P211026, 2010.
29. A. Narváez and J. Harting. A D3Q19 lattice-Boltzmann pore-list code with pressure boundary conditions for permeability calculations. Advances in Applied Mathematics and Mechanics, 2:685, 2010.
30. J. Chin, J. Harting, S. Jha, P. V. Coveney, A. R. Porter, and S. M. Pickles. Steering in computational science: Mesoscale modelling and simulation. Contemporary Physics, 44(5):417–434, 2003.
31. J. Harting, C. Kunert, and H. Herrmann. Lattice Boltzmann simulations of apparent slip in hydrophobic microchannels. Europhys. Lett., 75:328–334, 2006.
32. C. Kunert and J. Harting. Roughness induced apparent boundary slip in microchannel flows. Phys. Rev. Lett., 99:176001, 2007.
33. J. Hyväluoma and J. Harting. Slip flow over structured surfaces with entrapped microbubbles. Phys. Rev. Lett., 100:246001, 2008.
34. C. Kunert, J. Harting, and O. I. Vinogradova. Random-roughness hydrodynamic boundary conditions. Phys. Rev. Lett., 105:016001, 2010.
35. A. Komnik, J. Harting, and H. J. Herrmann. Transport phenomena and structuring in shear flow of suspensions near solid walls. J. Stat. Mech: Theor. Exp., P12003, 2004.
36. M. Hecht, J. Harting, T. Ihle, and H. J. Herrmann. Simulation of claylike colloids. Phys. Rev. E, 72:011408, 2005.
37. J. Harting, H. J. Herrmann, and E. Ben-Naim. Anomalous distribution functions in sheared suspensions. Europhys. Lett., 83:30001, 2008.
38. F. Janoschek, F. Toschi, and J. Harting. Simplified particulate model for coarse-grained hemodynamics simulations. Phys. Rev. E, 82:056710, 2010.
39. F. Jansen and J. Harting. From Bijels to Pickering emulsions: A lattice Boltzmann study. Phys. Rev. E, 83:046707, 2011.
40. D. Groen, O. Henrich, F. Janoschek, P. Coveney, and J. Harting. Lattice-Boltzmann methods in fluid dynamics: Turbulence and complex colloidal fluids. In B. Mohr and W. Frings, editors, Jülich Blue Gene/P Extreme Scaling Workshop 2011. Jülich Supercomputing Centre, 52425 Jülich, Germany, April 2011. FZJ-JSC-IB-2011-02; http://www2.fz-juelich.de/jsc/docs/autoren2011/mohr1/.
41. J. Harting, F. Jansen, S. Frijters, F. Janoschek, and F. Günther. Nanoparticles as emulsion stabilizers. inSiDE, 9(1):48, 2011.
42. A. D. Stroock and G. J. McGraw. Investigation of the staggered herringbone mixer with a simple analytical model. Phil. Trans. R. Soc. Lond. A, 362:923–935, 2004.
43. C. Li and T. Chen. Simulation and optimization of chaotic micromixer using lattice Boltzmann method. Sensors and Actuators B, 106:871–877, 2005.
Numerical Simulation of Particle-Laden Turbulent Flows Using LES

Michael Breuer and Michael Alletto
Abstract The paper is concerned with the simulation of particle-laden two-phase flows based on the Euler-Lagrange approach. The methodology developed is driven by two compulsory requirements: (i) the necessity to tackle complex turbulent flows by eddy-resolving schemes such as large-eddy simulation (LES); (ii) the demand to predict dispersed multiphase flows at high mass loadings. First, a highly efficient particle tracking algorithm was developed, working on curvilinear, block-structured grids in general complex 3D domains. Second, to allow the prediction of dense two-phase flows, the fluid-particle interaction (two-way coupling) as well as particle-particle collisions (four-way coupling) had to be taken into account. For the latter, instead of a stochastic collision model, in the present study a deterministic collision model is considered. Nevertheless, the computational burden is minor owing to the concept of virtual cells, where only adjacent particles are taken into account in the search for potential collision partners. The methodology is applied to different test cases. Here, results for the two-phase flows in a plane channel and in a model combustion chamber are reported. The influence of particle-fluid (two-way coupling) as well as particle-particle interactions (four-way coupling) is investigated for a mass loading of 22%. The computational results are compared with experimental measurements and an encouraging agreement is found. Results for a higher mass loading of 110% will be published in a subsequent report. The methodology developed will be further extended in the near future, e.g., to account for rough walls. Then even more challenging test cases will be tackled.
Michael Breuer · Michael Alletto
Professur für Strömungsmechanik, Institut für Mechanik, Helmut-Schmidt-Universität Hamburg, Holstenhofweg 85, Postfach 70 08 22, D-22043 Hamburg, Germany, e-mail:
[email protected],
[email protected]
1 Introduction

The interaction between turbulence and the dynamics of particles plays an essential role in many industrial applications like cyclone separators, fluidized-bed combustion and the conveying of solid materials through pipe systems. Knowing the shortcomings of RANS methods in reliably predicting complex flow phenomena, it is necessary to develop simulation tools which (i) describe the flow of the carrier phase accurately and (ii) account for all relevant effects which influence the particle dynamics. Regarding wall-bounded flows, a few attempts have been made to reproduce experimental data available in the literature, e.g. [14, 20], based on LES predictions, e.g. [13, 25, 28]. None of these authors could reasonably predict the particle mean velocity profile, which was lower than that of the carrier phase in spite of the gravity force acting in the streamwise direction. The predicted particle velocity fluctuations were also considerably lower than the measured ones. A possible explanation for the difference between the predictions and the experiments could be the neglect of wall roughness effects [15]. Recently, special care was taken by Benson et al. [4] regarding the smoothness of the wall in the channel development section. This leads to good agreement between the experimental findings of Benson et al. [4] and our four-way coupled channel flow simulations, see Sect. 6.1, which do not yet incorporate wall roughness effects. Because of the considerably high computational costs, very few attempts have been made to simulate particle-laden turbulent flows in complex geometries using LES, see e.g. [1, 22]. Both groups only considered a two-way coupled flow. Particle-particle collisions, however, play an important role if the volume fraction becomes higher than 10⁻³ [26] and were found to influence the computed LES statistics at considerably lower volume fractions [13, 28]. It is therefore interesting to elucidate the differences between the one-way, two-way and four-way coupled flow assumptions. To the best of our knowledge, this work presents the first four-way coupled simulation for complex geometries using LES and a Lagrangian tracking of particles with deterministic collision detection. For that purpose, concentrated efforts were put into the development and implementation of a highly efficient particle tracking scheme [8, 9], which now also includes the handling of interparticle collisions. The procedure is validated based on particle-laden channel and pipe flows and is presently compared to the experimental data of a model combustion chamber [3].
2 Governing Equations

In this work we describe the multiphase flow using an Euler-Lagrange approach in which the two phases are solved in two different frames of reference.
2.1 Continuous Phase

The continuous phase is solved in an Eulerian frame of reference. The conservation equations of the filtered quantities used in LES, see [7], can be extended to take into account the feedback effect of the particles on the fluid (two-way coupling). For that purpose the particle-source-in-cell method described in [12] is used.
2.2 Dispersed Phase

The dispersed phase is solved in a Lagrangian frame of reference. The equation of motion is given by Newton's second law, where the fluid forces are derived from the displacement of a small rigid sphere in a non-uniform flow [18]. For particles with a density much higher than the carrier fluid, i.e. ρ_p/ρ_f ≫ 1, only the drag, lift, gravity and buoyancy forces have to be considered, leading to:

$$\frac{d u_p}{dt} = \frac{u_f - u_p}{\tau_p/\alpha} + g \left(1 - \frac{\rho_f}{\rho_p}\right) + \frac{F_L^{McL}}{m_p} \qquad (1)$$

u_p, u_f, τ_p, g and m_p are the particle velocity, the fluid velocity at the particle position, the particle relaxation time, the gravitational acceleration and the mass of the particle. The lift force F_L^McL is calculated as follows [19]:

$$F_L^{McL} = \frac{9}{4\pi}\, \mu_f\, d_p^2\, (u_f - u_p)\, \mathrm{sign}(G) \left(\frac{|G|}{\nu_f}\right)^{1/2} J, \qquad J = f\!\left(\mathrm{sign}(G)\, \frac{(|G|\,\nu_f)^{1/2}}{u_p - u_f}\right) \qquad (2)$$

G is the fluid velocity gradient tensor. The drag force on the particle is based on Stokes flow around a sphere, where the corresponding drag coefficient is given by:
$$C_D = \frac{24\,\alpha}{Re_p} \quad \text{with} \quad Re_p = \frac{|u_f - u_p|\, d_p}{\nu_f}. \qquad (3)$$
Here d_p denotes the particle diameter, ν_f = μ_f/ρ_f the kinematic viscosity of the fluid and Re_p the particle Reynolds number. In order to extend the validity of relation (3) towards higher particle Reynolds numbers (0 < Re_p ≤ 800), the correction factor α = 1 + 0.15 Re_p^0.687 is introduced [23]. Finally, the particle relaxation time in (1) is defined as:
$$\tau_p = \frac{\rho_p\, d_p^2}{18\, \mu_f}. \qquad (4)$$
It describes the response time of a particle to changes in the flow field. The effect of the unresolved scales on the particle motion is taken into account by a simple stochastic model [21]:
$$u_f = \bar{u}|_p + u', \qquad u' = \sqrt{\frac{2}{3}\, k_{sgs}}\; \xi. \qquad (5)$$
Especially for small particle Stokes numbers St = τ_p⁺ = τ_p u_τ²/ν the subgrid-scale velocities have to be considered owing to their significant effect on the particle motion. ξ is a random number following a Gaussian distribution with zero mean and unit variance, k_sgs the subgrid-scale kinetic energy, ū|_p the filtered velocity at the particle position and u′ the subgrid-scale velocity fluctuation. If the Smagorinsky model or the dynamic model is used, an estimate of the subgrid-scale kinetic energy k_sgs is required:

$$k_{sgs} = \frac{1}{2} \left(\bar{u}|_p - \tilde{\bar{u}}|_p\right)^2. \qquad (6)$$
ũ̄|_p is the velocity field filtered a second time with a test filter of width Δ̃ = 2Δ, where Δ is the filter width used to filter the Navier-Stokes equations. ū|_p and ũ̄|_p are taken from the cell closest to the particle. The estimate (6) is based on the scale-similarity approach of Bardina et al. [2], which relies on the assumption of the similarity of neighboring scales in spectral space.
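To make the dispersed-phase model of this section concrete, the following minimal Python/NumPy sketch evaluates the right-hand side of (1) for a single particle, using the drag correction (3), the relaxation time (4) and the stochastic subgrid-scale model (5); the lift force (2) is omitted for brevity. All names are purely illustrative and are not taken from the authors' code.

import numpy as np

def particle_rhs(u_p, u_f_filt, k_sgs, d_p, rho_p, rho_f, mu_f,
                 g=np.array([0.0, 0.0, -9.81]), rng=np.random.default_rng()):
    """Right-hand side of the particle equation of motion (1) (sketch).

    Drag with the correction factor of (3), gravity/buoyancy, and the
    stochastic subgrid-scale velocity model (5); lift (2) is omitted.
    """
    nu_f = mu_f / rho_f
    # Eq. (5): fluid velocity = filtered velocity + subgrid fluctuation
    u_f = u_f_filt + np.sqrt(2.0 * k_sgs / 3.0) * rng.standard_normal(3)
    # Eq. (3): particle Reynolds number and drag correction factor
    re_p = np.linalg.norm(u_f - u_p) * d_p / nu_f
    alpha = 1.0 + 0.15 * re_p**0.687          # valid for 0 < Re_p <= 800
    tau_p = rho_p * d_p**2 / (18.0 * mu_f)    # Eq. (4): relaxation time
    # Eq. (1): drag plus gravity/buoyancy
    return (u_f - u_p) / (tau_p / alpha) + g * (1.0 - rho_f / rho_p)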
3 Numerical Methods

3.1 Continuous Phase

The continuous phase is solved in an Eulerian frame of reference using the computer code LESOCC (Large-Eddy Simulation On Curvilinear Coordinates [6–8]) to integrate the governing equations in space and time. It is based on a 3-D finite-volume method for arbitrary non-orthogonal and block-structured grids. The entire discretization is second-order accurate in space and time. For modeling the non-resolvable subgrid scales the well-known Smagorinsky model [24] with Van Driest damping near solid walls is applied. Alternatively, a dynamic model can be used.
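The Smagorinsky eddy viscosity with Van Driest damping mentioned above can be evaluated pointwise as in the following sketch; the model constant C_s = 0.1 and the damping constant A⁺ = 25 are common choices assumed here, since the text does not quote the values used in LESOCC.

import numpy as np

def smagorinsky_nut(grad_u, delta, y_plus, c_s=0.1, a_plus=25.0):
    """Smagorinsky subgrid viscosity with Van Driest damping (sketch).

    grad_u : 3x3 filtered velocity gradient tensor at a cell center
    delta  : filter width, e.g. the cube root of the cell volume
    y_plus : wall distance in inner units
    """
    s = 0.5 * (grad_u + grad_u.T)           # strain-rate tensor S_ij
    s_mag = np.sqrt(2.0 * np.sum(s * s))    # |S| = sqrt(2 S_ij S_ij)
    f_vd = 1.0 - np.exp(-y_plus / a_plus)   # Van Driest damping factor
    return (c_s * f_vd * delta) ** 2 * s_mag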
3.2 Dispersed Phase

3.2.1 Solution of the Governing Equations

The ordinary differential equation (1) for the particle motion is integrated by a fourth-order Runge-Kutta scheme which is stable under the condition 0 ≤ β = Δt α/τ_p ≤ 2.78. To avoid a violation of the stability condition for tiny particles with τ_p tending towards zero, an analytical integration is used if β is out of the stability bounds. To avoid time-consuming search algorithms, the second
integration of (1) to determine the particle position is done in computational space. Here an explicit relation exists between the position of the particle and the index of the cell containing it [8, 9], which is required to calculate the fluid forces on the particle. Thus a highly efficient particle tracking scheme results, allowing the paths of millions of particles to be predicted. The fluid velocity u_f at the particle position is calculated with a Taylor series expansion around the cell center next to the particle [17]. This interpolation was shown to have a weaker filtering effect on the fluid velocity than a trilinear interpolation, leading to better results for particles with small relaxation times τ_p. A minimal sketch of such an integration step is given below.
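In the following Python sketch a classical fourth-order Runge-Kutta step stands in for the scheme actually used, and the analytical branch assumes the fluid velocity and the net gravity/buoyancy acceleration g_eff frozen over the step; all names are illustrative.

import numpy as np

def advance_velocity(u_p, u_f, tau_p, alpha, dt, g_eff):
    """Advance the particle velocity by one time step (sketch)."""
    beta = dt * alpha / tau_p
    tau_star = tau_p / alpha
    rhs = lambda u: (u_f - u) / tau_star + g_eff   # right-hand side of (1)
    if 0.0 <= beta <= 2.78:                        # stability bound of the scheme
        k1 = rhs(u_p)
        k2 = rhs(u_p + 0.5 * dt * k1)
        k3 = rhs(u_p + 0.5 * dt * k2)
        k4 = rhs(u_p + dt * k3)
        return u_p + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    # analytical integration for beta outside the stability bounds
    u_eq = u_f + tau_star * g_eff                  # equilibrium velocity
    return u_eq + (u_p - u_eq) * np.exp(-dt / tau_star)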
3.2.2 Collision Detection Algorithm

According to the technique of uncoupling developed by Bird [5], the calculation of the particle trajectories is split into two stages:

1. The particles are moved based on the equation of motion without inter-particle interactions.
2. The occurrence of collisions during the first stage is examined for all particles. If a collision is found, the velocities of the collision pair are replaced by the post-collision ones without changing their positions, which is also advantageous for parallelization.

The collision handling itself is carried out in two steps:

(I) In the first step likely collision partners are identified. Since for small time steps only collisions between neighboring particles are likely, substantial computational savings are achieved by dividing the computational domain into virtual cells. The method is commonly employed in smoothed particle hydrodynamics (SPH), see, e.g., [27], restricting the collision detection to neighboring particles. Since collisions are likely to occur only between adjacent particles, there is no need to include all particles in the search procedure to detect the majority of collisions. Choosing the cell size such that each cell contains only a few particles, the cost of checking N_p particles for collisions is reduced from O(N_p²) to O(N_p), which is crucial for large numbers of particles. To establish the virtual cells containing the particles, the computational domain of size (n_i, n_j, n_k) is split into (i_c, j_c, k_c) cells according to i_c = int(n_i/d_i) + 1, j_c = int(n_j/d_i) + 1 and k_c = int(n_k/d_i) + 1, where d_i is a factor adjusted dynamically to limit the maximum number of particles in a virtual cell to an amount specified by the user. Based on the particle coordinates in the computational domain (ξ_p, η_p, ζ_p) the corresponding index values of the virtual cells can be determined:

$$i_p = \mathrm{int}\!\left(\frac{\xi_p}{d_i}\right) + 1, \qquad j_p = \mathrm{int}\!\left(\frac{\eta_p}{d_i}\right) + 1, \qquad k_p = \mathrm{int}\!\left(\frac{\zeta_p}{d_i}\right) + 1. \qquad (7)$$

Finally, a particle property is defined by assigning the index of the virtual cell, denoted i_vc, in which the particle is located according to:
$$i_{vc} = i_p + (j_p - 1)\, i_c + (k_p - 1)\, i_c\, j_c. \qquad (8)$$
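A compact NumPy sketch of this binning step, with illustrative names only, could read:

import numpy as np

def virtual_cell_index(xi, eta, zeta, d_i, n_i, n_j):
    """Virtual cell index of each particle, Eqs. (7) and (8) (sketch).

    xi, eta, zeta : arrays of particle coordinates in computational space
    d_i           : dynamically adjusted cell-size factor
    n_i, n_j      : domain extents in computational space
    """
    ic = int(n_i / d_i) + 1
    jc = int(n_j / d_i) + 1
    ip = (xi / d_i).astype(int) + 1        # Eq. (7)
    jp = (eta / d_i).astype(int) + 1
    kp = (zeta / d_i).astype(int) + 1
    return ip + (jp - 1) * ic + (kp - 1) * ic * jc   # Eq. (8)

# particles sharing an index are candidates for the pairwise check, e.g.:
# order = np.argsort(i_vc)   # groups are contiguous runs of equal i_vc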
All particles within the same virtual cell are consequently characterized by the same index i_vc. In this way the collision detection procedure can be limited to the particles in each virtual cell. Furthermore, to avoid overlapping cells or the necessity to take the 26 surrounding cells into account during the first step, the search and collision detection procedure is carried out a second time with slightly larger virtual cells.

(II) The second step solely takes the particles in one virtual cell into account. Following a suggestion of Chen et al. [11], the algorithm relies on the assumption of constant velocity within a time step, which is quite reasonable owing to the small time steps used in LES. Based on the assumption of linear displacements during a time step, it is possible to detect the collision of two particles by purely kinematic conditions, i.e., (i) the two particles have to approach each other and (ii) their minimum separation within a time step has to be less than the sum of their radii. The first condition (i), that two particles are approaching each other, is expressed by:

$$x_r \cdot u_{p,r} < 0. \qquad (9)$$

Otherwise a collision is impossible. x_r and u_{p,r} are the relative distance and the relative velocity between two particles, respectively. If this condition is fulfilled, the time Δt_min at which the particle separation distance reaches its minimum x_{r,min} is computed as follows:

$$\Delta t_{min} = -\frac{x_r \cdot u_{p,r}}{|u_{p,r}|^2} \quad \text{with} \quad x_{r,min} = x_r + u_{p,r}\, \Delta t_{min}. \qquad (10)$$
The condition that two particles collide is set as follows:

$$\left(\Delta t_{min} \le \Delta t \;\wedge\; |x_{r,min}| \le d_{p12}\right) \;\vee\; \left(|x_r| \le d_{p12}\right). \qquad (11)$$
d_{p12} is the sum of the radii of the two colliding particles. The collision time Δt_col is calculated from the condition that the relative distance at that time has to be equal to d_{p12}:

$$|x_r + \Delta t_{col}\, u_{p,r}|^2 = d_{p12}^2. \qquad (12)$$

The solution of this equation is given by:
$$\Delta t_{col} = \Delta t_{min} \left(1 - \sqrt{1 - K_1 K_2}\right), \qquad K_1 = \frac{|x_r|^2\, |u_{p,r}|^2}{(x_r \cdot u_{p,r})^2}, \qquad K_2 = 1 - \frac{d_{p12}^2}{|x_r|^2}. \qquad (13)$$
Note that (12) has two solutions. The second one, not shown here, belongs to the state where the particles have already interpenetrated each other and is excluded a priori. The collision-normal vector can then easily be calculated:

$$x_{r,col} = x_r + u_{p,r}\, \Delta t_{col}. \qquad (14)$$
For handling the collision itself, it is necessary to determine the particle velocities in the direction normal to the collision:

$$u_{1n}^- = \frac{x_{r,col}}{|x_{r,col}|}\, u_{1x} + \frac{y_{r,col}}{|x_{r,col}|}\, u_{1y} + \frac{z_{r,col}}{|x_{r,col}|}\, u_{1z}, \qquad (15a)$$
$$u_{2n}^- = \ldots \qquad (15b)$$
x_{r,col}, y_{r,col} and z_{r,col} are the Cartesian components of the collision-normal vector and u_{1x}, u_{1y} and u_{1z} the Cartesian components of the particle velocity before the collision. If a collision is detected, the velocity of the colliding particles is changed according to a hard-sphere collision model:

$$u_{1n}^+ = \frac{m_{p1} u_{1n}^- + m_{p2} u_{2n}^- - e\, m_{p2} \left(u_{1n}^- - u_{2n}^-\right)}{m_{p1} + m_{p2}}, \qquad (16a)$$
$$u_{2n}^+ = \frac{m_{p1} u_{1n}^- + m_{p2} u_{2n}^- + e\, m_{p1} \left(u_{1n}^- - u_{2n}^-\right)}{m_{p1} + m_{p2}}. \qquad (16b)$$
u⁺_{1n} and u⁺_{2n} are the particle velocities in the direction normal to the collision after the collision, and e denotes the restitution coefficient: e = 1 for a fully elastic and e = 0 for a fully inelastic collision. In the present case it is set to e = 1. m_{p1} and m_{p2} stand for the particle masses. Note that if no friction is involved in the collision, as presently in our method, the particle velocity changes occur only in the collision-normal direction. With u⁻₁ and u⁻₂ denoting the pre-collision velocity vectors, the Cartesian components of the particle post-collision velocities are therefore predicted as follows:

$$u_1^+ = u_1^- + \left(u_{1n}^+ - u_{1n}^-\right) \frac{x_{r,col}}{|x_{r,col}|}, \qquad (17a)$$
$$u_2^+ = u_2^- + \left(u_{2n}^+ - u_{2n}^-\right) \frac{x_{r,col}}{|x_{r,col}|}. \qquad (17b)$$
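The complete pairwise check and hard-sphere update of Eqs. (9) to (17) can be condensed into a short routine. The following Python sketch is a scalar illustration with names of our own choosing, whereas the actual implementation is vectorized over the particles of a virtual cell:

import numpy as np

def collide_pair(x1, x2, u1, u2, m1, m2, d12, dt, e=1.0):
    """Kinematic collision check and hard-sphere update (sketch).

    d12 is the sum of the two particle radii, e the restitution
    coefficient; returns the (possibly updated) pair velocities.
    """
    xr, ur = x1 - x2, u1 - u2
    if np.dot(xr, ur) >= 0.0:
        return u1, u2                                  # not approaching, Eq. (9)
    dt_min = -np.dot(xr, ur) / np.dot(ur, ur)          # Eq. (10)
    xr_min = xr + ur * dt_min
    hit = (dt_min <= dt and np.linalg.norm(xr_min) <= d12) \
          or np.linalg.norm(xr) <= d12                 # Eq. (11)
    if not hit:
        return u1, u2
    k1 = np.dot(xr, xr) * np.dot(ur, ur) / np.dot(xr, ur) ** 2
    k2 = 1.0 - d12 ** 2 / np.dot(xr, xr)
    dt_col = dt_min * (1.0 - np.sqrt(max(1.0 - k1 * k2, 0.0)))  # Eq. (13)
    n = xr + ur * dt_col                               # collision normal, Eq. (14)
    n /= np.linalg.norm(n)
    u1n, u2n = np.dot(u1, n), np.dot(u2, n)            # Eqs. (15a), (15b)
    u1n_new = (m1 * u1n + m2 * u2n - e * m2 * (u1n - u2n)) / (m1 + m2)  # (16a)
    u2n_new = (m1 * u1n + m2 * u2n + e * m1 * (u1n - u2n)) / (m1 + m2)  # (16b)
    # frictionless collision: only the normal component changes, Eq. (17)
    return u1 + (u1n_new - u1n) * n, u2 + (u2n_new - u2n) * n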
3.2.3 Wall Collisions

If a particle passes the center of the cell adjacent to the wall, currently two different boundary conditions can be applied: (i) the particle sticks to the wall and is consequently removed from the computational domain, or alternatively (ii) the particle rebounds fully elastically. The latter implies that the sign of the velocity component normal to the wall is inverted while all other components are kept.
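A minimal sketch of these two conditions, with an illustrative interface, reads:

import numpy as np

def wall_boundary(u_p, n_wall, stick=False):
    """Particle-wall boundary condition (sketch).

    stick=True signals deposition: the caller removes the particle.
    Otherwise the particle rebounds fully elastically: the sign of the
    wall-normal velocity component is inverted, all others are kept.
    """
    if stick:
        return None
    u_n = np.dot(u_p, n_wall)          # wall-normal velocity component
    return u_p - 2.0 * u_n * n_wall    # invert the normal component only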
4 HPC Strategies

The optimization of the solver for the continuous phase in LESOCC was reported in several previous reports, see, e.g., [10, 16]. Thus, here we restrict the discussion to the solver for the dispersed phase.
As mentioned above, in order to track a huge number of particles it is important to work with efficient algorithms which are applicable on high-performance computers. The present scheme is highly efficient for the following reasons:

• No CPU time-consuming search algorithm is needed in the present c-space scheme.
• The particle properties are stored in linear arrays, which allows vectorization of all loops in the particle routines over the total number of particles on the processor. If the number of particles is reasonably large (e.g., > 256) the loops are efficiently carried out on the vector unit.
• Even if particles leave the present domain or are deposited, the linear arrays are kept filled by reordering the particles after each time step (a minimal sketch of this reordering step follows the list). This guarantees optimal performance.
• The multi-block exchange of the particle data completely relies on the same arrangement as used for the continuous phase. The data transfer itself is based on MPI.
• Parallelization of the particle routines (and also of the flow solver) is achieved by domain decomposition, i.e., each processor deals with the particles of its own block. A minor disadvantage of this procedure is that no load balancing of the particle tracking is possible, since the distribution of the particles is not known in advance. However, since the tracking is so efficient that the predominant part of the CPU time is still spent on the continuous phase, the imbalance observed for the particle routines is not decisive for the overall load balancing of the entire code.
• The collision detection procedure is carried out over the small number of particles contained in a virtual cell, which reduces the computational cost from O(N_p²) to O(N_p).
• Condition (9) is introduced to further reduce the number of potential collision partners.
• Vectorization of the most time-consuming loop of the collision check routine is achieved by splitting up the loop. Additionally, it is ensured that the loop is reasonably large to be efficiently carried out on vector units.
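The reordering (compaction) of the linear particle arrays can be sketched as follows, again with illustrative names:

import numpy as np

def compact_particles(props, active):
    """Keep the linear particle arrays contiguously filled (sketch).

    props  : dict of equally long NumPy arrays (positions, velocities, ...)
    active : boolean mask, False for particles that left the block or
             were deposited during the time step
    Contiguous arrays preserve long, vectorizable loops over the
    remaining local particle count.
    """
    keep = np.flatnonzero(active)
    return {name: arr[keep] for name, arr in props.items()}, keep.size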
5 Description of the Test Cases

5.1 Channel Flow

Preliminary tests examined the particle motion in a turbulent plane channel flow. The computational domain was 2πδ × πδ × 2δ in the streamwise, spanwise and wall-normal directions, respectively, where δ denotes the channel half width. The grid had 128 × 128 × 128 cells. In the streamwise and spanwise directions an equidistant grid was used; in the wall-normal direction the first cell center was located at Δy⁺ = 0.65. Periodic boundary conditions were applied at the streamwise and spanwise boundaries and
the no-slip condition at the walls. The Reynolds number based on the bulk velocity was Re = 11,900 (Re_τ = 640 based on the friction velocity); the particle density ratio and diameter were given by ρ_p/ρ_f = 2061 and d_p/δ = 7.5 × 10⁻³. The mass loading was set to Φ = 20%.
5.2 Cold Flow in a Combustion Chamber

To test the code with a more challenging case of practical relevance, a cold flow in a model combustion chamber was considered. The geometry, shown in Fig. 1, was chosen to match the configuration described in [3]. The particle-laden air flow with a mean velocity U_j of 3.1 m/s entered the chamber through a circular pipe located on the chamber axis. The particle diameter varied in the range d_p = 20 to 100 µm. The mass loading was Φ = 22% or 110%, but in the present report results are shown only for the low mass loading case. Clean air entered through an annular ring with a mean velocity U_e = 5.5 m/s and an outer radius R_a = 150 mm. Gravity acted in the flow direction. The inflow conditions were provided by two additional simulations using pipe and annular ring flows with periodic boundary conditions. For all computed cases with the same mass loading the same inflow data were used so that the comparison is restricted solely to the chamber flow. At the outflow a convective boundary condition was prescribed, and the no-slip condition was applied at the walls. The chamber of length L/R = 90 was discretized by about 1.3 × 10⁷ cells.
Fig. 1 Flow configuration of the model combustion chamber
6 Results

6.1 Channel Flow

Figure 2 shows the results compared with the experiments of Benson et al. [4]. The instantaneous flow field and the particle velocities are averaged in both homogeneous directions and additionally in time over a dimensionless time interval of about
Fig. 2 Plane turbulent channel flow. Mean streamwise velocity: a particle, b fluid, fluctuations in wall-normal direction: c particle, d fluid, fluctuations in streamwise direction: e particle, f fluid
ΔT = 980 in order to reach a statistically steady state. The velocity fluctuations are scaled with the centerline velocity U_c and the mean quantities with the friction velocity u_τ of the unladen flow. Figure 2a displays the mean particle velocity. The computed four-way coupled case is in very good agreement with the experimental data. It is obvious that considering the particle-particle interactions leads to a flatter mean velocity profile, which indicates an enhanced momentum transfer between the particles. This is underlined by the particle wall-normal fluctuations (see Fig. 2c), where particle-particle collisions lead to an increase of the fluctuations by a factor of about three with respect to the one-way and two-way coupled cases. This is astonishing since the mass loading is only Φ = 20%. Figure 2b shows the mean fluid velocity, which is hardly affected by the presence of the particles. The fluid velocity fluctuations, see Figs. 2d and 2f, are significantly attenuated by the particles, which is surprising at this mass loading. The influence is more pronounced for the four-way coupled simulations. Unfortunately, there are no experimental data available for this case. Figure 2e shows the particle streamwise velocity fluctuations, which are flatter in the channel center than in the corresponding one-way and two-way coupled calculations. This may be explained by the flatter mean particle velocity profile: if a particle moves from a region with a smaller mean velocity to a region with a larger mean velocity, then the flatter the mean profile, the smaller the difference between the instantaneous particle velocity and the local mean velocity. This of course also holds for the opposite case.
6.2 Cold Flow in a Combustion Chamber

Figures 3 and 4 show the mean streamwise velocity of the flow and the particles on the symmetry axis and at different planes normal to the axis in comparison with the experimental data of Boree et al. [3]. The mass loading here is Φ = 22%. The presented velocity profiles are scaled with the mean velocity U_j. Particles with a diameter of d_p = 50 µm were chosen for these preliminary results because they had the highest number of entities among the size distribution chosen by Boree et al. [3] and thus led to better converged statistics than particles with other diameters. Figure 3a shows that some deviations occur in the locations of the two stagnation points S1 and S2 and that the magnitude of the velocity between the stagnation points is lower than in the experiments of Boree et al. [3]. Furthermore, considering a two-way or four-way coupled flow allows the jet to penetrate deeper into the chamber than in the corresponding one-way coupled simulation. Figure 3b shows the mean fluid velocity at z/R = 0.3. A good agreement between experiment and simulation is found in the jet region. The predicted mean annular velocity profile is not as asymmetric around the centerline of the annular gap (at r/R = 11.25) as the measured profile. The short development section in [3] for the annular flow could explain this discrepancy. This deviation seems to affect the results in all other planes along the symmetry axis (Figs. 3c–3h). From Fig. 3c it is noticeable that the radial extension of the recirculation region at this measurement plane can
Fig. 3 Combustion chamber flow at Φ = 22%. Mean streamwise fluid velocity: a along the axis, b at z/R = 0.3, c z/R = 8, d z/R = 16, e z/R = 20, f z/R = 24, g z/R = 32, h z/R = 40
be very well captured by the simulation. The predicted fluid velocity in the following two measurement planes (see Figs. 3d and 3e) is higher than the measured one. This is a direct consequence of the differences in the location of the two stagnation points (see Fig. 3a). The mean fluid velocity in the planes z/R = 24 to 40 (Figs. 3f–3h) again shows close agreement with the experiments in the center of the chamber. Similar considerations hold for the particulate phase. Looking at the particle mean streamwise velocity along the axis, and especially at the measurement plane z/R = 0.3 (Figs. 4a and 4b), it is obvious that the measured particle velocity is lower than that of the carrier phase even though gravity was pointing in the flow direction. This is not reproduced by the simulation. Possible reasons for this anomalous behavior have to be investigated further. As for the fluid, the mean streamwise particle velocity at the plane z/R = 8 (Fig. 4c) shows a very good agreement with the experimental data. In the following two measurement planes (Figs. 4d and 4e) major differences between the numerical results and the experiment can be observed in the region surrounding the symmetry axis. Based on Fig. 4d it seems that the particles in the simulation adjust more quickly to the carrier phase than in the experiments. The mean velocity at the last three measurement planes is again in good agreement with the experiments (see Figs. 4f–4h).
7 Conclusions

The paper reports on the development of an efficient numerical methodology for the simulation of turbulent particle-laden flows at high mass loadings. In view of the results obtained in Sect. 6.1 the following conclusions can be drawn: (i) particle-particle collisions strongly enhance the momentum transfer between the particles, which yields a flatter mean velocity profile and higher fluctuations in the wall-normal direction; (ii) taking the collisions into account leads to a good agreement of the predictions with experimental data when the channel walls are smooth; (iii) collisions probably have to be considered even at small mass loadings such as in the case presented in Sect. 6.1, where owing to the high particle-fluid density ratio the volume fraction, α = 10⁻⁴, was even lower than the mass loading suggests. Regarding the cold flow in the combustion chamber, the agreement between the numerical results and the experiments of Boree et al. [3] is satisfactory in the two measurement planes located between the two stagnation points S1 and S2 and good in all other planes. To draw major conclusions we have to wait for converged particle and fluid statistics at the higher mass loading of Φ = 110%, which is presently under investigation; they will be presented in the following report. The computational cost of the collision search in the presented cases is quite low and amounts to approximately 2% of the whole computational effort. This raises very optimistic expectations regarding the feasibility of detailed and efficient simulations of turbulent flows with high mass loadings in complex geometries.
Fig. 4 Combustion chamber flow at Φ = 22%. Mean streamwise particle velocity: a along the axis, b at z/R = 0.3, c z/R = 8, d z/R = 16, e z/R = 20, f z/R = 24, g z/R = 32, h z/R = 40
Acknowledgments. The time-consuming computations were carried out on the national supercomputer NEC SX-9 at the High Performance Computing Center Stuttgart (grant no.: PARTICLE / pfs 12855), which is gratefully acknowledged.
References

1. Apte, S.V., Mahesh, K., Moin, P., Oefelein, J.C.: Large-Eddy Simulation of Swirling Particle-Laden Flows in a Coaxial-Jet Combustor, Int. J. Multiphase Flow, vol. 29, 1311–1331, (2003).
2. Bardina, J., Ferziger, J.H., Reynolds, W.C.: Improved Subgrid-Scale Models for Large Eddy Simulations, AIAA Paper, 80–1357, (1980).
3. Boree, J., Ishima, T., Flour, I.: The Effects of Mass Loading and Inter-Particle Collision on the Development of Polydispersed Two-Phase Flow Downstream of a Confined Bluff Body, J. Fluid Mech., vol. 443, 129–165, (2001).
4. Benson, M., Tanaka, T., Eaton, J.K.: Effects of Wall Roughness on Particle Velocities in a Turbulent Channel Flow, Trans. ASME J. Fluids Eng., vol. 150, 250–256, (2005).
5. Bird, G.A.: Molecular Gas Dynamics, Clarendon, Oxford, (1976).
6. Breuer, M.: Large Eddy Simulation of the Sub-Critical Flow Past a Circular Cylinder: Numerical and Modeling Aspects, Int. J. for Numer. Methods Fluids, vol. 28, 1281–1302, (1998).
7. Breuer, M.: Direkte Numerische Simulation und Large-Eddy Simulation turbulenter Strömungen auf Hochleistungsrechnern, Habilitationsschrift, Universität Erlangen-Nürnberg, Shaker, Aachen, (2002).
8. Breuer, M., Baytekin, H.T., Matida, E.A.: Prediction of Aerosol Deposition in 90° Bends Using LES and an Efficient Lagrangian Tracking Method, J. Aerosol Science, vol. 37(11), 1407–1428, (2006).
9. Breuer, M., Matida, E.A., Delgado, A.: Prediction of Aerosol Drug Deposition Using an Eulerian-Lagrangian Method Based on LES, Int. Conference on Multiphase Flow, ICMF 2007, July 9–13, 2007, Leipzig, Germany, (2007).
10. Breuer, M., Lammers, P., Zeiser, Th., Hager, G., Wellein, G.: Direct Numerical Simulation of Turbulent Flow Over Dimples—Code Optimization for NEC SX-8 plus Flow Results, High Performance Computing in Science and Engineering 2007, Transactions of the High Performance Computing Center Stuttgart, Oct. 4–5, 2007, eds. Nagel, W.E., Kröner, D., Resch, M., 303–318, Springer, Berlin, ISBN 978-3-540-74738-3, (2008).
11. Chen, M., Kontomaris, K., McLaughlin, J.B.: Direct Numerical Simulation of Droplet Collisions in a Turbulent Channel Flow. Part 1: Collision Algorithm, Int. J. Multiphase Flow, vol. 24, 1079–1103, (1998).
12. Crowe, C.T., Sharma, M.P., Stock, D.E.: The Particle-Source-In-Cell (PSI-CELL) Model for Gas-Droplet Flows, Trans. ASME J. Fluids Eng., vol. 99, 325–332, (1977).
13. Fukagata, K., Zahrai, S., Kondo, S., Bark, F.H.: Anomalous Velocity Fluctuations in Particulate Turbulent Channel Flow, Int. J. Multiphase Flow, vol. 27, 701–719, (2001).
14. Kulick, J.D., Fessler, J.R., Eaton, J.K.: Particle Response and Turbulence Modification in Fully Developed Channel Flow, J. Fluid Mech., vol. 277, 109–134, (1994).
15. Kussin, J., Sommerfeld, M.: Experimental Studies on Particle Behaviour and Turbulence Modification in Horizontal Channel Flow with Different Wall Roughness, Experiments in Fluids, vol. 33, 143–159, (2002).
16. Lammers, P., Wellein, G., Zeiser, Th., Hager, G., Breuer, M.: Have the Vectors the Continuing Ability to Parry the Attack of the Killer Micros?, High Performance Computing on Vector Systems, Proc. of the High Performance Computing Center Stuttgart, March 17–18, 2005, eds. Resch, M., Bönisch, Th., Benkert, K., Furui, T., Seo, Y., Bez, W., 25–37, Springer, Berlin, ISBN-10 3-540-29124-5, (2006).
17. Marchioli, C., Armenio, V., Soldati, A.: Simple and Accurate Scheme for Fluid Velocity Interpolation for Eulerian-Lagrangian Computation of Dispersed Flow in 3D Curvilinear Grids, Computers & Fluids, vol. 36, 1187–1198, (2007).
18. Maxey, M.R., Riley, J.J.: Equation of Motion for a Small Rigid Sphere in a Nonuniform Flow, Phys. Fluids, vol. 26, 883–889, (1983).
19. McLaughlin, J.B.: Inertial Migration of a Small Sphere in Linear Shear Flows, J. Fluid Mech., vol. 224, 261–274, (1991).
20. Paris, A.D., Eaton, J.K.: Turbulence Attenuation in a Particle-Laden Channel Flow, Report TSD-137, Dept. of Mech. Eng., Stanford University, (2001).
21. Pozorski, J., Apte, S.V.: Filtered Particle Tracking in Isotropic Turbulence and Stochastic Modeling of Subgrid-Scale Dispersion, Int. J. Multiphase Flow, vol. 35, 118–128, (2009).
22. Riber, E., Moureau, V., Garcia, M., Poinsot, T., Simonin, O.: Evaluation of Numerical Strategies for Large-Eddy Simulation of Particulate Two-Phase Recirculating Flow, J. Comput. Phys., vol. 228, 539–564, (2009).
23. Schiller, L., Naumann, A.: A Drag Coefficient Correlation, VDI Zeitschrift, vol. 77, 318–320, (1933).
24. Smagorinsky, J.: General Circulation Experiments with the Primitive Equations. I: The Basic Experiment, Month. Weath. Rev., vol. 91, 99–164, (1963).
25. Segura, J.C., Oefelein, J.C., Eaton, J.K.: Predictive Capabilities of Particle-Laden Large Eddy Simulation, Report TSD-156, Dept. of Mech. Eng., Stanford University, (2004).
26. Sommerfeld, M.: Theoretical and Experimental Modelling of Particulate Flows, Lecture Series 2000-06, von Karman Institute for Fluid Dynamics, (2000).
27. Viccione, G., Bovolin, V., Pugliese Carratelli, E.: Defining and Optimizing Algorithms for Neighboring Particle Identification in SPH Fluid Simulations, Int. J. Numer. Meth. Fluids, vol. 58, 625–638, (2008).
28. Yamamoto, Y., Potthoff, M., Tanaka, T., Kajishima, T., Tsuji, Y.: Large-Eddy Simulation of Turbulent Gas-Particle Flow in a Vertical Channel: Effect of Considering Inter-Particle Collisions, J. Fluid Mech., vol. 442, 303–334, (2001).
Large-Eddy Simulation of Supersonic Film Cooling at Finite Pressure Gradients∗

Martin Konopka, Matthias Meinke, and Wolfgang Schröder
Abstract Large-eddy simulations are performed to analyze film cooling in supersonic combustion ramjets (Scramjets). The transonic film cooling flow is injected through a slot parallel to a Ma = 2.44 main stream with a fully turbulent boundary layer. The injection Mach number is Mai = 1.2 and adiabatic wall conditions are imposed. The cooling effectiveness is investigated for adverse and favorable pressure gradients which are imposed onto the potential core region right downstream of the slot. The numerical results are in good agreement with the measured adiabatic cooling effectiveness. The turbulent mixing process of the injected cooling flow shows high turbulence levels just downstream of the lip and slowly increasing turbulence levels in the cooling flow. At a favorable pressure gradient, the adiabatic film effectiveness downstream of the potential core region is significantly increased by approximately 50% compared to the film cooling flow without a pressure gradient, whereas the adverse pressure gradient leads to a reduction of adiabatic film effectiveness by 30%.
Nomenclature

a          speed of sound
K          acceleration factor
k          turbulent kinetic energy
L          computational domain length
M          blowing rate
Martin Konopka · Matthias Meinke · Wolfgang Schröder
Institute of Aerodynamics, RWTH Aachen University, Wüllnerstr. 5a, 52062 Aachen, Germany, e-mail:
[email protected] ∗ © 2010 by the American Institute of Aeronautics and Astronautics, Inc. All rights reserved. Reprinted with permission of the American Institute of Aeronautics and Astronautics [1].
Ma         Mach number
p          pressure
Pr         Prandtl number
R          wall curvature radius
Re         Reynolds number
S          slot height
T          temperature
u, v, w    streamwise, wall-normal and spanwise velocity components
x, y, z    Cartesian coordinate system
δ          boundary layer thickness
ν          kinematic viscosity
ρ          density
Θ          dimensionless fluid temperature

Subscripts
2          region after pressure gradient
∞          freestream condition
max        maximum value
i          injected cooling stream
r          recovery value
RS         rescaling
t          total condition
aw         adiabatic wall

Superscripts
+          inner coordinates
∼          Favre average
″          Favre fluctuation
−          Reynolds average
′          Reynolds fluctuation
1 Introduction

Supersonic and hypersonic vehicles are subject to intense aerodynamic heating on the vehicle's exterior surfaces and inside the engine [2]. In Supersonic Combustion Ramjet (Scramjet) powered vehicles the combustion chamber and nozzle surfaces are exposed to the hot, burnt gas. Although new materials that can withstand high temperatures are being introduced, active cooling techniques are still required. One of these active cooling techniques is supersonic film cooling, which is a promising way to reduce the engine surface temperatures [3]. Former experimental studies of supersonic film cooling focused on the investigation of film effectiveness in both turbulent and laminar flow [4–7]. Later studies also included the effect of film cooling with shock wave interaction [8, 9]. Additionally, computational studies based on the Reynolds Averaged Navier Stokes (RANS) equations were performed for
varying slot geometries and slot blowing ratios [10, 11]. These RANS studies were also done for film cooling including shock-wave interaction [12, 13]. For compressible flows, very few studies exist which consider the effect of a favorable pressure gradient on film cooling, as present in a Scramjet Single Expansion Ramp Nozzle (SERN). Arnold et al. [14] investigated the effect of accelerated flow in a rocket combustion chamber, injecting the cooling flow under subsonic conditions. In Scramjets, the supersonic flow undergoes a sudden expansion in the SERN nozzle [15]. In the present report, this favorable pressure gradient is imposed onto the film cooling flow. Since there are not only shock waves with sudden pressure rises but also adverse pressure gradients due to supersonic combustion in Scramjets [16], the impact of adverse pressure gradients on film cooling is also analyzed by large-eddy simulation (LES) in the present report. This LES ansatz makes it possible to assess the physics associated with compressible turbulence [17] in detail. Here, a laminar cooling flow injected into a turbulent boundary layer is considered. The adiabatic cooling effectiveness data of Juhany et al. [18] are used to validate the present computations. The present report is a concise summary of the extensive study performed by Konopka et al. [1] and is organized as follows. First, the numerical method, the boundary conditions, and the flow configuration are discussed. Then, the results of the present computations are presented, including a discussion of the cooling effectiveness values. Finally, a summary is given.
2 Numerical Method

The Navier-Stokes equations for three-dimensional unsteady compressible flow are solved based on an LES formulation using the MILES (monotone integrated LES) approach [19] to model the impact of the subgrid scales. The discretization of the Euler terms consists of a mixed centered-upwind AUSM (advective upstream splitting method) scheme [20] of second-order accuracy, and the non-Euler terms are discretized to second-order accuracy using a centered approximation. The temporal integration is done by a second-order explicit 5-stage Runge-Kutta method. A detailed description of the fundamental flow solver is given by Meinke et al. [21]; the quality of its solutions in fully turbulent flow at sub- and supersonic Mach numbers is discussed thoroughly by Alkishriwi et al. [22] and El-Askary et al. [23].
3 Boundary Conditions and Computational Mesh

In large-eddy simulations of spatially developing boundary layers the proper prescription of inflow variables is still challenging. A possible solution is to compute the flow over the complete surface on which the boundary layer evolves. This approach would entail far too high computational costs for the present film cooling problem.
Fig. 1 Computational domains and boundary conditions of ZPG
To circumvent this problem, an independent, spatially developing boundary layer simulation is performed simultaneously with the film cooling simulation. The inflow distribution of the main simulation domain is determined by a slicing technique which inserts the data of the independent boundary layer simulation into the cooling computations. A sketch of the boundary conditions is given in Fig. 1. The auxiliary flat-plate simulation generates its own turbulent inflow data using the compressible rescaling method proposed by El-Askary et al. [23], which extends the approach of Lund et al. [24] to compressible flow. In the spanwise direction, fully periodic boundary conditions are used. On the wall, no-slip and adiabatic conditions are imposed. The outflow boundary conditions are based on the conservation equations written in characteristic variables. As illustrated in Fig. 1, a sponge layer is used to damp numerical reflections from the top and outflow boundaries. In the sponge layer, source terms are added to the right-hand side of the governing equations to relax the instantaneous pressure and density solutions towards the desired target solutions; a minimal sketch of such a relaxation term is given below. At the film cooling slot laminar supersonic inflow conditions are applied. The favorable pressure gradient is generated by a convex wall located at the streamwise position x/S = 10. The adverse pressure gradient is imposed onto the film cooling domain by a concave wall located on the upper boundary, which creates compression waves. Further details can be found in [1]. The structured computational mesh consists of up to 24 × 10⁶ grid points with a minimum wall resolution in inner coordinates of Δx⁺ = 20, Δy⁺ = 0.5, Δz⁺ = 10 and constant grid spacing in the streamwise and spanwise directions. The computational domain of the rescaling simulation is shown in Fig. 1; it has a spanwise extension in the z-direction of L_z,RS/S = 2.2, where the reference length S is the slot height. The lip thickness is 0.16 S. The dimensions of the main simulation domains, in which the supersonic cooling flow mixes with the approaching turbulent boundary layer, are depicted in Fig. 1. These domains have the same spanwise extension as the rescaling domain.
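The sponge-layer relaxation can be sketched as follows; the quadratic ramp for the damping coefficient is a common choice assumed here, not a detail quoted from [1], and all names are illustrative.

import numpy as np

def sponge_source(q, q_target, sigma):
    """Relaxation source term added to the right-hand side (sketch).

    Drives the state q (e.g. pressure or density) towards the target
    solution near the top and outflow boundaries; sigma >= 0 is the
    local damping coefficient.
    """
    return sigma * (q_target - q)

# assumed example: sigma grows quadratically across a layer of width L
def sponge_coefficient(x, x0, L, sigma_max):
    return sigma_max * np.clip((x - x0) / L, 0.0, 1.0) ** 2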
Table 1 Cooling flow properties

Case   Ma_i   T_ti/T_t∞   M_i = ρ_i u_i / ρ_∞ u_∞   K_max          Ma_2
ZPG    1.2    0.80        0.40                      –              2.44
FPG    1.2    0.80        0.40                      0.2 × 10⁻⁶     2.9
APG    1.2    0.80        0.40                      −0.2 × 10⁻⁶    2.1
4 Flow Configuration

Three supersonic cooling configurations are considered in the present study; they are summarized in Table 1. The flow configurations match the conditions used in the experiments by Juhany et al. [18]. The Reynolds number Re_S = u_∞ S/ν_∞ based on the freestream velocity, the kinematic viscosity, and the slot height S is 13,500. The thickness of the boundary layer at the tip of the slot is δ = 2.2S and the freestream Mach number is Ma = 2.44. The blowing rate M_i, the injection Mach number Ma_i, and the ratio T_ti/T_t∞ of the total injection and total freestream temperature are kept constant. The zero pressure gradient (ZPG) configuration is used to validate the present computation against the cooling effectiveness data obtained in the experiments by Juhany et al. [18]. A favorable pressure gradient (FPG) is imposed onto the cooling flow by a curved wall with a final wall angle of 10°, resulting in a maximum acceleration factor of K_max = (ν/u² · ∂u/∂x)_max = 0.2 × 10⁻⁶. At FPG the quantity K is computed using a wall-parallel coordinate x and velocity u. The Mach number downstream of the expansion fan is Ma₂ = 2.9. For the adverse pressure gradient (APG) configuration the maximum acceleration factor is K_max = −0.2 × 10⁻⁶ and the Mach number is Ma₂ = 2.1. For all cooling configurations a fully laminar inflow with a boundary layer thickness of δ = 0.067S at the slot exit is assumed. The injection pressure matches the freestream pressure p_∞.
5 Results

In this section the results of the numerical simulations are presented. First, the main flow characteristics of supersonic film cooling by slot injection and the imposed pressure gradients are explained. Subsequently, the instantaneous flow field is characterized, followed by a discussion of the validation against adiabatic cooling effectiveness data. The averaged flow field is then discussed using velocity, temperature, and total temperature profiles. Then, turbulence statistics and the turbulent transport of heat and momentum are presented.
5.1 Main Flow Characteristics of Supersonic Film Cooling with Pressure Gradients

In supersonic turbulent film cooling with laminar cooling flow injection, the flowfield can be divided into three characteristic regions, as shown in Fig. 2. Right downstream of the injection, the potential core region is observed [18, 26], where the wall temperature remains constant at the recovery value of the cooling flow. A laminar boundary layer exists which merges with the free shear layer emanating from the lip, marking the end of the potential core region. In the following wall jet region, the wall temperature increases due to the mixing of the slot boundary layer with the viscous layer from the lip. Further downstream, the wall jet no longer determines the near-wall flow. This is the boundary layer region, where the wall temperature still rises. Unlike in the wall jet region, the velocity profile has already developed a turbulent boundary-layer-like shape. Figure 3 illustrates the pressure distributions of the FPG and APG configurations. In both cases, an expansion fan originates from the lip and extends through the turbulent boundary layer into the freestream. This expansion fan is followed by a shock readjusting the flow angle to its freestream value. A second expansion fan emanates from the lower edge of the lip, impinges upon the wall, and is reflected. As shown in Fig. 3b, the compression waves interact with the expansion waves and the shock system of the lip, causing a complex pressure pattern. This configuration was chosen because in a supersonic combustor the pressure rise at the
Fig. 2 Main flow characteristics of supersonic film cooling with a laminar cooling flow injection [25, 26]
Fig. 3 Pressure contours
Fig. 4 Pressure coefficient distribution
combustor wall is imposed by the supersonic combustion and not by wall curvature. The pressure contours depicted in Fig. 3a are less complex, showing the typical pattern of a Prandtl-Meyer expansion along a convex wall. Figure 4 shows that the adverse and favorable pressure gradients are imposed in the region 10 ≤ x/S ≤ 20. The pressure drop of the expansion fan right at the wall and the regions of constant pressure coefficient beginning at x/S = 25 can be identified. The pressure gradients are primarily imposed onto the potential core region. This will be discussed in greater detail in Sect. 5.3.
5.2 Instantaneous Flowfield

The turbulent structures of the interaction of the transonic cooling flow with the turbulent boundary layer are visualized in Fig. 5 by the λ₂ criterion [27]. Besides the vortical structures in the approaching boundary layer, a small recirculation region at the lower wall is visible in Fig. 5a, where the lip expansion fan hits the laminar slot boundary layer of the coolant. The absence of any turbulent structures indicates the potential core region. The end of this region further downstream is shown in Fig. 5b, where elongated turbulent structures form in the boundary layer of the coolant and a bypass-like transition takes place. In this wall jet region the elongated turbulent vortices break down into smaller structures. A closer look into the shear layer downstream of the cooling slot reveals that the small eddies present in the flow close to the lip break down in the shear layer. The development of the temperature distribution in the interaction region of the shear layer with the laminar cooling flow is shown in Fig. 6. The instantaneous temperature field evidences the potential core region without any vortices downstream of the cooling slot. The mixing between the turbulent boundary layer and the cooling flow seems to be initially governed by small-scale vortices in the region 0 ≤ x/S ≤ 3 close to the lip. Further downstream, larger patches of fluid with temperatures above the static temperature of the cooling flow penetrate deeper into the potential core region, causing transition of the laminar slot boundary layer.
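For reference, the λ₂ vortex criterion used in Fig. 5 can be evaluated pointwise from the velocity gradient tensor; the sketch below uses illustrative names and is not taken from the authors' post-processing.

import numpy as np

def lambda_2(grad_u):
    """Second eigenvalue of S^2 + Omega^2 (lambda_2 criterion, sketch).

    grad_u : 3x3 velocity gradient tensor at a point; a vortex core is
             marked where the returned value is negative.
    """
    s = 0.5 * (grad_u + grad_u.T)       # strain-rate tensor
    w = 0.5 * (grad_u - grad_u.T)       # rotation-rate tensor
    m = s @ s + w @ w                   # symmetric by construction
    return np.linalg.eigvalsh(m)[1]     # eigenvalues in ascending order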
Fig. 5 Turbulent structures visualized by the λ₂ criterion at ZPG, mapped onto Mach number contours
Fig. 6 Instantaneous temperature contours in the slot vicinity at ZPG
5.3 Cooling Effectiveness

The supersonic film cooling configurations are evaluated by the adiabatic cooling effectiveness

$$\eta = \frac{T_{aw} - T_{r\infty}}{T_{ri} - T_{r\infty}}, \qquad (1)$$

where T_r∞ and T_ri are the recovery temperatures of the freestream and the coolant. Figure 7a shows a comparison between the measurements of the adiabatic wall temperature by Juhany et al. [18] and the numerical ZPG data. The end of the potential core region is correctly predicted by the present computation to be at x/S = 24. Up to x/S = 80 the numerical results are in good agreement with the experimental findings. The deviations further downstream may be attributed to shock reflections and pressure gradients in the wind tunnel, which do not occur in the current computations. The effectiveness values slightly above unity at ZPG and FPG are caused by the expansion fan and the shock of the lip, since the recovery temperature T_ri in the cooling effectiveness is set to the value at the wall at x/S = 0. At FPG, the favorable pressure gradient leads to a slight cooling of the wall and an extension of the potential core region up to x/S = 40. The following wall jet and boundary layer region is
Fig. 7 Comparison of present computational data with experimental results from Juhany et al. [18]
hardly impacted by the favorable pressure gradient, i.e. the cooling effectiveness has the same gradient as at ZPG. The cooling effectiveness at APG drops below unity at x/S = 15, i.e. the transition is shifted upstream. In Fig. 7b the recovery temperatures of the three cases are juxtaposed, showing the slight drop in wall temperature right downstream of the injection where the lip expansion fan and shock impinge upon the wall. The rise of the wall temperature at APG shows a different temperature gradient in the region 10 ≤ x/S ≤ 20 where the adverse pressure gradient is present. This is caused by a small separation region of the laminar boundary layer.
5.4 Mean Flow Field

The velocity profiles also evidence the three flow regions of the cooling flow. Figure 7a shows that the potential core region at ZPG extends up to x/S = 24. This region is also visible in Fig. 8a for all cases, where the velocity variation in the core region of the cooling flow remains zero until x/S = 20. The wall jet region cannot be clearly distinguished, and at x/S = 30 the velocity profile at ZPG already resembles that of a turbulent boundary layer. Right downstream of the lip the lower layer of the freestream boundary layer is rapidly accelerated. The influence of the adverse pressure gradient at APG becomes significant at x/S = 15, where a thicker cooling flow boundary layer as compared to ZPG can be observed. At x/S = 30 the lower parts of the boundary layer profile still have lower velocities. Further downstream the profile is similar to those at ZPG; the only difference between these profiles occurs in the freestream velocity. The velocity profiles at FPG in Fig. 8b show that until x/S = 40 the shear layer emanating from the lip and the laminar boundary layer of the cooling flow are still separated. Only at x/S = 60 have the two layers merged. This development is indicated by the similar shape of the FPG boundary layer profile compared to that of ZPG. The temperature profiles in Fig. 9a reveal the rapid temperature reduction in the shear layer since the viscous sublayer originating in the approaching
Fig. 8 Velocity profiles at several streamwise locations, grid spacing is u/u∞ = 1; at FPG the coordinates x and y are wall parallel and wall normal, respectively
Fig. 9 Temperature profiles at several streamwise locations, grid spacing is T/T∞ = 1; at FPG the coordinates x and y are wall parallel and wall normal, respectively
turbulent boundary layer is rapidly accelerated. A close look at the evolution of the wall temperature shows that the recovery temperature varies only slightly from its original value in the streamwise direction from x/S = 15 to 60. The accurate computation of this minute increase, shown in terms of the cooling effectiveness in Fig. 7a, demonstrates the quality of the solution. The static temperature profile at ZPG at x/S = 30 in Fig. 9b evidences a wall jet region, since the temperature gradient ∂T/∂(y/S) at y/S = 0.5 is still close to zero although the velocity profile at the same position already resembles that of a boundary layer. The temperature profiles at APG in Fig. 9b show a higher static temperature in the shear layer at y/S = 0. Only far downstream, at x/S = 60, does the temperature profile possess the shape of a turbulent boundary layer profile. At FPG in Fig. 9b the temperature is significantly reduced at y/S = −0.5 since the shear layer from the lip and the boundary layer of the coolant do not merge until x/S = 40. Additionally, the acceleration of the flow reduces the static temperature.
Fig. 10 Dimensionless fluid temperature Θ profiles indicating the mixture between the freestream and the coolant, grid spacing is Θ = 1; at FPG the coordinates x and y are wall parallel and wall normal, respectively
Figure 10 shows nondimensionalized total temperature profiles

$$\Theta = \frac{\bar{T}_t - T_{t\infty}}{T_{ti} - T_{t\infty}}. \qquad (2)$$
At the total temperature injection ratio T_ti/T_t∞ = 0.8, this fluid temperature amplifies changes in the Reynolds-averaged total temperature T̄_t by a factor of five. It is therefore possible to observe very small changes in the total temperature distribution of the current computations, allowing the mixing between the freestream and the coolant to be analyzed in detail; a small sketch of the evaluation of (1) and (2) is given below. Figure 10a shows the dimensionless fluid temperature profiles in the range 0 ≤ x/S ≤ 10. The total temperature overshoot of the compressible turbulent boundary layer is indicated by negative values of Θ with a maximum of 8%. This overshoot has been seen in measurements and direct numerical simulations of compressible boundary layers, e.g. by Pirozzoli et al. [28]. However, it does not occur in the hot-wire measurements of Hyde et al. [29], who conducted measurements of heated supersonic slot injection into a supersonic flow. In Fig. 10a the shear layer between the cooling flow and the freestream boundary layer is indicated by the steep gradient of Θ in all three cases. In Fig. 10b the dimensionless fluid temperature profiles of FPG and APG at x/S = 15 are hardly affected by the pressure distributions in this region. Only after the interaction region do the APG profiles show a slightly increased mixing between the two flows, indicated by a reduced dimensionless fluid temperature. The opposite trend is visible at FPG, where the favorable pressure gradient leads to an increase of Θ from the wall to the edge of the mixture layer. The total temperature overshoot is shifted away from the wall. As indicated by film effectiveness values above zero, the mixing process between the freestream boundary layer and the coolant is far from finished.
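The evaluation of the two measures (1) and (2) is straightforward; the following Python sketch with illustrative names makes the amplification factor explicit.

def cooling_effectiveness(T_aw, T_r_inf, T_r_i):
    """Adiabatic cooling effectiveness eta, Eq. (1) (sketch)."""
    return (T_aw - T_r_inf) / (T_r_i - T_r_inf)

def fluid_temperature(T_t_mean, T_t_inf, T_t_i):
    """Dimensionless fluid temperature Theta, Eq. (2) (sketch).

    For T_t_i / T_t_inf = 0.8 the denominator equals -0.2 * T_t_inf, so
    changes of the Reynolds-averaged total temperature appear in Theta
    amplified by a factor of five.
    """
    return (T_t_mean - T_t_inf) / (T_t_i - T_t_inf)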
Fig. 11 Turbulent kinetic energy profiles at several streamwise locations, grid spacing is √k/u∞ = 0.1; at FPG the coordinates x and y are wall parallel and wall normal, respectively
5.5 Turbulence Statistics

The profiles of the turbulent kinetic energy (TKE)

$$k = \frac{1}{2} \left(\overline{u'^2} + \overline{v'^2} + \overline{w'^2}\right), \qquad (3)$$
in Fig. 11a show the absence of fluctuations in the prescribed slot boundary layer flow. The local maximum at the streamwise location (x/S = 2, y/S = 1.5) is determined by the unsteady expansion fan at the lip; the unsteadiness is generated by the vortices in the approaching boundary layer. At the same streamwise location the small kink close to the wall is caused by another unsteady expansion fan impinging on the slot flow boundary layer. The turbulent kinetic energy downstream of the lip at y/S = 0 increases when the boundary layer mixes with the slot flow. Further downstream it decreases when the shear layer begins to amplify the turbulence levels in the cooling flow. Downstream of the potential core region the turbulence levels close to the wall increase rapidly, indicating a wall jet region at ZPG. The TKE profile at ZPG at x/S = 40 in the region −0.5 ≤ y/S ≤ 1 in Fig. 11b shows no significant variation, unlike the profile at x/S = 30, and thus suggests the beginning of a newly established boundary layer. Figure 11b evidences amplified turbulence levels for APG at x/S = 15 close to the wall. The large drop of TKE at y/S = −0.6 shows the forced transition of the slot flow boundary layer by the adverse pressure gradient without a complete mixing of the shear layer from the lip with the slot flow. Further downstream the TKE at APG peaks in a region where the adverse pressure gradient no longer exists. These higher turbulence levels do not change the cooling effectiveness gradient once the effectiveness values drop, as observed in Fig. 7a. The TKE profiles at FPG in Fig. 11b resemble those of case ZPG, but with reduced turbulence levels at x/S ≥ 30. At x/S = 60 the local minimum of the TKE profile vanishes, showing an increased mixing between the shear layer of the lip and the slot flow boundary layer.
5.6 Turbulent Transport of Heat and Momentum

In film cooling configurations the knowledge of the ratio of the turbulent eddy viscosity and the turbulent diffusivity is essential, since in most turbulence models a fixed relation, i.e., the turbulent Prandtl number Pr_t, is used. The definition of this quantity for compressible flows reads

$$Pr_t = \frac{\overline{\rho u'' v''}\; \left(\partial \widetilde{T}/\partial y\right)}{\overline{\rho v'' T''}\; \left(\partial \widetilde{u}/\partial y\right)}. \qquad (4)$$

To compare compressible turbulent Prandtl number distributions, it is important to use the same type of averaging. Therefore, the components of the turbulent Prandtl number are Favre averaged as in Guarini et al. [30]. A comparison of the turbulent Prandtl number of the rescaling domain with the DNS of Guarini et al. at Ma = 2.5 shows good agreement (Fig. 12), indicating the physically correct prescription of the turbulent boundary layer in the film cooling domain; a sketch of this evaluation is given below. Figure 13a evidences a general reduction of Pr_t in the shear layer at x/S = 15 at ZPG, although peak values around 2 occur at the interface between the potential core region and the shear layer emanating from the lip and at the end of the potential core region. High values of Pr_t can be observed where the shear layer from the lip mixes with the cooling flow boundary layer. This can be attributed to the intense momentum transport typical of a wall jet region. Downstream of this region the turbulent Prandtl number possesses values found in an undisturbed boundary layer, although the TKE profiles presented in Sect. 5.5 differ from those in an undisturbed boundary layer. Figure 13c shows that the presence of a favorable pressure gradient (FPG) leads to a significant reduction of the turbulent Prandtl number levels in the shear layer at x/S = 25. When the shear layer from the lip finally merges with the slot boundary layer at x/S = 40, the peak values of Pr_t are lower than at the end of the potential core region at ZPG. The APG configuration in Fig. 13d reduces the potential core region as previously observed. The comparison with the ZPG configuration shows the peak values of Pr_t to be reduced at x/S = 20 where the shear layer merges with the cooling flow boundary layer.
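A sketch of the Favre-averaged evaluation of (4) from time series at a single point, with illustrative names, reads:

import numpy as np

def turbulent_prandtl(rho, u, v, T, dT_dy, du_dy):
    """Favre-averaged turbulent Prandtl number, Eq. (4) (sketch).

    rho, u, v, T : time series of density, streamwise and wall-normal
                   velocity and temperature at one point
    dT_dy, du_dy : wall-normal gradients of the Favre-averaged fields
    """
    favre = lambda f: np.mean(rho * f) / np.mean(rho)
    uf, vf, Tf = u - favre(u), v - favre(v), T - favre(T)  # Favre fluctuations
    rho_uv = np.mean(rho * uf * vf)    # turbulent momentum flux
    rho_vT = np.mean(rho * vf * Tf)    # turbulent heat flux
    return (rho_uv * dT_dy) / (rho_vT * du_dy)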
Fig. 12 Comparison of the turbulent Prandtl number Prt in the rescaling domain with a DNS of Guarini et al. [30]
Fig. 13 Contours of the turbulent Prandtl number Pr_t
6 Computational Resources

The simulations were run on the NEC SX-9 installed at the HLRS in Stuttgart. The meshes for the three cases presented in Sect. 5 consist of approximately 24 million grid points, which were divided into 16 blocks. Each block resides on a single CPU and contains around 1.5 million grid points. The computations were performed on a single SX-9 node using a global memory of 31.98 GB. Data between the blocks are exchanged via MPI (message passing interface). The computational details for a single 12-hour run are given in Table 2.
Table 2 Performance on NEC SX-9 (case ZPG)

Number of CPUs                 16
Number of nodes                1
Grid points/CPU                1.51 × 10⁶
Total grid points              24.1 × 10⁶
Avg. user time [s]             41556
Avg. vector time [s]           38502
Vector operations ratio [%]    99.7
Avg. vector length             251.439
Memory/CPU [MB]                1998
Total memory [GB]              31.98
Avg. MFLOPS/CPU                17873
Max. MFLOPS/CPU                19590
Total GFLOPS                   285.881
7 Summary

Large-eddy simulations of supersonic cooling gas injection into a supersonic turbulent boundary layer were performed for zero, favorable, and adverse pressure gradient configurations. The comparison of the cooling gas injection solutions with measured cooling effectiveness data [18] showed good agreement. The wall jet downstream of the potential core region and the boundary-layer-like structure of the mixing were identified by analyzing the turbulence levels, the velocity profiles, and several temperature profiles. Unsteady expansion waves excite turbulence in the laminar slot flow boundary layer. The turbulent Prandtl number shows pronounced deviations from its standard value of 0.9. At the potential core region the turbulent shear layer initiates a bypass-like transition of the laminar slot flow boundary layer. The favorable pressure gradient reduces the turbulence levels in the shear layer and delays transition, whereas the adverse pressure gradient amplifies the turbulence levels and shifts the transition further upstream. The cooling effectiveness is affected by this upstream shift of the forced transition of the slot flow boundary layer, resulting in a 50% reduction compared to the film cooling flow at ZPG. The favorable pressure gradient increases the adiabatic cooling effectiveness by 30%. The analysis of the turbulent Prandtl number shows a reduction in the shear layer, which is affected by the favorable pressure gradient. In the wall jet region values of approximately 2 are observed, whereas the adverse pressure gradient results in findings similar to those under zero pressure gradient conditions.

Acknowledgments. The support of this research by the Deutsche Forschungsgemeinschaft (DFG) in the framework of the Research Training Group "Aero-Thermodynamic Design of a Scramjet Propulsion System for Future Space Transportation Systems" 1095/2 and by the High Performance Computing Center Stuttgart (HLRS) is gratefully acknowledged.
References

1. Konopka, M., Meinke, M., Schröder, W.: Large-Eddy Simulation of Supersonic Film Cooling. In: 46th AIAA/ASME/SAE/ASEE Joint Propulsion Conference & Exhibit, AIAA 2010-6792, Nashville, TN (2010)
2. Anderson, J.D.: Hypersonic and High-Temperature Gas Dynamics. AIAA Education Series. American Institute of Aeronautics and Astronautics, Inc. (2006)
3. Kanda, T., Masuya, G., Ono, F., Wakamatsu, Y.: Effect of Film Cooling/Regenerative Cooling on Scramjet Engine Performance. J. Prop. Pow. 10 (1994) 618–624
4. Goldstein, R., Eckert, E., Tsou, F., Haji-Sheikh, A.: Film Cooling with Air and Helium Injection Through a Rearward-Facing Slot into a Supersonic Air Flow. AIAA J. 4 (1966) 981–985
5. Parthasarathy, K., Zakkay, V.: An Experimental Investigation of Turbulent Slot Injection at Mach 6. AIAA J. 8 (1970) 1302–1307
6. Cary Jr., A., Hefner, J.: Film-Cooling Effectiveness and Skin Friction in Hypersonic Turbulent Flow. AIAA J. 10 (1972) 1188–1193
7. Richards, B., Stollery, J.: Laminar Film Cooling Experiments in Hypersonic Flow. J. Aircraft 16 (1979) 177–181
8. Kanda, T., Ono, F., Saito, T., Takahashi, M., Wakamatsu, Y.: Experimental Studies of Supersonic Film Cooling with Shock Wave Interaction. AIAA J. 34 (1996) 265–271
9. Holden, M., Nowak, R., Olsen, G., Rodriguez, K.: Experimental Studies of Shock Wave/Wall Jet Interaction in Hypersonic Flow. In: 28th Aerospace Sciences Meeting, AIAA Paper 90-0607, Reno, NV (1990)
10. O'Connor, J., Haji-Sheikh, A.: Numerical Study of Film Cooling in Supersonic Flow. AIAA J. 30 (1992) 2426–2435
11. Sarkar, S.: Numerical Simulation of Supersonic Slot Injection Into a Turbulent Supersonic Stream. International J. of Turbo and Jet-Engines 17 (2000) 227–240
12. Takita, K., Masuya, G.: Effects of Combustion and Shock Impingement on Supersonic Film Cooling by Hydrogen. AIAA J. 38 (2000) 1899–1906
13. Peng, W., Jiang, P.X.: Influence of Shock Waves on Supersonic Film Cooling. J. of Spacecraft and Rockets 46 (2009) 67–73
14. Arnold, R., Suslov, D., Haidn, O.J.: Film Cooling of Accelerated Flow in a Subscale Combustion Chamber. J. Prop. Pow. 25 (2009) 443–451
15. Mitani, T., Ueda, S., Tani, K., Sato, S., Miyajima, H.: Validation Studies of Scramjet Nozzle Performance. J. Prop. Pow. 9 (1993) 725–730
16. Scheuermann, T., Banica, M., Chun, J., von Wolfersdorf, J.: Eindimensionale Untersuchungen zur gestuften Brennstoffeinbringung in einer Scramjet-Brennkammer. In: Deutscher Luft- und Raumfahrtkongress 2008, DLRK2008-81208, Darmstadt (2008)
17. Bowersox, R., Schetz, J.: Compressible Turbulence Measurements in a High-Speed High-Reynolds-Number Mixing Layer. AIAA J. 32 (1994) 758–764
18. Juhany, K., Hunt, M., Sivo, J.: Influence of Injectant Mach Number and Temperature on Supersonic Film Cooling. J. of Thermophysics and Heat Transfer 8 (1994) 59–67
19. Boris, J., Grinstein, F., Oran, E., Kolbe, R.: New Insights into Large Eddy Simulation. Fluid Dynamics Research 10 (1992) 199–228
20. Liou, M., Steffen, C.J.: A New Flux Splitting Scheme. J. Comput. Phys. 107 (1994) 23–39
21. Meinke, M., Schröder, W., Krause, E., Rister, T.: A Comparison of Second- and Sixth-Order Methods for Large-Eddy Simulations. Comp. Fluids 31 (2002) 695–718
22. Alkishriwi, N., Meinke, M., Schröder, W.: A Large-Eddy Simulation Method for Low Mach Number Flows Using Preconditioning and Multigrid. Comp. Fluids 35 (2006) 1126–1136
23. El-Askary, W., Schröder, W., Meinke, M.: LES of compressible wall-bounded flows. Technical Report 2003-3554, AIAA (2003)
24. Lund, T.S., Wu, X., Squires, K.D.: Generation of Turbulent Inflow Data for Spatially-Developing Boundary Layer Simulations. J. Comput. Phys. 140 (1998) 233–258
25. Juhany, K.: Supersonic Film Cooling Including the Effect of Shock Wave Interaction. PhD thesis, California Institute of Technology, Pasadena, California (1994)
26. Seban, R.A., Back, L.H.: Velocity and Temperature Profiles in Turbulent Boundary Layers with Tangential Injection. J. of Heat Transfer 84 (1962) 45–54
27. Jeong, J., Hussain, F.: On the Identification of a Vortex. J. Fluid Mech. 285 (1995) 69–94
28. Pirozzoli, S., Grasso, F., Gatski, T.B.: Direct numerical simulation and analysis of a spatially evolving supersonic turbulent boundary layer at M = 2.25. Phys. Fluids 16 (2004) 530–545
29. Hyde, C., Smith, B., Schetz, J., Walker, D.A.: Turbulence Measurements for Heated Gas Slot Injection in Supersonic Flow. AIAA J. 28 (1990) 1605–1614
30. Guarini, S.E., Moser, R.D., Shariff, K., Wray, A.: Direct numerical simulation of a supersonic turbulent boundary layer at Mach 2.5. J. Fluid Mech. 414 (2000) 1–33
Prediction of Stability Limits of Combustion Chambers with LES
B. Pritz, F. Magagnato, and M. Gabi
Abstract Lean premixed combustion, which reduces the production of thermal NOx, is prone to combustion instabilities. Extensive research is devoted to developing a reduced physical model which allows the resonance characteristics of a combustion system consisting of Helmholtz resonator type components (burner plenum, combustion chamber) to be calculated without time-consuming measurements. For the formulation of this model, numerical investigations by means of compressible Large Eddy Simulation (LES) were carried out. In these investigations the flow in the combustion chamber is isothermal, non-reacting, and excited with a sinusoidal mass flow rate. First a combustion chamber as a single resonator and subsequently a coupled system of a burner plenum and a combustion chamber were investigated. In this paper the results of additional investigations of the single resonator are presented. The flow in the combustion chamber was investigated without excitation at the inlet. It was found that the mass flow rate at the outlet cross section pulsates once the flow in the chamber is turbulent. The fast Fourier transform of the signal showed that the dominant mode is at the resonance frequency of the combustion chamber. This result sheds light on a very important source of self-excited combustion instabilities. Furthermore, the LES can provide not only the damping ratio for the analytical model but also the eigenfrequency of the resonator. Key words: compressible large-eddy simulation, combustion instabilities, oscillating flow, damping ratio
B. Pritz · F. Magagnato · M. Gabi
Department of Fluid Machinery, University of Karlsruhe, Kaiserstr. 12, 76131 Karlsruhe, Germany, e-mail: [email protected], [email protected], [email protected]
1 Introduction

It is well known that, in order to fulfil the stringent demands for low NOx emissions, the lean premixed combustion concept is commonly used. However, lean premixed combustors are susceptible to thermo-acoustic instabilities driven by the combustion process and possibly sustained by a resonant feedback mechanism coupling pressure and heat release [1, 2]. This resonant feedback mechanism creates pulsations, typically in the frequency range of several hundred Hz, which reach such high amplitudes that the system has to be shut down in order not to be damaged. Although the research activities of recent years have contributed to a better understanding of this phenomenon, the underlying mechanisms are still not well enough understood. Combustion instabilities are characterized by a time-dependent heat release rate of the flame as well as by the time-dependent static pressure within the combustion chamber or the connected volumes such as the burner plenum or exhaust gas system. To predict whether a technical combustion system can develop and maintain self-sustained combustion instabilities, two ingredients are needed: knowledge of the periodic non-stationary mixing and reaction behavior of the applied flame type (flame model) [3–5], and a quantitative description of the resonance characteristics of the gas volume in the combustion chamber. At the University of Karlsruhe an analytical model to predict the resonance characteristics of real, damped combustion systems was developed [6]. As a result of the foregoing research, the model is able to describe the resonance characteristics of a single Helmholtz resonator type combustor and to scale them to different operating conditions and geometries [7]. Furthermore, the model was extended to describe a system of two coupled resonators [8]. The predictions of the model were validated against experimental data. It is important to mention that this model describes the system in the low frequency range and does not cover high frequency instabilities. In parallel with the experimental investigations, numerical simulations were carried out with the in-house developed flow solver SPARC [9]. The main goal of the numerical investigation was to predict the damping coefficient of the system, which is an important input for the analytical model. By means of numerical simulation and the analytical model, the resonance characteristics of a combustion system can be predicted already during the design phase. The results of the investigations of the single resonator [10] and the coupled resonators [11] showed that LES can accurately predict the resonance characteristics and the damping, respectively. The comparison of the LES study with the experiments sheds light on the significant role of the wall roughness in the exhaust gas pipe. Recently the Discrete Element Method (DEM) was implemented in our research code in order to model the effect of surface roughness. The simulations with DEM showed good agreement with the experimental results [12]. In preparation for the simulations with DEM, a calculation with constant mass flow rate at the inlet was carried out. In SPARC the full multigrid method is implemented, which implies also grid sequencing. This method allows a statistically steady state solution to be obtained much faster.
The waves generated by the initialization decay relatively fast on the coarsest grid level. Additionally, the computation is very fast because of the significantly reduced number of control volumes. For the combustion chamber four grid levels were used. Based on earlier experience, the excitation was started from the case with constant mass flow rate only on the third grid level, i.e., on the second finest mesh. If the solution of a coarser grid is used as initialization on the next finer grid, transient waves are generated again. In the case of the combustion chamber, the mass flow rate signal at the outlet of the exhaust gas pipe could be used to observe the decay of these waves. It was found that, after the transient waves had decayed, the mass flow rate kept pulsating continuously with a non-negligible amplitude. The frequency spectrum of the mass flow rate signal showed a peak at the eigenfrequency of the combustion chamber. The only possible forcing of this pulsation was the turbulent fluctuations generated by the jet in the combustion chamber.
2 Simulated Configuration

The simulated configuration was the same as described by Magagnato et al. [10]. The combustion chamber had a simple cylindrical geometry. The diameter and length of the chamber were dcc = 0.3 m and lcc = 0.5 m, and the diameter and length of the exhaust gas pipe were degp = 0.08 m and legp = 0.2 m, respectively. The mass flow rate and the temperature of the fluid at the inlet were ṁ = 0.017 kg/s and T = 298 K, respectively. At the far field the pressure was set to p = 104755 Pa and the temperature to T = 292 K.
3 Numerical Method

The simulations were carried out with the in-house developed parallel flow solver SPARC (Structured PArallel Research Code). The code is based on a three-dimensional block-structured finite volume method and is parallelized with the message passing interface. For the combustor the compressible Navier-Stokes equations were solved. This was essential to capture the physical mechanism of the pulsation amplification, which is mainly the compressibility of the gas volume in the chamber. Furthermore, the viscous effects play a crucial role in the oscillating boundary layer in the neck of the Helmholtz resonator and, hereby, in the damping of the pulsation. The spatial discretization was a second-order accurate central difference formulation. The temporal integration was carried out with a second-order accurate implicit dual-time stepping scheme. For the inner iterations a 5-stage Runge-Kutta scheme was used. The time step was Δt = 10⁻⁵ s. The Smagorinsky-Lilly model was used as the subgrid-scale model, as it had also been used for the earlier computations in [10]. A simulation with the dynamic Smagorinsky model gave similar results.
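For reference, the Smagorinsky-Lilly closure computes the subgrid-scale eddy viscosity from the resolved strain rate. In its standard form (the report does not state the model constant used; Cs ≈ 0.1–0.2 is typical) it reads
$$
\nu_t = (C_s \Delta)^2\, |\bar{S}|, \qquad |\bar{S}| = \sqrt{2\,\bar{S}_{ij}\bar{S}_{ij}}, \qquad \bar{S}_{ij} = \frac{1}{2}\left( \frac{\partial \bar{u}_i}{\partial x_j} + \frac{\partial \bar{u}_j}{\partial x_i} \right),
$$
with Δ a filter width tied to the local grid size.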
The detailed description of the boundary conditions can be found in [10]; here only a brief listing is given. For this computation a constant mass flow rate was imposed on the inlet plane. The outlet was placed in the far field. At the surfaces a no-slip boundary condition and an adiabatic wall are imposed. Since y⁺ < 1 is obtained for the first grid point, the effect of the wall on the turbulence is modeled with a van Driest type damping function. For the present calculation the wall in the exhaust gas pipe was aerodynamically smooth. The entire computational domain contains about 4.3 × 10⁶ grid points in 111 blocks.
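The van Driest damping mentioned above reduces the modeled turbulent stresses near the wall. In its classical form (the exact variant and constants used in SPARC are not stated here; A⁺ ≈ 26 is the standard value) the subgrid length scale is multiplied by
$$
D(y^+) = 1 - e^{-y^+/A^+},
$$
so that the eddy viscosity scales as ν_t ∝ (C_s \Delta\, D)² and vanishes at the wall.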
4 Resonance Characteristics of the Combustion Chamber

The investigations in [10] represent the system identification of a combustion chamber under isothermal conditions. At the inlet a mass flow rate with a sinusoidal component was imposed and the mass flow rate at the outlet cross section of the exhaust gas pipe was recorded. The amplitude ratio of these two mass flow rate signals was calculated at different discrete excitation frequencies in the range of the eigenfrequency of the chamber (Fig. 1). Since the geometry was rather simple, the eigenfrequency could be predicted quite accurately by the undamped Helmholtz resonator model equation.
Fig. 1 Response curve of the combustion chamber
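As a rough cross-check of that statement, the undamped Helmholtz eigenfrequency can be estimated directly from the geometry of Sect. 2. The sketch below is the author's illustration, not taken from the report; in particular the end-correction factor of 0.85·d is an assumed textbook value:

```python
import math

# Geometry from Sect. 2: chamber = resonator volume, exhaust pipe = neck
d_cc, l_cc = 0.3, 0.5      # chamber diameter / length [m]
d_egp, l_egp = 0.08, 0.2   # exhaust gas pipe diameter / length [m]
T = 292.0                  # far-field temperature [K]

c = math.sqrt(1.4 * 287.0 * T)           # speed of sound [m/s]
A = math.pi * (d_egp / 2.0) ** 2         # neck cross section [m^2]
V = math.pi * (d_cc / 2.0) ** 2 * l_cc   # resonator volume [m^3]
l_eff = l_egp + 0.85 * d_egp             # neck length + assumed end correction

f = c / (2.0 * math.pi) * math.sqrt(A / (V * l_eff))
print(f"undamped Helmholtz eigenfrequency ~ {f:.1f} Hz")
```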
As discussed in [10], the investigations showed the important effect of the surface quality in the exhaust gas pipe on the damping. In order to capture this effect with CFD as well, modeling of the surface roughness is needed. The DEM was chosen and implemented in SPARC. The system identification described above was repeated with simulated roughness in the exhaust gas pipe. Before the excitation at the inlet was applied, a statistically steady flow was computed on the coarsest and second coarsest grid levels. The experience of the earlier investigations showed that the transient waves generated at the start of the computation should have decayed before starting the excitation, in order to obtain the real system response at the outlet. The calculation is initialized with a homogeneous distribution of each variable, which produces quite strong transient waves. The mass flow rate signal at the outlet of the exhaust gas pipe was used to monitor the decay of these waves. As soon as an almost constant mass flow rate was reached, the computation could be continued on the second coarsest grid level. The extrapolation of the solution from the coarser to the next finer grid level also produces transient waves because of the sudden change of the shear stress at the walls. These waves are much smaller than the waves generated at the initialization, but they are still considerable on the second coarsest grid level. The mass flow rate signal at the outlet showed the decay of these waves, but later a certain amount of pulsation was observed which did not decay at all. The amplitude of this pulsation was not negligible, and the frequency of the dominating wave was approximately at the eigenfrequency of the combustion chamber. Therefore the computation without excitation was continued on the finer grid levels as well.
5 Results and Discussion

The frequency spectrum plotted in Fig. 2 is computed from the mass flow rate signal on the finest mesh. The samples were taken at each time step; the sampling frequency was thus 10 kHz and the sampling length was 32768. In [6] it was shown that the mass flow rate signal at the outlet and the pressure signal in the combustion chamber can be used equivalently as output signals, i.e., a pulsation of the mass flow rate indicates a pulsation of the pressure in the chamber. The Fourier transform of the pressure signal measured at the middle of the side wall of the chamber gives the same distribution. The form of these spectra agrees very well with the frequency response curve of the combustion chamber shown in Fig. 1. The mass flow rate at the inlet for this calculation was kept at a constant value. There was no external excitation in this computation. The only possible forcing of the pulsation could arise from the turbulent motions inside the combustion chamber. The inflow into the chamber is a jet with a strong shear layer which generates a broadband spectrum of turbulent fluctuations (Fig. 3). The combustion chamber then amplifies the pressure fluctuations generated by turbulence at its eigenfrequency.
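A spectrum of this kind can be reproduced from the recorded samples with a plain FFT. The sketch below is a minimal illustration; the signal array and the file name are assumptions, not taken from the report:

```python
import numpy as np

fs = 10_000.0   # sampling frequency [Hz] (stated in the text)
n = 32768       # sampling length (stated in the text)

# mdot: array of n mass flow rate samples at the outlet (hypothetical file)
mdot = np.load("mdot_outlet.npy")

# One-sided amplitude spectrum of the fluctuating part of the signal
spec = np.abs(np.fft.rfft(mdot - mdot.mean())) / n
freq = np.fft.rfftfreq(n, d=1.0 / fs)
print("dominant mode at %.1f Hz" % freq[np.argmax(spec)])
```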
Fig. 2 FFT spectrum of the mass flow rate signal at the outlet cross section of the exhaust gas pipe at constant mass flow rate at the inlet
In order to investigate the effect of periodic flow instabilities, further calculations with ṁ = 0.034 kg/s and ṁ = 0.0136 kg/s were carried out. The spectra of the mass flow rate of these calculations give the same distribution in the low frequency range, except that the amplitude of the pulsation changed proportionally to the mean mass flow rate. Several possible mechanisms that could trigger self-excited instabilities in combustion systems are listed in the literature, but they are not sufficiently understood [6, 13, 14]. An important achievement of this simulation is to show that the pressure in the combustion chamber can pulsate even without any external excitation, e.g., a compressor, other incoming disturbances from the ambient, or periodic flow instabilities depending on the design of the burner. Thus the flame is also pulsating. The amplitude of this pulsation will be amplified to the limit cycle if the time lag of the flame changes such that the pressure fluctuation and the heat release fluctuation meet the Rayleigh criterion. The results of the earlier investigations show that the pulsation and the high shear in the resonator neck produce a highly anisotropic swirled flow. It is therefore improbable that an Unsteady Reynolds Averaged Navier-Stokes Simulation (URANS) can render such a flow reliably. Furthermore, if the turbulence is modeled statistically, it cannot excite the flow in the combustion chamber. The use of LES for investigating combustion instabilities is essential.
Fig. 3 Flow pattern in the combustion chamber: isosurfaces of the Q-criterion at 5 × 104 s−2
For the analytical model the eigenfrequency of the system is an important input parameter. If the geometry is rather simple, the undamped Helmholtz model can be used. A further important achievement of the present computation is that the eigenfrequency of a system with highly complex geometry can be predicted without an additional modal analysis. The calculation with constant mass flow rate is a preparation for the investigation with excitation at the inlet; the latter calculation provides the damping ratio for the analytical model. Future work will investigate the coupled system and a multi-burner configuration with the same approach.
6 Computational Efficiency

For the computation of these results we have been using up to 108 Opteron processors of the HP XC4000 in Karlsruhe. The in-house developed code SPARC is parallelized with the MPI 1.2.7 software. The computational time for one point of the excitation frequency was about 400 h. Since we were using 796 blocks of the finite volume scheme, we could efficiently distribute the blocks on the 108 processors with the domain decomposition technique. The load balancing was at about 97%. The parallel efficiency was also very good: since in Karlsruhe the communication is done with the InfiniBand 4X DDR interconnect, the parallel efficiency was close to 98%. From our recent investigations we know that a higher resolution of the computational mesh is required. We expect that using about 24 million points in the next phase will be adequate for a well resolved Large Eddy Simulation.
7 Conclusion

At the University of Karlsruhe an analytical model to predict the resonance characteristics of real, damped combustion systems was developed and validated by experimental investigations. Simultaneously, a series of LES was carried out to investigate the pulsating flow in the combustion chamber. The results showed that LES can predict the resonance characteristics of the system quite accurately. The investigation of the case with constant mass flow rate at the inlet shows that the turbulence generated in the combustion chamber can force a pressure pulsation in the chamber. The latter can be a source of self-excited thermo-acoustic oscillations in the combustion system. Furthermore, it is possible to predict the eigenfrequency of the system by the computation, which is an important input parameter for the analytical model.

Acknowledgments. The present work is a part of the subproject A7 of the Collaborative Research Centre (CRC) 606—"Unsteady Combustion: Transport Phenomena, Chemical Reactions, Technical Systems" at the University of Karlsruhe. The project is supported by the German Research Foundation.
References

1. Lefebvre AH. Gas Turbine Combustion, Taylor & Francis, Philadelphia, 1999.
2. Lieuwen T, Yang V (Eds.) Combustion Instabilities in Gas Turbine Engines, AIAA, U.S., 2006.
3. Kuelsheimer C, Buechner H. Combustion Dynamics of Turbulent Swirling Flames. Combustion and Flame 2002, 131 (1–2), 70–84.
4. Buechner H, Lohrmann M. Coherent Flow Structures in Turbulent Swirl Flames as Drivers for Combustion Instabilities. Proc. Int. Colloquium on Combustion and Noise Control 2003, ISBN 1-871315-82-4.
5. Lohrmann M, Buechner H. Influence of the Air Preheating Temperature on the Flame Dynamics of Kerosene-LPP Swirl Flames. Proc. European Combustion Meeting 2003.
6. Buechner H. Stroemungs- und Verbrennungsinstabilitaeten in technischen Verbrennungssystemen. Habilitation, Universitaet Karlsruhe (TH) 2001.
7. Arnold G, Buechner H. Modelling of the Transfer Function of a Helmholtz-Resonator-Type Combustion Chamber. Proc. European Combustion Meeting 2003.
8. Russ M, Buechner H. Berechnung des Schwingungsverhaltens gekoppelter Helmholtz-Resonatoren in technischen Verbrennungssystemen. Verbrennung und Feuerung 2007.
9. Magagnato F. KAPPA-Karlsruhe Parallel Program for Aerodynamics. TASK Quarterly 1998, 2, 215–270.
10. Magagnato F, Pritz B, Buechner H, Gabi M. Prediction of the Resonance Characteristics of Combustion Chambers on the Basis of Large-Eddy Simulation. J Thermal Sci 2005, 14, 156–161.
11. Pritz B, Magagnato F, Gabi M. Stability Analysis of Combustion Systems by Means of Large Eddy Simulation. Proc. Conference on Modelling Fluid Flow 2009, Budapest, Hungary.
12. Pritz B, Magagnato F, Gabi M. Investigation of the Effect of Surface Roughness on the Pulsating Flow in Combustion Chambers with LES. Proc. EU-Korea Conference on Science and Technology 2008, Heidelberg, Germany.
13. Poinsot T, Veynante D. Theoretical and Numerical Combustion, R.T. Edwards Inc., Ann Arbor, 2005.
14. Joos F. Technische Verbrennung, Springer, Berlin, 2006.
Numerical Simulation of Laminar-Turbulent Transition on a Dolphin Using the γ-Reθ Model
D. Riedeberger and U. Rist
Abstract The γ-Reθ model, a two-equation, correlation-based transition model using local variables, has been employed to predict the extent of the laminar regions on a rigid geometry of the common dolphin (Delphinus delphis) moving in the Reynolds regime of 5.5 · 10⁵ to 10⁷. Mesh independence was achieved for a domain resolution of approximately 30 million cells in an unstructured polyhedral mesh with a prismatic wall region (y⁺ ≈ 1). The final results show very limited laminar regions and thus a mainly turbulent flow around the body of a dolphin traveling at the usual speed of 3 m/s, with a resulting drag coefficient of CD ≈ 0.004 referred to the wetted surface area of A = 1.571 m². Consequently, the potential for active laminarization due to the anisotropic structure of the dolphin skin is established and estimated to be as high as 20% with respect to drag force reduction.
D. Riedeberger · U. Rist
Institut für Aerodynamik und Gasdynamik, Universität Stuttgart, Pfaffenwaldring 21, D-70550 Stuttgart, Germany, e-mail: [email protected], [email protected]

1 Introduction

The inherent differences between laminar and turbulent flow characteristics relating to skin friction are not only a great issue in industrial applications but also fed the discussion on dolphin locomotion for a long time in the past century. Whether the dolphin shows a potential to extend the laminar flow region and thus reduce the drag force on its body has been in question since the proposition of Gray's paradox [4] and has recently been revisited in a broader review [3]. Newly developed modeling approaches based on experimental correlations for transition onset, together with eddy-viscosity turbulence models for the Reynolds-Averaged Navier-Stokes equations (RANS) of unstructured flow solvers, enable the study of transitional flow around complex geometries. Together with the present support of High Performance Computers (HPC) it is now possible to simulate the flow around the dolphin suffi-
ciently accurately, with a discretization of a rigid model that includes the boundary layer characteristics and applies turbulence and transition modeling. As a result, the Reynolds-number-dependent transition behavior as well as the influence of the turbulence level in the free stream on the onset of turbulent flow can be addressed, and an estimation of the potential for active laminarization of the dolphin skin can be given. The governing equations and the transition model formulation are outlined in Sect. 2. Afterwards the numerical details are given. The results of the flow around the dolphin are presented and discussed in Sect. 4, followed by an overview of the computational performance in Sect. 5. A conclusion summarizes the achievements.
2 Physical Modeling

The fluid flow is modeled based on the incompressible Navier-Stokes equations for a fluid with constant properties:
$$
\nabla \cdot \mathbf{u} = 0 \qquad (1)
$$
$$
\rho \frac{D\mathbf{u}}{Dt} = \rho \mathbf{f} - \nabla p + \nabla \cdot \boldsymbol{\tau} \qquad (2)
$$
The energy equation has been omitted here, as the calculations dealt with incompressible flow and thus the energy equation can be treated as decoupled from momentum and mass conservation. For convenience the above equations can be non-dimensionalized, yielding the formulation with the Reynolds number in the momentum equation as it is commonly found in the literature [14].
RANS Equations and Turbulence Closure

To treat flows that are inherently turbulent it is feasible to use the Reynolds decomposition, leading to the Reynolds-Averaged Navier-Stokes equations in the form of
$$
\frac{\partial \bar{u}}{\partial x} + \frac{\partial \bar{v}}{\partial y} + \frac{\partial \bar{w}}{\partial z} = 0 \qquad (3)
$$
$$
\rho \frac{\partial \bar{u}_i}{\partial t} + \rho \frac{\partial \bar{u}_i \bar{u}_j}{\partial x_j} = \bar{f}_i - \frac{\partial \bar{p}}{\partial x_i} + \frac{\partial}{\partial x_j} \left[ \eta \left( \frac{\partial \bar{u}_i}{\partial x_j} + \frac{\partial \bar{u}_j}{\partial x_i} - \frac{2}{3}\,\delta_{ij}\,\frac{\partial \bar{u}_k}{\partial x_k} \right) - \rho\,\overline{u_i' u_j'} \right] \qquad (4)
$$
where Cartesian notation was used.
For turbulence closure the Boussinesq approximation is used in the eddy-viscosity formulation of the SST k-ω model [9], which is adapted in line with the transition model formulation [10].
Correlation-Based Transition Modeling

The γ-Reθ transition modeling relies on empirical correlations that relate the onset of transition to the boundary layer's momentum-thickness Reynolds number. This approach is based on the early work of Abu-Ghannam and Shaw [1], who obtained a functional relationship for the transition onset with respect to free-stream turbulence level and pressure gradient in the form Reθt = f(Tu, λθ), where λθ is the Thwaites pressure gradient parameter. This empirical basis has been further refined and extended to low free-stream-turbulence environments by Menter and Langtry [6]. As the evaluated boundary layer momentum thickness is an integral parameter, the work of Menter et al. [10] proposed to use the relation between Reθ and the local vorticity Reynolds number ReV = Ω ρ y²/μ, which is a locally available scalar value. The formulation Reθ = max(ReV)/2.193 enables reconstruction of the non-local value from a locally available one. The fact that this matching becomes improper for stronger deviations of the shape factor from the Blasius solution is actually used in the model for capturing the separation behavior [10]. The onset criterion for transition—empirically known in the free stream by the previous definitions—is transported as R̃eθt inside the boundary layer using the following equation:
$$
\frac{\partial (\rho \tilde{Re}_{\theta t})}{\partial t} + \frac{\partial (\rho u_j \tilde{Re}_{\theta t})}{\partial x_j} = P_{\theta t} + \frac{\partial}{\partial x_j}\left[ \sigma_{\theta t}\,(\mu + \mu_t)\,\frac{\partial \tilde{Re}_{\theta t}}{\partial x_j} \right] . \qquad (5)
$$
Based on this flow field variable, an intermittency scalar γ causes the production terms of the underlying SST turbulence model to be switched on according to the equation
$$
\frac{\partial (\rho \gamma)}{\partial t} + \frac{\partial (\rho u_j \gamma)}{\partial x_j} = P_\gamma - E_\gamma + \frac{\partial}{\partial x_j}\left[ \left( \mu + \frac{\mu_t}{\sigma_f} \right) \frac{\partial \gamma}{\partial x_j} \right] . \qquad (6)
$$
In both cases P and E represent production and destruction terms, respectively, which are given by the model formulation of Menter [10, 11], who further demands the specification of correlations [6] based on experiments to close the formulation. The necessary model-inherent correlations, first kept proprietary, were proposed by different researchers [7, 13] before finally being published by Langtry and Menter [6]. Concerning the implementation into the solver STAR-CCM+, some minor adaptations were proposed, and the documentation by Malan et al. [8] is best consulted for an overview of these.
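The factor 2.193 can be checked numerically for a laminar flat-plate-like profile. The sketch below is the author's illustration, using the Pohlhausen quartic u/U = 2η − 2η³ + η⁴ as a stand-in for the Blasius profile; it recovers max(ReV)/Reθ ≈ 2.19, close to the model constant:

```python
import numpy as np

eta = np.linspace(0.0, 1.0, 20001)    # y / delta
f = 2*eta - 2*eta**3 + eta**4         # u / U, quartic boundary layer profile

# Momentum thickness: theta/delta = integral of f (1 - f) d(eta)
integrand = f * (1.0 - f)
theta = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(eta))

# Vorticity Reynolds number: Re_V = y^2 |du/dy| / nu = Re_delta * eta^2 f'(eta)
dfdeta = np.gradient(f, eta)
g = eta**2 * dfdeta                   # Re_V / Re_delta along the profile

print(np.max(g) / theta)              # ~2.19, cf. the constant 2.193
```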
3 Numerical Details

For the volume mesh generation within the software STAR-CCM+ by cd-adapco, unstructured polyhedral cells were used, combined with prismatic wall-adjacent cells sufficiently fine to achieve a dimensionless distance of the wall-adjacent centroid of y⁺ ≈ 1 and well below that. Prior studies within the scope of this work assessed the model implementation using a flat plate and an axisymmetric airship body. Various mesh topologies (hexahedral, polyhedral), cell densities (100 to 400 surface nodes per characteristic length) as well as different wall-normal refinements were considered. As a result, the best practice for the generation of the volume mesh for the transition simulations on the dolphin was found to be at least 100 nodes across the body length together with automatic local mesh refinements in areas of high curvature. The first wall-adjacent prismatic cell was 10⁻⁵ m high and expanded with a ratio of 1.1 to an overall boundary mesh thickness of 3.5 to 5 mm. As the dolphin model has an overall characteristic length of L = 1.94 m, this relates to a near-wall mesh of the order of 0.26% of the body dimension. It was found that the choice of the wall-normal mesh resolution is most important for the transition modeling, as already published before [5]. Thus a slight upstream shift of transition is found for y⁺ values above unity, and the range between y⁺ = 0.01 and y⁺ = 1 delivers results best matched to experimental studies [5]. A mesh study on a dolphin model without fins enabled a proper addressing of any mesh-related qualitative offsets, and thus the cases presented within this report are judged to be mesh independent. The volume mesh was coarsened towards the boundaries using hanging-node refinement, and the domain borders were at least 2.5 times the body length (L = 1.94 m) away from the dolphin surface—placement of the boundaries at ten times this value showed no qualitative and no major quantitative differences in the results. The domain was bounded by a subsonic velocity inlet, a pressure outlet (with preceding 5 m of extruded mesh), slip walls on the outer boundaries, and no-slip surfaces on the dolphin. Preconditioning of the field was not straightforward due to the complex geometry; thus the flow was allowed to develop from an initially resting fluid. For the flow simulations the rigid body with pectoral and dorsal fins and tail fluke consisted of a volume mesh of around 30.5 million polyhedral cells. Studies on mesh density and a large set of free-stream-turbulence intensity as well as Reynolds-number variations were done on a 16 million cell model of the dolphin without any fin appendices. Within the solver STAR-CCM+ the governing and modeling equations are addressed with a finite-volume method using cell-centered discretization. A segregated, implicit approach with Rhie-Chow interpolation [2] based on the AMG SIMPLE algorithm is applied to the flow and energy equations. The transport equations for momentum, for the modeling of SST k-ω as well as the γ-Reθ model are implemented using second-order upwinding.
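The prism-layer heights quoted above follow directly from the geometric growth law. The sketch below is a minimal illustration; the layer count is not stated in the text and is inferred here under the assumption of a pure geometric progression:

```python
import math

h0 = 1.0e-5   # first wall-adjacent cell height [m] (from the text)
r = 1.1       # expansion ratio (from the text)

def total_thickness(n_layers: int) -> float:
    """Total thickness of a geometric prism-layer stack."""
    return h0 * (r**n_layers - 1.0) / (r - 1.0)

# Layers needed to reach the reported 3.5-5 mm overall boundary mesh thickness
for target in (3.5e-3, 5.0e-3):
    n = math.ceil(math.log(1.0 + target * (r - 1.0) / h0, r))
    print(f"target {target*1e3:.1f} mm -> {n} layers, "
          f"actual {total_thickness(n)*1e3:.2f} mm")
```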
4 Results and Discussion

The flow of water around the full dolphin body was computed for varying Reynolds numbers ranging from ReL = 0.54 · 10⁶ to ReL = 5.4 · 10⁶, which relate to free-stream velocities from u∞ = 0.25 to u∞ = 2.5 m/s. Plots of the turbulent kinetic energy of the wall-adjacent cell are given in Fig. 1. In these plots the high values at the leading edges of the fins locally exceed the highest contour value; the scale was clipped to facilitate better plotting of the main transition phenomena. Clearly there is a visible transition process taking place whose location shifts upstream with increasing free-stream Reynolds number. While at low velocities the eye serves as a turbulence trip and only few portions of the body experience turbulent characteristics, the higher free-stream velocities cause the flow to be mainly turbulent, only leaving very short laminar patches on the forehead of
Fig. 1 Turbulent kinetic energy k for free-stream velocities of u∞ = 0.25 (upper), 1.0 (mid) and 2.5 m/s (lower), turbulence intensity Tu = 1%
Fig. 2 Tomograms of k and C f at different downstream body locations x = 0.25, 0.35, 0.65 m for u∞ = 0.25 m/s (upper) and u∞ = 1.0 m/s (lower), Tu = 1%, observation of body transition
the dolphin. Additional insight into the flow development and the dependence on the Reynolds number can be found when the turbulent kinetic energy is plotted in tomograms perpendicular to the stream-wise extent of the body, as given in Fig. 2. At a velocity of 1 m/s both the eye and the breath hole can be identified as causing local transition to turbulence. As visible, the plot at x = 0.35 m just behind the location of the eye (centered at approx. x = 0.3 m) shows a rise in turbulent kinetic energy k correlating with an increase in skin friction. Similarly, after the breath hole at location x = 0.65 m both skin friction and turbulent energy are elevated compared to the lateral parts of the body. In addition, it is evident from the given tomograms for 1 m/s that the snout sheds a broad turbulent fraction at the lower body which extends further below the pectoral fins. The authors believe that the cause of this lies in vortical structures that develop around the shape of the snout and that are forced to follow along the lower parts of the body due to downward-facing streamlines on the connection to the forehead. Indeed, one can identify such a rise in turbulent kinetic energy on the lower body for the 0.25 m/s flow as well, at a location of x = 0.25 m and less intense at x = 0.35 m. It is seen that the onset of turbulence is suppressed in the direction of the flow, most probably due to favorable pressure gradients caused by the geometric body shape. The importance of the pressure gradient on transition suppression can be verified by looking at Fig. 3, where the dimensionless pressure coefficient built with the free-stream velocity u∞ = 1.0 m/s is plotted on the surface of the dolphin. This
Fig. 3 Distribution of pressure coefficient C p for free-stream velocities of u∞ = 1.0 m/s, side (upper) and top (lower) projection, turbulence intensity Tu = 1%
Fig. 4 Streamlines along the dolphin body for u∞ = 1.0 m/s, Tu = 1%
pressure distribution was also found to be representative of all the incompressible calculations within this Reynolds-number regime. Quite obviously, the frontal regions of snout and head are exposed to the flow and build stagnation regions, while the downstream portion of the body experiences acceleration resulting in decreasing static pressure until midway through the body, where a large region of minimal pressure extends from the pectoral to the dorsal fin. Thus, the region of the body downstream of the connection of the fin appendices is governed by adverse pressure gradients. A closer look at the pressure distribution on the appendices will follow later in the report. If the flow patterns are observed more closely, the streamlines along the body can be consulted. They are given in Fig. 4 for a rake position just upstream of the snout. At first it is obvious that the appendices of both the pectoral and the dorsal fin
Fig. 5 Shape and Cp distribution for the fin appendices
influence the flow pattern along the body in such a way that they deflect the streamlines away from the fins, resulting in wake regions behind them. In addition, the influence of the snout is visible. As already anticipated from the distribution of turbulent kinetic energy on the body, there is a region at the lower front body where the flow is forced downwards due to the shape of the snout and the connection region to the head. Another analysis of the flow patterns around the body focuses on the fin appendices. Although the streamlines along the appendices are not given within this text in order to keep the overview brief, the results of velocity distribution and streamline patterns seemed to support the studies already done by Pavlov on the dorsal fin [12]. The profile shape and pressure distribution on the appendices were gained by creating cut sections along the chord approximately midway through the extent of each fin. This resulted in a chord length of L = 0.0929 m for the pectoral fin, L = 0.1413 m for the dorsal fin and L = 0.1155 m for the tail fluke of the dolphin. The respective profile shapes and the resulting pressure distributions are given in Fig. 5. Both the pectoral and dorsal fin show a slightly delayed maximum thickness, which potentially makes them laminar profiles. The pressure distributions behave accordingly, while the pectoral fin has to be regarded separately since, unlike the dorsal fin and the tail fluke, it is angled to the flow direction. Thus, the location of the lowest pressure does not relate to the thickest point as is the case for the dorsal fin and tail fluke. If one wants to draw conclusions concerning the transitional behavior of the fin appendices, Fig. 1 can be consulted. More conveniently, the skin-friction coefficient across the chord length of the fins is given in Fig. 6. In all cases the skin friction shows laminar behavior almost halfway through the chord length of the fins. In most of the lower- or middle-Reynolds-number cases transition seems to be triggered by separation on the pectoral fin and the tail fluke. In addition, it has to be noted that the
Fig. 6 Skin friction coefficient C f along the pectoral (top) and dorsal (middle) fin and the tail fluke (bottom), Tu = 1%, ReL = 0.544 · 106 . . . 1.089 · 107
maximum skin-friction coefficient for the pectoral fin is located further downstream than for the other appendices, due to the angled arrangement in the flow. The transition behavior of the fins is not surprising if one takes into account that the characteristic length of the fin appendices is one order of magnitude smaller than that of the body; thus the Reynolds-number regime is correspondingly lower, relating to smaller disturbances in the flow fluctuations. Concerning the body forces that act on the dolphin as a result of the transitional flow, it was found that with rising turbulence level the overall drag coefficient
CD rises as well, whereas a clear Reynolds-number dependence could not be extracted. Thus, for the given regime of free-stream flow from 0.25 to 5 m/s an average value of CD = 0.0038 (standard deviation 0.00026) was found using a representative model of the dolphin without inclusion of the fins. As a means to evaluate possible laminarization potentials, a figurative calculation was pursued. For the cases of u∞ = 1 and u∞ = 2.5 m/s the drag coefficient for a low turbulence case (Tu = 0.0025), at which transition only occurs very late on the body, was utilized for calculating the drag force on a high turbulence case (Tu = 0.01) with early transition at the same velocity. As a result, the force acting on the body was reduced by 18.3% and 26.7% for the high and low velocity, respectively. This is meant to give a ball-park figure of the existing potential for active laminarization. It remains to apply turbulence damping models or other ways to simulate laminar patches in transitional flow simulation to support this view in future numerical studies.
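For orientation, the drag force implied by these coefficients at cruising speed follows from the standard definition; the worked number below is the author's illustration, with the water density ρ ≈ 998 kg/m³ assumed (not stated in the text) and A = 1.571 m² and CD ≈ 0.004 taken from the abstract:
$$
F_D = \tfrac{1}{2}\,\rho\, u_\infty^2\, A\, C_D \approx \tfrac{1}{2} \cdot 998 \cdot 3^2 \cdot 1.571 \cdot 0.004 \approx 28\ \mathrm{N},
$$
so a 20% laminarization benefit would correspond to roughly 5–6 N at 3 m/s.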
5 Computational Resources and Performance of the Solver STAR-CCM+

The present study was done using the NEC Nehalem Cluster at HLRS with two quad-core CPUs and 12 GB RAM per node and an InfiniBand network connection. Version 5.04.006 of the commercially available solver STAR-CCM+ from cd-adapco was used, which has a client-server architecture and automatically partitions the mesh, distributes the partitions to the machine processes and controls data exchange via an additionally placed control process. Due to the necessary grid refinement in the wall-adjacent boundary layer and the complexity of the dolphin shape, the grid generation was pursued on one node with increased memory of at least 48 GB. For most of the parametric studies 48 processes on 8 nodes with 12 GB each were used, relating to 650,000 cells placed per core to gain a sufficient balance between workshare and communication. While overall convergence for the momentum and continuity equations was already obtained after 4000 iterations in 42 hours of elapsed time, the turbulence and transition model equations were observed until 8000 iterations at an overall time of approximately 84 hours. Table 1 gives an overview of the performance of a transitional simulation in comparison to a solely turbulent RANS simulation. There the calculations ran on 8 nodes with 63 processes in total, and convergence in those cases was achieved as continuity and y-momentum residuals reached 10⁻⁵ and the turbulent kinetic energy settled at a residual of 4 · 10⁻⁴. While the first simulation only used an activated SST-k-ω model, the second run also iterated on the two transport equations of the γ-Reθ transition model. Obviously the need for solving two additional transport equations adds to the computational time. While the turbulent calculation iterates longer to reach convergence, it still results in a shorter overall run time compared to the case with activated transition modeling. This increase to almost twice the solution time can be explained by two things. First, the solution of the additional equations demands longer iteration time. Second, the implementation of the transition
Table 1 Performance of turbulent and transitional calculation on full dolphin body on NEC Nehalem

used models         CPU (nodes)  iterations  run time [h]  I/O [min]
SST-k-ω             63 (8)       3791        15.09         15.6
SST-k-ω + γ-Reθ     63 (8)       2543        28.05         23.2
Fig. 7 Elapsed run time of transitional simulation for different number of working processes on the NEC Nehalem cluster
model into the solver demands the global broadcast of the location of the cells close to the boundary layer edge as a tree to all mesh partitions every several iterations, which consequently produces overhead when run in parallel [8]. The I/O time reported in Table 1 results from the writing of restart files and is higher for the transitional case, as the file sizes are higher in this case as well. To judge the scalability of the solver, Fig. 7 shows total solver elapsed times for simulations running on 1 to 31 processors on one to four dedicated nodes. The respective calculations were all run until residuals for continuity below 10⁻⁵ were achieved. Obviously the application scales quite well, and distributing the mesh partitions leads to a good reduction in overall solution time. It is necessary to place an additional control process for each running simulation within the STAR-CCM+ solver, which handles communication between the workers and the host. Consequently, the process counts of 7, 15 and 31 relate to fully used nodes with one additional control process each. In similar runs of the simulation it was observed that placing this control task on a node outside of completely covered working nodes (e.g. 32 processes on four nodes plus one control process on an additional node) leads to considerable added communication time and thus to a deviation from the expected scaling. This effect is most obvious if one looks at calculations within one node (max. 8 processors). While the scaling from one to seven processors (plus controller) is nearly ideal, the placement of the controller outside the fully used node raises the overall solution time to 12.23 h as opposed to 8.23 h for a run on seven processors. As more nodes are used this effect becomes less important, which is reasonable as communication overhead is incurred for global node communication anyway. Furthermore, it was observed that the automatic partitioning of the mesh also resulted in spurious oscillations in a few cases, which led to a far extended run time to reach convergence. Overall it has to be noted that all scaling evaluations of the solver were done during normal operation of the Nehalem cluster of the
HLRS, and no special precautions could be taken to reserve whole nodes or blades exclusively. Nevertheless, it is obvious that STAR-CCM+ makes it easy to profit from HPC infrastructures with an MPI interface, but that mesh partitioning and parallelization optimization for a specific machine environment is much more cumbersome than with codes that can be user-adapted and tailored to the computational environment.
6 Conclusions

The possibility to account for a broad range of physical influences on the transition phenomenon has been well implemented into unstructured CFD codes through the γ-Reθ model by Menter and Langtry [6], and its application to STAR-CCM+ by Malan et al. [7]. Due to the modeling of both turbulence and transition within the framework of the RANS equations, it is now possible to use HPC to address complex geometric setups, as opposed to direct numerical simulation which, at the same computational cost, focuses on the underlying physics in simple flow geometries. The advantage of correlation-based modeling for turbulence onset also enables the transition model to gain precision if the onset criteria are fed with more sophisticated parametric studies through either experiments or DNS. Using this transition model coupled to the eddy-viscosity turbulence closure of the SST-k-ω model, the flow around half-symmetric three-dimensional dolphin geometries with fin appendices could be simulated. Regarding normal swimming speeds of the dolphin around 3 m/s in a 1% turbulence-intensity environment, it was found that the flow around the dolphin is mainly turbulent with limited laminar regions at the front of the head. Parametric studies concerning different environments of turbulence intensity were used to show the strong influence of the turbulence level on the shear-force contribution to the overall drag. The flow pattern around the rigid full body resulted in a drag coefficient of CD ≈ 0.004, which is of the same order of magnitude as found in the literature. With this knowledge it was possible to roughly estimate drag reductions of around 20% if laminarization techniques exist that can delay the onset of turbulence to locations comparable to lower values of turbulence intensity. Furthermore, the strong impact of the free-stream turbulence level on transition occurrence and location emphasizes the demand to investigate the turbulence values typically present in the marine surrounding, in order to be able to simulate the swimming environment and flow characteristics of the dolphin in closer match to reality. In conclusion, the flow around a rigid dolphin model in the regime of ReL > 5 · 10⁶ shows transition at very early locations. The limited laminar regions sustain the hope of an active laminarization potential due to, e.g., a certain structure (i.e. compliant walls or surface roughness) of the dolphin skin. Acknowledgments. The authors greatly appreciate the support by Vadim V. Pavlov who kindly provided the CAD data of the dolphin model that was the basis of the simulations. In addition,
the grant of computation time on the NEC Nehalem cluster of the High Performance Computing Center Stuttgart (HLRS) in the framework of the LAMTUR project is gratefully acknowledged as well.
References

1. B. J. Abu-Ghannam and R. Shaw. Natural transition of boundary layers—the effects of turbulence, pressure gradient and flow history. Journal of Mechanical Engineering Science, 22 (5): 213–228, 1980.
2. I. Demirdzic and S. Muzaferija. Numerical method for coupled fluid flow, heat transfer and stress analysis using unstructured moving meshes with cells of arbitrary topology. Computer Methods in Applied Mechanics and Engineering, 125 (1–4): 235–255, 1995.
3. F. E. Fish. The myth and reality of Gray's paradox: Implication of dolphin drag reduction for technology. Bioinspiration & Biomimetics, 1 (2): R17–R25, 2006.
4. J. Gray. Studies in animal locomotion: VI. The propulsive powers of the dolphin. Journal of Experimental Biology, 13: 192–199, 1936.
5. R. B. Langtry. A Correlation-Based Transition Model using Local Variables for Unstructured Parallelized CFD Codes. PhD thesis, Universität Stuttgart, 2006.
6. R. B. Langtry and F. R. Menter. Correlation-based transition modeling for unstructured parallelized computational fluid dynamics codes. AIAA Journal, 47 (12): 2894–2906, December 2009.
7. P. Malan, K. Suluksna, and E. Juntasaro. Calibrating the γ-Reθ transition model. In ERCOFTAC Bulletin 80, pages 53–57.
8. P. Malan, K. Suluksna, and E. Juntasaro. Calibrating the γ-Reθ transition model for commercial CFD. In 47th AIAA Aerospace Sciences Meeting, January 2009.
9. F. R. Menter. Two-equation eddy-viscosity turbulence models for engineering applications. AIAA Journal, 32 (8): 1598–1605, August 1994.
10. F. R. Menter, R. B. Langtry, S. R. Likki, Y. B. Suzen, P. G. Huang, and S. Völker. A correlation-based transition model using local variables – Part I: Model foundation. Journal of Turbomachinery, 128: 413–422, July 2006.
11. F. R. Menter, R. B. Langtry, S. R. Likki, Y. B. Suzen, P. G. Huang, and S. Völker. A correlation-based transition model using local variables – Part II: Test cases and industrial applications. Journal of Turbomachinery, 128: 423–434, July 2006.
12. V. V. Pavlov. Dolphin skin as a natural anisotropic compliant wall. Bioinspiration & Biomimetics, 1: 31–40, 2006.
13. K. Suluksna, P. Dechaumphai, and E. Juntasaro. Correlations for modeling transitional boundary layers under influences of freestream turbulence and pressure gradient. International Journal of Heat and Fluid Flow, 30: 66–75, 2009.
14. F. M. White. Viscous Fluid Flow. McGraw-Hill Series in Mechanical Engineering, 2nd edition, 1991.
Wall Effects and Corner Separations for Subsonic and Transonic Flow Regimes
Alexander Klein, Sebastian Illi, Klemens Nübler, Thorsten Lutz, and Ewald Krämer
Abstract Corner flow effects prove to have a significant influence on the results obtained in wind tunnels. During the SCBOPT project, discrepancies between experiments and numerical investigations occurred. 3D numerical studies considering the wind tunnel were performed to show the effect of wind tunnel walls on the obtained results in both subsonic and transonic flows. The subsonic simulation of the DNW-TWG wind tunnel is presented within this report. Further investigations of a wing-fuselage intersection were conducted to investigate the effect of corner separation near the trailing edge. It is shown that the presence of the fuselage not only has a local effect in the vicinity of the corner separation but also influences the complete flow around the wing.
Alexander Klein · Sebastian Illi · Klemens Nübler · Thorsten Lutz · Ewald Krämer
Institute of Aerodynamics and Gas Dynamics, University of Stuttgart, Pfaffenwaldring 21, 70569 Stuttgart, Germany

1 Introduction

Previous investigations within the SCBOPT project showed discrepancies between the experimental and numerical results for a shock control bump (SCB). Oblique shocks were present, originating from the juncture of the airfoil's leading edge with the wind tunnel side walls. These shocks interacted with the main shock system, leading, compared to free-stream conditions, to an upstream displacement of the normal shock. Extensive modeling of the wind tunnel itself proved to be essential [13]. This effect is a special case in the broader category of corner flow effects, which occur every time two solid surfaces intersect at right or near-right angles. Examples of such flows can be found in the literature: Zamir [25] did experimental investigations and stability analysis on corner flows, Mikhail et al. [17] numerically analyzed the flow along a 90-deg axial corner, and Shankar et al. [21] did
investigations for the swept compressive-compressive 90° corner arrangement. Besides these theoretical investigations, corner flows are relevant for many engineering problems such as the flow on the tailplane of airships [6] or wing-fuselage intersections [11]. The latter are investigated in this document, where the flow around the junction between the wing and the fuselage was simulated using the unstructured solver TAU. During the study a generic aircraft was investigated to show the influence of the presence of the fuselage compared to the wing-only model. In the course of a recent airfoil measurement campaign in the transonic wind tunnel (TWG) in Göttingen, the measured aerodynamic lift differed decisively from the values previously predicted numerically for the 2D airfoil geometry. These differences are again expected to be caused by wind tunnel side wall interferences and were investigated by 2D and 3D RANS simulations using the block-structured solver FLOWer.
2 Numerical Method

The CFD simulations were performed using the structured finite volume solver FLOWer [14] and the unstructured code TAU [19], which were both developed by the German Aerospace Center (DLR), German universities and industry. The codes use explicit time stepping with a multistage Runge-Kutta scheme and optionally implicit time stepping with an LU-type scheme. Flux discretization is achieved by either upwind or central schemes, where the central schemes according to [15, 23] were used for the following investigations. Parallelization and different convergence accelerators like local time stepping, geometric multigrid or residual smoothing are supported. For time-accurate simulations, dual and global time stepping approaches are implemented.
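For orientation, such a multistage Runge-Kutta time integration can be written in the generic m-stage form (the stage coefficients α_k are solver settings not given in this report):
$$
W^{(0)} = W^n, \qquad W^{(k)} = W^{(0)} - \alpha_k\, \Delta t\, R\!\left( W^{(k-1)} \right), \quad k = 1, \dots, m, \qquad W^{n+1} = W^{(m)},
$$
where R denotes the residual of the spatial discretization.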
2.1 Governing Equations

Both FLOWer and TAU solve the three-dimensional Reynolds-Averaged Navier-Stokes equations (RANS) in integral form

$$\frac{\partial}{\partial t}\int_V W \,\mathrm{d}v + \oint_{\partial V} F \cdot n \,\mathrm{d}s = 0 \qquad (1)$$

with the vector of the conservative variables

$$W = [\rho,\ \rho u,\ \rho v,\ \rho w,\ \rho E]^T \qquad (2)$$

on finite volume meshes. The conservative variables are given in a Cartesian coordinate system, with ρ, u, v, w, E denoting the density, the Cartesian components of the velocity vector v, and the specific total energy, respectively. V represents the control volume and ∂V its closed outer surface. The flux tensor F is split into a convective inviscid part F^c and a viscous part F^v such that

$$F = F^c - F^v \qquad (3)$$
with

$$F^c = \begin{bmatrix} \rho u & \rho v & \rho w \\ \rho u^2 + p & \rho u v & \rho u w \\ \rho u v & \rho v^2 + p & \rho v w \\ \rho u w & \rho v w & \rho w^2 + p \\ \rho u E + u p & \rho v E + v p & \rho w E + w p \end{bmatrix} \quad\text{and}\quad F^v = \begin{bmatrix} 0 & 0 & 0 \\ \sigma_{xx} & \sigma_{xy} & \sigma_{xz} \\ \sigma_{yx} & \sigma_{yy} & \sigma_{yz} \\ \sigma_{zx} & \sigma_{zy} & \sigma_{zz} \\ \psi_x & \psi_y & \psi_z \end{bmatrix}. \qquad (4)$$

The $\psi_i$ are abbreviations of the type

$$\psi_i = u\,\sigma_{ix} + v\,\sigma_{iy} + w\,\sigma_{iz} + K\,\frac{\partial T}{\partial x_i} \qquad \text{for } i = x, y, z. \qquad (5)$$

The pressure p is calculated from the equation of state of the perfect gas

$$p = (\gamma - 1)\,\rho\left(E - \frac{u^2 + v^2 + w^2}{2}\right) \qquad (6)$$

with the ratio of specific heats γ. The temperature T is defined by

$$T = \frac{p}{\rho R}. \qquad (7)$$
The Reynolds stress tensor $\sigma_{ij}$ in (4) and (5), which represents correlations between fluctuating velocities and depends on the fluid viscosity, is given by

$$\sigma_{ij} = \mu\left(v_{i,j} + v_{j,i} - \frac{2}{3}\,\delta_{ij}\,v_{k,k}\right) \qquad (8)$$

where $(\cdot)_{i,j}$ denotes the derivative of the ith component with respect to $x_j$. Likewise, the heat conductivity K in (5) depends on the viscosity μ through the relation

$$K = \frac{\gamma}{\gamma - 1}\,\frac{\mu}{Pr} \qquad (9)$$
with Pr being the Prandtl number. For laminar flow, μ in (8) and (9) is set to μ = μ_L, which follows the Sutherland law

$$\mu_L = \mu_0\left(\frac{T}{T_\infty}\right)^{3/2}\frac{T_\infty + S}{T + S} \qquad (10)$$

where $\mu_0 = 1.716 \times 10^{-5}$ kg/(m s) and S = 110.4 K.
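To make the role of Eqs. (6), (7) and (10) concrete, the following minimal Python sketch recovers the primitive quantities and the laminar viscosity from a conservative state vector. It is illustrative only and not part of FLOWer or TAU; the function name, the air constants and the reference-temperature argument are our own choices.

```python
GAMMA = 1.4        # ratio of specific heats for air (assumed)
R_GAS = 287.058    # specific gas constant of air [J/(kg K)] (assumed)
MU0, S_SUTH = 1.716e-5, 110.4  # Sutherland constants from Eq. (10)

def primitives_from_conservatives(w, t_inf):
    """Recover p, T and mu_L from W = [rho, rho*u, rho*v, rho*w, rho*E]."""
    rho = w[0]
    u, v, w_z = w[1] / rho, w[2] / rho, w[3] / rho
    E = w[4] / rho
    p = (GAMMA - 1.0) * rho * (E - 0.5 * (u**2 + v**2 + w_z**2))       # Eq. (6)
    T = p / (rho * R_GAS)                                              # Eq. (7)
    mu_L = MU0 * (T / t_inf) ** 1.5 * (t_inf + S_SUTH) / (T + S_SUTH)  # Eq. (10)
    return p, T, mu_L
```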
2.2 Turbulence Models

For turbulent flows, FLOWer and TAU provide several algebraic and one- or two-equation turbulence models as well as explicit algebraic and differential Reynolds stress models. All these models, with the exception of the Reynolds stress models, make use of the Boussinesq assumption, which relates the Reynolds stresses to the velocity gradients and a so-called turbulent or eddy viscosity μT. Subsequently, μ in (8) and (9) is replaced by μL + μT for turbulent flows, where the eddy viscosity μT is provided by either an algebraic or a transport equation turbulence model. The Reynolds stress models instead solve a set of additional transport equations derived from the Navier-Stokes equations to model the components σij directly. For the following investigations, the common Spalart-Allmaras [22] and Menter SST [16] turbulence models were used, denoted as SAO and SST, respectively. In addition, the SSG/LRR-ω Reynolds stress model [4] with generalized gradient diffusion model and Menter BSL ω-equation was used for the FLOWer RSM calculations. For the TAU simulations, the recently implemented ε^h-RSM turbulence model according to [10, 20] was used, where the Reynolds stress equation reads

$$\frac{D\overline{u_i u_j}}{Dt} = P_{ij} + \Phi_{ij} - \varepsilon_{ij} + D^{\nu}_{ij} + D^{t}_{ij}. \qquad (11)$$

The production term $P_{ij}$ and the viscous diffusion $D^{\nu}_{ij}$ can be computed exactly, whereas the other terms require modeling approaches. A linear pressure-strain correlation $\Phi_{ij}$ is used with calibrated near-wall damping functions based on DNS results. To calculate the length scale, the homogeneous part $\varepsilon^h_{ij}$ of the dissipation rate $\varepsilon = \varepsilon^h + \frac{1}{2}D^\nu$ is accounted for using the transport equation

$$\frac{D\varepsilon^h}{Dt} = -C_{\varepsilon 1}\,\frac{\varepsilon^h}{k}\,\overline{u_i u_j}\,\frac{\partial U_i}{\partial x_j} - C_{\varepsilon 2}\, f_\varepsilon\,\frac{\varepsilon^h \tilde{\varepsilon}^h}{k} + C_{\varepsilon 3}\,\nu\,\frac{k}{\varepsilon^h}\,\overline{u_j u_k}\,\frac{\partial^2 U_i}{\partial x_j \partial x_l}\,\frac{\partial^2 U_i}{\partial x_k \partial x_l} + D_{\varepsilon^h} + S_l + S_{\varepsilon 4}. \qquad (12)$$

The anisotropic dissipation rate tensor $\varepsilon^h_{ij}$ is finally computed via the implicit relation

$$\varepsilon^h_{ij} = f_s\,\overline{u_i u_j}\,\frac{\varepsilon^h}{k} + (1 - f_s)\,\frac{2}{3}\,\delta_{ij}\,\varepsilon^h. \qquad (13)$$
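As a small illustration of the eddy-viscosity closure described at the beginning of this section (in contrast to the Reynolds stress transport approach of Eqs. (11)-(13)), the following sketch evaluates the stress tensor of Eq. (8) with the molecular viscosity replaced by μL + μT. The function name and the NumPy-based tensor layout are our own and do not reflect the solver interfaces.

```python
import numpy as np

def boussinesq_stress(grad_v, mu_lam, mu_turb):
    """Stress tensor of Eq. (8), evaluated with mu = mu_L + mu_T.

    grad_v[i, j] = d v_i / d x_j is the 3x3 velocity gradient tensor;
    mu_turb would be supplied by a turbulence model such as SAO or SST.
    """
    mu_eff = mu_lam + mu_turb
    div_v = np.trace(grad_v)  # v_k,k
    return mu_eff * (grad_v + grad_v.T - (2.0 / 3.0) * div_v * np.eye(3))
```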
3 Flow Physics

3.1 Wing Body Junction Flows

The horseshoe vortex at a geometrical junction is a prominent near-wall flow feature of elliptical shape. Following the description by Fleming et al. [5], two effects are
Fig. 1 Oil flow of the baseline DLR-F6 wing/body configuration at M = 0.75, CL = 0.5, Re = 5.0e6, from [24]
responsible for the emergence of the vortex: The velocity gradients of the incoming turbulent boundary layer contribute most of the vorticity, which is skewed and stretched while passing over the wing. Since this constitutes a highly anisotropic flow, non-linear and RSM turbulence models have in the past been applied in CFD in the hope of capturing the correct physics, see [1]. In addition, the leading edge causes a strong adverse pressure gradient for the incoming flow, which rolls up as a consequence. This type of secondary flow is called Prandtl's first kind of secondary flow. Corner separations have not been studied in nearly as much detail, but are suspected to constitute a secondary flow of the second kind according to Prandtl's theory, based on their initiation by Reynolds stress gradients as observed by Gessner [9]. Barber [2] showed that the ratio between the incoming boundary layer thickness and the wing thickness defines the interaction of horseshoe vortex and corner separation. In general, the bluntness of the leading edge, the displacement and momentum thicknesses of the oncoming boundary layer, and the pressure distribution over the airfoil all contribute to the specific flow phenomenon occurring. Gand et al. [7] recently gave a very comprehensive overview of experimental and numerical investigations into junction flows and listed the occurrences of horseshoe vortices and corner separations. While a horseshoe vortex was observed for nearly all wing/flat-plate junctions, it occurred in conjunction with a corner separation on the airfoil suction side for several industry-relevant configurations. Figure 1 demonstrates this case on the DLR-F6 configuration, which was chosen as the baseline test for the Third AIAA CFD Drag Prediction Workshop in 2006 [24]. It shows the oil flow in the National Transonic Facility (NTF) at the wing body junction and exhibits both the footprint of the horseshoe vortex on the fuselage and clear evidence of corner separation.
3.2 Corner Flow and Separation in Transonic Wind Tunnels

When examining transonic flow in a wind tunnel, a quasi-2D flow is usually desired. It is often assumed that the center-line plane in the tunnel provides this and that wall effects can be neglected. However, especially for smaller wind tunnels, this may not be the case, as Bruce et al. showed in [3]. Especially in the region where the normal shock decelerates the flow from supersonic to subsonic, a highly three-dimensional effect was observed: the adverse pressure gradient at the shock loads the boundary layer. In the tunnel corners in particular, the superposition of two boundary layers leads to corner separations even for relatively small pre-shock Mach numbers. The attached core flow in the middle of the tunnel experiences a reduction in cross-section due to the displacement in the corners, and the post-shock pressures are reduced in the core flow. Thus, the corner separations around the normal shock have an impact on the center-line. Bruce et al. could demonstrate this by applying suction just upstream of the normal shock on the side walls at M = 1.4, as seen in Fig. 2. In Fig. 2a no suction is applied and large corner separations result. The pressure reduction at the center-line suppresses a shock-induced separation, which should occur for Mach numbers higher than 1.3 in the case of a true 2D flow. In Fig. 2b, the applied suction reduces the pre-shock boundary layer and thus the corner separations. The wall influence on the center-line is smaller and a shock-induced separation occurs as expected. The relative width of a wind tunnel can be given as δ*/(tunnel width), with δ* representing the boundary layer displacement thickness. A correlation was found between the relative width of the tunnel and the Mach number at which center-line separation first occurs. It approaches the value of M = 1.3 for wide tunnels, which suggests that in this case a 2D flow can be assumed around the center-line.

Fig. 2 Oil visualization of M = 1.4 wind tunnel flow over normal shock, from [3]
4 Results

4.1 Generic Wing Fuselage Combination

A generic wing fuselage combination was designed according to the data given in [18]. The wing geometry is based on an OAT15A airfoil with a trailing edge gap of 1% chord length, while 25% of the airfoil geometry length were included in the widening. The sweep angle of the wing is β = 30°, with an aspect ratio of λ = 3.2 and a taper ratio of 1.2. The span of the model is 704 mm with a root chord length of 240 mm. First simulations of the pure wing configuration within the LuFo project ComFliTe showed that the presence of the fuselage has a considerable influence on the results. For this reason, a half ellipsoid was included to simulate the fuselage effects. The area around the intersection of wing and fuselage was meshed with a hexahedral block in which the boundary layer cells originating from the surface of the wing intersect those originating from the fuselage, as shown in Fig. 3, detail A. With this approach a good resolution of the corner flow was possible. It also improves the stability of the solver, which proved essential when using the sophisticated ε^h-RSM turbulence model. To resolve the tip region of the wing and to prevent the generation of a barrier layer for the turbulence quantities, the tip region was blunted as shown in Fig. 3, detail B. Due to the complex geometry of the OAT15A airfoil, a semi-circle was
Fig. 3 OAT15A generic wing fuselage combination
Fig. 4 OAT15A wing M = 0.82, Re = 2.5e6, α = 3.5◦ pressure distributions and surface streamlines
applied at the thickest position of the airfoil. At four further positions, semi-ellipses were then generated with their semi-major axes equal to the radius of the semi-circle. With these elements and a spline connecting them, the tip geometry was closed using generative shape design in CATIA V5. This approach allowed a complete hexahedral meshing of the tip region to resolve the boundary layer. The inflow conditions were set according to [18] at the shock buffet onset regime (M = 0.82, Re = 2.5e6, α = 3.5°), but with the URANS approach used here the solutions converged to a steady state. This might be due to the differences between the generated geometry and the wind tunnel model, or due to the numerical approach. Figures 4a to 4c show the pressure distributions on the surfaces of the wing and the wing fuselage combinations. The surface streamlines, computed from the shear stress vectors, are plotted in black. While in Fig. 4a no separation on the upper side of the wing can be seen, both turbulence models show a separation bubble (I) at the trailing edge for the wing fuselage combination. The separation can also be seen on the fuselage itself, where a recirculation area results as well. The corner separation, however, does not only have a local influence: as a consequence, the large recirculation area (II), which is also detected in Fig. 4a, is shifted away from the root region of the wing. SAO predicts a stronger corner separation (I), so the shift of (II) away from the fuselage appears to be larger than for the ε^h-RSM turbulence model. Nevertheless, both turbulence models show good agreement in the general representation of the flow field. Figure 5a shows the distribution of the pressure coefficient in the y/L = 2.5% = const plane, which intersects the region of the corner separation. The pressure distribution of the wing-only case is compared to the two cases with an included fuselage. The two turbulence models agree well in general. Compared to the wing-only configuration, one can see that the presence of the fuselage amplifies the displacement effect of the configuration. Therefore, the overspeed on the upper side increases and the supersonic area starts right at the leading edge. Due to the corner separation at the trailing edge, however, the supersonic area is decelerated through the nozzle effect. As a result, a clear shock cannot be seen in the pressure distribution in Fig. 5a for the wing fuselage combination. The Mach number decreases above the separation bubble and the supersonic region ends with a weak shock at the trailing edge of the wing.
Fig. 5 OAT15A wing M = 0.82, Re = 2.5e6, α = 3.5◦
Fig. 6 3D streamrods at the trailing edge of the OAT15A wing
While the SAO model converges to a large separation bubble starting at 80% chord length (Fig. 5b), the ε^h-RSM model shows several unsteady separated areas starting at 75% of the chord length. The 3D character of this kind of separation is shown in Fig. 6 using 3D streamrods. One can see that for the wing-only case the streamlines move straight over the surface. When the fuselage is included, the SAO model shows a large recirculation area which directly affects the supersonic region above. For the ε^h-RSM model, the formation of a highly three-dimensional structure is shown. The flow separates just before the trailing edge and fluid is pushed away from the upper side of the wing. Parts of the flow which were close to the corner before diverge from the wing fuselage intersection and build up the recirculation area which could also be seen in Figs. 4b and 4c.
4.2 Wind Tunnel Wall Effects

The EDI-M109 airfoil was recently developed and experimentally investigated within the German national research projects SHANEL-L and INROS in close cooperation with Eurocopter Deutschland and the German Aerospace Center (DLR). The measurement campaign was performed by the DLR in the DNW-TWG wind tunnel in 2010 and aimed at the validation of the numerically predicted performance and unsteady characteristics with regard to dynamic stall [8, 12]. First post-test numerical investigations of the influence of the three-dimensional wind tunnel environment with FLOWer are presented in this subsection for a steady M = 0.3, Re = 1.8e6 test case at 5° angle of attack. In order to capture the wall boundary layers as accurately as possible, the complete nozzle and adaptive wall section geometries of the DNW-TWG wind tunnel were included in the numerical simulation (Fig. 7a). Grid clustering was applied near the inflow and the test section as well as in the wall-normal direction to guarantee sufficient resolution of the boundary layer (y+ on the order of 1). The airfoil was meshed separately with an O-grid topology consisting of 352 × 96 cells for the airfoil section and extruded to 80 spanwise cells. As shown in Fig. 7b, the airfoil grid was then embedded into the wind tunnel test section by the Chimera technique. With the wind tunnel mesh consisting of 4.3 million cells, the total number of cells used for the computation adds up to approximately 7 million. Total temperature and pressure were specified at the inflow boundary, while static pressure was prescribed at the test section exit plane. Due to the geometrical symmetry of the experimental setup, only a half-model of the wind tunnel was calculated, with a symmetry boundary condition specified at the tunnel center-plane. All physical walls were numerically treated with an adiabatic no-slip boundary condition. The presented computations were run exclusively fully turbulent, with different turbulence models applied in the RANS framework of FLOWer.
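The requirement of y+ on the order of 1 translates into a wall-normal height of the first grid cell. The following sketch estimates this height from a turbulent flat-plate skin-friction correlation; it is a common rule of thumb under assumed flat-plate conditions, not the procedure actually used to generate the DNW-TWG grids.

```python
import numpy as np

def first_cell_height(y_plus, u_inf, rho, mu, x_ref):
    """Estimate the first-cell height for a target y+ (illustrative only)."""
    re_x = rho * u_inf * x_ref / mu          # Reynolds number at x_ref
    cf = 0.026 * re_x ** (-1.0 / 7.0)        # turbulent flat-plate skin friction
    u_tau = np.sqrt(0.5 * cf) * u_inf        # friction velocity
    return y_plus * mu / (rho * u_tau)       # dy = y+ * nu / u_tau
```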
Fig. 7 Numerical wind tunnel model of the DNW-TWG
Fig. 8 Computed lift polars for the EDI-M109 airfoil at M = 0.3, Re = 1.8e6 in the DNW-TWG wind tunnel, SST turbulence model
Fig. 9 Airfoil-wind tunnel corner flow for the SAO, SST and RSM turbulence models
Early numerical simulations showed that the consideration of the viscous side walls in steady polar calculations led to deviations in the lift curve slope in the linear angle of attack regime (Fig. 8). Since the effect of reduced lift in wind tunnel measurements is known, but only partly understood, CFD calculations were performed to investigate the side wall effect in more detail. Figure 9 shows the corner flow between model surface and wind tunnel side wall for the SAO, SST and RSM turbulence models at α = 5°. Two flow phenomena can be observed in the vicinity of the wind tunnel side wall: the streamlines in Fig. 9a clearly show corner separation occurring for the SAO and SST simulations, while all three turbulence models predict the emergence of a horseshoe vortex ahead of the airfoil leading edge; see the vector fields in Fig. 9b representing a top view on
Fig. 10 λ2 visualization of the horseshoe vortex at the airfoil-wall junction, RSM
Fig. 11 Pressure coefficient along the center-plane upper and lower wind tunnel walls for the adaptive test section
the horizontal planes sketched in Fig. 9a. A representative λ2 visualization of the horseshoe vortex is depicted in Fig. 10. The massive separation predicted by the SAO and SST turbulence models distorts the pressure distribution over a significant spanwise portion of the airfoil. Indeed, even the wind tunnel center-plane, at which the experimental pressure taps are located, and therefore the resulting lift coefficient are affected. This can be seen in Fig. 11, displaying the pressure coefficient of the upper and lower wind tunnel walls. The enclosed area of the wall pressure distribution is proportional to the airfoil lift coefficient, and the reduction for the SAO and SST models compared to RSM and the experiment is easily observed. The reason for the superior performance of the RSM model is believed to be its ability to account more correctly for the anisotropy of the corner flow and the associated improved prediction of eddy viscosity and vorticity.
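For reference, the λ2 criterion used for such vortex visualizations evaluates, at every grid point, the eigenvalues of S² + Ω² built from the velocity gradient tensor; a point lies inside a vortex core where the middle eigenvalue is negative. A minimal sketch (our own helper, independent of the post-processing actually used):

```python
import numpy as np

def lambda2(grad_v):
    """Second eigenvalue of S^2 + Omega^2 for a 3x3 velocity gradient tensor;
    lambda2 < 0 marks a point inside a vortex core."""
    S = 0.5 * (grad_v + grad_v.T)              # strain-rate tensor
    Om = 0.5 * (grad_v - grad_v.T)             # rotation-rate tensor
    eig = np.linalg.eigvalsh(S @ S + Om @ Om)  # symmetric matrix, real eigenvalues
    return eig[1]                              # eigvalsh returns ascending order
```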
5 Computational Resources

During the studies, scaling tests with TAU were performed on the NEC HPC-144 Rv-1 server configuration of HLRS in Stuttgart. The parallel computations used OpenMPI 1.4 and the GNU 4.1.2 compiler environment. The scaling tests were conducted on the wing fuselage combination described in Sect. 4.1 with a grid size of 2.67 million points. Two turbulence models and the influence of multigrid are compared in Figs. 12 and 13. Figure 12 shows the wall clock time on a log-log scale for runs on 2^n cores with n = 0, ..., 7. For this series of tests, as many cores per node as possible were used. Applying the 3w multigrid scheme increases the wall clock time per iteration by an average factor of 2.1, but the solver benefits from a higher convergence rate. Compared to the SAO turbulence model, the RSM model needs approximately 2.7 times more CPU resources.
Fig. 12 Wall clock time per iteration
Fig. 13 Scaling efficiency

Table 1 CPU resources

Computation case              Nodes   Iterations   CPU time
TAU Wing only SAO              16      200000      1097 CPUh
TAU Wing-Fuselage SAO          16      200000      1554 CPUh
TAU Wing-Fuselage ε^h-RSM      16      200000      4568 CPUh
FLOWer TWG wind tunnel SAO     28       15000      1643 CPUh
FLOWer TWG wind tunnel SST     28       15000      1828 CPUh
FLOWer TWG wind tunnel RSM     28       15000      2698 CPUh
To illustrate the scaling behavior, the scaling factor ψ is plotted in Fig. 13. It is defined as

$$\psi = \frac{\text{CPU time with a single core}}{\text{CPU time with } n \text{ cores}} \qquad (14)$$

and is a measure of the scaling efficiency of the parallelized code. For this configuration, ψ drops steadily with a rising number of cores. Quite high efficiency is reached for parallel simulations on a single CPU using 1 to 4 cores. When using 8 cores, and therefore 2 sockets, the communication effort increases and ψ drops. Parallelization across more cores makes communication between nodes necessary, which further reduces the scaling efficiency. In addition, the number of ghost cells in the mesh increases. According to the parallelization guidelines of the DLR, grid partitions should contain at least 100000 nodes, such that a maximum of 27 cores should be used, with a scaling efficiency of around 70%. The FLOWer computations of the three-dimensional DNW-TWG wind tunnel environment demanded a total of 15000 iterations in order for the turbulent boundary layers to build up and for the (mean) aerodynamic coefficients to converge. The grid was split into 28 blocks and computed in parallel mode using MPI communication between the nodes of the Nehalem cluster. The total computational resources used for the TAU and FLOWer computations are given in Table 1.
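A minimal sketch of Eq. (14), taking the CPU time of a run as the number of cores times the measured wall-clock time; the numbers in the usage line are hypothetical and not taken from Fig. 13.

```python
def scaling_factor(wallclock_1core, wallclock_ncores, n_cores):
    """Eq. (14): psi = CPU time with a single core / CPU time with n cores."""
    return wallclock_1core / (n_cores * wallclock_ncores)

# Hypothetical example: 1000 s serial, 18 s on 64 cores -> psi ~ 0.87
print(scaling_factor(1000.0, 18.0, 64))
```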
6 Conclusion and Outlook

The simulation of corner separation effects proves to be essential for different problems in the scope of fluid dynamics. Two different numerical studies concerning a wing fuselage combination and an airfoil in a wind tunnel environment were performed. The effect of considering the side wall was shown for both cases. For the generic aircraft, the SAO turbulence model generated a large recirculation area at the trailing edge, pushing the shock far away from the wall and reducing the shock strength significantly. The newly implemented ε^h-RSM model showed the same effect, but with a smaller separated area and the generation of 3D unsteady vortices. For the wind tunnel simulation, 3D effects that were known from previous computations and experiments could be verified numerically. The SAO and SST turbulence models predicted a massive corner separation at a very moderate 5° angle of attack, which reduced the achieved lift of the airfoil in the center-plane considerably. Only the SSG/LRR-ω Reynolds stress model achieved good agreement with the measured wind tunnel wall pressures and did not exhibit an adverse corner separation. From our point of view, several questions on the topic remain open and need further investigation. The simulated corner flows are far from grid-independent, due to the high sensitivity of the applied turbulence models for such a complex configuration. Because of the major influence of the turbulence model on the result, (zonal) LES simulations of the corner separations should be considered to reduce the modeled part of the simulation. The results of these further investigations will be incorporated in future CFD simulations of wing-fuselage configurations as well as shock control bumps and dynamically pitch-oscillating airfoils in the wind tunnel.
References

1. D.D. Apsley and M.A. Leschziner. Investigation of advanced turbulence models for the flow in a generic wing-body junction. Flow, Turbulence and Combustion, 67:25–55, 2001.
2. T. Barber. An investigation of strut-wall intersection losses. Journal of Aircraft, 15:676–681, 1978.
3. P.J.K. Bruce, D.M.F. Burton, N.A. Titchener, and H. Babinsky. Corner effect and separation in transonic channel flows. Journal of Fluid Mechanics, in press.
4. B. Eisfeld and O. Brodersen. Advanced Turbulence Modelling and Stress Analysis for the DLR-F6 Configuration. In 23rd AIAA Applied Aerodynamics Conference, 2005.
5. J. Fleming, R. Simpson, J. Cowling, and W. Devenport. An experimental study of a turbulent wing-body junction and wake flow. Experiments in Fluids, 14:366–378, 1993.
6. P. Funk, Th. Lutz, and S. Wagner. Experimental investigations on hull-fin interferences of the LOTTE airship. Aerospace Science and Technology, 7:603–610, 2003.
7. F. Gand, S. Deck, V. Brunet, and P. Sagaut. Flow dynamics past a simplified wing body junction. Physics of Fluids, 22(11), 2010.
8. A.D. Gardner, K. Richter, H. Mai, A.R.M. Altmikus, A. Klein, and J. Raddatz. Experimental investigation of dynamic stall performance for the EDI-M109 and EDI-M112 airfoils. In 37th European Rotorcraft Forum, 13th–15th September 2011, submitted.
9. F. Gessner. The origin of secondary flow in turbulent flow along a corner. Journal of Fluid Mechanics, 58:1–25, 1973.
10. S. Jakirlic and K. Hanjalic. A new approach to modelling near-wall turbulence energy and stress dissipation. Journal of Fluid Mechanics, 459:139–166, 2002.
11. J. Jupp. Interference Aspects of the A310 High Speed Wing Configuration. In Subsonic/Transonic Configuration Aerodynamics, AGARD Conference Proceedings No. 285, 1980.
12. A. Klein, K. Richter, A.D. Gardner, A.R.M. Altmikus, Th. Lutz, and E. Krämer. Numerical Comparison of Dynamic Stall for 2D Airfoils and an Airfoil Model in the DNW-TWG. In 37th European Rotorcraft Forum, 13th–15th September 2011, submitted.
13. B. König, Th. Lutz, and E. Krämer. Numerical simulation of a transonic wind tunnel experiment. Technical report, Annual report to the HLRS, 2008.
14. N. Kroll, C.C. Rossow, D. Schwamborn, K. Becker, and G. Heller. MEGAFLOW—A Numerical Flow Simulation Tool for Transport Aircraft Design. In Proceedings of the 23rd International Congress of Aeronautical Sciences, 2002.
15. D.J. Mavriplis, L. Martinelli, and A. Jameson. Multigrid solution of the Navier-Stokes equations on triangular meshes. Technical report, ICASE Report No. 89-35, 1989.
16. F. Menter. Two-equation eddy-viscosity turbulence models for engineering applications. AIAA Journal, 32:1598–1605, 1994.
17. A.G. Mikhail and K.N. Ghia. Viscous compressible flow in the boundary region of an axial corner. AIAA Journal, 16:931–939, 1978.
18. P. Molton, R. Bur, A. Lepage, V. Brunet, and J. Dandois. Control of Buffet Phenomenon on a Transonic Swept Wing. In 40th Fluid Dynamics Conference and Exhibit, Chicago, Illinois, 28th June–1st July 2010. AIAA.
19. Institute of Aerodynamics and Flow Technology. Technical documentation of the DLR TAU-code. Technical report, German Aerospace Center (DLR), 2010.
20. A. Probst, R. Radespiel, Ch. Wolf, T. Knopp, and D. Schwamborn. A Comparison of Detached-Eddy Simulation and Reynolds-Stress Modelling Applied to the Flow over a Backward-Facing Step and an Airfoil at Stall. In 48th AIAA Aerospace Sciences Meeting Including the New Horizons Forum and Aerospace Exposition, 2010.
21. V. Shankar and D. Anderson. Numerical solutions for supersonic corner flow. Journal of Computational Physics, 17:160–180, 1975.
22. P.R. Spalart and S.R. Allmaras. A one-equation turbulence model for aerodynamic flows. AIAA Paper 92-0439, 1992.
23. E. Turkel. Improving the Accuracy of Central Difference Schemes. Technical report, ICASE Report No. 88-53, 1988.
24. J.C. Vassberg, E.N. Tinoco, M. Mani, D. Levy, T. Zickuhr, D.J. Mavriplis, R.A. Wahls, J.H. Morrison, O.P. Brodersen, B. Eisfeld, and M. Murayama. Comparison of NTF experimental data with CFD predictions from the third AIAA CFD drag prediction workshop. AIAA Paper 2008-6918, 2008.
25. M. Zamir. Similarity and stability of the laminar boundary layer in a streamwise corner. Proc. R. Soc. Lond. A, 377:269–288, 1981.
Numerical Simulation of Helicopter Wake Evolution, Performance and Trim

Felix Bensing, Martin Embacher, Martin Hollands, Benjamin Kutz, Manuel Keßler, and Ewald Krämer
Abstract This paper gives an overview of recent activities in the field of helicopter aeromechanics simulation at the Institute of Aerodynamics and Gas Dynamics (IAG) at the University of Stuttgart. Numerical investigations of hovering isolated rotors in ground effect, main rotor blade shape optimization and, finally, computations of the entire helicopter in free flight conditions are included. For the hovering rotor, good agreement with experimental data was found both out of ground effect and in close proximity to the ground. For the blade shape optimization, a grid convergence study shows asymptotic behavior of relevant parameters such as the power coefficient and the control angles. Further generalization of the fluid-structure coupling procedure allows the simulation of a helicopter in free flight conditions. The extended coupling and trim method yields good agreement with flight test data. All simulations were carried out on the NEC Nehalem cluster platform of the HLRS, and performance was evaluated during one of the optimization computations.
1 Introduction

One of the main research interests at IAG over the last two decades has been helicopter aerodynamics. This field of research involves strongly multi-disciplinary aspects of both aerodynamics and elasticity; for instance, accounting for fluid-structure coupling effects at the main rotor blades, i.e. aeroelasticity, has turned out to be mandatory for a meaningful representation of the flow phenomena, especially in forward flight. In addition, a methodology for trimming the rotor towards a specified state has proven indispensable. Only when these prerequisites are fulfilled can a reasonable

Felix Bensing · Martin Embacher · Martin Hollands · Benjamin Kutz · Manuel Keßler · Ewald Krämer
Institut für Aerodynamik und Gasdynamik, Universität Stuttgart, Pfaffenwaldring 21, D-70569 Stuttgart, Germany, e-mail: [email protected]
comparability of simulation and experiment be guaranteed. Having taken the step of fluid-structure coupled CFD simulations of isolated main rotors, research at IAG has moved towards the simulation of the complete helicopter aircraft [1, 2]. Presently, the coupling toolchain is being augmented to account for complete helicopter configurations in free flight conditions. This will enable a direct comparison of CFD results and data collected during flight testing.
2 Methodology

2.1 Aerodynamic and Structural Modeling

Aerodynamics—CFD

Our investigations were conducted using the finite-volume flow solver FLOWer [3, 4]. FLOWer discretizes the Reynolds-Averaged Navier-Stokes (RANS) equations on a multi-block structured grid with second-order central differences, stabilized by dissipation terms of third order according to Jameson et al. [5]. The resulting system of ordinary differential equations is marched in time using an implicit dual time-stepping approach as introduced by Jameson [6], transforming each physical time step into a steady-state problem which in turn can be marched in pseudo-time with conventional time-stepping schemes such as Runge-Kutta schemes. Furthermore, convergence acceleration techniques designed for steady-state problems, such as multigrid and implicit residual smoothing, are implemented. Additional fluxes due to grid movement (whirl fluxes) can be taken into account using an Arbitrary Lagrangian Eulerian (ALE) approach, whereby free-stream preserving properties are ensured by the enforcement of a discrete Geometric Conservation Law (GCL). Furthermore, the Chimera technique of overlapping grids is implemented for large relative movements of the respective grid parts.
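The dual time-stepping idea can be summarized in a short sketch. This is a simplified array-level illustration assuming a BDF2 physical-time discretization and a plain explicit pseudo-time update; FLOWer's actual implementation uses Runge-Kutta stages together with the acceleration techniques named above.

```python
import numpy as np

def dual_time_step(w_n, w_nm1, dt, spatial_residual, n_pseudo=40, d_tau=1e-3):
    """Advance one physical time step by driving the unsteady residual of a
    BDF2 discretization to zero with explicit pseudo-time iterations."""
    w = w_n.copy()                                   # start from previous solution
    for _ in range(n_pseudo):
        # unsteady residual: BDF2 time derivative plus spatial residual R(W)
        r = (1.5 * w - 2.0 * w_n + 0.5 * w_nm1) / dt + spatial_residual(w)
        w = w - d_tau * r                            # explicit pseudo-time update
        if np.linalg.norm(r) < 1e-10:                # steady in pseudo-time
            break
    return w
```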
Structural Dynamics—CSD

The necessity of fluid-structure coupling at the main rotor blades was pointed out in the introduction. It is accomplished by coupling FLOWer as the flow solver to the Eurocopter flight mechanics code HOST [7], which is a general-purpose computational environment for the simulation and stability analysis of entire helicopters with all their substructures, as well as isolated rotors. HOST is capable of trimming the rotor towards prescribed objectives such as integral forces or moments, based on lifting line theory using 2D airfoil tables. Internally, the elastic blade structure is modeled by a quasi one-dimensional Euler-Bernoulli beam in which deflections in vertical (flap) and in-plane (lag) directions as well as elastic torsion along the blade axis are allowed. Simplifications include a linear material law and the neglect of
shear deformation as well as tension elongation. Possible mismatches of the local cross-sectional centers of gravity, tension and shear, however, are taken into account. Geometrical non-linearity is captured by representing the blade as a sequence of rigid elements connected by virtual joints. In each joint, rotations about the lag, flap and torsional axes are permitted. The large number of degrees of freedom resulting from this discretization is reduced by a modal Rayleigh-Ritz approach. This yields a final description of the deformed blade axis based on a weighted sum of a limited set of mode-like deformation shapes.
2.2 Fluid Structure Coupling and Trim

One possibility for fluid-structure coupling is the so-called strong coupling methodology [8, 9], where load and deformation data are communicated at each time step. Given the potentially periodic solutions of helicopter rotors in forward flight, this approach is often abandoned in favor of 'weak coupling', where data intercommunication is only done on a per-period basis; e.g., for an n-bladed rotor, n/rev periodicity would be anticipated. CFD-CSD communication involves the transfer of aerodynamic loads; conversely, blade surface point deflections are returned from the CSD to the CFD solver. Concurrently, the pilot inputs of the control angles are updated after each iteration of the weak coupling cycle in order to meet the prescribed trim objectives. This approach of trimming to some (usually three) fixed rotor loads corresponds to the simulation of a wind tunnel experiment, where finite forces and moments can be observed at the mounting of the model. It is therefore termed 'wind tunnel trim', in contrast to the 'free flight trim', which requires a complete load balance on the entire aircraft; this will be explained in more detail in Sect. 5. The details of the coupling process are described in [10].
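The weak coupling cycle can be condensed into the following sketch. All object and method names are hypothetical placeholders for the FLOWer/HOST interface, which in reality exchanges files, and the tolerance value is an assumption.

```python
def weak_coupling_trim(cfd, csd, max_cycles=15, tol_deg=0.02):
    """Exchange loads and deformations once per rotor period until the
    control angle updates fall below a tolerance (schematic only)."""
    controls = csd.initial_trim()                 # collective and cyclic pitch
    for _ in range(max_cycles):
        loads = cfd.run_one_period(controls)      # CFD over one rotor period
        deformations, new_controls = csd.retrim(loads)  # CSD + trim update
        cfd.apply_blade_deformation(deformations)
        if max(abs(n - o) for n, o in zip(new_controls, controls)) < tol_deg:
            return new_controls                   # trim objectives met
        controls = new_controls
    raise RuntimeError("trim did not converge")
```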
3 Rotor Hovering in Ground Effect

One activity in the field of helicopter aerodynamics is the numerical investigation of a helicopter rotor hovering in and out of ground effect. To assure a correct physical modeling of the ground, the boundary layer must be resolved and therefore computed. Such a computation can only be carried out on a high performance computer like the NEC Nehalem cluster at the HLRS, where the presented results were obtained. The examination is based on the experimental investigation of Light [12], which presented an experimental and theoretical study of the wake geometry and thrust performance of a helicopter rotor in ground effect. A four-bladed Lynx tail rotor close to a plate was used in the experiment, with parameter variations in the collective pitch angle and the rotor/ground plane distance. This study intends to demonstrate that CFD can reproduce the experimental data and can
serve as a means for the prediction of the tip vortex geometry as well as performance measures for helicopter rotors in ground effect hover scenarios. A very good overall agreement with the experimental data could be achieved in this examination (see also Chap. 6). It was shown that CFD can reproduce all flow field characteristics. A detailed description of the calculations and the corresponding experimental setup, as well as the results of this investigation, can be found in [11].
4 Parametric Study of Different Blade Shape Geometries for an Isolated Rotor in Forward Flight

An important topic is the optimization of helicopter rotor blades with a focus on blade shape design. Besides the most important issue, the reduction of fuel consumption, the noise emissions of rotor systems are a crucial point. In order to fulfill both requirements, modern rotor designs operate at a lower rotational frequency. This causes certain difficulties in forward flight, especially at higher cruise speeds. On the retreating blade, only a limited area at the blade tip generates thrust, so that high pitch angles are required to achieve moment balance at the rotor, which usually leads to higher power consumption. Another possibility to increase lift is to augment the chord length in the area near the blade tip. This effect is examined in a trade study. Results will be presented at the European Rotorcraft Forum 2011 [13].
4.1 Computational Setup and Process Chain

A process chain has been built up, consisting of an automatic geometry and blade mesh generation and an automatic fluid-structure calculation based on Python scripts organizing the data exchange between a CFD solver and a Computational Structural Dynamics (CSD) code. Figure 1 shows the process chain as a flow chart. The computational setup consists of a five-bladed rotor with individual deformable blade grids embedded in a background grid (Fig. 2).
Fig. 1 Process chain for a fluid structure simulation
Fig. 2 Grid system

Table 1 Grid convergence study: Computational setup and performance

grid     cells [Mio.]   blocks   subit. per timestep   nodes   cores   wall-clock per rev. [s]   memory per core [MB]   GFLOPS per node
coarse    8.166          278      79                    11      77      60142                     368                    4.61
medium   18.172          310      75                    17     136      79123                     522                    4.58
fine     39.797          450      79                    30     210     121392                     798                    4.44
Within each trim procedure, the isolated rotor is trimmed towards a force and moment equilibrium, meaning that the three pilot inputs (collective and cyclic pitch angles) are adjusted to meet the objectives of thrust, pitching and rolling moment. This aeroelastic trimming procedure is essential for the determination of the power consumption of a rotor in forward flight. Whereas calculations on the structural side require very little processing power and can easily be done on personal computers, the CFD calculation of an isolated rotor in forward flight is computationally expensive. In order to determine an adequate number of grid cells, a grid convergence study was conducted. Table 1 presents three different grid setups and their computational performance on the NEC Nehalem cluster of HLRS. Grid nodes were increased by a factor of 1.3 in each dimension within each refinement step. The calculation was performed with a time step equivalent to 1° of rotor revolution, while an average of 75 to 79 subiterations per time step were used to achieve convergence of the solution. The results show that the calculation of one revolution still requires about 34 hours using 210 cores on the fine grid. On the coarse grid, calculation time and effort can be reduced to 17 hours with only 77 cores, which still amounts to one week of wall-clock time for a fully converged trim with 8 to 9 revolutions. This limits the blade shape optimization to parametric trade studies, as mathematical optimization strategies usually need a data set of thousands of points, depending on the parameter space of interest.
4.2 Results

Figure 3 shows the dependence of the dimensionless power coefficient CP on the grid resolution, where 1 represents the fine grid, 2 the medium and 3 the coarse grid.
Fig. 3 Power coefficient and grid size
Fig. 4 Collective and cyclic pitch angles
Fig. 5 Vortex visualization with λ2 criterion for the coarse grid
Fig. 6 Vortex visualization with λ2 criterion for the fine grid
h is a representative cell spacing and thus confirms an almost linear refinement of the mesh. The collective and cyclic pitch angles in Fig. 4 show a similar behavior. Although there is a considerable difference between the solutions of different mesh resolutions, an asymptotic behavior of the error can be observed. Figures 5 and 6 compare the flow field calculated on the coarse grid with that of the fine grid. Vortex filaments are better conserved on the fine grid (Fig. 6) as the numerical dissipation is reduced, yet the main vortex structure remains similar. Consequently, a series of parametric variations was evaluated on both the coarse and the medium grid (Figs. 7 and 8). The results show a qualitatively good agreement of both grid solutions, whereas a quantitative comparison reveals considerable differences, as already stated before. For trade studies and optimization problems, however, only relative values between different solutions are of interest, so that it is still adequate to use the coarse grids for a parametric study to determine a better configuration. Nevertheless, as the solution is strongly grid-dependent, it is essential for such parametric studies to use grids with the same topology, grid quality and distribution of grid nodes along the blade surface.
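With three solutions on grids refined by a constant factor of r = 1.3 per dimension, an observed order of convergence can be estimated in the usual Richardson sense. The following sketch is illustrative and was not used in the study itself.

```python
import numpy as np

def observed_order(f_coarse, f_medium, f_fine, r=1.3):
    """Observed order of convergence for a quantity f (e.g. the power
    coefficient CP) computed on three grids with refinement ratio r."""
    return np.log(abs(f_coarse - f_medium) / abs(f_medium - f_fine)) / np.log(r)
```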
Fig. 7 Power coefficient for different geometries
Fig. 8 Collective and cyclic pitch angles for different geometries
For this purpose, the tool for automatic grid generation, Automesh, was designed [14]. In the present case, Figs. 7 and 8 show an improvement of the power coefficient for all configurations compared with the rectangular reference blade represented by number 1. The other geometries were blades with an increased blade area near the blade tip, while the thrust-weighted area of the blade was kept constant. All these blades show that the collective angle could be decreased, and those blades which also reduce the amplitude of the sine fraction of the cyclic pitch angle ΘS are even more advantageous, as this reduces the angle of attack at the retreating blade. This supports the theory stated before that blades with an increased chord length close to the blade tip reduce the power consumption of rotors at fast cruise speeds with reduced rotational frequency.
4.3 Conclusion

If quantitative results are of interest, the grid convergence study shows that there is a considerable amount of uncertainty due to discretization errors. Yet the errors show an asymptotic behavior. When the focus is on differences between various configurations, as is usually the case in optimization problems, a comparison of the coarse grid solutions with the solutions on the medium grid shows a good qualitative agreement in the evolution of rotor power. Thus, it is still acceptable to use the coarse grid for optimization problems, but it is essential to have identical topology and node distribution on the grids of different blade geometries to avoid negative effects due to discretization errors.
5 Free Flight Trim

The rotor wake and performance analyses of the preceding sections are based on isolated rotor configurations. The inclusion of the fuselage and tail rotor in the simulation setup, i.e. the simulation of a 'complete' helicopter configuration, offers the possibility to expand the trim procedure. Commonly, the set of trim objectives of helicopter simulations is composed of three main rotor parameters. These parameters are chosen by the user and can be, for instance, the overall thrust and the two overturning moments about the pitch and roll axes generated by the rotor, as was the case for the simulations described in the previous sections. The rotor control inputs for collective and cyclic blade pitch are varied in an iterative manner until the objectives are reached. This standard practice of rotor trim is called 'wind tunnel trimming', as it is well suited to reproduce the conditions of wind tunnel experiments; with the rotor held in a predefined orientation by the mounting fixture of the wind tunnel balance, thrust and moments are parameters that are precisely measurable and that are representative of the aerodynamic and dynamic state of the rotor. Hence, matching the measured values usually warrants optimal conditions for the aeroelastic part of the simulation task, which consists in the accurate simulation of fluid-structure interactions in order to reproduce the dynamics of the elastic rotor blades. However, as the wind tunnel trim is based on a fixed orientation of the rotor axis, it has only limited applicability to the study of realistic flight situations; in free flight, the rotor axis orientation varies as the helicopter assumes an attitude that corresponds to an equilibrium of loads for the entire aircraft. In the case of steady unaccelerated flight, the weight and the aerodynamic loads arising at the fuselage, the empennage and through engine thrust need to be balanced by the rotor loads in all axes; this requires raising the number of load objectives accessible through trim from three to six. Accordingly, the formerly three main rotor control inputs which are manipulated to establish the trimmed condition are supplemented by two attitude angles and by the tail rotor collective blade pitch. They thus form a set of six independent, usually coupled, control inputs, and the expanded trim procedure is termed 'free flight trim'. As before, the three main rotor control inputs influence the generation of thrust and both lateral forces and moments at the main rotor. The additional input for tail rotor collective pitch controls the tail rotor thrust and thus mainly affects the balance of yaw moments (i.e. moments about the vertical axis) and lateral forces. If the helicopter pitch and yaw angles (from rotation about the transverse and the vertical axis, respectively) are selected as trimmable attitude parameters, the former mainly serves to control the balance of longitudinal forces and fuselage pitching moments, while the latter determines the fuselage lateral forces and yaw moment, though in a different manner than the tail rotor control. The roll angle (about the longitudinal axis) is prescribed in this case. Alternatively, it is also possible to trim the roll angle while keeping the yaw angle fixed; this ambiguity in attitude is a general feature of helicopter flight and allows, for instance, sideward flight.
Fig. 9 Schematic of data exchange between flow solver and aeromechanics code
The implementation of free flight trimming is based on the extension of the existing rotor load coupling between the CFD solver and the aeromechanics code towards a transfer of both rotor and fuselage loads. Figure 9 illustrates the general possibilities for data exchange when coupling a CFD solver and a comprehensive rotor code. In the case of classical isolated rotor simulations in CFD, only blade loads are forwarded and take corrective influence on the comprehensive code's simplified aerodynamic model. If fuselage aerodynamic data in the form of separately generated polars were made available to the comprehensive code, a simplified variant of free flight trim would be possible with this setup; however, this approach takes no interference effects between the rotor wake and the fuselage into account. Alternatively, when a complete helicopter is modeled in CFD, both blade and fuselage loads, including the effects of interference, are transferred and enter the trim state prediction. In the trimmed state, the CFD loads finally replace the comprehensive code aerodynamics; the entirety of aerodynamic loads is in balance with the helicopter weight in the temporal mean, and a dynamic equilibrium is established at the rotor blades. By fulfilling the integral load balance in the temporal mean despite the periodic oscillation of main rotor thrust and the short-term variations of fuselage loads, rigid body oscillations of the helicopter as a whole are filtered out. Accordingly, only average values of the fuselage loads are transferred, with the averaging interval extending over at least one, and up to three, rotor revolutions in the present study. In contrast to the wind tunnel trim, the free flight trim features non-constant and a priori unknown objectives for the rotor loads; any change in helicopter attitude alters both rotor and fuselage loads in a non-linear manner. To find the equilibrium position, the same iterative solution procedure as for the wind tunnel trim can be employed as long as the trim Jacobian, calculated by the aeromechanics code on the basis of simplified models, has sufficient fidelity to the CFD model.
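Conceptually, each trim cycle applies a Newton-type correction using the trim Jacobian supplied by the aeromechanics code. The sketch below, with six control inputs, uses our own array names; the actual solve happens inside HOST.

```python
import numpy as np

def trim_update(controls, loads, targets, jacobian):
    """One Newton-type trim iteration: solve J * delta = (targets - loads)
    for the correction of the six control inputs (three main rotor controls,
    two attitude angles, tail rotor collective)."""
    residual = targets - loads                    # load imbalance, 6 components
    delta = np.linalg.solve(jacobian, residual)   # jacobian = d(loads)/d(controls)
    return controls + delta
```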
The use of the free flight trim functionality is motivated by an expected gain in prediction capability for helicopter performance and for the interactional loads on its structural components. Employing CFD loads for the entire aircraft, it constitutes a 'first principles approach' to flight mechanics, with the fundamental benefit that it inherently captures the various aerodynamic interference effects and their influence on flight attitude. As rotor performance and fuselage drag considerably depend on the flight attitude, which, however, is an unknown parameter during the design phase of a helicopter, the extension of the trim process to include these values improves performance prediction.
5.1 Simulation Setup

Coupled simulations are performed using FLOWer for the CFD calculation and the Eurocopter comprehensive aeromechanics code HOST for the solution of the trim, flight mechanics and structural dynamics tasks. Data exchange between the CFD code, which runs on the HLRS Nehalem cluster, and HOST, installed on a PC, is done by file transfer as illustrated in Fig. 9. The exchange of CFD load data and of information on the trim state, i.e. blade dynamics, attitude angles and rotor control inputs, occurs periodically at every trim cycle. Fuselage loads are time-averaged and sorted per helicopter component, such as cabin, horizontal stabilizer and vertical control surfaces, to be attributed to the respective HOST-internal model. The load input termed "other loads" in Fig. 9 indicates that the coupling environment also allows the inclusion of loads from non-CFD sources. Presently, this feature is used to introduce estimates of the hub loads, as this component is not part of the CFD model. The structured mesh system of the CFD model is composed of 11 Chimera multi-block grid components: the fuselage mesh, four main rotor blade meshes, two tail rotor blade meshes and four additional grid structures for the skid landing gear. As shown in Fig. 10, the fuselage mesh serves as a background mesh, i.e. it expands to the far field and provides the boundary conditions for the embedded Chimera components. The total number of grid cells amounts to 25 million. The mesh system was tested with two blocking variants; initially, a system of 305 blocks was used, which was later further subdivided into 398 blocks to enhance parallelization. In contrast to isolated rotor simulations, a special challenge arises when extending the procedure towards complete helicopter trim. The helicopter attitude is subject to changes during the trim process, resulting in varying inflow angles relative to the helicopter. This has to be accounted for in the CFD simulation. In principle, different methods can be applied for this purpose. One approach is to adjust the far field flow direction at the outer mesh boundaries. The major disadvantage of this method is the significant amount of time required to convect the altered inflow direction through the entire mesh system, which takes about 1.5 rotor revolutions for the chosen extent of the computational domain. One way to circumvent this issue is to use whirl fluxes for the reorientation of the flow. This approach is used for the
Fig. 10 Block-structured mesh of the EC145 helicopter. Chimera components are the main rotor and tail rotor blades and the landing skids
computations presented here. Effectively, the helicopter is unsteadily piloted from its previous attitude into its new attitude using a specified number of physical time steps for this transition process. The duration of the transition phase is set to one quarter of a main rotor revolution thus maintaining acceptable angular velocities of the helicopter not exceeding 104◦ /s. This strategy has proven to work well and features docile restart characteristics of the flow solver. A physical time step of 1◦ main rotor azimuth is chosen for all simulations based on the satisfactory results with this resolution in previous applications of weak coupling to isolated main rotors. Note that this time step, however, corresponds to a low equivalent temporal resolution on the tail rotor as it rotates at a ratio of 5.66 compared to the main rotor. 40 inner iterations are used to converge in pseudo time. For the closure of equations the Wilcox k-ω turbulence model is selected.
5.2 Results

The simulated flight case is a steady forward flight condition at 252 km/h. This cruise flight case was already investigated in [15], but is now revisited to test an updated HOST model and improved load coupling. An impression of the vortex system can be gained from the λ2 visualization in Fig. 11. The interference of the main rotor wake with the rear fuselage components can easily be identified. Flight test data from Eurocopter for the simulated EC145 helicopter is chosen as a reference for the setup of the trim process and simulation; since measured data is not available for the yaw angle, the yaw and pitch angles are selected as the trimmable attitude parameters, while the roll angle is fixed at the flight test value. The skid
Fig. 11 FLOWer simulation of EC145 helicopter, λ2-visualization. Note the interaction between the main rotor wake and the rear fuselage

Table 2 Trim variables in comparison to flight test data

Δθ0       ΔθC       ΔθS       Δθ0,TR     ΔΘ
+1.63°    +0.64°    +0.54°    −2.80°     +0.35°
landing gear is included in the CFD simulation only for the first trim iteration. Subsequently, it is omitted to reduce the substantial computational effort incurred by these Chimera components. Nevertheless, the skids bear high relevance for a correct modeling of the helicopter's flight mechanics, notably the pitching moment balance. Therefore, the compromise was adopted of retaining the skid influence through a fixed load correction, by means of the same external load coupling as used for the hub loads. This load correction is based on a study of the skid loads and of the related fuselage interference effects, carried out on the data from the first trim iteration. Five trim iterations are necessary to reach convergence within the accuracy limit of the scheme. Expressed in attitude or control angle changes, the accuracy limit is reached if the changes drop below a threshold of 0.01° to 0.02° upon the last trim cycle. The converged trim variables are summarized in Table 2. The main rotor control angles θ0, θC and θS are given as offsets from the flight test values. No load coupling between FLOWer and HOST is implemented to date at the tail rotor, and therefore no control angle is transferred to FLOWer. Rather, the tail rotor thrust in the CFD simulation is trimmed to the thrust level calculated by the comprehensive code through manual setting of the tail rotor collective angle θ0,TR, based on a sensitivity study. For the Δθ0,TR value given in Table 2, which is the offset between simulation and flight test, the accordance of the thrust generated in the CFD simulation with the target value indicated by HOST is 98.5%. The last value listed in Table 2 is the deviation of the helicopter pitch angle Θ from the attitude prevailing in the flight test. The yaw attitude converges to Ψ = 0.52°, which is a common moderate sideslip angle.
Fig. 12 Fuselage lateral force from CFD and comprehensive code internal aerodynamics (from [15])
The total number of trim iterations required to reach convergence is well within the range known from the 'wind tunnel trim' of isolated rotors in the case of this cruise flight condition. This is a notable feature, given that the trim scheme is enriched by three highly non-linear degrees of freedom in the trim relations. The computational effort, however, is increased for two reasons: On the one hand, flow convergence in response to a trim state update can be delayed, since the reorientation of the helicopter into the updated attitude eventually occurs at high angular rates and thus introduces significant perturbations to the flow. On the other hand, the flow about the fuselage is less regular than pure rotor flow. The EC145 helicopter fuselage is categorized aerodynamically as a blunt body, with flow separation at the fuselage 'rear door' below the tail boom. As a consequence, the tail boom, fin and stabilizer within the wake are subject to unsteady loads, particularly in the transverse components; despite some influence from the periodic rotor downwash, these loads in the general case exhibit a longer period or are irregular. An exemplary evolution of the fuselage lateral load with time, or equivalently rotor revolutions, is shown in Fig. 12. The time span shown is 22 rotor revolutions, with each trim cycle delimited by grey vertical lines. The dominating frequency, with a period of 0.25 rotor revolutions, relates to pressure pulses from blade passages above the fuselage. However, the irregularity and subharmonic frequency content are clearly visible, particularly for revolutions 2 to 6. As the objective of the free flight trim is to achieve a load balance for a steady flight condition, not considering any vibration, it is necessary to determine mean loads. Therefore, the precision of the trim solution can only be guaranteed by taking averages over a sufficiently long time interval. The averaging intervals range between one and three rotor revolutions in this case, as indicated by the length of the red lines in Fig. 12. For comparison, the isolated rotor trim usually requires phase averaging of the rotor loads over one quarter revolution. This highlights the substantial increase in computational effort for the free flight trim. Dash-dotted lines in Fig. 12 represent the fuselage lateral load determined by the comprehensive rotor code as a stand-alone tool, solely based on the fuselage polar data it is furnished with. The boxed values are the mean offset between FLOWer and HOST, i.e. the aerodynamic correction applied at the following trim cycle. Although the HOST-internal polar data is generated from wind tunnel experiments of the isolated fuselage, it differs from
Although the HOST-internal polar data is generated from wind tunnel experiments of the isolated fuselage, it differs from the CFD solution on the order of 100%, since the experiment lacks the interference from the rotor downwash, which is itself dependent on flight attitude. This deviation makes evident the improvement in the determination of aerodynamic loads, and consequently in the trim state prediction, gained from fuselage load coupling with CFD. In the following, results are evaluated with respect to performance. Total engine power in flight testing is measured from the engine torque at the drive shafts between the engines and the main gear box. Hence, the measured power has to be corrected by gear box losses, tail rotor power and auxiliary device power. Simulated main rotor power consumption exceeds this corrected value by only 6.00%, which is a realistic estimate considering that the CFD simulation assumes fully turbulent flow on the rotor blades. It is noteworthy that, apart from turbulence modeling, this prediction accuracy is achieved without any tuning or empirical coefficients, as is the case with the semi-empiric, polar-based aerodynamic models of comprehensive rotor codes. Nevertheless, the CFD model still features some shortcomings. Besides keeping the flow regime fully turbulent, the CFD model does not comprise the rotor head. Although rotor head drag is taken into account by coupling wind tunnel data as external load into the scheme (cf. Fig. 9), its interference effect on the tail surfaces of the helicopter is not represented. Furthermore, the CFD model uses closed engine inlets and outlets. The EC145 features comparatively large inlets, which cause a considerable amount of drag when modeled as solid walls. This issue will be remedied in future complete helicopter simulations by activating an engine model that sets appropriate characteristic boundary conditions at the inlets and outlets to mimic an internal mass flow at the desired rate. This way, flow stagnation at the inlet is weakened and an exhaust plume forms that contributes some thrust by momentum and also changes the fuselage loading through flow interference. Preliminary simulations have shown that the envisaged engine modeling also functions properly within the unsteady flow environment below the rotor; periodic pressure pulses from the rotor blades, for instance, modulate the back pressure against which the engine has to operate at the outlet.
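The power bookkeeping described above can be illustrated with a short sketch. All numerical values below are hypothetical placeholders, since the flight test data are not reproduced here; only the 6% figure comes from the text.

```python
# Sketch of the power correction: the measured total engine power is reduced
# by gear box losses, tail rotor power and auxiliary device power to isolate
# the main rotor power, which the CFD prediction is compared against.
# All absolute values are hypothetical.
P_measured  = 500.0   # kW, total engine power from shaft torque (hypothetical)
P_gearbox   = 15.0    # kW, gear box losses (hypothetical)
P_tailrotor = 40.0    # kW, tail rotor power (hypothetical)
P_auxiliary = 10.0    # kW, auxiliary devices (hypothetical)

P_main_ref = P_measured - P_gearbox - P_tailrotor - P_auxiliary
P_main_cfd = 1.06 * P_main_ref   # CFD exceeds the corrected value by 6%

print(f"relative deviation: {(P_main_cfd - P_main_ref) / P_main_ref:.1%}")
```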
5.3 Computational Effort
Computational effort for the presented simulation is approximately 26 hours of wallclock time per revolution when the preliminary blocking variant without the skid landing gear is processed by 60 CPUs of the HLRS Nehalem cluster. This reduced setup consists of 201 mesh blocks and amounts to 18 million grid cells. Since this mesh was designed for computations on the NEC SX-9 vector computer, parallelization can be exploited only up to ca. 60 processes upon porting to the Nehalem cluster. Due to memory requirements, only four out of eight cores per node were in use while allocating the entire 12 GB memory of a standard-type node. For comparison,
calculations on the NEC SX-9 were set up for 16 vector processors, resulting in 23 hours of wallclock time per rotor revolution. Configurations including the skids can be economically calculated with the reworked mesh system on the Nehalem cluster. Computing the new mesh variant with 398 blocks on 160 parallel processes avoids the memory size restriction encountered with the reduced parallelization of 60 CPUs and allows placing eight processes per node. 18 hours of wallclock time per rotor revolution are spent for this setup with 25 million grid cells. Compared to the case at reduced parallelization and grid size, compute effort in CPU hours per grid cell increases by approximately one third. As additional Chimera calculations only play a minor role, this scaling is likely due to reaching the memory bandwidth limit on the nodes, which are now used to full capacity. Complete helicopter simulations trimmed to free flight conditions thus require computing times on the order of 70,000 CPU hours.
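The quoted scaling can be checked with a few lines of arithmetic, using only the numbers given in the text:

```python
# CPU hours per grid cell for the two mesh/parallelization variants:
old = 26.0 * 60 / 18e6    # 26 h per revolution on 60 CPUs, 18 million cells
new = 18.0 * 160 / 25e6   # 18 h per revolution on 160 CPUs, 25 million cells
print(old, new, new / old)  # ratio ~1.33, i.e. roughly one third more
```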
6 Conclusions and Outlook
Current activities in the field of numerical helicopter aerodynamics at IAG are summarized and relevant results are discussed. A toolchain for the numerical optimization of blade geometries was implemented, and a grid convergence study has been carried out on a generic flight and geometry case. It has been shown that there is a considerable amount of uncertainty due to discretization errors, yet the errors show an asymptotic behavior. When the focus is on differences between various configurations, as is usually the case in optimization problems, a comparison of the coarse grid solutions with the solutions on the medium grid shows a qualitatively good agreement in the evolution of rotor power. Thus it is still acceptable to use the coarse grid for optimization problems, but it is essential to have a perfect similarity in topology and in node distribution on grids of different blade geometries to avoid negative effects due to discretization errors. Subsequently, a helicopter simulation in steady free flight conditions, where, in contrast to wind tunnel experiments, vanishing global loads on the entire aircraft are enforced, was carried out. Good agreement with flight test data has been achieved in terms of pilot control inputs and power consumption. Furthermore, the extended trim algorithm has been shown to converge equally well as the 'wind tunnel trim' procedure. Future work will include the simulation of other flight conditions as well as the reintroduction of the landing skids and the incorporation of an engine model. Computational performance was investigated for a reference case of the blade shape optimization study. Good scalability of the computational approach is shown, and a single-core performance of close to 5% of peak was achieved. Acknowledgments. We gratefully acknowledge the provision of supercomputing time and technical support by the High Performance Computing Center Stuttgart (HLRS) for our project HELISIM.
References
1. Dietz, M.: Simulation der Umströmung von Hubschrauberkonfigurationen unter Berücksichtigung von Strömungs-Struktur-Kopplung und Trimmung, Institut für Aerodynamik und Gasdynamik, Universität Stuttgart, PhD Thesis, 2009
2. Khier, W., Dietz, M., Schwarz, Th., Wagner, S.: Trimmed CFD Simulation of a Complete Helicopter Configuration, 33rd European Rotorcraft Forum, Kazan, Russia, 2007
3. Kroll, N., Eisfeld, B., Bleecke, H. M.: FLOWer, Notes on Numerical Fluid Mechanics, Vieweg, Braunschweig, Germany, Vol. 71, pp. 58–68, 1999
4. Schwarz, Th. O.: Ein blockstrukturiertes Verfahren zur Simulation der Umströmung komplexer Konfigurationen, Institut für Aerodynamik und Strömungstechnik, Universität Braunschweig, PhD Thesis, 2005
5. Jameson, A., Schmidt, W., Turkel, E.: Numerical Solution of the Euler Equations by Finite Volume Methods using Runge-Kutta Time-Stepping Schemes, AIAA 14th Fluid and Plasma Dynamics Conference, Palo Alto, California, USA, 1981
6. Jameson, A.: Time Dependent Calculations using Multigrid, with Applications to Unsteady Flows Past Airfoils and Wings, AIAA 10th Computational Fluid Dynamics Conference, Honolulu, Hawaii, USA, 1991
7. Benoit, B., Dequin, A.-M., Kampa, K., Grünhagen, W., Basset, P.-M., Gimonet, B.: HOST, a General Helicopter Simulation Tool for Germany and France, American Helicopter Society, 56th Annual Forum, Virginia Beach, Virginia, USA, 2000
8. Altmikus, A., Wagner, S., Beaumier, P., Servera, G.: A Comparison: Weak Versus Strong Modular Coupling for Trimmed Aeroelastic Rotor Simulations, American Helicopter Society, 58th Annual Forum, Montreal, Canada, 2002
9. Altmikus, A.: Nichtlineare Simulation der Strömungs-Struktur-Wechselwirkung am Hubschrauberrotor, Institut für Aerodynamik und Gasdynamik, Universität Stuttgart, PhD Thesis, 2004
10. Bensing, F., Keßler, M., Krämer, E.: CFD-CSD-Coupled Simulations of Helicopter Rotors Using an Unstructured Flow Solver, High Performance Computing in Science and Engineering, Springer, Berlin, pp. 393–406, 2010
11. Kutz, B., Bensing, F., Keßler, M., Krämer, E.: CFD Calculation of a Helicopter Rotor Hovering in Ground Effect, STAB Symposium, Berlin, Germany, 2010
12. Light, J. S.: Tip Vortex Geometry of a Hovering Helicopter Rotor in Ground Effect, Proceedings of the American Helicopter Society, 45th Annual Forum, Boston, MA, USA, 1989
13. Hollands, M., Keßler, M., Altmikus, A., Krämer, E.: Trade Study: Influence of Different Blade Shape Designs on Forward Flight and Hovering Performance of an Isolated Rotor, European Rotorcraft Forum, Milano, Italy, 2011 [accepted]
14. Kranzinger, P., Hollands, M., Keßler, M., Wagner, S., Krämer, E.: Generation and Verification of Meshes Used in Automatic Process Chains to Optimize Rotor Blades, 50th AIAA Aerospace Sciences Meeting, Nashville, TN, USA, 2012 [submitted]
15. Embacher, M., Keßler, M., Dietz, M., Krämer, E.: Coupled CFD-Simulation of a Helicopter in Free-Flight Trim, Proceedings of the American Helicopter Society, 66th Annual Forum, Phoenix, AZ, USA, 2010
Parameter Study for Scramjet Intake Concerning Wall Temperatures and Turbulence Modeling
Birgit Reinartz
Abstract Scramjets are hypersonic airbreathing engines that utilize the unique technology of supersonic combustion. Besides the supersonic combustor and nozzle, the intake and the subsequent isolator are the main components of a scramjet engine; these two compression components are especially sensitive to the state of the boundary-layer, and shock wave boundary-layer interaction is of major concern in both. Wall cooling and increased turbulence enhance the ability of boundary-layers to resist strong adverse pressure gradients and reduce separated flow areas; however, both are to some extent unknowns in the simulation of a shock-tunnel experiment. Thus, a parameter study was initiated to investigate the influence of both factors on the current intake design for hypersonic testing at Mach 8.
1 Introduction
Within the frame of the DFG Research Training Group "Aero-thermodynamic Design of a Scramjet Engine for Future Space Transportation Systems" (GRK 1095), various groups at three universities (Technical University Munich, University of Stuttgart, RWTH Aachen University) and the DLR Cologne are striving towards developing a German scramjet demonstrator. The demonstrator is a small-scale model of an engine developed for operating conditions at an altitude of 30 km and a flight speed of Mach 8. So far all engine components have been tested separately; however, they are highly interdependent. Thus, tunnel-testing of the complete model under flight conditions is the next logical step towards developing a flight test model. Funding for such tests has been secured under the DFG grant GA 1332-1, and the tests will take place at the Khristianovich Institute of Theoretical and Applied Mechanics (ITAM), Russian Academy of Sciences, Siberian Branch, Novosibirsk by the end of this year.
Birgit Reinartz, Chair for Computational Analysis of Technical Systems (CATS), Center for Computational Engineering Science (CCES), RWTH Aachen University, Schinkelstr. 2, 52062 Aachen, Germany, e-mail: [email protected]
Fig. 1 Final scramjet demonstrator model
Two Russian hypersonic wind tunnels, IT-302M and AT-303, will be utilized in the test campaign. Both hypersonic facilities have a nozzle exit diameter of 0.3 m. Figure 1 shows the demonstrator model currently assembled at the facilities of the Institute of Aerodynamics and Gasdynamics at the University of Stuttgart. Over the last three years there has been an on-going design development of the now finalized demonstrator model. In the case of the hypersonic intake, the design process was, to a major part, based on numerical simulations performed in this project [1, 2]. Figure 2 shows exemplarily some of the various intake models that have been considered, simulated, and discarded; several design prerequisites drove the development process. On the one hand, the required inflow conditions for the supersonic combustion chamber have to be fulfilled. Specifically, those are an inflow temperature of about 1000 K (and about one bar pressure) to ensure self-ignition of the injected hydrogen, and an enlarged chamber entrance area of 40 mm height and 65 mm width to allow for higher equivalence ratios than those currently realizable. Larger equivalence ratios lead to more heat release in the combustor and subsequently may cause thermal choking if the combustor area is not large enough. On the other hand, certain restrictions apply to the dimensions of the intake and the overall model due to the dimension of the test section defined by the nozzle exit diameter. Thus, the favored choice of a two-compression-ramp intake design had to be abandoned, much the same as an intake with higher sidewall compression: in the first case, the model became too high; in the second case, it became too wide and tunnel blocking was feared. In order to make the current model work, several modular design options had to be implemented, which are described in more detail in the next section.
Fig. 2 Various intake designs that have been considered within the project
Shock-tunnel facilities are inherently quiet tunnels where the turbulence intensity is small (about 0.5%), and they are characterized by short running times. Thus, even though hypersonic heat fluxes are extremely high, the model temperature will remain moderate, acting like a strong wall cooling on the flow. Due to the high Reynolds number of the test conditions, the developing surface boundary-layer will eventually transition to turbulence. However, both effects, wall cooling and transition to turbulence, are mutually dependent and, to some extent, unknown. To investigate their effect on the state of the engine boundary-layers and the interaction with impinging shock waves, a numerical campaign was initiated analyzing the effect of turbulence modeling, prescribed transition location and specified wall temperatures.
2 3D Intake Model and Test Design
The intake model will be equipped with two static pressure transducers and 15 thermocouples along the center line of the intake ramp, the latter to help assess the state of the boundary-layer correctly. The interior part of the model is equipped with over 30 pressure transducers as well as a Pitot rake to determine the local Mach numbers (Fig. 3). With respect to the intake, the final decision was made in favor of a single compression ramp configuration with a 15.5 degree compression angle and a length of 450 mm (shock-on-lip condition). The sidewall compression angle is 3.5 degrees, and a 35 degree sweep is introduced on the walls. As a precaution, the possibility of passive suction has been added to the model design where the simulation predicts the impingement of the cowl shock. By removing the corresponding modular insert, the suction can be realized as an 11.5 mm wide slot which is located 525 mm downstream of the ramp leading edge (see Fig. 3).
Fig. 3 Detail drawing of the final scramjet demonstrator model

Table 1 Test conditions
conditions   M∞    Re∞ [10^6/m]   p0 [bar]   T0 [K]   T∞ [K]   Tw [K]
flight       8.0   2.94           114        3130     227      unknown*
I            8.0   2.66           110        3280     237.7    ≤350
II           8.0   8.42           130        1650     120      293
* The wall temperature during flight testing will depend on the trajectory and whether or not the structure will be cooled
The slot opens into an ambient pressure chamber with an angle of 45 degrees on the leading edge (approximately the maximum deflection based on the Prandtl-Meyer angle) and 20 degrees at the trailing edge (corresponding to the approximate shock wave angle of the impinging cowl shock). In case of intake unstart, another horizontal insert, located in front of the bleed on the lower wall, can be removed to increase the overall intake height to 44 mm. The design allows for a saw-tooth-like turbulence generator insert downstream of the ramp leading edge at x = 110 mm [3, 4]. Because the disturbances induced by the turbulence generator are quite large, it is assumed that the location of natural transition of the same strength would lie upstream of the ramp leading edge, thus allowing for turbulent inflow conditions in the flow simulations. The turbulence generator was added as a precaution because, on the one hand, the location of natural transition on a single ramp compression inlet is completely unknown and, on the other hand, it will have a huge impact on the performance characteristics of the inlet. The central strut injector is mounted right downstream of the impingement point of the cowl shock. The injector sits in the center of the model, stretching from one sidewall to the other. The overall length of the injector is 43 mm and the height is 7 mm. The front half of the injection system is wedge-shaped; the second half has a lobed structure to create vorticity and, thus, enhance mixing. Gaseous hydrogen is injected in the streamwise direction from its trailing edge. The top and bottom walls of the combustion chamber open with an angle of 2 degrees; the side walls are straight. Finally, the nozzle is an inverted intake because producing thrust is not an issue at this stage. The testing will be performed for Mach 8 flight conditions at 30 km altitude as well as for a lower temperature condition closer to the connect-pipe facility conditions that have been used so far for combustion tests (see Table 1). Furthermore, various tests are scheduled with and without the turbulence generator insert.
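As background to the Prandtl-Meyer-based choice of the slot leading-edge angle, the following sketch evaluates the Prandtl-Meyer function and the turning angle of an isentropic expansion to a lower pressure. The local Mach number and pressure ratio used here are illustrative assumptions, not the actual intake design data.

```python
import math

def prandtl_meyer(M, gamma=1.4):
    # Prandtl-Meyer function nu(M) in radians for a calorically perfect gas
    g = (gamma + 1.0) / (gamma - 1.0)
    return (math.sqrt(g) * math.atan(math.sqrt((M * M - 1.0) / g))
            - math.atan(math.sqrt(M * M - 1.0)))

def p0_over_p(M, gamma=1.4):
    # Isentropic total-to-static pressure ratio
    return (1.0 + 0.5 * (gamma - 1.0) * M * M) ** (gamma / (gamma - 1.0))

def mach_after_expansion(M1, p1_over_p2, gamma=1.4):
    # Find M2 with p0/p2 = (p1/p2) * (p0/p1) by bisection; the total
    # pressure p0 is constant across an isentropic expansion.
    target = p1_over_p2 * p0_over_p(M1, gamma)
    lo, hi = M1, 20.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if p0_over_p(mid, gamma) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Assumed values, for illustration only: local Mach number ahead of the slot
# and pressure ratio between the local flow and the suction chamber.
M1, p1_over_p2 = 3.0, 5.0
M2 = mach_after_expansion(M1, p1_over_p2)
turning = math.degrees(prandtl_meyer(M2) - prandtl_meyer(M1))
print(f"M2 = {M2:.2f}, expansion turning angle = {turning:.1f} deg")
```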
3 Numerical Method
The DLR FLOWer code [5] is applied, which solves the unsteady compressible Reynolds-averaged Navier-Stokes equations using a cell-centered finite volume method on block-structured grids. An AUSM (Advection Upstream Splitting Method) scheme is used for the discretization of the convective flux functions. Higher-order accuracy and consistency with the central differences used for the diffusive terms is achieved by MUSCL (Monotonic Upstream Scheme for Conservation Laws) extrapolation, and the TVD (Total Variation Diminishing) property of the scheme is ensured by a modified van Albada limiter function. Time integration is performed by an explicit five-stage Runge-Kutta time-stepping scheme using local time stepping [6]. Time integration of the turbulence equations is decoupled from the mean equations, and the turbulence equations are solved implicitly, using a Diagonally Dominant Alternating Direction Implicit (DDADI) scheme. For wall dominated flows with thick boundary-layers, strong shock/boundary-layer interaction and separation, as they are of interest here, the assumption of a linear dependence between the Reynolds stress tensor and the strain rate tensor is not always valid. Therefore, a differential Reynolds stress model (RSM) [7, 8] is used in the simulations. This model solves transport equations for each component of the Reynolds stress tensor as well as for an additional length scale. Thus, it is computationally expensive. However, RSM computations show promising results, especially for separated flows [9, 10]. To assess the effect of turbulence modeling on the predicted state of the boundary-layers, two other two-equation models, the Wilcox k-ω and Menter's Shear Stress Transport (SST) k-ω model, are also tested. Both models have been used successfully for hypersonic flow applications in the past [11]. At the inflow boundary, the freestream conditions of the experimental investigation listed in Table 1 are prescribed. The turbulent values are determined by the specified freestream turbulence intensity Tu∞: k∞ = 1.5(Tu∞ u∞)². The Reynolds stress matrix is initialized by placing 2/3 k∞ on the diagonal, and the specific dissipation rate of the freestream is ω∞ = k∞/(RLTU · μlam), with RLTU being a measure for the ratio of turbulent to laminar viscosity in the freestream (here: RLTU = 0.001). For the supersonic outflow, the variables are extrapolated from the interior. At solid walls, the no-slip condition is enforced by setting the velocity components and the normal pressure gradient to zero. Due to the short measurement times in a high-enthalpy facility, it is assumed that the model remains at a constant temperature of either Twall = 293 K or Twall = 350 K. Additionally, the Reynolds stresses and turbulent kinetic energy are set to zero at the wall, and the respective length scale is prescribed based on the first grid spacing according to Menter's model.
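The freestream turbulence initialization above can be written out directly. The formulas follow the text; the velocity and viscosity values in this sketch are illustrative assumptions.

```python
import numpy as np

# Sketch of the freestream turbulence initialization: k = 1.5 (Tu * u)^2,
# Reynolds stresses 2/3 k on the diagonal, omega = k / (RLTU * mu_lam).
# u_inf and mu_lam are assumed values, not the tabulated test conditions.
Tu_inf = 0.005          # 0.5% freestream turbulence intensity (shock tunnel)
u_inf = 1700.0          # m/s, assumed freestream velocity
mu_lam = 8.0e-6         # Pa s, assumed laminar viscosity at T_inf
RLTU = 0.001            # measure of turbulent-to-laminar viscosity ratio

k_inf = 1.5 * (Tu_inf * u_inf) ** 2
reynolds_stress = np.diag([2.0 / 3.0 * k_inf] * 3)   # off-diagonals zero
omega_inf = k_inf / (RLTU * mu_lam)

print(k_inf, omega_inf)
```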
4 Numerical Accuracy and Grid Convergence
A complete validation of the FLOWer code has been performed by the DLR prior to its release [6, 12], and continued validation is achieved by the analyses documented in subsequent publications [13–15]. Furthermore, FLOWer has already been successfully used in the analysis of 3D hypersonic intake flows [9, 11, 16–18]. FLOWer relies on block-structured grids. Here, 16 blocks are used to discretize the half-model, using the symmetry along the centerline of the intake. Figure 4 shows the distribution of grid points along the ramp. The refinement towards walls and leading edges, depicted in the left detail, is necessary to resolve the strong wall gradients and the onset of boundary-layers, respectively. Otherwise, a homogeneous grid distribution is applied to avoid triggering physical effects, such as boundary-layer separation, by clustering; this happens easily in hypersonic flows due to their strong grid sensitivity [11]. However, a completely homogeneous distribution is not always possible: the clustered nodes shown in the right detail are due to the refinement of the cowl leading edge. The added grid lines in this region are expanded downwards towards the ramp but are still visible as refinement due to the structured nature of the grid. Due to the complexity of the problem at hand and because the parameter study did not allow for even larger grids, no grid convergence study has been performed.
Fig. 4 Grid distribution on ramp with refinement towards walls and leading edges. The refinement on the ramp visible in the upper detail is due to the grid refinement at the cowl leading edge directly above
The discretization relies instead on extensive experience in the computation of such flows over the last decade, with the understanding that not all features of the flow have been sufficiently resolved. For the numerical analysis of the half model of the current intake, approximately 10.7 million grid points are used: 488 in the streamwise direction and 85 × 265 in the cross plane. A minimum wall spacing of Δ = 1 × 10⁻⁶ is used in all directions, yielding a y⁺ of 1.
5 Results
The numerical simulation of hypersonic intake flows shows a strong compression of the incoming air flow by the leading ramp shock wave and a subsequent compression by the cowl shock as the flow is turned into the engine. Figure 5 shows the Mach number distribution in the center plane for two different turbulence models (left: RSM, right: SST), where turbulent inflow is assumed as discussed in Sect. 2. The effect of the side wall compression on the center plane is seen at approximately x = 0.23 m, where the leading ramp shock wave starts to bend outwards due to compression waves emanating from the sides. This causes the leading ramp shock wave to miss the shock-on-lip condition and introduces an additional spillage drag. The cowl shock turns the flow back inside the engine and interacts with the expansion fan of the ramp corner, shown in the detail of the interior flow in Fig. 5. This interaction causes a forward bending of the cowl shock towards the ramp. Here, the boundary-layer has grown thick as it comes down the ramp and subsequently expands around the corner. Thus, the interaction of the impinging shock wave and the thick boundary-layer causes a large separation in this area. However, the size of the separation strongly depends on the applied turbulence model. The Reynolds stress model (RSM) reacts more strongly to the expanding flow and reduces the inherent turbulence of the flow; thus the separation size increases because the boundary-layer has lost some of its turbulent momentum. The SST model is not able to detect this partial relaminarizing effect and consequently predicts a smaller separation. The same is true for the Wilcox k-ω model (not shown). In fact, a fully laminar simulation of the intake diverged after a few thousand iterations because the separation grows so large that it blocks the intake flow. A more detailed three-dimensional analysis of the shock wave boundary-layer interaction region, predicted by RSM, is shown in Fig. 6. On the right side, the Mach number distribution in the y-z plane at x = 0.5 m shows the cross extent of the separation. On the left side, the Mach number contours for several z = const. planes are shown. They reveal the onset of separation to be close to the sidewall. Likely, the flow separation in this area is due to the corner vortex. The vortex structure is caused by the interaction of the leading ramp shock wave and the side wall shock wave and originates upstream close to the ramp leading edge. Finally, the effect of the turbulence model on the combustor entrance conditions (as specified by the exit plane distribution of the intake simulations) is shown in Fig. 7. Here the temperature distribution in the exit plane is shown, predicting overall excellent conditions for self-ignition of the hydrogen fuel.
Fig. 5 Mach number distributions in central plane for two turbulence models: Reynolds stress model (top) and SST model (bottom); condition II, Tw = 293 K
Fig. 6 Mach contours of boundary-layer separation due to impinging cowl shock wave; various distributions on x-y planes (left) and one cross plane distribution at x = 0.5 m (right)
Fig. 7 Computed temperature isolines on combustor entrance plane (x = 0.58 m) for condition I using different turbulence models: Reynolds stress model (left) and SST model (right)
Nevertheless, the temperature on the right side (SST model) is slightly lower than the one predicted by RSM. The larger separation of the RSM simulations causes a stronger deceleration of the interior flow. Thus the combustor entrance Mach number is reduced and more energy is converted into heat. Table 2 lists the computed and mass-averaged inflow conditions for the combustion chamber with respect to the test conditions of the demonstrator. Test condition I, resembling flight conditions, yields excellent combustion conditions in terms of Mach number and averaged temperature for self-ignition. However, at test condition II, self-ignition will be difficult to establish because, even though the average Mach number is already reduced to Mach 2.5 at the combustor inlet, the averaged static temperature is less than 700 K.
Table 2 Computed and mass-averaged combustor inflow conditions
conditions         I      II
mass flow ratio    71%    72%
M [-]              2.8    2.5
pt [bar]           12.5   16.7
Tt [K]             3027   1564
T [K]              1355   690
Therefore, combustion can only be established by local temperature maxima. The suction system, currently under investigation, will aim to provide such a local maximum close to the fuel injection location by removing (or at least reducing) the separation at the impingement point of the cowl shock and thus homogenizing the flow beneath the injector.
6 Computational Considerations
The FLOWer computations are performed on the NEC SX-9 cluster using 16 processors for approximately 10 million grid cells. The memory requirement is around 80 gigabytes for a simulation of the 3D intake. For a typical problem to converge, approximately 100,000 iterations have to be performed. A single batch job performs approximately 12,000 iterations and requires 10 hours of CPU time per node, after which it is resubmitted into the batch queue. Scalability was tested by performing 100 iterations of a typical problem on 16, on 8 and on 4 processors (see Table 3). Accordingly, the required user time increases from 100% for 16 processors to 186% for 8 processors and further to 336% for only 4 processors. This behavior can partially be explained by the comparatively large amount of time spent on the set-up of the problem: performing one iteration on 16 processors already requires 79% of the user time required for 100 iterations, whereas 100 iterations of a 12,000-iteration run require only a fourth of the time of performing solely 100 iterations. FLOWer uses block-based MPI parallelization, where the communication between different grid blocks is performed using MPI. Therefore, the number of CPUs used should be equal to the number of grid blocks. The distribution of the grid blocks is done manually using the grid generation software MEGACADS and MSPLIT. Therefore, the achievable load balancing depends on the discretization of the physical problem at hand. For a typical intake simulation, 4200 and 5100 MFLOPS are achieved as minimum and maximum values, respectively, resulting in a load difference of approx. 18%. The vectorization level of the FLOWer FORTRAN program package is 98.6%. However, the average vector length lies between 116 and 163, whereas the hardware would allow for 256. Nevertheless, the vector length is determined by the necessary physical resolution of the flow in the cross plane and thus could only be increased by a higher resolution. So far, more than 35 different cases have been investigated, varying either geometry, test conditions or turbulence modeling.
Table 3 Performance
                       16 processors   8 processors   4 processors
1 iteration            79%             –              –
100 iterations         100%            186%           336%
100 of 12,000 it's     25.1%           –              –
speed-up               3.4 / 4         1.8 / 2        –
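A short sketch shows how the speed-up values in Table 3 follow from the measured user times (normalized to the 16-processor run); only the percentages from the table are used.

```python
# Speed-up and parallel efficiency relative to the 4-CPU run, derived from
# the relative user times for 100 iterations given in Table 3.
user_time = {16: 1.00, 8: 1.86, 4: 3.36}   # relative user time per CPU count

for n in (16, 8):
    speedup = user_time[4] / user_time[n]   # relative to the 4-CPU run
    ideal = n / 4
    print(f"{n} CPUs: speed-up {speedup:.1f} of ideal {ideal:.0f} "
          f"(efficiency {speedup / ideal:.0%})")
```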
7 Conclusions
In preparation for an experimental test campaign, a numerical design study of a scramjet demonstrator intake for Mach 8 test conditions has been initiated. Various aspects of the design have been studied over the last years. During the current funding period, the focus was placed on the effect of turbulence modeling and wall cooling. Based on the simulation results, a modular design concept was created for the intake. On the one hand, this modular design is expected to help with the uncertainties of the turbulent flow predictions caused by the unknown transition location, which can only be determined experimentally, and the flow differences due to the use of different turbulence models. On the other hand, it is hoped that the passive bleed will tailor the flow in such a way as to improve the intake performance so that reliable self-ignition conditions are generated. Currently, the flow simulation is being extended to include the passive bleed system in the computational domain. Acknowledgments. The authors would like to thank Dr. Uwe Gaisbauer, University of Stuttgart, for the collaboration and support. This work was supported by the German Research Foundation under GRK 1095 and GA 1332-1. Furthermore, greatly appreciated is the support and computational time provided by the High Performance Computing Center Stuttgart (HLRS) under the project "Shykos".
References
1. Krause, M., Reinartz, B., and Behr, M., "Numerical Analysis of Transition Effects in 3D Hypersonic Intake Flows," High Performance Computing in Science and Engineering 2009, Eds. Nagel, W. E. et al., Springer, Berlin, 2010, pp. 395–409.
2. Reinartz, B. and Behr, M., "Computational Design Study of a 3D Hypersonic Intake for Scramjet Design Testing," High Performance Computing in Science and Engineering 2010, Eds. Nagel, W. E. et al., Springer, Berlin, 2010, pp. 429–442.
3. Freebairn, G., Boyce, R., and Mudford, N., "Hypersonic Transverse Jet Injection in Laminar, Transitional and Turbulent Flow," AIAA Paper 2009-7230, October 2009.
4. Mee, D. J., "Boundary-Layer Transition Measurements in Hypervelocity Flows in a Shock Tunnel," AIAA Journal, Vol. 40, No. 8, August 2002, pp. 1542–1548.
5. Kroll, N., Rossow, C.-C., Becker, K., and Thiele, F., "The MEGAFLOW Project," Aerospace Science and Technology, Vol. 4, No. 4, 2000, pp. 223–237.
6. Radespiel, R., Rossow, C., and Swanson, R., "Efficient Cell-Vertex Multigrid Scheme for the Three-Dimensional Navier-Stokes Equations," AIAA Journal, Vol. 28, No. 8, 1990, pp. 1464–1472.
7. Eisfeld, B. and Brodersen, O., "Advanced Turbulence Modelling and Stress Analysis for the DLR-F6 Configuration," AIAA Paper 2005-4727, 2005.
8. Eisfeld, B., "Implementation of Reynolds stress models into the DLR-FLOWer code," IB 124-2004/31, DLR, Institute of Aerodynamics and Flow Technology, 2004.
9. Reinartz, B. and Ballmann, J., "Computation of Hypersonic Double Wedge Shock / Boundary Layer Interaction," 26th International Symposium on Shock Waves (ISSW 26), Göttingen, Germany, 16–20 July 2007, 2007, pp. 1099–1104.
10. Bosco, A., Reinartz, B., and Boyce, R., "Experimental and numerical analysis of an hypersonic compression corner for testing the prediction capability of a Reynolds Stress Model," Proceedings of the 8th Int. ERCOFTAC Symposium on Engineering Turbulence Modeling and Measurements, Marseille, France, 9–11 June 2010.
11. Reinartz, B., Ballmann, J., Brown, L., Fischer, C., and Boyce, R., "Shock Wave / Boundary Layer Interaction in Hypersonic Intake Flows," 2nd European Conference on Aero-Space Sciences (EUCASS), Brussels, Belgium, 1–6 July 2007, 2007.
12. Becker, N., Kroll, N., Rossow, C. C., and Thiele, F., "Numerical Flow Calculations for Complete Aircraft – the Megaflow Project," DGLR Jahrbuch 1998, Vol. 1, Deutsche Gesellschaft für Luft- und Raumfahrt (DGLR), Bonn, Germany, 1998, pp. 355–364.
13. Reinartz, B. U., Ballmann, J., Herrmann, C., and Koschel, W., "Aerodynamic Performance Analysis of a Hypersonic Inlet Isolator using Computation and Experiment," AIAA Journal of Propulsion and Power, Vol. 19, No. 5, 2003, pp. 868–875.
14. van Keuk, J., Ballmann, J., Sanderson, S. R., and Hornung, H. G., "Numerical Simulation of Experiments on Shock Wave Interactions in Hypervelocity Flows with Chemical Reactions," AIAA Paper 03-0960, January 2003.
15. Coratekin, T. A., van Keuk, J., and Ballmann, J., "On the Performance of Upwind Schemes and Turbulence Models in Hypersonic Flows," AIAA Journal, Vol. 42, No. 5, May 2004, pp. 945–957.
16. Reinartz, B., "Performance Analysis of a 3D Scramjet Intake," 26th Congress of the International Council of the Aeronautical Sciences (ICAS), Anchorage, USA, 14–19 September 2008, 2008.
17. Krause, M. and Ballmann, J., "Numerical Simulations and Design of a Scramjet Intake Using Two Different RANS Solver," AIAA Paper 2007-5423, July 2007.
18. Krause, M., Reinartz, B., and Ballmann, J., "Numerical Computations for Designing a Scramjet Intake," 25th Congress of the International Council of the Aeronautical Sciences (ICAS), Hamburg, Germany, 3–8 September 2006, 2006.
Unsteady Numerical Study of Wet Steam Flow in a Low Pressure Steam Turbine
J. Starzmann, M.V. Casey, and J.F. Mayer
Abstract In steam power plants, condensation already starts in the flow path of the low pressure part of the steam turbine, which leads to a complex three-dimensional two-phase flow. Wetness losses are caused by thermodynamic and mechanical relaxation processes during condensation and droplet transport. The present investigation focuses on the unsteady effects of rotor-stator interaction on the droplet formation process. Results of unsteady three-dimensional flow simulations of a two-stage steam turbine are presented; this is the first time that non-equilibrium condensation has been considered in such simulations. The numerical approach is based on the RANS equations, which are extended by a wet-steam-specific nucleation and droplet growth model. Despite the use of a high performance cluster, the unsteady simulation requires a considerable simulation time of approximately 60 days using 48 CPUs.
1 Introduction
At present about 70% of the worldwide electrical power is generated in steam or combined-cycle power plants, whereby most of these power stations are nuclear, coal or gas fired [21]. In general, steam turbines are installed for energy conversion in such power plants, and for this reason efficiency improvements of this key component have a high potential to reduce man-made CO2 emissions. The necessary reconstruction of the power supply towards renewables will change the portfolio of power plants [17], but even in power plants which are based on biomass, geothermal or solar-thermal energy, steam turbines are used. In conclusion it can be stated that steam turbines have played a dominant role in power generation in the past and that this dominance will continue in the future.
J. Starzmann · M.V. Casey · J.F. Mayer, ITSM – Institute of Thermal Turbomachinery and Machinery Laboratory, Universität Stuttgart, Pfaffenwaldring 6, D-70569 Stuttgart, Germany, e-mail: [email protected]
Recent studies focus on low pressure steam turbines, where complex aerodynamic and mechanical effects must be handled and where there is potential for future improvements in efficiency. Firstly, the flow is highly three-dimensional, and secondly, the condensation leads to difficulties regarding efficiency and reliability. In the framework of the research initiative "Kraftwerke des 21. Jahrhunderts" a project has been started in which the condensation effects on the flow of multi-stage low pressure steam turbines are numerically investigated. High performance computing is needed especially to investigate the influence of rotor-stator interaction on the condensation process, because of the time-consuming transient simulations.
2 Overview—Wetness Effects in Low Pressure Steam Turbines
2.1 Wetness Losses
The steam in conventionally operating steam turbines reaches saturated conditions in the last stages, and therefore condensation takes place already in the turbine flow path. The condensation influences the flow field of the turbine through mechanical and thermodynamic effects and leads to additional losses. Investigations on wetness losses in steam turbines, as summarised by Moore in [16], have shown that an increase in the absolute wetness level by about 1% will reduce the turbine efficiency by about 0.5% (relative to the efficiency with dry steam); a small sketch of this rule of thumb is given after the list below. According to the comprehensive studies of Gyarmathy [10] three main losses exist:
• A drag loss occurs due to the friction between droplets and steam,
• a braking loss arises because big droplets are not able to follow the steam path and impact on the rotating blades, which also causes damage by erosion,
• and perhaps the most important wetness loss is a thermodynamic relaxation loss, which is due to irreversible heat transfer between the liquid and vapour phases during condensation. The physical phenomena which lead to this kind of wetness loss are described in more detail below.
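As a numerical illustration of the quoted rule of thumb, the following one-liner estimates the efficiency penalty for a given mean wetness; the dry efficiency value is an arbitrary example, not a measured quantity.

```python
# Rule of thumb from the text: each 1% of absolute wetness reduces turbine
# efficiency by about 0.5% relative to the efficiency with dry steam.
def wet_efficiency(eta_dry, mean_wetness):
    """eta_dry: dry efficiency; mean_wetness: e.g. 0.08 for 8% wetness."""
    return eta_dry * (1.0 - 0.5 * mean_wetness)

print(wet_efficiency(0.90, 0.08))  # 8% mean wetness -> about 0.864
```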
2.2 Non-equilibrium Condensation
In general, the initial condensation takes place on foreign seed particles on which water molecules can condense, and a certain time is needed for water droplet formation. Due to limitations in the water chemistry of steam cycles [14], the steam is relatively pure and not enough impurities are present to serve as condensation nuclei. An even more important difference between a general condensation process and the process in low pressure steam turbines is that, in turbines, condensation takes place in rapidly expanding steam. Therefore, in the absence of foreign seeds, the steam has
not enough time to form sufficiently stable nuclei. In fact, the steam remains dry although the saturation pressure is reached. From the thermodynamic point of view such a steam state is called supersaturated. Supersaturation means that for a prevailing steam temperature the usual saturation pressure ps (which is a function of temperature) is lower than the actual pressure p. The non-equilibrium state of the steam can be quantified by the supersaturation S = p/ps. At certain non-equilibrium steam conditions, intense nucleation occurs and leads to strong condensation in the flow, which brings the system back to equilibrium conditions. A precondition for this spontaneous condensation is that stochastically formed water molecule agglomerations reach a stable size. This stable size is called the critical radius rcrit and can be calculated from the so-called Kelvin-Helmholtz equation, which describes the influence of the surface tension σ on the stability of a droplet at saturated conditions:

$$\frac{p}{p_s} = \exp\!\left(\frac{2\sigma}{r\,\rho_l R T_g}\right) \quad\longrightarrow\quad r_{crit} = \frac{2\sigma}{\rho_l R T_g \ln S} \qquad (1)$$

The relation shows that the critical radius decreases if the supersaturation S increases. Intense nucleation is initialised if the supersaturation of the steam is high enough that the molecule agglomerations reach the critical radius. Detailed explanations are given by McDonald [15] or Gyarmathy [10]. In conclusion, the thermodynamic relaxation back to an equilibrium state is accompanied by an irreversible heat transfer between the vapour and the liquid phase, which causes the so-called thermodynamic wetness loss.
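Equation (1) can be evaluated directly. The property values in this sketch are illustrative choices for water/steam and are not taken from the simulations.

```python
import math

# Critical droplet radius from Eq. (1). All input values are illustrative.
sigma = 0.07      # N/m, surface tension of water (assumed)
rho_l = 1000.0    # kg/m^3, liquid density (assumed)
R_g = 461.5       # J/(kg K), specific gas constant of water vapour
T_g = 300.0       # K, vapour temperature (assumed)
S = 5.0           # supersaturation p/ps (assumed)

r_crit = 2.0 * sigma / (rho_l * R_g * T_g * math.log(S))
print(f"r_crit = {r_crit:.2e} m")   # sub-nanometre for these values
```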
2.3 Short Overview of the State of Knowledge
At the beginning of steam turbine development it was already recognised, e.g. by Baumann [5] in 1921 or Freudenreich [6] in 1927, that operating steam turbines under wet steam conditions leads to lower turbine efficiencies. A detailed investigation of the loss mechanisms could not be achieved at that time; thus the research focused on the more concrete erosion problem, see [20]. Erosion occurs due to big droplets in the flow path which are not able to accelerate to the steam velocity and impact the blade surfaces. This coarse water is induced into the steam flow by liquid water film detachment from the stator trailing edge, and it is these large drops that are responsible for erosion. The present study does not deal with erosion problems, which are discussed in more detail for example in Ahmad [1]. The comprehensive investigations of Gyarmathy [10] or Kirillov and Yablonik [13] can be regarded as fundamental for numerous subsequent theoretical studies about condensation effects on the flow field and the performance of steam turbines. In recent decades, specific wet steam models were implemented in conventional codes which use a time-marching approach. Two-dimensional non-viscous flow simulations of non-equilibrium steam flows were conducted by Bakhtar [3] or Young [26].
Fig. 1 Wake-chopping in turbomachines [2]
In modern three-dimensional investigations of non-equilibrium condensation, RANS solvers are used, as in the works of Heiler [12], Wroblewski [23] or the works of Gerber [7, 8]. The main unsteady effect in turbomachinery is caused by the rotor-stator interaction. The flow downstream of the blade profile trailing edge is characterised by high viscous dissipation in the wakes. The wakes pass through the downstream blade row and are chopped into smaller parts by the next blade row, as shown in Fig. 1. The consequences of wake-chopping on the flow are pressure and temperature fluctuations, which have an influence on droplet formation. A one-dimensional treatment of the temperature and pressure fluctuations due to wake-chopping was already presented by Gyarmathy and Spengler [11]. Examinations based on their results have shown that a more polydispersed droplet size distribution can be expected from these fluctuations. There are a few additional investigations on wake-chopping, which are summarised by Bakhtar and Heaton in [2]. From this overview it can be concluded that an unsteady CFD simulation of a multi-stage turbine, if possible in three dimensions, can help to confirm and quantify the supposed effects. Within the present investigation, this further step in modelling condensing flows in low pressure steam turbines is reached by extending modern wet steam flow simulation techniques towards a transient approach.
3 Numerical Modelling of Wet Steam Flow
For modelling the condensing steam flow in a low pressure steam turbine, the commercial CFD code Ansys CFX is used. In this kind of multi-phase flow problem, in addition to the mandatory Navier-Stokes equations, a nucleation model which determines the droplet formation and a droplet growth model are needed. The common classical homogeneous nucleation model with the Kantrowitz non-isothermal correction, described in [4], and the algebraic droplet growth model of Young [25] are used. The wet steam model has to be linked with the mass, momentum and energy conservation equations, which was realised by Gerber [8, 9] in the Eulerian-Eulerian
multi-phase framework of CFX. This wet steam modelling approach is introduced in the following sections. The usual constraint for multi-phase flows is given in (2), where αc stands for the volume fraction of the continuous gas phase and αd for the volume fractions of the possible dispersed liquid phases.

$$\alpha_c + \sum_{d=1}^{D} \alpha_d = 1 \qquad (2)$$
Furthermore, a mass conservation equation is formulated for the continuous and the dispersed phases.

$$\frac{\partial(\rho\alpha)_c}{\partial t} + \frac{\partial(\rho u_j \alpha)_c}{\partial x_j} = -\sum_{d=1}^{D} S_d - \sum_{d=1}^{D} m^* \alpha_c J_d \qquad (3)$$

$$\frac{\partial(\rho\alpha)_d}{\partial t} + \frac{\partial(\rho u_j \alpha)_d}{\partial x_j} = \sum_{d=1}^{D} S_d + \sum_{d=1}^{D} m^* \alpha_c J_d \qquad (4)$$
Besides the density ρ and the flow velocity u, Sd stands for the source term which is related to the condensing mass due to the growth of already existing droplets. The formation of new droplets per volume and time is calculated by the nucleation model, which gives the term Jd, whereas m* is the mass of a single nucleated droplet. The droplet number is conserved for each dispersed phase by the following equation.

$$\frac{\partial(\rho N)_d}{\partial t} + \frac{\partial(\rho u_j N)_d}{\partial x_j} = \rho_d \alpha_c J_d \qquad (5)$$

The momentum equation is as follows, wherein p is the pressure and τij the viscous stress tensor:

$$\frac{\partial(\alpha\rho u_i)_c}{\partial t} + \frac{\partial(\alpha\rho u_j u_i)_c}{\partial x_j} = -\alpha_c \frac{\partial p}{\partial x_i} + \frac{\partial(\alpha\tau_{ij})_c}{\partial x_j} + S_{F,m} \qquad (6)$$

A momentum equation can be solved for each dispersed phase too, which is meaningful if a significant velocity difference between the steam flow and the droplets is expected. The droplets generated by homogeneous nucleation, which is the major wetness formation mechanism in low pressure steam turbines [10], are in the range of 0.05 to at most 1 µm. Such small droplets are accelerated by the steam very quickly [16], and thus the continuous and the dispersed phase share the same velocity field. In principle, friction between the phases can be considered within the source term SF,m. As mentioned in Sect. 2, bigger droplets with sizes of 50 µm to 500 µm exist in the flow. These have no effect on the condensation process and cannot be considered within the present model. The energy conservation of the continuous vapour phase for the total energy H is given as:
$$\frac{\partial(\alpha\rho H)_c}{\partial t} + \frac{\partial(\alpha\rho u_j H)_c}{\partial x_j} - \alpha_c \frac{\partial p}{\partial t} = \frac{\partial}{\partial x_j}\!\left(\alpha k_{eff} \frac{\partial T}{\partial x_j}\right)_{\!c} + \frac{\partial(\alpha u_i \tau_{ij})_c}{\partial x_j} + S_{F,e} + S_H + S_Q \qquad (7)$$
The source term SF,e considers possible dissipation due to friction between the phases. The contributions to the energy equation due to mass and heat transfer are represented by the terms SH and SQ. The auxiliary models which define these source terms are not stated here, but it should be mentioned that SH and SQ are related to each other, because condensation is associated with the release of latent heat. According to Gyarmathy [10], for the present study with droplets below 1 µm the energy of the dispersed phase, i.e. the droplet temperature Td, can be calculated in good approximation from an algebraic relation, where Ts is the usual saturation temperature and Tc the prevailing temperature at supersaturated steam conditions.

$$T_d = T_s - (T_s - T_c)\,\frac{r_{crit}}{r} \qquad (8)$$
The multi-phase model used here is restricted to a mono-dispersed droplet representation, because the mean droplet radius r is calculated from the conserved droplet number and the volume of the dispersed phase for each control volume.

$$r_d = \left(\frac{3\,\alpha_d}{4\pi \rho_d N_d}\right)^{1/3} \qquad (9)$$
A more detailed description of the model is given in the publications of Gerber [8]; therefore only a few further issues are stated here:
• The pressure, acting across all the phases, is solved from conservation equations formulated for the mixture.
• The steam properties are based on the IAPWS standard, which includes information on the thermodynamic metastable region.
• The conservation equations are converted to RANS equations and solved in space xi and time t within the implicit solver Ansys CFX.
• For turbulence closure the standard SST model is used, which is applied homogeneously to all phases.
The best way to validate the presented numerical model is to predict supersonic condensing nozzle flows and compare the results with pressure and droplet size measurements. Several publications in the past, e.g. Young [25], Heiler [12], Wroblewski [24] or Gerber [8], have dealt with this difficult task, however with different codes. For the present model, comparisons with flow measurements for a three-stage steam turbine in Gerber [9] and Starzmann [18, 19] have confirmed that the model is able to capture the non-equilibrium flow conditions and the formation and growth of droplets.
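The algebraic relations (8) and (9) translate into a few lines of code. All input values in this sketch are illustrative assumptions, not simulation data.

```python
import math

# Sketch of Eqs. (8) and (9): mean droplet radius from the conserved droplet
# number and dispersed volume fraction, and the droplet temperature below
# saturation. All inputs are illustrative.
def droplet_radius(alpha_d, rho_d, N_d):
    # Eq. (9): r_d = (3 alpha_d / (4 pi rho_d N_d))^(1/3)
    return (3.0 * alpha_d / (4.0 * math.pi * rho_d * N_d)) ** (1.0 / 3.0)

def droplet_temperature(T_s, T_c, r_crit, r):
    # Eq. (8): T_d = T_s - (T_s - T_c) * r_crit / r
    return T_s - (T_s - T_c) * r_crit / r

r_d = droplet_radius(alpha_d=0.02, rho_d=1000.0, N_d=1e17)  # assumed values
print(r_d)                                    # order 1e-8 m for these inputs
print(droplet_temperature(330.0, 320.0, 1e-9, r_d))
```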
4 Auxiliary Simulation Models
The aim of the present study is to clarify the influence of the rotor-stator induced unsteadiness on the condensation process in low pressure steam turbines. The investigation is conducted on a modern three-stage steam turbine, which is located at the Institute of Thermal Turbomachinery (ITSM) of the Universität Stuttgart. Flow measurements have been carried out by Völker [22] over a range of operating conditions, which can be used for detailed comparisons with CFD results. Steady state flow calculations were already done for the whole turbine, and the results have shown [18, 19] that for usual turbine operating conditions the first condensation takes place in the second stage of the turbine. That is why in the present study the first two stages of the ITSM three-stage turbine test rig are numerically investigated. Figure 2 shows a sketch of the turbine with the names of the several blade rows and the evaluation planes between them. The picture on the right hand side shows the modelled geometry. The boundary conditions for the unsteady calculations are obtained from the steady three-stage calculations. A total pressure and a total temperature profile are used at the inlet, and a radial static pressure profile is used at the outlet of the turbine. In case of steady calculations, a mixing plane which mixes out the flow in the circumferential direction is commonly used as the interface between rotor and stator. A sliding interface can be used in the unsteady case, which is more realistic than the mixing plane approach. Circumferential periodicity is assumed for the steady as well as for the unsteady simulation. This means that for a steady simulation only one blade pitch needs to be considered, whereas with a sliding interface multiple blade pitches have to be modelled to ensure nearly the same pitch angle for each blade row. In Table 1 the calculated numbers of blade pitches are stated, which is 9 for the first blade row S1, for example.
Fig. 2 Sketch of the investigated two stages of the modelled three-stage turbine

Table 1 Grid size and number of modelled blades for the coarse grid
                          S1        R1        S2        R2
modelled blade pitches    9         8         4         5
element no. per blade     45,674    81,403    86,086    165,810
overall element no.       411,066   651,224   344,344   829,050
The discretisation in space is realised within a structured grid generated in TurboGrid. The number of elements for one blade pitch and the number for all the modelled blade pitches are also given in Table 1. The overall grid consists of approximately 2.2 million elements. Today, steady state wet steam simulations on such grids are not a big challenge but, as experience within the present work has shown, the unsteady treatment is very time consuming.
5 Verification of the Numerical Grid Resolution
Before the results of the unsteady turbine flow simulation are shown in Sect. 6, the ability of the numerical grid to resolve the flow should be discussed. For the first unsteady simulations presented here, a relatively coarse grid is used to save computational effort. The influence of grid resolution on the flow field can be estimated if results of steady calculations on different grid resolutions are compared with each other. In Fig. 3 results for the complete three-stage turbine are shown. In addition to the coarse grid with 0.7 million elements used for the unsteady investigation, results on a medium grid with 1.6 million and on a fine grid with 5.2 million elements are given. For the calculated operating point, nucleation and subsequent rapid condensation take place mainly in the second stator of the turbine. In this stator S2, the nucleation rate and the subcooling are evaluated along a mid-streamline in Fig. 3. Even in this sensitive flow region with high expansion rates, the differences between the grids are not dramatic. The picture on the right hand side of Fig. 3 presents circumferentially averaged pressure distributions in plane E30. Again, the differences between the various grid resolutions are only small, and the results match well with the available measurement data. The traverse measurements are conducted with pneumatic four-hole probes; in addition, static wall pressure measurements are available which represent circumferentially averaged data [22].
Fig. 3 Steady calculations on different grids for the complete three-stage turbine, [19]
Fig. 4 Non-dimensional entropy increase on 20% span of stage 2
6 Results
The unique feature of the results presented here is that this is the first time that a three-dimensional CFD analysis of a multi-stage turbine has been performed under consideration of the non-equilibrium condensation process. These and further investigations should answer the open question: how strong is the influence of rotor-stator interaction on condensation? In the following, first results of the unsteady simulation are presented and compared to a steady state simulation with mixing planes between the rotating and stationary blades. An important effect of wake-chopping in turbomachinery is given in Fig. 4, where a non-dimensional entropy increase can be identified. The definition of this entropy increase is suggested by Young [27], where the entropy is normalised by the vapour gas constant Rg.

$$\exp\!\left(-\frac{\Delta s}{R_g}\right) = \exp\!\left(-\frac{s - s_{inlet}}{R_g}\right) \qquad (10)$$

The high entropy production in the wakes of the stator can be seen for the steady simulation in Fig. 4a as well as for the unsteady case in Fig. 4b. In the unsteady approach, however, these wakes are not mixed out by a mixing plane and pass through the rotating blades.
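For reference, the entropy measure of Eq. (10) amounts to the following operation on the entropy field; the entropy values in this sketch are made up for illustration, while the gas constant is that of water vapour.

```python
import numpy as np

# Non-dimensional entropy measure of Eq. (10), as used for the wake
# visualization in Fig. 4. 's' would be the local specific entropy from the
# CFD solution; the values below are placeholders.
R_g = 461.5                               # J/(kg K), vapour gas constant
s_inlet = 7300.0                          # J/(kg K), assumed inlet entropy
s = np.array([7300.0, 7310.0, 7350.0])    # assumed local entropy values

measure = np.exp(-(s - s_inlet) / R_g)    # 1 at inlet, < 1 where entropy rises
print(measure)
```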
6.1 Influence of Inherent Unsteadiness on Droplet Formation
The chopping of wakes leads to an inherent unsteadiness of the flow field, which can be seen in Fig. 5. The pressure and temperature distributions are obtained at fixed locations (monitor points) for each solver timestep, which in principle corresponds to the real simulation time. These monitor points are located at the outlet of the blade rows.
Fig. 5 Temperature and pressure fluctuation in rotor R1 on monitor points on 20% span
Fig. 6 Positions of the monitor points
The locations R11 to R13 refer to the outlet region of the first rotor R1, as given in Fig. 6. Because there is only one stage upstream of this position, the temperature oscillates only slightly. The pressure oscillation is about 2-3% of the pressure drop over the whole stage. These small fluctuations have a slight influence on the nucleation, as can be identified in Fig. 7 by the nucleation rate in stator S2 on a 20% span surface. The different timesteps correspond to one pitch transition of the rotor R1, and the related timesteps are marked by dots in the distributions of Fig. 5. Besides the fluctuation, nucleation is located further upstream in the unsteady case. This is shown by the comparison with the nucleation position in the steady simulation (Fig. 7), where on the suction side nucleation begins further downstream near the throat. Especially near the casing, nucleation is predicted only in the stator in the unsteady simulation, whereas in the steady case nucleation mainly appears in the rotor. In Fig. 8 the nucleation rate is shown on a surface near the casing at 80% span. It is assumed that this shift in nucleation position is due to the use of mixing planes in the steady approach. This kind of interface mixes out the flow in the circumferential direction, which leads to a reduced mean supersaturation level at the inlet of the stator. However, nucleation needs high supersaturation, and in the unsteady calculation a sufficient level is reached further upstream compared to the steady simulation. Stronger flow unsteadiness exists downstream of the second stator S2, in which droplet formation has occurred. The temperature and pressure distributions at the corresponding monitor points S21 to S24 are shown in Fig. 9. For the temperature, the peak-to-peak amplitude reaches 10 K, and the pressure oscillations reach over 7% of the pressure drop of stage two.
Fig. 7 Nucleation in S2 on 20% span for different timesteps and for the steady simulation
Fig. 8 Nucleation in S2 and R2 on 80% span for the unsteady and for the steady simulation
Fig. 9 Temperature and pressure fluctuation in stator S2 on monitor points on 20% span
Finally, it is interesting to see how the unsteady flow behaviour affects the droplet diameter, because the size of the droplets has a direct influence on the wetness losses and on erosion. The droplet diameter at the monitor points in rotor R2 is drawn in Fig. 10. The maximum peak-to-peak amplitude is about 20% of the mean droplet diameter. In comparison to the droplet size of the steady simulation, which is about 10^-7 m, the droplet size in the unsteady case is considerably larger. However, it must be stated that for the droplet diameter the unsteady solution is not fully converged, as can be seen in Fig. 10, and the calculation must be continued.
Fig. 10 Droplet diameter at the outlet of rotor R2 on 20% span height
6.2 Discussion
In the previous section it was noted that the temperature fluctuations after the first stage are only weak, which causes only a slight oscillation in the nucleation position. It can be argued that the temperature fluctuations would be much higher if a turbine with more stages were considered. In fact, with several stages upstream the fluctuations could be high enough that the nucleation front oscillates between the blade section and the space between the blades, and this would lead to a completely different droplet spectrum. Because of the high computational cost, as discussed in the following section, it is advisable to perform calculations with more than two stages in two dimensions.
7 Computational Efforts
The high performance cluster NEHALEM at the HLRS of the Universität Stuttgart has been used for the current computations using the flow solver Ansys CFX V12.1. Experience with the non-equilibrium condensation model has shown that a high number of iterations is needed to attain a converged and almost periodic solution. The RMS residuals for the momentum and, above all, for the mass conservation equations must reach a value around 5·10^-6 to ensure that the wetness-related variables reach an almost periodic characteristic. These criteria can be achieved by means of a very small time discretisation. For the current simulation a timestep has been used that results in more than 800 steps for one pitch transition of rotor R2. Taking into account that the simulation of several blade pitch transitions is needed before an almost periodic characteristic develops, over 20,000 time iterations are necessary. Therefore a high performance computing platform is needed to reach acceptable computation times. An acceleration of the simulation is achieved through efficient parallelisation, for which the HP-MPI option has been chosen. Figure 11 shows how many timestep iterations can be achieved with different levels of parallelisation. The red squares
Fig. 11 Development of simulation time with increasing parallelisation
give the timesteps per hour over the number of CPUs used, and the green triangles show the simulation time (in days) required for an unsteady computation with 20,000 timestep iterations.
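The quoted iteration counts translate directly into wall-clock estimates. A back-of-the-envelope sketch follows, where the throughput value is an assumed reading from Fig. 11 rather than a published number.

```python
# Rough wall-clock estimate for one unsteady run; steps_per_hour is assumed.
total_steps = 20_000        # timestep iterations needed for a periodic state
steps_per_hour = 35         # assumed throughput at a given CPU count (from Fig. 11)

days = total_steps / steps_per_hour / 24.0
print(f"{days:.1f} days of wall-clock time")  # ~23.8 days for this throughput
```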
8 Conclusion
Computationally expensive three-dimensional unsteady flow simulations of a two-stage steam turbine have been undertaken. The special feature of the CFD model is its ability to model non-equilibrium condensation, which is responsible for additional efficiency losses in low pressure steam turbines. For the present investigation, with only one stage upstream of where condensation occurs, the rotor-stator interaction leads to a slight fluctuation of the nucleation position in time. However, the droplet size at the outlet of the second stage changes by about 20% over time. In addition, the results show that the unsteady simulation predicts the nucleation position more precisely than the oversimplified steady approach. Further investigations should be carried out on turbines with more stages to extend knowledge about the effect of unsteadiness on condensation in steam turbines.
References
1. Ahmad, M.: Experimental assessment of droplet impact erosion of low-pressure steam turbine blades, Dissertation, Universität Stuttgart, 2009
2. Bakhtar, F.; Heaton, A. V.: Effects of Wake Chopping on Droplet Sizes in Steam Turbines, Proc. IMechE, Part C: J. Mech. Engrg. Science, 219(12):1357–1367, 2005
3. Bakhtar, F.; Tochai, M. T. M.: An Investigation of Two-Dimensional Flows of Nucleating and Wet Steam by the Time-Marching Method, Int. J. Heat and Fluid Flow, 2(1):5–18, 1980
4. Bakhtar, F.; Young, J. B.; White, A. J.; Simpson, D. A.: Classical Nucleation Theory and Its Application to Condensing Steam Flow Calculations, Proc. IMechE, Part C: J. Mech. Engrg. Science, 219(12):1315–1333, 2005
5. Baumann, K.: Some Recent Developments in Large Steam Turbine Practice, J. Inst. Electr. Eng., 59(302):565–623, 1921
6. Freudenreich, J.: Einfluß der Dampfnässe auf Dampfturbinen, Zeitschrift des Vereines Deutscher Ingenieure, 71(20):664–667, May 1927
7. Gerber, A. G.: Two-Phase Eulerian/Lagrangian Model for Nucleating Steam Flow, Trans. ASME, J. Fluids Engrg., 124(2):465–475, 2002
8. Gerber, A. G.; Kermani, M. J.: A Pressure Based Eulerian-Eulerian Multi-Phase Model for Non-Equilibrium Condensation in Transonic Steam Flow, Int. J. Heat and Mass Transfer, 47:2217–2231, Aug. 2004
9. Gerber, A. G.; Sigg, R.; Völker, L.; Casey, M. V.; Sürken, N.: Predictions of Nonequilibrium Phase Transition in a Model Low Pressure Steam Turbine, Proc. IMechE, Part A: J. Power and Energy, 221(6):825–835, 2007
10. Gyarmathy, G.: Grundlagen einer Theorie der Naßdampfturbine, Dissertation, ETH Zürich, 1962
11. Gyarmathy, G.; Spengler, P.: Traupel-Festschrift gewidmet zum 60. Geburtstag von Walter Traupel, chap. Über die Strömungsfluktuationen in mehrstufigen thermischen Turbomaschinen, pp. 95–141, Juris-Verlag, Zürich, 1974
12. Heiler, M.: Instationäre Phänomene in Homogen/Heterogen Kondensierenden Düsen- und Turbinenströmungen, Ph.D. thesis, Universität Karlsruhe (TH), 1999
13. Kirillov, I. I.; Yablonik, R. M.: Fundamentals of the Theory of Turbines Operating on Wet Steam, NASA Technical Translation, NASA TT F-611, Mashinostroyeniye Press, Leningrad, 1968
14. McCloskey, T. H.; Dooley, R. B.; McNaughton, W. P.: Turbine Steam Path Damage: Theory and Practice, Elec. Power Res. Inst., 1999
15. McDonald, J. E.: Homogeneous Nucleation of Vapor Condensation. I. Thermodynamic Aspects, Am. J. Phys., 30:870–877, Feb. 1962
16. Moore, M. J.; Sieverding, C. H. (eds.): Two Phase Steam Flow in Turbines and Separators, von Karman Institute Book, Hemisphere Publishing Corporation, Washington, London, 1976
17. Nitsch, J.; Wenzel, B.: Langfristszenarien und Strategien für den Ausbau Erneuerbarer Energien in Deutschland, Leitszenario 2009, Studie des DLR im Auftrag des Bundesministeriums für Umwelt, Naturschutz und Reaktorsicherheit, Berlin, 2009
18. Starzmann, J.; Casey, M. V.; Sieverding, F.: Non-Equilibrium Condensation Effects on the Flow Field and the Performance of a Low Pressure Steam Turbine, in: Proceedings of ASME Turbo Expo, Glasgow, ASME, June 2010
19. Starzmann, J.; Schatz, M.; Casey, M. V.; Mayer, J. F.; Sieverding, F.: Modelling and Validation of Wet Steam Flow in a Low Pressure Steam Turbine, in: Proceedings of ASME Turbo Expo, Vancouver, ASME, June 2011
20. Todd, K. W.; Hall, W. B.; Morris, W. D.; Ryley, D. J.: Symposium on Wet Steam, in: Proc. Instn. Mech. Engrs., London, 1966
21. VGB PowerTech: Zahlen und Fakten zur Stromerzeugung 2010, Sept. 2010, http://www.vgb.org/daten_stromerzeugung.html
22. Völker, L.: Neue Aspekte der aerodynamischen Gestaltung von Niederdruck-Endstufen-Beschaufelungen, Dissertation, Universität Stuttgart, 2006
23. Wróblewski, W.; Dykas, S.; Gardzilewicz, A.; Kolovratnik, M.: Numerical and Experimental Investigations of Steam Condensation in LP Part of a Large Power Turbine, Trans. ASME, J. Fluids Engrg., 131(4), 2009
24. Wróblewski, W.; Dykas, S.; Gepert, A.: Steam Condensing Flow Modeling in Turbine Channels, Int. J. Multiphase Flow, 35:498–506, March 2009
25. Young, J. B.: The Spontaneous Condensation of Steam in Supersonic Nozzles, Physico-Chemical Hydrodynamics, 3(1):57–82, 1982
26. Young, J. B.: Two-Dimensional, Nonequilibrium, Wet-Steam Calculations for Nozzles and Turbine Cascades, Trans. ASME, J. Turbomachinery, 114:569–579, July 1992
27. Young, J. B.: The Fundamental Equations of Gas-Droplet Multiphase Flow, Int. J. Multiphase Flow, 21(2):175–191, 1995
Turbulence Modelling for CFD-Methods for Containment Flows
Annual Report 06/2010–06/2011
Armin Zirkel and Eckart Laurien
Abstract During a severe accident of a light-water reactor, hydrogen can be produced by a chemical reaction between the Zircaloy cladding and water and escape into the containment through a leak in the primary circuit. The prediction of the mass transport of hydrogen is vital for an optimised positioning of countermeasures like recombiners. It is possible that a stable stratification of hydrogen and air occurs, due to the different densities of those fluids. This stratification can be mixed by a free jet. This mixing is characterised by the time dependency of the flow, sharp velocity and density gradients, as well as the non-isotropy of the Reynolds stresses and turbulent mass fluxes. With the use of a Reynolds stress turbulence model, the non-isotropic Reynolds stresses can be simulated. A similar approach is theoretically possible for the turbulent mass fluxes, but only the isotropic eddy diffusivity model is currently available in state-of-the-art CFD software. The shortcomings of the eddy diffusivity model in simulating the turbulent mass flux are investigated, as well as improvements obtained with a non-isotropic model. Because of the difficulty of obtaining experimental data of flows in real containments, the THAI experimental facility was created to provide experimental data for flows in large buildings. The experiments are performed by Becker Technologies. The analysis uses the experimental data of the THAI experiments TH-18, TH-20, TH-21 and TH-22 as reference cases. For safety reasons, the light gas used in the TH-20 experiment is helium instead of hydrogen. Due to the rotational symmetry of the geometry as well as the boundary conditions, two-dimensional simulations are performed where applicable. The grids have been built following the best practice guidelines to ensure sufficient grid quality. Several simulations were carried out to investigate the numerical error caused by spatial and time discretisation. During this report's time frame, simulations of the TH-20 and TH-22 experiments have been performed.
Dipl.-Ing. Armin Zirkel · Prof. Dr.-Ing. Eckart Laurien
Institut für Kernenergetik und Energiesysteme, Universität Stuttgart, Pfaffenwaldring 31, 70569 Stuttgart, Germany, e-mail:
[email protected],
[email protected]
Notation and Symbols

Symbol    Meaning                        Unit
g         gravity constant               m/s²
k         turbulence kinetic energy      m²/s²
L         length                         m
p         pressure                       N/m²
T, t      time                           s
u         velocity                       m/s
x         spatial direction
ū         time averaged value
ũ         Favre averaged value
u′, u″    fluctuating values
ε         turbulence eddy dissipation    m²/s³
ρ         density                        kg/m³
ρ_r       reference density              kg/m³
μ         dynamic viscosity              Pa s
ν         kinematic viscosity            m²/s
ν_t       kinematic eddy viscosity       m²/s
λ         molecular diffusivity          m²/s
ϕ         helium mass fraction
Φ̃_i       turbulence mass flux           m/s
τ̃_ij      Reynolds stress tensor         N/m²
1 Introduction
During a severe accident in a light-water reactor, hydrogen can be produced by a chemical reaction between the Zircaloy cladding and water:
\[ \mathrm{Zr} + 2\,\mathrm{H_2O} \rightarrow \mathrm{ZrO_2} + 2\,\mathrm{H_2} \tag{1} \]
The hydrogen can then escape into the containment through a leak in the primary circuit. The presence of hydrogen can lead to combustion processes, which pose a potential danger to the integrity of the containment. The prediction of the mass transport of hydrogen is vital for an optimised positioning of countermeasures like recombiners. Lumped parameter (LP) codes have been developed, verified and used to analyse and predict transport processes within a containment [1, 2]. These models are based on mass and energy budgets between given control volumes inside a containment building. They can provide valuable information about complex flows, such as mixing, condensation and aerosol transport. However, flow models are often specialised to a narrow range of application, and the user influence is rather large. Recently, methods of computational fluid dynamics (CFD) have also been used to simulate containment flows [3, 4]. They are based on temporally averaged mass, momentum and energy conservation equations, which form a set of coupled partial differential equations. The main objective of the international standard problem 47 (ISP-47) is to evaluate the capability of LP and CFD codes to predict the hydrogen distribution under LOCA conditions [5]. A possible distribution of hydrogen and air is a stable stratification, due to the different densities of those fluids. This stable stratification can then be mixed by a free jet caused by a leak in the primary circuit. Gathering experimental data on these processes in a real containment is not easily accomplished. Therefore a model containment is used to realise measurements of mixing processes in large buildings.
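The stoichiometry of reaction (1) fixes how much hydrogen a given zirconium inventory can release. A small sketch using textbook molar masses; the values are standard constants, not taken from this report.

```python
# Hydrogen mass released per kilogram of zirconium via Zr + 2 H2O -> ZrO2 + 2 H2.
M_Zr = 91.224e-3   # molar mass of Zr [kg/mol]
M_H2 = 2.016e-3    # molar mass of H2 [kg/mol]

mol_Zr_per_kg = 1.0 / M_Zr            # moles of Zr in 1 kg of cladding metal
m_H2 = 2.0 * mol_Zr_per_kg * M_H2     # 2 mol of H2 per mol of Zr
print(f"{m_H2 * 1e3:.0f} g H2 per kg Zr")  # roughly 44 g
```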
The abbreviation THAI stands for Thermal-hydraulics, Hydrogen/Helium, Aerosol, Iodine. Two THAI experiments have been simulated during the time frame of this report. The TH-20 experiments were performed in the THAI testing facility to investigate the mixing of a stable stratification with a free jet, as described above [6]. The TH-22 experiments investigated an unstable temperature stratification.
2 Modelling

2.1 Favre-Averaged Navier-Stokes Equations
The basic equations of fluid dynamics are the Navier-Stokes equations [7, 8]. They consist of the continuity equation for the conservation of mass and the momentum equations for the conservation of momentum. In the present case of a single-phase mixture of different fluids, the Navier-Stokes equations are complemented by a conservation equation for the species concentration. Since the problem is isothermal, the energy conservation equation is not considered. For an incompressible flow, the equations are as follows.

Continuity equation:
\[ \frac{\partial u_i}{\partial x_i} = 0 \tag{2} \]

Momentum equations:
\[ \rho\left(\frac{\partial u_i}{\partial t} + u_j \frac{\partial u_i}{\partial x_j}\right) = -\frac{\partial p}{\partial x_i} + \mu \frac{\partial^2 u_i}{\partial x_j \partial x_j} + \frac{\rho - \rho_r}{\rho_r}\, g_i \tag{3} \]

Concentration equation:
\[ \rho\left(\frac{\partial \varphi}{\partial t} + u_i \frac{\partial \varphi}{\partial x_i}\right) = \lambda\, \frac{\partial^2 \varphi}{\partial x_i \partial x_i} \tag{4} \]
The Navier-Stokes equations are capable of completely describing a flow, given a sufficiently fine grid. This approach is called direct numerical simulation (DNS). For most engineering applications the instantaneous behaviour of single eddies is not important, so a statistical investigation of the flow is sufficient. The commonly used approach is Reynolds averaging, where a value is separated into a time averaged mean component and a fluctuating component:
\[ u = \bar{u} + u' \tag{5} \]
The time averaging is defined as follows:
\[ \bar{u} = \frac{1}{T} \int_0^T u \, \mathrm{d}t \tag{6} \]
For flows with changing density, Favre mass-averaging is advantageous [9]:
\[ \tilde{u} = \frac{\overline{\rho u}}{\bar{\rho}} \tag{7} \]
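The difference between the two averages is easy to demonstrate numerically. Below is a minimal illustration of Eqs. (5)–(7) on a synthetic signal; the signal itself is invented purely for demonstration.

```python
# Reynolds (time) average vs. Favre (density-weighted) average on a toy signal.
import numpy as np

t = np.linspace(0.0, 10.0, 10_000)
u = 2.0 + 0.3 * np.sin(5.0 * t)            # velocity with fluctuations
rho = 1.0 + 0.1 * np.sin(5.0 * t + 0.5)    # density variation, correlated with u

u_bar = u.mean()                            # Reynolds average, Eq. (6)
u_tilde = (rho * u).mean() / rho.mean()     # Favre average, Eq. (7)
print(u_bar, u_tilde)  # the two differ when rho' and u' are correlated
```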
To perform a Favre averaging of the conservation equations, the flow properties are decomposed as follows:
\[ \rho = \bar{\rho} + \rho', \quad p = \bar{p} + p', \quad u = \tilde{u} + u'', \quad \varphi = \tilde{\varphi} + \varphi'' \tag{8} \]
Using (7) and (8) in (2), (3) and (4) results in the Favre-averaged Navier-Stokes equations.

Continuity equation:
\[ \frac{\partial \tilde{u}_i}{\partial x_i} = 0 \tag{9} \]

Momentum equations:
\[ \bar{\rho}\left(\frac{\partial \tilde{u}_i}{\partial t} + \tilde{u}_j \frac{\partial \tilde{u}_i}{\partial x_j}\right) = -\frac{\partial \bar{p}}{\partial x_i} + \frac{\partial}{\partial x_j}\left(\mu \frac{\partial \tilde{u}_i}{\partial x_j} - \overline{\rho u_i'' u_j''}\right) + \frac{\bar{\rho} - \rho_r}{\rho_r}\, g_i \tag{10} \]

Concentration equation:
\[ \bar{\rho}\left(\frac{\partial \tilde{\varphi}}{\partial t} + \tilde{u}_i \frac{\partial \tilde{\varphi}}{\partial x_i}\right) = \frac{\partial}{\partial x_i}\left(\lambda \frac{\partial \tilde{\varphi}}{\partial x_i} - \overline{\rho \varphi'' u_i''}\right) \tag{11} \]
These equations are similar to the base equations except for the additional turbulent terms in the momentum and concentration equations. The momentum equations now contain the Reynolds stress tensor
\[ \tilde{\tau}_{ij} = -\overline{\rho u_i'' u_j''} \tag{12} \]
and the concentration equation the turbulent mass flux vector
\[ \tilde{\Phi}_i = -\overline{\rho \varphi'' u_i''} \tag{13} \]
The turbulent terms are determined with turbulence models.
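How such correlations would be evaluated from sampled fluctuation fields can be sketched as follows; the arrays are synthetic, and the instantaneous density inside the averages is approximated by a constant mean density for simplicity.

```python
# Evaluating the correlations of Eqs. (12) and (13) from sampled fluctuations.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
rho_bar = 1.2                                   # assumed mean density [kg/m^3]
u_pp = rng.normal(0.0, 0.1, (3, n))             # u'' samples, 3 components
phi_pp = 0.5 * u_pp[2] + rng.normal(0.0, 0.02, n)  # phi'' correlated with w''

# tau_ij = -rho * mean(u_i'' u_j''); Phi_i = -rho * mean(phi'' u_i'')
tau = -rho_bar * (u_pp[:, None, :] * u_pp[None, :, :]).mean(axis=2)
Phi = -rho_bar * (phi_pp * u_pp).mean(axis=1)
print(tau.shape, Phi)  # 3x3 stress tensor and 3-component mass flux vector
```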
2.2 Reynolds Stress Model
The isotropic Boussinesq approach to modelling the Reynolds stress tensor is widely used, for example in two-equation turbulence models [10, 11]. But it is not suitable
for non-isotropic problems, because the effect of the turbulent structures on the mean flow depends on their direction [12]. The secondary flow in a non-circular pipe, for example, is a result of non-isotropic Reynolds stresses. Two-equation models are also insufficient for flows with strongly bent streamlines. To take non-isotropic effects into account, Reynolds stress models solve a transport equation for every Reynolds stress [13]. Those transport equations are derived by subtracting the Favre-averaged momentum equation (10) from the momentum equation (3) for the x_i and the x_j momentum. The resulting equation N(u) for component i is multiplied by the fluctuating velocity u'' for component j and vice versa. The sum of both is then Reynolds averaged:
\[ \overline{N(u_i)\, u_j''} + \overline{N(u_j)\, u_i''} = 0 \tag{14} \]
Because of the Reynolds averaging, every term with only one fluctuating component is zero. The remaining products of fluctuating velocities, which are the components of the Reynolds stress tensor, form the transport equations. There are nine equations, of which six are different due to symmetry.
\[ \frac{\partial \tilde{\tau}_{ij}}{\partial t} + \tilde{u}_k \frac{\partial \tilde{\tau}_{ij}}{\partial x_k} = D_{ijk} + P_{ij} + \varepsilon_{ij} + \Pi_{ij} + G_{ij} \tag{15} \]
The right side of the Reynolds stress equations consists of the diffusive transport D_ijk, the stress production P_ij, the dissipation tensor ε_ij, the pressure strain correlation Π_ij and the buoyancy production G_ij. The diffusive transport consists of the turbulent diffusion and the pressure diffusion:
\[ D_{ijk} = \frac{\partial\, \bar{\rho}\, \overline{u_i'' u_j'' u_k''}}{\partial x_k} + \frac{\partial\, \overline{p' u_j''}}{\partial x_i} + \frac{\partial\, \overline{p' u_i''}}{\partial x_j} \tag{16} \]
The stress production term is a source term for the production or destruction of turbulence. It can have an amplifying or a damping effect on the Reynolds stresses:
\[ P_{ij} = \bar{\rho}\left( \tilde{\tau}_{ik} \frac{\partial \tilde{u}_j}{\partial x_k} + \tilde{\tau}_{jk} \frac{\partial \tilde{u}_i}{\partial x_k} \right) \tag{17} \]
ε_ij is the viscous dissipation. In a laminar flow it can be neglected, but the dissipation caused by the fluctuations in a turbulent flow has to be considered:
\[ \varepsilon_{ij} = 2\mu\, \overline{\frac{\partial u_i''}{\partial x_k} \cdot \frac{\partial u_j''}{\partial x_k}} \tag{18} \]
The pressure strain correlation Πi j considers the interdependency of pressure and velocity fluctuations. It is not a source of turbulence, but describes a redistribution of the Reynolds stresses.
\[ \Pi_{ij} = -\overline{p'\left( \frac{\partial u_i''}{\partial x_j} + \frac{\partial u_j''}{\partial x_i} \right)} \tag{19} \]
Turbulence production due to buoyancy effects is considered in the buoyancy production term, with β = ∂ρ/∂ϕ:
\[ G_{ij} = \beta\left( g_i\, \tilde{\Phi}_j + g_j\, \tilde{\Phi}_i \right) \tag{20} \]
2.3 Turbulent Scalar Flux Model
Similar to the modelling of the Reynolds stresses, a simple model based on the eddy viscosity is widely used to calculate the turbulent scalar fluxes:
\[ \tilde{\Phi}_i = \frac{\nu_t}{\sigma_t} \cdot \frac{\partial \tilde{\varphi}}{\partial x_i} \tag{21} \]
This is the eddy diffusivity model (EDM), where the spatial gradient of the concentration, ∂ϕ̃/∂x_i, is multiplied by the eddy diffusivity ν_t/σ_t. The eddy diffusivity is the eddy viscosity ν_t divided by a turbulent Schmidt number σ_t, which is constant. Therefore this approach cannot take into account the non-isotropy present in a stratified flow, because with the eddy viscosity the same turbulent value is used for every spatial direction.
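A one-dimensional sketch of Eq. (21) makes the isotropy limitation tangible: whatever the direction, the same scalar eddy diffusivity multiplies the gradient. Grid, profile, ν_t and σ_t below are assumed for illustration only.

```python
# Eddy diffusivity model, Eq. (21), on an assumed 1-D concentration profile.
import numpy as np

x = np.linspace(0.0, 1.0, 101)                        # assumed 1-D grid [m]
phi = 0.5 * (1.0 + np.tanh((x - 0.5) / 0.05))         # stratification-like profile
nu_t = 1e-3        # eddy viscosity [m^2/s], assumed constant
sigma_t = 0.9      # turbulent Schmidt number, assumed

Phi = (nu_t / sigma_t) * np.gradient(phi, x)          # turbulent scalar flux
# Note: the same nu_t/sigma_t would apply in every direction - the model
# cannot distinguish vertical from horizontal mixing in a stratified flow.
```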
Similar to the Reynolds stress equations, the exact equations for the turbulent scalar fluxes can be derived [8]:
\[ \frac{\partial \tilde{\Phi}_i}{\partial t} + \tilde{u}_k \frac{\partial \tilde{\Phi}_i}{\partial x_k} = P_{ijY} + G_{ijY} + D_{ijY} + \Pi_{ijY} \tag{22} \]
The terms on the right-hand side of (22) are as follows.

Mean-Field Production:
\[ P_{ijY} = -\tilde{\tau}_{ij} \frac{\partial \tilde{\varphi}}{\partial x_j} - \tilde{\Phi}_j \frac{\partial \tilde{u}_i}{\partial x_j} \tag{23} \]
Buoyancy Production:
\[ G_{ijY} = -(1 - C_{3Y})\, \beta\, \frac{\overline{\varphi''^2}}{\bar{\rho}} \left( \frac{\partial \bar{p}_{\mathrm{stat}}}{\partial x_i} + \rho_{\mathrm{ref}}\, g_i \right) \tag{24} \]
Diffusive Transport:
\[ D_{ijY} = \frac{\partial}{\partial x_j} \left[ \left( \mu + \frac{2}{3}\, C_Y\, \bar{\rho}\, \frac{\hat{k}^2}{\hat{\varepsilon}} \right) \frac{\partial}{\partial x_j} \left( \frac{\tilde{\Phi}_i}{\bar{\rho}} \right) \right] \tag{25} \]
Pressure-Scalar Gradient Correlation:
\[ \Pi_{ijY} = -C_{1Y}\, \frac{\hat{\varepsilon}}{\hat{k}}\, \tilde{\Phi}_i - C_{2Y}\, \tilde{\Phi}_j \frac{\partial \tilde{u}_i}{\partial x_j} - C_{4Y}\, \tilde{\Phi}_j \frac{\partial \tilde{u}_j}{\partial x_i} \tag{26} \]
The variance \(\overline{\varphi''^2}\) in the buoyancy production term is calculated with an additional transport equation:
\[ \frac{\partial \overline{\varphi''^2}}{\partial t} + \tilde{u}_j \frac{\partial \overline{\varphi''^2}}{\partial x_j} = P_{jYY} + D_{jYY} + \varepsilon_{YY} \tag{27} \]
The terms on the right-hand side of (27) are as follows.

Mean-Field Production:
\[ P_{jYY} = -2\, \tilde{\Phi}_j \frac{\partial \tilde{\varphi}}{\partial x_j} \tag{28} \]
Diffusive Transport:
\[ D_{jYY} = \frac{\partial}{\partial x_j} \left[ \left( \mu + \frac{2}{3}\, C_{YY}\, \bar{\rho}\, \frac{\hat{k}^2}{\hat{\varepsilon}} \right) \frac{\partial}{\partial x_j} \left( \frac{\overline{\varphi''^2}}{\bar{\rho}} \right) \right] \tag{29} \]

Dissipation:
\[ \varepsilon_{YY} = -2\, C_{1YY}\, \frac{\hat{\varepsilon}}{\hat{k}}\, \overline{\varphi''^2} \tag{30} \]
The initial values for the model coefficients are C1Y = 2.9, C2Y = 0.4, C3Y = 0.55, C4Y = 0.0, CY = 0.15, CYY = 0.2 and C1YY = 1.0.
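For parameter studies it is convenient to keep these coefficients in one place, e.g. in a small script that writes them into a CCL fragment. A trivial sketch follows; the dictionary name is arbitrary.

```python
# Initial TSF model coefficients as given above, collected for scripting.
TSF_COEFFS = {
    "C1Y": 2.9, "C2Y": 0.4, "C3Y": 0.55, "C4Y": 0.0,
    "CY": 0.15, "CYY": 0.2, "C1YY": 1.0,
}
# Example: the coefficient study in Sect. 4.1.2 varies C3Y and C4Y.
variant = {**TSF_COEFFS, "C3Y": 0.0, "C4Y": 0.3}
```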
2.4 Integration Domain and Boundary Conditions

2.4.1 TH-20
For the numerical investigation, the TH20.8 experiment is used as the experimental case. The geometry of the experiment is rotationally symmetric, as are the experimental boundary conditions. Therefore a two-dimensional integration domain is used to simulate the experiment (Fig. 1, left). The two-dimensional domain is realised with a 1° wedge with a thickness of one cell. The first approach to generating the jet was to replace the inner part of the inner cylinder with an outlet boundary at the bottom of the inner cylinder and a velocity inlet boundary condition at the top of the nozzle. The helium is transferred from the outlet to the inlet to prevent a loss of helium. A problem with this approach is that the influence of the increasing helium concentration on the generation of the jet cannot be covered. The fan, which is responsible for the jet generation, applies a pressure gradient to the fluid; it acts as a momentum source. If the density of the fluid decreases with an increasing concentration of helium, the volumetric flow rate
Fig. 1 Experimental case: integration domain (left) and initial helium distribution (right)
will increase. This results in a greater jet velocity. Another minor problem is the shift of helium between outlet and inlet. Because the helium bypasses the interior of the inner cylinder, it arrives at the inlet faster than in reality. This can influence the result, because a higher helium concentration in the jet means a less sharp density gradient between jet and density layer, which results in better mixing, which in turn leads to a higher helium concentration at the outlet. This effect is cumulative and can influence the result given the long transient. An additional problem is associated with the nozzle as the origin of the jet. The mixing rate of a jet originating from a nozzle is larger compared to a long pipe as origin. The so-called vena contracta effect appears if the origin of a jet is a nozzle or orifice and must also be considered. Vena contracta is the constriction and acceleration of a jet depending on the opening angle of the nozzle. To solve these problems, the interior of the inner cylinder is also part of the integration domain and there are no inlet or outlet boundaries. The fan is modelled as a momentum source. The volume of the fan in the 2D wedge is 3.3·10^-4 m³ and the applied momentum is 10.43 kg/(m² s²), which was determined iteratively using velocity measurements of a pilot test of the inner cylinder. All walls are modelled as smooth without slip. Symmetry boundary conditions are used for the symmetry axis and the sides. Figure 1 shows the initial helium distribution. The fluid used is a variable-composition mixture containing air as an ideal gas and helium. The helium concentration is the passive scalar transported by the concentration equation, and air is the constraint component. Buoyancy must be considered due to the variable density. The reference density for the buoyancy treatment is 0.179 kg/m³ and the gravitational acceleration is 9.81 m/s². The initialisation and reference pressure is 1.168 bar. The initialisation temperature is 24.3 °C.
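Why the jet speeds up as helium accumulates can be estimated with a simple scaling argument: for a fixed momentum input the velocity grows roughly with the inverse square root of the density, and the mixture density drops with the helium mass fraction. The sketch below uses approximate gas densities at the stated pressure and temperature; it is a simplification for illustration, not the CFX fan model.

```python
# Jet-velocity tendency for a fixed momentum source as helium is entrained.
import numpy as np

rho_air, rho_he = 1.37, 0.19     # approx. densities at 1.168 bar, 24.3 C [kg/m^3]
phi = np.linspace(0.0, 0.3, 4)   # assumed helium mass fraction in the sucked-in gas

# mass-fraction-based mixture density (ideal-gas mixing by volume)
rho_mix = 1.0 / (phi / rho_he + (1.0 - phi) / rho_air)
v_rel = np.sqrt(rho_mix[0] / rho_mix)   # velocity relative to the pure-air jet
print(np.round(v_rel, 2))               # values > 1: faster jet for lighter gas
```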
Fig. 2 TH-22 boundary conditions
2.4.2 TH-22
The investigation of the TH-22 experiment was part of a blind calculation exercise. The only known boundary conditions were the wall temperatures, T_hot = 119 °C and T_cold = 41 °C, and the average initial pressure and temperature inside the vessel, p_ini = 1.21 bar and T_ini = 91 °C (Fig. 2). The fluid inside the vessel is air, which is treated as an ideal gas.
3 Numerical Parameters and Grids

3.1 TH-20
The numerical parameters used to simulate the experimental case are the same with and without the TSF model, except for the changes to the CCL needed to activate the TSF model. It is a transient calculation with a constant time step of 0.01 s and an initial time step of 0.001 s. The transient scheme is first order backward Euler. The first order transient scheme is necessary to ensure a stable run with the customised solver. Buoyancy production and dissipation are activated for the turbulence model. In the solver control, the turbulence numerics are set to first order. The spatial advection scheme is high resolution. The choice of a spatial advection scheme in CFX is realised through a blending factor β, which blends between a first and a second order upwind differencing scheme:
\[ U_{\mathrm{ip}} = U_{\mathrm{up}} + \beta\, \nabla U \cdot \Delta \vec{\psi} \tag{31} \]
Fig. 3 Position of the monitor points in the TH-20 simulation
U_ip is the value at the integration point, U_up is the value at the upwind point, ∇U is the average of the adjacent nodal gradients and ψ⃗ is the vector from the upwind node to the integration point [14]. With β = 1 it is a second order upwind scheme. Because this scheme is unbounded, it may lead to non-physical oscillations in regions of rapid solution variation. The high resolution scheme is of second order where possible, with β = 1; to prevent non-physical oscillations it decreases the value of β where necessary. The maximum number of iterations (coefficient loops) per time step is 20, the minimum number of iterations is two. The convergence criterion is a maximum residuum of 10^-3. An important step for the post-processing of the experimental case is the definition of monitor points for the helium concentration. The positions of these points are the positions of the measuring points in the experiment, see Fig. 3. Using the monitor points, the mixing of helium can be analysed during the solver run. It would be possible to generate the mixing curves later with the CFX post-processor, but this requires transient result files at the frequency of the desired resolution of the curves. It is therefore highly recommended to use the monitor points, because there the values of every time step are available. The numerical set-up for the two-dimensional steady case is similar to the set-up of the experimental case. It is also a transient calculation with the same time step of 0.01 seconds. Once it reaches the steady state, small fluctuations of certain values, like the Reynolds stresses, can occur. Therefore a transient averaging of those values can be performed, since the average value is constant due to the steady state.
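A minimal numerical illustration of the blending in Eq. (31) follows; all values are invented for demonstration.

```python
# Blended first/second order upwind value at an integration point, Eq. (31).
def blended_value(u_up: float, grad_u: float, dpsi: float, beta: float) -> float:
    """beta = 0: first order upwind; beta = 1: second order upwind."""
    return u_up + beta * grad_u * dpsi

# With beta = 1 the full gradient correction is applied:
print(blended_value(u_up=1.0, grad_u=2.0, dpsi=0.05, beta=1.0))  # 1.1
print(blended_value(u_up=1.0, grad_u=2.0, dpsi=0.05, beta=0.0))  # 1.0
```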
Fig. 4 Blocking strategy for the experimental case
The geometry of the TH-20 experiment needs a nested arrangement of two C-grids and an L-grid. The left hand side of Fig. 4 illustrates the three grid types. The first C-grid is placed at the wall of the vessel (red line in Fig. 4). The second C-grid is used to describe the inner part of the inner cylinder (green line in Fig. 4). Finally, the L-grid describes the wall of the inner cylinder and the shear layer of the free jet (blue line in Fig. 4). The final grid has 31,094 elements. A grid dependency study has been performed. According to the best practice guidelines, the grid was strongly refined to 74,854 elements. Then the refined grid was refined a second time to 310,467 elements, which is ten times the original grid. It turned out that the original grid with 31,094 elements is sufficient, because the use of a finer grid yields the same results.
3.2 TH-22
The TH-22 experiment is simulated with a Scale-Adaptive Simulation (SAS). The idea of a SAS is to directly resolve the large eddies where the spatial discretisation is sufficient. The smaller eddies and the regions with a coarser grid are modelled with an SST model. The top part and the sump have been neglected for the SAS. This was done because it can be assumed that those regions of the vessel have no impact on the actual flow. The TH-22 flow is statistically steady, but a SAS is a scale-resolving method and therefore a transient calculation must be performed. A fixed time step of 0.001 s is used. Since the physical problem in question is a natural convection flow, buoyancy
is considered. The spatial advection scheme is high resolution, the transient scheme is second order backward Euler, and the turbulence numerics are first order. The maximum number of iterations (coefficient loops) per time step is 10, the minimum number of iterations is two. The convergence criterion is a maximum residuum of 10^-3.
4 Results

4.1 TH-20

4.1.1 TSF Model Results and Comparison to Experiments and EDM
The first step of the investigation of the experimental case with the TSF model is a simulation with the original model coefficients and a comparison with the eddy diffusivity model and the experimental data. The compared values are the helium concentrations at certain monitor points, dictated by the availability of experimental data: only very limited velocity information and no turbulence information is available for the experimental case. The monitor points used are at different heights in the experimental facility to cover the advancement of the density layer towards the ceiling. For an overview of all measurement positions of the experimental case see Fig. 3. All monitor points used for the comparison are at x = 1.078 m, which is located horizontally between the interaction region of the jet with the density layer and the wall of the vessel. The naming convention for the monitor points of the original experimental data is adopted. The relevant monitor points, sorted by height, are shown in Table 1. The charts in Fig. 5 show the helium concentration at a given monitor point over time. Initially, all monitor points are inside the light gas cloud with helium concentrations larger than 30%. The measured time starts with the generation of the free jet by the fan. After the jet reaches the density layer, the layer becomes narrower. This behaviour is explained in the discussion of the steady case. If two cases have a similar mixing speed, the layer thickness can be seen in the charts in Fig. 5: a steeper time gradient of helium at a monitor point means a narrower density layer. But this is only true if the mixing time is similar; otherwise the time gradient of helium is also affected by the advancement speed of the layer towards the ceiling. The eddy diffusivity model shows the expected large discrepancy to the experimental data. It can be seen that the longer the calculated transient is, the larger
Table 1 Position of the monitor points used for comparison

name    203     209     214     210     202     215
x [m]   1.078   1.078   1.078   1.078   1.078   1.078
z [m]   6.27    6.60    6.93    7.20    7.49    7.99
Fig. 5 Helium concentration at different monitor points
the discrepancy becomes. The reason for this behaviour is that the under-prediction of the mixing by the eddy diffusivity model accumulates over time. Using the turbulence scalar flux model yields a significant improvement of the mixing. Here the positive effect of the TSF model becomes clearer with a longer calculated transient. The curves of the TSF and eddy diffusivity models at monitor point 203 are quite similar. But 0.33 m higher, at monitor point 209, the curve of the TSF model is closer to the experimental data. This trend continues with the progress of the layer towards the ceiling, see monitor point 210. The gap between the curve of the TSF model and the measured data gets smaller, while the gap between the curves of the TSF and eddy diffusivity models gets larger. The reason for this behaviour is the smaller error of the TSF model in predicting the mixing. The TSF model still under-predicts the mixing, and this error is also accumulated: the accumulation of the under-prediction described above for the eddy diffusivity model can be seen for the TSF model in Fig. 5 as well, but it is smaller. Figure 6 shows the advancement of the density layer over the monitor points 203 and 209 calculated with the turbulence scalar flux model. The sequence in Fig. 6 starts at 0 seconds and shows the initial helium distribution. The region of the density layer is rather broad. After 10 seconds, the jet reaches up to the region of 28%
Fig. 6 Helium concentration with TSF model at different times
helium. The density layer in the region of the jet is drastically narrowed. The thickness of the layer closer to the wall is not yet affected by the jet. The next contour plot in Fig. 6 shows the helium distribution after 100 seconds. The jet still has the same length that it had after 10 seconds, and the density layer did not advance further towards the ceiling. What happened in the 90 seconds is a change of the density layer thickness between the jet and the wall: it now has an approximately uniform thickness along its whole length. A small amount of helium is transported downwards and is already reaching the inner cylinder, but the fan is not yet sucking in helium. After 200 seconds the density layer has moved upwards and is now above monitor point 203. The thickness of the density layer is further decreased and more helium is brought down to the lower part of the vessel. The fan is now sucking in a mixture of helium and air, so the jet contains a small helium concentration. The helium in the jet at 200 seconds leads to a less steep helium mass fraction gradient in the interaction region and causes a larger turbulent mass flux. The last contour plot in Fig. 6 shows a further continuation of the effects of the mixing on the density layer thickness and the transport of helium to the lower part of the vessel.
4.1.2 Modification of Model Coefficients
Figure 7 shows a comparison of the helium concentration over time at monitor point 210 for C4Y = 0.3, for C3Y = 0.0, for the combination of both modified model coefficients, and for the result obtained with the original values C3Y = 0.55 and C4Y = 0.0. The improvement of the mixing due to the modification of the model coefficients can be seen. The unmodified TSF model needs the most time for the mixing. A faster
Fig. 7 Improvement of the mixing with the modified model coefficients
mixing is possible with the use of C3Y = 0.0. A slight further improvement towards faster mixing can be achieved with the use of C4Y = 0.3. So far, the results of the investigation of the steady case have been confirmed by the simulations of the experimental case. The TSF model yields a significant improvement over the eddy diffusivity model. The mixing with C3Y = 0.0 is better than with the original model coefficients, and the mixing with C4Y = 0.3 is better than with C3Y = 0.0. The combination of both modifications to the final values leads to the best result in the steady case. The helium concentration curve at monitor point 210 (Fig. 7) shows that the final values improve the mixing compared to all other TSF model coefficient variations.
4.2 TH-22: Parallel Computing
The SAS of the TH-22 experiment did not reach a statistically steady state and was eventually discontinued. However, information about the performance of CFX on the Nehalem cluster of the HLRS could be obtained. The large grid of 8 million cells made parallel computing necessary; 64 CPUs were used in parallel. An interesting figure is the ratio of actual CPU time to wall clock time. For example, a run with a total wall clock time of 22 h 40 min had an average total CPU time of 14 h 45 min, so the ratio for this run is 0.65. This means that only 65% of the total time is spent on the actual simulation and the rest is communication. Because the wall clock time on the HLRS cluster is limited to 24 h, a typical number of time steps that can be calculated in one run is 1890. This equals a physical time of 1.89 s. A contributing factor to the large amount of communication is the size of the transient result files: one file is 3.5 GB. The solver is set to write a transient result file every 0.1 s of physical time. Assuming a total physical time of 1.89 s per
solver run, 18 transient result files are written, as well as the final result file. Considering the transfer of data to the nodes at the beginning of the solver run, a total of 20 · 3.5 GB = 70 GB is transferred.
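The overhead figures above can be reproduced with a few lines of arithmetic:

```python
# Communication overhead and data volume for one 24-h-limited solver run.
wall_clock_h = 22 + 40 / 60   # total wall clock time: 22 h 40 min
cpu_time_h = 14 + 45 / 60     # average total CPU time: 14 h 45 min
print(f"compute fraction: {cpu_time_h / wall_clock_h:.2f}")  # ~0.65

n_files = 18 + 1 + 1          # transient files + final file + initial transfer
print(f"data moved: {n_files * 3.5:.0f} GB")  # 70 GB at 3.5 GB per file
```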
5 Summary and Conclusions
To contribute to the improvement of the safety analysis of light-water reactors, the aim of this study was to improve the capability of computational fluid dynamics methods to predict the mixing of a stable stratification with a free jet. The specific flows investigated were single-phase flows of helium and air. The helium is a replacement for the hydrogen of the loss-of-coolant accident in the experimental case; this replacement was made because of safety concerns of the experimenters. Previous numerical investigations showed a shortcoming of CFD codes in correctly predicting the mixing of a stable stratification. The inability of the turbulence model to consider the non-isotropy of such a stratification was identified as the reason for the poor results. While the Reynolds stress model is capable of calculating non-isotropic Reynolds stresses, a similar model for the turbulent mass flux was not yet available. This work successfully used the non-isotropic turbulence scalar flux model to enhance the Reynolds stress model. The first step of the investigation of the TH-20 experiment with the TSF model was a simulation with the original model coefficients and a comparison with the eddy diffusivity model and the experimental data. The eddy diffusivity model shows the expected large discrepancy to the experimental data. It was observed that a longer calculated transient leads to a larger discrepancy; the accumulation of the under-prediction of the mixing by the eddy diffusivity model was identified as the reason for this effect. Using the turbulence scalar flux model yielded a significant improvement of the mixing. Here the positive effect of the TSF model became clearer with a longer calculated transient. However, the TSF model was still under-predicting the mixing. Simulations were performed with the modified values of the model coefficients for the buoyancy production term and the pressure-scalar gradient correlation that yielded the best agreement with the large eddy simulation in the steady case. The results confirmed the outcome of the steady-state investigation. The use of C3Y = 0.0 led to improved mixing. A slightly better mixing than with C3Y = 0.0 could be obtained with C4Y = 0.3. Finally, the combination of both values led to a further improvement of the mixing and consequently to the best agreement with the experimental data. The reason for the improved mixing is the increase of the turbulent mass flux with the modification of the model coefficients.
References
1. H.-J. Allelein, S. Arndt, W. Klein-Heßling, S. Schwarz, C. Spengler and G. Weber: "COCOSYS: Status of development and validation of the German containment code system", Nuclear Engineering and Design, vol. 238, pp. 872–889, 2008
2. H.-J. Allelein, K. Neu and J.P. Van Dorsselaere: "European validation of the integral code ASTEC (EVITA) first experience in validation and plant sequence calculations", Nuclear Engineering and Design, vol. 235, pp. 285–308, 2005
3. I. Kljenak, M. Babic, B. Mavko and I. Bajsic: "Modelling of containment atmosphere mixing and stratification experiment using a CFD approach", Nuclear Engineering and Design, vol. 236, pp. 1682–1692, 2006
4. M. Houkema, N.B. Siccama, J.A. Lycklama and E.M.J. Komen: "Validation of the CFX4 CFD code for containment thermal-hydraulics", Nuclear Engineering and Design, vol. 238, pp. 590–599, 2008
5. "International Standard Problem ISP-47 on containment thermal hydraulics final report", Nuclear Energy Agency, NEA/CSNI/R(2007)10
6. T. Kanzleiter, A. Kühnel, K. Fischer, M. Heitsch and B. Schramm: "Technical Report THAI Blower Test TH 20", Report No. 150 1325 TH20, Gesellschaft für Reaktorsicherheit, Köln, 2007
7. H. Siekmann and P. Thamsen: "Strömungslehre", vol. 2, Springer, Berlin, 2008
8. W. Rodi: "Turbulence Models and Their Application in Hydraulics, A State-of-the-art Review", vol. 3, A.A. Balkema, Rotterdam, 1993
9. D. Wilcox: Turbulence Modelling for CFD, vol. 2, DCW Industries, USA, 2004
10. W.P. Jones and D. Lentini: "A realisable non-linear eddy viscosity/diffusivity model for confined swirling flows", International Journal of Heat and Fluid Flow, 2008
11. F. Menter: "Two-equation eddy-viscosity turbulence models for engineering applications", AIAA Journal, vol. 32, pp. 1598–1605, 1994
12. E. Laurien and T. Wintterle: "On the numerical simulation of flow and heat transfer within the fuel-assembly of the high-performance light-water-reactor", Proceedings of the KTH-Workshop on Modelling and Measurements of Two-Phase Flows and Heat Transfer in Nuclear Fuel Assemblies, 2001
13. C. Speziale, S. Sarkar and T. Gatski: "Modelling the pressure-strain correlation of turbulence: an invariant dynamical systems approach", Journal of Fluid Mechanics, vol. 227, pp. 245–272, 1991
14. ANSYS CFX User Documentation
Transport and Climate
Prof. Dr. Christoph Kottmeier
Institut für Meteorologie und Klimaforschung, Karlsruher Institut für Technologie, Wolfgang-Gaede-Str. 1, 76131 Karlsruhe, Germany
The challenges of numerical weather prediction, of building projections of our future climate, and of fluid transport modeling differ substantially, although the underlying physical cores of the models often bear similarities. The HPC requirements for simulations of natural systems like the atmosphere and the oceans are in general still increasing, since the model grid resolution is globally much too coarse to cover all energy-containing scales of motion in the atmosphere, in particular the mesoscale (10–1000 km) and the convective scale (100 m–10 km). The aim is that approximations such as parameterizations can be avoided and that the net effects of small scales are calculated at grid resolution. Weather is predicted by integrating a system of nonlinear coupled partial differential equations to obtain future states of the atmosphere from a given initial state. Weather forecasting is successful when the dynamics of atmospheric vortices of different size and the interaction with the surface are properly represented. The project AMMA in this context provides important insights into the effects of high soil moisture from previous rain on the surface fluxes of heat and moisture. Their relative size decides on the probability of new convection and convective precipitation as an essential element of climate in West Africa. The PANDOWAE project, on the other hand, focuses on the development of Mediterranean cyclones, their dynamics and convection, and elaborates by various model runs the factors determining their formation. The main goal is the early identification of cyclones and convective systems leading to high impact weather in the Mediterranean, using numerical simulations by the COSMO forecast model (http://www.cosmo-model.org) for selected cases. Climate cannot be predicted for specific locations and times in this sense, since the processes on different scales interact in an unpredictable way. The relevant processes range from, e.g., cloud microphysics to large hemispheric circulation systems. Climate, being defined by the statistics of weather for a period of 30 years and beyond, also evolves in time, since it responds to varying boundary conditions, both external (changing solar radiation flux, land use) and internal (atmospheric composition).
The slowly varying components of the climate system, i.e. the oceans, ice sheets and soil characteristics, cause a memory effect for the atmospheric changes. Limited-area climate models have been applied for several years in the project HRCM at KIT with CCLM, and now also in WRFCLIM at the University of Hohenheim, to get higher resolution in regions of interest. Their computational and data storage requirements are similar to those of global models. The CPU time requirements also increase as the idea of ensemble modeling becomes more and more popular. By contributing to the CORDEX initiative of regional climate change modeling for Europe and Africa, HRCM and WRFCLIM will also provide input to the next IPCC assessment report on climate change. The oral presentations for both projects will be given next to each other to share experiences. At much higher model resolution, the project TIGRA addresses Direct Numerical Simulation (DNS) and Large Eddy Simulation (LES) of stably stratified turbulent fluids. Simulation of geophysical turbulent flows as mentioned above requires robust and accurate subgrid-scale turbulence modeling. The aim is to prove the reliability of implicit turbulence modeling with the Adaptive Local Deconvolution Method (ALDM). As benchmark results, high resolution DNS data and LES results with an explicit Smagorinsky model are used. The investigated test cases were the transition of the three-dimensional Taylor-Green vortex (TGV) and horizontally forced homogeneous stratified turbulence (HST). In most simulations, the buoyancy Reynolds number was larger than unity. The Froude and Reynolds numbers were chosen to cover the complete range from isotropic Kolmogorov turbulence up to strongly stratified turbulence. The analysis shows that the implicit turbulence model correctly predicts the turbulence energy budget and the spectral structure of stratified turbulence.
The Transport of Mineral Dust Towards Hurricane Helene (2006)
Juliane Schwendike, Sarah Jones, Heike Vogel, and Bernhard Vogel
1 Introduction
West African dust aerosol plumes are the most widespread, persistent and dense [31]. The processes responsible for dust uplift are highly variable in time and space, with an annual peak during summer [10, 31]. Mineral dust particles affect the atmospheric radiation budget directly (e.g. [14, 42]) and indirectly (e.g. [16]). The indirect aerosol effect and the scattering of solar radiation lead to a cooling of the atmosphere, whereas the absorption of radiation by aerosols leads to a warming of the atmosphere and to a suppression of convection, which is called the semi-direct effect (e.g. [15]). Dust emission over North Africa can occur in association with a variety of different weather systems:
• High near-surface wind speeds result from the downward mixing of momentum from nocturnal low-level jets [3]. These can occur in relation to Saharan heat low (SHL) dynamics [22] or with low-level jets generated in the lee of complex terrain, e.g. the Bodélé region in Northern Chad [43, 47].
• The penetration of upper-level troughs to low latitudes [21].
• Density currents due to strong evaporational cooling along precipitating cloud bands over the northern Sahara [21] and along the Saharan side of the Atlas Mountain chain in southern Morocco [19].
• The density currents due to mesoscale convective systems (MCSs) are very effective for the emission of dust and its injection to altitudes favourable for long-range transport. This process is of particular importance at the beginning of the monsoon season, before the growing vegetation rapidly inhibits local dust emission [11] and when greater energy is available to the downdraughts from the convective systems [27].
Juliane Schwendike · Sarah Jones · Heike Vogel · Bernhard Vogel
Institut für Meteorologie und Klimaforschung, Karlsruher Institut für Technologie, Kaiserstr. 12, 76131 Karlsruhe, Germany, e-mail:
[email protected]
• Highly turbulent winds at the leading edge of the monsoon nocturnal flow in the Inter Tropical Discontinuity (ITD) region also generate dust uplift [2]. The thermodynamic characteristics of the dusty layer at the leading edge of the monsoon flow over the Sahel are described by [27].
• Density currents at the leading edge of the Atlantic inflow [13] in association with dust uplift were observed by [35].
The Saharan air layer (SAL) consists of the well-mixed layer above the Saharan desert reaching from the surface up to about 500 hPa, and the elevated Saharan planetary boundary layer, which occurs at the height of the African easterly jet (AEJ), with the monsoon flow underneath it and the free atmosphere aloft. It is characterised by a significant amount of mineral dust aerosols, relatively dry and warm air, and a weakly stable stratification [6, 7, 18, 29, 30]. At the edges of the SHL, the Saharan planetary boundary layer is elevated from the ground and is referred to as the SAL. The SAL can be transported across the Atlantic and into the Caribbean, conserving its thermodynamic structure [6]. [23] proposed a Saharan dust plume conceptual model based on the previous models by [6] and [18], which depict the movement of Saharan air from Africa to the Caribbean and its interaction with African disturbances. The dust plume over the West African coastline is composed of two narrow plumes: one originates over the northern Sahara and the other over the Lake Chad region in the east. Maximum dust concentrations can be found near the axis of the African easterly jet (AEJ). The aerosol optical thickness (AOT) decreases rapidly away from land, with maximum optical depths still located close to the AEJ. The ITD acts to raise the mineral dust layer over the southwesterly monsoon flow into the AEJ core [12], and the AEJ transports the dust across West Africa and the Atlantic, where it can affect tropical cyclogenesis. It is not yet fully understood how the SAL interacts with African easterly waves (AEWs) and influences the genesis and intensity of tropical cyclones. Some studies state that the SAL has a negative effect on tropical cyclones by inhibiting the formation, or reducing the intensity, of tropical cyclones [5, 9, 17, 25, 46]. Other studies found a positive impact. For instance, [40] state that the SAL can act to intensify tropical cyclones when it occurs predominantly in the northwestern part of the storm, and that it leads to a weakening of tropical cyclones when the dry air intrudes within 360 km of the tropical cyclone's centre, mostly in the southwest and southeast. They also attributed weakening to the stabilising effect of the SAL. [4], on the other hand, claims that dust has neither a significant positive nor a negative impact on tropical cyclogenesis. Assessing the effect of mineral dust on the cyclogenesis and intensity of tropical cyclones is a crucial step towards improving our ability to forecast them. During the tropical cyclogenesis of Hurricane Helene (2006), large amounts of mineral dust were emitted and transported across West Africa and the Atlantic. This period was part of the special observation period of the African Monsoon Multidisciplinary Analyses (AMMA) project [34]. Our study aims to investigate the transport of mineral dust towards Hurricane Helene.
2 Numerical Model
The COnsortium for Small scale MOdelling – Aerosols and Reactive Trace Gases (COSMO-ART) model (www.cosmo-model.org) [36, 37, 39, 44, 45] was used for this study. COSMO-ART is a limited-area numerical model which describes the emission and transport of mineral dust aerosols and their interaction with radiation. The model system is fully coupled online, and identical numerical methods are applied to calculate the transport of all scalars. The interaction of mineral dust particles with cloud microphysics is neglected. Mineral dust particles are represented by log-normal distributions. The emission of dust particles is calculated online as a function of friction velocity, soil moisture and surface parameters [1, 39, 44]. Details about the dust emission scheme, the radiation scheme and the online calculation of the optical properties within COSMO-ART can be found in [39]. COSMO-ART is based on the operational weather forecast model COSMO used by several European weather services on different platforms, e.g. the German Weather Service (DWD). The code with the operational settings is well tested and optimised to produce cost-effective forecasts. It is a fully compressible non-hydrostatic model suitable for forecasting atmospheric processes down to the meso-gamma scale. The Arakawa C-grid is used for horizontal differencing on a rotated latitude/longitude grid. In the vertical, a hybrid system with 50 layers up to about 28 km height is applied. The height of the model was increased compared to the standard setup to allow for deep tropical convection. The basic equations are solved using the time-splitting technique of [24]. The comprehensive physics package of COSMO includes a turbulence and surface layer scheme using a prognostic turbulent kinetic energy equation with a 2.5 order closure by [28], with extensions by [32], and a two-category bulk cloud microphysics scheme [8, 26]. Precipitation formation is treated by a Kessler-type bulk microphysics parametrisation [20] including water vapour, cloud water, rain and cloud ice, with column equilibrium for the precipitating phase. The subgrid-scale clouds, on the other hand, are parametrised by an empirical function depending on relative humidity and height. Deep convection is taken into account by the [41] parametrisation. A δ-two-stream radiation scheme [33] is used for the short- and longwave fluxes and the full cloud radiation feedback. The surface layer is parametrised by a stability-dependent drag-law formulation of momentum, heat and moisture fluxes according to similarity theory. The initial and boundary conditions for all runs are taken from 6-hourly European Centre for Medium-Range Weather Forecasts (ECMWF) analyses. Computational details for a typical model setting on the HP XC 4000 at the Steinbuch Centre for Computing can be found in Table 1. We conducted various model runs (around 100) and varied the size of the model domain. The largest model region contained 1000 × 500 grid points. The following results are based on a 144-h model run initialised on 9 September 2006 at 12 UTC. The model domain ranges from 60°W–20°E and 0–45°N and comprises 50 vertical levels. The horizontal resolution is 28 km and the vertical resolution in the boundary layer is enhanced.
Table 1 Typical model settings for a COSMO-ART run

Parameter                                         Example
Horizontal resolution                             0.025°
Vertical levels                                   50
Simulation time                                   144 h
Tasks and nodes                                   64 tasks running on 16 nodes
Sum of CPU-time over all processors (d-hh:mm)     31-21:20
Elapsed time (hh:mm:ss)                           5:10:00
Maximum physical memory by any process (in MB)    1508
Maximum virtual memory by any process (in MB)     4675
Horizontal gridpoints                             231×132
3 Dust Emission and Transport over West Africa Between 9 to 14 September 2006 significant amounts of mineral dust were transported into the vicinity of the developing tropical storm Helene. A number of different dynamical features contributed to the emission and the transport of dust towards Helene. The different pathways of the dust could be identified with the help of COSMO-ART. On 9 September 2006, a low-level positive vorticity anomaly occurred over West Africa (Fig. 1c). It moved along about 18◦ N, crossed the coast of West African and moved towards the southwest, where it merged with a positive vorticity maximum associated with the monsoon depression (Fig. 1d). When this positive vorticity anomaly was collocated with the vorticity maximum of the AEW (Fig. 1b), out of which Hurricane Helene developed, the development of the pre-Helene tropical depression was initiated. This low-level vorticity maximum over land lead to the emission of mineral dust due to increased surface winds (Fig. 2 region A). The other regions where significant emission of mineral dust occurred during this case study are illustrated in Fig. 2. Dust was emitted by the gust fronts of the convective systems over land, due to the Atlantic inflow (Fig. 2 region B), and due to orographical effects at the mountains in the Western Sahara (Fig. 2 region C), the Atlas Mountains (Fig. 2 region D), and north and west of the Hoggar (Fig. 2 regions E and F). Large amounts of mineral dust were transported over the Atlantic in the SAL. Relatively high values of aerosol optical thickness (AOT) (not shown) occurred north and northeast of the developing tropical depression, and were present in the vicinity of the storm during the whole genesis period. As the low-level positive relative vorticity anomaly moved across the West African coast line, large amounts of dust were transported over the Atlantic. A
Fig. 1 Vertical component of relative vorticity and the horizontal wind (m s−1 ) at 700 hPa (upper row) and 1000 hPa (lower row) in the ECMWF analysis. Taken from [35]
A strong temperature gradient occurred near the surface between about 15◦ N and 20◦ N along the West African coast, between the coastal zone and the desert interior, separating cold, stably stratified maritime air in the west from hot, neutrally stratified air over land. This low-level front remained stationary during the day (Fig. 3a) and moved eastwards in the late afternoon and evening hours (Fig. 3b). The most favourable location for frontal propagation was between 17◦ N and 19◦ N. This inland-propagating front was part of the Atlantic inflow [13]. Mineral dust glided up along the isentropes in the baroclinic zone of the Atlantic inflow, and as the front moved inland, increased mineral dust concentrations were found in the same region. This could be observed almost every day. The pattern was modified, however, by the low-level circulation moving westwards across the West African coastline in the late evening hours on 10 September 2006, and by the monsoon flow that reached far north on 11 and 12 September as the monsoon trough over the Atlantic moved westward. The position of the low-level circulation over the eastern Atlantic led to a second maximum in the temperature gradient just offshore. The emission of mineral dust by the Atlantic inflow in this model run was mainly restricted to region B in Fig. 2.

A cross section through the region of largest dust concentrations on 11 September (Fig. 4) shows that the dust was lifted in the region with the strongest potential temperature gradient, caused by the warm air of the Saharan heat low (SHL) to the northeast and the colder air of the maritime region to the southwest.
Fig. 2 Political boundaries of the West African countries, the COSMO model orography (m above mean sea level) with a horizontal resolution of 28 km, and the main source regions of mineral dust (A to F) in the period between 9–14 September 2006. Taken from [35]
The dust was transported up to 600 hPa (Fig. 4b). Isentropic upgliding in the baroclinic zone between the maritime and the Saharan air is indicated at 1300–1400 km along the cross section; this upgliding transported the mineral dust up to a height of about 500 hPa, where it descended slightly. Mineral dust was also transported upwards by the strong turbulent mixing over the SHL during the day, an effect that weakened during the evening hours. At 700 hPa, significant amounts of dust were transported across the Atlantic.

In the afternoon hours of 11 September 2006, another significant dust event occurred in Mauritania (Fig. 5c) due to the enhanced monsoon flow. The mineral dust was lifted at about 12◦ W and 16–18◦ N. Maxima in dust concentration could also be seen at 800 hPa (Fig. 5b) and 700 hPa (Fig. 5a). Another maximum in dust concentration occurred along the West African coastline north of 22◦ N in association with a strong Harmattan (Fig. 5c). A band of dust-enriched air was located north-northeast of the low-level circulation.
Fig. 3 The aerosol mass concentration (shaded, μg m−3 ), the temperature gradient (solid black line displaying the 0.05 and 0.1 K km−1 contour), the 288 K dewpoint temperature (dashed line) indicating the position of the ITD, and the horizontal wind speed (arrows, m s−1 ) at 975 hPa on 10 September 2006 at 15 UTC (a), and at 21 UTC (b). Based on the model run initialised on 9 September 2006, 12 UTC including the dust-radiation interaction (RadDust). Taken from [35]
Fig. 4 a The horizontal wind speed (arrows, m s−1 ) at 700 hPa, the aerosol optical thickness (shaded), the AEW trough (black curvy line) and AEJ axes (purple line) on 11 September 2006, 00 UTC (T+36). The black solid line displays the position of the cross section depicting the mineral dust mass concentration in μg m−3 in b at the same time; pressure is the vertical coordinate in b. Based on the model run initialised on 9 September 2006, 12 UTC including the dust-radiation interaction (RadDust). Taken from [35]
During the following hours, the mineral dust concentrations increased further due to new dust emissions. Moreover, the dust was advected westwards by the AEW at 700 hPa (Fig. 5d), and northwestwards at 800 hPa (Fig. 5e) and 950 hPa (Fig. 5f) by the monsoon flow. Additionally, on 12 September at 00 UTC, considerable ascent occurred between about 14◦ N and 20◦ N along the West African coast. In this region, dust was lifted into the AEW trough, where mineral dust concentrations of the order of 300–900 μg m−3 could be found (Fig. 5d). The dust from the northern dust event was advected towards the south, and the dust was lifted along a northeast–southwest orientated band that was collocated with the position of the intertropical discontinuity (ITD). On 12 September 2006 at 04 UTC, the mineral dust from the southern region of high dust concentration at 950 hPa reached roughly 21◦ N (Fig. 5i). This region was characterised by the convergence between the Harmattan and the monsoon flow.
Fig. 5 Horizontal wind speed (arrows, m s−1 ), mineral dust mass concentration (shaded, μg m−3 ), and vertical velocity (−1.2 Pa s−1 , −0.6 Pa s−1 , −0.2 Pa s−1 contours, i.e. regions of ascent) at 700 hPa (left), 800 hPa (middle), and 950 hPa (right) on 11 September 2006, 18 UTC and 12 September at 00 UTC and 04 UTC. Based on the model run initialised on 9 September 2006, 12 UTC including the dust-radiation interaction (RadDust). Taken from [35]
The mineral dust advected by the Harmattan and the dust transported by the monsoon flow were partly lifted here; the remainder rotated cyclonically around the low-level circulation. At 800 hPa, the mineral dust concentration showed a distinct maximum offshore between 16◦ N and 18◦ N (Fig. 5h). At 700 hPa, in contrast, the maximum dust concentrations were found not in this region but south of it (Fig. 5g). All the previous dust events occurred north of the African easterly jet (AEJ) at the time of their emission and during the transport across West Africa. The dust from this event, however, was transported along about 15◦ N over the Atlantic within the trough of the AEW out of which Hurricane Helene developed.

The convective systems over the Atlantic were embedded in a mesoscale mid- and low-level circulation. The low-level circulation was found to be warmer than its environment. The strong Harmattan deflected the mineral dust transport towards the southwest.
The strong northwesterly monsoon flow enhanced the low-level circulation, and the dust spiralled around the low-level circulation centre until the low-level and the mid-level circulation centres were collocated (in the late evening hours on 12 September; not shown). From this time on, relatively high AOTs could be found within the centre of the developing tropical depression. During the tropical cyclogenesis, bands of dry and dusty air spiralled towards the storm's centre. We have seen that mineral dust is transported by the strong monsoon flow, the Harmattan and the low-level monsoon trough, as well as by the AEJ. Compared to satellite imagery, COSMO-ART has the great advantage that we can distinguish between dry air, dusty air, and dry and dusty air.

We computed trajectories from the vicinity of the tropical storm Helene to determine where the air around the storm came from. Forward and backward trajectories showed that the dry air north and northwest of the tropical storm Helene was a result of subsidence. Backward trajectories were calculated for the region displayed in Fig. 6c and between 2000 and 3500 m height, i.e. a layer around 700 hPa. This region is characterised by relatively dry air and hardly any mineral dust. A region with mineral dust concentrations of the order of 100 μg m−3 could be found westward of the selected region. North of the selected region, an anticyclone occurred between about 24◦ N and 30◦ N. On 11 September 2006 at 06 UTC, 57 hours earlier, the anticyclone was located at about 27–22◦ W and 23–28◦ N (Fig. 6b). On 9 September 2006 at 23 UTC, the anticyclone could be found at 500 hPa between about 18–9◦ W and 23–32◦ N (Fig. 6a). Air parcels that originated from the southern region encircled in Fig. 6a moved towards the northwest, and air parcels from the northern encircled region moved anticyclonically around the high-pressure system. As the anticyclone was displaced westwards, the parcels originating from the southern region in Fig. 6a turned towards the southeast in the region illustrated in Fig. 6b, which is located to the north of the anticyclone. This means that these air parcels also rotated anticyclonically around the high-pressure system while it moved westwards.

On 13 September 2006 at 15 UTC, all air parcels within the selected box (Fig. 6c) with a relative humidity below 30% were traced backwards (Fig. 7). The trajectory calculation shows that the air parcels from both regions illustrated in Fig. 6a rotated anticyclonically around the high-pressure system and meanwhile descended (Fig. 7a). As this relatively moist air descended, it became drier (Fig. 7b). On 13 September 2006 at 15 UTC, the air associated with the anticyclone at 500 hPa was very dry (Fig. 6a). This non-dust-related dry air enhanced the moisture gradient northwest of the developing tropical depression.

Further trajectories (not shown) revealed that the mineral dust was mainly transported by the AEJ, the Harmattan and the monsoon flow. The low-level anticyclone brought dry air towards the low-level monsoon trough. This air rotated cyclonically around the monsoon trough, and when it reached the baroclinic zone along the West African coast, it was lifted up to the height of the AEJ into the trough of the AEW and was transported westwards. This study showed that the dry air in the vicinity of Helene was due to subsidence in the anticyclone and due to the SAL that was transported by the AEJ.
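The backward-trajectory analysis can be illustrated with a minimal sketch: parcel positions are integrated backwards in time through a horizontal wind field with a midpoint (RK2) step, and only parcels drier than the 30% relative-humidity threshold at the seed time are traced, mirroring the filter used for Fig. 7. The functions wind_at and rh_at are hypothetical interfaces to the model output, vertical motion is neglected, and the degree conversion ignores the cos(latitude) factor; the actual trajectory tool used for the study is not specified in the text.

```python
import numpy as np

def backward_trajectories(seeds, wind_at, rh_at, t_end_hours, dt_hours=1.0, rh_max=30.0):
    """Trace parcels backwards from `seeds` (array of shape (n, 2): lon, lat in deg).

    wind_at(t, pos) -> (u, v) in m/s at time t (hours since model start)
    rh_at(t, pos)   -> relative humidity (%) at the seed time
    """
    deg_per_m = 1.0 / 111.0e3                       # rough metres-to-degrees conversion
    pos = seeds[rh_at(t_end_hours, seeds) < rh_max].copy()   # keep only dry parcels
    path = [pos.copy()]
    t = t_end_hours
    while t > 0.0:
        # Midpoint (RK2) step backwards in time through the wind field
        u1, v1 = wind_at(t, pos)
        mid = pos - 0.5 * dt_hours * 3600.0 * np.stack([u1, v1], axis=-1) * deg_per_m
        u2, v2 = wind_at(t - 0.5 * dt_hours, mid)
        pos = pos - dt_hours * 3600.0 * np.stack([u2, v2], axis=-1) * deg_per_m
        t -= dt_hours
        path.append(pos.copy())
    return np.array(path)                            # shape: (time, parcel, 2)
```

Running such a routine at several pressure levels is what allows the dusty, dry, and dry-and-dusty airstreams discussed above to be separated by origin.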
Fig. 6 The relative humidity (%) at a 500 hPa on 9 September 2006 at 23 UTC (T+11), b 650 hPa on 11 September 2006 at 06 UTC (T+42), and c 700 hPa on 13 September 2006 at 15 UTC (T+99). The mineral dust mass concentration (μg m−3 ) is shown as white contours. The black box in c indicates the region from which the trajectories (Fig. 7) were calculated, the encircled regions in a give the regions where the trajectories ended, and in b the turning point of the trajectories is highlighted. The model run was initialised on 9 September 2006, 12 UTC. Taken from [35]
4 Impact of the Dust-Radiation Feedback

To assess the effect of mineral dust in the atmosphere during the cyclogenesis of Hurricane Helene, we compared the simulation in which the dust interacts with the radiation scheme (RadDust) with another simulation in which dust acts as a passive tracer (NoRadDust).
Fig. 7 Backward trajectories starting on 13 September 2006 at 15 UTC and ending on 09 September 2006, 23 UTC. The horizontal box in which the trajectories originate ranges from 43–39◦ W, 14–20◦ N and extends vertically between 2000–3500 m height, i.e. it is a layer around 700 hPa. Only those trajectories are traced back that have a relative humidity lower than 30%. The pressure (hPa) along the trajectories is displayed in colour in a, and the relative humidity (%) in b. The pressure (×100 hPa) is the vertical coordinate. The trajectories are calculated for the model run initialised on 9 September 2006 at 12 UTC. Taken from [35]
The NoRadDust run has the same model setup as the RadDust run; the only difference is that in NoRadDust the dust is treated as a passive tracer and its interaction with radiation is not taken into account. Several differences occur in the potential temperature, zonal and meridional wind fields during the simulated period. The most marked, however, is the intensification of the monsoon trough on 12 September 2006. The potential temperature at 950 hPa north of the low-level circulation centre is about 4 K higher in the RadDust run than in the NoRadDust run (Fig. 8d). High dust concentrations could be found in this region, so the potential temperature increase can be attributed to the absorption of incoming shortwave radiation by the dust particles (Fig. 8b).
Fig. 8 a Differences between the RadDust and NoRadDust model runs in potential temperature (shaded, K) at 950 hPa, and the 200 μg m−3 mass concentration contour. b Differences between the RadDust and NoRadDust model runs in zonal wind (m s−1 , shaded) and meridional wind (2 m s−1 contour interval) at 950 hPa. The dotted line denotes the zero line. The horizontal wind speed (arrows in m s−1 ) from the RadDust run is shown on 12 September 2006 at 12 UTC (T+72). Taken from [35]
A broad region with enhanced dust concentration west of the circulation centre is also characterised by an increase in low-level potential temperature. A marked temperature decrease of up to 4 K occurs over land north of about 18◦ N. It can partly be attributed to an elevated dust layer located at around 900 hPa in this region: the mineral dust absorbs the incoming shortwave radiation and the temperatures in this dust layer increase (not shown), while below, the temperatures decrease as the incoming shortwave radiation is reduced. Associated with these potential temperature differences, we see in the RadDust run an increase in the easterlies and northerlies north of the low-level circulation, and an increase in the westerlies and southerlies south of the low-level circulation (Fig. 8f). This indicates that the monsoon trough has intensified and the low-level vorticity has increased. Although some of the thermodynamical changes are clearly associated with the direct interaction between dust and radiation at the time in question, others cannot be directly connected, indicating that changes in the circulation had already occurred due to the dust-radiation interaction at previous times.

The differences between the RadDust and the NoRadDust runs become more distinct with time. On 15 September 2006 at 06 UTC, dust can be found in the storm's centre in the RadDust run, with even higher concentrations in a narrow band southwest of it (Fig. 9a). The centre of tropical storm Helene had a lower geopotential height in the RadDust run than in the NoRadDust run (Fig. 9b), indicating that the tropical storm develops faster in the RadDust run. The system had a warmer core in the RadDust run (not shown). The storm centres in the two runs are displaced relative to each other by about 2◦ (Fig. 9b).
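How such sensitivity differences can be evaluated is sketched below under stated assumptions: the two runs are assumed to write NetCDF output, and the file names as well as the variable and coordinate names (theta, pressure, time) are hypothetical placeholders for the actual model output conventions.

```python
import xarray as xr

# Hypothetical output files of the two sensitivity runs
raddust = xr.open_dataset("cosmo_art_raddust.nc")
noraddust = xr.open_dataset("cosmo_art_noraddust.nc")

# Potential temperature difference RadDust - NoRadDust at 950 hPa on
# 12 September 2006, 12 UTC (T+72), i.e. the field type shown in Fig. 8
theta_diff = (
    raddust["theta"].sel(pressure=950, time="2006-09-12T12:00")
    - noraddust["theta"].sel(pressure=950, time="2006-09-12T12:00")
)

# Largest warming/cooling attributable to the dust-radiation feedback
print(float(theta_diff.max()), float(theta_diff.min()))
```

Because both runs start from identical initial conditions, any difference field of this kind isolates the accumulated effect of the dust-radiation interaction.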
Fig. 9 a The mineral dust concentration (μg m−3 ) at 850 hPa in the RadDust run, and b the geopotential (10−1 m2 s−2 ) at 900 hPa, for the RadDust run (shaded) and the NoRadDust run (contours). Additionally, the horizontal wind field in the RadDust run is shown on 15 September 2006 at 06 UTC (T+138). Taken from [35]
5 Summary

The AEW out of which Hurricane Helene (2006) developed, the associated weather systems and mineral dust sources, as well as the dust transport, were simulated using the model system COSMO-ART. The model run (RadDust) was compared to satellite images and observational data from the AMMA campaign. The agreement between the observations and the model results at the beginning of the simulation period was found to be reasonably good, but the model run was too long to accurately simulate the whole tropical cyclogenesis.

Between 9 and 14 September 2006, several dust events occurred over West Africa, and the dust was transported along different pathways towards the developing storm Helene. Air at about 500 hPa descended, and thus became drier, while rotating around an anticyclone. This led to particularly dry air north and northwest of the tropical storm Helene. At low levels, the anticyclone also brought dry air towards the monsoon trough.

Mineral dust was transported by the AEW, the Harmattan, the northeasterly trade winds and the monsoon trough. The monsoon flow over West Africa was very strong and reached far north during the analysed period. Dust was lifted up to the height of the AEJ shortly after its emission. Additionally, mineral dust was transported northwestwards, where it reached the monsoon trough. The monsoon flow also transported the dust westwards at low levels. When it reached the baroclinic zone along the West African coastline, the dust was lifted up to about 700 hPa into the trough of the AEW. Another part of the dust was transported northwards; it reached the convergence zone between the Harmattan and the monsoon flow and was either lifted or transported westward at low levels.

When the low-level and the mid-level relative vorticity anomaly were collocated in the late afternoon hours of 12 September 2006, dust could be seen in the centre of the developing tropical depression. The dust concentration in the storm's centre decreased with time due to wet and dry deposition, but relatively high AOTs could be observed in the centre when Helene reached hurricane intensity.
During the development from a tropical depression to a mature hurricane, bands of dry air spiralled towards the storm's centre, and a dry intrusion occurred west and southeast of the storm, close to its centre.

The comparison between the RadDust and NoRadDust model runs showed that the potential temperature increased in regions of near-surface and uplifted mineral dust due to the absorption of radiation by the aerosol particles. In the regions below the elevated dust layer, cooling occurred due to the reduction of incoming shortwave radiation. The monsoon trough was stronger in the RadDust run than in the NoRadDust run. The effect of dust on the tropical cyclogenesis is assumed to depend on the intensity of the dust event, i.e. on the mineral dust mass concentrations. The dust concentrations in the Helene case were relatively low compared to other mineral dust events in West Africa. The temperature increase due to the absorption of radiation by the dust particles led, in the present case, to warming in the tropical cyclone core, which in turn enhanced the system. This may not be the case for more intense dust outbreaks. The main difference from previous studies is that the dust in the present study partly occurs in regions that are relatively moist, especially in the vicinity of the storm's centre. If the air is dry and dust-enriched, it is likely to suppress convection, as pointed out by several other studies. Further analysis of this case is needed to quantify how the interaction between radiation and dynamics influences the tropical cyclogenesis and the storm's evolution. The analysis of potential temperature and potential vorticity budgets, as used in [38], can be applied for this purpose.

Acknowledgements. This project received support from the AMMA-EU project. Based on a French initiative, AMMA was built by an international scientific group and is currently funded by a large number of agencies, especially from France, the UK, the US and Africa. It has benefited from a major financial contribution from the European Community's Sixth Framework Research Programme. Detailed information on scientific coordination and funding is available on the AMMA International web site http://www.amma-international.org.
References

1. S. C. Alfaro and L. Gomes. Modeling mineral aerosol production by wind erosion: Emission intensities and aerosol size distributions in source areas. J. Geophys. Res., 106:18075–18084, 2001.
2. D. Bou Karam, C. Flamant, P. Knippertz, O. Reitebuch, J. Pelon, M. Chong, and A. Dabas. Dust emissions over the Sahel associated with the West African monsoon inter-tropical discontinuity region: A representative case study. Q. J. R. Meteorol. Soc., 134:621–634, 2008.
3. A. K. Blackadar. Boundary layer wind maxima and their significance for the growth of nocturnal inversions. Bull. Amer. Meteor. Soc., 38:283–290, 1957.
4. S. A. Braun. Re-evaluating the role of the Saharan air layer in Atlantic tropical cyclogenesis and evolution. Mon. Wea. Rev., 138:2007–2037, 2010.
5. T. N. Carlson and S. G. Benjamin. Radiative heating rates for Saharan dust. J. Atmos. Sci., 37:193–213, 1980.
6. T. N. Carlson and J. M. Prospero. The large-scale movement of Saharan air outbreaks over the northern equatorial Atlantic. J. Appl. Meteor., 11:283–297, 1972.
7. H. F. Diaz, T. N. Carlson, and J. M. Prospero. A study of the structure and dynamics of the Saharan air layer over the northern equatorial Atlantic during BOMEX. Tech. Memo. ERL WMPO-32, National Hurricane and Experimental Meteorology Laboratory, NOAA, 1976.
8. G. Doms and U. Schättler. A description of the nonhydrostatic regional model LM. Part I: Dynamics and numerics. COSMO documentation, Deutscher Wetterdienst, Offenbach, Germany, www.cosmo-model.org, 2002.
9. J. P. Dunion and C. S. Velden. The impact of the Saharan air layer on Atlantic tropical cyclone activity. Bull. Amer. Meteor. Soc., 85:353–365, 2004.
10. S. Engelstaedter and R. Washington. Atmospheric controls on the annual cycle of North African dust. J. Geophys. Res., 112:D03103, doi:10.1029/2006JD007195, 2007.
11. C. Flamant, J.-P. Chaboureau, D. J. Parker, C. M. Taylor, J.-P. Cammas, O. Bock, F. Timouk, and J. Pelon. Airborne observations of the impact of a convective system on the planetary boundary layer thermodynamics and aerosol distribution in the West African monsoon inter-tropical discontinuity region. Q. J. R. Meteorol. Soc., 133:1–28, 2007.
12. C. Flamant, C. Lavaysse, M. C. Todd, J.-P. Chaboureau, and J. Pelon. Multi-platform observations of a representative springtime case of Bodélé and Sudan dust emission, transport and scavenging over West Africa. Q. J. R. Meteorol. Soc., 135:413–430, 2009.
13. C. M. Grams, S. C. Jones, J. H. Marsham, D. J. Parker, J. M. Haywood, and V. Heuveline. The Atlantic inflow to the Saharan heat low: Observations and modelling. Q. J. R. Meteorol. Soc., 136(s1):125–140, 2010.
14. J. Haywood, P. Francis, S. Osborne, M. Glew, N. Loeb, D. Tanré, E. Highwood, G. Myhre, P. Formenti, and E. Hirst. Radiative properties and direct radiative effect of Saharan dust measured by the C-130 aircraft during SHADE: 1. Solar spectrum. J. Geophys. Res., 108:8577, doi:10.1029/2002JD002687, 2003.
15. J. Helmert, B. Heinold, I. Tegen, O. Hellmuth, and M. Wendisch. On the direct and semidirect effects of Saharan dust over Europe: A modeling study. J. Geophys. Res., 112:D13208, 2007.
16. C. Hoose, U. Lohmann, R. Erdin, and I. Tegen. The global influence of dust mineralogical composition on heterogeneous ice nucleation in mixed-phase clouds. Environ. Res. Lett., 3, doi:10.1088/1748-9326/3/2/025003, 2008.
17. T. A. Jones, D. J. Cecil, and J. Dunion. The environmental and inner-core conditions governing the intensity of Hurricane Erin (2001). Wea. Forecasting, 22:708–725, 2007.
18. V. M. Karyampudi and T. N. Carlson. Analysis and numerical simulations of the Saharan air layer and its effects on easterly wave disturbances. J. Atmos. Sci., 45:3102–3136, 1988.
19. P. Knippertz, C. Deutscher, K. Kandler, T. Müller, O. Schulz, and L. Schütz. Dust mobilization due to density currents in the Atlas region: Observations from the SAMUM 2006 field campaign. J. Geophys. Res., 112:D21109, doi:10.1029/2007JD008774, 2007.
20. E. Kessler. On the distribution and continuity of water substance in atmospheric circulation models. Meteor. Monographs, 10, Amer. Meteor. Soc., Boston, MA, 1969.
21. P. Knippertz and A. Fink. Synoptic and dynamic aspects of an extreme springtime Saharan dust outbreak. Q. J. R. Meteorol. Soc., 132:1153–1177, 2006.
22. P. Knippertz. Dust emissions in the West African heat trough – the role of the diurnal cycle and of extratropical disturbances. Meteorol. Z., 17:001–011, 2008.
23. V. M. Karyampudi, S. P. Palm, J. A. Reagan, H. Fang, W. B. Grant, R. M. Hoff, C. Moulin, H. F. Pierce, O. Torres, E. V. Browell, and S. H. Melfi. Validation of the Saharan dust plume conceptual model using lidar, Meteosat, and ECMWF data. Bull. Amer. Meteor. Soc., 80:1045–1075, 1999.
24. J. B. Klemp and R. B. Wilhelmson. The simulation of three-dimensional convective storm dynamics. J. Atmos. Sci., 35:1070–1096, 1978.
25. K. M. Lau and K. M. Kim. Cooling of the Atlantic by Saharan dust. Geophys. Res. Lett., 34:L23811, doi:10.1029/2007GL031538, 2007.
26. A. Gaßmann. Numerische Verfahren in der nichthydrostatischen Modellierung und ihr Einfluss auf die Güte der Niederschlagsvorhersage. Berichte des Deutschen Wetterdienstes, 221:1–96, 2002.
27. J. H. Marsham, D. J. Parker, C. M. Grams, C. M. Taylor, and J. M. Haywood. Uplift of Saharan dust south of the intertropical discontinuity. J. Geophys. Res., 113, doi:10.1029/2008JD009844, 2008.
28. G. L. Mellor and T. Yamada. Development of a turbulence closure model for geophysical fluid problems. Rev. Geophys. Space Phys., 20:851–875, 1982.
29. D. J. Parker, R. Burton, A. Diongue-Niang, R. Ellis, M. Felton, C. M. Taylor, C. D. Thorncroft, P. Bessemoulin, and A. Tompkins. The diurnal cycle of the West African monsoon circulation. Q. J. R. Meteorol. Soc., 131:2839–2860, 2005.
30. J. M. Prospero and T. N. Carlson. Vertical and areal distributions of Saharan dust over the western equatorial North Atlantic Ocean. J. Geophys. Res., 77:5255–5265, 1972.
31. J. M. Prospero, P. Ginoux, O. Torres, S. E. Nicholson, and T. E. Gill. Environmental characterization of global sources of atmospheric soil dust identified with the Nimbus 7 Total Ozone Mapping Spectrometer (TOMS) absorbing aerosol product. Rev. Geophys., 40:1002, doi:10.1029/2000RG000095, 2002.
32. M. Raschendorfer. The new turbulence parameterization of LM. COSMO Newsletter, 1:89–97, 2001.
33. B. Ritter and J.-F. Geleyn. A comprehensive radiation scheme for numerical weather prediction models with potential application in climate models. Mon. Wea. Rev., 120:303–325, 1992.
34. J.-L. Redelsperger, C. D. Thorncroft, A. Diedhiou, T. Lebel, D. J. Parker, and J. Polcher. African Monsoon Multidisciplinary Analysis (AMMA): An international research project and field campaign. Bull. Amer. Meteor. Soc., 87:1739–1746, doi:10.1175/BAMS-87-12-1739, 2006.
35. J. Schwendike. Convection in an African easterly wave over West Africa and the eastern Atlantic: A model case study of Helene (2006) and its interaction with the Saharan air layer. PhD thesis, Institut für Meteorologie und Klimaforschung, Karlsruhe Institute of Technology, Karlsruhe, Germany, February 2010.
36. J. Steppeler, G. Doms, U. Schättler, H. W. Bitzer, A. Gassmann, U. Damrath, and G. Gregoric. Meso-gamma scale forecasts using the nonhydrostatic model LM. Meteorol. Atmos. Phys., 82:75–97, 2003.
37. U. Schättler, G. Doms, and C. Schraff. A description of the nonhydrostatic regional model LM, Part VII: User's guide. Deutscher Wetterdienst, www.cosmo-model.org, 2008.
38. J. Schwendike and S. C. Jones. Convection in an African easterly wave over West Africa and the eastern Atlantic: A model case study of Helene (2006). Q. J. R. Meteorol. Soc., 136(s1):364–396, 2010.
39. T. Stanelle, B. Vogel, H. Vogel, D. Bäumer, and C. Kottmeier. Feedback between dust particles and atmospheric processes over West Africa in March 2006 and June 2007. Atmos. Chem. Phys. Discuss., 10:7553–7599, 2010.
40. S. Shu and L. Wu. Analysis of the influence of the Saharan air layer on tropical cyclone intensity using AIRS/Aqua data. Geophys. Res. Lett., 36:L09809, doi:10.1029/2009GL037634, 2009.
41. M. Tiedtke. A comprehensive mass flux scheme for cumulus parameterization in large-scale models. Mon. Wea. Rev., 117:1779–1800, 1989.
42. I. Tegen, A. A. Lacis, and I. Fung. The influence on climate forcing of mineral aerosols from disturbed soils. Nature, 380:419–422, 1996.
43. M. C. Todd, S. Raghavan, G. Lizcano, and P. Knippertz. Regional model simulations of the Bodélé low-level jet of northern Chad during the Bodélé Dust Experiment (BoDEx 2005). J. Climate, 21:995–1012, 2008.
44. B. Vogel, C. Hoose, H. Vogel, and C. Kottmeier. A model of dust transport applied to the Dead Sea area. Meteorol. Z., 15, doi:10.1127/0941-2948/2006/0168, 2006.
45. B. Vogel, H. Vogel, D. Bäumer, M. Bangert, K. Lundgren, R. Rinke, and T. Stanelle. The comprehensive model system COSMO-ART – radiative impact of aerosol on the state of the atmosphere on the regional scale. Atmos. Chem. Phys., 9:8661–8680, 2009.
46. S. Wong and A. E. Dessler. Suppression of deep convection over the tropical North Atlantic by the Saharan air layer. Geophys. Res. Lett., 32:L09808, doi:10.1029/2004GL022295, 2005.
47. R. Washington and M. C. Todd. Atmospheric controls on mineral dust emission from the Bodélé depression, Chad: Intraseasonal to interannual variability and the role of the low level jet. Geophys. Res. Lett., 32:L17701, doi:10.1029/2005GL023597, 2005.
Numerical Modelling of Mediterranean Cyclones

Claus-Jürgen Lenz, Ulrich Corsmeier, and Christoph Kottmeier
1 Introduction

Among the different synoptic systems occurring in Europe, Mediterranean cyclones belong to the most striking weather systems. In many cases, Mediterranean cyclones are connected with high impact weather (HIW): extreme precipitation and/or storm events. Precipitation initiated by Mediterranean lows can be of convective origin, resulting in local and regional thunderstorms with very high rain rates, whose duration is usually restricted to a few hours at most and which often lead to local flash floods (e.g. at Atrani near Amalfi, Italy, on 09 September 2010). Synoptic lifting at the front side or near the cyclone centre may result in large-scale, high-intensity precipitation lasting up to a few days, depending on the movement of the cyclone core, and causing widespread flooding, especially in flat terrain. Deep convection and large-scale lifting can also occur simultaneously, so that the high rain rates of both processes accumulate. Furthermore, the local and regional precipitation patterns are influenced by the distribution of land and sea surfaces and their different characteristics with regard to surface roughness and turbulent surface heat fluxes. The influence of mountain ranges on precipitation is mainly due to their interaction with the synoptic current: the air flow is either channelled around the mountain ranges, or air masses ascend and descend during the overflow, leading to the formation, shifting or dissolution of large-scale precipitation areas. A combination of all these factors may cause locally extreme rain rates exceeding 300 mm/day.

It can be seen that the precipitation patterns initiated by Mediterranean cyclones cover a wide spatial and temporal range, from local thunderstorm cells (meso-γ) through mesoscale systems to synoptic lifting (meso-α). The wide range of scales covered by the precipitation patterns, the factors influencing these patterns, and the preceding formation of the Mediterranean cyclones pose a great challenge to numerical weather forecasting.

Claus-Jürgen Lenz · Ulrich Corsmeier · Christoph Kottmeier
Institut für Meteorologie und Klimaforschung (IMK-TRO), Karlsruher Institut für Technologie (KIT), 76131 Karlsruhe, Germany, e-mail:
[email protected]
The accuracy of the weather forecast with respect to extreme rain rates is of high importance to avoid the loss of lives and to reduce the socio-economic impact in the affected regions. The development of Mediterranean cyclones, their dynamics and convection, and the elaboration of the factors determining their formation are part of the research project PANDOWAE (Predictability ANd Dynamics Of Weather systems in the Atlantic-European sector, http://www.pandowae.de; Grams and Jones [8]) funded by the German Research Foundation (DFG). The main goal of this subproject of PANDOWAE is the early identification of cyclones and convective systems leading to high impact weather in the Mediterranean, using numerical simulations with the COSMO forecast model (http://www.cosmo-model.org) for selected cases of Mediterranean cyclones. The model results will be used to investigate the factors leading to cyclogenesis in the Mediterranean, to estimate the importance of these factors by sensitivity studies, and finally to assess the forecast quality of COSMO for the selected cases.
2 Climatological Considerations

To estimate the frequency of occurrence of high precipitation events in the northern part of the western Mediterranean region, the daily precipitation analysis from the MAP project [6], version 4.0, has been used. The precipitation data retrieved from the MAP Data Centre [5], representing daily precipitation sums for grid cells of 0.22◦ in the latitudinal and 0.30◦ in the longitudinal direction, are available for a period of 25 years, from 1971 to 1995. Precipitation amounts exceeding 50 mm/day and 100 mm/day in a grid box were taken as high rain rates, and the spatial distribution of the absolute frequency of days fulfilling these criteria was calculated for both thresholds. Only grid cells whose precipitation data availability covers at least 90% of the mentioned 25-year period have been taken into account.

In the upper row of Fig. 1 the absolute frequency of days with precipitation sums exceeding 50 mm (left) and 100 mm (right) is shown. The highest occurrence of daily precipitation exceeding 50 mm/day is found in the Ticino region and in the most southeasterly part of the Alps (Julian and Carnian Alps, Karawanken). The percentage of such high precipitation days amounts to up to 3% of the considered 25-year period, or an average of 11 days per year. Further regions with an enhanced occurrence of daily precipitation exceeding 50 mm are the Cevennes mountains in southern France, the Ligurian coastal mountain ranges and the edge of the Alps in the Italian province of Piedmont. The occurrence of daily precipitation sums > 100 mm is mainly restricted to the above-mentioned areas, with the highest frequency again in the Ticino region, the most southeasterly Alps and the Cevennes. These results were achieved with precipitation data gained from rain-gauge measurements converted to grid cells with a considerable size of about 400 square kilometres. Therefore, locally restricted and extremely high precipitation rates measured by single rain gauges can be reproduced neither in the MAP data nor in the above statistical considerations.
For single rain gauges, the frequency of high precipitation events may be considerably higher than the corresponding area averages listed in the MAP data base.

In a further step, the described high precipitation events have been related to synoptic large-scale weather conditions according to Gerstengarbe and Werner [7], to reveal the typical weather situations most frequently leading to high daily rain rates. In the bottom panel of Fig. 1, the synoptic weather situations contributing to precipitation > 50 mm are shown. To guarantee statistically significant results, only grid cells with at least 25 rain events exceeding 50 mm/day during the 25-year period covered by the MAP data have been taken into account. The most frequent synoptic weather situation (left) leading to high precipitation events in the Ticino region and at the Ligurian coastal mountain range is "trough over Western Europe", with a synoptic southerly current. This weather situation also contributes to high precipitation days in the southeasternmost Alps, together with the weather situation "trough over Central Europe", with a southerly current towards the southeastern Alps, and synoptic-scale westerly flows. In the province of Piedmont, weather situations with southeasterly and easterly currents contribute predominantly to high precipitation.
Fig. 1 Absolute frequency of days with precipitation exceeding 50 mm/day (top left) and 100 mm/day (top right); large scale weather situations generating most frequent (bottom left) and second most frequent (bottom right) daily precipitation > 50 mm (red: “trough over Western Europe”, green: “trough over Central Europe”, yellow-orange: weather situations with easterly/southeasterly flow, violet: weather situations with westerly flow, red-orange: “deep pressure area over Britain”, turquoise: “high pressure ridge over Central Europe”)
The most frequent weather situation for high precipitation in the Cevennes is again "trough over Western Europe". In addition, in the southern part of this region the situation "high pressure ridge over Central Europe", usually coupled with low pressure over the Mediterranean and hence an easterly current at the French Mediterranean coast, often leads to high rain rates. The second most frequent synoptic weather situations (bottom right) leading to high precipitation events seem in most regions to be less systematic than the most frequent situation and hence more random. Only in the Ticino region and at the Ligurian Alps do the weather types "deep pressure area over Britain" and "cyclonic southern situation" dominate. Following these considerations, the search for cases of Mediterranean cyclones and high precipitation events can be focused on large-scale synoptic situations with a trough over Western Europe, and further on situations with large-scale flow from the south and southeast towards the Alps.
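The threshold statistics used in this section can be reproduced with a few lines of array code. The sketch below assumes a plain daily-precipitation array of shape (days, lat, lon) with NaN marking missing values; the variable names are illustrative and do not correspond to the MAP file format.

```python
import numpy as np

def exceedance_frequency(precip, threshold=50.0, min_availability=0.9):
    """Count days with precipitation above `threshold` (mm/day) per grid cell.

    precip : daily precipitation sums, shape (n_days, n_lat, n_lon),
             with NaN marking missing data.
    Cells with less than `min_availability` valid days are masked (NaN),
    mirroring the 90% availability criterion applied to the MAP analysis.
    """
    valid = np.isfinite(precip)
    # NaN compares as False, so missing days never count as exceedances
    counts = ((precip > threshold) & valid).sum(axis=0).astype(float)
    availability = valid.mean(axis=0)
    counts[availability < min_availability] = np.nan
    return counts

# Example with synthetic data: 25 years of daily values on a 3 x 3 grid
rng = np.random.default_rng(0)
precip = rng.gamma(shape=0.4, scale=12.0, size=(25 * 365, 3, 3))
print(exceedance_frequency(precip, threshold=50.0))
```

The same routine applied with threshold=100.0 yields the second frequency map, and relating the flagged days to a catalogue of weather types [7] gives the attribution shown in the bottom panels of Fig. 1.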
3 The COSMO Model

The COSMO model is a limited-area numerical weather forecast model designed for the simulation of meteorological phenomena on the meso-β and meso-γ scale. Its basic version has been developed mainly at the Deutscher Wetterdienst (DWD) under the name "Lokal-Modell (LM)". For some years now, the model development has been continued by the COnsortium for Small-scale MOdelling (COSMO), a group of several national weather services in Europe which use the COSMO model as their operational weather forecast model (http://www.cosmo-model.org).

In COSMO, the basic non-hydrostatic and compressible equations are formulated prognostically for the three wind vector components, the pressure deviation from a predefined basic state, the temperature, the specific humidity, the cloud water and cloud ice content, as well as the rain, snow and, optionally, graupel content. The equations are solved on a numerical Arakawa C-grid [1] applied to a rotated geographical coordinate system, in order to keep the meridian convergence small and to avoid the related metric and numerical problems in the formulation of the equations. In the vertical direction, a hybrid vertical coordinate is used, with terrain-following numerical layers near the surface, flat numerical layers near the top of the model domain, and a weighted transition in between. The distance between the numerical hybrid layers in the vertical direction increases from 20 m near the surface up to 1 km at the model top for a better resolution of the planetary boundary layer. Near the model top, a Rayleigh damping layer [4] is introduced to prevent the reflection of signals from the upper rigid lid of the model domain. For the numerical integration of the prognostic equations, a second-order leapfrog scheme is used together with the time-splitting method by Klemp and Wilhelmson [10], which allows a much shorter time step for the equation terms containing sound waves.
Further and more detailed descriptions of the model dynamics and numerics can be found in [3].

The initialization of the prognostic equations can be done with data from larger-scale regional models or from global models, interpolated or extrapolated onto the numerical grid of the COSMO model. To impress the changes of the large-scale synoptic conditions on COSMO, boundary data are deduced from the forecast or analysis data of the larger-scale models and provided to COSMO as lateral and upper-level boundary data. To prevent the reflection of signals and inconsistencies between the driving data and the data calculated by COSMO near the boundaries, transition layers are introduced at the lateral boundaries, in addition to the Rayleigh damping layer near the model top mentioned above.

In COSMO, a variety of physical parameterization options can be selected. In the model simulations discussed in this article, the subscale turbulence is calculated by a closure of order 2.5 according to Mellor and Yamada [12], using a prognostic turbulent kinetic energy equation. The same scheme is used for the surface-layer parameterization, including a laminar-turbulent roughness layer according to Mironov and Raschendorfer [13]. In the model runs, the calculation of the soil temperature and humidity is performed on 7 soil layers, with a high resolution near the soil surface and an exponentially increasing layer thickness with increasing soil depth, for a better representation of the stronger gradients of temperature and humidity near the soil surface. The short- and longwave radiation fluxes, including cloud-radiation feedback processes, are calculated once per hour using the two-stream radiation scheme by Ritter and Geleyn [14]. Further and more detailed descriptions of the model physics can be found in [2].

Due to the main focus on the formation of Mediterranean cyclones, the selected model domain extends from the Azores in the west to Russia and Greece in the east, and from the Maghreb countries in northern Africa in the south to Scotland and the Baltic Sea in the north. The horizontal numerical resolution has been chosen as 7 km; hence the model domain is mapped onto a horizontal grid with 661 grid points in the longitudinal and 441 grid points in the latitudinal direction. To fulfil the Courant-Friedrichs-Lewy criterion, a time step of 30 s is used for the time integration scheme. The initialization and driving data are deduced from the operational global model GME of the Deutscher Wetterdienst [11], either as forecast or as analysis data. The COSMO forecast has been calculated for 78 or 96 hours, depending on the availability of the driving data. A formal workflow of the entire forecast is shown in Fig. 2. The variation in the storage size of the output data of INT2LM and COSMO results from the length of the simulation, the time interval of the model output and the number of output variables.

Due to the high demand for computational resources with respect to computing time, and because the forecast model is operated at various national weather services, COSMO has been optimized for use on parallel computers. The parallelization uses the Message Passing Interface (MPI). The program code of COSMO is written in Fortran90/95 and compiled with the make command using UNIX shell scripts. The formulation of the model code allows a high portability to various platforms, including HPC systems such as the Cray T3E, IBM-SP, SGI ORIGIN and NEC SX.
Fig. 2 Formal workflow of the COSMO forecast
The model calculations described in this article have been performed on the HP XC4000 supercomputer of the SCC. The interpolation program INT2LM runs on the HP XC4000 on 16 processors/4 nodes. The CPU time summed over all processors amounts to about 6 hours for 32 interpolated files. The application runs interactively, and the elapsed time varies strongly with the availability of computer resources and the time of day of the execution. The COSMO model runs on 64 processors/16 nodes. The CPU time summed over all processors is 20.5 days for a forecast time of 4 days. The wallclock time is approximately 7.5 hours.
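As a quick plausibility check of the chosen time step, the following sketch evaluates the one-dimensional CFL limits for the 7 km grid; the assumed maximum wind speed is an illustrative value, and the acoustic estimate merely indicates why the split-explicit treatment of the sound-wave terms [10] permits the longer 30 s step.

```python
# CFL sanity check for the COSMO setup described above (illustrative values)
dx = 7000.0          # horizontal grid spacing in m
u_max = 120.0        # assumed maximum advecting wind speed in m/s (e.g. jet level)
c_sound = 340.0      # approximate speed of sound in m/s

dt_advective = dx / u_max    # ~58 s: the chosen 30 s step satisfies this limit
dt_acoustic = dx / c_sound   # ~21 s: sound waves require shorter (split) substeps

print(f"advective limit: {dt_advective:.1f} s, acoustic limit: {dt_acoustic:.1f} s")
```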
4 COSMO Model Results

Within PANDOWAE-MED, a typical case of Mediterranean cyclogenesis, which took place from 28 to 29 October 2008, was selected for extensive COSMO simulations and sensitivity studies. Starting on 27 October, a strong trough protruded from the easternmost North Atlantic Ocean and reached Western Europe and the Iberian Peninsula on 29 October 2008 (Fig. 3, upper left). This large-scale synoptic situation fits quite well with the above-mentioned result that high precipitation events are most probable for the weather situation "trough over Western Europe". Until 30 October, the tip of the trough moved further eastward to the western Mediterranean Sea, undergoing further intensification.
On 27 October, 21 UTC, the cold front linked to the protruding trough reached the western Mediterranean Sea in its northwesternmost region, just east of the Pyrenees mountain range, indicated by a marked cyclonic change of wind direction and a postfrontal increase of wind speed (Fig. 3, lower left). Until 29 October, the cold front was strongly deformed by the orography, which either hindered or enhanced the penetration of cold air in the lower troposphere. Between the Pyrenees mountain range, the Cevennes and the French Alps, the cold air mass was channelled and accelerated, whereas on the lee side of the Pyrenees the penetration of the cold air was decelerated. In addition, the cold air proceeded from the southwest via the Strait of Gibraltar into the Mediterranean Sea, leading to lower-tropospheric wind shear, convergence lines (Fig. 3, lower right), as well as localized shallow low-pressure areas in the vicinity of the Spanish coast and the Balearic Islands (Fig. 3, upper right). Later, when the upper-level trough moved eastward, a second low-pressure system developed over the Gulf of Lion and northern Italy on the leading edge of the PV streamer connected with the approaching trough.
Fig. 3 500 hPa topography in gpdm on 29 October 2008, 00 UTC (top left); mean sea level pressure in hPa on 29 October 2008, 00 UTC (top right); temperature at 2 m height in ◦ C and wind vector at 10 m above ground in m s−1 on 27 October 2008, 21 UTC (bottom left); equivalent potential temperature at 850 hPa in ◦ C and wind vector in m s−1 on 29 October 2008, 00 UTC (bottom right)
The strong wind shear and the temperature differences near the sea surface over the Mediterranean Sea gave rise to highly variable surface fluxes (Fig. 4). Where the cold air flowed over the warm sea surface, marked areas of high sensible (250 W m−2 ) and latent heat flux (500 W m−2 ) developed, whereas the fluxes were relatively low at locations where warm air was still advected towards the sea (e.g. south of the Pyrenees near the Spanish Mediterranean coast). At convergence lines with locally low wind speed (e.g. just south of the Balearic Islands or west of Sardinia), significantly reduced fluxes occurred. The high spatial and temporal variability of the flux pattern modifies the energy input into the atmospheric system via the lower boundary on small scales.

For this case of Mediterranean cyclogenesis, we calculated a lagged-average forecast (LAF) ensemble [9] using the COSMO model. The model simulations were started with a time difference of 12 hours in the period between 25 October 2008, 00 UTC and 28 October 2008, 12 UTC. The COSMO model (version 4.6) was initialized and driven by data from the global weather forecast model GME of the Deutscher Wetterdienst [11].
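The LAF construction can be made concrete with a short sketch: ensemble members are ordinary forecasts launched at staggered start times and combined at a common valid time. The start window and the 12-hour lag follow the text; the forecast_value function is a hypothetical placeholder for a routine reading the corresponding COSMO output.

```python
from datetime import datetime, timedelta

def laf_members(first_start, last_start, step_hours=12):
    """Start times of a lagged-average forecast (LAF) ensemble."""
    starts, t = [], first_start
    while t <= last_start:
        starts.append(t)
        t += timedelta(hours=step_hours)
    return starts

def laf_mean(starts, valid_time, forecast_value):
    """Average the member forecasts valid at `valid_time`.

    forecast_value(start, lead_hours) -> forecast value; placeholder for a
    routine that reads the corresponding COSMO output field.
    """
    values = [forecast_value(s, (valid_time - s).total_seconds() / 3600.0)
              for s in starts if s <= valid_time]
    return sum(values) / len(values)

starts = laf_members(datetime(2008, 10, 25, 0), datetime(2008, 10, 28, 12))
print(len(starts), "LAF members")  # 8 members, launched 12 h apart

# Dummy usage: every member returns a constant mean sea level pressure
print(laf_mean(starts, datetime(2008, 10, 29, 0), lambda s, lh: 1008.0))
```

Because the members differ only in start time, the spread among them at a given valid time provides an inexpensive estimate of forecast uncertainty.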
Fig. 4 COSMO simulation of sensible heat fluxes (W m−2 , left) and latent heat fluxes (W m−2 , right) of the cyclogenesis between 28 October and 30 October 2008. Surface fluxes on 29 October, 00 UTC after 48 hours simulation time (upper panel) and on 29 October, 12 UTC after 60 hours (lower panel)
In Fig. 5, the temporal development of the area-averaged mean sea level pressure (top row) and of the pressure minimum of the developing cyclone (bottom row) in a model subdomain covering the western Mediterranean and the surrounding countries, as shown in Fig. 4, can be seen. The left column shows the temporal development in real time, starting on 25 October 2008, 00 UTC and ending on 31 October 2008, 18 UTC. The top right panel shows the bias of the area-averaged pressure with respect to GME analysis data as a function of simulation time; the bias of the pressure minimum value as a function of real time is also shown. After 36 hours, the area-averaged pressure in the considered subdomain started to decrease from values of 1026 hPa to about 1008 hPa. Most of the LAF runs show a negative pressure bias; especially the first run, started on 25 October 2008, 00 UTC, has a negative bias in the mean pressure. The second decrease in the area-averaged pressure after hour 120 is due to a new cyclone moving from the Atlantic Ocean to the Bay of Biscay. The area-averaged pressure bias as a function of forecast time is, systematically for all runs, ±1 hPa or less up to a lead time of 42 hours. After this time, the bias increases strongly, to more than 3 hPa.
Fig. 5 Temporal development of area averaged pressure reduced to mean sea level starting on 25 October 2008, 00 UTC (top left); dependency of bias of area-averaged mean sea level pressure on simulation time (top right); temporal development of minimum value of mean sea level pressure starting on 25 October 2008, 00 UTC (bottom left) and of bias of minimum value of mean sea level pressure starting on 25 October 2008, 00 UTC (bottom right). All graphs are valid for the model subdomain shown in Fig. 4
The pressure minimum value in the model subdomain started to decrease after 48 hours (corresponding to 27 October 2008, 00 UTC). After a strong decrease, the deepening of the cyclones over the Mediterranean slowed down until 29 October 2008, 12 UTC (after 108 hours), with a subsequent strong decrease again until 120 h (30 October 2008, 00 UTC), corresponding to a strong deepening of the cyclone in the Gulf of Lion when it came onto the leading edge of the PV streamer mentioned above. The pressure minimum after 120 hours is dominated by the new cyclone moving from the Atlantic towards southwestern France. As for the area-averaged pressure, the minimum pressure values calculated by the LAF runs are again mostly lower than those from the analysis data, with deviations of up to 7 hPa. The start of the cyclogenesis, i.e. the start of the minimum pressure decrease between 48 and 60 hours, seems to be delayed in all COSMO runs, as indicated by the positive bias of the pressure minimum. After 60 hours, the cyclogenesis is mostly simulated as too intense in the LAF runs compared to the analysis data.

The influence of evaporation on the water cycle and on the cyclone development has been examined by varying the transition coefficient over water surfaces in the COSMO model. The transition coefficient has been varied in such a way that the area-averaged evaporation was about 50% (rat_sea = 100) and about 150% (rat_sea = 2) of the evaporation of the reference model run (rat_sea = 20). Both model runs were started on 27 October 2008, 00 UTC and compared to the corresponding reference run from the LAF ensemble. As can be seen in Fig. 6 (top left), the area-averaged evaporation grows with increasing lead time, except for the peaks around noon of the forecast days (after 12, 36 and 60 hours of model time), which are caused by the enhanced evaporation of the land surfaces during daytime. The hourly evaporation rates at the end of the simulation are 1.8 mm/day in the case of rat_sea = 100 and 5.0 mm/day in the case of rat_sea = 2. The maximum values of evaporation appearing locally and temporarily (not shown here) are between 10 mm/day (rat_sea = 100) and 30 mm/day (rat_sea = 2).

As expected, the area-averaged precipitation increases with increasing evaporation over the sea surfaces. Because the model runs are initialized with identical data, the influence of the different evaporation increases with time. At the end of the forecast, the precipitation rate differs by a factor of 3 (Fig. 6, top right). The effect of the different evaporation rates can also be seen in the area average of the vertical water column (not shown). Enhancing the evaporation does not result exclusively in higher precipitation, but also in higher water column values; this means that not only is the water cycle intensified by the higher evaporation, but additional water also remains in the atmosphere.

In the lower panels of Fig. 6, the bias of the area-averaged pressure (left) and the bias of the pressure minimum value (right) are shown. The bias of the area-averaged pressure is positive in the case of reduced evaporation and strongly negative for enhanced evaporation. The development of the small-scale shallow cyclones in the lee of the Iberian Peninsula at first, and the development of the large-scale Mediterranean cyclone near northern Italy later, thus depend on the evaporation and the intensity of the water cycle. This can also be seen when looking at the bias of the pressure minimum value.
In all cases, COSMO calculates a too strong development of the northern Italy cyclone, but in the case of enhanced evaporation the core pressure deviates by more than 10 hPa from the corresponding value deduced from the analysis.

Besides the discussed case from October 2008, COSMO runs were performed for a cyclogenesis which took place on 13 September 2008 (no figures shown here). The development of this cyclone was much less intense than that in October; it cannot even be identified in the area-averaged sea level pressure, but only in a decrease of the pressure minimum value by about 6 hPa in the GME analysis data. One of the deterministic COSMO runs did not even simulate a marked cyclone, but only a large-scale pressure depression over the western Mediterranean Sea. Although the water column in the September case is higher by about 30% than in the October cyclogenesis, the area-averaged precipitation in September is only half as high as in the October cyclone event, whereas the evaporation is of about the same magnitude. A considerably larger part of the humidity seems to be advected out of the region instead of being involved in the water cycle and in the development of the cyclone in the western Mediterranean region.
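A minimal sketch of the area averaging underlying the bias curves discussed above, assuming plain 2-D fields and geographical latitudes as input (the actual post-processing tools are not described in the text):

```python
import numpy as np

def area_average(field, lat_deg):
    """Area-weighted mean of a 2-D field using cos(latitude) weights.

    field   : 2-D array (lat, lon), e.g. mean sea level pressure in hPa
    lat_deg : 2-D array of geographical latitudes (deg) for each grid point
    """
    w = np.cos(np.deg2rad(lat_deg))   # grid cells shrink towards the poles
    return float(np.sum(field * w) / np.sum(w))

def bias(model_field, analysis_field, lat_deg):
    """Bias of the model against the analysis, e.g. COSMO minus GME analysis."""
    return area_average(model_field - analysis_field, lat_deg)
```

Applying such a routine over the subdomain of Fig. 4 at each output time yields the curves of Figs. 5 and 6.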
Fig. 6 Temporal development of area-averaged evaporation (top left); area-averaged precipitation (top right); bias of area-averaged mean sea level pressure (bottom left); bias of minimum value of mean sea level pressure (bottom right). All abscissas start on 27 October 2008, 00 UTC and end on 30 October 2008, 06 UTC. All graphs are valid for the model subdomain shown in Fig. 4
5 Summary and Outlook

In this article, results of climatological considerations with respect to high daily precipitation amounts in the Alps and northern Italy and the corresponding synoptic weather situations have been presented. After a short description of the numerical atmospheric simulation model COSMO, results of model runs performed on the HP XC4000 supercomputer of the SCC have been shown. The model simulations discussed should be considered as a basis for further model calculations to estimate the influence of the turbulent sensible and latent heat fluxes near the sea surface, of the roughness of the sea surface, and of the orography of the Iberian Peninsula and the Alps on the formation of Mediterranean cyclones as well as on the precipitation patterns. A further model simulation with a higher spatial and temporal numerical resolution will be performed, which allows the parameterization of deep convection to be replaced by the direct numerical simulation of this process.

Acknowledgments. This project is funded by the German Research Foundation (DFG) as part of the research unit PANDOWAE (FOR896). We would like to thank the Deutscher Wetterdienst (DWD) for support of the COSMO model and for providing GME data.
References

1. Arakawa, A. and F. Mesinger, 1976: Numerical methods used in atmospheric models. GARP Technical Report 17, WMO/ICSU, Geneva, Switzerland.
2. Doms, G., J. Förstner, E. Heise, H. Herzog, M. Raschendorfer, R. Schrodin, T. Reinhardt, and G. Vogel, 2005: A Description of the Nonhydrostatic Regional Model LM. Part II: Physical Parameterizations. Deutscher Wetterdienst, Offenbach, 139 pages.
3. Doms, G. and U. Schättler, 2002: A Description of the Nonhydrostatic Regional Model LM. Part I: Dynamics and Numerics. Deutscher Wetterdienst (German Weather Service).
4. Durran, D. and J. Klemp, 1982: The effects of moisture on trapped mountain lee waves. Journal of the Atmospheric Sciences, 39, 2490–2506.
5. Frei, C., 2004: Alpine precipitation analysis from high-resolution rain-gauge observations. Available at http://www.map.meteoswiss.ch/map-doc/rr_clim.htm.
6. Frei, C. and C. Schär, 1998: A precipitation climatology of the Alps from high-resolution rain-gauge observations. International Journal of Climatology, 18 (8), 873–900.
7. Gerstengarbe, F. and P. Werner, 1999: Katalog der Großwetterlagen Europas nach Paul Hess und Helmuth Brezowsky 1881–1998, 5., verbesserte und ergänzte Auflage. Potsdam, Offenbach a. M., Germany.
8. Grams, C. and S. Jones, 2011: Modelling the extratropical transition of tropical cyclones and its downstream impact. High Performance Computing in Science and Engineering '10, 479–499.
9. Hoffman, R. and E. Kalnay, 1983: Lagged average forecasting, an alternative to Monte Carlo forecasting. Tellus A, 35 (2), 100–118.
10. Klemp, J. and R. Wilhelmson, 1978: The simulation of three-dimensional convective storm dynamics. Journal of the Atmospheric Sciences, 35, 1070–1096.
11. Majewski, D., D. Liermann, P. Prohl, B. Ritter, M. Buchhold, T. Hanisch, G. Paul, W. Wergen, and J. Baumgardner, 2002: The operational global icosahedral-hexagonal gridpoint model GME: Description and high-resolution tests. Monthly Weather Review, 130 (2), 319–338.
Numerical Modelling of Mediterranean Cyclones
501
12. Mellor, G. and T. Yamada, 1982: Development of a turbulence closure model for geophysical fluid problems. Reviews of Geophysics and Space Physics, 20 (4), 851–875. 13. Mironov, D. and M. Raschendorfer, 2001: Evaluation of empirical parameters of the new LM surface-layer parameterization scheme. Tech. rep., COSMO Tech. Rep. 1, 12 pp. [Available online at http://www.cosmo-model.org/content/model/documentation/techReports/docs/ techReport01.pdf]. 14. Ritter, B. and J. Geleyn, 1992: A comprehensive radiation scheme for numerical weather prediction models with potential applications in climate simulations. Monthly Weather Review, 120 (2), 303–325.
Modelling Near Future Regional Climate Change for Germany and Africa

Hans-Jürgen Panitz, Peter Berg, Gerd Schädler, and Giorgia Fosser
Hans-Jürgen Panitz · Peter Berg · Gerd Schädler · Giorgia Fosser
Institut für Meteorologie und Klimaforschung, Karlsruhe Institut für Technologie (KIT), Karlsruhe, Germany, e-mail: [email protected]

1 Introduction

The scope of regional climate simulations carried out at the Institute for Meteorology and Climate Research (IMK) of Karlsruhe Institute of Technology (KIT) using the regional climate model (RCM) COSMO-CLM (CCLM) has been extended during the last years. Having had the focus first on Southwest Germany [1], the area of interest has then been extended to the whole of Germany for the assessment of changes in flood risk for medium and small mountainous river catchments (CEDIM project: www.cedim.de) [2, 3]. These simulations span the years 1971–2000 to represent the climate of the recent past, and the years 2011–2040 and 2021–2050 to analyse the climate change during the next few decades. In the meantime, simulations are being carried out within the CORDEX framework [4]. CORDEX (Coordinated Regional climate Downscaling Experiment) aims to provide a framework to evaluate and benchmark RCM performance and to design a set of experiments to produce climate projections for use in impact and adaptation studies and as input to the IPCC Fifth Assessment Report (AR5). Although it is the goal of CORDEX to consider all land areas of the world (except the polar regions), Africa has been chosen as the first target region for several reasons. One is the vulnerability of Africa to climate change in terms of impacts on temperature and precipitation patterns, which are strongly related to vital sectors like agriculture, water management, and health. Furthermore, only very few high-resolution climate simulations are available for Africa. Thus, the African region can benefit particularly from the CORDEX framework. CORDEX focuses on scenario simulations which will span the period 1951–2100 in order to include a recent historical period and the entire 21st century. Different from the scenario runs carried out for the fourth IPCC assessment report (AR4, [5]),
which were based on the SRES greenhouse-gas (GHG) scenarios [6], the CORDEX simulations will be based on so-called representative concentration pathways (RCPs) [7], which prescribe GHG pathways throughout the 21st century. To reach the goals of all projects and to assess the uncertainties of the climate projections, it is necessary to create multi-model, multi-member ensembles of climate simulations. In the CEDIM project, downscaling simulations of three realisations of the Global Circulation Model (GCM) ECHAM5 for a control period (1971–2000) and a future scenario period (2021–2050) [8–10] have been carried out using the A1B SRES scenario [6]. Further downscaling simulations have been performed using the results of the Canadian global circulation model (CGCM3, [11, 12]) in order to sample the uncertainty due to the global model used. For CORDEX, downscaling experiments will be performed using the results of the recent version 6 of ECHAM (ECHAM6). In addition, it is intended to use three more GCMs, which have not yet been finally selected. Each of these four GCMs will use at least two of the new RCPs for the simulation of the 21st century. Thus, the whole CORDEX ensemble will consist of 8 members. It is obvious that such simulations require high computational power and large storage capacity for the results.
2 The CCLM Model

The regional climate model CCLM is the climate version of the operational weather forecast model COSMO (Consortium for Small-scale Modelling) of the German Weather Service (DWD). It is a three-dimensional non-hydrostatic model, which means that spatial resolutions below 10 km (considered the limit for hydrostatic models) are possible. The model solves prognostic equations for wind, pressure, air temperature, different phases of atmospheric water, soil temperature, and soil water content. Further details on COSMO and its application as an RCM can be found in [13], on the web page of the COSMO consortium (http://www.cosmo-model.org), and in [14, 15], and [16]. The model is coded in Fortran 90, making extensive use of the modular structures provided in this language. Code parallelisation is done via MPI (message passing interface) on distributed memory machines, using horizontal domain decomposition with a halo of two grid points.
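To illustrate the decomposition idea, the following serial mock-up partitions a horizontal grid into blocks with a two-point halo and fills the halos from the (periodic) neighbours. It is a minimal sketch only: the grid size and the 2x2 block layout are invented, and in the actual Fortran 90/MPI code each block lives on its own rank and the copies below become MPI messages.

```python
import numpy as np

# Serial mock-up of horizontal domain decomposition with a two-point halo.
halo = 2
nx_glob, ny_glob, px, py = 32, 32, 2, 2          # global grid, 2x2 process grid
field = np.arange(nx_glob * ny_glob, dtype=float).reshape(nx_glob, ny_glob)

nx, ny = nx_glob // px, ny_glob // py            # interior size per block
blocks = {}
for i in range(px):
    for j in range(py):
        # allocate the local array including the halo and fill the interior
        loc = np.zeros((nx + 2 * halo, ny + 2 * halo))
        loc[halo:-halo, halo:-halo] = field[i*nx:(i+1)*nx, j*ny:(j+1)*ny]
        blocks[i, j] = loc

def exchange(blocks):
    """Fill the halos from the (periodic) neighbours; corners omitted here."""
    for (i, j), loc in blocks.items():
        west, east = blocks[(i - 1) % px, j], blocks[(i + 1) % px, j]
        loc[:halo, halo:-halo]  = west[-2*halo:-halo, halo:-halo]
        loc[-halo:, halo:-halo] = east[halo:2*halo,   halo:-halo]
        south, north = blocks[i, (j - 1) % py], blocks[i, (j + 1) % py]
        loc[halo:-halo, :halo]  = south[halo:-halo, -2*halo:-halo]
        loc[halo:-halo, -halo:] = north[halo:-halo, halo:2*halo]

exchange(blocks)
```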
3 Regional Climate Simulations Using the HLRS Facilities

3.1 The CEDIM Project

In the CEDIM project "Flood hazard in a changing climate", the goal is to assess flood risk in medium and small river catchments in Germany for the near future. To this purpose, an ensemble of high-resolution RCM simulations with the CLM model has been carried out. The driving models ECHAM5 [17] and CGCM3 [12] are downscaled with the CCLM. The simulations have been carried out in a double-nest procedure, with the first nest at around 50 km resolution, covering Europe, and the second at 7 km, covering all of Germany and including the Alps. The control simulations span the period 1968–2000; the first three years are regarded as model spin-up time and are not considered in further analyses. The future scenario spans the period 2008–2050; again, a spin-up of three years is disregarded in the analyses. Altogether, 14 CCLM simulations have been performed at the HLRS facilities. On the NEC SX-8, the computing requirements were about 12 node-hours per simulation year for the first nest with 50 km resolution, and about 89 node-hours per simulation year for the second nest with 7 km resolution. A node-hour is defined as the CPU time in hours one node needs for the simulation, using all its available cores. Aggregating all simulations and taking into account the whole simulation periods, 154 and 1145 node-days have been needed for the simulations of nest 1 and nest 2, respectively. Ensemble results for the changes in mean temperature and precipitation are presented in Fig. 1. The uncertainty range was calculated with a Monte Carlo based bootstrap method [18], and the 90% confidence level is shown in the plot. Although the ensemble spans a large range of values, there is a general pattern of increasing temperatures throughout the year, and also increases in precipitation for most of the year, except for summer, when a decrease is more likely. Note, however, that the ensemble is biased towards the ECHAM5 simulations, which have a different climate change signal for precipitation than the CGCM3, as can be seen from Fig. 1.
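For readers unfamiliar with the bootstrap estimate behind Fig. 1, the following sketch shows the basic resampling procedure on a toy ensemble. The member values are invented, and the procedure follows [18] only in outline, not in the exact resampling strategy used for the figure.

```python
import numpy as np

# Toy bootstrap of an ensemble-mean change, in the spirit of [18].
rng = np.random.default_rng(42)
delta = np.array([1.1, 0.8, 1.4, 0.9])       # e.g. seasonal temperature change (K)

n_boot = 10000
means = np.empty(n_boot)
for i in range(n_boot):
    # resample the ensemble members with replacement
    sample = rng.choice(delta, size=delta.size, replace=True)
    means[i] = sample.mean()

lo, hi = np.percentile(means, [5.0, 95.0])   # two-sided 90% confidence interval
print(f"mean change: {delta.mean():.2f} K, 90% CI: [{lo:.2f}, {hi:.2f}] K")
```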
3.2 The CORDEX Framework

The efforts within the CORDEX framework began with a series of sensitivity runs in order to identify a model configuration suitable for Africa. It was necessary to take the special meteorological conditions of the Tropics into account. These are, for example, the height of the tropopause, which might extend up to about 20 km (as compared to about 11 km in higher latitudes), and the deep convection processes, which can also reach the height of the tropopause. Due to these conditions, an upper height of 30 km has been chosen as the top of the CCLM model domain. For the first phase of CORDEX, a horizontal grid resolution of 0.44° is required.
Fig. 1 Seasonal changes in mean temperature and precipitation between the A1B scenario period 2021–2050 and the control period 1971–2000. The bars indicate the 90% confidence level, and the red "x" marks the ensemble mean. Numbers 1 to 3 stand for the three ECHAM5 realisations; the "C" denotes the Canadian CGCM3 model
Thus, the horizontal model domain, shown in Fig. 2, has a size of 214 grid points from West to East and 221 grid points from South to North. The number of vertical levels is 35. The simulations have been driven by the ERA-Interim reanalysis [19]. ERA-Interim is an 'interim' reanalysis of the period 1989–present, in preparation for the next-generation extended reanalysis to replace ERA-40. The series of sensitivity runs comprised variations of the model's dynamical core (Runge-Kutta versus Leapfrog), the numerical time step, the number of vertical levels, the convection parametrisation, and the soil albedo. The runs have been performed partly on the BWGRID cluster and partly on the NEC SX-8 at HLRS. The model results were more or less insensitive to the change of the dynamical core, the numerical time step, and the number of vertical levels. A striking feature of all runs was the overestimation of summer temperatures over the Sahara,
Fig. 2 Model domain for CORDEX Africa simulation. Red squares indicate the locations of 15 evaluation regions
which can reach 3 °C compared to climatological observations. An example is shown in Fig. 3a. It shows the annual cycle of the mean monthly temperature at 2 m height over the eastern Sahara (SAE, Fig. 2). The CCLM results shown differ only in the dynamical core used. The observational data are based on CRU data, which have been improved by the assimilation of available station data (U. Böhm, PIK Potsdam, pers. comm.). Inspecting the physical parameters used in the standard version of CCLM revealed that the soil albedo for the Sahara seems to be too low. Thus, a new and more realistic dataset for soil albedo has been adapted and implemented into the CCLM (D. Lüthi, ETH Zürich, pers. comm.). The new data have been derived from MODIS (Moderate Resolution Imaging Spectroradiometer) [20]. Figure 3b demonstrates the impact of the new albedo dataset. The temperature decreases systematically, thus improving the model results during summer, although the temperature is still too
Fig. 3 Sensitivity related to different time integration schemes and surface albedo values
Fig. 4 Annual mean bias of 2 m temperature; CCLM results minus PIKCRU observational data. Left: CCLM results using old albedo values. Right: CCLM results using new albedo values
high. During winter, the new albedo data lead to a cold bias. Nevertheless, the new albedo dataset will be used in the CCLM configuration for the CORDEX Africa simulations, since its usage improves long-term climatological values like the annual mean temperature. This is demonstrated in Fig. 4, which shows spatial distributions of the annual mean bias of the 2 m temperature. The left picture shows the result obtained with the old albedo values, the right one the bias obtained with the new dataset. Again, the model results are compared to the modified CRU data.
Fig. 5 Annual cycles of daily precipitation for the West Africa/Southern Sahel region calculated by five different RCMs, all driven by ERA-Interim reanalysis (ERAINT). In addition, satellite-derived climatological observations (TRMM-3B42) and ERA-Interim precipitation data are included
The figures show that especially northern Africa and the Arabian Peninsula are affected by the change of the albedo data. The bias decreases considerably when using the new data derived from MODIS, because the new values are higher than the old ones in these areas. South of the Sahara, the influence diminishes. Based on the results of the sensitivity runs, the final CCLM configuration for the CORDEX Africa simulations has recently been fixed. The final evaluation run for the period 1989–2008, driven by the ERA-Interim reanalysis, has already been finished, and the results are currently being evaluated. A comparison of preliminary results with those of four other regional climate models has already been carried out (Fig. 5). Figure 5 shows the 50-day low-pass filtered annual cycle of daily precipitation calculated by five different RCMs for the West Africa/Sahel region. All models used the ERA-Interim (ERAINT, [19]) reanalysis as initial and boundary conditions. The model results are compared with the satellite-derived TRMM-3B42 daily precipitation data [22], which are available since 1998. In addition, the annual cycle of daily precipitation of the ERA-Interim reanalysis itself is shown. The COSMO-CLM result is denoted as IES-CCLM. Although all models more or less capture the bi-modal structure of the annual precipitation cycle, their results vary strongly around the satellite-derived observations and the ERA-Interim precipitation. The reasons for these discrepancies are unclear. Besides the CCLM evaluation run using a grid resolution of 0.44°, a simulation with a higher resolution of 0.22° has recently been submitted on the NEC SX-8 of HLRS.
Table 1 Computational demands and performance of the CCLM CORDEX Africa simulations on the NEC SX-8, and disk storage needed at HLRS. All values are valid for one simulation year. Domain size of the 0.44° simulation: 214*221*35; domain size of the 0.22° simulation: 427*441*35

Resolution   Node-days per year   Average vector length   Average vector ratio (%)   Average MFLOPS   Disk storage (GB)
0.44°        0.7                  200                     98                         3800             260
0.22°        4.3                  205                     99                         4080             1040
Since the model domain is identical to that of the 0.44° run (Fig. 2), the number of horizontal grid points doubles in each direction, while the number of vertical levels remains 35. Because the grid spacing has been reduced by a factor of two, the numerical time step has to be lowered for the 0.22° simulation; a time step of 120 s has been chosen (240 s in the 0.44° case). Thus, the computational size of the higher-resolution run increases at least by a factor of six compared to the lower-resolution case. The HPC demands, the performance details, and the required storage capacity, scaled to one simulation year, are summarised in Table 1 for both simulations (0.44° and 0.22°).
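A quick back-of-the-envelope check of this factor, using only the numbers quoted above and in Table 1 (a sketch for orientation, not part of the production workflow):

```python
# Cost scaling of the 0.22deg CORDEX Africa run relative to the 0.44deg run,
# based only on the grid sizes and time steps quoted in the text.
nx1, ny1 = 214, 221      # 0.44deg horizontal grid points
nx2, ny2 = 427, 441      # 0.22deg horizontal grid points
dt1, dt2 = 240.0, 120.0  # time steps in seconds

grid_ratio = (nx2 * ny2) / (nx1 * ny1)  # ~4x more horizontal grid points
step_ratio = dt1 / dt2                  # 2x more time steps per simulated year
print(f"theoretical work ratio: {grid_ratio * step_ratio:.1f}x")  # ~8x
print(f"measured node-day ratio (Table 1): {4.3 / 0.7:.1f}x")     # ~6x
```

That the measured ratio stays below the theoretical one is plausible on a vector machine, where the larger problem size improves vector efficiency (cf. the slightly higher average vector length and MFLOPS in Table 1).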
4 Recently Initiated and Future Work

Recent research predicts for the near future an increase in winter precipitation in Central Europe, coupled with a slight decrease and more variability in summer. Despite high regional variability, studies indicate that there will be a rise in the frequency and intensity of extreme summer precipitation events. Such increased heavy summer rainfall on wet and dry soil is likely to increase the erosion risk. In this context, the KLIWA (Klimaveränderung und Wasserwirtschaft; www.kliwa.de) project "Bodenabtrag durch Wassererosion in Folge von Klimaveränderungen" was initiated to assess the impact of climate change on soil erosion in Southern Germany. Since soil erosion is a process acting on small spatial and temporal scales, it requires precipitation data and statistics of very high spatial and temporal resolution from regional climate models. Therefore, the research intends to downscale climate projections to regional scales in order to produce precipitation time series and statistics with a resolution of 1 km for selected locations with high erosion risk. Various project partners will use these data as input to an erosion model. The work will focus on modelling extreme precipitation events at higher spatial and temporal resolution (2.8 km, 1 km; 1 hour, 5 minutes) with the regional model COSMO-CLM for the recent past (ca. 1970–2000) and the near future (2011–2050). The very high spatial and temporal resolutions (1 km and 5 minutes) are required by the erosion models. The purpose of the work is to evaluate the added value of very high spatial and temporal resolution, and how the high resolution affects the precipitation statistics as well as the changes in erosion-related extreme precipitation events in the future.
HLRS resources are crucial to achieve this aim. Sensitivity studies will be performed on four domains for one year at 2.8 km spatial and one-hour temporal resolution. The domain sizes vary between 60*60 and 140*140 grid points. The necessary storage capacity is about 1160 GB for the bigger domain. Using two nodes of the NEC SX-8, a simulation takes about 17 hours to complete in the first case and more than a day in the second. Within a month, the simulations for the period 1970–2000 with 2.8 km and 5-minute resolution will start running; they will require more than a month of computing time, not taking waiting times into account. In the near future, a first attempt at 1 km resolution in climate mode will be performed for the same period. Further large demands for High Performance Computing will also arise within the MIKLIP (Mittelfristige Klimaprognosen) program initiated by the Federal Ministry of Education and Research (BMBF). The IMK will participate in joint projects which aim at regional decadal predictions of climate for Central Europe and the West African monsoon region.
References

1. H.-J. Panitz, G. Schädler, and H. Feldmann (2010): Modelling Regional Climate Change in Southwest Germany. In: High Performance Computing in Science and Engineering '09 [W. E. Nagel, D. Kröner, M. Resch (Eds.)]. doi:10.1007/978-3-642-04665-0, Springer Berlin Heidelberg New York 2010, pp. 429–441.
2. P. Berg, H.-J. Panitz, G. Schädler, H. Feldmann, and Ch. Kottmeier (2010): Downscaling Climate Simulations for Use in Hydrological Modeling of Medium-Sized River Catchments. In: High Performance Computing on Vector Systems 2010 [M. Resch, K. Benkert, X. Wang, M. Galle, W. Bez, H. Kobayashi, S. Roller (Eds.)]. doi:10.1007/978-3-642-11851-7, Springer Berlin Heidelberg New York 2010, pp. 163–170.
3. P. Berg, H.-J. Panitz, G. Schädler, H. Feldmann, and Ch. Kottmeier (2011): Modelling Regional Climate Change in Germany. In: High Performance Computing in Science and Engineering '10 [W. E. Nagel, D. Kröner, M. Resch (Eds.)]. doi:10.1007/978-3-642-15748-6, Springer Berlin Heidelberg New York 2010, pp. 467–478.
4. F. Giorgi, C. Jones, and G. R. Asrar (2009): Addressing climate information needs at the regional level: The CORDEX framework. WMO Bulletin, 58 (3), July 2009, 175–183.
5. IPCC (2007): Summary for Policymakers. In: Climate Change 2007: The Physical Science Basis. Contribution of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change [S. Solomon, D. Qin, M. Manning, Z. Chen, M. Marquis, K. B. Averyt, M. Tignor and H. L. Miller (Eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA.
6. N. Nakicenovic et al. (2000): Special Report on Emissions Scenarios: A Special Report of Working Group III of the Intergovernmental Panel on Climate Change. Cambridge University Press, Cambridge, U.K., 599 pp. Available online at: http://www.grida.no/climate/ipcc/emission/index.htm.
7. R. H. Moss, J. A. Edmonds, K. A. Hibbard, M. R. Manning, St. K. Rose, D. P. van Vuuren, T. R. Carter, S. Emori, M. Kainuma, T. Kram, G. A. Meehl, J. F. B. Mitchell, N. Nakicenovic, K. Riahi, St. J. Smith, R. J. Stouffer, A. M. Thomson, J. P. Weyant, and T. J. Wilbanks (2010): The next generation of scenarios for climate change research and assessment. Nature, 463, 11 February 2010, 747–756, doi:10.1038/nature08823.
8. E. Röckner (2006a): IPCC-AR4 MPI-ECHAM5 T63L31 MPI-OM GR1.5L40 20C3M run no.1: Atmosphere 6 HOUR values MPImet/MaD Germany. World Data Center for Climate. [doi:10.1594/WDCC/EH5-T63L31 OM-GR1.5L40 20C 1 6H].
9. E. Röckner (2006b): IPCC-AR4 MPI-ECHAM5 T63L31 MPI-OM GR1.5L40 20C3M run no.2: Atmosphere 6 HOUR values MPImet/MaD Germany. World Data Center for Climate. [doi:10.1594/WDCC/EH5-T63L31 OM-GR1.5L40 20C 2 6H].
10. E. Röckner (2006c): IPCC-AR4 MPI-ECHAM5 T63L31 MPI-OM GR1.5L40 20C3M run no.3: Atmosphere 6 HOUR values MPImet/MaD Germany. World Data Center for Climate. [doi:10.1594/WDCC/EH5-T63L31 OM-GR1.5L40 20C 3 6H].
11. N. A. McFarlane, J. F. Scinocca, M. Lazare, R. Harvey, D. Verseghy, and J. Li (2005): The CCCma third generation atmospheric general circulation model. CCCma Internal Rep., 25 pp.
12. J. F. Scinocca, N. A. McFarlane, M. Lazare, J. Li, and D. Plummer (2008): The CCCma third generation AGCM and its extension into the middle atmosphere. Atmos. Chem. and Phys., 8, 7055–7074.
13. G. Doms and U. Schättler (2002): A description of the nonhydrostatic regional model LM, Part I: Dynamics and Numerics. COSMO Newsletter, 2, 225–235.
14. C. Meissner and G. Schädler (2007): Modelling the Regional Climate of Southwest Germany: Sensitivity to Simulation Setup. In: High Performance Computing in Science and Engineering '07 [W. E. Nagel, D. Kröner, M. Resch (Eds.)]. ISBN 978-3-540-74738-3, Springer Berlin Heidelberg New York.
15. A. Hense, A. Will, and B. Rockel (2008): Regional climate modelling with COSMO-CLM (CCLM). Meteorologische Zeitschrift, 17, 4, 2008, special issue, ISSN 0941-2948.
16. C. Meissner, G. Schädler, H.-J. Panitz, H. Feldmann, and Ch. Kottmeier (2009): High resolution sensitivity studies with the regional climate model COSMO-CLM. Meteorologische Zeitschrift, 18, 543–557, doi:10.1127/0941-2948/20090400.
17. E. Roeckner, G. Baeuml, L. Bonaventura, R. Brokopf, M. Esch, M. Giorgetta, S. Hagemann, I. Kirchner, L. Kornblueh, E. Manzini, A. Rhodin, U. Schlese, U. Schulzweida, A. Tompkins (2003): The atmospheric general circulation model ECHAM 5. Part I: Model description. Technical Report 349, Max-Planck-Institut für Meteorologie, Bundesstr. 55, D-20146 Hamburg, Germany.
18. B. Efron and R. J. Tibshirani (1993): An Introduction to the Bootstrap. Chapman & Hall, New York.
19. A. Simmons, S. Uppala, D. Dee, Sh. Kobayashi (2006): New ECMWF reanalysis products from 1989 onwards. ECMWF Newsletter 110, Winter 2006/07, 25–35.
20. P. J. Lawrence and Th. N. Chase (2007): Representing a new MODIS consistent land surface in the Community Land Model (CLM3.0). J. Geophys. Res., 112, G01023, doi:10.1029/2006JG000168.
22. G. J. Huffman, R. F. Adler, D. T. Bolvin, E. J. Nelkin, D. B. Wolff, G. Gu, Y. Hong, K. P. Bowman, E. F. Stocker (2007): The TRMM Multisatellite Precipitation Analysis (TMPA): Quasi-Global, Multiyear, Combined-Sensor Precipitation Estimates at Fine Scales. J. Hydrometeorol., 8, 38–55.
High-Resolution Climate Predictions and Short-Range Forecasts to Improve the Process Understanding and the Representation of Land-Surface Interactions in the WRF Model in Southwest Germany (WRFCLIM) Hans-Stefan Bauer, Kirsten Warrach-Sagi, Volker Wulfmeyer, Thomas Schwitalla, and Martin Kirn
1 Introduction and Motivation

Hans-Stefan Bauer · Kirsten Warrach-Sagi · Volker Wulfmeyer · Thomas Schwitalla · Martin Kirn
Institute of Physics and Meteorology, University of Hohenheim, Garbenstrasse 30, 70599 Stuttgart, Germany

¹ D-PHASE: Demonstration of Probabilistic Hydrological and Atmospheric Simulation of Flood Events in the Alpine region (http://www.map.meteoswiss.ch/map-doc/dphase/dphase info.htm)
² COPS: Convective and Orographically-induced Precipitation Study (http://www.cops2007.de)

The use of numerical modeling for climate projections is an important task in scientific research, since it is the most promising means to gain insight into possible climate changes. The quality of the predictions has been constantly improved in recent years, enabled by more powerful supercomputers as well as advanced numerical and physical schemes [e.g. 8, 14, 17]. The understanding of the interaction of anthropogenic climate change with natural climate variability on the global scale (grid boxes of 100 km and coarser) is steadily increasing. In the meantime, regional climate simulations with grid resolutions of 10–50 km have become available in the climate modeling community [e.g. 3, 6, 10]. However, several effects severely limit the improvement of the skill of the simulations on the mesoscale. These include: a) incorrect boundaries of global models resulting from incorrect physics and initial conditions, b) inconsistent physics between global and regional models, and c) poor consideration of orography and of the heterogeneity of land-surface-vegetation properties. These deficiencies result in erroneous regional simulations of feedback processes between the land surface and the atmospheric boundary layer, as well as of the development of clouds and precipitation. The convection parameterization, which is required down to scales of the order of 4 km, has been identified as a key reason for significant systematic errors in quantitative precipitation forecasts (QPF) in regional climate simulations and mesoscale weather forecasting. Within the WWRP projects D-PHASE¹ [9] and COPS² [20, 21], it has been clearly demonstrated that convection-permitting models without a parameterization of convection provide a significant advance with respect
to predictive skill and the reduction of systematic errors, particularly in complex terrain [e.g. 1, 2, 13, 18]. Furthermore, tests of convection-permitting regional climate simulations indicate a significant improvement of the representation of land-surface-atmosphere feedback processes even in orographic terrain [5]. The problems listed above are the reason for a large gap in our knowledge concerning regional impacts of climate change down to the meso-γ scale (2–20 km) [4, 16]. The latter is essential for correctly modeling the distribution of atmospheric variables, particularly precipitation, in complex terrain such as the Black Forest and the Alpine region and in regions with heterogeneous land surface properties. [4] summarized the research needs that should follow IPCC AR4. In addition, the new Strategic Plan 2009–2017 of the World Weather Research Program (WWRP) has recently been released [19]. Both communities come to consistent conclusions concerning the promotion of research in two areas: 1) high-resolution, advanced mesoscale atmospheric ensemble modeling, and 2) high-resolution variational data assimilation, e.g., for testing and improving regional climate models in weather forecast mode. Ensemble simulations are a prerequisite for uncertainty analyses and are therefore highly appreciated by decision makers. The performance of ensemble simulations depends critically on accurate simulations of weather statistics by the single members. Unfortunately, errors in the simulation of atmospheric processes occur in each component of the chain of events leading to the initiation of convection and the development, organization, and decay of clouds and precipitation. The corresponding errors propagate nonlinearly in the modeling system, leading to systematic errors in weather forecasts and climate simulations and to a limitation of predictability. It is therefore a key task in atmospheric sciences to detect and separate error sources, and to reduce the corresponding errors by an improved representation and parameterization of processes, e.g., in the land surface-atmosphere system or the representation of clouds and precipitation. Projects like CORDEX (see Sect. 2) will provide the community with a set of regional climate simulations, which can then be used as a regional climate model ensemble for analysis and further applications.
2 The Model and Experimental Design

The Weather Research and Forecasting (WRF) model [15] is a numerical weather prediction (NWP) model designed for both research and operational applications. The development of WRF has been supported by contributions of universities and research centers to build a next-generation mesoscale forecast model and data assimilation system to advance the understanding and prediction of mesoscale weather. WRF is based on a portable code that is efficient in computing environments ranging from massively-parallel supercomputers to laptops. Its spectrum of physics and dynamics options reflects the experience and input of the broad scientific community as well as recent research results.
Table 1 Number of CPUs used and storage needs for the two CORDEX domains

                                          Domain 1 (0.33°)          Domain 2 (0.11°)
CPUs for running model (WRF-ARW)          160                       160
CPUs for pre-processing (WPS)             32                        32
Disk space for daily results              3.9 GB                    6.3 GB
Disk space for weekly restarts            630 MB                    3.0 GB
Disk space for half year forcing          35 GB + 162 MB + 590 MB   787 MB + 2.8 GB
Disk space for post-processed daily data  > 38 MB*24                > 170 MB*24
It is suitable for a broad span of applications across scales ranging from large-eddy to global simulations. Such applications include real-time NWP, data assimilation development and studies, parameterized-physics research, regional climate simulations, air quality modeling, atmosphere-ocean coupling, and idealized simulations. For the verification run, we are participating in the CORDEX³ project. It is an initiative of the World Climate Research Programme (WCRP) of WMO to coordinate regional climate downscaling research in preparation of the fifth assessment report of the Intergovernmental Panel on Climate Change (IPCC AR5). Some of the main goals of CORDEX are:

• Provide quality-controlled data sets of regional climate downscaling based on information for the recent historical past and 21st century projections, covering the majority of populated land regions on the globe.
• Define a common set of Regional Climate Model domains as well as a standard set of variables, their output frequency, and data format.
• Coordinate a range of regional climate model simulations.

The verification run is performed for the 20-year period (1989–2009) and is driven by ERA-Interim analysis data (http://data-portal.ecmwf.int/data/d/interim daily/). The major aim of this simulation is to investigate the capability of the selected WRF configuration to represent the climate of the recent two decades, and to study the effect of resolution on the results. Figure 1 illustrates the model domains chosen for the CORDEX-Europe simulation. The first domain has a resolution of 0.33° on a rotated grid with 202 × 198 grid points; the second domain has a resolution of 0.11° with 448 × 436 grid points. The first domain is necessary to smoothly downscale the 0.75° ERA-Interim forcing data to the desired 0.11° grid. Table 1 summarizes the general computational demands of our simulations. To study the effect of the convection parameterization on the climate model results, a third domain with a resolution of 0.036° is added for additional simulations.
³ CORDEX: COordinated Regional climate Downscaling EXperiment, http://copes.ipsl.jussieu.fr/RCD Projects/CORDEX/CORDEX.html
Fig. 1 WRF model domains for CORDEX and PAK346. Three nested domains are shown. Red line, inner domain: CORDEX domain (0.11°); black line, outer domain: buffer zone between ERA-Interim and the CORDEX domain (0.33°); white line, Germany and Benelux: CP domain (0.0366°)
For a climate projection run within the DFG-funded PAK346 project "Regional Climate Change: High-resolution probabilistic regional climate projections with emphasis on the interaction between farmland and atmosphere", a simulation from 1989–2030 is set up. This simulation is forced with ECHAM5/MPI-OM [7] global climate simulation data (A1B scenario), which is available at a resolution of 1.875° × 1.875°. This configuration requires an additional outer domain of 0.99° to be added for downscaling. Selected variables for validation are as follows. Surface data: 2 m air temperature, relative humidity, 10 m u and v wind, precipitation, soil moisture (4 levels), soil temperature (4 levels), surface and subsurface runoff, latent heat flux, sensible heat flux, ground heat flux, shortwave and longwave downward radiation, emissivity, shortwave upward radiation, skin temperature, and surface pressure. 3D data: air temperature, horizontal wind speed and direction, and water vapor mixing ratio at 1000 hPa, 850 hPa, 700 hPa, 500 hPa, and 300 hPa. So far, only precipitation has been analyzed; the other variables will be studied during 2011 and 2012.
3 Preliminary Analysis of CORDEX-Europe Results

The downscaling experiment applying WRF to the CORDEX domains of Europe from 1989 to 2009 finished at the end of August 2010. The 120 terabytes of data are stored at the HPSS of HLRS. The variables to be evaluated are currently being extracted. Precipitation was the first variable to be extracted and is currently being analyzed. From experience with previous applications of WRF in Central Europe in weather forecast mode [12], it was decided to use the following parameterizations: the Morrison two-moment microphysical scheme, the new Kain-Fritsch convection scheme, the YSU atmospheric boundary layer parameterization, and the land surface model NOAH. Due to the climate mode of the simulation, the CAM shortwave and longwave radiation schemes were chosen. The 30-arc-second MODIS land-cover data, classified according to the International Geosphere-Biosphere Programme (IGBP) and adapted to NOAH, was chosen for the vegetation; for the soil, in consultation with P2 from PAK 346, the 5-min UN/FAO data was applied, as it comes with the pre-processing package of WRF. Since the ERA-Interim data is available at approximately 0.75°, WRF is applied one-way nested at 0.33° and 0.11°, with time steps of 120 s and 60 s, respectively. The WRF climate simulation for Europe from 1989 to 2009 was completed on 23 August 2010, after 7 months of simulation at HLRS. The result is among the first data sets over the CORDEX domain at this scale (0.11°) applying the latest reanalysis data set from ECMWF (ERA-Interim) as forcing at the boundary. The results are stored 3-hourly (and hourly for selected years) for each variable in both domains, i.e. currently 120 terabytes of data are stored. With this resolution, diurnal cycles may be studied, and the data may be applied as forcing data by other groups, e.g. the agricultural and land surface modeling groups of PAK 346. Precipitation and temperature were chosen to be validated first. Accurate simulation of precipitation statistics is essential for agricultural applications. Although precipitation is one of the most frequently observed meteorological quantities, it is still a challenge to produce gridded observational data sets and to simulate its distribution in space, time, and intensity. For Germany, the REGNIE (Regionaler Niederschlag) data set of the German Weather Service (DWD) provides daily precipitation data on a 1 km grid since 1961. REGNIE is generated from about 1000 precipitation measurement stations, interpolated on a 1 × 1 km grid over Germany considering the station elevations and expositions. To compare the REGNIE dataset with the different model grids, the observations were interpolated to the model grid, which is considered the most appropriate approach [11]. First results from evaluating the annual and seasonal precipitation from 1990 to 2009 against REGNIE data show a wet bias of the model results of the order of 30% for Germany; some areas have a bias of more than 50%. Results for Germany are shown exemplarily in Figs. 2 and 3. Further extensive verification studies are ongoing. Convection parameterizations may cause biases in the water cycle, leading to an overestimation of precipitation (e.g. [1, 12]). Therefore, the analysis of the currently running CP simulations will give important insights into the reasons for the wet bias.
Fig. 2 Precipitation in spring (March–May) 2000 in Germany simulated with WRF on the 0.11° grid for the second domain: (a) simulated with WRF, (b) deviation of the WRF results from the regridded 0.11° REGNIE data
Fig. 3 Same as Fig. 2 but for summer 2002
The verification of the first WRF-NOAH CORDEX runs of IPM over Europe is the subject of a publication by K. Warrach-Sagi (in preparation). The results underline the importance of downscaling climate simulations forced with reanalysis data for evaluation studies prior to climate projections. WRF has been demonstrated to perform well in weather forecast mode; however, each model configuration needs to be tested further in climate mode due to possibly different behavior in long-term simulations.
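As a sketch of the kind of comparison behind Figs. 2 and 3, the snippet below computes a cell-wise relative bias between a model field and observations regridded to the model grid. The arrays are random placeholders, not REGNIE or WRF data, and the actual evaluation workflow of IPM certainly involves more careful regridding and masking.

```python
import numpy as np

# Cell-wise relative bias of a seasonal precipitation sum against
# observations regridded to the model grid (placeholder data only).
rng = np.random.default_rng(1)
ny, nx = 100, 80
obs = rng.gamma(2.0, 100.0, (ny, nx))                # observed seasonal sum (mm)
model = 1.3 * obs + rng.normal(0.0, 20.0, (ny, nx))  # synthetic ~30% wet bias

rel_bias = (model - obs) / obs * 100.0               # percent, per grid cell
print(f"area-mean relative bias: {rel_bias.mean():.1f}%")
print(f"fraction of cells with bias > 50%: {(rel_bias > 50.0).mean():.2f}")
```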
4 Ongoing Simulations

4.1 Convection Permitting Simulations

The CP inner domain (0.036°, approximately 4 km grid) is set up via a nest-down simulation (third domain in Fig. 1). Currently, the results of the 0.11° simulation are further downscaled to 0.036° (for the selected years 1996, 2003, 2007, and 2009) with WRF-NOAH in a CP simulation. In the study regions, during the growing season, dedicated high-resolution simulations with 0.0366° resolution will be performed in order to investigate land-surface model physics and feedback processes between the soil, vegetation, ABL, clouds, and precipitation. By comparison of the 0.11° and 0.0366° simulations, it will be investigated how far the model is capable of simulating orographically-induced flows in complex terrain such as the Black Forest region and the Swabian Jura. This is considered essential, as otherwise large systematic errors such as the windward-lee effect and errors in the simulation of the diurnal cycle of precipitation will remain.
4.2 Climate Projections

A climate projection with WRF-NOAH on domains 1 and 2 (see Fig. 1) is in progress and will presumably be completed in 2012. It is forced with the global climate model ECHAM5/MPI-OM A1B scenario data from 1989–2035. For this purpose, the pre-processing was modified in such a way that an additional coarser domain (0.99°) was added to allow nesting down of the ECHAM5/MPI-OM data into WRF. This simulation is run nest by nest due to the nature and coarse resolution (1.875° (T63), i.e. approx. 200 km) of the ECHAM data.
5 Work Planned

In 2011 and 2012, the following steps are planned: a) the completion and evaluation of the climate projection currently under way, b) further downscaling to 0.036° (CP scale) for selected springs and summers between 1989 and 2009, and c) further evaluation of the 0.11° and 0.0366° simulation results to identify the reasons for the wet precipitation bias of the simulation. The evaluation includes, e.g., comparisons of the model runs with

1. ERA-Interim and ECMWF soil moisture analyses,
2. surface radiation from the Climate Monitoring Satellite Application Facility (CM-SAF) of EUMETSAT,
3. CRU, LUBW, JDC, and REGNIE surface temperature, humidity, and precipitation fields,
4. International Satellite Cloud Climatology Project (ISCCP) cloud fields.

Except for the comparison with ISCCP data, all tools for the model comparisons have already been developed at IPM within PAK 346, COPS, and D-PHASE. Specifically, it will be investigated whether the moist bias in precipitation is due to inaccurate simulations of soil moisture or to the model physics. This may require additional special model runs during dedicated time periods such as 2007, for which the most extensive verification data sets are available. In addition, a WRF-NOAH simulation forced with ERA-Interim data in climate mode for 2005–2010 in an optimized configuration is planned, together with the evaluation of its results.
References

1. H.-S. Bauer, T. Weusthoff, M. Dorninger, V. Wulfmeyer, T. Schwitalla, T. Gorgas, M. Arpagaus and K. Warrach-Sagi, 2011: Predictive skill of a subset of models participating in D-PHASE in the COPS region. Quart. J. Roy. Meteorol. Soc., 137, 287–305. DOI:10.1002/qj.715.
2. S. Crewell, M. Mech, T. Reinhardt, C. Selbach, H.-D. Betz, E. Brocard, G. Dick, E. O'Conner, J. Fischer, T. Hanisch, T. Hauf, A. Hünerbein, L. Delobbe, A. Mathes, G. Peters, H. Wernli, M. Wiegner and V. Wulfmeyer, 2008: The general observation period 2007 within the priority program on quantitative precipitation forecasting: Concept and first results. Meteorol. Zeitschrift, 17, 6, 849–866. DOI:10.1127/0941-2948/2008/0336.
3. M. Déqué, D. P. Rowell, D. Lüthi, F. Giorgi, J. H. Christensen, B. Rockel, D. Jacob, E. Kjellström, M. De Castro and B. van den Hurk, 2007: An intercomparison of regional climate simulations for Europe: Assessing uncertainties in model projections. Clim. Change, 81, 53–70.
4. S. J. Doherty, S. Bojinski, A. Henderson-Sellers, K. Noone, D. Goodrich, N. L. Bindoff, J. A. Church, K. A. Hibbard, T. R. Karl, L. Kajfez-Bogataj, A. H. Lynch, D. E. Parker, I. C. Prentice, V. Ramaswamy, R. W. Saunders, M. S. Smith, K. Steffen, T. F. Stocker, P. W. Thorne, K. E. Trenberth, M. M. Verstraete and F. W. Zwiers, 2009: Lessons learned from IPCC AR4: Scientific developments needed to understand, predict and respond to climate change. Bull. Amer. Meteor. Soc., 90, 497–513.
5. C. Hohenegger, P. Brockhaus and C. Schär, 2008: Towards climate simulations at cloud-resolving scale. Meteorol. Zeitschrift, 17, 383–394.
6. D. Jacob, L. Bähring, O. B. Christensen, J. H. Christensen, S. Hagemann, M. Hirschi, E. Kjellström, G. Lenderink, B. Rockel, C. Schär, S. I. Seneviratne, S. Somot, A. van Ulden and B. van den Hurk, 2007: An intercomparison of regional climate models for Europe: Design of the experiments and model performance. Clim. Change, 81. PRUDENCE Special Issue.
7. J. H. Jungclaus, N. Keenlyside, M. Botzet, H. Haak, J.-J. Luo, M. Latif, U. Mikolajewicz and E. Roeckner, 2005: Ocean circulation and tropical variability in the coupled model ECHAM5/MPI-OM. J. Climate, 19, 3952–3972.
8. H. Morrison and A. Gettelman, 2008: A new two-moment bulk stratiform cloud microphysics scheme in the Community Atmosphere Model, version 3 (CAM3). Part I: Description and numerical tests.
9. M. W. Rotach, M. Arpagaus, M. Dorninger, C. Hegg, A. Montani, R. Ranzi, F. Bouttier, A. Buzzi, G. Frustaci, K. Mylne, E. Richard, A. Rossa, C. Schär, M. Staudinger, H. Volkert, V. Wulfmeyer, P. Ambrosetti, F. Ament, C. Appenzeller, H.-S. Bauer, S. Davolio, M. Denhard, L. Fontannaz, J. Frick, F. Fundel, U. Germann, A. Hering, C. Keil, M. Liniger, C. Marsigli, Y. Seity, M. Stoll, A. Walser and M. Zappa, 2009: MAP D-PHASE: Real-time demonstration of weather forecast quality in the alpine region. Bull. Amer. Meteor. Soc., 90, 1321–1336.
10. C. Schär, P. L. Vidale, D. Lüthi, C. Frei, C. Haeberli, M. A. Liniger and C. Appenzeller, 2004: The role of increasing temperature variability in European summer heatwaves. Nature, 427, 332–336.
11. C. D. Schönwiese, 2000: Praktische Statistik für Meteorologen und Geowissenschaftler. Bornträger, Stuttgart.
12. T. Schwitalla, H.-S. Bauer, V. Wulfmeyer and F. Aoshima, 2011: High-resolution simulation over central Europe: Assimilation experiments during COPS IOP9c. Quart. J. Roy. Meteorol. Soc., 137, 156–175. DOI:10.1002/qj.721.
13. T. Schwitalla, H.-S. Bauer, V. Wulfmeyer and G. Zängl, 2008: Systematic errors of QPF in low mountain regions. Meteorol. Zeitschrift, 17, 903–917.
14. W. C. Skamarock and J. B. Klemp, 2008: A time-split nonhydrostatic atmospheric model for research and NWP applications. J. Comp. Phys., 227, 3465–3485. Special issue on environmental modeling.
15. W. C. Skamarock, J. B. Klemp, J. Dudhia, D. Gill, D. O. Barker, M. G. Duda, X.-Y. Huang, W. Wang and J. G. Powers, 2008: A Description of the Advanced Research WRF Version 3. NCAR Technical Note TN-475+STR, NCAR, P.O. Box 3000, Boulder, CO, 80307.
16. S. Solomon, D. Qin, M. Manning, L. Chen, M. Marquis, K. B. Avery, M. Tignor and H. L. Miller, 2007: Climate Change 2007: The Physical Science Basis. Cambridge University Press, Cambridge, 996 pp.
17. T. Thuburn, 2008: Some conservation issues for dynamical cores of NWP and climate models. J. Comp. Phys., 227, 3715–3730.
18. T. Weusthoff, F. Ament, M. Arpagaus and M. W. Rotach, 2009: Verification of precipitation forecasts of the D-PHASE data set. In: Proceedings of the 30th International Conference on Alpine Meteorology, pp. 72–73. Rastatt, Baden-Württemberg, Germany.
19. WMO, 2010: WWRP Strategic Plan 2009–2017. Available online: http://www.wmo.int/pages/prog/arep/wwrp/new/documents/final WWRP SP 6 Oct.pdf.
20. V. Wulfmeyer, A. Behrendt, H.-S. Bauer, C. Kottmeier, U. Corsmeier, A. Blyth, G. Craig, U. Schumann, M. Hagen, S. Crewell, P. Di Girolamo, C. Flamant, M. Miller, A. Montani, S. Mobbs, E. Richard, M. W. Rotach, M. Arpagaus, H. Russchenberg, P. Schlüssel, M. König, V. Gärtner, R. Steinacker, M. Dorninger, D. D. Turner, T. M. Weckwerth, A. Hense and C. Simmer, 2008: The convective and orographically-induced precipitation study: A research and development project of the world weather research program for improving quantitative precipitation forecasting in low-mountain regions. Bull. Amer. Meteor. Soc., 89, 10, 1477–1486. DOI:10.1175/2008BAMS2367.1.
21. V. Wulfmeyer, A. Behrendt, C. Kottmeier, U. Corsmeier, C. Barthlott, G. C. Craig, M. Hagen, D. Althausen, F. Aoshima, M. Arpagaus, H.-S. Bauer, L. Bennett, A. Blyth, C. Brandau, C. Champollion, S. Crewell, G. Dick, P. DiGirolamo, M. Dorninger, Y. Dufournet, R. Eigenmann, R. Engelmann, C. Flamant, T. Foken, T. Gorgas, M. Grzeschik, J. Handwerker, C. Hauck, H. Höller, W. Junkermann, N. Kalthoff, C. Kiemle, S. Klink, M. König, L. Krauss, C. N. Long, F. Madonna, S. Mobbs, B. Neininger, S. Pal, G. Peters, G. Pigeon, E. Richard, M. W. Rotach, H. Russchenberg, T. Schwitalla, V. Smith, R. Steinacker, J. Trentmann, D. D. Turner, J. van Baelen, S. Vogt, H. Volker, T. Weckwerth, H. Wernli, A. Wieser and M. Wirth, 2011: The convective and orographically induced precipitation study (COPS): The scientific strategy, the field phase, and first highlights. Q. J. R. Meteorol. Soc., 137, 3–30. DOI:10.1002/qj.752.
Direct Numerical Simulation and Implicit Large Eddy Simulation of Stratified Turbulence

S. Remmler and S. Hickel
Abstract Simulation of geophysical turbulent flows requires robust and accurate subgrid-scale turbulence modeling. We propose an implicit subgrid-scale model for stratified fluids, based on the Adaptive Local Deconvolution Method. To validate this turbulence model, we performed direct numerical simulations of the transition of the three-dimensional Taylor–Green vortex and of homogeneous stratified turbulence. Our analysis shows that the implicit turbulence model correctly predicts the turbulence energy budget and the spectral structure of stratified turbulence.
1 Introduction

Turbulence in fluids is strongly affected by the presence of density stratification, which is a common situation in geophysical flows. To predict atmospheric and oceanic mesoscale flows, we need to understand and parametrize the small-scale turbulence. The stratification suppresses vertical motions and thus makes all scales of the velocity field strongly anisotropic. The horizontal velocity spectrum in the atmosphere was analyzed by Nastrom & Gage [24] using aircraft observations. They found a power-law behavior in the mesoscale range with an exponent of −5/3. In the vertical spectrum, on the other hand, Cot [6] observed an exponent of −3 in the inertial range. There has been a long and intensive discussion whether the observed spectra are due to a backward cascade of energy [10, 11, 18], as in two-dimensional turbulence [16], or due to breaking of internal waves, which would mean that a forward cascade is the dominant process [7, 30]. In different numerical and theoretical studies, ambiguous or even conflicting results were obtained [19].
S. Remmler · S. Hickel
Technische Universität München, Institute of Aerodynamics and Fluid Mechanics, 85747 Garching bei München, Germany, e-mail: [email protected]
During the last decade, a number of new simulations and experiments addressed the issue. Smith & Waleffe [27] observed a concentration of energy in the lowest modes in their simulations. Other studies [17, 31] suggested that the character of the flow depends on the Reynolds number. Apparently, high Reynolds numbers are associated with stronger three-dimensionality and a forward cascade of energy. Lindborg [21] presented a scaling analysis of the Boussinesq equations for low Froude and high Reynolds number. His theory of strongly anisotropic, but still three-dimensional, turbulence explains the horizontal kh^(−5/3) spectrum as well as the vertical kv^(−3) spectrum. On the basis of these findings, Brethouwer et al. [4] showed that the relevant non-dimensional parameter controlling stratified turbulence is the buoyancy Reynolds number R = Fr² Re. For R ≫ 1, they predict stratified turbulence including local overturning and a forward energy cascade. In the opposite limit, for R ≪ 1, the flow is controlled by viscosity and does not contain small-scale turbulent motions. Since a full resolution of all turbulence scales is only possible for very low Reynolds numbers, many groups used subgrid-scale (SGS) models in their computations. E.g., Métais & Lesieur [23] used a spectral eddy viscosity model based on the eddy-damped quasi-normal Markovian (EDQNM) theory. This required a flow simulation in Fourier space and the cut-off wavenumber to be in the inertial range. For classical large eddy simulations (LES) in physical space, Smagorinsky models are widely used, either in the classical formulation [15] or with certain modifications for stratified turbulence based on the local Richardson number [8]. Staquet & Godeferd [28] presented a two-point closure statistical EDQNM turbulence model, which was adapted for axisymmetric spectra about the vertical axis. Recently, many groups presented regularized direct numerical simulations (DNS) of stratified turbulence, which means, rather pragmatically, stabilizing under-resolved DNS by removing the smallest resolved scales. This is obtained by a hyperviscosity approach [20, 21] or by de-aliasing in spectral methods using the "2/3-rule" [1, 9]. All explicit SGS turbulence models suffer from the problem that the computed SGS stresses are of the same order as the grid truncation error. This typically leads to interference between the SGS model and the numerical scheme, to instability, and to worse results on refined grids. This issue can be solved by combining the discretization scheme and the SGS model in a single approach, usually referred to as "implicit" LES (ILES), in contrast to the traditional "explicit" SGS models. The idea of ILES was realized by Hickel et al. [12] in the Adaptive Local Deconvolution Method (ALDM) for neutrally stratified fluids. Based on this method and on ALDM for passive scalar transport [13], we developed an implicit SGS model for Boussinesq fluids. In the present paper, we evaluate the applicability of ALDM to stably stratified turbulence. We simulated transition and decay of the three-dimensional Taylor–Green vortex as an example of a transitional stratified turbulent flow. For isotropic conditions, this flow was intensively studied by Brachet et al. [2, 3]. Riley & de Bruyn Kops [25] first simulated its evolution in a stably stratified background. The second test case to be covered is forced homogeneous stratified turbulence at different Froude
and Reynolds numbers. For both cases, we present not only ILES, but also LES with a standard Smagorinsky model (SSM) and high-resolution DNS as benchmark solutions.
2 Governing Equations

The non-dimensional Boussinesq equations for a stably stratified fluid in Cartesian coordinates read

∇ · u = 0,   (1a)
∂t u + ∇ · (uu) = −∇p − (ρ/Fr0²) ez + (1/Re0) ∇²u,   (1b)
∂t ρ + ∇ · (ρu) = −w + (1/(Pr Re0)) ∇²ρ,   (1c)
where velocities are made non-dimensional by U, all spatial coordinates by the length scale L, pressure by U², time by L/U, and the density fluctuation ρ = ρ* − ρ̄ (ρ*: local absolute density, ρ̄: background density) by the background density gradient L |dρ̄/dz|. The non-dimensional parameters are

Fr0 = U/(NL),   Re0 = UL/ν,   Pr = ν/μ.   (2)
We chose a Prandtl number of Pr = 0.7, corresponding to values found in the atmosphere. Froude and Reynolds number are used as parameters to control the flow regime. With the instantaneous values of the kinetic energy Ek and the kinetic energy dissipation εk, we find the local Froude and Reynolds numbers as well as the buoyancy Reynolds number R, defined by Brethouwer et al. [4]:

Fr = Fr0 L εk/(U Ek),   Re = Re0 Ek²/(U L εk),   R = Re Fr².   (3)
In ILES, by construction, we do not have direct access to the value of εk. We thus computed it from the total dissipation rate, assuming a constant mixing ratio of εp/εk = 0.4, which is an acceptable approximation for a wide range of parameters.
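A small numerical sketch of this evaluation of Eq. (3); the input values are invented for illustration, and the split of the total dissipation follows the assumed ratio εp/εk = 0.4 stated above (so εk = ε_tot/1.4):

```python
# Evaluating Eq. (3) from instantaneous kinetic energy E_k and the total
# dissipation rate, under the assumption eps_p/eps_k = 0.4.
Fr0, Re0, U, L = 1.0, 1600.0, 1.0, 1.0
E_k, eps_tot = 0.08, 0.012           # non-dimensional, hypothetical values

eps_k = eps_tot / 1.4                # kinetic part of the total dissipation
Fr = Fr0 * L * eps_k / (U * E_k)     # local Froude number
Re = Re0 * E_k**2 / (U * L * eps_k)  # local Reynolds number
R = Re * Fr**2                       # buoyancy Reynolds number [4]
print(f"Fr = {Fr:.3f}, Re = {Re:.0f}, R = {R:.2f}")
```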
3 Numerical Method and Computational Aspects

We extended our existing finite-volume solver INCA for the isothermal incompressible Navier-Stokes equations with scalar transport to solve the Boussinesq equations by adding the corresponding source terms to the equations for the transport of vertical momentum and buoyancy.
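In the spirit of Eqs. (1b) and (1c), the extension amounts to two extra source terms. The toy update below shows their form (explicit Euler, source terms only, made-up array sizes); it is a sketch of the idea, not the actual INCA implementation:

```python
import numpy as np

# Boussinesq source terms of Eqs. (1b) and (1c): buoyancy -rho/Fr0^2 acts
# on the vertical momentum, and the background stratification contributes
# -w to the density-fluctuation equation. Advection, diffusion and the
# pressure projection are handled elsewhere in a real solver.
Fr0, dt = 1.0, 1.0e-3
shape = (16, 16, 16)                  # made-up grid size
rng = np.random.default_rng(0)
w = np.zeros(shape)                   # vertical velocity
rho = rng.normal(0.0, 1.0e-3, shape)  # density fluctuation

w += dt * (-rho / Fr0**2)             # buoyancy term of Eq. (1b)
rho += dt * (-w)                      # background-gradient term of Eq. (1c)
```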
All computations were run on Cartesian staggered grids with uniform cell size. The code offers different discretization schemes depending on the application. For DNS and LES with the SSM, we used a non-dissipative central difference scheme with 4th-order accuracy for the convective terms and 2nd-order central differences for the diffusive terms and the continuity equation (Poisson equation for the pressure). For implicit LES, we replaced the central difference scheme for the convective terms by the implicit turbulence model ALDM. The method is based on a reconstruction of the unfiltered solution on the represented scales by combining Harten-type deconvolution polynomials. The different polynomials are dynamically weighted based on the smoothness of the filtered solution. A tailored numerical flux function operates on the approximately reconstructed solution. Both the solution-adaptive polynomial weighting and the numerical flux function involve free model parameters, which were calibrated in such a way that the truncation error of the discretized equations correctly represents the SGS stresses of turbulence [12]. This set of parameters was not changed for any subsequent applications of ALDM. For the computations presented here, we used an implementation of ALDM with improved computational efficiency. The validity of this method has been proved for a number of applications (e.g. Hickel et al. [14]). For time integration, we used an explicit third-order accurate Runge-Kutta scheme, as proposed by Shu [26]. The time step was dynamically adjusted to keep the CFL number smaller than unity. The Poisson equation for the pressure is solved at every Runge-Kutta sub-step. The Poisson solver employs a fast Fourier transform (FFT) in the vertical direction and a stabilized bi-conjugate gradient (BiCGSTAB) solver [29] in the horizontal plane. By the FFT, the three-dimensional problem is transformed into a set of independent two-dimensional problems, which can be solved in parallel. Our code INCA is parallelized both for distributed and shared memory usage. Additionally, it is optimized for running on vector computer systems. For the computations presented here, we use the single-domain shared-memory approach for efficient computation of Fourier transforms of the whole data set. On this single domain, we use the OpenMP shared-memory parallelization capabilities of INCA. The limiting factor for the single-domain approach is the size of the shared memory of the computer. We found excellent conditions on the NEC SX-9 vector computer at the High Performance Computing Center Stuttgart (HLRS). One node of the SX-9 provides 510 GB of shared memory for 16 vector CPUs. This enabled us to run DNS with up to approximately one billion cells. The computation time was about 70 ns per time step and cell. The computational performance of the pressure Poisson solver reached approximately 19 GFLOP/s. Concerning the ALDM routines themselves, the computation of the numerical flux function reached 35 GFLOP/s and the reconstruction routine 49 GFLOP/s, which is half the nominal peak performance of the SX-9.
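The following serial sketch illustrates the decomposition behind this Poisson solver: an FFT in one direction turns the 3D problem into a set of independent 2D problems, each solved iteratively with BiCGSTAB. Grid sizes, tolerances, and the all-periodic setup are illustrative assumptions, not the INCA configuration.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, bicgstab

nx, ny, nz = 32, 32, 16
h = 1.0                                   # uniform cell size

rng = np.random.default_rng(0)
rhs = rng.standard_normal((nx, ny, nz))
rhs -= rhs.mean()                         # solvability of the periodic problem

rhs_hat = np.fft.fft(rhs, axis=2)         # decouple the vertical direction
theta = 2.0 * np.pi * np.fft.fftfreq(nz)
lam = (2.0 - 2.0 * np.cos(theta)) / h**2  # modified wavenumber, 2nd-order stencil

def laplace_2d(p):
    """Periodic 5-point Laplacian in the horizontal plane."""
    return (np.roll(p, 1, 0) + np.roll(p, -1, 0) +
            np.roll(p, 1, 1) + np.roll(p, -1, 1) - 4.0 * p) / h**2

p_hat = np.zeros_like(rhs_hat)
for k in range(nz):                       # independent problems -> parallelizable
    def matvec(x, lam_k=lam[k]):
        p = x.reshape(nx, ny)
        return (laplace_2d(p) - lam_k * p).ravel()
    A = LinearOperator((nx * ny, nx * ny), matvec=matvec, dtype=complex)
    b = rhs_hat[:, :, k].ravel()
    if k == 0:
        b = b - b.mean()                  # k = 0: pressure defined up to a constant
    x, info = bicgstab(A, b, atol=1e-10)
    p_hat[:, :, k] = x.reshape(nx, ny)

p = np.fft.ifft(p_hat, axis=2).real       # back to physical space
```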
4 Test Cases and Results

4.1 Taylor–Green Vortex (TGV)

Transitional flows pose a special problem to turbulence subgrid-scale models. Their correct prediction is only possible if the subgrid-scale model does not affect the laminar flow and its instability modes. For most eddy-viscosity models, such as the Smagorinsky model, this requirement is not fulfilled. We used the transition of the three-dimensional Taylor–Green vortex (TGV) as a test for ALDM in laminar-to-turbulent transition. The flow field in a triple-periodic box with side length LD = 2πL is initialized with a set of large-scale vortices varying vertically:

u = U cos(z/L) [cos(x/L) sin(y/L), −sin(x/L) cos(y/L), 0]ᵀ   (4)

where U and L are the characteristic velocity and length scales of the problem. Initially, all flow energy is concentrated in the lowest wavenumbers. The flow is purely horizontal, laminar, and strongly anisotropic. At later times, energy is transferred to smaller scales by vortex stretching. After approximately 10 non-dimensional time units, the flow is quasi-turbulent, keeping its determinism and spatial symmetry. At this time, the energy dissipation has a maximum due to the enhanced shear in the small-scale vortices. If neutrally stratified, the energy of the vertical velocity component soon reaches the level of the horizontal components and the flow becomes isotropic. In case of a stable background stratification, vertical motions are damped by the restoring buoyancy force and the flow remains highly anisotropic. In the linear limit of zero Froude number, the stratification completely prevents the transition to turbulence. For DNS, the number of computational cells depends on the Reynolds number. We used 256³ cells for Re0 = 800, 512³ cells for Re0 = 1600, and 768³ cells for Re0 = 3000, to be sure to resolve the smallest turbulence scales. With LES, the resolution is Reynolds number independent; we used 64³ cells for all LES. The effect of a density stratification on turbulence is illustrated in Fig. 1, which shows a visualization of the turbulence structures approximately at the time of maximum dissipation. In a stratified medium, the coherent structures are larger and anisotropic, and the shear rate magnitude is lower compared to neutral stratification. The local Froude and Reynolds numbers in the TGV flow field change rapidly during the transition. To verify that the transition occurs in a relevant parameter space, we show the traces of several TGV simulations in Fig. 2. Indeed, most of the simulations are located in the regime of stratified turbulence; hence they are suitable for the validation of an SGS model for stratified turbulence. An LES must be able to correctly predict the temporal evolution of the total dissipation rate by modeling the effect of the small-scale vortices on the larger scales. In Fig. 3, we show the results from LES on a 64³-cell grid using ALDM as well as the SSM, compared to a DNS on a grid with 512³ cells.
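For concreteness, the initial condition of Eq. (4) can be set up in a few lines; U = L = 1 and the grid size below are illustrative choices, far below the DNS resolutions quoted above:

```python
import numpy as np

# Taylor-Green vortex initial condition, Eq. (4), on a triple-periodic
# box of side 2*pi*L (with U = L = 1).
N = 64
x = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
X, Y, Z = np.meshgrid(x, x, x, indexing='ij')

u = np.cos(Z) * np.cos(X) * np.sin(Y)    # horizontal components of Eq. (4)
v = -np.cos(Z) * np.sin(X) * np.cos(Y)
w = np.zeros_like(u)                     # the initial flow is purely horizontal

# All energy sits in the lowest wavenumbers:
E_k = 0.5 * np.mean(u**2 + v**2 + w**2)  # = 1/8 for this field
print(f"initial kinetic energy: {E_k:.4f}")
```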
Fig. 1 Visualization of the TGV at t = 10 (Re0 = 1600). Iso-surfaces at Q = 0.5, colored by the shear rate
Fig. 2 Regime diagram [4] for the transition of the TGV for different parameters; symbols: DNS, lines: LES with ALDM
For neutral and moderately stable stratification, the ALDM yields better results than the SSM. For neutral stratification, Hickel et al. [12] compared ALDM to a dynamic Smagorinsky model and the spectral eddy viscosity model of Chollet & Lesieur [5]. They found better agreement of the LES solution with DNS data if ALDM was used, compared to both alternative parameterizations. In Fig. 5, we present the contributions of molecular and implicit SGS dissipation to the total dissipation in LES with ALDM for three different intensities of stratification. The relative amount of implicit SGS dissipation decreases with increasing stratification, since the flow is better resolved in cases of strong stratification. For Fr0 = 2 and Fr0 = 1, the dissipation peaks are dominated by implicit SGS dissipation, which shows that the implicit model is automatically activated when it is needed and provides a good approximation of the unresolved stresses for different intensities of stratification.
Fig. 3 Total energy dissipation rate of the TGV (Re0 = 1600)
Fig. 4 Energy budget of the TGV (Re0 = 1600, Fr0 = 1); solid lines: ALDM, dashed lines: SSM
The ratios of the different types of energy in the TGV vary constantly during the evolution. While initially there is only horizontal kinetic energy, at later times a certain fraction of this energy is converted to vertical kinetic energy as well as available potential energy. The energy budget for one representative case is shown in Fig. 4. Both LES, with implicit ALDM and with explicit SSM, predict the energy conversions with good accuracy. The best agreement is obtained for the horizontal kinetic energy component. The overall agreement with DNS data is better if ALDM is used.
Fig. 5 Contributions to energy dissipation of the Taylor–Green vortex (Re0 = 1600)
4.2 Homogeneous Stratified Turbulence (HST)

The second investigated test case is homogeneous stratified turbulence in a statistically steady state. The flow is maintained at an approximately constant energy level by a large scale, vertically uniform forcing of the horizontal velocity components. This approach has been successfully applied by several authors before [21, 22, 31]. We ran two series of DNS, series A with Re0 = 6500 and series B with Re0 = 13 000. The grid size was 320³ cells for series A and 640³ cells for series B. Within each series, the Froude number was varied to cover different buoyancy Reynolds numbers. The basic domain size again was 2πL. For low Froude numbers, we used a flat domain with a height of only πL, while keeping cubic cells. This is permitted since in stratified turbulence there is only a very small amount of energy contained in the low vertical modes. For both series, we performed LES, both with implicit ALDM and explicit SSM. For all of these, we used grid boxes with 64³ cells. For the low Froude number simulations, the domain was flattened as well, leading to a doubled resolution in the vertical direction. Figure 6 shows the local Froude and Reynolds numbers of the individual simulations. Most important for the assessment of a parametrization scheme for stratified turbulence is its ability to correctly predict the amount of energy converted from horizontal kinetic energy to vertical kinetic energy and available potential energy before the energy is finally dissipated on the smallest represented scales.
Fig. 6 Regime diagram for our simulations of stratified turbulence
Fig. 7 Ratio of vertical to potential energy in HST as a function of local Froude number
In Fig. 7, we show the ratio Ev/Ep as a function of the local Froude number as predicted by DNS and LES with ALDM. The ratio Ev/Ep cannot be influenced by the forcing and can thus develop freely, due only to the interaction of the convective, pressure and buoyancy terms. The vertical to potential energy ratio increases almost linearly with the Froude number in the DNS. We find the same trend in our LES with ALDM. The agreement between DNS and LES is best in the region of high Froude numbers (weakly stratified turbulence), whereas for low Froude numbers, the vertical kinetic energy is slightly underpredicted. Note that the results from ALDM and SSM differ from each other most at the lowest Froude number. This is an indication that ALDM is better capable of handling the strong turbulence anisotropy in strongly stratified flows. For comparison of kinetic energy spectra, we selected one DNS in the strongly stratified regime (R = 6.3, Fr = 0.03, Re = 8300), shown in Fig. 9, and one DNS in the weakly stratified regime (R = 41, Fr = 0.07, Re = 9300), shown in Fig. 8, both from series B. The corresponding LES have similar local Froude and Reynolds numbers.
Fig. 8 Weakly stratified turbulence kinetic energy spectra (R = 41)
Fig. 9 Strongly stratified turbulence kinetic energy spectra (R = 6.3)
In the horizontal spectra of kinetic energy, the difference between ALDM and a simple eddy viscosity model is most obvious. In the weakly stratified case (R = 41), the horizontal spectrum is still quite similar to the Kolmogorov spectrum of isotropic turbulence. In this case, both SGS models predict the inertial range spectrum fairly well. The SSM is slightly too dissipative, but the difference from the DNS spectrum is acceptable. The picture changes completely for the more strongly stratified case (R = 6.3). The SSM dissipates too much energy and thus underpredicts the inertial range spectrum by more than one order of magnitude. Additionally, the predicted power-law exponent is significantly lower than −5/3. The spectrum predicted by ALDM, on the other hand, agrees well with the DNS. It correctly predicts the characteristic plateau region between the forcing scales and the inertial scales. Moreover, it produces a power-law decay with an exponent of −5/3, corresponding to the DNS and the theory derived from scaling laws [4].
In the vertical spectra of kinetic energy, the inertial range decay exponent changes from −5/3 in neutrally stratified fluid to −3 in strongly stratified turbulence. We find this change in the DNS and it is well reproduced by the LES. Both SGS models predict the turbulence inertial range decay well. At strong stratification, the ALDM result perfectly agrees with the DNS. The SSM result is slightly too dissipative in this region.
5 Conclusion

We presented a numerical investigation of turbulence in a stably stratified fluid to demonstrate the reliability of implicit turbulence modeling with the Adaptive Local Deconvolution Method (ALDM). As benchmark results, we used high resolution DNS data and LES results with an explicit Smagorinsky model. The investigated test cases were the transition of the three-dimensional Taylor–Green vortex (TGV) and horizontally forced homogeneous stratified turbulence (HST). In most simulations, the buoyancy Reynolds number was larger than unity. The Froude and Reynolds numbers were chosen to cover the complete range from isotropic Kolmogorov turbulence up to strongly stratified turbulence.

For the transition of the TGV, we found good agreement between ALDM results and DNS in neutrally and stably stratified fluid. With the implicit model, we generally obtained better results than with the SSM. This demonstrates the ability of ALDM to properly represent the SGS stresses in a transitional stratified flow. In HST, the ALDM results also agree well with the reference DNS, both in integral flow properties and energy spectra. This applies to the whole Froude number range from infinity down to very low values. Especially in the strongly stratified regime, the superiority of ALDM over the SSM is striking. While the SSM is far too dissipative in this case, ALDM spectra agree very well with the reference DNS.

It should be noted that the reference DNS at high Reynolds numbers were only possible due to the excellent computational conditions on the NEC SX-9 vector computer at the HLRS. The large shared memory system and the vector architecture enabled us to simulate both test cases at high accuracy to provide reliable reference data for the evaluation of ALDM.

The results presented here were obtained without recalibrating the ALDM model constants for stratified turbulence. The good agreement with DNS data shows the ability of ALDM to automatically adapt to strongly anisotropic turbulence. Within the continuation of the project, we will investigate to what extent the results can be further improved by a recalibration of the model coefficients for stratified turbulence. But even without these possible improvements, we can use ALDM as a reliable turbulence SGS model for geophysical applications.

Acknowledgments. This work was funded by the German Research Foundation (DFG) within the MetStröm priority program. Computational resources were provided by the High Performance Computing Center Stuttgart (HLRS).
References

1. P. Bouruet-Aubertot, J. Sommeria, and C. Staquet. Stratified turbulence produced by internal wave breaking: Two-dimensional numerical experiments. Dyn. Atmos. Oceans, 23(1–4):357–369, 1996.
2. M. E. Brachet, D. Meiron, S. Orszag, B. Nickel, R. Morf, and U. Frisch. Small-scale structure of the Taylor–Green vortex. J. Fluid Mech., 130:411–452, 1983.
3. M. E. Brachet. Direct simulation of three-dimensional turbulence in the Taylor–Green vortex. Fluid Dynam. Res., 8(1–4):1–8, 1991.
4. G. Brethouwer, P. Billant, E. Lindborg, and J.-M. Chomaz. Scaling analysis and simulation of strongly stratified turbulent flows. J. Fluid Mech., 585:343–368, 2007.
5. J.-P. Chollet and M. Lesieur. Parameterization of small scales of three-dimensional isotropic turbulence utilizing spectral closures. J. Atmos. Sci., 38(12):2747–2757, 1981.
6. C. Cot. Equatorial mesoscale wind and temperature fluctuations in the lower atmosphere. J. Geophys. Res., 106(D2):1523–1532, 2001.
7. E. M. Dewan. Stratospheric wave spectra resembling turbulence. Science, 204(4395):832–835, 1979.
8. A. Dörnbrack. Turbulent mixing by breaking gravity waves. J. Fluid Mech., 375:113–141, 1998.
9. D. C. Fritts, L. Wang, J. Werne, T. Lund, and K. Wan. Gravity wave instability dynamics at high Reynolds numbers. Part I: Wave field evolution at large amplitudes and high frequencies. J. Atmos. Sci., 66(5):1126–1148, 2009.
10. K. S. Gage. Evidence for a k^(−5/3) law inertial range in mesoscale two-dimensional turbulence. J. Atmos. Sci., 36:1950–1954, 1979.
11. J. R. Herring and O. Métais. Numerical experiments in forced stably stratified turbulence. J. Fluid Mech., 202(1):97–115, 1989.
12. S. Hickel, N. A. Adams, and J. A. Domaradzki. An adaptive local deconvolution method for implicit LES. J. Comput. Phys., 213:413–436, 2006.
13. S. Hickel, N. A. Adams, and N. N. Mansour. Implicit subgrid-scale modeling for large-eddy simulation of passive scalar mixing. Phys. Fluids, 19:095102, 2007.
14. S. Hickel, T. Kempe, and N. A. Adams. Implicit large-eddy simulation applied to turbulent channel flow with periodic constrictions. Theor. Comput. Fluid Dyn., 22:227–242, 2008.
15. H.-J. Kaltenbach, T. Gerz, and U. Schumann. Large-eddy simulation of homogeneous turbulence and diffusion in stably stratified shear flow. J. Fluid Mech., 280(1):1–40, 1994.
16. R. H. Kraichnan. Inertial ranges in two-dimensional turbulence. Phys. Fluids, 10(7):1417–1423, 1967.
17. J.-P. Laval, J. C. McWilliams, and B. Dubrulle. Forced stratified turbulence: Successive transitions with Reynolds number. Phys. Rev. E, 68(3):036308, 2003.
18. D. K. Lilly. Stratified turbulence and the mesoscale variability of the atmosphere. J. Atmos. Sci., 40(3):749–761, 1983.
19. D. K. Lilly, G. Bassett, K. Droegemeier, and P. Bartello. Stratified turbulence in the atmospheric mesoscales. Theor. Comput. Fluid Dyn., 11:139–153, 1998.
20. E. Lindborg and G. Brethouwer. Stratified turbulence forced in rotational and divergent modes. J. Fluid Mech., 586:83–108, 2007.
21. E. Lindborg. The energy cascade in a strongly stratified fluid. J. Fluid Mech., 550(1):207–242, 2006.
22. O. Métais and J. R. Herring. Numerical simulations of freely evolving turbulence in stably stratified fluids. J. Fluid Mech., 202(1):117–148, 1989.
23. O. Métais and M. Lesieur. Spectral large-eddy simulation of isotropic and stably stratified turbulence. J. Fluid Mech., 239:157–194, 1992.
24. G. D. Nastrom and K. S. Gage. A climatology of atmospheric wavenumber spectra of wind and temperature observed by commercial aircraft. J. Atmos. Sci., 42(9):950–960, 1985.
25. J. J. Riley and S. M. de Bruyn Kops. Dynamics of turbulence strongly influenced by buoyancy. Phys. Fluids, 15(7):2047–2059, 2003.
26. C.-W. Shu. Total-variation-diminishing time discretizations. SIAM J. Sci. Stat. Comput., 9(6):1073–1084, 1988.
27. L. M. Smith and F. Waleffe. Generation of slow large scales in forced rotating stratified turbulence. J. Fluid Mech., 451(1):145–168, 2002.
28. C. Staquet and F. S. Godeferd. Statistical modelling and direct numerical simulations of decaying stably stratified turbulence. Part 1. Flow energetics. J. Fluid Mech., 360:295–340, 1998.
29. H. A. van der Vorst. Bi-CGSTAB: A fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 13(2):631–644, 1992.
30. T. E. van Zandt. A universal spectrum of buoyancy waves in the atmosphere. Geophys. Res. Lett., 9(5):575–578, 1982.
31. M. L. Waite and P. Bartello. Stratified turbulence dominated by vortical motion. J. Fluid Mech., 517:281–308, 2004.
Miscellaneous Topics

Univ.-Prof. Dr.-Ing. Wolfgang Schröder
In the previous chapters, topics such as fluid mechanics, structural mechanics, aerodynamics, thermodynamics, chemistry, combustion, and so forth have been addressed. In the following, another degree of interdisciplinary research is emphasized. The articles clearly show the link between applied mathematics, fundamental physics, computer science, and the ability to develop certain models such that a closed mathematical description can be achieved, which can be solved by highly sophisticated algorithms on up-to-date high performance computers. In other words, it is the collaboration of several scientific fields which, on the one hand, defines the area of numerical simulations and, on the other hand, determines the progress in fundamental and applied research. The subsequent papers, which represent an excerpt of various projects linked with HLRS, will confirm that numerical simulations are not only used to compute some quantitative results but also to corroborate basic physical models and even to develop new theories.

The project of the Chair of Banking of the University of Hohenheim in Stuttgart deals with the analysis of the internal allocation process to secure buffer capital, i.e., economic capital, of banks. The approach describes the difficulties caused by consistently equating a bank's allocation of economic capital with an allocation of decision rights in the form of value at risk limits. Risk measurement through value at risk methods is widespread. The strategic use of these methods to optimize the return to risk ratio actively on an overall bank level is hardly developed. A bank's central planner is modeled coping with correlations' uncertainty and learning about the limit addressees' skills. To face the underlying mixed-integer non-linear program, the model provides the central planner with a heuristic optimization approach. According to the given information and the assumed rationality of the central planner, the resulting limit allocations are optimal in a portfolio theoretical sense. The developed numerical model generates a data set that shows the superiority of this allocation method compared to other approaches.
Institute of Aerodynamics, RWTH Aachen University, Wüllnerstr. 5a, 52062 Aachen, Germany, e-mail: [email protected]
The impact of partial melt on mantle convection is considered by the Institute of Planetary Research of the German Aerospace Center in Berlin and the Institute of Planetology of the University of Münster. The thermo-chemical evolution of a one-plate planet like Mars strongly influences its atmospheric evolution via volcanic outgassing, which is linked to the production of partial melt in the mantle. In earlier thermal evolution and convection models, melt production was considered by the release and consumption of latent heat, the formation of crust and the redistribution of radioactive heat sources. In the current contribution, thermo-chemical models are presented that examine the influence of partial melt on the mantle dynamics of a one-plate planet such as Mars. Assuming fractional melting, where melt leaves the system as soon as it is formed, cooling boundary conditions, and decaying radioactive elements, the effects of partial melt on the melting temperature, mantle density, and viscosity, and thereby on the mantle dynamics of Mars, are investigated.

The research of the scientists from the University of Paderborn, the High Performance Computing Center in Stuttgart, and the Technical University of Kaiserslautern focuses on the molecular modeling of hydrogen bonding fluids. The availability and accuracy of thermodynamic properties determines the success of process design in chemical engineering and energy technology. In their contribution, molecular modeling and simulation is used. It has become a mature tool to accurately predict thermodynamic properties of fluids, provided the prediction is based on molecular models that rest on quantum chemical calculations and are optimized to vapor-liquid equilibrium (VLE) data only.

The scientists of the Institute of Geodesy of the University of Stuttgart use high-performance computing to efficiently postprocess and evaluate satellite gravity data. Space-borne gravity field recovery requires the solution of large-scale linear systems of equations to estimate tens of thousands of unknown gravity field parameters from tens of millions of observations. The extension of the Gravity field and steady-state Ocean Circulation Explorer (GOCE) mission poses unprecedented computational challenges in geodesy. To overcome these problems, the numerical method was rewritten using the MPI, PBLAS and ScaLAPACK programming standards. The tailored implementation extends the range of usable computer architectures to computers with less memory per node than the NEC SX-8 and SX-9 systems. Among other findings, runtime results using NEC SX systems as distributed memory systems will be presented.

In the contribution of the Institute of Materials and Processes of the Karlsruhe Institute of Technology, the phase-field method is applied to investigate metallic foam structures and dendritic growth. To be more precise, the analyses include heat conduction of open cell metal foams, dendritic growth, and optimizations of the concurrent processing using the message passing interface (MPI) standard. Large-scale simulations are performed to identify relevant parameters of heat conduction and dendrite growth. The basic model and parallelization scheme are described. Disadvantages of 1D domain decomposition compared to 3D domain decomposition for large 3D simulation domains are explained, and a detailed analysis of the new 3D decomposition requirements is given.
The project "Quaero 2010: Speech-to-Text Evaluation Systems" of scientists from the Karlsruhe Institute of Technology is embedded into the Quaero research program, a French consortium with German participation. Its goal is to develop multimedia and multilingual indexing and management tools for professional and general public applications, such as the automatic analysis, classification, extraction, and exploitation of information. In their contribution, the authors focus on various aspects of Automatic Speech Recognition and Machine Translation for both English and German. Since the underlying algorithmics entails a large number of almost independent tasks, the project can benefit greatly from parallel computing.

The contribution "CAR2X: Accurate Simulation of Wireless Vehicular Networks Based on Ray Tracing and Physical Layer Simulation" is a nice example that informatics is not only a technology provider for HPC, but also a user. Vehicle-to-vehicle and vehicle-to-roadside communication are required for numerous applications that aim at improving traffic safety and traffic efficiency. As recent studies have shown, communication in this context is significantly influenced by the radio propagation characteristics of the environment and the signal processing algorithms that are executed on the physical layer of the communications stack. Hence, to enable a proper assessment, the authors from the Institute of Telematics of the Karlsruhe Institute of Technology integrate a physical layer simulator into the popular NS-3 network simulator and employ ray tracing as a (computationally costly) method to accurately simulate the radio propagation characteristics of the Karlsruhe Oststadt.

The paper by Mangold et al. is concerned with a crucial problem related to numerical simulation techniques in engineering: the reliability and reproducibility of large scale simulations. The authors use the example of crash simulation to discuss the predictability of results. Crash simulations are highly nonlinear. Dynamic effects as well as finite strain plasticity and complex contact response govern the results of such simulations. The paper shows that small changes can influence the output to a great extent, and thus chaotic behavior is observed even if no physical instabilities are present. The project aims to identify sensitivities in crash simulations and to enhance reliability and reproducibility.
Allocation of Economic Capital in Banking: A Simulation Approach

H.-P. Burghof and J. Müller
Abstract The approach describes the difficulties implied by consistently equating a bank's allocation of economic capital with an allocation of decision rights in the form of value-at-risk limits. These days, risk measurement through value-at-risk methods is widespread. Using these methods strategically in order to optimize the return to risk ratio actively on an overall bank level is hardly developed. To this end, we model a bank's central planner coping with correlations' uncertainty and learning about the limit addressees' skills. In order to face the underlying mixed-integer non-linear program, the model provides the central planner with a heuristic optimization approach. According to the given information and the assumed rationality of the central planner, the resulting limit allocations are optimal in a portfolio theoretical sense. The numerical model generates a data set providing evidence concerning this allocation method's superiority compared to others.
1 Introduction

Our approach aims at the analysis of the internal allocation process concerning banks' securing buffer capital. Such capital, also called economic capital, has to be provided by banks to a certain extent, enabling the bank to compensate losses from its own business dealings. Thereby, insolvency and corresponding chain reactions, based on the close business interactions among banks, can be prevented or at least mitigated. Economic capital has to be a reserve with a high liquidity level. Compared to the bank's investment possibilities, this liquidity normally causes certain opportunity costs. Hence, economic capital has to be assigned to businesses or business units providing advantageous return to risk ratios in order to maximize the bank's total returns. To this end, the present numerical approach provides a model bank that can be used to test different allocation methods of economic capital concerning their impact on the model bank's performance.

H.-P. Burghof · J. Müller, Lehrstuhl für Bankwirtschaft und Finanzdienstleistungen, Universität Hohenheim, 70599 Stuttgart, Germany, e-mail: [email protected], [email protected]
2 A Model of Optimal Economic Capital Allocation

The allocation of economic capital is implemented through the allocation of value-at-risk (VAR) limits vl among the business units. Each unit trades one particular security representing the different business areas the bank is active in. The corresponding numerical model's extent concerning computation time stems from the underlying mixed-integer non-linear program (MINLP) denoted by the following:

$$
\begin{aligned}
&\max_{vl_j \in \mathbb{R}^n} \; \mu_{\mathrm{bank}}(vl), \qquad
\mu_{\mathrm{bank}} = \frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{n} w_{j,i}(\psi_{j,i}) \times c_{\mathrm{bank}} \times r_{j,i}, \\
&\text{where} \quad
b_{\mathrm{bank},i} = \begin{cases} 1 & \text{if } pl_{\mathrm{bank},i} < -vl_{\mathrm{bank}} \\ 0 & \text{else} \end{cases}
\qquad \text{and} \qquad
b_{j,i} = \begin{cases} 1 & \text{if } pl_{j,i} < -vl_j \\ 0 & \text{else,} \end{cases} \\
&\text{subject to} \quad
\frac{1}{m} \sum_{i=1}^{m} b_{\mathrm{bank},i} \le \alpha, \qquad
\frac{1}{m} \sum_{i=1}^{m} b_{j,i} \le \alpha, \qquad
\sum_{j=1}^{n} |w_{j,i}| \times c_{\mathrm{bank}} \le c_{\mathrm{bank}}, \\
&\text{and} \quad \underline{vl}_j \le vl_j \le \overline{vl}_j.
\end{aligned}
\tag{1}
$$
The objective function is represented through the bank's expected return μ_bank. The vector vl provides the independent variables in the form of VAR limits vl_j. Within these limits, the business units can take long and short positions, which result in different outcomes of the objective function. The bank's expected return μ_bank is given by the average over m Monte Carlo simulations of the bank's profit and loss pl_bank,i = Σ_{j=1}^{n} pl_j,i, where pl_j,i = w_j,i(ψ_j,i) × c_bank × r_j,i denotes the profit and loss of business unit j. Furthermore, there are the units' position weights w_j,i(ψ_j,i), the constant amount of investment capital c_bank and the securities' rates of return r_j,i. The term ψ_j,i reflects the impact of the underlying situation of autonomous decision making by the business units and introduces the bias thereby caused, resulting from, e.g., correlations' instability, asymmetric position building, and informed and herding decision makers.¹ Hence, the position weights w_j,i(ψ_j,i) are based on separate simulations of the bank's business units' acting. The model assumes a central
¹ See [3] for issues on informational cascades and herding in financial institutions.
planner learning about the units' behavior.² This information enters the limit optimization in the form of ψ_j,i. The central planner's modeling is not deepened here in order to keep the present remarks short. Nevertheless, the consideration of these effects is central for the entire investigation and is expected to increase the allocations' efficiency. It represents a key difference to the viewpoint of common portfolio theory, which exclusively considers securities' or any financial instruments' returns for optimization issues. The binary variable b_bank,i takes the value one if the loss pl_bank,i exceeds the bank's given and constant total VAR limit vl_bank, i.e., if pl_bank,i < −vl_bank. In all other cases, the variable's value is 0. The similar variable b_j,i refers to each business unit's VAR limit vl_j. These binary variables' outcomes define whether the side conditions are kept. The conditions are represented through a quantile value α (e.g. α = 0.01 in the case of VAR computation on the basis of a 99 percent confidence level). For example, the bank's value at risk VAR_bank,α is given by the absolute value of the bank's inverse cumulative profit and loss distribution function (CDF), denoted through |F_bank^{−1}(α)|. If the condition (1/m) Σ_{i=1}^{m} b_bank,i ≤ α is kept, the CDF's outcome satisfies vl_bank, which is at the same time the amount of economic capital that is statistically sufficient to cover the bank's daily losses in 99 percent of all cases. When it comes to the business units, the CDF of the corresponding unit, |F_j^{−1}(α)|, is used. Furthermore, there is a budget constraint in the form of the condition Σ_{j=1}^{n} |w_j,i| × c_bank = c_bank, determining the bank's total investment capital c_bank. Since there are also short positions, the weights' absolute values are taken. For reasons of simplification, the model supposes short positions to require the same amount of investment capital as long positions, e.g. for protection means. Finally, there is the condition \underline{vl}_j ≤ vl_j ≤ \overline{vl}_j restricting each unit's limit vl_j to a particular interval to exclude unrealistically high or low limits.
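For illustration, the following sketch shows how the empirical VaR |F^{−1}(α)| and the side condition on the binary variables can be evaluated from m simulated profit-and-loss outcomes; this is a schematic example, not the authors' implementation:

```cpp
// Schematic sketch: empirical VaR and limit check from Monte Carlo P&L.
#include <algorithm>
#include <cstddef>
#include <vector>

// Empirical VaR |F^{-1}(alpha)| at confidence level 1 - alpha (e.g. 0.01).
double valueAtRisk(std::vector<double> pl, double alpha) {
    std::sort(pl.begin(), pl.end());                // ascending P&L outcomes
    std::size_t idx = static_cast<std::size_t>(alpha * pl.size());
    return -pl[idx];                                // approximate alpha-quantile
}

// Side condition of Eq. (1): fraction of scenarios with b_i = 1, i.e. with
// a loss exceeding the VAR limit vl, must not exceed alpha.
bool limitKept(const std::vector<double>& pl, double vl, double alpha) {
    std::size_t exceed = 0;
    for (double x : pl)
        if (x < -vl) ++exceed;                      // binary variable b_i = 1
    return static_cast<double>(exceed) / pl.size() <= alpha;
}
```

With m = 10000 and α = 0.01, valueAtRisk returns approximately the amount of economic capital that covers the daily losses in 99 percent of the simulated scenarios.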
3 Optimization of Economic Capital Allocation vs. Portfolio Optimization under a VAR Constraint

In order to give further insight, the above discussed model is confronted with the setting of common portfolio optimization under a VAR constraint to emphasize the differences. In the case of portfolio optimization, the independent variables are no longer VAR limits but position weights, denoted by the vector w. As a result, a solution to the problem in the form of particular weights now exactly dictates the security positions to be built. The situation where decision makers can autonomously take long or short positions within an individual risk limit is factored out. For reasons of comparability with the economic capital allocation model, the expression "bank" is kept instead of "portfolio".
² See [2, pp. 205–210] for a description of the central planner's Bayesian updating based learning mechanism.
$$
\begin{aligned}
&\max_{w_j \in \mathbb{R}^n} \; \mu_{\mathrm{bank}}(w) = \frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{n} w_j \times c_{\mathrm{bank}} \times r_{j,i}, \\
&\text{where} \quad
b_{\mathrm{bank},i} = \begin{cases} 1 & \text{if } pl_{\mathrm{bank},i} < -vl_{\mathrm{bank}} \\ 0 & \text{else,} \end{cases} \\
&\text{subject to} \quad
\frac{1}{m} \sum_{i=1}^{m} b_{\mathrm{bank},i} \le \alpha, \qquad
\sum_{j=1}^{n} w_j \times c_{\mathrm{bank}} \le c_{\mathrm{bank}}, \\
&\text{and} \quad \underline{w}_j \le w_j \le \overline{w}_j.
\end{aligned}
\tag{2}
$$
Since only the superior total VAR limit of the whole bank/portfolio is relevant, the binary variable b_j,i is not required anymore. Furthermore, here pl_bank,i is given by Σ_{j=1}^{n} w_j × c_bank × r_j,i. The weights w_j are no longer part of the Monte Carlo simulations themselves, as denoted by the missing control variable i. Similar to the economic capital model, each independent variable is bounded by an individual interval \underline{|w_j|} ≤ |w_j| ≤ \overline{|w_j|} to obtain realistically sized weights.
4 Applied Heuristic Optimization Method—Functional Principle of Threshold Accepting

The solving of the model's underlying optimization problem is approached through the heuristic optimization method³ named threshold accepting (TA). In order to introduce the method, the simpler setting of portfolio optimization under a VAR constraint is used. A key feature of TA is a stepwise modified threshold which increasingly prevents the acceptance of worsened interim solutions. The possibility that also worsened solutions are accepted to a certain extent is an effective remedy against getting stuck in local extreme points of the solution space. Thereby, the method is useful to generate appropriate approximations of global optima. Since the inner logic of TA finally represents a simple local search mechanism, the focus has to be on its non-trivial parameterization in order to achieve appropriate results. In this context, a central task is the determination of the threshold sequence. This is accompanied by important questions concerning the step size through the solution space and therefore the definition of the neighborhood's extent of current solutions. Furthermore, the number of restarts has to be set to assure the space's proper coverage. Less critical but relevant is the determination of the number of rounds (thresholds) and the number of optimization steps within these rounds. The following pseudo code outlines the method's functioning.⁴
³ See [7] for issues on heuristic optimization.
⁴ Design and variables' naming is chosen similar to the description of [4, p. 9].
1.  initialize n_restarts, n_rounds and n_steps
2.  compute threshold sequence t = t_1, ..., t_{n_rounds}
3.  for i = 1 to n_restarts do
4.      randomly generate current solution w^c
5.      for j = 1 to n_rounds do
6.          for k = 1 to n_steps do
7.              generate w^n ∈ N(w^c) and compute Δ = μ_bank(w^n) − μ_bank(w^c)
8.              if Δ > t_j then w^c = w^n
9.          end for
10.     end for
11.     μ_bank,i = μ_bank(w^c) and w_i = w^c
12. end for
13. w* = w_i with i | μ_bank,i = max{μ_bank,1, ..., μ_bank,n_restarts}
Line seven describes the generation of a new solution w^n from the neighborhood of the current solution w^c, denoted by the term N(w^c). The determination of the neighborhood will not be described further here, to keep the presentation short. This also applies to the processes listed in lines one and two, which require considerably more attention during implementation than indicated by the pseudo code. Finally, the best solution w* from the n_restarts outcomes is chosen, as shown by the last line. A graphic example of the problem-solving during one restart is given by Fig. 1. The underlying portfolio consists of 100 artificial securities whose returns are simulated by geometric Brownian motion. The portfolio's starting configuration is set randomly.
Fig. 1 Example graph of the problem-solving during one restart
Fig. 2 Exemplary outcome of a heuristic portfolio optimization
The optimization produces 14101 accepted interim solutions and the final solution. For this run, n_rounds = 10 and n_steps = 5000 were used. Although there is one threshold per round, the graph only exhibits nine of them, since the last two are both close to 0 and appear as one. The number of accepted interim solutions per threshold becomes increasingly smaller during one restart. Every interim solution which leads to a worsening that exceeds the threshold is not accepted and hence is not used as the starting point for the following optimization step. Hence, in the present case of expected return maximization, the thresholds are represented through negative values. During the example, the total improvement is mainly achieved during the first phase. Hence, this restart displays a rather imbalanced solution development. The corresponding monitoring is therefore seen as an important means to calibrate TA.⁵ However, further restarts are necessary to decide whether the calibration has to be modified, which is not gone through at this point. The resulting portfolio of the optimization from Fig. 1 is displayed by Fig. 2. According to the side conditions from (2), the weights' values are bounded by a minimum and maximum value to achieve a realistic allocation. After sorting them by size, the minimum weights cumulate in the middle. The sum of all weights' absolute values adds up to 100%.
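To make the functional principle concrete, the following self-contained C++ sketch implements the loop of the pseudo code above for a toy maximization problem. The objective, the neighborhood definition and the linear threshold sequence are illustrative assumptions only; in the actual application, the objective would be μ_bank and the thresholds would typically be derived from sampled objective differences.

```cpp
// Compact, self-contained sketch of the threshold accepting loop; the toy
// objective, neighborhood and threshold sequence are chosen for demonstration.
#include <random>
#include <utility>
#include <vector>

using Solution = std::vector<double>;

// Toy objective to be maximized: peak at w_j = 0.5 for all components.
double objective(const Solution& w) {
    double s = 0.0;
    for (double x : w) s -= (x - 0.5) * (x - 0.5);
    return s;
}

Solution randomSolution(std::mt19937& rng, std::size_t n) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    Solution w(n);
    for (double& x : w) x = u(rng);
    return w;
}

// Draw a neighbor from N(w^c): perturb one randomly chosen component.
Solution neighbor(const Solution& w, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pick(0, w.size() - 1);
    std::normal_distribution<double> step(0.0, 0.05);
    Solution n = w;
    n[pick(rng)] += step(rng);
    return n;
}

Solution thresholdAccepting(int nRestarts, int nRounds, int nSteps,
                            std::size_t dim, std::mt19937& rng) {
    // Negative thresholds increasing towards zero: worsened interim solutions
    // are accepted less and less as the rounds proceed (assumes nRounds > 1).
    std::vector<double> t(nRounds);
    for (int j = 0; j < nRounds; ++j)
        t[j] = -0.1 * double(nRounds - 1 - j) / double(nRounds - 1);

    Solution best;
    double bestVal = -1e300;
    for (int i = 0; i < nRestarts; ++i) {
        Solution wc = randomSolution(rng, dim);
        double fc = objective(wc);
        for (int j = 0; j < nRounds; ++j)
            for (int k = 0; k < nSteps; ++k) {
                Solution wn = neighbor(wc, rng);
                double fn = objective(wn);
                if (fn - fc > t[j]) { wc = std::move(wn); fc = fn; } // accept
            }
        if (fc > bestVal) { bestVal = fc; best = wc; }  // w* over all restarts
    }
    return best;
}
```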
⁵ See [6, pp. 29–30].
5 Remarks on the Appropriateness of Threshold Accepting

Optimization in context with VAR is still a problem whose complexity has so far prevented the introduction of a method which is satisfying concerning computational effort and accuracy. The complexity stems from the corresponding solution spaces being non-convex, provided with many local optima and often inherently discontinuous. However, TA seems promising as an effective means to overcome these obstacles. One reason is the recent introduction of universal parameterization methods.⁶ There are investigations providing evidence that approximations on the basis of TA are at least sufficiently accurate for practical purposes from the fields of finance.⁷ Additionally, increasing computing power through developments in hardware and parallel computing gives rise to TA and other heuristic methods. Nevertheless, a closed-form method concerning optimization in context with VAR has been introduced, which is based on the optimization of expected shortfall (ES) and the subsequent conversion of ES into VAR.⁸ The correspondingly generated VAR solutions are undoubtedly valuable because of their additional appropriateness towards the rather more meaningful risk measure ES. Nevertheless, an optimization directly using VAR generates different and eventually more advantageous solutions if the focus is on the maximization of short-term earnings and if particularly VAR is relevant for regulatory issues.⁹ A further advantage of using TA as a means for portfolio optimization is its flexibility concerning the consideration of realistic (integer) constraints. For example, constraints on the size of portfolio positions are easy to implement with TA. The introduction of a cardinality constraint in the form of a minimum or maximum number of portfolio positions even excludes the applicability of the mentioned closed-form approach [4, p. 14]. The much more complex setting of optimal economic capital allocation might provide further difficulties concerning a closed-form implementation.
6 Technical Information on Typical Computations

The parallelization of computations was achieved through the Intel Message Passing Interface (IMPI). The parallel structure corresponds to a classical master and servant architecture. As programming language, C++ in combination with the Intel compiler is applied. In order to provide information concerning resource requirements, the example of efficient frontier computation from the field of portfolio theory is used.

⁶ See Gilli et al. [4]. They also reference Althöfer and Koschnick [1], proving TA to provide good convergence properties.
⁷ See [5].
⁸ See [8].
⁹ See [4, p. 12], for a graph emphasizing the mentioned differences concerning VAR optimizations on the basis of ES.
Table 1 Technical information on efficient frontier computations with the NEC Nehalem cluster

Computations^a        A         B          C
Dots/single jobs^b    1         1          100
Restarts              1         50         50
Rounds                10        10         10
Steps                 5000      5000       5000
Sequential            13.97 s   11.6 min   19.3 h
Parallel^c            –         –          46.4 min

^a Each computation is based on the identical 100 securities whose expected returns are based on 10000 simulations.
^b One scatter plot dot corresponds to one single job.
^c The parallel computation uses 4 nodes with 8 processors each.
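For computation C, the figures in Table 1 imply a parallel speed-up of roughly (19.3 × 60) min / 46.4 min ≈ 25 on the 4 × 8 = 32 processors used, i.e., a parallel efficiency of about 78%.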
For one efficient frontier, the described TA algorithm has to process many optimizations. The exact amount of optimizations depends on the desired density of dots in the resulting scatter plot. It has to be indicated that the displayed runtimes all refer to the setting of portfolio optimization under a VAR constraint. Time-consuming computations here are the checks of interim solutions concerning their adherence to the total VAR limit. In contrast, the setting of economic capital allocation optimization requires significantly more computational resources. This stems from the necessity of simulating the business units' acting in order to perform the check concerning the total VAR limit, which implies many more steps of calculation. Additionally, the business units' individual VAR limits have to be controlled. However, since the model of economic capital allocation optimization is not finally implemented yet, the current technical information exclusively refers to the example case of portfolio optimization under a VAR constraint.
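A master-and-servant job distribution of this kind can be sketched as follows (a schematic example, not the production code; runSingleJob() is a hypothetical placeholder for one TA optimization, and at least two MPI processes are assumed):

```cpp
// Sketch of the classical master-and-servant architecture: the master hands
// out one single job (one scatter plot dot) at a time; each servant runs one
// optimization per job and returns its result.
#include <mpi.h>
#include <vector>

double runSingleJob(int jobId) { return static_cast<double>(jobId); } // stub

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    const int nJobs = 100, STOP = -1;   // e.g. 100 dots (computation C)

    if (rank == 0) {                    // master
        std::vector<double> results(nJobs);
        int sent = 0;
        for (int w = 1; w < size; ++w) {            // seed every servant once
            int job = (sent < nJobs) ? sent++ : STOP;
            MPI_Send(&job, 1, MPI_INT, w, 0, MPI_COMM_WORLD);
        }
        for (int received = 0; received < nJobs; ++received) {
            double buf[2];                          // {job id, result}
            MPI_Status st;
            MPI_Recv(buf, 2, MPI_DOUBLE, MPI_ANY_SOURCE, 1, MPI_COMM_WORLD, &st);
            results[static_cast<int>(buf[0])] = buf[1];
            int job = (sent < nJobs) ? sent++ : STOP;   // next job or stop
            MPI_Send(&job, 1, MPI_INT, st.MPI_SOURCE, 0, MPI_COMM_WORLD);
        }
    } else {                            // servant: work until STOP arrives
        for (;;) {
            int job;
            MPI_Recv(&job, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            if (job == STOP) break;
            double buf[2] = { static_cast<double>(job), runSingleJob(job) };
            MPI_Send(buf, 2, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    return 0;
}
```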
7 Conclusions

The optimization of economic capital allocation was described through a model. In order to give further insight, the model was compared with the situation of common portfolio optimization under a value-at-risk constraint. As an appropriate optimization method, the threshold accepting heuristic was introduced. Despite similarities, there are key differences between the optimal allocation of economic capital and portfolio optimization. An obvious similarity is the consideration of diversification effects, representing the main issue in both scenarios. Nevertheless, portfolio optimization exclusively considers diversification effects which are based on securities' or any applied financial instruments' returns. In contrast, in the case of the optimal allocation of economic capital, there is no assignment of portfolio weights to investment products but rather the assignment of decision rights in the form of value-at-risk limits to autonomously acting business units. Future research focuses on the complete computational implementation of the model. The implementation aims at taking into account the potential bias resulting
from the decision rights based view: it considers decision makers' autonomous acting, which causes correlations' instability; their different degrees of market information, which influence their chances of success; and their decisions' dependency on observations of their fellow business units, which implies a certain herding behavior and correspondingly increased risk.
References

1. Althöfer, I. and Koschnick, K.-U. (1991). On the convergence of threshold accepting. Applied Mathematics and Optimization 24(1), 183–195.
2. Burghof, H.-P. and Müller, J. (2009). Allocation of Economic Capital in Banking: A Simulation Approach. In: Handbook of Value at Risk (ed. Gregoriou, G. N.). McGraw-Hill, New York.
3. Burghof, H.-P. and Sinha, T. (2005). Capital allocation with value-at-risk—the case of informed traders and herding. Journal of Risk 7(4), 47–73.
4. Gilli, M., Kellezi, E. and Hysi, H. (2006). A data-driven optimization heuristic for downside risk minimization. Journal of Risk 8(3), 1–19.
5. Gilli, M. and Schumann, E. (2010). Optimal enough? Working paper. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1420058. Cited 18 July 2011.
6. Gilli, M. and Winker, P. (2008). A review of heuristic optimization methods in econometrics. Swiss Finance Institute, Research Paper Series No. 12. http://www.swissfinanceinstitute.ch/faculty_research/publications/paperlist.htm. Cited 18 July 2011.
7. Maringer, D. (2005). Portfolio Management with Heuristic Optimization (eds. Amman, H. and Rustem, B.). Springer, The Netherlands.
8. Rockafellar, R. T. and Uryasev, S. (2000). Optimization of conditional value-at-risk. Journal of Risk 2(3), 21–41.
The Influence of Partial Melt on Mantle Convection

Ana-Catalina Plesa and Tilman Spohn
Abstract The thermo-chemical evolution of a one-plate planet like Mars strongly influences its atmospheric evolution via volcanic outgassing, which is linked to the production of partial melt in the mantle. In earlier thermal evolution and convection models melt production has been considered by the release and consumption of latent heat, the formation of crust and the redistribution of radioactive heat sources. We present thermo-chemical 2D convection models that examine the influence of partial melt on the mantle dynamics of a one-plate planet such as Mars. Assuming fractional melting, where melt leaves the system as soon as it is formed, cooling boundary conditions and decaying radioactive elements, we investigate the effects of partial melt on the melting temperature, mantle density and viscosity. In the present study, we examine the influence of these effects on the mantle dynamics of Mars.
Ana-Catalina Plesa, German Aerospace Centre, Institute of Planetary Research, Berlin, Germany, and WWU, Institute of Planetology, Muenster, Germany, e-mail: [email protected]
Tilman Spohn, WWU, Institute of Planetology, Muenster, Germany, e-mail: [email protected]

1 Introduction

Thermal and chemical convection in planetary mantles are the dominant dynamical processes influencing the thermal and geological evolution of a planet. After planetary formation, convection in the interior is one of the most prominent processes, being responsible for the heat transport efficiency, the interior structure, the magnetic field generation and the geological structures at the surface of a planet, such as volcanoes, rifts and others.

Mantle convection may take different forms depending on the planet. On Earth, it involves recycling of the surface or oceanic lithosphere and results in plate tectonics. Because the lithosphere is relatively cold, recycling the lithosphere represents an
extremely efficient way to remove the heat from the interior and cool the mantle. On other terrestrial planets, the so-called one-plate planets like Venus and Mars, mantle convection does not involve the outer layers. Instead, it occurs below a stagnant lid through which heat is transported by conduction. The different characteristics of mantle convection also have a strong influence on the planetary melt generation, which in turn can influence the cooling behavior and the convection structure itself. As the exact style of convection is only poorly known from observations such as seismology, or indirectly from geological structures at the planetary surfaces that are formed by the internal processes, our main understanding stems from laboratory experiments and in particular from computer simulations. The latter provide the most powerful access to this fluid-dynamical problem by solving partial differential equations in a discrete formulation to describe the flow in space and time. In the present project, a systematic study of the role of partial melt in a convecting planetary mantle on the heat transport and the flow pattern was performed with numerical simulations using the 2D and 3D spherical convection code GAIA [12, 13]. The model was applied in particular to Mars, using constraints inferred from the observed volcanic history that has been suggested by space mission data (e.g., Mars Express) and the data analyzed from the SNC meteorites [23].
2 Mantle Convection Model

Mantle convection is a highly non-linear process which can be modeled using the equations of conservation of mass, energy, and momentum [30]. In our model, two further equations account for compositional variations due to mantle depletion and changes in the mantle water concentration due to material dehydration. Both effects are consequences of mantle melting. The equations are scaled with the thickness of the mantle as a length scale and with the thermal diffusivity as a time scale. Therefore, the non-dimensional equations of a Boussinesq fluid assuming a Newtonian rheology are [8]:

$$ \nabla \cdot \mathbf{u} = 0 \tag{1} $$
$$ \nabla \cdot \left[ \eta \left( \nabla \mathbf{u} + (\nabla \mathbf{u})^T \right) \right] + \left( Ra\,T + Ra_C\,F \right) \mathbf{e}_r - \nabla p = 0 \tag{2} $$
$$ \frac{\partial T}{\partial t} + \mathbf{u} \cdot \nabla T - \nabla^2 T - \frac{Ra_Q}{Ra} + (T + T_{\mathrm{surf}}) \frac{\Delta S}{c_p} \frac{\partial F}{\partial t} = 0 \tag{3} $$
$$ \frac{\partial C}{\partial t} + \mathbf{u} \cdot \nabla C - \frac{1}{Le} \nabla^2 C + \frac{\partial F}{\partial t} = 0 \tag{4} $$
$$ \frac{\partial X}{\partial t} + \mathbf{u} \cdot \nabla X - \frac{1}{Le} \nabla^2 X + X_F \frac{\partial F}{\partial t} = 0 \tag{5} $$
The parameters in the above and following equations are non-dimensionalized using the relationships to physical properties presented in [5], where u is the velocity field, η is the viscosity, T is the temperature, C is the composition, X is the water concentration, e_r is the unit vector in radial direction, p is the pressure, T_surf is
the surface temperature, ΔS is the entropy change upon melting, c_p is the heat capacity at constant pressure, t is the time, F is the melt fraction, and Le is the Lewis number, which is the ratio of thermal diffusivity to mass diffusivity. X_F is the water concentration in the melt and can be calculated using the fractional melting formula [7, 20]:

$$ X_F = \frac{X_{\mathrm{bulk}}}{F} \left( 1 - (1 - F)^{1/D} \right) \tag{6} $$

where X_bulk is the initial water concentration in the mantle, and D = 0.01 the partitioning coefficient [17]. In (2) and (3), Ra is the thermal Rayleigh number, Ra_Q is the Rayleigh number for internal heat sources and Ra_C is the compositional Rayleigh number. These Rayleigh numbers are defined as follows [3, 6]:
$$ Ra = \frac{\rho g \alpha \Delta T D^3}{\kappa \eta} \tag{7} $$
$$ Ra_Q = \frac{\rho^2 g \alpha Q_m D^5}{\kappa k \eta} \tag{8} $$
$$ Ra_C = \frac{\Delta\rho \, g D^3}{\kappa \eta} \tag{9} $$
with ρ the mantle density, g the gravitational acceleration, ΔT the temperature difference between surface and core-mantle boundary, D the mantle thickness, κ the thermal diffusivity, η the reference viscosity, Q_m the mantle radioactive heat sources, k the thermal conductivity, and Δρ the density difference upon mantle differentiation. The viscosity is calculated according to the Arrhenius law for diffusion creep [15]. The non-dimensional formulation of the Arrhenius viscosity law for temperature and depth dependent viscosity [29] is given by:

$$ \eta(r, T) = \exp\left( \frac{E + (r_p - r)V}{T + T_{\mathrm{surf}}} - \frac{E + (r_p - r_{\mathrm{ref}})V}{T_{\mathrm{ref}} + T_{\mathrm{surf}}} \right) \tag{10} $$

where E is the activation energy, T_surf is the surface temperature, V is the activation volume, r is the radius, and r_p is the planet radius. T_ref and r_ref are reference values for temperature and radius. The effect of partial melt on the viscosity is given in (11), as described in [7]:

$$ \eta(r, T, X) = \eta(r, T) \, \exp\left( (1 - X_{\mathrm{res}}) \log(\Delta\eta_{\mathrm{dry/wet}}) \right) \tag{11} $$

where X_res is a normalized concentration so that it has an initial value of one and decreases toward zero as the water concentration in the mantle decreases. For our simulations, we test different initial temperature and composition profiles. The surface temperature is fixed and will not change during the simulation. We choose free slip boundary conditions. For this, the velocity vector is decomposed into a lateral part projected onto the boundary and a radial part. The radial component of the velocity is set to zero, while material can still move along the boundary. We also consider a one-plate planet with cooling boundary conditions and decaying radioactive elements.
The radioactive decay of the heat producing elements is given by [2]:

$$ Q_m = Q_0 \exp(-\lambda t) \tag{12} $$

where Q_0 is the initial heat-generation rate, λ is the decay constant, and t the time.
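For illustration, the following sketch evaluates Eqs. (7)–(12) for an assumed, Mars-like parameter set; all numerical values are order-of-magnitude assumptions for demonstration only and are not the values used in the GAIA simulations.

```cpp
// Illustrative, order-of-magnitude evaluation of Eqs. (7)-(12); the parameter
// values below are assumptions for demonstration, not the simulation values.
#include <cmath>
#include <cstdio>

int main() {
    // Assumed Mars-like mantle parameters (SI units):
    const double rho = 3500.0;      // mantle density [kg/m^3]
    const double g = 3.7;           // gravitational acceleration [m/s^2]
    const double alpha = 2.5e-5;    // thermal expansivity [1/K]
    const double dT = 2000.0;       // surface-CMB temperature difference [K]
    const double D = 1.7e6;         // mantle thickness [m]
    const double kappa = 1.0e-6;    // thermal diffusivity [m^2/s]
    const double eta0 = 1.0e21;     // reference viscosity [Pa s]
    const double k = 4.0;           // thermal conductivity [W/(m K)]
    const double Qm0 = 2.0e-11;     // initial heat production [W/kg]
    const double dRho = 60.0;       // density change upon depletion [kg/m^3]

    const double Ra  = rho * g * alpha * dT * std::pow(D, 3) / (kappa * eta0);            // (7)
    const double RaQ = rho * rho * g * alpha * Qm0 * std::pow(D, 5) / (kappa * k * eta0); // (8)
    const double RaC = dRho * g * std::pow(D, 3) / (kappa * eta0);                        // (9)

    // Non-dimensional Arrhenius viscosity, Eq. (10):
    auto eta = [](double r, double T, double E, double V,
                  double rp, double rref, double Tref, double Tsurf) {
        return std::exp((E + (rp - r) * V) / (T + Tsurf)
                      - (E + (rp - rref) * V) / (Tref + Tsurf));
    };

    // Decaying heat production, Eq. (12), assuming lambda = ln(2) / half-life:
    const double yr = 3.156e7;                          // seconds per year
    const double lambda = std::log(2.0) / (2.0e9 * yr); // assumed half-life
    auto Qm = [=](double t) { return Qm0 * std::exp(-lambda * t); };

    std::printf("Ra = %.3g, RaQ = %.3g, RaC = %.3g\n", Ra, RaQ, RaC);
    std::printf("eta(mid-mantle) = %.3g, Qm(4.5 Gyr) = %.3g W/kg\n",
                eta(0.5, 0.5, 30.0, 10.0, 1.0, 1.0, 0.5, 0.1),
                Qm(4.5e9 * yr));
    return 0;
}
```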
3 Technical Realization

We consider the mantle convection in 2D and 3D spherical shells using the code GAIA [12, 13]. The discretization of the governing equations is based on the finite-volume method, with the advantage of utilizing fully irregular grids in three and two dimensions, efficiently parallelized for up to 396 CPUs [12, 13]. As space is discretized by a fixed grid, time must be discretized as well. For the temporal discretization, a fully implicit second-order method, also called an implicit three-level scheme, after [9], has been used. In contrast to the spatial discretization, the temporal discretization is flexible and can adapt to the situation with a varying time step Δt. A method proposed by Caretto et al. in [4] and Patankar in [24], called SIMPLE, was adopted to solve the coupling of the continuity equation with the momentum equation. The model was validated by a comparison with analytically known solutions as well as published numerical results [12, 25]. A comparison with a commercial product also yielded satisfying results. A convergence test with successively refined grids proved the convergence of global quantities towards an extrapolated solution [14].

The program has been written in C++ and does not use any additional libraries. The code has been parallelized using the message passing interface (MPI). Libraries like HP-MPI, Intel MPI and MVAPICH have been successfully used. To run a simulation with high resolution, the code must work with more than one CPU in parallel. Typically, a domain decomposition of the grid is applied, which results in an optimal breakdown of the grid into p equal volumes for 3D grids and p areas for 2D grids, where each of the domains is assigned to a single processor. An efficient domain decomposition minimizes the interface area between these sections, leading to a minimized overhead of data exchange between the processors.

The computer code was tested on two supercomputing centers and two local shared memory machines. To evaluate the performance, the same initial setup was taken and run on several node counts. The ratio of the execution times determines the speed-up, i.e., the factor by which the code is accelerated for the same problem on various CPU counts. The IBM JUMP cluster consists of 41 nodes, each containing 32 p690 CPUs. The CLX system at the CINECA supercomputing center in Bologna, Italy, consists of 1024 Intel XEON CPUs with 3 GHz.
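The implicit three-level scheme mentioned above can be illustrated on a scalar model problem, for which the implicit stage is solvable directly (a sketch with a constant time step; in GAIA, the implicit system is of course the discretized PDE, and the step size varies):

```cpp
// Minimal sketch of a fully implicit three-level (second-order backward)
// time discretization, applied to d(phi)/dt = -a*phi with constant dt.
#include <cstdio>

int main() {
    const double a = 1.0, dt = 0.1;
    double phiOld = 1.0;                  // phi^{n-1}
    double phi = phiOld * (1.0 - a * dt); // phi^{n}, bootstrapped by one Euler step

    for (int n = 0; n < 50; ++n) {
        // (3*phi^{n+1} - 4*phi^{n} + phi^{n-1}) / (2*dt) = -a * phi^{n+1}
        double phiNew = (4.0 * phi - phiOld) / (3.0 + 2.0 * a * dt);
        phiOld = phi;
        phi = phiNew;
    }
    std::printf("phi(T) = %f\n", phi);    // compare with exp(-a*T)
    return 0;
}
```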
Fig. 1 2D-3D performance depending on the number of CPUs used. The speed-up factor is calculated by dividing the amount of time needed with only one CPU by the amount of time needed with the parallel code; blue: IBM JUMP, magenta: CINECA CLX, red: 8-Way AMD Opteron, green: 8-Way Intel Xeon
On the small-scale architectures, the 8-way AMD Opteron 875 CPU outran the 8-way quad-core XEON 5355 shared memory configuration, even though Intel's single-core performance was almost twice as fast as a single AMD 875 core. The break-even point was reached with 4 CPUs. Figure 1 shows a comparison of the speed-up obtained for both 2D and 3D simulations [25]. The 2D version of the GAIA code was tested further on four other supercomputing centers: HLRN (North German Supercomputing Alliance), PFCLUSTER1 (German Aerospace Centre, DLR Berlin), HP XC4000 (Steinbuch Centre for Computing, SCC Karlsruhe) and Itasca (Minnesota Supercomputing Institute for Advanced Computational Research). On the HLRN cluster, the computational nodes used contain two quad-core sockets each for Intel Xeon Harpertown processors with 3 GHz and 16 GB memory per node. The computational nodes on PFCLUSTER1 each have two quad-core AMD Opteron processors running at 2.3 GHz and 16 GB memory per node. On the XC2-Karlsruhe, we tested the code with four-way computational nodes, each containing two AMD Opteron dual cores running at 2.6 GHz with 32 GB per node. On the Itasca cluster, we used computational nodes having each two quad-core Intel Xeon X5560 processors running at 2.8 GHz and 24 GB memory per node. In Fig. 2 we show the speed-up using a 128-shell grid with 152064 computational points, which is a typical resolution for our mantle convection simulations. The speed-up has been calculated by averaging the time needed for a time step over 20 time steps. Unlike in Fig. 1, the "speed-up" factor in Fig. 2 is calculated by dividing the amount of time needed with eight CPUs by the amount of time needed with the parallel code. The dotted line in Fig. 2 shows the optimal speed-up. In Fig. 3 we show the speed-up reached on PFCLUSTER1 for various grids having different numbers of computational points.
Fig. 2 The speedup using a 2D 128 shells grid (152064 computational points) on 8, 16, 32, 64 and 128 CPUs on HLRN, PFCLUSTER1, XC2-Karlsruhe and Itasca clusters. The speed-up factor is calculated by dividing the amount of time needed with eight CPUs by the amount of time needed with the parallel code; blue: HLRN, red: PFCLUSTER1, yellow: XC2-Karlsruhe, green: Itasca
Fig. 3 The speedup using three different 2D grids: 32 shells with 9056 computational points grid, 128 shells with 152064 computational points grid and 512 shells with 2378752 computational points grid on 8, 16, 32, 64 and 128 CPUs on PFCLUSTER1. The speed-up factor is calculated by dividing the amount of time needed with eight CPUs by the amount of time needed with the parallel code; blue: 32 shells grid, red: 128 shells grid, yellow: 512 shells grid
As shown in Fig. 3, the code performance increases with an increasing number of CPUs used. However, the grid size is also of major importance. When using a moderate number of computational points, hence a small grid, the performance can decrease with an increasing number of CPUs because of the communication overhead between the domains [25]. Figures 1 and 3 show such behavior for the 2D grid starting with 128 CPUs.
4 Results and Discussion

Melt generation in a planetary mantle is a complex process that has a strong influence on the thermo-chemical evolution of a planet. In most earlier thermal evolution and convection models, melt production has been considered by the consumption and release of latent heat, the associated formation of a crust and the redistribution of radioactive heat sources [1, 10, 31]. However, when modeling partial melt it is important to consider the effects this process has upon the melting temperatures, the density and the viscosity of the mantle material.

First, melt can influence the melting temperature through the melt extraction (depletion of the mantle material). Due to the loss of low-melting-point components, the solidus increases with increasing degree of depletion (amount of extracted melt) by 150–200 K [19]. Significant variability in the solidus temperature is due, at least in part, to the fact that the primordial mantle material (peridotite) contains incompatible elements like Ca and Al (Cpx elements), which have a lower melting temperature, whereas residues like harzburgite and dunite formed upon 30–60% depletion are depleted in these elements and enriched in Mg (higher fraction of Opx and Ol) [17]. During the melting process, the increase in the melting temperature due to changes in the mineral phase diagram tends to decrease the amount of melt with time. By increasing the solidus of depleted mantle material, the melt production rates are lower than in a case where this effect has not been accounted for. Figure 4 shows the amount of depletion (processed mantle material) in percent for two cases, with and without solidus variations. The crust production is therewith lower, which corresponds better to the estimates obtained by inverting MOLA gravity and topography data [21].

Second, the density of the mantle material decreases due to compositional changes.
Fig. 4 The melt extraction in percent: Solidus temperature increases with increasing degree of depletion (black) and no changes in the solidus temperature due to depletion of mantle material (cyan)
which leaves behind a residuum depleted in incompatible elements and modified in modal mineralogy. This phenomenon, known as melt depletion, is geodynamically important because a melt-depleted residue is expected to be more buoyant than its fertile parent material. The decrease in residue density derives from both changes in mineral density and changes in the relative proportions of minerals. (4) includes a chemical component to account for compositional changes due to depletion [6]. Plesa and Breuer have shown in [28] results using a model of thermo-chemical convection with partial melt. Figure 5 shows a comparison between a case with no compositional changes and another case with compositional variations due to depletion. In the latter case the density variations are 60 kg/m3 upon 30% differentiation (from peridotite—fertile mantle material—to residues like harzburgite). The composition for higher degrees of depletion like dunite (up to 60% depletion) does not cause any further changes in the residue density. Therefore, the maximum density variations are reached between peridotite and harzburgite (0–30% depletion). The tests in Fig. 5 were performed for a dry mantle rheology, which assumes a reference viscosity of $10^{21}$ Pa s. Density variations result in mantle inhomogeneities; however, a stable mantle layering is not obtained in this case. The depleted layer is eroded by convection and new fertile material is supplied from the lower mantle. Eventually, mixing takes place and separate mantle reservoirs cannot be sustained over the entire thermo-chemical evolution. For Mars, however, the SNC meteorites suggest reservoirs that have not mixed since their early formation.

Third, melt can indirectly impact the viscosity of partially molten rocks through its influence on the water content [11]. Mantle material is dehydrated due to the partitioning of water from the minerals into the melt during the melting process. As a consequence, the viscosity of water-depleted regions increases by more than two orders of magnitude compared to water-saturated rocks [18]. In first studies [26, 27], it has been shown that consideration of the viscosity increase due to partial melt results in slower cooling rates. Figure 6 shows the thermal evolution of a case with an initially wet mantle rheology, which assumes a reference viscosity of $10^{19}$ Pa s, two orders of magnitude lower than in the previous dry rheology cases. The viscosity increases in the regions where partial melt has been extracted, due to depletion (dehydration) during the melting process. The initial wet mantle rheology results in higher convection velocities. Partial melt production in this case starts earlier in the planet's thermal evolution because mantle plumes rise higher than in the dry rheology cases, since the mantle material has a lower reference viscosity. Due to the higher mantle velocities, material mixes much faster than in the dry rheology cases, and an increase in the overall mantle viscosity takes place, which also thickens the upper boundary layer and reduces the production of partial melt. However, the formation of a more viscous mantle due to dehydration slows the cooling of the interior and may allow for prolonged melt generation. Depending on the amount of melt produced, this effect can play an important role in the volcanic history of the planet. Taking into account both density variations and dehydration of the mantle material due to depletion, a buoyant depleted layer can form.
This layer will be depleted in both crustal components and water and has a blanketing effect preventing the
Fig. 5 Temperature and depletion slices for a case with compositional changes which don’t affect the density (upper row) and a case with compositional changes which cause a difference of 60 kg/m3 upon 30% mantle differentiation (lower row); solidus variations due to mantle depletion have been considered in both cases; partial melt is shown by the white spots in the temperature slice; three timesteps are presented: t = 1.10 Ga, t = 4.03 Ga, and t = 4.5 Ga
Fig. 6 Temperature and water concentration slices for a case where dehydration due to water depletion during the melting process has been considered; partial melt is shown by the white spots in the temperature slice; three timesteps are presented: t = 0.09 Ga, t = 2.75 Ga, and t = 4.5 Ga
lower mantle from cooling efficiently. Partial melt, shown by the white spots in the temperature slice in Fig. 7, appears in the heads of plumes which rise not from the core-mantle boundary, as in the previous cases, but from the hot lower mantle. This kind
Fig. 7 Temperature and depletion slices (upper row) and viscosity and water concentration slices (lower row) for a case with both compositional changes, which cause a difference in the mantle density of 60 kg/m3 upon 30% mantle differentiation, and dehydration, which increases the viscosity up to two orders of magnitude in water depleted regions; solidus variations due to mantle depletion has been considered; partial melt is shown by the white spots in the temperature slice; three timesteps are presented: t = 1.10 Ga, t = 4.03 Ga, and t = 4.5 Ga
of layering may help explain the SNC meteorites, which show that on Mars separate reservoirs formed early in the planet's evolution and have not mixed since.
5 Conclusions

In this work we presented results using 2D spherical convection models which account for the effects of partial melt on melting temperatures, mantle density and viscosity. For the tests we used computational grids with 152064 nodes and time steps in the range of $10^{-6}$ to $10^{-8}$, which adapt during the simulation run depending on the parameters chosen. Performance tests show increasing speedup for the 2D grid when using up to 64 CPUs, while for the 3D grid as many as 256 CPUs or more can be used. The duration of a simulation strongly depends on the parameters chosen (e.g. the Rayleigh number) and ranges from 8 to 24 hours on 40 CPUs for the cases in Fig. 5 and on 64 CPUs for the cases in Figs. 6 and 7.

The results show that the amount of melt produced during the planetary evolution strongly depends on the density variations due to compositional changes. Compositional variations arise from the extraction of partial melt, which leaves behind a residuum depleted in incompatible elements. In addition, in a wet mantle case the mantle material is dried out due to the partitioning of water from the minerals into the melt. This dehydration effect results in a stiffening of the mantle material. Both compositional changes and dehydration favor the formation of a buoyant depleted mantle layer, which results in reduced melting rates since, owing to the compositional layering, only the upper buoyant layer is processed efficiently. Also, the increase in the melting temperature caused by changes in the mineral phase diagram tends to decrease the amount of melt with time. All three effects (increase in the melting temperature, density decrease due to depletion and dehydration stiffening) have consequences for crustal production rates, outgassing and the formation of separate stable reservoirs.

The effects of partial melt on viscosity, mantle density and melting temperature reduce the efficiency of outgassing of a one-plate planet. This implies that in the case of Mars the atmospheric conditions for liquid water at the surface are less favorable. The formation of separate reservoirs that have not mixed since their formation, as indicated by the SNC meteorites, is hard to explain, since the density variations between residual mantle material depleted in crustal components and primordial mantle material are small and cannot prevent efficient mixing. However, if a wet planetary mantle is assumed, dehydration stiffening in depleted regions helps maintain a buoyant residual upper layer throughout the planetary evolution. This suggests that in the case of Mars the mantle material was rich in volatiles, so that a wet planetary mantle may be assumed.
6 Outlook

The results presented in this work are only a small step towards a systematic investigation of the effects of partial melt on mantle convection. Further effects which have to be considered in future work are:

1. The effects of the activation energy (see (10)). For simplicity and for a better comparison of the results, we assumed the same activation energy for both the wet and the dry rheology presented here. However, the activation energy of wet olivine is 240 kJ/mol, smaller than that of dry olivine with 300 kJ/mol [16]. In future tests we will investigate the effects of the activation energy on partial melt production in the mantle.

2. In our tests we assumed a linear parametrization of the solidus and liquidus temperatures. Further tests should include a more realistic parametrization of the solidus and liquidus as in [32], where a polynomial approach is presented. A consequence of a polynomial parametrization is that partial melt can also be produced at greater depths [6]; however, only melt produced above the density inversion depth [22] contributes to crust production and atmospheric outgassing. Further simulations are needed in order to better quantify the effect of the melting temperature parametrization.

3. The water concentration in the melt can be calculated using either the fractional melting formula [7, 20] or the batch melting formula [17]. For our tests we used the fractional melting approach. Further tests are needed to compare the effects of fractional melting and batch melting on the water concentration and, with it, on the viscosity of the mantle material. In the batch (equilibrium) melting case, melt continuously reacts and equilibrates with the crystalline residue until segregation (accumulation until the permeability threshold), whereas in the fractional melting case melt is continuously removed, allowing no reaction with the crystalline residue (melt extracted at a low degree of melting, e.g. 1%). Major differences between batch and fractional melting are therefore expected for low degrees of melting; for degrees of melting higher than 1% the water concentration in the melt calculated by the two approaches should yield similar results. For a better comparison of the two approaches further simulations have to be done.

4. In our tests we did not account for the redistribution of radioactive heat sources due to depletion. On the one hand, this will result in a cooler mantle, since during the melting process the source mantle becomes depleted in incompatible elements such as the heat-producing elements. On the other hand, the extracted heat sources become enriched in the crust, which also has a lower thermal conductivity than the mantle beneath and therewith acts as a blanketing layer keeping the mantle warm. Heat source depletion and crustal feedback are complex processes which need more attention; in further work we aim to investigate these effects.

Acknowledgments. This research has been supported by the Helmholtz Association through the research alliance "Planetary Evolution and Life". The project is conducted at the SCC Karlsruhe on the XC4000 supercomputer.
References 1. Breuer, D.; Spohn, T.: Early plate tectonics versus single plate tectonics: Evidence from the magnetic field history and crust evolution. J. Geophys. Res. – Planets, 108, 5072, (2003), doi:10.1029/20002JE001999. 2. Breuer, D.; Moore, W.B.: Dynamics and Thermal History of the Terrestrial Planets, the Moon, and Io. In: Treatise on Geophysics (Editor-in-Chief G. Schubert), 10, Planets and Moons (Ed. T. Spohn), p. 299–348, Elsevier, Amsterdam, (2007). 3. Breuer, D.: Dynamics and thermal evolution. In: Landolt-B¨ornstein Astronomy and Astrophysics (Group VI), 4, Astronomy, Astrophysics, and Cosmology (B – Solar System), p. 254– 270, Springer, Berlin, (2009), ISBN 978 3 540 88054 7. 4. Caretto, L.S.; Gosman, A.D.; Patankar, S.V.; Spalding, D.B.: Two calculation procedures for steady, three-dimensional flows with recirculation. Proc. Third Int. Conf. Numer. Methods Fluid Dyn., Paris, (1972). 5. Christensen, U.: Convection with pressure- and temperature-dependent non-Newtonian rheology. Geophysical Journal-Royal Astronomical Society, 77, 343–384, (1984). 6. De Smet, J.H.; Van Den Berg, A.P.; Vlaar, N.J.: The evolution of continental roots in numerical thermo-chemical mantle convection models including differentiation by partial melting. Lithos, 48, 153–170, (1999). 7. Fraeman, A.; Korenaga, Y.: The influence of mantle melting on the evolution of Mars. Icarus, 210, 43–57, (2010), doi:10.1016/j.icarus.2010.06.030. 8. Grasset, O.; Parmentier, E.M.: Thermal convection in a volumetrically heated, infinite Prandtl number fluid with strongly temperature-dependent viscosity: Implications for planetary thermal evolution. J. Geophys. Res., 103, 18171–18181, (1998). 9. Harder, H.; Hansen, U.: A finite-volume solution method for thermal convection and dynamo problems in spherical shells. Geophysical Journal International, 161, 522–532, (1986). 10. Hauck, S.A.; Phillips, R.J.: Thermal and crustal evolution of mars. J. Geophys. Res., 2002, 107, E7, doi:10.1029/2001JE001801, (2007). 11. Hirth, G.; Kohlstedt, D.L.: Water in the oceanic upper mantle: Implication for rheology, melt extraction and the evolution of the lithosphere. Earth and Planetary Science Letters, 144, 93– 108, (1996). 12. Huettig, C.; Stemmer, K.: Finite volume discretization for dynamic viscosities on Voronoi grids. Phys. Earth Planet. Interiors (2008), doi:10.1016/j.pepi.2008.07.007. 13. Huettig, C.; Stemmer, K.: The spiral grid: A new approach to discretize the sphere and its application to mantle convection. Geochem. Geophys. Geosyst., 9, Q02018, (2008), doi:10.1029/2007GC001581. 14. Huettig, C.: Scaling Laws for Internally Heated Mantle Convection, Ph. D. Thesis, (2009). 15. Karato, S.; Paterson, M.S.; Fitz Gerald, J.D.: Rheology of synthetic olivine aggregates: Influence of grain size and water. J. Geophys. Res., 91, 8151–8176, (1986). 16. Karato, S.; Wu, P.: Rheology of the upper mantle: A synthesis. Science, 260, 5109, 771–778, (1993). 17. Katz, R.F.; Spiegelman, M.; Langmuir, C.H.: A new parameterization of hydrous mantle melting. Geochemistry, Geophysics, Geosystems, 4 (9), 1073, (2003). 18. Korenaga, J.: Scaling of stagnant-lid convection with Arrhenius rheology and the effects of mantle melting. Geophys. J. Int., 179, 154–170, (2009), doi:10.1111/j.1365-246X.2009.04272.x. 19. Maaløe, S.: The solidus of harzburgite to 3 GPa pressure: The compositions of primary abyssal tholeiite. Mineralogy and Petrology, 81 (12), 117 (2004). 20. 
Morschhauser, A.; Grott, M.; Breuer, D.: Crustal recycling, mantle dehydration, and the thermal evolution of Mars. Icarus, 212 (2), 541–558, doi:10.1016/j.icarus.2010.12.028. (2011). 21. Neumann, G.A.; Zuber, M.T.; Wieczorek, M.A.; McGovern, P.J.; Lemoine, F.G.; Smith, D.E.: Crustal structure of Mars from gravity and topography. J. Geophys. Res., 109, E08002, (2004).
22. Ohtani, E.; Nagata, Y.; Suzuki, A.; Kato, T.: Melting relations of peridotite and the density crossover in planetary mantles. Chemical Geology, 120, 207–221, (1995). 23. Papike, J.J.; Karner, J.M.; Shearer, C.K.; Burger, P.V.: Silicate mineralogy of martian meteorites. Geochimica et Cosmochimica Acta, 73, 7443–7485, (2009), doi:10.1016/j.gca.2009.09.008. 24. Patankar, S.V.: Numerical Heat Transfer and Fluid Flow. McGraw-Hill, New York, (1980). 25. Plesa, A.-C.; Huettig, C.: Numerical Simulation of Planetary Interiors: Mantle Convection in a 2D Spherical Shell (Abstract). Workshop on Geodynamics 2008, Herz-Jesu-Kloster, Neustadt, Waldstr. 145, 67434 Neustadt/Weinstrasse, (2008). 26. Plesa, A.-C.; Breuer, D.: Viscosity Variations Due to the Influence of Partial Melt: Implications for the Thermal Evolution of Mars and Earth (Abstract No. P1.05). International Conference on Comparative Planetology: Venus – Earth – Mars, Noordwijk, The Netherlands, (2009). 27. Plesa, A.-C.; Breuer, D.: Effects of Viscosity Modifications and Solidus Changes in Regions of Partial Melt on Mantle Dynamics (Abstract No. EPSC2009-366). 4th European Planetary Science Congress (EPSC), Potsdam, Germany, (2009). 28. Plesa, A.-C.; Breuer, D.: The Influence of Partial Melt Generation on Mantle Density and Viscosity: Consequence for the Mantle Dynamics (Abstract). Geodynamics Workshop 2010, Münster, Germany, (2010). 29. Roberts, J.H.; Zhong, S.: Degree-1 convection in the Martian mantle and the origin of the hemispheric dichotomy. Journal of Geophysical Research E: Planets, 111, (2006). 30. Schubert, G.; Turcotte, D.L.; Olson, P.: Mantle Convection in the Earth and Planets. Cambridge University Press, Cambridge, (2001). 31. Schumacher, S.; Breuer, D.: Influence of a variable thermal conductivity on the thermochemical evolution of mars. J. Geophys. Res., 111 (E2), E02006, (2006). 32. Takahashi, E.: Speculations on the Archean mantle: Missing link between komatiite and depleted garnet peridotite. J. Geophys. Res., 95 B10, 15941–15954, (1990).
Molecular Modeling of Hydrogen Bonding Fluids: Phase Behavior of Industrial Fluids Stefan Eckelsbach, Martin Bernreuther, Cemal Engin, Gabriela Guevara-Carrion, Yow-Lin Huang, Thorsten Merker, Hans Hasse, and Jadran Vrabec
1 Introduction

The success of process design in chemical engineering and energy technology depends on the availability and accuracy of thermodynamic properties. In recent years, molecular modeling and simulation has become a promising tool to accurately predict thermodynamic properties of fluids. Thermodynamic data can accurately be predicted with molecular models that are based on quantum chemical calculations and are optimized to vapor-liquid equilibrium (VLE) data only.
2 Molecular Model Class

To describe the intermolecular interactions, a varying number of Lennard-Jones (LJ) sites and superimposed point charges, point dipoles and linear point quadrupoles was used. Point dipoles and quadrupoles were employed for the description of the electrostatic interactions to reduce the computational effort during simulation. However, a point dipole may, e.g. when a simulation program does not support this interaction site type, be approximated by two point charges ±q separated by a distance l. Analogously, a linear point quadrupole can be approximated by three collinear point charges. A simulation code that does support point dipole and point quadrupole sites is ms2 [8].

Stefan Eckelsbach · Yow-Lin Huang · Jadran Vrabec
Lehrstuhl für Thermodynamik und Energietechnik (ThEt), Universität Paderborn, Warburger Str. 100, 33098 Paderborn, Germany, e-mail: [email protected]

Martin Bernreuther
High Performance Computing Center Stuttgart (HLRS), Department Parallel Computing – Training & Application Services, Nobelstr. 19, 70569 Stuttgart, Germany

Cemal Engin · Gabriela Guevara-Carrion · Thorsten Merker · Hans Hasse
Lehrstuhl für Thermodynamik (LTD), Technische Universität Kaiserslautern, Erwin-Schrödinger-Str. 44, 67663 Kaiserslautern, Germany
The parameters of the present molecular models can be separated into three groups. Firstly, the geometric parameters specify the positions of the different interaction sites of the molecular model. Secondly, the electrostatic parameters define the polar interactions in terms of point charges, dipoles and quadrupoles. And finally, the dispersive and repulsive parameters determine the attraction by London forces and the repulsion by overlaps of the electronic orbitals. Here, the LJ 12-6 potential [16, 17] was used to allow for a straightforward compatibility with the overwhelming majority of the molecular models in the literature. The total intermolecular interaction energy thus writes as

$$U = \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \left\{ \sum_{a=1}^{S_i^{\mathrm{LJ}}} \sum_{b=1}^{S_j^{\mathrm{LJ}}} 4\varepsilon_{ijab} \left[ \left( \frac{\sigma_{ijab}}{r_{ijab}} \right)^{12} - \left( \frac{\sigma_{ijab}}{r_{ijab}} \right)^{6} \right] + \sum_{c=1}^{S_i^{e}} \sum_{d=1}^{S_j^{e}} \frac{1}{4\pi\varepsilon_0} \left[ \frac{q_{ic}\,q_{jd}}{r_{ijcd}} + \frac{q_{ic}\,\mu_{jd} + \mu_{ic}\,q_{jd}}{r_{ijcd}^{2}} \cdot f_1(\omega_i,\omega_j) + \frac{q_{ic}\,Q_{jd} + Q_{ic}\,q_{jd}}{r_{ijcd}^{3}} \cdot f_2(\omega_i,\omega_j) + \frac{\mu_{ic}\,\mu_{jd}}{r_{ijcd}^{3}} \cdot f_3(\omega_i,\omega_j) + \frac{\mu_{ic}\,Q_{jd} + Q_{ic}\,\mu_{jd}}{r_{ijcd}^{4}} \cdot f_4(\omega_i,\omega_j) + \frac{Q_{ic}\,Q_{jd}}{r_{ijcd}^{5}} \cdot f_5(\omega_i,\omega_j) \right] \right\}, \quad (1)$$

where $r_{ijab}$, $\varepsilon_{ijab}$, $\sigma_{ijab}$ are the distance, the LJ energy parameter and the LJ size parameter, respectively, for the pair-wise interaction between LJ site $a$ on molecule $i$ and LJ site $b$ on molecule $j$. The permittivity of vacuum is $\varepsilon_0$, whereas $q_{ic}$, $\mu_{ic}$ and $Q_{ic}$ denote the point charge magnitude, the dipole moment and the quadrupole moment of the electrostatic interaction site $c$ on molecule $i$ and so forth. The expressions $f_x(\omega_i, \omega_j)$ stand for the dependency of the electrostatic interactions on the orientations $\omega_i$ and $\omega_j$ of the molecules $i$ and $j$ [3, 13]. Finally, the summation limits $N$, $S_x^{\mathrm{LJ}}$ and $S_x^{e}$ denote the number of molecules, the number of LJ sites and the number of electrostatic sites of molecule $x$, respectively. For a given molecule, i.e. in a pure fluid throughout, the interactions between LJ sites of different type were defined by applying the standard Lorentz-Berthelot combining rules [5, 18]

$$\sigma_{ijab} = \frac{\sigma_{iiaa} + \sigma_{jjbb}}{2}, \quad (2)$$

and

$$\varepsilon_{ijab} = \sqrt{\varepsilon_{iiaa}\,\varepsilon_{jjbb}}. \quad (3)$$
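To illustrate (1)–(3), the following C sketch evaluates the LJ 12-6 contribution for one pair of unlike sites with Lorentz-Berthelot parameters; it is a minimal illustration under the stated equations, not code from ms2, and all names are chosen here for demonstration.

```c
#include <math.h>

/* One LJ site: size parameter sigma and energy parameter epsilon. */
typedef struct { double sigma, epsilon; } lj_site;

/* LJ 12-6 energy between site a on molecule i and site b on molecule j
   at distance r, with the unlike-site parameters from the
   Lorentz-Berthelot combining rules, Eqs. (2) and (3). */
double lj_pair_energy(lj_site a, lj_site b, double r)
{
    double sigma_ab   = 0.5 * (a.sigma + b.sigma);   /* Eq. (2): arithmetic mean */
    double epsilon_ab = sqrt(a.epsilon * b.epsilon); /* Eq. (3): geometric mean  */
    double sr6 = pow(sigma_ab / r, 6.0);
    return 4.0 * epsilon_ab * (sr6 * sr6 - sr6);     /* LJ term of Eq. (1) */
}
```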
3 Molecular Properties from Quantum Chemistry

Molecular models that were developed on the basis of quantum chemical (QC) calculations stand between ab initio models and empirical models. The present strategy is based on the idea to include ab initio information without giving up the freedom to reasonably optimize the model to important macroscopic thermodynamic
properties. Thus, for the modeling process some experimental data are needed for optimization. The chosen properties, vapor pressure and saturated liquid density, have the advantage of being readily available for numerous engineering fluids and of representing dominant features of the fluid state.
3.1 Geometry

All geometric data of the molecular models, i.e. bond lengths, bond angles and dihedrals, were specified on the basis of QC calculations. To this end, a geometry optimization, i.e. an energy minimization, was initially performed using GAMESS(US) [26]. The Hartree-Fock level of theory was applied with a relatively small (6-31G) basis set. The resulting configuration of the atoms was taken to specify the spatial distribution of the LJ sites, except for the sites that represent groups containing Hydrogen atoms. As the united atom approach was used to obtain computationally efficient molecular models, the dispersive and repulsive interactions of the Hydrogen atoms were modeled together with the atom they are bonded to. For the Methyl (CH3) united atom site, the LJ potential was located at the geometric mean of the nuclei, while the Methine (CH) united atom site was located at 0.4 of the distance between the Carbon and Hydrogen atoms. These empirical offsets are in good agreement with the results of Ungerer et al. [32], which were found by optimization of transferable molecular models for n-Alkanes.
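The site placement just described can be sketched in a few lines; this is a minimal illustration that reads "geometric mean of the nuclei" as the centroid of the four nuclei (an interpretation, not confirmed by the original), with all names hypothetical.

```c
/* Simple 3D point type for nuclear coordinates (illustrative). */
typedef struct { double x, y, z; } vec3;

/* CH united atom: the LJ site sits at 0.4 of the C-H distance,
   shifted from the carbon towards the hydrogen. */
vec3 ch_site(vec3 c, vec3 h)
{
    vec3 s = { c.x + 0.4 * (h.x - c.x),
               c.y + 0.4 * (h.y - c.y),
               c.z + 0.4 * (h.z - c.z) };
    return s;
}

/* CH3 united atom: centroid of the four nuclei (1 C + 3 H),
   assuming "geometric mean of the nuclei" denotes the centroid. */
vec3 ch3_site(vec3 c, vec3 h1, vec3 h2, vec3 h3)
{
    vec3 s = { (c.x + h1.x + h2.x + h3.x) / 4.0,
               (c.y + h1.y + h2.y + h3.y) / 4.0,
               (c.z + h1.z + h2.z + h3.z) / 4.0 };
    return s;
}
```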
3.2 Electrostatics

Intermolecular electrostatic interactions mainly occur due to static polarities of single molecules that can well be obtained by QC. Here, the Møller-Plesset 2 level of theory, which considers electron correlation, was used in combination with the polarizable 6-311G(d,p) basis set. The purpose of the present work was the development of effective pair potentials with state-independent model parameters. Obviously, the electrostatic interactions are stronger in the liquid state than in the gaseous state due to the higher density. Furthermore, the mutual polarization raises their magnitude in the liquid. Thus, for the calculation of the electrostatic moments by QC, a liquid-like state should be considered. This was done here by placing one molecule into a dielectric continuum and assigning the experimental dielectric constant of the liquid to it, as in the COSMO method. From the resulting electron density distribution for the small symmetric molecules studied here, the dipole and quadrupole moments were estimated by simple integration over the orbitals. Thus, magnitudes and orientations of these electrostatic interaction sites were derived from QC calculations.
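Where a simulation program lacks point-dipole sites (cf. Sect. 2), the moments obtained here can be mapped onto point charges. A minimal sketch, assuming the elementary relation q = μ/l for two charges ±q separated by a distance l, where l is a free modeling choice (small compared to σ); the names are illustrative.

```c
/* A point charge placed at a signed offset along the dipole axis. */
typedef struct { double q, offset; } point_charge_1d;

/* Replace a point dipole of moment mu by two charges +q and -q
   separated by l, centered on the dipole site: q = mu / l. */
void dipole_to_charges(double mu, double l,
                       point_charge_1d *plus, point_charge_1d *minus)
{
    plus->q  =  mu / l;   plus->offset  =  0.5 * l;
    minus->q = -mu / l;   minus->offset = -0.5 * l;
}
```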
3.3 Dispersion and Repulsion

It would be highly desirable to also calculate the dispersive and repulsive interactions using ab initio methods. This approach was followed by different authors in the past, predominantly for simple molecules. However, from an engineering point of view, this leads to difficulties. For an estimation of dispersive and repulsive interactions, at least two molecules must be taken into account. To properly scan the energy hypersurface, many QC calculations for different distances and orientations of the molecules have to be performed. As the dispersive, and partly also the repulsive, interactions are usually only a very small fraction of the total energy calculated by QC, highly accurate methods like coupled cluster (CC) with large basis sets or even extrapolations to the basis set limit must be used for this task [27]. Because this is computationally too expensive for engineering purposes, LJ parameters for a given atom or molecular group were passed on from other molecular models. Some of these parameters were subsequently fitted in the optimization process to yield an accurate VLE behavior of the modeled pure substance.
4 Pure Fluid Models

None of the six molecules modeled in the present work (Hydrogen chloride, Phosgene, Toluene, Benzene, Chlorobenzene and Ortho-Dichlorobenzene) exhibits significant conformational changes. Their internal degrees of freedom were thus neglected and the molecular models were chosen to be rigid, using the most stable configuration as determined by QC. The optimization was performed using a Newton scheme following Stoll [11, 30]. The applied method has many similarities with the one published by Bourasseau et al. [6]. It relies on a least-squares minimization of a weighted fitness function that quantifies the deviations of simulation results for a given molecular model from reference data. Correlations for vapor pressure, saturated liquid density and enthalpy of vaporization, taken from the DIPPR database [25], were used as reference data for model adjustment and evaluation. This was done even in cases where the correlations are based on only a few true experimental data points, as they were regarded as best practice. The quantitative comparison between simulation results and correlations of experimental data was done by applying fits to the simulation data according to Lotfi et al. [19]. The relative deviation between fit and correlation was calculated in steps of 1 K in the temperature range where simulations were performed and is denoted by "mean unsigned error" in the following. VLE were simulated with the Grand Equilibrium method [33] that is implemented in the ms2 simulation code [8]. The optimized parameter sets of the new molecular models are given in [15].
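The "mean unsigned error" just defined can be sketched as follows; fit and corr are illustrative stand-ins for the fit to the simulation data and the DIPPR correlation, and the code is a minimal illustration rather than the evaluation tool actually used.

```c
#include <math.h>

/* Mean unsigned (relative) error between a fit to the simulation data
   and the experimental correlation, evaluated in steps of 1 K over the
   temperature range [t_min, t_max] and averaged. */
double mean_unsigned_error(double (*fit)(double), double (*corr)(double),
                           double t_min, double t_max)
{
    double sum = 0.0;
    int n = 0;
    for (double t = t_min; t <= t_max; t += 1.0) {
        sum += fabs((fit(t) - corr(t)) / corr(t));
        ++n;
    }
    return sum / n;
}
```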
Fig. 1 Saturated densities; present simulation data: • Hydrogen chloride, ◦ Phosgene, Toluene; correlations of experimental data [25]: —
The pure substance VLE simulation results on the basis of these optimized models are shown in absolute terms in Figs. 1, 2 and 3, where they are compared to the DIPPR correlations of experimental data. Figure 2 illustrates the influence of molecular size and polarity on the phase envelope in a systematic manner. Both size and polarity increase in the sequence Benzene, Chlorobenzene, Ortho-Dichlorobenzene, which is reflected by a decreasing average saturated liquid density and an increasing critical temperature. The critical properties were determined through fits to the present VLE simulation results as suggested by Lotfi et al. [19]. The estimated uncertainties of critical temperature, critical density and critical pressure from simulation are 1, 3 and 3%, respectively. Table 1 compares these critical properties to experimental data [1, 2, 4, 7, 20]. An excellent agreement was achieved, with almost all values lying within the combined error bars. For Hydrogen chloride, Phosgene and Benzene, experimental data on the second virial coefficient are available [9, 22, 23, 31]. Figure 4 compares the predictions based on the present molecular models with these data. The agreement is very good; noticeable deviations are present only at low temperatures for the two smaller molecules.
Fig. 2 Saturated densities; present simulation data: • Benzene, ◦ Chlorobenzene, Ortho-Dichlorobenzene; correlations of experimental data [25]: —

Fig. 3 Vapor pressure; present simulation data: Hydrogen chloride, Phosgene, Benzene, Toluene, Chlorobenzene, Ortho-Dichlorobenzene; correlations of experimental data [25]: —
Table 1 Critical properties of the pure substances on the basis of the new molecular models in comparison to recommended experimental data. The number in parentheses indicates the experimental uncertainty in the last digit

Substance               Tc^sim/K  Tc^exp/K   ρc^sim/(mol/l)  ρc^exp/(mol/l)  pc^sim/MPa  pc^exp/MPa  Ref.
Hydrogen chloride       324       324.65(5)  12.2            12.34(3)        8.3         8.31(5)     [20]
Phosgene                454       455.0(7)   5.1             5.40(6)         5.7         5.35(4)     [2]
Benzene                 563       562.15(6)  3.9             3.88(2)         4.9         4.9(1)      [4]
Chlorobenzene           631       632.35(8)  3.2             3.24(7)         4.6         4.52(8)     [1]
Ortho-Dichlorobenzene   705       705.0(9)   2.8             2.77(6)         4.0         4.1(3)      [7]
Toluene                 592       591.75(8)  3.4             3.20(4)         4.1         4.08(3)     [4]
Fig. 4 Second virial coefficient; present simulation data: • Hydrogen chloride, ◦ Phosgene, Benzene; correlations of experimental data [9, 22, 23, 31]: —, - -
5 Influence of the Intramolecular Degrees of Freedom on Vapor-Liquid Equilibrium of Ammonia

The influence of the intramolecular interactions on the VLE was investigated, taking ammonia as a case study. The introduction of the intramolecular interactions on the basis of the harmonic potentials with the force constants by Shi and Maginn [28] into the originally rigid model by Eckl et al. [12] has a strong influence on the VLE properties. On average, the vapor pressure is decreased by 38%, the saturated liquid density is increased by 11% and the enthalpy of vaporization is increased by 31% [10]. As usual, the saturated vapor density follows the trend of the vapor pressure.
Fig. 5 Distribution of the dipole moment magnitude in the saturated liquid (top) and saturated vapor (bottom) at 347.5 K of the rigid ammonia model [12] with superimposed intramolecular degrees of freedom, using the force constants by Shi and Maginn [28]. The data were sampled from six uncorrelated configurations. The vertical lines indicate the average molecular dipole moments μ̄^liq = 2.06 D and μ̄^vap = 1.92 D in the coexisting phases
The reasons for these discrepancies were studied in detail, using the VLE at 347.5 K as an example. The observed major shift of the VLE properties is the result of a significant change in the molecular geometry of the flexible molecule in the liquid phase. Due to the intermolecular interactions, the flexible molecules oscillate in the liquid state around an average bond angle of 103.2°, instead of 106° in case of the equilibrium structure, which was adopted for the rigid model. The three partial dipoles, each constituted by the Nitrogen atom and one Hydrogen atom, are thus more aligned, so that the overall dipole moment distribution has an average value of μ̄^liq = 2.06 D. This average value is about 10% higher than the equilibrium value of μ = 1.88 D, cf. Fig. 5 (top). In the vapor state, the dipole moment distribution exhibits an average value of μ̄^vap = 1.92 D, which is only negligibly higher than that of the equilibrium structure, cf. Fig. 5 (bottom). This shows that, as expected, for the studied conditions the molecules in the vapor state oscillate around a geometry that is only marginally different from the equilibrium structure.

In summary: the strong intermolecular interactions in the liquid state lead to changes of the flexible molecular model's structure that significantly influence the
Table 2 Influence of the intramolecular degrees of freedom on the vapor-liquid equilibrium of ammonia at 347.5 K for different models with constant intermolecular interaction potential parameters. In the rigid model, both the bond length and the bond angle were kept fixed. In the bond length model, the bond angle was fixed, whereas the bond length was allowed to vary. In the bond angle model, the bond length was fixed, whereas the bond angle was allowed to vary. In the flexible model, both the bond length and the bond angle were allowed to vary. The force constants by Shi and Maginn [28] were used in those cases where the intramolecular degrees of freedom were introduced. "Exp." denotes reference data from the NIST Chemistry Webbook [21]

Model        ρ^liq/(mol/l)  ρ^vap/(mol/l)  p/MPa     Δh_v/(kJ/mol)  μ̄^liq/D  μ̄^vap/D
rigid        30.3 (1)       1.73 (3)       3.69 (7)  15.9 (4)       1.88      1.88
bond length  30.5 (1)       1.69 (3)       3.67 (5)  16.1 (4)       1.88      1.88
bond angle   33.0 (2)       1.05 (8)       2.5 (1)   19.3 (5)       2.05      1.89
flexible     33.3 (2)       0.98 (5)       2.36 (5)  19.6 (6)       2.06      1.92
Exp.         30.4           1.73           3.66      15.5           –         1.47
thermodynamic properties, whereas in the vapor state, only minor changes are observed.

The influence of the different intramolecular degree of freedom types on the VLE properties was also studied at the temperature 347.5 K. In the rigid model, both the bond length and the bond angle were kept fixed. In the bond length model, the bond angle was fixed, whereas the bond length was allowed to vary. In the bond angle model, the bond length was fixed, whereas the bond angle was allowed to vary. In the flexible model, both the bond length and the bond angle were allowed to vary. It can be seen in Table 2 that the bond angle potential is crucial, whereas the bond length potential has hardly any effect on the VLE properties of ammonia. The observed changes of the VLE properties can be explained by the increase of the attractive molecular interactions due to the increased average dipole moment in the liquid state. The average potential energy between two dipoles [24] is

$$u_{\mu\mu} = -\frac{2}{3}\,\frac{\mu_i^2\,\mu_j^2}{k_B\,T\,r^6}, \quad (4)$$

which indicates that the dipole-dipole potential energy depends on the fourth power of the dipole moment. The dipole-dipole interaction yields on average an attractive contribution, so that a relative increase $\Delta\mu/\mu_0$ of the dipole moment leads to a relative increase of the total potential energy of $\Delta u_{\mu\mu}/u_{\mu\mu}^0 \geq 4\,\Delta\mu/\mu_0$. The charge distribution of ammonia has a quadrupole moment as well; however, the dipole moment is dominant. There is also an additional cooperative effect resulting from the configuration of the molecules, because intramolecular degrees of freedom enhance the ability of the molecules to avoid highly repulsive configurations.
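As a rough plausibility check, inserting the numbers from above into this estimate: with $\mu_0 = 1.88$ D and $\bar{\mu}^{\mathrm{liq}} = 2.06$ D, the relative increase is $\Delta\mu/\mu_0 \approx 0.096$, so that $\Delta u_{\mu\mu}/u_{\mu\mu}^0 \gtrsim 4 \times 0.096 \approx 0.38$, i.e. the attractive dipole-dipole energy in the liquid grows by roughly 40%, which is of the same order as the observed 38% decrease of the vapor pressure.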
6 Compiler Performance Building Executables of the Molecular Simulation Code ms2

Most of the simulation data presented in this work were generated with the molecular simulation code ms2 [8] that is developed in our group. It was extensively tested on different platforms regarding different aspects like parallelization or vectorization. In this work, the performance of ms2 executables built with different compilers is highlighted for molecular dynamics (MD) and Monte Carlo (MC) simulations. Throughout, the equimolar liquid mixture of methanol and ethanol at 298.15 K and 0.1 MPa was simulated in the NpT ensemble [14]. Methanol was modeled by two LJ sites and three point charges and ethanol by three LJ sites and three point charges [29]. This test case was chosen because it is a typical application; similar results are expected for a wide class of problems. However, note that the actual run times will differ with the thermodynamic conditions that are simulated; run times increase with higher density of the system, among other factors. ms2 is distributed as source code and can be compiled with a wide variety of compilers. However, the performance is significantly influenced by the compiler and the linker as well as by the options used. Figure 6 shows the runtime of the test case on different platforms using different compilers. The binaries were generated by

• GNU gfortran (http://gcc.gnu.org/fortran/) with "-fdefault-real-8 -O3"
• Intel ifortran (http://software.intel.com/en-us/intel-compilers/) with "-r8 -fast"
• PGI pgf95 (http://www.pgroup.com/products/) with "-r8 -fastsse"
• Sun Studio sunf90 (http://developers.sun.com/sunstudio/) with "-r8const -fast"
• Pathscale pathf90 (http://www.pathscale.com/) with "-r8 -Ofast"
with options activated to enforce the use of double precision floating point numbers. The chosen optimization flags represent a common choice. The best combination of compiler and platform was the Intel ifortran compiler on the Intel Xeon X5560 "Nehalem" processor (http://www.hlrs.de/systems/platforms/nec-nehalem-cluster/). An Intel ifortran compiled binary of ms2 significantly outperforms binaries compiled by GNU gfortran and PGI pgf95, independent of the computing platform.
Fig. 6 Runtime of the testcase executed by ms2 as measured on different platforms using different compilers. An equimolar liquid mixture of methanol and ethanol at 298.15 K and 0.1 MPa was simulated in the NpT ensemble
7 Conclusion

Six rigid molecular models were proposed for Hydrogen chloride, Phosgene, Toluene, Benzene, Chlorobenzene and Ortho-Dichlorobenzene. The interaction sites were located according to the atom positions resulting from ab initio quantum chemical calculations. The electrostatic interactions were likewise parameterized according to high-level ab initio quantum chemical results; the latter were obtained by calculations within a dielectric continuum to mimic the (stronger) interactions in the liquid state. The LJ parameters were adjusted to VLE data, namely vapor pressure and saturated liquid density. Even for very small molecules like ammonia, the introduction of intramolecular degrees of freedom may have an astonishingly large influence on VLE properties. Thus, care has to be taken when combining force field parameterizations covering different aspects of the molecular interactions.
Acknowledgments. We gratefully acknowledge support by Deutsche Forschungsgemeinschaft. This work was carried out under the auspices of the Boltzmann-Zuse Society for Computational Molecular Engineering (BZS). The simulations were performed on the NEC SX-9 and the NEC Nehalem Cluster at the High Performance Computing Center Stuttgart (HLRS).
References 1. Alani, G. H.; Kudchadker, A. P.; Zwolinski, B. J. Chem. Rev. 1968, 68, 659. 2. Ambrose, D. Vapor-liquid critical properties; National Physical Laboratory Report Chem 107, Middlesex, 1980. 3. Allen, M. P.; Tildesley, D. J. Computer Simulations of Liquids; Oxford University Press: Oxford, 1987. 4. Ambrose, D.; Tsonopoulos, C. J. Chem. Eng. Data 1995, 40, 547. 5. Berthelot, D. Cr. Hebd. Acad. Sci. 1898, 126, 1703. 6. Bourasseau, E.; Haboudou, M.; Boutin, A.; Fuchs, A. H.; Ungerer, P. J. Chem. Phys. 2003, 118, 3020. 7. Bunger, W. B.; Riddick, J. A. Organic Solvents: Physical Properties and Methods of Purification; 3rd ed., Wiley Online Library: New York, 1970. 8. Deublein, S.; Eckl, B.; Stoll, J.; Lishchuk, S.; Guevara-Carrion, G.; Glass, C. W.; Merker, T.; Bernreuther, M.; Hasse, H.; Vrabec, J. Comput. Phys. Commun. 2011, in press, doi:10.1016/j.cpc.2011.04.026, http://www.ms-2.de/. 9. Danner, R. P.; Tarakad, R. R. AIChE J. 1977, 23, 685. 10. Engin, C.; Merker, T.; Hasse, H.; Vrabec, J. Mol. Phys. 2011, 109, 619. 11. Eckl, B.; Vrabec, J.; Hasse, H. J. Phys. Chem. B 2008, 112, 12710. 12. Eckl, B.; Vrabec, J.; Hasse, H. Mol. Phys. 2008, 106, 1039. 13. Gray, C. G.; Gubbins, K. E. Theory of Molecular Fluids. 1. Fundamentals; Clarendon Press: Oxford, 1984. 14. Guevara-Carrion, G.; Nieto-Draghi, C.; Vrabec, J.; Hasse, H. J. Phys. Chem. B 2008, 112, 16664. 15. Huang, Y.-L.; Heilig, M.; Hasse, H.; Vrabec, J. AIChE J. 2011, 57, 1043. 16. Jones, J. E. Proc. R. Soc. A 1924, 106, 441. 17. Jones, J. E. Proc. R. Soc. A 1924, 106, 463. 18. Lorentz, H. A. Ann. Phys. 1881, 12, 127. 19. Lotfi, A.; Vrabec, J.; Fischer, J. Mol. Phys. 1992, 76, 1319. 20. Mathews, J. F. Chem. Rev. 1972, 72, 71. 21. National Institute of Standards and Technology NIST Chemistry Webbook 2010; http://webbook.nist.gov/. 22. Nunes Da Ponte, M.; Staveley, L. A. K. J. Chem. Thermodyn. 1981, 13, 179. 23. Polt, A.; Platzer, B.; Maurer, G. Chem. Tech. Leipzig 1992, 44, 216. 24. Prausnitz, J. M. Molecular Thermodynamics of Fluid-Phase Equilibria; Prentice-Hall, Inc.: Englewood Cliffs: New Jersey, 1969. 25. Rowley, R. L.; Wilding, W. V.; Oscarson, J. L.; Yang, Y.; Zundel, N. A.; Daubert, T. E.; Danner, R. P. DIPPR Data Compilation of Pure Compound Properties. Design Institute for Physical Properties; AIChE: New York, 2006. 26. Schmidt, M. W.; Baldridge, K. K.; Boatz, J. A.; Elbert, S. T.; Gordon, M. S.; Jensen, J. H.; Koseki, S.; Matsunaga, N.; Nguyen, K. A.; Shujun, S.; Windus, T. L.; Dupuis, M.; Montgomery, A. M. J. Comput. Chem 1993, 14, 1347. 27. Sandler, S. I.; Castier, M. Pure Appl. Chem. 2007, 79, 1345. 28. Shi, W.; Maginn, E. J. AIChE J. 2009, 55, 2414. 29. Schnabel, T.; Srivastava, A.; Vrabec, J.; Hasse, H. J. Phys. Chem. B 2007, 111, 9871.
30. Stoll, J. Molecular Models for the Prediction of Thermophysical Properties of Pure Fluids and Mixtures; Fortschritt-Berichte VDI, Reihe 3, Vol. 836, VDI-Verlag: Düsseldorf, 2005. 31. Tsonopoulos, C. AIChE J. 1978, 24, 1112. 32. Ungerer, P.; Beauvais, C.; Delhommelle, J.; Boutin, A.; Rousseau, B.; Fuchs, A. H. J. Chem. Phys. 2000, 112, 5499. 33. Vrabec, J.; Hasse, H. Mol. Phys. 2002, 100, 3375.
“Brute-Force” Solution of Large-Scale Systems of Equations in a MPI-PBLAS-ScaLAPACK Environment M. Roth, O. Baur, and W. Keller
Abstract Space-borne gravity field recovery requires the solution of large-scale linear systems of equations to estimate tens of thousands of unknown gravity field parameters from tens of millions of observations. Satellite gravity data can only be exploited efficiently by the adoption of HPC technologies. The extension of the GOCE (Gravity field and steady-state Ocean Circulation Explorer) mission, in particular, poses unprecedented computational challenges in geodesy. In continuation of our work presented in the annual report in 2010, we succeeded in preparing a distributed memory version of our program using the MPI, PBLAS and ScaLAPACK programming standards. The tailored implementation extends the range of usable computer architectures to machines with less memory per node than the NEC SX-8 and SX-9 systems we used. We present implementation details and runtime results using the NEC SX systems as distributed memory systems. A comparison with our OpenMP version shows that the MPI implementation of our program yields a speedup of around 12% for large-scale problems.
M. Roth · O. Baur · W. Keller
University of Stuttgart, Institute of Geodesy, Geschwister-Scholl-Str. 24D, D-70174 Stuttgart, Germany, e-mail: [email protected]

1 Introduction

The satellite mission GOCE (Gravity field and steady-state Ocean Circulation Explorer) will provide a global model of the figure of the Earth—the geoid—with unprecedented precision. The GOCE satellite was launched on 17th March 2009. After two years in orbit, the nominally planned operational phase of twelve months ended on 2nd March 2011. However, due to low solar activity in recent years, the energy consumption of the satellite was less than anticipated. The satellite's good health and excellent data quality led the ESA (European Space Agency) to extend the mission until the end of 2012. This extension promises an even better mapping
of the gravity field [3]. On the other hand, more data leads to greater computational challenges. The prime objective in space gravimetry is to recover a set of model parameters representing the terrestrial gravity field or, equivalently, the geoid. Our approach is the "brute-force" solution of the resulting linear system of equations, as it provides the full variance-covariance matrix of the estimated parameters. As a consequence, this approach needs the complete normal equation matrix in memory, and hence results in a memory demand of around 64 GByte. This demand, as well as a considerable run-time reduction by means of massively parallel processing, can only be satisfied by high performance computing. In our last year's report [8] we described the OpenMP/BLAS/LAPACK version of our program to compute the aforementioned model parameters. Our OpenMP version was intended to run on the vector systems NEC SX-8 and SX-9 using one computing node; a series of experiments proved good performance [7]. Because of the anticipated flood of data and the resulting long computing time, we rebuilt our program as an MPI/PBLAS/ScaLAPACK version. This makes it possible to use more than one computing node of both SX systems while significantly reducing the wall time of the program. Furthermore, the future of high performance computing seems to lie in distributed computing with less memory per computing node than we demand. Hence, our program needs to utilize several computing nodes to fulfill our memory demand. In its current implementation, our analysis software is ready for real data exploitation. Moreover, we are prepared for the use of the upcoming Cray XE6 platform of the HLRS (High Performance Computing Center Stuttgart).
2 Methodology

Parametrized in spherical harmonics, the gravitational potential of the Earth becomes

$$V(\lambda, \varphi, r) = \frac{GM}{a} \sum_{l=0}^{L} \sum_{m=0}^{l} \left(\frac{a}{r}\right)^{l+1} \bar{P}_{lm}(\sin\varphi)\,[\bar{c}_{lm} \cos m\lambda + \bar{s}_{lm} \sin m\lambda] \quad (1)$$

[4]. In (1), $(\lambda, \varphi, r)$ represent spherical polar coordinates with $\lambda$ east longitude, $\varphi$ latitude and $r$ distance from the geocenter. $GM$ denotes the geocentric constant, $a$ is the major semi-axis of a reference ellipsoid of revolution. $\bar{P}_{lm}(\sin\varphi)$ represent the $4\pi$-normalized associated Legendre functions of the first kind (e.g. [5]), the series coefficients $\bar{c}_{lm}$ and $\bar{s}_{lm}$ indicate unknown model parameters and $L = l_{\max}$ stands for the maximum spherical harmonic degree (spectral resolution) which, in practice, depends on data sensitivity. The GOCE mission allows resolving the geopotential coefficients up to $L \approx 250$, with the mission extension eventually even higher. For the derivation of the (linear) functional model of GOCE observables we refer the reader to [1] and [7]. The reformulation of this functional model in matrix-vector
notation yields

$$\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{r} \quad (2)$$

with the vector of observations $\mathbf{y}$ $(n \times 1)$, the vector of unknown parameters $\mathbf{x}$ $(u \times 1)$ and the design matrix $\mathbf{A}$ $(n \times u)$. Additionally, (2) accounts for the case $n > u$, hence representing an overdetermined, inconsistent system of equations. The vector of residuals is $\mathbf{r}$ $(n \times 1)$. In order to obtain the "best" solution $\hat{\mathbf{x}}$ to (2), we minimized the square sum of residuals subject to $\min_{\mathbf{x}} \|\mathbf{r}\|^2 = \min_{\mathbf{x}} \|\mathbf{y} - \mathbf{A}\mathbf{x}\|^2$ [6], which results in the least-squares (LS) estimate

$$\hat{\mathbf{x}} = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{y} = \mathbf{N}^{-1}\mathbf{b}. \quad (3)$$

For computational reasons we split the design matrix into several blocks $\mathbf{A}_i$ according to

$$\mathbf{N} = \mathbf{A}^T\mathbf{A} = \mathbf{A}_1^T\mathbf{A}_1 + \cdots + \mathbf{A}_j^T\mathbf{A}_j = \sum_{i=1}^{j}\mathbf{A}_i^T\mathbf{A}_i = \sum_{i=1}^{j}\mathbf{N}_i. \quad (4)$$

Alongside $\mathbf{N}$, $\mathbf{b}$ is set up analogously. The variance-covariance matrix of the parameter estimate, $D(\hat{\mathbf{x}})$, is obtained by inversion of the normal equation matrix $\mathbf{N} = \mathbf{A}^T\mathbf{A}$ according to

$$D(\hat{\mathbf{x}}) = \hat{\sigma}^2(\mathbf{A}^T\mathbf{A})^{-1} = \hat{\sigma}^2\mathbf{N}^{-1}, \quad (5)$$

$$\hat{\sigma}^2 = \frac{\hat{\mathbf{r}}^T\hat{\mathbf{r}}}{n-u}, \quad \hat{\mathbf{r}} = [\mathbf{A}(\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T - \mathbf{I}]\,\mathbf{y}. \quad (6)$$

As the variance-covariance matrix is of crucial significance for further analysis, here we follow (3) and (5) to compute $\hat{\mathbf{x}}$ and $D(\hat{\mathbf{x}})$ "brute-force", which includes normal equations system inversion.
3 MPI Implementation on NEC SX Systems

Starting from the OpenMP version of our analysis software [8], we successively prepared it for distributed memory systems. In contrast to the shared memory version, the MPI/PBLAS/ScaLAPACK version needed several fundamental changes. The most important one is that MPI has to be initialized before any communication between the processes can take place. Apart from these initializations (and finalizations at the end of the program), several other steps became necessary. Above all, vectors and matrices have to be distributed over all involved processes. This can be achieved following two strategies. First, each process computes the matrix blocks it holds by itself. In this case a large part of the computations needs to be repeated by every process holding a matrix block that lies within the same line of matrix A. This strategy would require a new coding of several modules; the shared memory version made use of
Fig. 1 Flowchart of the MPI-parallelized version of the program
the data structure of the A matrix and computed one complete line per thread, as all entries in one line of A depend on the same data. Alternatively, each process computes a line of A and sends the parts to the processes that should hold them. This has the advantage that the major part of the code could be reused; only an additional communication part had to be added to the program. A drawback is the additional communication cost. Nonetheless, this seemed the easier way, hence we decided to follow it. A flowchart of the distributed memory version of the program is given in Fig. 1. The steps will be explained in more detail in the following subsections.
3.1 Block-Cyclic Distribution

The biggest difference between the two program versions is that the shared memory version has the matrices and vectors lying "en bloc" in memory, while the distributed memory version needs the matrices and vectors to be split and distributed over all processes. Furthermore, ScaLAPACK demands a special arrangement of the matrices and vectors in the memory of the computing nodes in the processing grid. The matrices and vectors are split into blocks which are distributed over the individual processes. The distribution scheme is called block-cyclic [2]. LAPACK is programmed in Fortran, which favors column-wise storage of matrices; C, on the other hand, operates on a row-wise memory scheme. As illustrated for the matrix given in Fig. 2, the Fortran scheme does not align the individual blocks successively in memory. Let us assume that the matrix is split into blocks of size 2 × 2. The first line of these blocks is distributed to the first line of the processes in the processing grid by sending the first block to process 0 of this line, the second block to process 1, etc. If the matrix's block-line contains more blocks than the corresponding grid line has processes, one starts again at process 0.
Fig. 2 Top: Matrix to be distributed. Bottom: Linear memory structure (Fortran)
Fig. 3 The matrix is split into blocks, global view
In the same fashion the further block-lines of the matrix are distributed to the corresponding lines of processes in the processing grid. If the bottom line of the processing grid is reached and block-lines of the matrix still remain, one starts again at the first line of the processing grid. Here a processing grid of size 2 × 3 is used. Accordingly, the matrix is split into the blocks shown in Fig. 3; the color represents the blocks' affiliation with the different processes. Then, these blocks are distributed to the processes and stored in local matrices (see Fig. 4). Figure 5 illustrates the data order in local memory. ScaLAPACK provides the function numroc, which calculates the size of a local matrix; this is important information for memory allocation.
3.2 Data Input and Distribution

For simplicity, we decided to keep the data input similar to the shared memory version. Only one process reads the data from the hard disk drive and broadcasts them to the other processes via the MPI function MPI_Bcast.
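A minimal sketch of this input scheme (the buffer layout is illustrative; error handling and the actual file reading are omitted):

```c
#include <mpi.h>

/* Process 0 reads the observations from disk and broadcasts them
   to all other processes in the communicator. */
void read_and_broadcast(double *obs, int n_obs)
{
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        /* ... read n_obs values from the hard disk drive into obs ... */
    }
    MPI_Bcast(obs, n_obs, MPI_DOUBLE, 0, MPI_COMM_WORLD);
}
```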
Fig. 4 The matrix is split into blocks, processes' local view. Fig. 5 The matrix is split into blocks, local memory structure
3.3 Initializing a Distributed Matrix

The flowchart of the distributed memory version (see Fig. 1) does not show the necessary initialization steps for the distributed matrices and vectors. Alongside numroc, ScaLAPACK provides another handy function, named descinit, which initializes a distributed matrix. Two calls of the function numroc, one call per matrix dimension, return the dimensions of the local matrix parts. Subsequently, the proper amount of local memory can be allocated. Afterwards, a call to the routine descinit tells ScaLAPACK how the matrix is distributed, as sketched below.
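A sketch of these two steps for a distributed u × u matrix; the grid context, block size and trailing-underscore symbol naming are assumptions about the ScaLAPACK installation, and the function is illustrative rather than taken from our program.

```c
#include <stdlib.h>

/* ScaLAPACK tools (trailing-underscore Fortran symbols assumed). */
extern int  numroc_(const int *n, const int *nb, const int *iproc,
                    const int *isrcproc, const int *nprocs);
extern void descinit_(int *desc, const int *m, const int *n,
                      const int *mb, const int *nb, const int *irsrc,
                      const int *icsrc, const int *ictxt, const int *lld,
                      int *info);

/* Compute the local dimensions with numroc_, allocate the local part,
   and fill the 9-element array descriptor with descinit_. */
double *alloc_distributed(int ictxt, int u, int nb,
                          int myrow, int mycol, int nprow, int npcol,
                          int desc[9])
{
    int izero = 0, info;
    int loc_rows = numroc_(&u, &nb, &myrow, &izero, &nprow);
    int loc_cols = numroc_(&u, &nb, &mycol, &izero, &npcol);
    int lld = loc_rows > 1 ? loc_rows : 1;  /* local leading dimension */
    descinit_(desc, &u, &u, &nb, &nb, &izero, &izero, &ictxt, &lld, &info);
    return malloc((size_t)loc_rows * loc_cols * sizeof(double));
}
```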
3.4 Design Matrix Assembly and Redistribution

We strove to optimize the computations of the OpenMP version. We found that the version which computes one line of the design matrix per process runs the fastest, because the computations are partially the same for all entries of one matrix line. We decided not to waste these efforts and to reuse this part in the MPI version. The MPI version differs significantly, though: in the shared memory version all processes compute one block-line of the matrix together, while in the distributed memory version every process computes one block-line by itself. Furthermore, ScaLAPACK expects the blocks of the matrix to be block-cyclically distributed, provoking additional problems. From the viewpoint of a process these problems are equivalent to the following questions:

• on which data should I work,
• which of the blocks I computed have to be sent to other processes,
• which blocks do I receive from other processes,
• to which process should I send a specific matrix block,
• from which process do I receive a specific block?
An answer to these questions is illustrated in Fig. 6.
Fig. 6 Distribution scheme of a block of the design matrix A: From local over global to local coordinates of a 2 × 2 processing grid
First, each process computes one block-line of matrix A. Then, from the global coordinates of each block, in turn, its new local coordinates in the process grid are derived. Now a process can identify the blocks it can keep or has to exchange with other processes (the coordinates of those processes are now known as well). This approach leads to higher network traffic the more processors are involved, because more blocks must be exchanged. In summary, this approach requires a fast network connection between the computing nodes to transfer the large amount of data. We deal with two process grids: one linear "virtual grid" for computing the block-lines of matrix A; the other is two-dimensional and is used by the PBLAS and ScaLAPACK routines, for instance for building matrix N and solving the linear system of equations. Hence, the matrix blocks need coordinates in both grids for distributing them to the correct processes. On the basis of the global coordinates of a block, its local coordinates can be retrieved easily. As this also applies the other way round, local coordinates are transformed to global coordinates and back to the local ones of the other grid. The calculation is the same for both dimensions, hence we implemented the transformation functions only for one dimension and call them twice through a 2D wrapper; a one-dimensional sketch is given below. The BLACS library provides the function DGESD2D to send a matrix block and DGERV2D to receive one. The sending function needs the receiving process specified, and vice versa. But a receiving process knows only the number of the sending process, not its grid coordinates. For this reason, the BLACS function BLACS_PCOORD is used to calculate the corresponding coordinates of this process first. During the redistribution task to achieve the block-cyclic distribution of the matrix, we need to reorganize the elements of the matrix blocks according to the column-wise Fortran scheme, because of our line-by-line calculation scheme and the usage of the programming language C.
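The one-dimensional transformation can be sketched as follows; a source process 0 is assumed, the names are illustrative, and calling these functions once per grid dimension yields the 2D mapping described above.

```c
/* Map a global block index gb (block-cyclic, source process 0) to the
   owning process and the local block index in one grid dimension with
   nprocs processes. */
void global_block_to_local(int gb, int nprocs,
                           int *owner, int *local_block)
{
    *owner       = gb % nprocs;  /* cyclic assignment along the dimension   */
    *local_block = gb / nprocs;  /* position within the owner's local array */
}

/* Inverse mapping: the global block index of local block lb on process p. */
int local_block_to_global(int lb, int p, int nprocs)
{
    return lb * nprocs + p;
}
```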
3.5 Normal Equations System Setup
The PBLAS library provides the distributed computing equivalents of the BLAS library for setting up the normal equations system. Vector b is computed by the function PDGEMV and the normal equations matrix N by PDSYRK. Vector and matrix have to be block-cyclically distributed. In general, both functions work like their BLAS counterparts; however, the function calls need additional parameters which describe the vector and matrix distribution on the processing grid.
3.6 Normal Equations System Solution The ScaLAPACK function PDPOSV solves the normal equations system. Again the distributed function needs additional parameters to describe how the matrix and vector are distributed. The solution vector, containing the estimated coefficients, is also distributed. Hence, the results are collected by one process, which writes them to a file. The gathering is done by the routine PDGEMR2D; its purpose is to redistribute a matrix from one processing grid to another.
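A minimal sketch of the setup and solution calls from C is shown below, assuming that the descriptors descA, descN and descb as well as the local arrays have been initialized as in Sect. 3.3 (n_par unknowns, n_obs observations; all variable names are illustrative):

extern void pdsyrk_(char *uplo, char *trans, int *n, int *k, double *alpha,
                    double *a, int *ia, int *ja, int *desca, double *beta,
                    double *c, int *ic, int *jc, int *descc);
extern void pdposv_(char *uplo, int *n, int *nrhs, double *a, int *ia, int *ja,
                    int *desca, double *b, int *ib, int *jb, int *descb, int *info);

double one = 1.0, zero = 0.0;
int ione = 1, info;
/* N = A^T A (upper triangle), with A distributed as n_obs x n_par */
pdsyrk_("U", "T", &n_par, &n_obs, &one, A, &ione, &ione, descA,
        &zero, N, &ione, &ione, descN);
/* solve N x = b by Cholesky factorization; b is overwritten with x */
pdposv_("U", &n_par, &ione, N, &ione, &ione, descN,
        b, &ione, &ione, descb, &info);

PDPOSV leaves the solution distributed across the grid, which is why the subsequent gathering step with PDGEMR2D is needed.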
3.7 Block Size of a Distributed Matrix
The size of the blocks of a distributed matrix is also of relevance. Each block carries some header information for communication purposes. The smaller the block size is chosen, the more significant the communication overhead becomes. On the other hand, if the block size gets too big, other communication is blocked during its transfer. In both cases the program is slowed down. Tests showed that block sizes between 48 × 48 and 104 × 104 result in notably lower runtimes, with a slight optimum at a size of 88 × 88 (cf. Fig. 7). The tests were conducted on one node of the SX-8. It remains to be verified whether another block size would perform better on the SX-9.
Fig. 7 Runtime depending on the block size of the distributed matrices (black: real time; red: vector time; blue: user time). Left: coarse stepping; right: zoom into the region with the best performance
Nonetheless, a block size of 88 × 88 is used on both SX systems.
4 Runtime Results
The MPI runtime tests were carried out on the SX systems. Not all tests could be executed because of the long waiting queues on the SX systems.
4.1 Varying Number of CPUs
The tests were conducted with 2 to 32 CPUs, with the single-processor result of the OpenMP version serving as reference. MPI makes it possible to use several nodes of the SX systems; for 32 CPUs this equals four nodes on the SX-8 and two nodes on the SX-9. Figures 8 and 9 show the results of the time measurements. The seemingly erratic behavior up to 8 CPUs is due to the shape of the processing grid. On the SX-8 this is superposed by the problem that, in parallel job mode, 12 processes run on a node with 8 CPUs. In both figures the measured time is low when the processing grid is closer to square.
Fig. 8 Distributed memory version with MPI on the SX-8. Left: Design matrix assembly (blue); distribution of the design matrix (green); normal equations system setup (red); overall time (black). Middle: User time (red); vector time (blue); real time (black). Right: Ideal speed-up (black); true speed-up (red). Parameters: lmax = 50, top: n = 250 000, bottom: n = 500 000. Please note the gap in the graphs; here the job mode is switched to exclusive use
Fig. 9 As in Fig. 8, but for the SX-9
A linear processing grid is inefficient for the PBLAS and ScaLAPACK functions; this is the case whenever the number of used CPUs is a prime number (e.g., 1 × 7 for seven CPUs). The design matrix assembly is in line with the expected behavior. Compared to the SX-9, the MPI version runs with higher efficiency for larger CPU counts on the SX-8. A comparison with the OpenMP results shows that the MPI version takes around four to five times longer than the OpenMP version, which is not acceptable.
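Incidentally, MPI offers a portable way to obtain the most balanced grid for a given process count. The following self-contained sketch is illustrative and not taken from the program described here:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs, dims[2] = {0, 0};
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    /* factor nprocs into the most square 2D grid; a prime process
       count degenerates to 1 x nprocs, the inefficient linear grid */
    MPI_Dims_create(nprocs, 2, dims);
    if (rank == 0)
        printf("processing grid: %d x %d\n", dims[0], dims[1]);
    MPI_Finalize();
    return 0;
}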
4.2 Large-Scale Problems
We reran the previous tests with 8, 16 and 32 CPUs for a higher degree and order of lmax = 200 and n = 500 000 lines of data to analyze dependencies on a specific degree and order. The tests were conducted on the SX-9. The results, shown in Fig. 10, indicate that doubling the processor number does not necessarily halve the runtime: it does for the step from 8 to 16 CPUs, but not for the step from 16 to 32 CPUs. In the next test on the SX-9, we analyzed whether the program's efficiency improves for higher lmax, up to a degree and order of lmax = 200. The results are displayed in Fig. 11 and Table 1. The MPI version is about 10 % faster than the OpenMP version for lmax = 200 and thus appears to handle large-scale problems better. To verify this, a test for L = 250 was dispatched for both the MPI and the OpenMP version, but due to issues with the power supply at HLRS at that time, those jobs were not carried out. Still, the results imply that the MPI version is more efficient than the OpenMP version for large-scale problems, and vice versa for smaller problems.
Fig. 10 As in Fig. 8, but for the SX-9 and with lmax = 200 and n = 500 000
Fig. 11 Distributed memory version with MPI on the SX-9. Colors are according to Fig. 8, 16 CPUs, n = 500 000
The turning point lies within a problem size of lmax = {150, . . . , 200}. Another test was performed on the SX-8 with one to four nodes (32 CPUs) to analyze the efficiency of the program for variable lmax and a variable number of nodes. As expected, the more nodes are used, the faster the computation (cf. Fig. 12). However, the user time (the time accumulated over all used processors) is also an important indicator of parallel efficiency: equal efficiency shows up as equal user time across varying processor numbers. A comparison reveals that the user times match for one and two nodes only. The user time increases for three and four nodes due to increasing inter-process communication. Consequently, for runtime efficiency the computation should be done on two nodes at most.
Table 1 Comparison of the parallelization between OpenMP and MPI, SX-9, n = 500 000, 16 CPUs

lmax   OpenMP real time [s]   OpenMP user time [s]   MPI real time [s]   MPI user time [s]   speed-up (OpenMP : MPI)
 50      10                     142                    50                   791               0.20
100      81                    1272                   151                  2414               0.54
150     352                    5596                   504                  8055               0.70
200    1070                   17081                   955                 15278               1.12
Fig. 12 SX-8: Runtime depending on maximum degree and order lmax for a different number of nodes. One node (black); two nodes (red); three nodes (green); four nodes (blue); n = 256 000
Fig. 13 MPI version, L = lmax = 200, relative empirical errors. The poor recovery of the low-order parameters is a consequence of the GOCE orbit design; the quality of the parameters is as expected
4.3 Numerical Precision of the Computations
The resolved spherical harmonic coefficients are compared to those of the EGM96 (Earth Gravitational Model 1996) to verify that the numerical precision is within an acceptable range. The EGM96 coefficients are also the input parameters of the simulated GOCE data used (closed-loop simulation). For the comparison we use the relative empirical errors

e^{\mathrm{rel}}_{v_{lm}} = \frac{v^{\mathrm{ref}}_{lm} - \hat{v}_{lm}}{v^{\mathrm{ref}}_{lm}},   (7)

with v^{\mathrm{ref}}_{lm} = \bar{c}^{\mathrm{ref}}_{lm} for m \ge 0 and v^{\mathrm{ref}}_{lm} = \bar{s}^{\mathrm{ref}}_{lm} for m < 0 being the reference coefficients of the EGM96. Analogously, the \hat{v}_{lm} denote the estimated geopotential parameters. The results displayed in Fig. 13 allow the conclusion that the program produces spherical harmonic coefficients of good quality.
5 Discussion, Conclusion and Outlook
In comparison to the shared memory version of our analysis algorithm, MPI is more complex to use, as communication is not handled automatically as with OpenMP. An advantage is that MPI can be used on both shared memory and distributed memory systems. As opposed to OpenMP, it is possible to use more than one node and thus a huge number of processors. Furthermore, it became possible to distribute the normal equations matrix over all used nodes; each node needs to hold only a part of the matrix, so the memory demand per node is divided by the number of used nodes. A disadvantage is that the program has to be run with a tool like mpirun that provides the necessary interface to the communication network. For the distributed memory version of the program it became necessary to include ScaLAPACK with its associated libraries. MPI and a computing grid were initialized, an optimal block size for distributing the matrices was found, and an algorithm to distribute the matrix was developed. The effort spent on optimization and parallelization improved the efficiency of the program considerably. Attention was paid to making the program as modular as possible, so that future changes and extensions can be implemented easily. A comparison between the OpenMP and MPI versions showed that the OpenMP version seems to be better suited for lower lmax, while MPI shows better efficiency for higher lmax. Future work will deal with the combination of both concepts, with OpenMP doing the intra-node work and MPI the inter-node communication. This may improve the speed even further, especially if even more nodes get involved in the computations. Some further improvement of the MPI version is possible in the setup of the design matrix. For instance, the elements could be stored in the Fortran scheme from the very beginning. This would require a new computing scheme for the indices, but the reordering step which is currently necessary could be avoided. It remains to be tested whether this would lead to a gain in speed. Another possibility would be to eliminate the design matrix distribution step completely and to compute the values directly at the process that stores the respective part of the matrix, which might yield an additional gain in speed.
Acknowledgments. The authors thank the High Performance Computing Center Stuttgart (HLRS) for the opportunity to use their computing facilities; furthermore, we gratefully acknowledge the helpful technical support. This work was supported by the German Ministry of Education and Research (BMBF) within the geoscientific R+D program GEOTECHNOLOGIEN under grant 03G0726G.
Metallic Foam Structures, Dendrites and Implementation Optimizations for Phase-Field Modeling
A. Vondrous, B. Nestler, A. August, E. Wesner, A. Choudhury, and J. Hötzer
Abstract We present our current work in the field of computational materials science with the phase-field method on the high performance cluster XC 4000 of the KIT (Karlsruhe Institute of Technology). Our investigations include heat conduction of open-cell metal foams, dendritic growth and optimizations of concurrent processing with the message passing interface (MPI) standard. Large-scale simulations are applied to identify relevant parameters of heat conduction and dendrite growth. Our overall goal is to continuously develop our models, numerical solution techniques and software implementations. The basic model and parallelization scheme are described. The disadvantages of 1D domain decomposition compared to 3D domain decomposition for large 3D simulation domains are explained; a detailed analysis of the new 3D decomposition remains to be performed. The data throughput of parallel file IO operations is measured, and system-specific differences have been found which require further investigation.
1 Introduction
Science grows constantly with the phenomena of our environment that are discovered and understood. To help understand our environment, we try to gain insight into mechanisms and interactions by computer simulation. In this report, we present our contributions based on applications of the phase-field method. The presented results stem from current work run on the XC 4000 cluster of the KIT. Our overall goal is to reveal the whole strength of the phase-field method and to exploit its capabilities. Phase-field is an easily extendable and powerful method
A. Vondrous · B. Nestler · A. August · E. Wesner · A. Choudhury · J. Hötzer Institute of Materials and Processes, Karlsruhe University of Applied Sciences, Moltkestr. 30, Karlsruhe, Germany, e-mail:
[email protected], Anastasia.August@hs-karlsruhe. de,
[email protected],
[email protected], johannes.hoetzer@ gmx.de,
[email protected] W.E. Nagel et al. (eds.), High Performance Computing in Science and Engineering ’11, DOI 10.1007/978-3-642-23869-7 43, © Springer-Verlag Berlin Heidelberg 2012
to describe physical phenomena of microstructure evolution and to computationally design materials with tailored properties. One of its strengths is the ability to model surface dynamics and surface interactions [1]. The effect of different physical fields on phase transformations and on the motion of boundaries can be introduced by formulating appropriate free energies in the underlying functional. The derivation of the set of evolution equations contains appropriate terms representing the driving forces according to the physical fields. Together with the ability to simulate structures with complex-shaped surfaces, the phase-field method has evolved into a powerful tool to optimize production and manufacturing processes as well as to investigate mechanical properties in various material systems. The method is expected to have great potential within a broad range of research fields. The field of applications is constantly growing, from solidification studies of multicomponent systems [3], over fluid dynamics studies [4], to studies of mechanical properties [5, 8] and computer graphics [6, 7]. Improvements of the models, numerical methods and implementations will steadily progress towards efficient and applicable tools that give more insight into the physics of structure and pattern formation. One application of the phase-field method is the investigation of new materials such as metal foams. Open-cell metal foams have a large surface and promising thermal conductivity properties, as required for a heat exchanger material. The development, analysis and improvement of new materials can be carried out in a more targeted manner by applying simulation methods such as those based on the phase-field concept. Modified structures can be tested and analyzed quickly, without any material input or the need to set up an extensive experimental measurement apparatus, to obtain a first, rough impression of the tendency. The roots of the phase-field method lie in solidification studies, which still form the most common research subject [9]. The metal casting industry and research facilities make use of thermodynamic databases to determine material properties and process parameters. The phase-field method is capable of describing the effects of certain process steps on the alloy properties and on the typical morphological shapes and microstructure patterns. Investigations in metal technology can thus be supported and shorter development cycles achieved. The gap between atomistic and macroscopic methods is bridged by the phase-field method, which operates on the mesoscopic (micrometer) scale in a consistent way. From a technical viewpoint, further optimization of the numerical solution algorithms is required not only to improve efficiency and usability, but also to establish continuity in software design and to help develop systems for the next generation. The use of modern hardware and cutting-edge software reduces unnecessary power consumption and increases the simulation speed. In the past, significant speedup was obtained without dedicated programming concepts, simply by faster hardware with higher clock speeds. Nowadays, the growing number of CPUs directly translates into the need to implement concurrent calculations. These trends have to be kept in mind when developing high performance software. The section "Optimizations" describes our current work on performance optimizations of the software package Pace3D for materials simulation.
2 Method
In this section we give an overview of the physical modeling, the numerics and our implementation. Pace3D (Parallel Algorithms for Crystal Evolution in 3D) is a library which contains parallel solvers and tools for data manipulation, analysis and visualization. The core of Pace3D is the implemented phase-field model for multiphase and multicomponent systems proposed by B. Nestler, H. Garcke and B. Stinner in [2] and [1]. The formulation of the model is based on a general energy functional

F = \int_\Omega \left( \varepsilon\, a(\phi, \nabla\phi) + \frac{1}{\varepsilon}\, w(\phi) + f_{\mathrm{bulk}}(\phi, c, T, \ldots) \right) d\Omega,   (1)

from which a set of nonlinear dynamic equations for the different physical field variables is derived. Interfaces between distinct phases, e.g. between solid and liquid material, are modeled such that a smooth transition from one phase state to the other occurs. φ is the phase-field vector with N phases, where each vector component describes a physical state or a particular property of the material. The vector describes the volume fractions of the phases with the constraint \sum_{\alpha=1}^{N} \phi_\alpha = 1. ε is a length scale parameter that influences the interface thickness. The gradient energy density is described by the term a(φ, ∇φ) and the surface energy density by w(φ). One or many bulk energies f_bulk can be incorporated to model processes such as solidification and to account for influences such as pressure, elasticity, plasticity or magnetism. Evolution equations are obtained by variational derivation of the energy functional. The domain of interest is discretized by finite differences and solved with an explicit Euler method. In the case of many phases (N > 1000), the necessary memory rapidly exceeds the available capacity, such that optimization strategies to manage large data structures are required. The software package offers the activation of locally reduced order parameter sets according to the techniques in [10], allowing the computation of systems with very high numbers of phases (N ≫ 100000). Concerning software design, the implementation supports parallel execution with MPI and a clear structure for users, modelers and implementers. All parallel tasks follow a manager-worker pattern to keep complexity under control. An advantage of the explicit Euler method is the short-range calculation stencil: only direct neighbors are addressed to calculate the values of a cell for the next time step. Parallelization is applied by dividing the simulation domain into subregions along one dimension and adding boundary layers, which have to be updated after each time step (see Fig. 1). The user controls all physical data and the choice of modules of the package, such as fluid flow, magnetism or heat transfer, by setting the appropriate keys in the parameter file.
Fig. 1 Simulation domain decomposition along one axis for parallel execution with MPI
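The boundary-layer update of Fig. 1 amounts to a ghost-cell exchange between neighboring subdomains. The following C/MPI sketch is reduced to a single scalar field in one dimension; the layout with field[0] and field[n_local+1] as ghost cells is an assumption for illustration, not the actual Pace3D code:

#include <mpi.h>

static void exchange_ghosts(double *field, int n_local, MPI_Comm comm) {
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    int lower = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int upper = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;
    /* send the last interior cell up, receive the lower ghost cell */
    MPI_Sendrecv(&field[n_local], 1, MPI_DOUBLE, upper, 0,
                 &field[0],       1, MPI_DOUBLE, lower, 0,
                 comm, MPI_STATUS_IGNORE);
    /* send the first interior cell down, receive the upper ghost cell */
    MPI_Sendrecv(&field[1],           1, MPI_DOUBLE, lower, 1,
                 &field[n_local + 1], 1, MPI_DOUBLE, upper, 1,
                 comm, MPI_STATUS_IGNORE);
}

After each explicit Euler step, one such exchange per field suffices, because the stencil only addresses direct neighbors.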
3 Metallic Foam
Metal foams are porous materials with a fine irregular structure. To obtain significant simulation results, we have to
• resolve the ligaments with sufficiently many cells and
• find a representative sample suitable to describe the behavior of bigger ones.
On the one hand, a large number of cells in the domain causes very long simulation times, whereas on the other hand, a small representative volume element impairs the generalization of the results. By increasing the number of processors, we can satisfy both requirements. The first example refers to an experimental foam sample of 1 cm length in each spatial direction (1 cm³) with 20 ppi (pores per inch). The parameter 'ppi' implies that 339 pores have to be represented; hence this quantity defines the extension of the representative volume element in the simulation. For samples of 50 ppi, a volume of 1 cm³ contains about 4850 pores, so that a smaller representative volume of 0.3 × 0.3 × 0.3 cm³ has to be chosen. This volume encloses about 133 pores, enough to provide convincing simulation results. The physical and simulation parameters for both samples are summarized in Tables 1 and 2. In a phase-field simulation, the energy functional (see Sect. 2) is constructed to minimize the energy of the system under consideration on the basis of classical irreversible thermodynamics. Following a conservation law for the internal energy e, a dynamic field equation, directly related to the evolution of the temperature everywhere in the simulation domain, can be derived by means of variational differentiation of the corresponding entropy functional. The classical Gibbs relation reads

e = f(\phi, c, T) + T s(e, c, \phi) = f(\phi, c, T) - T f_{,T}(\phi, c, T),   (2)
where e is the internal energy, f contains the bulk free energies, s is the entropy density, and the notation f_{,T} denotes the derivative of the bulk free energy density f(\phi, c, T) with respect to the temperature.
Table 1 Structure parameters

ppi   grid cells        pore radius   pore number   solid fraction
20    420 × 420 × 420   26 cells      339           11.18%
50    420 × 420 × 420   36 cells      133           9.54%

Table 2 Simulation parameters

physical size of the domain for 20 ppi              1.0 × 1.0 × 1.0 cm³
physical size of the domain for 50 ppi              0.3 × 0.3 × 0.3 cm³
initial temperature in the interior of the domain   300 K
volumetric heat capacity of aluminum                2.422 · 10⁶ J/(m³ K)
volumetric heat capacity of air                     1.297 · 10³ J/(m³ K)
boundary condition at the bottom of the domain      600 K
A balance law is used to form the evolution equation for the internal energy. Terms representing the heat flux are derived from the entropy functional by a linear relation with respect to the thermodynamic driving forces \nabla \delta S/\delta e and \nabla \delta S/\delta c_i, incorporating Onsager's mobility coefficients L_{ij}. The partial differential equation for the internal energy is given by

\partial_t e = -\nabla \cdot \left( L_{00} \nabla \frac{1}{T} + \sum_{j=1}^{K} L_{0j} \nabla \frac{-\mu_j}{T} \right).   (3)
The simulation study considers the heating of foam structures with different porosities from the bottom of the domain. The considered metal is aluminum and the pores are filled with air. Figures 2 and 3 refer to generated foam samples with 20 ppi and 50 ppi and show the heat diffusion after 0.79 s and 0.63 s, respectively. The figures on the right-hand side illustrate snapshots of the temperature field, including isolines, within the foam at an intermediate state of the simulation.
4 Dendrites
Dendrites are ubiquitous morphologies occurring during solidification processing. Apart from being very interesting to physicists from the point of view of pattern formation, they hold special significance in industry, because one needs to understand the influence of the processing parameters such that the resulting microstructure is finer and more uniformly distributed, contributing to better mechanical properties. While such a correlation between the processing parameters and the microstructure is difficult to establish experimentally, the application of simulation techniques such as the phase-field method can contribute immensely. Phase-field simulations have been extensively applied in the past decade for understanding the evolution of a number of microstructures, and one of the principal microstructures is indeed the solidification dendrite.
Fig. 2 Left: Synthetic foam structure used for simulations associated to 1 cm3 foam samples of 20 ppi. Right: Heat distribution in the air-aluminum domain after 0.79 s
Fig. 3 Left: Synthetic foam structure used for simulations associated to 0.3 × 0.3 × 0.3 cm3 foam samples of 50 ppi. Right: Heat distribution in the air-aluminum domain after 0.63 s
Dendrites are microstructures that result from a special instability leading to the selection of a unique dendrite tip radius and growth velocity for given processing conditions. Two theories exist for the selection of the dendrite tip radius, namely the marginal stability criterion and the microsolvability theory. While the marginal stability criterion is empirical, the microsolvability criterion is a solvability condition for the existence of a solution. Both theories do reasonably well in predicting some of the precise details of the fineness of the microstructure. However, exact details such as the selection of the secondary and primary arm spacings are far too complicated for analytical theories. This is where phase-field simulations become useful; they establish not only the properties of the steady-state dendrite tip, but also other geometrical features relating to the fineness of the microstructure.
Dendrites occur in both pure materials and multi-component alloys. While in pure materials the onset of primary arm formation results from latent heat rejection, in multi-component alloys the physical problem is a combination of solutal and thermal field evolution. The coupled problem is difficult to solve numerically, because the solute and thermal field evolutions occur on largely different time scales; however, a useful assumption is that the thermal field has already evolved to a steady state on the time scale of solute diffusion. This allows one to simulate and recover the most relevant properties of solutal dendrites. Phase-field simulations must, however, be used carefully if one needs to recover quantitative numbers from them. In this context, one must perform an analysis of the governing equations to understand their legitimacy. Apart from this, the problem of dendrite evolution is computationally extensive. The reason for this is the resolution of the long far-field diffusion field. At higher velocities or undercoolings, the diffusion field is smaller and more easily accessible computationally. However, existing asymptotic analyses break down at such velocities, and hence the relevance of such simulations is limited. At lower undercoolings the diffusion field is of the order of millimeters, while for quantitative simulations the grid resolution is limited; hence the number of grid points multiplies. Therefore, to perform such simulations it becomes important to employ optimizations such as adaptive mesh refinement and efficient parallelization techniques. In the following discussion we present simulation results of dendritic structures for the Al-Cu 2 at% alloy. We model the free energies of the solid and the liquid phases with an ideal solution model. We perform the asymptotic analysis and derive expressions for the simulation parameters such that the resulting time scale of evolution is limited by the evolution of the solute field. This implies that the phase transformation relaxes infinitely fast in response to a change of the concentration field. The simulation temperature is set to T = 895 K, with a small supersaturation and a Gibbs-Thomson coefficient of 2.4 · 10⁻⁷ K m. The size of the domain in the 2D simulations is 600 µm, while in the 3D simulations it is 100 µm with a grid resolution of 0.2 µm. The strength of the anisotropy is 0.06. The secondary arm formation is induced by Langevin noise in the phase-field equations. The simulations are able to retrieve some of the important features such as the secondary arm spacings and the dendrite tip radius, as can be seen in Fig. 4a and b. To obtain these quantities over the whole range of undercoolings and driving forces, highly advanced computational optimizations are required for simulating dendrite growth. This is, however, an outlook, and dendrite growth still remains a serious computational challenge.
5 Optimizations
In this section we describe the optimization of the Pace3D simulation software by applying a multi-dimensional domain decomposition and parallel file IO with MPI. 1D and small 2D simulations can be computed on a regular workstation with 1 CPU.
Fig. 4 Dendrites under isothermal conditions of T = 895 K in (a) two dimensions and (b) three dimensions
To use all available resources, even those of a workstation, concurrent calculations have to be applied. The current development of CPUs points towards more processing units per socket, such that the efficiency of parallel calculations should be optimized to gain a bigger advantage. 1D domain decomposition enables parallel execution, but has drawbacks compared to a multi-dimensional decomposition, especially for big domains and many parallel tasks. Each subdomain has to transfer more boundary cells in the case of a 1D decomposition than in the 3D case, because the surface of a nearly cubic subdomain is smaller than the surface of a thin slice with the same number of cells (compare Fig. 1 and Fig. 5a). If the interconnect between nodes has a high latency, 1D decomposition should have the advantage of fewer communication partners. Nevertheless, 1D decomposition limits the number of usable CPUs to the size of the largest domain axis. A decomposition along all three axes as in Fig. 5a allows the use of even more CPUs and increases the efficiency due to better cache usage; a sketch of setting up such a process grid with MPI is given below. The 3D decomposition of the software is in a final stage and a large-scale performance analysis is in progress. To face load imbalance, we have developed an adaptive decomposition scheme which divides the domain more often in areas with higher load to increase the overall efficiency. IO operations for writing and reading files are another slowdown to be minimized. MPI delivers IO operations for the 3D decomposition with the chance of optimized writing and reading throughput. The measurements in Table 3 show a clear difference between collective and non-collective writing operations, as mentioned in the MPI standard [11]. They also show that the throughput of workstations can be higher than that of a high performance cluster. The measurements have to be repeated and evaluated at different times, since the file IO performance depends on the workload of the system. Workload and hardware configuration are not the only performance issues; an appropriate configuration of the system has an additional significant influence.
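The following sketch sets up a near-cubic 3D process grid and queries the six face neighbors needed for the ghost-layer exchange of a 3D decomposition; it is illustrative only and not the actual Pace3D decomposition code:

#include <mpi.h>

void setup_3d_grid(MPI_Comm comm, MPI_Comm *cart, int lo[3], int hi[3]) {
    int nprocs, dims[3] = {0, 0, 0}, periods[3] = {0, 0, 0};
    MPI_Comm_size(comm, &nprocs);
    MPI_Dims_create(nprocs, 3, dims);       /* near-cubic factorization */
    MPI_Cart_create(comm, 3, dims, periods, 1, cart);
    for (int d = 0; d < 3; d++)             /* neighbor ranks per axis */
        MPI_Cart_shift(*cart, d, 1, &lo[d], &hi[d]);
}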
Fig. 5 Decomposition of a 3D simulation domain by (a) a “regular” 3D decomposition and (b) adaptive decomposition

Table 3 Writing throughput of an MPI task on different systems in MB/s

System                 collective   non-collective
XC 4000                7,933        6,355
'ProStudium' cluster   2,564        1,012
Office workstations    20,235       7,012
The measurements were performed without the use of MPI hints and provide us with basic performance numbers. The so-called 'ProStudium' cluster is a high performance computing facility containing 40 nodes, each equipped with 8 CPUs. An InfiniBand network with DDR connections is the high performance interconnect between the nodes. Data are stored at the login node, which contains a software RAID 5 system with a handful of disks. This system is provided exclusively for student projects and teaching purposes. In contrast, all office workstations have 2 CPUs, are connected through 1 GBit Ethernet, and the data are stored on a hardware RAID 5 system. The high throughput of the office workstations compared to the clusters has to be investigated, and a broader performance analysis of the new 3D decomposition for the Pace3D simulation application has to be carried out.
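The difference between the two writing modes boils down to one function call. A minimal sketch, assuming each rank owns n_local contiguous doubles at a rank-dependent file offset (names are illustrative):

#include <mpi.h>

void write_frame(const char *path, const double *buf, int n_local,
                 MPI_Comm comm) {
    int rank;
    MPI_Comm_rank(comm, &rank);
    MPI_File fh;
    MPI_File_open(comm, (char *)path, MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);
    MPI_Offset off = (MPI_Offset)rank * n_local * (MPI_Offset)sizeof(double);
    /* collective variant; the independent counterpart is MPI_File_write_at */
    MPI_File_write_at_all(fh, off, buf, n_local, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
}

Passing an MPI_Info object instead of MPI_INFO_NULL is where the MPI hints mentioned above would enter.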
6 Outlook
The current state of our studies, developments and implementations allows us to investigate different phenomena of material development in detail. The model and implementation grow hand in hand. The investigation of metallic foams combined with fluid flow and heat convection is one of the next steps, in order to predict the heat transfer for different flow conditions and for different metals and alloys. The application of topology optimization to metallic foam structures and other porous synthetic and
symmetric configurations with respect to specific properties is another study within the current range of feasibility. The investigation of metallic foams is a particular topic where simulations based on the phase-field method provide valuable information for microstructure design, but it is by far not the only application. There are many open questions in the field of solidification and melting. For the study of dendritic structures, their properties and behavior, feasible simulation domains and efficient algorithms have to be applied to draw sound conclusions. The addition of thermodynamic databases and more accurate calculations emphasizes the need for high performance computing resources. From a technical point of view, progress lies in the continuous development of fast, portable and maintainable algorithms aiming for the most efficient use of computing resources. 3D domain decomposition is an important step to meet current and future needs. Its advantages and disadvantages have to be identified to provide solid programs and user advice. Besides the studies presented in this report, almost all microstructure simulations nowadays concern systems with many phases, grains or particles and require the use of high performance computations. The property-changing process during recovery of heavily plastically deformed metal sheets involves the growth of many grains in a many-grain microstructure. Large-scale simulations are capable of representing the process of recrystallization. To describe the Lotus effect of leaves, a high resolution of the numerical grid is necessary to resolve the rough and complex structured surface of the leaf. Each presented topic benefits from using high performance computational resources. Applications of the phase-field model exploit their capabilities and give insight into physical processes and mechanisms. The technical progress of software development allows current and prospective systems to be used efficiently.
Acknowledgments. The presented simulations were performed on the Landeshöchstleistungsrechner XC 4000 of the Karlsruhe Institute of Technology (KIT). The authors gratefully acknowledge the access to the system.
References
1. B. Nestler, H. Garcke, and B. Stinner: Multicomponent alloy solidification: Phase-field modelling and simulations. Physical Review E, 71:041609, 2005
2. H. Garcke, B. Nestler, and B. Stinner: A diffuse interface model for alloys with multiple components and phases. SIAM J. Appl. Math., 64:775, 2004
3. B. Nestler and A. A. Wheeler: A multi-phase-field model of eutectic and peritectic alloys: Numerical simulation of growth structures. Physica D, 138, 2000
4. B. Nestler, A. Aksi, and M. Selzer: Combined Lattice Boltzmann and phase-field simulations for incompressible fluid flow in porous media. Mathematics and Computers in Simulation, 80:1458–1468, 2010
5. R. Spatschek, C. Müller-Gugenberger, E. Brener, and B. Nestler: Phase field modeling of fracture and stress induced phase transitions. Physical Review B, 75:066111, 2007
6. T. Kim and M. C. Lin: Visual Simulation of Ice Crystal Growth. Department of Computer Science, Eurographics/SIGGRAPH Symposium on Computer Animation, 2003
7. H. Garcke, T. Preußer, M. Rumpf, A. C. Telea, U. Weikard, and J. J. van Wijk: A Phase Field Model for Continuous Clustering on Vector Fields. IEEE Transactions on Visualization and Computer Graphics, 7(3), 2001
8. R. Spatschek, M. Hartmann, E. Brener, H. Müller-Krumbhaar, and K. Kassner: Phase Field modelling of Fast Crack Propagation. arXiv, 2005
9. B. Nestler and A. Choudhury: Phase-field modeling of multi-component systems. Current Opinion in Solid State and Materials Science, DOI: 10.1016/j.cossms.2011.01.003, 2011
10. S. G. Kim, D. I. Kim, W. T. Kim, and Y. B. Park: Computer simulations of two-dimensional and three-dimensional ideal grain growth. Physical Review E, 74:061605, 2006
11. Message Passing Interface Forum: A Message-Passing Interface Standard, Version 2.2
Quaero 2010 Speech-to-Text Evaluation Systems
Sebastian Stüker, Kevin Kilgour, and Florian Kraft
1 Introduction
Our laboratory has used the HP XC4000, the high performance computer of the federal state of Baden-Württemberg, to participate in the third Quaero evaluation (2010) for automatic speech recognition (ASR). State-of-the-art ASR research systems usually employ techniques which require the parallel execution of several recognition systems for the purpose of system combination. The use of unsupervised adaptation techniques further requires the execution of several stages, or passes, of ASR systems. As a consequence, modern research systems process speech only at a run-time of many times real-time, under certain circumstances up to 100 times real-time. The process of speech recognition in this form can easily be parallelized at the speaker level in independent processes without the need for inter-process communication. Therefore, the scheduling system of the XC4000, in combination with its global, high-performance file space, is an ideal environment for executing such an evaluation. In this paper we report on our 2010 ASR evaluation systems for English and German that we, at least in part, trained and executed on the XC4000, and that are an extension of our 2009 systems [11].
2 Quaero
Quaero (http://www.quaero.org) is a French research and development program with German participation. It aims to develop multimedia and multilingual indexing and management tools for professional and general public applications such
Sebastian Stüker · Kevin Kilgour · Florian Kraft Research Group 3-01 'Multilingual Speech Recognition', Karlsruhe Institute of Technology, Karlsruhe, Germany, e-mail:
[email protected],
[email protected],
[email protected] W.E. Nagel et al. (eds.), High Performance Computing in Science and Engineering ’11, DOI 10.1007/978-3-642-23869-7 44, © Springer-Verlag Berlin Heidelberg 2012
as the automatic analysis, classification, extraction, and exploitation of information. The projects within Quaero address five main application areas:
• Multimedia Internet search
• Enhanced access services to audiovisual content on portals
• Personalized video selection and distribution
• Professional audiovisual asset management
• Digitalization and enrichment of library content, audiovisual cultural heritage, and scientific information.
Also included in Quaero is basic research in the technologies underlying these application areas, including automatic speech recognition, machine translation, and speech-to-speech translation. The vision of Quaero is to give the general public as well as professional users the technical means to access, across languages, the various information types and sources in digital form that are available to everyone via personal computers, television, and handheld terminals. Quaero is organized as a program consisting of seven projects. Five projects are concerned with applications. In addition, one project, the Core Technology Cluster (CTC), conducts basic research in the technologies underlying the application projects, and one project is concerned with providing the data resources necessary for the research within the CTC. Our laboratory is mainly involved in the CTC project. Two of the technologies under investigation are Automatic Speech Recognition (ASR), i.e. the automatic transcription of human speech into written records, and Machine Translation (MT). Within Quaero, research is driven by competitive evaluation and the sharing of results and technologies employed. This process is called coopetition. Evaluations are conducted once a year on a predefined domain and a set of languages. As the project continues, the number of languages to address will grow, and the performance of the recognition systems developed within the project is expected to improve. The third evaluation, conducted in August 2010, was the second real evaluation after the baseline evaluation in 2008 and the first evaluation in 2009. Seven languages were addressed: English, French, German, Greek, Polish, Russian, and Spanish. We participated in the languages English, German, Russian, and Spanish. The test data for the evaluation consisted of various audio files collected from the World Wide Web, including broadcast news, lectures, and video blogs.
3 English Evaluation Recognition Systems
For the 2010 English ASR evaluation within Quaero we participated with a recognition system that is a further development of our 2009 evaluation system [11]. The system was trained and tested with the help of the Janus Recognition Toolkit, which features the IBIS single pass decoder [13]. In general, all recognition systems employ left-to-right Hidden Markov Models (HMMs), modeling phoneme sequences with 3 HMM states per phoneme. The general features of the system are
the same as for 2009. Improvements over the 2009 system came from increased amounts of training data, and the use of a large language model that made it necessary to use memory mapping in order to share the language model among several processes running on the same node.
3.1 Front-End
We trained systems for two different kinds of acoustic front-ends. One is based on the traditional Mel-frequency Cepstral Coefficients (MFCC) obtained from a fast Fourier transform, the other on the warped minimum variance distortionless response (MVDR). The second front-end replaces the Fourier transformation by a warped MVDR spectral envelope [16], a time domain technique to estimate an all-pole model using a warped short-time frequency axis such as the Mel scale. The use of the MVDR eliminates the overemphasis of harmonic peaks typically seen in medium and high pitched voiced speech when spectral estimation is based on linear prediction. For training, both front-ends provided features every 10 ms. During decoding this was changed to 8 ms after the first stage. In training and decoding, the features were obtained either by the Fourier transformation followed by a Mel-filterbank or by the warped MVDR spectral envelope. For the MVDR front-end we used a model order of 22 without any filterbank, since the warped MVDR already provides the properties of the Mel-filterbank, namely warping to the Mel frequency and smoothing. The advantage of this approach over the use of a higher model order and a linear filterbank for dimensionality reduction is an increase in resolution in low frequency regions which cannot be attained with traditionally used Mel-filterbanks. Furthermore, with the MVDR we apply an unequal modeling of spectral peaks and valleys that improves noise robustness, due to the fact that noise is mainly present in low energy regions. Both front-ends apply vocal tract length normalization (VTLN) [17]; for MFCC this is done in the linear domain, for MVDR in the warped frequency domain. The MFCC front-end uses 13 cepstral coefficients, the MVDR front-end uses 15. The mean and variance of the cepstral coefficients were normalized on a per-utterance basis. For both front-ends, seven adjacent frames were combined into one single feature vector. The resulting feature vectors were then reduced to 42 dimensions using linear discriminant analysis (LDA).
3.2 Acoustic Model Training
We trained acoustic models for two different phoneme sets, P1 and P2. P1 is a version of the Pronlex phoneme set which consists of 44 phonemes and allophones, while P2 is a version of the phoneme set used by the CMU dictionary that consists
of 45 phonemes and allophones. We trained models for all four combinations of the two phoneme sets and the two acoustic front-ends described above. Unlike last year, for this year's evaluation we trained acoustic models on only one set of acoustic model training data. The training data contain approximately 80 h of English EPPS data provided by RWTH Aachen within the TC-STAR project [6], 9.8 h of TED data [8], and 167 h of unsupervised EPPS training material that had been collected within TC-STAR by RWTH Aachen but had not been manually transcribed. Transcriptions for the unsupervised training material were obtained by adapting an acoustic model of last year's system on automatic transcriptions provided by RWTH Aachen for that data. We then decoded the data, using the segmentation provided by RWTH Aachen. The training data further contained 140 h of Broadcast News data from the HUB-4 corpus, and approx. 50 h of in-domain training data provided by the Quaero consortium. All models are semi-continuous quinphone systems that use 16000 distributions over 4000 codebooks. They were trained using incremental splitting of Gaussians training, followed by 2 iterations of Viterbi training. For all models we used one global semi-tied covariance (STC) matrix after LDA [5] as well as Vocal Tract Length Normalization. On top of that, feature space constrained MLLR (cMLLR) speaker adaptive training [3] was applied. We further improved the acoustic models with the help of Maximum Mutual Information Estimation (MMIE) training [10]. We applied MMIE training firstly to the models after the 2 Viterbi iterations, and secondly to the models after the FSA-SAT training, taking the adaptation matrices from the last iteration of the maximum likelihood FSA training and keeping them unchanged during the MMIE training. This resulted in eight different acoustic models: for each combination of front-end, MVDR and MFCC, and phoneme set, P1 and P2, one set of models trained with VTLN plus MMIE, and one with FSA-SAT plus MMIE. From now on we refer to these models as P1-MFCC-VTLN, P1-MVDR-VTLN, P2-MFCC-VTLN, P2-MVDR-VTLN, P1-MFCC-SAT, P2-MFCC-SAT, P1-MVDR-SAT, P2-MVDR-SAT.
3.2.1 Segmentation and Clustering
Segmenting the input data into smaller, sentence-like chunks used for recognition was performed with the help of a fast decoding pass on the unsegmented input data in order to determine speech and non-speech regions. Segmentation was then done by consecutively splitting segments at the longest non-speech region that was at least 0.3 seconds long. The resulting segments had to contain at least eight speech words and had to have a minimum duration of six seconds. The maximum segment length was limited to 30 seconds. In order to group the resulting segments into several clusters, with each cluster, in the ideal case, corresponding to one individual speaker, we used the same hierarchical, agglomerative clustering technique as last year, which is based on the TGMM-GLR distance measure and the Bayesian Information Criterion (BIC) stopping criterion [7].
The resulting speaker labels were used to perform acoustic model adaptation in the multipass decoding strategy described below.
3.2.2 Language Model and Test Dictionary
Using a 130k vocabulary, a 4-gram case sensitive language model with modified Kneser-Ney smoothing was built for each of the text sources listed in Table 1. This was done using the SRI Language Modelling Toolkit [14]. The effects of the different text sources on the performance of the language model can be seen in Table 2.¹ The quick transcripts of the Quaero training data were cleaned and split into a 601k word training set and a 615k word tuning set. The aforementioned language models built from the text sources in Table 1 were interpolated using interpolation weights estimated on this tuning set, resulting in a 50 GByte language model with 101 079k 2-grams, 424 062k 3-grams, and 978 979k 4-grams.

Table 1 English text sources

Corpus                                      Wordcount
UK parliamentary debate (Hansard)           49 681k
EPPS acoustic training data                 750k
EPPS text data                              33 044k
UN Parallel Text (English)                  40 991k
Hub4 Broadcast News data                    832k
Web dump (pre 2008)                         643 236k
Quaero 2010 training transcripts            1 216k
Quaero 2010 training texts                  1 764 227k
Web dump (February 2010)                    4 764 752k
Web dump (December 2009)                    1 488 881k
Gigaword 4th Edition including 2008 texts   1 800 434k
Google Ngrams                               –
Total                                       10 588 044k

Table 2 WER (in %) on the Quaero 2010 development set with varying language models. All other ASR components are the same as in our 2009 system

Name   Language model description           Case dependent        Case independent
                                            pruned   not pruned   pruned   not pruned
LM1    same as Quaero 2009 LM               34.31    –            31.99    –
LM3    LM1 + new tuning set                 34.39    –            32.04    –
LM5    LM3 + Q2010 training transcripts     32.56    32.47        31.03    30.93
LM6    LM5 + Q2010 training texts           31.46    31.28        30.39    30.19
LM7    LM6 + Web dump (February 2010)       31.39    31.26        30.36    30.25
LM8    LM7 + Web dump (December 2009)       31.26    –            30.21    –
LM9    LM8 + Gigaword 4th Edition           31.35    31.06        30.34    30.12
LM10   LM9 + Google Ngrams                  –        30.94        –        30.01
¹ A keen observer may notice that LM2 and LM4 do not appear in this table. Their omission is not due to a dislike of the numbers 2 and 4 but rather the result of dead ends in the development of the language model for our evaluation system.
To select the vocabulary, the development data text was split into a tuning set and a test set, with each containing approximately half the text of every show. For each of our English text sources (see Table 1) we built a Witten-Bell smoothed unigram language model using the union of the text sources' vocabularies as the language models' vocabulary (global vocabulary). With the help of the maximum likelihood count estimation method described in [15], we found the best mixture weights for representing the tuning set's vocabulary as a weighted mixture of the sources' word counts, thereby obtaining a ranking of all the words in the global vocabulary by their relevance to the tuning set. While the baseline 64k vocabulary had an OOV rate of 3.9% when measured on the validation set, the OOV rate of the vocabulary containing only the top-ranked 64k words was 2.9%. This vocabulary was slowly increased until the OOV rate was under 1%. The final 130k vocabulary had a case sensitive OOV rate of 0.73%. Pronunciations missing from the initial dictionary were created either manually or automatically, with the help of Bill Fisher's tool [2] for P1 and Festival [1] for P2, respectively.
3.2.3 Decoding Strategy and Results
Decoding within our recognition system was performed in two stages. The acoustic models of the second stage were adapted on the output(s) from the previous stage using Maximum Likelihood Linear Regression (MLLR) [9], Vocal Tract Length Normalization (VTLN) [17], and feature-space constrained MLLR (cMLLR) [3]. After the first stage, the frame shift during recognition was changed to 8 ms. In the first stage we used the acoustic models P1-MFCC-VTLN, P1-MVDR-VTLN, P2-MFCC-VTLN, and P2-MVDR-VTLN. The resulting word lattices of P1-MFCC-VTLN and P1-MVDR-VTLN were then combined via confusion network combination into the output o1, those of P2-MFCC-VTLN and P2-MVDR-VTLN into o2. In this first stage we adapted the acoustic models using incremental VTLN and incremental fMLLR on a per-speaker basis. For the second stage, P2-MFCC-SAT and P2-MVDR-SAT were adapted on o1, and P1-MFCC-SAT and P1-MVDR-SAT were adapted on o2. The results of the different models were then combined via confusion network combination into the final output. On the official 2010 development set the system achieved a word error rate of 24.0%.
4 German Evaluation Recognition System All speech recognition experiments described in the following were performed with the help of the Janus Recognition Toolkit (JRTk) and the Ibis single pass decoder [12].
4.1 Front-End
We applied two different front-ends: the WMVDR approach and the conventional MFCC approach. The front-end uses a 42-dimensional feature space with linear discriminant analysis and a global semi-tied covariance (STC) transform [4] with utterance-based cepstral mean and variance normalization. The 42-dimensional feature space is based on 20 cepstral coefficients for the MVDR system and on 13 cepstral coefficients for the MFCC system.
4.2 Acoustic Model Training
The training setup was based on last year's evaluation system. We used the following training material: Quaero development data set 2009 (13 hours), Quaero training data set 2009 (6 hours EPPS, 14 hours web data), Quaero training data set 2010 (51 hours), Verbmobil (67 hours), recordings of the Landtag Baden-Württemberg (123 hours), Tagesschau (17 hours), ISL database (16 hours), GlobalPhone (19 hours), and in-house lecture and talk recordings (26 hours). All acoustic data is in 16 kHz, 16 bit quality. Acoustic model training was performed with fixed state alignments and VTLN factors, which were written by last year's evaluation system. The trained system uses left-to-right hidden Markov models (HMMs) without state skipping, with three HMM states per phoneme. In addition to last year's setup with 2000 distributions and codebooks with up to 128 Gaussians per model using the MVDR front-end, we trained the same setup with the MFCC front-end, and for both front-ends also new systems with 4000 distributions. The adapted gender-independent acoustic model training (given the vocal tract normalization values for each speaker from the previous system) can be outlined as follows:
• Training of the linear discriminant analysis matrix
• Extraction of samples
• Incremental growing of Gaussians
• Training of one global STC matrix
• Second extraction of samples
• Second incremental growing of Gaussians
• Three iterations of Viterbi training
• Three iterations of FSA-SAT speaker adaptive training.
For the systems with 4000 distributions we skipped the second incremental growing of Gaussians, since we could not observe gains from it in other systems.
4.3 Language Model Training and Evaluation
Using the same methods described in the language model section of the English evaluation system, we selected a 300k vocabulary with which a 4-gram case sensitive language model was built for each of our German text sources. Because interpolating several language models with interpolation weights estimated on only the aforementioned tuning text produced a language model that performed poorly, we added some more general text to the tuning text and re-estimated the interpolation weights. This produced a language model which outperformed the base system's language model.
4.4 Decoding Strategy and Results
After a segmentation pass and speaker clustering, we decoded with both front-ends, MVDR and MFCC, both setups with 2000 and 4000 distributions, using the speaker-independent acoustic models.
Table 3 WERs on the German Quaero development set 2010

ID   pass           AM          LM     WER in % (ci/cs)
S    Segmentation   2000 MVDR   LM01   35.5 / 36.4
A    1st            4000 MVDR   LM01   30.0 / 31.2
B    1st            4000 MFCC   LM01   29.8 / 30.9
C    1st            2000 MVDR   LM01   30.8 / 31.9
D    1st            2000 MFCC   LM01   31.2 / 32.3
E    cnc A+B+C+D                       28.3 / 29.4
F    2nd            4000 MVDR   LM02   26.8 / 28.0
G    2nd            4000 MFCC   LM02   27.0 / 28.0
H    2nd            2000 MVDR   LM02   27.7 / 28.8
I    2nd            2000 MFCC   LM02   27.9 / 29.0
J    cnc F+G+H+I                       26.1 / 27.2
Table 4 WERs on the German Quaero evaluation set 2010

ID   pass               AM          LM     WER in % (ci/cs)
S    Segmentation       2000 MVDR   LM01   33.2 / 34.1
A    1st                4000 MVDR   LM01   28.1 / 29.3
B    1st                4000 MFCC   LM01   28.3 / 29.6
C    1st                2000 MVDR   LM01   29.2 / 30.4
D    1st                2000 MFCC   LM01   29.8 / 31.0
E    cnc A+B+C+D                           26.8 / 28.0
F    2nd                4000 MVDR   LM02   25.3 / 26.5
G    2nd                4000 MFCC   LM02   25.7 / 26.8
H    2nd                2000 MVDR   LM02   26.3 / 27.5
I    2nd                2000 MFCC   LM02   26.5 / 27.7
J    cnc F+G+H+I                           24.6 / 25.7
K    compound merging                      24.1 / 25.2
The result of a confusion network combination (CNC) applied to all four systems was used to adapt the 2nd-pass systems with incremental VTLN adaptation, constrained MLLR, and MLLR. In the second pass, speaker-adapted FSA-SAT models and a larger language model were used. Finally, we combined the four 2nd-pass systems again using CNC and applied compound merging. On the official 2010 German development set the system achieved a word error rate of 26.1%. Tables 3 and 4 show the word error rates of the individual stages and combinations on the Quaero 2010 development and evaluation sets.
5 Parallelization Utilized

The XC4000 was utilized at different stages in the development and application of the ASR systems, for training as well as for decoding. For training, the SLURM scheduler was used to start several processes in parallel that work on disjoint portions of the training data and synchronize and combine their results after every training iteration, using an in-house synchronization mechanism based on flag files. For decoding, the SLURM scheduler was likewise used to simply start several processes in parallel that can in principle run independently of each other, since in decoding every speaker can be treated independently. The runtime of the different jobs per speaker can vary greatly, depending on how much speech is associated with a speaker. Due to the accounting system of the queue on the cluster (one process still running on one CPU of one node, while all other processes belonging to the same scheduled job have already finished, is charged the same amount of CPU time as if all processes were still running), we aimed for as few processes per scheduled job as possible. However, since only 10 jobs per user are allowed in the production environment, no matter how many nodes or CPUs are actually employed, this would mean that only 40 parallel processes could be started, assuming 4 CPUs per node in the cluster. In order to exploit more parallelism, up to 8 or 16 processes per scheduled job were therefore submitted to the queue, even though this potentially means being charged for the runtime of nodes on which all processes have already finished. A sketch of this packing scheme is given below.
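As an illustration, the following minimal Python sketch distributes a speaker list over at most 10 scheduled SLURM jobs with up to 16 decoder processes each. The job script name decode_job.sh and the use of sbatch with --ntasks are assumptions made for the sketch; they are not the actual submission scripts of the evaluation system.

    import subprocess

    def submit_decoding_jobs(speakers, max_jobs=10, procs_per_job=16):
        # distribute the speakers round-robin over at most max_jobs jobs;
        # inside each job the (hypothetical) script decode_job.sh starts
        # up to procs_per_job decoder processes in parallel, each of which
        # works sequentially through its own share of the speakers
        groups = [speakers[i::max_jobs]
                  for i in range(min(max_jobs, len(speakers)))]
        for group in groups:
            subprocess.run(["sbatch", "--ntasks=%d" % procs_per_job,
                            "decode_job.sh"] + group, check=True)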
5.1 Shared Memory Language Model

An important part of an ASR system is its language model, which models the a priori probabilities of word sequences. Good language models generally require a large amount of memory but remain unchanged after being loaded into RAM. Our 2009 English system, for example, required about 8 GBytes to run, almost 7 GBytes of which were used by the language model. Because our new language model was 11 GBytes in size (even compressed in an easy-to-load binary format), we modified our ASR system so that we could load the language model into a region of shared memory and allow multiple decoder instances running on different cores to access it.
A standard 4-core XC4000 node with 16 GBytes of memory was able to start 3–4 instances of our system with an 11 GByte language model, compared to just 2 instances of our older system with a 7 GByte language model, or only a single instance of our new system without the shared memory language model.
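The report does not detail the shared-memory mechanism that was used; one common way to obtain this behavior on POSIX systems is to memory-map the read-only binary language model, so that all decoder processes on a node share a single physical copy of its pages. A minimal sketch, assuming the model is stored in one binary file:

    import mmap, os

    def map_language_model(path):
        # map the read-only binary language model into memory; every
        # decoder process on the node that maps the same file shares a
        # single physical copy of its pages via the OS page cache
        fd = os.open(path, os.O_RDONLY)
        size = os.fstat(fd).st_size
        lm = mmap.mmap(fd, size, prot=mmap.PROT_READ)
        os.close(fd)
        return lm  # the decoder reads n-gram records out of this buffer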
5.2 One-Button System

The training and testing systems usually consist of several sequential steps. Each step is parallelized on the XC4000 cluster using SLURM. The parallelization of a step is implemented using a master–slave system.

5.2.1 Parallelization of a Step

The master holds a list of speakers, phonemes, or any other data that can be divided and processed independently. In this One-Button system the master can be a single master or a multiple master (for a step which runs multiple iterations). The clients (slaves) process the actual tasks. The master communicates with the clients and gives each available client what needs to be processed by that individual client (from the list that the master holds). The communication is done via sockets, so a client has to know the port and the hostname of the master. Every time a client completes its job, it notifies the master and then either gets a new job or, if there are no jobs left, waits for the others to complete theirs. At the end of a step, the master waits for a minute (to make sure all jobs have really completed) and then notifies the system to run the next step. A minimal sketch of this protocol is given below.

5.2.2 Detailed Description

The One-Button system manages all steps. It reads the defined configuration of each step and the order of execution and then runs each step on the cluster according to that configuration. It can submit in development mode or in production mode, as defined in the configuration. The One-Button system can do the following things:
• Run all steps from start to end
– ./DO.train system “training id”
• Run a single step only
– ./DO.step “training id” “step name or order-index”
• Run from a step and continue until the last step
– ./DO.from “training id” “step name or order-index”
• Clean the outputs of a step (this is especially useful when a step has crashed and needs to be reset)
– ./DO.step.clean “training id” “step name or order-index”
• Continue a step and ignore the crashed or half-finished items: do no cleaning and just continue with DO.from or DO.step
• Inform the user of the current training progress (the completion progress of every step).
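To make the master–slave mechanism of Sect. 5.2.1 concrete, here is a minimal Python sketch of the task hand-out protocol. The actual implementation keeps persistent connections and completion notifications; the port number and the function run_step_on are hypothetical stand-ins.

    import socket

    def master(tasks, port=51000):
        # hand out one task per client request until the list is empty,
        # then close; clients take the closed socket as "no work left"
        todo = list(tasks)
        srv = socket.create_server(("", port))
        while todo:
            conn, _ = srv.accept()
            with conn:
                conn.recv(64)                      # client requests work
                conn.sendall(todo.pop().encode())  # reply with next task
        srv.close()

    def slave(host, port=51000):
        while True:
            try:
                with socket.create_connection((host, port)) as conn:
                    conn.sendall(b"NEXT")
                    task = conn.recv(4096).decode()
            except OSError:
                return            # master is gone: all tasks finished
            run_step_on(task)     # hypothetical: process one list item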
All outputs, including the log files, are located in the working directory of a setup. So to re-run all steps from the beginning, the only thing to do is to remove the working directory and re-run the whole pipeline. The One-Button system also automatically backs up log files and the parallelization list to make it possible to continue a step where it previously crashed or was canceled.

5.2.3 Configuration Parameters

There are two parts to the configuration: the global configuration and the step configuration. The global one consists of: name (training id), working directory, job-name prefix, duration (the maximum time allowed for each step), and the submit mode (development or production, as explained before). The toolkit binary used can be defined globally or locally for each step; in other words, each step can use a different binary. The configuration of each step is located in one file (the “steps” file). Each row defines the configuration of one step. The steps are executed in the same order as they occur in the steps-file, from the first row to the last. The format for each row in the steps-file is “steps name”, “slave”, “master”, “original list”, “memory”, “janus(opt)”, where:
• “steps name”: the name of the step; the script name is scripts/“steps name”.tcl
• “slave”: the number of parallelized workers
• “master”: the number of iterations, 1 meaning a single iteration
• “original list”: the name of the parallelization list (speaker list, cbslist, etc.)
• “memory”: the memory that will be provided for this step
• “janus(opt)”: optional, the janus binary to be used only in this step.
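For illustration, a hypothetical row such as

    samplesExtract, 64, 1, speakerList, 4G

would execute scripts/samplesExtract.tcl with 64 parallel slaves in a single iteration, parallelized over the entries of speakerList, with 4 GB of memory per worker and the globally configured janus binary. All values in this row are invented for the example.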
Other configuration files that are required by a particular step (featAccess, featDesc, traindesc, etc.) are located in the desc-directory “training id”.desc/* and the parallelization lists in “training id”.desc/Lists/*. Furthermore, the cleaning requires another configuration file that lists the outputs each step produces: config.“training id”.clean.

5.2.4 Reusability

The basic idea of the One-Button system is to separate configuration files from the output files (including the log files). All configuration files needed are put in one directory, all output files in another. This separation makes it easier to analyze and manage the output of different setups. Thus we can run different setups and test different parameters just by copying the configuration file and description folder, changing a few parameters, and running the whole training or test system with its multiple steps using just one button.

Acknowledgments. This work was realized as part of the Quaero Programme, funded by OSEO, the French State agency for innovation.
The Research Group ‘3-01 Multilingual Automatic Speech Recognition’ received financial support from the ‘Concept of the Future’ of the Karlsruhe Institute of Technology within the framework of the German Excellence Initiative.
References

1. A.W. Black and P.A. Taylor. The Festival speech synthesis system: System documentation. Technical report, Human Communication Research Centre, University of Edinburgh, Edinburgh, Scotland, United Kingdom, 1997.
2. W.M. Fisher. A statistical text-to-phone function using n-grams and rules. In Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing, Phoenix, AZ, USA, December 1999. IEEE.
3. M.J.F. Gales. Maximum likelihood linear transformations for HMM-based speech recognition. Technical report, Cambridge University, Engineering Department, May 1997.
4. M.J.F. Gales. Semi-tied covariance matrices. 1998.
5. M.J.F. Gales. Semi-tied covariance matrices for hidden Markov models. Technical report, Cambridge University, Engineering Department, February 1998.
6. C. Gollan, M. Bisani, S. Kanthak, R. Schlüter, and H. Ney. Cross domain automatic transcription on the TC-STAR EPPS corpus. In Proceedings of the 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’05), Philadelphia, PA, USA, March 2005.
7. Q. Jin and T. Schultz. Speaker segmentation and clustering in meetings. In Proceedings of the 8th International Conference on Spoken Language Processing (Interspeech 2004 – ICSLP), Jeju Island, Korea, October 2004. ISCA.
8. E. Leeuwis, M. Federico, and M. Cettolo. Language modeling and transcription of the TED corpus lectures. In International Conference on Acoustics, Speech, and Signal Processing, Hong Kong, China, March 2003.
9. C.J. Leggetter and P.C. Woodland. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Computer Speech and Language, 9:171–185, 1995.
10. D. Povey and P.C. Woodland. Improved discriminative training techniques for large vocabulary continuous speech recognition. In International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, UT, USA, May 2001.
11. S. Stüker, K. Kilgour, and J. Niehues. Quaero speech-to-text and text translation evaluation systems. In High Performance Computing in Science and Engineering ’10 – Transactions of the High Performance Computing Center, Stuttgart (HLRS) 2010. Springer, Heidelberg, 2010.
12. H. Soltau, F. Metze, C. Fügen, and A. Waibel. A one pass-decoder based on polymorphic linguistic context assignment. Trento, Italy, 2001.
13. H. Soltau, F. Metze, C. Fügen, and A. Waibel. A one pass-decoder based on polymorphic linguistic context assignment. In Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU ’01), pages 214–217, Madonna di Campiglio, Trento, Italy, December 2001.
14. A. Stolcke. SRILM – an extensible language modeling toolkit. In Proceedings of the 7th International Conference on Spoken Language Processing (ICSLP 2002), pages 901–904, Denver, CO, USA, 2002. ISCA.
15. A. Venkataraman and W. Wang. Techniques for effective vocabulary selection. arXiv preprint cs/0306022, 2003.
16. M.C. Wölfel and J.W. McDonough. Minimum variance distortionless response spectral estimation, review and refinements. IEEE Signal Processing Magazine, 22(5):117–126, September 2005.
17. P. Zhan and M. Westphal. Speaker normalization based on frequency warping. In Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, Munich, Germany, April 1997.
Accurate Simulation of Wireless Vehicular Networks Based on Ray Tracing and Physical Layer Simulation T. Gaugel, L. Reichardt, J. Mittag, T. Zwick, and H. Hartenstein
Abstract Vehicle-to-vehicle (V2V) and vehicle-to-roadside (V2R) communication is required for numerous applications that aim at improving traffic safety and traffic efficiency. As recent studies have shown, communication in this context is significantly influenced by the radio propagation characteristics of the environment and the signal processing algorithms that are executed on the physical layer of the communication stack. Whereas shadowing of the transmitted signal, e.g. due to buildings, determines the ability to communicate “around corners”, channel estimation, channel equalization, and advanced coding schemes determine whether a receiver can decode a received signal successfully or not. Consequently, a proper assessment and evaluation of V2V and V2R communication, especially when traffic safety applications are considered, requires an accurate simulation of the wireless channel as well as of the physical layer of the protocol stack. To enable a proper assessment, we integrated a physical layer simulator into the popular NS-3 network simulator, validated our implementation against commercial off-the-shelf transceiver chipsets, and employed ray tracing as a method to accurately simulate the radio propagation characteristics of the Karlsruhe Oststadt. Since the simulation of signal processing details and ray tracing are computationally expensive modeling methods, we relied on the HP XC4000 to speed up the computation of both aspects.
T. Gaugel · J. Mittag · H. Hartenstein, Decentralized Systems and Network Services, Institute of Telematics, Karlsruhe Institute of Technology (KIT), D-76128 Karlsruhe, Germany
L. Reichardt · T. Zwick, Institut für Hochfrequenztechnik und Elektronik, Karlsruhe Institute of Technology (KIT), D-76128 Karlsruhe, Germany, e-mail:
[email protected] H. Hartenstein Steinbuch Centre for Computing (SCC), Karlsruhe Institute of Technology (KIT), D-76128 Karlsruhe, Germany, e-mail:
[email protected] W.E. Nagel et al. (eds.), High Performance Computing in Science and Engineering ’11, DOI 10.1007/978-3-642-23869-7 45, © Springer-Verlag Berlin Heidelberg 2012
1 Introduction

Simulation has proven to be an invaluable tool for the networking community, supporting the controlled assessment of the impact of wireless environments on existing communication mechanisms and the evaluation of solutions to emerging issues. At present, most popular network simulators include certain basic abstractions in their physical layer implementations as well as in the accompanying channel models. Thereby, they make a particular selection in the trade-off between simulation time and accuracy. Specifically, modern network simulators such as NS-3 [5] and QualNet [6] contain a fairly accurate representation of the layers above and including the MAC, but abstract significantly the physical layer and the wireless channel. Notably, the smallest simulation unit considered in such simulators is the packet, standing for a collection of bits. The collection is treated as an inseparable unit and does not allow the characterization of individual bits as erroneous, i.e. the frame is received in its entirety or not at all. However, in much of the physical layer simulation and channel modeling literature, individual bits are treated separately and the smallest unit employed is generally the time sample, i.e. a complex representation of the produced signal at the sender. Due to the aforementioned collective consideration of frame bits, the corresponding channel models are statistical abstractions that apply to the frame as a whole. As such, they are based on average values and their distributions, which have been obtained through empirical studies. Similarly, the signal reception characteristics of a frame are expressed only in terms of the average received signal strength and the average signal-to-interference-noise ratio (SINR). Yet, the computation of the SINR is based on the additive white Gaussian noise (AWGN) channel model and assumes that the amplitudes of interfering signals are Gaussian distributed, an assumption which is not inherently valid in real systems. The usage of statistical models in popular network simulators, as described above, conflicts with the objectives when evaluating in a concrete scenario whether communication between vehicles can help to improve traffic safety or not. For instance, when studying a specific intersection of a city, it is important to know exactly which and how many transmitted messages can be received successfully, by whom and at which point in time. Therefore, it is necessary to model the geometry of the intersection and the induced radio propagation characteristics as accurately as possible, and to properly reflect the performance of a real transceiver. To this purpose we employ ray tracing to accurately calculate the radio propagation characteristics within a studied scenario, and physical layer simulation to emulate the signal processing steps of a real transceiver. Since both ray tracing and physical layer simulation are computationally intensive methods, we make use of the HP XC4000 to speed up the calculation of the radio propagation characteristics, and to validate our developed physical layer simulator against recent commercial transceiver chipsets from Atheros. The rest of the paper is structured as follows. In Sect. 2 we explain and demonstrate how ray tracing is employed to characterize the radio propagation conditions for the eastern part of Karlsruhe.
Afterwards, we sketch the implementation of our physical layer simulator in Sect. 3 and discuss our validation results. Section 4 concludes this paper.
2 Realistic Radio Propagation by Means of Ray Tracing

For the simulation of Intelligent Transportation Systems (ITS) it is crucial to determine whether a packet was successfully received by other road users. This packet might include safety-related information sent by a roadside unit or by other road users and could therefore lead to altered driving behavior, like a braking maneuver, reduced velocity or a change of lanes. For that reason, communication simulation has to consider multiple influencing factors to realistically model the resulting probability of packet reception. One major factor that has to be considered is the path loss between the sending and the receiving vehicle, which directly influences the signal strength at the receiver and therefore the possibility to detect and decode a packet. The following subsections are structured as follows: first a brief overview is given of how radio propagation modeling is typically conducted. Then we discuss the advantages and disadvantages of the different approaches, followed by an explanation of how ray tracing can be applied to achieve realistic radio propagation models.
2.1 Commonly Used Methods for Radio Propagation Modeling

The majority of radio propagation models used nowadays are analytical models, directly based on or derived from the Friis free-space path loss formula introduced in [1]. These propagation models usually consider only the wavelength and the distance between sender and receiver. Various works have shown that this assumption is valid in rural scenarios if there is a direct line-of-sight connection between sender and receiver and if the propagation model is combined with a fast fading model to capture short-time space- or time-variant deviations, see e.g. [9]. In urban scenarios, however, the radio signal is much more affected by its surroundings. Buildings and vegetation may lead to reflections, diffractions and scattering, which result in a greater path loss compared to free-space propagation. To account for these additional influences in communication modeling, several approaches are conceivable:

1. If available, real measurements from a field operational test could be used. The collected data has to be stored in some kind of lookup table, and every time a transmission between a sender and a receiver takes place, the resulting path loss is directly queried. This approach is problematic, since it is really difficult to achieve the needed comparability in real-world measurements. On the one hand, short-time interference measured at certain locations has to be considered. On the other hand, it is problematic to guarantee similar conditions for all measurements, since, for example, weather and traffic conditions cannot be neglected in real-world measurements. Furthermore, the execution of a field operational test requires an immense amount of resources for planning, equipment acquisition and the execution of the measurement itself.
2. Another approach, pursued for example in [3], is to undertake real-world measurements as well and use the collected data to build an empirical model. The empirical model is then used to determine the average path loss value between sender and receiver. These models are easy to use and average out the specifics mentioned before, but on the other hand fail to distinguish between different urban conditions, since again usually only the distance and the wavelength are considered.
3. An approach presented for example in [8] and [2] uses analytical formulas to determine the path loss. These formulas do not only consider the distance, but also calculate the number of reflections and diffractions, based for instance on the width of the road in which sender and receiver are located. The problem with this approach is that heterogeneous scenarios cannot be modeled correctly. There is, for example, a problem if the road width varies between sender and receiver or if there are areas without buildings (e.g. parks).
4. A fourth approach is to use a ray tracer for the simulation of the propagation of electromagnetic waves. A radio signal undergoes many reflections and diffractions before ‘arriving’ at the receiver. The signal itself is also split up into several rays which might be received over different paths and with different signal strengths. The rays can then overlay each other constructively or destructively before arriving at the receiver. A ray tracer is able to consider all this behavior, based on mathematical formulas, in homogeneous as well as in heterogeneous scenarios. For this, a detailed scenario description is needed which includes, among other things, all buildings, their shape, height and the materials used. Furthermore, the computation of one single sender/receiver pair takes considerable computation time, which is why ray tracing is nowadays usually not used as input for communication simulations. For a more detailed description of ray tracing we refer to [7].

For the design and evaluation of V2V and V2R communication protocols and applications in urban scenarios we decided to explore the fourth approach, since it offers more trustworthy and accurate results than options two and three and is easier to manage and realize than option one. The idea, however, was to do a pre-calculation based on possible sender/receiver positions, hence creating a ‘radio map’ which can then be used in communication simulation at runtime. The pre-calculation should thereby be performed on the HP XC4000, since it takes quite a considerable time to do all the ray tracing, as will be discussed in Sect. 2.2.1.
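For reference, the analytical baseline that these models build on can be evaluated in a few lines; the following sketch computes the Friis free-space path loss in dB between isotropic antennas. The 5.9 GHz carrier is an assumption chosen because it is typical for vehicular communication, not a value taken from this study.

    import math

    def friis_free_space_loss_db(distance_m, frequency_hz=5.9e9):
        # free-space path loss between isotropic antennas,
        # (4*pi*d/lambda)^2 expressed in dB; only distance and
        # wavelength enter the formula
        wavelength = 3.0e8 / frequency_hz
        return 20.0 * math.log10(4.0 * math.pi * distance_m / wavelength)

    print(friis_free_space_loss_db(100.0))   # roughly 88 dB at 100 m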
Fig. 1 Map with all the buildings of the Karlsruhe Oststadt (data provided and copyrighted by the real estate office of the city of Karlsruhe)
2.2 The Usage of Ray Tracing to Model Radio Propagation

One basic problem of pre-calculation is that the exact positions of senders and receivers are not known in advance. Therefore a certain resolution of the considered scenario is necessary to limit the number of possible sender/receiver pairs. As an implication, it is then necessary to map actual sender/receiver positions to pre-calculated positions. The coarser the chosen resolution, the fewer combinations have to be pre-calculated, but also the less accurate the results might become. As a first approach we decided to pick a resolution of 2 meters. Since this would still result in an enormous number of sender/receiver pairs, we decided to further reduce the number of pre-calculated positions by only considering positions where vehicles can be located, namely roads. In an optimistic approach this results in about 10000 positions in the area of the Karlsruhe Oststadt (see Fig. 1) with about 20 kilometers of roads (counted per lane). The number of sender/receiver pairs is therefore on the order of 50 million, since the path loss is assumed to be the same in both directions and each unordered pair has to be calculated only once. A further optimization, currently work in progress, is an optimistic pre-estimation of potential sender/receiver pairs.
This estimation considers, for example, the maximum achievable distance in a pure line-of-sight case, as well as estimates for several non-line-of-sight cases. The idea is to filter out, with low-effort calculations, every sender/receiver pair for which definitely no reasonable path loss can be expected. Every other combination is passed on to the ray tracer for a more detailed path loss calculation.
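A minimal sketch of such a pre-filter, assuming road positions given as (x, y) coordinates in meters and a hypothetical maximum range of 1000 m; the real filter additionally applies non-line-of-sight estimates:

    import itertools, math

    def pairs_for_ray_tracing(positions, max_range_m=1000.0):
        # enumerate unordered sender/receiver pairs (path loss is assumed
        # reciprocal, hence ~N^2/2 pairs for N positions) and keep only
        # those within a plausible maximum range; only these are passed
        # on to the expensive ray-tracing calculation
        for p, q in itertools.combinations(positions, 2):
            if math.dist(p, q) <= max_range_m:
                yield p, q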
2.2.1 Preliminary Runtime Performance Evaluation on the HP XC4000

The first time a path loss value for a given sender/receiver pair is calculated, the ray tracer builds up an optimized internal representation of every object, the relations between them, visibility zones and possible diffraction edges. Depending on the sender's location, this takes between 4:30 and 22:00 minutes. Afterwards, the calculation of one additional receiver for the same sender and our current settings takes approximately between 2 and 20 seconds, depending, for instance, on the number of rays ‘arriving’ at the receiver and the total number of interactions (reflections, diffractions, . . . ) to be calculated. The experienced total runtime per sender for a reduced number of receiving points (≈ 2400) is in the range of 80 to 510 minutes; the calculations for different senders are independent of each other, but those for a fixed sender and varying receiver points are not. The projected total runtime with the mentioned resolution of 2 meters and the further optimization of only calculating reasonable receiver points is on the order of 100,000 hours. The necessary runtime for the calculation of all receivers of one sender could be reduced even further with the assistance of GPU-accelerated ray tracing.
2.3 Preliminary Results

In Fig. 2 the cross indicates the position of the corresponding sender. The colors indicate the calculated path loss at the given positions; the color coding is given directly in the figure. The obtained results first of all indicate that appropriate communication models have to be used for evaluating the impact of ITS systems in urban scenarios, since there is a clear deviation between LOS and NLOS propagation. The creation of a lookup table of the Karlsruhe Oststadt not only allows us to use the data as input for detailed communication modeling in this scenario, but also gives us the opportunity to compare the results against other, much faster and more abstract, existing communication model approaches in this concrete scenario.
Fig. 2 Path loss plot of the Karlsruhe Oststadt
3 Physical Layer Simulation

In the previous section, we described the approach we use to calculate the radio propagation conditions within a given, concrete scenario. However, apart from knowing how a transmitted signal will be altered by the channel (depending on the specific locations of the sending and the receiving vehicle, their relative velocity, as well as the surrounding and reflecting objects), it is also important to determine whether the receiver can decode the received signal successfully or not. Traditional network simulators normally use the received signal strength or the signal-to-interference noise ratio (SINR) to make this decision, i.e. they either compare the received signal strength against a modulation-dependent threshold or use the SINR to derive a bit-error and/or packet-error rate. However, to properly account for multi-path fading or frequency-selective fading due to the Doppler effect, a more accurate modeling of the physical layer is required. In [4] we discussed the drawbacks of such traditional modeling approaches and proposed to integrate a physical layer simulator for the orthogonal frequency-division multiplexing (OFDM) based IEEE 802.11 communication technology into the network simulator NS-3. Instead of abstracting the packet as an indivisible unit, our proposal treats a packet as a sequence of complex time samples, through which the individual bits of the packet are encoded. While [4] provides a full description of our work, in this paper we focus on an overview of our proposal and on the computationally expensive simulation aspects.
Fig. 3 Architecture of the proposed physical layer simulation within NS-3
Figure 3 illustrates the interaction between the lower layers of the protocol stack and highlights where signal processing is performed. Whenever a packet transmission request is triggered by the Medium Access Control (MAC) layer (cf. SendPacket), the bits of the packet are first scrambled to prevent long sequences of 0s or 1s. Afterwards, a convolutional encoder adds redundancy to enable error correction at the receiver, followed by the processing of the block interleaver, which ensures that long runs of low-reliability bits are avoided in the final OFDM symbols. In addition, the block interleaver divides the bits into equally sized blocks, which later end up in the OFDM symbols to be transmitted. Then the OFDM modulator modulates the bits according to the configured modulation scheme, inserts pilot symbols in four of the 52 sub-carriers to support channel tracking in the receiver, and performs the final OFDM modulation per block. The signal processing steps that are executed during frame construction, as described above, are also illustrated in Fig. 4. Once the frame has been constructed, the sequence of complex time samples is passed down to the wireless channel module, which allows chaining several propagation loss models such that the output of one model serves as the input for the next. Hence, the path loss values for specific sender and receiver positions within a certain scenario (which we calculated a priori using ray tracing) can easily be integrated through an additional model that retrieves the values from a database. After the signals of the frame have been altered by the channel models, and delayed in time according to the propagation delay, the signals are handed over to the physical layer of the receiving node. There, the following steps are executed: first, the signals are given to the interference manager, which keeps track of all overlapping transmissions. Afterwards, signal detection and channel estimation are performed, i.e. the receiver tries to detect the beginning of the frame and to estimate how channel effects have altered the signal. If the frame beginning is detected successfully, the reverse of the frame construction process is executed, i.e. OFDM demodulation, bit demodulation, deinterleaving, error correction and descrambling.
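The following Python sketch mirrors the first stages of this transmit chain in heavily simplified form: scrambling, rate-1/2 convolutional encoding, and BPSK OFDM modulation. Interleaving, pilot insertion and the guard interval are omitted, and the scrambler seed and the toy sub-carrier mapping are illustrative choices, not the exact IEEE 802.11 parameters used in the simulator.

    import numpy as np

    def scramble(bits, state=0x5D):
        # 7-bit LFSR scrambler (x^7 + x^4 + 1) breaking up runs of 0s/1s
        out = []
        for b in bits:
            fb = ((state >> 3) ^ (state >> 6)) & 1
            state = ((state << 1) | fb) & 0x7F
            out.append(b ^ fb)
        return out

    def conv_encode(bits):
        # rate-1/2 convolutional code, standard generators 133/171 (octal)
        state, out = 0, []
        for b in bits:
            state = ((state << 1) | b) & 0x7F
            out += [bin(state & 0o133).count("1") & 1,
                    bin(state & 0o171).count("1") & 1]
        return out

    def ofdm_symbol_bpsk(coded_bits):
        # map 48 coded bits to BPSK and inverse-FFT them into complex
        # time samples; pilots and guard interval omitted for brevity
        spectrum = np.zeros(64, dtype=complex)
        spectrum[1:49] = 2 * np.asarray(coded_bits[:48]) - 1
        return np.fft.ifft(spectrum)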
Fig. 4 The signal processing steps that are performed during frame construction
3.1 Validation of the Physical Layer Implementation

In order to validate our physical layer simulator integration, we compared the communication performance of our physical layer against commercial off-the-shelf chipsets from Atheros. Since it is difficult to perform repeatable experiments with real hardware, we used the wireless network emulator testbed of Carnegie Mellon University (CMU) in Pittsburgh. This testbed interconnects the antenna in- and outputs of Atheros AR5212 wireless LAN cards through a digital signal processor. This setup allowed us to control the wireless channel effects in the same way as we can in our developed simulator. For the validation, we measured and compared the probability of successful frame reception w.r.t. SNR when one single node is transmitting frames to a receiver in the absence of any interfering node. To increase the significance of our validation, we compared our physical layer against the real chipset over a wide range of parameters, cf. the configuration parameters in Table 1. Based on these parameters, we first studied the receiver performance in a scenario with non-fading radio propagation conditions, i.e. with a static path loss only, followed by the consideration of a Rayleigh fading channel with different fading intensities due to different relative node speeds. Since the runtime of one single simulation run is three to four orders of magnitude longer than that of a traditional packet-level simulation, we made extensive use of the HP XC4000. By being able to run independent parameter configurations in parallel, we were able to decrease the overall time of our validation significantly. All our experiments were repeated 10 times, each run with 1000 frame transmissions. Nevertheless, due to space restrictions, we illustrate only a subset of our validation results in this paper.
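A small sketch of the statistic reported below, i.e. the frame reception ratio per run averaged over the repeated runs; the function and variable names are our own, not taken from the simulator:

    import statistics

    def frame_reception_ratio(received_per_run, frames_per_run=1000):
        # ratio of successfully received frames per run, plus mean and
        # standard deviation over the repetitions (10 in the experiments)
        ratios = [n / frames_per_run for n in received_per_run]
        return statistics.mean(ratios), statistics.stdev(ratios)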
Table 1 Configuration of the wireless channel and the physical layer for the conducted simulations as well as the CMU network emulator experiments

Parameter                Range of values
Radio propagation model  Static pathloss, Rayleigh fading
Relative node speeds     10, 30, 50 m/s
Channel bandwidth        20 MHz
Channel frequency        2.4 GHz
Frame size               100, 500 and 1000 bytes
Data rates               6, 9, 12, 24, 36, 48, and 54 Mbps
Fig. 5 Frame reception ratio w.r.t. SNR of the new physical layer in NS-3 compared to the results obtained with real chipsets. At lower data rates, the difference is at most 1 dB. At high data rates the simulator is significantly better. The configured channel reflects only a static pathloss, no Rayleigh fading was applied. Packet size was set to 500 bytes
Figure 5 shows the observed frame reception ratio w.r.t. SNR for four of the possible data rates in the case when no signal fading is present. As can be seen, all curves share a similar slope, and the curves of the physical layer simulation in NS-3 and those of the CMU wireless testbed show a reasonable match, being only 1 dB apart from each other on average. Only for the highest configured data rate did we observe a significant difference, which, according to our discussions with CMU, is due to inaccuracies in the analog-to-digital conversion frontends of the network emulator testbed. The frame reception ratios of the conducted simulations and testbed experiments with a Rayleigh fading channel are illustrated in Fig. 6. For the results shown in this figure the relative speed was set to 10 m/s. Similarly to the non-fading scenario, the slopes of the observed performance curves are again very similar, but this time the offset varies between 3–5 dB throughout all evaluated data rates, and the performance of the Atheros AR5112 chipset is significantly worse than the one observed in our simulator (note the use of two different x-axes in this figure). We believe that this discrepancy can be attributed to the channel estimation algorithms used in the AR5112 chipset, which are very likely different from our own. We further believe that more contemporary chipsets would lead to better results.
Fig. 6 Frame reception ratio w.r.t. SNR of the new physical layer in NS-3 compared to the results obtained with real chipsets. At lower data rates, the difference is at most 3.5 dB. At high data rates the simulator is significantly better. The configured channel reflects a static pathloss plus flat Rayleigh fading with classical Jake’s Doppler spectrum and relative speed of 10 m/s. Packet size was set to 500 bytes
It should also be noted, however, that our intention is not to reflect the performance of a particular chipset; instead, we are interested in observing whether our current implementation is comparable to available chipsets and therefore realistic. Based on the obtained results, we feel justified in making such a claim and consider our implementation a valid foundation for future improvements and extensions.
4 Conclusion

During the first two years with access to the high performance cluster HP XC4000, we primarily performed sensitivity studies to evaluate the performance and robustness of existing network protocols over a wide range of protocol configurations and scenario conditions. Now, in our third year, we slightly shifted our focus and primarily used the HP XC4000 to increase the accuracy of the simulation framework itself and to model the characteristics of a specific scenario: on the one hand, we employed the method of ray tracing to generate a so-called radio map for the eastern part of the city of Karlsruhe; on the other hand, we improved the physical layer modeling approach used in today's network simulators. Regarding the second aspect, we used the HP XC4000 to extensively validate and compare the communication performance provided by our implementation against the performance provided by commercial hardware. Since both methods, ray tracing and the simulation of signal processing, are computationally very expensive, even when simulated on a cluster such as the HP XC4000, we recently evaluated the potential of general purpose graphics processing units (GPGPUs) to further increase the runtime performance of our simulations.
Since the algorithms employed in the physical layer simulator are suitable for parallel execution on multi-core and even many-core processors, we started to port them to GPGPUs using the OpenCL API. According to our runtime performance evaluation, speedup factors of up to 100 can be observed on an AMD Phenom II X6 1035T CPU with six cores combined with an ATI Radeon HD 5870 graphics card with 1600 cores. We can therefore confirm the high potential offered by GPUs and are in favor of GPU-featured or GPU-supported high performance clusters in the future.

Acknowledgments. This work was supported by the Klaus Tschira Stiftung, INIT GmbH and PTV AG through the Research Group on Traffic Telematics as well as by the Steinbuch Centre for Computing (SCC), which is part of the Karlsruhe Institute of Technology (KIT). We further want to thank the Karlsruhe Institute of Technology for the IKO Startup-Budget 2010 and its support of the research project titled Analyse von Protokollen zur Fahrzeugkommunikation unter realistischen Bedingungen.
References

1. H. Friis. A Note on a Simple Transmission Formula. Proceedings of the IRE, 34(5):254–256, May 1946.
2. E. Giordano, R. Frank, G. Pau, and M. Gerla. CORNER: A Realistic Urban Propagation Model for VANET. In Wireless On-demand Network Systems and Services (WONS), 2010 Seventh International Conference on, pages 57–60, Feb. 2010.
3. P. Kyösti, J. Meinilä, L. Hentilä, X. Zhao, T. Jämsä, C. Schneider, M. Narandzić, M. Milojević, A. Hong, J. Ylitalo, V.-M. Holappa, M. Alatossava, R. Bultitude, Y. de Jong, and T. Rautiainen. WINNER II Channel Models. Technical report, EC FP6, Sept. 2007.
4. J. Mittag, S. Papanastasiou, H. Hartenstein, and E. G. Ström. Enabling Accurate Cross-Layer PHY/MAC/NET Simulation Studies of Vehicular Communication Networks. Proceedings of the IEEE, 99(7):1311–1326, July 2011.
5. The NS-3 Network Simulator. http://www.nsnam.org/.
6. QualNet Network Simulator. http://www.scalable-networks.com/.
7. L. Reichardt, J. Maurer, T. Fügen, and T. Zwick. Virtual Drive: A Complete V2X Communication and Radar System Simulator for Optimization of Multiple Antenna Systems. In Proceedings of the IEEE, pages 1–16, May 2011.
8. Q. Sun, S. Tan, and K. Teh. Analytical Formulae for Path Loss Prediction in Urban Street Grid Microcellular Environments. Vehicular Technology, IEEE Transactions on, 54(4):1251–1258, July 2005.
9. V. Taliwal, D. Jiang, H. Mangold, C. Chen, and R. Sengupta. Empirical Determination of Channel Characteristics for DSRC Vehicle-to-Vehicle Communication. In Proceedings of the 1st ACM International Workshop on Vehicular ad hoc Networks, VANET ’04, page 88, New York, NY, USA, 2004. ACM.
Reduction of Numerical Sensitivities in Crash Simulations on HPC-Computers (HPC-10) Oliver Mangold, Raphael Prohl, Anton Tkachuk, and Vladimir Trickov
Abstract For practical application in engineering, numerical simulations are required to be reliable and reproducible. Unfortunately, crash simulations are highly complex and nonlinear, and small changes in the initial state can produce big changes in the results. This is caused partially by physical instabilities and partially by numerical instabilities. The aim of the project is to identify the numerical sensitivities in crash simulations and to suggest methods to reduce the scatter of the results. CAE simulations allow us to recognize and evaluate the characteristics of a vehicle at an early stage, without having to build a real prototype. Already at the design engineering stage, long before the first prototypes can be tested, reliable knowledge about the vehicle's characteristics is needed. Advances in applied mechanics, numerical methods and computer technology today permit the simulation of complex phenomena of automotive engineering. In the field of passive safety, such simulations are used to analyze and optimize the structural behavior of vehicles. For the realization of these simulations, general-purpose programs like ABAQUS, LS-DYNA, RADIOSS and PAM-CRASH are used. Crash simulations are highly sensitive numerical experiments; small changes in the input can lead to large changes in the deformation behavior of the car.

Oliver Mangold, High Performance Computing Center Stuttgart, Universität Stuttgart, Nobelstraße 19, 70569 Stuttgart, Germany, e-mail:
[email protected]
Raphael Prohl, Steinbeis Center of Innovation Simulation in Technology, Obere Steinbeisstr. 43/2, 75248 Ölbronn-Dürrn, Germany, e-mail:
[email protected]
Anton Tkachuk, Institute of Structural Mechanics, Universität Stuttgart, Pfaffenwaldring 7, 70550 Stuttgart, Germany, e-mail:
[email protected]
Vladimir Trickov, Automotive Simulation Center Stuttgart e. V., Universität Stuttgart, Nobelstraße 15, 70569 Stuttgart, Germany, e-mail:
[email protected]
W.E. Nagel et al. (eds.), High Performance Computing in Science and Engineering ’11, DOI 10.1007/978-3-642-23869-7 46, © Springer-Verlag Berlin Heidelberg 2012
On the other hand, physical instabilities, due for example to bifurcation behavior of the material under asymmetric loading, can appear and have to be captured and correctly resolved by the numerical procedures in order to obtain reliable results. Achieving this accuracy and reliability is our main goal in this project. The project builds on knowledge from industry and on the expertise of the participating scientists. We started with the examination of the stability of full-vehicle models taken from industry; then we reproduced their main features in reduced models in order to investigate the described instabilities. A study of the literature showed that dynamical processes differ in their range of predictability and can be classified accordingly. The range of predictability gives a limit time after which no prediction is possible (the notion comes from weather forecasting). One distinguishes
• TYPE 1: characterized by an infinite range of predictability (e.g. harmonic oscillations, linear dynamics)
• TYPE 2: the range of predictability is finite, but can be increased indefinitely by decreasing the size of the initial error (by decreasing the uncertainty in the model as well as the numerical errors, the range may be increased)
• TYPE 3: the range of predictability is finite and intrinsically limited (e.g. chaotic systems; note that a stochastic model is not deterministic and hence, by definition, not a dynamical model; the material properties and initial conditions may be randomly distributed, but the crash process itself is deterministic and obeys the PDEs of motion)
The systems of ODEs that arise in crash simulation are reported to be of type 2 or 3 (Marczyk [1]). This requires special attention and treatment; e.g., the numerical discretization (errors, numerical damping and artificial mass scaling) should not change these properties. The stability analysis of a dynamical process differs from that of a state of equilibrium. A positive definite stiffness matrix is a necessary and sufficient condition for a stable state of equilibrium; however, this does not hold for a dynamical process. If a critical point is repelling, then initially close trajectories diverge in time. They need not diverge with respect to every degree of freedom; two trajectories can attract in one direction but repel in another (Fig. 1). The Lyapunov exponent is defined as a measure of the maximal exponential divergence rate from a given solution. Analysis of the Lyapunov exponent gives insight into the stability of a dynamical system as well as into its type; a positive Lyapunov exponent indicates the chaotic nature of a process. It is very complicated to compute Lyapunov exponents even for small dynamical systems, and there exists a vast literature on the topic (23,500 related articles on scholar.google.com); the fundamental work is by Wolf et al. [3]. An estimate of the Lyapunov exponent for crash made by Marczyk [1] gives a positive value; a numerical sketch of such an estimate is given below. The natural reason for this is the large number of critical points (bifurcations) in the crash process. Therefore we can speak of deterministic chaos in crash simulation. Correspondingly, one obtains, in simulations as well as in experiments, substantially different results for almost identical input values.
Fig. 1 Divergence of neighboring trajectories, Abraham [2]
Fig. 2 Basic approach of sensitivity studies
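To illustrate the divergence-rate estimate mentioned above, here is a minimal Python sketch of the renormalization procedure of Wolf et al. [3], applied to a one-dimensional map; the crash problem itself is of course a high-dimensional ODE system, so this is purely didactic:

    import math

    def largest_lyapunov(step, x0, d0=1e-9, n=100000):
        # follow a reference trajectory and a neighbor displaced by d0;
        # after every step, accumulate log(d1/d0) and renormalize the
        # neighbor back to distance d0 from the reference trajectory
        x, y, acc = x0, x0 + d0, 0.0
        for _ in range(n):
            x, y = step(x), step(y)
            d1 = abs(y - x) or d0
            acc += math.log(d1 / d0)
            y = x + (d0 if y >= x else -d0)
        return acc / n

    # the chaotic logistic map x -> 4x(1-x) has exponent ln 2 = 0.693...
    print(largest_lyapunov(lambda x: 4.0 * x * (1.0 - x), 0.4))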
Thus the goal of a simulation is to capture the physically present scatter and its characteristics (e.g. by means of statistics: mean value and standard deviation) rather than the result of an individual computation. This requires a differentiation between physical and numerical scatter, which is not a trivial task. To investigate the causes of this scatter and to evaluate the efficiency of methods that decrease the numerical scatter, parametric studies on reduced models have been carried out by the ASCS e.V. and the IBB. The objective of the studies is to use these findings to develop strategies to avoid the appearance of artificial scatter in the calculated results. For the parametric studies, the commercial software packages Radioss and LS-Dyna were used. Figure 2 shows the basic approach of these sensitivity studies. A number of calculations showed that in particular the time discretization and the formulation of the contact conditions have a major influence on the scattering of the solution. Other effective factors are the element formulation and the addition of a rotational damping which damps only the rotational degrees of freedom. Figures 3 and 4 exemplify how much simulation results may depend on the time step size and the contact stiffness.
Fig. 3 Influence of time step
Fig. 4 Influence of contact stiffness
The stability condition of the explicit time integration method used in Radioss and LS-Dyna requires a time step below the critical time step, which is limited by the largest natural frequency of the system. In practice, generally a conservative estimate of the critical time step is made, since for large finite element systems a natural frequency calculation is computationally expensive. To ensure stability on the one hand and to reach short computation times on the other, a time step is used which lies below, but as close as possible to, the critical time step (Fig. 3, left). The studies have now shown that time steps which are clearly below this “stable” time step can reduce the scatter of the response enormously (here the contact force between impacting plate and tube), as shown in Fig. 3 (right). Furthermore, investigations of the reproducibility of the simulation results depending on the method of parallel execution and the number of processes were carried out in collaboration with the High Performance Computing Center of the University of Stuttgart (HLRS). These include the observation that, with symmetric multiprocessing (SMP) in certain software packages, the simulation results differ significantly between repeated identical runs (Fig. 5), and that with distributed memory multiprocessing the results differ significantly with the number of parallel processes (Fig. 6). Because of the large number of factors and parameters which influence the stability of the simulation, computationally intensive large parameter studies are required to find effective ways to reduce the scatter. These studies are planned to be run on the Nehalem cluster of the HLRS. The scalability of single simulation runs is very limited, although the finite element computations can be parallelized effectively.
Fig. 5 Identical runs of same input deck with 8 SMP processes in red and blue. The results differ considerably. HLRS further worked on the tracking of the origin in space and time of diverging simulation results. The results suggest a strong connection between instability and contact as a rapid escalation of the differences originating in the time and place of the contact could be observed
Fig. 6 Difference of nodal coordinates of a reduced model between results for 1 MPI process and 8 MPI processes. The difference is shown by color for 3 different time steps of the same simulation. It can be clearly observed that the divergence begins at the contact layer
The reason lies mostly in the contact algorithms, which require a tighter coupling of the computation processes than can be achieved with a large number of distributed memory nodes. The number of MPI processes was typically between 8 and 32, with job runtimes from several minutes to several hours. However, the parameter studies can be trivially parallelized over the sets of independent simulation runs. To achieve this, an interaction of the cluster scheduling system with parameter study frameworks like LS-Opt or HyperStudy is planned to be realized. For effective operation, some degree of fault tolerance and the merging of multiple simulation runs into one job should be implemented in this interaction layer. The Steinbeis Center of Innovation Simulation in Technology (SIT) works on the development of numerical methods. To this end it uses its own simulation tool UG, which is to be used on the Nehalem cluster at the HLRS to investigate the stability of the highly nonlinear problems of crash simulations. SIT has developed numerical methods which are able to predict physical instabilities, cf. [4]. In a first step, SIT formulated an appropriate mathematical model of a crash test. To this end, large elastic and plastic strains as well as non-linear material behavior were considered. This led to a non-linear system of partial differential equations coupled with an evolution law for plastic distortions. The main focus was on geometrical and kinematical non-linearities, non-linear material description, elasto-plastic material description, contact simulations and their impact on the instabilities. The simulation setup is planned to start in the middle of the year 2011.
References

1. Marczyk, J. Automotive crash: Is optimization possible? USACM: Sixth US National Congress on Computational Mechanics Abstracts, p. 68, 2001.
2. Abraham, R., Shaw, C.D. Dynamics: The Geometry of Behavior. Addison-Wesley, Reading, 1992, 642 p.
3. Wolf, A. et al. Determining Lyapunov exponents from a time series. Physica D: Nonlinear Phenomena, Volume 16, Issue 3, 1985, pp. 285–317.
4. Wieners, C., Lang, S., Wittum, G. The application of adaptive parallel multigrid methods to problems in nonlinear solid mechanics. Adaptive Methods in Solid Mechanics, J. Wiley, New York, 2002.
Three-Dimensional Gyrotron Simulation Using a High-Order Particle-in-Cell Method A. Stock, J. Neudorfer, B. Steinbusch, T. Stindl, R. Schneider, S. Roller, C.-D. Munz, and M. Auweter-Kurtz
Abstract A three-dimensional, highly parallelized code for plasma simulation based on the Particle-In-Cell (PIC) approach using a discontinuous Galerkin method has been developed and validated within the instationary magneto-plasma dynamic (IMPD) thruster project1. With this code, it is for the first time possible to simulate the highly challenging launcher and resonator of a gyrotron, i.e. a high-energy microwave source used for fusion-plasma heating, without using any physical approximations. We present the results of the gyrotron simulations with special focus on the parallelization capabilities of our code. For the gyrotron launcher, computations with up to 2048 processes have been performed. Parallel scaling of the PIC code with at most 1024 processes for simulating the gyrotron resonator is investigated in detail.
A. Stock · J. Neudorfer · C.-D. Munz, Institut für Aerodynamik und Gasdynamik, Universität Stuttgart, Stuttgart, Germany, e-mail:
[email protected]
B. Steinbusch · S. Roller, Applied Supercomputing in Engineering, RWTH Aachen University, German Research School for Simulation Sciences GmbH, 52062 Aachen, Germany, e-mail:
[email protected]
T. Stindl, Institut für Raumfahrtsysteme, Abt. Raumtransporttechnologie, Universität Stuttgart, Stuttgart, Germany, e-mail:
[email protected]
R. Schneider, Karlsruher Institut für Technologie, Institut für Hochleistungsimpuls- und Mikrowellentechnik, Universität Karlsruhe, PF 3640, 76021 Karlsruhe, Germany, e-mail:
[email protected]
M. Auweter-Kurtz, German Aerospace Academy ASA, 71034 Böblingen, Germany, e-mail: [email protected]
1 Associated with the DFG project “Numerical Modeling and Simulation of Highly Rarefied Plasma Flows”
W.E. Nagel et al. (eds.), High Performance Computing in Science and Engineering ’11, DOI 10.1007/978-3-642-23869-7 47, © Springer-Verlag Berlin Heidelberg 2012
1 Introduction

For the numerical simulation of highly rarefied plasma flows, a fully kinetic modeling of the Boltzmann equation completed by the Maxwell equations is necessary. For that purpose, a hybrid code is currently under development, relying on the Particle-In-Cell (PIC) method as well as on stochastic and Monte Carlo techniques for particle collisions [2]. In the present report, we neglect collisional effects and focus our attention on the Maxwell-Vlasov equations, allowing the self-consistent investigation of collective plasma phenomena. The numerical methods to tackle this nonlinear problem in six-dimensional phase space are briefly reviewed in Sect. 2. In Sect. 3 we present numerical results obtained from highly challenging gyrotron launcher and resonator simulations. In particular, the launcher computation represents the first such simulation conducted without any physical approximations. Finally, a short outlook on further activities is given in Sect. 4.
2 Numerical Framework

A powerful method to treat the non-linear Maxwell-Vlasov problem in six-dimensional phase space numerically is the PIC approach [1, 5], which has a history of more than five decades. The peculiarities of the PIC method are the ingenious particle-mesh techniques for the coupling of an Eulerian grid-based model for the Maxwell equations with a Lagrangian approach for the Vlasov equation. To give an overview of the numerical methods applied within the PIC framework, a single PIC cycle, schematically depicted in Fig. 1, is discussed.
Fig. 1 Standard PIC concept
The rarefied non-neutral plasma inside a device is represented by a sample of charged simulation particles of, in general, different species. In each time step, the electromagnetic fields are obtained by the numerical solution of the Maxwell equations. In the context of the present PIC solver, a discontinuous Galerkin (DG) method is applied to these equations, where, in addition, a hyperbolic divergence cleaning technique [10] is considered. In particular, the use of a powerful mixed nodal and modal DG approach [3, 4] allows a fast and high-order space discretization. Afterwards, the electromagnetic fields are interpolated to the actual spatial positions of the simulation particles [11]. These charged particles experience a force and thus an acceleration due to the electromagnetic fields. According to the Lorentz force, the charges are advanced and the new phase space coordinates are determined by numerically solving the usual laws of Newtonian dynamics. For this purpose we can choose between an explicit time-centered leapfrog technique [1] and an explicit low-storage fourth-order Runge-Kutta approach (LSERK4) [8]. While the leapfrog method only has second-order convergence, the LSERK4 scheme has fourth-order convergence, but requires five times the computational effort with respect to the particle solver. To close the chain of self-consistent interaction, the simulation particles have to be located with respect to the computational grid in order to assign the contribution of each charged particle to the changed charge and current density [11], which are the sources for the Maxwell equations in the subsequent time step.
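The report does not spell out the particle update itself; a classical realization of such a time-centered leapfrog step for the Lorentz force is the Boris rotation (described, e.g., in [1]), sketched here for a single non-relativistic particle in given fields E and B:

    import numpy as np

    def boris_push(x, v, E, B, q, m, dt):
        # half electric kick, magnetic rotation, second half kick,
        # then the leapfrog position update
        qmdt2 = q * dt / (2.0 * m)
        v_minus = v + qmdt2 * E
        t = qmdt2 * B
        s = 2.0 * t / (1.0 + np.dot(t, t))
        v_prime = v_minus + np.cross(v_minus, t)
        v_new = v_minus + np.cross(v_prime, s) + qmdt2 * E
        return x + v_new * dt, v_new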
3 Gyrotron Simulation

Figure 2 shows a schematic drawing of a gyrotron, which roughly consists of the electron gun, the compression zone, the resonator, and the launcher.
Fig. 2 Schematic representation of a gyrotron
Electrons are emitted from the electron gun and forced into a continuous beam along the field lines of an externally applied magnetic field. The gyration of the electrons around the magnetic field lines interacts with the electromagnetic fields created by the particles themselves. This interaction takes place in the resonator, where the so-called electron cyclotron maser instability produces electromagnetic waves in the microwave spectrum. This microwave radiation then propagates through the launcher, in which it is shaped to assume an approximately Gaussian form. The resulting Gaussian beam is emitted through a diamond window at the end of the launcher, to be used for various purposes in fusion applications. Of special interest in the context of the present paper are the particle-field interaction in the resonator and the formation of the Gaussian beam in the launcher. While modeling the electron cyclotron maser instability in the resonator requires the inclusion of both the electron beam and the field, the shaping of the Gaussian beam in the launcher can be modeled by the propagation of the electromagnetic waves alone. Owing to this simpler modeling, the simulation of the launcher is described first, in Sect. 3.1. Section 3.2 then presents aspects and results of PIC simulations of the resonator.
3.1 Gyrotron Launcher

The presented simulation models the wave propagation inside the launcher shown in Fig. 3, which is part of a 170 GHz TE34,19 coaxial cavity gyrotron. For the given frequency f = 170 GHz and the speed of light c, the wavelength of the propagated waves is λ = c/f = 1.76 mm. In order to resolve the tiny displacements of the inner surface of the cylindrical computational domain, with radius r = 3.25 · 10⁻² m and length z = 0.31 m, with five points per wavelength, a total of roughly 23 million degrees of freedom (DOF) are required. A resolution of ten points per wavelength yields a total of about 188 million DOF.
Fig. 3 The launcher’s wall
Fig. 4 Smooth wall wave guide: analytic solution (Bz )
Since the required resolution was not evident from the outset, and no experience from comparable simulations was available, the launcher simulation was preceded by a series of test simulations. In these tests, a wave of the same mode as in the real launcher was sent through a cylindrical wave guide of the same dimensions as the launcher itself. Unlike the real launcher, the inner surface of this wave guide is smooth, i.e. without corrugations. The smooth wall of such a wave guide allows the incoming mode to traverse the domain without modification. This unperturbed field propagation allows the comparison of simulation results with an analytical solution. In the test simulations of the smooth wall wave guide, the resolution was varied in order to characterize the quality of the solution as a function of the number of DOF. Figure 4 shows the exact solution, which is used both as the irradiating boundary condition for the incoming waves and as the reference solution. Figure 5 shows numerical solutions of simulations with different numbers of DOF. Here, the computational grid—discretizing the domain by about 2.4 million hexahedral elements—was the same in all simulations, but the number of DOF was increased through the use of polynomial basis functions of different order. Evidently, the fourth-order simulation is not able to propagate the waves through the whole smooth surface wave guide. Even though the fifth-order computation shows a much better wave transport throughout the domain, the solution exhibits unphysical interference, especially in the center of the domain. The situation is much better in the case of the sixth-order simulation. Here, the waves are properly propagated through the whole cylindrical wave guide and no interference is apparent in the center of the domain. Even though small deviations from the reference solution are still evident, all wave modes can clearly be distinguished.
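The quoted resolution figures can be cross-checked with simple back-of-the-envelope arithmetic. The sketch below assumes cubic DOF scaling with the number of points per wavelength, and a modal polynomial basis of degree p with (p+1)(p+2)(p+3)/6 DOF per element and field component, reading "sixth order" as p = 5; these are assumptions for illustration, not the solver's exact bookkeeping.

```python
c = 2.998e8                      # speed of light [m/s]
f = 170e9                        # gyrotron frequency [Hz]
lam = c / f
print(f"wavelength: {lam * 1e3:.2f} mm")               # ~1.76 mm

# Doubling the points per wavelength in 3D multiplies the DOF by 2^3:
dof_5ppw = 23e6
print(f"10 ppw estimate: {dof_5ppw * 2**3:.3g} DOF")   # ~1.8e8, cf. 188 million

# Assumed modal basis dimension per element and field component:
p, n_elems = 5, 2.4e6
dof_per_elem = (p + 1) * (p + 2) * (p + 3) // 6        # 56 for p = 5
print(f"total: {n_elems * dof_per_elem:.3g} DOF")      # ~1.34e8, cf. 136 million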
Fig. 5 Smooth wall wave guide: Numerical solutions (Bz , x-z-slice)
Thus, the launcher was simulated on the same grid with sixth-order basis functions. For the launcher simulation, the cylindrical mesh with the originally smooth outer boundary was modified by scaling all grid points in the radial direction (a sketch of such a scaling is given below). The scaling was applied according to the tiny corrugations, on the order of 1 mm, that are superimposed on the smooth cylindrical shape of the wave guide to yield the surface of the launcher. Consequently, the launcher simulation was conducted with the same number of DOF (136 million) as the sixth-order computation of the smooth wall wave guide. The aspect of special interest in the result of a launcher simulation is the formation and structure of the emitted wave mode at the exit. These features can be visualized by evaluating the electromagnetic fields on the surface of the launcher.
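As a minimal sketch of the mesh modification just described, the following function scales the nodes of a smooth cylindrical mesh radially so that the wall r = r_wall lands on a perturbed surface r_wall + g(z, θ). The corrugation profile g below is a purely hypothetical placeholder; the real launcher corrugations follow the mode converter design.

```python
import numpy as np

def corrugate(points, r_wall, g):
    """Radially scale mesh nodes so the smooth wall r = r_wall is mapped
    onto the corrugated wall r = r_wall + g(z, theta).
    points: (n, 3) array of Cartesian node coordinates (x, y, z)."""
    x, y, z = points.T
    theta = np.arctan2(y, x)
    s = 1.0 + g(z, theta) / r_wall     # per-node radial scale factor
    return np.column_stack((x * s, y * s, z))

# Hypothetical example profile: azimuthal corrugations of ~1 mm depth.
g = lambda z, th: 0.5e-3 * np.cos(60 * th)
nodes = np.random.rand(1000, 3) * [0.03, 0.03, 0.31]   # dummy node cloud
corrugated = corrugate(nodes, r_wall=3.25e-2, g=g)
```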
Fig. 6 Simulation results: Magnetic field in z-direction on the (unrolled) surface of the launcher wall
Figure 6 shows the magnetic field on the unrolled launcher surface. The waves are irradiated from the left and are shaped into the emitted mode at z ≈ 0.28 m, u ≈ 0.075 m (second pattern from the bottom). The simulation of the electromagnetic wave propagation in the 170 GHz TE34,19 coaxial cavity gyrotron launcher was conducted on 512 compute nodes of an InfiniBand cluster using 2048 MPI processes. Each node featured two Intel® Xeon X5570 quad-core CPUs and 24 GB of memory. Since memory was the limiting factor for the computation, not all of the available 4096 CPU cores could be used. The computation of the 2.5 ns that the waves take to propagate through the launcher required an elapsed time of about 20 hours.
3.2 Gyrotron Resonator

A gyrotron resonator exciting a TE0,3 mode wave was simulated with a three-dimensional, fully coupled PIC scheme using the DG-Maxwell solver and the particle pusher, which solves the Lorentz equation. Figure 7 shows the geometry of the TE0,3 resonator. The setup has a non-homogeneous particle distribution, with an electron hollow beam in the center. Full geometrical and physical specifications of the electron beam can be found in [6], where the TE0,3 resonator was investigated in detail. For the gyrotron resonator simulation, a computational grid with ∼14,300 tetrahedra and ∼3 · 10⁶ DOF for a fourth-order DG method was used.
Fig. 7 TE0,3 resonator geometry with electron hollow beam in the center
Fig. 8 TE0,3 resonator results showing a slice of the Ey -field distribution after 15 ns simulation time
The hollow electron beam was modeled with 10⁶ pseudo electrons with a macro-particle factor of 10⁶ (one macro particle imitates the presence of a certain number of real particles; this number is the macro-particle factor). A typical result of a resonator simulation is shown in Fig. 8a, where the Ey-field distribution in the TE0,3 cavity is recorded at time t = 15 ns. A patch pattern of the electromagnetic field is clearly visible. Each patch developing in the middle part of the figure marks an extremum of the Ey-field. A wavelength consists of two patches, resulting in three wavelengths in the x-direction, which form the expected TE0,3 mode.
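As a quick sanity check of these particle numbers, a minimal sketch (the physical constants are textbook values, not taken from the report):

```python
# With a macro-particle factor (MPF) of 1e6, the 1e6 pseudo electrons
# represent 1e12 real electrons; each macro particle carries MPF times
# the electron charge and mass, leaving the charge-to-mass ratio, and
# hence the trajectory, unchanged.
e, m_e = 1.602e-19, 9.109e-31    # electron charge [C] and mass [kg]
n_pseudo, mpf = 1e6, 1e6
print(f"real electrons represented: {n_pseudo * mpf:.1e}")
print(f"charge per macro particle:  {mpf * e:.3e} C")
```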
For the sake of illustration, the hollow electron beam is plotted together with the Ey-field distribution in Fig. 8b. We note that the results depicted in these figures show excellent agreement with those achieved by a two-dimensional, rotationally symmetric, finite-difference-based PIC code presented in [6]. Since the graph partitioning algorithm of the domain decomposition does not take the number of particles in each element into account, load imbalances occur for such particle distributions. While most of the domain is not affected by the particles, the fully consistent PIC loop is executed in these regions as well. This limits the parallel scalability of the problem. In order to quantify the parallelization capabilities of the PIC code, scaling tests were conducted. All scaling tests were performed on the “Nehalem” cluster, where each compute node features two quad-core Intel® Xeon X5560 CPUs and 12 GB of memory, with an InfiniBand node-to-node interconnect. Additionally, the two different time stepping methods for the particles, i.e. the leapfrog technique and the fourth-order LSERK4, were compared. Usually the leapfrog scheme is preferred over the LSERK4 scheme due to computational time savings. For the presented test case, the impact of the LSERK4 scheme on the computational time under the non-homogeneous particle setup was investigated. A comparison of both schemes is reasonable since the LSERK4 scheme is always the better choice with respect to accuracy and consistency with other parts of the PIC code, e.g. the time stepper for the DG method, for which an LSERK4 scheme is used as well. For the strong scaling experiments, the same computational setup as described above was used. The parallel efficiency (PE) for the strong scaling is

PE_strong(n_procs) = Δt_serial / (n_procs · Δt(n_procs)),   (1)
where Δt_serial is the computational time required to run the problem with the serial code, n_procs is the number of processes, and Δt(n_procs) is the time needed to compute the problem with n_procs processes in parallel. Figure 9 shows the strong scaling PE. For 2 and 4 processes, the PE is above one due to cache effects. From 8 processes on, the plot shows a nearly linear decay in the PE for both time stepping methods; the leapfrog method has a slightly better PE than the LSERK4 method. Figure 10 shows the strong scaling speed-up (SU); the black dotted line marks the ideal SU. The largest SU is achieved with 64 processes for both time stepping methods. While the leapfrog method reaches a maximum SU of ∼29, the LSERK4 scheme only reaches a maximum SU of ∼20. Considering that the PE at 64 processes is ∼0.45 for the leapfrog method and ∼0.29 for the LSERK4 method, the strong scaling is not very good, which can be explained by the load imbalances caused by the particles. For increasing numbers of processes, the number of processes without particles increases. These processes idle while other processes work on the particle routines. The idle times increase for the LSERK4 method, since its computational demand with respect to the particles is larger than that of the leapfrog method.
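Eq. (1) and the speed-up are straightforward to evaluate. The sketch below uses hypothetical timings chosen to reproduce the reported leapfrog values (SU ≈ 29, PE ≈ 0.45 on 64 processes), since the raw run times are not tabulated in this report.

```python
def pe_strong(t_serial, t_parallel, n_procs):
    """Eq. (1): strong-scaling parallel efficiency."""
    return t_serial / (n_procs * t_parallel)

def speedup(t_serial, t_parallel):
    """Strong-scaling speed-up."""
    return t_serial / t_parallel

t_serial = 3600.0                # assumed serial run time [s]
t_64 = 124.0                     # assumed run time on 64 processes [s]
print(f"SU = {speedup(t_serial, t_64):.1f}, "
      f"PE = {pe_strong(t_serial, t_64, 64):.2f}")   # SU ~29, PE ~0.45
```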
Fig. 9 Parallel efficiency (PE) for strong scaling
Fig. 10 Speed up (SU) of strong scaling
Fig. 11 Slow-down rate of the LSERK4 method for the strong scaling test
To quantify the differences between the two time stepping methods, Fig. 11 shows the slow-down rate of the LSERK4 method compared to the leapfrog method, i.e. Δt_LSERK4/Δt_leapfrog − 1 (e.g., for a slow-down rate of 2 the LSERK4 method requires twice the computational time of the leapfrog method). For fewer than 16 processes, the LSERK4 method is not significantly slower than the leapfrog method. Due to idle times growing with the number of processes, the slow-down increases drastically beyond 16 processes. This matches the observations for the PE and the SU. For the weak scaling, the problem size and the number of processes have to grow by the same factor, i.e. f_WS = n_procs,2/n_procs,1 with n_procs,1 < n_procs,2. Usually the problem size depends on the number of degrees of freedom n_DOF. In this case, the number of particles has to be considered as well, since it also affects the computational time; thus the number of particles is likewise increased by f_WS. For different n_DOF, the grid is built with different element sizes, i.e. the main parameter is the edge length of the tetrahedra. Since the grid is produced by a grid generator, it is sometimes impossible to match f_WS exactly for n_DOF.
Table 1 Computational parameters for the weak scaling test with a fourth-order DG-PIC simulation

n_procs,2   nElems    nDOF      cDOF    citer   Edge length   n_parts   MPF
8           20661     413220    1.0000  1.0000  0.003000      1E6       1E6
16          41641     832820    1.0077  1.5044  0.002400      2E6       0.5E6
32          90664     1813280   1.0970  1.9102  0.001900      4E6       0.25E6
64          168932    3378640   1.0220  2.2397  0.001450      8E6       0.125E6
128         327269    6545380   0.9899  3.0750  0.001225      16E6      0.0625E6
256         683707    13674140  1.0341  3.7750  0.000950      32E6      0.03125E6
512         1351672   27033440  1.0222  4.9323  0.000750      32E6      0.03125E6
1024        2778757   55575140  1.0507  7.2161  0.000580      32E6      0.03125E6
Fig. 12 Parallel efficiency (PE) for weak scaling
Fig. 13 Slow-down rate of the LSERK4 method for weak scaling
The resulting discrepancy is accounted for by the constant

c_DOF(n_procs,2) = (n_DOF,2 · n_procs,1) / (n_DOF,1 · n_procs,2).

Since the timestep decreases for an increasing number of elements (due to the CFL condition), we additionally normalize the PE by the number of iterations to obtain the PE for a unit timestep. This is accounted for by the constant c_iter(n_procs,2) = n_iter,2/n_iter,1. Both constants enter the weak scaling PE formula, i.e.

PE_weak(n_procs,2) = (Δt(n_procs,1) / Δt(n_procs,2)) · c_DOF(n_procs,2) · c_iter(n_procs,2).   (2)
In our results, n_procs,1 = 8 throughout, and n_procs,2 is the running variable for the number of processes. Table 1 lists the setup parameters for each run. For more than 256 processes, the number of particles could not be increased further due to stack size problems; it is assumed that this limitation has only a minor effect on the weak scaling. Figure 12 shows the weak scaling PE. The PE exhibits qualitatively the same monotonically decreasing behavior for both methods, but the PE of the LSERK4 method is ∼0.1 smaller, which again can be explained by the larger idle times of this method. This can also be observed in the slow-down rate of the LSERK4 method shown in Fig. 13.
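The correction constants of Eq. (2) can be recomputed directly from Table 1, as the following sketch shows; the run times Δt are hypothetical placeholders, since they are not tabulated here.

```python
n_procs_1, n_dof_1 = 8, 413220   # reference run (first row of Table 1)

def c_dof(n_dof_2, n_procs_2):
    """Correction for the mismatch between intended and realized DOF growth."""
    return (n_dof_2 / n_dof_1) * (n_procs_1 / n_procs_2)

def pe_weak(dt_1, dt_2, cdof, citer):
    """Eq. (2): weak-scaling parallel efficiency, normalized to a unit timestep."""
    return (dt_1 / dt_2) * cdof * citer

# Example with the 64-process row of Table 1:
print(f"c_DOF(64) = {c_dof(3378640, 64):.4f}")                  # 1.0220, as tabulated
print(f"PE_weak   = {pe_weak(100.0, 260.0, 1.0220, 2.2397):.2f}")  # hypothetical dt values
```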
The slow-down rate increases for larger numbers of processes, but it is not as large as for the strong scaling.
4 Outlook

The scaling tests provide important guidance for future simulations of more complex resonators, such as the TE34,19 mode resonator [7, 9]. An adequately resolved grid for the TE34,19 resonator requires large process numbers and good scaling capability of the code. The results obtained indicate the necessity of improving the scalability of the current code; in particular, the idle-time problem of the particle pusher has to be reduced sufficiently to achieve a better parallel efficiency. An important question answered by the scaling tests is the choice of the time stepping method for the particles. The weak scaling test revealed that, for large process numbers, the LSERK4 method takes only about twice the computational time of the leapfrog method. Thus the LSERK4 method seems to be the preferable choice for future resonator simulations with respect to both consistency within the numerical framework of the PIC method and accuracy.

Acknowledgments. We gratefully acknowledge the Deutsche Forschungsgemeinschaft (DFG) for funding within the project “Numerical Modeling and Simulation of Highly Rarefied Plasma Flows”. T. Stindl wishes to thank the Landesgraduiertenförderung Baden-Württemberg and the Erich-Becker-Stiftung, Germany, for their financial support. Computational resources have been provided by the Bundes-Höchstleistungsrechenzentrum Stuttgart (HLRS).
References

1. C. K. Birdsall and A. B. Langdon. Plasma Physics via Computer Simulation. Adam Hilger, Bristol, Philadelphia, New York, 1991.
2. M. Fertig, D. Petkow, T. Stindl, M. Quandt, C.-D. Munz, J. Neudorfer, S. Roller, D. D’Andrea, and R. Schneider. Hybrid code development for the numerical simulation of instationary magnetoplasmadynamic thrusters. In High Performance Computing in Science and Engineering ’08, pp. 585–597. Springer, Berlin, Heidelberg, 2009.
3. G. Gassner, F. Lörcher, C.-D. Munz, and J. S. Hesthaven. Polymorphic nodal elements and their application in discontinuous Galerkin methods. J. Comput. Phys., 228(5):1573–1590, 2009. doi:10.1016/j.jcp.2008.11.012.
4. J. S. Hesthaven and T. Warburton. Nodal Discontinuous Galerkin Methods. Springer, New York, 2008.
5. R. Hockney and J. Eastwood. Computer Simulation Using Particles. McGraw-Hill, New York, 1981.
6. S. Illy. Untersuchungen von Strahlinstabilitäten in der Kompressionszone von Gyrotron-Oszillatoren mit Hilfe der kinetischen Theorie und zeitabhängiger Particle-in-Cell-Simulationen. PhD thesis, Universität Karlsruhe und Forschungszentrum Karlsruhe (FZKA 6037), December 1997.
7. J. Jin, M. Thumm, B. Piosczyk, S. Kern, J. Flamm, and T. Rzesnicki. Novel numerical method for the analysis and synthesis of the fields in highly oversized waveguide mode converters. IEEE Transactions on Microwave Theory and Techniques, 57(7):1661, 2009.
8. C. A. Kennedy, M. H. Carpenter, and R. M. Lewis. Low-storage, explicit Runge-Kutta schemes for the compressible Navier-Stokes equations. Applied Numerical Mathematics, 35:177–219, 2000.
9. S. Kern. Numerische Simulation der Gyrotron-Wechselwirkung in koaxialen Resonatoren. PhD thesis, Forschungszentrum Karlsruhe GmbH, FZKA, 1996.
10. C.-D. Munz, P. Omnes, R. Schneider, E. Sonnendrücker, and U. Voß. Divergence correction techniques for Maxwell solvers based on a hyperbolic model. J. Comput. Phys., 161:484–511, 2000.
11. T. Stindl, J. Neudorfer, A. Stock, M. Auweter-Kurtz, C.-D. Munz, S. Roller, and R. Schneider. Comparison of coupling techniques in a high-order discontinuous Galerkin-based particle-in-cell solver. J. Phys. D: Appl. Phys., 44:194004, 2011.