101
Notes on Numerical Fluid Mechanics and Multidisciplinary Design (NNFM)
Editors W. Schröder/Aachen K. Fujii/Kanagawa W. Haase/München E.H. Hirschel/München B. van Leer/Ann Arbor M.A. Leschziner/London M. Pandolfi/Torino J. Periaux/Paris A. Rizzi/Stockholm B. Roux/Marseille Y. Shokin/Novosibirsk
Computational Science and High Performance Computing III The 3rd Russian-German Advanced Research Workshop, Novosibirsk, Russia, 23–27 July 2007 Egon Krause Yurii I. Shokin Michael Resch Nina Shokina (Editors)
Prof. Egon Krause Aerodynamisches Institut RWTH Aachen Wuellnerstr. zw. 5 u. 7 52062 Aachen Germany Prof. Yurii I. Shokin Academician of Russian Academy of Sciences Institute of Computational Technologies of SB RAS Ac. Lavrentyev Ave. 6 630090 Novosibirsk Russia
ISBN 978-3-540-69008-5
Prof. Michael Resch High Performance Computing Center Stuttgart University of Stuttgart Nobelstrasse 19 70569 Stuttgart Germany Dr. Nina Shokina High Performance Computing Center Stuttgart University of Stuttgart Nobelstrasse 19 70569 Stuttgart Germany
e-ISBN 978-3-540-69010-8
DOI 10.1007/978-3-540-69010-8 Notes on Numerical Fluid Mechanics and Multidisciplinary Design
ISSN 1612-2909
Library of Congress Control Number: 2008928447
© 2008 Springer-Verlag Berlin Heidelberg
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable for prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. Typeset & Cover Design: Scientific Publishing Services Pvt. Ltd., Chennai, India. Printed on acid-free paper. springer.com
NNFM Editor Addresses
Prof. Dr. Wolfgang Schröder (General Editor) RWTH Aachen Lehrstuhl für Strömungslehre und Aerodynamisches Institut Wüllnerstr. zw. 5 u. 7 52062 Aachen Germany E-mail: offi
[email protected] Prof. Dr. Kozo Fujii Space Transportation Research Division The Institute of Space and Astronautical Science 3-1-1, Yoshinodai, Sagamihara Kanagawa, 229-8510 Japan E-mail: fujii@flab.eng.isas.jaxa.jp Dr. Werner Haase Höhenkirchener Str. 19d D-85662 Hohenbrunn Germany E-mail: offi
[email protected] Prof. Dr. Ernst Heinrich Hirschel Herzog-Heinrich-Weg 6 D-85604 Zorneding Germany E-mail:
[email protected] Prof. Dr. Bram van Leer Department of Aerospace Engineering The University of Michigan Ann Arbor, MI 48109-2140 USA E-mail:
[email protected] Prof. Dr. Michael A. Leschziner Imperial College of Science Technology and Medicine Aeronautics Department Prince Consort Road London SW7 2BY U.K. E-mail:
[email protected]
Prof. Dr. Maurizio Pandolfi Politecnico di Torino Dipartimento di Ingegneria Aeronautica e Spaziale Corso Duca degli Abruzzi, 24 I-10129 Torino Italy E-mail: pandolfi@polito.it Prof. Dr. Jacques Periaux 38, Boulevard de Reuilly F-75012 Paris France E-mail:
[email protected] Prof. Dr. Arthur Rizzi Department of Aeronautics KTH Royal Institute of Technology Teknikringen 8 S-10044 Stockholm Sweden E-mail:
[email protected] Dr. Bernard Roux L3M – IMT La Jetée Technopole de Chateau-Gombert F-13451 Marseille Cedex 20 France E-mail:
[email protected] Prof. Dr. Yurii I. Shokin Siberian Branch of the Russian Academy of Sciences Institute of Computational Technologies Ac. Lavrentyeva Ave. 6 630090 Novosibirsk Russia E-mail:
[email protected]
Preface
This volume is published as the proceedings of the Third Russian-German Advanced Research Workshop on Computational Science and High Performance Computing, held in Novosibirsk, Russia, in July 2007. The contributions to these proceedings were provided and edited by the authors and were chosen after careful selection and reviewing. The workshop was organized by the High Performance Computing Center Stuttgart (Stuttgart, Germany) and the Institute of Computational Technologies SB RAS (Novosibirsk, Russia) in the framework of the activities of the German-Russian Center for Computational Technologies and High Performance Computing. The event is held every two years and has already become a good tradition for German and Russian scientists. The first workshop took place in September 2003 in Novosibirsk and the second workshop was hosted by Stuttgart in March 2005. Both workshops gave the possibility of sharing and discussing the latest results and of developing further scientific contacts in the field of computational science and high performance computing. The topics of the current workshop include software and hardware for high performance computation, numerical modelling in geophysics and computational fluid dynamics, mathematical modelling of tsunami waves, simulation of fuel cells and modern fibre optics devices, numerical modelling in cryptography problems and aeroacoustics, interval analysis, tools for Grid applications, research on service-oriented architectures (SOA), and telemedicine technologies. The participation of representatives of major research organizations, engaged in the solution of the most complex problems of mathematical modelling, the development of new algorithms, programs and key elements of information technologies, and the elaboration and implementation of software and hardware for high performance computing systems, ensured a high level of competence at the workshop.
Among the German participants were the heads and leading specialists of the High Performance Computing Center Stuttgart (HLRS) (University of Stuttgart), NEC High Performance Computing Europe GmbH, the Section of Applied Mathematics (University of Freiburg i. Br.), the Institute of Aerodynamics (RWTH Aachen), the Regional Computing Center Erlangen (RRZE, University of Erlangen-Nuremberg), and the Center for High Performance Computing (ZHR) (Dresden University of Technology). Among the Russian participants were researchers of the institutes of the Siberian Branch of the Russian Academy of Sciences (SB RAS): the Institute of Computational Technologies SB RAS (Novosibirsk), the Institute of Computational Modelling SB RAS (Krasnoyarsk), the Lavrentyev Institute of Hydrodynamics SB RAS (Novosibirsk), the Institute of Computational Mathematics and Mathematical Geophysics SB RAS (Novosibirsk), and the Institute for System Dynamics and Control Theory SB RAS (Irkutsk). Scientists from the following universities also participated: the Siberian State University of Telecommunications and Computer Science (Novosibirsk), Altai State University (Barnaul), and Novosibirsk State Medical University (Novosibirsk). This volume provides state-of-the-art scientific papers presenting the latest results of the leading German and Russian institutions. We are very glad to see the living tradition of cooperation and the successful continuation of these highly professional international scientific meetings. The editors would like to express their gratitude to all the participants of the workshop and wish them further successful and fruitful work.
Novosibirsk-Stuttgart, December 2007
Egon Krause Yurii Shokin Michael Resch Nina Shokina
Table of Contents
Computing Facility of the Institute of Computational Technologies SB RAS
Yu.I. Shokin, M.P. Fedoruk, D.L. Chubarov, A.V. Yurchenko . . . . . . 1

HPC in Industrial Environments
M.M. Resch, U. Küster . . . . . . 8

Parallel Realization of Mathematical Modelling of Electromagnetic Logging Processes Using VIKIZ Probe Complex
V.N. Eryomin, S. Haberhauer, O.V. Nechaev, N. Shokina, E.P. Shurina . . . . . . 14

Numerical Solution of Some Direct and Inverse Mathematical Problems for Tidal Flows
V.I. Agoshkov, L.P. Kamenschikov, E.D. Karepova, V.V. Shaidurov . . . . . . 31

Hardware Development and Impact on Numerical Algorithms
U. Küster, M.M. Resch . . . . . . 44

Mathematical Modeling in Application to Regional Tsunami Warning Systems Operations
Yu.I. Shokin, V.V. Babailov, S.A. Beisel, L.B. Chubarov, S.V. Eletsky, Z.I. Fedotova, V.K. Gusiakov . . . . . . 52

Parallel and Adaptive Simulation of Fuel Cells in 3d
R. Klöfkorn, D. Kröner, M. Ohlberger . . . . . . 69

Numerical Modeling of Some Free Turbulent Flows
G.G. Chernykh, A.G. Demenkov, A.V. Fomina, B.B. Ilyushin, V.A. Kostomakha, N.P. Moshkin, O.F. Voropayeva . . . . . . 82

Mathematical and Numerical Modelling of Fluid Flow in Elastic Tubes
E. Bänsch, O. Goncharova, A. Koop, D. Kröner . . . . . . 102

Parallel Numerical Modeling of Modern Fibre Optics Devices
L.Yu. Prokopyeva, Yu.I. Shokin, A.S. Lebedev, O.V. Shtyrina, M.P. Fedoruk . . . . . . 122

Zonal Large-Eddy Simulations and Aeroacoustics of High-Lift Airfoil Configurations
M. Meinke, D. König, Q. Zhang, W. Schröder . . . . . . 136

Experimental Statistical Attacks on Block and Stream Ciphers
S. Doroshenko, A. Fionov, A. Lubkin, V. Monarev, B. Ryabko, Yu.I. Shokin . . . . . . 155

On Performance and Accuracy of Lattice Boltzmann Approaches for Single Phase Flow in Porous Media: A Toy Became an Accepted Tool—How to Maintain Its Features Despite More and More Complex (Physical) Models and Changing Trends in High Performance Computing!?
T. Zeiser, J. Götz, M. Stürmer . . . . . . 165

Parameter Partition Methods for Optimal Numerical Solution of Interval Linear Systems
S.P. Shary . . . . . . 184

Comparative Analysis of the SPH and ISPH Methods
K.E. Afanasiev, R.S. Makarchuk, A.Yu. Popov . . . . . . 206

SEGL: A Problem Solving Environment for the Design and Execution of Complex Scientific Grid Applications
N. Currle-Linde, M.M. Resch, U. Küster . . . . . . 224

A Service-Oriented Architecture for Some Problems of Municipal Management (Example of the City of Irkutsk Municipal Administration)
I.V. Bychkov . . . . . . 238

Basic Tendencies of the Telemedicine Technologies Development in Siberian Region
A.V. Efremov, A.V. Karpov . . . . . . 249

Author Index . . . . . . 255
List of Contributors
K.E. Afanasiev Kemerovo State University Krasnaya str. 6 Kemerovo, 650043, Russia
[email protected] V.I. Agoshkov Institute of Numerical Mathematics RAS Gubkina st. 8 Moscow, 119991, Russia
[email protected] V.V. Babailov Institute of Computational Technologies SB RAS Lavrentiev Ave. 6 Novosibirsk, 630090, Russia
[email protected] E. Bänsch Institute of Applied Mathematics III, University of Erlangen-Nuremberg Haberstr. 2 Erlangen, 91058, Germany
[email protected] S.A. Beisel Institute of Computational Technologies SB RAS Lavrentiev Ave. 6 Novosibirsk, 630090, Russia beisel
[email protected] I.V. Bychkov Institute for System Dynamics and Control Theory SB RAS Lermontov str. 134 Irkutsk, 664033, Russia
[email protected]
G.G. Chernykh Institute of Computational Technologies SB RAS Lavrentiev Ave. 6 Novosibirsk, 630090, Russia
[email protected] D.L. Chubarov Institute of Computational Technologies SB RAS Lavrentiev Ave. 6 Novosibirsk, 630090, Russia
[email protected] L.B. Chubarov Institute of Computational Technologies SB RAS Lavrentiev Ave. 6 Novosibirsk, 630090, Russia
[email protected] N. Currle-Linde High Performance Computing Center Stuttgart (HLRS), University of Stuttgart Nobelstraße 19 Stuttgart, 70569, Germany
[email protected] A.G. Demenkov S.S. Kutateladze Institute of Thermophysics SB RAS Lavrentiev Ave. 1 Novosibirsk, 630090, Russia
[email protected] S.A. Doroshenko Siberian State University of Telecommunications and Computer Science Kirova str. 86
Novosibirsk, 630102, Russia
[email protected] S.V. Eletsky Institute of Computational Technologies SB RAS Lavrentiev Ave. 6 Novosibirsk, 630090, Russia
[email protected] A.V. Efremov Novosibirsk State Medical University Krasny Ave. 52 Novosibirsk, 630099, Russia
[email protected]
A.V. Fomina Kuzbass State Academy of Pedagogy Pionerskii Ave. 13 Novokuznetsk, 654066, Russia
[email protected] O.N. Goncharova Altai State University pr. Lenina 61 Barnaul, 656049 Russia M.A. Lavrentiev Institute of Hydrodynamics SB RAS Lavrentiev Ave. 15 Novosibirsk, 630090, Russia
[email protected]
V.N. Eryomin Scientific Production Enterprise of Geophysical Equipment ”Looch” Geologicheskaya Str. 49 Novosibirsk, 630010, Russia
[email protected]
J. Götz Chair for System Simulation, University of Erlangen-Nuremberg, Cauerstraße 6, Erlangen, 91058, Germany
M.P. Fedoruk Institute of Computational Technologies SB RAS Lavrentiev Ave. 6 Novosibirsk, 630090, Russia
[email protected]
V.K. Gusyakov Institute of Computational Mathematics and Mathematical Geophysics SB RAS Lavrentiev Ave. 6 Novosibirsk, 630090, Russia
[email protected]
Z.I. Fedotova Institute of Computational Technologies SB RAS Lavrentiev Ave. 6 Novosibirsk, 630090, Russia
[email protected] A.N. Fionov Siberian State University of Telecommunications and Computer Science Kirova str. 86 Novosibirsk, 630102, Russia
[email protected]
[email protected]
S. Haberhauer NEC - High Performance Computing Europe GmbH Nobelstraße 19 Stuttgart, 70569, Germany
[email protected] B.B. Ilyushin S.S. Kutateladze Institute of Thermophysics SB RAS Lavrentiev Ave. 1 Novosibirsk, 630090, Russia
[email protected]
L.P. Kamenshchikov Institute of Computational Modelling SB RAS Academgorodok, Krasnoyarsk, 660036, Russia
[email protected] E.D. Karepova Institute of Computational Modelling SB RAS Academgorodok, Krasnoyarsk, 660036, Russia
[email protected] A.V. Karpov Novosibirsk State Medical University Krasny Ave. 52 Novosibirsk, 630099, Russia Institute of Computational Technologies SB RAS Lavrentiev Ave. 6 Novosibirsk, 630090, Russia
[email protected] R. Klöfkorn Section of Applied Mathematics, University of Freiburg i. Br. Hermann-Herder-Straße 10 Freiburg i. Br., 79104, Germany robertk@mathematik.uni-freiburg.de A. Koop Sternenberg 19 Wuppertal, 42279, Germany
[email protected] D. König Institute of Aerodynamics RWTH Aachen Wuellnerstraße zw. 5 u.7 Aachen, 52062, Germany
[email protected]
V.A. Kostomakha M.A. Lavrentiev Institute of Hydrodynamics SB RAS Lavrentiev Ave. 15 Novosibirsk, 630090, Russia
[email protected] D. Kröner Section of Applied Mathematics, University of Freiburg i. Br. Hermann-Herder-Straße 10 Freiburg i. Br., 79104, Germany
[email protected]
U. Küster High Performance Computing Center Stuttgart (HLRS), University of Stuttgart Nobelstraße 19 Stuttgart, 70569, Germany
[email protected] A.S. Lebedev Institute of Computational Technologies SB RAS Lavrentiev Ave. 6 Novosibirsk, 630090, Russia
[email protected] A.M. Lubkin Siberian State University of Telecommunications and Computer Science Kirova str. 86 Novosibirsk, 630102, Russia
[email protected] R.S. Makarchuk Kemerovo State University Krasnaya str. 6 Kemerovo, 650043, Russia
[email protected] M. Meinke Institute of Aerodynamics RWTH Aachen
Wuellnerstraße zw. 5 u.7 Aachen, 52062, Germany
[email protected] V.A. Monarev Institute of Computational Technologies SB RAS Lavrentiev Ave. 6 Novosibirsk, 630090, Russia
[email protected] N.P. Moshkin Institute of Computational Technologies SB RAS Lavrentiev Ave. 6 Novosibirsk, 630090, Russia
[email protected] O.V. Nechaev Trofimuk Institute of Petroleum Geology and Geophysics SB RAS Koptyug Ave. 3, Novosibirsk, 630090, Russia
[email protected] M. Ohlberger Institute for Numerical and Applied Mathematics, University of Münster Einsteinstraße 62 Münster, 48149, Germany
[email protected] A.Yu. Popov Kemerovo State University Krasnaya str. 6 Kemerovo, 650043, Russia a
[email protected] L.Yu. Prokopyeva Institute of Computational Technologies SB RAS Lavrentiev Ave. 6 Novosibirsk, 630090, Russia
[email protected]
M. Resch High Performance Computing Center Stuttgart (HLRS), University of Stuttgart Nobelstraße 19 Stuttgart, 70569, Germany
[email protected] B.Ya. Ryabko Institute of Computational Technologies SB RAS Lavrentiev Ave. 6 Novosibirsk, 630090, Russia
[email protected] W. Schröder Institute of Aerodynamics RWTH Aachen Wuellnerstraße zw. 5 u.7 Aachen, 52062, Germany
[email protected] V.V. Shaidurov Institute of Computational Modelling SB RAS Academgorodok Krasnoyarsk, 660036, Russia
[email protected] S.P. Shary Institute of Computational Technologies SB RAS Lavrentiev Ave. 6 Novosibirsk, 630090, Russia
[email protected] Yu.I. Shokin Institute of Computational Technologies SB RAS Lavrentiev Ave. 6 Novosibirsk, 630090, Russia
[email protected] N.Yu. Shokina High Performance Computing Center Stuttgart (HLRS), University of Stuttgart
Nobelstraße 19 Stuttgart, 70569, Germany
[email protected] M. Stürmer Chair for System Simulation, University of Erlangen-Nuremberg, Cauerstraße 6, Erlangen, 91058, Germany markus.stuermer@informatik.uni-erlangen.de O.V. Shtyrina Institute of Computational Technologies SB RAS Lavrentiev Ave. 6 Novosibirsk, 630090, Russia
[email protected] E.P. Shurina Institute of Computational Technologies SB RAS Lavrentiev Ave. 6 Novosibirsk, 630090, Russia Novosibirsk State Technical University K. Marx Ave. 20 Novosibirsk, 630092, Russia
[email protected]
O.F. Voropayeva Institute of Computational Technologies SB RAS Lavrentiev Ave. 6 Novosibirsk, 630090, Russia
[email protected] A.V. Yurchenko Institute of Computational Technologies SB RAS Lavrentiev Ave. 6 Novosibirsk, 630090, Russia
[email protected] T. Zeiser Regional Computing Center Erlangen, University of Erlangen-Nuremberg, Martensstraße 1, Erlangen, 91058, Germany
[email protected]
Q. Zhang Institute of Aerodynamics RWTH Aachen Wuellnerstraße zw. 5 u.7 Aachen, 52062, Germany
[email protected]
Computing Facility of the Institute of Computational Technologies SB RAS Yu.I. Shokin, M.P. Fedoruk, D.L. Chubarov, and A.V. Yurchenko Institute of Computational Technologies SB RAS, Lavrentiev Ave. 6, 630090 Novosibirsk, Russia {shokin,mife,dchubarov,yurchenko}@ict.nsc.ru
1
Introduction
The performance gap between supercomputers and personal workstations is growing. Yet supercomputing resources are scarce and carefully managed. Gaining access to a leading national supercomputing centre requires a considerable amount of work, comparable to the amount of work needed to get a paper published in a leading scientific journal. This raises the importance of centralised computing resources at the level of a single organization. Several principles guide the development of high performance computing on the scale of an academic organization.

Ease of access. Every user within the organization should get a level of access that is as close as possible to their needs. This is always a compromise between the needs of different users.

Training. Since most of the users within the organization are not high performance computing professionals, they need information on the existing technologies and the latest trends in the development of high performance computing worldwide.

Development facilities. A specific feature of an academic organization is that a large proportion of the codes running on its high performance computing systems is developed within the organization.

Applications. The development of computing systems should follow the demands coming from applications.

In the rest of this paper we show how these principles were implemented within the integrated data processing environment of the Institute of Computational Technologies of the Siberian Branch of the Russian Academy of Sciences (ICT SB RAS).
The work of Yu.I. Shokin is supported by grants from the Russian Foundation for Basic Research RFBR 06-07-03023, RFBR 06-07-01820. The work of M.P. Fedoruk, D.L. Chubarov, A.V. Yurchenko is supported by Federal Agency for Education within the framework of Research Programme 1.15.07.
E. Krause et al. (Eds.): Comp. Science & High Perf. Computing III, NNFM 101, pp. 1–7, 2008. c Springer-Verlag Berlin Heidelberg 2008 springerlink.com
2
Computers
The Institute supports an ongoing effort to help the researchers within the institute to use high performance computing for the solution of demanding problems arising in their practice. In particular, the Institute is strengthening its collaboration with high performance computing centers within Russia and abroad. This not only provides the necessary computing resources but also facilitates the exchange of knowledge and experience on parallel programming and on the efficient use of high performance computers for solving the demanding problems of today. Projects such as HPC-Europa (http://www.hpc-europa.org) significantly lower the access barriers to supercomputing for newcomers. At the moment there is no analogue of HPC-Europa in Russia. In the future, Grid computing holds the promise of unifying access policies across different supercomputing centers. In the meanwhile the Institute collaborates with the Joint Supercomputing Center of the Russian Academy of Sciences (JSC), the Siberian Supercomputing Center (SSCC), the High Performance Computing Center Stuttgart (HLRS), and with university supercomputing centers such as the computing center of South Ural State University. National and international supercomputing centers provide resources sufficient to satisfy the needs of most users. A high degree of centralization of computing equipment helps to reduce maintenance and service costs. On the other hand, gaining access to supercomputing centers is not always easy and sometimes involves a significant time delay. There are several reasons for fostering collaboration with specialised computing centers while at the same time maintaining and developing computing resources within the Institute. First, there are cases when immediate interactive access to the computing resources is particularly important. Such cases include the development of new codes, debugging and performance analysis.

Second, communication between the users and the maintainers of the computers provides for more flexible usage policies and faster dissemination of knowledge and experience. The development of the computing facility at ICT SB RAS is governed by the need to provide the researchers with an opportunity to develop new applications capable of solving large scale problems, and to debug and improve the performance of their codes. At the same time it is assumed that at a certain point of development the major part of the computations will be performed using the resources of remote supercomputers; the computing facility therefore strives to provide computers of different architectures for the evaluation of different computing techniques. The first compute cluster to support parallel processing of computationally intensive jobs was installed at the Institute in 1999–2000. Several users within the institute had access to the system's parallel processing environment. In 2004 a new cluster of dual processor nodes was installed; the first system was disassembled shortly before that. A new compute cluster and a preprocessing server
was installed in 2007. Figure 1 shows the historical development of the peak performance of computing systems in ICT SB RAS.

Fig. 1. The development of the peak performance of computing systems in ICT SB RAS. In 2004 a new 4-node cluster ARBYTE/8 was installed. In 2007 two new systems were procured: a preprocessing server on the Tyan VX50 platform and a 7-node cluster TYMAN/28.

The compute systems have different architectures. In the rest of this section we present a brief overview of the available systems. A summary of their characteristics is presented in Table 1.

Table 1. Characteristics of the systems at ICT SB RAS

                              Xeon Linux Cluster   Preprocessing server   Opteron Linux cluster
Installation year             2004                 2007                   2007
Memory architecture           SMP cluster          ccNUMA                 ccNUMA cluster
Platform                      Intel SE7501         Tyan VX50              Tyan GT24
Number of compute nodes       4                    1                      7
Type of node interconnect     GigE                 N/A                    GigE
Total memory                  10 GB                32 GB                  28 GB
Number of CPU cores           8                    16                     28
Processor type                Intel Xeon           Opteron 880            Opteron 280
Clock frequency               3.06 GHz             2.4 GHz                2.4 GHz
FP performance, GFlops peak   48.96                76.8                   134.4
OS                            RH Linux 9.0         SuSE Linux 10.0        SuSE Linux 10.0

2.1

Xeon Linux Cluster
The LINPACK benchmark [1] measures the floating point performance of a system on the solution of a system of linear equations with a dense matrix. We use the HPL implementation of the benchmark (HPL: A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers, A. Petitet, R.C. Whaley, J. Dongarra, A. Cleary, 2004). On the Linpack test the system achieved slightly less than 32 GFlops, which gives an efficiency of 65%.
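The peak figures in Table 1, and the efficiency values quoted in this section, follow from simple arithmetic: the processors listed there retire two floating point operations per core per cycle, and Linpack efficiency is the sustained result divided by that peak. A minimal sketch (the function names are ours, not part of HPL):

```python
# Reproduce the peak-performance and efficiency figures quoted in the text
# from the data in Table 1. Assumption (consistent with the table's values):
# two floating point operations per core per clock cycle.

def peak_gflops(cores, clock_ghz, flops_per_cycle=2):
    """Theoretical peak performance in GFlops."""
    return cores * clock_ghz * flops_per_cycle

def efficiency(rmax_gflops, rpeak_gflops):
    """Linpack efficiency: sustained performance as a fraction of peak."""
    return rmax_gflops / rpeak_gflops

print(peak_gflops(8, 3.06))               # Xeon cluster: 48.96 GFlops peak
print(peak_gflops(16, 2.4))               # preprocessing server: 76.8 GFlops peak
print(round(efficiency(32.0, 48.96), 2))  # 0.65 -- the 65% quoted above
print(round(efficiency(62.0, 76.8), 2))   # 0.81 -- the ~80% quoted for the VX50
```

The same arithmetic gives 134.4 GFlops peak for the 28-core Opteron cluster, matching Table 1.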
Fig. 2. Linpack benchmark on Xeon Linux cluster. The graph on the left shows performance scaling using 4 CPUs. The graph on the right shows performance scaling on 8 CPUs. Software: HPL, Intel C compiler 9.1, Intel MKL.
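The two quantities varied in Fig. 2 are HPL's run-time parameters: N, the order of the linear system, and NB, the blocking factor. Both are set in HPL's input file HPL.dat. An abridged, purely illustrative fragment covering only these two fields (the file's other lines are omitted, and the values are chosen merely to resemble the axis ranges in the figure):

```
4                      # of problems sizes (N)
4000 8000 12000 16000  Ns
4                      # of NBs
160 180 200 220        NBs
```

HPL then runs one solve per (N, NB) combination, which is how performance surfaces like those in Fig. 2 are produced.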
Figure 2 presents the dependence of the Linpack performance on the size of the input matrix and on the size of the basic block of the algorithm.

2.2
Preprocessing Server
The preprocessing server is an interesting system. It is based on the Tyan VX50 platform, which is assembled from two quad-socket motherboards connected via HyperTransport links using a PCI-Express interface. The four processors in the middle are each connected with three other processors, while the four processors at the corners are connected with only two. The system is a ccNUMA with 8 nodes; each node has two CPU cores, making 16 CPU cores altogether. The preprocessing server can also be used for computations. We compare the performance of this system to the performance of the Xeon Linux cluster. The Linpack results are shown in Figure 3. The aggregate performance of the system is around 62 GFlops, which gives an efficiency at the level of 80%. It is important to note that these benchmark results are preliminary and may not show the full potential of the systems. There is room for improvement at the operating system level as well as at the level of the Linpack implementation.
Fig. 3. Linpack benchmark on the preprocessing server. The graph on the left shows performance scaling using 4 CPU cores. The graph on the right shows performance scaling on 8 CPU cores. Software: HPL, SunStudio 12 EA C compiler, Goto BLAS.
2.3
Opteron Linux Cluster
The Opteron Linux cluster is a cluster of nodes, each containing 4 CPU cores organized as a two-node ccNUMA system. The cluster currently uses Gigabit Ethernet as the main transport interface connecting the nodes. It would be interesting to explore the opportunities of bonding Ethernet controllers to provide higher bandwidth between the nodes.
3
Applications
In this section we briefly describe some of the applications of high performance computing developed using the computing facility of the Institute. One advantage of a centralized computing facility compared to workstations is a higher degree of reliability. The computing systems run without interruption in a climate-controlled environment with a more reliable power supply, which also extends the lifespan of individual hardware components. These reliability considerations meant that the Xeon Linux cluster was initially used as a reliable computing resource for development and production runs of sequential codes. Computing the propagation of laser pulses in high performance optical fibers with nonlinear properties requires a significant amount of compute time. An application developed by a group in the institute compares the level of transmission errors for different encodings of data [2,8]. This requires running the simulation for many different input bit patterns and can consume a significant amount of CPU time. At the same time, researchers in the Institute using three-dimensional CFD models realised that the resources of workstations are not sufficient for studying complex phenomena such as the explosive expansion of the airbag in automobiles [3]. The code was developed and parallelized in collaboration with HLRS and was run on the supercomputers available in the center. At the moment the numerical model is maintained at ICT SB RAS. There was also an interesting experience with the code for the optimization of the shape of runner blades in the turbines of hydroelectric power stations [4]. The code uses the genetic algorithm paradigm for the solution of a multicriteria optimization problem. Genetic algorithms are well suited for parallelization, since the flow for each individual in a generation can be computed independently. With the use of 8 processors of the Xeon Linux cluster the runtime of one optimization experiment was reduced from 3-5 days to 17-22 hours.
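The parallelization pattern described for the runner-blade code can be sketched in a few lines. The fitness function below is a cheap stand-in for the actual flow computation, and all names are illustrative rather than taken from the code of [4]:

```python
# Each individual in a generation is evaluated independently, so a generation
# can be farmed out to a pool of workers; this mirrors the 8-processor runs
# described above. The quadratic "fitness" is a placeholder for an expensive
# CFD evaluation of one candidate blade shape.
from multiprocessing import Pool

def fitness(individual):
    # Placeholder for the flow computation around one candidate shape.
    return sum(x * x for x in individual)

def evaluate_generation(population, workers=8):
    # One independent task per individual in the generation.
    with Pool(workers) as pool:
        return pool.map(fitness, population)

if __name__ == "__main__":
    population = [[i, i + 1.0] for i in range(16)]
    print(evaluate_generation(population)[:3])
```

The selection and recombination steps of the genetic algorithm remain sequential; only the expensive per-individual evaluation is distributed, which is why the speedup reported above is close to, but below, the processor count.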
Another application of high performance computing in the Institute is in the area of cryptography and cryptanalysis. The statistical tests developed in the Institute require significant amounts of CPU time and memory [5,6]. This type of application would not be possible without access to high performance computing resources. Since this is a well established area of research, the researchers mostly use the resources of national computing centers and supercomputers at universities. The research in photonics, and in particular the modelling of nanostructures in ultra high speed optical fibers, requires significant computational resources
and the parallelization of the algorithms. Good scalability was observed for the solution of Maxwell's equations in 2D for modelling nanostructures in optical fibers and other nanomaterials used in photonics research [7].
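As an illustration of why such explicit Maxwell solvers scale well, consider the structure of a finite-difference time-domain (FDTD) update: each grid point is updated from its immediate neighbours only, so the grid can be split among processors with only boundary exchanges per time step. The sketch below is a generic one-dimensional textbook Yee scheme in normalized units; it is not the Institute's 2D code from [7]:

```python
# 1D FDTD (Yee) update in normalized units with Courant number 1.
# Each update touches only nearest neighbours, which is what makes
# domain decomposition of such solvers efficient.
import numpy as np

def fdtd_1d(steps, n=200):
    ez = np.zeros(n)  # electric field on the grid
    hy = np.zeros(n)  # magnetic field, staggered half a cell
    for t in range(steps):
        hy[:-1] += ez[1:] - ez[:-1]                      # H from the curl of E
        ez[1:] += hy[1:] - hy[:-1]                       # E from the curl of H
        ez[n // 2] += np.exp(-((t - 30.0) / 10.0) ** 2)  # soft Gaussian source
    return ez

if __name__ == "__main__":
    field = fdtd_1d(120)
    print(f"max |Ez| after 120 steps: {np.abs(field).max():.3f}")
```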
4
Training
One of the obstacles on the way to a wider use of high performance computing systems is the lack of experience in parallel programming and, more generally, the lack of experience in the use of advanced computational technologies. Increasing the level of qualification in parallel programming and high performance computing is one of the goals of a Russian-German Summer School that is organized at the Institute with the help of the High Performance Computing Center Stuttgart (HLRS). The summer school is aimed primarily at young researchers working in the institutes of the Siberian Branch of the Russian Academy of Sciences. In collaboration with HLRS the Institute has established a Russian-German Center for Computational Technologies and High Performance Computing, which plays an important role in fostering links between researchers in Germany and Russia and in providing access to computational resources for applications of significant importance.
5 Conclusion
The purpose of this paper is to outline the reasons behind the development of a computational facility within the Institute that can serve as an intermediate level between the facilities of supercomputing centers and personal workstations. A wide range of activities is involved in making this vision a reality, including procuring the equipment, training the users and providing the necessary consulting. We presented the results of this work achieved within a two-year timeframe.
References

1. Dongarra, J.J.: Performance of various computers using standard linear equations software. Tech. report of Computer Science Department, University of Tennessee, CS-89-85 (1993)
2. Shtyrina, O.V., Turitsyn, S.K., Fedoruk, M.P.: Kvant. Elektr. 35, 169–174 (2005) (in Russian)
3. Rychkov, A.D., Shokina, N., Bönisch, T., Resch, M.M., Küster, U.: Parallel numerical modelling of gas-dynamic processes in airbag combustion chamber. In: Krause, E., Shokin, Yu.I., Resch, M., Shokina, N. (eds.) Computational Science and High Performance Computing. 2nd Russian-German Advanced Research Workshop, Stuttgart, Germany, March 14-16, 2005. Notes on Numerical Fluid Mechanics and Multidisciplinary Design (NNFM), vol. 91, pp. 29–39. Springer, Heidelberg (2005)
Computing Facility of the Institute of Computational Technologies SB RAS
7
4. Lobareva, I.F., Cherny, S.G., Chirkov, D.V., Skorospelov, V.A., Turuk, P.A.: Comput. Technologies 11, 63–76 (2006) (in Russian)
5. Ryabko, B.Y., Monarev, V.A., Shokin, Yu.I.: Probl. Inform. Transm. 41, 385–394 (2005)
6. Ryabko, B.Y., Monarev, V.A.: J. Statist. Plann. Inference 133, 95–110 (2005)
7. Prokopyeva, L.Y., Shokin, Yu.I., Lebedev, A.S., Fedoruk, M.P.: Parallel numerical modeling of modern fiber optics devices. In: Krause, E., Shokin, Yu.I., Resch, M., Shokina, N. (eds.) Computational Science & High Performance Computing III. NNFM, vol. 101. Springer, Heidelberg (2007)
8. Shapiro, E.G., Fedoruk, M.P., Turitsyn, S.K.: J. Opt. Comm. 27, 216–218 (2006)
HPC in Industrial Environments

M.M. Resch and U. Küster

High Performance Computing Center Stuttgart (HLRS), University of Stuttgart, Nobelstraße 19, 70569 Stuttgart, Germany
{resch,kuester}@hlrs.de

Abstract. HPC has undergone major changes in recent years [1]. A rapid change in hardware technology with an increase in performance has made the technology interesting for a bigger market than before. The standardization that was used in clusters for years has made HPC an option for industry. With the advent of new systems that start to deviate from standard processors the landscape changes again. In this short paper we set out to describe the main changes over the last years. We further try to come up with the most important implications for the industrial use of HPC as we see it in a special collaboration between the High Performance Computing Center Stuttgart (HLRS) and the industrial users in the region of Stuttgart.
1 Introduction
HPC has undergone major changes over the last years. One can describe the major architectural changes in HPC over the last two decades in short as:

• From vector to parallel: Vector processors have long been the fastest available, with a huge gap separating them from standard microprocessors. This gap has been closed, and even though vector processors still increase their speed, single processor performance is no longer the driving force in HPC performance. Hence the desire to design parallel architectures, which most recently has resulted in parallelism being exploited at the processor level with multi-core and many-core architectures [2].
• From customized to commodity: As commodity processors caught up in speed, it became simply a question of costs to move from specialized components to standard parts.
• From single system to clusters: As systems were put together from standard parts, every system could look different, using other parts. This came at the expense of losing the single system look and feel of traditional supercomputers.
• From standards back to specialized systems: Most recently, as standard processors have run into problems of speed and heat dissipation, new and specialized systems are being developed again, bringing architectural development for HPC experts full cycle.

At the same time we have experienced a dramatic increase both in the level of peak performance and in the size of main memory of large scale systems. As a

E. Krause et al. (Eds.): Comp. Science & High Perf. Computing III, NNFM 101, pp. 8–13, 2008. © Springer-Verlag Berlin Heidelberg 2008 springerlink.com
HPC in Industrial Environments
9
consequence we can solve larger problems, and typically we can solve them faster. It should not be ignored that the increasing theoretical speed is not matched by a similarly increasing level of sustained performance. Still, today industrial users can lay their hands on systems that are on average ten times faster than systems five years ago. This is a chance on the one hand, but it brings new questions on the other. In this paper we set out to discuss some of these questions and come up with some answers. The problems that come with massively parallel systems, and specifically with petascale systems, are beyond the scope of industrial usage and are discussed elsewhere [3,4].
2 Dual Use: Academia and Industry
The concept of integration of computational resources into seamless environments was introduced under the name "Grid" in 1999 by Ian Foster and others [5]. The basic idea with respect to high performance computing is to make all computational resources available to all potential users. Through the concept of middleware, the complexities that had inhibited the usage of these systems by non-experts were supposed to be hidden. Ease of use should lead to a better usage of, and better access to, systems. An immediate idea that follows from these concepts is to bring high performance computers out of the academic niche in which they were mainly used. There were certainly a number of large companies running HPC systems, but the Grid was supposed to allow creating a pool of resources from all fields (academic and industrial) and make them available to everyone (academia and industry) in a simple way. The idea could be described as dual use of HPC resources and became rather popular with funding agencies.

2.1 Advantages
The main advantages of such a dual use are promoted by funding agencies. The discussion currently goes two ways.

• Reduction of costs: The argument goes that when industry can make use of academic computing resources, funding costs for academia will go down. Industrial usage can be billed, and the incoming funds can be used to at least partially pay the costs of the HPC system. This reduces costs for funding agencies.
• Increased usage: Average usage of large scale systems is in the order of 80-85%. This is due to the fact that scheduling of resources cannot achieve 100% usage if resources are kept for large scale simulations. The argument goes that industry could use the spare cycles, specifically during vacation time when scientists reduce their usage of the systems.

Discussion of Advantages

These assumed advantages have to be taken with a grain of salt. Costs for HPC can potentially be reduced for academia if industry pays for usage of systems.
10
M.M. Resch and U. K¨ uster
At the same time, however, industry takes away CPU cycles from academia, increasing the competition for scarce resources. So if research agencies want to supply industry with access to HPC systems, they either have to limit access to these same resources for researchers or invest additional money to make sure there is no reduction in usage for research. The only financial argument left is a synergistic effect that would allow achieving lower prices if academia and industry merge their market power to buy larger systems together.

The improved usage of resources during vacation time quickly turns out to be a too optimistic view, as companies - at least in Europe - tend to schedule their vacation time in accordance with public school holidays. As a result, industrial users are on vacation when scientists are on vacation. Hence the industrial usage of shared resources tends to shrink at the same time as research usage shrinks. A better resource usage through anti-cyclic industrial usage turns out not to be achievable. Some argue that by reducing prices for industry during vacation time one might encourage more industrial usage when resources are available. However, here one has to compare costs: the costs for CPU time are in the range of thousands of Euro, so a price reduction could help companies to save thousands of Euro. On the other side, companies would have to adapt their working schedules to the vacation time of researchers and would have to make sure that their staff - typically with small children - stay at work during vacation time. Evidence shows that this is not happening.

Nevertheless there is a potential in dual use of HPC resources that goes beyond the high hopes of funding agencies. In 1995 the High Performance Computing Center Stuttgart set up such a cooperation in order to explore the potential of such dual use, which is described in the following section.

2.2 A Public Private Partnership Approach
The University of Stuttgart had been running HPC systems for some 15 years when, in the late 1980s, it decided to collaborate with Porsche in HPC operations. This resulted in shared investment in vector supercomputers for several years. The collaboration turned out to be fruitful, and in 1995 a public private partnership (called hww) was set up that also included Daimler Benz. The main expectations were to:

• Leverage market power: Combining the purchasing power of industry and academia helped to achieve better levels of price/performance for both sides.
• Share operational costs: Creating a group of operational experts helped to bring down the staff cost for running systems. Today a group of roughly 10 staff members is operating 7 HPC systems.
• Optimize system usage: Industrial usage typically comes in bursts, when certain stages in product development require a lot of simulation. Industry then has a need for immediate availability of resources. In academia, most simulations are part of long term research. It turned out that a good model could be found to intertwine these two different modes.
These expectations were met by a close collaboration of all partners in the first years. However, there are a number of problems that have to be addressed. Some of them can be considered startup problems; some are permanent problems that show up continuously and require a continuous effort of all partners involved.

Problems

In this paper we will not discuss the legal and organizational problems of setting up a public-private partnership in Germany, although these issues have to be resolved. From a scientific point of view the key problem is an understanding of economic processes and economic thinking. These issues include personal attitudes of partners and have to be solved on an individual basis considering the national legal framework. We will also ignore economic problems like accounting and billing, which require a good understanding of the total cost of ownership of hardware and software resources. Again, this is an issue that requires some understanding of economic processes and depends heavily on the internal handling of financial affairs of public organizations. But right from the start some technical problems presented themselves. The most pressing ones were:

• Security related issues: This includes the whole complex of trust and reliability from the point of view of industrial users. While for academic users data protection and availability of resources are of less concern, it is vital for industry that its most sensitive data be protected and that no information whatsoever leaks to other users. Furthermore, permanent availability of resources is a must in order to meet internal and external deadlines.
• Data and communication: This includes both the question of connectivity and of handling input and output data. Typically, network connectivity between academia and industry is low. Most research networks are not open for industry, and accounting mechanisms for research networks are often missing.
So, even to connect to a public institution may be difficult for industry. The amount of data to be transferred is another big issue, as with increasing problem size the size of output data can get prohibitively high. Both issues have been helped by the increasing speed of networks and a tendency of research networks to open up to commercial users. With all these issues the Grid [5] was quite a helpful tool to drive development; specifically, the problems of security were extensively addressed in a number of national and European projects. A number of permanent problems remains, and some new problems have shown up. These new problems are mainly related to operational procedures in industry. While 10 years ago industry in the region of Stuttgart was mainly using in-house codes, we have seen a dramatic shift towards the nearly exclusive usage of independent software vendor codes. This shift has put licensing issues and licensing costs at the center of the discussion. What is requested by industry is
• Ease of use: Industrial users are used to preparing their simulation jobs in a visual environment. When accessing remote HPC platforms they have to use scripts or work at the command line level to submit and manage jobs.
• Flexibility: Industrial users would like to choose resources in a flexible way, picking the best resources for a given simulation.
3 Technologies to Help Industry
As described before, the Grid claims to be able to provide seamless access to any kind of resource. So far, however, the solutions provided have a limited scope. Two new technologies that were hence developed at HLRS in order to support industry are presented here.

3.1 Access to Resources
Access to resources is a critical task. Industrial simulation has long moved from a batch type mode to a more interactive style. Although typical simulations still require several hours of compute time, users expect to be able to easily choose the right system and then manage the running job. HLRS has developed an environment (SEGL) that allows defining not only simple job execution but a whole set of jobs [6]. These jobs can be run on any chosen architecture, and they can be monitored and controlled by a non-expert. First results in an engineering environment for combustion optimization are promising [7].

3.2 Visualization
One of the key problems in the industrial usage of HPC resources is the interpretation of the resulting data. Many of these problems are similar to academic ones: the amount of data is large, and three-dimensional time-dependent simulations are a specific challenge for the human eye. For industry we see an increasing need to fully integrate visualization into the simulation process [8]. At the same time, industrial simulation always goes hand in hand with experiments. In order to make full use of both methods, the field of augmented reality has become important [9]. HLRS has developed a tool called COVISE (COllaborative VISualization Environment) that supports both online visualization and the use of augmented reality in an industrial environment [10]. In the future HLRS will integrate SEGL and COVISE to make the usage of HPC resources even easier.
4 Summary
HLRS has set up a public-private partnership, mainly with the automotive industry, to share HPC resources. Over time a number of problems have been solved, and it has turned out that both sides can benefit from such a collaboration. However, the use of public resources brings some new problems to academia which have to be dealt with. On the other hand, simplification of the usage of public resources
requires new and improved techniques to support the average - typically non-experienced - industrial user. There is still a lot of work ahead before the usage of HPC resources can become a standard procedure in industry - specifically in small and medium sized enterprises.
References

1. Resch, M., Küster, U.: Investigating the Impact of Architectural and Programming Issues on Sustained Petaflop Performance. In: Bader, D. (ed.) Petascale Computing: Algorithms and Applications, Computational Science series. Chapman & Hall/CRC Press, Taylor and Francis Group (2007)
2. Asanović, K., Bodik, R., Catanzaro, B., Gebis, J., Husbands, P., Keutzer, K., Patterson, D., Plishker, W., Shalf, J., Williams, S., Yelick, K.: The Landscape of Parallel Computing Research: A View from Berkeley. Electrical Engineering and Computer Sciences, University of California at Berkeley, Technical Report No. UCB/EECS-2006-183, December 18, 2006, http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.html
3. Lindner, P., Gabriel, E., Resch, M.: Performance prediction based resource selection in Grid environments. In: Perrott, R., Chapman, B.M., Subhlok, J., de Mello, R.F., Yang, L.T. (eds.) HPCC 2007. LNCS, vol. 4782. Springer, Heidelberg (2007)
4. Resch, M., Küster, U.: PIK Praxis der Informationsverarbeitung und Kommunikation, vol. 29, pp. 214–220 (2006)
5. Foster, I., Kesselman, C.: The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann, San Francisco (1999)
6. Currle-Linde, N., Resch, M.: GriCoL: A Language to Bridge the Gap between Applications and Grid Services. In: Volkert, J., Fahringer, T., Kranzlmueller, D., Schreiner, W. (eds.) 2nd Austrian Grid Symposium, Österreichische Computer Gesellschaft (2007)
7. Resch, M., Currle-Linde, N., Küster, U., Risio, B.: WSEAS Trans. Inf. Science Applications 3/4, 445–452 (2007)
8. Resch, M., Haas, P., Küster, U.: Computer Environments - Clusters, Supercomputers, Storage, Visualization. In: Wiedemann, J., Hucho, W.-H. (eds.) Progress in Vehicle Aerodynamics IV: Numerical Methods. Expert-Verlag (2006)
9. Kopecki, A., Resch, M.: Virtuelle und Hybride Prototypen. In: Bertsche, B., Bullinger, H.-J. (eds.) Entwicklung und Erprobung innovativer Produkte - Rapid Prototyping. Springer, Heidelberg (2007)
10. Lang, U., Peltier, J.P., Christ, P., Rill, S., Rantzau, D., Nebel, H., Wierse, A., Lang, R., Causse, S., Juaneda, F., Grave, M., Haas, P.: Fut. Gen. Comput. Syst. 11, 419–430 (1995)
Parallel Realization of Mathematical Modelling of Electromagnetic Logging Processes Using VIKIZ Probe Complex

V.N. Eryomin¹, S. Haberhauer², O.V. Nechaev³, N. Shokina⁴˒⁵, and E.P. Shurina⁵˒⁶

¹ Scientific Production Enterprise of Geophysical Equipment "Looch", Geologicheskaya Str. 49, 630010 Novosibirsk, Russia
[email protected]
² NEC - High Performance Computing Europe GmbH, Nobelstraße 19, 70569 Stuttgart, Germany
[email protected]
³ Trofimuk Institute of Petroleum Geology and Geophysics SB RAS, Koptyug Ave. 3, 630090 Novosibirsk, Russia
[email protected]
⁴ High Performance Computing Center Stuttgart (HLRS), University of Stuttgart, Nobelstraße 19, 70569 Stuttgart, Germany
[email protected]
⁵ Institute of Computational Technologies SB RAS, Lavrentiev Ave. 6, 630090 Novosibirsk, Russia
⁶ Novosibirsk State Technical University, K. Marx Ave. 20, 630092 Novosibirsk, Russia
[email protected]
1 Introduction
Electromagnetic methods are widely used for solving problems of surveying and defectoscopy in geophysics. The method of high-frequency induction logging isoparametric sounding (VIKIZ) [1] is directed toward reconstructing the spatial distribution of the resistivity of rock in which oil- and gas-bearing wells are situated. The VIKIZ method is based on measuring relative phase characteristics of electromagnetic quantities, namely, a phase difference of electromotive forces induced in receiver coils. In order to realize the required transmission distance, resolving power and parameter sensitivity, the VIKIZ equipment and its modifications, for example, the VIKPB [2] borehole tool for high-frequency electromagnetic logging sounding, designed in the Scientific Production Enterprise of Geophysical Equipment "Looch" (http://www.looch.ru/index en.php), should consist of several probes of high-frequency electromagnetic sounding of different depths.

The VIKIZ equipment consists of borehole and surface tools. The borehole tool (Fig. 1) includes a logging tool complex and an electronic measuring unit. The logging tool complex consists of five electromagnetic logging probes of different depths and the SP electrode. Every probe contains one transmitting coil and two receiver coils. A phase difference of electromotive forces induced in the exploring coils is measured. The recorded parameter is uniquely related to the resistivity of the rock surrounding the well.

E. Krause et al. (Eds.): Comp. Science & High Perf. Computing III, NNFM 101, pp. 14–30, 2008. © Springer-Verlag Berlin Heidelberg 2008 springerlink.com
Parallel Realization of Mathematical Modelling
15
Fig. 1. Scheme of the probe part of borehole tool
The electronic measuring unit provides: electromagnetic field excitation in a medium; conversion of signals from the receiver coils; measurement of the phase difference and the SP signal; transmission of digital information through a standard three-core logging cable to the surface tool or to the logging station. The main function of the surface unit is receiving and converting the information incoming from the borehole unit. The surface tool's functions can be performed by a special program in a computerized logging station.

The frequency domain in electromagnetic surveying lies in a range from a few hundred kHz to a few tens of MHz. This guarantees a rather wide range of investigation depths. The presence of materials with contrasting changes of electrical resistivity and dielectric permeability is characteristic for geophysical applications. For example, the resistivity of layers can range from 2 Ohm to 200 Ohm in oil-bearing areas; the mud resistivity in a borehole can range from 0.01 Ohm to 5 Ohm. The electrical resistivity of separate fragments of a modelling domain changes sharply at the transition over the borders of these subdomains. The magnetic permeability coefficient is also discontinuous for cased boreholes. Therefore, a mathematical model has to be chosen carefully in order to take into account all complexities of the problem.

The choice of a model also depends on the time behavior of a process. A field can be harmonically time dependent, and a modelling domain can contain subdomains
16
V.N. Eryomin et al.
with different geometrical and physical characteristics, including anisotropy of the coefficients of conductivity and dielectric permeability.

A large range of frequencies, which is used in geophysical research, is chosen according to a given problem. For surface or borehole exploration of subsurface anomalies, situated at a distance of several tens to several hundreds of meters, induction methods with frequencies up to 250 kHz are used. These methods have become popular in mineral survey, ecological and geophysical research. The high-frequency methods allow obtaining a high signal level even in relatively low-conductivity media with resistivity up to 120 Ohm, thus broadening the range of determined resistivities.

In the case of harmonically time dependent processes, a transition to a frequency domain is often done; the problem is thereby reduced to solving a second order equation with respect to the complex amplitude of the field. This approach is used for a wide class of scattering problems: problems on modelling of electromagnetic waves in waveguides and corresponding eigenvalue problems, and problems on borehole sounding. For modelling of nonstationary slowly varying fields, wave propagation can be neglected and a transition to a parabolic problem can be done. This is possible in the case of a wave length which is significantly larger than the geometrical dimensions of the domain under consideration. Such fields are typical for problems on induced currents in objects with high electrical conductivity. Finally, nonstationary rapidly varying fields lead to solving the full system of Maxwell equations; the modelling of various microwave devices and high-frequency sounding belong to this class of problems. In applications such as electrical logging, where harmonic fields are investigated, the electrical conductivity of some subdomains can differ by orders of magnitude, and the frequencies vary in a wide range from 1 kHz to 15 MHz.
A typical exploration zone of oil-bearing areas is shown in Fig. 2.
Fig. 2. Typical exploration zone of oil-bearing areas: 1 – well, filled with drilling fluid; 2 – dielectric probe case with transmitting coil T and two receiver coils R1 and R2; 3 – drilling fluid penetration zone; 4 – seam; 5 – layer
Electromagnetic logging is the movement of a probe located near the wall of a horizontal or inclined well. The measurements of the electromagnetic characteristics (the phase difference of the electromotive forces induced in the exploring coils) are performed repeatedly, in no fewer than 100 positions of the probe. A direct modelling of this class of problems leads to multiple solutions of the three-dimensional vector Helmholtz equations, representing the electric or magnetic field.
2 Mathematical Model
The behavior of a harmonically time dependent electric field is described by the vector Helmholtz equation:

    rot (1/μ) rot E + k² E = −iωJ₀,                              (1)

where k² = iωσ − ω²ε; E is the electric field intensity, a complex vector function; J₀ is the source current density; ω is the cyclic frequency; ε is the dielectric permeability; μ is the magnetic permeability; σ is the electric conductivity; i is the imaginary unit.

It is important to underline that the electric characteristics of a borehole, borehole vicinities, container rock and oil- or gas-bearing seams are such that conduction currents and induction currents are commensurable at work in the VIKIZ device frequency range. This leads to the complex quantity k² (wave number) and the complex vector function of electric field intensity E.

The charge conservation law is valid:

    div((σ + iωε)E) = 0.                                         (2)

The following condition is fulfilled:

    div J₀ = 0.                                                  (3)

The continuity conditions must be satisfied on the boundary Γ between subdomains with different materials:

    [n × E]|_Γ = 0,                                              (4)
    [n · (σ + iωε)E]|_Γ = 0,                                     (5)

where n is a normal vector with respect to the surface Γ. The homogeneous boundary conditions are set on the domain boundary ∂Ω:

    n × E|_∂Ω = 0.                                               (6)
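The statement that conduction and induction currents are commensurable in the VIKIZ frequency range can be illustrated by comparing the two terms of k² = iωσ − ω²ε numerically. The material values below (20 Ohm·m rock, relative permittivity 10, 14 MHz sounding frequency) are assumed for illustration only and are not data from this paper:

```python
# Compare the conduction term ωσ and the displacement term ω²ε in
# k² = iωσ − ω²ε (eq. (1)) for illustrative, assumed material values.
import math

EPS0 = 8.854e-12  # vacuum permittivity, F/m

def k_squared(omega, sigma, eps_r):
    """Wave number squared from eq. (1): k² = iωσ − ω²ε, with ε = ε_r·ε₀."""
    eps = eps_r * EPS0
    return complex(-omega ** 2 * eps, omega * sigma)

def current_ratio(f_hz, sigma, eps_r):
    """Ratio of conduction to displacement current magnitudes: σ/(ωε)."""
    omega = 2.0 * math.pi * f_hz
    return sigma / (omega * eps_r * EPS0)

# Assumed example: rock of resistivity 20 Ohm·m (σ = 0.05 S/m), ε_r = 10,
# sounded at 14 MHz.
ratio = current_ratio(14e6, 1.0 / 20.0, 10.0)
print(f"conduction/displacement ratio: {ratio:.1f}")  # order of magnitude 1-10
```

A ratio of order unity means neither term of k² can be dropped, which is why both the real and imaginary parts of the wave number must be kept in the model.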
3 Vector Variational Formulation
Numerical schemes, approximating a harmonically time dependent electric field E in three-dimensional domains with discontinuous electric characteristics on the boundaries separating subdomains, have to adequately take into account the following features of the vector field E:

• Continuity of the tangential component of the electric field E on interfragmentary boundaries, separating subdomains with different electric properties;
• The jump of the normal component of the electric field E on interfragmentary boundaries, separating subdomains, is proportional to the relation of the conductivity coefficients in these subdomains;
• Satisfaction of divergence conditions in media with inhomogeneous physical properties.

Furthermore, the development of numerical algorithms for solving discrete systems of equations, obtained as a result of finite element approximation of the vector Helmholtz equation (1), is complicated by the large kernel of the rot-operator, which contains all gradients of scalar functions belonging to the space H¹(Ω) [3].

Let us introduce the Hilbert spaces in a domain Ω [4,5]:

    H(grad; Ω) = H¹(Ω) = {u ∈ L²(Ω), grad u ∈ L²(Ω)³},
    H(rot; Ω) = {u ∈ L²(Ω)³, rot u ∈ L²(Ω)³},
    H(div; Ω) = {u ∈ L²(Ω)³, div u ∈ L²(Ω)}

with the norms:

    |u|²_rot,Ω = ∫_Ω u · u* dΩ + ∫_Ω rot u · rot u* dΩ,
    |u|²_div,Ω = ∫_Ω u · u* dΩ + ∫_Ω div u · div u* dΩ.

A scalar product is defined as follows:

    (u, v) = ∫_Ω u · v* dΩ.
The space H(rot; Ω) has been introduced in the works of J.-C. Nédélec; its properties are presented in [4,5]. The Nédélec basis functions are vector rot-conformal basis functions, which are widely used for finite element approximation of the vector Helmholtz equation and preserve the continuity of the tangential component of a field (for example, a field E) on interelement and interfragmentary boundaries. If a three-dimensional domain Ω (which is possibly inhomogeneous with respect to physical properties) has a Lipschitz continuous boundary ∂Ω, then for
every function u ∈ H(rot; Ω) a tangential trace u × n on ∂Ω can be defined as an element of H^(−1/2)(∂Ω) [11]. The functions from the space H(rot; Ω) with zero tangential trace form its subspace:

    H₀(rot; Ω) = {u ∈ H(rot; Ω), u × n|_∂Ω = 0}.

H¹₀(Ω) and H₀(div; Ω) are defined similarly:

    H¹₀(Ω) = {u ∈ H¹(Ω), u|_∂Ω = 0},
    H₀(div; Ω) = {u ∈ H(div; Ω), u · n|_∂Ω = 0}.
The spaces H¹₀(Ω), H₀(rot; Ω), H₀(div; Ω), L²(Ω) and the three operators ∇, ∇× and ∇· state the De Rham complex [12], which has the following form in R³:

    H¹₀(Ω) −∇→ H₀(rot; Ω) −∇×→ H₀(div; Ω) −∇·→ L²(Ω).           (7)

According to (7), the following inclusion property is valid for the space H₀(rot; Ω):

    ∀φ ∈ H¹₀(Ω): grad φ ∈ H₀(rot; Ω).                            (8)

Let us state a variational formulation [3] for the problem (1), (6).

Vector variational formulation. For a given J₀ ∈ L²(Ω)³ find E ∈ H₀(rot; Ω) such that ∀v ∈ H₀(rot; Ω):

    ((1/μ) rot E, rot v) + (k² E, v) = −i(ωJ₀, v).               (9)

The variational formulation (9) is fulfilled for all v ∈ H₀(rot; Ω). If, according to (8), we take v = grad φ, φ ∈ H¹₀(Ω), then (9) takes the following form:

    ((1/μ) rot E, rot grad φ) + (k² E, grad φ) = −i(ωJ₀, grad φ), ∀φ ∈ H¹₀(Ω).

Taking into account (3) and the property rot grad φ = 0, we obtain:

    ((ω²ε + iωσ)E, grad φ) = 0, ∀φ ∈ H¹₀(Ω).                     (10)

Since

    ((ω²ε + iωσ)E, grad φ) = ∫_Ω (ω²ε + iωσ)E · grad φ dΩ,
    ∫_Ω (ω²ε + iωσ)E · grad φ dΩ = −∫_Ω div((ω²ε + iωσ)E) φ dΩ,  (11)

it follows from (11) that equation (10) is the variational analogue of the conservation law (2); therefore, the solution of the variational problem (9) satisfies the charge conservation law (2) in the weak sense. It follows from (8) that each vector function E ∈ H₀(rot; Ω) can be represented using the Helmholtz decomposition E = u + grad φ, where u ∈ H₀(rot; Ω) ∩ H₀(div; Ω), φ ∈ H¹₀(Ω) [13].
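The passage from the variational formulation to the weak conservation law relies on the identity rot grad φ = 0. A quick finite-difference check of this identity for one sample smooth φ (an arbitrarily chosen function, purely for illustration):

```python
# Finite-difference check that rot grad φ = 0 for a sample smooth φ.
import math

def phi(x, y, z):
    # Arbitrary smooth test function (assumption: any smooth φ works).
    return math.sin(x) * math.exp(0.5 * y) * (1.0 + z * z)

def grad(f, p, h=1e-5):
    """Central-difference gradient of a scalar field f at point p."""
    x, y, z = p
    return (
        (f(x + h, y, z) - f(x - h, y, z)) / (2 * h),
        (f(x, y + h, z) - f(x, y - h, z)) / (2 * h),
        (f(x, y, z + h) - f(x, y, z - h)) / (2 * h),
    )

def rot(F, p, h=1e-4):
    """Central-difference curl of a vector field F at point p."""
    def d(i, j):  # ∂F_i/∂x_j
        pp, pm = list(p), list(p)
        pp[j] += h
        pm[j] -= h
        return (F(*pp)[i] - F(*pm)[i]) / (2 * h)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

G = lambda x, y, z: grad(phi, (x, y, z))
r = rot(G, (0.3, -0.2, 0.7))
print(r)  # each component vanishes up to finite-difference error
```

The identity holds for any smooth φ (equality of mixed partial derivatives), which is what makes the gradient test functions in (10) annihilate the curl-curl term.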
4 Discrete Analogues of Variational Problems
In order to construct the discrete analogue of the variational problem, the elements of the space H(rot; Ω) are approximated by the elements of the discrete space Hh (rot; Ω). Let us introduce the discrete analogue of the variational problem. ∈ L2 (Ω)3 find Discrete variational problem. For J0re h h h h h h Ere ∈ H0 (rot; Ω) and Eim ∈ H0 (rot; Ω) such that ∀v1 ∈ H0 (rot; Ω) and ∀v2h ∈ Hh0 (rot; Ω): (μ−1 ∇ × Ehre , ∇ × v1h )Ω − (ω 2 εEhre , v1h )Ω − (ωσEhim , v1h )Ω = 0, (μ−1 ∇ × Ehim , ∇ × v2h )Ω − (ω 2 εEhim , v2h )Ω + (ωσEhre , v2h )Ω = (ωJ0re , v2h )Ω . Once for the constructed discrete subspaces the inclusion property is valid: uh ∈ Hh (grad; Ω) → ∇uh ∈ Hh (rot; Ω), then the approximation of the electric field intensity Eh will satisfy the charge conservation law in weak form: (−ωεEhim − σEhre , ∇uh1 ) = 0, (ωεEhre − σEhim , ∇uh2 ) = 0,
∀uh1 ∈ Hh0 (grad; Ω), ∀uh2 ∈ Hh0 (grad; Ω).
Expanding the real and imaginary components of the field Eh in the basis of h the discrete space Hh0 (rot; Ω) and choosing the basis functions wm as functions v1 and v2 , the transition is done to the equivalent system of linear algebraic equations:
$$\begin{pmatrix} D+B & -C \\ C & D+B \end{pmatrix}\begin{pmatrix} E_r \\ E_i \end{pmatrix} = \begin{pmatrix} 0 \\ f \end{pmatrix}, \qquad (12)$$

where $E_r$ and $E_i$ are the weights of the expansion of the real and imaginary components of the field $\mathbf{E}^h$ in the basis. The elements of the matrices $D$, $B$, $C$ and of the vector $f$ are defined by the relations:

$$[D]_{i,j} = \Big(\tfrac{1}{\mu}\,\mathrm{rot}\,\mathbf{w}^h_i,\ \mathrm{rot}\,\mathbf{w}^h_j\Big)_\Omega, \quad [B]_{i,j} = -\big(\varepsilon\omega^2\mathbf{w}^h_i,\ \mathbf{w}^h_j\big)_\Omega, \quad [C]_{i,j} = \big(\sigma\omega\,\mathbf{w}^h_i,\ \mathbf{w}^h_j\big)_\Omega, \quad [f]_i = (\omega\mathbf{J}_{0re},\ \mathbf{w}^h_i)_\Omega.$$

The following preconditioning matrix is defined:

$$\begin{pmatrix} (D+B)_{ii} & -C_{ii} \\ C_{ii} & (D+B)_{ii} \end{pmatrix}^{-1}, \qquad (13)$$

where $(D+B)_{ii}$ and $C_{ii}$ are the main diagonals of the corresponding matrices.
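The action of the preconditioner (13) can be applied row by row in closed form, since each index pairs one real and one imaginary unknown through a 2×2 block. The sketch below is our illustration (function name and vectorized layout are assumptions, not the authors' implementation):

```python
import numpy as np

def apply_block_diag_preconditioner(db_diag, c_diag, r_re, r_im):
    """Apply the inverse of the 2x2 blocks [[(D+B)_ii, -C_ii], [C_ii, (D+B)_ii]]
    from (13) to the paired real/imaginary residual components."""
    det = db_diag**2 + c_diag**2              # determinant of each 2x2 block
    z_re = (db_diag * r_re + c_diag * r_im) / det
    z_im = (-c_diag * r_re + db_diag * r_im) / det
    return z_re, z_im
```

Applying this to a residual pair and multiplying back by the 2×2 blocks recovers the original residual, which is a quick way to validate the closed-form inverse.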
Parallel Realization of Mathematical Modelling
5 Local Vector Basis Functions on Tetrahedral Grid
In the present work the vector elements of the first and second types are used on the tetrahedral grid for approximating the intensity of the electric field $\mathbf{E}$. Let us define the interpolation function $\mathbf{u}^h$ for the vector quantity $\mathbf{u}$ on the tetrahedron $K$ in the following form:

$$\mathbf{u}^h(\mathbf{x}) = \sum_{i\in S} \alpha_i(\mathbf{u})\,\mathbf{w}^h_i(\mathbf{x}), \qquad (14)$$

where $\alpha_i$ are the degrees of freedom, $S$ is the set of indices of the degrees of freedom, $\mathbf{w}^h_i$ are the basis functions, and $\mathbf{x} = (x_1, x_2, x_3)$ is the coordinate in $\mathbb{R}^3$. Let $p_i$ ($i = 1, 2, 3, 4$) be the coordinates of the vertices of an arbitrary tetrahedron $K$ (see Fig. 3), and let $\lambda_i(\mathbf{x})$ be the three-dimensional barycentric coordinates of the point $\mathbf{x} \in K$ with respect to the tetrahedron vertices. Then the first order local vector basis functions of the first type have the following form:

$$\begin{aligned}
\mathbf{w}^K_1 &= \lambda_1\nabla\lambda_2 - \lambda_2\nabla\lambda_1, &
\mathbf{w}^K_2 &= \lambda_1\nabla\lambda_3 - \lambda_3\nabla\lambda_1, &
\mathbf{w}^K_3 &= \lambda_1\nabla\lambda_4 - \lambda_4\nabla\lambda_1, \\
\mathbf{w}^K_4 &= \lambda_2\nabla\lambda_3 - \lambda_3\nabla\lambda_2, &
\mathbf{w}^K_5 &= \lambda_2\nabla\lambda_4 - \lambda_4\nabla\lambda_2, &
\mathbf{w}^K_6 &= \lambda_3\nabla\lambda_4 - \lambda_4\nabla\lambda_3.
\end{aligned} \qquad (15)$$
The vector basis functions (15) are associated with the tetrahedral edges. The connection of the local numbering with the numbering of the tetrahedral vertices is shown in Fig. 3. The space of the first order basis functions is denoted by Hh (rot; Ω; 1). The fulfillment of the weak form (10) of the charge conservation law is connected with the inclusion property of spaces. Let us show that it is true for the
Fig. 3. Numbering of vertices ($p_1$–$p_4$) and edges ($e_1$–$e_6$) of a tetrahedral finite element
discrete subspace $H^h(\mathrm{rot};\Omega;1)$. The barycentric coordinates are taken as the scalar local basis functions of the discrete subspace:

$$\varphi^K_1 = \lambda_1, \quad \varphi^K_2 = \lambda_2, \quad \varphi^K_3 = \lambda_3, \quad \varphi^K_4 = \lambda_4. \qquad (16)$$
The discrete scalar subspace, constructed using the local basis functions (16), is denoted by $H^h(\mathrm{grad};\Omega;1)$. The following relations are valid:

$$\begin{aligned}
\nabla\varphi^K_1 &= \nabla\lambda_1 = -\mathbf{w}^K_1 - \mathbf{w}^K_2 - \mathbf{w}^K_3, &
\nabla\varphi^K_2 &= \nabla\lambda_2 = \mathbf{w}^K_1 - \mathbf{w}^K_4 - \mathbf{w}^K_5, \\
\nabla\varphi^K_3 &= \nabla\lambda_3 = \mathbf{w}^K_2 + \mathbf{w}^K_4 - \mathbf{w}^K_6, &
\nabla\varphi^K_4 &= \nabla\lambda_4 = \mathbf{w}^K_3 + \mathbf{w}^K_5 + \mathbf{w}^K_6.
\end{aligned} \qquad (17)$$
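The relations (17) can be checked numerically: on a tetrahedron the barycentric gradients $\nabla\lambda_i$ are constant vectors, and at any interior point the stated combinations of the edge functions (15) reproduce them. The following self-check on the reference tetrahedron is our illustration, not part of the original code:

```python
import numpy as np

# Reference tetrahedron: lambda_1 = 1-x-y-z, lambda_2 = x, lambda_3 = y, lambda_4 = z.
p = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
A = np.vstack([np.ones(4), p.T])           # rows [1; x; y; z] of the four vertices
Ainv = np.linalg.inv(A)                    # lambda_i(x) = Ainv[i] @ [1, x, y, z]
grad = Ainv[:, 1:]                         # constant gradients of lambda_1..lambda_4

lam = np.array([0.1, 0.2, 0.3, 0.4])       # barycentric coords of an interior point

def w(i, j):
    """Edge basis function (15): lambda_i grad(lambda_j) - lambda_j grad(lambda_i)."""
    return lam[i] * grad[j] - lam[j] * grad[i]

w1, w2, w3 = w(0, 1), w(0, 2), w(0, 3)
w4, w5, w6 = w(1, 2), w(1, 3), w(2, 3)
assert np.allclose(grad[0], -w1 - w2 - w3)   # grad(phi_1)
assert np.allclose(grad[1],  w1 - w4 - w5)   # grad(phi_2)
assert np.allclose(grad[2],  w2 + w4 - w6)   # grad(phi_3)
assert np.allclose(grad[3],  w3 + w5 + w6)   # grad(phi_4)
```

The check passes for any nondegenerate tetrahedron, since the identities rely only on $\sum_i \lambda_i = 1$ and $\sum_i \nabla\lambda_i = 0$.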
It follows from (17) that the gradients of the scalar basis functions of the space $H^h(\mathrm{grad};\Omega;1)$ are linear combinations of the vector basis functions of the space $H^h(\mathrm{rot};\Omega;1)$. Therefore, the inclusion property is true: $u \in H^h(\mathrm{grad};\Omega;1) \Rightarrow \nabla u \in H^h(\mathrm{rot};\Omega;1)$. Let us define the first order local vector basis functions of the second type, which form the basis of the space $H^h(\mathrm{rot};\Omega;2)$:

$$\begin{aligned}
\mathbf{w}^K_{1,1} &= \lambda_1\nabla\lambda_2, & \mathbf{w}^K_{1,2} &= \lambda_2\nabla\lambda_1, \\
\mathbf{w}^K_{2,1} &= \lambda_1\nabla\lambda_3, & \mathbf{w}^K_{2,2} &= \lambda_3\nabla\lambda_1, \\
\mathbf{w}^K_{3,1} &= \lambda_1\nabla\lambda_4, & \mathbf{w}^K_{3,2} &= \lambda_4\nabla\lambda_1, \\
\mathbf{w}^K_{4,1} &= \lambda_2\nabla\lambda_3, & \mathbf{w}^K_{4,2} &= \lambda_3\nabla\lambda_2, \\
\mathbf{w}^K_{5,1} &= \lambda_4\nabla\lambda_2, & \mathbf{w}^K_{5,2} &= \lambda_2\nabla\lambda_4, \\
\mathbf{w}^K_{6,1} &= \lambda_3\nabla\lambda_4, & \mathbf{w}^K_{6,2} &= \lambda_4\nabla\lambda_3.
\end{aligned} \qquad (18)$$
Two local basis functions $\mathbf{w}_{i,1}$ and $\mathbf{w}_{i,2}$ are associated with each $i$-th edge of the tetrahedron. The discrete subspace $H^h(\mathrm{rot};\Omega;2)$, constructed using the second order vector basis functions, corresponds to the discrete subspace $H^h(\mathrm{grad};\Omega;2)$ with the second order scalar local basis functions [8]:

$$\begin{aligned}
\varphi^K_1 &= \lambda_1(2\lambda_1-1), & \varphi^K_2 &= \lambda_2(2\lambda_2-1), \\
\varphi^K_3 &= \lambda_3(2\lambda_3-1), & \varphi^K_4 &= \lambda_4(2\lambda_4-1), \\
\varphi^K_5 &= 4\lambda_1\lambda_2, & \varphi^K_6 &= 4\lambda_1\lambda_3, \\
\varphi^K_7 &= 4\lambda_1\lambda_4, & \varphi^K_8 &= 4\lambda_2\lambda_3, \\
\varphi^K_9 &= 4\lambda_2\lambda_4, & \varphi^K_{10} &= 4\lambda_3\lambda_4.
\end{aligned}$$
Let us construct the hierarchical basis of the space $H^h(\mathrm{rot};\Omega;2)$:

$$\begin{aligned}
\mathbf{w}^K_{1,1} &= \lambda_1\nabla\lambda_2 - \lambda_2\nabla\lambda_1, &
\mathbf{w}^K_{2,1} &= \lambda_1\nabla\lambda_3 - \lambda_3\nabla\lambda_1, \\
\mathbf{w}^K_{3,1} &= \lambda_1\nabla\lambda_4 - \lambda_4\nabla\lambda_1, &
\mathbf{w}^K_{4,1} &= \lambda_2\nabla\lambda_3 - \lambda_3\nabla\lambda_2, \\
\mathbf{w}^K_{5,1} &= \lambda_2\nabla\lambda_4 - \lambda_4\nabla\lambda_2, &
\mathbf{w}^K_{6,1} &= \lambda_3\nabla\lambda_4 - \lambda_4\nabla\lambda_3, \\
\mathbf{w}^K_{1,2} &= \lambda_1\nabla\lambda_2 + \lambda_2\nabla\lambda_1, &
\mathbf{w}^K_{2,2} &= \lambda_1\nabla\lambda_3 + \lambda_3\nabla\lambda_1, \\
\mathbf{w}^K_{3,2} &= \lambda_1\nabla\lambda_4 + \lambda_4\nabla\lambda_1, &
\mathbf{w}^K_{4,2} &= \lambda_2\nabla\lambda3 + \lambda_3\nabla\lambda_2, \\
\mathbf{w}^K_{5,2} &= \lambda_2\nabla\lambda_4 + \lambda_4\nabla\lambda_2, &
\mathbf{w}^K_{6,2} &= \lambda_3\nabla\lambda_4 + \lambda_4\nabla\lambda_3.
\end{aligned} \qquad (19)$$
The hierarchical basis of the space $H^h(\mathrm{grad};\Omega;2)$ has the following form:

$$\begin{aligned}
\varphi^K_1 &= \lambda_1, & \varphi^K_2 &= \lambda_2, &
\varphi^K_3 &= \lambda_3, & \varphi^K_4 &= \lambda_4, \\
\varphi^K_5 &= \lambda_1\lambda_2, & \varphi^K_6 &= \lambda_1\lambda_3, &
\varphi^K_7 &= \lambda_1\lambda_4, & \varphi^K_8 &= \lambda_2\lambda_3, \\
\varphi^K_9 &= \lambda_2\lambda_4, & \varphi^K_{10} &= \lambda_3\lambda_4. &&&&
\end{aligned}$$
Further, the hierarchical bases of the spaces $H^h(\mathrm{rot};\Omega;2)$ and $H^h(\mathrm{grad};\Omega;2)$ on the tetrahedral grid are used.
6 Two-Level Iterative Solver
Let us consider a system of linear algebraic equations with a nonsingular $n \times n$ matrix $A$:

$$A x = b. \qquad (20)$$

Let $V$ denote a subspace of the space $\mathbb{R}^n$ with $\dim(V) = m$, and let us introduce the $n \times m$ matrix $P$ whose columns form a basis of the subspace $V$: $P : V \to \mathbb{R}^n$. Let us formulate the two-level iterative algorithm for solving systems of linear algebraic equations (SLAE), which uses the subspace $V$.

Algorithm SV(A, b, x0, ν):
  r_0 = b − A x_0;
  for i = 1, 2, . . .
    g = P^T r_{i−1};
    z = (P^T A P)^{−1} g  or  z = S((P^T A P), g, γ_P);
    x_{i−1/2} = x_{i−1} + P z;
    r_{i−1/2} = b − A x_{i−1/2};
    y = S(A, r_{i−1/2}, γ);
    x_i = x_{i−1/2} + y;
    r_i = b − A x_i;
    if ‖r_i‖ < ν‖b‖, then return x_i;
    increase i,

where S(A, b, γ) is an iterative solver for the SLAE $Ax = b$. The matrix obtained by the discretization of the considered problem is not symmetric; therefore, the preconditioned stabilized biconjugate gradient method BiCGSTAB(A, b, γ) [10] is used. Based on the work of R. Hiptmair [6], the subspace equal to the kernel of the operator rot, $N^h(\mathrm{rot};\Omega) = \{u \in H^h(\mathrm{rot};\Omega) : \nabla\times u = 0\}$, can be chosen as the subspace $V$ in the algorithm SV(A, b, x0, ν).
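A minimal dense sketch of the algorithm SV is given below. This is our illustration under simplifying assumptions: the coarse system is solved exactly, and a few damped Jacobi sweeps stand in for the BiCGSTAB smoother S used in the paper; all names are ours.

```python
import numpy as np

def solve_sv(A, b, P, x0, nu=1e-8, max_outer=200, smooth_steps=5):
    """Two-level iteration: coarse correction on span(P), then smoothing on A.
    Damped Jacobi sweeps stand in for the inner solver S(A, r, gamma)."""
    x = x0.copy()
    Ac = P.T @ A @ P                       # projected (coarse) operator P^T A P
    d = np.diag(A)
    for _ in range(max_outer):
        r = b - A @ x
        x = x + P @ np.linalg.solve(Ac, P.T @ r)   # coarse-space correction
        for _ in range(smooth_steps):              # smoother y = S(A, r, gamma)
            x = x + 0.7 * (b - A @ x) / d
        if np.linalg.norm(b - A @ x) < nu * np.linalg.norm(b):
            break
    return x
```

On a diagonally dominant test matrix the relative residual drops below the requested tolerance in a handful of outer iterations; the structure (coarse correction, then smoothing, then residual test) mirrors the algorithm above.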
The columns of the matrix $P$ are the coordinates of the gradients of the basis functions of the space $H^h_0(\mathrm{grad};\Omega)$ in the basis of the space $H^h_0(\mathrm{rot};\Omega)$. In this case the system of linear algebraic equations

$$P^T A P\, u = P^T r \qquad (21)$$
is equivalent to the following discrete variational problem.

Discrete variational problem. For $F_{0re}, F_{0im} \in L^2(\Omega)^3$ find $U^h_{re} \in H^h_0(\mathrm{grad};\Omega)$ and $U^h_{im} \in H^h_0(\mathrm{grad};\Omega)$ such that $\forall v^h_1 \in H^h_0(\mathrm{grad};\Omega)$ and $\forall v^h_2 \in H^h_0(\mathrm{grad};\Omega)$ the following equalities are fulfilled:

$$(\mu^{-1}\nabla\times\nabla U^h_{re},\ \nabla\times\nabla v^h_1)_\Omega - (\omega^2\varepsilon\,\nabla U^h_{re},\ \nabla v^h_1)_\Omega - (\omega\sigma\,\nabla U^h_{im},\ \nabla v^h_1)_\Omega = (F_{0im},\ \nabla v^h_1)_\Omega,$$
$$(\mu^{-1}\nabla\times\nabla U^h_{im},\ \nabla\times\nabla v^h_2)_\Omega - (\omega^2\varepsilon\,\nabla U^h_{im},\ \nabla v^h_2)_\Omega + (\omega\sigma\,\nabla U^h_{re},\ \nabla v^h_2)_\Omega = (F_{0re},\ \nabla v^h_2)_\Omega.$$

Taking into account $\nabla\times\nabla \equiv 0$, the discrete variational problem reduces to the following: for $F_{0re}, F_{0im} \in L^2(\Omega)^3$ find $U^h_{re} \in H^h_0(\mathrm{grad};\Omega)$ and $U^h_{im} \in H^h_0(\mathrm{grad};\Omega)$ such that $\forall v^h_1, v^h_2 \in H^h_0(\mathrm{grad};\Omega)$:

$$-(\omega^2\varepsilon\,\nabla U^h_{re},\ \nabla v^h_1)_\Omega - (\omega\sigma\,\nabla U^h_{im},\ \nabla v^h_1)_\Omega = (F_{0im},\ \nabla v^h_1)_\Omega,$$
$$-(\omega^2\varepsilon\,\nabla U^h_{im},\ \nabla v^h_2)_\Omega + (\omega\sigma\,\nabla U^h_{re},\ \nabla v^h_2)_\Omega = (F_{0re},\ \nabla v^h_2)_\Omega.$$

Therefore, the matrix of the system (21) can be obtained using the finite element technology instead of a direct multiplication of matrices. The solver SV(A, b, x0, ν) that uses the kernel of the operator rot is denoted by SN(rot)(A, b, x0, ν).
7 Multiplicative Algorithm
Let there exist two subspaces $V_1$ and $V_2$ of $\mathbb{R}^n$ such that

$$V_1 \cup V_2 = \mathbb{R}^n, \qquad V_1 \cap V_2 = \{0\}.$$

Let us introduce the matrices $P_1$ and $P_2$, whose columns form the bases of the subspaces $V_1$ and $V_2$, respectively. The following iterative algorithm for solving the SLAE (20) [7] can be formulated for the successive refinement of an initial approximation by elements of the subspaces $V_1$ and $V_2$.
Multiplicative Schwarz algorithm:
  r_0 = b − A x_0;
  for i = 1, 2, . . .
    g_1 = P_1^T r_{i−1};
    z_1 = (P_1^T A P_1)^{−1} g_1  or  z_1 = S_1((P_1^T A P_1), g_1, γ_1);
    x_{i−1/2} = x_{i−1} + P_1 z_1;
    r_{i−1/2} = b − A x_{i−1/2};
    g_2 = P_2^T r_{i−1/2};
    z_2 = (P_2^T A P_2)^{−1} g_2  or  z_2 = S_2((P_2^T A P_2), g_2, γ_2);
    x_i = x_{i−1/2} + P_2 z_2;
    r_i = b − A x_i;
    if x_i satisfies the desired accuracy, then stop;
    increase i.

Let us introduce the following spaces:

$$N^h(\mathrm{rot};\Omega;2) = \{u \in H^h(\mathrm{rot};\Omega;2) : \nabla\times u = 0\}, \qquad N^h(\mathrm{rot};\Omega;1) = \{u \in H^h(\mathrm{rot};\Omega;1) : \nabla\times u = 0\}.$$

By definition, the hierarchic basis of the space $H^h(\mathrm{rot};\Omega;2)$ consists of the basis functions of the space $H^h(\mathrm{rot};\Omega;1)$ and of gradients of scalar functions. Therefore, $H^h(\mathrm{rot};\Omega;2)$ can be represented as follows:

$$H^h(\mathrm{rot};\Omega;2) = H^h(\mathrm{rot};\Omega;1) \cup N^h(\mathrm{rot};\Omega;2), \qquad H^h(\mathrm{rot};\Omega;1) \cap N^h(\mathrm{rot};\Omega;2) = N^h(\mathrm{rot};\Omega;1).$$

Let $V_1 = H^h(\mathrm{rot};\Omega;1)$ and $V_2 = N^h(\mathrm{rot};\Omega;2)$; then the multiplicative Schwarz algorithm can be used for solving the SLAE (12) [9]. It follows from the definition of the subspaces that the matrices $(P_1^T A P_1)$ and $(P_2^T A P_2)$ correspond to the discrete variational problems formulated with respect to the vector Helmholtz equation in the spaces $H^h(\mathrm{rot};\Omega;1)$ and $N^h(\mathrm{rot};\Omega;2)$, respectively. Therefore, the finite element technology is used for obtaining the matrix data. The stabilized biconjugate gradient method is used as the solver $S_1((P_1^T A P_1), g_1, γ_1)$. By the definition of the hierarchic basis of the space $H^h(\mathrm{grad};\Omega;2)$, the following property is valid: $N^h(\mathrm{rot};\Omega;1) \subset N^h(\mathrm{rot};\Omega;2)$. Therefore, the solver $S_2((P_2^T A P_2), g_2, γ_2)$ is constructed on the basis of the two-level algorithm SV(A, b, 0, γ_1), where $V = N^h(\mathrm{rot};\Omega;1)$.
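In matrix terms, one sweep of the multiplicative Schwarz algorithm is two successive subspace corrections. The dense sketch below is our illustration: the projected systems are solved exactly, whereas the paper assembles them by finite element technology and solves them with inner iterative solvers.

```python
import numpy as np

def multiplicative_schwarz(A, b, P1, P2, x0, tol=1e-10, max_iter=200):
    """Alternate corrections over the subspaces range(P1) and range(P2)."""
    x = x0.copy()
    A1 = P1.T @ A @ P1                     # projected operators
    A2 = P2.T @ A @ P2
    for _ in range(max_iter):
        r = b - A @ x
        x = x + P1 @ np.linalg.solve(A1, P1.T @ r)   # correction in V1
        r = b - A @ x
        x = x + P2 @ np.linalg.solve(A2, P2.T @ r)   # correction in V2
        if np.linalg.norm(b - A @ x) < tol * np.linalg.norm(b):
            break
    return x
```

With complementary coordinate subspaces this reduces to block Gauss–Seidel, which converges quickly for diagonally dominant test matrices.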
8 Numerical Testing of Algorithm on Model Problem
Numerical testing of the suggested algorithm is done for the model problem (1)–(6) of modelling a harmonically time-dependent electric field in
Fig. 4. Computational domain (subdomains 1 and 2)
the domain $[-1, 1]^3$, which consists of two subdomains with different electrical conductivities $\sigma_1$ and $\sigma_2$ (Fig. 4). In subdomain 1 the transmitting coil is placed, in which a current of 1 A with the frequency 14 MHz is given. The boundary between the subdomains is the plane $x = 0$. The testing has been done on a PC with an Athlon XP 1800+ processor and 512 Mb RAM. Table 1 shows the dimensionalities of the discrete subspaces, constructed on a sequence of unstructured tetrahedral grids $T_i$, and the length $h_{max}$ of the largest edge.

Table 1. The length $h_{max}$ of the largest edge and dimensionalities of discrete subspaces, constructed on a sequence of unstructured tetrahedral grids

Grid  hmax        H(rot; Ω; 1)  N(rot; 1)  H(rot; Ω; 2)  N(rot; 2)
T1    2.5·10⁻¹    8776          1474       17552         10250
T2    8.84·10⁻²   63756         9632       127512        73388
T3    6.25·10⁻²   236364        34204      472728        270568
Table 2. Time (in seconds) of finding an approximate solution by the multiplicative algorithm for different relations between values of electrical conductivity on different tetrahedral grids

(σ1, σ2)  (1,1)  (0,0)  (1,10)  (1,0.1)  (1,0)  (10,1)  (0.1,1)  (0,1)
T1        4      3      5       5        5      4       5        6
T2        64     53     92      93       118    51      93       119
T3        767    544    1101    1457     1879   590     1545     1702
Table 3. Time (in seconds) of finding an approximate solution by the algorithm SN(rot)(A, b, x0, ν) for different relations between values of electrical conductivity on different tetrahedral grids

(σ1, σ2)  (1,1)  (0,0)  (1,10)  (1,0.1)  (1,0)  (10,1)  (0.1,1)  (0,1)
T1        6      3      3       8        8      5       8        10
T2        122    56     118     196      215    71      205      269
T3        2651   1005   1674    6957     7224   1364    5774     7022
The time (in seconds) which the multiplicative algorithm and the algorithm SN(rot)(A, b, x0, ν) need to decrease the relative residual norm by a factor of $10^7$ is shown in Tables 2 and 3, respectively, for different relations between the values of electrical conductivity in the first and second subdomains and for different tetrahedral grids.
9 The NEC SX-8 Vector Supercomputing System at HLRS
The NEC SX series of vector supercomputers has a tradition starting in 1983 with the SX-1/2, the first supercomputer providing a peak performance exceeding 1 GFLOPS. The NEC SX-8 became available in January 2005; the system installed at HLRS started production in April 2005. The basic building block of the system, an SX-8 node (Fig. 5), hosts up to 8 processors (Fig. 6) running at 2 GHz with 4 vector pipes for add and multiply, providing a vector performance of 16 GFLOPS per processor and an accumulated peak performance of 128 GFLOPS for a fully equipped node. The vector pipes take data
Fig. 5. NEC SX-8 node
Fig. 6. Schematic view of an SX-8 CPU
from the vector registers, process them and store them back to vector registers located on the processors. There is no complicated multi-level cache hierarchy: the data is moved directly between the memory (MMU) and the vector registers using a load/store pipe. The peak bandwidth between memory and a processor is 64 GB/s, which allows one load or store per 2 FLOP. The memory consists of 4096 interleaved memory banks to hide latency. All memory inside a node (128 GB) can be accessed by any processor via a crossbar switch, giving an aggregated memory bandwidth of 512 GB/s per node. Special new features of the SX-8 are the additional hardware for square root and divide and the so-called Memory Bank Cache (MBC), which avoids the stride-2 performance penalty encountered in earlier models of the SX series and gives a significant speed-up for certain table-look-up patterns. The NEC SX-8 installation at HLRS consists of 72 nodes connected via a high-speed single-stage optical crossbar switch (IXS); the programming paradigm for inter-node data exchange is MPI. The IXS provides 2×8 GB/s per node in each direction for inter-node communication (32 GB/s cross-section bandwidth). More details can be found in [14]. The operating system is SUPER-UX (currently R15.1), a System V based Unix with extensions for high performance computing. Applications can be programmed either in FORTRAN or in C/C++. The sophisticated compilers support a high level of vectorization and even automatic shared memory parallelization. OpenMP, MPI and HPF are available for portable parallel programming. Cross-development toolkits, containing compilers and performance analysis tools, are available for Linux systems and various other platforms. Two Intel Itanium based NEC TX-7 systems with Linux are used at HLRS for program development, batch-job submission, and pre- and post-processing. The SX-8 and the TX-7 have access to a Global File System (GFS), which is based on IA32 NAS servers.
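The quoted byte/FLOP balance follows directly from the per-CPU figures in the text; the short calculation below merely restates that arithmetic:

```python
# Per-CPU figures quoted in the text for the SX-8:
peak_flops = 16e9          # 4 add + 4 multiply pipes at 2 GHz -> 16 GFLOPS
mem_bandwidth = 64e9       # bytes/s between memory and one processor
word_size = 8              # bytes per double-precision vector element

words_per_second = mem_bandwidth / word_size   # 8e9 loads/stores per second
flops_per_word = peak_flops / words_per_second
print(flops_per_word)      # -> 2.0, i.e. one load or store per 2 FLOP
print(8 * mem_bandwidth)   # -> 512 GB/s aggregated for a fully equipped node
```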
10 VIKIZ on NEC SX-8
Originally the code was developed to run on a PC; therefore, some porting effort was required. The mesh is constructed using the automatic 3D tetrahedral mesh generator NETGEN [15]. Unfortunately, due to compilation problems with the Tcl, Tk and
Fig. 7. Phase difference vs. probe depth
Tix libraries, we were not able to compile NETGEN for running on the SX-8; therefore, NETGEN has been compiled for running on the frontend. This does not cause much trouble right now, as the mesh is generated before the solver starts to work and is not changed during a solver run. Nevertheless, it is a future task to parallelize the mesh generation, since the sequential version takes about 5 hours of run-time. No issues have been found while porting the solver. First tests with the sequential version of VIKIZ were performed to validate the numerical correctness of the runs on the SX-8. The results confirmed the successful porting of the sequential version of the code. In a second step, 8 instances of the sequential executable were run concurrently on one SX-8 node and, upon completion, the next set of 8 instances was started. This approach seemed reasonable as the input data consists of independent files. Analyzing the runs of a typical production case with respect to elapsed time, it turned out that there is some larger variation between instances. Consequently, in a third step, a dynamic load scheduler was introduced. It is implemented inside the batch script: the number of active instances of the executable is checked frequently, and as soon as an instance has completed, a new one is launched. Fig. 7 shows the numerically obtained dependence of the phase difference on the probe depth: z is the probe depth and ΔΦ is the phase difference, which allows geophysicists to obtain properties of the rock. This value is the goal of real sounding measurements by the VIKIZ probe.
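The dynamic load scheduler described above can be sketched as follows. This is a Python stand-in for the batch-script logic; the job list is a placeholder, not the actual VIKIZ invocation:

```python
import subprocess
import sys
import time

def run_with_scheduler(commands, max_active=8, poll_interval=0.05):
    """Keep up to max_active instances running; as soon as one instance
    completes, launch the next one, instead of waiting for a whole batch."""
    pending = list(commands)
    active = []
    finished = 0
    while pending or active:
        still_running = []
        for proc in active:
            if proc.poll() is None:
                still_running.append(proc)
            else:
                finished += 1                  # an instance has completed
        active = still_running
        while pending and len(active) < max_active:
            active.append(subprocess.Popen(pending.pop(0)))
        time.sleep(poll_interval)
    return finished

# Placeholder jobs standing in for the independent VIKIZ production cases:
jobs = [[sys.executable, "-c", "pass"] for _ in range(12)]
print(run_with_scheduler(jobs, max_active=4))   # -> 12
```

Because the per-case run times vary, this keeps the node fully loaded, unlike the fixed batches of 8 used in the second step.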
11 Outlook
The following points will be considered during the further development of the VIKIZ numerical modelling: • From NETGEN to another grid generator in order to parallelize the grid generation;
• From external parallelism to internal parallelism: the algorithm allows effective parallelization; • From harmonic fields to time-dependent fields; • Special preconditioning procedures for vector elements of high order and their parallel realization.
References
1. Technology of oil- and gas-bearing wells surveying on the basis of VIKIZ. Methodical manual. In: Epov, M.I., Antonov, Yu.N. (eds.), Novosibirsk, Scientific-publishing center of Trofimuk United Institute of Geology, Geophysics and Mineralogy SB RAS (2000) (in Russian)
2. Eryomin, V.N.: Geophysical Reporter 1, 15–19 (2005) (in Russian)
3. Hiptmair, R.: Acta Numerica 11, 237–339 (2002)
4. Nédélec, J.C.: Numer. Math. 35, 315–341 (1980)
5. Nédélec, J.C.: Numer. Math. 50, 57–81 (1986)
6. Hiptmair, R.: SIAM J. Numer. Anal. 36(1), 204–225 (1998)
7. Saad, Y.: Iterative Methods for Sparse Linear Systems. PWS Publishing Company (1996)
8. Webb, J.P.: IEEE Trans. Antennas Propag. 47(8), 1244–1253 (1999)
9. Nechaev, O.V., Shurina, E.P., Botchev, M.A.: Multilevel iterative solvers for the edge finite element solution of the 3D Maxwell equation. Department of Applied Mathematics Internal Report 1806, University of Twente (2006), ISSN 0169-2690
10. Van der Vorst, H.A.: SIAM J. Sci. Stat. Comput. 13(2), 631–644 (1992)
11. Monk, P.: Finite Element Methods for Maxwell's Equations. Clarendon Press, Oxford (2003)
12. Arnold, D.N., Falk, R.S., Winther, R.: Acta Numerica 15, 1–155 (2006)
13. Caorsi, S., Fernandes, P., Raffetto, M.: SIAM J. Numer. Anal. 38, 580–607 (2000)
14. Tagaya, S., Nishida, M., Hagiwara, T., Yanagawa, T., Yokoya, Yu., Takahara, H., Stadler, J., Galle, M., Bez, W.: The NEC SX-8 Vector Supercomputer System. In: Resch, M., Bönisch, T., Benkert, K., Furui, T., Seo, Y., Bez, W. (eds.) High Performance Computing on Vector Systems 2005. Proceedings of the High Performance Computing Center Stuttgart, March 2005, pp. 3–24. Springer, Heidelberg (2006)
15. http://www.hpfem.jku.at/netgen/
Numerical Solution of Some Direct and Inverse Mathematical Problems for Tidal Flows

V.I. Agoshkov¹, L.P. Kamenschikov², E.D. Karepova², and V.V. Shaidurov²

¹ Institute of Numerical Mathematics RAS, Gubkina st. 8, 119991 Moscow, Russia, [email protected]
² Institute of Computational Modelling SB RAS, Academgorodok, 660036 Krasnoyarsk, Russia, [email protected], [email protected], [email protected]
Abstract. The inverse problem of the mathematical theory of tides is considered in the form of defining boundary values on the liquid parts of the boundary. The direct and adjoint shallow water equations are closed by the observation data on the sea level function (free surface elevation) on a part of the boundary. An iterative algorithm is used for solving this complete problem in connection with tides of the World Ocean.
1 Introduction
The subject of this paper concerns the two-dimensional shallow water equations describing tidal flows in seas and oceans. The first part of the paper is devoted to solving the direct problem, which includes 3 partial differential equations for the two horizontal velocities and the water level fluctuation. For the discretization, linear finite elements are used on triangles adapted to the coastline. Then the computational stability of the discrete problem and the convergence of its solution to the exact one are proved. In the important case of the World Ocean, a parallel realization of the algorithm is used with a geometrical decomposition of the solution domain and the corresponding data. These computations are useful for geophysical considerations, prediction of bank erosion, and arrangement of tidal hydro stations. In the second part we discuss an inverse problem where the right-hand side of a boundary condition is unknown and has to be reconstructed. For this purpose, a combination of the regularization method with optimal control is used in accordance with the paper [3]. First we obtain a combination of the direct and adjoint problems containing 6 partial differential equations with an additional equality. Then we use a stable iterative method to solve them for the reconstruction of the unknown function with the help of the other, known one. In the important case of the World Ocean, the parallel realization of the algorithm is used with a geometrical decomposition of the solution domain and the corresponding data.
This work is supported by Russian Foundation of Basic Research (grant N 0501-00579) and Department of mathematical science of RAS and SB RAS (project N 1.3.9).
E. Krause et al. (Eds.): Comp. Science & High Perf. Computing III, NNFM 101, pp. 31–43, 2008. c Springer-Verlag Berlin Heidelberg 2008 springerlink.com
V.I. Agoshkov et al.
These computations are useful for geophysical considerations, prediction of bank erosion, and arrangement of tidal hydro stations. We numerically demonstrate the convergence history of the iterative process and its ability to reconstruct the unknown boundary function.
2 Differential Problem and Time-Discretization
Let $(r, \lambda, \theta)$ be standard spherical coordinates with the origin at the center of the terrestrial globe. Further we will use the geographic latitude $\varphi = \theta + \pi/2$ instead of the angle $\theta$, thus $0 \le \varphi \le \pi$. We denote by $\lambda$ the geographic longitude, so $0 \le \lambda \le 2\pi$. Let us suppose that $r = R_E$ everywhere, where $R_E$ is the radius of the Earth, assumed to be constant. We consider the long-wave propagation problem in the following form. Let $\Omega_{R_E}$ be a given domain on the sphere with a boundary $\Gamma = \Gamma_1 \cup \Gamma_2$, where $\Gamma_1$ is the part of the boundary passing along the coastline ('solid boundary') and $\Gamma_2 = \Gamma \setminus \Gamma_1$ is the part of the boundary surrounding the area of water ('liquid boundary'). We denote by $m_1$ and $m_2$ the characteristic functions of the corresponding parts of the boundary. For simplicity's sake we assume that the points $\varphi = 0$ and $\varphi = \pi$ (the poles) do not belong to $\Omega_{R_E}$. Denote $\Omega = \{(\lambda, \varphi) \in [0, 2\pi] \times (0, \pi) : (R_E, \lambda, \varphi) \in \Omega_{R_E}\}$. We write in $\Omega_{R_E} \times (0, T)$ the impulse balance equations and the equation of continuity [3]:

$$\frac{\partial u}{\partial t} = lv + mg\frac{\partial \xi}{\partial\lambda} - R_f u + f_1, \qquad (1a)$$
$$\frac{\partial v}{\partial t} = -lu + ng\frac{\partial \xi}{\partial\varphi} - R_f v + f_2, \qquad (1b)$$
$$\frac{\partial \xi}{\partial t} = m\left[\frac{\partial}{\partial\lambda}(Hu) + \frac{\partial}{\partial\varphi}\Big(\frac{n}{m}Hv\Big)\right] + f_3, \qquad (1c)$$
where $u = u(t, \lambda, \varphi)$ and $v = v(t, \lambda, \varphi)$ are the longitude and latitude components of the velocity vector $\mathbf{U}$, $\xi = \xi(t, \lambda, \varphi)$ is the fluctuation of the free surface from the non-perturbed level, $H(\lambda, \varphi) > 0$ is the depth of the water body at the point $(\lambda, \varphi)$, the function $R_f = r_*|\mathbf{U}|/H$ takes into account the force of friction on the floor, $r_*$ is the friction coefficient, $l = -2\omega\cos\varphi$ is the Coriolis parameter, $m = 1/(R_E\sin\varphi)$, $n = 1/R_E$, $g$ is the acceleration of gravity; $f_1$, $f_2$, and $f_3$ are given functions of the external forces. We consider the boundary conditions in the following form [1]:

$$H U_n + \beta m_2\sqrt{gH}\,\xi = m_2\sqrt{gH}\,d \quad \text{on } \Gamma\times(0,T), \qquad (2)$$

where $U_n = \mathbf{U}\cdot\mathbf{n}$, $\mathbf{n} = (n_1, n_2)$ is the vector of the outer normal to the boundary, and $0 < \beta < 1$ is a given parameter of the problem which influences the stability properties. The function $d = d(t, \lambda, \varphi)$ is a boundary function which has to be determined together with $u$, $v$, $\xi$. To close the problem (1)–(2), in [3] it was suggested to add the boundary condition

$$m_0\,\xi = \xi_{obs} \qquad (3)$$
on the part $\Gamma_0$ of the "liquid" boundary with characteristic function $m_0$ and a given function $\xi_{obs} \in L^2(\Gamma_0)$. We also specify the initial conditions

$$u(0, \lambda, \varphi) = u_0(\lambda, \varphi), \quad v(0, \lambda, \varphi) = v_0(\lambda, \varphi), \quad \xi(0, \lambda, \varphi) = \xi_0(\lambda, \varphi). \qquad (4)$$
For the time discretization we subdivide the time interval $[0, T]$ into $K$ subintervals: $0 = t_0 < t_1 < \cdots < t_K = T$ with step $\tau = T/K$. Approximating the time derivatives by left differences, we consider the system (1)–(4) on the time interval $(t_k, t_{k+1})$:

$$\frac{1}{\tau}u + R_f u - lv - mg\frac{\partial\xi}{\partial\lambda} = f_1 + \frac{1}{\tau}u^k \quad \text{in } \Omega, \qquad (5a)$$
$$\frac{1}{\tau}v + R_f v + lu - ng\frac{\partial\xi}{\partial\varphi} = f_2 + \frac{1}{\tau}v^k \quad \text{in } \Omega, \qquad (5b)$$
$$\frac{1}{\tau}\xi - m\left[\frac{\partial}{\partial\lambda}(Hu) + \frac{\partial}{\partial\varphi}\Big(\frac{n}{m}Hv\Big)\right] = f_3 + \frac{1}{\tau}\xi^k \quad \text{in } \Omega, \qquad (5c)$$
$$H U_n + \beta m_2\sqrt{gH}\,\xi = m_2\sqrt{gH}\,d \quad \text{on } \Gamma, \qquad k = 0, 1, \ldots, K-1, \qquad (6)$$

where $f^k = f(t_k, \lambda, \varphi)$ and $f^{k+1} = f(t_{k+1}, \lambda, \varphi) = f$. In what follows the superscript $k+1$ in difference expressions will be omitted where this does not lead to ambiguity. The floor friction is given in the form $R_f = r_*|\mathbf{U}^k|/H$. Finally we come to the following statement.

Problem 1. Let $\xi_{obs}$ be given on $\Gamma_0$, and let the function $d$ be unknown on $\Gamma_2$ and equal to zero on $\Gamma_1$. Find $u$, $v$, $\xi$, $d$ which satisfy the system (5), the initial data (4), the boundary condition (6), and the closure condition (3).
3 The Weak Formulation of the Problem
For real vector-functions $\Phi = (u, v, \xi)$, $\hat\Phi = (\hat u, \hat v, \hat\xi) \in \mathbf{L}^2(\Omega_{R_E}) = (L^2(\Omega_{R_E}))^3$ we introduce the inner product [3]

$$(\Phi, \hat\Phi) = R_E^2\int_\Omega \sin\varphi\,\big[H(u\hat u + v\hat v) + g\,\xi\hat\xi\big]\,d\lambda\,d\varphi$$

and the norm $\|\Phi\| = (\Phi, \Phi)^{1/2} < \infty$. With these notations, we rewrite the problem (5a)–(5c) in the operator-vector form $L\Phi = \mathbf{F}$. Further we can formulate the Galerkin method for the system (5).
Problem 2. Let $d = d(t, \lambda, \varphi)$ be given on $\Gamma_2$. A vector-function $\Phi = (u, v, \xi) \in (L^2(\Omega_{R_E}))^2 \times W_2^1(\Omega_{R_E}) \equiv W$ is called a weak solution of the problem (5) if $\Phi^0 = (u^0, v^0, \xi^0) = (u_0(\lambda, \varphi), v_0(\lambda, \varphi), \xi_0(\lambda, \varphi)) = \Phi(0)$ and at any instant of time $t_{k+1}$, $k = 0, \ldots, K-1$, the integral identity

$$a(\Phi, \mathbf{W}) = f(\mathbf{W}) + b(d, \mathbf{W}) \qquad (7)$$

is valid for any vector-function $\mathbf{W} = (w_u, w_v, w_\xi) \in W$. Here

$$\begin{aligned}
a(\Phi,\mathbf{W}) ={}& R_E^2 \int_\Omega \sin\varphi\,\Big[\frac{1}{\tau}\big(H(u w_u + v w_v) + g\,\xi w_\xi\big) + Hl(u w_v - v w_u) + HR_f(u w_u + v w_v)\Big]\,d\lambda\,d\varphi \\
&+ R_E \int_\Omega Hg\,\Big[u\frac{\partial w_\xi}{\partial\lambda} - w_u\frac{\partial \xi}{\partial\lambda} + \sin\varphi\Big(v\frac{\partial w_\xi}{\partial\varphi} - w_v\frac{\partial \xi}{\partial\varphi}\Big)\Big]\,d\lambda\,d\varphi + \beta \int_\Gamma m_2\, g\sqrt{gH}\;\xi\, w_\xi\,d\Gamma,
\end{aligned}$$

$$f(\mathbf{W}) = R_E^2 \int_\Omega \sin\varphi\,\big[H(f_1 w_u + f_2 w_v) + g f_3 w_\xi\big]\,d\lambda\,d\varphi + \frac{1}{\tau} R_E^2 \int_\Omega \sin\varphi\,\big[H(u^k w_u + v^k w_v) + g\,\xi^k w_\xi\big]\,d\lambda\,d\varphi,$$

$$b(d, \mathbf{W}) = \beta \int_\Gamma m_2\, g\sqrt{gH}\; d\, w_\xi\,d\Gamma.$$
We obtain identity (7) using the equality $(L\Phi, \mathbf{W}) = (\mathbf{F}, \mathbf{W})$ and the boundary condition (6). In [3] it was proved that at any instant $t_k$, $k = 1, \ldots, K$, the problem (5) has a weak solution; an a priori estimate for the solution of problem (5) at any instant $t_k$ is obtained in the same paper. Notice that the boundary conditions (6) are natural for the problem (5). Consider the bilinear forms $a(\Phi, \cdot)$ and $b(d, \cdot)$ with any fixed vector-functions $\Phi \in W$ and $d \in L^2(\Gamma_2)$ as bounded linear functionals. Then identity (7) can be rewritten in the form of an operator equation in the space $W^*$ [2]:

$$A\Phi = \tilde F + Bd \qquad (8)$$

with operators $A : W \to W^*$ and $B : L^2(\Gamma_2) \to W^*$ and a right-hand side function $\tilde F$ induced by the inner product with $\mathbf{F}$. Identity (3) can also be written as

$$C\Phi = \xi_{obs} \qquad (9)$$

with an operator $C : W \to W_2^{1/2}(\Gamma_0)$. Since the space $W_2^{1/2}(\Gamma_0)$ is compactly embedded in $L^2(\Gamma_0)$ [2], the problems (8), (9) (or (7), (3)) may be ill-conditioned. For their solution let us use the approach suggested in [2].
4 The Adjoint Problem
Using integration by parts and the boundary conditions (6) we obtain the following identity, which is true for any vector-function $\hat\Phi = (\hat u, \hat v, \hat\xi)^T \in (L^2(\Omega))^2 \times W_2^1(\Omega)$:

$$(L\Phi, \hat\Phi) = (\Phi, L^*\hat\Phi) - \int_\Gamma g\,\xi\,\big(H\hat U_n - \beta m_2\sqrt{gH}\,\hat\xi\big)\,d\Gamma - b(d, \hat\Phi). \qquad (10)$$

Here $\hat U_n = \hat{\mathbf{U}}\cdot\mathbf{n}$, $\hat{\mathbf{U}} = (\hat u, \hat v)^T$, and $L^*$ is the operator formally adjoint to $L$. The adjoint problem has the following form:

$$\frac{1}{\tau}\hat u + R_f\hat u + l\hat v + mg\frac{\partial\hat\xi}{\partial\lambda} = \tilde F_1 \quad\text{in } \Omega, \qquad (11a)$$
$$\frac{1}{\tau}\hat v + R_f\hat v - l\hat u + ng\frac{\partial\hat\xi}{\partial\varphi} = \tilde F_2 \quad\text{in } \Omega, \qquad (11b)$$
$$\frac{1}{\tau}\hat\xi - m\left[\frac{\partial}{\partial\lambda}(H\hat u) + \frac{\partial}{\partial\varphi}\Big(\frac{n}{m}H\hat v\Big)\right] = \tilde F_3 \quad\text{in } \Omega, \qquad (11c)$$
$$-H\hat U_n + \beta m_2\sqrt{gH}\,\hat\xi = p \quad\text{on } \Gamma, \qquad (12)$$

where $\tilde{\mathbf{F}} = (\tilde F_1, \tilde F_2, \tilde F_3)^T$ and $p$ are given. At that, for classical solutions of the direct and adjoint problems (5)–(6) and (11)–(12) the following identity holds:

$$(\mathbf{F}, \hat\Phi) + R_E\, g \int_\Gamma m_2\sqrt{gH}\; d\,\hat\xi\, ds = (\Phi, \tilde{\mathbf{F}}) + R_E\, g \int_\Gamma \xi\, p\, ds. \qquad (13)$$
+ RE g ξp ds. (F, Φ) (13) Γ
5 Optimal Control Problem
To solve Problem 1, consider the following optimal control problem [2].

Problem 3. Let some $\alpha \ge 0$ and $\xi_{obs}$ on $\Gamma_0$ be given. Find $\Phi_\alpha = (u_\alpha, v_\alpha, \xi_\alpha)$ and $d_\alpha$ satisfying the system $L\Phi_\alpha = \mathbf{F}$ in $\Omega$, the initial data (4), the boundary condition

$$H U_{\alpha,n} + \beta m_2\sqrt{gH}\,\xi_\alpha = m_2\sqrt{gH}\,d_\alpha \quad\text{on } \Gamma$$

and minimizing the cost functional

$$J_\alpha(d_\alpha, \xi_\alpha(d_\alpha)) = \frac{1}{2}\,g\left(\alpha\int_\Gamma m_2\sqrt{gH}\,d_\alpha^2\,ds + \int_\Gamma m_0\sqrt{gH}\,(\xi_\alpha - \xi_{obs})^2\,ds\right). \qquad (14)$$

The solution $d_\alpha$ of Problem 3 satisfies a system of variational equations of the form

$$A\Phi_\alpha = \tilde F + B d_\alpha, \qquad A^*\hat\Phi_\alpha = J'_{\alpha,\Phi}(d_\alpha, \Phi_\alpha), \qquad J'_{\alpha,d}(d_\alpha, \Phi_\alpha) + B^*\hat\Phi_\alpha = 0,$$

where $J'_{\alpha,d}$ and $J'_{\alpha,\Phi}$ denote the derivatives of $J_\alpha$ with respect to $d$ and $\Phi$, respectively. The Euler equations for the first variations produce the following problem with a parameter:

$$L\Phi_\alpha = \mathbf{F} \ \text{in } \Omega, \qquad H U_{\alpha,n} + \beta m_2\sqrt{gH}\,\xi_\alpha = m_2\sqrt{gH}\,d_\alpha \ \text{on } \Gamma, \qquad (15a)$$
$$L^*\hat\Phi_\alpha = 0 \ \text{in } \Omega, \qquad -H\hat U_{\alpha,n} + \beta m_2\sqrt{gH}\,\hat\xi_\alpha = m_0\sqrt{gH}\,(\xi_\alpha - \xi_{obs}) \ \text{on } \Gamma, \qquad (15b)$$
$$\alpha m_2 d_\alpha + m_2\hat\xi_\alpha = 0 \ \text{on } \Gamma. \qquad (15c)$$

To find the solutions $u^{k+1}$, $v^{k+1}$, $\xi^{k+1}$ and $d^{k+1}$ at the $k$-th time level, we shall use the following iterative algorithm.

1. Take some $d_\alpha^{(0)}$ on $\Gamma_2$; here and later in the algorithm description the upper index in parentheses shows the iteration number. Let $u_\alpha^{(0)} = u^k$, $v_\alpha^{(0)} = v^k$, $\xi_\alpha^{(0)} = \xi^k$.
2. While

$$\frac{\Big(\int_\Gamma m_0\sqrt{gH}\,\big(\xi^{(l)} - \xi_{obs}\big)^2\,d\Gamma\Big)^{1/2}}{\Big(\int_\Gamma m_0\sqrt{gH}\,\xi_{obs}^2\,d\Gamma\Big)^{1/2}} \ \ge\ \varepsilon, \qquad (16)$$

where $\varepsilon$ is a given accuracy, the next iteration step is executed; otherwise the process is finished.
2.1. From the direct problem (15a), $u_\alpha^{(l)}$, $v_\alpha^{(l)}$, $\xi_\alpha^{(l)}$ are found (the index $k+1$ is omitted).
2.2. Using $\xi_\alpha^{(l)}$ in the right-hand side of the adjoint problem (15b), we find $\hat u_\alpha^{(l)}$, $\hat v_\alpha^{(l)}$, $\hat\xi_\alpha^{(l)}$.
2.3. Now (15c) is used for the iterative correction of $d_\alpha^{(l)}$:

$$d_\alpha^{(l+1)} = d_\alpha^{(l)} - \gamma_l\big(\alpha d_\alpha^{(l)} + \hat\xi_\alpha^{(l)}\big). \qquad (17)$$

2.4. Go to point 2 with the new $d_\alpha^{(l)} = d_\alpha^{(l+1)}$, $l = l + 1$.

After some number $M$ of iterations we obtain $u^{k+1} \approx u_\alpha^{(M)}$, $v^{k+1} \approx v_\alpha^{(M)}$, $\xi^{k+1} \approx \xi_\alpha^{(M)}$.
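The correction step (17) is a gradient-type update. Its behaviour can be illustrated on a toy linear model of our own construction, unrelated to the shallow-water operators: let $\xi(d) = Gd$ for a fixed matrix $G$ with orthonormal columns, let the "adjoint" supply $\hat\xi = G^T(\xi(d) - \xi_{obs})$, and iterate $d \leftarrow d - \gamma(\alpha d + \hat\xi)$:

```python
import numpy as np

rng = np.random.default_rng(0)
G, _ = np.linalg.qr(rng.random((8, 5)))   # toy forward map with orthonormal columns
d_true = rng.random(5)
xi_obs = G @ d_true                        # exact observations (the alpha = 0 case)

alpha, gamma = 0.0, 0.5
d = np.zeros(5)
for _ in range(100):
    xi = G @ d                             # "direct problem"
    xi_hat = G.T @ (xi - xi_obs)           # "adjoint problem" output
    d = d - gamma * (alpha * d + xi_hat)   # correction step (17)

print(np.linalg.norm(d - d_true))          # -> essentially zero
```

With exact observations and $\alpha = 0$ the iteration recovers the unknown boundary data exactly; a positive $\alpha$ would bias the limit toward smaller $d$, which is the regularizing effect of the first term in (14).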
6 The Construction of a Discrete Analogue
To pass on to the discrete problem, we consider a consistent triangulation $T = \{\omega_i\}_{i=1}^{N_{el}}$ of the domain $\Omega$, which consists of nondegenerate triangles with rectilinear sides in $\lambda$- and $\varphi$-coordinates and covers the domain $\Omega$. Consistency means that any side of a triangle either is a boundary side belonging to a single triangle, or is common to two neighboring triangles which have no other common interior points. In the general case the grid may be unstructured. Let $\bar\Omega_h$ be the set of nodes (i.e., of vertices of the triangular elements), their number being $N_{nd}$, and denote by $\Omega_h$ the set of interior nodes. The boundary $\Gamma_h$ is subdivided into the set of segments which are boundary sides of the triangles $\omega_i$: $\Gamma_h := \{s_j : s_j \in \omega_i \cap \Gamma,\ i = 1, 2, \ldots, N_{el},\ j = 0, 1, 2\}$. For any node $z_j \in \bar\Omega_h$ we introduce the basis function $\Psi_j(\lambda, \varphi)$ which is equal to one at $z_j$, vanishes at all other nodes of $\bar\Omega_h$, and is linear on each triangle. Denote the span of these functions by $H_h(\Omega_h) = \mathrm{span}\{\Psi_j\}_{j=1}^{N_{nd}}$. For real vector-functions $\Phi_h = (u^h, v^h, \xi^h)$, $\hat\Phi_h = (\hat u^h, \hat v^h, \hat\xi^h) \in \mathbf{H}_h(\Omega_h) \equiv (H_h(\Omega_h))^3$ we consider the inner product

$$(\Phi_h, \hat\Phi_h)_h = \sum_{i=1}^{N_{el}} \frac{S_i}{3} \sum_{j=0}^{2} R_E^2 \sin(\varphi_{ij})\,\big[H_{ij}(u_{ij}\hat u_{ij} + v_{ij}\hat v_{ij}) + g\,\xi_{ij}\hat\xi_{ij}\big]$$
and the norm

$$\|\Phi_h\|_h = (\Phi_h, \Phi_h)_h^{1/2} = R_E\left(\sum_{i=1}^{N_{el}} \frac{S_i}{3}\sum_{j=0}^{2} \sin(\varphi_{ij})\,\big[H_{ij}(u_{ij}^2 + v_{ij}^2) + g\,\xi_{ij}^2\big]\right)^{1/2}. \qquad (18)$$

Here $S_i$ denotes the area of the $i$-th triangular element, whose vertices are numbered 0, 1, 2; hence $f_{ij} = f(\lambda_{ij}, \varphi_{ij})$ is the value of a function at the $j$-th vertex of the $i$-th element. We formulate the Bubnov–Galerkin problem in the following way.

Problem 4. At a fixed instant of time and for a given boundary function $d$, find a vector-function $\Phi_h = \big(u^h(\lambda, \varphi), v^h(\lambda, \varphi), \xi^h(\lambda, \varphi)\big)$, where

$$u^h(\lambda, \varphi) = \sum_{j=1}^{N_{nd}} \alpha_j^u \Psi_j(\lambda, \varphi), \quad v^h(\lambda, \varphi) = \sum_{j=1}^{N_{nd}} \alpha_j^v \Psi_j(\lambda, \varphi), \quad \xi^h(\lambda, \varphi) = \sum_{j=1}^{N_{nd}} \alpha_j^\xi \Psi_j(\lambda, \varphi), \qquad (19)$$
such that the identity

$$a^h(\Phi_h, \mathbf{W}_h) = f^h(\mathbf{W}_h) \qquad (20)$$

is valid for all $\mathbf{W}_h = (w_u^h, w_v^h, w_\xi^h) \in \mathbf{H}_h$. Here we use the following notations for the bilinear and linear forms:

$$\begin{aligned}
a^h(\Phi_h, \mathbf{W}_h) ={}& \sum_{i=1}^{N_{el}} \frac{S_i}{3}\sum_{j=0}^{2} \frac{1}{\tau}\,R_E^2 \sin(\varphi_{ij})\,\big[H_{ij}(u_{ij}w^u_{ij} + v_{ij}w^v_{ij}) + g\,\xi_{ij}w^\xi_{ij}\big] \\
&+ \sum_{i=1}^{N_{el}} \frac{S_i}{3}\sum_{j=0}^{2} R_E^2\sin(\varphi_{ij})H_{ij}\big[l_{ij}(u_{ij}w^v_{ij} - v_{ij}w^u_{ij}) + R_{f,ij}(u_{ij}w^u_{ij} + v_{ij}w^v_{ij})\big] \\
&+ \sum_{i=1}^{N_{el}} \frac{S_i}{3}\sum_{j=0}^{2} R_E H_{ij}\, g\Big[u_{ij}\Big(\frac{\partial w_\xi}{\partial\lambda}\Big)_{ij} - w^u_{ij}\Big(\frac{\partial\xi}{\partial\lambda}\Big)_{ij}\Big] \\
&+ \sum_{i=1}^{N_{el}} \frac{S_i}{3}\sum_{j=0}^{2} R_E H_{ij}\, g\sin(\varphi_{ij})\Big[v_{ij}\Big(\frac{\partial w_\xi}{\partial\varphi}\Big)_{ij} - w^v_{ij}\Big(\frac{\partial\xi}{\partial\varphi}\Big)_{ij}\Big] \\
&+ m_2\,\beta\, g\sum_{s_i\in\Gamma_h}\frac{1}{2}\sum_{j=0}^{1} G_{ij}\sqrt{gH_{ij}}\;\xi_{ij}w^\xi_{ij},
\end{aligned}$$

$$f^h(\mathbf{W}_h) = \sum_{i=1}^{N_{el}} \frac{S_i}{3}\sum_{j=0}^{2} R_E^2\sin(\varphi_{ij})\,\big[H_{ij}(f_{1,ij}w^u_{ij} + f_{2,ij}w^v_{ij}) + g f_{3,ij}w^\xi_{ij}\big] + \sum_{i=1}^{N_{el}} \frac{S_i}{3}\sum_{j=0}^{2} \frac{1}{\tau}R_E^2\sin(\varphi_{ij})\,\big[H_{ij}(u^k_{ij}w^u_{ij} + v^k_{ij}w^v_{ij}) + g\,\xi^k_{ij}w^\xi_{ij}\big],$$

$$b(d, \mathbf{W}_h) = m_2\, g\sum_{s_i\in\Gamma_h}\frac{1}{2}\sum_{j=0}^{1} G_{ij}\sqrt{gH_{ij}}\; d_{ij}\, w^\xi_{ij}.$$
In the sums over the boundary segments, the indices 1 and 2 correspond to the segment ends, and $G_{ij} = \big(\sin^2(\varphi_{ij})(\lambda_{i,2} - \lambda_{i,1})^2 + (\varphi_{i,2} - \varphi_{i,1})^2\big)^{1/2}$ is the multiplier arising from the approximation of integrals along arcs on the sphere. Note that the discrete analogue of the adjoint problem is similar to the direct one, with a change of sign in some elements of the matrix. In [4] the following a priori estimate was obtained:

$$\max_{1\le n\le K}\|\Phi^n\|_h^2 + \beta m_2\,\tau\sum_{n=1}^{K}\sum_{s_i\in\Gamma_h}\frac{1}{2}\sum_{j=0}^{1}G_{ij}\,g\sqrt{gH_{ij}}\,\big(\xi^n_{ij}\big)^2 \le C\left(\tau\sum_{n=1}^{K}\|\mathbf{f}^n\|_h^2 + \|\Phi^0\|_h^2 + \frac{1}{\beta}\, m_2\,\tau\sum_{n=1}^{K}\sum_{s_i\in\Gamma_h}\frac{1}{2}\sum_{j=0}^{1}G_{ij}\,g\sqrt{gH_{ij}}\,\big(d^n_{ij}\big)^2\right). \qquad (21)$$
Numerical Solution of Some Direct and Inverse Mathematical Problems
Note that this estimate characterizes the stability of the discrete problem with respect to the initial data and the right-hand side. In the same paper it was proved that in a subdomain with a uniform grid Problem 4 is second-order consistent. Consistency and the estimate (21) characterizing stability (provided that the growth of R_f is bounded) imply convergence of the solution of the discrete Problem 4 to that of the original problem in the norm (18) at any instant of time. This is confirmed by the numerical results for a model problem [4].
7 Numerical Experiments
A triangulation of the World Ocean is performed on the basis of the ETOPO2 bathymetric database [5]. Depending on the kind of problem, water basins which are not connected with the World Ocean or have only a slight effect on its behavior (for example, Lake Baikal, the Black Sea, the Persian Gulf) should be eliminated from the triangulation. In the numerical experiments we used a grid constructed for the part of the World Ocean of no less than 200 metres in depth. As a result, the grid covers a simply connected domain on the sphere, and the Arctic Ocean turns out to be excluded from the numerical experiments.

The triangulation¹ of the domain Ω is a grid whose greater part consists of equal isosceles triangles. Near the boundaries (the coastlines) the grid is refined with the use of similar triangular elements with similarity ratios k, k², …, k^L, where k = 1/√2, and the degree of grid refinement L specifies the ratio between the sizes of the basic triangle and the smallest one. In the numerical experiments the grid has N_nd = 4709 nodes and consists of N_el = 8357 triangular elements. The general view of the grid is shown in Fig. 1.

To study convergence, two numerical experiments were performed. In the first experiment, the regularization parameter α was set to zero and the observations were specified exactly:

$$\xi_{obs} = 5 \sin\lambda, \quad 0 \le \lambda \le 2\pi. \qquad (22)$$
Then the boundary function d was reconstructed with an accuracy of 10⁻⁹. The function d reconstructed in this way was taken as the "exact" solution d_ex. Further, using the observation data, the reconstruction of d was repeated with calculation of the relative errors of the reconstructed functions d_num and ξ_num on the boundary:

$$\delta_\alpha(d_{num}) = \frac{\left( \int_\Gamma m^2 \sqrt{gH}\,(d_{num} - d_{ex})^2\, d\Gamma \right)^{1/2}}{\left( \int_\Gamma m^2 \sqrt{gH}\, d_{ex}^2\, d\Gamma \right)^{1/2}}, \qquad (23)$$
¹ The triangulation of the World Ocean was constructed by Sergey F. Pyataev and Igor V. Kireev.
Fig. 1. The general view of the computational grid of the World Ocean (top), the form of the function d on the boundary of the World Ocean (bottom)
Fig. 2. Dependence of the uniform norms (25) and (26) of the absolute errors (left) and the relative errors (23), (24) (right) on the number of iterative steps for given ξobs in logarithmic scale
$$\delta_\alpha(\xi_{num}) = \frac{\left( \int_\Gamma m^2 \sqrt{gH}\,(\xi_{num} - \xi_{obs})^2\, d\Gamma \right)^{1/2}}{\left( \int_\Gamma m^2 \sqrt{gH}\, \xi_{obs}^2\, d\Gamma \right)^{1/2}}, \qquad (24)$$

and of the uniform norms of the absolute errors:

$$\|d_{num} - d_{ex}\|_{\Gamma^2_{h,\infty}} = \max_{(\lambda,\varphi)\in\bar\Gamma_2} |d_{num}(\lambda,\varphi) - d_{ex}(\lambda,\varphi)|, \qquad (25)$$

$$\|\xi_{num} - \xi_{obs}\|_{\Gamma^0_{h,\infty}} = \max_{(\lambda,\varphi)\in\bar\Gamma_0} |\xi_{num}(\lambda,\varphi) - \xi_{obs}(\lambda,\varphi)|. \qquad (26)$$
Table 1. The error of a solution for given ξ_obs

N iter.   δα(ξ)           δα(d)           ‖ξnum − ξobs‖Γ⁰h,∞   ‖dnum − dex‖Γ²h,∞
80        3.0738820E−02   5.2361336E−02   1.7057510E+00        1.1523140E+02
160       2.9260031E−03   8.2900883E−03   2.7979670E−01        3.4640420E+01
240       5.8040719E−04   1.3571233E−03   6.4278920E−02        8.0381380E+00
320       1.4843521E−04   1.6466810E−04   9.4516340E−03        1.1827990E+00
400       2.8617730E−05   2.4860263E−05   1.5864500E−03        1.9853800E−01
480       2.6155546E−06   6.1784307E−06   4.2215500E−04        5.2792310E−02
560       4.4225003E−07   1.2221164E−06   8.8853470E−05        1.1099180E−02
640       1.0808954E−07   3.0350951E−07   2.3036090E−05        2.8738970E−03
720       8.6498959E−08   8.9276996E−08   6.9691870E−06        8.6778040E−04
800       5.5997273E−09   1.6155911E−08   1.3422960E−06        1.6322060E−04
880       8.2276569E−10   2.2678233E−09   2.2495160E−07        2.3504180E−05
960       2.5496027E−10   3.1191134E−10   6.1124330E−08        3.2524620E−06
1040      1.6778970E−10   1.6074235E−12   3.5095420E−08        4.9959680E−10
In Fig. 2 and Table 1 the values of these norms during the iterative reconstruction process are shown.

In the second experiment the boundary function d_ex was assumed to be known (Fig. 1, bottom):

$$d_{ex} = 5 \sin\lambda, \quad 0 \le \lambda \le 2\pi. \qquad (27)$$

For the given d, an "exact" solution ξ, whose values on the boundary are taken as the observed ones ξ_obs, is obtained as a solution of the direct problem. Then, with the use of the observation data, the reconstruction of d was performed with calculation of the relative errors δα(d_num) and δα(ξ_num) of the reconstructed functions and of the uniform norms of the absolute errors ‖d_num − d_ex‖Γ²h,∞ and ‖ξ_num − ξ_obs‖Γ⁰h,∞. Since the observations were given on the whole "liquid" boundary, the reconstruction problem turned out to be well-posed and the regularization parameter was set to zero (α = 0). In Fig. 3 and Table 2 the variation of these norms during the iterative reconstruction process is shown.
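The qualitative behavior of such a twin-experiment reconstruction — a geometric decay of the error with the iteration count, as in Tables 1 and 2 — can be mimicked on a toy problem: steepest descent on a small quadratic misfit J(d) = ½‖Ad − ξ_obs‖². The matrix, step size and names below are illustrative only, not the paper's adjoint algorithm.

```c
/* steepest descent on J(d) = 0.5 * ||A d - xi||^2 for a 2x2 system:
 * d <- d - gamma * A^T (A d - xi)                                    */
void descend(const double A[2][2], double d[2], const double xi[2],
             double gamma, int steps)
{
    for (int k = 0; k < steps; ++k) {
        double r0 = A[0][0] * d[0] + A[0][1] * d[1] - xi[0];
        double r1 = A[1][0] * d[0] + A[1][1] * d[1] - xi[1];
        d[0] -= gamma * (A[0][0] * r0 + A[1][0] * r1);
        d[1] -= gamma * (A[0][1] * r0 + A[1][1] * r1);
    }
}
```

For a diagonal A with entries 2 and 1 and step gamma = 0.3, the error contracts by a fixed factor per step, so a few hundred iterations reach machine-level accuracy, mirroring the monotone columns of the tables.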
Fig. 3. Dependence of the uniform norms (25) and (26) of the absolute errors (left) and the relative errors (23), (24) (right) on the number of iterative steps for given d in logarithmic scale

Table 2. The error of a solution for given d
N iter.   δα(ξ)           δα(d)           ‖ξnum − ξobs‖Γ⁰h,∞   ‖dnum − dex‖Γ²h,∞
80        7.8331529E−03   2.4937732E−02   1.2448310E−02        1.2677810E+00
160       1.0917530E−03   3.9961972E−03   2.8769540E−03        3.5238280E−01
240       6.3962083E−05   2.7755244E−04   3.6630150E−04        4.5480490E−02
320       6.1990004E−06   2.4705682E−05   4.0076540E−05        5.0005950E−03
400       1.2272556E−06   5.5594843E−06   1.0039080E−05        1.2095830E−03
480       6.7685023E−07   1.7734513E−06   3.2954490E−06        4.0666290E−04
560       7.6661245E−08   3.0356439E−07   6.3863510E−07        7.4058330E−05
640       8.8425361E−09   1.7300899E−08   8.0891970E−08        4.4409820E−06
720       1.4224667E−09   1.4705201E−08   1.4309720E−08        3.8653620E−06
800       2.1353679E−10   2.0219172E−08   2.2125700E−09        5.3667280E−06
880       1.2657594E−10   2.0867546E−08   7.3249380E−10        5.5499760E−06
960       7.8182498E−11   2.0961868E−08   5.1581240E−10        5.5770940E−06
References

1. Marchuk, G.I., Kagan, B.A.: Dynamics of ocean flood tides. Gidrometizdat, Leningrad (in Russian) (1983)
2. Agoshkov, V.I.: Optimal control and adjoint equations methods in problems of mathematical physics. Institute of Numerical Mathematics, Russian Academy of Sciences, Moscow (in Russian) (2003)
3. Agoshkov, V.I.: Russ. J. Numer. Anal. Math. Model. 20(1), 1–18 (2005)
4. Kamenshchikov, L.P., Karepova, E.D., Shaidurov, V.V.: Russ. J. Numer. Anal. Math. Model. 21(4), 305–320 (2006)
5. ETOPO2. National Geophysical Data Center (2001), http://www.ngdc.noaa.gov/ngdc.html
Hardware Development and Impact on Numerical Algorithms U. K¨ uster and M.M. Resch High Performance Computing Center Stuttgart (HLRS), University of Stuttgart, Nobelstraße 19, 70569 Stuttgart, Germany kuester,
[email protected]
1 Introduction
Hardware changes during the last 30 years have reflected the higher and higher integration of chips. Moore's law, formulated in 1965, anticipated the progress of integration for about 10 years; but this process continued and is still ongoing. Unlike in past decades, the higher chip density is no longer accompanied by an increasing processor frequency. Because the power consumption of a chip for a fixed technology is proportional to the third power of the frequency, the frequency is de facto limited today. Chips exceeding a power dissipation of 100 Watt are not reasonable for everyday computing. With limited frequency, the progress of integration goes into multiplying the number of on-chip cores. This might lead to an agglomeration of cores with 80x86/EM64T instruction sets in architectures like Intel's dual-core Woodcrest and quad-core Clovertown or AMD's dual-core Opteron and quad-core Barcelona. Both development lines suggest a moderate evolution of the number of cores with the same rate of total performance increase as we saw before. On the other hand, we perceive the interest of graphics card vendors like Nvidia and ATI in infiltrating the high performance computing market. This forces the large processor vendors to react in a way that was not anticipated. AMD bought ATI and will try to exploit the high floating point performance of the graphics card for computing. Reaching beyond its market segment, Nvidia tries to establish the new computing platform Tesla. The Sony-IBM-Toshiba cooperation developed the Cell processor, introducing a completely new system with unusual specifications. Although this machine was developed for computer games and High Definition Television (HDTV), researchers discovered its potential for numerical computing. All these recent developments will force Intel to react in order not to lose market share. In the following we describe some common properties of the expected development. We try to analyse the consequences for well-known parallel paradigms and for some important numerical techniques with respect to exploiting the performance of future processors.

E. Krause et al. (Eds.): Comp. Science & High Perf. Computing III, NNFM 101, pp. 44–51, 2008. © Springer-Verlag Berlin Heidelberg 2008 springerlink.com
2 Hardware Development
Aggregating peak performance seems to be simple in the ongoing development. The important question is how to transfer this high potential to existing algorithms. It turns out that memory hierarchies and the interconnection of core-local memories will play an important role. We begin with an interesting but outdated concept, highlight a "many core" chip, report some essential properties of the Cell processor, collect the rumors on an Intel concept, show some properties of future NEC vector processors, and make remarks on graphics card processors.

2.1 Compaq/DEC Marvel
The Compaq/DEC Marvel machine was a shared memory system with EV-7 processors. Each processor has its own local memory, which is accessible by all other processors in a Non-Uniform Memory Access (NUMA) way. On-chip routers connect all processors in a 2D torus. Because all processors have direct access to their own memory, a perfect data distribution for domain decomposition is possible for small as well as large domains. There is no need to exchange and reload data from an external memory. The latencies in accessing the local memories are small, and the usable total bandwidth is aggregated in the same way as we experience it for distributed memory systems. The processor grid is completely symmetric. The SGI Origin and the current SGI Altix have a different architecture, but they also provide local memories in a symmetric way. The local memories aggregate to the global shared memory.

2.2 A Many Core Chip
As the number of on-chip cores increases going from 65 nm to 45 nm, 32 nm and 22 nm technology, we have to recognize the significance of on-chip communication grids connected by on-chip routers using relatively small on-chip caches. Because of size and power limitations the main memory will not be directly connected to the cores (see the Intel Polaris design). Even if stacked memory is implemented directly on top of the processor, its size will be restricted to a small part of the overall global memory (a few GB versus hundreds of GBs). This global memory will have larger latencies, and its bandwidth will not scale with the number of cores unless it becomes possible to connect enough memory banks to the inner cores via fibre lines. This design would increase the peak performance to an amazing amount, but at the expense of a decrease in the bandwidth-to-peak-performance ratio. Prefetching data by a second independent thread will be important to hide the unavoidable latencies to the main memory.

2.3 Cell Processor
The Sony-IBM-Toshiba Cell processor has nine cores on chip connected by a fast ring. Via this high-bandwidth ring all cores have access to a single memory providing 25 GB/s bandwidth. The ring connects the cores directly without memory interference. One core acts as a host processor running the operating system and dispatching tasks for the others. These other SPE cores are small vector machines delivering an aggregated single precision peak performance of 200 GFLOPs. The SPEs have their own local memory. Loading data is not implicit but explicit, by special calls to an independently working DMA engine. Operations and memory transfer are separated. Double buffering techniques have to be used to overlap data transfer and computation. This differs from the usual way of loading and storing data within an instruction.

2.4 Rumors on Intel Larrabee
Discussions and rumors on the internet about Intel's processor strategy address the Larrabee project. This might be a processor with 16 cores connected to the memory via four paths of more than 20 GB/s each. The cores themselves might have access to another, even faster memory, and are connected by a fast ring comparable to the EIB bus of the Sony-IBM-Toshiba Cell processor. The cores have vector units (SSE) claimed to be capable of eight double precision floating point operations per cycle. The cores can interact directly. The vector units deliver high performance at lower power. Each core is multithreaded with three or four threads, helping to overlap data access and calculation. The overall peak performance could be more than one teraflop at a frequency of four GHz. Several CSI rings may connect several chips of the same type. Unlike the Marvel design, the processors interact with a hierarchy of memories. Consequently, for an iterative algorithm with a large memory footprint, the data located in the nearby memories or caches have to be exchanged. This leads to contention on the memory paths with the other processors. The cache coherence strategy is questionable: the coherence traffic might waste time and scarce memory bandwidth. This on-chip high performance shared memory vector machine enforces careful programming of the data access to avoid contention on the memory paths. Decoupling of data access and calculation will be important. Remarkable is the revival of vector mechanisms in off-the-shelf machines. See also the article [1], which weighs the reasons for the use of vector capabilities.

2.5 NEC Vector Machine
NEC has been building vector machines for more than 25 years. Vector computers have been pronounced dead for more than 10 years, but they are still alive. The next NEC generation will extend their lifetime by enhancing the processor bandwidth to 256 GB/s for each of the 16 processors. A memory hierarchy may be expected to compensate for the decreasing memory-bandwidth-to-peak-performance ratio. Important is the opportunity to use the so-called vector data registers directly by programming techniques. The programmer has the chance to retain vector data for a longer time across loop boundaries to save memory bandwidth and access latencies. Explicit prefetching will be important to hide the latencies of data access when using the memory hierarchy.
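Retaining intermediate data across loop boundaries instead of streaming it through memory can be illustrated in plain C (the example and array names are ours, not from the paper): fusing two loops removes one full store and one full reload of the intermediate vector, trading memory bandwidth for register reuse.

```c
#include <stddef.h>

/* two passes: y = a*x, then s = sum(y); y travels through memory twice */
double two_pass(const double *x, double *y, double a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) y[i] = a * x[i];
    for (size_t i = 0; i < n; ++i) s += y[i];
    return s;
}

/* fused: the intermediate value a*x[i] never leaves the register file */
double fused(const double *x, double a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) s += a * x[i];
    return s;
}
```

Both functions return the same sum, but the fused form issues n fewer stores and n fewer loads, which is exactly the bandwidth saving the vector data registers are meant to enable.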
2.6 Graphic Processors
Modern graphics processors from NVIDIA or ATI are freely programmable. The frequency of these processors is small compared to the frequency of modern general purpose processors, but they provide a high amount of data parallelism. The hardware of the Nvidia graphics cards is able to handle 128 threads in 16 multiprocessors of SIMD type in a parallel way. For well-suited algorithms the machine enables even higher performance than the Cell processor. Special GDDR shared memory shows high bandwidth. This is the reason for using graphics cards for numerical simulation. The different types of memory (registers, shared, global, constant, texture) have an impact on performance. They differ in size, access latencies, bandwidths, and their accessibility by the different multiprocessors. The correct choice of the appropriate memory and the interaction between the memories is left to the user, as is the interaction with the host system. The thread model is not far from the vector model, but it is applied to the instruction-parallel operation on many instances of the same code section rather than to vectorizable loops. The optimal data access mechanisms are reminiscent of the strided access of vectorized loops. Nvidia tries to commercialize its products for high performance computing. The traditional vendors see this as an attack forcing them to use comparable hardware ideas to achieve the same performance results. The graphics engine might become part of future standard processors.
3 Parallel Paradigms

As future hardware exposes more and more parallelism to the programmer, the efficient implementation of parallel programming models will be a key feature for the efficient use of parallel algorithms. As opposed to the requirement of weak scaling of codes for today's large machines of several thousands of nodes with scaling peak performance and scaling memories, it will be more important to justify the highly integrated parallel hardware with non-scaling local memories by strong scaling for fixed-size small problems. This is a harder requirement but might be more useful. The on-chip hardware exposes high bandwidth communication with low latency in just the demanded way. A refusal of developers to implement efficient hardware barriers could be counterproductive. Will the standard parallel programming models express the potential of the hardware? The on-core latencies are small and the on-core bandwidth will be high. But will we also have implementations of MPI and OpenMP with reduced overhead for multi-core chips? The implementations must use properties like tight integration, short communication paths and the common addressing.

3.1 MPI
MPI was developed for distributed memory machines. It assumes distinct local address spaces for the different processes. MPI may simply be mapped onto a grid of cores, but MPI involves a lot of processing overhead. Whereas in the on-core communication network the transfer of data packets may cost several tens of cycles, the protocol could need hundreds because of MPI's calling hierarchy and the transfer of unneeded information. If there were an efficient on-core implementation, the question occurs how to get an efficient implementation for the coupling with other nodes in the usual distributed way. MPI communicators and groups could help in mapping cores, sockets and nodes to the application.

3.2 OpenMP
OpenMP is a shared memory parallelization model using user-defined directives for the description of parallel program regions and parallel loops. Variables and arrays can be declared PRIVATE to a thread or SHARED. This model fits well to the multicore architectures because of the shared address space and the close interaction of the cores. The model could be generalized even to heterogeneous cores; in this case extensions of OpenMP could provide interface specifications to run calls on a specialized core. But OpenMP does not allow for a distinction between local and hierarchical memories. It does not allow for a separation of data beyond the differentiation into PRIVATE and SHARED data. No control is provided for the mapping of a thread to the grid or topology of cores.

3.3 Co-Array Fortran and UPC
Co-Array Fortran and UPC are relatively simple programming languages providing an additional array index for the localization of data and operations on a specific processor. This additional array index allows accessing all partitioned data on every processor. Some additional intrinsic calls allow for partial barriers and for identifying processor numbers. These models can be used on shared memory systems as well as on distributed memory systems. The explicit indexing of the shared data controls the data locality. The index also allows the addressing of cores as part of a core array.

3.4 Vectorization
Vectorization and SIMDization will be one of the hardware's offers for getting performance, and programming techniques have to keep this in mind. The base constructs will be arrays, as they are well known in traditional vector computing. Algorithms have to be vectorizable. If loops are vectorizable, then the compiler is able to generate vectorized code; compiler knowledge on this subject has grown over decades. But the programmer must expose loops in a vectorizable way.
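A minimal C illustration of exposing a loop to the vectorizer (the functions and the usual compiler flags such as gcc -O3 are our example, not prescribed by the paper): unit stride, no aliasing via `restrict`, and no loop-carried dependence make the first loop vectorizable, while the recurrence in the second defeats SIMD code generation as written.

```c
#include <stddef.h>

/* vectorizable: independent iterations, restrict rules out aliasing */
void daxpy(size_t n, double a,
           const double *restrict x, double *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* not vectorizable as written: y[i] depends on y[i-1] */
void prefix_sum(size_t n, double *y)
{
    for (size_t i = 1; i < n; ++i)
        y[i] += y[i - 1];
}
```

Rewriting recurrences like the second loop (e.g. as a tree reduction) is precisely the kind of restructuring the text asks of the programmer.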
4 Implications for Numerical Algorithms

All new hardware technologies to be expected in the next years are well known from the past. The limitation of the frequency by electrical power consumption forces reliance on SIMD, multithreading and vectorization techniques for performance increase, where the pure multiplication of cores seems not to be enough. This is a conservative but reasonable approach. The limitation of the memory bandwidth and the large memory access time enforce at the same time memory and cache hierarchies in all modern architectures. The programmer has to recognize the need for SIMDization and vectorization. Some tests and considerations of numerical base algorithms can be found in [2].

4.1 Fast Fourier Transform
FFT can be formulated in different ways so as to benefit from all these kinds of architectural elements. FFT allows implementations which have a good computational intensity, and at the same time it has the advantage of being called through simple and clear interfaces. Libraries with optimal implementations could be built for all architectures. That makes its use appropriate for a variety of other algorithms. This applies also to various kinds of wavelet transforms and convolutions, and recommends FFT-like transforms for use in the development of new algorithms.

4.2 Dense Matrices
Dense matrix algorithms such as matrix-matrix and matrix-vector multiplication, LU decomposition of square matrices, and eigenvalue-eigenvector computations have provided peak performance on all computer architectures since the beginning. The essential reason is the high computational density, at least for large matrices. Modern architectures will also show good performance. Relevant implementations of BLAS and LAPACK will give the opportunity for widespread use. But the modern multicore design might also help in speeding up large sets of smaller dense matrix problems. In that way small-sized linear algebra could help in preconditioning large matrices, or help in inventing new methods for the purpose of increasing the computational density.

4.3 Stencil Based Computing
The solution of partial differential equations typically needs a large amount of data per grid point for only a few operations. The applied iteration schemes may use Krylov space procedures or simple explicit time stepping, Jacobi or Gauss-Seidel iterations. Multigrid smoothers, restrictions and prolongations are also of this type. Lattice-Boltzmann procedures do somewhat more operations on a larger structured neighbourhood. As long as the cores in the grid have enough local memory that the completely decomposed problem data can be stored in total, only some local data have to be exchanged; recall the Compaq/DEC Marvel system. In that case the efficiency of the system would be quite high. If the problem size increases, the global memory is needed. If the number of domains exceeds the number of cores, all the data of all local memories have to be exchanged. The performance would drop to the level at which the memory bandwidth can sustain the needed data. For example, on the Cell processor the performance would drop drastically below its potential.
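The low computational intensity is visible in a minimal stencil kernel (a 1-D Jacobi sweep for −u″ = f; the function name, grid size and spacing are illustrative): three loads and one store stand against only a handful of flops per point.

```c
/* one Jacobi sweep on the interior of a 1-D grid with spacing h;
 * boundary values are Dirichlet and simply copied over            */
void jacobi_sweep(const double *u, double *unew, const double *f,
                  double h, int n)
{
    for (int i = 1; i < n - 1; ++i)
        unew[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i]);
    unew[0] = u[0];
    unew[n - 1] = u[n - 1];
}
```

Since each sweep touches the whole grid while doing almost no arithmetic, repeated sweeps are exactly the memory-bound pattern the paragraph describes.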
Localization can be achieved by the execution of many (pseudo-)time steps on the same or a decreasing domain. The approach known as "time skewing" uses exactly those data which are already computed and are still in the local memory or in the local cache [3,4]. The technique can be applied to simple cases. For more difficult problems with numerical local time stepping and interacting programs the technique is difficult to handle; it forces an unnatural way of programming. Generalizations are possible to use the technique for multi-cores. It is possible to shift the array step by step through the array of cores, reading from memory only one hyperplane instead of volume data. This decreases the amount of bandwidth needed. Another promising alternative is to increase the work done for each node. High order finite element techniques involve many more operations with somewhat more data per grid point. Discontinuous Galerkin methods may increase the number of approximation functions. Similar considerations hold for spectral element techniques.

4.4 Sparse Matrices
Handling the sparse matrix-vector multiplication of general sparse matrices with an appropriate sparse matrix storage scheme will be severely limited by the sustainable memory bandwidth. Our own experiments on the Cell processor show the saturation of the bandwidth already by one processor. The performance is indeed larger than on standard architectures, but this behaviour will be typical. Prefetching of matrix data will help at least to saturate the memory bandwidth. It seems also to be possible to prefetch the vector for matrices with relatively small bandwidth. But this provides only unsatisfying solutions. Helpful would be a matrix-vector multiplication for a complete set of vectors, allowing for multiple use of the matrix elements and the indirection array. If matrix and vector(s) are organized in a block way, which fits the needs of many practical problems, the impact of indirection is reduced and the respective parts of the vector(s) are used multiple times.

4.5 Krylov Space Procedures
Krylov space procedures are widespread kernels for the solution of sparse linear systems. Time-determining are the matrix-vector product, global dot products and preconditioning methods. To avoid a performance breakdown in the sparse matrix-vector product, methods have to be invented which allow the usage of sets of vectors instead of a single one. Incomplete LU factorization can be applied to colored domains to reduce the penalties of the inherent recursion. Block-based or domain-based preconditioners are ideal because of their localized behaviour. Partial problems could be solved on subdomains which are mapped to cores. These subdomain solutions provide preconditioners for the underlying problem. In this way the total iteration count can be reduced. The solution of the partial problems is perfectly localized for a longer time [5].
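The "sets of vectors" idea can be sketched for a matrix in CSR storage (the layout and the name `csr_spmm` are our assumptions): each matrix entry and its column index are loaded once and reused for all nrhs vectors, which raises the computational intensity compared with nrhs separate matrix-vector products.

```c
#include <stddef.h>

/* Y(:,r) = A * X(:,r) for r = 0..nrhs-1; A in CSR (rowptr, col, val),
 * X and Y stored as nrhs consecutive vectors of length n              */
void csr_spmm(int n, int nrhs,
              const int *rowptr, const int *col, const double *val,
              const double *X, double *Y)
{
    for (int i = 0; i < n; ++i) {
        for (int r = 0; r < nrhs; ++r)
            Y[(size_t)r * n + i] = 0.0;
        for (int k = rowptr[i]; k < rowptr[i + 1]; ++k) {
            double a = val[k];   /* matrix entry loaded once ...      */
            int    j = col[k];   /* ... index reused for all vectors  */
            for (int r = 0; r < nrhs; ++r)
                Y[(size_t)r * n + i] += a * X[(size_t)r * n + j];
        }
    }
}
```

With nrhs = 1 this degenerates to the ordinary bandwidth-bound kernel; for larger nrhs the inner loop amortizes the indirection over several right-hand sides.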
4.6 Particle Based Techniques
Particle techniques with a restricted interaction radius are dynamically localized. The localization is done by arranging the particles in the cells of a trivial background grid. These methods typically require many time steps, and the resulting computing time in turn limits the number of particles. If the number of particles is small enough that they fit into the local memories or caches, many-core chips would give a perfect speedup. If they do not fit, the performance is limited by the memory bandwidth, because the time skewing trick cannot be applied if the particles are not homogeneously distributed. Vectorization must go along the particles in a cell, in contrast to vector computers with long vector lengths.
5 Conclusions
The de facto frequency limitation of modern processors will enforce an unexpected performance race by highly parallel processors. Even more than in the past, the memory systems are not able to keep step with the performance increase. Local memories and caches shall mitigate the data starvation of the algorithms. This impacts programming techniques, parallel programming paradigms and numerical algorithms. Programming is forced to develop sophisticated methods for reusing local data. Future numerical programming has to separate data movement and calculation. Big steps can be expected, but also disappointments.
References

1. Gebis, J., Patterson, D.: Computer 40(4), 68–75 (2007)
2. Williams, S., Shalf, J., Oliker, L., Kamil, S., Husbands, P., Yelick, K.: The potential of the Cell processor for scientific computing. In: Proc. of the 3rd Conference on Computing Frontiers (CF 2006), Ischia, Italy, May 3–5, 2006, pp. 9–20. ACM Press, New York (2006)
3. Harrison, I., Wonnacott, D.: An empirical study of multiprocessor time skewing. In: Proc. of the Mid-Atlantic Student Workshop on Programming Languages and Systems (MASPLAS 2003), Haverford College, April 26 (2003)
4. Frigo, M., Leiserson, C.E., Prokop, H., Ramachandran, S.: Cache-oblivious algorithms. In: Proc. of the 40th Annual Symposium on Foundations of Computer Science (FOCS 1999), October 17–18, 1999, p. 285. IEEE Computer Society Press, Washington (1999)
5. Göddeke, D., Becker, C., Turek, S.: Integrating GPUs as fast co-processors into the parallel FE package FEAST. In: Becker, M., Szczerbicka, H. (eds.) Proc. of the 19th Symposium on Simulation Technique, September 2006, pp. 277–282 (2006)
Mathematical Modeling in Application to Regional Tsunami Warning Systems Operations Yu.I. Shokin1 , V.V. Babailov1, S.A. Beisel1 , L.B. Chubarov1 , S.V. Eletsky1 , Z.I. Fedotova1 , and V.K. Gusiakov2 1
Institute of Computational Technologies SB RAS, Lavrentiev Ave. 6, 630090 Novosibirsk, Russia
[email protected],
[email protected], beisel
[email protected],
[email protected],
[email protected],
[email protected] 2 Institute of Computational Mathematics and Mathematical Geophysics SB RAS, Lavrentiev Ave. 6, 630090 Novosibirsk, Russia
[email protected]
Abstract. Catastrophic tsunamis that flooded the ocean coast in the past have taken many human lives and destroyed the infrastructure of coastal areas in the Pacific and elsewhere. The recent tragic events in the Indian Ocean motivated governments to develop new, and improve the existing, tsunami warning systems capable of mitigating the impact of catastrophic events. In the future, regional tsunami warning systems will be integrated into a network including both the systems currently being developed and the existing warning systems (in the past, tsunami warning systems have been deployed to protect the coastal areas of Japan, the USA, Russia, Australia, Chile and New Zealand). The present study, conducted by the Institute of Computational Technologies in collaboration with other research institutes in Novosibirsk, is directed at the design of a new generation of tsunami warning system for the Pacific coast of Kamchatka. The main purpose of the present study is to develop a computational methodology that allows us to build a database of potential tsunamigenic sources that pose an imminent tsunami risk for the eastern coast of the Kamchatka Peninsula. In the first stage of the project, a set of basic model sources of tsunamigenic earthquakes is defined. These model sources are used to calculate the initial water elevation in the source area, which is used as the initial condition for dynamic modeling of tsunami propagation in the selected geographical region near the Kamchatka east coast. The next stage is the modeling of the propagation and transformation of tsunami waves on their way from the source area towards the coast. This information is presented as a decision support system intended for use by persons on duty who are responsible for initiating tsunami mitigation procedures such as the evacuation of people and sending ships away from dangerous harbors and bays.
The work was supported by the RFBR (projects 05-05-64460, 06-05-72014, 07-05-13583), by integrated projects of SB RAS (28, 113), by the program of State support for research conducted by the leading scientific schools of the Russian Federation (9886.2006.9), and by INTAS 06-1000013-9236.
E. Krause et al. (Eds.): Comp. Science & High Perf. Computing III, NNFM 101, pp. 52–68, 2008. c Springer-Verlag Berlin Heidelberg 2008 springerlink.com
Mathematical Modeling in Application
The project involves the numerical solution of a large number of instances of wave hydrodynamics problems. At the same time, interpretation of the results requires non-trivial postprocessing and the development of a specialized information system. Numerical calculation of tsunami propagation in the ocean with a real bathymetry, for multiple combinations of the model earthquake sources and with high spatial resolution, constitutes the major part of the computational requirements of the project. The amount of computational resources needed for numerical modeling of tsunami propagation implies the use of high performance computers and may require adaptation of the numerical algorithms to specific computing platforms. In the future, this approach may be extended to other tsunamigenic areas in the Pacific and Indian oceans. In the latter case, we plan to consider the problem of defining the extent of the coastal zone that is subject to flooding by tsunami waves. This would require performing run-up calculations. To achieve the necessary accuracy in reproducing tsunami behavior in the coastal areas, we will need to take into account such features of the coastal area as small rivers, lakes and swamps. This will significantly increase the amount of computational resources needed, both during the development of numerical models and algorithms and in the production runs.
1 Introduction
The present article is directed at the development of a database of model tsunami sources that present a potential risk for the protected coastal sites located on the Kamchatka east coast. A complex of processing methods is suggested, and the principles, experience and results of numerical calculation of tsunami generation and propagation are discussed. The database "Tsunami zoning of Kamchatka" is designed for the storage, review and retrieval of information about tsunami events generated by the set of model tsunamigenic earthquakes, as recorded at the tide gages associated with the protected coastal locations. This database is a part of the information system and works in the user-operated mode, providing selection of a set of protected coastal locations, model earthquakes, the computational domain, the constructed computational grid, the set of parameters of the computational algorithm, and the mathematical models for computation of the initial bottom disturbance and its further transformation during tsunami wave propagation. A separate "research" component of the information system provides options for setting the parameters of the calculation area, the set of protected coastal locations, model seismic sources, mathematical models of tsunami generation and propagation, separate components of the computational algorithms, etc. This approach completely corresponds to the adopted concept of algorithmic and system support of the National Tsunami Warning System of the Russian Federation. According to this concept, mathematical modeling of tsunamis is carried out by specialists on the orders of other services, and the modeling results are passed to the customers. In the operational mode of the National Tsunami Warning System, the expected tsunami heights are proposed to be estimated by means of numerical
Yu.I. Shokin et al.
modeling of the dynamic impact of tsunami waves on the coast. To implement this methodology, additional algorithms of data approximation based on methods and approaches of artificial intelligence will be used. The coupling of a synthetic catalogue created in this way with a catalogue of historical evidence and with real-time information from the gauges of the ocean surface hydrophysical monitoring system is assumed.
2 Technologic Basis
The proposed computational methodology includes the following technological stages. First, the geographical area adjacent to the protected coast (Kamchatka Peninsula) and containing the zones of possible tsunamigenic earthquakes is determined. The computational domain "Kamchatka" and a set of protected points are confirmed (see Fig. 1). The positions of "virtual" tide gages related to the protected coastal locations are determined. These tide gages are represented by
Fig. 1. Location of the protected points on the coast of Kamchatka
a set of computational points located closest to the protected coastal locations. With that, the depths at these points are determined, and the deviations of their geographical coordinates from the coordinates of the actual protected locations are estimated. Usually these deviations do not exceed 2–3 arc minutes and may be considered quite acceptable. To estimate the spatial variability of wave behavior near a particular "virtual" tide gage, four nearby gages (2 "left" and 2 "right") are placed. The estimation is carried out by expert analysis using "external" geoinformation system software. The study of the seismotectonic features of the Kuril-Kamchatka seismic zone [4] resulted in the proposed scheme of probable locations of the possible tsunamigenic sources (Fig. 3) that present an imminent hazard for the protected villages on the Kamchatka east coast. For each model source, the initial co-seismic displacement of the ocean bottom is calculated based on the Okada formulas [8]. This system is intended for usage in the research mode. Modification of the computational algorithms is carried out to calculate the transformation of tsunami waves from the source domain up to the coast. The numerical algorithms were modified and adapted to the specific bottom topography and geometry of the coastal boundaries. Different numerical algorithms were used to evaluate the accuracy of the modeling results and the degree of their stability with respect to accounting for bottom and surface friction, the Earth's sphericity, nonlinear effects, and other factors that influence tsunami propagation. A system of control algorithms was created to organize and carry out serial calculations of tsunami wave propagation. The computational and control algorithms are also meant for realization of the "research" mode. A special data controlling system was created to adapt the basic computational output – the computed mareograms at every protected coastal location.
For each mareogram, the maximums and minimums of the water-surface elevation and the wave heights (half-sums of the maximums and minimums) are determined at every protected point for every model earthquake. The times of the maximum and minimum events and the index number of the earthquake that generated these extreme values are also determined for every protected location. After a series of extensive computational experiments, which made it possible to identify the difficulties of computational modeling and the ways of overcoming them, it was decided to make two independent series of numerical calculations – within the linear and the nonlinear shallow water models. The nondivergent version of the algorithm, having a high stability margin and approximating the equations of shallow water theory in a spherical coordinate system, was used.
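The per-mareogram processing described above can be sketched as follows. This is illustrative Python, not the project's actual utilities; the dictionary layout and the reading of "wave height" as half the max-to-min range are our assumptions.

```python
import numpy as np

def mareogram_extremes(eta, t):
    """Extremes of one computed mareogram (water-surface elevation vs. time)."""
    eta = np.asarray(eta, dtype=float)
    i_max, i_min = int(np.argmax(eta)), int(np.argmin(eta))
    return {
        "max": eta[i_max], "t_max": t[i_max],
        "min": eta[i_min], "t_min": t[i_min],
        # one plausible reading of "half-sum of maximum and minimum":
        # half the max-to-min range, i.e. the wave amplitude
        "height": 0.5 * (eta[i_max] - eta[i_min]),
    }

def worst_source(mareograms, t):
    """For one protected point, find the model earthquake giving the
    largest wave height.

    mareograms: dict {source_id: elevation array at this point's tide gage}.
    """
    stats = {src: mareogram_extremes(eta, t) for src, eta in mareograms.items()}
    worst = max(stats, key=lambda s: stats[s]["height"])
    return worst, stats
```

A call such as `worst_source({...}, t)` then yields both the index of the most dangerous model earthquake for a protected point and the full table of extremes, mirroring the content of the decision-support tables described in the text.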
3 Protected Points and Model Seismic Sources
The Kuril-Kamchatka subduction zone is a very active convergent margin between the subducting Pacific and the overriding Okhotsk plate, where destructive tsunamigenic earthquakes occur on a regular basis. Historical seismicity in this area is
Fig. 2. Historical seismicity of Kamchatka. Nearly 23,000 earthquakes that occurred from 1737 to 2007 are shown. Source of data – ITDB/WLD [6].
concentrated within a narrow strip stretching between the eastern coast of Kamchatka and the axis of the deep-water trench (Fig. 2). In the spatial distribution of this seismicity there are several local maxima concentrated near the main Kamchatka peninsulas. The largest historical earthquakes in this area are the 1737 and 1952 events with magnitude near 9, which ruptured the same segment of the arc [7] and generated Pacific-wide tsunamis. Other large submarine earthquakes that generated locally destructive tsunamis at the Kamchatka east coast occurred here in 1841, 1904 and 1923. For the study of tsunami generation, the most adequate mathematical model is the solution of a closed system of equations of the dynamic theory of elasticity, describing the oscillations of a layered elastic half-space (the model of the Earth's crust and the upper mantle) coupled with an overlying compressible liquid layer (the model of the ocean). This approach to tsunami generation was first proposed by Podyapolskiy [9] and then used by Gusiakov [5], Yamashita and Sato [11], and Ward [10]. Comer [1] has shown that, within the framework of the long-wave
approximation, the solution of the fully coupled problem of tsunami generation is equivalent to the consecutive solution of two separate problems: (1) determination of the static bottom deformation due to a buried seismic source and (2) calculation of tsunami propagation within the framework of the long wave theory in an ocean with variable depth, using the solution obtained at the first stage as the initial condition for the tsunami generation. This approach is widely applied in the numerical modeling of real historical tsunamis in different parts of the Pacific and elsewhere and, in the cases when the parameters of the seismic source are known, allows one to obtain reasonable agreement between computed and observed mareograms. In this model, the static deformation of the surface of an elastic half-space due to an internal dislocation source is calculated based on the analytical formulae obtained in (Okada, [8]) and then used as the initial condition for the program of calculation of tsunami propagation in the ocean with a real bathymetry.
Fig. 3. Spatial distribution of the model sources with magnitude Mw = 7.8. Each rectangle shows the projection onto the surface of the internal rupture with length L = 108 km and width W = 38 km.
On the basis of an analysis of the available historical data on tsunami occurrence in the Kuril-Kamchatka region, and drawing on the expertise of regional seismologists (A.A. Gusev of the Institute of Volcanology and Seismology, Petropavlovsk-Kamchatskiy, Russia), the scheme of possible locations of potential tsunami sources was created and the source parameters for each model source were selected. The seismicity pattern was approximated by a set of equally distributed model seismic sources with parameters retrieved from regional tectonic settings and past historical earthquakes. The main set of model sources consists of 73 fault planes distributed over the area of known historical earthquakes near Kamchatka (Fig. 3), each of them represented by an internal dislocation source. This source is described by six parameters: the length of the fault $L$, its width $W$, the depth of its upper edge $h_0$, the dip-slip angle $\delta$, the strike-slip angle $\lambda$, and the amount of displacement over the fault $D_0$. As a measure of the intensity of such a source, its seismic moment is used, defined as
\[
M_0 = \mu \cdot L \cdot W \cdot D_0. \tag{1}
\]
According to the regional seismotectonics [4], the most typical source mechanism for the major (M = 7.8) earthquakes in this region is the low-angle thrust along the main interface boundary in this region – the boundary between the subducting Pacific and the overriding Okhotsk plates. According to this scheme, the dip angle of the fault $\delta$ is equal to $15^\circ$, the strike-slip angle $\lambda$ is equal to $90^\circ$, and the strike direction (azimuth) is $58^\circ$. The spatial parameters of the model source are adopted as follows: length of the fault $L = 108$ km, width of the fault $W = 38$ km, and displacement $D_0 = 2.75$ m. The seismic moment of this source calculated by (1) is approximately $4.5 \cdot 10^{20}$ N$\cdot$m. According to the correlation formula (Aki, [12])
\[
M_S = (\lg M_0 - 9.0)/1.5, \tag{2}
\]
this value roughly corresponds to an earthquake with a surface wave magnitude of 7.8.
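Formulas (1) and (2) can be checked numerically for the Mw = 7.8 model source. The rigidity value μ = 4·10^10 Pa below is an assumption (the text does not state it); it is chosen as a typical crustal value and reproduces the quoted magnitude of 7.8.

```python
import math

MU = 4.0e10  # assumed crustal rigidity in Pa (not given in the text)

def seismic_moment(L, W, D0, mu=MU):
    """Seismic moment M0 = mu * L * W * D0, formula (1); SI units (m, Pa)."""
    return mu * L * W * D0

def surface_wave_magnitude(M0):
    """Aki's correlation, formula (2); M0 in N*m."""
    return (math.log10(M0) - 9.0) / 1.5

# the Mw = 7.8 model source: L = 108 km, W = 38 km, D0 = 2.75 m
M0 = seismic_moment(L=108e3, W=38e3, D0=2.75)  # ~4.5e20 N*m
Ms = surface_wave_magnitude(M0)                # ~7.8
```

Running this gives M0 ≈ 4.5·10^20 N·m and Ms ≈ 7.8, consistent with the values in the text.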
4 Computational Domain "Kamchatka"
On the basis of a preliminary analysis of historical data about tsunamis in the water area of the Kamchatka Peninsula, and in accordance with the concept of designing the set of model sources, the region extending from 150 to 177 degrees of east longitude and from 45 to 62 degrees of north latitude was taken as the base water area. The bottom and land relief of the region under consideration is presented in Fig. 4.
5 Parameters of Model Tsunamigenic Earthquakes and Corresponding Ocean Surface Disturbances
The map of the model sources location is presented in Fig. 5.
Fig. 4. Bottom relief in computational domain “Kamchatka”
Fig. 5. Map of model sources location. Diamonds mark earthquakes of magnitude 7.8, triangles of magnitude 8.1, circles of magnitude 8.4, and squares of magnitude 9.0.
Fig. 6. Ocean surface disturbances directly above underwater earthquakes of magnitude 7.8
Fig. 7. Location of initial ocean surface disturbances for model sources of magnitude 7.8
Fig. 8. Ocean surface disturbances directly above underwater earthquakes of magnitude 9.0
Ocean surface disturbances directly above the zones of model earthquakes of magnitude 7.8 are presented in Fig. 6, and the map of the locations of these initial disturbances in the computational domain "Kamchatka" is presented in Fig. 7. Ocean surface disturbances directly above the zones of model earthquakes of magnitude 9.0 are presented in Fig. 8, and the map of the locations of these initial disturbances in the computational domain "Kamchatka" is presented in Fig. 9.
6 Algorithms
The set of computational algorithms is divided between two modules: the module for calculation of the initial ocean surface disturbance generated by a model tsunamigenic earthquake, and the module for calculation of tsunami wave propagation from the given initial disturbance. To carry out serial calculations, the special console utility serial_calc.exe was developed, which is not included in the package of programs supporting the user-operated mode. The functionality carried out by this utility will be provided by a set of utilities supporting the "research" operating mode. It should be noted that the creation of a production version of this utility required additional careful adjustment of the computational algorithms, and thus its development started in advance, at a stage preceding the design and realization of the database for the storage of results. Such an approach was also justified because carrying out the calculations turned out to be quite a lasting process. At this point, a special computational utility source.exe, meant for calculation of the initial ocean surface disturbance, and a utility MassGlobalCalcul.exe, for modeling tsunami wave transformation from the zone of the initial disturbance up to the protected points on the coast, were also used.
Fig. 9. Location of initial ocean surface disturbances in case of model sources with 9.0 magnitude
The authors' long-term experience in solving fundamental and applied tsunami problems has shown that the classical shallow water equations are sufficient to determine the maximum wave heights in the vicinity of the protected points (up to the 5-meter isobath). These equations are characterized by sets of "physical", "geographical" and "mathematical" parameters. Physiographic parameters include the initial data
(initial free surface disturbance), boundary (structure and coastal) configuration, boundary conditions, bathymetry, bottom surface roughness, wind friction, turbulent exchange coefficients, etc. Mathematical parameters arise in the course of computational algorithm construction. The computation module of the system implements an algorithm of calculation of tsunami propagation with account of two types of boundary conditions – reflection at a vertical impermeable boundary and water discharge to the outside of the computational domain. The algorithm is based on a two-step MacCormack scheme [2,3]. The large size of the computational domain required using a geographical reference system, wherein the linear equations of shallow water with account of the Coriolis force and bottom friction have the following form:
\[
\frac{\partial h}{\partial t}+\frac{1}{R\cos\varphi}\left(\frac{\partial(Hu)}{\partial\lambda}+\frac{\partial(Hv\cos\varphi)}{\partial\varphi}\right)=0,\qquad
\frac{\partial u}{\partial t}+\frac{g}{R\cos\varphi}\,\frac{\partial\eta}{\partial\lambda}=f_1,\qquad
\frac{\partial v}{\partial t}+\frac{g}{R}\,\frac{\partial\eta}{\partial\varphi}=f_2. \tag{3}
\]
Here $R$ is the mean radius of the Earth, $\varphi$ is latitude, $\lambda$ is longitude, $t$ is time, $h=H+\eta$ is the total depth, $H$ is the depth of the undisturbed water surface, $\eta$ is the water surface elevation, $g$ is the gravity acceleration, $u$ and $v$ are the velocity vector components in the directions of $\lambda$ and $\varphi$ respectively,
\[
f_1 = lv - gk^2 u\,\frac{\sqrt{u^2+v^2}}{h^{4/3}},\qquad
f_2 = -lu - gk^2 v\,\frac{\sqrt{u^2+v^2}}{h^{4/3}},\qquad
l = 2\omega\sin\varphi,
\]
where $\omega$ is the angular velocity of the Earth and $k$ is the roughness coefficient. The nonlinear equations of shallow water on a sphere can be written in the following form:
\[
\frac{\partial h}{\partial t}+\frac{1}{R\cos\varphi}\left(\frac{\partial(hu)}{\partial\lambda}+\frac{\partial(hv\cos\varphi)}{\partial\varphi}\right)=0,
\]
\[
\frac{\partial u}{\partial t}+\frac{1}{R}\left(\frac{1}{2\cos\varphi}\frac{\partial u^2}{\partial\lambda}+v\,\frac{\partial u}{\partial\varphi}\right)+\frac{g}{R\cos\varphi}\,\frac{\partial\eta}{\partial\lambda}=f_1,\qquad
\frac{\partial v}{\partial t}+\frac{1}{R}\left(\frac{u}{\cos\varphi}\frac{\partial v}{\partial\lambda}+\frac{1}{2}\frac{\partial v^2}{\partial\varphi}\right)+\frac{g}{R}\,\frac{\partial\eta}{\partial\varphi}=f_2. \tag{4}
\]
Considering the bounded domain $\Omega(\lambda,\varphi)=\{(\lambda,\varphi):\ \lambda_1\le\lambda\le\lambda_2,\ \varphi_1\le\varphi\le\varphi_2\}$ in the plane of geographical coordinates $\varphi$ and $\lambda$, we introduce on it the constant rectangular mesh $\bar\Omega=\{(\lambda_i,\varphi_j):\ \lambda_1\le\lambda_i\le\lambda_2,\ \varphi_1\le\varphi_j\le\varphi_2,\ 0\le i\le N_\lambda,\ 0\le j\le N_\varphi\}$ with steps $\Delta\lambda$ and $\Delta\varphi$ in the spatial variables $\lambda$ and $\varphi$ respectively. Let $\tau^n=t^{n+1}-t^n$ be the time step. In terms of mesh functions depending on the discrete variables $\lambda_i$, $\varphi_j$, $t^n$ (intermediate predictor values are marked with a tilde), the explicit two-step finite-difference scheme approximating the linear model (3) at the internal nodes of the grid $\bar\Omega$ has the following form.

Step 1:
\[
\frac{\tilde h_{ij}-h^n_{ij}}{\tau^n}+\frac{1}{R\cos\varphi_j}\left(\frac{H_{ij}u^n_{ij}-H_{i-1j}u^n_{i-1j}}{\Delta\lambda}+\frac{\cos\varphi_j H_{ij}v^n_{ij}-\cos\varphi_{j-1}H_{ij-1}v^n_{ij-1}}{\Delta\varphi}\right)=0,
\]
\[
\frac{\tilde u_{ij}-u^n_{ij}}{\tau^n}+\frac{g}{R\cos\varphi_j}\,\frac{\eta^n_{ij}-\eta^n_{i-1j}}{\Delta\lambda}=f^n_{1,ij},\qquad
\frac{\tilde v_{ij}-v^n_{ij}}{\tau^n}+\frac{g}{R}\,\frac{\eta^n_{ij}-\eta^n_{ij-1}}{\Delta\varphi}=f^n_{2,ij}, \tag{5}
\]
where
\[
f^n_{1,ij}=l_j v^n_{ij}-gk^2 u^n_{ij}\,\frac{\sqrt{(u^n_{ij})^2+(v^n_{ij})^2}}{(h^n_{ij})^{4/3}},\qquad
f^n_{2,ij}=-l_j u^n_{ij}-gk^2 v^n_{ij}\,\frac{\sqrt{(u^n_{ij})^2+(v^n_{ij})^2}}{(h^n_{ij})^{4/3}},\qquad
l_j=2\omega\sin\varphi_j.
\]

Step 2:
\[
\frac{h^{n+1}_{ij}-(h^n_{ij}+\tilde h_{ij})/2}{\tau^n/2}+\frac{1}{R\cos\varphi_j}\left(\frac{H_{i+1j}\tilde u_{i+1j}-H_{ij}\tilde u_{ij}}{\Delta\lambda}+\frac{\cos\varphi_{j+1}H_{ij+1}\tilde v_{ij+1}-\cos\varphi_j H_{ij}\tilde v_{ij}}{\Delta\varphi}\right)=0,
\]
\[
\frac{u^{n+1}_{ij}-(u^n_{ij}+\tilde u_{ij})/2}{\tau^n/2}+\frac{g}{R\cos\varphi_j}\,\frac{\tilde h_{i+1j}-H_{i+1j}-\tilde h_{ij}+H_{ij}}{\Delta\lambda}=\tilde f_{1,ij},\qquad
\frac{v^{n+1}_{ij}-(v^n_{ij}+\tilde v_{ij})/2}{\tau^n/2}+\frac{g}{R}\,\frac{\tilde h_{ij+1}-H_{ij+1}-\tilde h_{ij}+H_{ij}}{\Delta\varphi}=\tilde f_{2,ij}, \tag{6}
\]
with
\[
\tilde f_{1,ij}=l_j\tilde v_{ij}-gk^2\tilde u_{ij}\,\frac{\sqrt{\tilde u_{ij}^2+\tilde v_{ij}^2}}{\tilde h_{ij}^{4/3}},\qquad
\tilde f_{2,ij}=-l_j\tilde u_{ij}-gk^2\tilde v_{ij}\,\frac{\sqrt{\tilde u_{ij}^2+\tilde v_{ij}^2}}{\tilde h_{ij}^{4/3}}.
\]
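To illustrate the two-step predictor–corrector structure of scheme (5)–(6), the following sketch (not from the paper; a simplified 1-D Cartesian analogue of the linear spherical scheme, with assumed reflecting walls and no friction or Coriolis terms) implements one MacCormack update with backward differences in the predictor and forward differences in the corrector:

```python
import numpy as np

g = 9.81  # gravity acceleration, m/s^2

def maccormack_step(eta, u, H, dx, dt):
    """One two-step MacCormack update for the 1-D linear shallow-water system
        d(eta)/dt + d(H u)/dx = 0,   du/dt + g d(eta)/dx = 0,
    mirroring the predictor (backward differences) / corrector (forward
    differences with averaging) structure of scheme (5)-(6)."""
    # predictor: backward differences
    eta_p, u_p = eta.copy(), u.copy()
    eta_p[1:] = eta[1:] - dt / dx * (H[1:] * u[1:] - H[:-1] * u[:-1])
    u_p[1:] = u[1:] - g * dt / dx * (eta[1:] - eta[:-1])
    # corrector: forward differences on predicted values, then averaging
    eta_c, u_c = eta.copy(), u.copy()
    eta_c[:-1] = 0.5 * (eta[:-1] + eta_p[:-1]
                        - dt / dx * (H[1:] * u_p[1:] - H[:-1] * u_p[:-1]))
    u_c[:-1] = 0.5 * (u[:-1] + u_p[:-1]
                      - g * dt / dx * (eta_p[1:] - eta_p[:-1]))
    # reflecting "vertical wall" boundaries: normal velocity is zero
    u_c[0] = u_c[-1] = 0.0
    return eta_c, u_c
```

The time step must satisfy the CFL condition dt ≤ dx/√(gH); the actual scheme of the paper works on the sphere with the metric factors and friction terms shown in (5)–(6).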
For the system of nonlinear equations (4) a similar algorithm is built.

Step 1:
\[
\frac{\tilde h_{ij}-h^n_{ij}}{\tau^n}+\frac{1}{R\cos\varphi_j}\left(\frac{h^n_{ij}u^n_{ij}-h^n_{i-1j}u^n_{i-1j}}{\Delta\lambda}+\frac{\cos\varphi_j h^n_{ij}v^n_{ij}-\cos\varphi_{j-1}h^n_{ij-1}v^n_{ij-1}}{\Delta\varphi}\right)=0,
\]
\[
\frac{\tilde u_{ij}-u^n_{ij}}{\tau^n}+\frac{1}{R\cos\varphi_j}\,\frac{(u^n_{ij})^2-(u^n_{i-1j})^2}{2\Delta\lambda}+\frac{v^n_{ij}}{R}\,\frac{u^n_{ij}-u^n_{ij-1}}{\Delta\varphi}+\frac{g}{R\cos\varphi_j}\,\frac{\eta^n_{ij}-\eta^n_{i-1j}}{\Delta\lambda}=f^n_{1,ij},
\]
\[
\frac{\tilde v_{ij}-v^n_{ij}}{\tau^n}+\frac{u^n_{ij}}{R\cos\varphi_j}\,\frac{v^n_{ij}-v^n_{i-1j}}{\Delta\lambda}+\frac{1}{R}\,\frac{(v^n_{ij})^2-(v^n_{ij-1})^2}{2\Delta\varphi}+\frac{g}{R}\,\frac{\eta^n_{ij}-\eta^n_{ij-1}}{\Delta\varphi}=f^n_{2,ij}.
\]

Step 2:
\[
\frac{h^{n+1}_{ij}-(h^n_{ij}+\tilde h_{ij})/2}{\tau^n/2}+\frac{1}{R\cos\varphi_j}\left(\frac{\tilde h_{i+1j}\tilde u_{i+1j}-\tilde h_{ij}\tilde u_{ij}}{\Delta\lambda}+\frac{\cos\varphi_{j+1}\tilde h_{ij+1}\tilde v_{ij+1}-\cos\varphi_j\tilde h_{ij}\tilde v_{ij}}{\Delta\varphi}\right)=0,
\]
\[
\frac{u^{n+1}_{ij}-(u^n_{ij}+\tilde u_{ij})/2}{\tau^n/2}+\frac{1}{R\cos\varphi_j}\,\frac{(\tilde u_{i+1j})^2-(\tilde u_{ij})^2}{2\Delta\lambda}+\frac{\tilde v_{ij}}{R}\,\frac{\tilde u_{ij+1}-\tilde u_{ij}}{\Delta\varphi}+\frac{g}{R\cos\varphi_j}\,\frac{\tilde h_{i+1j}-H_{i+1j}-\tilde h_{ij}+H_{ij}}{\Delta\lambda}=\tilde f_{1,ij},
\]
\[
\frac{v^{n+1}_{ij}-(v^n_{ij}+\tilde v_{ij})/2}{\tau^n/2}+\frac{\tilde u_{ij}}{R\cos\varphi_j}\,\frac{\tilde v_{i+1j}-\tilde v_{ij}}{\Delta\lambda}+\frac{1}{R}\,\frac{(\tilde v_{ij+1})^2-(\tilde v_{ij})^2}{2\Delta\varphi}+\frac{g}{R}\,\frac{\tilde h_{ij+1}-H_{ij+1}-\tilde h_{ij}+H_{ij}}{\Delta\varphi}=\tilde f_{2,ij}.
\]
The top and bottom boundaries of the computational domain are parallel to the equator, and the left and right ones go along meridians. Segments of the "vertical wall" boundary go through the nodes of the constant rectangular mesh in such a way that they are always parallel to the exterior of the rectangle $\bar\Omega(\lambda,\varphi)$. On the side boundaries the boundary condition is given by $u=0$, $\partial v/\partial\lambda=0$, $\partial\eta/\partial\lambda=0$ on meridians and $v=0$, $\partial u/\partial\varphi=0$, $\partial\eta/\partial\varphi=0$ on parallels. On the exterior "open" boundaries, the conditions of free passage – Sommerfeld conditions – are postulated. In the coordinates in use they are given on parallels by
\[
R\frac{\partial\eta}{\partial t}\pm c\frac{\partial\eta}{\partial\varphi}=0,\qquad
R\frac{\partial u}{\partial t}\pm c\frac{\partial u}{\partial\varphi}=0,\qquad
R\frac{\partial v}{\partial t}\pm c\frac{\partial v}{\partial\varphi}=0,
\]
and on meridians by
\[
R\frac{\partial\eta}{\partial t}\pm\frac{c}{\cos\varphi}\frac{\partial\eta}{\partial\lambda}=0,\qquad
R\frac{\partial u}{\partial t}\pm\frac{c}{\cos\varphi}\frac{\partial u}{\partial\lambda}=0,\qquad
R\frac{\partial v}{\partial t}\pm\frac{c}{\cos\varphi}\frac{\partial v}{\partial\lambda}=0,
\]
where $c=\sqrt{gh}$ is the wave propagation velocity and the choice of sign depends on the direction of the external normal to the corresponding boundary. The bottom bathymetry and land topography are fixed, are kept in the corresponding database, and are used by the computational and rendering modules. These data represent a grid function $\tilde H(\lambda_i,\varphi_j)$, $0\le i\le N_\lambda$, $0\le j\le N_\varphi$, defined at the nodes of the discrete domain $\bar\Omega$; the depth and height values are given on a constant 1-minute grid. The initial free surface disturbance is calculated by a special computational module; the initial velocities are set to zero. The time step is recalculated at every step from the stability condition. The algorithm allows solution smoothing after a certain number of steps; the parameters of this smoothing are set in a special file. The first version of the user-operated mode software has the following functionality: review of the distributions of maximum and minimum amplitudes and wave heights at a given protected point for all tsunamigenic earthquakes, and review of the distributions of maximum and minimum amplitudes and wave heights for a given tsunamigenic earthquake at all protected points. Export of these distributions to text files is possible. In view mode, for a given protected point and a given earthquake it is possible to draw the corresponding marigram.
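The Sommerfeld free-passage conditions can be discretized by upwinding along the outgoing characteristic. The sketch below is an assumed 1-D Cartesian form (the paper works in spherical coordinates with the metric factors shown above):

```python
import numpy as np

def apply_sommerfeld(q_new, q_old, c, dt, dx):
    """Discrete Sommerfeld (free-passage) condition at the two open ends of a
    1-D array q: dq/dt +- c dq/dx = 0, upwinded toward the interior so that
    outgoing waves leave the domain without reflection."""
    # right boundary: outgoing characteristic satisfies q_t + c q_x = 0
    q_new[-1] = q_old[-1] - c * dt / dx * (q_old[-1] - q_old[-2])
    # left boundary: outgoing characteristic satisfies q_t - c q_x = 0
    q_new[0] = q_old[0] + c * dt / dx * (q_old[1] - q_old[0])
    return q_new
```

In the scheme above the same update is applied separately to η, u and v after each interior step, with c = √(gh) evaluated at the boundary node.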
7 Database Structure
In the present article, "database" means a set of tables (SQL bases) and folders on a disk. Microsoft Access was chosen as the data management system, as it is well documented and meets the requirements on data access speed. This system is easy to work with and supports the SQL language, which will allow changing the SQL database easily in the future. The storage structure suggested and realized by the authors allows storing in the database information about the model tsunamigenic earthquakes, the protected points and other entities. Large-volume data (marigrams, arrays of free surface disturbances, etc.) are stored on a local disk. This allows gaining access to the information using other "external" software tools. Using a standard database management system provides appropriate speed of access to the stored information via SQL queries. As mentioned above, the database includes a set of database tables and local storages (folders). The table dt_defended_points contains the set of protected points' characteristics; dt_points contains the features of all tide gages with their coordinates; dt_sources contains the features of the model tsunamigenic earthquakes with descriptions and references to the calculated arrays of the corresponding ocean surface disturbances; dt_calc_params contains the parameters of the computational algorithms. Development and filling of this table is not finished yet; new fields will be added to the table. The table dt_mareos contains a list of references to the calculated marigrams. Finally, the table dt_maxmin_val contains the characteristics of the tsunami events at the protected points. Its content is regenerated after each calculation.
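The paper names the tables but does not list their column definitions, so the following sketch is purely illustrative: it reproduces the described table set in SQLite (rather than MS Access), and every column shown is an assumption based on the prose description.

```python
import sqlite3

# Hypothetical schema: table names follow the text, columns are assumed.
schema = """
CREATE TABLE dt_defended_points (id INTEGER PRIMARY KEY, name TEXT,
                                 lat REAL, lon REAL);
CREATE TABLE dt_points (id INTEGER PRIMARY KEY, defended_point_id INTEGER,
                        lat REAL, lon REAL, depth REAL,
                        FOREIGN KEY (defended_point_id)
                            REFERENCES dt_defended_points (id));
CREATE TABLE dt_sources (id INTEGER PRIMARY KEY, name TEXT, magnitude REAL,
                         disturbance_file TEXT);  -- e.g. a reference to a .grd
CREATE TABLE dt_calc_params (id INTEGER PRIMARY KEY, name TEXT, value TEXT);
CREATE TABLE dt_mareos (id INTEGER PRIMARY KEY, point_id INTEGER,
                        source_id INTEGER, mareogram_file TEXT);
CREATE TABLE dt_maxmin_val (point_id INTEGER, source_id INTEGER,
                            eta_max REAL, t_max REAL, eta_min REAL, t_min REAL,
                            height REAL);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(schema)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
```

The design choice the text describes — small relational tables holding metadata plus file references, with the bulky arrays kept on disk — is visible here in the `*_file` columns.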
8 Structure of Project's Folders
In the present article, "project" means a set of serial calculations with a given computational domain, protected points, model tsunamigenic earthquakes, mathematical model and computational algorithms. The project's root folder contains the following. The "Bathymetry" folder holds the bathymetry of the computational domain; it contains the file Bathymetry.grd in the binary .grd format (the format of the Surfer system). The "Database" folder contains the project's database; the file of the general database (supporting the SQL language and construction of several tables, MS Access in the present version), main_db.mdb, contains data about the calculations, their parameters, etc. The "Import" folder is meant for the import of data calculated by the console utility serial_calc into the database main_db.mdb. Before import, the following files and folders should be put in place: the folder "Input" with the files mareographs.txt and sources.txt taken from the utility folder serial_calc, and the folder "Calculation" with the files xxx_MareogramsAddCalcul.dat. The "Export" folder is meant for the export of data from main_db.mdb in the formats of other programs. Subfolders with names formed from the current date are created in this folder: "Maxmin_defended_points" contains the maximum and minimum distributions for every protected point, and "Maxmin_sources" contains the maximum and minimum distributions for every model earthquake.
Fig. 10. Window of access to information about tsunami events at the protected point and corresponding marigram
Fig. 11. Window of access to information about a tsunami event generated by a given model tsunamigenic earthquake at the protected point, the corresponding marigrams (at the right), the distribution of tsunami event characteristics (at the bottom), and information about the numeric values for the given earthquake at the given protected point (in the context window)
Fig. 12. Example of the distribution of tsunami event characteristics at a protected point for all earthquakes. The top plot corresponds to the results of the linear shallow water model, and the bottom one to the nonlinear model.
The "Modules" folder contains the project's module files; for instance, it will contain the modules for serial calculations. The "Sources" folder contains the ocean surface disturbances calculated for the model tsunamigenic earthquakes. It contains files in the binary .grd format: xxx_z.grd (the array of local disturbances in rectangular coordinates) and xxx_etta.grd (the array of disturbances over the whole domain in spherical coordinates), where xxx is the code name of the source. The "Mareograms" folder stores the files xxx_mareograms.dat (marigrams).
9 Examples of Results Presentation
The following series of figures represents the sequence of working windows arising at different stages of the user-operated mode.
References
1. Comer, R.P.: Geophys. J. R. Astr. Soc. 77(1), 29–41 (1984)
2. Eletsky, S.V.: Program system NEREUS for simulation of tsunami waves, experience of development: applicability and realization. In: Likhacheva, O.N. (ed.) Study of natural catastrophes in Sakhalin and Kuril Islands. Proceedings of the I (XIX) International Conference of Young Scientists. IMGG FES RAS, Yuzhno-Sakhalinsk (2007) (in Russian)
3. Fedotova, Z.I.: Comput. Technologies 11, Special issue, Part II, 53–63 (2006) (in Russian)
4. Gusev, A.A.: The schematic map of the source zones of large Kamchatka earthquakes of the instrumental epoch. Kompleksnye seismologicheskie i geofizicheskie issledovania Kamchatki, Petropavlovsk-Kamchatskiy (2005) (in Russian)
5. Gusiakov, V.K.: Matematicheskie problemy geofiziki. Novosibirsk, Computing Center 5 (Part I), 118–140 (1974) (in Russian)
6. ITDB/WLD, Integrated Tsunami Database for the World Ocean, Version 5.16, July 31, 2007. CD-ROM, Tsunami Laboratory, ICMMG SB RAS, Novosibirsk (2007)
7. Johnson, J.M., Satake, K.: Pure Appl. Geophysics 154, 541–553 (1999)
8. Okada, Y.: Bull. Seis. Soc. Am. 75, 1135–1154 (1985)
9. Podyapolsky, G.S.: Fizika Zemli 1, 7–24 (1968) (in Russian)
10. Ward, S.: J. Phys. Earth 28(5), 441–474 (1980)
11. Yamashita, T., Sato, R.: J. Phys. Earth 22(4), 415–440 (1972)
12. Aki, K.: Tectonophysics 13(1-4), 423–446 (1974)
Parallel and Adaptive Simulation of Fuel Cells in 3d

R. Klöfkorn(1,*), D. Kröner(1), and M. Ohlberger(2)

(1) Section of Applied Mathematics, University of Freiburg, Hermann-Herder-Straße 10, 79104 Freiburg i. Br., Germany, {robertk,dietmar}@mathematik.uni-freiburg.de
(2) Institute for Numerical and Applied Mathematics, University of Münster, Einsteinstraße 62, 48149 Münster, Germany
[email protected]
1 Introduction
In this paper we present numerical simulations for PEM (Polymer Electrolyte Membrane) fuel cells. Hereby, we focus on simulations done in 3d using modern techniques such as higher order Discontinuous Galerkin discretizations, local grid adaptivity, and parallelization including dynamic load-balancing. As a test case for the developed software we simulate the two-phase flow and the transport of species in the cathodic gas diffusion layer of the fuel cell. Therefore, from the detailed model presented in [4] we derive a simplified model problem presented in Section 2. In Section 3 one finds a few notes on the discretization schemes that were used for the simulation, including comments on adaptation and parallelization. In Section 4 the results of an adaptive, parallel simulation in 3d are presented.
2 The Reduced Model Problem
The three-dimensional, coupled, adaptive, and parallel simulation software is tested on a reduced model problem which describes the flow within the cathodic gas diffusion layer (GDL). The reduced model considers the following physical processes: – Two-phase flow with phase transition – Transport of species with reactions for O2 and H2 O. 2.1
Two-Phase Flow with Phase Transition
A PEM fuel cell operates at relatively low temperatures (around 80 degrees Celsius). Therefore, at the cathode side of the fuel cell liquid water is produced.
R. Klöfkorn was supported by the Bundesministerium für Bildung und Forschung under contract 03KRNCFR.
E. Krause et al. (Eds.): Comp. Science & High Perf. Computing III, NNFM 101, pp. 69–81, 2008. c Springer-Verlag Berlin Heidelberg 2008 springerlink.com
As a consequence, the flow in the gas diffusion layers (GDL) is modeled via two-phase flow in porous media, taking into account the phases liquid water and gas mixture, the latter consisting of the species oxygen, hydrogen, water vapor, and a rest consisting mostly of nitrogen. In the following the index $g$ denotes the gaseous phase, whereas the index $w$ denotes the liquid water phase. From the balance of the volume saturations $s_w$, $s_g$ of the two phases we get the two-phase flow in porous media (see [13,15]):
\[
\partial_t(\Phi\rho_i s_i)+\nabla\cdot(\rho_i v_i)=q_i, \tag{1}
\]
\[
v_i=-K\,\frac{k_{ri}}{\mu_i}\,(\nabla p_i-\rho_i g),\qquad i=w,g. \tag{2}
\]
Here $\rho_i$ denotes the density, $s_i$ the saturation, $v_i$ the Darcy velocity, and $p_i$ the pressure of the phase $i=w,g$ respectively, and $g$ the gravity vector. Furthermore, $K$ denotes the absolute permeability tensor, $\mu_i$ the viscosity of phase $i$, $k_{ri}$ the relative permeability of phase $i$, and $\Phi$ the porosity of the porous medium. Additionally, $q_i$ denotes the source term modeling the phase transition of phase $i$, which is defined as follows (see [13,15]):
\[
q_g:=-r_{\mathrm{phase}},\qquad q_w:=r_{\mathrm{phase}}, \tag{3}
\]
\[
r_{\mathrm{phase}}:=\begin{cases}
k_c\,\dfrac{M_g}{RT}\,\Phi\,s_g\,c_{H_2O}\,(p_{g,H_2O}-p_w^{sat}), & \text{if } p_{g,H_2O}\ge p_w^{sat},\\[4pt]
k_v\,\Phi\,s_w\,\rho_w\,(p_{g,H_2O}-p_w^{sat}), & \text{else}.
\end{cases} \tag{4}
\]
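A direct transcription of the phase-transition rate (4) can look as follows. This is a sketch: the argument names are ours, and the default constants are taken from Table 1 of the paper.

```python
R_GAS = 8.3144  # universal gas constant, J/(mol K), as in Table 1

def r_phase(p_vapor, p_sat, s_g, s_w, c_h2o, M_g, T, rho_w,
            k_c=1e6, k_v=1e-2, phi=0.7):
    """Inter-phase mass transfer rate, formula (4).

    Positive values mean condensation (vapor pressure above saturation),
    negative values mean vaporisation; q_w = r_phase, q_g = -r_phase per (3).
    """
    if p_vapor >= p_sat:   # condensation branch
        return k_c * (M_g / (R_GAS * T)) * phi * s_g * c_h2o * (p_vapor - p_sat)
    else:                  # vaporisation branch
        return k_v * phi * s_w * rho_w * (p_vapor - p_sat)
```

Note that the sign of the driving difference p_{g,H2O} − p_w^sat makes the rate change sign exactly at saturation, so the two branches join continuously at r_phase = 0.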
Thereby pg,H2 O denotes the partial pressure of the water vapor. For the description of the other physical parameters see Table 1. The following constitutive conditions close the two-phase flow system: sw + sg = 1,
pg − pw = pc (sw ).
(5)
Here pc (sw ) denotes the capillary pressure. Whereas in the liquid phase there exists only the species H2 O, in the gaseous phase we have the species k = O2 , H2 O, and R. Here, R denotes all other existing species (mostly nitrogen). As a consequence, in the gaseous phase (i = g) the transport of species has to be taken into account separately. Following [15,13] we get from the mass balances of the species in the gaseous phase the equations: ∂t (Φρg sg ck ) + ∇ · (ρg vg ck ) − ∇ · (ρg Def f (Φ, sg )∇ck ) = qgk .
(6)
Here ck denotes the mass concentration of the kth species, Def f the effective diffusion coefficient, and qgk the source term of the k-th species in the gaseous phase. As all the species together form the hole gaseous phase, we get the following constitutive condition: cO2 + cH2 O + cR = 1.
(7)
Therefore, the transport equation for c_R can be dropped, as this concentration can be calculated from the constitutive condition (7). The source terms q_g^H2O and q_g^O2 are modeled as follows:

q_g^H2O := q_g, q_g^O2 := −r_reac c_H2O. (8)
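The case distinction in the phase-transition rate (4) is easy to mistranscribe, so here is a minimal sketch of the source terms (3), (4) in Python. The parameter values are taken from Table 1; the state values in the example call (partial pressure, saturation vapor pressure, saturations, concentration, M_g) are made up for illustration only.

```python
# Parameter values from Table 1 (SI units); p_sat_w is the fitted saturation
# pressure p_w^sat(T) from Table 1, assumed to be evaluated elsewhere.
PHI = 0.7          # porosity
K_C = 1.0e6        # condensation rate [1/s]
K_V = 1.0e-2       # vaporisation rate [1/(Pa s)]
R_GAS = 8.3144     # gas constant [J/(mol K)]
RHO_W = 998.2      # liquid water density [kg/m^3]

def r_phase(p_g_h2o, p_sat_w, T, s_g, s_w, c_h2o, M_g):
    """Phase-transition rate of equation (4): condensation while the
    water-vapor partial pressure is at or above saturation, else vaporisation."""
    dp = p_g_h2o - p_sat_w
    if dp >= 0.0:  # condensation branch
        return K_C * M_g / (R_GAS * T) * PHI * s_g * c_h2o * dp
    return K_V * PHI * s_w * RHO_W * dp  # vaporisation branch (dp < 0)

# Equation (3): the phases only exchange mass, q_g + q_w = 0.
r = r_phase(p_g_h2o=2.0e4, p_sat_w=3.4e4, T=345.0,
            s_g=0.9, s_w=0.1, c_h2o=0.2, M_g=0.026)
q_g, q_w = -r, r   # here r < 0: liquid water evaporates into the gas phase
```

Note that both branches use the same pressure difference, so the rate is continuous at p_g,H2O = p_w^sat.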
Parallel and Adaptive Simulation of Fuel Cells in 3d
The equations (1) to (8) describe the two-phase flow with species transport, including phase transition and reactions, in the gas diffusion layer (GDL) of a fuel cell. See Section 4 for a detailed description of the domain for the PDEs as well as of the initial and boundary conditions.

2.2 Physical Parameters
The physical parameters in the two-phase flow equations and the transport equations are chosen from Table 1.

2.3 Global Pressure Formulation and Resulting Equations
First, the two-phase flow system (1), (2) is reformulated with s := s_w and p := p_g − π_w(s) as independent variables. Therefore, we introduce the following notation:

Table 1. Physical parameters for the model problem

parameter | symbol | value | unit
porosity | Φ | 0.7 | –
abs. permeability | K | 5 · 10⁻¹¹ | m²
rel. permeability of gas | k_rg(s) | √(1−s) (1 − (1 − (1−s)^(1/m))^m)² | –
rel. permeability of water | k_rw(s) | √s (1 − (1 − s^(1/m))^m)² | –
Van Genuchten coefficient | m | 0.95 | –
capillary pressure | p_c | p_c(s) = 5300 (1−s)^(−1/2.3) | Pa
viscosity water | μ_w | 1.002 · 10⁻³ | Pa s
viscosity gas | μ_g | 1.720 · 10⁻⁵ | Pa s
density water | ρ_w | 998.2 | kg/m³
density gas | ρ_g | ρ_g(p_g, T) = M_g p_g/(RT) | kg/m³
temperature | T | 345 | K
molar mass gas | M_g | M_g(c) = 1/(c_H2O/M_H2O + c_O2/M_air) | kg/mol
molar mass water | M_H2O | 0.018 | kg/mol
molar mass of dry air | M_air | 0.02897 | kg/mol
diffusion coefficient of water vapor in air | D_g^H2O | 0.345 · 10⁻⁴ | m²/s
effective diffusion coefficient | D_eff | D_eff(Φ, s_g) = (Φ s_g)³ D_g^H2O | m²/s
gas constant | R | 8.3144 | J/(mol K)
condensation rate | k_c | 1 · 10⁶ | 1/s
vaporisation rate | k_v | 1 · 10⁻² | 1/(Pa s)
saturation vapor pressure | p_w^sat(T) | a exp(b/T + c − dT + eT² + f ln(T)) | Pa
coefficient for p_w^sat | a | 1.00519 | –
coefficient for p_w^sat | b | −6094.4642 | –
coefficient for p_w^sat | c | 21.1249952 | –
coefficient for p_w^sat | d | 2.724552 · 10⁻² | –
coefficient for p_w^sat | e | 1.6853396 · 10⁻⁵ | –
coefficient for p_w^sat | f | 2.4575506 | –
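The fitted laws in Table 1 can be evaluated directly. The sketch below (illustrative only) computes the saturation vapor pressure at the cell temperature T = 345 K, the mixture molar mass M_g for the inlet composition used later (c_H2O = 0.2, c_O2 = 0.8), and an ideal-gas density; the gas pressure of 1 bar in the last line is an assumption for illustration, since the channel pressures are not given numerically in the text.

```python
import math

# Constants from Table 1 (SI units)
R_GAS = 8.3144                   # gas constant [J/(mol K)]
M_H2O, M_AIR = 0.018, 0.02897    # molar masses [kg/mol]

def p_sat(T):
    """Saturation vapor pressure p_w^sat(T) [Pa], coefficients a..f from Table 1."""
    a, b, c = 1.00519, -6094.4642, 21.1249952
    d, e, f = 2.724552e-2, 1.6853396e-5, 2.4575506
    return a * math.exp(b / T + c - d * T + e * T**2 + f * math.log(T))

def molar_mass_gas(c_h2o, c_o2):
    """Mixture molar mass M_g(c) from Table 1."""
    return 1.0 / (c_h2o / M_H2O + c_o2 / M_AIR)

T = 345.0                             # cell temperature [K], Table 1
M_g = molar_mass_gas(0.2, 0.8)        # inlet composition, cf. Subsection 4.2
rho_g = M_g * 1.0e5 / (R_GAS * T)     # ideal-gas density at an assumed 1 bar

print(p_sat(T))   # roughly 3.4e4 Pa at 345 K (about 72 degrees C)
```

The value of about 0.34 bar for p_w^sat at 345 K is consistent with tabulated saturation pressures of water near 72 °C, which is a useful sanity check on the coefficients a–f.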
R. Klöfkorn, D. Kröner, and M. Ohlberger
global velocity: u := v_w + v_g,
phase mobility: λ_i := k_ri/μ_i,
total mobility: λ := λ_w + λ_g,
fractional flow: f_i := λ_i/λ,
phase velocity water: v_w := f_w u + λ_g f_w K ∇p_c,
phase velocity gas: v_g := f_g u − λ_g f_w K ∇p_c,
global pressure: p := p_g − π_w(s), π_w(s) := ∫₀^s f_w(z) p_c′(z) dz + p_c(0).

Furthermore, the densities ρ_w and ρ_g are assumed to be constant and the influence of gravity is neglected. The equation for the global pressure is obtained by summing up equation (1) for i = w, g, applying the constitutive conditions (5), and inserting the above notation. Finally, we obtain the Pressure equation

−∇ · (K λ(s_w) ∇p) = 0, (9)

and the Velocity equation

u = −K λ(s_w) ∇p. (10)
Assuming that the saturation s_w is given, equation (9) can be used to calculate the global pressure, and then equation (10) to compute the global velocity. If the global pressure, the global velocity, and the species concentrations are given, then, since the density is assumed to be constant, equation (1) together with the definition of v_w yields for s_w the Saturation equation

∂_t(Φ s_w) + ∇ · (f_w(s_w)(u + λ_g(s_w) K ∇p_c(s_w))) = q_w.
(11)
With the pressure p, the velocity of the gaseous phase v_g, and the saturation of the gaseous phase s_g, the transport of species is described by inserting these three quantities into equation (6). For k = O2, H2O we obtain the Transport equations

∂_t(Φ s_g c_k) + ∇ · (v_g c_k) − ∇ · (D_eff(Φ, s_g) ∇c_k) = q_g^k.
(12)
Now the Model Problem under consideration consists of the equations (9)–(12). Suitable boundary and initial conditions will be presented in the description of the simulated test problem in Subsection 4.2. Throughout the rest of this paper, numerical simulations using this Model Problem will be presented.
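The auxiliary functions entering (9)–(11) combine the Table 1 relative permeabilities with the phase viscosities. The following sketch (illustrative only, not the DUNE implementation) builds the mobilities and fractional flows and checks the identity f_w + f_g = 1 that makes v_w + v_g = u:

```python
import math

# Constitutive laws and viscosities from Table 1
M = 0.95           # Van Genuchten coefficient
MU_W = 1.002e-3    # water viscosity [Pa s]
MU_G = 1.720e-5    # gas viscosity [Pa s]

def k_rw(s):
    """Relative permeability of water (Table 1), s = s_w."""
    return math.sqrt(s) * (1.0 - (1.0 - s**(1.0 / M))**M)**2

def k_rg(s):
    """Relative permeability of gas (Table 1)."""
    return math.sqrt(1.0 - s) * (1.0 - (1.0 - (1.0 - s)**(1.0 / M))**M)**2

def fractional_flow(s):
    """Mobilities lambda_i = k_ri/mu_i, total mobility and fractional flows."""
    lam_w, lam_g = k_rw(s) / MU_W, k_rg(s) / MU_G
    lam = lam_w + lam_g
    return lam, lam_w / lam, lam_g / lam

lam, f_w, f_g = fractional_flow(0.1)
# f_w + f_g = 1 by construction, so the capillary contributions
# +lambda_g f_w K grad(p_c) and -lambda_g f_w K grad(p_c) cancel in v_w + v_g.
```

At the initial saturation s_w = 0.1 the gas mobility dominates, since μ_g is almost two orders of magnitude smaller than μ_w.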
3 Discretization of the Model Problem and Implementation
The discretization of the Model Problem (9)–(12) uses Discontinuous Galerkin methods. With these methods, on the one hand, higher order discretizations can be achieved without increasing the stencil of the method, which is an appealing feature for parallel computations. Furthermore, no special difficulties arise when dealing with non-conforming grids. Non-conforming grids, on the other hand, have the very nice feature that the refinement zone stays local, unlike, for example, when bisection refinement is applied. This is again very useful for parallel computations. The implementation of the discretizations uses the software package DUNE [2]. DUNE has a modular structure, and in the following the DUNE modules DUNE-COMMON, DUNE-GRID, DUNE-ISTL, and DUNE-FEM have been used. A detailed description of the modules can be found on the DUNE homepage [2]. DUNE-COMMON provides basic classes. DUNE-GRID defines the abstract grid interface and provides its implementations based on several well known software packages for solving PDEs (such as ALBERTA [7], ALUGrid [1], and UG [14]). The following numerical simulations use ALUCubeGrid, which is the grid interface implementation of ALUGrid using hexahedral elements. DUNE-ISTL (see [5]) is the Iterative Solver Template Library; it provides classes for matrix-vector handling and solvers. For the solution of the pressure equations, the BCRSMatrix and the BiCG-Stab solver from DUNE-ISTL have been used. DUNE-FEM provides several implementations of discrete function spaces, such as Lagrange spaces or Discontinuous Galerkin spaces. The Discontinuous Galerkin space, i.e. the basis functions and a mapping from the local number of a degree of freedom to the global number, which is needed to store the data in vectors, has been used for the discretization of the Model Problem. Furthermore, DUNE-FEM provides mechanisms for projections of data and rearrangement of memory during adaptation. The ODE solvers used are also part of the DUNE-FEM package.

3.1 Discretization of the Pressure Equation
Equation (9) is discretized using the Discontinuous Galerkin method. In [11], a variety of Discontinuous Galerkin methods for elliptic problems of the form −Δu = f are presented and analysed. For the following numerical simulation the Oden-Baumann method has been chosen. This method has been applied to two-phase flow in porous media, for example in [9], and led to good results. The resulting linear system is stored using the block-wise compressed row storage matrices (BCRSMatrix) from DUNE-ISTL ([5]). For preconditioning, a block diagonal preconditioner is applied, and the system is solved using the BiCG-Stab solver, both implemented in DUNE-ISTL ([5]).

3.2 Discretization of the Velocity Equation
Equation (10) is discretized using the Local Discontinuous Galerkin method [12]. Let ϕ_j, j = 1, ..., N, be the vectorial basis functions of the discrete function space consisting of piecewise polynomial functions of degree p. Multiplying (10) by ϕ_j, integrating over the domain Ω, integrating by parts, and taking into account that the integral over Ω can be split into integrals over all grid cells, we get on a single cell T

∫_T λ(s)⁻¹ u · ϕ_j dx = −∫_T K p ∇ · ϕ_j dx + ∫_{∂T} K p̂ n · ϕ_j ds, ∀ j = 1, ..., N. (13)
Here we used that the saturation s_w is constant on a single cell; p̂ is a numerical flux as described in [11, Table 3.1], and n denotes the outward normal with respect to ∂T. In our computation we have chosen the LDG flux p̂ := {p} − β · [[p]], where {·} denotes the mean value of p on ∂T and [[p]] the jump in normal direction across ∂T, which is a vector parallel to the normal (see [11]). Thereby β was chosen as β := 10 |∂T_k| n, where ∂T_k is a face segment of ∂T. In these definitions we follow the notation in [11]. This is implemented using the general framework for discretizing evolution equations presented in [6]; the implementation of this framework is part of the DUNE-FEM module (see [3]).

3.3 Discretization of the Saturation Equation
The saturation equation (11) is discretized using the Local Discontinuous Galerkin approach described in [12]. Here, the Engquist-Osher flux is taken as the conservative numerical flux. Although the polynomial order of the basis functions can be chosen up to order 4, this equation is currently discretized using polynomial degree 0. The reason is that, due to the non-linearity of the flux function, higher order discretizations might be unstable if no limiter is applied. On the other hand, as the non-linearity is self-compressive, the first order method also produces a satisfactory result. For the time discretization an explicit or implicit Runge-Kutta solver of order p + 1 is applied, where p is the polynomial degree of the DG basis functions. As for the velocity equation, the discretization of the saturation equation is implemented using the general framework for discretizing evolution equations presented in [6]. The implicit ODE solvers used are part of the DUNE module DUNE-FEM.

3.4 Discretization of the Transport Equation
The transport equation is discretized in the same way as the saturation equation. The only difference is that in this case a simple linear upwind flux can be chosen as the numerical flux. Since the flux function is linear, the polynomial degree of the basis functions can be chosen larger than 0. Although higher order LDG discretizations for this type of equation are stable, one can get oscillations, which grow if the discontinuities of the velocity field are too strong. To overcome this problem, the factor β is chosen sufficiently large; it then acts as a penalty term for jumps of the velocity u in normal direction across cell boundaries. In the following numerical example the polynomial order of the DG space for the transport equations is also 0. For the time discretization an explicit or implicit Runge-Kutta solver of order p + 1 is applied, where p is the polynomial degree of the DG basis functions. Again, the discretization uses the general framework for discretizing evolution equations presented in [6], and the same implicit ODE solvers implemented in DUNE-FEM are used.

3.5 Operator Splitting
Since the equations (9) to (12) are non-linearly coupled, we apply an operator splitting to decouple the equations and solve each equation separately. Assume that the unknowns s_w and c_k, k = O2, H2O, are given. Then one time step is solved as follows:

1. For given s_w, the pressure p is calculated using equation (9).
2. For given saturation s_w and pressure p, the velocity u is calculated using equation (10).
3. For given saturation s_w, pressure p, velocity u, and concentration c, the new saturation is calculated using equation (11).
4. For given saturation s_w, pressure p, velocity u, and concentration c, the new concentration of species is calculated using equation (12).

3.6 Parallelization
The parallelization follows the single program, multiple data concept, meaning that one and the same program is executed on multiple processors. The computational domain is distributed via domain decomposition; the graph partitioner METIS has been applied to calculate these distributions. Due to the distribution of the data to multiple processes, communication between processes sharing data is necessary during the computation of the unknowns. Here the DG methods have the nice property of being local methods, which means that during the calculation only neighboring elements have to be available; for parallelization this is a very appealing feature. Furthermore, due to the discontinuity of the methods, only element data have to be communicated to neighboring cells located on other processes. All communications during the solution process are therefore interior-ghost communications.

Pressure equation: one communication for each iteration step of the BiCG-Stab solver is necessary; furthermore, each evaluation of a scalar product within the solver needs a global sum operation.
Velocity equation: one communication after the calculation of the velocity is applied.
Saturation and transport equations: one communication during each iteration step of the ODE solver is needed.
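Combining the splitting of Section 3.5 with the communication points just listed, one time step can be sketched as the following driver loop. This is purely illustrative: the four solver arguments are placeholders, not DUNE API calls, and the comments only mark where the parallel code would communicate.

```python
def time_step(s_w, c, solve_pressure, solve_velocity,
              solve_saturation, solve_transport):
    """One step of the operator splitting of Section 3.5.

    Each stage takes the quantities computed so far; in the parallel code
    every stage additionally triggers interior-ghost communication."""
    p = solve_pressure(s_w)                # (9):  one communication per
                                           #       BiCG-Stab iteration + global sums
    u = solve_velocity(s_w, p)             # (10): one communication afterwards
    s_w = solve_saturation(s_w, p, u, c)   # (11): one communication per ODE stage
    c = solve_transport(s_w, p, u, c)      # (12): likewise
    return s_w, c

# Trivial stand-in solvers, only to show the data flow through one step:
s, conc = time_step(0.1, (0.2, 0.8),
                    solve_pressure=lambda s: 0.0,
                    solve_velocity=lambda s, p: (0.0, 0.0, 0.0),
                    solve_saturation=lambda s, p, u, c: s,
                    solve_transport=lambda s, p, u, c: c)
```

The point of the sketch is the ordering: pressure before velocity, velocity before saturation and transport, exactly as in steps 1–4 above.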
3.7 Adaptivity and Load Balancing
During the computation of the unknowns p, u, s_w, and c, an error indicator is evaluated to monitor the local errors introduced by a too coarse grid.

Error indicators. For the pressure and velocity together, this indicator consists of the jump of the velocity in normal direction across cell boundaries. For the saturation equation and the transport equations we use the error indicators described in [10], where an a-posteriori error estimator is developed for advection-diffusion problems with source terms discretized by an implicit finite volume scheme. Although the theoretical proof only holds for this type of discretization, the described local error indicators also work very well for similar discretizations.

Marking strategy. Given a global adaptation tolerance, the local cell tolerance is obtained by an equidistribution strategy. Cells where the local indicator violates the local tolerance are marked for refinement. Cells where the local indicator is 10 times smaller than the local tolerance are marked for coarsening.

Adaptation and load balancing. After the marking of elements is done, a grid adaptation is performed: elements marked for refinement are refined, and elements marked for coarsening are coarsened if possible. After each adaptation step the load of each processor is checked. For each macro grid cell, the load consists mainly of the number of leaf cells that have their origin in the macro cell. If the given imbalance tolerance is violated, a re-balancing, i.e. the construction of a new well balanced distribution of the cells to the processors, is performed. This process is described in detail in [8].
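The marking strategy above is simple enough to spell out. The sketch below is our reading of it (equidistribution interpreted as dividing the global tolerance by the number of cells; the actual code may weight cells differently), with the stated factor 10 between the refinement and coarsening thresholds:

```python
def mark_cells(indicators, global_tol):
    """Equidistribution marking: each of the N cells gets the local tolerance
    global_tol / N; refine where the indicator violates it, coarsen where the
    indicator is 10 times smaller, keep otherwise."""
    local_tol = global_tol / len(indicators)
    marks = []
    for eta in indicators:
        if eta > local_tol:
            marks.append("refine")
        elif eta < local_tol / 10.0:
            marks.append("coarsen")
        else:
            marks.append("keep")
    return marks

# Three cells, global tolerance 0.3 => local tolerance 0.1 per cell:
marks = mark_cells([0.5, 0.004, 0.02], global_tol=0.3)
# 0.5 -> refine, 0.004 -> coarsen (below 0.01), 0.02 -> keep
```

The gap between the two thresholds avoids cells oscillating between refinement and coarsening on successive adaptation steps.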
4 Numerical Results
The following numerical simulations were performed on the XC4000 Linux cluster of the Scientific Supercomputing Center of the University of Karlsruhe with 16 processors. For the simulation results below, polynomial order 2 has been chosen for the discretization of the pressure and velocity equations, and polynomial order 0 for the saturation and transport equations. An implicit ODE solver of order 1 has been used to solve the saturation equation (11) and the transport equations (12). The resulting adapted mesh consists of 143580 hexahedra; the simulation took 3 days, and 14952 time steps were calculated. One time step needed approximately 22 seconds: the linear solver took about 9 seconds per time step, the solution of the saturation equation approximately 3.5 seconds, the transport equations 8.5 seconds, and the adaptation step 0.5 seconds. The remaining time was spent in other parts of the program or waiting for other processes.

4.1 Geometry of the Test Problem
As the computational domain we consider Ω := [0, 2 · 10⁻⁴] × [0, 6 · 10⁻⁴] × [0, 2 · 10⁻⁴] m³ in three space dimensions. The boundaries are defined as follows:

Γ2 := {0} × [0, 2 · 10⁻⁴] × [0, 2 · 10⁻⁴] m²,
Γ3 := {0} × [4 · 10⁻⁴, 6 · 10⁻⁴] × [0, 2 · 10⁻⁴] m²,
Γ4 := {2 · 10⁻⁴} × [4 · 10⁻⁴, 6 · 10⁻⁴] × [0, 2 · 10⁻⁴] m²,
Γ1 := ∂Ω \ (Γ2 ∪ Γ3 ∪ Γ4).

The domain Ω with boundaries Γ1, ..., Γ4 represents the GDL of a PEM fuel cell; it measures 200 μm in the x-direction and 600 μm in the y-direction. A sketch of the computational domain is shown in Figure 1.

Fig. 1. A sketch of the computational domain

4.2 Boundary Conditions and Initial Values
On the boundaries Γ1, ..., Γ4 the following boundary conditions were defined:

Γ1 (no-flow boundary, bipolar plate): two-phase flow: no-flux boundary condition; transport equation: no-flow boundary condition.
Γ2 (inflow boundary, channel): two-phase flow: s_w = 0.0, p = p_cha^1; transport equation: c_H2O = 0.2, c_O2 = 0.8.
Γ3 (outflow boundary, channel): two-phase flow: s_w = 0.0, p = p_cha^2; transport equation: outflow boundary condition.
Γ4 (catalyst layer): two-phase flow: s_w = 1.0, no-flow boundary condition for p; transport equation: no-flow boundary condition.

The following initial values have been chosen:

s_w(x, 0) = s₀(x) = 0.1 in Ω,
c(x, 0) = c₀(x) = (c₀^H2O, c₀^O2)ᵀ in Ω,

with c₀^H2O = 0.2 and c₀^O2 = 0.8.
4.3 Simulation Results
The following figures show a snapshot of the solution of the Model Problem taken at time step 14952, which corresponds to the simulation time T = 0.00025. In Figure 2 the pressure distribution is shown on the left. As expected, due to the choice of the boundary values there is a continuous pressure drop from Γ2 to Γ3. In the middle the refinement level is shown; one can see that the higher levels (red color) are located in the area where Γ2 and Γ3 are connected to Γ1, on the left side of the geometry. On the right side of Figure 2 the partitioning of the grid onto the 16 processors is shown. In Figure 3 the components of the global velocity u are shown. One can see that the grid is refined in the areas where the velocity has higher values (see Figure 2, middle); the error indicator applied to monitor the velocity errors works well. On the right side of Figure 3 the z-component of the velocity is shown, which is, as expected, close to zero. The saturation s_w at time t = 0.0001 is illustrated on the left side of Figure 4. The initial value was 0.1; one can see that phase transition has taken place in the whole area. As only dry air is entering at Γ2, the cell is slowly drying out. In the middle of Figure 4 the concentration of water in the gaseous phase c_H2O is shown. In the area where the saturation is above 0 and a phase transition takes place, the concentration of water is increased, as the vaporized water increases the concentration of water in the gas phase. As a consequence, the concentration of oxygen is decreased although no oxygen is consumed: the mass of the gaseous phase is increased by the phase transition, thereby decreasing the percentage of oxygen, i.e. the concentration c_O2. The middle and right pictures also show that the concentrations sum up to 1, as required. Figure 5 shows the same variables as Figure 4 but at the later time T = 0.00025. One can see that more of the water has been vaporized, but also that more and more dry air is covering the cathodic side of the cell. In contrast to the earlier state, one can see that water is produced due to the reaction taking place at the catalyst layer, i.e. Γ4.

Fig. 2. Pressure distribution (left), level of refinement (middle), and partitioning of the grid (right) at T = 0.00025

Fig. 3. Global velocity u: components u_x (left), u_y (middle), and u_z (right)

Fig. 4. Saturation s_w (left), concentration of water c_H2O (middle), and concentration of oxygen c_O2 (right) at computational time t = 0.0001
Fig. 5. Saturation s_w (left), concentration of water c_H2O (middle), and concentration of oxygen c_O2 (right)
5 Conclusions and Future Work
We have shown that complex time-dependent models can be treated in three space dimensions with the developed software. Simulations in 3d usually lead to a large number of unknowns; therefore, the simulation tool can be used in parallel. In order to increase the efficiency while keeping the accuracy of the simulation, local grid adaptivity is included. Future work will include the implementation of the detailed fuel cell model described in [4]. Furthermore, the use of higher order methods will be extended to the transport equations, which is under investigation right now.
References
1. ALUGrid: http://www.mathematik.uni-freiburg.de/IAM/Research/alugrid/
2. DUNE: http://www.dune-project.org
3. DUNE-FEM: http://www.mathematik.uni-freiburg.de/IAM/Research/projectskr/dune/
4. Steinkamp, K., Schumacher, J.O., Goldsmith, F., Ohlberger, M., Ziegler, C.: A non-isothermal PEM fuel cell model including two water transport mechanisms in the membrane. In: Preprint series of the Mathematisches Institut of the Universität Freiburg (2007)
5. Bastian, P., Blatt, M.: The Iterative Solver Template Library. In: Kågström, B., Elmroth, E., Dongarra, J., Waśniewski, J. (eds.) PARA 2006. LNCS, vol. 4699. Springer, Heidelberg (2007)
6. Burri, A., Dedner, A., Diehl, D., Klöfkorn, R., Ohlberger, M.: A general object oriented framework for discretizing nonlinear evolution equations. In: Shokin, Y., Resch, M., Danaev, N., Orunkhanov, M., Shokina, N. (eds.) Advances in High Performance Computing and Computational Sciences. 1st Kazakh-German Advanced Research Workshop, Almaty, Kazakhstan, September 25 – October 1, 2005. Notes on Numerical Fluid Mechanics and Multidisciplinary Design (NNFM), vol. 93. Springer, Heidelberg (2005)
7. Schmidt, A., Siebert, K.G.: Design of Adaptive Finite Element Software – The Finite Element Toolbox ALBERTA. Springer, New York (2005)
8. Dedner, A., Rohde, C., Schupp, B., Wesenberg, M.: Comput. Visual Sci. 7, 79–96 (2004)
9. Bastian, P., Rivière, B.: Discontinuous Galerkin methods for two-phase flow in porous media. In: Technical reports of the IWR (SFB 359), Heidelberg University (2004)
10. Ohlberger, M., Rohde, C.: IMA J. Numer. Anal. 22(2), 253–280 (2002)
11. Arnold, D.N., Brezzi, F., Cockburn, B., Marini, L.D.: SIAM J. Numer. Anal. 39(5), 1749–1779 (2002)
12. Cockburn, B., Shu, C.W.: SIAM J. Numer. Anal. 35(6), 2440–2463 (1998)
13. Helmig, R.: Multiphase Flow and Transport Processes in the Subsurface: A Contribution to the Modeling of Hydrosystems. Springer, New York (1997)
14. Bastian, P., Birken, K., Johannsen, K., Lang, S., Neuss, N., Rentz-Reichert, H., Wieners, C.: Comput. Visual Sci. 1, 27–40 (1997)
15. Corapcioglu, M.Y., Baehr, A.: Water Resour. Res. 23(1), 191–200 (1987)
Numerical Modeling of Some Free Turbulent Flows

G.G. Chernykh¹, A.G. Demenkov², A.V. Fomina³, B.B. Ilyushin², V.A. Kostomakha⁴, N.P. Moshkin¹, and O.F. Voropayeva¹

¹ Institute of Computational Technologies SB RAS, Lavrentiev Ave. 6, 630090 Novosibirsk, Russia ([email protected], [email protected], [email protected])
² S.S. Kutateladze Institute of Thermophysics SB RAS, Lavrentiev Ave. 1, 630090 Novosibirsk, Russia ([email protected], [email protected])
³ Kuzbass State Academy of Pedagogy, Pionerskii Ave. 13, 654066 Novokuznetzk, Russia ([email protected])
⁴ Lavrentiev Institute of Hydrodynamics SB RAS, Lavrentiev Ave. 15, 630090 Novosibirsk, Russia ([email protected])
Abstract. A series of numerical models of free turbulent flows developed by the authors is considered in this paper. These numerical models are based on modern semi-empirical models of turbulence.
1 Introduction
Free turbulent flows occur in many engineering problems and in our environment; for example, turbulent flows are observed in various technical devices and in motions within the atmosphere and the ocean. Understanding the nature of turbulence phenomena is crucial for the power producing industry and for ecological problems. The classical free turbulent flows of most common interest are almost isotropic grid-generated turbulence, jets, turbulent wakes behind bodies in homogeneous and stratified fluids, and turbulent spots in stably stratified media. In spite of the fact that isotropic turbulence has been actively studied for about 70 years, there was no detailed numerical model based on the closed Karman-Howarth equation allowing one to describe the degeneration of turbulence from the fully developed to the weak regime. Plane and axisymmetric turbulent wakes with varied values of the total excess momentum were considered in a large number of works, but there were no general numerical models which adequately describe the wake dynamics. Essentially more complicated and less studied is the problem of the dynamics of swirling turbulent wakes. An extremely prominent aspect of the evolution of turbulent wakes behind axisymmetric bodies of revolution in a stably stratified fluid is the anisotropy of their decay due to the influence of stratification. Anisotropic decay of turbulence also takes place in other free turbulent flows in stably stratified
E. Krause et al. (Eds.): Comp. Science & High Perf. Computing III, NNFM 101, pp. 82–101, 2008. © Springer-Verlag Berlin Heidelberg 2008, springerlink.com
Numerical Modeling of Some Free Turbulent Flows
83
media. Detailed numerical models of the anisotropic decay of turbulent wakes in a stably stratified fluid were absent. It is well known that the dynamics of turbulent wakes in stably stratified fluids is accompanied by an active generation of internal waves; the comparison of the parameters of the internal waves generated by the turbulent wakes of self-propelled and towed bodies in stably stratified media remains an unresolved question. Turbulent spots play an important role in the formation of the fine microstructure of hydrophysical fields in the ocean, and the mathematical models of the evolution of spots require further development. Rather interesting, and also demanding further development, is the question of the numerical simulation of passive scalar propagation in free turbulent flows. In particular, the propagation of a passive scalar from a local instantaneous source in a turbulent mixing zone in a continuously stratified fluid is of current importance. In the present paper a review of the authors' studies of the topics mentioned above during the last 10 years is presented. For brevity, a review of the literature is not given; a detailed bibliography and reviews can be found in the quoted articles.
2 Numerical Simulation of Isotropic Turbulence Dynamics
In order to study the dynamics of homogeneous isotropic turbulence we use the closed Karman-Howarth equation

∂B̃_LL/∂t̃ = (1/r̃⁴) ∂/∂r̃ [ r̃⁴ (K̃ + 2/Re_M) ∂B̃_LL/∂r̃ ], (1)

where B_LL(t, r) is the longitudinal two-point correlation function of the velocity field and B̃_LL = B_LL/U∞²; r is the distance between two points of space; t̃ = U∞ t/M = x/M, where t is the time, connected with the distance x from the grid in a wind tunnel; U∞ is the velocity of the stream in the working section of the wind tunnel or channel; r̃ = r/M, where M is the size of a grid mesh; Re_M = U∞ M/ν, where ν designates the kinematic viscosity coefficient. Equation (1) has been obtained by using the gradient type hypothesis that, according to Lytkin [1], connects the two-point correlation functions of the third order B_LL,L and of the second order:

B̃_LL,L = 2 K̃ ∂B̃_LL/∂r̃, (2)

where K̃ = æ r̃ (2[B̃_LL(t̃, 0) − B̃_LL(t̃, r̃)])^(1/2). The only empirical constant æ is defined from the relation æ = 1/(5C^(3/2)), where C is the universal Kolmogorov constant, C = 1.9. As initial condition we set a positive monotonically decreasing function ϕ(r̃), consistent with the experimental data:

B̃_LL(t̃₀, r̃) = ϕ(r̃), t̃ = t̃₀, 0 < r̃ < ∞. (3)
Fig. 1. Normalized longitudinal double correlation functions for t̃ = 40, 60, 80, 100, 120, 140, 160, 200, 220, 240 (symbols: measurements); solid lines: predictions
The boundary conditions are assumed as follows:

∂B̃_LL/∂r̃ = 0 at r̃ = 0; B̃_LL → 0 as r̃ → ∞, t̃ ≥ t̃₀. (4)
An implicit finite-difference method on a moving grid [2] is used to find the approximate solution. Predictions for the evolution of developed and weak grid turbulence have been made. As an example, calculated (continuous lines) and measured [3] normalized correlation functions are compared in Figure 1. Comparisons with measurements demonstrate good agreement in the case of developed turbulence dynamics and quite satisfactory agreement for the dynamics of weak turbulence [4]. A numerical realization of the Loitsianskii-Millionschikov asymptotic solution was carried out; the Loitsianskii invariant was constant in all numerical experiments [2].
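The closure (2) involves a single empirical constant. The short sketch below (illustrative only) evaluates æ from the Kolmogorov constant C = 1.9 and the closure coefficient K̃ for given correlation values:

```python
import math

C_KOLMOGOROV = 1.9                       # universal Kolmogorov constant C
AE = 1.0 / (5.0 * C_KOLMOGOROV**1.5)     # closure constant ae = 1/(5 C^{3/2})

def K_tilde(r, b0, br, ae=AE):
    """Closure coefficient of equation (2):
    K = ae * r * sqrt(2 [B_LL(t,0) - B_LL(t,r)])."""
    return ae * r * math.sqrt(2.0 * (b0 - br))

# Since correlations decay with separation (b0 >= br), K_tilde >= 0 and the
# effective diffusivity K + 2/Re_M in equation (1) stays non-negative.
print(AE)   # about 0.076
```

With C = 1.9 this gives æ ≈ 0.076; the value of æ fixes the rate at which the third-order correlation transfers energy across separations in (1).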
3 Plane and Axisymmetric Wakes
The numerical simulation of plane and axisymmetric wakes with varied values of total excess momentum has been performed. The mathematical model of the flow
Numerical Modeling of Some Free Turbulent Flows
85
is based on the unclosed system of the motion and continuity equations written in the thin shear layer approximation. The closure relation for this system is formulated with the help of an algebraic Reynolds stress model [5]. The numerical solution of the problem is performed with the use of a finite difference algorithm realized on moving grids. The results of the numerical experiments are in good agreement with known experimental data [6]. The asymptotic behavior of the characteristics of wakes with small total excess momentum in a homogeneous fluid has been studied. Based on a modern nonlinear approximation of the pressure-strain terms in the equations for the Reynolds stress transfer [7] and modified algebraic and differential models for the triple correlations of the velocity field [8], numerical models of the momentumless turbulent wake behind a sphere have been constructed [9,10]. The system of equations describing the turbulent flow in the wake in a homogeneous medium comprises the averaged motion equation in the far wake approximation

U∞ ∂U_d/∂x = ∂⟨u'v'⟩/∂y + ∂⟨u'w'⟩/∂z. (5)

To close equation (5) we use three semiempirical turbulence models based on the differential equations for the transport of the Reynolds stress tensor (i, j = 1, 2, 3) and the dissipation rate (hereafter summation is taken over repeated indices):

U∞ ∂⟨u_i u_j⟩/∂x = D_ij + P_ij + φ_ij − (2/3) δ_ij ε, (6)

U∞ ∂ε/∂x = ∂/∂y [ (c_s/σ₁)(e/ε) ⟨v'²⟩ ∂ε/∂y ] + ∂/∂z [ (c_s/σ₁)(e/ε) ⟨w'²⟩ ∂ε/∂z ] + c_ε1 (ε/e) P − c_ε2 ε²/e. (7)
Model 1 is one of the classical second-order models. In this model, we use simplified approximations of the third-order moments ⟨u_i u_j u_k⟩ and of the pressure-strain terms φ_ij:

D_ij = −∂⟨u_i u_j u_k⟩/∂x_k ≈ ∂/∂x_k ( c_s (e/ε) ⟨u_k u_l⟩ ∂⟨u_i u_j⟩/∂x_l ), (8)

φ_ij = −c₁ ε a_ij − c₂ (P_ij − (2/3) δ_ij P), (9)

where a_ij = (⟨u_i u_j⟩ − (2/3) δ_ij e)/e is the tensor of anisotropy, e = ⟨u_i u_i⟩/2 is the turbulent energy, and δ_ij is the Kronecker delta. This model was constructed in [11] for the description of the momentumless wake dynamics in a stratified flow. In Model 2, the diffusion term is approximated using (8), and the pressure-strain terms are approximated as follows [7]:

φ_ij = φ_ij1 + φ_ij2, φ_ij1 = −c₁ ε (a_ij + c₁′ (a_ik a_jk − (1/3) δ_ij A₂)),
φ_ij2 = −0.6 (P_ij − (2/3) δ_ij P) + 0.6 a_ij P − 0.2 B_ij1 − c₂ [A₂ (P_ij − D_ij) + 3 a_mi a_nj (P_mn − D_mn)],

B_ij1 = (⟨u_k u_j⟩ ⟨u_l u_i⟩/e)(∂U_k/∂x_l + ∂U_l/∂x_k) − (⟨u_l u_k⟩/e)(⟨u_i u_k⟩ ∂U_j/∂x_l + ⟨u_j u_k⟩ ∂U_i/∂x_l),

D_ij = −( ⟨u_i u_k⟩ ∂U_k/∂x_j + ⟨u_j u_k⟩ ∂U_k/∂x_i ),

A = 1 − (9/8)(A₂ − A₃), A₂ = a_ij a_ji, A₃ = a_ij a_jk a_ki.
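The anisotropy invariants A₂, A₃ and the flatness parameter A defined above are cheap to compute for a given anisotropy tensor; the following sketch (purely illustrative) evaluates them and checks that A = 1 in the isotropic limit a_ij = 0:

```python
def invariants(a):
    """Anisotropy invariants A2 = a_ij a_ji, A3 = a_ij a_jk a_ki and the
    flatness parameter A = 1 - (9/8)(A2 - A3) for a 3x3 anisotropy tensor."""
    n = range(3)
    a2 = sum(a[i][j] * a[j][i] for i in n for j in n)
    a3 = sum(a[i][j] * a[j][k] * a[k][i] for i in n for j in n for k in n)
    return a2, a3, 1.0 - 9.0 / 8.0 * (a2 - a3)

# Isotropic turbulence: a_ij = 0, hence A2 = A3 = 0 and A = 1.
zero = [[0.0] * 3 for _ in range(3)]
print(invariants(zero))   # (0.0, 0.0, 1.0)
```

In the opposite, one-component limit (all energy in one velocity component, a = diag(4/3, −2/3, −2/3)) the flatness parameter A vanishes, which is why A is used as a measure of the departure from isotropy.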
In Model 3, as in Model 1, we use the approximation (9) to calculate the terms φ_ij. The values of the third-order moments ⟨u_i u_j u_k⟩ are determined by solving the differential equation (see, for example, [12]; l = 2, 3)

U∞ ∂⟨u_i u_j u_k⟩/∂x = −∂C_ijkl/∂x_l − ⟨u_i u_j u_l⟩ ∂U_k/∂x_l − ⟨u_j u_k u_l⟩ ∂U_i/∂x_l − ⟨u_i u_k u_l⟩ ∂U_j/∂x_l − ⟨u_k u_l⟩ ∂⟨u_i u_j⟩/∂x_l − ⟨u_j u_l⟩ ∂⟨u_i u_k⟩/∂x_l − ⟨u_i u_l⟩ ∂⟨u_j u_k⟩/∂x_l − c₃ (ε/e) ⟨u_i u_j u_k⟩. (10)

In equation (10), C_ijkl are cumulants of fourth order determined by algebraic relations [8]. In this model, we transformed the expressions for C_ijkl from [8] to the simplified diffusion form ([12]):

C_ijkl = −c₄ (e/ε) α ⟨u_l u_l⟩ ∂⟨u_i u_j u_k⟩/∂x_l, α = 1 + δ_il + δ_jl + δ_kl.

A comprehensive description of the numerical models, the algorithm, and the results of its testing can be found in [9]. To verify the efficiency of the presented models we calculated the flows in the momentumless turbulent wake behind a sphere. The numerical data are compared with the experimental data [13] in Figures 2 and 3. There is reasonably good agreement between the calculated and experimental data. A valid description of the decay of the normal Reynolds stresses is given by Models 2 and 3, which are based on the modified representations of the pressure-strain term and of the third-order moments.
Fig. 2. Variation of the intensities of the normal Reynolds stresses \(\hat u_i = \overline{u_i^2}(x,0,0)^{1/2}/U_\infty\) on the wake centerline with distance from the body
Numerical Modeling of Some Free Turbulent Flows
Fig. 3. Variation of the half-width l of the turbulent wake at its cross-section with distance from the body, and the distribution of the normalized defect of the longitudinal velocity component \(\hat U_d = U_d/U_d(x,0,0)\) at the distance x = 15D. The notation of the lines is the same as in Figure 2.
4 Dynamics of Swirling Turbulent Wakes
The numerical analysis of the dynamics of swirling turbulent wakes with varied values of total excess momentum and angular momentum was carried out in [14]-[18]. To describe the flow, the following system of averaged equations for the motion and continuity in the thin shear layer approximation is used:
\[
U\frac{\partial U}{\partial x} + V\frac{\partial U}{\partial r} = -\frac{1}{r}\frac{\partial}{\partial r}\left(r\,\overline{uv}\right) + \frac{\partial}{\partial x}\int_r^\infty \frac{W^2 + \overline{w^2} - \overline{v^2}}{r}\,dr - \frac{\partial\left(\overline{u^2} - \overline{v^2}\right)}{\partial x}, \qquad (11)
\]
\[
U\frac{\partial W}{\partial x} + V\frac{\partial W}{\partial r} + \frac{VW}{r} = -\frac{1}{r}\frac{\partial}{\partial r}\left(r\,\overline{vw}\right) - \frac{\overline{vw}}{r}, \qquad (12)
\]
\[
\frac{\partial U}{\partial x} + \frac{\partial V}{\partial r} + \frac{V}{r} = 0. \qquad (13)
\]
The closed system of equations is written for two different formulations of the closure relations. Mathematical Model 1 includes the following equations for the Reynolds stress transfer:
\[
U\frac{\partial\overline{u^2}}{\partial x} + V\frac{\partial\overline{u^2}}{\partial r} = -2(1-\alpha)\,\overline{uv}\,\frac{\partial U}{\partial r} - \frac{2}{3}\varepsilon - C_1\frac{\varepsilon}{e}\left(\overline{u^2} - \frac{2}{3}e\right) + \frac{2}{3}\alpha P + \frac{C_s}{r}\frac{\partial}{\partial r}\left(\frac{re\,\overline{v^2}}{\varepsilon}\frac{\partial\overline{u^2}}{\partial r}\right), \qquad (14)
\]
\[
U\frac{\partial\overline{v^2}}{\partial x} + V\frac{\partial\overline{v^2}}{\partial r} - 2\frac{W}{r}\overline{vw} = 2(1-\alpha)\,\overline{vw}\,\frac{W}{r} - \frac{2}{3}\varepsilon - C_1\frac{\varepsilon}{e}\left(\overline{v^2} - \frac{2}{3}e\right) + \frac{2}{3}\alpha P
+ \frac{C_s}{r}\frac{\partial}{\partial r}\left[\frac{re}{\varepsilon}\left(\overline{v^2}\frac{\partial\overline{v^2}}{\partial r} + \frac{2\,\overline{vw}^{\,2}}{r}\right)\right]
- \frac{2C_s e}{r\varepsilon}\,\overline{vw}\left(\frac{\partial\overline{vw}}{\partial r} + \frac{\overline{v^2} - \overline{w^2}}{r}\right), \qquad (15)
\]
\[
U\frac{\partial\overline{w^2}}{\partial x} + V\frac{\partial\overline{w^2}}{\partial r} + 2\frac{W}{r}\overline{vw} = -2(1-\alpha)\,\overline{vw}\,\frac{\partial W}{\partial r} - \frac{2}{3}\varepsilon - C_1\frac{\varepsilon}{e}\left(\overline{w^2} - \frac{2}{3}e\right) + \frac{2}{3}\alpha P
+ \frac{C_s}{r}\frac{\partial}{\partial r}\left[\frac{re}{\varepsilon}\left(\overline{v^2}\frac{\partial\overline{w^2}}{\partial r} + \frac{2\,\overline{vw}^{\,2}}{r}\right)\right]
+ \frac{2C_s e}{r\varepsilon}\,\overline{vw}\left(\frac{\partial\overline{vw}}{\partial r} + \frac{\overline{v^2} - \overline{w^2}}{r}\right), \qquad (16)
\]
\[
U\frac{\partial\overline{uv}}{\partial x} + V\frac{\partial\overline{uv}}{\partial r} - \frac{W}{r}\overline{uw} = -(1-\alpha)\,\overline{v^2}\,\frac{\partial U}{\partial r} - C_1\frac{\varepsilon}{e}\overline{uv}
+ \frac{C_s}{r}\frac{\partial}{\partial r}\left[\frac{re}{\varepsilon}\left(\overline{v^2}\frac{\partial\overline{uv}}{\partial r} - \frac{\overline{vw}\,\overline{uw}}{r}\right)\right]
+ \frac{C_s e}{r\varepsilon}\,\overline{vw}\left(\frac{\partial\overline{uw}}{\partial r} + \frac{2\,\overline{uv}}{r}\right), \qquad (17)
\]
\[
U\frac{\partial\overline{vw}}{\partial x} + V\frac{\partial\overline{vw}}{\partial r} - \frac{W}{r}\left(\overline{v^2} - \overline{w^2}\right) = -(1-\alpha)\left(\overline{v^2}\frac{\partial W}{\partial r} - \overline{w^2}\frac{W}{r}\right) - C_1\frac{\varepsilon}{e}\overline{vw}
+ \frac{C_s}{r}\frac{\partial}{\partial r}\left[\frac{re}{\varepsilon}\left(\overline{v^2}\frac{\partial\overline{vw}}{\partial r} + \overline{vw}\,\frac{\overline{v^2} - \overline{w^2}}{r}\right)\right]
- \frac{C_s e}{r\varepsilon}\left(\overline{w^2}\frac{\partial\left(\overline{v^2} - \overline{w^2}\right)}{\partial r} + \frac{4\,\overline{vw}^{\,2}}{r}\right). \qquad (18)
\]
To determine the rate of dissipation we make use of the relevant differential equation
\[
U\frac{\partial\varepsilon}{\partial x} + V\frac{\partial\varepsilon}{\partial r} = \frac{C_\varepsilon}{r}\frac{\partial}{\partial r}\left(\frac{re\,\overline{v^2}}{\varepsilon}\frac{\partial\varepsilon}{\partial r}\right) + \frac{\varepsilon}{e}\left(C_{\varepsilon1}P - C_{\varepsilon2}\varepsilon\right), \qquad (19)
\]
\[
P = -\overline{uv}\,\frac{\partial U}{\partial r} - \overline{vw}\,r\frac{\partial(W/r)}{\partial r}.
\]
The turbulent shear stress \(\overline{uw}\) is evaluated by the formula [5]:
\[
\overline{uw} = \alpha_1\left(\overline{uv}\,\frac{\partial W}{\partial r} + \overline{vw}\,\frac{\partial U}{\partial r}\right), \qquad (20)
\]
\[
\alpha_1 = -\lambda_1\frac{e}{\varepsilon}, \qquad \lambda_1 = \frac{1 - C_2}{C_1 + P/\varepsilon - 1}.
\]
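As a numerical aside (a sketch, not part of the paper's code), the coefficient α₁ in formula (20) can be computed directly from these expressions; the default constant values below are illustrative placeholders:

```python
def alpha1(e, eps, P, C1=2.2, C2=0.55):
    """Coefficient of the algebraic formula (20) for the shear stress:
    alpha1 = -lambda1 * e/eps,  lambda1 = (1 - C2)/(C1 + P/eps - 1)."""
    lam1 = (1.0 - C2) / (C1 + P / eps - 1.0)
    return -lam1 * e / eps

# usage: local equilibrium P/eps = 1 gives alpha1 = -(1 - C2)/C1 * e/eps
a = alpha1(e=1.0, eps=1.0, P=1.0)
```

The negative sign of α₁ makes the algebraic stresses act as a gradient-transport closure for the mean shear.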
In Model 2 we determine the Reynolds shear stresses by the following relations [5]:
\[
\overline{uv} = \alpha_1\,\overline{v^2}\,\frac{\partial U}{\partial r}, \qquad
\overline{vw} = \alpha_1\left(\overline{v^2}\,r\frac{\partial(W/r)}{\partial r} + \left(\overline{v^2} - \overline{w^2}\right)\frac{W}{r}\right).
\]
The normal Reynolds stresses are obtained from the transport equations (14)-(16), and \(\overline{uw}\) from relation (20). The quantities \(C_s, C_\varepsilon, \alpha, C_1, C_2, C_{\varepsilon1}, C_{\varepsilon2}\) are well-known empirical constants; their values are taken to be 0.22, 0.17, 0.6, 0.6, 2.2, 1.45, 1.92, respectively. The results of the numerical experiments made on the basis of Models 1 and 2 are given below. The experimental data were obtained in [15]. The experiments were carried out at the Reynolds number \(Re = U_0D/\nu = 50{,}000\), where \(\nu\) is the kinematic viscosity and \(U_0\) is the velocity of the unperturbed fluid. In Figure 4 a) the calculated distributions of the defect of the longitudinal velocity component \(U_1(r,x)\) are compared with the experimental data [15]. Both models show a rather good fit of the calculation results to the experimental data in the near-axis zone of the wake. Agreement between the calculation results and the experimental data in the peripheral region of the wake is somewhat worse; this is a shortcoming of second-order models, which do not allow for the intermittency of the flow in the external regions of the wake. Noteworthy is a substantial difference in the behavior of the calculated profiles \(U_1(r,x)\) for small values of r; this is because equation (11) in Model 2 is of diffusion type. In Figure 4 b) the calculated distributions of the tangential velocity component \(W(r,x)\) are compared with the experimental data. Both models describe the experimental data satisfactorily. As in the case of the distributions \(U_1(r,x)\), the best agreement is obtained with Model 2.
Fig. 4. Experimental and calculated distributions of the defect of the longitudinal velocity component \(U_1(r,x)\) (a) and of the tangential velocity component \(W(r,x)\) (b)
Fig. 5. The transverse distributions of the turbulent fluctuation intensities of the velocity components
The transverse distributions of the turbulent fluctuation intensities of the velocity components \(\sigma_u = \overline{u^2}^{1/2}\), \(\sigma_v = \overline{v^2}^{1/2}\), \(\sigma_w = \overline{w^2}^{1/2}\) are shown in Figure 5. The calculation results are shown by lines, the experimental data by points. Agreement between the calculation results obtained by Models 1 and 2 and the experimental data can be considered satisfactory. In Figure 6 the calculated values of the tangential stress \(\overline{uv}/U_0^2\) are compared with the experimental data; the agreement is sufficiently good. Figure 7 demonstrates the change in the characteristic scales of turbulence with the distance from the body. Here \(U_{10}\) is the axial value of the defect of the longitudinal velocity component, \(W_m\) is the maximum value of the tangential velocity component at the given cross-section of the wake, \(e_0\) is the value of the turbulent energy on the wake axis, and \(r_{1/2}(x)\) is the characteristic wake width specified by the relation \(\sigma_u(r_{1/2},x) = 0.5\,\sigma_u(0,x)\). At large distances from the body, the dependence of all the scale functions on x is a power law (solid thin straight lines in Figure 7); within the framework of the mathematical model used, this is one necessary indication that self-similarity of the turbulent motion in the wake is reached.
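Self-similar decay means that each scale function behaves as a power of the distance; the exponent is conveniently extracted by a least-squares fit in log-log coordinates. A generic sketch with synthetic data (not the paper's results):

```python
import numpy as np

def decay_exponent(x, q):
    """Fit q(x) ~ C * x**n by linear least squares in log-log space
    and return the exponent n."""
    n, _log_c = np.polyfit(np.log(x), np.log(q), 1)
    return n

# synthetic axis turbulence-energy decay with a known exponent -1.5
x = np.logspace(2, 4, 50)      # distances x/D
e0 = 3.0 * x ** (-1.5)
n = decay_exponent(x, e0)      # recovers -1.5 for exact power-law data
```

On real (noisy) decay data the same fit gives the asymptotic exponent only over the range where the straight-line behavior in Figure 7 has already set in.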
Fig. 6. Experimental and calculated distributions of the tangential stress
Fig. 7. The behavior of the characteristic scales of turbulence depending on the distance from the body
Fig. 8. The self-similar profiles of the defect of the longitudinal velocity component, the tangential velocity component and the turbulence energy
The other feature of self-similarity is the affine similarity of the transverse distributions of the various turbulence characteristics in the wake, normalized by the corresponding scales. Figure 8 presents self-similar profiles of the defect of the longitudinal velocity component, the tangential velocity component, and the turbulent energy, which exemplify the realization of this flow regime in the wake; Figure 8 a) corresponds to Model 1, Figure 8 b) to Model 2. As is seen, just as the asymptotic decay is attained (see Figure 7), the similarity of the transverse distributions is attained in the wake for x/D > 1000. The flow at these distances can be considered self-similar (in the framework of the adopted mathematical models). The numerical modeling of the dynamics of the swirling turbulent wake behind a self-propelled body in a passively stratified fluid has also been carried out. The dynamics of the density defect \(\rho_1 = \rho - \rho_s\) is characterized by Figure 9. Here \(\rho\) is the
Fig. 9. Variation of ρ1 (x, 0, z) versus the distance from the body
averaged density of the fluid, \(\rho_s = \rho_0(1 - az)\) is the density of the undisturbed fluid, and \(l_{1/2}\) is the characteristic wake width: \(e(l_{1/2},x) = e_0/2\). As in the case of the nonswirling wake, the computational results are consistent with the notion of partial mixing in the turbulent wake. On the basis of Model 2, a numerical model of nonswirling axisymmetric turbulent jets (W = 0) has been constructed. The results of the computations are in satisfactory agreement with known experimental data.
5 The Propagation of a Passive Admixture from a Local Instantaneous Source in a Turbulent Mixing Zone
The numerical investigation of the propagation of an admixture from a local instantaneous source in a two-dimensional turbulent mixing zone in a continuously stratified fluid was conducted in [18]-[20] (these results were obtained in collaboration with Prof. Yu.D. Chashechkin). The case in which the location of the admixture source does not coincide with the center of the turbulent zone was considered. To describe the evolution of the turbulent mixing zone, we use the following system of averaged equations for the motion, continuity, incompressibility, and passive-admixture concentration transport (the notation is standard):
\[
\frac{\partial U}{\partial t} + U\frac{\partial U}{\partial x} + V\frac{\partial U}{\partial y} = -\frac{1}{\rho_0}\frac{\partial p_1}{\partial x} - \frac{\partial}{\partial x}\overline{u^2} - \frac{\partial}{\partial y}\overline{uv}, \qquad (21)
\]
\[
\frac{\partial V}{\partial t} + U\frac{\partial V}{\partial x} + V\frac{\partial V}{\partial y} = -\frac{1}{\rho_0}\frac{\partial p_1}{\partial y} - \frac{\partial}{\partial x}\overline{uv} - \frac{\partial}{\partial y}\overline{v^2} - \frac{g\rho_1}{\rho_0}, \qquad (22)
\]
\[
\frac{\partial\rho_1}{\partial t} + U\frac{\partial\rho_1}{\partial x} + V\frac{\partial\rho_1}{\partial y} + V\frac{d\rho_s}{dy} = -\frac{\partial}{\partial x}\overline{u\rho} - \frac{\partial}{\partial y}\overline{v\rho}, \qquad (23)
\]
\[
\frac{\partial U}{\partial x} + \frac{\partial V}{\partial y} = 0, \qquad (24)
\]
\[
\frac{\partial\Theta}{\partial t} + U\frac{\partial\Theta}{\partial x} + V\frac{\partial\Theta}{\partial y} = -\frac{\partial}{\partial x}\overline{u\theta} - \frac{\partial}{\partial y}\overline{v\theta}. \qquad (25)
\]
The system of equations (21)-(25) is nonclosed; in order to determine the Reynolds stresses \(\overline{u_iu_j}\) \((i, j = 1, 2)\), the fluxes \(\overline{u_i\rho}\) \((i = 1, 2)\), and the density fluctuation variance \(\overline{\rho^2}\) we use the algebraic approximations [21]:
\[
\frac{\overline{u_iu_j}}{e} = \frac{2}{3}\delta_{ij} + \frac{1 - C_2}{C_1}\left(\frac{P_{ij}}{\varepsilon} - \frac{2}{3}\delta_{ij}\frac{P}{\varepsilon}\right) + \frac{1 - C_3}{C_1}\left(\frac{G_{ij}}{\varepsilon} - \frac{2}{3}\delta_{ij}\frac{G}{\varepsilon}\right). \qquad (26)
\]
Here
\[
P_{ij} = -\left(\overline{u_iu_k}\frac{\partial U_j}{\partial x_k} + \overline{u_ju_k}\frac{\partial U_i}{\partial x_k}\right), \qquad
G_{ij} = \frac{1}{\rho_0}\left(\overline{u_i\rho}\,g_j + \overline{u_j\rho}\,g_i\right),
\]
\[
\vec g = (0, -g, 0), \quad U_1 = U, \quad U_2 = V, \quad 2P = P_{ii}, \quad 2G = G_{ii},
\]
\[
-\overline{u\rho} = \frac{e}{C_{1t}\varepsilon}\,\overline{u^2}\,\frac{\partial\overline\rho}{\partial x}, \qquad (27)
\]
\[
-\overline{v\rho} = \frac{e}{C_{1t}\varepsilon}\left[\overline{v^2}\,\frac{\partial\overline\rho}{\partial y} + (1 - C_{2t})\frac{g}{\rho_0}\overline{\rho^2}\right], \qquad (28)
\]
\[
\overline{\rho^2} = -\frac{2e}{C_t\varepsilon}\,\overline{v\rho}\,\frac{\partial\overline\rho}{\partial y}. \qquad (29)
\]
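Relations (28) and (29) are coupled: the flux enters the variance and vice versa. Substituting (29) into (28) makes the system linear in the flux, so it can be solved in closed form. A sketch of this elimination (the constant values are illustrative placeholders, not taken from the paper):

```python
def vertical_flux(e, eps, v2, drho_dy, g=9.81, rho0=1000.0,
                  C1t=3.2, C2t=0.5, Ct=1.6):
    """Solve the coupled algebraic relations (28)-(29) for <v'rho'>
    and <rho'^2>.  Substituting (29) into (28) gives a single linear
    equation for the flux F = <v'rho'>:
        F * (1 - k1*k2*(1-C2t)*(g/rho0)*drho_dy) = -k1*v2*drho_dy,
    with k1 = e/(C1t*eps) and k2 = 2e/(Ct*eps)."""
    k1 = e / (C1t * eps)
    k2 = 2.0 * e / (Ct * eps)
    vrho = -k1 * v2 * drho_dy / (
        1.0 - k1 * k2 * (1.0 - C2t) * (g / rho0) * drho_dy)
    rho2 = -k2 * vrho * drho_dy       # back-substitute into (29)
    return vrho, rho2

# stable stratification: density decreases upward, drho_dy < 0
vrho, rho2 = vertical_flux(e=1.0, eps=1.0, v2=0.5, drho_dy=-0.01)
```

For a stable gradient the denominator exceeds unity, so the buoyancy correlation reduces the magnitude of the vertical flux relative to the plain gradient-transport value, which is the expected damping effect of stratification.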
We also define the quantities \(\overline{u\theta}\) and \(\overline{v\theta}\) using gradient-type relationships similar to those used for the approximation of \(\overline{u\rho}\) and \(\overline{v\rho}\) [20]. In addition to Eqs. (21)-(25) and relationships (26)-(29), we introduce differential equations for the transport of the turbulence energy \(e = \overline{u_iu_i}/2\) \((i = 1, 2, 3)\), the dissipation rate \(\varepsilon\), and the tangential Reynolds stress \(\overline{uv}\):
\[
\frac{\partial e}{\partial t} + U\frac{\partial e}{\partial x} + V\frac{\partial e}{\partial y} = \frac{\partial}{\partial x}\left(K_{ex}\frac{\partial e}{\partial x}\right) + \frac{\partial}{\partial y}\left(K_{ey}\frac{\partial e}{\partial y}\right) + P + G - \varepsilon, \qquad (30)
\]
\[
\frac{\partial\varepsilon}{\partial t} + U\frac{\partial\varepsilon}{\partial x} + V\frac{\partial\varepsilon}{\partial y} = \frac{\partial}{\partial x}\left(K_{\varepsilon x}\frac{\partial\varepsilon}{\partial x}\right) + \frac{\partial}{\partial y}\left(K_{\varepsilon y}\frac{\partial\varepsilon}{\partial y}\right) + C_{\varepsilon1}\frac{\varepsilon}{e}(P + G) - C_{\varepsilon2}\frac{\varepsilon^2}{e}, \qquad (31)
\]
\[
\frac{\partial\overline{uv}}{\partial t} + U\frac{\partial\overline{uv}}{\partial x} + V\frac{\partial\overline{uv}}{\partial y} = \frac{\partial}{\partial x}\left(K_{ex}\frac{\partial\overline{uv}}{\partial x}\right) + \frac{\partial}{\partial y}\left(K_{ey}\frac{\partial\overline{uv}}{\partial y}\right) + (1 - C_2)P_{12} + (1 - C_3)G_{12} - C_1\frac{\varepsilon}{e}\overline{uv}. \qquad (32)
\]
Here \(K_{ex}, K_{ey}, K_{\varepsilon x}, K_{\varepsilon y}\) are the turbulent viscosity coefficients and \(C_1, C_2, C_3, C_{\varepsilon1}, C_{\varepsilon2}\) are well-known empirical constants. The boundary and initial conditions for the system of equations (21)-(25), (30)-(32) are:
\[
U = V = \rho_1 = e = \varepsilon = \Theta = \overline{uv} = 0, \qquad r^2 = x^2 + y^2 \to \infty, \quad t \ge t_0; \qquad (33)
\]
\[
e(0,x,y) = e_0(r), \quad \varepsilon(0,x,y) = \varepsilon_0(r), \quad \Theta(0,x,y) = \Theta_0(x,y), \qquad r^2 \le R^2, \quad t = t_0, \qquad (34)
\]
\[
e(0,x,y) = \varepsilon(0,x,y) = \Theta(0,x,y) = 0, \quad r^2 \ge R^2; \qquad \rho_1 = U = V = \overline{uv} = 0, \quad -\infty < y < \infty, \; -\infty < x < \infty, \quad t = t_0. \qquad (35)
\]
Here, e0 (r), ε0 (r), and Θ0 (x, y) are finite bell-shaped functions. In solving the problem numerically, we map the zero boundary conditions corresponding to r → ∞ onto the boundaries of a fairly large rectangle. The problem variables can be made dimensionless by using the characteristic length scale R (radius of the turbulized fluid zone at the initial moment of time) and the velocity scale U0 determined by the initial turbulence energy
Fig. 10. Schematic image of turbulent mixing zone at t = 0
Fig. 11. Relative location \(r_m^* = r_m/R\) of the maximum admixture concentration as a function of the dimensionless age \(t^*\) of the spot
distribution: \(U_0 = (e(0,0,0))^{1/2}\). We also use the following notation for the dimensionless variables:
\[
x^* = x/R, \quad y^* = y/R, \quad t^* = t/\tau, \quad \tau = R/U_0, \quad U_i^* = U_i/U_0, \quad \overline{u_iu_j}^* = \overline{u_iu_j}/U_0^2,
\]
\[
\varepsilon^* = \varepsilon R/U_0^3, \quad \rho^* = \rho/aR\rho_0, \quad \Theta^* = \Theta/\Theta_m(0), \quad \Theta_m(0) = \max_{x,y}\Theta_0(x,y), \quad a = -\frac{1}{\rho_0}\frac{d\rho_s}{dy}\Big|_{y=0}.
\]
As a result, in the dimensionless equations the quantity \(4\pi^2/Fr^2\) appears in place of g. Here, the density Froude number Fr is determined by the equality
\[
Fr = \frac{U_0 T}{R}, \qquad T = \frac{2\pi}{\sqrt{ag}}, \qquad (36)
\]
where T is the Brunt-Vaisala period. The finite-difference algorithm is based on splitting with respect to the space variables [20]. To analyze the propagation of a passive scalar from a local source in a turbulent mixing zone in homogeneous and linearly stratified media, a series of numerical experiments was carried out on the basis of the mathematical model (21)-(35). We considered the following three variants of the location of the center of the circle \(C_0\), given by the abscissa \(x_0\) and ordinate \(y_0\) (Figure 10): 1) \(x_0^* = 0, y_0^* = 0\); 2) \(x_0^* = 0, y_0^* = 0.57\); 3) \(x_0^* = 0.57, y_0^* = 0.57\). Figure 11 illustrates the time variation of the quantity \(r_m(t) = \left[(x_m(t))^2 + (y_m(t))^2\right]^{1/2}\), which characterizes the location of the maximum concentration \(\Theta_m^*(t) = \max_{x,y}\Theta^*(t,x,y)\). In this figure curve 1 was obtained for a homogeneous fluid (g = 0) and \(x_0^* = y_0^* = 0.57\). Curve 2 corresponds to
\(Fr = 4.7\), \(x_0^* = 0\), and \(y_0^* = 0.57\). Curves 3 and 4 were obtained for the same initial conditions as curve 1 but for \(Fr = 4.7\) and 22.1, respectively. The behavior of curves 3 and 4 is controlled by the difference between the turbulent diffusion coefficients in homogeneous and stratified fluids. At fairly large times \(t^* = t/\tau = (t/T)Fr\), as a result of the suppression of the vertical turbulent transfer, the values of the turbulent diffusion coefficient \(K_{\theta y}\) become significantly smaller than the corresponding values in a homogeneous fluid. The stratification effect manifests itself the more strongly, the smaller the Froude number. Therefore, curve 3 deviates from curve 1 earlier than curve 4, and the displacement of the location of the maximum average concentration toward the origin proceeds more slowly. The results of the numerical experiments indicate that the nature of the average admixture concentration distribution depends significantly on the initial data for this quantity. In particular, the displacement of the maximum average concentration toward the center of the turbulent zone in homogeneous and linearly stratified fluids is fairly slow compared with the turbulence degeneration. The results of the calculations in a pycnocline show the possibility of situations in which the propagation of the passive admixture is determined to a considerable extent by the convective flow generated by the turbulent mixing zone.
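The density Froude number of definition (36) can be evaluated directly from the stratification parameter a. A minimal sketch with illustrative input values (not the experimental parameters of the paper):

```python
import math

def froude_number(U0, R, a, g=9.81):
    """Density Froude number, eq. (36): Fr = U0*T/R, where
    T = 2*pi/sqrt(a*g) is the Brunt-Vaisala period and
    a = -(1/rho0)*d(rho_s)/dy is the relative density gradient."""
    T = 2.0 * math.pi / math.sqrt(a * g)
    return U0 * T / R

# usage: weaker stratification (smaller a) -> longer period -> larger Fr
Fr_strong = froude_number(U0=0.1, R=0.05, a=0.01)
Fr_weak = froude_number(U0=0.1, R=0.05, a=0.001)
```

Larger Fr thus corresponds to weaker stratification, consistent with curve 4 (Fr = 22.1) staying close to the homogeneous-fluid curve longer than curve 3 (Fr = 4.7).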
6 Evolution of Momentumless Turbulent Wakes in Stably Stratified Media
A numerical analysis of the evolution of momentumless turbulent wakes in a continuously stratified fluid has been carried out using a hierarchy of semi-empirical turbulence models. In order to describe the flow in the far turbulent wake of a body of revolution in a stratified medium, the three-dimensional parabolized system of averaged equations for the motion, continuity and incompressibility in the Oberbeck-Boussinesq approximation was used. This system of equations is nonclosed. We consider a hierarchy of five closed mathematical flow models. In Model 1 the components of the Reynolds stress tensor \(\overline{u_iu_j}\) are approximated by the "isotropic" relationships [21]:
\[
\frac{\overline{u_iu_j}}{e} = \frac{2}{3}\delta_{ij} + \frac{1 - c_2}{c_1}\left(\frac{P_{ij}}{\varepsilon} - \frac{2}{3}\delta_{ij}\frac{P}{\varepsilon}\right) + \frac{1 - c_2}{c_1}\left(\frac{G_{ij}}{\varepsilon} - \frac{2}{3}\delta_{ij}\frac{G}{\varepsilon}\right).
\]
Model 2 is based on locally equilibrium approximations for the determination of \(\overline{u_iu_j}\). In Model 3 the quantities \(\overline{u_iu_j}\) \((i = j = 1,\ i = j = 2,\ i = j = 3,\ i = 2, j = 3)\) are computed by solving the differential equations [22]:
\[
U_0\frac{\partial\overline{u_iu_j}}{\partial x} + V\frac{\partial\overline{u_iu_j}}{\partial y} + W\frac{\partial\overline{u_iu_j}}{\partial z} = D_{ij} + P_{ij} + G_{ij} - \frac{2}{3}\delta_{ij}\varepsilon - c_1\frac{\varepsilon}{e}\left(\overline{u_iu_j} - \frac{2}{3}\delta_{ij}e\right) - c_2\left(P_{ij} - \frac{2}{3}\delta_{ij}P\right) - c_2\left(G_{ij} - \frac{2}{3}\delta_{ij}G\right), \qquad (37)
\]
\[
P_{ij} = -\left(\overline{u_iu_k}\frac{\partial U_j}{\partial x_k} + \overline{u_ju_k}\frac{\partial U_i}{\partial x_k}\right), \qquad
G_{ij} = \frac{1}{\rho_0}\left(\overline{u_i\rho}\,g_j + \overline{u_j\rho}\,g_i\right),
\]
\[
e = \frac{\overline{u^2} + \overline{v^2} + \overline{w^2}}{2}, \qquad \vec g = (0, 0, -g), \qquad 2P = P_{ii}, \qquad 2G = G_{ii}.
\]
We used the simplified relations
\[
D_{ij} = -\frac{\partial}{\partial x_k}\overline{u_ku_iu_j} - \frac{1}{\rho_0}\frac{\partial\overline{u_ip}}{\partial x_j} - \frac{1}{\rho_0}\frac{\partial\overline{u_jp}}{\partial x_i} \approx -\frac{\partial}{\partial x_k}\overline{u_ku_iu_j}, \qquad
-\left(\overline{u_ku_iu_j}\right)_0 \approx c_s\frac{e}{\varepsilon}\,\overline{u_ku_k}\,\frac{\partial\overline{u_iu_j}}{\partial x_k}, \quad k = 2, 3. \qquad (38)
\]
The turbulent fluxes \(\overline{u_i\rho}\) and the density fluctuation variance \(\overline{\rho^2}\) in Models 1-3 are replaced by locally equilibrium approximations:
\[
-\overline{u\rho} = \frac{e}{c_{1T}\varepsilon}\left[\overline{uw}\,\frac{\partial\overline\rho}{\partial z} + (1 - c_{2T})\,\overline{w\rho}\,\frac{\partial U}{\partial z}\right], \qquad
-\overline{v\rho} = \frac{e}{c_{1T}\varepsilon}\,\overline{v^2}\,\frac{\partial\overline\rho}{\partial y},
\]
\[
-\overline{w\rho} = \frac{e}{c_{1T}\varepsilon}\left[\overline{w^2}\,\frac{\partial\overline\rho}{\partial z} + (1 - c_{2T})\frac{g}{\rho_0}\overline{\rho^2}\right], \qquad
\overline{\rho^2} = -\frac{2e}{c_T\varepsilon}\,\overline{w\rho}\,\frac{\partial\overline\rho}{\partial z}.
\]
The difference between Model 4 and Model 3 consists in using the algebraic relationships [23] for the determination of the quantities \(\overline{u_i\rho}\). In order to determine the rate of dissipation, the differential transport equation was used:
\[
U_0\frac{\partial\varepsilon}{\partial x} + V\frac{\partial\varepsilon}{\partial y} + W\frac{\partial\varepsilon}{\partial z} = -\frac{\partial}{\partial y}\overline{v\varepsilon} - \frac{\partial}{\partial z}\overline{w\varepsilon} + c_{\varepsilon1}\frac{\varepsilon}{e}(P + G) - c_{\varepsilon2}\frac{\varepsilon^2}{e}. \qquad (39)
\]
In Models 1-4 we determined the diffusion terms in (39) by the relationships
\[
-\overline{v\varepsilon} = \frac{c_s}{\sigma_1}\frac{e}{\varepsilon}\,\overline{v^2}\,\frac{\partial\varepsilon}{\partial y}, \qquad
-\overline{w\varepsilon} = \frac{c_s}{\sigma_1}\frac{e}{\varepsilon}\,\overline{w^2}\,\frac{\partial\varepsilon}{\partial z}.
\]
Model 5 differs from Model 3 in using a simplified variant of the algebraic model of the triple velocity correlations [8] instead of the simplest Daly and Harlow approximations [24]. This model takes into account the anisotropic damping effect of stable stratification on the third-order moments:
\[
-\overline{u_iu_ju_k} = \frac{\left(\overline{u_iu_ju_k}\right)_0}{1 - \dfrac{\lambda}{c_3c_{3\theta}}\dfrac{g}{\rho_0}\dfrac{e^2}{\varepsilon^2}\dfrac{\partial\overline\rho}{\partial z}}, \qquad \lambda = \delta_{i3} + \delta_{j3} + \delta_{k3}.
\]
In addition, Model 5 includes a modified equation for the rate of turbulent kinetic energy dissipation [25], based on the relationships
\[
-\overline{v\varepsilon} = \frac{c_s}{\sigma_2}\,\frac{e\,\overline{v^2}}{\varepsilon}\,\frac{\partial\varepsilon}{\partial y}\Bigg/\left(1 - \frac{2}{c_3c_{3\theta}}\frac{g}{\rho_0}\frac{e^2}{\varepsilon^2}\frac{\partial\overline\rho}{\partial z}\right), \qquad
-\overline{w\varepsilon} = \frac{c_s}{\sigma_2}\,\frac{e\,\overline{w^2}}{\varepsilon}\,\frac{\partial\varepsilon}{\partial z}\Bigg/\left(1 - \frac{2}{c_3c_{3\theta}}\frac{g}{\rho_0}\frac{e^2}{\varepsilon^2}\frac{\partial\overline\rho}{\partial z}\right).
\]
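The stratification correction of Model 5 enters as a multiplicative damping factor applied to the Daly-Harlow value of each third-order moment. A sketch of this factor (the function name, sample inputs and default constants are illustrative, with c3 and c3θ taken from the list of standard constants given below):

```python
def damping_factor(e, eps, drho_dz, n_vertical, g=9.81, rho0=1000.0,
                   c3=4.5, c3t=4.5):
    """Anisotropic damping of a triple moment <u_i u_j u_k> in Model 5:
    the Daly-Harlow value is divided by
        1 - (lambda/(c3*c3t)) * (g/rho0) * (e/eps)**2 * drho/dz,
    where lambda = delta_i3 + delta_j3 + delta_k3 counts the vertical
    indices among i, j, k (passed here as n_vertical)."""
    denom = 1.0 - (n_vertical / (c3 * c3t)) * (g / rho0) * (e / eps) ** 2 * drho_dz
    return 1.0 / denom

# stable stratification (drho/dz < 0): denom > 1, vertical transport damped
f_vvv = damping_factor(e=1.0, eps=0.1, drho_dz=-5.0, n_vertical=3)
f_hor = damping_factor(e=1.0, eps=0.1, drho_dz=-5.0, n_vertical=0)
```

The factor equals unity for purely horizontal moments (λ = 0) and decreases as more vertical indices appear, which is exactly the anisotropic character of the suppression by stable stratification.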
We take the standard values of the empirical constants: \(c_1 = 2.2\), \(c_2 = 0.55\), \(c_{\varepsilon1} = 1.45\), \(c_{\varepsilon2} = 1.9\), \(c_s = 0.25\), \(\sigma_1 = 1.3\), \(\sigma_2 = 1.0\), \(c_3 = 4.5\), \(c_{3\theta} = 4.5\). A comparison with the experimental data of Lin and Pao [26] is carried out in Figures 12-15 (\(c_D = 0.22\), \(F_D = 120\), \(F_D = U_0T/D\), where \(U_0\) is the unperturbed fluid velocity, T is the Vaisala-Brunt period, and D is the body diameter). It has been shown [11] that Model 3 is the best of Models 1-4. Models 3 and 4 produce close results, but Model 3 is characterized by significant deviations in the description of the evolution of the wake height. Figure 12 illustrates the behavior of the horizontal wake size \(L_y\). In Figure 13 we compare the vertical wake size \(L_z\) computed using Models 3 and 5 with the results of the laboratory measurements of Lin and Pao [26]. The quantities \(L_y\) and \(L_z\) have been determined from the relationships \(e(t,L_y,0) = 0.01\,e(t,0,0)\), \(e(t,0,L_z) = 0.01\,e(t,0,0)\), \(t = x/U_0\). Figures 14 and 15 show the computed and measured decay of the intensities of the turbulent fluctuations. A comprehensive description of the results of the numerical experiments can be found in [11], [27]-[29]. Based on modern algebraic and differential models for the triple correlations of the velocity and density fields, numerical models of the anisotropic decay of a momentumless wake in a linearly stratified fluid have been constructed [30]. The computed temporal (\(t = x/U_\infty\)) variation of the intensity of the turbulent fluctuations depends on the numerical model used. The model [30], based on differential transport equations for all triple correlations of the velocity field and modified algebraic relations for the combined triple correlations of the velocity and density fields, gives a valid description of the anisotropic decay of the normal Reynolds stresses in a far wake in a linearly stratified fluid.
Fig. 12. \(H_1 = 2L_y/D(c_DF_D)^{1/4}\)

Fig. 13. \(H_2 = 2L_z/D(c_DF_D)^{1/4}\)
Fig. 14. \(u_0 = F_D^{3/4}\,\overline{u^2}(t,0,0)^{1/2}/U_0\)

Fig. 15. \(w_0 = F_D^{3/4}\,\overline{w^2}(t,0,0)^{1/2}/U_0\)

7 Wakes behind the Towed Body in Stably Stratified Media and Internal Waves Generated by Turbulent Wakes
A hierarchy of second-order semi-empirical turbulence models is employed for the description of the fluid flow in the far turbulent wake behind a towed body [31]. The most complicated of the models includes differential equations for the normal Reynolds stress transfer as well as an equation for the triple correlations of the fluctuations of the vertical velocity component. The results of calculations showing the dynamics of a far turbulent wake in a linearly stratified medium in comparison with the dynamics of a momentumless turbulent wake are demonstrated. It is shown that the turbulent wake behind a towed body is characterized by much larger geometric sizes and values of the pressure defect (in addition to internal wave phenomena [28]). Numerical modelling of the anisotropic decay of turbulence in a far wake behind a towed body has been carried out; the characteristics of the turbulent wake are calculated for large decay times (\(t/T \le 20\)). A numerical model of a turbulent wake with small total excess momentum in a linearly stratified fluid has been constructed [32], and numerical modelling of the dynamics of a passive scalar in turbulent wakes has been carried out [33]. It is well known that turbulent wake dynamics in stably stratified fluids is accompanied by the generation of internal waves. In Figures 16 and 17 the characteristics of the internal waves generated by turbulent wakes are represented. The simulation was made on the basis of Model 1 from the previous section. The choice of this turbulence model is due to the following reasons: it is close to the standard \(e-\varepsilon\) turbulence model, and it allows us to take into account the anisotropy of the turbulence characteristics in wakes in a stratified fluid. A weak dependence of the internal wave characteristics on the applied mathematical model has been shown as well (see, for example, [11]). The pattern of the internal waves generated by turbulent wakes in a linearly stratified fluid is depicted in Figure 16, where the dynamics of the curves \(\rho = \mathrm{const}\) is given for the moments of time t/T = 1, 2, 3, 4, 5; \(F_D = 280\).
Fig. 16. Density profiles ρ0 − ρ = ρ0 − ρs (0.1D) for the time t/T = 1, 2, 3, 4, 5; The solid and dashed lines correspond to the momentumless wake and a drag wake, respectively (linearly stratified fluid)
Fig. 17. Time dependence of the (1-4) total turbulence energy Ek∗ (t) and (5-8) total internal-wave energy P∗k (t)
The calculation results show that the turbulent wake behind a towed body generates waves of essentially greater amplitude than those behind the self-propelled body. This phenomenon is also illustrated in Figure 17, where
\[
E_k^*(t^*) = \int_0^\infty\!\!\int_0^\infty e^*\,dy^*\,dz^*, \qquad
P_k^*(t^*) = \int_0^\infty\!\!\int_0^\infty\left[\frac{V^{*2} + W^{*2}}{2} + \frac{4\pi^2}{Fr^2}\frac{\rho_1^{*2}}{2}\right]dy^*\,dz^*
\]
are the dimensionless values of the total turbulence energy and of the internal wave energy in a cross-section of the wake, respectively. Curves 1, 3, 5 and 7 correspond to linear stratification and curves 2, 4, 6 and 8 correspond to a pycnocline. The solid and dashed lines are for the self-propelled body and the towed body, respectively. Some physical explanation of this effect has been given in [28].
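In practice these cross-section energies are evaluated by quadrature on the computational y-z grid. A generic sketch with a simple trapezoidal rule (the field arrays and grid spacing below are placeholders, not the paper's data):

```python
import numpy as np

def trap2d(f, dy, dz):
    """Trapezoidal rule for a field on a uniform y-z grid."""
    wy = np.ones(f.shape[0]); wy[[0, -1]] = 0.5
    wz = np.ones(f.shape[1]); wz[[0, -1]] = 0.5
    return float((wy[:, None] * wz[None, :] * f).sum() * dy * dz)

def wake_energies(e, V, W, rho1, Fr, dy, dz):
    """Dimensionless cross-section energies:
    E_k = integral of e*,
    P_k = integral of (V*^2 + W*^2)/2 + (4*pi^2/Fr^2) * rho1*^2 / 2."""
    Ek = trap2d(e, dy, dz)
    Pk = trap2d(0.5 * (V**2 + W**2) + (4*np.pi**2/Fr**2) * 0.5 * rho1**2, dy, dz)
    return Ek, Pk

# usage: a uniform unit turbulence-energy field on the unit square
ones = np.ones((51, 51)); zero = np.zeros((51, 51))
Ek, Pk = wake_energies(ones, zero, zero, zero, Fr=4.7, dy=0.02, dz=0.02)
```

The wave energy combines the kinetic part of the mean cross-flow with the potential part stored in the density defect, weighted by 4π²/Fr² exactly as g is replaced in the dimensionless equations.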
8 Conclusion
Based on the closed Karman-Howarth equation, an advanced numerical model of isotropic turbulence dynamics has been constructed. Comparisons with measurements demonstrate good agreement in the case of developed turbulence dynamics and quite satisfactory agreement for weak turbulence dynamics. The numerical simulation of plane and axisymmetric wakes with varied values of total excess momentum has been performed. The numerical analysis of the dynamics of swirling turbulent wakes with varied values of total excess momentum and angular momentum has been carried out. The numerical investigation of the propagation of a passive admixture from a local instantaneous source in a two-dimensional turbulent mixing zone in a continuously stratified fluid has been conducted. The results of the numerical experiments indicate that the nature of the average admixture concentration distribution depends significantly on the initial data for this quantity. A numerical analysis of the evolution of momentumless and drag turbulent
wakes in continuously stratified fluid has been carried out using a hierarchy of modern semi-empirical turbulence models. The calculation results show that the turbulent wake behind a towed body generates waves of essentially greater amplitude than those behind the self-propelled body. The work was supported by the Russian Foundation for Basic Research (01-01-00783, 04-01-00209, 06-01-00724, 07-01-0363).
References
1. Lytkin, Yu., Chernykh, G.: Dyn. Splosh. Sredy, Novosibirsk 27, 124-130 (1976) (in Russian)
2. Chernykh, G., Korobitsyna, Z., Kostomakha, V.: Int. J. Comput. Fluid Dyn. 10, 173-182 (1998)
3. Kostomakha, V.: Dynamika Sploshnoi Sredy, Novosibirsk 70, 92-104 (1985)
4. Ling, S., Huang, T.: Phys. Fluids 13, 2912-2920 (1970)
5. Rodi, W.: Turbulence models and their application in hydraulics. University of Karlsruhe, Karlsruhe (1980)
6. Chernykh, G., Demenkov, A.: Russ. J. Numer. Anal. Math. Model. 12, 111-125 (1997)
7. Craft, T., Ince, N., Launder, B.: Dyn. Atmos. Oceans 23, 99-115 (1996)
8. Ilyushin, B.: Higher-moment diffusion in stable stratification. In: Launder, B., Sandham, N. (eds.) Closure Strategies for Turbulent and Transitional Flows. Cambridge University Press, Cambridge (2002)
9. Voropayeva, O.: Russ. J. Numer. Anal. Math. Model. 19, 83-102 (2004)
10. Voropayeva, O.: Russ. J. Numer. Anal. Math. Model. 22, 87-108 (2007)
11. Chernykh, G., Voropayeva, O.: Comput. Fluids 28, 281-306 (1999)
12. Kurbatskii, A.: Modelling of Nonlocal Transfer of Turbulent Momentum and Heat. Nauka, Novosibirsk (1988) (in Russian)
13. Aleksenko, N., Kostomakha, V.: J. Appl. Mech. Tech. Phys. 28, 60-64 (1987)
14. Chernykh, G., Demenkov, A., Kostomakha, V.: Russ. J. Numer. Anal. Math. Model. 13, 279-288 (1998)
15. Gavrilov, N., Demenkov, A., Kostomakha, V., Chernykh, G.: J. Appl. Mech. Tech. Phys. 41, 619-627 (2000)
16. Vasiliev, O., Demenkov, A., Kostomakha, V., Chernykh, G.: Dokl. Phys. 46, 52-55 (2001)
17. Chernykh, G., Demenkov, A., Kostomakha, V.: Int. J. Comput. Fluid Dyn. 19, 399-408 (2005)
18. Voropayeva, O., Chashechkin, Yu., Chernykh, G.: Dokl. Phys. 42, 566-569 (1997)
19. Voropayeva, O., Chernykh, G.: J. Appl. Mech. Tech. Phys. 39, 546-552 (1998)
20. Chashechkin, Yu., Chernykh, G., Voropayeva, O.: Int. J. Comput. Fluid Dyn. 19, 517-530 (2005)
21. Rodi, W.: J. Geophys. Res. 92, 5305-5328 (1987)
22. Gibson, M., Launder, B.: J. Fluid Mech. 86, 491-511 (1978)
23. Gibson, M., Launder, B.: J. Heat Transfer, Trans. ASME C98, 81-87 (1976)
24. Daly, B., Harlow, F.: Phys. Fluids 13, 2634-2649 (1970)
25. Voropayeva, O., Ilyushin, B., Chernykh, G.: Thermophys. Aeromech. 10, 379-389 (2003)
26. Lin, J., Pao, Y.: Annu. Rev. Fluid Mech. 11, 317-336 (1979)
27. Voropayeva, O., Chernykh, G.: J. Appl. Mech. Tech. Phys. 38, 391-406 (1997)
28. Voropayeva, O., Moshkin, N., Chernykh, G.: Dokl. Phys. 48, 517-521 (2003)
29. Voropayeva, O., Ilyushin, B., Chernykh, G.: Dokl. Phys. 47, 762-766 (2002)
30. Voropaeva, O.: Hierarchy of second- and third-order turbulence models for calculating momentumless wakes. In: Int. Conf. on the Methods of Aerophysical Research. ITAM SB RAS, Novosibirsk (2007)
31. Chernykh, G., Fomina, A., Moshkin, N.: Russ. J. Numer. Anal. Math. Model. 21, 395-424 (2006)
32. Moshkin, N., Chernykh, G., Fomina, A.: Matem. Mod. 17, 19-33 (2005) (in Russian)
33. Chernykh, G., Fomina, A., Moshkin, N.: Russ. J. Numer. Anal. Math. Model. 20, 403-424 (2005)
Mathematical and Numerical Modelling of Fluid Flow in Elastic Tubes

E. Bänsch^1, O. Goncharova^{2,3}, A. Koop^4, and D. Kröner^5

1 Institute of Applied Mathematics III, University of Erlangen-Nuremberg, Haberstraße 2, 91058 Erlangen, Germany
[email protected]
2 Altai State University, pr. Lenina 61, Barnaul, 656049 Russia
3 M.A. Lavrentyev Institute of Hydrodynamics SB RAS, Lavrentyev Ave. 15, Novosibirsk, 630090, Russia
[email protected]
4 Sternenberg 19, 42279 Wuppertal, Germany
[email protected]
5 Section of Applied Mathematics, University of Freiburg, Hermann-Herder-Straße 10, 79104 Freiburg i. Br., Germany
[email protected]
Abstract. The study of fluid flow inside compliant vessels, which are deformed under the action of the fluid, is important due to many biochemical and biomedical applications, e.g. the flows in blood vessels. The mathematical problem consists of the 3D Navier-Stokes equations for incompressible fluids coupled with differential equations which describe the displacements of the vessel wall (or elastic structure). We study the fluid flow in a tube with different types of boundaries (inflow boundary, outflow boundary and elastic wall) and prescribe different boundary conditions of Dirichlet and Neumann type on these boundaries. The velocity of the fluid on the elastic wall is given by the deformation velocity of the wall. In this publication we present the mathematical modelling of the elastic structures based on shell theory, the simplifications for cylinder-type shells, the simplifications for arbitrary shells under special assumptions, the mathematical model of the coupled problem, and some numerical results for the pressure-drop problem with a cylindrical elastic structure.
1 Introduction
The mathematical investigation of blood flow in the human circulatory system is one of the major challenges of recent years (see [17,18,19] and [3,4,5] for analytical results on some mathematical models). In the mathematical modelling of the deformation of vessels we start from the knowledge that the mechanical properties of blood vessels are very different. Using the experimental data on the vessel structure, we can consider the arteries to be of elastic type [11,18]. Thus we would like to study the coupled problem of fluid
E. Krause et al. (Eds.): Comp. Science & High Perf. Computing III, NNFM 101, pp. 102-121, 2008.
c Springer-Verlag Berlin Heidelberg 2008 springerlink.com
flow in domains with elastic thin boundaries using elastic shell theory. We model the deformation of the elastic structure considered as a thin shell and define a shell as a body for which two typical scales (length and width) are significantly larger than the thickness. From the geometrical point of view a shell is defined by its middle surface. The common geometrical theory of surfaces is usually used in shell theory [10,12,16,21,22]. Physically, a shell differs from rods and membranes in the character of its deformations and in the complex connections between expansion and compression forces [13]. We consider elastic shells consisting of linear elastic material. For this type of elastic body Hooke's law is used (the law relating the stresses and strains) [1]. On the basis of the variational Hamilton principle we use the Timoshenko hypothesis in order to obtain a general system of shell equations, and the Love-Kirchhoff hypothesis for its further simplification. Both hypotheses are characterized geometrically as hypotheses about the rectilinear element or the normal element to the middle surface. Together with the geometrical character of these hypotheses, some mechanical assumptions about the character of the stresses on the shell should be taken into consideration. The simplifications of the differential equations of thin shells for arbitrary shells and for cylinder-type shells are based on a scaling analysis and some hypotheses. For cylinder-type geometry we obtain a differential wave equation which is similar to the Quarteroni model for a 1D elastic structure (see [18,19]). For arbitrary geometry under special hypotheses we obtain a differential equation with respect to time which is similar to the equation considered in [14].
We model the fluid flow with the help of the instationary 3D Navier-Stokes equations for an incompressible fluid in a domain with different types of boundaries (inflow boundary, outflow boundary and elastic wall) and prescribe different boundary conditions of Dirichlet and Neumann type on these boundaries. The velocity of the fluid on the elastic wall is given by the deformation velocity of the wall. In this sense the idea of the coupling is to consider the elastic wall as a Dirichlet-type boundary for the fluid flow. The elastic boundary equation can be viewed as a balance of forces on the elastic structure and is used for the simulation of the displacements of the elastic structure. The second condition on the elastic boundary is the continuity of the velocity vector on the moving boundary; this condition is similar to the kinematic condition on a free boundary. We use the numerical methods and the ideas developed in [2,9] for the calculation of flows in time-dependent domains with Dirichlet, Neumann and free boundaries. A stable finite element discretization by Taylor-Hood elements and a stable time discretization by the ϑ-scheme are used in the solver for the initial boundary value problems for the Navier-Stokes equations (NSE). For the cylinder-type shell we solve an initial boundary value problem for the elastic-structure wave equation considered in a weak formulation. We discretize this equation in time using second-order finite differences with respect to time and a version of the Crank-Nicolson scheme.
E. Bänsch et al.
In this paper we briefly present some results of numerical experiments carried out with our NSE-structure solver for realistic values of the physical parameters of the fluid and the elastic structure. The results demonstrate waves travelling along the elastic cylindrical structure.
2 General Assumptions of Linear Shell Theory
The construction of the elastic shell equations is based on the elasticity theory under some geometrical and mechanical assumptions similar to [10,12,16,21,22].
(A1) The shell is considered to be thin, i.e. h/R ≪ 1, where h is the shell thickness, R = min(R1, R2) and R1, R2 are the principal curvature radii of the middle surface. (In engineering research a shell is considered to be thin if h/R ≤ 1/20 [8].) This assumption allows one to neglect the small terms of order h/R in comparison with unity.
(A2) The shell equations are derived using versions of the classical Timoshenko and Love-Kirchhoff hypotheses. These kinematic hypotheses are natural for thin shells.
– According to the Timoshenko hypothesis (see [12,13,20,21]) a rectilinear control element of the shell, which was normal to the middle surface, remains rectilinear.
– According to the Love-Kirchhoff hypothesis (see [10,12]) points which lie on one line normal to the undeformed middle surface also lie on one line normal to the deformed middle surface.
The main distinction between the two hypotheses is that the first one approximately takes the cross-shear deformations into account.
(A3) We assume, as in the classical shell theory, that the displacements are small compared to the diameter of the shell and to the maximal curvature radii.
(A4) The effect of some components of the stress tensor may be neglected in comparison with other components of this tensor.
(A5) The shell material is supposed to be isotropic and homogeneous and to obey Hooke's law for the dependence of the stress tensor on the deformation tensor.
Let us consider a thin shell of constant thickness h and a parametrization of its middle surface by smooth functions x = x(α, β), y = y(α, β), z = z(α, β)
(1)
or r = r(α, β), (α, β) ∈ Ω ⊂ R².
The orthogonal basis on the middle surface is denoted by (e1 , e2 , n), where e1 , e2 are the unit tangential vectors for coordinate lines (α), (β): e1 = A−1 ∂α r,
Mathematical and Numerical Modelling of Fluid Flow in Elastic Tubes
e2 = B−1 ∂β r and n = e1 × e2. Here A, B are the Lamé parameters expressed as A² = (∂α x)² + (∂α y)² + (∂α z)², B² = (∂β x)² + (∂β y)² + (∂β z)². (2) The coordinate system is chosen so that the coordinate lines α (β = const) and β (α = const) coincide with the lines of the principal surface curvatures. We denote the principal curvatures of the coordinate lines (α), (β) by k1 and k2 respectively (k1 = −A−1 ∂α n · e1, k2 = −B−1 ∂β n · e2). The Lamé parameters and the principal surface curvatures are connected by the Codazzi conditions ∂β(k1 A) = k2 ∂β A, ∂α(k2 B) = k1 ∂α B
(3)
and due to the Gauss condition ∂α (A−1 ∂α B) + ∂β (B −1 ∂β A) = −ABK
(4)
with the notation K = k1 k2 for the Gauss curvature of the surface (see [12]).
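As a concrete illustration of these definitions (our sketch, not part of the original text), the Lamé parameters (2) and the principal curvatures can be evaluated numerically for a given parametrization. For the cylinder r(α, β) = (R cos β, R sin β, α) with R = 0.5 one expects A = 1, B = R, k1 = 0 and |k2| = 1/R:

```python
import numpy as np

# Illustrative sketch (not from the paper): evaluate the Lamé parameters
# A, B from (2) and the principal curvatures k1 = -A^{-1} ∂α n · e1,
# k2 = -B^{-1} ∂β n · e2 by central finite differences, for a cylinder.
R = 0.5
eps = 1e-5  # finite-difference step

def r(a, b):
    return np.array([R * np.cos(b), R * np.sin(b), a])

def unit_normal(a, b):
    ra = (r(a + eps, b) - r(a - eps, b)) / (2 * eps)  # ∂α r
    rb = (r(a, b + eps) - r(a, b - eps)) / (2 * eps)  # ∂β r
    n = np.cross(ra, rb)
    return n / np.linalg.norm(n)

def lame_and_curvatures(a, b):
    ra = (r(a + eps, b) - r(a - eps, b)) / (2 * eps)
    rb = (r(a, b + eps) - r(a, b - eps)) / (2 * eps)
    A, B = np.linalg.norm(ra), np.linalg.norm(rb)   # Lamé parameters (2)
    e1, e2 = ra / A, rb / B                          # unit tangent vectors
    na = (unit_normal(a + eps, b) - unit_normal(a - eps, b)) / (2 * eps)
    nb = (unit_normal(a, b + eps) - unit_normal(a, b - eps)) / (2 * eps)
    return A, B, -np.dot(na, e1) / A, -np.dot(nb, e2) / B

A, B, k1, k2 = lame_and_curvatures(0.3, 0.7)
print(A, B, k1, k2)
```

For this parametrization the Codazzi and Gauss conditions (3), (4) hold trivially, since A, B, k1, k2 are constant and K = k1 k2 = 0.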
3 Deformations on Thin Shells
(A6) We suppose that the middle surface is deformed and all its points have obtained the displacements u(α, β, t) = u(α, β, t)e1(α, β) + v(α, β, t)e2(α, β) + w(α, β, t)n(α, β). We define the deformations of the middle surface as the relative deformations (strains) ε1, ε2 of the coordinate lines and the shear deformation γ (the change of the angle between the coordinate lines, i.e. of the shear angle of the middle surface). The following expressions for the relative deformations and the angle changes can be obtained [16,21]:
ε1 = ε1(u, Du) = A−1 ∂α u + (AB)−1 ∂β A v − k1 w + (1/2)θ1²,
ε2 = ε2(u, Du) = B−1 ∂β v + (AB)−1 ∂α B u − k2 w + (1/2)θ2²,
γ = γ(u, Du) = A−1 ∂α v + B−1 ∂β u − (AB)−1 (u ∂β A + v ∂α B) + θ1 θ2,
(5)
where the angles θ1, θ2 of rotation of the normal to the middle surface in a linear approximation are defined by θ1 = A−1 ∂α w + k1 u, θ2 = B−1 ∂β w + k2 v
(6)
(see [16,22]). Note that the linear relations (5) without the terms θ1², θ2², θ1θ2 can also be used in the shell theory (see [12,22]). We now consider the shell of thickness h at time t = 0 as the set of points rξ(α, β) = r(α, β) + ξn(α, β) located at the distance ξ, −h/2 ≤ ξ ≤ h/2, from
the middle surface, and will define the deformations at the shell points rξ(α, β) [16,22]. We assume a kinematic hypothesis which is a generalization of the Love-Kirchhoff one: the hypothesis of rectilinearity of the normal element [13,20,22]. Let ψ1(α, β, t) and ψ2(α, β, t) be the angles of rotation of the control rectilinear element around e2, e1 respectively.
(A7) We assume that the displacements of the shell points rξ (not only of the middle surface) for every time level t can be given approximately by linear expressions with respect to ξ:
uξ(α, β, t) = u(α, β, t) + ξψ1(α, β, t),  vξ(α, β, t) = v(α, β, t) + ξψ2(α, β, t),  wξ(α, β, t) = w(α, β, t).
(7)
These are functions of the coordinates α, β of the middle surface points and of the time t. The unknown functions u, v, w, ψ1, ψ2 describing the deformation of the middle surface are the kinematic functions. For the sake of simplicity we have assumed that wξ = w; the more general case wξ = w + ξψ3 is considered in [22].
(A8) For the relative deformations at the points rξ we assume that they depend linearly on ξ [16,22]:
ε11 = ε11(α, β, t) = ε1(α, β, t) + ξκ̄1(α, β, t),
ε22 = ε22(α, β, t) = ε2(α, β, t) + ξκ̄2(α, β, t),
γ̄ = γ̄(α, β, t) = γ(α, β, t) + 2ξτ̄(α, β, t),
where
(8)
κ̄1 = κ̄1(u, Du, ψ1, ψ2) = A−1 ∂α ψ1 + ψ2 (AB)−1 ∂β A + k1 (A−1 ∂α u + (AB)−1 ∂β A v − k1 w),
κ̄2 = κ̄2(u, Du, ψ1, ψ2) = B−1 ∂β ψ2 + ψ1 (AB)−1 ∂α B + k2 (B−1 ∂β v + (AB)−1 ∂α B u − k2 w),
2τ̄ = 2τ̄(u, Du, ψ1, ψ2) = A−1 ∂α ψ2 + B−1 ∂β ψ1 − (AB)−1 (ψ1 ∂β A + ψ2 ∂α B) + k1 (A−1 ∂α v − (AB)−1 ∂β A u) + k2 (B−1 ∂β u − (AB)−1 ∂α B v).
(9)
The deformations are also characterized by θ1 (α, β, t) − ψ1 (α, β, t), θ2 (α, β, t) − ψ2 (α, β, t).
(10)
The differences (10) define the deformation of the cross displacements (cross shears). The modelling of the shell deformations based on (7)-(10) is known as a realization of the Timoshenko shell theory [21,22].
4 Equations for Shell Movement
We use the variational Hamilton principle to obtain the equations of motion for arbitrary mechanical systems [1,12,16,21,23]. According to this principle, for the transition of the system from one state to another during the time interval [t0, t1] the functional I = ∫_{t0}^{t1} (K − Π + A) dt attains a stationary value. Here K is the
kinetic energy of the system, Π is its potential energy (the complete potential energy of deformation) and A is the work of the external forces (see [12,16,21] for the definitions of K, Π, A). The first variation of this functional should be equal to zero. The Euler equation for the functional I then defines the equations of motion of the whole system. We suppose the functions K, Π, A to be regular and do not discuss the questions of the shell boundaries and boundary conditions. In addition to the Hamilton principle, damping forces should be taken into consideration. These damping forces may arise from sources such as internal friction due to imperfect elasticity of bodies [20]. As usual, the damping forces are taken to be proportional to the velocity of the moving elastic surface.
(A9) We postulate that the damping forces are of the form ch u̇, ch v̇, ch ẇ, c(h³/12) ψ̇1, c(h³/12) ψ̇2.
Thus under the assumptions (A1)-(A9) the following system of differential equations can be obtained, written in terms of the displacements u, v, w, the angles ψ1, ψ2, the normal forces N1(u, Du), N2(u, Du), the tangential force S(u, Du), the cross-cutting forces Q1(u, Du), Q2(u, Du), the bending moments M1(u, Du, D²u), M2(u, Du, D²u) and the twisting (turning) moment H(u, Du, D²u):

ρw h ü + chu̇ − (AB)−1 {∂α(BN1) − N2 ∂α B + ∂β(AS) + S ∂β A} + k1 Q1 + k1 (Sθ2 + N1 θ1) + (AB)−1 {−∂α(BM1 k1) + M2 ∂α B k2 − H ∂β A k1 − ∂β(HA k2)} = p1,  (11)

ρw h v̈ + chv̇ − (AB)−1 {∂β(AN2) − N1 ∂β A + ∂α(BS) + S ∂α B} + k2 Q2 + k2 (Sθ1 + N2 θ2) + (AB)−1 {−∂β(AM2 k2) + M1 ∂β A k1 − H ∂α B k2 − ∂α(HB k1)} = p2,  (12)

ρw h ẅ + chẇ − (AB)−1 {∂α(BQ1) + ∂β(AQ2)} − k1 N1 − k2 N2 − k1² M1 − k2² M2 − (AB)−1 ∂α B (N1 θ1 + Sθ2) − (AB)−1 ∂β A (N2 θ2 + Sθ1) = pn,  (13)
ρw (h³/12) ψ̈1 + c (h³/12) ψ̇1 − Q1 − (AB)−1 {∂α(BM1) − M2 ∂α B + ∂β(AH) + H ∂β A} = 0,  (14)
ρw (h³/12) ψ̈2 + c (h³/12) ψ̇2 − Q2 − (AB)−1 {∂β(AM2) − M1 ∂β A + ∂α(BH) + H ∂α B} = 0.  (15)

Here ρw is the structure density, p1, p2, pn are the external loads and c is a damping viscosity coefficient. We assume that ρw, h, c, p1, p2, pn are given. This system consists of nonlinear wave equations. The nonlinearity is of geometrical type, due to the rotation angles of the normal to the middle surface. Further, physical nonlinearities would appear if a nonlinear elastic law for the dependence of the stresses σij on the corresponding strains εij were used for the shell material. Under special assumptions the definitions of N1, N2, S, Q1, Q2, M1, M2, H are given in Section 4.2.
4.1 Hooke's Law for Thin Shells
We consider the displacements u, v, w and the angles ψ1, ψ2 as the basic unknown functions in the equations (11)-(15). The forces N1, N2, S and the moments M1, M2, H in (11)-(15) are calculated from their physical definitions in terms of the stresses, using Hooke's law of elasticity (16) [12,16,21]. For a thin shell Hooke's law gives the following dependence of the stresses σ11, σ22, σ12, σ21 on the deformations ε11, ε22, ε12, ε21 (ε12 = ε21 = γ̄/2) [6,12]:

σ11 = E/(1 − ν²) (ε11 + νε22),  σ22 = E/(1 − ν²) (ε22 + νε11),  σ12 = σ21 = E/(2(1 + ν)) γ̄.  (16)
Here E is the Young modulus and ν the Poisson coefficient. We will not use the relations for the stresses σ13 and σ23 to define the cross-cutting forces Q1, Q2 (Section 4.2); the definition of these forces is also discussed in [16,21].
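As a small numerical illustration of the plane-stress law (16) (our sketch; the material constants and strain values below are invented, not taken from the paper):

```python
# Plane-stress Hooke's law (16) for a thin shell. E_mod and nu are
# hypothetical material constants; the strains are arbitrary test values.
E_mod, nu = 2.0e5, 0.3   # Young modulus and Poisson coefficient

def hooke_stresses(eps11, eps22, gamma_bar):
    s11 = E_mod / (1 - nu**2) * (eps11 + nu * eps22)
    s22 = E_mod / (1 - nu**2) * (eps22 + nu * eps11)
    s12 = E_mod / (2 * (1 + nu)) * gamma_bar   # σ12 = σ21
    return s11, s22, s12

s11, s22, s12 = hooke_stresses(1e-3, -2e-4, 5e-4)
print(s11, s22, s12)
```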
4.2 Equations for Shell Movement under the Love-Kirchhoff Hypothesis
The equations (11)-(15) are rather general for describing the movement of thin shells. For further simplification we use some additional assumptions.
(A10) We use the Love-Kirchhoff hypothesis in the sense that ψ1 = θ1, ψ2 = θ2 (see (6), (10)).
(A11) Ignoring the physical definition of the cross-cutting forces Q1, Q2 on the shell, we introduce Q1 and Q2 as differential operators on the bending and turning moments. In other words, the equations (14), (15) are considered as the definitions of the cross-cutting forces Q1, Q2.
We repeat the most essential definitions (5) for the deformations in the geometrically linear case without θi² (i = 1, 2) and θ1θ2 [10], namely:
ε1 = A−1 ∂α u + v(AB)−1 ∂β A − k1 w,  ε2 = B−1 ∂β v + u(AB)−1 ∂α B − k2 w,
γ = A−1 ∂α v + B−1 ∂β u − (AB)−1 (u ∂β A + v ∂α B). The formulae (9) can be rewritten as follows:
κ1 = A−1 ∂α(A−1 ∂α w + k1 u) + (AB)−1 ∂β A (B−1 ∂β w + k2 v) + k1 (A−1 ∂α u + (AB)−1 ∂β A v − k1 w),
κ2 = B−1 ∂β(B−1 ∂β w + k2 v) + (AB)−1 ∂α B (A−1 ∂α w + k1 u) + k2 (B−1 ∂β v + (AB)−1 ∂α B u − k2 w),
τ = (AB)−1 (∂αβ w − A−1 ∂β A ∂α w − B−1 ∂α B ∂β w) + k1 (B−1 ∂β u − u(AB)−1 ∂β A) + k2 (A−1 ∂α v − v(AB)−1 ∂α B) + (1/2)(k1 − k2)(A−1 ∂α v − u(AB)−1 ∂β A − B−1 ∂β u + v(AB)−1 ∂α B),
where we omit the bar over κ1, κ2, τ. For the forces and moments N1, N2, S, M1, M2, H we use the following expressions [12,16,22]:

N1 = Eh/(1 − ν²) (ε1 + νε2),  N2 = Eh/(1 − ν²) (ε2 + νε1),  S = Eh/(2(1 + ν)) γ,  (17)

M1 = Eh³/(12(1 − ν²)) (κ1 + νκ2),  M2 = Eh³/(12(1 − ν²)) (κ2 + νκ1),  H = Eh³/(12(1 + ν)) τ.
In order to shorten the notation in (18)-(22) we define

Φ1(u, Du, D²u, D³u, Q1) := −(AB)−1 {∂α B (N1 − N2) + B ∂α N1 + A ∂β S + 2S ∂β A} + k1 Q1 + k1 (Sθ2 + N1 θ1) + (AB)−1 {−∂α(BM1 k1) + M2 ∂α B k2 − H ∂β A k1 − ∂β(HA k2)},

Φ2(u, Du, D²u, D³u, Q2) := −(AB)−1 {−∂β A (N1 − N2) + A ∂β N2 + B ∂α S + 2S ∂α B} + k2 Q2 + k2 (Sθ1 + N2 θ2) + (AB)−1 {−∂β(AM2 k2) + M1 ∂β A k1 − H ∂α B k2 − ∂α(HB k1)},

Φ3(u, Du, D²u, D³u, Q1, Q2, DQ1, DQ2) := −(AB)−1 {Q1 ∂α B + Q2 ∂β A + B ∂α Q1 + A ∂β Q2} − k1 N1 − k2 N2 − k1² M1 − k2² M2 − (AB)−1 ∂α B (N1 θ1 + Sθ2) − (AB)−1 ∂β A (N2 θ2 + Sθ1),

Φ4(u, Du, D²u, D³u) := −(AB)−1 {∂α(BM1) − M2 ∂α B + ∂β(AH) + H ∂β A},

Φ5(u, Du, D²u, D³u) := −(AB)−1 {∂β(AM2) − M1 ∂β A + ∂α(BH) + H ∂α B}.

Then our mathematical model for the thin shell (11)-(15), with the help of the assumptions (A10), (A11), can be written as:

ρw h ü + chu̇ + Φ1(u, Du, D²u, D³u, Q1) = p1,  (18)
ρw h v̈ + chv̇ + Φ2(u, Du, D²u, D³u, Q2) = p2,  (19)

ρw h ẅ + chẇ + Φ3(u, Du, D²u, D³u, Q1, Q2, DQ1, DQ2) = pn,  (20)
Q1 = ρw (h³/12) ψ̈1 + c (h³/12) ψ̇1 + Φ4(u, Du, D²u, D³u),  (21)

Q2 = ρw (h³/12) ψ̈2 + c (h³/12) ψ̇2 + Φ5(u, Du, D²u, D³u),  (22)
ψ1 = θ1 = A−1 ∂α w + k1 u,  (23)

ψ2 = θ2 = B−1 ∂β w + k2 v.  (24)
5 Scaling Analysis for the Shell Equations
Let α, β be the dimensionless coordinates for the parametrization (1): (α, β) ∈ [0, α0] × [0, β0]. Let us use the characteristic length x̄ for the scaling of x, y and z̄ for the scaling of z. We determine the characteristic scales for the tangential displacements u, v as ū and for the displacement w as w̄. We consider τ∗ as a characteristic time for the structure processes and t∗ as a characteristic time of the fluid flow, such that τ∗ = (w̄/z̄) t∗. Additionally we use the fluid density ρ∗ as a characteristic density value, ρ∗/τ∗ as a characteristic value of the damping coefficient c and the characteristic pressure p∗ = ρ∗v∗² as a characteristic value of the external loads p1, p2, pn. The characteristic fluid velocity v∗ is defined by v∗ = z̄/t∗. We keep the previous notations for the dimensionless functions. Then for the elastic structure equations we obtain the following nondimensional form of (18)-(22):

(Ep Eu/Eh²) [ρw ü + cu̇] + (1/Eh³) Φ1(u, Du, D²u, D³u, Q1) = (Ep/Eh³)(1/Re) p1,  (25)

(Ep Eu/Eh²) [ρw v̈ + cv̇] + (1/Eh³) Φ2(u, Du, D²u, D³u, Q2) = (Ep/Eh³)(1/Re) p2,  (26)

(Ep/Eh²) [ρw ẅ + cẇ] + (1/Eh³) Φ3(u, Du, D²u, D³u, Q1, Q2, DQ1, DQ2) = (Ep/Eh³) pn,  (27)
Ep [ρw θ̈1 + cθ̇1] − Ew² Q1 + Φ4(u, Du, D²u, D³u) = 0,  (28)

Ep [ρw θ̈2 + cθ̇2] − Ew² Q2 + Φ5(u, Du, D²u, D³u) = 0,  (29)
where

Φ1(u, Du, D²u, D³u, Q1) = −Eh Ew² (AB)−1 {∂α(BN1) − N2 ∂α B + Eu [∂β(AS) + S ∂β A]} + (1/12) Eh³ Ew⁴ [k1 Q1] + Eh Ew³ k1 [Eu Sθ2 + N1 θ1] + (1/12) Eh³ Ew⁴ (AB)−1 {−∂α(BM1 k1) + M2 ∂α B k2 − H ∂β A k1 − ∂β(HA k2)},

Φ2(u, Du, D²u, D³u, Q2) = −Eh Ew² (AB)−1 {∂β(AN2) − N1 ∂β A + Eu [∂α(BS) + S ∂α B]} + (1/12) Eh³ Ew⁴ [k2 Q2] + Eh Ew³ k2 [Eu Sθ1 + N2 θ2] + (1/12) Eh³ Ew⁴ (AB)−1 {−∂β(AM2 k2) + M1 ∂β A k1 − H ∂α B k2 − ∂α(HB k1)},

Φ3(u, Du, D²u, D³u, Q1, Q2, DQ1, DQ2) = −(1/12) Eh³ Ew⁴ (AB)−1 {∂α(BQ1) + ∂β(AQ2)} − Eh Ew² [k1 N1 + k2 N2] − (1/12) Eh³ Ew⁴ [k1² M1 + k2² M2] − Eh Ew³ {(AB)−1 ∂α B (N1 θ1 + Eu Sθ2) + (AB)−1 ∂β A (N2 θ2 + Eu Sθ1)},

Φ4(u, Du, D²u, D³u) = −Ew² {(AB)−1 [∂α(BM1) − M2 ∂α B + ∂β(AH) + H ∂β A]},

Φ5(u, Du, D²u, D³u) = −Ew² {(AB)−1 [∂β(AM2) − M1 ∂β A + ∂α(BH) + H ∂α B]}.
We notice that some small dimensionless parameters have been obtained during the scaling procedure of the equations (18)-(24), namely:

Eu = ū/w̄,  Ew = w̄/z̄,  Ep = p∗/E  and  1/Re.

(A12) We assume that the parameter Eh = h/w̄ is of order 1. These parameters make it possible to show the different character of the displacements of the middle surface in the tangential and normal directions (Eu), to compare the characteristic values of the normal displacements with the shell thickness (Eh) and with the characteristic length scale (Ew), and also to give some analysis of the coupling of the fluid flow and the elastic structure movement (Ep, Ew).
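To make the orders of magnitude concrete, the dimensionless groups can be evaluated for a set of hypothetical vessel-flow parameters (the numbers below are our assumptions for illustration only, not values used by the authors):

```python
# Illustrative only: evaluating the dimensionless groups of the scaling
# analysis for hypothetical physical parameters (all values invented).
rho_star = 1.05e3      # fluid density [kg/m^3]
E_young  = 3.0e5       # Young modulus of the wall [Pa]
z_bar    = 1.0e-2      # characteristic length [m]
w_bar    = 1.0e-3      # characteristic normal displacement [m]
u_bar    = 1.0e-4      # characteristic tangential displacement [m]
h_shell  = 1.0e-3      # shell thickness [m]
v_star   = 0.3         # characteristic fluid velocity [m/s]
mu       = 4.0e-3      # dynamic viscosity [Pa s]

p_star = rho_star * v_star**2          # characteristic pressure p* = ρ* v*²
Re     = rho_star * v_star * z_bar / mu

E_u = u_bar / w_bar    # tangential vs. normal displacement scale
E_w = w_bar / z_bar    # normal displacement vs. length scale
E_p = p_star / E_young # fluid pressure vs. wall stiffness
E_h = h_shell / w_bar  # shell thickness vs. normal displacement (≈ 1, A12)

print(E_u, E_w, E_p, E_h, 1 / Re)
```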
6 Simplifications of the Shell Equations for the Case with Normal Displacements Only
In this section we will simplify the nondimensional shell equations (25)-(29).
6.1 Dominant Normal Displacements for General Shells
(A13) We assume that the shell displacements in the normal direction w are dominant in comparison with the tangential displacements u, v, and continue the scaling with different scales for the tangential and normal displacements: Eu ≪ 1. Then, after neglecting the terms with Eu, we obtain from the equations (25)-(29) the following system:

−Ew² {(AB)−1 [∂α(BN1) − N2 ∂α B]} + (1/12) Eh² Ew⁴ [k1 Q1] + Ew³ k1 N1 θ1 + (1/12) Eh² Ew⁴ (AB)−1 {−∂α(BM1 k1) + M2 ∂α B k2 − H ∂β A k1 − ∂β(HA k2)} = (Ep/Eh)(1/Re) p1,  (30)

−Ew² {(AB)−1 [∂β(AN2) − N1 ∂β A]} + (1/12) Eh² Ew⁴ [k2 Q2] + Ew³ k2 N2 θ2 + (1/12) Eh² Ew⁴ (AB)−1 {−∂β(AM2 k2) + M1 ∂β A k1 − H ∂α B k2 − ∂α(HB k1)} = (Ep/Eh)(1/Re) p2,  (31)

Ep [ρw ẅ + cẇ] − (1/12) Eh² Ew⁴ {(AB)−1 [∂α(BQ1) + ∂β(AQ2)]} − Ew² [k1 N1 + k2 N2] − (1/12) Eh² Ew⁴ [k1² M1 + k2² M2] − Ew³ {(AB)−1 ∂α B (N1 θ1) + (AB)−1 ∂β A (N2 θ2)} = (Ep/Eh) pn,  (32)

(Ep/Ew²) [ρw θ̈1 + cθ̇1] − Q1 − {(AB)−1 [∂α(BM1) − M2 ∂α B + ∂β(AH) + H ∂β A]} = 0,  (33)

(Ep/Ew²) [ρw θ̈2 + cθ̇2] − Q2 − {(AB)−1 [∂β(AM2) − M1 ∂β A + ∂α(BH) + H ∂α B]} = 0,  (34)

where for the forces N1, N2, S and moments M1, M2, H we have the following expressions after neglecting the terms with Eu (see (17) for the calculations):
N1 = (1/(1 − ν²)) {(−w)(k1 + νk2)},  N2 = (1/(1 − ν²)) {(−w)(k2 + νk1)},

M1 = (1/(1 − ν²)) {[A−1 ∂α(A−1 ∂α w) + A−1 B−2 ∂β A ∂β w − k1² w] + ν [B−1 ∂β(B−1 ∂β w) + A−2 B−1 ∂α B ∂α w − k2² w]},

M2 = (1/(1 − ν²)) {[B−1 ∂β(B−1 ∂β w) + A−2 B−1 ∂α B ∂α w − k2² w] + ν [A−1 ∂α(A−1 ∂α w) + A−1 B−2 ∂β A ∂β w − k1² w]},

H = (1/(1 + ν)) (AB)−1 [∂αβ w − A−1 ∂β A ∂α w − B−1 ∂α B ∂β w].  (35)

For θ1, θ2 we have θ1 = A−1 ∂α w, θ2 = B−1 ∂β w.  (36)
With the help of (35), (36) we compute the cross-cutting forces Q1, Q2 from (33), (34). Then we obtain

Q1 = −(1/(1 − ν²)) A−1 {∂α Δw + ∂α w K(1 − ν)} + (Ep/Ew²) A−1 ∂α(ρw ∂tt w + c ∂t w) − (1/(1 − ν²)) (AB)−1 Φ̃1,  (37)

Q2 = −(1/(1 − ν²)) B−1 {∂β Δw + ∂β w K(1 − ν)} + (Ep/Ew²) B−1 ∂β(ρw ∂tt w + c ∂t w) − (1/(1 − ν²)) (AB)−1 Φ̃2,  (38)

where

Φ̃1 = ∂α [−w B (k1² + νk2²)] + w ∂α B (k2² + νk1²),
Φ̃2 = ∂β [−w A (k2² + νk1²)] + w ∂β A (k1² + νk2²)
and Δ is the Laplace operator in the curvilinear orthogonal coordinates α, β Δ = (AB)−1 [∂α (BA−1 ∂α ) + ∂β (AB −1 ∂β )].
(39)
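As a quick numerical sanity check of (39) (our addition, under the assumption of a cylindrical geometry with constant Lamé parameters A = 1, B = R, for which (39) reduces to Δf = ∂αα f + R⁻² ∂ββ f), a second-order periodic finite-difference discretization reproduces the analytic value Δf = −(1 + 4/R²) f for the test function f = sin α cos 2β:

```python
import numpy as np

# Sketch (not the authors' code): check the operator (39) on a cylinder
# with constant Lamé parameters A = 1, B = R, where it reduces to
# Δf = ∂αα f + R^{-2} ∂ββ f. For f = sin(α) cos(2β), Δf = -(1 + 4/R²) f.
R = 0.5
n = 400
a = np.linspace(0, 2 * np.pi, n, endpoint=False)
b = np.linspace(0, 2 * np.pi, n, endpoint=False)
da, db = a[1] - a[0], b[1] - b[0]
aa, bb = np.meshgrid(a, b, indexing="ij")
f = np.sin(aa) * np.cos(2 * bb)

def laplace_beltrami_cyl(f):
    # periodic second-order central differences in α (axis 0) and β (axis 1)
    faa = (np.roll(f, -1, 0) - 2 * f + np.roll(f, 1, 0)) / da**2
    fbb = (np.roll(f, -1, 1) - 2 * f + np.roll(f, 1, 1)) / db**2
    return faa + fbb / R**2

err = np.max(np.abs(laplace_beltrami_cyl(f) + (1 + 4 / R**2) * f))
print(err)  # second-order discretization error
```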
The operator Δ defined in (39) is the so-called Laplace-Beltrami operator on the middle surface. In order to compare with the usual Gaussian notation let us compute g11 = A², g22 = B², g12 = g21 = 0, so that √g = √(g11 g22 − g12²) = AB, g¹¹ = 1/A², g²² = 1/B², g¹² = g²¹ = 0 and

Δf = ΔX f = (1/√g) ∂i {√g g^ij ∂j f} = (1/√g) {∂1(√g g¹¹ ∂1 f) + ∂2(√g g²² ∂2 f)},

where ∂1 = ∂α, ∂2 = ∂β, and ΔX (or sometimes Δ) is used to denote the Laplace-Beltrami operator. In the following we use the notation Δ for (39). Now we can rewrite (32) as follows:

Ep [ρw ẅ + cẇ] + (1/12) Eh² Ew⁴ (1/(1 − ν²)) Δ Δw + (1/12) Eh² Ew⁴ (1/(1 + ν)) Δ(K w) − (1/12) Eh² Ew² Ep Δ(ρw ∂tt w + c ∂t w) − (1/12) Eh² Ew⁴ (1/(1 − ν²)) Δ(w(k1² + νk2²)) − Ew² [k1 N1 + k2 N2] + (Ep/Eh) Φ̃(w, ∇w, k1, k2) = pn.  (40)
Here

Φ̃(w, Dw, D²w, k1, k2) = −(1/12) Eh² Ew⁴ (1/(1 + ν)) (AB)−1 {∂α [A−1 B ∂α K w] + ∂β [A B−1 ∂β K w]} + (1/12) Eh² Ew⁴ (1/(1 + ν)) (AB)−1 {∂α [∂α B A−1 w(k2² − k1²)] + ∂β [∂β A B−1 w(k1² − k2²)] + A B−1 ∂β w (k1² − k2²)} − (1/12) Eh² Ew⁴ (1/(1 − ν²)) {k1² [A−1 ∂α(A−1 ∂α w) + A−1 B−2 ∂β A ∂β w] + k2² [B−1 ∂β(B−1 ∂β w) + A−2 B−1 ∂α B ∂α w] − (k1⁴ + k2⁴) w + ν [k1² (B−1 ∂β(B−1 ∂β w) + A−2 B−1 ∂α B ∂α w) + k2² (A−1 ∂α(A−1 ∂α w) + A−1 B−2 ∂β A ∂β w) − 2K² w]} − Ew³ {(AB)−1 ∂α B A−1 N1 ∂α w + (AB)−1 ∂β A B−1 N2 ∂β w}.

(A14) We propose to use the definitions from (35) of the normal forces N1, N2 for the main term concerning these forces, namely for k1 N1 + k2 N2. Then we can rewrite (40) as follows:

Ep [ρw ẅ + cẇ] + (1/12) Eh² Ew⁴ (1/(1 − ν²)) Δ Δw + (1/12) Eh² Ew⁴ (1/(1 − ν²)) Δ(K w) − (1/12) Eh² Ew² Ep Δ(ρw ∂tt w + c ∂t w) − (1/12) Eh² Ew⁴ (1/(1 − ν²)) Δ(w(k1² + νk2²)) + Ew² w [(1/(1 + ν)) H² − (2/(1 − ν)) K] − (1/12) Eh² Ew⁴ (1/(1 − ν²)) {k1² [A−1 ∂α(A−1 ∂α w) + A−1 B−2 ∂β A ∂β w] + k2² [B−1 ∂β(B−1 ∂β w) + A−2 B−1 ∂α B ∂α w] − (H⁴ − 4H²K + 2K²) w} + (Ep/Eh) Φ̃(w, Dw, D²w, k1, k2) = pn.  (41)
Here H is the mean curvature H = k1 + k2 of the middle surface. We recall that K = k1 k2 is the Gauss curvature. The function Φ̃(w, Dw, D²w, k1, k2) can be written in the form:

Φ̃(w, Dw, D²w, k1, k2) = −(1/12) Eh² Ew⁴ (ν/(1 − ν²)) {k1² [B−1 ∂β(B−1 ∂β w) + A−2 B−1 ∂α B ∂α w] + k2² [A−1 ∂α(A−1 ∂α w) + A−1 B−2 ∂β A ∂β w] − 2K² w} − Ew³ {(AB)−1 ∂α B A−1 N1 ∂α w + (AB)−1 ∂β A B−1 N2 ∂β w} − (1/12) Eh² Ew⁴ (1/(1 + ν)) (AB)−1 {∂α [A−1 B ∂α K w] + ∂β [A B−1 ∂β K w]} + (1/12) Eh² Ew⁴ (1/(1 + ν)) (AB)−1 {∂α [∂α B A−1 w(k2² − k1²)] + ∂β [∂β A B−1 w(k1² − k2²)] + A B−1 ∂β w (k1² − k2²)}.

(A15) Assuming that the Gauss curvature is small, we can obtain the equation (41) in the form:

Ep [ρw ẅ + cẇ] + (1/12) Eh² Ew⁴ (1/(1 − ν²)) Δ Δw − (1/12) Eh² Ew² Ep Δ(ρw ∂tt w + c ∂t w) − (1/12) Eh² Ew⁴ (1/(1 − ν²)) Δ(w(k1² + νk2²)) + Ew² w (1/(1 − ν²)) H² − (1/12) Eh² Ew⁴ (1/(1 − ν²)) {k1² [A−1 ∂α(A−1 ∂α w) + A−1 B−2 ∂β A ∂β w] + k2² [B−1 ∂β(B−1 ∂β w) + A−2 B−1 ∂α B ∂α w] − H⁴ w} + (Ep/Eh) Φ̃(w, Dw, D²w, k1, k2) = pn.  (42)

For the equation (42) we use the following expression for the function Φ̃:

Φ̃(w, Dw, D²w, k1, k2) = −(1/12) Eh² Ew⁴ (ν/(1 − ν²)) {k1² [B−1 ∂β(B−1 ∂β w) + A−2 B−1 ∂α B ∂α w] + k2² [A−1 ∂α(A−1 ∂α w) + A−1 B−2 ∂β A ∂β w]} − Ew³ {(AB)−1 ∂α B A−1 N1 ∂α w + (AB)−1 ∂β A B−1 N2 ∂β w} + (1/12) Eh² Ew⁴ (1/(1 + ν)) (AB)−1 {∂α [∂α B A−1 w(k2² − k1²)] + ∂β [∂β A B−1 w(k1² − k2²)] + A B−1 ∂β w (k1² − k2²)}.
6.2 Simplifications of the Shell Equations for the Case with Normal Displacement for Cylindrical Shells
Let us consider the radius R2 of the cylindrical middle surface as the characteristic scale of x and y: x̄ = R2. For the further considerations we suppose:
(A16) that the deformed elastic boundary is a cylinder whose radius depends on z, and that two dimensionless surface coordinates describing the cylindrical middle surface are used: the longitudinal coordinate z := α and the angular coordinate β; for weak modifications of the cylindrical middle surface the deformed dimensionless radius satisfies R(z) ≈ 1;
(A17) that during the deformations the variations of the Lamé parameters A, B are very small, such that A ≈ 1, B ≈ x̄/z̄, and their derivatives can be taken equal to zero;
(A18) that the curvature k1 is much smaller than k2, with k1 ≈ 0, k2 ≈ z̄/x̄, i.e. the reference surface given by (1) is assumed to be of cylindrical type;
(A19) that the tangential load p2 on the middle surface is very small and can be neglected.
These assumptions are geometrically consistent with the Codazzi and Gauss conditions (3), (4); the geometrical compatibility conditions (see [10,12]) are fulfilled automatically. The assumptions (A14)-(A19) and (A20) (see below) imply that the detailed model (30)-(34) reduces to (43), (44) only. We notice that the equations (31), (34) are fulfilled identically.
(A20) Instead of (30) and (42), and with the help of (37) for Q1, the following simplified form of the equations can be obtained, neglecting the dependence of w on β but keeping the terms containing Δ: −Ew² ∂z N1 =
(Ep/Eh)(1/Re) p1,  (43)

Ep [ρw ẅ + cẇ] + (1/12) Eh² Ew⁴ (1/(1 − ν²)) Δ Δw − (1/12) Eh² Ew² Ep Δ(ρw ∂tt w + c ∂t w) − (1/12) Eh² Ew⁴ (ν/(1 − ν²)) k2² Δw + Ew² (1/(1 − ν²)) k2² w + (1/12) Eh² Ew⁴ (1/(1 − ν²)) k2⁴ w − Ew³ {∂z N1 ∂z w + N1 ∂zz w} − (1/12) Eh² Ew⁴ (ν/(1 − ν²)) k2² ∂zz w = (Ep/Eh) pn.  (44)

In fact we have now proposed to ignore the definition (35) of the force N1 and to consider (43) as a compatibility condition between the tangential external load and the internal elastic force. In this case we can consider (43) as the equation defining N1, with suitable boundary conditions. The versions of equation (44) with constant coefficients and without damping term are called "the generalized rod model" or "the generalized string model" in
[18,19] and are used for numerical investigations of the 1D structure model. The generalized string model was obtained in [19] by considering the forces acting on the cylinder, under the assumption of a dominant longitudinal stress in the geometrically linear case.
6.3 Simplifications of the Shell Equations for the Case with Normal Displacement for Arbitrary Shells under Special Assumptions
We suppose:
(A21) for the membrane theory, the usual assumption [12] characterizing the dynamics of shells without flexural strain energy, i.e. we neglect the action of the bending and turning moments M1, M2, H and of the cross-cutting forces Q1, Q2, and we do not take the functions θ1, θ2 into consideration. This can be done if we want to study displacements in the regime 1/Eh ≤ 10−1, Ew ≤ 10−2, 1/Re ≤ 10−2 and Ep ≤ 10−3;
(A22) the smallness of the tangential displacements, Eu ≪ 1, and a dominant role of the displacement w in the normal direction only, without damping.
Using the assumptions (A21), (A22) we obtain from (30)-(34) the differential equation for the displacement w:

(Ep/Ew²) ρw ẅ = −w [(1/(1 + ν)) H² − (2/(1 − ν)) K] + (Ep/(Eh Ew²)) pn.  (45)

A simplification of (45) was used as an "independent-rings model" for a cylindrical elastic structure in [18] and as a "structure equation" in [15].
7 Mathematical Model of the Coupled Problem
Now let us consider the flow of an incompressible fluid in a domain Ω(t) with the elastic wall Γw(t) := {(x, y, z) | x = R2(z, t) cos β, y = R2(z, t) sin β, 0 < z < L}. The inflow and outflow boundaries are denoted by Γin and Γout respectively: Γin := {(x, y, z) | x = R cos β, y = R sin β, z = 0}, Γout := {(x, y, z) | x = R cos β, y = R sin β, z = L}. Here t ∈ (0, tend), tend < ∞, β ∈ [0, 2π), 0 < R < R20, R20 = R2(0, 0) = R2(L, 0). The mathematical model of the coupled problem consists of the initial boundary value problem for the incompressible Navier-Stokes equations (46)-(49), (53), (54) and of the initial boundary value problem for the equation of the elastic structure deformation (50), (51), (55) for the cylindrical geometry. The coupling conditions are given by (52), (53). The Navier-Stokes equations are given by

∂V/∂t + V · ∇V − (1/Re) ΔV + ∇p = 0 in Ω(t),
(46)
∇ · V = 0 in Ω(t),
(47)
where V is the velocity of the fluid and p denotes the pressure.
On the upstream Γin and downstream Γout boundaries we prescribe for all time levels t the usual set of the Dirichlet- and Neumann conditions respectively: V = Vin on Γin ,
(48)
1 ∇V · n = 0 on Γout . (49) Re A coupled approach is based on an interacting of two different mathematical models describing a fluid flow and a structure movement. To describe a deformation of the elastic structure Γw (t) we consider an equation of type (44) for the cylindrical shell rewritten in the form: pn −
∂tt w − γ Δ Δw − γ t ∂t Δw + γ w w = γ p pn on Γw (t)
(50)
with constant coefficients γ Δ , γ t , γ w , γ p . The boundary conditions for the equation (50) for all time levels t are as follows: w = 0 on Γwin and on Γwout .
(51)
Here Γwin, Γwout are for all t ∈ (0, tend) the fixed boundaries (the circumferences) of the elastic structure Γw(t): Γwin = Γw(t) ∩ Γin := {(x, y, z) | x = R20 cos β, y = R20 sin β, z = 0}, Γwout = Γw(t) ∩ Γout := {(x, y, z) | x = R20 cos β, y = R20 sin β, z = L}, where β ∈ [0, 2π). We also assume that periodic conditions for w with respect to β are fulfilled. The dynamic coupling of the fluid and the elastic structure is defined by the expression for pn:

pn = pw − (2/Re) n · D(V) n on Γw(t),  (52)

where D(V) = (1/2)(∇V + (∇V)ᵀ) is the strain rate tensor of the fluid and pw is the pressure on the boundary Γw(t). The kinematic coupling condition represents the continuity of the velocity on the elastic boundary:
Vn = ∂t w, Vs1 ,s2 = 0 on Γw (t),
(53)
where Vn = V · n and Vs1 = V · s1, Vs2 = V · s2 are the normal and tangential components of the fluid velocity on the boundary Γw(t) respectively. We assume that the movement of the fluid and of the elastic structure starts from the initial state V(0, x) = V0(x) in Ω0 = Ω(0),
(54)
w(0, x) = 0, ∂t w(0, x) = w1 = const (w1 ≥ 0), if x ∈ Γw0 = Γw (0).
(55)
We notice that n is here the outer normal with respect to the flow domain. The orders of the coefficients γΔ, γt, γw, γp can be calculated with the help of the values of the dimensionless parameters from the scaling analysis (see Section 5): γΔ is of order between 10−2 and 1, γt of order between 10−3 and 10−2, γw of order between 10 and 10², and γp of order 1. Instead of (50), another version of the equation (44) describing the normal displacements of a cylindrical elastic structure can be obtained: ∂tt w + γ̃Δ ΔΔw − γΔ Δw − γt ∂t Δw + γw w = γp pn. The assumptions (A14)-(A20) lead to this model.
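The dynamic coupling condition (52) can be made concrete at a single wall point (our illustration; the velocity gradient, pressure and normal below are invented test values, chosen so that tr ∇V = 0 in accordance with incompressibility (47)):

```python
import numpy as np

# Illustration of the dynamic coupling (52) at one wall point:
# p_n = p_w - (2/Re) n · D(V) n, with D(V) = (∇V + (∇V)ᵀ)/2.
# All numerical values are hypothetical.
Re = 300.0
grad_V = np.array([[0.1, 0.1, 0.0],
                   [0.3, -0.3, 0.0],
                   [0.0, 0.0, 0.2]])   # trace 0: divergence-free field
n = np.array([0.0, 1.0, 0.0])          # outer normal w.r.t. the flow domain
p_wall = 1.5                           # fluid pressure p_w on Γw(t)

D = 0.5 * (grad_V + grad_V.T)          # strain rate tensor D(V)
p_n = p_wall - (2.0 / Re) * (n @ D @ n)
print(p_n)
```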
7.1 Numerical Discretization for the Coupled Problem
We use the numerical methods and the ideas developed in [2,9] for the initial boundary value problems for the incompressible Navier-Stokes equations in domains with different types of boundaries. A stable finite element discretization
[Fig. 1. “Structure wave” in case Cyl-3; snapshots at t = 0.4, 0.8, 1.6, 2.0, 2.4, 3.0]
and stable time discretization are used. The numerical method for the NSE solver is based on the fractional-step ϑ-scheme. For the spatial discretization the classical Taylor-Hood elements are used, i.e. piecewise quadratics for the velocity and piecewise linears for the pressure. We discretize the equation (50) in time using second-order finite differences with respect to time and consider the weak formulation of the boundary value problem for this equation.
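The ϑ-scheme ingredient can be illustrated on a scalar model problem (our sketch, not the NSE solver itself; the actual method applies such steps to the full Navier-Stokes system). For u' = −λu a single θ-step reads (u⁺ − u)/Δt = −λ(θu⁺ + (1 − θ)u); θ = 1/2 is the second-order Crank-Nicolson case:

```python
import numpy as np

# One-dimensional illustration of the θ-scheme time discretization
# applied to the model problem u' = -lam * u. θ = 1/2 gives the
# second-order Crank-Nicolson variant.
lam, dt, nsteps, theta = 2.0, 0.05, 40, 0.5
u = 1.0
for _ in range(nsteps):
    # implicit θ-step: (u_new - u)/dt = -lam * (theta*u_new + (1-theta)*u)
    u = (1 - (1 - theta) * dt * lam) / (1 + theta * dt * lam) * u

exact = np.exp(-lam * dt * nsteps)   # e^{-4}
print(u, exact)
```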
7.2 Pressure-Drop Problem with Cylindrical Elastic Structure
Let the elastic structure Γw be given at the initial time t = 0 as the cylindrical surface {(x, y, z) | x = R20 cos β, y = R20 sin β, β ∈ [0, 2π), 0 ≤ z ≤ L} with the radius R20 = 0.5 and the length L = 3 (case Cyl-3). We proceed from the homogeneous initial conditions for the structure equation; the fluid is initially at rest. An over-pressure Pin is imposed at the inlet for a time tin [7]. The results (a wave along the elastic structure) are presented in Figure 1; for the visualization the displacement of the structure has been magnified by a factor of 10. These simulations are carried out with Re = 300, Pin = 3, tin = 0.3 and with the following values for the coefficients in (50): γΔ = 1, γt = 0, γw = 1, γp = 0.1. All parameters are given in nondimensional form. The results show good qualitative agreement with [7].
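For orientation, the following self-contained sketch (ours; it assumes axisymmetric displacements w = w(z, t), so Δ reduces to ∂zz, and uses a hypothetical smooth inlet pressure pulse in place of the coupled Navier-Stokes load (52)) advances the structure equation (50) with the coefficients of this section by second-order central differences in time:

```python
import numpy as np

# Explicit sketch of (50) under the assumption w = w(z, t) (axisymmetry):
#   ∂tt w - γΔ ∂zz w - γt ∂t ∂zz w + γw w = γp pn,
# with clamped ends (51). The load pn is a made-up pressure pulse acting
# near the inlet for t < 0.3; the real model couples pn to the fluid.
L, nz = 3.0, 120
g_D, g_t, g_w, g_p = 1.0, 0.0, 1.0, 0.1   # coefficients of Section 7.2
dz = L / nz
dt = 0.4 * dz                    # below the CFL limit dt <= dz for γΔ = 1
z = np.linspace(0.0, L, nz + 1)

def dzz(w):
    out = np.zeros_like(w)
    out[1:-1] = (w[2:] - 2 * w[1:-1] + w[:-2]) / dz**2
    return out

def load(t):
    return 3.0 * np.exp(-(z / 0.3) ** 2) if t < 0.3 else 0.0 * z

w_prev = np.zeros(nz + 1)        # w(0, z) = 0
w = np.zeros(nz + 1)             # ∂t w(0, z) = 0
wmax = 0.0
for step in range(1, 1001):
    t = step * dt
    rhs = g_D * dzz(w) + g_t * dzz((w - w_prev) / dt) - g_w * w + g_p * load(t)
    w_next = 2 * w - w_prev + dt**2 * rhs      # central differences in time
    w_next[0] = w_next[-1] = 0.0               # w = 0 at both ends, (51)
    w_prev, w = w, w_next
    wmax = max(wmax, np.max(np.abs(w)))

print(wmax)   # peak displacement of the travelling wave
```

The pulse launches a displacement wave that travels along the tube and reflects at the clamped ends, qualitatively like the snapshots in Figure 1.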
8 Conclusions
In this report we have presented an approach to solving fluid-structure interaction problems. We have combined the 3D Navier-Stokes equations with a principal two-dimensional shell model for arbitrary geometry and have simplified the resulting equations systematically. The models for the elastic structures are based on common shell theories using the Love-Kirchhoff and Timoshenko hypotheses. We arrive at a wave equation for the two-dimensional cylindrical structure and at a differential equation in time for arbitrary structures, e.g. carotid artery geometry. The coupling is based on the interaction of two different mathematical models describing the fluid flow and the structure movement. We have established the Navier-Stokes and cylindrical structure solvers and tested them on different benchmarks.
Acknowledgments The research has been supported by the DFG (Deutsche Forschungsgemeinschaft). The work carried out by the second author was partially supported by the Alexander von Humboldt Foundation.
Mathematical and Numerical Modelling of Fluid Flow in Elastic Tubes
121
References
1. Antman, S.S.: Nonlinear problems of elasticity. Springer, New York (1995)
2. Bänsch, E.: Numerical methods for the instationary Navier-Stokes equations with a free capillary surface. PhD thesis, Freiburg University, Freiburg (1998)
3. Chambolle, E., Desjardins, B., Esteban, M.J., Grandmont, C.: J. Math. Fluid Mech. 7, 368–404 (2005)
4. Cheng, C.H.A., Coutand, D., Shkoller, S.: Navier-Stokes equations interacting with a nonlinear elastic fluid shell. ArXiv:math.AP/0604313 v2 (November 2006)
5. Cheng, C.H.A., Coutand, D., Shkoller, S.: SIAM J. Math. Anal. 39(3), 742–800 (2007)
6. Ciarlet, P.G.: Mathematical elasticity, Volume I: three-dimensional elasticity. Studies in mathematics and its applications, vol. 20. North-Holland, Amsterdam (1988)
7. Formaggia, L., Gerbeau, J.F., Nobile, F., Quarteroni, A.: On the coupling of 3D and 1D Navier-Stokes equations for flow problems in compliant vessels. Technical report INRIA RR-3862, 1–26 (2000)
8. Heil, M.: J. Fluid. Mech. 353, 285–312 (1997)
9. Höhn, B.: Numerik für die Marangoni-Konvektion beim Floating-Zone Verfahren. PhD thesis, Freiburg University, Freiburg (1999)
10. Koiter, W.T.: A consistent first approximation in the general theory of thin elastic shells, part 1: foundations and linear theory. Technical report, Laboratory of Applied Mechanics, Delft (1959)
11. Liepsch, D.W.: Biorheology 23, 395–433 (1986)
12. Novozhilov, V.V.: The theory of thin shells. P. Noordhoff Ltd., Groningen. Translated by Lowe, P.G. (1959)
13. Païdoussis, M.P.: Fluid-structure interaction. Slender structures and axial flow, vol. I. Academic Press, London (1998)
14. Perktold, K., Rappitsch, G.: ZAMM 74, T477–T480 (1994)
15. Perktold, K., Rappitsch, G.: Mathematical modelling of local arterial flow and vessel mechanics. In: Crolet, J., Ohayon, R. (eds.) Computational methods for fluid structure interaction. Pitman Research Notes in Mathematics (1994)
16. Pertsev, A.K., Platonov, E.G.: Dynamics of shells and plates. Non-stationary problems. Sudostroenie, Leningrad (in Russian) (1987)
17. Quarteroni, A., Veneziani, A.: Modeling and simulation of blood flow problems. In: Bristeau, M.O., Etgen, G., Fitzgibbon, W., Lions, J.L., Periaux, J., Wheeler, M.F. (eds.) Computational science for the 21st century. J. Wiley & Sons, Chichester (1997)
18. Quarteroni, A., Tuveri, M., Veneziani, A.: Comput. Visual. Sci. 2, 163–197 (2000)
19. Quarteroni, A.: Mathematical modelling and numerical simulation of the cardiovascular system. In: Ayache, N. (ed.) Modelling of living systems. Handbook of Numerical Analysis Series. Elsevier, Amsterdam (2002)
20. Timoshenko, S.: Vibration problems in engineering. D. Van Nostrand Company, Inc., Toronto, New York, London (1953)
21. Volmir, A.S.: Nonlinear dynamics of membranes and shells. Nauka, Moscow (1972)
22. Volmir, A.S.: Shells in a stream of fluid and gas. Problems of aeroelasticity. Nauka, Moscow (in Russian) (1976)
23. Washizu, K.: Variational methods in elasticity and plasticity. Pergamon Press, Oxford (1982)
Parallel Numerical Modeling of Modern Fibre Optics Devices L.Yu. Prokopyeva, Yu.I. Shokin, A.S. Lebedev, O.V. Shtyrina, and M.P. Fedoruk Institute of Computational Technologies SB RAS, Lavrentiev Ave. 6, 630090 Novosibirsk, Russia
[email protected],
[email protected],
[email protected],
[email protected],
[email protected]
Abstract. The paper addresses the numerical modeling of electric and magnetic fields in fiber media, which are generally described by the Maxwell equations. The complex microstructures and composite materials of modern optics devices require special modeling techniques; these peculiarities are accounted for in the proposed FVTD method. The high computational complexity inherent in multidimensional computations is handled by parallel computing. For this purpose, a decomposition of unstructured triangular grids and the MPI programming model are implemented. The efficiency results obtained on the HLRS cluster (NEC Xeon EM64T cluster) show good scalability. In a particular case, a simple fiber waveguide can be described by a simplified mathematical model. Nevertheless, the computational complexity grows when femtosecond laser inscription with input power close to the critical value is considered. To parallelize these computations, the parallel Thomas algorithm for solving tridiagonal linear systems was implemented. The resulting efficiency graphs are discussed.
1
Introduction
Recently, numerical modeling of electromagnetic waves propagating in optical media has acquired a new set of applications due to wide experimental research and new technologies in designing and fabricating modern integrated fiber optics devices. One of the important problems that numerical simulation can help to solve is maximizing the data transmission length of communication lines [1]-[2]. Here the need for parallel computations comes from the small permissible BER (Bit Error Rate) and thus the enormously large bit sequences to test. In recent years great attention has been paid to fundamental theoretical and experimental research in designing new artificial metamaterials with nontrivial optical properties. Thus, a negative refractive index was theoretically predicted and then experimentally demonstrated in intricate dielectric metamaterials with artificial nanometer-scale periodic metal inclusions [3]-[5]. That discovery gave rise to new fiber optics technologies and experimental research. On the other hand, as the potentialities of a direct experiment are strictly limited, the opportunities that numerical simulation can provide an experimenter
E. Krause et al. (Eds.): Comp. Science & High Perf. Computing III, NNFM 101, pp. 122–135, 2008. © Springer-Verlag Berlin Heidelberg 2008 springerlink.com
with are of great importance. But the complexity of modern metamaterials also restricts the numerical methods that can be applied. The main reasons are the complicated geometry of the composites and the nanometer sizes of the inclusions (note that these become comparable to the light wavelength, so no amplitude-envelope representation can be used). The media discontinuity of composites causes difficulties too. The full Maxwell equations give the correct mathematical model to describe light propagation in such media. In ICT SB RAS the development of numerical algorithms for solving the time-dependent Maxwell equations in media with peculiarities started in 1998 with [6], where the FVTD method on unstructured triangular grids was first proposed. The next development step followed with [7], where the FVTD was modified to work correctly with a discontinuous permittivity, while keeping practically the second order of convergence. Later the dispersive terms due to the Debye and Drude models were implemented. The current version of FVTD quite satisfactorily describes the propagation of light in intricate metal-dielectric composites, which has been proved by a number of numerical tests on plane wave diffraction on a metal film [8], on a gradient cylinder [9], and some others. All tests show good agreement with analytical solutions, which in the latter case are written in terms of Bessel functions. As an example, in this paper we present only the simulation with the gradient dielectric cylinder. In the numerical modeling of light propagation in metamaterials the main problem is the number of cells required to correctly describe any essential part of such nanostructured media. In the periodic case the solution is to take one period as the computational domain for the FVTD method and set periodic boundary conditions. In the general case such nanostructured media can only be modeled with the help of modern high-performance computing hardware and corresponding parallel programming models.
To resolve this difficulty we developed a parallel version of FVTD. For that purpose we used domain decomposition on the unstructured triangular grid to distribute it to processors. For processor communication we used standard MPI functions. The resulting C++ code was tested for speedup on a cluster in HLRS (High Performance Computing Center, Stuttgart) on up to 20 processors. On this relatively small number of processors we observed a practically linear speedup. The need for a parallel implementation may also arise when a quite simple mathematical model is used, as is the case when the femtosecond laser inscription phenomenon is modeled. Although the model describes the axially symmetric case and uses quite simple finite-difference approximations, the computational complexity becomes enormous when the input beam power approaches the critical value and extreme focusing of the beam occurs. To overcome this, the parallel tridiagonal solver [10] was used on each time step. This paper addresses the two examples of applying parallel programming in fibre optics simulations mentioned above. In Section 2, on the Maxwell equations, we describe the parallelization steps leading from the sequential finite-volume algorithm on an unstructured triangular grid up to its final parallel version. First, the sequential FVTD is given with an analysis of possible parallelism. Then the scheme of
the parallel FVTD with the required processor communications is derived. Finally, we briefly describe the decomposition process and end up with the speedup results of the parallel C++ code using MPI. In Section 3, on the femtosecond laser inscription phenomenon, we start with the full and simplified models, then address the speedup obtained by implementing the parallel algorithm for tridiagonal systems of linear equations [10] in the simplified-model simulations.
2
Parallel Finite-Volume Algorithm on Unstructured Grids for Solving Maxwell Equations
Parallel Finite-Volume Method

The electromagnetic waves propagating in nonmagnetic dispersive optical media (μ = 1, div D = 0) are described by the Maxwell equations (see, for example, [11]):

$$\frac{\partial \mathbf{D}}{\partial t} - \mathrm{rot}\,\mathbf{H} = 0, \quad \mathbf{D}(\omega) = \varepsilon_r^{*}(\omega)\mathbf{E}, \qquad \frac{\partial \mathbf{B}}{\partial t} + \mathrm{rot}\,\mathbf{E} = 0, \quad \mathbf{B} = \mathbf{H}, \qquad \mathrm{div}\,\mathbf{D} = 0, \quad \mathrm{div}\,\mathbf{B} = 0. \tag{1}$$

The dispersive properties of a medium are defined by the frequency-dependent permittivity $\varepsilon_r^{*}(\omega)$, which in the classical case of a medium with relative dielectric permittivity $\varepsilon_r$ and finite conductivity $\sigma$ reads

$$\varepsilon_r^{*}(\omega) = \varepsilon_r + \frac{\sigma}{i\omega\varepsilon_0}. \tag{2}$$
For optical media there are two widely accepted and applicable dispersion models, referred to as the Debye model and the Drude model. The following expressions for $\varepsilon_r^{*}(\omega)$ define these models, respectively:

$$\varepsilon_r^{*}(\omega) = \varepsilon_\infty + \frac{\chi_1}{1 - i\omega t_0} - \frac{\sigma}{i\omega\varepsilon_0}, \tag{3}$$

$$\varepsilon_r^{*}(\omega) = \varepsilon_\infty - \frac{\omega_p^2}{\omega(\omega + i\Gamma)}, \tag{4}$$
where $\varepsilon_\infty$ is the permittivity at infinite frequencies, $\chi_1$ is the permittivity step, $\sigma$ is the medium conductivity, $\omega_p$ is the cyclic plasma frequency, and $\Gamma$ is the relaxation coefficient. Mathematically, the Drude model is but a special case of the Debye model (the expression for the frequency-dependent permittivity in the Drude model has no term $1/\omega$) and is applied to metallic media. In the following it is assumed that the general case of the Debye model is employed. Using $\mathbf{D}(\omega) = \varepsilon_r^{*}(\omega)\mathbf{E}$ and (3), and applying the inverse Fourier transformation, we have

$$\mathbf{D}(t) = \varepsilon_\infty \mathbf{E}(t) + \frac{\chi_1}{t_0}\int_0^t e^{-(t-\tau)/t_0}\,\mathbf{E}(\tau)\,d\tau + \frac{\sigma}{\varepsilon_0}\int_0^t \mathbf{E}(\tau)\,d\tau. \tag{5}$$
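As a concrete illustration, the two dispersion models (3)-(4) can be evaluated as complex-valued functions of ω. The sketch below is our illustration, not the authors' code; the sign conventions follow the reconstructed formulas. It also makes the remark about the Drude model being a special case of the Debye model concrete: under the stated conventions, the parameter mapping t0 = 1/Γ, χ1 = −ωp²/Γ², σ = ε0ωp²/Γ turns the Debye form into the Drude form exactly.

```cpp
#include <complex>

// Frequency-dependent relative permittivity of Eq. (3) (Debye model with
// conductivity) and Eq. (4) (Drude model); signs as reconstructed above.
std::complex<double> debye(double w, double epsInf, double chi1,
                           double t0, double sigma, double eps0) {
    const std::complex<double> I(0.0, 1.0);
    return epsInf + chi1 / (1.0 - I * w * t0) - sigma / (I * w * eps0);
}

std::complex<double> drude(double w, double epsInf, double wp, double Gamma) {
    const std::complex<double> I(0.0, 1.0);
    return epsInf - wp * wp / (w * (w + I * Gamma));
}
```

With eps0 = 1, calling `debye(w, epsInf, -wp*wp/(Gamma*Gamma), 1.0/Gamma, wp*wp/Gamma, 1.0)` reproduces `drude(w, epsInf, wp, Gamma)` for any frequency, which is a useful sanity check when implementing the dispersive terms.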
Parallel Numerical Modeling of Modern Fibre Optics Devices
125
With the substitution $\mathbf{D} \to \varepsilon_\infty \mathbf{E}$ in (1), the equations for the propagation of electromagnetic waves in frequency-dependent materials become

$$\frac{\partial \mathbf{D}}{\partial t} - \mathrm{rot}\,\mathbf{H} = -\mathbf{Q}, \qquad \frac{\partial \mathbf{B}}{\partial t} + \mathrm{rot}\,\mathbf{E} = 0, \qquad \mathrm{div}\,\mathbf{D} = 0, \quad \mathbf{D} = \varepsilon_\infty \mathbf{E}, \qquad \mathrm{div}\,\mathbf{B} = 0, \quad \mathbf{B} = \mathbf{H}, \tag{6}$$

where

$$\mathbf{Q} = -\frac{\chi_1}{t_0^2}\int_0^t e^{-(t-\tau)/t_0}\,\mathbf{E}(\tau)\,d\tau + \left(\frac{\sigma}{\varepsilon_0} + \frac{\chi_1}{t_0}\right)\int_0^t \mathbf{E}(\tau)\,d\tau. \tag{7}$$
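The convolution integrals in (7) (and later in (10)) need not be recomputed from the full field history at every step: because the kernel is exponential, the integral can be advanced recursively using $e^{-(t+\Delta t-\tau)/t_0} = e^{-\Delta t/t_0}\,e^{-(t-\tau)/t_0}$. A minimal sketch of such an accumulator follows (a common recursive-convolution technique; this is our illustration under a simple rectangle-rule quadrature, not necessarily the quadrature used by the authors):

```cpp
#include <cmath>
#include <vector>

// Recursive evaluation of I(t) = ∫_0^t exp(-(t-τ)/t0) E(τ) dτ via
// I(t+dt) ≈ exp(-dt/t0) * I(t) + dt * E(t+dt)   (rectangle rule).
struct ConvolutionAccumulator {
    double t0, dt, I = 0.0;
    ConvolutionAccumulator(double t0_, double dt_) : t0(t0_), dt(dt_) {}
    double update(double E) {            // advance one time step with field E
        I = std::exp(-dt / t0) * I + dt * E;
        return I;
    }
};

// Direct O(n) evaluation over the full history, for comparison only.
double direct(const std::vector<double>& E, double t0, double dt) {
    double s = 0.0;
    const int n = static_cast<int>(E.size());
    for (int j = 1; j < n; ++j)          // τ_j = j*dt, current time t = (n-1)*dt
        s += std::exp(-((n - 1) - j) * dt / t0) * E[j] * dt;
    return s;
}
```

The recursive update costs O(1) memory and time per step, whereas the direct sum grows linearly with the history length; both evaluate the same quadrature.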
The equations (6) take into account the dispersion properties of a medium and adequately describe electromagnetic processes in various materials used in modern fiber optics. In the following we consider the two-dimensional case (axial symmetry, as well as TM- and TE-waves, are already of interest for simulation) and use matrix notation for the divergent and nondivergent variables, respectively:

$$\frac{\partial \mathbf{U}}{\partial t} + A_1 \frac{\partial \mathbf{U}}{\partial x_1} + A_2 \frac{\partial \mathbf{U}}{\partial x_2} = -\mathbf{R}, \tag{8}$$

$$\frac{\partial \mathbf{V}}{\partial t} + B_1 \frac{\partial \mathbf{V}}{\partial x_1} + B_2 \frac{\partial \mathbf{V}}{\partial x_2} = -\mathbf{Q}. \tag{9}$$

For example, in the case of a TM-wave and the Drude model the matrices in (8)-(9) are

$$A_1 = \begin{pmatrix} 0 & 0 & -1 \\ 0 & 0 & 0 \\ -\frac{1}{\varepsilon_\infty} & 0 & 0 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 0 & 1 & 0 \\ \frac{1}{\varepsilon_\infty} & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad B_1 = \begin{pmatrix} 0 & 0 & -\frac{1}{\varepsilon_\infty} \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix}, \quad B_2 = \begin{pmatrix} 0 & \frac{1}{\varepsilon_\infty} & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$

$$\mathbf{R} = \begin{pmatrix} \frac{\omega_p^2}{\varepsilon_\infty}\int_0^t e^{-(t-\tau)/t_0} D_3(\tau)\,d\tau \\ 0 \\ 0 \end{pmatrix}, \quad \mathbf{Q} = \begin{pmatrix} \frac{\omega_p^2}{\varepsilon_\infty}\int_0^t e^{-(t-\tau)/t_0} E_3(\tau)\,d\tau \\ 0 \\ 0 \end{pmatrix}, \quad \mathbf{U} = \begin{pmatrix} D_3 \\ H_1 \\ H_2 \end{pmatrix}, \quad \mathbf{V} = \begin{pmatrix} E_3 \\ H_1 \\ H_2 \end{pmatrix}. \tag{10}$$

For the discretization of the equations, the computational domain is first covered with a triangular mesh whose cell edges are aligned, if necessary, with the lines of jumps in the physical properties of the media. Figure 1 shows an example of a mesh with a smoothly varying cell size for a domain with a cut.
126
L.Yu. Prokopyeva et al.
Fig. 1. Triangular mesh for a domain with a cut, 2481 cells
Integrating Eq. (8) over $\Delta_i$ and transforming the integral of the spatial derivatives by the Gauss-Ostrogradsky formula, we obtain the equation

$$\frac{\partial}{\partial t}\int_{\Delta_i} \mathbf{U}\,dx_1\,dx_2 + \sum_{k=1}^{3}\int_{\Gamma_i^k} A\mathbf{U}\,d\Gamma = -\int_{\Delta_i} \mathbf{R}\,dx_1\,dx_2, \tag{11}$$

where $\Gamma_i^k$ are the cell edges, $\mathbf{n} = (n_1, n_2)$ is the outward normal, and $A = n_1 A_1 + n_2 A_2$. We determine the vector $\mathbf{U}^n$ at the barycenters of the triangular elements at the time $t_n = n\tau$, where $\tau$ is the time step. Approximating the time derivative in (11) by a finite difference, for the $i$th cell we obtain the equation

$$S_i\,\frac{\mathbf{U}_i^{n+1} - \mathbf{U}_i^{n}}{\tau} + \sum_{k=1}^{3} l_{i,k}\,\mathbf{F}_{i,k}^{n+1/2} = S_i\,\mathbf{R}_i^n, \tag{12}$$
where $\mathbf{F}_{i,k}$ is the estimated value of the flux through the $k$th edge of the $i$th triangle, $l_{i,k}$ is the length of the edge, and $S_i$ is the area of the triangle. As the above scheme is explicit, it can be implemented in a parallel manner. The key difficulty of the parallel implementation lies in the computation of the fluxes $\mathbf{F}_{i,k}$, which depend on the values $\mathbf{U}^n$ from a number of adjacent cells. The flux $\mathbf{F}_{i,k}$ is found by solving the one-dimensional (with respect to the normal to the $k$th edge of the $i$th triangle) Riemann problem for the equation

$$\frac{\partial \mathbf{U}}{\partial t} + A\,\frac{\partial \mathbf{U}}{\partial n} = 0 \tag{13}$$

and is given by the formula

$$\mathbf{F}_{i,k} = A^{+}\mathbf{U}^{L} + A^{-}\mathbf{U}^{R}, \tag{14}$$

where

$$A^{\pm} = S^{-1}\,\mathrm{diag}\left[\frac{1}{2}\left(\lambda_k \pm |\lambda_k|\right)\right] S, \tag{15}$$
$S$ is the matrix whose columns are the right eigenvectors of $A$, $\lambda_k$ are the eigenvalues of $A$, and $\mathbf{U}^L$, $\mathbf{U}^R$ are the values of the vector $\mathbf{U}$ at the center of the edge computed by linear extrapolation from the two adjacent triangles. For example, for the TM-wave the matrices are

$$A^{+} = \frac{1}{2}\begin{pmatrix} \frac{1}{\sqrt{\tilde\varepsilon}} & n_2 & -n_1 \\ \frac{n_2}{\tilde\varepsilon} & \frac{n_2^2}{\sqrt{\tilde\varepsilon}} & -\frac{n_1 n_2}{\sqrt{\tilde\varepsilon}} \\ -\frac{n_1}{\tilde\varepsilon} & -\frac{n_1 n_2}{\sqrt{\tilde\varepsilon}} & \frac{n_1^2}{\sqrt{\tilde\varepsilon}} \end{pmatrix}, \qquad A^{-} = \frac{1}{2}\begin{pmatrix} -\frac{1}{\sqrt{\tilde\varepsilon}} & n_2 & -n_1 \\ \frac{n_2}{\tilde\varepsilon} & -\frac{n_2^2}{\sqrt{\tilde\varepsilon}} & \frac{n_1 n_2}{\sqrt{\tilde\varepsilon}} \\ -\frac{n_1}{\tilde\varepsilon} & \frac{n_1 n_2}{\sqrt{\tilde\varepsilon}} & -\frac{n_1^2}{\sqrt{\tilde\varepsilon}} \end{pmatrix}, \tag{16}$$
where $\tilde\varepsilon = (2\varepsilon_L \varepsilon_R)/(\varepsilon_L + \varepsilon_R)$. As the permittivity $\varepsilon$ is independent of time, in the parallel version of the algorithm the matrices $A^{\pm}$ are computed only once and are stored locally in each subdomain. The vectors $\mathbf{U}^L$, $\mathbf{U}^R$ are computed as $\mathbf{U}^{R,L} = T\mathbf{V}^{R,L}$, where, for instance, in the case of the TM-wave $T = \mathrm{diag}(\varepsilon, 1, 1)$. The vectors $\mathbf{V}^L$, $\mathbf{V}^R$ in their turn are estimated in the following way. By using the Taylor formula, the vector $\mathbf{V}_L^{n+1/2}$ at the edge center is expressed to second-order accuracy as

$$\mathbf{V}_L^{n+1/2} = \mathbf{V}\!\left(\mathbf{x}_L^b, t_n\right) + \sum_{i=1}^{2} \frac{\partial \mathbf{V}\!\left(\mathbf{x}_L^b, t_n\right)}{\partial x_i}\left(x_{c\,i}^{k} - x_{L\,i}^{b}\right) + \frac{\tau}{2}\,\frac{\partial \mathbf{V}\!\left(\mathbf{x}_L^b, t_n\right)}{\partial t}. \tag{17}$$
Substituting the time derivative from (9) yields, for the $j$th component,

$$\left(\mathbf{V}_L^{n+1/2}\right)_j = V_j\!\left(\mathbf{x}_L^b, t_n\right) + \sum_{i=1}^{2} \frac{\partial V_j}{\partial x_i}\left(x_{c\,i}^{k} - x_{L\,i}^{b}\right) - \frac{\tau}{2}\left[\sum_{i=1}^{3}\left(b_{ji}^{1}\frac{\partial V_i}{\partial x_1} + b_{ji}^{2}\frac{\partial V_i}{\partial x_2}\right) + Q_j\!\left(\mathbf{x}_L^b, t_n\right)\right], \tag{18}$$

where $b_{ji}^{1}$, $b_{ji}^{2}$ are the entries of the matrices $B_1$ and $B_2$.
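For the TM-wave the split matrices (16) are available in closed form, so evaluating the flux (14) on an edge amounts to two small matrix-vector products. A minimal sketch follows (our illustration, not the authors' code; the entries follow Eq. (16), and by construction A⁺ + A⁻ reproduces A = n₁A₁ + n₂A₂):

```cpp
#include <array>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;
using Vec3 = std::array<double, 3>;

// Split matrices A± of Eq. (16) for the TM case; eps is the interface
// permittivity ~ε, (n1, n2) the outward unit normal, sign = ±1.
Mat3 splitMatrix(double n1, double n2, double eps, int sign) {
    const double s = std::sqrt(eps);
    const double c = static_cast<double>(sign);
    Mat3 A = {{ {{ c / s,      n2,               -n1              }},
                {{ n2 / eps,   c * n2 * n2 / s,  -c * n1 * n2 / s }},
                {{ -n1 / eps,  -c * n1 * n2 / s, c * n1 * n1 / s  }} }};
    for (auto& row : A)
        for (auto& x : row) x *= 0.5;    // common factor 1/2 in Eq. (16)
    return A;
}

// Upwind flux F = A+ U_L + A- U_R of Eq. (14).
Vec3 flux(const Vec3& UL, const Vec3& UR, double n1, double n2, double eps) {
    const Mat3 Ap = splitMatrix(n1, n2, eps, +1);
    const Mat3 Am = splitMatrix(n1, n2, eps, -1);
    Vec3 F{};                            // zero-initialized
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            F[i] += Ap[i][j] * UL[j] + Am[i][j] * UR[j];
    return F;
}
```

When U_L = U_R = U, the flux reduces to AU, which is a convenient consistency test for any implementation of the split.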
To compute the flux through an edge at the boundary of two adjacent subdomains, one needs to know $\mathbf{V}^L$ ($\mathbf{V}^R$) from a different processor's domain, and this requires MPI communication between processors. Moreover, to obtain the vectors $\mathbf{V}^L$ ($\mathbf{V}^R$) from formula (18), the gradients $\partial \mathbf{V}/\partial x_i$ for both triangles must first be estimated, and that again requires the values $\mathbf{V}$ from a number of neighboring triangles and hence further communications. The following multiple-stage procedure to estimate the gradients $\partial \mathbf{V}/\partial x_i$ in the case of a non-smoothly varying permittivity $\varepsilon$ has proved to be fairly robust.
Stage 1. For every triangle $\Delta_i$ the (preliminary) gradients are computed using the values of $\mathbf{V}$ from the three triangles adjacent to $\Delta_i$ (we omit here the cases of $\Delta_i$ having one or two neighbors). This can be done in parallel, provided the values of $\mathbf{V}$ from the immediate cell-neighbors of an adjoining subdomain have already been passed into the subdomain in question.
128
L.Yu. Prokopyeva et al.
Fig. 2. Two adjacent cells ΔL , ΔR with barycenters xbL , xbR and their common edge with its center at xkc
Stage 2. With the preliminary gradients found, for every triangle one can then find the vector $\mathbf{V}$ at the vertices of the triangle. Obviously, these computations can be performed in parallel in each subdomain. The value of $\mathbf{V}$ at a vertex is then computed as the arithmetic mean of the values generated by all triangles incident to the vertex. As incident triangles may belong to different subdomains, the sum of contributions from each adjacent subdomain has to be sent explicitly via MPI communication. It is supposed that for every boundary vertex all subdomains containing this vertex know the total number of incident triangles (Fig. 4).
Stage 3. With the vector $\mathbf{V}$ at the three vertices of a triangle known, the gradients for the triangle are easily found. This is performed in parallel in every subdomain, locally and independently.
The FVTD method described was successfully tested on different composite dielectric and metal patterns. We present here only test results on plane wave
Fig. 3. Data exchange to compute preliminary gradients
Fig. 4. Data exchange to compute an average value at a vertex
Fig. 5. Electric field propagation of a plane wave diffracting on a gradient cylinder; the effect is similar to lens focusing
diffracting on a gradient dielectric cylinder, which has 10 coaxial layers of different permittivity and works like a lens (see Fig. 5). There is an opportunity to compare the numerical solution with the analytical one (expressed through Bessel functions) in terms of the resulting amplitude and phase for a given wavelength. This comparison is presented in Fig. 6. This test, along with others, proves the efficiency of the presented FVTD. To conclude this section, we observe that according to our analysis the finite-volume algorithm for solving the Maxwell equations described above can be performed in parallel. The scheme of the parallel operations and communications between processors is presented in Fig. 7. The parallel version requires three communication sessions on each explicit time step of the FVTD method. Alternatively, we could use only one, approximately three times larger, message with all the needed $\mathbf{V}$ values and then perform only local independent operations. That would obviously be a better way, but the first one is currently implemented because of its technical simplicity due to the smaller number of halo cells. Some details on constructing halo cells will be given in the next section.
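The halo-cell mechanism mentioned above can be illustrated in miniature: each subdomain stores, next to its owned cells, copies of the neighbor's boundary cells, and one "communication session" refreshes those copies. The sketch below is a toy, one-dimensional illustration of the idea only (the real code exchanges such buffers with MPI on an unstructured grid):

```cpp
#include <vector>

// A subdomain owns a contiguous block of cell values plus one halo cell
// on each side, mirroring the neighbor's boundary cells.
struct Subdomain {
    std::vector<double> v;               // [halo_left | owned cells | halo_right]
    explicit Subdomain(std::size_t owned) : v(owned + 2, 0.0) {}
    double& leftOwned()  { return v[1]; }
    double& rightOwned() { return v[v.size() - 2]; }
};

// One exchange session: copy each subdomain's boundary values into the
// neighbor's halo cells (stands in for a pair of MPI send/receive calls).
void exchangeHalo(Subdomain& left, Subdomain& right) {
    left.v.back()   = right.leftOwned(); // right neighbor -> left's right halo
    right.v.front() = left.rightOwned(); // left neighbor  -> right's left halo
}
```

After the exchange, every stencil that touches a halo cell can be evaluated exactly as in the sequential code, which is the point of the technique.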
Fig. 6. Comparison with the analytical solution, described by Bessel functions
[Fig. 7 sketches one explicit time step on each of the M processors: update of V_j, S_j, T_j, l_{i,k}, A_{i,k} and the step counter; MPI communication sending V to the halo cells (Fig. 3); Stage 1, preliminary gradients; Stage 2, approximation to the vertices; MPI communication sending V to the boundary vertices (Fig. 4); Stage 3, final gradient computation; MPI communication sending grad V to the halo cells (Fig. 3); computation of V^{L,R} (Eq. 18), of the fluxes F (Eq. 14), of the new U (Eq. 12), V = T^{-1}U, and of the integrals R, Q (Eq. 10).]
Fig. 7. Parallel FVTD scheme of operations and communications
Domain Decomposition and Speedup of the Parallel FVTD Method
For speedup testing we take the square domain with quite uniform cell sizes and the following numbers of triangles per square side: 80, 160, 320, 640. In that domain we simulate the propagation of a TM-wave in a dielectric layer [12]. We distribute the computational cells to processors geometrically: with horizontal stripes and in a so-called "web" shape (see Fig. 8-9). The examples were intentionally chosen with parameters such that the current absence of boundary-length optimization can be clearly seen. Each subdomain is stored locally on one processor. The key point when organizing the operations and programming with MPI is the use of halo cells, as it allows all procedures from the sequential code to be preserved. This is especially important when dealing with unstructured grids. We use one layer of halo cells and make three communications to provide the values needed for each stage of one explicit time step. Some additional difficulties concern the coordination of the values in the inner cells and in their halo doubles. The code was run on the cluster in HLRS (High Performance Computing Center, Stuttgart); for hardware details see Table 1. For the horizontal decomposition we obtained a practically linear speedup in the case of a sufficient number of computational cells. The tricky curve for the second decomposition method is caused by the cell-imbalance problem. When the balancing condition is satisfied (17 processors) the speedup becomes linear again. When a small number of processors is used (up to 20), our parallel FVTD has a practically linear speedup if a sufficient number of cells is involved and the cell-balancing condition is preserved. For the coarsest grid (80x80) the speedup decreases because of increased communication costs. Some curious speedup results are obtained for two architecturally similar clusters: the cluster in HLRS and the less powerful cluster, with worse network interconnection properties, in ICT SB RAS (see Table 1 for details).
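The balanced horizontal-stripe decomposition can be sketched as follows: sort the cells by the x2 coordinate of their barycenter and cut the sorted sequence into P equally sized blocks, so each processor owns (almost) the same number of cells. This is our illustration of the idea, not the code actually used to produce Figs. 8-9:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Assign each triangle (given by its barycenter x2 coordinate) to one of
// P horizontal stripes holding (approximately) equal numbers of cells.
// Returns owner[i] = processor rank of cell i.
std::vector<int> stripePartition(const std::vector<double>& x2, int P) {
    const std::size_t n = x2.size();
    std::vector<std::size_t> order(n);
    for (std::size_t i = 0; i < n; ++i) order[i] = i;
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return x2[a] < x2[b]; });
    std::vector<int> owner(n);
    for (std::size_t r = 0; r < n; ++r)   // r-th cell from the bottom
        owner[order[r]] = static_cast<int>(r * static_cast<std::size_t>(P) / n);
    return owner;
}
```

Note that this balances only the cell counts; as the text remarks, the lengths of the subdomain boundaries (and hence the communication volume) are not optimized here.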
Fig. 8. Decomposition with horizontal stripes. 20 cells per square side.
Fig. 9. The so-called "web" decomposition of the same domain
Fig. 10. Speedup for horizontal decomposition
Fig. 11. Speedup for “web” (Fig. 9) decomposition
Fig. 12. Speedup on the HLRS cluster compared to that on the ICT SB RAS cluster. As the number of cells increases, the speedup becomes linear for both clusters.
Table 1. Details of the clusters used

                    HLRS                                 ICT SB RAS
Number of nodes     210 nodes                            4 nodes
A node              Dual 3.2 GHz Intel Xeon EM64T        Dual 3.06 GHz Intel Xeon
Memory              1 Gb to 8 Gb memory                  2 Gb memory
Network             Infiniband (Voltaire Grid            Gigabit Ethernet
                    Director ISR 9288)
MPI                 Voltaire MPI                         MPICH
CC                  Intel Compiler 9.0 (EM64T)           Intel Compiler (8.1, 9.1)
QMS                 Open PBS                             Sun Grid Engine 6.0 (n1ge)
Figure 12 shows that as the number of computational cells becomes large enough, the speedup approaches the ideal linear case and becomes less sensitive to the difference in network interconnect (Infiniband versus Gigabit Ethernet).
3
Parallel Finite-Difference Method for Modeling Femtosecond Laser Inscription
Material processing with high-intensity femtosecond laser pulses is an attractive, fast-developing technology for the direct writing of multi-dimensional optical structures in transparent media. It is based on the induction of an irreversible modification of the material within the focal volume of the laser beam. An irreversible change of the refractive index is empirically observed when the light intensity exceeds a certain threshold. Mathematical modeling of femtosecond laser propagation in nonlinear media is based on a widely accepted modification of the NLSE coupled with an equation describing the plasma generation:

$$i\frac{\partial A}{\partial z} + \frac{1}{2k}\Delta_\perp A - \frac{k''}{2}\frac{\partial^2 A}{\partial t^2} + k_0 n_2 |A|^2 A = -\frac{i\sigma}{2}\left(1 + i\omega\tau\right)\rho A - \frac{\beta^{(k)}}{2}|A|^{2(k-1)} A, \qquad \frac{\partial \rho}{\partial t} = \frac{\sigma}{n_b^2 E_b}\,\rho |A|^2 + \frac{\beta^{(k)}}{k\hbar\omega}|A|^{2k}. \tag{19}$$

A physical interpretation of the full model can be found, for example, in [13]. A simpler mathematical model describing the initial stage of the beam propagation is given by the equation

$$i\frac{\partial A}{\partial z} + \frac{1}{2k}\Delta_\perp A + k_0 n_2 |A|^2 A = 0. \tag{20}$$

One of the most striking features of this simple mathematical model is the catastrophic self-focusing phenomenon (beam collapse), which makes straightforward numerical modeling a challenge. Only mesh-adaptive simulations using high-performance computing can have practical applications when the input beam power is close to the critical value. To make computations in this field possible, a parallel version of the finite-difference method was devised. The sequential method leads to an implicit scheme with a tridiagonal matrix.
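The sequential kernel referred to here is the classical Thomas algorithm for tridiagonal systems; the parallel solver of [10] distributes this recurrence across processors. A sketch of the sequential building block (our illustration, assuming a diagonally dominant system so that no pivoting is needed):

```cpp
#include <vector>

// Thomas algorithm: solves a tridiagonal system with sub-diagonal a,
// main diagonal b, super-diagonal c and right-hand side d
// (a[0] and c[n-1] are unused). O(n) operations, no pivoting.
std::vector<double> thomasSolve(std::vector<double> a, std::vector<double> b,
                                std::vector<double> c, std::vector<double> d) {
    const std::size_t n = b.size();
    for (std::size_t i = 1; i < n; ++i) {   // forward elimination
        const double m = a[i] / b[i - 1];
        b[i] -= m * c[i - 1];
        d[i] -= m * d[i - 1];
    }
    std::vector<double> x(n);
    x[n - 1] = d[n - 1] / b[n - 1];
    for (std::size_t i = n - 1; i-- > 0; )  // back substitution
        x[i] = (d[i] - c[i] * x[i + 1]) / b[i];
    return x;
}
```

The forward elimination is an inherently serial recurrence, which is why parallelizing it (as in [10], or via cyclic reduction) costs extra arithmetic, the 17/8 overhead discussed below.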
Fig. 13. Speedup on the HLRS cluster compared to that on the ICT SB RAS cluster for the parallel finite-difference method modeling femtosecond laser inscription
Fig. 14. Speedup on HLRS cluster for parallel Thomas algorithm
Parallel simulation based on the parallel Thomas algorithm [10] gave some speedup with respect to the sequential simulation. The results are presented in Fig. 13. Note that the parallel numerical scheme requires additional computations compared to the sequential algorithm (the complexity of the parallel version is approximately 17/8 times that of the sequential one), but the processor communications are quite efficient. A task for the future is to test the properties of this parallel tridiagonal solver itself against the well-known recursive doubling, partition method, and cyclic reduction approaches. Currently, the parallel tridiagonal solver has been extracted from the finite-difference code and tested on the same HLRS cluster, where quite good scalability up to 32 processors was obtained (Fig. 14).
4
Conclusions
We have presented two simulation problems of fibre optics in which we were faced with the need for parallel computations. First, we derived a parallel FVTD on unstructured grids for solving the Maxwell equations and obtained speedup graphs for up to 20 processors. The speedup was linear when the number of cells was large enough. Further improvements concern the implementation of optimization algorithms for the domain decomposition instead of the geometrical approach. Second, we implemented the parallel Thomas algorithm [10] in the finite-difference method for modeling the femtosecond laser inscription phenomenon. The resulting code with MPI communications showed some speedup on 8 processors, and the extracted parallel Thomas algorithm showed good scalability up to 32 processors.
Parallel Numerical Modeling of Modern Fibre Optics Devices
135
Acknowledgements We would like to thank the High Performance Computing Center Stuttgart, Germany (HLRS), and personally Michael Resch and Thomas Bönisch, for access to the "Cacau" cluster and for useful remarks. This work was supported by the Federal Program of Leading Scientific Schools (No. SS-9886.2006.9) and the Russian Foundation for Basic Research (No. 06-01-00210).
References
1. Shtyrina, O.V., Turitsyn, S.K., Fedoruk, M.P.: Kvant. Elektron. 35, 169–174 (2005)
2. Shapiro, E.G., Fedoruk, M.P., Turitsyn, S.K.: J. Opt. Comm. 27, 216–218 (2006)
3. Smith, D.R., Pendry, J.B., Wiltshire, M.C.K.: Science 305, 788–792 (2004)
4. Markos, P., Soukoulis, C.M.: Opt. Express 11, 649–661 (2003)
5. Kildishev, A.V., et al.: J. Opt. Soc. Am. 23, 423–433 (2006)
6. Fedoruk, M., Munz, C.D., Omnes, P., Schneider, R.: A Maxwell-Lorentz solver for self-consistent particle-field simulations on unstructured grids. Forschungszentrum Karlsruhe GmbH, Karlsruhe (1998)
7. Lebedev, A.S., Fedoruk, M.P., Shtyrina, O.V.: Vychisl. Mat. Mat. Fiz. 46, 1302–1317 (in Russian) (2006)
8. Kildishev, A.V., Chettiar, U.: Realistic models of metal nano-scatterers at optical frequencies. Interim Technical Report, School of ECE, Purdue University (2004)
9. Kotlyar, V.V., Lichmanov, M.A.: Chisl. Met. Komp. Opt. 25, 11–15 (in Russian) (2003)
10. Yanenko, N.N., Konovalov, A.N., Bugrov, A.N., Shustov, G.V.: Chisl. Met. Meh. Splosh. Sredy 9, 139–146 (in Russian) (1978)
11. Sullivan, D.M.: Electromagnetic simulation using the FDTD method. The Institute of Electrical and Electronics Engineers, Inc., New York (2000)
12. Taflove, A.: Advances in computational electrodynamics. Artech House, Boston (1998)
13. Mezentsev, V.K., Turitsyn, S.K., Fedoruk, M.P., Dubov, M., Rubenchik, A.M., Podivilov, E.V.: In: Proc. of the 8th Int. Conf. on Transparent Optical Networks, pp. 146–149 (2006)
Zonal Large-Eddy Simulations and Aeroacoustics of High-Lift Airfoil Configurations M. Meinke, D. König, Q. Zhang, and W. Schröder Institute of Aerodynamics, RWTH Aachen University, Wüllnerstraße zw. 5 u. 7, 52062 Aachen, Germany
[email protected]
Abstract. High Reynolds number flows are still challenging problems for large-eddy simulations (LES) due to the thin small-scale structures, e.g. in the near-wall regions, and the often transitional boundary layers which have to be resolved. For this reason, the prediction of high Reynolds number airfoil flow over the entire geometry using LES still requires huge computer resources. To remedy this problem, a hybrid zonal RANS-LES method for the flow over an airfoil in high-lift configuration at Re_c = 10^6 is presented. In a first step a 2D RANS solution is sought, from which boundary conditions are formulated for an embedded LES domain, which comprises the flap and a sub-part of the main airfoil. The turbulent fluctuations in the boundary layers at the inflow region of the LES domain are generated by controlled forcing terms, which use the turbulent shear stress profiles obtained from the RANS solution. In the second part of the paper a large-eddy simulation of the flow around an airfoil consisting of a slat and a main wing is performed at a Reynolds number of 1.4·10^6, based on the freestream velocity and the clean chord length, to identify the flow phenomena generating slat noise. The freestream Mach number is Ma = 0.16 and the angle of attack is 13°. A computational mesh with about 55 million cells is used to resolve the turbulent scales in the boundary layers and within the slat cove region. Results are presented for both the turbulent flow field obtained from the LES and the acoustic field obtained with a computational aeroacoustics method based on acoustic perturbation equations.
1 Introduction
For modern aircraft a comprehensive understanding of the flow physics around high-lift systems is indispensable to improve take-off and landing performance and also to reduce airframe noise. The magnitude of the emitted noise will be an important factor in the future development process of aircraft due to the continuously increasing air traffic and the stricter licensing requirements. Since tremendous progress has been made in reducing jet noise, it is especially during the landing approach, when the engines run almost in idle condition, that the airframe noise becomes the dominant part of the emitted sound. The main contributions stem from the landing gears and the wing, where especially the high-lift devices, i.e., slats and flaps, represent major noise sources.

E. Krause et al. (Eds.): Comp. Science & High Perf. Computing III, NNFM 101, pp. 136–154, 2008. © Springer-Verlag Berlin Heidelberg 2008, springerlink.com

The development
of low-noise aircraft demands, on the one hand, an investigation of the sound generating mechanisms, which, on the other hand, requires a detailed knowledge of the underlying turbulent flow field. Large-eddy simulation (LES) is known to be a reliable method to resolve unsteady turbulent flow fields. However, the high computational effort has restricted LES applications mainly to low Reynolds number flows. To considerably reduce the required computational time, hybrid zonal RANS-LES methods seem to be a promising approach. Different approaches exist for such hybrid RANS-LES solutions, see, for instance, Spalart's DES (detached-eddy simulation) [20]. One major problem in these methods occurs in regions where a turbulent RANS flow enters an LES zone. In this case the time-averaged flow variables have to be transformed into unsteady spatially filtered variables, which contain the resolved part of the turbulent energy spectrum in the LES. The DES ansatz assumes that such turbulent fluctuations occur more or less automatically by switching from the RANS model to an LES subgrid scale model. This assumption is, however, only acceptable when pronounced instability mechanisms are present, such as in strongly unstable shear layers in separated flows, which generate turbulence over a relatively small streamwise distance. For the airfoil-flap configuration presented in this paper a DES is not applicable; we therefore use overlapping RANS-LES domains in a zonal method. The different domains of the zonal LES concept are sketched in Fig. 1. The RANS solution yields target values for a sponge-layer technique applied in the transfer zone between the RANS and LES domains. On these interfaces the RANS solution is considered to be in steady state. At the inflow boundaries of the LES domain, controlled forcing terms [22], which depend on the local turbulent shear stress profiles obtained from the RANS solution, are used to excite turbulent fluctuations.
For the prediction of the acoustic field of the high-lift configuration a hybrid method is also used. It is based on a two-step approach in which source terms are determined by an LES and are then used in a numerical solution of the acoustic perturbation equations (APE) [3] for the acoustic field. Compared to direct methods the hybrid approach possesses the potential to be more efficient in many aeroacoustic problems, since it exploits the different length scales of the flow field and the acoustic field.
2 Numerical Methods

2.1 Large-Eddy Simulation
The large-eddy simulation method is based on the filtered Navier-Stokes equations for three-dimensional compressible flow. For an arbitrary flow variable φ its spatially filtered form φ̄ and its Favre-averaged form φ̃ are

$$\bar\phi(x_i, t) = \frac{1}{V} \int_V G(x_i)\, \phi(x_i, t)\, dV, \qquad \tilde\phi = \frac{\overline{\rho \phi}}{\bar\rho}, \tag{1}$$
where G(x_i) is the filtering function and ρ is the density. Using the standard notation, the filtered equations with mass-weighted variables are given by

$$\frac{\partial \bar\rho}{\partial t} + \frac{\partial}{\partial x_i}\left(\bar\rho \tilde u_i\right) = 0, \tag{2}$$

$$\frac{\partial}{\partial t}\left(\bar\rho \tilde u_i\right) + \frac{\partial}{\partial x_j}\left(\bar\rho \tilde u_i \tilde u_j\right) + \frac{\partial \bar p}{\partial x_i} = \frac{\partial \bar\tau_{ij}}{\partial x_j} + \frac{\partial \tau_{ij}^{SGS}}{\partial x_j}, \tag{3}$$

$$\frac{\partial}{\partial t}\left(\bar\rho \tilde e_t\right) + \frac{\partial}{\partial x_j}\left[\left(\bar\rho \tilde e_t + \bar p\right)\tilde u_j\right] = \frac{\partial}{\partial x_j}\left(\tilde u_i \bar\tau_{ij} + \tilde u_i \tau_{ij}^{SGS}\right) - \frac{\partial}{\partial x_j}\left(\bar q_j + q_j^{SGS}\right), \tag{4}$$

or, written in non-dimensionalized vector form for generalized coordinates ξ_{i=1,2,3},

$$\frac{\partial Q}{\partial t} + \frac{\partial F_{a,i}}{\partial \xi_i} = \frac{\partial F_{v,i}}{\partial \xi_i}. \tag{5}$$
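To make the filtering operation in Eq. (1) concrete, the following sketch applies a top-hat (box) filter and its density-weighted Favre counterpart to a one-dimensional test field; the kernel width and the synthetic data are illustrative assumptions, not part of the solver described here.

```python
import numpy as np

def box_filter(f, width):
    """Top-hat (box) filter realized as a moving average, i.e. the G(x_i) of Eq. (1)."""
    kernel = np.ones(width) / width
    return np.convolve(f, kernel, mode="same")

def favre_filter(rho, phi, width):
    """Density-weighted (Favre) filter: phi_tilde = bar(rho*phi) / bar(rho)."""
    return box_filter(rho * phi, width) / box_filter(rho, width)

# tiny demonstration on synthetic data
x = np.linspace(0.0, 2.0 * np.pi, 256)
rho = 1.0 + 0.1 * np.sin(3.0 * x)        # mildly varying density
u = np.sin(x) + 0.05 * np.sin(40.0 * x)  # resolved wave plus small-scale content
u_tilde = favre_filter(rho, u, width=9)  # the small scales are attenuated
```

For a spatially constant field the Favre filter returns the field unchanged, which is a convenient sanity check of the implementation.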
The applied block-structured flow solver is based on a vertex-centered finite-volume technique, where the equations are implicitly filtered by a top-hat filter. The modeling of the SGS stress terms τ_ij^SGS and q_j^SGS will be discussed further below. Due to the nonlinearity of the convection terms their discrete formulation has a strong impact on the solution and as such has to be carefully selected. It has been shown that a mixed central-upwind AUSM (Advective Upstream Splitting Method) scheme with low numerical dissipation is appropriate for the discretization of these convective fluxes [14]. The AUSM method was introduced by Liou and Steffen [13], who split the inviscid fluxes F_a into a convective part F_a^C and a pressure part F_a^P. After inserting the local sound speed c, the convective part is reformulated using a Mach number weighted interpolation

$$F_a^c = \frac{1}{2}\left[\left(Ma^+ + Ma^-\right)\left(f_c^- + f_c^+\right) + \left|Ma^+ + Ma^-\right|\left(f_c^- - f_c^+\right)\right]. \tag{6}$$
The fluxes f_c^± and the Mach numbers Ma^± are determined by interpolated primitive variables on the left (−) and right (+) cell faces. Using the conservative MUSCL interpolation according to van Leer [24] allows the formulation of an O(h²) accurate scheme. The pressure part of the inviscid term is split using a pressure formulation based on a Mach number weighted interpolation

$$p^\pm = p^\pm \left(\frac{1}{2} \pm \chi\, Ma^\pm\right). \tag{7}$$
Investigations in [14] have shown that the parameter χ, which represents the rate of change of the pressure ratio with respect to the local Mach number, has a strong influence on the numerical dissipation of the scheme. A central approximation, i.e., χ = 0, clearly yields less numerical dissipation and is well suited for performing LES. The viscous terms are discretized by a central scheme of second-order accuracy. The temporal integration is performed by a 5-step Runge-Kutta method, the coefficients of which are optimized for maximum stability. These formulations
result in an overall approximation of second-order accuracy in space and time. For a detailed description of the method the reader is referred to [14]. The numerical analysis therein also showed that the formulation of the Navier-Stokes equations along with the pressure-term-modified AUSM approximation yielded the most convincing results when the MILES approach was used, i.e., when the inherent truncation error of the numerical scheme is used to mimic the dissipative character of an SGS model [6]. For the current study, we therefore use the MILES approach.

2.2 Reynolds-Averaged Navier-Stokes Equations
The RANS simulation is based on the time-averaged Navier-Stokes equations. Since these equations are not closed, the unknowns are related to the mean flow variables via a turbulence model. In the present work, the Spalart-Allmaras turbulence model [21] was chosen for the RANS simulation on the entire domain. These RANS findings are then used for the hybrid RANS-LES coupling in the transfer zone and for the turbulent inflow generation at the inlet of the zonal LES for the airfoil-flap configuration (see Fig. 1).

2.3 Acoustic Perturbation Equations
The set of acoustic perturbation equations (APE) used in the present simulations to predict the acoustic field corresponds to the APE-4 formulation proposed in [3]. It is derived by rewriting the Navier-Stokes equations as

$$\frac{\partial p'}{\partial t} + \bar c^2\, \nabla \cdot \left(\bar\rho\, \boldsymbol{u}' + \bar{\boldsymbol{u}}\, \frac{p'}{\bar c^2}\right) = \bar c^2 q_c, \tag{8}$$

$$\frac{\partial \boldsymbol{u}'}{\partial t} + \nabla\left(\bar{\boldsymbol{u}} \cdot \boldsymbol{u}'\right) + \nabla\left(\frac{p'}{\bar\rho}\right) = \boldsymbol{q}_m. \tag{9}$$
The right-hand side terms constitute the acoustic sources

$$q_c = -\nabla \cdot \left(\rho' \boldsymbol{u}'\right)' + \frac{\bar\rho}{c_p} \frac{\bar D s'}{Dt}, \tag{10}$$

$$\boldsymbol{q}_m = -\left(\boldsymbol{\omega} \times \boldsymbol{u}\right)' + T' \nabla \bar s - s' \nabla \bar T - \nabla \left(\frac{\left(u'\right)^2}{2}\right) + \left(\frac{\nabla \cdot \boldsymbol{\tau}}{\rho}\right)'. \tag{11}$$
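Since the perturbed Lamb vector (ω × u)' dominates q_m in the later acoustic analysis, a short sketch of its evaluation on a 2-D grid may be helpful; the finite-difference gradients and the snapshot-mean definition of the perturbation are illustrative assumptions.

```python
import numpy as np

def lamb_vector_2d(u, v, dx, dy):
    """Lamb vector L = omega x u for a 2-D field. The vorticity has only a
    z-component omega_z = dv/dx - du/dy, hence L = (-omega_z * v, omega_z * u)."""
    dvdx = np.gradient(v, dx, axis=0)
    dudy = np.gradient(u, dy, axis=1)
    omega_z = dvdx - dudy
    return -omega_z * v, omega_z * u

def perturbed_lamb(u_series, v_series, dx, dy):
    """(omega x u)': instantaneous Lamb vector minus its time average over the
    recorded snapshots, i.e. the dominant APE-4 source term."""
    Lx, Ly = [], []
    for u, v in zip(u_series, v_series):
        lx, ly = lamb_vector_2d(u, v, dx, dy)
        Lx.append(lx)
        Ly.append(ly)
    Lx, Ly = np.array(Lx), np.array(Ly)
    return Lx - Lx.mean(axis=0), Ly - Ly.mean(axis=0)

# sanity check: for identical (steady) snapshots the source must vanish
nx, ny = 16, 8
x = np.linspace(0.0, 1.0, nx)
u0 = np.tile(np.sin(2.0 * np.pi * x)[:, None], (1, ny))
v0 = np.tile(np.cos(2.0 * np.pi * x)[:, None], (1, ny))
Lx_p, Ly_p = perturbed_lamb([u0, u0, u0], [v0, v0, v0], dx=1.0 / 15.0, dy=1.0)
```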
To obtain the APE system with the perturbation pressure as an independent variable, the second law of thermodynamics in a first-order formulation is used. The left-hand side constitutes a linear system describing linear wave propagation in mean flows including convection and refraction effects. The viscous effects are neglected in the acoustic simulations, i.e., the last source term in the momentum equation is dropped. The numerical algorithm to solve the APE-4 system is based on a 7-point finite-difference scheme using the dispersion-relation preserving (DRP) scheme [23] for the spatial discretization, including the metric terms on curvilinear grids. This scheme accurately resolves waves discretized with more than 5.4 points per wavelength (PPW).
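The structure of such a 7-point stencil can be sketched as follows; for brevity the classical 6th-order central coefficients are used here, whereas the actual DRP scheme replaces them with coefficients optimized to minimize dispersion error (approximately a1 = 0.77088, a2 = -0.16671, a3 = 0.02084 in the Tam-Webb formulation).

```python
import numpy as np

# Antisymmetric 7-point central stencil; classical 6th-order coefficients.
A1, A2, A3 = 3.0 / 4.0, -3.0 / 20.0, 1.0 / 60.0

def ddx_7pt(f, dx):
    """First derivative on a periodic grid with a 7-point central stencil."""
    return (A1 * (np.roll(f, -1) - np.roll(f, 1))
            + A2 * (np.roll(f, -2) - np.roll(f, 2))
            + A3 * (np.roll(f, -3) - np.roll(f, 3))) / dx

# the stencil differentiates a well-resolved wave almost exactly
n = 64
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
err = np.max(np.abs(ddx_7pt(np.sin(x), x[1] - x[0]) - np.cos(x)))
```

The DRP variant trades a little formal order of accuracy for a much smaller phase error of marginally resolved waves, which is what the quoted 5.4 PPW resolution limit refers to.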
For the time integration an alternating 5-6 stage low-dispersion low-dissipation Runge-Kutta scheme [8] is implemented. To eliminate spurious oscillations the solution is filtered with a 6th-order explicit commutative filter [19,25] at every tenth iteration step. Since the APE system does not describe the convection of entropy and vorticity perturbations [3], the asymptotic radiation boundary condition by Tam and Webb [23] is sufficient to minimize reflections on the outer boundaries. On the inner boundaries between the different matching blocks covering the LES and the acoustic domain, where the transition from the inhomogeneous to the homogeneous acoustic equations takes place, a damping zone is formulated to suppress artificial noise generated by a discontinuity in the vorticity distribution [17].

2.4 Sponge Layer
On the far-field boundary of the full LES domain and on the outer boundary of the zonal LES region, a special treatment is needed to reduce spurious numerical reflections. An efficient method to reduce disturbances is based on a buffer domain near the boundary, the so-called sponge layer [5], in which additional local damping is applied to reduce fluctuations. A source term vector S, which depends on the deviation of the instantaneous conservative flow variables Q from the turbulent steady-state values Q*,

$$S = \sigma_{sp} \left(Q^* - Q\right), \tag{12}$$
is added to the right-hand side of the Navier-Stokes equations. The damping factor σ_sp is defined as

$$\sigma_{sp} = \sigma_{sp,max} \cdot \left(d / d_{max}\right)^{\beta}, \tag{13}$$
where d is the dimensionless distance of a point from the interior boundary of the sponge layer, and d_max is the maximum sponge layer thickness. That is, the damping factor σ_sp is zero at the interior boundary and reaches its maximum value σ_sp,max at the outer boundary of the sponge layer. In the current simulation, the values σ_sp,max = 0.5 and β = 2 are used, which have been applied successfully in various simulations to reduce numerical reflections, see, e.g., [7,18].
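Equations (12) and (13) amount to only a few lines of code; this sketch uses the quoted values σ_sp,max = 0.5 and β = 2 as defaults.

```python
import numpy as np

def sponge_source(Q, Q_target, d, d_max, sigma_max=0.5, beta=2.0):
    """Sponge-layer source term of Eqs. (12)-(13): S = sigma_sp * (Q* - Q)
    with sigma_sp = sigma_max * (d / d_max)**beta, where d is measured from
    the interior boundary of the layer."""
    sigma = sigma_max * (d / d_max) ** beta
    return sigma * (Q_target - Q)
```

At the interior boundary (d = 0) the source vanishes, so the interior solution is untouched; at the outer boundary the flow is driven towards the target state with the full damping strength.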
2.5 Generation of Turbulent Fluctuations
The method proposed by Spille and Kaltenbach [22] is used here to generate turbulent fluctuations at the inlet of the LES zone. Herein, a body force is added to the wall-normal momentum equation on a number of control planes at different streamwise positions,

$$\frac{\partial}{\partial t}\left(\bar\rho \tilde u_i\right) + \frac{\partial}{\partial x_j}\left(\bar\rho \tilde u_i \tilde u_j\right) = -\frac{\partial \bar p}{\partial x_i} + \frac{\partial \bar\tau_{ij}}{\partial x_j} + \frac{\partial \tau_{ij}^{SGS}}{\partial x_j} + \delta_{i2}\, f, \tag{14}$$
to amplify the wall-normal velocity fluctuations v', such that a given Reynolds shear stress (RSS) profile ⟨u'v'⟩* can be matched. Unlike the methods discussed
in [12,1], this method produces correct turbulent statistics after a relatively short distance [10]. It operates similarly to a closed-loop control system. The difference between the target and the current RSS profile is the error function e(y, t), which acts as the input parameter for the control system,

$$e(y, t) = \langle u'v'\rangle^{*}\!\left(x_{p,i}, y\right) - \langle u'v'\rangle^{z,t}\!\left(x_{p,i}, y, t\right). \tag{15}$$
The term ⟨u'v'⟩*(x_{p,i}, y) is the target RSS at the i-th control plane at x = x_{p,i}, and the term ⟨u'v'⟩^{z,t}(x_{p,i}, y, t) is the calculated RSS on the i-th control plane at time t, averaged over the spanwise direction and over time. The spatial and temporal averaging are denoted by the superscripts z and t, respectively. The amplitude of the introduced force term is controlled by the error function according to

$$r(y, t) = \alpha\, e(y, t) + \beta \int_0^t e(y, t')\, dt'. \tag{16}$$
The forcing term f, which is added to the right-hand side of the wall-normal momentum equation (14), reads

$$f\left(x_{p,i}, y, z, t\right) = r(y, t)\left[u\left(x_{p,i}, y, z, t\right) - \langle u\rangle^{z,t}\!\left(x_{p,i}, y, t\right)\right]. \tag{17}$$
The constants α and β define the proportional and integral behavior of the controller, respectively, and are to be chosen such that the error signal can be reduced sufficiently in a short time and no instabilities are generated. It goes without saying that the parameters are method and problem dependent. In the present investigation, good results were obtained with the control parameters α=40 and β=0.25.
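The controller of Eqs. (15)-(17) can be sketched as follows; the array layout and the class interface are illustrative assumptions, with α = 40 and β = 0.25 taken from the values quoted above.

```python
import numpy as np

class RSSController:
    """PI controller of Eqs. (15)-(17): drives the spanwise/time averaged
    Reynolds shear stress towards the RANS target profile via a wall-normal
    body force applied in the control planes."""

    def __init__(self, uv_target, alpha=40.0, beta=0.25):
        self.uv_target = np.asarray(uv_target, dtype=float)  # <u'v'>*(y)
        self.alpha = alpha
        self.beta = beta
        self.integral = np.zeros_like(self.uv_target)

    def force_amplitude(self, uv_current, dt):
        e = self.uv_target - uv_current          # error function e(y, t), Eq. (15)
        self.integral += e * dt                  # running integral of the error
        return self.alpha * e + self.beta * self.integral  # r(y, t), Eq. (16)

    def body_force(self, uv_current, u_inst, u_mean, dt):
        """f = r(y, t) * (u - <u>^{z,t}), Eq. (17); u_inst has shape (n_y, n_z)."""
        r = self.force_amplitude(uv_current, dt)
        return r[:, None] * (u_inst - u_mean[:, None])

# if the target RSS is already matched, no force is introduced
ctrl = RSSController(uv_target=np.zeros(4))
f = ctrl.body_force(np.zeros(4), np.ones((4, 3)), np.ones(4), dt=1.0e-2)
```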
Fig. 1. Zonal computation of the airfoil-flap configuration
Fig. 2. Airfoil-flap configuration and location of the measurement planes of the velocity profiles

3 Zonal RANS-LES

3.1 Computational Setup
The airfoil-flap geometry, which was defined within the framework of the German project SWING+ [26], is shown in Fig. 2 together with the locations of the
velocity measurements. The flap is deployed at an angle of 34° to the main airfoil chord. The freestream Mach number is Ma∞ = 0.12 and the Reynolds number based on the chord length is Re_c = 10^6. The numerical simulations are conducted for an angle of attack of α = 0°. The full LES is performed for the whole airfoil-flap configuration with the far-field boundary placed approximately 15 chord lengths away from the airfoil surfaces. The spanwise extension is set to 1.28% of the chord length. This extent corresponds approximately to the radius of the eddies which appear in the flap cove due to the separation from the sharp cove lip on the pressure side of the main airfoil. The computational grid for the full and the zonal LES is shown in Fig. 3. The regions "I" and "II" are the zones in which the turbulent inflow is generated. The 3D zonal LES is carried out in zone "III". The inlet planes for the zonal LES are placed right ahead of the trailing edges, since the time-averaged values of the turbulent boundary layers on the upper and lower side of the airfoil are considered to be predictable with sufficient accuracy by the RANS approach. The wall is resolved with a smallest wall distance of Δy+_min ≈ 2. The stream- and spanwise resolutions are Δx+ = 150 ∼ 200 and Δz+ ≈ 20, respectively. A 2D RANS computation was also performed on the just-described LES mesh.
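Wall resolutions quoted in wall units can be translated into physical grid spacings with a flat-plate estimate; the skin-friction correlation c_f ≈ 0.0592 Re_x^(-1/5) used below is a textbook assumption for illustration, not necessarily the one employed by the authors.

```python
import numpy as np

def wall_spacing(dy_plus, u_inf, nu, x):
    """Physical wall distance for a target Delta y+ based on the turbulent
    flat-plate skin-friction correlation c_f ~ 0.0592 * Re_x**(-1/5)
    (an assumption; any flat-plate correlation serves the same purpose)."""
    re_x = u_inf * x / nu
    cf = 0.0592 * re_x ** (-0.2)
    u_tau = u_inf * np.sqrt(cf / 2.0)   # friction velocity
    return dy_plus * nu / u_tau         # Delta y = Delta y+ * nu / u_tau

# e.g. Re_c = 1e6 with unit chord and unit freestream velocity at mid-chord:
dy = wall_spacing(dy_plus=2.0, u_inf=1.0, nu=1.0e-6, x=0.5)
```

The same call with Δx+ or Δz+ as the first argument gives the corresponding tangential spacings, since the spacing is linear in the target wall-unit value.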
Fig. 3. Computational domain of the zonal LES comprising zones I, II, and III
Table 1. Parameters of the mesh of the SWING+ airfoil-flap configuration. Lx, Ly, and Lz are the extents of the domain in the x-, y-, and z-direction. Nx,y denotes the overall point number in the x-y plane, and Nz is the point number in the spanwise direction.

            Lx         Ly         Lz           Nx,y      Nz   Cell number
Full mesh   −15 ∼ 15   −12 ∼ 12   0 ∼ 0.0128   311,238   41   12,760,758
Zonal mesh                                     181,470   41    7,440,270
To show the validity of the zonal LES concept, the impact of the mesh should be minimized. For this reason, the zonal LES is performed on a mesh which is identical to an inner part of the full LES mesh. The properties of both meshes are summarized in Tab. 1. If the zonal approach is applied, the total cell number is reduced from 12.7 million for the full LES to 7.4 million for the zonal LES, which corresponds to a reduction of 42% in computational cost. The unsteady characteristics of the turbulent boundary layers at the inlet of the zonal mesh are generated by the forcing term technique, which has been validated for a turbulent channel flow (not presented here). These results suggested choosing the length of the inflow generation zones "I" and "II" to be 15 ∼ 20 δ_u,l and their heights 3 ∼ 5 δ_u,l, where δ_u,l denotes the boundary layer thickness δ near the end of the trailing edges on the upper and lower airfoil surfaces, respectively. Prior to the zonal LES, a 2D RANS computation is performed on the full mesh to define the transfer zones between the zonal and the full mesh. The Reynolds shear stress profiles at some discrete locations are used for the turbulent inflow generation in the zones "I" and "II". On the airfoil and flap surfaces an adiabatic no-slip wall condition is imposed. In the spanwise direction, periodic boundary conditions are used. To avoid spurious waves, non-reflecting boundary conditions [15] are applied in conjunction with a sponge layer on the outer boundaries for the full LES.

3.2 Results
The zonal LES is conducted for the domain shown in Fig. 3. The Spille/Kaltenbach method is applied in zones "I" and "II", such that at the inlet of the zonal LES domain, i.e., zone "III", fully developed turbulent boundary layers exist. The four control planes are located in zone "I" at x1/c = 0.49, x2/c = 0.51, x3/c = 0.53, and x4/c = 0.55 on the main airfoil upper side. For zone "II", the locations are x1/c = 0.42, x2/c = 0.43, x3/c = 0.44, and x4/c = 0.45 on the airfoil lower side. The distances between the control planes correspond approximately to 10 times the local momentum thickness. Turbulent structures are generated when the fluid passes the control planes in zones "I" and "II". After a development length of about 0.2 chords with 4 equidistantly distributed control planes on the upper and lower airfoil surfaces, the boundary layers possess a physically meaningful instantaneous turbulence structure. In Fig. 7 the resolved turbulent coherent structures of the full and zonal simulations are visualized by λ2 contours. The illustration of the turbulent eddies
evidences the intricate flow dynamics in the flap cove region and above the flap. The separated shear layer from the flap cove lip divides the structures into two parts. The slow structures, colored dark blue, occur in the low-speed recirculation zone. The light blue structures are accelerated by the local flow and pass through the gap. By passing the gap between the main element and the flap they are stretched and become slender longitudinal eddies. Above them, the small-scale vortex shedding behind the trailing edge of the main airfoil is evident. These flow structures above the flap leading edge merge with the turbulent boundary layer on the flap upper side and form the turbulent wake of the airfoil flow. Near the trailing edge on the upper surface of the flap small recirculation regions occur, i.e., the flow is detached. The comparison of the full and the zonal LES solutions shows qualitatively similar results.

The upper and lower interfaces between the LES zone "III" and the 2D RANS zone in Fig. 3 are located approximately 10δ away from the airfoil-flap configuration. In these regions the turbulent fluctuations of the boundary layer have already decayed to zero, and hence the flow can be well approximated by a Reynolds-averaged formulation. For the vertical interface 0.5c downstream of the flap, the turbulent wake can be considered to have only a minor influence on the upstream flow. On these interfaces, a sponge layer technique is applied to drive the unsteady LES flow variables towards the RANS solution. The time and spanwise averaged pressure coefficients and velocity profiles are compared with full LES results and experimental data in Figs. 4 and 6. For the pressure coefficient, deviations occur at the beginning of the inflow generation zones "I" and "II". They are caused by the source terms added to the momentum equation in the control planes, which generate unphysical pressure variations.
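The λ2 vortex criterion used for Fig. 7 can be sketched for a single point; the Jeong-Hussain definition via the second eigenvalue of the symmetric tensor S² + Ω² is assumed here, since the text uses λ2 without defining it.

```python
import numpy as np

def lambda2(grad_u):
    """Jeong-Hussain lambda_2 criterion: the second (middle) eigenvalue of
    S^2 + Omega^2, where S and Omega are the symmetric and antisymmetric
    parts of the velocity gradient tensor. A vortex core has lambda_2 < 0."""
    S = 0.5 * (grad_u + grad_u.T)
    Omega = 0.5 * (grad_u - grad_u.T)
    M = S @ S + Omega @ Omega          # symmetric, so eigvalsh applies
    return np.sort(np.linalg.eigvalsh(M))[1]

# solid-body rotation about the z-axis is detected as a vortex (lambda_2 < 0)
l2_rot = lambda2(np.array([[0.0, -1.0, 0.0],
                           [1.0,  0.0, 0.0],
                           [0.0,  0.0, 0.0]]))
```

In a post-processing tool this scalar would be evaluated at every grid point and an isosurface such as λ2 = −0.8 extracted, as done for Fig. 7.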
Shortly downstream of the control planes, the distribution shows an almost perfect match with the full LES results (see Fig. 5). Note that in the zonal method the
Fig. 4. Time and spanwise averaged pressure coefficient − Cp (x)
Fig. 5. Time and spanwise averaged pressure coefficient − Cp (x) on the upper surface of the main airfoil in different zones
Fig. 6. Time and spanwise averaged wall parallel velocity profiles u/u∞ at several streamwise locations which are defined in Fig. 2
analysis comprises the RANS and LES solutions. That is, in the transition region where the control planes are located the final result is still determined by the RANS solution. Only further downstream is the LES solution substituted for the first-step RANS distribution. In other words, the differences due to the impact of the artificial forces in the control planes do not occur in the final representation of the composed RANS-LES distribution. The small recirculation region near the trailing edge of the flap, visible in the λ2 contours in Fig. 7, cannot be identified in the Cp(x) distributions of the full or the zonal LES. Figure 6 shows that the velocity profile at position "A" is overpredicted by the full LES in the outer part of the boundary layer due to an insufficient momentum exchange in the wall-normal direction, which is caused by the underresolution.
Fig. 7. λ2 =-0.8 contours colored by the local Mach number: full (left) and zonal LES (right)
Fig. 8. Profile of the rms wall-parallel velocity fluctuations
Fig. 9. Profile of the rms wall-normal velocity fluctuations
Using the Spille/Kaltenbach method, an artificially generated and yet physically correct turbulent boundary layer develops at the inlet of the zonal mesh. The artificially created fluctuations excite the momentum exchange, so that the wall-parallel velocity component is reduced and the wall-normal component of the velocity is increased, yielding a better agreement with the experimental data. At position "E" the simulated flow in the full LES tends to separate, which can be seen in the velocity profile at position "E" in Fig. 6. This is due to the fact that the simulated flow on the airfoil lower side does not have enough turbulent kinetic energy to overcome the positive pressure gradient, especially when the numerical dissipation of the underresolved mesh removes additional energy from the turbulent flow. With the application of the turbulent inflow generation
Fig. 10. Contours of the turbulent kinetic energy for full (left) and zonal LES (right)
method, a physically correct turbulent flow is generated on the airfoil lower side, so that its ability to overcome the positive pressure gradient is increased. The increase of the turbulence intensities at positions "A" and "E" is evidenced in Fig. 8. The distributions of the wall-parallel and the wall-normal rms velocities, i.e., u^rms_wp and v^rms_wn, are compared in Figs. 8 and 9 for the full and the zonal LES computations. With the exception of the increases in the u^rms_wp distributions of the zonal solution at positions "A" and "E", which are due to the Spille/Kaltenbach method, the overall agreement is convincing. Contours of the turbulent kinetic energy and the Reynolds shear stress ⟨u'v'⟩ of the full and zonal LES are plotted in Fig. 10. As can be expected from the previous discussion, the overall agreement of the fluctuation statistics possesses such a quality that the zonal method should be adopted for such a flow problem, since it is more efficient without contaminating the flow physics.
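The time and spanwise averaged Reynolds shear stress compared above can be sketched as follows; the snapshot array layout (n_t, n_y, n_z) is an assumption for illustration.

```python
import numpy as np

def reynolds_shear_stress(u_snapshots, v_snapshots):
    """<u'v'> averaged over time (axis 0) and span (axis 2) from velocity
    snapshots of shape (n_t, n_y, n_z); returns a wall-normal profile."""
    u_mean = u_snapshots.mean(axis=(0, 2), keepdims=True)
    v_mean = v_snapshots.mean(axis=(0, 2), keepdims=True)
    return ((u_snapshots - u_mean) * (v_snapshots - v_mean)).mean(axis=(0, 2))

# perfectly anti-correlated fluctuations give <u'v'> = -<u'^2>
u_snap = np.array([[[1.0, -1.0]], [[1.0, -1.0]]])   # (n_t=2, n_y=1, n_z=2)
rss = reynolds_shear_stress(u_snap, -u_snap)
```

The same averaging operator, applied on the control planes only, produces the ⟨u'v'⟩^{z,t} term of Eq. (15).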
4 Aeroacoustics of Slat Noise

4.1 Computational Setup
Fig. 11. LES grid in the slat cove area of the high-lift configuration. Every 2nd grid point is depicted.
The computational mesh for the LES around the airfoil-slat configuration consists of 32 blocks with a total of 55 million grid points. The extent in the spanwise direction is 2.1% of the clean chord length and is resolved with 65 points. Figure 11 depicts the mesh near the airfoil and in the slat cove area. To ensure a sufficient resolution in the near-surface region of Δx+ ≈ 100, Δy+ ≈ 1, and Δz+ ≈ 22 [16], the analytical solution for a flat plate was used during the grid generation process to approximate the required step sizes. On the far-field boundaries of the computational domain, boundary conditions based on the theory of characteristics are applied. A sponge layer following Israeli et al. [9] is imposed on these boundaries to avoid spurious reflections, which would otherwise influence the acoustic analysis. On the walls an adiabatic no-slip
Fig. 12. APE grid of the high-lift configuration. Every 2nd grid point is depicted.
boundary condition is applied, and in the spanwise direction periodic boundary conditions are used. The computation is performed for a Mach number of Ma = 0.16 at an angle of attack of α = 13°. The Reynolds number is set to Re = 1.4·10^6. The initial conditions were obtained from a two-dimensional RANS simulation. The acoustic analysis is done by a two-dimensional approach. That is, the spanwise extent of the computational domain of the LES can be limited since, especially at low Mach numbers, the turbulent length scales are significantly smaller than the acoustic length scales and as such the noise sources can be considered compact. This treatment tends to result in somewhat overpredicted sound pressure levels, which are corrected following the method described by Ewert et al. in [4]. The acoustic mesh for the APE solution has a total of 1.8 million points, which are distributed over 24 blocks. Figure 12 shows a section of the grid used. The maximum grid spacing in the whole domain is chosen to resolve 8 kHz as the highest frequency. The acoustic solver uses as input data the mean flow field obtained by averaging the unsteady LES data and the time-dependent perturbed Lamb vector (ω × u)', which is also computed from the LES results. To be consistent with the large-eddy simulation, the Mach number, the angle of attack, and the Reynolds number are set to Ma = 0.16, α = 13°, and Re = 1.4·10^6, respectively.

4.2 Results
The large-eddy simulation has been run for about 5 non-dimensional time units based on the freestream velocity and the clean chord length. During this time a fully developed turbulent flow field was obtained. Subsequently, samples for the statistical analysis and for the computation of the aeroacoustic source terms were recorded. The sampling time interval was chosen to be approximately 0.0015 time units. A total of 4000 data sets using 7 Terabytes of disk space have been collected, which cover an overall time of approximately 6 non-dimensional time units. The simulations were run on the NEC SX-8 of the HLRS Stuttgart. The maximum computing speed amounts to 6.7 GFLOPS, the average value was 5.9 GFLOPS. An average vectorization ratio of 99.6% was achieved with a mean vector length of 247.4. First of all, the quality of the results is assessed on the basis of the mesh resolution near the walls. Figures 13 to 16 depict the determined values of the grid resolution and show that the flat-plate approximation yields satisfactory results. However, due to the accelerated and decelerated flow on the suction and pressure side, respectively, the grid resolution departs somewhat from the approximated values. In the slat cove region the resolution reaches everywhere the values required for large-eddy simulations of wall-bounded flows (Δx+ ≈ 100, Δy+ ≈ 2, and Δz+ ≈ 20 [16]). The Mach number distribution and some selected streamlines of the time and spanwise averaged flow field are presented in Fig. 17. Apart from the two stagnation points one can see the area with the highest velocity on the suction side shortly downstream of the slat gap. Also recognizable is a large recirculation domain which fills the whole slat cove area. It is bounded by a shear layer which
Fig. 13. Grid resolution near the wall: suction side of the main wing
Fig. 14. Grid resolution near the wall: pressure side of the main wing
Fig. 15. Grid resolution near the wall: suction side of the slat
Fig. 16. Grid resolution near the wall: slat cove
Fig. 17. Time and spanwise averaged Mach number distribution and some selected streamlines
Fig. 18. Comparison of the cp coefficient between LES, RANS and experimental data [11]
develops from the slat cusp and reattaches close to the slat trailing edge. The pressure coefficient cp computed from the time-averaged LES solution is compared in Fig. 18 with RANS results and experimental data. The measurements were carried out at DLR Braunschweig in an anechoic wind tunnel with an open test section within the national project FREQUENZ. These experiments are compared to numerical solutions which mimic uniform freestream conditions. Therefore, even with the correction of the geometric angle of attack of 23° in the measurements to about 13° in the numerical solution, no perfect match between the experimental and numerical data can be expected. Figures 19 to 21 show the turbulent vortex structures by means of λ2 contours. The color mapped onto these contours represents the Mach number. The shear layer between the recirculation area and the flow passing through the slat gap develops large vortical structures near the reattachment point. Most of these structures are
Fig. 19. λ2 contours in the slat region
Fig. 20. λ2 contours in the slat region
Fig. 21. λ2 contours in the slat gap area
Fig. 22. Time and spanwise averaged turbulent kinetic energy in the slat cove region
convected through the slat gap, while some vortices are trapped in the recirculation area and are moved upstream to the cusp. This behavior is in agreement with the findings of Choudhari et al. [2]. Furthermore, as in the investigations in [2], the analysis of the unsteady data indicates a fluctuation of the reattachment point. On the suction side of the slat, shortly downstream of the leading edge, the generation of the vortical structures in Fig. 19 visualizes the transition of the boundary layer. This turbulent boundary layer passes over the slat trailing edge and interacts with the vortical structures convected through the slat gap. Fig. 21 illustrates some more pronounced vortices generated in the reattachment region, whose axes are aligned with the streamwise direction. The distribution of the time and spanwise averaged turbulent kinetic energy k = ½(u'² + v'² + w'²) is depicted in Fig. 22. One can clearly identify the shear layer and the slat trailing edge wake. The peak values occur, in agreement with [2], in the reattachment area. This corresponds to the strong vortical structures in this area evidenced in Fig. 20. A snapshot of the distribution of the acoustic sources by means of the perturbed Lamb vector (ω × u)' is shown in Fig. 23. The strongest acoustic sources are caused by the y-component of the Lamb vector presented here. The peak value occurs on the suction side downstream of the slat trailing edge, whereas somewhat smaller values are found near the main wing trailing edge. Figures 24 and 25 illustrate snapshots of the pressure fluctuations based on the APE and the LES solutions. Especially in the APE solution the interaction between the noise of the main wing and that of the slat is obvious. A closer look reveals that the slat sources are dominant compared to the main airfoil trailing edge sources. It is clear that the LES mesh is not able to resolve the high-frequency waves at some distance from the airfoil.
The power spectral density (PSD) for an observer point at x = −1.02 and y = 1.76 is compared with experimental results in Fig. 26 [11]. The magnitude and
Fig. 23. Snapshot of the y-component of the Lamb vector
152
M. Meinke et al.
Fig. 24. Pressure contours based on the LES/APE solution
Fig. 25. Pressure contours based on the LES solution
Fig. 26. Power spectral density for a point at x=-1.02 and y=1.76
Fig. 27. Directivities for a circle with R = 1.5 based on the APE solution
the decay of the PSD at increasing Strouhal number (Sr) is in good agreement with the experimental findings. A clear correlation of the tonal components is not possible due to the limited period of time available for the fast Fourier transformation, which in turn results from the small number of input data. The directivities of the slat gap noise source and the main airfoil trailing edge source are shown in Fig. 27 on a circle of radius R = 1.5 centered near the trailing edge of the slat. The following geometric source definitions were used: the slat source covers the part from the leading edge of the slat to 40% chord of the main wing; the remaining part belongs to the main wing trailing edge source. An embedded boundary formulation is used to ensure that no artificial noise is generated [17]. It is evident that the sources located near the slat make a stronger contribution to the total sound field than the main wing trailing edge sources. This behavior corresponds to the distribution of the Lamb vector.
Zonal Large-Eddy Simulations and Aeroacoustics
5 Conclusion
A zonal RANS-LES method to compute the flow over an airfoil-flap configuration has been presented. The results have shown that the computational effort can easily be reduced by nearly 50% with the zonal RANS-LES concept compared to a full LES. In the zonal approach the flow field has been divided into two zones. A 2D RANS simulation has been performed in the zone which is characterized by attached boundary layers such that equilibrium-based turbulence models are valid. In zones with separated flow, where unsteady turbulent scales must be reproduced, e.g., for a subsequent acoustic analysis, an LES is conducted. At the non-inlet interfaces between both zones a sponge layer technique has been implemented to drive the flow variables towards those extracted from the RANS results. A turbulent inflow generation technique using artificial forces in the wall-normal momentum equation [22] has been validated for a turbulent channel flow at Reτ = 590 and then applied to the inlet of the zonal simulation to generate physically relevant turbulent fluctuations in the boundary layers. The comparison of the zonal results with the full LES findings and the experimental data showed a convincing agreement. Due to the realistic turbulent boundary layer generated at the inlet zone, the zonal results were improved in some details in comparison to the full LES findings. At the embedded boundaries no spurious waves are generated. The reduction in computational effort can be increased even further by minimizing the size of the embedded LES domain. In addition, a successful computation of the slat noise based on a hybrid LES/APE method has been presented. The flow field and the acoustic field were computed in good agreement with experimental results, showing that the correct noise generation mechanisms were determined.
The analysis of the slat noise study shows that the interaction of the shear layer of the slat trailing edge with the slat gap flow generates higher vorticity than the main airfoil trailing edge shear layer. Thus, the slat gap is the dominant noise source region. The results of the LES are in good agreement with data from the literature. The acoustic analysis shows the correlation between the areas of high vorticity, especially somewhat downstream of the slat trailing edge and the main wing trailing edge, and the emitted sound.
Acknowledgments The slat noise study was funded by the national project FREQUENZ. The APE solutions were computed with the DLR PIANO code, the development of which is part of the cooperation between DLR Braunschweig and the Institute of Aerodynamics of RWTH Aachen University. All computations were carried out with the high-performance computer NEC-SX8 installed at the HLRS of the University of Stuttgart.
References

1. Batten, P., Goldberg, U., Chakravarthy, S.: AIAA J. 42, 485–492 (2004)
2. Choudhari, M.M., Khorrami, M.R.: AIAA paper 2006-0211 (2006)
3. Ewert, R., Schröder, W.: J. Comput. Phys. 188, 365–398 (2003)
4. Ewert, R., Zhang, Q., Schröder, W., Delfs, J.: AIAA paper 2003-3114 (2003)
5. Freund, J.B.: AIAA J. 35(4), 740–742 (1997)
6. Fureby, C., Grinstein, F.: AIAA J. 37(5), 544–556 (1999)
7. Guo, X., Schröder, W., Meinke, M.: Comput. Fluids 35, 587–606 (2005)
8. Hu, F.Q., Hussaini, M.Y., Manthey, J.L.: J. Comput. Phys. 124(1), 177–191 (1996)
9. Israeli, M., Orszag, S.A.: J. Comput. Phys. 41, 115–135 (1981)
10. Keating, A., Piomelli, U., Balaras, E., Kaltenbach, H.J.: Phys. Fluids 16(12), 4696–4712 (2004)
11. Kolb, A.: Private communication. Project FREQUENZ (2006)
12. Le, H., Moin, P., Kim, J.: J. Fluid Mech. 330, 349–374 (1997)
13. Liou, M.S., Steffen Jr., C.J.: J. Comput. Phys. 107, 23–39 (1993)
14. Meinke, M., Schröder, W., Krause, E., Rister, T.: Comput. Fluids 31, 695–718 (2002)
15. Poinsot, T.J., Lele, S.K.: J. Comput. Phys. 101, 104–129 (1992)
16. Mary, I., Sagaut, P.: Contribution by ONERA. In: Davidson, L., Cokljat, D., Fröhlich, J., Leschziner, M.A., Mellen, C., Rodi, W. (eds.) LESFOIL: Large Eddy Simulation of Flow Around a High Lift Airfoil. Notes on Numerical Fluid Mechanics and Multidisciplinary Design (NNFM), vol. 83, pp. 167–183. Springer, Heidelberg (2003)
17. Schröder, W., Ewert, R.: Some concepts of LES-CAA coupling. In: Wagner, C., Hüttl, T., Sagaut, P. (eds.) Large-Eddy Simulation for Acoustics. Cambridge University Press, Cambridge (2005)
18. Schröder, W., Meinke, M., Ewert, R., El-Askary, W.: LES based trailing-edge noise prediction. In: Liu, C., Sakell, L., Beutner, T. (eds.) DNS/LES - Progress and Challenges. Proc. of the Third AFOSR International Conference on DNS/LES, Arlington, TX, August 5–9, 2001, pp. 689–698. Greyden Press, Columbus, Ohio (2001)
19. Shang, J.S.: J. Comput. Phys. 153, 312–333 (1999)
20. Spalart, P.R.: Trends in turbulence treatments. AIAA paper 2002-2306 (2000)
21. Spalart, P.R., Allmaras, S.R.: A one-equation turbulence model for aerodynamic flows. AIAA paper 92-0439 (1992)
22. Spille, A., Kaltenbach, H.J.: Generation of turbulent inflow data with a prescribed shear-stress profile. In: Liu, C., Sakell, L., Beutner, T. (eds.) DNS/LES - Progress and Challenges. Proc. of the Third AFOSR International Conference on DNS/LES, August 5–9, 2001. Greyden Press, Columbus, Ohio (2001)
23. Tam, C.K.W., Webb, J.C.: J. Comput. Phys. 107(2), 262–281 (1993)
24. van Leer, B.: J. Comput. Phys. 32, 101–136 (1979)
25. Vasilyev, O.V., Lund, T.S., Moin, P.: J. Comput. Phys. 146, 82–104 (1998)
26. Würz, W., Guidati, S., Herr, S.: Aerodynamische Messungen im Laminarwindkanal im Rahmen des DFG-Forschungsprojektes SWING+ Testfall 2. Tech. rep., IAG, Universität Stuttgart (2002)
Experimental Statistical Attacks on Block and Stream Ciphers

S. Doroshenko^1, A. Fionov^1, A. Lubkin^1, V. Monarev^2, B. Ryabko^2, and Yu.I. Shokin^2

^1 Siberian State University of Telecommunications and Computer Science, Kirova str. 86, 630102 Novosibirsk, Russia
[email protected], [email protected], [email protected]
^2 Institute of Computational Technologies SB RAS, Lavrentiev Ave. 6, 630090 Novosibirsk, Russia
[email protected], [email protected], [email protected]
Abstract. Efficient statistical tests, e.g. the recently suggested “Book Stack” test, are successfully applied to detect deviations from randomness in bit sequences generated by stream ciphers such as RC4 and ZK-Crypt, as well as by the block cipher RC6 (with a reduced number of rounds). In the case of RC6, a key recovery attack is also mounted. The essence of the tests is briefly described and experimental data are provided.
1 Introduction
Cryptanalysis of block and stream ciphers is a major topic of cryptology. The point is that any success in attacks on ciphers, in fact, suggests improvements which make the ciphers stronger and more reliable. So cryptanalysis attracts many researchers worldwide and is also a field of application for high-performance computing. One of the goals of cryptographers is to develop encryption techniques such that no supercomputer would be able to break the cipher in centuries. The distinguishing attack on stream ciphers and pseudo-random bit generators (PRBGs) aims to find deviations from randomness in the output sequences they produce. It is generally considered that a PRBG (and a corresponding stream cipher based on it) is cryptographically secure if it passes all polynomial-time statistical tests, i.e., the bit sequences it produces are indistinguishable from truly random sequences by those tests. On the contrary, if a test is found which detects deviations from randomness for a certain PRBG, then this PRBG is unsuitable for cryptographic use. That is why the distinguishing attack on stream ciphers and PRBGs is so important. When a block cipher is used in a stream cipher mode of operation, the same distinguishing attack is also applicable to it and, as will be shown, can serve as a basis for a further key recovery attack.
The authors were supported by Russian Foundation for Basic Research grant no. 06-07-89025.
E. Krause et al. (Eds.): Comp. Science & High Perf. Computing III, NNFM 101, pp. 155–164, 2008. c Springer-Verlag Berlin Heidelberg 2008 springerlink.com
There are two basic approaches to mounting distinguishing attacks. The first one is to show non-randomness by theoretical study of a PRBG. The second approach is to test randomness experimentally, e.g., by applying methods of mathematical statistics. This experimental approach is well known and widely used in practice. In particular, the US National Institute of Standards and Technology (NIST) recommends this approach in [1] and suggests 16 statistical tests for cryptographic PRBGs. In [1], one can also find the main notions of mathematical statistics. We briefly discuss these notions since they are essential for our test. To distinguish between random and non-random sequences, the null hypothesis H0 that the sequence is random must be tested against the alternative hypothesis H1 that the sequence is not random. A statistical test must decide whether to accept or reject the null hypothesis. More specifically, in our test the null hypothesis H0 is that the sequence was generated by a source whose outputs are independent and equiprobable (in the binary case, all zeroes and ones are independent of each other and appear with probability 1/2). The alternative hypothesis H1 = ¬H0 is that the sequence was generated by a stationary and ergodic source different from the source under H0. A generated sequence subject to statistical testing is usually called a sample. Often the whole generated sequence is divided into a number of independent samples for testing. Hypothesis testing is probabilistic in its nature. This is just because if H0 is true then any sample of a given length is equally likely. So when we look at a specific sample we cannot say for sure whether it is random or not. A so-called Type I error occurs if H0 is true but, nevertheless, is rejected by the test. The probability of a Type I error is often called the level of significance of the test and denoted by α. Values of α ranging from 0.001 to 0.05 are employed in practical cryptography.
The opposite situation, when H1 is true but the test accepts H0, is a Type II error. The probability of a Type II error, denoted by β, is usually difficult to determine. For example, the sequence generated by a PRBG is definitely non-random since any PRBG is a deterministic algorithm. Yet, for a good (cryptographically secure) PRBG any applicable test should accept H0. We can say that, within some model of non-randomness, smaller values of β correspond to more powerful tests. In [2] a statistical test called “Book Stack” was first suggested. It was then presented as a tool and compared to the NIST tests at the 2nd Russian-German Advanced Research Workshop on Computational Science and High Performance Computing, March 14–16, 2005, Stuttgart, Germany (see also [3]). It was shown that this test is more powerful than all 16 tests recommended by NIST. A computer implementation of the Book Stack test in C++ together with an operation manual is available at http://web.ict.nsc.ru/∼ldp. In this paper, we present the results of applying the Book Stack test to the RC4 and ZK-Crypt stream ciphers and to the RC6 block cipher. The RC4 stream cipher is well known in the cryptographic community. Sometimes it is referred to as “alleged” RC4 since its specification was not officially published by the author (R. Rivest). In the papers [4,5,6,7], it was shown theoretically that the keystream generated by RC4 is not truly random. In [8], non-randomness
of RC4 was experimentally verified with about 56 trillion samples (> 2^55 bits) of RC4 output. We consider the most popular version of RC4, in which the number of bits in a single output is 8. It turns out that the Book Stack test detects non-randomness of RC4 at a sample size of 2^32 bits. ZK-Crypt [9] is a new stream cipher proposed as a candidate to the ECRYPT Stream Cipher Project (eSTREAM). The Book Stack test declares the output of ZK-Crypt to be far from random when the sample size is about 2^25 bits. We can see that with the Book Stack test, the distinguishing attacks on RC4 and ZK-Crypt are successfully mounted on a PC. It is important to note that all other eSTREAM candidates passed the Book Stack test at sample sizes of 2^30–2^35 bits (the upper bound of the sample size depended on the cipher speed and our computing facilities). We describe the test in the following sections. RC6 is a block cipher proposed as a candidate for the AES (Advanced Encryption Standard, to replace the well-known DES). There are numerous works devoted to cryptanalysis of this cipher. In most cases they use the results of Knudsen and Meier [10] and are applicable to simplified variants of RC6 without whitening (post-whitening and pre-whitening, see [11], [12]). Almost all attacks are based on investigation of statistical characteristics of RC6. For example, according to one of the results, it is possible to distinguish the RC6 output sequence from a random one given an appropriate number of ciphertexts (2^(8r+10) texts for r-round RC6). The best result known so far is described in [11]; it is a key recovery attack which needs 2^(8r-1) plaintexts for r-round RC6 (r = 2, 4, ...) with a time complexity of 2^54 per plaintext–ciphertext pair. Apparently, the time complexity for r = 4 would be considerable (2^85), which makes the attack unrealizable in practice. We propose a new key recovery attack which allows recovering the key for the 5th round with a time complexity of just 2^46.
This attack is also based on examination of the statistical properties of RC6. The paper is organized as follows. In Sect. 2 we give some notes on the statistical criterion and testing techniques used. In Sect. 3 we give experimental data for the attacks on stream ciphers. In Sects. 4 and 5 the data of the cryptanalysis of RC6 are presented.
2 Statistical Criterion and Tests
To derive a decision upon the null hypothesis, a statistic on a sample is first computed. In our tests, we use the well-known x^2 statistic, which is described as follows. Let n denote the sample size, n0 the number of zeroes and n1 the number of ones in the sample, n0 + n1 = n. Let p and q denote the a priori probabilities of zero and one, respectively. Then pn and qn are the expected numbers of zeroes and ones in a sample of size n. The x^2 statistic is defined by the equation

    x^2 = (n0 - pn)^2 / (pn) + (n1 - qn)^2 / (qn).    (1)
For testing H0 directly, we have p = q = 1/2 and (1) reduces to

    x^2 = (n0 - n1)^2 / n.
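As a minimal illustration (the function name is ours, not from the paper), the statistic and the decision against the critical value can be computed directly:

```python
def x2_statistic(n0, n1, p, q):
    """x^2 statistic of Eq. (1) for a binary sample with n0 zeroes,
    n1 ones, and a priori probabilities p (zero) and q = 1 - p (one)."""
    n = n0 + n1
    return (n0 - p * n) ** 2 / (p * n) + (n1 - q * n) ** 2 / (q * n)

# For p = q = 1/2 the statistic reduces to (n0 - n1)^2 / n:
# a sample with n0 = 60 zeroes and n1 = 40 ones gives
# (60 - 40)^2 / 100 = 4.0 > 3.8415, so H0 is rejected at alpha = 0.05.
```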
However, direct testing is usually inefficient and, as a rule, some processing of the sample is performed, after which a hypothesis equivalent to H0 (denoted H0*) is tested for a very skew distribution. The scheme of a statistical test is the following. We set some critical (threshold) value tα > 0. Then we compute the x^2 statistic on a sample and compare it to the critical value. The null hypothesis H0 (or H0*) is accepted if x^2 < tα. Otherwise H0 is rejected. So the probability of a Type I error (the level of significance) is the probability of the event x^2 ≥ tα when H0 is true, i.e. α = P(x^2 ≥ tα). It is known that the x^2 statistic asymptotically obeys the χ2 (chi-square) distribution (with a corresponding number of degrees of freedom). It is generally accepted that it is quite correct to employ the χ2 distribution for x^2 if both qn and pn are greater than 5. There are percentile tables for the χ2 distribution in the literature that show the values of tα for various α. For example, for the binary case we used tα = 3.8415 for α = 0.05 and tα = 10.8376 for α = 0.001. If in a series of tests (on many samples of the generated sequence) we observe values of the statistic greater than 3.8415 in more than 5% of cases, we may conclude that the sequence is not random (i.e. reject H0 with the level of significance 0.05). To transform the input, the sample x1, x2, ..., xs, where each xi ∈ {0, 1}, is considered to consist of w-bit words, 1 ≤ w ≤ 32, extracted from the sample one after another with an optional omission of several bits (a “blank” between words). The words do not overlap. For example, if w = 3 and the blank is 1, the bit sample 0111010100010100... converts to the word sample 011 010 000 010... or, in decimal notation, 3 2 0 2.... So we now have a sequence of words y1, y2, ..., yn obtained from the input sample, where all yi ∈ {0, 1, ..., 2^w − 1}. In the Book Stack test with the size of the upper part u, when observing y1, y2, ..., yn, we calculate how many times the word yi, i = 1, ..., n, was in the upper part of the stack. We denote this number by n0. Obviously, if the null hypothesis H0 is true then all words have the same probability 1/2^w and the probability of the event “yi is in the upper part” is equal to u/2^w. Using the notation introduced above we may write p = u/2^w, q = 1 − p. Now testing H0 is replaced by testing the equivalent hypothesis H0* that the binary random variable Y obeys the distribution P(Y = 0) = p, P(Y = 1) = q, given the sample y1, y2, ..., yn with n0 zeroes and n1 ones. This can be done using the χ2 distribution as explained earlier. Let us make a remark on the complexity of the Book Stack test. Whereas the “naive” method of implementation would require O(u) operations, and more complicated algorithms based on AVL or other balanced trees can perform all
operations in O(log u) time, our current implementation of the test based on hashing is characterized by the expected running time of O(1).
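As an illustration, the word extraction and the Book Stack statistic can be sketched as follows. This is a naive Python sketch with an O(2^w) move-to-front update; the authors' hashing-based O(1) implementation is not reproduced here, and the function names are ours:

```python
def words_from_bits(bits, w, blank=0):
    """Split a 0/1 string into non-overlapping w-bit words,
    skipping `blank` bits between consecutive words."""
    words, i = [], 0
    while i + w <= len(bits):
        words.append(int(bits[i:i + w], 2))
        i += w + blank
    return words

def book_stack_test(words, w, u):
    """Return the x^2 statistic of the Book Stack test with
    upper-part size u; words are integers in {0, ..., 2^w - 1}."""
    stack = list(range(2 ** w))   # initial ordering of the "books"
    n0 = 0                        # hits in the upper part of the stack
    for y in words:
        pos = stack.index(y)      # naive O(2^w) position lookup
        if pos < u:
            n0 += 1
        stack.pop(pos)            # move-to-front update
        stack.insert(0, y)
    n = len(words)
    n1 = n - n0
    p = u / 2 ** w                # probability of a hit under H0
    q = 1 - p
    return (n0 - p * n) ** 2 / (p * n) + (n1 - q * n) ** 2 / (q * n)
```

For the example from the text, `words_from_bits("0111010100010100", 3, 1)` yields the word sample 3 2 0 2; a heavily biased word sequence drives the statistic far above the critical value 10.8376.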
3 Distinguishing Attacks on Stream Ciphers
The randomness of the RC4 keystream was investigated for 100 randomly chosen keys with the Book Stack test. To apply the test we divided the keystream sequence into 16-bit words (w = 16, without blanks) and set the size of the upper part to u = 16. We generated files of different lengths for each randomly chosen 128-bit key (see Table 1) and applied the Book Stack test to each file with the level of significance α = 0.05. So, if the null hypothesis H0 is true then, on average, 5 files out of 100 should be recognized as non-random. All the results obtained are given in Table 1, where the integers in the cells are the numbers of files recognized as non-random (out of 100). Taking into account that, on average, only 5 files out of 100 should be recognized as non-random if the sequences are random, we see that the keystream sequences are far from random when their length is about 2^32 bits (and greater). The randomness of the ZK-Crypt keystream was also investigated for 100 randomly chosen keys with the Book Stack test. The keystream sequence was divided into 32-bit words (w = 32, no blanks). The size of the upper part was set to u = 2^16. We generated files of different lengths for each key (see Table 2) and applied the Book Stack test to each file with the level of significance α = 0.001. So, if H0 is true then, on average, only 1 file out of 1000 should be recognized as non-random. The results obtained are given in Table 2, where the integers in the cells are the numbers of files recognized as non-random (out of 100). We can see that the keystream sequences are far from random when their lengths are about 2^25 bits.
Table 1. Testing RC4

Length (bits)  2^31  2^32  2^33  2^34  2^35  2^36  2^37  2^38  2^39
Non-random       6    12    17    23    37    74    95    99   100
Table 2. Testing ZK-Crypt

Length (bits)  2^24  2^26  2^28  2^30
Non-random      25    51    97   100
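For reference, the RC4 generator whose keystream is tested here is the standard (alleged) RC4; a minimal self-contained sketch of the byte-oriented version (8 bits per output, as stated above) is:

```python
def rc4_keystream(key, n):
    """Return n keystream bytes of the (alleged) RC4 cipher
    for a byte-string key."""
    S = list(range(256))
    j = 0
    for i in range(256):                  # key-scheduling algorithm (KSA)
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    out, i, j = bytearray(), 0, 0
    for _ in range(n):                    # pseudo-random generation (PRGA)
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)

# e.g. the key b"Key" produces the well-known keystream eb 9f 77 81 b7 ...
```

Grouping the resulting bits into 16-bit words gives the word samples fed to the Book Stack test above.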
4 Distinguishing Attack on RC6
We considered the most popular version of RC6 with 128 bits per block or 4 words of 32 bits each. For the sake of consistency, we give a brief description of RC6.
Let us use the following notation:

a <<< b: cyclic rotation of a to the left by b bits;
a >>> b: cyclic rotation of a to the right by b bits;
f(x) = x(2x + 1);
F(x) = f(x) <<< 5;
⊕: bitwise XOR;
lsb_n(X): the least significant n bits of X;
x||y: concatenation of x and y;
S_i: the i-th subkey;
r: the number of rounds;
(A_i, B_i, C_i, D_i): the i-th round input.

First, the 128 (192, 256)-bit key is converted to an array of (2r + 4) 32-bit subkeys S_0, S_1, .... The encryption algorithm can be written as follows:

1. A_1 = A_0, B_1 = B_0 + S_0, C_1 = C_0, D_1 = D_0 + S_1;
2. for i = 1 to r do
   t = F(B_i), u = F(D_i),
   A_{i+1} = B_i, B_{i+1} = ((C_i ⊕ u) <<< t) + S_{2i+1},
   C_{i+1} = D_i, D_{i+1} = ((A_i ⊕ t) <<< u) + S_{2i};
3. A_{r+2} = A_{r+1} + S_{2r+2}, B_{r+2} = B_{r+1}, C_{r+2} = C_{r+1} + S_{2r+3}, D_{r+2} = D_{r+1}.

Steps 1 and 3 are the “whitening” steps. The notation RC6-w/r/b means that the cipher's input is 4 w-bit words, the number of rounds is r, and the key length is b bits. We investigate RC6-32/r/128 using different values of r. Let us describe a method of statistical analysis of the cipher's output sequences.

1. The following sequence is used as the cipher's input:

   (0, 0, 0, 0), (1, 0, 0, 0), ..., (2^32 − 1, 0, 0, 0), (0, 0, 1, 0), (1, 0, 1, 0), ...   (2)

2. The x^2 statistic for lsb_5(A_{r+2})||lsb_5(C_{r+2}) of each cipher block (1023 degrees of freedom) is calculated, where r is the number of rounds.

We tested 1000 random keys for 4 and 5 rounds. The results are presented in Table 3. To make the results directly comparable with those of other works, the sample size is given in cipher blocks (128 bits each). It can be observed that the x^2 statistic for sample size 2^12 exceeds the Q0.99 quantile in 963 cases out of 1000, though the average number of exceedances must be equal to 10. It means that the 4th-round cipher output is non-uniform, which can be observed at a sample size of 2^12 for more than 98% of the keys. If the x^2 value for one sequence is greater than that for another one, we say that the former sequence is “less random”.
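The round recurrence above can be sketched directly in Python; this is a minimal implementation of the encryption steps as described (the key schedule deriving S_0, ..., S_{2r+3} from the user key is omitted, the subkey array S is assumed to be given):

```python
MASK = 0xFFFFFFFF  # all arithmetic is modulo 2^32


def rotl(a, b):
    """Cyclic left rotation of a 32-bit word; only lsb_5(b) matters."""
    b &= 31
    return ((a << b) | (a >> (32 - b))) & MASK


def F(x):
    """F(x) = f(x) <<< 5 with f(x) = x(2x + 1)."""
    return rotl((x * (2 * x + 1)) & MASK, 5)


def rc6_encrypt(block, S, r):
    """Encrypt one block of four 32-bit words with subkeys S[0..2r+3]."""
    A, B, C, D = block
    B = (B + S[0]) & MASK            # step 1: pre-whitening
    D = (D + S[1]) & MASK
    for i in range(1, r + 1):        # step 2: r rounds
        t, u = F(B), F(D)
        A, B, C, D = (B,
                      (rotl(C ^ u, t) + S[2 * i + 1]) & MASK,
                      D,
                      (rotl(A ^ t, u) + S[2 * i]) & MASK)
    A = (A + S[2 * r + 2]) & MASK    # step 3: post-whitening
    C = (C + S[2 * r + 3]) & MASK
    return (A, B, C, D)
```

The simultaneous tuple assignment mirrors the recurrence A_{i+1} = B_i, B_{i+1} = ((C_i ⊕ u) <<< t) + S_{2i+1}, C_{i+1} = D_i, D_{i+1} = ((A_i ⊕ t) <<< u) + S_{2i} exactly, since the right-hand side uses the old round state.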
Table 3 shows that a greater number of rounds results in greater randomness of the encrypted sequence (2). This test forms the basis for the key-recovery attack described in the next section.
Table 3. Testing RC6

r  Sample size  Average x^2  x^2 > Q0.95  x^2 > Q0.99  x^2 > Q0.999
   (in blocks)
4  2^10          1342.8       730          608          488
4  2^11          1604.5       908          831          713
4  2^12          2068.8       986          963          930
5  2^24          3716.3       692          624          559
5  2^25          5744.3       801          711          635
5  2^26          9364.0       921          862          789
Table 4. Testing 6-round RC6

Sample size (in blocks)                   2^38  2^39  2^40  2^41  2^42
Number of non-random outputs (out of 5)     1     1     1     2     5
If we operate with samples of considerably greater size, we manage to mount the distinguishing attack for 6-round RC6. To date, this is the best practical result for this cipher. A more powerful test, namely the Book Stack test, was also used. Again, the sequence (2) was fed to the cipher's input. From each block of the output sequence, a word built from the low-order bytes of RC6's A and C registers was extracted for the Book Stack test (lsb_8(A_{r+2})||lsb_8(C_{r+2}), where r = 6). The size of the upper part for the test was chosen as u = 2^15 + 2^14 + 2^13 − 1, and the level of significance α = 0.05. The test was applied to 5 of the cipher's output sequences obtained from 5 randomly selected keys. The results are presented in Table 4. We can see that at a sample size of 2^42 the cipher's output was distinguished from a truly random sequence for all selected keys with the aid of the Book Stack test.
5 Key-Recovery Attack on RC6
Now we are ready to describe the algorithm of the key-recovery attack. Note that it is a Type 3 attack, i.e. a “chosen text” attack. The main idea is close to the attacks proposed in [12], [10], [11], where a difference between sequences encrypted with r and r + 1 rounds is used. In our method, the sequence (2) fed to the cipher input is far from random, which allows us to distinguish an encrypted sequence from a random one using much smaller sample sizes than the methods in [12], [10], [11]. This idea, under the name “gradient statistical attack”, was first suggested in [13]. Also, the key search algorithm is changed, so it becomes possible to find a secret key in t × 2^32 time (where t is the length of the chosen plaintext), instead of t × 2^54, which considerably decreases the time complexity. First we describe an algorithm which will help to understand the main idea. Note that for known S0 and S1 in RC6 it is possible to select a plaintext such that the ciphertext will be as random as the ciphertext on the (r − 1)th round
of encryption of sequence (2). Actually, let (an, bn, cn, dn) be a sequence of type (2). Let us show that for known S0 and S1 we can select a plaintext such that A2 = an, B2 = bn + S3, C2 = cn, D2 = dn + S2. Indeed, setting D0 = cn − S1, B0 = an − S0, A0 = (dn >>> u) ⊕ t, C0 = (bn >>> t) ⊕ u, where u = F(cn), t = F(an), we have what is required. Since the input to the 2nd round is the sequence (an, bn + S3, cn, dn + S2), the output is the sequence (an, bn, cn, dn) encrypted with r − 1 rounds. Now we describe the full version of the attack with time complexity O(2^64):
1. For each pair S0 and S1, calculate the sequence described above and feed it to the cipher's input.
2. For every sequence, calculate the x^2 statistic using lsb_5(A_{r+2})||lsb_5(C_{r+2}).
3. Search for the pair S0 and S1 with the maximal x^2 value. Assume those found are the S0 and S1 we looked for.

Let us explain the main idea. We use a sequence which depends on S0 and S1 as the cipher's input. If the subkeys are chosen correctly, then we have the sequence (2) as the input for the 2nd round, and therefore it is possible to distinguish the output sequence from a random one according to Table 3. If the subkeys are incorrect, then the randomness of the output sequence will be greater. Indeed, the testing results are given in Table 5. The results are given for 5 different keys for both the 5th and 6th rounds of RC6. For each correct key we tested 1000 incorrect random keys and calculated the x^2 statistics for the correct key and for each of the incorrect keys; the average x^2 value over the incorrect keys was also calculated. From Table 5 we can see that the x^2 value for the correct key is much greater than the maximum x^2 over the random incorrect keys, though it sometimes has the same order of magnitude. As seen from the experiments, the difference becomes greater if we slightly increase the sample size.

Table 5. Testing RC6 for key recovering I

Key no.  r  Sample size  x^2 for correct  max x^2 of 2^20  avg x^2 of 2^20
                         key (S0|S1)      incorr. keys     incorr. keys
1        5  2^12          1344.5           1162.5           1023.5
2        5  2^12          1869.0           1179.5           1021.5
3        5  2^12          2943.0           1164.0           1023.3
4        5  2^12          2457.5           1211.5           1021.8
5        5  2^12          1196.0           1182.5           1024.7
6        6  2^26         17054.2           1187.7           1024.1
7        6  2^26          2415.8           1183.6           1024.2
8        6  2^26          2670.8           1163.4           1020.6
9        6  2^26         17510.4           1166.5           1021.9
10       6  2^26          2873.0           1170.6           1023.6

The effectiveness of the attack can be improved if we choose S0 and S1 separately. In this case the time complexity becomes O(2^32):
1. For each S0, calculate a sequence with D0 = cn, B0 = an − S0, A0 = bn ⊕ t, C0 = dn, where t = F(an), and feed it to the cipher's input.
2. For each sequence, calculate the x^2 statistic using lsb_5(A_{r+2})||lsb_5(C_{r+2}).
3. Search for the S0 for which x^2 takes its maximum. Assume it is the S0 we are looking for.

For each of 10 keys we tested 2^20 random incorrect keys; the results are given in Table 6. For example, in the 2nd experiment the x^2 value is 1996.9 for the correct key, but only 1283.1 for the maximal x^2 over the 2^20 incorrect keys; thus in this case S0 is found correctly.

Table 6. Testing RC6 for key recovering II

Key no.  r  Sample size  x^2 for correct  max x^2 of 2^20  avg x^2 of 2^20
                         key (S0|S1)      incorr. keys     incorr. keys
1        5  2^14          2360.4           1257.9           1022.95
2        5  2^14          1996.9           1283.1           1022.96
3        5  2^14          1774.0           1264.5           1023.01
4        5  2^14          1961.4           1245.8           1023.01
5        5  2^14          1728.3           1252.6           1023.00
6        5  2^14          2994.9           1250.9           1023.02
7        5  2^14          1381.1           1259.6           1022.97
8        5  2^14          1193.4           1262.9           1022.94
9        5  2^14          1544.5           1269.9           1023.10
10       5  2^14          1475.6           1289.8           1023.00

Searching for the S1 subkey is similar. To find the correct S0 and S1, we recommend choosing the 2^15 “most suspicious” subkeys with the maximal x^2 statistic value and checking pairs of those “most suspicious” subkeys according to the scheme described as the first version of the attack. Obviously, the time complexity of this additional step is equal to t × 2^30, which insignificantly increases the overall complexity but surely increases the effectiveness of the attack. From Table 6 we can see that the correct S0 is the “most suspicious” key in all experiments (except the 8th). Note that the overall time complexity for the 5th round is 2^46, i.e. we need a sample size of 2^14 for each of 2^32 keys. The proposed attack allows recovering the secret key of RC6 much faster than other known attacks. We point out that the attack is applicable to the 6th round and its time complexity is considerably less than that of known attacks. For the 5th round, the complexity of the attack described in [12] would be 2^101 (the best previous result), whereas the new attack finds the secret key for the same round using 2^46 operations, which makes the new attack promising.
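The chosen-plaintext construction for known (or guessed) S0 and S1 described above can be sketched as follows; rotl, rotr and F restate the RC6 primitives from Sect. 4, and the function name chosen_plaintext is ours:

```python
MASK = 0xFFFFFFFF


def rotl(a, b):
    b &= 31
    return ((a << b) | (a >> (32 - b))) & MASK


def rotr(a, b):
    b &= 31
    return ((a >> b) | (a << (32 - b))) & MASK


def F(x):
    return rotl((x * (2 * x + 1)) & MASK, 5)


def chosen_plaintext(an, bn, cn, dn, S0, S1):
    """Plaintext (A0, B0, C0, D0) such that after pre-whitening and one
    RC6 round the state equals (an, bn + S3, cn, dn + S2)."""
    t, u = F(an), F(cn)
    A0 = rotr(dn, u) ^ t
    B0 = (an - S0) & MASK
    C0 = rotr(bn, t) ^ u
    D0 = (cn - S1) & MASK
    return (A0, B0, C0, D0)
```

After the pre-whitening, B1 = an and D1 = cn, so the round uses t = F(an) and u = F(cn) and the XOR/rotation pairs cancel, which gives exactly the state claimed in the text.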
References

1. Rukhin, A., et al.: A statistical test suite for random and pseudorandom number generators for cryptographic applications. NIST Special Publication 800-22 (rev. May 15, 2001)
2. Ryabko, B., Pestunov, A.: Probl. Inform. Transm. 40(1), 66–71 (2004)
3. Ryabko, B., Fionov, A.: Basics of Contemporary Cryptography for IT Practitioners. World Scientific Publishing Co., Singapore (2006)
4. Dawson, E., Gustafson, H., Henricksen, M., Millan, B.: Evaluation of RC4 stream cipher (2002), http://www.ipa.go.jp/security/enc/CRYPTREC/fy15/
5. Golic, J.D.: Iterative probabilistic cryptanalysis of RC4 keystream generator. In: Australasian Conf. on Information Security and Privacy (ACISP), pp. 220–233 (2000)
6. Fluhrer, S., McGrew, D.: Statistical analysis of the alleged RC4 keystream generator. In: Schneier, B. (ed.) FSE 2000. LNCS, vol. 1978. Springer, Heidelberg (2001)
7. Pudovkina, M.: Statistical weaknesses in the alleged RC4 keystream generator. Cryptology ePrint Archive (2002), http://eprint.iacr.org/2002/171
8. Crowley, P.: Small bias in RC4 experimentally verified (2003), http://www.ciphergoth.org/crypto/rc4/
9. Gressel, C., Granot, R., Vago, G.: ZK-Crypt. eSTREAM, ECRYPT Stream Cipher Project (2005), http://www.ecrypt.eu.org/stream/zkcrypt.html
10. Knudsen, L., Meier, W.: Correlations in RC6 with a reduced number of rounds. In: Schneier, B. (ed.) FSE 2000. LNCS, vol. 1978, pp. 94–108. Springer, Heidelberg (2001)
11. Miyaji, A., Nonaka, M.: Evaluation of the security of RC6 against the χ2-attack. IEICE Trans. Fundamentals E88-A(1) (2005)
12. Isogai, N., Matsunaka, T., Miyaji, A.: Optimized χ2-attack against RC6. In: Zhou, J., Yung, M., Han, Y. (eds.) ACNS 2003. LNCS, vol. 2846, pp. 199–211. Springer, Heidelberg (2003)
13. Ryabko, B., Monarev, V., Shokin, Yu.: Probl. Inform. Transm. 41(4), 385–394 (2005)
On Performance and Accuracy of Lattice Boltzmann Approaches for Single Phase Flow in Porous Media: A Toy Became an Accepted Tool — How to Maintain Its Features Despite More and More Complex (Physical) Models and Changing Trends in High Performance Computing!?

T. Zeiser¹, J. Götz², and M. Stürmer²

¹ University of Erlangen-Nuremberg, Regional Computing Center Erlangen, Martensstraße 1, 91058 Erlangen, Germany
[email protected] 2 University of Erlangen-Nuremberg, Chair for System Simulation, Cauerstraße 6, 91058 Erlangen, Germany {jan.goetz,markus.stuermer}@informatik.uni-erlangen.de
Abstract. Delivering high performance on contemporary high performance computing architectures becomes more and more challenging owing to changing trends in computer hardware but also due to the incorporation of more complex physical models. Results from low level benchmarks and flow simulations using lattice Boltzmann approaches are reported for advanced HPC systems (NEC SX-8 vector parallel computer), commodity HPC clusters (based on Intel Woodcrest CPUs and Infiniband interconnect) and special purpose hardware (IBM Cell processor).
1 Introduction
During the last two decades, lattice Boltzmann approaches quickly evolved from a toy of physicists [1] to a well accepted tool of engineers and scientists [2]. In the meantime, an exponentially growing number of groups investigates very diverse fields of computational fluid dynamics (CFD) with the help of lattice Boltzmann methods (LBM), including single or multiphase flow in porous media, basic turbulence research, free surface flow, nano-scale flow, blood flow, car aerodynamics, fluid-structure interaction, or aero-acoustics — in short: almost any type of "complex" flow or fluid. Arguments for the choice of LBM have since the very beginning been:

• Ease of implementation,
• Good (parallel) performance,

E. Krause et al. (Eds.): Comp. Science & High Perf. Computing III, NNFM 101, pp. 165–183, 2008. © Springer-Verlag Berlin Heidelberg 2008, springerlink.com
• Reasonable accuracy¹, and
• Its original roots in kinetic theory².

The extension from simple flow configurations to more complex flows or fluids, but also changing trends in computer hardware, continuously require algorithmic changes and different implementation strategies. Owing to the move from specially designed HPC hardware like vector-parallel systems to commodity clusters consisting of many nodes, nowadays with multi-core CPUs, and perhaps special-purpose hardware (again) in the future, data layout and placement, partitioning strategies, spatial and temporal blocking, or hybrid programming become more and more important. As some special-purpose hardware available today (e.g. the current incarnation of the IBM Cell processor or graphics processing units (GPUs) of high-end graphics cards) can only handle single precision floating point data, it is also a question how well LBM can cope with that. This report gives an insight into our ongoing research, sometimes including preliminary results. It is restricted to single phase flow in simple and complex 3-D geometries. The remainder of the paper is organized as follows: First, in Sec. 2, the architectural specifications of the systems used in our survey are summarized. Section 3 gives a brief introduction to the basic lattice Boltzmann method and discusses some implementation aspects. A large number of testcases, starting with low-level benchmarks and moving on to diverse lattice Boltzmann flow simulations, are presented and analyzed in Sec. 4. Finally, Sec. 5 closes the paper with a summary, conclusions and an outlook on next steps.
2 Architectural Specifications of the Investigated Systems

2.1 NEC SX-8
From a programmer's view, a NEC SX-8 CPU is a traditional vector processor with 4-track vector pipes running at 2.0 GHz. One multiply and one add instruction per cycle can be sustained by the arithmetic pipes, delivering a theoretical peak performance of 16 GFlop/s. The memory bandwidth of 64 GByte/s allows for one load or store of double precision floating point data ("word") per multiply-add instruction, providing a balance of 0.5 Word/Flop. The processor has 64 vector registers, each holding 256 64-bit words. An SMP node comprises eight processors and provides a theoretical total memory bandwidth of 512 GByte/s, i.e. the aggregated single-processor bandwidths can be sustained. The NEC SX-8 nodes are networked by a proprietary interconnect called IXS, providing a bidirectional bandwidth of 16 GByte/s and a latency of about 5 microseconds.
¹ 2nd order in space if boundary conditions are handled appropriately, and 1st order in the hydrodynamic time scale.
² Although the diverse physical and numerical discretizations leading to the simple explicit LB equations introduce significant restrictions.
The benchmark results presented in this paper were measured on the NEC SX-8 system at the High Performance Computing Center Stuttgart (HLRS), Germany.

2.2 Commodity Cluster with Intel Woodcrest CPUs and Infiniband Interconnect
Commodity clusters are available in many different flavors. Our benchmarks have been carried out on the Woody cluster at RRZE, which consists of 2-socket nodes (HP DL140G3) with dual-core 64-bit enabled Intel Xeon 5160 CPUs (codenamed "Woodcrest", with private L1 caches and a 4 MB shared L2 cache), 8 GB of main memory (8 double-ranked 1 GB FBDIMMs) and a high speed Infiniband interconnect (DDR-4x). These Woodcrest CPUs are based on the new Intel Core design, which is capable of performing a maximum of four double precision floating point operations (two multiply and two add) per cycle. Running at a clock frequency of 3.0 GHz, each core thus has a peak performance of 12 GFlop/s. The aggregated main memory bandwidth of an n-way Intel Xeon node (in contrast to multi-way AMD Opteron nodes) usually does not scale with the number of CPUs in the node, as all Xeon CPUs of the node have to share the front side bus (FSB) and north bridge; in the 2-socket Woodcrest nodes, however, two front side buses are available for the 4 cores. Running with FSB1333, the theoretical memory bandwidth of the nodes is 21.3 GByte/s. The STREAM triad [3] actually shows only a sustained aggregated memory bandwidth of approximately 6.4 GByte/s (with the snoop filter of the Intel 5000X Greencreek chipset switched on in the HP DL140G3). The Infiniband cards are placed in PCIe-8x slots, resulting in an actual bidirectional bandwidth of up to 1500 MB/s per direction (owing to the DDR mode of the Voltaire ISR9288 Infiniband switch, this rate is only seen if the two nodes are connected to the same line card of the Infiniband switch; otherwise only 900 MB/s (SDR speed) are possible) and a latency of less than 4 microseconds. Unless otherwise specified, the Intel Fortran compiler for EM64T systems in version 9.1.039 and Intel MPI in version 3.0.043 were used, together with the Voltaire GridStack 4.1.5, which is based on OFED-1.1, and SuSE SLES9SP3 as the operating system.

2.3 IBM Cell Processor and PlayStation 3
In contrast to other designs, which have multiple cores of the same type on a single die, Sony, Toshiba and IBM (STI) introduced the Cell Broadband Engine Architecture (CBEA). The current Cell implementation, depicted in Fig. 1, offers a revolutionary design combining one Power processor element (PPE) and eight simple SIMD cores, so-called synergistic processing elements (SPEs). In addition to high raw performance, the chip is highly power efficient due to a low operating voltage and advanced power management. Besides being the heart of Sony's PlayStation 3 gaming system, it is built into servers and accelerator boards.
Fig. 1. Schematics of the IBM Cell processor
The PPE is a PowerPC-compliant, 64-bit RISC general purpose processor. Providing 32 KB L1 data, 32 KB L1 instruction and 512 KB L2 cache, it supports symmetric multi-threading (SMT) but executes only in-order. It is primarily responsible for running the operating system and for program control. The SPEs are optimized for efficient data processing. Each consists of the following three units: The synergistic processor unit (SPU) is a simple but fast SIMD-only co-processor. Its 256 KB local store (LS) is a fast cache-like memory that holds the SPU's instruction code and the data it processes. To transfer data between main memory and local store, DMA commands have to be issued explicitly to the associated memory flow controller (MFC). The MFC can process multiple DMA transfers concurrently; SPE programs can check or wait for their completion. Each SPE reaches 25.6 GFlop/s using fused multiply-adds in single precision at 3.2 GHz, i.e. 204.8 GFlop/s for 8 SPEs in total. The memory interface controller (MIC) supports Rambus extreme data rate memory (XDR), which can deliver up to 25.6 GB/s theoretical bandwidth. The Broadband Engine Interface (BEI) uses Rambus Flex I/O technology to support I/O devices or to create a connection to another Cell processor, forming a powerful ccNUMA system in e.g. IBM's QS20 servers. All components are connected by the element interconnect bus (EIB), a ring bus which provides a sustained maximum bandwidth of up to 204.8 GB/s. The current Cell processor is optimized for single precision floating point instructions; double precision (DP) stalls the pipeline for 6 cycles. However, the next generation, which will be built in 65 nm SOI technology, will support DP fully pipelined. It should also be noted that only six SPEs are available for Linux running as a guest system on the Sony PlayStation 3, reducing the maximum performance accordingly.

2.4 FPGA and GPU
For a couple of years, the interest of the HPC community in FPGAs (field programmable gate arrays) and GPUs has been growing. FPGAs have the big advantage of
low power consumption (often <25 W). However, programming them requires significant effort unless only vendor supplied libraries (e.g. for BLAS or FFT routines) are used. Moreover, general purpose compute nodes with the latest generations of Intel CPUs or the coming next generation Opteron can almost parry the peak performance of most FPGA cards (e.g. 85 GFlop/s aggregated peak performance in a 2-socket quad-core Intel Clovertown node running at 2.66 GHz vs. >66 GFlop/s sustained double precision DGEMM performance of a ClearSpeed Advance X620/e620 accelerator card according to the vendor's documentation). LBM calculations are characterized by a usually large memory footprint and a high number of memory operations compared to the number of floating point calculations. Therefore, current FPGAs are of rather low interest, as they only have limited on-board memory and as the bandwidth of the PCI-X or PCIe bus is even more limited than the path to main memory. Graphics cards suffer from similar problems. However, the latest generation of NVIDIA's GeForce cards (G80 series) and its HPC incarnation (Tesla) offer an easy to use C API, removing the need to learn a new hardware-oriented language or to use inappropriate graphics APIs (e.g. to misuse OpenGL calls for the purpose of transferring computational data). Moreover, even high-end graphics cards are consumer (gamer) products and thus available at a comparably low price tag. On the other hand, graphics cards consume considerable electric power and currently can only do single precision floating point operations. At the moment it looks like FPGAs and GPUs only have a chance if the total simulation domain and the complete algorithm can be shifted to the accelerator card, i.e. if only the initial and final data have to be transferred over the bus, or if at least the computational kernel requires many more floating point operations than the BGK model described in Sec. 3.
3 Computational Method and Implementation Aspects

3.1 Basics of the Lattice Boltzmann Method
The lattice Boltzmann method (LBM) [2,4] is a recent method from computational fluid dynamics (CFD) which has its roots in a highly simplified gas-kinetic description, i.e. a velocity-discrete Boltzmann equation with an appropriate collision term. When properly applied, the results of LBM simulations satisfy the Navier-Stokes equations in the macroscopic limit with second order of accuracy [2,4]. The simplest form is the lattice Boltzmann equation with BGK collision operator [5], which reads for the 3-D model with 19 discrete velocities (D3Q19 model) as follows if external forces are neglected:

  f_i(x + e_i Δt, t + Δt) = f_i^coll(x, t) = f_i(x, t) − (1/τ) [f_i(x, t) − f_i^eq(ρ(x, t), u(x, t))],   i = 0 … 18,   (1)

with

  f_i^eq(ρ(x, t), u(x, t)) = ρ(x, t) w_i [1 + (3/c^2) e_i · u(x, t) + (9/(2c^4)) (e_i · u(x, t))^2 − (3/(2c^2)) u(x, t) · u(x, t)],   (2)
which describes the evolution of the single particle distribution function f_i. f_i^coll denotes the "intermediate" state after collision but before propagation. The macroscopic quantities, density ρ and velocity u, are obtained as 0th and 1st order moments of f_i with respect to the discrete lattice velocities e_i, i.e.

  ρ(x, t) = Σ_{i=0}^{18} f_i(x, t)   and   ρ(x, t) u(x, t) = Σ_{i=0}^{18} e_i f_i(x, t).

The discrete equilibrium f_i^eq as given by Eq. 2 is a Taylor-expanded version of the Maxwell-Boltzmann equilibrium distribution function [2,6]. The w_i are direction-dependent constants [2], and c = Δx/Δt with the lattice spacing Δx and lattice time step Δt. The pressure p is obtained locally via the equation of state of an ideal gas, p(x, t) = c_s^2 ρ(x, t), using the speed of sound c_s. The kinematic viscosity of the fluid is determined by the dimensionless collision frequency 1/τ according to ν = (1/6)(2τ − 1) Δx c, with τ > 0.5 owing to stability reasons [2,4,6]. Alternative collision models, like the two-relaxation-time (TRT) [7] or multi-relaxation-time (MRT) models [8], may replace the single-relaxation-time BGK operator, providing additional adjustable parameters and thus usually improved stability while preserving the benefits of the explicit lattice Boltzmann equation. In order to reduce the weak compressibility imposed by Eq. 2, the pressure p (and accordingly the density ρ) may be split into the constant contribution p_0 and a small deviation δp, resulting in a slightly modified equilibrium distribution function [9],

  f_i^eq(ρ(x, t), u(x, t)) = w_i [ρ_0 + δρ(x, t) + ρ_0 ((3/c^2) e_i · u(x, t) + (9/(2c^4)) (e_i · u(x, t))^2 − (3/(2c^2)) u(x, t) · u(x, t))],   (3)

with

  ρ(x, t) = ρ_0 + δρ(x, t) = Σ_{i=0}^{18} f_i(x, t)   and   ρ_0 u(x, t) = Σ_{i=0}^{18} e_i f_i(x, t).
Using ρ_0 = 1, no divisions are required any longer when calculating the local velocity for each cell update. If the modified equilibrium distribution function is now also shifted by −w_i ρ_0, we can easily transform numbers of order O(w_i ρ_0) = O(1) into small variations around zero (Eq. 4). The advantage of the latter is that certain operations are then carried out on numbers with the same order of magnitude as their result, which should, in particular in the case of single precision, improve the numerical accuracy, i.e. no loss of digits occurs when subtracting slightly varying numbers of the same magnitude.

  f_i^eq(δρ(x, t), u(x, t)) = w_i [0 + δρ(x, t) + (3/c^2) e_i · u(x, t) + (9/(2c^4)) (e_i · u(x, t))^2 − (3/(2c^2)) u(x, t) · u(x, t)],   (4)

with

  δρ(x, t) = Σ_{i=0}^{18} f_i(x, t)   and   u(x, t) = Σ_{i=0}^{18} e_i f_i(x, t).
Lattice Boltzmann methods with an explicit lattice Boltzmann equation as outlined above are used on equidistant Cartesian meshes (if necessary with local mesh refinement [10], again using equidistant Cartesian cells). A marker-and-cell (MAC) approach is used to distinguish between fluid and solid regions. In the simplest case, solid wall boundary conditions are realized by the bounce-back rule [2,4], i.e. distributions hitting the wall, which is assumed to be located half-way between the fluid and solid cell, return to their original cell but with inverted momentum, f_ī(x, t + Δt) = f_i^coll(x, t) with e_ī = −e_i and f_i^coll(x, t) being the right-hand side of Eq. 1. Information about the geometric pore-scale structure can, for example, be taken directly from segmented X-ray micro-computed tomography images [11,12] or magnetic resonance imaging (MRI) data sets [13]. If the staircase approximation of the geometry is not sufficient, 2nd order geometric boundary conditions [10,14] can be applied, which resemble cut-cell techniques and inter- or extrapolate the required distribution functions using data from (over-next) neighbor cells. A general form of the corresponding update rule using common linear or quadratic schemes is [15]

  f_ī(x, t + Δt) = κ_1 f_i^coll(x, t) + κ_0 f_i^coll(x − e_i, t) + κ_{−1} f_i^coll(x − 2e_i, t) + κ̄_{−1} f_ī^coll(x, t) + κ̄_{−2} f_ī^coll(x − e_i, t),   (5)

where the different κ values depend on the actual position of the wall and the interpolation scheme used [10,14,15].

3.2 Implementation Aspects
The collision described by the lattice Boltzmann equation (right-hand side of Eq. 1) is a purely local operation and involves arithmetic operations, whereas the propagation (left-hand side of Eq. 1) only exchanges data with all direct neighbors of a cell. A usual way to work around the data dependencies resulting from the propagation step is the use of two arrays, one for the current and one for the next time step, and toggling between them. To reduce the memory traffic, it is important that collision and propagation are executed in a single loop and not independently of each other in separate loops or routines [16]. The D3Q19 lattice with either BGK or TRT collision operator requires
about 180–200 floating point operations per cell and time step as well as reading 19 floating point values and writing to 19 different memory locations [16]. Assuming double precision floating point data and a cache-based architecture with write-allocate strategy, this results in a balance of about 2.2–2.5 bytes/flop (19 loads, 19 stores and 19 write-allocate transfers of 8 bytes each, i.e. 456 bytes per 180–200 flops), which cannot be sustained by most hardware. Therefore, the layout (i.e. the order of the different indexes) of the multi-dimensional arrays should be chosen in such a way that cache lines are used efficiently before they get replaced again. In most cases, a propagation optimized layout ("structure-of-arrays") is preferable, with all entries of a single direction i being consecutive in memory [16].

List-Based Approach

Many lattice Boltzmann implementations use full arrays and a flag field to represent the data and the geometry. If only few cells are blocked out, this seems to be an optimal approach, as the three nested spatial loops can be fused into just one, thus reducing loop overheads and ensuring large loop counts; moreover, the indexes of neighbor cells can be obtained directly via simple index shift arithmetics. However, if highly complex porous media, in some cases with low porosities (e.g. packed bed reactors, vertebral bone or the vascular system), are a main target, other implementation strategies become favorable, e.g. only patches of full arrays or a true sparse representation which includes only the fluid cells. The patch approach resembles domain decomposition including only boxes which contain at least some fluid cells; it requires clever "communication" between patches, e.g. via halo cells, but preserves the local order of cells. The sparse representation, on the other hand, sticks with the individual Cartesian fluid cells only and allows any order of them, as the connectivity has to be stored anyway: adjacent cells can no longer be obtained by simple index arithmetics.
Within an International Lattice Boltzmann Development Consortium, the latter approach has been chosen. The resulting data structure (cf. Fig. 2) is one 1-D list containing the density distribution values of the M fluid cells (for the current and next time step) as well as a second 1-D list with information about the adjacency of the cells used during propagation. For the present investigations on the PC cluster and the NEC SX-8, this 1-D list-based implementation with the propagation optimized structure-of-arrays data layout [16], i.e. f(0:18, 1:M, 0:1), and a combined collide-propagate algorithm is used unless otherwise noted. Stride-one access is always ensured for the adjacency list.
Fig. 2. Data structures of the sparse 1-D LBM solver
LBM Implementation for IBM Cell

An efficient implementation for the IBM Cell processor has to respect the strengths and restrictions of the Cell Broadband Engine architecture. Most computation should take place on the SPEs, as they have more computational power and better bandwidth to main memory. Optimal bandwidth is achieved if multiple blocks of 128 bytes, corresponding to cache lines of the PPU, are transferred per DMA. Furthermore, computations should be done in SIMD, as scalar operations are not available and must be emulated. As SPUs have long penalties for branch misses and only support static prediction in software, branches should be avoided wherever possible, too. Finally, data structures must support parallel processing in a ccNUMA environment, i.e. for 16 SPEs and two distinct memory buses on the QS20 blades. The LBM implementation for the Cell processor divides the domain into boxes of 8×8×8 lattice cells, and only boxes containing fluid are actually allocated and processed ("patches of full arrays"). This approach combines the good performance of structured codes with the ability to process irregular structures effectively. The data to update a patch fits easily into the 256 KB local store, and every line of 8 lattice cells contains two SIMD vectors of four single precision floating point values for every distribution function. Data that needs to be exchanged between patches (30 planes and 12 lines of distribution functions) must also be collected in the LS before an SPE can process a patch. At the end of the previous time step, it is already reordered in the LS and stored to dedicated memory locations belonging to each patch. Solid lattice cells handled by bounce-back can occur anywhere; their location is described by a bit field stored with each box. Streaming is done by loading a SIMD vector containing the distribution functions likely to be streamed as well as one containing those that might be reflected.
By using a special select operation provided by the SPU, the correct values are chosen according to the bits of the bounce-back information. The performance does not depend on the structure inside a patch and is further enhanced by register blocking. As SIMD operations do not throw exceptions, the collision can calculate fake values for the solid lattice cells, which keeps the implementation trivial. The SPEs negotiate who is processing which patch by counting by means of atomic increments. Running on a ccNUMA system, an equal portion of the patches is statically allocated on each node, and the SPEs on that node process the patches in the local memory only. Inter-node traffic only occurs to exchange data with neighboring patches allocated on another node and to synchronize between time steps.
4 Investigated Testcases and Low-Level Benchmarks

4.1 Low-Level Benchmarks
Low-level benchmarks are used to characterize computer systems. However, they can also provide an (upper) estimate of the expectable performance of real applications. For (parallel) LBM implementations, the memory bandwidth and the MPI
latency/bandwidth are of major importance. Thus, the STREAM benchmark [3] or the vector triad and the PingPong test from the Intel MPI benchmarks (IMB, formerly known as Pallas MPI benchmarks) can provide valuable hints.

Memory Bandwidth: Vector Triad

The vector triad is a very short kernel similar to the STREAM triad but involves three loads and one store, i.e. A(:) = B(:) + C(:) * D(:), and thus generates a higher pressure on the memory subsystem. The performance obtained with this loop heavily depends on details of the hardware (e.g. CPU, chipset, memory modules),

Table 1. Aggregated memory throughput using 4 MPI processes on 2-socket systems with dual-core Intel Woodcrest CPUs and 8x1GB FBDIMMs using the vector triad (64-bit double precision floating point numbers). The Intel Fortran compiler for EM64T in different releases was always used, and the maximum array size is known at compile time. The option -DCOMMON placed the four arrays in a common block, and -DNONTEMP manually added !DEC$ VECTOR NONTEMPORAL directives to bypass the L2 cache on the stores to A(:) for large loop lengths.

system                    compiler  options                          performance

HP DL140G3                9.1.039   -xW -DNONTEMP                    380 MFlop/s
snoop filter=on           9.1.039   -xW -DNONTEMP -DCOMMON           360 MFlop/s
2x Xeon 5160 (3 GHz)      9.1.039   -xW -DNONTEMP -DCOMMON -unroll0  390 MFlop/s
5000X chipset             9.1.039   -xW                              294 MFlop/s
                          9.1.039   -xW -DCOMMON                     304 MFlop/s
                          9.1.039   -xW -DCOMMON -unroll0            290 MFlop/s
                          10.0.025  -xW                              383 MFlop/s
                          10.0.025  -xW -DCOMMON                     359 MFlop/s
                          10.0.025  -xW -DCOMMON -unroll0            387 MFlop/s

HP DL140G3                9.1.039   -xW -DNONTEMP                    376 MFlop/s
snoop filter=off          9.1.039   -xW -DNONTEMP -DCOMMON           332 MFlop/s
2x Xeon 5160 (3 GHz)      9.1.039   -xW -DNONTEMP -DCOMMON -unroll0  387 MFlop/s
5000X chipset             9.1.039   -xW                              348 MFlop/s
                          9.1.039   -xW -DCOMMON                     348 MFlop/s
                          9.1.039   -xW -DCOMMON -unroll0            348 MFlop/s
                          10.0.025  -xW                              379 MFlop/s
                          10.0.025  -xW -DCOMMON                     329 MFlop/s
                          10.0.025  -xW -DCOMMON -unroll0            383 MFlop/s

Intel EA system           9.1.039   -xW -DNONTEMP                    386 MFlop/s
no snoop filter           9.1.039   -xW -DNONTEMP -DCOMMON           330 MFlop/s
2x Xeon 5150 (2.66 GHz)   9.1.039   -xW -DNONTEMP -DCOMMON -unroll0  391 MFlop/s
5000P chipset             9.1.039   -xW                              344 MFlop/s
                          9.1.039   -xW -DCOMMON                     344 MFlop/s
                          9.1.039   -xW -DCOMMON -unroll0            344 MFlop/s
                          10.0.025  -xW                              386 MFlop/s
                          10.0.025  -xW -DCOMMON                     329 MFlop/s
                          10.0.025  -xW -DCOMMON -unroll0            387 MFlop/s
Table 2. Aggregated memory throughput with the vector triad on the NEC SX-8 using different numbers of MPI processes within a single dedicated SMP node (64-bit double precision floating point numbers)

MPI processes    1      2      4      6      7      8
GFlop/s          3.95   7.88   15.6   20.3   20.1   20.1
GB/s             63.5   126    249    324    321    322
low-level system software (e.g. BIOS settings and memory refresh rate) and the compiler release or compiler options, as Tab. 1 demonstrates. Surprisingly, the snoop filter of the Intel 5000X "Greencreek" chipset does not give the expected performance boost on the HP DL140G3 system, whereas performance numbers provided by Intel and our own initial tests on other Greencreek-based systems show a significant gain of up to 415 MFlop/s [17]. Arithmetic and memory operations behave differently on the NEC vector system and an Intel Xeon-based node (cf. Fig. 3). The NEC SX-8 shows a rather long startup phase due to its large vector registers; however, its performance remains at a very high level for large data sets. The cache-based Intel Xeon CPU, on the other hand, operates very fast on small data sets which fit into the L1 or L2 cache. For in-cache operations, the Intel EA system is slightly slower than the HP DL140G3 system (see Fig. 3) owing to the lower clock rate of its CPUs. Starting with 6 MPI processes on a dedicated node, the sustained memory bandwidth of the NEC SX-8 saturates, as shown in Tab. 2. It is expected that the
Fig. 3. Performance of the vector triad using 64-bit double precision floating point numbers. On all systems with Intel Woodcrest CPUs the Intel 9.1.039 compiler with -xW -DNONTEMP -DCOMMON -unroll0 was used and the aggregated performance of 4 MPI processes on one node is shown.
Fig. 4. MPI PingPong latency and bandwidth between nodes
limit can be pushed a little bit by tweaking the vector length and/or the padding of the arrays [18]. Nevertheless, for large data sets which have to be loaded from main memory, the performance of a single NEC SX-8 CPU is about 10 times higher than that of a 2-socket node with Intel Woodcrest CPUs, or 50 times if 6–8 CPUs of the NEC SX-8 SMP node are used. The sustained main memory bandwidth of the vector triad can be used as an upper limit for the memory throughput of the LBM solver [16,19]. As the operations in the collision/propagation routine of the LBM algorithm are much more complicated, it is expected that the compiler will not do as many transformations and optimizations as for the simple vector triad, and thus the results should depend less on different minor compiler releases.

Network Latency and Bandwidth: MPI PingPong

Figure 4 shows the measured MPI PingPong latency and bandwidth between different nodes of our Infiniband cluster and the NEC SX-8, respectively. GBit Ethernet is not at all competitive. In the case of SDR Infiniband connections, the sustained bandwidth is close to the theoretical limit of 10 GBit/s. Going to DDR mode does not double the bandwidth, although the 8x PCIe slot (8x250 MB/s) should provide enough capacity. On the NEC SX-8, a single MPI process can saturate the IXS interconnect; however, very large messages are required to see the full bandwidth.

4.2 LBM Testcases
From a user’s perspective, the turn-around time is the only important performance metric. Therefore, we give all performance results of the LBM investigations in terms of million fluid lattice cell updates per second (MLUPs). Knowing the domain size and the desired or required number of time steps, the runtime
Fig. 5. Effect of data layout: performance of serial 1-D list-based code on an HP DL140G3 node with activated snoop filter and one NEC SX-8 CPU
can easily be derived. Of course, comparing the MLUPs rate is only fair if the same task is solved, i.e. the same physical model and with the same accuracy. For the D3Q19 model using the single-phase BGK or TRT collision operator, 5 MLUPs are approximately equivalent to 1 GFlop/s.

Effect of Data Layout

As demonstrated by Wellein et al. [16] for a simple LBM kernel based on full arrays, the data layout has a significant influence on the performance of cache-based systems. The same holds true for the 1-D list-based implementation. Figure 5 shows the performance of different implementation approaches as a function of the domain size. The structure-of-arrays layout clearly outperforms the array-of-structures layout, unless cache thrashing occurs for domain sizes which are multiples of 16 in the leading dimension. If the pressure on the memory subsystem, in particular on the write combine buffers, is reduced by splitting the collision/propagation routine into 3–5 parts based on the discrete velocity direction, another 20% of performance can be gained. In the case of the array-of-structures data layout, loop splitting is counterproductive. The NEC SX-8 is more or less insensitive to permutations of the index order of the multi-dimensional f array; thus, the structure-of-arrays and the array-of-structures layout give more or less the same performance. The 1-D list implementation, where the adjacency has to be stored anyway, allows exploiting spatial blocking during the preprocessing step. The main collision/propagation loop then traverses the domain in the specified order without explicitly knowing about it. Thus, different spatial blocking factors can easily be checked without the usual overhead of additional (small) loops. Figure 6 shows the sustained performance with 4 OpenMP threads on 2-socket Woodcrest systems using a real porous structure (i.e. a bone weakened by osteoporosis) of fixed domain size (1.2 mio. fluid cells).
In particular for the array-of-structures layout, spatial blocking with a moderate blocking factor is mandatory. The structure-of-arrays layout, on the other hand, is quite insensitive to blocking and
178
T. Zeiser, J. Götz, and M. Stürmer

[Figure 6 plots MLUPs (performance) over the blocking factor (0–100) for the data layouts and systems: aos, 3 GHz, Greencreek, Snoop Off/On; aos, 2.66 GHz, Blackford; soa_split5, 3 GHz, Greencreek, Snoop Off/On; soa_split5, 2.66 GHz, Blackford.]
Fig. 6. Effect of data layout and snoop filter for a complex porous medium (bone structure) using 4 OpenMP threads on 2-socket Woodcrest systems
outperforms the alternative data layout in all cases. It is interesting to note that in the case of the less optimal array-of-structures layout an active snoop filter improves the performance, whereas for the faster structure-of-arrays layout – at least on the HP DL140G3 system – the active snoop filter decreases the sustained performance.

Comparison of Full Array vs. 1-D List Approach

So far, we have shown many examples of the performance dependencies of the 1-D list implementation and demonstrated that the optimization strategies of a full array implementation apply in that case, too. It is also easily understood that the 1-D list-based approach is beneficial in terms of memory requirements if flow in complex geometries with a large number of solid (i.e. blocked) cells is simulated. However, as indirect addressing is required during the propagation step, it is not obvious how the sustained performance compares to the full array approach with its much more regular memory access patterns. Figure 7 shows the performance of the full array implementation for an empty channel geometry (250³ fluid cells) using 1–4 OpenMP threads on an HP DL140G3 system with activated snoop filter and compares it with the 1-D list-based implementation. In addition, the performance for a packed bed of spheres (8.5 million fluid cells) is given for the 1-D list-based implementation. Even for the simple plain channel case, the 1-D list-based approach – in particular if loop splitting is properly applied – competes quite well with the full array implementation, independent of the number of OpenMP threads. The complex geometry of the packed bed causes some performance penalty for the 1-D list-based code, but much less than a full array implementation would suffer (data not shown). As expected, the performance only increases significantly if additional paths to main memory are added. It should also be noted that loop splitting becomes less mandatory if the memory bus is saturated anyway.
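The indirect addressing of the list-based propagation step can be sketched as follows; the toy 1-D lattice and all names are illustrative assumptions, not taken from the actual code:

```python
# Minimal sketch of list-based propagation with indirect addressing.
# A periodic 1-D toy lattice with 3 directions stands in for D3Q19.

n_dirs = 3
offs = (-1, 0, +1)              # lattice offsets of the three directions
fluid = [0, 1, 2, 3]            # compact list of fluid cells only
n = len(fluid)

# pull_from[c][d]: index (into the fluid list) of the cell from which
# cell c receives the value travelling in direction d ("pull" scheme),
# i.e. the neighbour opposite to the travel direction.
pull_from = [[(c - offs[d]) % n for d in range(n_dirs)] for c in range(n)]

f_old = [[float(c * n_dirs + d) for d in range(n_dirs)] for c in range(n)]
f_new = [[0.0] * n_dirs for _ in range(n)]

# propagation: every fluid cell gathers its values through the
# precomputed adjacency list (this is the indirect addressing).
for c in range(n):
    for d in range(n_dirs):
        f_new[c][d] = f_old[pull_from[c][d]][d]
```

Because the adjacency list fixes the traversal order, any cell ordering chosen during preprocessing (e.g. spatially blocked) is followed automatically by this loop.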
On Performance and Accuracy of Lattice Boltzmann Approaches
179
Fig. 7. Comparison of the full array and 1-D list approach on an HP DL140G3 node with activated snoop filter
The performance difference between a full array and the 1-D list-based implementation depends not only on the testcase but also on the hardware used. As shown above, the Intel Woodcrest CPUs do quite well. The Intel Itanium2 with its in-order execution, on the other hand, suffers very much from the less obvious access patterns (at least unless efficient prefetch statements are added manually). Moving from the NEC SX-6+ to the NEC SX-8 also significantly increased the (relative) costs of the 1-D list-based approach [20], mainly because of the indirect addressing. It is expected that the NEC SX-8R will behave better again, but no verification could be done yet.

Parallel Scalability of a Cube with 250³ Fluid Cells

Domain decomposition is used for the MPI parallelization. Just cutting the 1-D list into equal chunks guarantees good load balance for the investigated single-phase flow cases, independent of the geometrical complexity. The communication volume is also kept at a low level if spatial locality was considered while building the 1-D list (e.g. by simple spatial blocking or more advanced space-filling curve ideas). Spatial blocking with a moderate blocking factor was applied for the present testcase before cutting the 1-D list into equal chunks. Figure 8 demonstrates the parallel scalability of a cube with a fixed size of 250³ fluid cells. On the Infiniband cluster, the performance per node remains more or less constant from 1 to 32 nodes, i.e. up to 128 MPI processes, although in the latter case one MPI process only has to process roughly 120k cells, which is equivalent to a working set of about 60 MByte for double precision floating point values. Thus, the overhead of the MPI communication obviously remains negligible. The performance per node mainly depends on the number of paths to main memory used, i.e. running
Fig. 8. Parallel scalability of a cube with 250³ fluid cells on the HP DL140G3 cluster with Infiniband DDRx interconnect
1 or 2 MPI processes on the same socket does not make much difference. A hybrid approach using 1 MPI process with 4 OpenMP threads per node was usually slightly worse than the pure MPI version with 4 MPI processes per node. The aggregated performance of 32 Woodcrest nodes with 128 MPI processes is 370 MLUPs; a single NEC SX-8 node, on the other hand, already delivers 380 MLUPs, although the latter is slightly less than 8 times the single processor performance seen in Fig. 5.

Single vs. Double Precision

Using the geometry of a packed bed of spheres, no notable difference in the measured pressure drop of creeping or low Reynolds number flow could be observed between single and double precision floating point calculations using Eq. 4. The original equilibrium equations, Eqs. 2 and 3, have not been investigated in the present study, but they may also be fine for that class of problems [21]. However, for cases with steeper gradients, it is expected that single precision calculations are less accurate and, moreover, less stable [22]. In terms of memory footprint, single precision floating point numbers reduce the amount of data for the distribution functions by a factor of two compared to double precision values. This not only reduces the amount of main memory required but also the required memory bandwidth. The latter effect can in particular be seen when calculations with double precision hit the bandwidth wall (cf. Tab. 3 for the complex medical geometry of Fig. 9).
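The equal-chunk cutting of the 1-D fluid-cell list described above can be sketched as follows; the helper name and signature are illustrative assumptions:

```python
# Sketch of the load balancing used for the MPI parallelization: cut
# the 1-D fluid-cell list into (nearly) equal contiguous chunks, one
# per MPI rank. The function name is illustrative only.

def chunk_bounds(n_cells, n_ranks):
    """Return (start, end) index pairs splitting n_cells into n_ranks
    contiguous chunks whose sizes differ by at most one cell."""
    base, rest = divmod(n_cells, n_ranks)
    bounds, start = [], 0
    for rank in range(n_ranks):
        size = base + (1 if rank < rest else 0)
        bounds.append((start, start + size))
        start += size
    return bounds
```

For the 250³ = 15 625 000 cells of the scalability test split across 128 ranks, every chunk holds roughly 122 000 cells, matching the per-process figure quoted above; the chunk sizes are independent of the geometric complexity, which is what makes the load balance automatic.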
Table 3. Performance using double and single precision floating point data for a medical application with 332k fluid cells

system                         case                                            double precision   single precision
HP DL140G3, snoop filter=on,   serial                                          6.31 MLUPs         6.83 MLUPs
2x Xeon 5160 (3 GHz),          1 node, 2 MPI processes, 2 cores on 1 socket    9.74 MLUPs         11.73 MLUPs
5000X chipset                  1 node, 2 MPI processes, 1 core of 2 sockets    10.88 MLUPs        17.39 MLUPs
                               1 node, 4 MPI processes, 2 cores on 2 sockets   12.20 MLUPs        21.05 MLUPs
IBM Cell blade                 1 processor with 8 SPUs                         —                  43.4 MLUPs
                               2 processors, each with 8 SPUs (3.2 GHz)        —                  88.0 MLUPs
Fig. 9. Real medical geometry with a bounding box of size 250x250x220 and about 332k fluid cells as used for the performance measurements presented in Tab. 3
Table 4. Comparison of standard C code to advanced Cell code for channel flow

system                                 standard C code   advanced Cell code
1 Intel Woodcrest CPU                  10.2 MLUPs        —
1 Cell processor using the PPE only    2.0 MLUPs         —
1 Cell processor using 1 SPU only      —                 38.5 MLUPs
1 Cell processor using all 8 SPUs      —                 98.1 MLUPs
Necessity of Specific Implementation for the IBM Cell Processor

Finally, Tab. 4 compares the performance obtained on the IBM Cell processor using a standard LBM implementation as used on normal PC-based systems
with the results of the adapted implementation specific to the SPUs of the IBM Cell processor as outlined in Sec. 3.2 and also used for the comparison in Tab. 3. It can clearly be seen that high performance is only obtained when an adapted implementation strategy is applied. Owing to bandwidth limitations, there is only limited scalability when going from 1 to 8 SPUs. Nevertheless, if the correct implementation is used, the IBM Cell processor delivers a performance for the present testcases which cannot be matched by a PC, at least if single precision data is sufficient.
5 Summary, Conclusions and Outlook
Changing trends in the available computer hardware and associated products require significant adaptation of implementation strategies all the time. High performance and good scalability can only be obtained if details of the architectures are taken into account and exploited in the code. The future will definitely be dominated by parallel systems. Various sorts of specialized hardware may be designed specifically for HPC purposes, but more likely for other high-volume consumer markets. All these components may be very attractive for HPC use, but the more specific a device gets, the more difficult the implementation becomes. For general purpose systems, MPI will probably remain the method of choice for parallelization. Besides computer science aspects, advances in the physical models (e.g. for LBM: alternative collision models or other boundary conditions) must not be lost sight of — even if the sustained flop rate may be considerably lower owing to additional arithmetic or memory operations. A good GFlop/s rate is of course important, but as real-world problems are to be solved, the overall physical behavior, i.e. the time to an accurate solution, is the most important aspect. Low-level benchmarks can always be used to define upper performance limits which then set the target of a specific implementation. They can also indicate directions for optimizations. Based on the results of low-level benchmarks one can, for example, easily check that it is usually much more important to optimize for load balance than to reduce the inter-node communication to a minimum.
Acknowledgments. Parts of this work are financially supported within the framework of the Competence Network for Technical, Scientific High Performance Computing in Bavaria (KONWIHR). The 1-D list based code is the result of joint work of the partners within the International Lattice Boltzmann Development Consortium, including NEC CCRLE/St. Augustin, HLRS/University of Stuttgart, iRMB/University of Braunschweig, CSP/University of Amsterdam, and RRZE/University of Erlangen-Nuremberg as active members. The IBM Cell implementation and measurements were carried out by MS using IBM QS20 blades of "JUICE" located at FZ Jülich.
References

1. Hardy, J., de Pazzis, O., Pomeau, Y.: Phys. Rev. A 13, 1949–1961 (1976)
2. Succi, S.: The lattice Boltzmann equation – for fluid dynamics and beyond. Clarendon Press, Oxford (2001)
3. McCalpin, J.D.: STREAM: Sustainable memory bandwidth in high performance computers (1991–2007), http://www.cs.virginia.edu/stream/
4. Chen, S., Doolen, G.D.: Annu. Rev. Fluid. Mech. 30, 329–364 (1998)
5. Bhatnagar, P., Gross, E.P., Krook, M.K.: Phys. Rev. 94, 511–525 (1954)
6. He, X., Luo, L.-S.: Phys. Rev. E 56, 6811–6817 (1997)
7. Ginzburg, I.: Adv. Wat. Res. 28, 1171–1195 (2005)
8. d'Humières, D., Ginzburg, I., Krafczyk, M., Lallemand, P., Luo, L.-S.: Phil. Trans. R. Soc. Lond. A 360, 437–452 (2002)
9. He, X., Luo, L.-S.: J. Stat. Phys. 88, 927–944 (1997)
10. Yu, D., Mei, R., Luo, L.-S., Shyy, W.: Progr. Aero. Sci. 39, 329–367 (2003)
11. Ferreol, B., Rothman, D.H.: Transp. Por. Med. 20, 3–20 (1995)
12. Zeiser, T., Bashoor-Zadeh, M., Darabi, A., Baroud, G.: J. Eng. Med. (submitted)
13. Zeiser, T.: Combination of detailed CFD simulations using the lattice Boltzmann method and experimental measurements using the NMR/MRI technique. In: Krause, E., Jäger, E., Resch, M. (eds.) High Performance Computing in Science and Engineering 2004, Transactions of the High Performance Computing Center, Stuttgart (HLRS) 2004. Springer, New York (2005)
14. Bouzidi, M., Firdaouss, M., Lallemand, P.: Phys. Fluids 13, 3452–3459 (2001)
15. Ginzburg, I., d'Humières, D.: Phys. Rev. E 68, 066614 (2003)
16. Wellein, G., Zeiser, T., Donath, S., Hager, G.: Comp. Fluids 35, 910–919 (2006)
17. Semin, A.: Private communication with Intel (2007)
18. Berger, H.: Private communication with NEC HPCE (2006)
19. Zeiser, T., Wellein, G., Hager, G., Nitsure, A., Iglberger, K., Rüde, U.: Progr. Comp. Fluid Dyn. (accepted)
20. Bernsdorf, J.: Internal report (2006)
21. Maier, R.S., Kroll, D.M., Kutsovsky, Y.E., Davis, H.T., Bernard, R.S.: Phys. Fluids 10, 60–74 (1998)
22. Lallemand, P.: Private communication (2007)
Parameter Partition Methods for Optimal Numerical Solution of Interval Linear Systems

S.P. Shary

Institute of Computational Technologies SB RAS, Lavrentiev Ave. 6, 630090 Novosibirsk, Russia
[email protected]
Abstract. The paper presents a new class of adaptive and sequentially guaranteeing PPS-methods, based on partitioning parameter sets, for computing optimal (exact) component-wise bounds of the solution sets to interval linear systems with square regular matrices.
1 Introduction
The subject of the present work is the problem of outer interval estimation of the solution set to an interval linear system

    Ax = b    (1)
with a regular (nonsingular) interval n × n-matrix A = ( a ij ) and an interval right-hand side n-vector b = ( b i ). The solution set of the interval linear system (1) is known to be defined as the set

    Ξ(A, b) := { x ∈ Rⁿ | (∃A ∈ A)(∃b ∈ b)( Ax = b ) },    (2)

formed by the solutions to all the point systems Ax = b with A ∈ A and b ∈ b. Ξ(A, b) is often referred to as the united solution set, insofar as there exists a variety of other solution sets to interval systems of equations (see e.g. [1]). We will not consider them in this paper, so that the united solution set will be the only one being studied; in what follows, we shall thereby call it just the solution set.

An exact description of the solution set is practically impossible for dimensions n larger than several tens, since its complexity grows exponentially with n. On the other hand, such an exact description is not really necessary in most cases. Users traditionally confine themselves to computing some estimates, in a prescribed sense, of the solution set, and below we are going to solve the following problem of outer (by supersets) interval estimation:

    For an interval system of linear equations Ax = b, find an interval enclosure of the solution set Ξ(A, b).    (3)
E. Krause et al. (Eds.): Comp. Science & High Perf. Computing III, NNFM 101, pp. 184–205, 2008. c Springer-Verlag Berlin Heidelberg 2008 springerlink.com
Parameter Partition Methods for Optimal Numerical Solution
185
Sometimes, a component-wise form of the problem (3) is considered:

    For an interval system of linear equations Ax = b, find estimates for min{ xν | x ∈ Ξ(A, b) } from below and for max{ xν | x ∈ Ξ(A, b) } from above, ν = 1, 2, . . . , n.    (4)
One can find an extensive (but by no means exhaustive) bibliography on the problem (3)–(4) e.g. in [2,3,4,5,6,7]. Practical needs often require that the solution to the interval problem should be not just any one, but optimal, i.e. the best in some sense. It is fairly simple to realize that the optimal solution to (3)–(4) is the interval hull of the solution set, that is, the least inclusive interval vector guaranteed to contain the solution set. The optimality requirement makes the problem statement (3)–(4) NP-hard in general, if we do not restrict the widths of the intervals in the system and/or the structure of the nonzero elements in the matrix A [8]. Still, in the present work we advance an efficient adaptive numerical technique — parameter partitioning methods, or PPS-methods — for computing such optimal outer estimates of the solution sets of interval linear systems with square regular matrices.

Our notation follows, in its major lines, the informal international standard [9]. In particular, we designate intervals and interval objects by boldface letters, while underbars and overbars mean the lower and upper endpoints of the corresponding intervals. The set of all intervals is denoted by IR, and we identify real numbers with zero-width degenerate intervals.
2 Parameter Partition Method for Interval Linear Systems
In the rest of the paper, we concentrate on computing min{ xν | x ∈ Ξ(A, b) } for a fixed integer index ν ∈ { 1, 2, . . . , n }, since

    max{ xν | x ∈ Ξ(A, b) } = − min{ xν | x ∈ Ξ(A, −b) }.

Let Encl be a method that computes an enclosure of the solution set (we shall call it the basic method), let Encl(Q, r) be the interval enclosure, produced by the method Encl, of the solution set Ξ(Q, r) of the system Qx = r, that is,

    Encl(Q, r) ∈ IRⁿ and Encl(Q, r) ⊇ Ξ(Q, r),

and let Υ(Q, r) be the lower endpoint of the ν-th component of the interval enclosure of the solution set Ξ(Q, r) obtained by the method Encl, that is,

    Υ(Q, r) := min ( Encl(Q, r) )ν.    (5)
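As an illustration of what such a basic method Encl and the estimate (5) may look like, the following hedged sketch implements a naive interval Krawczyk iteration (one of the enclosure methods mentioned in the text) for a 2 × 2 system; the representation of intervals as endpoint pairs and all names are our own assumptions, not the author's code:

```python
# Hedged sketch: a Krawczyk-type enclosure method Encl for a 2x2
# interval system, preconditioned with the inverse of the midpoint
# matrix, plus the estimate (5). Illustrative only.

def i_add(a, b): return (a[0] + b[0], a[1] + b[1])
def i_sub(a, b): return (a[0] - b[1], a[1] - b[0])
def i_mul(a, b):
    p = (a[0]*b[0], a[0]*b[1], a[1]*b[0], a[1]*b[1])
    return (min(p), max(p))
def i_cap(a, b): return (max(a[0], b[0]), min(a[1], b[1]))

def encl_krawczyk(Q, r, x0, sweeps=30):
    """Enclosure of the solution set of Qx = r, starting from a known
    coarse enclosure x0; every iterate still encloses the solution set."""
    n = 2
    mid = [[(Q[i][j][0] + Q[i][j][1]) / 2 for j in range(n)] for i in range(n)]
    det = mid[0][0]*mid[1][1] - mid[0][1]*mid[1][0]
    L = [[ mid[1][1]/det, -mid[0][1]/det],        # L = (mid Q)^{-1}
         [-mid[1][0]/det,  mid[0][0]/det]]
    x = list(x0)
    for _ in range(sweeps):
        y = []
        for i in range(n):
            # s = (L r)_i - ((L Q - I) x)_i, evaluated in interval arithmetic
            s = (0.0, 0.0)
            for j in range(n):
                s = i_add(s, i_mul((L[i][j], L[i][j]), r[j]))
            for j in range(n):
                c = (0.0, 0.0)
                for k in range(n):
                    c = i_add(c, i_mul((L[i][k], L[i][k]), Q[k][j]))
                if i == j:
                    c = i_sub(c, (1.0, 1.0))
                s = i_sub(s, i_mul(c, x[j]))
            y.append(i_cap(s, x[i]))          # intersecting keeps the enclosure
        x = y
    return x

def upsilon(Q, r, nu, x0):
    """The estimate (5): lower endpoint of the nu-th enclosure component."""
    return encl_krawczyk(Q, r, x0)[nu][0]
```

Any other enclosure method (interval Gauss elimination, Gauss-Seidel iteration, etc.) could play the role of Encl in the same way.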
186
S.P. Shary
We require that the basic method should satisfy the following condition:

    the estimate Υ(Q, r) is inclusion monotonic with respect to the matrix Q and the vector r, i.e., for all interval n × n-matrices Q′, Q″ and interval n-vectors r′, r″, the inclusions Q′ ⊆ Q″ and r′ ⊆ r″ imply the inequality Υ(Q″, r″) ≤ Υ(Q′, r′).    (C1)

For most of the popular techniques computing enclosures of the solution sets of interval linear systems (the interval Gauss method [2,5], the interval Gauss-Seidel iteration [4,5], various modifications of the stationary iterative single-step and total-step techniques [2], the Krawczyk method [5], etc.), the fulfillment of (C1) can easily be derived from the inclusion monotonicity of the interval arithmetic operations.

To go further, we need to recall a remarkable result first obtained by H. Beeck [10] and afterward repeatedly proved by K. Nickel [11]: if A is regular (i.e., contains only regular point matrices), then both the minimal and maximal component-wise values of the points from the solution set are attained on the so-called extreme matrices and right-hand side vectors made up of the endpoints of A and b. In other words, for any ν = 1, 2, . . . , n,

    min{ xν | x ∈ Ξ(A, b) } = ( Ã⁻¹ b̃ )ν

with a point matrix Ã ∈ Rⁿˣⁿ and a point vector b̃ ∈ Rⁿ whose elements are endpoints of the interval entries of the matrix A and vector b respectively. It is also worth noting that

    Υ(Ã, b̃) ≤ ( Ã⁻¹ b̃ )ν

due to the very definition of the estimate Υ.

Assuming that an entry a ij of the matrix A has nonzero width, we denote

    by A′ and A″ the matrices obtained from A through replacing the entry a ij by its endpoints a̲ij and āij respectively,    (6)

    by Ã′ and Ã″ the matrices obtained from Ã through replacing the entry ãij by a̲ij and āij respectively.

Inasmuch as

    Ã′ ⊆ A′ ⊆ A,    Ã″ ⊆ A″ ⊆ A,

and b̃ ⊆ b, the condition (C1) implies the inequalities

    Υ(A, b) ≤ Υ(A′, b) ≤ Υ(Ã′, b̃)    and    Υ(A, b) ≤ Υ(A″, b) ≤ Υ(Ã″, b̃).

Therefore, taking the minima of the corresponding sides of these inequalities, we arrive at

    Υ(A, b) ≤ min{ Υ(A′, b), Υ(A″, b) } ≤ min{ Υ(Ã′, b̃), Υ(Ã″, b̃) }.
Additionally,

    min{ Υ(Ã′, b̃), Υ(Ã″, b̃) } ≤ Υ(Ã, b̃) ≤ ( Ã⁻¹ b̃ )ν = min{ xν | x ∈ Ξ(A, b) }.

Comparing the above two inequality chains results in the relation

    Υ(A, b) ≤ min{ Υ(A′, b), Υ(A″, b) } ≤ min{ xν | x ∈ Ξ(A, b) },

and, as a consequence, in the following practical prescription: having solved the two interval "systems-descendants" A′x = b and A″x = b defined by (6), we can get a better estimate for min{ xν | x ∈ Ξ(A, b) } from below as min{ Υ(A′, b), Υ(A″, b) }. Breaking an interval element b i of the right-hand side vector b up into its endpoints has a similar effect. For uniformity, we will designate by A′x = b′ and A″x = b″ the interval "systems-descendants" we get from Ax = b after having subdivided an interval element of either the matrix A or the right-hand side vector b.

To further improve the estimate for min{ xν | x ∈ Ξ(A, b) }, it makes sense to repeat the above-described subdivision procedure, applying it to the "systems-descendants" A′x = b′ and A″x = b″, and then to subdivide their descendants again to get an even better estimate, and so forth. We arrange the whole process of the successive step-by-step improvement of the estimate for min{ xν | x ∈ Ξ(A, b) } in accordance with the well-known "branch-and-bound" technique, similar to that implemented in the popular interval global optimization methods of [3,4,12] and the Lipschitz global optimization methods of [13]:

    first, all the interval systems Qx = r emerging as the result of the partitioning of the original system (1), as well as their estimates Υ(Q, r), are stored in a working list L;

    second, at every step of the algorithm, the interval system subject to subdivision is the one providing the smallest current estimate Υ(Q, r);

    third, the interval element to be subdivided in the system Qx = r is the one having the maximal width.

The execution of the algorithm thus amounts to maintaining the list L, which consists of records having the form of triples

    ( Q, r, Υ(Q, r) ),    (7)

where Q is an interval n × n-matrix, Q ⊆ A, and r is an interval n-vector, r ⊆ b.
Table 1. The simplest PPS-method for interval linear systems

Input:
    An interval linear system Ax = b.
    A number ν of the component estimated.
    A method Encl that produces the estimate Υ by the rule (5).

Output:
    An estimate Z for min{ xν | x ∈ Ξ(A, b) } from below.

Algorithm:
    assign Q := A and r := b;
    compute the estimate υ := Υ(Q, r);
    initialize the list L := { (Q, r, υ) };
    DO WHILE ( either Q or r has an interval entry )
        in the matrix Q = ( q ij ) and the vector r = ( r i ),
            choose an interval element h having the maximal width;
        generate interval systems Q′x = r′ and Q″x = r″ so that
            if h = q kl for some k, l ∈ { 1, 2, . . . , n }, then set
                q′ij := q″ij := q ij for (i, j) ≠ (k, l),
                q′kl := q̲ kl,  q″kl := q̄ kl,  r′ := r″ := r;
            if h = r k for some k ∈ { 1, 2, . . . , n }, then set
                Q′ := Q″ := Q,  r′k := r̲ k,  r″k := r̄ k,
                r′i := r″i := r i for i ≠ k;
        compute the estimates υ′ := Υ(Q′, r′) and υ″ := Υ(Q″, r″);
        delete the former leading record (Q, r, υ) from the list L;
        put the records (Q′, r′, υ′) and (Q″, r″, υ″) into L so that the
            values of the third field of the records in L increase;
        denote the first record of the list L by (Q, r, υ);
    END DO
    Z := υ;

Besides, the records forming the working list L will be ordered with respect to the values of the estimate Υ(Q, r), and the first record of L, as well as the corresponding interval system Qx = r and the estimate Υ (the smallest in the list), will be called the leading ones at the current step of the method. Table 1 summarizes the overall pseudocode of the new algorithm, which we are going to refer to as the parameter partition method, following the terminology tradition of deterministic global optimization [13]. Another suitable name for the new
method is PPS-method — after Partitioning Parameter Set¹. The main idea of this kind of method, first presented by the author in [14], can be extended to general nonlinear interval systems of equations, although the result of the subdivision of each interval parameter will then be two subintervals rather than the two endpoints as in the linear case.

If T is the total number of interval elements (with nonzero widths) in the matrix A and right-hand side vector b of the original system (1) (in general, T ≤ (n + 1)n), then the algorithm of Table 1 stops after at most 2^T steps, producing an estimate for min{ xν | x ∈ Ξ(A, b) } from below. How close the computed result is to the exact value of min{ xν | x ∈ Ξ(A, b) } depends mainly on the way we find the estimate Υ(Q, r), that is, on the choice of the basic method Encl. In particular, for the computed result to be optimal (exactly equal to min{ xν | x ∈ Ξ(A, b) }), it is necessary and sufficient that the following condition holds:

    the estimate Υ(Q, r) is exact for point linear systems Qx = r.    (C2)

However, if the dimension of the system under solution is sufficiently large and T exceeds several tens, then, even on modern medium-class computers, the simplest parameter partition method will never run to its natural completion, so that it makes good sense to consider it as an iterative one.
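The algorithm of Table 1 can be prototyped in a few dozen lines. The sketch below specializes it to 2 × 2 systems and uses interval-evaluated Cramer's rule as the basic method Encl; this choice is our own assumption for the sketch, but it is inclusion monotonic and exact on point systems, i.e. it satisfies (C1) and (C2), so the returned value is the optimal bound. All names and the endpoint-pair representation of intervals are illustrative:

```python
# Runnable sketch of the simplest PPS-method (Table 1) for 2x2
# interval systems; names and data layout are illustrative only.
import heapq

def i_sub(a, b): return (a[0] - b[1], a[1] - b[0])
def i_mul(a, b):
    p = (a[0]*b[0], a[0]*b[1], a[1]*b[0], a[1]*b[1])
    return (min(p), max(p))
def i_div(a, b):
    assert b[0] > 0 or b[1] < 0             # divisor must not contain zero
    p = (a[0]/b[0], a[0]/b[1], a[1]/b[0], a[1]/b[1])
    return (min(p), max(p))

def upsilon(Q, r, nu):
    """Estimate (5) via the natural interval extension of Cramer's rule."""
    det = i_sub(i_mul(Q[0][0], Q[1][1]), i_mul(Q[0][1], Q[1][0]))
    num = (i_sub(i_mul(r[0], Q[1][1]), i_mul(Q[0][1], r[1])) if nu == 0
           else i_sub(i_mul(Q[0][0], r[1]), i_mul(r[0], Q[1][0])))
    return i_div(num, det)[0]

def pps_min(Q, r, nu):
    """Optimal min{ x_nu } over the solution set, by parameter partitioning."""
    work = [(upsilon(Q, r, nu), Q, r)]      # the working list L as a heap
    while True:
        ups, Q, r = heapq.heappop(work)     # leading record
        elems = ([(Q[i][j][1] - Q[i][j][0], 'Q', i, j)
                  for i in range(2) for j in range(2)]
                 + [(r[i][1] - r[i][0], 'r', i, 0) for i in range(2)])
        wid, kind, i, j = max(elems)        # widest interval element
        if wid == 0.0:                      # leading record is a point system
            return ups
        for end in (0, 1):                  # replace element by its endpoints
            Q2 = [row[:] for row in Q]
            r2 = list(r)
            if kind == 'Q':
                Q2[i][j] = (Q[i][j][end], Q[i][j][end])
            else:
                r2[i] = (r[i][end], r[i][end])
            heapq.heappush(work, (upsilon(Q2, r2, nu), Q2, r2))
```

The heap keeps the records ordered by the third field, so popping it always yields the leading record; a tighter basic method would prune the tree earlier, while a cruder one only delays termination.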
3 Modifications of Parameter Partition Methods
In this section, we construct more sophisticated and, hence, more efficient PPS-methods for the optimal outer estimation of the solution sets of interval linear systems. In doing that, the algorithm of Table 1 shall serve as a basis to be further improved and modernized by a number of modifications, some of them already standard for this kind of algorithm.

3.1 Monotonicity Test
Let an interval linear system Qx = r be given, and suppose we know

    ∂xν(Q, r)/∂q ij  and  ∂xν(Q, r)/∂r i,

i.e., interval extensions of the corresponding derivatives

    ∂xν(Q, r)/∂qij  and  ∂xν(Q, r)/∂ri

of the ν-th component of the solution x(Q, r) to the point system Qx = r with respect to the ij-th entry of the matrix Q and the i-th element of the vector r. If an

¹ The more so that there exists a dual class of PSS-methods [7], which exploit the idea of Partitioning the Solution Set.
interval n × n-matrix Q̃ = ( q̃ ij ) and an interval n-vector r̃ = ( r̃ i ) are formed of the elements

    q̃ ij = [ q̲ ij, q̲ ij ],  if ∂xν(Q, r)/∂qij ≥ 0,
           [ q̄ ij, q̄ ij ],  if ∂xν(Q, r)/∂qij ≤ 0,    (8)
           q ij,            if 0 ∈ int ∂xν(Q, r)/∂qij,

    r̃ i = [ r̲ i, r̲ i ],  if ∂xν(Q, r)/∂ri ≥ 0,
          [ r̄ i, r̄ i ],  if ∂xν(Q, r)/∂ri ≤ 0,      (9)
          r i,           if 0 ∈ int ∂xν(Q, r)/∂ri,

where "int" means the interior of the interval, then, evidently,

    min{ xν | x ∈ Ξ(Q̃, r̃) } = min{ xν | x ∈ Ξ(Q, r) }.

Since the number of elements with nonzero widths in Q̃ and r̃ may be substantially less than that in Q and r, reducing the interval system Qx = r to Q̃x = r̃ simplifies, in general, the computation of the desired min{ xν | x ∈ Ξ(Q, r) }.

We can find the interval extensions of the derivatives entering into the formulas (8)–(9), for instance, in the following way. As is known from any advanced calculus course, the derivatives of the solution x of a linear system Qx = r with respect to its coefficients are given by

    ∂xν/∂qij = −yνi xj,    ∂xν/∂ri = yνi,

provided that Y = ( yij ) is the inverse matrix of Q = ( qij ), Y = Q⁻¹ (see e.g. [2], Chapter 16, or [15]). Therefore, if Y = ( y ij ) is the so-called "inverse interval matrix" for Q, i.e., an enclosure of the set of inverse point matrices, Y ⊇ { Q⁻¹ | Q ∈ Q }, and x j is the j-th component of an inclusive interval vector x ⊇ Ξ(Q, r), we can take the following interval extensions for the derivatives:

    ∂xν(Q, r)/∂qij = −y νi x j,    ∂xν(Q, r)/∂ri = y νi.    (10)

Computing the "inverse interval matrix" may be carried out, for example, as enclosing the solution set of the interval matrix equation

    AY = I,    I is the identity matrix,

applying n times (for every column of the matrix Y) the same outer estimation method Encl which has been chosen as the basic one for the entire algorithm.
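The monotonicity test may be sketched as follows for 2 × 2 systems, with interval Cramer's rule playing the role of the outer estimation method Encl; all names and the endpoint-pair representation of intervals are our own assumptions:

```python
# Hedged sketch of the monotonicity test (8)-(9). The "inverse interval
# matrix" Y ⊇ { Q^{-1} | Q in Q } is built column-wise by enclosing the
# solution sets of Q y = e_k, and the derivative extensions (10) are
# used to collapse sign-definite elements to one endpoint. Illustrative.

def i_sub(a, b): return (a[0] - b[1], a[1] - b[0])
def i_mul(a, b):
    p = (a[0]*b[0], a[0]*b[1], a[1]*b[0], a[1]*b[1])
    return (min(p), max(p))
def i_div(a, b):
    p = (a[0]/b[0], a[0]/b[1], a[1]/b[0], a[1]/b[1])
    return (min(p), max(p))

def cramer_encl(Q, r):
    """Componentwise enclosure of Xi(Q, r) for a 2x2 interval system."""
    det = i_sub(i_mul(Q[0][0], Q[1][1]), i_mul(Q[0][1], Q[1][0]))
    x0 = i_div(i_sub(i_mul(r[0], Q[1][1]), i_mul(Q[0][1], r[1])), det)
    x1 = i_div(i_sub(i_mul(Q[0][0], r[1]), i_mul(r[0], Q[1][0])), det)
    return [x0, x1]

def monotonicity_test(Q, r, nu):
    """Return (Q~, r~) as in (8)-(9): interval elements whose derivative
    extension is sign-definite are replaced by the suitable endpoint."""
    # inverse interval matrix, column by column: column k encloses Q^{-1} e_k
    cols = [cramer_encl(Q, [(1.0, 1.0) if i == k else (0.0, 0.0)
                            for i in range(2)]) for k in range(2)]
    Y = [[cols[k][i] for k in range(2)] for i in range(2)]
    x = cramer_encl(Q, r)
    Qt = [row[:] for row in Q]
    rt = list(r)
    for i in range(2):
        for j in range(2):
            # extension of d x_nu / d q_ij = -y_{nu i} x_j, cf. (10)
            d = i_mul((-Y[nu][i][1], -Y[nu][i][0]), x[j])
            if d[0] >= 0.0:
                Qt[i][j] = (Q[i][j][0], Q[i][j][0])   # min at lower endpoint
            elif d[1] <= 0.0:
                Qt[i][j] = (Q[i][j][1], Q[i][j][1])   # min at upper endpoint
    for i in range(2):
        d = Y[nu][i]                                  # d x_nu / d r_i = y_{nu i}
        if d[0] >= 0.0:
            rt[i] = (r[i][0], r[i][0])
        elif d[1] <= 0.0:
            rt[i] = (r[i][1], r[i][1])
    return Qt, rt
```

Elements whose derivative extension contains zero are left untouched, as required by the third cases of (8) and (9).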
3.2 Subdivision Strategy
Traditionally, in the interval "branch-and-bound" based global optimization algorithms, which are the nearest relatives of our parameter partition technique, the leading intervals are subdivided along their longest components. Such a strategy is known to guarantee (see e.g. [12,7]) the convergence of the algorithm, and we also use it in our simplest PPS-method of Table 1 (although the latter is finite). When convergence takes place, we may wish to optimize the numerical procedure, i.e., to achieve the best possible convergence rate, which usually reduces to the simpler problem of achieving the fastest improvement of the objective function at every step of the algorithm. A strict and exact optimization of the algorithm in the above sense is hardly possible for the parameter partition technique in general, but we are going to improve our method relying on reasonable heuristic considerations and taking into account estimates of the derivatives of the objective function.

If the matrices Q̌ = ( q̌ij ) and Q̂ = ( q̂ij ) differ from each other only in the (k, l)-th entry, q̌kl < q̂kl, and wid [ q̌kl, q̂kl ] stands for the width of the interval [ q̌kl, q̂kl ], then, due to the Lagrange mean-value theorem,

    ( Q̂⁻¹ r )ν − ( Q̌⁻¹ r )ν = ( ∂xν(Q̃, r)/∂qkl ) · wid [ q̌kl, q̂kl ]

for some matrix Q̃ ∈ [ Q̌, Q̂ ]. Similarly, if the vectors ř = ( ři ) and r̂ = ( r̂i ) differ only in the k-th component and řk < r̂k, then

    ( Q⁻¹ r̂ )ν − ( Q⁻¹ ř )ν = ( ∂xν(Q, r̃)/∂rk ) · wid [ řk, r̂k ]

for some vector r̃ ∈ [ ř, r̂ ].

Now, let the interval matrices Q̌ and Q̂ be obtained from the interval matrix Q by breaking up its element q kl into the endpoints q̲ kl and q̄ kl: q̌ kl = q̲ kl, q̂ kl = q̄ kl. Suppose also that min{ xν | x ∈ Ξ(Q̌, r) } and min{ xν | x ∈ Ξ(Q̂, r) } are attained at the same family of endpoints of the matrix and the right-hand side vector, which is almost always the case for "sufficiently narrow" interval systems due to continuity reasons. Then

    min{ xν | x ∈ Ξ(Q̂, r) } − min{ xν | x ∈ Ξ(Q̌, r) } = ( ∂xν(Q́, ŕ)/∂qkl ) · wid q kl

for some matrix Q́ ∈ Q and vector ŕ ∈ r. Similarly, let ř and r̂ be the interval vectors obtained from the interval vector r by breaking up its k-th component into the endpoints r̲k and r̄k: ř k = r̲ k, r̂ k = r̄ k. Under the condition that min{ xν | x ∈ Ξ(Q, ř) } and min{ xν | x ∈ Ξ(Q, r̂) } are attained at the same set of endpoints of the matrix and the right-hand side vector, we again get

    min{ xν | x ∈ Ξ(Q, r̂) } − min{ xν | x ∈ Ξ(Q, ř) } = ( ∂xν(Q̀, r̀)/∂rk ) · wid r k
for some matrix Q̀ ∈ Q and vector r̀ ∈ r. Hence, the product of the width of an interval element by the absolute value of the interval extension of the corresponding derivative may serve as a local measure, in a sense, of how strongly the subdivision of an element of either Q or r affects min{ xν | x ∈ Ξ(Q, r) } and the size of the solution set.

For most of the existing techniques that solve interval linear systems, the overestimation of the enclosures of the solution sets gets smaller as the solution set shrinks; for example, quadratic convergence is proven for the Krawczyk method [5], and so on. Therefore, a decrease of the size of the solution set Ξ(Q, r) shall result, to approximately the same extent, in a change of the estimate Υ(Q, r). With such basic methods, the requirement that the objective function should increase most rapidly per step is, in essence, equivalent to requiring that the subdivision of the leading interval system implies the fastest decrease of the size of the solution set. Following the above (partly heuristic) conclusions, we thus recommend subdividing the leading interval systems along the elements on which the maximum of

    | ∂xν(Q, r)/∂qij | · wid q ij,    | ∂xν(Q, r)/∂ri | · wid r i,    i, j = 1, 2, . . . , n,    (11)

is attained, that is, along the elements providing the maximal product of the width by the derivative estimate. The author first proposed such a subdivision strategy requiring the maximization of (11) in the paper [14].

3.3 "Rohn Modification"
The Beeck-Nickel theorem that we used in Section 2 for the derivation of the parameter partition technique was strengthened in the 1980s by J. Rohn [6], who determined more precisely the set of endpoints of the matrix A and right-hand side vector b at which min{ xν | x ∈ Ξ(A, b) } and max{ xν | x ∈ Ξ(A, b) } are attained. To give a mathematically rigorous formulation of Rohn's result, we need additional notation. Let

    E := { x ∈ Rⁿ | |xi| = 1 for i = 1, 2, . . . , n }

be the set of vectors with ±1 components. For a given interval matrix A and fixed σ, τ ∈ E, we designate by Aστ = ( aστij ) the point n × n-matrix formed by the entries

    aστij := āij, if σi τj = −1,
             a̲ij, if σi τj = 1.

Also, we designate by bσ = ( bσi ) the point n-vector formed by the elements

    bσi := b̄i, if σi = 1,
           b̲i, if σi = −1.
The matrix Aστ and the vector bσ are thus made up of collections of the endpoints of the elements of A and b respectively, and there are in total 2ⁿ · 2ⁿ = 4ⁿ matrix-vector couples of the form ( Aστ, bσ ) as σ and τ independently vary within E. For a nonsingular matrix A, it turns out [6] that both the minimal and maximal component-wise values of the points from the solution set Ξ(A, b) can only be reached on the set of 4ⁿ matrices Aστ and associated vectors bσ, i.e.,

    min{ xν | x ∈ Ξ(A, b) } = min over σ, τ ∈ E of ( (Aστ)⁻¹ bσ )ν,    (12)

    max{ xν | x ∈ Ξ(A, b) } = max over σ, τ ∈ E of ( (Aστ)⁻¹ bσ )ν    (13)
for every index ν = 1, 2, . . . , n. How could we exploit this fact in our parameter partition method? It is important to realize that the above result imposes no restriction on the endpoints of a separate element of either A or b unless the information of the other elements’ endpoints is drawn into the consideration. The restrictions on the combinations of the endpoints followed from (12)–(13) are essentially collective and to take them into account we should trace the structure of the endpoints involved through all of the matrix A and right-hand side b. To put these ideas into practice, we connect, with every interval system Qx = r, Q = ( q ij ), r = ( r i ), produced from the subdivision of the initial system (1), 1) an auxiliary integer n × n-matrix W = ( wij ), its elements being equal to ±1 or 0, such that ⎧ ⎪ ⎨ −1, if q ij = aij , 0, if q ij = aij , wij := ⎪ ⎩ 1, if q = a , ij ij and 2) auxiliary integer n-vectors s = (si ) and t = (tj ), their components being equal to ±1 or 0, such that wij = si tj (14) for all i, j = 1, 2, . . . , n and
    si := −1, if r_i = \underline{b}_i (the lower endpoint),
    si :=  0, if r_i = b_i (still the whole interval),
    si :=  1, if r_i = \overline{b}_i (the upper endpoint).
The values of tj are thus determined implicitly, through the matrix W and the vector s. Additionally, the working list L is now going to consist of records with six members,

    ⟨ Q, r, Υ(Q, r), W, s, t ⟩,    (15)

so as to preserve the W, s and t obtained at the preceding steps of the algorithm. In the rest of the paper, we will call W the check matrix and s, t the check vectors, intending to use them for checking and controlling the overall
bisection process in the PPS-methods. Namely, the vectors s and t are to be "approximations" to the vectors σ and τ, respectively, from the equalities (12)–(13), while W = s tᵀ shall be an "approximation" to the matrix σ τᵀ. At the start of the algorithm, we set W, s and t to all zeros, and then they are recalculated (updated) so as to replace their zero elements (which correspond to our ignorance of which specific endpoint is treated) with nonzero ones (which correspond to a certain endpoint). The check matrix W and the check vectors s, t, mutually affecting each other and being updated during the algorithm run, are thus intended to "supervise" the partitioning of the initial interval linear system so that only the variants allowed by the equalities (12)–(13) are begotten.

The latter is implemented as follows. At each step of the algorithm, when subdividing an interval element h of the leading system Qx = r, we look at the corresponding value

• of the check matrix W = (wij), if the element h is q_kl of the matrix Q,
• of the check vector s = (si), if the element h is r_k of the right-hand side vector r.

Then, in case wkl = 0 (respectively, sk = 0), we engender, according to the usual subdivision procedure of the simplest parameter partition method of Table 1, two interval systems-descendants Q′x = r′ and Q″x = r″ corresponding to the two endpoints of the subdivided interval. Otherwise, in case wkl ≠ 0 (respectively, sk ≠ 0), only one descendant Q′x = r′ is engendered, depending on the sign of wkl (respectively, sk). More precisely, we perform the procedure presented in Table 2 instead of the traditional bisection.

Why is that at all possible? In other words, can discarding the second interval system-descendant in the above procedure violate the fact that the leading estimate Υ(Q, r) approximates the sought-for min{ xν | x ∈ Ξ(A, b) } from below?
To answer these questions, note that the new subdivision procedure of Table 2 rejects only those interval systems that neither belong to the set of point systems { Aστ x = bσ | σ, τ ∈ E } nor contain any of them. Therefore, due to the property (C1) of the basic enclosing method and taking into account the equality (12), we have

    min{ xν | x ∈ Ξ(A, b) } = min_{σ,τ ∈ E} ( (Aστ)⁻¹ bσ )_ν
        ≥ min_{σ,τ ∈ E} Υ(Aστ, bσ)
        ≥ min{ Υ(Q, r) | Q ∋ Aστ and r ∋ bσ for some σ, τ ∈ E }
        ≥ min{ Υ(Q, r) | the system Qx = r is in the working list L }
        = the leading estimate Υ(Q, r),

so that with the new subdivision procedure the leading estimate really approximates min{ xν | x ∈ Ξ(A, b) } from below.
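For very small n, the equalities (12)–(13) can be checked directly by enumerating all 4ⁿ endpoint systems. The following Python sketch (the function names and the tiny Gaussian-elimination solver are ours) computes the exact minimum of (12) by brute force:

```python
import itertools

def solve(A, b):
    """Solve the point system A x = b by Gaussian elimination with
    partial pivoting (tiny helper for the enumeration below)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def rohn_min_component(A_lo, A_hi, b_lo, b_hi, nu):
    """Exact min of x_nu over the solution set, by enumerating the 4^n
    endpoint systems (A^{sigma,tau}, b^sigma) of (12); tiny n only."""
    n = len(b_lo)
    best = float("inf")
    for sigma in itertools.product((-1, 1), repeat=n):
        b = [b_hi[i] if sigma[i] == 1 else b_lo[i] for i in range(n)]
        for tau in itertools.product((-1, 1), repeat=n):
            # sigma_i * tau_j = +1 -> lower endpoint, -1 -> upper endpoint
            A = [[A_lo[i][j] if sigma[i] * tau[j] == 1 else A_hi[i][j]
                  for j in range(n)] for i in range(n)]
            best = min(best, solve(A, b)[nu])
    return best
```

This is exactly the exponential enumeration the PPS-method tries to avoid; it is shown only to make (12) concrete.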
Table 2. Generating the systems-descendants

IF ( the subdivided element is q_kl ) THEN
    IF ( wkl = 0 ) THEN
        generate the two systems Q′x = r′ and Q″x = r″ so that
            q′_ij := q″_ij := q_ij for (i, j) ≠ (k, l),
            q′_kl := \underline{q}_kl ,  q″_kl := \overline{q}_kl ,
            r′ := r″ := r;
    ELSE
        generate the single system Q′x = r′ so that
            r′ := r,  q′_ij := q_ij for (i, j) ≠ (k, l),
            q′_kl := \underline{q}_kl  for wkl = 1,
            q′_kl := \overline{q}_kl  for wkl = −1;
    END IF
END IF
IF ( the subdivided element is r_k ) THEN
    IF ( sk = 0 ) THEN
        generate the two systems Q′x = r′ and Q″x = r″ so that
            Q′ := Q″ := Q,  r′_i := r″_i := r_i for i ≠ k,
            r′_k := \underline{r}_k ,  r″_k := \overline{r}_k ;
    ELSE
        generate the single system Q′x = r′ so that
            Q′ := Q,  r′_i := r_i for i ≠ k,
            r′_k := \underline{r}_k  for sk = −1,
            r′_k := \overline{r}_k  for sk = 1;
    END IF
END IF
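The procedure of Table 2 can be modelled in Python as follows (a sketch with our own data layout: intervals as (lo, hi) pairs; the sign conventions for W and s are those of §3.3):

```python
from copy import deepcopy

def generate_descendants(Q, r, W, s, elem):
    """One step of the Table 2 procedure.  Q is an n*n list of (lo, hi)
    interval pairs, r an n-list of pairs; W, s the check matrix/vector.
    elem is ('Q', k, l) or ('r', k).  Returns a list of (Q, r) descendants.
    Convention: for W, +1 <-> lower endpoint, -1 <-> upper endpoint;
    for s, -1 <-> lower endpoint, +1 <-> upper endpoint."""
    out = []
    if elem[0] == 'Q':
        _, k, l = elem
        lo, hi = Q[k][l]
        signs = [1, -1] if W[k][l] == 0 else [W[k][l]]
        for w in signs:
            Q2, r2 = deepcopy(Q), deepcopy(r)
            e = lo if w == 1 else hi
            Q2[k][l] = (e, e)          # degenerate interval = endpoint
            out.append((Q2, r2))
    else:
        _, k = elem
        lo, hi = r[k]
        signs = [-1, 1] if s[k] == 0 else [s[k]]
        for sg in signs:
            Q2, r2 = deepcopy(Q), deepcopy(r)
            e = lo if sg == -1 else hi
            r2[k] = (e, e)
            out.append((Q2, r2))
    return out
```

A zero check value produces two descendants; a nonzero one produces a single descendant with the endpoint dictated by the sign.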
Before specifying the formal computational scheme, let us establish the rules for the recalculation of the check matrix W and check vectors s, t during the algorithm run. In doing this, we adopt the following notation: if a leading interval system Qx = r has begotten, as the result of executing the algorithm of Table 2, the systems-descendants Q′x = r′ and Q″x = r″, then the corresponding new check matrices and check vectors will be referred to as W′, W″ and s′, s″, t′, t″ respectively. There exists a one-to-one correspondence between the vector s and the right-hand side of the interval system Qx = r, while partitioning the interval matrix Q of the system affects the vectors s and t only implicitly, through the matrix W and the conditions (14). This still enables us to organize the recalculation of W, s and t at every algorithm step that results in the subdivision of an interval element of the leading system. Otherwise,
if the leading interval system engenders only one descendant Q′x = r′ according to Table 2, the check vectors s, t and the check matrix W remain unchanged, so that s′ := s, t′ := t, W′ := W.

So, let the leading interval system Qx = r have been subdivided into the two systems-descendants Q′x = r′ and Q″x = r″ defined as in Table 2. What should be the law according to which we form the matrices W′, W″ and vectors s′, s″, t′, t″ corresponding to the systems-descendants? Initially, we can set

    W′ := W″ := W,  s′ := s″ := s,  t′ := t″ := t,

and then perform the following two-stage recalculation procedure.

First, we modify W′, W″ and s′, s″ using the information about the subdivision just done. Namely,

(i) if the subdivided element was q_kl of the matrix Q, then, in the matrices W′ = (w′_ij) and W″ = (w″_ij), we put w′_kl := 1 and w″_kl := −1;
(ii) if the subdivided element was r_k of the right-hand side vector r, then, in the vectors s′ = (s′_i) and s″ = (s″_i), we put s′_k := −1 and s″_k := 1.

Second, we recalculate each of the two families of interconnected objects, W′, s′, t′ and W″, s″, t″ respectively, using the relations (14). Namely,

(i) if s′ or t′ has changed, we try to update the matrix W′;
(ii) if W′ or t′ has changed, we try to update the vector s′;
(iii) if W′ or s′ has changed, we try to update the vector t′.

The instructions (i)–(iii) are repeated consecutively, one after another, in a cycle until the changes in W′, s′ and t′ stop. The same process is then applied to W″, s″, t″.

The overall algorithmic scheme of the above procedure turns out to be quite involved, so it makes sense to provide the reader with a stricter and more detailed description. Table 3 presents the corresponding pseudocode; some explanations are in order. The Boolean variables

    W′Change, s′Change, t′Change, W′C, s′C, t′C
and
    W″Change, s″Change, t″Change, W″C, s″C, t″C

are "flags" introduced to reflect the current state of changes in the check matrices and vectors W′, s′, t′ and W″, s″, t″ respectively. The value true means that the corresponding object has been changed at the current iteration of the recalculation process, while the value false means "no changes".

The whole algorithm of Table 3 can be divided into three essentially different parts. The first one, consisting of the two starting lines, is preparatory and represents the initialization of the flags. The second part, consisting of two conditional IF–THEN operators, recalculates the check matrices W′, W″ and check vectors s′, s″ taking into account the bisection results. Finally, the third part of the pseudocode, consisting
Table 3. Recalculation of W′, W″, s′, s″, t′, t″

W′Change := false; s′Change := false; t′Change := false;
W″Change := false; s″Change := false; t″Change := false;
IF ( the subdivided element is q_kl of Q ) AND ( q_kl is subdivided to its two endpoints ) THEN
    w′_kl := 1;  w″_kl := −1;  W′Change := true;  W″Change := true;
END IF
IF ( the subdivided element is r_k of r ) AND ( r_k is subdivided to its two endpoints ) THEN
    s′_k := −1;  s″_k := 1;  s′Change := true;  s″Change := true;
END IF
DO WHILE W′Change OR s′Change OR t′Change
    IF ( s′Change OR t′Change ) THEN
        try to update the matrix W′ according to (14);
        IF ( W′ has been changed ) THEN W′C := true ELSE W′C := false END IF
    END IF
    IF ( W′Change OR t′Change ) THEN
        try to update the vector s′ according to (14);
        IF ( s′ has been changed ) THEN s′C := true ELSE s′C := false END IF
    END IF
    IF ( W′Change OR s′Change ) THEN
        try to update the vector t′ according to (14);
        IF ( t′ has been changed ) THEN t′C := true ELSE t′C := false END IF
    END IF
    W′Change := W′C;  s′Change := s′C;  t′Change := t′C;
END DO
DO WHILE W″Change OR s″Change OR t″Change
    IF ( s″Change OR t″Change ) THEN
        try to update the matrix W″ according to (14);
        IF ( W″ has been changed ) THEN W″C := true ELSE W″C := false END IF
    END IF
    IF ( W″Change OR t″Change ) THEN
        try to update the vector s″ according to (14);
        IF ( s″ has been changed ) THEN s″C := true ELSE s″C := false END IF
    END IF
    IF ( W″Change OR s″Change ) THEN
        try to update the vector t″ according to (14);
        IF ( t″ has been changed ) THEN t″C := true ELSE t″C := false END IF
    END IF
    W″Change := W″C;  s″Change := s″C;  t″Change := t″C;
END DO
of two DO WHILE cycles, makes an attempt to update the new check matrices and check vectors "on their own basis", according to the main relation (14). The calculation is performed until W′, W″, s′, s″, t′, t″ "stabilize", that is, their changes stop and all the corresponding flags W′Change, s′Change, t′Change, W′C, s′C, t′C, W″Change, s″Change, t″Change, W″C, s″C, t″C are false.

To complete the formalized description of the "Rohn modification", we only have to define in detail what is meant in Table 3 by "trying to update the matrix W′ according to (14)", "trying to update the vector s′", and the like. Let us denote by the Greek capital letters K′, Λ′ and Ω′ the index subsets of the elements of the vector s′, the vector t′ and the matrix W′, respectively, that have changed their values (from 0 to ±1) at the current step of the recalculation procedure of Table 3. K′ and Λ′ are thus subsets of the set of the first n natural numbers { 1, 2, …, n }, while Ω′ is a subset of the set of all pairs of the first n natural numbers, i.e. of { (1, 1), (1, 2), …, (1, n), (2, 1), …, (n, n) }. Each of the sets K′, Λ′, Ω′ may be empty, or may contain more than one member. Then "trying to update the vector s′" may be organized as follows:

Table 4. Updating s′

IF ( W′Change ) THEN
    DO FOR (k, l) ∈ Ω′
        IF ( t′_l ≠ 0 )  s′_k := w′_kl / t′_l
    END DO
END IF
IF ( t′Change ) THEN
    DO FOR l ∈ Λ′
        DO FOR k = 1 TO n
            IF ( s′_k = 0 AND w′_kl ≠ 0 )  s′_k := w′_kl / t′_l
        END DO
    END DO
END IF
"Trying to update the vector t′" can be accomplished similarly to the above, with the only distinction that the cycle "DO FOR l ∈ Λ′" in the second IF-operator should be replaced by "DO FOR k ∈ K′". Coming up next is the updating of W′ (Table 5). The recalculation of s″, t″ and W″ is done in the same way, for which we should introduce the index subsets K″, Λ″ and Ω″ to denote the indices of the elements of the vector s″, the vector t″ and the matrix W″, respectively, that have changed at the current step of the recalculation process.

Finally, it is worth mentioning the following remarkable property of the check matrix W: in every 2 × 2-submatrix of W with nonzero entries, any element is equal to the product of the other three. To make sure of this, denote by i1, i2 the numbers of the rows and by j1, j2 the numbers of the columns forming the submatrix under consideration. Then, according to the definition (14),
Table 5. Updating W′

IF ( s′Change ) THEN
    DO FOR k ∈ K′
        DO FOR l = 1 TO n
            IF ( t′_l ≠ 0 )  w′_kl := s′_k t′_l
        END DO
    END DO
END IF
IF ( t′Change ) THEN
    DO FOR l ∈ Λ′
        DO FOR k = 1 TO n
            IF ( s′_k ≠ 0 )  w′_kl := s′_k t′_l
        END DO
    END DO
END IF
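The interplay of Tables 3–5 is, in essence, a fixed-point propagation of the relation (14) over the partially known signs. The following simplified Python sketch (our own condensed formulation, not the paper's exact control flow) iterates the three update rules until stabilization:

```python
def propagate(W, s, t):
    """Fill in zero entries of W, s, t using w[i][j] == s[i]*t[j] (14),
    iterating until no more changes (cf. Tables 3-5).  Entries are
    -1, 0 or +1; 0 means 'not yet known'."""
    n = len(s)
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for j in range(n):
                if W[i][j] == 0 and s[i] != 0 and t[j] != 0:
                    W[i][j] = s[i] * t[j]; changed = True
                if s[i] == 0 and W[i][j] != 0 and t[j] != 0:
                    s[i] = W[i][j] * t[j]; changed = True   # since t[j]**2 == 1
                if t[j] == 0 and W[i][j] != 0 and s[i] != 0:
                    t[j] = W[i][j] * s[i]; changed = True
    return W, s, t
```

The division by t′_l in Table 4 reduces here to multiplication, since nonzero check entries are ±1.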
    wi1j1 = σi1 τj1 ,    wi1j2 = σi1 τj2 ,
    wi2j1 = σi2 τj1 ,    wi2j2 = σi2 τj2 .

Multiplying any three of these equalities, e.g. the 1st, the 2nd and the 4th, we get

    wi1j1 wi1j2 wi2j2 = σi1 τj1 σi1 τj2 σi2 τj2 .

The square of any component of σ and τ equals 1, so that

    wi1j1 wi1j2 wi2j2 = τj1 σi2 = wi2j1 .    (16)

The same holds for the rest of the elements of the submatrix:

    wi1j1 wi1j2 wi2j1 = wi2j2 ,    (17)
    wi1j2 wi2j1 wi2j2 = wi1j1 ,    (18)
    wi1j1 wi2j1 wi2j2 = wi1j2 .    (19)
Sometimes, the relations (16)–(19) may prove helpful in further refining the check matrix W. For instance, suppose we are about to subdivide the leading interval system Qx = r in the element q_kl while the corresponding element wkl of the check matrix W is zero, i.e. normally we would have to engender two systems-descendants from Qx = r. It is advisable first to make an effort to determine wkl by searching for 2 × 2-submatrices of W having all entries nonzero except wkl. If such a submatrix is found in W, we assign to wkl the product of its other three elements. The above can be implemented as the following program.
Table 6. Refining W by 2 × 2-submatrix search

DO FOR i = 1 TO n
    DO FOR j = 1 TO n
        IF ( i ≠ k AND j ≠ l ) THEN
            IF ( wij ≠ 0 AND wil ≠ 0 AND wkj ≠ 0 ) THEN
                wkl := wij wil wkj ;
                EXIT
            END IF
        END IF
    END DO
END DO
where the EXIT operator means leaving all the blocks and cycles of the above code.
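In Python, the search of Table 6 might be rendered as follows (a sketch; the early return plays the role of the EXIT operator):

```python
def refine_wkl(W, k, l):
    """Try to determine W[k][l] (currently 0) from a 2x2-submatrix of W
    whose other three entries are nonzero, using the relations (16)-(19).
    Returns True if W[k][l] has been determined."""
    n = len(W)
    for i in range(n):
        for j in range(n):
            if i != k and j != l:
                if W[i][j] != 0 and W[i][l] != 0 and W[k][j] != 0:
                    W[k][l] = W[i][j] * W[i][l] * W[k][j]
                    return True   # EXIT: leave all loops at once
    return False
```

A successful refinement converts the subdivision of q_kl from a two-descendant step into a single-descendant step.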
3.4 Sifting Unpromising Records
Next, we consider the modification of the parameter partition method resulting from computing the estimates Υ for point systems taken from the leading systems. It enables us to partially control the precision of the current estimate of min{ xν | x ∈ Ξ(A, b) }, as well as to delete from the working list L unpromising records that can never become leading. Thanks to the latter feature, the growth of the list L is confined to some extent.

Let, along with the estimates Υ(Q, r) for the interval linear systems Qx = r, the values Υ(Q̌, ř) be computed during the algorithm run, where the accent " ˇ " means taking a point from the corresponding interval. It is fairly simple to see that Υ(Q̌, ř) ≥ Υ(Q, r) and that the values Υ(Q̌, ř) approximate min{ xν | x ∈ Ξ(A, b) } from above. If we define, for each step of our PPS-method,

    ω := min Υ(Q̌, ř)    (20)

over all the interval linear systems Qx = r that have been in the list L up to the current step, then

    min{ xν | x ∈ Ξ(A, b) } ≤ ω.

On the other hand, if Qx = r is the leading interval system, then Υ(Q, r) ≤ min{ xν | x ∈ Ξ(A, b) }, so that one more stopping criterion for our algorithm may be attaining the required smallness of (ω − Υ(Q, r)). Next, an interval linear system Qx = r satisfying at some step the condition

    Υ(Q, r) > ω    (21)
will never become the leading one, and deleting the corresponding record from the working list L has no effect on the result of the algorithm. In general, all newly generated records should be tested against the inequality (21), while a total clean-up of the working list (looking through L and removing the records satisfying (21)) makes sense only after the value of ω changes.

Choosing (Q̌, ř) ∈ Arg min{ Υ(Q̃, r̃) | Q̃ ∈ Q, r̃ ∈ r } would be an ideal outcome, but in general this is no easier than the original problem (3)–(4). So, to reduce the possible deviation of (Q̌, ř) from the set Arg min{ Υ(Q̃, r̃) | Q̃ ∈ Q, r̃ ∈ r }, we can take Q̌ and ř as the midpoints of the matrix Q and the right-hand side vector r, i.e. as mid Q and mid r respectively.
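The bookkeeping of this subsection can be sketched as follows (a simplified model with our own record layout: each record carries its lower estimate Υ(Q, r) and its midpoint-system estimate Υ(mid Q, mid r)):

```python
def update_omega(records, omega):
    """omega is the minimum of the point-system (midpoint) estimates
    seen so far; cf. (20)."""
    for rec in records:
        omega = min(omega, rec["midpoint"])
    return omega

def sift(records, omega):
    """Drop records whose lower estimate exceeds the current upper
    bound omega: by (21) they can never become leading."""
    return [rec for rec in records if rec["lower"] <= omega]

records = [
    {"lower": 0.1, "midpoint": 0.4},
    {"lower": 0.7, "midpoint": 0.9},   # lower bound 0.7 > omega -> sifted
]
omega = update_omega(records, float("inf"))
records = sift(records, omega)
```

Here the second record is removed, since its lower estimate already exceeds the upper bound ω = 0.4 delivered by the first record's midpoint system.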
3.5 Influence of the Basic Method
Many procedures for enclosing the solution sets of interval linear systems need, to get started, an initial interval that contains the solution set under estimation. Such are, for instance, the interval Gauss–Seidel iteration, the Krawczyk method and some others. It is not hard to see that an enclosure of the solution set of a leading system Qx = r, found at a previous step, may serve as an initial approximation for the procedures enclosing the solution sets of the systems-descendants Q′x = r′ with Q′ ⊆ Q and r′ ⊆ r. The same trick is applicable to the computation of the "inverse interval matrix", which we need in the monotonicity test of §3.1 and in selecting the subdivided element according to the technique of §3.2. Hence, having chosen a basic method that requires a starting outer approximation, it makes sense to preserve the interval enclosures of both the solution set and the "inverse interval matrix" obtained at the preceding step of the algorithm. To do that, we have to enlarge the records forming the working list L by two more fields, so that L now contains neither the triples (7) nor the six-term records (15), but the eight-term records

    ⟨ Q, r, Υ(Q, r), W, s, t, Y, x ⟩,

where the first three fields have the same meaning as in (7), the check matrix W and check vectors s, t were introduced in §3.3, and, additionally,

    Y is an interval n × n-matrix such that Y ⊇ { Q̃⁻¹ | Q̃ ∈ Q },
    x is an interval n-vector such that x ⊇ Ξ(Q, r).

Every technique that encloses the solution sets of interval linear systems and satisfies the condition (C2) usually produces the exact estimate Υ(Q, r) not only for point (noninterval) Q and r, but also for a wider set of data, when some of the elements of Q or r may be intervals. For instance, the interval Gauss–Seidel iteration [4,5] and the stationary iterative single-step and total-step procedures [2] provide the exact estimate Υ(Q, r) for a point matrix Q, no matter what the right-hand side vector r is. The estimate Υ(Q, r) obtained through interval Gauss elimination is, obviously, exact for triangular matrices Q. The list of examples might be continued. There can exist more sophisticated conditions on the mutual disposition of the elements of the interval matrix Q and vector r, their widths, magnitudes, and so on. We thus need not wait for the complete "deintervalization" of the leading interval system Qx = r (the termination criterion of the "DO WHILE" cycle in Table 1) to stop the PPS-method. Instead, it is quite sufficient that the leading estimate Υ(Q, r) be exact.

One can go even further and make provision for a dynamic runtime change of the basic method Encl. Originally, Encl may be a technique with a wide applicability scope, although having a low convergence rate. Afterward, as the algorithm reaches a prescribed narrowness of the interval systems-descendants, we can switch Encl to a more precise specialized technique.

The conclusion to draw from the above is that, to achieve the best possible efficiency of the parameter partition methods, all components of the practical algorithm (the data structures, in particular the form of the records of L, the way the working list L is processed, the subdivision strategy, etc.) should be carefully adapted to the features of the specific problem being solved.
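As an illustration of a basic enclosing method that benefits from such a warm start, here is a minimal interval Gauss–Seidel sweep in Python (a sketch under the assumption that the diagonal intervals do not contain zero; intervals are (lo, hi) pairs, and the new enclosure is intersected with the old one, so a tighter starting enclosure directly yields a tighter result):

```python
def imul(u, v):
    """Interval multiplication."""
    ps = [u[0] * v[0], u[0] * v[1], u[1] * v[0], u[1] * v[1]]
    return (min(ps), max(ps))

def isub(u, v):
    """Interval subtraction."""
    return (u[0] - v[1], u[1] - v[0])

def idiv(u, v):
    """Interval division; the divisor must not contain zero."""
    assert v[0] > 0 or v[1] < 0, "divisor interval must not contain 0"
    return imul(u, (1.0 / v[1], 1.0 / v[0]))

def gauss_seidel_sweep(A, b, x):
    """One interval Gauss-Seidel sweep refining the enclosure x of the
    solution set of Ax = b; each component is intersected with its
    previous enclosure."""
    n = len(b)
    for i in range(n):
        acc = b[i]
        for j in range(n):
            if j != i:
                acc = isub(acc, imul(A[i][j], x[j]))
        lo, hi = idiv(acc, A[i][i])
        x[i] = (max(lo, x[i][0]), min(hi, x[i][1]))   # intersection
    return x
```

Repeating the sweep contracts the enclosure; for a point matrix it converges to the exact solution.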
3.6 Overall Computational Scheme
The pseudocodes of Tables 7 and 8 sum up the above modifications of the parameter partition method for outer interval estimation of the solution sets of interval linear systems.

Theoretically, the Rohn modification enables us to decrease the upper bound on the computational complexity of PPS-methods from 2^{n²+n} to 4ⁿ, but

• this is done at the price of a substantial complication of the algorithm, so that its informational complexity becomes quite high;
• when solving practical problems of dimension greater than several tens, both 2^{n²+n} and 4ⁿ subdivisions are unattainable anyway, and the parameter partition technique should rather be considered as an iterative procedure that does not run to its natural completion. As a consequence, the check matrix W remains mostly zero, and we cannot avail ourselves of the information following from the equalities (12)–(13).

To our mind, it is up to the user to decide in each specific case, judging by the size of the interval system, its structure, etc., whether incorporating the Rohn modification into a particular implementation of the parameter partition technique is really expedient. This is why we present two overall computational schemes, both with and without the Rohn modification. In Tables 7 and 8, it is supposed that the interval Gauss–Seidel iteration is taken as the basic enclosing method Encl, but this is set just for definiteness.

We should emphasize that the parameter partition method is rather a general scheme, while Tables 7 and 8 present only some of its possible implementations. The constructions of the above subsections contain, in particular, a number of "free variables"
Table 7. Algorithm LinPPS1

DO WHILE ( the leading estimate Υ(Q, r) is not exact ) OR ( ω − Υ(Q, r) > ε )
    using the formulas (10), compute the interval enclosures of the derivatives
        ∂xν(Q, r)/∂qij  and  ∂xν(Q, r)/∂ri
    that correspond to the interval elements q_ij and r_i of nonzero width;
    "squeeze", according to (8)–(9), the interval elements of Q and r for which the monotonicity of xν with respect to qij and ri has been detected, and denote the resulting interval matrix and vector again by Q and r;
    find, among the elements of the system Qx = r, an interval h that corresponds to the largest of the products
        | ∂xν(Q, r)/∂qij | · wid q_ij ,  | ∂xν(Q, r)/∂ri | · wid r_i ,  i, j ∈ { 1, 2, …, n };
    beget the interval systems-descendants Q′x = r′ and Q″x = r″ so that
        if h = q_kl for some k, l ∈ { 1, 2, …, n }, then set
            q′_kl := \underline{q}_kl , q″_kl := \overline{q}_kl , q′_ij := q″_ij := q_ij for (i, j) ≠ (k, l), r′ := r″ := r;
        if h = r_k for some k ∈ { 1, 2, …, n }, then set
            r′_k := \underline{r}_k , r″_k := \overline{r}_k , r′_i := r″_i := r_i for i ≠ k, Q′ := Q″ := Q;
    compute the interval vectors x′ = Encl(Q′, r′) and x″ = Encl(Q″, r″), taking x as an initial approximation;
    assign the estimates υ′ := Υ(Q′, r′) and υ″ := Υ(Q″, r″);
    sharpen the enclosures of the "inverse interval matrices" Y′ ⊇ (Q′)⁻¹ and Y″ ⊇ (Q″)⁻¹, taking Y as an initial approximation;
    compute the estimates Υ(mid Q′, mid r′) and Υ(mid Q″, mid r″), and set
        μ := min{ Υ(mid Q′, mid r′), Υ(mid Q″, mid r″) };
    delete the former leading record (Q, r, υ, Y, x) from the list L;
    if υ′ ≤ ω, then put the record (Q′, r′, υ′, Y′, x′) into the list L so that the values of the third field of the records of L increase;
    if υ″ ≤ ω, then put the record (Q″, r″, υ″, Y″, x″) into the list L so that the values of the third field of the records of L increase;
    if ω > μ, then set ω := μ and clean up the list L, i.e. remove from it all records (Q, r, υ, Y, x) with υ > ω;
END DO
Table 8. Algorithm LinPPS2

DO WHILE ( the leading estimate Υ(Q, r) is not exact ) OR ( ω − Υ(Q, r) > ε )
    using the formulas (10), compute the interval enclosures of the derivatives
        ∂xν(Q, r)/∂qij  and  ∂xν(Q, r)/∂ri
    that correspond to the interval elements q_ij and r_i of nonzero width;
    "squeeze", according to (8)–(9), the interval elements of Q and r for which the monotonicity of xν with respect to qij and ri has been detected, and denote the resulting interval matrix and vector again by Q and r;
    find, among the elements of the system Qx = r, an interval h that corresponds to the largest of the products
        | ∂xν(Q, r)/∂qij | · wid q_ij ,  | ∂xν(Q, r)/∂ri | · wid r_i ,  i, j ∈ { 1, 2, …, n };
    try to refine the check matrix W according to the procedure of Table 6;
    beget one or two interval systems-descendants Q′x = r′ and Q″x = r″ according to the procedure of Table 2;
    if two systems-descendants have been generated, calculate the new check matrices W′, W″ and vectors s′, s″, t′, t″ according to the procedures of Table 3 and Tables 4–5; otherwise, set W′ := W, s′ := s, t′ := t;
    compute the interval vectors x′ = Encl(Q′, r′) and, possibly, x″ = Encl(Q″, r″), taking x as the initial approximation;
    assign the estimates υ′ := Υ(Q′, r′) and, possibly, υ″ := Υ(Q″, r″);
    sharpen the enclosures of the "inverse interval matrices" Y′ ⊇ (Q′)⁻¹ and, possibly, Y″ ⊇ (Q″)⁻¹, taking Y as the initial approximation;
    compute the estimates Υ(mid Q′, mid r′) and, possibly, Υ(mid Q″, mid r″), and set
        μ := min{ Υ(mid Q′, mid r′), Υ(mid Q″, mid r″) };
    delete the former leading record (Q, r, υ, W, s, t, Y, x) from the list L;
    if υ′ ≤ ω, then put the record (Q′, r′, υ′, W′, s′, t′, Y′, x′) into the list L so that the values of the third field of the records of L increase;
    if two systems-descendants have been engendered and υ″ ≤ ω, then put the record (Q″, r″, υ″, W″, s″, t″, Y″, x″) into the list L so that the values of the third field of the records of L increase;
    if ω > μ, then set ω := μ and clean up the list L, i.e. remove from it all records (Q, r, υ, W, s, t, Y, x) with υ > ω;
END DO
to be tuned and determined under specific circumstances. We can, therefore, speak of a whole class of methods based on the common general idea of partitioning the interval parameters of the system.

To start the algorithm of Table 7, which we shall call LinPPS1, one should

• find crude enclosures of the solution set and of the inverse interval matrix, that is, compute x ⊇ Ξ(A, b) and Y ⊇ A⁻¹;
• assign the accuracy ε > 0;
• set Υ(A, b) := \underline{x}_ν and ω := +∞;
• initialize the working list L with the record (A, b, Υ(A, b), Y, x).

To start the algorithm of Table 8, called LinPPS2, one needs to accomplish the first three items as in the previous case, then set W := 0, s := 0, t := 0 for the check matrix W and check vectors s and t introduced in §3.3, and initialize the working list L with the record (A, b, Υ(A, b), W, s, t, Y, x).

Parallelization is another important point. Like its nearest relatives, the "branch-and-bound" interval global optimization techniques of [3,4,12], our parameter partition methods for interval systems of equations are readily adapted to parallel processing, but a deeper inquiry into this issue is beyond the scope of the present work.
References

1. Shary, S.P.: Reliab. Comput. 8, 321–419 (2002)
2. Alefeld, G., Herzberger, J.: Introduction to Interval Computations. Academic Press, New York (1983)
3. Hansen, E., Walster, G.W.: Global Optimization Using Interval Analysis. Marcel Dekker, New York (2004)
4. Kearfott, R.B.: Rigorous Global Search: Continuous Problems. Kluwer, Dordrecht (1996)
5. Neumaier, A.: Interval Methods for Systems of Equations. Cambridge University Press, Cambridge (1990)
6. Rohn, J.: Linear Algebra Appl. 126, 39–78 (1989)
7. Shary, S.P.: SIAM J. Numer. Anal. 32, 610–630 (1995)
8. Kreinovich, V., Lakeyev, A.V., Rohn, J., Kahl, P.: Computational Complexity and Feasibility of Data Processing and Interval Computations. Kluwer, Dordrecht (1997)
9. Kearfott, R.B., Nakao, M.T., Neumaier, A., Rump, S.M., Shary, S.P., van Hentenryck, P.: Standardized notation in interval analysis (2002), http://www.nsc.ru/interval/INotation.pdf
10. Beeck, H.: Computing 10, 231–244 (1972) (in German)
11. Nickel, K.: Computing 18, 15–36 (1977) (in German)
12. Ratschek, H., Rokne, J.: New Computer Methods for Global Optimization. Ellis Horwood / Halsted Press, Chichester / New York (1988)
13. Horst, R., Tuy, H.: Global Optimization. Deterministic Approaches. Springer, Berlin (1995)
14. Shary, S.P.: Interval Comput. 2(4), 18–29 (1992)
15. Hansen, E.: On linear algebraic equations with interval coefficients. In: Hansen, E. (ed.) Topics in Interval Analysis, pp. 35–46. Clarendon Press, Oxford (1969)
Comparative Analysis of the SPH and ISPH Methods

K.E. Afanasiev, R.S. Makarchuk, and A.Yu. Popov

Kemerovo State University, Krasnaya st. 6, 650043 Kemerovo, Russia
{afa,mak,a popov}@kemsu.ru

Abstract. Free-surface problems in fluid mechanics are of great applied importance, which is why they attract many researchers. Numerical simulation of such problems using so-called meshless methods is becoming ever more popular. One subset of these methods is the class of meshless particle methods, which use no mesh at any stage of the numerical simulation and therefore allow solving problems with large deformations and with loss of connectivity of the problem domain. These qualities account for the great popularity of such methods in the numerical simulation of free-surface problems. One of the meshless particle methods is Smoothed Particle Hydrodynamics (SPH) [1]. Numerous computations, mainly of free-surface problems, carried out with this method by different researchers have proved its undoubted efficiency in producing high-quality kinematic pictures of both ideal and viscous fluid flows. However, the method has a considerable drawback: it does not produce satisfactory pictures of the pressure distribution. Recently, in order to remedy this drawback, the ISPH (Incompressible Smoothed Particle Hydrodynamics) method has been developed [2,3]; as its name indicates, it is used for simulating incompressible fluid flows. It employs a split-step scheme for the integration of the basic equations of fluid dynamics. A comparative analysis of the computational results obtained by the two methods shows that the ISPH gives somewhat poorer kinematic pictures of fluid flows than the classical SPH; however, it produces much better pictures of the pressure distribution, which will make it possible to compute hydrodynamic loads in the future.
1 Introduction
At present, meshless methods are gaining ever more currency in the field of numerical simulation of free-surface flows. Among them, the subclass of particle methods stands out. These methods require no mesh, neither at the stage of constructing the shape functions nor at the stage of integrating the equations of motion. Their major idea lies in the discretization of the computational domain by a set of Lagrangian particles, which can move freely within the constraints obtained by means of the basic equations of continuum dynamics. Shape functions

E. Krause et al. (Eds.): Comp. Science & High Perf. Computing III, NNFM 101, pp. 206–223, 2008. © Springer-Verlag Berlin Heidelberg 2008, springerlink.com
Table 1. Distinctions between the SPH and ISPH algorithms

                             SPH                       ISPH
Integration scheme           Explicit                  Implicit
Pressure                     Equation of state         Poisson equation
Courant condition            Sound speed               Maximum speed of particles
Density                      Variable                  Constant
Artificial viscosity         Applied                   Not applied
Solid boundary conditions    Lennard-Jones potential   Morris virtual particles
in this approach are constructed anew for each time step, using a different set of nodes (particles). The meshless nature of these methods, together with their easy implementation and application, has made them very popular in the field of numerical simulation of free-surface flows. The most widespread particle methods at the present moment are the smoothed particle hydrodynamics method (SPH) [1,4,5,6,7] and the moving particle semi-implicit method (MPS) [8,9]. Besides, there are many modifications of the SPH, e.g. the RKPM [10] and the MLSPH [11], intended to optimize its approximation properties. This work considers the original SPH and one of its modifications, the Incompressible Smoothed Particle Hydrodynamics (ISPH) [2,3], which, contrary to the original SPH method, allows complying exactly with the incompressibility condition. The most significant differences between the two methods are presented in Table 1.
2  Governing Equations
The basic equations of fluid dynamics for a Newtonian viscous fluid, comprising the Navier-Stokes equations and the continuity equation, are of the following form:
\[ \frac{dv^a}{dt} = F^a - \frac{1}{\rho}\frac{\partial p}{\partial x^a} + \frac{\mu}{\rho}\frac{\partial}{\partial x^b}\left(T^{ab}\right); \qquad (1) \]
\[ \frac{d\rho}{dt} = -\rho\,\frac{\partial v^a}{\partial x^a}, \qquad (2) \]
where a, b = 1, 2, 3 are the numerical indices of the coordinates, v^a the components of the velocity vector, F^a the components of the volumetric force density vector, \delta_{ab} the Kronecker symbol, p and \rho the pressure and density of the fluid, respectively, and \mu the coefficient of dynamic viscosity, while the stress components are calculated by the formula
\[ T^{ab} = \frac{\partial v^a}{\partial x^b} + \frac{\partial v^b}{\partial x^a} - \frac{2}{3}\,\mathrm{div}\,\bar v\cdot\delta_{ab}. \qquad (3) \]
It should be noted that the original SPH method is applied to the simulation of compressible fluid flows, and a certain equation of state is used to close the
K.E. Afanasiev, R.S. Makarchuk, and A.Yu. Popov
equations (1)-(2). For the computation of incompressible fluid flows, the barotropic equation of state in the Tait form is most often applied [4]. By selecting the coefficient of volume expansion in the equation of state, the effect of an incompressible fluid can be obtained; however, certain variations of the particle densities still occur. The ISPH method uses a split-step scheme for the time integration of the equations of motion, which allows the particle pressure values to be calculated by solving the Poisson equation without recourse to an equation of state. Because the model of an incompressible fluid is used, this approach also allows the artificial viscosity terms in the Navier-Stokes equations, needed in the original SPH, to be omitted. Since the ISPH uses the model of an incompressible fluid, the equations (2), (3) reduce to the following formulas:
\[ \frac{\partial v^a}{\partial x^a} = 0; \qquad (4) \]
\[ T^{ab} = \frac{\partial v^a}{\partial x^b} + \frac{\partial v^b}{\partial x^a}. \qquad (5) \]
Using them, the equations of motion (1) can be represented in the following form:
\[ \frac{dv^a}{dt} = F^a - \frac{1}{\rho}\frac{\partial p}{\partial x^a} + \frac{\mu}{\rho}\frac{\partial}{\partial x^b}\left(\frac{\partial v^a}{\partial x^b}\right). \qquad (6) \]

3  Approximation of Functions
The first stage in constructing approximation formulas for the functions occurring in the equations of fluid dynamics is the exact representation of these functions as the integral
\[ f(r) = \int_{-\infty}^{\infty} f(r')\,\delta(r - r')\,dr', \qquad (7) \]
where \delta is the Dirac delta function. Then the \delta-function is substituted by a certain compactly supported function, which yields the integral formula of the function approximation on a bounded domain:
\[ f(r) = \int_{\Omega} f(r')\,W(r - r', h)\,dr', \qquad (8) \]
with the weight function W, often called the kernel function. The value h is the size of the support domain of the function W and is called the smoothing length. At the following stage the problem domain \Omega is discretized by a finite number of Lagrangian particles, and the integral (8) is substituted by the quadrature [4,5,6,7]:
\[ f_s(r_i) = \sum_{j=1}^{n} \frac{m_j}{\rho_j}\, f(r_j)\, W(r_i - r_j, h_i), \qquad (9) \]
Fig. 1. Interaction of particles
where n is the number of particles determined as the nearest neighbours of the i-th particle within the radius h_i. Two particles i and j are called neighbouring, or interacting, particles if the distance between their centres does not exceed h_i + h_j; r_i, m_i, \rho_i are the radius-vector, mass and density of the i-th particle, respectively. Polynomial splines are usually applied as the weight W. Fig. 1 shows that particles 1 and 2 are neighbours of particle i. From (9) it follows that the gradient of the required function is represented by the expression
\[ \nabla f_s(r_i) = \sum_{j=1}^{n} \frac{m_j}{\rho_j}\, f_j\, \nabla W(r_i - r_j, h). \qquad (10) \]
Using the formula (9), for the density approximation we obtain
\[ \rho_i = \sum_{j=1}^{n} m_j\, W(r_i - r_j, h_i). \qquad (11) \]
It should be noted that, contrary to the original SPH, where the density value in the particles is most often determined by solving a discrete analogue of equation (2), in the ISPH method the density is calculated directly by the formula (11).
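As a minimal illustration, the summation density (11) can be sketched in Python, using the 2D cubic-spline kernel that is introduced later in Section 4 (Eq. (12)). The grid size, particle spacing and smoothing length below are illustrative choices, not values from the paper.

```python
import numpy as np

def w_cubic(r, h):
    """2D cubic-spline kernel of Eq. (12), normalization 15/(7*pi*h^2)."""
    q = r / h
    sigma = 15.0 / (7.0 * np.pi * h ** 2)
    if q <= 1.0:
        return sigma * (2.0 / 3.0 - q ** 2 + 0.5 * q ** 3)
    if q <= 2.0:
        return sigma * (2.0 - q) ** 3 / 6.0
    return 0.0

def summation_density(x, m, h):
    """Eq. (11): rho_i = sum_j m_j W(|r_i - r_j|, h), here with a common h."""
    n = len(x)
    rho = np.zeros(n)
    for i in range(n):
        for j in range(n):
            rho[i] += m[j] * w_cubic(np.linalg.norm(x[i] - x[j]), h)
    return rho

# a 9x9 lattice of particles with spacing dx, each carrying mass rho0*dx^2:
dx, rho0 = 0.1, 1000.0
x = np.array([[i * dx, j * dx] for i in range(9) for j in range(9)])
m = np.full(81, rho0 * dx * dx)
rho = summation_density(x, m, 1.2 * dx)
```

An interior particle recovers a density very close to ρ0, while particles on the lattice edge show the neighbour-deficit density drop that is exploited later for free-surface detection.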
4  Kernel Function
For numerical simulation with the considered methods one can apply various kernel functions, from the Gaussian function to splines of different orders. Besides the already known kernel functions, new ones can be developed, but they have to satisfy the following minimum requirements:
– W(\bar r, h) = 0 for \bar r > h;
– \int_{\Omega} W(\bar r, h)\,dr = 1;
– \lim_{h \to 0} W(\bar r, h) = \delta(\bar r),
Fig. 2. Kernel function (left) and first derivative of kernel function (right)
where \bar r = ||r' - r||. Fig. 2 shows the shapes of the kernel function and of its first derivative. Besides the above-mentioned requirements, additional conditions can be imposed on the kernel function to provide better stability of the method and a higher degree of approximation of the functions characterizing the flows. Such additional conditions, and the ways of constructing kernel functions that follow from them, have led, for instance, to the development of the RKPM method. For the problems considered in this work, the cubic spline from [3,12] is applied:
\[ W(\bar r, h) = \frac{15}{7\pi h^2} \begin{cases} 2/3 - q^2 + q^3/2, & 0 \le q \le 1; \\ (2 - q)^3/6, & 1 < q \le 2; \\ 0, & q > 2, \end{cases} \qquad (12) \]
where q = \bar r / h.
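The unity and compact-support conditions of the cubic spline (12) can be checked numerically; the smoothing length and the resolution of the midpoint quadrature below are arbitrary demonstration values.

```python
import numpy as np

def w_cubic(r, h):
    """2D cubic-spline kernel, Eq. (12); vectorized over r."""
    q = np.asarray(r, dtype=float) / h
    sigma = 15.0 / (7.0 * np.pi * h ** 2)
    return sigma * np.where(q <= 1.0, 2.0 / 3.0 - q ** 2 + 0.5 * q ** 3,
                            np.where(q <= 2.0, (2.0 - q) ** 3 / 6.0, 0.0))

h = 0.5
n = 400                                  # midpoint rule on the support [-2h, 2h]^2
step = 4.0 * h / n
c = np.linspace(-2.0 * h + step / 2.0, 2.0 * h - step / 2.0, n)
X, Y = np.meshgrid(c, c)
integral = np.sum(w_cubic(np.hypot(X, Y), h)) * step ** 2   # should be close to 1
```

The spline vanishes identically beyond 2h, so in this convention the "support radius" of the stated requirement W(\bar r, h) = 0 corresponds to twice the smoothing length.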
5  Artificial Viscosity
For additional stability of the original SPH, an additional term called artificial viscosity is added to the right-hand side of the Navier-Stokes equations of motion [4,7,13]:
\[ \Pi_{ij} = \begin{cases} \dfrac{\mu_{ij}(\beta\mu_{ij} - \alpha\bar c)}{\tfrac{1}{2}(\rho_i + \rho_j)}, & \sum\limits_{d=1}^{\nu} (v_i^d - v_j^d)(x_i^d - x_j^d) < 0; \\ 0, & \text{otherwise}, \end{cases} \qquad (13) \]
where \nu is the dimension of the problem domain, and \mu_{ij} is computed by the formula
\[ \mu_{ij} = \sum_{d=1}^{\nu} \frac{h\,(v_i^d - v_j^d)(x_i^d - x_j^d)}{(x_i^d - x_j^d)^2 + \varepsilon^2 h^2}; \qquad (14) \]
\[ \bar c = \frac{1}{2}(c_i + c_j), \quad h = \frac{1}{2}(h_i + h_j), \quad \alpha = \beta = 1, \quad \varepsilon = 0.1. \qquad (15) \]
Since the ISPH method was developed for incompressible fluids, it does not need any artificial viscosity term to be added to the approximated equations of motion.
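A direct transcription of Eqs. (13)-(15) for a single particle pair can look as follows. Note that the component-wise denominator follows (14) as printed here; the more common Monaghan form uses |x_i - x_j|^2 in the denominator instead, so this is a sketch of the formula as stated, not a canonical implementation.

```python
import numpy as np

def artificial_viscosity(xi, xj, vi, vj, rho_i, rho_j, ci, cj, hi, hj,
                         alpha=1.0, beta=1.0, eps=0.1):
    """Pi_ij of Eq. (13): zero for receding pairs, dissipative otherwise."""
    dx = np.asarray(xi, float) - np.asarray(xj, float)
    dv = np.asarray(vi, float) - np.asarray(vj, float)
    if np.dot(dv, dx) >= 0.0:          # particles receding: no artificial viscosity
        return 0.0
    h = 0.5 * (hi + hj)                # pair-averaged quantities, Eq. (15)
    cbar = 0.5 * (ci + cj)
    mu = np.sum(h * dv * dx / (dx ** 2 + (eps * h) ** 2))   # Eq. (14)
    return mu * (beta * mu - alpha * cbar) / (0.5 * (rho_i + rho_j))
```

For an approaching pair \mu_{ij} < 0, so \Pi_{ij} > 0 acts like an extra pressure that resists interpenetration; for a receding pair the term is switched off.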
6  Pressure and Viscosity
For the approximation of the Poisson equation one could apply the SPH formulas for the gradient of a function and for the divergence of a vector field sequentially. However, since the approximation of the second-order derivative obtained in this way is too sensitive to particle disorder, an approximation combining the SPH first derivative with its finite-difference analogue is usually applied [3]:
\[ \left(\frac{1}{\rho}\nabla\cdot\frac{\nabla p}{\rho}\right)_i = \sum_{j=1}^{n} m_j\, \frac{8}{(\rho_i + \rho_j)^2}\, \frac{(p_i - p_j)\,(r_i - r_j)\cdot\nabla_i W(r_i - r_j, h_i)}{\|r_i - r_j\|^2}. \qquad (16) \]
Other formulas for the approximation of the Poisson equation can also be obtained; in particular, the following springs from the work [14]:
\[ \left(\frac{1}{\rho}\nabla\cdot\frac{\nabla p}{\rho}\right)_i = \sum_{j=1}^{n} m_j\, \frac{2}{\rho_i \rho_j}\, \frac{(p_i - p_j)\,(r_i - r_j)\cdot\nabla_i W(r_i - r_j, h_i)}{\|r_i - r_j\|^2}. \qquad (17) \]
The calculations of the model problems have been executed using both formula (16) and formula (17); however, no essential differences in the obtained results have been discovered. The formula for the approximation of the viscous forces in the ISPH is obtained in a similar way and takes the form
\[ \left(\frac{\mu}{\rho}\nabla^2 \bar v\right)_i = \sum_{j=1}^{n} m_j\, \frac{4}{(\rho_i + \rho_j)^2}\, \frac{(\mu_i + \mu_j)\,(r_i - r_j)\cdot\nabla_i W(r_i - r_j, h_i)}{\|r_i - r_j\|^2}\, (\bar v_i - \bar v_j). \qquad (18) \]
The pressure gradient term in equation (1) can also be approximated in different ways. One could directly apply the SPH approximation formulas to this term, but it has been found that the most stable approximation is obtained using the formula
\[ \frac{\nabla p}{\rho} = \nabla\left(\frac{p}{\rho}\right) + \frac{p}{\rho^2}\,\nabla\rho. \qquad (19) \]
Applying the SPH approximation formulas to the right-hand side of (19) yields the following approximation of the pressure gradient:
\[ \left(\frac{\nabla p}{\rho}\right)_i = -\sum_{j=1}^{n} m_j \left(\frac{p_i}{\rho_i^2} + \frac{p_j}{\rho_j^2}\right) \nabla_i W(r_i - r_j, h_i). \qquad (20) \]
Because of its symmetric form, this approximation of the pressure gradient ensures the fulfilment of the total momentum conservation law.
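The conservation property of (20) follows because the bracket is symmetric in i and j while ∇_iW is antisymmetric, so pairwise contributions cancel. A sketch verifying this numerically (the cubic-spline gradient corresponds to Eq. (12); the particle configuration and constants are arbitrary):

```python
import numpy as np

def grad_w(rij, h):
    """Gradient of the cubic-spline kernel (12) with respect to r_i."""
    r = np.linalg.norm(rij)
    if r == 0.0 or r >= 2.0 * h:
        return np.zeros_like(rij)
    q = r / h
    dwdq = -2.0 * q + 1.5 * q ** 2 if q <= 1.0 else -0.5 * (2.0 - q) ** 2
    return (15.0 / (7.0 * np.pi * h ** 2)) * dwdq / h * rij / r

def pressure_acceleration(x, m, rho, p, h):
    """Symmetric pressure-gradient term, Eq. (20)."""
    a = np.zeros_like(x)
    for i in range(len(x)):
        for j in range(len(x)):
            if i != j:
                a[i] -= m[j] * (p[i] / rho[i] ** 2 + p[j] / rho[j] ** 2) \
                        * grad_w(x[i] - x[j], h)
    return a

rng = np.random.default_rng(1)
x = rng.random((8, 2))              # 8 particles in the unit square
m = np.full(8, 0.5)
rho = np.full(8, 1000.0)
p = rng.random(8) * 1.0e4
a = pressure_acceleration(x, m, rho, p, 0.5)
total_momentum_rate = (m[:, None] * a).sum(axis=0)   # should vanish
```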
Based on the considered approximations, the equations (1)-(2) can be rewritten in the following form for the original SPH:
\[ \frac{dv_i^a}{dt} = F_i^a - \sum_{j=1}^{n} m_j \left(\frac{p_i}{\rho_i^2} + \frac{p_j}{\rho_j^2} + \Pi_{ij}\right) \nabla W^a(r_i - r_j, h_j) + \sum_{j=1}^{n}\sum_{d=1}^{\nu} m_j \left(\frac{\mu_i T^{da}_i}{\rho_i^2} + \frac{\mu_j T^{da}_j}{\rho_j^2}\right) \nabla W^d(r_i - r_j, h_j); \qquad (21) \]
\[ \frac{d\rho_i}{dt} = \sum_{j=1}^{n} m_j \sum_{d=1}^{\nu} (v_i^d - v_j^d)\, \nabla W^d(r_i - r_j, h_j), \qquad (22) \]
where p_i, \rho_i, \mu_i are the pressure, density and coefficient of dynamic viscosity of particle i, respectively. The normal and tangential components of the viscous stress tensor T^i are defined by the following expressions:
\[ T^i_{aa} = \sum_{j=1}^{n} \frac{m_j}{\rho_j} \left[ 2\,(v_j^a - v_i^a)\,\nabla W^a(r_i - r_j, h_j) - (v_j^b - v_i^b)\,\nabla W^b(r_i - r_j, h_j) \right]; \qquad (23) \]
\[ T^i_{ab} = \sum_{j=1}^{n} \frac{m_j}{\rho_j} \left[ (v_j^a - v_i^a)\,\nabla W^b(r_i - r_j, h_j) - (v_j^b - v_i^b)\,\nabla W^a(r_i - r_j, h_j) \right]. \qquad (24) \]

7  Model of Incompressibility in the SPH
Initially, the SPH method was applied only to modelling strongly compressible media. Incompressibility, or weak compressibility, is achieved by the choice of a suitable equation of state for the pressure. The following equation is, as a rule, applied to the simulation of liquid dynamics [4]:
\[ p = B\left[\left(\frac{\rho}{\rho_0}\right)^{\gamma} - 1\right], \qquad (25) \]
where \rho_0 is the initial density of the fluid and \gamma = 7 the adiabatic index. For problems of the collapse of a heavy fluid column, the constant B is chosen according to the expression
\[ B = \frac{200\,\rho_0\, g\,(H - y)}{\gamma}, \qquad (26) \]
where g is the gravity acceleration, H the initial height of the fluid column and y the vertical coordinate of the particles. With the constant B chosen by formula (26), the particle densities during the calculation differ from the initial density \rho_0 by no more than 2-3%, which allows the fluid to be treated as weakly compressible.
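A sketch of the weakly-compressible closure (25)-(26); the density ratio used in the demonstration is an arbitrary example value.

```python
def tait_pressure(rho, rho0, B, gamma=7.0):
    """Eq. (25): p = B((rho/rho0)^gamma - 1)."""
    return B * ((rho / rho0) ** gamma - 1.0)

def tait_B(rho0, g, H, y, gamma=7.0):
    """Eq. (26): B = 200*rho0*g*(H - y)/gamma."""
    return 200.0 * rho0 * g * (H - y) / gamma

B = tait_B(rho0=1000.0, g=9.81, H=1.0, y=0.0)
p_rest = tait_pressure(1000.0, 1000.0, B)   # zero at the reference density
p_2pct = tait_pressure(1020.0, 1000.0, B)   # a 2% compression
```

With \gamma = 7 even a 2-3% density variation already produces a large pressure response, which is exactly what keeps the flow nearly incompressible.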
8  Time Integration
In the SPH method the "predictor-corrector" scheme is used for time integration.
"Predictor":
\[ \rho_i^{n-1/2} = \rho_i^{n-1} + \frac{\Delta t}{2}\frac{d\rho_i^{n-1}}{dt}; \qquad v_i^{n-1/2} = v_i^{n-1} + \frac{\Delta t}{2}\frac{dv_i^{n-1}}{dt}. \qquad (27) \]
"Corrector":
\[ \rho_i^{n+1/2} = \rho_i^{n-1/2} + \Delta t\,\frac{d\rho_i^{n}}{dt}; \qquad v_i^{n+1/2} = v_i^{n-1/2} + \Delta t\,\frac{dv_i^{n}}{dt}; \qquad x_i^{n+1} = x_i^{n} + \Delta t\, v_i^{n+1/2}. \qquad (28) \]
For the time integration of the equations of motion in the ISPH, the split-step scheme is applied. Initially, only the effect of the mass forces and viscous forces upon the particles is considered, which yields preliminary values of the densities, velocities and coordinates of the particles. At this stage the pressure is not taken into account, and in the formula (3), due to the incompressibility of the fluid, the term with the velocity divergence disappears. These restrictions lead to the equations
\[ \Delta\bar v^{*} = \left(\bar g + \frac{\mu}{\rho}\nabla^2 \bar v\right)\Delta t; \qquad \bar v^{*} = \bar v^{n} + \Delta\bar v^{*}; \qquad \bar r^{*} = \bar r^{n} + \bar v^{*}\,\Delta t. \qquad (29) \]
Here \Delta\bar v^{*} is the velocity change caused by the mass and viscous forces, \bar v^{*} and \bar r^{*} are the preliminary values of the velocity vector and of the radius-vector of the particle centres, respectively, and \bar v^{n} and \bar r^{n} are their values on the n-th time step. Using the obtained preliminary values of the physical characteristics, we determine the intermediate values of the particle densities \rho_i^{*} by the formula (11). The deviations of the obtained particle densities from their initial values are used later to enforce the incompressibility condition, and the final values of the characteristics on the (n+1)-th time step are determined by the following formulas:
\[ \Delta\bar v^{**} = -\frac{1}{\rho^{*}}\nabla p^{n+1}\,\Delta t; \qquad \bar v^{n+1} = \bar v^{*} + \Delta\bar v^{**}; \qquad \bar r^{n+1} = \bar r^{n} + \frac{\bar v^{n} + \bar v^{n+1}}{2}\,\Delta t. \qquad (30) \]
At this stage the pressure Poisson equation has to be solved, which is of the following form [3]:
\[ \left(\nabla\cdot\frac{1}{\rho^{*}}\nabla p\right)_i = \frac{\rho_0 - \rho_i^{*}}{\rho_0\,\Delta t^2}. \qquad (31) \]
This equation is finally reduced to a system of linear algebraic equations, and the conditions on the free surface, represented here by the Dirichlet condition, are introduced into the coefficient matrix of the resulting system on the same principle as in the finite element method [15]. The time step is chosen according to the Courant condition [5,3]:
\[ \Delta t \le C\, \frac{\min_i(h_i)}{\max_i(c_i + v_i)} \;\; (SPH), \qquad \Delta t \le C\, \frac{h}{\max_i(v_i)} \;\; (ISPH), \qquad (32) \]
where c_i is the speed of sound for particle i and C \in (0, 1) is the Courant constant. In the calculations C \in [0.25, 0.5] was used.
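The two stability limits in (32) in code form; the particle data used in the demonstration are arbitrary.

```python
def dt_sph(h_list, c_list, v_list, C=0.25):
    """SPH limit of Eq. (32): dt <= C * min_i h_i / max_i (c_i + v_i)."""
    return C * min(h_list) / max(c + v for c, v in zip(c_list, v_list))

def dt_isph(h, v_list, C=0.25):
    """ISPH limit of Eq. (32): dt <= C * h / max_i v_i."""
    return C * h / max(v_list)

dt1 = dt_sph([0.01, 0.02], [10.0, 10.0], [1.0, 2.0], C=0.5)
dt2 = dt_isph(0.01, [1.0, 2.0], C=0.5)
```

The SPH step is governed by the (artificial) sound speed, while the ISPH step depends only on the particle velocities, which is why the ISPH can take much larger time steps at the same resolution.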
9  Solid Boundary Conditions
In the original SPH method, the most frequently used way of imposing conditions at solid boundaries is the application of "virtual" particles, which are divided into two types. The first type are Monaghan virtual particles [4]. These particles are located along the solid boundary in a single line, do not change their characteristics in time, and act on the fluid particles by means of a certain interaction potential. The most popular among researchers is the Lennard-Jones potential, though this choice is not imperative; in his work [6] Monaghan proposes a new interaction potential taking into account the peculiarities of the SPH method. The most frequently used potential in the smoothed particle method is the Lennard-Jones potential:
\[ U(r) = \frac{D}{r}\left[\left(\frac{r_0}{r}\right)^{12} - \left(\frac{r_0}{r}\right)^{6}\right], \qquad (33) \]
where D is the depth of the potential well and r_0 the distance at which the interaction potential vanishes. For r < r_0 the force is repulsive, otherwise attractive. This potential enters the momentum equation as an additional force, obtained by applying the SPH approximations:
\[ U_i^a = \sum_{j=1}^{n} \frac{D\,(x_i^a - x_j^a)}{r^2}\left[\left(\frac{r_0}{r}\right)^{12} - \left(\frac{r_0}{r}\right)^{6}\right]. \qquad (34) \]
For simplicity the potential (34) is calculated only for the repulsive forces; the attractive forces at r > r_0 can be neglected. The second type are Morris virtual particles [5]. These particles are located along the solid boundary in several lines. The number of the lines depends on
the smoothing length of the fluid particles. This mitigates one of the main problems of the SPH method: the asymmetry of the kernel function near boundaries. The effect of the Morris particles on the fluid particles differs from that of the Monaghan particles in that no interaction potential is needed. Instead, the values of the characteristics in the Morris particles are calculated on the basis of their values in the fluid particles. In the ISPH method, the Morris virtual particles are used for imposing the solid boundary conditions. In the work [3] an approach is presented which is applied in solving problems by the MPS method [8,9]: for the solution of the pressure Poisson equation, the Neumann condition is imposed on the solid boundaries, specifying the equality of pressures in the particles located along the normal to the solid boundary. In this work 3 lines of "virtual" particles are applied for the numerical simulation. The pressure Poisson equation is solved using only one line of "virtual" particles, and only one line is also used in the gradient formula (20). The other two lines are necessary to keep the particle densities near the solid boundary the same as for the inner particles.
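The repulsive boundary force (33)-(34) exerted by the Monaghan virtual particles can be sketched as follows; the wall geometry and the constants D, r_0 are illustrative values, not taken from the paper.

```python
import numpy as np

def boundary_force(xi, wall_particles, D, r0):
    """Eq. (34), repulsive branch only: force on the fluid particle at xi
    from the Monaghan virtual particles; contributions vanish for r >= r0."""
    f = np.zeros(2)
    for xb in wall_particles:
        d = np.asarray(xi, float) - np.asarray(xb, float)
        r = np.linalg.norm(d)
        if 0.0 < r < r0:
            f += D * ((r0 / r) ** 12 - (r0 / r) ** 6) * d / r ** 2
    return f

wall = [np.array([i * 0.005, 0.0]) for i in range(-3, 4)]   # a short wall segment
f_near = boundary_force(np.array([0.0, 0.004]), wall, D=1.0, r0=0.01)
f_far = boundary_force(np.array([0.0, 0.05]), wall, D=1.0, r0=0.01)
```

A particle hovering just above the wall is pushed away (positive vertical force component), while a particle farther than r_0 from every wall particle feels nothing.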
10  Free Surface Conditions
Different ways can be applied for the identification of the particles belonging to the free surface. One of them is the Dilts algorithm [11], based on the fact that each particle has a size, which in the ISPH method is determined by the smoothing length. However, most often, in the MPS method as well as in the ISPH, the particles belonging to the free surface are identified by their densities. This approach is based on the fact that the density of particles belonging to the free surface is less than the density of inner fluid particles, because of the lack of neighbours on one side of the boundary. For the solution of the pressure Poisson equation, the Dirichlet condition p = 0 is imposed in the particles belonging to the free boundary. This condition is also applied in the calculation of the pressure gradient used in the formula (20). It is done as follows. For each particle belonging to the free surface, all its nearest neighbours are determined. Evidently, these neighbours are only inner particles of the fluid, since there are no particles above the free surface. To keep the symmetry of the support domain of the kernel function for the particles belonging to the free surface, additional "ghost" particles are introduced above the free surface. They are located symmetrically to the inner fluid particles about the free surface, and the pressure in them is opposite in sign. In this work the approach used in [3] is applied.
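The density criterion for surface detection reduces to a one-line test. The threshold coefficient 0.99 below is an illustrative assumption — the text does not specify its value.

```python
import numpy as np

def free_surface_particles(rho, rho0, threshold=0.99):
    """Flag particles whose summation density falls below threshold*rho0,
    i.e. particles with a neighbour deficit on one side."""
    return np.flatnonzero(np.asarray(rho) < threshold * rho0)

# interior particles recover ~rho0; near-surface particles show a deficit:
rho = np.array([1001.2, 999.8, 1000.4, 952.0, 868.5])
surface = free_surface_particles(rho, rho0=1000.0)
```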
11  Nearest Neighbours Search
Since the nearest neighbours search is carried out at each time step and is computationally expensive, efficient search algorithms are necessary. In our calculations we used a grid algorithm.
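A minimal sketch of the cell-linked grid algorithm against the O(n²) direct search: with the cell size equal to the interaction radius, only the 3×3 block of surrounding cells needs checking, which gives the linear scaling discussed below. The point count and radius are arbitrary test values.

```python
import numpy as np
from collections import defaultdict

def direct_search(x, radius):
    """O(n^2) all-pairs neighbour search."""
    n = len(x)
    nbrs = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(x[i] - x[j]) <= radius:
                nbrs[i].add(j)
                nbrs[j].add(i)
    return nbrs

def grid_search(x, radius):
    """Cell-linked list: bin particles into cells of side = radius, then
    test only the particles in the 3x3 block of neighbouring cells."""
    cells = defaultdict(list)
    for i, (px, py) in enumerate(x):
        cells[int(px // radius), int(py // radius)].append(i)
    nbrs = [set() for _ in range(len(x))]
    for (cx, cy), members in cells.items():
        for i in members:
            for ox in (-1, 0, 1):
                for oy in (-1, 0, 1):
                    for j in cells.get((cx + ox, cy + oy), ()):
                        if j != i and np.linalg.norm(x[i] - x[j]) <= radius:
                            nbrs[i].add(j)
    return nbrs

rng = np.random.default_rng(0)
pts = rng.random((300, 2))        # random cloud in the unit square
```

Both routines return identical neighbour sets; only the amount of work differs.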
Fig. 3. Direct search
Fig. 4. Grid algorithm
Fig. 5. Acceleration of computations
To assess the efficiency of the neighbouring particle search, we compared the direct search with the grid algorithm. Test calculations were carried out on a uniprocessor system (AMD Athlon 2000+, 512 Mb RAM, Windows XP SP2) in Fortran PowerStation 4.0. The time of the neighbour search, depending on the number of particles, was measured for 1000 time steps. As seen from Figs. 3 and 4, the dependence is quadratic for the direct search and linear for the grid algorithm, which corresponds to the theoretical estimates. The dependence of the acceleration of the neighbour search procedure on the number of particles is shown in Fig. 5.
12  Parallelization
For many problems, simulation with the ISPH is restricted by the capabilities of a single computer. However, thanks to the appearance of new, more powerful computers and clusters, and to the organization of high-performance computations on them, the class of problems which can be simulated using the ISPH can be expanded. When simulating problems on clusters, the algorithm runs as an MPI application. MPI is a package of subprograms providing communication between processes running on different CPUs. One of the weakest places in the ISPH algorithm is the necessity to solve high-dimensional systems of linear algebraic equations, which appear when approximating the pressure Poisson equation. At first the algebraic system was solved using the standard Gauss procedure; then the serial algorithm was parallelized. To demonstrate the efficiency of the obtained parallel algorithm, test computations were performed; the dimensions of the matrices varied from 200 to 4600. For the comparison of the computation time of the serial and parallel algorithms, the coefficients of speedup and efficiency are used:
\[ S_m = \frac{T_1}{T_m}, \qquad E_m = \frac{S_m}{m}, \]
where T_m is the computation time of the parallel algorithm on the cluster with m (m > 1) processors, and T_1 is the computation time of the serial algorithm. The parallel computations were carried out on a cluster consisting of nodes with the following characteristics: Intel Pentium 4 2.80 GHz, 1 Gb RAM, Red Hat Linux, with the Intel Fortran compiler and an MPI package. Fig. 6 presents the dependencies between the dimensions of the matrices and the computation time for different numbers of processors. Fig. 7 presents the dependencies between the dimensions of the matrices and the acceleration of the computations. The dependencies between the dimensions of the matrices and the efficiency of the parallel algorithm are presented in Fig. 8.
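The speedup and efficiency coefficients in code form, applied to the 4409-particle timings reported in Table 2 (serial Gauss vs. the 32-CPU parallel Gauss solver):

```python
def speedup_and_efficiency(t_serial, t_parallel, m):
    """S_m = T_1 / T_m and E_m = S_m / m."""
    s = t_serial / t_parallel
    return s, s / m

# timings for 4409 particles from Table 2: 43.125 s serial, 1.361 s on 32 CPUs
S, E = speedup_and_efficiency(43.125, 1.361, 32)
```

The efficiency comes out near unity, i.e. the Gauss elimination parallelizes almost perfectly at this matrix size.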
The developed parallel implementation of the Gauss solver for systems of linear algebraic equations (SLAE) yields a considerable acceleration of the computations, as can be seen from Table 2. Also
Fig. 6. Dependencies between the dimensions of matrices and computation time for different number of processors
Fig. 7. Dependencies between the dimensions of matrices and acceleration of computations for different number of processors
it gives the opportunity to use many more particles in the computations, to obtain more accurate results. Nevertheless, the uniprocessor PGMRES solver is still considerably faster than the parallel Gauss solver. Moreover, the full computation time of the ISPH method (Table 3) remains rather high. Therefore
Fig. 8. Dependencies between the dimensions of matrices and efficiency of parallel algorithm for different number of processors

Table 2. Computation time of different SLAE solvers (in sec.) for one time step

Particles number   PGMRES   Gauss procedure   Parallel Gauss procedure (32 CPU)
1409               0.032    0.984             0.031
2009               0.047    3.421             0.108
2709               0.094    8.625             0.272
3509               0.203    20.203            0.638
4409               0.719    43.125            1.361

Fig. 9. Dam breaking problem at t = 0.0s (left - SPH, right - ISPH)
the parallelization of the PGMRES algorithm is planned for future work. The defect-correction multigrid method [2] may prove even more efficient. Further optimization of both presented methods lies in the parallelization of the grid algorithm for the nearest neighbours search.
13  Model Problem. Dam Breaking

The equations (1)-(2) are solved. At the initial moment t = 0 the viscous fluid column collapses under the influence of gravity. The following values of the physical characteristics are used for this problem: ρ = 1000 kg/m³ (fluid density), μ = 1 kg/(m·s) (dynamic viscosity). Figs. 9-14 present flow pictures at
Fig. 10. Dam breaking problem at t = 0.21s (left - SPH, right - ISPH)
Fig. 11. Dam breaking problem at t = 0.65s (left - SPH, right - ISPH)
Table 3. Full computation time (in sec.) of the SPH and ISPH methods with different number of particles (2000 time steps)

Particles number   SPH    ISPH
1409               12.2   52.2
2009               17.4   111.4
2709               23.4   211.4
3509               30.2   436.2
4409               38.4   1476.4
Fig. 12. Dam breaking problem at t = 1.31s (left - SPH, right - ISPH)
Fig. 13. Dam breaking problem at t = 1.5s (left - SPH, right - ISPH)
Fig. 14. Dam breaking problem at t = 1.64s (left - SPH, right - ISPH)
different moments of time; the number of fluid particles in the calculations is 1600. Numerical results obtained for this problem with the SPH and MPS methods are provided in the work [16]. The computation time of the SPH and ISPH methods with different numbers of particles is presented in Table 3.
14  Conclusion
The problems presented in this work in a two-dimensional formulation reveal the basic advantages and capabilities of the SPH and ISPH methods. The shown results demonstrate the wide applicability of the methods to the simulation of problems with large deformations of the problem domain. However, the presented methods have the following disadvantages: applying the equation of state for the pressure field calculation in the original SPH does not allow computing dynamic loads, and in the results obtained with the ISPH a particle recession can be observed after the fluid impact on a solid wall. Nevertheless, the SPH shows pronounced kinematics of the flows, and the ISPH, thanks to the use of the pressure Poisson equation, can be applied to the calculation of dynamic loads.
References

1. Gingold, R.A., Monaghan, J.J.: Mon. Not. Royal Astron. Soc. 181, 375-389 (1977)
2. Cummins, S.J., Rudman, M.: J. Comp. Phys. 152, 584-607 (1999)
3. Shao, S., Lo, E.Y.M.: Adv. Water Res. 26, 787-800 (2003)
4. Monaghan, J.J., Thompson, M.C., Hourigan, K.: J. Comput. Phys. 110, 399-406 (1994)
5. Morris, J.P., Fox, P.J., Zhu, Y.: J. Comput. Phys. 136, 214-226 (1997)
6. Monaghan, J.J.: Rep. Prog. Phys. 68, 1703-1759 (2005)
7. Liu, G.R., Liu, M.B.: Smoothed particle hydrodynamics: a meshfree particle method. World Scientific, Singapore (2003)
8. Koshizuka, S., Tamako, H., Oka, Y.: Comp. Fluid Dyn. 4(1), 29-46 (1995)
9. Koshizuka, S., Nobe, A., Oka, Y.: Int. J. Numer. Methods Fluids 26, 751-769 (1998)
10. Liu, W.K., Jun, S., Sihling, D.T., Chen, Y., Hao, W.: Int. J. Numer. Methods Fluids 24, 1-25 (1997)
11. Dilts, G.A.: Int. J. Numer. Methods Eng. 48, 1503-1524 (2000)
12. Monaghan, J.J., Lattanzio, J.C.: Astron. Astrophys. 149, 135-143 (1985)
13. Monaghan, J.J., Poinracic, J.: Appl. Numer. Math. 1, 187-194 (1985)
14. Brookshaw, L.A.: Proc. ASA 6, 207-210 (1985)
15. Connor, J.J., Brebbia, C.A.: Finite element techniques for fluid flow. Sudostroenie, Leningrad (in Russian) (1979)
16. Afanasiev, K.E., Iliasov, A.E., Makarchuk, R.S., Popov, A.Yu.: Comput. Technologies 11, 26-44 (in Russian) (2006)
SEGL: A Problem Solving Environment for the Design and Execution of Complex Scientific Grid Applications
N. Currle-Linde, M. Resch, and U. Küster
High Performance Computing Center Stuttgart (HLRS), University of Stuttgart, Nobelstraße 19, 70569 Stuttgart, Germany
(linde,resch,kuester)@hlrs.de
Abstract. The design and execution of complex scientific applications in the Grid is a difficult and work-intensive process which can be simplified and optimized by the use of an appropriate tool for the creation and management of the experiment. We propose SEGL (Science Experimental Grid Laboratory) as a problem solving environment for the optimized design and execution of complex scientific Grid applications. SEGL utilizes GriCoL (Grid Concurrent Language), a simple and efficient language for the description of complex Grid experiments.
1  Introduction

The development of Grid technology [1] provides the technical possibilities for the comprehensive analysis of large data volumes in practically all application areas. As a consequence, scientists and engineers are producing an increasing number of complex applications which make use of distributed Grid resources. However, the existing Grid services do not allow scientists to design complex applications on a high level of organization [2]. For this purpose they require an integrated interface which makes it possible to automate the creation, starting and monitoring of complex experiments and to support their execution with the existing resources. This interface must be designed in such a way that it does not require the user to have knowledge of the Grid structure or of a programming language. In this paper, we propose SEGL (Science Experimental Grid Laboratory) as a problem solving environment for the design and execution of complex scientific Grid applications. SEGL utilizes GriCoL (Grid Concurrent Language) [3], a simple and efficient language for the description of complex Grid experiments.

1.1  Existing Tools for Parameter Investigation Studies
There have been some efforts to implement such tools, e.g. Nimrod [4] and ILab [4]. These tools are able to generate parameter sweeps and jobs, run them in a distributed computing environment (Grid) and collect the data. ILab also allows the calculation of multi-parametric models in independent separate tasks
in a complicated workflow for multiple stages. However, none of these tools is able to perform the task dynamically by generating new parameter sets via an automated optimization strategy, as is needed for handling complex parameter problems.

1.2  Dynamic Parameterization
Complex parameter studies can be facilitated by allowing the system to dynamically select parameter sets on the basis of previous intermediate results. This dynamic parameterization capability requires an iterative, self-steering approach. Possible strategies for the dynamic selection of parameter sets include genetic algorithms, gradient-based searches in the parameter space, and linear and nonlinear optimization techniques. An effective tool must support the creation of applications of any degree of complexity, including unlimited levels of parameterization, iterative processing, data archiving, logical branching, and the synchronization of parallel branches and processes. The parameterization of data is an extremely difficult and time-consuming process. Moreover, users are very sensitive to the level of automation during application preparation. They must be able to define a fine-grained logical execution process, to identify the position in the input data of the parameters to be changed during the course of the experiment, and to formulate parameterization rules. Other details of the parameter study generation are best hidden from the user. SEGL (Science Experimental Grid Laboratory) aims to overcome the above limitations of existing systems. The new technology has a sufficient level of abstraction to enable a user without knowledge of the Grid or of parallel programming to efficiently create complex modelling experiments and execute them with the most efficient use of the available Grid resources. This paper is organized as follows: in the third section we describe the properties and principles of organization of the parallel language GriCoL for the description of Grid experiments; the fourth section presents the architecture of SEGL, namely the Experiment Designer and the Runtime System.
2  Science Experimental Grid Laboratory (SEGL)
Science Experimental Grid Laboratory (SEGL) is a problem solving environment which enables the automated creation, start and monitoring of complex experiments and supports their effective execution on the Grid. Figure 1 shows the system architecture of SEGL at a conceptual level. It consists of three main components: the User Workstation (Client), the ApplicationServer (Server) and the ExpDBServer (OODB). The system operates according to a client-server model in which the ApplicationServer interacts with remote target computers using a Grid middleware [6,7]. The implementation is based on the Java 2 Platform Enterprise Edition (J2EE) specification and the JBoss Application Server. The database used is an object-oriented database (OODB) with a library tailored to the application domain of the experiment.
Fig. 1. Conceptual Architecture of SEGL
The two key parts of SEGL are the Experiment Designer (ExpDesigner), used for the design of experiments by working with elements of GriCoL, and the runtime system (ExpEngine).
3  Grid Concurrent Language
From the user's perspective, complex experiment scenarios are realized in the Experiment Designer using GriCoL to represent the experiment. GriCoL is a graphical language of mixed type, based on a component model. The main elements of this language are components, which have a defined internal structure and interact with each other through a defined set of interfaces. In addition, language components can have structured dialog windows through which additional operators can be written into these elements. The language is of an entirely parallel nature: GriCoL provides parallel processing of many data sets at all levels, i.e. inside simple language components, at the level of more complex language structures, and for the entire experiment.
In general, the possibility of parallel execution of operations in all nodes of the experiment program is unlimited. The language is based on the principle of wrapping functionality in components in order to utilize the capacities of supercomputer applications and to enable interaction with other language elements and structures. Program codes for parallel machines, which may be written in any language, are wrapped in a standard language capsule. Another property of the language is its extensibility: new functional components can be added to the language library. A further property of the language is that it is multi-tiered. The multi-tiered model of organization enables the user, when describing the experiment, to concentrate primarily on the common logic of the experiment program and subsequently on the description of the individual fragments of the program. GriCoL has a two-layer model for the description of the experiment and an additional sub-layer for the description of the repository of the experiment. The top level of the experiment program, the control flow level, is intended for the description of the logical stages of the experiment; its main elements are blocks and connection lines. The lower level, the data flow level, provides a detailed description of the components of the top level; its main elements are program modules and database areas. The repository sub-layer provides a common description of the database.

3.1  Control Flow Level
To describe the logic of the experiment, the control flow level offers different types of blocks: solver blocks, condition blocks, merge/synchro blocks and message blocks. Solver blocks represent the nodes of data processing. Control blocks are either nodes of data analysis or nodes for the synchronization of data computation processes. They evaluate results and then choose a path for further experiment development. Another important language element on the control flow level is the connection line. Connection lines indicate the sequence of execution of blocks in the experiment. There are two mechanisms of interaction between blocks. If the connection line is red (solid), control is passed to the next block one time only, i.e. after all runs in the previous block have finished. If the connection line is blue (dashed), control is transferred to the next block many times, i.e. after each computation of an individual data set has been completed. In accordance with the color of the input and output lines, the blocks interact with each other in two modes: batch and pipeline. Figure 2(a) shows an example of an experiment program at the control flow level. The start block begins a parallel operation in solver blocks B.01 and B.02. After execution of B.02, processes begin in solver blocks B.03 and B.04. Each data set which has been computed in B.04 is evaluated in control block B.05. The data sets meeting the criterion are selected for further computation. These operations are repeated until all data sets from the output of B.04 have been evaluated. The data sets selected in this way are synchronized by a merge/synchro
228
N. Currle-Linde, M. Resch, and U. Küster
Fig. 2. Experiment Program
block with the corresponding data sets of the other inputs of B.06. The final computation takes place in solver block B.07.

3.2
Data Flow Level
Programming on the data flow level is illustrated in Figure 2(b). The solver block consists of computation (C), replacement (R) and parameterization (P) modules and a database (DB). Computation modules are functional programs
which organize and execute data processing on the Grid resources. The programs of the computation modules consist of two parts: the resident program, which is executed on the application server, and the remote program, which is executed on the Grid resources. Each computation module has one resident program but can have several remote programs, if the executable codes of the programs on the remote computers differ from each other. On the one hand, the resident program supports the interaction with other language modules; on the other hand, it starts the copies of its remote programs on the target machines. In addition, it organizes the preparation of the input data sets for the computation and writes the results into the experiment database. Parameterization modules are modules which generate parameter values. Replacement modules are modules for remote parameterizations. A more detailed description of the working of the solver modules is given in [3]. Figure 2(b) shows an example of a solver block program. A typical example of a solver block program is a modeling program which cyclically computes a large number of input data sets. In this example, three variants of parameterization are represented: (a) direct transmission of the parameter values with the job (Parameter 3); (b) parameterized objects are large arrays of information (Parameter 4) which are kept in the experiment database; (c) the parameters of jobs are large arrays of information which are modified by the replacement module (M.02.03) for each run with new values generated by the parameter modules (M.02.01 and M.02.02). A more detailed description can be found in [3].
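The three parameterization variants can be sketched in a few lines of Python. This is an illustrative model only, not SEGL code: the template markers and the parameter values are hypothetical stand-ins for the modules M.02.01–M.02.03 described above.

```python
import itertools

# (a) direct transmission: the parameter value travels with the job description
job = {"executable": "solver", "parameter_3": 0.75}

# (b) parameterized object: the job only references a large array that is
#     kept in the experiment database (a dict stands in for the database here)
experiment_db = {"parameter_4": list(range(10_000))}
job["parameter_4_ref"] = "parameter_4"

# (c) replacement: for each run, a replacement module rewrites a template
#     with fresh values produced by the parameter modules
template = "temperature = @T@\npressure = @P@\n"

def substitute(template: str, values: dict) -> str:
    """Replace each @KEY@ marker with the generated parameter value."""
    for key, value in values.items():
        template = template.replace(f"@{key}@", str(value))
    return template

values_t = [300, 310]       # values from a parameter module (illustrative)
values_p = [1.0, 2.0]       # values from a second parameter module
inputs = [substitute(template, {"T": t, "P": p})
          for t, p in itertools.product(values_t, values_p)]
print(len(inputs))          # one prepared input file per parameter combination
```

Variant (c) is the one used for parameter sweeps: the Cartesian product of the generated values yields one prepared input file per run.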
4
Science Experimental Grid Laboratory Architecture
Figure 3 shows the architecture of the Experiment Designer. The Experiment Designer contains several components for the design of the experiment program: ProgramConstructor, TaskGenerator, DBConstructor, DBBrowser and MonitorVisGenerator. To add new functional modules to the language library, the component Units Library Assistant is used. The key component of the Experiment Designer is the ProgramConstructor. Within this environment, the user creates a graphical description of the experiment on three levels: the control flow, data flow and repository layers. After the completion of the graphical description of an experiment, the user reviews the execution of the experiment with the aim of verifying the logic of the experiment and detecting inaccuracies in the programming. After completion of the design of the program at the graphical icon-level, it is compiled. During compilation the following is created: (a) the program objects (modules) which belong to a block are incorporated into Block Containers; (b) the Block Containers are incorporated into the Task Container.
Fig. 3. Architecture of Experiment Designer
In addition, the Task Container includes the Block Connection @ Activity Table. This table describes the sequence of execution of the experiment program blocks. At the control flow level, when a new connection is made between the output of one block and the input of the next block, a new element describing the connection is created in the Block Connection @ Activity Table. In parallel, the DBConstructor aggregates the arrays of data icon-objects from all blocks and generates QL-descriptions of the experiment’s database (DBSpecification). The Task Container is transferred to the Application Server and the QL-descriptions are transferred to the experiment database server. The Experiment Designer also has a DBBrowser; its function is to convert files and program modules into the object form (DBInformationObjects) required for writing to the database. The DBBrowser also makes it possible to observe the current state of the experiment database as well as to read and write information objects. Finally, the Experiment Designer has the MonitorVisGenerator. This program creates the windows for the control and monitoring of the experiment.
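The nesting produced by compilation can be pictured with a small data-structure sketch. The classes and identifiers below are purely illustrative of the structure described above (blocks wrapped in Block Containers, which are gathered in the Task Container together with the connection table); they are not the actual SEGL implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class BlockContainer:
    """One per block: the block's program objects (modules) travel together."""
    block_id: str
    modules: List[str]

@dataclass
class TaskContainer:
    """The compiled experiment program, shipped to the Application Server."""
    blocks: List[BlockContainer] = field(default_factory=list)
    # Block Connection @ Activity Table entries, simplified here to
    # (from_block, output_nr, to_block, input_nr)
    bcat: List[Tuple[str, int, str, int]] = field(default_factory=list)

task = TaskContainer()
task.blocks.append(BlockContainer("B.01", ["M.01.01"]))
task.blocks.append(BlockContainer("B.02", ["M.02.01", "M.02.02", "M.02.03"]))
# a connection drawn at the control flow level becomes a table entry
task.bcat.append(("B.01", 1, "B.02", 1))
```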
Fig. 4. Conceptual Architecture of SEGL
4.1
Runtime System
The runtime system of SEGL (ExpEngine) chooses the necessary computer resources, organizes and controls the sequence of execution according to the task flow and the condition of the experiment program, and monitors and steers the experiment, informing the user of the current status (see Figure 4). This is described in more detail below. The Application Server consists of the ExpEngine, the Task, the MonitorSupervisor and the ResourceMonitor. The Task is the container application (Task Container). The ResourceMonitor holds information about the available resources in the Grid environment. The MonitorSupervisor controls the work of the runtime system and informs the Client about the current status of the jobs and the individual processes. The ExpEngine is the controlling subsystem of
SEGL (runtime subsystem). It consists of three subsystems: the TaskManager, the JobManager and the DataManager. The TaskManager is the central dispatcher of the ExpEngine; it coordinates the work of the DataManager and the JobManager. 1. It organizes and controls the sequence of execution of the program blocks: it starts the execution of the program blocks according to the task flow and the condition of the experiment program. 2. It activates a particular block according to the task flow, chooses the necessary computer resources for the execution of the program, and deactivates the block when this section of the program has been executed. It informs the MonitorSupervisor about the current status of the program. The DataManager organizes data exchange between the ApplicationServer and the FileServer and between the FileServer and the ExpDBServer. Furthermore, it controls all parameterization processes of the input data. The JobManager generates jobs and places them in the corresponding SubServer of the target machines. It controls the placing of jobs in the queue and observes their execution. The SubServer informs the JobManager about the status of the execution of the jobs. The final component of SEGL is the database server (ExpDBServer). All data occurring during the experiment, both initial and generated, are kept in the ExpDBServer. The ExpDBServer also hosts a library tailored to the application domain of the experiment. For the realization of the database, an object-oriented database was chosen because its functional capabilities meet the requirements of an information repository for scientific experiments. The interaction between the ApplicationServer and the Grid resources is done through a Grid adaptor.

4.2
Block Connection @ Activity Table
The Block Connection @ Activity Table (BC@AT) contains information about the conditions and the sequence of execution of the individual experiment blocks. Whenever a block is activated (i.e. a connection is made between the output of one block and the input of another block), an entry is made in this table. The table is shown in Figure 5. Each element of the table consists of two parts: the left part relates to the exit of a block, while the right part is concerned with the input of a block.
Fig. 5. Block Connection @ Activity Table
– The left part of the BC@AT has the following elements:
NrBlock - block number (identifier).
NrOutput - number of the block exit.
Fro - activation flag of the run. Fro is a counter: after a run has finished, Fro is set to 1; after the activation signal has been transferred from the left side to the right side, Fri becomes 1 and Fro is reset to 0.
Ind - index of the run currently being executed.
Fco - when set to “1”, signals the completion of the running operation.
– The right part of the BC@AT has the following elements:
NrBlock - block number (identifier).
NrInput - number of the block input.
Fri - activation flag of the input data set. The right block sets this counter to 1 as soon as the computation of the next data set is started; the flag is zero while the input has not yet been activated.
*SET-Ind - index of the data sets. *SET-Ind contains the reference to the buffer into which the TaskManager writes the indices of the input data that are ready for computation.
Fci - activation flag for all sets of input data. When set to “1”, it signals complete processing.

Figure 6 shows an example of the connection table BC@AT for the experiment at the control flow level shown in Figure 2(a). The main program of a block is the master program, which controls the execution of inter-block operations. After the processing of one of the data sets has
Fig. 6. BC@AT for Figure 2
been finished, or the computation of all sets of input data has been terminated, these events are registered in the BC@AT with a special flag. In this way, the partial or complete execution of block operations is signaled. After the execution of all programs in the block has been finished, the Fco flag, signifying the termination of all block operations, is set. Depending on the logic of the experiment, the user decides whether or not to use the Fro flag. (The Fro flag is used in the case of pipelined operations: blue-dashed line.) The design of each block already provides the possibility to set such a flag after the execution of a run. In this case, each time the execution of a run in the previous block has been finished, the corresponding operations in the following block can be started. After the experiment has been started, the TaskManager cyclically scans the state of the activation flags belonging to the block exits (see the left part of the table). When an activation flag has been found, the TaskManager activates the corresponding activation flag of the entrance of the following block (right part of the table). Immediately after the blocks have been started, their master programs cyclically query the state of the corresponding input flags. As soon as the activation flag has been placed in the connection table BC@AT (relating to the entrances of blocks), the master program receives a corresponding message together with the number of the run carried out by the previous block. The number stands for a run which has already been finished and for the corresponding run of the following block which is now ready for computation. Once the input data set has been prepared, the master program initiates the computation of the corresponding data in the block.
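The interplay of the Fro/Fri flags and the *SET-Ind buffer during one TaskManager scan cycle can be sketched as follows. This is an illustrative model of a single pipelined (blue-dashed) connection, not the SEGL code itself; the field names mirror the table columns described above.

```python
from dataclasses import dataclass, field

@dataclass
class BCATEntry:
    """One connection in the Block Connection @ Activity Table (simplified)."""
    nr_block_out: str          # left part: block number
    nr_output: int             # left part: number of the block exit
    fro: int = 0               # run-level activation flag, set by the left block
    ind: int = -1              # index of the run just finished
    fco: int = 0               # 1 once all runs of the left block are finished
    nr_block_in: str = ""      # right part: block number
    nr_input: int = 0          # right part: number of the block input
    fri: int = 0               # activation flag of the input data set
    set_ind: list = field(default_factory=list)  # *SET-Ind buffer of ready run indices
    fci: int = 0               # 1 once all input sets are processed

def task_manager_scan(table):
    """One cycle: move activation from block exits (left) to block inputs (right)."""
    for e in table:
        if e.fro == 1:                 # a run on the left side has finished
            e.set_ind.append(e.ind)    # hand the run index to the right side
            e.fri, e.fro = 1, 0        # activate the input, clear the exit flag

entry = BCATEntry("B.04", 1, nr_block_in="B.05", nr_input=1)
entry.fro, entry.ind = 1, 7            # run 7 of B.04 just completed
task_manager_scan([entry])
print(entry.fri, entry.set_ind)        # input activated, run index 7 ready
```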
5
Use Case: Molecular Dynamics Simulation of Proteins
Molecular Dynamics (MD) simulation is one of the principal tools in the theoretical study of biological molecules. This computational method calculates the time-dependent behavior of a molecular system. MD simulations have provided detailed information on the fluctuations and conformational changes of proteins and nucleic acids. These methods are now routinely used to investigate the structure, dynamics and thermodynamics of biological molecules and their interaction with substrates, ligands, and inhibitors. A common task for a computational biologist is to investigate the determinants of the substrate specificity of an enzyme. On the one hand, the same naturally occurring enzyme converts some substrates better than others. On the other hand, mutations are often found, in nature or by laboratory experiments, which change the substrate specificity, sometimes in a dramatic way. To understand these effects, multiple MD simulations are performed covering different enzyme-substrate combinations. The ultimate goal is to establish a general, generic molecular model that describes the substrate specificity of an enzyme and predicts short- and long-range effects of mutations on the structure, dynamics, and biochemical properties of the protein. While most MD simulation projects are still managed by hand, large-scale MD simulation studies may involve up to thousands of MD simulations. Each
simulation will typically produce a trajectory output file with a size of several gigabytes, so that data storage, management and analysis become a serious challenge. These tasks can no longer be performed by hand and therefore have to be automated. It is therefore worthwhile to use an experiment management system that provides a language (GriCoL) able to describe all the functionalities necessary to design complex MD parameter studies. The experiment management system must be combined with the control of job execution in a distributed computing environment. Figure 7 shows the schematic setup of a large-scale MD simulation study. Starting from user-provided structures of the enzyme, enzyme variants (a total of 30) and substrates (a total of 10), in the first step the preparation solver block (B.01) is used to generate all possible enzyme-substrate combinations (a total of 300). This is accomplished by using the select module in the data flow of the preparation solver block, which builds the Cartesian product of all enzyme variants and substrates. Afterwards, the molecular system topology is built for each combination. These topologies describe the system under investigation for the MD simulation program. All 300 topology files are stored in the experiment database and serve as input for the equilibration solver block (B.02). For better statistical analysis and sampling of the proteins’ conformational space, each system must be simulated 10 times, using a different starting condition each time. Here the replacement and
Fig. 7. Screenshot of bio-molecular experiment
parameterization modules of the GriCoL language are used in the data flow of the equilibration solver block to automatically generate all the necessary input files and job descriptions for the 3,000 simulations. The equilibration solver block now starts an equilibration run for each of the 3,000 systems, which usually needs days to weeks, depending strongly on the number of CPUs available and the size of the system. In the equilibration run the system should reach equilibrium. An automatic check of the system’s relaxation into equilibrium is of great interest to save calculation time; this can be achieved by monitoring multiple system properties at frequent intervals, and the equilibrium control block (B.03) is used for this. Once the conditions for equilibrium are met, the equilibration phase for this system is terminated and the production solver block (B.04) is started for this particular system. Systems that have not yet reached equilibrium are subjected to another round of the equilibration solver block (B.02). During the production solver block (B.04), which performs an MD simulation with a predefined number of simulation steps, the equilibrium properties of the system are assembled. Afterwards, the trajectories from the production run are subjected to different analysis tools. While some analysis tools are run for each individual trajectory (B.05), some tools need all trajectories for their analysis (B.06). The connection lines between B.01 and B.02, as well as between B.04, B.06 and the end block of the control flow, are drawn in red (solid), because in these blocks all tasks have to be finished before the control flow proceeds to the next block. The blue (dashed) lines used to connect the other blocks indicate that as soon as one of the simulation tasks has finished, its result can be passed to the next block. This example shows the ability of the simple control flow to steer the laborious processing of a large number of single tasks in an intuitive way.
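The combinatorics of the study are easy to verify with a short sketch. The identifier names below are invented for illustration; only the counts (30 variants, 10 substrates, 10 replicas) come from the text above.

```python
import itertools

enzyme_variants = [f"enz-{i:02d}" for i in range(1, 31)]  # 30 enzyme variants
substrates      = [f"sub-{i:02d}" for i in range(1, 11)]  # 10 substrates
replicas        = range(1, 11)                            # 10 starting conditions

# B.01: the select module builds the Cartesian product -> 300 topologies
topologies = list(itertools.product(enzyme_variants, substrates))

# B.02: each system is equilibrated 10 times with different start conditions
simulations = [(enz, sub, rep)
               for (enz, sub) in topologies
               for rep in replicas]

print(len(topologies), len(simulations))   # 300 3000
```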
The benefits of using such an experiment management system are obvious. Besides the time saved in setting up, submitting, and monitoring the thousands of jobs, the scope for common errors such as misspellings in input files is also minimized. The equilibration control helps to minimize simulation overhead, as simulations which have already reached equilibrium are submitted to the production phase while those that have not are simulated further. The storage of the simulation results in the experiment database enables the scientist to easily retrieve and compare the results later.
6
Conclusion
This paper presented a powerful integrated system for the creation, execution and monitoring of complex modeling experiments in the Grid. The integrated system is composed of GriCoL, a universal abstract language for the description of Grid experiments, and SEGL, a problem solving environment capable of utilizing the resources of the Grid to execute and manage complex scientific Grid applications. The new technology has a sufficient level of abstraction to enable a user without knowledge of the Grid or parallel programming to create complex modeling experiments and execute them with maximally efficient use of the available Grid resources.
Acknowledgment. This research work is carried out under the FP6 Network of Excellence CoreGRID funded by the European Commission (Contract IST-2002-004265).
References

1. Foster, I., Kesselman, C.: The Grid: Blueprint for a Future Computing Infrastructure. Morgan Kaufmann Publishers, USA (1999)
2. Yarrow, M., McCann, K., Biswas, R., van der Wijngaart, R.: An advanced user interface approach for complex parameter study process specification on the Information Power Grid. In: Proc. of the Workshop on Grid Computing, Bangalore, India (2002)
3. Currle-Linde, N., Bös, F., Resch, M.: GriCoL: A language for scientific Grids. In: Sloot, P., van Albada, G., Bubak, M., Trefethen, A. (eds.) Proc. of the 2nd IEEE International Conference on e-Science and Grid Computing, Amsterdam, p. 62 (2006)
4. De Vivo, A., Yarrow, M., McCann, K.: A comparison of parameter study creation and job submission tools. Technical report NAS-01002, NASA Ames Research Center, Moffett Field, CA (2000)
5. Yu, J., Buyya, R.: J. Grid Comput. 3(3-4), 171–200 (2005)
6. Erwin, D.: Joint project report for the BMBF Project UNICORE Plus, Grant Number: 01 IR 001 A-D (2002)
7. Foster, I., Kesselman, C.: J. Supercomputer Appl. 11(2), 115–128 (1997)
A Service-Oriented Architecture for Some Problems of Municipal Management (Example of the City of Irkutsk Municipal Administration)

I.V. Bychkov
Institute for System Dynamics and Control Theory SB RAS
Lermontov str. 134, 664033 Irkutsk, Russia
[email protected]

“It is the data that must circulate, not the citizen”
Gerhard Schröder, former FRG Chancellor
Abstract. The paper presents the fundamentals of a service-oriented architecture (SOA), its layers, and the associated types of architectural decisions. The method for service-oriented modeling and the characteristics of activities from the service consumer and provider perspectives are described. This method guarantees the necessary requirements for analysis and design in order to determine three fundamental aspects of a service-oriented architecture: services, flows, and the components that realize the services. SOA is an architectural style whose goal is to achieve loose coupling among interacting software agents. A service is a unit of work done by a service provider to achieve desired final results for a service consumer. Both provider and consumer are roles played by software agents on behalf of their owners. A template is described which can be used for architectural decisions in each of the SOA layers. For service identification it is important to combine the three approaches of top-down, bottom-up, and cross-sectional analysis of problem modelling. The main activities of the specification and realization of services are considered.
Presently, the efficient solution of the priority problems bound up with the social-economic development of cities, and the increase in the efficiency of management conducted by municipal authorities, necessitate that i) contemporary information and telecommunication technologies be developed and extensively employed, and ii) a developed infrastructure of informatization be elaborated which would satisfy the needs of local authorities, organizations and the population related to the transmission, receiving, processing, storage and application of diverse information [1,2]. For example, the city of Irkutsk possesses a developed telecommunication main-line infrastructure, which includes digital networks and communication channels belonging to “Rostelecom, Inc.”, “Irkutskenergo” (ATM main-line) and “BaikalTransTelecom, Inc.”, which possess a large capacity resource and are capable of integrating and transmitting data and audio-visual information. E. Krause et al. (Eds.): Comp. Science & High Perf. Computing III, NNFM 101, pp. 238–248, 2008. c Springer-Verlag Berlin Heidelberg 2008 springerlink.com
A Service-Oriented Architecture
239
A special corporative information-computing network has been created for the Irkutsk Municipal Administration. It includes 22 nodes of computer-integrated computational networks (NCICNs) for the territorially separate structural detachments of the Municipal Administration, which in turn include 34 local computing networks (LCNs) and 794 computer-aided workbenches (CAWBs) for the Municipal Administration coworkers. There are 125 databases and 18 data banks registered in the Municipal Register of Databases and Data Banks. The development of the NCICNs, the formation of municipal information resources and the provision of access to them have to be conducted so that the requirements of information safety are satisfied. To this end, a special law-normative base has been elaborated for the Irkutsk Municipal Administration. It is intended to verify the ownership rights and regulate the order of formation, exchange, protection and usage of the information resources. There also exists a developed integrated information-computing network (with a capacity of 100 Mbit/s), which belongs to the Irkutsk Research and Education Complex (IRNOK). This network integrates the information resources of the Irkutsk Scientific Center (Siberian Branch, Russian Academy of Sciences), the Baikalsky Educational Complex, Irkutsk State University, Irkutsk State Technical University (Polytechnic), the Irkutsk State University of Railway Communications, the East-Siberian Center of the Russian Academy of Medical Science (Siberian Branch), and the Baikalsky State University of Economics and Law. Such WWW providers as Golden Telecom, Rostelecom, Global One, BaikalTransTelecom, etc. provide for the positive dynamics of growth of the number of network consumers. The enterprises and organizations of Irkutsk have developed and support over 300 websites. Nevertheless, some enterprises and organizations still have no access to the WWW, which results in their insufficient activity on the city, regional and national markets.
Every day some 40 thousand citizens of Irkutsk access the WWW. There are about 1500 Irkutsk sites and sites about Irkutsk; however, only about 10% of them are professionally made. The annual growth of WWW users in Irkutsk is some 30%, a figure substantially higher than that for the European part of Russia. Our estimate of the current state of the city information infrastructure of Irkutsk allows us to conclude that, on the basis of balanced development and consumption of the information resources available, it is possible to maintain further complex and balanced elaboration of information technologies with the aim of satisfying the informational needs and realizing the rights of the citizens, local authorities, organizations and social unions [3,4]. But presently there exist some important problems related to the development of the info-communication environment and the formation of the information space on the territory of Irkutsk. These problems shall influence the course of possible development since:
• The efficiency of information interaction (exchange) between state and municipal administrative infrastructures is quite insufficient; the coordination between state departments, establishments and municipal infrastructures is insufficient with respect to the works conducted; as a result, there is a weak level of integration between the existing systems of state authorities and the systems of municipal administration;
• There does not exist a necessary complex of municipal standards and classifiers which could allow experts to unify the data representation and provide for coordinated functioning of municipal systems;
• The joint cartographic basis, which is intended to support efficient administrative decision making on many issues (including the issues of town-building, public utilities and emergency situations), is used quite incompletely;
• The contemporary information environment, which could be efficiently used for maintaining active interaction between the administrative boards and the population, has been developed quite insufficiently;
• The level of informatization of museums, libraries and other Irkutsk establishments of culture is very low;
• It is vitally important for Irkutsk not simply to retain its innovative potential but rather to form a functionally complete and efficient infrastructure which could support innovations (mainly in the sector of info-communication technologies) by introducing innovations into many other spheres of activity;
• Access to the basic info-communication services and to socially important information must be provided for all the citizens regardless of the place where they live and their economic status.
So, the application of contemporary information technologies related to the development and introduction of municipal information systems and resources, as well as the elaboration of programs and devices which would ensure their efficient operation for the purpose of efficient interaction between the administration and the population/organizations, are essential problems to be solved. It is possible to formulate the following objectives to be reached and problems to be solved for the purpose of increasing the efficiency of management related to the problems of the social-economic development of Irkutsk, which may be achieved through the elaboration of a joint information-technological infrastructure:
• Development and improvement of an information-analytical subsystem, which provides the opportunity of monitoring, analysis, forecasting and planning the activity of municipal administration departments related to reaching the objectives of the social-economic development of the city;
• Development and improvement of a functional subsystem, which provides for a definite increase in the efficiency of activity of municipal administration departments, municipal enterprises and establishments related to rendering services to the population and to companies;
• Development and improvement of a subsystem, which provides the population and companies with access to information related to the activity of municipal administration departments, municipal enterprises and establishments, i.e. to municipal information resources;
• Development and improvement of an integrating subsystem, which provides any organization with the opportunity to get access to the joint municipal system of electronic information interaction, as well as to similar regional and national (federal-level) information systems;
• Development and improvement of a joint information-technological infrastructure, which would provide for the joint operation of the separate subsystems of the municipal information system;
• Organizational, personnel and methodological support of the programme oriented to the informatization of the boards of municipal administration;
• Formation of a joint information space of Irkutsk, which would include municipal and departmental information resources, information resources of enterprises, organizations, education establishments, academic institutes, etc., and provision of access to these resources for the population;
• Improvement of the activity of the boards (departments) of the Irkutsk municipal administration on the basis of the application of information technologies (ITs), and development of systems intended for administrative and management decision making by the administration of Irkutsk;
• Information supply and support of large enterprises and small companies of Irkutsk;
• Improvement of the ITs’ normative basis for Irkutsk;
• Development of the market of information and knowledge, which are industrial progress factors; transformation of the information resources into real factors of the social-economic development of Irkutsk;
• Development of the information space for such cities as Irkutsk, Angarsk and Shelekhov, which would provide for efficient information interaction of the population, access to city, regional, national and international information resources, and satisfaction of personal and social requirements in information products and services;
• Improvement of the education system; extension of the capabilities of new systems of information exchange; elevation of the role of qualification, professionalism and creativity as the most important characteristics of labor related to information technologies;
• Improvement of the system of training specialists and qualified users, to make specialists able to work with contemporary information technologies.
So, the creation of contemporary information systems which could provide for the functions of municipal administration and management necessitates the development of the corresponding technological infrastructure. It must satisfy a standard set of requirements for large-scale distributed information computations and decisions. This implies compliance with open standards and scalability, which would allow us to integrate diverse information systems of different national and municipal structures, extend the functionality of the corporative system without cardinally changing it, and react flexibly to an increase in the number of users. This also implies an opportunity to integrate existing hardware-software platforms and databases (including inherited ones), as well as the availability of a platform
I.V. Bychkov
Fig. 1. A scheme of the service-oriented architecture (SOA)
intended for the rapid development of new applications, together with tools for integrating existing applied software packages that support decision making by the municipal authorities [4,5]. Considering contemporary integration technologies, and the fact that a substantial number of complex municipal IT projects integrating the systems of different departments and committees are still at an early stage of development, it can be stated with a high degree of confidence that the service-oriented architecture (SOA) is promising for the computer-aided support of the functions of municipal authorities [6]. Note that the service-oriented architecture offers substantial advantages over standard client-server applications [7]. SOA is based on open standards: Web services rely on such accepted standards as HTTP, XML, UDDI, WSDL and SOAP, which are widely used by the majority of developers of software for municipal authorities. From the implementation viewpoint, Web services are designed so that the dependence between the users and the designers of the services is removed (weak, or loose, coupling); the information needed by a consumer is taken directly from the library of services. Loosely coupled services in SOA make it easy to introduce changes into applications as requirements vary, because the functionality of the services is mutually independent (this is the principal difference from older systems). The loose coupling characteristic of SOA substantially reduces development effort and thereby simplifies integration.
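The registry-based lookup underlying this loose coupling can be illustrated with a minimal sketch (not from the paper); the service names and providers below are purely hypothetical, and a plain dictionary stands in for a real UDDI registry. Consumers bind to a logical service name rather than to a concrete implementation:

```python
# Minimal sketch of loose coupling via a service registry.
# A dict stands in for a UDDI-style registry; names are illustrative.

class ServiceRegistry:
    """Maps logical service names to provider callables."""

    def __init__(self):
        self._services = {}

    def publish(self, name, provider):
        self._services[name] = provider

    def lookup(self, name):
        return self._services[name]


# Two interchangeable providers of the same logical service.
def address_lookup_v1(person_id):
    return {"person": person_id, "city": "Irkutsk"}

def address_lookup_v2(person_id):
    # A reimplementation; consumers are unaffected by the switch.
    return {"person": person_id, "city": "Irkutsk", "source": "registry-v2"}


registry = ServiceRegistry()
registry.publish("municipal.address-lookup", address_lookup_v1)

# The consumer knows only the logical name and the message contract.
service = registry.lookup("municipal.address-lookup")
print(service("P-1024")["city"])  # prints: Irkutsk

# Swapping the provider requires no change on the consumer side.
registry.publish("municipal.address-lookup", address_lookup_v2)
print(registry.lookup("municipal.address-lookup")("P-1024")["city"])  # prints: Irkutsk
```

Because the consumer never names a concrete implementation, either provider can be withdrawn or replaced without touching the calling code, which is the "mutual independence" of services described above.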
A Service-Oriented Architecture
As noted above, the work of any municipal administration presumes a substantial flow of documents. Web services support the exchange of structured documents containing variable volumes of diverse information. Once information is accessible via the ESB (enterprise service bus), it can be transformed into the different formats used across the municipal administration. Message exchange among Web services takes place simultaneously on several machines; the system is thus redundant by design, and the network load does not grow substantially when additional services and applications are added. The services operate in real time, which is essential for supporting the departments and divisions of the municipal administration. Information is distributed via the ESB, so there is no need to query particular databases or start other processes that would require additional time and resources and slow the system down. The application of SOA also improves the reliability of the administrative corporate system as a whole, because Web services send messages not to particular devices but to logical addresses, behind which a large number of services or physical devices may be hidden. This conception provides high fault tolerance for the system as a whole. Web services are easily accessed over secure WWW links using the HTTPS and SMTP protocols, so no additional hardware or software needs to be installed (platform independence). This way of working is very convenient.
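The fault tolerance gained from logical addressing might be sketched as follows; this is a toy illustration, not the paper's system, and the endpoint and handler names are invented. A message is sent to a logical endpoint behind which several physical handlers stand, and if one fails another serves the message:

```python
# Hypothetical sketch of ESB-style routing to a logical address.
# The sender names a logical endpoint, never a physical device.

class LogicalEndpoint:
    def __init__(self, name):
        self.name = name
        self.handlers = []  # several physical devices/services may register

    def register(self, handler):
        self.handlers.append(handler)

    def send(self, message):
        # Try handlers in turn until one succeeds (simple failover).
        for handler in self.handlers:
            try:
                return handler(message)
            except RuntimeError:
                continue
        raise RuntimeError("no handler available for " + self.name)


def failing_node(message):
    raise RuntimeError("node down")

def healthy_node(message):
    return "processed: " + message


endpoint = LogicalEndpoint("registry-office.queue")
endpoint.register(failing_node)
endpoint.register(healthy_node)

print(endpoint.send("update citizen record"))  # prints: processed: update citizen record
```

The sender's code is unchanged whether one or many physical nodes stand behind the logical address, which is the source of the fault tolerance described above.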
This technique also allows one to work with information in diverse formats and to view it with a browser, a PDA, a 3G mobile phone, or any other means that may become popular as information technologies progress. Both new solutions and long-established ones can act as services. The SOA advantages important for municipal authorities include the integration of diverse systems via standard interfaces, flexibility in adding new functions, and the possibility of avoiding the duplication of similar functions across systems by implementing them as separate services. The actions needed to introduce, maintain and extend an efficient SOA infrastructure for automating functions of municipal administration can be represented as a "life cycle" of four stages: planning, determination, application and estimation. At the planning stage of constructing the SOA infrastructure, it is necessary to analyze in detail the general needs of the municipal administration and identify the domains that require more efficient administration (management). The main part of this work is conducted in the committees and departments, in contact with people. This work is oriented to extending joint
work. At this stage it is necessary to define, within the informatization programme, the direction related to SOA, namely:
• To determine the desired level of IT and SOA potential;
• To formulate and refine the strategy of SOA development;
• To develop a management plan.
Determination implies developing an approach to administration (management). After the directions of work have been determined, the heads of the administration departments (divisions) and the IT experts must join their efforts in choosing or changing the existing methods (technologies) and/or mechanisms of administration (management). At this stage, the following management decisions must be made:
• Determination of the set of necessary additional procedures (e.g., some reorganization of the IT infrastructure may be needed);
• Agreement of the regulations for the repeated use of applications and resources between different committees and departments;
• Formation of a mechanism for supporting and stimulating the repeated use of applications and resources;
• Development of regulations that provide the desired level of service.
Application presumes completion of the model and its practical use. At this stage, the management decisions are realized. As a rule, this implies the following operations:
• Application of new and improved management methods, such as monitoring of decision-making processes and development of the infrastructure supporting the elaborated management strategies;
• Training the municipal administration's staff in the new methods of work and the new regulations;
• Development of the infrastructure that supports the elaborated regulations.
Estimation implies monitoring of the new decisions and their correction when needed.
This stage also implies monitoring of the new methods and mechanisms of administration (management) proposed at the determination stage and applied at the application stage. This allows the specialist to estimate the results and, when needed, to initiate a new four-stage cycle in order to find a solution with higher management efficiency. As a rule, the following operations are executed at this stage:
• Verification of compliance with the elaborated regulations and the proposed control methods, such as service level agreements (SLAs), analysis of the level of repeated use, and amendment of the regulations;
• Analysis of IT infrastructure efficiency indicators.
Thus, the Service-Oriented Architecture (SOA) is a component model that interconnects the different functional modules of applications (called services) through exactly determined interfaces and agreements between these services. The interfaces are defined in an independent manner and must not depend on the hardware platform, the operating system, or the programming language in which the service is implemented. This approach allows services developed on different systems to interact with each other uniformly, in a standard way. The availability of an independent interface definition, not tightly bound to a particular implementation, is known as loose coupling between the services. The advantages of loosely coupled systems are their agility and their ability to survive evolutionary changes in the structure and implementation of each individual service, which together constitute the application. Tight coupling, on the other hand, presumes that the interfaces of the different components of an application are strongly interrelated in their functionality and interactions, which makes them rather vulnerable when some part of the application changes. The need for loosely coupled systems results from the requirement that applications adapt quickly to the varying requirements of the environment (say, changes of regulations, or shifts of emphasis in the social and economic growth of a territory). Management systems that can react quickly to such varying requirements are said to be capable of "management on demand"; in this case, changes in management are introduced in accordance with the management objectives, or whenever a definite piece of work must be executed.
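The separation of interface from implementation can be sketched in a few lines; this is an illustrative analogy in Python, not the paper's system, and the service and class names are invented. The contract is declared once, and clients are written against it alone:

```python
# Sketch: an interface defined independently of any implementation,
# mirroring the contract/implementation separation described above.

from abc import ABC, abstractmethod

class DocumentService(ABC):
    """The contract: defined once, independent of any implementation."""

    @abstractmethod
    def submit(self, doc_id: str) -> str: ...


class LegacyDocumentService(DocumentService):
    # Could live on an inherited platform.
    def submit(self, doc_id: str) -> str:
        return f"legacy system accepted {doc_id}"


class WebDocumentService(DocumentService):
    # Could live on a newly built Web-service platform.
    def submit(self, doc_id: str) -> str:
        return f"web service accepted {doc_id}"


def client(service: DocumentService, doc_id: str) -> str:
    # The client depends only on the contract (loose coupling);
    # either implementation may be substituted without changes here.
    return service.submit(doc_id)


print(client(LegacyDocumentService(), "D-17"))
print(client(WebDocumentService(), "D-17"))
```

Changing the internals of either implementation leaves the client untouched, which is exactly the resilience to evolutionary change claimed for loosely coupled systems.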
The service-oriented architecture is not a new model [8]. But it is an alternative to the more traditional tightly coupled object-oriented models elaborated in recent decades (see Fig. 2). Although SOA systems do not exclude the need for object-based implementation of particular services, the system's design as a whole remains service-oriented. Since this approach allows objects to be used in the system, SOA is an object-based but not an object-oriented architecture; the difference lies in the interfaces themselves. A classical example of an SOA-like system, rather popular in its time, is the Common Object Request Broker Architecture (CORBA), which is based on conceptions close to those of SOA [9]. But contemporary SOA is quite different: it relies on recent achievements related to the eXtensible Markup Language (XML). By describing the interfaces of Web services in an XML-based language called the Web Services Description Language (WSDL), the system for describing interfaces has become more dynamic and flexible than the one based on the Interface Definition Language (IDL) specified within the CORBA standard. Web services are just one technique of implementing SOA; CORBA, as mentioned above, is another.
Fig. 2. Evolution of the software architectures
The same is true of systems constructed on the basis of Message-Oriented Middleware, such as IBM WebSphere MQ (MQSeries). But developing an architectural model requires more than a mere description of the services. One has to define how the application as a whole coordinates the actions between the services, and to find the points of transformation between the regulations (time limits) of control and the operations of the software used in the computer-aided system. Hence, SOA must be capable of correlating control functions with the supporting technical processes and of regulating the sequence of operations (the workflow) between them. For example, payment to a supplier under a municipal contract is a business process, while the corresponding change of the database (say, the addition of ambulance cars supplied to municipal hospitals) is a technical process. The workflow thus plays an important role in the design of SOA. Furthermore, the workflow may include not only operations between subdivisions but also interaction with external partners who are not under one's supervision. Consequently, to gain greater efficiency, one has to determine the regulations (time limits) and define how the interaction between the services will be maintained; the latter is often fixed in the form of service level agreements and time limits for the operations. Finally, all this must operate in a reliable environment with user-friendly interfaces, so that the processes are executed as expected, in accordance with the agreed conditions. Thus, security, trust and the reliable exchange of
messages play a substantial role in systems constructed on the basis of the service-oriented architecture. In the second half of 2006, the Irkutsk Municipal Administration started applying SOA, but it soon appeared that the problem is more complex than initially expected. Several large obstacles complicating the implementation of the project were revealed; the following should be mentioned. Real interoperability is still largely out of reach. The contemporary understanding of SOA is bound up with weak interactions between heterogeneous systems and with standards-based interoperability, which constitutes the idea of Web services. But, in spite of the substantial work conducted by the standards organizations and by the Web Services Interoperability Organization (WS-I), small but rather substantial differences still exist even in the application of the most basic Web service standards, SOAP and WSDL. The choice of products is either too fragmented or too integration-oriented. We have found that the problem of building an SOA cannot be solved by purchasing any single software product: to obtain the desired system capabilities it is necessary to integrate several software products supplied by different companies, and even then the result is problematic, because the products are hardly ever functionally compatible. The standards are incomplete, too narrow, or poorly elaborated. Several organizations are involved in standardization, and they have elaborated detailed development plans for the proposed standards covering aspects of integration, management, security, etc.
Substantial progress in devising such standards has been achieved on the basis of these plans; nevertheless, much remains to be done to obtain a complete and coordinated set of standards compatible with SOA. Another very substantial problem is that some of the elaborated standards are too restricted or improperly designed. For example, WSDL has a substantial restriction: it cannot express requirements on a service or contract restrictions for the service user. Such standards as UDDI and WS-BPEL also belong to the list of insufficiently developed ones: UDDI acquired its shortcomings from the shifting priorities applied during its multi-year development, while WS-BPEL combines two earlier specifications that are hardly compatible and, furthermore, lacks the means for working with document flow. The application of advanced information technologies is also influenced by the human factor, which has to be taken into account: any person tends to resist changes, and the staff of municipal administrations are no exception. All the problems discussed above are deterrents to the introduction of SOA. But the advantages of SOA convince us that its application to the development of corporate info-communication systems, which can make the operation of
municipal departments more efficient, is a promising approach. No doubt, in the near future the application of such systems will become routine and large-scale.
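The workflow notion discussed earlier, a business process (payment to a supplier) triggering a technical process (a database update) under regulated time limits, might be sketched as follows; the step names, time limits and the "ambulance car" record are illustrative assumptions, not the paper's implementation:

```python
import time

class Workflow:
    """Runs named steps in order, checking a simple per-step time limit
    (a stand-in for the service level agreements mentioned above)."""

    def __init__(self):
        self.steps = []

    def add_step(self, name, func, time_limit_s):
        self.steps.append((name, func, time_limit_s))

    def run(self, context):
        for name, func, limit in self.steps:
            start = time.monotonic()
            func(context)
            elapsed = time.monotonic() - start
            if elapsed > limit:
                raise RuntimeError(f"step {name!r} exceeded its time limit")
        return context


# Business process: authorize payment under a municipal contract.
def authorize_payment(ctx):
    ctx["payment_authorized"] = True

# Technical process: record the delivered goods in the database.
def update_inventory(ctx):
    if not ctx.get("payment_authorized"):
        raise RuntimeError("payment must be authorized first")
    ctx.setdefault("inventory", []).append("ambulance car")


wf = Workflow()
wf.add_step("authorize payment", authorize_payment, time_limit_s=5.0)
wf.add_step("update inventory", update_inventory, time_limit_s=5.0)

result = wf.run({})
print(result["inventory"])  # ['ambulance car']
```

The ordering constraint (the technical step refuses to run before the business step) is the kind of correlation between control functions and supporting processes that the text attributes to SOA workflow design.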
References

1. Bychkov, I.V., Baturin, V.A., Baturina, E.Y., et al.: Modeling and Estimation of the State of Medical-Ecological-Economic Systems. Nauka Publ., Novosibirsk (2005) (in Russian)
2. Bychkov, I.V., Schastlivtsev, E.L., Malakhov, S.M., Ovdenko, V.I., et al.: Regional Problems of Transition to Sustainable Development: Resource Potential and Its Rational Usage for the Purpose of Sustainable Development. Polygraph, Kemerovo (2003) (in Russian)
3. Kitov, A.D., Bychkov, I.V., et al.: Geo-Information System for Control of Territory. Institute of Geography, SB RAS Publ., Irkutsk (2002) (in Russian)
4. Baturin, V.A., Bychkov, I.V., Vassilyev, S.N., et al.: Mathematical Modeling of Development at the Levels of a Region and a Country. Physmathlit Publ., Moscow (2001)
5. Bychkov, I.V., Vassilyev, S.N., Baturin, V.A., Ruzhnikov, G.M., Cherkashin, E.A.: A system of mathematical and information technologies for support of administrative decision making in state authorities. Vestnik of Tomsk State University 9(11), 8–12 (2004) (in Russian)
6. Bieberstein, N., Bose, S., Jones, K., et al.: Service-Oriented Architecture (SOA) Compass: Business Value, Planning, and Enterprise Roadmap. Kudits-Press, Moscow (2007) (in Russian)
7. SOA in the Real World, http://www.microsoft.com/downloads/details.aspx?FamilyID=cb2a8e49-bb3b-49b6-b296-a2dfbbe042d8&DisplayLang=en
8. Krafzig, D., Banke, K., Slama, D.: Enterprise SOA: Service-Oriented Architecture Best Practices. The Coad Series. Prentice Hall, Englewood Cliffs (2004)
9. Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, Reading (1995)
Basic Tendencies of the Telemedicine Technologies Development in Siberian Region A.V. Efremov1 and A.V. Karpov1,2 1
Novosibirsk State Medical University, Krasny Ave. 52, 630099 Novosibirsk, Russia
[email protected] 2 Institute of Computational Technologies SB RAS, Lavrentiev Ave. 6, 630090 Novosibirsk, Russia
[email protected]
Abstract. Telemedicine is the application of telecommunication technologies in the public health service. It raises the efficacy of treatment and diagnosis to a new level. With the aid of such technologies, highly qualified help can be rendered to a remote patient in the best medical institutions of our country and of the whole world. Doctors can make a diagnosis on the basis of X-ray pictures, computer tomograms, electrocardiograms, electroencephalograms and other findings of laboratory and instrumental examinations of the patient received through e-mail or the Internet. Since an essential part of the specialists in different fields of medicine work in the specialized medical centers of big cities, medical attendance has become somewhat centralized. However, the achievements of telemedicine remove the necessity of the direct physical presence of a specialist: modern telemedical means allow remote consultations between doctors and patients located in distant regions, so that a physician consulting a patient can rely on more than his own experience. Thanks to telecommunication technologies, physicians and specialists can listen to lectures of famous scientists on the most topical problems of the public health service and medical science, and keep professional contacts with global scientific centers as well as with colleagues from neighboring district hospitals and the principal specialists of the regional center. Especially attractive is the possibility of using videoconferencing technologies, which permit real communication in the video transmission mode.
1 Introduction
Telemedicine is the use of telecommunication technology in the field of the public health service. Telemedicine raises the effectiveness of diagnosis and treatment of patients to a new, higher level. By means of these technologies, highly qualified medical aid in the best patient care institutions of the world can be provided to a distant patient. Doctors are able to diagnose on the basis of X-ray images, computer tomography images, electrocardiograms, electroencephalograms and other data of laboratory and instrumental examination of the patient sent via e-mail or the Internet.

E. Krause et al. (Eds.): Comp. Science & High Perf. Computing III, NNFM 101, pp. 249–254, 2008. © Springer-Verlag Berlin Heidelberg 2008, springerlink.com
A.V. Efremov and A.V. Karpov
As most specialists in the different fields of medicine work in medical centres located in big cities, medical services are definitely centralized. However, the achievements of telemedicine obviate the need for the physical presence of the specialist. Contemporary telemedicine provides means for distant consultations with patients from the farthest districts; as a result, a doctor consulting a critical patient can rely on more than his own experience. Owing to the telecommunication technology, doctors and specialists are able to attend lectures of the most outstanding scientists on the most relevant problems of the public health service and medicine, and to maintain professional connections with the leading scientific centres of the world, as well as with colleagues from the neighbouring district hospitals or with the leading specialists from a regional center. The opportunity of using the videoconferencing technology, which gives real communication in the video mode of operation, is extremely attractive.
2 Principles of Telemedicine Network Construction
The architecture of telemedicine networks is based on connecting the medical equipment that provides the examination of the patient with telemedicine stations for preparing and conducting consultations. When establishing any telemedicine network, the following principles are to be implemented:
• An opportunity to obtain objective medical information about the patient in digital form. This principle is realized by the use of modern digital medical equipment or by converting medical data to digital form (where non-digital medical equipment is used);
• An opportunity to store, examine and process the medical data of the patient for conducting telemedical consultations. This principle is realized by creating specialized stations that combine hardware and software for processing and storing the data. Two basic methods of telemedicine consultation are possible: (a) the objective data of the patient is transferred to the consultant for processing; (b) audio-visual contact is effected via videoconference;
• An opportunity to transfer the collected and prepared medical data over long distances in a short time, to discuss this data with the distant consultant, and to obtain a medical conclusion based on the data. This principle is realized by the use of various communication channels, such as public telephone channels, special ground-based and satellite channels, and others. The selection of the channels depends on several factors, for instance the volume of the transmitted medical data and the distance of the telemedicine centers from public communication facilities;
• Telemedicine technologies of the common information space have to guarantee the completeness and authenticity of the data at the receiver, irrespective of the distance and the types of equipment and channels, since the target purpose of any such action is the appropriate use of the transmitted data in the interests of the patient, the medical worker or the health authorities.
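The dependence of channel selection on data volume can be made concrete with a back-of-the-envelope calculation; the study size and channel rates below are assumed, illustrative values, not figures from the paper:

```python
# Rough transfer times for one 30 MB study (assumed size) over the
# channel types mentioned above; rates are typical illustrative values.

STUDY_MB = 30.0

channels_mbit_s = {
    "telephone modem (56 kbit/s)": 0.056,
    "satellite channel (2 Mbit/s)": 2.0,
    "fiber-optic line (100 Mbit/s)": 100.0,
}

for name, rate in channels_mbit_s.items():
    seconds = STUDY_MB * 8.0 / rate  # MB -> Mbit, divided by Mbit/s
    print(f"{name}: {seconds:.0f} s")
```

Under these assumptions the same study takes over an hour on a telephone channel, two minutes on a 2 Mbit/s satellite channel, and a few seconds on fiber, which is why the channel must be matched to the volume of the diagnostic data.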
3 Principles of Telemedicine Technology Selection When Establishing Telemedicine Systems and Networks
The resources on which telemedicine networks and systems are constructed can be divided into three main classes: medical, telemedical and telecommunication resources.
Medical resources include the following:
• Digital and non-digital medical diagnostic equipment;
• Diagnostic methods for different clinical cases.
Telemedical resources include the following:
• Means of medical data collection;
• Means of processing and storing the medical data;
• Means of medical data preparation and consulting;
• Means of registration of recorded consultations.
Telecommunication resources include the following:
• Public telephone channels;
• Fiber-optic communication lines;
• Satellite channels.
Special dedicated telemedical channels can be organized over corporate networks; alternatively, existing public channels such as the Internet can be used. All this provides for different ways of exchanging telemedical data between the patient and the consultant. Telemedical resources are special components of computerized doctors' offices which allow the consultation to be prepared and conducted.

Means of Medical Data Collection

Medical diagnostic equipment is divided into digital and non-digital. The more modern digital equipment (for example, computer tomography scanners, ultrasonic equipment, digital electrocardiographs and others) is spreading quite fast nowadays, but the overwhelming majority of hospitals are still equipped with non-digital equipment. The process of medical data collection faces two main aims: first, to transform any medical diagnostic data to digital form without quality loss, and second, to populate the databases with the digitized information. Moreover, these means should provide for the input of accompanying text as annotations.
Means for Processing and Storage of Medical Data

Contemporary software tools allow us to process digitized medical images so as to present summarized final data to the doctor; this speeds up diagnosis and raises its quality. All the collected and processed data is to be kept in databases for further retrieval. The basic databases are:
• The patients' database, which contains complete information about patients, including their digital case histories;
• The specialists' database, which contains information about consultants, their places of employment and professional specialities;
• The pharmaceutical database, which contains information about medicaments, their characteristics and availability;
• The database of medical methods, which presents diagnostic and treatment methods for typical and widespread diseases;
• The database of telemedical consultations, which contains archives of recorded telemedical consultations.

Means of Medical Data Preparation and Consulting

These means provide access to the databases, retrieval of the data required for a telemedical consultation, putting it in the right order, sending it to the consultant and receiving the diagnostic decision. All the mentioned capabilities have to be realized either on-line, by holding a videoconference, or off-line, via deferred consultations.

Means of Registration of Recorded Consultations

All telemedical consultations have to be recorded and archived. This data can be used in subsequent medical actions. Access control of various extents, by different keys depending on the demands, must be provided, for instance:
• For subsequent treatment procedures with the given patient;
• For registration of the consultations recorded with a particular specialist;
• For the distance learning process;
• For registration of the total number of recorded consultations.
These functions are provided by the management systems of the respective databases. The filling of the databases has to be supported by accounting and registration software for recorded telemedical consultations. The consultant requires only a telemedical terminal for examining the received data, forming the diagnostic decision and forwarding it to the patient; in the general case, the respective equipment of the telemedical center can be used for these aims.
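Two of the databases described above might be sketched, in a much-simplified form, as follows; the table and column names are hypothetical, and SQLite stands in for whatever database system a real center would use:

```python
import sqlite3

# Simplified sketch of two of the databases described above:
# patients (with case-history references) and recorded consultations.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE patients (
    patient_id   INTEGER PRIMARY KEY,
    full_name    TEXT NOT NULL,
    case_history TEXT                 -- reference to the digital case history
);
CREATE TABLE consultations (
    consult_id  INTEGER PRIMARY KEY,
    patient_id  INTEGER REFERENCES patients(patient_id),
    specialist  TEXT NOT NULL,
    held_at     TEXT NOT NULL,        -- ISO timestamp of the consultation
    decision    TEXT                  -- the consultant's diagnostic decision
);
""")

con.execute("INSERT INTO patients VALUES (1, 'Ivanov I.I.', 'ch-0001')")
con.execute(
    "INSERT INTO consultations VALUES (1, 1, 'cardiologist', "
    "'2007-07-23T10:00', 'ECG within normal limits')"
)

# Retrieval for subsequent treatment of a given patient (one of the
# access patterns listed above).
row = con.execute(
    "SELECT specialist, decision FROM consultations WHERE patient_id = 1"
).fetchone()
print(row)  # ('cardiologist', 'ECG within normal limits')
```

The other access patterns (per-specialist registration, distance learning, totals) would be further queries over the same consultations table, each gated by its own access key.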
Program-Technical Means of Telemedical Systems

The technical means for realizing the telemedical technologies are:
• The telecommunication infrastructure, which unites in the common information space specialized telecommunication operators, providers, telecommunication services, consultants and clients;
• The telemedical equipment, which allows digital medical diagnostic data to be collected, stored, processed and prepared for transmission, and the recorded telemedical consultations to be maintained.

Telemedical Equipment

The composition of the telemedical equipment necessary for telemedical consultations depends essentially on the expected number of consultations. However, in most cases equipment is required that can handle radiodiagnostic data (X-ray, tomography, ultrasonic scanning), cytological and histological preparations, and endoscopy and functional diagnostics data. At small telemedical points, most of these tasks can be combined in one station functioning as a medical terminal, a station for the preparation of medical documents, and a server. At large medical establishments the equipment of a telemedical center usually consists of:
• Telemedical terminals;
• Videoconference terminals;
• Special teleconsultation stations;
• A database and communication server;
• Subsidiary equipment.
The equipment of the telemedical center should provide for the collection of objective medical data about the patient, storage of the medical data in a special database, the opportunity to examine and process this data, the keeping of document archives, and the preparation and holding of telemedical consultations. An on-line consultation is accomplished in real time via audio-visual contact, with both sides guaranteed access to the diagnostic and attendant materials. An off-line consultation is held on the basis of the diagnostic data, including images, delivered to the consultation point; the time between data delivery and the consultant's decision is defined by the contract conditions.
4 Basic Principles of Interaction
The interaction of the subjects of a telemedical network is built according to the principles of telemedical service. The following model can be used: telemedical centers purchase services (medical consultations and telecommunication resources) from specialized licensed medical centers and telecommunication operators and sell them to their clients (prophylactic establishments, insurance companies and private persons).
5 Actions Providing Coordination of Telemedical Networks
For the coordination of national consultative telemedical networks (for example, those of the CIS states) the following is required:
• The data exchanged by the participants of telemedical actions must be only in digital form;
• The data must be transferred via digital channels;
• For the data exchange preceding consultations, unified forms should be used, including the special software of the workstations of telemedical centers; these forms provide for uniform medical and attendant documentation;
• When digitizing radiodiagnostic data kept on X-ray film, equipment should be used that provides a resolution of not less than 300 dpi and a colour depth of not less than 2^12 levels (according to the ACR1 recommendations);
• Certified equipment corresponding to the recommendations of the International Telecommunication Union should be used for telemedical actions in which videoconference connections are used.
For the observance of the patients' right to data confidentiality when holding medical consultations, the following should be obeyed:
• Use of protected channels, and separation of the traffic of passport data, diagnostic data and videoconferencing;
• Receiving the informed consent of patients to the holding of video consultations;
• Receiving the informed consent of patients to the presence during the consultation of non-medical staff (technical specialists) who support the equipment and communication lines, as well as the data transfer;
• Legalizing non-disclosure agreements on personal data with the technical specialists participating in conferences;
• Carrying out a coordinated policy in the area of medical informatics, taking into account the current international standards in the field of storing and transferring medical data.
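To give a feel for what the 300 dpi / 2^12-level requirement implies for storage, the arithmetic can be worked through for one film; the 35 × 43 cm film format below is an assumed example, not a figure from the paper:

```python
# Rough size of one digitized X-ray film at the parameters above.
# The 35 x 43 cm film size is an assumed, common chest-film format.
CM_PER_INCH = 2.54
DPI = 300            # minimum resolution recommended above
BITS_PER_PIXEL = 12  # 2**12 = 4096 grey levels

width_px = round(35 / CM_PER_INCH * DPI)   # ~4134 px
height_px = round(43 / CM_PER_INCH * DPI)  # ~5079 px
size_mb = width_px * height_px * BITS_PER_PIXEL / 8 / 1024 / 1024

print(f"{width_px} x {height_px} px, about {size_mb:.0f} MB uncompressed")
```

Roughly 30 MB of uncompressed data per film under these assumptions, which is why the choice of transmission channel discussed earlier matters so much for radiodiagnostic exchange.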
As clinical information systems (for example, electronic case histories) are introduced in medical establishments, compatible software, standardized across all participants of the technological process, is being created; an example is the DOCA+2 system.
1 American College of Radiology.
2 Advanced Hospital In-Patient Information System (http://www.docaplus.com).
Author Index
Afanasiev, K.E. 206
Agoshkov, V.I. 31
Babailov, V.V. 52
Bänsch, E. 102
Beisel, S.A. 52
Bychkov, I.V. 238
Chernykh, G.G. 82
Chubarov, D.L. 1
Chubarov, L.B. 52
Currle-Linde, N. 224
Demenkov, A.G. 82
Doroshenko, S. 155
Efremov, A.V. 249
Eletsky, S.V. 52
Eryomin, V.N. 14
Fedoruk, M.P. 1, 122
Fedotova, Z.I. 52
Fionov, A. 155
Fomina, A.V. 82
Goncharova, O. 102
Götz, J. 165
Gusiakov, V.K. 52
Haberhauer, S. 14
Ilyushin, B.B. 82
Kamenschikov, L.P. 31
Karepova, E.D. 31
Karpov, A.V. 249
Klöfkorn, R. 69
König, D. 136
Koop, A. 102
Kostomakha, V.A. 82
Kröner, D. 69, 102
Küster, U. 8, 44, 224
Lebedev, A.S. 122
Lubkin, A. 155
Makarchuk, R.S. 206
Meinke, M. 136
Monarev, V. 155
Moshkin, N.P. 82
Nechaev, O.V. 14
Ohlberger, M. 69
Popov, A.Yu. 206
Prokopyeva, L.Yu. 122
Resch, M.M. 8, 44, 224
Ryabko, B. 155
Schröder, W. 136
Shaidurov, V.V. 31
Shary, S.P. 184
Shokin, Yu.I. 1, 52, 122, 155
Shokina, N. 14
Shtyrina, O.V. 122
Shurina, E.P. 14
Stürmer, M. 165
Voropayeva, O.F. 82
Yurchenko, A.V. 1
Zeiser, T. 165
Zhang, Q. 136
Notes on Numerical Fluid Mechanics and Multidisciplinary Design
Available Volumes

Volume 101: Egon Krause, Yurii I. Shokin, Michael Resch, Nina Shokina (eds.): Computational Science and High Performance Computing III - The 3rd Russian-German Advanced Research Workshop, Novosibirsk, Russia, 23–27 July 2007. ISBN 978-3-540-69008-5

Volume 100: Ernst Heinrich Hirschel, Egon Krause (eds.): 100 Volumes NNFM and 40 Years Numerical Fluid Mechanics. ISBN XXXXXXXXXX

Volume 99: Burkhard Schulte-Werning, David Thompson, Pierre-Etienne Gautier, Carl Hanson, Brian Hemsworth, James Nelson, Tatsuo Maeda, Paul de Vos (eds.): Noise and Vibration Mitigation for Rail Transportation Systems - Proceedings of the 9th International Workshop on Railway Noise, Munich, Germany, 4–8 September 2007. ISBN 978-3-540-74892-2

Volume 98: Ali Gülhan (ed.): RESPACE – Key Technologies for Reusable Space Systems - Results of a Virtual Institute Programme of the German Helmholtz-Association, 2003–2007. ISBN 978-3-540-77818-9

Volume 97: Shia-Hui Peng, Werner Haase (eds.): Advances in Hybrid RANS-LES Modelling - Papers contributed to the 2007 Symposium of Hybrid RANS-LES Methods, Corfu, Greece, 17–18 June 2007. ISBN 978-3-540-77813-4

Volume 96: C. Tropea, S. Jakirlic, H.-J. Heinemann, R. Henke, H. Hönlinger (eds.): New Results in Numerical and Experimental Fluid Mechanics VI - Contributions to the 15th STAB/DGLR Symposium, Darmstadt, Germany, 2006. ISBN 978-3-540-74458-0

Volume 95: R. King (ed.): Active Flow Control - Papers contributed to the Conference "Active Flow Control 2006", Berlin, Germany, September 27 to 29, 2006. ISBN 978-3-540-71438-5

Volume 94: W. Haase, B. Aupoix, U. Bunge, D. Schwamborn (eds.): FLOMANIA - A European Initiative on Flow Physics Modelling - Results of the European-Union funded project 2002-2004. ISBN 978-3-540-28786-5

Volume 93: Yu. Shokin, M. Resch, N. Danaev, M. Orunkhanov, N. Shokina (eds.): Advances in High Performance Computing and Computational Sciences - The 1st Kazakh-German Advanced Research Workshop, Almaty, Kazakhstan, September 25 to October 1, 2005. ISBN 978-3-540-33864-2

Volume 92: H.J. Rath, C. Holze, H.-J. Heinemann, R. Henke, H. Hönlinger (eds.): New Results in Numerical and Experimental Fluid Mechanics V - Contributions to the 14th STAB/DGLR Symposium, Bremen, Germany, 2004. ISBN 978-3-540-33286-2

Volume 91: E. Krause, Yu. Shokin, M. Resch, N. Shokina (eds.): Computational Science and High Performance Computing II - The 2nd Russian-German Advanced Research Workshop, Stuttgart, Germany, March 14 to 16, 2005. ISBN 978-3-540-31767-8

Volume 87: Ch. Breitsamter, B. Laschka, H.-J. Heinemann, R. Hilbig (eds.): New Results in Numerical and Experimental Fluid Mechanics IV. ISBN 978-3-540-20258-5

Volume 86: S. Wagner, M. Kloker, U. Rist (eds.): Recent Results in Laminar-Turbulent Transition - Selected numerical and experimental contributions from the DFG priority programme 'Transition' in Germany. ISBN 978-3-540-40490-3

Volume 85: N.G. Barton, J. Periaux (eds.): Coupling of Fluids, Structures and Waves in Aeronautics - Proceedings of a French-Australian Workshop in Melbourne, Australia, 3-6 December 2001. ISBN 978-3-540-40222-0

Volume 83: L. Davidson, D. Cokljat, J. Fröhlich, M.A. Leschziner, C. Mellen, W. Rodi (eds.): LESFOIL: Large Eddy Simulation of Flow around a High Lift Airfoil - Results of the Project LESFOIL supported by the European Union 1998-2001. ISBN 978-3-540-00533-9

Volume 82: E.H. Hirschel (ed.): Numerical Flow Simulation III - CNRS-DFG Collaborative Research Programme, Results 2000-2002. ISBN 978-3-540-44130-4

Volume 81: W. Haase, V. Selmin, B. Winzell (eds.): Progress in Computational Flow-Structure Interaction - Results of the Project UNSI, supported by the European Union 1998-2000. ISBN 978-3-540-43902-8

Volume 80: E. Stanewsky, J. Delery, J. Fulker, P. de Matteis (eds.): Drag Reduction by Shock and Boundary Layer Control - Results of the Project EUROSHOCK II, supported by the European Union 1996-1999. ISBN 978-3-540-43317-0

Volume 79: B. Schulte-Werning, R. Gregoire, A. Malfatti, G. Matschke (eds.): TRANSAERO - A European Initiative on Transient Aerodynamics for Railway System Optimisation. ISBN 978-3-540-43316-3

Volume 78: M. Hafez, K. Morinishi, J. Periaux (eds.): Computational Fluid Dynamics for the 21st Century - Proceedings of a Symposium Honoring Prof. Satofuka on the Occasion of his 60th Birthday, Kyoto, Japan, 15-17 July 2000. ISBN 978-3-540-42053-8

Volume 77: S. Wagner, U. Rist, H.-J. Heinemann, R. Hilbig (eds.): New Results in Numerical and Experimental Fluid Mechanics III - Contributions to the 12th STAB/DGLR Symposium, Stuttgart, Germany, 2000. ISBN 978-3-540-42696-7

Volume 76: P. Thiede (ed.): Aerodynamic Drag Reduction Technologies - Proceedings of the CEAS/DragNet European Drag Reduction Conference, 19-21 June 2000, Potsdam, Germany. ISBN 978-3-540-41911-2

Volume 75: E.H. Hirschel (ed.): Numerical Flow Simulation II - CNRS-DFG Collaborative Research Programme, Results 1998-2000. ISBN 978-3-540-41608-1

Volume 66: E.H. Hirschel (ed.): Numerical Flow Simulation I - CNRS-DFG Collaborative Research Programme, Results 1996-1998. ISBN 978-3-540-41540-4