Modeling, Simulation and Optimization of Complex Processes


Author: Hans Georg Bock | Xuan Phu Hoang | Rolf Rannacher | Johannes P. Schlöder (editors)



Hans Georg Bock, Hoang Xuan Phu, Rolf Rannacher, Johannes P. Schlöder (Editors)

Modeling, Simulation and Optimization of Complex Processes Proceedings of the Fourth International Conference on High Performance Scientific Computing, March 2-6, 2009, Hanoi, Vietnam


Editors

Hans Georg Bock, Rolf Rannacher, Johannes P. Schlöder
University of Heidelberg, Interdisciplinary Center for Scientific Computing (IWR), Heidelberg, Germany

Hoang Xuan Phu Vietnam Academy of Science and Technology (VAST) Hanoi Vietnam

ISBN 978-3-642-25706-3
e-ISBN 978-3-642-25707-0
DOI 10.1007/978-3-642-25707-0
Springer Heidelberg Dordrecht London New York
Library of Congress Control Number: 2012931412
Math. Subj. Class. (2010): 35-06, 49-06, 60-06, 65-06, 68-06, 70-06, 76-06, 86-06, 90-06, 93-06, 94-06

© Springer-Verlag Berlin Heidelberg 2012

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Front cover figure: The Huc Bridge on Hoan Kiem Lake, Hanoi. By courtesy of Johannes P. Schlöder.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Preface

High Performance Scientific Computing is an interdisciplinary area that combines many fields such as mathematics and computer science as well as scientific and engineering applications. It is an enabling technology both for competitiveness in industrialized countries and for speeding up development in emerging countries. High performance scientific computing develops methods for modeling, computer-aided simulation, and optimization of systems and processes. In practical applications in industry and commerce, science and engineering, it helps to conserve resources, to avoid pollution, to reduce risks and costs, to improve product quality, to shorten development times, or simply to operate systems better.

Topical aspects of scientific computing were presented and discussed at the Fourth International Conference on High Performance Scientific Computing held at the Institute of Mathematics, Vietnam Academy of Science and Technology (VAST), March 2–6, 2009. The conference was organized by the Institute of Mathematics of VAST, the Interdisciplinary Center for Scientific Computing (IWR) of the University of Heidelberg, and Ho Chi Minh City University of Technology. More than 200 participants from countries all over the world attended the conference. The scientific program consisted of more than 140 talks, 10 of which were invited plenary lectures given by Robert E. Bixby (Houston), Olaf Deutschmann (Karlsruhe), Iain Duff (Chilton), Roland Eils (Heidelberg), László Lovász (Budapest), Peter Markowich (Cambridge & Vienna), Volker Mehrmann (Berlin), Alfio Quarteroni (Lausanne & Milan), Horst Simon (Berkeley), and Ya-xiang Yuan (Beijing).

Topics included mathematical modeling, numerical simulation, methods for optimization and control, parallel computing, software development, and applications of scientific computing in physics, mechanics, hydrology, chemistry, biology, medicine, transport, logistics, site location, communication networks, scheduling, industry, business, and finance. This proceedings volume contains 27 carefully selected contributions referring to lectures presented at the conference. We would like to thank all authors and the referees.


Special thanks go to the sponsors whose support significantly contributed to the success of the conference:

+ Heidelberg Graduate School of Mathematical and Computational Methods for the Sciences
+ Daimler and Benz Foundation, Ladenburg
+ The International Council for Industrial and Applied Mathematics (ICIAM)
+ Berlin Mathematical School
+ Berlin/Brandenburg Academy of Sciences and Humanities
+ The Abdus Salam International Centre for Theoretical Physics, Trieste
+ Institute of Mathematics, Vietnam Academy of Science and Technology
+ Faculty of Computer Science and Engineering, HCMC University of Technology

Heidelberg

Hans Georg Bock
Hoang Xuan Phu
Rolf Rannacher
Johannes P. Schlöder

Contents

A Cutting Hyperplane Method for Generalized Monotone Nonlipschitzian Multivalued Variational Inequalities
Pham Ngoc Anh and Takahito Kuno

Robust Parameter Estimation Based on Huber Estimator in Systems of Differential Equations
Tanja Binder and Ekaterina Kostina

Comparing MIQCP Solvers to a Specialised Algorithm for Mine Production Scheduling
Andreas Bley, Ambros M. Gleixner, Thorsten Koch, and Stefan Vigerske

A Binary Quadratic Programming Approach to the Vehicle Positioning Problem
Ralf Borndörfer and Carlos Cardonha

Determining Fair Ticket Prices in Public Transport by Solving a Cost Allocation Problem
Ralf Borndörfer and Nam-Dũng Hoàng

A Domain Decomposition Method for Strongly Mixed Boundary Value Problems for the Poisson Equation
Dang Quang A and Vu Vinh Quang

Detecting, Monitoring and Preventing Database Security Breaches in a Housing-Based Outsourcing Model
Tran Khanh Dang, Tran Thi Que Nguyet, and Truong Quynh Chi

Real-Time Sequential Convex Programming for Optimal Control Applications
Tran Dinh Quoc, Carlo Savorgnan, and Moritz Diehl

SuperQuant Financial Benchmark Suite for Performance Analysis of Grid Middlewares
Abhijeet Gaikwad, Viet Dung Doan, Mireille Bossy, Françoise Baude, and Frédéric Abergel

A Dimension Adaptive Combination Technique Using Localised Adaptation Criteria
Jochen Garcke

Haralick's Texture Features Computation Accelerated by GPUs for Biological Applications
Markus Gipp, Guillermo Marcus, Nathalie Harder, Apichat Suratanee, Karl Rohr, Rainer König, and Reinhard Männer

Free-Surface Flows over an Obstacle: Problem Revisited
Panat Guayjarernpanishk and Jack Asavanant

The Relation Between the Gene Network and the Physical Structure of Chromosomes
Dieter W. Heermann, Manfred Bohn, and Philipp M. Diesinger

Generalized Bilinear System Identification with Coupling Force Variables
Jer-Nan Juang

Reduced-Order Wave-Propagation Modeling Using the Eigensystem Realization Algorithm
Stephen A. Ketcham, Minh Q. Phan, and Harley H. Cudney

Complementary Condensing for the Direct Multiple Shooting Method
Christian Kirches, Hans Georg Bock, Johannes P. Schlöder, and Sebastian Sager

Some Inverse Problem for the Polarized-Radiation Transfer Equation
A.E. Kovtanyuk and I.V. Prokhorov

Finite and Boundary Element Energy Approximations of Dirichlet Control Problems
Günther Of, Thanh Xuan Phan, and Olaf Steinbach

Application of High Performance Computational Fluid Dynamics to Nose Flow
I. Pantle and M. Gabi

MaxNet and TCP Reno/RED on Mice Traffic
Khoa T. Phan, Tuan T. Tran, Duc D. Nguyen, and Nam Thoai

Superstable Models for Short-Duration Large-Domain Wave Propagation
Minh Q. Phan, Stephen A. Ketcham, Richard S. Darling, and Harley H. Cudney

Discontinuous Galerkin as Time-Stepping Scheme for the Navier–Stokes Equations
Th. Richter

Development of a Three Dimensional Euler Solver Using the Finite Volume Method on a Multiblock Structured Grid
Tran Thanh Tinh, Dang Thai Son, and Nguyen Anh Thi

Hybrid Algorithm for Risk Conscious Chemical Batch Planning Under Uncertainty
Thomas Tometzki and Sebastian Engell

On Isogeometric Analysis and Its Usage for Stress Calculation
Anh-Vu Vuong and B. Simeon

On the Efficient Evaluation of Higher-Order Derivatives of Real-Valued Functions Composed of Matrix Operations
Sebastian F. Walter

Modeling of Non-ideal Variable Pitch Valve Springs for Use in Automotive Cam Optimization
Henry Yau and Richard W. Longman


A Cutting Hyperplane Method for Generalized Monotone Nonlipschitzian Multivalued Variational Inequalities

Pham Ngoc Anh and Takahito Kuno

Abstract We present a new method for solving multivalued variational inequalities, where the underlying function is upper semicontinuous and satisfies a certain generalized monotonicity assumption. First, we construct an appropriate hyperplane which separates the current iterate from the solution set. Then the next iterate is obtained as the projection of the current iterate onto the intersection of the feasible set with the halfspace containing the solution set. We also analyze the global convergence of the algorithm under minimal assumptions.

Keywords Multivalued variational inequalities • Generalized monotone • Upper semicontinuous

P.N. Anh
Department of Scientific Fundamentals, Posts and Telecommunications Institute of Technology, Hanoi, Vietnam
e-mail: [email protected]

T. Kuno
Graduate School of Systems and Information Engineering, University of Tsukuba, Ibaraki, Japan
e-mail: [email protected]

H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_1, © Springer-Verlag Berlin Heidelberg 2012

1 Introduction

We consider the classical multivalued variational inequality problem (see e.g. [7, 8, 11]), MVI for short, which is to find points $x^* \in C$ and $w^* \in F(x^*)$ such that

$$\langle w^*, x - x^* \rangle \ge 0 \quad \forall x \in C,$$

where $C$ is a closed convex subset of $\mathbb{R}^n$, $F$ is a point-to-set mapping from $C$ into subsets of $\mathbb{R}^n$, and $\langle \cdot, \cdot \rangle$ denotes the usual inner product in $\mathbb{R}^n$.

Various methods have been developed so far to solve variational inequality problems (see e.g. [3, 5, 9, 10, 12]). In general, these methods assume the underlying mapping to be single-valued and monotone, and hence cannot be applied directly to MVI, where $F$ is multivalued. Unfortunately, there are still not many methods for solving MVI (see e.g. [4, 10]). Most of these methods require that $F$ is either Lipschitz with respect to the Hausdorff distance or strongly monotone on $C$. However, neither the Lipschitz constant nor the strong monotonicity constant is easy to compute.

In a recent paper [4], Anh et al. proposed a generalized projection method to solve MVI whose mapping $F$ is not assumed to be Lipschitz. The main features of their method are that at each iteration at most one projection onto $C$ is needed, and that the search direction can be determined from any point in the image of the current iterate. In [1, 2], Anh also proposed an interior proximal method for solving monotone generalized variational inequalities and pseudomonotone multivalued variational inequalities when $C$ is a polyhedron. His method is based on a special interior-quadratic function which replaces the usual quadratic function. This leads to an interior proximal type algorithm, which can be viewed as combining an Armijo-type line-search technique and the special interior-quadratic function. The only assumption required is that $F$ is monotone on $C$.

Our main tool in this paper is the projection operator onto the closed convex set $C$,

$$\operatorname{Pr}_C(x) = \arg\min_{y \in C} \|y - x\|.$$

We propose an algorithm for solving MVI, making no assumptions on the problem other than upper semicontinuity, compact-valuedness and a certain generalized monotonicity of $F$. In Sect. 2, we give formal definitions of our target MVI and the generalized monotonicity of $F$. We then extend an idea often used for single-valued variational inequalities to MVI and develop an iterative algorithm. Section 3 is devoted to the proof of its global convergence to a solution of MVI. An application to nonlinear complementarity problems is discussed in the last section.

2 Generalized Monotonicity and Algorithm

Let $F : \mathbb{R}^n \to 2^{\mathbb{R}^n}$ be a mapping upper semicontinuous on a closed convex set $C \subseteq \mathbb{R}^n$, which is a subset of $\operatorname{dom} F = \{x \in \mathbb{R}^n \mid F(x) \ne \emptyset\}$. Again, let us write out the target problem:

(MVI) Find points $x^* \in C$ and $w^* \in F(x^*)$ such that

$$\langle w^*, x - x^* \rangle \ge 0 \quad \forall x \in C.$$

For simplicity, we assume MVI to have a solution $(x^*, w^*)$. Let $S \subseteq C$ denote the first component of the solution set.

Now we recall the well-known definition of generalized monotonicity of mappings, which will be required in the following analysis (see e.g. [13]). We assume that the mapping $F$ of MVI satisfies this condition.

Definition 1. $F$ is called generalized monotone on $C$ if

$$\langle w, x - x^* \rangle \ge 0 \quad \forall w \in F(x),\ \forall x \in C.$$

It is clear that $F$ is generalized monotone if $F$ is monotone, i.e.,

$$\langle w - w', x - x' \rangle \ge 0 \quad \forall x, x' \in C,\ w \in F(x),\ w' \in F(x').$$

More generally, $F$ is also generalized monotone if $F$ is pseudomonotone, i.e., for all $x, x' \in C$, $w \in F(x)$, $w' \in F(x')$,

$$\langle w', x - x' \rangle \ge 0 \ \Longrightarrow\ \langle w, x - x' \rangle \ge 0.$$

However, even if $F$ is generalized monotone, $F$ might be neither monotone nor pseudomonotone. It is not difficult to find such examples (see e.g. [13]).

If $F$ is a point-to-point mapping, then MVI can be formulated as the following variational inequality:

(VI) Find $x^* \in C$ such that

$$\langle F(x^*), x - x^* \rangle \ge 0 \quad \forall x \in C.$$

In this case, it is known that solutions coincide with zeros of the following projected residual function

$$T(x) = x - \operatorname{Pr}_C(x - F(x)).$$

In other words, $x' \in C$ is a solution of (VI) if and only if $T(x') = 0$ (see e.g. [12]). Applying this idea to the multivalued variational inequality MVI, we obtain the following solution scheme. Let $x^k$ be a current approximation to the solution of MVI. First, we compute $w^k = \arg\sup_{w \in F(x^k)} \langle w, x^k \rangle$ and $\operatorname{Pr}_C(x^k - c w^k)$ for some positive constant $c$. Next, we search the line segment between $x^k$ and $\operatorname{Pr}_C(x^k - c w^k)$ for a point $(\bar{w}^k, z^k)$ such that the hyperplane $\partial H_k = \{x \in \mathbb{R}^n \mid \langle \bar{w}^k, x - z^k \rangle = 0\}$ strictly separates $x^k$ from the solution set $S$ of the problem. To find such $(\bar{w}^k, z^k)$, we may use a computationally inexpensive Armijo-type procedure. Then we compute the next iterate $x^{k+1}$ by projecting $x^k$ onto the intersection of the feasible set $C$ with the halfspace $H_k = \{x \in \mathbb{R}^n \mid \langle \bar{w}^k, x - z^k \rangle \le 0\}$. The algorithm is then described as follows.

Algorithm 1

Step 0. Choose $\sigma > 0$, $x^0 \in C$, $w^0 \in F(x^0)$, $0 < c < 1/\sigma$, and $\gamma \in (0, 1)$.

Step 1. Compute

$$w^k := \arg\sup_{w \in F(x^k)} \langle w, x^k \rangle, \qquad r(x^k) := x^k - \operatorname{Pr}_C(x^k - c w^k).$$

Let $G_k(m) := F\bigl(x^k - \gamma^m r(x^k)\bigr)$ for an integer $m$, and find the smallest nonnegative integer $m_k$ such that

$$v_k := \sup_{w \in G_k(m_k)} \langle w, r(x^k) \rangle \ge \sigma \|r(x^k)\|^2. \tag{1}$$

Set $z^k := x^k - \gamma^{m_k} r(x^k)$. Choose $\bar{w}^k \in G_k(m_k)$ such that $\langle \bar{w}^k, r(x^k) \rangle = v_k$.

Step 2. Set $H_k := \{x \in \mathbb{R}^n \mid \langle \bar{w}^k, x - z^k \rangle \le 0\}$. Find $x^{k+1} := \operatorname{Pr}_{C \cap H_k}(x^k)$.

Step 3. Set $k := k + 1$, and go to Step 1.
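The steps of Algorithm 1 can be sketched in Python for a deliberately simple special case. The sketch assumes that $F$ is single-valued and continuous (so the suprema reduce to function evaluations) and that $C$ is the nonnegative orthant. The names `sigma`, `c` and `gamma` are labels chosen here for the three constants of Step 0, and the test map `F(x) = x - a` as well as all helper names are hypothetical. The projection onto $C \cap H_k$ is computed by bisection on a Lagrange multiplier, which is one possible choice and is not prescribed by the algorithm.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def proj_orthant(x):
    # Projection onto C = {y : y >= 0}
    return [max(0.0, xi) for xi in x]

def proj_orthant_halfspace(x, a, b, iters=80):
    """Project x onto {y >= 0, <a, y> <= b} by bisection on the multiplier."""
    y = proj_orthant(x)
    if dot(a, y) <= b:
        return y

    def g(lam):
        # Constraint violation of the candidate Pr_C(x - lam * a)
        return dot(a, proj_orthant([xi - lam * ai for xi, ai in zip(x, a)])) - b

    hi = 1.0
    while g(hi) > 0.0 and hi < 1e12:   # bracket the multiplier
        hi *= 2.0
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return proj_orthant([xi - hi * ai for xi, ai in zip(x, a)])

def cutting_hyperplane(F, x0, sigma=0.5, c=1.0, gamma=0.5, tol=1e-10, max_iter=500):
    # Requires 0 < c < 1/sigma and 0 < gamma < 1 (Step 0)
    x = list(x0)
    for _ in range(max_iter):
        w = F(x)
        shifted = proj_orthant([xi - c * wi for xi, wi in zip(x, w)])
        r = [xi - si for xi, si in zip(x, shifted)]      # residual r(x^k)
        rr = dot(r, r)
        if math.sqrt(rr) <= tol:
            break
        step = 1.0                                       # gamma**m, m = 0, 1, ...
        while True:                                      # Armijo-type search (Step 1)
            z = [xi - step * ri for xi, ri in zip(x, r)]
            w_bar = F(z)
            if dot(w_bar, r) >= sigma * rr:
                break
            step *= gamma
        # Step 2: project onto C intersected with {y : <w_bar, y - z> <= 0}
        x = proj_orthant_halfspace(x, w_bar, dot(w_bar, z))
    return x

# Hypothetical test problem: F(x) = x - (1, -1), whose solution on the orthant is (1, 0)
solution = cutting_hyperplane(lambda x: [x[0] - 1.0, x[1] + 1.0], [0.0, 0.0])
```

For a genuinely multivalued $F$, the two function evaluations would have to be replaced by the suprema over $F(x^k)$ and $G_k(m)$, which this sketch does not attempt.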

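3 Convergence of the Algorithm

Let us discuss the global convergence of Algorithm 1.

Lemma 1 Let $\{x^k\}$ be the sequence generated by Algorithm 1. Then the following hold:
(i) if $r(x^k) = 0$, then $x^k \in S$;
(ii) $x^k \notin H_k$ and $S \subseteq C \cap H_k$;
(iii) $x^{k+1} = \operatorname{Pr}_{C \cap H_k}(y^k)$, where $y^k = \operatorname{Pr}_{H_k}(x^k)$.

Proof. (i) It follows from $r(x^k) = 0$ that $\operatorname{Pr}_C(x^k - c w^k) = x^k$. Then

$$\langle x^k - c w^k - x^k, x - x^k \rangle \le 0 \quad \forall x \in C.$$

Hence,

$$\langle w^k, x - x^k \rangle \ge 0 \quad \forall x \in C.$$

(ii) By noting $r(x^k) \ne 0$, we have

$$\langle \bar{w}^k, x^k - z^k \rangle = \langle \bar{w}^k, x^k - (x^k - \gamma^{m_k} r(x^k)) \rangle = \langle \bar{w}^k, \gamma^{m_k} r(x^k) \rangle \ge \gamma^{m_k} \sigma \|r(x^k)\|^2 > 0.$$

This implies $x^k \notin H_k$. Since $F$ is assumed to be generalized monotone,

$$\langle \bar{w}^k, z^k - x^* \rangle \ge 0 \ \Longrightarrow\ \langle \bar{w}^k, x^* - z^k \rangle \le 0 \ \Longrightarrow\ x^* \in H_k.$$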

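(iii) We know that for a halfspace $H = \{x \in \mathbb{R}^n \mid \langle w, x - x^0 \rangle \le 0\}$ and any $y \notin H$,

$$\operatorname{Pr}_H(y) = y - \frac{\langle w, y - x^0 \rangle}{\|w\|^2}\, w.$$

Hence,

$$y^k = \operatorname{Pr}_{H_k}(x^k) = x^k - \frac{\langle \bar{w}^k, x^k - z^k \rangle}{\|\bar{w}^k\|^2}\, \bar{w}^k = x^k - \frac{\gamma^{m_k} \langle \bar{w}^k, r(x^k) \rangle}{\|\bar{w}^k\|^2}\, \bar{w}^k.$$

Otherwise, for every $y \in C \cap H_k$ there exists $\lambda \in (0, 1)$ such that

$$\hat{x} = \lambda x^k + (1 - \lambda) y \in C \cap \partial H_k,$$

where $\partial H_k = \{x \in \mathbb{R}^n \mid \langle \bar{w}^k, x - z^k \rangle = 0\}$, because $x^k \in C$ but $x^k \notin H_k$. Therefore,

$$\begin{aligned}
\|y - y^k\|^2 &\ge (1 - \lambda)^2 \|y - y^k\|^2 \\
&= \|\hat{x} - \lambda x^k - (1 - \lambda) y^k\|^2 \\
&= \|(\hat{x} - y^k) - \lambda (x^k - y^k)\|^2 \\
&= \|\hat{x} - y^k\|^2 + \lambda^2 \|x^k - y^k\|^2 - 2\lambda \langle \hat{x} - y^k, x^k - y^k \rangle \\
&= \|\hat{x} - y^k\|^2 + \lambda^2 \|x^k - y^k\|^2 \\
&\ge \|\hat{x} - y^k\|^2,
\end{aligned} \tag{2}$$

because $y^k = \operatorname{Pr}_{H_k}(x^k)$. Also we have

$$\|\hat{x} - x^k\|^2 = \|\hat{x} - y^k + y^k - x^k\|^2 = \|\hat{x} - y^k\|^2 - 2\langle \hat{x} - y^k, x^k - y^k \rangle + \|y^k - x^k\|^2 = \|\hat{x} - y^k\|^2 + \|y^k - x^k\|^2.$$

Since $x^{k+1} = \operatorname{Pr}_{C \cap H_k}(x^k)$, using the Pythagorean theorem we can reduce this to the following:

$$\|\hat{x} - y^k\|^2 = \|\hat{x} - x^k\|^2 - \|y^k - x^k\|^2 \ge \|x^{k+1} - x^k\|^2 - \|y^k - x^k\|^2 = \|x^{k+1} - y^k\|^2. \tag{3}$$

From (2) and (3), we have

$$\|x^{k+1} - y^k\| \le \|y - y^k\| \quad \forall y \in C \cap H_k,$$

which means $x^{k+1} = \operatorname{Pr}_{C \cap H_k}(y^k)$. $\square$

Using Lemma 1, we can prove the global convergence of Algorithm 1 under moderate assumptions.

Theorem 2 (Convergence theorem). Let $F$ be upper semicontinuous, compact valued and generalized monotone on $C$. Suppose the solution set $S$ of MVI is nonempty. Then any sequence $\{x^k\}$ generated by Algorithm 1 converges to a solution of MVI.

Proof. Suppose that $\|r(x^k)\| > 0$. First, we need to show the existence of the smallest nonnegative integer $m_k$ such that

$$\sup_{w \in G_k(m_k)} \langle w, r(x^k) \rangle \ge \sigma \|r(x^k)\|^2,$$

where $G_k(m_k) = F\bigl(x^k - \gamma^{m_k} r(x^k)\bigr)$. Assume on the contrary that this inequality is not satisfied for any nonnegative integer $i$, i.e.,

$$\sup_{w \in G_k(i)} \langle w, r(x^k) \rangle < \sigma \|r(x^k)\|^2 \iff \langle w, r(x^k) \rangle < \sigma \|r(x^k)\|^2 \quad \forall w \in G_k(i).$$

As $i \to \infty$, from the upper semicontinuity of $F$ we have

$$\langle w, r(x^k) \rangle \le \sigma \|r(x^k)\|^2 \quad \forall w \in F(x^k). \tag{4}$$

Since

$$\langle x - \operatorname{Pr}_C(x), z - \operatorname{Pr}_C(x) \rangle \le 0 \quad \forall x \in \mathbb{R}^n,\ z \in C,$$

we have

$$\langle x^k - c w^k - \operatorname{Pr}_C(x^k - c w^k),\ x^k - \operatorname{Pr}_C(x^k - c w^k) \rangle \le 0,$$

by taking $x = x^k - c w^k$ and $z = x^k$. This means

$$\langle r(x^k) - c w^k, r(x^k) \rangle \le 0 \ \Longrightarrow\ \|r(x^k)\|^2 \le c \langle w^k, r(x^k) \rangle. \tag{5}$$

From (4) and (5), we have

$$\|r(x^k)\|^2 \le c \langle w^k, r(x^k) \rangle \le c \sigma \|r(x^k)\|^2 \ \Longrightarrow\ c \ge \frac{1}{\sigma}.$$

This contradicts the choice $0 < c < 1/\sigma$.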

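We next show that the sequence $\{x^k\}$ is bounded. Since $x^{k+1} = \operatorname{Pr}_{C \cap H_k}(y^k)$, we have

$$\langle y^k - x^{k+1}, z - x^{k+1} \rangle \le 0 \quad \forall z \in C \cap H_k.$$

Substituting $z = x^* \in C \cap H_k$, we have

$$\langle y^k - x^{k+1}, x^* - x^{k+1} \rangle \le 0 \iff \langle y^k - x^{k+1}, x^* - y^k + y^k - x^{k+1} \rangle \le 0,$$

which implies

$$\|x^{k+1} - y^k\|^2 \le \langle x^{k+1} - y^k, x^* - y^k \rangle.$$

Hence,

$$\begin{aligned}
\|x^{k+1} - x^*\|^2 &= \|x^{k+1} - y^k + y^k - x^*\|^2 \\
&= \|x^{k+1} - y^k\|^2 + \|y^k - x^*\|^2 + 2\langle x^{k+1} - y^k, y^k - x^* \rangle \\
&\le \langle x^* - y^k, x^{k+1} - y^k \rangle + \|y^k - x^*\|^2 + 2\langle x^{k+1} - y^k, y^k - x^* \rangle \\
&= \|y^k - x^*\|^2 + \langle x^{k+1} - y^k, y^k - x^* \rangle \\
&\le \|y^k - x^*\|^2 - \|x^{k+1} - y^k\|^2.
\end{aligned} \tag{6}$$

Since $z^k = x^k - \gamma^{m_k} r(x^k)$ and

$$y^k = \operatorname{Pr}_{H_k}(x^k) = x^k - \frac{\langle \bar{w}^k, x^k - z^k \rangle}{\|\bar{w}^k\|^2}\, \bar{w}^k,$$

we have

$$\begin{aligned}
\|y^k - x^*\|^2 &= \|x^k - x^*\|^2 + \frac{\langle \bar{w}^k, x^k - z^k \rangle^2}{\|\bar{w}^k\|^4} \|\bar{w}^k\|^2 - \frac{2\langle \bar{w}^k, x^k - z^k \rangle}{\|\bar{w}^k\|^2} \langle \bar{w}^k, x^k - x^* \rangle \\
&= \|x^k - x^*\|^2 + \left( \frac{\gamma^{m_k} \langle \bar{w}^k, r(x^k) \rangle}{\|\bar{w}^k\|} \right)^2 - \frac{2\gamma^{m_k} \langle \bar{w}^k, r(x^k) \rangle}{\|\bar{w}^k\|^2} \langle \bar{w}^k, x^k - x^* \rangle \\
&= \|x^k - x^*\|^2 - \left( \frac{\gamma^{m_k} \langle \bar{w}^k, r(x^k) \rangle}{\|\bar{w}^k\|} \right)^2 - \frac{2\gamma^{m_k} \langle \bar{w}^k, r(x^k) \rangle}{\|\bar{w}^k\|^2} \Bigl[ \langle \bar{w}^k, x^k - x^* \rangle - \gamma^{m_k} \langle \bar{w}^k, r(x^k) \rangle \Bigr] \\
&= \|x^k - x^*\|^2 - \left( \frac{\gamma^{m_k} \langle \bar{w}^k, r(x^k) \rangle}{\|\bar{w}^k\|} \right)^2 - \frac{2\gamma^{m_k} \langle \bar{w}^k, r(x^k) \rangle}{\|\bar{w}^k\|^2} \langle \bar{w}^k, x^k - x^* - \gamma^{m_k} r(x^k) \rangle \\
&= \|x^k - x^*\|^2 - \left( \frac{\gamma^{m_k} \langle \bar{w}^k, r(x^k) \rangle}{\|\bar{w}^k\|} \right)^2 - \frac{2\gamma^{m_k} \langle \bar{w}^k, r(x^k) \rangle}{\|\bar{w}^k\|^2} \langle \bar{w}^k, z^k - x^* \rangle.
\end{aligned} \tag{7}$$

From the generalized monotonicity of $F$ we see that $\langle \bar{w}^k, z^k - x^* \rangle \ge 0$. Moreover, $\bar{w}^k \in F(z^k)$ and (1) imply

$$\langle \bar{w}^k, r(x^k) \rangle \ge \sigma \|r(x^k)\|^2.$$

Thus, (7) reduces to

$$\|y^k - x^*\|^2 \le \|x^k - x^*\|^2 - \left( \frac{\gamma^{m_k} \langle \bar{w}^k, r(x^k) \rangle}{\|\bar{w}^k\|} \right)^2 \le \|x^k - x^*\|^2 - \left( \frac{\gamma^{m_k} \sigma}{\|\bar{w}^k\|} \right)^2 \|r(x^k)\|^4. \tag{8}$$

Combining (6) and (8), we obtain

$$\|x^{k+1} - x^*\|^2 \le \|x^k - x^*\|^2 - \|x^{k+1} - y^k\|^2 - \left( \frac{\gamma^{m_k} \sigma}{\|\bar{w}^k\|} \right)^2 \|r(x^k)\|^4. \tag{9}$$

This implies that the sequence $\{\|x^k - x^*\|\}$ is nonincreasing and hence convergent. Consequently, the sequence $\{x^k\}$ is bounded. Since $w^k \in F(x^k)$, $r(x^k) = x^k - \operatorname{Pr}_C(x^k - c w^k)$, $z^k = x^k - \gamma^{m_k} r(x^k)$, and $F$ is upper semicontinuous and compact valued on $C$, the sequence $\{z^k\}$ is also bounded (see e.g. [6]). Hence, the sequence $\{F(z^k)\}$ is bounded, i.e., there exists $M > 0$ such that

$$\|w^k\| \le M \quad \forall w^k \in F(z^k).$$

This, together with (9), implies

$$\|x^{k+1} - x^*\|^2 \le \|x^k - x^*\|^2 - \|x^{k+1} - y^k\|^2 - \left( \frac{\gamma^{m_k} \sigma}{M} \right)^2 \|r(x^k)\|^4. \tag{10}$$

Since the sequence $\{\|x^k - x^*\|\}$ is convergent, it follows from (10) that

$$\lim_{k \to \infty} \gamma^{m_k} \|r(x^k)\| = 0.$$

The cases remaining to consider are the following.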

Case 1. $\limsup_{k \to \infty} \gamma^{m_k} > 0$.

In this case we must have $\liminf_{k \to \infty} \|r(x^k)\| = 0$. Since $x - \operatorname{Pr}_C(x - cF(x))$ is upper semicontinuous on $C$ and $\{x^k\}$ is bounded, there exists an accumulation point $\bar{x}$ of $\{x^k\}$. In other words, a subsequence $\{x^{k_i}\}$ converges to some $\bar{x}$ such that $r(\bar{x}) = 0$ as $i \to \infty$. Then we see from Lemma 1 that $\bar{x} \in S$, and we can take $x^* = \bar{x}$, in particular in (10). Thus $\{\|x^k - \bar{x}\|\}$ is a convergent sequence. Since $\bar{x}$ is an accumulation point of $\{x^k\}$, the sequence $\{\|x^k - \bar{x}\|\}$ converges to zero, i.e., $\{x^k\}$ converges to $\bar{x} \in S$.

Case 2. $\lim_{k \to \infty} \gamma^{m_k} = 0$.

Since $m_k$ is the smallest nonnegative integer satisfying (1), $m_k - 1$ does not satisfy (1). Hence, we have

$$\langle w, r(x^k) \rangle < \sigma \|r(x^k)\|^2 \quad \forall w \in F\bigl(x^k - \gamma^{m_k - 1} r(x^k)\bigr),$$

and in particular

$$\langle w, r(x^{k_i}) \rangle < \sigma \|r(x^{k_i})\|^2 \quad \forall w \in F\bigl(x^{k_i} - \gamma^{m_{k_i} - 1} r(x^{k_i})\bigr). \tag{11}$$

Passing to the limit in (11) as $i \to \infty$ and using the upper semicontinuity of $F$, we have

$$\langle w, r(\bar{x}) \rangle \le \sigma \|r(\bar{x})\|^2 \quad \forall w \in F(\bar{x}). \tag{12}$$

From (5) we have

$$\|r(x^{k_i})\|^2 \le c \langle w^{k_i}, r(x^{k_i}) \rangle.$$

Since $F$ is upper semicontinuous, passing to the limit as $i \to \infty$ we obtain

$$\|r(\bar{x})\|^2 \le c \langle \bar{w}, r(\bar{x}) \rangle.$$

Combining this with (12), we have

$$\|r(\bar{x})\|^2 \le c \langle \bar{w}, r(\bar{x}) \rangle \le c \sigma \|r(\bar{x})\|^2,$$

which, since $c\sigma < 1$, implies $r(\bar{x}) = 0$, and hence $\bar{x} \in S$. Letting $x^* = \bar{x}$ and repeating the previous arguments, we conclude that the whole sequence $\{x^k\}$ converges to $\bar{x} \in S$. This completes the proof. $\square$

4 An Application to Nonlinear Complementarity Problems

It is well known [8] that when $C = \mathbb{R}^n_+$ is a closed convex cone, MVI becomes the nonlinear complementarity problem, NCP for short:

Find $x^* \in C$ such that $F(x^*) \in C^*$, $\langle F(x^*), x^* \rangle = 0$,

where $F : C \to \mathbb{R}^n$ and

$$C^* := \{w : \langle w, x \rangle \ge 0 \ \forall x \in C\}$$

is the polar cone of $C$. We apply Algorithm 1 to the complementarity problem NCP. Since $F$ is single-valued in this case, we have

$$w^k = F(x^k), \qquad r(x^k) = x^k - \operatorname{Pr}_C(x^k - c w^k),$$

and the algorithm for NCP can be detailed as follows.

Algorithm 2

Step 0. Choose $\sigma > 0$, $x^0 \in C$, $w^0 = F(x^0)$, $0 < c < 1/\sigma$, and $\gamma \in (0, 1)$.

Step 1. Compute $w^k$ and $r(x^k)$. Find the smallest nonnegative integer $m_k$ such that

$$\langle F(x^k - \gamma^{m_k} r(x^k)), r(x^k) \rangle \ge \sigma \|r(x^k)\|^2.$$

Set $z^k := x^k - \gamma^{m_k} r(x^k)$.

Step 2. Set $H_k := \{x \in \mathbb{R}^n \mid \langle F(z^k), x - z^k \rangle \le 0\}$. Find $x^{k+1} := \operatorname{Pr}_{C \cap H_k}(x^k)$.

Step 3. Set $k := k + 1$, and go to Step 1.

Validity and convergence of this algorithm follow immediately from those of Algorithm 1.
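As a small illustration of the complementarity setting, the following Python sketch checks the NCP conditions on the nonnegative orthant and evaluates the residual $r(x) = x - \operatorname{Pr}_C(x - cF(x))$, which for this cone reduces componentwise to $\min(x_i, cF_i(x))$. The function names and the affine test map are hypothetical and serve only as an example.

```python
def ncp_residual(F, x, c=1.0):
    """Residual x - Pr_C(x - c F(x)) for C = nonnegative orthant.

    Componentwise, x_i - max(0, x_i - c f_i) equals min(x_i, c f_i).
    """
    return [min(xi, c * fi) for xi, fi in zip(x, F(x))]

def is_ncp_solution(F, x, tol=1e-8):
    """Check x >= 0, F(x) >= 0 and <F(x), x> = 0 up to a tolerance."""
    fx = F(x)
    feasible = all(xi >= -tol for xi in x) and all(fi >= -tol for fi in fx)
    complementary = abs(sum(xi * fi for xi, fi in zip(x, fx))) <= tol
    return feasible and complementary

# Hypothetical test map F(x) = x - (1, -1); its NCP solution is (1, 0)
def F_example(x):
    return [x[0] - 1.0, x[1] + 1.0]

residual_at_solution = ncp_residual(F_example, [1.0, 0.0])
residual_at_origin = ncp_residual(F_example, [0.0, 0.0])
```

At the solution the residual vanishes, while at the origin it equals $(-1, 0)$, reflecting that $F$ has a negative component there.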

Acknowledgements The authors would like to thank the referee for his/her useful comments, remarks, questions and constructive suggestions, which helped us very much in revising the paper. This work is supported in part by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) and the Grant-in-Aid for Scientific Research (B) 20310082 from the Japan Society for the Promotion of Science.

References

1. Anh P. N.: An interior proximal method for solving pseudomonotone nonlipschitzian multivalued variational inequalities, Nonlinear Analysis Forum, 14, 27–42 (2009).
2. Anh P. N.: An interior proximal method for solving monotone generalized variational inequalities, East-West Journal of Mathematics, 10, 81–100 (2008).
3. Anh P. N., and Muu L. D.: Coupling the Banach contraction mapping principle and the proximal point algorithm for solving monotone variational inequalities, Acta Mathematica Vietnamica, 29, 119–133 (2004).
4. Anh P. N., Muu L. D., and Strodiot J. J.: Generalized Projection Method for Non-Lipschitz Multivalued Monotone Variational Inequalities, Acta Mathematica Vietnamica, 34, 67–79 (2009).
5. Anh P. N., Muu L. D., Nguyen V. H., and Strodiot J. J.: On the Contraction and Nonexpansiveness Properties of the Marginal Mappings in Generalized Variational Inequalities Involving Co-coercive Operators. In: Eberhard, A., Hadjisavvas, N. and Luc, D. T. (eds) Generalized Convexity and Monotonicity. Springer (2005).
6. Aubin J. P., and Ekeland I.: Applied Nonlinear Analysis, Wiley, New York (1984).
7. Daniele P., Giannessi F., and Maugeri A.: Equilibrium Problems and Variational Models, Kluwer (2003).
8. Facchinei F., and Pang J. S.: Finite-Dimensional Variational Inequalities and Complementarity Problems, Springer-Verlag, New York (2003).
9. Farouq N. El.: Pseudomonotone variational inequalities: convergence of the auxiliary problem method, J. of Optimization Theory and Applications, 111(2), 305–325 (2001).
10. Hai N. X., and Khanh P. Q.: Systems of set-valued quasivariational inclusion problems, J. of Optimization Theory and Applications, 135, 55–67 (2007).
11. Konnov I. V.: Combined Relaxation Methods for Variational Inequalities, Springer-Verlag, Berlin (2000).
12. Rockafellar R. T.: Monotone operators and the proximal point algorithm, SIAM J. Control Optimization, 14, 877–898 (1976).
13. Schaible S., Karamardian S., and Crouzeix J. P.: Characterizations of generalized monotone maps, J. of Optimization Theory and Applications, 76, 399–413 (1993).


Robust Parameter Estimation Based on Huber Estimator in Systems of Differential Equations

Tanja Binder and Ekaterina Kostina

Abstract The paper discusses the use of the Huber estimator for parameter estimation problems which are constrained by a system of ordinary differential equations. In particular, a local and global convergence analysis for the estimation with the Huber estimator is given. For comparison, numerical results for a parameter estimation problem for a chemical process are given for this estimator as well as for $l_1$ estimation and the least squares approach.

T. Binder, E. Kostina
Department of Mathematics and Computer Science, University of Marburg, Hans-Meerwein-Str., 35032 Marburg, Germany
e-mail: [email protected]; [email protected]

H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_2, © Springer-Verlag Berlin Heidelberg 2012

1 Motivation

Robustness in the context of the parameter estimation problems considered here means "insensitivity to small deviations from the assumptions", as defined by Huber [3]. Standard assumptions in parameter identification are, e.g., normally distributed and independent measurement data. But even high quality measurements are not exactly normally distributed; they are typically longer-tailed. In scientific routine data, i.e. data not taken with special care, about 1%–10% gross errors can be expected [2]. Such gross errors often show up as outliers, but not every outlier is a gross error; outliers may also arise for other reasons. Nevertheless, a single outlier can often completely spoil a least squares estimation.

This is where the Huber estimator comes into play. Essentially, it is a combination of the $l_1$ and $l_2$ criteria for parameter estimation. It is robust in the sense that it can reduce the influence of "wild" data points [5]. To do this, the Huber estimator minimizes a cost function $\rho(t)$ that evaluates a least squares term if the data point has an absolute value smaller than a given constant $\gamma$ and an absolute value term if the absolute value of the data point is greater than $\gamma$,

$$\rho(t) = \begin{cases} \dfrac{t^2}{2}, & |t| \le \gamma, \\[2mm] \gamma |t| - \dfrac{\gamma^2}{2}, & |t| > \gamma. \end{cases} \tag{1}$$

The partition constant $\gamma$ is linked to the ratio $\varepsilon$ of "bad" data points in the measurement data by the nonlinear equation [4, 5]

$$(1 - \varepsilon)^{-1} = \int_{-\gamma}^{\gamma} \varphi(t)\, dt + 2\varphi(\gamma)/\gamma,$$

where $\varphi(t) = (2\pi)^{-1/2} \exp(-\tfrac{1}{2} t^2)$ denotes the standard normal density. For $\varepsilon$ going to zero, the partition constant $\gamma$ tends to infinity and the solution of the Huber estimator converges to the solution of the least squares method. If the error probability $\varepsilon$ tends to one, then $\gamma$ approaches zero and the solution of the Huber estimator converges to the solution of the $l_1$ approximation.

The Huber estimator is more robust than the $l_2$ estimator in the sense that it is less sensitive to outliers in the measurement data. Via the partition constant $\gamma$, the Huber estimator even shows directly which data points are to be considered as outliers. Absolute value minimization also gives a robust estimation by interpolating the $n$ "best" data points and ignoring the rest, i.e. the outliers, where $n$ is the number of degrees of freedom. In contrast, the Huber estimator takes more measurements into account and thus uses more information about the system.
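To make the link between the contamination ratio and the partition constant concrete, the following Python sketch evaluates the Huber cost function and solves the nonlinear relation for the partition constant by bisection. The names `huber_rho`, `partition_constant`, `gamma` and `eps` are chosen here for illustration. The integral of the normal density over $[-\gamma, \gamma]$ is written as $\operatorname{erf}(\gamma/\sqrt{2})$, and the bisection relies on the right-hand side being strictly decreasing in $\gamma$ (its derivative is $-2\varphi(\gamma)/\gamma^2$).

```python
import math

def huber_rho(t, gamma):
    """Huber cost: quadratic for |t| <= gamma, linear beyond."""
    a = abs(t)
    if a <= gamma:
        return 0.5 * t * t
    return gamma * a - 0.5 * gamma * gamma

def normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def partition_constant(eps, lo=1e-8, hi=50.0, iters=200):
    """Solve 1/(1 - eps) = erf(g / sqrt(2)) + 2 * pdf(g) / g for g, 0 < eps < 1."""
    target = 1.0 / (1.0 - eps)

    def rhs(g):
        return math.erf(g / math.sqrt(2.0)) + 2.0 * normal_pdf(g) / g

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rhs(mid) > target:   # rhs is decreasing, so the root lies to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

gamma_5_percent = partition_constant(0.05)   # roughly 1.4
gamma_1_percent = partition_constant(0.01)
gamma_10_percent = partition_constant(0.10)
```

A smaller assumed share of bad data yields a larger partition constant, so more residuals fall into the quadratic branch, in line with the limiting behaviour described above.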

2 Parameter Estimation in Systems of Differential Equations

We assume that at time points $t_j$, $j = 1, \dots, N$, we are given measurement data $\eta_{ij}$, $i = 1, \dots, K_j$, which are the sum of some model response $h_i(t_j, x(t_j), p)$ and some unknown measurement errors $\varepsilon_{ij}$,

$$\eta_{ij} = h_i(t_j, x(t_j), p) + \varepsilon_{ij}, \quad i = 1, \dots, K_j, \quad j = 1, \dots, N,$$

where the function $x$ satisfies an ordinary differential equation

$$\dot{x}(t) = f(t, x(t), p) \quad \text{for } t \in [t_0, t_f], \qquad x(t_0) = x_0.$$

Here $x_0$ is given, and $p$ is a vector of unknown parameters which are to be estimated. To get the parameters $p$ we solve an optimization problem in which an appropriate function describing the deviation between model and data is to be minimized subject to constraints. In case of the Huber estimator this problem reads:

Robust Parameter Estimation Based on Huber Estimator

$$\min_{x(\cdot),\, p} \; \sum_{j=1}^{N} \sum_{i=1}^{K_j} \rho\big(\eta_{ij} - h_i(t_j, x(t_j), p)\big) \tag{2}$$

$$\text{s.t.} \quad \dot{x}(t) = f(t, x(t), p) \quad \forall t \in [t_0, t_f], \qquad r_C(x(t_0), \dots, x(t_f), p) = 0.$$

Besides the ODE constraint, equality constraints $r_C(x(t_0), \dots, x(t_f), p) = 0$ can hold, e.g., any initial, boundary and interior point conditions or similar restrictions which the solution of the problem has to satisfy.

2.1 Discretization of the Dynamics

Since the problem (2) is a parameter estimation problem with an ODE system in the constraints, we follow the boundary value problem approach [1]. This means we discretize the dynamics like a boundary value problem and solve the optimization problem, the boundary value problem (as equality constraints) and the further constraints simultaneously in one loop. The method of choice for the discretization of the ODE is the multiple shooting method. The integration interval of the ODE is divided into $M$ multiple shooting intervals $[\tau_k, \tau_{k+1}]$, $k = 0, \dots, M-1$, with $t_0 = \tau_0 < \dots < \tau_M = t_f$, where new unknowns $x(\tau_k) = s_k$ are introduced to represent the ODE state values at the end points of the intervals. Then $M$ initial value problems

$$\dot{x}(t; s_k, p) = f(t, x(t; s_k, p), p), \quad t \in [\tau_k, \tau_{k+1}], \qquad x(\tau_k; s_k, p) = s_k, \quad k = 0, \dots, M-1,$$

have to be solved. Of course, the solutions of these subproblems do not necessarily link together smoothly. Therefore we have to introduce $M$ additional continuity conditions, i.e. equality constraints at the endpoints of the multiple shooting intervals,

$$x(\tau_{k+1}; s_k, p) = s_{k+1},$$

to guarantee continuity of the overall solution $x(t; s_0, \dots, s_M, p)$, $t \in [t_0, t_f]$. Collecting the new unknowns $s_k$, $k = 0, \dots, M$, and the original parameters $p$ in a new parameter vector $s = (s_0^T, \dots, s_M^T, p^T)^T$, this procedure leads to an equality constrained, parametrized optimization problem:

$$\min_{s} \; \sum_{j=1}^{N} \sum_{i=1}^{K_j} \rho\big(\eta_{ij} - h_i(t_j, x(t_j; s), p)\big) \tag{3}$$

$$\text{s.t.} \quad r_k(s) := x(\tau_{k+1}; s_k, p) - s_{k+1} = 0, \quad k = 0, \dots, M-1, \qquad r_C(s) = 0.$$
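The continuity conditions $r_k(s)$ can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the scalar test model $\dot{x} = -p\,x$, the classical Runge-Kutta integrator, and all function names are chosen here for demonstration and do not come from the paper.

```python
def rk4(f, t0, t1, x0, p, steps=50):
    """Integrate x' = f(t, x, p) from t0 to t1 with classical RK4 (scalar state)."""
    h = (t1 - t0) / steps
    t, x = t0, x0
    for _ in range(steps):
        k1 = f(t, x, p)
        k2 = f(t + 0.5 * h, x + 0.5 * h * k1, p)
        k3 = f(t + 0.5 * h, x + 0.5 * h * k2, p)
        k4 = f(t + h, x + h * k3, p)
        x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += h
    return x


def continuity_residuals(f, tau, s, p):
    """r_k(s) = x(tau_{k+1}; s_k, p) - s_{k+1} for k = 0, ..., M-1."""
    return [rk4(f, tau[k], tau[k + 1], s[k], p) - s[k + 1]
            for k in range(len(tau) - 1)]


def decay(t, x, p):
    """Toy right-hand side (an assumption for illustration): x' = -p * x."""
    return -p * x


tau = [0.0, 0.5, 1.0, 1.5, 2.0]        # shooting grid with M = 4 intervals
s_guess = [1.0, 1.0, 1.0, 1.0, 1.0]    # arbitrary node values: trajectory has jumps

if __name__ == "__main__":
    print(continuity_residuals(decay, tau, s_guess, 0.8))
```

For node values taken from a true trajectory the residuals vanish up to integration error, while an arbitrary initial guess leaves jumps that the optimization has to close.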


2.2 Parameter Estimation Problem with Huber Estimator

Essentially, the parameter estimation problem with the Huber estimator (3) can be written as

$$\min_{X} \; \sum_{i=1}^{N_1} \rho(F_{1,i}(X)), \quad \text{s.t.} \quad F_2(X) = 0, \tag{4}$$

where the cost function splits up into two terms,

$$\sum_{i=1}^{N_1} \rho(F_{1,i}(X)) = \frac{1}{2} \sum_{i \in I_1(X)} F_{1,i}(X)^2 + \sum_{i \in I_2(X)} \Big( \gamma |F_{1,i}(X)| - \frac{1}{2} \gamma^2 \Big),$$

with the two index sets

$$I_1(X) = \{ i : |F_{1,i}(X)| \le \gamma \}, \qquad I_2(X) = \{ i : |F_{1,i}(X)| > \gamma \}.$$

Here $F_l(X)$, $l = 1, 2$, are sufficiently smooth functions with values in $\mathbb{R}^{N_l}$, and $X \in \mathbb{R}^N$.

2.3 Constrained Gauss-Newton Method

For the solution of the minimization problem (4) we use the constrained Gauss-Newton method. Starting from a given initial value $X^0$, we compute new iterates

$$X^{k+1} = X^k + [t^k]\, \Delta X^k,$$

where the increment $\Delta X^k$ solves the linearized problem

$$\min_{\Delta X} \; \frac{1}{2} \sum_{i \in I_1(\Delta X)} \big( F_{1,i}(X^k) + J_{1,i}(X^k) \Delta X \big)^2 + \sum_{i \in I_2(\Delta X)} \Big( \gamma \big| F_{1,i}(X^k) + J_{1,i}(X^k) \Delta X \big| - \frac{1}{2} \gamma^2 \Big), \tag{5}$$

$$\text{s.t.} \quad F_2(X^k) + J_2(X^k) \Delta X = 0,$$

for the index sets

$$I_1(\Delta X) = I_1(X^k, \Delta X) = \{ i : |F_{1,i}(X^k) + J_{1,i}(X^k) \Delta X| \le \gamma \},$$
$$I_2(\Delta X) = I_2(X^k, \Delta X) = \{ i : |F_{1,i}(X^k) + J_{1,i}(X^k) \Delta X| > \gamma \}.$$


Here $J_l(X) = \partial F_l(X) / \partial X$, $l = 1, 2$. The optional stepsize $t^k$ can be determined by a line search strategy, see Sect. 2.5. A numerical procedure for solving the problem (5) is briefly discussed in Sect. 2.6.

Let $\tilde{J}_1$ denote a transformation of the Jacobian of the cost function,

$$\tilde{J}_1(X, \Delta X) = D(X, \Delta X)\, J_1(X),$$

where $D(X, \Delta X)$ is a diagonal matrix with the entries

$$d_{ii} = \begin{cases} 1, & \text{if } |F_{1,i}(X) + J_{1,i}(X) \Delta X| \le \gamma, \\ 0, & \text{if } |F_{1,i}(X) + J_{1,i}(X) \Delta X| > \gamma. \end{cases}$$

This means that in the transformed Jacobian $\tilde{J}_1$ only those components of the cost function are still present for which the absolute value of the linearized cost function does not exceed the partition constant $\gamma$, while all other components are set to zero and therefore ignored in the further course of the procedure. We further introduce the notation $\tilde{J}$ for the composed Jacobian matrix

$$\tilde{J} = \tilde{J}(X, \Delta X) = \begin{pmatrix} \tilde{J}_1(X, \Delta X) \\ J_2(X) \end{pmatrix}.$$

Under the regularity assumptions

$$\operatorname{rank} J_2(X^k) = N_2, \qquad \operatorname{rank} \tilde{J}(X^k, \Delta X^k) = N \tag{6}$$

the solution $\Delta X^k$ of the linearized problem (5) is given by means of a generalized inverse $\tilde{J}^+$:

$$\Delta X^k = -\tilde{J}^+(X^k, \Delta X^k)\, \tilde{F}(X^k, \Delta X^k),$$

where $\tilde{F}$ is defined as

$$\tilde{F} = \tilde{F}(X, \Delta X) = \begin{pmatrix} \tilde{F}_1(X, \Delta X) \\ F_2(X) \end{pmatrix}$$

with

$$\tilde{F}_{1,i}(X, \Delta X) = \begin{cases} F_{1,i}(X), & \text{if } i \in I_1(X, \Delta X), \\ \gamma \operatorname{sign}\big(F_{1,i}(X) + J_{1,i}(X) \Delta X\big), & \text{if } i \in I_2(X, \Delta X). \end{cases}$$

The generalized inverse $\tilde{J}^+$ is explicitly given by

$$\tilde{J}^+ = \tilde{J}^+(X, \Delta X) = (I_N, 0_{N,N_2}) \begin{pmatrix} J_1(X)^T D(X, \Delta X) J_1(X) & J_2(X)^T \\ J_2(X) & 0_{N_2,N_2} \end{pmatrix}^{-1} \begin{pmatrix} J_1(X)^T & 0_{N,N_2} \\ 0_{N_2,N_1} & I_{N_2} \end{pmatrix}$$

and satisfies the condition

$$\tilde{J}^+ \tilde{J} \tilde{J}^+ = \tilde{J}^+.$$

Here $I_k$ denotes the $k \times k$ identity matrix and $0_{k,r}$ denotes the $k \times r$ zero matrix.
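To make the role of the partition $I_1$, $I_2$ concrete, the following Python sketch treats the simplest possible case, a scalar location problem with residuals $F_{1,i}(X) = X - y_i$ and no equality constraints. This is a deliberately simplified fixed-partition iteration written for illustration, not the authors' algorithm: in each pass the partition is frozen at the current iterate and the stationarity condition $\sum_{i \in I_1} (X - y_i) + \gamma \sum_{i \in I_2} \operatorname{sign}(X - y_i) = 0$ is solved for $X$. The function name, the data and the starting value are assumptions.

```python
def huber_location(data, gamma, x0, max_iter=50):
    """Fixed-partition iteration for min_X sum_i rho(X - y_i) (scalar X)."""
    x = x0
    inliers, outliers = [], []
    for _ in range(max_iter):
        residuals = [x - y for y in data]
        inliers = [i for i, r in enumerate(residuals) if abs(r) <= gamma]   # I_1
        outliers = [i for i, r in enumerate(residuals) if abs(r) > gamma]   # I_2
        if not inliers:
            # D J_1 is the zero matrix here, so the rank condition (6) fails
            raise ValueError("no residual in I_1; problem is rank deficient")
        # solve sum_{I_1} (x - y_i) + gamma * sum_{I_2} sign(x - y_i) = 0 for x
        sign_sum = sum(1.0 if residuals[i] > 0 else -1.0 for i in outliers)
        x_new = (sum(data[i] for i in inliers) - gamma * sign_sum) / len(inliers)
        if abs(x_new - x) < 1e-12:
            return x_new, inliers, outliers
        x = x_new
    return x, inliers, outliers


if __name__ == "__main__":
    measurements = [0.9, 1.0, 1.1, 1.05, 0.95, 10.0]   # last value is an outlier
    estimate, good, bad = huber_location(measurements, gamma=1.0, x0=1.025)
    print(estimate, good, bad)
```

In this example the outlier ends up in $I_2$ and influences the estimate only through the bounded term $\gamma \operatorname{sign}(\cdot)$, whereas the plain mean of the data would be pulled to 2.5.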

2.4 Local Convergence

It can be shown that the Gauss-Newton method eventually identifies the "optimal partitioning" of residuals $F_{1i}(X)$ in a neighbourhood of a solution $X^*$ that satisfies certain assumptions:

Theorem 1. Suppose that $X^*$ is a solution of the problem (4) that satisfies (6) with $X^k = X^*$, $\Delta X^k = 0$ and the strict complementarity

$$|F_{1i}(X)| \ne \gamma, \quad i = 1, \dots, N_1,$$

with $X = X^*$. Then there exists a neighbourhood $D$ of $X^*$ such that for all $X^k \in D$ the linearized problem (5) has a solution $\Delta X^k$ whose partitioning $I_1(X^k, \Delta X^k)$, $I_2(X^k, \Delta X^k)$ is the same as the partitioning $I_1(X^k)$, $I_2(X^k)$ of the nonlinear problem (4) at $X = X^k$, which in turn is the same as the partitioning $I_1(X^*)$, $I_2(X^*)$ of the nonlinear problem (4) at $X = X^*$:

$$I_l(X^k) = I_l(X^*) = I_l(X^k, \Delta X^k) = I_l^*, \quad l = 1, 2.$$

Moreover,

$$\operatorname{sign} F_{1i}(X^k) = \operatorname{sign} F_{1i}(X^*) = \operatorname{sign}\big(F_{1i}(X^k) + J_{1i}(X^k) \Delta X^k\big), \quad i \in I_2^*.$$

Hence, the full-step ($t^k \equiv 1$) method becomes equivalent to the Gauss-Newton method applied to solving a modified least squares problem

$$\min_{X} \; \frac{1}{2} \sum_{i \in I_1^*} F_{1i}(X)^2 + \gamma \sum_{i \in I_2^*} \operatorname{sgn}(F_{1i}(X^*))\, F_{1i}(X), \quad \text{s.t.} \quad F_2(X) = 0,$$

and as a result it has a linear rate of local convergence.

Theorem 2 (Local Contraction). Let $D$ be the neighbourhood defined by Theorem 1. Assume that the following (weighted) Lipschitz conditions for $\tilde{J}$ and $\tilde{J}^+$ are satisfied for all $X$, $Y = X + \Delta X$, $Z \in D$ and all $t \in [0, 1]$:

$$\frac{\big\| \tilde{J}^+(Y) \big[ \tilde{J}(X + t(Y - X)) - \tilde{J}(X) \big] (Y - X) \big\|}{t\, \|Y - X\|^2} \le \omega < \infty, \tag{7}$$

$$\frac{\big\| \big[ \tilde{J}^+(Z) - \tilde{J}^+(X) \big] R(X, \Delta X) \big\|}{\|Z - X\|} \le \kappa < 1, \tag{8}$$

where $\Delta X$ solves the linearized problem (5) at $X$, and $R$ is the residual of the linearized problem,

$$R = R(X, \Delta X) = \begin{pmatrix} \tilde{R}(X, \Delta X) \\ F_2(X) + J_2(X) \Delta X \end{pmatrix},$$

with

$$\tilde{R}_i(X, \Delta X) = \begin{cases} F_{1,i}(X) + J_{1,i}(X) \Delta X, & \text{if } i \in I_1, \\ \gamma \operatorname{sign}\big(F_{1,i}(X) + J_{1,i}(X) \Delta X\big), & \text{if } i \in I_2. \end{cases}$$

Assume further that all initial guesses $X^0 \in D$ satisfy

$$\delta_0 = \frac{\omega \|\Delta X^0\|}{2} + \kappa < 1, \qquad D_0 = \bar{B}\Big( X^0, \frac{\|\Delta X^0\|}{1 - \delta_0} \Big) \subset D.$$

Here $\Delta X^0$ solves the linearized problem (5) at $X^0$. Then the sequence of iterates $\{X^k\}$ of the full step Gauss-Newton method is well defined, remains in $D$ and converges to a point $X^*$ with $\tilde{J}^+(X^*) \tilde{F}(X^*) = 0$. Furthermore, the a priori estimate

$$\|\Delta X^{k+1}\| \le \big( \|\Delta X^k\|\, \omega / 2 + \kappa \big)\, \|\Delta X^k\|$$

holds, which means that the convergence is linear with rate $\kappa$.

The statements of this theorem can be interpreted similarly to the least squares case. The constant $\omega$ from the Lipschitz condition (7) for $\tilde{J}$ is a measure for the nonlinearity of the model, as it is in fact nothing but a weighted second derivative. Its inverse $\omega^{-1}$ characterizes the region of validity of the linear model. The constant $\kappa$ from the Lipschitz condition (8) for $\tilde{J}^+$ refers to the incompatibility of the model and the measurements, and it is therefore called the incompatibility constant. A value $\kappa < 1$ is a necessary condition for the identifiability of the parameters from the available data. Only a solution with $\kappa < 1$ is statistically stable. Solutions with $\kappa \ge 1$ have large residuals and are statistically unstable. Let us note that in case of the Huber estimator one can reduce $\kappa$ by decreasing the partitioning constant $\gamma$.

2.5 Global Convergence

Since the constrained Gauss-Newton method for the Huber estimator is locally convergent, we now concentrate on globalization strategies. One possibility is a line search in which the iteration step is damped,

$$X^{k+1} = X^k + t^k \Delta X^k,$$

with a stepsize $t^k \in (0, 1]$. This stepsize is chosen such that the next iterate $X^{k+1}$ is "better" in some sense than the current iterate $X^k$,

$$T_1(X^{k+1}) < T_1(X^k).$$

As a measure for the goodness of the iterates we use the exact penalty function $T_1$ as a merit function,

$$T_1(X) = \sum_{i=1}^{N_1} \rho(F_{1,i}(X)) + \sum_{i=1}^{N_2} \alpha_i\, |F_{2,i}(X)|, \tag{9}$$

with sufficiently large weights $\alpha_i > 0$.

Theorem 3 (Compatibility of Gauss-Newton method for Huber estimator and exact penalty function). Under the regularity assumptions (6) and if we further assume strict complementarity, i.e.

$$|F_{1,i}(X)| \ne \gamma, \qquad |F_{1,i}(X) + J_{1,i}(X) \Delta X| \ne \gamma,$$

the increment $\Delta X$ solving the linearized problem leads to a descent direction of the nonlinear problem with Huber estimator,

$$\lim_{\varepsilon \to 0} \frac{T_1(X^k + \varepsilon \Delta X) - T_1(X^k)}{\varepsilon} < 0,$$

for the exact penalty function (9).

Based on this theorem we can prove global convergence of the method with exact line search.

2.6 Numerical Solution of the Linearized Problem

One of the decisive steps of the method, which largely affects its performance, is the solution of the linearized problems (5). An efficient method is the so-called condensing, which exploits the block structure of the Jacobian $J_2$ that arises due to the applied multiple shooting approach, see also [1],

$$J = \begin{pmatrix}
R_0^1 & R_1^1 & \cdots & R_{M-1}^1 & R_M^1 & R_p^1 \\
R_0^2 & R_1^2 & \cdots & R_{M-1}^2 & R_M^2 & R_p^2 \\
A_0 & -I & & & & B_0 \\
 & A_1 & -I & & & B_1 \\
 & & \ddots & \ddots & & \vdots \\
 & & & A_{M-1} & -I & B_{M-1}
\end{pmatrix}, \qquad
F = \begin{pmatrix} F_1 \\ r_C \\ r_0 \\ r_1 \\ \vdots \\ r_{M-1} \end{pmatrix},$$

with

$$R_j^1 = \frac{\partial F_1}{\partial s_j}, \quad R_j^2 = \frac{\partial r_C}{\partial s_j}, \quad j = 0, \dots, M, \qquad R_p^1 = \frac{\partial F_1}{\partial p}, \quad R_p^2 = \frac{\partial r_C}{\partial p},$$

$$A_j = \frac{\partial x(\tau_{j+1}; \tau_j, s_j, p)}{\partial s_j}, \quad B_j = \frac{\partial x(\tau_{j+1}; \tau_j, s_j, p)}{\partial p}, \quad j = 0, \dots, M-1.$$

For notational simplicity we omit the variable $X$ here. For given $\Delta s_0$ and $\Delta p$ we can solve the continuity equations by a simple forward recursion. This is equivalent to a reduction of the linearized system in the following way. Define iteratively vectors $d_j$ and matrices $C_j$ and $D_j$ with $d_0 = r_0$, $C_0 = A_0$, $D_0 = B_0$,

$$d_j = A_j d_{j-1} + r_j, \quad C_j = A_j C_{j-1}, \quad D_j = A_j D_{j-1} + B_j, \quad j = 1, \dots, M-1,$$

and compute

$$\bar{F}_1 = F_1 + \sum_{j=1}^{M} R_j^1 d_{j-1}, \qquad \bar{r}_C = r_C + \sum_{j=1}^{M} R_j^2 d_{j-1},$$

$$\bar{R}_0^l = R_0^l + \sum_{j=1}^{M} R_j^l C_{j-1}, \qquad \bar{R}_p^l = R_p^l + \sum_{j=1}^{M} R_j^l D_{j-1}, \quad l = 1, 2.$$

We can then write

$$\Delta s_j = C_{j-1} \Delta s_0 + D_{j-1} \Delta p + d_{j-1}, \quad j = 1, \dots, M,$$

and substitute this in the linearized problem to get:

$$\min_{\Delta s_0,\, \Delta p} \; \sum_{i=1}^{N_1} \rho\big( \bar{F}_{1,i} + \bar{R}_{0,i}^1 \Delta s_0 + \bar{R}_{p,i}^1 \Delta p \big), \tag{10}$$

$$\text{s.t.} \quad \bar{r}_C + \bar{R}_0^2 \Delta s_0 + \bar{R}_p^2 \Delta p = 0.$$

The condensed problem (10) is equivalent to a quadratic programming problem with additional variables $v$, $u$ and $w \in \mathbb{R}^{N_1}$ and additional equality and inequality constraints,

$$\min_{\Delta s_0,\, \Delta p,\, v,\, u,\, w} \; \frac{1}{2} v^T v + \gamma \sum_{i=1}^{N_1} (u_i + w_i),$$

$$\text{s.t.} \quad \bar{F}_1 + \bar{R}_0^1 \Delta s_0 + \bar{R}_p^1 \Delta p - v = u - w, \quad u \ge 0, \quad w \ge 0,$$

$$\bar{r}_C + \bar{R}_0^2 \Delta s_0 + \bar{R}_p^2 \Delta p = 0,$$

and can be solved by a structure exploiting QP solver with an active set strategy.
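The equivalence with a quadratic program rests on a standard variational form of the Huber function. The short derivation below is added for illustration and uses only definition (1); the scalar $r$ stands for one component of the condensed residual, which is a notational assumption made here.

$$\rho(r) = \min_{v \in \mathbb{R}} \; \tfrac{1}{2} v^2 + \gamma\, |r - v|, \qquad |r - v| = \min_{u, w \ge 0} \{\, u + w : u - w = r - v \,\}.$$

For $|r| \le \gamma$ the minimizer is $v = r$ with value $\tfrac{1}{2} r^2$. For $|r| > \gamma$ the minimizer is $v = \gamma \operatorname{sign}(r)$ with value $\tfrac{1}{2} \gamma^2 + \gamma (|r| - \gamma) = \gamma |r| - \tfrac{1}{2} \gamma^2$. Both cases reproduce (1), and summing over $i = 1, \dots, N_1$ gives an objective of the form $\tfrac{1}{2} v^T v + \gamma \sum_i (u_i + w_i)$. At the optimum the components with $u_i + w_i > 0$ are exactly those with residuals larger than $\gamma$ in absolute value, i.e. the candidates for outliers.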


A violation of the regularity assumption that the rank of $\tilde{J}$ equals the number of unknowns means that the measurements judged to be "good" by the Huber estimator do not provide enough information about the parameters, so the parameters are not identifiable from the given data. Hence, a regularization by a rank reduction is necessary. Alternatively, methods from optimum experimental design can be applied to gain more information about the parameters.

3 Numerical Results

As a numerical example we consider the chemical process of the denitrogenization of pyridine, see [1]. Pyridine is converted into ammonia and pentane by means of three catalysts. The reaction coefficients $p_1, \dots, p_{11}$ are the unknowns of the reaction which are to be estimated from given measurements. As the process is isothermal at 350 K and 100 atm, we do not need an Arrhenius term. Thus the process can be described mathematically by a system of seven ordinary differential equations, one for each occurring species,

pyridine: $\dot{A} = -p_1 A + p_9 B$,
piperidine: $\dot{B} = p_1 A - p_2 B - p_3 BC + p_7 D - p_9 B + p_{10} DF$,
pentylamine: $\dot{C} = p_2 B - p_3 BC - 2 p_4 CC - p_6 C + p_8 E + p_{10} DF + 2 p_{11} EF$,
N-pentylpiperidine: $\dot{D} = p_3 BC - p_5 D - p_7 D - p_{10} DF$,
dipentylamine: $\dot{E} = p_4 CC + p_5 D - p_8 E - p_{11} EF$,
ammonia: $\dot{F} = p_3 BC + p_4 CC + p_6 C - p_{10} DF - p_{11} EF$,
pentane: $\dot{G} = p_6 C + p_7 D + p_8 E$.

In the beginning of the process only pyridine is available while the initial concentration of all other species is zero. The artificial measurement data was generated using "true" parameter values. Four outliers were randomly introduced into the data. We solved the corresponding parameter estimation problem with the $l_2$ norm, the $l_1$ norm, and the Huber function as optimization criterion for the cost function. The computed parameter estimates together with their true values are given in Table 1. Obviously, $l_1$ and Huber estimation do not differ much while the least squares approximation yields quite different results. The most deviant parameter estimates of the $l_2$ estimation are marked in italics.
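The model equations can be checked numerically with a small Python sketch. The following is an illustration only: the initial pyridine concentration of 1.0, the step size, the integrator and all names are assumptions, and the parameter values are the "true" values listed in Table 1. Since every species except pentane contains one nitrogen atom, the sum $A + B + C + D + E + F$ should stay constant along any trajectory, which gives a quick consistency test of the signs in the right-hand side.

```python
def pyridine_rhs(y, p):
    """Right-hand side of the seven-species pyridine model."""
    A, B, C, D, E, F, G = y
    p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11 = p
    return [
        -p1 * A + p9 * B,                                        # pyridine
        p1 * A - p2 * B - p3 * B * C + p7 * D - p9 * B + p10 * D * F,   # piperidine
        p2 * B - p3 * B * C - 2.0 * p4 * C * C - p6 * C + p8 * E
        + p10 * D * F + 2.0 * p11 * E * F,                       # pentylamine
        p3 * B * C - p5 * D - p7 * D - p10 * D * F,              # N-pentylpiperidine
        p4 * C * C + p5 * D - p8 * E - p11 * E * F,              # dipentylamine
        p3 * B * C + p4 * C * C + p6 * C - p10 * D * F - p11 * E * F,   # ammonia
        p6 * C + p7 * D + p8 * E,                                # pentane
    ]


def rk4_step(y, p, h):
    """One classical Runge-Kutta step of size h."""
    def shifted(base, slope, scale):
        return [b + scale * s for b, s in zip(base, slope)]

    k1 = pyridine_rhs(y, p)
    k2 = pyridine_rhs(shifted(y, k1, 0.5 * h), p)
    k3 = pyridine_rhs(shifted(y, k2, 0.5 * h), p)
    k4 = pyridine_rhs(shifted(y, k3, h), p)
    return [yi + h * (a + 2.0 * b + 2.0 * c + d) / 6.0
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


# "true" parameter values p1, ..., p11 from Table 1
P_TRUE = [1.810, 0.894, 29.400, 9.210, 0.058, 2.430,
          0.0644, 5.550, 0.0201, 0.577, 2.150]

if __name__ == "__main__":
    state = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # assumed: only pyridine present
    for _ in range(200):
        state = rk4_step(state, P_TRUE, 0.01)
    print("state at t = 2:", state)
    print("nitrogen balance:", sum(state[:6]))
```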

Table 1 Estimates of the parameters

Parameter   l2        l1        Huber     true
p1          1.812     1.810     1.810     1.810
p2          0.850     0.894     0.894     0.894
p3          29.597    29.399    29.393    29.400
p4          4.467     9.209     9.172     9.210
p5          0.059     0.058     0.058     0.058
p6          2.503     2.429     2.430     2.430
p7          0.112     0.0644    0.0647    0.0644
p8          1.990     5.550     5.551     5.550
p9          0.0203    0.0201    0.0201    0.0201
p10         0.497     0.577     0.576     0.577
p11         8.468     2.149     2.184     2.150

4 Conclusions

We developed methods for using the Huber estimator in parameter estimation problems with underlying dynamic processes. This estimator is "better" than the least squares method in the sense that (1) it is robust, (2) it gives a possibility for outlier identification, and (3) it tells us if there are enough "good" measurements to identify the parameters or if more information is needed. Therefore, the Huber estimator is the method of choice for normally distributed data with some outliers. For normally distributed data without outliers the $l_2$ estimator is of course still preferable. Although the $l_1$ estimator is also a robust method, it is inferior to the Huber estimator with respect to the amount of information taken into account.

Acknowledgements This research was supported by the German Federal Ministry for Education and Research (BMBF) through the Programme "Mathematics for Innovations in Industry and Public Services".

References

1. Bock, H.G.: Randwertproblemmethoden zur Parameteridentifizierung in Systemen nichtlinearer Differentialgleichungen. Bonner Mathematische Schriften, 183, Bonn (1987)
2. Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., Stahel, W.A.: Robust Statistics: The Approach Based on Influence Functions. Wiley, New York (1986)
3. Huber, P.J.: Robust Statistics. Wiley, New York (1981)
4. Huber, P.J.: Robust estimation of a location parameter. The Annals of Mathematical Statistics, 35, 73–101 (1964)
5. Ekblom, H., Madsen, K.: Algorithms for Non-linear Huber Estimation. BIT, 29, 60–76 (1989)


Comparing MIQCP Solvers to a Specialised Algorithm for Mine Production Scheduling Andreas Bley, Ambros M. Gleixner, Thorsten Koch, and Stefan Vigerske

Abstract This paper investigates the performance of several out-of-the-box solvers for mixed-integer quadratically constrained programmes (MIQCPs) on an open pit mine production scheduling problem with mixing constraints. We compare the solvers BARON, COUENNE, SBB, and SCIP to a problem-specific algorithm on two different MIQCP formulations. The computational results presented show that general-purpose solvers with no particular knowledge of problem structure are able to nearly match the performance of a hand-crafted algorithm.

1 Introduction

Effective general-purpose techniques are currently applicable for most linear mixed-integer and continuous convex optimisation problems. In contrast, for many nonconvex optimisation problems, specialised algorithms are still required to find globally optimal solutions. Traditional solution methods for nonconvex integer optimisation problems have been developed either as entirely new solvers [16, 19], or by directly extending a solver for NLPs to cope with integrality conditions, see e.g. [3, 10]. In recent years several groups have started to explore a different direction by trying to extend MIP solvers to handle nonlinearities, see e.g. [1, 4, 5, 9, 14].

A. Bley Technische Universit¨at Berlin, Straße des 17. Juni 136, 10623 Berlin e-mail: [email protected] A.M. Gleixner T. Koch Zuse Institute Berlin, Takustraße 7, 14195 Berlin e-mail: gleixner,[email protected] S. Vigerske Humboldt-Universit¨at zu Berlin, Unter den Linden 6, 10099 Berlin e-mail: [email protected] H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0 3, © Springer-Verlag Berlin Heidelberg 2012


In this paper we compare the performance of a specialised branch-and-bound code for an open-pit mine production scheduling problem with mixing constraints to the performance of several general-purpose solvers on the same problem, specifically BARON [19], COUENNE [4], SBB [3], and SCIP [5]. An extended version of this article is available as a technical report [7]. Open-pit mine production scheduling was chosen as a test case since the authors were involved in a research project to solve these challenging, large-scale optimisation problems [6]. A few years later, recent general-purpose software yields nearly as good solutions out of the box.

2 Open Pit Mine Production Scheduling with Stockpiles

In this section we describe in detail our model of the open pit mine production scheduling problem (OPMPSP) [8, 12, 17]. Typically, the orebody of an open pit mine is discretised into small mining units called blocks. Block models of real-world open pit mines may consist of hundreds of thousands of blocks, resulting in large-scale optimisation problems. Groups of blocks are often aggregated to form larger mining units with possibly heterogeneous ore distribution, which we call aggregates. We assume such an aggregation of a block model is given a priori, with the set of aggregate indices $\mathcal{N} = \{1, \dots, N\}$ (see footnote 1). Note that this setting comprises the special case of an unaggregated block model where we have only one block per aggregate.

Footnote 1: Various techniques exist for computing aggregates of blocks; for an example see the fundamental tree method [18].

Moreover, we assume complete knowledge about the contents of each aggregate $i$: First, its rock tonnage $R_i$, i.e. the amount of material which has to be extracted from the mine. Second, its ore tonnage $O_i$, i.e. the fraction of the rock tonnage sufficiently valuable to be processed further; in contrast, the non-ore fraction of each aggregate is discarded as waste immediately after its extraction from the mine. Finally, the tonnages $A_i^1, \dots, A_i^K$ quantify a number of mineral attributes contained in the ore fraction. Attributes may be desirable, such as valuable mineral, or undesirable, such as chemical impurities.

The mining operations consist of several processes: First, rock is extracted from the pit, which we refer to as mining. Subsequently, the valuable part of the extracted material is refined further for sale, which is called processing; the remaining material not sufficiently valuable is simply discarded as waste. In an intermediate stage between mining and processing, the valuable material may be stored on stockpiles. A stockpile can be imagined as a "bucket" in which all material is immediately mixed and becomes homogeneous.

The lifespan of the mine is discretised into several, not necessarily homogeneous periods $1, \dots, T$. A feasible mine schedule determines, for each time period, the amount of rock which is to be mined from each aggregate, the fraction of the mined ore which is to be sent for processing or stockpiled, as well as the amount of ore sent from the stockpiles to the processing plant. Resource constraints restrict the amount of rock which may be mined and the amount of ore which may be processed during each time period $t$ by limits $M_t$ and $P_t$, respectively. Precedence constraints model the requirement that wall slopes are not too steep, ensuring the safety of the mine. Technically, these constraints demand that, before the mining of aggregate $i$ may be started, a set of predecessor aggregates $\mathcal{P}(i)$ must have been completely mined.

Long-term mining schedules have to be evaluated by their net present value: For each time period, we take the return from the processed and sold minerals minus the cost for mining and processing, multiplied by a decreasing discount factor to account for the time value of money. For homogeneous time periods and constant interest rate $q > 0$ per time period, the profit made in time period $t$ is multiplied by a factor of $1/(1+q)^t$. The objective is to find a feasible mine schedule with maximum net present value. Already without considering stockpiles, open pit mine production scheduling poses an NP-hard optimisation problem, see e.g. [13].

This paper focuses on the special case of one attribute (some valuable mineral) and a single stockpile. A more general setting comprising multiple attributes, multiple stockpiles, or blending constraints in case of multiple attributes can easily be modelled by minor extensions and modifications, see [6]. To conclude this section, Table 1 summarises the notation introduced above.
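The discounting rule can be illustrated with a minimal Python sketch. The function names and the sample cash flows are assumptions for this example; the only ingredient taken from the text is the discount factor $1/(1+q)^t$ for period $t$.

```python
def discount_factors(q, num_periods):
    """Discount factors 1 / (1 + q)^t for t = 1, ..., num_periods."""
    return [1.0 / (1.0 + q) ** t for t in range(1, num_periods + 1)]


def net_present_value(period_profits, q):
    """Sum of per-period profits, each weighted by its discount factor."""
    factors = discount_factors(q, len(period_profits))
    return sum(delta * profit for delta, profit in zip(factors, period_profits))


if __name__ == "__main__":
    # hypothetical undiscounted profits for three yearly periods
    print(net_present_value([110.0, 121.0, 133.1], q=0.10))
```

With a rate of 10% per period, a profit of 121 in the second period contributes as much to the net present value as a profit of 110 in the first, which is why schedules tend to bring valuable material forward in time.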

3 MIQCP Formulations

In this section we provide mixed-integer quadratically constrained programming (MIQCP) formulations of the open pit mine production scheduling problem with one attribute ("metal") and a single, infinite-capacity stockpile, as presented in [6]: an aggregated "basic" formulation and an extended "warehouse" formulation. These formulations are theoretically equivalent. The results in [6], however, clearly speak in favour of the extended formulation. For the LP relaxation based solvers, this is equally confirmed by our computational study presented in Sect. 4.

Table 1 List of notation

$N$, $\mathcal{N}$    Number of aggregates and set of aggregate indices $\{1, \dots, N\}$, respectively
$\mathcal{P}(i)$      Set of immediate predecessors of aggregate $i$
$R_i$, $O_i$          Rock and ore tonnage of aggregate $i$, respectively [tonnes]
$A_i^k$               Tonnage of attribute $k$ in aggregate $i$ ($A_i$ for a single attribute) [tonnes]
$c^k$                 Sales price of attribute $k$ ($c$ for a single attribute) [\$m/tonne]
$m$, $p$              Mining and processing cost, respectively [\$m/tonne]
$T$                   Number of time periods
$\delta_t$            Discount factor for time period $t$ (typically $1/(1+q)^t$ with fixed interest rate $q > 0$)
$M_t$, $P_t$          Mining and processing capacity, respectively, for time period $t$ [tonnes]


3.1 Basic Formulation

To track the various material flows, we define the following continuous decision variables for each aggregate $i$ and time period $t$:

$y^m_{i,t} \in [0, 1]$ as the fraction of aggregate $i$ mined at time period $t$,
$y^p_{i,t} \in [0, 1]$ as the fraction of aggregate $i$ mined at time period $t$ and sent immediately for processing,
$y^s_{i,t} \in [0, 1]$ as the fraction of aggregate $i$ mined at time period $t$ and sent to the stockpile,
$o^s_t, a^s_t \ge 0$ as the absolute amount of ore respectively metal on the stockpile at time period $t$, and
$o^p_t, a^p_t \ge 0$ as the absolute amount of ore respectively metal sent from the stockpile to the processing plant at time period $t$.

With this, the net present value of a mine schedule is calculated as

$$NPV(y^m, y^p, o^p, a^p) = \sum_{t=1}^{T} \delta_t \left[ c \Big( a^p_t + \sum_{i=1}^{N} A_i y^p_{i,t} \Big) - p \Big( o^p_t + \sum_{i=1}^{N} O_i y^p_{i,t} \Big) - m \sum_{i=1}^{N} R_i y^m_{i,t} \right]. \tag{1}$$

In order to model the precedence constraints, we define the binary variables $x_{i,t} \in \{0, 1\}$ as equal to 1 if aggregate $i$ is completely mined within time periods $1, \dots, t$. A precedence-feasible extraction sequence is then ensured by the constraints

$$x_{i,t} \le \sum_{\tau=1}^{t} y^m_{i,\tau} \quad \text{for } i \in \mathcal{N},\; t = 1, \dots, T, \tag{2}$$

$$\sum_{\tau=1}^{t} y^m_{i,\tau} \le x_{j,t} \quad \text{for } i \in \mathcal{N},\; j \in \mathcal{P}(i),\; t = 1, \dots, T. \tag{3}$$

Additionally, we may, without altering the set of feasible solutions, require the sequence $x_{i,1}, \dots, x_{i,T}$ to be nondecreasing for each aggregate $i$:

$$x_{i,t-1} \le x_{i,t} \quad \text{for } i \in \mathcal{N},\; t = 2, \dots, T. \tag{4}$$


Though redundant from a modelling point of view, these inequalities may help (or hinder) computationally, and have been used in the benchmark algorithm from [6]. Conservation of the mined material is enforced by

$$\sum_{t=1}^{T} y^m_{i,t} \le 1 \quad \text{for } i \in \mathcal{N}, \text{ and} \tag{5}$$

$$y^p_{i,t} + y^s_{i,t} \le y^m_{i,t} \quad \text{for } i \in \mathcal{N},\; t = 1, \dots, T, \tag{6}$$

i.e. for each aggregate, the amount sent for processing or to the stockpile in one time period must not exceed the total amount mined. (The difference $y^m_{i,t} - y^p_{i,t} - y^s_{i,t}$ is discarded as waste.) To model the state of the stockpile, we make

3.1.1 Assumption S

Material sent from the stockpile to processing is removed at the beginning of each time period, while material extracted from the pit (and not immediately processed) is stockpiled at the end of each time period.

Following this assumption, we must not send more material from the stockpile to processing than is available at the end of the previous period:

$$o^p_t \le o^s_{t-1} \quad \text{and} \quad a^p_t \le a^s_{t-1} \quad \text{for } t = 2, \dots, T. \tag{7}$$

If we assume the stockpile to be empty at the start of the mining operations, we have $o^p_1 = a^p_1 = 0$. Now, the book-keeping constraints for the amount of ore on the stockpile read

$$o^s_t = \begin{cases} \sum_{i=1}^{N} O_i y^s_{i,1} & \text{for } t = 1, \\ o^s_{t-1} - o^p_t + \sum_{i=1}^{N} O_i y^s_{i,t} & \text{for } t = 2, \dots, T, \end{cases} \tag{8}$$

and analogously for the amount of metal on the stockpile

$$a^s_t = \begin{cases} \sum_{i=1}^{N} A_i y^s_{i,1} & \text{for } t = 1, \\ a^s_{t-1} - a^p_t + \sum_{i=1}^{N} A_i y^s_{i,t} & \text{for } t = 2, \dots, T. \end{cases} \tag{9}$$

The resource constraints on mining and processing read

$$\sum_{i=1}^{N} R_i y^m_{i,t} \le M_t \quad \text{for } t = 1, \dots, T, \text{ and} \tag{10}$$

$$o^p_t + \sum_{i=1}^{N} O_i y^p_{i,t} \le P_t \quad \text{for } t = 1, \dots, T. \tag{11}$$


Last, we need to ensure that the ore-metal-ratio of the material sent from stockpile to processing equals the ore-metal-ratio in the stockpile itself. Otherwise, only the profitable metal could be sent to processing and for sale while the ore, only causing processing costs, could remain in the stockpile. This involves the nonconvex quadratic mixing constraints $a^p_t / o^p_t = a^s_{t-1} / o^s_{t-1}$ for $t = 2, \dots, T$. To avoid singularities, we reformulate these constraints as

$$a^p_t\, o^s_{t-1} = a^s_{t-1}\, o^p_t \quad \text{for } t = 2, \dots, T. \tag{12}$$

All in all, we obtain the basic formulation

$$\begin{aligned} \max \quad & NPV(y^m, y^p, o^p, a^p) \\ \text{s.t.} \quad & \text{(2)–(12)}, \\ & x \in \{0, 1\}^{N \times T}, \quad y^m, y^p, y^s \in [0, 1]^{N \times T}, \quad o^s, a^s, o^p, a^p \ge 0. \end{aligned} \tag{BF}$$

Stockpiling capacities can be incorporated as upper bounds on $o^s$ and $a^s$.
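The interplay of Assumption S, the book-keeping constraints (8) and (9), and the mixing constraints (12) can be traced with a small Python sketch. It is an illustration under assumptions: the per-period stockpile inflows are given as aggregated totals, the reclaimed share per period is a free input called `out_fraction`, and all function names are hypothetical.

```python
def simulate_stockpile(ore_in, metal_in, out_fraction):
    """Book-keeping (8), (9) for a stockpile that is reclaimed proportionally.

    ore_in[t], metal_in[t]: ore and metal stockpiled at the end of period t.
    out_fraction[t]: share of the stockpile reclaimed at the start of period t.
    Index 0 corresponds to period t = 1.
    """
    T = len(ore_in)
    o_s, a_s = [0.0] * T, [0.0] * T   # stockpile content at end of period
    o_p, a_p = [0.0] * T, [0.0] * T   # material sent to processing
    o_s[0], a_s[0] = ore_in[0], metal_in[0]   # empty stockpile at the start
    for t in range(1, T):
        # Assumption S: reclaim first, from the state at the end of period t-1
        o_p[t] = out_fraction[t] * o_s[t - 1]
        a_p[t] = out_fraction[t] * a_s[t - 1]
        o_s[t] = o_s[t - 1] - o_p[t] + ore_in[t]      # constraint (8)
        a_s[t] = a_s[t - 1] - a_p[t] + metal_in[t]    # constraint (9)
    return o_s, a_s, o_p, a_p


def mixing_violation(o_s, a_s, o_p, a_p):
    """Largest absolute violation of the mixing constraints (12)."""
    return max((abs(a_p[t] * o_s[t - 1] - a_s[t - 1] * o_p[t])
                for t in range(1, len(o_s))), default=0.0)


if __name__ == "__main__":
    o_s, a_s, o_p, a_p = simulate_stockpile(
        ore_in=[100.0, 50.0, 0.0], metal_in=[10.0, 1.0, 0.0],
        out_fraction=[0.0, 0.5, 1.0])
    print(o_s, a_s, o_p, a_p, mixing_violation(o_s, a_s, o_p, a_p))
```

Reclaiming ore and metal in the same proportion satisfies (12) by construction. Replacing `a_p[1]` by the full metal content while keeping `o_p[1]` unchanged would model the "cherry-picking" that the mixing constraints are meant to exclude, and `mixing_violation` then returns a positive value.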

3.2 Warehouse Formulation

In the basic formulation (BF) the material of all aggregates sent from the pit to the stockpile is aggregated into variables $o^s$ and $a^s$. Alternatively, we may track the material flows via the stockpile individually. Instead of variables $o^s$, $a^s$, $o^p$, and $a^p$, we then define for each aggregate $i$ and time period $t$:

$z^p_{i,t} \in [0, 1]$ as the fraction of aggregate $i$ sent from stockpile for processing at time period $t$, and
$z^s_{i,t} \in [0, 1]$ as the fraction of aggregate $i$ remaining in the stockpile at time period $t$.

The net present value in terms of these variables is calculated as

$$NPV(y^m, y^p, z^p) = \sum_{t=1}^{T} \delta_t \left[ c \sum_{i=1}^{N} A_i \big( y^p_{i,t} + z^p_{i,t} \big) - p \sum_{i=1}^{N} O_i \big( y^p_{i,t} + z^p_{i,t} \big) - m \sum_{i=1}^{N} R_i y^m_{i,t} \right]. \tag{13}$$

Constraints (2)–(6) remain unchanged. Starting with an empty stockpile gives $z^s_{i,1} = z^p_{i,1} = 0$ for $i \in \mathcal{N}$. Under Assumption S, the stockpile balancing equations read

$$z^s_{i,t-1} + y^s_{i,t-1} = z^s_{i,t} + z^p_{i,t} \quad \text{for } i \in \mathcal{N},\; t = 2, \dots, T. \tag{14}$$


The resource constraints on mining are the same as (10), the resource constraints on processing become

$$\sum_{i=1}^{N} O_i \big( y^p_{i,t} + z^p_{i,t} \big) \le P_t \quad \text{for } t = 1, \dots, T. \tag{15}$$

Instead of the mixing constraints (12), we now demand that for each time period $t$, the fraction $z^p_{i,t} / z^s_{i,t}$ is equal for each aggregate $i$. We obtain a better formulation by introducing, for each time period $t$, a new variable $f_t \in [0, 1]$ called out-fraction, and requiring for all $i \in \mathcal{N}$ that $z^p_{i,t} / (z^s_{i,t} + z^p_{i,t}) = f_t$. To avoid zero denominators, we reformulate this as

$$z^p_{i,t} (1 - f_t) = z^s_{i,t} f_t \quad \text{for } i \in \mathcal{N},\; t = 2, \dots, T. \tag{16}$$

This gives the warehouse formulation

$$\begin{aligned} \max \quad & NPV(y^m, y^p, z^p) \\ \text{s.t.} \quad & \text{(2)–(6), (10), (14)–(16)}, \\ & x \in \{0, 1\}^{N \times T}, \quad y^m, y^p, y^s, z^p, z^s \in [0, 1]^{N \times T}, \quad f \in [0, 1]^T. \end{aligned} \tag{WF}$$

Note that the basic formulation is an aggregated version of the warehouse formulation, and thus the LP relaxation (obtained by dropping integrality and mixing constraints) is tighter for the warehouse formulation. Bley et al. [6] propose a rough a priori discretisation of the out-fractions in order to tighten the linear MIP relaxation obtained when dropping all of the quadratic constraints. Computational results on the effect of this technique can be found in the extended version of this article [7].
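The statement that the basic formulation is an aggregated version of the warehouse formulation can be made explicit with a short derivation. The identification of the aggregated quantities below is our reading of the two models and is stated as an assumption: for $t = 2, \dots, T$ set

$$o^p_t = \sum_{i=1}^{N} O_i z^p_{i,t}, \quad a^p_t = \sum_{i=1}^{N} A_i z^p_{i,t}, \quad o^s_{t-1} = \sum_{i=1}^{N} O_i \big( z^s_{i,t-1} + y^s_{i,t-1} \big), \quad a^s_{t-1} = \sum_{i=1}^{N} A_i \big( z^s_{i,t-1} + y^s_{i,t-1} \big).$$

Constraint (16) is equivalent to $z^p_{i,t} = f_t \big( z^s_{i,t} + z^p_{i,t} \big)$, and by the balancing equation (14) the bracket equals $z^s_{i,t-1} + y^s_{i,t-1}$. Multiplying by $O_i$ respectively $A_i$ and summing over $i$ yields

$$o^p_t = f_t\, o^s_{t-1}, \qquad a^p_t = f_t\, a^s_{t-1}, \qquad \text{hence} \qquad a^p_t\, o^s_{t-1} = f_t\, a^s_{t-1}\, o^s_{t-1} = a^s_{t-1}\, o^p_t,$$

which is the mixing constraint (12). In this sense every solution of (WF) induces a solution of (BF), while (WF) needs $N (T-1)$ bilinear constraints instead of $T - 1$.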

4 Computational Study

4.1 Application-Specific Benchmark Algorithm

As benchmark algorithm we used the application-specific approach developed by Bley et al. [6]. It features a branch-and-bound algorithm based on the linear MIP relaxation of the problem obtained by dropping the nonlinear mixing constraints, i.e. (12) for the basic and (16) for the warehouse formulation, respectively. A specialised branching scheme is used to force the maximum violation of the nonlinear constraints arbitrarily close to zero. They implemented their approach using the state-of-the-art MIP solver CPLEX 11.2.1 with tuned parameter settings. Additionally, they apply problem-specific heuristics as well as a variable fixation scheme and cutting planes derived from the underlying precedence constrained knapsack structure, which have been shown to improve the dual bound for linear mine production scheduling models. For further details, see [6]. We used the same implementation in our computational study.

4.2 General-Purpose MIQCP Solvers

For our computational experiments, we had access to four general-purpose solvers for MIQCPs:

BARON [19], a closed-source mixed-integer nonlinear programming (MINLP) solver that implements a spatial branch-and-bound algorithm based on a linear relaxation obtained from a convexification of the MINLP. Branching is performed on both integer variables and continuous variables, the latter to reduce the gap between a nonconvex function and its convex underestimator. We used BARON 9.0.2 with CPLEX 12.1.0 [8] as LP solver and MINOS 5.51 [15] as NLP solver.

COUENNE [4], a recently developed open-source MINLP solver that implements a similar technique to BARON. It is built on top of the MIP solver CBC [11]. We used COUENNE 0.2 (stable branch, rev. 256) with CBC 2.3 as branch-and-bound framework, CLP 1.10 [11] as LP solver, and the interior-point solver IPOPT 3.6 [20] to handle NLPs.

SCIP [2], a constraint integer programming solver that is freely available for academic use and has recently been extended to handle quadratic constraints within an LP based branch-and-cut algorithm [5]. We used SCIP 1.2.0.4 once with CPLEX 12.1.0 and once with CLP 1.10 as LP solver and IPOPT 3.7 as QCP solver.

SBB [3], a commercial solver for MINLPs that implements an NLP based branch-and-bound algorithm. The solution of the NLP relaxation is used as dual bound for the branch-and-bound algorithm. This bound can only be trusted if all NLPs are solved to global optimality. However, we used SBB with CONOPT 3.14T [3] as NLP solver, which does not guarantee global optimality for the nonconvex NLP relaxations in our application. We still include SBB in our test set, since NLP based branch-and-bound algorithms often obtain very good primal solutions also for nonconvex MINLPs.

4.3 Test Instances

Our industry partner BHP Billiton Pty. Ltd.² has provided us with realistic data from two open pit mines. Data set Marvin is based on a block model provided with the Whittle 4X mine planning software,³ originally consisting of 8,513 blocks which were aggregated to 85 so-called "panels", i.e. single layers of blocks without block-to-block precedence relations. The lifespan of this mine, i.e. the time in which the profitable part of the orebody can be fully mined, is 15 years. Each panel has an average of 2.2 immediate predecessor aggregates.

Data set Dent is based on the block model of a real-world open pit mine in Western Australia, originally consisting of 96,821 blocks which were aggregated to 125 panels. Each panel has an average of 2.0 immediate predecessor aggregates. The lifespan of this mine is 25 years.

The aggregations to panels, the cutoff grades (determining which blocks in each panel are immediately discarded as waste), and the precedence relations between the panels were pre-computed by our industry partner. Scheduling periods are time periods of one year each with a discount rate of 10% per year. Realistic values for mining costs and processing profits as well as for mining and processing capacities per year were chosen by our industry partner.

We tested the performance of the general-purpose MIQCP solvers from Sect. 4.2 on this data using the basic and the warehouse formulation, the same formulations on which the benchmark algorithm is based. Table 2 gives an overview of the size of these MIQCP formulations for instances Marvin and Dent.

Table 2 Size of MIQCPs for instances Marvin and Dent (before presolving)

Instance  Formulation     No. variables              No. constraints
                          Total     Bin     Cont     Total   Linear    Quad
Marvin    (BF)            5,848   1,445    4,403     7,598    7,582      16
Marvin    (WF)            8,687   1,445    7,242    10,404    9,044   1,360
Dent      (BF)           12,600   3,125    9,475    15,774   15,750      24
Dent      (WF)           18,775   3,125   15,650    21,900   18,900   3,000

² http://www.bhpbilliton.com/
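The effect of the 10% discount rate on yearly scheduling periods can be sketched as follows. The convention used here, discounting a cash flow of period t by 1/(1 + 0.10)^t with periods counted from 1, is an assumption for illustration; this section does not restate the exact convention used in the models, and the function names are hypothetical.

```python
def discount_factor(period, rate=0.10):
    """Present-value factor for a cash flow realised in `period` (1-based),
    assuming end-of-period discounting."""
    return 1.0 / (1.0 + rate) ** period


def net_present_value(cash_flows, rate=0.10):
    """Sum of discounted cash flows; cash_flows[0] belongs to period 1."""
    return sum(cf * discount_factor(t, rate)
               for t, cf in enumerate(cash_flows, start=1))


# Two hypothetical yearly cash flows that are each worth 100 today.
npv_example = net_present_value([110.0, 121.0])
```

Under this convention, a unit of profit in year 15 (the lifespan of Marvin) is weighted by roughly 0.24, and in year 25 (Dent) by roughly 0.09, which is why late periods contribute little to the objective.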

³ Gemcom Whittle, http://www.gemcomsoftware.com/products/whittle/

4.4 Computational Results

The experiments were run single-threaded on an Intel Core2 Extreme CPU X9650 with 3.0 GHz and 8 GB RAM and a time limit of 10,000 s. Since no solver was able to provide a provably optimal solution, we report the primal and dual bounds and the number of nodes processed after one hour and at the end of the time limit.

4.4.1 Solver Settings

We ran both BARON and COUENNE with default and optimised settings. For BARON we set maxpretime 1800, limiting the preprocessing time to 30 minutes, and PEnd 5 and PDo 50, reducing probing to depth 5 for at most 50 variables. For COUENNE we set aggressive_fbbt no and optimality_bt no to switch off bound propagation techniques that proved too expensive.

For SBB, we generally switched on the option acceptnonopt, ensuring that SBB did not prune a node if the NLP subsolver did not conclude optimality or infeasibility of the node's QCP relaxation. Besides default settings, we also tested a tuned version with option dfsstay 25.

The results for tuned settings are indicated by "∗" in tables and figures. We parenthesised the dual bound and gap for solver SBB, since they might be invalid due to the nonconvexity of the problem.

SCIP was run with one setting only. The extended RENS heuristic [5] was called frequently. The QCP solver was only used inside the RENS heuristic to find feasible solutions of the sub-MIQCP with all integer variables fixed. To allow a better comparison with both BARON and COUENNE, we ran SCIP once with CPLEX and once with CLP as LP solver.

4.4.2 Results for the Basic Formulation

Table 3 shows the performance of the application-specific benchmark algorithm from Sect. 4.1 and the general-purpose solvers when using the basic formulation. The application-specific algorithm yields the smallest primal-dual gaps among the LP relaxation based solvers, all of which, however, terminate with large dual bounds. Among the LP based general-purpose solvers, BARON has the best dual bounds, while it is outperformed by COUENNE and SCIP in terms of primal solutions. However, including the benchmark algorithm, all LP based solvers perform rather unsatisfactorily on the basic formulation.

In contrast, the tightest dual bounds are clearly obtained by the NLP based approach of solver SBB, although they cannot be trusted. It produces the best primal solution for problem instance Marvin and terminates with the smallest gap of 3.25%. For instance Dent, however, the best solution found by SBB was 18.1% worse than the best primal solution found by COUENNE, resulting in a final gap larger than for the benchmark algorithm.
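The primal-dual gaps quoted above can be related to the primal and dual bounds by a small calculation. The sketch below assumes that the gap is measured relative to the primal bound, (dual - primal) / |primal|, for our maximisation problem. This normalisation is an assumption: the formula is not restated in this section, and individual solvers may normalise their reported gap differently (for example by the dual bound).

```python
def relative_gap(primal, dual):
    """Relative primal-dual gap in percent for a maximisation problem,
    normalised by the primal bound (assumed convention)."""
    return 100.0 * (dual - primal) / abs(primal)


# Benchmark algorithm on Marvin, basic formulation, bounds at 10,000 s.
benchmark_marvin_bf = relative_gap(678.2, 916.4)
```

With the rounded bounds from Table 3 this gives about 35.1%, which agrees with the reported 35.13% up to rounding of the inputs.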

4.4.3 Results for the Warehouse Formulation

Table 4 shows the results for the warehouse formulation. First note that the LP based approaches perform significantly better on this formulation. The application-specific algorithm shows excellent performance on the warehouse formulation. It produces the best primal solutions and terminates with the smallest primal-dual gaps of 0.02% for instance Marvin and 0.33% for instance Dent. Nevertheless, the best solutions found by the general-purpose solvers are only 0.4% and 0.2% below those found by the benchmark algorithm for Marvin and Dent, respectively.

Table 3 Results for basic formulation (BF)

                      After 3,600 s                   After 10,000 s
Instance  Solver          Primal      Dual     Nodes    Primal      Dual     Nodes   Gap (%)
Marvin    Benchmark        678.2     916.6   249,100     678.2     916.4   476,456     35.13
          BARON            229.2   1,217.9       500     229.1   1,163.9     1,925    407.90
          BARON∗           568.4   1,386.1        10     641.4   1,142.7     3,979     78.15
          COUENNE              -   1,655.6         2         -   1,650.0       104         -
          COUENNE∗         283.9   1,645.9     2,893     642.6   1,636.0    15,429    154.58
          SCIP/CPLEX       669.6   1,584.1   232,600     672.4   1,579.7   645,397     57.43
          SCIP/CLP         671.6   1,581.1   170,800     671.6   1,577.4   462,962     57.43
          SBB              676.6   (706.1)     7,980     682.7   (705.0)    25,266    (3.25)
          SBB∗             683.0   (706.0)     8,040     685.1   (704.9)    24,922    (2.88)
Dent      Benchmark         47.3      54.0   100,500      47.3      53.8   269,023     13.71
          BARON              4.8     106.8        19       4.8     106.8       384  2,118.86
          BARON∗             4.6     104.3         6       4.6     104.3     1,333  2,161.57
          COUENNE           48.1     113.9         0      48.1     113.9         2    137.04
          COUENNE∗          48.1     113.9         0      48.1     113.4     2,955    135.81
          SCIP/CPLEX        46.1     110.1    61,400      46.1     109.9   179,554     58.05
          SCIP/CLP          47.5     110.3    45,000      47.5     110.0   141,391     56.86
          SBB               39.4    (50.2)       500      39.4    (50.2)     1,214   (27.53)
          SBB∗              39.4    (50.2)       540      39.4    (50.2)     1,220   (27.33)

∗ tuned settings

The best dual bounds from the general-purpose solvers are 1.4% and 0.2% away from the benchmark values for Marvin and Dent, respectively. Note that this difference is not only due to the handling of the nonlinear constraints. The benchmark algorithm also uses knowledge about the underlying precedence constrained knapsack structure of the linear constraints in order to fix binary variables and separate induced cover inequalities. This structure is not directly exploited by the general-purpose solvers.

In contrast to the LP relaxation based solvers, the QCP relaxation based approach of SBB appears to be less dependent on the change in formulation. Notably, the Dent instance appears more challenging to SBB than Marvin, while for SCIP the situation is reversed. This is probably due to the increased problem size, which affects the solvability of the QCP relaxation in SBB more than the solvability of the LP relaxation in SCIP. For both instances, SCIP was able to compute better primal solutions than SBB. For instance Marvin, SBB produced a solution only slightly worse than SCIP when using the option dfsstay 25. Here, the forced depth-first search after nodes with an integer feasible solution appears to function as an improvement heuristic, compensating for SBB's lack of heuristics.

Table 4 Results for warehouse formulation (WF)

                      After 3,600 s                   After 10,000 s
Instance  Solver          Primal      Dual     Nodes    Primal      Dual     Nodes   Gap (%)
Marvin    Benchmark        694.8     695.9    41,057     695.0     695.1   115,103      0.02
          BARON            303.3     715.1       482     388.8     713.9     1,881     83.60
          BARON∗           427.8     715.9       803     619.3     714.8     4,546     15.43
          COUENNE              -     719.5         0         -     719.5         2         -
          COUENNE∗         681.2     718.5       193     687.6     715.7     1,534      4.09
          SCIP/CPLEX       691.9     705.1    43,000     691.9     704.6   149,474      1.80
          SCIP/CLP         691.7     705.5    31,200     692.0     704.7    95,927      1.80
          SBB              677.8   (705.9)     8,940     689.0   (705.0)    27,498    (2.32)
          SBB∗             684.3   (705.9)     9,020     691.8   (705.0)    27,095    (1.92)
Dent      Benchmark         48.8      49.1     7,300      48.9      49.0    23,401      0.33
          BARON                     in preprocessing      11.7      50.0        61    327.00
          BARON∗            11.6      50.0       146      46.5      50.0       864      7.65
          COUENNE           47.3      50.3         0      47.3      50.3         2      6.43
          COUENNE∗          47.3      50.2         2      47.3      50.2        10      6.11
          SCIP/CPLEX        48.5      49.2    13,000      48.8      49.1    41,312      0.71
          SCIP/CLP          48.7      49.2     9,600      48.7      49.1    34,275      1.00
          SBB               40.2    (50.1)       580      40.2    (50.1)     1,546   (24.69)
          SBB∗              40.3    (50.1)       660      40.3    (50.1)     1,611   (24.19)

∗ tuned settings

4.4.4 Comparison of LP Based Solvers BARON, COUENNE, and SCIP

For the basic formulation, the best dual bounds were found by BARON, while for the warehouse formulation SCIP computed tighter bounds. For all formulations, SCIP computed the best primal solutions among the global solvers (all found by the extended RENS heuristic) and terminated with the smallest gaps.

BARON spent much time in preprocessing and per node, even with reduced probing, which results in a comparably small number of enumerated nodes. COUENNE, in contrast, spent much time in its primal solution heuristics. A significant amount of this time was used by the underlying NLP solver IPOPT, which seems to have difficulties solving the (nonconvex) QCPs obtained from fixing integer variables in the original formulation.

Figure 1 compares the progress of the primal and dual bounds from the start to the time limit of 10,000 s for all three solvers. It can be seen that even with tuned settings BARON and COUENNE spent a significant amount of time in presolving, especially for instance Dent. BARON found a number of primal solutions in presolving. Since the BARON log files do not show the times when primal solutions were found during presolving, we plot these at the end of presolving.

For the dual bound it can be seen that all three solvers start with approximately the same dual bound. SCIP, however, is able to decrease it more rapidly and comes closer to the best known primal solution values from the application-specific algorithm. For both instances, SCIP's dual bound after 1,800 s is already less than 0.35% above its final value at 10,000 s.

[Figure 1: two panels, Marvin (left) and Dent (right). Horizontal axis: time [s], from 0 to 10,000. Vertical axis: primal bound (lower part) and dual bound (upper part). Curves: Benchmark, BARON∗, Couenne∗, SCIP.]

Fig. 1 Progress in primal and dual bounds for the application-specific algorithm (grey) and the global solvers BARON (dotted), COUENNE (continuous), and SCIP/CPLEX (dashed) for warehouse formulation (WF). The "best primal" axis is level with the best known primal solution value from the application-specific benchmark algorithm

5 Conclusion

We have compared the performance of state-of-the-art generic MIQCP solvers on two realistic instances arising from the scheduling of open pit mine production. The problem can be characterised as a large mixed-integer linear program which is complemented by quadratic mixing constraints.

The performance of SCIP and the application-specific algorithm indicates that for such problems, extending a MIP framework compares favourably to other approaches. Intuitively, the reason might be that integer variables usually model decisions, whereas nonlinear constraints model conditions. Once a linear relaxation of the nonlinear constraints has been solved and all variables are integer feasible, what remains is to fix violations of the nonlinear constraints. One could argue that in many applications this is easier than trying to fix violated integrality constraints once a continuous nonlinear relaxation has been solved.

On the other hand, the performance of the QCP relaxation based solver SBB shows that employing nonlinear relaxations can make the solver more robust with respect to the choice of the formulation used. Unfortunately, as long as there is no way to prove global optimality for the relaxation used, this can only be used as a heuristic.


Comparing the LP based general-purpose solvers, COUENNE exploits some sophisticated heuristics in the root node, which enable it to produce good primal solutions. However, the low number of enumerated nodes, partly due to using IPOPT as QCP solver, yields weaker dual bounds. BARON computed better dual bounds, but was unable to produce compatible primal solutions.

Our experiments demonstrated that SCIP is able to perform nearly as well as a problem-specific implementation. In a pure MIP setting, SCIP would employ 25 primal heuristics. At the time of testing, only one of these had been extended to handle nonlinearities.

Acknowledgements We thank our industry partner BHP Billiton Pty. Ltd. for providing us with the necessary data sets to conduct this study, and GAMS Development Corp. for providing us with evaluation licenses for BARON and SBB. This research was partially funded by the DFG Research Center MATHEON, Project B20.

References

1. K. Abhishek, S. Leyffer, and J.T. Linderoth. FilMINT: An outer-approximation-based solver for nonlinear mixed integer programs. Technical Report ANL/MCS-P1374-0906, Argonne National Laboratory, 2006.
2. T. Achterberg. Constraint Integer Programming. PhD thesis, TU Berlin, 2007.
3. ARKI Consulting & Development A/S. CONOPT and SBB. http://www.gams.com/solvers/solvers.htm.
4. P. Belotti, J. Lee, L. Liberti, F. Margot, and A. Wächter. Branching and bounds tightening techniques for non-convex MINLP. Optimization Methods and Software, 24(4-5):597–634, 2009.
5. T. Berthold, S. Heinz, and S. Vigerske. Extending a CIP framework to solve MIQCPs. Technical Report 09-23, Konrad-Zuse-Zentrum für Informationstechnik Berlin (ZIB), 2009. http://opus.kobv.de/zib/volltexte/2009/1186/.
6. A. Bley, N. Boland, G. Froyland, and M. Zuckerberg. Solving mixed integer nonlinear programming problems for mine production planning with a single stockpile. Technical Report 2009/21, Institute of Mathematics, TU Berlin, 2009.
7. A. Bley, A.M. Gleixner, T. Koch, and S. Vigerske. Comparing MIQCP solvers to a specialised algorithm for mine production scheduling. Technical Report 09-32, Konrad-Zuse-Zentrum für Informationstechnik Berlin (ZIB), October 2009. http://opus.kobv.de/zib/volltexte/2009/1206/.
8. N. Boland, I. Dumitrescu, G. Froyland, and A.M. Gleixner. LP-based disaggregation approaches to solving the open pit mining production scheduling problem with block processing selectivity. Comp. & Oper. Research, 36:1064–1089, 2009.
9. P. Bonami, L.T. Biegler, A.R. Conn, G. Cornuéjols, I.E. Grossmann, C.D. Laird, J. Lee, A. Lodi, F. Margot, N. Sawaya, and A. Wächter. An algorithmic framework for convex mixed integer nonlinear programs. Disc. Opt., 5:186–204, 2008.
10. O. Exler and K. Schittkowski. A trust region SQP algorithm for mixed-integer nonlinear programming. Optimization Letters, 1:269–280, 2007.
11. J.J. Forrest. CLP and CBC. http://projects.coin-or.org/Clp,Cbc/.
12. C. Fricke. Applications of Integer Programming in Open Pit Mining. PhD thesis, University of Melbourne, August 2006.
13. A.M. Gleixner. Solving large-scale open pit mining production scheduling problems by integer programming. Master's thesis, TU Berlin, June 2008.
14. IBM. ILOG CPLEX. http://www-01.ibm.com/software/integration/optimization/cplex.
15. B.A. Murtagh and M.A. Saunders. MINOS 5.5 User's Guide. Department of Operations Research, Stanford University, 1998. Report SOL 83-20R.
16. I. Nowak and S. Vigerske. LaGO: a (heuristic) branch and cut algorithm for nonconvex MINLPs. Central Europ. J. of Oper. Research, 16(2):127–138, 2008.
17. M.G. Osanloo, J. Gholamnejad, and B. Karimi. Long-term open pit mine production planning: a review of models and algorithms. International Journal of Mining, Reclamation and Environment, 22(1):3–35, 2008.
18. S. Ramazan. The new fundamental tree algorithm for production scheduling of open pit mines. European Journal of Oper. Research, 177(2):1153–1166, 2007.
19. M. Tawarmalani and N.V. Sahinidis. Convexification and Global Optimization in Continuous and Mixed-Integer Nonlinear Programming: Theory, Algorithms, Software, and Applications. Kluwer Academic Publishers, 2002.
20. A. Wächter and L.T. Biegler. On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming. Mathematical Programming, 106(1):25–57, 2006.


A Binary Quadratic Programming Approach to the Vehicle Positioning Problem

Ralf Borndörfer and Carlos Cardonha

Abstract The VEHICLE POSITIONING PROBLEM (VPP) is a classical combinatorial optimization problem that has a natural formulation as a MIXED INTEGER QUADRATICALLY CONSTRAINED PROGRAM. This MIQCP is closely related to the QUADRATIC ASSIGNMENT PROBLEM and, as far as we know, has not received any attention yet. We show in this article that such a formulation has interesting theoretical properties. Its QP relaxation produces, in particular, the first known nontrivial lower bound on the number of shuntings. In our experiments, it also outperformed alternative integer linear models computationally. The strengthening technique that raises the lower bound might also be useful for other combinatorial optimization problems.

R. Borndörfer, C. Cardonha
Zuse Institute Berlin, Takustr. 7, 14195 Berlin, Germany
H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_4, © Springer-Verlag Berlin Heidelberg 2012

1 Introduction

The VEHICLE POSITIONING PROBLEM (VPP) is about the assignment of vehicles (buses, trams, or trains) to parking positions in a depot and to timetabled trips. The parking positions are organized in tracks, which work as one- or two-sided stacks or queues. If at some point in time a required type of vehicle is not available at the front of any track, shunting movements must be performed in order to change the vehicle positions. This is undesirable and should be avoided.

The VPP and its variants, such as the BUS DISPATCHING PROBLEM [5], the TRAM DISPATCHING PROBLEM [13], and the TRAIN UNIT DISPATCHING PROBLEM [10], are well investigated in the combinatorial optimization literature, see Hansmann and Zimmermann [7]. The problem was introduced by Winter [13] and Winter and Zimmermann [14], who modeled the VPP as a QUADRATIC ASSIGNMENT PROBLEM and used linearization techniques to solve it as an integer linear program. This approach was extended by Gallo and Di Miele [4] to deal with vehicles of different lengths and interlaced sequences of arrivals and departures. Similarly, Hamdouni et al. [5] explored robustness and the idea of uniform tracks (tracks which receive just one type of vehicle) to solve larger problems. Recently, Freling, Kroon, Lentink, and Huisman [3] and Kroon, Lentink, and Schrijver [10] proposed an integer linear program to consider decomposable vehicles (trains) and different types of tracks; they assume that the number of uniform tracks is known in advance.

Although the VPP was originally modeled as a binary quadratic program, this formulation was not explored theoretically and it was not used for computations. All research efforts that we are aware of concentrated on integer linear models, which used more and more indices in order to produce tighter linearizations. Recent progress in mixed integer nonlinear programming (MINLP) and, in particular, in mixed integer quadratically constrained programming (MIQCP) methods [11], however, has increased the attractiveness of the original quadratic model. Besides the compactness of this formulation, quadratic programming models also yield potentially superior lower bounds from fractional quadratic programming relaxations. In fact, the LP relaxations of all known integer linear models yield only the trivial lower bound zero.

We investigate in this article two binary quadratic programming formulations for the VPP. Our main result is that the QP relaxation of one of these models yields a nontrivial lower bound on the number of shunting movements, that is, the fractional QP lower bound is nonzero whenever shunting is required. This model also gave the best computational performance in our tests, even though it is not convex. We also tried to apply convexification techniques [6], but the results were mixed. Convexification helped, but only when the smallest eigenvalue of the objective function was not too negative.

The article is organized as follows. The VPP is described in Sect. 2. Section 3 discusses integer linear and integer quadratic 2-index models, i.e., we revisit the original approach of Winter. In Sect. 4 we present integer linear and integer quadratic 3-index models. One of them produces the already mentioned QP bound.

All our computational experiments were done on an Intel(R) Core 2 Quad 2,660 MHz with 4 GB RAM, running under openSUSE 11.1 (64 bits). We used CPLEX 11.2 [8] to solve linear programs, SCIP 1.0 for integer programs [1], and SNIP 1.0 for integer non-linear programs [12].

2 The Vehicle Positioning Problem

The VEHICLE POSITIONING PROBLEM (VPP) is a 3-dimensional matching problem, where vehicles that arrive in a sequence $A = \{a_1, a_2, \dots, a_n\}$ must be assigned to parking positions $P = \{p_1, p_2, \dots, p_n\}$ in a depot and depart to service a sequence of timetabled trips $D = \{d_1, d_2, \dots, d_n\}$. We assume that the first departure trip starts after the last incoming vehicle has arrived. Each vehicle $a_i$ has a type $t(a_i)$ and each trip $d_i$ can be serviced only by vehicles of type $t(d_i)$. The parking positions are located in tracks $S$, and we assume that positions in the tracks are numbered consecutively. Each track $s \in S$ has size $\beta$, and we assume that $\beta |S| \ge n$. Each track is operated as a FIFO queue, that is, vehicles enter the track at one end and leave at the other.

Consider a matching with assignments $(i, p, k)$ and $(j, q, l)$, that is, the $i$-th arriving vehicle is assigned to parking position $p$ in order to service the $k$-th departing trip and the $j$-th arriving vehicle is assigned to parking position $q$ in order to service the $l$-th departing trip. Assume that $p$ and $q$ are located in the same stack; then a shunting movement is required if either $i < j$ and $p > q$ or $p < q$ and $k > l$. In this case, we say that these assignments are in conflict and denote the associated crossings by $(i, p) \times (j, q)$ or $(p, k) \times (q, l)$.

Given $A$, $P$, $D$, $S$, $t$, and $\beta$, the VPP is to find a 3-dimensional matching that minimizes the number of crossings. The number of crossings is related to the number of required shuntings. We remark that there are more complex versions of this problem involving different sizes of vehicles and parking positions, multiple periods, etc. However, we do not consider them here.

We use the following notation. $V(M)$ denotes the optimal objective value of a model $M$. If $M$ is an ILP, $V_{LP}(M)$ is the optimal objective value of its LP relaxation, and if $M$ is an MIQCP, $V_{QP}(M)$ is the optimal objective value of its fractional quadratic programming relaxation. Finally, we say that two models $M$ and $M'$ are equivalent if, for every solution of $M$, there is a solution of $M'$ with the same objective value and vice versa.
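The conflict condition above can be made concrete with a small counting routine. The following is a minimal sketch under stated assumptions: arrivals, positions, and departures are given as 0-based integers, positions are numbered consecutively so that position p lies on track p // beta, each unordered pair of assignments is inspected once, and arrival and departure crossings are counted separately. The function name and data layout are hypothetical.

```python
from itertools import combinations


def count_crossings(assignments, beta):
    """Count arrival and departure crossings of a matching.

    assignments: list of triples (i, p, k) = (arrival index, position, trip index).
    beta: number of positions per track; position p lies on track p // beta.
    """
    arrival_crossings = 0
    departure_crossings = 0
    for (i, p, k), (j, q, l) in combinations(assignments, 2):
        if p // beta != q // beta:
            continue  # positions on different tracks never conflict
        # Arrival crossing: i < j and p > q (or the same with roles swapped).
        if (i < j and p > q) or (j < i and q > p):
            arrival_crossings += 1
        # Departure crossing: p < q and k > l (or the same with roles swapped).
        if (p < q and k > l) or (q < p and l > k):
            departure_crossings += 1
    return arrival_crossings, departure_crossings


# Two tracks of size 2: the second track serves its trips in the wrong order.
example = count_crossings([(0, 0, 0), (1, 1, 1), (2, 2, 3), (3, 3, 2)], beta=2)
```

In the example, the vehicles on the first track arrive and depart in position order, while on the second track the vehicle at the lower position serves the later trip, which yields one departure crossing.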

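3 Two-Index Models

Winter [14] gave the following integer quadratic programming formulation for the VPP:

\begin{align}
(\mathrm{W})\quad \min\ & \sum_{(a,p)\times(a',q)} x_{a,p}\,x_{a',q} + \sum_{(d,p)\times(d',q)} y_{d,p}\,y_{d',q} \tag{1}\\
& \sum_{a\in A} x_{a,p} = 1, && p\in P \tag{2}\\
& \sum_{p\in P} x_{a,p} = 1, && a\in A \tag{3}\\
& \sum_{d\in D} y_{d,p} = 1, && p\in P \tag{4}\\
& \sum_{p\in P} y_{d,p} = 1, && d\in D \tag{5}\\
& x_{a,p} + y_{d,p} \le 1, && (a,p,d)\in A\times P\times D,\ t(a)\ne t(d) \tag{6}\\
& x_{a,p},\ y_{d,p}\in\{0,1\}. \notag
\end{align}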

The model uses binary variables $x_{a,p}$, with $a \in A$ and $p \in P$, and $y_{d,p}$, with $d \in D$ and $p \in P$. If $x_{a,p} = 1$ ($y_{d,p} = 1$), vehicle $a$ (trip $d$) is assigned to parking position $p$. Constraints (2)–(5) define the assignments; constraint (6) enforces the coherence of these assignments by allowing only vehicles and trips of the same type to be assigned to a given parking position. Finally, the quadratic cost function calculates the number of crossings.

In his work, Winter did not solve the quadratic program directly. Instead, he applied the linearization method of Kaufman and Broeckx [9], obtaining the following integer linear model:

\begin{align}
(\mathrm{LW})\quad \min\ & \sum_{a\in A,\,p\in P} w_{a,p} + \sum_{d\in D,\,p\in P} u_{d,p} \tag{7}\\
& \sum_{a\in A} x_{a,p} = 1, && p\in P \tag{8}\\
& \sum_{p\in P} x_{a,p} = 1, && a\in A \tag{9}\\
& \sum_{d\in D} y_{d,p} = 1, && p\in P \tag{10}\\
& \sum_{p\in P} y_{d,p} = 1, && d\in D \tag{11}\\
& x_{a,p} + y_{d,p} \le 1, && (a,p,d)\in A\times P\times D,\ t(a)\ne t(d) \tag{12}\\
& d^x_{a,p}\,x_{a,p} - w_{a,p} + \sum_{(a,p)\times(a',q)} x_{a',q} \le d^x_{a,p}, && \forall p\in P,\ a\in A \tag{13}\\
& d^y_{d,p}\,y_{d,p} - u_{d,p} + \sum_{(d,p)\times(d',q)} y_{d',q} \le d^y_{d,p}, && \forall p\in P,\ d\in D \tag{14}\\
& x_{a,p},\ y_{d,p}\in\{0,1\}, \quad w_{a,p},\ u_{d,p}\in\mathbb{N}. \notag
\end{align}

In this model, the integer variables $w_{a,p}$ and $u_{d,p}$ count the number of crossings involving the assignments $(a,p)$ and $(d,p)$, respectively. $d^x_{a,p}$ and $d^y_{d,p}$ are upper bounds on these variables, respectively, that are computed a priori.

The following is known about these models:

Remark 1. The model W has $2n^2$ variables and $n^3 + 4n$ constraints.

Remark 2. The model LW has $4n^2$ variables and $n^3 + 2n^2 + 4n$ constraints.

Theorem 1 (WZ00). The models W and LW are equivalent.

Theorem 2 (WZ00). $V_{LP}(\mathrm{LW}) = 0$.

It is not difficult to modify Winter's proof of Theorem 2 in order to get a similar result for the QP relaxation of his quadratic model:

Theorem 3. $V_{QP}(\mathrm{W}) = 0$ if $|S| > 1$.

Proof. Let $M$ be a matching where each $a_i$ is assigned to $d_i$ (i.e., first vehicle to first trip, second vehicle to second trip, and so on) and the assignment of the pairs $(a_i, d_i)$ to the parking positions is made according to the following scheme, where each column represents a track:

\[
\begin{array}{cccc}
(a_{n-|S|+1}, d_{n-|S|+1}) & (a_{n-|S|+2}, d_{n-|S|+2}) & \cdots & (a_n, d_n)\\
\vdots & \vdots & \ddots & \vdots\\
(a_{|S|+1}, d_{|S|+1}) & (a_{|S|+2}, d_{|S|+2}) & \cdots & (a_{2|S|}, d_{2|S|})\\
(a_1, d_1) & (a_2, d_2) & \cdots & (a_{|S|}, d_{|S|})
\end{array}
\]

Such a matching has no crossings. However, it is not always feasible for W because of type mismatches (cf. the coherence constraints (6)). If the integrality of the variables is relaxed, assigning each pair $(a_i, d_i)$ to the same relative position in each track avoids the restrictions given by the coherence constraints. More precisely, if a pair $(a_i, d_i)$ is assigned to the second position of some track (in other words, if $\lfloor (i-1)/|S| \rfloor = 1$), we fix $x_{a_i,p} = y_{d_i,p} = 1/|S|$ for each position $p \in P$ which is the second position in some track (in other words, if $\lfloor (p-1)/|S| \rfloor = 1$). If $|S| > 1$, constraints (6) are satisfied. Since there are no crossings, the objective value is zero. □

A problem with model W is that the objective is not convex. This obstacle can be overcome using the following eigenvalue technique of Hammer and Rubin [6]. Initially, we observe that $\sum_{(a,p)\times(a',q)} x_{a,p}\,x_{a',q}$ can be written as $x^T A x$, where $A \in \{0,1\}^{n^2 \times n^2}$ is the symmetric incidence matrix of all arrival crossings. If $\alpha$ is the minimum eigenvalue of $A$, we have

\[
x^T A x = x^T (A - \alpha I)\,x + \alpha\,x^T x. \tag{15}
\]

As $x$ is binary, this equation can be rewritten as

\[
x^T A x = x^T (A - \alpha I)\,x + \alpha \sum_i x_i. \tag{16}
\]

Finally, in our case, we have $\sum_i x_i = n$ for every feasible solution, that is,

\[
x^T A x = x^T (A - \alpha I)\,x + \alpha n. \tag{17}
\]

As $A - \alpha I$ is positive semidefinite, the function on the right is convex. The same ideas yield $\sum_{(d,p)\times(d',q)} y_{d,p}\,y_{d',q} = y^T A' y$. Moreover, $A' = A$. Then, the objective can be written as

\[
x^T A' x - \alpha \sum_{(a,p)} \left(x_{a,p}^2 - x_{a,p}\right) + y^T A' y - \alpha \sum_{(d,p)} \left(y_{d,p}^2 - y_{d,p}\right). \tag{18}
\]

Applying this substitution to the model W, we obtain:

\begin{align}
(\mathrm{CW})\quad \min\ & x^T A' x - \alpha \sum_{(a,p)} \left(x_{a,p}^2 - x_{a,p}\right) + y^T A' y - \alpha \sum_{(d,p)} \left(y_{d,p}^2 - y_{d,p}\right) \notag\\
& \sum_{a\in A} x_{a,p} = 1, && p\in P \tag{19}\\
& \sum_{p\in P} x_{a,p} = 1, && a\in A \tag{20}\\
& \sum_{d\in D} y_{d,p} = 1, && p\in P \tag{21}\\
& \sum_{p\in P} y_{d,p} = 1, && d\in D \tag{22}\\
& x_{a,p} + y_{d,p} \le 1, && (a,p,d)\in A\times P\times D,\ t(a)\ne t(d) \tag{23}\\
& x_{a,p},\ y_{d,p}\in\{0,1\}. \notag
\end{align}

Table 1 gives the results of a computational comparison of models W and LW, and W and CW, respectively, on a test set of ten instances of small and medium sizes. The first column gives the name x-y-z of the problem. Here, x is the number of vehicle types, y is the number of tracks, and $z = \beta$ is the number of parking positions per track. The arrival sequences $A$ were built randomly (i.e., the type of each vehicle was uniformly chosen among the x possibilities), while sequences $D$ were obtained by applying 1,000 uniformly chosen random swaps to $A$. The columns labeled Row, Col, and NZ give the number of constraints, variables, and non-zeros of the respective model. The numbers of rows and columns for the problems of model CW are the same as the ones for model W. Columns Nod give the number of nodes in the search tree generated by the respective solver (SCIP with LP solver CPLEX for LW and SNIP for W) and T/s the computation time in seconds.

Comparing the results for models CW and W shows that convexification led to an improvement, but not enough to outperform the linearized model LW, in particular not on the larger instances. We remark that more sophisticated convexification techniques might improve the results [2].
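The eigenvalue shift behind model CW can be checked on a toy example. The sketch below is illustrative only: it uses a hypothetical 2x2 symmetric 0/1 matrix instead of the $n^2 \times n^2$ crossing incidence matrix, computes the minimum eigenvalue in closed form (valid for 2x2 symmetric matrices only), and verifies that the shifted objective agrees with the original one on all binary points while its matrix is positive semidefinite.

```python
import math
from itertools import product


def min_eig_2x2(m):
    """Smallest eigenvalue of a symmetric 2x2 matrix [[a, b], [b, d]]."""
    (a, b), (_, d) = m
    return (a + d) / 2 - math.sqrt(((a - d) / 2) ** 2 + b * b)


def quad(m, x):
    """Quadratic form x^T m x."""
    return sum(m[i][j] * x[i] * x[j] for i in range(2) for j in range(2))


A = [[0.0, 1.0], [1.0, 0.0]]   # toy incidence matrix: one crossing pair
alpha = min_eig_2x2(A)         # minimum eigenvalue of A

# A - alpha * I, the matrix of the convexified quadratic term.
shifted = [[A[i][j] - (alpha if i == j else 0.0) for j in range(2)]
           for i in range(2)]


def convexified(x):
    """x^T (A - alpha I) x + alpha * sum(x), cf. equation (16)."""
    return quad(shifted, x) + alpha * sum(x)


# On binary points both objectives coincide; in between they may differ.
agree_on_binary = all(abs(quad(A, x) - convexified(x)) < 1e-12
                      for x in product((0.0, 1.0), repeat=2))
```

At the fractional point (0.5, 0.5) the original form evaluates to 0.5 while the convexified one evaluates to 0, which illustrates that the shift changes the relaxation but not the values at binary points.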

Table 1 Comparing models LW, W, and CW

        LW                                           W/CW              W                          CW
Name         Row      Col       NZ      Nod     T/s       Row      Col       NZ      Nod     T/s        NZ      Nod     T/s
3-6-4     10,465    2,305   43,741    1,343      58     9,325    1,165   21,889      215     142    21,913    1,543     116
4-6-4     11,617    2,305   46,045   12,849     265    10,477    1,165   24,193      816     214    29,977   24,217     690
5-6-4     12,289    2,305   47,389   32,870     654    11,149    1,165   25,537    1,010     237    25,561      586      96
3-7-3      7,141    1,765   25,257      234      18     6,273      897   14,995      590      58    15,023      245      29
4-7-3      7,897    1,765   26,769   17,220      15     7,029      897   16,507      523      52    16,535      324      32
5-7-3      8,359    1,765   27,693      114      19     7,491      897   17,431      651      64    17,459      858      42
3-7-4     16,297    3,137   68,391   17,220     124    14,743    1,583   33,937      480     121    33,965    2,122     176
4-7-4     18,145    3,137   72,087    7,393     574    16,591    1,583   37,633    1,609     251    37,661    1,526     242
5-7-4     19,209    3,137   74,215   60,590   2,171    17,655    1,583   39,761  113,997  11,845    39,789    1,320   1,544
3-7-5     31,151    4,901  152,125   59,992   3,251    28,715    2,465   64,471    6,612  76,685    64,499      627  40,145

4 Three-Index Models

Gallo and Di Miele [4] improved Winter's model by noting that assignments $(a, s)$ and $(d, s)$ of arrivals and departures to stacks implicitly determine the parking positions uniquely; this produces a substantially smaller model. Kroon, Lentink and Schrijver [10] used this idea to create a 3-index model with a stronger LP relaxation (although the lower bound is still equal to zero):

\begin{align}
\text{(LU)}\quad \min\ & \sum_{(a,s,d) \in A \times S \times D} r_{a,s,d} \tag{24}\\
\text{s.t.}\ & \sum_{(s,d) \in S \times D} x_{a,s,d} = 1, && a \in A, \tag{25}\\
& \sum_{(a,s) \in A \times S} x_{a,s,d} = 1, && d \in D, \tag{26}\\
& \sum_{(a,d) \in A \times D} x_{a,s,d} \le \beta, && s \in S, \tag{27}\\
& \sum_{a'} x_{a',s,d} + \sum_{d'} x_{a,s,d'} - r_{a,s,d} \le 1, && (a,s,d) \in A \times S \times D, \tag{28}\\
& x_{a,s,d},\ r_{a,s,d} \in \{0,1\}. \notag
\end{align}

In (28), the first sum ranges over the arrivals $a'$ and the second over the departures $d'$ whose order relative to $a$ and $d$, respectively, produces a crossing.

This model uses binary variables $x_{a,s,d}$, with $a \in A$, $s \in S$, $d \in D$, and $t(a) = t(d)$ (modeling type-coherence directly, as assignments with type mismatches are not represented), where $x_{a,s,d} = 1$ if and only if vehicle $a$ is assigned to trip $d$ and is parked on track $s$. Equations (25) and (26) are assignment constraints for vehicles and trips, and (27) are capacity restrictions for each track in $S$. Inequalities (28) count crossings using binary variables $r_{a,s,d}$.

We propose the following integer quadratic 3-index formulation for the problem:

\begin{align}
\text{(U)}\quad \min\ & \sum_{s \in S}\ \sum_{\substack{(a,d),\,(a',d') \\ \text{crossing}}} x_{a,s,d}\, x_{a',s,d'} \tag{29}\\
\text{s.t.}\ & \sum_{(s,d) \in S \times D} x_{a,s,d} = 1, && a \in A, \tag{30}\\
& \sum_{(a,s) \in A \times S} x_{a,s,d} = 1, && d \in D, \tag{31}\\
& \sum_{(a,d) \in A \times D} x_{a,s,d} \le \beta, && s \in S, \tag{32}\\
& x_{a,s,d} \in \{0,1\}. \notag
\end{align}

Equations (30), (31), and (32) are equal to (25), (26), and (27), respectively. Crossings are counted directly by the quadratic cost function (29). The models U and LU have the following properties:

Remark 3. Model LU has $2sn^2$ variables and $2n + s + sn^2$ constraints.

Remark 4. Model U has $sn^2$ variables and $2n + s$ constraints.

Theorem 4. $V_{LP}(\mathrm{LU}) = 0$ if $|S| > 1$.

Proof. Let $M$ be a matching where each $a_i$ is assigned to $d_i$ (i.e., first vehicle to first trip, second vehicle to second trip, and so on). Assign $\frac{1}{|S|}$ to each variable $x_{a,s,d}$ such that $(a, d) \in M$. In this case, constraints (25) and (26) clearly hold, as
\[ \sum_{s} x_{a,s,d} = \sum_{s} \frac{1}{|S|} = 1 \]
for each $a \in A$ and $d \in D$. Moreover, as $|M| = n$,
\[ \sum_{(a,d)} x_{a,s,d} = n \cdot \frac{1}{|S|} \le \beta \]
for each $s \in S$, satisfying (27). Finally, because each arrival is assigned to only one departure, we have
\[ \sum_{a'} x_{a',s,d} + \sum_{d'} x_{a,s,d'} \le \frac{2}{|S|}, \]
where the sums range over the same index sets as in (28). Since $|S| > 1$, the right-hand side is at most 1, and consequently constraints (28) hold with $r_{a,s,d} = 0$ for each $(a, s, d)$, yielding a solution of cost zero. $\square$

Our key observation is that model U can be strengthened by penalizing not only crossings but also inconsistent assignments:

\begin{align}
\text{(UI)}\quad \min\ & \sum_{s \in S}\ \sum_{\substack{(a,d),\,(a',d') \\ \text{crossing}}} x_{a,s,d}\, x_{a',s,d'} + \sum_{a \in A}\ \sum_{(s,d) \ne (s',d')} x_{a,s,d}\, x_{a,s',d'} \tag{33}\\
\text{s.t.}\ & \sum_{(s,d) \in S \times D} x_{a,s,d} = 1, && a \in A, \tag{34}\\
& \sum_{(a,s) \in A \times S} x_{a,s,d} = 1, && d \in D, \tag{35}\\
& \sum_{(a,d) \in A \times D} x_{a,s,d} \le \beta, && s \in S, \tag{36}\\
& x_{a,s,d} \in \{0,1\}. \notag
\end{align}

The objective function of UI contains an additional penalty term
\[ \sum_{a \in A}\ \sum_{(s,d) \ne (s',d')} x_{a,s,d}\, x_{a,s',d'} \]
for inconsistent assignments of vehicles (i.e., if a vehicle is assigned to more than one track and/or more than one trip, the value of the product of the variables representing such an inconsistent assignment is added). The penalty term is zero for every feasible integer solution, but it increases the objective value of the QP relaxation.

Theorem 5. The models U, UI, and LU are equivalent.

Theorem 6. $V_{QP}(\mathrm{UI}) > 0$ if $V(\mathrm{UI}) > 0$.

Proof. If $V(\mathrm{UI}) > 0$, there is a crossing for each possible assignment of vehicles to trips and tracks. Let $x$ be an optimal solution of the QP relaxation of UI. Consider the vector $\lceil x \rceil$. If $\lceil x \rceil$ contains an integer solution, there is a crossing and
\[ \sum_{s \in S}\ \sum_{\substack{(a,d),\,(a',d') \\ \text{crossing}}} \lceil x_{a,s,d} \rceil \lceil x_{a',s,d'} \rceil > 0. \]
Then
\[ \sum_{s \in S}\ \sum_{\substack{(a,d),\,(a',d') \\ \text{crossing}}} x_{a,s,d}\, x_{a',s,d'} > 0. \]
If $\lceil x \rceil$ does not contain an integer solution, there is an inconsistent assignment and therefore
\[ \sum_{a \in A}\ \sum_{(s,d) \ne (s',d')} x_{a,s,d}\, x_{a,s',d'} > 0. \]
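The effect of the inconsistency penalty can be checked numerically on a toy instance. The following minimal sketch (not part of the original computations) evaluates the penalty term for the fractional point used in the proof of Theorem 4 and for a consistent integer assignment. The instance data, the function name `inconsistency_penalty`, and the convention of summing over ordered pairs $(s,d) \ne (s',d')$ are illustrative assumptions; summing over unordered pairs would halve the value without changing its sign.

```python
from fractions import Fraction
from itertools import product


def inconsistency_penalty(x, vehicles, tracks, trips):
    """Sum x[a,s,d] * x[a,s2,d2] over vehicles a and ordered pairs (s,d) != (s2,d2)."""
    slots = list(product(tracks, trips))
    total = Fraction(0)
    for a in vehicles:
        for first, second in product(slots, repeat=2):
            if first != second:
                total += x.get((a, *first), 0) * x.get((a, *second), 0)
    return total


# Hypothetical toy instance: three vehicles, three trips, two tracks.
tracks = ["s1", "s2"]
vehicles = ["a1", "a2", "a3"]
trips = ["d1", "d2", "d3"]

# Fractional point from the proof of Theorem 4: vehicle a_i serves trip d_i,
# spread evenly over all tracks with weight 1/|S|.
fractional = {
    (a, s, d): Fraction(1, len(tracks))
    for a, d in zip(vehicles, trips)
    for s in tracks
}

# A consistent integer assignment: exactly one track and one trip per vehicle.
integral = {("a1", "s1", "d1"): 1, ("a2", "s2", "d2"): 1, ("a3", "s1", "d3"): 1}

penalty_fractional = inconsistency_penalty(fractional, vehicles, tracks, trips)
penalty_integral = inconsistency_penalty(integral, vehicles, tracks, trips)
print(penalty_fractional, penalty_integral)
```

On this instance the fractional point, which has cost zero in LU, receives a penalty of 3/2, while the integer assignment receives none. This is the mechanism that Theorem 6 relies on.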

As far as we know, $V_{QP}(\mathrm{UI})$ is the first nontrivial lower bound for the VPP. We remark that the same idea can also be used to strengthen some of the linear models such that they sometimes also produce nonzero lower bounds. We have, however, not been able to prove a result similar to Theorem 6, that is, that the lower bound is always nonzero if shuntings are required.

Table 2 gives the results of a computational comparison of models U and LU on the same set of test problems as in Sect. 3 plus one additional model that could not be solved there. Model UI could not be tested yet due to numerical problems. The comparison of the results for models CW and W from Sect. 3 and those for LU and U shows a clear superiority of the U models over the W models. Among the U models, the integer quadratic model U outperformed the integer linear model LU. The next instance 7-8-7, however, could not be solved using any of our formulations.

Table 2 Comparing models LU and U

        LU                                      U
Name      Row     Col       NZ  Nod  T/s     Row    Col      NZ  Nod  T/s
3-6-4   3,511   4,609   38,017    1    1      61  1,159   4,621   28    4
4-6-4   3,511   4,321   30,241    1    0      61    871   3,463   69    3
5-6-4   3,511   4,153   25,675   59   15      61    703   2,797   16    2
3-7-3   3,137   4,117   30,871   12    8      57  1,037   4,124   16    3
4-7-3   3,137   3,865   24,816    1    1      57    785   3,123   20    2
5-7-3   3,137   3,711   21,274   54    6      57    631   2,507   27   28
3-7-4   5,552   7,323   67,803    1    1      71  1,842   7,351   22   10
4-7-4   5,552   6,861   53,509   41   29      71  1,380   5,503   33    7
5-7-4   5,552   6,595   45,389    1    1      71  1,114   4,432   21    4
3-7-5   8,653  11,439  126,099    1    4      85  2,871  11,467   17   34
4-7-5   8,653  10,725   98,582   59   44      85  2,157   8,604   22   21
5-7-5   8,653  10,291   82,321   26   38      85  1,723   6,875   31   12
6-7-6  12,440  14,407  117,307  227  200      99  2,066   8,240   27   32


We have also tried to apply the convexification technique of Hammer and Rubin [6] to model U, but this time it did not bring any performance gain. A possible explanation for this behavior is that the spectra of the objectives of the U instances have negative eigenvalues of much larger magnitude than those in the W instances. Again, more sophisticated convexification could be tried [2].

Acknowledgements We thank Stefan Vigerske for his advice with respect to the formulation of integer quadratic programs and SNIP support. We also thank an anonymous referee for helpful comments and suggestions. The work of C. C. is supported by CNPq-Brazil.

References
1. T. Achterberg, Constraint integer programming, Ph.D. thesis, TU Berlin, 2007.
2. A. Billionnet and S. Elloumi, Using a mixed integer quadratic programming solver for the unconstrained quadratic 0-1 problem, Math. Program., 109 (2007), pp. 55–68.
3. R. Freling, R. Lentink, L. Kroon, and D. Huisman, Shunting of passenger train units in a railway station, ERIM Report Series Research in Management, 2002.
4. G. Gallo and F. Di Miele, Dispatching buses in parking depots, Transportation Science, 35 (2001), pp. 322–330.
5. M. Hamdouni, F. Soumis, and G. Desaulniers, Dispatching buses in a depot minimizing mismatches, 7th IMACS, Scientific Computing, Toronto, Canada, 2005.
6. P. Hammer and A. Rubin, Some remarks on quadratic programming with 0-1 variables, Revue Francaise d’Informatique et de Recherche Operationelle, 4 (1970), pp. 67–79.
7. R. S. Hansmann and U. T. Zimmermann, Optimal Sorting of Rolling Stock at Hump Yards, in Mathematics – Key Technology for the Future: Joint Projects Between Universities and Industry, Springer, Berlin, 2008, pp. 189–203.
8. ILOG, CPLEX website. http://www.ilog.com/products/cplex/.
9. L. Kaufmann and F. Broeckx, An algorithm for the quadratic assignment problem, European J. Oper. Res., 2 (1978), pp. 204–211.
10. L. Kroon, R. Lentink, and A. Schrijver, Shunting of passenger train units: an integrated approach, ERIM Report Series Reference No. ERS-2006-068-LIS, 2006.
11. I. Nowak, Relaxation and Decomposition Methods for Mixed Integer Nonlinear Programming, Birkhäuser Verlag, 2005.
12. S. Vigerske, Nonconvex mixed-integer nonlinear programming. http://www.math.hu-berlin.de/stefan/B19/.
13. T. Winter, Online and Real-Time Dispatching Problems, PhD thesis, TU Braunschweig, 1998.
14. T. Winter and U. Zimmermann, Real-time dispatch of trams in storage yards, Annals of Operations Research, 96 (2000), pp. 287–315.


Determining Fair Ticket Prices in Public Transport by Solving a Cost Allocation Problem

Ralf Borndörfer and Nam-Dũng Hoàng

Abstract Ticket pricing in public transport usually takes a welfare maximization point of view. Such an approach, however, does not consider fairness in the sense that users of a shared infrastructure should pay for the costs that they generate. We propose an ansatz to determine fair ticket prices that combines concepts from cooperative game theory and integer programming. An application to pricing railway tickets for the intercity network of the Netherlands is presented. The results demonstrate that prices that are much fairer than standard ones can be computed in this way.

1 Introduction

Public transport ticket prices are well studied in the economic literature on welfare optimization as well as in the mathematical optimization literature on certain network design problems; see, e.g., the literature survey in [2]. To the best of our knowledge, however, the fairness of ticket prices has not been investigated yet. The point is that typical pricing schemes are not related to infrastructure operation costs and, in this sense, favor some users, who do not fully pay for the costs they incur. For example, we will show that in this paper’s (academic) example of the Dutch IC railway network, the current distance tariff results in a situation where the passengers in the central Randstad region of the country pay over 25% more than the costs they incur, and these excess payments subsidize operations elsewhere. One can argue that this is not fair. We therefore ask whether it is possible to construct ticket prices that reflect operation costs better.

R. Borndörfer · N.-D. Hoàng, Zuse Institute Berlin (ZIB), Germany, e-mail: [email protected]; [email protected]
H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_5, © Springer-Verlag Berlin Heidelberg 2012


Ticket pricing can be seen as a cost allocation problem; see [11] for a survey and introduction. Cost allocation problems are widespread. They come up whenever it is necessary or desirable to divide a common cost between several users or items. If the users have an alternative to accepting the prices, cost allocation becomes a game. Applications in which cost allocations have been determined using methods from cooperative game theory include aircraft landing fees [8], water resource planning [10], water resource development [12], distribution cost of gas and oil transportation [4], and investment in electric power [5].

In this paper we model ticket pricing as a cooperative cost allocation game to minimize overpayments. We argue that the $f$-least core of this game can be used to determine fair prices. The $f$-least core can be computed by solving a linear program. This linear program has a number of constraints that is exponential in the number of players, but it can be solved using a constraint generation approach. The associated separation problem is an NP-hard combinatorial optimization problem.

The article is structured as follows. Section 2 recalls some concepts from cooperative game theory. Some desired properties that a cost allocation should have are introduced in Sect. 3. A model that treats ticket pricing as a cost allocation game is presented in Sect. 4. The final Sect. 5 is devoted to the Dutch IC railway network example.

We use the following notation. For a vector $x \in \mathbb{R}^N$ and a subset $S \subseteq N$, we denote by $x(S) = \sum_{i \in S} x_i$ the sum of the coordinates of $x$ in $S$.

2 Game Theoretical Setting

A cost allocation game deals with price determination and is defined as follows.

Definition 1. Consider a set of players $N = \{1, 2, \dots, n\}$, a cost function $c : 2^N \setminus \{\emptyset\} \to \mathbb{R}_+$, and a non-empty polyhedron $P = \{x \in \mathbb{R}^n \mid Ax \le b,\ x_i \ge 0\ \forall i \in N\}$, which gives the set of feasible prices $x$ that the players are asked to pay. The triple $\Gamma = (N, c, P)$ is a cost allocation game.

Definition 2. Let $\Gamma = (N, c, P)$ be a cost allocation game and $f : 2^N \setminus \{\emptyset\} \to \mathbb{R}_+$ a weight function. For each coalition $\emptyset \ne S \subsetneq N$ and each vector $x = (x_1, \dots, x_n) \in \mathbb{R}^n$, we define the $f$-excess of $S$ at $x$ as
\[ e_f(S, x) := \frac{c(S) - x(S)}{f(S)}. \]

The $f$-excess represents the gain (or loss, if it is negative) of coalition $S$, scaled by $f(S)$, if its members accept to pay $x(S)$ instead of operating some service themselves at cost $c(S)$. We will assume in this article that the weight function $f$ has the form $f = \alpha + \beta|\cdot| + \gamma c$ with $\alpha, \beta, \gamma \ge 0$ and $\alpha + \beta + \gamma > 0$; e.g., $f(S) = |S|$ gives the gain per coalition member. The excess measures price acceptability: the smaller $e_f(S, x)$, the less favorable is price $x$ for coalition $S$, and for $e_f(S, x) < 0$, i.e., in case of a loss, $x$ will be seen as unfair by the members of $S$.

We assume furthermore that the set of cost-covering prices of a cost allocation game $\Gamma$,
\[ X(\Gamma) := \{x \in P \mid x(N) = c(N)\}, \]
is non-empty. The goal of the cost allocation game is to determine a price $x \in X(\Gamma)$ which minimizes the loss (or maximizes the gain) over all coalitions. In other words, we associate the optimization problem
\[ \max_{x \in X(\Gamma)}\ \min_{\emptyset \ne S \subsetneq N} e_f(S, x) \]
with the cost allocation game $\Gamma = (N, c, P)$. We recall some related definitions from game theory.

Definition 3. Consider a cost allocation game $\Gamma = (N, c, P)$. Let $\varepsilon$ be a real number and $f : 2^N \setminus \{\emptyset\} \to \mathbb{R}_+$ be a weight function. The set
\[ C_{\varepsilon,f}(\Gamma) := \{x \in X(\Gamma) \mid e_f(S, x) \ge \varepsilon,\ \forall\, \emptyset \ne S \subsetneq N\} \]
is the $(\varepsilon, f)$-core of $\Gamma$. In particular, $C_{0,f}(\Gamma)$ is the core of $\Gamma$. The $f$-least core of the game $\Gamma$, denoted $LC_f(\Gamma)$, is the intersection of all nonempty $(\varepsilon, f)$-cores.

Proposition 1. Let $\varepsilon_f(\Gamma)$ be the largest $\varepsilon$ such that the set $C_{\varepsilon,f}(\Gamma)$ is nonempty, i.e.,
\[ \varepsilon_f(\Gamma) = \max_{x \in X(\Gamma)}\ \min_{\emptyset \ne S \subsetneq N} e_f(S, x). \]
Then $LC_f(\Gamma) = C_{\varepsilon_f(\Gamma),f}(\Gamma)$.

In other words, the $f$-least core is the set of all vectors in $X(\Gamma)$ that maximize the minimum $f$-excess of proper subsets of $N$. The following trivial proposition holds.

Proposition 2. If $X(\Gamma)$ is non-empty, then $LC_f(\Gamma)$ is non-empty.
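To make the f-excess concrete, the following minimal sketch evaluates it for every proper coalition of a three-player game and reports the smallest value, which is the quantity that the f-least core maximizes. The game data (singleton cost 6, pair cost 9, total cost 12), the two candidate price vectors, and all function names are hypothetical and chosen only for illustration; the weight is f(S) = |S|.

```python
from fractions import Fraction
from itertools import combinations


def proper_coalitions(players):
    """Yield every coalition S that is non-empty and different from N."""
    for size in range(1, len(players)):
        for members in combinations(players, size):
            yield frozenset(members)


def f_excess(S, x, cost, weight):
    """Return e_f(S, x) = (c(S) - x(S)) / f(S)."""
    return (cost[S] - sum(x[i] for i in S)) / weight(S)


def min_excess(x, cost, weight, players):
    """Smallest f-excess over all proper coalitions."""
    return min(f_excess(S, x, cost, weight) for S in proper_coalitions(players))


# Hypothetical symmetric game: c({i}) = 6, c({i, j}) = 9, c(N) = 12.
players = (1, 2, 3)
cost = {S: Fraction(6 if len(S) == 1 else 9) for S in proper_coalitions(players)}
cost[frozenset(players)] = Fraction(12)

# Two cost-covering price vectors, x(N) = c(N) = 12 in both cases.
equal_split = {1: Fraction(4), 2: Fraction(4), 3: Fraction(4)}
skewed = {1: Fraction(5), 2: Fraction(4), 3: Fraction(3)}

min_equal = min_excess(equal_split, cost, len, players)
min_skewed = min_excess(skewed, cost, len, players)
print(min_equal, min_skewed)
```

The equal split leaves every proper coalition with an excess of at least 1/2, whereas under the skewed prices the coalition {1, 2} gains nothing, so the skewed vector is the less acceptable of the two in the sense of the max-min problem above.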

3 Cost Allocation Methods

We take in this section a novel axiomatic approach to cost allocation games. Introducing a number of desirable properties, we ask whether there is a “perfect” cost allocation method that fulfills them all. It will turn out that this is unfortunately impossible. Due to lack of space, we state here only the results; a full exposition can be found in [7].


Definition 4. Let $\Pi$ be the set of all cost allocation games. A cost allocation method $\Phi$ is a function
\[ \Phi : \Pi \to \bigcup_{k=1}^{\infty} \mathbb{R}^k_{\ge 0}, \qquad (N, c, P) \mapsto x \in \mathbb{R}^{|N|}_{\ge 0}. \]

Definition 5. A cost allocation method $\Phi$ is feasible for a cost allocation game $(N, c, P)$ if $\Phi(N, c, P)$ belongs to $P$. A cost allocation method is feasible if it is feasible for every game $(N, c, P)$.

Definition 6. A cost allocation method $\Phi$ is efficient for a cost allocation game $(N, c, P)$ if there holds
\[ \Phi(N, c, P)(N) = c(N). \]
A cost allocation method is efficient if it is efficient for every game $(N, c, P)$.

Definition 7. Consider a cost allocation game $(N, c, P)$. For each coalition $\emptyset \ne S \subsetneq N$, define the set
\[ P_S := \left\{ x|_S \;\middle|\; x \in P : x|_{N \setminus S} = 0 \right\}. \]
Assume that $P_S$ is non-empty for every coalition $\emptyset \ne S \subsetneq N$. A cost allocation method $\Phi$ is a coalitionally stable allocation method for the game $(N, c, P)$ if there holds for every coalition $\emptyset \ne S \subsetneq N$
\[ \Phi(N, c, P)(S) \le \Phi(S, c_S, P_S)(S), \]
where $c_S := c|_{2^S}$. A cost allocation method is coalitionally stable if it is coalitionally stable for every cost allocation game $(N, c, P)$ satisfying $P_S \ne \emptyset$ for every nonempty coalition $S$.

With a coalitionally stable cost allocation method, there is no proper coalition $S$ of $N$ such that the price for each player in $S$ will not increase and the price for at least one player will decrease if $S$ leaves the grand coalition $N$. Hence, the grand coalition $N$ is stable, since no coalition $S$ profits from leaving $N$.

Definition 8. A cost allocation method $\Phi$ is a core allocation method (respectively, an $f$-least core allocation method) if for every cost allocation game $(N, c, P)$ the vector $\Phi(N, c, P)$ belongs to the core of $(N, c, P)$ whenever the core is non-empty (respectively, to the $f$-least core of $(N, c, P)$).

In reality, allocation costs can often only be approximated or they can change over time. Therefore, a cost allocation method should be insensitive with respect to changes of the cost function.

Definition 9. A cost allocation method $\Phi$ is said to have bounded variation if for each number $\delta \in (0, 1)$ there exists a positive number $K$ such that for all cost allocation games $(N, c, P)$ and $(N, \tilde{c}, P)$ satisfying
\[ |\tilde{c}(S) - c(S)| \le \alpha\, |c(S)|, \quad \forall S \subseteq N, \]
for some $0 \le \alpha \le \delta$, there holds
\[ |\Phi(N, \tilde{c}, P)_i - \Phi(N, c, P)_i| \le K \alpha\, \Phi(N, c, P)_i, \quad \forall i \in N : \Phi(N, c, P)_i \ne 0. \]

Proposition 3. Each $f$-least core allocation method is a feasible, efficient core allocation method.

We can now formulate our question: Does a feasible, efficient, coalitionally stable core allocation method with bounded variation exist? The answer is “no” due to the following two propositions.

Proposition 4. There is no efficient, coalitionally stable allocation method for cost allocation games whose cores are empty.

Proposition 5. Core allocation methods do not have bounded variation, even if we only consider cost allocation games having a monotone, subadditive cost function.

That means that, in general, one cannot construct an efficient, coalitionally stable core allocation method that has bounded variation. Even worse, at most two of these four properties can be simultaneously fulfilled. There are two ways to proceed: one is to consider more specific families of cost allocation games, which could have better properties; the other is to try to minimize the degree of axiomatic violation. An example for the latter approach, namely, the minimization of unfairness in the sense of coalitional instability, is given in the following section.

4 Ticket Pricing as a Cooperative Game

To apply the framework of Sects. 2 and 3 to the ticket pricing problem, we define a suitable cost allocation game $\Gamma = (N, c, P)$. Consider a railway network as a graph $G = (V, E)$, and let $N \subseteq V \times V$ be a set of origin-destination (OD) pairs, between which passengers want to travel, i.e., we consider each (set of passengers of an) OD-pair as a player. We next define the cost $c(S)$ of a coalition $S \subseteq N$ as the minimum operation cost of a network of railway lines in $G$ that service $S$. Using the classical line planning model of [3], $c(S)$ can be computed by solving the integer program

\begin{align}
c(S) := \min_{(\xi,\rho)}\ & \sum_{(r,f) \in R \times F} \left( c^1_{r,f}\, \xi_{r,f} + c^2_{r,f}\, \rho_{r,f} \right) \tag{1}\\
\text{s.t.}\ & \sum_{r \in R,\, r \ni e}\ \sum_{f \in F} c_{\mathrm{cap}}\, f\, (m\, \xi_{r,f} + \rho_{r,f}) \ge \sum_{i \in S} P^i_e, && \forall e \in E, \notag\\
& \sum_{r \in R,\, r \ni e}\ \sum_{f \in F} f\, \xi_{r,f} \ge F^i_e, && \forall (i, e) \in S \times E, \notag\\
& \rho_{r,f} - (M - m)\, \xi_{r,f} \le 0, && \forall (r, f) \in R \times F, \notag\\
& \sum_{f \in F} \xi_{r,f} \le 1, && \forall r \in R, \notag\\
& \xi \in \{0,1\}^{|R \times F|}, \quad \rho \in \mathbb{Z}^{|R \times F|}_{\ge 0}. \notag
\end{align}

The model assumes that the $P_i$ passengers of each OD-pair $i$ travel on a unique shortest path $P^i$ (with respect to some distance in space or time) through the network, such that demands $P^i_e$ on transportation capacities on edges $e$ arise, and, likewise, demands $F^i_e$ on frequencies of edges. These demands can be covered by a set $R$ of possible routes (or lines) in $G$, which can be operated at a (finite) set of possible frequencies $F$, and with a minimal and maximal number of wagons $m$ and $M$ in each train. $c_{\mathrm{cap}}$ is the capacity of a wagon; $c^1_{r,f}$ and $c^2_{r,f}$, $(r, f) \in R \times F$, are cost coefficients for the operation of route $r$ at frequency $f$. The variable $\xi_{r,f}$ equals 1 if route $r$ is operated at frequency $f$, and 0 otherwise, while variable $\rho_{r,f}$ denotes the number of wagons in addition to $m$ on route $r$ with frequency $f$. The constraints guarantee sufficient capacity and frequency on each edge, link the two types of route variables, and ensure that each route is operated at a single frequency. It is shown in [3] that the problem is NP-hard, but it can be solved for the sizes that we consider.

Finally, we define the polyhedron $P$, which gives conditions on the prices $x$ that the players are asked to pay, as follows. Let $(u_{j-1}, u_j)$, $j = 1, \dots, l$, be OD-pairs such that $u_j$, $j = 0, \dots, l$, belong to the travel path $P^{st}$ associated with some OD-pair $(s, t)$, $u_0 = s$, and $u_l = t$, and let $(u, v)$ be an arbitrary OD-pair such that $u$ and $v$ also lie on the travel path $P^{st}$ from $s$ to $t$. We then stipulate that the prices $x_i / P_i$, which individual passengers of OD-pair $i$ have to pay, must satisfy the monotonicity properties
\[ 0 \le \frac{x_{uv}}{P_{uv}} \le \frac{x_{st}}{P_{st}} \le \sum_{j=1}^{l} \frac{x_{u_{j-1} u_j}}{P_{u_{j-1} u_j}}. \tag{2} \]
Moreover, we require that the prices should have the following property:
\[ \max_{st} \frac{x_{st}}{d_{st} P_{st}} \le K \min_{st} \frac{x_{st}}{d_{st} P_{st}}, \tag{3} \]

where $d_{st}$ is the distance of the route $(s, t)$. This inequality guarantees that the price difference per unit of length, say one kilometer, is bounded by a factor of $K$. The triple $\Gamma = (N, c, P)$ defines a cost allocation game to determine cost-covering prices for using the railway network $G$, in which coalitions $S$ consider the option to bail out of the common system and set up their own, private ones. Computing prices in the $f$-least core of $\Gamma$ requires solving the linear program
\begin{align}
\max_{(x,\varepsilon)}\ & \varepsilon \tag{4}\\
\text{s.t.}\ & x(S) + \varepsilon f(S) \le c(S), && \forall S \in 2^N \setminus \{\emptyset, N\}, \notag\\
& x \in X(\Gamma). \notag
\end{align}
This can be done using a constraint generation approach. We start with a (small) subset $\emptyset \ne \Sigma \subseteq 2^N \setminus \{\emptyset, N\}$ and solve
\begin{align}
\max_{(x,\varepsilon)}\ & \varepsilon \tag{5}\\
\text{s.t.}\ & x(S) + \varepsilon f(S) \le c(S), && \forall S \in \Sigma, \notag\\
& x \in X(\Gamma). \notag
\end{align}
Let $(x^*, \varepsilon^*)$ be an optimal solution of this LP. The separation problem for $(x^*, \varepsilon^*)$ is to find a coalition $T \in 2^N \setminus \{\emptyset, N\}$ such that $(x^*, \varepsilon^*)$ violates the constraint
\[ x^*(T) + \varepsilon^* f(T) \le c(T). \tag{6} \]

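This can be done by solving the optimization problem
\[ \max_{\emptyset \ne S \subsetneq N}\ x^*(S) + \varepsilon^* f(S) - c(S). \tag{7} \]
If the optimal value is non-positive, then $(x^*, \varepsilon^*)$ is a feasible and hence optimal solution of (4). Otherwise, each optimal solution of (7) provides a violated constraint. Recalling $f = \alpha + \beta|\cdot| + \gamma c$, we have
\[ x^*(S) + \varepsilon^* f(S) - c(S) = \alpha \varepsilon^* + \sum_{i \in N} (x^*_i + \beta \varepsilon^*)\, \chi^S_i + (\gamma \varepsilon^* - 1)\, c(S), \]
where
\[ \chi^S_i := \begin{cases} 1 & \text{if } i \in S,\\ 0 & \text{otherwise.} \end{cases} \]
On the other hand, there holds
\[ \gamma \varepsilon^* \le 1. \tag{8} \]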

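Trivially, inequality (8) holds for $\varepsilon^* < 0$. In the case $\varepsilon^* \ge 0$, since $\alpha, \beta \ge 0$ and $(x^*, \varepsilon^*)$ is a feasible solution of (5), one can easily verify that inequality (8) holds as well. Therefore, the optimization problem (7) can be reformulated as the integer program

\begin{align}
\max_{(\xi,\rho,z)}\ & \alpha \varepsilon^* + \sum_{i \in N} (x^*_i + \beta \varepsilon^*)\, z_i + (\gamma \varepsilon^* - 1) \sum_{(r,f) \in R \times F} \left( c^1_{r,f}\, \xi_{r,f} + c^2_{r,f}\, \rho_{r,f} \right) \tag{9}\\
\text{s.t.}\ & \sum_{r \in R,\, r \ni e}\ \sum_{f \in F} c_{\mathrm{cap}}\, f\, (m\, \xi_{r,f} + \rho_{r,f}) - \sum_{i \in N} P^i_e z_i \ge 0, && \forall e \in E, \notag\\
& \sum_{r \in R,\, r \ni e}\ \sum_{f \in F} f\, \xi_{r,f} - F^i_e z_i \ge 0, && \forall (i, e) \in N \times E, \notag\\
& \rho_{r,f} - (M - m)\, \xi_{r,f} \le 0, && \forall (r, f) \in R \times F, \notag\\
& \sum_{f \in F} \xi_{r,f} \le 1, && \forall r \in R, \notag\\
& \xi \in \{0,1\}^{|R \times F|}, \quad \rho \in \mathbb{Z}^{|R \times F|}_{\ge 0}, \quad z \in \{0,1\}^{|N|} \setminus \{\mathbf{0}, \mathbf{1}\}. \notag
\end{align}

The variables $z_i$, $i \in N$, correspond to a coalition $S \subseteq N$: $z_i$ equals 1 if the player $i$ belongs to $S$ and 0 otherwise. The other variables and constraints come from the integer program (1), which models the cost function. A violated constraint exists iff the optimal value is larger than 0. If it is non-positive, then $(x^*, \varepsilon^*)$ is a feasible solution of (4). Otherwise, we can find a feasible solution $(\bar{\xi}, \bar{\rho}, \bar{z})$ of (9) with a positive objective function value. Define $T := \{i \in N \mid \bar{z}_i = 1\}$; then $(x^*, \varepsilon^*)$ violates the constraint (6).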
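The separation step can be illustrated on a game that is small enough to enumerate. The sketch below is only a stand-in for the integer program (9): it assumes the coalition costs are already tabulated and finds the most violated constraint of type (6) by brute force over all proper coalitions, which is viable only for a handful of players. The game data, the candidate point, and the names `separate` and `proper_coalitions` are hypothetical; the weight is f(S) = |S|.

```python
from fractions import Fraction
from itertools import combinations


def proper_coalitions(players):
    """Yield every coalition S that is non-empty and different from N."""
    for size in range(1, len(players)):
        for members in combinations(players, size):
            yield frozenset(members)


def separate(x, eps, cost, weight, players):
    """Maximize x(S) + eps * f(S) - c(S) over proper coalitions by enumeration.

    Returns the optimal value and a maximizing coalition. A positive value
    means that the returned coalition yields a violated constraint.
    """
    best_value, best_coalition = None, None
    for S in proper_coalitions(players):
        value = sum(x[i] for i in S) + eps * weight(S) - cost[S]
        if best_value is None or value > best_value:
            best_value, best_coalition = value, S
    return best_value, best_coalition


# Hypothetical game: c({i}) = 6 and c({i, j}) = 9 for three players.
players = (1, 2, 3)
cost = {S: Fraction(6 if len(S) == 1 else 9) for S in proper_coalitions(players)}

# Candidate prices, as they might come out of a relaxed LP of type (5).
x = {1: Fraction(5), 2: Fraction(4), 3: Fraction(3)}

value_ok, _ = separate(x, Fraction(0), cost, len, players)
value_bad, violated = separate(x, Fraction(1, 2), cost, len, players)
print(value_ok, value_bad, sorted(violated))
```

With the excess level 0 the maximum is non-positive, so no constraint is violated; with the level 1/2 the coalition {1, 2} violates its constraint by 1 and would be added to the working set before the LP is solved again.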

5 Fair IC Ticket Prices

We now use our ansatz to compute ticket prices for the intercity network of the Netherlands, which is shown in Fig. 1. Our data is a simplified version of that published in [3]; namely, we consider all 23 cities, but reduce the number of OD-pairs to 85 by removing pairs with small demand. However, with $2^{85} - 1$ possible coalitions, the problem is still very large.

Fig. 1 The intercity network of the Netherlands

We start with a “pure fairness scenario” where the prices are only required to have the monotonicity property (2), i.e., we ignore property (3) for the moment. By solving LP (4), we determine a point $x^*$ in the $c$-least core (i.e., $f = c$) and define $c$-least core ticket prices (lc-prices) for each passenger of an OD-pair $i$ as $p^*_i := x^*_i / P_i$. Figure 2 compares these lc-prices $p^*$ with the distance-dependent prices $\bar{p}$ that have been used by the railway operator NS Reizigers for this network as reported in [1]. The picture on the left side plots the relative $c$-profits $\frac{c(S) - x(S)}{c(S)}$ with $x = x^*$ and $x = \bar{x} = \bar{p} \circ P$ ($\circ$ denotes the coordinate-wise product) of 8,000 coalitions, which have been computed in the course of our constraint generation algorithm. These $c$-profits are sorted in non-decreasing order. Note that the core of this particular game is empty, and some coalitions have to pay more than their cost. The maximum $c$-loss of any coalition with respect to the lc-prices is a mere 1.1%. This hardly noticeable unfairness is in contrast with the 25.67% maximum $c$-loss in the distance prices. In fact, there are 10 other coalitions with losses of more than 20%. Even worse, the coalition with the maximum loss is the main coalition of passengers traveling in the center of the country, i.e., in our model, a major coalition would earn a substantial benefit from shrinking the network.

Fig. 2 c-least core vs. distance prices for the Dutch IC network (1). Left: relative profit per coalition for lc-prices and distance prices. Right: distribution of the ratio lc-prices/distance prices over the number of passengers.

The picture on the right side of Fig. 2 plots the distribution of the ratio between the lc-prices and the distance prices. It can be seen that lc-prices are lower, equal, or slightly higher for most passengers. However, some passengers, mainly in the periphery of the country, pay much more to cover the costs that they produce. The increment factor is at most 3.78 except for two OD-pairs, which face very high price increases. The top of the list is the OD-pair Den Haag HS to Den Haag CS, which gets 14.4 times more expensive. The reason is that the travel path of this OD-pair consists of a single edge that is not used by any other travel route.

From a game theoretical point of view, these lc-prices can be seen as fair. It would, however, be very difficult to implement such prices in practice. We therefore add property (3) in order to limit price increases by a factor of $K$. Considering the results from the previous computation, we set $K = 3$. Figure 3 gives the same comparisons as Fig. 2 for these lc-prices. The maximum $c$-loss of any coalition in the new lc-prices is 1.68%, which is slightly worse than before. But the price increments are significantly smaller. Nobody has to pay more than 1.89 times the distance price. In this way, one can come up with price systems that constitute a good compromise between fairness and enforceability.

Fig. 3 c-least core vs. distance prices for the Dutch IC network (2). Left: relative profit per coalition for lc-prices and distance prices. Right: ratio lc-prices/distance prices over the number of passengers.

Acknowledgements We thank Sebastian Stiller for valuable comments and suggestions. The work of N.-D. H. is supported by a Konrad-Zuse Scholarship.


References
1. R. Borndörfer, M. Neumann, and M. E. Pfetsch, Optimal fares for public transport, Operations Research Proceedings 2005, (2006), pp. 591–596.
2. R. Borndörfer, M. Neumann, and M. E. Pfetsch, Models for fare planning in public transport, Tech. Rep. ZIB Report 08-16, Zuse-Institut Berlin, 2008.
3. M. R. Bussieck, Optimal Lines in Public Rail Transport, PhD thesis, TU Braunschweig, 1998.
4. S. Engevall, M. Göthe-Lundgren, and P. Värbrand, The traveling salesman game: An application of cost allocation in a gas and oil company, Annals of Operations Research, 82 (1998), pp. 453–471.
5. D. Gately, Sharing the gains from regional cooperation: A game theoretic application to planning investment in electric power, International Economic Review, 15 (1974), pp. 195–208.
6. Å. Hallefjord, R. Helming, and K. Jørnsten, Computing the nucleolus when the characteristic function is given implicitly: A constraint generation approach, International Journal of Game Theory, 24 (1995), pp. 357–372.
7. N. D. Hoang, Algorithmic Cost Allocation Game: Theory and Applications, PhD thesis, TU Berlin, 2010.
8. S. Littlechild and G. Thompson, Aircraft landing fees: A game theory approach, Bell Journal of Economics, 8 (1977).
9. M. Maschler, B. Peleg, and L. S. Shapley, Geometric properties of the kernel, nucleolus, and related solution concepts, Mathematics of Operations Research, 4 (1979), pp. 303–338.
10. P. Straffin and J. Heaney, Game theory and the Tennessee Valley Authority, International Journal of Game Theory, 10 (1981), pp. 35–43.
11. H. P. Young, Cost allocation, in R. J. Aumann and S. Hart, editors, Handbook of Game Theory, vol. 2, North-Holland, Amsterdam, 1994.
12. H. P. Young, N. Okada, and T. Hashimoto, Cost allocation in water resources development, Water Resources Research, 18 (1982), pp. 463–475.


A Domain Decomposition Method for Strongly Mixed Boundary Value Problems for the Poisson Equation

Dang Quang A and Vu Vinh Quang

Abstract Recently we proposed a domain decomposition method (DDM) for solving a Dirichlet problem for a second order elliptic equation, where differently from other DDMs, the value of the normal derivative on an interface is updated from iteration to iteration. In this paper we develop a method for solving strongly mixed boundary value problems (BVPs), where boundary conditions are of different type on different sides of a rectangle and the transmission of boundary conditions occurs not only in vertices but also in one or several inner points of a side of the rectangle. Such mixed problems often arise in mechanics and physics. Our method reduces these strongly mixed BVPs to sequences of weakly mixed problems for the Poisson equation in the sense that on each side of the rectangle there is given only one type of boundary condition, which are easily solved by a program package, constructed recently by Vu (see [13]). The detailed investigation of the convergence of the method for a model problem is carried out. After that the method is applied to a problem of semiconductors. The convergence of the method is proved and numerical experiments confirm the efficiency of the method. Keywords Domain decomposition method • Poisson equation • strongly mixed boundary conditions

D. Quang A, Institute of Information Technology, VAST, 18 Hoang Quoc Viet, Cau Giay, Hanoi, Vietnam, e-mail: [email protected]
V.V. Quang, Faculty of Information Technology, Thai Nguyen University, Hanoi, Vietnam, e-mail: [email protected]
H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_6, © Springer-Verlag Berlin Heidelberg 2012


1 Introduction Consider the problem 8 u D f; ˆ ˆ < u D '; ˆ @u ˆ : D ; @

x 2 ˝; x 2 @˝ n n ;

(1)

x 2 n :

where ˝ R2 with a Lipschitz continuous boundary @˝, f 2 L2 .˝/; ' 2 H 1=2 .@˝ n n /; 2 H 1=2 .n /; x D .x1 ; x2 / . Under the above assumptions the problem (1) has a solution in H 1 .˝/ (see [11]). We call the problem (1) with the point of transmission of Dirichlet and Neumann boundary conditions being an inner point of a smooth part of the boundary strongly mixed boundary value problem in distinguishing it from the weakly mixed one where the points of transmission of boundary conditions occur only in angle points of the boundary. This problem and other mixed boundary value problems often arise in mechanics and physics, and attract attention of many researchers. When the domain ˝ has a special shape such as rectangle, circle, half-plane or infinite strip for the case f D 0 and some special boundary conditions using series or integral transform methods many authors reduce strongly mixed boundary value problems to dual series or dual integral equations, and after that reduce the latter equations to the Fredholm equation for obtaining approximate solutions [6, 12]. An another approach to the solution of strongly mixed BVPs for the Laplace equation is the use of the expansion by fundamental solutions (see e.g. [1,5]). In 1989 Vabishchevich [8] proposed a method for reducing a strongly mixed BVP to a sequence of Dirichlet problems by an iterative method. In this method the value of the unknown function on the part of boundary, where its derivative is prescribed, is updated from iteration to iteration. As the author showed, this iterative method is not convergent on the continuous level, but it is convergent on the discrete level. Recently, in [2] the first author of the present paper proposed an alternative idea for treating the mixed BVP in a rectangle, where the Neumann condition is given on a part n of one side and the Dirichlet condition is given on the remaining part of boundary. 
This idea is based on iteratively updating the derivative of the unknown function on the part of that side where the Dirichlet condition is given. This iterative process reduces the strongly mixed problem to a sequence of weakly mixed BVPs, which are easily solved. It is proved that the method converges on both the continuous and the discrete level. Below we present a completely different approach to strongly mixed BVPs, which is based on a domain decomposition method (DDM) developed by ourselves in [3, 4]. Unlike other DDMs, where the value of the sought function on the interface is updated in each iteration, in our method the value of its normal derivative is updated. In the works mentioned, the advantage of our method in convergence rate over other DDMs is demonstrated in many numerical examples. The investigation of our DDM for the model problem (1) is presented in Sect. 2. In Sect. 3 we develop the method for a problem which is a generalization of a problem in the physics of semiconductors. The numerical examples performed show the efficiency of the proposed method.

2 A Domain Decomposition Method for a Strongly Mixed BVP

For the problem (1), divide the domain $\Omega$ into two subdomains $\Omega_1$, $\Omega_2$ by a curve $\Gamma$. Denote $\Gamma_1 = \partial\Omega_1 \setminus \Gamma$, $\Gamma_2 = \partial\Omega_2 \setminus (\Gamma_n \cup \Gamma)$ (see Fig. 1), and let $u_i$ $(i = 1, 2)$ be the restriction of the solution $u$ to the subdomain $\Omega_i$ and $\nu_i$ be the outward normal to the boundary of $\Omega_i$ $(i = 1, 2)$. Further, denote $g = \left.\dfrac{\partial u_1}{\partial \nu_1}\right|_{\Gamma}$. Our idea is to determine the boundary function $g$ together with the functions $u_i$ $(i = 1, 2)$ by an iterative process. This reduces the original problem to a sequence of weakly mixed BVPs, which are easily solved.

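2.1 Description of the Method

Step 1. Given a function $g^{(0)} \in L^2(\Gamma)$, for example, $g^{(0)} = 0$, $x \in \Gamma$.

Step 2. For the known $g^{(k)}$ on $\Gamma$, consecutively solve the two problems

$$
\begin{cases}
-\Delta u_1^{(k)} = f, & x \in \Omega_1,\\[2pt]
u_1^{(k)} = \varphi, & x \in \Gamma_1,\\[2pt]
\dfrac{\partial u_1^{(k)}}{\partial \nu_1} = g^{(k)}, & x \in \Gamma,
\end{cases}
\qquad
\begin{cases}
-\Delta u_2^{(k)} = f, & x \in \Omega_2,\\[2pt]
u_2^{(k)} = \varphi, & x \in \Gamma_2,\\[2pt]
u_2^{(k)} = u_1^{(k)}, & x \in \Gamma,\\[2pt]
\dfrac{\partial u_2^{(k)}}{\partial \nu_2} = \psi, & x \in \Gamma_n.
\end{cases}
\tag{2}
$$

Step 3. Update the value of $g^{(k+1)}$:

$$
g^{(k+1)} = (1 - \tau)\, g^{(k)} - \tau\, \frac{\partial u_2^{(k)}}{\partial \nu_2}, \quad x \in \Gamma,
\tag{3}
$$

where $\tau$ is a parameter to be chosen to guarantee the convergence of the method.

Fig. 1 Domain $\Omega$ and its subdomains $\Omega_1$, $\Omega_2$ (labels shown in the figure: $\Gamma_n$, $\Gamma$, $\Gamma_1$, $\Gamma_2$, $\Omega_1$, $\Omega_2$)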
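The three steps can be tried out on a one-dimensional analogue, where both subproblems have closed-form solutions. The sketch below is only an illustration under stated assumptions, not the authors' two-dimensional computation: the model problem $-u'' = c_0$ on $(0, 1)$ with $u(0) = \varphi$, $u'(1) = \psi$, the interface point `c`, and all function names are hypothetical choices made here. In one dimension the flux from the second subdomain does not depend on the interface value, so the error in $g$ is multiplied by exactly $1 - \tau$ in every iteration.

```python
"""1D analogue of the interface-flux iteration (illustrative sketch only).

Assumed model problem: -u'' = c0 on (0, 1), u(0) = phi, u'(1) = psi,
with the interface placed at the point x = c.
"""


def solve_omega1(g, phi, c0, c):
    """Return u1(c) for -u1'' = c0 on (0, c), u1(0) = phi, u1'(c) = g."""
    # Closed form: u1(x) = phi + (g + c0*c)*x - c0*x**2/2
    return phi + g * c + 0.5 * c0 * c * c


def outward_flux_omega2(u_interface, psi, c0, c):
    """Return du2/dnu2 at x = c for -u2'' = c0 on (c, 1),
    u2(c) = u_interface, u2'(1) = psi.

    u2'(x) = psi + c0*(1 - x). The outward normal of (c, 1) at x = c points
    in the -x direction, hence the sign flip. In 1D the flux is independent
    of u_interface; the argument is kept only to mirror Step 2.
    """
    return -(psi + c0 * (1.0 - c))


def iterate(tau, phi=1.0, psi=2.0, c0=3.0, c=0.4, n_iter=30):
    """Run Steps 1-3 and return the final g and the history of iterates."""
    g = 0.0  # Step 1: initial guess g^(0) = 0
    history = []
    for _ in range(n_iter):
        u_c = solve_omega1(g, phi, c0, c)            # Step 2, problem in Omega_1
        flux = outward_flux_omega2(u_c, psi, c0, c)  # Step 2, problem in Omega_2
        g = (1.0 - tau) * g - tau * flux             # Step 3, update (3)
        history.append(g)
    return g, history


if __name__ == "__main__":
    g_exact = 2.0 + 3.0 * (1.0 - 0.4)  # u'(c) of the exact solution
    g_final, _ = iterate(tau=0.5)
    print("error in g after 30 iterations:", abs(g_final - g_exact))
```

In a genuinely two-dimensional setting the second solve does depend on the trace of $u_1^{(k)}$ on $\Gamma$, which is why the admissible range of $\tau$ is then governed by the operator studied in Sect. 2.2 rather than by the simple factor $1 - \tau$.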


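2.2 Convergence of the Method

Rewrite (3) in the form

$$
\frac{g^{(k+1)} - g^{(k)}}{\tau} + g^{(k)} + \frac{\partial u_2^{(k)}}{\partial \nu_2} = 0, \quad (k = 0, 1, 2, \dots).
$$

Introduce the notations $e_i^{(k)} = u_i^{(k)} - u_i$ $(i = 1, 2)$, $\xi^{(k)} = g^{(k)} - g$. Then the $e_i^{(k)}$ satisfy the problems

$$
\begin{cases}
\Delta e_1^{(k)} = 0, & x \in \Omega_1,\\[2pt]
e_1^{(k)} = 0, & x \in \Gamma_1,\\[2pt]
\dfrac{\partial e_1^{(k)}}{\partial \nu_1} = \xi^{(k)}, & x \in \Gamma,
\end{cases}
\qquad
\begin{cases}
\Delta e_2^{(k)} = 0, & x \in \Omega_2,\\[2pt]
e_2^{(k)} = 0, & x \in \Gamma_2,\\[2pt]
e_2^{(k)} = e_1^{(k)}, & x \in \Gamma,\\[2pt]
\dfrac{\partial e_2^{(k)}}{\partial \nu_2} = 0, & x \in \Gamma_n,
\end{cases}
$$

and the $\xi^{(k)}$ satisfy the relation

$$
\frac{\xi^{(k+1)} - \xi^{(k)}}{\tau} + \xi^{(k)} + \frac{\partial e_2^{(k)}}{\partial \nu_2} = 0, \quad (k = 0, 1, 2, \dots).
\tag{4}
$$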
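The step from the update (3) to the error relation (4) can be written out explicitly. The short derivation below is a sketch: it writes $\tau$ for the iterative parameter and $\xi^{(k)} = g^{(k)} - g$ for the error of the interface flux, and it assumes that the normal derivatives of the exact solution taken from the two sides of $\Gamma$ agree up to the orientation of the normals.

```latex
% On \Gamma the outward normals satisfy \nu_1 = -\nu_2, so for the exact solution
g = \frac{\partial u_1}{\partial \nu_1} = -\frac{\partial u_2}{\partial \nu_2}
\quad\Longrightarrow\quad
g + \frac{\partial u_2}{\partial \nu_2} = 0 .

% The update (3) in difference form reads
\frac{g^{(k+1)} - g^{(k)}}{\tau} + g^{(k)} + \frac{\partial u_2^{(k)}}{\partial \nu_2} = 0 .

% Subtracting the first identity from the second, with
% e_2^{(k)} = u_2^{(k)} - u_2 and \xi^{(k)} = g^{(k)} - g, gives
\frac{\xi^{(k+1)} - \xi^{(k)}}{\tau} + \xi^{(k)} + \frac{\partial e_2^{(k)}}{\partial \nu_2} = 0 .
```

The data $f$, $\varphi$, $\psi$ cancel in the subtraction, which is why the error problems are homogeneous except for the interface conditions.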

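Now we define the Steklov-Poincaré operators $S_1$, $S_2$ as follows:

$$
S_1 \xi = \frac{\partial v_1}{\partial \nu_1}, \qquad S_2 \xi = \frac{\partial v_2}{\partial \nu_2}, \quad x \in \Gamma,
$$

where $v_1$ and $v_2$ are the solutions of the problems

$$
\begin{cases}
\Delta v_1 = 0, & x \in \Omega_1,\\
v_1 = 0, & x \in \Gamma_1,\\
v_1 = \xi, & x \in \Gamma,
\end{cases}
\qquad
\begin{cases}
\Delta v_2 = 0, & x \in \Omega_2,\\
v_2 = 0, & x \in \Gamma_2,\\
v_2 = \xi, & x \in \Gamma,\\[2pt]
\dfrac{\partial v_2}{\partial \nu_2} = 0, & x \in \Gamma_n.
\end{cases}
$$

Then the inverse operator $S_1^{-1}$ is defined by $S_1^{-1} \xi = w_1|_{\Gamma}$, where $w_1$ is the solution of the problem

$$
\begin{cases}
\Delta w_1 = 0, & x \in \Omega_1,\\
w_1 = 0, & x \in \Gamma_1,\\[2pt]
\dfrac{\partial w_1}{\partial \nu_1} = \xi, & x \in \Gamma.
\end{cases}
$$

Therefore, we have

$$
S_1^{-1} \xi^{(k)} = e_1^{(k)}, \qquad S_2 e_1^{(k)} = \frac{\partial e_2^{(k)}}{\partial \nu_2}, \quad x \in \Gamma.
$$

Using the operators defined above we can rewrite the relation (4) in the form

$$
\frac{\xi^{(k+1)} - \xi^{(k)}}{\tau} + (I + S_2 S_1^{-1})\, \xi^{(k)} = 0, \quad (k = 0, 1, 2, \dots).
$$

Applying the operator $S_1^{-1}$ to both sides of the above equality we obtain the two-layer iterative scheme

$$
\frac{e_1^{(k+1)} - e_1^{(k)}}{\tau} + B e_1^{(k)} = 0, \quad x \in \Gamma \quad (k = 0, 1, 2, \dots),
\tag{5}
$$

where $B = I + S_1^{-1} S_2$. From this scheme we have

$$
e_1^{(k+1)} = (I - \tau B)\, e_1^{(k)}, \quad x \in \Gamma.
$$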
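The passage from the relation for $\xi^{(k)}$ to the scheme (5) rests on a single operator identity. The lines below are a sketch of that step, using only the definitions $S_1^{-1}\xi^{(k)} = e_1^{(k)}$ on $\Gamma$ and $B = I + S_1^{-1} S_2$ given above, with $\tau$ denoting the iterative parameter.

```latex
% Apply S_1^{-1} to (I + S_2 S_1^{-1}) \xi^{(k)} and regroup:
S_1^{-1}\bigl(I + S_2 S_1^{-1}\bigr)\xi^{(k)}
  = S_1^{-1}\xi^{(k)} + S_1^{-1} S_2 \bigl(S_1^{-1}\xi^{(k)}\bigr)
  = \bigl(I + S_1^{-1} S_2\bigr) e_1^{(k)}
  = B\, e_1^{(k)} .

% Since S_1^{-1} is linear, the difference quotient transforms in the same way:
S_1^{-1}\,\frac{\xi^{(k+1)} - \xi^{(k)}}{\tau}
  = \frac{e_1^{(k+1)} - e_1^{(k)}}{\tau} .

% Solving (5) for the new iterate gives the error propagation
e_1^{(k+1)} = (I - \tau B)\, e_1^{(k)}
  \quad\Longrightarrow\quad
e_1^{(k)} = (I - \tau B)^{k}\, e_1^{(0)} .
```

Convergence therefore reduces to showing that the powers of $I - \tau B$ tend to zero for a suitable $\tau$, which is what the properties of $B$ established next are used for.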

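The convergence of the iterative scheme depends on the properties of the operator $B$. For investigating the operator $B$ we introduce the space $\Lambda = H_{00}^{1/2}(\Gamma) = \{ v|_{\Gamma} : v \in H_0^1(\Omega) \}$ and its dual space $\Lambda' = H_{00}^{-1/2}(\Gamma)$. Then in weak formulation the operator $S_1$ can be defined as

$$
\langle S_1 \xi, \eta \rangle_{\Lambda', \Lambda} = (\nabla H_1 \xi, \nabla H_1 \eta)_{L^2(\Omega_1)}, \quad \forall\, \xi, \eta \in \Lambda,
$$

where $H_1 \xi$ is the harmonic extension of $\xi$ from $\Gamma$ to $\Omega_1$. In [3] we proved that $S_1$ is a symmetric, positive definite and bounded operator in the space $\Lambda$.

Now consider $S_2$. Let $\eta \in H_{00}^{1/2}(\Gamma)$ and denote by $w = \hat{H}_2 \eta$ the harmonic extension of $\eta$ to $\Omega_2$, i.e., $w$ is the solution of the problem

$$
\begin{cases}
\Delta w = 0, & x \in \Omega_2,\\
w = 0, & x \in \Gamma_2,\\
w = \eta, & x \in \Gamma,\\[2pt]
\dfrac{\partial w}{\partial \nu_2} = 0, & x \in \Gamma_n.
\end{cases}
$$

Analogously, denote by $v = \hat{H}_2 \xi$ the harmonic extension of $\xi$ to $\Omega_2$. Then

$$
\begin{aligned}
0 = -\int_{\Omega_2} \Delta v\, w \, dx
&= -\int_{\partial\Omega_2} \frac{\partial v}{\partial \nu_2}\, w \, ds + \int_{\Omega_2} \nabla v \cdot \nabla w \, dx \\
&= -\int_{\Gamma} S_2 \xi \, \eta \, ds + \int_{\Omega_2} \nabla \hat{H}_2 \xi \cdot \nabla \hat{H}_2 \eta \, dx .
\end{aligned}
$$

From here it follows that

$$
\int_{\Gamma} S_2 \xi \, \eta \, ds = \int_{\Omega_2} \nabla \hat{H}_2 \xi \cdot \nabla \hat{H}_2 \eta \, dx ,
$$

which means that $S_2$ is a symmetric operator in the space $\Lambda$. Next, using the Poincaré-Friedrichs inequality and the trace theorem we obtain

$$
\langle S_2 \xi, \xi \rangle_{\Lambda', \Lambda}
= (\nabla \hat{H}_2 \xi, \nabla \hat{H}_2 \xi)_{L^2(\Omega_2)}
= (\nabla v, \nabla v)_{L^2(\Omega_2)}
\ge C_1^2 \| v \|_{H^1(\Omega_2)}^2
\ge C_2^2 \| \xi \|_{H^{1/2}(\Gamma)}^2 .
\tag{6}
$$

On the other hand, we have the following estimate for the solution of the problem for $v$:

$$
\| v \|_{H^1(\Omega_2)} \le C \, \| v \|_{H^{1/2}(\Gamma)} .
\tag{7}
$$

Besides, from the definition of the norm $\| v \|_{H^1(\Omega_2)}^2 = \| v \|_{L^2(\Omega_2)}^2 + \| \nabla v \|_{L^2(\Omega_2)}^2$ it follows that

$$
\| \nabla v \|_{L^2(\Omega_2)}^2 \le \| v \|_{H^1(\Omega_2)}^2 .
\tag{8}
$$

Thus, from (6), (7) and (8), and since $v = \xi$ on $\Gamma$, we obtain

$$
C_2^2 \| \xi \|_{H^{1/2}(\Gamma)}^2 \le \langle S_2 \xi, \xi \rangle_{\Lambda', \Lambda} \le C^2 \| \xi \|_{H^{1/2}(\Gamma)}^2 .
$$

It means that $S_2$ is a positive definite and bounded operator in $\Lambda$. In the energetic product of $S_1$ we have

$$
\langle B \xi, \xi \rangle_{S_1}
= \langle S_1 (I + S_1^{-1} S_2) \xi, \xi \rangle_{\Lambda', \Lambda}
= \langle S_1 \xi, \xi \rangle_{\Lambda', \Lambda} + \langle S_2 \xi, \xi \rangle_{\Lambda', \Lambda} .
$$

Since, as shown above, the operators $S_1$, $S_2$ are symmetric, positive definite and bounded, the operator $B$ is also a symmetric, positive definite and bounded operator in $\Lambda$. According to the general theory of two-layer iterative schemes [7] we conclude that, with the parameter $\tau$ chosen appropriately, the iterative scheme (5) converges. The results of computational experiments in [4] confirm this conclusion.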
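What "chosen appropriately" means can be illustrated on a finite-dimensional stand-in. The sketch below assumes a small symmetric positive definite matrix in place of the operator $B$ (the matrix, its size and the function names are hypothetical and are not taken from the paper). For such a matrix with extreme eigenvalues $\lambda_{\min}$ and $\lambda_{\max}$, the iteration $e^{(k+1)} = (I - \tau B) e^{(k)}$ converges when $0 < \tau < 2/\lambda_{\max}$, and $\tau = 2/(\lambda_{\min} + \lambda_{\max})$ minimizes the contraction factor, which then equals $(\lambda_{\max} - \lambda_{\min})/(\lambda_{\max} + \lambda_{\min})$.

```python
import numpy as np


def two_layer_iteration(B, e0, tau, n_iter):
    """Iterate e^(k+1) = (I - tau*B) e^(k); return the Euclidean error norms."""
    e = np.array(e0, dtype=float)
    norms = [np.linalg.norm(e)]
    for _ in range(n_iter):
        e = e - tau * (B @ e)
        norms.append(np.linalg.norm(e))
    return norms


def optimal_tau(B):
    """Return (tau_opt, lambda_min, lambda_max) for a symmetric positive definite B."""
    lam = np.linalg.eigvalsh(B)  # eigenvalues in ascending order
    return 2.0 / (lam[0] + lam[-1]), lam[0], lam[-1]


def model_operator(n=5):
    """Hypothetical stand-in for B = I + S1^{-1} S2: identity plus a 1D Laplacian stencil."""
    T = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    return np.eye(n) + T


if __name__ == "__main__":
    B = model_operator()
    tau_opt, lam_min, lam_max = optimal_tau(B)
    norms = two_layer_iteration(B, np.ones(5), tau_opt, 20)
    print("tau_opt =", tau_opt)
    print("error reduction after 20 steps:", norms[-1] / norms[0])
```

In the paper the symmetry of $B$ holds with respect to the energetic product of $S_1$ rather than the Euclidean one, so the matrix example only mirrors the structure of the argument, not the actual spectrum of $B$.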


3 A Parallel Algorithm for Solving a Problem of Semiconductors

At present, hydrodynamical models are usually used for describing physical processes in semiconductor devices. One of them is described in [10] (see also [9]). This model contains the Poisson equation for the electric potential with mixed boundary conditions. In this section we extend the idea of the DDM presented in Sect. 2 to a more general problem in the domain $\Omega = (0, 6a) \times (0, b)$, which is the model in [9]:

$$
\begin{cases}
-\Delta u = f, & (x_1, x_2) \in \Omega,\\[4pt]
\dfrac{\partial u}{\partial x_2} = \beta, & x_2 = 0;\ 0 < x_1 < 6a,\\[6pt]
\dfrac{\partial u}{\partial x_1} = \alpha, & x_1 = 0,\ 6a;\ 0 < x_2 < b,\\[6pt]
u = g, & x_2 = b;\ 0 < x_1 < a,\ 2a < x_1 < 4a,\ 5a < x_1 < 6a,\\[4pt]
\dfrac{\partial u}{\partial x_2} = \beta, & x_2 = b;\ a < x_1 < 2a,\ 4a < x_1 < 5a,
\end{cases}
\tag{9}
$$

where $\alpha$, $\beta$, $g$ are given functions.

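3.1 Description of the Algorithm

We divide the domain $\Omega$ into five subdomains $\Omega_i$ $(i = 1, \dots, 5)$ by the straight-line segments $\Gamma_1 = \{x_1 = a;\ 0 < x_2 < b\}$, $\Gamma_2 = \{x_1 = 2a;\ 0 < x_2 < b\}$, $\Gamma_3 = \{x_1 = 4a;\ 0 < x_2 < b\}$, $\Gamma_4 = \{x_1 = 5a;\ 0 < x_2 < b\}$. We denote the left side of $\Omega_1$ by $\Gamma_0$, the right side of $\Omega_5$ by $\Gamma_5$, and the top and bottom sides of $\Omega_i$ by $T_i$ and $B_i$ $(i = 1, \dots, 5)$, respectively (see Fig. 2).

Fig. 2 Domain $\Omega$ and its subdomains with boundaries (the subdomains $\Omega_1, \dots, \Omega_5$ lie from left to right along the $x_1$-axis and are separated by $\Gamma_1, \dots, \Gamma_4$ at $x_1 = a, 2a, 4a, 5a$; $\Gamma_0$ and $\Gamma_5$ are the left and right sides at $x_1 = 0$ and $x_1 = 6a$; $T_i$ and $B_i$ are the top and bottom sides at $x_2 = b$ and $x_2 = 0$)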


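Also, we introduce the notations

$$
u_i = u|_{\Omega_i} \ (i = 1, \dots, 5), \qquad \xi = (\xi_1, \xi_2, \xi_3, \xi_4)',
$$

$$
\xi_1 = \left.\frac{\partial u_1}{\partial x_1}\right|_{\Gamma_1}, \quad
\xi_2 = \left.\frac{\partial u_3}{\partial x_1}\right|_{\Gamma_2}, \quad
\xi_3 = \left.\frac{\partial u_3}{\partial x_1}\right|_{\Gamma_3}, \quad
\xi_4 = \left.\frac{\partial u_5}{\partial x_1}\right|_{\Gamma_4}.
$$

Step 1. Given $\xi^{(0)} = 0$.

Step 2. For $k = 0, 1, 2, \dots$ solve in parallel the problems in the subdomains $\Omega_1$, $\Omega_3$, $\Omega_5$:

$$
\begin{cases}
-\Delta u_1^{(k)} = f, & x \in \Omega_1,\\[4pt]
\dfrac{\partial u_1^{(k)}}{\partial x_1} = \alpha, & x \in \Gamma_0,\\[6pt]
\dfrac{\partial u_1^{(k)}}{\partial x_1} = \xi_1^{(k)}, & x \in \Gamma_1,\\[6pt]
\dfrac{\partial u_1^{(k)}}{\partial x_2} = \beta, & x \in B_1,\\[6pt]
u_1^{(k)} = g, & x \in T_1,
\end{cases}
\qquad
\begin{cases}
-\Delta u_3^{(k)} = f, & x \in \Omega_3,\\[4pt]
\dfrac{\partial u_3^{(k)}}{\partial x_1} = \xi_2^{(k)}, & x \in \Gamma_2,\\[6pt]
\dfrac{\partial u_3^{(k)}}{\partial x_1} = \xi_3^{(k)}, & x \in \Gamma_3,\\[6pt]
\dfrac{\partial u_3^{(k)}}{\partial x_2} = \beta, & x \in B_3,\\[6pt]
u_3^{(k)} = g, & x \in T_3,
\end{cases}
$$

$$
\begin{cases}
-\Delta u_5^{(k)} = f, & x \in \Omega_5,\\[4pt]
\dfrac{\partial u_5^{(k)}}{\partial x_1} = \xi_4^{(k)}, & x \in \Gamma_4,\\[6pt]
\dfrac{\partial u_5^{(k)}}{\partial x_1} = \alpha, & x \in \Gamma_5,\\[6pt]
\dfrac{\partial u_5^{(k)}}{\partial x_2} = \beta, & x \in B_5,\\[6pt]
u_5^{(k)} = g, & x \in T_5.
\end{cases}
$$

Step 3. Solve in parallel the problems in $\Omega_2$, $\Omega_4$:

$$
\begin{cases}
-\Delta u_2^{(k)} = f, & x \in \Omega_2,\\[2pt]
u_2^{(k)} = u_1^{(k)}, & x \in \Gamma_1,\\[2pt]
u_2^{(k)} = u_3^{(k)}, & x \in \Gamma_2,\\[4pt]
\dfrac{\partial u_2^{(k)}}{\partial x_2} = \beta, & x \in B_2 \cup T_2,
\end{cases}
\qquad
\begin{cases}
-\Delta u_4^{(k)} = f, & x \in \Omega_4,\\[2pt]
u_4^{(k)} = u_3^{(k)}, & x \in \Gamma_3,\\[2pt]
u_4^{(k)} = u_5^{(k)}, & x \in \Gamma_4,\\[4pt]
\dfrac{\partial u_4^{(k)}}{\partial x_2} = \beta, & x \in B_4 \cup T_4.
\end{cases}
$$

Step 4. Update

$$
\xi^{(k+1)} = (1 - \tau)\, \xi^{(k)} - \tau\, \varphi^{(k)},
\tag{10}
$$

where $\tau$ is a parameter to be chosen and

$$
\varphi = -\left( \left.\frac{\partial u_2}{\partial x_1}\right|_{\Gamma_1},\ \left.\frac{\partial u_2}{\partial x_1}\right|_{\Gamma_2},\ \left.\frac{\partial u_4}{\partial x_1}\right|_{\Gamma_3},\ \left.\frac{\partial u_4}{\partial x_1}\right|_{\Gamma_4} \right)' .
$$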

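3.2 Convergence of the Iterative Process

In order to study the convergence of the proposed iterative process we introduce an operator $B$ defined in the space $L^2(\Gamma_1 \cup \Gamma_2 \cup \Gamma_3 \cup \Gamma_4)$ by the formula $B \xi = \varphi$, where $\varphi$ is computed as in Step 4 from the solutions $u_i$ $(i = 1, 2, 3, 4, 5)$ of the problems

$$
\begin{cases}
\Delta u_1 = 0, & x \in \Omega_1,\\[4pt]
\dfrac{\partial u_1}{\partial x_1} = 0, & x \in \Gamma_0,\\[6pt]
\dfrac{\partial u_1}{\partial x_1} = \xi_1, & x \in \Gamma_1,\\[6pt]
\dfrac{\partial u_1}{\partial x_2} = 0, & x \in B_1,\\[6pt]
u_1 = 0, & x \in T_1,
\end{cases}
\qquad
\begin{cases}
\Delta u_3 = 0, & x \in \Omega_3,\\[4pt]
\dfrac{\partial u_3}{\partial x_1} = \xi_2, & x \in \Gamma_2,\\[6pt]
\dfrac{\partial u_3}{\partial x_1} = \xi_3, & x \in \Gamma_3,\\[6pt]
\dfrac{\partial u_3}{\partial x_2} = 0, & x \in B_3,\\[6pt]
u_3 = 0, & x \in T_3,
\end{cases}
$$

$$
\begin{cases}
\Delta u_5 = 0, & x \in \Omega_5,\\[4pt]
\dfrac{\partial u_5}{\partial x_1} = \xi_4, & x \in \Gamma_4,\\[6pt]
\dfrac{\partial u_5}{\partial x_1} = 0, & x \in \Gamma_5,\\[6pt]
\dfrac{\partial u_5}{\partial x_2} = 0, & x \in B_5,\\[6pt]
u_5 = 0, & x \in T_5,
\end{cases}
$$

$$
\begin{cases}
\Delta u_2 = 0, & x \in \Omega_2,\\
u_2 = u_1, & x \in \Gamma_1,\\
u_2 = u_3, & x \in \Gamma_2,\\[4pt]
\dfrac{\partial u_2}{\partial x_2} = 0, & x \in B_2 \cup T_2,
\end{cases}
\qquad
\begin{cases}
\Delta u_4 = 0, & x \in \Omega_4,\\
u_4 = u_3, & x \in \Gamma_3,\\
u_4 = u_5, & x \in \Gamma_4,\\[4pt]
\dfrac{\partial u_4}{\partial x_2} = 0, & x \in B_4 \cup T_4.
\end{cases}
$$

Then the formula (10) can be written in the form of the iterative scheme

$$
\frac{\xi^{(k+1)} - \xi^{(k)}}{\tau} + (I + B)\, \xi^{(k)} = F,
\tag{11}
$$

where $F$ is a function determined by the data functions of the problem (9).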
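The passage from the update (10) to the scheme (11) can be spelled out. The sketch below reads (10) as $\xi^{(k+1)} = (1 - \tau)\xi^{(k)} - \tau\varphi^{(k)}$ and assumes that the solutions of the subproblems depend affinely on the interface data, which follows from the linearity of the subproblems. The symbol $\varphi_0$, the value of $\varphi$ obtained with $\xi = 0$ and the original data $f$, $\alpha$, $\beta$, $g$, is introduced here only for illustration.

```latex
% Affine dependence of \varphi^{(k)} on \xi^{(k)}: the linear part is B,
% the constant part \varphi_0 collects the contribution of the data.
\varphi^{(k)} = B\,\xi^{(k)} + \varphi_0 .

% Insert this into the update (10):
\xi^{(k+1)} = (1 - \tau)\,\xi^{(k)} - \tau\,\varphi^{(k)}
            = \xi^{(k)} - \tau\,(I + B)\,\xi^{(k)} - \tau\,\varphi_0 .

% Rearranging gives the two-layer form (11) with F = -\varphi_0:
\frac{\xi^{(k+1)} - \xi^{(k)}}{\tau} + (I + B)\,\xi^{(k)} = -\varphi_0 =: F .
```

The error $\xi^{(k)} - \xi$ then satisfies the homogeneous version of (11), so the convergence question is again one about the spectrum of $I + B$.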

Table 1 Convergence of the iterative scheme in Example 1

          u1                u2                u3                u4
τ      K    Error        K    Error        K    Error        K    Error
0.1    40   5·10^-4      40   0.05         40   0.05         40   0.0047
0.2    27   9·10^-5      40   6·10^-4      30   0.002        34   9·10^-5
0.3    17   9·10^-5      22   6·10^-4      18   0.001        22   6·10^-5
0.4    12   9·10^-5      16   6·10^-4      14   0.002        15   8·10^-5
0.5    9    8·10^-5      12   6·10^-4      10   0.002        12   4·10^-5
0.6    16   8·10^-5      30   8·10^-4      17   0.002        20   7·10^-5
0.7    40   2·10^-4      40   0.05         40   0.002        40   0.0025

As in the case of two subdomains in Sect. 2, we can prove that $B$ is a symmetric, positive definite and bounded operator in an appropriate space. It follows that $I + B$ is a symmetric, positive definite and bounded operator in that space. Therefore, according to the general theory of two-layer iterative schemes [7], the iterative scheme (11) converges at the rate of a geometric progression for some range of the parameter $\tau$.

3.3 Numerical Experiments

We perform some numerical experiments to test the convergence of the proposed iterative method. Below we report the results of the experiments in which the exact solution $u(x_1, x_2)$ is known. The computational domain $\Omega = (0, 6a) \times (0, b)$ with $a = \pi/6$ and $b = \pi/3$ is covered by a uniform grid with grid size $h = 1/64$. The stopping criterion is $\| u^{(k)} - u^{(k-1)} \|_\infty < 0.001$. The results of the experiments for four different exact solutions, $u_1 = \sin x_1 x_2$; $u_2 = x_1^3 + x_2 e^{x_1} + x_2^3 + x_1 e^{x_2}$; $u_3 = e^{x_2} \log(x_1 + 5) \sin x_2 \log(x_1 + 6)$; $u_4 = x_1^2 + x_2^2$, are given in Table 1, where $K$ is the number of iterations and error $= \| u^{(k)} - u \|_\infty$. From the results of the experiments we see that the value $\tau = 0.5$ of the iterative parameter appears to be optimal: with this value the stopping criterion is met after 9 to 12 iterations for all four test functions.

4 Conclusion

In this paper we propose a domain decomposition method for solving strongly mixed BVPs, in which the type of boundary condition changes at one or many points on a smooth part of the boundary. This domain decomposition method is based on updating the normal derivative of the solution on the interface between subdomains. For the case of many transition points, as in a problem in the physics of semiconductors, a parallel algorithm is considered. The computational experiments performed for some examples indicate an optimal value of the iterative parameter, which gives the fastest convergence of the iterative process. The proposed method can be applied to other strongly mixed BVPs for the Poisson equation and for elliptic equations in general.

Acknowledgements The authors kindly acknowledge financial support from the Vietnam National Foundation for Science and Technology Development (NAFOSTED), project 102.99-2011.24, and would like to thank the referees for their helpful suggestions.

References

1. Arad M., Yosibash Z., Ben-Dor G., Yakhot A., Computing flux intensity factors by a boundary method for elliptic equations with singularities, Communications in Numerical Methods in Engineering, 14 (1998) 657–670.
2. Dang Q. A, Iterative method for solving strongly mixed boundary value problem, Proceedings of the First National Workshop on Fundamental and Applied Researches in Information Technology, Publ. House "Science and Technology", Ha Noi, 2004.
3. Dang Q. A and Vu V.Q., A domain decomposition method for solving an elliptic boundary value problem, In: L. H. Son, W. Tutschke, S. Jain (eds.), Methods of Complex and Clifford Analysis (Proceedings of ICAM Hanoi 2004), SAS International Publications, Delhi, 2006, 309–319.
4. Dang Q. A, Vu V.Q., Experimental study of a domain decomposition method for mixed boundary value problems in domains of complex geometry, J. of Comp. Sci. and Cyber., 21(3) (2005) 216–229.
5. Georgiou G.C., Olson L., Smyrlis Y.S., A singular function boundary integral method for the Laplace equation, Communications in Numerical Methods in Engineering 12(2) (1996), 127–134.
6. Mandal N., Advances in dual integral equations, Chapman & Hall, 1999.
7. Samarskii A.A., The Theory of Difference Schemes, New York, Marcel Dekker, 2001.
8. Vabishchevich P.N., Iterative reduction of mixed boundary value problem to the Dirichlet problem, Differential Equations 25(7) (1989) 1177–1183 (Russian).
9. Blokhin A.M., Ibragimova A.S., Krasnikov N.Y., On a variant of the method of lines for the Poisson equation, Comput. Technologies, 12(2) (2007), 33–42 (Russian).
10. Romano V., 2D simulation of a silicon MESFET with a non-parabolic hydrodynamical model based on the maximum entropy principle, J. Comput. Phys. 176 (2002) 70–92.
11. Savare G., Regularity and perturbation results for mixed second order elliptic problems, Commun. in Partial Differential Equations, 22 (5 and 6), 1997, 869–899.
12. Sneddon I., Mixed boundary value problems in potential theory. North-Holland Publ., Amsterdam, 1966.
13. Vu V. Q., Results of application of algorithm for reducing computational amount in the solution of mixed elliptic boundary value problems, Proceedings of the National symposium "Development of tools of informatics for the help of teaching, researching and applying mathematics", Hanoi, 2005, 247–256 (Vietnamese).


Detecting, Monitoring and Preventing Database Security Breaches in a Housing-Based Outsourcing Model

Tran Khanh Dang, Tran Thi Que Nguyet, and Truong Quynh Chi

Abstract In a housing-based outsourcing model, the database server is the client's property and the outsourcing service provider only provides physical security of machines and data, and monitors (and if necessary restores) the operating condition of the server. Soft security-related aspects (e.g., DBMS security breaches) are the client's responsibility. This is a non-trivial task for most clients. In this paper, we propose an extensible architecture for detecting, monitoring and preventing database security breaches in a housing-based outsourcing model. The architecture can help in dealing with both outsider and insider threats. It is well suited for the detection of both predefined and potential security breaches. Our solution to database security breach detection is based on the well-known pentesting- and version checking-based techniques from network and operating systems security. The architecture features visual monitoring and secure auditing of all database user activities in real time. Moreover, it also supports automatic prevention techniques if security risks are established w.r.t. the security breaches found.

T.K. Dang · T.T.Q. Nguyet · T.Q. Chi
Faculty of Computer Science & Engineering, HCMUT, Vietnam
e-mail: fkhanh,ttqnguyet,[email protected]
H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_7, © Springer-Verlag Berlin Heidelberg 2012

1 Introduction

With the rapid development of the Internet and networking technology, outsourced database services have become increasingly popular in enterprise database management [3, 4, 6]. Organizations (partly) outsource their data management needs to external service providers, thereby freeing themselves to concentrate on their core business. One popular form of this outsourcing model is called "housing service", where the servers are the property of the client, who installs and operationally manages the servers' software. In this case the outsourcing service provider provides the physical security of machines and data and monitors (and if necessary restores) the operating condition of the server. Even then, "soft" security-related issues in this model are usually ignored, or it is understood that "the installed software (or the client) must be responsible for it".

For example, suppose a client chooses a database housing service at a service provider SP, and SP has a special account easytask for some special managerial activities (easytask was assigned very limited rights on the client's "outside-housed" database clientDB). Then SP is typically liable only for the so-called "physical" security of the client's hardware and data. This is reasonable only if (the client believes that) the database management system (DBMS) is working as expected in terms of security. In practice, serious problems can occur when the DBMS has security flaws and the client does not become aware of these "soft" flaws before the housing service provider SP does. In this case, SP may make use of the account easytask and the security flaws found in order to take control of, or gain unauthorized access to, the client's clientDB, which is housed on SP's premises.

For the above reasons, we have been carrying out research to develop a means of detecting database security flaws and visually monitoring the real-time activities of the account easytask, especially with respect to the possible security flaws of clientDB. Moreover, based on the results of the detecting and monitoring phases, the system will also be able to send warning messages or carry out other appropriate preventive actions if it detects risks that may violate the security policy.

The rest of this paper is organized as follows. Section 2 briefly discusses related work. Section 3 introduces a classification of security flaws and monitored database activities. An architecture for the system is proposed in Sect. 4. After that, we present our prototype for an Oracle database and evaluate our solution in Sect. 5. Section 6 discusses open related research issues. Finally, Sect. 7 gives concluding remarks and presents future work.

2 Related Work

2.1 Detecting Techniques

Data mining is one of the techniques used in anomaly detection methodology [16] to find abnormal patterns of access transactions, which may signal an attack [1], by heuristics and/or stochastic algorithms. However, it may also give false alarms or attach some degree of uncertainty to the discovered results. Therefore, we focus only on two major detecting techniques for finding security flaws: the version checking-based and the pentesting-based technique. We return to this point in Sect. 6, where we place data mining-based techniques in the context of the system under development.


The version checking-based technique [5, 6] is used to find the inherent security flaws of each particular DBMS version that have been published on database security-related web sites. This technique is simple, but it has a noticeable shortcoming: it may result in false alarms, because security flaws may depend not only on the DBMS version but also on other factors. Pentesting is a method of evaluating the security of a database system by simulating an attack by a malicious user through various phases [2, 7] in order to detect security flaws, such as flaws related to SQL injection (see Sect. 3.1). If the simulated intrusion is successful, a security flaw definitely exists in the system.
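The version checking idea can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the authors' implementation: the `KnownFlaw` record, the flaw identifiers and the version ranges are all hypothetical placeholders for entries of a flaw repository. Matches are reported as candidates only, in line with the remark above that version checking alone may raise false alarms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownFlaw:
    """One repository entry (hypothetical layout)."""
    flaw_id: str           # hypothetical identifier, not a real advisory number
    first_affected: tuple  # first affected version, e.g. (10, 1, 0)
    fixed_in: tuple        # first version that no longer has the flaw
    description: str


def parse_version(text):
    """Turn a dotted version string such as '10.2.0.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def candidate_flaws(dbms_version, repository):
    """Return repository entries whose affected range contains dbms_version.

    The result lists candidates only: whether a flaw is exploitable may
    depend on configuration and other factors, so a match is not a proof.
    """
    version = parse_version(dbms_version)
    return [flaw for flaw in repository
            if flaw.first_affected <= version < flaw.fixed_in]


# Hypothetical repository content, for illustration only.
REPOSITORY = [
    KnownFlaw("FLAW-A", (10, 1, 0), (10, 2, 0, 4), "unvalidated procedure parameter"),
    KnownFlaw("FLAW-B", (9, 0), (9, 2, 0, 8), "default account left enabled"),
]

if __name__ == "__main__":
    for flaw in candidate_flaws("10.2.0.1", REPOSITORY):
        print(flaw.flaw_id, "-", flaw.description)
```

A pentesting-based check would complement this lookup by actually attempting the simulated intrusion, which removes the uncertainty at the cost of a more invasive test.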

2.2 Monitoring Techniques

Database activity monitoring means that all structured query language (SQL) statements of both normal users and database administrators are captured and recorded in real time. To perform auditing activities, database activity monitoring needs collectors. There are three major categories of collection techniques [8, 13, 14]: network monitoring, local agent, and remote monitoring. Each category has its own advantages and disadvantages, which relate to the system performance, the DBMS platform, and the type of captured activities (internal or external). In our system, the remote monitoring-based technique is used to collect information on outsourced database access activities (by using native auditing features of the DBMS or other internal database features such as triggers). We can, therefore, monitor both internal and external database activities.

2.3 Preventing Techniques

To protect the outsourced database, in addition to the built-in security functionalities of the DBMS, our system provides users with two additional kinds of prevention: active and passive prevention of unauthorized access [6]. Active prevention is a countermeasure taken before the system is attacked, while passive prevention is carried out right after a sign of an attack has appeared in the database system. There are three levels of passive prevention: to alert, to (temporarily) disable the user's access, or to shut down the database. An instance of passive prevention is to monitor all activities on the system in real time and allow the tracker to stop malicious or attacking actions as soon as possible.

3 Problem Analysis

The desired system has two key features: detecting database security flaws and monitoring database activities. As a consequence, it also provides the ability to prevent possible damage to the database system. First, we must identify and classify the objects or events which should or must be detected.

3.1 Database Security Flaw Classification

A database security flaw is a vulnerability which can be exploited to attack the database system. Database security flaws are classified into five major categories, which will also be used later as the fundamental criteria for scanning for flaws [5, 6, 11]:

– Version: The version-related flaws are inherent security flaws that have existed in the DBMS since its release.
– Users/Accounts: This type refers to the way users manage their accounts, or to the security-related account settings.
– Procedures/Functions/Packages: Some procedures/functions/packages may contain errors/bugs that can be exploited to attack the database system. For instance, if the parameters of a procedure are not well validated, hackers can take advantage of this to mount a privilege escalation or denial of service (DoS) attack on the database system.
– Privileges/Roles: This kind of flaw stems from a lack of full awareness of the power that the privileges/roles give to users, and from an incomplete understanding of the existing technologies of the employed DBMS.
– System security settings: Misconfiguration of the database system often leads to serious security flaws. Thus, the database administrator has to cover as many as possible of the aspects which affect the security of the database system.

3.2 Monitoring Activity Classification

Based on the requirements of the Sarbanes-Oxley Act (SOX) for auditing activities [Nat05], we classify monitoring activities into six main categories, which are used to set up the monitoring policy for our system:

– Connection: The events of sign-on and sign-off should be recorded with the login name, a timestamp for the event, and some additional information such as the TCP/IP address of the user or the program creating the connection.
– Object changes: This category falls into three kinds, namely schema changes, user changes, and role changes. Auditing schema changes means that all data definition activities are audited. User changes or role changes are activities related to the addition, deletion or expiration of users or roles.
– System security settings: This category includes audit settings and system configuration settings. Activities related to audit settings are monitored when they change what is to be audited and which users can access the audit tables. System configuration settings which change the values of system parameters are also captured.
– Security settings for users: This kind includes privilege/role settings and account settings. It means that activities which change the privileges of users or the parameter values of accounts must be recorded.
– Privileged user activities: Any activities by privileged users, including public users, database administrators and other predefined privileged users, are monitored.
– Direct data access: This type monitors any content changes of, or accesses to, key tables, including audit tables and sensitive tables.

4 The Proposed Architecture

The system architecture consists of three programs located separately at the client side, the service server side and the client server side. The client side is the application used to scan the client's remote database system, to monitor the database activities and to prevent security problems. The service server side is an independent central server providing services that the above application uses to process the client's requests (cf. Fig. 1). The client server side, or housing provider side, is the client's outsourced database server. The system architecture is described in detail as follows.

Fig. 1 The overall system architecture


4.1 Client Side

The client installs a desktop application to interact with the service server. This application is a functional interface. Its main tasks are to connect to the service server and to call suitable deployed services in order to accomplish the following tasks: scanning, monitoring, configuring, flaw fixing, reporting, feedback sending, visualization, and program updating.

– Scan: identify which database security flaws exist in the client's database system and notify the client administrators of the current health of their database (see Sect. 4.4 for more details).
– Monitor: all activities of an account are monitored in real time while it accesses the client's database or whenever its privileges change (see Sect. 4.5 for more details).
– Configure: set up the system configuration.
– Fix: offer solutions to repair the flaws found, in either a manual or an automatic fixing mode.
– Report: present the final results of the scan and monitoring processes as graphs and reports.
– Send feedback: send comments and report new security flaws to the system developers.
– Visualization: visually display the scan/monitor results with intuitively understandable interfaces.
– Update program: ensure that the program's database and functionalities at the client side are always up to date (e.g., new security flaw patterns, new solutions to monitor database activities, or even a new GUI).

As mentioned above, the remote services are in charge of performing the main functions and then returning results to the client program. Thus, no real scanning or monitoring process takes place at the client side, and the application is light enough for the client to download and run on a personal computer. The program database at the client side only stores some configuration information that the program needs to connect to the service server.

4.2 Service Server Side

Each function is implemented as a web service. This implementation model also lets the service provider manage the services it supplies on a per-client basis. The services correspond to the primary functions of the client system: scan, monitor, prevent, fix, report and collect feedback. The prevent function differs from the other functions in that it is not called by the client: the response actions are predefined and are invoked automatically when there are risks threatening the client's server. The other functions are as described for the client side, but they are implemented as web services. For example, when the client wants to scan the server, the program at the client side calls the "scan" service to detect the security flaws in the target database.

Another program at the service server side is built for system development purposes. It helps the developers to specify further database security flaws, detecting scripts and monitoring policies:

– Specify database security flaws: update the database of the program. The repository includes the flaw information, preconditions, and corresponding fixing solutions.
– Specify detecting scripts: define the scripts to detect database security flaws. Each security flaw needs scripts and a solution to detect it. The solution can be based on the version checking- or the pentesting-based technique.
– Specify monitoring policies: define the policies to monitor database activities. A policy defines the sensitive objects, subjects, and database privileges that should be monitored.
– Collect feedback: get feedback from the users/clients (from the client side) in order to further improve the program with new advanced features.

A crucially important feature of the service server side is its capability of securing the sensitive data of each client, namely the scanning results, the audit data, and the sensitive configuration information of its database server. In this way, even the provider's administrators do not know the state of their clients' database servers and, of course, each authorized client can only access its own data.

4.3 Client Server Side (Housing Provider Side)

The program at the client server side is a small piece of code, called the listener, that records what is happening at the client's outsourced database. It stores audit data temporarily in the client's database server and, after filtering, sends them to the service server for further analysis and reporting. The audit data are saved temporarily in the client's server, instead of being transferred directly to the service server, in order to ensure the secure auditing property. Such data contain important information about the system's state and so must be continuously updated for clients to make timely decisions. When the connection between the client database server and the service server is interrupted, all activities at the outsourced database continue to be audited and stored at the local client database server. Once the connection is re-established, the recovery and synchronization mechanism of the system ensures that the two audit databases are consistent. Consequently, the monitoring of users' database activities continues even if the connection is interrupted. In addition, all audit data are encrypted, digitally signed, and transferred over a secure communication channel. Besides that, security policies are enforced strictly to prevent unauthorized users from accessing the audit tables.
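The store-and-forward behaviour of the listener can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the class `LocalAuditBuffer`, the `send` callback and the record layout are hypothetical, and a keyed hash chain (HMAC-SHA256 from the Python standard library) is used as a simple stand-in for the digital signatures mentioned above. Encryption and the secure channel are not modelled.

```python
import hashlib
import hmac
import json


class LocalAuditBuffer:
    """Keeps audit records locally until they have been delivered (sketch)."""

    def __init__(self, key: bytes):
        self._key = key
        self._pending = []    # records not yet delivered, oldest first
        self._last_tag = b""  # tag of the previous record, chains the log

    def record(self, event: dict) -> None:
        """Serialize an event and tag it together with the previous tag."""
        payload = json.dumps(event, sort_keys=True).encode("utf-8")
        tag = hmac.new(self._key, self._last_tag + payload, hashlib.sha256).digest()
        self._pending.append((payload, tag))
        self._last_tag = tag

    def flush(self, send) -> int:
        """Deliver pending records in order; stop at the first failure.

        `send(payload, tag)` is a caller-supplied function that raises
        ConnectionError while the link to the service server is down.
        Returns the number of records delivered in this call.
        """
        delivered = 0
        while self._pending:
            payload, tag = self._pending[0]
            try:
                send(payload, tag)
            except ConnectionError:
                break  # keep the record; retry on the next flush
            self._pending.pop(0)
            delivered += 1
        return delivered


def verify_chain(key: bytes, records) -> bool:
    """Check on the receiving side that no record was altered or reordered."""
    last_tag = b""
    for payload, tag in records:
        expected = hmac.new(key, last_tag + payload, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, tag):
            return False
        last_tag = tag
    return True
```

Because each tag covers the previous one, the receiving side can detect a modified, dropped or reordered record after the connection has been restored. A real deployment would use asymmetric signatures if the service server must not be able to forge records.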


Fig. 2 Process of detecting database security flaws

4.4 Detecting Engine

The detecting engine (cf. Fig. 2) implements the scanning web service. The process is activated when the client’s administrator defines requirements and requests the detecting engine to find database security flaws through the application at the client side. After gathering the requirements and scanning criteria, the engine makes the necessary preparations, such as backing up important data and generating temporary files. Next, it starts scanning, analyzing and evaluating the database system. These three processes work together to find the database security flaws. The engine then returns the results to the client, displays the found flaws on the client user interface as a statistical report and, if necessary, sends alert messages to the client’s administrator. The result is organized visually by flaw category and ranked by risk level, and possible solutions to the found flaws are suggested. This gives the administrator an overall view of the system health and helps to carry out suitable actions to protect the database system better. While running, the process uses the database security flaw repository, audit logs and stated policies to detect the security flaws. The result of the scanning phase is stored in a local database, called the scan history, for later reference.

4.5 Monitoring Engine

Figure 3 illustrates the process of monitoring database activities, which is activated when the administrator configures the monitor settings and requests the monitoring service to track users’ database activities in real time. Firstly, it gathers the users’ monitoring requirements. Then, using the collected information, it makes some preparations, for example, by installing the auditing modules (if necessary), generating temporary files, etc. [5, 10]. Next, it starts to monitor database activities, including the following sub-processes: gathering audit data, filtering data, analyzing, evaluating, and detecting. These processes work together to monitor the client database system’s health. The gathering process records, from the audit logs and in real time, all activities of an account while it is accessing the client’s database, as well as any change to the account’s privileges. These activities are captured by an auditor module. The filtering process filters audit trails using policy-based rules, which are set to single out the activities that really matter, and sends the filtered data to the analyzing and evaluating processes. Using the results of the detecting process described in Sect. 4.4 and the account’s activities, combined with thresholds retrieved from the predefined baseline storage, the analysis and evaluation part can give trustworthy and noticeable results about the found security flaws. After that, based on the policies for response actions, the preventing process proposes possible solutions for fixing the found flaws and sends alert messages to the administrator. The results of the security flaw scanning process, the corresponding response actions and what the monitoring processes have captured are presented visually as charts, graphs, etc. by the displaying process. They are also summarized and stored in the history database for statistics.

Fig. 3 Process of monitoring database activities
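The policy-based filtering step can be sketched as follows. This is an illustrative assumption rather than the prototype's implementation: the names `MonitoringPolicy`, `is_relevant` and `filter_audit_trail`, the dictionary layout of an audit event and the `"*"` wildcard are all hypothetical. Only the idea that a policy names subjects, sensitive objects and privileges comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitoringPolicy:
    subject: str    # database account to watch, "*" for any account (assumed convention)
    obj: str        # sensitive object, e.g. a table name
    privilege: str  # privilege or statement type, e.g. "UPDATE"


def is_relevant(event, policies):
    """Return True if at least one policy covers the audit event."""
    return any(
        p.subject in ("*", event["subject"])
        and p.obj == event["object"]
        and p.privilege == event["privilege"]
        for p in policies
    )


def filter_audit_trail(events, policies):
    """Keep only the audit events that match a monitoring policy."""
    return [e for e in events if is_relevant(e, policies)]
```

Only the events returned by `filter_audit_trail` would be passed on to the analyzing and evaluating processes.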


5 Prototype and Evaluation

In this section, we briefly introduce our implemented prototype, which is used to scan security flaws and to monitor activities for Oracle databases in a real-world housing model [6]. The prototype can also be extended to other DBMSs by means of the specification modules presented in Sect. 4.2. After that, we evaluate our architecture and implementation.

5.1 Prototype

The prototype can scan about 150 flaws [6] of the Oracle Database Server 10g and 11g, including all kinds of flaws mentioned in Sect. 3.1. Besides that, it can track all activities of normal users who are accessing the client’s outsourced database server (clientDB) for some special managerial activities in the housing-outsourcing model, and it raises an alert when a predefined security policy is violated. These activities include Data Manipulation Language (DML), Data Definition Language (DDL), and Data Control Language (DCL) statements in both successful and unsuccessful modes. This means that all performed activities are recorded in the program database even if they were unsuccessful. The prototype comprises a collection of web services and the two programs described in the system architecture (cf. Sect. 4): the client tool at the client side and the developer tool at the service server side. The client tool occupies less than 4 MB of disk space. It mainly calls web services deployed at the service server side of the provider to carry out the main functions such as scanning and monitoring. Sensitive data such as scanning results or auditing data are secured for each client by using Oracle Data Vault. Except for the authorized client, nobody, not even the administrator of the housing service provider, has the privilege to read such sensitive data.

5.2 Evaluation

The evaluation is based on the following characteristics:
– Extensibility: The system developers of the service provider can enlarge the security flaws database and the kinds of database activities to be monitored through the specification functions of the service server side (cf. Sect. 4.2). Moreover, this architecture can be extended to any kind of DBMS.
– Efficiency: The clients only need to install a lightweight program to perform the functions of scanning, monitoring, alerting and reporting, which calls the corresponding web services located at the service server side. This reduces the workload as well as the cost of resources for the clients when running the application (cf. Sect. 4.1). The response time depends on the speed of the network connection between client server and service server. If many clients connect to the service server concurrently, a bottleneck is unavoidable and must be addressed, for example, by upgrading the configuration of the service server.
– Security: The architecture keeps the sensitive data of a client secure from the administrator(s) of the service provider as well as from other clients. In addition, with the secure auditing data feature, all database activities are monitored even when the network connection is interrupted (they are stored in the local audit data by the listener, see Sect. 4.3).
– Reliability: The client application calls web services located on a separate service server. Therefore, it cannot work when the network between the client and the service server is interrupted. The analyzed results, however, can be reviewed by the client because they are stored at the service server side in the client data vault.

Through the proposed solution, although the clients are not present at the database server location, the soft security aspects of their databases can be monitored and evaluated. Besides, necessary prevention actions can also be conducted by the clients in case suspicious activities are found.

6 Open Research Issues

Although our framework is general-purpose for a variety of application contexts, many research topics are still open. We introduce some of the most notable issues, which have been identified but not yet addressed thoroughly:
– Pentesting weaknesses: Although pentesting is a useful practice and a simple and sound solution to identify vulnerabilities existing in the database system, there are many security problems and limitations associated with penetration tests [2]. We have proposed a framework which combines five components based on a variety of rules: verifying scripts before testing, monitoring and preventing, alerting, auditing all events during testing, and checking the recovery state [15].
– Database security visualization: The in-depth analysis of huge amounts of information from scanning or monitoring results is not easy without the support of data visualization techniques. Security visualization helps users to understand security issues in depth and to keep track of the health state of the database over time from a visual perspective [12].
– Data mining-based techniques: The crucial advantage of data mining techniques is the ability to draw attention to potential or unknown security flaws. To implement data mining in the proposed system, we need a mechanism to collect all database transactions and configurations from the clients. The main challenge of this technique is to develop an efficient algorithm to distinguish risky patterns [1] and to protect clients’ privacy in data mining [9].


7 Conclusion and Future Work

In this paper, we have presented a framework for detecting, monitoring and preventing database security breaches and three relevant methodologies, namely version checking- and pentesting-based techniques for detecting database security flaws, remote monitoring-based techniques for auditing data, and policy-based techniques for preventing suspicious activities in a housing-based outsourcing model. Besides that, we have also introduced a classification of security flaws existing in database systems and of activities that need to be monitored, as a basis for enlarging the database of flaws and monitoring policies. In the future, we are going to integrate our recent results on a side-effects-free database pentesting solution [15] into the proposed architecture and address the open research issues discussed in Sect. 6 to improve the system's effectiveness, performance, and security. Besides that, we continue our research on an extensible framework for database security flaw specification, in order to make the system easier to enlarge and to reduce the risks caused by pentesters' lack of specification experience.

References

1. Ashish, K., Evimaria, K., Elisa, B.: Detecting Anomalous Access Patterns in Relational Databases. VLDB Journal, 17(5), 1063–1077 (2008)
2. The Bundesamt für Sicherheit in der Informationstechnik: Study: A Penetration Testing Model. URL: https://ssl.bsi.bund.de/english/publications/studies/penetration.pdf (2003)
3. Dang, T.K., Nguyen, T.S.: Providing Query Assurance for Outsourced Tree-Indexed Data. HPSC2006, Hanoi, Vietnam, pp. 207–224 (2008)
4. Dang, T.K.: Ensuring Correctness, Completeness and Freshness for Outsourced Tree-Indexed Data. IRMJ, Idea Group, 21(1), 59–76 (2008)
5. Dang, T.K., Truong, Q.C., Cu-Nguyen, P.H., Tran, T.Q.N.: An Extensible Framework for Detecting Database Security Flaws. ACOMP2008, Vietnam, pp. 68–77 (2008)
6. Dang, T.K., Tran, T.Q.N., Truong, Q.C.: Security Issues in Housing Service Outsourcing Model with Database Systems. ASIS LAB, ASIS-TR-0017/2009, URL: http://www.cse.hcmut.edu.vn/asis (2009)
7. Geer, D., Harthorne, J.: Penetration testing: a duet. Proceedings of the 18th Annual Computer Security Applications Conference, Las Vegas, USA, pp. 185–198 (2002)
8. Handscombe, K.: Continuous Auditing From A Practical Perspective. Information Systems Control Journal, 2 (2007)
9. Huynh, V.Q.P., Dang, T.K.: eM2: An Efficient Member Migration Algorithm for Ensuring k-Anonymity and Mitigating Information Loss. VLDB Workshop on Secure Data Management, LNCS, Springer Verlag, Singapore, pp. 26–40 (2010)
10. Natan, R.B.: Implementing Database Security and Auditing. Elsevier Digital Press (2005)
11. Qiang, L.: Defense In-Depth to Achieve Unbreakable Database Security. ICITA2004, China, pp. 386–390 (2004)
12. Raffael, M.: Applied Security Visualization. Addison-Wesley (2008)
13. Rich, M.: Understanding and Selecting a Database Activity Monitoring Solution. URL: http://securosis.com/publications/DAM-Whitepaper-final.pdf (2008)
14. Surajit, C., Arnd, C., Koenig, V.N.: SQLCM: A Continuous Monitoring Framework for Relational Database Engines. ICDE2004, USA, pp. 473–485 (2004)
15. Tran, T.Q.N., Dang, T.K.: Towards Side-Effects-free Database Penetration Testing. Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications (JoWUA), 1(1), 72–85 (2010)
16. Varun, C., Arindam, B., Vipin, K.: Anomaly Detection: A Survey. ACM Computing Surveys (CSUR), 41(3), article 15 (2009)


Real-Time Sequential Convex Programming for Optimal Control Applications

Tran Dinh Quoc, Carlo Savorgnan, and Moritz Diehl

Abstract This paper proposes real-time sequential convex programming (RTSCP), a method for solving a sequence of nonlinear optimization problems depending on an online parameter. We provide a contraction estimate for the proposed method and, as a byproduct, a new proof of the local convergence of sequential convex programming. The approach is illustrated by an example where RTSCP is applied to nonlinear model predictive control.

T.D. Quoc, C. Savorgnan, M. Diehl: Department of Electrical Engineering (ESAT-SCD) and Optimization in Engineering Center (OPTEC), K.U. Leuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium. In: H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_8, © Springer-Verlag Berlin Heidelberg 2012

1 Introduction and Motivation

Consider a parametric optimization problem of the form:

\[
\mathrm{P}(\xi):\qquad
\begin{cases}
\min\limits_{x} \; c^T x \\
\text{s.t.}\;\; g(x) + M\xi = 0, \quad x \in \Omega,
\end{cases}
\]

where $x, c \in \mathbb{R}^n$, $g : \mathbb{R}^n \to \mathbb{R}^m$ is a nonlinear function, $\Omega \subseteq \mathbb{R}^n$ is a convex set, the parameter $\xi$ belongs to a given set $\mathcal{P} \subseteq \mathbb{R}^p$, and $M \in \mathbb{R}^{m \times p}$ is a given matrix. This paper deals with the efficient calculation of approximate solutions to a sequence of problems of the form $\mathrm{P}(\xi)$ where the parameter $\xi$ is varying slowly. In other words, for a sequence $\{\xi_k\}_{k \ge 1}$ such that $\|M(\xi_{k+1} - \xi_k)\|$ is small, we want to solve problem $\mathrm{P}(\xi_k)$ in an efficient way without requiring too much accuracy in the result.

In practice, sequences of problems of the form $\mathrm{P}(\xi)$ are solved in the framework of nonlinear model predictive control (MPC). MPC is an optimal control technique which avoids computing an optimal control law in a feedback form, which is often a numerically intractable problem. A popular way of solving the optimization problem to calculate the control sequence is using either interior point methods [1] or sequential quadratic programming (SQP) [2, 3, 9]. A drawback of using SQP is that this method may require several iterations before convergence and therefore the computation time may be too large for a real-time implementation. A solution to this problem was proposed in [6], where the real-time iteration (RTI) technique was introduced. Extensions to the original idea and some theoretical results are reported in [5, 7, 8]. Similar nonlinear MPC algorithms are proposed in [10, 13]. RTI is based on the observation that, for several practical applications of nonlinear MPC, the data of two successive optimization problems to be solved in the MPC loop are numerically close. In particular, if we express these optimization problems in the form $\mathrm{P}(\xi)$, the parameter $\xi$ usually represents the current state of the system, which, for most applications, does not change significantly between two successive measurements. The RTI technique consists of performing only the first step of the usual SQP algorithm, which is initialized using the solution calculated in the previous MPC iteration.

1.1 Contribution

Before stating the main contributions of the paper we need to outline the (full-step) sequential convex programming (SCP) algorithm framework applied to problem $\mathrm{P}(\xi)$ for a given value $\xi_k$ of the parameter $\xi$:

1. Choose a starting point $x^0 \in \Omega$ and set $j := 0$.
2. Solve the convex approximation of $\mathrm{P}(\xi_k)$:

\[
\mathrm{P}_{\mathrm{cvx}}(x^j; \xi_k):\qquad
\begin{cases}
\min\limits_{x} \; c^T x \\
\text{s.t.}\;\; g'(x^j)(x - x^j) + g(x^j) + M\xi_k = 0, \\
\phantom{\text{s.t.}\;\;} x \in \Omega
\end{cases}
\]

to obtain a solution $x^{j+1}$, where $g'(\cdot)$ is the Jacobian matrix of $g(\cdot)$.
3. If the stopping criterion is satisfied then: STOP. Otherwise, set $j := j + 1$ and go back to Step 2.

The real-time sequential convex programming (RTSCP) method proposed in this paper combines the RTI technique and the SCP algorithm: instead of solving every $\mathrm{P}(\xi_k)$ to full accuracy with SCP, RTSCP solves only one convex approximation $\mathrm{P}_{\mathrm{cvx}}(x^{k-1}; \xi_k)$, using as linearization point $x^{k-1}$, the approximate solution of $\mathrm{P}(\xi_{k-1})$ calculated at the previous iteration. Therefore, RTSCP solves a sequence of convex problems corresponding to the different problems $\mathrm{P}(\xi_k)$. This method is suitable for problems that contain a general convex substructure, such as a nonsmooth convex cost or second-order or semidefinite cone constraints, which may not be convenient for SQP methods.
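The three SCP steps can be tried on a small example. The sketch below is an illustration under stated assumptions, not the authors' implementation: the toy problem (two variables, one constraint $g(x) = x_2 - \tfrac{1}{2}x_1^2$, $M = 1$, $c = (-1, 0)^T$ and $\Omega$ the unit disk) is hypothetical and was chosen because its convex subproblem, a linear cost over the intersection of a line with a disk, has a closed-form solution. The function names are illustrative as well.

```python
import math

R = 1.0            # assumed convex set Omega = {x : ||x|| <= R}
C = (-1.0, 0.0)    # assumed cost vector, i.e. minimize -x1


def g(x):
    """Nonlinear constraint function of the toy problem."""
    return x[1] - 0.5 * x[0] ** 2


def g_jac(x):
    """Jacobian of g (a single row because m = 1)."""
    return (-x[0], 1.0)


def solve_pcvx(x_lin, xi):
    """Solve min c^T x  s.t.  g'(x_lin)(x - x_lin) + g(x_lin) + xi = 0, ||x|| <= R.

    Assumes the linearized constraint line cuts the disk and is not
    orthogonal to c, which holds for the data used here.
    """
    a = g_jac(x_lin)
    b = a[0] * x_lin[0] + a[1] * x_lin[1] - g(x_lin) - xi  # line: a^T x = b
    aa = a[0] ** 2 + a[1] ** 2
    x_mn = (a[0] * b / aa, a[1] * b / aa)                  # point of the line closest to 0
    ac = a[0] * C[0] + a[1] * C[1]
    d = (C[0] - a[0] * ac / aa, C[1] - a[1] * ac / aa)     # c projected onto the line
    d_norm = math.hypot(d[0], d[1])
    half_chord = math.sqrt(R ** 2 - x_mn[0] ** 2 - x_mn[1] ** 2)
    # The cost decreases along -d, so the minimizer is the chord end point.
    return (x_mn[0] - half_chord * d[0] / d_norm,
            x_mn[1] - half_chord * d[1] / d_norm)


def scp(x0, xi, tol=1e-10, max_iter=50):
    """Full-step SCP: repeat Step 2 until the step length drops below tol."""
    x = x0
    for j in range(max_iter):
        x_new = solve_pcvx(x, xi)
        step = math.hypot(x_new[0] - x[0], x_new[1] - x[1])
        x = x_new
        if step < tol:
            break
    return x, j + 1


x_star, iterations = scp((1.0, 0.0), xi=0.0)
print(x_star, iterations)
```

In this toy problem the limit point lies where the constraint curve meets the boundary of the disk, so the returned point satisfies both $g(x) + \xi = 0$ and $\|x\| = R$ up to the tolerance.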


In this paper we provide a contraction estimate for RTSCP which can be interpreted in the following way: if the linearization of the first problem $\mathrm{P}(\xi_0)$ is close enough to the solution of the problem and the quantity $\|M(\xi_{k+1} - \xi_k)\|$ is not too big (which is the case for many problems arising from nonlinear MPC), RTSCP provides a sequence of good approximations of the sequence of optimal solutions of the problems $\mathrm{P}(\xi_k)$. As a byproduct of this result, we obtain a new proof of local convergence for the SCP algorithm.

The paper is organized as follows. Section 2 describes the RTSCP algorithm. Section 3 proves the contraction estimate for the RTSCP method. The last section shows an application of the RTSCP method to nonlinear MPC.

2 The RTSCP Method

As mentioned in the previous section, SCP solves a possibly nonconvex optimization problem by solving a sequence of convex subproblems which approximate the original problem locally. In this section, we combine RTI and SCP to obtain the RTSCP method. The method consists of the following steps:

Initialization. Find an initial value $\xi_1 \in \mathcal{P}$, choose a starting point $x^0 \in \Omega$ and compute the information needed at the first iteration such as derivatives, dependent variables, etc. Set $k := 1$.

Iteration.
1. Solve $\mathrm{P}_{\mathrm{cvx}}(x^{k-1}; \xi_k)$ (see Sect. 3) to obtain a solution $x^k$.
2. Determine a new parameter $\xi_{k+1} \in \mathcal{P}$, update (or recompute) the information needed for the next step. Set $k := k + 1$ and go back to Step 1.

One of the main tasks of the RTSCP method is to solve the convex subproblem $\mathrm{P}_{\mathrm{cvx}}(x^{k-1}; \xi_k)$ at each iteration. This can be done either by implementing an optimization method which exploits the problem structure or by relying on one of the many efficient software tools available nowadays.

Remark 1. In the RTSCP method, a starting point $x^0$ in $\Omega$ is required. It can be any point in $\Omega$. But as we will show later (Theorem 1), if we choose $x^0$ close to the true solution of $\mathrm{P}(\xi_0)$ and $\|M(\xi_1 - \xi_0)\|$ is sufficiently small, then the solution $x^1$ of $\mathrm{P}_{\mathrm{cvx}}(x^0; \xi_1)$ is still close to the true solution of $\mathrm{P}(\xi_1)$. Therefore, in practice, problem $\mathrm{P}(\xi_0)$ can be solved approximately to get a starting point $x^0$.

Remark 2. Problem $\mathrm{P}(\xi)$ has a linear cost function. However, RTSCP can deal directly with problems where the cost function $f(x)$ is convex. If the cost function is quadratic and $\Omega$ is a polyhedral set then the RTSCP method collapses to the real-time iteration of a Gauss-Newton method (see, e.g. [4]).

Remark 3. In MPC, the parameter $\xi$ is usually the value of the state variables of a dynamic system at the current time $t$. In this case, $\xi$ is measured at each sample time based on the real-world dynamic system (see the example in Sect. 4).
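The RTSCP loop can be sketched on a toy problem to show how a single convex subproblem per parameter value tracks the exact solutions. Everything specific here is an assumption made for illustration: the problem data ($g(x) = x_2 - \tfrac{1}{2}x_1^2$, $M = 1$, $c = (-1, 0)^T$, $\Omega$ the unit disk), the slowly varying parameter sequence, and the function names. The reference solutions are obtained by simply iterating the subproblem to convergence.

```python
import math

R = 1.0            # assumed convex set Omega = {x : ||x|| <= R}
C = (-1.0, 0.0)    # assumed cost vector


def g(x):
    return x[1] - 0.5 * x[0] ** 2


def solve_pcvx(x_lin, xi):
    """Closed-form solution of the linearized subproblem for the toy data."""
    a = (-x_lin[0], 1.0)                                   # Jacobian of g at x_lin
    b = a[0] * x_lin[0] + a[1] * x_lin[1] - g(x_lin) - xi  # line: a^T x = b
    aa = a[0] ** 2 + a[1] ** 2
    x_mn = (a[0] * b / aa, a[1] * b / aa)
    ac = a[0] * C[0] + a[1] * C[1]
    d = (C[0] - a[0] * ac / aa, C[1] - a[1] * ac / aa)
    d_norm = math.hypot(d[0], d[1])
    half_chord = math.sqrt(R ** 2 - x_mn[0] ** 2 - x_mn[1] ** 2)
    return (x_mn[0] - half_chord * d[0] / d_norm,
            x_mn[1] - half_chord * d[1] / d_norm)


def rtscp(x0, xis):
    """One convex subproblem per parameter value, linearized at the previous iterate."""
    x, iterates = x0, []
    for xi in xis:
        x = solve_pcvx(x, xi)
        iterates.append(x)
    return iterates


def converged_solution(x_start, xi, iters=30):
    """Reference solution: iterate the subproblem at a fixed parameter value."""
    x = x_start
    for _ in range(iters):
        x = solve_pcvx(x, xi)
    return x


xis = [0.01 * k for k in range(1, 21)]      # slowly varying parameter (assumed)
iterates = rtscp((0.9, 0.4), xis)           # x0 roughly solves the problem at xi = 0
errors = [
    math.hypot(x[0] - ref[0], x[1] - ref[1])
    for x, ref in ((x, converged_solution(x, xi)) for x, xi in zip(iterates, xis))
]
print(max(errors))
```

With small parameter increments the tracking error stays small although each problem is linearized only once, which is the behaviour the contraction estimate of Sect. 3 quantifies.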


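3 RTSCP Contraction Estimate

The KKT conditions of problem $\mathrm{P}(\xi)$ can be written as

\[
\begin{cases}
0 \in c + g'(x)^T \lambda + N_\Omega(x), \\
0 = g(x) + M\xi,
\end{cases}
\tag{1}
\]

where $N_\Omega(x) := \{ u \in \mathbb{R}^n \mid u^T(v - x) \le 0,\ \forall v \in \Omega \}$ if $x \in \Omega$ and $N_\Omega(x) := \emptyset$ if $x \notin \Omega$, is the normal cone of $\Omega$ at $x$, and $\lambda$ is a Lagrange multiplier associated with $g$. Note that the constraint $x \in \Omega$ is implicitly included in the first line of (1). A pair $\bar{z}(\xi) := (\bar{x}(\xi), \bar{\lambda}(\xi))$ satisfying (1) is called a KKT point and $\bar{x}(\xi)$ is called a stationary point of $\mathrm{P}(\xi)$. We denote by $\Gamma(\xi)$ the set of KKT points at $\xi$. In the sequel, we use $z$ for a pair $(x, \lambda)$, $\bar{z}^k$ is a KKT point of $\mathrm{P}(\xi)$ at $\xi_k$ and $z^{k+1}$ is a KKT point of $\mathrm{P}_{\mathrm{cvx}}(x^k; \xi_{k+1})$ (defined below) at $\xi_{k+1}$ for $k \ge 0$. The symbols $\|\cdot\|$ and $\|\cdot\|_F$ stand for the $L_2$-norm and the Frobenius norm, respectively. Now, let us define

\[
\varphi(z; \xi) := \begin{pmatrix} c + g'(x)^T \lambda \\ g(x) + M\xi \end{pmatrix}
\quad\text{and}\quad K := \Omega \times \mathbb{R}^m,
\]

then the KKT system (1) can be expressed as a parametric generalized equation [11]:

\[
0 \in \varphi(z; \xi) + N_K(z),
\tag{2}
\]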

where $N_K(z)$ is the normal cone of $K$ at $z$. Let $x^k \in \Omega$ be a solution of $\mathrm{P}_{\mathrm{cvx}}(x^{k-1}; \xi_k)$ at the $k$-th iteration of RTSCP. We consider the following parametric convex subproblem at Step 1 of the RTSCP algorithm:

\[
\mathrm{P}_{\mathrm{cvx}}(x^k; \xi_{k+1}):\qquad
\begin{cases}
\min\limits_{x} \; c^T x \\
\text{s.t.}\;\; g'(x^k)(x - x^k) + g(x^k) + M\xi_{k+1} = 0, \\
\phantom{\text{s.t.}\;\;} x \in \Omega.
\end{cases}
\]

If we define

\[
\hat{\varphi}(z; x^k, \xi_{k+1}) := \begin{pmatrix} c + g'(x^k)^T \lambda \\ g(x^k) + g'(x^k)(x - x^k) + M\xi_{k+1} \end{pmatrix}
\]

then the KKT condition for $\mathrm{P}_{\mathrm{cvx}}(x^k; \xi_{k+1})$ can also be represented as a parametric generalized equation:

\[
0 \in \hat{\varphi}(z; x^k, \xi_{k+1}) + N_K(z),
\tag{3}
\]

where the pair $(x^k, \xi_{k+1})$ plays the role of a parameter. Suppose that the Slater constraint qualification condition holds for problem $\mathrm{P}_{\mathrm{cvx}}(x^k; \xi_{k+1})$, i.e.:

\[
\mathrm{ri}(\Omega) \cap \left\{ x : g(x^k) + g'(x^k)(x - x^k) + M\xi_{k+1} = 0 \right\} \ne \emptyset,
\]

where $\mathrm{ri}(\Omega)$ is the set of the relative interior points of $\Omega$. Then by convexity of $\Omega$, a point $z^{k+1} = (x^{k+1}, \lambda^{k+1})$ is a KKT point of the subproblem $\mathrm{P}_{\mathrm{cvx}}(x^k; \xi_{k+1})$ if and only if $x^{k+1}$ is a solution of $\mathrm{P}_{\mathrm{cvx}}(x^k; \xi_{k+1})$ with a corresponding multiplier $\lambda^{k+1}$. For a given KKT point $\bar{z}^k \in \Gamma(\xi_k)$ of $\mathrm{P}(\xi_k)$, we define a set-valued mapping:

\[
L(z; \xi) := \hat{\varphi}(z; \bar{x}^k, \xi) + N_K(z),
\tag{4}
\]

and $L^{-1}(\delta; \xi) := \{ z \in \mathbb{R}^{n+m} : \delta \in L(z; \xi) \}$ for $\delta \in \mathbb{R}^{n+m}$ is its inverse mapping. Note that $0 \in L(z; \xi)$ is indeed the KKT condition of $\mathrm{P}_{\mathrm{cvx}}(\bar{x}^k; \xi)$. For each $k \ge 0$, we make the following assumptions:

(A1) The set of the KKT points $\Gamma_0 := \Gamma(\xi_0)$ is nonempty.
(A2) The function $g$ is twice continuously differentiable on its domain.
(A3) There exist a neighborhood $\mathcal{N}_0 \subseteq \mathbb{R}^{n+m}$ of the origin and a neighborhood $\mathcal{N}_{\bar{z}^k}$ of $\bar{z}^k$ such that for each $\delta \in \mathcal{N}_0$, the mapping $\delta \mapsto \mathcal{N}_{\bar{z}^k} \cap L^{-1}(\delta; \xi_k)$ is single-valued and Lipschitz continuous on $\mathcal{N}_0$ with a Lipschitz constant $\gamma > 0$.
(A4) There exists a constant $0 < \kappa < 1/\gamma$ such that $\|E_g(\bar{z}^k)\|_F \le \kappa$, where $E_g(\bar{z}^k) := \sum_{i=1}^m \bar{\lambda}_i^k \nabla^2 g_i(\bar{x}^k)$.

Assumptions (A1) and (A2) are standard in optimization, while Assumption (A3) is related to the strong regularity concept introduced by Robinson [11] for parametric generalized equations of the form (2). It is important to note that the strong regularity assumption follows from the strong second order sufficient optimality condition in nonlinear programming when the constraint qualification condition (LICQ) holds [11, Theorem 4.1]. In this paper, instead of the generalized linear mapping $L_R(z; \xi) := \varphi(\bar{z}^k; \xi) + \varphi'(\bar{z}^k)(z - \bar{z}^k) + N_K(z)$ used in [11] to define strong regularity, in Assumption (A3) we use a similar form $L(z; \xi) = \varphi(\bar{z}^k; \xi) + D(\bar{z}^k)(z - \bar{z}^k) + N_K(z)$, where

\[
\varphi'(\bar{z}^k) = \begin{pmatrix} E_g(\bar{z}^k) & g'(\bar{x}^k)^T \\ g'(\bar{x}^k) & 0 \end{pmatrix}
\quad\text{and}\quad
D(\bar{z}^k) = \begin{pmatrix} 0 & g'(\bar{x}^k)^T \\ g'(\bar{x}^k) & 0 \end{pmatrix}.
\]

These expressions differ from each other only in the top-left block $E_g(\bar{z}^k)$, the Hessian of the Lagrange function. Assumption (A3) corresponds to the standard strong regularity assumption (in the sense of Robinson [11]) of the subproblem $\mathrm{P}_{\mathrm{cvx}}(x^k; \xi_{k+1})$ at the point $\bar{z}^k$, a KKT point of (2) at $\xi = \xi_k$. Assumption (A4) implies that either the function $g$ should be “weakly nonlinear” (small second derivatives) in a neighborhood of a stationary point or the corresponding Lagrange multipliers are sufficiently small in this neighborhood. The latter case occurs if the optimal value of $\mathrm{P}(\xi)$ depends only weakly on perturbations of the nonlinear constraint $g(x) + M\xi = 0$.

Theorem 1 (Contraction Theorem). Suppose that Assumptions (A1)–(A4) are satisfied. Then there exist neighborhoods $\mathcal{N}_\tau$ of $\xi_k$, $\mathcal{N}_\rho$ of $\bar{z}^k$ and a single-valued function $\bar{z} : \mathcal{N}_\tau \to \mathcal{N}_\rho$ such that for all $\xi_{k+1} \in \mathcal{N}_\tau$, $\bar{z}^{k+1} := \bar{z}(\xi_{k+1})$ is the unique KKT point of $\mathrm{P}(\xi_{k+1})$ in $\mathcal{N}_\rho$ with respect to parameter $\xi_{k+1}$ (i.e. $\Gamma(\xi_{k+1}) \ne \emptyset$). Moreover, for any $\xi_{k+1} \in \mathcal{N}_\tau$, $z^k \in \mathcal{N}_\rho$ we have

\[
\|z^{k+1} - \bar{z}^{k+1}\| \le \omega_k \|z^k - \bar{z}^k\| + c_k \|M(\xi_{k+1} - \xi_k)\|,
\tag{5}
\]

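where $\omega_k \in (0, 1)$, $c_k > 0$ are constant, and $z^{k+1}$ is a KKT point of $\mathrm{P}_{\mathrm{cvx}}(x^k; \xi_{k+1})$.

Proof. The proof is organized in two parts, each divided into steps. The first part proves $\Gamma_k := \Gamma(\xi_k) \ne \emptyset$ for all $k \ge 0$ by induction and estimates the norm $\|\bar{z}^{k+1} - \bar{z}^k\|$. The second part proves the inequality (5) (see Fig. 1).

Part 1: For $k = 0$, $\Gamma_0 \ne \emptyset$ by Assumption (A1). Suppose that $\Gamma_k \ne \emptyset$ for $k \ge 0$, we will show that $\Gamma_{k+1} \ne \emptyset$. We divide the proof into four steps.

Step 1.1. We first provide the following estimations. Take any $\bar{z}^k \in \Gamma_k$. We define

\[
r_k(z; \xi) := \hat{\varphi}(z; \bar{x}^k, \xi_k) - \varphi(z; \xi).
\tag{6}
\]

Since $\gamma\kappa < 1$ by (A4), we can choose $\varepsilon > 0$ sufficiently small such that $\gamma\kappa + 5\sqrt{3}\,\gamma\varepsilon < 1$. By the choice of $\varepsilon$, we also have $c_0 := \kappa + \sqrt{3}\,\varepsilon \in (0, 1/\gamma)$. Since $g$ is twice continuously differentiable, there exist neighborhoods $\mathcal{N}_\tau \subseteq \mathcal{N}_{\xi_k}$ of $\xi_k$ and $\mathcal{N}_\rho \subseteq \mathcal{N}_{\bar{z}^k}$ of a radius $\rho > 0$ centered at $\bar{z}^k$ such that: $r_k(z; \xi) \in \mathcal{N}_0$, $\|E_g(z) - E_g(\bar{z}^k)\|_F \le \varepsilon$, $\|E_g(z) - E_g(z^k)\|_F \le \varepsilon$, $\|g'(x) - g'(\bar{x}^k)\|_F \le \varepsilon$ and $\|g'(x) - g'(x^k)\|_F \le \varepsilon$ for all $z \in \mathcal{N}_\rho$. Next, we shrink the neighborhood $\mathcal{N}_\tau$ of $\xi_k$, if necessary, such that:

\[
\|M(\xi - \xi_k)\| \le (1 - \gamma c_0)\rho / \gamma.
\tag{7}
\]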

Step 1.2. For any $z, z' \in \mathcal{N}_\rho$, we now estimate $\|r_k(z; \xi) - r_k(z'; \xi)\|$. From (6) we have

\[
\begin{aligned}
r_k(z; \xi) - r_k(z'; \xi)
&= \hat{\varphi}(z; \bar{x}^k, \xi_k) - \hat{\varphi}(z'; \bar{x}^k, \xi_k) - \varphi(z; \xi) + \varphi(z'; \xi) \\
&= \int_0^1 B(z_t; \bar{x}^k)(z' - z)\,dt,
\end{aligned}
\tag{8}
\]

where $z_t := z + t(z' - z) \in \mathcal{N}_\rho$ and

\[
B(z; \hat{x}) = \begin{pmatrix} E_g(z) & g'(x)^T - g'(\hat{x})^T \\ g'(x) - g'(\hat{x}) & 0 \end{pmatrix}.
\tag{9}
\]

Using the estimations of $E_g$ and $g'$ at Step 1.1, it follows from (9) that

\[
\|B(z_t; \bar{x}^k)\| \le \|E_g(\bar{z}^k)\|_F + \left[ \|E_g(z_t) - E_g(\bar{z}^k)\|_F^2 + 2\|g'(x_t) - g'(\bar{x}^k)\|_F^2 \right]^{1/2} \le \kappa + \sqrt{3}\,\varepsilon \equiv c_0.
\tag{10}
\]

Substituting (10) into (8), we get

\[
\|r_k(z; \xi) - r_k(z'; \xi)\| \le c_0 \|z - z'\|.
\tag{11}
\]

Fig. 1 The approximate sequence $\{z^k\}_k$ along the manifold $\bar{z}(\xi)$ of the KKT points

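Step 1.3. Let us define $\Phi_\xi(z) := \mathcal{N}_{\bar{z}^k} \cap L^{-1}(r_k(z; \xi); \xi_k)$. Next, we show that $\Phi_\xi(\cdot)$ is a contraction self-mapping onto $\mathcal{N}_\rho$ and then show that $\Gamma_{k+1} \ne \emptyset$. Indeed, since $r_k(z; \xi) \in \mathcal{N}_0$, applying (A3) and (11), for any $z, z' \in \mathcal{N}_\rho$, one has

\[
\|\Phi_\xi(z) - \Phi_\xi(z')\| \le \gamma \|r_k(z; \xi) - r_k(z'; \xi)\| \le \gamma c_0 \|z - z'\|.
\tag{12}
\]

Since $\gamma c_0 \in (0, 1)$ (see Step 1.1), we conclude that $\Phi_\xi(\cdot)$ is a contraction mapping on $\mathcal{N}_\rho$. Moreover, since $\bar{z}^k = \mathcal{N}_{\bar{z}^k} \cap L^{-1}(0; \xi_k)$, it follows from (A3) and (7) that

\[
\|\Phi_\xi(\bar{z}^k) - \bar{z}^k\| \le \gamma \|r_k(\bar{z}^k; \xi)\| = \gamma \|M(\xi - \xi_k)\| \le (1 - \gamma c_0)\rho.
\]

Combining the last inequality, (12) and noting that $\|z - \bar{z}^k\| \le \rho$ we obtain

\[
\|\Phi_\xi(z) - \bar{z}^k\| \le \|\Phi_\xi(z) - \Phi_\xi(\bar{z}^k)\| + \|\Phi_\xi(\bar{z}^k) - \bar{z}^k\| \le \rho,
\]

which proves $\Phi_\xi$ is a self-mapping onto $\mathcal{N}_\rho$. Consequently, for any $\xi_{k+1} \in \mathcal{N}_\tau$, $\Phi_{\xi_{k+1}}$ possesses a unique fixed point $\bar{z}^{k+1}$ in $\mathcal{N}_\rho$ by virtue of the contraction principle. This statement is equivalent to $\bar{z}^{k+1}$ being a KKT point of $\mathrm{P}(\xi_{k+1})$, i.e. $\bar{z}^{k+1} \in \Gamma(\xi_{k+1})$. Hence, $\Gamma_{k+1} \ne \emptyset$.

Step 1.4. Finally, we estimate $\|\bar{z}^{k+1} - \bar{z}^k\|$. From the properties of $\Phi_\xi$ we have

\[
\|\bar{z}^{k+1} - z\| \le (1 - \gamma c_0)^{-1} \|\Phi_{\xi_{k+1}}(z) - z\|, \quad \forall z \in \mathcal{N}_\rho.
\tag{13}
\]

Using this inequality with $z = \bar{z}^k$ and noting that $\bar{z}^k = \Phi_{\xi_k}(\bar{z}^k)$, we have

\[
\|\bar{z}^{k+1} - \bar{z}^k\| \le (1 - \gamma c_0)^{-1} \|\Phi_{\xi_{k+1}}(\bar{z}^k) - \Phi_{\xi_k}(\bar{z}^k)\|.
\tag{14}
\]

Since $\|r_k(\bar{z}^k; \xi_k) - r_k(\bar{z}^k; \xi_{k+1})\| = \|M(\xi_{k+1} - \xi_k)\|$, applying again (A3), it follows from (14) that

\[
\|\bar{z}^{k+1} - \bar{z}^k\| \le \gamma (1 - \gamma c_0)^{-1} \|M(\xi_{k+1} - \xi_k)\|.
\tag{15}
\]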

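Part 2: Let us define the residual from $\hat{\varphi}(z; \bar{x}^k, \xi_{k+1})$ to $\hat{\varphi}(z; x^k, \xi_{k+1})$ as:

\[
\delta(z; x^k, \xi_{k+1}) := \hat{\varphi}(z; \bar{x}^k, \xi_{k+1}) - \hat{\varphi}(z; x^k, \xi_{k+1}).
\tag{16}
\]

Step 2.1. We first provide an estimation for $\|\delta(z; x^k, \xi_{k+1})\|$. From (16) we have

\[
\begin{aligned}
\delta(z; x^k, \xi_{k+1})
&= \left[ \hat{\varphi}(z; \bar{x}^k, \xi_{k+1}) - \varphi(\bar{z}^k; \xi_{k+1}) \right] - \left[ \varphi(z; \xi_{k+1}) - \varphi(\bar{z}^k; \xi_{k+1}) \right] \\
&\quad - \left[ \hat{\varphi}(z; x^k, \xi_{k+1}) - \varphi(z^k; \xi_{k+1}) \right] + \left[ \varphi(z; \xi_{k+1}) - \varphi(z^k; \xi_{k+1}) \right] \\
&= \int_0^1 B(z_t^k; x^k)(z - z^k)\,dt - \int_0^1 B(\bar{z}_t^k; \bar{x}^k)(z - \bar{z}^k)\,dt \\
&= \int_0^1 \left[ B(z_t^k; x^k) - B(\bar{z}_t^k; \bar{x}^k) \right](z - z^k)\,dt - \int_0^1 B(\bar{z}_t^k; \bar{x}^k)(z^k - \bar{z}^k)\,dt,
\end{aligned}
\tag{17}
\]

where $z_t^k := z^k + t(z - z^k)$, $\bar{z}_t^k := \bar{z}^k + t(z - \bar{z}^k)$ and $B$ is defined by (9). Using the definition of $\hat{\varphi}$ and the estimations of $E_g$ and $g'$ at Step 1.1, it is easy to show that

\[
\begin{aligned}
\|B(z_t^k; x^k) - B(\bar{z}_t^k; \bar{x}^k)\|
&\le \left[ \|E_g(z_t^k) - E_g(\bar{z}^k)\|_F^2 + 2\|g'(x_t^k) - g'(x^k)\|_F^2 \right]^{1/2} \\
&\quad + \left[ \|E_g(\bar{z}_t^k) - E_g(\bar{z}^k)\|_F^2 + 2\|g'(\bar{x}_t^k) - g'(\bar{x}^k)\|_F^2 \right]^{1/2} \le 2\sqrt{3}\,\varepsilon.
\end{aligned}
\tag{18}
\]

Similar to (10), the quantity $B(\bar{z}_t^k; \bar{x}^k)$ is estimated by

\[
\|B(\bar{z}_t^k; \bar{x}^k)\| \le \kappa + \sqrt{3}\,\varepsilon.
\tag{19}
\]

Substituting (18) and (19) into (17), we obtain an estimation for $\|\delta(z; x^k, \xi_{k+1})\|$ as

\[
\|\delta(z; x^k, \xi_{k+1})\| \le (\kappa + \sqrt{3}\,\varepsilon)\|z^k - \bar{z}^k\| + 2\sqrt{3}\,\varepsilon \|z - z^k\|.
\tag{20}
\]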

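Step 2.2. We finally prove the inequality (5). Suppose that $z^{k+1}$ is a KKT point of $\mathrm{P}_{\mathrm{cvx}}(x^k; \xi_{k+1})$, we have $0 \in \hat{\varphi}(z^{k+1}; x^k, \xi_{k+1}) + N_K(z^{k+1})$. This inclusion implies $\delta(z^{k+1}; x^k, \xi_{k+1}) \in \hat{\varphi}(z^{k+1}; \bar{x}^k, \xi_{k+1}) + N_K(z^{k+1}) \equiv L(z^{k+1}; \xi_{k+1})$ by the definition (16) of $\delta(z^{k+1}; x^k, \xi_{k+1})$. On the other hand, since $0 \in \hat{\varphi}(\bar{z}^k; \bar{x}^k, \xi_k) + N_K(\bar{z}^k)$, which is equivalent to $\delta_1 := M(\xi_{k+1} - \xi_k) \in L(\bar{z}^k; \xi_{k+1})$, applying (A3) we get

\[
\|z^{k+1} - \bar{z}^k\| \le \gamma \|\delta(z^{k+1}; x^k, \xi_{k+1}) - \delta_1\| \le \gamma \|\delta(z^{k+1}; x^k, \xi_{k+1})\| + \gamma \|M(\xi_{k+1} - \xi_k)\|.
\]

Combining this inequality and (20) with $z = z^{k+1}$, we obtain

\[
\|z^{k+1} - \bar{z}^k\| \le \gamma(\kappa + \sqrt{3}\,\varepsilon)\|z^k - \bar{z}^k\| + 2\sqrt{3}\,\gamma\varepsilon \|z^{k+1} - z^k\| + \gamma \|M(\xi_{k+1} - \xi_k)\|.
\tag{21}
\]

Using the triangle inequality, after a simple rearrangement, (21) implies

\[
\|z^{k+1} - \bar{z}^{k+1}\| \le \frac{\gamma(\kappa + 3\sqrt{3}\,\varepsilon)}{1 - 2\sqrt{3}\,\gamma\varepsilon} \|z^k - \bar{z}^k\| + \frac{1 + 2\sqrt{3}\,\gamma\varepsilon}{1 - 2\sqrt{3}\,\gamma\varepsilon} \|\bar{z}^{k+1} - \bar{z}^k\| + \frac{\gamma}{1 - 2\sqrt{3}\,\gamma\varepsilon} \|M(\xi_{k+1} - \xi_k)\|.
\tag{22}
\]

Now, let us define

\[
\omega_k := \frac{\gamma(\kappa + 3\sqrt{3}\,\varepsilon)}{1 - 2\sqrt{3}\,\gamma\varepsilon},
\qquad
c_k := \frac{\gamma}{1 - 2\sqrt{3}\,\gamma\varepsilon} \left[ \frac{2\sqrt{3}\,\gamma\varepsilon + 1}{1 - \gamma c_0} + 1 \right].
\]

By the choice of $\varepsilon$ at Step 1.1, we can easily check that $\omega_k \in (0, 1)$ and $c_k > 0$. Substituting (15) into (22) and using the definitions of $\omega_k$ and $c_k$, we obtain

\[
\|z^{k+1} - \bar{z}^{k+1}\| \le \omega_k \|z^k - \bar{z}^k\| + c_k \|M(\xi_{k+1} - \xi_k)\|,
\]

which proves (5). The theorem is proved.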
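To see what the one-step estimate (5) implies over many steps, it can be unrolled. The short derivation below relies on additional assumptions that are not part of the theorem: uniform bounds $\omega_k \le \omega < 1$, $c_k \le c$ and $\|M(\xi_{k+1} - \xi_k)\| \le \Delta$ for all $k$ (the symbols $\omega$, $c$ and $\Delta$ are introduced here for illustration only), and iterates that remain in the neighborhoods where (5) applies.

```latex
\begin{aligned}
\|z^{k} - \bar{z}^{k}\|
  &\le \omega \,\|z^{k-1} - \bar{z}^{k-1}\| + c\,\Delta \\
  &\le \omega^{2} \,\|z^{k-2} - \bar{z}^{k-2}\| + (1 + \omega)\, c\,\Delta \\
  &\;\;\vdots \\
  &\le \omega^{k} \,\|z^{0} - \bar{z}^{0}\| + c\,\Delta \sum_{i=0}^{k-1} \omega^{i}
   \;\le\; \omega^{k} \,\|z^{0} - \bar{z}^{0}\| + \frac{c\,\Delta}{1 - \omega}.
\end{aligned}
```

Under these assumptions the influence of the initial error decays geometrically, and the tracking error is eventually bounded by a term proportional to the largest parameter change between two consecutive problems.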

If the parameter set reduces to a single point, $\mathcal{P} \equiv \{\xi\}$, then the RTSCP method collapses to the full-step SCP method described in Sect. 1. Without loss of generality, we can assume that $\xi_k = 0$ for all $k \ge 0$. The following corollary immediately follows from Theorem 1.

Corollary 1. Suppose that $\{z^j\}_{j \ge 1}$ is the sequence of the KKT points of $\mathrm{P}_{\mathrm{cvx}}(x^{j-1}; 0)$ generated by the SCP method described in Sect. 1 and that the assumptions of Theorem 1 hold for $\xi_k = 0$. Then

\[
\|z^{j+1} - \bar{z}\| \le \omega \|z^j - \bar{z}\|, \quad \forall j \ge 0,
\tag{23}
\]

where $\omega \in (0, 1)$ is the contraction factor. Consequently, this sequence converges linearly to a KKT point $\bar{z}$ of $\mathrm{P}(0)$.

4 Numerical Example: Control of an Underactuated Hovercraft

In this section we apply RTSCP to the control of an underactuated hovercraft. We use the same model as in [12], which is characterized by the following differential equations:

\[
\begin{cases}
m\ddot{y}_1(t) = (u_1(t) + u_2(t)) \cos(\theta(t)), \\
m\ddot{y}_2(t) = (u_1(t) + u_2(t)) \sin(\theta(t)), \\
I\ddot{\theta}(t) = r(u_1(t) - u_2(t)),
\end{cases}
\tag{24}
\]

Fig. 2 RC hovercraft and its model [12]

where $y(t) = (y_1(t), y_2(t))^T$ is the coordinate of the center of mass of the hovercraft (see Fig. 2); $\theta(t)$ represents the direction of the hovercraft; $u_1(t)$ and $u_2(t)$ are the fan thrusts; $m$ and $I$ are the mass and moment of inertia of the hovercraft, respectively; and $r$ is the distance between the central axis of the hovercraft and the fans. The problem considered is to drive the hovercraft from its initial position to the final parking position corresponding to the origin of the state space while respecting the constraints

\[
\underline{u} \le u_1(t) \le \bar{u}, \quad \underline{u} \le u_2(t) \le \bar{u}, \quad \underline{y}_1 \le y_1(t) \le \bar{y}_1, \quad \underline{y}_2 \le y_2(t) \le \bar{y}_2.
\tag{25}
\]
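The model (24) can be rewritten as a first-order system and advanced with one explicit Euler step, the discretization used in the formulation that follows. The sketch below is illustrative and not the authors' code: the function names are hypothetical, the state is assumed to be ordered as $(y_1, y_2, \theta, \dot{y}_1, \dot{y}_2, \dot{\theta})$, and the numerical values of $m$, $I$, $r$ and the sampling time are the ones reported with the simulation later in this section.

```python
import math

M_KG = 0.974       # mass m [kg]
I_KGM2 = 0.0125    # moment of inertia I [kg m^2]
R_M = 0.0485       # distance r between central axis and fans [m]
DT = 0.05          # sampling time [s]


def hovercraft_rhs(state, u):
    """Right-hand side of model (24) written as a first-order system.

    state = (y1, y2, theta, dy1, dy2, dtheta), u = (u1, u2).
    """
    _, _, theta, dy1, dy2, dtheta = state
    thrust = u[0] + u[1]
    return (
        dy1,
        dy2,
        dtheta,
        thrust * math.cos(theta) / M_KG,
        thrust * math.sin(theta) / M_KG,
        R_M * (u[0] - u[1]) / I_KGM2,
    )


def euler_step(state, u, dt=DT):
    """One explicit Euler step: state + dt * f(state, u)."""
    rhs = hovercraft_rhs(state, u)
    return tuple(s + dt * f for s, f in zip(state, rhs))
```

Equal thrusts accelerate the hovercraft along its heading without turning it, while a thrust difference changes only the angular velocity within a single step.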

To formulate this problem so that we can use the proposed method, we discretize the dynamics of the system using the Euler discretization scheme. After introducing a new state variable $\xi := (y_1, y_2, \theta, \dot{y}_1, \dot{y}_2, \dot{\theta})^T$ and a control variable $u := (u_1, u_2)^T$, we can formulate the following optimal control problem:

$$\begin{aligned} \min_{\xi_0, \dots, \xi_N,\; u_0, \dots, u_{N-1}} \quad & \sum_{n=0}^{N-1} \left[ \|\xi_n\|_Q^2 + \|u_n\|_R^2 \right] + \|\xi_N\|_S^2 \\ \text{s.t.} \quad & \xi_0 = \bar{\xi}, \\ & \xi_{n+1} = \phi(\xi_n, u_n) \quad \forall n = 0, \dots, N-1, \\ & (\xi_0, \dots, \xi_N, u_0, \dots, u_{N-1}) \in \tilde{\Omega}, \end{aligned} \tag{26}$$

where $\phi(\cdot, \cdot)$ represents the discretized dynamics and the constraint set $\tilde{\Omega}$ can be easily deduced from (25). By introducing a slack variable $s$, the objective of (26) can be bounded through the convex constraint

$$\sum_{n=0}^{N-1} \left[ \|\xi_n\|_Q^2 + \|u_n\|_R^2 \right] + \|\xi_N\|_S^2 \le s. \tag{27}$$
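For a given control sequence, the objective of (26) can be evaluated by rolling the discretized dynamics forward; any slack value $s$ at least as large as that number satisfies (27). The sketch below assumes diagonal weight matrices and the convention $\|v\|_W^2 = v^T W v$; the names `rollout_cost` and `weighted_sq_norm` and the one-dimensional toy dynamics are illustrative assumptions, not the paper's implementation.

```python
def weighted_sq_norm(vec, weights):
    """v^T W v for a diagonal weight matrix W = diag(weights)."""
    return sum(w * v * v for v, w in zip(vec, weights))


def rollout_cost(xi0, controls, phi, q_diag, r_diag, s_diag):
    """Simulate xi_{n+1} = phi(xi_n, u_n) and evaluate the objective of (26).

    Returns the accumulated cost and the terminal state.
    """
    xi = tuple(xi0)
    cost = 0.0
    for u in controls:
        cost += weighted_sq_norm(xi, q_diag) + weighted_sq_norm(u, r_diag)
        xi = phi(xi, u)
    # Terminal penalty ||xi_N||_S^2.
    return cost + weighted_sq_norm(xi, s_diag), xi


# Hypothetical scalar dynamics, used only to exercise the function.
def toy_phi(xi, u):
    return (xi[0] + 0.1 * u[0],)


cost, xi_end = rollout_cost((1.0,), [(-1.0,), (-1.0,)], toy_phi, [2.0], [1.0], [3.0])
print(cost, xi_end)
```

Minimizing $s$ subject to (27) pushes $s$ down to this cost value, which is how the nonlinear objective ends up as a linear one.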

Fig. 3 Trajectory of the hovercraft after t = 9.5 s (left; x-axis (m) versus y-axis (m)) and control input profiles u1(t) and u2(t) over time [s] (right)

With a variable $x := (s, \xi_0^T, \dots, \xi_N^T, u_0^T, \dots, u_{N-1}^T)^T$, we can transform (26) into $P(\bar{\xi})$ with the objective function $c^T x = s$. Note that $\bar{\xi}$ is an online parameter. It plays the role of $\xi_k$ in the RTSCP algorithm along the moving horizon (see Sect. 2). We implemented the RTSCP algorithm using a primal-dual interior point method for solving the convex subproblem $P_{\mathrm{cvx}}(x^{k-1}; \xi_k)$. We performed a simulation using the same data as in [12]: $m = 0.974$ kg, $I = 0.0125$ kg m², $r = 0.0485$ m, $\underline{u} = -0.121$ N, $\bar{u} = 0.342$ N, $\underline{y}_1 = \underline{y}_2 = -2$ m, $\bar{y}_1 = \bar{y}_2 = 2$ m, $Q = \mathrm{diag}(5, 10, 0.1, 1, 1, 0.01)$, $S = \mathrm{diag}(5, 15, 0.05, 1, 1, 0.01)$, $R = \mathrm{diag}(0.01, 0.01)$ and the initial condition $\bar{\xi}_0 = \xi(0) = (-0.38, 0.30, 0.052, 0.0092, 0.0053, 0.002)^T$. Figure 3 shows the results of the simulation, where a sampling time of $\Delta t = 0.05$ s and $N = 15$ are used. The stopping condition used for the simulation is $\|y(t)\| \le 0.01$.

Acknowledgements The authors would like to thank the anonymous referees for their comments and suggestions that helped to improve the paper. This research was supported by Research Council KUL: CoE EF/05/006 Optimization in Engineering (OPTEC), GOA AMBioRICS, IOF-SCORES4CHEM, several PhD/postdoc & fellow grants; the Flemish Government via FWO: PhD/postdoc grants, projects G.0452.04, G.0499.04, G.0211.05, G.0226.06, G.0321.06, G.0302.07, G.0320.08 (convex MPC), G.0558.08 (Robust MHE), G.0557.08, G.0588.09, research communities (ICCoS, ANMMM, MLDM) and via IWT: PhD Grants, McKnow-E, Eureka-Flite+; EU: ERNSI; FP7-HD-MPC (Collaborative Project STREP-grantnr. 223854); Contract Research: AMINAL; Helmholtz Gemeinschaft: viCERP; Austria: ACCM; and the Belgian Federal Science Policy Office: IUAP P6/04 (DYSCO, Dynamical systems, control and optimization, 2007–2011).

References

1. L.T. Biegler: Efficient solution of dynamic optimization and NMPC problems. In: F. Allgöwer and A. Zheng (eds), Nonlinear Predictive Control, vol. 26 of Progress in Systems Theory, 219–244, Basel Boston Berlin, 2000.
2. L.T. Biegler and J.B. Rawlings: Optimization approaches to nonlinear model predictive control. In: W.H. Ray and Y. Arkun (eds), Proc. 4th International Conference on Chemical Process Control - CPC IV, 543–571. AIChE, CACHE, 1991.
3. H.G. Bock, M. Diehl, D.B. Leineweber, and J.P. Schlöder: A direct multiple shooting method for real-time optimization of nonlinear DAE processes. In: F. Allgöwer and A. Zheng (eds), Nonlinear Predictive Control, vol. 26 of Progress in Systems Theory, 246–267, Basel Boston Berlin, 2000.
4. M. Diehl: Real-Time Optimization for Large Scale Nonlinear Processes. vol. 920 of Fortschr.-Ber. VDI Reihe 8, Meß-, Steuerungs- und Regelungstechnik, VDI Verlag, Düsseldorf, 2002.
5. M. Diehl, H.G. Bock, and J.P. Schlöder: A real-time iteration scheme for nonlinear optimization in optimal feedback control. SIAM J. on Control and Optimization, 43(5):1714–1736, 2005.
6. M. Diehl, H.G. Bock, J.P. Schlöder, R. Findeisen, Z. Nagy, and F. Allgöwer: Real-time optimization and nonlinear model predictive control of processes governed by differential-algebraic equations. J. Proc. Contr., 12(4):577–585, 2002.
7. M. Diehl, R. Findeisen, and F. Allgöwer: A stabilizing real-time implementation of nonlinear model predictive control. In: L. Biegler, O. Ghattas, M. Heinkenschloss, D. Keyes, and B. van Bloemen Waanders (eds), Real-Time and Online PDE-Constrained Optimization, 23–52. SIAM, 2007.
8. M. Diehl, R. Findeisen, F. Allgöwer, H.G. Bock, and J.P. Schlöder: Nominal Stability of the Real-Time Iteration Scheme for Nonlinear Model Predictive Control. IEE Proc.-Control Theory Appl., 152(3):296–308, 2005.
9. A. Helbig, O. Abel, and W. Marquardt: Model Predictive Control for On-line Optimization of Semi-batch Reactors. Pages 1695–1699, Philadelphia, 1998.
10. T. Ohtsuka: A continuation/GMRES method for fast computation of nonlinear receding horizon control. Automatica, 40(4):563–574, 2004.
11. S.M. Robinson: Strongly regular generalized equations. Mathematics of Operations Research, 5(1):43–62, 1980.
12. H. Seguchi and T. Ohtsuka: Nonlinear Receding Horizon Control of an Underactuated Hovercraft. International Journal of Robust and Nonlinear Control, 13(3–4):381–398, 2003.
13. V.M. Zavala and L.T. Biegler: The Advanced Step NMPC Controller: Optimality, Stability and Robustness. Automatica, 45:86–93, 2009.

SuperQuant Financial Benchmark Suite for Performance Analysis of Grid Middlewares

Abhijeet Gaikwad, Viet Dung Doan, Mireille Bossy, Françoise Baude, and Frédéric Abergel

Abstract Pricing and hedging of higher order derivatives such as multidimensional (up to 100 underlying assets) European and first generation exotic options represent mathematically complex and computationally intensive problems. Grid computing promises to give the capability to handle such intense computations. With several Grid middleware solutions available for gridifying traditional applications, it is cumbersome to select an ideal candidate for developing financial applications that can cope with the time-critical computational demand of complex pricing requests. In this paper we present the SuperQuant Financial Benchmark Suite to evaluate and quantify the overhead imposed by a Grid middleware on the throughput of the system and on the turnaround times for computation. This approach is a step towards producing a middleware independent, reproducible, comparable, self-sufficient and fair performance analysis of Grid middlewares. The result of such a performance analysis can be used by middleware vendors to find the bottlenecks and problems in their design and implementation of the system, and by financial application developers to verify the implementation of their financial algorithms. In this paper we explain the motivation and the details of the proposed benchmark suite. As a proof of concept, we utilize the benchmarks in an international Grid programming contest and present the results of initial experiments.

V. D. Doan, M. Bossy, F. Baude
INRIA Sophia Antipolis - Méditerranée – Université de Nice - Sophia Antipolis – I3S CNRS

A. Gaikwad, F. Abergel
Laboratoire de Mathématiques Appliquées aux Systèmes – Ecole Centrale de Paris

H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_9, © Springer-Verlag Berlin Heidelberg 2012


1 Introduction

Over the past few decades financial engineering has become a critical discipline and has gained a strategic reputation of its own. Financial mathematicians keep coming up with novel and complex financial products and numerical computing techniques, which often increase the volume of data or the computational time while posing critical time constraints for transactional processing. Generally, Monte Carlo (MC) simulation based methods are utilized to overcome typical problems like the “curse of dimensionality” (e.g. integration over a high dimensional space) [4]. Despite the ease of numerics, MC simulations come at the cost of tremendous computational demand in addition to slow convergence rates. However, advances in computer architectures like multi-core, many-core, General Purpose Graphics Processing Units (GPGPUs) and their macro forms like clusters and federated Grids have made such MC simulations a handy tool for financial engineers. Financial institutions are using Grid computing to perform more time critical computations for competitive advantage. With this unprecedented computational capacity, running overnight batch processes for risk management or middle-office functions to re-evaluate whole portfolios of products has almost become passé.

Grid middleware is what makes Grid computing work and easier to work with. It provides abstractions for core functionalities like authentication across a large number of resources, authorization, resource matchmaking, data transfer, monitoring and fault-tolerance mechanisms in order to account for failure of resources etc. No robust financial service operation can be achieved without paying great attention to such issues. Current Grid middleware had its beginning in the Condor Project¹ and the Globus Alliance.² Recently, we have seen an upsurge of academic and commercial middleware providers such as gLite,³ ProActive/GCM Parallel Suite,⁴ Alchemi .NET Grid computing framework,⁵ and KAAPI/TakTuk.⁶
¹ http://www.cs.wisc.edu/condor/
² http://www.globus.org/
³ http://glite.web.cern.ch/glite/
⁴ http://proactive.inria.fr/
⁵ http://sourceforge.net/projects/alchemi/
⁶ http://kaapi.gforge.inria.fr/

Now the question is which middleware to choose for gridifying financial applications. An obvious way is to devise a set of benchmarks and put different implementations through their paces. The middleware that results in the fastest computation could be declared the winner. For this, one would need a standard, well defined benchmark which would represent a wide set of financial algorithms, for instance MC based methods, and could also generate enough load on the middleware under test.

Benchmarks provide a commonly accepted basis for performance evaluation of software components. Performance analysis and benchmarking, however, are relatively young areas in Grid computing compared to benchmarks designed for evaluating computer architecture. Traditionally, performance of parallel computer systems has been evaluated by strategically creating benchmark induced load on the system. Typically, such benchmarks comprise codes and workloads that may represent varied computation and are developed with different programming paradigms. Some examples are STREAM,⁷ LINPACK,⁸ MPI Benchmarks,⁹ SPEC¹⁰ and the most popular, the NAS Parallel Benchmarks.¹¹ A key issue, however, is whether these benchmarks can be used “as is” in Grid settings. The adoption of these benchmarks may raise several fundamental questions about their applicability and about ways of interpreting the results. Inherently, the Grid is a complex integration of several functionally diverse components, which may hinder the evaluation of any individual component such as the middleware. Furthermore, in order to allow fair evaluations, any benchmark would have to account for heterogeneity of resources, the presence of virtual organizations and their diverse resource access policies, and dynamicity due to the inherently shared nature of the Grid. Such issues in turn have led to broader implications for the methodologies used for evaluating middlewares, as discussed in [1, 7, 13, 15]. In our work, however, for the sake of simplicity we assume the benchmarks are run on Grid nodes in isolation. Thus, we primarily focus on quantifying the performance of financial applications, achievable scalability, ease of deployment across a large number of heterogeneous resources and their efficient utilization.

The goal of our work presented in this paper is to design and develop the SuperQuant Financial Benchmark Suite, a tool for researchers who wish to investigate various aspects of the usage of Grid middlewares using well-understood benchmark kernels. The availability of such kernels can enable the characterization of factors that affect application performance, the quantitative evaluation of different middleware solutions, scalability of financial algorithms ...

The rest of this paper is organized as follows: in Sect. 2 we discuss the motivation behind the SuperQuant Financial Benchmark Suite and propose guidelines for designing such a benchmark. In Sect. 3 we describe the components of the benchmark suite. Section 4 presents the preliminary benchmark usage in a Grid Programming Contest. We conclude in Sect. 5.

⁷ http://www.gridstream.org/
⁸ http://www.netlib.org/benchmark/
⁹ http://hcl.ucd.ie/project/mpiblib
¹⁰ http://www.spec.org/mpi2007/press/release.html
¹¹ http://www.nas.nasa.gov/Resources/Software/npb.html

2 SuperQuant Financial Benchmark Suite

In order to produce verifiable, reproducible and objectively comparable results, any middleware benchmark must follow the general rules of scientific experimentation. Such tools must provide a way of conducting reproducible experiments to evaluate performance metrics objectively, and to interpret benchmark results in a desirable context. The financial application developers should be able to generate metrics that quantify the performance capacity of Grid middleware through measurements of deployability, scalability, computational capacity etc. Such metrics can provide a basis for performance tuning of the application or the middleware. Alternatively, the middleware providers could utilize such benchmarks to make necessary problem specific software design changes. Hence, in order to formalize efforts to design and evaluate any Grid middleware, in this paper we present our SuperQuant Financial Benchmark Suite. Some other considerations for the development of these benchmarks are described below; they largely follow the design guidelines of the NAS benchmark suite [2]:

• Benchmarks must be conceptually simple and easy to understand for both the financial and the Grid computing community. Additionally, such a benchmark should also be representative of typical challenging problems in the target application field; in this work, we focus on computational finance. Monte Carlo simulation, for instance, has become an essential tool for financial engineering and ideally represents a wide range of problems in the field. In addition, MC simulation is fairly easy to parallelize and hence can be treated as a potential candidate for a benchmark suite.
• Benchmarks must be “generic” and should not favor any specific middleware. Some middlewares provide various high level programming constructs, such as tailored APIs like a Monte Carlo API or provision for parallel random number generators. A benchmark definition should not assume any such features.
• The correctness of results and performance figures must be easily verifiable. This requirement implies that both input and output data sets must be limited and well defined. Since we target financial applications, we also need to consider real world trading and computation scenarios and the data involved therewith. The problem has to be specified in sufficient detail and the required output has to be brief yet detailed enough to certify that the problem has been solved correctly.
• The problem size and runtime requirements must be easily adjustable to accommodate new middlewares or systems with different functionalities. The problem size should be large enough to generate a considerable amount of computation and communication. In the kernel presented in this paper, we primarily focus on the computational load, while future benchmark kernels may impose communication as well as data volume loads.
• The benchmarks must be readily redistributable.

A financial engineer implementing the benchmarks with a given Grid middleware is expected to solve the problem in the most appropriate way for the given computing infrastructure. The choice of APIs, algorithms, parallel random number generators, benchmark processing strategies and resource allocation is left to the discretion of this engineer. The languages used for programming financial systems are mostly C, C++ and Java. Most of the Grid middlewares are available in these languages, and the application developers are free to utilize the language constructs that they think give the best possible performance, or that meet any other requirements imposed by business decisions, on the particular infrastructure available at their organization.

3 Components of SuperQuant Financial Benchmark Suite

Our benchmark suite consists of three major components: (1) an embarrassingly parallel kernel, (2) input/output data and Grid metric descriptors, and (3) an output evaluator. Each of these components is briefly described in the following sections.

3.1 Embarrassingly Parallel Kernel

We have proposed a relatively “simple” computational kernel which utilizes a batch of high dimensional Vanilla and Barrier options. The aim of this benchmark is to compute the prices and the Greeks (i.e. hedging values) of the maximum possible number of options within a definite time interval, using Monte Carlo simulation based methods. Although the task is embarrassingly parallel, the Monte Carlo simulations it requires demand significant computational power in order to achieve acceptable accuracy of the computed values. The necessary financial formulations, pseudocodes and an exemplary parallel version of an MC based solution are provided with the benchmark suite. A detailed description of the proposed kernel can be found in ([9], Sect. 1.1.2). The definitions of financial terms in this section can be found in common textbooks [11, 14], although the reader may find the following information self-explanatory.
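The "maximum number of options within a definite time interval" objective amounts to a time-boxed loop over the batch. The sketch below is a sequential illustration only: `run_time_boxed`, the `solve` callback and the injectable `clock` are hypothetical names, and a real submission would distribute the TestCases over Grid nodes through its middleware.

```python
import time


def run_time_boxed(test_cases, solve, budget_seconds, clock=time.monotonic):
    """Solve test cases in order until the time budget is used up.

    A case is started only if the deadline has not passed yet; a case that
    starts just before the deadline is still allowed to finish.
    """
    deadline = clock() + budget_seconds
    results = []
    for case in test_cases:
        if clock() >= deadline:
            break
        results.append(solve(case))
    return results


# Illustrative run with a trivial "solver" and a generous budget.
done = run_time_boxed([1, 2, 3, 4], lambda case: case * case, budget_seconds=5.0)
print(len(done), "cases finished")
```

Passing the clock as an argument keeps the loop testable without waiting for real time to elapse.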

3.1.1 European Option Pricing

The Black–Scholes (BS) model describes the evolution of the prices of a basket of assets through a system of stochastic differential equations (SDEs) [12],

$$dS_t^i = S_t^i (r - \delta_i)\,dt + S_t^i \sigma_i\,dB_t^i, \quad i = 1, \dots, d, \tag{1}$$

where
• $S = \{S^1, \dots, S^d\}$ is a basket of $d$ assets.
• $r$ is the constant interest rate for every maturity date and at any time.
• $\delta = \{\delta_1, \dots, \delta_d\}$ are the constant dividend rates.
• $B = \{B^1, \dots, B^d\}$ is a correlated $d$-dimensional Brownian motion (BM).
• $\sigma = \{\sigma_1, \dots, \sigma_d\}$ is a constant volatility vector.
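Because all coefficients in (1) are constant, the terminal prices can be sampled exactly as $S_T^i = S_0^i \exp\big((r - \delta_i - \sigma_i^2/2)T + \sigma_i B_T^i\big)$, where the correlated increments $B_T$ are obtained from independent normals through a Cholesky factor of the correlation matrix. The following sketch uses only the Python standard library; the function names and the two-asset example data are illustrative assumptions and are not part of the benchmark suite.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_terminal_prices(s0, r, delta, sigma, chol, T, rng):
    """Draw one sample of (S_T^1, ..., S_T^d) under model (1)."""
    d = len(s0)
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]  # independent N(0, 1) draws
    # Correlate the draws: z = L g has the requested correlation matrix.
    z = [sum(chol[i][k] * g[k] for k in range(i + 1)) for i in range(d)]
    return [
        s0[i] * math.exp((r - delta[i] - 0.5 * sigma[i] ** 2) * T
                         + sigma[i] * math.sqrt(T) * z[i])
        for i in range(d)
    ]


# Hypothetical two-asset basket with correlation 0.5.
rng = random.Random(42)
chol = cholesky([[1.0, 0.5], [0.5, 1.0]])
print(sample_terminal_prices([100.0, 100.0], 0.05, [0.0, 0.0], [0.2, 0.2], chol, 1.0, rng))
```

The factorization is computed once and reused for every path. Barrier options additionally need the whole trajectory, so they require stepping through a time grid instead of a single terminal draw.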

A European option is a contract which can be exercised only at a fixed future date $T$ with a fixed price $K$. A call (or put) option gives the option holder the right (not the obligation) to buy (or sell) the underlying asset at the date $T$. At $T$, an exercised option contract will pay to the option holder a payoff $\Phi(f(S_T))$, which depends only on the underlying asset price at the maturity date $S_T$ (for a Vanilla option), or $\Phi(f(S_t), t \in [0, T])$, which depends on the entire trajectory of the underlying asset price $S_t$ (for a Barrier option). The definition of $f(\cdot)$ is given by the option’s payoff type (Arithmetic Average, Maximum, or Minimum) [11, 14]. According to the Arbitrage Pricing Theory [12], the fair price $V$ of the option contract is given by the expression $V(S_0, 0) = E\left[e^{-rT} \Phi(f(S_T))\right]$. The expectation is calculated by computing the mean with MC simulation based methods [10]; a parallel approach can be found in [5].
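The expectation in the pricing formula can be estimated by averaging discounted payoffs over simulated terminal prices, with a 95% confidence half-width of about $1.96\,\hat{s}/\sqrt{n}$, where $\hat{s}$ is the sample standard deviation. The sketch below prices an arithmetic-average basket call under two simplifying assumptions that the benchmark does not make: the assets are independent and pay no dividends. The function name is hypothetical.

```python
import math
import random


def mc_basket_call(s0, strike, r, sigma, T, n_paths, seed=0):
    """Monte Carlo price and 95% half-width of an arithmetic-average basket call.

    Assumes independent assets and zero dividends (a simplification).
    """
    rng = random.Random(seed)
    d = len(s0)
    discount = math.exp(-r * T)
    sqrt_T = math.sqrt(T)
    total = 0.0
    total_sq = 0.0
    for _ in range(n_paths):
        basket = 0.0
        for s, vol in zip(s0, sigma):
            z = rng.gauss(0.0, 1.0)
            basket += s * math.exp((r - 0.5 * vol * vol) * T + vol * sqrt_T * z)
        payoff = discount * max(basket / d - strike, 0.0)
        total += payoff
        total_sq += payoff * payoff
    mean = total / n_paths
    variance = (total_sq - n_paths * mean * mean) / (n_paths - 1)
    half_width = 1.96 * math.sqrt(variance / n_paths)
    return mean, half_width


# Hypothetical three-asset basket.
price, half_width = mc_basket_call([100.0] * 3, 100.0, 0.05, [0.2] * 3, 1.0, 20000)
print(price, half_width)
```

The half-width shrinks like $1/\sqrt{n}$, so each extra digit of accuracy costs roughly a hundred times more paths; this is what makes the kernel computationally demanding despite its simplicity.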

3.1.2 European Greeks Hedging

The Greeks represent the sensitivities of the option price with respect to a change in the market parameters on which the option price depends, such as the time remaining to maturity, the volatility, or the interest rate. Usually the Greeks are derivatives, possibly of higher order, that are computed using finite difference methods [10, 11]. Greeks are not observed in the real-time market, but they are important information that needs to be computed accurately. These computations are very expensive. A detailed explanation of Greeks such as Delta ($\Delta$), Gamma ($\Gamma$), Rho ($\rho$) and Theta ($\Theta$) can be found in [11].

The core benchmark kernel consists of a batch of 1,000 well calibrated TestCases. Each TestCase is a high-dimensional European option with up to 100 underlying assets with the necessary attributes like spot prices, payoff types, time to maturity, volatility, and other market parameters. In order to constitute an option, the underlying assets are chosen from a pool of companies listed in the equity S&P500 index,¹² while the volatility of each asset and its dividend rate are taken from CBOE.¹³ The composition of the batch is as follows:
• 500 TestCases of 10-dimensional European options with 2 years time to maturity
• 240 TestCases of 30-dimensional European options with 9 months time to maturity
• 240 TestCases of 50-dimensional European options with 6 months time to maturity
• 20 TestCases of 100-dimensional European options with 3 months time to maturity

Thus, the objective of the benchmark is pricing and hedging of the maximum number of TestCases by implementing the algorithms using a given Grid middleware.
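A finite-difference Greek is obtained by re-pricing the option with a bumped input. The sketch below estimates Delta for a single-asset call with a central difference, reusing the same normal draws for both bumped prices (common random numbers) so that the simulation noise largely cancels in the difference. The one-asset setting, the function names and the bump size are illustrative assumptions; the benchmark itself targets baskets of up to 100 assets.

```python
import math
import random


def discounted_call_mean(s0, strike, r, sigma, T, normals):
    """Monte Carlo price of a one-asset European call for fixed normal draws."""
    drift = (r - 0.5 * sigma ** 2) * T
    vol = sigma * math.sqrt(T)
    total = 0.0
    for z in normals:
        s_T = s0 * math.exp(drift + vol * z)
        total += max(s_T - strike, 0.0)
    return math.exp(-r * T) * total / len(normals)


def delta_central(s0, strike, r, sigma, T, bump, n_paths, seed=0):
    """Central finite-difference Delta using common random numbers."""
    rng = random.Random(seed)
    normals = [rng.gauss(0.0, 1.0) for _ in range(n_paths)]
    up = discounted_call_mean(s0 + bump, strike, r, sigma, T, normals)
    down = discounted_call_mean(s0 - bump, strike, r, sigma, T, normals)
    return (up - down) / (2.0 * bump)


print(delta_central(100.0, 100.0, 0.05, 0.2, 1.0, bump=1.0, n_paths=20000))
```

Each first-order Greek costs two extra pricing runs per bumped parameter, and a second-order Greek such as Gamma needs a third, which is why hedging multiplies the cost of every TestCase.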

¹² http://www2.standardandpoors.com
¹³ http://www.cboe.com/

3.2 Input/Output Data and Grid Metrics Format

To facilitate processing, exchanging and archiving of input data, output data and Grid related metrics, we define relevant XML data descriptors. The TestCases required by the kernel and the “reference” results are also included in the benchmark suite.
• Input AssetPool: represents the database of 250 assets required to construct a basket (collection) option of assets.
• Input CorrelationMatrix: defines a correlation matrix of the assets in AssetPool. The provided matrix is positive-definite with diagonal values 1 and correlation coefficients in the interval $[-1, 1]$.
• Input TestCases: defines a set of TestCases and input parameters needed by the pricing and hedging algorithm discussed above. Each TestCase includes parameters such as an option, which is a subset of AssetPool, a submatrix of CorrelationMatrix, type of payoff, type of option, barrier value if needed, interest rate, maturity date, etc.
• Output Results: defines a set of Results, each of which consists of the Price and Greeks of an individual TestCase and the time Metrics required to compute each output value.
• Output Grid Metrics: defines the total time required for the entire computation.
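The stated properties of CorrelationMatrix (unit diagonal, entries in $[-1, 1]$, positive definiteness) are easy to check before a run, and a TestCase only needs the submatrix that belongs to its assets. The sketch below works on plain nested lists; it does not parse the suite's XML descriptors, whose schema is not reproduced here, and the function names and the 3 x 3 example matrix are hypothetical.

```python
import math


def is_valid_correlation(matrix, tol=1e-12):
    """Check symmetry, unit diagonal, entries in [-1, 1] and positive definiteness."""
    n = len(matrix)
    for i in range(n):
        if len(matrix[i]) != n or abs(matrix[i][i] - 1.0) > tol:
            return False
        for j in range(n):
            if abs(matrix[i][j] - matrix[j][i]) > tol or abs(matrix[i][j]) > 1.0 + tol:
                return False
    # Positive definiteness: a Cholesky factorization must succeed.
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - s
                if pivot <= 0.0:
                    return False
                low[i][j] = math.sqrt(pivot)
            else:
                low[i][j] = (matrix[i][j] - s) / low[j][j]
    return True


def submatrix(matrix, indices):
    """Correlation matrix restricted to the assets of one TestCase."""
    return [[matrix[i][j] for j in indices] for i in indices]


pool_corr = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]]
print(is_valid_correlation(pool_corr), submatrix(pool_corr, [0, 2]))
```

Any principal submatrix of a positive-definite matrix is again positive definite, so a basket drawn from a valid pool matrix can be simulated without re-validation.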

3.3 Output Evaluator

The output evaluator is a tool to compare the results computed by different implementations of the benchmark kernel TestCases with the “reference” results provided in the suite.

3.3.1 Evaluation Criteria

In order to measure the precision, the results are estimated with a confidence interval of 95% [10]. We set the tolerable error in computing the results to $10^{-3}$. Since the accuracy of the computed results depends on the spot prices of the underlying assets, we consider relative errors with respect to the “reference” results. These reference results are computed with a sufficiently large number of MC simulations (more than $10^6$ simulations) [10], in order to achieve a narrower confidence interval. The Output Evaluator employs a point based scheme to grade the results and also provides a detailed analysis of the points gained per TestCase. For a further description of the evaluation criteria, see [8].
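The relative-error criterion can be written down in a few lines. The sketch below reads the $10^{-3}$ tolerance as a bound on the relative error with respect to the reference value, which is an interpretation of the text above; the function names are hypothetical, and the point based grading scheme of the actual Output Evaluator (described in [8]) is not reproduced.

```python
def relative_error(computed, reference):
    """|computed - reference| / |reference|; the reference must be non-zero."""
    if reference == 0.0:
        raise ValueError("relative error is undefined for a zero reference value")
    return abs(computed - reference) / abs(reference)


def within_tolerance(computed, reference, tol=1e-3):
    """True if the computed value matches the reference up to the relative tolerance."""
    return relative_error(computed, reference) <= tol


# Hypothetical price of 10.4512 against a reference of 10.4506.
print(relative_error(10.4512, 10.4506), within_tolerance(10.4512, 10.4506))
```

Using a relative rather than an absolute error keeps the criterion comparable across TestCases whose prices differ by orders of magnitude.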

3.3.2 “Reference” Results Validation

The “reference” results provided in the benchmark suite are not analytical results; they are computed using MC based methods. Pricing or hedging of high-dimensional European options is not possible with a standard analytical BS formula [6]. The question of the correctness of the “reference” results led us to investigate methods to validate the results computed by simulation. We observed that in some specific cases we can analytically reduce a basket of assets to a one-dimensional “pseudo” asset. The option price on this “pseudo” asset can be computed by using the BS formula. This way we can compare simulated and analytical results. Further details of the reduction techniques are given in our technical report [8]. To highlight the usefulness of this approach, we provide a numerical example below.

Numerical Example: Consider a call/put Geometric Average (GA) option on 100 independent assets with prices modeled by the SDEs (1). The parameters are given as $S_0^i = 100$, $i = 1, \dots, 100$, $K = 100$, $r = 0.0$, $\delta_i = 0.0$, $\sigma = 0.2$ and $T = 1$ year. The basket option is simulated using $10^6$ MC simulations. The “pseudo” asset is $\Sigma_t = \left(\prod_{i=1}^{d} S_t^i\right)^{1/d}$, and it is the solution of the one-dimensional SDE $d\Sigma_t / \Sigma_t = \tilde{\mu}\,dt + \tilde{\sigma}\,dZ_t$, where $\tilde{\mu} = r + \frac{\sigma^2}{2d} - \frac{\sigma^2}{2}$, $\tilde{\sigma} = \frac{\sigma}{\sqrt{d}}$ and $Z_t$ is a Brownian motion. The parameters of $\Sigma$ are given as $\Sigma_0 = 100$, $\tilde{\mu} = -0.0198$, $\tilde{\sigma} = 0.02$. We are interested in comparing the estimated option price $V$ on $d$ assets with the analytical “pseudo” one $\tilde{V}$ on $\Sigma$. We denote the absolute error by $\Delta V = |V - \tilde{V}|$; the relative error is then computed as $\Delta V / \tilde{V}$. Table 1 presents the numerical results. The first column gives the estimated option prices and their 95% confidence intervals. The second column gives the analytical option prices. The last two columns show the absolute and relative errors. As can be observed, the errors are very small. We can reduce the error for the call option price by increasing the number of MC simulations.

Table 1 Call/Put price of a GA option on 100 assets and of the “pseudo” one

        Price V (95% CI)     “Pseudo” price Ṽ    ΔV (10⁻⁴)    Relative error (%)
Call    0.16815 (0.00104)    0.16777             3.8          0.22
Put     2.12868 (0.00331)    2.12855             1.2          0.01
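The reduction to a “pseudo” asset can be checked with a few lines of arithmetic. Since $\Sigma_T$ is lognormal with forward value $\Sigma_0 e^{\tilde{\mu} T}$ and log-volatility $\tilde{\sigma}\sqrt{T}$, the discounted expected payoff has a closed form of Black–Scholes type. The sketch below is one way to write this formula, assuming identical, independent, dividend-free assets as in the example; the function names are hypothetical, and the exact formula used for the published numbers is the one given in [8].

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def geometric_basket_prices(s0, strike, r, sigma, T, d):
    """Call and put on the geometric average of d identical independent assets."""
    sigma_t = sigma / math.sqrt(d)                       # pseudo-asset volatility
    mu_t = r + sigma ** 2 / (2.0 * d) - sigma ** 2 / 2.0  # pseudo-asset drift
    forward = s0 * math.exp(mu_t * T)                     # E[Sigma_T]
    vol = sigma_t * math.sqrt(T)
    d1 = (math.log(forward / strike) + 0.5 * vol ** 2) / vol
    d2 = d1 - vol
    discount = math.exp(-r * T)
    call = discount * (forward * norm_cdf(d1) - strike * norm_cdf(d2))
    put = discount * (strike * norm_cdf(-d2) - forward * norm_cdf(-d1))
    return call, put, mu_t, sigma_t


# Parameters of the numerical example above.
call, put, mu_t, sigma_t = geometric_basket_prices(100.0, 100.0, 0.0, 0.2, 1.0, 100)
print(call, put, mu_t, sigma_t)
```

With the parameters of the example, the drift and volatility come out as $-0.0198$ and $0.02$, and the call and put prices should land close to the values reported in Table 1.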

4 Proof of Concept: The V Grid Plugtest and Contest

As a proof of concept, we used the SuperQuant Benchmark Suite for the 2008 SuperQuant Monte Carlo Challenge organized as a part of the V GRID Plugtest¹⁴ at INRIA-Sophia Antipolis. The details of the contest and the benchmark input data can be found on the Challenge website.¹⁵ Each participant was given exclusive one-hour access for evaluating the benchmark on two academic Grids, Grid’5000¹⁶ and InTrigger,¹⁷ which consisted of 5,000 computational cores geographically distributed across France and Japan.

Fig. 1 Final results of the 2008 SuperQuant Monte Carlo Challenge

4.1 Challenge Results

Figure 1 presents the final results of the Challenge. The participants primarily used two middlewares: ProActive, an open source Java based Grid middleware, and KAAPI/TAKTUK, which couples KAAPI, a parallel programming kernel, with TAKTUK, a middleware for adaptive deployment. As we can see in Fig. 1, the KAAPI/TAKTUK team was successful in computing the maximum number of TestCases and was also able to deploy the application on a significantly large number of nodes. The other teams used ProActive to implement the benchmark kernel. Both middlewares implement the Grid Component Model (GCM), recently standardized by the ETSI technical committee GRID,¹⁸ for deploying the application over a large number of Grid nodes [3]. The infrastructure descriptors and application descriptors required by GCM were bundled with the benchmark suite. From Fig. 1, we can observe that the benchmarks were not only useful to quantitatively compare two middleware solutions, but also gave the opportunity to evaluate different benchmark implementations using the same middleware. Such a comparison is useful not only to middleware providers but also to Grid application developers.

¹⁴ http://www.etsi.org/plugtests/GRID2008/About_GRID.htm
¹⁵ http://www-sop.inria.fr/oasis/plugtests2008/ProActiveMonteCarloPricingContest.html
¹⁶ https://www.grid5000.fr/mediawiki/index.php/Grid5000:Home
¹⁷ https://www.intrigger.jp/wiki/index.php/InTrigger
¹⁸ http://www.etsi.org/WebSite/Technologies/GRID.aspx

5 Conclusion and Perspectives

In this paper we have presented the SuperQuant Financial Benchmark Suite for performance evaluation and analysis of Grid middlewares in the financial engineering context. We described the preliminary guidelines for designing the benchmark. We also described the benchmark constituents along with a brief overview of the embarrassingly parallel benchmark kernel. As a proof of concept, we also utilized this benchmark in a Grid Programming Contest. Although this is a preliminary proposal for this benchmark, the specification of more complex kernels that can induce inter-cluster communication, high speed I/O requirements, or data processing is necessary for truly understanding the overhead imposed by Grid middlewares in financial applications.

Acknowledgements We thank the organizers of the 2008 SuperQuant Monte Carlo Challenge for their useful support during the contest. We also thank the participants for giving permission to use the contest results to carry out this work. This research has been supported by the French “ANR-CIGC GCPMF” research project, and Grid5000 has been funded by ACI-GRID.

References

1. P. Alexius, B. Elahi, F. Hedman, P. Mucci, G. Netzer, and Z. Shah. A black-box approach to performance analysis of grid middleware. In Euro-Par 2007 Workshops: Parallel Processing, pages 62–71. Springer, 2008.
2. D. Bailey, J. Barton, T. Lasinski, and H. Simon. The NAS Parallel Benchmarks. Technical Report RNR-91-002 Revision 2, NAS Systems Division, NASA Ames Research Center, August 1991.
3. F. Baude, D. Caromel, C. Dalmasso, M. Danelutto, V. Getov, L. Henrio, and C. Pérez. GCM: A grid extension to fractal for autonomous distributed components. Annals of Telecommunications, 64(1):5–24, 2009.
4. R. Bellman. Dynamic programming. Science, 153(3731):34–37, 1966.
5. S. Bezzine, V. Galtier, S. Vialle, F. Baude, M. Bossy, V.D. Doan, and L. Henrio. A Fault Tolerant and Multi-Paradigm Grid Architecture for Time Constrained Problems. Application to Option Pricing in Finance. 2nd IEEE International Conference on e-Science and Grid Computing, Netherlands, December 2006.
6. F. Black and M. Scholes. The pricing of options and corporate liabilities. The Journal of Political Economy, 81(3):637–654, 1973.
7. M.D. Dikaiakos. Grid benchmarking: vision, challenges, and current status. Concurrency and Computation: Practice and Experience, 19(1):89–105, 2007.
8. V.D. Doan, A. Gaikwad, M. Bossy, F. Baude, and F. Abergel. A financial engineering benchmark for performance analysis of grid middlewares. Technical Report 0365, INRIA, 2009.
9. V.D. Doan. Grid computing for Monte Carlo based intensive calculations in financial derivative pricing applications. PhD thesis, University of Nice-Sophia Antipolis, March 2010. http://www-sop.inria.fr/oasis/personnel/Viet_Dung.Doan/thesis/.
10. P. Glasserman. Monte Carlo Methods in Financial Engineering. Springer, 2004.
11. J. Hull. Options, futures, and other derivatives. Prentice Hall, 2000.
12. D. Lamberton and B. Lapeyre. Introduction to stochastic calculus applied to finance. CRC Press, 1996.
13. G. Tsouloupas and M.D. Dikaiakos. Gridbench: A tool for benchmarking grids. In Fourth International Workshop on Grid Computing, pages 60–67, 2003.
14. P. Wilmott. Derivatives: The theory and practice of financial engineering. J. Wiley, 1998.
15. H. Zhu, Y. Zou, and L. Zha. Vegabench: A benchmark tool for grid system software. In Fifth International Conference on Grid and Cooperative Computing Workshops, pages 543–548. IEEE Computer Society, 2006.


A Dimension Adaptive Combination Technique Using Localised Adaptation Criteria

Jochen Garcke

Abstract We present a dimension adaptive sparse grid combination technique for the machine learning problem of regression. A function over a d -dimensional space, which assumedly describes the relationship between the features and the response variable, is reconstructed using a linear combination of partial functions; these may depend only on a subset of all features. The partial functions, which are piecewise multilinear, are adaptively chosen during the computational procedure. This approach (approximately) identifies the ANOVA-decomposition of the underlying problem. We introduce two new localized criteria, one inspired by residual estimators based on a hierarchical subspace decomposition, for the dimension adaptive grid choice and investigate their performance on real data.

1 Introduction

Sparse grids are an approach for efficient high dimensional function approximation. They were introduced under this name for the numerical solution of partial differential equations, although the underlying idea was used first for numerical integration. These approaches are based on a multiscale tensor product basis where basis functions of small importance are omitted. In the form of the combination technique, sparse grids have successfully been applied to the machine learning problems of classification and regression using a regularization network approach [2, 3]. Here the problem is discretized and solved on an a priori chosen sequence of anisotropic grids with uniform mesh sizes in each coordinate direction. The sparse grid solution is then obtained from the solutions on these different grids by linear combination. This results in a non-linear function, while the computational complexity scales

J. Garcke
Technische Universität Berlin, Institut für Mathematik, MA 3-3, Straße des 17. Juni 136, 10623 Berlin, Germany

H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_10, © Springer-Verlag Berlin Heidelberg 2012


only linearly in the number of data. The main difference in comparison to many other machine learning approaches is the choice of basis functions whose anchoring position is independent of the locations of the data. Although sparse grids cope with the curse of dimensionality to some extent, the approach still depends strongly on $d$, the number of dimensions. But typically the importance of a dimension, and the variance within it, vary in real machine learning applications. This can be exploited by using different mesh resolutions for each feature. The degree of interaction between different dimensions also varies; the usage of all dimensions in each partial grid might be unnecessary. In this spirit a so-called dimension adaptive algorithm [5, 7] to construct a generalized sparse grid was recently used for regularized least squares regression [4]; the idea is to choose the grids for the combination technique during the computation instead of defining them a priori. The aim is to attain a function representation for $f(x)$, with $x = (x_1, \dots, x_d)$, of the ANOVA type

$$f(x) = \sum_{\{j_1,\dots,j_q\} \subset \{1,\dots,d\}} c_{j_1,\dots,j_q}\, f_{j_1,\dots,j_q}(x_{j_1},\dots,x_{j_q}),$$

where each $f_{j_1,\dots,j_q}(x_{j_1},\dots,x_{j_q})$ depends only on a subset of size $q$ of the dimensions and may have different refinement levels for each dimension. The computational complexity now depends only on the so-called superposition (or effective) dimension $q$. Originally the overall error reduction was used as an adaptation criterion [4], but the computational effort of this criterion grows quite significantly with the number of grids. In this paper we introduce and investigate two different error indicators which are localized in some sense, since one considers the improvement using subsets of all grids employed, and whose computational effort grows less with the number of partial grids. In the following we describe the regularized regression approach, present the dimension adaptive combination technique, introduce the two new error indicators, and show results on machine learning benchmarks.

2 Dimension Adaptive Combination Technique for Regression

We assume that the relation between predictor and response variables can be described by an (unknown) function $f$ which belongs to some space $V$ of functions defined over the range of the predictor variables. To get a well-posed, uniquely solvable problem we use regularization theory and impose additional smoothness constraints on the solution of the approximation problem. In our regularized least squares approach this results in the variational problem

$$\operatorname*{argmin}_{f \in V} \; \frac{1}{M} \sum_{i=1}^{M} \bigl(f(x_i) - y_i\bigr)^2 + \lambda \|\nabla f\|^2, \tag{1}$$


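for a given dataset $S = \{(x_i, y_i) \in [0,1]^d \times \mathbb{R}\}_{i=1}^{M}$. Note that using the following semi-definite bilinear form

$$\langle u, v \rangle_{\mathrm{RLS}} := \frac{1}{M} \sum_{i=1}^{M} u(x_i)\, v(x_i) + \lambda \langle \nabla u, \nabla v \rangle_2$$

corresponding to (1), the Galerkin equations are

$$\langle f, g \rangle_{\mathrm{RLS}} = \frac{1}{M} \sum_{i=1}^{M} g(x_i)\, y_i, \tag{2}$$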

which hold for the solution $f$ of (1) and all $g \in V$. The discretization to a finite dimensional subspace $V_N$ of the function space $V$ is achieved by the sparse grid combination technique [6]. To get a solution defined on a sparse grid we discretize and solve the problem (1) on a suitable sequence of small anisotropic grids $\Omega_{\mathbf{l}} = \Omega_{l_1,\dots,l_d}$, characterized by an index set $\mathcal{I}$, i.e. $\mathbf{l} \in \mathcal{I}$. These are grids which have different but uniform mesh sizes $h_t$ in each coordinate direction with $h_t = 2^{-l_t}$, $t = 1,\dots,d$. The grid points are numbered using the multi-index $\mathbf{j}$, $j_t = 0,\dots,2^{l_t}$, and have the coordinate $j_t h_t$ in dimension $t$. A finite element approach with piecewise $d$-linear functions

$$\phi_{\mathbf{l},\mathbf{j}}(x) := \prod_{t=1}^{d} \phi_{l_t,j_t}(x_t), \quad j_t = 0,\dots,2^{l_t},$$

on each grid $\Omega_{\mathbf{l}}$, where the one-dimensional basis functions $\phi_{l,j}(x)$ are the hat functions

$$\phi_{l,j}(x) = \begin{cases} 1 - |x/h_l - j|, & x \in [(j-1)h_l,\ (j+1)h_l], \\ 0, & \text{otherwise}, \end{cases}$$

now gives the function space $V_{\mathbf{l}} := \operatorname{span}\{\phi_{\mathbf{l},\mathbf{j}},\ j_t = 0,\dots,2^{l_t},\ t = 1,\dots,d\}$. The sparse grid combination technique solution $f_{\mathcal{I}}$ for a given index set $\mathcal{I}$ is now built via [2, 3, 6, 8]

$$f_{\mathcal{I}}(x) := \sum_{\mathbf{l} \in \mathcal{I}} c_{\mathbf{l}}\, f_{\mathbf{l}}(x), \tag{3}$$

where $f_{\mathcal{I}}$ is an element of a discrete function space defined on a sparse grid, the $f_{\mathbf{l}}$ are partial solutions and the $c_{\mathbf{l}}$ are the corresponding combination coefficients. A general choice of grids was introduced in [7]. One considers an index set $\mathcal{I}$ which only needs to fulfil the following admissibility condition [5]

$$\mathbf{k} \in \mathcal{I} \ \text{and}\ \mathbf{j} \le \mathbf{k} \;\Rightarrow\; \mathbf{j} \in \mathcal{I}, \tag{4}$$

that is, an index $\mathbf{k}$ can only belong to the index set $\mathcal{I}$ if all smaller grids $\mathbf{j}$ belong to it. The combination coefficients, which are related to the inclusion/exclusion principle from combinatorics, depend only on the index set [7, 8].
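The admissibility condition (4) and the index-set-dependent combination coefficients can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, and the coefficient formula $c_{\mathbf{l}} = \sum_{\mathbf{z} \in \{0,1\}^d} (-1)^{|\mathbf{z}|_1} \chi_{\mathcal{I}}(\mathbf{l} + \mathbf{z})$ is the standard inclusion/exclusion form, which the text refers to but does not spell out. These are the classical coefficients, not the optimized ones introduced later.

```python
from itertools import product


def is_admissible(index_set, min_level=0):
    """Check condition (4): the set must be closed under taking smaller indices.

    It suffices to test the direct backward neighbours of every index;
    closure under all smaller indices then follows by induction.
    """
    index_set = set(index_set)
    for k in index_set:
        for t, level in enumerate(k):
            if level > min_level:
                backward = k[:t] + (level - 1,) + k[t + 1:]
                if backward not in index_set:
                    return False
    return True


def combination_coefficients(index_set):
    """Inclusion/exclusion coefficients (assumed standard formula, see lead-in)."""
    index_set = set(index_set)
    d = len(next(iter(index_set)))
    coefficients = {}
    for l in index_set:
        c = 0
        for z in product((0, 1), repeat=d):
            shifted = tuple(a + b for a, b in zip(l, z))
            if shifted in index_set:
                c += (-1) ** sum(z)
        coefficients[l] = c
    return coefficients


# Classical two-dimensional index set: all l >= 0 with |l|_1 <= 2.
classical = {(a, b) for a in range(3) for b in range(3) if a + b <= 2}
coeffs = combination_coefficients(classical)
```

For this classical index set the sketch yields coefficient +1 on the grids with $|\mathbf{l}|_1 = 2$, coefficient -1 on those with $|\mathbf{l}|_1 = 1$, and 0 for the coarsest grid.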


In the original combination technique one considers an a priori chosen index set $\mathcal{I}$ consisting of all $\mathbf{l}$ with $|\mathbf{l}|_1 := l_1 + \dots + l_d \le n$. This results in a sparse grid of refinement level $n$ [2, 3, 6]. The size of an $\Omega_{\mathbf{l}}$ is here of order $O(2^d h_n^{-1})$, while the total number of points used in the combination technique is of order $O(2^d h_n^{-1} \log(h_n^{-1})^{d-1})$, the same as for a sparse grid. The solution obtained this way is the same as the sparse grid solution if the projections into partial spaces commute, or at least of the same approximation order $O(h_n^2 \log(h_n^{-1})^{d-1})$ (if the function has bounded mixed second derivatives) if a series expansion of the error of the form

$$f - f_{\mathbf{l}} = \sum_{m=1}^{d} \sum_{\{j_1,\dots,j_m\} \subset \{1,\dots,d\}} c_{j_1,\dots,j_m}(h_{j_1},\dots,h_{j_m})\, h_{j_1}^2 \cdots h_{j_m}^2$$

exists [6, 8]. But for the machine learning application this does not hold [8]. Instead, combination coefficients which also depend on the function to be represented are employed. They are optimal in the sense that the sum of the partial functions minimizes the error against the sparse grid solution computed directly in the joint function space [3, 8]; this approach is valid for the machine learning setting as well. Note that the approximation properties of the optimized combination technique in relation to a sparse grid are currently unknown. In any case we employ in (3) the optimal coefficients computed according to

$$\begin{bmatrix} \langle f_1, f_1 \rangle_{\mathrm{RLS}} & \cdots & \langle f_1, f_k \rangle_{\mathrm{RLS}} \\ \langle f_2, f_1 \rangle_{\mathrm{RLS}} & \cdots & \langle f_2, f_k \rangle_{\mathrm{RLS}} \\ \vdots & \ddots & \vdots \\ \langle f_k, f_1 \rangle_{\mathrm{RLS}} & \cdots & \langle f_k, f_k \rangle_{\mathrm{RLS}} \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_k \end{bmatrix} = \begin{bmatrix} \|f_1\|_{\mathrm{RLS}}^2 \\ \|f_2\|_{\mathrm{RLS}}^2 \\ \vdots \\ \|f_k\|_{\mathrm{RLS}}^2 \end{bmatrix}, \tag{5}$$

using a suitable numbering of the $f_{\mathbf{l}}$ [3, 8].

A generalization of the original sparse grid combination technique consists in the use of a slightly different level hierarchy. Let us formally define the one-dimensional basis functions $\tilde\phi_{l,j}(x)$ as $\tilde\phi_{-1,0} := 1$, $\tilde\phi_{0,0} := \phi_{0,1}$, and $\tilde\phi_{l,j} := \phi_{l,j}$ for $l \ge 1$, with $\phi_{l,j}$ as before. Note that it holds $\phi_{0,0} = \tilde\phi_{-1,0} - \tilde\phi_{0,0}$. If one builds the tensor product between a constant in one dimension and a $(d-1)$-linear function, the resulting $d$-dimensional function is still $(d-1)$-linear; one gains no additional degrees of freedom. But formally introducing a level $-1$, and using this as the coarsest level in our adaptive procedure described in the next section, will allow us to build a combined function in the ANOVA style; in other words, each partial function might depend only on a subset of all features. The size of each grid $\Omega_{\mathbf{l}}$ is now of order $O(2^{q}(|\mathbf{l}|_1 + (d - q)))$, where $q = \#\{l_i \mid l_i \ge 0\}$.
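As a minimal sketch of how the optimal coefficients in (5) could be obtained, the following Python function solves the linear system for a given Gram matrix of RLS scalar products. It assumes that the entries $\langle f_i, f_j \rangle_{\mathrm{RLS}}$ have already been computed; the function name and the plain Gaussian elimination are illustrative choices, not the implementation used in [3, 8].

```python
def optimal_coefficients(gram):
    """Solve G c = diag(G), where gram[i][j] = <f_i, f_j>_RLS as in (5)."""
    n = len(gram)
    # Augmented matrix [G | rhs] with rhs_i = ||f_i||^2_RLS = G_ii.
    a = [[float(v) for v in gram[i]] + [float(gram[i][i])] for i in range(n)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-14:
            # The bilinear form is only semi-definite, so G may be singular.
            raise ValueError("Gram matrix is (numerically) singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (a[i][n] - s) / a[i][i]
    return coeffs


# Mutually orthogonal partial solutions: every coefficient equals one.
orthogonal = optimal_coefficients([[2.0, 0.0], [0.0, 3.0]])
# Correlated partial solutions: the coefficients are damped.
correlated = optimal_coefficients([[2.0, 1.0], [1.0, 2.0]])
```

If the partial solutions are orthogonal with respect to the RLS scalar product, all coefficients are one; overlapping partial solutions receive smaller weights.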

2.1 Adaptive Grid Choice

Most important is the choice of a suitable index set $\mathcal{I}$. One might be able to use external knowledge of the properties and the interactions of the dimensions


which would allow an a priori choice of the index set. In general the algorithm should choose the grids automatically in a dimension adaptive way during the actual computation. We therefore start with the smallest grid with index $\mathbf{-1} = (-1, \dots, -1)$ (i.e., $\mathcal{I} = \{\mathbf{-1}\}$), which is just a constant function. Step by step we add additional indices such that:

1. The new index set remains admissible.
2. The grid corresponding to the index provides a large reduction in (1).

During each adaptation step one has an outer layer of indices under consideration for inclusion in the sequence, the set of active indices denoted by $\mathcal{A}$. Furthermore there is the set $\mathcal{O} = \mathcal{I} \setminus \mathcal{A}$ of old indices which already belong to the sequence of grids; $\mathcal{O}$ needs to fulfil (4). The backward neighbourhood of an index $\mathbf{k}$ is defined as the set $B(\mathbf{k}) := \{\mathbf{k} - \mathbf{e}_t,\ 1 \le t \le d\}$; the set $F(\mathbf{k}) := \{\mathbf{k} + \mathbf{e}_t,\ 1 \le t \le d\}$ is the forward neighbourhood, with $\mathbf{e}_t$ the unit vector in the $t$-th dimension. To limit the search range we restrict the active set $\mathcal{A}$ to only include indices whose backward neighbours are in the old index set $\mathcal{O}$; in other words, for $\mathbf{k} \in \mathcal{A}$ it holds that $\mathcal{O} \cup \mathbf{k}$ fulfils (4). Note that $\mathcal{A}$ cannot be empty since, at least, an index of the form $(-1, \dots, k, \dots, -1)$ for each coordinate direction is active. In Fig. 1 a few adaptation steps for the two dimensional case are presented. We assume here that the indices $(0,0)$, $(1,0)$, $(1,2)$ and $(2,0)$ are chosen in succession. In each case their forward neighbours are considered: in the first step both are admissible and added to $\mathcal{A}$; in the second and third step both are admissible, but one is not used since the backward neighbour is not in $\mathcal{O}$; in the last step one forward neighbour is not admissible and the other is not used since the backward neighbour is not in $\mathcal{O}$.
In [4] for each candidate index $\mathbf{k}$ from $\mathcal{A}$ one computes $\|f_{\mathcal{O} \cup \{\mathbf{k}\}} - f_{\mathcal{O}}\|_{\mathrm{RLS}}$ to measure the contribution of this grid to the overall solution, i.e., the reduction in the functional (1) for the solution using the set $\mathcal{O} \cup \{\mathbf{k}\}$ in comparison to the current solution using $\mathcal{O}$. Although $\|\cdot\|_{\mathrm{RLS}}$ has to be computed in the additive space of all partial grids, its value can still be computed by using just the partial grids [3]. For the data dependent part one computes for each partial function its value on a given data point and adds these using the combination coefficients. The smoothing term of

Fig. 1 A few steps of the dimension adaptive algorithm. Active indices $\mathbf{i} \in \mathcal{A}$ are shown in dark grey and old indices $\mathbf{i} \in \mathcal{O}$ in light grey. The chosen active index (with the largest error indicator) is shown striped. The arrows indicate the admissible forward neighbours which are added to $\mathcal{A}$. The indices go from $-1$ to 6


the discrete Laplacian can be expressed as a weighted sum over expressions of the form $\langle \nabla f_{\mathbf{i}}, \nabla f_{\mathbf{j}} \rangle$ which can be computed on-the-fly via a grid which includes both $\Omega_{\mathbf{i}}$ and $\Omega_{\mathbf{j}}$ [3]. This approach was shown to recover the ANOVA decomposition for synthetic problems [4] but can be computationally expensive. This is due to the necessary recomputation of the error indicators for all remaining grids in $\mathcal{A}$ after a grid is added to $\mathcal{O}$, since the reference solution for the error indicator of the candidate grids changes in each adaptation step. If this recomputation does not take place, the adaptive procedure tends to converge to a less suitable solution or stops too early.

2.2 Localized Adaptation Criteria

In this paper we investigate two different error indicators, which are localized in some sense. First, we propose to use $\|f_{\{\mathbf{k}\}} - f_{B(\mathbf{k})}\|_{\mathrm{RLS}}$, the difference in the functional (1) between the solution of the candidate grid $\mathbf{k}$ added to $\mathcal{A}$ and the optimized combination solution of its backward neighbourhood $B(\mathbf{k})$. This value will not change during the computation since the backward neighbourhood $B(\mathbf{k})$ will not change; candidate grids can only be added to $\mathcal{A}$ if all backward neighbours are already in $\mathcal{I}$. This is a big reduction in the computational complexity in comparison to the original approach since no recomputation is necessary. But it also has the potential to be a good error indicator. The combined solution from its backward neighbours lives in the same (small) discrete function space as the candidate grid. If the solution on the new candidate grid shows no improvement over the combined solution from its backward neighbours, no large gain in the representation of the overall function can be expected.

Second, we propose an error indicator inspired by the residual estimator based on a hierarchical subspace decomposition used for the finite element approach, see e.g. [1]. For a candidate grid $\mathbf{k}$ we consider the set $\mathcal{O} \setminus B(\mathbf{k})$, i.e. the old indices without the indices from the backward neighbourhood of $\mathbf{k}$. The solution $f_{\mathcal{O} \setminus B(\mathbf{k})}$ for this index set is now computed according to (3) using the optimal coefficients (5) and considered as the reference solution. As an error indicator we now compare the difference in the solutions of the residual problems

$$\langle e_{\mathcal{I}}, f_{\mathbf{l}} \rangle_{\mathrm{RLS}} = \frac{1}{M} \sum_{i=1}^{M} f_{\mathbf{l}}(x_i)\, y_i - \langle f_{\mathcal{O} \setminus B(\mathbf{k})}, f_{\mathbf{l}} \rangle_{\mathrm{RLS}} \quad \forall\, \mathbf{l} \in \mathcal{I}$$

for $\mathcal{I} = B(\mathbf{k})$ and $\mathcal{I} = B(\mathbf{k}) \cup \mathbf{k}$, that is we use $\|e_{B(\mathbf{k})} - e_{B(\mathbf{k}) \cup \mathbf{k}}\|_{\mathrm{RLS}}$ as an error indicator. It measures the additional improvement $\mathbf{k}$ can provide for the solution of the overall problem. This error indicator is localized, since it measures the improvement of $\mathbf{k}$ compared against its backward neighbourhood $B(\mathbf{k})$, but it also takes the global problem into account: since we compute the residual solutions, we measure the improvement in addition to the solution provided by $f_{\mathcal{O} \setminus B(\mathbf{k})}$.

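    compute partial problem for index $\mathbf{-1}$
    $\mathcal{A} := \{\mathbf{-1}\}$                                   (active index set)
    $\mathcal{O} := \emptyset$                                         (old index set)
    while stopping criterion not fulfilled do
        choose $\mathbf{k} \in \mathcal{A}$ with largest $\varepsilon_{\mathbf{k}}$          (largest indicator)
        $\mathcal{O} := \mathcal{O} \cup \{\mathbf{k}\}$
        $\mathcal{A} := \mathcal{A} \setminus \{\mathbf{k}\}$
        for $t = 1, \dots, d$ do                                       (look at neighbours of $\mathbf{k}$)
            $\mathbf{j} := \mathbf{k} + \mathbf{e}_t$
            if $\mathbf{j} - \mathbf{e}_l \in \mathcal{O}\ \ \forall\, l = 1, \dots, d$ then   (admissible)
                $\mathcal{A} := \mathcal{A} \cup \{\mathbf{j}\}$
                compute partial problem for index $\mathbf{j}$
                compute local error indicator $\varepsilon_{\mathbf{j}}$
            end
        end
    end

Fig. 2 The dimension adaptive algorithm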

Solving for $e_{\mathcal{I}}$ amounts to nothing else than computing the optimal combination coefficients for the modified right hand side

$$L(g) := \frac{1}{M} \sum_{i=1}^{M} g(x_i)\, y_i - \langle f_{\mathcal{O} \setminus B(\mathbf{k})}, g \rangle_{\mathrm{RLS}},$$

using a suitable numbering of $\mathcal{I}$. Then $e_{\mathcal{I}} = \sum_{\mathbf{l} \in \mathcal{I}} c_{\mathbf{l}} f_{\mathbf{l}}(x)$ and one can compute the error indicator $\|e_{B(\mathbf{k})} - e_{B(\mathbf{k}) \cup \mathbf{k}}\|_{\mathrm{RLS}}$.

Both these error indicators can be viewed as local criteria since only the difference between the candidate grid and its backward neighbours is considered; we call them LOCAL CHANGE and LOCAL RESIDUAL, respectively. In comparison, the original indicator, where all grids from $\mathcal{O}$ are taken into account, can be regarded as a global error indicator; we call it GLOBAL CHANGE in the following.

The overall procedure is sketched in Fig. 2. Given the sets $\mathcal{O}$ and $\mathcal{A}$ the algorithm uses a greedy approach for the dimension adaptive grid choice. Depending on one of the above error indicators (and possibly other values such as the complexity of the computation for a partial solution) the grid in $\mathcal{A}$ which provides the highest benefit is chosen and added to the index set $\mathcal{O}$. Its forward neighbourhood is searched for admissible grids to be added to $\mathcal{A}$; for these the solution and error indicator are computed. Then the outer loop restarts and the procedure continues until a suitable global stopping criterion is reached, typically when the reduction of the residual falls under a given threshold. Note that for GLOBAL CHANGE the error indicators for all $\mathbf{j} \in \mathcal{A}$ additionally need to be recomputed after an index $\mathbf{k}$ is added to $\mathcal{O}$. Note that we start in the algorithm with the constant function of grid $\Omega_{\mathbf{-1}}$ (i.e., $\mathcal{A} := \{\mathbf{-1}\}$) and in the first step look at all grids which are linear in only one dimension, that is all $\Omega_{\mathbf{-1} + \mathbf{e}_j}$ with $j = 1, \dots, d$. Once two of these one-dimensional grids were chosen in successive adaptive steps, the algorithm starts to branch out to grids which can involve two dimensions and later more. Since each


partial grid is small and depends in its complexity not on d , the total number of dimensions, but q, the number of dimensions which are not treated as constant, it allows us to treat higher dimensional problems than before with the original combination technique. Furthermore, the information about which dimensions are refined and in which combinations attributes are used allows an interpretation of the combined solution, for example one can easily see which input dimensions are significant.
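The greedy procedure of Fig. 2 can be sketched in Python as follows. This is a hedged illustration rather than the author's implementation: the callable `indicator` stands in for any of the error indicators and is a hypothetical placeholder, solving the partial problems is omitted, and the stopping rule (largest indicator below `tol`, or `max_grids` reached) is a simplifying assumption.

```python
def dimension_adaptive(d, indicator, max_grids, tol=0.0, min_level=-1):
    """Greedy choice of the index set following the structure of Fig. 2.

    indicator(j, old) is a user-supplied (hypothetical) callable returning
    the local error indicator eps_j of candidate index j.
    """
    start = (min_level,) * d
    old = set()                                # O: old index set
    active = {start: indicator(start, old)}    # A: active indices with eps
    while active and len(old) < max_grids:
        k = max(active, key=active.get)        # largest indicator
        if old and active[k] < tol:            # assumed stopping criterion
            break
        del active[k]
        old.add(k)
        for t in range(d):                     # forward neighbours of k
            j = k[:t] + (k[t] + 1,) + k[t + 1:]
            backward = [j[:s] + (j[s] - 1,) + j[s + 1:]
                        for s in range(d) if j[s] > min_level]
            # j is admissible only if all its backward neighbours are old.
            if j not in active and all(b in old for b in backward):
                active[j] = indicator(j, old)
    return old, active


# Toy indicator: only the first dimension matters, with geometric decay.
def toy_indicator(j, old):
    return 0.5 ** (j[0] + 1) if j[1] == -1 else 0.0


chosen, remaining = dimension_adaptive(2, toy_indicator, max_grids=4, tol=1e-3)
```

With the toy indicator the procedure refines only the first dimension and never adds a grid that involves the second one, which mirrors how the adaptive choice can expose insignificant input dimensions.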

3 Numerical Experiments

The following experiments were done on a machine with an Opteron (2.6 GHz) CPU. We measure the mean squared error $\mathrm{MSE} = \frac{1}{M} \sum_{i=1}^{M} (f(x_i) - y_i)^2$ and the normalized root mean square error $\mathrm{NRMSE} = \sqrt{\mathrm{MSE}} / (\max y_i)$. Note that for GLOBAL CHANGE, after each addition of an index $\mathbf{k}$ to $\mathcal{O}$, we only recompute the criterion for the 25% of the indices $\mathbf{k} \in \mathcal{A}$ with the currently largest values; the values for all indices in $\mathcal{A}$ are only recomputed after every 10th addition. This reduces the computational effort. We consider several real life data sets:

1. Census housing¹ consists of 22,784 instances with 121 attributes.
2. Computer activity² consists of 8,192 instances in 21 dimensions.
3. Elevators² consists of 16,599 instances in 18 dimensions.
4. Helicopter flight project³, with 13 attributes and 44,000 instances.
5. Pole² consists of 15,000 instances in 26 dimensions.
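The two error measures defined at the start of this section can be written down directly. The following Python sketch is purely illustrative; the function names are hypothetical, and the normalization by the largest response value follows the definition of the NRMSE given above.

```python
import math


def mse(predictions, targets):
    """Mean squared error over M data points."""
    m = len(targets)
    return sum((p - y) ** 2 for p, y in zip(predictions, targets)) / m


def nrmse(predictions, targets):
    """Root mean squared error normalized by the largest response value."""
    return math.sqrt(mse(predictions, targets)) / max(targets)


# Small made-up example, not data from the experiments.
example_predictions = [1.0, 2.0, 3.0]
example_targets = [1.0, 2.0, 5.0]
example_mse = mse(example_predictions, example_targets)
example_nrmse = nrmse(example_predictions, example_targets)
```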

For the following experiments 90% of each data set is used for training and 10% for evaluation. We further use a 2:1 split of the training data to tune the parameters and the stopping criteria, i.e. learning on two parts and evaluating on one part. With the $\lambda$ and tolerance resulting in the lowest MSE we then compute on all training data and evaluate on the previously unseen test data. These are the results given in Table 1. The first observation is that the two local criteria involve more grids, but the run time does not increase as much, which was the aim. The GLOBAL CHANGE criterion is good for data which do not need many grids for the representation. Overall, the LOCAL RESIDUAL criterion achieved the best performance, while the LOCAL CHANGE criterion produced the worst results. Just using the localized information from the comparison of the solution from grid $\mathbf{k}$ against the one from its backward neighbours $B(\mathbf{k})$ often results in the use of grids with large refinement per dimension and therefore large computational effort. The lack of information from the other grids in $\mathcal{O}$ also leads to early overfitting.

¹ Available at http://www.cs.toronto.edu/delve/data/census-house
² Available at http://www.liaad.up.pt/ltorgo/Regression/
³ From Ng et al., Autonomous inverted helicopter flight via reinforcement learning.


Table 1 Results for several real life data sets using the dimension adaptive combination technique with the three different error criteria. Given are the mean squared error (MSE) on the test data (with order of magnitude in the subscript), the tolerance used for the stopping criterion, the number of grids in $\mathcal{I}$, and the run times in seconds

                GLOBAL CHANGE                   LOCAL CHANGE                    LOCAL RESIDUAL
Data set        Tol    MSE     Grids  Time      Tol    MSE     Grids  Time      Tol    MSE     Grids  Time
Census          5_5    7.85_7  5477   2796      2.5_7  6.61_7  5215   1413      2.5_5  4.36_7  9883   2463
CPU activity    1_2    4.96    291    15        5_1    5.27    2251   260       5_2    5.30    1674   83
Elevators       1_9    6.37_6  450    157       1_8    8.48_6  998    110       1_10   5.98_6  2674   1011
Helicopter      5_9    5.69_5  1584   22565     7.5_5  2.71_4  909    8550      5_10   3.74_5  3573   6636
Pole            1_2    7.95_1  1785   7946      1_1    9.88_1  3133   13690     5_3    7.29_1  5016   2500

Table 2 Comparison of our results using the GLOBAL CHANGE and LOCAL RESIDUAL criteria against results from [9]; time is in seconds. The MSE results of the LOCAL RESIDUAL adaptive criterion scale accordingly to the NRMSE

                SVR                  GLOBAL CHANGE        LOCAL RESIDUAL
Data set        NRMSE     Time       NRMSE     Time       NRMSE     Time
Census          > 0.015   > 400      0.017     2,796      0.013     2,463
CPU activity    > 0.04    > 100      0.022     15         0.023     83
Elevators       > 0.08    > 200      0.043     157        0.042     1,011
Pole            > 0.09    > 100      0.089     7,946      0.085     2,500

The run time depends on the number of grids, but also on the kind of grids which are being used. Grids which depend on a small number of dimensions but are highly refined, i.e. with few but large entries in $\mathbf{k}$, are worse in this regard than grids which depend on more dimensions but only have a small level, i.e. with many but small entries in $\mathbf{k}$. For the considered data sets all dimensions were used in at least one grid, although the number of grids can differ largely between the attributes. We observed up to 5 non-constant dimensions per grid. How often a dimension is used and the size of the error indicators for these grids carry information about the importance of attributes and can be derived from the final results. Whether this information is worthwhile in practice needs to be investigated on real life data sets together with specialists from the application area.

Only on the helicopter data set with just 13 dimensions could the non-adaptive optimized combination technique [3] be used. It achieves an MSE of 3.26_5 in 18,280 s using level 3. Level 4 was not finished after 5 days.

Finally, a comparison with results using CVR, a special form of support vector regression, is given in Table 2. For all data sets our method achieves better results, but might need more run time, in one case quite significantly more. On the other hand, using a smaller tolerance a somewhat worse result could be achieved by our approach in less time. Note that for a larger synthetic data set a quite significant run time advantage of the dimension adaptive approach in comparison to CVR can be observed [3, 9].


4 Conclusions and Outlook

The dimension adaptive combination technique for regression shows good results in high dimensions and breaks the curse of dimensionality of grid based approaches. It gives a non-linear function describing the relationship between predictor and response variables and (approximately) identifies the ANOVA decomposition. Of the three different refinement criteria, GLOBAL CHANGE is best suited for applications using a small number of partial grids; otherwise LOCAL RESIDUAL performed best.

It is known that error estimators which use the difference between two approximations of different resolution, i.e. of extrapolation type, have weaknesses. For example, the error estimator can be small although the actual error is still large [1]. Furthermore, the combination technique can also be derived as an extrapolation technique; therefore a thorough investigation of the observed behaviour in this context is warranted.

We currently employ a simple greedy approach in the adaptive procedure. More sophisticated adaptation strategies and different error indicators, for example taking the computational complexity of a grid into account, are worth investigating, especially in regard to an underlying theory which could provide robustness and efficiency of the approach similar to the numerical solution of partial differential equations with adaptive finite elements [1].

The original approach scales linearly in the number of data [2, 3]. In the dimension adaptive approach at least the computational effort for each partial grid scales linearly in the number of data. Since the value of the adaptation and stopping criteria depends on the number of data, the number of partial grids might change with a different number of data for a given stopping tolerance. Although we did not observe such unwanted behaviour in our experiments, it remains to be seen whether in a worst case scenario the dimension adaptive approach could result in a non-linear scaling in regard to the number of data.

References

1. Mark Ainsworth and J. Tinsley Oden. A posteriori error estimation in finite element analysis. Wiley, 2000.
2. J. Garcke, M. Griebel, and M. Thess. Data mining with sparse grids. Computing, 67(3):225–253, 2001.
3. Jochen Garcke. Regression with the optimised combination technique. In W. Cohen and A. Moore, editors, 23rd ICML ’06, pages 321–328, 2006.
4. Jochen Garcke. A dimension adaptive sparse grid combination technique for machine learning. In Wayne Read, Jay W. Larson, and A. J. Roberts, editors, Proc. of 13th CTAC-2006, volume 48 of ANZIAM J., pages C725–C740, 2007.
5. T. Gerstner and M. Griebel. Dimension–Adaptive Tensor–Product Quadrature. Computing, 71(1):65–87, 2003.
6. M. Griebel, M. Schneider, and C. Zenger. A combination technique for the solution of sparse grid problems. In P. de Groen and R. Beauwens, editors, Iterative Methods in Linear Algebra, pages 263–281. IMACS, Elsevier, 1992.
7. M. Hegland. Adaptive sparse grids. In K. Burrage and Roger B. Sidje, editors, Proc. of 10th CTAC-2001, volume 44 of ANZIAM J., pages C335–C353, 2003.
8. M. Hegland, J. Garcke, and V. Challis. The combination technique and some generalisations. Linear Algebra and its Applications, 420(2–3):249–275, 2007.
9. Ivor W. Tsang, James T. Kwok, and Kimo T. Lai. Core vector regression for very large regression problems. In Luc De Raedt and Stefan Wrobel, editors, 22nd ICML 2005, pages 912–919. ACM, 2005.


Haralick’s Texture Features Computation Accelerated by GPUs for Biological Applications

Markus Gipp, Guillermo Marcus, Nathalie Harder, Apichat Suratanee, Karl Rohr, Rainer König, and Reinhard Männer

Abstract In biological applications, features are extracted from microscopy images of cells and are used for automated classification. Usually, a huge number of images has to be analyzed so that computing the features takes several weeks or months. Hence, there is a demand to speed up the computation by orders of magnitude. This paper extends previous results on the computation of co-occurrence matrices and Haralick texture features, as used for analyzing images of cells, by general-purpose graphics processing units (GPUs). New GPUs include more cores (480 stream processors) and their architecture enables several new capabilities (known as compute capabilities). With the new capabilities (atomic functions) we further parallelize the computation of the co-occurrence matrices. The visual profiling tool was used to find the most critical bottlenecks, which we investigated and improved. Changes in the implementation such as using more threads, avoiding costly barrier synchronizations, better handling of divergent branches, and a reorganization of the thread tasks yielded the desired performance boost. The computing time of the features for one image with around 200 cells is compared to the original software version as a reference, to our first CUDA version with compute capability v1.0 and to our improved CUDA version with compute capability v1.3. With the latest CUDA version we obtained a speedup of 1.4 over the previous CUDA version, computed on the same GPU (GeForce GTX 280).

M. Gipp, G. Marcus, R. Männer
Department of Computer Science V, Institute of Computer Engineering (ZITI), University of Heidelberg, B6, 26, 68131 Mannheim, Germany

N. Harder, A. Suratanee, K. Rohr, R. König
Department of Bioinformatics and Functional Genomics, IPMB, BIOQUANT and DKFZ Heidelberg, University of Heidelberg, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany

H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_11, © Springer-Verlag Berlin Heidelberg 2012


In total, we achieved a speedup of 930 with the most recent GPU (GeForce GTX 480, Fermi) compared to the original CPU version and a speedup of 1.8 compared to the older GPU with the optimized CUDA version.

Keywords Co-occurrence matrix • GLCM • Graphics processing units • GPU • GPGPU • Fermi • Haralick texture features extraction

1 Introduction

In 1973 Haralick introduced the co-occurrence matrix and texture features for automated classification of rocks into six categories [6]. Today, these features are widely used for different kinds of images, for example, for microscope images of biological cells. One drawback of the features is the relatively high cost of computation. However, it is possible to speed up the computation using general-purpose graphics processing units (GPUs). Nowadays, GPUs (ordinary computer graphics cards) are used to accelerate non-graphical software by highly parallel execution.

In biological applications, features are extracted from microscopy images of cells and are used for automated classification as described in [3, 7]. Figure 1 shows an example of a microscopy image (1,344 × 1,024 pixels and 12 bit gray level depth), which includes several hundred cells (typically 100–600). Usually a very large number of images has to be analyzed so that computing the features takes several weeks or months. Hence, there is a demand to speed up the computation by orders of magnitude.

The overall goal of this biological application is to construct a network of signalling pathways of the cells. To this end, genes are knocked down and images are acquired. Afterwards, the images are segmented using the adaptive thresholding algorithm in [7] to distinguish cells from the background. For the segmented cells Haralick texture features are computed. Besides these features, other features are also calculated and a well-chosen list of features is used for classification. The classification result yields information about the signalling network of the cells. Due to a large range of interesting genes and images, the image analysis process must

Fig. 1 Microscopy image with several hundred cells


be automated. After analyzing the different computation steps it turned out that the Haralick texture features consume most of the time.

In a previous GPU version we analyzed the features and found similarities in the computation between them. Common intermediate results were visualized in a dependency graph and the optimal computational order was determined to avoid costly double computation. Further, we grouped the features into functional steps and parallelized the computation. Implementation details on how the features are parallelized in order to dramatically speed up the computations can be found in [4].

In this paper, our approach is to optimize that GPU version. Since then the profiling tool offered by NVIDIA has been fully developed and includes good visualization options. Profiling is simple and shows detailed timing behavior, allowing a direct comparison after small changes in the implementation. Furthermore, new GPU architectures with more computing functions (compute capabilities) are available. These capabilities help to speed up our latest algorithm by changing the structure of the implementation. Hence, we use new tools and architectures to further improve our latest results.

Below, we present the latest changes in the state of the art of GPUs, refresh important details about the co-occurrence matrices, and introduce all versions of the computations. Then, we explain the investigation of the profiling and the implementation changes in Sects. 2.4 and 2.5 (these sections are a good entry point for those familiar with our last paper). Afterwards, we present the speedup factors for the different versions. We finally discuss the results and draw conclusions.

2 Methods

2.1 State of the Art

Speedup of the computation of the co-occurrence matrix and the Haralick texture features using reconfigurable hardware has been described in [9]. There, only a subset of the 14 features was chosen, obtaining a speedup of 4.75 for the co-occurrence matrix and 7.3 for the texture features when compared to a CPU application. More recent FPGAs (Xilinx Virtex-4, Virtex-5, Virtex-6) would provide more space to implement more features at a higher clock speed. Using GPUs for general-purpose computation is becoming more and more common. In recent years, the peak computing power of GPUs has risen dramatically. As an example, the NVIDIA GTX 480 from the GeForce 400 series, with 480 usable thread processors and a 1.4 GHz clock speed, reaches over 1,345 GFLOPS. Each thread processor can process 2 single precision (SP) floating point operations concurrently or 1 double precision (DP) operation. Hence, the maximum DP speed is 480 × 1.4 GHz × 1 floating point operation = 672 GFLOPS, and twice that with SP. A state of the art CPU (Intel Xeon X7560 with 16 thread cores at 2.66 GHz turbo) reaches around 307 GFLOPS [1], i.e. 19 GFLOPS for each core. Figure 2 illustrates the peak performance of GPUs and CPUs and highlights a much steeper growth curve for GPUs. Reference [8] presents various applications


Fig. 2 Peak performance growing curve of different GPU and CPU generations

in which GPUs provide a speedup of 3–59 compared to CPUs. In particular, n-body simulations achieve a GPU performance of over 200 GFLOPS. Note that the achievable peak performance depends on the application itself and on how the GFLOPS are counted. Only applications using multiply-add operations without divisions and other costly operations come close to the theoretical maximum performance. The better an application can be parallelized and partitioned into identical small computational units, the better the architecture of a GPU is utilized. The NVIDIA graphics cards we use are the GeForce GTX 280 and the GeForce GTX 480 (called Fermi). The older card (GTX 280) has 30 streaming multiprocessors (SMs), each consisting of 16,384 registers, 16 kBytes of shared memory, and 8 processing elements. The newer card (GTX 480) is partitioned differently: it has 15 streaming multiprocessors, each containing 32,768 registers, 64 kBytes of shared memory, and 32 processing elements. These processing elements are arranged in a single instruction multiple data (SIMD) fashion. In total, the GPUs provide 240 and 480 parallel pipelines, respectively, which operate most efficiently if a much higher number of light-weight program threads is available. A GPU (below called “device”) is divided into many multiprocessors and is equipped with several usable memories. The device memory is the biggest memory, with around 1 GByte (GTX 280) and 1.5 GByte (GTX 480), but also the slowest. Access to this memory has a latency of several hundred cycles. One important improvement of the newer card is the presence of caches, so that the latency is dramatically reduced in the case of a cache hit. NVIDIA offers the Compute Unified Device Architecture (CUDA), an application programming interface (API) and an extension to the programming language C, to use the highly parallel GPU architecture. One CUDA block contains program code in a single instruction multiple threads (SIMT) fashion and is executed on one SM.


All threads within a block share the registers and the shared memory of one SM. Using a high number of threads has the advantage of hiding the latency of memory accesses and thus maximizing the occupancy of the SM computational units. Blocks are arranged in a block grid so that they can be distributed among the SMs. Reference [2] discusses the architecture and CUDA.
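The peak-performance figures quoted in this section follow from simple arithmetic. The short Python sketch below reproduces them; the function name `peak_gflops` is only illustrative, and the rounded clock of 1.4 GHz is presumably the reason why the single precision result comes out as 1,344 rather than the quoted 1,345 GFLOPS.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak: cores x clock [GHz] x floating point operations per core and cycle."""
    return cores * clock_ghz * flops_per_cycle


# GTX 480 figures as quoted in the text (480 thread processors, 1.4 GHz)
gtx480_dp = peak_gflops(480, 1.4, 1)  # one double precision operation per cycle
gtx480_sp = peak_gflops(480, 1.4, 2)  # two single precision operations per cycle

# Xeon X7560: 307 GFLOPS spread over the 16 thread cores quoted in the text
xeon_per_core = 307 / 16

print(f"GTX 480: {gtx480_dp:.0f} GFLOPS (DP), {gtx480_sp:.0f} GFLOPS (SP)")
print(f"Xeon X7560: {xeon_per_core:.1f} GFLOPS per core")
```

Such numbers are upper bounds only; as noted above, real applications approach them only when they consist almost entirely of multiply-add operations.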

2.2 Co-occurrence Matrix

The generation of the co-occurrence matrices (co-matrices for short) is based on second order statistics as described in [6] and [5]. With this approach, histogram matrices are computed for different orientations of pixel pairs. Using pixel pairs along a specific angle (horizontal, diagonal, vertical, co-diagonal) and at a specific distance (one to five pixels), a two-dimensional symmetric histogram of the gray levels is generated. The gray levels of a pixel pair serve as the row and column index into the co-matrix, and the addressed element is incremented by one; a detailed example can be found in [5]. For each specific angle/distance combination a separate matrix must be generated. Each side of the square co-matrix is as long as the number of gray levels in the image. The microscope generates multi cell images (Fig. 1) with a gray level depth of 12 bits, corresponding to 4,096 different gray levels. Hence, each co-matrix needs 4,096 × 4,096 × 4 bytes = 64 MBytes of storage capacity. The graphics device is equipped with 1,536 MBytes of memory. Therefore we can generate only 24 matrices at once and compute the features on the corresponding image, which does not fully use the GPU. For a massively parallel approach we need to reduce the size of the co-matrices, and the size depends on the gray range actually present in each cell image extracted from the multi cell image. In fact, the co-matrices contain zeros almost everywhere. The reason for this sparsity is that a cell image is not purely random: the pixel pairs have preferred gray tones, so that the counts are not scattered randomly over the co-matrix. For example, the cell core pixels vary only slightly compared to their neighboring pixels. In particular, the background of the segmented image contains only pixels with the same intensity value, so that no gray tone difference between neighboring pixels exists.
These facts result in many pixel pairs with similar gray values, so that the counts in the matrix are concentrated in a few small regions. Figure 3a shows a binary image of a full matrix with a size of 4,096 × 4,096 pixels. In particular, the uniform background of the cell images has the gray tone zero (black), which yields only one combination of gray levels (zero/zero) apart from the combinations at the border between background and cell. In our algorithm we delete all rows having only zero elements (and also the corresponding columns, since the matrices are symmetric) to obtain a smaller, packed co-matrix. Figure 3b shows the co-matrix of Fig. 3a in a packed representation with only 277 × 277 elements. For this example, a reduction from 64 MByte to 300 kByte could be achieved. The average packed co-matrix size has been determined to be about 1.5 MByte of storage space. A large standard deviation in the average size


Fig. 3 Binary images of a full (a) and a packed (b) co-occurrence matrix. White pixels indicate zeros and black pixels indicate values that differ from zero

forces us to assume a bigger size when determining the actual memory demand for the computations. For the feature computations, we store, for each index of the packed co-matrix, the corresponding gray value index of the full co-matrix in a lookup table. The gray value can thus be reconstructed from the index of the packed co-matrix, which is necessary for computing the features that depend on it. This co-matrix reduction strategy is a compromise between lower storage requirements and direct accessibility in memory, and it works well in our algorithm for real cell images.
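The counting, packing and lookup-table steps of this section can be illustrated with a small sequential Python sketch. It is not the GPU implementation of [4]; the function names `co_matrix` and `pack_co_matrix`, the 4 × 4 toy image and the choice of 8 gray levels are assumptions made only for illustration.

```python
def co_matrix(image, dx, dy, levels):
    """Symmetric co-occurrence matrix of gray-level pairs at pixel offset (dx, dy)."""
    rows, cols = len(image), len(image[0])
    m = [[0] * levels for _ in range(levels)]
    for y in range(rows):
        for x in range(cols):
            yy, xx = y + dy, x + dx
            if 0 <= yy < rows and 0 <= xx < cols:
                a, b = image[y][x], image[yy][xx]
                m[a][b] += 1
                m[b][a] += 1  # count each pair in both directions -> symmetric matrix
    return m


def pack_co_matrix(m):
    """Delete all-zero rows and, by symmetry, the matching columns.

    Returns the packed matrix and the lookup table that maps a packed index
    back to the gray value (index of the full matrix).
    """
    lookup = [g for g, row in enumerate(m) if any(row)]
    packed = [[m[i][j] for j in lookup] for i in lookup]
    return packed, lookup


# Toy image that uses only the gray values 0, 3, 5 and 7 out of 8 levels
image = [[0, 0, 3, 3],
         [0, 0, 3, 3],
         [0, 5, 5, 5],
         [5, 5, 7, 7]]
full = co_matrix(image, dx=1, dy=0, levels=8)   # horizontal neighbors, distance 1
packed, lookup = pack_co_matrix(full)            # lookup == [0, 3, 5, 7]

# Storage figures from the text: 4-byte counters, 12-bit images, 1,536 MBytes device memory
full_bytes = 4096 * 4096 * 4
matrices_in_device_memory = (1536 * 2**20) // full_bytes
packed_example_bytes = 277 * 277 * 4

print(lookup, packed)
print(full_bytes // 2**20, "MBytes per full matrix,", matrices_in_device_memory, "matrices fit")
print(packed_example_bytes // 1000, "kBytes for the packed 277 x 277 example")
```

A feature that needs the gray value of packed row `k` reads it as `lookup[k]`, which is the role of the lookup table described above.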

2.3 Previous Versions

Previously, we implemented an optimized software version and a GPU version. For the software version, we analyzed the existing software that computes the Haralick texture features. The goal was to optimize the code and run it on a single node. The single node version can also be run on a cluster with different data sources. In our GPU version we parallelized the software version in several ways: several cell images are processed in parallel (C), all co-occurrence matrices of a cell, one per angle/distance combination, are generated in parallel (AD), and each feature is computed by summing and multiplying several elements in parallel. In CUDA we created a grid of AD × C blocks so that for each cell all matrices and features are computed in parallel. More details on how we mapped the CUDA blocks onto the GPU architecture and how we used the threads within a CUDA block can be found in [4].
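One way to picture the grid of AD × C blocks is as a mapping from a linear block index to one (cell, angle, distance) task. The Python sketch below shows such a mapping; the linear layout and the name `block_task` are assumptions for illustration, since the actual mapping is described in [4]. The value C = 8 is the one reported in Sect. 3.

```python
ANGLES = ("horizontal", "diagonal", "vertical", "co-diagonal")
DISTANCES = (1, 2, 3, 4, 5)          # in pixels
AD = len(ANGLES) * len(DISTANCES)    # 20 angle/distance combinations


def block_task(block_id, cells):
    """Map a linear block index to (cell index, angle, distance)."""
    if not 0 <= block_id < AD * cells:
        raise ValueError("block index outside the grid")
    cell, ad = divmod(block_id, AD)
    angle_idx, dist_idx = divmod(ad, len(DISTANCES))
    return cell, ANGLES[angle_idx], DISTANCES[dist_idx]


C = 8                      # cells processed in parallel
grid_size = AD * C         # number of blocks per kernel launch
print(grid_size, block_task(0, C), block_task(27, C), block_task(grid_size - 1, C))
```

With this layout every block owns exactly one co-matrix, so no two blocks ever write to the same matrix.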

2.4 Optimizations with Profiling

The profiler identified and visualized the most time-consuming functions in our algorithm. With the profiling we investigated the functions and


found several points for improvement. Many divergent paths and synchronization barriers could be avoided by changes in the structure. The matrices are accessed row-wise in small blocks, with many threads accessing memory simultaneously. Often, the row size is not a multiple of the thread block size, so that at the end of a row some threads take divergent paths. We divided the memory accesses into a common part, which all threads execute before they reach the border, and a border part, which only the thread blocks at the border execute. The usual pattern is to read from global memory, store the data in shared memory, synchronize, operate on shared memory, synchronize again, and write the results to global memory. For many functions with few operations, we changed the execution order to read from global memory, compute the operations, and write the results back to global memory. The operations read, compute, and write are coded in just one line of code. Because shared memory is no longer used, all synchronization barriers could be removed. In the next section we give more details on how and why the changes affect the speedup of the optimized implementation.
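The split into a common part and a border part can be sketched sequentially in Python. The sketch only mimics the index arithmetic; the names `split_row` and `normalize_row`, the tile width of 16 and the row length of 277 (taken from the packed example in Sect. 2.2) are illustrative assumptions, not the kernel code.

```python
def split_row(row_len, threads):
    """Index ranges of the common part (full tiles) and the border part (partial tile)."""
    full = (row_len // threads) * threads
    return range(0, full), range(full, row_len)


def normalize_row(row, total, threads=16):
    """Divide every element by `total`, processing full tiles and the border separately."""
    common, border = split_row(len(row), threads)
    out = [0.0] * len(row)
    # Common part: every "thread" t of every tile has an element, so no bounds check is needed.
    for base in range(common.start, common.stop, threads):
        for t in range(threads):
            out[base + t] = row[base + t] / total   # read, compute and write in one line
    # Border part: only the first len(border) threads of the last tile have work.
    for i in border:
        out[i] = row[i] / total
    return out


row = [float(i % 7) for i in range(277)]
total = sum(row)
normalized = normalize_row(row, total)
common, border = split_row(len(row), 16)
print(len(common), "elements in the common part,", len(border), "in the border part")
```

Only the last, partial tile needs the conditional, so the branch no longer sits on the path of every thread.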

2.5 Implementation Changes

Before we explain the changes to our last version, we recall the structure of the algorithm. For each of the C cell images, AD co-matrices with different orientations are generated, and on each co-matrix 13 features are computed. The total computation thus consists of C × AD × 13 features. Before the features can be computed we need to generate the C × AD matrices. Below we describe implementation details of the matrix generation process and how we further improved it. For details of the feature implementation we refer to our previous work [4]. With the profiling we could identify and significantly optimize the slowest kernel functions. This section explains the implementation changes of the functions with the biggest speedup gain; see also Table 2. Kernel Function 0B sets all matrix elements to zero, which was done by C × AD CUDA blocks in the old code. Since each CUDA block sets one relatively small matrix to zero, the work load per block is small. Better efficiency and a reduced call overhead were achieved by using only C CUDA blocks, each of which does the work for AD matrices. The matrix generation process in Function 0C is similar to the computation of a histogram. Each event increments one of the histogram bins: first the value is read, then one is added, and finally the result is written back to the same memory address. This process is not thread safe, which means that accesses of several threads to the same value have to be mutually exclusive. In our previous version we avoided computing conflicts by using only one thread within each CUDA block to generate one matrix. Our current version is supported by atomic functions


provided by devices with higher compute capabilities. The atomic add reads the memory value, increments it, and writes it back in one instruction. This has several advantages: for example, it needs fewer memory accesses, and the add is executed by the memory controller so that the computational units remain free. More importantly, the atomic operations are thread safe, so that several threads can be used to generate one matrix without any computing conflicts. Function 1D is the slowest kernel in the old code, so we spent much effort on speeding it up. The task of this function is to compute a vector containing the sum of each matrix row. The profiling showed that using one thread per matrix row to sum up all elements in a loop results in an inefficient memory access pattern. We therefore changed the thread mapping from one thread per matrix row to 16 threads per row, which speeds up the memory accesses and also reduces the number of loop iterations per thread. In Sect. 2.4 we already mentioned that using shared memory with costly barrier synchronizations for only a few computational instructions is counterproductive compared to avoiding the barriers altogether. Further, splitting the computation into a common part without divergent branches and a border part with branches increases the occupancy of the GPU. These two changes are applied to several kernel functions in the new code. The ones with the largest gain are Functions 0D, 1F and 5A.
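Two ideas from this section can be mimicked on the CPU with a short Python sketch: a thread-safe read-increment-write on shared counters, and the 16-threads-per-row summation. A `threading.Lock` is used here merely as a stand-in for the hardware atomic add, and the interleaved (strided) assignment of elements to threads is an assumption; all function names are illustrative.

```python
import threading


def count_pairs_atomic(pairs, levels, workers=4):
    """Several workers increment one shared co-matrix; the lock emulates an atomic add."""
    counts = [0] * (levels * levels)
    lock = threading.Lock()

    def work(chunk):
        for a, b in chunk:
            with lock:                      # read, add one, write back as one indivisible step
                counts[a * levels + b] += 1

    threads = [threading.Thread(target=work, args=(pairs[i::workers],))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts


def row_sum_strided(row, threads=16):
    """Row sum with `threads` partial sums; thread t handles elements t, t+threads, ..."""
    partial = [0] * threads
    for t in range(threads):
        for j in range(t, len(row), threads):
            partial[t] += row[j]
    return sum(partial)


def iterations_per_thread(row_len, threads):
    """Upper bound on the loop iterations of a single thread (ceiling division)."""
    return -(-row_len // threads)


pairs = [(0, 0)] * 1000 + [(1, 2)] * 500
counts = count_pairs_atomic(pairs, levels=3)
row = list(range(277))
print(counts[0], counts[1 * 3 + 2], row_sum_strided(row))
print(iterations_per_thread(277, 1), "->", iterations_per_thread(277, 16), "loop iterations")
```

For a packed row of 277 elements the loop length per thread drops from 277 to 18, which is the reduction in loop iterations mentioned above.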

3 Results

We compare six versions of the Haralick texture feature computation: the original software version, a well optimized software version, and two CUDA versions, each run on two different GPUs. Results are shown in Table 1. The execution times have been compared on an Intel Core 2 Quad machine (Q6600) with 2.4 GHz and 8 MBytes L2 cache, 4 GBytes DDR2 RAM with 1,066 MHz clock speed, an NVIDIA GeForce 8800 GTX with a 1,350 MHz shader clock, 768 MByte GDDR3 at 900 MHz and 384 bit wide in a PCIe v1.0 16x slot; and

Table 1 Comparison of execution times and speedup factors of all introduced versions and different GPUs

Version                           Execution time [s]   Speedup to 1.   Speedup to 2.   Speedup to 3.
1. Original software version      2,378                –               –               –
2. Optimized software version     214                  11              –               –
3. GPU version I (8800 GTX)       11.1                 214             19              –
4. GPU version I (GTX 280)        6.6                  360             32              1.7
5. GPU version II (GTX 280)       4.65                 511             46              2.4
6. GPU version II (GTX 480)       2.55                 930             83              4.4


an NVIDIA GeForce GTX 280 with a 1,300 MHz shader clock, 1,024 MByte GDDR3 at 1,107 MHz and 512 bit wide in a PCIe v2.0 slot. The operating system was Linux Ubuntu x64 with kernel version 2.6.20 and GNU C compiler version 4.1.2. For software versions 1 and 2 we used one CPU core only. In the GPU version, we chose C = 8 and AD = 20, i.e. eight cells are calculated in parallel with 4 angles and 5 distances per cell. These parameters gave the best results. The total grid size is 160 blocks in CUDA for each feature kernel. CUDA version I is compiled for the architecture with compute capability v1.0 and CUDA version II is compiled with compute capability v1.3. Table 2 shows the profiling improvements between the previous and the current CUDA version. The first column lists the kernel functions. The second column contains the execution times of our previous GPU version and the third column those of our current version. The last column contains the resulting speedup factors.

Table 2 Execution times and speedup factors of the previous and current GPU versions

ID, Function                   CUDA version 1   CUDA version 2   Factor
0A, lookup tables              276.3 ms         275.9 ms         1
0B, clear co-matrices          242.4 ms         86.8 ms          2.8
0C, compute co-matrices        466.4 ms         367.6 ms         1.3
0D, normalize co-matrices      224.4 ms         141.0 ms         1.6
1A, compute f1                 221.8 ms         221.2 ms         1
1B, compute f5                 416.8 ms         373.6 ms         1.1
1C, compute f9                 202.5 ms         203.3 ms         1
1D, compute P                  929.2 ms         177.0 ms         5.2
1E, compute P|x−y|             310.1 ms         288.2 ms         1.1
1F, compute Px+y               602.6 ms         418.2 ms         1.4
2A, compute mean               4.5 ms           4.5 ms           1
2B, compute var                6.2 ms           6.2 ms           1
2C, compute H                  4.2 ms           4.2 ms           1
3A, compute f2                 5.0 ms           5.0 ms           1
3B, compute f11                4.9 ms           4.9 ms           1
3C, compute MacP|x−y|          6.7 ms           6.7 ms           1
3D, compute f10                5.2 ms           5.2 ms           1
4A, compute f6                 7.2 ms           7.2 ms           1
4B, compute f8                 10.9 ms          10.9 ms          1
4C, compute f7                 13.1 ms          13.1 ms          1
5A, compute f3                 418.6 ms         300.5 ms         1.4
5B, compute f4                 269.3 ms         270.7 ms         1
5C, compute f12                309.9 ms         273.0 ms         1.1
5D, compute f13                225.1 ms         226.5 ms         1

GPU execution time             5,183 ms         3,690 ms         1.40
CPU execution time             1,416 ms         963 ms           1.47

Total execution time           6,600.0 ms       4,650.0 ms       1.42


4 Discussion

The speedup by a factor of 930 of the GPU version over the original software version meets the demand of the biologists. Compared to the optimized software version, the speedup is still around a factor of 83. Table 1 shows the execution times in seconds. The optimized software version is 11 times faster than the original software version. Compared to our first GPU version on the 8800 GTX, our latest CUDA version is 4.4 times faster, which includes both the implementation optimization and a faster GPU. In a direct comparison on the GTX 280, CUDA version II is around 1.4 times faster than CUDA version I, due to optimizations and the new compute capabilities of the more recent GTX 280 device. A detailed performance comparison of our CUDA versions can be found in Table 2. The largest improvements were obtained for Function 1D and Functions 0B to 0D. The biggest execution time reduction was achieved by avoiding divergent paths and reducing synchronization barriers. Many complex operations can be performed very effectively on shared memory due to its fast access times of one to two cycles. In most functions, however, we have only two operations to compute, so that the benefit of using fast shared memory is smaller than the benefit of removing the barrier synchronization. Moreover, the division into a common memory access part and a border memory access part leads, together with not using shared memory, to a steady computing and memory flow without any synchronization points. The benefit of using the atomic function is visible in Function 0C, with an improvement by a factor of 1.3. Originally, we planned to use multi-threading for the matrix generation, but the profiling result told us that spending more effort on parallelizing it would hardly change the total computation time. Therefore, we left the matrix generation single threaded for each co-matrix.
Given the complexity of the Haralick texture features and the co-occurrence matrices computations, and the application requirements, our most recent implementation yields excellent performance.
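The speedup factors discussed here are plain ratios of the execution times in Table 1. The following Python sketch recomputes them; the dictionary keys and the helper `speedup` are illustrative names only, and the printed ratios are unrounded, so they differ slightly from the rounded table entries.

```python
# Execution times in seconds, copied from Table 1
times_s = {
    "original software": 2378.0,
    "optimized software": 214.0,
    "GPU I (8800 GTX)": 11.1,
    "GPU I (GTX 280)": 6.6,
    "GPU II (GTX 280)": 4.65,
    "GPU II (GTX 480)": 2.55,
}


def speedup(baseline, version):
    """How many times faster `version` runs than `baseline`."""
    return times_s[baseline] / times_s[version]


for base, ver in [
    ("original software", "optimized software"),
    ("original software", "GPU II (GTX 480)"),
    ("optimized software", "GPU II (GTX 480)"),
    ("GPU I (8800 GTX)", "GPU II (GTX 480)"),
    ("GPU I (GTX 280)", "GPU II (GTX 280)"),
]:
    print(f"{ver} vs. {base}: {speedup(base, ver):.2f}x")
```

The last ratio compares both CUDA versions on the same GTX 280 and therefore isolates the effect of the implementation changes from that of the faster hardware.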

5 Conclusion

In this paper we have shown that the costly computation of the co-occurrence matrix and the Haralick texture features can be sped up by a factor of 930 in comparison to the original software version. This allows biologists to perform many more tests and to acquire novel knowledge in cell biology in weeks or days instead of several months. Graphics Processing Units (GPUs) are inexpensive alternatives to reconfigurable hardware, with an even higher computational capability and a much shorter implementation development time, and they are much faster (by orders of magnitude) than Central Processing Units (CPUs). By using many CUDA functions to reduce the complexity of the algorithm, in combination with avoiding divergent branches and refraining from using shared memory, we could improve our own results by an additional factor of 1.4, and by a factor of 4.4 including the latest GPU.


References

1. Intel(r) microprocessor export compliance metrics. URL: http://www.intel.com/support/processors/xeon/sb/CS-020863.htm, (5. Dec. 2008).
2. NVIDIA CUDA Programming Guide Version 2.0. URL: http://www.nvidia.com/object/cuda_develop.html, (5. Dec. 2008).
3. C. Conrad, H. Erfle, P. Warnat, N. Daigle, T. Lörch, J. Ellenberg, R. Pepperkok, and R. Eils. Automatic identification of subcellular phenotypes on human cell arrays. Genome Research, 14:1130–1136, 2004.
4. M. Gipp, G. Marcus, N. Harder, A. Suratanee, K. Rohr, R. König, and R. Männer. Haralick’s texture features computed by GPUs for biological applications. IAENG International Journal of Computer Science, 36:1:IJCS_36_1_09, 2009.
5. R. M. Haralick. Statistical and structural approaches to texture. Proceedings of the IEEE, 67(5):786–804, 1979.
6. R. M. Haralick and K. Shanmugam. Computer classification of reservoir sandstones. IEEE Transactions on Geoscience Electronics, 11(4):171–177, 1973.
7. N. Harder, B. Neumann, M. Held, U. Liebel, H. Erfle, J. Ellenberg, R. Eils, and K. Rohr. Automated recognition of mitotic patterns in fluorescence microscopy images of human cells. In B. Neumann, editor, 3rd IEEE International Symposium on Biomedical Imaging: Nano to Macro, pages 1016–1019, 2006.
8. H. Nguyen. GPU Gems 3. Addison-Wesley, Upper Saddle River, NJ, USA, 2007.
9. M. A. Tahir, A. Bouridane, F. Kurugollu, and A. Amira. Accelerating the computation of GLCM and Haralick texture features on reconfigurable hardware. In A. Bouridane, editor, International Conference on Image Processing (ICIP ’04), volume 5, pages 2857–2860, 2004.


Free-Surface Flows over an Obstacle: Problem Revisited Panat Guayjarernpanishk and Jack Asavanant

Abstract Two-dimensional steady free-surface flows over an obstacle are considered. The fluid is assumed to be inviscid and incompressible, and the flow is irrotational. Both gravity and surface tension are included in the dynamic boundary condition. Far upstream, the flow is assumed to be uniform. A triangular obstruction is located at the channel bottom as a positive bump or a negative bump (dip). This problem has been investigated by many researchers, such as Forbes [5], Shen [8], and Dias and Vanden-Broeck [2], in search of new types of solutions. In this paper, the fully nonlinear problem is formulated by using a boundary integral equation technique. The resulting integrodifferential equations are solved iteratively by using Newton’s method. When surface tension is neglected, a new solution type of subcritical flow is proposed, the so-called drag-free solution. Furthermore, solutions of flows over a dip in the bottom are also presented. When surface tension is included, there is an additional parameter in the problem known as the Bond number $B$. In addition, the weakly nonlinear problem is investigated and compared with the fully nonlinear results. Finally, solution diagrams for all flow regimes are presented on the $(F, h_{ob})$-plane, where $F$ is the Froude number and $h_{ob}$ is the dimensionless height of the obstacle.

Keywords Free-surface flow • Obstacle • Boundary integral equation • Surface tension

P. Guayjarernpanishk
Department of Mathematics, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand

J. Asavanant
Advanced Virtual and Intelligent Computing (AVIC) Research Center, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand

H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_12, © Springer-Verlag Berlin Heidelberg 2012


1 Introduction

Flow over submerged obstacles is one of the classical problems in fluid mechanics. This problem has many related physical applications, ranging from the flow of water over rocks to atmospheric and oceanic stratified flows encountering topographic obstacles, or even a moving pressure distribution over a free surface. Free-surface flows over an obstacle have been investigated for different bottom topographies by many researchers. Lamb [7] calculated solutions of the linear problem of free-surface flow over a submerged semi-elliptical obstacle. He obtained solutions with downstream waves for subcritical flow and symmetric solutions without waves for supercritical flow. Forbes and Schwartz [6] used the boundary integral method to find fully nonlinear solutions of subcritical and supercritical flows over a semi-circular obstacle. Their results confirmed and extended Lamb’s solutions. In 1987, Vanden-Broeck [11] showed that there exist two solutions of supercritical flow: one corresponds to a perturbation of the uniform flow, and the other to a perturbation of the solitary wave. Both profiles were symmetric with respect to the obstacle. Forbes [4] computed numerical solutions of critical flow over a semi-circular obstacle. The flow was a uniform subcritical stream ahead of the obstacle, followed by a uniform supercritical stream behind the obstacle. This type of solution is generally referred to as a “hydraulic fall”. Supercritical and critical flows over a submerged triangular obstacle were investigated by Dias and Vanden-Broeck [3]. They used a series truncation technique to find numerical solutions. For the critical regime, the flow behavior near the apex of the triangle was similar to the flow over a wedge as the size of the triangle increased. In the case of supercritical flow, the flow approached a limiting configuration with a stagnation point on the free surface with a 120° angle. Shen et al. [10], Shen and Shen [9] and Shen [8] presented weakly nonlinear solutions of flow over an obstacle. They confirmed the two branches of solutions for supercritical flow and showed that Forbes’s numerical result [4] was a limit of the cnoidal wave solution. Zhang and Zhu [12] derived a new nonlinear integral equation model in terms of hodograph variables for free-surface flow over an arbitrary bottom obstruction with downstream waves. Their results did not suffer from the spurious upstream waves present in those obtained by Forbes and Schwartz [6]. In 2002, Dias and Vanden-Broeck [2] found a new solution called the “generalized hydraulic fall”. Such solutions are characterized by downstream supercritical flow and a train of waves on the upstream side. This type of solution can be obtained by removing the radiation condition far upstream of the obstacle. Forbes [5] calculated numerical solutions of gravity-capillary flows over a semicircular obstruction. The fluid was subject to the combined effects of gravity and surface tension. Three different branches of solution were presented and compared between linear and fully nonlinear problems. In this paper we consider both the fully nonlinear problem and the weakly nonlinear problem of free-surface flows over an obstacle. Both gravity and surface tension are included in the dynamic boundary condition. The fully nonlinear


problem is formulated and solved, as in Binder, Vanden-Broeck, and Dias [1], by using the boundary integral equation technique in Sect. 2. Results and discussion are presented in Sect. 3.

2 Mathematical Formulation

2.1 Fully Nonlinear Problem

We consider a steady two-dimensional flow over an obstacle. The fluid is assumed to be inviscid¹ and incompressible², and the flow is irrotational³. The flow domain is bounded below by a horizontal bottom, except for a bump or dip in the form of an isosceles triangular obstacle, and above by a free surface. We introduce Cartesian coordinates $(x, y)$ with the $x$-axis along the flat part of the bottom and the $y$-axis directed vertically upwards through the apex of the triangular obstacle. Gravity $g$ acts in the negative $y$-direction. The apex and the other two vertices of the isosceles triangular obstacle are denoted by $x_b$, $x_b^l$ and $x_b^r$, respectively. Here the superscripts “l” and “r” refer to the leftmost and the rightmost vertices of the obstacle. Far upstream, as $x \to -\infty$, the flow approaches a uniform stream with constant velocity $U$ and constant depth $H$. All variables are made dimensionless with respect to the velocity and length scales $U$ and $H$, respectively. The dimensionless parameters in the problem are the Froude number⁴ $F = U/\sqrt{gH}$, the Bond number⁵ $B = T/(\rho g H^2)$, and the dimensionless height of the triangular obstacle $h_{ob}$. Here $T$ is the surface tension and $\rho$ is the fluid density. From irrotationality and incompressibility, there exist a potential function $\phi(x, y)$ and a stream function $\psi(x, y)$. We define the complex potential $f = \phi + i\psi$, which is an analytic function of $z = x + iy$. Without loss of generality, we choose $\psi = 0$ on the free surface and $\phi = 0$ at the apex of the obstacle. It follows that $\psi = -1$ on the bottom. The flow domain in the complex $f$-plane is the strip $-1 < \psi < 0$. The mathematical problem can be formulated in terms of the potential function $\phi$ satisfying Laplace’s equation and the corresponding boundary conditions

¹ Inviscid implies that viscosity is negligible, and therefore the fluid can support no shearing stress.
² Fluid density is unchanged under pressure variation or, mathematically, the divergence of velocity is zero.
³ This can be thought of as fluid flow that is free of vortices or, mathematically, the curl of velocity vanishes everywhere in the flow field.
⁴ The Froude number is the dimensionless ratio of characteristic velocity to wave celerity.
⁵ The Bond number is the dimensionless ratio of body forces (often gravitational) to surface tension forces.


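$$\phi_{xx} + \phi_{yy} = 0 \quad \text{in the fluid domain,} \tag{1}$$

$$\phi_x^2 + \phi_y^2 + \frac{2}{F^2}\,y - \frac{2}{F^2}\,B\kappa = 1 + \frac{2}{F^2} \quad \text{on } y = \eta(x), \tag{2}$$

$$\phi_y = \phi_x \eta_x \quad \text{on } y = \eta(x), \tag{3}$$

$$\phi_y = \phi_x h_x \quad \text{on } y = h(x), \tag{4}$$

$$\phi_x \to 1, \quad \eta(x) \to 1 \quad \text{as } x \to -\infty. \tag{5}$$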

Here $y = \eta(x)$ is the unknown free surface, $y = h(x)$ is the equation of the bottom, and $\kappa = \eta_{xx}/(1 + \eta_x^2)^{3/2}$ is the curvature of the free surface. Equation (2) is the dynamic boundary condition known as Bernoulli’s equation, and (3) and (4) are the kinematic boundary conditions on the free surface and on the bottom, respectively. Let us introduce the conformal mapping $\zeta = \alpha + i\beta = e^{-\pi f}$, which maps the region occupied by the fluid onto the upper half of the $\zeta$-plane. The values of $\alpha$ at the apex and at the leftmost and the rightmost vertices of the triangular obstacle are denoted by $\alpha_b$, $\alpha_b^l$ and $\alpha_b^r$. Introducing the hodograph variables $\hat\tau$ and $\hat\theta$, we can write the complex velocity $w$ as

$$w = \phi_x - i\phi_y = e^{\hat\tau - i\hat\theta}.$$

Here $e^{\hat\tau}$ is the magnitude of the velocity and $\hat\theta$ represents the direction angle of the flow ($-\pi \le \hat\theta < \pi$). We now apply the Cauchy integral formula to the function $\hat\tau - i\hat\theta$ in the $\zeta$-plane with a contour consisting of the real axis ($\alpha$-axis) and a semicircle of arbitrarily large radius in the upper half plane. Since the flow is uniform far upstream, $\hat\tau - i\hat\theta \to 0$ as $|\zeta| \to \infty$. After taking the real part, we have

$$\tilde\tau(\alpha_0) = -\frac{1}{\pi}\int_{-\infty}^{\infty} \frac{\tilde\theta(\alpha)}{\alpha - \alpha_0}\, d\alpha, \tag{6}$$

where $\tilde\tau(\alpha)$ and $\tilde\theta(\alpha)$ are the values of $\hat\tau$ and $\hat\theta$ on the $\alpha$-axis. It should be noted that the integral in (6) is of Cauchy principal value type. The kinematic boundary condition on the bottom ($-\infty < \alpha < 0$) of the channel implies

$$\tilde\theta(\alpha) = \begin{cases} \gamma_r, & \alpha_b < \alpha < \alpha_b^r, \\ \gamma_l, & \alpha_b^l < \alpha < \alpha_b, \\ 0, & \text{otherwise,} \end{cases}$$

where $\gamma_l$ and $\gamma_r$ are the inclination angles of the left and right sides of the triangular obstacle, with $\gamma_r = -\gamma_l$. Substituting these values into (6) and rewriting in terms of $\phi$ by the change of variables $\alpha = e^{-\pi\phi}$, $\alpha_0 = e^{-\pi\phi_0}$, we obtain

$$\tau(\phi_0) = -\frac{\gamma_l}{\pi}\ln\left|\frac{\left(e^{-\pi\phi_b} + e^{-\pi\phi_0}\right)^2}{\left(e^{-\pi\phi_b^l} + e^{-\pi\phi_0}\right)\left(e^{-\pi\phi_b^r} + e^{-\pi\phi_0}\right)}\right| - \int_{-\infty}^{\infty}\frac{\theta(\phi)\,e^{-\pi\phi}}{e^{-\pi\phi} - e^{-\pi\phi_0}}\,d\phi. \tag{FNL1}$$

Here $\tau(\phi) = \tilde\tau(e^{-\pi\phi})$, $\theta(\phi) = \tilde\theta(e^{-\pi\phi})$, and $\phi_b$, $\phi_b^l$ and $\phi_b^r$ are the values of $\phi$ at the apex, the leftmost, and the rightmost vertices of the triangular obstacle. To obtain another relation on the free surface, we use the identity $\phi_x - i\phi_y = e^{\tau - i\theta}$. The dynamic boundary condition (2) then becomes

$$e^{2\tau} + \frac{2}{F^2}(y - 1) - \frac{2}{F^2}\,B e^{\tau}\frac{\partial\theta}{\partial\phi} - 1 = 0. \tag{FNL2}$$

Equations (FNL1) and (FNL2) define a nonlinear system of integrodifferential equations for $\tau(\phi)$ and $\theta(\phi)$ on the free surface. The free surface profile can be calculated by

$$x(\phi_0) = \int_{0}^{\phi_0} e^{-\tau(\phi)}\cos\theta(\phi)\,d\phi \quad \text{for } -\infty < \phi_0 < \infty,$$

$$y(\phi_0) = 1 + \int_{-\infty}^{\phi_0} e^{-\tau(\phi)}\sin\theta(\phi)\,d\phi \quad \text{for } -\infty < \phi_0 < \infty.$$
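Once the hodograph variables are known on a mesh in the potential, the free-surface profile follows from the relations x_φ = e^(−τ) cos θ and y_φ = e^(−τ) sin θ by numerical quadrature. The Python sketch below uses a cumulative trapezoidal rule; the function name, the truncation of the upstream limit to the first mesh point (where y is set to 1), and the requirement that φ = 0 is a mesh point are assumptions made for illustration, not details given in the paper.

```python
import math


def free_surface_profile(phi, tau, theta):
    """Integrate x_phi = exp(-tau) cos(theta) and y_phi = exp(-tau) sin(theta) on a mesh.

    phi, tau, theta are equally long lists; phi is increasing and starts far upstream.
    """
    n = len(phi)
    x = [0.0] * n
    y = [1.0] * n   # y -> 1 far upstream; the first mesh point is taken to lie there
    for i in range(1, n):
        d = phi[i] - phi[i - 1]
        fx0 = math.exp(-tau[i - 1]) * math.cos(theta[i - 1])
        fx1 = math.exp(-tau[i]) * math.cos(theta[i])
        fy0 = math.exp(-tau[i - 1]) * math.sin(theta[i - 1])
        fy1 = math.exp(-tau[i]) * math.sin(theta[i])
        x[i] = x[i - 1] + 0.5 * d * (fx0 + fx1)   # trapezoidal rule
        y[i] = y[i - 1] + 0.5 * d * (fy0 + fy1)
    # Shift x so that x = 0 at the mesh point closest to phi = 0 (above the apex).
    i0 = min(range(n), key=lambda i: abs(phi[i]))
    x0 = x[i0]
    return [xi - x0 for xi in x], y


# Sanity check with a uniform stream: tau = theta = 0 gives x = phi and y = 1.
phi = [-2.0 + 0.1 * i for i in range(41)]
zeros = [0.0] * len(phi)
x_uniform, y_uniform = free_surface_profile(phi, zeros, zeros)
print(x_uniform[0], x_uniform[-1], y_uniform[-1])
```

The uniform-stream case is only a consistency check; for an actual solution, `tau` and `theta` would come from the discretized system (FNL1) and (FNL2).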

2.2 Weakly Nonlinear Problem

One of the main difficulties in computing free-surface flows over an obstacle is to find the number of independent parameters. For that purpose, a weakly nonlinear analysis can be useful. Shen [8] used a new and more rigorous derivation to show that under the long-wave⁶ and small-obstacle⁷ assumptions the gravity flow over an obstacle can be described by the stationary forced Korteweg-de Vries (sfKdV) equation

$$A_{xx} - 3(F^2 - 1)A = -\frac{9}{2}A^2 - 3h,$$

where $A = A(x)$ is the first-order elevation of the free surface. In this paper, we employ a similar process to derive a stationary forced Korteweg-de Vries equation for gravity-capillary flow over an obstacle. The sfKdV equation of gravity-capillary flow is

$$(1 - 3B)A_{xx} - 3(F^2 - 1)A = -\frac{9}{2}A^2 - 3h. \tag{WNL}$$

⁶ This assumption is valid when the wave length $L$ is larger than the water depth.
⁷ The smallness of the obstacle must be of the order $O((H/L)^2)$.


P. Guayjarernpanish and J. Asavanant

The initial-value problem (WNL) with initial conditions $\lim_{x\to-\infty} A(x) = \lim_{x\to-\infty} A_x(x) = 0$ is solved by the fourth-order Runge–Kutta method. Phase portraits in the phase plane $(A, A_x)$ can be determined directly from equation (WNL). There are two fixed points: $A = 0$, $A_x = 0$ and $A = \frac{2}{3}(F^2 - 1)$, $A_x = 0$. Weakly nonlinear solutions and their phase portraits are discussed in comparison with the fully nonlinear results in Sect. 3.
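As an illustration of the Runge–Kutta integration mentioned above, the following minimal sketch rewrites (WNL) as a first-order system in $(A, A_x)$ and marches it downstream from rest. It is not the authors' code: the triangular obstacle helper, the integration window, the step size and all function names are assumptions made for this example, and the sign convention is $(1-3B)A_{xx} - 3(F^2-1)A = -\frac{9}{2}A^2 - 3h$.

```python
def triangular_obstacle(x, height, half_width):
    """Isosceles triangle centred at x = 0 (assumed obstacle shape h(x))."""
    return height * max(0.0, 1.0 - abs(x) / half_width)


def sfkdv_rhs(x, state, F, B, h):
    """(WNL) as a first-order system: returns (A_x, A_xx). Requires B != 1/3."""
    A, Ax = state
    Axx = (3.0 * (F * F - 1.0) * A - 4.5 * A * A - 3.0 * h(x)) / (1.0 - 3.0 * B)
    return (Ax, Axx)


def rk4_step(f, x, state, dx):
    """One classical fourth-order Runge-Kutta step for a tuple-valued state."""
    k1 = f(x, state)
    k2 = f(x + 0.5 * dx, tuple(s + 0.5 * dx * k for s, k in zip(state, k1)))
    k3 = f(x + 0.5 * dx, tuple(s + 0.5 * dx * k for s, k in zip(state, k2)))
    k4 = f(x + dx, tuple(s + dx * k for s, k in zip(state, k3)))
    return tuple(
        s + dx / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def integrate_wnl(F, B, h, x_start=-20.0, x_end=20.0, dx=0.01):
    """March (WNL) from x_start, where A = A_x = 0 approximates the upstream limit."""
    n_steps = int(round((x_end - x_start) / dx))
    x, state = x_start, (0.0, 0.0)
    path = [(x, state[0], state[1])]
    rhs = lambda xx, st: sfkdv_rhs(xx, st, F, B, h)
    for i in range(n_steps):
        state = rk4_step(rhs, x, state, dx)
        x = x_start + (i + 1) * dx
        path.append((x, state[0], state[1]))
    return path


if __name__ == "__main__":
    bump = lambda x: triangular_obstacle(x, height=0.05, half_width=0.5)
    path = integrate_wnl(F=0.7, B=0.0, h=bump)
    print("max |A| =", max(abs(A) for _, A, _ in path))
```

The list of $(x, A, A_x)$ triples returned by `integrate_wnl` can be plotted as $A$ against $x$ for the profile or as $A_x$ against $A$ for the phase portrait. The upstream condition at $-\infty$ is approximated by starting at a finite `x_start` well ahead of the obstacle.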

3 Results and Discussion

Following Binder, Vanden-Broeck, and Dias [1], we solve the fully nonlinear problem by introducing $N + 1$ mesh points $\phi_1, \phi_2, \ldots, \phi_{N+1}$. Equations (FNL1) and (FNL2) are then reduced to a set of nonlinear algebraic equations that can be solved by Newton's method. Numerical computation is performed for given values of $F$, $B$, hob, and $\theta_l$. In the case of the isosceles triangular obstacle, numerical solutions are found to be qualitatively similar for different values of $\theta_l$. In this study, we set $\theta_l = \pi/4$ for a bump and $\theta_l = -\pi/4$ for a dip. The sensitivity of the numerical solutions to the number of grid points $N$ and the grid spacing $\Delta\phi$ is investigated for various values of $N$ and $\Delta\phi$ until the results agree within graphical accuracy. All results presented here are obtained with $N = 800$ and $\Delta\phi = 0.10$.
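The Newton step used for such discretized systems can be sketched generically as follows. This is only an illustration under stated assumptions: the residual below is a two-equation toy system standing in for the discretized (FNL1) and (FNL2), the Jacobian is approximated by forward differences, and the function names `solve_linear`, `newton` and `toy_residual` are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("singular Jacobian")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def newton(residual, u0, tol=1e-10, max_iter=50, eps=1e-7):
    """Newton iteration with a forward-difference Jacobian."""
    u = list(u0)
    for _ in range(max_iter):
        r = residual(u)
        if max(abs(v) for v in r) < tol:
            return u
        n = len(u)
        jac = [[0.0] * n for _ in range(n)]
        for j in range(n):
            shifted = u[:]
            shifted[j] += eps
            rj = residual(shifted)
            for i in range(n):
                jac[i][j] = (rj[i] - r[i]) / eps
        delta = solve_linear(jac, [-v for v in r])
        u = [ui + di for ui, di in zip(u, delta)]
    raise RuntimeError("Newton iteration did not converge")


def toy_residual(u):
    """Toy stand-in for the discretized equations: x^2 + y^2 = 4, x y = 1."""
    x, y = u
    return [x * x + y * y - 4.0, x * y - 1.0]


if __name__ == "__main__":
    print(newton(toy_residual, [2.0, 0.5]))
```

In a real computation the unknown vector would hold the values of $\theta$ (and, for critical flow, $F$) at the mesh points, and the residual would evaluate the discretized boundary-integral and Bernoulli equations there.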

3.1 Free-Surface Flows over an Obstacle Without Surface Tension

3.1.1 Bump (hob > 0)

In the case of subcritical flow, there exist two types of solutions. The first type is characterized by a train of nonlinear waves behind the obstacle (Forbes and Schwartz [6], see Fig. 1a). The second type can be called the "drag-free" solution, as shown in Fig. 1b. When the Froude number $F$ decreases to its critical value $F_c$ (solid line with squares in Fig. 2), the amplitude of the nonlinear waves of the first type decreases and ultimately vanishes. The drag-free solution exists in the region to the left of the solid line with squares. For supercritical flows, solutions exist with a symmetric free-surface profile in the form of an elevation wave for given values of $F$ and hob. Two types of supercritical solutions can be found: one is a perturbation of a uniform flow (Forbes and Schwartz [6]), the so-called type 1 supercritical solution, whereas the other is a perturbation of a solitary wave (Vanden-Broeck [11]), the so-called type 2 supercritical solution. Unlike the above flow regimes, critical flow over an obstacle depends on only one parameter, which can be chosen to be the obstacle height hob. This type of solution is traditionally called the "hydraulic fall" (see Fig. 1c). Regions of existence of the different types of solutions are illustrated in Fig. 2. In summary, for pure gravity flow over a bump, there exist two types of supercritical and subcritical flows, and a one-to-one correspondence of $F$ and hob for hydraulic fall solutions.

Fig. 1 Typical free-surface profiles of flows over a bump. (a) Nonsymmetric subcritical solution for $F = 0.70$ and hob = 0.10. (b) Symmetric subcritical solution for $F = 0.20$ and hob = 0.10. (c) Critical solution for $F = 0.62$ and hob = 0.30

Fig. 2 Solution diagram in the $(F, \text{hob})$-plane of free-surface flows over a bump [plot not reproduced; horizontal axis $F$ from 0 to 1.6, vertical axis hob from 0 to 0.8; labelled regions: subcritical flow (symmetric and nonsymmetric), critical flow, supercritical flow, no solution]

3.1.2 Dip (hob < 0)

For subcritical flow, there exist two types of solutions: one is characterized by a train of nonlinear waves and the other by an elevation wave. The first type of solution is depicted in Fig. 3a for $F = 0.60$ and hob = −0.20. The downstream behavior of this flow is similar to the case of a bump except in the dip region, where the free surface is uplifted. The other subcritical solution takes the form of an elevation profile that is symmetric with respect to the obstacle. A typical profile is shown in Fig. 3b. For supercritical flow, it is found that a unique solution exists as a symmetric depression wave (see Fig. 3c). Weakly nonlinear solutions of subcritical and supercritical flows are found to be in good qualitative agreement with the fully nonlinear results. In the case of critical flow, the upstream free surface is elevated in the region of the dip, with a hydraulic fall downstream of the obstacle. Typical profiles of fully nonlinear and weakly nonlinear critical solutions are shown in Fig. 4a and c. A fully nonlinear phase trajectory is illustrated in Fig. 4b. Figure 5 illustrates the regions of existence of solutions for the subcritical, supercritical, and critical flow regimes in the presence of a dip.

Fig. 3 Typical free-surface profiles of flows over a dip. (a) Subcritical solution of the first type (nonsymmetric) for $F = 0.60$ and hob = −0.20. (b) Subcritical solution of the second type (symmetric) for $F = 0.20$ and hob = −0.20. (c) Supercritical solution for $F = 1.50$ and hob = −0.50

Fig. 4 Critical flows over a dip. (a) Fully nonlinear solution for $F = 0.82$ and hob = −0.45. (b) Plot of $y - 1$ versus $dy/dx = \tan\theta$ of the fully nonlinear phase trajectories for (a). (c) Weakly nonlinear profile for $F = 0.79$ and hob = −0.25

Fig. 5 Solution diagram in the $(F, \text{hob})$-plane of free-surface flows over a dip. The Froude number on the dashed line with diamonds and the dash-pointed line with squares is 0.24 and 1.04, respectively [plot not reproduced; horizontal axis $F$ from 0 to 1.4, vertical axis hob from 0 to −0.6; labelled regions: subcritical flow (symmetric and nonsymmetric), critical flow, supercritical flow, no solution]

3.2 Free-Surface Flows over an Obstacle with Surface Tension

3.2.1 Bump (hob > 0)

For subcritical flow, a nonsymmetric solution with downstream waves and a nonphysical wave train of small amplitude on the upstream side was proposed by Forbes [5]. In our computation, an upstream radiation condition is imposed to remove this unwanted numerical phenomenon. In Fig. 6a, the amplitude of the train of nonlinear waves increases, with decreasing wavelength, as the Bond number increases for the nonsymmetric solution of subcritical flow. The shape of a nonlinear free-surface profile for a symmetric solution of subcritical flow, when $F = 0.50$ and hob = 0.15, is shown in Fig. 6b. It is found that solutions for small Bond numbers exhibit a sharper trough at $x = 0$ than those for large Bond numbers. For supercritical flows, as shown in Fig. 7, the maximum free-surface elevation of the type 1 solution is slightly greater than that obtained in the case $B = 0$. For solutions of type 2, nonuniformity of the surface tension effect is found in the numerical results. That is, as the Bond number increases, the maximum level of the elevation wave increases when the obstacle height is small (see Fig. 7b) but decreases when the obstacle height is large (see Fig. 7c).

For a given Bond number, the free-surface profile on the upstream side of critical flows changes from a rigid lid to a profile with an elevated free surface as hob decreases to the critical height hob* (see Fig. 8a). For hob < hob*, a critical flow solution does not exist. Fully nonlinear phase trajectories of these results are shown in Fig. 8b. For each case, the trajectory starts at the fixed point $y - 1 = 0$ with a downward jump to the solitary wave orbit and then returns to another fixed point $y - 1 = \frac{2}{3}(F^2 - 1)$. Behavior similar to this case of a fixed Bond number with various obstacle heights is also found when the obstacle height is fixed and the Bond number varies. In particular, when the Bond number decreases to its critical value $B^*$, the amplitude of the upstream wave increases and the shape of the free surface above the bump ultimately approaches a dimple-like profile whose amplitude is of $O(\text{hob}^2)$, as shown in Fig. 8c. Weakly nonlinear results and weakly nonlinear phase portraits of critical flows are shown in Fig. 8e–f for hob = 0.10 and $B = 0.10$, 0.20, and 0.30.

Fig. 6 Gravity-capillary subcritical flows over a bump. (a) Nonsymmetric solutions for hob = 0.10, $F = 0.70$ and $B = 0$, 0.005, and 0.01. (b) Symmetric solutions for hob = 0.15, $F = 0.50$ and $B = 0.10$, 0.20, 0.30, and 0.40

Fig. 7 Typical free-surface profiles of gravity-capillary supercritical flows over a bump. (a) Type 1 solutions for $F = 1.20$, hob = 0.25 and $B = 0$, 0.05, 0.10. (b–c) Type 2 solutions for $F = 1.20$, $B = 0$, 0.02, 0.04 and hob = 0.10 and 0.25, respectively

3.2.2 Dip (hob < 0)

For given values of hob and $F$, the amplitude and wavelength of the downstream waves of the type 1 solution of subcritical flow decrease as the Bond number increases. Typical free-surface profiles are shown in Fig. 9a. This solution can be found only for small values of the Bond number. When the Bond number increases ($B = 0.05 \to 0.19$), the maximum level of the elevation wave over a dip for a type 2 symmetric solution of the subcritical flow decreases (see Fig. 9b). Typical profiles of supercritical flow over a dip for various values of the Bond number are shown in Fig. 9c.
The minimum level of flow over a dip is found to be a decreasing function of the Bond number. In the case of critical flow, as the Bond number increases, the maximum elevation over a dip increases whereas the far downstream level decreases, as shown in Fig. 9d. It should be noted that, for critical flow, the Froude number $F$ is treated as part of the solution and varies inversely with the far downstream free-surface elevation.

Fig. 8 Gravity-capillary critical flows over a bump. (a) Fully nonlinear solutions for $B = 0.10$ and hob = 0.165, 0.20, 0.25, 0.30. The critical height hob* is 0.165. (b) Values of $y - 1$ versus $dy/dx = \tan\theta$ of the fully nonlinear phase trajectories for (a). (c) Fully nonlinear solutions for hob = 0.20 and $B = 0.10$, 0.20, 0.30. The critical Bond number $B^*$ is 0.10. (d) Fully nonlinear phase trajectories for (c). (e) Weakly nonlinear solutions for hob = 0.10 and $B = 0.10$, 0.20, and 0.30. (f) Weakly nonlinear phase portrait for (e) showing $A$ versus $A_x$

Fig. 9 Typical fully nonlinear free-surface profiles of gravity-capillary flows over a dip. (a) Subcritical flows of type 1 for hob = −0.40, $F = 0.60$ and $B = 0.0$, 0.002, and 0.004. (b) Subcritical flows of type 2 for hob = −0.30, $F = 0.40$ and $B = 0.05$, 0.10, 0.15, and 0.19. (c) Supercritical flows for hob = −0.40, $F = 1.50$ and $B = 0$, 0.05, 0.10, and 0.15. (d) Critical flows for hob = −0.20 and $B = 0$, 0.02, 0.04, and 0.06

4 Conclusion

Subcritical, supercritical and critical flows of gravity-capillary waves over a triangle-shaped bump and dip are considered. Fully nonlinear solutions are calculated by using the boundary integral equation technique. When the flow is subcritical or supercritical, there exists a three-parameter family of solutions ($F$, $B$ and hob). For the critical flow, it is found that there is a two-parameter family of solutions ($B$ and hob). In this paper, new solutions of subcritical and critical flows over a bump and critical flows over a dip are found for both fully nonlinear and weakly nonlinear problems.

Acknowledgements

This work was partially supported by the Graduate and Faculty of Science, Chulalongkorn University, the National Research Council of Thailand, the Franco-Thai Cooperation Program in Higher Education, and Advanced Virtual and Intelligent Computing (AVIC) Research Center.


References

1. Binder, B.J., Vanden-Broeck, J.-M., Dias, F.: Forced solitary waves and fronts past submerged obstacles. Chaos 15, 037106-1–13 (2005)
2. Dias, F., Vanden-Broeck, J.-M.: Generalised critical free-surface flows. J. Eng. Math. 42, 291–301 (2002)
3. Dias, F., Vanden-Broeck, J.-M.: Open channel flows with submerged obstructions. J. Fluid Mech. 206, 155–170 (1989)
4. Forbes, L.K.: Critical free-surface flow over a semi-circular obstruction. J. Eng. Math. 22, 3–13 (1988)
5. Forbes, L.K.: Free-surface flow over a semicircular obstruction, including the influence of gravity and surface tension. J. Fluid Mech. 127, 283–297 (1983)
6. Forbes, L.K., Schwartz, L.W.: Free-surface flow over a semi-circular obstruction in a channel. J. Fluid Mech. 114, 299–314 (1982)
7. Lamb, H.: Hydrodynamics. Cambridge University Press, Cambridge (1945)
8. Shen, S.S.P.: On the accuracy of the stationary forced Korteweg-de Vries equation as a model equation for flows over a bump. Quart. Appl. Math. 53, 701–719 (1995)
9. Shen, S.S.P., Shen, M.C.: Notes on the limit of subcritical free-surface flow over an obstruction. Acta Mech. 82, 225–230 (1990)
10. Shen, S.S.P., Shen, M.C., Sun, S.M.: A model equation for steady surface waves over a bump. J. Eng. Math. 23, 315–323 (1989)
11. Vanden-Broeck, J.-M.: Free-surface flow over an obstruction in a channel. Phys. Fluids 30, 2315–2317 (1987)
12. Zhan, Y., Zhu, S.: Open channel flow past a bottom obstruction. J. Eng. Math. 30, 487–499 (1996)


The Relation Between the Gene Network and the Physical Structure of Chromosomes

Dieter W. Heermann, Manfred Bohn, and Philipp M. Diesinger

1 Introduction

Human cells contain 46 chromosomes with a total length of about 5 cm of a beads-on-a-string type of nucleosomal fibre, called chromatin. Packaging this into a nucleus of typically 5–20 μm diameter requires extensive compactification. This packaging cannot be random, as considerable evidence has been gathered that chromatin folding is closely related to local genome function. However, the different levels of compactification are poorly understood and not easily accessible by experiments. The consensus is that chromosomes are folded and compactified on several length scales. The lowest level of compactification is the nucleosome [27], consisting of a cylindrical histone octamer and a stretch of DNA which is wrapped around the histone complex approximately 1.65 times. The histone octamer consists of four pairs of core histones (H2A, H2B, H3 and H4) and is known up to atomistic resolution [8, 15]. The nucleosomes are connected by naked DNA strands and together with these linkers they form the so-called 30 nm fibre. The histone H1 (and the variant histone H5 with similar structure and functions) is involved in the packing of the beads-on-a-string structure into the 30 nm chromatin structure (the second level of compaction). To do so, it sits in front of the nucleosome, keeping in place the DNA which is wrapped around the histone octamer, and thus stabilizes the chromatin fibre. The folding motifs of the chromatin fibre on the scale of the entire chromosome are totally unclear. Imaging techniques do not allow one to follow the folding path of the fibre in the interphase nucleus. Therefore, indirect approaches have been used to obtain information on the folding [12, 20]. There is an ever growing body of evidence that chromatin loops play a dominant role in transcriptional regulation [16].

D.W. Heermann, M. Bohn, P.M. Diesinger: Institute for Theoretical Physics, University of Heidelberg, Philosophenweg 19, 69120 Heidelberg, Germany
H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_13, © Springer-Verlag Berlin Heidelberg 2012


It was suggested that genes from different positions on a chromosome assemble into transcription factories showing a high degree of gene activity. These studies indicate that there is a tight and probably causal relationship between the folding of the chromatin fibre and its transcriptional activity. Models and simulations are required to gain understanding and to compare with the mostly indirect evidence of the folding motifs. There are challenges here too, both on the modelling side and on the numerical methods side. It is practically impossible to do simulations on the level of individual atoms or DNA base pairs. Coarse-grained descriptions are necessary due to the enormous number of constituents of the system. This holds true for all the scales involved. On the scale of the 30 nm fibre it is impossible to use atomistic simulations and methods, and this is clearly even more so on the scale of an entire chromosome. Further new methods are necessary to take into account the dynamic nature of the problem. The individual association of molecules to chromosomes that alter the folding motifs can certainly not be taken into account on an individual basis. Here methods that use statistics are required. Taking all this together, multi-scale methods are clearly needed to couple the information on the different levels. These are yet to be invented, however.

2 Modelling on the 30 nm Scale

Chromatin can be described by the two-angle model which was introduced by Woodcock et al. [28] to model the geometry of the 30 nm chromatin fibre. In the framework of the extended two-angle model ("E2A model", cf. Fig. 1) the nucleosomes are characterized by their centers $N_i \in \mathbb{R}^3$ and their orientations $\hat p_i \in \mathbb{R}^3$. The linkers between the centers of two nucleosomes are denoted by $b_i := N_i - N_{i-1}$ with $i = 1, \ldots, N$ for a fibre of $N$ nucleosomes. The length $\|b_i\|$ of the linkers is a further input parameter of the model (in contrast to the direction of the linkers $b_i \in \mathbb{R}^3$). Furthermore, the entry-exit angle $\alpha_i \in [0, \pi]$ between two consecutive linkers is defined by $\alpha_i := \angle(-b_i, b_{i+1})$ with $i = 1, \ldots, N-1$, and the rotational angle $\beta_i \in [0, \pi]$ between two consecutive orientations is given by $\beta_i := \angle(\hat p_{i-1}, \hat p_i)$ with $i = 1, \ldots, N$. Moreover, $h_i$ represents the distance along the orientational axis $\hat p_{i-1}$ from $N_{i-1}$ to $N_i$ due to the spatial discrepancy between the in- and outgoing DNA strands. $h_i$ can be expressed by the vertical distances $d_i$ which the DNA covers by wrapping itself around the histone complexes: $h_i = \frac{1}{2}(d_{i-1} + d_i)$ with $i = 1, \ldots, N$. The fibre can be constructed by an iterative process. A further part of the model is the presence of an H1 histone, which is assumed to be present with probability $p$.

Fig. 1 The figure shows the basic parameters of the E2A model: the entry-exit angle $\alpha_i$, the rotational angle $\beta_i$, the linker length $b_i$ and the vertical distance $d_i$ between the in- and outgoing linker. We chose a large entry-exit angle here to make the visualization clear

The first nucleosome center and its orientation are arbitrary. We chose
\[ N_0 = (0, 0, 0)^T, \qquad \hat p_0 = (0, 0, 1)^T. \]
The following vectors fulfil the conditions of the two-angle model for the second nucleosome location and its orientation:
\[ N_1 = N_0 + \sqrt{\|b_1\|^2 - h_1^2}\,(1, 0, 0)^T + h_1 \hat p_0 = \left(\sqrt{\|b_1\|^2 - h_1^2},\; 0,\; h_1\right)^T \]
and
\[ \hat p_1 = R_{\hat a}^{\beta_1}\, \hat p_0 \quad \text{with} \quad \hat a = (1, 0, 0)^T, \]
where $R_{\hat a}^{\beta}$ denotes the rotation by the angle $\beta$ about the axis $\hat a$. Now we can calculate $N_{i+1}$ and $\hat p_{i+1}$ in dependence of $N_i$, $N_{i-1}$, $\hat p_i$ and $\hat p_{i-1}$. With
\[ v_i := -b_i + \langle \hat p_i, b_i \rangle\, \hat p_i \]
and
\[ v_i' := R_{\hat p_i}^{\alpha'} \sqrt{b_{i+1}^2 - h_{i+1}^2}\; \frac{v_i}{\|v_i\|} + h_{i+1}\, \hat p_i \tag{1} \]
one gets the location of nucleosome $i + 1$ by $N_{i+1} = N_i + v_i'$. Here $\alpha'$ is the angle between the projections of $b_{i+1}$ and $-b_i$ onto an arbitrary plane orthogonal to $\hat p_i$. We need to calculate the dependence of this projected entry-exit angle $\alpha'$ on the actual entry-exit angle $\alpha$. Using the law of cosines one gets
\[ l^2 = b_i^2 + b_{i+1}^2 - 2 b_i b_{i+1} \cos(\alpha). \tag{2} \]
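The construction step of Eq. (1) can be sketched in a few lines. This is an illustrative reading rather than the authors' implementation: it assumes that $v_i$ is the projection of the reversed linker $-b_i$ onto the plane orthogonal to $\hat p_i$, that the rotation is a right-handed Rodrigues rotation about the unit vector $\hat p_i$, and that the in-plane length is $\sqrt{b_{i+1}^2 - h_{i+1}^2}$ so that the new linker has length $b_{i+1}$. All function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def norm(u):
    return math.sqrt(dot(u, u))


def rotate(v, axis, angle):
    """Rodrigues' rotation of v by `angle` about the unit vector `axis`."""
    c, s = math.cos(angle), math.sin(angle)
    axv = cross(axis, v)
    proj = dot(axis, v)
    return tuple(v[k] * c + axv[k] * s + axis[k] * proj * (1.0 - c) for k in range(3))


def next_linker(b_i, p_i, alpha_proj, b_next, h_next):
    """Linker from nucleosome i to i+1 for a projected entry-exit angle alpha_proj."""
    along = dot(p_i, b_i)
    v = tuple(-b + along * p for b, p in zip(b_i, p_i))  # projection of -b_i
    length = norm(v)
    unit = tuple(c / length for c in v)
    turned = rotate(unit, p_i, alpha_proj)
    in_plane = math.sqrt(b_next ** 2 - h_next ** 2)
    return tuple(in_plane * t + h_next * p for t, p in zip(turned, p_i))


def entry_exit_angle(b_i, b_next_vec):
    """Actual entry-exit angle between -b_i and the next linker."""
    cos_alpha = -dot(b_i, b_next_vec) / (norm(b_i) * norm(b_next_vec))
    return math.acos(max(-1.0, min(1.0, cos_alpha)))


if __name__ == "__main__":
    p = (0.0, 0.0, 1.0)
    b1 = (math.sqrt(10.0 ** 2 - 1.0 ** 2), 0.0, 1.0)
    v = next_linker(b1, p, math.radians(40.0), 10.0, 1.0)
    print("linker length:", norm(v))
    print("actual angle [deg]:", math.degrees(entry_exit_angle(b1, v)))
```

For vanishing vertical shifts the projected and the actual entry-exit angles coincide. With non-zero shifts they differ, which is why a relation between the two angles is needed.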

Now we will use an affine transformation $T$ to a new coordinate system, $(x, y, z) \xrightarrow{T} (x', y', z')$, in order to get a second relation for $l$. We shift the origin to $N_i$ and rotate our old coordinate system so that $\hat p_i$ corresponds to the new $z$-axis. Furthermore, the new $x$-axis has to coincide with the projection of $b_i$ onto any plane orthogonal to $\hat p_i$. Obviously,
\[ l^2 = \|b_i + v_i'\|^2 = \|b_i' + v_i''\|^2 \]
with
\[ b_i \xrightarrow{T} b_i' = \left( \sqrt{b_i^2 - \langle \hat p_i, b_i \rangle^2},\; 0,\; \langle \hat p_i, b_i \rangle \right)^T \]
and
\[ v_i' \xrightarrow{T} v_i'' = \left( -\cos(\alpha') \sqrt{b_{i+1}^2 - h_{i+1}^2},\; \sqrt{b_{i+1}^2 - h_{i+1}^2 - \cos^2(\alpha')\,(b_{i+1}^2 - h_{i+1}^2)},\; h_{i+1} \right)^T. \]
This leads to
\[ l^2 = b_{i+1}^2 + b_i^2 + 2 h_{i+1} \langle \hat p_i, b_i \rangle - 2 \cos(\alpha') \sqrt{b_i^2 - \langle \hat p_i, b_i \rangle^2}\, \sqrt{b_{i+1}^2 - h_{i+1}^2}. \tag{3} \]
By comparing (2) and (3) one eventually gets
\[ \cos(\alpha') = \frac{b_i b_{i+1} \cos(\alpha) + h_{i+1} \langle \hat p_i, b_i \rangle}{\sqrt{b_{i+1}^2 - h_{i+1}^2}\, \sqrt{b_i^2 - \langle \hat p_i, b_i \rangle^2}} \]
with the boundary condition
\[ \alpha' > \alpha_{\min} = \arccos\left( -\frac{(h_{i+1} + \|\langle \hat p_i, b_i \rangle\|)^2 - b_{i+1}^2 - b_i^2}{2 b_i b_{i+1}} \right) \tag{4} \]
due to non-vanishing $d_i$ and $d_{i+1}$. The calculation of $N_{i+1}$ is complete, since we now know the dependence of $\alpha'$ on $\alpha$ and can therefore use (1) to determine $N_{i+1}$. But one still has to calculate the orientation $\hat p_{i+1}$ of nucleosome $N_{i+1}$. Due to the fixation of the in- and outgoing DNA strands by the H1 histones, this orientation can be calculated by a rotation around the following normalized axis $\hat a$:
\[ \hat a := \frac{b_{i+1} - \langle \hat p_i, b_{i+1} \rangle\, \hat p_i}{\|b_{i+1} - \langle \hat p_i, b_{i+1} \rangle\, \hat p_i\|}. \]
$\hat p_{i+1}$ then follows by a rotation of $\hat p_i$ around this axis:
\[ \hat p_{i+1} = R_{\hat a}^{\beta_{i+1}}\, \hat p_i. \]
These equations can be solved analytically in some cases and thus supply the basis of our Monte Carlo model to describe the structure of chromatin [5].

It has been shown that the excluded volume of the histone complex plays a very important role for the stiffness of the chromatin fibre [18] and for the topological constraints during condensation/decondensation processes [3]. In [22] a rough approximation of the forbidden surface in the chromatin phase diagram was given. In a previous work of ours [5] we answered questions concerning the fine structure of the excluded volume borderline, which separates the allowed and forbidden states in the phase diagram, under the basic assumption of spherical nucleosomes and no vertical shift between the in- and outgoing strands. Furthermore, we were able to show analytically that the very irregular shape of the excluded volume borderline comes from an underlying prime factorization problem. In a subsequent work [6] we presented a Ramachandran-like diagram for chromatin fibres with cylindrical nucleosomes for the extended model and furthermore discussed the influence of a vertical shift between the linkers due to H1 histones and the volume exclusion of the DNA. This diagram is shown in Fig. 2. The coloured lines represent the phase transition between allowed and forbidden states. All states below the corresponding line are forbidden, those above it are allowed. The states near the excluded volume borderline are the most interesting ones in the phase diagram since they are the most compact ones. The gaps in the borderline might be used by the fibre to become (at least locally) very dense.

The nucleosome-nucleosome and the nucleosome-DNA interactions are highly complex and still an area of current research. We avoided the need for these potentials by using experimental data [29] for the distribution of the nucleosome repeat length (NRL) and taking advantage of the fact that the local chromatin parameters are not independent. This makes it possible to partially invert the convolution of the probability distributions (which is given by the experimental data) and thus to get information on the individual distributions of our model parameters.

Fig. 2 Cut-out of the chromatin phase diagram (for different $d$). The states below the corresponding lines are forbidden due to excluded volume interactions. With increasing $d$ more and more states become accessible to the fibre [plot not reproduced; title: phase diagram of the chromatin fiber; horizontal axis $\alpha$ in degrees from 10 to 100, vertical axis $\beta$ in degrees from 0 to 25; one curve per value of $d$ between 0.00 nm and 2.33 nm]

Making use of given parameter distributions for the model parameters gives us the advantage of saving computation time that would otherwise be spent on the equilibration of the fibres. The saved computation time can then be used to generate very large fibres (i.e. chromatin fibres consisting of several Mbp). Of course, excluded volume potentials for the DNA and the nucleosomes are taken into account: the DNA has a tube-like shape and the nucleosomes have the excluded volume of flat cylinders. An example conformation of such a regular chromatin fibre (that is, a fibre without defects) is shown in Fig. 5. Integrating these experimental parameter distributions does not lead to one specific chromatin fibre structure, but instead to a distribution of structures in the chromatin phase diagram (cf. Fig. 3).

A further part of the model is the presence of an H1 histone, which is assumed to be present with a fixed probability [6, 7]. For a certain nucleosome $N_i$ the defect probability $p$ gives the chance of a missing H1 histone. If the histone is missing, the in- and outgoing DNA strands are no longer fixed in front of the nucleosome but instead are arbitrary, subject to the excluded volume interactions of the chromatin strand (cf. Fig. 4b). The second chromatin feature that we included in the model is the possibility for nucleosomes to dissolve entirely so that only naked DNA stretches remain. These DNA strands are modelled as worm-like chains with a diameter of 2.2 nm and a persistence length of 50 nm, as illustrated in Fig. 4a. In a real cell nucleus, nucleosome-free regions are likely to be occupied by regulatory proteins.


Fig. 3 A single point in this phase diagram corresponds to a specific chromatin structure. The forbidden structures lie left and below the dashed line which is the excluded volume borderline [5]. Due to the parameter distributions in our model we do not expect a specific chromatin structure but instead a distribution of structures in the phase diagram. This probability distribution is shown in the back of the figure

To estimate the average rate of nucleosome skips we used data for the average nucleosome occupancy per bp [25] that was obtained by experiments combined with a probabilistic prediction model. We use a prediction of the nucleosome occupancy for the entire yeast genome [23]. An example conformation of such a disturbed chromatin fibre with fixed depletion rates is shown in Fig. 5. Depletion of linker histones and nucleosomes massively affects the flexibility and the extension of chromatin fibres. Increasing the number of nucleosome skips (i.e., nucleosome depletion) can lead either to a collapse or to a swelling of chromatin fibres. These opposing effects were discussed and we showed that depletion effects might even contribute to chromatin compaction. Furthermore, we found that predictions from experimental data for the average nucleosome skip rate lie exactly in the regime of maximum chromatin compaction. We determined the pair distribution function of chromatin. This function reflects the structure of the fibre, and its Fourier transform can be measured experimentally. Our calculations show that even in the case of fibres with depletion effects, the dominant peaks (characterizing the structure and the length scales) can still be identified, which might lead to new experimental approaches for determining chromatin structure, for instance by light optical methods.


Fig. 4 Illustration of two kinds of histone depletion. (a) An example of an individual nucleosome skip. If a nucleosome is dissolved, a blank stretch of DNA will remain. The naked DNA stretches have lengths of integer multiples of the nucleosome repeat length plus the length of one DNA linker and can lead either to a collapse or to a swelling of the chromatin fibre. In both cases they increase the flexibility of the chromatin chain. (b) An example conformation of a chromatin fibre with a missing linker histone. The upper strand and the strand below the defect are regular, i.e. the local fibre parameters are fixed. Please note that the fibre is shown very open to make the visualization clear

Fig. 5 (a) Example conformation of a chromatin strand of length 40 kbp. The light blue tubes represent the DNA, the histone octamers are modeled as purple cylinders and the linker histones are marked pale yellow. This chromatin conformation with a diameter of about 34 nm has no depletion effects (i.e. it is regular). (b) Example conformation of a chromatin fibre with depletion effects: The linker histone skip rate is 6 and the nucleosome skip rate is 8. The linker histone skips are marked orange. One can see that the concept of a regular 30 nm fibre does not hold anymore. Instead one obtains very flexible coil-like structures of compact regions which are separated by naked DNA stretches. Shown is a section of the fibre which has a total length of 394 kbp

3 Modelling on the Large Scale

The folding of chromatin above the scale of about 100 kb is, to a surprising extent, unknown. The limited resolution of light microscopy does not allow tracking the path of the chromatin fibre in vivo. An ever growing body of evidence suggests that chromatin folding is tightly connected to genome function. A pivotal role in maintaining this connection is attributed to the formation of chromatin loops, i.e. the possibility of genes and regulatory elements to co-locate [16, 26]. The formation of these loops is dynamic: different genes interact with the control sequences during development in a mutually exclusive way, correlated with their expression. Loops have also been associated with the formation of transcription factories, which bring together transcriptionally active genes [10, 19]. Despite recent progress in understanding some links between genome folding and function, a coherent connection has not been established yet.

Polymer models are able to shed light on the most important features of chromatin folding, being able to make predictions as well as to explain experimental data on the basis of very general assumptions. An interesting outcome of recent experiments [20] is that the mean square displacement $\langle R^2 \rangle$ between two fluorescent markers becomes independent of the genomic separation $g$ at about 5–10 Mb (see Fig. 6), indicating a folding of chromosomes into a confined space of the nucleus.

Let us assume that we can approximate the chromosomal conformations on the scale above 100 kb by a bead-spring or linker type of polymer model, the chain consisting of $N$ uncorrelated, equal subunits of length $b$. Such a description of a biological polymer is correct when we make $N$ sufficiently small so that $b$ is larger than the persistence length of chromatin [11]. Three basic polymer models are commonly used for comparison with experimental data: (a) the random walk (RW) model, where no volume interactions are taken into account; (b) the self-avoiding walk (SAW) model, which takes excluded volume into account; and (c) the globular state (GS) model, which furthermore includes temperature-dependent attractive interactions [4]. One characteristic feature of a polymer model is the mean squared end-to-end distance $\langle R_N^2 \rangle$, which displays a typical scaling behaviour,
\[ \langle R_N^2 \rangle = b^2 N^{2\nu} \tag{5} \]
in the limit of large $N$, where $b$ is the linker length, $N$ the chain length and $\nu$ a constant depending on the model used: $\nu = 0.5$ for the RW, $\nu \approx 0.588$ for the SAW and $\nu = 1/3$ for the GS.
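The scaling law (5) can be checked numerically for the simplest of the three models. The sketch below samples freely jointed random-walk chains in three dimensions and estimates $\nu$ from the mean squared end-to-end distance at two chain lengths. It also returns $\langle R^4 \rangle$, so that the dimensionless ratio $\langle R^4 \rangle / \langle R^2 \rangle^2$ can be formed from the same samples. The sampler, the chain lengths, the number of chains and the seed are assumptions chosen for illustration only. The SAW and GS models would need a different sampler.

```python
import math
import random


def random_unit_vector(rng):
    """Uniform random direction in 3D (z uniform in [-1, 1], azimuth uniform)."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)


def squared_end_to_end(n_links, b, rng):
    """Squared end-to-end distance of one freely jointed chain."""
    x = y = z = 0.0
    for _ in range(n_links):
        dx, dy, dz = random_unit_vector(rng)
        x += b * dx
        y += b * dy
        z += b * dz
    return x * x + y * y + z * z


def moments(n_links, b, n_chains, rng):
    """Sample estimates of <R^2> and <R^4> over n_chains independent chains."""
    r2_values = [squared_end_to_end(n_links, b, rng) for _ in range(n_chains)]
    mean_r2 = sum(r2_values) / n_chains
    mean_r4 = sum(v * v for v in r2_values) / n_chains
    return mean_r2, mean_r4


def scaling_exponent(n_small, n_large, b=1.0, n_chains=1000, seed=1):
    """Estimate nu in <R^2> = b^2 N^(2 nu) from two chain lengths."""
    rng = random.Random(seed)
    r2_small, _ = moments(n_small, b, n_chains, rng)
    r2_large, _ = moments(n_large, b, n_chains, rng)
    return 0.5 * math.log(r2_large / r2_small) / math.log(n_large / n_small)


if __name__ == "__main__":
    print("estimated nu:", scaling_exponent(50, 200))
    r2, r4 = moments(100, 1.0, 1000, random.Random(2))
    print("<R^4>/<R^2>^2:", r4 / (r2 * r2))
```

For the random walk the estimate should scatter around $\nu = 0.5$, and the moment ratio should lie close to the Gaussian-chain value of $5/3$, up to sampling noise and small finite-$N$ corrections.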

Fig. 6 Distance measurements in fibroblast cells. (a) Plots show the mean square physical distances $\langle R^2 \rangle$ between two fluorescent markers as a function of the genomic distance for regions of increased gene density (ridges, green) and gene-poor regions (anti-ridges, red) on human chromosomes 1 and 11 in the 0.5–10 Mb range. Data points in green and red correspond to the ridges and anti-ridges, respectively [20]. Error bars represent standard errors. (b) The mean square displacement $\langle R^2 \rangle$ is shown as a function of genomic distance in the 25–75 Mb range. Error bars represent standard errors [plots not reproduced; panels for chromosome 1 and chromosome 11; horizontal axis: genomic separation in Mb, vertical axis: mean square distance in µm²]

We compare data from experiments (Fig. 6) to these polymer models by calculating the moment ratio $\langle R^4 \rangle / \langle R^2 \rangle^2$ [1]. It has the advantage of being dimensionless and of containing information about the fluctuations. Interestingly, the experimental data display pronounced deviations from these simple polymer models (Fig. 7): the ratios are larger even than for the RW model, indicating huge distance fluctuations inside the cell nucleus.

Based on these observations we propose a general polymer model, the Random Loop (RL) model [2], which is able to explain the observed levelling-off in the mean-square distance as well as the large cell-to-cell variation. The model takes into account the looping of the chromatin fibre. In contrast to other chromatin models [13, 17, 24], the RL model for the first time includes two important aspects of chromatin folding. Firstly, loops are assumed to be dynamic, i.e. the loop attachment points are not fixed throughout the ensemble. Secondly, the model allows the formation of loops of all sizes, in agreement with experimental evidence [26]. The RL model assumes a chain of length $N$, with spatial bead positions $x_0, \ldots, x_N$, subject to the following potential:

$$U = \frac{\kappa}{2} \sum_{j=1}^{N} \| x_j - x_{j-1} \|^2 + \frac{1}{2} \sum_{\substack{i<j \\ |i-j|>1}}^{N} \kappa_{ij}\, \| x_i - x_j \|^2 \tag{6}$$

Fig. 7 The moment ratio $\langle R^4 \rangle / \langle R^2 \rangle^2$ (vertical axis) as a function of genomic distance [Mb] (horizontal axis) for the experimental data from human chromosomes 1 and 11 (Fig. 6) and data from the murine Igh locus [14]. The ratios are compared to the random walk (RW), self-avoiding walk (SAW) and globular state (GS) polymer models [1]. The large ratios of the experimental data indicate a huge cell-to-cell variation

The first term describes the connectivity of the chain, while the second term describes the formation of random loops. The $\kappa_{ij} = \kappa_{ji}$ are the spring constants for the loop attachment points. For now we keep them arbitrary; they will be chosen randomly later within the model. The probability density for a bead conformation $(x_0, \ldots, x_N)$ in the canonical ensemble is given by $P(x_0, \ldots, x_N) = C \exp(-U / k_B T)$. Eliminating the degrees of freedom stemming from the translational invariance and factorizing the spatial dimensions, we can rewrite the one-dimensional probability density as

$$P_1(x_1, \ldots, x_N) = C_1 \exp\left( -\tfrac{1}{2} X^T K X \right) \tag{7}$$

where $X = (x_1, \ldots, x_N)^T$ and $K$ is a matrix made up of the $\kappa_{ij}$ [2]. Assuming $K$ to be symmetric and positive semi-definite, we can integrate out some degrees of freedom and obtain the probability distribution for the coordinates of two arbitrary beads $I$ and $J$,

$$P(x_I, x_J) = \int \cdots \int \prod_{i=1,\; i \neq I,J}^{N} dx_i \; P(x_1, \ldots, x_N) \tag{8}$$

This integral can be evaluated by standard methods for normal distributions. Going back to three dimensions, we obtain after some basic integral evaluations the mean square distance between two beads $I$ and $J$,

$$\langle r_{IJ}^2 \rangle_{\mathrm{thermal}} = 3\, (\sigma_{JJ} + \sigma_{II} - 2 \sigma_{IJ}) \tag{9}$$

where

$$\Sigma = K^{-1} = (\sigma_{ij})_{i,j} \tag{10}$$

Fig. 8 The Random Loop model averages over (a) the thermal disorder and (b) the possible configurations of loops. Two possible configurations of loops are shown

The important point is now that we let the $\kappa_{ij}$ be Bernoulli-distributed random variables with probability $p$, meaning that the loop attachment points are chosen randomly. In a first approach, we assume a homogeneous looping probability $p$ for all pairs of monomers, i.e. each pair of monomers forms a loop with equal probability, independent of the contour length $n$ in between. The mean square displacement between two monomers has to be calculated not only over the thermal ensemble given by (9) but also over the ensemble of different loop configurations, i.e. the random variables $\kappa_{ij}$. Two such conformations are displayed in Fig. 8. The disorder average cannot be calculated analytically, so we have to use a representative subset of the ensemble and calculate the averages numerically.

The results for the mean square displacement $\langle R^2 \rangle$ in relation to genomic separation $g$ are displayed in Fig. 9a. It shows an increase at small genomic separations, which is due to the random-walk nature of the backbone. At larger genomic separation, however, the model displays a levelling-off comparable to that of the experimental data. Note that loops on all scales are necessary to explain this levelling-off [2]. Interestingly, the Random Loop model can also explain the large fluctuations found, represented by the ratio $\langle R^4 \rangle / \langle R^2 \rangle^2$. Fluctuations of the RL model exceed the random walk value due to the additional disorder in the system (Fig. 9b), yielding a natural explanation for the cell-to-cell variation in terms of different looping configurations.

In contrast to the assumption of a homogeneous looping distribution, experiments reveal a strong dependence of the level of compaction on transcriptional activity [12] (see Fig. 6a). How can these findings be explained by the Random Loop model? Indeed, the short-scale behaviour of the Random Loop model displays a power-law behaviour $\langle R^2 \rangle \sim N^{2\nu}$, with the parameter $\nu$, i.e. the compaction, depending on the looping probability (Fig. 10a).
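The numerical disorder average described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' simulation code. Bead $x_0$ is pinned at the origin to remove the translational mode, every loop that is present gets the same spring constant $\kappa$ as the backbone, and $k_B T$ is absorbed into $\kappa$. For a fixed loop configuration the thermal average follows (9) and (10). The fourth moment uses the standard Gaussian result $\langle r^4 \rangle = 15 s^2$ for a three-dimensional vector with per-axis variance $s = \sigma_{II} + \sigma_{JJ} - 2\sigma_{IJ}$. All function names are hypothetical.

```python
import numpy as np

def coupling_matrix(n_beads, loops, kappa=1.0):
    """Matrix K of (7) for beads x_1..x_N, with bead x_0 pinned at the origin."""
    lap = np.zeros((n_beads + 1, n_beads + 1))
    springs = [(j - 1, j) for j in range(1, n_beads + 1)] + list(loops)
    for i, j in springs:
        lap[i, i] += kappa
        lap[j, j] += kappa
        lap[i, j] -= kappa
        lap[j, i] -= kappa
    return lap[1:, 1:]  # drop bead 0 to remove the translational mode

def random_loops(n_beads, p, rng):
    """Bernoulli(p) loop between every bead pair with |i - j| > 1."""
    i, j = np.triu_indices(n_beads + 1, k=2)
    chosen = rng.random(i.size) < p
    return list(zip(i[chosen], j[chosen]))

def disorder_average(n_beads, p, I, J, n_samples, seed=0):
    """Return (<R^2>, <R^4>/<R^2>^2) between beads I, J >= 1."""
    rng = np.random.default_rng(seed)
    s = np.empty(n_samples)
    for t in range(n_samples):
        K = coupling_matrix(n_beads, random_loops(n_beads, p, rng))
        sigma = np.linalg.inv(K)                       # Sigma = K^-1, eq. (10)
        s[t] = sigma[I - 1, I - 1] + sigma[J - 1, J - 1] - 2.0 * sigma[I - 1, J - 1]
    mean_r2 = 3.0 * s.mean()                           # eq. (9), then disorder average
    mean_r4 = 15.0 * (s ** 2).mean()                   # Gaussian fourth moment
    return mean_r2, mean_r4 / mean_r2 ** 2

free_r2, free_ratio = disorder_average(60, 0.0, 10, 50, n_samples=1)
loop_r2, loop_ratio = disorder_average(60, 0.01, 10, 50, n_samples=200)
```

Without loops the sketch reproduces the random-walk values ($\langle R^2 \rangle$ proportional to the contour distance, ratio 5/3). With loops the mean square distance drops and the ratio rises above 5/3, because the variance $s$ now fluctuates from one loop configuration to the next.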
This leads us to propose that different states of compaction can be explained by different local looping probabilities. As a first approximation we divide the polymer into ridge and anti-ridge regions and define three different looping probabilities: $P_R$ for loop formation in ridge regions, $P_{AR}$ for anti-ridges and $P_{inter}$ for the interaction between such regions. Figure 10b shows the result of a simulation for $P_R = 3 \times 10^{-5}$, $P_{AR} = 7 \times 10^{-5}$ and $P_{inter} = 1 \times 10^{-5}$. The RL model with these values describes the folding of the ridge and anti-ridge region of chromosome 11 remarkably well. Thus, this heterogeneous Random Loop model allows a unified description of the folding of the chromatin fibre inside the interphase nucleus over different length scales. It furthermore bridges the gap between genome folding and function, explaining different levels of compaction with different local looping probabilities.

Fig. 9 (a) Mean square distance $\langle R^2 \rangle$ [µm²] in relation to genomic separation $g$ [Mb] (contour length) of the Random Loop model compared to experimental data (chromosome 11 long-distance data, chromosome 1 ridge data and chromosome 1 anti-ridge data). Data is shown for the model without excluded volume and a chain length of $N = 1{,}000$ for the looping probabilities $p = 4 \times 10^{-5}$, $5 \times 10^{-5}$, $6 \times 10^{-5}$ and $7 \times 10^{-5}$. (b) The dimensionless ratio $\langle R^4 \rangle / \langle R^2 \rangle^2$ of the Random Loop model has much larger values than the RW, SAW or GS polymer model, in agreement with experimental data

Fig. 10 (a) Qualitative short-scale behaviour of the Random Loop model. The relationship between the mean square displacement $\langle R_n^2 \rangle$ between two monomers and their contour distance $n$ is shown for different values of the looping probability $p$ and fitted to a power law $\langle R^2 \rangle \sim N^{2\nu}$. The scaling exponent $\nu$ varies over a broad range of values, depending on the looping probability: $\nu = 0.450 \pm 0.001$ for $p = 1 \times 10^{-5}$, $\nu = 0.406 \pm 0.002$ for $p = 2 \times 10^{-5}$, $\nu = 0.366 \pm 0.004$ for $p = 3 \times 10^{-5}$, $\nu = 0.307 \pm 0.005$ for $p = 5 \times 10^{-5}$ and $\nu = 0.241 \pm 0.007$ for $p = 8 \times 10^{-5}$. (b) Simulations of the RL model using different looping probabilities for ridges, anti-ridges and the interactions between these regions on the q-arm of chromosome 11, as shown in Fig. 6 (mean square displacement $\langle R^2 \rangle$ [µm²] versus genomic distance $g$ [Mb]; chromosome 11 ridge and anti-ridge data together with the model in the ridge and anti-ridge regions). The assigned values are $p_R = 3 \times 10^{-5}$, $p_{AR} = 7 \times 10^{-5}$ and $p_{inter} = 1 \times 10^{-5}$, respectively. Calculations are without excluded volume; the coarse-grained monomer is set at 75 kb
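One way to draw a loop configuration for the heterogeneous model is sketched below. The region layout (`is_ridge`), the function name and the chain length are illustrative assumptions. Only the three probabilities are the values quoted for Fig. 10b. The sketch samples which bead pairs are looped and does not compute distances.

```python
import numpy as np

def heterogeneous_loops(is_ridge, p_ridge, p_antiridge, p_inter, seed=0):
    """Sample a symmetric 0/1 loop matrix with region-dependent looping probabilities."""
    rng = np.random.default_rng(seed)
    is_ridge = np.asarray(is_ridge, dtype=bool)
    n = is_ridge.size
    both_ridge = is_ridge[:, None] & is_ridge[None, :]
    both_anti = ~is_ridge[:, None] & ~is_ridge[None, :]
    # Pairs inside a ridge, inside an anti-ridge, or across the two kinds of region
    prob = np.where(both_ridge, p_ridge, np.where(both_anti, p_antiridge, p_inter))
    upper = np.triu(rng.random((n, n)) < prob, k=2)   # only pairs with |i - j| > 1
    return (upper | upper.T).astype(int)

# Hypothetical layout: first half of the chain is a ridge, second half an anti-ridge
is_ridge = np.arange(1000) < 500
loops = heterogeneous_loops(is_ridge, p_ridge=3e-5, p_antiridge=7e-5, p_inter=1e-5)
```

The resulting matrix plays the role of the random $\kappa_{ij}$ (up to the spring constant) for one member of the disorder ensemble. With probabilities of order $10^{-5}$ only long chains contain more than a handful of loops.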

4 Discussion

In this contribution we have presented two models for chromatin, one on the small and one on the large scale. Using Monte Carlo simulations of the 30 nm chromatin fibre, it was shown that linker histone H1 depletion as well as nucleosomal skips massively affect the flexibility and the extension of chromatin fibres.

On the scale of the whole chromosome we have presented the Random Loop model, which predicts important features of large-scale chromatin organization by assuming probabilistic loops on a broad range of scales. Local differences in chromatin compaction, as for instance found in ridges and anti-ridges along the q-arms of chromosomes 1 and 11 (Fig. 6a), are taken into account by locally assigning different looping probabilities to the polymer. Although still highly simplifying, this explains remarkably well the difference in compaction of ridges and anti-ridges, assuming a 2.5-fold difference in looping probability for the studied region on human chromosome 11. Thus the RL model allows for a unified description of the folding of the chromatin fibre inside the interphase nucleus over different length scales and explains different levels of compaction by assuming different looping probabilities, related for instance to local differences in transcription level and gene density. The RL model creates a basis for explaining the formation of chromosome territories without requiring a scaffold or other physical confinement.

While there is a lot of evidence that chromatin-chromatin interactions play a crucial role in genome function (e.g. see [9, 21]), our study proposes that they also play an important role in chromatin organization inside the interphase nucleus, on the scale of the whole chromosome (tens of Mb) as well as on that of subchromosomal domains in the size range of a few Mb.

References

1. M. Bohn and D. W. Heermann, J. Chem. Phys., 130(17):174901, 2009.
2. M. Bohn, D. W. Heermann, and R. van Driel, Phys. Rev. E, 76(5):051805, 2007.
3. M. Barbi, J. Mozziconacci, and J.-M. Victor, Phys Rev E Stat Nonlin Soft Matter Phys, 71(3 Pt 1):031910, Mar 2005.
4. P.-G. de Gennes, Ithaca, N.Y., Cornell University Press, 1979.
5. P. M. Diesinger and D. W. Heermann, Phys. Rev. E, 74, 031904, Sep 2006.
6. P. M. Diesinger and D. W. Heermann, Biophys. J., 94(11), 4165–4172, 2008.
7. P. M. Diesinger and D. W. Heermann, Biophys. J., 97(8), 2146–2153, Oct 2009.
8. C. A. Davey, D. F. Sargent, K. Luger, A. W. Maeder, and T. J. Richmond, J. Mol. Biol., 319(5), 1097–1113, Jun 2002.
9. P. Fraser and W. Bickmore, Nature, 447(7143), 413–417, May 2007.
10. P. Fraser, Current Opinion in Genetics & Development, 16(5), 490–495, Oct 2006.
11. A. Y. Grosberg and A. R. Khokhlov, Statistical Physics of Macromolecules. AIP Press, 1994.
12. S. Goetze, J. Mateos-Langerak, H. J. Gierman, W. de Leeuw, O. Giromus, M. H. G. Indemans, J. Koster, V. Ondrej, R. Versteeg, and R. van Driel, Mol. Cell. Biol., 27(12), 4475–4487, 2007.
13. P. Hahnfeldt, J. E. Hearst, D. J. Brenner, R. K. Sachs, and L. R. Hlatky, PNAS, 90, 7854–7858, August 1993.
14. S. Jhunjhunwala, M. C. van Zelm, M. M. Peak, S. Cutchin, R. Riblet, J. J. M. van Dongen, F. G. Grosveld, T. A. Knoch, and C. Murre, Cell, 133(2), 265–279, Apr 2008.
15. K. Luger, A. W. Maeder, R. K. Richmond, D. F. Sargent, and T. J. Richmond, Nature, 389(6648), 251–260, September 1997.
16. A. Miele and J. Dekker, Mol. BioSyst., 4(11), 1046–1057, Nov 2008.
17. C. Münkel, R. Eils, S. Dietzel, D. Zink, C. Mehring, G. Wedemann, T. Cremer, and J. Langowski, J. Mol. Biol., 285, 1053–1065, 1999.
18. B. Mergell, R. Everaers, and H. Schiessel, Phys. Rev. E, 70, 011915, Jul 2004.
19. D. Marenduzzo, I. Faro-Trindade, and P. R. Cook, Trends Genet., 23(3), 126–133, 2007.
20. J. Mateos-Langerak, M. Bohn, W. de Leeuw, O. Giromus, E. M. M. Manders, P. J. Verschure, M. H. G. Indemans, H. J. Gierman, D. W. Heermann, R. van Driel, and S. Goetze, PNAS, 106(10), 3812–3817, 2009.
21. R.-J. Palstra, B. Tolhuis, E. Splinter, R. Nijmeijer, F. Grosveld, and W. de Laat, Nat. Genet., 35(2), 190–194, Oct 2003.
22. H. Schiessel, J. Phys.: Condens. Matter, 15(19), R699–R774, 2003.
23. E. Segal and J. Widom, Nature Reviews Genetics, 10, 443–456, 2009.
24. R. K. Sachs, G. V. D. Engh, B. Trask, H. Yokota, and J. E. Hearst, PNAS, 92(7), 2710–2714, 1995.
25. E. Segal, Y. Fondufe-Mittendorf, L. Chen, A. Thåström, Y. Field, I. K. Moore, J.-P. Z. Wang, and J. Widom, Nature, 442(7104), 772–778, Aug 2006.
26. M. Simonis, P. Klous, E. Splinter, Y. Moshkin, R. Willemsen, E. de Wit, B. van Steensel, and W. de Laat, Nat. Genet., 38(11), 1348–1354, Nov 2006.
27. K. E. van Holde, Chromatin, New York: Springer-Verlag, 1989.
28. C. L. Woodcock, S. A. Grigoryev, R. A. Horowitz, and N. Whitaker, PNAS, 90(19), 9021–9025, 1993.
29. J. Widom, PNAS, 89(3), 1095–1099, Feb 1992.


Generalized Bilinear System Identification with Coupling Force Variables

Jer-Nan Juang

Abstract A novel method is presented for identification of a generalized bilinear system with nonlinear terms consisting of the product of the state vector and the coupling force variables. The identification process requires a series of pulse response experiments from input values of various pulse duration for coupling force variables. It also requires experiments with multiple inputs rather than one single input at a time. The resulting identified system matrices represent the input–output map of the generalized bilinear system. A simple example is given to illustrate the concept of the identification method.

1 Introduction

Many important processes, not only in engineering but also in biology, socio-economics, and ecology, may be modeled by bilinear systems (see Bruni et al. [1, 2], Mohler et al. [3], Mohler [4] and Elliott [5]). An important feature of a bilinear system is that it has the characteristics of a linear system for a constant or zero input. These special characteristics are the basis for the identification method developed by Juang [6]. Sontag et al. [7] adapted many of the basic ideas of [6] to prove that step inputs are not sufficient, nor are single pulses, but that the family of all pulses (of a fixed amplitude but varying widths) does suffice to completely identify the input/output behavior of generic bilinear systems. Recently, the earlier work [6] was extended by Juang [8] to identify a generalized bilinear system with dynamics jointly nonlinear in the state and the force variables of order higher than one.

J.-N. Juang, Department of Engineering Science, National Cheng Kung University, Tainan, Taiwan

H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_14, © Springer-Verlag Berlin Heidelberg 2012


This paper is intended to motivate interest in bilinear system identification and to present the current state of research on its various aspects of nonlinearity. The identification methods introduced in [6] and [8] are advanced to handle the nonlinearity consisting of the product of the state vector and the coupling force variables.

The organization of this paper is as follows. After this introductory section, the section Basic Formulation highlights the special characteristics of bilinear systems. The formulations are self-contained for the purpose of completeness, even though they look quite similar to the ones described in [6] and [8]. The main results are given in the section System Identification Method, which describes the main contributions of this paper in comparison with its companion papers. The section Numerical Example gives a simple example to illustrate the identification method developed. In the final section, Concluding Remarks, some remarks are made on still open problems and possible trends for future research.

2 Basic Formulation

Let $x$ be the state vector of $n \times 1$, $A_c$ the state matrix of $n \times n$, $u$ the input vector of $r \times 1$, $B_c$ the input matrix of $n \times r$, $y$ the output vector of $m \times 1$, $C$ the output matrix of $m \times n$, and $D$ the direct transmission matrix of $m \times r$. The generalized bilinear state equation in the continuous-time domain is expressed by

$$\dot{x} = A_c x + B_c u + \sum_{i=1}^{r} N_{ci}\, x u_i + \sum_{i=1}^{r} \sum_{j=1}^{r} N_{cij}\, x u_i u_j \tag{1}$$

with the output measurement equation

$$y = C x + D u \tag{2}$$

The coupling terms $x u_i$ and $x u_i u_j$ between the state vector $x$ and each individual $u_i$ and/or $u_j$ ($i, j = 1, \ldots, r$) in the input vector $u$ are weighted by the $n \times n$ matrices $N_{ci}$ and $N_{cij}$, respectively. Subscript $c$ indicates that the associated quantity belongs to the continuous-time domain. Considering two inputs at a time, (1) reduces to

$$\dot{x} = A_c x + b_{ci} u_i + b_{cj} u_j + \left( N_{ci} u_i + N_{cj} u_j + N_{cii} u_i^2 + N_{cjj} u_j^2 + N_{cij} u_i u_j \right) x \tag{3}$$

where $b_{ci}$ and $b_{cj}$ are the $i$th and $j$th columns of $B_c$ associated with the inputs $u_i$ and $u_j$, respectively. Assuming that $u_i = \mu_i = $ constant and $u_j = \mu_j = $ constant, the continuous-time state equation (3) further reduces to

$$\dot{x} = \left( A_c + N_{ci} \mu_i + N_{cj} \mu_j + N_{cii} \mu_i^2 + N_{cjj} \mu_j^2 + N_{cij} \mu_i \mu_j \right) x + b_{ci} \mu_i + b_{cj} \mu_j \tag{4}$$
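The reduction from (3) to (4) can be checked numerically: once the two inputs are frozen at constant values, the bilinear right-hand side is that of a linear system with a modified state matrix and a constant forcing vector. The sketch below is illustrative only. The function names and all matrix values are assumptions made for the example.

```python
import numpy as np

def bilinear_rhs(x, ui, uj, Ac, bci, bcj, Nci, Ncj, Ncii, Ncjj, Ncij):
    """Right-hand side of (3) for the two active inputs u_i and u_j."""
    coupling = Nci * ui + Ncj * uj + Ncii * ui**2 + Ncjj * uj**2 + Ncij * ui * uj
    return Ac @ x + bci * ui + bcj * uj + coupling @ x

def frozen_input_system(mu_i, mu_j, Ac, bci, bcj, Nci, Ncj, Ncii, Ncjj, Ncij):
    """Linear pair (A_cij, b_cij) of (4) for constant inputs u_i = mu_i, u_j = mu_j."""
    A_cij = (Ac + Nci * mu_i + Ncj * mu_j
             + Ncii * mu_i**2 + Ncjj * mu_j**2 + Ncij * mu_i * mu_j)
    b_cij = bci * mu_i + bcj * mu_j
    return A_cij, b_cij

# Hypothetical 2-state system, used only for illustration
rng = np.random.default_rng(1)
Ac = np.array([[-1.0, 0.0], [1.0, -2.0]])
bci, bcj = np.array([1.0, 0.0]), np.array([0.0, 1.0])
Nci, Ncj, Ncii, Ncjj, Ncij = (rng.standard_normal((2, 2)) for _ in range(5))
x = rng.standard_normal(2)
A_cij, b_cij = frozen_input_system(0.4, 0.1, Ac, bci, bcj, Nci, Ncj, Ncii, Ncjj, Ncij)
```

For any state `x`, `bilinear_rhs(x, 0.4, 0.1, ...)` and `A_cij @ x + b_cij` agree, which is the property the identification method exploits.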


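The discrete-time model of this system is

$$x(k+1) = \bar{A}_{ij}\, x(k) + \bar{b}_{ij}; \qquad i, j = 1, 2, \ldots, r \tag{5}$$

with the measurement equation

$$y_{ij}(k) = C x(k) + d_{ij} \tag{6}$$

where the quantities $\bar{A}_{ij}$, $\bar{b}_{ij}$, and $d_{ij}$ are determined by

$$\bar{A}_{ij} = e^{A_{cij} \Delta t}; \qquad A_{cij} = A_c + N_{ci} \mu_i + N_{cj} \mu_j + N_{cii} \mu_i^2 + N_{cjj} \mu_j^2 + N_{cij} \mu_i \mu_j \tag{7}$$

$$\bar{b}_{ij} = \int_0^{\Delta t} e^{A_{cij} \tau}\, d\tau \; b_{cij}; \qquad b_{cij} = b_{ci} \mu_i + b_{cj} \mu_j \tag{8}$$

$$d_{ij} = d_i \mu_i + d_j \mu_j \tag{9}$$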

The quantity $\Delta t$ is the time interval for data sampling. In the absence of input, i.e., $u_i = u_j = 0$, (5) reduces to

$$x(k+1) = A x(k) \tag{10}$$

where

$$A = e^{A_c \Delta t} \tag{11}$$

Assuming that the initial state $x(0)$ is a zero vector of $n \times 1$, i.e., $x(0) = 0_{n \times 1}$, the measurement quantities $y_{ij}(k)$ for $k = 0, 1, \ldots, p + \ell$, due to the force excitation of $u_i = \mu_i$ and $u_j = \mu_j$ (constant force) applied simultaneously for $k < p$, and $u_i = u_j = 0$ for $k \geq p$, can be expressed as

$$\begin{aligned}
y_{ij}(0) &= d_{ij} \\
y_{ij}(1) &= C \bar{b}_{ij} + d_{ij} \\
y_{ij}(2) &= C \left( \bar{A}_{ij} \bar{b}_{ij} + \bar{b}_{ij} \right) + d_{ij} \\
&\;\;\vdots \\
y_{ij}(p-1) &= C \tilde{b}_{ij}(p-1) + d_{ij} \\
y_{ij}(p) &= C \tilde{b}_{ij}(p) \\
y_{ij}(p+1) &= C A \tilde{b}_{ij}(p) \\
&\;\;\vdots \\
y_{ij}(p+\ell) &= C A^{\ell} \tilde{b}_{ij}(p)
\end{aligned} \tag{12}$$

where

$$\tilde{b}_{ij}(p) = \bar{A}_{ij}^{\,p-1} \bar{b}_{ij} + \bar{A}_{ij}^{\,p-2} \bar{b}_{ij} + \cdots + \bar{A}_{ij} \bar{b}_{ij} + \bar{b}_{ij} \tag{13}$$

and $\ell$ is an integer indicating the data length of the free-decay response. The upper portion of (12), $y_{ij}(0), y_{ij}(1), \ldots, y_{ij}(p-1)$, corresponds to the multiple-pulse response resulting from a constant force over multiple sample periods, i.e., $p \Delta t$. The lower portion, $y_{ij}(p), y_{ij}(p+1), \ldots, y_{ij}(p+\ell)$, corresponds to the free-decay response, which is quite similar, if not identical, to the pulse response of a linear system without nonlinear coupling terms between the state $x$ and the input $u$. Any linear system identification technique may be applied to compute the state matrix $A$ and the output matrix $C$ [9, 10] from the free-decay response.
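The discretization (7), (8) and the response sequence (12) can be sketched as follows. This is a minimal illustration under assumptions. The integral in (8) is obtained from one exponential of an augmented matrix, a standard identity rather than something stated in this paper. The small `expm` helper is a truncated Taylor series with scaling and squaring, written out only to keep the example self-contained. All numerical values are hypothetical.

```python
import numpy as np

def expm(M, terms=25):
    """Matrix exponential: scaling-and-squaring around a truncated Taylor series."""
    norm = np.linalg.norm(M, 1)
    squarings = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    X = M / 2.0**squarings
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ X / k
        E = E + term
    for _ in range(squarings):
        E = E @ E
    return E

def discretize(A_c, b_c, dt):
    """(A_bar, b_bar) of (7)-(8): exp([[A_c, b_c], [0, 0]] dt) = [[A_bar, b_bar], [0, 1]]."""
    n = A_c.shape[0]
    aug = np.zeros((n + 1, n + 1))
    aug[:n, :n] = A_c
    aug[:n, n] = b_c
    E = expm(aug * dt)
    return E[:n, :n], E[:n, n]

def pulse_response(A, A_bar, b_bar, C, d, p, ell):
    """Outputs y(0), ..., y(p + ell) of (12): constant input for k < p, zero input afterwards."""
    x = np.zeros(A.shape[0])
    y = []
    for k in range(p + ell + 1):
        if k < p:
            y.append(C @ x + d)
            x = A_bar @ x + b_bar
        else:
            y.append(C @ x)
            x = A @ x
    return np.array(y)

# Hypothetical data: zero-input dynamics, one frozen-input pair, single output
A_c = np.array([[-1.0, 0.0], [1.0, -2.0]])
A_cij = A_c + 0.4 * np.array([[0.0, 0.0], [1.0, 1.0]])
b_cij = np.array([0.4, 0.1])
A, _ = discretize(A_c, np.zeros(2), 1.0)
A_bar, b_bar = discretize(A_cij, b_cij, 1.0)
C = np.array([[0.0, 1.0]])
y = pulse_response(A, A_bar, b_bar, C, np.zeros(1), p=3, ell=4)
```

Samples `y[3:]` form the free-decay part of (12) and depend on the input only through $\tilde{b}_{ij}(p)$.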

3 System Identification Method

The identification method requires two steps. The first step is to identify the state matrix $A_c$, the output matrix $C$, and the direct transmission matrix $D$. The second step is to determine the input matrix $B_c$, the matrices $N_{ci}$ for the bilinear coupling term between the state $x$ and the input $u_i$, and the matrices $N_{cij}$ for the nonlinear term between the state vector $x$ and the product of forces $u_i$ and $u_j$, for $i, j = 1, 2, \ldots, r$.

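3.1 Identification of $A_c$, $C$, and $D$

Apply a pulse of $\mu_i$ for the $i$th input $u_i$ and $\mu_j$ for the $j$th input $u_j$ to the system for one time step to generate the pulse response ($i, j = 1, 2, \ldots, r$). From (12) with $p = 1$, the pulse response has the following expression

$$\begin{aligned}
y_{ij}(0) &= d_{ij} \\
y_{ij}(1) &= C \bar{b}_{ij} \\
y_{ij}(2) &= C A \bar{b}_{ij} \\
&\;\;\vdots \\
y_{ij}(\ell+1) &= C A^{\ell} \bar{b}_{ij}
\end{aligned} \tag{14}$$

For any input set of $\mu_i$ and $\mu_j$, one obtains a sequence of pulse responses $y_{ij}(k)$; $k = 1, 2, \ldots, \ell$. Any different value set of $\mu_i$ and $\mu_j$ will generate another sequence of pulse responses. In the end, one may generate as many sequences as desired for system identification. Now, form the system Markov parameters as (see [9] and [10])

$$\begin{aligned}
Y_1(0) &= \begin{bmatrix} y_{12}(0) & y_{13}(0) & \cdots \end{bmatrix} = \begin{bmatrix} d_{12} & d_{13} & \cdots \end{bmatrix} = D \Lambda \\
Y_1(1) &= \begin{bmatrix} y_{12}(1) & y_{13}(1) & \cdots \end{bmatrix} = C \begin{bmatrix} \bar{b}_{12} & \bar{b}_{13} & \cdots \end{bmatrix} \\
Y_1(2) &= \begin{bmatrix} y_{12}(2) & y_{13}(2) & \cdots \end{bmatrix} = C A \begin{bmatrix} \bar{b}_{12} & \bar{b}_{13} & \cdots \end{bmatrix} \\
&\;\;\vdots \\
Y_1(\ell+1) &= \begin{bmatrix} y_{12}(\ell+1) & y_{13}(\ell+1) & \cdots \end{bmatrix} = C A^{\ell} \begin{bmatrix} \bar{b}_{12} & \bar{b}_{13} & \cdots \end{bmatrix}
\end{aligned} \tag{15}$$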


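Subscript 1 in $Y_1(k)$ ($k = 1, 2, \ldots$) indicates a one-time-step pulse response. Equation (15) provides the basic parameters for identification of $A$, $C$ and $D$. Note that each system Markov parameter has $\gamma$ columns, with the minimum $\gamma$ to be determined later. Let us form a Hankel matrix of $\alpha m \times \beta \gamma$ as follows.

$$H_1 = \begin{bmatrix}
Y_1(1) & Y_1(2) & \cdots & Y_1(\beta) \\
Y_1(2) & Y_1(3) & \cdots & Y_1(\beta+1) \\
\vdots & \vdots & \ddots & \vdots \\
Y_1(\alpha) & Y_1(\alpha+1) & \cdots & Y_1(\alpha+\beta-1)
\end{bmatrix}
= \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{\alpha-1} \end{bmatrix}
\begin{bmatrix} \bar{B}_1 & A \bar{B}_1 & \cdots & A^{\beta-1} \bar{B}_1 \end{bmatrix} \tag{16}$$

where

$$\bar{B}_1 = \begin{bmatrix} \bar{b}_{12} & \bar{b}_{13} & \cdots \end{bmatrix} \tag{17}$$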

with the size of $n \times \gamma$. The matrix product in (16) shows the relationship between the system Markov parameters and the discrete-time system matrices. The Hankel matrix $H_1$ has rank $n$, the order of the state matrix $A$, if we choose $\alpha$ and $\beta$ such that $\alpha m$ and $\beta \gamma$ are larger than or equal to $n$, where $m$ is the number of outputs and $\gamma$ is related to the number of inputs. Using the singular value decomposition (SVD) to decompose the Hankel matrix $H_1$ yields

$$H_1 = U_1 \Sigma_1 V_1^T = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{\alpha-1} \end{bmatrix}
\begin{bmatrix} \bar{B}_1 & A \bar{B}_1 & \cdots & A^{\beta-1} \bar{B}_1 \end{bmatrix} \tag{18}$$

where $\Sigma_1$ is a square matrix containing $n$ non-zero singular values. One may choose

$$U_1 = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{\alpha-1} \end{bmatrix} \tag{19}$$

and

$$\Sigma_1 V_1^T = \begin{bmatrix} \bar{B}_1 & A \bar{B}_1 & \cdots & A^{\beta-1} \bar{B}_1 \end{bmatrix} \tag{20}$$


The matrix $U_1$ has the dimension of $\alpha m \times n$, whereas $\Sigma_1 V_1^T$ has the dimension of $n \times \beta \gamma$. This choice is not unique. Many other choices are also valid. Note that the choice of (19) has the advantage that $U_1^T U_1 = I_{n \times n} \Rightarrow U_1^{\dagger} = U_1^T$, because $U_1$ has orthonormal columns as a property of the singular value decomposition. Equation (19) is commonly called the observability matrix, whereas (20) is referred to as the controllability matrix. Equations (19) and (20) produce the following solutions

$$C = \text{the first } m \text{ rows of } U_1 \tag{21}$$

$$\bar{B}_1 = \text{the first } \gamma \text{ columns of } \Sigma_1 V_1^T \tag{22}$$

Note that the dimension $n \times \gamma$ of $\bar{B}_1$ is oversized in comparison with the dimension $n \times r$ of the original input matrix $B_c$ to be determined later. Since the choices of controllability and observability matrices are not unique, the identified matrices $C$ and $\bar{B}_1$ are not unique. To determine the state matrix $A$, let us first define and observe the following matrices.

$$U_{1\uparrow} = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{\alpha-2} \end{bmatrix}
\quad \text{and} \quad
U_{1\downarrow} = \begin{bmatrix} CA \\ CA^2 \\ \vdots \\ CA^{\alpha-1} \end{bmatrix} = U_{1\uparrow} A \tag{23}$$

Deleting the last $m$ rows of $U_1$ forms the matrix $U_{1\uparrow}$, whereas deleting the first $m$ rows of $U_1$ yields the matrix $U_{1\downarrow}$. The state matrix $A$ can then be determined by

$$A = U_{1\uparrow}^{\dagger} U_{1\downarrow} \tag{24}$$

For the identified state matrix to have rank $n$, the integer $\alpha$ must be chosen such that $(\alpha - 1) m \geq n$, i.e., $\alpha m > n$. From (11), (24) produces the continuous-time state matrix as

$$A_c = \frac{1}{\Delta t} \log(A) = \frac{1}{\Delta t} \log\left( U_{1\uparrow}^{\dagger} U_{1\downarrow} \right) \tag{25}$$
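Steps (16) to (25) can be sketched in a few lines. The example is a hedged illustration. The function names are hypothetical, the Markov parameters are generated from a made-up discrete-time system rather than measured, and the matrix logarithm in (25) is computed through an eigen-decomposition, which assumes that $A$ is diagonalizable with no eigenvalues on the negative real axis.

```python
import numpy as np

def identify_from_markov(markov, alpha, beta, n, m):
    """markov[k-1] holds Y_1(k), k = 1, 2, ...; returns (A, C, B1) following (16)-(24)."""
    H = np.block([[markov[i + j] for j in range(beta)] for i in range(alpha)])
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    U1 = U[:, :n]                                    # observability matrix, eq. (19)
    C = U1[:m, :]                                    # eq. (21)
    gamma = markov[0].shape[1]
    B1 = (np.diag(s[:n]) @ Vt[:n, :])[:, :gamma]     # eq. (22)
    A = np.linalg.pinv(U1[:-m, :]) @ U1[m:, :]       # eq. (24)
    return A, C, B1

def continuous_state_matrix(A, dt):
    """A_c = log(A)/dt as in (25), via eigen-decomposition (assumes A diagonalizable)."""
    w, V = np.linalg.eig(A)
    return (V @ np.diag(np.log(w.astype(complex))) @ np.linalg.inv(V)).real / dt

# Hypothetical discrete-time system with n = 2 states, m = 1 output, gamma = 3 columns
A_true = np.array([[0.5, 0.0], [0.3, 0.2]])
C_true = np.array([[0.0, 1.0]])
B_true = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])
markov = [C_true @ np.linalg.matrix_power(A_true, k) @ B_true for k in range(6)]
A_id, C_id, B1_id = identify_from_markov(markov, alpha=4, beta=3, n=2, m=1)
A_c_id = continuous_state_matrix(A_id, dt=1.0)
```

The identified triple differs from the true one by a coordinate change, so only coordinate-invariant quantities such as eigenvalues and Markov parameters should be compared.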

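Thus, we have determined $A_c$ from (25) and $C$ from (21). The original transmission matrix $D$ can be determined from (9):

$$\begin{bmatrix} d_{12} & d_{13} & \cdots & d_{(r-1)r} & \cdots \end{bmatrix}
= \begin{bmatrix} d_1 & d_2 & d_3 & \cdots & d_{r-1} & d_r \end{bmatrix} \otimes
\begin{bmatrix}
\mu_1 & \mu_1 & \cdots & 0 & \cdots \\
\mu_2 & 0 & \cdots & 0 & \cdots \\
0 & \mu_3 & \cdots & 0 & \cdots \\
\vdots & \vdots & \ddots & \vdots & \\
0 & 0 & \cdots & \mu_{r-1} & \cdots \\
0 & 0 & \cdots & \mu_r & \cdots
\end{bmatrix}
= D \otimes \Lambda \tag{26}$$

The matrix $D$ is uniquely determined only when the $r \times \gamma$ matrix $\Lambda$ has rank $r$, which is the case for nonzero $\mu_i$ ($i = 1, 2, \ldots, r$) when $\gamma > r$. Any column of $\Lambda$ may be repeated by assigning a different value for $\mu_i$, i.e., by repeating the same experiment with different input values. From (15), observe that

$$Y_1(0) = \begin{bmatrix} y_{12}(0) & y_{13}(0) & \cdots \end{bmatrix} = \begin{bmatrix} d_{12} & d_{13} & \cdots \end{bmatrix} = D \Lambda \tag{27}$$

The matrix $D$ can then be recovered using (26) to have

$$D = Y_1(0) \otimes \Lambda^{\dagger} \tag{28}$$

Note again that the identified matrices $A_c$ and $C$ are not uniquely determined, but $D$ is coordinate invariant and so is uniquely computed.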
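A small numerical sketch of (26) to (28) follows. It reads the product written with $\otimes$ in (26) and (28) as an ordinary matrix product between the $m \times r$ matrix $D$ and the $r \times \gamma$ magnitude matrix, which is an interpretation made for this example. The function names, the 0-based input indices and all numbers are hypothetical.

```python
import numpy as np

def pair_magnitude_matrix(r, experiments):
    """Build the r x gamma matrix of (26); experiments = [(i, j, mu_i, mu_j), ...], 0-based."""
    Lam = np.zeros((r, len(experiments)))
    for col, (i, j, mu_i, mu_j) in enumerate(experiments):
        Lam[i, col] = mu_i      # magnitude applied to input i in this experiment
        Lam[j, col] = mu_j      # magnitude applied to input j in this experiment
    return Lam

def recover_D(Y0, Lam):
    """D from the zero-lag Markov parameter Y_1(0), eq. (28), using the pseudo-inverse."""
    return Y0 @ np.linalg.pinv(Lam)

# Hypothetical case: r = 3 inputs, m = 1 output, gamma = 4 experiments (one pair repeated)
D_true = np.array([[0.5, -2.0, 1.0]])
experiments = [(0, 1, 1.0, 2.0), (0, 2, 1.0, 3.0), (1, 2, 2.0, 1.0), (0, 1, 2.0, 1.0)]
Lam = pair_magnitude_matrix(3, experiments)
Y0 = D_true @ Lam               # what the first samples of the responses would contain
D_found = recover_D(Y0, Lam)
```

The recovery is exact here because the magnitude matrix has full row rank $r$, the condition stated in the text.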

3.2 Identification of $B_c$, $N_{ci}$ and $N_{cij}$; $i, j = 1, 2, \ldots, r$

The second step begins with generating the multiple-sample-period pulse response for all inputs, with two inputs at a time exciting the bilinear system. Figure 1 shows several pulses with sample periods up to $p = 4$ (four periods) for the $i$th input $u_i$ with a pulse of magnitude $\mu_i$. The other inputs have the same structure as in Fig. 1 but may take various values of $\mu_i$. Apply a pulse of $\mu_i$ for the $i$th input $u_i$ and $\mu_j$ for the $j$th input $u_j$ to the system for $p$ time steps to generate the pulse response ($i, j = 1, 2, \ldots, r$),

$$\begin{aligned}
y_{ij}(p) &= C \tilde{b}_{ij}(p) \\
y_{ij}(p+1) &= C A \tilde{b}_{ij}(p) \\
y_{ij}(p+2) &= C A^2 \tilde{b}_{ij}(p) \\
&\;\;\vdots \\
y_{ij}(p+\ell) &= C A^{\ell} \tilde{b}_{ij}(p)
\end{aligned} \tag{29}$$

Fig. 1 Multiple-sample-period pulse

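where $\tilde{b}_{ij}(p)$ is defined in (13). There are a total of $(r-1)r/2$ combinations of two inputs at a time for generation of the multiple-sample-pulse response. Additional sets of pulse responses are generated by repeating some experiments with different input values. Now define the system Markov parameters for the $p$-sample-period pulse response as

$$\begin{aligned}
Y_p(p) &= \begin{bmatrix} y_{12}(p) & y_{13}(p) & \cdots \end{bmatrix} = C \bar{B}_p \\
Y_p(p+1) &= \begin{bmatrix} y_{12}(p+1) & y_{13}(p+1) & \cdots \end{bmatrix} = C A \bar{B}_p \\
&\;\;\vdots \\
Y_p(p+\ell) &= \begin{bmatrix} y_{12}(p+\ell) & y_{13}(p+\ell) & \cdots \end{bmatrix} = C A^{\ell} \bar{B}_p
\end{aligned} \tag{30}$$

where the $n \times \gamma$ matrix $\bar{B}_p$ is defined as

$$\bar{B}_p = \begin{bmatrix} \tilde{b}_{12}(p) & \tilde{b}_{13}(p) & \cdots \end{bmatrix} \tag{31}$$

Let us form an $\alpha m \times \gamma$ matrix as follows.

$$H_p = \begin{bmatrix} Y_p(p) \\ Y_p(p+1) \\ \vdots \\ Y_p(p+\alpha-1) \end{bmatrix}
= \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{\alpha-1} \end{bmatrix} \bar{B}_p \tag{32}$$

Using $U_1$ computed in (19) from the one-time-step pulse response, the $n \times \gamma$ matrix $\bar{B}_p$ in (32) can be solved by

$$\bar{B}_p = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{\alpha-1} \end{bmatrix}^{\dagger} H_p = U_1^T H_p \tag{33}$$

To determine $B_c$, let us first observe the matrices $\bar{B}_1, \ldots, \bar{B}_p$ defined in (17) and (31), and determined by (22) and (33), i.e.,

$$\begin{aligned}
\bar{B}_1 &= \begin{bmatrix} \tilde{b}_{12}(1) & \tilde{b}_{13}(1) & \cdots \end{bmatrix} = \begin{bmatrix} \bar{b}_{12} & \bar{b}_{13} & \cdots \end{bmatrix} = \text{the first } \gamma \text{ columns of } \Sigma_1 V_1^T \\
\bar{B}_2 &= \begin{bmatrix} \tilde{b}_{12}(2) & \tilde{b}_{13}(2) & \cdots \end{bmatrix} \\
&\;\;\vdots \\
\bar{B}_p &= \begin{bmatrix} \tilde{b}_{12}(p) & \tilde{b}_{13}(p) & \cdots \end{bmatrix}
\end{aligned} \tag{34}$$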


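Applying the recursive formula

$$\bar{B}_k - \bar{B}_{k-1} = \begin{bmatrix} \bar{A}_{12}^{\,k-1} \bar{b}_{12} & \bar{A}_{13}^{\,k-1} \bar{b}_{13} & \cdots \end{bmatrix}; \qquad k = 2, 3, \ldots, p \tag{35}$$

yields

$$\begin{bmatrix} \bar{B}_1 \\ \bar{B}_2 - \bar{B}_1 \\ \vdots \\ \bar{B}_p - \bar{B}_{p-1} \end{bmatrix}
= \begin{bmatrix}
\bar{b}_{12} & \bar{b}_{13} & \cdots & \bar{b}_{(r-1)r} & \cdots \\
\bar{A}_{12} \bar{b}_{12} & \bar{A}_{13} \bar{b}_{13} & \cdots & \bar{A}_{(r-1)r} \bar{b}_{(r-1)r} & \cdots \\
\vdots & \vdots & & \vdots & \\
\bar{A}_{12}^{\,p-1} \bar{b}_{12} & \bar{A}_{13}^{\,p-1} \bar{b}_{13} & \cdots & \bar{A}_{(r-1)r}^{\,p-1} \bar{b}_{(r-1)r} & \cdots
\end{bmatrix} \tag{36}$$

Based on the above matrix, define the controllability-like matrices for each pair of input $i$ and input $j$

$$\mathcal{C}_{ij} = \begin{bmatrix} \bar{b}_{ij} & \bar{A}_{ij} \bar{b}_{ij} & \cdots & \bar{A}_{ij}^{\,p-1} \bar{b}_{ij} \end{bmatrix}; \qquad i, j = 1, 2, \ldots, r;\; i \neq j \tag{37}$$

To determine the state matrix $\bar{A}_{ij} = e^{A_{cij} \Delta t}$, let us first define the two matrices

$$\mathcal{C}_{ij\leftarrow} = \begin{bmatrix} \bar{b}_{ij} & \bar{A}_{ij} \bar{b}_{ij} & \cdots & \bar{A}_{ij}^{\,p-2} \bar{b}_{ij} \end{bmatrix} \tag{38}$$

and

$$\mathcal{C}_{ij\rightarrow} = \bar{A}_{ij}\, \mathcal{C}_{ij\leftarrow} = \begin{bmatrix} \bar{A}_{ij} \bar{b}_{ij} & \bar{A}_{ij}^{2} \bar{b}_{ij} & \cdots & \bar{A}_{ij}^{\,p-1} \bar{b}_{ij} \end{bmatrix} \tag{39}$$

Deleting the last column of $\mathcal{C}_{ij}$ forms the matrix $\mathcal{C}_{ij\leftarrow}$, whereas deleting the first column of $\mathcal{C}_{ij}$ yields the matrix $\mathcal{C}_{ij\rightarrow}$. Equations (38) and (39) produce the solutions:

$$\bar{A}_{ij} = \mathcal{C}_{ij\rightarrow}\, \mathcal{C}_{ij\leftarrow}^{\dagger}; \qquad i, j = 1, 2, \ldots, r;\; i \neq j \tag{40}$$

and

$$\bar{b}_{ij} = \text{the first column of } \mathcal{C}_{ij}; \qquad i, j = 1, 2, \ldots, r;\; i \neq j \tag{41}$$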
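The column-shift construction (35) to (41) can be illustrated for a single input pair. The sketch below is an assumption-laden example. The function names are hypothetical, each $\bar{B}_k$ has only one column (one input pair), and the matrices are generated from made-up $\bar{A}_{ij}$ and $\bar{b}_{ij}$ instead of being identified from data.

```python
import numpy as np

def controllability_like(B_bars, column):
    """Column `column` of B_1, B_2 - B_1, ..., B_p - B_(p-1), side by side; eqs. (36)-(37)."""
    cols = [B_bars[0][:, column]]
    cols += [B_bars[k][:, column] - B_bars[k - 1][:, column] for k in range(1, len(B_bars))]
    return np.column_stack(cols)

def shifted_state_matrix(ctrl):
    """(A_bar, b_bar) from ctrl = [b, A b, ..., A^(p-1) b], following (38)-(41)."""
    left = ctrl[:, :-1]      # last column deleted, eq. (38)
    right = ctrl[:, 1:]      # first column deleted, eq. (39)
    return right @ np.linalg.pinv(left), ctrl[:, 0]

# Hypothetical frozen-input discrete-time pair with n = 2, and p = 4 pulse lengths
A_bar = np.array([[0.6, 0.1], [-0.2, 0.5]])
b_bar = np.array([1.0, 0.5])
b_tilde = np.zeros(2)
B_bars = []
for k in range(4):
    b_tilde = A_bar @ b_tilde + b_bar        # b_tilde(k + 1), the sum in eq. (13)
    B_bars.append(b_tilde.reshape(2, 1))
ctrl = controllability_like(B_bars, column=0)
A_found, b_found = shifted_state_matrix(ctrl)
```

The pseudo-inverse in (40) only returns $\bar{A}_{ij}$ when the shifted matrices have rank $n$, which is why more pulse lengths than states are needed.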

For the identified matrix $\bar{A}_{ij}$ to have rank $n$, both $n \times (p-1)$ matrices $\mathcal{C}_{ij\rightarrow}$ and $\mathcal{C}_{ij\leftarrow}$ must also have rank $n$. This implies that $p$ must be chosen such that $p > n$, so identification of $\bar{A}_{ij}$ requires a total of at least $(n+1)$ sets of responses generated by $(n+1)$ different time periods of the pulse input. From the definitions of $\bar{A}_{ij}$ and $\bar{b}_{ij}$ in (7) and (8), the conversion from discrete time to continuous time produces

$$A_{cij} = A_c + N_{ci} \mu_i + N_{cj} \mu_j + N_{cii} \mu_i^2 + N_{cjj} \mu_j^2 + N_{cij} \mu_i \mu_j = \frac{1}{\Delta t} \log\left( \bar{A}_{ij} \right) \tag{42}$$

and

$$b_{cij} = \left[ I_{n \times n} \Delta t + \frac{1}{2!} A_{cij} (\Delta t)^2 + \frac{1}{3!} (A_{cij})^2 (\Delta t)^3 + \cdots \right]^{-1} \bar{b}_{ij} \tag{43}$$


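for $i, j = 1, 2, \ldots, r$; $i \neq j$, where $I_{n \times n}$ is an $n \times n$ identity matrix. Now recall from (8) that

$$b_{cij} = b_{ci} \mu_i + b_{cj} \mu_j; \qquad i, j = 1, 2, \ldots, r;\; i \neq j \tag{44}$$

which yields

$$\begin{bmatrix} b_{c12} & b_{c13} & \cdots & b_{c(r-1)r} & \cdots \end{bmatrix}
= \begin{bmatrix} b_{c1} & b_{c2} & b_{c3} & \cdots & b_{c(r-1)} & b_{cr} \end{bmatrix} \otimes
\begin{bmatrix}
\mu_1 & \mu_1 & \cdots & 0 & \cdots \\
\mu_2 & 0 & \cdots & 0 & \cdots \\
0 & \mu_3 & \cdots & 0 & \cdots \\
\vdots & \vdots & \ddots & \vdots & \\
0 & 0 & \cdots & \mu_{r-1} & \cdots \\
0 & 0 & \cdots & \mu_r & \cdots
\end{bmatrix} \tag{45}$$

or equivalently

$$\bar{B}_c = B_c \otimes \Lambda \tag{46}$$

where $\bar{B}_c = \begin{bmatrix} b_{c12} & b_{c13} & \cdots \end{bmatrix}$ and the symbol $\otimes$ means the Kronecker product. The input matrix $B_c$ can thus be identified to be

$$B_c = \bar{B}_c \otimes \Lambda^{\dagger} \tag{47}$$

From (42), the matrices $N_{ci}$ and $N_{cij}$ ($i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, r$) are determined by

$$N_{ci} \mu_i + N_{cj} \mu_j + N_{cii} \mu_i^2 + N_{cjj} \mu_j^2 + N_{cij} \mu_i \mu_j = A_{cij} - A_c \tag{48}$$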

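Rewriting it into a matrix form yields

$$\begin{bmatrix} \mu_i & \mu_j & \mu_i^2 & \mu_j^2 & \mu_i \mu_j \end{bmatrix} \otimes
\begin{bmatrix} N_{ci} \\ N_{cj} \\ N_{cii} \\ N_{cjj} \\ N_{cij} \end{bmatrix} = A_{cij} - A_c \tag{49}$$

For the case where the system has only two inputs, denoted as $u_i$ and $u_j$, we need five experiments with five sets of $\mu_{i\kappa}$ and $\mu_{j\kappa}$ with $\kappa = 1, 2, 3, 4, 5$:

$$\begin{bmatrix}
\mu_{i1} & \mu_{j1} & \mu_{i1}^2 & \mu_{j1}^2 & \mu_{i1} \mu_{j1} \\
\mu_{i2} & \mu_{j2} & \mu_{i2}^2 & \mu_{j2}^2 & \mu_{i2} \mu_{j2} \\
\mu_{i3} & \mu_{j3} & \mu_{i3}^2 & \mu_{j3}^2 & \mu_{i3} \mu_{j3} \\
\mu_{i4} & \mu_{j4} & \mu_{i4}^2 & \mu_{j4}^2 & \mu_{i4} \mu_{j4} \\
\mu_{i5} & \mu_{j5} & \mu_{i5}^2 & \mu_{j5}^2 & \mu_{i5} \mu_{j5}
\end{bmatrix} \otimes
\begin{bmatrix} N_{ci} \\ N_{cj} \\ N_{cii} \\ N_{cjj} \\ N_{cij} \end{bmatrix}
= \begin{bmatrix}
A_{c i_1 j_1} - A_c \\
A_{c i_2 j_2} - A_c \\
A_{c i_3 j_3} - A_c \\
A_{c i_4 j_4} - A_c \\
A_{c i_5 j_5} - A_c
\end{bmatrix} \tag{50}$$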


Matrices $N_{ci}$ and $N_{cij}$ ($i, j = 1, 2$) can then be computed by

$$\begin{bmatrix} N_{ci} \\ N_{cj} \\ N_{cii} \\ N_{cjj} \\ N_{cij} \end{bmatrix}
= \begin{bmatrix}
\mu_{i1} & \mu_{j1} & \mu_{i1}^2 & \mu_{j1}^2 & \mu_{i1} \mu_{j1} \\
\mu_{i2} & \mu_{j2} & \mu_{i2}^2 & \mu_{j2}^2 & \mu_{i2} \mu_{j2} \\
\mu_{i3} & \mu_{j3} & \mu_{i3}^2 & \mu_{j3}^2 & \mu_{i3} \mu_{j3} \\
\mu_{i4} & \mu_{j4} & \mu_{i4}^2 & \mu_{j4}^2 & \mu_{i4} \mu_{j4} \\
\mu_{i5} & \mu_{j5} & \mu_{i5}^2 & \mu_{j5}^2 & \mu_{i5} \mu_{j5}
\end{bmatrix}^{-1} \otimes
\begin{bmatrix}
A_{c i_1 j_1} - A_c \\
A_{c i_2 j_2} - A_c \\
A_{c i_3 j_3} - A_c \\
A_{c i_4 j_4} - A_c \\
A_{c i_5 j_5} - A_c
\end{bmatrix} \tag{51}$$

The matrix inverse $[\,\cdot\,]^{-1}$ becomes the matrix pseudo-inverse $[\,\cdot\,]^{\dagger}$ if more than five experiments are conducted for the pair of inputs. Note that the values $\mu_{i\kappa}$ and $\mu_{j\kappa}$ for $\kappa = 1, \ldots, 5$ should be chosen such that the matrix to be inverted is well-conditioned. For the general case where the number of inputs is $r > 1$, an equation similar to (51) may be formulated by performing $(r + r + r(r-1)/2)$ sets of experiments with two inputs at a time. Each set of experiments requires a minimum of $n + 1$ different time periods of the pulse to compute the matrix $A_{c i_\kappa j_\kappa}$, with a proper integer for $\kappa$, in order to establish enough equations to solve for the coupling coefficient matrices $N_{ci}$ and $N_{cij}$ for $i, j = 1, 2, \ldots, r$.
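Solving (50) for the five coupling matrices amounts to a block-wise linear solve. In the sketch below the product written with $\otimes$ is implemented as the scalar matrix acting on the stacked $n \times n$ blocks, i.e. through `np.kron` with an identity matrix, which is an interpretation adopted for the example. The function name, the magnitudes and the system matrices are hypothetical.

```python
import numpy as np

def solve_coupling_matrices(mu_pairs, A_cij_list, A_c):
    """Least-squares solution of (50)/(51); returns [N_ci, N_cj, N_cii, N_cjj, N_cij]."""
    n = A_c.shape[0]
    # One row [mu_i, mu_j, mu_i^2, mu_j^2, mu_i*mu_j] per experiment
    M = np.array([[mi, mj, mi**2, mj**2, mi * mj] for mi, mj in mu_pairs])
    rhs = np.vstack([A - A_c for A in A_cij_list])            # stacked differences
    stacked = np.kron(np.linalg.pinv(M), np.eye(n)) @ rhs     # block-wise (pseudo-)inverse
    return [stacked[k * n:(k + 1) * n] for k in range(5)]

# Hypothetical two-input example with n = 2 and five experiments
rng = np.random.default_rng(0)
A_c = np.array([[-1.0, 0.0], [1.0, -2.0]])
N_true = [rng.standard_normal((2, 2)) for _ in range(5)]
mu_pairs = [(1.0, 0.5), (0.5, 1.0), (1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
A_cij_list = [
    A_c + sum(c * N for c, N in zip((mi, mj, mi**2, mj**2, mi * mj), N_true))
    for mi, mj in mu_pairs
]
N_found = solve_coupling_matrices(mu_pairs, A_cij_list, A_c)
```

Because the pseudo-inverse is used, the same function also covers the case of more than five experiments mentioned in the text. Checking the condition number of `M` before trusting the result is advisable.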

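4 Numerical Example

Consider the following bilinear equation for the two-input ($r = 2$) and single-output ($m = 1$) case

$$\dot{x} = A_c x + B_c u + N_{c1} x u_1 + N_{c2} x u_2 + N_{c11} x u_1^2 + N_{c22} x u_2^2 + N_{c12} x u_1 u_2, \qquad y = C x \tag{52}$$

where (entry magnitudes as printed; any minus signs are not legible in this copy)

$$\begin{gathered}
A_c = \begin{bmatrix} 1 & 0 \\ 1 & 2 \end{bmatrix}; \quad
B_c = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}; \quad
C = \begin{bmatrix} 0 & 1 \end{bmatrix}; \quad
N_{c1} = \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix}; \quad
N_{c2} = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}; \\
N_{c11} = \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix}; \quad
N_{c22} = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}; \quad
N_{c12} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}
\end{gathered} \tag{53}$$

The system possesses five nonlinear terms, i.e., $r + r + r(r-1)/2 = 5$, in this two-input case. Five nonlinear terms imply the need for five independent sets of multiple-pulse response data with five different input values to identify a complete set of system matrices $A_c$, $B_c$, $C$, $D$, $N_{c1}$, $N_{c2}$, $N_{c11}$, $N_{c22}$ and $N_{c12}$. The order of the system is generally unknown. Let us generate five sets of data with the time period $\Delta t = 1$ second for each pair of input values. The five sets of input values are as follows: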

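$$\begin{bmatrix}
\mu_{11} & \mu_{12} & \mu_{13} & \mu_{14} & \mu_{15} \\
\mu_{21} & \mu_{22} & \mu_{23} & \mu_{24} & \mu_{25}
\end{bmatrix}
= \begin{bmatrix}
0.4 & 0.001 & 0.25 & 1 & 0.4 \\
0.001 & 1 & 0.25 & 2 & 0.1
\end{bmatrix} \tag{54}$$

Fig. 2 Five responses sampled at 1 Hz from the two inputs of amplitudes [0.4; 0.001] over five different sample periods (spp: sample-period pulse)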

The first subscript of $\mu$ indicates the input number, whereas the second subscript gives the number of the experiment. These five sets of pulse values were carefully selected to generate response histories with comparable size in amplitude. Figure 2 shows the pulse responses generated by exciting the bilinear system (52) using the first set of input values shown in (54). Each response sampled at 1 Hz has 12 data points. Similar pulse responses (not shown) are also generated by the other sets of input values. These responses are obtained by numerically integrating the bilinear system shown in (52). Using the five one-sample-period pulse responses and setting $\alpha = 5$ and $\beta = 6$, the Hankel matrix $H_1$ shown in (16) with five sets of input values should have the size of $m\alpha \times 5\beta = 5 \times 30$. The singular values of this Hankel matrix are

$$\Sigma_1 = \mathrm{diag} \begin{bmatrix} 1.0114 & 0.8767 & 0 & 0 & 0 \end{bmatrix} \tag{55}$$

which implies the system order $n = 2$. The left singular vector matrix is (entry magnitudes as printed; signs are not legible in this copy)

$$U_1 = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^4 \end{bmatrix}
= \begin{bmatrix}
0.8827 & 0.4673 \\
0.4313 & 0.7638 \\
0.1731 & 0.4076 \\
0.0656 & 0.1671 \\
0.0244 & 0.0638
\end{bmatrix} \tag{56}$$

The state matrix and the output matrix identified from this Hankel matrix are


00:5674 30:8003 Q I CQ D 00:8827 00:4673 I D D 0 0 Ac D 00:1631 20:4326


(57)

Using the other multiple-sample-period responses, the Hankel matrices Hk shown in (32) for k = 2, 3, 4, 5 have the size of 5 5 (i.e., ˛ number of input pairs), that produce the 2 5 matrices BN1 ; BN 2 ; ; BN 5 shown in (34), and in turn yield 2 4 matrices Cij and Cij ! shown in (38) and (39). Applying (35)–(44), the quantity Bc is identified as 00:1241 00:9444 I (58) BQc D 00:2344 00:3560 Applying (51) thus yields the identified quantities: 00:2759 00:3833 Q Nc2 D I 00:5212 00:7241

(59)

00:4326 30:8003 00:0568 00:4993 I NQ c22 D I 00:1631 10:4326 00:1074 00:9432 00:7085 40:1836 D 00:3581 00:7085

(60)

NQ c1 D and

NQ c11 D NQ c12

20:0999 20:9177 I 00:7916 10:0999

The tilde on top of $A_c$, $B_c$, $N_{c1}$, $N_{c2}$, $N_{c11}$, $N_{c22}$, and $N_{c12}$ signifies identified quantities that are not uniquely determined. The identified matrices $\tilde{A}_c$, $\tilde{B}_c$, $\tilde{C}$, $\tilde{N}_{c1}$, $\tilde{N}_{c2}$, $\tilde{N}_{c11}$, $\tilde{N}_{c22}$, and $\tilde{N}_{c12}$ are equivalent to the matrices $A_c$, $B_c$, $N_{c1}$, $N_{c2}$, $N_{c11}$, $N_{c22}$, and $N_{c12}$ shown in (53) in the sense that they give the same map from the input $u$ to the output $y$. Note that the eigenvalues of $A_c$ and $\tilde{A}_c$ are identical, i.e., 1 and 2. A coordinate transformation can transform one set of matrices into the other (see [6]).
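The equivalence under a coordinate transformation can be made explicit with a standard change of state coordinates. The following is only an illustrative sketch: the nonsingular matrix $T$ is an assumption introduced here and is not a quantity computed in the paper. Substituting $x = Tz$ into (52) gives

```latex
\dot{z} = T^{-1}A_cT\,z + T^{-1}B_c\,u
        + \sum_{i=1}^{2} T^{-1}N_{ci}T\,z\,u_i
        + \sum_{1 \le i \le j \le 2} T^{-1}N_{cij}T\,z\,u_i u_j ,
\qquad y = CT\,z .
```

Under this assumption the transformed matrices are $T^{-1}A_cT$, $T^{-1}B_c$, $CT$, $T^{-1}N_{ci}T$, and $T^{-1}N_{cij}T$. They produce the same output $y$ for every input $u$, and $T^{-1}A_cT$ is similar to $A_c$, which is why the eigenvalues coincide.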

5 Concluding Remarks

This paper constitutes part of a broader effort to study continuous-time bilinear system identification. A series of studies has been carried out for several cases with various aspects of nonlinearity consisting of the product of the state vector and force variables. The whole identification process requires a series of pulse-response experiments with various pulse durations and a finite number of pairs of input values. For simplicity, only two coupling force variables at a time are considered in this paper. The identification process can be easily extended to more general cases with any fixed number of coupling force variables.

The derivation of the identification algorithm relies on noise-free input and output data. In practice, only noisy signals are measured and model errors always exist. Furthermore, the initial state may not be at rest. These practical limitations of the proposed method have not been addressed in this paper. Nevertheless, it is anticipated that the accuracy indicators that already exist for linear system identification with noisy measurements may be enhanced for bilinear system identification.

In the past decade, many identification methods were developed for discrete-time bilinear systems. At present, continuous-time bilinear systems are not known to have an explicit discrete-time bilinear counterpart, as they do in the linear case. It remains an open question whether an explicit or implicit discrete-time version of the continuous-time bilinear system exists such that one can be transformed into the other, as in the linear case.

Acknowledgements The major portion of this research was completed when the author served as President of the National Applied Research Laboratory, Taipei, Taiwan.

References

1. Bruni, C., DiPillo, G., and Koch, G., On the Mathematical Models of Bilinear Systems, Ricerche Di Automatica, 2 (1), 1971, pp. 11–26.
2. Bruni, C., DiPillo, G., and Koch, G., Bilinear Systems: An Appealing Class of Nearly Linear Systems in Theory and Application, IEEE Transactions on Automatic Control, AC-19, 1974, pp. 334–348.
3. Mohler, R. R., and Kolodziej, W. J., An Overview of Bilinear System Theory and Applications, IEEE Transactions on Systems, Man and Cybernetics, SMC-10, 1980, pp. 683–688.
4. Mohler, R. R., Nonlinear Systems: Vol. II, Applications to Bilinear Control, Prentice-Hall, New Jersey, 1991.
5. Elliott, D. L., Bilinear Systems, in Encyclopedia of Electrical Engineering, Vol. II, John Webster (ed.), John Wiley and Sons, New York, 1999, pp. 308–323.
6. Juang, J.-N., Continuous-Time Bilinear System Identification, Nonlinear Dynamics, Kluwer Academic Publishers, Special Issue 39(1-2), (January I-II 2005), pp. 79–94.
7. Sontag, E.D., Wang, Y., Megretski, A., Input Classes for Identification of Bilinear Systems, 2007 American Control Conference, July 11–13, 2007, Marriott Marquis Hotel at Times Square, New York, USA, Paper FrA04.3.
8. Juang, J.-N., Generalized Bilinear System Identification, The Journal of the Astronautical Sciences, Vol. 57, Nos. 1 & 2, January-June 2009, pp. 261–273.
9. Juang, J.-N., Applied System Identification, Prentice Hall, New Jersey, 1994.
10. Juang, J.-N. and Phan, M. Q., Identification and Control of Mechanical Systems, Cambridge University Press, New York, 2001.

Reduced-Order Wave-Propagation Modeling Using the Eigensystem Realization Algorithm

Stephen A. Ketcham, Minh Q. Phan, and Harley H. Cudney

Abstract This paper presents a computationally efficient version of the Eigensystem Realization Algorithm (ERA) to model the dynamics of large-domain acoustic propagation from High Performance Computing (HPC) data. This adaptation of the ERA permits hundreds of thousands of output signals to be handled at a time. Once the ERA-derived reduced-order models are obtained, they can be used for future simulation of the propagation accurately without having to go back to the HPC model. Computations that take hours on a massively parallel high performance computer can now be carried out in minutes on a laptop computer.

S.A. Ketcham, H.H. Cudney: Engineer Research and Development Center, Hanover, NH 03755, USA
M.Q. Phan: Thayer School of Engineering, Dartmouth College, Hanover, NH 03755, USA
In: H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_15, © Springer-Verlag Berlin Heidelberg 2012

1 Introduction

Simulation of linear time-invariant systems has broad application in outdoor sound propagation. On a site-specific basis, particularly for geometrically complex and heterogeneous domains, sound propagation can be best examined and understood using results of three-dimensional numerical simulations. However, to analyze large-domain, long-duration, and wide-bandwidth systems associated with many practical scenarios, these simulations require massively parallel computations merely to approach an acceptable fidelity between simulation and actual outdoor sound transmission. This research, recognizing the computational investment inherent in every high-performance numerical model, develops reduced-order models (ROMs) from input–output signals of sound-propagation supercomputing simulations [1, 2]. The method uses a state-space minimum-realization technique called the Eigensystem Realization Algorithm (ERA) [3–6] to generate efficient and reusable ROMs for very large wave fields that are down-selected from the entire output field of the numerical model.

For problems where the number of outputs is in the hundreds of thousands or millions, conventional ERA requires the singular value decomposition (SVD) of a very large matrix. This calculation is expensive in terms of both computational time and random access memory (RAM). Hence there is a need for a modified ERA formulation that avoids using a large amount of RAM and the SVD of a large matrix. It should be noted that outdoor propagation cannot be described with classical modal decomposition. The implementation, therefore, focuses on finding a model that results in nearly exact reproduction of the Markov parameters (i.e., system unit pulse response samples) for the duration and the dynamic range of interest. Compared to results generated by the urban-acoustics HPC simulation, the prediction error of the reduced-order model derives primarily from any inaccuracies in the computed Markov parameters, rather than from the realization technique. Our study of the wave-field error levels of a reduced-order model with 1.26 million outputs indicates that the method is capable of reducing a supercomputer model to a model that can operate on a laptop computer.

The outline of this paper is as follows. First, we review the original ERA algorithm. Next, we describe a modified version for computational efficiency. Finally, we present a numerical illustration that shows how the technique is used to derive highly accurate reduced-order models from HPC simulation data of sound propagation in a highly complex environment. A companion paper [7] describes another technique where the reduced-order models can be directly realized from the Markov parameters without going through ERA.

2 State-Space Model

The discrete-time state-space model for a linear time-invariant dynamical system takes the form

$$x(k+1) = Ax(k) + Bu(k), \qquad y(k) = Cx(k) \tag{1}$$

In (1), $k$ is the integer sample index, $x$ is an $n$-dimensional state vector, $u$ is an $m$-dimensional input vector, and $y$ is a $q$-dimensional output vector. $A$ is the $n$-by-$n$ system matrix, $B$ is the $n$-by-$m$ input influence matrix, and $C$ is the $q$-by-$n$ output influence matrix. For example, the input can be a single sound source, and the output can be sound pressures at nodes throughout a domain of interest. The system Markov parameters, $h(k)$, are

$$h(k) = CA^{k-1}B, \qquad k = 1, 2, \ldots \tag{2}$$

These can be obtained from the source and response data by various techniques, including the inverse FFT method as described in [6]. The next section describes how a state-space model (1) can be realized from a sufficient number of Markov parameters.
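As a small illustration of (1) and (2), the sketch below computes Markov parameters directly from given matrices and checks them against a simulated unit-pulse response. It assumes the matrices $A$, $B$, $C$ are already known, which is not the situation in the paper (there the Markov parameters come from HPC data). The function names `markov_parameters` and `pulse_response` are hypothetical.

```python
import numpy as np

def markov_parameters(A, B, C, p):
    """Return the list [h(1), ..., h(p)] with h(k) = C A^(k-1) B, as in (2)."""
    h = []
    AkB = B.copy()              # holds A^(k-1) B, starting with k = 1
    for _ in range(p):
        h.append(C @ AkB)
        AkB = A @ AkB
    return h

def pulse_response(A, B, C, p):
    """Simulate (1) from rest with a unit pulse on the first input at k = 0.

    Returns [y(0), ..., y(p)]; for k >= 1, y(k) equals the first column of h(k).
    """
    x = np.zeros(A.shape[0])
    u = np.zeros(p + 1)
    u[0] = 1.0
    y = []
    for k in range(p + 1):
        y.append(C @ x)
        x = A @ x + B[:, 0] * u[k]
    return y
```

For a system at rest, `pulse_response(A, B, C, p)[k]` matches `markov_parameters(A, B, C, p)[k - 1][:, 0]` for `k = 1, ..., p`, which is the sense in which the Markov parameters are the unit pulse response samples.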

3 Eigensystem Realization Algorithm (ERA)

Starting with the Markov parameters, a state-space model of the system can be derived by ERA. The algorithm begins by forming two $r$-by-$s$ block Hankel matrices $H(0)$ and $H(1)$ as follows:

$$H(0) = \begin{bmatrix} h(1) & h(2) & \cdots & h(s) \\ h(2) & h(3) & \cdots & h(s+1) \\ \vdots & \vdots & \ddots & \vdots \\ h(r) & h(r+1) & \cdots & h(r+s-1) \end{bmatrix} \tag{3}$$

$$H(1) = \begin{bmatrix} h(2) & h(3) & \cdots & h(s+1) \\ h(3) & h(4) & \cdots & h(s+2) \\ \vdots & \vdots & \ddots & \vdots \\ h(r+1) & h(r+2) & \cdots & h(r+s) \end{bmatrix} \tag{4}$$

The minimum order $n$ of the system is revealed by the singular value decomposition of the Hankel matrix $H(0)$,

$$H(0) = U \Sigma V^T \tag{5}$$

The columns of $U$ and $V$ are orthonormal, $\Sigma$ is an $n$-by-$n$ diagonal matrix of positive singular values, and $n$ is the minimum order of the system. Define a $q$-by-$rq$ matrix $E_q^T$ and an $m$-by-$sm$ matrix $E_m^T$ consisting of identity and null matrices of the form

$$E_q^T = \begin{bmatrix} I_{q \times q} & 0_{q \times (r-1)q} \end{bmatrix}, \qquad E_m^T = \begin{bmatrix} I_{m \times m} & 0_{m \times (s-1)m} \end{bmatrix} \tag{6}$$

A discrete-time minimum-order realization of the system can be shown to be

$$A_r = \Sigma^{-1/2} U^T H(1) V \Sigma^{-1/2}, \qquad B_r = \Sigma^{1/2} V^T E_m, \qquad C_r = E_q^T U \Sigma^{1/2} \tag{7}$$

In (7) the subscript $r$ is added to indicate that the realized state-space model is not necessarily in the same coordinates as the original state-space model in (1), but they have the same input–output map, i.e., the same Markov parameters. With perfect Markov parameters, the singular value decomposition in (5) reveals the true minimum order $n$ of the system if $r$ and $s$ are chosen to be sufficiently large so that the rank $n$ of $H(0)$ can be revealed. In the presence of noise, or when the Markov parameters are imperfect, the Hankel matrix $H(0)$ will contain more than $n$ nonzero singular values, so that the true minimum order of the system is no longer obvious. In that case the order of the realization in (7) is determined by the number of singular values that the user decides to retain in (5).
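The steps (3)–(7) can be sketched in a few lines of NumPy. This is a minimal illustration under the assumption that the Markov parameters are supplied as a Python list `h` with `h[0]` holding $h(1)$ and at least $r + s$ entries; the function names `block_hankel` and `era` are hypothetical and this is not the authors' implementation.

```python
import numpy as np

def block_hankel(h, r, s, shift=0):
    """r-by-s block Hankel matrix; block (i, j) is h(i + j + 1 + shift), cf. (3), (4)."""
    return np.block([[h[i + j + shift] for j in range(s)] for i in range(r)])

def era(h, r, s, n):
    """Order-n realization (A_r, B_r, C_r) from Markov parameters, following (5)-(7)."""
    q, m = h[0].shape
    H0 = block_hankel(h, r, s, 0)
    H1 = block_hankel(h, r, s, 1)
    U, sv, Vt = np.linalg.svd(H0, full_matrices=False)
    U, sv, Vt = U[:, :n], sv[:n], Vt[:n, :]        # retain n singular values
    s_half = np.sqrt(sv)
    # A_r = Sigma^(-1/2) U^T H(1) V Sigma^(-1/2)
    Ar = (U.T @ H1 @ Vt.T) / np.outer(s_half, s_half)
    # B_r = Sigma^(1/2) V^T E_m: first m columns
    Br = (s_half[:, None] * Vt)[:, :m]
    # C_r = E_q^T U Sigma^(1/2): first q rows
    Cr = (U * s_half)[:q, :]
    return Ar, Br, Cr, sv
```

With noise-free Markov parameters of an order-$n$ system, `Cr @ matrix_power(Ar, k - 1) @ Br` reproduces $h(k)$ up to rounding. With noisy data, the returned singular values `sv` are what a user would inspect to choose `n`.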

4 Computationally Efficient Version of ERA

Consider a system where the number of outputs is many orders of magnitude larger than the number of inputs, $q \gg m$. Instead of working with the original state-space model, we work with its transposed model,

$$\bar{A} = A^T, \qquad \bar{B} = C^T, \qquad \bar{C} = B^T \tag{8}$$

It follows that the Markov parameters of the transposed model are the transpose of the Markov parameters of the original model,

$$\bar{C}\bar{A}^k\bar{B} = B^T \left(A^T\right)^k C^T = \left(CA^kB\right)^T \tag{9}$$

Applying ERA to the Markov parameters of the transposed system produces

$$\bar{A}_r = \bar{\Sigma}^{-1/2}\bar{U}^T\bar{H}(1)\bar{V}\bar{\Sigma}^{-1/2}, \qquad \bar{B}_r = \bar{\Sigma}^{1/2}\bar{V}^T E_q, \qquad \bar{C}_r = E_m^T\bar{U}\bar{\Sigma}^{1/2} \tag{10}$$

where $\bar{H}(0) = \bar{U}\bar{\Sigma}\bar{V}^T$, and both Hankel matrices $\bar{H}(0)$ and $\bar{H}(1)$ are built from the transposed Markov parameters. When $q \gg m$, the matrices $\bar{H}(0)$ and $\bar{H}(1)$ are both very wide matrices. To compute the SVD of $\bar{H}(0)$ efficiently, recognize that

$$\bar{H}(0)\bar{H}^T(0) = \bar{U}\bar{\Sigma}\bar{V}^T\left(\bar{U}\bar{\Sigma}\bar{V}^T\right)^T = \bar{U}\bar{\Sigma}^2\bar{U}^T \tag{11}$$

The product $\bar{H}(0)\bar{H}^T(0)$, which is a square symmetric matrix of much lower dimensions, has $\bar{U}$ as its matrix of left singular vectors. The singular values of $\bar{H}(0)\bar{H}^T(0)$ are the squares of the singular values of $\bar{H}(0)$. The matrices $\bar{V}$ and $\bar{V}^T$ can be expressed as

$$\bar{V} = \bar{H}(0)^T\bar{U}\bar{\Sigma}^{-1} \tag{12}$$

$$\bar{V}^T = \bar{\Sigma}^{-1}\bar{U}^T\bar{H}(0) \tag{13}$$

The Hankel matrices formed by the transposed Markov parameters are the transpose of the Hankel matrices formed by the original Markov parameters,

$$\bar{H}(0) = H(0)^T, \qquad \bar{H}(1) = H(1)^T \tag{14}$$

Therefore,

$$H(0)^T H(0) = \bar{H}(0)\bar{H}(0)^T = \bar{U}\bar{\Sigma}^2\bar{U}^T \tag{15}$$

Substituting (12), (13), and (14) into (10) produces

$$\bar{A}_r = \bar{\Sigma}^{-1/2}\bar{U}^T H(1)^T H(0)\,\bar{U}\bar{\Sigma}^{-3/2}, \qquad \bar{B}_r = \bar{\Sigma}^{-1/2}\bar{U}^T H(0)^T E_q, \qquad \bar{C}_r = E_m^T\bar{U}\bar{\Sigma}^{1/2} \tag{16}$$

Recognizing the relationship between the transposed model and the original model, we arrive at the final formulas for $A_r$, $B_r$, $C_r$ as

$$A_r = \bar{\Sigma}^{-3/2}\bar{U}^T H(0)^T H(1)\,\bar{U}\bar{\Sigma}^{-1/2}, \qquad B_r = \bar{\Sigma}^{1/2}\bar{U}^T E_m, \qquad C_r = E_q^T H(0)\,\bar{U}\bar{\Sigma}^{-1/2} \tag{17}$$

From (11), $\bar{U}$ and $\bar{\Sigma}$ are obtained from the SVD of $H(0)^T H(0) = \bar{U}\bar{\Sigma}^2\bar{U}^T$. The formulas in (17) are computationally efficient because both $H(0)^T H(0)$ and $H(0)^T H(1)$ have significantly smaller dimensions when $q \gg m$. There is no need to form $H(0)$ and $H(1)$ explicitly because only their products, $H(0)^T H(0)$ and $H(0)^T H(1)$, are called for. It should be noted that given $p = r + s$ Markov parameters, $H(0)$ and $H(1)$ with the largest number of columns are:

$$H(0) = \begin{bmatrix} h(1) & h(2) & \cdots & h(p-2) & h(p-1) \\ h(2) & h(3) & \cdots & h(p-1) & h(p) \\ h(3) & h(4) & \cdots & h(p) & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ h(p-1) & h(p) & \cdots & 0 & 0 \end{bmatrix} \tag{18}$$

$$H(1) = \begin{bmatrix} h(2) & h(3) & \cdots & h(p-1) & h(p) \\ h(3) & h(4) & \cdots & h(p) & 0 \\ h(4) & h(5) & \cdots & 0 & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ h(p) & 0 & \cdots & 0 & 0 \end{bmatrix} \tag{19}$$

In (18) and (19), $h(k) = 0$ for $k > p$, which is appropriate for modeling the propagation dynamics during a finite-time interval $k = 0, 1, \ldots, p$. For $q \gg m$, these Hankel matrices produce the largest possible state-space model consisting of $(p-1)m$ states. Common terms that are present in the products $H(0)^T H(0)$ and $H(0)^T H(1)$ do not have to be computed twice. We use an algorithm that computes only the unique multiplications in $H(0)^T H(0)$ and $H(0)^T H(1)$ in forming these products. Finally, as the matrices $E_m$ and $E_q^T$ in (17) simply pick out the first $m$ columns and the first $q$ rows of $\bar{U}^T$ and $H(0)$, respectively, the computation of $B_r$, $C_r$ is simpler than it may appear.

Fig. 1 3D HPC simulation model and RMS of output signals (Pa) from source at center

Fig. 2 Waveform of filtered pulse source and frequency spectrum
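The formulas in (17) can be sketched as follows. This is a minimal illustration, not the authors' code: for readability it forms $H(0)$ and $H(1)$ explicitly, whereas the paper stresses that only the small products $H(0)^T H(0)$ and $H(0)^T H(1)$ are needed and that their repeated terms need not be recomputed. The function name `era_many_outputs` and the list layout of `h` (with `h[0]` holding $h(1)$) are assumptions.

```python
import numpy as np

def era_many_outputs(h, r, s, n):
    """Order-n realization via the small sm-by-sm products, following (15) and (17)."""
    q, m = h[0].shape
    H0 = np.block([[h[i + j] for j in range(s)] for i in range(r)])
    H1 = np.block([[h[i + j + 1] for j in range(s)] for i in range(r)])
    G0 = H0.T @ H0                      # H(0)^T H(0), sm-by-sm
    G1 = H0.T @ H1                      # H(0)^T H(1), sm-by-sm
    # Symmetric eigendecomposition of G0 gives U_bar and Sigma_bar^2, cf. (15)
    lam, Ubar = np.linalg.eigh(G0)
    idx = np.argsort(lam)[::-1][:n]     # n largest eigenvalues
    Ubar = Ubar[:, idx]
    sig = np.sqrt(lam[idx])
    # A_r = Sigma^(-3/2) U^T H(0)^T H(1) U Sigma^(-1/2)
    Ar = (Ubar.T @ G1 @ Ubar) * np.outer(sig ** -1.5, sig ** -0.5)
    # B_r = Sigma^(1/2) U^T E_m: first m columns of U^T
    Br = (np.sqrt(sig)[:, None] * Ubar.T)[:, :m]
    # C_r = E_q^T H(0) U Sigma^(-1/2): first q rows of H(0)
    Cr = (H0[:q, :] @ Ubar) / np.sqrt(sig)
    return Ar, Br, Cr
```

The only decomposition is of an $sm$-by-$sm$ symmetric matrix, independent of the number of outputs $q$. One caveat of this sketch is that working with $H(0)^T H(0)$ squares the singular values, so very small singular values are resolved less accurately than with a direct SVD.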

5 Numerical Illustration

A 3D HPC model of a city center (778 m-by-775 m-by-179 m in height) with 2.8 billion nodes is used to simulate the dynamic propagation of a sound source (Fig. 1). The output layer contains 1.26 million nodes above the ground and rooftops. The street-level source is at the model center. The sound source is a filtered pulse with the waveform shown on the left side of Fig. 2. The source concentrates energy in a particular frequency range as shown on the right side of Fig. 2. Time series of sound levels at the 1.26 million output locations are used in the identification. Each series is 1,024 samples long, which corresponds to 4.36 s of propagation. These outputs are divided further into 11 strips, each with 114,240 outputs. This set of time-domain input-output data, one strip at a time, is used to generate the system Markov parameters by the inverse FFT method. From these Markov parameters, the modified version of ERA is applied to produce 11 reduced-order single-input 114,240-output state-space models, one model per strip. The HPC simulation was performed on a Cray XT3 supercomputer with 256 CPUs and 512 GB of RAM. The ERA model, on the other hand, operates using a single core and about 2.6 GB of RAM on a laptop computer, accessing the Markov parameters by virtual memory.

Fig. 3 Markov parameters (Pa s) of reduced-order model and HPC model

Fig. 4 Singular value plots for all 11 strips

To verify these reduced-order models, they are used to produce the pulse responses at the 1.26 million output nodes, and these responses are compared to the Markov parameters computed from the original HPC model. Figure 3 shows the time-series agreement from the highly scattered signals of selected locations. Using the ordering of the singular values, plotted in Fig. 4 for the 11 strips, the effect of model order reduction from 1,024 to 768 to 512 to 256 states is illustrated by spatial relative error plots in Fig. 5. As the plots quantify, the error increases as the model order is reduced, revealing the accuracy expected within the duration of the Markov parameters. The higher-order models capture more propagation dynamics, hence better prediction quality is expected.

Fig. 5 Relative error of 256-, 512-, 768-, and 1024-state reduced-order models (top left, top right, bottom left, bottom right, respectively)

Further verification is by an HPC simulation with an independent source, shown in Fig. 6 both in time and frequency. This input signal is a superposition of two Gaussian pulses and three harmonics. The source is longer than the 4.36-s duration of the Markov parameter sequence to test accuracy when ignoring late-arriving scattered energy. When comparing HPC results and the 1,024-state model signals before 4.36 s, the median relative error over the output field is 1.6%. This error is 6% when comparing the full 6.5-s duration of the Fig. 6 signals, revealing the importance of capturing the desired dynamic range in the Markov parameters for models with continuous sources.

Fig. 6 Waveforms and spectra of source signals superimposed to test the reduced-order models

Regarding efficiency of the ROM compared to the HPC simulation, the 1,024-state model with the Fig. 6 source operates on a laptop computer with reduction factors of about 10,000 in computational requirements (number of cores × seconds) and about 2 million in combined computational and memory requirements (number of cores × seconds × bytes). The utility of the reduced-order models is thus clearly demonstrated.

6 Conclusions

In this work we have developed a computationally efficient version of the Eigensystem Realization Algorithm to derive reduced-order models from wave-field data. This version of ERA can handle systems with hundreds of thousands of outputs. A high-fidelity HPC simulation code is used to generate the acoustic responses to a source input in the entire 3D domain, from which a subset of output locations of interest is selected. The inverse FFT technique is used to recover the system Markov parameters for use by the computationally efficient version of ERA developed in this paper. When the dynamic responses to a different source input are needed, the ERA-derived reduced-order models can be used in place of the original HPC simulation code, resulting in many orders of magnitude savings in computational requirements. To test the validity of the reduced-order models, the predicted acoustic wave-field signals from the reduced-order models are compared to the HPC-generated responses over a large and highly resolved domain. With care in the generation of the Markov parameters, highly accurate reduced-order models that describe the dynamics of sound propagation with severe scattering can be produced by the developed technique.

Acknowledgements This research is supported in part by In-House Laboratory Independent Research (ILIR) and a US Department of the Army Small Business Technology Transfer (STTR) subcontract to Dartmouth College by Sound Innovations, Inc. The authors thank Mr. Michael W. Parker, who contributed to the HPC simulations.

References

1. Ketcham, S.A., Parker, M.W., Cudney, H.H., and Wilson, D.K.: Scattering of Urban Sound Energy from High-Performance Computations. DoD High Performance Computing Modernization Program Users Group Conference, IEEE Computer Society, pp. 341–348 (2008).
2. Cudney, H.H., Ketcham, S.A., and Parker, M.W.: Verification of Acoustic Propagation Over Natural and Synthetic Terrain. DoD High Performance Computing Modernization Program Users Group Conference, IEEE Computer Society, pp. 247–252 (2007).
3. Ho, B.L., Kalman, R.E.: Effective Construction of Linear State-Variable Models from Input–Output Functions. Regelungstechnik, 14, 545–548 (1966).
4. Juang, J.-N., Pappa, R.S.: An Eigensystem Realization Algorithm for Modal Parameter Identification and Model Reduction. Journal of Guidance, Control, and Dynamics, 8, 620–627 (1985).
5. Juang, J.-N., Cooper, J.E., Wright, J.R.: An Eigensystem Realization Algorithm Using Data Correlations (ERA/DC) for Modal Parameter Identification. Control Theory and Advanced Technology, 4, No. 1, 5–14 (1988).
6. Juang, J.-N.: Applied System Identification. Prentice-Hall, Upper Saddle River, NJ (2001).
7. Phan, M.Q., Ketcham, S.A., Darling, R.S., Cudney, H.H.: Superstable State-Space Representation for Large-Domain Wave Propagation. Proceedings of the 4th International Conference on High Performance Scientific Computing, Hanoi, Vietnam (2009).


Complementary Condensing for the Direct Multiple Shooting Method

Christian Kirches, Hans Georg Bock, Johannes P. Schlöder, and Sebastian Sager

Abstract In this contribution we address the efficient solution of optimal control problems of dynamic processes with many controls. Such problems typically arise from the convexification of integer control decisions. We treat this problem class using the direct multiple shooting method to discretize the optimal control problem. The resulting nonlinear problems are solved using an SQP method. Concerning the solution of the quadratic subproblems we present a factorization of the QP’s KKT system, based on a combined null-space range-space approach exploiting the problem’s block sparse structure. We demonstrate the merit of this approach for a vehicle control problem in which the integer gear decision is convexified.

C. Kirches, H.G. Bock, J.P. Schlöder, S. Sager: Interdisciplinary Center for Scientific Computing (IWR), University of Heidelberg, Im Neuenheimer Feld 368, 69120 Heidelberg, Germany
In: H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_16, © Springer-Verlag Berlin Heidelberg 2012

1 Introduction

Mixed-integer optimal control problems (MIOCPs) in ordinary differential equations (ODEs) have a high potential for optimization. A typical example is the choice of gears in transport [6, 8, 9, 14, 19]. Direct methods, in particular all-at-once approaches [2, 3], have become the methods of choice for most practical OCPs. The drawback of direct methods with binary control functions is that they lead to high-dimensional vectors of binary variables. Because of the exponentially growing complexity of the problem, techniques from mixed-integer nonlinear programming will work only for small instances [20].

In past contributions [9, 12, 13, 15] we proposed to use an outer convexification with respect to the binary controls, which has several main advantages over standard formulations or convexifications, cf. [12, 13]. In an SQP framework for the solution of the discretized MIOCP, the outer convexification approach results in QPs with many control parameters. Classical methods [3] for exploiting the block sparse structure of the discretized OCP leave room for improvement.

In [16, 17], structured interior point methods for solving QP subproblems arising in SQP methods for the solution of discretized nonlinear OCPs are studied. A family of block structured factorizations for the arising KKT systems is presented. Extensions to tree-sparse convex programs can be found in [18]. In this contribution we present an alternative approach to solving the QPs arising from outer convexification of MIOCPs, showing that a certain factorization from [16] ideally lends itself to the case of many control parameters. We employ this factorization for the first time inside an active-set method. Comparisons of run times and complexity to classical condensing methods are presented.

2 Direct Multiple Shooting for Optimal Control

2.1 Optimal Control Problem Formulation

In this section we describe the direct multiple shooting method [3] as an efficient tool for the discretization and parameterization of a broad class of OCPs. We consider the following general class (1) of optimal control problems

\begin{align}
\min_{x(\cdot),\,u(\cdot)} \quad & l(x(\cdot), u(\cdot)) \tag{1a} \\
\text{s.t.} \quad & \dot{x}(t) = f(t, x(t), u(t)) && \forall t \in T, \tag{1b} \\
& 0 \le c(t, x(t), u(t)) && \forall t \in T, \tag{1c} \\
& 0 \leqq r(t_i, x(t_i)) && 0 \le i \le m, \tag{1d}
\end{align}

in which we strive to minimize the objective function $l(\cdot)$ depending on the trajectory $x(\cdot)$ of a dynamic process described in terms of a system $f$ of ordinary differential equations on the time horizon $T := [t_0, t_f] \subset \mathbb{R}$, and governed by a control trajectory $u(\cdot)$ subject to optimization. The process trajectory $x(\cdot)$ and the control trajectory $u(\cdot)$ shall satisfy certain inequality path constraints $c$ on the time horizon $T$, as well as (in-)equality point constraints $r_i$ on a grid of $m + 1$ grid points on $T$,

$$t_0 < t_1 < \ldots < t_{m-1} < t_m := t_f, \qquad m \in \mathbb{N},\ m \ge 1. \tag{2}$$

The direct multiple shooting method is applied to discretize the control trajectory $u(\cdot)$ to make this infinite-dimensional problem computationally accessible.

2.2 Direct Multiple Shooting Discretization

Control Discretization A discretization of the control trajectory $u(\cdot)$ on the shooting grid (2) is introduced, using control parameters $q_i \in \mathbb{R}^{n_i^q}$ and base functions $b_i : T \times \mathbb{R}^{n_i^q} \to \mathbb{R}^{n^u}$. Examples are piecewise constant or linear functions.

$$u(t) := \sum_{j=1}^{n_i^q} b_{ij}(t, q_{ij}), \qquad t \in [t_i, t_{i+1}] \subseteq T,\quad 0 \le i \le m-1. \tag{3}$$

State Parameterization In addition to the control parameter vectors, we introduce state vectors $s_i \in \mathbb{R}^{n^x}$ in all shooting nodes, serving as initial values for $m$ IVPs

$$\dot{x}_i(t) = f(t, x_i(t), q_i), \qquad x_i(t_i) = s_i, \qquad t \in [t_i, t_{i+1}] \subseteq T,\quad 0 \le i \le m-1. \tag{4}$$

This parameterization of the process trajectory $x(\cdot)$ will in general be discontinuous on $T$. Continuity is ensured by introduction of additional matching conditions

$$x_i(t_{i+1}; t_i, s_i, q_i) - s_{i+1} = 0, \qquad 0 \le i \le m-1, \tag{5}$$

where $x_i(t_{i+1}; t_i, s_i, q_i)$ denotes the evaluation of the $i$-th state trajectory $x_i(\cdot)$ at time $t_{i+1}$ depending on the start time $t_i$, initial value $s_i$, and control parameters $q_i$.

Constraint Discretization The path constraints of problem (1) are enforced on the nodes of the shooting grid (2) only. It can be observed that in general this formulation already leads to a solution that satisfies the path constraints on the whole of $T$.

$$0 \leqq r_i(t_i, s_i, q_i), \quad 0 \le i \le m-1, \qquad 0 \leqq r_m(t_m, s_m). \tag{6}$$

Separable Objective The objective function shall be separable with respect to the shooting grid structure,

$$l(x(\cdot), u(\cdot)) = \sum_{i=0}^{m} l_i(s_i, q_i). \tag{7}$$

In general, $l(\cdot)$ will be a Mayer type function or a Lagrange type integral function. For both types, a separable formulation is easily found. Summarizing, the discretized multiple shooting optimal control problem can be cast as a nonlinear problem

\begin{align}
\min_{w} \quad & \sum_{i=0}^{m} l_i(w_i) \tag{8a} \\
\text{s.t.} \quad & 0 = x_i(t_{i+1}; t_i, w_i) - s_{i+1} && 0 \le i \le m-1, \tag{8b} \\
& 0 \leqq r_i(w_i) && 0 \le i \le m, \tag{8c}
\end{align}

with the vector of unknowns $w := (s_0, q_0, \ldots, s_{m-1}, q_{m-1}, s_m)$ and subvectors $w_i := (s_i, q_i)$ for $0 \le i \le m-1$, and $w_m := (s_m)$. The evaluation of the matching condition constraint (8b) requires the solution of the initial value problem (4).
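To make the matching conditions (5) and (8b) concrete, the sketch below evaluates their residuals for a piecewise constant control on a shooting grid. It is only an illustration under stated assumptions: the fixed-step Runge-Kutta integrator `rk4` stands in for whatever IVP solver is actually used, and the names `rk4` and `matching_residuals` are hypothetical.

```python
import numpy as np

def rk4(f, t0, t1, x0, q, steps=20):
    """Integrate x' = f(t, x, q) from t0 to t1 with classical RK4 (fixed step)."""
    dt = (t1 - t0) / steps
    t, x = t0, np.asarray(x0, dtype=float)
    for _ in range(steps):
        k1 = f(t, x, q)
        k2 = f(t + dt / 2, x + dt / 2 * k1, q)
        k3 = f(t + dt / 2, x + dt / 2 * k2, q)
        k4 = f(t + dt, x + dt * k3, q)
        x = x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    return x

def matching_residuals(f, grid, s, q):
    """Residuals x_i(t_{i+1}; t_i, s_i, q_i) - s_{i+1} for i = 0, ..., m-1, cf. (5)."""
    m = len(grid) - 1
    return [rk4(f, grid[i], grid[i + 1], s[i], q[i]) - s[i + 1] for i in range(m)]
```

Each residual depends only on the node values $(s_i, q_i)$ and $s_{i+1}$, so the $m$ IVPs are decoupled. This decoupling is the origin of the block sparse structure exploited in the following sections.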

2.3 Block Sparse Quadratic Subproblem

For solving the highly structured NLP (8) we employ methods of SQP type, a long-standing and highly effective class of methods for the solution of NLPs that also allows for much flexibility in exploiting the problem's special structure. SQP methods iteratively progress towards a KKT point of the NLP by solving a linearly constrained local quadratic model of the NLP's Lagrangian [11]. For NLP (8) the local quadratic model of the Lagrangian, to be solved in each step of the SQP method, reads

\begin{align}
\min_{\delta w} \quad & \sum_{i=0}^{m} \left( \tfrac{1}{2}\,\delta w_i' B_i\, \delta w_i + g_i'\, \delta w_i \right) \tag{9a} \\
\text{s.t.} \quad & 0 = X_i(w_i)\,\delta w_i - \delta s_{i+1} - h_i(w_i), && 0 \le i \le m-1, \tag{9b} \\
& 0 \leqq R_i(w_i)\,\delta w_i - r_i(w_i), && 0 \le i \le m, \tag{9c}
\end{align}

with the following notations for the vector of unknowns $\delta w$ and its components

$$\delta w_i := (\delta s_i, \delta q_i), \quad 0 \le i \le m-1, \qquad \delta w_m := \delta s_m, \tag{10}$$

reflecting the notation used in (8), and with vectors $h_i$ denoting the residuals

$$h_i(w_i) := x_i(t_{i+1}; t_i, w_i) - s_{i+1}. \tag{11}$$

The matrices $B_i$ denote the node Hessians or suitable approximations, cf. [3], and the vectors $g_i$ denote the node gradients of the objective function, while matrices $X_i$, $R_i^{\mathrm{eq}}$, and $R_i^{\mathrm{in}}$ denote linearizations of the constraint functions obtained in $w_i$,

$$B_i \approx \frac{\mathrm{d}^2 l_i(w_i)}{\mathrm{d}w_i^2}, \qquad g_i := \frac{\mathrm{d} l_i(w_i)}{\mathrm{d}w_i}, \qquad R_i := \frac{\mathrm{d} r_i(w_i)}{\mathrm{d}w_i}, \qquad X_i := \frac{\partial x_i(t_{i+1}; t_i, w_i)}{\partial w_i}. \tag{12}$$

The computation of the sensitivity matrices $X_i$ requires the computation of derivatives of the solution of IVP (4) with respect to the $w_i$. Consistency of the derivatives is ensured by applying the principle of internal numerical differentiation (IND) [1].

3 Block Sparse Quadratic Programming

3.1 Classical Condensing

In the classical condensing algorithm [3, 10], which works as a preprocessing step to obtain a small dense QP from the block sparse one, the matching conditions (9b) are used for block Gaussian elimination of the steps of the additionally introduced state variables $(\delta s_1, \ldots, \delta s_m)$. The resulting QP has $n^x + m n^q$ unknowns instead of $m(n^x + n^q)$ ones, is usually densely populated, and is suited for solution with any standard QP code such as the null-space active-set codes QPOPT [7], qpOASES [4], or BQPD [5]. As we will see in Sect. 4, for MIOCPs with many control parameters (i.e. large dimension $n^q$) resulting from the outer convexification of integer control functions, the achieved reduction of the QP's size is marginal, however.
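A short count illustrates why the reduction becomes marginal. The dimensions below are hypothetical values chosen only for illustration and are not taken from the paper's vehicle example. With $m = 20$ shooting intervals and $n^x = 3$ states, compare a single control parameter per node with $n^q = 16$ control parameters per node:

```latex
n^q = 1:\quad  m\,(n^x + n^q) = 20 \cdot 4 = 80, \qquad n^x + m\,n^q = 3 + 20 = 23, \\
n^q = 16:\quad m\,(n^x + n^q) = 20 \cdot 19 = 380, \qquad n^x + m\,n^q = 3 + 320 = 323.
```

Under these assumed numbers condensing removes about 71% of the unknowns in the first case but only 15% in the second, since the eliminated states contribute only $(m - 1)\,n^x$ unknowns regardless of $n^q$.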

3.2 The KKT System's Block Sparse Structure

In this section we present an alternative approach to solving the KKT system of QP (9), found in [16, 17], where it was employed inside an interior-point method. We derive in detail the necessary elimination steps that will ultimately retain the duals of the matching conditions only. In this sense, the approach is complementary to the classical condensing algorithm. For optimal control problems in which the control dimension $n^q$ exceeds the state dimension $n^x$, the presented approach is computationally more favorable than retaining unknowns of dimension $n^q$. In contrast to [16, 17], we employ this factorization approach inside an active-set method, and intend to further adapt it to this case by exploitation of simple bounds and derivation of matrix updates in a further publication.

For a given active set, the KKT system of the QP (9) to be solved for the primal step $\delta w_i$ and the dual step $(\delta\lambda, \delta\mu)$ reads, for $0 \le i \le m$,

\begin{align}
P_i'\,\delta\lambda_{i-1} + B_i(-\delta w_i) + R_i'\,\delta\mu_i + X_i'\,\delta\lambda_i &= B_i w_i + g_i && =: \bar{g}_i, \tag{13a} \\
R_i(-\delta w_i) &= R_i w_i - r_i && =: \bar{r}_i, \tag{13b} \\
X_i(-\delta w_i) + P_{i+1}(-\delta w_{i+1}) &= X_i w_i + P_{i+1} w_{i+1} - h_i && =: \bar{h}_i, \tag{13c}
\end{align}

with multipliers $\delta\lambda \in \mathbb{R}^{n^x}$ for the matching conditions (9b) and $\delta\mu \in \mathbb{R}^{n_i^r}$ for the active point constraints (9c). The projection matrices $P_i$ are defined as

$$P_i := \begin{bmatrix} -I & 0 \end{bmatrix} \in \mathbb{R}^{n^x \times (n^x + n^q)}, \qquad 1 \le i \le m, \tag{14}$$

and as $P_0 := 0 \in \mathbb{R}^{n^x \times (n^x + n^q)}$, $P_{m+1} := 0 \in \mathbb{R}^{n^x \times n^x}$ for the first and last shooting nodes, respectively. In the following, all matrices and vectors are assumed to comprise the components of the active set only. To avoid the need for repeated special treatment of the first and last shooting node throughout this paper, we introduce the following conventions that make (13) hold also for the border cases $i = 0$ and $i = m$:

\begin{align}
& \delta\lambda_{-1} := 0 \in \mathbb{R}^{n^x}, \quad \lambda_{-1} := 0 \in \mathbb{R}^{n^x}, \quad \delta\lambda_m := 0 \in \mathbb{R}^{n^x}, \quad \lambda_m := 0 \in \mathbb{R}^{n^x}, \tag{15a} \\
& \delta w_{m+1} := 0 \in \mathbb{R}^{n^x}, \quad w_{m+1} := 0 \in \mathbb{R}^{n^x}, \quad h_m := 0 \in \mathbb{R}^{n^x}, \quad X_m := 0 \in \mathbb{R}^{n^x \times n^x}. \tag{15b}
\end{align}

3.3 Hessian Projection Schur Complement Factorization

Hessian Projection Step. Under the assumption that the number of active point constraints does not exceed the number of unknowns, i.e. the active set is not degenerate, we can perform QR factorizations of the point constraint matrices $R_i$,

$$R_i Q_i = \begin{pmatrix} (R_i^R)' & 0 \end{pmatrix}, \qquad Q_i := \begin{pmatrix} Y_i & Z_i \end{pmatrix}. \tag{16}$$

Here the $Q_i$ are unitary matrices and $R_i^R$ is upper triangular. We partition $\delta w_i$ into its range space part $\delta w_i^Y$ and its null space part $\delta w_i^Z$, where the identity $\delta w_i = Y_i \delta w_i^Y + Z_i \delta w_i^Z$ holds. We find $\delta w_i^Y$ from the range space projection of (13b),

$$R_i(-\delta w_i) = -(R_i^R)' \,\delta w_i^Y = \bar r_i. \tag{17}$$

We transform the KKT system onto the null space of $R_i$ by substituting $Y_i \delta w_i^Y + Z_i \delta w_i^Z$ for $\delta w_i$ and solving for $\delta w_i^Z$. For the matching conditions (13c) we find

$$-X_i Z_i \,\delta w_i^Z - P_{i+1} Z_{i+1} \,\delta w_{i+1}^Z = \bar h_i + X_i Y_i \,\delta w_i^Y + P_{i+1} Y_{i+1} \,\delta w_{i+1}^Y, \tag{18}$$

to be solved for $\delta w_i^Z$ once $\delta w_{i+1}^Z$ is known. For stationarity (13a) we find

$$Z_i' P_i' \,\delta\lambda_{i-1} - Z_i' B_i Z_i \,\delta w_i^Z + Z_i' R_i' \,\delta\mu_i + Z_i' X_i' \,\delta\lambda_i = Z_i' \bar g_i + Z_i' B_i Y_i \,\delta w_i^Y \tag{19}$$

and

$$Y_i' R_i' \,\delta\mu_i = Y_i' \bigl( B_i \,\delta w_i - P_i' \,\delta\lambda_{i-1} - X_i' \,\delta\lambda_i + \bar g_i \bigr). \tag{20}$$

Therein, $Z_i' R_i' = 0$ and $Y_i' R_i' = R_i^R$. Thus (19) can be solved for $\delta\lambda_i$ once $\delta w_i$ and $\delta\lambda_{i-1}$ are known, while (20) can be used to determine the point constraint multipliers $\delta\mu_i$. Let the null space projections be defined as follows:

$$\tilde B_i := Z_i' B_i Z_i, \qquad \tilde g_i := Z_i' \bar g_i + Z_i' B_i Y_i \,\delta w_i^Y, \qquad 0 \le i \le m, \tag{21a}$$

$$\tilde X_i := X_i Z_i, \qquad \tilde P_i := P_i Z_i, \qquad 0 \le i \le m-1, \tag{21b}$$

$$\tilde h_i := \bar h_i + X_i Y_i \,\delta w_i^Y + P_{i+1} Y_{i+1} \,\delta w_{i+1}^Y, \qquad 0 \le i \le m-1. \tag{21c}$$


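With this notation, the projection of the KKT system onto the null space of the point constraints can be read from (18) and (19) for $0 \le i \le m-1$ as

$$\tilde P_i' \,\delta\lambda_{i-1} + \tilde B_i(-\delta w_i^Z) + \tilde X_i' \,\delta\lambda_i = \tilde g_i, \tag{22a}$$

$$\tilde X_i(-\delta w_i^Z) + \tilde P_{i+1}(-\delta w_{i+1}^Z) = \tilde h_i. \tag{22b}$$

Schur Complement Step. In (22a) the elimination of $\delta w^Z$ is possible using a Schur complement step, provided that the reduced Hessians $\tilde B_i$ are positive definite. We find

$$(-\delta w_i^Z) = \tilde B_i^{-1} \bigl( \tilde g_i - \tilde P_i' \,\delta\lambda_{i-1} - \tilde X_i' \,\delta\lambda_i \bigr), \tag{23}$$

depending on the knowledge of $\delta\lambda$. Inserting into (22b) and collecting terms in $\delta\lambda_i$ yields

$$\tilde X_i \tilde B_i^{-1} \tilde P_i' \,\delta\lambda_{i-1} + \bigl( \tilde X_i \tilde B_i^{-1} \tilde X_i' + \tilde P_{i+1} \tilde B_{i+1}^{-1} \tilde P_{i+1}' \bigr) \delta\lambda_i + \tilde P_{i+1} \tilde B_{i+1}^{-1} \tilde X_{i+1}' \,\delta\lambda_{i+1} = -\tilde h_i + \tilde X_i \tilde B_i^{-1} \tilde g_i + \tilde P_{i+1} \tilde B_{i+1}^{-1} \tilde g_{i+1}. \tag{24}$$

With Cholesky factorizations $\tilde B_i = (R_i^B)' R_i^B$ we define the following symbols (here $B_i$ is reused for the off-diagonal blocks and is distinct from the Hessian blocks in (13a)):

$$\hat X_i := \tilde X_i (R_i^B)^{-1}, \qquad \hat P_i := \tilde P_i (R_i^B)^{-1}, \qquad \hat g_i := (R_i^B)^{-T} \tilde g_i,$$

$$A_i := \tilde X_i \tilde B_i^{-1} \tilde X_i' + \tilde P_{i+1} \tilde B_{i+1}^{-1} \tilde P_{i+1}' = \hat X_i \hat X_i' + \hat P_{i+1} \hat P_{i+1}', \qquad B_i := \tilde X_i \tilde B_i^{-1} \tilde P_i' = \hat X_i \hat P_i', \tag{25}$$

$$a_i := -\tilde h_i + \tilde X_i \tilde B_i^{-1} \tilde g_i + \tilde P_{i+1} \tilde B_{i+1}^{-1} \tilde g_{i+1} = -\tilde h_i + \hat X_i \hat g_i + \hat P_{i+1} \hat g_{i+1}.$$

Equation (24) may then be written in terms of these values for $0 \le i \le m-1$ as

$$B_i \,\delta\lambda_{i-1} + A_i \,\delta\lambda_i + B_{i+1}' \,\delta\lambda_{i+1} = a_i. \tag{26}$$

Solving the Block Tridiagonal System. In the symmetric positive definite banded system (26), only the matching condition duals $\delta\lambda_i \in \mathbb{R}^{n^x}$ remain as unknowns. In classical condensing, exactly these matching conditions were used for elimination of a part of the primal unknowns. System (26) can be solved for $\delta\lambda$ by means of a block tridiagonal Cholesky factorization and two backsolves.

Recovering the Block Sparse QP's Solution. Once $\delta\lambda$ is known, the step $\delta w^Z$ can be recovered using (23). The full primal step $\delta w$ is then obtained from $\delta w = Y \delta w^Y + Z \delta w^Z$. The constraint multipliers step $\delta\mu$ is recovered using (20).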
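The factorization and the two backsolves for a symmetric positive definite tridiagonal system can be sketched as follows. This is only an illustration under a simplifying assumption: every block is a scalar (the case $n^x = 1$), so the block Cholesky factors reduce to square roots and divisions. The function name and argument layout are hypothetical and not taken from the authors' implementation.

```python
import math


def solve_spd_tridiagonal(diag, sub, rhs):
    """Solve T x = rhs for a symmetric positive definite tridiagonal T.

    diag[i] is T[i][i]; sub[i] is T[i][i-1] = T[i-1][i] (sub[0] is ignored).
    Scalar-block analogue of the block tridiagonal system of matching duals.
    """
    n = len(diag)
    d = [0.0] * n  # diagonal of the Cholesky factor L
    e = [0.0] * n  # sub-diagonal of L, e[i] = L[i][i-1]

    # Factorization T = L L' (math.sqrt raises ValueError if T is not SPD).
    d[0] = math.sqrt(diag[0])
    for i in range(1, n):
        e[i] = sub[i] / d[i - 1]
        d[i] = math.sqrt(diag[i] - e[i] * e[i])

    # Forward solve L y = rhs.
    y = [0.0] * n
    y[0] = rhs[0] / d[0]
    for i in range(1, n):
        y[i] = (rhs[i] - e[i] * y[i - 1]) / d[i]

    # Backward solve L' x = y.
    x = [0.0] * n
    x[n - 1] = y[n - 1] / d[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = (y[i] - e[i + 1] * x[i + 1]) / d[i]
    return x
```

Each loop touches every node once, so the work grows linearly with the number of nodes; in the block case the scalar operations become dense Cholesky factorizations and triangular solves of dimension $n^x$.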


3.4 Computational Complexity

The left part of Table 1 lists the linear algebra operations required to carry out the individual steps of the complementary condensing method. The right part gives the number of floating point operations (FLOPs) required per shooting node, depending on the system's dimensions $n = n^x + n^q$ and $n_i^r$. The numbers $n^y$ and $n^z$ with $n^y + n^z = n$ denote the range-space and null-space dimensions in (16), respectively. Since the shooting grid length $m$ does not appear explicitly in Table 1, the proposed method's run time complexity is $O(m)$, in sharp contrast to the classical condensing method's $O(m^2)$.

Table 1 Left: Number of factorizations (dc), backsolves (bs), multiplications (*), and additions (+) required per shooting node. Right: Number of FLOPs required per shooting node

Action                                     Matrix: dc  bs  *   +     Vector: bs  *   +
Decompose $R_i$                                    1   –   –   –
Solve for $\delta w^Y$, $Y \delta w^Y$                                       1   1   –
Build $\tilde B_i$                                 –   –   2   –
Build $\tilde X_i$, $\tilde P_i$                   –   –   2   –
Build $\tilde g_i$, $\tilde h_i$                                             –   4   3
Decompose $\tilde B_i$                             1   –   –   –
Build $\hat X_i$, $\hat P_i$                       –   1   1   –
Build $A_i$, $B_i$                                 –   –   3   1
Build $\hat g_i$, $a_i$                                                      –   3   2
Decompose (26)                                     1   –   –   –
Solve for $\delta\lambda_i$                                                  2   –   –
Solve for $\delta w_i^Z$, $Z \delta w_i^Z$                                   2   3   2
Solve for $\delta\mu_i$                                                      1   4   3

Action                                     Floating point operations
Decompose $R_i$                            $(n_i^r)^2 (n - \tfrac{1}{3} n_i^r)$
Solve for $Y \delta w^Y$                   $n_i^r n^y + n^y n$
Build $\tilde B_i$                         $(n^z)^2 n + n^z n^2$
Build $\tilde X_i$, $\tilde P_i$           $2 n^x n^z n$
Build $\tilde g_i$, $\tilde h_i$           $2 n^x n + n^z n + n^2 + 2 n^x + n$
Decompose $\tilde B_i$                     $\tfrac{1}{3} (n^z)^3$
Build $\hat X_i$, $\hat P_i$               $2 n^x (n^z)^2$
Build $A_i$, $B_i$                         $3 (n^x)^2 n^z + (n^x)^2$
Build $\hat g_i$, $a_i$                    $(n^z)^2 + 2 n^x n^z + 2 n^x$
Decompose (26)                             $\tfrac{4}{3} (n^x)^3$
Solve for $\delta\lambda_i$                $2 (n^x)^2$
Solve for $Z \delta w_i^Z$                 $(n^z)^2 + 2 n^x n^z + n^z n + 2 n^z$
Solve for $\delta\mu_i$                    $n_i^r n^y + 2 n^x n + n^y n + n^2 + 3n$


4 Example: A Vehicle Mixed-Integer Optimal Control Problem

In this section we formulate a vehicle control problem as a test bed for the presented approach to solving the block sparse QPs.

Exemplary Vehicle Mixed-Integer Optimal Control Problem. We consider a simple dynamic model of a car driving with velocity $v$ on a straight lane with varying slope $\gamma$. The optimizer exerts control over the engine and brake torque rates of change $R_{\mathrm{eng}}$ and $R_{\mathrm{brk}}$, and the gear choice $y$. The state dimension is $n^x = 3$, and we consider different numbers of available gears to scale the problem's control dimension $n^q \ge 3$.

$$\dot v(t) = \frac{1}{m} \left( \frac{i_A}{r} \bigl( i_T(y)\, \eta_T(y)\, M_{\mathrm{eng}} - M_{\mathrm{brk}} - i_T(y)\, M_{\mathrm{fric}} \bigr) - M_{\mathrm{air}} - M_{\mathrm{road}} \right), \tag{27a}$$

$$\dot M_{\mathrm{eng}}(t) = R_{\mathrm{eng}}(t), \qquad \dot M_{\mathrm{brk}}(t) = R_{\mathrm{brk}}(t). \tag{27b}$$

Herein $m$ is the vehicle's mass, and $i_A$ and $i_T(y)$ are the rear axle and gearbox transmission ratios. The amount of engine friction is denoted by $M_{\mathrm{fric}}$, a nonlinear function of the engine speed. By $M_{\mathrm{air}} := \tfrac{1}{2} c_w A \rho_{\mathrm{air}} v^2(t)$ we denote air resistance, $c_w$ being the aerodynamic shape coefficient, $A$ the effective flow surface, and $\rho_{\mathrm{air}}$ the air density. Finally, $M_{\mathrm{road}} = m g (\sin\gamma(t) + f_r \cos\gamma(t))$ accounts for downhill force and tyre friction, $g$ being the gravity constant and $f_r$ the coefficient of rolling friction. On a predefined track with varying slope, we minimize a weighted sum of travel time and fuel consumption, subject to velocity and engine speed constraints that make the gear choice nontrivial.

Run Time Complexity. Table 2 shows that the classical condensing algorithm is suitable for problems with limited grid lengths $m$ and with considerably fewer controls than states, i.e. $n^q \ll n^x$. This is exactly contrary to the situation encountered when applying outer convexification to MIOCPs. Nonetheless, using this approach we could solve several challenging mixed-integer optimal control problems to optimality with little computational effort, as reported in [9, 12, 14].

Table 2 Run time complexity of classical condensing and a dense active-set QP solver

Action                                   Run time complexity
Computing the Hessian $B$                $O(m^2 n^3)$
Computing the constraints $X$, $R$       $O(m^2 n^3)$
Dense QP solver, startup                 $O((m n^q + n^x)^3)$
Dense QP solver, per iteration           $O((m n^q + n^x)^2)$
Recovering $\delta v$                    $O(m (n^x)^2)$


Sparsity. Table 3 gives the dimensions and the amount of sparsity present in the Hessian and constraint matrices of the exemplary problem for 6 and 16 available gears. A grid length of $m = 20$ was used. As can be seen in the left part, the QP (9) is only sparsely populated for this example problem, with the number of nonzero elements (nnz) never exceeding 3%. After classical condensing, sparsity has been lost as expected. Had the overall dimension of the QP been reduced considerably, as is the case for optimal control problems with $n^x \gg n^q$, that would be of no concern. For our MIOCP with outer convexification, however, the results shown in Tables 2 and 3 indicate that a considerable run time increase is to be expected for larger $m$ or $n^q$.

Implementation. The classical condensing algorithm as well as the QP solver QPOPT [7] are implemented in ANSI C and compiled using gcc 4.3.3 with optimization level -O3. The linear algebra package ATLAS was used for BLAS operations. The proposed complementary condensing algorithm was preliminarily implemented in MATLAB (Release 2008b). All run times have been obtained on a Pentium 4 machine at 3 GHz under SuSE Linux 10.3.

Run Times. The resulting run times shown in Table 4 support our conclusions drawn from Table 3. For $m = 30$ as well as for $m = 20$ and $n^q \ge 14$, the MATLAB code of our proposed method beats an optimized C implementation of classical condensing plus QPOPT. In addition, we could solve four instances with $m = 30$ or $n^q = 18$ that could not be solved before due to active set cycling of the QPOPT solver.

Table 3 Dimensions and number of nonzero elements (nnz) of the block structured QP (9) and the condensed QP for the exemplary vehicle control problem. Here $m = 20$, $n^x = 3$.

                        Block sparse           Condensed              Dense QP solver
$n^q$   Matrix          Size        nnz        Size        nnz        nnz seen
2+6     Hessian         223 × 223   1,419      163 × 163   13,366     13,366 (27%)
2+6     Constraints     438 × 223   1,535      378 × 163   13,591     61,614 (63%)
2+16    Hessian         423 × 423   6,623      363 × 363   66,066     66,066 (37%)
2+16    Constraints     858 × 423   3,731      798 × 363   131,769    289,674 (80%)

Table 4 Average run time per iteration of the QP solver QPOPT on the condensed QPs (left, condensing run times excluded), and of a preliminary MATLAB code running the proposed method on the block sparse QPs (right). "–" indicates cycling of the active set

         QPOPT on condensed QP              Proposed method on block sparse QP
$n^q$    m = 10    m = 20    m = 30         m = 10    m = 20    m = 30
2+6      6 ms      31 ms     103 ms         40 ms     65 ms     100 ms
2+8      11 ms     58 ms     467 ms         42 ms     75 ms     110 ms
2+12     18 ms     226 ms    –              50 ms     95 ms     140 ms
2+16     –         –         –              60 ms     115 ms    170 ms


5 Summary and Future Work

Summarizing the results presented in Tables 3 and 4, we have seen that for OCPs with larger dimension $n^q$, the classical $O(m^2 n^3)$ condensing algorithm is unable to reduce the QP's size significantly. Worse yet, the condensed QP is densely populated. As a consequence, the dense QP solver's performance, tested here using QPOPT, is worse than what can be achieved by a suitable exploitation of the sparse block structure for the case $n^q \ge n^x$. We presented an alternative $O(m n^3)$ factorization of the block sparse KKT system due to [16, 17], named complementary condensing in the context of MIOCPs. By theoretical analysis as well as by a preliminary implementation we provided evidence that the proposed approach is able to challenge the run times of the classical condensing algorithm.

The complementary condensing approach for solving the QP's KKT system is embedded in an active set loop. In our preliminary implementation, a new factorization of the KKT system is computed in $O(m n^3)$ time in every iteration of the active set loop. Nonetheless, the achieved computation times are attractive for larger values of $m$ or $n^q$. To improve the efficiency of this active set method further, several issues have to be addressed. Exploiting simple bounds on the unknowns will reduce the size of the matrices $B_i$, $R_i$, and $X_i$ involved. For dense null-space and range-space methods it is common knowledge that certain factorizations can be updated after an active set change in $O(n^2)$ time. Such techniques would essentially relieve the active set loop of all matrix-only operations, yielding $O(m n^2)$ active set iterations with only an initial factorization in $O(m n^3)$ time necessary. A forthcoming publication will investigate this topic.

References

1. J. Albersmeyer and H. Bock, Sensitivity Generation in an Adaptive BDF-Method, in Modeling, Simulation and Optimization of Complex Processes: Proc. 3rd Int. Conf. on High Performance Scientific Computing, Hanoi, Vietnam, 2008, pp. 15–24.
2. L. Biegler, Solution of dynamic optimization problems by successive quadratic programming and orthogonal collocation, Comp. Chem. Eng., 8 (1984), pp. 243–248.
3. H. Bock and K. Plitt, A Multiple Shooting algorithm for direct solution of optimal control problems, in Proc. 9th IFAC World Congress Budapest, 1984, pp. 243–247.
4. H. Ferreau, H. Bock, and M. Diehl, An online active set strategy for fast parametric quadratic programming in MPC applications, in Proc. IFAC Workshop on Nonlinear Model Predictive Control for Fast Systems, Grenoble, 2006.
5. R. Fletcher, Resolving degeneracy in quadratic programming, Numerical Analysis Report NA/135, University of Dundee, Dundee, Scotland, 1991.
6. M. Gerdts, A variable time transformation method for mixed-integer optimal control problems, Optimal Control Applications and Methods, 27 (2006), pp. 169–182.
7. P. Gill, W. Murray, and M. Saunders, User's Guide For QPOPT 1.0: A Fortran Package For Quadratic Programming, 1995.
8. E. Hellström, M. Ivarsson, J. Åslund, and L. Nielsen, Look-ahead control for heavy trucks to minimize trip time and fuel consumption, Control Eng. Pract., 17 (2009), pp. 245–254.
9. C. Kirches, S. Sager, H. Bock, and J. Schlöder, Time-optimal control of automobile test drives with gear shifts, Opt. Contr. Appl. Meth. (2010). DOI 10.1002/oca.892.
10. D. Leineweber, I. Bauer, H. Bock, and J. Schlöder, An efficient multiple shooting based reduced SQP strategy for large-scale dynamic process optimization. Part I: Theoretical aspects, Computers and Chemical Engineering, 27 (2003), pp. 157–166.
11. J. Nocedal and S. Wright, Numerical Optimization, Springer, 2nd ed., 2006.
12. S. Sager, Numerical methods for mixed-integer optimal control problems, Der andere Verlag, Tönning, Lübeck, Marburg, 2005.
13. S. Sager, Reformulations and algorithms for the optimization of switching decisions in nonlinear optimal control, Journal of Process Control, 19 (2009), pp. 1238–1247.
14. S. Sager, C. Kirches, and H. Bock, Fast solution of periodic optimal control problems in automobile test-driving with gear shifts, in Proc. 47th IEEE CDC, Cancun, Mexico, 2008, pp. 1563–1568.
15. S. Sager, G. Reinelt, and H. Bock, Direct methods with maximal lower bound for mixed-integer optimal control problems, Math. Prog., 118 (2009), pp. 109–149.
16. M. Steinbach, Fast recursive SQP methods for large-scale optimal control problems, PhD thesis, Universität Heidelberg, 1995.
17. M. Steinbach, Structured interior point SQP methods in optimal control, Zeitschrift für Angewandte Mathematik und Mechanik, 76 (1996), pp. 59–62.
18. M. Steinbach, Tree-sparse convex programs, Math. Methods Oper. Res., 56 (2002), pp. 347–376.
19. S. Terwen, M. Back, and V. Krebs, Predictive powertrain control for heavy duty trucks, in Proc. IFAC Symposium in Advances in Automotive Control, Salerno, Italy, 2004, pp. 451–457.
20. J. Till, S. Engell, S. Panek, and O. Stursberg, Applied hybrid system optimization: An empirical investigation of complexity, Control Eng. Pract., 12 (2004), pp. 1291–1303.

Some Inverse Problem for the Polarized-Radiation Transfer Equation A.E. Kovtanyuk and I.V. Prokhorov

Abstract An inverse problem for the steady vector transfer equation for polarized radiation is studied. For this problem, an attenuation factor is found from a given solution of the equation at a medium boundary. An approach is propounded to solve the inverse problem by using special external radiative sources. A formula is proposed which relates the Radon transform of an attenuation factor to a solution of the equation at the medium boundary. Numerical experiments show that the proposed reconstruction algorithm for the polarized-radiation transfer equation has an advantage over the similar method for the scalar case.

A.E. Kovtanyuk, Far Eastern National University, Vladivostok, Russia
I.V. Prokhorov, Institute of Applied Mathematics FEBRAS, Vladivostok, Russia
H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_17, © Springer-Verlag Berlin Heidelberg 2012

1 Introduction

The linear integro-differential Boltzmann equation, also called the radiation transfer equation, is a basic model for describing the photon transfer process. Two kinds of interaction of photons with substance, namely absorption and scattering, are considered within the framework of this model. For a more accurate description of the radiation transfer process, light beam polarization should be taken into account. Theoretical aspects of solving the vector transfer equation are presented in [1–3], where general functional properties of the direct problem are explored and conditions are specified under which a Neumann series converges in various spaces. A good review of the vector transfer equation can be found in [4]. Among the few works on inverse problems for the vector transfer equation it is worth mentioning [5–7], in which scattering properties of a medium are determined. In [5], in particular, the problem of finding the single scattering albedo in a semi-infinite layer with a Rayleigh scattering matrix is solved. A model of polarized radiation passing through a plane homogeneous layer is considered in [6], where an inverse problem of finding scattering matrix coefficients from the incoming and outgoing radiation at a layer boundary is also formulated and solved.

In this paper, we deal with the problem of determining the attenuation factor in the transfer equation. We advance a method of finding the factor that is based on employing a special-type external radiation source with discontinuities of the first kind in an angular variable. This method was used in [8–10] for solving an inverse problem in the scalar case. A method for determining the attenuation factor in the vector equation was proposed and substantiated in our recent paper [11]. The present account relies essentially on the results in [11], and so we prove only those necessary statements that are not contained therein. The emphasis is on numerical verification of the method in order to underline peculiarities and demonstrate advantages of the proposed algorithm over a similar method for the scalar transfer equation. In the computational experiments on finding the attenuation factor, we use some known weight modifications of the Monte-Carlo method, namely the conjugate trajectories method and the maximum cross-section method [1].

2 Formulation and Solution of the Inverse Problem

The main characteristic of polarized radiation is $f = (f_1, f_2, f_3, f_4)$, a four-dimensional vector of Stokes parameters. The corresponding transfer equation for this vector in an isotropic medium has the form

$$\omega \cdot \nabla_r f(r, \omega) + \mu(r) f(r, \omega) = \mu_s(r) \int_\Omega P(r, \omega, \omega') f(r, \omega')\, d\omega' + J(r, \omega), \tag{1}$$

where $r = (r_1, r_2, r_3) \in G$, $G$ is a convex bounded domain in the three-dimensional Euclidean space $E^3$, and $\omega \in \Omega = \{\omega \in E^3 : |\omega| = 1\}$. In (1), the function $J(r, \omega)$ is a four-dimensional vector of internal radiation sources, $\mu(r)$ is the total attenuation factor, $\mu_s(r)$ is a scattering coefficient, and $P(r, \omega, \omega')$ is a $4 \times 4$ scattering matrix. By writing $\omega \cdot \nabla_r f(r, \omega)$ we mean a four-component vector function whose $i$-th component is the derivative of the function $f_i(r, \omega)$ in the direction $\omega$ with respect to the space variable $r$.

To characterize the inhomogeneity of a medium $G$ in which the radiation transfer process is examined, we introduce a partition $G_0$ of the domain $G$. Assume that the set $G_0$ is open and dense in $G$, that is, $\overline{G_0} = \overline{G}$. Moreover, we let $G_0$ be the union of a finite number of domains and write

$$G_0 = \bigcup_{i=1}^{p} G_i, \qquad G_i \cap G_j = \emptyset, \quad i \ne j.$$

The domains $G_i$ can be interpreted as parts of the inhomogeneous medium $G$ filled with substance $i$. Suppose that the set $G_0$ is generalized convex [12]; that is, any ray $L_{r,\omega} = \{r + t\omega,\ t \ge 0\}$ outgoing from a point $r \in G_0$ in a direction $\omega \in \Omega$ intersects $\partial G_0$ at finitely many points. Denote by $C_b(X)$, $X \subset E^m$, the Banach space of functions which are defined, bounded and continuous on $X$, with the norm

$$\|f\| = \sup_{x \in X} |f(x)|.$$

Similarly, we define the space $C_b^{(4)}(X)$ formed by vector functions $f = (f_1, f_2, f_3, f_4)$ every component of which belongs to $C_b(X)$, with the norm

$$\|f\|_4 = \max_{1 \le i \le 4} \|f_i\|.$$

Regarding the coefficients in (1), we assume that the functions $\mu(r)$, $\mu_s(r)$ are nonnegative and belong to the space $C_b(G_0)$, with $\mu(r) \ge \mu_s(r)$, and that the vector function $J(r, \omega) \in C_b^{(4)}(G_0 \times \Omega)$. All components of the scattering matrix $P(r, \omega, \omega')$ belong to $C_b(G_0 \times \Omega \times \Omega)$. Let $d(r, \omega)$ be the distance from a point $r \in G$ to the boundary $\partial G = \overline{G} \setminus G$ in the direction $\omega$. In view of [10], $d(r, \omega) \in C_b(G \times \Omega)$. Put

$$\Gamma_\omega^{\pm} = \{ z \in \partial G : L_{z, \mp\omega} \cap G_0 \ne \emptyset \}, \qquad \Gamma^{\pm} = \{ (z, \omega) \in \partial G \times \Omega : z \in \Gamma_\omega^{\pm} \}, \qquad \Gamma = \Gamma^{+} \cup \Gamma^{-}.$$

The set $\Gamma^{-}$ ($\Gamma^{+}$) is the domain of incoming (outgoing) radiation. To (1), we add the following boundary condition:

$$f(\xi, \omega) = h(\xi, \omega), \qquad (\xi, \omega) \in \Gamma^{-}. \tag{2}$$

The vector function $h(\xi, \omega)$ is defined on $\Gamma^{-}$ and describes a radiation flux entering the medium $G$. By the definitions of $\Gamma^{-}$ and $d(r, \omega)$, the boundary condition (2) can be written in the form

$$f(r - d(r, -\omega)\omega, \omega) = h(r - d(r, -\omega)\omega, \omega), \qquad (r, \omega) \in G_0 \times \Omega. \tag{2'}$$

As for $h$, we assume that it is nonnegative and that $\tilde h(r, \omega) = h(r - d(r, -\omega)\omega, \omega)$ belongs to $C_b^{(4)}(G_0 \times \Omega_0)$, where $\Omega_0$ is an open and dense subset of $\Omega$. Along with the boundary condition (2'), we specify the following:

$$f(r + d(r, \omega)\omega, \omega) = H(r + d(r, \omega)\omega, \omega), \qquad (r, \omega) \in G_0 \times \Omega. \tag{3}$$

The function $H(\xi, \omega)$ is defined on $\Gamma^{+}$ and specifies a radiation flux leaving the medium. We formulate an inverse problem which in essence can be thought of as a tomography problem.


2.1 Tomography Problem

Determine the function $\mu(r)$ from (1) and the boundary conditions (2') and (3) if only the functions $h$ and $H$ are known. To solve the tomography problem, we need some properties of the solution of a direct problem.

2.1.1 Direct Problem

The direct problem (1), (2) is the problem of determining a function $f$ from (1), (2) with known $\mu$, $P$, $J$, and $h$. Put

$$(lf)(r, \omega) = \omega \cdot \nabla_r f(r, \omega) + \mu(r) f(r, \omega), \tag{4}$$

$$N(r, \omega) = \mu_s(r) \int_\Omega P(r, \omega, \omega') f(r, \omega')\, d\omega'. \tag{5}$$

Let $\Omega_0$ be an open subset of the unit sphere $\Omega$ that is dense in $\Omega$. We define a class $D$ in which the solution to the direct problem is sought.

Definition 1. A vector function $f(r, \omega)$ belongs to $D(G_0 \times \Omega_0)$ if, for any point $(r, \omega) \in G_0 \times \Omega_0$, the function $f(r + t\omega, \omega)$ is absolutely continuous with respect to the variable $t \in [-d(r, -\omega), d(r, \omega)]$ and the functions $f(r, \omega)$ and $\omega \cdot \nabla_r f(r, \omega)$ belong to the space $C_b^{(4)}(G_0 \times \Omega_0)$.

Note that since $\mu(r) \in C_b(G_0)$, the operator $l$ defined by (4) maps $D(G_0 \times \Omega_0)$ to $C_b^{(4)}(G_0 \times \Omega_0)$.

Definition 2. A solution of the direct problem (1), (2) is a function $f(r, \omega) \in D(G_0 \times \Omega_0)$ satisfying the relations

$$(lf)(r, \omega) = N(r, \omega) + J(r, \omega), \qquad f(r - d(r, -\omega)\omega, \omega) = h(r - d(r, -\omega)\omega, \omega)$$

for all $(r, \omega) \in G_0 \times \Omega_0$.

In what follows, we use some conditions evoked by physical constraints on the components of the function $f(r, \omega)$. The functions $f_i(r, \omega)$ are Stokes parameters, so they should satisfy the following (see [1–3]):

$$f_1 \ge 0, \qquad f_1^2 \ge f_2^2 + f_3^2 + f_4^2. \tag{6}$$

Denote by $K$ the cone in the space $C_b^{(4)}(G_0 \times \Omega_0)$ formed by functions $f = (f_1, f_2, f_3, f_4) \in C_b^{(4)}(G_0 \times \Omega_0)$ satisfying (6). For the functions in $K$ there are conditions, physical in character, which ultimately guarantee the existence and uniqueness of a solution of the direct problem (1), (2). These conditions are as follows. For all $f \in K$, the matrix $P(r, \omega, \omega')$ must meet the constraints (see [2])

$$Pf \in K, \tag{7}$$

$$\mu_s(r) \int_\Omega \bigl( P(r, \omega, \omega') f(r, \omega') \bigr)_1 \, d\omega' \le \frac{\mu(r)}{4\pi} \int_\Omega f_1(r, \omega')\, d\omega'. \tag{8}$$

Constraint (7) signifies that the matrix operator $P$ maps the cone $K$ into itself, that is,

$$(Pf)_1 \ge 0, \qquad (Pf)_1^2 \ge (Pf)_2^2 + (Pf)_3^2 + (Pf)_4^2,$$

and condition (8) expresses energy conservation for a scattering event in a nonmultiplying medium [2]. Let $\tilde\mu(r)$ be a function in $C_b(G_0)$ satisfying the inequality $\tilde\mu(r) \ge \mu(r)$, $r \in G$. We cite some facts from [11] which are needed for our further reasoning. We define integral operators $\tilde A : K \to K$, $S : K \to K \cap C^{(4)}(G_0 \times \Omega)$ and $\tilde S : K \to K$ as follows:

$$(\tilde A \varphi)(r, \omega) = \int_0^{d(r, -\omega)} \exp(-\tilde\tau(r, \omega, t))\, \varphi(r - t\omega, \omega)\, dt,$$

$$(S \varphi)(r, \omega) = \mu_s(r) \int_\Omega P(r, \omega, \omega') \varphi(r, \omega')\, d\omega',$$

$$(\tilde S \varphi)(r, \omega) = \mu_s(r) \int_\Omega P(r, \omega, \omega') \varphi(r, \omega')\, d\omega' + (\tilde\mu(r) - \mu(r))\, \varphi(r, \omega),$$

where

$$\tilde\tau(r, \omega) = \int_0^{d(r, -\omega)} \tilde\mu(r - t\omega)\, dt, \qquad \tilde\tau(r, \omega, t) = \int_0^{t} \tilde\mu(r - t'\omega)\, dt'.$$

Put

$$\tilde f_0(r, \omega) = \tilde h(r, \omega) \exp(-\tilde\tau(r, \omega)) + (\tilde A J)(r, \omega).$$

Let us formulate a statement on the well-posedness of the direct problem (1), (2).

Theorem 1. Assume that $\tilde h(r, \omega) \in K$, $J(r, \omega) \in C_b^{(4)}(G_0 \times \Omega) \cap K$, and conditions (7) and (8) hold. Then, in the cone $K$, a unique solution to problem (1), (2) exists and is expressed in terms of a Neumann series,

$$f(r, \omega) = \tilde f_0(r, \omega) + \sum_{n=1}^{\infty} (\tilde A \tilde S)^n \tilde f_0(r, \omega), \tag{9}$$

which converges in the norm of $C_b^{(4)}(G_0 \times \Omega_0)$.

If $\tilde\mu(r) = \mu(r)$, then the proof that the direct problem is well posed coincides with a similar argument in [11]. The function $\tilde\mu(r)$ is introduced to justify a computational algorithm which is used to solve the direct problem in the next section. Now we specify the set $\Omega_0$. Hereinafter, let

$$\Omega_0 = \Omega_{-} \cup \Omega_{+}, \qquad \Omega_{\pm} = \{ \omega \in \Omega : \operatorname{sgn}(\omega_3) = \pm 1 \}.$$

In order to solve the above tomography problem, along with the conditions stated in Theorem 1, we impose extra requirements on the function $h$.

1. Let

$$\tilde h(r, \omega) \in C_b^{(4)}(G_0 \times \Omega_0). \tag{10}$$

2. For at least one $i \in \{1, 2, 3, 4\}$ and for all $\omega = (\omega_1, \omega_2, 0) \in \Omega$, the following relation holds:

$$[\tilde h_i(r, \omega)] = \tilde h_i^{+}(r, \omega) - \tilde h_i^{-}(r, \omega) \ne 0, \qquad r \in G_0, \tag{11}$$

where

$$\tilde h_i^{\pm}(r, \omega) = \lim_{\varepsilon \to +0} \tilde h_i \left( r, \frac{(\omega_1, \omega_2, \pm\varepsilon)}{\sqrt{1 + \varepsilon^2}} \right).$$

Thus, we assume that one or more components of the function $h(r, \omega)$ have a discontinuity of the first kind in horizontal directions ($\omega_3 = 0$). Below is a result from [11] which yields a solution to our tomography problem.

Theorem 2. Assume that, under the conditions of Theorem 1, $h$ satisfies (10), (11) and relation (3) holds. Then, for all $r \in G_0$, $\omega = (\omega_1, \omega_2, 0) \in \Omega$, the following equality holds true:

$$\int_{-d(r, -\omega)}^{d(r, \omega)} \mu(r + \omega t)\, dt = \ln \frac{[h_i(r - d(r, -\omega)\omega, \omega)]}{[H_i(r + d(r, \omega)\omega, \omega)]}. \tag{12}$$

Thus, the tomography problem is reduced to inverting the two-dimensional Radon transform of the function $\mu$, that is,

$$(R\mu)(r, \omega) \equiv \int_{-d(r, -\omega)}^{d(r, \omega)} \mu(r + \omega t)\, dt = \Phi_i(r, \omega), \tag{13}$$

where

$$\Phi_i(r, \omega) = \ln \frac{[h_i(r - d(r, -\omega)\omega, \omega)]}{[H_i(r + d(r, \omega)\omega, \omega)]}, \tag{14}$$

in any horizontal plane $\{ r = (r_1, r_2, r_3) \in E^3 : r_3 = \text{const} \}$ which has a point in common with the set $G_0$. This problem has a unique solution in a wide class of functions [13, 14]. From (12) to (14), it follows immediately that in order to find $(R\mu)(r, \omega)$ we can use any components of the vector functions $h$ and $H$ with nonzero discontinuities treated as functions of the angular variable. This fact is used in the numerical experiments in the next section.
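Relations (12) to (14) turn two measured jumps into one value of the Radon transform. The following is a minimal sketch under stated assumptions: the function name and the synthetic data are hypothetical, and the check uses a purely absorbing line segment, for which the outgoing jump is the incoming jump damped by the exponential of the line integral.

```python
import math


def radon_value_from_jumps(jump_h: float, jump_H: float) -> float:
    """Line integral of the attenuation factor along one horizontal line,
    computed as ln([h_i] / [H_i]) in the sense of (12) and (14)."""
    if jump_h == 0.0 or jump_H == 0.0 or (jump_h > 0) != (jump_H > 0):
        raise ValueError("jumps must be nonzero and have the same sign")
    return math.log(jump_h / jump_H)


if __name__ == "__main__":
    # Synthetic check (assumed data): a chord of length 2 through a medium
    # with constant attenuation 0.3 damps the jump by exp(-0.6).
    mu, chord = 0.3, 2.0
    jump_in = 1.1
    jump_out = jump_in * math.exp(-mu * chord)
    print(radon_value_from_jumps(jump_in, jump_out))
```

A larger incoming jump makes the ratio less sensitive to noise in the computed outgoing jump, which is one reason the choice of the component $i$ matters in practice.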

3 Numerical Results

We demonstrate the solution algorithm for the tomography problem on the 3D Shepp-Logan phantom [14] (see Fig. 1a). The function $\mu(r)$ is recovered in the plane $r_3 = 0$. We assume that the matrix $P = P(\omega, \omega')$ describes the Rayleigh law of scattering [4] and that $\mu_s(r) = 0.5\,\mu(r)$ in the whole medium $G$. For the vector function $h(r, \omega)$ corresponding to the incoming radiation, we take the components

$$h_1 = \begin{cases} 1, & \omega_3 \ge 0, \\ 1.1, & \omega_3 < 0, \end{cases} \qquad h_2 = \begin{cases} 0, & \omega_3 \ge 0, \\ 1.1, & \omega_3 < 0, \end{cases} \qquad h_3 = \begin{cases} 1, & \omega_3 \ge 0, \\ 0, & \omega_3 < 0, \end{cases} \qquad h_4 = 0.$$

The intersection of the plane $r_3 = 0$ with the domain $G$ is a circle of radius 1. In recovering $\mu(r)$, a parallel scanning scheme [14] is used. Let

$$r_1 = \rho \cos\varphi, \quad r_2 = \rho \sin\varphi, \quad \rho \in [-1, 1], \quad \varphi \in [0, 2\pi),$$

$$\omega = (-\sin\varphi, \cos\varphi, 0), \quad \omega_\perp = (\cos\varphi, \sin\varphi, 0), \quad \omega \cdot \omega_\perp = 0.$$

Then equality (13) can be written in the form

$$\int_{-\sqrt{1 - \rho^2}}^{\sqrt{1 - \rho^2}} \mu(\rho\,\omega_\perp + t\omega)\, dt = \tilde\Phi_i(\rho, \omega),$$

where $\tilde\Phi_i(\rho, \omega) = \Phi_i(r(\rho, \omega), \omega(\varphi))$ and $r = \rho\,\omega_\perp(\varphi)$. Hence, in the cross-section $r_3 = 0$, we obtain integrals of the trace of the function $\mu$ on almost all lines passing through points $r = \rho\,\omega_\perp(\varphi)$ in the direction $\omega(\varphi)$, with $\rho \in [-1, 1]$ and $\varphi \in [0, 2\pi)$. Thus, the problem of determining the function $\mu(r)$ reduces to inverting its Radon transform $(R\mu)(\rho, \omega)$. In the computational experiments, the Radon transform $(R\mu)(\rho, \omega(\varphi))$ is defined on the following partition of the set $[-1, 1] \times [0, 2\pi)$:

$$\rho_l = -1 + l/60, \quad l = 0, \ldots, 120; \qquad \varphi_s = \pi s/90, \quad s = 0, \ldots, 179.$$
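The parallel scanning geometry can be sketched in a few lines. This sketch rests on assumptions: it reads the partition as 121 offsets $\rho_l = -1 + l/60$ and 180 angles $\varphi_s = \pi s/90$, it takes $\omega = (-\sin\varphi, \cos\varphi, 0)$ and $\omega_\perp = (\cos\varphi, \sin\varphi, 0)$, and all function names are hypothetical. The closed-form Radon transform of a constant attenuation on the unit disk is included only as a sanity reference.

```python
import math


def scanning_grid(n_rho: int = 120, n_phi: int = 180):
    """Nodes rho_l = -1 + l/60 (l = 0..120) and phi_s = pi*s/90 (s = 0..179)
    for the default arguments."""
    rhos = [-1.0 + 2.0 * l / n_rho for l in range(n_rho + 1)]
    phis = [2.0 * math.pi * s / n_phi for s in range(n_phi)]
    return rhos, phis


def line_points(rho: float, phi: float, n_t: int = 5):
    """Sample points rho*omega_perp + t*omega on the chord of the unit disk."""
    half = math.sqrt(max(0.0, 1.0 - rho * rho))  # half chord length
    omega = (-math.sin(phi), math.cos(phi))
    omega_perp = (math.cos(phi), math.sin(phi))
    ts = [-half + 2.0 * half * k / (n_t - 1) for k in range(n_t)]
    return [(rho * omega_perp[0] + t * omega[0],
             rho * omega_perp[1] + t * omega[1]) for t in ts]


def radon_constant_disk(mu: float, rho: float) -> float:
    """Radon transform of a constant attenuation mu on the unit disk:
    attenuation times chord length."""
    return 2.0 * mu * math.sqrt(max(0.0, 1.0 - rho * rho))
```

Every sampled point lies inside the unit disk because the chord half-length is $\sqrt{1 - \rho^2}$, which matches the integration limits in the rewritten form of (13).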

Fig. 1 The 3D Shepp-Logan phantom at the section $r_3 = 0$: (a) original cross-section; (b) reconstruction by using the jump of $f_1$; (c) reconstruction by using the jump of $f_2$

To find $\tilde\Phi_i(\rho_l, \omega(\varphi_s))$, we calculate the jump $[H(r + d(r, \omega)\omega, \omega)]$ at the points $(r_{l,s}, \omega_s)$, where $r_{l,s} = \rho_l\, \omega_\perp(\varphi_s)$ and $\omega_s = \omega(\varphi_s)$. The vector function $H$ is calculated by a Monte-Carlo method. Let $\mu(r) \le \bar\mu$ for any $r \in G$, where $\bar\mu$ is a constant. Put $\tilde\mu(r) = \bar\mu$. By Theorem 1, we then arrive at a solution in the form of a convergent series such as (9). The component $(\bar\mu - \mu(r))\varphi(r, \omega)$ in the expression for the integral operator $\tilde S$ can be treated as fictitious scattering with the direction of photon propagation kept fixed. This method, called the maximum cross-section method [1], simplifies the sampling of the free path length of a particle even in domains with a complex structure. With small variations of the total interaction coefficient in the medium, such an approach gives rather good results.

Let $m$ be the number of terms considered in the Neumann series and $n$ the number of simulated trajectories. Then the function $f(r, \omega)$ can be found approximately in the form

$$f(r, \omega) \approx f^n(r, \omega) = \frac{1}{n} \sum_{i=1}^{n} s_i(r, \omega),$$

$$s_i(r, \omega) = \tilde f_0(r, \omega) + \sum_{j=1}^{m} \prod_{k=1}^{j} \frac{1 - \exp(-\bar\mu\, d(r_{i,k-1}, -\omega_{i,k-1}))}{\bar\mu} \bigl( \bar\mu - \mu(r_{i,k}) + \mu_s(r_{i,k}) \bigr)\, Q(\omega_{i,k-1}, \omega_{i,k})\, \tilde f_0(r_{i,j}, \omega_{i,j}).$$

In simulating trajectories, at each step $(i, k)$ we define $r_{i,k}$ by setting

$$r_{i,k} = r_{i,k-1} - \omega_{i,k-1}\, t_{i,k}, \qquad r_{i,0} = r, \qquad \omega_{i,0} = \omega,$$

where $t_{i,k}$ is an independent realization of a random variable distributed on $[0, d(r_{i,k-1}, -\omega_{i,k-1})]$ with density

$$\rho(t) = \frac{\bar\mu \exp(-\bar\mu t)}{1 - \exp(-\bar\mu\, d(r_{i,k-1}, -\omega_{i,k-1}))}.$$
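A free path length with this truncated exponential density can be drawn by inverting its cumulative distribution function. The sketch below assumes the density $\bar\mu \exp(-\bar\mu t) / (1 - \exp(-\bar\mu d))$ on $[0, d]$, where $d$ is the distance to the boundary; the function names are hypothetical, and the weight function restates the factor $(1 - \exp(-\bar\mu d))/\bar\mu$ that compensates for sampling from this density.

```python
import math
import random


def sample_free_path(mu_bar: float, d: float, rng: random.Random) -> float:
    """Draw t in [0, d] with density mu_bar*exp(-mu_bar*t) / (1 - exp(-mu_bar*d))
    by inverting the cumulative distribution function."""
    alpha = rng.random()  # uniform on [0, 1)
    return -math.log(1.0 - alpha * (1.0 - math.exp(-mu_bar * d))) / mu_bar


def path_weight(mu_bar: float, d: float) -> float:
    """Ratio exp(-mu_bar*t) / density(t), which does not depend on t."""
    return (1.0 - math.exp(-mu_bar * d)) / mu_bar


if __name__ == "__main__":
    rng = random.Random(0)
    print([round(sample_free_path(1.0, 2.0, rng), 3) for _ in range(5)])
```

Because the constant $\bar\mu$ replaces the spatially varying attenuation, the sampling needs no knowledge of the internal structure of the medium along the ray.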

Then we simulate a realization $\alpha_{i,k}$ of a random variable distributed uniformly on $[0, 1]$. In defining the quantity $\omega_{i,k}$ and the matrix $Q(\omega_{i,k-1}, \omega_{i,k})$, for

$$\alpha_{i,k} \ge \frac{\bar\mu - \mu(r_{i,k})}{\bar\mu - \mu(r_{i,k}) + \mu_s(r_{i,k})}$$

we use the following formulas:

$$\omega_1^{i,k} = \sqrt{1 - \nu_{i,k}^2}\, \cos\varphi_{i,k}, \qquad \omega_2^{i,k} = \sqrt{1 - \nu_{i,k}^2}\, \sin\varphi_{i,k}, \qquad \omega_3^{i,k} = \nu_{i,k},$$

$$Q(\omega_{i,k-1}, \omega_{i,k}) = 4\pi P(\omega_{i,k-1}, \omega_{i,k}),$$

and for

$$\alpha_{i,k} < \frac{\bar\mu - \mu(r_{i,k})}{\bar\mu - \mu(r_{i,k}) + \mu_s(r_{i,k})}$$

we use

$$\omega_{i,k} = \omega_{i,k-1}, \qquad Q(\omega_{i,k-1}, \omega_{i,k}) = E.$$

Here $\nu_{i,k}$ and $\varphi_{i,k}$ are independent realizations of random variables distributed uniformly on the intervals $[-1, 1]$ and $[0, 2\pi)$, respectively, and $E$ is the $4 \times 4$ identity matrix.

In the numerical experiments, the vector function was approximated by a sum of 10 terms in the Neumann series (9) ($m = 10$). In each experiment, 2,000 trajectories were taken ($n = 2{,}000$). Initially, the Radon transform was calculated at $i = 1$, which corresponds to an algorithm for the scalar transfer equation [10]. Values of the function $\mu(r)$ on a $400 \times 400$ uniform grid in the plane $r_3 = 0$ were computed by using the algorithm of convolution and back projection [14]. The result of the reconstruction is depicted in Fig. 1b. A similar reconstruction at $i = 2$ is presented in Fig. 1c. Thus, the above-described generalization of the algorithm for solving the tomography problem to the case of a vector transfer equation for polarized radiation yields, in a number of cases, a higher-quality reconstruction than the approach used earlier. In practical terms, this means that the application of special-type polarized sources substantially expands the possibilities for nondestructive radiative testing of various objects.
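The choice between a real and a fictitious scattering event in the maximum cross-section method can be sketched as follows. This is an illustration under assumptions, not the authors' code: the fictitious event is taken to have probability $(\bar\mu - \mu)/(\bar\mu - \mu + \mu_s)$, a real event draws a new direction with a uniform third component on $[-1, 1]$ and a uniform azimuth on $[0, 2\pi)$, and the matrix factor $Q$ is left out because it requires the Rayleigh scattering matrix. All names are hypothetical.

```python
import math
import random


def fictitious_probability(mu_bar: float, mu: float, mu_s: float) -> float:
    """Probability that the collision keeps the direction unchanged."""
    total = (mu_bar - mu) + mu_s
    # If total is zero the trajectory weight vanishes anyway; treat as fictitious.
    return (mu_bar - mu) / total if total > 0.0 else 1.0


def sample_direction(rng: random.Random):
    """Isotropic unit vector: uniform third component and uniform azimuth."""
    nu = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    s = math.sqrt(1.0 - nu * nu)
    return (s * math.cos(phi), s * math.sin(phi), nu)


def next_direction(omega, mu_bar, mu, mu_s, rng):
    """Return (direction, is_real_scattering) for one collision."""
    alpha = rng.random()
    if alpha < fictitious_probability(mu_bar, mu, mu_s):
        return omega, False  # fictitious event: direction kept, Q = E
    return sample_direction(rng), True  # real event: Q = 4*pi*P
```

Where the local attenuation equals the bound $\bar\mu$, every collision is a real one; where the medium does not scatter at all, every collision is fictitious and the particle continues along its ray.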

References

1. Marchuk G.I., Mikhailov G.A., Nazarliev M.A. et al. Monte–Carlo Method in Atmospheric Optics, Novosibirsk: Nauka, 1976. (in Russian)
2. Germogenova T.A., Konovalov N.V., and Kuz'mina M.G. Foundations of Mathematical Theory for Polarized Radiation Transfer (Rigorous Results), Proc. All-Union Symp. on Invariance Principle and Its Applications, Byurakan, 1981, Yerevan: Akad. Nauk Armyan. SSR, 1989, pp. 271–284. (in Russian)
3. Mikhailov G.A., Ukhinov S.A., and Chimaeva A.S. Variance of Standard Vector Estimate of Monte–Carlo Method in Polarized Radiation Transfer Theory, J. Comp. Math. and Math. Phys., 2006, vol. 46, no. 11, pp. 2099–2113.
4. Sushkevich T.A., Strelkov S.A., and Maksakova S.V. Mathematical Model of Polarized Radiation Transfer, J. Math. Model., 1998, vol. 10, no. 7, pp. 61–75. (in Russian)
5. Siewert C.E. Determination of the Single Scattering Albedo from Polarization Measurements of the Rayleigh Atmosphere, Astrophys. Space Sci., 1979, vol. 60, pp. 237–239.
6. Siewert C.E. Solution of an Inverse Problem in Radiative Transfer with Polarization, J. Quant. Spectrosc. Radiat. Transfer, 1983, vol. 30, no. 6, pp. 523–526.
7. Ukhinov S.A. and Yurkov D.I. Computation of the Parametric Derivatives of Polarized Radiation and the Solution of Inverse Atmospheric Optics Problems, Russ. J. Numer. Anal. Math. Model., 2002, vol. 17, no. 3, pp. 283–303.
8. Anikonov D.S. and Prokhorov I.V. Determination of Transfer Equation Coefficient with Energy and Angular Singularities of External Radiation, Dokl. Math., 1992, vol. 327, no. 2, pp. 205–207.
9. Anikonov D.S., Prokhorov I.V., and Kovtanyuk A.E. Investigation of Scattering and Absorbing Media by the Methods of X-Ray Tomography, J. Inv. Ill-Posed Probl., 1993, vol. 1, no. 4, pp. 259–281.
10. Anikonov D.S., Kovtanyuk A.E., and Prokhorov I.V. Transport Equation and Tomography, Utrecht: VSP, 2002.
11. Kovtanyuk A.E. and Prokhorov I.V. Tomography Problem for the Polarized-Radiation Transfer Equation, J. Inv. Ill-Posed Probl., 2006, vol. 14, no. 6, pp. 1–12.
12. Germogenova T.A. Local Properties of Solutions of the Transfer Equation, Moscow: Nauka, 1986. (in Russian)
13. Khachaturov A.A. Determination of the Measure of a Domain in an n-Dimensional Euclidean Space from Its Values for All Half-Spaces, Russ. Math. Surv., 1954, vol. 9, no. 3(61), pp. 205–212. (in Russian)
14. Natterer F. The Mathematics of Computerized Tomography, Stuttgart: B. G. Teubner and John Wiley and Sons, 1986.


Finite and Boundary Element Energy Approximations of Dirichlet Control Problems

Günther Of, Thanh Xuan Phan, and Olaf Steinbach

Abstract We study a Dirichlet boundary control problem for the Poisson equation where the Dirichlet control is considered in the energy space $H^{1/2}(\Gamma)$. Both finite and boundary element approximations of the minimal solution are presented. We state the unique solvability of both approaches, as well as stability and error estimates. The numerical example is in good agreement with the theoretical results.

1 Introduction

While finite element methods are widely used in optimal control problems, see e.g. [1–3, 6, 7, 13] and the references therein, there are only a few approaches based on boundary element methods, see the discussion in [9]. The use of boundary element methods looks promising, since the unknown control function is defined on the boundary only. In most cases the Dirichlet control is considered in $L_2(\Gamma)$, but the energy space $H^{1/2}(\Gamma)$ seems to be the more natural choice. Here, we present the continuous solution of the considered Dirichlet control problem in Sect. 2, its finite element approximation in Sect. 3, and the symmetric boundary element approximation in Sect. 4. We end with a numerical example comparing the two methods in Sect. 5. Note that the related analysis is presented in [8] and [9], respectively.

G. Of T.X. Phan O. Steinbach Institute of Computational Mathematics, Graz University of Technology Steyrergasse 30/III, 8010 Graz, Austria e-mail: [email protected]; [email protected]; [email protected] H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0 18, © Springer-Verlag Berlin Heidelberg 2012


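2 Dirichlet Control Problems

As a model problem, we consider the Dirichlet control problem to minimize

$$J(u,z) = \frac{1}{2}\int_\Omega [u(x) - \bar u(x)]^2\,dx + \frac{1}{2}\,\varrho\,\|z\|_A^2 \quad \text{for } (u,z) \in H^1(\Omega)\times H^{1/2}(\Gamma) \tag{1}$$

subject to the constraint of the Dirichlet boundary value problem of the Poisson equation

$$-\Delta u(x) = f(x) \quad \text{for } x\in\Omega, \qquad u(x) = z(x) \quad \text{for } x\in\Gamma, \tag{2}$$

where $\bar u \in L_2(\Omega)$ is a given target, $f \in L_2(\Omega)$ is a given volume density, $\varrho \in \mathbb{R}_+$, and $\Omega \subset \mathbb{R}^d$, $d = 2,3$, is a bounded Lipschitz domain with boundary $\Gamma = \partial\Omega$. Moreover, $\|\cdot\|_A$ is an equivalent norm in $H^{1/2}(\Gamma)$ which is induced by an elliptic, self-adjoint, and bounded operator $A : H^{1/2}(\Gamma) \to H^{-1/2}(\Gamma)$. For example, we may consider the stabilized hypersingular boundary integral operator $A = \tilde D$, see [10],

$$\langle \tilde D z, w\rangle_\Gamma := \langle Dz, w\rangle_\Gamma + \langle z, 1\rangle_\Gamma \langle w, 1\rangle_\Gamma \quad \text{for all } z, w \in H^{1/2}(\Gamma), \tag{3}$$

where

$$(Dz)(x) = -\frac{\partial}{\partial n_x}\int_\Gamma \frac{\partial}{\partial n_y} U^*(x,y)\, z(y)\, ds_y \quad \text{for } x\in\Gamma$$

is the hypersingular integral operator of the Laplacian, and the fundamental solution of the Laplace operator is given by [12]

$$U^*(x,y) = \begin{cases} -\dfrac{1}{2\pi}\log|x-y| & \text{for } d = 2,\\[2mm] \dfrac{1}{4\pi}\dfrac{1}{|x-y|} & \text{for } d = 3. \end{cases}$$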

2.1 Continuous Solution of the Control Problem

Let $u_f \in H_0^1(\Omega)$ be the weak solution of the homogeneous Dirichlet boundary value problem

$$-\Delta u_f(x) = f(x) \quad \text{for } x\in\Omega, \qquad u_f(x) = 0 \quad \text{for } x\in\Gamma.$$

The solution of the boundary value problem (2) is then given by $u = u_z + u_f$, where $u_z \in H^1(\Omega)$ is the unique solution of the boundary value problem

$$-\Delta u_z(x) = 0 \quad \text{for } x\in\Omega, \qquad u_z(x) = z(x) \quad \text{for } x\in\Gamma.$$


Hence we may define a linear map $S : H^{1/2}(\Gamma) \to H^1(\Omega) \subset L_2(\Omega)$, $u_z = Sz$. By using $u = Sz + u_f$, we can write the cost functional (1) to be minimized subject to the constraint (2) as the reduced cost functional

$$\tilde J(z) = \frac{1}{2}\int_\Omega [(Sz)(x) + u_f(x) - \bar u(x)]^2\,dx + \frac{1}{2}\,\varrho\,\langle Az, z\rangle_\Gamma. \tag{4}$$

Since the reduced cost functional $\tilde J(\cdot)$ is convex, the unconstrained minimizer $z$ is determined by the optimality condition

$$S^*Sz + S^*(u_f - \bar u) + \varrho Az = 0, \tag{5}$$

where $S^* : L_2(\Omega) \to H^{-1/2}(\Gamma)$ is the adjoint operator of $S : H^{1/2}(\Gamma) \to L_2(\Omega)$, i.e.,

$$\langle S^*\psi, \varphi\rangle_\Gamma = \langle \psi, S\varphi\rangle_\Omega = \int_\Omega \psi(x)\,(S\varphi)(x)\,dx \quad \text{for all } \varphi \in H^{1/2}(\Gamma),\ \psi \in L_2(\Omega).$$

The operator equation (5), i.e.,

$$T_\varrho z := (\varrho A + S^*S)z = S^*(\bar u - u_f) =: g \tag{6}$$

admits a unique solution $z \in H^{1/2}(\Gamma)$, since the operator $T_\varrho : H^{1/2}(\Gamma) \to H^{-1/2}(\Gamma)$ is bounded and $H^{1/2}(\Gamma)$-elliptic, see, e.g., [8]. The approximate solution of the operator equation (6) by finite and boundary element methods is the main subject of this paper. By introducing the adjoint variable $\tau = S^*(u - \bar u) \in H^{-1/2}(\Gamma)$, instead of (6) we have to solve the coupled problem

$$\tau + \varrho Az = 0, \qquad \tau = S^*(u - \bar u), \qquad u = Sz + u_f. \tag{7}$$

The adjoint variable $\tau = S^*(u - \bar u)$ is the Neumann datum

$$\tau(x) = -\frac{\partial}{\partial n_x}\, p(x) \quad \text{for } x\in\Gamma$$

of the unique solution $p$ of the adjoint Dirichlet boundary value problem

$$-\Delta p(x) = u(x) - \bar u(x) \quad \text{for } x\in\Omega, \qquad p(x) = 0 \quad \text{for } x\in\Gamma. \tag{8}$$

Hence we can rewrite the optimality condition $\tau + \varrho Az = 0$ as

$$\frac{\partial}{\partial n_x}\, p(x) = \varrho\,(Az)(x) \quad \text{for } x\in\Gamma. \tag{9}$$


Therefore, instead of minimizing the cost functional (1) subject to the constraint (2), we have to solve a coupled system: the state (2), the adjoint boundary value problem (8), and the optimality condition (9).

3 Finite Element Approximations

In this section, we introduce finite element approximations to solve the primal problem (2), the adjoint problem (8) and the optimality condition (9). First, we consider a variational formulation of the primal Dirichlet boundary value problem (2) to find $u \in H^1(\Omega)$ such that $u|_\Gamma = z$ and

$$\int_\Omega \nabla u(x)\cdot\nabla v(x)\,dx = \int_\Omega f(x)\,v(x)\,dx \quad \text{for all } v \in H_0^1(\Omega). \tag{10}$$

Let

$$S_{h,0}^1(\Omega) = \mathrm{span}\{\phi_k\}_{k=1}^{M_\Omega} \subset H_0^1(\Omega), \qquad S_h^1(\Omega) = S_{h,0}^1(\Omega) \cup \mathrm{span}\{\phi_k\}_{k=M_\Omega+1}^{M_\Omega+M_\Gamma} \subset H^1(\Omega)$$

be finite element spaces of piecewise linear and continuous basis functions $\phi_k$, which are defined with respect to some admissible domain triangulation of mesh size $h$, and let

$$S_h^1(\Gamma) = S_h^1(\Omega)|_\Gamma = \mathrm{span}\{\varphi_\ell\}_{\ell=1}^{M_\Gamma} \subset H^{1/2}(\Gamma)$$

be a finite element space of piecewise linear and continuous basis functions $\varphi_\ell$ which are the boundary traces of the domain basis functions $\phi_{M_\Omega+\ell}$. The trace of the function $u_h \in S_h^1(\Omega)$ is denoted by $z_h \in S_h^1(\Gamma)$. Defining the natural extension

$$\tilde z_h = \sum_{\ell=1}^{M_\Gamma} z_\ell\,\phi_{M_\Omega+\ell} \in S_h^1(\Omega) \quad \text{of} \quad z_h = \sum_{\ell=1}^{M_\Gamma} z_\ell\,\varphi_\ell \in S_h^1(\Gamma),$$

we represent the function $u_h$ by $u_h = u_{0,h} + \tilde z_h$, where $u_{0,h} = \sum_{k=1}^{M_\Omega} u_k \phi_k \in S_{h,0}^1(\Omega)$. The Galerkin formulation of (10) is to find $u_{0,h} \in S_{h,0}^1(\Omega)$ for $z_h \in S_h^1(\Gamma)$ such that

$$\int_\Omega \nabla[u_{0,h}(x) + \tilde z_h(x)]\cdot\nabla v_h(x)\,dx = \int_\Omega f(x)\,v_h(x)\,dx \quad \text{for all } v_h \in S_{h,0}^1(\Omega).$$

By choosing $v_h = \phi_j$ for $j = 1,\dots,M_\Omega$, we get the equivalent system of linear equations

$$K_{II}\,\underline u + K_{CI}\,\underline z = \underline f_I. \tag{11}$$

Remark 1. For the analysis in [8], we use a different splitting $u = u_0 + \mathcal{E}z$ and a quasi-interpolation operator. In the end, however, the perturbed system corresponds to the one which we present here.
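The splitting $u_h = u_{0,h} + \tilde z_h$ and the resulting system (11) can be illustrated on a one-dimensional analogue. The sketch below is not taken from the paper: it assumes the model problem $-u'' = f$ on $(0,1)$ with a constant source $f$, a uniform mesh and piecewise linear elements, so that the boundary consists of two nodes and $K_{CI}$ has only two nonzero entries. The function name is hypothetical.

```python
def solve_dirichlet_1d(n, f, z_left, z_right):
    """P1 finite elements for -u'' = f on (0, 1), u(0) = z_left, u(1) = z_right.

    n is the number of elements (n >= 2) and f a constant source term.
    Returns the nodal values on the uniform mesh, boundary nodes included.
    """
    h = 1.0 / n
    m = n - 1  # number of interior nodes
    # Right-hand side f_I - K_CI z: only the first and last interior rows
    # couple to the boundary values, through the stiffness entry -1/h.
    rhs = [f * h] * m
    rhs[0] += z_left / h
    rhs[-1] += z_right / h
    # K_II is tridiagonal (2/h on the diagonal, -1/h next to it);
    # solve K_II u = rhs with the Thomas algorithm.
    diag = [2.0 / h] * m
    off = -1.0 / h
    for i in range(1, m):
        w = off / diag[i - 1]
        diag[i] -= w * off
        rhs[i] -= w * rhs[i - 1]
    u = [0.0] * m
    u[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):
        u[i] = (rhs[i] - off * u[i + 1]) / diag[i]
    return [z_left] + u + [z_right]
```

Moving $K_{CI}\,\underline z$ to the right-hand side is exactly the step that turns (11) into a problem for the interior coefficients only, with the control entering as data.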


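Next, we consider a finite element approximation of the adjoint problem (8) to find $p \in H_0^1(\Omega)$ as the unique solution of the variational problem

$$\int_\Omega \nabla p(x)\cdot\nabla v(x)\,dx = \int_\Omega [u(x) - \bar u(x)]\,v(x)\,dx \quad \text{for all } v \in H_0^1(\Omega). \tag{12}$$

The finite element approximation of (12) is to find $p_h(x) = \sum_{k=1}^{M_\Omega} p_k \phi_k(x) \in S_{h,0}^1(\Omega)$ such that

$$\int_\Omega \nabla p_h(x)\cdot\nabla v_h(x)\,dx = \int_\Omega [u_{0,h}(x) + \tilde z_h(x) - \bar u(x)]\,v_h(x)\,dx$$

for all $v_h \in S_{h,0}^1(\Omega)$. By choosing $v_h = \phi_j$ for $j = 1,\dots,M_\Omega$, the latter is equivalent to a system of linear equations

$$K_{II}\,\underline p = M_{II}\,\underline u + M_{CI}\,\underline z - \underline g_I. \tag{13}$$

We now consider a finite element approximation of the optimality condition (9). Note that, by Green's formula, $z \in H^{1/2}(\Gamma)$ solves the variational problem

$$\varrho\,\langle Az, w\rangle_\Gamma = \int_\Gamma \frac{\partial}{\partial n_x}\,p(x)\,w(x)\,ds_x = \int_\Omega \nabla p(x)\cdot\nabla \mathcal{E}w(x)\,dx - \int_\Omega [u(x) - \bar u(x)]\,\mathcal{E}w(x)\,dx$$

for all $w \in H^{1/2}(\Gamma)$, where $\mathcal{E} : H^{1/2}(\Gamma) \to H^1(\Omega)$ is a bounded extension operator, see, e.g., [12]. Then, the finite element approximation is to find $z_h \in S_h^1(\Gamma)$ such that

$$\varrho\,\langle Az_h, \varphi_\ell\rangle_\Gamma = \int_\Omega \nabla p_h(x)\cdot\nabla\phi_{M_\Omega+\ell}(x)\,dx - \int_\Omega [u_{0,h}(x) + \tilde z_h(x) - \bar u(x)]\,\phi_{M_\Omega+\ell}(x)\,dx$$

for all $\ell = 1,\dots,M_\Gamma$, or to solve the equivalent system of linear equations

$$\varrho A_h\,\underline z = K_{IC}\,\underline p - M_{IC}\,\underline u - M_{CC}\,\underline z + \underline g_C. \tag{14}$$

We end up from (11), (13), and (14) with the finite element system

$$\begin{pmatrix} M_{II} & -K_{II} & M_{CI} \\ K_{II} & & K_{CI} \\ M_{IC} & -K_{IC} & M_{CC} + \varrho A_h \end{pmatrix} \begin{pmatrix} \underline u \\ \underline p \\ \underline z \end{pmatrix} = \begin{pmatrix} \underline g_I \\ \underline f_I \\ \underline g_C \end{pmatrix}. \tag{15}$$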


When using the modified hypersingular integral operator $A = \tilde D$ as defined in (3), we use integration by parts to evaluate the Galerkin matrix $A_h$; for the details see, e.g., [11, 12]. Since the matrix $K_{II}$ is invertible, we obtain the Schur complement system

$$\left[K_{IC}K_{II}^{-1}M_{II}K_{II}^{-1}K_{CI} - K_{IC}K_{II}^{-1}M_{CI} - M_{IC}K_{II}^{-1}K_{CI} + M_{CC} + \varrho A_h\right]\underline z = \underline g_C - K_{IC}K_{II}^{-1}\underline g_I + K_{IC}K_{II}^{-1}M_{II}K_{II}^{-1}\underline f_I - M_{IC}K_{II}^{-1}\underline f_I \tag{16}$$

which is the finite element approximation of the operator equation (6). The stability of the linear system (16) and the error analysis are summarized as follows.

Theorem 1 ([8]). The matrix

$$\tilde T_{\varrho,h} := K_{IC}K_{II}^{-1}M_{II}K_{II}^{-1}K_{CI} - K_{IC}K_{II}^{-1}M_{CI} - M_{IC}K_{II}^{-1}K_{CI} + M_{CC} + \varrho A_h$$

is positive definite and therefore the system (16) is uniquely solvable. Let $z$ be the unique solution of the operator equation (6) minimizing the reduced cost functional (4). Let $z_h \in S_h^1(\Gamma) \leftrightarrow \underline z \in \mathbb{R}^{M_\Gamma}$ be the unique solution of the system (16). Then there holds the error estimate

$$\|z - z_h\|_{H^{1/2}(\Gamma)} \le c(z, \bar u, f)\,h. \tag{17}$$

Applying the Aubin–Nitsche trick, we are also able to derive an error estimate in the $L_2(\Gamma)$ norm, i.e.,

$$\|z - z_h\|_{L_2(\Gamma)} \le c(z, \bar u, f)\,h^{3/2}. \tag{18}$$
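The elimination that leads from the block system (15) to the Schur complement system (16) can be checked on a small algebraic example. The following sketch is an illustration under several assumptions and is not the authors' code: it uses a one-dimensional toy mesh on $(0,1)$ with two boundary nodes, replaces $A_h$ by the $2\times 2$ identity matrix, and uses a hypothetical target and a constant source. All function names are made up for this example.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            m = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= m * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def block(A, rows, cols):
    return [[A[i][j] for j in cols] for i in rows]


def assemble_1d(n):
    """P1 stiffness and mass matrices on a uniform mesh of (0, 1)."""
    h = 1.0 / n
    K = [[0.0] * (n + 1) for _ in range(n + 1)]
    M = [[0.0] * (n + 1) for _ in range(n + 1)]
    for e in range(n):
        for a in range(2):
            for b in range(2):
                K[e + a][e + b] += (1.0 if a == b else -1.0) / h
                M[e + a][e + b] += (2.0 if a == b else 1.0) * h / 6.0
    return K, M


def solve_control_toy(n=8, rho=0.01):
    K, M = assemble_1d(n)
    I, C = list(range(1, n)), [0, n]  # interior and boundary nodes
    # Naming as in (11): K_CI maps boundary values into interior rows.
    K_II, K_CI, K_IC = block(K, I, I), block(K, I, C), block(K, C, I)
    M_II, M_CI, M_IC = block(M, I, I), block(M, I, C), block(M, C, I)
    # M_CC + rho * A_h, with A_h replaced by the identity (assumption).
    MA = [[M[i][j] + (rho if i == j else 0.0) for j in C] for i in C]
    ubar = [1.0 + (k / n) ** 2 for k in range(n + 1)]  # hypothetical target
    g = matvec(M, ubar)
    g_I, g_C = [g[i] for i in I], [g[i] for i in C]
    f_I = [1.0 / n] * len(I)  # load vector for the constant source f = 1

    def apply_T(z):
        """Action of the Schur complement matrix in (16) on z."""
        w = solve(K_II, matvec(K_CI, z))
        y = solve(K_II, [a - b for a, b in zip(matvec(M_II, w), matvec(M_CI, z))])
        return [p - q + r for p, q, r in
                zip(matvec(K_IC, y), matvec(M_IC, w), matvec(MA, z))]

    # Schur complement matrix, built column by column from unit vectors.
    cols = [apply_T([1.0, 0.0]), apply_T([0.0, 1.0])]
    T = [[cols[j][i] for j in range(2)] for i in range(2)]
    v = solve(K_II, f_I)
    y = solve(K_II, [a - b for a, b in zip(matvec(M_II, v), g_I)])
    rhs = [a + b - c for a, b, c in zip(g_C, matvec(K_IC, y), matvec(M_IC, v))]
    z_schur = solve(T, rhs)

    # Full block system (15) in the unknowns (u, p, z).
    m = len(I)
    A = [M_II[i] + [-k for k in K_II[i]] + M_CI[i] for i in range(m)]
    A += [K_II[i] + [0.0] * m + K_CI[i] for i in range(m)]
    A += [M_IC[i] + [-k for k in K_IC[i]] + MA[i] for i in range(2)]
    z_full = solve(A, g_I + f_I + g_C)[2 * m:]
    return T, z_schur, z_full
```

In this toy setting the control computed from the Schur complement agrees with the one from the full block system up to rounding, and the Schur complement matrix is symmetric and positive definite, in line with Theorem 1.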

4 Symmetric Boundary Element Approximations

In this section, we propose to use boundary element methods to solve the state equation (2), the adjoint problem (8), and finally the coupled system (7). In addition, we state the main results of the related analysis. The use of boundary integral equations seems to be a natural choice since the unknown control $z \in H^{1/2}(\Gamma)$ is considered on the boundary. For details on boundary integral equations and boundary element methods see, e.g., [12] and the references therein.

4.1 Boundary Integral Equations

4.1.1 State Equation

The unknown Cauchy data $z \in H^{1/2}(\Gamma)$ and $t := \frac{\partial u}{\partial n} \in H^{-1/2}(\Gamma)$ of the state equation (2) have to satisfy the boundary integral equation

$$(Vt)(x) = \left(\tfrac{1}{2}I + K\right)z(x) - (N_0 f)(x) \quad \text{for } x\in\Gamma, \tag{19}$$

where

$$(Vt)(x) = \int_\Gamma U^*(x,y)\,t(y)\,ds_y \quad \text{for } x\in\Gamma$$

is the Laplace single layer potential $V : H^{-1/2}(\Gamma) \to H^{1/2}(\Gamma)$, and

$$(Kz)(x) = \int_\Gamma \frac{\partial}{\partial n_y}U^*(x,y)\,z(y)\,ds_y \quad \text{for } x\in\Gamma$$

is the Laplace double layer potential $K : H^{1/2}(\Gamma) \to H^{1/2}(\Gamma)$. In addition,

$$(N_0 f)(x) = \int_\Omega U^*(x,y)\,f(y)\,dy \quad \text{for } x\in\Gamma$$

is the related Newton potential.

4.1.2 Adjoint Boundary Value Problem

For the adjoint Dirichlet boundary value problem (8),

$$-\Delta p(x) = u(x) - \bar u(x) \quad \text{for } x\in\Omega, \qquad p(x) = 0 \quad \text{for } x\in\Gamma,$$

the solution is given by the representation formula for $\tilde x \in \Omega$

$$p(\tilde x) = \int_\Gamma U^*(\tilde x, y)\,\frac{\partial}{\partial n_y}\,p(y)\,ds_y + \int_\Omega U^*(\tilde x, y)\,[u(y) - \bar u(y)]\,dy. \tag{20}$$

As in (19), we obtain a boundary integral equation

$$(Vq)(x) = (N_0\bar u)(x) - (N_0 u)(x) \quad \text{for } x\in\Gamma \tag{21}$$

to determine the unknown Neumann datum $q = \frac{\partial}{\partial n}p \in H^{-1/2}(\Gamma)$. So far, the solution $u$ of the primal Dirichlet boundary value problem (2) enters the volume potential $N_0 u$ in the integral equation (21). To end up with a system of boundary integral operators for the unknowns only, we will rewrite the volume potential $N_0 u$ in (21) by boundary integral operators. Therefore we introduce a modified representation formula, instead of (20), for the adjoint state $p$ as follows. First we note that

$$V^*(x,y) = \begin{cases} -\dfrac{1}{8\pi}\,|x-y|^2\,(\log|x-y| - 1) & \text{for } d = 2,\\[2mm] \dfrac{1}{8\pi}\,|x-y| & \text{for } d = 3 \end{cases} \tag{22}$$


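is a solution of the Poisson equation

$$\Delta_y V^*(x,y) = U^*(x,y) \quad \text{for } x \ne y,$$

i.e., $V^*(x,y)$ is the fundamental solution of the Bi-Laplacian. By using Green's second formula, we can rewrite the volume integral for $u$ in (20) as follows:

$$\begin{aligned} \int_\Omega U^*(\tilde x, y)\,u(y)\,dy &= \int_\Omega [\Delta_y V^*(\tilde x, y)]\,u(y)\,dy \\ &= \int_\Gamma \frac{\partial}{\partial n_y}V^*(\tilde x, y)\,u(y)\,ds_y - \int_\Gamma V^*(\tilde x, y)\,\frac{\partial}{\partial n_y}u(y)\,ds_y + \int_\Omega V^*(\tilde x, y)\,[\Delta u(y)]\,dy \\ &= \int_\Gamma \frac{\partial}{\partial n_y}V^*(\tilde x, y)\,z(y)\,ds_y - \int_\Gamma V^*(\tilde x, y)\,t(y)\,ds_y - \int_\Omega V^*(\tilde x, y)\,f(y)\,dy. \end{aligned}$$

Therefore, we now obtain from (20) the modified representation formula

$$p(\tilde x) = \int_\Gamma U^*(\tilde x, y)\,q(y)\,ds_y + \int_\Gamma \frac{\partial}{\partial n_y}V^*(\tilde x, y)\,z(y)\,ds_y - \int_\Gamma V^*(\tilde x, y)\,t(y)\,ds_y - \int_\Omega U^*(\tilde x, y)\,\bar u(y)\,dy - \int_\Omega V^*(\tilde x, y)\,f(y)\,dy \tag{23}$$

for $\tilde x \in \Omega$, where the volume potentials involve given data only. Taking the limit $\Omega \ni \tilde x \to x \in \Gamma$, the representation formula (23) results in the boundary integral equation

$$(Vq)(x) = (V_1 t)(x) - (K_1 z)(x) + (N_0\bar u)(x) + (M_0 f)(x) \quad \text{for } x\in\Gamma, \tag{24}$$

where, as in (19), $V_1$, $K_1$ and $M_0$ are the potentials of the Bi-Laplacian, which are related to the fundamental solution (22), see [4, 5].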
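The relation $\Delta_y V^*(x,y) = U^*(x,y)$ between the two fundamental solutions can be checked numerically. The sketch below is only a plausibility check under stated assumptions: it evaluates the formulas for $U^*$ and $V^*$ as written above and approximates the Laplacian by second-order central differences with a hypothetical step size; the helper names are not from the paper.

```python
import math


def U_star(x, y):
    """Fundamental solution of the Laplacian for d = 2 or d = 3."""
    r = math.dist(x, y)
    if len(x) == 2:
        return -math.log(r) / (2.0 * math.pi)
    return 1.0 / (4.0 * math.pi * r)


def V_star(x, y):
    """Fundamental solution of the Bi-Laplacian as in (22)."""
    r = math.dist(x, y)
    if len(x) == 2:
        return -r * r * (math.log(r) - 1.0) / (8.0 * math.pi)
    return r / (8.0 * math.pi)


def laplacian_y(func, x, y, h=1e-3):
    """Central finite-difference Laplacian of func(x, .) at the point y."""
    total = 0.0
    for i in range(len(y)):
        y_plus = list(y)
        y_minus = list(y)
        y_plus[i] += h
        y_minus[i] -= h
        total += (func(x, y_plus) - 2.0 * func(x, y) + func(x, y_minus)) / (h * h)
    return total
```

For example, `laplacian_y(V_star, (0.0, 0.0), (0.3, 0.4))` should agree with `U_star((0.0, 0.0), (0.3, 0.4))` up to the discretization error of the difference quotient, and the same holds for points in three dimensions.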

4.1.3 Optimality Condition

We will now rewrite the optimality condition (9) by using a hypersingular boundary integral equation for the adjoint problem to obtain a symmetric boundary integral formulation for the coupled problem. To do so, we compute the normal derivative of the representation formula (23) and obtain a second boundary integral equation for $x\in\Gamma$,

$$q(x) = \left(\tfrac{1}{2}I + K'\right)q(x) - (D_1 z)(x) - (K_1' t)(x) - (N_1\bar u)(x) - (M_1 f)(x), \tag{25}$$

where $K'$ and $K_1'$ are the adjoint double layer potentials of the operators $K$ and $K_1$, respectively. In addition, $N_1$ and $M_1$ are Newton potentials and

$$(D_1 z)(x) = -\frac{\partial}{\partial n_x}\int_\Gamma \frac{\partial}{\partial n_y}V^*(x,y)\,z(y)\,ds_y \quad \text{for } x\in\Gamma$$

is the Bi-Laplace hypersingular boundary integral operator. For the mapping properties of these operators, see, e.g., [4, 5, 12]. Combining the optimality condition (9) and (25) gives a boundary integral equation for $x\in\Gamma$,

$$\varrho\,(Az)(x) = \left(\tfrac{1}{2}I + K'\right)q(x) - (D_1 z)(x) - (K_1' t)(x) - (N_1\bar u)(x) - (M_1 f)(x). \tag{26}$$

Now we are in a position to reformulate the primal Dirichlet boundary value problem (2), the adjoint Dirichlet boundary value problem (8), and the modified optimality condition (9) as the system of the boundary integral equations (19), (24), and (26),

$$\begin{pmatrix} -V_1 & V & K_1 \\ V & & -\tfrac{1}{2}I - K \\ K_1' & -\tfrac{1}{2}I - K' & \varrho A + D_1 \end{pmatrix} \begin{pmatrix} t \\ q \\ z \end{pmatrix} = \begin{pmatrix} N_0\bar u + M_0 f \\ -N_0 f \\ -N_1\bar u - M_1 f \end{pmatrix}. \tag{27}$$

Since the Laplace single layer potential $V$ is $H^{-1/2}(\Gamma)$-elliptic and therefore invertible, we can eliminate $t$ and $q$ from the second and the first equation. Hence it remains to solve the Schur complement system

$$\begin{aligned} &\left[\varrho A + D_1 + K_1' V^{-1}\left(\tfrac{1}{2}I + K\right) + \left(\tfrac{1}{2}I + K'\right)V^{-1}K_1 - \left(\tfrac{1}{2}I + K'\right)V^{-1}V_1V^{-1}\left(\tfrac{1}{2}I + K\right)\right] z \\ &\qquad = K_1' V^{-1}N_0 f - N_1\bar u - M_1 f + \left(\tfrac{1}{2}I + K'\right)V^{-1}\left[N_0\bar u + M_0 f - V_1V^{-1}N_0 f\right]. \end{aligned} \tag{28}$$

Theorem 2 ([9]). The composed boundary integral operator

$$\hat T_\varrho = \varrho A + D_1 + K_1' V^{-1}\left(\tfrac{1}{2}I + K\right) + \left(\tfrac{1}{2}I + K'\right)V^{-1}K_1 - \left(\tfrac{1}{2}I + K'\right)V^{-1}V_1V^{-1}\left(\tfrac{1}{2}I + K\right)$$

is self-adjoint, bounded, i.e., $\hat T_\varrho : H^{1/2}(\Gamma) \to H^{-1/2}(\Gamma)$, and $H^{1/2}(\Gamma)$-elliptic. Hence, by the Lax–Milgram theorem, the Schur complement system (28) admits a unique solution $z \in H^{1/2}(\Gamma)$.


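4.2 Galerkin Boundary Element Discretization

We consider a boundary element discretization of the system (27). Let

$$S_h^0(\Gamma) = \mathrm{span}\{\psi_k\}_{k=1}^{N_\Gamma} \subset H^{-1/2}(\Gamma), \qquad S_h^1(\Gamma) = \mathrm{span}\{\varphi_\ell\}_{\ell=1}^{M_\Gamma} \subset H^{1/2}(\Gamma)$$

be boundary element spaces of piecewise constant and piecewise linear basis functions $\psi_k$ and $\varphi_\ell$, which are defined with respect to some admissible and globally quasi-uniform boundary element mesh of mesh size $h$. The Galerkin discretization of the system (27) is to find $(t_h, q_h, \hat z_h) \in S_h^0(\Gamma)\times S_h^0(\Gamma)\times S_h^1(\Gamma)$ such that

$$\begin{aligned} -\langle V_1 t_h, w_h\rangle_\Gamma + \langle V q_h, w_h\rangle_\Gamma + \langle K_1\hat z_h, w_h\rangle_\Gamma &= \langle N_0\bar u + M_0 f, w_h\rangle_\Gamma, \\ \langle V t_h, \tau_h\rangle_\Gamma - \langle(\tfrac{1}{2}I + K)\hat z_h, \tau_h\rangle_\Gamma &= -\langle N_0 f, \tau_h\rangle_\Gamma, \\ \langle K_1' t_h, v_h\rangle_\Gamma - \langle(\tfrac{1}{2}I + K')q_h, v_h\rangle_\Gamma + \langle(\varrho A + D_1)\hat z_h, v_h\rangle_\Gamma &= -\langle N_1\bar u + M_1 f, v_h\rangle_\Gamma \end{aligned}$$

is satisfied for all $(w_h, \tau_h, v_h) \in S_h^0(\Gamma)\times S_h^0(\Gamma)\times S_h^1(\Gamma)$. This Galerkin formulation is equivalent to a system of linear equations,

$$\begin{pmatrix} -V_{1,h} & V_h & K_{1,h} \\ V_h & & -\tfrac{1}{2}M_h - K_h \\ K_{1,h}^\top & -\tfrac{1}{2}M_h^\top - K_h^\top & \varrho A_h + D_{1,h} \end{pmatrix} \begin{pmatrix} \underline t \\ \underline q \\ \hat{\underline z} \end{pmatrix} = \begin{pmatrix} \underline f_1 \\ \underline f_2 \\ \underline f_3 \end{pmatrix}, \tag{29}$$

where the entries of the block matrices of the linear system are defined corresponding to the boundary integral operators as used in (27), see [9, 12]. The Galerkin matrix $V_h$ is invertible due to the $H^{-1/2}(\Gamma)$-ellipticity of the single layer potential $V$. Eliminating $\underline t$ and $\underline q$, we obtain the Schur complement system of the boundary integral formulation (28)

$$\hat T_{\varrho,h}\,\hat{\underline z} = \underline f, \tag{30}$$

where the Schur complement is given by

$$\hat T_{\varrho,h} = \varrho A_h + D_{1,h} + K_{1,h}^\top V_h^{-1}\left(\tfrac{1}{2}M_h + K_h\right) + \left(\tfrac{1}{2}M_h^\top + K_h^\top\right)V_h^{-1}K_{1,h} - \left(\tfrac{1}{2}M_h^\top + K_h^\top\right)V_h^{-1}V_{1,h}V_h^{-1}\left(\tfrac{1}{2}M_h + K_h\right), \tag{31}$$

and the right hand side is

$$\underline f = \underline f_3 - K_{1,h}^\top V_h^{-1}\underline f_2 + \left(\tfrac{1}{2}M_h^\top + K_h^\top\right)V_h^{-1}\underline f_1 + \left(\tfrac{1}{2}M_h^\top + K_h^\top\right)V_h^{-1}V_{1,h}V_h^{-1}\underline f_2.$$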


Note that (31) represents the symmetric boundary element approximation of the operator in (6).

Theorem 3 ([9]). The approximate Schur complement $\hat T_{\varrho,h}$ as defined in (31) is positive definite, and therefore the system (29) is uniquely solvable. Let $z$ and $t$ be the unique solutions of the system (27), and let $\hat z_h \in S_h^1(\Gamma)$ and $t_h \in S_h^0(\Gamma)$ be the unique solutions of the Galerkin variational problem of (29). When assuming $z \in H^2(\Gamma)$ and $t \in H^1_{\mathrm{pw}}(\Gamma)$, i.e., $u \in H^{5/2}(\Omega)$, there hold the error estimates

$$\|z - \hat z_h\|_{H^{1/2}(\Gamma)} \le c(z, \bar u, f)\,h^{3/2}, \qquad \|t - t_h\|_{H^{-1/2}(\Gamma)} \le c(z, \bar u, f)\,h^{3/2}. \tag{32}$$

Applying the Aubin–Nitsche trick, we are able to derive the error estimates in $L_2(\Gamma)$,

$$\|z - \hat z_h\|_{L_2(\Gamma)} = O(h^2), \qquad \|t - t_h\|_{L_2(\Gamma)} = O(h). \tag{33}$$

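5 Numerical Example

As a numerical example, we consider the Dirichlet boundary control problem (1) and (2) for the domain $\Omega = (0, \tfrac{1}{2})^2 \subset \mathbb{R}^2$ where

$$\bar u(x) = -\left(4 + \frac{1}{\varrho}\right)\left[x_1(1 - 2x_1) + x_2(1 - 2x_2)\right], \qquad f(x) = -\frac{8}{\varrho}, \qquad \varrho = 0.01.$$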

For the finite and boundary element discretization, we introduce a uniform triangulation of $\Omega$ and a uniform decomposition of the boundary $\Gamma$ on several levels $L$, where the mesh size is $h_L = 2^{-(L+1)}$. In this case, the minimizer of (1) is not known. Therefore we use the approximate solution $z_h$ of the 9th level as reference solution. In Table 1, we present the errors for the control $z$ and the flux $t$ in the $L_2(\Gamma)$ norm and the estimated order of convergence (eoc). $N_\Omega$ is the number of triangles of the FE triangulation, while $N_\Gamma$ denotes the number of elements on the boundary.

Table 1 Finite and boundary element errors for the Dirichlet optimal control problem. The error columns are $\|z_{h_L} - z_{h_9}\|_{L_2(\Gamma)}$ (FEM), $\|\hat z_{h_L} - \hat z_{h_9}\|_{L_2(\Gamma)}$ (BEM) and $\|t_{h_L} - t_{h_9}\|_{L_2(\Gamma)}$ (BEM)

                        FEM                BEM
L   N_Omega   N_Gamma   err z       eoc    err z       eoc    err t    eoc
2        64        16   3.893e-01          2.246e+00          39.874
3       256        32   1.074e-01   1.86   4.655e-01   2.27   28.670   0.48
4     1,024        64   2.809e-02   1.93   8.836e-02   2.39   16.220   0.82
5     4,096       128   7.281e-03   1.95   1.631e-02   2.44    8.880   0.87
6    16,384       256   1.872e-03   1.96   3.019e-03   2.43    4.727   0.91
7    65,536       512   4.691e-04   2.00   5.731e-04   2.40    2.446   0.95
8   262,144     1,024   1.060e-04   2.15   1.243e-04   2.20    1.185   1.05
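The estimated orders of convergence in Table 1 can be recomputed from the listed errors. The text does not state the formula that was used; the sketch below assumes the usual definition $\mathrm{eoc} = \log_2(e_L / e_{L+1})$ for a mesh size that is halved from level to level, which reproduces the tabulated values up to rounding of the printed errors.

```python
import math


def eoc(errors):
    """Estimated order of convergence between successive levels (h halved)."""
    return [math.log2(e_coarse / e_fine)
            for e_coarse, e_fine in zip(errors, errors[1:])]


# Errors for levels L = 2, ..., 8 as listed in Table 1.
fem_z = [3.893e-01, 1.074e-01, 2.809e-02, 7.281e-03, 1.872e-03, 4.691e-04, 1.060e-04]
bem_z = [2.246e+00, 4.655e-01, 8.836e-02, 1.631e-02, 3.019e-03, 5.731e-04, 1.243e-04]
bem_t = [39.874, 28.670, 16.220, 8.880, 4.727, 2.446, 1.185]

for name, errs in [("FEM z", fem_z), ("BEM z", bem_z), ("BEM t", bem_t)]:
    print(name, [round(v, 2) for v in eoc(errs)])
```

Because the reference solution is itself a discrete solution on level 9, the estimated orders on the finest levels tend to be slightly optimistic.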


Both methods give comparable $L_2(\Gamma)$ errors for the approximation of the control $z$, as expected. For the boundary element discretization we could prove a convergence order of 2, which also corresponds to the best approximation error, whereas in the finite element case we are, up to now, only able to prove an order of 1.5.

6 Conclusions

We presented a finite and a boundary element approximation of a Dirichlet control problem utilizing the energy space $H^{1/2}(\Gamma)$. Both methods result in systems of linear equations which have a similar structure. This feature is well known from coupled problems and domain decomposition methods and allows the use of suitable preconditioned solution techniques in combination with fast boundary element methods [11]. Moreover, ongoing work is focused on the consideration of additional box constraints.

Acknowledgements This work has been supported by the Austrian Science Fund (FWF) under the Grant SFB Mathematical Optimization and Applications in Biomedical Sciences.

References

1. E. Casas, J. P. Raymond: Error estimates for the numerical approximation of Dirichlet boundary control for semilinear elliptic equations. SIAM J. Control Optim. 45 (2006) 1586–1611.
2. K. Deckelnick, A. Günther, M. Hinze: Finite element approximation of Dirichlet boundary control for elliptic PDEs on two- and three-dimensional curved domains. SIAM J. Control Optim. 48 (2009) 2798–2819.
3. M. Hinze, R. Pinnau, M. Ulbrich, S. Ulbrich: Optimization with PDE Constraints. Mathematical Modelling: Theory and Applications, vol. 23, Springer, Heidelberg, 2009.
4. G. C. Hsiao, W. L. Wendland: Boundary Integral Equations, Springer, Heidelberg, 2008.
5. B. N. Khoromskij, G. Schmidt: Boundary integral equations for the biharmonic Dirichlet problem on non-smooth domains. J. Integral Equation Appls. 11 (1999) 217–253.
6. K. Kunisch, B. Vexler: Constrained Dirichlet boundary control in L2 for a class of evolution equations. SIAM J. Control Optim. 46 (2007) 1726–1753.
7. S. May, R. Rannacher, B. Vexler: Error analysis for a finite element approximation of elliptic Dirichlet boundary control problems. Lehrstuhl für Angewandte Mathematik, Universität Heidelberg, Preprint 05/2008.
8. G. Of, T. X. Phan, O. Steinbach: An energy space finite element approach for elliptic Dirichlet boundary control problems. Berichte aus dem Institut für Numerische Mathematik, Bericht 2009/13, TU Graz, 2009.
9. G. Of, T. X. Phan, O. Steinbach: Boundary element methods for Dirichlet boundary control problems. Math. Methods Appl. Sci., published online, 2010.
10. G. Of, O. Steinbach: A fast multipole boundary element method for a modified hypersingular boundary integral equation. In: Analysis and Simulation of Multifield Problems (W. L. Wendland, M. Efendiev eds.), Lecture Notes in Applied and Computational Mechanics, vol. 12, Springer, Heidelberg, pp. 163–169, 2003.
11. S. Rjasanow, O. Steinbach: The Fast Solution of Boundary Integral Equations. Mathematical and Analytical Techniques with Applications to Engineering, Springer, 2007.
12. O. Steinbach: Numerical Approximation Methods for Elliptic Boundary Value Problems. Finite and Boundary Elements. Springer, New York, 2008.
13. B. Vexler: Finite element approximation of elliptic Dirichlet optimal control problems. Numer. Funct. Anal. Optim. 28 (2007) 957–973.


Application of High Performance Computational Fluid Dynamics to Nose Flow

I. Pantle and M. Gabi

Abstract Modern nasal surgery aims at improving airways for healthy, comfortable breathing. Presently available measurements are not sufficient to describe optimized shapes, particle transport and flow. Computational Fluid Dynamics (CFD), particularly Large Eddy Simulation (LES), might help to understand flow details and define standards. However, human nose flow challenges state-of-the-art CFD methods. It is generally not as clear and as well investigated as the technical flows for which CFD was originally developed. The challenging aspects are the following. First, the geometrical complexity of the nasal airways is much higher. Thin, long channels exist with multiple junctions and separations, three-dimensionally contorted, requiring high resolution. Second, flow conditions are an interaction of several physical phenomena and tend to challenge numerical schemes in terms of viscosity, Reynolds number, Mach number, wall roughness, turbulence, heat transfer, humidity, fluid-tissue interaction etc. Third, only few experimental data of sufficient accuracy are available for validation. When dealing with humans, either no standardized measurement conditions exist due to the uniqueness of each body, or standard measurement procedures of engineering type cannot be applied. This causes a lack of comparability, limiting conclusions for surgery. Within this contribution, exemplary flow simulations through a real nose geometry under average conditions are shown using an MPICH-parallelized, compressible Navier–Stokes scheme. The emphasis is on investigating fast, small-scale flow fluctuations near the regio olfactoria. The intention is to present a first step and to show in which direction developments must turn in order to perform reliable simulations. A 3D compressible CFD research code is used, which is developed at the Institute of Fluid Machinery, University of Karlsruhe, Germany.

I. Pantle M. Gabi Institute of Fluid Machinery, University of Karlsruhe, Baden-W¨urttemberg, Germany e-mail: [email protected]; [email protected]; [email protected] H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0 19, © Springer-Verlag Berlin Heidelberg 2012


1 Introduction

Clinical scanning techniques, e.g. Computer Tomography (CT), allow digital reconstructions of a patient's nasal geometry. The scans can be used to construct a simulation grid spanning the air flow domain, which is required for breath flow calculations. With such a procedure, comparative simulations of breath flow in varying nose geometries, and in particular in real nasal airways, can be performed, and even virtual surgery rooms seem possible. Figure 1 shows a face digitally reconstructed from CT scans which were taken for this project. The resolution corresponds to a voxel size of $0.35 \times 0.35 \times 0.8$ mm³. Figure 2 shows the block distribution of a block-structured grid which is used within this project. The procedure of creating this simulation grid from the CT data is described in more detail in Sect. 4.

CFD approaches, particularly LES, might help to understand the time-dependent flow behavior in complex geometries [2]. Human nasal flow belongs to a class of flows for which detailed investigations aiming at general conclusions on human breathing, smelling and particle absorption face severe difficulties. From an engineering point of view, the wide range of individual airway shapes has so far prevented conclusions on typical, standardized geometries for comfortable breathing and smelling. Measurements of engineering accuracy such as Particle Imaging Velocimetry (PIV) can be applied to optical models shaped into light-transmitting materials [7]. However, such models meet measurement requirements rather than real-life human conditions concerning fluid, materials, sizes, shapes etc. Given the difficulties which arise in transferring conclusions from model experiments to real nasal flow, other means of investigation are sought. LES seems to be a suitable method to provide the required insight. LES is a 3D time-dependent flow simulation method in which most of the turbulent energy spectrum is directly resolved. Smaller turbulent scales are filtered either by models or by the grid resolution [3]. Individual calculation domains can be delivered by medical scanning techniques. Such an approach will be demonstrated.

Fig. 1 Digitally reconstructed nose geometry out of axial CT data (label in figure: regio olfactoria, approx.)

Fig. 2 Digital geometry, flow domain and grid block structure of full nose (= 2/2 nose)

2 Long-Term Simulation Targets

2.1 Medical Targets

In order to obtain general and integral information on medically relevant questions, several long-term investigations are planned. These are described in the following paragraphs. Some have been achieved already; in these cases previous publications are referenced. Two major targets have been identified as clinically relevant:

• Confirmation that high frequency fluctuations can appear and can be resolved numerically in the upper nose region, which is the smelling region (regio olfactoria): These fluctuations are of medical relevance. It is expected that in this region the flow slows down and swirling appears, so that scent particles which are transported with the flow can easily be absorbed by the mucus.
• Full breathing cycle simulation: It is expected that a full breathing cycle (in- and exhalation) follows a significantly larger temporal regime than the fluctuations near the regio olfactoria. It is therefore a significant question whether simulations of a complete cycle that still resolve the small-scale fluctuations are feasible.

2.2 Steady State Simulations

General steady state simulations are necessary to check for grid dependency and to obtain an overview of the mean flow through the nasal airways and meatus. For these simulations it is sufficient to limit the geometry to one half of the nose and to place a symmetry boundary condition in the symmetry plane in the pharynx. In Figs. 3 and 4 one nose half is sketched together with the boundary conditions used for this simulation. Simulations of this kind have already been presented in [6].


Fig. 3 Digital geometry, flow domain and grid block structure of right half of nose, including numerical boundary conditions; view from left

Fig. 4 Digital geometry, flow domain and grid block structure of right half of nose, view from opposite side (from right)

2.3 Unsteady Simulations of 1/2 Nose

Unsteady calculations are performed on one half of a nose (see Figs. 3 and 4), using a symmetry plane condition where the flow channels would join (pharynx). This process is limited to inhalation. Average parallel computers can be used. However, the computational time increases significantly in comparison to steady calculations, since temporal dependencies are not clear. Temporal dependencies appear due to small fluctuations within a single inhalation process, locally triggered by the airway geometry. The main question at this point is the investigation of the flow fluctuations (laminar or turbulent) which are responsible for the smelling mechanism, i.e. the proper transport and absorption of scent molecules close to the regio olfactoria. There, high frequency fluctuations of small extent and a certain slowdown of the flow are expected.

2.4 Unsteady Simulations of 2/2 Nose

Unsteady calculations on both nose halves are intended to reduce the forcing influence of the symmetry plane, which was applied in the previous calculations as a numerical boundary condition close to the numerical outlet region of the pharynx. First calculations as in Sect. 2.3 led to the assumption that the symmetry plane might have a strongly stabilizing influence on the whole flow. The configuration is otherwise like the previous one. Computing time will be similar to the previous case. However, since the mesh is twice as large, about twice the number of processors is needed, and parallel computing on high performance machines is strongly advised.

Fig. 5 Breath take cycle duration and volume scheme (volume [l] over time [s]; total capacity 6.1 l, vital capacity 4.5 l, inspiratory reserve volume 3.0 l, breath take volume 0.5 l, expiratory reserve volume 1.0 l, residual volume 2.6 l)

2.5 Unsteady Simulations of a Breathing Cycle
Unsteady calculations of breathing cycles are performed as parameter studies. Here specific boundary conditions are required which provide, on one end, the periodically triggered in- and exhalation process and, on the other end, the respective response with a certain time delay not known in advance. These boundary conditions have to be developed. In order to assure proper communication of the numerical boundaries at both ends, the calculation domain has to be extended outwards, i.e. beyond the nose entry holes along the front of the face. Such calculations would be triggered by a defined, sinusoidal lung air pulsation. In consequence, the flow would reverse periodically at the facial part of the nose. Apart from a significant mesh extension, which increases the required computational effort, such simulations require long term calculations due to the rather long time scales of several seconds for one breathing cycle. One average breathing cycle of an adult lasts about 4 s (see Fig. 5). Several cycles (2–5) have to be computed before the simulation can be considered fully developed (beyond the transitional phase).

3 Simulations

3.1 Grid Generation
An axial sequence of CT data from a representative live human nose was obtained. Since the grey scale CT scans show a continuous transition from the air-filled nasal flow domain to tissue rather than a definite border, a pre-preparation of the scans by the co-operating radiologists became necessary in order to identify exclusively those regions relevant for the flow simulation. Figure 1 shows a full-face graph constructed out of the CT scans of the patient, roughly sketching the inner borders of the nose. Part of the pre-prepared CT sequence is depicted as black & white (b&w) graphs in Fig. 6. For further details on grid generation and alternatives, refer to [6]. The final nose geometry had an extension of about 13 cm from nose entry to pharynx. Figures 3 and 4 show the flow domain with the block structure of the grid of one half of the nose (right half) as generated by the mesh generator ICEMCFD (a commercial product of ANSYS). Figure 2 shows a sketch of the full nose as constructed for the calculations with both halves, eliminating the symmetry boundary condition at the pharynx. The mesh extension for the calculations intended in Sect. 2.5 would reach from the nose entries outwards along the face. This grid generation is still in progress, and accordingly these calculations have not yet been performed.

Fig. 6 Sequence extract of pre-prepared CT data of nasal airways

3.2 Solver Properties
The solver used (SPARC, a renamed and strongly improved version of KAPPA [5]) is an in-house development of the Inst. of Fluid Machinery of the University of Karlsruhe, Germany. It is an MPICH-parallelized, compressible, 3D Navier–Stokes scheme based on block-structured grids and the Finite Volume approach. It contains several simulation schemes: Direct Numerical Simulation, Large and Detached Eddy Simulation (DNS/LES/DES) with various sub-grid scale models, as well as unsteady and steady Reynolds-Averaged Navier–Stokes turbulence modeling (RANS based on algebraic, one- or two-equation models). For unsteady calculations, time explicit schemes (Runge–Kutta schemes of varying order) as well as a time implicit (Dual Time Stepping/DTS) scheme can be used. A model for simulating fluid driven tissue movement (fluid-structure interaction/FSI) was recently added using an open source structural code [8].

Application of High Performance Computational Fluid Dynamics to Nose Flow


For all simulations the temporal as well as the spatial schemes were of second order accuracy.

3.3 Unsteady Simulations: Sects. 2.3 and 2.4
An LES of the inhalation is performed with more than $10^6$ grid cells. The low Mach number ($Ma \approx 2 \cdot 10^{-3}$) requires pre-conditioning in order to prevent Eigenvalue mismatch and stiffness. Parallel LES calculations are performed on HPC platforms and Linux clusters. The simulations have been carried out on meshes with varying resolution for one nose half and for a full nose ($5 \cdot 10^6$ to $5.5 \cdot 10^6$ cells in the finest grid level for one nose half, double the numbers for both halves). For a full nose the meshes were mirrored and both halves were connected, eliminating the symmetry boundary condition. Each mesh consists of 4 grid levels, differing by a factor of 8 in the number of grid cells from one level to the next. The calculations have been performed on cluster PCs of the Inst. of Fluid Machinery of the University of Karlsruhe, Germany (coarser grid levels), and on the high performance computers HP XC6000 and HP XC4000 (fine grid levels) of the Steinbuch Center for Computing in Karlsruhe, Germany. Steady state simulations and grid dependency investigations have been performed previously [6].

For the unsteady simulations of the inhalation process (Sects. 2.3 and 2.4) a monotonically integrated LES (MILES) [3] was carried out. At the nose entry a constant mass flow boundary condition was set ($dm/dt = 0.1875 \cdot 10^{-3}$ kg/s, which corresponds to an inflow volume rate of about 150 ml/s); in the epipharynx the static atmospheric pressure was set. LES was the obvious choice due to its capability of predicting laminar to turbulent transitions. The identification of laminar and turbulent flow regions is a question of specifically medical interest for supporting the theory that a certain ratio of laminar and turbulent flow causes a patient’s breathing to be considered comfortable or not.

However, only weak turbulence is expected, and flow turbulence with larger vortices due to sudden geometry changes is expected rather than small scale wall boundary layer turbulence. A $y^+$ value of 1 was configured for the grid in order to resolve the wall boundary layers finely. The following unsteady calculations were performed, partly considered as tests and as such only performed as described in Sect. 2.3:
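The inlet values quoted in Sect. 3.3 can be cross-checked with a short calculation. The sketch below converts the stated mass flow into a volume flow; the air density of 1.25 kg/m^3 is an assumption made here for illustration and is not given in the text.

```python
# Cross-check of the inlet boundary condition quoted in Sect. 3.3.
# ASSUMPTION: air density of 1.25 kg/m^3 (not stated in the text).
RHO_AIR = 1.25  # kg/m^3, assumed


def volume_flow_ml_per_s(mass_flow_kg_per_s, rho=RHO_AIR):
    """Convert a mass flow [kg/s] into a volume flow [ml/s]."""
    m3_per_s = mass_flow_kg_per_s / rho
    return m3_per_s * 1.0e6  # 1 m^3 = 1e6 ml


mass_flow = 0.1875e-3  # kg/s, value from the text
print(round(volume_flow_ml_per_s(mass_flow), 1))
```

With the assumed density the result agrees with the quoted inflow volume rate of about 150 ml/s; a slightly different density shifts the value by a few percent.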

3.3.1 Without Specific Wall Heating
1. Due to the low Mach number ($Ma \approx 0.0022$), compressibility effects are small. Consequently, within the compressible CFD solver, a pre-conditioned compressible scheme is the suitable choice. However, the pre-conditioning destroys the real time step and can therefore only be applied for time implicit calculations. The purely incompressible scheme, also contained in the code, is presently restricted to steady state calculations. It is about to be extended to unsteady calculations but cannot yet be applied in this context.
2. Time explicit calculations have been performed for test reasons. Due to the low Mach number, a purely compressible scheme gets very stiff, and the solution degenerates. This is a result of the increasing Eigenvalue mismatch with decreasing Mach number. In order to reduce this mismatch (although not completely remove it) and to increase accuracy, non-conservative variables and differences to their mean values were solved for. This test was performed to get an idea of the relevant time scales and to be able to configure the time implicit scheme properly.

3.3.2 With Heated Wall
1. The time implicit scheme was used as described above.
2. In this case the wall boundary condition was configured with a constant temperature of $T = 37$ °C. A time step of $\Delta t = 5 \cdot 10^{-7}$ s proved to resolve the small scale fluctuations. This is about 100 times larger than the time step found for explicit simulations. Technically, a time implicit calculation would be configured with a much larger time step. As a result, however, the small scale fluctuations at the regio olfactoria would not be resolved, although they were a major question of Sects. 2.3 and 2.4. Therefore, this relatively small time step was set for the implicit time marching scheme. It led to a stable temporal resolution with about 65 convergent inner iterations in the finest grid level. Convergence is assumed when a convergence criterion is reached, defined as the number of decades by which the residuals of all variables have decreased. This number depends on the grid level. On the coarsest level all residuals must fall below $10^{-5}$ (decade number 5); with each step to a finer level the exponent increases by 0.5. So for the fourth grid level a convergence criterion of $10^{-3.5}$ is imposed, which all residuals must fall below.

With this time step it was possible to track distinct, periodic fluctuations of the flow quantities velocity (vector $u$ with 3 spatial components $u_i$) and pressure ($p$) within the inferior, middle and superior nasal meatus as well as at the nose isthmus and the regio olfactoria. An example sequence of full nose calculations showing fluctuations of the velocity component $u_1$ (also denoted $U$) along the $x$ direction (main nose flow direction) is depicted in Fig. 7. Note the fluctuating direction reversal (colors turn from red to blue and to red again) at the regio olfactoria (uppermost part over the turbinates). The corresponding frequency of those small scale fluctuations at the regio olfactoria turned out to be around 20,000 Hz. Selected LES simulations of one nose half were discussed in [4].
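The level-dependent convergence criterion and the temporal resolution described in Sect. 3.3.2 can be sketched as follows. The function names and the numbering of levels (1 = coarsest, 4 = finest) are assumptions made for illustration; they are not taken from the solver.

```python
# Residual tolerance per grid level as described in the text:
# 10^-5 on the coarsest level, exponent raised by 0.5 per finer level.
def residual_tolerance(level, coarsest_exponent=-5.0, step=0.5):
    """Tolerance for grid level `level` (1 = coarsest, assumed numbering)."""
    return 10.0 ** (coarsest_exponent + step * (level - 1))


def steps_per_period(frequency_hz, dt_s):
    """Number of time steps that resolve one fluctuation period."""
    return 1.0 / (frequency_hz * dt_s)


for level in range(1, 5):
    print(level, residual_tolerance(level))

# Fluctuations of about 20,000 Hz resolved with a time step of 5e-7 s
print(round(steps_per_period(20_000.0, 5e-7)))
```

Under these numbers one fluctuation period of about $5 \cdot 10^{-5}$ s is covered by roughly one hundred time steps, which is consistent with the time span quoted in the caption of Fig. 7.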


Fig. 7 Velocity component $U = u_1$ in direction along $x$ (main stream direction): fluctuations close to the regio olfactoria from positive to negative values of the order of $\pm 0.2$ m/s (direction reversal); time increasing from (a) to (e) by a total of approximately $5 \cdot 10^{-5}$ s

For calculations with walls of constant temperature, the highest vorticity values appeared at the nose entry and in the region of the choanae. Figure 8 shows the vorticity of the flow at one time step (snapshot) of the full nose calculation. The black boxes mark regions of high vorticity (reddish color). The wall boundary temperature condition was set to 37 °C.


Fig. 8 Vorticity of nose flow: $\omega = \nabla \times u$

3.4 Validation Issues
Presently, there is little detailed experimental data of technical quality against which to validate the simulations [9]. Measurements show an integral pressure difference between nose entry and pharynx of approximately 150 Pa [4]. This value was used for all simulations regarding Sects. 2.2 to 2.4. It was inherently included in the configuration of the inlet and outlet boundary conditions, which were both of compressible type. Previous simulations, e.g. in [6], showed that this value was matched fairly closely. As for grid dependency, the steady state simulations served as reference. Since the code contains a full multi-grid scheme [1], the simulations were carried out starting on a coarse grid level, finding a converged solution there. This solution is interpolated to the next finer grid level, where again a multi-grid scheme is applied and a converged solution is found. In this way, solutions on successively finer grids were found over four grid levels in total, with a cell refinement of the domain by a factor of 8 from one grid level to the next. A first version of the simulation grid showed poor resolution in the region between nasal entry and nasal isthmus [4]. A position-steady vortex appeared which almost blocked the complete isthmus. This was considered rather unrealistic though not generally impossible. However, the nose geometry was considered a rather good nose from a medical point of view, so that normal function had to be assumed. Thereafter, the grid was improved in this area. Differences between the second-finest and finest grid levels were reduced and the position-steady vortex disappeared. Another measurable quantity, the nose resistance, cannot yet be compared at this stage of the simulations. The reason is that several physical phenomena related to the wall texture and function (humidity, roughness) were neglected.
So far, the existence of high frequency fluctuations close to the regio olfactoria could be resolved. An advantage of simulating the inhalation using 2/2 of the nose instead of 1/2 of the nose could not be observed. The full nose was used in order to eliminate the symmetry boundary condition in the pharynx applied in the 1/2 nose case. Contrary to expectation, however, the 2/2 nose simulation did not show significant asymmetry of flow between the 2 halves and in the geometrically undisturbed pharynx (see Fig. 7).
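The grid hierarchy used for the multi-grid runs in Sect. 3.4 (four levels, a factor of 8 in cell count between levels) can be tabulated with a few lines. This is only an arithmetic sketch; the finest-level count of $5 \cdot 10^6$ cells is the lower value quoted in Sect. 3.3, and the function name is hypothetical.

```python
def cells_per_level(finest_cells, levels=4, factor=8):
    """Approximate cell counts from the coarsest to the finest grid level."""
    return [finest_cells // factor ** (levels - 1 - k) for k in range(levels)]


# One nose half, finest level with about 5e6 cells (value from Sect. 3.3)
for level, cells in enumerate(cells_per_level(5_000_000), start=1):
    print(level, cells)
```

A factor of 8 corresponds to halving the cell spacing in each of the three index directions of a block-structured grid, so the coarsest level holds only on the order of ten thousand cells.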

3.5 Outlook Unsteady Simulations: Sect. 2.5
Unsteady calculations for complete breathing cycles are presently being prepared. The required grid extension is almost finished: a grid is needed which extends outwards from the nose entries in front of the face. The boundary conditions are under development. The future task includes parameter studies for different geometries. The computational time will increase significantly due to the larger grid and the additional time needed for the simulation of a minimum number of breathing cycles. In order to compensate for this, the medical questions will be revised and limited to the breathing cycle instead of the small scale fluctuations at the regio olfactoria. In this way, the time step can be increased to a value suitable for time implicit schemes. The major subject of investigation is then specifically whether an asymmetric flow between the two nose channels develops, as it does in reality. The breathing cycles of a patient’s nose flow show alternatingly different peak mass flow rates at the nose entries, which sum up to an overall mean peak mass flow rate of approximately the already mentioned 150 ml/s [9].

4 Computational Performance
The simulations on the second-finest and the finest grid level were performed either on the Linux PC cluster of the Inst. of Fluid Machinery or on the Opteron processors of the HP XC4000 of the SCC Karlsruhe, both University of Karlsruhe, Germany. The calculations were generally distributed over 17 or 20 processors, each processor holding about 20 blocks out of a total of 200 (half nose) or 400 (full nose), respectively. These distributions led to a load balancing (compare Fig. 9) of between 97% (17 processors) and 78% (20 processors). On the finest grid, calculations took at least one week for a first check of the results and about a month (about 700 h) for a full calculation including the writing of visualization data. The latter represents about 1,000 implicit time steps. The computational performance, i.e. the ratio of pure calculation time (without communication and writing) to total calculation time, was found to be close to 96% (17 processors) or 91% (20 processors). For future investigations a much larger mesh (more than double the domain size) with an overall higher resolution will be used. It will be restructured for better load balancing (about 95%) using more processors.

Fig. 9 Load balancing for different processor numbers for simulations using the full nose mesh of 400 blocks
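The run times quoted in Sect. 4 indicate why the time step has to be increased for full breathing cycles (Sect. 3.5). The following back-of-the-envelope sketch extrapolates linearly from the reported figures; it assumes an unchanged cost per time step and ignores the larger mesh and the higher processor count planned for the future runs.

```python
# Figures taken from the text
WALL_HOURS = 700.0      # about one month for a full calculation
IMPLICIT_STEPS = 1000   # about 1,000 implicit time steps
DT = 5e-7               # s, time step from Sect. 3.3.2
CYCLE = 4.0             # s, one breathing cycle (Sect. 2.5)

# Derived quantities (linear extrapolation, an assumption of this sketch)
hours_per_step = WALL_HOURS / IMPLICIT_STEPS
simulated_time = IMPLICIT_STEPS * DT     # physical time covered so far [s]
steps_per_cycle = CYCLE / DT             # steps for one cycle at the same dt
hours_per_cycle = steps_per_cycle * hours_per_step

print(f"wall time per step : {hours_per_step:.2f} h")
print(f"simulated time     : {simulated_time:.1e} s")
print(f"steps per cycle    : {steps_per_cycle:.1e}")
print(f"wall time per cycle: {hours_per_cycle:.1e} h")
```

Even as a rough estimate, the gap of several orders of magnitude makes clear that breathing-cycle simulations are only practical with a much larger implicit time step.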

5 Conclusions and Outlook
These simulations demonstrate the basic capability of highly resolving CFD methods to show the swirling of the inspired air near the olfactory cleft. Still, there are open questions. Some physical effects like humidity are not yet included. The boundary conditions do not fully represent all mechanisms of the flow behavior. As for the wall boundary condition, calculations with heat transfer exist, but wall roughness is not yet specified. Though already contained in the code, no fluid driven tissue movement (FSI) was allowed. The pressure condition at the pharynx and the mass flow inlet condition are suitable for general simulations in which the focus is on flow reasonably far away from these boundaries, such as at the regio olfactoria. As mentioned in Sect. 2.5, a simulation of a real nose flow, triggered by sinusoidal lung impulses, must not impose strict boundary conditions where the real flow conditions tend to be much less strict, i.e. at the nose entries and the pharynx. For understanding a real nasal flow, the step in Sect. 2.5 is required with reasonable spatial and temporal resolution. This, in turn, is not likely to suit the quick analysis in virtual surgery rooms proposed from time to time within the community, since the calculation times will increase significantly. However, with newly developed and efficient numerical methods, virtual surgery rooms and optimized geometry prediction for physiologically optimal breathing might become feasible in the future. Numerical simulation methods are, as a matter of fact, always approximations of reality. Under this constraint it is of high importance to find suitable means of validating these methods, in particular for this type of flow. All models used were developed and calibrated for technical flows with less complex geometries and fewer multi-physical phenomena. In order to obtain deeper insight and at the same time to overcome the lack of detailed validation data, extensive numerical parameter studies (geometrical or flow condition variations) might be performed in the future, and suitable means of statistical analysis might be applied to them.

Acknowledgements This project is an interdisciplinary co-operation with the Otorhinolaryngological Clinic and the Clinic of Diagnostical Radiology of the Medical Faculty of the University of Halle, Germany. The authors would like to express their special thanks to the doctors E.-J. Haberland, S. Knipping, K. Neumann, M. Knoergen and K. Stock.


References
1. J. H. Ferziger and M. Perić. Computational Methods for Fluid Dynamics. Springer Berlin, 2002.
2. A. Gerndt, T. van Reimersdahl, T. Kuhlen, C. Bischof, I. Hörschler, M. Meinke, and W. Schröder. Large-Scale CFD Data Handling in a VR-Based Otorhinolaryngological CAS-System using a Linux-Cluster. Journal of Supercomputing, 25:143–154, 2003.
3. F. F. Grinstein and C. Fureby. From Canonical to Complex Flows: Recent Progress on Monotonically Integrated LES. Journal of Computing in Science and Engineering, 6:36–49, 2004.
4. E. J. Haberland, I. Pantle, S. Knipping, M. Knoergen, K. Stock, and K. Neumann. Anwendung von Large-Eddy-Simulation auf die Luftströmung in einem aus CT-Daten generierten nasalen Strömungsraum. In 79. Jahrestagung der Deutschen Gesellschaft für Hals-Nasen-Ohren-Heilkunde, Kopf- und Halschirurgie, Bonn, Germany, 2008. http://www.egms.de/en/meetings/hnod2008/08hnod573.shtml.
5. F. Magagnato. KAPPA - Karlsruhe Parallel Program for Aerodynamics. TASK Quarterly Scientific Bulletin of Academic Computer Centre in Gdansk, 2(2):215–270, 1998.
6. I. Pantle, U. Serra, and M. Gabi. Flow and Acoustic Calculations in a Human Nose. In 8th International Symposium on Experimental and Computational Aerothermodynamics of Internal Flows (ISAIF8), Lyon, France, 2007.
7. K. I. Park, C. Brückner, and W. Limberg. Experimental study of velocity fields in a model of human nasal cavity by DPIV. In B. Ruck et al., editor, Proceedings 7th International Conference on Laser Anemometry, Advances and Applications, Karlsruhe, Germany, pages 617–626, 1997.
8. F. Rieg. Z88 - Das kompakte Finite Elemente System, Version 12.0. http://www.z88.de/, 2006.
9. I. Weinhold and G. Mlynski. Numerical simulation of airflow in the human nose. European Archives of Oto-Rhino-Laryngology, 261(8):452–455, 2004.


MaxNet and TCP Reno/RED on Mice Traffic

Khoa T. Phan, Tuan T. Tran, Duc D. Nguyen, and Nam Thoai

Abstract Congestion control is a distributed algorithm to share network bandwidth among competing users on the Internet. In the common case, a quick response time is desired for mice traffic (HTTP traffic) when it is mixed with elephant traffic (FTP traffic). The current loss-based approach using Additive Increase, Multiplicative Decrease (AIMD) is too greedy, and eventually most of the network bandwidth is consumed by elephant traffic. As a result, it causes longer response times for mice traffic because there is no room left at the routers. MaxNet is a new TCP congestion control architecture that uses an explicit signal to control the transmission rate at the source node. In this paper, we show that MaxNet controls the queue length at routers well, and therefore the response time of HTTP traffic is several times shorter than with TCP Reno/RED.

Keywords Active queue management • TCP congestion control • MaxNet

1 Introduction
TCP Reno [Jac90] uses the AIMD mechanism [Jac88], in which the sending rate is increased until packet loss happens. To avoid buffer overflows at routers, AQM RED (Active Queue Management, Random Early Detection) [FJ93] can be used in conjunction with TCP Reno. The weakness of RED is that it does not take into account the number of incoming flows arriving at a bottleneck link when acting to relieve a heavy load. When a large number of flows share a bottleneck link, the offered load will not be decreased, and the queue length at the router does not change either. This results in longer response times for the mice traffic. MaxNet [SAW08, MRB05, BLM03] is a new congestion control mechanism that uses a multi-bit signal instead of packet loss to control the sending rate. Besides, a MaxNet router can control the magnitude of transient queues well regardless of the number of newly arriving flows. In other words, MaxNet can always keep free space at routers for mice traffic to fly through. As a result, the response time to HTTP requests is much shorter with MaxNet than with TCP Reno/RED.

The rest of this paper is structured as follows. Section 2 gives a theoretical analysis of the queueing delay of RED and MaxNet routers. Section 3 shows how efficiently MaxNet controls the transfer rate of mice flows. Experiments and evaluations are presented in Sect. 4. Finally, we present conclusions and future work in Sect. 5.

K.T. Phan, T.T. Tran, D.D. Nguyen, N. Thoai
Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology, Vietnam
e-mail: [email protected]; [email protected]

H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0 20, © Springer-Verlag Berlin Heidelberg 2012

2 Equilibrium Queueing Delay at RED and MaxNet Router

2.1 Queueing Delay at RED Router
RED routers calculate the average queue length and compare it with two parameters: the maximum threshold and the minimum threshold. Based on this comparison, a RED router operates in three modes [FJ93]:
– “No dropped”: when the average queue length is less than the minimum threshold, the router assumes that its link is under-utilized. As a result, all packets are allowed to go through without marking or dropping.
– “Probabilistic dropped”: when the average queue length is between the minimum and maximum thresholds, the router assumes that the network may be saturated. It then marks/drops packets with a probability corresponding to the traffic load.
– “Forced dropped”: when the average queue length is greater than the maximum threshold, all packets that go through the router are marked/dropped to reduce the heavy load on the link.

From the flow-level model of AIMD, the window size of TCP Reno is updated using the following equation [WJL06]:

$$w_i'(t) = \frac{1}{T_i(t)} - \frac{2}{3}\, x_i(t)\, q_i(t)\, w_i(t) \qquad (1)$$

where $T_i(t)$ is the round-trip time; $x_i(t) = w_i(t)/T_i(t)$ (packets/s); $w_i(t)$ is the current window size and $q_i(t)$ is the end-to-end loss probability. At the equilibrium point, the window size adjustment is $w_i'(t) = 0$; then, from (1), the end-to-end equilibrium mark/drop probability fed back to source $i$ is derived as follows:

$$q_i = \frac{3}{2 w_i^2} > 0 \qquad (2)$$

Fig. 1 Queueing delay of RED routers (diagram: elephant traffic and mice traffic passing through two RED routers)

Equation (2) implies that, at the equilibrium point, the end-to-end mark/drop probability of a source must be greater than zero. As a result, from the marking scheme of RED, it can be asserted that each router on the end-to-end path always keeps a backlog. This consequently causes an inevitable queueing delay for mice traffic such as HTTP requests flying through. Figure 1 illustrates two RED routers which always maintain a backlog at the equilibrium point.
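The step from the Reno window dynamics to the equilibrium mark/drop probability in Sect. 2.1 can be checked numerically. The sketch below implements the flow-level model $w' = 1/T - (2/3)\, x\, q\, w$ with $x = w/T$ and the equilibrium value $q = 3/(2 w^2)$; the window size and round-trip time used in the example are hypothetical values chosen for illustration.

```python
def window_rate_of_change(w, q, rtt):
    """Flow-level Reno model: dw/dt = 1/T - (2/3) * x * q * w, with x = w/T."""
    x = w / rtt
    return 1.0 / rtt - (2.0 / 3.0) * x * q * w


def equilibrium_loss_probability(w):
    """Equilibrium mark/drop probability: q = 3 / (2 * w^2)."""
    return 3.0 / (2.0 * w * w)


# HYPOTHETICAL operating point: window of 20 packets, RTT of 20 ms
w, rtt = 20.0, 0.02
q = equilibrium_loss_probability(w)
print(q, window_rate_of_change(w, q, rtt))
```

At the equilibrium probability the rate of change vanishes up to rounding error, and the probability itself is strictly positive for any finite window, which is the reason a RED router in this model cannot run with an empty queue.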

2.2 Queueing Delay at MaxNet Router
The marking mechanism of a MaxNet router uses an explicit multi-bit signal instead of marking/dropping packets as a RED router does. The congestion price $p_l$ at a MaxNet router is defined in [SAW08]:

$$p_l(t + dt) = p_l(t) + dt\, \frac{y_l(t) - \mu_l C_l}{C_l} \qquad (3)$$

where $y_l(t)$ is the aggregated rate at link $l$; $C_l$ is the link capacity and $\mu_l$ is the target link utilization. In a MaxNet router, at the equilibrium point, the link price adjustment (3) tries to match the aggregated input rate $y_l(t)$ with $\mu_l C_l$, leaving spare capacity $(1 - \mu_l) C_l$ to absorb mice traffic and reduce the queueing delay. Figure 2 illustrates the queueing delay of two MaxNet bottleneck links at the equilibrium point. In contrast to RED routers, there is no backlog in either of the two MaxNet routers when the target link utilization is set to $\mu_l$ with $0 < \mu_l < 1$; hence, mice traffic can fly through the links without being blocked.
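The price update of Sect. 2.2 can be illustrated with a single update step, $p(t+dt) = p(t) + dt\,(y - \mu C)/C$. This is a minimal sketch, not the MaxNet implementation: the function name, the time increment and the starting price are hypothetical, while the capacity of 10 Mbps and the target utilization of 0.95 are the testbed values given in Sect. 4.1.

```python
def price_step(p, y, capacity, mu, dt):
    """One price update: p + dt * (y - mu * C) / C."""
    return p + dt * (y - mu * capacity) / capacity


C = 10e6    # link capacity [bit/s], testbed value
MU = 0.95   # target link utilization, testbed value
DT = 0.01   # update interval [s], HYPOTHETICAL
p = 1.0     # current price, HYPOTHETICAL

print(price_step(p, MU * C, C, MU, DT) == p)   # input rate at the target
print(price_step(p, C, C, MU, DT) > p)         # input rate above the target
print(price_step(p, 0.5 * C, C, MU, DT) < p)   # input rate below the target
```

The price is stationary only when the aggregate rate equals $\mu C$, so at equilibrium a fraction $1 - \mu$ of the capacity stays unused and is available for short flows.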

3 Magnitude of Transient Queue of RED and MaxNet Routers
As pointed out in [FKS99, LAJ03], the weakness of RED is that it does not take into account the number of flows arriving at a bottleneck link when acting to avoid heavy load. Assume that $n$ flows share a bottleneck link. If a packet is marked or dropped, then the offered load is reduced by a factor of $(1 - 0.5 n^{-1})$. When $n$ is large, $(1 - 0.5 n^{-1}) \to 1$, which means the offered load will not be decreased and the queue length does not change either. That means that if RED is not configured aggressively, marking a single packet could result in simple “droptail” packet drops. The packet loss then severely reduces the throughput of Reno sources due to their AIMD mechanism.

Fig. 2 Queueing delay of MaxNet routers (diagram: elephant traffic and mice traffic passing through two MaxNet routers)

A MaxNet router controls the magnitude of the transient queue well regardless of the number of newly arriving flows. At the equilibrium point, when the number of newly arriving flows is small, transient queues exist, but the magnitude of these queues decreases rapidly as the number of flows increases [SAW08]. This can be explained with the following simple case: assume that $N$ flows share a bottleneck link. At the equilibrium point, each flow transmits at the rate $\mu_l C_l / N$. Thus, when a new flow joins, its advertised rate is at most $\mu_l C_l / N$. The aggregated arrivals at the router are at most $\mu_l C_l + \mu_l C_l / N$. This causes the overload:

$$0 \le \text{overload} \le \left( \left( 1 + \frac{1}{N} \right) \mu_l - 1 \right) C_l \qquad (4)$$

Obviously, the larger $N$ is, the smaller the magnitude of the transient queue becomes, and eventually, when $N > \mu_l / (1 - \mu_l)$, the upper bound in (4) is no longer positive and the transient queue size drops to zero. As mice traffic consists of short-lived flows, an effective congestion control should quickly control the rate of such flows to avoid uncontrolled overshoot and transient queues. Unlike the SlowStart mechanism of Reno [Jac88], a MaxNet source employs the MaxStart mechanism [SAW08] to seek the target rate in its initial state (Fig. 3). By adopting the multi-bit explicit signaling mechanism, MaxStart enables a source to reach its target rate within a significantly shorter duration. A new MaxNet flow is initiated at the minimum spare capacity of all links on its end-to-end path and then ramped up linearly to the advertised rate over two round-trip times [SAW08]. Therefore a MaxNet source converges to the target rate more quickly than a Reno source.
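The two scaling arguments of Sect. 3 can be compared numerically. The sketch below evaluates the RED load factor $1 - 0.5/n$ and the MaxNet overload bound $((1 + 1/N)\,\mu - 1)\,C$, clipped at zero, for a few flow counts. The function names are hypothetical; the capacity and target utilization are the testbed values from Sect. 4.1.

```python
def red_load_factor(n):
    """Factor applied to the offered load when one of n flows halves its rate."""
    return 1.0 - 0.5 / n


def maxnet_overload_bound(n, mu, capacity):
    """Upper bound on the overload: ((1 + 1/N) * mu - 1) * C, clipped at zero."""
    return max(0.0, ((1.0 + 1.0 / n) * mu - 1.0) * capacity)


MU, C = 0.95, 10e6  # target utilization and capacity [bit/s], testbed values
for n in (1, 5, 10, 50):
    print(n, round(red_load_factor(n), 3), round(maxnet_overload_bound(n, MU, C)))
```

As the number of flows grows, the RED factor approaches 1 (a single mark barely changes the load), while the MaxNet bound shrinks and reaches zero once the number of flows exceeds $\mu / (1 - \mu)$, which is 19 for a target utilization of 0.95.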

Fig. 3 SlowStart and MaxStart (two panels, SlowStart and MaxStart: congestion window versus time)
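The MaxStart behaviour described in Sect. 3 (start at the minimum spare capacity along the path, then ramp up linearly to the advertised rate over two round-trip times) can be sketched as a simple rate function. This is an illustration under stated assumptions, not the MaxNet source code: the function name, its arguments and the numbers in the example are hypothetical.

```python
def maxstart_rate(t, rtt, spare_capacities, advertised_rate):
    """Sending rate of a new flow t seconds after it starts (linear ramp over 2 RTT)."""
    start = min(spare_capacities)   # minimum spare capacity on the path
    ramp = 2.0 * rtt                # ramp duration: two round-trip times
    t = max(t, 0.0)
    if t >= ramp:
        return advertised_rate
    return start + (advertised_rate - start) * (t / ramp)


# HYPOTHETICAL path with two links and a 20 ms round-trip time
spare = [0.5e6, 0.8e6]   # spare capacity per link [bit/s]
for t in (0.0, 0.02, 0.04, 0.1):
    print(t, maxstart_rate(t, 0.02, spare, 2.0e6))
```

In this sketch the flow reaches its advertised rate after a fixed two round-trip times, whereas the time SlowStart needs depends on how many window doublings are required.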

Fig. 4 MaxNet networks testbed (diagram labels: FTP source, HTTP source, MaxNet router 1, Dummynet, MaxNet router 2, FTP sink, HTTP sink; link rates 100 Mbps, 10 Mbps, 100 Mbps)

4 Experiment and Evaluation

4.1 Testbed Layout
In this testbed, Pentium IV PCs (CPU 1.8 GHz, 256 MB RAM, 100 Mbps network cards) are used. MaxNet router 1 is configured with an output capacity of 10 Mbps to make sure it is the bottleneck link. The target utilization $\mu_l$ of both MaxNet routers is set to 0.95. Dummynet [DUM] is configured with a 20 ms RTT delay. The Reno/RED testbed is the same as the MaxNet testbed, with the MaxNet routers replaced by RED routers. All RED routers are configured with the RED parameters for web traffic [FKS99] as follows:
– $w_q = 0.002$: weighting factor for computing the average queue size, as suggested in [CH04] for web traffic.
– $q_{avg} = (1 - w_q)\, q_{avg} + w_q\, q$, where $q$ is the instantaneous queue size.
– $min_{th} = 30$: average queue length threshold for triggering probabilistic drops/marks.
– $max_{th} = 90$: average queue length threshold for triggering forced drops/marks.
– $max_p = 0.1$: maximum mark/drop probability.

We simulated elephant traffic and mice traffic with the iperf [IPE] and httperf [HTT] tools. For both the Reno/RED and the MaxNet network, one long-lived FTP connection is generated at the “FTP source” for approximately 60 s. Then, 20 s later, HTTP connections are generated at the “HTTP source”. Each HTTP connection sends one request of 62 bytes, and the HTTP response size is 4 KB. The HTTP response time is computed at the application layer as the duration from the moment the first byte of the request is sent to the moment the first byte of the response is received.

Fig. 5 Response time of TCP Reno vs MaxNet with 200 HTTP connections (plot: cumulative probability, 0 to 1, versus response time in ms, 0 to 360; curves: Reno/RED and MaxNet)
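The RED configuration listed in Sect. 4.1 can be sketched as two small functions: the exponentially weighted average of the queue size and the mark/drop probability for the three operating modes of Sect. 2.1. The linear ramp between the thresholds follows the basic RED scheme of [FJ93]; the count-based correction of the full algorithm is omitted, and the function names are hypothetical.

```python
# Parameter values from the RED configuration in Sect. 4.1
W_Q, MIN_TH, MAX_TH, MAX_P = 0.002, 30.0, 90.0, 0.1


def update_average(q_avg, q_inst, w_q=W_Q):
    """EWMA of the queue size: q_avg = (1 - w_q) * q_avg + w_q * q."""
    return (1.0 - w_q) * q_avg + w_q * q_inst


def mark_probability(q_avg, min_th=MIN_TH, max_th=MAX_TH, max_p=MAX_P):
    """Mark/drop probability for the three RED modes (simplified sketch)."""
    if q_avg < min_th:
        return 0.0      # "no dropped"
    if q_avg > max_th:
        return 1.0      # "forced dropped"
    # "probabilistic dropped": linear ramp from 0 to max_p
    return max_p * (q_avg - min_th) / (max_th - min_th)


for q_avg in (10.0, 60.0, 100.0):
    print(q_avg, mark_probability(q_avg))
```

With a weighting factor as small as 0.002 the average reacts slowly to bursts, so a short burst of HTTP requests changes the marking probability very little while it still has to wait behind the standing queue.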

4.2 Response Time of HTTP Connections
We adopted the cumulative probability for the statistical analysis of the response time of HTTP requests. In Fig. 5, the response time of HTTP requests in MaxNet is significantly shorter than in TCP Reno/RED. In particular, in the experiment with 200 HTTP requests spawned, 100% of the MaxNet HTTP requests receive the reply within at most 135 ms, while in TCP Reno/RED only 30% of the HTTP requests receive the first reply in less than 135 ms.
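The cumulative probability used in Sect. 4.2 is the empirical distribution function of the measured response times. A minimal sketch of how such a value is obtained is given below; the sample values are hypothetical and serve only to illustrate the calculation, they are not measurements from the testbed.

```python
def empirical_cdf(samples, threshold):
    """Fraction of samples that are less than or equal to threshold."""
    if not samples:
        raise ValueError("no samples")
    return sum(1 for s in samples if s <= threshold) / len(samples)


# HYPOTHETICAL response times in ms, for illustration only (not measured data)
response_times_ms = [42.0, 58.0, 61.0, 97.0, 120.0, 134.0, 180.0, 240.0, 310.0, 355.0]
print(empirical_cdf(response_times_ms, 135.0))
```

Evaluating this function on a grid of thresholds yields a curve of the kind plotted in Fig. 5.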

4.3 Throughput of Elephant Flow
In Figs. 6 and 7, packet drops occur at the RED bottleneck link, and thus the throughput of Reno/RED decreases. The greater the number of connections, the more severe the drop becomes. In contrast, the throughput of the elephant flow in MaxNet networks is not affected, regardless of the number of arriving flows.

Fig. 6 Throughput of elephant traffic when 50 HTTP flows join (plot: throughput in bytes/sec versus time in s, 0 to 80 s; curves: MaxNet and Reno/RED)

Fig. 7 Throughput of elephant traffic when 200 HTTP flows join

4.4 Transient Queue
In this section, we compare the transient queue size of RED and MaxNet routers (Figs. 8 and 9). In this experiment, we configure two identical 100 Mbps links and keep the other configurations and parameters of the MaxNet and RED routers the same as in the above experiments. Under the MaxStart mechanism, a transient queue occurs only for a short duration when HTTP connections join, whereas the RED router keeps a backlog all the time.

Fig. 8 Backlog at RED router

Fig. 9 Backlog at MaxNet router

5 Conclusions and Future Work

At the equilibrium point, MaxNet clears the buffer, whereas Reno/RED always keeps a backlog in the routers. Therefore, when elephant traffic is mixed with mice traffic, MaxNet gives mice traffic a shorter response time than TCP Reno/RED. If the number of arriving mice flows is large, Reno without proper treatment can cause packet loss, which in turn degrades the throughput of elephant traffic. In addition, the MaxStart mechanism of MaxNet (using multi-bit signaling) drives mice flows to the target rate more quickly than Reno sources. Our experiments showed that mice and elephant traffic perform better with MaxNet than with TCP Reno/RED in terms of response time and network utilization. In future work, more experiments should be conducted with realistic web workloads, and other performance properties such as fairness and TCP friendliness should be evaluated.

Acknowledgements The authors would like to thank Bartek Wydrowski, Lachlan Andrew and the MaxNet team [TEAM] for their advice and support.


References

[BLM03] Bartek Wydrowski, Lachlan L.H. Andrew, Moshe Zukerman, “MaxNet: A Congestion Control Architecture for Scalable Networks”, IEEE Communications Letters, Oct. 2003.
[CH04] M. Claypool, R. Kinicki, M. Hartling, “Active Queue Management for Web Traffic”, Proceedings of IPCCC’04, 2004.
[DUM] http://info.iet.unipi.it/luigi/ip -dummynet/.
[FJ93] Sally Floyd and Van Jacobson, “Random Early Detection Gateways for Congestion Avoidance”, IEEE/ACM Transactions on Networking, 1993.
[FKS99] Wu-Chang Feng, Dilip D. Kandlur, Debanjan Saha and Kang G. Shin, “A Self-Configuring RED Gateway”, Proceedings of INFOCOM’99, March 1999.
[HTT] http://www.hpl.hp.com/research/linux/httperf/.
[IPE] http://noc.pregi.net/iperf.html.
[Jac88] Van Jacobson, “Congestion Avoidance and Control”, in Proceedings of SIGCOMM’88, August 1988.
[Jac90] Van Jacobson, “Berkeley TCP Evolution from 4.3-Tahoe to 4.3-Reno”, in Proceedings of the 18th Internet Engineering Task Force, August 1990.
[LAJ03] Long Le, Jay Aikat, Kevin Jeffay, F. Donelson Smith, “The Effects of Active Queue Management on Web Performance”, Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, SIGCOMM’03.
[MRB05] Martin Suchara, Ryan Witt and Bartek Wydrowski, “TCP MaxNet - Implementation and Experiments on the WAN in Lab”, in Proceedings of the IEEE International Conference on Networks (ICON), 2005.
[SAW08] M. Suchara, L. Andrew, R. Witt, K. Jacobsson, B. Wydrowski and S. Low, “Implementation of Provably Stable MaxNet”, Proceedings of BROADNETS, September 2008.
[TEAM] http://netlab.caltech.edu/maxnet/.
[WJL06] D.X. Wei, Cheng Jin, Steven H. Low and Sanjay Hegde, “FAST TCP: Motivation, Architecture, Algorithms, Performance”, IEEE/ACM Transactions on Networking, 14(6):1246-1259, Dec 2006.


Superstable Models for Short-Duration Large-Domain Wave Propagation Minh Q. Phan, Stephen A. Ketcham, Richard S. Darling, and Harley H. Cudney

Abstract This paper introduces a superstable state-space representation suitable for modeling short-duration wave propagation dynamics in large domains. The true system dimensions and the number of output nodes can be extremely large, yet one is only interested in the propagation dynamics during a relatively short time duration. The superstable model can be interpreted as a finite-time version of the standard state-space model that is equivalent to the unit pulse response model. The state-space format of the model allows the user to take advantage of the extensive state-space based tools that are widely available for simulation, model reduction, dynamic inversion, Kalman filtering, etc. The practical utility of the new representation is demonstrated in modeling the acoustic propagation of a sound source in a complex city-center environment.

M.Q. Phan: Thayer School of Engineering, Dartmouth College, Hanover, NH 03755, USA
S.A. Ketcham, H.H. Cudney: Engineer Research and Development Center, Hanover, NH 03755, USA
R.S. Darling: Sound Innovations, Inc., White River Junction, VT 05001, USA
H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_21, © Springer-Verlag Berlin Heidelberg 2012

1 Introduction

Computer codes for high-fidelity acoustic simulation over a useful range in a realistic and complex environment are available for both fixed and moving sources [1–3], but these codes run on massively parallel computers. In our recent experience, a 1.6-s three-dimensional simulation of the open-air propagation of a sound source in a city-center environment of several blocks, with the signal bandwidth limited to 300 Hz, takes about 11.5 h using 256 cores of a high-performance computer (HPC). By complex environment, we refer to the multi-path reflections from building to building as the sound propagates throughout the domain. Larger domains and higher bandwidths require even more cores (increasing proportionally with domain volume and exponentially with bandwidth), and longer simulations require proportionally longer run times. In applications such as source localization or optimal placement of sensors, such simulations must be repeated many times for different sound sources.

One way to mitigate this computational burden is to find reduced-order representations of the original high-fidelity propagation dynamics, and to exploit these reduced-order models for any repeated simulations of different sound sources. In other words, one finds a much simpler model that describes the input–output relationship of a much larger and more complex simulation model (i.e., a model of a model). It was demonstrated in [3] that accurate reduced-order models can be obtained for wave propagation problems. For each source location, the supercomputer simulation needs to be carried out only once to create the input–output data from which a reduced-order model is derived. The reduced-order model can then be used to predict the signal propagation for a different source input at the same source location. We note that, by the principle of reciprocity, in which the locations of sensors and actuators are exchanged by the transpose operation, the transpose of such a model can also be used to simulate the sound propagation from a moving source to a sensor placed at the original source location. For the purpose of this paper, it is sufficient to consider a fixed source.
Simulations using these reduced-order models can be performed on a standard laptop computer and produce in minutes results that are comparable in accuracy to those obtained from a massively parallel high-performance computer in hours. Example calculations show a reduction of four orders of magnitude in the required computational effort. The standard procedure described in [3] that produces a reduced-order model from HPC simulation data consists of three main steps:

Step 1: Design a bandwidth-limited Gaussian pulse to serve as an input signal at a specified source location, and simulate the acoustic propagation from this pulse using a finite-difference time-domain HPC code. The output of this simulation consists of sound pressure levels at locations throughout the domain of interest (such as an area of many city blocks).

Step 2: Perform the Fast Fourier Transform (FFT) on the input and output time histories, and compute the Frequency Response Functions (FRFs). The FRFs are then inverted to recover the system Markov parameters, which are sampled values of the system responses to a unit pulse input [7].

Step 3: Apply a standard realization algorithm such as the Eigensystem Realization Algorithm (ERA [4–8]) to the Markov parameters to produce a state-space model of the system.

In this three-step process, once the HPC data is obtained in Step 1, Step 2 can be quite fast, but Step 3 is computationally intensive. We now develop a new state-space representation that can be derived immediately from the Markov parameters of Step 2 without any additional computation, thereby completely bypassing Step 3. For the purpose of this paper, the starting point of our development is the set of Markov parameters obtained from Step 2, and the end result is a state-space model that we designate as a superstable model.

The realization step can be bypassed altogether because the proposed superstable model takes advantage of special features of the wave propagation problem that we are interested in: (a) the true dimension of the propagation dynamics is very large, but the available data records to model it are very short (for example, 512 or 1,024 samples), (b) the number of output locations (in the hundreds of thousands or millions) is many orders of magnitude larger than the number of inputs (typically a single source), and (c) the time duration of interest is short (for example, 512 or 1,024 time steps). One is interested in the propagation dynamics during this short time interval and not beyond. In the following, we describe a new superstable state-space representation that is particularly suitable for short-duration large-domain wave-propagation modeling. Examples are then provided to illustrate the utility of this representation in practice.
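Step 2 of the standard procedure can be sketched in a few lines. This is a minimal illustration for a single input, not the code used in [3]: the function name `markov_from_io` and the synthetic data are hypothetical, and the sketch assumes that the record is long enough for the response to have decayed (otherwise leakage appears) and that the input spectrum has no zeros.

```python
import numpy as np

def markov_from_io(u, y, p):
    """Estimate the first p Markov parameters from one input/output record.

    u: input samples, shape (N,); y: output samples, shape (q, N).
    Returns an array of shape (p, q) whose row k-1 holds h_k.
    """
    U = np.fft.fft(u)
    Y = np.fft.fft(y, axis=1)
    frf = Y / U                              # frequency response functions, one row per output
    g = np.fft.ifft(frf, axis=1).real        # unit pulse response; g[:, 0] is the direct term
    return g[:, 1:p + 1].T

# Synthetic check with hypothetical data: q = 3 outputs, p = 8 Markov parameters.
rng = np.random.default_rng(0)
p, q, N = 8, 3, 64
h_true = rng.standard_normal((p, q))
u = np.zeros(N)
u[0], u[1] = 1.0, 0.5                        # short pulse whose spectrum never vanishes
y = np.zeros((q, N))
for k in range(N):
    for i in range(1, p + 1):
        if k - i >= 0:
            y[:, k] += h_true[i - 1] * u[k - i]
h_est = markov_from_io(u, y, p)
```

In this synthetic case the response ends well inside the record, so the recovery is exact up to rounding. A band-limited pulse has a spectrum that is close to zero outside its band, so a practical implementation would have to restrict or regularize the division; that part is not shown here.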

2 Full-Order Superstable Representation

The starting point of the derivation is a set of $p$ Markov parameters,

$$h_1 = CB, \quad h_2 = CAB, \quad h_3 = CA^2B, \quad \ldots, \quad h_p = CA^{p-1}B \tag{1}$$

As mentioned, for the acoustic propagation problems that we are concerned with, $p$ is typically 512 or 1,024. These parameter combinations are the Markov parameters of the original or truth state-space model

$$x(k+1) = Ax(k) + Bu(k), \qquad y(k) = Cx(k) \tag{2}$$
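The relation between (1) and (2) can be checked numerically: the response of (2) to a unit pulse, starting from a zero initial state, returns the Markov parameters one time step at a time. The sketch below uses a small random system; the function names and the matrices are hypothetical and serve only as an illustration.

```python
import numpy as np

def markov_parameters(A, B, C, p):
    """Return the list [h_1, ..., h_p] with h_k = C A^(k-1) B."""
    h = []
    AkB = B.copy()                 # holds A^(k-1) B
    for _ in range(p):
        h.append(C @ AkB)
        AkB = A @ AkB
    return h

def simulate(A, B, C, u):
    """Simulate (2) from a zero initial state; u has shape (N, r), the result (N, q)."""
    x = np.zeros(A.shape[0])
    y = []
    for uk in u:
        y.append(C @ x)
        x = A @ x + B @ uk
    return np.array(y)

rng = np.random.default_rng(1)
n, r, q, p = 4, 1, 3, 6            # illustrative sizes
A = 0.5 * rng.standard_normal((n, n))
B = rng.standard_normal((n, r))
C = rng.standard_normal((q, n))
h = markov_parameters(A, B, C, p)

pulse = np.zeros((p + 1, r))
pulse[0, 0] = 1.0                  # unit pulse at k = 0
y_pulse = simulate(A, B, C, pulse) # y_pulse[k] equals the single column of h_k for k >= 1
```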

Given the Markov parameters, finding a state-space representation of the original system is the standard realization problem in linear system theory, for which several solutions are available, e.g., the well-known Eigensystem Realization Algorithm (ERA) in the aerospace structural control and identification community. Our goal here is to find a state-space representation of (1) without involving any realization method, in order to avoid the intensive computation associated with the very large number of output nodes, which is in the hundreds of thousands or millions. As mentioned, having the representation in state-space form is important in order to take advantage of the extensive state-space based tools for dynamic simulation, inversion, model reduction, Kalman filtering, etc. It is well known that there exists an input–output model, called an AutoRegressive model with eXogenous input (ARX), that is equivalent to the state-space model in (2),

$$y(k) = \alpha_1 y(k-1) + \alpha_2 y(k-2) + \cdots + \alpha_p y(k-p) + \beta_1 u(k-1) + \beta_2 u(k-2) + \cdots + \beta_p u(k-p) \tag{3}$$

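The true (minimum) order of the ARX model does not need to be the same as the number of Markov parameters $p$, but we can consider (3) without loss of generality. The ARX model can be put in observable canonical state-space form as follows:

$$\tilde{A} = \begin{bmatrix} \alpha_1 & I & 0 & \cdots & 0 \\ \alpha_2 & 0 & I & \ddots & \vdots \\ \alpha_3 & 0 & 0 & \ddots & 0 \\ \vdots & \vdots & \vdots & \ddots & I \\ \alpha_p & 0 & 0 & \cdots & 0 \end{bmatrix}, \quad \tilde{B} = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \vdots \\ \beta_p \end{bmatrix}, \quad \tilde{C} = \begin{bmatrix} I & 0 & 0 & \cdots & 0 \end{bmatrix} \tag{4}$$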
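The equivalence between the ARX recursion (3) and the observable canonical form (4) can be verified on a small example. The sketch below is illustrative only: the coefficient matrices are random, the sizes are arbitrary, and the helper name `arx_to_observable_form` is hypothetical. Both models start from zero initial conditions.

```python
import numpy as np

def arx_to_observable_form(alpha, beta):
    """alpha: list of p (q x q) arrays; beta: list of p (q x r) arrays. Returns the matrices of (4)."""
    p, q = len(alpha), alpha[0].shape[0]
    A = np.zeros((p * q, p * q))
    for i in range(p):
        A[i * q:(i + 1) * q, 0:q] = alpha[i]                          # first block column
        if i < p - 1:
            A[i * q:(i + 1) * q, (i + 1) * q:(i + 2) * q] = np.eye(q) # block superdiagonal
    B = np.vstack(beta)
    C = np.hstack([np.eye(q)] + [np.zeros((q, q))] * (p - 1))
    return A, B, C

rng = np.random.default_rng(2)
p, q, r, N = 3, 2, 1, 20
alpha = [0.3 * rng.standard_normal((q, q)) for _ in range(p)]
beta = [rng.standard_normal((q, r)) for _ in range(p)]
u = rng.standard_normal((N, r))

# Direct ARX recursion (3) with zero initial conditions.
y_arx = np.zeros((N, q))
for k in range(N):
    for i in range(1, p + 1):
        if k - i >= 0:
            y_arx[k] += alpha[i - 1] @ y_arx[k - i] + beta[i - 1] @ u[k - i]

# State-space form (4) driven by the same input.
A_t, B_t, C_t = arx_to_observable_form(alpha, beta)
x = np.zeros(p * q)
y_ss = np.zeros((N, q))
for k in range(N):
    y_ss[k] = C_t @ x
    x = A_t @ x + B_t @ u[k]
```

Setting all `alpha` blocks to zero and `beta[i-1]` to $h_i$ in this helper yields the special case considered next in the text.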

With the Markov parameters of (1), it is also well known that the unit pulse response model involving the $p$ Markov parameters, starting from zero initial condition, has the form

$$y(k) = h_1 u(k-1) + h_2 u(k-2) + \cdots + h_p u(k-p) \tag{5}$$
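A direct implementation of the unit pulse response model (5) is a short discrete convolution. The sketch below is a plain reference implementation with hypothetical scalar Markov parameters, chosen so that the result can be checked by hand; the function name `fir_response` is an assumption of this example.

```python
import numpy as np

def fir_response(h, u):
    """Evaluate y(k) = sum_{i=1..p} h_i u(k-i) with zero initial condition.

    h: array of shape (p, q, r); u: array of shape (N, r); returns shape (N, q).
    """
    p, q, r = h.shape
    N = u.shape[0]
    y = np.zeros((N, q))
    for k in range(N):
        for i in range(1, min(p, k) + 1):
            y[k] += h[i - 1] @ u[k - i]
    return y

# Hypothetical example with p = 3 and a single input and output.
h = np.array([[[1.0]], [[0.5]], [[0.25]]])
u = np.array([[1.0], [2.0], [0.0], [0.0], [0.0]])
y = fir_response(h, u)
# y(1) = h1*u(0) = 1, y(2) = h1*u(1) + h2*u(0) = 2.5,
# y(3) = h2*u(1) + h3*u(0) = 1.25, y(4) = h3*u(1) = 0.5
```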

The model given in (5) is also known as the finite impulse response (FIR) model. If we view the unit pulse response model (5) as a special case of the ARX model (3), then the corresponding version of (4) is

$$\bar{A}_1 = \begin{bmatrix} 0 & I & 0 & \cdots & 0 \\ 0 & 0 & I & \ddots & \vdots \\ 0 & 0 & 0 & \ddots & 0 \\ \vdots & \vdots & \vdots & \ddots & I \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix}, \quad \bar{B}_1 = \begin{bmatrix} h_1 \\ h_2 \\ h_3 \\ \vdots \\ h_p \end{bmatrix}, \quad \bar{C}_1 = \begin{bmatrix} I & 0 & 0 & \cdots & 0 \end{bmatrix} \tag{6}$$

Viewing the unit pulse response model as a special case of the ARX model is rather unconventional, but it does lead to (6), which is indeed a model in state-space form. To remove any concern, direct substitution reveals that model (6) reproduces the $p$ Markov parameters of the original system exactly. For example, the first two Markov parameters of (6) are those of (1),

$$\bar{C}_1 \bar{B}_1 = \begin{bmatrix} I & 0 & 0 & \cdots & 0 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \\ h_3 \\ \vdots \\ h_p \end{bmatrix} = h_1 = CB \tag{7}$$

$$\bar{C}_1 \bar{A}_1 \bar{B}_1 = \begin{bmatrix} I & 0 & 0 & \cdots & 0 \end{bmatrix} \begin{bmatrix} 0 & I & 0 & \cdots & 0 \\ 0 & 0 & I & \ddots & \vdots \\ 0 & 0 & 0 & \ddots & 0 \\ \vdots & \vdots & \vdots & \ddots & I \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \\ h_3 \\ \vdots \\ h_p \end{bmatrix} = \begin{bmatrix} I & 0 & 0 & \cdots & 0 \end{bmatrix} \begin{bmatrix} h_2 \\ h_3 \\ h_4 \\ \vdots \\ 0 \end{bmatrix} = h_2 = CAB \tag{8}$$

The verification is exact up to the last Markov parameter $h_p$, but not beyond, because the additional Markov parameters formed from (6) are identically zero. This fact renders the model invalid beyond the $p$-time-step duration (unless the Markov parameters of the truth system are also negligible from that point on). However, it is important to recognize that up to time step $p$ the model (6) is valid. Therefore, the model (6) is a finite-time state-space model, because it correctly reproduces only the first $p$ Markov parameters. The $I$ and $0$ matrices in (6) are of dimensions $q$-by-$q$, where $q$ is the number of outputs. The state-space model (6) has $q$ outputs, $r$ inputs, and $pq$ states. For later distinction, (6) is referred to as a full-order superstable model.

The first unusual aspect of this model is that the eigenvalues of the (discrete-time) system matrix $\bar{A}_1$ are identically zero, repeated $pq$ times, and these zero eigenvalues are completely independent of the system dynamics. This observation is contrary to the normal understanding of a state-space model, in which the dynamics of the system is present in the system matrix. A zero eigenvalue in the discrete-time domain corresponds to minus infinity in the continuous-time domain, hence we use the term superstable to describe this realization. A more detailed explanation as to why we choose the term superstable is provided in Sect. 4. For the time being, this realization can be used to compute the system response to any arbitrary input for $p$ time steps,

$$x_1(k+1) = \bar{A}_1 x_1(k) + \bar{B}_1 u(k), \qquad y(k) = \bar{C}_1 x_1(k) \tag{9}$$
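The construction of the full-order model (6) and the check of its Markov parameters can be sketched as follows. The Markov parameters here are random placeholders with small, arbitrary dimensions, and the function name `full_order_superstable` is hypothetical; only the block structure follows (6).

```python
import numpy as np

def full_order_superstable(h):
    """h: array of shape (p, q, r). Returns A1 (pq x pq), B1 (pq x r), C1 (q x pq) as in (6)."""
    p, q, r = h.shape
    A1 = np.kron(np.eye(p, k=1), np.eye(q))     # identity blocks on the block superdiagonal
    B1 = h.reshape(p * q, r)                     # h_1, ..., h_p stacked vertically
    C1 = np.hstack([np.eye(q), np.zeros((q, (p - 1) * q))])
    return A1, B1, C1

rng = np.random.default_rng(3)
p, q, r = 5, 3, 2                                # illustrative sizes
h = rng.standard_normal((p, q, r))
A1, B1, C1 = full_order_superstable(h)

# Markov parameters C1 A1^(k-1) B1 of the model for k = 1, ..., p + 2.
recovered = []
AkB = B1.copy()
for _ in range(p + 2):
    recovered.append(C1 @ AkB)
    AkB = A1 @ AkB
```

The first $p$ entries of `recovered` match the given $h_k$, and every entry after that is zero, which is the finite-time property described above.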

It is useful to repeat here that, because the model (9) has the same first $p$ Markov parameters as the original or truth model of (2), the two models are said to be equivalent for the first $p$ time steps. In other words, the two models have identical input–output maps, i.e., they produce the same responses given the same input signal, for the first $p$ time steps. At this point, it appears that we have accomplished our goal of finding from $p$ Markov parameters a state-space representation that predicts the system response correctly in the first $p$ time steps, and doing so without involving any realization method. However, this finite-time state-space representation has a major drawback that becomes apparent when we examine its dimensions. For example, consider a system with 1 input, 1,000 outputs, and 512 available Markov parameters. The dimensions of the state-space model matrices of (6) are 512,000-by-512,000, 512,000-by-1, and 1,000-by-512,000, respectively. While it is true that (6) is a valid state-space representation that reproduces the first $p$ Markov parameters, and this model can be obtained without any calculation, the dimensions of the full-order representation are clearly unacceptable. In a typical wave propagation modeling problem, the output dimension can be in the hundreds of thousands or millions, but the number of source locations is small (one or a few). We take advantage of this feature to reduce the dimensions of this superstable state-space representation further. Indeed, such a reduction can be achieved without any additional calculation, as shown in the next section.
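The dimensions quoted for the example follow directly from the block structure of (6). The short calculation below reproduces them and adds a storage estimate; the storage figure assumes a dense array of 8-byte floating-point numbers, which is an assumption made here for illustration (a sparse format would need far less), and the function names are hypothetical.

```python
def full_order_shapes(p, q, r):
    """Shapes of the matrices in (6) for p Markov parameters, q outputs and r inputs."""
    n = p * q                                   # number of states
    return (n, n), (n, r), (q, n)

def dense_bytes(shape, bytes_per_entry=8):
    """Storage for a dense array of 8-byte floats (an assumption of this sketch)."""
    rows, cols = shape
    return rows * cols * bytes_per_entry

shape_A, shape_B, shape_C = full_order_shapes(p=512, q=1000, r=1)
bytes_A = dense_bytes(shape_A)                  # about 2.1e12 bytes for the system matrix alone
```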

3 Reduced-Order Superstable Representation

The key to finding a reduced-order representation of the full-order superstable model (6) is to consider the transposes of the original system Markov parameters, $k = 1, 2, \ldots, p$,

$$h_k^T = \left(CA^{k-1}B\right)^T = B^T \left(A^T\right)^{k-1} C^T = C^* \left(A^*\right)^{k-1} B^* = h_k^* \tag{10}$$
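Identity (10) is easy to confirm numerically. The sketch below builds the starred system from a small random system and compares the two sets of Markov parameters; all matrices and sizes are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(4)
n, r, q, p = 5, 2, 4, 6                       # illustrative sizes
A = 0.5 * rng.standard_normal((n, n))
B = rng.standard_normal((n, r))
C = rng.standard_normal((q, n))

# Starred (transposed) system: A* = A^T, B* = C^T, C* = B^T.
A_s, B_s, C_s = A.T, C.T, B.T

max_diff = 0.0
Ak = np.eye(n)                                # holds A^(k-1)
Ak_s = np.eye(n)                              # holds (A*)^(k-1)
for k in range(1, p + 1):
    h_k = C @ Ak @ B                          # q x r Markov parameter of the original system
    h_k_star = C_s @ Ak_s @ B_s               # r x q Markov parameter of the starred system
    max_diff = max(max_diff, float(np.abs(h_k.T - h_k_star).max()))
    Ak = A @ Ak
    Ak_s = A_s @ Ak_s
```

Note that each $h_k^*$ has $r$ rows and $q$ columns, so the roles of the input and output dimensions are exchanged in the starred system.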

The Markov parameters of the starred system, $A^* = A^T$, $B^* = C^T$, $C^* = B^T$, are the transposes of the Markov parameters of the original system $A$, $B$, $C$. Applying the result of the previous section to $h_k^*$, we have the following full-order superstable model for the transposed system,

$$\bar{A}_2^* = \begin{bmatrix} 0 & I & 0 & \cdots & 0 \\ 0 & 0 & I & \ddots & \vdots \\ 0 & 0 & 0 & \ddots & 0 \\ \vdots & \vdots & \vdots & \ddots & I \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix}, \quad \bar{B}_2^* = \begin{bmatrix} h_1^* \\ h_2^* \\ h_3^* \\ \vdots \\ h_p^* \end{bmatrix}, \quad \bar{C}_2^* = \begin{bmatrix} I & 0 & 0 & \cdots & 0 \end{bmatrix} \tag{11}$$

The model given in (11) is the finite-time state-space model of the transposed system, not of the original system, so we need to convert it back. We accomplish this conversion by using the relationship between the state-space model of the transposed system and that of the original system seen in (10), but this time applied to model (11) instead of model (2), i.e., $\bar{A}_2 = (\bar{A}_2^*)^T$, $\bar{B}_2 = (\bar{C}_2^*)^T$, $\bar{C}_2 = (\bar{B}_2^*)^T$, together with $(h_k^*)^T = h_k$. In so doing, we arrive at the following superstable realization for the original system:

$$\bar{A}_2 = \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 \\ I & 0 & 0 & \ddots & \vdots \\ 0 & I & 0 & \ddots & 0 \\ \vdots & \ddots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & I & 0 \end{bmatrix}, \quad \bar{B}_2 = \begin{bmatrix} I \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \bar{C}_2 = \begin{bmatrix} h_1 & h_2 & h_3 & \cdots & h_p \end{bmatrix} \tag{12}$$

It might look as if we have not accomplished anything by transposing the original system to the starred system and then transposing it back. However, going from (11) to (12) involves the transpose of $\bar{A}_2^*$, which is a much smaller matrix. The $I$ and $0$ matrices in $\bar{A}_2$ have dimensions $r$-by-$r$, where $r$ is the number of inputs. On the other hand, the $I$ and $0$ matrices in the original matrix $\bar{A}_1$ have dimensions $q$-by-$q$, where $q$ is the number of outputs. In our problem, the number of outputs $q$ is in the hundreds of thousands or millions, whereas the number of inputs $r$ is 1 for a single sound source, thus a significant reduction in dimensions is achieved. For this reason, (12) is referred to as a reduced-order superstable model. For the previously considered example with 1 input, 1,000 outputs, and 512 available Markov parameters, the dimensions of the state-space model matrices of (12) are now 512-by-512, 512-by-1, and 1,000-by-512, respectively. Thus (12) is a 512-state model, whereas (6) is a 512,000-state model. The model reduction is analytical and exact, and no computation is needed to achieve it. The reduced-dimension realization can be used to compute the system response to any arbitrary input for $p$ time steps,

$$x_2(k+1) = \bar{A}_2 x_2(k) + \bar{B}_2 u(k), \qquad y(k) = \bar{C}_2 x_2(k) \tag{13}$$
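A small numerical sketch of the reduced-order model (12) and its simulation (13) follows. The Markov parameters are random placeholders and the function names are hypothetical; the response of (13) is compared against the unit pulse response model (5) over the first $p$ time steps.

```python
import numpy as np

def reduced_order_superstable(h):
    """h: array of shape (p, q, r). Returns A2 (pr x pr), B2 (pr x r), C2 (q x pr) as in (12)."""
    p, q, r = h.shape
    A2 = np.kron(np.eye(p, k=-1), np.eye(r))    # identity blocks on the block subdiagonal
    B2 = np.vstack([np.eye(r), np.zeros(((p - 1) * r, r))])
    C2 = np.hstack(list(h))                     # [h_1 h_2 ... h_p]
    return A2, B2, C2

def simulate(A, B, C, u):
    """Simulate x(k+1) = A x(k) + B u(k), y(k) = C x(k) from a zero initial state."""
    x = np.zeros(A.shape[0])
    y = np.zeros((u.shape[0], C.shape[0]))
    for k, uk in enumerate(u):
        y[k] = C @ x
        x = A @ x + B @ uk
    return y

rng = np.random.default_rng(5)
p, q, r = 8, 50, 1                              # illustrative sizes: many outputs, one input
h = rng.standard_normal((p, q, r))
A2, B2, C2 = reduced_order_superstable(h)

u = rng.standard_normal((p, r))                 # arbitrary input over p time steps
y_ss = simulate(A2, B2, C2, u)

# Reference: unit pulse response model (5) evaluated directly.
y_fir = np.zeros_like(y_ss)
for k in range(p):
    for i in range(1, k + 1):
        y_fir[k] += h[i - 1] @ u[k - i]
```

With this structure the state $x_2(k)$ simply stores the last $p$ input samples, so the state dimension depends on $p$ and $r$ only and not on the number of outputs $q$.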

The model (13) is equivalent to the original or truth model (2) for the first $p$ time steps. Model (13) is our final superstable model. To verify the validity of the above model reduction argument, it can be easily confirmed that the first $p$ Markov parameters of (13) are indeed identical to those of the original system. For example, the first two Markov parameters are:

$$\bar{C}_2 \bar{B}_2 = \begin{bmatrix} h_1 & h_2 & h_3 & \cdots & h_p \end{bmatrix} \begin{bmatrix} I \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = h_1 = CB \tag{14}$$

$$\bar{C}_2 \bar{A}_2 \bar{B}_2 = \begin{bmatrix} h_1 & h_2 & h_3 & \cdots & h_p \end{bmatrix} \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 \\ I & 0 & 0 & \ddots & \vdots \\ 0 & I & 0 & \ddots & 0 \\ \vdots & \ddots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & I & 0 \end{bmatrix} \begin{bmatrix} I \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \begin{bmatrix} h_2 & h_3 & h_4 & \cdots & 0 \end{bmatrix} \begin{bmatrix} I \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = h_2 = CAB \tag{15}$$

The significance of this development is that the time-consuming realization in Step 3 of the standard procedure can now be bypassed. Given the Markov parameters obtained in Step 2, a reduced-order superstable model of the system that perfectly reproduces the available Markov parameters can be immediately obtained without any further computation. The superstable model exactly captures the dynamics of the system during the duration of interest, and can therefore be used to predict the system response to another source input at the same location during this time interval.

4 Features of the Superstable Representation

In this section, we summarize the rather unconventional features of this superstable representation and their implications.

(1) It is a finite-time state-space model equivalent to the finite impulse response (FIR) model. The key word here is finite-time. The model reproduces a given finite number of Markov parameters exactly. In applications where we are only interested in predicting the system response during a specific duration of interest, this superstable representation is sufficient. The transient dynamics of the system need not vanish during this duration for the model to be applicable.

(2) Although the representation is fundamentally finite-time, the model may still be used to predict beyond the desired duration if the dynamics of the system is such that its transient response is negligible at the end of the finite time duration of interest.

(3) The system matrix of the superstable model is a deadbeat matrix and contains no dynamics of the system. In a typical state-space model, the system characteristic information (frequencies, damping ratios) is present in the system matrix, which governs the transient response of the model. In the superstable representation, however, the system matrix has zero eigenvalues regardless of the dynamics of the system that it models. The zero eigenvalues cause the Markov parameters to vanish identically beyond a finite number of time steps.

(4) Although the system matrix is deadbeat as stated above, the model does not exhibit the typical deadbeat behavior in which the transient dynamics fluctuates wildly before vanishing identically (unless that is the dynamics of the true system). The transient response of the superstable model does indeed vanish identically due to the deadbeat nature of its system matrix, but it need not fluctuate wildly like a typical deadbeat system. For this very important reason, we use the term superstable, rather than deadbeat, to describe it.

(5) The system matrix of the superstable model does not contain modal frequencies and damping ratios. This makes this type of model ideally suited to describe outdoor signal propagation problems where the signal may reflect several times between buildings but does not set up standing wave patterns and always dissipates out to infinity. The concepts of physical modal frequencies and damping ratios are not applicable to large-domain wave-propagation situations.

(6) There is no limitation of the superstable model in representing any physical phenomenon as long as it is linear and the time duration of interest is relatively short. The short-duration requirement keeps the dimensions of the model reasonable, although state-space based model reduction techniques can be applied to reduce the dimension of the model further if desired. The finite-time dynamics of the system is represented by the Markov parameters, which are stored in the output influence matrix $\bar{C}_2$. Any linear dynamics can be modeled by these Markov parameters as long as the sampling interval is sufficiently small to avoid the aliasing associated with sampled signals.
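The deadbeat property of the system matrix can be made concrete with a small check. The sketch below uses hypothetical random Markov parameters and a single input: it shows that the system matrix of the reduced-order model is strictly lower triangular (so all of its eigenvalues are zero), that its $p$-th power vanishes, and that the model therefore has zero Markov parameters beyond $h_p$.

```python
import numpy as np

p, q, r = 6, 4, 1                                # illustrative sizes
rng = np.random.default_rng(6)
h = rng.standard_normal((p, q, r))               # hypothetical Markov parameters

A2 = np.kron(np.eye(p, k=-1), np.eye(r))         # system matrix of the reduced-order model
B2 = np.vstack([np.eye(r), np.zeros(((p - 1) * r, r))])
C2 = np.hstack(list(h))

# A strictly lower triangular matrix has only zero eigenvalues (its diagonal entries).
strictly_lower = np.array_equal(A2, np.tril(A2, k=-1))

# Nilpotency: A2^(p-1) is still nonzero, A2^p is exactly zero.
nonzero_before = np.count_nonzero(np.linalg.matrix_power(A2, p - 1))
nonzero_at_p = np.count_nonzero(np.linalg.matrix_power(A2, p))

# Markov parameters of the model: h_p is reproduced, the next one vanishes.
h_p = C2 @ np.linalg.matrix_power(A2, p - 1) @ B2
h_p_plus_1 = C2 @ np.linalg.matrix_power(A2, p) @ B2
```

The eigenvalues are read off the triangular structure rather than computed numerically, because eigenvalue solvers can be sensitive for matrices of this kind.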

5 Numerical Illustration

The developed superstable model is used to represent an HPC model of three-dimensional outdoor sound propagation in a city-center environment. The city-center dimensions are 778 m by 775 m, and 179 m in height (Fig. 1). A total of 2.8 billion nodes is used in the HPC simulation to simulate a 4.36-second propagation of a sound source located 1.9 m above ground at the model center (Fig. 2). For the reduced-order model, the output layer has 1.26 million nodes, which are divided into 11 strips of 114,000 nodes each. A filtered pulse source is used, and the outputs at the 1.26 million nodes are recorded. With this input–output data, 1,024 Markov parameters of the system are computed by the inverse FFT method. A superstable model as described in (12) is then constructed from these 1,024 Markov parameters. The superstable model replaces the original HPC model in future simulations, resulting in significant savings in computational resources and time. Snapshots of a typical simulation, revealing the wavefronts at various instants of time at 0.5 m above the ground and rooftops, are shown in Fig. 3.


Fig. 1 3D HPC simulation model

Fig. 2 RMS of output signals (source at center)

Fig. 3 Sound pressure at 0.140 s, 0.464 s, 0.796 s, and 1.081 s

Fig. 4 RMS error between HPC and superstable model prediction

As a point of comparison, the simulation of the original HPC model requires hours on a CRAY XT3 with 256 CPUs and 512 GB of RAM. The superstable model, on the other hand, takes minutes on a single-core computer with about 2.6 GB of RAM. To test the validity of the superstable model, a different sound source that was not used in the identification of the Markov parameters is simulated with the HPC model, and the results are compared with those of the superstable model simulation at the 1.26 million output nodes. The median prediction error within the 1,024-sample duration (time points 257–768, when the signals are strong throughout the model) is about 1.6 % (Fig. 4). It should be noted that the superstable model reproduces the Markov parameters identically. Hence any observed discrepancy between the HPC simulation and the superstable prediction is entirely due to minor errors, known as leakage, associated with the inverse FFT method of identifying the Markov parameters. If necessary, this leakage error can be reduced further by applying proper windowing techniques. For the purpose of the present illustration, however, it is not necessary to do so. In terms of computational time (number of cores × seconds), the superstable model represents a speed-up factor of nearly 20,000. When the RAM requirement is also taken into account (number of cores × seconds × bytes), the saving factor is nearly 3.9 million.
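The two reported factors are consistent with each other, as a short calculation shows. The sketch below assumes that the combined saving factor is the product of the core-seconds speed-up and the ratio of the RAM sizes; this reading is an assumption made for illustration, and the variable names are hypothetical. The numerical inputs are the figures quoted in the text.

```python
hpc_cores, hpc_ram_gb = 256, 512.0
single_cores, single_ram_gb = 1, 2.6
core_seconds_factor = 20_000.0                 # "nearly 20,000", as reported

ram_ratio = hpc_ram_gb / single_ram_gb                      # about 197
combined_factor = core_seconds_factor * ram_ratio           # about 3.9 million
wall_clock_ratio = core_seconds_factor / (hpc_cores / single_cores)  # about 78
```

Under the same assumption, a core-seconds factor of about 20,000 with 256 cores against one core corresponds to a wall-clock ratio of roughly 78, which agrees with the qualitative statement of hours against minutes.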

6 Conclusions

This paper formulates a type of discrete-time model which we refer to as a superstable model. This type of model is particularly suitable for representing short-duration large-domain linear wave propagation dynamics. The model is most efficient when the number of input sources is small, the number of output nodes is very large, and the number of samples associated with the finite-time duration of interest is small. Once the Markov parameters are computed from HPC-derived input–output data, a superstable model can be derived immediately without incurring any additional computation. Because the time-consuming realization step is entirely bypassed, the new representation significantly shortens the process of finding reduced-order models from HPC data. Being in state-space form, the model is compatible with a wide variety of available state-space based computational and analysis tools, including dynamic simulation, model reduction, model inversion, Kalman filtering, etc. These tools are not always available for the unit pulse response model that the superstable model replaces.

Most features of this superstable model are rather unconventional. It is a state-space model but finite-time in nature. The model reproduces any given number of Markov parameters exactly, but not beyond. As such, it can be used to predict the system response exactly during the time duration of interest associated with the given Markov parameters. The model requires zero initial condition, which is often a valid assumption in modeling sound propagation. The system dynamics is carried entirely in the output influence matrix, which stores the system Markov parameters. No system dynamics is present in the system matrix, which has zero eigenvalues regardless of the dynamics of the system that it represents. Finally, although the system matrix has zero eigenvalues, the model does not exhibit the common deadbeat behavior, hence we use the term superstable instead of deadbeat to describe this new representation.

Acknowledgements This research is supported by a Small Business Technology Transfer (STTR) grant by the US Department of the Army to Sound Innovations, Inc. and Dartmouth College, and also in part by US Army In-House Laboratory Independent Research (ILIR). The authors thank Mr. Michael W. Parker, who contributed to the HPC simulations.


References

1. Anderson, T.S., Moran, M.L., Ketcham, S.A., Lacombe, J.: Tracked Vehicle Simulations and Seismic Wavefield Synthesis in Seismic Sensor Systems. Computing in Science and Engineering, 22–28 (2004).
2. Ketcham, S.A., Moran, M.L., Lacombe, J., Greenfield, R.J., Anderson, T.S.: Seismic Source Model for Moving Vehicles. IEEE Transactions on Geoscience and Remote Sensing, 43, No. 2, 248–256 (2005).
3. Ketcham, S.A., Wilson, D.K., Cudney, H., Parker, M.: Spatial Processing of Urban Acoustic Wave Fields From High-Performance Computations. ISBN: 978–0–7695–3088–5, Digital Object Identifier: 10.1109/HPCMP–UGC.2007.68, DoD High Performance Computing Modernization Program Users Group Conference, 289–295 (2007).
4. Juang, J.-N., Pappa, R.S.: An Eigensystem Realization Algorithm for Modal Parameter Identification and Model Reduction. Journal of Guidance, Control, and Dynamics, 8, 620–627 (1985).
5. Ho, B.L., Kalman, R.E.: Effective Construction of Linear State-Variable Models from Input–Output Functions. Proceedings of the 3rd Annual Allerton Conference on Circuit and System Theory, 152–192 (1965); also Regelungstechnik, 14, 545–548 (1966).
6. Juang, J.-N., Cooper, J.E., Wright, J.R.: An Eigensystem Realization Algorithm Using Data Correlations (ERA/DC) for Modal Parameter Identification. Control Theory and Advanced Technology, 4, No. 1, 5–14 (1988).
7. Juang, J.-N.: Applied System Identification. Prentice-Hall, Upper Saddle River, NJ (2001).
8. Ketcham, S.A., Phan, M.Q., Cudney, H.H.: Reduced-Order Wave-Propagation Modeling Using the Eigensystem Realization Algorithm. The 4th International Conference on High Performance Scientific Computing, Hanoi, Vietnam (2009).


Discontinuous Galerkin as Time-Stepping Scheme for the Navier–Stokes Equations Th. Richter

Abstract In this work we describe a fast solution algorithm for the time-dependent Navier–Stokes equations in the regime of moderate Reynolds numbers. Special to this approach is the underlying discretization: for both the spatial and the temporal discretization we apply higher order Galerkin methods. In space, standard Taylor–Hood-like elements on quadrilateral or hexahedral meshes are used. For time discretization, we employ discontinuous Galerkin methods. This combination of Galerkin discretizations in space and time allows for a consistent variational space-time formulation of the Navier–Stokes equations. This brings the benefit of a well-defined adjoint problem that can be used for optimization methods based on the Euler–Lagrange approach and for adjoint error estimation methods. Special care is given to the solution of the algebraic systems. Higher order discontinuous Galerkin formulations in time call for a coupled treatment of multiple solution states. By an approximate factorization of the system matrices we can reduce the complex system to a multi-step method employing only standard backward-Euler-like time steps.

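1 Introduction

On a domain $\Omega \subset \mathbb{R}^d$ with $d = 2, 3$ we consider the Navier–Stokes equations on the time interval $I = (0, T]$ given in the weak formulation. Velocity $v \in v_D + \bar V$ and pressure $p \in \bar L$ are defined by:

$$\int_0^T \big\{ (\partial_t v, \phi) + (v\cdot\nabla v, \phi) + \nu(\nabla v, \nabla\phi) - (p, \nabla\cdot\phi) + (\nabla\cdot v, \xi) \big\}\, dt = \int_0^T (f, \phi)\, dt, \tag{1}$$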

Th. Richter Institute of Applied Mathematics, University of Heidelberg, INF 293/294 D.69120 Heidelberg, Germany e-mail: [email protected] H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0 22, © Springer-Verlag Berlin Heidelberg 2012


for all $(\phi, \xi) \in \bar V \times \bar L$, where

$$\bar V := L^2(I, V) \cap H^1(I, H^{-1}(\Omega)), \quad V := H_0^1(\Omega), \quad \bar L := L^2(I, L), \quad L := L^2(\Omega),$$

and $v_D$ is a suitable extension of the prescribed Dirichlet data on the boundary $\Gamma_D \subset \partial\Omega$ into the domain. On the remaining part of the boundary, $\Gamma_{OUT}$, the do-nothing condition of Robin type, $\langle \nu\partial_n v - pn, \phi\rangle_{\Gamma_{OUT}} = 0$, is supplied, see [7]. For simplicity we will assume homogeneous Dirichlet data throughout this work. Further, at $t = 0$ an initial condition for the velocity, $v(0,\cdot) = v^0(\cdot)$, is given. See [6] for a detailed description of the Navier–Stokes equations and the fundamental function spaces. For simplicity of notation we will refer to the Navier–Stokes problem in the abstract setting:

$$U := (v, p) \in \bar X := \bar V \times \bar L: \quad \int_0^T A(U, \Phi)\, dt = \int_0^T F(\Phi)\, dt \quad \forall \Phi := (\phi, \xi) \in \bar X. \tag{2}$$

Here $A(\cdot,\cdot)$ is a semi-linear form on $X \times X$ with $X := V \times L$, and $F(\cdot)$ is a linear functional $X \to \mathbb{R}$.

In the course of this work we have flow problems with moderate Reynolds number in mind, leading to nonstationary flows, however without developed turbulence. The nonstationary behavior can consist of periodic flow patterns like the von Kármán vortex street, or it can be induced by boundary conditions. Further application fields are fluid-structure interaction problems with a basically laminar flow field on changing flow domains. The underlying structure of these problems typically allows for larger time steps.

A posteriori error estimation can be used for mesh adaptation and for controlling the time step size. The dual weighted residual (DWR) method [3] allows for error estimation with regard to functional outputs $J(U)$ of the solution. These functionals can be norm values, forces on (parts of) the boundary, the vorticity or other values of interest in fluid dynamics. The error in the functional is essentially expressed as the residual term

$$J(U) - J(U_h) \approx \int_0^T \big\{ F(Z - \Phi_h) - A(U_h, Z - \Phi_h) \big\}\, dt,$$

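where $Z = (z_v, z_p)$ is the solution to an adjoint problem running backward in time. For the Navier–Stokes system, this adjoint problem is given by (see [12])

$$\int_0^T \big\{ -(\partial_t z_v, \phi) + (z_v, \phi\cdot\nabla v + v\cdot\nabla\phi) + \nu(\nabla z_v, \nabla\phi) + (z_p, \nabla\cdot\phi) - (\xi, \nabla\cdot z_v) \big\}\, dt = J.$$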

When discretizing the problem with a time-stepping method, the backward problem usually follows another method. Often (e.g. for Runge–Kutta methods), this adjoint time scheme is not available and an approximation needs to be used. This leads to a conformity error in the error estimation process. If, however, the time discretization scheme has the form of a Galerkin method, the adjoint problem can be solved without this additional conformity error. See [13] for a derivation of adjoint time-stepping schemes. Using dG(r) methods for the time discretization has the additional advantage that the adjoint time-stepping scheme is dG(r) again [14].

For optimization algorithms, adjoint solutions are necessary in the Newton iteration. These adjoint solutions are similar to the ones used for error estimation. Here the problem is even more severe: if the adjoint discrete problem is not solved with the correct adjoint scheme, the Newton convergence rates deteriorate, leading to a huge number of required iteration steps [9].

In Sect. 2 we describe spatial and temporal discretizations based on Galerkin methods. The adjoint problems can be solved with the same discrete scheme. For spatial discretization, standard Taylor–Hood-like elements are used. They are inf-sup stable [6] for solving the Navier–Stokes equations and available as higher order methods. For time discretization we employ the dG(r) scheme. This method is well known to yield a very high approximation order and optimal stability results [8]. As a symmetric Galerkin method, the adjoint scheme is dG(r) again. In spite of these properties, dG(r) methods are usually not considered to be efficient schemes for the time discretization of partial differential equations, since they require the coupled solution of the Navier–Stokes system for different solution states, leading to huge systems.

In Sect. 3 we describe a solution method that approximates the algebraic systems by a simple factorization. As a consequence, the coupled system can be solved iteratively with a multi-step scheme consisting of simple backward-Euler-like cycles. Key to this splitting is the relation of dG(r) methods to Padé approximations of the exponential function [8]. The denominator of the Padé approximation is the implicit part of the time-stepping scheme. This polynomial will be approximately factorized by a simple modification of one coefficient.
To cope with the saddle point system we apply a variant of the global Schur complement method proposed by Turek [15]. In Sect. 4 we present numerical results demonstrating the feasibility and efficiency of the proposed method.

2 Discretization

2.1 Temporal Discretization

For the temporal discretization of (2) we split the interval $I$ into subintervals $I_i := (t_{i-1}, t_i]$ with $k := \max_i (t_i - t_{i-1})$ and approximate the solution $U = (v, p)$ with piecewise polynomial functions $U_k = (v_k, p_k) \in \bar X_k$. Restricted to one subinterval $I_i$, the functions $v_k|_{I_i}$ and $p_k|_{I_i}$ are polynomials of degree $r$ in time. Functions in $\bar X_k$, however, are in general not continuous on $I$. The dG(r) time-discretized formulation of (2) is to find $U_k \in \bar X_k$ such that (see [14])

$$\sum_{i=1}^{N} \Big\{ \int_{I_i} A(U_k, \Phi_k)\, dt + \big([v_k]_{i-1}, \phi_k(t_{i-1})^+\big) \Big\} = \int_I F(\Phi_k)\, dt \quad \forall \Phi_k \in \bar X_k. \tag{3}$$

At the point $t_i$ between the intervals $I_i$ and $I_{i+1}$ we define the one-sided values of the discontinuous functions and the jump as $[v_k]_i := v_k(t_i)^+ - v_k(t_i)^-$, where $v_k(t_i)^\pm := \lim_{h\to 0^+} v_k(t_i \pm h)$. For $t_0$ and the initial value $v^0$ we set $v_k(t_0)^- := v^0$. In the limit $k \to 0$ the jump condition enforces continuity of the velocity $v$. The dG(r) formulation can be regarded as a time-stepping scheme, since the knowledge of $v_k(t_{i-1})^-$ is sufficient to determine $v_k$ and $p_k$ on $I_i$. In the following we can restrict all considerations to one single interval $I = (0, k]$ and discuss the problem of finding $U_k \in X_k$ satisfying

$$\int_0^k A(U_k, \Phi_k)\, dt + \big(v_k(0)^+, \phi_k(0)^+\big) = \int_0^k F(\Phi_k)\, dt + \big(v^0, \phi_k(0)^+\big). \tag{4}$$

This system, nonlinear in the first argument of $A(\cdot,\cdot)$, is solved by a Newton iteration. For an initial guess $U_k^{(0)}$ it yields iterates $U_k^{(j)}$, $j = 1, 2, \dots$, given by the linear update system for $W_k^{(j)} := U_k^{(j+1)} - U_k^{(j)} \in X_k$:

$$\int_0^k A'(U_k^{(j)})(W_k^{(j)}, \Phi_k)\, dt + \big(w_k^{(j)}(0)^+, \phi_k(0)^+\big) = \int_0^k \big\{ F(\Phi_k) - A(U_k^{(j)}, \Phi_k) \big\}\, dt + \big(v^0 - v_k^{(j)}(0)^+, \phi_k(0)^+\big) \quad \forall \Phi_k \in X_k, \tag{5}$$

where $A'(U)(W, \Phi)$ is the directional derivative of $A(\cdot,\cdot)$ at $U$ in the direction $W$. Since we analyze a single linear system in the following, we omit the Newton iteration index $j$ whenever possible.

2.2 Spatial Discretization

Spatial discretization is accomplished by continuous finite elements for both velocity and pressure. Let $V_h \times Q_h$ be a pair of finite element spaces for velocity and pressure, e.g. a Taylor–Hood space. The fully discrete solution $U_{kh}$ on the initial time interval is then sought in the finite element space $X_{kh}$, piecewise polynomial in space and time. For convection-dominated flows this discretization suffers from spurious oscillations. To overcome this issue we add stabilization terms to the variational formulation. We apply the Local Projection Stabilization (LPS) method (see [1] or [4]) because of some advantages over classical stabilization methods of residual type (SUPG, GLS, ...). With LPS, the added stabilization terms are all of diagonal kind: no additional artificial couplings between different solution components (pressure and velocity) are introduced. Further, the formulation of residual-type stabilization methods for nonstationary problems requires space-time elements, which bring along couplings of the solution between multiple time slots. See [4] for a comparison and discussion of different stabilization methods for the Navier–Stokes equations. In the course of this paper we consider finite element pairs of Taylor–Hood type with Local Projection Stabilization for strong convection, and time discretization with piecewise linear polynomials, the dG(1) method.

2.3 The Fully Discrete Scheme

The two basis functions $\psi^1(t) := \frac{k-t}{k}$ and $\psi^2(t) := \frac{t}{k}$ span the dG(1) space on the interval $I = (0, k]$. Thus, the discrete solution $U_{kh} = (v_{kh}, p_{kh})$, the update $W_{kh} = (w_{kh}, q_{kh})$ and the test function $\Phi_{kh} = (\phi_{kh}, \xi_{kh})$ are given as $\Phi_{kh}(t, x) = \psi^1(t)\Phi^1(x) + \psi^2(t)\Phi^2(x)$ and $U_{kh}(t, x) = \psi^1(t)U^1(x) + \psi^2(t)U^2(x)$, and analogously for $W_{kh}$, with $\Phi^i, U^i, W^i \in V_h \times Q_h$. The discretization subscripts $kh$ will be omitted when possible. For simplicity we introduce the notations $a(v, \phi) := (v\cdot\nabla v, \phi)$, $a'(v)(w, \phi) := (v\cdot\nabla w + w\cdot\nabla v, \phi)$, $b(p, \phi) := -(p, \nabla\cdot\phi)$ and $c(v, \xi) := (\nabla\cdot v, \xi)$. The right hand side of the linear update system (5) is given by $(f^1, g^1)$ and $(f^2, g^2)$, assembled with the test functions $\psi^1\Phi^1$ and $\psi^2\Phi^2$, respectively.

The accurate integration of the right hand side is crucial to obtain Newton convergence. The nonlinear parts thus need to be integrated with a 2-point Gauss formula to guarantee third order super-convergence in the nodes. The Jacobian, however, can be approximated without fully losing convergence in the Newton iteration. Exact evaluation of the part $a'(v)(w, \phi)$, which is nonlinear in $t$, would make the assembly of the system matrix too expensive. Thus we approximate the Jacobian by evaluation in the midpoint only, $a'(v)(w, \phi) \approx a'(\bar v)(w, \phi)$ with $\bar v = \frac12(v^1 + v^2)$. Introducing the notations $A$, $B$, $C$ and $M$ for the matrices $A_{ij} := a'(\bar v)(\phi^j, \phi^i)$, $B_{ij} := b(\xi^j, \phi^i)$, $C_{ij} := c(\phi^j, \xi^i)$ and $M_{ij} := (\phi^j, \phi^i)$, the linear system (5) can be written as:

$$\begin{pmatrix}
\frac1k M + \frac23 A & \frac1k M + \frac13 A & \frac23 B & \frac13 B \\
-\frac1k M + \frac13 A & \frac1k M + \frac23 A & \frac13 B & \frac23 B \\
\frac23 C & \frac13 C & 0 & 0 \\
\frac13 C & \frac23 C & 0 & 0
\end{pmatrix}
\begin{pmatrix} w^1 \\ w^2 \\ q^1 \\ q^2 \end{pmatrix}
= 2 \begin{pmatrix} f^1 \\ f^2 \\ g^1 \\ g^2 \end{pmatrix}. \tag{6}$$
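To illustrate the structure of the velocity block in (6), the following minimal sketch applies dG(1) to the scalar model problem $u' + \lambda u = 0$, replacing $M$ by 1 and $A$ by $\lambda$. The function names and the model problem are assumptions made for illustration and are not part of the original scheme; the right hand side here stems only from the jump term with the initial value, scaled like the matrix rows.

```python
import numpy as np

def dg1_step(u0, lam, k):
    """One dG(1) step for u' + lam*u = 0 on (0, k], starting from u0.

    Unknowns are the nodal values u1 = u(0+) and u2 = u(k).
    """
    # Scalar analogue of the velocity block of (6): M -> 1, A -> lam.
    S = np.array([[ 1.0 / k + 2.0 / 3.0 * lam, 1.0 / k + 1.0 / 3.0 * lam],
                  [-1.0 / k + 1.0 / 3.0 * lam, 1.0 / k + 2.0 / 3.0 * lam]])
    # Only the jump term (u0, psi^1(0)) contributes; 2/k is the row scaling.
    rhs = np.array([2.0 / k * u0, 0.0])
    u1, u2 = np.linalg.solve(S, rhs)
    return u1, u2

def nodal_error(lam, T, n_steps):
    """Error at the final node t = T for the initial value u(0) = 1."""
    k = T / n_steps
    u = 1.0
    for _ in range(n_steps):
        _, u = dg1_step(u, lam, k)
    return abs(u - np.exp(-lam * T))

# Halving the step size should reduce the nodal error by roughly a factor 8.
e_coarse = nodal_error(1.0, 1.0, 10)
e_fine = nodal_error(1.0, 1.0, 20)
observed_order = np.log2(e_coarse / e_fine)
```

Eliminating `u1` from the 2x2 system gives `u2 = (1 - z/3) / (1 + 2z/3 + z**2/6) * u0` with `z = k*lam`, which is the Padé-type rational function that Sect. 3.3 relies on.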

3 Solution Scheme for the Linear System

The system of linear equations (6), as a coupled system for the two solution states, has four times the size of the Navier–Stokes system emerging from standard time-stepping schemes like backward Euler or Crank–Nicolson. Assembling, storing and inverting the system matrix is not feasible in an efficient algorithm. The saddle point character further complicates the solution. In the proposed algorithm we address the saddle point character by a preconditioned iteration on the Schur complement, and the coupled two-state system by a special factorization as proposed in [10] for (nonlinear) heat equations.

3.1 Preconditioned Schur Complement Richardson

System (6) is written in the condensed form

$$\begin{pmatrix} \mathcal{A} & \mathcal{B} \\ \mathcal{C} & 0 \end{pmatrix} \begin{pmatrix} w \\ q \end{pmatrix} = \begin{pmatrix} f \\ g \end{pmatrix}, \tag{7}$$

with appropriate vectors $f, g, w, q$ and the combined matrices $\mathcal{A}$, $\mathcal{B}$ and $\mathcal{C}$. Following [15], to cope with the saddle point character, we solve for $q$ by transforming (7) into the Schur complement and by applying a preconditioned Richardson iteration with the preconditioner $\mathcal{P}$. In every step of the Richardson iteration we solve two sub-steps: the velocity prediction and a pressure-Poisson problem:

Algorithm 1: Richardson Iteration on Schur Complement

With $q^0 = 0$ iterate for $n = 1, 2, \dots$ to solve for $(w^n, q^n)$ by:

(i) $\mathcal{A} w^n = f - \mathcal{B} q^{n-1}$,
(ii) $\mathcal{P}(q^n - q^{n-1}) = \mathcal{C} w^n - g$.
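The two sub-steps of Algorithm 1 can be sketched on a small dense saddle point system. The matrices below are random stand-ins chosen purely for illustration (an assumption, not the discretized Navier–Stokes operators): the velocity block is symmetric and close to the identity, so that the Schur complement with the velocity block replaced by the identity is a reasonable preconditioner.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 12, 4                        # hypothetical velocity / pressure block sizes

E = rng.standard_normal((n, n))
E = 0.5 * (E + E.T)
A = np.eye(n) + 0.2 * E / np.linalg.norm(E, 2)   # SPD, eigenvalues in [0.8, 1.2]
B = rng.standard_normal((n, m))
C = B.T
f = rng.standard_normal(n)
g = rng.standard_normal(m)

P = C @ B                           # preconditioner: C A^{-1} B with A replaced by I

q = np.zeros(m)                     # q^0 = 0
for it in range(100):
    w = np.linalg.solve(A, f - B @ q)        # (i)  velocity prediction
    dq = np.linalg.solve(P, C @ w - g)       # (ii) pressure update
    q = q + dq
    if np.linalg.norm(dq) < 1e-13:
        break
w = np.linalg.solve(A, f - B @ q)            # velocity for the final pressure

# Reference: direct solve of the full saddle point system (7).
K = np.block([[A, B], [C, np.zeros((m, m))]])
ref = np.linalg.solve(K, np.concatenate([f, g]))
```

In this toy setting the iteration matrix $I - \mathcal{P}^{-1}\mathcal{C}\mathcal{A}^{-1}\mathcal{B}$ has spectral radius of at most 0.25, so a few dozen iterations suffice; for the real problem the quality of $\mathcal{P}$ decides the iteration count.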

3.2 Solution of the Pressure Problem

The efficiency of the Richardson iteration is determined by the choice of the preconditioner $\mathcal{P} \approx \mathcal{C}\mathcal{A}^{-1}\mathcal{B}$. In [15] several "optimal" possibilities for $\mathcal{P}$ are discussed, depending on the governing pattern of the flow: reactive, diffusive or convective. Here, we only discuss the case of a reaction-dominated flow with small time steps $k$. Then, the matrix $\mathcal{A}$ is governed by the mass matrix:

$$\mathcal{A} = \begin{pmatrix} \frac1k M + \frac23 A & \frac1k M + \frac13 A \\ -\frac1k M + \frac13 A & \frac1k M + \frac23 A \end{pmatrix} \approx \frac1k \begin{pmatrix} M & M \\ -M & M \end{pmatrix} =: \tilde M.$$

By using the (diagonal) lumped mass matrix $M_l$ we can easily invert $\tilde M_l$ and assemble the preconditioner $\mathcal{P}$ as $\mathcal{P} := \mathcal{C}\tilde M_l^{-1}\mathcal{B}$. The pressure update is then given as the solution of the problem:

Algorithm 2: Step (ii): Pressure Problem

With $P := C M_l^{-1} B$ solve

$$P(q_1^n - q_1^{n-1}) = \frac1k \big( 3Cw_1^n + Cw_2^n - 10 g^1 + 2 g^2 \big),$$
$$P(q_2^n - q_2^{n-1}) = \frac1k \big( -3Cw_1^n + Cw_2^n + 14 g^1 - 10 g^2 \big).$$

The preconditioning matrix $P$ corresponds to a discretization of the Laplace operator with mixed finite elements. Multigrid solvers with optimal complexity exist for the solution of these systems. In the finite element toolbox Gascoigne [16] we use geometric multigrid methods on adaptively refined meshes [2]. The huge gain of the method originally implemented in [15] was due to the simple structure of the preconditioning matrix $P$. With the underlying lowest order Rannacher–Turek element for solving the Navier–Stokes equations, the matrix $P$ turned out to be the well-known 5-point star in two dimensions and the 7-point star in three dimensions. For general higher order finite elements, however, the stencil of the matrix $P$ is not this compact; it is even larger than that of a standard Laplace discretization. This matrix does not depend on the solution itself, so once assembled it is reused until the mesh is altered. Due to the efficiency of geometric multigrid methods, the inversion of the matrix $P$ can be regarded as a very easy problem.

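3.3 Approximation of the Velocity Problem

The system to be solved for the velocity prediction $\mathcal{A}w = f - \mathcal{B}q$ is

$$\begin{pmatrix} \frac1k M_l + \frac23 A & \frac1k M_l + \frac13 A \\ -\frac1k M_l + \frac13 A & \frac1k M_l + \frac23 A \end{pmatrix} \begin{pmatrix} w^1 \\ w^2 \end{pmatrix} = \begin{pmatrix} 2f^1 - \frac23 Bq^1 - \frac13 Bq^2 \\ 2f^2 - \frac13 Bq^1 - \frac23 Bq^2 \end{pmatrix} =: \begin{pmatrix} F^1 \\ F^2 \end{pmatrix}. \tag{8}$$

We solve (8) for the second state $w^2$:

$$\Big[ \frac{2}{k^2} I + \frac{4}{3k} M_l^{-1}A + \frac13 [M_l^{-1}A]^2 \Big] w^2 = \Big( \frac1k I - \frac13 M_l^{-1}A \Big) M_l^{-1}F^1 + \Big( \frac1k I + \frac23 M_l^{-1}A \Big) M_l^{-1}F^2. \tag{9}$$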


Scaling of the system with the inverse of the lumped mass matrix is necessary since the matrices $M_l$ and $A$ do not commute; the matrices $[M_l^{-1}A]$ and $I$, however, do. Assembling or inverting the squared matrix $[M_l^{-1}A]^2$ is as difficult as dealing with the bigger system in (8). However, by analyzing the left side of (9) and by introducing the function

$$h(\lambda) = 1 + \frac23\lambda + \frac16\lambda^2,$$

describing the matrix of (9), a link to the Padé approximations of the exponential function is made. Up to the factor 2, the matrix in (9) is $\frac{1}{k^2} h[kM_l^{-1}A]$, which corresponds to the denominator of the (2,1) Padé approximation. As a matter of fact, this relation is known for every dG(r) formulation [5]. In [10] a factorization scheme is proposed to deal with systems of type (9). The denominators of the first sub-diagonal Padé approximations to the exponential function all do not have real roots [5]. Nevertheless we approximately factorize $h(\lambda)$ by:

$$h(\lambda) = 1 + \frac23\lambda + \frac16\lambda^2 \approx 1 + \frac{2}{\sqrt6}\lambda + \frac16\lambda^2 = \Big( 1 + \frac{1}{\sqrt6}\lambda \Big)^2 =: \tilde h(\lambda).$$

We replace the matrix on the left hand side of (9) by $\frac{2}{k^2}\tilde h[kM_l^{-1}A]$ and divide by 2 to obtain:

$$\Big[ \frac1k I + \frac{1}{\sqrt6} M_l^{-1}A \Big]^2 w^2 = \frac{1}{2k} M_l^{-1}(F^1 + F^2) - \frac16 M_l^{-1}A M_l^{-1}(F^1 - 2F^2). \tag{10}$$

The feasibility of this approximation can be seen from the generalized eigenvalue problem

$$\tilde h[kM_l^{-1}A]\, x = (1 + \rho)\, h[kM_l^{-1}A]\, x.$$

For every positive real eigenvalue $\lambda$ of $[kM_l^{-1}A]$ the corresponding value $\rho(\lambda) = \tilde h(\lambda)/h(\lambda) - 1$ is bounded between 0 and about 0.1, thus the eigenvalues of $h[kM_l^{-1}A]$ and of the approximation $\tilde h[kM_l^{-1}A]$ are very close. In [10] the impact of this factorization on the Newton method is analyzed.

Dealing with the Navier–Stokes equations brings along the difficulty of eigenvalues that are not positive real. For them, the factorization (10) degenerates if larger time steps are used. As a further approximation step we thus neglect all off-diagonal couplings between different velocity components in the solution matrices, that is, we neglect all couplings that arise from the term $(w\cdot\nabla v, \phi)$. This has the additional advantage that the velocity system decouples: for every direction $w_d$, $d = 1, 2, 3$, we can separately solve an equation of the type $[\alpha M_l + \beta A_0] w_d = f_d$ with some right hand side $f_d$. This reduces the effort to solving three scalar equations instead of one vector-valued equation with three unknowns.

This factorization is now used to solve for the update state $w^2$ in two sub-steps (both with the same matrix). $w^1$ is then obtained from (8). Altogether, the following three-step scheme is used to approximate the velocity prediction:

Algorithm 3: Step (i): Velocity Update

(i) $\Big[ \frac1k M_l + \frac{1}{\sqrt6} A \Big] y = \frac{1}{2k}(F^1 + F^2) - \frac16 A M_l^{-1}(F^1 - 2F^2)$,
(ii) $\Big[ \frac1k M_l + \frac{1}{\sqrt6} A \Big] w^2 = M_l\, y$,
(iii) $\frac3k M_l w^1 = 2f^1 - 4f^2 + Bq^2 + \frac1k M_l w^2 + A w^2$.
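The three sub-steps can be sketched with dense matrices as follows. The model matrices (a 1D Laplacian stiffness matrix and a lumped mass matrix proportional to the identity) and all names are assumptions for illustration only. Step (iii) is written with $F^1 - 2F^2$, which equals $2f^1 - 4f^2 + Bq^2$ by the definition of $F^1, F^2$ in (8).

```python
import numpy as np

def velocity_update(Ml, A, F1, F2, k):
    """Three-step approximation of the 2x2 velocity block system (8)."""
    Ml_inv = np.diag(1.0 / np.diag(Ml))       # lumped mass matrix is diagonal
    S = Ml / k + A / np.sqrt(6.0)             # the only matrix that is inverted
    rhs = (F1 + F2) / (2.0 * k) - A @ Ml_inv @ (F1 - 2.0 * F2) / 6.0
    y = np.linalg.solve(S, rhs)               # (i)
    w2 = np.linalg.solve(S, Ml @ y)           # (ii)
    w1 = (k / 3.0) * Ml_inv @ (F1 - 2.0 * F2 + Ml @ w2 / k + A @ w2)  # (iii)
    return w1, w2

# Small model problem (assumption): 1D Laplacian, lumped mass matrix hx * I.
n, k = 20, 0.01
hx = 1.0 / (n + 1)
Ml = hx * np.eye(n)
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / hx
rng = np.random.default_rng(1)
F1, F2 = rng.standard_normal(n), rng.standard_normal(n)

w1, w2 = velocity_update(Ml, A, F1, F2, k)

# Reference: direct solve of the coupled system (8).
K = np.block([[ Ml / k + 2.0 * A / 3.0, Ml / k + A / 3.0],
              [-Ml / k + A / 3.0,       Ml / k + 2.0 * A / 3.0]])
w_ref = np.linalg.solve(K, np.concatenate([F1, F2]))
rel_err_w2 = np.linalg.norm(w2 - w_ref[n:]) / np.linalg.norm(w_ref[n:])
```

For this symmetric model problem the relative deviation of `w2` from the exact solution of (8) stays below 10 %, in line with the eigenvalue bound for the factorization; within the Newton iteration such a deviation only affects the convergence rate, not the limit.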

Fig. 1 Configuration of the incompressible flow benchmark problem "Laminar Flow Around A Cylinder" and reference solution: the lift value over the time interval $I = [0, 8]$

The approximate factorization described here is also possible for higher order dG schemes. In [10] the method is described for dG(r), $r = 0, 1, 2$, and general results show that an approximate factorization works for all $r$.
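The closeness of the dG(1) polynomial $h(\lambda) = 1 + \frac23\lambda + \frac16\lambda^2$ and its factorized approximation $\tilde h(\lambda) = (1 + \lambda/\sqrt6)^2$ on the positive real axis can be checked numerically. The following sketch (sampling range and names are assumptions) evaluates the relative deviation $\tilde h/h - 1$:

```python
import numpy as np

def h(lam):
    """Denominator polynomial of the dG(1) scheme."""
    return 1.0 + 2.0 / 3.0 * lam + lam**2 / 6.0

def h_tilde(lam):
    """Approximate factorization of h with a double real root."""
    return (1.0 + lam / np.sqrt(6.0))**2

# Sample positive real arguments over eight decades.
lam = np.logspace(-4, 4, 200001)
rho = h_tilde(lam) / h(lam) - 1.0
rho_max = rho.max()
lam_at_max = lam[np.argmax(rho)]
```

The deviation is positive everywhere on this range, tends to zero for both small and large arguments, and attains its maximum of roughly 0.1 near $\lambda = \sqrt6$.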

4 Numerical Experiment

As a numerical test case we consider the benchmark problem "Laminar Flow Around A Cylinder" as proposed in [11]. Two- and three-dimensional, steady and unsteady flow problems have been described there. The left half of Fig. 1 shows the layout of the flow domain: a time-dependent, parabolic inflow profile is prescribed on the inflow boundary, and a no-slip condition is enforced on the boundaries of the channel and on the obstacle, a cylinder with a square cross-section. On the outflow boundary, the natural do-nothing condition is given [15]. The goal of the benchmark was to calculate the drag and lift values of the obstacle. The original unsteady benchmark configuration 3D-3Q in [11] yields a Reynolds number that depends on the inflow profile and lies in the interval $0 \le Re(t) \le 100$ for $t \in [0, 8]$. This (low) Reynolds number leads to a nearly stationary flow, see [12] for a discussion and analysis of the time-dependent benchmark problem. Thus, as a test case for the proposed method, we increase the Reynolds number to $0 \le Re(t) \le 400$ by adjusting the viscosity. All other values are taken as in [11].

Table 1 Maximum relative error in the lift and overall computational time for the Crank–Nicolson (CN) and for the dG(1) scheme on a mesh with about 5,000 Taylor–Hood elements (about 120,000 spatial degrees of freedom)

Time step   Crank–Nicolson                    dG(1)
            Error                   Time      Error                   Time
0.2         $1.31\cdot 10^{-1}$     145       $8.95\cdot 10^{-2}$     324
0.1         $4.19\cdot 10^{-2}$     284       $1.46\cdot 10^{-2}$     444
0.05        $1.21\cdot 10^{-2}$     485       $2.03\cdot 10^{-3}$     729
0.025       $3.11\cdot 10^{-3}$     854       $2.60\cdot 10^{-4}$     1,344

In the right half of Fig. 1 the course of the lift value over time, taken from a reference solution, is plotted. Large variations in time require rather small time steps. Overly dissipative methods like implicit Euler cannot reproduce these structures. In Table 1 we list the maximum error of the lift in the time interval for the standard Crank–Nicolson and the dG(1) scheme for different time step sizes. As expected, we observe about second order in time, $O(k^{1.9})$, for Crank–Nicolson and nearly third order, $O(k^{2.9})$, for dG(1). In [12], the coupled system of dG(1) is solved by directly applying the geometric multigrid method to (6). Compared to approximately 5 min solution time for every time step using the factorization and Schur complement technique described here, the solution of the coupled system takes nearly 3 h for every step.
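The convergence orders quoted above can be recomputed from the error columns of Table 1. The short sketch below (helper name assumed) evaluates the observed order between consecutive time step sizes:

```python
import math

# Time step sizes and maximum relative lift errors from Table 1.
steps = [0.2, 0.1, 0.05, 0.025]
err_cn = [1.31e-1, 4.19e-2, 1.21e-2, 3.11e-3]   # Crank-Nicolson
err_dg = [8.95e-2, 1.46e-2, 2.03e-3, 2.60e-4]   # dG(1)

def observed_orders(k, err):
    """Observed order log(e_i / e_{i+1}) / log(k_i / k_{i+1}) per refinement."""
    return [math.log(err[i] / err[i + 1]) / math.log(k[i] / k[i + 1])
            for i in range(len(k) - 1)]

orders_cn = observed_orders(steps, err_cn)
orders_dg = observed_orders(steps, err_dg)
```

The orders increase with refinement and approach 2 for Crank–Nicolson and 3 for dG(1) on the finest pair of step sizes, which indicates that the coarser steps are still in the pre-asymptotic regime.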

5 Conclusion

We have demonstrated an efficient way of handling discontinuous Galerkin methods as time discretization schemes. These schemes are highly attractive for the numerical approximation of the Navier–Stokes equations due to their Galerkin character. To deal with the saddle point character we apply a variant of the global Schur complement iteration described by Turek [15] to higher order finite elements. New is the way of dealing with the complex systems arising from the dG(r) time-stepping formulation. The large, coupled systems are approximately factorized. This leads to a multi-step algorithm for solving the linear system. Only standard Crank–Nicolson-like steps in the velocity space and a pressure-Poisson problem need to be solved. A numerical example demonstrates the feasibility of the proposed method for flow problems with moderate Reynolds number.


References
1. R. Becker, M. Braack, A finite element pressure gradient stabilization for the Stokes equations based on local projections, Calcolo Vol. 38, No. 4, pp 173-199 (2001)
2. R. Becker, M. Braack, Multigrid techniques for finite elements on locally refined meshes, Numerical Linear Algebra with Applications, Vol. 7, pp 363-379 (2000)
3. R. Becker, R. Rannacher, An Optimal Control Approach to A Posteriori Error Estimation in Finite Element Methods, Acta Numerica 2001, Cambridge University Press, Vol. 37, pp 1-225 (2001)
4. M. Braack, E. Burman, V. John, G. Lube, Stabilized finite element methods for the generalized Oseen problem, Comput. Methods Appl. Mech. Engrg. Vol. 196, 853-866 (2007)
5. C. Brezinski, J. Van Iseghem, A Taste of Padé Approximation, Acta Numerica 4:53-103, Cambridge University Press (1995)
6. V. Girault, P.-A. Raviart, Finite Element Methods for the Navier–Stokes Equations, Springer: Berlin-Heidelberg-New York (1986)
7. J.G. Heywood, R. Rannacher, S. Turek, Artificial boundaries and flux and pressure conditions for the incompressible Navier–Stokes equations, Int. J. Numer. Math. Fluids (22), pp 325-352 (1992)
8. C. Johnson, Error estimates and adaptive time-step control for a class of one-step methods for stiff ordinary differential equations, SIAM J. Numer. Anal., 25(4), pp 908–926 (1988)
9. D. Meidner, B. Vexler, Adaptive Space-Time Finite Element Methods for Parabolic Optimization Problems, SIAM J. on Control Optimization 46(1), pp 116–142 (2007)
10. T. Richter, A. Springer, B. Vexler, Efficient numerical realization of discontinuous Galerkin methods for temporal discretization of parabolic problems, accepted for Numerische Mathematik, 2012
11. M. Schäfer, S. Turek, Benchmark computations of laminar flow around a cylinder. (With support by F. Durst, E. Krause and R. Rannacher), Flow Simulation with High-Performance Computers II. DFG priority research program results 1993-1995, pp 547-566, Vieweg, Wiesbaden (1996)
12. M. Schmich, Adaptive Finite Element Methods for Nonstationary Incompressible Flow Problems, Dissertation, Universität Heidelberg (2009)
13. M. Schmich, B. Vexler, Adaptivity with dynamic meshes for space-time finite element discretizations of parabolic equations, SIAM J. Sci. Comput., Vol. 30, No. 1, pp 369-393.
14. M. Schmich, B. Vexler, Adaptivity with dynamic meshes for space-time finite element discretizations of parabolic equations, SIAM J. Sci. Comput., Vol. 30, No. 1, pp 369-393 (2008)
15. S. Turek, Efficient Solvers for Incompressible Flow Problems, Lecture Notes in Computational Science and Engineering 6, Springer (1999)
16. The Finite Element Toolkit Gascoigne, http://www.gascoigne.uni-hd.de.


Development of a Three Dimensional Euler Solver Using the Finite Volume Method on a Multiblock Structured Grid Tran Thanh Tinh, Dang Thai Son, and Nguyen Anh Thi

Abstract The ongoing efforts to develop an in-house Euler solver on a multiblock structured grid using the finite volume method are briefly presented in this paper. The flux through the control volume’s surface is computed using Roe’s scheme and extended to second order using the MUSCL approach. The steady state solution is determined using a time-marching approach with a modified Runge– Kutta scheme in the core. The acceleration of convergence to a steady solution is realized using a preconditioned multigrid method, a highly efficient method making explicit schemes such as the Runge–Kutta scheme competitive compared to implicit schemes. The numerical results clearly demonstrate the capability of the developed Euler solver to handle complex configurations and the superior efficiency of the preconditioned multigrid method. Keywords MUSCL • Multigrid • Preconditioning • Parallel computing

1 Introduction

Computational Fluid Dynamics (CFD) has made important progress during the last four decades. In broad terms, the field of CFD has developed in response to the need for accurate, efficient and robust numerical algorithms for solving increasingly complete descriptions of fluid motion over increasingly complex flow geometries. Even though viscous flow computations using the Reynolds-Averaged Navier–Stokes (RANS) equations have recently become feasible for complex industrial configurations (e.g. complete aircraft configurations), there is still room for improvement in the accuracy, efficiency and

T.T. Tinh D.T. Son N.A. Thi Ho Chi Minh City University of Technology, 268 Ly Thuong Kiet Street, Ho Chi Minh City, Vietnam e-mail: [email protected]; [email protected]; [email protected] H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0 23, © Springer-Verlag Berlin Heidelberg 2012


robustness of numerical algorithms for RANS simulations to allow their use in the analysis of complex fluid dynamics phenomena and in aircraft design [1]. To serve as a tool for flow analysis and aerodynamic shape design studies, it has recently been decided to develop an in-house Euler/RANS solver at Ho Chi Minh City University of Technology (HCMUT). The ultimate goal of this project is the construction of a reliable Navier–Stokes solver, which can be based upon a reliable Euler solver. There is no hope, at least as far as accuracy is concerned, that a mediocre inviscid model can be used successfully for viscous flow computations. This paper mainly focuses on the efficiency of the developed Euler solver. The preconditioned multigrid method suggested by Pierce [2] has been implemented to accelerate the convergence to a steady solution. Parallel computing on a Beowulf-type system has also been exploited to cut down the turnaround time. The paper is organized as follows. Section 2 presents the governing equations (Euler equations). Numerical methods, including space and temporal discretizations, a block-Jacobi preconditioning method and a multigrid method, are discussed in Sect. 3. The implementation of parallel computing on a Beowulf-type cluster system using the message passing interface and load balancing is presented in Sect. 4. Numerical results are discussed in Sect. 5. The paper ends with the conclusion section.

2 Governing Equations

A well-known approximation of the Navier–Stokes equations is obtained by assuming a non-viscous and non-heat-conducting fluid. The resulting Euler equations can be written in terms of the conservative variables $U$ as [1, 3]:

$$\frac{\partial U}{\partial t} + \frac{\partial F(U)}{\partial x} + \frac{\partial G(U)}{\partial y} + \frac{\partial K(U)}{\partial z} = 0 \tag{1}$$

with $U = (\rho, \rho u, \rho v, \rho w, \rho E)^T$ being the vector of conservative variables, containing the density $\rho$, the three velocity components $u, v, w$ and the energy per unit mass $E$. The flux vectors $F(U)$, $G(U)$ and $K(U)$ are respectively given by:

$$F = \begin{pmatrix} \rho u \\ \rho u^2 + p \\ \rho u v \\ \rho u w \\ \rho u h_0 \end{pmatrix}, \quad G = \begin{pmatrix} \rho v \\ \rho u v \\ \rho v^2 + p \\ \rho v w \\ \rho v h_0 \end{pmatrix}, \quad K = \begin{pmatrix} \rho w \\ \rho u w \\ \rho v w \\ \rho w^2 + p \\ \rho w h_0 \end{pmatrix}, \tag{2}$$

where $h_0$ is the total or stagnation enthalpy, defined as $h_0 = E + p/\rho$. Using the perfect gas assumption $p = \rho R_g T$, where $R_g$ is the perfect gas constant, the static pressure $p$ is given by:

$$p = (\gamma - 1)\,\rho \Big[ E - \frac12 (u^2 + v^2 + w^2) \Big], \tag{3}$$

where $\gamma$ is the specific heat ratio. This relation ensures the closure of the system of equations. System (1) can be rewritten in compact form by introducing the operator $\vec H = F\vec i + G\vec j + K\vec k$:

$$\frac{\partial U}{\partial t} + \vec\nabla \cdot \vec H(U) = 0. \tag{4}$$

Integrating system (4) over a control volume $\Omega$ with enclosing surface $S$ and applying the Gauss divergence theorem, one gets

$$\frac{\partial}{\partial t} \iiint_\Omega U\, d\omega + \iint_S \vec H \cdot \vec n\, dS = 0, \tag{5}$$

where $\vec n = (n_x, n_y, n_z)^T$ is the outward pointing normal of the surface $S$. Equation (5) simply states that the rate of change of the conservative variables in a finite control volume $\Omega$ is balanced by the net flux $\vec H$ passing through the control volume's surface $S$.
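The flux vectors (2) and the pressure relation (3) translate directly into code. The following minimal sketch evaluates the normal flux $\vec H \cdot \vec n = F n_x + G n_y + K n_z$ for a conservative state; the function names and the value $\gamma = 1.4$ (air) are assumptions, since the text does not specify them.

```python
import numpy as np

GAMMA = 1.4  # assumed specific heat ratio (air); not specified in the text

def pressure(U, gamma=GAMMA):
    """Static pressure (3) from the conservative state U = (rho, rho*u, rho*v, rho*w, rho*E)."""
    rho, mx, my, mz, rhoE = U
    kinetic = 0.5 * (mx**2 + my**2 + mz**2) / rho
    return (gamma - 1.0) * (rhoE - kinetic)

def normal_flux(U, n, gamma=GAMMA):
    """Euler flux H.n = F*nx + G*ny + K*nz through a face with unit normal n."""
    rho, mx, my, mz, rhoE = U
    p = pressure(U, gamma)
    vn = (mx * n[0] + my * n[1] + mz * n[2]) / rho   # normal velocity
    h0 = (rhoE + p) / rho                            # stagnation enthalpy E + p/rho
    return np.array([rho * vn,
                     mx * vn + p * n[0],
                     my * vn + p * n[1],
                     mz * vn + p * n[2],
                     rho * h0 * vn])

# Example: uniform flow in x-direction with rho = 1, u = 2, p = 1.
U_example = np.array([1.0, 2.0, 0.0, 0.0, 1.0 / (GAMMA - 1.0) + 0.5 * 4.0])
flux_x = normal_flux(U_example, np.array([1.0, 0.0, 0.0]))
```

Working with the normal flux instead of the three separate vectors is convenient for a finite volume code, since only the face normal changes from face to face.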

3 Numerical Methods

3.1 Space Discretization

In order to pass from the continuous to the discrete form, the unknown vector $U$, which is collocated at the cell vertices, is assumed to be constant in each elementary control volume. Therefore, in (5), the unknown vector $U$ can be taken out of the first integral, and the second integral is replaced by a summation over the $N_f$ faces of the chosen control volume $V_j$:

$$\frac{\partial}{\partial t} U_j = -\frac{1}{V_j} \sum_{k=1}^{N_f} [\vec H \cdot \vec n]_{j,k}\, \Delta S_{j,k}, \tag{6}$$

where $[\vec H \cdot \vec n]_{j,k}$ is the total flux normal to the surface $\Delta S_{j,k}$ exchanged between points $j$ and $k$. This flux is defined by the two conservative states on the two sides of the surface $\Delta S_{j,k}$: $U^L = U_j$ and $U^R = U_k$.

Our target is now to define the numerical flux in (6). Let $\tilde H^n_{i+1/2}$ be the numerical flux function across the cell surface shared by the cells $(i, j, k)$ and $(i+1, j, k)$, whose states are respectively denoted $U^L = U_{i,j,k}$ and $U^R = U_{i+1,j,k}$. Using a Taylor series expansion, $\tilde H^n_{i+1/2}$ can be expressed as [3]

$$\tilde H^n_{i+1/2} = \frac12 \big[ H_n(U^L) + H_n(U^R) - \delta H^n_{i+1/2} \big], \tag{7}$$

where $\delta H^n_{i+1/2}$ is an artificial dissipation. The exact formulation of $\delta H^n_{i+1/2}$ depends on the space discretization method adopted. In this study, Roe's scheme [3] has been implemented. Roe's scheme is probably the most popular among the approximate Riemann solvers. It is based on the average state of the left and right unknowns, so that the term $\delta H^n_{i+1/2}$ can be expressed as:

$$\delta H^n_{i+1/2} = H_n(U^R) - H_n(U^L) = |\hat D(U^L, U^R)|\,(U^R - U^L), \tag{8}$$

where $|\hat D(U^L, U^R)|$ is the average Jacobian matrix [3].

However, the formulation given by (7) and (8) corresponds only to a first order accurate discretization, as can easily be shown by a Taylor series expansion. It is therefore essential to look for a higher order scheme in order to reduce or possibly eliminate the diffusive character of the first order scheme. In 1979, Van Leer [2] pointed out that spatially accurate results could be obtained from Godunov-type schemes by simply replacing the piecewise constant initial data of the Riemann problem with linearly varying initial data. All MUSCL (Monotone Upstream Schemes for Conservation Laws) schemes [1, 3] rely on this concept. The evaluation of the numerical flux is based upon new left and right states, which correspond to second and third order spatial differencing:

$$U^R_{j+1/2} = U_{j+1} - \frac14 \big[ (1 - \kappa)\,\Delta_{j+3/2} + (1 + \kappa)\,\Delta_{j+1/2} \big], \tag{9}$$

$$U^L_{j+1/2} = U_j + \frac14 \big[ (1 - \kappa)\,\Delta_{j-1/2} + (1 + \kappa)\,\Delta_{j+1/2} \big], \tag{10}$$

where $\Delta_{j+1/2} = U_{j+1} - U_j$. Spatial accuracy is controlled by the parameter $\kappa$ [1, 3]. Direct application of (9), (10) produces spurious oscillations wherever the solution is not smooth. To make the scheme oscillation-free in the vicinity of discontinuities, the amplitude of the forward/backward differences used to reconstruct the left and right states in (9), (10) has to be limited within some bounds. In the MUSCL approach, such a restriction is obtained via nonlinear functions called limiters. The numerical results presented hereafter are obtained with a minmod limiter [3].
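A one-dimensional sketch of the limited reconstruction (9), (10) is given below. It uses the simplest minmod variant, in which each difference is limited against its neighbor without a compression factor; this choice, the default $\kappa = 1/3$ and the function names are assumptions, as the text does not state which variant was implemented.

```python
import numpy as np

def minmod(a, b):
    """Return the argument of smaller magnitude if a and b share a sign, else 0."""
    return np.where(a * b > 0.0, np.sign(a) * np.minimum(np.abs(a), np.abs(b)), 0.0)

def muscl_states(U, kappa=1.0 / 3.0):
    """Limited left/right states at the faces j+1/2, j = 1..N-3, of a 1D array U."""
    d = np.diff(U)            # d[j] = U[j+1] - U[j] = Delta_{j+1/2}
    dm = d[:-2]               # Delta_{j-1/2}
    dp = d[1:-1]              # Delta_{j+1/2}
    dpp = d[2:]               # Delta_{j+3/2}
    UL = U[1:-2] + 0.25 * ((1 - kappa) * minmod(dm, dp) + (1 + kappa) * minmod(dp, dm))
    UR = U[2:-1] - 0.25 * ((1 - kappa) * minmod(dpp, dp) + (1 + kappa) * minmod(dp, dpp))
    return UL, UR

# Smooth (linear) data is reconstructed exactly at the face midpoints ...
x = np.linspace(0.0, 1.0, 6)
UL_lin, UR_lin = muscl_states(2.0 * x + 1.0)
# ... while a step profile falls back to first order without over- or undershoot.
UL_step, UR_step = muscl_states(np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0]))
```

With this plain minmod variant the two limited differences coincide, so the result no longer depends on $\kappa$; limiter variants with a compression parameter retain the $\kappa$-dependence in smooth regions.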

3.2 Time Discretization

The modified 5-stage Runge–Kutta method developed by Martinelli [2, 5, 6] is a popular semi-discrete scheme. This method has a much larger stability region than the conventional Runge–Kutta scheme [3], so it is preferred for implementation. In addition, the use of a block-Jacobi preconditioner substantially improves the damping and propagative efficiency of the Runge–Kutta time-stepping scheme. In this study, a block-Jacobi preconditioner has been implemented; Pierce [7, 8] has shown that it clusters the eigenvalues of the residual operator into the region of the Runge–Kutta scheme's stability domain that has high damping coefficients. The initial semi-discrete equation is preconditioned by a local preconditioner $P_j^{-1}$ [2, 7–9]:

$$P_j^{-1}\,\frac{\partial U_j}{\partial t} + R(U_j) = 0, \qquad (11)$$

with $R(U_j)$ being the residual vector of the spatial discretization at the $j$th control volume. The modified 5-stage Runge–Kutta scheme can then be expressed as [4–6]:

$$U_j^{(0)} = U_j^n,$$
$$U_j^{(k)} = U_j^n - \alpha_k\,\mathrm{CFL}\,P_j\,R_j^{(k-1)}, \qquad k = 1, \dots, 5, \qquad (12)$$
$$U_j^{(n+1)} = U_j^{(5)},$$

where $R_j^{(k-1)} = C_j(U^{(k-1)}) - B_j^{(k-1)}$ and $B_j^{(k-1)} = \beta_k D_j(U^{(k-1)}) + (1-\beta_k)\,B_j^{(k-2)}$, with $C_j(U^{(k-1)})$ being the convective contribution to $R_j$ and $D_j(U^{(k-1)})$ being the remaining parts due to both physical and numerical dissipation. For stability, the CFL number is set to 1.5. In this method, setting some of the coefficients $\beta_k$ to zero avoids the computation of the dissipative terms on the corresponding stages and thus reduces the computational cost. The classical Runge–Kutta scheme is recovered if all $\beta_k$ are equal to 1 [10, 11].

The multigrid methodology is another powerful acceleration technique considered here. It is based on the solution of the governing equations on a series of successively coarser grids. In this method, the low-frequency error components on the finest grid become high-frequency components on the coarser grids and are successively damped. As a result, the entire error is reduced very quickly, and the convergence is significantly accelerated. This study implemented a V-cycle multigrid which performs one correction on each level, as illustrated in Fig. 1a. The V-cycle algorithm uses two transfer operators: restriction, which transfers residuals from fine to coarse grids, and prolongation, which transfers corrections from the coarse grid to the fine grid.
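The stage loop of (12) can be illustrated with a scalar sketch. Several things here are assumptions rather than statements about the present solver: the stage coefficients `ALPHA` and `BETA` are the values commonly quoted for the Martinelli/Jameson five-stage scheme (the text does not list them), `dt` stands in for the product of the CFL number and the local preconditioner $P_j$, and all function names are hypothetical.

```python
# Commonly quoted five-stage coefficients (assumption, not given in the text).
ALPHA = (1.0 / 4.0, 1.0 / 6.0, 3.0 / 8.0, 1.0 / 2.0, 1.0)
BETA = (1.0, 0.0, 0.56, 0.0, 0.44)


def rk5_step(u, convective, dissipative, dt, alpha=ALPHA, beta=BETA):
    """One modified multistage step for du/dt + R(u) = 0 with R = C - B.

    convective(u)  -> C(u), evaluated on every stage
    dissipative(u) -> D(u), evaluated only on stages with beta_k != 0
    dt             -> scalar stand-in for CFL * P_j
    """
    u_stage = u          # U^(0) = U^n
    blended = 0.0        # B^(k-2); irrelevant on the first stage when beta_1 = 1
    for a_k, b_k in zip(alpha, beta):
        if b_k != 0.0:
            # B^(k-1) = beta_k * D(U^(k-1)) + (1 - beta_k) * B^(k-2)
            blended = b_k * dissipative(u_stage) + (1.0 - b_k) * blended
        # With beta_k = 0 the previous dissipation is reused without a new evaluation.
        residual = convective(u_stage) - blended
        u_stage = u - a_k * dt * residual   # U^(k) = U^n - alpha_k * dt * R^(k-1)
    return u_stage                          # U^(n+1) = U^(5)


# Linear test problem du/dt = -u with no dissipative part.
u_new = rk5_step(1.0, convective=lambda u: u, dissipative=lambda u: 0.0, dt=0.1)
print(u_new)
```

With these coefficients the dissipative operator is evaluated on three of the five stages only, which is the cost saving described above.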

Fig. 1 (a) V-cycle multigrid method, and (b) Host-node structure of the parallel multiblock solver


4 Parallel Implementation

Since a structured grid is used, the physical domain is partitioned into a finite number of sub-domains, hereafter called blocks, in order to handle complex configurations. Each block is discretized using a structured grid, and the blocks are connected to each other at the block boundaries. Each block is virtually surrounded by two ghost layers which are used for the formulation of boundary conditions (this ensures a second-order space discretization at the boundary). At block interfaces, these ghost cells correspond to cells of the neighboring blocks. The program then treats the blocks more or less independently of each other, which can only be done properly by exchanging the data of the current solution at the block interfaces before each time step. This makes a multiblock structured grid solver suitable for distributed parallel computing: the blocks are distributed to different processors, and data are transferred between blocks (at block interfaces) using message passing. Parallel computing helps to cut down the elapsed time to solution, and parallel computers usually offer a large primary memory, which is critical for industrial-size applications.

The distributed memory programming model has been adopted for parallelizing the developed multiblock solver. The Single Program Multiple Data (SPMD) programming style has been used: all processors execute the same program acting on different parts of the data set. The host-node (master-slave) programming model is shown in Fig. 1b. A host process starts the distributed application and performs the input, the output and the data transfers with the node processes. The host process does not participate in the flow computation, which is done on the node processes. The host process reads the same input files as the sequential program, including the block connectivity (topology) file and the grid coordinates file, together with the computer's system/network topology.

A load balancing formulation based on the algorithm suggested by Eisfeld et al. [12], which takes into account the grid connectivity and the network topology, is used. The optimal partition of blocks on the corresponding computer system is determined by using the Genetic Algorithm (GA) described in [13]. The blocks are then distributed to different processes for flow computation. Neighboring blocks synchronously exchange data at the end of each time step. The OpenMPI library [12] is used for message passing between processors.
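The ghost-layer exchange at a block interface can be pictured with a small in-memory sketch for two one-dimensional blocks. It is only a stand-in for the message passing used in the solver: the list layout (ghost cells, interior cells, ghost cells) and the function name are hypothetical, and plain list copies replace the send/receive calls.

```python
NUM_GHOST = 2  # two ghost layers per block side, as described in the text


def exchange_ghosts(left_block, right_block, ng=NUM_GHOST):
    """Fill the ghost cells at the interface between two neighbouring 1D blocks.

    Each block is a list laid out as [ng ghosts | interior cells | ng ghosts].
    In a distributed run the two copies below would be a send/receive pair.
    """
    # Last interior cells of the left block -> left ghost cells of the right block.
    right_block[:ng] = left_block[-2 * ng:-ng]
    # First interior cells of the right block -> right ghost cells of the left block.
    left_block[-ng:] = right_block[ng:2 * ng]


left = [0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 0.0, 0.0]
right = [0.0, 0.0, 5.0, 6.0, 7.0, 8.0, 0.0, 0.0]
exchange_ghosts(left, right)
print(left)
print(right)
```

After the exchange, each block can apply its stencil near the interface using only local data, which is what allows the blocks to be advanced independently within a time step.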

5 Results

To demonstrate the capability of the developed Euler solver, two test cases at opposite ends of the complexity range are computed. The first one involves the flow around the RAE2822 airfoil. The second one is the flow around the DLR-F6 body/wing/nacelle configuration.

5.1 RAE2822 Supercritical Airfoil

The first test case involves the flow around the RAE2822 airfoil. The flow condition considered corresponds to $M_\infty = 0.55$ and $\alpha = 2.31^\circ$. A grid of $259 \times 65 \times 9$ grid points, partitioned into 16 blocks, was used for this computation. Figure 2 shows the iso-Mach lines around the RAE2822 airfoil resulting from single-grid and multigrid (2 grid levels) computations. The pressure distributions around the airfoil and the residual history are shown in Fig. 3. The results obtained are practically the same for the single-grid and multigrid computations. A minor difference is, however, observed in the vicinity of the stagnation point. This may be due to flaws in the multigrid operators that still need to be resolved. The efforts to solve this problem are ongoing.

Fig. 2 Iso-Mach contours of RAE2822 (panels: Mach contour, grid level = 1; Mach contour, grid level = 2)

Fig. 3 Pressure distribution ($-C_p$ versus $x/c$) and residual history (log(Residual) versus cycles) of RAE2822, each for grid level = 1 and grid level = 2

5.2 DLR-F6 Body/Wing/Nacelle Configuration

The flow around the DLR-F6 body/wing/nacelle configuration, a much more complex aerodynamic configuration, has been computed to evaluate the capability of the present solver. The flow condition considered corresponds to $M_\infty = 0.75$ and $\alpha = 0.49^\circ$. The parallel version is used for this computation. A relatively coarse grid of 309,660 grid points is unevenly partitioned into 354 blocks. An even distribution of these blocks over the cluster is ensured by a preprocessor that takes into account the network configuration and the grid topology. Computation workload and communication overhead are modeled using the approach suggested by Eisfeld et al. [12]. The partition problem is finally reduced to an optimization problem whose solution is determined by using a genetic algorithm [13]. The computation workload per iteration for the initial and the optimal distribution of the 354 blocks on 20 processors is shown in Fig. 4. The computation workload per iteration is evenly distributed over the 20 processors once the optimization is applied. It is important to note that the communication overhead has also been taken into account in optimizing the distribution. Figure 5 shows the grid and the iso-Mach contours around the DLR-F6 configuration. This study essentially focuses on the efficiency aspect of the solver. The validity of the present code will be evaluated in the near future.

Fig. 4 Distribution of blocks on 20 processors before and after optimization (panels: an arbitrary distribution of blocks; optimal distribution of blocks; axes: t (second) versus processors)

Fig. 5 (a) 354 blocks grid, and (b) Iso-Mach lines of DLR-F6 (grid level = 1)

The convergence rates of the single-grid and multigrid (2 grid levels) versions are shown in Fig. 6a. First-order space discretization was used on all grid levels. In the single-grid computation, the density residual drops to machine accuracy after 4,500 iterations. The convergence rate of the multigrid computation is clearly higher than that of the single-grid computation during the initial 200 cycles. Beyond 200 cycles, however, the density residual stalls. This is undoubtedly due to an error introduced by the multigrid operators. The efforts to solve this problem are ongoing. The parallel speed-up, defined as $S_p = T_1 / T_p$ with $T_1$ and $T_p$ being the execution time of the sequential algorithm and the execution time of the parallel algorithm on $p$ processes, respectively, is shown in Fig. 6b. Even though a significant speed-up has been obtained, there is still room to improve the parallel efficiency.

Fig. 6 (a) Residual history (log(Residual) versus cycles, grid level = 1 and grid level = 2), and (b) Speed up of DLR F6 (real and ideal speed up versus processors)
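The speed-up definition $S_p = T_1 / T_p$ and the associated parallel efficiency can be evaluated as in the short sketch below. The efficiency $E_p = S_p / p$ is the usual textbook definition and is added here as an assumption, and the timings in the usage lines are made-up numbers, not measurements from this study.

```python
def speedup(t_sequential, t_parallel):
    """S_p = T_1 / T_p."""
    return t_sequential / t_parallel


def efficiency(t_sequential, t_parallel, processes):
    """E_p = S_p / p (standard definition, assumed here)."""
    return speedup(t_sequential, t_parallel) / processes


# Hypothetical timings in seconds, for illustration only.
t1, tp, p = 100.0, 10.0, 20
print(speedup(t1, tp), efficiency(t1, tp, p))
```

An efficiency well below 1 on a given number of processes indicates that communication or load imbalance consumes a noticeable share of the run time.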

6 Conclusion

A three-dimensional Euler solver capable of handling complex aerodynamic configurations has been successfully implemented. Preconditioning and multigrid methods have been included to accelerate the convergence to the steady-state solution. To cut down the elapsed time, a parallel version of this code has been developed using a message passing model. It was shown that the combination of these technologies helps to drastically reduce the turnaround time of steady-state computations. There are, however, still robustness problems associated with the multigrid implementation to be solved. The accuracy of the developed solver will be thoroughly evaluated. As the ultimate goal of this study, a URANS solver will be developed on the basis of the present Euler solver.

Acknowledgements This research work is partially supported by the Vietnam National University at Ho Chi Minh City through the key research project “Development of Grid computing environment for large scale computation applications” (Grant #B2007-20-09TD) led by Dr. Nguyen Thanh Son at the Department of Computer Sciences and Engineering (CSE) of HCMUT. The technical support of Dr. T. V. Hoai and Mr. T. N. Thuan at CSE in making the Supernode II cluster system available for this work is highly appreciated.


References

1. J. Blazek: Computational fluid dynamics: principles and application. Elsevier, 2001
2. N. A. Pierce: Preconditioned multigrid methods for compressible flow calculations on stretched meshes. PhD Thesis, Oxford University (1997)
3. M. Manna: A three dimensional high resolution upwind finite volume Euler solver. Technical note, April 1992, von Karman Institute
4. P.L. Roe: Approximate Riemann Solvers, Parameter Vector and Difference Schemes. Journal of Computational Physics 43, 1, 357–372 (1981)
5. R.C. Swanson, E. Turkel: Multistage schemes with multigrid for Euler and Navier–Stokes equations - components and analysis. NASA Technical paper 3631. August 1997
6. A. Jameson, W. Schmidt, E. Turkel: Numerical solutions of the Euler equations by finite volume methods using Runge–Kutta time-stepping schemes. AIAA Paper 81-1259 (1981)
7. N. A. Pierce, M. B. Giles: Preconditioned multigrid methods for compressible flow calculations on stretched meshes. Journal of Computational Physics 136, 425–445 (1997)
8. N.A. Pierce, M.B. Giles, A. Jameson, L. Martinelli: Accelerating three-dimensional Navier–Stokes calculations. AIAA Paper 97-1953 (1997)
9. D. Lee, B. van Leer: Progress in local preconditioning of the Euler and Navier–Stokes equations. AIAA Paper 97-3328-CP (1993)
10. K. Hosseini, J. J. Alonso: Optimization of multistage coefficients for explicit multigrid flow solvers. AIAA Paper 3705 (2003)
11. K. Hosseini, J. J. Alonso: Practical implementation and improvement of preconditioning methods for explicit multistage solvers. AIAA Paper 0763 (2004)
12. B. Eisfeld, H. M. Bleecke, N. Kroll, H. Ritzdorf: Structured grid solver II: Parallelization of block structured flow solvers. AGARD-FDP-VKI special course on “Parallel computing in CFD”, Von Karman Institute (1995)
13. G. Winter, J. Périaux, M. Galan, P. Cuesta: Genetic algorithms in engineering and computer science. John Wiley & Sons (1995)

Hybrid Algorithm for Risk Conscious Chemical Batch Planning Under Uncertainty

Thomas Tometzki and Sebastian Engell

Abstract We consider planning problems of flexible chemical batch processes paying special attention to uncertainties in problem data. The optimization problems are formulated as two-stage stochastic mixed-integer models in which some of the decisions (first-stage) have to be made under uncertainty and the remaining decisions (second-stage) can be made after the realization of the uncertain parameters. The uncertain model parameters are represented by a finite set of scenarios. The risk conscious planning problem under uncertainty is solved by a stage decomposition approach using a multi-objective evolutionary algorithm which optimizes the expected scenario costs and the risk criterion with respect to the first-stage decisions. The second-stage scenario decisions are handled by mathematical programming. Results from numerical experiments for a multi-product batch plant are presented.

1 Introduction

During the operation of flexible batch plants, a large number of decisions have to be made in real time and under significant uncertainties. Any prediction about the evolution of the demands, the availability of the processing units and the performance of the processes is necessarily based on incomplete data. Resource assignment decisions must be made at a given point in time despite the fact that their future effects cannot be foreseen exactly. The existing approaches to planning under uncertainty can be classified into reactive approaches and stochastic approaches. The former use deterministic models and modify a nominal decision if an unexpected event occurs, whereas the latter reflect the uncertainty in the models. A recent overview of relevant solution techniques for the class of mixed-integer stochastic models was provided in [8, 9, 15].

The focus of this work is on the solution of two-stage stochastic mixed-integer problems. They are solved by a stage decomposition based hybrid evolutionary approach which was first published in [14]. For two-stage mixed-integer programs with a large number of scenarios, or when good solutions are needed quickly, the hybrid approach provides better results than the formulation and solution of monolithic MILPs [15]. The solution of two-stage mixed-integer stochastic programs in [15] aims at maximizing the expected profit. However, plant managers frequently also try to avoid the occurrence of very unfavorable situations, e.g. heavy losses. Naturally, they aim at a compromise between expected profit and accepted risk. Using the scenario-based model and the two-stage stochastic approach, the risk can be controlled. This contribution introduces a multi-objective evolutionary approach to two-stage stochastic optimization with additional risk objectives.

T. Tometzki, S. Engell: Process Dynamics and Operations Group, Department of Biochemical and Chemical Engineering, Technische Universität Dortmund, 44221 Dortmund, Germany; e-mail: [email protected]; [email protected]

H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_24, © Springer-Verlag Berlin Heidelberg 2012

2 Two-Stage Stochastic Mixed-Integer Programs

A two-stage stochastic mixed-integer program is a deterministic model that can be used to model uncertainties in problem data [3]. In a two-stage program, some decisions can be made after the uncertainty is realized. The decisions are therefore divided into the first-stage decisions $x$, which have to be taken before the uncertainty is disclosed, and the second-stage decisions $y_\omega$, which are taken after the uncertainty is realized. It is assumed that the random parameters have a finite number of realizations that can be modeled by a discrete set of scenarios $\omega = 1, \dots, \Omega$. For a finite number of scenarios with fixed probabilities, the two-stage problem can be represented by a large mixed-integer linear program (MILP) that can be written as

$$\min_{x,\,y_\omega} f(x, y_\omega) = c^T x + \sum_{\omega=1}^{\Omega} \pi_\omega\, q_\omega^T y_\omega \qquad (1)$$

$$\text{s.t.} \quad A x \le b \qquad (2)$$

$$W_\omega y_\omega \le h_\omega - T_\omega x \qquad (3)$$

$$x \in X, \quad y_\omega \in Y, \quad \forall \omega = 1, \dots, \Omega \qquad (4)$$

The objective function (1) of the problem consists of the first-stage costs and the expected value of the second-stage costs. The costs are calculated as linear functions of the first-stage variables $x$ and the second-stage variables $y_\omega$ with vectors of cost parameters $c$ and $q_\omega$. The second-stage costs are calculated over all scenario costs with the corresponding probabilities $\pi_\omega$, with $\sum_{\omega=1}^{\Omega} \pi_\omega = 1$.


The linear inequality constraints of a two-stage mixed-integer programming problem are classified into constraints on the first-stage variables only (first-stage constraints) (2) and constraints on the first and on the second-stage variables (second-stage constraints) (3). The sets X and Y contain the domains of the variables including integer requirements.

3 Decomposition of a Two-Stage Stochastic Program

Two-stage stochastic programs with a finite number of scenarios can be represented as large-scale mixed-integer linear programs which are in principle amenable to off-the-shelf solvers. However, particularly when the number of scenarios is large or when solutions have to be generated in a short time, the performance of such solvers may not be sufficient [15]. The solution technique proposed in this work is based on stage decomposition. The idea of stage decomposition is to remove the tie between the second-stage subproblems by fixing the first-stage decisions [9]. The second-stage scenario subproblems are significantly smaller than the full two-stage problem, so that good solutions can be generated quickly using standard algorithms. The master problem is a function of the first-stage decisions $x$ only and can be formulated as:

$$\min_x f(x) = c^T x + Q(x) \qquad (5)$$

$$\text{s.t.} \quad A x \le b, \quad x \in X. \qquad (6)$$

The second-stage value function $Q(x)$ is given by the expected value of $\Omega$ independent second-stage functions $Q_\omega(x)$:

$$Q(x) = \sum_{\omega=1}^{\Omega} \pi_\omega\, Q_\omega(x). \qquad (7)$$

The evaluation of $Q(x)$ requires the solution of $\Omega$ subproblems over the second-stage variables $y_\omega$:

$$Q_\omega(x) = \min_{y_\omega} q_\omega^T y_\omega \qquad (8)$$

$$\text{s.t.} \quad W_\omega y_\omega \le h_\omega - T_\omega x, \quad y_\omega \in Y. \qquad (9)$$

The constraints of the master problem (5–6) are scenario independent, while the parameters of the second-stage constraints in (9) may vary from scenario to scenario. The vector of the first-stage variables $x$ appears as a fixed parameter in the constraints of the second-stage scenario problems.


The main algorithmic idea of the stage decomposition based hybrid evolutionary approach is to address the master problem given by (5–6) by an evolutionary algorithm. An evolutionary algorithm is not restricted to explicitly given functions; it can therefore be applied to objective functions in which an implicit term $Q(x)$, regarded as a black-box function, is provided externally. To evaluate (5), the $\Omega$ subproblems given by (8–9) are solved independently by a MILP solver.
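The evaluation of the master objective (5) for a fixed first-stage decision can be sketched as below. This is a toy illustration under stated assumptions: the scenario subproblems (8–9) are solved by brute-force enumeration over a small integer domain instead of a MILP solver, and the data layout, the function names and the one-product example are all hypothetical.

```python
from itertools import product


def second_stage_cost(x, scenario, y_domain):
    """Q_omega(x): minimise q^T y over y in Y subject to W y <= h - T x.

    Brute-force enumeration stands in for the MILP solver (assumption).
    Returns None if no feasible recourse exists.
    """
    q, W, h, T = scenario["q"], scenario["W"], scenario["h"], scenario["T"]
    rhs = [h_i - sum(t * x_j for t, x_j in zip(T_i, x)) for h_i, T_i in zip(h, T)]
    best = None
    for y in product(*y_domain):
        feasible = all(sum(w * y_j for w, y_j in zip(W_i, y)) <= r
                       for W_i, r in zip(W, rhs))
        if feasible:
            cost = sum(q_j * y_j for q_j, y_j in zip(q, y))
            if best is None or cost < best:
                best = cost
    return best


def master_objective(x, c, scenarios, probabilities, y_domain):
    """f(x) = c^T x + sum_omega pi_omega * Q_omega(x); None if some Q_omega is infeasible."""
    total = sum(c_j * x_j for c_j, x_j in zip(c, x))
    for scenario, pi in zip(scenarios, probabilities):
        q_value = second_stage_cost(x, scenario, y_domain)
        if q_value is None:
            return None
        total += pi * q_value
    return total


# Toy data: produce x units now at cost 1; buy the shortfall y later at cost 3.
# The demand constraint x + y >= d is written as -y <= -d + x.
scenarios = [{"q": [3], "W": [[-1]], "h": [-d], "T": [[-1]]} for d in (2, 4)]
print(master_objective([3], c=[1], scenarios=scenarios,
                       probabilities=[0.5, 0.5], y_domain=[range(0, 6)]))
```

Because the subproblems share no variables once $x$ is fixed, the loop over scenarios could be executed in any order or in parallel; the evolutionary algorithm only sees the resulting value of $f(x)$.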

4 Risk Conscious Planning Under Uncertainty

The two-stage stochastic optimization approach described above accounts for uncertainty by optimizing the expected profit without reflecting or controlling the variability of the performance associated with each specific scenario. Therefore, there is no guarantee that the process will perform at a certain level for all uncertain scenarios [12, 13]. For the solution with the best expected cost there may exist scenarios with poor outcomes, i.e. high costs. From an economic point of view, a high economic loss or other disadvantageous outcomes should be avoided. The measure of the occurrence of such disadvantageous events, or of their degree of damage, is termed risk. For given first-stage variables $x$, the scenario costs are random variables; thus the consequences of a decision are given by the distribution of the scenario costs and can be graded according to various risk measures. Incorporating the trade-off between risk and profit leads to a multi-objective optimization problem in which the expected performance and the risk measure are the two objectives.

Different criteria for assessing risk have been proposed in the literature [2, 13]. The standard deviation for a given set of scenarios is one of the metrics commonly used for quantifying variability. Alternative approaches for integrating risk as a second criterion include, amongst others, the value at risk (VaR), the conditional value at risk (CVaR) and the worst-case performance. The risk conscious criteria are expressed in the two-stage stochastic program by a second master problem $\min_x r(x)$. The function $r(x)$ is determined by the formal definition of the risk function, which is based on the second-stage cost values $Q_\omega(x)$. The risk measures used in this work are:

• Standard deviation: The standard deviation is a measure of how broad the distribution of the scenario costs is. It reflects the chance that the actual costs may differ largely from the expected costs.
• Value at risk (VaR$_\alpha$): For a given scenario cost distribution and a confidence level $\alpha$, the value of VaR$_\alpha$ is the cost of the most favorable scenario among the $(1-\alpha) \cdot 100\%$ most unfavorable scenarios.
• Conditional value at risk (CVaR$_\alpha$): For a given scenario cost distribution and a confidence level $\alpha$, the value of CVaR$_\alpha$ is the mean cost of the $(1-\alpha) \cdot 100\%$ most unfavorable scenarios.
• Worst-case cost: Cost of the scenario with the worst performance. A major difference with respect to the other risk measures is that the probability information is not used.
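The four risk measures listed above can be computed from a vector of scenario costs as in the following sketch. It follows the verbal definitions given here; the handling of the tail for general discrete probabilities (scenarios are added from the worst one downwards until their total probability reaches $1-\alpha$) is an assumption, as are the function name and the example data.

```python
import math


def risk_measures(costs, probabilities, alpha):
    """Return (standard deviation, VaR_alpha, CVaR_alpha, worst-case cost).

    costs[i] is the cost of scenario i, probabilities[i] its probability;
    0 <= alpha < 1 is the confidence level. High cost means unfavourable.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    mean = sum(p * c for c, p in zip(costs, probabilities))
    std = math.sqrt(sum(p * (c - mean) ** 2 for c, p in zip(costs, probabilities)))

    # Collect the most unfavourable scenarios until their mass reaches 1 - alpha.
    tail, mass = [], 0.0
    for c, p in sorted(zip(costs, probabilities), key=lambda cp: cp[0], reverse=True):
        if mass >= (1.0 - alpha) - 1e-12:
            break
        tail.append((c, p))
        mass += p

    var = tail[-1][0]                            # most favourable scenario of the tail
    cvar = sum(c * p for c, p in tail) / mass    # mean cost over the tail
    worst = max(costs)                           # ignores the probabilities
    return std, var, cvar, worst


# Ten equally likely scenarios with costs 1, ..., 10 (illustrative data).
costs = [float(c) for c in range(1, 11)]
print(risk_measures(costs, [0.1] * 10, alpha=0.8))
```

With equally likely scenarios and $\alpha = 0.8$, the tail consists of the two most expensive scenarios, so VaR is the cheaper of the two and CVaR is their average.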

5 Multi-Objective Evolutionary Approach

In optimization problems with multiple objectives, in general no solution exists for which all objectives are optimal. Therefore the goal of multi-objective optimization is to compute the set of Pareto-optimal solutions. This is a set of solutions for which no improvement in one objective can be achieved without worsening another objective. The attractive feature of multi-objective evolutionary algorithms is their ability to find a set of non-dominated solutions close to the Pareto-optimal solutions. Instead of using classical multi-objective optimization algorithms (e.g. the weighted sum approach or the $\varepsilon$-constraint method), which convert a multi-objective optimization problem into a single-objective optimization problem, the evolutionary approach finds a number of trade-off solutions in a single optimization run (for an overview see [4, 7]).

5.1 Multi-Objective Evolutionary Algorithm (MO-EA)

The two objectives $f(x)$ and $r(x)$ are addressed by a MO-EA. In this contribution, an integer $(\mu, \kappa, \lambda)$-evolution strategy [10] is used. The selection is adapted to the multi-objective environment by using the elitist non-dominated sorting concept (NSGA-II) of [5].

5.1.1 Representation and Initialization

The individuals of the integer evolution strategy are given by the vector of integer first-stage decisions $x$. In addition to the object parameters $x$, each individual comprises a vector of real-valued strategy parameters $s$ that represent the mutation strengths corresponding to each object parameter. In the initial population, individuals are initialized randomly within the bounds of the box-constrained first-stage decision space $x_{min} \le x \le x_{max}$, $x \in \mathbb{Z}^n$. The mutation strength parameters are initialized randomly in the range of the corresponding object parameter bounds, $1 \le s \le x_{max} - x_{min}$, $s \in \mathbb{R}^n$.

5.1.2 Evaluation

After the initialization and after each generation, an evaluation of the individuals is performed. For first-stage feasible solutions $x$, the $\Omega$ subproblems are solved independently by a MILP solver. After this, both fitness values $f(x)$ and $r(x)$ are calculated. If a first-stage solution $x$ does not satisfy the first-stage constraints (6), the fitness functions $f(x)$ and $r(x)$ are set to a fixed value $f_{max}$ plus the penalty $p(x)$ for the violated first-stage constraints (6). The penalty $p(x)$ is defined as the sum of the constraint violations of (6) according to

$$p(x) = \sum_i \max\{0,\,(A_i x - b_i)\}. \qquad (10)$$

The fixed value $f_{max}$ is a valid upper bound of the objective function $f(x)$ for solutions $x$ which satisfy the first-stage constraints. If for a first-stage feasible solution $x$ no feasible recourse exists (i.e. some of the second-stage MILP problems have no feasible solution $y_\omega$), the fitness is set to $f_{max}$. Due to this choice, feasible solutions are always preferred to infeasible solutions.

5.1.3 Mutation

In each evolutionary loop, offspring individuals are generated by a mutation operator. For each offspring, a parent individual is randomly selected from the population with equal probability. The mutation strength parameters $s_i$ are mutated log-normally according to

$$s_i' := s_i \exp(\tau\, N(0,1)) \qquad (11)$$

with $N(0,1)$ denoting a normally distributed random number with an expected value of 0 and a variance of 1. The variation is weighted by the so-called learning rate $\tau$. According to [1] it is set to $\tau = \frac{1}{2\sqrt{n}}$. Each object parameter $x_i$ is perturbed independently of the others by a random number drawn from a symmetric discrete distribution with an expected value of zero:

$$x_i' := x_i + G_1(q_i) - G_2(q_i). \qquad (12)$$

The symmetric discrete distribution is constructed from geometric distributions $G(q_i)$. The variance is controlled by the parameter $q_i$, which is calculated from the mutation strength $s_i$ and the dimension $n$ of the object parameter space such that the mutation strength $s_i$ is $n$ times the expected absolute perturbation: $s_i = n \cdot E(|G_1(q(s_i)) - G_2(q(s_i))|)$. The theoretical background of the construction of the mutation distribution is described in [10].

5.1.4 Maintenance of Bounds

The mutation of the object parameters and of the strategy parameters may lead to values which do not satisfy the respective bounds. In order to maintain the bounds for $x_i'$, a repair operator is introduced which maps the value $x_i'$ onto the nearest bound of the feasible region, as suggested by [6]. The transformation of the strategy parameters $s_i'$ keeps the values within the interval $[1, \infty)$. This ensures that the minimum expected absolute perturbation in the $L_1$-norm for an individual is at least 1.
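The mutation (11), (12) together with the bound repair can be sketched as follows. Several points are assumptions and not taken from the text: the geometric variates count failures before the first success (values 0, 1, 2, ...), the closed form for $q$ is derived from the condition $s_i = n \cdot E(|G_1 - G_2|)$ under that assumption, the freshly mutated step size is used for the perturbation, the learning rate `tau` is passed in by the caller, and all names are hypothetical.

```python
import math
import random


def geometric_parameter(s, n):
    """Success probability q such that n * E|G1 - G2| = s for i.i.d. geometric G1, G2."""
    m = s / n                                   # target mean absolute step per coordinate
    return 1.0 - m / (math.sqrt(1.0 + m * m) + 1.0)


def sample_geometric(q, rng):
    """Number of failures before the first success (inverse-transform sampling)."""
    if q >= 1.0:
        return 0
    u = 1.0 - rng.random()                      # uniform on (0, 1]
    return int(math.floor(math.log(u) / math.log(1.0 - q)))


def mutate(x, s, lower, upper, tau, rng):
    """Return a mutated copy (x', s') of an individual with integer genes x."""
    n = len(x)
    new_x, new_s = [], []
    for x_i, s_i, lo, hi in zip(x, s, lower, upper):
        # Log-normal step-size mutation, kept inside [1, inf).
        s_new = max(1.0, s_i * math.exp(tau * rng.gauss(0.0, 1.0)))
        q = geometric_parameter(s_new, n)
        # Symmetric integer perturbation: difference of two geometric variates.
        x_new = x_i + sample_geometric(q, rng) - sample_geometric(q, rng)
        # Repair: map a violating value onto the nearest bound.
        x_new = min(max(x_new, lo), hi)
        new_x.append(x_new)
        new_s.append(s_new)
    return new_x, new_s


rng = random.Random(0)
child_x, child_s = mutate([6, 0, 12], [3.0, 3.0, 3.0], [0, 0, 0], [12, 12, 12],
                          tau=0.3, rng=rng)
print(child_x, child_s)
```

The lower limit of 1 on the step sizes prevents the search from stagnating on the integer lattice, since every coordinate keeps a positive probability of moving.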

5.1.5 Selection for Population Replacement

First, the populations of the parents and the offspring are combined. If the age of an individual equals $\kappa$, this individual is not considered further in the selection. Then the entire population is sorted into different front sets based on non-domination. All non-dominated solutions in the population are assigned to the first front $F_1$. The non-dominated individuals of the remaining population are then assigned to the second front $F_2$. This procedure continues until all population members are classified into fronts. The new population for the next generation is successively filled up with individuals from the fronts, starting with the first front $F_1$. This procedure continues until the individuals of a front $F_l$ can no longer be accommodated in the new population. To choose exactly $\mu$ individuals, the solutions of the front $F_l$ are sorted in descending order using a crowded-distance comparison operator [5], and the best solutions are used to fill up the new population. After a new population of $\mu$ individuals is generated, the age of the individuals is increased by one and a new iteration loop starts if the termination criterion is not fulfilled.
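The front-based selection described above can be sketched for a small set of objective vectors. This is a simplified illustration, not the implementation used in this work: the sorting is a straightforward quadratic procedure rather than the fast variant of [5], the removal of individuals that have reached the maximum age is omitted, both objectives are minimised, and all names are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return all(ai <= bi for ai, bi in zip(a, b)) and any(ai < bi for ai, bi in zip(a, b))


def non_dominated_fronts(points):
    """Split indices into fronts F1, F2, ... by repeated removal of non-dominated points."""
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts


def crowding_distance(points, front):
    """Crowding distance of each index in a front; boundary points get infinity."""
    dist = {i: 0.0 for i in front}
    for k in range(len(points[0])):
        ordered = sorted(front, key=lambda i: points[i][k])
        low, high = points[ordered[0]][k], points[ordered[-1]][k]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if high == low:
            continue
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            dist[cur] += (points[nxt][k] - points[prev][k]) / (high - low)
    return dist


def select(points, mu):
    """Indices of the mu individuals that form the next population."""
    chosen = []
    for front in non_dominated_fronts(points):
        if len(chosen) + len(front) <= mu:
            chosen.extend(front)
        else:
            dist = crowding_distance(points, front)
            ranked = sorted(front, key=lambda i: dist[i], reverse=True)
            chosen.extend(ranked[:mu - len(chosen)])
            break
    return chosen


# Objective vectors (f(x), r(x)) of five hypothetical individuals.
population = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
print(non_dominated_fronts(population))
print(select(population, 2))
```

When a front does not fit completely, the crowding distance favours its extreme points and the individuals in sparsely populated regions, which keeps the approximation of the trade-off curve spread out.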

6 Numerical Study

The performance of the hybrid multi-objective evolutionary approach is evaluated by the quality of the non-dominated solutions. Convergence comparisons to other approaches or statistical analysis of the random behavior of the algorithm are outside the scope of this short contribution.

6.1 Chemical Batch Planning Example

The case study considered here is the production of expandable polystyrene (EPS) [11]. The layout of the multi-product batch plant is shown in Fig. 1. Two types A and B of the polymer are produced in five grain size fractions from raw materials E. The preparation stage is not considered here. The polymerization stage is operated in batch mode and is controlled by recipes. Each of the ten recipes defines the product (A or B) and a grain size distribution. Each batch yields a main product and four coupled products. The capacity of the polymerization stage constrains the number of batches to 12 in each two-day period. The batches are transferred to two continuously operated finishing lines which fractionate the grain sizes. The capacity of each finishing line is between 5 and 12 batches per period if it is operated, and 0 otherwise. The operation mode has to be the same for at least two successive periods. The planning decisions to be made are the optimal numbers of polymerizations of each recipe in each period. The decisions in periods 1 to 3 are considered as first-stage decisions, those in periods 4 and 5 as second-stage decisions. The uncertainties in the demands and the possible polymerization reactor breakdowns are represented by $\Omega$ scenarios of equal probability. The full mathematical description of the process model and of the cost model can be found in [15].

Fig. 1 Flowsheet of the EPS-plant

6.2 Experimental Setup

Numerical experiments for two problem settings were carried out. The two settings differ only in the demand profiles of the scenarios. The demand profiles of setting 1 are such that the average total product demand in each period is between 150% (for $\omega = 1$) and 200% (for $\omega = \Omega$) of the maximum capacity of the polymerization stage. In contrast, setting 2 has average total demands of half the size of setting 1 (75% for $\omega = 1$ and 100% for $\omega = \Omega$). The failure scenarios of the polymerization reactors in periods 4 and 5 are such that the maximum capacity decreases to 11, 10, and 9 with equal probabilities of $\frac{1}{6}$. All other parameters are set to the values presented in [15].

In the evolutionary algorithm, the integer first-stage decisions $x$ are modeled by 30 integer object parameters $x_{ijk} \in \{0, \dots, 12\}$ corresponding to the numbers of polymerization batches that are started in periods $i \in \{1, 2, 3\}$ for each EPS type $j \in \{A, B\}$ and recipe $k \in \{1, \dots, 5\}$. For each experiment, a population size of $\mu = 30$ and an offspring/parents ratio of $\lambda/\mu = 7$ were chosen. The maximum age of individuals was set to $\kappa = 5$. In order to quantify the quality of the obtained sets of non-dominated solutions, the corresponding monolithic two-stage mixed-integer programs were solved for the expected scenario costs objective and for the worst-case scenario costs objective by the state-of-the-art solver CPLEX. The MO-EA was implemented in MATLAB 7.3. All MILPs were solved using CPLEX 10.2. The algebraic models to be solved by CPLEX were formulated using GAMS distribution 22.5. All experiments were performed on a dual Xeon machine with 3 GHz clock speed and 1.5 GB of main memory, running the Linux operating system. For all experiments the calculation time was limited to 12 CPU-hours per setting.

6.3 Computational Results

The eight plots in Fig. 2 show the results obtained by the MO-EA and CPLEX. The rows show results for the different risk measures $r(x)$ in the order: standard deviation of scenario costs, worst scenario costs, $\mathrm{VaR}_\alpha$ (for $\alpha = 0.6$, $0.8$, $0.9$, and $0.95$), and $\mathrm{CVaR}_\alpha$ (for $\alpha = 0.6$, $0.8$, $0.9$, and $0.95$). The left column contains the results for setting 1, the right column contains the results for setting 2. The grey lines in the plots represent linear fits to the non-dominated solutions. The dashed lines mark the lower bounds obtained by CPLEX at the end of the optimization time.

The results for the minimization of the expected costs and of the standard deviation of scenario costs show that the MO-EA was able to generate a relatively large number of non-dominated solutions. While the region of solutions for setting 1 is connected and smooth, the solution set for setting 2 is a union of three disconnected non-dominated regions. The best solution obtained by CPLEX for the single-objective problem is only slightly better than the best solution obtained by the MO-EA for the multi-objective problem.

The results for the optimization of the worst-case scenario costs in the second row show smaller sets of non-dominated solutions than for the standard deviation objective. The differences between the best CPLEX solution and the best MO-EA solution are smaller for the objective $f(x)$ than for $r(x)$. This may be an indication that the solution of $r(x)$ is comparatively easier for CPLEX than for the MO-EA. On average, according to the slopes of the fitted lines, a reduction of the worst scenario costs by 1 is accompanied by an increase of the expected costs by 0.45 for setting 1 and by 0.21 for setting 2.

In the lower plots, the results shown for $\mathrm{VaR}_\alpha$ and $\mathrm{CVaR}_\alpha$ are quite similar. The number of non-dominated solutions in the set increases with the value of $\alpha$. Some of the $\mathrm{VaR}_\alpha$ and $\mathrm{CVaR}_\alpha$ values obtained by CPLEX by optimizing the objective $f(x)$ dominate some of the corresponding non-dominated solutions obtained by the MO-EA (especially for $\alpha = 0.6$). This indicates that the Pareto-optimal solutions are not reached by the MO-EA. The linear fitting lines shown have gradients of magnitude up to 6.9. Especially for higher values of $\alpha$, a strong reduction of the risk $r(x)$ can be obtained at the price of a relatively small increase of the expected costs $f(x)$.
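The notion of non-dominance that underlies this comparison can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `non_dominated` and the sample pairs of expected costs and risk are hypothetical and are not data from the case study, and both objectives are assumed to be minimized.

```python
import numpy as np

def non_dominated(points):
    """Return the indices of the non-dominated points (all objectives minimized)."""
    pts = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(pts):
        # p is dominated if another point is no worse in every objective
        # and strictly better in at least one objective.
        dominated = np.any(np.all(pts <= p, axis=1) & np.any(pts < p, axis=1))
        if not dominated:
            keep.append(i)
    return keep

# Hypothetical (f, r) pairs, NOT results from the paper.
candidates = [(-27.0, 4.0), (-26.0, 3.0), (-25.0, 3.5), (-24.0, 2.0), (-26.5, 4.5)]
front = non_dominated(candidates)

# Linear fit r = slope * f + intercept through the non-dominated points.
f_vals = np.array([candidates[i][0] for i in front])
r_vals = np.array([candidates[i][1] for i in front])
slope, intercept = np.polyfit(f_vals, r_vals, 1)

# A negative slope expresses the trade-off: lowering r by one unit
# raises f by roughly 1 / |slope| along the fitted line.
tradeoff = 1.0 / abs(slope)
```

With such a fit, the reciprocal of the slope magnitude gives the change in expected costs per unit of risk reduction, which is how the figures quoted for the worst-case objective can be read off the fitted lines.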

[Figure 2: eight panels arranged in four rows (standard deviation, worst scenario, $\mathrm{VaR}_\alpha$, $\mathrm{CVaR}_\alpha$) and two columns (Setting 1, Setting 2). Each panel plots the risk measure $r(x)$ against the expected costs $f(x)$ and shows the MO-EA non-dominated solutions, the MO-EA linear fitting lines, the CPLEX solutions, and the CPLEX lower bounds.]

Fig. 2 Non-dominated solutions obtained for the multi-objective optimization of the expected costs $f(x)$ and different risk measures $r(x)$. The grey lines represent fitting lines for the non-dominated solutions. The dashed lines mark the lower bounds obtained by CPLEX at the end of the optimization time

Hybrid Algorithm for Risk Conscious Chemical Batch Planning Under Uncertainty

Additional Monte-Carlo simulations of first-stage decisions with 1,000 random samplings for each objective showed that the cardinality of the set of non-dominated solutions is strongly related to the correlation of the objectives $f(x)$ and $r(x)$. While the correlation coefficient $\rho_{f(x),r(x)}$ for the objectives $\mathrm{VaR}_\alpha$ and $\mathrm{CVaR}_\alpha$ with $\alpha \le 0.8$ is $\rho_{f(x),r(x)} > 0.99$, for $\alpha > 0.8$ and for the worst-case objective the correlation is $0.95 \le \rho_{f(x),r(x)} \le 0.99$. For the standard deviation objective a coefficient of $\rho_{f(x),r(x)} \approx 0.6$ was obtained.

7 Conclusions

The paper describes the solution of risk conscious chemical batch planning problems under uncertainty by a multi-objective evolutionary algorithm combined with MILP solutions of the scenario subproblems. The proposed multi-objective two-stage stochastic approach reflects decisions under uncertainty. The results from the case study show that the application of multi-objective evolutionary algorithms may have a great practical benefit for risk conscious production planning under uncertainty. The decision maker obtains a set of solution alternatives from which to choose according to his or her risk aversion.

References

1. T. Bäck. Evolutionary Algorithms in Theory and Practice. Oxford University Press, New York, 1996.
2. A. Bonfill, M. Bagajewicz, A. Espuna, and L. Puigjaner. Risk management in the scheduling of batch plants under uncertain market demand. Industrial and Engineering Chemistry Research, 43:741–750, 2004.
3. J. F. Birge and F. Louveaux. Introduction to Stochastic Programming. Springer, New York, 1997.
4. K. Deb. Multi-Objective Optimization using Evolutionary Algorithms. Wiley-Interscience Series in Systems and Optimization. John Wiley & Sons, Chichester, 2001.
5. K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182–197, 2002.
6. M. Emmerich, M. Schütz, B. Gross, and M. Grötzner. Mixed-integer evolution strategy for chemical plant optimization. In I. C. Parmee, editor, Evolutionary Design and Manufacture (ACDM 2000), pages 55–67. Springer, 2000.
7. J. Knowles, D. Corne, and K. Deb. Multiobjective Problem Solving from Nature: From Concepts to Applications. Natural Computing Series, 2008.
8. Z. Li and M. Ierapetritou. Process scheduling under uncertainty: Review and challenges. Comp. and Chem. Eng., 32:715–727, 2008.
9. A. Ruszczynski and A. Shapiro, editors. Stochastic Programming. Handbooks in Operations Research and Management Science. Elsevier, Amsterdam, The Netherlands, 2003.
10. G. Rudolph. An evolutionary algorithm for integer programming. In Y. Davidor, H.-P. Schwefel, and R. Männer, editors, PPSN III, volume 866 of LNCS, pages 193–197. Springer, Berlin, 1994.

T. Tometzki and S. Engell

11. G. Sand and S. Engell. Modelling and solving real-time scheduling problems by stochastic integer programming. Comp. and Chem. Eng., 28:1087–1103, 2004.
12. M. Suh and T. Lee. Robust optimization method for the economic term in chemical process design and planning. Ind. Eng. Chem. Res., 40:5950–5959, 2001.
13. N. J. Samsatli, L. G. Papageorgiou, and N. Shah. Robustness metrics for dynamic optimization models under parameter uncertainty. AIChE J., 44:1993–2006, 1998.
14. J. Till, G. Sand, S. Engell, M. Emmerich, and L. Schönemann. A hybrid algorithm for solving two-stage stochastic integer problems by combining evolutionary algorithms and mathematical programming methods. In Proc. European Symposium on Computer Aided Process Engineering (ESCAPE-15), pages 187–192, 2005.
15. J. Till, G. Sand, M. Urselmann, and S. Engell. Hybrid evolutionary algorithms for solving two-stage stochastic integer programs in chemical batch scheduling. Comp. and Chem. Eng., 31:630–647, 2007.

On Isogeometric Analysis and Its Usage for Stress Calculation

Anh-Vu Vuong and B. Simeon

Abstract A concise treatment of isogeometric analysis with particular emphasis on the relation to isoparametric finite elements is given. Besides preserving the exact geometry, this relatively new extension of the finite element method possesses the attractive feature of offering increased smoothness of the basis functions in the Galerkin projection. Such a property is particularly beneficial for stress analysis in linear elasticity problems, which is demonstrated by means of a 3D simulation example.

1 Introduction

Isogeometric analysis is an extension of the Finite Element Method (FEM) that is aimed at exactly representing engineering shapes and at bridging the gap between Computer Aided Design (CAD) and FEM software. Starting with the pioneering work of Hughes et al. [8], this approach has recently attracted much interest, and numerical results for challenging applications such as cardiovascular flow simulation [12] demonstrate its potential. An exact preservation of the geometry is not the only attractive feature of isogeometric analysis. The option to increase the global smoothness of the basis functions in the Galerkin projection, which is related to properties of the underlying spline functions, is very promising. However, this does not come for free as it is tightly connected to the tensor-product structure of the basis functions. While higher smoothness, at least in conventional FEM wisdom, is mostly not considered beneficial, there are specific applications where one clearly profits from it.

Anh-Vu Vuong · B. Simeon
Centre for Mathematical Sciences, Technische Universität München, Boltzmannstraße 3, 85748 Garching, Germany

H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_25, © Springer-Verlag Berlin Heidelberg 2012

One example is the calculation of stress values in elasticity problems, where standard $C^0$-elements lead to discontinuous stress distributions. In this contribution, we show that by isogeometric analysis, it is straightforward to construct basis functions of class $C^1$ or higher that result in continuous stress distributions. Moreover, we give a concise introduction to isogeometric analysis and compare it with traditional finite elements, in particular the isoparametric approach. The interested reader is also referred to [3] for convergence estimates with respect to h-refinement, to [11] for the application of isogeometric analysis in shape optimization, and to [10] for the CAD details. Among the various references on the FEM, we mention [2] for state-of-the-art adaptive techniques and [7] for a more engineering-oriented treatment.

The paper is organized as follows: In Sect. 2 the basic idea of isogeometric analysis is summarized and the implications for the usual Galerkin projection are discussed. Thereafter in Sect. 3, the focus is on trivariate Non-Uniform Rational B-Splines (NURBS) as basis functions for 3D applications, and specific issues such as boundary conditions and refinement strategies are briefly touched upon. Section 4 finally presents simulation results for a cylinder geometry. It turns out that isogeometric analysis yields a much smoother stress distribution for this example than standard isoparametric finite elements.

2 Preserving the Geometry

To simplify the presentation, we restrict the discussion in this section to Poisson's equation $-\Delta u = f$ in a Lipschitz domain $\Omega \subset \mathbb{R}^3$, with Dirichlet boundary conditions $u = u_0$ on $\partial\Omega$. The weak form

$$a(u, v) = \langle f, v \rangle \quad \text{for all } v \in V \tag{1}$$

with bilinear form $a(u, v) := \int_\Omega \nabla u \cdot \nabla v \, \mathrm{d}x$ and test functions $v$ in the space $V := \{ v \in H^1(\Omega),\ v = 0 \text{ on } \partial\Omega \}$ is, as usual, the setting for the Galerkin projection.

2.1 Parametrization of the Domain

As a starting point for the idea of isogeometric analysis, we suppose that the physical domain $\Omega$ is parametrized by a global geometry function

$$F : \Omega_0 \to \Omega, \qquad F(\xi) = x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}. \tag{2}$$

Below in Sect. 3 we will apply NURBS to define $F$, but for the moment the geometry function is simply an invertible $C^1$-mapping from the parameter domain $\Omega_0 \subset \mathbb{R}^3$ to the physical domain. Integrals over $\Omega$ can be transformed into integrals over $\Omega_0$ by means of the well-known integration rule

$$\int_\Omega w(x) \, \mathrm{d}x = \int_{\Omega_0} w(F(\xi)) \, |\det DF(\xi)| \, \mathrm{d}\xi \tag{3}$$

with the $3 \times 3$ Jacobian matrix $DF(\xi) = (\partial F_i / \partial \xi_j)_{i,j=1:3}$. For the differentiation, the chain rule applied to $u(x) = u(F(\xi))$ yields

$$\nabla_x u(x) = DF(\xi)^{-T} \nabla_\xi u(\xi) \tag{4}$$

with $\nabla_x = (\partial / \partial x_i)_i$ and $\nabla_\xi = (\partial / \partial \xi_i)_i$. Accordingly, the bilinear form in (1) satisfies the transformation rule

$$\int_\Omega \nabla_x u \cdot \nabla_x v \, \mathrm{d}x = \int_{\Omega_0} \left( DF(\xi)^{-T} \nabla_\xi u \right) \cdot \left( DF(\xi)^{-T} \nabla_\xi v \right) |\det DF(\xi)| \, \mathrm{d}\xi. \tag{5}$$

The right-hand side integral in (1) is transformed in a similar way. We stress that the parametrization $F$ will be crucial in the following because its properties, in particular the smoothness, are passed on to the numerical solution.
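The integration rule (3) can be checked numerically with a small Python sketch. Everything specific in it is an assumption made for illustration and does not come from the paper: the geometry function `F` maps the unit cube to a quarter of a hollow cylinder (inner radius 1, outer radius 2, height 2), and a tensor-product Gauss-Legendre rule on the parameter domain is used for the quadrature.

```python
import numpy as np

def F(xi):
    """Hypothetical geometry function: unit cube -> quarter of a hollow cylinder."""
    r, a, z = 1.0 + xi[0], 0.5 * np.pi * xi[1], 2.0 * xi[2]
    return np.array([r * np.cos(a), r * np.sin(a), z])

def DF(xi):
    """Jacobian matrix (dF_i / dxi_j) of the mapping above."""
    r, a = 1.0 + xi[0], 0.5 * np.pi * xi[1]
    return np.array([
        [np.cos(a), -0.5 * np.pi * r * np.sin(a), 0.0],
        [np.sin(a),  0.5 * np.pi * r * np.cos(a), 0.0],
        [0.0,        0.0,                         2.0],
    ])

def integrate_physical(w, n=6):
    """Approximate the integral of w over F([0,1]^3) via rule (3)."""
    nodes, weights = np.polynomial.legendre.leggauss(n)
    nodes = 0.5 * (nodes + 1.0)      # map Gauss nodes from [-1, 1] to [0, 1]
    weights = 0.5 * weights
    total = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                xi = np.array([nodes[i], nodes[j], nodes[k]])
                jac = abs(np.linalg.det(DF(xi)))
                total += weights[i] * weights[j] * weights[k] * w(F(xi)) * jac
    return total

# Volume of the quarter hollow cylinder: (pi/4) * (2^2 - 1^2) * 2 = 3*pi/2.
volume = integrate_physical(lambda x: 1.0)
# Integral of x1^2 + x2^2 over the same body: 15*pi/4.
moment = integrate_physical(lambda x: x[0] ** 2 + x[1] ** 2)
```

Both integrals are evaluated purely on the parameter domain; the physical domain enters only through `F` and the determinant of its Jacobian.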

2.2 Galerkin Projection

In the classical FEM, the finite dimensional subspace $V_h \subset V$ for the Galerkin projection typically consists of piecewise polynomials with global $C^0$-continuity. When comparing the FEM with isogeometric analysis, three features are of particular importance: the concept of nodal bases, local shape functions, and the isoparametric approach, cf. [7].

A nodal basis $(\varphi_1, \ldots, \varphi_n)$ of $V_h$ is characterized by the favorable property $\varphi_i(z_j) = \delta_{ij}$ for nodes or grid points $z_j \in \Omega_h$, which means that

$$u_h(z_j) = \sum_{i=1}^{n} q_i \varphi_i(z_j) = q_j. \tag{6}$$

In other words, the coefficient $q_j$ stands for the numerical solution in $z_j$ and thus carries physical significance. This concept of a nodal basis can be generalized to the partition of unity, which is the property that all basis functions sum up to one.

Shape functions are a very useful technique to unify the treatment of the polynomials in each finite element $T_k$ by the transformation to a reference element $T_0$. Correspondingly, the integrations for the assembly of the stiffness matrix and the load vector are carried out by summation over all involved elements and a transformation $G : T_0 \to T_k$ from the reference element, as for example

$$A_{ij} = a(\varphi_i, \varphi_j) = \sum_{T_k} \int_{T_k} \nabla \varphi_i \cdot \nabla \varphi_j \, \mathrm{d}x \tag{7}$$

and

$$\int_{T_k} \nabla \varphi_i \cdot \nabla \varphi_j \, \mathrm{d}x = \int_{T_0} \left( DG(\xi)^{-T} \nabla S_m \right) \cdot \left( DG(\xi)^{-T} \nabla S_\ell \right) |\det DG(\xi)| \, \mathrm{d}\xi.$$

Here, $S_m$ and $S_\ell$ are the shape functions that correspond to $\varphi_i$ and $\varphi_j$ when restricted to the element $T_k$. Though one observes some similarities with the transformation rule (5), it should be stressed that there are two major differences: The integrals in (7) refer to single elements with simple geometry, and the mapping $G$ is either linear or, in case of the isoparametric approach, a polynomial. Due to the simple structure of the shape functions and the polygonal shape of the elements, the integration is straightforward and can be implemented in terms of standard quadrature rules or sometimes even via closed-form integration.

For the approximation of curved boundaries, the isoparametric approach applies the shape functions both for defining the basis functions and for describing the physical domain. Thus, the mapping $G : T_0 \to T_k$ from above is written as

$$x = G(\xi) = \sum_{\ell=1}^{L} S_\ell(\xi) \, z_\ell^k, \tag{8}$$

where $z_\ell^k$ stands for the nodes of the element $T_k$. In each element, one has therefore the local representation

$$x = \sum_{\ell=1}^{L} S_\ell(\xi) \, z_\ell^k, \qquad u_h(x) = \sum_{\ell=1}^{L} S_\ell(\xi) \, q_\ell^k = \sum_{\ell=1}^{L} S_\ell(G^{-1}(x)) \, q_\ell^k. \tag{9}$$
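As a minimal illustration of the isoparametric mapping (8), the following Python sketch maps a one-dimensional quadratic reference element onto a quarter of the unit circle. The setup is an assumption made for this example only: three nodes placed on the circle at the angles 0, pi/4 and pi/2, and the standard quadratic Lagrange shape functions on the reference interval [0, 1].

```python
import numpy as np

def shape_functions(xi):
    """Quadratic Lagrange shape functions on [0, 1] with nodes at 0, 1/2, 1."""
    return np.array([
        (1.0 - xi) * (1.0 - 2.0 * xi),
        4.0 * xi * (1.0 - xi),
        xi * (2.0 * xi - 1.0),
    ])

# Element nodes z_l^k placed on the unit circle (hypothetical example data).
angles = np.array([0.0, np.pi / 4.0, np.pi / 2.0])
nodes = np.column_stack([np.cos(angles), np.sin(angles)])

def G(xi):
    """Isoparametric mapping x = sum_l S_l(xi) z_l^k, cf. (8)."""
    return shape_functions(xi) @ nodes

# Distance of the mapped curve from the exact circle of radius one.
samples = np.linspace(0.0, 1.0, 201)
radii = np.array([np.linalg.norm(G(s)) for s in samples])
max_dev = np.max(np.abs(radii - 1.0))

# At the three nodes the circle is interpolated up to rounding errors.
node_dev = max(abs(np.linalg.norm(G(s)) - 1.0) for s in (0.0, 0.5, 1.0))
```

The mapped curve passes through the nodes but deviates from the circle in between, which is the approximation of curved boundaries referred to above.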

While isoparametric finite elements approximate the boundary by a $C^0$ interpolant, isogeometric analysis exactly represents the boundary by using a geometry description which is directly related to the CAD representation.¹ The basic idea is to formulate the Galerkin projection with respect to basis functions defined on the parameter domain $\Omega_0$ and to use the geometry function $F$ from (2) as a global push-forward operator to map these functions to the physical domain $\Omega$. Let $(\psi_1, \ldots, \psi_n)$ be a set of linearly independent functions on $\Omega_0$. By setting $\varphi_i := \psi_i \circ F^{-1}$, each function is pushed forward to the physical domain $\Omega$. In other words,

$$V_h = \operatorname{span} \{ \psi_i \circ F^{-1} \}_{i=1:n} \tag{10}$$

is the finite dimensional subspace for the projection.

¹ We also want to mention here that there are other possibilities such as the Bernstein-Bézier multivariate splines, which describe smooth functions over triangulations (e.g. [5]), but we do not want to give further details here.

Two features are particularly important for an isogeometric method:

(1) The geometry function $F$ is typically inherited from the CAD description. In this paper, we concentrate on a single patch parametrization in terms of trivariate NURBS, but other options such as volume meshes generated from trimmed surfaces or from T-Spline surfaces are currently under investigation [6].

(2) The second ingredient is the choice of the functions $\psi_1, \ldots, \psi_n$ for the Galerkin projection. Hughes et al. [8] select those NURBS that describe the geometry. This is in analogy to the isoparametric approach, but on a global level. However, as long as the geometry function is exact and used as in the transformation rule (5), other choices for $\psi_1, \ldots, \psi_n$ will also preserve the geometry. For example, one could think of B-Splines instead of NURBS and thus avoid the rational terms.

3 NURBS and Isogeometric Analysis

In this section we concentrate on NURBS as basis functions in the Galerkin projection and discuss some of their properties, in particular the global smoothness.

3.1 Splines

In isogeometric analysis, the space $V_h$ consists of spline functions, i.e., piecewise polynomials (or rational functions, see below) of degree² $p$ that are connected in so-called knots $\xi_i$. If a knot is not repeated, then the spline has $p - 1$ continuous derivatives in this knot. This differs from the classical finite element spaces, where $C^0$-elements dominate.

² Note that we use the terms order and degree synonymously, i.e. a quadratic polynomial is of order/degree two.

Let $(\xi_0, \ldots, \xi_m) \in \mathbb{R}^{m+1}$ be the knot vector, consisting of nondecreasing real numbers. We assume that the first and the last knot have multiplicity $p$, which means that these values are repeated $p$ times. Then the $i$-th B-Spline basis function $N_{i,p}$ of degree $p$ is defined recursively as

$$N_{i,0}(\xi) = \begin{cases} 1 & \text{if } \xi_i \le \xi < \xi_{i+1}, \\ 0 & \text{otherwise}; \end{cases} \tag{11}$$

$$N_{i,p}(\xi) = \frac{\xi - \xi_i}{\xi_{i+p} - \xi_i} N_{i,p-1}(\xi) + \frac{\xi_{i+p+1} - \xi}{\xi_{i+p+1} - \xi_{i+1}} N_{i+1,p-1}(\xi). \tag{12}$$

Note that the quotient $0/0$ is defined to be zero. The corresponding non-uniform rational B-spline (NURBS) basis function (of degree $p$) is defined as

$$R_i(\xi) = \frac{N_{i,p}(\xi) \, w_i}{\sum_{j=0}^{n} N_{j,p}(\xi) \, w_j} \tag{13}$$

with given real positive weights $w_i$. The multiplicity $m_i > 1$ of a knot $\xi_i$ decreases the smoothness of the spline to $C^{p - m_i}$ in this knot. By specifying single or multiple knots, one may thus change the smoothness. In the following we skip the index $p$ of the degree for the sake of convenience. In three dimensions, a trivariate NURBS basis function reads

$$R_{ijk}(r, s, t) = \frac{N_i(r) N_j(s) N_k(t) \, w_{ijk}}{\sum_{\hat\imath, \hat\jmath, \hat k}^{n_i, n_j, n_k} N_{\hat\imath}(r) N_{\hat\jmath}(s) N_{\hat k}(t) \, w_{\hat\imath \hat\jmath \hat k}}. \tag{14}$$
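The univariate definitions (11) to (13) translate almost literally into code. The following Python sketch is an illustration under stated assumptions, not the implementation used for the paper: the function names are hypothetical, the knot vector follows the common open convention with the end knots repeated $p + 1$ times, and the control points and weights are the well-known textbook data for a quarter circle.

```python
import numpy as np

def bspline_basis(i, p, knots, xi):
    """Cox-de Boor recursion (11)-(12); the quotient 0/0 is treated as zero."""
    if p == 0:
        return 1.0 if knots[i] <= xi < knots[i + 1] else 0.0
    left = right = 0.0
    d_left = knots[i + p] - knots[i]
    if d_left > 0.0:
        left = (xi - knots[i]) / d_left * bspline_basis(i, p - 1, knots, xi)
    d_right = knots[i + p + 1] - knots[i + 1]
    if d_right > 0.0:
        right = (knots[i + p + 1] - xi) / d_right * bspline_basis(i + 1, p - 1, knots, xi)
    return left + right

def nurbs_basis(p, knots, weights, xi):
    """All NURBS basis functions R_i(xi) of degree p, cf. (13)."""
    N = np.array([bspline_basis(i, p, knots, xi) for i in range(len(weights))])
    return N * weights / np.dot(N, weights)

# Quarter circle as a quadratic NURBS curve (standard textbook data).
p = 2
knots = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
weights = np.array([1.0, 1.0 / np.sqrt(2.0), 1.0])
control_points = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])

# The half-open intervals in (11) exclude the right end point, so sample [0, 1).
samples = np.linspace(0.0, 1.0, 50, endpoint=False)
radius_error = max(
    abs(np.linalg.norm(nurbs_basis(p, knots, weights, s) @ control_points) - 1.0)
    for s in samples
)
unity_error = max(abs(nurbs_basis(p, knots, weights, s).sum() - 1.0) for s in samples)
```

In contrast to a polynomial interpolant, the rational curve lies on the circle up to rounding errors, and the basis functions sum to one at every sample.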

Though this representation requires a knot vector for each parameter direction, it does not exactly have a tensor-product structure due to the weights $w_{ijk}$, which can be altered separately. The continuity in each parameter direction is determined by the knot multiplicities of the corresponding knot vectors. It should be noted that the weights do not have any influence on this, and therefore the tensor-product-like structure results in isoparametric lines or surfaces sharing the same continuity in the other parameter directions.

For a three-dimensional geometric model described by NURBS, the geometry function is of the form

$$F(\xi) = F(r, s, t) = \sum_{i}^{n_i} \sum_{j}^{n_j} \sum_{k}^{n_k} R_{ijk}(r, s, t) \, d_{ijk} \tag{15}$$

with trivariate basis functions $R_{ijk}$ defined on the patch $\Omega_0 = [0, 1]^3$ and control points $d_{ijk} \in \mathbb{R}^3$. As in [8], we use the same functions $R_{ijk}$ as basis functions and thus have

$$V_h \subset \operatorname{span} \{ R_{ijk} \circ F^{-1} \}_{i=1:n_i,\ j=1:n_j,\ k=1:n_k}. \tag{16}$$

Note that the boundary condition $u = u_0$ also has to be taken into account, and for this reason we write $V_h$ as a subset of the span. A comparison with isoparametric finite elements leads to the following observations:

(1) The three knot vectors partition the patch into a computational mesh, and adopting the finite element terminology, we can also call three-dimensional knot spans elements (in the parameter domain). However, the support of the basis functions is in general larger than in the FEM case.

(2) The NURBS do not form a nodal basis, and thus single coefficients $q_{ijk}$ do not represent approximations in specific grid points. On the other hand, the partition of unity property is satisfied.

(3) Depending on the chosen degree and the knot multiplicity in the NURBS data, global smoothness of class $C^1$ or higher is easily achieved in isogeometric analysis by using a knot vector which avoids a high knot multiplicity.

Note also that the FEM and isogeometric analysis coincide for an important special case. For degree $p = 1$ in all three coordinate directions, the geometry function (15) generates a regular assembly of hexahedral finite elements, and the corresponding $R_{ijk}$ reduce to trilinear basis functions in each element. Thus, the widespread trilinear hexahedral finite element is part of isogeometric analysis.

While the idea of isogeometric analysis is impressive, its actual implementation requires additional efforts in order to come up with powerful algorithms. For this reason, we briefly address some of the major issues in the following.

3.2 Boundary Conditions, Quadrature, Refinement

In standard FEM, the treatment of Dirichlet boundary conditions is greatly simplified by the nodal basis property. For isogeometric analysis it turns out that the lack of a nodal basis is a drawback and renders the incorporation of boundary conditions more involved. More specifically, zero Dirichlet boundary conditions are the easiest case and simply require the determination of those basis functions $R_{ijk}$ that do not vanish on the boundary. The corresponding solution coefficients $q_{ijk}$ are then set to zero, which can be accomplished by elimination techniques at the linear algebra level. Non-zero boundary conditions $u = u_0$, however, need to be projected into a suitable spline space. This projection and its influence on the numerical simulation are currently under investigation.

As discussed in Sect. 2, the evaluation of integrals over $\Omega$ can be replaced by integrals over the parameter domain $\Omega_0$ via the transformation rule as given in (5). In isogeometric analysis the basis functions are the trivariate NURBS $R_{ijk}$, and numerical quadrature is employed to approximate the integrals, see [9] for a discussion of specific quadrature rules. In this context, it is important to take both the larger support of the basis functions and the increased smoothness into account, which means that the Gaussian quadrature rules used in standard FEM are not optimal in isogeometric analysis.

Finally, we briefly comment on the options for refining the grid in isogeometric analysis. Due to the connection to CAD there are some well-known procedures that can be interpreted as classical h- and p-refinement in the FEM context. Knot insertion adds an additional basis function in the corresponding parameter direction and is the analogue of h-refinement. Degree elevation is a combination of increasing the degree of the spline and increasing the multiplicity of the already existing knots in order to preserve the smoothness. One major feature of all refinement techniques is that the geometry function itself always remains unchanged and only its representation is altered.

Currently, the tensor-product structure still represents a bottleneck for efficient refinement methods. It can easily be seen that h-refinement always has a global effect on the mesh, and in most cases this will result in a nonlocal increase of the degrees of freedom. For recent work on T-Splines, which to a certain degree allow local refinement, see [6].
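That knot insertion changes only the representation and not the geometry can be checked with a short Python sketch. It is a sketch under assumptions: it treats a non-rational B-spline curve (all weights equal to one; a NURBS curve would require the same steps in homogeneous coordinates), it uses the standard single-knot insertion formulas from the CAD literature rather than any code from the paper, and the knot vector and control points are hypothetical example data.

```python
import numpy as np

def bspline_basis(i, p, knots, xi):
    """Cox-de Boor recursion with the convention 0/0 := 0."""
    if p == 0:
        return 1.0 if knots[i] <= xi < knots[i + 1] else 0.0
    value = 0.0
    if knots[i + p] > knots[i]:
        value += (xi - knots[i]) / (knots[i + p] - knots[i]) \
            * bspline_basis(i, p - 1, knots, xi)
    if knots[i + p + 1] > knots[i + 1]:
        value += (knots[i + p + 1] - xi) / (knots[i + p + 1] - knots[i + 1]) \
            * bspline_basis(i + 1, p - 1, knots, xi)
    return value

def curve(p, knots, ctrl, xi):
    """Evaluate the B-spline curve sum_i N_i(xi) * ctrl[i]."""
    N = np.array([bspline_basis(i, p, knots, xi) for i in range(len(ctrl))])
    return N @ ctrl

def insert_knot(p, knots, ctrl, u):
    """Insert the knot u once (u strictly inside the knot range)."""
    knots = np.asarray(knots, dtype=float)
    ctrl = np.asarray(ctrl, dtype=float)
    k = np.searchsorted(knots, u, side="right") - 1   # knots[k] <= u < knots[k+1]
    new_ctrl = np.empty((len(ctrl) + 1, ctrl.shape[1]))
    for i in range(len(new_ctrl)):
        if i <= k - p:
            new_ctrl[i] = ctrl[i]                      # unaffected leading points
        elif i <= k:
            a = (u - knots[i]) / (knots[i + p] - knots[i])
            new_ctrl[i] = a * ctrl[i] + (1.0 - a) * ctrl[i - 1]
        else:
            new_ctrl[i] = ctrl[i - 1]                  # shifted trailing points
    return np.insert(knots, k + 1, u), new_ctrl

# Hypothetical quadratic curve with four control points.
p = 2
knots = np.array([0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 1.0])
ctrl = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 2.0], [4.0, 0.0]])
new_knots, new_ctrl = insert_knot(p, knots, ctrl, 0.25)

samples = np.linspace(0.0, 1.0, 40, endpoint=False)
max_change = max(
    np.linalg.norm(curve(p, knots, ctrl, s) - curve(p, new_knots, new_ctrl, s))
    for s in samples
)
```

The refined representation has one more control point, and hence one more basis function, while the curve itself is reproduced up to rounding errors.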

4 Numerical Example

In this last section, we consider a problem from linear elasticity and compute the displacements as well as the stress along a contour line on the surface. The example was simulated by two different simulation codes, the single patch isogeometric solver of [8] and the commercial FEM code COMSOL.

The geometry (i.e., the physical domain $\Omega$) of the model problem is given by a three-dimensional cylinder (height 2 in dimensionless form) with a circular base (diameter 1). Such a circular shape of the cross-section can be exactly represented by NURBS, whereas isoparametric finite elements (in our case quadratic tetrahedral elements) are only able to approximate it. The parametrization of the cross-section by NURBS, however, leads to a singularity of $\nabla F$ in the center point. The bottom circle is fixed with a zero Dirichlet boundary condition, and a uniform surface force parallel to the base is applied at the upper circle.

In the following, we compare isogeometric analysis for a cylinder parametrization of degree 2 (which was obtained from the original geometric description via order elevation) with isoparametric finite elements of degree 2 within COMSOL. Whereas both simulations show comparable results in terms of the displacements, which are plotted in Fig. 1, the different smoothness properties show up when investigating the stress. In Fig. 2 the von Mises stress calculated by isogeometric analysis (2,754 degrees of freedom) and by isoparametric finite elements (16,086 degrees of freedom) along the boundary of the bottom base circle is displayed. The stress calculated in isogeometric analysis is clearly smoother due to the spline ansatz functions and the exact geometry representation. The isoparametric FEM solution, on the other hand, is significantly less smooth and tends to oscillate. Because the stress is quite sensitive and the values at the boundary have to be obtained by interpolation, the stress values in the FEM depend strongly on the tetrahedral grid. The same effect can also be observed for finer grids, whereas in isogeometric analysis the solution remains smooth at all refinement levels.

It should be remarked that there are knots that have at least a multiplicity of two due to the global geometry description. In Fig. 2, these knots are marked by circles on the x-axis, and not surprisingly, a lack of smoothness can also be observed in the isogeometric stress curve in these particular points. Summarizing, the example shows how the smoothness of the spline basis and of the geometry description are passed on to the numerical solution.

[Figure 1: left panel shows the cylinder geometry with axes X, Y, Z; right panel shows the deformed cylinder.]

Fig. 1 Geometry of the cylinder (left); scaled deformation of the cylinder (factor $10^4$), greyscale: total displacement (right)

[Figure 2: von Mises stress (vertical axis) plotted against the angle (horizontal axis) for isogeometric analysis and isoparametric FE.]

Fig. 2 von Mises stress along the boundary of the bottom base circle; multiple knot locations are marked with circles at the x-axis

5 Conclusions

In this paper we looked into isogeometric analysis from an FEM point of view and elaborated on some of its distinctive features. As the numerical example demonstrates, a gain in global smoothness can be beneficial in applications such as stress analysis.

However, though the idea of isogeometric analysis is impressive, several important issues still need to be addressed in future work. For one, due to the more global nature of the approximation, which combines the representation of the geometry with the Galerkin projection, the geometry function or parametrization plays a major role. Results on the influence of different parametrizations on the convergence and a priori criteria for a good choice are currently work in progress (see [4]). Another challenging topic is the generation of volume parametrizations from typical surface data in standard CAD software models. A technique for creating so-called swept volumes for later use in isogeometric analysis is introduced in [1].

Acknowledgements The authors were supported by the 7th Framework Programme of the European Union, project SCP8-218536 “EXCITING”. Special thanks go to Hughes et al. [8] for providing their isogeometric implementation, which was used to generate some numerical results in this paper, as well as to Bert Jüttler and coworkers for support concerning the geometry models.

References

1. M. Aigner, Ch. Heinrich, B. Jüttler, E. Pilgerstorfer, B. Simeon, and A.-V. Vuong. Swept volume parametrization for isogeometric analysis. In E. Hancock and R. Martin, editors, The Mathematics of Surfaces (MoS XIII 2009), pages 19–44. Springer, 2009.
2. W. Bangerth and R. Rannacher. Adaptive Finite Element Methods for Differential Equations. Lectures in Mathematics. Birkhäuser, Basel, 2003.
3. Y. Bazilevs, L. Beirão da Veiga, J. A. Cottrell, T. J. R. Hughes, and G. Sangalli. Isogeometric analysis: Approximation, stability and error estimates for h-refined meshes. Mathematical Models and Methods in Applied Sciences, 16:1031–1090, 2006.
4. E. Cohen, T. Martin, R. M. Kirby, T. Lyche, and R. F. Riesenfeld. Analysis-aware modeling: Understanding quality considerations in modeling for isogeometric analysis. Computer Methods in Applied Mechanics and Engineering, 199:334–356, 2010.
5. O. Davydov. Stable local bases for multivariate spline spaces. Journal of Approximation Theory, 111:267–297, 2001.
6. M. R. Dörfel, B. Jüttler, and B. Simeon. Adaptive isogeometric analysis by local h-refinement with T-splines. Computer Methods in Applied Mechanics and Engineering, 199:264–275, 2010.
7. T. J. R. Hughes. The Finite Element Method. Dover Publ., Mineola, New York, 2000.
8. T. J. R. Hughes, J. A. Cottrell, and Y. Bazilevs. Isogeometric analysis: CAD, finite elements, NURBS, exact geometry and mesh refinement. Computer Methods in Applied Mechanics and Engineering, 194:4135–4195, 2005.
9. T. J. R. Hughes, A. Reali, and G. Sangalli. Efficient quadrature for NURBS-based isogeometric analysis. Computer Methods in Applied Mechanics and Engineering, 199:301–313, 2010.
10. L. Piegl and W. Tiller. The NURBS Book. Monographs in Visual Communication. Springer, New York, 2nd edition, 1997.
11. W. A. Wall, M. A. Frenzel, and Ch. Cyron. Isogeometric structural shape optimization. Computer Methods in Applied Mechanics and Engineering, 197:2976–2988, 2008.
12. Y. Zhang, Y. Bazilevs, S. Goswami, Ch. L. Bajaj, and T. J. R. Hughes. Patient-specific vascular NURBS modeling for isogeometric analysis of blood flow. Computer Methods in Applied Mechanics and Engineering, 196:2943–2959, 2007.

On the Efficient Evaluation of Higher-Order Derivatives of Real-Valued Functions Composed of Matrix Operations

Sebastian F. Walter

Abstract Two different hierarchical levels of algorithmic differentiation are compared: the traditional approach and a higher-level approach where matrix operations are considered to be atomic. More explicitly: It is discussed how computer programs that consist of matrix operations (e.g. matrix inversion) can be evaluated in univariate Taylor polynomial arithmetic. Formulas suitable for the reverse mode are also shown. The advantages of the higher-level approach are discussed, followed by an experimental runtime comparison.

S.F. Walter
Department of Mathematics, Faculty of Mathematics and Natural Sciences II, Humboldt-Universität zu Berlin, Rudower Chaussee 25, Adlershof, 12489 Berlin, Germany
Mail address: Unter den Linden 6, 10099 Berlin, Germany

H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_26, © Springer-Verlag Berlin Heidelberg 2012

1 Introduction

This paper is concerned with the efficient evaluation of higher-order derivatives of the form $\nabla^d f(X) \in \mathbb{R}^{N^d}$, where $f : \mathbb{R}^N \to \mathbb{R}$ is given as a program that consists of matrix operations. For example, we think of functions of the form

$$f(X_1, X_2) = \operatorname{tr} \left( \begin{pmatrix} \mathbb{1}, & 0 \end{pmatrix} \begin{pmatrix} X_1^T X_1 & X_2^T \\ X_2 & 0 \end{pmatrix}^{-1} \begin{pmatrix} X_1^T X_1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} X_1^T X_1 & X_2^T \\ X_2 & 0 \end{pmatrix}^{-T} \begin{pmatrix} \mathbb{1} \\ 0 \end{pmatrix} \right),$$

where $X_1$ and $X_2$ are matrices of matching dimensions and $\mathbb{1}$ denotes the identity matrix. We investigate a method that is a combination of two well-known techniques from Algorithmic Differentiation (AD): Univariate Taylor Polynomial arithmetic on Scalars (UTPS) [7], potentially coupled with an interpolation approach [5], and first-order forward and reverse mode on matrices [2]. The combination leads to a technique that we call Univariate Taylor Polynomial arithmetic on Matrices (UTPM). The method inherits many desirable properties: It is relatively easy to implement, it is efficient, and it returns not only $\nabla^d f$ but in the process also yields the derivatives $\nabla^k f$ for $k \le d$.

As a performance test we compute the gradient $\nabla f(X)$ of $f(X) = \operatorname{tr}(X^{-1})$ in the reverse mode of AD as well as the matrix product in UTP arithmetic. We observe that UTPM arithmetic is typically at least one order of magnitude faster than UTPS arithmetic. Due to the nature of UTPM arithmetic, the memory footprint in the reverse mode is also small, and the approach can therefore be used to differentiate functions involving operations on large matrices.

The document is structured as follows: In Sect. 2, we give a brief summary of forward and reverse mode AD using UTP arithmetic and introduce our notation, followed by Sect. 3 where we add Numerical Linear Algebra (NLA) functions to the list of elementary functions. In Sect. 4, we examine what can go wrong if one applies the AD theory directly to existing implementations of NLA functions. Finally, we investigate the runtime performance in numerical experiments in Sect. 5.
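For the test function $f(X) = \operatorname{tr}(X^{-1})$ mentioned above, the gradient is available in closed form, which makes it a convenient reference when checking a matrix-level reverse sweep. The Python sketch below is only a plausibility check under stated assumptions and not the UTPM implementation of the paper: it uses the standard matrix-calculus identity $\mathrm{d}\operatorname{tr}(X^{-1}) = -\operatorname{tr}(X^{-1}\,\mathrm{d}X\,X^{-1})$, so that $\nabla f(X) = -(X^{-1}X^{-1})^T$, and compares the result with central finite differences on a hypothetical, well-conditioned test matrix.

```python
import numpy as np

def f(X):
    """Test function f(X) = tr(X^{-1})."""
    return np.trace(np.linalg.inv(X))

def grad_f(X):
    """Closed-form gradient: -(X^{-1} X^{-1})^T, treating inversion as one atomic step."""
    X_inv = np.linalg.inv(X)
    return -(X_inv @ X_inv).T

def grad_fd(X, h=1e-6):
    """Central finite differences, entry by entry (reference only)."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = h
            G[i, j] = (f(X + E) - f(X - E)) / (2.0 * h)
    return G

# Hypothetical test matrix: a small perturbation of the identity.
rng = np.random.default_rng(0)
X = np.eye(4) + 0.1 * rng.standard_normal((4, 4))

max_diff = np.max(np.abs(grad_f(X) - grad_fd(X)))
```

The closed-form gradient needs one matrix inversion and one matrix product, whereas the finite-difference reference needs two function evaluations per matrix entry.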

2 Algorithmic Differentiation

We assume that we deal with functions

$$F: \mathbb{R}^N \to \mathbb{R}^M, \quad x \mapsto y = F(x),$$

that can be described in the three-part form

$$\begin{aligned}
v_{n-N} &= x_n, & n &= 1, \dots, N,\\
v_l &= \phi_l(v_{j \prec l}), & l &= 1, \dots, L,\\
y_{M-m} &= v_{L-m}, & m &= M-1, \dots, 0,
\end{aligned} \tag{1}$$

where $\phi_l \in \{+, -, \times, /, \sin, \exp, \dots\}$ are elementary functions as defined in the C header file math.h. In the special case $M = 1$ we use $f$ instead of $F$. Higher-order derivatives can be computed by means of Univariate Taylor Polynomial arithmetic on Scalars (UTPS). This theory has been implemented in software by use of operator overloading, for example in ADOL-C [3]. The key observation is that the propagation of a univariate Taylor polynomial $x_0 + t \in \mathbb{R}[t]$ through a sufficiently often differentiable function $f: \mathbb{R} \to \mathbb{R}$ yields the derivatives $\nabla^d f$, $0 \le d < D$:

$$f(x_0 + t) = \sum_{d=0}^{D-1} \frac{1}{d!} \nabla^d f(x_0)\, t^d + \mathcal{O}(t^D). \tag{2}$$

On the Efficient Evaluation of Higher-Order Derivatives


By UTP arithmetic it is meant that one regards truncated polynomials of the form

$$[x]_D := \sum_{d=0}^{D-1} x_d t^d + (t^D) \in \mathbb{R}[t]/(t^D), \tag{3}$$

where $t$ is an indeterminate and $(t^D) := \mathbb{R}[t]\, t^D$. In other words, the UTP $[x]_D$ is defined by the first $D$ Taylor coefficients $x_d$, $d = 0, \dots, D-1$. Functions $\phi_l: \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ are generalized to functions $E_D(\phi_l): \mathbb{R}[t]/(t^D) \times \mathbb{R}[t]/(t^D) \to \mathbb{R}[t]/(t^D)$ defined by

$$[v_l]_D = E_D(\phi_l)([v_{j \prec l}]_D) := \sum_{d=0}^{D-1} \frac{1}{d!} \left. \frac{d^d}{dt^d} \phi_l([v_{j \prec l}]_D) \right|_{t=0} t^d. \tag{4}$$

We call $E_D(\phi_l)$ the extended function of $\phi_l$ (cf. [7, Chap. 13]). One can show that the UTP arithmetic defined by (4) is consistent with the usual polynomial addition and multiplication in the factor ring $\mathbb{R}[t]/(t^D)$.

In the reverse mode of AD one computes derivatives by application of the chain rule

$$\bar{v}_l\, dv_l = \bar{v}_l\, d\phi_l(v_{j \prec l}) = \bar{v}_l \sum_{j \prec l} \frac{\partial \phi_l}{\partial v_j}\, dv_j =: \sum_{j \prec l} \bar{v}_j\, dv_j. \tag{5}$$

For example, to compute the gradient of a function $f$ given in three-part form, the recursion is started with $\bar{v}_L = 1$ and is stopped when $v_j$ is an independent variable. The interpretation is that the bar values are entries of the gradient, i.e. $\bar{x}_n = \frac{\partial}{\partial x_n} f(x)$.

To compute higher-order derivatives, one can combine UTP arithmetic and the reverse mode. The important observation is that one can differentiate extended functions $E_D(F): \mathbb{R}^N[t]/(t^D) \to \mathbb{R}^M[t]/(t^D)$ that propagate univariate Taylor polynomials in the forward mode once more in the reverse mode. In consequence one obtains derivatives of degree $d = 1, \dots, D$. Christianson has shown that

$$[\bar{v}_l]_D\, d[v_l]_D = \sum_{j \prec l} [\bar{v}_l]_D\, E_D\!\left(\frac{\partial \phi_l}{\partial v_j}\right)([v_{j \prec l}]_D)\, d[v_j]_D = \sum_{j \prec l} [\bar{v}_j]_D\, d[v_j]_D \tag{6}$$

holds [1]. That is, one symbolically differentiates $\phi_l$, then uses UTPS to compute $E_D(\frac{\partial \phi_l}{\partial v_j})([v_{j \prec l}]_D)$ and finally employs the UTPS multiplication to obtain $[\bar{v}_j]_D := [\bar{v}_l]_D\, E_D(\frac{\partial \phi_l}{\partial v_j})([v_{j \prec l}]_D)$. That one obtains one higher order of derivatives can be seen from (2). For a more detailed discussion of AD we refer to the standard reference [7].
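As a small illustration of (2) and of multiplication in the factor ring $\mathbb{R}[t]/(t^D)$, the following Python sketch propagates $x_0 + t$ through $f(x) = x^3$ and reads off the derivatives from the Taylor coefficients. The coefficient-list representation and the names `utps_mul` and `cube` are assumptions made for this example; they are not taken from ADOL-C or Taylorpoly.

```python
from math import factorial

def utps_mul(x, y):
    """Truncated product of two UTPs given as coefficient lists of equal length D."""
    D = len(x)
    # Coefficient d of the product is the convolution sum_{k=0}^{d} x_k * y_{d-k}.
    return [sum(x[k] * y[d - k] for k in range(d + 1)) for d in range(D)]

def cube(x):
    """Extended function of f(x) = x**3, built from two UTPS multiplications."""
    return utps_mul(x, utps_mul(x, x))

D = 4
x0 = 2.0
x = [x0, 1.0] + [0.0] * (D - 2)   # [x]_D = x0 + t
y = cube(x)                        # Taylor coefficients of f(x0 + t)

# By (2), the d-th derivative is d! times the d-th Taylor coefficient.
derivs = [factorial(d) * y[d] for d in range(D)]
print(derivs)  # values of f, f', f'', f''' at x0
```

For $x_0 = 2$ the list contains $f = 8$, $f' = 12$, $f'' = 12$ and $f''' = 6$, as expected for $x^3$.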


3 Algorithmic Differentiation on Matrices

We now add the matrix product $\mathrm{dot}(X, Y) := XY$ and the matrix inversion $\mathrm{inv}(X) := X^{-1}$ to the list of elementary functions. Hence, there are now two possibilities how $\mathrm{inv}(\cdot)$ and $\mathrm{dot}(\cdot, \cdot)$ can be differentiated: either one regards the matrices as two-dimensional arrays and applies forward/reverse UTPS arithmetic to the linear algebra algorithms, or one considers matrices as elementary objects and applies matrix calculus as described in many textbooks and papers [2, 8, 10, 11]. In the forward mode of AD, the first possibility results in the following formal procedure:

$$\begin{pmatrix} [Y_{11}]_D & \dots & [Y_{1 N_Y}]_D \\ \vdots & \ddots & \vdots \\ [Y_{M_Y 1}]_D & \dots & [Y_{M_Y N_Y}]_D \end{pmatrix} = E_D(F)\left( \begin{pmatrix} [X_{11}]_D & \dots & [X_{1 N_X}]_D \\ \vdots & \ddots & \vdots \\ [X_{M_X 1}]_D & \dots & [X_{M_X N_X}]_D \end{pmatrix} \right).$$

A simple reformulation transforms a matrix of UTPS into a UTPM:

$$\begin{pmatrix} [X_{11}]_D & \dots & [X_{1 N_X}]_D \\ \vdots & \ddots & \vdots \\ [X_{M_X 1}]_D & \dots & [X_{M_X N_X}]_D \end{pmatrix} = \sum_{d=0}^{D-1} \begin{pmatrix} X_d^{11} & \dots & X_d^{1 N_X} \\ \vdots & \ddots & \vdots \\ X_d^{M_X 1} & \dots & X_d^{M_X N_X} \end{pmatrix} t^d. \tag{7}$$

We denote the rhs of (7) as $[X]_D$. The formal procedure then reads

$$[Y]_D = E_D(F)([X]_D). \tag{8}$$

To give an explicit example, consider the computation of $[Y]_D = [X^{-1}]_D := E_D(\mathrm{inv})([X]_D)$, where the constant term $X_0 \in \mathbb{R}^{N \times N}$ is nonsingular. That is, we have to find $[Y]_D = [X^{-1}]_D$ s.t.

$$\mathbb{1} = [X]_D [Y]_D + \mathcal{O}(t^D) = \sum_{d=0}^{D-1} \left( \sum_{k=0}^{d} X_k Y_{d-k} \right) t^d + \mathcal{O}(t^D)$$

is satisfied. Equating coefficients yields $Y_0 = X_0^{-1}$ and the recurrence

$$Y_d = -X_0^{-1} \left( \sum_{k=1}^{d} X_k Y_{d-k} \right), \quad d = 1, \dots, D-1. \tag{9}$$

We call this Univariate Taylor Polynomial arithmetic on Matrices (UTPM) as it is a straightforward generalization from UTPs with scalar coefficients to UTPs with matrix coefficients.
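The recurrence (9) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Taylorpoly implementation: a UTPM is stored as a list of $D$ coefficient matrices (nested lists), the helpers `matmul`, `matadd`, `inv2`, `utpm_inv` and `utpm_mul` are hypothetical names, and `inv2` handles only $2 \times 2$ matrices as a stand-in for a call to an optimized library routine.

```python
def matmul(A, B):
    """Plain matrix product of nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def inv2(A):
    """Inverse of a 2x2 matrix (stand-in for a library call)."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def utpm_inv(X):
    """Coefficients Y_0..Y_{D-1} of [X^{-1}]_D from X_0..X_{D-1} via (9)."""
    Y0 = inv2(X[0])
    Y = [Y0]
    for d in range(1, len(X)):
        # acc = sum_{k=1}^{d} X_k Y_{d-k}
        acc = matmul(X[1], Y[d - 1])
        for k in range(2, d + 1):
            acc = matadd(acc, matmul(X[k], Y[d - k]))
        # Y_d = -X_0^{-1} acc
        Y.append([[-v for v in row] for row in matmul(Y0, acc)])
    return Y

def utpm_mul(X, Y):
    """Truncated product [X]_D [Y]_D, used here only to check the result."""
    out = []
    for d in range(len(X)):
        acc = matmul(X[0], Y[d])
        for k in range(1, d + 1):
            acc = matadd(acc, matmul(X[k], Y[d - k]))
        out.append(acc)
    return out

X = [
    [[4.0, 1.0], [2.0, 3.0]],    # X_0, nonsingular
    [[0.5, -1.0], [2.0, 0.0]],   # X_1
    [[1.0, 1.0], [0.0, -1.0]],   # X_2
]
Y = utpm_inv(X)
P = utpm_mul(X, Y)  # should equal the identity plus zero higher coefficients
```

Only `inv2` and `matmul` touch matrix data, which is why these two calls can be replaced by optimized library routines without changing the recurrence.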


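Table 1 The sequence of operations of the nominal function evaluation $y = f(X) = \operatorname{tr}((X^T X)^{-1})$ is shown on the left and the corresponding sequence of operations of the reverse mode on the right

Nominal evaluation          Reverse mode
$v_0 = X$                   $\bar{v}_4 = \bar{y}$
$v_1 = v_0^T$               $\bar{v}_3 \mathrel{+}= \bar{v}_4 \mathbb{1}$
$v_2 = v_1 v_0$             $\bar{v}_2 \mathrel{+}= -v_3^T \bar{v}_3 v_3^T$
$v_3 = (v_2)^{-1}$          $\bar{v}_1 \mathrel{+}= \bar{v}_2 v_0^T$
$v_4 = \operatorname{tr}(v_3)$   $\bar{v}_0 \mathrel{+}= v_1^T \bar{v}_2$
                            $\bar{v}_0 \mathrel{+}= \bar{v}_1^T$
                            $\bar{X} = \bar{v}_0$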

Similarly, one can find the analog of the reverse mode using matrix calculus. Consider the real-valued function $y = f(X) \in \mathbb{R}$, where $X \in \mathbb{R}^{M \times N}$. Performing one step of the reverse mode one obtains

$$\bar{y}\, df(X) = \sum_{m=1}^{M} \sum_{n=1}^{N} \bar{y} \frac{\partial f}{\partial X_{mn}}\, dX_{mn} = \sum_{m=1}^{M} \sum_{n=1}^{N} \bar{X}_{mn}\, dX_{mn} = \operatorname{tr}(\bar{X}^T dX). \tag{10}$$

The reverse mode for the inverse of a matrix $Y = X^{-1}$, the transpose of a matrix $Y = X^T$, the trace of a matrix $y = \operatorname{tr}(X)$ and the matrix multiplication $Z = XY$ are respectively given by $\operatorname{tr}(\bar{Y}^T dY) = \operatorname{tr}(-Y \bar{Y}^T Y\, dX)$, $\operatorname{tr}(\bar{Y}^T dY) = \operatorname{tr}(\bar{Y}\, dX)$, $\bar{y}\, d\operatorname{tr}(X) = \operatorname{tr}(\bar{y} \mathbb{1}\, dX)$ and $\operatorname{tr}(\bar{Z}^T dZ) = \operatorname{tr}(Y \bar{Z}^T dX) + \operatorname{tr}(\bar{Z}^T X\, dY)$ [2]. To compute higher-order derivatives $\nabla^d f$ one can again combine UTP arithmetic and the reverse mode as shown in the previous section and obtains

$$\operatorname{tr}\!\left([\bar{v}_l]_D^T\, d\phi_l([v_{j \prec l}]_D)\right) = \sum_{j \prec l} \operatorname{tr}\!\left([\bar{v}_j]_D^T\, d[v_j]_D\right). \tag{11}$$

It is instructive to have a look at the example $y = f(X) = \operatorname{tr}((X^T X)^{-1})$, which also motivates the test function in Sect. 5. Its three-part form can be seen on the left side of Table 1 and the sequence of operations that have to be performed in the reverse sweep on the right side. The variable $\bar{X}$ is the desired gradient of the function $f$, i.e., $\bar{X} \equiv \nabla f(X)$.
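The forward and reverse sweeps of Table 1 translate almost line by line into code. The following Python sketch is an illustration only: it assumes nested lists as the matrix type, a tall input $X \in \mathbb{R}^{M \times 2}$ so that a hypothetical $2 \times 2$ helper `inv2` suffices, and $\bar{y} = 1$ by default. The function `fd_grad` is a finite-difference check added for this example and is not part of the method.

```python
def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def inv2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def f(X):
    """Nominal evaluation y = tr((X^T X)^{-1})."""
    v3 = inv2(matmul(transpose(X), X))
    return v3[0][0] + v3[1][1]

def grad_f(X, ybar=1.0):
    # Forward sweep (left column of Table 1).
    v0 = X
    v1 = transpose(v0)
    v2 = matmul(v1, v0)
    v3 = inv2(v2)
    # Reverse sweep (right column of Table 1).
    v3bar = [[ybar if i == j else 0.0 for j in range(2)] for i in range(2)]
    v3T = transpose(v3)
    v2bar = [[-v for v in row] for row in matmul(matmul(v3T, v3bar), v3T)]
    v1bar = matmul(v2bar, transpose(v0))
    v0bar = matmul(transpose(v1), v2bar)
    v0bar = matadd(v0bar, transpose(v1bar))
    return v0bar

def fd_grad(X, h=1e-6):
    """Central finite differences, used only to cross-check grad_f."""
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (f(Xp) - f(Xm)) / (2.0 * h)
    return G

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
G = grad_f(X)
G_fd = fd_grad(X)
```

For this $X$ the reverse sweep reproduces the closed form $-2 X (X^T X)^{-2}$, and the finite-difference values agree with it up to the expected discretization error.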

4 Automatic Differentiation of Existing NLA Algorithms

We briefly investigate what can go wrong if one applies automatic differentiation techniques to existing NLA algorithms. In particular, we are interested in the correctness of the computed derivatives in the reverse mode. As an example, we have a look at Algorithm 4 (cf. Algorithm 5.1.3 from [6]). The algorithm computes one Givens rotation, i.e. $c, s \in \mathbb{R}$ from $a, b \in \mathbb{R}$ s.t.

$$\begin{pmatrix} c & s \\ -s & c \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} r \\ 0 \end{pmatrix} \tag{12}$$

holds. Problematic from the AD point of view is the case $b = 0$. The reason is that the nominal, i.e. normal, function evaluation fixes the control flow and other branches are simply ignored. In other words, if $b = 0$ then the three-part form is $v_{-1} = a$; $v_0 = b$; $c = v_1 = 1$, $s = v_2 = 0$. Hence, a reverse sweep would yield the incorrect result $\bar{a} = 0$ and $\bar{b} = 0$. This is also what one can observe when an AD tool like Tapenade [9] is applied to a C or Fortran implementation of Algorithm 4. The naive solution to the problem is simply not to perform the check $b = 0$. However, this obviously has a negative effect on the performance.
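The effect of the fixed control flow can be reproduced with a few lines of Python. The sketch below is an assumption-laden illustration: it uses forward-mode dual numbers (a hypothetical `Dual` class) instead of the reverse mode and Tapenade discussed above, and it takes the ratios $r = a/b$ and $r = b/a$ with positive sign. With the check $b = 0$ in place, $s$ is recorded as the constant 0 and its derivative with respect to $b$ is lost; without the check, the derivative $\partial s / \partial b = 1/a$ at $b = 0$ is recovered.

```python
import math
from dataclasses import dataclass

@dataclass
class Dual:
    """Value and first-order directional derivative (forward mode)."""
    val: float
    dot: float = 0.0

    def __add__(self, o):
        o = _lift(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__

    def __mul__(self, o):
        o = _lift(o)
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
    __rmul__ = __mul__

    def __truediv__(self, o):
        o = _lift(o)
        return Dual(self.val / o.val,
                    (self.dot * o.val - self.val * o.dot) / (o.val * o.val))

    def __rtruediv__(self, o):
        return _lift(o) / self

def _lift(x):
    return x if isinstance(x, Dual) else Dual(float(x))

def dsqrt(x):
    r = math.sqrt(x.val)
    return Dual(r, x.dot / (2.0 * r))

def givens(a, b, check_zero=True):
    """Givens coefficients c, s; the branch on b == 0 is optional."""
    if check_zero and b.val == 0.0:
        return Dual(1.0), Dual(0.0)   # constants: dependence on a, b is lost
    if abs(b.val) > abs(a.val):
        r = a / b
        s = 1.0 / dsqrt(1.0 + r * r)
        c = s * r
    else:
        r = b / a
        c = 1.0 / dsqrt(1.0 + r * r)
        s = c * r
    return c, s

a, b = Dual(2.0, 0.0), Dual(0.0, 1.0)   # seed: differentiate w.r.t. b
c_checked, s_checked = givens(a, b, check_zero=True)
c_free, s_free = givens(a, b, check_zero=False)
print(s_checked.dot, s_free.dot)
```

Both variants return the same values $c = 1$, $s = 0$; only the derivative information differs.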

5 Experimental Performance Comparison

We now shift the focus to the performance comparison between UTPS and UTPM arithmetic. It is instructive to get a general idea from the example of the matrix inversion $Y = X^{-1}$, $X \in \mathbb{R}^{N \times N}$. Step $d$ of the recurrence (9) requires $d + 1$ matrix products and $d - 1$ matrix additions, so summing over $d = 1, \dots, D-1$ yields a computational cost of

$$\operatorname{ops}(X^{-1}) + \frac{(D-1)(D-2)}{2} \operatorname{ops}(+) + \frac{(D+2)(D-1)}{2} \operatorname{ops}(\mathrm{dot}).$$

Assuming that the matrix addition is an $\mathcal{O}(N^2)$ operation and the matrix multiplication/inversion is an $\mathcal{O}(N^3)$ operation, one obtains a computational cost that scales with $\mathcal{O}(D^2 N^3)$. On the other hand, evaluating the matrix inversion in UTPS arithmetic requires

Algorithm 4: Computes $c, s$ from $a, b$ as defined by (12).
input : $a, b \in \mathbb{R}$
output: $s, c \in \mathbb{R}$
if $b = 0$ then
    $c = 1$; $s = 0$
else
    if $|b| > |a|$ then
        $r = a/b$; $s = 1/\sqrt{1 + r^2}$; $c = s r$
    else
        $r = b/a$; $c = 1/\sqrt{1 + r^2}$; $s = c r$
    end
end

$$\operatorname{ops}(\times, X^{-1}) \left( \frac{(D+1)D}{2} \operatorname{ops}(xy) + \frac{(D-1)D}{2} \operatorname{ops}(x+y) \right) + \operatorname{ops}(+, X^{-1}) \bigl( D \operatorname{ops}(x+y) \bigr)$$

operations in total. The quantities $\operatorname{ops}(\times, X^{-1})$ and $\operatorname{ops}(+, X^{-1})$ are the numbers of multiplications and additions, respectively, in the matrix inversion. To leading order this is also $\mathcal{O}(N^3 D^2)$. Hence, one can expect UTPM and UTPS arithmetic to show the same scaling law in a runtime comparison. Nonetheless, there are reasons why one can expect a relatively large difference in practice.

Firstly, simply counting the operations is inadequate in practice since there is no one-to-one correspondence between the mathematical formulation and the sequence of instructions that are performed on the hardware. In particular, the mathematical formulation has no notion of memory movements. As a practitioner one is typically not interested in such details and therefore tries to use optimized algorithms, such as those provided by ATLAS, that hide such issues from the user [14]. Looking at (9) one can see that all coefficients $Y_d$ can be computed by calling the ATLAS routines clapack_dgetrf, clapack_dgetri and cblas_dgemm. That is, using UTPM arithmetic it is possible to employ existing high-performance implementations. The alternative, i.e. augmenting such optimized implementations for UTPS arithmetic, is likely to destroy the cache efficiency since a UTPS $[x]_D \in \mathbb{R}[t]/(t^D)$ requires $D$ times more memory than a scalar $x \in \mathbb{R}$.

The second reason concerns only the reverse mode of AD. Functions such as $\mathrm{dot}(\cdot, \cdot)$ and $\mathrm{inv}(\cdot)$ scale with $\mathcal{O}(N^3)$ and therefore $\mathcal{O}(N^3)$ intermediate values need to be available during a reverse sweep. Typically, AD tools simply write the intermediate values into memory during a forward evaluation and retrieve them during the reverse sweep. That means that the naive UTPS approach requires memory of order $\mathcal{O}(D^2 N^3)$. This has to be compared to the UTPM approach, which requires storing only $\mathcal{O}(D^2 N^2)$ intermediate values. Since modern CPUs are much faster than the memory, an $\mathcal{O}(D^2 N^3)$ memory requirement also has a negative effect on the runtime performance.

Note that not all is lost for the UTPS approach, as it is possible to recompute intermediate values, as reported for the LU decomposition, or to use checkpointing to reduce the memory requirement [4].

To compare the performance of UTPM and UTPS arithmetic, we use two easy but sufficiently complex examples. The first test problem is the computation of $\nabla f(X) \in \mathbb{R}^{N \times N}$ of

$$X \mapsto f(X) = \operatorname{tr}(X^{-1}) \tag{13}$$

in the reverse mode of AD. Since the runtimes of the gradient evaluation depend on the underlying function, we also measure the runtimes of the normal function evaluation. To avoid problems due to pivoting as mandatory in an LU decomposition, we use a matrix inversion algorithm based on a QR decomposition employing


Givens rotations without checks like $b = 0$ in Algorithm 4. Since ADOL-C requires generic C++ code but Tapenade requires C code, the same algorithm is provided in both languages. ADOL-C requires the sequence of operations to be traced because all further operations with ADOL-C are performed on an internal representation of the function which is similar to the three-part form. We measure both the time for the tracing and for a function evaluation using the trace. For comparison, timings of an implementation using ATLAS are also shown. The results are depicted and interpreted in Fig. 1.

The second test problem is the evaluation of the matrix product $\mathrm{dot}: \mathbb{R}^{N \times N} \times \mathbb{R}^{N \times N} \to \mathbb{R}^{N \times N}$ in UTP arithmetic. Since Tapenade does not offer the possibility of UTP arithmetic with $D > 2$, we implemented by hand a dot function that performs UTPS arithmetic. As building blocks we use algorithms from Taylorpoly [12]. Furthermore, we use Taylorpoly's UTPM implementation of the dot function. The results are shown in Fig. 2.

[Figure 1: two panels, runtime t [ms] on a logarithmic scale versus matrix size N. Left panel, "Function Evaluation of inv(A)": Function eval (C), Function eval (C++), Function eval (ATLAS), ADOL-C tracing, ADOL-C eval. Right panel, "Gradient Evaluation trace(inv(A))": UTPM eval (C), UTPM eval (ATLAS), TAPENADE eval, ADOL-C tracing, ADOL-C eval.]

Fig. 1 In the left plot the runtimes for different implementations to compute the inverse of a matrix are shown. One can see that the tracing is the most time-consuming operation. Also, the interpreted evaluation of the function using the trace is much slower than the evaluation of compiled functions. In the right plot, one can see the runtime of the gradient evaluation. ADOL-C as well as the compiled code generated by Tapenade are significantly slower than the UTPM arithmetic

[Figure 2: two panels, runtime t [ms] on a logarithmic scale versus matrix size N. Left panel, "UTP, matrix-matrix product, P,D=1,2"; right panel, "UTP, matrix-matrix product, P,D=10,10". Curves in both panels: ADOL-C tracing, ADOL-C eval, TAYLORPOLY UTPM eval, TAYLORPOLY UTPS eval.]

Fig. 2 One can see that the UTPS arithmetic using Taylorpoly and ADOL-C is considerably slower than UTPM arithmetic, both for $D = 2$ (left plot) and $D = 10$ (right plot). $P$ denotes the number of simultaneous directions as described in the ADOL-C documentation [3] and is chosen as 10 to reduce the relative overhead of UTPS arithmetic


The experiments have been performed on a Dell Latitude D530 with an Intel(R) Core(TM)2 Duo CPU T7300 @2.00 GHz with 2048628 kB physical memory on Linux 2.6.32-24-generic. All sources have been compiled with gcc 4.4.3 using the optimization flag -O3. The source code for the tests is available at [13].

6 Summary

We have compared two possible approaches to compute higher-order derivatives of functions containing numerical linear algebra functions in the forward and reverse mode of AD using univariate Taylor polynomial (UTP) arithmetic. We have described that the direct application of UTPS arithmetic to existing numerical linear algebra algorithms may give wrong results if no special care is taken. Also, runtimes have been presented that indicate that the UTPM approach leads to better performance, especially if one uses existing highly optimized algorithms. We have also briefly discussed that the UTPM approach needs less memory in the reverse mode of AD in a very natural way. An open question is how well the UTPM arithmetic generalizes to sparse matrix operations.

Acknowledgements The author is grateful to Andreas Griewank for the discussions on computational complexities in AD and to Lutz Lehmann for several enlightening discussions. This project is supported by the Bundesministerium für Bildung und Forschung (BMBF) within the project NOVOEXP (Numerische Optimierungsverfahren für die Parameterschätzung und den Entwurf optimaler Experimente unter Berücksichtigung von Unsicherheiten für die Modellvalidierung verfahrenstechnischer Prozesse der Chemie und Biotechnologie) (03GRPAL3), Humboldt-Universität zu Berlin.

References

1. Bruce Christianson. Reverse accumulation and accurate rounding error estimates for Taylor series coefficients. Optimization Methods and Software, 1(1):81–94, 1991. Also appeared as Tech. Report No. NOC TR239, The Numerical Optimisation Centre, University of Hertfordshire, U.K., July 1991.
2. Mike B. Giles. Collected matrix derivative results for forward and reverse mode algorithmic differentiation. In Christian H. Bischof, H. Martin Bücker, Paul D. Hovland, Uwe Naumann, and J. Utke, editors, Advances in Automatic Differentiation, pages 35–44. Springer, 2008.
3. Andreas Griewank, David Juedes, H. Mitev, Jean Utke, Olaf Vogel, and Andrea Walther. ADOL-C: A package for the automatic differentiation of algorithms written in C/C++. Technical report, Institute of Scientific Computing, Technical University Dresden, 1999. Updated version of the paper published in ACM Trans. Math. Software 22, 1996, 131–167.
4. Andreas Griewank. A mathematical view of automatic differentiation. In Acta Numerica, volume 12, pages 321–398. Cambridge University Press, 2003.
5. Andreas Griewank, Jean Utke, and Andrea Walther. Evaluating higher derivative tensors by forward propagation of univariate Taylor series. Mathematics of Computation, 69:1117–1130, 2000.
6. Gene H. Golub and Charles F. Van Loan. Matrix computations (3rd ed.). Johns Hopkins University Press, Baltimore, MD, USA, 1996.
7. Andreas Griewank and Andrea Walther. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. Number 105 in Other Titles in Applied Mathematics. SIAM, Philadelphia, PA, 2nd edition, 2008.
8. Michael J.R. Healy. Matrices for Statistics. Clarendon Press, Oxford, 2nd edition, 2000.
9. Laurent Hascoët and Valérie Pascual. Tapenade 2.1 user's guide. Technical Report 0300, INRIA, 2004.
10. Jan R. Magnus and Heinz Neudecker. Matrix differential calculus with applications in statistics and econometrics. John Wiley & Sons, 2nd edition, 1999.
11. James R. Schott, editor. Matrix Analysis for Statistics. Wiley, New York, 1997.
12. Sebastian F. Walter. Taylorpoly. http://github.com/b45ch1/taylorpoly, 2009–2010.
13. Sebastian F. Walter. Source code of the performance comparison. http://github.com/b45ch1/hpsc_hanoi_2009_walter, 2010.
14. R. Clint Whaley, Antoine Petitet, and Jack J. Dongarra. Automated empirical optimization of software and the ATLAS project. Parallel Computing, 27(1–2):3–35, 2001. Also available as University of Tennessee LAPACK Working Note #147, UT-CS-00-448, 2000 (www.netlib.org/lapack/lawns/lawn147.ps).

Modeling of Non-ideal Variable Pitch Valve Springs for Use in Automotive Cam Optimization Henry Yau and Richard W. Longman

Abstract Optimal control theory has been studied for use in developing valve trains in engines to minimize vibration and wear. Previous works have concentrated on the optimization of the cam lobe profile using an ideal linear spring model for the valve spring. The ideal linear spring model cannot capture the variations in spring stiffness that occur at high speeds due to the internal spring dynamics. By using a multiple-mass lumped-parameter spring, greater accuracy may be obtained in simulation. In addition, such a model allows spring pitch to be included as an additional optimization parameter. In this paper, a simple multi-mass variable pitch spring model is developed to be used in valve pitch optimization as well as cam profile optimization.

1 Introduction

As the U.S. federal government mandates increasingly strict regulations on fuel efficiency in automobiles, even minute improvements in the engine are sought after. One component that has been targeted for improvement is the engine valve train. The valve train facilitates the engine breathing by opening and closing the intake and exhaust valves, which are currently universally actuated by cams. Researchers have concentrated in the past on designing cams that improve certain performance aspects, such as minimizing the amount of vibration or reducing contact stress [1, 2, 6, 7, 10, 18, 19, 24]. By minimizing the contact force between the cams and the valve followers, the frictional forces, which contribute up to 15% of all friction losses [27] in the engine, are reduced. In addition, when variable cam lift profiles are introduced and combined with variable cam timing, the valve train may be designed to act optimally for a large range of operating speeds [16].

H. Yau · R.W. Longman, Columbia University, MC 4703, 500 West 120th Street, New York, NY 10027, USA. e-mail: [email protected]; [email protected]
H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_27, © Springer-Verlag Berlin Heidelberg 2012


The aim of this work is to study one often overlooked component of the engine valve train, the valve spring. The valve spring provides the force to keep the cam and follower in contact. Previous works on cam optimization used an ideal linear spring to model the valve spring. However, above a certain frequency, the internal resonance has a significant effect on the spring stiffness [15]. A spring model that captures the effects of internal resonance and varying pitch is developed. Insight on varying pitch is gained by using this spring model to evaluate a cam follower system. The spring model will be used when developing methods for optimizing the valve train considering the spring pitch as an additional optimization variable.

1.1 Cam Profile Design

The current practice in designing automotive valve lift profiles has settled on blending simple polynomial segments with trigonometric functions [11, 20] and manually manipulating the characteristic curves for lift, velocity, acceleration and jerk for better performance. For automotive cams, the lift profile can be separated into three distinct segments: the opening ramp event, the main event, and the closing ramp. The ramp events (cosine, rectangular, or trapezoid) are used to minimize backlash [4] and control valve seating velocity and seal. The main event is generally a polynomial curve computed using polydyne theory to smoothly join the two ramps. Better performance should be expected by using optimal control theory to assist the cam designer, as the most commonly used commercial cam design software relies on the designer iteratively manipulating a control spline and running simulations on simple valve train models.

Early investigations in optimal control theory applied to the design of high speed cam follower systems were done in [2] and [24]. The notion of better relative performance must first be defined, as is done by Sun et al. in [24] for high-speed cams operating at a fixed speed. Two competing optimality criteria are introduced by the authors of [2]: minimizing the residual vibration and minimizing the contact stress. Their conclusion was that to minimize residual vibration, a cycloid-like profile is desired, and to minimize contact stress, a parabolic-like profile is desired. The problems raised by the high nonlinearities that arise from the contact stress cost functional could not be easily addressed at the time. Similarly, the concern of [24] was to increase the life of the cam by reducing the peak forces (Hertzian contact stress) and minimizing the energy consumption due to friction. The final recommendation for a cost function is to penalize the third derivative of the follower force $\dddot{F}_f$ and the jerk $\dddot{Y}$:

$$J = \int_0^1 \left( W_1 \dddot{Y}^2 + W_2 \dddot{F}_f^2 \right) d\tau \tag{1}$$

with $W_1$ and $W_2$ as designer selected weights. Increasing the former places emphasis on minimizing contact stress while increasing the latter emphasizes reducing the


residual vibrations. The proposed cost functional and system was later easily implemented in MUSCOD-II [12], a suite of optimal control solvers, as it is quadratic in both the control and state variables.

Of interest for engine valve-train usage is how the proposed cam behaves at off-nominal design speeds. To avoid valve float, contact must be maintained between the cam and follower for all operational speeds. Separation for the optimal cam occurs at a higher speed than for the polydyne cam with a lower spring pre-load, but the residual vibrations are higher, introducing another compromise. For an automotive cam that may run from 400 to 4,500 rpm, minimizing the single cost functional at a fixed speed is less than ideal, as the minimum acceptable follower contact force $F_c$ must be kept for the entire range to prevent separation. Fabien et al. [6] and Mennicke et al. [19] address the issue by designing the cam to minimize the sum of the cost functionals for a chosen finite set of speeds.
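To make the cost functional (1) concrete, the following Python sketch evaluates only its jerk term numerically for a cycloidal lift profile. Everything specific here is an assumption for illustration: the normalized profile $Y(\tau) = \tau - \sin(2\pi\tau)/(2\pi)$, the midpoint rule, and the function names. The force term with weight $W_2$ is omitted because it requires a follower dynamics model that is not given in this section.

```python
import math

def jerk_cost(jerk, W1=1.0, n=2000):
    """Midpoint-rule approximation of W1 * integral_0^1 jerk(tau)**2 dtau."""
    h = 1.0 / n
    return W1 * h * sum(jerk((i + 0.5) * h) ** 2 for i in range(n))

def cycloidal_jerk(tau):
    # Assumed profile Y(tau) = tau - sin(2 pi tau) / (2 pi),
    # whose third derivative is 4 pi^2 cos(2 pi tau).
    return 4.0 * math.pi ** 2 * math.cos(2.0 * math.pi * tau)

J = jerk_cost(cycloidal_jerk)
J_exact = 8.0 * math.pi ** 4   # closed form of the same integral
print(J, J_exact)
```

A design loop would replace `cycloidal_jerk` by the jerk of a candidate profile and add the weighted force term once a valve train model supplies $F_f$.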

1.2 Optimizing Spring Properties to Maintain Contact

To maintain contact between the cam and the follower in an engine, a helical spring is almost universally used. At high speeds, separation occurs between the cam and follower when the inertial effects of the valve follower overwhelm the force of the spring. This behavior has been called valve jump or float and has been studied in [21]. Although it is occasionally beneficial to have the follower leave the surface of the cam, such as in a race engine where air exchange may be improved [17], the resultant impact with the cam or valve seat makes float generally undesirable. The subsequent bouncing after impact prevents the valve from maintaining a complete seal. Increasing the pre-load of the spring will solve the valve float problem, but at the cost of increasing the contact force and thus the contact stresses, wear, and fuel consumption. In addition, the internal wave propagation of the spring coils results in separation at a lower speed than would be expected using an ideal linear spring model, as well as causing higher residual vibrations.

One method to resolve these issues is to use a variable pitch spring. By varying the pitch, the force to displacement curve has a stiffening non-linearity and the internal coil collisions damp the spring motion [5]. The first property is desirable, as the varying pitched spring can provide only the force necessary to ensure cam to tappet contact throughout the operating range. This current work develops a model to be used in optimizing the pitch of the valve springs. This extra variable in spring design has not been previously studied in terms of optimization for cam follower systems. With an accurate model, a valve spring's pitch may be optimized such that the spring provides sufficient force to maintain contact while limiting the residual vibration to a tolerable amount.

By decreasing the applied spring force, the friction between the cam and the follower is reduced, which leads to less energy loss as well as reduced component wear. Previous work has concentrated on producing a cam which minimizes specified cost functionals. Future works will concentrate on optimizing both the cam and the spring pitch.


2 Valve Train Basics

The automotive valve train has evolved steadily over the past several decades. In many regards it is the most critical component with regard to engine performance. The valve train itself consists of essentially four components: the cam, the valve spring, the valve follower, and the valve. The cam is a rigid oblong disk that is driven by the crankshaft of the engine, rotating at half the speed of the crankshaft for four stroke engines. In modern engine design it is common for each combustion cylinder to have two intake and two exhaust valves with a single cam lobe actuating each valve.

When optimizing any component of the engine, several interconnected considerations must be evaluated. These interconnections for the valve train are shown in Fig. 1 from [18]. Some of these considerations were discussed earlier in terms of trade-offs. Directly relating to the valve spring is decreasing the contact force. Lower contact force means less friction and thus lower fuel consumption; however, it would also mean a lower separation speed and increased residual vibrations. When coupled with optimizing the cam, the complexity increases significantly.

Valve train configurations in use today can be generally classified into three categories [4]: direct acting, push-rod, and cam on rocker. The direct acting follower shown in Fig. 2b has a cam directly forcing the valve follower through a tappet. Due to cam and tappet wear, the space between the two, called lash, increases, so slivers of metal called adjustment shims need to be placed under the tappet to reduce the lash back to the manufacturer's specifications. Without regular maintenance of the lash, the performance of the engine decreases, and in extreme situations this may lead to damage to the valve head and seat. The use of hydraulic lash adjusters eliminates the problem by using the engine's oil pressure to maintain a consistent lash.

Although not modeled here, it is necessary to consider the hydraulic lash adjuster when optimizing the valve train, as it has a significant effect on valve seating. Hydraulic lash adjusters are used in both the cam on rocker arm (CORA) and push-rod designs. In a CORA valve train, the adjuster is placed in the head of the engine block. A finger follower rocker arm pivots about the tip of the adjuster while the cam depresses the rocker arm either directly on a finished rounded surface or on a roller placed within the rocker arm, as shown in Fig. 2a. The push-rod valve train, which positions the camshaft near the crankshaft of the engine, is shown in Fig. 2c. A long

Fig. 1 Physical considerations in cam optimization


Fig. 2 Various valve train configurations

rod, generally with a roller follower and a hydraulic lash adjuster, attached to a rocker arm is actuated by the cam lobe. Most American automobile manufacturers favored this configuration for decades. It is only relatively recently that American manufacturers have switched to moving the camshaft above the pistons. The long push-rod leads to severe vibration problems at high speeds, essentially limiting the maximum speed. The flat face direct acting follower is used in this study, as the other configurations may be interpreted as a flat face follower with some special conditions on the tappet.

3 Modeling of the Valve Spring

As with any modeling, when implementing a spring model for use in optimization, a balance between numerical efficiency and the level of model refinement must be made. There are three key behaviors that the valve spring model must capture:

1. The nonlinear stiffness of the spring that results from the varying pitch.
2. The internal dynamics of the spring coils, called spring surge.
3. The effects of a coil coming into contact with another coil, called coil clash or coil collisions.

3.1 Variable Pitch Spring Constant

As the name implies, in a varying pitch spring the pitch or angle of inclination of the spring wire changes along its length, as seen on the left of Fig. 3. As the spring is compressed, the more gradually pitched coils, which are less stiff and closer in proximity to each other, come into contact, as seen on the right of the figure.


Fig. 3 A varying pitch spring uncompressed and partially compressed

Fig. 4 Simulation of valve spring depicting valve surge during closing of valve

This action is called coil close. Unlike a uniformly pitched spring, the coils close at differing times. As the number of active coils is reduced, the spring becomes stiffer. Figure 4 shows a series of images depicting a valve spring simulated using a 7-mass spring model. The propagation of a wave can be seen as the spring is released from compression. After the spring has returned to its installation length, coils are still moving.

Solutions to the forced vibration of helical springs are given by [15, 22, 26]. However, these do not consider varying spring pitch. In [14], a general model is developed which does consider varying pitch, and [25] develops a model which includes varying pitch and coil clash. None of these provide closed-form solutions that would be suitable in optimization. Instead, for this work a simple lumped multi-mass spring-damper model is used. The individual spring stiffnesses and coil positions are determined by the pitch along the spring. The equation used for the stiffness of the spring sections is derived by [25]. Assuming a constant spring radius $r$ and that a single active coil is represented by a mass element, the stiffness $k$ is given by: $k =$

Es C IB Gs AJ cos2 p LIB cos2 p.Es J cos2 p C Gs J sin2 p C Es r 2 cos2 p C 3Gs AJLr 2 sin2 p/ (2)

Modeling of Non-ideal Variable Pitch Valve Springs


where $I_B$ is the moment of inertia of the wire cross-section about the spring axial direction, $E_s$ is the elastic modulus of the spring material, $G_s$ is the shear modulus of the spring material, $A$ is the area of the spring wire cross-section, $J$ is the polar moment, $p$ is the pitch, and $L$ is the length of the coil, found by $L = 2\pi r \sqrt{1 + \tan^2(p)}$.
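The coil-length relation can be illustrated with a short Python sketch. It assumes a round wire of diameter d, so that the cross-section area is pi*d^2/4, and it lumps the whole coil into one mass element. The function names and the pitch angles in the usage loop are illustrative choices, not values from the paper; the radius, wire diameter and density are the ones quoted in Sect. 4.

```python
import math


def coil_length(r, p):
    """Wire length of one coil: L = 2*pi*r*sqrt(1 + tan(p)^2), with p in radians."""
    return 2.0 * math.pi * r * math.sqrt(1.0 + math.tan(p) ** 2)


def coil_mass(rho, d, r, p):
    """Mass of one coil lumped into a single element (round wire assumed)."""
    area = math.pi * d ** 2 / 4.0  # wire cross-section area
    return rho * area * coil_length(r, p)


if __name__ == "__main__":
    r = 0.010      # spring radius [m]
    d = 0.0007     # wire diameter [m]
    rho = 7800.0   # density of the spring steel [kg/m^3]
    for p_deg in (2.0, 5.0, 10.0):  # hypothetical pitch angles
        p = math.radians(p_deg)
        print(p_deg, coil_length(r, p), coil_mass(rho, d, r, p))
```

A steeper pitch gives a longer wire per coil and therefore a slightly heavier mass element, which is how the pitch distribution feeds into the mass matrix of the lumped model.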

3.2 Friction Within the spring there is internal damping; however, it is often the case, particularly for uniformly spaced springs, that the internal damping is not sufficient to eliminate the internal wave motions. One technique that manufacturers use is to damp the motion through friction between the spring and a cylindrical sleeve or an internal spring. Here the friction force of each coil $F_f$ is modeled as Coulomb friction:

$$F_f = \begin{cases} F, & |F| \le \mu_s F_n \\ \operatorname{sgn}(\dot{x})\,\mu_k F_n, & |F| > \mu_s F_n \end{cases} \tag{3}$$

where $F_n$ is the normal force between the spring coil and the sleeve, $F$ is the sum of all forces acting on the coil excluding friction, $\mu_s$ is the coefficient of static friction, and $\mu_k$ is the coefficient of kinetic friction. In an actual spring, the radius expands as the spring is compressed and the normal force increases; however, this effect is not considered in the model.
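A minimal Python sketch of the two-branch friction law in (3) is given below. The sign convention is an assumption: the returned force is treated as a resisting term on the same side of the equation of motion as the inertia and stiffness terms, so in the stick branch it simply balances the net force F. The function name and the numerical values in the usage lines are hypothetical.

```python
def coulomb_friction(F, x_dot, F_n, mu_s, mu_k):
    """Friction force on one coil following the two branches of Eq. (3).

    F     : sum of all non-friction forces on the coil
    x_dot : coil velocity
    F_n   : normal force between coil and sleeve
    """
    if abs(F) <= mu_s * F_n:
        # Stick: friction balances the net force, so the coil does not accelerate.
        return F
    # Slip: kinetic friction with the sign of the velocity.
    sign = (x_dot > 0) - (x_dot < 0)
    return sign * mu_k * F_n


if __name__ == "__main__":
    print(coulomb_friction(1.0, 0.0, 10.0, 0.3, 0.2))   # stick branch
    print(coulomb_friction(5.0, 2.0, 10.0, 0.3, 0.2))   # slip branch
```

Because the branch switches discontinuously, an integrator with a fixed step may chatter around zero velocity; smoothing the sign function is a common workaround, but it is not part of the model described here.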

3.3 Coil Collisions The energy dissipated in coil collisions arises from several mechanisms, including elastic waves, plastic deformation, and viscoelastic work. An overview of collision modeling is given in [8]. A penalty method can be applied on contact, creating a continuous force. The best known is the non-linear Hertz law for sphere-to-sphere collision, $F_c = k_c \delta^n$, where $n = 1.5$ for metallic spheres, $\delta$ is the approach or penetration distance, and $k_c$ is the general stiffness constant, which depends on the sphere dimensions and material properties. Generally for metallic collisions the viscous work is insignificant and the collision behaves elastically; however, due to the high speeds involved in the coil collisions, it cannot be neglected here. The coil contact force uses the model developed in [13], shown in (4), which extends the non-linear Hertz law to account for the viscous damping due to material hysteresis.

$$F_c = k_c \delta^n \left(1 + \frac{3(1 - e^2)\,\dot{\delta}}{4\,\dot{\delta}_i}\right) \tag{4}$$


where $e$ is the coefficient of restitution, assumed to be constant, $\dot{\delta}_i$ is the relative velocity at impact, and $\dot{\delta}$ is the instantaneous relative velocity of the colliding coils. Strictly, the more general contact force in (4) is applicable only to sphere-to-sphere contact; however, [9] states that a choice of $n$ from 1 to 1.5 gives a good approximation of cylinder-to-cylinder contact. Though models do exist for cylinder-to-cylinder collision, such as [3], they are typically non-linear as well as implicit and do not account for hysteresis damping. The stiffness of the spring coils on collision used in (4) is given in [25]:

$$k_c = \frac{2 E_s}{3(1 - \nu^2)} \sqrt{\frac{d}{4}} \tag{5}$$

where $d$ is the spring wire diameter, $E_s$ is the modulus of elasticity, and $\nu$ is Poisson's ratio. The approach distance $\delta$ for two cylinders is given in [9] as:

$$\delta = \left[\frac{m\,\dot{\delta}_i^2\,(g + 1)}{2 k_c}\right]^{1/(1+g)} \tag{6}$$

where $g = 3/2$ and $m$ is the mass of the coils which are in contact during the collision. For the simulations done in this paper, the penalty method is sufficient; however, it presents problems when viewed from the perspective of optimization. Using the penalty method requires extremely small step sizes due to the high material stiffness. The alternative is to apply complementarity theory and use a constraint method as in [23], which assumes that a collision occurs instantaneously and stops the integration at that point. The integration is then restarted with updated initial conditions after impact. In Fig. 5, the force-displacement diagram is presented for a 7-mass spring using the penalty method to handle coil impacts. At zero displacement the spring exerts 200 N of pre-load force. As the coils close, the spring stiffness gradually increases, resulting in a non-linear force curve. The spring is displaced at a low speed so that there is no wave propagation; thus the curve is essentially piecewise linear.
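The penalty-type contact law in (4) and the peak approach distance can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the contact stiffness k_c is passed in as a plain number (the value in the usage lines is hypothetical), the force is set to zero when there is no penetration, and max_approach uses the undamped energy balance 0.5*m*v^2 = k_c*delta^(g+1)/(g+1), which has the impact velocity squared.

```python
def contact_force(delta, delta_dot, delta_dot_i, k_c, e, n=1.5):
    """Hertz contact force with hysteresis damping, Eq. (4).

    delta       : penetration (approach) distance, positive when coils overlap
    delta_dot   : instantaneous relative velocity of the coils
    delta_dot_i : relative velocity at the moment of impact (non-zero)
    """
    if delta <= 0.0:
        return 0.0  # coils are not in contact
    damping = 3.0 * (1.0 - e ** 2) * delta_dot / (4.0 * delta_dot_i)
    return k_c * delta ** n * (1.0 + damping)


def max_approach(m, delta_dot_i, k_c, g=1.5):
    """Peak penetration from the undamped energy balance of the impact."""
    return (m * delta_dot_i ** 2 * (g + 1.0) / (2.0 * k_c)) ** (1.0 / (1.0 + g))


if __name__ == "__main__":
    k_c = 1.0e9   # hypothetical contact stiffness [N/m^1.5]
    e = 0.6       # coefficient of restitution quoted in Sect. 4
    print(contact_force(1.0e-5, 0.5, 1.0, k_c, e))
    print(max_approach(0.001, 1.0, k_c))
```

The large value of k_c relative to the coil stiffness is what forces the very small integration steps mentioned above: the contact force changes by orders of magnitude over a penetration of a few micrometers.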

Fig. 5 Force-displacement diagram of the lumped-mass spring


4 Example of Spring Model Use The equations of motion for the lumped-parameter spring model can be written as:

$$M\ddot{x}(t) + F_f(\dot{x}) + Kx(t) = F(t) - F_c(x, \dot{x}) \tag{7}$$

where $x$ is the position of the coils, $M$ is the mass matrix computed from the density of the spring material, the coil length, and the mass of the follower, the friction force vector $F_f$ is found using (3) for each mass, the stiffness matrix $K$ is found using (2), $F$ is the forcing function vector (cam acceleration applied to the follower mass), and $F_c$ is the force of impact from (4) (zero when there is no penetration). The material properties and dimensions of the steel spring and follower are: density $\rho = 7{,}800$ kg/m$^3$, modulus of elasticity $E_s = 1.89 \times 10^5$ N/mm$^2$, coefficient of restitution $e = 0.6$ (assumed to be constant), Poisson's ratio $\nu = 0.3$, wire diameter $d = 0.7$ mm, spring radius $r = 10$ mm, installation length 15 mm, and follower mass 88 g. To examine some behaviors of the spring model, two sample cases are used. The first illustrates how, because of the internal dynamics of the spring, valve float occurs at a lower speed than an ideal spring model predicts. The second example demonstrates how varying the pitch can prevent the float using the same average stiffness as a uniformly pitched spring. The cam lift profile used in these two examples is of the constant-velocity ramp variety, shown in Fig. 6. This is a commonly used automotive cam type, favored for its smooth rise transition, which is intended to avoid excessive vibrations.
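As a rough sketch of how (7) might be assembled for a chain of lumped masses, the Python fragment below builds a tridiagonal stiffness matrix and solves (7) for the accelerations. Several things here are assumptions for illustration only: the masses are ordered from the grounded end toward the follower, the mass matrix is diagonal and stored as a list, the section stiffnesses are given as plain numbers instead of being computed from (2), and the friction and contact forces are supplied by the caller.

```python
def chain_stiffness(ks):
    """Stiffness matrix of masses in series.

    ks[0] connects mass 0 to ground; ks[i] connects mass i-1 to mass i.
    """
    n = len(ks)
    K = [[0.0] * n for _ in range(n)]
    for i, k in enumerate(ks):
        K[i][i] += k
        if i > 0:
            K[i - 1][i - 1] += k
            K[i - 1][i] -= k
            K[i][i - 1] -= k
    return K


def acceleration(M, K, x, F, F_f, F_c):
    """Solve Eq. (7) for the accelerations, given a diagonal mass list M."""
    n = len(x)
    Kx = [sum(K[i][j] * x[j] for j in range(n)) for i in range(n)]
    return [(F[i] - F_c[i] - F_f[i] - Kx[i]) / M[i] for i in range(n)]


if __name__ == "__main__":
    ks = [3000.0, 2000.0, 1000.0]   # hypothetical section stiffnesses [N/m]
    K = chain_stiffness(ks)
    M = [0.0002, 0.0002, 0.088]     # two coil masses and the follower [kg]
    x = [0.0, 0.0, 0.001]           # only the follower end is displaced
    zero = [0.0, 0.0, 0.0]
    print(acceleration(M, K, x, zero, zero, zero))
```

The accelerations returned by this function would then be passed to a time integrator; with the penalty contact force included, that integrator needs a step size small enough to resolve the contact stiffness.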

4.1 Valve Float Using an ideal linear spring with a stiffness of 16,000 N/m and the cam rotating at a fast speed of 4,125 RPM, the spring force is sufficient to keep the cam and follower in contact as shown on the left of Fig. 8 as the force of the follower does not exceed

Fig. 6 Cam lift profile, velocity and acceleration used in experiment


Fig. 7 Peak valve float at 4,125 RPM in one mass spring model

Fig. 8 Force of ideal and multi-mass springs

the spring force in the negative direction. As the cam rises, the force of the valve spring grows in proportion to the displacement of the cam. Even as the cam reaches its limit of maximum acceleration, there is still a sufficient gap between the force of the valve and the force applied by the spring. Using the 7-mass model with the same amount of spring pre-load and uniform pitch, the valve train exhibits float at the location indicated by the arrow on the right of Fig. 8. The oscillation of the multiple masses results in an unsteady force applied by the spring. To solve the problem one could increase the spring pre-load; however, the increased stiffness would also cause increased friction and wear. An alternative solution is to adjust the pitch throughout the spring so that vibrations are damped and the spring stiffness is progressively increased.
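The float criterion for the ideal linear spring can be made concrete with a small Python sketch. It is heavily simplified and rests on assumptions that are not from the paper: the lift is a plain harmonic profile y = (h/2)(1 - cos(theta)) instead of the constant-velocity ramp cam of Fig. 6, the lift height h = 10 mm is hypothetical, the pre-load is taken as 200 N, and the RPM value is treated as the camshaft speed. The follower force balance is m*y'' = N - preload - k*y, and float is flagged when the contact force N drops to zero or below.

```python
import math


def min_contact_force(rpm, k, preload, m, h, samples=720):
    """Smallest cam-follower contact force over one revolution (ideal spring)."""
    omega = 2.0 * math.pi * rpm / 60.0  # camshaft angular speed [rad/s]
    worst = float("inf")
    for i in range(samples):
        theta = 2.0 * math.pi * i / samples
        y = 0.5 * h * (1.0 - math.cos(theta))             # lift
        y_ddot = 0.5 * h * omega ** 2 * math.cos(theta)   # follower acceleration
        N = preload + k * y + m * y_ddot                  # required contact force
        worst = min(worst, N)
    return worst


def valve_floats(rpm, k, preload, m, h):
    """True if the follower would separate from the cam somewhere in the cycle."""
    return min_contact_force(rpm, k, preload, m, h) <= 0.0


if __name__ == "__main__":
    for rpm in (4125.0, 10000.0):
        print(rpm, min_contact_force(rpm, 16000.0, 200.0, 0.088, 0.010))
```

With an ideal spring the spring force depends only on the lift, so this check is purely algebraic. In the multi-mass model the spring force at the follower also depends on the wave motion of the coils, which is why float can appear at speeds where this ideal check still reports contact.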

4.2 Adjusting Pitch to Prevent Float In the second numerical experiment, the spring pitch is adjusted while the average stiffness of the spring is constrained. In a one-mass spring model with the camshaft rotating at 4,125 RPM and the average stiffness of the two springs held at 13,938 N/m, the stiffness of the first spring $k_1$ is adjusted. The stiffness $k_1$ is that of the spring nearest the cam. The resultant valve float height is shown in Fig. 7. As $k_1$ reaches approximately 16,000 N/m, valve float is prevented. As the single spring mass is forced into contact with the ground, the effective spring stiffness $k_{eff}$ increases to $k_1$, which is sufficient to maintain contact. This idea can be extended to the two-mass model to create a progressive rate spring. As the number of masses increases, the effects of coil collisions become more noticeable. The stiffness, and thus the spacing, decreases away from the cam. Those closely packed coils are the first to impact and close. Thus, as the spring is compressed, the number of active coils decreases and the overall spring stiffness increases. The closely packed coils are also the first coils to collide and dissipate energy.
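The stiffness jump in the one-mass case can be checked with a few lines of Python. The sketch assumes that "average stiffness" means the arithmetic mean of the two section stiffnesses, which the text does not state explicitly, and that the two sections act in series until the mass touches the ground. The function names are illustrative.

```python
def second_stiffness(k_avg, k1):
    """Stiffness k2 that keeps the arithmetic mean of k1 and k2 equal to k_avg."""
    k2 = 2.0 * k_avg - k1
    if k2 <= 0.0:
        raise ValueError("k1 is too large for the requested average stiffness")
    return k2


def series_stiffness(k1, k2):
    """Effective stiffness of two springs in series (both sections active)."""
    return k1 * k2 / (k1 + k2)


if __name__ == "__main__":
    k_avg = 13938.0   # constrained average stiffness [N/m]
    k1 = 16000.0      # spring section nearest the cam [N/m]
    k2 = second_stiffness(k_avg, k1)
    print("k2 =", k2)
    print("k_eff, both sections active =", series_stiffness(k1, k2))
    print("k_eff, mass grounded =", k1)
```

Under these assumptions the effective stiffness more than doubles once the mass grounds and only the section nearest the cam remains active, which is the progressive-rate behavior the experiment relies on.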

5 Conclusions The work presented builds a spring model which is numerically simple enough to be used in optimization. The spring model exhibits the internal dynamics necessary to accurately estimate the force of a spring in a high-speed cam follower. A simple one-mass spring was evaluated by varying the individual pitches while maintaining the same average stiffness. The limit beyond which the spring no longer results in valve float was found. The disparity between an ideal linear spring model and a multi-mass spring model was demonstrated by showing how the ideal linear spring model overestimates the spring force in situations where the 7-mass lumped spring model results in valve float. The next logical step is to optimize the spring design for a fixed cam, subject to the constraint that there is no valve float. The spring parameters are adjusted such that the Hertzian contact stress or the energy loss per cycle is minimized. After this is performed successfully, a simultaneous spring and cam optimization can be done. The cam introduces additional considerations such as the area under the cam lift profile. These extra considerations substantially increase the difficulty of defining the optimization criterion and constraints, and the authors will use their previous experience with this process to combine the variable pitch spring optimization with the cam optimization.

References
1. H.G. Bock, R.W. Longman, J.P. Schlöder, and M.J. Winckler. Synthesis of automotive cams using multiple shooting-SQP methods for constrained optimization. In W. Jäger and H.-J. Krebs, editors, Mathematics - Key Technology for the Future. Springer, 2003.
2. M. Chew, F. Freudenstein, and R.W. Longman. Application of optimal control theory to the synthesis of high-speed cam-follower systems: Parts 1 and 2. Transactions of the ASME, 105(1):576–591, 1981.


3. S. Dubowsky and F. Freudenstein. Dynamic analysis of mechanical systems with clearances part 1: Formulation of dynamic model. Journal of Engineering for Industry, 93(1):305–309, 1971.
4. D. Elgin. Automotive camshaft dynamics. CAM Design Handbook, pages 529–543, 2003.
5. A. Fujimoto, H. Higashi, N. Osawa, H. Nakai, and T. Mizukami. Valve jump prediction using dynamic simulation on direct acting valve train. Technical Report 2007-19, Advanced Powertrain Development Dept. Mitsubishi Motors, 2007.
6. B.C. Fabien, R.W. Longman, and F. Freudenstein. The design of high-speed dwell-rise-dwell cams using linear quadratic optimal control theory. Journal of Mechanical Design, 116:867–874, 1994.
7. F. Freudenstein, M. Mayurian, and E.R. Maki. Energy efficient cam-follower systems. ASME Journal of Mechanisms, Transmissions, and Automation in Design, 105:681–685, 1983.
8. S. Faik and H. Witteman. Modeling of impact dynamics: A literature survey. In International ADAMS User Conference, 2000.
9. K.H. Hunt and F.R.E. Crossley. Coefficient of restitution interpreted as damping in vibroimpact. ASME Journal of Applied Mechanics, 42(2):440–445, 1975.
10. R. W. Longman J.-G. Sun and F. Freudenstein. Objective functions for optimal control in cam follower systems. International Journal for Manufacturing Science and Technology, 8(2), 2006.
11. T. Kitada and M. Kuchita. Development of vibration calculation code for engine valve-train. Technical Report 2008-20, Advanced Powertrain Development Dept. Mitsubishi Motors, 2008.
12. D.B. Leineweber. The theory of MUSCOD in a nutshell. IWR-Preprint 96-19, Universität Heidelberg, 1996.
13. H.M. Lankarani and P.E. Nikravesh. A contact force model with hysteresis damping for impact analysis of multibody systems. Journal of Mechanical Design, 112:369–376, 1990.
14. Y. Lin and A.P. Pisano. General dynamic equations of helical springs with static solution and experimental verification. ASME Journal of Applied Mechanics, 54:910–917, 1987.
15. J. Lee and D.J. Thompson. Dynamic stiffness formulation, free vibration and wave motion of helical springs. Journal of Sound and Vibration, 239(2):297–320, 2001.
16. N. Milovanovic, R. Chen, and J. Turner. Influence of variable valve timings on the gas exchange process in a controlled auto-ignition engine. Journal of Automobile Engineering, 218(D):567–583, 2004.
17. S. Mclaughlin and I. Hague. Development of a multi-body simulation model of a Winston Cup valvetrain to study valve bounce. Proceedings of the Institution of Mechanical Engineers, 216(K):237–248, 2002.
18. S. Mennicke, R.W. Longman, M.S. Chew, and H.G. Bock. A CAD package for high speed automotive cam design based on direct multiple shooting control techniques. ASME Proceedings Design Engineering Technical Conference, 2004.
19. S. Mennicke, R.W. Longman, M.S. Chew, and H.G. Bock. High speed automotive cam design using direct multiple shooting control techniques. ASME Proceedings Design Engineering Technical Conference, 2004.
20. R. Norton. Cam Design and Manufacturing Handbook. Industrial Press, New York, 1st edition, 2002.
21. K. Oezguer and F. Pasin. Separation phenomenon in force closed cam mechanisms. Mechanical Machine Theory, 31(4):487–499, 1996.
22. D. Pearson and W.H. Wittrick. An exact solution for the vibration of helical springs using a Bernoulli-Euler model. International Journal of Mechanical Sciences, 28(2):83–96, 1986.
23. A. Sinopoli. Dynamics and impact in a system with unilateral constraints the relevance of dry friction. Meccanica, 22(4):210–215, 1987.
24. J.G. Sun, R.W. Longman, and R. Freudenstein. Determination of appropriate cost functionals for cam-follower design using optimal control theory. In Proceedings of the 1984 American Control Conference, pages 1799–1800, San Diego, Calif., 1984. American Automatic Control Council.


25. M.H. Wu and W.Y. Hsu. Modelling the static and dynamic behavior of a conical spring by considering the coil close and damping effects. Journal of Sound and Vibration, 214(1):17–28, 1998.
26. T.L. Wang W. Jiang and W.K. Jones. The forced vibrations of helical springs. International Journal of Mechanical Sciences, 34(7):549–562, 1992.
27. D.J. Zhu and C.M. Taylor. Tribological Analysis and Design of a Modern Automobile Cam and Follower. Wiley and Sons, Suffolk UK, 1st edition, 2001.